Running Models Across Scenarios¶
Overview¶
Gaspatchio provides efficient, vectorized scenario support that lets you run 1 or 10,000 scenarios with minimal code changes and maximum performance.
Key benefits:
- Single vectorized execution: Run all scenarios in one pass instead of N sequential runs
- Transparent operations: Cross-join and lookup patterns you can audit and understand
- Minimal code changes: Add scenarios to an existing model by changing just a few lines
- Production-ready performance: Leverage Polars' streaming engine for bounded-memory execution at scale
The Core Pattern: with_scenarios()¶
The with_scenarios() function expands your model points across scenarios via a simple cross-join:
import gaspatchio_core as gs
from gaspatchio_core import ActuarialFrame
import polars as pl
# Load model points (8 policies)
af = ActuarialFrame(pl.read_parquet("model_points.parquet"))
# Expand across scenarios
af = gs.with_scenarios(af, ["BASE", "UP", "DOWN"])
# Result: 8 x 3 = 24 rows, with scenario_id column added
# All original columns preserved
What happens:
- Creates a DataFrame with your scenario IDs
- Cross-joins it with your model points
- Returns an ActuarialFrame with a scenario_id column
- Row count multiplies: policies x scenarios

This is transparent - you can see exactly what happened. No hidden magic.
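For intuition, the same expansion can be reproduced in plain Polars. A minimal sketch (illustrative only; with_scenarios() additionally returns the result wrapped as an ActuarialFrame):
import polars as pl

model_points = pl.read_parquet("model_points.parquet")  # 8 policies
scenario_df = pl.DataFrame({"scenario_id": ["BASE", "UP", "DOWN"]})

# Cross-join: every policy is paired with every scenario -> 8 x 3 = 24 rows
expanded = model_points.join(scenario_df, how="cross")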
Loading Scenario-Varying Assumptions¶
Most assumptions (mortality, lapse) stay the same across scenarios. Economic assumptions like discount rates typically vary.
Option 1: Single Table with Scenario Dimension¶
If your assumption table already has a scenario_id column:
# discount_rates.parquet contains (scenario_id, year) -> rate
disc_rate_table = gs.Table(
    name="discount_rates",
    source="discount_rates.parquet",
    dimensions={
        "scenario_id": "scenario_id",
        "year": "year",
    },
    value="disc_rate_ann"
)

# Lookup uses the scenario_id from your expanded frame
af.disc_rate = disc_rate_table.lookup(
    scenario_id=af.scenario_id,  # <-- Dynamic per scenario
    year=af.year
)
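If you need to build such a table yourself (for tests, say), a long-format layout like the following works; the rate values here are made up:
import polars as pl

pl.DataFrame({
    "scenario_id": ["BASE", "BASE", "UP", "UP", "DOWN", "DOWN"],
    "year": [0, 1, 0, 1, 0, 1],
    "disc_rate_ann": [0.030, 0.031, 0.040, 0.041, 0.020, 0.021],
}).write_parquet("discount_rates.parquet")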
Option 2: Separate Files Per Scenario¶
If scenarios are stored in separate files (common with ESG tools):
# Load from multiple files and combine into one table
disc_rate_table = gs.Table.from_scenario_files(
    scenario_files={
        "BASE": "scenarios/BASE/discount_rates.parquet",
        "UP": "scenarios/UP/discount_rates.parquet",
        "DOWN": "scenarios/DOWN/discount_rates.parquet",
    },
    scenario_column="scenario_id",
    dimensions={"year": "year"},
    value="disc_rate_ann",
    name="discount_rates"
)

# Lookup works the same way
af.disc_rate = disc_rate_table.lookup(
    scenario_id=af.scenario_id,
    year=af.year
)
What happens:
- Each file is loaded and tagged with its scenario ID
- All DataFrames are concatenated
- A Table is created with scenario_id as a dimension
- Lookups join on both scenario_id and other dimensions
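Conceptually, this is the same tag-and-concatenate pattern you could write by hand in Polars (a sketch, not the library's actual internals):
import polars as pl

scenario_files = {
    "BASE": "scenarios/BASE/discount_rates.parquet",
    "UP": "scenarios/UP/discount_rates.parquet",
    "DOWN": "scenarios/DOWN/discount_rates.parquet",
}

# Tag each file's rows with its scenario ID, then stack them
frames = [
    pl.read_parquet(path).with_columns(pl.lit(sid).alias("scenario_id"))
    for sid, path in scenario_files.items()
]
combined = pl.concat(frames)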
Option 3: Template-Based Loading¶
For consistent file naming patterns:
# When files follow a predictable pattern
disc_rate_table = gs.Table.from_scenario_template(
    path_template="scenarios/{scenario_id}/discount_rates.parquet",
    scenario_ids=["BASE", "UP", "DOWN"],
    scenario_column="scenario_id",
    dimensions={"year": "year"},
    value="disc_rate_ann"
)
# Equivalent to from_scenario_files but more concise
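Presumably this amounts to string formatting over the template; the equivalent mapping you would pass to from_scenario_files() is (sketch):
path_template = "scenarios/{scenario_id}/discount_rates.parquet"
scenario_files = {
    sid: path_template.format(scenario_id=sid)
    for sid in ["BASE", "UP", "DOWN"]
}
# {"BASE": "scenarios/BASE/discount_rates.parquet", ...}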
Complete Workflow Example¶
This example is based on the BasicTerm_ME model from lifelib (lifelib.io/libraries/basiclife/BasicTerm_ME.html), adapted to demonstrate scenario support.
import datetime
from pathlib import Path
import polars as pl
from gaspatchio_core import ActuarialFrame, with_scenarios
from gaspatchio_core.assumptions import Table
def setup_assumptions(base_path: Path):
    """Load assumption tables."""
    # Premium rates - fixed across scenarios
    premium_table = Table(
        name="premium_rates",
        source=base_path / "premium_table.parquet",
        dimensions={
            "age_at_entry": "age_at_entry",
            "policy_term": "policy_term"
        },
        value="premium_rate"
    )

    # Mortality - fixed across scenarios
    mort_table = Table(
        name="mortality_std",
        source=base_path / "mort_table.parquet",
        dimensions={
            "age": "age",
            "duration": "duration"
        },
        value="mort_rate"
    )

    # Discount rates - SCENARIO-VARYING
    disc_rate_table = Table.from_scenario_files(
        scenario_files={
            "BASE": base_path / "disc_rates" / "BASE.parquet",
            "UP": base_path / "disc_rates" / "UP.parquet",
            "DOWN": base_path / "disc_rates" / "DOWN.parquet",
        },
        scenario_column="scenario_id",
        dimensions={"year": "year"},
        value="disc_rate_ann",
        name="disc_rates_by_scenario"
    )

    return premium_table, mort_table, disc_rate_table
def run_projection(af, val_date, premium_table, mort_table, disc_rate_table):
    """Run actuarial projection."""
    # Create projection timeline
    max_result = af.max()
    max_projection_length = max_result["policy_term"] * 12 - max_result["duration_mth"]
    af = af.date.create_projection_timeline(
        valuation_date=val_date,
        projection_end_type="term_months",
        projection_end_value=max_projection_length,
        projection_frequency="monthly",
        output_column="projection_months"
    )

    # Time variables
    af.month = (af.projection_months.dt.year() - val_date.year) * 12 + (
        af.projection_months.dt.month() - val_date.month
    )
    af.duration_mth_t = af.duration_mth + af.month
    af.age = af.age_at_entry + (af.duration_mth_t // 12)

    # Mortality & lapse (same for all scenarios)
    af.mort_rate = mort_table.lookup(
        age=af.age.ceil(),
        duration=(af.duration_mth_t // 12).clip(lower_bound=0, upper_bound=5)
    )
    af.mort_rate_mth = 1 - (1 - af.mort_rate) ** (1 / 12)
    # Annual lapse rate grades down by duration, floored at 2%
    # (the BasicTerm_ME lapse assumption)
    af.lapse_rate = (0.1 - 0.02 * (af.duration_mth_t // 12)).clip(lower_bound=0.02)
    af.lapse_rate_mth = 1 - (1 - af.lapse_rate) ** (1 / 12)

    # Calculate policies in force
    af.combined_decrement = 1 - ((1 - af.mort_rate_mth) * (1 - af.lapse_rate_mth))
    af.survival_prob = af.combined_decrement.projection.cumulative_survival()
    af.pols_if = af.survival_prob * af.policy_count

    # Cash flows
    af.premium_rate = premium_table.lookup(
        age_at_entry=af.age_at_entry,
        policy_term=af.policy_term
    )
    af.premiums = af.premium_rate * af.pols_if
    af.claims = af.sum_assured * af.pols_if * af.mort_rate_mth

    # DISCOUNTING - SCENARIO-AWARE
    # Each row looks up discount rates by BOTH year AND scenario_id
    af.year_for_lookup = af.month // 12
    af.disc_rate_ann = disc_rate_table.lookup(
        scenario_id=af.scenario_id,  # <-- THE KEY: dynamic per scenario
        year=af.year_for_lookup
    )
    af.disc_rate_mth = af.disc_rate_ann.finance.to_monthly(method="compound")
    af = af.finance.discount_factor(
        rate_col="disc_rate_mth",
        periods_col="month",
        output_col="disc_factors",
        method="spot"
    )

    # Present values
    af.pv_premiums = (af.premiums * af.disc_factors).list.sum()
    af.pv_claims = (af.claims * af.disc_factors).list.sum()
    af.pv_net_cf = af.pv_premiums - af.pv_claims

    return af
def main():
    """Run model across multiple scenarios."""
    # 1. Load model points
    af = ActuarialFrame(pl.read_parquet("model_points.parquet"))

    # 2. EXPAND ACROSS SCENARIOS
    scenarios = ["BASE", "UP", "DOWN"]
    af = with_scenarios(af, scenarios)
    # Now: policies x 3 rows, with scenario_id column

    # 3. Load assumptions
    premium_table, mort_table, disc_rate_table = setup_assumptions(Path("assumptions"))

    # 4. Run projection (same code works for 1 or 10,000 scenarios)
    af = run_projection(af, datetime.date(2025, 1, 1),
                        premium_table, mort_table, disc_rate_table)

    # 5. Aggregate by scenario
    result_df = af.collect()
    summary = (
        result_df
        .group_by("scenario_id")
        .agg([
            pl.col("pv_premiums").sum().alias("total_pv_premiums"),
            pl.col("pv_claims").sum().alias("total_pv_claims"),
            pl.col("pv_net_cf").sum().alias("total_pv_net_cf"),
        ])
        .sort("scenario_id")
    )

    # 6. Calculate risk metrics (meaningful with many scenarios;
    # the max(..., 1) guard keeps the tail non-empty for small runs)
    reserves = summary.sort("total_pv_net_cf", descending=True)["total_pv_net_cf"]
    n_tail = max(int(0.02 * len(reserves)), 1)
    cte_98 = reserves.head(n_tail).mean()

    return summary


if __name__ == "__main__":
    summary = main()
    print(summary)
Example output (10 policies x 3 scenarios):
shape: (3, 4)
┌─────────────┬───────────────────┬─────────────────┬─────────────────┐
│ scenario_id ┆ total_pv_premiums ┆ total_pv_claims ┆ total_pv_net_cf │
│ ---         ┆ ---               ┆ ---             ┆ ---             │
│ str         ┆ f64               ┆ f64             ┆ f64             │
╞═════════════╪═══════════════════╪═════════════════╪═════════════════╡
│ BASE        ┆ 749,547.65        ┆ 536,108.66      ┆ 213,438.99      │
│ DOWN        ┆ 750,075.47        ┆ 536,504.02      ┆ 213,571.45      │
│ UP          ┆ 745,587.63        ┆ 533,224.12      ┆ 212,363.51      │
└─────────────┴───────────────────┴─────────────────┴─────────────────┘

SCENARIO IMPACT vs BASE:
BASE : PV = 213,438.99 (    +0.00 / +0.00%)
DOWN : PV = 213,571.45 (  +132.46 / +0.06%)  <- Lower rates = higher PV
UP   : PV = 212,363.51 (-1,075.48 / -0.50%)  <- Higher rates = lower PV
Key points:
- Model code is identical whether running 1 scenario or 10,000
- Only scenario-varying assumptions need scenario_id in lookups
- Fixed assumptions (mortality, premium rates) stay unchanged
- Aggregation by scenario_id gives you per-scenario totals
Design Philosophy: Scenario-Ready by Default¶
Best practice: Build every model scenario-aware from day one.
Even a "single scenario" deterministic model should use the scenario pattern:
# A "single scenario" model is just a model with one scenario
af = with_scenarios(af, ["DETERMINISTIC"])
Why:
- Model code is identical for 1 or 10,000 scenarios
- Adding scenarios later = changing one line, not refactoring
- Testing with 1 scenario, running with 10,000 - same code
- No "basic" vs "advanced" patterns to learn
| Aspect | Pattern |
|---|---|
| scenario_id column | Always exists (even as "DETERMINISTIC") |
| Table dimensions | Always include scenario_id where relevant |
| Lookups | Always use af.scenario_id, never pl.lit("BASE") |
| Results | Always have scenario_id for consistent aggregation |
Common Patterns¶
Sensitivity Analysis¶
# Interest rate sensitivity
scenarios = [
    "RATES_DOWN_100BPS",
    "RATES_DOWN_50BPS",
    "BASE",
    "RATES_UP_50BPS",
    "RATES_UP_100BPS"
]
af = with_scenarios(af, scenarios)
result = run_projection(af, ...)
# Aggregate and compare
by_scenario = result.collect().group_by("scenario_id").agg([
    pl.col("pv_net_cf").sum().alias("total_pv")
])
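From here, comparing each sensitivity to the base case is one more step (a sketch; delta_vs_base and delta_pct are just illustrative column names):
base_pv = by_scenario.filter(pl.col("scenario_id") == "BASE")["total_pv"][0]
comparison = by_scenario.with_columns(
    (pl.col("total_pv") - base_pv).alias("delta_vs_base"),
    ((pl.col("total_pv") / base_pv - 1) * 100).alias("delta_pct"),
).sort("scenario_id")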
Regulatory Capital Calculation¶
# Integer IDs for performance
scenario_ids = list(range(1, 1001))
af = with_scenarios(af, scenario_ids)
result = run_projection(af, ...)
# Aggregate to scenario-level reserves
reserves = (
    result.collect()
    .group_by("scenario_id")
    .agg(pl.col("pv_net_cf").sum().alias("total_reserve"))
    .sort("total_reserve", descending=True)
)
# CTE at 98th percentile
cte_98 = reserves.head(int(0.02 * len(reserves)))["total_reserve"].mean()
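A percentile-style tail metric can be read off the same per-scenario reserves; a sketch, with the 0.995 level chosen purely as an example:
# VaR-style measure: 99.5th percentile of per-scenario reserves
var_99_5 = reserves["total_reserve"].quantile(0.995)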
ESG Integration¶
# ESG typically outputs all scenarios in one file
# fund_returns.parquet: (scenario_id, month, fund_id) -> return
returns_table = gs.Table(
    source="esg_output/fund_returns_1000scenarios.parquet",
    dimensions={
        "scenario_id": "scenario_id",
        "month": "month",
        "fund_id": "fund_id"
    },
    value="monthly_return"
)

# Use in model
af.fund_return = returns_table.lookup(
    scenario_id=af.scenario_id,
    month=af.projection_month,
    fund_id=af.fund_id
)
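Before wiring an ESG file into a Table, it is worth checking that (scenario_id, month, fund_id) uniquely keys the rows; a quick sketch in plain Polars:
import polars as pl

esg = pl.read_parquet("esg_output/fund_returns_1000scenarios.parquet")
# Row count should equal the number of distinct (scenario_id, month, fund_id) keys
assert len(esg) == esg.select(["scenario_id", "month", "fund_id"]).n_unique()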
Summary¶
Scenario support in Gaspatchio:
- Transparent: Cross-join and lookup - you can see what's happening
- Minimal code: Add scenarios with 1-3 line changes to existing models
- Performant: Vectorized execution with streaming for bounded memory
- Scalable: Works for 3 scenarios or 10,000 with the same code
- Scenario-ready by default: Build models that work for any scenario count
The core pattern is simple:
- Expand model points with with_scenarios()
- Load scenario-varying assumptions with Table.from_scenario_files()
- Use af.scenario_id in lookups
- Aggregate by scenario_id for results