Running Models Across Scenarios

Overview

Gaspatchio provides vectorized scenario support: the same model code runs 1 scenario or 10,000, with only a few lines changed and all scenarios executed in a single pass.

Key benefits:

  • Single vectorized execution: Run all scenarios in one pass instead of N sequential runs
  • Transparent operations: Cross-join and lookup patterns you can audit and understand
  • Minimal code changes: Add scenarios to an existing model by changing just a few lines
  • Production-ready performance: Leverage Polars' streaming engine for bounded-memory execution at scale

The Core Pattern: with_scenarios()

The with_scenarios() function expands your model points across scenarios via a simple cross-join:

import gaspatchio_core as gs
from gaspatchio_core import ActuarialFrame
import polars as pl

# Load model points (8 policies)
af = ActuarialFrame(pl.read_parquet("model_points.parquet"))

# Expand across scenarios
af = gs.with_scenarios(af, ["BASE", "UP", "DOWN"])

# Result: 8 x 3 = 24 rows, with scenario_id column added
# All original columns preserved

What happens:

  1. Creates a DataFrame with your scenario IDs
  2. Cross-joins it with your model points
  3. Returns an ActuarialFrame with a scenario_id column
  4. Row count multiplies: policies x scenarios

This is transparent - you can see exactly what happened. No hidden magic.
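
Under the hood this is an ordinary cross-join. A minimal plain-Polars sketch of the equivalent operation (illustrative, not the library's actual implementation):

import polars as pl

model_points = pl.read_parquet("model_points.parquet")  # 8 policies
scenario_df = pl.DataFrame({"scenario_id": ["BASE", "UP", "DOWN"]})

# Every policy is paired with every scenario
expanded = scenario_df.join(model_points, how="cross")
assert expanded.height == model_points.height * 3  # 24 rows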

Loading Scenario-Varying Assumptions

Most assumptions (mortality, lapse) stay the same across scenarios. Economic assumptions like discount rates typically vary.

Option 1: Single Table with Scenario Dimension

If your assumption table already has a scenario_id column:

# discount_rates.parquet contains (scenario_id, year) -> rate
disc_rate_table = gs.Table(
    name="discount_rates",
    source="discount_rates.parquet",
    dimensions={
        "scenario_id": "scenario_id",
        "year": "year",
    },
    value="disc_rate_ann"
)

# Lookup uses the scenario_id from your expanded frame
af.disc_rate = disc_rate_table.lookup(
    scenario_id=af.scenario_id,  # <-- Dynamic per scenario
    year=af.year
)

Option 2: Separate Files Per Scenario

If scenarios are stored in separate files, one per scenario (a common output layout for economic scenario generators, ESGs):

# Load from multiple files and combine into one table
disc_rate_table = gs.Table.from_scenario_files(
    scenario_files={
        "BASE": "scenarios/BASE/discount_rates.parquet",
        "UP": "scenarios/UP/discount_rates.parquet",
        "DOWN": "scenarios/DOWN/discount_rates.parquet",
    },
    scenario_column="scenario_id",
    dimensions={"year": "year"},
    value="disc_rate_ann",
    name="discount_rates"
)

# Lookup works the same way
af.disc_rate = disc_rate_table.lookup(
    scenario_id=af.scenario_id,
    year=af.year
)

What happens:

  1. Each file is loaded and tagged with its scenario ID
  2. All DataFrames are concatenated
  3. A Table is created with scenario_id as a dimension
  4. Lookups join on both scenario_id and other dimensions
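
Conceptually this is the same as tagging and stacking the files yourself. A minimal manual sketch (illustrative, not the library's internals):

import polars as pl

# Tag each file with its scenario ID, then stack into one long table
frames = [
    pl.read_parquet(path).with_columns(pl.lit(scen).alias("scenario_id"))
    for scen, path in {
        "BASE": "scenarios/BASE/discount_rates.parquet",
        "UP": "scenarios/UP/discount_rates.parquet",
        "DOWN": "scenarios/DOWN/discount_rates.parquet",
    }.items()
]
combined = pl.concat(frames)  # keyed by (scenario_id, year) -> disc_rate_ann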

Option 3: Template-Based Loading

For consistent file naming patterns:

# When files follow a predictable pattern
disc_rate_table = gs.Table.from_scenario_template(
    path_template="scenarios/{scenario_id}/discount_rates.parquet",
    scenario_ids=["BASE", "UP", "DOWN"],
    scenario_column="scenario_id",
    dimensions={"year": "year"},
    value="disc_rate_ann"
)

# Equivalent to from_scenario_files but more concise
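
The template presumably expands via ordinary string formatting, resolving to the same file mapping as Option 2:

# How the template resolves to paths (illustrative)
path_template = "scenarios/{scenario_id}/discount_rates.parquet"
paths = {s: path_template.format(scenario_id=s) for s in ["BASE", "UP", "DOWN"]}
# {"BASE": "scenarios/BASE/discount_rates.parquet", ...}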

Complete Workflow Example

This example is based on the BasicTerm_ME model from lifelib (lifelib.io/libraries/basiclife/BasicTerm_ME.html), adapted to demonstrate scenario support.

import datetime
from pathlib import Path
import polars as pl
from gaspatchio_core import ActuarialFrame, with_scenarios
from gaspatchio_core.assumptions import Table

def setup_assumptions(base_path: Path):
    """Load assumption tables."""

    # Premium rates - fixed across scenarios
    premium_table = Table(
        name="premium_rates",
        source=base_path / "premium_table.parquet",
        dimensions={
            "age_at_entry": "age_at_entry",
            "policy_term": "policy_term"
        },
        value="premium_rate"
    )

    # Mortality - fixed across scenarios
    mort_table = Table(
        name="mortality_std",
        source=base_path / "mort_table.parquet",
        dimensions={
            "age": "age",
            "duration": "duration"
        },
        value="mort_rate"
    )

    # Discount rates - SCENARIO-VARYING
    disc_rate_table = Table.from_scenario_files(
        scenario_files={
            "BASE": base_path / "disc_rates" / "BASE.parquet",
            "UP": base_path / "disc_rates" / "UP.parquet",
            "DOWN": base_path / "disc_rates" / "DOWN.parquet",
        },
        scenario_column="scenario_id",
        dimensions={"year": "year"},
        value="disc_rate_ann",
        name="disc_rates_by_scenario"
    )

    return premium_table, mort_table, disc_rate_table

def run_projection(af, val_date, premium_table, mort_table, disc_rate_table):
    """Run actuarial projection."""

    # Create projection timeline sized to the longest remaining term
    # across all model points
    af.remaining_term_mth = af.policy_term * 12 - af.duration_mth
    max_projection_length = af.max()["remaining_term_mth"]

    af = af.date.create_projection_timeline(
        valuation_date=val_date,
        projection_end_type="term_months",
        projection_end_value=max_projection_length,
        projection_frequency="monthly",
        output_column="projection_months"
    )

    # Time variables
    af.month = (af.projection_months.dt.year() - val_date.year) * 12 + (
        af.projection_months.dt.month() - val_date.month
    )
    af.duration_mth_t = af.duration_mth + af.month
    af.age = af.age_at_entry + (af.duration_mth_t // 12)

    # Mortality & lapse (same for all scenarios)
    af.mort_rate = mort_table.lookup(
        age=af.age.ceil(),
        duration=(af.duration_mth_t // 12).clip(lower_bound=0, upper_bound=5)
    )
    af.mort_rate_mth = 1 - (1 - af.mort_rate) ** (1 / 12)

    # Calculate policies in force
    af.combined_decrement = 1 - ((1 - af.mort_rate_mth) * (1 - af.lapse_rate_mth))
    af.survival_prob = af.combined_decrement.projection.cumulative_survival()
    af.pols_if = af.survival_prob * af.policy_count

    # Cash flows
    af.premium_rate = premium_table.lookup(
        age_at_entry=af.age_at_entry,
        policy_term=af.policy_term
    )
    af.premiums = af.premium_rate * af.pols_if
    af.claims = af.sum_assured * af.pols_if * af.mort_rate_mth

    # DISCOUNTING - SCENARIO-AWARE
    # Each row looks up discount rates by BOTH year AND scenario_id
    af.year_for_lookup = af.month // 12
    af.disc_rate_ann = disc_rate_table.lookup(
        scenario_id=af.scenario_id,  # <-- THE KEY: dynamic per scenario
        year=af.year_for_lookup
    )

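    # Convert the annual rate to an effective monthly rate
    # (compound: r_mth = (1 + r_ann) ** (1/12) - 1), then build spot
    # discount factors for each projection month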
    af.disc_rate_mth = af.disc_rate_ann.finance.to_monthly(method="compound")
    af = af.finance.discount_factor(
        rate_col="disc_rate_mth",
        periods_col="month",
        output_col="disc_factors",
        method="spot"
    )

    # Present values
    af.pv_premiums = (af.premiums * af.disc_factors).list.sum()
    af.pv_claims = (af.claims * af.disc_factors).list.sum()
    af.pv_net_cf = af.pv_premiums - af.pv_claims

    return af

def main():
    """Run model across multiple scenarios."""

    # 1. Load model points
    af = ActuarialFrame(pl.read_parquet("model_points.parquet"))

    # 2. EXPAND ACROSS SCENARIOS
    scenarios = ["BASE", "UP", "DOWN"]
    af = with_scenarios(af, scenarios)
    # Now: policies x 3 rows, with scenario_id column

    # 3. Load assumptions
    premium_table, mort_table, disc_rate_table = setup_assumptions(Path("assumptions"))

    # 4. Run projection (same code works for 1 or 10,000 scenarios)
    af = run_projection(af, datetime.date(2025, 1, 1),
                       premium_table, mort_table, disc_rate_table)

    # 5. Aggregate by scenario
    result_df = af.collect()

    summary = (
        result_df
        .group_by("scenario_id")
        .agg([
            pl.col("pv_premiums").sum().alias("total_pv_premiums"),
            pl.col("pv_claims").sum().alias("total_pv_claims"),
            pl.col("pv_net_cf").sum().alias("total_pv_net_cf"),
        ])
        .sort("scenario_id")
    )

    # 6. Calculate risk metrics (illustrative: the 2% tail is empty with
    #    only 3 scenarios, since int(0.02 * 3) == 0 - use a large set)
    reserves = summary.sort("total_pv_net_cf", descending=True)["total_pv_net_cf"]
    cte_98 = reserves.head(int(0.02 * len(reserves))).mean()

    return summary

if __name__ == "__main__":
    summary = main()
    print(summary)

Example output (10 policies x 3 scenarios):

shape: (3, 4)
┌─────────────┬───────────────────┬────────────────┬────────────────┐
│ scenario_id ┆ total_pv_premiums ┆ total_pv_claims┆ total_pv_net_cf│
│ ---         ┆ ---               ┆ ---            ┆ ---            │
│ str         ┆ f64               ┆ f64            ┆ f64            │
╞═════════════╪═══════════════════╪════════════════╪════════════════╡
│ BASE        ┆ 749,547.65        ┆ 536,108.66     ┆ 78,486.57      │
│ DOWN        ┆ 750,075.47        ┆ 536,504.02     ┆ 78,583.99      │
│ UP          ┆ 745,587.63        ┆ 533,224.12     ┆ 78,020.35      │
└─────────────┴───────────────────┴────────────────┴────────────────┘

SCENARIO IMPACT vs BASE:
  BASE  : PV =    78,486.57  (     +0.00 /  +0.00%)
  DOWN  : PV =    78,583.99  (    +97.42 /  +0.12%)  <- Lower rates = higher PV
  UP    : PV =    78,020.35  (   -466.22 /  -0.59%)  <- Higher rates = lower PV

Key points:

  • Model code is identical whether running 1 scenario or 10,000
  • Only scenario-varying assumptions need scenario_id in lookups
  • Fixed assumptions (mortality, premium rates) stay unchanged
  • Aggregation by scenario_id gives you per-scenario totals

Design Philosophy: Scenario-Ready by Default

Best practice: Build every model scenario-aware from day one.

Even a "single scenario" deterministic model should use the scenario pattern:

# A "single scenario" model is just a model with one scenario
af = with_scenarios(af, ["DETERMINISTIC"])

Why:

  • Model code is identical for 1 or 10,000 scenarios
  • Adding scenarios later = changing one line, not refactoring
  • Testing with 1 scenario, running with 10,000 - same code
  • No "basic" vs "advanced" patterns to learn

Aspect               Pattern
scenario_id column   Always exists (even as "DETERMINISTIC")
Table dimensions     Always include scenario_id where relevant
Lookups              Always use af.scenario_id, never pl.lit("BASE")
Results              Always have scenario_id for consistent aggregation
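
Concretely, the scenario-ready lookup below works unchanged for any scenario count, while the hardcoded variant (commented out) silently pins every row to one scenario:

# Scenario-ready: the scenario ID is resolved per row
af.disc_rate = disc_rate_table.lookup(
    scenario_id=af.scenario_id,
    year=af.year
)

# Anti-pattern: every row gets BASE rates regardless of its scenario
# af.disc_rate = disc_rate_table.lookup(scenario_id=pl.lit("BASE"), year=af.year)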

Common Patterns

Sensitivity Analysis

# Interest rate sensitivity
scenarios = [
    "RATES_DOWN_100BPS",
    "RATES_DOWN_50BPS",
    "BASE",
    "RATES_UP_50BPS",
    "RATES_UP_100BPS"
]

af = with_scenarios(af, scenarios)
result = run_projection(af, ...)

# Aggregate and compare
by_scenario = result.collect().group_by("scenario_id").agg([
    pl.col("pv_net_cf").sum().alias("total_pv")
])
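
To reproduce the "impact vs BASE" view from the earlier output, a plain-Polars sketch (assuming the by_scenario frame built above):

# Delta of each scenario against BASE
base_pv = by_scenario.filter(pl.col("scenario_id") == "BASE")["total_pv"][0]

impact = by_scenario.with_columns(
    (pl.col("total_pv") - base_pv).alias("delta_vs_base"),
    ((pl.col("total_pv") / base_pv - 1) * 100).alias("delta_pct"),
).sort("scenario_id")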

Regulatory Capital Calculation

# Integer IDs for performance
scenario_ids = list(range(1, 1001))
af = with_scenarios(af, scenario_ids)

result = run_projection(af, ...)

# Aggregate to scenario-level reserves
reserves = (
    result.collect()
    .group_by("scenario_id")
    .agg(pl.col("pv_net_cf").sum().alias("total_reserve"))
    .sort("total_reserve", descending=True)
)

# CTE(98): average of the worst 2% of scenario reserves
# (1,000 scenarios -> int(0.02 * 1000) = 20 tail scenarios)
cte_98 = reserves.head(int(0.02 * len(reserves)))["total_reserve"].mean()

ESG Integration

# Some ESG tools write all scenarios to a single file
# fund_returns.parquet: (scenario_id, month, fund_id) -> return

returns_table = gs.Table(
    name="fund_returns",
    source="esg_output/fund_returns_1000scenarios.parquet",
    dimensions={
        "scenario_id": "scenario_id",
        "month": "month",
        "fund_id": "fund_id"
    },
    value="monthly_return"
)

# Use in model
af.fund_return = returns_table.lookup(
    scenario_id=af.scenario_id,
    month=af.projection_month,
    fund_id=af.fund_id
)

Summary

Scenario support in Gaspatchio:

  • Transparent: Cross-join and lookup - you can see what's happening
  • Minimal code: Add scenarios with 1-3 line changes to existing models
  • Performant: Vectorized execution with streaming for bounded memory
  • Scalable: Works for 3 scenarios or 10,000 with the same code
  • Scenario-ready by default: Build models that work for any scenario count

The core pattern is simple (a condensed code sketch follows this list):

  1. Expand model points with with_scenarios()
  2. Load scenario-varying assumptions with Table.from_scenario_files()
  3. Use af.scenario_id in lookups
  4. Aggregate by scenario_id for results
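
Condensed into a single sketch (names reused from the examples above; not a complete model):

import polars as pl
import gaspatchio_core as gs

# 1. Expand model points across scenarios
af = gs.ActuarialFrame(pl.read_parquet("model_points.parquet"))
af = gs.with_scenarios(af, ["BASE", "UP", "DOWN"])

# 2. Load scenario-varying assumptions
disc_rate_table = gs.Table.from_scenario_template(
    path_template="scenarios/{scenario_id}/discount_rates.parquet",
    scenario_ids=["BASE", "UP", "DOWN"],
    scenario_column="scenario_id",
    dimensions={"year": "year"},
    value="disc_rate_ann"
)

# 3. Use af.scenario_id in lookups (assumes af.year has been derived)
af.disc_rate = disc_rate_table.lookup(scenario_id=af.scenario_id, year=af.year)

# 4. Aggregate results by scenario_id
summary = af.collect().group_by("scenario_id").agg(pl.col("disc_rate").mean())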