Gaspatchio: Core Concepts¶
What is Gaspatchio?¶
Gaspatchio is a Python-first actuarial modeling framework that lets you build production-grade models with the ergonomics of Python and the performance of Rust. It marries a Python-native DSL with a high‑performance execution engine powered by Polars and Rust, so you can write clear model code and still scale to millions of policies and long projections.
Why it exists¶
- Simple: Express models in idiomatic Python using a small set of core abstractions. No new language to learn.
- Fast: Under the hood, computations run on Polars’ Rust engine with parallelism and SIMD. Heavy joins and columnar math are vectorized.
- Best served cold: Gaspatchio embraces lazy evaluation and query planning so large model runs stay memory‑efficient and predictable.
The core abstraction: ActuarialFrame¶
ActuarialFrame is a thin, actuarial-aware wrapper around Polars that captures column logic, tracks dependencies, and executes lazily as a single plan. Compared to pandas:
- Lazy by default: Transformations are composed, optimized, then executed once via .collect().
- Expression tracking: Column lineage is tracked for auditability and reproducibility.
- Actuarial affordances: Built-in helpers for projection timelines, date/financial math, and table lookups.
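To make "lazy by default" and "expression tracking" concrete, here is a toy, plain-Python sketch of the idea. It is not how ActuarialFrame is implemented: the `ToyLazyFrame` class and its `with_column`, `lineage`, and `collect` methods are hypothetical names. Column definitions are recorded instead of computed, the recorded inputs give each column's lineage, and nothing runs until `collect()`.

```python
from typing import Callable, Dict, List, Set, Tuple


class ToyLazyFrame:
    """Hypothetical toy: records column definitions and evaluates them only on collect()."""

    def __init__(self, data: Dict[str, List[float]]):
        self.data = data
        # Each step is (output column, input columns, row-wise function).
        self.steps: List[Tuple[str, Tuple[str, ...], Callable[..., float]]] = []

    def with_column(self, name: str, inputs: List[str], fn: Callable[..., float]) -> "ToyLazyFrame":
        # Record the step; nothing is computed yet.
        # Assumes each column is defined at most once (no self-referencing updates).
        self.steps.append((name, tuple(inputs), fn))
        return self

    def lineage(self, name: str) -> Set[str]:
        """All columns (source and intermediate) that `name` depends on."""
        for out, inputs, _ in self.steps:
            if out == name:
                deps: Set[str] = set()
                for col in inputs:
                    deps |= {col} | self.lineage(col)
                return deps
        return set()  # source columns have no upstream dependencies

    def collect(self) -> Dict[str, List[float]]:
        # Execute every recorded step once, in order, over all rows.
        out = {k: list(v) for k, v in self.data.items()}
        n_rows = len(next(iter(out.values())))
        for name, inputs, fn in self.steps:
            out[name] = [fn(*(out[c][i] for c in inputs)) for i in range(n_rows)]
        return out


frame = ToyLazyFrame({"rate": [0.035, 0.04], "term_years": [20, 25], "premium": [1200.0, 2800.0]})
frame.with_column("discount_factor", ["rate", "term_years"], lambda r, n: (1 + r) ** -n)
frame.with_column("discounted_premium", ["premium", "discount_factor"], lambda p, v: p * v)

print(sorted(frame.lineage("discounted_premium")))
# ['discount_factor', 'premium', 'rate', 'term_years']
result = frame.collect()
```

Recording steps first is what lets a real engine inspect and optimize the whole plan before touching any data; the toy skips the optimization and only shows the deferral and the lineage bookkeeping.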
Assumptions and table lookups¶
Assumption handling is a first-class feature:
- Register once, reuse everywhere: Load tables into an in‑memory registry and look them up by name. Composite keys are supported (e.g., age_last and sex_smoking).
- Wide-to-long transforms: Register wide tables (e.g., mortality with MNS/FNS/MS/FS columns) using a built-in long transform for join‑friendly layouts.
- Vector-aware lookups: Lookups work with scalar columns and with vector/list columns for projections, returning aligned vectors.
- High-throughput joins: Implemented on top of Polars joins with Rust concurrency. The registry is designed for read-heavy workloads.
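The registry ideas above can be sketched in plain Python. This is a hypothetical illustration, not Gaspatchio's API: `wide_to_long`, `register`, `lookup`, and `lookup_vector` are made-up helpers, the wide table reuses the MNS/FNS rates from the composite-key example at the end of this page, and a dict keyed by `(age_last, sex_smoking)` stands in for the join.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory registry: table name -> {(age_last, sex_smoking): rate}
REGISTRY: Dict[str, Dict[Tuple[int, str], float]] = {}


def wide_to_long(wide: Dict[str, list], id_col: str) -> Dict[Tuple[int, str], float]:
    """Melt each non-id column into (id, column name) -> value entries."""
    long_table = {}
    for col, values in wide.items():
        if col == id_col:
            continue
        for id_val, value in zip(wide[id_col], values):
            long_table[(id_val, col)] = value
    return long_table


def register(name: str, table: Dict[Tuple[int, str], float]) -> None:
    REGISTRY[name] = table  # load once; every later lookup reads from memory


def lookup(name: str, age_last: int, sex_smoking: str) -> float:
    """Scalar lookup on the composite key (age_last, sex_smoking)."""
    return REGISTRY[name][(age_last, sex_smoking)]


def lookup_vector(name: str, ages: List[int], sex_smoking: str) -> List[float]:
    """Vector lookup: one rate per projection step, aligned with `ages`."""
    return [lookup(name, age, sex_smoking) for age in ages]


# Wide layout: one row per age, one column per sex/smoking code.
wide_mortality = {"age_last": [35, 42], "MNS": [0.0010, 0.0020], "FNS": [0.0008, 0.0015]}
register("mortality_demo", wide_to_long(wide_mortality, id_col="age_last"))

print(lookup("mortality_demo", 42, "FNS"))               # 0.0015
print(lookup_vector("mortality_demo", [35, 42], "MNS"))  # [0.001, 0.002]
```

The dict only mirrors the key structure; per this section, Gaspatchio performs the actual lookups with Polars joins rather than per-row dictionary access.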
Excel-compatible functions (without Excel)¶
Gaspatchio provides Excel‑like semantics directly on columns for clarity and interoperability. If you know Excel, you can read and write most calculations immediately, because the API mirrors spreadsheet formulas while operating on columns and vectors:
- Date functions: YEARFRAC, DAYS, EDATE, EOMONTH, with day‑count conventions (e.g., basis="act/act").
- Financial functions: PV, discount factors, and related helpers (Excel semantics, column‑wise and vectorized).
- Math and string: Common numeric and text helpers available on columns and list vectors.
These live alongside native Polars expressions, so you can mix and match.
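For reference, the Excel semantics of three of these date functions fit in a few lines of standard-library Python. This sketch follows Excel's documented behavior for DAYS, EDATE, and EOMONTH on scalar dates. The lowercase function names are illustrative, and the sketch says nothing about how Gaspatchio vectorizes them. YEARFRAC with basis="act/act" is left out because Excel's actual/actual rule has several special cases.

```python
import calendar
import datetime


def days(end_date: datetime.date, start_date: datetime.date) -> int:
    """Excel DAYS(end_date, start_date): number of days between the two dates."""
    return (end_date - start_date).days


def _shift_month(d: datetime.date, months: int) -> tuple:
    """Return (year, month) after moving `months` calendar months from d."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    return year, month0 + 1


def edate(start_date: datetime.date, months: int) -> datetime.date:
    """Excel EDATE: same day of month, clamped to the month's last day if it does not exist."""
    year, month = _shift_month(start_date, months)
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(start_date.day, last_day))


def eomonth(start_date: datetime.date, months: int) -> datetime.date:
    """Excel EOMONTH: last day of the month `months` months away."""
    year, month = _shift_month(start_date, months)
    return datetime.date(year, month, calendar.monthrange(year, month)[1])


print(days(datetime.date(2024, 12, 31), datetime.date(2022, 1, 15)))  # 1081
print(edate(datetime.date(2024, 1, 31), 1))    # 2024-02-29 (clamped to month end)
print(eomonth(datetime.date(2023, 1, 15), 1))  # 2023-02-28
```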
Using Excel functions on columns¶
import datetime
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame(
{
"issue_date": [datetime.date(2022, 1, 15), datetime.date(2021, 6, 1)],
"val_date": [datetime.date(2024, 12, 31), datetime.date(2024, 12, 31)],
"annual_premium": [1200.0, 2800.0],
"rate": [0.035, 0.04],
"term_years": [20, 25],
}
)
# Dates
af.days_in_force = af.val_date.excel.days(af.issue_date)
af.years_in_force = af.issue_date.excel.yearfrac(af.val_date, basis="act/act").round(4)
# Finance (Excel PV semantics)
af.pv_premiums = af.rate.excel.pv(nper=af.term_years, pmt=af.annual_premium).round(2)
# Strings/Math
af.label = ("POLICY_" + af.days_in_force.cast(str))
af.discount_factor = (1 + af.rate) ** (-af.term_years)
df = af.collect()
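Because the example above relies on Excel PV semantics, it helps to see Excel's documented PV formula written out. `excel_pv` below is a scalar reference sketch of that formula. Its name and the `when` parameter (standing in for Excel's `type` argument) are illustrative. It shows the sign convention, where positive payments give a negative present value, and how the result relates to the (1 + rate) ** -n discount factor used in the example.

```python
def excel_pv(rate: float, nper: int, pmt: float, fv: float = 0.0, when: int = 0) -> float:
    """Excel PV(rate, nper, pmt, [fv], [type]); when=0 pays at period end, when=1 at period start."""
    if rate == 0:
        return -(pmt * nper + fv)
    growth = (1 + rate) ** nper  # growth == 1 / discount_factor
    return -(pmt * (1 + rate * when) * (growth - 1) / rate + fv) / growth


# Level annual premium of 1200 for 20 years at 3.5% (first policy in the example).
pv = excel_pv(0.035, 20, 1200.0)

# Equivalent annuity-certain form: -pmt * (1 - v**n) / rate, with v**n = (1 + rate) ** -n
discount_factor = (1 + 0.035) ** -20
annuity_form = -1200.0 * (1 - discount_factor) / 0.035
```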
Vectorized Excel functions¶
Excel semantics apply to list (vector) columns as well. For time-series operations, prefer the .projection accessor (e.g., .projection.cumulative_survival(), .projection.previous_period()) over manual .list.eval() transforms.
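As a rough mental model of what such projection helpers compute, the sketch below implements two of them on plain Python lists. The exact Gaspatchio definitions are not given here, so the conventions are assumptions: `previous_period` shifts a vector one step later and fills the first slot, and `cumulative_survival` turns per-period decrement rates q_t into end-of-period survival probabilities, the running product of (1 - q_s). Combining the two gives the start-of-period in-force probability.

```python
from typing import List


def previous_period(values: List[float], fill: float = 0.0) -> List[float]:
    """Assumed semantics: out[t] = values[t - 1], with out[0] = fill."""
    return [fill] + values[:-1] if values else []


def cumulative_survival(q: List[float]) -> List[float]:
    """Assumed semantics: out[t] = product of (1 - q[s]) for s = 0..t (end-of-period survival)."""
    out, surviving = [], 1.0
    for q_t in q:
        surviving *= 1.0 - q_t
        out.append(surviving)
    return out


q = [0.01, 0.02, 0.03]                       # per-period decrement rates for one policy
end_surv = cumulative_survival(q)            # approx. [0.99, 0.9702, 0.941094]
start_surv = previous_period(end_surv, 1.0)  # approx. [1.0, 0.99, 0.9702]
```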
"""Actuarial example: Calculate present value and IRR for policy cash flows."""
from gaspatchio_core import ActuarialFrame
# Portfolio of 3 policies with monthly cash flows over 5 years (60 months)
# Premiums are received monthly, claims are sporadic
policies = ActuarialFrame(
{
"policy_id": ["POL001", "POL002", "POL003"],
"annual_rate": [0.05, 0.04, 0.06],
"monthly_premiums": [
[100.0] * 60, # POL001: constant $100/month
[150.0] * 60, # POL002: constant $150/month
[200.0] * 60, # POL003: constant $200/month
],
"monthly_claims": [
[0.0] * 30 + [2000.0] + [0.0] * 29, # POL001: one claim at month 30
[0.0] * 20
+ [1500.0]
+ [0.0] * 19
+ [3000.0]
+ [0.0] * 19, # POL002: two claims
[0.0] * 60, # POL003: no claims
],
}
)
# Calculate net monthly cash flows (premiums - claims)
policies.net_cashflows = policies.monthly_premiums - policies.monthly_claims
# Calculate monthly interest rate
policies.monthly_rate = policies.annual_rate / 12
# Calculate present value of the cash flow streams using Excel PV function
policies.pv_cashflows = policies.monthly_rate.excel.pv(
nper=60,
pmt=policies.net_cashflows,
)
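# Calculate IRR on the net cash flows (internal rate of return)
# Caveat: an IRR exists only where a row's NPV changes sign as the rate varies.
# With this sample data every row's NPV stays positive for all rates above -100%
# (POL003 has no claims at all; for POL001/POL002 premiums outweigh claims),
# so none of the three policies has a well-defined IRR.
policies.irr_monthly = policies.net_cashflows.excel.irr()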
policies.irr_annual = (1 + policies.irr_monthly) ** 12 - 1
# Round for reporting
policies.pv_cashflows = policies.pv_cashflows.round(2)
policies.irr_annual = policies.irr_annual.round(4)
print(policies.collect())
What’s happening (and why it’s fast):
- List columns as vectors: monthly_premiums and monthly_claims are list vectors of length 60. policies.net_cashflows = policies.monthly_premiums - policies.monthly_claims is computed element‑wise, giving another 60‑long vector per policy.
- Broadcasting scalars over vectors: monthly_rate is a scalar per policy. When calling monthly_rate.excel.pv(nper=60, pmt=net_cashflows), the rate is broadcast across each row's 60 cash‑flow elements; discount factors are generated and the series is reduced to a single PV per policy.
- Vector IRR: net_cashflows.excel.irr() runs a per‑row root‑finder on each cash‑flow vector to solve Σ cf_t / (1+r)^t = 0, returning one monthly IRR per policy; (1 + irr_monthly) ** 12 - 1 converts it to an annual rate.
- Single optimized plan: All assignments are lazy; on .collect() the work fuses into one Polars/Rust execution plan that runs multi‑threaded and avoids Python loops, so portfolios with long timelines stay fast and memory‑efficient.
Modeling style: Python-native DSL¶
Models are plain Python functions operating on an ActuarialFrame. The DSL favors a functional, pipe‑friendly style for clarity and testability:
- Write calculations as column assignments and expressions.
- Break the model into small, composable functions (easy unit tests, easy reuse).
- Use plugins/utilities for common actuarial patterns (projection month/year, flooring, fills, etc.).
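The "small, composable functions" style can be illustrated without any Gaspatchio code. In this hypothetical sketch a frame is just a dict of column lists, each step returns a new frame instead of mutating its input, and a `pipe` helper (an illustrative name) chains the steps. Each step can be unit-tested on its own.

```python
import math
from functools import reduce
from typing import Callable, Dict, List

Frame = Dict[str, List]  # hypothetical stand-in: column name -> list of values
Step = Callable[[Frame], Frame]


def add_sex_smoking(frame: Frame) -> Frame:
    """Composite lookup key, e.g. "M" + "NS" -> "MNS"."""
    keys = [g + s for g, s in zip(frame["gender"], frame["smoking_status"])]
    return {**frame, "sex_smoking": keys}


def add_age_last(frame: Frame) -> Frame:
    """Age last birthday as an integer."""
    return {**frame, "age_last": [math.floor(a) for a in frame["age"]]}


def pipe(frame: Frame, *steps: Step) -> Frame:
    """Apply steps left to right: pipe(f, a, b) == b(a(f))."""
    return reduce(lambda acc, step: step(acc), steps, frame)


model_points = {"gender": ["M", "F"], "smoking_status": ["NS", "NS"], "age": [35.6, 42.1]}
result = pipe(model_points, add_sex_smoking, add_age_last)
```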
Dual execution modes¶
- Debug mode: Eager-style execution with rich diagnostics, similar to PyTorch's eager mode. You get verbose logging, operation/column lineage, and helpful runtime context, but not step‑through Python debugging of the engine itself.
- Optimize mode: Maximum speed. Operations are batched into an optimized Polars plan; optional JIT strategies (e.g., Numba) can accelerate UDF‑like hotspots.
Both modes produce the same results; choose based on your phase (explore vs run).
Projections and vectors¶
Projections are first‑class: list columns represent time series (months/years). Arithmetic, broadcasting, and lookups all work element‑wise on vectors. This enables fast projection‑level metrics (expected claims, premiums, cashflows) without Python loops.
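The element-wise and broadcasting rules in this paragraph can be spelled out with a small helper. `elementwise` is a hypothetical plain-Python stand-in for list-column arithmetic: vectors of equal length combine position by position, and a scalar is repeated to match the vector's length. The expected-claims line combines a scalar sum assured with two projection vectors; the rates and in-force probabilities are illustrative values.

```python
import operator
from numbers import Number
from typing import Callable, List, Union

Value = Union[float, List[float]]


def elementwise(op: Callable[[float, float], float], a: Value, b: Value) -> Value:
    """Apply op position by position; a scalar on either side is broadcast to the vector's length."""
    if isinstance(a, Number) and isinstance(b, Number):
        return op(a, b)
    if isinstance(a, Number):
        a = [a] * len(b)
    if isinstance(b, Number):
        b = [b] * len(a)
    if len(a) != len(b):
        raise ValueError(f"vector lengths differ: {len(a)} vs {len(b)}")
    return [op(x, y) for x, y in zip(a, b)]


sum_assured = 100_000.0            # scalar per policy
q = [0.001, 0.002, 0.003]          # per-period mortality rates (illustrative)
in_force = [1.0, 0.999, 0.997002]  # start-of-period in-force probability (illustrative)

# expected_claims[t] = sum_assured * q[t] * in_force[t]
expected_claims = elementwise(operator.mul, elementwise(operator.mul, sum_assured, q), in_force)
```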
Performance characteristics¶
- Backed by Polars’ Rust executor (multi‑threaded, SIMD‑accelerated).
- Assumption lookups use efficient joins and an in‑memory registry designed for read‑heavy access patterns.
- Lazy planning minimizes materialization and reduces memory churn on large runs.
Developer experience¶
- Auditable: Expression lineage and execution plans make it clear how outputs were derived.
- Composable: Small functions, clear column contracts, and minimal hidden state.
- Predictable: Deterministic execution when you call .collect(), with consistent behavior across modes.
How you typically work¶
- Load model points into an ActuarialFrame.
- Register assumption tables (optionally transforming wide → long) with composite keys.
- Build columns for durations, ages, keys, and rates; apply date/financial math as needed.
- Use vector‑aware lookups for projection‑time calculations.
- Compute cashflows and aggregate portfolio metrics.
- Call .collect() once to execute.
See the main landing page for a complete walkthrough, and the API reference for details on ActuarialFrame, the Excel/date functions, and assumption tables. The example in tests/scratch/models/life_docs.py walks through the same concepts end‑to‑end without requiring spreadsheets.
Quick examples¶
Minimal model setup¶
import datetime
from gaspatchio_core import ActuarialFrame
model_data = {
"policy_id": ["P001", "P002"],
"age": [35, 42],
"issue_date": [datetime.date(2022, 1, 15), datetime.date(2021, 6, 1)],
"valuation_date": [datetime.date(2024, 12, 31), datetime.date(2024, 12, 31)],
}
af = ActuarialFrame(model_data)
# Date math using Excel-like functions
af.days_in_force = af.valuation_date.excel.days(af.issue_date)
af.years_in_force = af.issue_date.excel.yearfrac(af.valuation_date, basis="act/act").round(2)
df = af.collect()
Assumption lookup (composite key)¶
import polars as pl
from gaspatchio_core import ActuarialFrame
from gaspatchio_core.assumptions import Table
model_data = {
"policy_id": ["P001", "P002"],
"age": [35, 42],
"gender": ["M", "F"],
"smoking_status": ["NS", "NS"],
}
af = ActuarialFrame(model_data)
mortality_df = pl.DataFrame(
{
"age_last": [35, 35, 42, 42],
"sex_smoking": ["MNS", "FNS", "MNS", "FNS"],
"mortality_rate": [0.0010, 0.0008, 0.0020, 0.0015],
}
)
mortality = Table(
name="mortality_demo",
source=mortality_df,
dimensions={"age_last": "age_last", "sex_smoking": "sex_smoking"},
value="mortality_rate",
)
af.sex_smoking = af.gender + af.smoking_status
af.age_last = af.age.floor()
af.mortality_rate = mortality.lookup(age_last=af.age_last, sex_smoking=af.sex_smoking)
df = af.collect()