Gaspatchio: Core Concepts
What is Gaspatchio?
Gaspatchio is a Python-first actuarial modeling framework for building production-grade models with the ergonomics of Python and the performance of Rust. It pairs a Python-native DSL with an execution engine built on Polars and Rust, so you can write clear model code and still scale to millions of policies and long projections.
Why it exists
- Simple: Express models in idiomatic Python using a small set of core abstractions. No new language to learn.
- Fast: Under the hood, computations run on Polars’ Rust engine with parallelism and SIMD. Heavy joins and columnar math are vectorized.
- Best served cold: Gaspatchio embraces lazy evaluation and query planning so large model runs stay memory‑efficient and predictable.
The core abstraction: ActuarialFrame
ActuarialFrame is a thin, actuarial-aware wrapper around Polars that captures column logic, tracks dependencies, and executes lazily as a single plan. Compared to pandas:
- Lazy by default: Transformations are composed, optimized, then executed once via .collect().
- Expression tracking: Column lineage is tracked for auditability and reproducibility.
- Actuarial affordances: Built-in helpers for projection timelines, date/financial math, and table lookups.
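To make "lazy by default" and "expression tracking" concrete, here is a toy sketch in plain Python. ToyLazyFrame is a hypothetical stand-in written for this page, not Gaspatchio's or Polars' implementation: it only records column definitions with their declared inputs, and evaluates them once when collect() is called.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ToyLazyFrame:
    """Hypothetical stand-in: records column logic, runs it on collect()."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = {name: list(values) for name, values in data.items()}
        # Each planned step: (output column, input columns, row-wise function).
        self._plan: List[Tuple[str, Tuple[str, ...], Callable[..., float]]] = []

    def with_column(
        self, name: str, inputs: Sequence[str], fn: Callable[..., float]
    ) -> "ToyLazyFrame":
        # Nothing is computed here; the step is only appended to the plan.
        self._plan.append((name, tuple(inputs), fn))
        return self

    def lineage(self, name: str) -> List[str]:
        """Return the raw input columns that `name` ultimately depends on."""
        deps = {out: ins for out, ins, _ in self._plan}
        if name not in deps:
            return [name]  # a raw input column
        sources: List[str] = []
        for parent in deps[name]:
            for src in self.lineage(parent):
                if src not in sources:
                    sources.append(src)
        return sources

    def collect(self) -> Dict[str, List[float]]:
        # Execute the recorded steps once, in the order they were declared.
        result = {name: list(values) for name, values in self._data.items()}
        for name, inputs, fn in self._plan:
            columns = [result[col] for col in inputs]
            result[name] = [fn(*row) for row in zip(*columns)]
        return result


frame = (
    ToyLazyFrame({"premium": [1200.0, 2800.0], "loading": [0.10, 0.05]})
    .with_column("expense", ["premium", "loading"], lambda prem, load: prem * load)
    .with_column("net_premium", ["premium", "expense"], lambda prem, exp: prem - exp)
)

print(frame.lineage("net_premium"))  # ['premium', 'loading']
result = frame.collect()             # the only point where values are computed
print(result["net_premium"])
```

A real engine would also optimize the recorded plan before running it; this sketch skips that step and only shows deferred execution and dependency tracking.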
Assumptions and table lookups
Assumption handling is a first-class feature:
- Register once, reuse everywhere: Load tables into an in‑memory registry and look them up by name. Composite keys are supported (e.g., age_last and sex_smoking).
- Wide-to-long transforms: Register wide tables (e.g., mortality with MNS/FNS/MS/FS) using a built-in long transform for join‑friendly layouts.
- Vector-aware lookups: Lookups work with scalar columns and with vector/list columns for projections, returning aligned vectors.
- High-throughput joins: Implemented on top of Polars joins with Rust concurrency. The registry is designed for read-heavy workloads.
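The wide-to-long transform and the composite-key lookup can be pictured with plain Python data structures. The sketch below is only an illustration of the idea: the helper names (wide_to_long, lookup, lookup_vector) and the rates are made up for this example, and Gaspatchio itself performs these steps with Polars joins rather than dictionaries.

```python
from typing import Dict, List, Tuple

# Illustrative wide mortality table: one row per age, one column per
# sex/smoking class. The rates are demo values, not real assumptions.
wide_rows: List[Dict[str, float]] = [
    {"age_last": 35, "MNS": 0.0010, "FNS": 0.0008, "MS": 0.0021, "FS": 0.0017},
    {"age_last": 42, "MNS": 0.0020, "FNS": 0.0015, "MS": 0.0041, "FS": 0.0032},
]


def wide_to_long(rows, key_column, value_columns):
    """Unpivot wide rows into (key, class, rate) records."""
    return [
        (row[key_column], name, row[name])
        for row in rows
        for name in value_columns
    ]


long_rows = wide_to_long(wide_rows, "age_last", ["MNS", "FNS", "MS", "FS"])

# Index by the composite key (age_last, sex_smoking) for repeated reads.
rate_by_key: Dict[Tuple[int, str], float] = {
    (age, cls): rate for age, cls, rate in long_rows
}


def lookup(ages, classes):
    """Scalar lookup: one age and one class per policy."""
    return [rate_by_key[(a, c)] for a, c in zip(ages, classes)]


def lookup_vector(age_paths, classes):
    """Vector lookup: a list of ages per policy, aligned rates out."""
    return [
        [rate_by_key[(a, c)] for a in path]
        for path, c in zip(age_paths, classes)
    ]


rates = lookup([35, 42], ["MNS", "FNS"])
vector_rates = lookup_vector([[35, 42], [42, 42]], ["FS", "MNS"])
print(rates)
print(vector_rates)
```

The long layout gives every rate its own row keyed by (age_last, sex_smoking), which is what makes a join or keyed lookup straightforward.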
Excel-compatible functions (without Excel)
Gaspatchio provides Excel‑like semantics directly on columns for clarity and interoperability. If you know Excel, you can read and write most calculations immediately – the API mirrors spreadsheet formulas but works on columns and vectors:
- Date functions: YEARFRAC, DAYS, EDATE, EOMONTH, with day‑count conventions (e.g., basis="act/act").
- Financial functions: PV, discount factors, and related helpers (Excel semantics, column‑wise and vectorized).
- Math and string: Common numeric and text helpers available on columns and list vectors.
These live alongside native Polars expressions, so you can mix and match.
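For readers less familiar with the spreadsheet behavior being mirrored, the sketch below restates the Excel semantics of DAYS, EDATE, and EOMONTH for single dates using only the standard library. The function names are local to this example; it illustrates the rules (notably month-end clamping) and is not Gaspatchio's implementation.

```python
import calendar
import datetime


def _shift_month(start: datetime.date, months: int):
    """Return (year, month) that lies `months` months away from `start`."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    return year, month_zero_based + 1


def days(end: datetime.date, start: datetime.date) -> int:
    """DAYS(end, start): number of days from start to end."""
    return (end - start).days


def edate(start: datetime.date, months: int) -> datetime.date:
    """EDATE: shift by whole months, clamping the day to the month's end."""
    year, month = _shift_month(start, months)
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(start.day, last_day))


def eomonth(start: datetime.date, months: int) -> datetime.date:
    """EOMONTH: last day of the month that is `months` months away."""
    year, month = _shift_month(start, months)
    return datetime.date(year, month, calendar.monthrange(year, month)[1])


issue = datetime.date(2022, 1, 15)
valuation = datetime.date(2024, 12, 31)

print(days(valuation, issue))                 # 1081
print(edate(datetime.date(2024, 1, 31), 1))   # 2024-02-29
print(eomonth(issue, 0))                      # 2022-01-31
```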
Using Excel functions on columns
import datetime

from gaspatchio_core import ActuarialFrame

af = ActuarialFrame(
    {
        "issue_date": [datetime.date(2022, 1, 15), datetime.date(2021, 6, 1)],
        "val_date": [datetime.date(2024, 12, 31), datetime.date(2024, 12, 31)],
        "annual_premium": [1200.0, 2800.0],
        "rate": [0.035, 0.04],
        "term_years": [20, 25],
    }
)

# Dates
af.days_in_force = af.val_date.excel.days(af.issue_date)
af.years_in_force = af.issue_date.excel.yearfrac(af.val_date, basis="act/act").round(4)

# Finance (Excel PV semantics)
af.pv_premiums = af.rate.excel.pv(nper=af.term_years, pmt=af.annual_premium).round(2)

# Strings/Math
af.label = "POLICY_" + af.days_in_force.cast(str)
af.discount_factor = (1 + af.rate) ** (-af.term_years)

df = af.collect()
Vectorized Excel functions
Excel semantics apply to list (vector) columns as well – use .list.eval(...) for per‑element transforms when needed.
import polars as pl

from gaspatchio_core import ActuarialFrame

af = ActuarialFrame({"rate": [0.03], "prem": [1200.0]})

# 5-year monthly timeline (60 elements): [0, 1, 2, ..., 59]
af.months = af.fill_series(60, start=0, increment=1)

# Convert months to years and compute PV of monthly premiums
af.years = af.months / 12
af.monthly_pv = (
    (af.prem / 12)
    * (1 / ((1 + (af.rate / 12)) ** af.months))
)

# Per-element transform with list.eval: round each projected value to 4 decimals
af.rounded_pv = af.monthly_pv.list.eval(pl.element().round(4))

df = af.collect()
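For a single policy, the list-column arithmetic above amounts to the following plain-Python calculation. This is a sketch for checking intuition and test values by hand; the variable names are local to the example and no Gaspatchio API is involved.

```python
rate = 0.03
prem = 1200.0

# 5-year monthly timeline: [0, 1, ..., 59]
months = list(range(60))

# Same formula as the column expression, applied element by element:
# monthly premium discounted at the monthly rate for m months.
monthly_pv = [(prem / 12) * (1 / ((1 + rate / 12) ** m)) for m in months]

# Per-element rounding, the plain-Python analogue of list.eval(...round(4))
rounded_pv = [round(value, 4) for value in monthly_pv]

total_pv = sum(monthly_pv)

print(len(monthly_pv))   # 60
print(rounded_pv[0])     # 100.0
print(total_pv)
```

The first element is undiscounted because the timeline starts at month 0, so the vector is the present value of premiums paid at the start of each month.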
Migration from Excel – quick mapping
- YEARFRAC(start, end, basis) → start_col.excel.yearfrac(end_col, basis="act/act")
- DAYS(end, start) → end_col.excel.days(start_col)
- PV(rate, nper, pmt, fv=0, type=0) → rate_col.excel.pv(nper=nper_col, pmt=pmt_col, fv=fv, when=type)
- EDATE(start, months) → start_col.excel.edate(months_col)
- EOMONTH(start, months) → start_col.excel.eomonth(months_col)
Notes:
- All functions operate on columns; broadcast rules follow Polars (scalars broadcast across rows).
- Vector/list columns are supported; use .list.eval(...) for element‑wise transforms when composing with non‑Excel helpers.
- Excel’s type argument maps to when (0 = end, 1 = beginning) for functions that depend on timing.
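As a reference point for the timing argument, here is Excel's documented PV formula written as a scalar Python function. The name excel_pv is local to this sketch. Whether Gaspatchio's excel.pv reproduces every detail, including Excel's sign convention (positive payments give a negative present value), should be confirmed against its API reference.

```python
def excel_pv(rate: float, nper: float, pmt: float, fv: float = 0.0, when: int = 0) -> float:
    """Scalar restatement of Excel's PV(rate, nper, pmt, fv, type).

    `when` plays the role of Excel's `type`: 0 = payments at period end,
    1 = payments at period beginning.
    """
    if rate == 0:
        return -(pmt * nper + fv)
    discount = (1 + rate) ** (-nper)   # discount factor over the full term
    annuity = (1 - discount) / rate    # annuity-immediate factor
    return -(pmt * (1 + rate * when) * annuity + fv * discount)


# Values taken from the first policy in the column example above.
pv_end = excel_pv(0.035, 20, 1200.0)            # payments at period end
pv_begin = excel_pv(0.035, 20, 1200.0, when=1)  # payments at period beginning

print(pv_end)
print(pv_begin)
```

With when=1 each payment is discounted one period less, so the result is larger in magnitude than with when=0 by a factor of (1 + rate).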
Modeling style: Python-native DSL
Models are plain Python functions operating on an ActuarialFrame. The DSL favors a functional, pipe‑friendly style for clarity and testability:
- Write calculations as column assignments and expressions.
- Break the model into small, composable functions (easy unit tests, easy reuse).
- Use plugins/utilities for common actuarial patterns (projection month/year, flooring, fills, etc.).
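The functional, pipe‑friendly style can be sketched without the library. In the example below a dict of columns is a hypothetical stand-in for an ActuarialFrame, and pipe is a helper defined here, not a Gaspatchio function. The point is the shape: each step takes a frame and returns a frame, so steps compose and can be unit-tested in isolation.

```python
from functools import reduce
from typing import Callable, Dict, List

Frame = Dict[str, List[float]]  # hypothetical stand-in for an ActuarialFrame
Step = Callable[[Frame], Frame]


def add_monthly_premium(frame: Frame) -> Frame:
    """Derive a monthly premium column from the annual premium."""
    out = dict(frame)  # copy so the input frame is left untouched
    out["monthly_premium"] = [p / 12 for p in frame["annual_premium"]]
    return out


def add_discount_factor(frame: Frame) -> Frame:
    """Discount factor over the full term: (1 + rate) ** -term_years."""
    out = dict(frame)
    out["discount_factor"] = [
        (1 + r) ** (-n) for r, n in zip(frame["rate"], frame["term_years"])
    ]
    return out


def pipe(frame: Frame, *steps: Step) -> Frame:
    """Apply each step in order, feeding the result of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, frame)


model_points: Frame = {
    "annual_premium": [1200.0, 2800.0],
    "rate": [0.035, 0.04],
    "term_years": [20, 25],
}

result = pipe(model_points, add_monthly_premium, add_discount_factor)
print(sorted(result))
```

Because each step is a plain function, a unit test can call add_monthly_premium on a one-row frame and assert on a single column.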
Dual execution modes
- Debug mode: Eager-style execution with rich diagnostics (think PyTorch). You get verbose logging, operation/column lineage, and helpful runtime context, but not step‑through Python debugging of the engine.
- Optimize mode: Maximum speed. Operations are batched into an optimized Polars plan; optional JIT strategies (e.g., Numba) can accelerate UDF‑like hotspots.
Both modes produce the same results; choose based on your phase (explore vs run).
Projections and vectors
Projections are first‑class: list columns represent time series (months/years). Arithmetic, broadcasting, and lookups all work element‑wise on vectors. This enables fast projection‑level metrics (expected claims, premiums, cashflows) without Python loops.
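As a sketch of what an element‑wise projection metric looks like for one policy, the plain-Python example below computes expected monthly claims from a vector of mortality rates. The sum assured and the flat rate are illustrative values, and the list comprehension stands in for the column-level vector arithmetic a list column would provide.

```python
import math
import operator
from itertools import accumulate

sum_assured = 100_000.0
monthly_q = [0.0010 / 12] * 12  # illustrative flat monthly mortality rate

# Probability of being in force at the start of each month:
# 1, (1 - q0), (1 - q0)(1 - q1), ...
survival_factors = [1.0] + [1 - q for q in monthly_q[:-1]]
in_force = list(accumulate(survival_factors, operator.mul))

# Element-wise: expected claims in month t = sum assured * in-force * q_t
expected_claims = [sum_assured * p * q for p, q in zip(in_force, monthly_q)]

# Consistency check: total expected claims equal the sum assured times the
# probability of dying at some point during the projection.
survival_to_end = math.prod(1 - q for q in monthly_q)
total_expected = sum(expected_claims)

print(len(expected_claims))  # 12
print(total_expected)
print(sum_assured * (1 - survival_to_end))
```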
Performance characteristics
- Backed by Polars’ Rust executor (multi‑threaded, SIMD‑accelerated).
- Assumption lookups use efficient joins and an in‑memory registry designed for read‑heavy access patterns.
- Lazy planning minimizes materialization and reduces memory churn on large runs.
Developer experience
- Auditable: Expression lineage and execution plans make it clear how outputs were derived.
- Composable: Small functions, clear column contracts, and minimal hidden state.
- Predictable: Deterministic execution when you .collect(), with consistent behavior across modes.
How you typically work
- Load model points into an ActuarialFrame.
- Register assumption tables (optionally transforming wide → long) with composite keys.
- Build columns for durations, ages, keys, and rates; apply date/financial math as needed.
- Use vector‑aware lookups for projection‑time calculations.
- Compute cashflows and aggregate portfolio metrics.
- Call .collect() once to execute.
See the main landing page for a complete walkthrough and the APIs for details on ActuarialFrame, Excel/date functions, and assumption tables. The example in tests/scratch/models/life_docs.py mirrors the same concepts end‑to‑end without requiring spreadsheets.
Quick examples
Minimal model setup
import datetime

from gaspatchio_core import ActuarialFrame

model_data = {
    "policy_id": ["P001", "P002"],
    "age": [35, 42],
    "issue_date": [datetime.date(2022, 1, 15), datetime.date(2021, 6, 1)],
    "valuation_date": [datetime.date(2024, 12, 31), datetime.date(2024, 12, 31)],
}
af = ActuarialFrame(model_data)

# Date math using Excel-like functions
af.days_in_force = af.valuation_date.excel.days(af.issue_date)
af.years_in_force = af.issue_date.excel.yearfrac(af.valuation_date, basis="act/act").round(2)

df = af.collect()
Assumption lookup (composite key)
import polars as pl

from gaspatchio_core import ActuarialFrame
from gaspatchio_core.assumptions import Table

model_data = {
    "policy_id": ["P001", "P002"],
    "age": [35, 42],
    "gender": ["M", "F"],
    "smoking_status": ["NS", "NS"],
}
af = ActuarialFrame(model_data)

# Long-format mortality table keyed by (age_last, sex_smoking)
mortality_df = pl.DataFrame(
    {
        "age_last": [35, 35, 42, 42],
        "sex_smoking": ["MNS", "FNS", "MNS", "FNS"],
        "mortality_rate": [0.0010, 0.0008, 0.0020, 0.0015],
    }
)
mortality = Table(
    name="mortality_demo",
    source=mortality_df,
    dimensions={"age_last": "age_last", "sex_smoking": "sex_smoking"},
    value="mortality_rate",
)

# Build the composite key columns, then look up the rate per policy
af.sex_smoking = af.gender + af.smoking_status
af.age_last = af.age.floor()
af.mortality_rate = mortality.lookup(age_last=af.age_last, sex_smoking=af.sex_smoking)

df = af.collect()