Actuarial Frame
gaspatchio_core.frame.base.ActuarialFrame
¶
A lazy, chainable, and traceable DataFrame for actuarial modeling.
The ActuarialFrame provides a high-level API for common actuarial calculations and data manipulations, leveraging Polars LazyFrames for performance. It supports tracing of operations for optimization and introspection, and provides convenient accessors for specialized functionality (e.g., date, finance, excel operations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict \| DataFrame \| LazyFrame \| None | Initial data to populate the frame. Can be a Python dictionary, a Polars DataFrame, or a Polars LazyFrame. If None, an empty frame is initialized. | None |
| mode | str \| None | The operational mode: "run" executes operations eagerly; "optimize" defers execution and builds a computation graph; "debug" provides more verbose output. Defaults to the global default mode (…). | None |
| verbose | bool \| None | Enables or disables verbose logging. Defaults to the global default verbosity (…). | None |
| threads | int \| None | Number of threads for parallel operations. Defaults to a system-dependent value or … | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| date | DateFrameAccessor | Accessor for date-related operations. |
| excel | ExcelFrameAccessor | Accessor for Excel-like operations. |
| finance | FinanceFrameAccessor | Accessor for financial calculations. |
| columns | list[str] | A list of column names in their current order. |
Examples:
Initialization and Basic Operations
>>> from gaspatchio_core import ActuarialFrame
>>> data = {
... "policy_id": [1, 1, 2, 2, 3],
... "inception_date": [
... "2020-01-01",
... "2020-01-01",
... "2021-05-10",
... "2021-05-10",
... "2022-02-20",
... ],
... "premium": [100, 150, 200, 50, 300],
... "claims": [0, 50, 10, 0, 120],
... }
>>> af = ActuarialFrame(data)
>>> af["loss_ratio"] = af["claims"] / af["premium"]
>>> result = af.collect()
>>> print(result.head(3))
shape: (3, 5)
┌───────────┬────────────────┬─────────┬────────┬────────────┐
│ policy_id ┆ inception_date ┆ premium ┆ claims ┆ loss_ratio │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ i64 ┆ f64 │
╞═══════════╪════════════════╪═════════╪════════╪════════════╡
│ 1 ┆ 2020-01-01 ┆ 100 ┆ 0 ┆ 0.0 │
│ 1 ┆ 2020-01-01 ┆ 150 ┆ 50 ┆ 0.333333 │
│ 2 ┆ 2021-05-10 ┆ 200 ┆ 10 ┆ 0.05 │
└───────────┴────────────────┴─────────┴────────┴────────────┘
Using sum over a group
>>> af = ActuarialFrame(data)
>>> af["total_premium_per_policy"] = af["premium"].sum().over("policy_id")
>>> result_with_sum = af.collect()
>>> print(result_with_sum)
shape: (5, 5)
┌───────────┬────────────────┬─────────┬────────┬──────────────────────────┐
│ policy_id ┆ inception_date ┆ premium ┆ claims ┆ total_premium_per_policy │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ i64 ┆ i64 │
╞═══════════╪════════════════╪═════════╪════════╪══════════════════════════╡
│ 1 ┆ 2020-01-01 ┆ 100 ┆ 0 ┆ 250 │
│ 1 ┆ 2020-01-01 ┆ 150 ┆ 50 ┆ 250 │
│ 2 ┆ 2021-05-10 ┆ 200 ┆ 10 ┆ 250 │
│ 2 ┆ 2021-05-10 ┆ 50 ┆ 0 ┆ 250 │
│ 3 ┆ 2022-02-20 ┆ 300 ┆ 120 ┆ 300 │
└───────────┴────────────────┴─────────┴────────┴──────────────────────────┘
Using an accessor (e.g., date accessor)
The date accessor works on date-typed columns, but 'inception_date' is a string column in this data, so it must be parsed first (for example with af["inception_date"].str.to_date("%Y-%m-%d") or similar).
The calls below are commented out because they assume that parsing has already been done.
>>> # If 'inception_date' was a date type:
>>> # af["inception_year"] = af.date.year("inception_date")
>>> # af_with_year = af.collect()
>>> # print(af_with_year.select(["policy_id", "inception_year"]))
columns
property
¶
Return the names of the columns in the current order.
date
property
¶
Access date-related frame operations.
excel
property
¶
Access excel-related frame operations.
finance
property
¶
Access finance-related frame operations.
collect(*, engine='streaming')
¶
Execute and materialize the dataframe.
collect is the public escape hatch from the lazy ActuarialFrame
graph to an eager polars.DataFrame. Inside a model function
the lazy form is usually what you want — Polars fuses the expressions
and avoids intermediate materialisation. Reach for collect() when
the calculation genuinely needs eager column arrays, most commonly
when handing per-policy probabilities to a numpy RNG inside a
stochastic scenario kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| engine | str | Execution engine to use: "streaming" (default) processes data in batches for ~2x faster execution and falls back to in-memory for unsupported operations; "in-memory" is classic Polars in-memory execution; "auto" lets Polars choose the engine. | 'streaming' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Materialized DataFrame with all computations applied. |
Examples:
Inside a for_each_scenario stochastic model function:
>>> import numpy as np
>>> import polars as pl
>>> def my_model(af, *, tables, drivers):
... df = af.collect() # materialise for numpy RNG access
... rng = np.random.default_rng(drivers["rng_seed"])
... deaths = rng.binomial(1, df["q_mort"].to_numpy())
... return af.with_columns(pl.Series("died", deaths))
count()
¶
Count non-null values in each column.
Returns a single-row frame containing the count of non-null values for each column. Essential for data quality assessment, completeness checks, and exposure calculations in actuarial analysis.
When to use
- Data Quality: Assess completeness of critical fields like policy ID, sum assured, or premium to identify missing data issues.
- Exposure Calculation: Count policies, lives, or claims for exposure-based calculations in pricing and reserving.
- Cohort Analysis: Determine size of different risk groups, age bands, or product segments for credibility assessment.
- Validation: Verify record counts match expected values after data processing, joins, or filtering operations.
Returns¶
CountResult A frame with one row containing non-null counts for each column.
Examples¶
Scalar Example: Data Completeness Check
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003", "P004", None],
"age": [25, 45, None, 35, 52],
"sum_assured": [100000, 500000, 250000, None, 300000],
"status": ["Active", "Active", "Lapsed", "Active", "Active"],
}
af = ActuarialFrame(data)
counts = af.count()
print(counts)
print("Complete policies:", counts["policy_id"])
print("Complete ages:", counts["age"])
print("Data completeness %:", counts["age"] / 5 * 100)
shape: (1, 4)
┌───────────┬─────┬─────────────┬────────┐
│ policy_id ┆ age ┆ sum_assured ┆ status │
│ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 ┆ u32 │
╞═══════════╪═════╪═════════════╪════════╡
│ 4 ┆ 4 ┆ 4 ┆ 5 │
└───────────┴─────┴─────────────┴────────┘
Complete policies: 4
Complete ages: 4
Data completeness %: 80.0
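The counting rule itself can be reproduced without the library. The following is a minimal plain-Python sketch on the same data, assuming that "non-null" corresponds to "not None" in the input; it illustrates the semantics only and is not how gaspatchio computes the result.

```python
data = {
    "policy_id": ["P001", "P002", "P003", "P004", None],
    "age": [25, 45, None, 35, 52],
    "sum_assured": [100000, 500000, 250000, None, 300000],
    "status": ["Active", "Active", "Lapsed", "Active", "Active"],
}

# Count the values that are not None in each column.
non_null = {name: sum(v is not None for v in values) for name, values in data.items()}

# Express each count as a share of the total row count.
n_rows = len(data["policy_id"])
completeness_pct = {name: count / n_rows * 100 for name, count in non_null.items()}

print(non_null)          # {'policy_id': 4, 'age': 4, 'sum_assured': 4, 'status': 5}
print(completeness_pct)  # {'policy_id': 80.0, 'age': 80.0, 'sum_assured': 80.0, 'status': 100.0}
```

Dividing by the row count rather than a hard-coded 5 keeps the completeness figure correct when the input size changes.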
Vector Example: Monthly Activity Counts
from gaspatchio_core import ActuarialFrame
data = {
"month": ["Jan", "Feb"],
"daily_claims": [
[5, 3, 0, 4, None, 2, 1, 0, 3, None, 4, 2, 0, 1, 5],
[2, None, 3, 1, 0, 4, None, 2, 0, 3, 1, None, 4, 2, 0]
],
"daily_lapses": [
[1, 0, 0, 2, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 1],
[0, 1, 0, 0, 2, 0, 1, 0, 1, 0, 0, 1, 0, 2, 0]
]
}
af = ActuarialFrame(data)
# Count non-null rows per column (each list counts as one value)
counts = af.count()
print(counts)
shape: (1, 3)
┌───────┬──────────────┬──────────────┐
│ month ┆ daily_claims ┆ daily_lapses │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═══════╪══════════════╪══════════════╡
│ 2 ┆ 2 ┆ 2 │
└───────┴──────────────┴──────────────┘
drop(*columns)
¶
Remove columns from the frame.
Drops intermediate or debug columns that are no longer needed. Commonly used at the end of a model to clean up temporary calculations before writing results.
When to use
- Cleanup: Remove intermediate calculation columns (flags, temporary rates) before writing final results to parquet.
- Memory: Drop large columns that are no longer needed to reduce peak memory during collection.
Parameters¶
*columns : str Column names to drop.
Returns¶
ActuarialFrame Frame without the specified columns.
Examples¶
Remove temporary columns before output
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame(
{
"policy_id": ["P001"],
"premium": [1200],
"sum_assured": [100000],
"temp_debug": [999],
}
)
af = af.drop("temp_debug")
print(af.collect())
shape: (1, 3)
┌───────────┬─────────┬─────────────┐
│ policy_id ┆ premium ┆ sum_assured │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═══════════╪═════════╪═════════════╡
│ P001 ┆ 1200 ┆ 100000 │
└───────────┴─────────┴─────────────┘
fill_series(column, start=0, increment=1)
¶
Apply fill_series using the core function.
filter(predicate)
¶
Filter rows by a boolean expression.
Removes policies that don't match the condition. Commonly used to exclude lapsed, matured, or otherwise inactive policies before running a projection.
When to use
- In-Force Selection: Filter to active policies (status == "IF") before projection to exclude lapsed, surrendered, or matured business.
- Cohort Analysis: Isolate a subset of policies by product, age band, or underwriting class for targeted analysis.
Parameters¶
predicate : pl.Expr Boolean expression to filter by.
Returns¶
ActuarialFrame Frame with only matching rows.
Examples¶
Filter to in-force policies only
import polars as pl
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame(
{
"policy_id": ["P001", "P002", "P003"],
"status": ["IF", "LAPSED", "IF"],
"premium": [1200, 800, 1500],
}
)
af = af.filter(pl.col("status") == "IF")
print(af.collect())
shape: (2, 3)
┌───────────┬────────┬─────────┐
│ policy_id ┆ status ┆ premium │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═══════════╪════════╪═════════╡
│ P001 ┆ IF ┆ 1200 │
│ P003 ┆ IF ┆ 1500 │
└───────────┴────────┴─────────┘
get_column_order()
¶
Return the tracked order of columns.
join(other, on=None, left_on=None, right_on=None, how='left')
¶
Join with another DataFrame without leaving the ActuarialFrame API.
Enriches model points with assumption parameters, product lookups,
expense tables, or any static data that should be attached per policy.
This replaces the pattern of calling .collect() + raw Polars join
+ re-wrapping with ActuarialFrame().
When to use
- Expense Parameters: Attach product-level expense loadings, commission rates, or overhead allocations to each policy before projection.
- Product Configuration: Join product parameter tables (riders, benefit features, premium patterns) onto model points by product code.
- Cohort Enrichment: Add portfolio-level or cohort-level attributes (risk class, distribution channel) from external reference data.
Parameters¶
other : pl.DataFrame | pl.LazyFrame The right-side table to join against.
on : str | list[str] | None Column name(s) to join on (when both sides use the same name).
left_on : str | list[str] | None Column name(s) on the left (this frame).
right_on : str | list[str] | None Column name(s) on the right (other frame).
how : str, default "left" Join type: "left", "inner", "outer", "cross".
Returns¶
ActuarialFrame Frame with columns from both sides.
Examples¶
Attach expense parameters by product code
import polars as pl
from gaspatchio_core import ActuarialFrame
expense_params = pl.DataFrame(
{
"product_code": ["TERM", "WL", "UL"],
"expense_pct": [0.05, 0.08, 0.10],
}
)
af = ActuarialFrame(
{
"policy_id": ["P001", "P002", "P003"],
"product_code": ["TERM", "WL", "TERM"],
"sum_assured": [100000, 200000, 150000],
}
)
af = af.join(expense_params, on="product_code")
print(af.collect())
shape: (3, 4)
┌───────────┬──────────────┬─────────────┬─────────────┐
│ policy_id ┆ product_code ┆ sum_assured ┆ expense_pct │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ f64 │
╞═══════════╪══════════════╪═════════════╪═════════════╡
│ P001 ┆ TERM ┆ 100000 ┆ 0.05 │
│ P002 ┆ WL ┆ 200000 ┆ 0.08 │
│ P003 ┆ TERM ┆ 150000 ┆ 0.05 │
└───────────┴──────────────┴─────────────┴─────────────┘
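Every policy in the example above finds a match, so it does not show what the default how="left" does with unmatched keys. The sketch below uses plain Python dictionaries to illustrate standard left-join and inner-join semantics; the policy P004 with product code "ANN" is a hypothetical addition, and the code is an illustration of the join types rather than gaspatchio's implementation.

```python
expense_pct_by_product = {"TERM": 0.05, "WL": 0.08, "UL": 0.10}

policies = [
    {"policy_id": "P001", "product_code": "TERM", "sum_assured": 100000},
    {"policy_id": "P002", "product_code": "WL", "sum_assured": 200000},
    # Hypothetical policy whose product code is missing from the lookup table.
    {"policy_id": "P004", "product_code": "ANN", "sum_assured": 50000},
]

# Left join: keep every policy; attach the lookup value, or None when the
# product code has no match on the right-hand side.
left_joined = [
    {**p, "expense_pct": expense_pct_by_product.get(p["product_code"])}
    for p in policies
]

# Inner join: keep only the policies whose product code has a match.
inner_joined = [row for row in left_joined if row["expense_pct"] is not None]

print(len(left_joined), len(inner_joined))  # 3 2
print(left_joined[-1]["expense_pct"])       # None
```

A left join therefore never drops model points, which is usually what a projection needs, but missing parameters surface as nulls and are worth checking before they flow into calculations.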
max()
¶
Calculate the maximum value of each column.
Returns a single-row frame containing the maximum value for each column. Essential for identifying outliers, validating data ranges, and determining upper bounds in actuarial calculations.
When to use
- Data Validation: Identify outliers in premium amounts, sum assured, or claim values that may require investigation.
- Experience Analysis: Find maximum claim amounts, policy sizes, or ages in a portfolio for risk assessment.
- Regulatory Reporting: Determine maximum exposure amounts for solvency calculations and stress testing.
- Pricing Boundaries: Identify upper limits for age bands, benefit amounts, or policy terms in product design.
Returns¶
MaxResult A frame with one row containing maximum values for each column.
Examples¶
Scalar Example: Portfolio Maximum Values
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003", "P004"],
"age": [25, 45, 67, 35],
"sum_assured": [100000, 500000, 250000, 1000000],
"annual_premium": [1200, 6000, 8500, 15000],
}
af = ActuarialFrame(data)
max_values = af.max()
print(max_values)
print("Max age:", max_values["age"][0])
print("Max sum assured:", max_values["sum_assured"][0])
shape: (1, 4)
┌───────────┬─────┬─────────────┬────────────────┐
│ policy_id ┆ age ┆ sum_assured ┆ annual_premium │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═══════════╪═════╪═════════════╪════════════════╡
│ P004 ┆ 67 ┆ 1000000 ┆ 15000 │
└───────────┴─────┴─────────────┴────────────────┘
Max age: 67
Max sum assured: 1000000
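The result row above mixes values from different policies: the maximum age (67) belongs to P003, while the maximum policy_id is P004. A plain-Python sketch of a per-column maximum makes this explicit. It assumes string columns are compared lexicographically, as Python's max does, and is meant as an illustration of the semantics rather than the library's implementation.

```python
data = {
    "policy_id": ["P001", "P002", "P003", "P004"],
    "age": [25, 45, 67, 35],
    "sum_assured": [100000, 500000, 250000, 1000000],
    "annual_premium": [1200, 6000, 8500, 15000],
}

# Each column is reduced independently, so the values in the result row
# need not come from the same policy.
column_max = {name: max(values) for name, values in data.items()}
print(column_max)
# {'policy_id': 'P004', 'age': 67, 'sum_assured': 1000000, 'annual_premium': 15000}

# To find the policy that holds a given maximum, look the value up again.
oldest = data["policy_id"][data["age"].index(column_max["age"])]
print(oldest)  # P003
```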
Vector Example: Maximum Monthly Claims
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002"],
"policy_year": [1, 2],
"monthly_claims": [
[0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500],
[0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0]
],
"monthly_premiums": [
[1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000],
[1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500]
]
}
af = ActuarialFrame(data)
# Get maximum values to understand worst-case scenarios
max_values = af.max()
print(max_values)
print("Max policy year:", max_values["policy_year"][0])
shape: (1, 4)
┌───────────┬─────────────┬─────────────────────────────────────┬─────────────────────────────────────┐
│ policy_id ┆ policy_year ┆ monthly_claims ┆ monthly_premiums │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ list[i64] ┆ list[i64] │
╞═══════════╪═════════════╪═════════════════════════════════════╪═════════════════════════════════════╡
│ P002 ┆ 2 ┆ [0, 500, 3000, 1200, … 4000, 0, 2500] ┆ [1500, 1500, 1500, 1500, … 1500] │
└───────────┴─────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
Max policy year: 2
mean()
¶
Calculate mean values across all numeric columns.
Returns a single-row frame containing the mean value for each numeric column. Essential for portfolio analysis, experience studies, and establishing benchmarks in actuarial calculations.
When to use
- Experience Analysis: Calculate average claim amounts, policy sizes, or premium levels for portfolio segmentation and pricing.
- Trend Analysis: Determine average lapse rates, mortality rates, or expense ratios over observation periods.
- Benchmarking: Establish portfolio averages for age, sum assured, or duration to compare against industry standards.
- Reserve Calculations: Compute average policy values, benefit amounts, or reserve factors for grouped calculations.
Returns¶
MeanResult A frame with one row containing mean values for numeric columns.
Examples¶
Scalar Example: Portfolio Averages
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003", "P004"],
"age": [25, 45, 67, 35],
"sum_assured": [100000, 500000, 250000, 1000000],
"annual_premium": [1200, 6000, 8500, 15000],
}
af = ActuarialFrame(data)
mean_values = af.mean()
print(mean_values)
print("Average age:", mean_values["age"])
print("Average sum assured:", mean_values["sum_assured"])
shape: (1, 3)
┌──────┬──────────────┬─────────────────┐
│ age ┆ sum_assured ┆ annual_premium │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════════════╪═════════════════╡
│ 43.0 ┆ 462500.0 ┆ 7675.0 │
└──────┴──────────────┴─────────────────┘
Average age: 43.0
Average sum assured: 462500.0
Vector Example: Average Monthly Experience
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002"],
"policy_year": [1, 2],
"monthly_claims": [
[0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500],
[0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0]
],
"monthly_lapses": [
[2, 1, 3, 0, 1, 2, 1, 0, 1, 0, 2, 1],
[1, 0, 2, 1, 0, 1, 0, 1, 0, 2, 1, 0]
]
}
af = ActuarialFrame(data)
# Get average monthly experience
mean_values = af.mean()
print(mean_values)
shape: (1, 3)
┌─────────────┬───────────────────────────────┬──────────────────────────────┐
│ policy_year ┆ monthly_claims ┆ monthly_lapses │
│ --- ┆ --- ┆ --- │
│ f64 ┆ list[f64] ┆ list[f64] │
╞═════════════╪═══════════════════════════════╪══════════════════════════════╡
│ 1.5 ┆ [0.0, 250.0, 1500.0, … 1250.0] ┆ [1.5, 0.5, 2.5, … 0.5] │
└─────────────┴───────────────────────────────┴──────────────────────────────┘
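In the output above, list columns appear to be averaged position by position across rows. That reading is an inference from the displayed monthly_lapses values, not a documented guarantee. Under that assumption, the calculation can be sketched in plain Python as follows.

```python
from statistics import fmean

monthly_lapses = [
    [2, 1, 3, 0, 1, 2, 1, 0, 1, 0, 2, 1],  # P001
    [1, 0, 2, 1, 0, 1, 0, 1, 0, 2, 1, 0],  # P002
]

# zip(*rows) groups the values by month, so each mean is taken across
# policies for one position in the list.
mean_by_month = [fmean(month) for month in zip(*monthly_lapses)]
print(mean_by_month)
# [1.5, 0.5, 2.5, 0.5, 0.5, 1.5, 0.5, 0.5, 0.5, 1.0, 1.5, 0.5]
```

The result has one entry per month, not one per policy, which is why the output frame still holds a single row.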
median()
¶
Calculate median values across all numeric columns.
Returns a single-row frame containing the median value for each numeric column. Useful for robust central tendency measures that are less affected by outliers in actuarial data.
When to use
- Robust Analysis: Use median instead of mean when data contains outliers, such as large claims or extreme ages in the portfolio.
- Income Analysis: Analyze median policyholder income or premium levels for market segmentation and product design.
- Experience Studies: Calculate median time to claim, policy duration, or age at lapse for more representative measures.
- Pricing Benchmarks: Determine median rates or factors when comparing across competitors or market segments.
Returns¶
MedianResult A frame with one row containing median values for numeric columns.
Examples¶
Scalar Example: Median Policy Metrics
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003", "P004", "P005"],
"duration_years": [1, 3, 5, 7, 15],
"annual_premium": [1200, 3500, 2800, 4200, 12000],
"age": [25, 35, 42, 38, 65],
}
af = ActuarialFrame(data)
median_values = af.median()
print(median_values)
print("Median duration:", median_values["duration_years"])
print("Median premium:", median_values["annual_premium"])
shape: (1, 3)
┌────────────────┬────────────────┬──────┐
│ duration_years ┆ annual_premium ┆ age │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞════════════════╪════════════════╪══════╡
│ 5.0 ┆ 3500.0 ┆ 38.0 │
└────────────────┴────────────────┴──────┘
Median duration: 5.0
Median premium: 3500.0
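To see why the median is recommended when outliers are present, the sketch below compares both measures on the annual_premium values from this example using Python's statistics module. The second list, in which the largest premium is inflated tenfold, is a hypothetical variation added for illustration.

```python
from statistics import fmean, median

annual_premium = [1200, 3500, 2800, 4200, 12000]

print(fmean(annual_premium))   # 4740.0
print(median(annual_premium))  # 3500

# Hypothetical variation: inflate the largest premium tenfold. The mean
# moves a lot, the median does not move at all.
with_outlier = [1200, 3500, 2800, 4200, 120000]
print(fmean(with_outlier))     # 26340.0
print(median(with_outlier))    # 3500
```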
Vector Example: Median Monthly Performance
from gaspatchio_core import ActuarialFrame
data = {
"agent": ["A001", "A002"],
"monthly_sales": [
[3, 5, 2, 8, 4, 6, 3, 7, 5, 4, 6, 9],
[12, 15, 10, 18, 14, 16, 11, 20, 13, 17, 15, 22]
],
"monthly_commission": [
[450, 750, 300, 1200, 600, 900, 450, 1050, 750, 600, 900, 1350],
[1800, 2250, 1500, 2700, 2100, 2400, 1650, 3000, 1950, 2550, 2250, 3300]
]
}
af = ActuarialFrame(data)
# Calculate median for typical performance assessment
median_values = af.median()
print(median_values)
print("Agent A001 median sales:", median_values["monthly_sales"][0])
print("Agent A002 median sales:", median_values["monthly_sales"][1])
shape: (1, 3)
┌────────────┬────────────────────┬──────────────────────┐
│ agent ┆ monthly_sales ┆ monthly_commission │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════════╪════════════════════╪══════════════════════╡
│ null ┆ [5.0, 15.0] ┆ [750.0, 2250.0] │
└────────────┴────────────────────┴──────────────────────┘
Agent A001 median sales: 5.0
Agent A002 median sales: 15.0
min()
¶
Calculate the minimum value of each column.
Returns a single-row frame containing the minimum value for each column. Essential for identifying baseline values, detecting anomalies, and establishing lower bounds in actuarial calculations.
When to use
- Data Quality Checks: Identify potential data errors like negative ages, zero premiums, or missing values coded as extreme minimums.
- Portfolio Analysis: Find minimum entry ages, smallest policy sizes, or lowest premium amounts for market segmentation.
- Risk Assessment: Determine minimum coverage levels, deductibles, or retention limits in reinsurance analysis.
- Product Design: Establish minimum benefit guarantees, surrender values, or contribution limits for new products.
Returns¶
MinResult A frame with one row containing minimum values for each column.
Examples¶
Scalar Example: Portfolio Minimum Values
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003", "P004"],
"age": [25, 45, 67, 35],
"sum_assured": [100000, 500000, 250000, 1000000],
"annual_premium": [1200, 6000, 8500, 15000],
}
af = ActuarialFrame(data)
min_values = af.min()
print(min_values)
print("Min age:", min_values["age"])
print("Min sum assured:", min_values["sum_assured"])
shape: (1, 4)
┌───────────┬─────┬─────────────┬────────────────┐
│ policy_id ┆ age ┆ sum_assured ┆ annual_premium │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═══════════╪═════╪═════════════╪════════════════╡
│ P001 ┆ 25 ┆ 100000 ┆ 1200 │
└───────────┴─────┴─────────────┴────────────────┘
Min age: 25
Min sum assured: 100000
Vector Example: Minimum Monthly Claims
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002"],
"policy_year": [1, 2],
"monthly_claims": [
[0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500],
[0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0]
],
"monthly_retention": [
[1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000],
[500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500]
]
}
af = ActuarialFrame(data)
# Get minimum values to understand retention levels
min_values = af.min()
print(min_values)
print("Min retention level:", min_values["monthly_retention"])
shape: (1, 4)
┌───────────┬─────────────┬─────────────────────────────────────┬─────────────────────────────────────┐
│ policy_id ┆ policy_year ┆ monthly_claims ┆ monthly_retention │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ list[i64] ┆ list[i64] │
╞═══════════╪═════════════╪═════════════════════════════════════╪═════════════════════════════════════╡
│ P001 ┆ 1 ┆ [0, 0, 0, 0, … 0, 0, 0] ┆ [500, 500, 500, 500, … 500] │
└───────────┴─────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
Min retention level: [500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500]
pipe(func, *args, **kwargs)
¶
Apply a function that accepts and returns an ActuarialFrame.
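The one-line description suggests the same contract as pipe in pandas and Polars: the frame is passed to func as its first argument, followed by any extra arguments. The sketch below demonstrates that pattern on a tiny stand-in class. MiniFrame and add_loading are hypothetical names used only for illustration, and the forwarding behaviour is an assumption rather than something this page confirms.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniFrame:
    """Hypothetical stand-in for ActuarialFrame: a dict of equal-length columns."""

    columns: dict[str, list[float]] = field(default_factory=dict)

    def pipe(self, func: Callable[..., "MiniFrame"], *args: Any, **kwargs: Any) -> "MiniFrame":
        # Call func with this frame as the first argument and forward the rest.
        return func(self, *args, **kwargs)


def add_loading(frame: MiniFrame, column: str, *, pct: float) -> MiniFrame:
    """Return a new frame with `column` increased by `pct` in `<column>_loaded`."""
    loaded = [value * (1 + pct) for value in frame.columns[column]]
    return MiniFrame({**frame.columns, f"{column}_loaded": loaded})


result = MiniFrame({"premium": [100.0, 200.0]}).pipe(add_loading, "premium", pct=0.5)
print(result.columns["premium_loaded"])  # [150.0, 300.0]
```

Writing model steps as functions that take and return a frame keeps each step testable on its own and lets a model read as a chain of pipe calls.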
product()
¶
Calculate the product of values in each numeric column.
Returns a single-row frame containing the product of all values for each numeric column. Useful for compound calculations, probability chains, and multiplicative factors in actuarial modeling.
When to use
- Compound Interest: Calculate accumulated values using multiple period growth factors or discount factors.
- Probability Chains: Multiply survival probabilities, persistency rates, or success rates across multiple periods.
- Factor Application: Apply multiple adjustment factors, loading factors, or credibility factors in sequence.
- Index Calculations: Compute cumulative index values from period-to-period change factors.
Returns¶
ProductResult A frame with one row containing products for numeric columns.
Examples¶
Scalar Example: Survival Probability Chain
from gaspatchio_core import ActuarialFrame
data = {
"year": [1, 2, 3, 4, 5],
"annual_survival": [0.999, 0.998, 0.997, 0.995, 0.993],
"annual_persistency": [0.95, 0.92, 0.90, 0.88, 0.85],
}
af = ActuarialFrame(data)
products = af.product()
print(products)
print("5-year survival probability:", round(products["annual_survival"], 6))
print("5-year persistency:", round(products["annual_persistency"], 4))
shape: (1, 3)
┌──────┬─────────────────┬────────────────────┐
│ year ┆ annual_survival ┆ annual_persistency │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞══════╪═════════════════╪════════════════════╡
│ 120 ┆ 0.982118 ┆ 0.588377 │
└──────┴─────────────────┴────────────────────┘
5-year survival probability: 0.982118
5-year persistency: 0.5884
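A probability chain is simply a running product, so the figure can be cross-checked with the standard library. The sketch below recomputes the five-year survival probability with math.prod and also builds the year-by-year survival curve with itertools.accumulate. It is an independent check in plain Python, not the library's code path.

```python
import math
from itertools import accumulate
from operator import mul

annual_survival = [0.999, 0.998, 0.997, 0.995, 0.993]

# Single figure: probability of surviving all five years.
five_year_survival = math.prod(annual_survival)

# Running product: probability of surviving to the end of each year.
survival_curve = list(accumulate(annual_survival, mul))

print(round(five_year_survival, 6))
print([round(p, 6) for p in survival_curve])
```

The last entry of the curve equals the single product, while the intermediate entries give the cumulative survival factors a projection would typically apply per period.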
Vector Example: Discount Factor Chains
from gaspatchio_core import ActuarialFrame
data = {
"scenario": ["Base", "Stressed"],
"monthly_discount": [
[0.9992, 0.9992, 0.9992, 0.9992, 0.9992, 0.9992],
[0.9990, 0.9990, 0.9990, 0.9990, 0.9990, 0.9990]
],
"monthly_survival": [
[0.9999, 0.9999, 0.9999, 0.9999, 0.9999, 0.9999],
[0.9998, 0.9998, 0.9998, 0.9998, 0.9998, 0.9998]
]
}
af = ActuarialFrame(data)
# Calculate cumulative factors
products = af.product()
print(products)
shape: (1, 3)
┌──────────┬──────────────────┬──────────────────┐
│ scenario ┆ monthly_discount ┆ monthly_survival │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞══════════╪══════════════════╪══════════════════╡
│ null ┆ [0.9952, 0.9940] ┆ [0.9994, 0.9988] │
└──────────┴──────────────────┴──────────────────┘
profile()
¶
Execute and materialize the dataframe with profiling, returning (result_df, profile_info).
quantile(quantile, interpolation='nearest')
¶
Calculate quantile values across all numeric columns.
Returns a single-row frame containing the specified quantile for each numeric column. Essential for risk assessment, percentile-based analysis, and regulatory reporting in actuarial applications.
When to use
- Risk Assessment: Calculate VaR (Value at Risk) at different confidence levels (e.g., 95th, 99th percentile) for solvency calculations.
- Experience Analysis: Determine percentile thresholds for large claims, high-risk ages, or outlier detection in portfolios.
- Pricing Segmentation: Identify quantile boundaries for premium bands, risk tiers, or underwriting categories.
- Regulatory Reporting: Calculate required percentiles for stress testing, capital requirements, or reserve adequacy testing.
Parameters¶
quantile : float Quantile value between 0 and 1 (e.g., 0.5 for median, 0.95 for 95th percentile).
interpolation : str, default "nearest" Interpolation method: "nearest", "higher", "lower", "midpoint", or "linear".
Returns¶
QuantileResult A frame with one row containing quantile values for numeric columns.
Examples¶
Scalar Example: Claims Distribution Analysis
from gaspatchio_core import ActuarialFrame
data = {
"claim_id": list(range(1, 101)),
"claim_amount": [
1000,
1500,
2000,
2500,
3000,
3500,
4000,
5000,
6000,
7500,
8000,
9000,
10000,
12000,
15000,
18000,
20000,
25000,
30000,
35000,
40000,
45000,
50000,
60000,
75000,
85000,
95000,
100000,
120000,
150000,
]
+ [2000] * 70,
"processing_days": list(range(5, 35)) + list(range(10, 80)),
}
af = ActuarialFrame(data)
# Calculate key percentiles
p90 = af.quantile(0.90)
p95 = af.quantile(0.95)
p99 = af.quantile(0.99)
print("90th percentile:")
print(p90)
print("Claim amount 90th percentile:", p90["claim_amount"])
print("Claim amount 95th percentile:", p95["claim_amount"])
print("Claim amount 99th percentile:", p99["claim_amount"])
90th percentile:
shape: (1, 3)
┌──────────┬──────────────┬─────────────────┐
│ claim_id ┆ claim_amount ┆ processing_days │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════════╪══════════════╪═════════════════╡
│ 90.0 ┆ 35000.0 ┆ 69.0 │
└──────────┴──────────────┴─────────────────┘
Claim amount 90th percentile: 35000.0
Claim amount 95th percentile: 75000.0
Claim amount 99th percentile: 120000.0
Vector Example: Portfolio Risk Percentiles
from gaspatchio_core import ActuarialFrame
data = {
"product": ["Term Life", "Whole Life"],
"claim_amounts": [
[10000, 15000, 20000, 25000, 30000, 35000, 40000, 50000, 75000, 100000,
150000, 200000, 250000, 300000, 500000, 750000, 1000000, 1500000, 2000000, 3000000],
[50000, 75000, 100000, 125000, 150000, 175000, 200000, 250000, 300000, 400000,
500000, 600000, 750000, 900000, 1000000, 1250000, 1500000, 2000000, 2500000, 5000000]
]
}
af = ActuarialFrame(data)
# Calculate 95th percentile for risk assessment
var_95 = af.quantile(0.95)
print("95% VaR by product:")
print(var_95)
95% VaR by product:
shape: (1, 2)
┌────────────┬──────────────────────────────────┐
│ product ┆ claim_amounts │
│ --- ┆ --- │
│ str ┆ list[f64] │
╞════════════╪══════════════════════════════════╡
│ null ┆ [2000000.0, 2500000.0] │
└────────────┴──────────────────────────────────┘
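The five interpolation options differ only in how they resolve a quantile that falls between two sorted observations. The sketch below is a minimal plain-Python illustration, assuming the quantile sits at fractional position (n - 1) * q in the sorted data, which is a common convention. The exact rule used by gaspatchio or Polars, including how ties are broken for "nearest", may differ; the function name and sample values are hypothetical.

```python
import math


def quantile(values: list[float], q: float, interpolation: str = "nearest") -> float:
    """Quantile of `values` at fractional position (n - 1) * q in the sorted data."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = ordered[math.floor(position)]
    higher = ordered[math.ceil(position)]
    if interpolation == "lower":
        return lower
    if interpolation == "higher":
        return higher
    if interpolation == "midpoint":
        return (lower + higher) / 2
    if interpolation == "linear":
        return lower + (higher - lower) * (position - math.floor(position))
    if interpolation == "nearest":
        # Python's round() sends exact ties to the even index (assumption:
        # the library may break ties differently).
        return ordered[round(position)]
    raise ValueError(f"unknown interpolation: {interpolation}")


claims = [1000, 2000, 4000, 8000]  # hypothetical claim amounts
for method in ("lower", "nearest", "midpoint", "linear", "higher"):
    print(method, quantile(claims, 0.25, method))
# lower 1000, nearest 2000, midpoint 1500.0, linear 1750.0, higher 2000
```

With only four observations the methods already disagree by a factor of two, which is why the interpolation choice matters for small samples and tail percentiles.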
rename(mapping)
¶
Rename columns to snake_case or actuarial conventions.
Converts raw data column names (often from Excel or vendor systems) to the snake_case convention used throughout Gaspatchio models. Run this in Phase 1 (setup) before creating the projection timeline.
When to use
- Data Ingestion: Convert Excel-style column names ("Issue Age", "Sum Assured") to snake_case for use as ActuarialFrame attributes.
- Vendor Data: Standardise column names from different admin systems or data providers before building assumptions.
Parameters¶
mapping : dict[str, str] Mapping of old name to new name.
Returns¶
ActuarialFrame Frame with renamed columns.
Examples¶
Rename Excel-style columns to snake_case
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame(
{
"Policy Number": ["P001", "P002"],
"Issue Age": [30, 45],
"Sum Assured": [100000, 200000],
}
)
af = af.rename(
{
"Policy Number": "policy_id",
"Issue Age": "issue_age",
"Sum Assured": "sum_assured",
}
)
print(af.collect())
shape: (2, 3)
┌───────────┬───────────┬─────────────┐
│ policy_id ┆ issue_age ┆ sum_assured │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═══════════╪═══════════╪═════════════╡
│ P001 ┆ 30 ┆ 100000 │
│ P002 ┆ 45 ┆ 200000 │
└───────────┴───────────┴─────────────┘
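For wide extracts, writing the mapping by hand gets tedious. One possible approach is to generate a first draft of the mapping and then override individual entries. The helper to_snake_case below is a hypothetical convenience function, not part of gaspatchio; the resulting dictionary has the dict[str, str] shape that rename expects.

```python
import re


def to_snake_case(name: str) -> str:
    """Lower-case a column name and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[A-Za-z0-9]+", name)).lower()


raw_columns = ["Policy Number", "Issue Age", "Sum Assured"]
mapping = {name: to_snake_case(name) for name in raw_columns}
print(mapping)
# {'Policy Number': 'policy_number', 'Issue Age': 'issue_age', 'Sum Assured': 'sum_assured'}

# Hand-picked names can override the generated ones before calling rename.
mapping["Policy Number"] = "policy_id"
```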
select(*exprs, **named_exprs)
¶
Select columns from the DataFrame.
Accepts positional expressions (column names, proxies, or expressions) and keyword arguments for renamed/new expressions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *exprs | IntoExprColumn | Columns or expressions to select. | () |
| **named_exprs | IntoExprColumn | Expressions to select with specific output names. | {} |
Returns:
| Type | Description |
|---|---|
| Self | The modified ActuarialFrame. |
show_query_plan(enabled=True)
¶
Enable or disable query plan logging (basic implementation).
sort(by, *, descending=False)
¶
Sort rows by one or more columns.
Orders policies by a key column before projection or output. Useful for deterministic output ordering in reconciliation.
When to use
- Reconciliation: Sort by policy ID before comparing against a reference model to ensure row-by-row alignment.
- Reporting: Order output by issue age, premium, or product code for presentation.
Parameters¶
by : str | list[str] Column name(s) to sort by.
descending : bool, default False Sort in descending order.
Returns¶
ActuarialFrame Sorted frame.
Examples¶
Sort by issue age for reporting
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame(
{
"policy_id": ["P003", "P001", "P002"],
"issue_age": [55, 30, 45],
"premium": [1500, 1200, 800],
}
)
af = af.sort("issue_age")
print(af.collect())
shape: (3, 3)
┌───────────┬───────────┬─────────┐
│ policy_id ┆ issue_age ┆ premium │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═══════════╪═══════════╪═════════╡
│ P001 ┆ 30 ┆ 1200 │
│ P002 ┆ 45 ┆ 800 │
│ P003 ┆ 55 ┆ 1500 │
└───────────┴───────────┴─────────┘
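The reconciliation use case relies on both sides being ordered by the same key so that rows can be compared pairwise. A plain-Python sketch of that idea follows. The record values are hypothetical, and it assumes both sides contain the same set of policy IDs; otherwise a join on the key would be the safer comparison.

```python
model_output = [
    {"policy_id": "P003", "premium": 1500},
    {"policy_id": "P001", "premium": 1200},
    {"policy_id": "P002", "premium": 800},
]
reference = [
    {"policy_id": "P001", "premium": 1200},
    {"policy_id": "P002", "premium": 850},
    {"policy_id": "P003", "premium": 1500},
]


def by_policy(rows):
    # Sort on the key so both sides line up row by row.
    return sorted(rows, key=lambda row: row["policy_id"])


# Compare aligned rows and collect the policies whose premium differs.
mismatches = [
    (ours["policy_id"], ours["premium"], theirs["premium"])
    for ours, theirs in zip(by_policy(model_output), by_policy(reference))
    if ours["premium"] != theirs["premium"]
]
print(mismatches)  # [('P002', 800, 850)]
```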
std(ddof=1)
¶
Calculate standard deviation across all numeric columns.
Returns a single-row frame containing the standard deviation for each numeric column. Essential for risk assessment, volatility analysis, and confidence interval calculations in actuarial modeling.
When to use
- Risk Assessment: Measure volatility in claim amounts, premium variations, or mortality experience for pricing and reserving.
- Experience Monitoring: Quantify variability in lapse rates, expense ratios, or benefit utilization for assumption setting.
- Confidence Intervals: Calculate standard errors for mortality estimates, reserve factors, or pricing assumptions.
- Portfolio Analysis: Assess homogeneity of risk groups by comparing standard deviations across segments.
Parameters¶
ddof : int, default 1
    Delta degrees of freedom. The divisor is N - ddof.
Returns¶
StdResult
    A frame with one row containing standard deviations for numeric columns.
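To make the effect of `ddof` concrete, here is a minimal sketch in plain Python of a standard deviation with divisor N - ddof, cross-checked against the standard library. The helper `std_with_ddof` and the sample values are hypothetical; this illustrates the formula only and is not the library's implementation.

```python
import math
import statistics


def std_with_ddof(values, ddof=1):
    """Standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Hypothetical values, not taken from the examples in this section.
claims = [2, 4, 4, 4, 5, 5, 7, 9]

sample_std = std_with_ddof(claims, ddof=1)      # divisor N - 1
population_std = std_with_ddof(claims, ddof=0)  # divisor N

# Cross-check against the standard library equivalents.
assert math.isclose(sample_std, statistics.stdev(claims))
assert math.isclose(population_std, statistics.pstdev(claims))
print(sample_std, population_std)
```

The default `ddof=1` gives the sample estimate, which is slightly larger than the population value and is the usual choice when the data are a sample of experience rather than the full population.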
Examples¶
Scalar Example: Premium Volatility Analysis
from gaspatchio_core import ActuarialFrame
data = {
    "policy_id": ["P001", "P002", "P003", "P004", "P005"],
    "age_band": ["25-35", "25-35", "36-45", "36-45", "46-55"],
    "annual_premium": [1200, 1350, 3500, 3200, 8500],
    "sum_assured": [100000, 150000, 350000, 300000, 500000],
}
af = ActuarialFrame(data)
std_values = af.std()
print(std_values)
print("Premium volatility:", std_values["annual_premium"])
shape: (1, 2)
┌──────────────────┬─────────────┐
│ annual_premium ┆ sum_assured │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════════════════╪═════════════╡
│ 2957.6 ┆ 160468.1 │
└──────────────────┴─────────────┘
Premium volatility: 2957.6
Vector Example: Monthly Claims Volatility
from gaspatchio_core import ActuarialFrame
data = {
    "product": ["Term Life", "Whole Life"],
    "monthly_claims": [
        [0, 1000, 500, 2000, 0, 3000, 1500, 0, 2500, 1000, 0, 4000],
        [5000, 6000, 4500, 7000, 5500, 8000, 6500, 5000, 7500, 6000, 9000, 10000],
    ],
    "monthly_premiums": [
        [50000, 50000, 52000, 51000, 50000, 49000, 50000, 51000, 50000, 50000, 51000, 50000],
        [120000, 125000, 122000, 128000, 124000, 130000, 126000, 123000, 127000, 125000, 129000, 132000],
    ],
}
af = ActuarialFrame(data)
# Calculate standard deviation for risk assessment
std_values = af.std()
print(std_values)
print("Term Life claims volatility:", round(std_values["monthly_claims"][0], 2))
print("Whole Life claims volatility:", round(std_values["monthly_claims"][1], 2))
shape: (1, 3)
┌────────────┬──────────────────────────────┬───────────────────────────────┐
│ product ┆ monthly_claims ┆ monthly_premiums │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════════╪══════════════════════════════╪═══════════════════════════════╡
│ null ┆ [1339.24, 1696.7] ┆ [778.5, 3476.11] │
└────────────┴──────────────────────────────┴───────────────────────────────┘
Term Life claims volatility: 1339.24
Whole Life claims volatility: 1696.7
sum()
¶
Calculate totals across all numeric columns.
Returns a single-row frame containing the total for each numeric column. Critical for calculating portfolio totals, aggregate exposures, and overall metrics in actuarial reporting.
When to use
- Portfolio Totals: Calculate total sum assured, total premiums collected, or total claims paid for financial reporting.
- Exposure Analysis: Sum total lives covered, total benefits, or total risk amounts for reinsurance and capital calculations.
- Revenue Reporting: Aggregate premium income, fee revenue, or investment income across product lines or time periods.
- Claims Analysis: Total claim counts, amounts paid, or reserves across different claim types or cohorts.
Returns¶
SumResult
    A frame with one row containing totals for numeric columns.
Examples¶
Scalar Example: Portfolio Totals
from gaspatchio_core import ActuarialFrame
data = {
    "product": ["Term", "Whole Life", "Universal", "Term", "Endowment"],
    "policies_inforce": [1250, 890, 445, 2100, 325],
    "annual_premium": [1500000, 3200000, 2100000, 2800000, 1900000],
    "sum_assured": [125000000, 89000000, 67000000, 315000000, 48000000],
}
af = ActuarialFrame(data)
sum_values = af.sum()
print(sum_values)
print("Total policies:", sum_values["policies_inforce"])
print("Total premium:", sum_values["annual_premium"])
print("Total exposure:", sum_values["sum_assured"])
shape: (1, 3)
┌──────────────────┬────────────────┬─────────────┐
│ policies_inforce ┆ annual_premium ┆ sum_assured │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞══════════════════╪════════════════╪═════════════╡
│ 5010 ┆ 11500000 ┆ 644000000 │
└──────────────────┴────────────────┴─────────────┘
Total policies: 5010
Total premium: 11500000
Total exposure: 644000000
Vector Example: Monthly Totals
from gaspatchio_core import ActuarialFrame
data = {
    "branch": ["North", "South"],
    "monthly_new_business": [
        [120, 135, 110, 145, 130, 125, 140, 155, 135, 140, 130, 160],
        [95, 100, 90, 105, 110, 95, 100, 115, 105, 100, 95, 120],
    ],
    "monthly_premium": [
        [180000, 202500, 165000, 217500, 195000, 187500, 210000, 232500, 202500, 210000, 195000, 240000],
        [142500, 150000, 135000, 157500, 165000, 142500, 150000, 172500, 157500, 150000, 142500, 180000],
    ],
}
af = ActuarialFrame(data)
# Get total new business and premiums
sum_values = af.sum()
print(sum_values)
shape: (1, 2)
┌───────────────────────────────────────┬───────────────────────────────────────┐
│ monthly_new_business ┆ monthly_premium │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════════════════════════════════╪═══════════════════════════════════════╡
│ [215, 235, 200, 250, … 240, 225, 280] ┆ [322500, 352500, 300000, … 420000] │
└───────────────────────────────────────┴───────────────────────────────────────┘
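Judging by the output above, list columns appear to be summed position by position across rows, so each element of the result is one month's total over both branches. The plain-Python sketch below reproduces that reading for `monthly_new_business`. It is an interpretation of the example, not a statement about how the library computes the result.

```python
# Data copied from the vector example above.
monthly_new_business = [
    [120, 135, 110, 145, 130, 125, 140, 155, 135, 140, 130, 160],  # North
    [95, 100, 90, 105, 110, 95, 100, 115, 105, 100, 95, 120],      # South
]

# zip(*rows) groups the values for each month across branches,
# so each sum is one month's total over all rows.
monthly_totals = [sum(month) for month in zip(*monthly_new_business)]
print(monthly_totals)
# [215, 235, 200, 250, 240, 220, 240, 270, 240, 240, 225, 280]
```

The first four and last three values match the truncated list shown in the frame output.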
trace(func)
¶
Capture operations within a function call in optimize mode.
var(ddof=1)
¶
Calculate variance across all numeric columns.
Returns a single-row frame containing the variance for each numeric column. Used for risk metrics, ANOVA calculations, and statistical modeling in actuarial applications.
When to use
- Risk Metrics: Calculate variance in loss ratios, combined ratios, or expense ratios for enterprise risk management.
- Statistical Testing: Perform ANOVA on mortality rates, lapse rates, or claim frequencies across different cohorts.
- Credibility Theory: Calculate variance components for Bühlmann credibility factors in experience rating.
- Asset-Liability Modeling: Measure variance in investment returns, liability cash flows, or surplus positions.
Parameters¶
ddof : int, default 1
    Delta degrees of freedom. The divisor is N - ddof.
Returns¶
VarResult
    A frame with one row containing variances for numeric columns.
Examples¶
Scalar Example: Claims Variance Analysis
from gaspatchio_core import ActuarialFrame
data = {
    "month": [1, 2, 3, 4, 5, 6],
    "claims_count": [45, 52, 38, 61, 43, 55],
    "claims_amount": [125000, 145000, 95000, 185000, 120000, 165000],
}
af = ActuarialFrame(data)
var_values = af.var()
print(var_values)
print("Claims count variance:", var_values["claims_count"])
print("Claims amount variance:", var_values["claims_amount"])
shape: (1, 3)
┌───────┬──────────────┬──────────────────┐
│ month ┆ claims_count ┆ claims_amount │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═══════╪══════════════╪══════════════════╡
│ 3.5 ┆ 72.4 ┆ 1.064e9 │
└───────┴──────────────┴──────────────────┘
Claims count variance: 72.4
Claims amount variance: 1064166666.7
Vector Example: Experience Variance Components
from gaspatchio_core import ActuarialFrame
data = {
    "region": ["North", "South"],
    "quarterly_lapse_rates": [
        [0.025, 0.028, 0.022, 0.026],
        [0.031, 0.029, 0.033, 0.030],
    ],
    "quarterly_mortality_rates": [
        [0.0010, 0.0011, 0.0009, 0.0010],
        [0.0012, 0.0013, 0.0011, 0.0014],
    ],
}
af = ActuarialFrame(data)
# Calculate variance for credibility analysis
var_values = af.var()
print(var_values)
print("North region lapse variance:", var_values["quarterly_lapse_rates"][0])
print("South region lapse variance:", var_values["quarterly_lapse_rates"][1])
shape: (1, 3)
┌────────────┬────────────────────────┬──────────────────────────────┐
│ region ┆ quarterly_lapse_rates ┆ quarterly_mortality_rates │
│ --- ┆ --- ┆ --- │
│ str ┆ list[f64] ┆ list[f64] │
╞════════════╪════════════════════════╪══════════════════════════════╡
│ null ┆ [0.000006, 0.000003] ┆ [0.0000000067, 0.0000000167] │
└────────────┴────────────────────────┴──────────────────────────────┘
North region lapse variance: 0.000006
South region lapse variance: 0.000003
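The per-region figures in this example correspond to a sample variance (divisor N - 1, the default `ddof=1`) taken over each row's list. As a rough cross-check outside the library, the same quantity can be computed with the standard library's `statistics.variance`. The dictionary layout and the name `lapse_variance` are hypothetical.

```python
import statistics

# Lapse rates copied from the vector example above, keyed by region.
quarterly_lapse_rates = {
    "North": [0.025, 0.028, 0.022, 0.026],
    "South": [0.031, 0.029, 0.033, 0.030],
}

# statistics.variance uses the sample divisor N - 1, matching ddof=1.
lapse_variance = {
    region: statistics.variance(rates)
    for region, rates in quarterly_lapse_rates.items()
}

for region, value in lapse_variance.items():
    print(f"{region}: {value:.3e}")
```

With only four quarters per region, the choice of divisor matters: dividing by N instead of N - 1 would shrink each variance by a quarter.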
with_columns(*exprs)
¶
Add columns to the DataFrame.