# Assumption Table Examples in Gaspatchio

## Working with Mortality Tables
This guide uses the 2015 VBT Female Smoker Mortality Table (ANB) to demonstrate how to set up and use assumption tables in Gaspatchio.
## Understanding the Table Structure
The 2015 VBT table is structured as follows:

- Rows represent issue ages (18-95)
- Columns represent policy durations (1-25 plus "Ultimate")
- Values represent mortality rates per 1,000
Here's a small sample from the table:
| Issue Age | Duration 1 | Duration 2 | Duration 3 | Duration 4 | Duration 5 | Ultimate | Attained Age |
|---|---|---|---|---|---|---|---|
| 30 | 0.20 | 0.25 | 0.31 | 0.38 | 0.45 | 4.84 | 55 |
| 31 | 0.21 | 0.26 | 0.34 | 0.42 | 0.51 | 5.35 | 56 |
| 32 | 0.22 | 0.28 | 0.37 | 0.47 | 0.58 | 5.93 | 57 |
| 33 | 0.23 | 0.31 | 0.42 | 0.53 | 0.65 | 6.59 | 58 |
| 34 | 0.25 | 0.35 | 0.48 | 0.61 | 0.73 | 7.31 | 59 |
## Loading the Assumption Table
Loading assumption tables is straightforward with the dimension-based API. Gaspatchio provides tools to analyze table structure and configure dimensions:
```python
import gaspatchio_core as gs
import polars as pl

# First, analyze the table structure (optional but helpful)
df = pl.read_csv("2015-VBT-FSM-ANB.csv")
schema = gs.assumptions.analyze_table(df)
print(schema.suggest_table_config())

# Load the mortality table with dimension configuration
vbt_table = gs.Table(
    name="vbt_2015_female_smoker",
    source="2015-VBT-FSM-ANB.csv",
    dimensions={
        "issue_age": "Issue Age",  # Map model name to CSV column name
        "duration": gs.assumptions.MeltDimension(
            columns=[str(i) for i in range(1, 26)] + ["Ultimate"],
            name="duration",
            overflow=gs.assumptions.ExtendOverflow("Ultimate", to_value=200)
        )
    },
    value="mortality_rate"
)
```
The API explicitly configures:

- Dimension mapping: the key (`issue_age`) is the name you use in lookups; the value (`"Issue Age"`) is the column name in the source file
- Melt dimensions: transform wide columns (1-25, "Ultimate") into long format
- Overflow strategies: expand "Ultimate" values to higher durations
- Value column name: name for the melted rates
After loading, the internal data looks like this:
| issue_age | duration | mortality_rate |
|---|---|---|
| 30 | 1 | 0.20 |
| 30 | 2 | 0.25 |
| 30 | 3 | 0.31 |
| 30 | 4 | 0.38 |
| 30 | 5 | 0.45 |
| 30 | 26 | 4.84 |
| 30 | 27 | 4.84 |
| 30 | 150 | 4.84 |
| ... | ... | ... |
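Conceptually, the load step melts each wide row into (issue_age, duration, rate) triples and then repeats the "Ultimate" rate out to the overflow limit. A minimal pure-Python sketch of that transformation under those assumptions, using a truncated sample row rather than Gaspatchio internals:

```python
# Pure-Python sketch of the melt + overflow expansion (not Gaspatchio
# internals). Uses a truncated sample: the first five select rates for
# issue age 30 from the table above, plus its "Ultimate" rate.
wide_row = {"issue_age": 30, "1": 0.20, "2": 0.25, "3": 0.31,
            "4": 0.38, "5": 0.45, "Ultimate": 4.84}

def melt_with_overflow(row, duration_cols, ultimate_col, to_value):
    """Return (issue_age, duration, rate) triples in long format."""
    long_rows = [(row["issue_age"], d, row[col])
                 for d, col in enumerate(duration_cols, start=1)]
    # ExtendOverflow-style step: repeat the ultimate rate up to to_value
    for d in range(len(duration_cols) + 1, to_value + 1):
        long_rows.append((row["issue_age"], d, row[ultimate_col]))
    return long_rows

long_rows = melt_with_overflow(wide_row, ["1", "2", "3", "4", "5"],
                               "Ultimate", 200)
```

With the full 25 select columns the expansion would begin at duration 26, exactly as in the internal table shown above.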
## Using the Assumption Table in ActuarialFrame
Now we can use this table for lightning-fast lookups.
**Why So Fast?**

This VBT table has dimensions [78 ages × 200 durations] = 15,600 entries. Gaspatchio detects this as a dense table and stores it as a contiguous array in memory. Each lookup is just:

- Compute index: `age_offset × 200 + duration` (a few nanoseconds)
- Read value: `data[index]` (direct memory access)

No hash computation, no bucket probing - just arithmetic and array indexing. This is why 324 million lookups complete in ~1 second instead of ~27 seconds.
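The index arithmetic above can be sketched in a few lines of plain Python; the dimension sizes mirror the guide, and the rates are placeholders rather than real table values:

```python
# Sketch of a dense-table lookup: compute a flat index, then read the
# array. Issue ages 18-95 (78 rows), durations stored 1-200.
MIN_AGE = 18
N_DURATIONS = 200

data = [0.0] * (78 * N_DURATIONS)              # contiguous rate storage
data[(30 - MIN_AGE) * N_DURATIONS + 0] = 0.20  # age 30, duration 1

def lookup(issue_age, duration):
    # One multiply-add, then a direct read: no hashing, no probing
    index = (issue_age - MIN_AGE) * N_DURATIONS + (duration - 1)
    return data[index]
```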
```python
# Create a simple policy dataset
policy_data = pl.DataFrame({
    "policy_id": ["A001", "A002", "A003", "A004"],
    "issue_age": [30, 35, 40, 45],
    "duration": [1, 3, 5, 10]
})

# Convert to ActuarialFrame
af = gs.ActuarialFrame(policy_data)

# Look up mortality rates using the table's lookup method
af.mortality_rate = vbt_table.lookup(issue_age=af.issue_age, duration=af.duration)
print(af)
```
Result:
```
shape: (4, 4)
┌───────────┬───────────┬──────────┬────────────────┐
│ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │
│ ---       ┆ ---       ┆ ---      ┆ ---            │
│ str       ┆ i64       ┆ i64      ┆ f64            │
╞═══════════╪═══════════╪══════════╪════════════════╡
│ A001      ┆ 30        ┆ 1        ┆ 0.20           │
│ A002      ┆ 35        ┆ 3        ┆ 0.54           │
│ A003      ┆ 40        ┆ 5        ┆ 1.15           │
│ A004      ┆ 45        ┆ 10       ┆ 4.10           │
└───────────┴───────────┴──────────┴────────────────┘
```
## Working with Overflow Durations
The beauty of the API is that overflow handling is completely transparent. Even extreme durations work instantly:
```python
# Test with durations beyond the table (> 25)
extreme_data = pl.DataFrame({
    "policy_id": ["X001", "X002"],
    "issue_age": [30, 40],
    "duration": [50, 100]  # Way beyond table max of 25!
})

af_extreme = gs.ActuarialFrame(extreme_data)
af_extreme.mortality_rate = vbt_table.lookup(
    issue_age=af_extreme.issue_age,
    duration=af_extreme.duration
)
print(af_extreme)
```
Result:
```
shape: (2, 4)
┌───────────┬───────────┬──────────┬────────────────┐
│ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │
│ ---       ┆ ---       ┆ ---      ┆ ---            │
│ str       ┆ i64       ┆ i64      ┆ f64            │
╞═══════════╪═══════════╪══════════╪════════════════╡
│ X001      ┆ 30        ┆ 50       ┆ 4.84           │
│ X002      ┆ 40        ┆ 100      ┆ 9.32           │
└───────────┴───────────┴──────────┴────────────────┘
```
Both policies get the "Ultimate" rate because the ExtendOverflow strategy pre-expanded the overflow during loading.
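Pre-expansion trades a larger stored table for branch-free lookups; the common alternative is to clamp the duration at lookup time instead. A sketch of that alternative for contrast (illustrative rates, not Gaspatchio's implementation):

```python
# Clamp-at-lookup alternative to pre-expansion: any duration past the
# select period maps onto the "Ultimate" slot.
ULTIMATE_DURATION = 26                       # durations 1-25 are select
rates = {d: 0.1 * d for d in range(1, 26)}   # placeholder select rates
rates[ULTIMATE_DURATION] = 4.84              # ultimate rate

def lookup_clamped(duration):
    # One branch per lookup instead of a larger pre-expanded table
    return rates[min(duration, ULTIMATE_DURATION)]
```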
## Projecting Multiple Periods
Gaspatchio's vector-based approach works seamlessly with the API:
```python
# Create a policy with projection over multiple durations
policy_projection = pl.DataFrame({
    "policy_id": ["B001"],
    "issue_age": [30],
    "duration": [[1, 2, 3, 4, 5, 25, 26, 50, 100]]  # Mix of regular and overflow
})

af_proj = gs.ActuarialFrame(policy_projection)

# Look up mortality rates for all durations at once
af_proj.mortality_rate = vbt_table.lookup(
    issue_age=af_proj.issue_age,
    duration=af_proj.duration
)

# Explode for visualization
result = af_proj.explode(["duration", "mortality_rate"])
print(result)
```
Result:
```
shape: (9, 4)
┌───────────┬───────────┬──────────┬────────────────┐
│ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │
│ ---       ┆ ---       ┆ ---      ┆ ---            │
│ str       ┆ i64       ┆ i64      ┆ f64            │
╞═══════════╪═══════════╪══════════╪════════════════╡
│ B001      ┆ 30        ┆ 1        ┆ 0.20           │
│ B001      ┆ 30        ┆ 2        ┆ 0.25           │
│ B001      ┆ 30        ┆ 3        ┆ 0.31           │
│ B001      ┆ 30        ┆ 4        ┆ 0.38           │
│ B001      ┆ 30        ┆ 5        ┆ 0.45           │
│ B001      ┆ 30        ┆ 25       ┆ 4.12           │
│ B001      ┆ 30        ┆ 26       ┆ 4.84           │
│ B001      ┆ 30        ┆ 50       ┆ 4.84           │
│ B001      ┆ 30        ┆ 100      ┆ 4.84           │
└───────────┴───────────┴──────────┴────────────────┘
```
## Loading Simple Curves
For 1-dimensional tables (like lapse rates by age), the API is even simpler:
```python
# Load a simple age → lapse rate curve
lapse_table = gs.Table(
    name="lapse_2025",
    source="lapse_curve.csv",
    dimensions={
        "age": "age"  # Simple string shorthand
    },
    value="lapse_rate"
)

# Use it immediately
af.lapse_rate = lapse_table.lookup(age=af.age)
```
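A one-dimensional table reduces to a plain key-to-value mapping applied elementwise. A pure-Python sketch of that idea, with invented lapse rates:

```python
# A 1-D assumption curve is just age -> rate; applying it to a column
# of ages is an elementwise lookup. Rates here are made up.
lapse_curve = {age: 0.05 if age < 60 else 0.02 for age in range(18, 100)}

ages = [30, 45, 65]
lapse_rates = [lapse_curve[age] for age in ages]
```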
## Advanced Features
For more complex scenarios, you have full control with the dimension-based API:
```python
# Multi-dimensional table with selective column loading
mortality_table = gs.Table(
    name="mortality_by_gender",
    source="mortality_m_f.csv",
    dimensions={
        "age": "age",
        "gender": gs.assumptions.MeltDimension(
            columns=["Male", "Female"],
            name="gender"
        )
    },
    value="mortality_rate"
)

# Table with custom overflow limits
salary_table = gs.Table(
    name="salary_scale",
    source="salary_by_service.csv",
    dimensions={
        "grade": "grade",
        "service": gs.assumptions.MeltDimension(
            columns=[str(i) for i in range(1, 21)] + ["20+"],
            name="service",
            overflow=gs.assumptions.ExtendOverflow("20+", to_value=50)
        )
    },
    value="scale_factor"
)

# Using computed dimensions
complex_table = gs.Table(
    name="complex_assumptions",
    source=df,
    dimensions={
        "issue_age": "issue_age",
        "policy_year": "policy_year",
        "attained_age": gs.assumptions.ComputedDimension(
            pl.col("issue_age") + pl.col("policy_year") - 1,
            "attained_age"
        )
    },
    value="assumption_value"
)
```
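The ComputedDimension above encodes the standard actuarial identity relating the three keys; a quick check of the arithmetic in plain Python:

```python
def attained_age(issue_age, policy_year):
    # Policy year 1 is the issue year, so no ageing has occurred yet
    return issue_age + policy_year - 1
```

For an age-30 issue, policy year 26 gives attained age 55, matching the "Ultimate" row in the sample table at the top of this guide.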
## Using the TableBuilder Pattern
For step-by-step table construction, use the fluent TableBuilder API:
```python
# Build a complex mortality table
mortality_table = (
    gs.TableBuilder("mortality_select_ultimate")
    .from_source("mortality_su.csv")
    .with_data_dimension("issue_age", "IssueAge")
    .with_data_dimension("gender", "Gender")
    .with_melt_dimension(
        "duration",
        columns=[f"Dur{i}" for i in range(1, 16)] + ["Ultimate"],
        overflow=gs.assumptions.ExtendOverflow("Ultimate", to_value=100),
        fill=gs.assumptions.LinearInterpolate()  # Interpolate any gaps
    )
    .with_value_column("qx_rate")
    .build()
)

# The table is ready for lookups
af.mortality_rate = mortality_table.lookup(
    issue_age=af.age,
    gender=af.sex,
    duration=af.policy_duration
)
```
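The fluent chaining works because each `with_*` call records configuration and returns the builder itself. A toy illustration of the pattern, with hypothetical names rather than the real TableBuilder internals:

```python
class CurveBuilder:
    """Toy fluent builder: each with_* method returns self for chaining."""

    def __init__(self, name):
        self.name = name
        self.dimensions = {}
        self.value_column = None

    def with_dimension(self, model_name, source_column):
        self.dimensions[model_name] = source_column
        return self  # returning self is what enables chaining

    def with_value_column(self, name):
        self.value_column = name
        return self

    def build(self):
        # Freeze the recorded configuration into a plain result
        return {"name": self.name, "dimensions": self.dimensions,
                "value": self.value_column}

table = (
    CurveBuilder("demo")
    .with_dimension("age", "Age")
    .with_value_column("rate")
    .build()
)
```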
## Metadata and Table Discovery
Tables can include metadata for documentation and discovery:
```python
# Duration columns for the melt (select durations plus "Ultimate")
duration_columns = [str(i) for i in range(1, 26)] + ["Ultimate"]

# Create table with rich metadata
vbt_table = gs.Table(
    name="vbt_2015_complete",
    source="vbt_2015_all.csv",
    dimensions={
        "age": "Age",
        "gender": "Gender",
        "smoking": "Smoker",
        "duration": gs.assumptions.MeltDimension(
            columns=duration_columns,
            name="duration",
            overflow=gs.assumptions.ExtendOverflow("Ultimate", to_value=120)
        )
    },
    value="mortality_rate",
    metadata={
        "source": "2015 Valuation Basic Table",
        "basis": "ANB",
        "version": "2015",
        "effective_date": "2015-01-01",
        "description": "Industry standard mortality table",
        "tags": ["mortality", "vbt", "2015", "standard"]
    }
)

# Discover tables
all_tables = gs.list_tables()
print(f"Available tables: {all_tables}")

# Get metadata for a specific table
metadata = gs.get_table_metadata("vbt_2015_complete")
print(f"Table metadata: {metadata}")

# List all tables with metadata
tables_info = gs.list_tables_with_metadata()
for name, meta in tables_info.items():
    print(f"{name}: {meta.get('description', 'No description')}")
```
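Metadata also makes tables filterable, for example by tag. A small sketch of what such a lookup looks like, with a plain dict standing in for Gaspatchio's registry:

```python
# Toy registry: name -> metadata, standing in for the real table registry
registry = {
    "vbt_2015_complete": {"tags": ["mortality", "vbt"], "basis": "ANB"},
    "lapse_2025": {"tags": ["lapse"]},
}

def tables_tagged(tag):
    # Filter registered tables by a metadata tag
    return sorted(name for name, meta in registry.items()
                  if tag in meta.get("tags", []))
```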