# Gaspatchio Gaspatchio is an actuarial modelling framework that allows you to build and run actuarial models in pure Python but built for use with LLMs from the ground up. # Concepts documentation # Assumptions in Gaspatchio ## Overview Actuarial models rely heavily on assumption tables - mortality rates, lapse rates, expense assumptions, and other factors that drive projections. Gaspatchio provides a high-performance vector-based lookup system with a simple, intuitive API that handles the complexities of assumption table loading and transformation automatically. One core principle of Gaspatchio is to "meet people where they are". With regard to assumption tables, this means recognizing that you've likely already got a table in a format that you like. That might come from Excel, another system, regulatory requirements, or a combination of all of those. Gaspatchio's assumption system is designed to work with any table format and will automatically transform it into a format that is optimized for performance. That's what "meeting people where they are" means to us. Keep your data as it is, and let Gaspatchio do the rest. ## The Table API Gaspatchio's assumption system revolves around the dimension-based Table API: - **`Table()`** - Load and register assumption tables with automatic format detection and dimension configuration. This happens ONCE before you start your projection/run. - **`table.lookup()`** - Perform high-performance vector lookups. This happens for each policy/projection period. Which is A LOT. ```python import gaspatchio_core as gs import polars as pl # Load any assumption table with the Table API mortality_table = gs.Table( name="mortality_rates", source="mortality_table.csv", # or DataFrame dimensions={ "age": "age" # Simple string shorthand for data dimensions }, value="mortality_rate" ) # Use in projections with vector lookups af = af.with_columns( mortality_table.lookup(age=af["age_last"]) ) ``` ## Key Advantages ### 1. **Dimension-Based Design** The API uses explicit dimensions for clarity and flexibility: ```python # Simple curve (1D table) lapse_table = gs.Table( name="lapse_rates", source=lapse_df, dimensions={ "duration": "duration" # Map duration column to duration dimension }, value="lapse_rate" ) # Wide table with melt dimension (age × duration grid) mortality_table = gs.Table( name="mortality_rates", source=mortality_df, dimensions={ "age": "age", "duration": gs.MeltDimension( columns=["1", "2", "3", "4", "5", "Ultimate"], name="duration", overflow=gs.ExtendOverflow("Ultimate", to_value=120) ) }, value="qx" ) # Multi-dimensional table multi_dim_table = gs.Table( name="vbt_2015", source=vbt_df, dimensions={ "age": "age", "sex": "sex", "smoker": "smoker_status" # Can map different column names }, value="mortality_rate" ) ``` ### 2. **Automatic Format Detection with Analysis** Use the `analyze_table()` function to get insights and configuration suggestions: ```python # Analyze any table to understand its structure schema = gs.analyze_table(df) print(schema.suggest_table_config()) # Output: # Table( # name="your_table_name", # source=df, # dimensions={ # "age": "age", # "duration": MeltDimension( # columns=["1", "2", "3", "4", "5", "Ultimate"], # overflow=ExtendOverflow("Ultimate", to_value=120) # ) # }, # value="rate" # ) ``` ### 3. **Smart Overflow Handling** Wide tables often have "Ultimate" or overflow columns for durations beyond the explicit range. The API handles this explicitly: ```python # Table with columns: Age, 1, 2, 3, 4, 5, "Ult." 
mortality_table = gs.Table( name="mortality_table", source=df, dimensions={ "age": "age", "duration": gs.MeltDimension( columns=["1", "2", "3", "4", "5", "Ult."], overflow=gs.ExtendOverflow("Ult.", to_value=120), # Expands to duration 120 fill=gs.LinearInterpolate() # Optional: interpolate gaps ) }, value="rate" ) # Lookups work seamlessly for any duration af = af.with_columns( mortality_table.lookup(age=af["age"], duration=af["duration"]) ) ``` ### 4. **Vector-Native Performance** Handle entire projection vectors without loops or exploding data: ```python # Age progresses as a vector per policy df = df.with_columns( age_vector=[[30, 31, 32, 33, ...]] # 480 months of ages ) # Single lookup returns vector of rates for all ages df = df.with_columns( mortality_table.lookup(age=pl.col("age_vector")) ) # Result: [0.0011, 0.0012, 0.0013, ...] ``` Rust-Powered Multi-Core Performance Gaspatchio's assumption system is **implemented in Rust** and leverages **all available CPU cores** automatically. The core registry (`PyAssumptionTableRegistry`) stores lookup indices as optimized Rust `HashMap` structures, providing: - **O(1) hash-based lookups** regardless of table size - **Zero-copy memory access** through Rust's ownership system - **Automatic parallelization** via Polars' multi-threaded query engine - **SIMD vectorization** for mathematical operations on assumption vectors When you perform a lookup on 1 million policies with 480-month projections (480M total lookups), Gaspatchio distributes the work across all CPU cores simultaneously. A 16-core machine can process assumption lookups **16x faster** than traditional single-threaded approaches. ```python # This single operation uses ALL your CPU cores af = af.with_columns( mortality_table.lookup(age=af["age_vector"]) ) # 480M lookups completed in seconds, not minutes ``` ## Tidy Data Principles Following Tidy Data Best Practices Gaspatchio's assumption system is built around the **tidy data** principles outlined by Hadley Wickham in his seminal 2014 paper "Tidy Data" (Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10). Tidy datasets follow three fundamental rules: 1. **Each variable is a column** - keys (age, duration, gender) and values (mortality rates, lapse rates) are separate columns 1. **Each observation is a row** - each row represents one lookup combination (e.g., age 30 + duration 5 = rate 0.0023) 1. **Each type of observational unit is a table** - mortality assumptions, lapse assumptions, etc. are separate tables ### Why Tidy Assumptions Matter Traditional actuarial tables are often stored in "wide" format - convenient for human reading but inefficient for computation: **Wide Format (Human-Readable)** ```text ┌─────┬──────┬──────┬──────┬──────┐ │ Age │ 1 │ 2 │ 3 │ Ult. 
│ ├─────┼──────┼──────┼──────┼──────┤ │ 30 │0.001 │0.002 │0.003 │0.005 │ │ 31 │0.001 │0.002 │0.003 │0.005 │ └─────┴──────┴──────┴──────┴──────┘ ``` **Tidy Format (Machine-Optimized)** ```text ┌─────┬──────────┬───────┐ │ Age │ Duration │ Rate │ ├─────┼──────────┼───────┤ │ 30 │ 1 │ 0.001 │ │ 30 │ 2 │ 0.002 │ │ 30 │ 3 │ 0.003 │ │ 30 │ 120 │ 0.005 │ │ 31 │ 1 │ 0.001 │ └─────┴──────────┴───────┘ ``` ### Automatic Tidy Transformation The `Table` class with `MeltDimension` automatically converts wide tables to tidy format: ```python # Input: Wide mortality table wide_table = pl.DataFrame({ "age": [30, 31, 32], "1": [0.0011, 0.0012, 0.0013], "2": [0.0012, 0.0013, 0.0014], "3": [0.0013, 0.0014, 0.0015], "Ult.": [0.0050, 0.0051, 0.0052] }) # Automatic tidy transformation mortality_table = gs.Table( name="mortality", source=wide_table, dimensions={ "age": "age", "duration": gs.MeltDimension( columns=["1", "2", "3", "Ult."], name="duration" ) }, value="rate" ) # Result: Tidy table ready for high-performance lookups # Each age/duration combination becomes a separate row ``` The tidy format enables: - **Vectorized lookups**: Query millions of age/duration combinations in microseconds - **Flexible filtering**: Add conditions like gender, smoking status, or product type as additional columns - **Consistent API**: Same lookup pattern works for all assumption types - **Memory efficiency**: No duplicate storage of rates across multiple table formats ## Loading Different Table Types ### Curve Tables (1-Dimensional) For simple tables with one key and one value: ```python # Lapse rates by policy duration lapse_df = pl.DataFrame({ "policy_duration": [1, 2, 3, 4, 5], "lapse_rate": [0.05, 0.04, 0.03, 0.02, 0.01] }) lapse_table = gs.Table( name="lapse_rates", source=lapse_df, dimensions={ "policy_duration": "policy_duration" }, value="lapse_rate" ) ``` ### Wide Tables (Age × Duration Grids) For mortality tables and similar multi-dimensional assumptions: ```python # Mortality table with multiple gender/smoking combinations mortality_table = gs.Table( name="mortality_vbt_2015", source="mortality.parquet", dimensions={ "age-last": "age-last", "variable": gs.MeltDimension( columns=["MNS", "FNS", "MS", "FS"], # Male/Female, Non-Smoker/Smoker name="variable" ) }, value="mortality_rate" ) ``` **Input DataFrame:** ```text ┌──────────┬──────────┬──────────┬──────────┬──────────┐ │ age-last │ MNS │ FNS │ MS │ FS │ ├──────────┼──────────┼──────────┼──────────┼──────────┤ │ 30 │ 0.0011 │ 0.0010 │ 0.0021 │ 0.0019 │ │ 31 │ 0.0012 │ 0.0011 │ 0.0022 │ 0.0020 │ └──────────┴──────────┴──────────┴──────────┴──────────┘ ``` **Automatic transformation to tidy format:** ```text ┌──────────┬──────────┬───────────────┐ │ age-last │ variable │ mortality_rate│ ├──────────┼──────────┼───────────────┤ │ 30 │ MNS │ 0.0011 │ │ 30 │ FNS │ 0.0010 │ │ 30 │ MS │ 0.0021 │ │ 30 │ FS │ 0.0019 │ │ 31 │ MNS │ 0.0012 │ │ 31 │ FNS │ 0.0011 │ └──────────┴──────────┴───────────────┘ ``` ### Tables with Overflow Columns For tables with "Ultimate" or "Term" columns representing rates beyond the explicit duration range: ```python # VBT 2015 table with durations 1-25 plus "Ult." column vbt_table = gs.Table( name="vbt_2015_female_smoker", source="2015-VBT-FSM-ANB.csv", dimensions={ "issue_age": "issue_age", "duration": gs.MeltDimension( columns=[str(i) for i in range(1, 26)] + ["Ult."], name="duration", overflow=gs.ExtendOverflow("Ult.", to_value=120) ) }, value="qx" ) ``` This automatically creates lookup entries for durations 26, 27, 28, ... 
120, all using the "Ultimate" rate from the original table. ## Performing Lookups ### Single-Key Lookups ```python # Simple lapse rate lookup af = af.with_columns( lapse_table.lookup({"policy_duration": af["policy_duration"]}) ) ``` ### Multi-Key Lookups ```python # Mortality lookup with age and gender/smoking status af = af.with_columns( mortality_table.lookup({ "age_last": af["age_last"], "variable": af["gender_smoking"] }) ) ``` ### Vector Lookups The most powerful feature - handle entire projection vectors: ```python # Project 480 months for each policy af = af.with_columns( monthly_ages=af["issue_age"] + (af["projection_months"] / 12), monthly_durations=af["policy_duration"] + (af["projection_months"] / 12) ) # Single lookup returns 480 mortality rates per policy af = af.with_columns( mortality_table.lookup( age=af["monthly_ages"], duration=af["monthly_durations"] ) ) ``` ## Complete Model Example Here's how assumption tables integrate into a complete actuarial model: ```python import gaspatchio_core as gs import polars as pl from gaspatchio_core import ActuarialFrame def setup_assumptions(): """Load all assumption tables for the model""" # Load mortality table (wide format with overflow) mortality_df = pl.read_parquet("assumptions/mortality.parquet") mortality_table = gs.Table( name="mortality_rates", source=mortality_df, dimensions={ "age-last": "age-last", "variable": gs.MeltDimension( columns=["MNS", "FNS", "MS", "FS"], name="variable" ) }, value="mortality_rate" ) # Load lapse curve (simple 1D table) lapse_df = pl.read_parquet("assumptions/lapse.parquet") lapse_table = gs.Table( name="lapse_rates", source=lapse_df, dimensions={ "policy_duration": "policy_duration" }, value="lapse_rate" ) # Load premium rates (wide format) premium_df = pl.read_parquet("assumptions/premium_rates.parquet") premium_table = gs.Table( name="premium_rates", source=premium_df, dimensions={ "age-last": "age-last", "variable": gs.MeltDimension( columns=["MNS", "FNS", "MS", "FS"], name="variable" ) }, value="premium_rate" ) return mortality_table, lapse_table, premium_table def life_model(policies_df): """Complete life insurance projection model""" # Setup assumption tables mortality_table, lapse_table, premium_table = setup_assumptions() # Create ActuarialFrame af = ActuarialFrame(policies_df) # Setup projection vectors (480 months per policy) max_age = 101 af["num_proj_months"] = (max_age - af["age"]) * 12 af["proj_months"] = af.fill_series(af["num_proj_months"], 0, 1) # Calculate age and duration vectors af["age_last"] = (af["age"] + (af["proj_months"] / 12)).floor() af["policy_duration"] = (af["policy_duration"] + (af["proj_months"] / 12)).floor() # Create gender/smoking variable for lookups af["variable"] = af["gender"] + af["smoking_status"] # Vector lookups - get rates for all 480 months at once af["mortality_rate"] = mortality_table.lookup({ "age_last": af["age_last"], "variable": af["variable"] }) af["lapse_rate"] = lapse_table.lookup({ "policy_duration": af["policy_duration"] }) af["premium_rate"] = premium_table.lookup({ "age_last": af["age_last"], "variable": af["variable"] }) # Calculate probabilities and cash flows af["monthly_persist_prob"] = (1 - af["mortality_rate"] / 12) * (1 - af["lapse_rate"] / 12) # Probability in force (cumulative product with shift) af["prob_in_force"] = af["monthly_persist_prob"].list.eval( pl.element().cum_prod().shift(1).fill_null(1.0) ) # Cash flows af["premium_cf"] = af["premium_rate"] / 12 * af["prob_in_force"] * af["sum_assured"] / 1000 af["claims_cf"] = 
af["prob_in_force"] * af["mortality_rate"] / 12 * af["sum_assured"] af["profit_cf"] = af["premium_cf"] - af["claims_cf"] return af # Run the model policies = pl.read_csv("model_points.csv") results = life_model(policies) ``` ## Using the TableBuilder Pattern For complex table configurations, use the `TableBuilder` pattern: ```python # Build a complex table step by step table = ( gs.TableBuilder("complex_mortality") .from_source("mortality_data.csv") .with_data_dimension("issue_age", "issue_age") .with_data_dimension("policy_year", "policy_year") .with_computed_dimension( "attained_age", pl.col("issue_age") + pl.col("policy_year") - 1, "attained_age" ) .with_melt_dimension( "duration", columns=[str(i) for i in range(1, 26)] + ["Ultimate"], overflow=gs.ExtendOverflow("Ultimate", to_value=100) ) .with_value_column("mortality_rate") .build() ) ``` ## Performance Benefits The assumption system provides significant performance improvements: ### 1. **Pre-Computed Expansion** Overflow columns are expanded once at load time, not during every lookup: ```python # Table with durations 1-25 + "Ult." gets expanded to 1-120 immediately table = gs.Table( name="mortality", source=df, dimensions={ "age": "age", "duration": gs.MeltDimension( columns=duration_cols, overflow=gs.ExtendOverflow("Ult.", to_value=120) ) }, value="rate" ) # All lookups are now O(1) hash operations - no overflow logic needed af = af.with_columns( table.lookup({"age": af["age"], "duration": 100}) # duration=100 works instantly ) ``` ### 2. **Vector-Native Operations** No exploding, joining, or reaggregating required: ```python # Traditional approach: explode 1M policies × 480 months = 480M rows # Gaspatchio: 1M policies with 480-element vectors = 1M rows # Single operation handles entire projection af = af.with_columns( mortality_table.lookup({"age": af["age_vector"]}) ) ``` ### 3. **Optimized Hash-Based Lookups** Built on Rust HashMaps for maximum performance: - O(1) lookup time regardless of table size - Efficient memory usage with pre-indexed structures - Integrates with Polars' lazy evaluation for optimal query planning ## API Reference ### `Table` Class ```python gs.Table( name: str, # Table name for lookups source: str | pl.DataFrame, # File path or DataFrame dimensions: dict[str, str | Dimension], # Dimension configuration value: str = "rate", # Name for value column metadata: dict | None = None, # Optional metadata storage validate: bool = True # Enable validation ) -> Table ``` **Parameters:** - **`name`**: Unique identifier for the table in the lookup registry - **`source`**: Either a file path (.csv/.parquet) or a Polars DataFrame - **`dimensions`**: Dictionary mapping dimension names to columns or Dimension objects - **`value`**: Name of the value column in the final tidy table - **`metadata`**: Optional dictionary stored with the table - **`validate`**: Whether to validate dimension configuration ### `table.lookup()` ```python table.lookup( **dimensions: str | pl.Expr # Dimension names mapped to columns/expressions ) -> pl.Expr ``` Returns a Polars expression that performs the lookup. Use within `.with_columns()` or similar Polars operations. 
### Dimension Types - **`DataDimension`**: Maps a column directly to a dimension - **`MeltDimension`**: Transforms wide columns into long format - **`CategoricalDimension`**: Adds a constant categorical value - **`ComputedDimension`**: Creates a dimension from an expression ### Strategy Types - **`ExtendOverflow`**: Extends a specific column value to higher indices - **`AutoDetectOverflow`**: Automatically detects overflow columns - **`LinearInterpolate`**: Fills gaps with linear interpolation - **`FillConstant`**: Fills gaps with a constant value - **`FillForward`**: Forward fills missing values # Assumption Table Examples in Gaspatchio ## Working with Mortality Tables This guide walks through using the 2015 VBT Female Smoker Mortality Table (ANB) as an example to demonstrate how to set up and use assumption tables in Gaspatchio. ### Understanding the Table Structure The 2015 VBT table is structured as follows: - Rows represent issue ages (18-95) - Columns represent policy durations (1-25 plus "Ultimate") - Values represent mortality rates per 1,000 Here's a small sample from the table: | Issue Age | Duration 1 | Duration 2 | Duration 3 | Duration 4 | Duration 5 | Ultimate | Attained Age | | --- | --- | --- | --- | --- | --- | --- | --- | | 30 | 0.20 | 0.25 | 0.31 | 0.38 | 0.45 | 4.84 | 55 | | 31 | 0.21 | 0.26 | 0.34 | 0.42 | 0.51 | 5.35 | 56 | | 32 | 0.22 | 0.28 | 0.37 | 0.47 | 0.58 | 5.93 | 57 | | 33 | 0.23 | 0.31 | 0.42 | 0.53 | 0.65 | 6.59 | 58 | | 34 | 0.25 | 0.35 | 0.48 | 0.61 | 0.73 | 7.31 | 59 | ### Loading the Assumption Table Loading assumption tables is straightforward with the dimension-based API. Gaspatchio provides tools to analyze table structure and configure dimensions: ```python import gaspatchio_core as gs # First, analyze the table structure (optional but helpful) df = pl.read_csv("2015-VBT-FSM-ANB.csv") schema = gs.analyze_table(df) print(schema.suggest_table_config()) # Load the mortality table with dimension configuration vbt_table = gs.Table( name="vbt_2015_female_smoker", source="2015-VBT-FSM-ANB.csv", dimensions={ "Issue Age": "Issue Age", # Simple data dimension "duration": gs.MeltDimension( columns=[str(i) for i in range(1, 26)] + ["Ultimate"], name="duration", overflow=gs.ExtendOverflow("Ultimate", to_value=200) ) }, value="mortality_rate" ) ``` The API explicitly configures: - Data dimensions (like Issue Age) that map directly from columns - Melt dimensions that transform wide columns (1-25, Ultimate) into long format - Overflow strategies that expand "Ultimate" values to higher durations - The value column name for the melted rates After loading, the internal data looks like this: | Issue Age | duration | mortality_rate | | --- | --- | --- | | 30 | 1 | 0.20 | | 30 | 2 | 0.25 | | 30 | 3 | 0.31 | | 30 | 4 | 0.38 | | 30 | 5 | 0.45 | | 30 | 26 | 4.84 | | 30 | 27 | 4.84 | | 30 | 150 | 4.84 | | ... | ... | ... 
| ### Using the Assumption Table in ActuarialFrame Now we can use this table for lightning-fast lookups: ```python # Create a simple policy dataset policy_data = pl.DataFrame({ "policy_id": ["A001", "A002", "A003", "A004"], "issue_age": [30, 35, 40, 45], "duration": [1, 3, 5, 10] }) # Convert to ActuarialFrame af = gs.ActuarialFrame(policy_data) # Look up mortality rates using the table's lookup method af = af.with_columns( vbt_table.lookup({ "Issue Age": af["issue_age"], "duration": af["duration"] }).alias("mortality_rate") ) print(af) ``` Result: ```text shape: (4, 4) ┌──────────┬───────────┬──────────┬───────────────┐ │ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ f64 │ ╞══════════╪═══════════╪══════════╪═══════════════╡ │ A001 ┆ 30 ┆ 1 ┆ 0.20 │ │ A002 ┆ 35 ┆ 3 ┆ 0.54 │ │ A003 ┆ 40 ┆ 5 ┆ 1.15 │ │ A004 ┆ 45 ┆ 10 ┆ 4.10 │ └──────────┴───────────┴──────────┴───────────────┘ ``` ### Working with Overflow Durations The beauty of the API is that overflow handling is completely transparent. Even extreme durations work instantly: ```python # Test with durations beyond the table (> 25) extreme_data = pl.DataFrame({ "policy_id": ["X001", "X002"], "issue_age": [30, 40], "duration": [50, 100] # Way beyond table max of 25! }) af_extreme = gs.ActuarialFrame(extreme_data) af_extreme = af_extreme.with_columns( vbt_table.lookup({ "Issue Age": af_extreme["issue_age"], "duration": af_extreme["duration"] }).alias("mortality_rate") ) print(af_extreme) ``` Result: ```text shape: (2, 4) ┌──────────┬───────────┬──────────┬────────────────┐ │ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ f64 │ ╞══════════╪═══════════╪══════════╪════════════════╡ │ X001 ┆ 30 ┆ 50 ┆ 4.84 │ │ X002 ┆ 40 ┆ 100 ┆ 9.32 │ └──────────┴───────────┴──────────┴────────────────┘ ``` Both policies get the "Ultimate" rate because the `ExtendOverflow` strategy pre-expanded the overflow during loading. 
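If you want to see what the `MeltDimension` plus `ExtendOverflow` combination does to your data, the sketch below reproduces the same wide-to-tidy expansion in plain Polars (assuming a recent Polars version with `unpivot`; older versions call it `melt`). Gaspatchio performs this transformation for you in Rust at load time; the snippet is only an illustration of the resulting tidy shape, with made-up ages and rates.

```python
import polars as pl

# A tiny wide table: two select durations plus an "Ult." overflow column
wide = pl.DataFrame({
    "age": [30, 31],
    "1": [0.0011, 0.0012],
    "2": [0.0012, 0.0013],
    "Ult.": [0.0050, 0.0051],
})

# Step 1 - melt the duration columns into tidy (long) format
long = wide.unpivot(
    on=["1", "2", "Ult."],
    index="age",
    variable_name="duration",
    value_name="rate",
)

# Step 2 - replicate each age's "Ult." rate for every duration from 3 to 120
ultimate = (
    long.filter(pl.col("duration") == "Ult.")
    .drop("duration")
    .join(pl.DataFrame({"duration": list(range(3, 121))}), how="cross")
)

# Step 3 - combine the explicit durations with the expanded ultimate rows
tidy = pl.concat([
    long.filter(pl.col("duration") != "Ult.")
        .with_columns(pl.col("duration").cast(pl.Int64)),
    ultimate.select("age", "duration", "rate"),
]).sort("age", "duration")

print(tidy)  # ages 30-31 x durations 1..120, one rate per row
```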
### Projecting Multiple Periods Gaspatchio's vector-based approach works seamlessly with the API: ```python # Create a policy with projection over multiple durations policy_projection = pl.DataFrame({ "policy_id": ["B001"], "issue_age": [30], "duration": [[1, 2, 3, 4, 5, 25, 26, 50, 100]] # Mix of regular and overflow }) af_proj = gs.ActuarialFrame(policy_projection) # Look up mortality rates for all durations at once af_proj = af_proj.with_columns( vbt_table.lookup({ "Issue Age": af_proj["issue_age"], "duration": af_proj["duration"] }).alias("mortality_rate") ) # Explode for visualization result = af_proj.explode(["duration", "mortality_rate"]) print(result) ``` Result: ```text shape: (9, 4) ┌──────────┬───────────┬──────────┬───────────────┐ │ policy_id ┆ issue_age ┆ duration ┆ mortality_rate │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ f64 │ ╞══════════╪═══════════╪══════════╪═══════════════╡ │ B001 ┆ 30 ┆ 1 ┆ 0.20 │ │ B001 ┆ 30 ┆ 2 ┆ 0.25 │ │ B001 ┆ 30 ┆ 3 ┆ 0.31 │ │ B001 ┆ 30 ┆ 4 ┆ 0.38 │ │ B001 ┆ 30 ┆ 5 ┆ 0.45 │ │ B001 ┆ 30 ┆ 25 ┆ 4.12 │ │ B001 ┆ 30 ┆ 26 ┆ 4.84 │ │ B001 ┆ 30 ┆ 50 ┆ 4.84 │ │ B001 ┆ 30 ┆ 100 ┆ 4.84 │ └──────────┴───────────┴──────────┴───────────────┘ ``` ### Loading Simple Curves For 1-dimensional tables (like lapse rates by age), the API is even simpler: ```python # Load a simple age → lapse rate curve lapse_table = gs.Table( name="lapse_2025", source="lapse_curve.csv", dimensions={ "age": "age" # Simple string shorthand }, value="lapse_rate" ) # Use it immediately af = af.with_columns( lapse_table.lookup({"age": af["age"]}).alias("lapse_rate") ) ``` ### Advanced Features For more complex scenarios, you have full control with the dimension-based API: ```python # Multi-dimensional table with selective column loading mortality_table = gs.Table( name="mortality_by_gender", source="mortality_m_f.csv", dimensions={ "age": "age", "gender": gs.MeltDimension( columns=["Male", "Female"], name="gender" ) }, value="mortality_rate" ) # Table with custom overflow limits salary_table = gs.Table( name="salary_scale", source="salary_by_service.csv", dimensions={ "grade": "grade", "service": gs.MeltDimension( columns=[str(i) for i in range(1, 21)] + ["20+"], name="service", overflow=gs.ExtendOverflow("20+", to_value=50) ) }, value="scale_factor" ) # Using computed dimensions complex_table = gs.Table( name="complex_assumptions", source=df, dimensions={ "issue_age": "issue_age", "policy_year": "policy_year", "attained_age": gs.ComputedDimension( pl.col("issue_age") + pl.col("policy_year") - 1, "attained_age" ) }, value="assumption_value" ) ``` ### Using the TableBuilder Pattern For step-by-step table construction, use the fluent `TableBuilder` API: ```python # Build a complex mortality table mortality_table = ( gs.TableBuilder("mortality_select_ultimate") .from_source("mortality_su.csv") .with_data_dimension("issue_age", "IssueAge") .with_data_dimension("gender", "Gender") .with_melt_dimension( "duration", columns=[f"Dur{i}" for i in range(1, 16)] + ["Ultimate"], overflow=gs.ExtendOverflow("Ultimate", to_value=100), fill=gs.LinearInterpolate() # Interpolate any gaps ) .with_value_column("qx_rate") .build() ) # The table is ready for lookups af = af.with_columns( mortality_table.lookup({ "issue_age": af["age"], "gender": af["sex"], "duration": af["policy_duration"] }).alias("mortality_rate") ) ``` ### Metadata and Table Discovery Tables can include metadata for documentation and discovery: ```python # Create table with rich metadata vbt_table = gs.Table( name="vbt_2015_complete", 
source="vbt_2015_all.csv", dimensions={ "age": "Age", "gender": "Gender", "smoking": "Smoker", "duration": gs.MeltDimension( columns=duration_columns, name="duration", overflow=gs.ExtendOverflow("Ultimate", to_value=120) ) }, value="mortality_rate", metadata={ "source": "2015 Valuation Basic Table", "basis": "ANB", "version": "2015", "effective_date": "2015-01-01", "description": "Industry standard mortality table", "tags": ["mortality", "vbt", "2015", "standard"] } ) # Discover tables all_tables = gs.list_tables() print(f"Available tables: {all_tables}") # Get metadata for a specific table metadata = gs.get_table_metadata("vbt_2015_complete") print(f"Table metadata: {metadata}") # List all tables with metadata tables_info = gs.list_tables_with_metadata() for name, meta in tables_info.items(): print(f"{name}: {meta.get('description', 'No description')}") ``` # Core Concepts in Gaspatchio ## Introduction Gaspatchio is a Python library designed specifically for actuarial modeling. It provides a domain-specific language (DSL) that makes it easier to express complex actuarial calculations while maintaining performance and readability. If you're a modeling actuary with Python experience, this library builds on concepts you might already know from pandas, but with specific optimizations and features for actuarial work. ## ActuarialFrame: The Foundation At the heart of Gaspatchio is the `ActuarialFrame`, a powerful alternative to pandas DataFrames. While pandas is excellent for general data manipulation, actuarial models often require: - Handling of projection periods across many time steps - Complex calculation dependencies - Performance optimization for large datasets - Vectorized operations on grouped data `ActuarialFrame` addresses these needs by wrapping [Polars](https://pola.rs), a lightning-fast DataFrame library, and adding actuarial-specific functionality. ```python from gaspatchio_core.dsl.core import ActuarialFrame # Create an ActuarialFrame from existing data af = ActuarialFrame(your_data) # Set calculation columns using natural Python syntax af["age-last"] = af.floor(af["age"]).cast(pl.Int64) af["premium_rate"] = af["base_rate"] * af["age_factor"] ``` ## Key Differences from pandas If you're familiar with pandas, here are the main differences: 1. **Lazy Evaluation**: Operations are captured and optimized before execution, rather than being executed immediately 1. **Expression Tracking**: The library tracks how columns are derived, enabling model auditing and optimization 1. **Actuarial Functions**: Built-in functions for common actuarial operations 1. **Performance Modes**: Debug and optimize modes to balance development speed with production performance ## Modeling Approach Gaspatchio encourages a functional, pipeline-based approach to model building: ```python def life_model(af): # Chain operations using pipe for cleaner code af = (setup_ages(af) .pipe(mortality_rate) .pipe(lapse_rate) .pipe(premium_rate)) # Define cashflows af["premium_cashflow"] = af["premium_rate"] * af["P[IF]"] * af["sum_assured"] / 1000 af["claims_cashflow"] = af["P[death]"] * af["sum_assured"] return af ``` This approach makes models more testable, maintainable and readable. - Testable: Each function can be tested independently. - Maintainable: Clear separation of concerns. - Readable: Operations flow naturally from inputs to outputs. ## Performance Optimization Gaspatchio provides two execution modes: 1. **Debug Mode**: Direct execution for easier debugging and development 1. 
**Optimize Mode**: Captures operations to optimize before execution, with optional Numba acceleration ```python # Set mode globally from gaspatchio_core.dsl.core import set_default_mode set_default_mode("optimize") # Or per frame af = ActuarialFrame(data, mode="optimize") ``` ## Table Lookups and Assumptions The library provides efficient ways to handle assumption tables common in actuarial work: ```python # Register a mortality table registry.register_table( name="mortality_rates", df=mortality_df, keys=["age-last", "gender_smoking"], value_column="mortality_rate" ) # Look up values in models af["mortality_rate"] = assumption_lookup( "age-last", "gender_smoking", table_name="mortality_rates" ) ``` ## Getting Started To start building your first model with Gaspatchio, you need to: 1. Define your model points (policy data) 1. Create an ActuarialFrame from your data 1. Define projection functions 1. Run your model with the `run_model` function For detailed examples and API documentation, see the subsequent sections of this guide. # Integrating Custom Python Code As an actuary using Gaspatchio, you might have existing Python functions or complex logic you want to integrate into your models. Perhaps you have a specific benefit calculation, a complex decrement logic, or a custom reserving method implemented in Python. Gaspatchio provides two primary ways to incorporate this custom logic into the `ActuarialFrame` workflow: 1. **Direct Application (`.apply`)**: For quick, one-off use cases or simple functions. 1. **Accessor Plugins**: For more complex, reusable logic that benefits from better organization and integration. ## 1. Direct Application with `.apply()` If you have a relatively simple Python function that operates on a single column's data element-wise, the quickest way to use it is via the `.apply()` method on a column proxy. Let's say you have a Python function to calculate a simple bonus amount based on the policy duration: ```python # Your existing Python function def calculate_bonus(duration: int) -> float: if duration <= 5: return 0.0 elif duration <= 10: return 50.0 else: return 100.0 + (duration - 10) * 5.0 ``` You can apply this directly within your model definition: ```python import polars as pl from gaspatchio_core.dsl.core import ActuarialFrame # Assume 'af' is your ActuarialFrame with a 'policy_duration' column # af = ActuarialFrame(...) # Apply the custom Python function # Note: We provide a return_dtype for better performance and type stability af["bonus_amount"] = af["policy_duration"].apply( calculate_bonus, return_dtype=pl.Float64 ) result = af.collect() print(result) ``` **Pros:** - **Simple:** Very straightforward for existing functions. - **Quick:** No extra setup required for one-off calculations. **Cons:** - **Performance:** Python function execution can be slower than native Polars/Gaspatchio operations, especially for large datasets. Providing `return_dtype` helps, but it won't be as fast as a pure expression. Gaspatchio might attempt Numba optimization in "optimize" mode if Numba is installed, but this isn't guaranteed. - **Readability:** Can clutter model logic if many complex `.apply` calls are used. - **Reusability:** Less discoverable and reusable across different models compared to plugins. - **Limited Scope:** Primarily designed for element-wise operations on single columns. 
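For comparison, here is the same tiered bonus written as a native Polars expression instead of a row-by-row Python function. This is a sketch of the rewrite that the performance note below recommends: it relies on the standard Polars `when/then/otherwise` pattern and on `with_columns`, which accepts raw Polars expressions just like the lookup examples earlier. The accessor plugin in the next section wraps exactly this kind of expression.

```python
import polars as pl
from gaspatchio_core.dsl.core import ActuarialFrame

af = ActuarialFrame({"policy_duration": [3, 7, 12]})

# Tiered bonus as a vectorized when/then/otherwise expression
# (no per-row Python calls, so Polars can evaluate it on whole columns)
bonus_expr = (
    pl.when(pl.col("policy_duration") <= 5).then(0.0)
    .when(pl.col("policy_duration") <= 10).then(50.0)
    .otherwise(100.0 + (pl.col("policy_duration") - 10) * 5.0)
    .alias("bonus_amount")
)

af = af.with_columns(bonus_expr)
print(af.collect())  # expected bonus amounts: 0.0, 50.0, 110.0
```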
Use `.apply()` when you need a quick integration and the performance impact is acceptable, or when prototyping logic before potentially converting it into a more optimized expression or plugin.

### Performance Considerations with `.apply()`

Using `.apply()` executes your Python function row by row. This involves overhead for each element (calling the Python interpreter, type checking, etc.) and prevents vectorized optimizations that operate on entire columns simultaneously. As a result, it can be **orders of magnitude slower** than equivalent logic written using native Polars/Gaspatchio expressions, especially on large datasets.

You might see a `PerformanceWarning` when using `.apply()` similar to this:

```text
PerformanceWarning: Applying a Python function 'your_function_name' using map_elements. This is potentially slow. For better performance, consider using Polars expressions directly.
```

While convenient for quick tests or simple logic, relying heavily on `.apply()` for core calculations will significantly impact your model's performance. It's strongly recommended to rewrite the logic using native Polars expressions or within an accessor plugin for production use, as shown in the next section.

## 2. Accessor Plugins (Recommended for Reusability)

If your custom logic is more complex, will be reused across different models, or involves multiple related calculations, creating an **accessor plugin** is the recommended approach.

Accessor plugins extend `ActuarialFrame` (or its column/expression proxies) with custom namespaces. Think of the built-in `.dt` (for dates) or `.str` (for strings) namespaces in Polars – plugins let you create your own, like `.mortality` or `.reserving`.

### Why Create a Plugin?

- **Organization:** Group related custom calculations under a single namespace (e.g., `af["premium"].finance.present_value(...)`).
- **Reusability:** Define logic once and use it across multiple models or share it with colleagues.
- **Readability:** Keeps model definitions cleaner by encapsulating complex logic within accessor methods.
- **Discoverability:** Makes custom functions easily discoverable via standard attribute access (and `dir()`).
- **Potential for Optimization:** Accessor methods can be written to leverage efficient Polars expressions internally.

### Creating a Simple Column Accessor

Let's adapt our `calculate_bonus` function into a reusable column accessor plugin.
**Step 1: Define the Accessor Class** Create a Python file (e.g., `my_company_accessors.py`) and define your class: ```python # my_company_accessors.py import polars as pl from gaspatchio_core.dsl.core import ActuarialFrame, ColumnProxy, ExpressionProxy from gaspatchio_core.dsl.plugins import register_accessor class BaseAccessor: """Optional base class for convenience.""" def __init__(self, obj): # obj will be the ColumnProxy or ExpressionProxy instance self._obj = obj @register_accessor("bonus", kind="column") # Register as .bonus for columns/expressions class BonusAccessor(BaseAccessor): def amount(self) -> ExpressionProxy: """Calculates the bonus amount based on the proxied duration column.""" # We use Polars expressions *inside* the accessor for performance duration_expr = self._obj # self._obj is the duration column/expression proxy bonus_expr = ( pl.when(duration_expr <= 5).then(0.0) .when(duration_expr <= 10).then(50.0) .otherwise(100.0 + (duration_expr - 10) * 5.0) .cast(pl.Float64) # Ensure consistent output type ) # Important: Return an ExpressionProxy # We assume self._obj has a ._parent attribute (true for Column/ExpressionProxy) return ExpressionProxy(bonus_expr, self._obj._parent) def is_eligible(self, threshold: int = 5) -> ExpressionProxy: """Checks if bonus is eligible based on duration.""" duration_expr = self._obj eligibility_expr = duration_expr > threshold return ExpressionProxy(eligibility_expr, self._obj._parent) # IMPORTANT: Ensure this module (my_company_accessors.py) is imported somewhere # in your application *after* gaspatchio_core.dsl.core is defined. # e.g., in __init__.py or main.py: # import my_company_accessors ``` **Key Points:** - `@register_accessor("bonus", kind="column")`: This decorator registers the `BonusAccessor` class. It will be available as `.bonus` on `ColumnProxy` and `ExpressionProxy` instances. - `__init__(self, obj)`: Stores the proxy object (`ColumnProxy` or `ExpressionProxy`) the accessor is attached to. - `amount(self)`: Implements the bonus logic using efficient Polars `when/then/otherwise` expressions instead of a Python function. It returns a new `ExpressionProxy`. - Returning `ExpressionProxy`: Accessor methods that perform calculations should generally return `ExpressionProxy` objects to keep the operations within the Gaspatchio/Polars expression system for optimal performance and lazy evaluation. **Step 2: Import Your Accessor Module** Somewhere in your project (e.g., your main script or a relevant `__init__.py`), make sure to import the module containing your accessor definition. This triggers the registration decorator. ```python # main_model.py import polars as pl from gaspatchio_core.dsl.core import ActuarialFrame import my_company_accessors # <--- Import to register .bonus accessor af = ActuarialFrame({ "policy_duration": [3, 7, 12] }) # Use the accessor! af["bonus_amount"] = af["policy_duration"].bonus.amount() af["is_bonus_eligible"] = af["policy_duration"].bonus.is_eligible() # You can chain accessors with other operations af["eligible_bonus"] = af["is_bonus_eligible"] * af["bonus_amount"] print(af.collect()) ``` ### Frame Accessors and Entry Points You can also create `frame` accessors (`kind="frame"`) that attach to the `ActuarialFrame` itself, useful for portfolio-level calculations. 
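As a sketch of what a frame accessor might look like, the example below registers a portfolio-level namespace with `kind="frame"`. The accessor name, the `__init__` contract (receiving the `ActuarialFrame` instance), and the `premium_share` method are assumptions made for illustration, not part of the library.

```python
# my_company_accessors.py (continued) - a sketch of a frame-level accessor.
import polars as pl
from gaspatchio_core.dsl.core import ActuarialFrame
from gaspatchio_core.dsl.plugins import register_accessor


@register_accessor("portfolio", kind="frame")  # available as af.portfolio
class PortfolioAccessor:
    def __init__(self, frame):
        # frame is assumed to be the ActuarialFrame the accessor is attached to
        self._frame = frame

    def premium_share(self, group_col: str = "policy_id") -> ActuarialFrame:
        """Each row's premium as a share of its group's total premium."""
        af = self._frame
        af["group_premium"] = af["premium"].sum().over(group_col)
        af["premium_share"] = af["premium"] / af["group_premium"]
        return af


# Usage (illustrative):
# af = af.portfolio.premium_share(group_col="policy_id")
```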
Furthermore, if you are developing a package of reusable actuarial components, you can use **entry points** to make your accessors automatically discoverable when someone installs your package, without requiring them to explicitly import your accessor module. These are more advanced topics covered in the technical reference documentation. For most users integrating their own project-specific code, the `@register_accessor` decorator provides the best balance of organization and ease of use. Choose the method that best suits the complexity and reusability needs of your custom Python code. For simple, infrequent use, `.apply()` is sufficient. For structured, reusable, and potentially performance-critical logic, invest the time to create an accessor plugin. - Why Polars? - Shimming polars - Column wise operations # API ## `gaspatchio_core.frame.base.ActuarialFrame` A lazy, chainable, and traceable DataFrame for actuarial modeling. The ActuarialFrame provides a high-level API for common actuarial calculations and data manipulations, leveraging Polars LazyFrames for performance. It supports tracing of operations for optimization and introspection, and provides convenient accessors for specialized functionality (e.g., date, finance, excel operations). Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `data` | `dict | DataFrame | LazyFrame | None` | Initial data to populate the frame. Can be a Python dictionary, a Polars DataFrame, or a Polars LazyFrame. If None, an empty frame is initialized. Defaults to None. | `None` | | `mode` | `str | None` | The operational mode: "run", "optimize", or "debug". - "run": Executes operations eagerly. - "optimize": Defers execution and builds a computation graph. - "debug": Provides more verbose output. Defaults to the global default mode (get_default_mode). | `None` | | `verbose` | `bool | None` | Enables or disables verbose logging. Defaults to the global default verbosity (get_default_verbose). | `None` | | `threads` | `int | None` | Number of threads for parallel operations. Defaults to a system-dependent value or \_DEFAULT_THREADS. | `None` | Attributes: | Name | Type | Description | | --- | --- | --- | | `date` | `DateFrameAccessor` | Accessor for date-related operations. | | `excel` | `ExcelFrameAccessor` | Accessor for Excel-like operations. | | `finance` | `FinanceFrameAccessor` | Accessor for financial calculations. | | `columns` | `list[str]` | A list of column names in their current order. | Examples: **Initialization and Basic Operations** ```pycon >>> from gaspatchio_core import ActuarialFrame >>> data = { ... "policy_id": [1, 1, 2, 2, 3], ... "inception_date": ["2020-01-01", "2020-01-01", "2021-05-10", "2021-05-10", "2022-02-20"], ... "premium": [100, 150, 200, 50, 300], ... "claims": [0, 50, 10, 0, 120] ... 
} >>> af = ActuarialFrame(data) >>> af["loss_ratio"] = af["claims"] / af["premium"] >>> result = af.collect() >>> print(result.head(3)) shape: (3, 5) ┌───────────┬────────────────┬─────────┬────────┬────────────┐ │ policy_id ┆ inception_date ┆ premium ┆ claims ┆ loss_ratio │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ str ┆ i64 ┆ i64 ┆ f64 │ ╞═══════════╪════════════════╪═════════╪════════╪════════════╡ │ 1 ┆ 2020-01-01 ┆ 100 ┆ 0 ┆ 0.0 │ │ 1 ┆ 2020-01-01 ┆ 150 ┆ 50 ┆ 0.333333 │ │ 2 ┆ 2021-05-10 ┆ 200 ┆ 10 ┆ 0.05 │ └───────────┴────────────────┴─────────┴────────┴────────────┘ ``` **Using `sum` over a group** ```pycon >>> af = ActuarialFrame(data) >>> af["total_premium_per_policy"] = af["premium"].sum().over("policy_id") >>> result_with_sum = af.collect() >>> print(result_with_sum) shape: (5, 5) ┌───────────┬────────────────┬─────────┬────────┬──────────────────────────┐ │ policy_id ┆ inception_date ┆ premium ┆ claims ┆ total_premium_per_policy │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ str ┆ i64 ┆ i64 ┆ i64 │ ╞═══════════╪════════════════╪═════════╪════════╪══════════════════════════╡ │ 1 ┆ 2020-01-01 ┆ 100 ┆ 0 ┆ 250 │ │ 1 ┆ 2020-01-01 ┆ 150 ┆ 50 ┆ 250 │ │ 2 ┆ 2021-05-10 ┆ 200 ┆ 10 ┆ 250 │ │ 2 ┆ 2021-05-10 ┆ 50 ┆ 0 ┆ 250 │ │ 3 ┆ 2022-02-20 ┆ 300 ┆ 120 ┆ 300 │ └───────────┴────────────────┴─────────┴────────┴──────────────────────────┘ ``` **Using an accessor (e.g., date accessor)** Assume 'inception_date' needs to be parsed to a date type first. For simplicity, let's imagine it's already a date type for this example. (Actual parsing would use `af["inception_date"].str.to_date("%Y-%m-%d")` or similar) ```pycon >>> # If 'inception_date' was a date type: >>> # af["inception_year"] = af.date.year("inception_date") >>> # af_with_year = af.collect() >>> # print(af_with_year.select(["policy_id", "inception_year"])) ``` ### `columns` Return the names of the columns in the current order. ### `date` Access date-related frame operations. ### `excel` Access excel-related frame operations. ### `finance` Access finance-related frame operations. ### `__dir__()` Enhance dir() output to include standard methods, df methods, and accessors. ### `__getattr__(name)` Dynamically instantiate and return registered frame accessors. ### `__getitem__(key)` Allow df['column'] access, returning a ColumnProxy. ### `__repr__()` Return a string representation of the ActuarialFrame. ### `__setitem__(key, value)` Handle column assignment using df['column'] = value. ### `collect()` Execute and materialize the dataframe. ### `count()` Count non-null values in each column. Returns a single-row frame containing the count of non-null values for each column. Essential for data quality assessment, completeness checks, and exposure calculations in actuarial analysis. When to use - **Data Quality:** Assess completeness of critical fields like policy ID, sum assured, or premium to identify missing data issues. - **Exposure Calculation:** Count policies, lives, or claims for exposure-based calculations in pricing and reserving. - **Cohort Analysis:** Determine size of different risk groups, age bands, or product segments for credibility assessment. - **Validation:** Verify record counts match expected values after data processing, joins, or filtering operations. ##### Returns pl.DataFrame A frame with one row containing non-null counts for each column. 
##### Examples **Scalar Example: Data Completeness Check** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002", "P003", "P004", None], "age": [25, 45, None, 35, 52], "sum_assured": [100000, 500000, 250000, None, 300000], "status": ["Active", "Active", "Lapsed", "Active", "Active"], } af = ActuarialFrame(data) counts = af.count() print(counts) print("Complete policies:", counts["policy_id"]) print("Complete ages:", counts["age"]) print("Data completeness %:", counts["age"] / 5 * 100) ``` ```text shape: (1, 4) ┌───────────┬─────┬─────────────┬────────┐ │ policy_id ┆ age ┆ sum_assured ┆ status │ │ --- ┆ --- ┆ --- ┆ --- │ │ u32 ┆ u32 ┆ u32 ┆ u32 │ ╞═══════════╪═════╪═════════════╪════════╡ │ 4 ┆ 4 ┆ 4 ┆ 5 │ └───────────┴─────┴─────────────┴────────┘ Complete policies: 4 Complete ages: 4 Data completeness %: 80.0 ``` **Vector Example: Monthly Activity Counts** ```python from gaspatchio_core import ActuarialFrame data = { "month": ["Jan", "Feb"], "daily_claims": [ [5, 3, 0, 4, None, 2, 1, 0, 3, None, 4, 2, 0, 1, 5], [2, None, 3, 1, 0, 4, None, 2, 0, 3, 1, None, 4, 2, 0] ], "daily_lapses": [ [1, 0, 0, 2, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 1], [0, 1, 0, 0, 2, 0, 1, 0, 1, 0, 0, 1, 0, 2, 0] ] } af = ActuarialFrame(data) # Count valid daily observations counts = af.count() print(counts) ``` ```text shape: (1, 3) ┌───────┬──────────────┬──────────────┐ │ month ┆ daily_claims ┆ daily_lapses │ │ --- ┆ --- ┆ --- │ │ u32 ┆ u32 ┆ u32 │ ╞═══════╪══════════════╪══════════════╡ │ 2 ┆ 2 ┆ 2 │ └───────┴──────────────┴──────────────┘ ``` ### `fill_series(column, start=0, increment=1)` Apply fill_series using the core function. ### `get_column_order()` Return the tracked order of columns. ### `max()` Calculate maximum values across all numeric columns. Returns a single-row frame containing the maximum value for each column. Essential for identifying outliers, validating data ranges, and determining upper bounds in actuarial calculations. When to use - **Data Validation:** Identify outliers in premium amounts, sum assured, or claim values that may require investigation. - **Experience Analysis:** Find maximum claim amounts, policy sizes, or ages in a portfolio for risk assessment. - **Regulatory Reporting:** Determine maximum exposure amounts for solvency calculations and stress testing. - **Pricing Boundaries:** Identify upper limits for age bands, benefit amounts, or policy terms in product design. ##### Returns pl.DataFrame A frame with one row containing maximum values for each column. 
##### Examples **Scalar Example: Portfolio Maximum Values** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002", "P003", "P004"], "age": [25, 45, 67, 35], "sum_assured": [100000, 500000, 250000, 1000000], "annual_premium": [1200, 6000, 8500, 15000], } af = ActuarialFrame(data) max_values = af.max() print(max_values) print("Max age:", max_values["age"][0]) print("Max sum assured:", max_values["sum_assured"][0]) ``` ```text shape: (1, 4) ┌───────────┬─────┬─────────────┬────────────────┐ │ policy_id ┆ age ┆ sum_assured ┆ annual_premium │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 │ ╞═══════════╪═════╪═════════════╪════════════════╡ │ P004 ┆ 67 ┆ 1000000 ┆ 15000 │ └───────────┴─────┴─────────────┴────────────────┘ Max age: 67 Max sum assured: 1000000 ``` **Vector Example: Maximum Monthly Claims** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002"], "policy_year": [1, 2], "monthly_claims": [ [0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500], [0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0] ], "monthly_premiums": [ [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000], [1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500, 1500] ] } af = ActuarialFrame(data) # Get maximum values to understand worst-case scenarios max_values = af.max() print(max_values) print("Max policy year:", max_values["policy_year"][0]) ``` ```text shape: (1, 4) ┌───────────┬─────────────┬─────────────────────────────────────┬─────────────────────────────────────┐ │ policy_id ┆ policy_year ┆ monthly_claims ┆ monthly_premiums │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ list[i64] ┆ list[i64] │ ╞═══════════╪═════════════╪═════════════════════════════════════╪═════════════════════════════════════╡ │ P002 ┆ 2 ┆ [0, 500, 3000, 1200, … 4000, 0, 0] ┆ [1500, 1500, 1500, 1500, … 1500] │ └───────────┴─────────────┴─────────────────────────────────────┴─────────────────────────────────────┘ Max policy year: 2 ``` ### `mean()` Calculate mean values across all numeric columns. Returns a single-row frame containing the mean value for each numeric column. Essential for portfolio analysis, experience studies, and establishing benchmarks in actuarial calculations. When to use - **Experience Analysis:** Calculate average claim amounts, policy sizes, or premium levels for portfolio segmentation and pricing. - **Trend Analysis:** Determine average lapse rates, mortality rates, or expense ratios over observation periods. - **Benchmarking:** Establish portfolio averages for age, sum assured, or duration to compare against industry standards. - **Reserve Calculations:** Compute average policy values, benefit amounts, or reserve factors for grouped calculations. ##### Returns pl.DataFrame A frame with one row containing mean values for numeric columns. 
##### Examples

**Scalar Example: Portfolio Averages**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["P001", "P002", "P003", "P004"],
    "age": [25, 45, 67, 35],
    "sum_assured": [100000, 500000, 250000, 1000000],
    "annual_premium": [1200, 6000, 8500, 15000],
}

af = ActuarialFrame(data)
mean_values = af.mean()
print(mean_values)
print("Average age:", mean_values["age"])
print("Average sum assured:", mean_values["sum_assured"])
```

```text
shape: (1, 3)
┌──────┬─────────────┬────────────────┐
│ age  ┆ sum_assured ┆ annual_premium │
│ ---  ┆ ---         ┆ ---            │
│ f64  ┆ f64         ┆ f64            │
╞══════╪═════════════╪════════════════╡
│ 43.0 ┆ 462500.0    ┆ 7675.0         │
└──────┴─────────────┴────────────────┘
Average age: 43.0
Average sum assured: 462500.0
```

**Vector Example: Average Monthly Experience**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["P001", "P002"],
    "policy_year": [1, 2],
    "monthly_claims": [
        [0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500],
        [0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0]
    ],
    "monthly_lapses": [
        [2, 1, 3, 0, 1, 2, 1, 0, 1, 0, 2, 1],
        [1, 0, 2, 1, 0, 1, 0, 1, 0, 2, 1, 0]
    ]
}

af = ActuarialFrame(data)

# Get average monthly experience
mean_values = af.mean()
print(mean_values)
```

```text
shape: (1, 3)
┌─────────────┬─────────────────────────────┬────────────────────────┐
│ policy_year ┆ monthly_claims              ┆ monthly_lapses         │
│ ---         ┆ ---                         ┆ ---                    │
│ f64         ┆ list[f64]                   ┆ list[f64]              │
╞═════════════╪═════════════════════════════╪════════════════════════╡
│ 1.5         ┆ [0.0, 250.0, 1500.0, … 0.0] ┆ [1.5, 0.5, 2.5, … 0.5] │
└─────────────┴─────────────────────────────┴────────────────────────┘
```

### `median()`

Calculate median values across all numeric columns.

Returns a single-row frame containing the median value for each numeric column. Useful for robust central tendency measures that are less affected by outliers in actuarial data.

When to use

- **Robust Analysis:** Use median instead of mean when data contains outliers, such as large claims or extreme ages in the portfolio.
- **Income Analysis:** Analyze median policyholder income or premium levels for market segmentation and product design.
- **Experience Studies:** Calculate median time to claim, policy duration, or age at lapse for more representative measures.
- **Pricing Benchmarks:** Determine median rates or factors when comparing across competitors or market segments.

##### Returns

pl.DataFrame

A frame with one row containing median values for numeric columns.
##### Examples **Scalar Example: Median Policy Metrics** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002", "P003", "P004", "P005"], "duration_years": [1, 3, 5, 7, 15], "annual_premium": [1200, 3500, 2800, 4200, 12000], "age": [25, 35, 42, 38, 65], } af = ActuarialFrame(data) median_values = af.median() print(median_values) print("Median duration:", median_values["duration_years"]) print("Median premium:", median_values["annual_premium"]) ``` ```text shape: (1, 3) ┌────────────────┬────────────────┬──────┐ │ duration_years ┆ annual_premium ┆ age │ │ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ f64 │ ╞════════════════╪════════════════╪══════╡ │ 5.0 ┆ 3500.0 ┆ 38.0 │ └────────────────┴────────────────┴──────┘ Median duration: 5.0 Median premium: 3500.0 ``` **Vector Example: Median Monthly Performance** ```python from gaspatchio_core import ActuarialFrame data = { "agent": ["A001", "A002"], "monthly_sales": [ [3, 5, 2, 8, 4, 6, 3, 7, 5, 4, 6, 9], [12, 15, 10, 18, 14, 16, 11, 20, 13, 17, 15, 22] ], "monthly_commission": [ [450, 750, 300, 1200, 600, 900, 450, 1050, 750, 600, 900, 1350], [1800, 2250, 1500, 2700, 2100, 2400, 1650, 3000, 1950, 2550, 2250, 3300] ] } af = ActuarialFrame(data) # Calculate median for typical performance assessment median_values = af.median() print(median_values) print("Agent A001 median sales:", median_values["monthly_sales"][0]) print("Agent A002 median sales:", median_values["monthly_sales"][1]) ``` ```text shape: (1, 3) ┌────────────┬────────────────────┬──────────────────────┐ │ agent ┆ monthly_sales ┆ monthly_commission │ │ --- ┆ --- ┆ --- │ │ str ┆ list[f64] ┆ list[f64] │ ╞════════════╪════════════════════╪══════════════════════╡ │ null ┆ [5.0, 15.0] ┆ [750.0, 2250.0] │ └────────────┴────────────────────┴──────────────────────┘ Agent A001 median sales: 5.0 Agent A002 median sales: 15.0 ``` ### `min()` Calculate minimum values across all numeric columns. Returns a single-row frame containing the minimum value for each column. Essential for identifying baseline values, detecting anomalies, and establishing lower bounds in actuarial calculations. When to use - **Data Quality Checks:** Identify potential data errors like negative ages, zero premiums, or missing values coded as extreme minimums. - **Portfolio Analysis:** Find minimum entry ages, smallest policy sizes, or lowest premium amounts for market segmentation. - **Risk Assessment:** Determine minimum coverage levels, deductibles, or retention limits in reinsurance analysis. - **Product Design:** Establish minimum benefit guarantees, surrender values, or contribution limits for new products. ##### Returns pl.DataFrame A frame with one row containing minimum values for each column. 
##### Examples **Scalar Example: Portfolio Minimum Values** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002", "P003", "P004"], "age": [25, 45, 67, 35], "sum_assured": [100000, 500000, 250000, 1000000], "annual_premium": [1200, 6000, 8500, 15000], } af = ActuarialFrame(data) min_values = af.min() print(min_values) print("Min age:", min_values["age"]) print("Min sum assured:", min_values["sum_assured"]) ``` ```text shape: (1, 4) ┌───────────┬─────┬─────────────┬────────────────┐ │ policy_id ┆ age ┆ sum_assured ┆ annual_premium │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 │ ╞═══════════╪═════╪═════════════╪════════════════╡ │ P001 ┆ 25 ┆ 100000 ┆ 1200 │ └───────────┴─────┴─────────────┴────────────────┘ Min age: 25 Min sum assured: 100000 ``` **Vector Example: Minimum Monthly Claims** ```python from gaspatchio_core import ActuarialFrame data = { "policy_id": ["P001", "P002"], "policy_year": [1, 2], "monthly_claims": [ [0, 500, 0, 1200, 0, 0, 800, 0, 0, 0, 0, 2500], [0, 0, 3000, 0, 0, 1500, 0, 0, 0, 4000, 0, 0] ], "monthly_retention": [ [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000], [500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500] ] } af = ActuarialFrame(data) # Get minimum values to understand retention levels min_values = af.min() print(min_values) print("Min retention level:", min_values["monthly_retention"]) ``` ```text shape: (1, 4) ┌───────────┬─────────────┬─────────────────────────────────────┬─────────────────────────────────────┐ │ policy_id ┆ policy_year ┆ monthly_claims ┆ monthly_retention │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ list[i64] ┆ list[i64] │ ╞═══════════╪═════════════╪═════════════════════════════════════╪═════════════════════════════════════╡ │ P001 ┆ 1 ┆ [0, 0, 0, 0, … 0, 0, 0] ┆ [500, 500, 500, 500, … 500] │ └───────────┴─────────────┴─────────────────────────────────────┴─────────────────────────────────────┘ Min retention level: [500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500] ``` ### `pipe(func, *args, **kwargs)` Apply a function that accepts and returns an ActuarialFrame. ### `product()` Calculate the product of values in each numeric column. Returns a single-row frame containing the product of all values for each numeric column. Useful for compound calculations, probability chains, and multiplicative factors in actuarial modeling. When to use - **Compound Interest:** Calculate accumulated values using multiple period growth factors or discount factors. - **Probability Chains:** Multiply survival probabilities, persistency rates, or success rates across multiple periods. - **Factor Application:** Apply multiple adjustment factors, loading factors, or credibility factors in sequence. - **Index Calculations:** Compute cumulative index values from period-to-period change factors. ##### Returns pl.DataFrame A frame with one row containing products for numeric columns. 
##### Examples

**Scalar Example: Survival Probability Chain**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "year": [1, 2, 3, 4, 5],
    "annual_survival": [0.999, 0.998, 0.997, 0.995, 0.993],
    "annual_persistency": [0.95, 0.92, 0.90, 0.88, 0.85],
}
af = ActuarialFrame(data)

products = af.product()
print(products)
print("5-year survival probability:", round(products["annual_survival"], 6))
print("5-year persistency:", round(products["annual_persistency"], 4))
```

```text
shape: (1, 3)
┌──────┬─────────────────┬────────────────────┐
│ year ┆ annual_survival ┆ annual_persistency │
│ ---  ┆ ---             ┆ ---                │
│ i64  ┆ f64             ┆ f64                │
╞══════╪═════════════════╪════════════════════╡
│ 120  ┆ 0.982118        ┆ 0.588377           │
└──────┴─────────────────┴────────────────────┘
5-year survival probability: 0.982118
5-year persistency: 0.5884
```

**Vector Example: Discount Factor Chains**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "scenario": ["Base", "Stressed"],
    "monthly_discount": [
        [0.9992, 0.9992, 0.9992, 0.9992, 0.9992, 0.9992],
        [0.9990, 0.9990, 0.9990, 0.9990, 0.9990, 0.9990]
    ],
    "monthly_survival": [
        [0.9999, 0.9999, 0.9999, 0.9999, 0.9999, 0.9999],
        [0.9998, 0.9998, 0.9998, 0.9998, 0.9998, 0.9998]
    ]
}
af = ActuarialFrame(data)

# Calculate cumulative factors
products = af.product()
print(products)
```

```text
shape: (1, 3)
┌──────────┬──────────────────┬──────────────────┐
│ scenario ┆ monthly_discount ┆ monthly_survival │
│ ---      ┆ ---              ┆ ---              │
│ str      ┆ list[f64]        ┆ list[f64]        │
╞══════════╪══════════════════╪══════════════════╡
│ null     ┆ [0.9952, 0.9940] ┆ [0.9994, 0.9988] │
└──────────┴──────────────────┴──────────────────┘
```

### `profile()`

Execute and materialize the dataframe with profiling, returning (result_df, profile_info).

### `quantile(quantile, interpolation='nearest')`

Calculate quantile values across all numeric columns.

Returns a single-row frame containing the specified quantile for each numeric column. Essential for risk assessment, percentile-based analysis, and regulatory reporting in actuarial applications.

When to use

- **Risk Assessment:** Calculate VaR (Value at Risk) at different confidence levels (e.g., 95th, 99th percentile) for solvency calculations.
- **Experience Analysis:** Determine percentile thresholds for large claims, high-risk ages, or outlier detection in portfolios.
- **Pricing Segmentation:** Identify quantile boundaries for premium bands, risk tiers, or underwriting categories.
- **Regulatory Reporting:** Calculate required percentiles for stress testing, capital requirements, or reserve adequacy testing.

##### Parameters

quantile : float
    Quantile value between 0 and 1 (e.g., 0.5 for median, 0.95 for 95th percentile).
interpolation : str, default "nearest"
    Interpolation method: "nearest", "higher", "lower", "midpoint", or "linear".

##### Returns

pl.DataFrame
    A frame with one row containing quantile values for numeric columns.
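`profile()` above returns a tuple rather than a frame. A minimal sketch of unpacking it; the exact contents of `profile_info` are not documented here, so it is simply printed:

```python
from gaspatchio_core import ActuarialFrame

data = {"duration": [1, 2, 3], "lapse_rate": [0.10, 0.08, 0.07]}
af = ActuarialFrame(data)

# Execute and materialize the frame, collecting profiling information
result_df, profile_info = af.profile()

print(result_df)
print(profile_info)  # timing/plan details; structure depends on the implementation
```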
##### Examples **Scalar Example: Claims Distribution Analysis** ```python from gaspatchio_core import ActuarialFrame data = { "claim_id": list(range(1, 101)), "claim_amount": [1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7500, 8000, 9000, 10000, 12000, 15000, 18000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 60000, 75000, 85000, 95000, 100000, 120000, 150000] + [2000] * 70, "processing_days": list(range(5, 35)) + list(range(10, 80)), } af = ActuarialFrame(data) # Calculate key percentiles p90 = af.quantile(0.90) p95 = af.quantile(0.95) p99 = af.quantile(0.99) print("90th percentile:") print(p90) print("\nClaim amount 90th percentile:", p90["claim_amount"]) print("Claim amount 95th percentile:", p95["claim_amount"]) print("Claim amount 99th percentile:", p99["claim_amount"]) ``` ```text 90th percentile: shape: (1, 3) ┌──────────┬──────────────┬─────────────────┐ │ claim_id ┆ claim_amount ┆ processing_days │ │ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ f64 │ ╞══════════╪══════════════╪═════════════════╡ │ 90.0 ┆ 85000.0 ┆ 71.0 │ └──────────┴──────────────┴─────────────────┘ Claim amount 90th percentile: 85000.0 Claim amount 95th percentile: 100000.0 Claim amount 99th percentile: 150000.0 ``` **Vector Example: Portfolio Risk Percentiles** ```python from gaspatchio_core import ActuarialFrame data = { "product": ["Term Life", "Whole Life"], "claim_amounts": [ [10000, 15000, 20000, 25000, 30000, 35000, 40000, 50000, 75000, 100000, 150000, 200000, 250000, 300000, 500000, 750000, 1000000, 1500000, 2000000, 3000000], [50000, 75000, 100000, 125000, 150000, 175000, 200000, 250000, 300000, 400000, 500000, 600000, 750000, 900000, 1000000, 1250000, 1500000, 2000000, 2500000, 5000000] ] } af = ActuarialFrame(data) # Calculate 95th percentile for risk assessment var_95 = af.quantile(0.95) print("95% VaR by product:") print(var_95) ``` ```text 95% VaR by product: shape: (1, 2) ┌────────────┬──────────────────────────────────┐ │ product ┆ claim_amounts │ │ --- ┆ --- │ │ str ┆ list[f64] │ ╞════════════╪══════════════════════════════════╡ │ null ┆ [2000000.0, 2500000.0] │ └────────────┴──────────────────────────────────┘ ``` ### `select(*exprs, **named_exprs)` Select columns from the DataFrame. Accepts positional expressions (column names, proxies, or expressions) and keyword arguments for renamed/new expressions. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `*exprs` | `IntoExprColumn` | Columns or expressions to select. | `()` | | `**named_exprs` | `IntoExprColumn` | Expressions to select with specific output names. | `{}` | Returns: | Type | Description | | --- | --- | | `Self` | The modified ActuarialFrame. | ### `show_query_plan(enabled=True)` Enable or disable query plan logging (basic implementation). ### `std(ddof=1)` Calculate standard deviation across all numeric columns. Returns a single-row frame containing the standard deviation for each numeric column. Essential for risk assessment, volatility analysis, and confidence interval calculations in actuarial modeling. When to use - **Risk Assessment:** Measure volatility in claim amounts, premium variations, or mortality experience for pricing and reserving. - **Experience Monitoring:** Quantify variability in lapse rates, expense ratios, or benefit utilization for assumption setting. - **Confidence Intervals:** Calculate standard errors for mortality estimates, reserve factors, or pricing assumptions. - **Portfolio Analysis:** Assess homogeneity of risk groups by comparing standard deviations across segments. 
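`select()` above accepts both positional column names and keyword expressions but has no example of its own. A minimal sketch, assuming column proxies support Polars-style arithmetic; the column names are illustrative:

```python
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["P001", "P002"],
    "age": [25, 45],
    "annual_premium": [1200.0, 6000.0],
}
af = ActuarialFrame(data)

# Positional column name plus a keyword expression with its own output name
selected = af.select(
    "policy_id",
    monthly_premium=af["annual_premium"] / 12,
)
print(selected.collect())
```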
##### Parameters

ddof : int, default 1
    Delta degrees of freedom. The divisor is N - ddof.

##### Returns

pl.DataFrame
    A frame with one row containing standard deviations for numeric columns.

##### Examples

**Scalar Example: Premium Volatility Analysis**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["P001", "P002", "P003", "P004", "P005"],
    "age_band": ["25-35", "25-35", "36-45", "36-45", "46-55"],
    "annual_premium": [1200, 1350, 3500, 3200, 8500],
    "sum_assured": [100000, 150000, 350000, 300000, 500000],
}
af = ActuarialFrame(data)

std_values = af.std()
print(std_values)
print("Premium volatility:", std_values["annual_premium"])
```

```text
shape: (1, 2)
┌────────────────┬─────────────┐
│ annual_premium ┆ sum_assured │
│ ---            ┆ ---         │
│ f64            ┆ f64         │
╞════════════════╪═════════════╡
│ 2957.6         ┆ 160468.1    │
└────────────────┴─────────────┘
Premium volatility: 2957.6
```

**Vector Example: Monthly Claims Volatility**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "product": ["Term Life", "Whole Life"],
    "monthly_claims": [
        [0, 1000, 500, 2000, 0, 3000, 1500, 0, 2500, 1000, 0, 4000],
        [5000, 6000, 4500, 7000, 5500, 8000, 6500, 5000, 7500, 6000, 9000, 10000]
    ],
    "monthly_premiums": [
        [50000, 50000, 52000, 51000, 50000, 49000, 50000, 51000, 50000, 50000, 51000, 50000],
        [120000, 125000, 122000, 128000, 124000, 130000, 126000, 123000, 127000, 125000, 129000, 132000]
    ]
}
af = ActuarialFrame(data)

# Calculate standard deviation for risk assessment
std_values = af.std()
print(std_values)
print("Term Life claims volatility:", round(std_values["monthly_claims"][0], 2))
print("Whole Life claims volatility:", round(std_values["monthly_claims"][1], 2))
```

```text
shape: (1, 3)
┌─────────┬────────────────────┬──────────────────┐
│ product ┆ monthly_claims     ┆ monthly_premiums │
│ ---     ┆ ---                ┆ ---              │
│ str     ┆ list[f64]          ┆ list[f64]        │
╞═════════╪════════════════════╪══════════════════╡
│ null    ┆ [1339.24, 1696.7]  ┆ [778.5, 3476.11] │
└─────────┴────────────────────┴──────────────────┘
Term Life claims volatility: 1339.24
Whole Life claims volatility: 1696.7
```

### `sum()`

Calculate sum totals across all numeric columns.

Returns a single-row frame containing the sum total for each numeric column. Critical for calculating portfolio totals, aggregate exposures, and overall metrics in actuarial reporting.

When to use

- **Portfolio Totals:** Calculate total sum assured, total premiums collected, or total claims paid for financial reporting.
- **Exposure Analysis:** Sum total lives covered, total benefits, or total risk amounts for reinsurance and capital calculations.
- **Revenue Reporting:** Aggregate premium income, fee revenue, or investment income across product lines or time periods.
- **Claims Analysis:** Total claim counts, amounts paid, or reserves across different claim types or cohorts.

##### Returns

pl.DataFrame
    A frame with one row containing sum totals for numeric columns.
##### Examples **Scalar Example: Portfolio Totals** ```python from gaspatchio_core import ActuarialFrame data = { "product": ["Term", "Whole Life", "Universal", "Term", "Endowment"], "policies_inforce": [1250, 890, 445, 2100, 325], "annual_premium": [1500000, 3200000, 2100000, 2800000, 1900000], "sum_assured": [125000000, 89000000, 67000000, 315000000, 48000000], } af = ActuarialFrame(data) sum_values = af.sum() print(sum_values) print("Total policies:", sum_values["policies_inforce"]) print("Total premium:", sum_values["annual_premium"]) print("Total exposure:", sum_values["sum_assured"]) ``` ```text shape: (1, 3) ┌──────────────────┬────────────────┬─────────────┐ │ policies_inforce ┆ annual_premium ┆ sum_assured │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 │ ╞══════════════════╪════════════════╪═════════════╡ │ 5010 ┆ 11500000 ┆ 644000000 │ └──────────────────┴────────────────┴─────────────┘ Total policies: 5010 Total premium: 11500000 Total exposure: 644000000 ``` **Vector Example: Monthly Totals** ```python from gaspatchio_core import ActuarialFrame data = { "branch": ["North", "South"], "monthly_new_business": [ [120, 135, 110, 145, 130, 125, 140, 155, 135, 140, 130, 160], [95, 100, 90, 105, 110, 95, 100, 115, 105, 100, 95, 120] ], "monthly_premium": [ [180000, 202500, 165000, 217500, 195000, 187500, 210000, 232500, 202500, 210000, 195000, 240000], [142500, 150000, 135000, 157500, 165000, 142500, 150000, 172500, 157500, 150000, 142500, 180000] ] } af = ActuarialFrame(data) # Get total new business and premiums sum_values = af.sum() print(sum_values) ``` ```text shape: (1, 2) ┌───────────────────────────────────────┬───────────────────────────────────────┐ │ monthly_new_business ┆ monthly_premium │ │ --- ┆ --- │ │ list[i64] ┆ list[i64] │ ╞═══════════════════════════════════════╪═══════════════════════════════════════╡ │ [215, 235, 200, 250, … 240, 225, 280] ┆ [322500, 352500, 300000, … 420000] │ └───────────────────────────────────────┴───────────────────────────────────────┘ ``` ### `trace(func)` Decorator to capture operations within a function call in optimize mode. ### `var(ddof=1)` Calculate variance across all numeric columns. Returns a single-row frame containing the variance for each numeric column. Used for risk metrics, ANOVA calculations, and statistical modeling in actuarial applications. When to use - **Risk Metrics:** Calculate variance in loss ratios, combined ratios, or expense ratios for enterprise risk management. - **Statistical Testing:** Perform ANOVA on mortality rates, lapse rates, or claim frequencies across different cohorts. - **Credibility Theory:** Calculate variance components for Bühlmann credibility factors in experience rating. - **Asset-Liability Modeling:** Measure variance in investment returns, liability cash flows, or surplus positions. ##### Parameters ddof : int, default 1 Delta degrees of freedom. The divisor is N - ddof. ##### Returns pl.DataFrame A frame with one row containing variances for numeric columns. 
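The `ddof` parameter shared by `std()` and `var()` controls the divisor N - ddof. A minimal sketch contrasting the sample default (ddof=1) with the population setting (ddof=0), using illustrative values:

```python
from gaspatchio_core import ActuarialFrame

data = {"claims_count": [45, 52, 38, 61, 43, 55]}
af = ActuarialFrame(data)

sample_var = af.var()            # ddof=1: divisor N - 1 (sample variance)
population_var = af.var(ddof=0)  # ddof=0: divisor N (population variance)

# The sum of squared deviations is 362, so the two divisors give 72.4 and ~60.33
print("Sample variance:", sample_var["claims_count"])
print("Population variance:", population_var["claims_count"])
```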
##### Examples

**Scalar Example: Claims Variance Analysis**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "month": [1, 2, 3, 4, 5, 6],
    "claims_count": [45, 52, 38, 61, 43, 55],
    "claims_amount": [125000, 145000, 95000, 185000, 120000, 165000],
}
af = ActuarialFrame(data)

var_values = af.var()
print(var_values)
print("Claims count variance:", var_values["claims_count"])
print("Claims amount variance:", var_values["claims_amount"])
```

```text
shape: (1, 3)
┌───────┬──────────────┬───────────────┐
│ month ┆ claims_count ┆ claims_amount │
│ ---   ┆ ---          ┆ ---           │
│ f64   ┆ f64          ┆ f64           │
╞═══════╪══════════════╪═══════════════╡
│ 3.5   ┆ 72.4         ┆ 1.0642e9      │
└───────┴──────────────┴───────────────┘
Claims count variance: 72.4
Claims amount variance: 1064166666.7
```

**Vector Example: Experience Variance Components**

```python
from gaspatchio_core import ActuarialFrame

data = {
    "region": ["North", "South"],
    "quarterly_lapse_rates": [
        [0.025, 0.028, 0.022, 0.026],
        [0.031, 0.029, 0.033, 0.030]
    ],
    "quarterly_mortality_rates": [
        [0.0010, 0.0011, 0.0009, 0.0010],
        [0.0012, 0.0013, 0.0011, 0.0014]
    ]
}
af = ActuarialFrame(data)

# Calculate variance for credibility analysis
var_values = af.var()
print(var_values)
print("North region lapse variance:", var_values["quarterly_lapse_rates"][0])
print("South region lapse variance:", var_values["quarterly_lapse_rates"][1])
```

```text
shape: (1, 3)
┌────────┬───────────────────────┬──────────────────────────────┐
│ region ┆ quarterly_lapse_rates ┆ quarterly_mortality_rates    │
│ ---    ┆ ---                   ┆ ---                          │
│ str    ┆ list[f64]             ┆ list[f64]                    │
╞════════╪═══════════════════════╪══════════════════════════════╡
│ null   ┆ [0.000006, 0.000003]  ┆ [0.0000000067, 0.0000000167] │
└────────┴───────────────────────┴──────────────────────────────┘
North region lapse variance: 0.000006
South region lapse variance: 0.000003
```

### `with_columns(*exprs)`

Add columns to the DataFrame.

## `gaspatchio_core.column.namespaces.dt_proxy.DtNamespaceProxy`

A proxy for Polars datetime (dt) namespace operations, enabling type-hinting and IDE intellisense for `ActuarialFrame` datetime manipulations.

This proxy intercepts calls to datetime methods, retrieves the underlying Polars expression from its parent proxy (either a `ColumnProxy` or `ExpressionProxy`), applies the datetime operation, and then wraps the resulting Polars expression back into an `ExpressionProxy`.

### `__getattr__(name)`

Dynamically handle any other methods available on Polars' dt namespace.

This provides a fallback for dt methods not explicitly defined on this proxy. It attempts to call the method via `_call_dt_method`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the dt method to access. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `Callable[..., 'ExpressionProxy']` | A callable that, when invoked, will execute the corresponding |
| `Callable[..., 'ExpressionProxy']` | Polars dt method and return an ExpressionProxy. |

Raises:

| Type | Description |
| --- | --- |
| `AttributeError` | If the method does not exist on the Polars dt namespace (raised by `_call_dt_method` if the underlying Polars call fails). |

### `__init__(parent_proxy, parent_af)`

Initialize the DtNamespaceProxy.

This constructor is typically not called directly by users. It's used internally when accessing the `.dt` attribute of an `ActuarialFrame` column or expression proxy (e.g., `af["my_date_col"].dt`).
Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `parent_proxy` | `'ParentProxyType'` | The parent proxy (ColumnProxy or ExpressionProxy) from which this dt namespace is accessed. | *required* | | `parent_af` | `Optional['ActuarialFrame']` | The parent ActuarialFrame, if available, for context. | *required* | ### `day()` Extract the day number of the month (1-31) from a date/datetime expression. This function isolates the day component from a date or datetime, returning it as an integer (e.g., 15 for the 15th of the month). It works for both individual dates and lists of dates. When to use Extracting the day of the month can be useful in actuarial contexts for: * **Specific Date Checks:** Identifying events occurring on particular days (e.g., end-of-month processing). * **Intra-month Analysis:** Analyzing patterns within a month, though less common than month or year analysis. * **Data Validation:** Ensuring dates fall within expected day ranges for specific calculations. ##### Examples Scalar example:: ```python import polars as pl from gaspatchio_core import ActuarialFrame af = ActuarialFrame({"d": pl.Series(["2023-06-05", "2023-06-15"]).str.to_date()}) print(af.select(af["d"].dt.day().alias("day")).collect()) ``` ```text shape: (2, 1) ┌─────┐ │ day │ │ --- │ │ i8 │ ╞═════╡ │ 5 │ │ 15 │ └─────┘ ``` Vector (list) example – loss-event days:: ```python import datetime import polars as pl from gaspatchio_core import ActuarialFrame data = { "policy_id": ["E005", "F006"], "loss_event_dates": [ [datetime.date(2023, 6, 5), datetime.date(2023, 6, 15)], [datetime.date(2024, 2, 1), datetime.date(2024, 2, 29)], ], } af = ActuarialFrame(data).with_columns( pl.col("loss_event_dates").cast(pl.List(pl.Date)) ) days_expr = af["loss_event_dates"].dt.day() print(af.select("policy_id", days_expr.alias("event_days")).collect()) ``` ```text shape: (2, 2) ┌───────────┬────────────┐ │ literal ┆ event_days │ │ --- ┆ --- │ │ str ┆ list[i8] │ ╞═══════════╪════════════╡ │ policy_id ┆ [5, 15] │ │ policy_id ┆ [1, 29] │ └───────────┴────────────┘ ``` ### `month()` Extract the month number (1-12) from a date or datetime expression. This function allows you to isolate the month component from a series of dates or datetimes. The result is an integer representing the month, where January is 1 and December is 12. When to use In actuarial modeling, extracting the month from dates is crucial for various analyses. For instance, you might use this to: - Analyze seasonality in claims (e.g., identifying if certain types of claims are more frequent in specific months). - Group policies by their issue month for cohort analysis or to study underwriting patterns. - Determine premium due dates or benefit payment schedules that occur on a monthly basis. - Calculate fractional year components for financial calculations. 
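Because of the `__getattr__` fallback described above, dt methods that are not explicitly wrapped on the proxy should still dispatch to Polars. A minimal sketch, assuming Polars' `dt.quarter()` is reachable through that fallback:

```python
import polars as pl
from gaspatchio_core import ActuarialFrame

af = ActuarialFrame(
    {"valuation_date": pl.Series(["2023-03-31", "2023-11-30"]).str.to_date()}
)

# quarter() is not defined on the proxy itself; __getattr__ forwards the call
# to the underlying Polars dt namespace.
quarters = af.select(af["valuation_date"].dt.quarter().alias("quarter"))
print(quarters.collect())
```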
##### Examples Scalar example:: ```python import polars as pl from gaspatchio_core import ActuarialFrame af = ActuarialFrame({"d": pl.Series(["2022-01-01", "2022-02-01", "2022-03-01"]).str.to_date("%Y-%m-%d")}) print(af.select(af["d"].dt.month().alias("m")).collect()) ``` ```text shape: (3, 1) ┌─────┐ │ m │ │ --- │ │ i8 │ ╞═════╡ │ 1 │ │ 2 │ │ 3 │ └─────┘ ``` Vector (list) example – claim-lodgement months:: ```python import datetime import polars as pl from gaspatchio_core import ActuarialFrame data = { "policy_id": ["C003", "D004"], "claim_lodgement_dates": [ [datetime.date(2022, 3, 10), datetime.date(2022, 4, 5)], [datetime.date(2023, 1, 20), datetime.date(2023, 11, 30)], ], } af = ActuarialFrame(data).with_columns( pl.col("claim_lodgement_dates").cast(pl.List(pl.Date)) ) months_expr = af["claim_lodgement_dates"].dt.month() print(af.select(pl.col("policy_id"), months_expr.alias("lodgement_months")).collect()) ``` ```text shape: (2, 2) ┌───────────┬──────────────────┐ │ policy_id ┆ lodgement_months │ │ --- ┆ --- │ │ str ┆ list[i8] │ ╞═══════════╪══════════════════╡ │ C003 ┆ [3, 4] │ │ D004 ┆ [1, 11] │ └───────────┴──────────────────┘ ``` ### `year()` Extract the year from the underlying datetime expression. This function isolates the year component from a date or datetime, returning it as an integer (e.g., 2023). It is applicable to both single date values and lists of dates within your `ActuarialFrame`. When to use Extracting the year is fundamental in actuarial analysis for: * **Valuation and Reporting:** Determining the calendar year for financial reporting or regulatory submissions. * **Experience Studies:** Grouping data by calendar year of event (e.g., year of claim, year of lapse) to analyze trends. * **Cohort Analysis:** Defining cohorts based on the year of policy issue or birth year. * **Projection Models:** Calculating durations or projecting cash flows based on calendar years. ##### Examples Scalar example (single-date column):: ```python import polars as pl from gaspatchio_core import ActuarialFrame data = { "dates": pl.Series(["2020-01-15", "2021-07-20"]).str.to_date(format="%Y-%m-%d") } af = ActuarialFrame(data) year_expr = af["dates"].dt.year() print(af.select(year_expr.alias("year")).collect()) ``` ```text shape: (2, 1) ┌──────┐ │ year │ │ --- │ │ i32 │ ╞══════╡ │ 2020 │ │ 2021 │ └──────┘ ``` Vector example (list-of-dates per policy):: ```python import datetime import polars as pl from gaspatchio_core import ActuarialFrame data_vec = { "policy_id": ["A001", "B002"], "policy_event_dates": [ [datetime.date(2019, 12, 1), datetime.date(2020, 1, 20)], [datetime.date(2021, 5, 10), datetime.date(2021, 8, 15), datetime.date(2022, 2, 25)], ], } af_vec = ActuarialFrame(data_vec) af_vec = af_vec.with_columns(pl.col("policy_event_dates").cast(pl.List(pl.Date))) years_expr = af_vec["policy_event_dates"].dt.year() print(af_vec.select(pl.col("policy_id"), years_expr.alias("event_years")).collect()) ``` ```text shape: (2, 2) ┌───────────┬────────────────────┐ │ policy_id ┆ event_years │ │ --- ┆ --- │ │ str ┆ list[i32] │ ╞═══════════╪════════════════════╡ │ A001 ┆ [2019, 2020] │ │ B002 ┆ [2021, 2021, 2022] │ └───────────┴────────────────────┘ ``` ## `gaspatchio_core.accessors.excel.ExcelColumnAccessor` Bases: `BaseColumnAccessor` Provides Excel-related methods applicable to columns or expressions. Accessed via `.excel` on an ActuarialFrame column or expression proxy, e.g., `af["my_excel_col"].excel`. ### `__init__(proxy)` Initializes the accessor with the parent proxy. 
Internal initialization method for the Excel column accessor.

### `from_excel_serial(epoch='1900')`

Converts Excel serial numbers (integers or floats) to Polars Date. Follows logic similar to openpyxl for compatibility.

This method handles Excel's date serialization system, including the notorious Excel 1900 leap year bug where Excel incorrectly treats 1900 as a leap year.

When to use

- **Excel File Import:** When importing Excel files that contain date columns stored as serial numbers rather than proper date values.
- **Legacy Data Processing:** When working with older Excel files or systems that export dates as numeric serial values.
- **Cross-Platform Compatibility:** When handling Excel files that may have been created on different platforms (Windows vs Mac) with different epoch systems.
- **Data Validation:** When you need to convert and validate date serial numbers from external Excel-based data sources.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `epoch` | `str` | The epoch system used by Excel ('1900' or '1904'). Defaults to '1900'. 1900 Epoch (WINDOWS_1900_EPOCH = 1899-12-30): Serial 1 is 1900-01-01. Excel's serial 60 (phantom 1900-02-29) is mapped to 1900-03-01. Serials > 60 are adjusted by -1 day before adding to epoch. 1904 Epoch (MAC_1904_EPOCH = 1904-01-01): Serial 1 is 1904-01-01. Days to add from epoch are serial - 1. | `'1900'` |

Returns:

| Type | Description |
| --- | --- |
| `ExpressionProxy` | An ExpressionProxy representing the converted date column. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If an invalid epoch is provided. |

Examples:

```python
from gaspatchio_core import ActuarialFrame

# Excel serial numbers for some dates
data = {
    "policy_id": ["P001", "P002", "P003"],
    "excel_date_serial": [44197, 44562, 44927],  # Excel serial numbers
}
af = ActuarialFrame(data)

# Convert Excel serial numbers to proper dates
af_with_dates = af.with_columns(
    actual_date=af["excel_date_serial"].excel.from_excel_serial(epoch="1900")
)
print(af_with_dates.collect())
```

```text
shape: (3, 3)
┌───────────┬───────────────────┬─────────────┐
│ policy_id ┆ excel_date_serial ┆ actual_date │
│ ---       ┆ ---               ┆ ---         │
│ str       ┆ i64               ┆ date        │
╞═══════════╪═══════════════════╪═════════════╡
│ P001      ┆ 44197             ┆ 2021-01-01  │
│ P002      ┆ 44562             ┆ 2022-01-01  │
│ P003      ┆ 44927             ┆ 2023-01-01  │
└───────────┴───────────────────┴─────────────┘
```

### `yearfrac(end_date_expr, basis='act/act')`

Calculate the year fraction between two dates, similar to Excel's YEARFRAC.

This function computes the fraction of a year represented by the number of whole days between a start date (the column/expression this accessor is on) and an end date. It uses a specified day count basis. The function can operate on individual dates (scalars or columns) and also handles scenarios where one of the date inputs is a list of dates within a column.

When to use

- **Premium Proration**: Calculate the portion of an annual premium that corresponds to a partial policy term, for example, if a policy starts or ends mid-year.
- **Exposure Calculation**: Determine fractional exposure periods for reserving or IBNR (Incurred But Not Reported) calculations, especially when dealing with policies that are not in force for a full year.
- **Investment Analysis**: Compute fractional year periods for accrued interest calculations or for annualizing returns on investments held for parts of a year.
- **Performance Metrics**: Analyze time-based metrics such as time-to-claim or duration of an event, expressed as a fraction of a year.

##### Parameters

end_date_expr : IntoExprColumn
    An expression or column representing the end dates. Can be a scalar date, a column of dates, or a column of `List[Date]` if the start date is a scalar/column of dates (and vice-versa).
basis : int or str, optional
    The day count basis to use. Can be an integer (0-4) or a string name. Defaults to "act/act" (which is basis 1).

```text
Supported bases:
- `0` or `'us_nasd_30_360'` (30/360 US NASD) - US (NASD) 30/360 convention
- `1` or `'act/act'` (Actual/Actual) - Simplified version (uses 365.25 days)
- `2` or `'actual_360'` (Actual/360) - Not Implemented
- `3` or `'actual_365'` (Actual/365 fixed) - Not Implemented
- `4` or `'european_30_360'` (30/360 European) - Not Implemented
```

##### Returns

ExpressionProxy
    An expression representing the calculated year fraction as a `Float64`. If one of the inputs was a `List[Date]`, the output will be a `List[Float64]`.

##### Raises

NotImplementedError
    If a `basis` other than the currently supported basis values is specified, or if both start and end date expressions resolve to `List[Date]` columns (which requires a more complex UDF or explode/aggregate pattern).
TypeError
    If the underlying proxy for the start date is not a `ColumnProxy` or `ExpressionProxy`.
RuntimeError
    If the operation requires an `ActuarialFrame` context that is not available.
ValueError
    If an invalid basis is provided.

##### Examples

**Calculating Policy Term as Year Fraction (Scalar/Column Operations)**

Scenario: You have policy start and end dates and want to calculate the policy term in years.

```python
import datetime
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["P001", "P002", "P003"],
    "start_date": [
        datetime.date(2020, 1, 1),
        datetime.date(2021, 6, 15),
        datetime.date(2022, 3, 1),
    ],
    "end_date": [
        datetime.date(2021, 1, 1),
        datetime.date(2022, 6, 15),
        datetime.date(2022, 9, 1),  # Partial year
    ],
}
af = ActuarialFrame(data)

# Calculate year fraction using 'act/act' (simplified)
af_with_term = af.with_columns(
    term_years=af["start_date"].excel.yearfrac(af["end_date"], basis="act/act")
)
print(af_with_term.collect())
```

```text
shape: (3, 4)
┌───────────┬────────────┬────────────┬────────────┐
│ policy_id ┆ start_date ┆ end_date   ┆ term_years │
│ ---       ┆ ---        ┆ ---        ┆ ---        │
│ str       ┆ date       ┆ date       ┆ f64        │
╞═══════════╪════════════╪════════════╪════════════╡
│ P001      ┆ 2020-01-01 ┆ 2021-01-01 ┆ 1.002053   │
│ P002      ┆ 2021-06-15 ┆ 2022-06-15 ┆ 0.999316   │
│ P003      ┆ 2022-03-01 ┆ 2022-09-01 ┆ 0.503765   │
└───────────┴────────────┴────────────┴────────────┘
```

**Fractional Exposure for Multiple Claim Events from a Single Policy Start (List Operation)**

Scenario: A policy has a single start date, but multiple claim event dates. Calculate the time from policy start to each claim event as a year fraction.

```python
import datetime
import polars as pl
from gaspatchio_core import ActuarialFrame

data = {
    "policy_id": ["PolicyA", "PolicyB"],
    "policy_start_date": [datetime.date(2020, 1, 1), datetime.date(2021, 1, 1)],
    "claim_event_dates": [
        [datetime.date(2020, 7, 1), datetime.date(2021, 3, 15)],  # Events for PolicyA
        [datetime.date(2021, 2, 1)],  # Event for PolicyB
    ],
}

# Ensure claim_event_dates is typed as List[Date]
af = ActuarialFrame(data, schema_overrides={"claim_event_dates": pl.List(pl.Date)})

af_with_frac = af.with_columns(
    time_to_event_years=af["policy_start_date"].excel.yearfrac(af["claim_event_dates"])
)
print(af_with_frac.collect())
```

```text
shape: (2, 4)
┌───────────┬───────────────────┬──────────────────────────┬──────────────────────┐
│ policy_id ┆ policy_start_date ┆ claim_event_dates        ┆ time_to_event_years  │
│ ---       ┆ ---               ┆ ---                      ┆ ---                  │
│ str       ┆ date              ┆ list[date]               ┆ list[f64]            │
╞═══════════╪═══════════════════╪══════════════════════════╪══════════════════════╡
│ PolicyA   ┆ 2020-01-01        ┆ [2020-07-01, 2021-03-15] ┆ [0.50016, 1.200046]  │
│ PolicyB   ┆ 2021-01-01        ┆ [2021-02-01]             ┆ [0.084873]           │
└───────────┴───────────────────┴──────────────────────────┴──────────────────────┘
```

## `gaspatchio_core.column.namespaces.string_proxy.StringNamespaceProxy`

A proxy for Polars expression string (str) namespace operations.

This proxy is typically accessed via the `.str` attribute of a `ColumnProxy` or `ExpressionProxy` that refers to a string or list-of-strings column within an `ActuarialFrame`. It allows for intuitive, Polars-like string manipulations while remaining integrated with the ActuarialFrame ecosystem.

It automatically handles shimming for `List[String]` columns, applying string methods element-wise to the contents of the lists.

Examples:

**Scalar Example: Uppercasing policyholder names**

This demonstrates applying a string operation to a scalar string column. We'll convert policyholder names to uppercase.

```python
from gaspatchio_core.frame.base import ActuarialFrame

data_for_class_doctest = {
    "policy_holder_name": ["John Doe", "Jane Smith", "Robert Jones"],
    "policy_type_codes": [["TERM", "WL"], ["UL"], ["TERM", "CI"]]
}
af_scalar = ActuarialFrame(data_for_class_doctest)
af_upper_names = af_scalar.select(
    af_scalar["policy_holder_name"].str.to_uppercase().alias("upper_name")
)
print(af_upper_names.collect())
```

```text
shape: (3, 1)
┌──────────────┐
│ upper_name   │
│ ---          │
│ str          │
╞══════════════╡
│ JOHN DOE     │
│ JANE SMITH   │
│ ROBERT JONES │
└──────────────┘
```

**Vector (List Shimming) Example: Lowercasing policy type codes**

This demonstrates applying a string operation to a list-of-strings column. We'll convert lists of policy type codes to lowercase.
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_for_class_doctest = { "policy_holder_name": ["John Doe", "Jane Smith", "Robert Jones"], "policy_type_codes": [["TERM", "WL"], ["UL"], ["TERM", "CI"]] } af_vector = ActuarialFrame(data_for_class_doctest).with_columns( pl.col("policy_type_codes").cast(pl.List(pl.String)) ) af_lower_codes = af_vector.select( af_vector["policy_type_codes"].str.to_lowercase().alias("lower_codes") ) print(af_lower_codes.collect()) ``` ```text shape: (3, 1) ┌────────────────┐ │ lower_codes │ │ --- │ │ list[str] │ ╞════════════════╡ │ ["term", "wl"] │ │ ["ul"] │ │ ["term", "ci"] │ └────────────────┘ ``` ### `__getattr__(name)` Dynamically handle calls to Polars string methods not explicitly defined. This allows the proxy to support any method available on Polars' str namespace without needing to define each one explicitly on this proxy class. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the string method to call. | *required* | Returns: | Type | Description | | --- | --- | | `Callable[..., 'ExpressionProxy']` | A callable that, when invoked, will execute the corresponding Polars | | `Callable[..., 'ExpressionProxy']` | string method via \_call_string_method. | Raises: | Type | Description | | --- | --- | | `AttributeError` | If the method does not exist on the Polars string namespace (this is typically raised by \_call_string_method), or if a dunder method (e.g. __repr__) is accessed that isn't defined. | ### `__init__(parent_proxy, parent_af)` Initialize the StringNamespaceProxy. This constructor is not typically called directly by users. Instances are created by the dispatch mechanism when accessing `.str` on a ColumnProxy or ExpressionProxy. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `parent_proxy` | `'ProxyType'` | The parent ColumnProxy or ExpressionProxy from which .str was accessed. | *required* | | `parent_af` | `Optional['ActuarialFrame']` | The parent ActuarialFrame, providing context such as the underlying DataFrame/LazyFrame and schema. | *required* | ### `contains(pattern, literal=False, strict=False)` Checks if strings in a column contain a specified pattern. This method searches for a pattern within string values, returning a boolean indicating if the pattern exists in each string. It's useful for filtering, data categorization, and identifying records with specific text patterns. When to use - Identify policies with specific riders or endorsements from description fields - Find claims that mention particular medical conditions or causes - Filter customer feedback containing specific keywords for risk analysis - Segment policyholders based on address information (e.g., rural vs urban) - Flag policies or claims with special handling notes (e.g., "legal review") - Screen underwriting notes for high-risk indicators Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `pattern` | `str | Expr` | The substring or regex pattern to search for. Can be a literal string (e.g., "RiderX") or a Polars expression (e.g., pl.col("other_column_with_patterns")). | *required* | | `literal` | `bool` | If True, pattern is treated as a literal string. If False (default), pattern is treated as a regex. | `False` | | `strict` | `bool` | If True and pattern is a Polars expression, an error is raised if pattern is not a string type. If False (default), pattern is cast to string if possible. 
| `False` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy containing a boolean Series indicating for each input string whether the pattern was found. If the input was List[String], the output will be List[bool]. | Examples: **Scalar Example: Identifying policies with an Accidental Death Benefit (ADB) rider** Imagine you have a dataset of policy descriptions and you want to flag all policies that include an "ADB" rider. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id": ["POL001", "POL002", "POL003", "POL004"], "description": [ "Term Life Plan with ADB rider", "Whole Life - Standard", "Universal Life, includes ADB rider and Accidental Death Benefit (ADB)", "Term Life, no Accidental Death Benefit rider" ] } af = ActuarialFrame(data) af_with_adb_rider = af.select( af["description"].str.contains("ADB rider", literal=True).alias("has_adb_rider") ) print(af_with_adb_rider.collect()) ``` ```text shape: (4, 1) ┌───────────────┐ │ has_adb_rider │ │ --- │ │ bool │ ╞═══════════════╡ │ true │ │ false │ │ true │ │ false │ └───────────────┘ ``` **Vector Example: Checking underwriter notes for high-risk keywords** Suppose each policy has a list of notes from underwriters. We want to check if any note for a given policy contains keywords like "medical history" or "hazardous occupation", which might indicate higher risk. ```python from gaspatchio_core.frame.base import ActuarialFrame uw_notes_data = { "policy_id": ["UW001", "UW002", "UW003"], "underwriter_notes": [ "Standard risk. Family history clear.", "Applicant works in construction. Reviewed medical history: smoker.", "No concerning notes. Possible hazardous occupation mentioned." ] } af_notes = ActuarialFrame(uw_notes_data) af_results = af_notes.select( af_notes["underwriter_notes"].str.contains("medical history").alias("mentions_medical_history"), af_notes["underwriter_notes"].str.contains("(?i)hazardous occupation").alias("mentions_hazardous_occupation"), ) print(af_results.collect()) ``` ```text shape: (3, 2) ┌──────────────────────────┬───────────────────────────────┐ │ mentions_medical_history ┆ mentions_hazardous_occupation │ │ --- ┆ --- │ │ bool ┆ bool │ ╞══════════════════════════╪═══════════════════════════════╡ │ false ┆ false │ │ true ┆ false │ │ false ┆ true │ └──────────────────────────┴───────────────────────────────┘ ``` **Using `contains` with a list of patterns (regex and literal)** Suppose we want to check for multiple keywords in underwriter notes using both literal and regex matching. ```python from gaspatchio_core.frame.base import ActuarialFrame uw_notes_data_multi = { # Renamed to avoid conflict "policy_id": ["UW001", "UW002", "UW003"], "underwriter_notes": [ "Standard risk. Family history clear.", "Applicant works in construction. Reviewed medical history: smoker.", "No concerning notes. Possible hazardous occupation mentioned." 
] } af_multi = ActuarialFrame(uw_notes_data_multi) af_multi_processed = af_multi.select( # Literal check af_multi["underwriter_notes"].str.contains("medical history", literal=True).alias("mentions_medical_history_literal"), # Regex check (case insensitive) af_multi["underwriter_notes"].str.contains(r"(?i)hazardous occupation").alias("mentions_hazardous_occupation_regex"), # Another Regex check (case insensitive) for medical history af_multi["underwriter_notes"].str.contains(r"(?i)medical history").alias("mentions_medical_history_regex") ) print(af_multi_processed.collect()) ``` ```text shape: (3, 3) ┌──────────────────────────────────┬─────────────────────────────────────┬────────────────────────────────┐ │ mentions_medical_history_literal ┆ mentions_hazardous_occupation_regex ┆ mentions_medical_history_regex │ │ --- ┆ --- ┆ --- │ │ bool ┆ bool ┆ bool │ ╞══════════════════════════════════╪═════════════════════════════════════╪════════════════════════════════╡ │ false ┆ false ┆ false │ │ true ┆ false ┆ true │ │ false ┆ true ┆ false │ └──────────────────────────────────┴─────────────────────────────────────┴────────────────────────────────┘ ``` ### `ends_with(suffix)` Check if strings end with a specific substring. This method returns a boolean expression showing whether each string value ends with the provided suffix. For columns containing `List[String]`, the check is applied to every element within each list. When to use - Verify that policy identifiers end with region or product codes. - Flag claim or log entries that end with status markers like "OK" or "PENDING". - Validate strings against suffixes supplied in another column, such as checking payout account numbers. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `suffix` | `str | Expr` | The substring to test for at the end of each string. It can be a literal value or a Polars expression. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A boolean result indicating whether each string | | | `'ExpressionProxy'` | ends with suffix. For list columns, the result is a list of | | | `'ExpressionProxy'` | booleans. | Examples: **Scalar example – region codes** ```python from gaspatchio_core.frame.base import ActuarialFrame af = ActuarialFrame({ "policy_id": ["P100-US", "P101-CA", "P102-US", None, "P103-EU"] }) result = af.select( af["policy_id"].str.ends_with("-US").alias("is_us_policy") ) print(result.collect()) ``` ```text shape: (5, 1) ┌──────────────┐ │ is_us_policy │ │ --- │ │ bool │ ╞══════════════╡ │ true │ │ false │ │ true │ │ null │ │ false │ └──────────────┘ ``` **Vector (list) example – status flags** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl logs = { "policy_id": ["A100", "A101"], "update_notes_str": [ "Issued OK,Review PENDING", "None,Paid OK", ], } af_logs = ActuarialFrame(logs) af_logs = af_logs.with_columns( af_logs["update_notes_str"].str.split(",").alias("update_notes").map_elements( lambda x: [None if item == "None" else item for item in x], return_dtype=pl.List(pl.String) ) ) status_ok = af_logs.select( af_logs["update_notes"].str.ends_with("OK").alias("ends_with_ok") ) print(status_ok.collect()) ``` ```text shape: (2, 1) ┌───────────────┐ │ ends_with_ok │ │ --- │ │ list[bool] │ ╞═══════════════╡ │ [true, false] │ │ [null, true] │ └───────────────┘ ``` ### `extract(pattern, group_index=1)` Extract a capturing group from a regex pattern. 
This method returns the specified group from each string that matches `pattern`. It operates element-wise on list columns, making it ideal for pulling identifiers or amounts embedded in free-text fields. When to use - Retrieve policy or claim numbers from combined identifiers or descriptive text - Capture monetary amounts from claim notes for validation - Isolate classification codes embedded within longer strings Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `pattern` | `str` | The regex pattern with capturing groups. | *required* | | `group_index` | `int` | The 1-based index of the group to extract. | `1` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy containing the extracted group. | Examples: **Scalar Example: Extracting policy numbers from combined IDs** ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "full_id": ["POLICY-12345-AB", "CLAIM-67890-CD", "POLICY-ABCDE-FG"], } af = ActuarialFrame(data) af_extracted = af.select( af["full_id"].str.extract(r"POLICY-([A-Z0-9]+)-.*", group_index=1).alias("policy_num") ) print(af_extracted.collect()) ``` ```text shape: (3, 1) ┌────────────┐ │ policy_num │ │ --- │ │ str │ ╞════════════╡ │ 12345 │ │ null │ │ ABCDE │ └────────────┘ ``` **Vector Example: Extracting amounts from transaction descriptions** ```python from gaspatchio_core.frame.base import ActuarialFrame data_list = { "policy_id": ["P001"], "transactions": ["Premium paid: $100.50, Fee: $10.00, Adjustment: $-5.25"], } af_list = ActuarialFrame(data_list) af_list = af_list.with_columns( af_list["transactions"].str.split(", ").alias("transactions") ) af_list_extracted = af_list.select( af_list["transactions"].str.extract(r"\$?([-+]?[0-9]+\.[0-9]{2})", group_index=1).alias("amounts_str") ) print(af_list_extracted.collect()) ``` ```text shape: (1, 1) ┌──────────────────────────────┐ │ amounts_str │ │ --- │ │ list[str] │ ╞══════════════════════════════╡ │ ["100.50", "10.00", "-5.25"] │ └──────────────────────────────┘ ``` ### `extract_all(pattern)` Extract all non-overlapping regex matches as a list. Mirrors Polars' `Expr.str.extract_all`. For `List[String]` columns, the extraction is applied element-wise. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `pattern` | `str` | The regex pattern to search for. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy containing a list of all matches for each row. | When to use - Collect every monetary amount mentioned in claim notes for validation against the claim ledger. - Extract all policy reference numbers from free-text fields when reconciling cross-policy transactions. - Gather every ICD code from a medical report to determine claim triggers. - Capture all state abbreviations from an address string when assessing geographical concentration risk. 
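The first bullet above (validating extracted amounts against a ledger) usually needs the matches as numbers rather than strings. A minimal sketch that matches only the numeric part and then casts the resulting list column, assuming the expression proxy exposes `cast` in the same way the column proxies do elsewhere in these docs:

```python
import polars as pl
from gaspatchio_core.frame.base import ActuarialFrame

data = {
    "claim_id": ["C1", "C2"],
    "details": ["Paid $150.00 and $25.50 fee", "Refunded $10.00"],
}
af = ActuarialFrame(data)

# Match the numeric part only so the matches can be cast to floats
amounts = af.select(
    af["details"]
    .str.extract_all(r"[0-9]+\.[0-9]{2}")
    .cast(pl.List(pl.Float64))
    .alias("amounts")
)
print(amounts.collect())
```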
Examples: **Scalar example – Extracting amounts from claim descriptions** ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "claim_id": ["C1", "C2"], "details": ["Paid $150.00 and $25.50 fee", "Refunded $10.00"] } af = ActuarialFrame(data) af_amounts = af.select( af["details"].str.extract_all(r"\$([0-9]+\.[0-9]{2})").alias("amounts") ) print(af_amounts.collect()) ``` ```text shape: (2, 1) ┌───────────────────────┐ │ amounts │ │ --- │ │ list[str] │ ╞═══════════════════════╡ │ ["$150.00", "$25.50"] │ │ ["$10.00"] │ └───────────────────────┘ ``` **Vector example – Extracting policy numbers from lists of notes** ```python from gaspatchio_core.frame.base import ActuarialFrame notes = { "claim_id": ["C1"], "notes": ["Policy 12345 reported, Adjustment for policy 98765"] } af = ActuarialFrame(notes) af_list = af.with_columns( af["notes"].str.split(", ").alias("notes") ) result = af_list.select( af_list["notes"].str.extract_all(r"[0-9]+").alias("policy_numbers") ) print(result.collect()) ``` ```text shape: (1, 1) ┌────────────────────────┐ │ policy_numbers │ │ --- │ │ list[list[str]] │ ╞════════════════════════╡ │ [["12345"], ["98765"]] │ └────────────────────────┘ ``` ### `len_bytes()` Get the number of bytes in each string. Calculates the byte length of each string in a column. This is particularly useful when dealing with multi-byte character encodings (like UTF-8) where the number of characters may not equal the number of bytes. When to use - **Data Storage Estimation:** Accurately estimating storage requirements for datasets containing text fields, especially with international character sets (e.g., policyholder names, addresses from various regions). - **System Integration Limits:** Ensuring that string data, when exported or sent to other systems, conforms to byte-length restrictions imposed by those systems (e.g., fixed-width file formats or database field constraints defined in bytes). - **Performance Considerations:** Recognizing that operations on strings with many multi-byte characters might be more resource-intensive. - **Encoding Issue Detection:** While not a direct detection method, unexpected byte lengths compared to character lengths might hint at encoding problems or the presence of unusual characters. Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with the byte count (as UInt32) for each string. If the input was List[String], the output will be List[UInt32]. | Examples: **Scalar Example: Byte length of UTF-8 encoded client names** Scenario: You have client names that may include characters from various languages, and you need to understand their storage size in bytes. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "client_id": ["C001", "C002", "C003", "C004"], "client_name": ["René", "沐宸", "Zoë", "John Doe"] # French, Chinese, German, English names } af = ActuarialFrame(data) af_byte_len = af.select( af["client_name"].str.len_bytes().alias("name_byte_length") ) print(af_byte_len.collect()) ``` ```text shape: (4, 1) ┌──────────────────┐ │ name_byte_length │ │ --- │ │ u32 │ ╞══════════════════╡ │ 5 │ │ 6 │ │ 4 │ │ 8 │ └──────────────────┘ ``` **Vector Example: Byte length of free-text comments in a list** Scenario: A policy record contains a list of comments, potentially with special characters or different languages. You need to find the byte length of each comment. 
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_list_comments = { "policy_id": ["P501", "P502"], "comments_list": [ ["Test € symbol", "Standard comment.", None], # Euro symbol is multi-byte ["Résumé", "日本語のコメント"] # French with accent, Japanese comment ] } af_comments = ActuarialFrame(data_list_comments) # Ensure the list column has the correct Polars type af_comments = af_comments.with_columns( af_comments["comments_list"].cast(pl.List(pl.String)) ) af_comment_byte_len = af_comments.select( af_comments["comments_list"].str.len_bytes().alias("comment_byte_lengths") ) print(af_comment_byte_len.collect()) ``` ```text shape: (2, 1) ┌──────────────────────────┐ │ comment_byte_lengths │ │ --- │ │ list[u32] │ ╞══════════════════════════╡ │ [13, 17, null] │ │ [7, 21] │ └──────────────────────────┘ ``` ### `len_chars()` Alias for `n_chars`. Get the number of characters in each string. Calculates the length of each string in a column, returning an integer representing the number of characters. This is an alias for `n_chars()`. When to use - **Data Validation:** Ensuring identifiers like policy numbers, social security numbers, or postal codes adhere to expected length constraints, helping to identify data entry errors. - **System Integration:** Verifying that string data, such as client names or addresses, does not exceed length limitations of downstream systems or databases. - **Feature Engineering:** Using the length of free-text fields (e.g., claim descriptions, underwriter notes) as a potential feature in predictive models, where length might correlate with complexity or severity. - **Data Quality Assessment:** Identifying outliers or anomalies in string lengths that might indicate corrupted or incomplete data. Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with the character count (as UInt32) for each string. If the input was List[String], the output will be List[UInt32]. | Examples: **Scalar Example: Validating policy number length** Scenario: You need to check if policy numbers in your dataset conform to an expected length, say 7 characters. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id_raw": ["POL1234", "POL567", "POL89012", None, "POL3456"], "premium": [100.0, 150.0, 200.0, 50.0, 120.0] } af = ActuarialFrame(data) # Calculate the length of each policy_id_raw af_len_check = af.select( af["policy_id_raw"].str.len_chars().alias("policy_id_length") ) print(af_len_check.collect()) ``` ```text shape: (5, 1) ┌──────────────────┐ │ policy_id_length │ │ --- │ │ u32 │ ╞══════════════════╡ │ 7 │ │ 6 │ │ 8 │ │ null │ │ 7 │ └──────────────────┘ ``` **Vector Example: Character count of claim notes** Scenario: Each policy may have a list of associated claim notes. You want to find the character length of each note to understand the verbosity or for display purposes. 
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_list = { "policy_id": ["P7001", "P7002"], "claim_notes_list": [ ["Short note.", "This is a much longer note regarding the claim details.", None], ["Urgent review needed!", "All clear."] ] } af_list_notes = ActuarialFrame(data_list) # Ensure the list column has the correct Polars type af_list_notes = af_list_notes.with_columns( af_list_notes["claim_notes_list"].cast(pl.List(pl.String)) ) af_notes_len = af_list_notes.select( af_list_notes["claim_notes_list"].str.len_chars().alias("note_char_lengths") ) print(af_notes_len.collect()) ``` ```text shape: (2, 1) ┌───────────────────────────┐ │ note_char_lengths │ │ --- │ │ list[u32] │ ╞═══════════════════════════╡ │ [11, 53, null] │ │ [20, 9] │ └───────────────────────────┘ ``` ### `ljust(width, fill_char=' ')` Left-align strings by padding on the right. Strings shorter than `width` are padded on the right with `fill_char`. When the column contains `List[String]` values, each element is padded individually. When to use - Formatting account or policy identifiers for fixed-width exports. - Preparing ledger extracts where text fields must be left-aligned. - Normalizing rider or sub-account codes stored as lists so they compare consistently. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `width` | `int` | The desired total length of the string after padding. | *required* | | `fill_char` | `str` | The character to pad with. Defaults to a space. | `' '` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with strings padded at the end. | Examples: **Scalar example – fixed-width account codes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data = {"account_code": ["A1", "B123", None, "C"]} af = ActuarialFrame(data) af_ljust = af.select( af["account_code"].str.ljust(6, "-").alias("ljust_code") ) print(af_ljust.collect()) ``` ```text shape: (4, 1) ┌────────────┐ │ ljust_code │ │ --- │ │ str │ ╞════════════╡ │ A1---- │ │ B123-- │ │ null │ │ C----- │ └────────────┘ ``` **Vector example – padding elements in a list column** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data_list = { "batch_id": ["X01"], "sub_codes": [["S1", "LONGCODE", "S23"]], } af_list = ActuarialFrame(data_list) af_list = af_list.with_columns( af_list["sub_codes"].cast(pl.List(pl.String)) ) af_list_ljust = af_list.select( af_list["sub_codes"].str.ljust(8, "X").alias("ljust_sub_codes") ) print(af_list_ljust.collect()) ``` ```text shape: (1, 1) ┌──────────────────────────────────────┐ │ ljust_sub_codes │ │ --- │ │ list[str] │ ╞══════════════════════════════════════╡ │ ["S1XXXXXX", "LONGCODE", "S23XXXXX"] │ └──────────────────────────────────────┘ ``` ### `n_chars()` Get the number of characters in each string. This function calculates the length of each string in a column, returning an integer representing the number of characters. It's a fundamental operation for understanding string data characteristics. When to use - **Data Quality Checks:** Identifying unexpectedly short or long strings that might indicate data entry errors or truncation (e.g., validating the length of policy numbers, postal codes, or identification numbers). 
- **Feature Engineering:** Creating new features based on string length for predictive models (e.g., the length of a claim description might correlate with claim complexity). - **Data Cleaning & Transformation:** Deciding on padding or truncation strategies if string fields need to conform to a fixed length for system integration or reporting. - **Understanding Free-Text Fields:** Analyzing the distribution of lengths in fields like underwriter notes or medical descriptions to gauge the amount of detail typically provided. - **Filtering or Segmenting Data:** Selecting records based on the length of a specific string field (e.g., finding all policyholder names shorter than 3 characters for review). Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with the character count (as UInt32) for each string. | Examples: **Scalar Example: Length of product names** To understand the typical length of product names in your portfolio, or to identify names that might be too long for certain display formats. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "product_code": ["L-TERM-10", "L-WL-P", "ANN-SDA"], "product_name": ["Term Life 10 Year", "Whole Life Par", "Single Deferred Annuity"] } af = ActuarialFrame(data) af_len = af.select( af["product_name"].str.n_chars().alias("name_length") ) print(af_len.collect()) ``` ```text shape: (3, 1) ┌─────────────┐ │ name_length │ │ --- │ │ u32 │ ╞═════════════╡ │ 17 │ │ 14 │ │ 23 │ └─────────────┘ ``` **Vector Example: Length of beneficiary names in a list** For policies with multiple beneficiaries, you might want to check the length of each beneficiary's name, perhaps to ensure it fits within system limits or for data validation. ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_list = { "policy_id": ["P001", "P002"], "beneficiaries": [["John A. Doe", "Jane B. Smith"], ["Robert King", None, "Alice Wonderland"]] } af_list_initial = ActuarialFrame(data_list) af_list = af_list_initial.with_columns( af_list_initial["beneficiaries"].cast(pl.List(pl.String)) ) af_bene_len = af_list.select( af_list["beneficiaries"].str.n_chars().alias("beneficiary_name_lengths") ) print(af_bene_len.collect()) ``` ```text shape: (2, 1) ┌──────────────────────────┐ │ beneficiary_name_lengths │ │ --- │ │ list[u32] │ ╞══════════════════════════╡ │ [11, 13] │ │ [11, null, 16] │ └──────────────────────────┘ ``` ### `pad_end(width, fill_char=' ')` Left-align strings by padding on the right. Strings shorter than `width` are padded on the right with `fill_char`. If the column is `List[String]` the padding is applied to each element of the list. When to use - Format policy numbers or claim identifiers for extracts that require fixed-width fields. - Pad abbreviations in list columns (such as rider codes) so that they line up cleanly in cross-system feeds. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `width` | `int` | The desired total length of the string after padding. | *required* | | `fill_char` | `str` | The character to pad with. Defaults to a space. | `' '` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with strings padded at the end. 
| Examples: **Scalar example – fixed-width policy codes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data = {"policy_code": ["L101", "L20", None]} af = ActuarialFrame(data) result = af.select( af["policy_code"].str.pad_end(6, "0").alias("fixed_length_code") ) print(result.collect()) ``` ```text shape: (3, 1) ┌───────────────────┐ │ fixed_length_code │ │ --- │ │ str │ ╞═══════════════════╡ │ L10100 │ │ L20000 │ │ null │ └───────────────────┘ ``` **Vector example – padding claim codes in a list** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data_list = {"batch_id": ["B200"], "claim_codes": [["A1", "XYZ", "C1234"]]} af_list = ActuarialFrame(data_list).with_columns( pl.col("claim_codes").cast(pl.List(pl.String)) ) result = af_list.select( af_list["claim_codes"].str.pad_end(6, "_").alias("aligned_codes") ) print(result.collect()) ``` ```text shape: (1, 1) ┌────────────────────────────────┐ │ aligned_codes │ │ --- │ │ list[str] │ ╞════════════════════════════════╡ │ ["A1____", "XYZ___", "C1234_"] │ └────────────────────────────────┘ ``` ### `pad_start(width, fill_char=' ')` Alias for `rjust`. Pads the start of strings (right-aligns content). Adds characters to the beginning of each string until it reaches the given width. This is handy when preparing fixed-width extracts or aligning numeric text fields in actuarial reports. When to use - Preparing policy identifiers for legacy mainframe interfaces that expect fixed-width fields. - Aligning premium or reserve amounts in textual summaries generated for regulators or management. - Standardizing rider codes stored in lists so that they can be compared consistently across policies. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `width` | `int` | The desired minimum length of the string. | *required* | | `fill_char` | `str` | The character to pad with. Defaults to a space. | `' '` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with strings padded at the start.
| Examples: **Scalar Example: Align premium amounts in a report** ```python # Test with pl.Config to ensure consistent display import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data = { "premium_str": ["1200.5", "85.75", None] } af = ActuarialFrame(data) result = af.select( af["premium_str"].str.pad_start(8, " ").alias("padded_premium") ) print(result.collect()) ``` ```text shape: (3, 1) ┌────────────────┐ │ padded_premium │ │ --- │ │ str │ ╞════════════════╡ │ 1200.5 │ │ 85.75 │ │ null │ └────────────────┘ ``` **Vector Example: Pad rider codes stored as a list** ```python # Test with pl.Config to ensure consistent display import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data_list = { "policy_id": ["P01"], "rider_codes": [["RID1", "LONGRID", "R2"]] } af_list = ActuarialFrame(data_list).with_columns( pl.col("rider_codes").cast(pl.List(pl.String)) ) result = af_list.select( af_list["rider_codes"].str.pad_start(8, "0").alias("padded_rider_codes") ) print(result.collect()) ``` ```text shape: (1, 1) ┌──────────────────────────────────────────┐ │ padded_rider_codes │ │ --- │ │ list[str] │ ╞══════════════════════════════════════════╡ │ ["0000RID1", "0LONGRID", "000000R2"] │ └──────────────────────────────────────────┘ ``` ### `remove_prefix(prefix)` Alias for `strip_prefix`. Remove a prefix from each string. The prefix is removed from the beginning of every string. Strings without that prefix remain unchanged. `List[String]` columns are processed element by element. When to use - **Standardizing vendor codes** before mapping them to your base product dictionary. - **Cleaning temporary policy identifiers** created during data migrations. - **Dropping country prefixes** from location codes when you need only the state or province. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prefix` | `str | Expr` | The substring to remove. May be a literal string or an expression resolving to one. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | The expression with the prefix removed. 
| Examples: **Scalar example – clean temporary policy IDs** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id_raw": ["TMP-001", "TMP-002", "003", None], "processing_prefix": ["TMP-", "TMP-", "TMP-", "TMP-"], } with pl.Config(set_tbl_width_chars=100): af_fixed = ActuarialFrame(data) fixed = af_fixed.select( af_fixed["policy_id_raw"].str.remove_prefix("TMP-").alias("policy_id") ).collect() print(fixed) af_dynamic = ActuarialFrame(data) dynamic = af_dynamic.select( af_dynamic["policy_id_raw"].str.remove_prefix( af_dynamic["processing_prefix"] ).alias("policy_id") ).collect() print() print("Dynamic prefix removal:") print(dynamic) ``` ```text shape: (4, 1) ┌───────────┐ │ policy_id │ │ --- │ │ str │ ╞═══════════╡ │ 001 │ │ 002 │ │ 003 │ │ null │ └───────────┘ Dynamic prefix removal: shape: (4, 1) ┌───────────┐ │ policy_id │ │ --- │ │ str │ ╞═══════════╡ │ 001 │ │ 002 │ │ 003 │ │ null │ └───────────┘ ``` **Vector example – remove `LEGACY-` from feature codes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame af_list = ActuarialFrame({ "policy_key": ["P1", "P2"], "feature_codes_raw": [ ["LEGACY-RIDER1", "BENEFIT_A"], [None, "LEGACY-OPTION_B"], ], }) af_list = af_list.with_columns( af_list["feature_codes_raw"].cast(pl.List(pl.String)) ) with pl.Config(set_tbl_width_chars=100, fmt_str_lengths=100): result = af_list.select( af_list["feature_codes_raw"].str.remove_prefix("LEGACY-").alias( "feature_codes" ) ).collect() print(result) ``` ```text shape: (2, 1) ┌─────────────────────────┐ │ feature_codes │ │ --- │ │ list[str] │ ╞═════════════════════════╡ │ ["RIDER1", "BENEFIT_A"] │ │ [null, "OPTION_B"] │ └─────────────────────────┘ ``` ### `remove_suffix(suffix)` Alias for `strip_suffix`. Remove a suffix from each string. This method behaves identically to `strip_suffix`, removing the specified trailing substring from each string value. If a string does not end with the provided suffix it is returned unchanged. When the column is a list of strings, the removal is applied element-wise. When to use - **Normalizing Product Names:** Stripping version tags like "-2024" or "\_NEW" from product identifiers so that experience can be grouped by the base product. - **Cleaning Import Data:** Eliminating temporary indicators such as "-DRAFT" that may be appended to policy numbers imported from administration systems. - **Simplifying Text Fields:** Removing trailing notes like "\*cancelled" from agent remarks prior to text analytics or matching. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `suffix` | `str | Expr` | The suffix to remove. Can be a literal string or a Polars expression that evaluates to a string. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy with the suffix removed. | Examples: **Scalar Example: Removing '-OLD' from policy codes** Scenario: Historical policy codes may include a trailing `-OLD` suffix that should be dropped for reporting.
```python from gaspatchio_core.frame.base import ActuarialFrame data = {"policy_code": ["TERM10-OLD", "WL-OLD", "ANN"]} af = ActuarialFrame(data) af_clean = af.select( af["policy_code"].str.remove_suffix("-OLD").alias("code_clean") ) print(af_clean.collect()) ``` ```text shape: (3, 1) ┌─────────────┐ │ code_clean │ │ --- │ │ str │ ╞═════════════╡ │ TERM10 │ │ WL │ │ ANN │ └─────────────┘ ``` **Vector (list) example: Removing trailing '\*exp' from lists of underwriting notes** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl notes_data = { "policy_id": [1, 2], "uw_notes": [ ["Declined*exp", "Check later*exp"], ["Approved", None], ], } af_notes = ActuarialFrame(notes_data) af_notes = af_notes.with_columns( af_notes["uw_notes"].cast(pl.List(pl.String)) ) af_notes_clean = af_notes.select( af_notes["uw_notes"].str.remove_suffix("*exp").alias("notes_clean") ) print(af_notes_clean.collect()) ``` ```text shape: (2, 1) ┌─────────────────────────────┐ │ notes_clean │ │ --- │ │ list[str] │ ╞═════════════════════════════╡ │ ["Declined", "Check later"] │ │ ["Approved", null] │ └─────────────────────────────┘ ``` ### `replace(pattern, value, literal=False, n=1)` Replace occurrences of a pattern in each string. This method searches every string in the column for a given substring or regular expression pattern and replaces the first `n` matches with the provided `value`. When `literal` is `True` the `pattern` is treated as a plain string; otherwise it is interpreted as a regex. When to use - **Updating Legacy Codes:** Converting outdated product or policy codes to a new standard so assumption tables align across systems. - **Cleaning Free-Text Fields:** Removing or altering specific phrases in underwriting or claim notes prior to text analysis. - **Normalizing Reference Data:** Adjusting naming conventions in data feeds before merging them with internal models. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `pattern` | `str | Expr` | Substring or regex pattern to search for. May also be a Polars expression yielding the pattern. | *required* | | `value` | `str | Expr` | Replacement text. Can be a string or a Polars expression. | *required* | | `literal` | `bool` | If True, pattern is treated as a literal string. | `False` | | `n` | `int` | Maximum number of replacements per string. Defaults to 1. | `1` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new expression with the specified replacements applied. | Examples: **Scalar Example: Normalizing policy status descriptions** Scenario: Some policy statuses contain the phrase `"IN FORCE"`. Replace it with `"INFORCE"` for consistency. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id": ["P1", "P2", "P3"], "status_raw": ["IN FORCE", "LAPSED", "IN FORCE"], } af = ActuarialFrame(data) af_clean = af.select( af["status_raw"].str.replace("IN FORCE", "INFORCE", literal=True).alias("status") ) print(af_clean.collect()) ``` ```text shape: (3, 1) ┌─────────┐ │ status │ │ --- │ │ str │ ╞═════════╡ │ INFORCE │ │ LAPSED │ │ INFORCE │ └─────────┘ ``` **Vector Example: Removing 'NOTE: ' from lists of claim notes** Scenario: Each policy has a list of claim notes and some entries start with `"NOTE: "`. Remove this prefix from each note.
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl notes_data = { "policy_id": ["A1", "A2"], "claim_notes_str": [ "NOTE: Initial review,Payment authorised", "None,NOTE: Follow up required", ], } af_notes = ActuarialFrame(notes_data) af_notes = af_notes.with_columns( af_notes["claim_notes_str"].str.split(",").alias("claim_notes").map_elements( lambda x: [None if item == "None" else item for item in x], return_dtype=pl.List(pl.String) ) ) af_clean_notes = af_notes.select( af_notes["claim_notes"].str.replace("NOTE: ", "", literal=True, n=1).alias("clean_notes") ) result = af_clean_notes.collect() print(result) ``` ```text shape: (2, 1) ┌──────────────────────────────────────────┐ │ clean_notes │ │ --- │ │ list[str] │ ╞══════════════════════════════════════════╡ │ ["Initial review", "Payment authorised"] │ │ [null, "Follow up required"] │ └──────────────────────────────────────────┘ ``` ### `rjust(width, fill_char=' ')` Right-align strings by padding on the left. Strings shorter than `width` are padded on the left with `fill_char`. If the column is `List[String]` the padding is applied to each element of the list. When to use - Aligning premium or claim amounts before exporting to legacy ledger systems. - Presenting policy identifiers or rider codes in uniformly padded columns for regulatory or management reports. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `width` | `int` | The desired total length of the string after padding. | *required* | | `fill_char` | `str` | The character to pad with. Defaults to a space. | `' '` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with strings padded at the start. | Examples: **Scalar example – formatting premium amounts** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data = {"premium_str": ["123.45", "7", None]} af = ActuarialFrame(data) af_rjust = af.select( af["premium_str"].str.rjust(8).alias("rjust_premium") ) with pl.Config(fmt_str_lengths=100, tbl_width_chars=100): print(af_rjust.collect()) ``` ```text shape: (3, 1) ┌───────────────┐ │ rjust_premium │ │ --- │ │ str │ ╞═══════════════╡ │ 123.45 │ │ 7 │ │ null │ └───────────────┘ ``` **Vector example – aligning claim references** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_list = { "batch_id": ["B100"], "claim_refs": [["C1", "C234", "C56789"]], } af_list = ActuarialFrame(data_list).with_columns( pl.col("claim_refs").cast(pl.List(pl.String)) ) result = af_list.select( af_list["claim_refs"].str.rjust(6, "0").alias("formatted_refs") ) with pl.Config(fmt_str_lengths=100, tbl_width_chars=100): print(result.collect()) ``` ```text shape: (1, 1) ┌────────────────────────────────┐ │ formatted_refs │ │ --- │ │ list[str] │ ╞════════════════════════════════╡ │ ["0000C1", "00C234", "C56789"] │ └────────────────────────────────┘ ``` ### `starts_with(prefix)` Check if strings in a column start with a given substring. This is useful for categorizing or flagging records based on prefixes in textual data. For example, identifying policies based on product code prefixes (e.g., "TERM-" for term life, "WL-" for whole life) or segmenting claims by a prefix in their claim ID (e.g., "AUTO-" for auto claims). 
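The prefix can also be supplied as an expression rather than a literal, which is handy when each record carries its own expected prefix. The following is a minimal sketch of that pattern; the data and the `expected_prefix` column are hypothetical and exist only to illustrate the expression form of the argument.

```python
import polars as pl
from gaspatchio_core.frame.base import ActuarialFrame

# Hypothetical data: each policy carries the prefix its number is expected to have
data = {
    "policy_no": ["TERM-1001", "WL-2002", "TERM-1003"],
    "expected_prefix": ["TERM-", "TERM-", "WL-"],
}
af = ActuarialFrame(data)

# Compare each policy number against the per-record expected prefix
af_check = af.select(
    af["policy_no"].str.starts_with(pl.col("expected_prefix")).alias("prefix_matches")
)
print(af_check.collect())
```

In this sketch the second and third rows would come back `false`, flagging policies whose numbering does not match the expected product prefix.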
When applied to a column of `List[String]`, such as a list of associated product features for a policy, the operation is performed element-wise on each string within each list, returning a list of booleans. When to use - Classify policies by prefix to drive product-specific assumptions. - Identify riders with a particular prefix (e.g., primary benefits) when stored in a list column. - Validate codes against expected prefixes coming from another column. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prefix` | `str | Expr` | The substring to check for at the beginning of each string. Can be a literal string (e.g., "TERM-") or a Polars expression (e.g., pl.col("another_column_with_prefixes")). | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy containing a boolean Series indicating for each input string whether it starts with the prefix. If the input was List[String], the output will be List[bool]. | Examples: **Scalar example – policy prefixes** ```python from gaspatchio_core.frame.base import ActuarialFrame data_policies = { "policy_no": ["TERM-1001", "WL-2002", "TERM-1003", None, "UL-3004", "TERM-1004"], "issue_age": [25, 30, 28, 45, 35, 40] } af = ActuarialFrame(data_policies) # Check if policy_no starts with "TERM-" af_term_policies = af.select( af["policy_no"].str.starts_with("TERM-").alias("is_term_policy") ) print(af_term_policies.collect()) ``` ```text shape: (6, 1) ┌────────────────┐ │ is_term_policy │ │ --- │ │ bool │ ╞════════════════╡ │ true │ │ false │ │ true │ │ null │ │ false │ │ true │ └────────────────┘ ``` **Vector (list) example – rider prefixes** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_policy_riders = { "policy_id": ["P201", "P202", "P203"], "rider_codes_list": [ ["B-ADB", "S-WP", "S-CI"], # B-AccidentalDeathBenefit, S-WaiverOfPremium, S-CriticalIllness ["S-LTC", None, "B-GIO"], # S-LongTermCare, B-GuaranteedInsurabilityOption ["S-WPR", "S-CIR"] ] } af_riders = ActuarialFrame(data_policy_riders).with_columns( pl.col("rider_codes_list").cast(pl.List(pl.String)) ) af_primary_benefit_check = af_riders.select( af_riders["rider_codes_list"].str.starts_with("B-").alias("has_primary_benefit_rider") ) print(af_primary_benefit_check.collect()) ``` ```text shape: (3, 1) ┌───────────────────────────┐ │ has_primary_benefit_rider │ │ --- │ │ list[bool] │ ╞═══════════════════════════╡ │ [true, false, false] │ │ [false, null, true] │ │ [false, false] │ └───────────────────────────┘ ``` ### `strip_chars(characters=None)` Removes specified leading and trailing characters from strings. This is useful for cleaning data, such as removing unwanted prefixes, suffixes, or whitespace from policy numbers, client names, or address fields. It mirrors Polars' `Expr.str.strip_chars`. If no characters are specified, it defaults to removing leading and trailing whitespace. For `List[String]` columns, like a list of addresses for a client, the operation is applied element-wise to each string in the list. When to use - **Cleanse Identifier Fields:** Remove extraneous characters (e.g., spaces, hyphens, special symbols) from policy numbers, claim IDs, or client identifiers to ensure consistency for matching and lookups. For example, "POL- 123\* " could become "POL-123" by stripping " \*". 
- **Standardize Textual Data:** Trim leading/trailing whitespace from free-text fields like occupation descriptions, addresses, or underwriter notes before analysis or storage. - **Prepare Data for Joins:** Ensure that join keys consisting of string data are clean and consistently formatted to avoid join failures due to subtle differences like trailing spaces. - **Sanitize User Input:** Clean user-provided search terms or filter values by removing unwanted characters before using them in queries. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `characters` | `str | Expr` | A string of characters to remove from both ends of each string. Can also be a Polars expression that evaluates to a string of characters. If None (default), removes whitespace (spaces, tabs, newlines, etc.). | `None` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy with the specified characters stripped from the strings. | Examples: **Scalar Example 1: Cleaning policy numbers by removing specific prefixes/suffixes and whitespace** Policy numbers might be recorded with inconsistent characters (e.g., "ID-", "\*", spaces). We want to standardize them by removing these specific characters and any surrounding whitespace. ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_policy_nos = { "raw_policy_id": [ "ID-A123-XYZ*", " B456 ", "ID-C789*", "D012-XYZ", None, " ID-E345* ", ], "chars_to_remove_col": ["ID-*XYZ ", " ", "ID-*", "-XYZ", None, " *ID-"] } af = ActuarialFrame(data_policy_nos) # Example 1a: Remove a fixed set of characters "ID-*XYZ " from policy IDs af_cleaned_fixed = af.select( af["raw_policy_id"].str.strip_chars("ID-*XYZ ").alias("cleaned_fixed_chars") ) print("Cleaned with fixed characters 'ID-*XYZ ':") print(af_cleaned_fixed.collect()) # Example 1b: Remove characters specified in another column # This dynamically strips characters based on the 'chars_to_remove_col' for each row. af_cleaned_dynamic = af.select( af["raw_policy_id"].str.strip_chars(pl.col("chars_to_remove_col")).alias("cleaned_dynamic_chars") ) print("\nCleaned with characters from 'chars_to_remove_col':") print(af_cleaned_dynamic.collect()) # Example 1c: Remove only leading and trailing whitespace af_trimmed_whitespace = af.select( af["raw_policy_id"].str.strip_chars().alias("trimmed_whitespace_only") # characters=None ) print("\nCleaned with default whitespace stripping:") print(af_trimmed_whitespace.collect()) ``` ```text Cleaned with fixed characters 'ID-*XYZ ': shape: (6, 1) ┌─────────────────────┐ │ cleaned_fixed_chars │ │ --- │ │ str │ ╞═════════════════════╡ │ A123 │ │ B456 │ │ C789 │ │ D012 │ │ null │ │ E345 │ └─────────────────────┘ Cleaned with characters from 'chars_to_remove_col': shape: (6, 1) ┌───────────────────────┐ │ cleaned_dynamic_chars │ │ --- │ │ str │ ╞═══════════════════════╡ │ A123 │ │ B456 │ │ C789 │ │ D012 │ │ null │ │ E345 │ └───────────────────────┘ Cleaned with default whitespace stripping: shape: (6, 1) ┌───────────────────────────┐ │ trimmed_whitespace_only │ │ --- │ │ str │ ╞═══════════════════════════╡ │ ID-A123-XYZ* │ │ B456 │ │ ID-C789* │ │ D012-XYZ │ │ null │ │ ID-E345* │ └───────────────────────────┘ ``` **Vector (List Shimming) Example: Cleaning lists of product add-on codes** Product codes for add-ons might be stored in a list, with potential unwanted characters like asterisks, hyphens, or spaces. 
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_addons = { "policy_id": ["P1001", "P1002"], "addon_codes_raw": [ ["*RIDER_A- ", " -RIDER_B*", "BASE_PLAN"], [None, " *-RIDER_C- ", "\tRIDER_D\t*"] ] } af_addons = ActuarialFrame(data_addons).with_columns( pl.col("addon_codes_raw").cast(pl.List(pl.String)) ) # Strip asterisks, hyphens, spaces, and tabs from each code in the lists af_cleaned_addons = af_addons.select( af_addons["addon_codes_raw"].str.strip_chars(" *-#\t").alias("cleaned_addon_codes") # Added '#' to demonstrate it's ignored if not present ) print(af_cleaned_addons.collect()) ``` ```text shape: (2, 1) ┌───────────────────────────────────┐ │ cleaned_addon_codes │ │ --- │ │ list[str] │ ╞═══════════════════════════════════╡ │ ["RIDER_A", "RIDER_B", "BASE_PLA… │ │ [null, "RIDER_C", "RIDER_D"] │ └───────────────────────────────────┘ ``` ### `strip_chars_start(characters=None)` Removes specified leading characters from strings. Useful for standardizing data by removing known prefixes or initial whitespace. For instance, cleaning policy numbers by removing a "TEMP-" prefix or trimming spaces from the beginning of address lines. It mirrors Polars' `Expr.str.strip_chars_start`. If no characters are specified, it defaults to removing leading whitespace. When applied to `List[String]` columns (e.g., a list of historical status codes for a policy), the operation is performed element-wise. When to use - **Normalizing Prefixed Identifiers:** Removing consistent prefixes from identifiers like policy numbers (e.g., "PN-", "TEMP\_"), claim codes (e.g., "CL-"), or agent codes to get the core identifier. - **Cleaning Leading Characters in Text Fields:** Removing leading non-essential characters (e.g., bullets, numbers, special symbols, spaces) from free-text fields like notes, descriptions, or imported data before further processing. - **Standardizing Data from Multiple Sources:** If different source systems prefix the same data differently, this function can help unify them by removing those specific leading characters. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `characters` | `str | Expr` | A string of characters to remove from the start of each string. Can also be a Polars expression that evaluates to a string of characters. If None (default), removes leading whitespace (spaces, tabs, newlines, etc.). | `None` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy with specified leading characters stripped from the strings. | Examples: **Scalar Example: Removing prefixes from legacy system IDs and leading whitespace** Legacy system IDs might have prefixes like "LEG\_", "OLD-", or be padded with spaces. 
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_ids = { "legacy_id": [ "LEG_POL123", " OLD-CLM456", "POL789", None, "LEG_ UW001", # Note the space after LEG_ " TRN999" ], "prefixes_to_strip": ["LEG_", "OLD-", "NONEXISTENT_", None, "LEG_ ", " "] } af = ActuarialFrame(data_ids) # Example 1a: Remove a fixed prefix "LEG_" af_no_leg_prefix = af.select( af["legacy_id"].str.strip_chars_start("LEG_").alias("id_no_leg_prefix") ) print("Stripping fixed prefix 'LEG_':") print(af_no_leg_prefix.collect()) # Example 1b: Remove leading whitespace only (characters=None) af_trimmed_space = af.select( af["legacy_id"].str.strip_chars_start().alias("id_trimmed_leading_space") ) print("\nStripping leading whitespace only:") print(af_trimmed_space.collect()) # Example 1c: Remove prefixes defined in another column # This will strip any character found in the corresponding 'prefixes_to_strip' string from the start. af_dynamic_prefix = af.select( af["legacy_id"].str.strip_chars_start(pl.col("prefixes_to_strip")).alias("id_dynamic_prefix_removed") ) print("\nStripping prefixes from 'prefixes_to_strip' column (character-wise from start):") print(af_dynamic_prefix.collect()) ``` ```text Stripping fixed prefix 'LEG_': shape: (6, 1) ┌────────────────────┐ │ id_no_leg_prefix │ │ --- │ │ str │ ╞════════════════════╡ │ POL123 │ │ OLD-CLM456 │ │ POL789 │ │ null │ │ UW001 │ │ TRN999 │ └────────────────────┘ Stripping leading whitespace only: shape: (6, 1) ┌───────────────────────────┐ │ id_trimmed_leading_space │ │ --- │ │ str │ ╞═══════════════════════════╡ │ LEG_POL123 │ │ OLD-CLM456 │ │ POL789 │ │ null │ │ LEG_ UW001 │ │ TRN999 │ └───────────────────────────┘ Stripping prefixes from 'prefixes_to_strip' column (character-wise from start): shape: (6, 1) ┌─────────────────────────────┐ │ id_dynamic_prefix_removed │ │ --- │ │ str │ ╞═════════════════════════════╡ │ POL123 │ │ CLM456 │ │ POL789 │ │ null │ │ UW001 │ │ TRN999 │ └─────────────────────────────┘ ``` **Vector (List Shimming) Example: Cleaning lists of temporary transaction remarks** Transaction remarks might be stored in lists, with some prefixed by "TEMP: " or spaces. 
```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_remarks = { "policy_id": ["TRN01", "TRN02"], "transaction_remarks_raw": [ ["TEMP: Initial assessment", " Adjustment processed", "Final Review"], [None, "TEMP: Hold for now", "TEMP: Resolved", "Status: OK"] ] } af_remarks = ActuarialFrame(data_remarks).with_columns( pl.col("transaction_remarks_raw").cast(pl.List(pl.String)) ) # Example 2a: Strip fixed prefix "TEMP: " from each remark in the lists af_cleaned_remarks_prefix = af_remarks.select( af_remarks["transaction_remarks_raw"].str.strip_chars_start("TEMP: ").alias("cleaned_remarks_prefix") ) print("Cleaned remarks (prefix 'TEMP: '):") print(af_cleaned_remarks_prefix.collect()) # Example 2b: Strip leading whitespace from list elements af_cleaned_remarks_space = af_remarks.select( af_remarks["transaction_remarks_raw"].str.strip_chars_start().alias("cleaned_remarks_space") ) print("\nCleaned remarks (leading whitespace):") print(af_cleaned_remarks_space.collect()) ``` ```text Cleaned remarks (prefix 'TEMP: '): shape: (2, 1) ┌────────────────────────────────────────────────────────────────────────────┐ │ cleaned_remarks_prefix │ │ --- │ │ list[str] │ ╞════════════════════════════════════════════════════════════════════════════╡ │ ["Initial assessment", " Adjustment processed", "Final Review"] │ │ [null, "Hold for now", "Resolved", "Status: OK"] │ └────────────────────────────────────────────────────────────────────────────┘ Cleaned remarks (leading whitespace): shape: (2, 1) ┌────────────────────────────────────────────────────────────────────────────┐ │ cleaned_remarks_space │ │ --- │ │ list[str] │ ╞════════════════════════════════════════════════════════════════════════════╡ │ ["TEMP: Initial assessment", "Adjustment processed", "Final Review"] │ │ [null, "TEMP: Hold for now", "TEMP: Resolved", "Status: OK"] │ └────────────────────────────────────────────────────────────────────────────┘ ``` ### `strip_prefix(prefix)` Remove a prefix from each string. The prefix is stripped whenever it occurs at the start of the string. Strings without the prefix are returned unchanged. On columns containing lists of strings, the removal happens element by element. When to use - Cleaning temporary identifiers such as `TEMP-123` once a policy is fully underwritten. - Harmonizing product codes from different administration systems before mapping them to an actuarial model. - Stripping `LEGACY-` markers from lists of rider codes imported from historical sources. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prefix` | `str | Expr` | Prefix to remove. May be a literal string or an expression that evaluates to a string. | *required* | Returns: | Type | Description | | --- | --- | | `'ExpressionProxy'` | ExpressionProxy with the prefix removed. 
| Examples: **Scalar example – cleaning policy IDs** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(set_tbl_width_chars=100): af = ActuarialFrame({"pol_id_raw": ["TEMP-001", "TEMP-002", "003", None]}) cleaned = af.select( af["pol_id_raw"].str.strip_prefix("TEMP-").alias("pol_id") ).collect() print(cleaned) ``` ```text shape: (4, 1) ┌────────┐ │ pol_id │ │ --- │ │ str │ ╞════════╡ │ 001 │ │ 002 │ │ 003 │ │ null │ └────────┘ ``` **Vector example – removing `LEGACY-` from feature codes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame af = ActuarialFrame({ "policy_key": ["POLICY_A", "POLICY_B"], "feature_codes_raw": [ ["LEGACY-RIDER1", "NEW_FEATURE_X", "LEGACY-BENEFIT2"], [None, "LEGACY-COVERAGE_Y", "STANDARD_Z"], ], }) af = af.with_columns( af["feature_codes_raw"].cast(pl.List(pl.String)) ) with pl.Config(set_tbl_width_chars=120, fmt_str_lengths=100): cleaned = af.select( af["feature_codes_raw"].str.strip_prefix("LEGACY-").alias("cleaned_feature_codes") ).collect() print(cleaned) ``` ```text shape: (2, 1) ┌─────────────────────────────────────────┐ │ cleaned_feature_codes │ │ --- │ │ list[str] │ ╞═════════════════════════════════════════╡ │ ["RIDER1", "NEW_FEATURE_X", "BENEFIT2"] │ │ [null, "COVERAGE_Y", "STANDARD_Z"] │ └─────────────────────────────────────────┘ ``` ### `strip_suffix(suffix)` Remove a suffix from each string. If a string does not end with the given suffix, it is returned unchanged. For `List[String]` columns, the operation is applied element-wise. When to use - **Normalizing coverage names** that include trailing version codes such as "-OLD". - **Preparing ledger accounts** by removing year suffixes like "-2024" before comparing periods. - **Cleaning temporary identifiers** imported from external systems (for example, removing a trailing "-TMP"). Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `suffix` | `str | Expr` | The suffix to remove. Either a string literal or an expression resolving to a string. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | The expression with the suffix removed. 
| Examples: **Scalar example – normalize plan names** ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "plan_name_raw": ["Term Basic-OLD", "Income Protection-OLD", "Annuity Plus", None] } af = ActuarialFrame(data) result = af.select( af["plan_name_raw"].str.strip_suffix("-OLD").alias("plan_name") ) print(result.collect()) ``` ```text shape: (4, 1) ┌───────────────────────┐ │ plan_name │ │ --- │ │ str │ ╞═══════════════════════╡ │ Term Basic │ │ Income Protection │ │ Annuity Plus │ │ null │ └───────────────────────┘ ``` **Vector (list) example – clean trailing punctuation in claim notes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame notes_data = { "claim_id": ["C1", "C2"], "notes": [["Approved.", "Paid."], [None, "In Review."]], } af_list = ActuarialFrame(notes_data) af_list = af_list.with_columns( af_list["notes"].cast(pl.List(pl.String)) ) cleaned = af_list.select( af_list["notes"].str.strip_suffix(".").alias("notes_cleaned") ) print(cleaned.collect()) ``` ```text shape: (2, 1) ┌────────────────────────┐ │ notes_cleaned │ │ --- │ │ list[str] │ ╞════════════════════════╡ │ ["Approved", "Paid"] │ │ [null, "In Review"] │ └────────────────────────┘ ``` ### `strptime(dtype, format=None, *, strict=True, exact=True, cache=True, ambiguous='raise', **kwargs)` Convert string values to Date, Datetime, or Time. This method parses textual date or time information into Polars temporal types. For `List[String]` columns, each element is parsed individually. When to use - Convert policy issue or claim reporting dates that are stored as strings in raw data extracts. - Parse lists of event timestamps—such as claim status updates—when building experience studies or exposure models. - Ingest external datasets from underwriting or administration systems where date fields come in a variety of text formats. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `dtype` | `'PolarsTemporalType'` | The Polars temporal type to convert to (pl.Date, pl.Datetime, or pl.Time). | *required* | | `format` | `Optional[str]` | The strf/strptime format string. If None, the format is inferred where possible. | `None` | | `strict` | `bool` | If True (default), raise an error on parsing failure. | `True` | | `exact` | `bool` | If True (default), require an exact format match. | `True` | | `cache` | `bool` | If True (default), cache parsing results for performance. | `True` | | `ambiguous` | `str | Expr` | How to handle ambiguous datetimes, such as daylight-saving transitions. Options are "raise" (default), "earliest", "latest", or "null". Can also be a Polars expression. | `'raise'` | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | Strings converted to the specified temporal type. 
| Examples: **Scalar Example: Parsing policy issue dates** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data = { "policy_id": ["A100", "B200", "C300"], "issue_date_str": [ "2021-01-15", "20/02/2022", "2023-03-10 14:30:00" ] } af = ActuarialFrame(data) af_parsed_dates = af.select( af["issue_date_str"].str.strptime(pl.Date, "%Y-%m-%d", strict=False).alias("issue_date_strict_fmt"), af["issue_date_str"].str.strptime(pl.Date, "%d/%m/%Y", strict=False).alias("issue_date_dmy_fmt"), af["issue_date_str"].str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S", strict=False).alias("issue_datetime"), ) result = af_parsed_dates.collect() print(result) ``` ```text shape: (3, 3) ┌───────────────────────┬────────────────────┬─────────────────────┐ │ issue_date_strict_fmt ┆ issue_date_dmy_fmt ┆ issue_datetime │ │ --- ┆ --- ┆ --- │ │ date ┆ date ┆ datetime[μs] │ ╞═══════════════════════╪════════════════════╪═════════════════════╡ │ 2021-01-15 ┆ null ┆ null │ │ null ┆ 2022-02-20 ┆ null │ │ null ┆ null ┆ 2023-03-10 14:30:00 │ └───────────────────────┴────────────────────┴─────────────────────┘ ``` **Vector Example: Parsing lists of event timestamps** ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_list = { "claim_id": ["CL001"], "event_timestamps_str": [["2023-04-01T10:00:00", "2023-04-01T10:05:00", "Invalid"]], } af_list = ActuarialFrame(data_list).with_columns( pl.col("event_timestamps_str").cast(pl.List(pl.String)) ) af_parsed_list = af_list.select( af_list["event_timestamps_str"].str.strptime( pl.Datetime, "%Y-%m-%dT%H:%M:%S", strict=False ).alias("event_datetimes_μs") ) result = af_parsed_list.collect() print(result) ``` ```text shape: (1, 1) ┌──────────────────────────────────────────────────┐ │ event_datetimes_μs │ │ --- │ │ list[datetime[μs]] │ ╞══════════════════════════════════════════════════╡ │ [2023-04-01 10:00:00, 2023-04-01 10:05:00, null] │ └──────────────────────────────────────────────────┘ ``` ### `to_lowercase()` Converts all characters in string columns to lowercase. This function standardizes textual data by converting all characters in a string column to lowercase. This is essential for ensuring consistency in data fields critical for actuarial analysis, such as system codes, free-text fields like occupation or medical conditions, or external data sources, facilitating accurate matching, aggregation, and text analysis. When to use - **Normalizing Text for Analysis:** Preparing free-text fields (e.g., underwriting notes, claim descriptions, occupation details) for text mining or NLP by ensuring terms like "SMOKER", "Smoker", and "smoker" are treated identically. - **Improving Data Matching with External Sources:** When integrating data from various systems or third-party providers where case consistency is not guaranteed (e.g., matching addresses, names, or city information). - **Standardizing User Input:** Converting user-entered data (e.g., search terms, filter criteria) to a consistent case before processing or querying. Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | An ExpressionProxy with strings converted to lowercase. | Examples: **Scalar Example: Normalizing occupation descriptions for risk analysis** Occupation descriptions might be entered in various casings. Converting to lowercase helps in standardizing them for consistent risk factor analysis or grouping. 
```python from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id": ["POL001", "POL002", "POL003", "POL004"], "occupation_raw": ["Engineer", "software DEVELOPER", "Teacher", "Project Manager"] } af = ActuarialFrame(data) af_lower_occupation = af.select( af["occupation_raw"].str.to_lowercase().alias("occupation_normalized") ) print(af_lower_occupation.collect()) ``` ```text shape: (4, 1) ┌───────────────────────┐ │ occupation_normalized │ │ --- │ │ str │ ╞═══════════════════════╡ │ engineer │ │ software developer │ │ teacher │ │ project manager │ └───────────────────────┘ ``` **Vector Example: Lowercasing medical condition codes from multiple sources** Medical condition codes might come from different systems with varying casing. Lowercasing them ensures they can be consistently mapped or analyzed. ```python from gaspatchio_core.frame.base import ActuarialFrame import polars as pl data_medical_codes = { "claim_id": ["C001", "C002"], "condition_codes_list": [ ["DIAB_T2", "HBP", "ASTHMA"], # DIAB_T2 = Type 2 Diabetes, HBP = High Blood Pressure ["hbp", None, "copd"] # COPD = Chronic Obstructive Pulmonary Disease ] } af_codes = ActuarialFrame(data_medical_codes) # Ensure the list column has the correct Polars type for the string operation af_codes = af_codes.with_columns( af_codes["condition_codes_list"].cast(pl.List(pl.String)) ) af_lower_codes = af_codes.select( af_codes["condition_codes_list"].str.to_lowercase().alias("lower_condition_codes") ) print(af_lower_codes.collect()) ``` ```text shape: (2, 1) ┌─────────────────────────────────────┐ │ lower_condition_codes │ │ --- │ │ list[str] │ ╞═════════════════════════════════════╡ │ ["diab_t2", "hbp", "asthma"] │ │ ["hbp", null, "copd"] │ └─────────────────────────────────────┘ ``` ### `to_uppercase()` Converts all characters in string columns to uppercase. This function standardizes textual data by converting all characters in a string column to uppercase. This is essential for ensuring consistency in data fields critical for actuarial analysis, such as policy status codes, product identifiers, or geographical regions, facilitating accurate matching, aggregation, and reporting. When to use - **Standardizing Categorical Data:** Ensuring that codes like policy status (e.g., "active", "Lapsed", "ACTIVE" all become "ACTIVE"), gender codes (e.g., "m", "F" become "M", "F"), or smoker status (e.g. "non-smoker", "Smoker" become "NON-SMOKER", "SMOKER") are consistent for grouping and analysis. - **Improving Data Matching:** Facilitating joins and lookups between different datasets where case sensitivity might cause mismatches (e.g., matching policyholder names or addresses from different sources). - **Enhancing Readability and Reporting:** Presenting data in a uniform case for reports and dashboards, especially for identifiers or codes. - **Preparing Text for Analysis:** As a preprocessing step before text mining or natural language processing tasks on fields like claim descriptions or underwriter notes, where case normalization can simplify pattern recognition. - **Simplifying Rule-Based Logic:** When applying business rules that depend on string comparisons (e.g., identifying policies with specific rider codes like "ADB" or "WP" irrespective of their original casing). Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | A new ExpressionProxy with strings converted to uppercase. 
| Examples: **Scalar Example: Standardizing policy status codes** Policy status might be entered in various cases ("active", "lapsed", "ACTIVE"). Converting to uppercase ensures consistency for analysis. ```python from gaspatchio_core.frame.base import ActuarialFrame data = { "policy_id": ["S3001", "S3002", "S3003", "S3004"], "status_raw": ["active", "lapsed", "Active", "PENDING"] } af = ActuarialFrame(data) af_upper_status = af.select( af["status_raw"].str.to_uppercase().alias("status_standardized") ) print(af_upper_status.collect()) ``` ```text shape: (4, 1) ┌─────────────────────┐ │ status_standardized │ │ --- │ │ str │ ╞═════════════════════╡ │ ACTIVE │ │ LAPSED │ │ ACTIVE │ │ PENDING │ └─────────────────────┘ ``` **Vector Example: Uppercasing rider codes for a policy** A policy might have multiple rider codes stored in a list. To ensure uniformity, we can convert all rider codes to uppercase. ```python from gaspatchio_core.frame.base import ActuarialFrame data_policy_riders = { "policy_id": ["R4001", "R4002", "R4003"], "rider_codes_str": [ "adb,wp", "ci,ltc,acc_death", "gio" ] } af_riders = ActuarialFrame(data_policy_riders) # Convert string to list for the string operation af_riders = af_riders.with_columns( af_riders["rider_codes_str"].str.split(",").alias("rider_codes_list") ) af_upper_riders = af_riders.select( af_riders["rider_codes_list"].str.to_uppercase().alias("upper_rider_codes") ) print(af_upper_riders.collect()) ``` ```text shape: (3, 1) ┌────────────────────────────┐ │ upper_rider_codes │ │ --- │ │ list[str] │ ╞════════════════════════════╡ │ ["ADB", "WP"] │ │ ["CI", "LTC", "ACC_DEATH"] │ │ ["GIO"] │ └────────────────────────────┘ ``` ### `zfill(length)` Pad strings with leading zeros to a minimum width. Shorter values are padded on the left with zeros so each entry reaches `length` characters. For list columns, the padding occurs element-wise. When to use - Standardizing policy numbers from different administration systems before merging with valuation data - Preparing zero-padded claim numbers for extracts sent to reinsurers or regulators - Building fixed-width keys when joining to rating tables or mapping grids Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `length` | `int` | The desired minimum length of the string. | *required* | Returns: | Name | Type | Description | | --- | --- | --- | | `ExpressionProxy` | `'ExpressionProxy'` | Strings padded with leading zeros. 
| Examples: **Scalar example – Standardizing policy serial numbers** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data = {"policy_serial": ["123", "45", "6789", None, "1"]} af = ActuarialFrame(data) result = af.select( af["policy_serial"].str.zfill(5).alias("zfilled_serial") ) print(result.collect()) ``` ```text shape: (5, 1) ┌────────────────┐ │ zfilled_serial │ │ --- │ │ str │ ╞════════════════╡ │ 00123 │ │ 00045 │ │ 06789 │ │ null │ │ 00001 │ └────────────────┘ ``` **Vector example – Padding numerical components in claim codes** ```python import polars as pl from gaspatchio_core.frame.base import ActuarialFrame with pl.Config(fmt_str_lengths=100): data = { "claim_batch": ["B01", "B02"], "item_codes": [["A1", "B123", "C04"], [None, "D56"]], } af = ActuarialFrame(data) af = af.with_columns( af["item_codes"].cast(pl.List(pl.String)) ) result = af.select( af["item_codes"].str.zfill(4).alias("zfilled_item_codes") ) print(result.collect()) ``` ```text shape: (2, 1) ┌──────────────────────────┐ │ zfilled_item_codes │ │ --- │ │ list[str] │ ╞══════════════════════════╡ │ ["00A1", "B123", "0C04"] │ │ [null, "0D56"] │ └──────────────────────────┘ ```
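Building on the join-key use case mentioned above, the sketch below zero-pads a key column before matching it against a rating grid whose keys are stored as fixed-width strings. The column names and the four-character width are illustrative assumptions, not part of the API.

```python
from gaspatchio_core.frame.base import ActuarialFrame

# Hypothetical data: branch codes arrive as bare digit strings, while the
# rating grid keys them as four-character, zero-padded strings such as "0042"
data = {
    "policy_id": ["P1", "P2", "P3"],
    "branch_code": ["42", "7", "1234"],
}
af = ActuarialFrame(data)

# Build a fixed-width key that lines up with the rating grid's key format
result = af.select(
    af["branch_code"].str.zfill(4).alias("branch_key")
)
print(result.collect())
```

The resulting `branch_key` values ("0042", "0007", "1234") can then be joined or mapped against the zero-padded keys in the rating grid without any further cleaning.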