Excel
gaspatchio_core.accessors.excel.ExcelColumnAccessor
¶
Bases: BaseColumnAccessor
Provides Excel-related methods applicable to columns or expressions.
Accessed via .excel
on an ActuarialFrame column or expression proxy,
e.g., af["my_excel_col"].excel
.
__init__(proxy)
¶
Initializes the accessor with the parent proxy.
Internal initialization method for the Excel column accessor.
from_excel_serial(epoch='1900')
¶
Converts Excel serial numbers (integers or floats) to Polars Date.
Follows logic similar to openpyxl for compatibility. This method handles Excel's date serialization system, including the notorious Excel 1900 leap year bug where Excel incorrectly treats 1900 as a leap year.
When to use
- Excel File Import: When importing Excel files that contain date columns stored as serial numbers rather than proper date values.
- Legacy Data Processing: When working with older Excel files or systems that export dates as numeric serial values.
- Cross-Platform Compatibility: When handling Excel files that may have been created on different platforms (Windows vs Mac) with different epoch systems.
- Data Validation: When you need to convert and validate date serial numbers from external Excel-based data sources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
epoch
|
str
|
The epoch system used by Excel ('1900' or '1904'). Defaults to '1900'.
|
'1900'
|
Returns:
Type | Description |
---|---|
ExpressionProxy
|
An ExpressionProxy representing the converted date column. |
Raises:
Type | Description |
---|---|
ValueError
|
If an invalid epoch is provided. |
Examples:
from gaspatchio_core import ActuarialFrame
# Excel serial numbers for some dates
data = {
"policy_id": ["P001", "P002", "P003"],
"excel_date_serial": [44197, 44562, 44927], # Excel serial numbers
}
af = ActuarialFrame(data)
# Convert Excel serial numbers to proper dates
af_with_dates = af.with_columns(
actual_date=af["excel_date_serial"].excel.from_excel_serial(epoch="1900")
)
print(af_with_dates.collect())
shape: (3, 3)
┌───────────┬────────────────────┬─────────────┐
│ policy_id ┆ excel_date_serial ┆ actual_date │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ date │
╞═══════════╪════════════════════╪═════════════╡
│ P001 ┆ 44197 ┆ 2021-01-01 │
│ P002 ┆ 44562 ┆ 2021-12-31 │
│ P003 ┆ 44927 ┆ 2022-12-31 │
└───────────┴────────────────────┴─────────────┘
yearfrac(end_date_expr, basis='act/act')
¶
Calculate the year fraction between two dates, similar to Excel's YEARFRAC.
This function computes the fraction of a year represented by the number of whole days between a start date (the column/expression this accessor is on) and an end date. It uses a specified day count basis. The function can operate on individual dates (scalars or columns) and also handles scenarios where one of the date inputs is a list of dates within a column.
When to use
- Premium Proration: Calculate the portion of an annual premium that corresponds to a partial policy term, for example, if a policy starts or ends mid-year.
- Exposure Calculation: Determine fractional exposure periods for reserving or IBNR (Incurred But Not Reported) calculations, especially when dealing with policies that are not in force for a full year.
- Investment Analysis: Compute fractional year periods for accrued interest calculations or for annualizing returns on investments held for parts of a year.
- Performance Metrics: Analyze time-based metrics such as time-to-claim or duration of an event, expressed as a fraction of a year.
Parameters¶
end_date_expr : IntoExprColumn
An expression or column representing the end dates. Can be a scalar date,
a column of dates, or a column of List[Date]
if the start date is a
scalar/column of dates (and vice-versa).
basis : int or str, optional
The day count basis to use. Can be an integer (0-4) or a string name.
Defaults to "act/act" (which is basis 1).
Supported bases:
- `0` or `'us_nasd_30_360'` (30/360 US NASD) - US (NASD) 30/360 convention
- `1` or `'act/act'` (Actual/Actual) - Simplified version (uses 365.25 days)
- `2` or `'actual_360'` (Actual/360) - Not Implemented
- `3` or `'actual_365'` (Actual/365 fixed) - Not Implemented
- `4` or `'european_30_360'` (30/360 European) - Not Implemented
Returns¶
ExpressionProxy
An expression representing the calculated year fraction as a Float64
.
If one of the inputs was a List[Date]
, the output will be a List[Float64]
.
Raises¶
NotImplementedError
If a basis
other than the currently supported basis values is specified,
or if both start and end date expressions resolve to List[Date]
columns
(which requires a more complex UDF or explode/aggregate pattern).
TypeError
If the underlying proxy for the start date is not a ColumnProxy
or ExpressionProxy
.
RuntimeError
If the operation requires an ActuarialFrame
context that is not available.
ValueError
If an invalid basis is provided.
Examples¶
Calculating Policy Term as Year Fraction (Scalar/Column Operations)::
Scenario: You have policy start and end dates and want to calculate the policy term in years.
```python
import datetime
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["P001", "P002", "P003"],
"start_date": [
datetime.date(2020, 1, 1),
datetime.date(2021, 6, 15),
datetime.date(2022, 3, 1),
],
"end_date": [
datetime.date(2021, 1, 1),
datetime.date(2022, 6, 15),
datetime.date(2022, 9, 1), # Partial year
],
}
af = ActuarialFrame(data)
# Calculate year fraction using 'act/act' (simplified)
af_with_term = af["start_date"].excel.yearfrac(af["end_date"], basis="act/act")
print(af_with_term.collect())
```
```
shape: (3, 4)
┌───────────┬────────────┬────────────┬────────────┐
│ policy_id ┆ start_date ┆ end_date ┆ term_years │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date ┆ date ┆ f64 │
╞═══════════╪════════════╪════════════╪════════════╡
│ P001 ┆ 2020-01-01 ┆ 2021-01-01 ┆ 1.002053 │
│ P002 ┆ 2021-06-15 ┆ 2022-06-15 ┆ 0.999316 │
│ P003 ┆ 2022-03-01 ┆ 2022-09-01 ┆ 0.503765 │
└───────────┴────────────┴────────────┴────────────┘
```
Fractional Exposure for Multiple Claim Events from a Single Policy Start (List Operation)::
Scenario: A policy has a single start date, but multiple claim event dates.
Calculate the time from policy start to each claim event as a year fraction.
```python
import datetime
import polars as pl
from gaspatchio_core import ActuarialFrame
data = {
"policy_id": ["PolicyA", "PolicyB"],
"policy_start_date": [datetime.date(2020, 1, 1), datetime.date(2021, 1, 1)],
"claim_event_dates": [
[datetime.date(2020, 7, 1), datetime.date(2021, 3, 15)], # Events for PolicyA
[datetime.date(2021, 2, 1)], # Event for PolicyB
],
}
# Ensure claim_event_dates is typed as List[Date]
af = ActuarialFrame(data, schema_overrides={"claim_event_dates": pl.List(pl.Date)})
af_with_frac = af.with_columns(
time_to_event_years = af["policy_start_date"].excel.yearfrac(af["claim_event_dates"])
)
print(af_with_frac.collect())
```
```
shape: (2, 4)
┌───────────┬───────────────────┬───────────────────────────────────────────┬─────────────────────────────┐
│ policy_id ┆ policy_start_date ┆ claim_event_dates ┆ time_to_event_years │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date ┆ list[date] ┆ list[f64] │
╞═══════════╪═══════════════════╪═══════════════════════════════════════════╪═════════════════════════════╡
│ PolicyA ┆ 2020-01-01 ┆ [2020-07-01, 2021-03-15] ┆ [0.50016, 1.200046] │
│ PolicyB ┆ 2021-01-01 ┆ [2021-02-01] ┆ [0.084873] │
└───────────┴───────────────────┴───────────────────────────────────────────┴─────────────────────────────┘
```