Integrating Custom Python Code

As an actuary using Gaspatchio, you might have existing Python functions or complex logic you want to integrate into your models. Perhaps you have a specific benefit calculation, complex decrement logic, or a custom reserving method implemented in Python.

Gaspatchio provides two primary ways to incorporate this custom logic into the ActuarialFrame workflow:

  1. Direct Application (.apply): For quick, one-off use cases or simple functions.
  2. Accessor Plugins: For more complex, reusable logic that benefits from better organization and integration.

1. Direct Application with .apply()

If you have a relatively simple Python function that operates on a single column's data element-wise, the quickest way to use it is via the .apply() method on a column proxy.

Let's say you have a Python function to calculate a simple bonus amount based on the policy duration:

# Your existing Python function
def calculate_bonus(duration: int) -> float:
    if duration <= 5:
        return 0.0
    elif duration <= 10:
        return 50.0
    else:
        return 100.0 + (duration - 10) * 5.0
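
Before wiring a function like this into a model, it can help to check the band boundaries in plain Python. The sketch below repeats the function so that it runs on its own; the sample durations are illustrative values, not taken from any real portfolio.

```python
def calculate_bonus(duration: int) -> float:
    """Piecewise bonus: nothing up to year 5, flat 50 up to year 10, then growing."""
    if duration <= 5:
        return 0.0
    elif duration <= 10:
        return 50.0
    else:
        return 100.0 + (duration - 10) * 5.0


# Illustrative durations, including both band boundaries (5 and 10)
samples = [3, 5, 7, 10, 12]
bonuses = [calculate_bonus(d) for d in samples]
print(bonuses)  # [0.0, 0.0, 50.0, 50.0, 110.0]
```

Note that the boundaries are inclusive: durations of exactly 5 and 10 fall into the lower band in each case.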

You can apply this directly within your model definition:

import polars as pl
from gaspatchio_core.dsl.core import ActuarialFrame

# Assume 'af' is your ActuarialFrame with a 'policy_duration' column
# af = ActuarialFrame(...)

# Apply the custom Python function
# Note: We provide a return_dtype for better performance and type stability
af["bonus_amount"] = af["policy_duration"].apply(
    calculate_bonus, return_dtype=pl.Float64
)

result = af.collect()
print(result)

Pros:

  • Simple: Very straightforward for existing functions.
  • Quick: No extra setup required for one-off calculations.

Cons:

  • Performance: A Python function runs slower than native Polars/Gaspatchio operations, especially on large datasets. Providing return_dtype helps, but the result will not match the speed of a pure expression. Gaspatchio may attempt Numba optimization in "optimize" mode when Numba is installed, but this is not guaranteed.
  • Readability: Can clutter model logic if many complex .apply calls are used.
  • Reusability: Less discoverable and reusable across different models compared to plugins.
  • Limited Scope: Primarily designed for element-wise operations on single columns.

Use .apply() when you need a quick integration and the performance impact is acceptable, or when prototyping logic before potentially converting it into a more optimized expression or plugin.

Performance Considerations with .apply()

Using .apply() executes your Python function row by row. This involves overhead for each element (calling the Python interpreter, type checking, etc.) and prevents vectorized optimizations that operate on entire columns simultaneously. As a result, it can be orders of magnitude slower than equivalent logic written using native Polars/Gaspatchio expressions, especially on large datasets.

You might see a PerformanceWarning when using .apply() similar to this:

PerformanceWarning: Applying a Python function 'your_function_name' using map_elements. This is potentially slow.
For better performance, consider using Polars expressions directly.

While convenient for quick tests or simple logic, relying heavily on .apply() for core calculations will significantly impact your model's performance. It's strongly recommended to rewrite the logic using native Polars expressions or within an accessor plugin for production use, as shown in the next section.

2. Accessor Plugins

If your custom logic is more complex, will be reused across different models, or involves multiple related calculations, creating an accessor plugin is the recommended approach.

Accessor plugins extend ActuarialFrame (or its column/expression proxies) with custom namespaces. Think of the built-in .dt (for dates) or .str (for strings) namespaces in Polars – plugins let you create your own, like .mortality or .reserving.
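
To make the namespace idea concrete, here is a toy sketch of the general pattern in plain Python: a decorator records a class under a name, and attribute access on a column-like object looks that name up. This is not Gaspatchio's actual implementation. Every name in it (`register_toy_accessor`, `ToyColumn`, `ToyBonusAccessor`) is hypothetical and exists only for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry for this sketch only; NOT Gaspatchio's internals
_ACCESSORS: Dict[str, type] = {}


def register_toy_accessor(name: str) -> Callable[[type], type]:
    """Class decorator that records an accessor class under a namespace name."""
    def decorator(cls: type) -> type:
        _ACCESSORS[name] = cls  # runs once, when the class is defined
        return cls
    return decorator


class ToyColumn:
    """Stand-in for a column proxy; it simply holds plain Python values."""

    def __init__(self, values):
        self.values = list(values)

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails, e.g. for ".bonus"
        try:
            accessor_cls = _ACCESSORS[name]
        except KeyError:
            raise AttributeError(name) from None
        return accessor_cls(self)


@register_toy_accessor("bonus")
class ToyBonusAccessor:
    def __init__(self, obj: ToyColumn):
        self._obj = obj  # the object the namespace is attached to

    def is_eligible(self, threshold: int = 5):
        return [v > threshold for v in self._obj.values]


durations = ToyColumn([3, 7, 12])
eligible = durations.bonus.is_eligible()
print(eligible)  # [False, True, True]
```

The registration happens as a side effect of defining the class, which is why a module containing accessors has to be imported before its namespace becomes available.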

Why Create a Plugin?

  • Organization: Group related custom calculations under a single namespace (e.g., af["premium"].finance.present_value(...)).
  • Reusability: Define logic once and use it across multiple models or share it with colleagues.
  • Readability: Keeps model definitions cleaner by encapsulating complex logic within accessor methods.
  • Discoverability: Makes custom functions easily discoverable via standard attribute access (and dir()).
  • Potential for Optimization: Accessor methods can be written to leverage efficient Polars expressions internally.

Creating a Simple Column Accessor

Let's adapt our calculate_bonus function into a reusable column accessor plugin.

Step 1: Define the Accessor Class

Create a Python file (e.g., my_company_accessors.py) and define your class:

# my_company_accessors.py
import polars as pl
from gaspatchio_core.dsl.core import ActuarialFrame, ColumnProxy, ExpressionProxy
from gaspatchio_core.dsl.plugins import register_accessor

class BaseAccessor:
    """Optional base class for convenience."""
    def __init__(self, obj):
        # obj will be the ColumnProxy or ExpressionProxy instance
        self._obj = obj

@register_accessor("bonus", kind="column") # Register as .bonus for columns/expressions
class BonusAccessor(BaseAccessor):

    def amount(self) -> ExpressionProxy:
        """Calculates the bonus amount based on the proxied duration column."""
        # We use Polars expressions *inside* the accessor for performance
        duration_expr = self._obj # self._obj is the duration column/expression proxy

        bonus_expr = (
            pl.when(duration_expr <= 5).then(0.0)
            .when(duration_expr <= 10).then(50.0)
            .otherwise(100.0 + (duration_expr - 10) * 5.0)
            .cast(pl.Float64) # Ensure consistent output type
        )
        # Important: Return an ExpressionProxy
        # We assume self._obj has a ._parent attribute (true for Column/ExpressionProxy)
        return ExpressionProxy(bonus_expr, self._obj._parent)

    def is_eligible(self, threshold: int = 5) -> ExpressionProxy:
        """Checks if bonus is eligible based on duration."""
        duration_expr = self._obj
        eligibility_expr = duration_expr > threshold
        return ExpressionProxy(eligibility_expr, self._obj._parent)


# IMPORTANT: Ensure this module (my_company_accessors.py) is imported somewhere
# in your application *after* gaspatchio_core.dsl.core has been imported.
# e.g., in __init__.py or main.py:
# import my_company_accessors

Key Points:

  • @register_accessor("bonus", kind="column"): This decorator registers the BonusAccessor class. It will be available as .bonus on ColumnProxy and ExpressionProxy instances.
  • __init__(self, obj): Stores the proxy object (ColumnProxy or ExpressionProxy) the accessor is attached to.
  • amount(self): Implements the bonus logic using efficient Polars when/then/otherwise expressions instead of a Python function. It returns a new ExpressionProxy.
  • Returning ExpressionProxy: Accessor methods that perform calculations should generally return ExpressionProxy objects to keep the operations within the Gaspatchio/Polars expression system for optimal performance and lazy evaluation.

Step 2: Import Your Accessor Module

Somewhere in your project (e.g., your main script or a relevant __init__.py), make sure to import the module containing your accessor definition. This triggers the registration decorator.

# main_model.py
import polars as pl
from gaspatchio_core.dsl.core import ActuarialFrame
import my_company_accessors # <--- Import to register .bonus accessor

af = ActuarialFrame({ "policy_duration": [3, 7, 12] })

# Use the accessor!
af["bonus_amount"] = af["policy_duration"].bonus.amount()
af["is_bonus_eligible"] = af["policy_duration"].bonus.is_eligible()
# Accessor results can be combined with other column operations
af["eligible_bonus"] = af["is_bonus_eligible"] * af["bonus_amount"]

print(af.collect())

Frame Accessors and Entry Points

You can also create frame accessors (kind="frame") that attach to the ActuarialFrame itself, useful for portfolio-level calculations.

Furthermore, if you are developing a package of reusable actuarial components, you can use entry points to make your accessors automatically discoverable when someone installs your package, without requiring them to explicitly import your accessor module.

These are more advanced topics covered in the technical reference documentation. For most users integrating their own project-specific code, the @register_accessor decorator provides the best balance of organization and ease of use.

Choose the method that best suits the complexity and reusability needs of your custom Python code. For simple, infrequent use, .apply() is sufficient. For structured, reusable, and potentially performance-critical logic, invest the time to create an accessor plugin.