Documentation Guidelines
This document explains the Gaspatchio documentation system, its underlying philosophy, and how to write effective docstrings that serve both human readers and AI systems.
Philosophy and Architecture
The Three-Pillar Approach
Our documentation system is built on three core principles:
- Human-First: Every docstring should educate experienced life insurance actuaries with domain-specific examples
- AI-Ready: Structured content that enables excellent RAG (Retrieval Augmented Generation) for LLM assistance
- Always Correct: Executable, linted, and automatically validated examples that never become stale
Why This Matters
Traditional documentation suffers from three critical problems:
- Stale Examples: Code examples that worked when written but break as the codebase evolves
- Generic Content: Examples that don't connect to real-world actuarial use cases
- Poor AI Integration: Unstructured content that LLMs struggle to understand and use effectively
Our system solves these by treating documentation as executable code that gets validated on every commit.
The Docstring Engine
Core Components
The system is built around several key components:
1. Pydantic Models (models.py)
   - GaspatchioDocstring: Represents a complete parsed docstring
   - DocstringCodeExample: Individual executable code examples
   - DocstringParameter/DocstringReturn: Structured parameter and return-value documentation
2. Parser (parse.py)
   - Extracts docstrings from Python files using AST analysis
   - Parses Markdown fenced code blocks (```python) for examples
   - Links examples to their parent objects for validation
3. Validation Engine (validate.py)
   - Structural validation (required sections, formatting)
   - Ruff linting of all code examples
   - Execution testing to verify outputs match documentation
   - Doctest compatibility for legacy workflows
4. Pytest Integration (pytest_plugin.py)
   - Discovers and runs docstring examples as tests
   - Provides detailed failure reporting
   - Supports updating outputs automatically when code changes
Example Structure
Here's the anatomy of a well-formed Gaspatchio docstring:
````python
def year(self) -> "ExpressionProxy":
    """Extract the year from the underlying datetime expression.

    Corresponds to Polars ``Expr.dt.year()`` and returns an integer
    representing the year component of the datetime.

    !!! note "When to use"
        In actuarial modeling, extracting years from dates enables:

        * **Policy vintage analysis**: Group policies by issue year to study underwriting changes over time
        * **Claims trending**: Analyze how claim frequencies and severities change by accident year
        * **Regulatory compliance**: Extract policy years for reserve calculations and statutory reporting

    Examples
    --------
    Scalar example - Policy Issue Years::

    ```python
    import datetime
    from gaspatchio_core import ActuarialFrame

    af = ActuarialFrame({
        "policy_id": ["P001", "P002", "P003"],
        "issue_date": [
            datetime.date(2020, 3, 15),
            datetime.date(2021, 7, 22),
            datetime.date(2022, 1, 10),
        ],
    })
    year_expr = af["issue_date"].dt.year()
    print(af.select(year_expr.alias("issue_year")).collect())
    ```
    ```
    shape: (3, 1)
    ┌────────────┐
    │ issue_year │
    │ ---        │
    │ i32        │
    ╞════════════╡
    │ 2020       │
    │ 2021       │
    │ 2022       │
    └────────────┘
    ```

    Vector example - Multiple Claim Years::

    ```python
    import datetime
    import polars as pl
    from gaspatchio_core import ActuarialFrame

    af = ActuarialFrame({
        "policy_id": ["C001", "C002"],
        "claim_dates": [
            [datetime.date(2020, 6, 1), datetime.date(2021, 8, 15)],
            [datetime.date(2019, 12, 3), datetime.date(2020, 4, 20)],
        ],
    })
    af = af.with_columns(
        af["claim_dates"].cast(pl.List(pl.Date))
    )
    years_expr = af["claim_dates"].dt.year()
    print(af.select("policy_id", years_expr.alias("claim_years")).collect())
    ```
    ```
    shape: (2, 2)
    ┌───────────┬──────────────┐
    │ policy_id ┆ claim_years  │
    │ ---       ┆ ---          │
    │ str       ┆ list[i32]    │
    ╞═══════════╪══════════════╡
    │ C001      ┆ [2020, 2021] │
    │ C002      ┆ [2019, 2020] │
    └───────────┴──────────────┘
    ```
    """
````
Required Sections
Every public method docstring must include:
- Short description: One-line summary of what the function does
- Long description: More detailed explanation of behavior and context
- When to use: Domain-specific actuarial use cases (see guidelines below)
- Examples: At least one executable example, preferably both scalar and vector
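A structural check for these four sections can be sketched with plain string handling. This is a hypothetical simplification, not the validation engine itself: `check_structure` assumes a docstring that has already been dedented (for example by `inspect.cleandoc`) and the layout used in the example above, with blank lines between sections, a `!!! note "When to use"` admonition, and an `Examples` header followed by a `python` fence.

```python
from __future__ import annotations


def check_structure(doc: str) -> list[str]:
    """Return structural problems found in a dedented docstring ([] means OK)."""
    problems = []
    paragraphs = [p.strip() for p in doc.strip().split("\n\n") if p.strip()]

    # Short description: the first paragraph must be a single line.
    if not paragraphs or "\n" in paragraphs[0]:
        problems.append("missing one-line short description")

    # Long description: prose before any admonition or Examples header.
    if len(paragraphs) < 2 or paragraphs[1].startswith(("!!!", "Examples")):
        problems.append("missing long description")

    # When to use: the admonition header used throughout these docs.
    if '!!! note "When to use"' not in doc:
        problems.append('missing "When to use" section')

    # Examples: a header line followed somewhere by a python fence.
    _, header, after = doc.partition("\nExamples\n")
    if not header or "```python" not in after:
        problems.append("missing Examples section with a python example")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a single run report every missing section at once.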
Pytest Integration and CI/CD
Configuration
The system is configured in pyproject.toml:
```toml
[tool.pytest.ini_options]
addopts = [
    "--gp-docstring-paths=gaspatchio_core/column/namespaces/dt_proxy.py",
    "--gp-docstring-paths=gaspatchio_core/column/namespaces/string_proxy.py",
    "--gp-docstring-paths=gaspatchio_core/assumptions/_loader.py",
    "--gp-docstring-paths=gaspatchio_core/__init__.py",
    "-m",
    "not benchmark",
]
markers = [
    "gaspatchio_docstring_example: marks tests as Gaspatchio docstring examples",
    "gaspatchio_docstring_structure_check: marks tests as docstring structure validation",
]
```
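Because `--gp-docstring-paths` appears once per file in `addopts`, the plugin has to accept the flag repeatedly. A minimal sketch of how a pytest plugin could declare such options, using pytest's standard `pytest_addoption` hook with `action="append"`; the group name, help texts, and the `collected_paths` helper are illustrative assumptions, not the contents of pytest_plugin.py.

```python
from __future__ import annotations


def pytest_addoption(parser):
    """Register the plugin's command-line flags (sketch)."""
    group = parser.getgroup("gaspatchio-docstrings")
    # action="append" collects every occurrence into a list, so addopts can
    # name one --gp-docstring-paths entry per source file.
    group.addoption(
        "--gp-docstring-paths",
        action="append",
        default=[],
        help="Python file whose docstring examples should be collected.",
    )
    group.addoption(
        "--gp-update-examples",
        action="store_true",
        default=False,
        help="Rewrite expected outputs instead of failing on a mismatch.",
    )


def collected_paths(config) -> list[str]:
    """Hypothetical helper: de-duplicated paths, in command-line order."""
    return list(dict.fromkeys(config.getoption("gp_docstring_paths")))
```

pytest derives the option's destination name (`gp_docstring_paths`) from the flag by dropping the leading dashes and replacing the remaining dashes with underscores, which is why the helper reads it under that name.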
Running Tests
Basic validation (structure + linting):
```bash
# Run all docstring tests
uv run pytest -m gaspatchio_docstring_example

# Test specific file
uv run pytest gaspatchio_core/column/namespaces/dt_proxy.py --gp-docstring-paths="gaspatchio_core/column/namespaces/dt_proxy.py"

# Same run with verbose, per-example results
uv run pytest -m gaspatchio_docstring_example -v
```
Update mode (regenerate outputs):
```bash
# Update all examples after code changes
uv run pytest -m gaspatchio_docstring_example --gp-update-examples

# Update specific method
uv run pytest -k "DtNamespaceProxy.year" --gp-update-examples
```
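Update mode amounts to replacing the plain output fence that follows an example with freshly captured output. A minimal string-level sketch, assuming the fence layout from the example docstring above; `replace_expected_output` is a hypothetical name, and the real plugin presumably also writes the result back into the source file.

```python
import re

# Zero or more lines that do not start with a fence.
FENCE_BODY = r"(?:(?!```)[^\n]*\n)*"

# A python fence immediately followed by a plain (info-less) output fence.
EXAMPLE_WITH_OUTPUT = re.compile(
    rf"^```python[^\n]*\n{FENCE_BODY}```[ \t]*\n"
    rf"(?P<out>```[ \t]*\n{FENCE_BODY}```[ \t]*$)",
    re.M,
)


def replace_expected_output(doc: str, index: int, new_output: str) -> str:
    """Replace the output block of the index-th example that has one."""
    match = list(EXAMPLE_WITH_OUTPUT.finditer(doc))[index]
    fresh = "```\n" + new_output.rstrip("\n") + "\n```"
    return doc[: match.start("out")] + fresh + doc[match.end("out"):]
```

Restricting fence bodies to non-fence lines keeps an example without an output block from being merged with the next example's output.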
CLI Tools
The system provides dedicated CLI commands:
```bash
# Parse and inspect docstrings
uv run gp-docstrings parse gaspatchio_core/column/namespaces/dt_proxy.py --method "DtNamespaceProxy.year"

# Check example execution without updating
uv run gp-docstrings run-print-check --file gaspatchio_core/column/namespaces/dt_proxy.py

# Update docstring outputs after code changes
uv run gp-docstrings update --file gaspatchio_core/column/namespaces/dt_proxy.py
```
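Conceptually, a print check executes an example, captures what it prints, and compares that with the documented output. A minimal standard-library sketch; `run_print_check` is a hypothetical function, and ignoring trailing whitespace per line is an assumption about how strict the real tool is.

```python
from __future__ import annotations

import contextlib
import io


def _normalize(text: str) -> str:
    # Ignore trailing whitespace on each line and surrounding blank lines.
    return "\n".join(line.rstrip() for line in text.strip("\n").splitlines())


def run_print_check(code: str, expected: str) -> tuple[bool, str]:
    """Execute example code and return (matches_expected, captured_stdout)."""
    buffer = io.StringIO()
    namespace: dict = {"__name__": "__docstring_example__"}
    with contextlib.redirect_stdout(buffer):
        exec(compile(code, "<docstring example>", "exec"), namespace)
    actual = buffer.getvalue()
    return _normalize(actual) == _normalize(expected), actual
```

Each example gets a fresh namespace, so one example cannot accidentally rely on variables defined by another.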
CI/CD Pipeline Integration
The documentation system integrates into CI/CD through multiple checkpoints:
1. Pre-commit Hooks
```bash
# Lint docstring examples
uv run gp-docstrings lint src/gaspatchio_core --strict

# Validate structure
uv run pytest -m gaspatchio_docstring_structure_check
```
2. CI Test Matrix
```yaml
# Example GitHub Actions step
- name: Validate Documentation
  run: |
    uv run pytest -m "gaspatchio_docstring_example or gaspatchio_docstring_structure_check" --tb=short
    uv run gp-docstrings run-print-check gaspatchio_core/
```
3. Release Validation
Before releases, the system ensures:
- All examples execute without errors
- All outputs match current behavior
- No structural issues in docstrings
- Ruff linting passes for all example code
Writing Effective Actuarial Examples
The "When to use" Section
This section is crucial for AI systems to understand when to recommend functions. Follow these guidelines:
✅ Good - Specific actuarial use cases:
```markdown
!!! note "When to use"
    In actuarial modeling, extracting months enables:

    * **Seasonality analysis**: Identify patterns in claims frequency by month of occurrence
    * **Policy grouping**: Batch policies by issue month for cohort studies and renewal processing
    * **Regulatory reporting**: Extract policy months for statutory reserve calculations
```
❌ Bad - Generic or obvious:
```markdown
!!! note "When to use"
    Use this function when you need to extract the month from a date column.
```
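One way to catch the generic style mechanically is to require at least one labeled use-case bullet. A hypothetical lint helper, assuming the `* **Label**: description` bullet shape of the good example; it cannot judge whether the listed use cases are genuinely actuarial, so human review still matters.

```python
from __future__ import annotations

import re

# A use-case bullet such as "* **Claims trending**: Analyze ..."
USE_CASE = re.compile(r"^\s*\* \*\*(?P<label>[^*]+)\*\*: \S")


def use_case_labels(when_to_use: str) -> list[str]:
    """Return the bold labels of use-case bullets in a "When to use" block."""
    return [m["label"] for line in when_to_use.splitlines() if (m := USE_CASE.match(line))]
```

A block that returns an empty list, like the bad example, is a strong hint that the section describes the mechanics rather than the actuarial use.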
Example Guidelines
Domain Focus: Always use actuarial data and scenarios
```python
# ✅ Good - Actuarial context
af = ActuarialFrame({
    "policy_id": ["P001", "P002"],
    "premium_due_date": [datetime.date(2023, 3, 15), datetime.date(2023, 6, 20)],
    "claim_amount": [15000, 8500],
})

# ❌ Bad - Generic example
df = pl.DataFrame({
    "id": [1, 2],
    "date": [datetime.date(2023, 1, 1), datetime.date(2023, 2, 1)],
    "value": [100, 200],
})
```
Self-Contained: Every example must include all necessary imports and setup (add import polars as pl only when the example actually uses pl.List, pl.Date, or similar)
```python
# ✅ Complete example
import datetime

from gaspatchio_core import ActuarialFrame

# Set up realistic data
af = ActuarialFrame({
    "policy_id": ["P001", "P002"],
    "issue_date": [datetime.date(2020, 3, 15), datetime.date(2021, 7, 22)],
})

# Show the operation
result = af["issue_date"].dt.year()
print(af.select(result.alias("issue_year")).collect())
```
Both Scalar and Vector: Provide examples for both single values and list operations when applicable
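A rough check for self-containment compares the names an example reads against the names it imports or assigns. The sketch below uses only the standard `ast` and `builtins` modules; `undefined_names` is a hypothetical helper that ignores scoping subtleties such as function parameters, so treat it as a first-pass filter rather than a substitute for actually executing the example.

```python
from __future__ import annotations

import ast
import builtins


def undefined_names(code: str) -> set[str]:
    """Names an example reads but never imports, assigns, or gets from builtins."""
    defined = set(dir(builtins))
    loaded = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # "import polars as pl" binds "pl"; "import a.b" binds "a".
            defined.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, ast.Name):
            # Load = the name is read; Store/Del = the name is bound.
            (loaded if isinstance(node.ctx, ast.Load) else defined).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return loaded - defined
```

A snippet like the "Domain Focus" fragments reports `ActuarialFrame` and `datetime` as undefined, which is exactly why published examples need their imports and setup.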
Prompting for Examples
When writing docstrings, consider this workflow:
1. Identify the Function's Purpose
   - What actuarial problem does this solve?
   - When would an actuary reach for this function?
2. Create Realistic Scenarios
   "I'm documenting the `contains()` string method. An actuary might use this to:
   - Filter policies by rider codes (checking if policy codes contain 'WL' for whole life)
   - Identify claim descriptions containing specific keywords
   - Find coverage types with certain patterns"
3. Build Complete Examples
   - Start with realistic actuarial data
   - Show the operation in context
   - Include expected output
   - Test that it actually runs
4. Write the "When to use" Section
   Think: "An LLM should recommend this function when an actuary says..."
   - "I need to filter policies by coverage type"
   - "I want to find claims with specific descriptions"
   - "I need to identify which policies have certain riders"
Testing Your Documentation
Before submitting:
```bash
# Validate structure and examples
uv run pytest -k "your_method_name" -m gaspatchio_docstring_example -v

# Check just the linting
uv run gp-docstrings lint --file your_file.py --method "YourClass.your_method"

# Test execution and outputs
uv run gp-docstrings run-print-check --file your_file.py --method "YourClass.your_method"
```
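Advanced Features
Handling Different Example Types
Skip execution for illustrative examples:
````markdown
```python skip
# This won't be executed, just shown for illustration
import some_external_system
result = some_external_system.complex_operation()
```
````
Examples expected to fail:
````markdown
```python expect_failure
# Setup is complete, so the failure comes from the missing column, not a NameError
import datetime
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame({"issue_date": [datetime.date(2020, 3, 15)]})
af.select(af["nonexistent_column"].dt.year()).collect()  # Raises ColumnNotFoundError
```
````
Skip output checking:
````markdown
```python no_output_check
# Code runs but its output isn't validated (useful for timing tests)
import datetime
import time
from gaspatchio_core import ActuarialFrame
af = ActuarialFrame({"issue_date": [datetime.date(2020, 3, 15)]})
start = time.time()
result = af.select(af["issue_date"].dt.year().alias("issue_year")).collect()
print(f"Took {time.time() - start:.2f} seconds")
```
````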
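The three info-string modifiers map naturally onto a small dispatch. A minimal sketch of how a runner might interpret them; the modifier names come from the fences above, while `run_example` and its result strings are hypothetical and say nothing about how the real engine reports results.

```python
from __future__ import annotations

import contextlib
import io


def run_example(info: str, code: str, expected: str | None) -> str:
    """Run one example according to its fence info string, e.g. "python skip"."""
    modifiers = set(info.split()[1:])
    if "skip" in modifiers:
        return "skipped"
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        # expect_failure examples pass only if they actually raise.
        return "passed" if "expect_failure" in modifiers else f"failed: {exc!r}"
    if "expect_failure" in modifiers:
        return "failed: expected an exception"
    if "no_output_check" in modifiers or expected is None:
        return "passed"
    ok = buffer.getvalue().rstrip() == expected.rstrip()
    return "passed" if ok else "failed: output mismatch"
```

As written, any exception satisfies `expect_failure`, including a `NameError` caused by missing setup. A stricter runner would also check the exception type, which is one reason the expect_failure example above includes its full setup.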
Debugging Common Issues
Ruff Linting Failures:
- Check for unused imports, such as import polars as pl in an example that never references pl
- Ensure proper spacing around operators
- Use af["column"] instead of pl.col("column") where possible
Output Mismatches:
- Polars formatting can change between versions
- Use --gp-update-examples to regenerate outputs after valid changes
- Check for trailing whitespace in expected outputs
Structure Validation Errors:
- Ensure all public methods have "When to use" sections
- Check that examples ending in expressions have output blocks
- Verify Markdown fencing is correct (```python, not just ```)
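Trailing whitespace is hard to spot by eye. A hypothetical debugging helper built on the standard `difflib` module makes it visible by diffing the `repr()` of each line; it is not part of gp-docstrings.

```python
import difflib


def show_output_diff(expected: str, actual: str) -> list[str]:
    """Return a unified diff with repr()-quoted lines so whitespace shows."""
    diff = difflib.unified_diff(
        [repr(line) for line in expected.splitlines()],
        [repr(line) for line in actual.splitlines()],
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return list(diff)
```

Printing the returned lines shows a stray space as `'│ 2020       │ '` next to `'│ 2020       │'`, which a plain diff would render identically.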
Benefits
This system provides several key advantages:
For Developers:
- Never-stale examples that break builds when they become incorrect
- Consistent documentation structure across the codebase
- Automated generation of example outputs
For Actuaries:
- Domain-specific examples that directly relate to their work
- Confidence that examples actually work as documented
- Rich context about when and why to use each function
For AI Systems:
- Structured, semantic content perfect for RAG
- Clear use-case descriptions that enable intelligent recommendations
- Validated examples that can be safely suggested to users
The investment in this system pays dividends through improved developer productivity, user experience, and AI integration capabilities.