IRBStudio is a Python package designed as a comprehensive AIRB Scenario & Impact Analysis Engine. It empowers bank risk analysts and model owners to run sophisticated "what-if" scenarios locally, simulating the impact of modeling choices and parameter assumptions on Risk-Weighted Assets (RWA) and capital requirements.
The core goal is to provide a clear, quantitative basis for strategic decisions, such as understanding the capital impact of improving a PD model's predictive power (AUC) or comparing standardized approach vs. AIRB capital requirements.
- Hybrid Portfolio Segmentation: Correctly segments portfolios into historical vs. application samples, and existing vs. new clients
- Beta Mixture Distribution: Learns portfolio risk profile from historical PD scores
- Rating Migration Matrices: Captures temporal dynamics and rating grade transitions
- AUC-Driven PD Simulation: Generates realistic PD distributions calibrated to target model performance
- Monte Carlo Framework: Produces full distributions of RWA outcomes with confidence intervals
- AIRB Calculator: Full implementation for mortgage portfolios with regulatory formulas
- SA Calculator: Standardized approach baseline for comparison scenarios
- Modular Design: Easy to extend to other asset classes and approaches
- YAML-Based Setup: Configure all aspects (data mapping, regulatory parameters, scenarios) in one file
- Pydantic Validation: Robust schema validation with clear error messages
- Flexible Column Mapping: Works with any portfolio data structure
- Plotly Dashboards: Generate rich, interactive HTML reports
- Distribution Plots: Visualize RWA distributions across scenarios
- Comparison Charts: Waterfall charts, scenario comparisons, percentile analysis
- Statistical Summaries: Mean, median, standard deviation, skewness, kurtosis, VaR metrics
- 340+ Tests: Comprehensive test suite that exceeds the originally planned number of tests
- Memory Efficient: Handles large portfolios with optimized processing modes
- Fully Typed: Type hints throughout for better IDE support
- Extensive Logging: Detailed logging for debugging and audit trails
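The rating migration matrices listed above can be illustrated with a small standalone sketch. It is not IRBStudio's implementation: the function name `migration_matrix` and the sample transitions are hypothetical, and the estimator is a plain row-normalised count of observed grade-to-grade moves.

```python
from collections import Counter, defaultdict


def migration_matrix(transitions):
    """Estimate transition probabilities from (from_grade, to_grade) pairs.

    Hypothetical helper: each row is the observed count of moves out of a
    grade, divided by the row total, so every row sums to 1.
    """
    counts = defaultdict(Counter)
    for start, end in transitions:
        counts[start][end] += 1

    matrix = {}
    for start, row in counts.items():
        total = sum(row.values())
        matrix[start] = {end: n / total for end, n in row.items()}
    return matrix


# Made-up one-period transitions; "D" stands for default.
observed = [
    ("A", "A"), ("A", "A"), ("A", "B"), ("A", "A"),
    ("B", "B"), ("B", "A"), ("B", "D"), ("B", "B"),
]
matrix = migration_matrix(observed)
for grade, row in matrix.items():
    print(grade, row)
```

Grades that never appear as a starting grade get no row in this sketch; a production version would need an explicit grade list and a rule for empty rows.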
Current Version: 0.1.0 (Beta)
The project has completed MVP development and is entering the production-readiness phase.
Feature 1: Foundation & Configuration (100%)
- ✅ Project structure and dependency management
- ✅ Pydantic configuration schemas with validation
- ✅ YAML configuration loader
- ✅ Centralized logging framework
Feature 2: Core Simulation Engine (100%)
- ✅ Beta Mixture distribution fitter with EM algorithm
- ✅ Rating migration matrix calculator
- ✅ AUC-driven score generation
- ✅ Hybrid portfolio simulator (OOP refactored)
- ✅ Monte Carlo simulation framework with parallel processing
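One common way to generate scores that hit a target AUC is the binormal model, sketched below. This is an assumption made for illustration, not necessarily the method IRBStudio uses. Non-defaulters draw a latent score from N(0, 1) and defaulters from N(d, 1), where d = sqrt(2) * inverse_normal_cdf(AUC). All function names here are hypothetical, and the scores are latent values rather than calibrated PDs.

```python
import random
from bisect import bisect_left
from statistics import NormalDist


def separation_for_auc(target_auc):
    """Mean shift d so that N(d, 1) vs N(0, 1) scores have the target AUC."""
    return 2 ** 0.5 * NormalDist().inv_cdf(target_auc)


def simulate_scores(n, default_rate, target_auc, seed=42):
    """Draw default flags, then latent scores shifted upward for defaulters."""
    rng = random.Random(seed)
    shift = separation_for_auc(target_auc)
    defaults = [rng.random() < default_rate for _ in range(n)]
    scores = [rng.gauss(shift if is_default else 0.0, 1.0) for is_default in defaults]
    return scores, defaults


def empirical_auc(scores, defaults):
    """Share of (defaulter, non-defaulter) pairs where the defaulter scores higher."""
    bad = [s for s, d in zip(scores, defaults) if d]
    good = sorted(s for s, d in zip(scores, defaults) if not d)
    wins = sum(bisect_left(good, s) for s in bad)
    return wins / (len(bad) * len(good))


scores, defaults = simulate_scores(n=20_000, default_rate=0.03, target_auc=0.75)
print(f"target AUC 0.75, simulated AUC {empirical_auc(scores, defaults):.3f}")
```

The simulated AUC fluctuates around the target because the sample is finite; the gap shrinks as the number of defaulters grows.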
Feature 3: RWA Calculators (100%)
- ✅ AIRB mortgage calculator with full regulatory formulas
- ✅ SA mortgage calculator for comparison
- ✅ Modular base classes for extensibility
- ✅ Result aggregation and statistics
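For orientation, the Basel IRB risk-weight function for residential mortgage exposures can be sketched in a few lines. This is a simplified illustration, not the package's `AIRBMortgageCalculator`: the function names are hypothetical, and PD/LGD floors, jurisdiction-specific scaling factors, and defaulted-exposure treatment are deliberately left out. The defaults mirror the `asset_correlation: 0.15` and `confidence_level: 0.999` values used in the configuration example in this README.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def airb_capital_requirement(pd, lgd, asset_correlation=0.15, confidence_level=0.999):
    """Capital requirement K per unit of exposure (retail form, no maturity adjustment)."""
    pd = min(max(pd, 1e-9), 1 - 1e-9)  # keep inv_cdf away from 0 and 1
    r = asset_correlation
    stressed_pd = _STD_NORMAL.cdf(
        (_STD_NORMAL.inv_cdf(pd) + r ** 0.5 * _STD_NORMAL.inv_cdf(confidence_level))
        / (1 - r) ** 0.5
    )
    return lgd * (stressed_pd - pd)


def airb_rwa(exposure, pd, lgd, **kwargs):
    """Risk-weighted assets: RWA = 12.5 * K * EAD."""
    return 12.5 * airb_capital_requirement(pd, lgd, **kwargs) * exposure


rwa = airb_rwa(exposure=100_000, pd=0.02, lgd=0.25)
print(f"RWA for a 100,000 exposure at PD 2% and LGD 25%: {rwa:,.0f}")
```

Because K grows with PD over the typical range, two scenarios that assign different PDs to the same loans produce different RWA, which is the effect the scenario comparison measures.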
Feature 4: Integrated Analysis (100%)
- ✅ IntegratedAnalysis orchestration layer
- ✅ Scenario comparison framework
- ✅ Statistical summaries and percentile analysis
- ✅ High-level API (run_analysis, run_scenario_comparison)
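The statistical summaries and percentile analysis listed above amount to describing a vector of simulated RWA outcomes. The sketch below shows one way to do that with the standard library only. The `summarize` helper and the lognormal draws are made up for illustration and do not reflect IRBStudio's actual output format.

```python
import random
from statistics import mean, median, quantiles, stdev


def summarize(draws):
    """Summary statistics for Monte Carlo draws (hypothetical helper)."""
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(draws, n=100, method="inclusive")
    return {
        "mean": mean(draws),
        "median": median(draws),
        "std": stdev(draws),
        "p5": cuts[4],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Stand-in for simulated portfolio RWA: 1,000 lognormal draws with a fixed seed.
rng = random.Random(42)
rwa_draws = [rng.lognormvariate(13.0, 0.1) for _ in range(1000)]

summary = summarize(rwa_draws)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```

The 5th and 95th percentiles together give a simple 90% interval for the simulated RWA.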
Feature 5: Reporting & Visualization (100%)
- ✅ Interactive Plotly dashboards
- ✅ RWA distribution plots
- ✅ Scenario comparison visualizations
- ✅ Waterfall charts and summary tables
- ✅ HTML report generation
Feature 6: Testing & Quality (100%)
- ✅ 340+ comprehensive tests (104% of the planned test count)
- ✅ Unit, integration, and end-to-end tests
- ✅ Edge case handling
- ✅ Performance benchmarks
Near Term (Q1 2026)
- 📝 Enhanced documentation and user guides
- 🎓 Tutorial notebooks for common workflows
- 📦 PyPI package publication
- 🔄 CI/CD pipeline setup
Future Extensions
- 🔮 LGD & EAD simulation engines
- 📈 Multi-period portfolio growth simulation
- 🏢 Additional asset classes (Corporate, SME, Retail)
- 🌐 Web interface for non-technical users
- 📊 Model monitoring and validation toolkit
- Python 3.9+
- pip package manager
- A virtual environment tool (venv, conda, etc.)
# Clone the repository
git clone https://github.com/jacekkrawiec/IRBStudio.git
cd IRBStudio
# Create and activate virtual environment
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
# Install in editable mode with dependencies
pip install -e .
# Alternatively, install from PyPI once the package is published (see the Near Term roadmap)
pip install irbstudio
Here's a complete example showing how to run an AIRB scenario analysis:
Create a portfolio CSV file (portfolio.csv):
loan_id,balance,pd,score,rating,reporting_date,default_flag,into_default_flag,ltv,property_value
L001,100000,0.02,0.05,A,2024-01-01,0,0,0.80,125000
L002,200000,0.05,0.10,B,2024-01-01,0,0,0.75,266667
L003,150000,0.03,0.07,A,2024-01-01,0,0,0.70,214286
Create config.yaml:
# Column mapping - map your data to canonical fields
column_mapping:
  loan_id: loan_id
  exposure: balance
  pd: pd
  score: score
  rating: rating
  date: reporting_date
  default_flag: default_flag
  into_default_flag: into_default_flag
  ltv: ltv
# Regulatory parameters
regulatory:
  jurisdiction: generic
  asset_correlation: 0.15
  confidence_level: 0.999
# Define scenarios to compare
scenarios:
  - name: "Baseline"
    description: "Current model performance"
    pd_auc: 0.75
    portfolio_default_rate: 0.03
    lgd: 0.25
    new_loan_rate: 0.10
    rating_pd_map:
      AAA: 0.001
      AA: 0.005
      A: 0.01
      BBB: 0.03
      BB: 0.05
      B: 0.10
  - name: "Improved Model"
    description: "Better PD model with higher AUC"
    pd_auc: 0.85
    portfolio_default_rate: 0.03
    lgd: 0.25
    new_loan_rate: 0.10
    rating_pd_map:
      AAA: 0.001
      AA: 0.005
      A: 0.01
      BBB: 0.03
      BB: 0.05
      B: 0.10
from irbstudio import run_analysis
# Run the analysis
results = run_analysis(
    config_path="config.yaml",
    portfolio_path="portfolio.csv",
    n_iterations=1000,
    random_seed=42,
    output_dir="results",
    memory_efficient=False,
)
# Access results
print(f"Baseline Mean RWA: ${results['Baseline']['AIRB']['mean']:,.0f}")
print(f"Improved Mean RWA: ${results['Improved Model']['AIRB']['mean']:,.0f}")
print(f"Capital Savings: ${results['capital_delta']:,.0f}")
# An interactive HTML dashboard is automatically saved to results/dashboard.html
For more control, use the programmatic API:
from irbstudio.data.loader import load_portfolio, load_config
from irbstudio.simulation.portfolio_simulator import PortfolioSimulator
from irbstudio.engine.integrated_analysis import IntegratedAnalysis
from irbstudio.engine.mortgage import AIRBMortgageCalculator
# Load configuration and data
config = load_config("config.yaml")
portfolio_df = load_portfolio("portfolio.csv", config.column_mapping)
# Create analysis engine
analysis = IntegratedAnalysis()
# Add calculator
calculator = AIRBMortgageCalculator(
    regulatory_params={'lgd': 0.25, 'asset_correlation': 0.15}
)
analysis.add_calculator('AIRB', calculator)
# Create simulator
simulator = PortfolioSimulator(
    portfolio_df=portfolio_df,
    score_to_rating_bounds={'A': (0.03, 0.05), 'B': (0.05, 0.15)},
    rating_col='rating',
    loan_id_col='loan_id',
    date_col='reporting_date',
    default_col='default_flag',
    into_default_flag_col='into_default_flag',
    score_col='score',
    target_auc=0.80,
)
# Add scenario and run
analysis.add_scenario('baseline', simulator, n_iterations=1000)
results = analysis.run_scenario('baseline', random_seed=42)
# Get statistics
stats = analysis.get_summary_stats('baseline', 'AIRB')
percentiles = analysis.get_percentiles('baseline', 'AIRB')
- User Guide: Comprehensive guide to using IRBStudio
- API Reference: Detailed API documentation
- Project Plan: Technical design and architecture
- Examples: Sample scripts and notebooks
- Tests Documentation: Test coverage and guidelines
- notebooks/integrated_analysis_example.ipynb: Complete workflow demonstration
- notebooks/freddie_mac_sample_dataset.ipynb: Real-world mortgage data analysis
IRBStudio follows a modular, layered architecture:
┌─────────────────────────────────────────────────────┐
│              High-Level API (main.py)               │
│     run_analysis() | run_scenario_comparison()      │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────┴───────────────────────────────┐
│         IntegratedAnalysis (Orchestration)          │
│  Coordinates Simulators + Calculators + Reporting   │
└─────────┬───────────────────────────┬───────────────┘
          │                           │
┌─────────┴──────────┐    ┌───────────┴───────────────┐
│ PortfolioSimulator │    │ RWA Calculators           │
│ - Beta Mixture     │    │ - AIRBCalculator          │
│ - Migrations       │    │ - SACalculator            │
│ - Score Gen        │    │ - BaseCalculator          │
│ - Monte Carlo      │    │                           │
└────────────────────┘    └───────────────────────────┘
          │                           │
┌─────────┴───────────────────────────┴───────────────┐
│               Data Layer (loader.py)                │
│   Config Loading | Portfolio Loading | Validation   │
└─────────────────────────────────────────────────────┘
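The layering in the diagram, with an orchestrator that holds named calculators sharing a common base class, can be sketched as follows. Every class and method here (`BaseRWACalculator`, `FlatWeightCalculator`, `Orchestrator`) is a hypothetical stand-in chosen for illustration; the real IRBStudio classes may have different names and signatures, and the 35% flat weight is a placeholder value only.

```python
from abc import ABC, abstractmethod


class BaseRWACalculator(ABC):
    """Hypothetical base class: subclasses only define the per-loan risk weight."""

    @abstractmethod
    def risk_weight(self, loan):
        """Return the risk weight (as a fraction) for one loan record."""

    def calculate(self, portfolio):
        """Total RWA = sum of risk weight * exposure over all loans."""
        return sum(self.risk_weight(loan) * loan["exposure"] for loan in portfolio)


class FlatWeightCalculator(BaseRWACalculator):
    """Toy calculator that applies one flat risk weight to every loan."""

    def __init__(self, weight):
        self.weight = weight

    def risk_weight(self, loan):
        return self.weight


class Orchestrator:
    """Toy orchestration layer: runs every registered calculator on a portfolio."""

    def __init__(self):
        self._calculators = {}

    def add_calculator(self, name, calculator):
        self._calculators[name] = calculator

    def run(self, portfolio):
        return {name: calc.calculate(portfolio) for name, calc in self._calculators.items()}


portfolio = [{"exposure": 100_000}, {"exposure": 200_000}, {"exposure": 150_000}]
engine = Orchestrator()
engine.add_calculator("flat_35", FlatWeightCalculator(0.35))
results = engine.run(portfolio)
print(results)
```

Supporting a new asset class in this pattern means adding one subclass that implements `risk_weight`; the orchestration code does not change.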
IRBStudio has extensive test coverage with 340+ tests:
# Run all tests
pytest
# Run with coverage report
pytest --cov=irbstudio --cov-report=html
# Run specific test module
pytest tests/test_portfolio_simulator.py -v
# Run with markers
pytest -m "not slow"
Test Categories:
- Unit tests for individual components
- Integration tests for component interactions
- End-to-end tests for complete workflows
- Edge case tests for robustness
- Performance benchmarks
Contributions are welcome! Whether it's:
- 🐛 Bug reports
- 💡 Feature suggestions
- 📖 Documentation improvements
- 🔧 Code contributions
Please feel free to open an issue or submit a pull request.
# Clone and install in editable mode
git clone https://github.com/jacekkrawiec/IRBStudio.git
cd IRBStudio
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
flake8 irbstudio
black --check irbstudio
mypy irbstudio
This project is licensed under the MIT License - see the LICENSE file for details.
This project is for educational and research purposes. It should not be used for actual regulatory capital calculations without:
- Independent verification by qualified professionals
- Validation against regulatory requirements
- Approval from relevant authorities
The models and calculations provided are simplified representations and may not capture all regulatory nuances.
- Freddie Mac for providing the Single-Family Loan-Level Dataset
- The Python scientific computing community (NumPy, Pandas, SciPy)
- The Plotly team for excellent visualization tools
- Author: Jacek Krawiec
- GitHub: @jacekkrawiec
- Repository: IRBStudio
For technical details, see the Project Plan and API Reference.
This project is in the alpha stage of development. The foundational modules for configuration and data loading are complete, but the core simulation engine is still under construction.
- ✅ Configuration: Load and validate analysis parameters from a YAML file using Pydantic schemas.
- ✅ Data Loading: Ingest portfolio data from .csv and .parquet files.
- ✅ Column Mapping: Rename user-defined columns to a standardized, canonical format.
- ✅ Validation: Robust validation for configuration and data, with clear error messages.
- ✅ Logging: Centralized logging for better traceability.
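The column-mapping step described above can be illustrated with a standard-library sketch. It assumes the mapping is keyed by canonical name with the user's column name as the value, as in the `column_mapping` blocks shown in this README. The helper `apply_column_mapping` is hypothetical and is not the package's loader.

```python
import csv
import io


def apply_column_mapping(rows, column_mapping):
    """Rename source columns to canonical names.

    column_mapping is {canonical_name: source_column}. Raises KeyError with a
    readable message if a mapped source column is missing from a row.
    """
    renamed = []
    for row in rows:
        missing = [src for src in column_mapping.values() if src not in row]
        if missing:
            raise KeyError(f"Missing source columns: {missing}")
        renamed.append({canonical: row[src] for canonical, src in column_mapping.items()})
    return renamed


# In-memory stand-in for a small portfolio CSV file.
raw_csv = "loan_identifier,balance,ltv\nA1,100000,0.8\nB2,200000,0.7\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

mapping = {"loan_id": "loan_identifier", "exposure": "balance", "ltv": "ltv"}
canonical_rows = apply_column_mapping(rows, mapping)
print(canonical_rows[0])
```

Values stay as strings here because `csv.DictReader` does no type conversion; casting and validation would be a separate step.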
The immediate focus is on building the core simulation and calculation engine for the Minimum Viable Product (MVP).
- Feature 2: Core Simulation & Calculation Engine
- PD Simulation: Implement the core logic to simulate a PD distribution based on a target AUC.
- RWA Calculators: Build the initial AIRB and SA RWA calculators for a mortgage portfolio.
- Feature 3: End-to-End Pipeline & Reporting
- Orchestration: Wire all the components together into a main analysis pipeline.
- Reporting: Generate an interactive HTML dashboard with Plotly to visualize scenario impacts.
- Example Notebook: Provide a comprehensive example notebook demonstrating a full analysis.
As the project is under active development, the best way to use it is by cloning the repository and installing it in editable mode.
- Python 3.9+
- A virtual environment (e.g., venv)
- Clone the repository:
    git clone https://github.com/jacekkrawiec/IRBStudio.git
    cd IRBStudio
- Create and activate a virtual environment:
    # For Windows
    python -m venv .venv
    .\.venv\Scripts\activate
    # For macOS/Linux
    python3 -m venv .venv
    source .venv/bin/activate
- Install the package in editable mode: This command will install the package and its dependencies. The -e flag means that any changes you make to the source code will be immediately available.
    pip install -e .
The following example demonstrates the current data loading capabilities.
- Create a sample portfolio file (my_portfolio.csv):
    loan_identifier,balance,ltv
    A1,100000,0.8
    B2,200000,0.7
- Create a configuration file (config.yaml): This file tells IRBStudio how to interpret your data and what scenarios to run.
    column_mapping:
      loan_id: loan_identifier
      exposure: balance
      ltv: ltv
    regulatory:
      asset_correlation: 0.15
    scenarios:
      - name: "Baseline"
        pd_auc: 0.75
      - name: "Improved PD Model"
        pd_auc: 0.80
- Run the loader in Python:
    from irbstudio.data import load_config, load_portfolio
    # Load and validate the configuration
    config = load_config("config.yaml")
    print("Config loaded successfully!")
    # Load and validate the portfolio data
    portfolio_df = load_portfolio("my_portfolio.csv", config.column_mapping)
    print("Portfolio loaded successfully:")
    print(portfolio_df.head())
Contributions are welcome, even at this early stage! Whether it's reporting a bug, suggesting a feature, or writing code, your help is appreciated.
Please feel free to open an issue or submit a pull request.
- See docs/ and documentation/ for project plans and technical documentation.
- See examples/ for example scripts.
- See notebooks/ for interactive demos.
This project is for educational and research purposes and should not be used for actual regulatory capital calculation without independent verification.