R implementation of Python crepes package - Conformal prediction for regression, classification, and predictive systems

valeman/cRepes-R

cRepes: Conformal Prediction in R

⚠️ PRE-RELEASE BETA VERSION ⚠️

This package is currently in beta/pre-release phase. While the core functionality is implemented and tested, we encourage users to:

  • Report any issues you encounter via GitHub Issues
  • Perform extensive stress testing with your own datasets and use cases
  • Validate results against expected behavior, especially for edge cases

Your feedback and bug reports are invaluable for improving the package before the stable release!

Overview

cRepes is an R implementation of the Python crepes package by Henrik Boström. It provides conformal prediction methods for regression, classification, and conformal predictive systems, and is aimed at both R-native users and users migrating from Python. The implementation targets parity with Python crepes v0.8.x semantics:

  • Classification set inclusion uses p >= threshold.
  • Regression intervals use explicit order statistics and return only 'lower'/'upper' columns.
  • CPS threshold queries follow CDF orientation: predict(y=t) ≈ P(Y ≤ t | X).
  • Seeds persist across fit/calibrate/predict for deterministic smoothing.
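
The classification rule in the list above can be illustrated without the package. The base-R sketch below computes non-smoothed conformal p-values from hypothetical calibration nonconformity scores and keeps a label when its p-value is at least the threshold. The toy scores, the helper name `conformal_p`, and the assumption that the threshold equals 1 minus the confidence level are illustrative and are not taken from the cRepes source.

```r
# Hypothetical nonconformity scores from a calibration set (larger = stranger)
cal_scores <- c(0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.80, 0.90)

# Non-smoothed conformal p-value: share of scores at least as large as the
# test score, counting the test score itself in numerator and denominator
conformal_p <- function(score, cal_scores) {
  (sum(cal_scores >= score) + 1) / (length(cal_scores) + 1)
}

# Hypothetical scores of one test object, one per candidate label
test_scores <- c(setosa = 0.08, versicolor = 0.65, virginica = 0.95)
p_values <- sapply(test_scores, conformal_p, cal_scores = cal_scores)

# Inclusion rule: keep a label when its p-value is >= the threshold
confidence <- 0.8
threshold <- 1 - confidence   # assumption: threshold = 1 - confidence
pred_set <- names(p_values)[p_values >= threshold]

print(p_values)
print(pred_set)
```

A label with a small p-value looks unusual relative to the calibration data and is dropped from the set; raising the confidence level lowers the threshold and can only make the set larger.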

🙏 Full Credit: All algorithms, design, and theoretical foundations come from Henrik Boström and the Python crepes package. This R implementation makes these powerful conformal prediction methods accessible to the R community while maintaining complete fidelity to the original.

Key Features

  • 📊 Conformal Regression: Prediction intervals with finite-sample coverage guarantees
  • 🎯 Conformal Classification: Prediction sets and p-values for multi-class problems
  • 📈 Conformal Predictive Systems (CPS): Full predictive distributions with CDFs, percentiles, and sampling
  • 🔬 Advanced Methods: Mondrian categorization, difficulty estimation, online calibration
  • ⚡ Complete Python API: 100% compatibility with crepes v0.8.0
  • 🧪 Theoretical Rigor: Exact finite-sample coverage guarantees under exchangeability
  • 🔄 Cross-Validation: Built-in coverage validation and efficiency assessment
  • 🚀 Expanded Base Learner Support: randomForest, ranger, glm/lm, SVM, multinom/nnet, and custom functions
  • 🎛️ Dual API (planned): Both Python-style (sklearn-like) and R-style (caret-like) interfaces
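
To make the difficulty estimation feature in the list above more concrete, here is a minimal base-R sketch of difficulty-normalized intervals. It does not call cRepes; the residuals, the difficulty estimates (`sigma`), and all variable names are hypothetical, and the order-statistic index is the standard split conformal choice rather than a confirmed detail of the package.

```r
# Hypothetical calibration data: absolute residuals and difficulty estimates
cal_abs_res <- c(0.5, 1.0, 1.5, 2.0, 3.0, 0.8, 1.2, 2.4, 4.0)
cal_sigma   <- c(0.5, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0)

# Normalized nonconformity scores: residual divided by difficulty
scores <- sort(cal_abs_res / cal_sigma)

# Order statistic for the requested confidence level
confidence <- 0.8
n <- length(scores)
k <- ceiling((n + 1) * confidence)
stopifnot(k <= n)   # otherwise there are too few calibration points
q <- scores[k]

# Two test objects with the same point prediction but different difficulty
y_hat <- c(20, 20)
sigma_test <- c(0.5, 2.0)
intervals <- data.frame(lower = y_hat - q * sigma_test,
                        upper = y_hat + q * sigma_test)
print(intervals)
```

The object with the larger difficulty estimate receives the wider interval, which is the point of normalization: the interval width adapts per object instead of being constant.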

Installation

You can install the development version of cRepes from GitHub:

# install.packages("devtools")
devtools::install_github("tuvelofstrom/cRepes-R")

Optional Dependencies

For enhanced performance and additional base learners, consider installing:

# Fast random forests (recommended for large datasets)
install.packages("ranger")

# Support Vector Machines
install.packages("e1071")

# Neural networks and multinomial regression
install.packages("nnet")

Quick Start

⚡ Python-Style Interface (For sklearn Users)

This interface exposes wrappers with method-style calls (R6-like). Below are small, concrete examples using built-in datasets so the calls can be run interactively.

# Load package and a base learner
library(cRepes)
library(randomForest)

#=== Classification example (iris) ===
data(iris)
set.seed(1)
# Three-way split: fit, calibrate, and test on disjoint rows
idx <- sample(nrow(iris))
train_idx <- idx[1:90]
cal_idx <- idx[91:120]
test_idx <- idx[121:150]
X_train <- iris[train_idx, 1:4]
y_train <- iris[train_idx, "Species"]
X_cal <- iris[cal_idx, 1:4]
y_cal <- iris[cal_idx, "Species"]
X_test <- iris[test_idx, 1:4]
y_test <- iris[test_idx, "Species"]

cc <- wrap_classifier(randomForest::randomForest)
cc$fit(X_train, y_train)
cc$calibrate(X_cal, y_cal)  # calibrate on rows not used for fitting

pred_sets <- cc$predict_set(X_test, confidence = 0.9)
p_values <- cc$predict_p(X_test)
eval_cls <- cc$evaluate(X_test, y_test)
print(head(pred_sets))
print(head(p_values))
print(eval_cls)

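#=== Regression example (mtcars) ===
data(mtcars)
set.seed(1)
# Three-way split: fit, calibrate, and test on disjoint rows
idx_r <- sample(nrow(mtcars))
train_r <- idx_r[1:16]
cal_r <- idx_r[17:26]
test_r <- idx_r[27:32]
predictors <- setdiff(names(mtcars), "mpg")
X_train_r <- mtcars[train_r, predictors]
y_train_r <- mtcars[train_r, "mpg"]
X_cal_r <- mtcars[cal_r, predictors]
y_cal_r <- mtcars[cal_r, "mpg"]
X_test_r <- mtcars[test_r, predictors]
y_test_r <- mtcars[test_r, "mpg"]

cr <- wrap_regressor(randomForest::randomForest)
cr$fit(X_train_r, y_train_r)
cr$calibrate(X_cal_r, y_cal_r)  # calibrate on rows not used for fitting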

intervals <- cr$predict_int(X_test_r, confidence = 0.9)
print(head(intervals))
print(cr$evaluate(X_test_r, y_test_r))
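
For readers who want to see what an order-statistic interval looks like under the hood, the following base-R sketch performs standard split conformal regression by hand with `lm`. It does not use cRepes and is not its implementation; the split sizes, the variable names, and the index formula `ceiling((n + 1) * confidence)` are assumptions chosen for illustration.

```r
data(mtcars)
set.seed(1)
idx <- sample(nrow(mtcars))
train <- mtcars[idx[1:16], ]
cal   <- mtcars[idx[17:26], ]
test  <- mtcars[idx[27:32], ]

# Point predictor fitted on the proper training set only
model <- lm(mpg ~ wt + hp, data = train)

# Absolute residuals on the calibration set, sorted ascending
abs_res <- sort(abs(cal$mpg - predict(model, newdata = cal)))

# Pick the k-th smallest residual as the interval half-width
confidence <- 0.9
n <- length(abs_res)
k <- ceiling((n + 1) * confidence)
if (k > n) stop("too few calibration rows for this confidence level")
half_width <- abs_res[k]

y_hat <- predict(model, newdata = test)
interval <- data.frame(lower = y_hat - half_width,
                       upper = y_hat + half_width)
print(interval)
```

Every test row gets the same half-width here because no difficulty estimate is used; only the interval center changes with the point prediction.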

📈 Conformal Predictive Systems (CPS)

CPS provides full predictive CDFs, percentiles, and sampling. The brief example below builds a simple CPS from the residuals of a base predictor (here, a linear model), computed on a calibration split that was not used for fitting.

library(cRepes)

# Simple CPS example using mtcars: obtain residuals from an lm on a calibration split
data(mtcars)
set.seed(2)
idx <- sample(nrow(mtcars), 0.7 * nrow(mtcars))
train <- mtcars[idx, ]
cal <- mtcars[-idx, ]

# Fit a point predictor and collect calibration residuals
fit_lm <- lm(mpg ~ ., data = train)
yhat_cal <- predict(fit_lm, newdata = cal)
residuals_cal <- cal$mpg - yhat_cal

# Create CPS and fit using residuals (small illustrative example)
cps <- conformal_predictive_system()
fit(cps, residuals = residuals_cal)

# For test points, supply point predictions y_hat and get intervals/percentiles
# (training rows are reused here only for brevity; use held-out rows in practice)
test <- train[1:3, ]
yhat_test <- predict(fit_lm, newdata = test)

intervals <- predict_int(cps, yhat_test, confidence = 0.9)
percentiles <- predict_percentiles(cps, yhat_test, higher_percentiles = c(90, 95))
cdfs <- predict_cpds(cps, yhat_test)

print(intervals)
print(percentiles)
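
The CDF orientation mentioned in the overview, predict(y=t) ≈ P(Y ≤ t | X), can be sketched in a few lines of base R. This is a simplified, non-smoothed version written for illustration only: the residuals, the helper name `cps_cdf`, and the (n + 1) denominator are assumptions, and the package's own handling of ties and smoothing may differ.

```r
# Hypothetical calibration residuals (y - y_hat) and one test point prediction
residuals_cal <- c(-3.1, -1.8, -0.9, -0.4, 0.2, 0.7, 1.5, 2.6, 4.0)
y_hat <- 20

# Candidate outcomes: the point prediction shifted by each residual
candidates <- sort(y_hat + residuals_cal)

# Estimated P(Y <= t | x): share of candidates at or below t,
# using n + 1 in the denominator to account for the unseen test outcome
cps_cdf <- function(t, candidates) {
  sum(candidates <= t) / (length(candidates) + 1)
}

p_le_21 <- cps_cdf(21, candidates)
print(p_le_21)

# The estimate is non-decreasing in t, as a CDF should be
grid <- seq(15, 25, by = 0.5)
cdf_values <- sapply(grid, cps_cdf, candidates = candidates)
print(cdf_values)
```

Larger thresholds give larger values, so a query at `t` answers "how likely is the outcome to be at most `t`", not the complementary tail.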

🔬 Advanced Features

Core Functionality

  • Regression: Prediction intervals with exact finite-sample coverage
  • Classification: Prediction sets for binary and multi-class problems
  • Conformal Predictive Systems: Complete predictive distributions beyond point predictions

Advanced Methods

  • Mondrian Categorization: Conditional coverage for different data regions
  • Difficulty Estimation: Adaptive intervals based on prediction uncertainty
  • Online Calibration: Update calibration with new data points
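
As an illustration of the Mondrian idea in the list above, the base-R sketch below computes one interval half-width per category instead of a single global one. It does not use cRepes; the residuals, category labels, and the helper name `half_width` are hypothetical, and the order-statistic index is the standard split conformal choice.

```r
# Hypothetical absolute calibration residuals, each with a category label
cal <- data.frame(
  abs_res  = c(0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0,
               1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
  category = rep(c("low_noise", "high_noise"), each = 9)
)

# Order-statistic half-width for one group of residuals
half_width <- function(abs_res, confidence) {
  n <- length(abs_res)
  k <- ceiling((n + 1) * confidence)
  if (k > n) return(Inf)   # too few calibration points for this level
  sort(abs_res)[k]
}

confidence <- 0.8
# Mondrian: calibrate separately within each category
widths <- tapply(cal$abs_res, cal$category, half_width,
                 confidence = confidence)
# Global alternative: one half-width for all objects
global <- half_width(cal$abs_res, confidence)

print(widths)
print(global)
```

With these toy numbers the global half-width is far too wide for the low-noise category and slightly too narrow for the high-noise one, which is the situation per-category calibration is meant to address.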

Base Learner Compatibility

The wrappers accept several base learners:

# Python-style interface
wrap_regressor(randomForest::randomForest)            # Random Forest
wrap_regressor(ranger::ranger)                        # Fast Random Forest
wrap_classifier(e1071::svm)                           # SVM
wrap_regressor(function(X,y) lm(y ~ ., data.frame(y=y, X)))  # Custom function
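
The custom-function entry above can be tried on its own before wrapping it. The sketch below assumes, as the example signature suggests, that a custom learner is called with a predictor data frame `X` and a response vector `y`, and that the returned model is later queried through `predict(model, newdata = ...)`; that contract is an assumption, not something confirmed by the package documentation. The name `lm_learner` is hypothetical.

```r
# Custom learner with the (X, y) signature used in the list above
lm_learner <- function(X, y) lm(y ~ ., data = data.frame(y = y, X))

data(mtcars)
X <- mtcars[, c("wt", "hp")]
y <- mtcars$mpg

# Fit outside of any wrapper to check that the learner works
model <- lm_learner(X, y)

# The returned model should respond to predict() with new predictor rows
preds <- predict(model, newdata = X[1:3, ])
print(preds)
```

Checking the learner in isolation like this makes it easier to tell whether a later failure comes from the learner itself or from the conformal wrapper.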

Theoretical Guarantees

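  • Coverage: Finite-sample validity under exchangeability, meaning prediction regions contain the true value with probability at least the requested confidence level
  • Efficiency: Assessed through prediction set size and interval width, which depend on the quality of the base learner
  • Calibration: PIT uniformity testing for distributional validity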
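
The coverage property can be checked empirically with a small simulation. The base-R sketch below draws exchangeable scores, forms the order-statistic threshold from 99 calibration scores, and records how often a fresh test score falls below it. It is independent of cRepes; the score distribution, sample sizes, and seed are arbitrary choices for illustration.

```r
set.seed(42)
confidence <- 0.9
n_cal <- 99
n_trials <- 2000

# Index of the order statistic used as the threshold
k <- ceiling((n_cal + 1) * confidence)

covered <- logical(n_trials)
for (i in seq_len(n_trials)) {
  # Exchangeable absolute residuals: calibration scores plus one test score
  scores <- abs(rnorm(n_cal + 1))
  cal_scores <- sort(scores[1:n_cal])
  test_score <- scores[n_cal + 1]
  covered[i] <- test_score <= cal_scores[k]
}

coverage <- mean(covered)
print(coverage)
```

The observed coverage should land close to the requested 0.9, with some simulation noise; the guarantee is marginal, so it holds on average over calibration and test draws rather than for every individual calibration set.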

📚 Documentation & Examples

Comprehensive Guides

Documentation & Access

Option 1: After installing the package

# View available vignettes
vignette(package = "cRepes")

# Access the main introduction vignette
vignette("introduction", package = "cRepes")

Option 2: View source files directly

Available Documentation:

  • Introduction to cRepes: Complete overview of conformal prediction methods and package usage
  • Function Documentation: help(package = "cRepes") - Full function reference
  • Examples: Run example("wrap_regressor") for interactive examples

Ready-to-Run Examples

After installing the package, find example files in:

# Get the examples directory path
examples_dir <- system.file("examples", package = "cRepes")
list.files(examples_dir, pattern = "\\.R$", full.names = TRUE)

# Run a specific example
source(file.path(examples_dir, "example_regression.R"))

Available Examples:

  • example_regression.R: Basic and advanced regression examples with multiple base learners
  • example_classification.R: Multi-class classification with different methods including ranger
  • example_cps.R: Full predictive systems with distributional analysis

✅ Validation & Testing

🧪 Robust Test Suite: The package includes 75+ comprehensive tests covering:

  • API compatibility with Python crepes
  • Coverage guarantee validation
  • Mathematical property verification
  • Error handling and edge cases

# Run all tests
devtools::test()

# Test specific components
devtools::test(filter = "cross-language-parity-regression")  # filter matches file names without the "test-" prefix

Citing cRepes

If you use cRepes in your research, please cite:

@Manual{cRepes,
  title = {cRepes: Conformal Prediction in R},
  author = {Tuwe Löfström-Cavallin},
  year = {2025},
  note = {R package version 0.8.0},
  url = {https://github.com/tuvelofstrom/cRepes-R}
}

Please also cite the original Python crepes package:

@inproceedings{bostrom2024,
  title={Conformal Prediction in Python with crepes},
  author={Bostr{\"o}m, Henrik},
  booktitle={Proc. of the 13th Symposium on Conformal and Probabilistic Prediction with Applications},
  pages={236--249},
  year={2024},
  organization={PMLR},
  url = {https://github.com/henrikbostrom/crepes}
}

🤝 Getting Help & Contributing

Getting Help

  • 📖 Documentation: Start with vignette("introduction", package = "cRepes")
  • 💻 Examples: Access examples via system.file("examples", package = "cRepes")
  • 🔍 Function Help: Use help("wrap_regressor") for specific functions
  • 📋 Package Overview: Run help(package = "cRepes") for complete function list
  • 🐛 Issues: Report bugs or request features on GitHub Issues

Contributing

Contributions are welcome! The package follows standard R development practices:

  • Testing: 75+ comprehensive tests ensure reliability
  • Documentation: Roxygen2 documentation for all functions
  • Examples: Real-world examples for all major features
  • CI/CD: Automated testing and validation

Please see CONTRIBUTING.md for development guidelines.

🙏 Acknowledgments

This R package is built upon the outstanding work of Henrik Boström and the Python crepes package. All credit for the original algorithms, theoretical foundations, and software design goes to Henrik Boström and contributors to the Python crepes project.

Key Contributions:

  • Core Algorithms: Henrik Boström (Python crepes)
  • R Implementation: Faithful translation with dual API interfaces
  • Comprehensive Testing: 75+ tests for reliability and Python compatibility

The goal of cRepes is to make these powerful conformal prediction methods accessible to the R community while maintaining the mathematical rigor and quality of the original implementation.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ for the R community
Conformal Prediction for Everyone
Python crepes ➡️ R cRepes
