A fast and flexible machine learning library for nonparametric high-dimensional regression and classification with guarantees.
- Python 3.8+
- C++ compiler (g++, clang, or MSVC)
- CMake 3.15+
- Eigen3
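
Before building from source, it can help to confirm that the tools listed above are visible on your PATH. The following is a minimal sketch using only the Python standard library. The function name `check_build_prerequisites` and the compiler executable names are illustrative assumptions, not part of hapc, and the check does not verify the CMake version or the presence of Eigen3.

```python
import shutil
import sys


def check_build_prerequisites():
    """Report which build prerequisites appear to be available.

    Hypothetical helper, not part of hapc. It only looks for executables
    on PATH; it does not check the CMake version or locate Eigen3.
    """
    compilers = ("g++", "clang++", "cl")  # assumed executable names
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "cmake": shutil.which("cmake") is not None,
        "c++ compiler": any(shutil.which(c) is not None for c in compilers),
    }


report = check_build_prerequisites()
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If an entry is reported as missing, the platform-specific commands in the installation notes are the place to start.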
pip install hapc
Or install directly from GitHub:
pip install git+https://github.com/yourusername/hapc.git
Or with editable install for development:
git clone https://github.com/yourusername/hapc.git
cd hapc
pip install -e .

If installation fails, you may need to install build dependencies:
macOS:
brew install cmake eigen
Ubuntu/Debian:
sudo apt-get install cmake libeigen3-dev build-essential
Windows:
pip install cmake
# Install Visual Studio Build Tools or use conda
conda install -c conda-forge eigen

import numpy as np
from hapc.single import single_pcghal
from hapc.cv import pcghal_cv
# Generate sample data
X = np.random.randn(100, 5)
Y = X[:, 0] + 0.5 * X[:, 1] + np.random.randn(100) * 0.1
# Single fit with fixed lambda
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01)
print(f"Risk: {result.optimizer_output.risk:.6f}")
# Cross-validation to select lambda
lambdas = np.logspace(-4, 0, 10)
cv_result = pcghal_cv(X, Y, maxdeg=2, npc=5, lambdas=lambdas, nfolds=5)
print(f"Best lambda: {cv_result.best_lambda:.6f}")
# Make predictions
X_test = np.random.randn(20, 5)
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01, predict=X_test)
print(f"Predictions: {result.predictions}")

from hapc.single import single_pcghal
result = single_pcghal(
    X, Y,
    maxdeg=2,            # Maximum degree of interactions
    npc=10,              # Number of principal components
    single_lambda=0.01,
    predict=X_test       # Optional: test data for predictions
)

from hapc.single import single_pcghal
result = single_pcghal(
    X, Y_binary,
    maxdeg=2,
    npc=10,
    single_lambda=0.01,
    predict=X_test
)

from hapc.cv import pcghal_cv
cv_result = pcghal_cv(
    X, Y,
    maxdeg=2,
    npc=10,
    lambdas=np.logspace(-4, 0, 20),
    nfolds=5
)
print(cv_result.best_lambda)

single_pcghal: Fit PC-GHAL with a single lambda value.
Parameters:
- X (ndarray, shape (n, p)): Input features
- Y (ndarray, shape (n,)): Response variable
- maxdeg (int): Maximum degree of interactions
- npc (int): Number of principal components
- single_lambda (float): Regularization parameter
- max_iter (int, default=100): Maximum iterations
- tol (float, default=1e-6): Convergence tolerance
- verbose (bool, default=False): Print progress
- predict (ndarray, optional): Test data for predictions
- center (bool, default=True): Center the design matrix
Returns:
- result.optimizer_output.alpha: Coefficients
- result.optimizer_output.risk: Final risk
- result.optimizer_output.iter: Iterations until convergence
- result.predictions: Predictions on test data (if provided)
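
The `predict` argument and `result.predictions` can be combined to score a fit on held-out data. The sketch below is illustrative only: `holdout_mse` and `fit_and_score` are hypothetical helpers, not part of hapc, and the code assumes `result.predictions` contains one value per row of the test matrix. The hapc import is deferred so the metric can be used without the package installed.

```python
import numpy as np


def holdout_mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    return float(np.mean((y_true - y_pred) ** 2))


def fit_and_score(X_train, Y_train, X_test, Y_test, single_lambda=0.01):
    """Fit on the training split, then score predictions on the test split.

    Requires hapc; the import is deferred so holdout_mse works on its own.
    """
    from hapc.single import single_pcghal

    result = single_pcghal(
        X_train, Y_train,
        maxdeg=2, npc=5,
        single_lambda=single_lambda,
        predict=X_test,
    )
    # Assumption: result.predictions has one entry per row of X_test.
    return holdout_mse(Y_test, result.predictions)


# Check the metric by hand on a tiny example (no hapc needed).
print(holdout_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

With hapc installed, `fit_and_score(X[:80], Y[:80], X[80:], Y[80:])` would give a single held-out error for the quick-start data.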
pcghal_cv: Cross-validation to select lambda.
Parameters:
- lambdas (ndarray): Grid of lambda values to test
- nfolds (int, default=5): Number of CV folds
- ...other parameters same as single_pcghal
Returns:
- cv_result.best_lambda: Optimal lambda
- cv_result.mses: CV errors for each lambda
- cv_result.best_model: Fitted model with best lambda
- cv_result.predictions: Predictions on test data (if provided)
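
To make the roles of `lambdas`, `nfolds`, `best_lambda`, and `mses` concrete, here is a rough sketch of K-fold selection over a lambda grid. It uses closed-form ridge regression as a stand-in estimator, so it is not PC-GHAL and not how hapc implements cross-validation internally. The function names `ridge_fit` and `kfold_select_lambda` and the random fold assignment are assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression; a stand-in estimator, not PC-GHAL."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def kfold_select_lambda(X, y, lambdas, nfolds=5, seed=0):
    """Return (best_lambda, mean CV MSE per lambda) from K-fold CV."""
    rng = np.random.default_rng(seed)
    # Shuffle row indices once and split them into nfolds groups.
    folds = np.array_split(rng.permutation(len(y)), nfolds)
    mses = np.zeros(len(lambdas))
    for i, lam in enumerate(lambdas):
        fold_errors = []
        for k in range(nfolds):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(nfolds) if j != k]
            )
            beta = ridge_fit(X[train_idx], y[train_idx], lam)
            resid = y[test_idx] - X[test_idx] @ beta
            fold_errors.append(np.mean(resid ** 2))
        mses[i] = np.mean(fold_errors)
    # The selected lambda minimises the average held-out error.
    return lambdas[int(np.argmin(mses))], mses


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(100)
lambdas = np.logspace(-4, 0, 10)
best_lambda, mses = kfold_select_lambda(X, Y, lambdas, nfolds=5)
print(f"Best lambda: {best_lambda:.6f}")
```

The returned pair mirrors `cv_result.best_lambda` and `cv_result.mses` in spirit: one averaged held-out error per grid value, with the minimiser reported as the selected lambda.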
Contributions welcome! The C++ core is shared between R and Python packages.
git clone https://github.com/yourusername/hapc.git
cd hapc
pip install -e .
pytest

MIT License - see LICENSE file