econml.sklearn_extensions.linear_model.SelectiveRegularization
- class econml.sklearn_extensions.linear_model.SelectiveRegularization(unpenalized_inds, penalized_model, fit_intercept=True)
Bases: object
Estimator of a linear model where regularization is applied to only a subset of the coefficients.
Assume that our loss is
\[\ell(\beta_1, \beta_2) = \lVert y - X_1 \beta_1 - X_2 \beta_2 \rVert^2 + f(\beta_2)\]
so that we’re regularizing only the coefficients in \(\beta_2\).
Then, since \(\beta_1\) doesn’t appear in the penalty, the problem of finding \(\beta_1\) to minimize the loss once \(\beta_2\) is known reduces to an ordinary OLS regression of \(y - X_2 \beta_2\) on \(X_1\), so that:
\[\beta_1 = (X_1^\top X_1)^{-1}X_1^\top(y - X_2 \beta_2)\]
Plugging this into the loss, we obtain
\[\begin{split}~& \lVert y - X_1 (X_1^\top X_1)^{-1}X_1^\top(y - X_2 \beta_2) - X_2 \beta_2 \rVert^2 + f(\beta_2) \\ =~& \lVert (I - X_1 (X_1^\top X_1)^{-1}X_1^\top)(y - X_2 \beta_2) \rVert^2 + f(\beta_2)\end{split}\]
But, letting \(M_{X_1} = I - X_1 (X_1^\top X_1)^{-1}X_1^\top\), we see that this is
\[\lVert (M_{X_1} y) - (M_{X_1} X_2) \beta_2 \rVert^2 + f(\beta_2)\]
so finding the minimizing \(\beta_2\) can be done by regressing \(M_{X_1} y\) on \(M_{X_1} X_2\) using the penalized regression method incorporating \(f\). Note that \(M_{X_1} y\) and \(M_{X_1} X_2\) are just the residuals of \(y\) and \(X_2\) when regressed on \(X_1\) using OLS.
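The derivation above can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation: it assumes a ridge penalty \(f(\beta_2) = \alpha \lVert \beta_2 \rVert^2\), no intercept, and synthetic data, and all variable names (`X1`, `X2`, `alpha`, `residualize`) are chosen here for illustration. It compares the two-step route (residualize, fit the penalized block, recover \(\beta_1\) by OLS) against solving the joint problem directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 200, 2, 3
X1 = rng.normal(size=(n, d1))  # columns whose coefficients are NOT penalized
X2 = rng.normal(size=(n, d2))  # columns whose coefficients are penalized
y = X1 @ np.array([1.0, -2.0]) + X2 @ np.array([0.5, 0.0, -1.5]) + rng.normal(size=n)
alpha = 5.0  # assumed ridge strength: f(beta_2) = alpha * ||beta_2||^2


def residualize(A, X1):
    """Return M_{X1} A, the OLS residuals of A regressed on X1."""
    coef, *_ = np.linalg.lstsq(X1, A, rcond=None)
    return A - X1 @ coef


# Two-step route from the derivation.
y_res = residualize(y, X1)
X2_res = residualize(X2, X1)
# Ridge regression of M_{X1} y on M_{X1} X2 (closed form).
beta2 = np.linalg.solve(X2_res.T @ X2_res + alpha * np.eye(d2), X2_res.T @ y_res)
# Recover beta_1 by OLS of (y - X2 beta_2) on X1.
beta1, *_ = np.linalg.lstsq(X1, y - X2 @ beta2, rcond=None)

# Direct route: minimize the joint loss with the penalty on the X2 block only.
X = np.hstack([X1, X2])
D = np.diag([0.0] * d1 + [1.0] * d2)  # selects the penalized coefficients
beta_joint = np.linalg.solve(X.T @ X + alpha * D, X.T @ y)

print(np.allclose(np.concatenate([beta1, beta2]), beta_joint))
```

Both routes solve the same first-order conditions, so the coefficients agree up to floating-point error. For a non-smooth penalty such as the lasso there is no closed form, but the same residualize-then-fit structure applies.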
- Parameters
unpenalized_inds (list of int, other 1-dimensional indexing expression, or callable) – The indices that should not be penalized when the model is fit; all other indices will be penalized. If this is a callable, it will be called with the arguments to fit and should return a corresponding indexing expression. For example,
unpenalized_inds=lambda X, y: slice(1,-1)
will result in only the first and last indices being penalized.
penalized_model (regressor) – A penalized linear regression model
fit_intercept (bool, default True) – Whether to fit an intercept; the intercept will not be penalized if it is fit
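To make the unpenalized_inds semantics concrete, here is a small sketch of how an indexing expression or callable could be resolved into unpenalized and penalized column sets. The helper `split_indices` is hypothetical and written only for illustration; it is not part of econml and does not claim to mirror how the class resolves indices internally.

```python
import numpy as np


def split_indices(unpenalized_inds, X, y):
    """Hypothetical helper: resolve an index spec into
    (unpenalized, penalized) arrays of column indices of X."""
    if callable(unpenalized_inds):
        # A callable receives the arguments to fit and returns an index spec.
        unpenalized_inds = unpenalized_inds(X, y)
    all_inds = np.arange(X.shape[1])
    unpenalized = all_inds[unpenalized_inds]
    penalized = np.setdiff1d(all_inds, unpenalized)
    return unpenalized, penalized


X = np.zeros((10, 5))
y = np.zeros(10)

unpen, pen = split_indices(lambda X, y: slice(1, -1), X, y)
print(unpen)  # [1 2 3]
print(pen)    # [0 4]
```

With five features, slice(1,-1) leaves columns 1 to 3 unpenalized, so only the first and last columns are penalized. A callable is useful when the number of features is not known until fit is called.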
- coef_
Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
- Type
array, shape (n_features, ) or (n_targets, n_features)
- penalized_model
The penalized linear regression model, cloned from the one passed into the initializer
- Type
Methods
__init__(unpenalized_inds, penalized_model)
fit(X, y[, sample_weight]) – Fit the model.
predict(X) – Make a prediction for each sample.
score(X, y) – Score the predictions for a set of features to ground truth.
Attributes
known_params
- fit(X, y, sample_weight=None)
Fit the model.
- Parameters
X (array_like, shape (n, d_x)) – The features to regress against
y (array_like, shape (n,) or (n, d_y)) – The regression target
sample_weight (array_like, shape (n,), optional) – Relative weights for each sample