Title: | Split Regularized Regression |
---|---|
Description: | Functions for computing split regularized estimators defined in Christidis, Lakshmanan, Smucler and Zamar (2019) <arXiv:1712.03561>. The approach fits linear regression models that split the set of covariates into groups. The optimal split of the variables into groups and the regularized estimation of the regression coefficients are performed by minimizing an objective function that encourages sparsity within each group and diversity among them. The estimated coefficients are then pooled together to form the final fit. |
Authors: | Anthony Christidis, Ezequiel Smucler, Ruben Zamar |
Maintainer: | Anthony Christidis |
License: | GPL (>= 2) |
Version: | 1.0.2 |
Built: | 2024-12-13 06:54:01 UTC |
Source: | CRAN |
Extract coefficients from a cv.SplitReg object.
    ## S3 method for class 'cv.SplitReg'
    coef(object, index = object$index_opt, ...)
Argument | Description |
---|---|
object | Fitted cv.SplitReg object. |
index | Indices indicating values of lambda_S at which to extract coefficients. Defaults to the optimal value. |
... | Additional arguments for compatibility. |
A vector of coefficients.
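    library(MASS)      # for mvrnorm
    library(SplitReg)
    set.seed(1)
    # Simulate 50 observations of 50 equicorrelated predictors; only the first 5 are active
    beta <- c(rep(5, 5), rep(0, 45))
    Sigma <- matrix(0.5, 50, 50)
    diag(Sigma) <- 1
    x <- mvrnorm(50, mu = rep(0, 50), Sigma = Sigma)
    y <- x %*% beta + rnorm(50)
    # Fit a two-model split estimator and extract coefficients at the optimal penalty
    fit <- cv.SplitReg(x, y, num_models=2)
    split.coefs <- coef(fit)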
Computes a split regularized regression estimator. The sparsity and diversity penalty parameters are chosen automatically.
    cv.SplitReg(
      x,
      y,
      num_lambdas_sparsity = 100,
      num_lambdas_diversity = 100,
      alpha = 1,
      num_models = 10,
      tolerance = 1e-08,
      max_iter = 1e+05,
      num_folds = 10,
      num_threads = 1
    )
Argument | Description |
---|---|
x | Design matrix. |
y | Response vector. |
num_lambdas_sparsity | Length of the grid of sparsity penalties. |
num_lambdas_diversity | Length of the grid of diversity penalties. |
alpha | Elastic Net tuning constant: the value must be between 0 and 1. Default is 1 (Lasso). |
num_models | Number of models to build. |
tolerance | Convergence tolerance used to stop the iterations while cycling over the models. |
max_iter | Maximum number of iterations allowed while cycling over the models. |
num_folds | Number of folds for cross-validation. |
num_threads | Number of threads used for parallel computation over the folds. |
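The num_folds argument controls a K-fold cross-validation over the penalty grids. The sketch below illustrates that kind of selection loop in base R only. It is not the package's implementation: the ridge estimator, the four-value grid `lambdas` and the helper `ridge_fit` are hypothetical stand-ins chosen so the example is self-contained.

```r
set.seed(1)
n <- 60; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0, 0) + rnorm(n))

num_folds <- 5
lambdas <- c(0.01, 0.1, 1, 10)   # hypothetical penalty grid

# Assign each observation to one of num_folds folds of equal size.
fold_id <- sample(rep(seq_len(num_folds), length.out = n))

# Stand-in estimator: closed-form ridge regression (NOT the SplitReg fit).
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * nrow(x) * diag(ncol(x)), crossprod(x, y))
}

# For each penalty value, average the held-out mean squared error over folds.
cv_mse <- sapply(lambdas, function(lambda) {
  fold_errors <- sapply(seq_len(num_folds), function(k) {
    test <- fold_id == k
    beta <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambda)
    mean((y[test] - x[test, , drop = FALSE] %*% beta)^2)
  })
  mean(fold_errors)
})

# The selected penalty is the grid value with the smallest CV error.
index_opt <- which.min(cv_mse)
lambda_opt <- lambdas[index_opt]
```

In cv.SplitReg the same idea applies to two grids at once (sparsity and diversity), and the folds can be processed in parallel through num_threads.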
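Computes a split regularized regression estimator with num_models ($G$) models, defined as the linear models $\beta^{1}, \dots, \beta^{G}$ that minimize

$$\sum_{g=1}^{G} \left( \frac{1}{2n} \lVert y - X\beta^{g} \rVert^{2} + \lambda_{S} \left( \frac{1-\alpha}{2} \lVert \beta^{g} \rVert_{2}^{2} + \alpha \lVert \beta^{g} \rVert_{1} \right) + \frac{\lambda_{D}}{2} \sum_{h \neq g} \sum_{j=1}^{p} \lvert \beta_{j}^{h} \beta_{j}^{g} \rvert \right)$$

over grids for the penalty parameters $\lambda_{S}$ and $\lambda_{D}$ that are built automatically. Larger values of $\lambda_{S}$ encourage more sparsity within the models, and larger values of $\lambda_{D}$ encourage more diversity among them. If $\lambda_{D} = 0$, then all of the models are equal to the Elastic Net regularized least squares estimator with penalty parameter $\lambda_{S}$. Optimal penalty parameters are found by num_folds-fold cross-validation, where the prediction of the ensemble is formed by simple averaging.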
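To make the roles of the two penalties concrete, the following base R sketch evaluates an objective of this kind for a given set of candidate models. It assumes the objective is the sum over models of a squared-error loss scaled by 1/(2n), an Elastic Net penalty weighted by the sparsity parameter, and a diversity penalty equal to half the diversity parameter times the sum of |beta_j^h * beta_j^g| over the other models h. The function name `split_objective` and its arguments are hypothetical and are not part of the package.

```r
# Evaluate a split regularized objective for G candidate models.
# ASSUMPTION: the form (loss + elastic net + diversity) follows the description
# in this section; it is not taken from the package source.
# betas: p x G matrix, one column of coefficients per model (no intercepts).
split_objective <- function(x, y, betas, lambda_s, lambda_d, alpha = 1) {
  n <- nrow(x)
  abs_betas <- abs(betas)
  row_totals <- rowSums(abs_betas)   # sum over models of |beta_j| for each variable j
  total <- 0
  for (g in seq_len(ncol(betas))) {
    loss <- sum((y - x %*% betas[, g])^2) / (2 * n)
    sparsity <- (1 - alpha) / 2 * sum(betas[, g]^2) + alpha * sum(abs_betas[, g])
    # Overlap between model g and all other models, variable by variable.
    diversity <- sum(abs_betas[, g] * (row_totals - abs_betas[, g]))
    total <- total + loss + lambda_s * sparsity + lambda_d / 2 * diversity
  }
  total
}

set.seed(1)
x <- matrix(rnorm(20 * 4), 20, 4)
y <- x[, 1] + rnorm(20)

disjoint <- cbind(c(1, 0, 0, 0), c(0, 1, 0, 0))   # models use different variables
shared   <- cbind(c(1, 0, 0, 0), c(1, 0, 0, 0))   # both models use variable 1

# With disjoint supports the diversity term is zero, so lambda_d has no effect.
obj_disjoint_0 <- split_objective(x, y, disjoint, lambda_s = 0.1, lambda_d = 0)
obj_disjoint_1 <- split_objective(x, y, disjoint, lambda_s = 0.1, lambda_d = 1)

# With a shared variable, increasing lambda_d increases the objective.
obj_shared_0 <- split_objective(x, y, shared, lambda_s = 0.1, lambda_d = 0)
obj_shared_1 <- split_objective(x, y, shared, lambda_s = 0.1, lambda_d = 1)
```

Under this form, models that reuse the same variable pay a cost proportional to the diversity parameter, which is what pushes the fitted models toward different subsets of covariates.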
The predictors and the response are standardized to zero mean and unit variance before any computations are performed.
The final output is in the original scales.
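As an illustration of what fitting on standardized data and reporting on the original scale involves, the base R sketch below standardizes the data, fits a model, and maps the coefficients back. Ordinary least squares is used as a stand-in for the regularized fit, and the scaling convention (sample standard deviation with n - 1) is an assumption; the package may use a different one internally.

```r
set.seed(1)
n <- 40; p <- 3
x <- matrix(rnorm(n * p, mean = 5, sd = 2), n, p)
y <- drop(2 + x %*% c(1, -0.5, 0) + rnorm(n))

# Standardize predictors and response to zero mean and unit variance.
x_mean <- colMeans(x); x_sd <- apply(x, 2, sd)
y_mean <- mean(y);     y_sd <- sd(y)
x_std <- scale(x, center = x_mean, scale = x_sd)
y_std <- (y - y_mean) / y_sd

# Stand-in fit on the standardized scale (OLS without intercept, NOT SplitReg).
beta_std <- drop(solve(crossprod(x_std), crossprod(x_std, y_std)))

# Map the coefficients back to the original scales of x and y.
beta_orig <- beta_std * y_sd / x_sd
intercept_orig <- y_mean - sum(beta_orig * x_mean)
```

For this stand-in estimator the back-transformed coefficients coincide with those of `lm(y ~ x)` fitted on the raw data, which is a convenient check that the rescaling is done correctly.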
An object of class cv.SplitReg, a list with the following entries:

Entry | Description |
---|---|
betas | Coefficients computed over the path of penalties for sparsity; the penalty for diversity is fixed at the optimal value. |
intercepts | Intercepts for each of the models along the path of penalties for sparsity. |
index_opt | Index of the optimal penalty parameter for sparsity. |
lambda_sparsity_opt | Optimal penalty parameter for sparsity. |
lambda_diversity_opt | Optimal penalty parameter for diversity. |
lambdas_sparsity | Grid of sparsity parameters. |
lambdas_diversity | Grid of diversity parameters. |
cv_mse_opt | Optimal CV MSE. |
call | The matched call. |
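The entries listed above can be read directly from the fitted object with `$`. The snippet below is a small sketch of that, reusing the simulated data from the package examples. It is wrapped in a `requireNamespace` guard (an addition made here for illustration) so that it does nothing when SplitReg or MASS is not installed.

```r
# Guarded so the snippet is a no-op when the packages are not available.
if (requireNamespace("SplitReg", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
  library(SplitReg)
  set.seed(1)
  beta <- c(rep(5, 5), rep(0, 45))
  Sigma <- matrix(0.5, 50, 50)
  diag(Sigma) <- 1
  x <- mvrnorm(50, mu = rep(0, 50), Sigma = Sigma)
  y <- x %*% beta + rnorm(50)
  fit <- cv.SplitReg(x, y, num_models = 2)

  # Entries documented for the returned list.
  print(fit$index_opt)              # position of the selected sparsity penalty
  print(fit$lambda_sparsity_opt)    # selected sparsity penalty
  print(fit$lambda_diversity_opt)   # selected diversity penalty
  print(fit$cv_mse_opt)             # cross-validated MSE at the selected penalties
  print(length(fit$lambdas_sparsity))
}
```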
See also: predict.cv.SplitReg, coef.cv.SplitReg
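    library(MASS)      # for mvrnorm
    library(SplitReg)
    set.seed(1)
    # Simulate 50 observations of 50 equicorrelated predictors; only the first 5 are active
    beta <- c(rep(5, 5), rep(0, 45))
    Sigma <- matrix(0.5, 50, 50)
    diag(Sigma) <- 1
    x <- mvrnorm(50, mu = rep(0, 50), Sigma = Sigma)
    y <- x %*% beta + rnorm(50)
    # Fit a two-model split estimator; penalties are chosen by cross-validation
    fit <- cv.SplitReg(x, y, num_models=2)
    coefs <- predict(fit, type="coefficients")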
Make predictions from a cv.SplitReg object, similar to other predict methods.
    ## S3 method for class 'cv.SplitReg'
    predict(
      object,
      newx,
      index = object$index_opt,
      type = c("response", "coefficients"),
      ...
    )
Argument | Description |
---|---|
object | Fitted cv.SplitReg object. |
newx | Matrix of new values of x at which predictions are to be made. Ignored if type is "coefficients". |
index | Indices indicating values of lambda_S at which to predict. Defaults to the optimal value. |
type | Either "response" for predicted values or "coefficients" for the estimated coefficients. |
... | Additional arguments for compatibility. |
Either a matrix of predictions or a vector of coefficients, depending on type.
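    library(MASS)      # for mvrnorm
    library(SplitReg)
    set.seed(1)
    # Simulate 50 observations of 50 equicorrelated predictors; only the first 5 are active
    beta <- c(rep(5, 5), rep(0, 45))
    Sigma <- matrix(0.5, 50, 50)
    diag(Sigma) <- 1
    x <- mvrnorm(50, mu = rep(0, 50), Sigma = Sigma)
    y <- x %*% beta + rnorm(50)
    fit <- cv.SplitReg(x, y, num_models=2)
    # Predict the response for new observations drawn from the same distribution
    x.new <- mvrnorm(50, mu = rep(0, 50), Sigma = Sigma)
    split.predictions <- predict(fit, newx = x.new, type="response")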
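The ensemble prediction is described as a simple average over the individual models. For linear models this is the same as predicting once with the averaged coefficients, which the base R sketch below checks numerically. The coefficient matrix `betas`, the vector `intercepts` and the data are made up for illustration and do not come from a fitted cv.SplitReg object.

```r
set.seed(1)
x_new <- matrix(rnorm(10 * 4), 10, 4)

# Hypothetical coefficients for G = 2 models (one column each) with disjoint supports.
betas <- cbind(c(1.5, 0, -2, 0), c(0, 0.8, 0, 1))
intercepts <- c(0.3, -0.1)

# Route 1: predict with each model, then average the predictions.
preds <- sweep(x_new %*% betas, 2, intercepts, "+")
ensemble_pred <- rowMeans(preds)

# Route 2: average the coefficients first, then predict once.
pooled_beta <- rowMeans(betas)
pooled_intercept <- mean(intercepts)
pooled_pred <- drop(pooled_intercept + x_new %*% pooled_beta)
```

Both routes give the same predictions, so a single pooled coefficient vector is enough to represent the ensemble.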