Title: Orthogonalizing EM: Penalized Regression for Big Tall Data
Description: Solves penalized least squares problems for big tall data using the orthogonalizing EM algorithm of Xiong et al. (2016) <doi:10.1080/00401706.2015.1054436>. The main fitting function is oem(), and the functions cv.oem() and xval.oem() perform cross validation, the latter being an accelerated cross validation function for linear models. The big.oem() function allows for out-of-memory fitting. A description of the underlying methods and code interface is given in Huling and Chien (2022) <doi:10.18637/jss.v104.i06>.
Authors: Bin Dai [aut], Jared Huling [aut, cre], Yixuan Qiu [ctb], Gael Guennebaud [cph], Jitse Niesen [cph]
Maintainer: Jared Huling <[email protected]>
License: GPL (>= 2)
Version: 2.0.12
Built: 2024-10-30 06:52:03 UTC
Source: CRAN
Orthogonalizing EM for big.matrix objects
big.oem(x, y, family = c("gaussian", "binomial"),
        penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net",
                    "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp",
                    "grp.scad", "grp.mcp.net", "grp.scad.net",
                    "sparse.grp.lasso"),
        weights = numeric(0), lambda = numeric(0), nlambda = 100L,
        lambda.min.ratio = NULL, alpha = 1, gamma = 3, tau = 0.5,
        groups = numeric(0), penalty.factor = NULL, group.weights = NULL,
        standardize = TRUE, intercept = TRUE, maxit = 500L, tol = 1e-07,
        irls.maxit = 100L, irls.tol = 0.001, compute.loss = FALSE, gigs = 4,
        hessian.type = c("full", "upper.bound"))
x: input big.matrix object pointing to the design matrix. Each row is an observation; each column corresponds to a covariate.

y: numeric response vector of length nobs.

family: "gaussian" for least squares problems, "binomial" for binary response.

penalty: Specification of penalty type. Choices include "elastic.net", "lasso", "ols" (ordinary least squares, i.e. no penalty), "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso". Careful consideration is required for the group lasso, group MCP, and group SCAD penalties: groups as specified by the groups argument should be chosen in a sensible manner.

weights: observation weights. Not implemented yet. Defaults to 1 for each observation (setting the weight vector to length 0 will default all weights to 1).

lambda: A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

nlambda: The number of lambda values; the default is 100.

lambda.min.ratio: Smallest value for lambda, as a fraction of lambda.max, the (data-derived) smallest value for which all penalized coefficients are zero.

alpha: mixing value for the "*.net" penalties. The penalty applied is (1 - alpha) * (ridge penalty) + alpha * (lasso/MCP/SCAD/group lasso penalty).

gamma: tuning parameter for the SCAD and MCP penalties; must be >= 1.

tau: mixing value for "sparse.grp.lasso". The penalty applied is (1 - tau) * (group lasso penalty) + tau * (lasso penalty).

groups: A vector describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0 (see also the sketch following this argument list).

penalty.factor: Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables.

group.weights: penalty factors applied to each group for the group lasso. Similar to penalty.factor, these can be 0 for some groups, in which case that group is unpenalized and always included in the model.

standardize: Logical flag for x variable standardization, prior to fitting the models. The coefficients are always returned on the original scale. Default is standardize = TRUE.

intercept: Should intercept(s) be fitted (TRUE, the default) or set to zero (FALSE)?

maxit: integer. Maximum number of OEM iterations.

tol: convergence tolerance for OEM iterations.

irls.maxit: integer. Maximum number of IRLS iterations.

irls.tol: convergence tolerance for IRLS iterations. Only used if family = "binomial".

compute.loss: should the loss be computed for each estimated tuning parameter? Defaults to FALSE.

gigs: maximum number of gigabytes of memory available. Used to determine how to break up calculations involving the design matrix x.

hessian.type: only for logistic regression. If hessian.type = "full", the full Hessian is computed; if hessian.type = "upper.bound", an upper bound of the Hessian is used. The upper bound can be dramatically faster in certain settings, e.g. when n >> p.
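As a concrete illustration of the groups and penalty.factor arguments, here is a minimal sketch on a small simulated in-memory design (it uses oem(), which shares these arguments with big.oem(); the data and values are illustrative and not from the original documentation):

## leave the first two covariates unpenalized: via penalty.factor = 0 for
## the lasso, and via group 0 for the group lasso
library(oem)
set.seed(1)
x <- matrix(rnorm(1000 * 10), 1000, 10)
y <- rnorm(1000) + x[, 1] - x[, 2]
fit.pf <- oem(x, y, penalty = "lasso",
              penalty.factor = c(0, 0, rep(1, 8)))
fit.gr <- oem(x, y, penalty = "grp.lasso",
              groups = c(0, 0, rep(1:4, each = 2)))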
An object with S3 class "oem"
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
## Not run:
set.seed(123)
nrows <- 50000
ncols <- 100
bkFile <- "bigmat.bk"
descFile <- "bigmatk.desc"
bigmat <- filebacked.big.matrix(nrow = nrows, ncol = ncols, type = "double",
                                backingfile = bkFile, backingpath = ".",
                                descriptorfile = descFile,
                                dimnames = c(NULL, NULL))

# Each column value will be the column number multiplied by
# samples from a standard normal distribution.
set.seed(123)
for (i in 1:ncols) bigmat[, i] <- rnorm(nrows) * i

y <- rnorm(nrows) + bigmat[, 1] - bigmat[, 2]

fit <- big.oem(x = bigmat, y = y,
               penalty = c("lasso", "elastic.net", "ols", "mcp", "scad",
                           "mcp.net", "scad.net", "grp.lasso",
                           "grp.lasso.net", "grp.mcp", "grp.scad",
                           "sparse.grp.lasso"),
               groups = rep(1:20, each = 5))

fit2 <- oem(x = bigmat[, ], y = y,
            penalty = c("lasso", "grp.lasso"),
            groups = rep(1:20, each = 5))

max(abs(fit$beta[[1]] - fit2$beta[[1]]))

layout(matrix(1:2, ncol = 2))
plot(fit)
plot(fit, which.model = 2)
## End(Not run)
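One housekeeping note (not part of the original example): filebacked big.matrix objects leave their backing and descriptor files on disk, so after experimenting they can be removed.

## remove the backing files created above when done
# file.remove("bigmat.bk", "bigmatk.desc")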
Cross validation for Orthogonalizing EM
cv.oem(x, y,
       penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net",
                   "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp",
                   "grp.scad", "grp.mcp.net", "grp.scad.net",
                   "sparse.grp.lasso"),
       weights = numeric(0), lambda = NULL,
       type.measure = c("mse", "deviance", "class", "auc", "mae"),
       nfolds = 10, foldid = NULL, grouped = TRUE, keep = FALSE,
       parallel = FALSE, ncores = -1, ...)
x: input matrix of dimension n x p or a CsparseMatrix object of the Matrix package. Each row is an observation; each column corresponds to a covariate.

y: numeric response vector of length nobs.

penalty: Specification of penalty type in lowercase letters. Choices include "elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso".

weights: observation weights. Defaults to 1 for each observation (setting the weight vector to length 0 will default all weights to 1).

lambda: A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

type.measure: measure to evaluate for cross-validation. The default is type.measure = "mse" (mean squared error). Other options are "deviance", "mae" (mean absolute error), and, for binomial models, "class" (misclassification error) and "auc" (area under the ROC curve).

nfolds: number of folds for cross-validation. The default is 10; 3 is the smallest value allowed.

foldid: an optional vector of values between 1 and nfolds specifying which fold each observation belongs to (see the sketch following this argument list).

grouped: Like in glmnet, this is an experimental argument, with default TRUE, and can be ignored by most users. It refers to computing nfolds separate statistics and then using their mean and estimated standard error to describe the CV curve.

keep: If keep = TRUE, the fitted values from the cross validation folds are returned along with the foldid vector. Default is keep = FALSE.

parallel: If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC or doParallel, must be registered beforehand (see the sketch following this argument list).

ncores: Number of cores to use. If ncores = -1, the number of cores used is chosen automatically.

...: other parameters to be passed to oem().
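A hedged sketch of the foldid and parallel arguments. The cv.oem() calls are commented out because they assume the x and y constructed in the example below; doParallel is an assumed, commonly used foreach backend, not a requirement of oem itself.

## balanced, reproducible fold assignments constructed by hand
set.seed(42)
n.obs <- 1e4
foldid <- sample(rep(seq_len(10), length.out = n.obs))
# fit <- cv.oem(x = x, y = y, penalty = "lasso", foldid = foldid)

## parallel CV: a backend must be registered beforehand
# library(doParallel)
# registerDoParallel(cores = 2)
# fitp <- cv.oem(x = x, y = y, penalty = "lasso", parallel = TRUE)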
An object with S3 class "cv.oem"
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
set.seed(123)
n.obs <- 1e4
n.vars <- 100

true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- cv.oem(x = x, y = y,
              penalty = c("lasso", "grp.lasso"),
              groups = rep(1:20, each = 5))

layout(matrix(1:2, ncol = 2))
plot(fit)
plot(fit, which.model = 2)
log likelihood function for fitted oem objects
log likelihood function for fitted cross validation oem objects
## S3 method for class 'oem'
logLik(object, which.model = 1, ...)

## S3 method for class 'cv.oem'
logLik(object, which.model = 1, ...)

## S3 method for class 'xval.oem'
logLik(object, which.model = 1, ...)
object: fitted "oem", "cv.oem", or "xval.oem" model object.

which.model: If multiple penalties are fit and returned in the same object, the which.model argument is used to specify which model to use. For example, if the object was fit with penalty = c("lasso", "mcp"), then which.model = 2 (or, equivalently, which.model = "mcp") returns results for the MCP model.

...: not used.
set.seed(123)
n.obs <- 2000
n.vars <- 50

true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- oem(x = x, y = y, penalty = c("lasso", "mcp"), compute.loss = TRUE)
logLik(fit)
logLik(fit, which.model = "mcp")

fit <- cv.oem(x = x, y = y, penalty = c("lasso", "mcp"),
              compute.loss = TRUE, nlambda = 25)
logLik(fit)
logLik(fit, which.model = "mcp")

fit <- xval.oem(x = x, y = y, penalty = c("lasso", "mcp"),
                compute.loss = TRUE, nlambda = 25)
logLik(fit)
logLik(fit, which.model = "mcp")
Orthogonalizing EM
oem(x, y, family = c("gaussian", "binomial"),
    penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net",
                "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp",
                "grp.scad", "grp.mcp.net", "grp.scad.net",
                "sparse.grp.lasso"),
    weights = numeric(0), lambda = numeric(0), nlambda = 100L,
    lambda.min.ratio = NULL, alpha = 1, gamma = 3, tau = 0.5,
    groups = numeric(0), penalty.factor = NULL, group.weights = NULL,
    standardize = TRUE, intercept = TRUE, maxit = 500L, tol = 1e-07,
    irls.maxit = 100L, irls.tol = 0.001, accelerate = FALSE, ncores = -1,
    compute.loss = FALSE, hessian.type = c("upper.bound", "full"))
x: input matrix of dimension n x p or a CsparseMatrix object of the Matrix package. Each row is an observation; each column corresponds to a covariate.

y: numeric response vector of length nobs = nrow(x).

family: "gaussian" for least squares problems, "binomial" for binary response.

penalty: Specification of penalty type. Choices include "elastic.net", "lasso", "ols" (ordinary least squares, i.e. no penalty), "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso". Careful consideration is required for the group lasso, group MCP, and group SCAD penalties: groups as specified by the groups argument should be chosen in a sensible manner.

weights: observation weights. Not implemented yet. Defaults to 1 for each observation (setting the weight vector to length 0 will default all weights to 1).

lambda: A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

nlambda: The number of lambda values. The default is 100.

lambda.min.ratio: Smallest value for lambda, as a fraction of lambda.max, the (data-derived) smallest value for which all penalized coefficients are zero.

alpha: mixing value for the "*.net" penalties. The penalty applied is (1 - alpha) * (ridge penalty) + alpha * (lasso/MCP/SCAD/group lasso penalty).

gamma: tuning parameter for the SCAD and MCP penalties; must be >= 1.

tau: mixing value for "sparse.grp.lasso". The penalty applied is (1 - tau) * (group lasso penalty) + tau * (lasso penalty).

groups: A vector describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0.

penalty.factor: Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables.

group.weights: penalty factors applied to each group for the group lasso. Similar to penalty.factor, these can be 0 for some groups, in which case that group is unpenalized and always included in the model.

standardize: Logical flag for x variable standardization, prior to fitting the models. The coefficients are always returned on the original scale. Default is standardize = TRUE.

intercept: Should intercept(s) be fitted (TRUE, the default) or set to zero (FALSE)?

maxit: integer. Maximum number of OEM iterations.

tol: convergence tolerance for OEM iterations.

irls.maxit: integer. Maximum number of IRLS iterations.

irls.tol: convergence tolerance for IRLS iterations. Only used if family = "binomial".

accelerate: boolean argument. Whether or not to use Nesterov acceleration with adaptive restarting (see the sketch following this argument list).

ncores: Integer scalar specifying the number of threads to be used.

compute.loss: should the loss be computed for each estimated tuning parameter? Defaults to FALSE.

hessian.type: only for logistic regression. If hessian.type = "full", the full Hessian is computed; if hessian.type = "upper.bound" (the default), an upper bound of the Hessian is used. The upper bound can be dramatically faster in certain settings, e.g. when n >> p.
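As a small illustration of the accelerate and hessian.type arguments, a hedged sketch on simulated data (illustrative only, not from the original documentation; timings and iteration counts will vary):

library(oem)
set.seed(1)
x <- matrix(rnorm(5000 * 20), 5000, 20)
y <- rbinom(5000, 1, plogis(x[, 1]))

## Nesterov acceleration with adaptive restarting
fit.acc <- oem(x, y, family = "binomial", penalty = "lasso",
               accelerate = TRUE, nlambda = 10)

## full Hessian instead of the default upper bound
fit.full <- oem(x, y, family = "binomial", penalty = "lasso",
                hessian.type = "full", nlambda = 10)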
An object with S3 class "oem"
Shifeng Xiong, Bin Dai, Jared Huling, and Peter Z. G. Qian. Orthogonalizing EM: A design-based least squares algorithm. Technometrics, 58(3):285-293, 2016. doi:10.1080/00401706.2015.1054436
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
set.seed(123)
n.obs <- 1e4
n.vars <- 50

true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- oem(x = x, y = y,
           penalty = c("lasso", "grp.lasso", "sparse.grp.lasso"),
           groups = rep(1:10, each = 5))

layout(matrix(1:3, ncol = 3))
plot(fit)
plot(fit, which.model = 2)
plot(fit, which.model = "sparse.grp.lasso")

# the oem package has support for sparse design matrices
library(Matrix)
xs <- rsparsematrix(n.obs * 25, n.vars * 2, density = 0.01)
ys <- rnorm(n.obs * 25, sd = 3) +
  as.vector(xs %*% c(true.beta, rep(0, n.vars)))
x.dense <- as.matrix(xs)

system.time(fit <- oem(x = x.dense, y = ys,
                       penalty = c("lasso", "grp.lasso"),
                       groups = rep(1:20, each = 5),
                       intercept = FALSE, standardize = FALSE))

system.time(fits <- oem(x = xs, y = ys,
                        penalty = c("lasso", "grp.lasso"),
                        groups = rep(1:20, each = 5),
                        intercept = FALSE, standardize = FALSE,
                        lambda = fit$lambda))

max(abs(fit$beta[[1]] - fits$beta[[1]]))
max(abs(fit$beta[[2]] - fits$beta[[2]]))

# logistic
y <- rbinom(n.obs, 1, prob = 1 / (1 + exp(-x %*% true.beta)))

system.time(res <- oem(x, y, intercept = FALSE,
                       penalty = c("lasso", "sparse.grp.lasso", "mcp"),
                       family = "binomial",
                       groups = rep(1:10, each = 5),
                       nlambda = 10, irls.tol = 1e-3, tol = 1e-8))

layout(matrix(1:3, ncol = 3))
plot(res)
plot(res, which.model = 2)
plot(res, which.model = "mcp")

# sparse design matrix
xs <- rsparsematrix(n.obs * 2, n.vars, density = 0.01)
x.dense <- as.matrix(xs)
ys <- rbinom(n.obs * 2, 1, prob = 1 / (1 + exp(-x %*% true.beta)))

system.time(res.gr <- oem(x.dense, ys, intercept = FALSE,
                          penalty = "grp.lasso", family = "binomial",
                          nlambda = 10, groups = rep(1:5, each = 10),
                          irls.tol = 1e-3, tol = 1e-8))

system.time(res.gr.s <- oem(xs, ys, intercept = FALSE,
                            penalty = "grp.lasso", family = "binomial",
                            nlambda = 10, groups = rep(1:5, each = 10),
                            irls.tol = 1e-3, tol = 1e-8))

max(abs(res.gr$beta[[1]] - res.gr.s$beta[[1]]))
Orthogonalizing EM with precomputed XtX
oem.xtx(xtx, xty, family = c("gaussian", "binomial"),
        penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net",
                    "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp",
                    "grp.scad", "grp.mcp.net", "grp.scad.net",
                    "sparse.grp.lasso"),
        lambda = numeric(0), nlambda = 100L, lambda.min.ratio = NULL,
        alpha = 1, gamma = 3, tau = 0.5, groups = numeric(0),
        scale.factor = numeric(0), penalty.factor = NULL,
        group.weights = NULL, maxit = 500L, tol = 1e-07,
        irls.maxit = 100L, irls.tol = 0.001)
xtx: input matrix equal to crossprod(x) / nrow(x), where x is the design matrix. Scaling by the number of observations is highly recommended; if xtx is scaled, xty must be scaled by the same factor (see the chunked-accumulation sketch after the example below).

xty: numeric vector of length nvars equal to crossprod(x, y) / nrow(x).

family: "gaussian" for least squares problems, "binomial" for binary response.

penalty: Specification of penalty type. Choices include "elastic.net", "lasso", "ols" (ordinary least squares, i.e. no penalty), "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso". Careful consideration is required for the group lasso, group MCP, and group SCAD penalties: groups as specified by the groups argument should be chosen in a sensible manner.

lambda: A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

nlambda: The number of lambda values; the default is 100.

lambda.min.ratio: Smallest value for lambda, as a fraction of lambda.max, the (data-derived) smallest value for which all penalized coefficients are zero.

alpha: mixing value for the "*.net" penalties. The penalty applied is (1 - alpha) * (ridge penalty) + alpha * (lasso/MCP/SCAD/group lasso penalty).

gamma: tuning parameter for the SCAD and MCP penalties; must be >= 1.

tau: mixing value for "sparse.grp.lasso". The penalty applied is (1 - tau) * (group lasso penalty) + tau * (lasso penalty).

groups: A vector describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0.

scale.factor: vector of length nvars (= ncol(xtx) = length(xty)) of scaling factors for the columns of x, such as the column standard deviations. Coefficients are returned on the original scale. The default is no scaling.

penalty.factor: Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables.

group.weights: penalty factors applied to each group for the group lasso. Similar to penalty.factor, these can be 0 for some groups, in which case that group is unpenalized and always included in the model.

maxit: integer. Maximum number of OEM iterations.

tol: convergence tolerance for OEM iterations.

irls.maxit: integer. Maximum number of IRLS iterations.

irls.tol: convergence tolerance for IRLS iterations. Only used if family = "binomial".
An object with S3 class "oem"
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
set.seed(123)
n.obs <- 1e4
n.vars <- 100

true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- oem(x = x, y = y,
           penalty = c("lasso", "elastic.net", "ols", "mcp", "scad",
                       "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net",
                       "grp.mcp", "grp.scad", "sparse.grp.lasso"),
           standardize = FALSE, intercept = FALSE,
           groups = rep(1:20, each = 5))

xtx <- crossprod(x) / n.obs
xty <- crossprod(x, y) / n.obs

fit.xtx <- oem.xtx(xtx = xtx, xty = xty,
                   penalty = c("lasso", "elastic.net", "ols", "mcp", "scad",
                               "mcp.net", "scad.net", "grp.lasso",
                               "grp.lasso.net", "grp.mcp", "grp.scad",
                               "sparse.grp.lasso"),
                   groups = rep(1:20, each = 5))

max(abs(fit$beta[[1]][-1, ] - fit.xtx$beta[[1]]))
max(abs(fit$beta[[2]][-1, ] - fit.xtx$beta[[2]]))

layout(matrix(1:2, ncol = 2))
plot(fit.xtx)
plot(fit.xtx, which.model = 2)
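The main appeal of oem.xtx() is that X'X and X'y can be accumulated over row chunks, so the full design matrix never needs to be processed at once. A minimal sketch of this idea on simulated data (the chunking scheme is illustrative, not from the original documentation):

## accumulate the scaled crossproducts chunk by chunk; dividing each
## chunk's contribution by the total n.obs keeps the scaling consistent
## with the example above
set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.chunks <- 10
x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs) + x[, 1]

xtx <- matrix(0, n.vars, n.vars)
xty <- numeric(n.vars)
chunk.id <- rep(seq_len(n.chunks), length.out = n.obs)
for (k in seq_len(n.chunks)) {
  rows <- which(chunk.id == k)
  xtx <- xtx + crossprod(x[rows, , drop = FALSE]) / n.obs
  xty <- xty + drop(crossprod(x[rows, , drop = FALSE], y[rows])) / n.obs
}

fit.chunk <- oem.xtx(xtx = xtx, xty = xty, penalty = "lasso")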
These functions have been renamed and deprecated in oem: oemfit() (use oem()), cv.oemfit() (use cv.oem()), print.oemfit(), plot.oemfit(), predict.oemfit(), and coef.oemfit().
oemfit(formula, data = list(), lambda = NULL, nlambda = 100,
       lambda.min.ratio = NULL, tolerance = 0.001, maxIter = 1000,
       standardized = TRUE, numGroup = 1,
       penalty = c("lasso", "scad", "ols", "elastic.net", "ngarrote", "mcp"),
       alpha = 3, evaluate = 0, condition = -1)

cv.oemfit(formula, data = list(), lambda = NULL,
          type.measure = c("mse", "mae"), ..., nfolds = 10, foldid,
          penalty = c("lasso", "scad", "elastic.net", "ngarrote", "mcp"))

## S3 method for class 'oemfit'
plot(x, xvar = c("norm", "lambda", "loglambda", "dev"), xlab = iname,
     ylab = "Coefficients", ...)

## S3 method for class 'oemfit'
predict(object, newx, s = NULL,
        type = c("response", "coefficients", "nonzero"), ...)

## S3 method for class 'oemfit'
print(x, digits = max(3, getOption("digits") - 3), ...)
formula: an object of class 'formula' (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data: an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'oemfit' is called.

lambda: A user-supplied lambda sequence. By default, the program computes its own sequence based on nlambda and lambda.min.ratio.

nlambda: The number of lambda values; default is 100.

lambda.min.ratio: Smallest value for lambda, as a fraction of lambda.max, the (data-derived) smallest value for which all coefficients are zero.

tolerance: Convergence tolerance for OEM. Each inner OEM loop continues until the maximum change in the objective after any coefficient update is less than tolerance. Default is 0.001.

maxIter: Maximum number of passes over the data for all lambda values; default is 1000.

standardized: Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardized = TRUE.

numGroup: Integer value for the number of groups to use for OEM fitting. Default is 1.

penalty: penalty type in lowercase letters. Choices include 'lasso', 'scad', 'ols' (ordinary least squares), 'elastic.net', 'ngarrote' (non-negative garrote) and 'mcp'.

alpha: alpha value for the scad and mcp penalties.

evaluate: debugging argument.

condition: debugging argument for different ways of calculating OEM.

type.measure: measure to evaluate for cross-validation: type.measure = "mse" (mean squared error) or type.measure = "mae" (mean absolute error).

...: arguments to be passed to oemfit().

nfolds: number of folds for cross-validation; default is 10.

foldid: an optional vector of values between 1 and nfolds specifying which fold each observation belongs to.

x: fitted 'oemfit' object.

xvar: what is on the X-axis. "norm" plots against the L1-norm of the coefficients, "lambda" against the lambda sequence, "loglambda" against the log-lambda sequence, and "dev" against the percent deviance explained.

xlab: x-axis label.

ylab: y-axis label.

object: fitted 'oemfit' model object.

newx: matrix of new values for x at which predictions are to be made. Must be a matrix.

s: Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model.

type: Type of prediction required; one of "response", "coefficients", or "nonzero".

digits: significant digits in printout.
The sequence of models implied by 'lambda' is fit by the OEM algorithm.
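For users migrating from the deprecated interface, a minimal sketch of the correspondence between oemfit() and oem() (data and variable names are illustrative; the deprecated call is left commented out):

set.seed(1)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- dat$x1 + rnorm(200)

## deprecated formula interface:
# fit.old <- oemfit(y ~ x1 + x2, data = dat, penalty = "lasso")

## current matrix interface:
fit.new <- oem(x = as.matrix(dat[, c("x1", "x2")]), y = dat$y,
               penalty = "lasso")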
Bin Dai
Plot method for Orthogonalizing EM fitted objects
## S3 method for class 'oem'
plot(x, which.model = 1, xvar = c("norm", "lambda", "loglambda", "dev"),
     labsize = 0.6, xlab = iname, ylab = NULL,
     main = x$penalty[which.model], ...)

## S3 method for class 'cv.oem'
plot(x, which.model = 1, sign.lambda = 1, ...)

## S3 method for class 'xval.oem'
plot(x, which.model = 1, type = c("cv", "coefficients"),
     xvar = c("norm", "lambda", "loglambda", "dev"), labsize = 0.6,
     xlab = iname, ylab = NULL, main = x$penalty[which.model],
     sign.lambda = 1, ...)
x: fitted "oem" model object.

which.model: If multiple penalties are fit and returned in the same oem object, the which.model argument is used to specify which model to plot. For example, if the oem object was fit with penalty = c("lasso", "grp.lasso"), then which.model = 2 (or, equivalently, which.model = "grp.lasso") plots the group lasso model.

xvar: What is on the X-axis. "norm" plots against the L1-norm of the coefficients, "lambda" against the lambda sequence, "loglambda" against the log-lambda sequence, and "dev" against the percent deviance explained.

labsize: size of labels for variable names. If labsize = 0, then no variable names will be plotted.

xlab: label for x-axis.

ylab: label for y-axis.

main: main title for plot.

...: other graphical parameters for the plot.

sign.lambda: Either plot against log(lambda) (default) or its negative if sign.lambda = -1.

type: one of "cv" or "coefficients". type = "cv" plots the cross validation curve; type = "coefficients" plots the coefficient paths (see the sketch after the examples below).
set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- oem(x = x, y = y,
           penalty = c("lasso", "grp.lasso"),
           groups = rep(1:10, each = 10))

layout(matrix(1:2, ncol = 2))
plot(fit, which.model = 1)
plot(fit, which.model = 2)

set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- cv.oem(x = x, y = y,
              penalty = c("lasso", "grp.lasso"),
              groups = rep(1:10, each = 10))

layout(matrix(1:2, ncol = 2))
plot(fit, which.model = 1)
plot(fit, which.model = "grp.lasso")

set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

fit <- xval.oem(x = x, y = y,
                penalty = c("lasso", "grp.lasso"),
                groups = rep(1:10, each = 10))

layout(matrix(1:4, ncol = 2))
plot(fit, which.model = 1)
plot(fit, which.model = 2)
plot(fit, which.model = 1, type = "coef")
plot(fit, which.model = 2, type = "coef")
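As a small follow-on, a sketch of the xvar and labsize arguments, reusing the xval.oem fit from the last block above (illustrative, not from the original documentation):

## coefficient paths on different x-axis scales; labsize = 0 suppresses
## the variable labels
plot(fit, which.model = 1, type = "coefficients", xvar = "loglambda")
plot(fit, which.model = 2, type = "coefficients", xvar = "norm",
     labsize = 0)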
Prediction function for fitted cross validation oem objects
## S3 method for class 'cv.oem'
predict(object, newx, which.model = "best.model",
        s = c("lambda.min", "lambda.1se"), ...)
object: fitted "cv.oem" model object.

newx: Matrix of new values for x at which predictions are to be made. Must be a matrix; can be sparse as in the Matrix package.

which.model: If multiple penalties are fit and returned in the same cv.oem object, the which.model argument is used to specify which model to use for prediction. which.model = "best.model" (the default) uses the model with the best cross validation error; a model can also be selected by index or by penalty name.

s: Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. For cv.oem objects, s can also be "lambda.min" (the lambda attaining the minimum cross validation error) or "lambda.1se" (the largest lambda within one standard error of the minimum); see the sketch after the example below.

...: used to pass other arguments to predict.oem().
An object depending on the type argument
set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta
x.test <- matrix(rnorm(n.obs.test * n.vars), n.obs.test, n.vars)
y.test <- rnorm(n.obs.test, sd = 3) + x.test %*% true.beta

fit <- cv.oem(x = x, y = y,
              penalty = c("lasso", "grp.lasso"),
              groups = rep(1:10, each = 10), nlambda = 10)

preds.best <- predict(fit, newx = x.test, type = "response",
                      which.model = "best.model")
apply(preds.best, 2, function(x) mean((y.test - x) ^ 2))

preds.gl <- predict(fit, newx = x.test, type = "response",
                    which.model = "grp.lasso")
apply(preds.gl, 2, function(x) mean((y.test - x) ^ 2))

preds.l <- predict(fit, newx = x.test, type = "response", which.model = 1)
apply(preds.l, 2, function(x) mean((y.test - x) ^ 2))
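A short sketch of the character values of s, reusing fit and x.test from the example above:

## predictions at the lambda minimizing the CV error ...
preds.min <- predict(fit, newx = x.test, s = "lambda.min")

## ... and at the largest lambda within one standard error of the minimum
preds.1se <- predict(fit, newx = x.test, s = "lambda.1se")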
Prediction method for Orthogonalizing EM fitted objects
## S3 method for class 'oem'
predict(object, newx, s = NULL, which.model = 1,
        type = c("link", "response", "coefficients", "nonzero", "class"),
        ...)
object: fitted "oem" model object.

newx: Matrix of new values for x at which predictions are to be made. Must be a matrix; can be sparse as in the Matrix package. Not required for type = "coefficients" or type = "nonzero".

s: Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model.

which.model: If multiple penalties are fit and returned in the same oem object, the which.model argument is used to specify which model to use for prediction, either by index or by penalty name.

type: Type of prediction required. type = "link" gives the linear predictors for the "binomial" model and the fitted values for "gaussian"; type = "response" gives the fitted probabilities for "binomial" and the fitted values for "gaussian"; type = "coefficients" computes the coefficients at the requested values of s; type = "nonzero" returns the indices of the nonzero coefficients; type = "class" applies only to "binomial" and produces the class label corresponding to the maximum probability. A sketch of these options follows the example below.

...: not used.
An object depending on the type argument
set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta
x.test <- matrix(rnorm(n.obs.test * n.vars), n.obs.test, n.vars)
y.test <- rnorm(n.obs.test, sd = 3) + x.test %*% true.beta

fit <- oem(x = x, y = y,
           penalty = c("lasso", "grp.lasso"),
           groups = rep(1:10, each = 10), nlambda = 10)

preds.lasso <- predict(fit, newx = x.test, type = "response",
                       which.model = 1)
preds.grp.lasso <- predict(fit, newx = x.test, type = "response",
                           which.model = 2)

apply(preds.lasso, 2, function(x) mean((y.test - x) ^ 2))
apply(preds.grp.lasso, 2, function(x) mean((y.test - x) ^ 2))
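A brief sketch of the other type values, reusing the fit from above (behavior for "coefficients" and "nonzero" is assumed analogous to glmnet's predict methods, where newx is not needed):

## coefficients at two specific lambda values
cf <- predict(fit, s = c(0.1, 0.01), type = "coefficients",
              which.model = 1)

## indices of the nonzero coefficients at a single lambda
nz <- predict(fit, s = 0.1, type = "nonzero", which.model = 1)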
Prediction function for fitted cross validation oem objects
## S3 method for class 'xval.oem'
predict(object, newx, which.model = "best.model",
        s = c("lambda.min", "lambda.1se"), ...)
object: fitted "xval.oem" model object.

newx: Matrix of new values for x at which predictions are to be made. Must be a matrix; can be sparse as in the Matrix package.

which.model: If multiple penalties are fit and returned in the same xval.oem object, the which.model argument is used to specify which model to use for prediction. which.model = "best.model" (the default) uses the model with the best cross validation error.

s: Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. Can also be "lambda.min" or "lambda.1se", as for cv.oem objects.

...: used to pass other arguments to predict.oem().
An object depending on the type argument
set.seed(123)
n.obs <- 1e4
n.vars <- 100
n.obs.test <- 1e3

true.beta <- c(runif(15, -0.5, 0.5), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta
x.test <- matrix(rnorm(n.obs.test * n.vars), n.obs.test, n.vars)
y.test <- rnorm(n.obs.test, sd = 3) + x.test %*% true.beta

fit <- xval.oem(x = x, y = y,
                penalty = c("lasso", "grp.lasso"),
                groups = rep(1:10, each = 10), nlambda = 10)

preds.best <- predict(fit, newx = x.test, type = "response",
                      which.model = "best.model")
apply(preds.best, 2, function(x) mean((y.test - x) ^ 2))

preds.gl <- predict(fit, newx = x.test, type = "response",
                    which.model = "grp.lasso")
apply(preds.gl, 2, function(x) mean((y.test - x) ^ 2))

preds.l <- predict(fit, newx = x.test, type = "response", which.model = 1)
apply(preds.l, 2, function(x) mean((y.test - x) ^ 2))
print method for summary.cv.oem objects
## S3 method for class 'summary.cv.oem'
print(x, digits, ...)
x: a "summary.cv.oem" object.

digits: digits to display.

...: not used.
summary method for cross validation Orthogonalizing EM fitted objects
## S3 method for class 'cv.oem'
summary(object, ...)

## S3 method for class 'xval.oem'
summary(object, ...)
object: fitted "cv.oem" or "xval.oem" model object.

...: not used.
Fast cross validation for Orthogonalizing EM
xval.oem(x, y, nfolds = 10L, foldid = NULL,
         type.measure = c("mse", "deviance", "class", "auc", "mae"),
         ncores = -1, family = c("gaussian", "binomial"),
         penalty = c("elastic.net", "lasso", "ols", "mcp", "scad",
                     "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net",
                     "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net",
                     "sparse.grp.lasso"),
         weights = numeric(0), lambda = numeric(0), nlambda = 100L,
         lambda.min.ratio = NULL, alpha = 1, gamma = 3, tau = 0.5,
         groups = numeric(0), penalty.factor = NULL, group.weights = NULL,
         standardize = TRUE, intercept = TRUE, maxit = 500L, tol = 1e-07,
         irls.maxit = 100L, irls.tol = 0.001, compute.loss = FALSE)
x: input matrix of dimension n x p (sparse matrices not yet implemented). Each row is an observation; each column corresponds to a covariate. The xval.oem() function is optimized for n >> p settings and may be very slow when p > n, so please use other packages such as glmnet, ncvreg, grpreg, or gglasso in that case.

y: numeric response vector of length nobs = nrow(x).

nfolds: integer number of cross validation folds. 3 is the minimum number allowed; defaults to 10.

foldid: an optional vector of values between 1 and nfolds specifying which fold each observation belongs to.

type.measure: measure to evaluate for cross-validation. The default is type.measure = "mse" (mean squared error). Other options are "deviance", "mae" (mean absolute error), and, for binomial models, "class" (misclassification error) and "auc" (area under the ROC curve).

ncores: Integer scalar specifying the number of threads to be used.

family: "gaussian" for least squares problems, "binomial" for binary response.

penalty: Specification of penalty type. Choices include "elastic.net", "lasso", "ols" (ordinary least squares, i.e. no penalty), "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso". Careful consideration is required for the group lasso, group MCP, and group SCAD penalties: groups as specified by the groups argument should be chosen in a sensible manner.

weights: observation weights. Defaults to 1 for each observation (setting the weight vector to length 0 will default all weights to 1).

lambda: A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

nlambda: The number of lambda values; the default is 100.

lambda.min.ratio: Smallest value for lambda, as a fraction of lambda.max, the (data-derived) smallest value for which all penalized coefficients are zero.

alpha: mixing value for the "*.net" penalties. The penalty applied is (1 - alpha) * (ridge penalty) + alpha * (lasso/MCP/SCAD/group lasso penalty).

gamma: tuning parameter for the SCAD and MCP penalties; must be >= 1.

tau: mixing value for "sparse.grp.lasso". The penalty applied is (1 - tau) * (group lasso penalty) + tau * (lasso penalty).

groups: A vector describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0.

penalty.factor: Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables.

group.weights: penalty factors applied to each group for the group lasso. Similar to penalty.factor, these can be 0 for some groups, in which case that group is unpenalized and always included in the model.

standardize: Logical flag for x variable standardization, prior to fitting the models. The coefficients are always returned on the original scale. Default is standardize = TRUE.

intercept: Should intercept(s) be fitted (TRUE, the default) or set to zero (FALSE)?

maxit: integer. Maximum number of OEM iterations.

tol: convergence tolerance for OEM iterations.

irls.maxit: integer. Maximum number of IRLS iterations.

irls.tol: convergence tolerance for IRLS iterations. Only used if family = "binomial".

compute.loss: should the loss be computed for each estimated tuning parameter? Defaults to FALSE.
An object with S3 class "xval.oem"
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
set.seed(123)
n.obs <- 1e4
n.vars <- 100

true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))

x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

system.time(fit <- oem(x = x, y = y,
                       penalty = c("lasso", "grp.lasso"),
                       groups = rep(1:20, each = 5)))

system.time(xfit <- xval.oem(x = x, y = y,
                             penalty = c("lasso", "grp.lasso"),
                             groups = rep(1:20, each = 5)))

system.time(xfit2 <- xval.oem(x = x, y = y,
                              penalty = c("lasso", "grp.lasso", "mcp",
                                          "scad", "mcp.net", "scad.net",
                                          "grp.lasso", "grp.lasso.net",
                                          "grp.mcp", "grp.scad",
                                          "sparse.grp.lasso"),
                              groups = rep(1:20, each = 5)))
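Continuing from the fits above, a minimal sketch of inspecting and using the cross validation results (illustrative, not from the original documentation):

## summarize the CV error for each penalty ...
summary(xfit)

## ... and predict at the CV-selected lambda of the best model
preds <- predict(xfit, newx = x[1:5, ], s = "lambda.min",
                 which.model = "best.model")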