Title: | Fit Sparse Linear Regression Models via Nonconvex Optimization |
---|---|
Description: | Efficient procedure for fitting regularization paths between L1 and L0, using the MC+ penalty of Zhang, C.H. (2010) <doi:10.1214/09-AOS729>. Implements the methodology described in Mazumder, Friedman and Hastie (2011) <doi:10.1198/jasa.2011.tm09738>. Sparsenet computes the regularization surface over both the family parameter and the tuning parameter by coordinate descent. |
Authors: | Trevor Hastie [aut, cre], Rahul Mazumder [aut], Jerome Friedman [aut] |
Maintainer: | Trevor Hastie <[email protected]> |
License: | GPL-2 |
Version: | 1.7 |
Built: | 2024-11-25 16:37:40 UTC |
Source: | CRAN |
Sparsenet uses coordinate descent on the MC+ nonconvex penalty family, and fits a surface of solutions over the two-dimensional parameter space. At its simplest, provide x, y data and it returns the solution paths. There are tools for prediction, cross-validation, plotting and printing.
Rahul Mazumder, Jerome Friedman and Trevor Hastie
Maintainer: Trevor Hastie <[email protected]>
Mazumder, Rahul, Friedman, Jerome and Hastie, Trevor (2011) SparseNet: Coordinate Descent with Nonconvex Penalties. JASA, Vol 106(495), 1125-38, https://hastie.su.domains/public/Papers/Sparsenet/Mazumder-SparseNetCoordinateDescent-2011.pdf
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = sparsenet(x, y)
plot(fit)
cvfit = cv.sparsenet(x, y)
plot(cvfit)
Does k-fold cross-validation for sparsenet, produces a plot, and returns optimal values for gamma and lambda.
cv.sparsenet(x, y, weights, type.measure = c("mse", "mae"), ..., nfolds = 10,
             foldid, keep = FALSE, trace.it = FALSE)
x | Input matrix of predictors, as in sparsenet |
y | Response vector |
weights | Observation weights; defaults to 1 per observation |
type.measure | Loss to use for cross-validation. Currently two options: squared-error ("mse", the default) or mean absolute error ("mae") |
... | Other arguments that can be passed to sparsenet |
nfolds | Number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets |
foldid | An optional vector of values between 1 and nfolds identifying the fold each observation belongs to. If supplied, nfolds can be missing |
keep | If TRUE, the array of prevalidated fits is returned; default is FALSE |
trace.it | If TRUE, progress is reported as the folds are fit |
The function runs sparsenet nfolds + 1 times: the first run obtains the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed.
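The accumulation described above can be sketched in plain R. This is a conceptual illustration only: a trivial mean-only model stands in for sparsenet so that the sketch runs without the package.

```r
# k-fold CV error accumulation, as described above: assign each observation
# to one of nfolds folds, fit with each fold omitted, and average the
# held-out errors.  A mean-only model stands in for sparsenet here.
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = length(y)))

cv_err <- sapply(seq_len(nfolds), function(k) {
  train <- foldid != k
  fit   <- mean(y[train])           # stand-in for sparsenet(x[train, ], y[train])
  mean((y[!train] - fit)^2)         # squared-error loss on the held-out fold
})
cvm  <- mean(cv_err)                # average error over the folds
cvsd <- sd(cv_err) / sqrt(nfolds)   # its standard error
```

With the real function, each held-out fit is evaluated over the whole gamma/lambda grid, so cvm and cvsd are matrices rather than scalars.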
An object of class "cv.sparsenet" is returned, which is a list with the ingredients of the cross-validation fit.
lambda | The values of lambda used in the fits |
cvm | The mean cross-validated error - a matrix shaped like lambda |
cvsd | Estimate of the standard error of cvm |
cvup | Upper curve = cvm + cvsd |
cvlo | Lower curve = cvm - cvsd |
nzero | Number of non-zero coefficients at each lambda |
name | A text string indicating the type of measure (for plotting purposes) |
sparsenet.fit | A fitted sparsenet object for the full data |
call | The call that produced this object |
parms.min | Values of gamma and lambda that give the minimum of cvm |
which.min | Indices for the above |
lambda.1se | The largest value of lambda (with its gamma) such that the error is within 1 standard error of the minimum |
which.1se | Indices of the above |
See also: the glmnet package; the predict, coef, print and plot methods; and the sparsenet function.
train.data = gendata(100, 1000, nonzero = 30, rho = 0.3, snr = 3)
fit = sparsenet(train.data$x, train.data$y)
par(mfrow = c(3, 3))
plot(fit)
par(mfrow = c(1, 1))
fitcv = cv.sparsenet(train.data$x, train.data$y, trace.it = TRUE)
plot(fitcv)
This function generates x/y data for testing sparsenet and glmnet
gendata(N, p, nonzero, rho, snr = 3, alternate = TRUE)
N | Sample size (e.g. 500) |
p | Number of features or variables (e.g. 1000) |
nonzero | Number of nonzero coefficients (e.g. 30) |
rho | Pairwise correlation between features |
snr | Signal-to-noise ratio - SD(signal)/SD(noise); try 3 |
alternate | If TRUE, alternate the signs of the coefficients |
Generates Gaussian x and y data. The nonzero coefficients decrease linearly in absolute value from nonzero down to 0. If alternate=TRUE their signs alternate; otherwise they do not.
A list with components x and y, as well as some other details about the dataset.
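The construction described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions (equicorrelated Gaussian features with pairwise correlation rho, linearly decaying coefficients, and noise scaled to hit the requested snr); the package's own gendata may differ in detail.

```r
gendata_sketch <- function(N, p, nonzero, rho, snr = 3, alternate = TRUE) {
  # Equicorrelated Gaussian features: cor(x_j, x_k) = rho for j != k
  z0 <- rnorm(N)
  x  <- sqrt(rho) * z0 + sqrt(1 - rho) * matrix(rnorm(N * p), N, p)
  # Nonzero coefficients decrease linearly in absolute value toward 0
  beta  <- numeric(p)
  signs <- if (alternate) rep_len(c(1, -1), nonzero) else rep(1, nonzero)
  beta[seq_len(nonzero)] <- seq(nonzero, 1, length.out = nonzero) * signs
  mu    <- drop(x %*% beta)
  sigma <- sd(mu) / snr          # noise SD chosen so SD(signal)/SD(noise) = snr
  list(x = x, y = mu + rnorm(N, sd = sigma), beta = beta)
}

set.seed(2)
d <- gendata_sketch(100, 50, nonzero = 30, rho = 0.3)
```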
Trevor Hastie and Jerome Friedman
train.data = gendata(100, 1000, nonzero = 30, rho = 0.3, snr = 3)
fit = sparsenet(train.data$x, train.data$y)
par(mfrow = c(3, 3))
plot(fit)
par(mfrow = c(1, 1))
fitcv = cv.sparsenet(train.data$x, train.data$y, trace.it = TRUE)
plot(fitcv)
Plots the cross-validation curves for each value of gamma in one figure, as a function of the lambda values used.
## S3 method for class 'cv.sparsenet'
plot(x, ...)
x | Fitted "cv.sparsenet" object |
... | Other graphical parameters to plot |
A plot is produced, and nothing is returned.
See also: the glmnet package; sparsenet, cv.sparsenet, and the print and plot methods for both.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fitcv = cv.sparsenet(x, y)
plot(fitcv)
Produces a series of coefficient profile plots of the coefficient paths for a fitted "sparsenet" object.
## S3 method for class 'sparsenet'
plot(x, xvar = c("rsq", "lambda", "norm"), which.gamma = NULL, label = FALSE, ...)
x | Fitted "sparsenet" object |
xvar | What is on the X-axis: "rsq" plots against the fraction of training variance explained, "lambda" against log lambda, and "norm" against the L1 norm of the coefficients |
which.gamma | Sequence numbers of the gamma values to be plotted; default is all |
label | If TRUE, label the curves with their variable sequence numbers |
... | Other graphical parameters to plot |
A series of coefficient profile plots is produced, one for each gamma specified. Users should set up the appropriate layout.
See also: the glmnet package; sparsenet, cv.sparsenet, and the print and plot methods for both.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = sparsenet(x, y)
par(mfrow = c(3, 3))
plot(fit)
This function makes predictions from a cross-validated sparsenet model, using the stored "sparsenet.fit" object and the optimal values chosen for gamma and lambda.
## S3 method for class 'cv.sparsenet'
predict(object, newx, which = c("parms.min", "parms.1se"), ...)
## S3 method for class 'cv.sparsenet'
coef(object, which = c("parms.min", "parms.1se"), ...)
object | Fitted "cv.sparsenet" object |
newx | Matrix of new values for x at which predictions are to be made |
which | Either the parameters at the minimum of the CV curves (default "parms.min"), or "parms.1se" for the one-standard-error rule |
... | Other arguments passed on to the predict method for sparsenet objects |
This function makes it easier to use the results of cross-validation to make a prediction. The object returned depends on the ... argument, which is passed on to the predict method for sparsenet objects.
See also: the glmnet package; sparsenet, cv.sparsenet, and the print and plot methods for both.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fitcv = cv.sparsenet(x, y)
predict(fitcv, x)
Similar to other predict methods, this function predicts fitted values, coefficients and more from a fitted "sparsenet" object.
## S3 method for class 'sparsenet'
predict(object, newx, s = NULL, which.gamma = NULL,
        type = c("response", "coefficients", "nonzero"), exact = FALSE, ...)
## S3 method for class 'sparsenet'
coef(object, s = NULL, which.gamma = NULL, exact = FALSE, ...)
object | Fitted "sparsenet" model object |
newx | Matrix of new values for x at which predictions are to be made; required for type = "response" |
s | Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to fit the model |
which.gamma | Index or indices of the gamma values at which predictions are required. Default is all |
type | "response" returns fitted predictions; "coefficients" returns the coefficients; "nonzero" returns the indices of the nonzero coefficients |
exact | By default (FALSE), predictions at values of s not in the fitted sequence are obtained by interpolation |
... | Not used. Other arguments to predict |
The shape of the objects returned depends on whether which.gamma has more than one element: if it does, a list of predictions is returned, one for each gamma. The object returned also depends on type.
See also: the glmnet package; sparsenet, cv.sparsenet, and the print and plot methods for both.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit = sparsenet(x, y)
predict(fit, which.gamma = 5, type = "nonzero")
predict(fit, x)
Sparsenet uses coordinate descent on the MC+ nonconvex penalty family, and fits a surface of solutions over the two-dimensional parameter space. This penalty family is indexed by an overall strength parameter lambda (as in the lasso) and a convexity parameter gamma. Gamma = infinity corresponds to the lasso, and gamma = 1 to best subset.
sparsenet(x, y, weights, exclude, dfmax = nvars + 1, pmax = min(dfmax * 2, nvars),
          ngamma = 9, nlambda = 50, max.gamma = 150, min.gamma = 1.000001,
          lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04), lambda = NULL,
          gamma = NULL, parms = NULL, warm = c("lambda", "gamma", "both"),
          thresh = 1e-05, maxit = 1e+06)
x | Input matrix of nobs x nvars predictors |
y | Response vector |
weights | Observation weights; default 1 for each observation |
exclude | Indices of variables to be excluded from the model. Default is none |
dfmax | Limit the maximum number of variables in the model. Useful for very large nvars, when a partial path is acceptable; default is nvars + 1 |
pmax | Limit the maximum number of variables ever to be nonzero; default is min(dfmax * 2, nvars) |
ngamma | Number of gamma values, if gamma is not supplied; default is 9 |
nlambda | Number of lambda values, if lambda is not supplied; default is 50 |
max.gamma | Largest gamma value to be used, apart from infinity (lasso), if gamma is not supplied; default is 150 |
min.gamma | Smallest value of gamma to use; should be > 1. Default is 1.000001 |
lambda.min.ratio | Smallest value for lambda, as a fraction of the (data-derived) entry value. The default depends on the sample size nobs relative to the number of variables nvars: 1e-04 if nobs > nvars, else 0.01 |
lambda | A user-supplied lambda sequence, in decreasing order. Typically the program computes its own sequence |
gamma | Sparsity parameter vector, with 1 < gamma < infinity. Gamma = 1 corresponds to best-subset regression, gamma = infinity to the lasso. Should be given in decreasing order |
parms | An optional three-dimensional array: 2 x ngamma x nlambda. Here the user can supply exactly the (gamma, lambda) pairs that are to be traversed by the coordinate descent algorithm |
warm | How to traverse the grid. Default is "lambda", meaning warm starts from the previous lambda with the same gamma. "gamma" means the opposite: the previous gamma for the same lambda. "both" tries both warm starts, and uses the one that improves the criterion the most |
thresh | Convergence threshold for coordinate descent. Each coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh; default is 1e-05 |
maxit | Maximum number of passes over the data for all lambda/gamma values; default is 10^6 |
This algorithm operates like glmnet: where glmnet's alpha parameter moves the penalty between lasso and ridge, here gamma moves it between lasso and best subset. The algorithm traverses the two-dimensional gamma/lambda array in a nested loop, with decreasing gamma in the outer loop, and decreasing lambda in the inner loop. Because of the nature of the MC+ penalty, each coordinate update is a convex problem, with a simple two-threshold shrinking scheme: |beta| < lambda is set to zero; |beta| > lambda*gamma is left alone; beta in between is shrunk proportionally. Note that this algorithm ALWAYS standardizes the columns of x and y to have mean zero and variance 1 (using 1/N averaging) before it computes its fit. The coefficients returned reflect the original scale.
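The two-threshold scheme above can be written out as a single-coordinate update. The function below is a simplified illustration for a standardized predictor (z denotes the inner product of the predictor with the partial residual); it is not the package's internal code.

```r
# MC+ coordinate update: the three regimes described in the text.
mcplus_update <- function(z, lambda, gamma) {
  stopifnot(gamma > 1)
  az <- abs(z)
  if (az <= lambda) {
    0                                          # below lambda: set to zero
  } else if (az > lambda * gamma) {
    z                                          # beyond lambda*gamma: leave alone
  } else {
    sign(z) * (az - lambda) / (1 - 1 / gamma)  # in between: proportional shrinkage
  }
}

mcplus_update(0.5, lambda = 1, gamma = 3)  # 0
mcplus_update(5,   lambda = 1, gamma = 3)  # 5 (unshrunk)
mcplus_update(2,   lambda = 1, gamma = 3)  # 1.5 (soft-thresholded, then rescaled)
```

As gamma grows, the rescaling factor 1/(1 - 1/gamma) approaches 1 and the rule tends to the lasso's soft-thresholding; as gamma approaches 1, the middle regime vanishes and the rule tends to hard thresholding (best subset).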
An object of class "sparsenet", with a number of components. Mostly one will access the components via generic functions like coef(), plot(), predict(), etc.
call |
the call that produced this object |
rsq |
The percentage variance explained on the training data; an ngamma x nlambda matrix. |
jerr |
error flag, for warnings and errors (largely for internal debugging). |
coefficients |
A coefficient list with ngamma elements; each of these is a coefficient list with various components: the matrix beta of coefficients, its dimension dim, the vector of intercepts, the lambda sequence, the gamma value, the sequence of df (nonzero coefficients) for each solution. |
parms |
Irrespective of how the parameters were input, the three-way array of what was used. |
gamma |
The gamma values used |
lambda |
The lambda values used |
max.lambda |
The entry value for lambda |
See also: the glmnet package; the predict, coef, print and plot methods; and the cv.sparsenet function.
train.data = gendata(100, 1000, nonzero = 30, rho = 0.3, snr = 3)
fit = sparsenet(train.data$x, train.data$y)
par(mfrow = c(3, 3))
plot(fit)
par(mfrow = c(1, 1))
fitcv = cv.sparsenet(train.data$x, train.data$y, trace.it = TRUE)
plot(fitcv)