Title: | Prediction and Variable Selection Using Spike and Slab Regression |
---|---|
Description: | Spike and slab for prediction and variable selection in linear regression models. Uses a generalized elastic net for variable selection. |
Authors: | Hemant Ishwaran <[email protected]> |
Maintainer: | Udaya B. Kogalur <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.6 |
Built: | 2024-12-15 07:20:19 UTC |
Source: | CRAN |
Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net from spike and slab regression. Returns a stability index for each variable.
cv.spikeslab(x = NULL, y = NULL, K = 10, plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE, bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln), r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE, fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE, ntree = 300, seed = NULL, ...)
cv.spikeslab(x = NULL, y = NULL, K = 10, plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE, bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln), r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE, fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE, ntree = 300, seed = NULL, ...)
x |
x-predictor matrix. |
y |
y-response values. |
K |
Number of folds. |
plot.it |
If TRUE, plots the mean prediction error and its standard error. |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
mse |
If TRUE, an external estimate for the overall variance is calculated. |
bigp.smalln |
Use if |
bigp.smalln.factor |
Top |
screen |
If TRUE, variables are first pre-filtered. |
r.effects |
List used for grouping variables (see details below). |
max.var |
Maximum number of variables allowed in the final model. |
center |
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples. |
intercept |
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE. |
fast |
If TRUE, use blocked Gibbs sampling to accelerate the algorithm. |
beta.blocks |
Update beta using this number of blocks ( |
verbose |
If TRUE, verbose output is sent to the terminal. |
save.all |
If TRUE, spikeslab object for each fold is saved and returned. |
ntree |
Number of trees used by random forests (applies only when |
seed |
Seed for random number generator. Must be a negative integer. |
... |
Further arguments passed to or from other methods. |
Invisibly returns a list with components:
spikeslab.obj |
Spike and slab object from the full data. |
cv.spikeslab.obj |
List containing spike and slab objects from each fold. Can be NULL. |
cv.fold |
List containing the cv splits. |
cv |
Mean-squared error for each fold for the gnet. |
cv.path |
A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds. |
stability |
Matrix containing stability for each variable defined as the percentage of times a variable is identified over the K-folds. Also includes bma and gnet coefficient values and their cv-fold-averaged values. |
bma |
bma coefficients from the full data in terms of the standardized x. |
bma.scale |
bma coefficients from the full data, scaled in terms of the original x. |
gnet |
cv-optimized gnet in terms of the standardized x. |
gnet.scale |
cv-optimized gnet in terms of the original x. |
gnet.model |
List of models selected by gnet over the K-folds. |
gnet.path |
gnet path from the full data, scaled in terms of the original x. |
gnet.obj |
gnet object from fitting the full data (a lars-type object). |
gnet.obj.vars |
Variables (in order) used to calculate the gnet object. |
verbose |
Verbose details (used for printing). |
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.
sparsePC.spikeslab
,
plot.spikeslab
,
predict.spikeslab
,
print.spikeslab
.
## Not run: #------------------------------------------------------------ # Example 1: 10-fold validation using parallel processing #------------------------------------------------------------ data(ozoneI, package = "spikeslab") y <- ozoneI[, 1] x <- ozoneI[, -1] cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4) plot(cv.obj, plot.type = "cv") plot(cv.obj, plot.type = "path") #------------------------------------------------------------ # Example 2: 10-fold validation using parallel processing # (high dimensional diabetes data) #------------------------------------------------------------ # add 2000 noise variables data(diabetesI, package = "spikeslab") diabetes.noise <- cbind(diabetesI, noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI))) x <- diabetes.noise[, -1] y <- diabetes.noise[, 1] cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln=TRUE, parallel = 4) plot(cv.obj) ## End(Not run)
## Not run: #------------------------------------------------------------ # Example 1: 10-fold validation using parallel processing #------------------------------------------------------------ data(ozoneI, package = "spikeslab") y <- ozoneI[, 1] x <- ozoneI[, -1] cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4) plot(cv.obj, plot.type = "cv") plot(cv.obj, plot.type = "path") #------------------------------------------------------------ # Example 2: 10-fold validation using parallel processing # (high dimensional diabetes data) #------------------------------------------------------------ # add 2000 noise variables data(diabetesI, package = "spikeslab") diabetes.noise <- cbind(diabetesI, noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI))) x <- diabetes.noise[, -1] y <- diabetes.noise[, 1] cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln=TRUE, parallel = 4) plot(cv.obj) ## End(Not run)
The data consists of 442 patients in which the response of interest is a
quantitative measure of disease progression. The data includes 10
baseline measurements for each patient, in addition to 45 interactions
and 9 quadratic terms, for a total of 64 variables for each patient.
The outcome is Y
.
Efron B., Hastie T., Johnstone. I and Tibshirani R. (2004). Least angle regression (with discussion). Ann. Statist., 32:407-499.
data(diabetesI, package = "spikeslab")
data(diabetesI, package = "spikeslab")
Median house price for 506 census tracts of Boston from the 1970 census.
The original data comprises 506 observations and 13 variables but has
been modified here to include all pairwise interactions of main effects
and to include B-spline basis functions of up to 6 degrees of freedom
for all original predictors. In addition, all real valued variables
were mapped to dummy variables representing a factor with three levels
and all pairwise interactions of these dummy variables were added to the
design matrix. In total, the modified data contains 506 observations
and 658 variables. The outcome is the median house price medv
.
Harrison D. and Rubinfeld D.L. (1978). Hedonic prices and the demand for clean air. J. Envir. Economics Management, 5:81-102
data(housingI, package = "spikeslab")
data(housingI, package = "spikeslab")
Gene expression cancer data set (Golub et al.) of samples from human acute myeloid (AML) and acute lymphoblastic leukemias (ALL). 3571 expression values on 72 individuals.
Golub T.R et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531-537.
data(leukemia, package = "spikeslab")
data(leukemia, package = "spikeslab")
The data consists of 366 readings of maximum daily ozone measured in the
Los Angeles basin. After removing missing values, the original data has
been expanded to include all pairwise interactions, as well as B-spline
basis functions (6 degrees of freedom), for each of the original 12
variables (9 meteorlogical variables and 3 variables recording date of
measurement: month, day of the month, and day of week). In total, the
modified data has 203 observations and 134 variables. The outcome is
ozone
.
Breiman L. and Friedman J.H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Stat. Assoc., 80:580-598.
data(ozoneI, package = "spikeslab")
data(ozoneI, package = "spikeslab")
Plots either the gnet solution path or the cross-validated mean-squared-error (the latter only applies when cross-validation is used).
## S3 method for class 'spikeslab' plot(x, plot.type = c("path", "cv"), breaks = FALSE, ...)
## S3 method for class 'spikeslab' plot(x, plot.type = c("path", "cv"), breaks = FALSE, ...)
x |
An object of class |
plot.type |
Choosing "path" produces a plot of the gnet solution
path. The choice "cv" produces the mean-squared error plot. The
latter applies only to objects from a |
breaks |
If TRUE, then vertical lines are drawn at each break point in the gnet solution path. |
... |
Further arguments passed to or from other methods. |
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Efron B., Hastie T., Johnstone I., and Tibshirani R. (2004). Least angle regression (with discussion). Ann. Statist., 32:407-499.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
spikeslab, cv.spikeslab
.
## Not run: #------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose = TRUE) plot(obj, plot.type = "path") ## End(Not run)
## Not run: #------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose = TRUE) plot(obj, plot.type = "path") ## End(Not run)
Prediction on test data using spike and slab regression.
## S3 method for class 'spikeslab' predict(object, newdata = NULL, ...)
## S3 method for class 'spikeslab' predict(object, newdata = NULL, ...)
object |
An object of class |
newdata |
Data frame or x-matrix containing test data (if omitted, the training data is used). |
... |
Further arguments passed to or from other methods. |
Computes the predicted value using a test data set.
A vector of fitted values for the BMA and gnet and a matrix of fitted values for the gnet path. Also returns the grr mixing predictor if the object has been parsed by the mixing wrapper.
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Ishwaran H. and Rao J.S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Stat. Assoc., 98:438-455.
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2005b). Spike and slab gene selection for multigroup microarray data. J. Amer. Stat. Assoc., 100:764-780.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
Ishwaran H., Kogalur U.B. and Rao J.S. (2010). spikeslab: prediction and variable selection using spike and slab regression. R Journal, 2(2), 68-73.
Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.
spikeslab
.
## Not run: #------------------------------------------------------------ # Example 1: get the predictor for the training data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") x <- diabetesI[, -1] y <- diabetesI[, 1] obj <- spikeslab(x = x, y = y) #gnet predictor yhat.gnet <- predict(obj)$yhat.gnet #an equivalent call is... yhat.gnet <- predict(obj, x = x)$yhat.gnet #------------------------------------------------------------ # Example 2: ozone data with interactions #------------------------------------------------------------ data(ozoneI, package = "spikeslab") train.pt <- sample(1:nrow(ozoneI), nrow(ozoneI) * 0.80) obj <- spikeslab(ozone ~ . , ozoneI[train.pt, ]) ytest <- ozoneI$ozone[-train.pt] ss.pred <- predict(obj, ozoneI[-train.pt, ]) yhat.bma <- ss.pred$yhat.bma yhat.gnet <- ss.pred$yhat.gnet plot(ytest, yhat.bma, ylab = "yhat", pch = 16, col = 4) points(ytest, yhat.gnet, pch = 16, col = 2) abline(0, 1, lty = 2, col = 2) legend("bottomright", legend = c("bma", "gnet"), col = c(4, 2), pch = 16) ## End(Not run)
## Not run: #------------------------------------------------------------ # Example 1: get the predictor for the training data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") x <- diabetesI[, -1] y <- diabetesI[, 1] obj <- spikeslab(x = x, y = y) #gnet predictor yhat.gnet <- predict(obj)$yhat.gnet #an equivalent call is... yhat.gnet <- predict(obj, x = x)$yhat.gnet #------------------------------------------------------------ # Example 2: ozone data with interactions #------------------------------------------------------------ data(ozoneI, package = "spikeslab") train.pt <- sample(1:nrow(ozoneI), nrow(ozoneI) * 0.80) obj <- spikeslab(ozone ~ . , ozoneI[train.pt, ]) ytest <- ozoneI$ozone[-train.pt] ss.pred <- predict(obj, ozoneI[-train.pt, ]) yhat.bma <- ss.pred$yhat.bma yhat.gnet <- ss.pred$yhat.gnet plot(ytest, yhat.bma, ylab = "yhat", pch = 16, col = 4) points(ytest, yhat.gnet, pch = 16, col = 2) abline(0, 1, lty = 2, col = 2) legend("bottomright", legend = c("bma", "gnet"), col = c(4, 2), pch = 16) ## End(Not run)
Print summary output from spike and slab analysis. Note that this is the default print method for the package.
## S3 method for class 'spikeslab' print(x, ...)
## S3 method for class 'spikeslab' print(x, ...)
x |
An object of class |
... |
Further arguments passed to or from other methods. |
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Ishwaran H. and Rao J.S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Stat. Assoc., 98:438-455.
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2005b). Spike and slab gene selection for multigroup microarray data. J. Amer. Stat. Assoc., 100:764-780.
Ishwaran H. and Rao J.S. (2009). Generalized ridge regression: geometry and computational solutions when p is larger than n.
spikeslab
.
## Not run: #------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose = TRUE) print(obj) ## End(Not run)
## Not run: #------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose = TRUE) print(obj) ## End(Not run)
Variable selection for the multiclass gene prediction problem.
sparsePC.spikeslab(x = NULL, y = NULL, n.rep = 10, n.iter1 = 150, n.iter2 = 100, n.prcmp = 5, max.genes = 100, ntree = 1000, nodesize = 1, verbose = TRUE, ...)
sparsePC.spikeslab(x = NULL, y = NULL, n.rep = 10, n.iter1 = 150, n.iter2 = 100, n.prcmp = 5, max.genes = 100, ntree = 1000, nodesize = 1, verbose = TRUE, ...)
x |
x matrix of gene expressions. |
y |
Class labels. |
n.rep |
Number of Monte Carlo replicates. |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
n.prcmp |
Number of principal components. |
max.genes |
Maximum number of genes in final model. |
ntree |
Number of trees used by random forests. |
nodesize |
Nodesize of trees. |
verbose |
If TRUE, verbose output is sent to the terminal. |
... |
Further arguments passed to or from other methods. |
Multiclass prediction using a hybrid combination of spike and slab
linear regression and random forest multiclass prediction (Ishwaran
and Rao, 2009). A pseudo y-vector of response values is calculated
using each of the top n.prcmp
principal components of the
x-gene expression matrix. The generalized elastic net obtained from
using spike and slab regression is used to select genes; one
regression fit is used for each of the pseduo y-response vectors. The
final combined set of genes are passed to random forests and used to
construct a multiclass forest predictor. This procedure is repeated
n.rep
times with each Monte Carlo replicate based on balanced
cross-validation with 2/3rds of the data used for training and 1/3rd
used for testing.
—> Miscellanea:
Test set error is only computed when n.rep
is larger than 1.
If n.rep
=1 the full data is used without any cross-validation.
Invisibly, the final set of selected genes as well as the complete set
of genes selected over the n.rep
Monte Carlo replications. The
random forest classifier is also returned.
The misclassification error rate, error rate for each class, and other summary information are output to the terminal.
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Ishwaran H. and Rao J.S. (2009). Generalized ridge regression: geometry and computational solutions when p is larger than n.
spikeslab
.
## Not run: #------------------------------------------------------------ # Example 1: leukemia data #------------------------------------------------------------ data(leukemia, package = "spikeslab") sparsePC.out <- sparsePC(x = leukemia[, -1], y = leukemia[, 1], n.rep = 3) rf.obj <- sparsePC.out$rf.obj varImpPlot(rf.obj) ## End(Not run)
## Not run: #------------------------------------------------------------ # Example 1: leukemia data #------------------------------------------------------------ data(leukemia, package = "spikeslab") sparsePC.out <- sparsePC(x = leukemia[, -1], y = leukemia[, 1], n.rep = 3) rf.obj <- sparsePC.out$rf.obj varImpPlot(rf.obj) ## End(Not run)
Fits a rescaled spike and slab model using a continuous bimodal prior. A generalized elastic net estimator is used for variable selection and estimation. Can be used for prediction and variable selection in low- and high-dimensional linear regression models.
spikeslab(formula, data = NULL, x = NULL, y = NULL, n.iter1 = 500, n.iter2 = 500, mse = TRUE, bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln), r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE, fast = TRUE, beta.blocks = 5, verbose = FALSE, ntree = 300, seed = NULL, ...)
spikeslab(formula, data = NULL, x = NULL, y = NULL, n.iter1 = 500, n.iter2 = 500, mse = TRUE, bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln), r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE, fast = TRUE, beta.blocks = 5, verbose = FALSE, ntree = 300, seed = NULL, ...)
formula |
A symbolic description of the model to be fit. |
data |
Data frame containing the data used in the formula. |
x |
x predictor matrix (can be used in place of formula and data frame call). |
y |
y response (can be used in place of formula and data frame call). |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
mse |
If TRUE, an external estimate for the overall variance is calculated using ridge regression or random forests (the latter is used when the degrees of freedom are low). Otherwise, the variance is included in the prior and estimated using Gibbs sampling. |
bigp.smalln |
Use if |
bigp.smalln.factor |
Removes all variables except the top
|
screen |
If TRUE, variables are pre-filtered. |
r.effects |
List used for grouping variables (see details below). |
max.var |
Maximum number of variables allowed in the final model. |
center |
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples. |
intercept |
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE. |
fast |
If TRUE, use blocked Gibbs sampling to accelerate the algorithm. |
beta.blocks |
Update beta using this number of blocks ( |
verbose |
If TRUE, verbose output is sent to the terminal. |
ntree |
Number of trees used by random forests (applies only when |
seed |
Seed for random number generator. Must be a negative integer. |
... |
Further arguments passed to or from other methods. |
—> General:
The spike and slab method is described in detail in Ishwaran and Rao
(2003, 2005a, 2005b and 2009). For high-dimensional problems in which
p
>> n
, where p
is the number of variables and
n
is the sample size, use the option bigp.smalln
=TRUE.
Doing so implements a three-stage procedure:
(1) Filtering step. This removes all variables except the top
n
times bigp.smalln.factor
ones. Uses spike and slab
regression with grouped regularization (complexity) parameters.
(2) Model averaging step. Refit the model using only those predictors from step 1. Returns the posterior mean values from fitting a spike and slab model; referred to as the Bayesian model averaged (bma) estimate.
(3) Variable selection step. Select variables using the generalized elastic net (gnet).
The filtering step is omitted when bigp.smalln
=FALSE.
Filtering can however be requested by setting screen
=TRUE
although users should be aware that this may degrade performance and
should only be used when p
is on the same order of n
.
Variables can be grouped using r.effects
. Grouping has the
effect of forcing variables within a given group to share a common
complexity (regularization) parameter. To do so, define a list with
each entry in the list made up of the variable names to be grouped.
There is no limit to the number of groups. Any variable that does
not appear in the list will be assigned to a default group (the
default group also has its own group-specific regularization
parameter). See Examples 1 and 3 below.
—> Miscellanea:
By default, fast
=TRUE when bigp.smalln
=TRUE. This
invokes an ultra-fast filtering step. Setting fast
=FALSE
invokes a more thorough filtering method that may slightly improve
inferential results, but computational times will become very slow.
The trade-off is unlikely to be justified.
The formula and data-frame call should be avoided in high-dimensional problems and instead the x-predictor matrix and y response vector should be passed directly (see Example 3). This avoids the huge overhead in parsing formula in R.
By default, predictors are normalized to have mean 0 and variance 1.
Pre-processing also involves centering y unless the user specifically
requests that the intercept be excluded from the model. Users can
also over-ride centering predictors by setting center
=FALSE.
Use with extreme care.
The verbose
option sends output to the terminal showing the
number of Gibbs iterations and the current complexity (regularization)
parameter(s).
Depends on the randomForest
package for estimating the variance
when mse
=TRUE. Note that mse
is over-ridden and set to
FALSE when bigp.smalln
=TRUE.
Depends on the lars
package for the variable slection step.
An object of class spikeslab
with the following components:
summary |
Summary object. |
verbose |
Verbose details (used for printing). |
terms |
Terms. |
sigma.hat |
Estimated variance. |
y |
Original y. |
xnew |
Centered, rescaled x-matrix. |
x |
Original x-matrix. |
y.center |
Centering for original y. |
x.center |
Centering for original x-matrix. |
x.scale |
Scaling for original x-matrix. |
names |
Variable names. |
bma |
bma coefficients in terms of xnew. |
bma.scale |
bma coefficients rescaled in terms of original x. |
gnet |
gnet coefficients in terms of xnew. |
gnet.scale |
gnet coefficients rescaled in terms of original x. |
gnet.path |
gnet path scaled in terms of the original x. |
gnet.obj |
gnet object (a lars-type object). |
gnet.obj.vars |
Variables (in order) used to calculate the gnet object. |
gnet.parms |
Generalized ridge regression parameters used to define the gnet. |
phat |
Estimated model dimension. |
complexity |
Complexity (regularization) parameter estimates. |
ridge |
List containing ridge values used to determine the bma. |
models |
List containing the models sampled. |
Hemant Ishwaran ([email protected])
J. Sunil Rao ([email protected])
Udaya B. Kogalur ([email protected])
Breiman L. (2001). Random forests, Machine Learning, 45:5-32.
Efron B., Hastie T., Johnstone I. and Tibshirani R. (2004). Least angle regression (with discussion). Ann. Statist., 32:407-499.
Ishwaran H. and Rao J.S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Stat. Assoc., 98:438-455.
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2005b). Spike and slab gene selection for multigroup microarray data. J. Amer. Stat. Assoc., 100:764-780.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
Ishwaran H., Kogalur U.B. and Rao J.S. (2010). spikeslab: prediction and variable selection using spike and slab regression. R Journal, 2(2), 68-73.
Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.
Zou H. and Hastie T. (2005). Regularization and variable selection via the elastic net. J. Royal Statist. Society B, 67(2):301-320.
cv.spikeslab
,
plot.spikeslab
,
predict.spikeslab
,
print.spikeslab
,
sparsePC.spikeslab
.
#------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ # basic call data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose=TRUE) print(obj) plot(obj) # grouping effect # separate main effects and interactions into two groups # use a group-specific regularization parameter for each group xnames <- names(diabetesI[, -1]) r.eff <- vector("list", 2) r.eff[[1]] <- xnames[c(1:10)] r.eff[[2]] <- xnames[-c(1:10)] obj2 <- spikeslab(Y ~ . , diabetesI, verbose=TRUE, r.effects=r.eff) obj2 # extract the regularization parameters print(apply(obj2$complexity, 2, summary)) ## Not run: #------------------------------------------------------------ # Example 2: high-dimensional noise (diabetes data) #------------------------------------------------------------ # add 2000 noise variables data(diabetesI, package = "spikeslab") diabetes.noise <- cbind(diabetesI, noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI))) # example of a big p, small n call # don't use formula call; make call with x and y arguments x <- diabetes.noise[, -1] y <- diabetes.noise[, 1] obj <- spikeslab(x=x, y=y, verbose=TRUE, bigp.smalln=TRUE, max.var=100) obj # same example ... but now group variables r.eff <- vector("list", 2) r.eff[[1]] <- names(x)[c(1:100)] r.eff[[2]] <- names(x)[-c(1:100)] obj2 <- spikeslab(x=x, y=y, verbose=TRUE, bigp.smalln=TRUE, r.effects=r.eff, max.var=100) obj2 #------------------------------------------------------------ # Example 3: housing data with interactions #------------------------------------------------------------ # another example of a big p, small n call data(housingI, package = "spikeslab") obj <- spikeslab(medv ~ ., housingI, verbose = TRUE, bigp.smalln = TRUE, max.var = 200) print(obj) ## End(Not run)
#------------------------------------------------------------ # Example 1: diabetes data #------------------------------------------------------------ # basic call data(diabetesI, package = "spikeslab") obj <- spikeslab(Y ~ . , diabetesI, verbose=TRUE) print(obj) plot(obj) # grouping effect # separate main effects and interactions into two groups # use a group-specific regularization parameter for each group xnames <- names(diabetesI[, -1]) r.eff <- vector("list", 2) r.eff[[1]] <- xnames[c(1:10)] r.eff[[2]] <- xnames[-c(1:10)] obj2 <- spikeslab(Y ~ . , diabetesI, verbose=TRUE, r.effects=r.eff) obj2 # extract the regularization parameters print(apply(obj2$complexity, 2, summary)) ## Not run: #------------------------------------------------------------ # Example 2: high-dimensional noise (diabetes data) #------------------------------------------------------------ # add 2000 noise variables data(diabetesI, package = "spikeslab") diabetes.noise <- cbind(diabetesI, noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI))) # example of a big p, small n call # don't use formula call; make call with x and y arguments x <- diabetes.noise[, -1] y <- diabetes.noise[, 1] obj <- spikeslab(x=x, y=y, verbose=TRUE, bigp.smalln=TRUE, max.var=100) obj # same example ... but now group variables r.eff <- vector("list", 2) r.eff[[1]] <- names(x)[c(1:100)] r.eff[[2]] <- names(x)[-c(1:100)] obj2 <- spikeslab(x=x, y=y, verbose=TRUE, bigp.smalln=TRUE, r.effects=r.eff, max.var=100) obj2 #------------------------------------------------------------ # Example 3: housing data with interactions #------------------------------------------------------------ # another example of a big p, small n call data(housingI, package = "spikeslab") obj <- spikeslab(medv ~ ., housingI, verbose = TRUE, bigp.smalln = TRUE, max.var = 200) print(obj) ## End(Not run)
Show the NEWS file of the spikeslab package.
spikeslab.news(...)
spikeslab.news(...)
... |
Further arguments passed to or from other methods. |
None.
Hemant Ishwaran [email protected]