Title: | Regression Subset Selection |
---|---|
Description: | Regression subset selection, including exhaustive search. |
Authors: | Thomas Lumley based on Fortran code by Alan Miller |
Maintainer: | Thomas Lumley <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.2 |
Built: | 2024-11-08 06:35:21 UTC |
Source: | CRAN |
leaps() performs an exhaustive search for the best subsets of the
variables in x for predicting y in linear regression, using an efficient
branch-and-bound algorithm. It is a compatibility wrapper for
regsubsets, which does the same thing better.
Since the algorithm returns a best model of each size, the results do not depend on a penalty model for model size: it doesn't make any difference whether you want to use AIC, BIC, CIC, DIC, ...
leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)
x |
A matrix of predictors |
y |
A response vector |
wt |
Optional weight vector |
int |
Add an intercept to the model |
method |
Calculate Cp, adjusted R-squared or R-squared |
nbest |
Number of subsets of each size to report |
names |
Vector of names for columns of x |
df |
Total degrees of freedom to use instead of NROW(x) |
strictly.compatible |
Implement misfeatures of leaps() in S |
A list with components
which |
Logical matrix. Each row can be used to select the columns of x in the respective model |
size |
Number of variables, including intercept if any, in the model |
cp |
or adjr2 or r2: the value of the chosen model selection statistic for each model |
label |
vector of names for the columns of x |
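As a rough illustration of how these components fit together, the sketch below uses one row of `which` to pick columns of x and refits that model with lm(). The seed, the row index `row`, and the object names `out`, `sel` and `refit` are illustrative choices, not part of the package.

```r
library(leaps)

set.seed(1)                        # arbitrary seed, only for reproducibility
x <- matrix(rnorm(100), ncol = 4)  # 25 observations, 4 predictors
y <- rnorm(25)

out <- leaps(x, y, nbest = 1)      # one reported subset per size

row <- 2                           # illustrative choice: second reported model
sel <- out$which[row, ]            # logical vector over the columns of x
refit <- lm(y ~ x[, sel, drop = FALSE])

# size counts the intercept as well as the selected predictors,
# so it should agree with the number of coefficients in the refit
out$size[row]
length(coef(refit))
```

Refitting with lm() is only one way to get at the coefficients; regsubsets offers coef and vcov methods for the same purpose.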
With strictly.compatible=TRUE the function will stop with an error if x
is not of full rank or if it has more than 31 columns. It will ignore the
column names of x even if names==NULL and will replace them with "0" to
"9", "A" to "Z".
Alan Miller "Subset Selection in Regression" Chapman & Hall
regsubsets, regsubsets.formula, regsubsets.default
x<-matrix(rnorm(100),ncol=4)
y<-rnorm(25)
leaps(x,y)
These functions are used internally by regsubsets and leaps. They are
wrappers for Fortran routines that construct and manipulate a QR decomposition.
leaps.setup(x,y,wt=rep(1,length(y)),force.in=NULL,force.out=NULL,intercept=TRUE,nvmax=8, nbest=1,warn.dep=TRUE)
leaps.seqrep(leaps.obj)
leaps.exhaustive(leaps.obj,really.big=FALSE)
leaps.backward(leaps.obj,nested)
leaps.forward(leaps.obj,nested)
x |
A matrix of predictors |
y |
A response vector |
wt |
Optional weight vector |
intercept |
Add an intercept to the model |
force.in |
Vector indicating variables that must be in the model |
force.out |
Vector indicating variables that must not be in the model |
nbest |
Number of subsets of each size to report |
nvmax |
largest subset size to examine |
warn.dep |
Warn if x is not of full rank |
leaps.obj |
An object of class |
really.big |
required before R gets sent off on a long uninterruptible computation |
nested |
Use just the forward or backward selection models, not the models with the first k variables that are fitted during setup |
Plots a table of models showing which variables are in each model. The
models are ordered by the specified model selection statistic. This plot
is particularly useful when there are more than ten or so models and the
simple table produced by summary.regsubsets is too big to read.
## S3 method for class 'regsubsets'
plot(x, labels=obj$xnames, main=NULL, scale=c("bic", "Cp", "adjr2", "r2"), col=gray(seq(0, 0.9, length = 10)),...)
x |
A regsubsets object |
labels |
variable names |
main |
title for plot |
scale |
which summary statistic to use for ordering plots |
col |
Colors: the last color should be close to but distinct from white |
... |
other arguments |
None
Thomas Lumley, based on a concept by Merlise Clyde
data(swiss)
a<-regsubsets(Fertility~.,nbest=3,data=swiss)
par(mfrow=c(1,2))
plot(a)
plot(a,scale="r2")
Model selection by exhaustive search, forward or backward stepwise, or sequential replacement
regsubsets(x=, ...)
## S3 method for class 'formula'
regsubsets(x=, data=, weights=NULL, nbest=1, nvmax=8, force.in=NULL, force.out=NULL, intercept=TRUE, method=c("exhaustive", "backward", "forward", "seqrep"), really.big=FALSE, nested=(nbest==1),...)
## Default S3 method:
regsubsets(x=, y=, weights=rep(1, length(y)), nbest=1, nvmax=8, force.in=NULL, force.out=NULL, intercept=TRUE, method=c("exhaustive","backward", "forward", "seqrep"), really.big=FALSE,nested=(nbest==1),...)
## S3 method for class 'biglm'
regsubsets(x,nbest=1,nvmax=8,force.in=NULL, method=c("exhaustive","backward", "forward", "seqrep"), really.big=FALSE,nested=(nbest==1),...)
## S3 method for class 'regsubsets'
summary(object,all.best=TRUE,matrix=TRUE,matrix.logical=FALSE,df=NULL,...)
## S3 method for class 'regsubsets'
coef(object,id,vcov=FALSE,...)
## S3 method for class 'regsubsets'
vcov(object,id,...)
x |
Design matrix or model formula for full model, or biglm object |
data |
Optional data frame |
y |
response vector |
weights |
weight vector |
nbest |
number of subsets of each size to record |
nvmax |
maximum size of subsets to examine |
force.in |
index to columns of design matrix that should be in all models |
force.out |
index to columns of design matrix that should be in no models |
intercept |
Add an intercept? |
method |
Use exhaustive search, forward selection, backward selection or sequential replacement to search. |
really.big |
Must be TRUE to perform exhaustive search on more than 50 variables. |
nested |
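See the Note below: if nested=FALSE, the models with the first variable, the first two, the first three, and so on, will also be considered |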
object |
regsubsets object |
all.best |
Show all the best subsets or just one of each size |
matrix |
Show a matrix of the variables in each model or just summary statistics |
matrix.logical |
With matrix=TRUE, show the matrix as logical TRUE/FALSE or as the strings "*"/" " |
df |
Specify a number of degrees of freedom for the summary
statistics. The default is |
id |
Which model or models (ordered as in the summary output) to return coefficients and variance matrix for |
vcov |
If TRUE, return the variance-covariance matrix as an attribute |
... |
Other arguments for future methods |
Since this function returns separate best models of all sizes up to
nvmax and since different model selection criteria such as AIC,
BIC, CIC, DIC, ... differ only in how models of different sizes are compared, the
results do not depend on the choice of cost-complexity tradeoff.
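A small check of this point, sketched on the swiss data used in the examples: among models of the same size, the residual sum of squares, Cp and BIC should all rank the candidates in the same order, so the criteria can only disagree when models of different sizes are compared. The names `s`, `size` and `same_rank` are illustrative.

```r
library(leaps)
data(swiss)

# Keep up to three candidate models of each size
s <- summary(regsubsets(Fertility ~ ., data = swiss, nbest = 3))
size <- rowSums(s$which)           # terms per model, intercept included

# Within one size, compare the ordering given by RSS, Cp and BIC
same_rank <- tapply(seq_along(size), size, function(i) {
  identical(order(s$rss[i]), order(s$cp[i])) &&
    identical(order(s$rss[i]), order(s$bic[i]))
})
all(same_rank)

# Across sizes the criteria may pick different overall winners
c(cp = which.min(s$cp), bic = which.min(s$bic))
```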
When x is a biglm object it is assumed to be the full
model, so force.out is not relevant. If there is an intercept it
is forced in by default; specify force.in as a logical vector
with FALSE as the first element to allow the intercept to be
dropped.
The model search does not actually fit each model, so the returned
object does not contain coefficients or standard errors. Coefficients
and the variance-covariance matrix for one or more models can be
obtained with the coef and vcov methods.
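A minimal sketch of that workflow, again on the swiss data: extract the coefficients and variance-covariance matrix for the third model in the summary, derive standard errors, and cross-check the coefficients against an ordinary lm() fit of the same variables. The object names (`fit`, `b`, `V`, `se`, `refit`) are illustrative, and the lm() comparison is only a sanity check, not something the package does.

```r
library(leaps)
data(swiss)

fit <- regsubsets(Fertility ~ ., data = swiss)

b  <- coef(fit, 3)                 # coefficients of the third model in the summary
V  <- vcov(fit, 3)                 # matching variance-covariance matrix
se <- sqrt(diag(V))                # standard errors

# Cross-check against lm() using the same variables
vars  <- setdiff(names(b), "(Intercept)")
refit <- lm(reformulate(vars, response = "Fertility"), data = swiss)
all.equal(unname(b), unname(coef(refit)[names(b)]))
```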
regsubsets returns an object of class "regsubsets" containing no
user-serviceable parts. It is designed to be processed by
summary.regsubsets.
summary.regsubsets returns an object with elements
which |
A logical matrix indicating which elements are in each model |
rsq |
The r-squared for each model |
rss |
Residual sum of squares for each model |
adjr2 |
Adjusted r-squared |
cp |
Mallows' Cp |
bic |
Schwarz's information criterion, BIC |
outmat |
A version of the which component that is formatted for printing |
obj |
A copy of the regsubsets object |
The coef method returns a coefficient vector or list of vectors;
the vcov method returns a matrix or list of matrices.
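One way these summary elements might be used, sketched on the swiss data: collect the statistics into a table with one row per model, then pick the model with the smallest BIC and list its terms. The names `s`, `tab` and `best` are illustrative, and choosing by BIC is just one possible criterion.

```r
library(leaps)
data(swiss)

s <- summary(regsubsets(Fertility ~ ., data = swiss))

# One row per model, in the same order as the rows of s$which
tab <- data.frame(rsq = s$rsq, rss = s$rss, adjr2 = s$adjr2,
                  cp = s$cp, bic = s$bic)
tab

best <- which.min(s$bic)                  # smaller BIC is preferred
colnames(s$which)[s$which[best, ]]        # terms included in that model
```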
As part of the setup process, the code initially fits models with the
first variable in x, the first two, the first three, and so on.
For forward and backward selection it is possible that the model with the
first k variables will be better than the model with k
variables from the selection algorithm. If it is, the model with the
first k variables will be returned, with a warning. This cannot happen
for exhaustive search.
With nbest=1 you can avoid these extra models with
nested=TRUE, which is the default.
data(swiss)
a<-regsubsets(as.matrix(swiss[,-1]),swiss[,1])
summary(a)
b<-regsubsets(Fertility~.,data=swiss,nbest=2)
summary(b)
coef(a, 1:3)
vcov(a, 3)