Title:  Regression Subset Selection 

Description:  Regression subset selection, including exhaustive search. 
Authors:  Thomas Lumley based on Fortran code by Alan Miller 
Maintainer:  Thomas Lumley <[email protected]> 
License:  GPL (>= 2) 
Version:  3.2 
Built:  2024-06-11 06:38:48 UTC 
Source:  CRAN 
leaps() performs an exhaustive search for the best subsets of the
variables in x for predicting y in linear regression, using an efficient
branch-and-bound algorithm. It is a compatibility wrapper for
regsubsets, which does the same thing better.
Since the algorithm returns a best model of each size, the results do not depend on a penalty model for model size: it doesn't make any difference whether you want to use AIC, BIC, CIC, DIC, ...
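The point about penalties can be checked by brute force on a small problem. The sketch below uses only base R (lm, AIC, BIC) rather than the package's branch-and-bound code, and the simulated data and the names V1 to V4 are assumptions made for illustration. Within a fixed model size the penalty term is a constant, so both criteria select the same subset.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100), ncol = 4,
            dimnames = list(NULL, paste0("V", 1:4)))  # hypothetical names
y <- rnorm(25)
dat <- data.frame(y = y, x)

# Enumerate every non-empty subset of the four predictors.
subsets <- unlist(lapply(1:4, function(k) combn(colnames(x), k, simplify = FALSE)),
                  recursive = FALSE)

# Fit each subset by ordinary least squares and record both criteria.
fits <- lapply(subsets, function(v) lm(reformulate(v, response = "y"), data = dat))
tab <- data.frame(size = lengths(subsets),
                  aic  = sapply(fits, AIC),
                  bic  = sapply(fits, BIC))

# Index of the best subset of each size under each criterion.
rows <- seq_len(nrow(tab))
best_aic <- tapply(rows, tab$size, function(i) i[which.min(tab$aic[i])])
best_bic <- tapply(rows, tab$size, function(i) i[which.min(tab$bic[i])])

all(best_aic == best_bic)
```

The criteria only start to disagree when models of different sizes are compared with each other.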
leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE,
      method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL,
      df=NROW(x), strictly.compatible=TRUE)
x 
A matrix of predictors 
y 
A response vector 
wt 
Optional weight vector 
int 
Add an intercept to the model 
method 
Calculate Cp, adjusted R-squared or R-squared 
nbest 
Number of subsets of each size to report 
names 
vector of names for columns of x 
df 
Total degrees of freedom to use instead of NROW(x) 
strictly.compatible 
Implement misfeatures of leaps() in S 
A list with components
which 
logical matrix. Each row can be used to select the columns of x 
size 
Number of variables, including intercept if any, in the model 
Cp, adjr2 or r2 
The value of the chosen model selection statistic for each model 
label 
vector of names for the columns of x 
With strictly.compatible=T the function will stop with an error if x is not of full rank or if it has more than 31 columns. It will ignore the column names of x even if names==NULL and will replace them with "0" to "9", "A" to "Z".
Alan Miller "Subset Selection in Regression" Chapman & Hall
regsubsets, regsubsets.formula, regsubsets.default
x <- matrix(rnorm(100), ncol=4)
y <- rnorm(25)
leaps(x, y)
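As a sketch of how the returned components fit together, the following picks the subset with the largest adjusted R-squared and refits it with lm. It assumes the leaps package is installed and that the statistic is returned under the same name as the method argument; the seed and the variable names best, keep and x_best are illustrative only.

```r
library(leaps)

set.seed(42)                        # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100), ncol = 4)
y <- rnorm(25)

res <- leaps(x, y, method = "adjr2", nbest = 2)

best <- which.max(res$adjr2)        # row with the largest adjusted R-squared
keep <- res$which[best, ]           # logical selector for the columns of x
x_best <- x[, keep, drop = FALSE]

fit <- lm(y ~ x_best)               # refit the chosen subset
coef(fit)
```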
These functions are used internally by regsubsets and leaps. They are wrappers for Fortran routines that construct and manipulate a QR decomposition.
leaps.setup(x, y, wt=rep(1, length(y)), force.in=NULL, force.out=NULL,
    intercept=TRUE, nvmax=8, nbest=1, warn.dep=TRUE)
leaps.seqrep(leaps.obj)
leaps.exhaustive(leaps.obj, really.big=FALSE)
leaps.backward(leaps.obj, nested)
leaps.forward(leaps.obj, nested)
x 
A matrix of predictors 
y 
A response vector 
wt 
Optional weight vector 
intercept 
Add an intercept to the model 
force.in 
vector indicating variables that must be in the model 
force.out 
vector indicating variables that must not be in the model 
nbest 
Number of subsets of each size to report 
nvmax 
largest subset size to examine 
warn.dep 
warn if 
leaps.obj 
An object of class 
really.big 
required before R gets sent off on a long uninterruptible computation 
nested 
Use just the forward or backward selection models, not the models with variables 
Plots a table of models showing which variables are in each model. The
models are ordered by the specified model selection statistic. This plot
is particularly useful when there are more than ten or so models and the
simple table produced by summary.regsubsets is too big to read.
## S3 method for class 'regsubsets'
plot(x, labels=obj$xnames, main=NULL, scale=c("bic", "Cp", "adjr2", "r2"),
     col=gray(seq(0, 0.9, length = 10)), ...)
x 
regsubsets object 
labels 
variable names 
main 
title for plot 
scale 
which summary statistic to use for ordering plots 
col 
Colors: the last color should be close to but distinct from white 
... 
other arguments 
None
Thomas Lumley, based on a concept by Merlise Clyde
data(swiss)
a <- regsubsets(Fertility~., nbest=3, data=swiss)
par(mfrow=c(1,2))
plot(a)
plot(a, scale="r2")
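The information the plot displays can also be inspected as a plain table. This sketch, which assumes the leaps package is installed, sorts the models from summary.regsubsets by BIC and shows the membership of each variable as 0/1. The names s, ord and tab are illustrative, and the sort direction used here is a choice made for the example rather than a statement about how the plot arranges its rows.

```r
library(leaps)

data(swiss)
a <- regsubsets(Fertility ~ ., nbest = 3, data = swiss)
s <- summary(a)

ord <- order(s$bic)                          # smallest BIC first
tab <- cbind(s$which[ord, ] * 1,             # 1 = variable is in the model
             bic = round(s$bic[ord], 1))
tab
```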
Model selection by exhaustive search, forward or backward stepwise, or sequential replacement
regsubsets(x=, ...)

## S3 method for class 'formula'
regsubsets(x=, data=, weights=NULL, nbest=1, nvmax=8,
    force.in=NULL, force.out=NULL, intercept=TRUE,
    method=c("exhaustive", "backward", "forward", "seqrep"),
    really.big=FALSE, nested=(nbest==1), ...)

## Default S3 method:
regsubsets(x=, y=, weights=rep(1, length(y)), nbest=1, nvmax=8,
    force.in=NULL, force.out=NULL, intercept=TRUE,
    method=c("exhaustive", "backward", "forward", "seqrep"),
    really.big=FALSE, nested=(nbest==1), ...)

## S3 method for class 'biglm'
regsubsets(x, nbest=1, nvmax=8, force.in=NULL,
    method=c("exhaustive", "backward", "forward", "seqrep"),
    really.big=FALSE, nested=(nbest==1), ...)

## S3 method for class 'regsubsets'
summary(object, all.best=TRUE, matrix=TRUE, matrix.logical=FALSE, df=NULL, ...)

## S3 method for class 'regsubsets'
coef(object, id, vcov=FALSE, ...)

## S3 method for class 'regsubsets'
vcov(object, id, ...)
x 
design matrix or model formula for full model, or biglm object 
data 
Optional data frame 
y 
response vector 
weights 
weight vector 
nbest 
number of subsets of each size to record 
nvmax 
maximum size of subsets to examine 
force.in 
index to columns of design matrix that should be in all models 
force.out 
index to columns of design matrix that should be in no models 
intercept 
Add an intercept? 
method 
Use exhaustive search, forward selection, backward selection or sequential replacement to search. 
really.big 
Must be TRUE to perform exhaustive search on more than 50 variables. 
nested 
See the Note below: if 
object 
regsubsets object 
all.best 
Show all the best subsets or just one of each size 
matrix 
Show a matrix of the variables in each model or just summary statistics 
matrix.logical 
With 
df 
Specify a number of degrees of freedom for the summary
statistics. The default is 
id 
Which model or models (ordered as in the summary output) to return coefficients and variance matrix for 
vcov 
If 
... 
Other arguments for future methods 
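A short sketch of the method argument, assuming the leaps package is installed: the same formula is searched exhaustively and by forward selection, and the residual sums of squares of the reported models are compared size by size. Because the exhaustive search considers every subset of a given size, its RSS can never exceed that of the forward-selection model of the same size. The names ex and fw are illustrative.

```r
library(leaps)

data(swiss)
ex <- summary(regsubsets(Fertility ~ ., data = swiss, method = "exhaustive"))
fw <- summary(regsubsets(Fertility ~ ., data = swiss, method = "forward"))

# One column per model size; compare the two searches row by row.
rbind(exhaustive = ex$rss, forward = fw$rss)
```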
Since this function returns separate best models of all sizes up to
nvmax and since different model selection criteria such as AIC,
BIC, CIC, DIC, ... differ only in how models of different sizes are compared, the
results do not depend on the choice of cost-complexity tradeoff.
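In practice this means the criterion is applied afterwards, to choose among the best models of each size. The sketch below assumes the leaps package is installed and uses the swiss data; it reports which model size each of three summary statistics would pick. The three choices may or may not coincide.

```r
library(leaps)

data(swiss)
s <- summary(regsubsets(Fertility ~ ., data = swiss))

# Position (here equal to model size) of the preferred model under each statistic.
c(bic   = which.min(s$bic),
  cp    = which.min(s$cp),
  adjr2 = which.max(s$adjr2))
```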
When x is a biglm object it is assumed to be the full
model, so force.out is not relevant. If there is an intercept it
is forced in by default; specify a force.in as a logical vector
with FALSE as the first element to allow the intercept to be
dropped.
The model search does not actually fit each model, so the returned
object does not contain coefficients or standard errors. Coefficients
and the variance-covariance matrix for one or more models can be
obtained with the coef and vcov methods.
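As a sketch of how coef and vcov relate to an ordinary fit, the following extracts the third model from a search on the swiss data and refits the same variables with lm. It assumes the leaps package is installed and that the column names of the which matrix match the names of the data columns; id, vars and fit are illustrative names.

```r
library(leaps)

data(swiss)
a <- regsubsets(Fertility ~ ., data = swiss)
s <- summary(a)

id <- 3                                            # third model in the summary
vars <- setdiff(names(which(s$which[id, ])), "(Intercept)")
fit <- lm(reformulate(vars, response = "Fertility"), data = swiss)

# Coefficients from the search object next to those from the refit,
# matched by name in case the two orderings differ.
cf <- coef(a, id)
cbind(regsubsets = cf, lm = coef(fit)[names(cf)])

# Standard errors from the variance-covariance matrix.
sqrt(diag(vcov(a, id)))
```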
regsubsets returns an object of class "regsubsets" containing no
user-serviceable parts. It is designed to be processed by
summary.regsubsets.
summary.regsubsets returns an object with elements
which 
A logical matrix indicating which elements are in each model 
rsq 
The r-squared for each model 
rss 
Residual sum of squares for each model 
adjr2 
Adjusted r-squared 
cp 
Mallows' Cp 
bic 
Schwarz's information criterion, BIC 
outmat 
A version of the 
obj 
A copy of the 
The coef method returns a coefficient vector or list of vectors,
the vcov method returns a matrix or list of matrices.
As part of the setup process, the code initially fits models with the
first variable in x, the first two, the first three, and so on.
For forward and backward selection it is possible that the model with the
first k variables will be better than the model with k
variables from the selection algorithm. If it is, the model with the
first k variables will be returned, with a warning. It (obviously)
can't happen for exhaustive search.
With nbest=1 you can avoid these extra models with nested=TRUE, which is the default.
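The models with the first k variables are cheap to obtain once a QR decomposition of the design matrix exists. The following base R sketch illustrates the idea only; it is not the package's Fortran code, and the simulated data are an assumption. After rotating y by the transpose of Q, the residual sum of squares for the model using the first k columns is the sum of squares of the remaining elements, so no model has to be refitted.

```r
set.seed(3)                                    # arbitrary seed
X <- cbind(1, matrix(rnorm(60), ncol = 3))     # intercept plus three predictors
y <- rnorm(20)

qty <- qr.qty(qr(X), y)                        # Q'y

# RSS of the model using the first k columns, read off from Q'y.
rss_qr <- sapply(1:ncol(X), function(k) sum(qty[-(1:k)]^2))

# The same quantities from explicit least-squares fits, for comparison.
rss_lm <- sapply(1:ncol(X), function(k)
  sum(lm.fit(X[, 1:k, drop = FALSE], y)$residuals^2))

all.equal(rss_qr, rss_lm)
```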
data(swiss)
a <- regsubsets(as.matrix(swiss[,-1]), swiss[,1])
summary(a)
b <- regsubsets(Fertility~., data=swiss, nbest=2)
summary(b)
coef(a, 1:3)
vcov(a, 3)