Title: | Selection of Linear Estimators |
---|---|
Description: | Estimate the mean of a Gaussian vector by choosing among a large collection of estimators, following the method developed by Y. Baraud, C. Giraud and S. Huet (2014) <doi:10.1214/13-AIHP539>. In particular, it solves the problem of variable selection by choosing the best predictor among predictors emanating from different methods such as lasso, elastic-net, adaptive lasso, pls, randomForest. Moreover, it can be applied for choosing the tuning parameter in a Gauss-lasso procedure. |
Authors: | Yannick Baraud, Christophe Giraud, Sylvie Huet |
Maintainer: | Benjamin Auder <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.5 |
Built: | 2024-11-02 06:16:09 UTC |
Source: | CRAN |
Calculate the penalty function for estimators selection.
penalty(Delta, n, p, K)
Delta |
vector with |
n |
integer : number of observations. |
p |
integer : number of variables. |
K |
scalar : constant in the penalty function. |
A vector with the same length as Delta: for each d = 0, ..., Dmax, let N = n - d, D = d + 1 and pen(d) = x K N/(N-1), where x satisfies

φ(x) = exp(-Delta(d)), when Delta(d) < 50, where
φ(x) = pf(q=x/(D+2), df1=D+2, df2=N-1, lower.tail=F) - (x/D) pf(q=(N+1)x/(D(N-1)), df1=D, df2=N+1, lower.tail=F)

ψ(x) = Delta(d), when Delta(d) ≥ 50, where
ψ(x) = lbeta(1+D/2, (N-1)/2) - log(2(2x+(N-1)D)/((N-1)(N+2)x)) - ((N-1)/2) log((N-1)/(N-1+x)) - (D/2) log(x/(N-1+x))

The values of the penalty function greater than 1e+08 are set to 1e+08.

If for some Delta(d) the equation φ(x) = exp(-Delta(d)/(d+1)) has no solution, then the execution is stopped.
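The case Delta(d) < 50 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own code. The function on the left-hand side of the equation is called phi here, and its second quantile is read as (N+1)x/(D(N-1)). The helper name pen_small_delta, the search interval given to uniroot, and the sample values of Delta, n and K are all hypothetical choices made for this example.

```r
# Illustrative sketch only; `phi` and `pen_small_delta` are hypothetical names.
phi <- function(x, D, N) {
  pf(x / (D + 2), df1 = D + 2, df2 = N - 1, lower.tail = FALSE) -
    (x / D) * pf((N + 1) * x / (D * (N - 1)), df1 = D, df2 = N + 1,
                 lower.tail = FALSE)
}

pen_small_delta <- function(Delta_d, d, n, K) {
  stopifnot(Delta_d > 0, Delta_d < 50)
  N <- n - d
  D <- d + 1
  target <- exp(-Delta_d)
  # phi(0) = 1 > target; the upper bound 1e6 is an arbitrary assumption
  root <- uniroot(function(x) phi(x, D, N) - target,
                  lower = 0, upper = 1e6, tol = 1e-10)$root
  # pen(d) = x K N/(N-1), capped at 1e+08 as stated in the text
  min(root * K * N / (N - 1), 1e8)
}

Delta <- c(1, 3, 5)  # hypothetical values of Delta(d) for d = 0, 1, 2
pen <- mapply(pen_small_delta, Delta_d = Delta, d = 0:2,
              MoreArgs = list(n = 100, K = 1.1))
```

If uniroot finds no sign change on the chosen interval it raises an error, which loosely mirrors the documented behaviour of stopping when the equation has no solution.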
Yannick Baraud, Christophe Giraud, Sylvie Huet
Function to simulate data
simulData(p = 100, n = 100, beta = NULL, C = NULL, r = 0.95, rSN = 10)
p |
integer : number of variates. Should be >15 if |
n |
integer : number of observations |
beta |
vector with |
C |
matrix |
r |
scalar for calculating the covariance of X when |
rSN |
scalar : ratio signal/noise |
When beta is NULL, p should be greater than 15 and beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15)).
When C is NULL, C is block diagonal with C[a,b] = r**abs(a-b) for a and b in the same block.
The rows of X are n i.i.d. Gaussian vectors with mean 0 and covariance matrix C.
The variance sigma**2 equals the squared Euclidean norm of X beta divided by rSN*n.
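The data-generating steps described above can be sketched in base R. This is not the package's simulData(), which loads mvtnorm; the sketch uses a Cholesky factor instead. Two things are assumptions made here: C is taken as a single block with C[a,b] = r**abs(a-b), and the signal whose squared norm defines sigma is taken to be X %*% beta. The values of n, p, r and rSN are arbitrary.

```r
# Illustrative sketch only; not the package's simulData().
set.seed(1)
n <- 50; p <- 20; r <- 0.8; rSN <- 5

# Default coefficient vector described in the text (requires p > 15)
beta <- c(rep(2.5, 5), rep(1.5, 5), rep(0.5, 5), rep(0, p - 15))

# Assumption: one single block with C[a, b] = r^|a - b|
C <- r^abs(outer(seq_len(p), seq_len(p), "-"))

# Rows of X are i.i.d. N(0, C): with R = chol(C), t(R) %*% R = C,
# so each row of Z %*% R has covariance C
X <- matrix(rnorm(n * p), n, p) %*% chol(C)

# Assumption: the signal is X %*% beta
signal <- drop(X %*% beta)
sigma <- sqrt(sum(signal^2) / (rSN * n))
Y <- signal + sigma * rnorm(n)
```

With this choice of sigma, the ratio sum(signal^2) / (n * sigma^2) equals rSN by construction.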
A list with components :
Y |
vector |
X |
matrix |
C |
matrix |
sigma |
scalar. See details. |
beta |
vector with |
Library mvtnorm is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
Tune the lasso parameter in the regression model, using the lasso or the Gauss-lasso method.
tuneLasso(Y, X, normalize = TRUE, method = c("lasso", "Glasso"), dmax = NULL, Vfold = TRUE, V = 10, LINselect = TRUE, a = 0.5, K = 1.1, verbose = TRUE, max.steps = NULL)
Y |
vector with n components : response variable. |
X |
matrix with n rows and p columns : covariates. |
normalize |
logical : corresponds to the input |
method |
vector of characters whose components are a subset of ("lasso", "Glasso") |
dmax |
integer : maximum number of variables in the lasso estimator. |
Vfold |
logical : if TRUE the tuning is done by Vfold-CV |
V |
integer. Gives the value of V in the Vfold-CV procedure |
LINselect |
logical : if TRUE the tuning is done by LINselect |
a |
scalar : value of the parameter |
K |
scalar : value of the parameter |
verbose |
logical : if TRUE a trace of the current process is displayed in real time. |
max.steps |
integer : maximum number of steps in the lasso procedure. |
A list with one or two components according to method.

lasso, if method contains "lasso": a list with one or two components according to Vfold and LINselect.

Ls, if LINselect=TRUE: a list with components
  support : vector of integers. Estimated support of the parameter vector.
  coef : vector whose first component is the estimated intercept. The other components are the estimated non zero coefficients.
  fitted : vector with length n. Fitted value of the response.
  crit : vector containing the values of the criteria for each value of lambda.
  lambda : vector containing the values of the tuning parameter of the lasso algorithm.

CV, if Vfold=TRUE: a list with components
  support : vector of integers. Estimated support of the parameter vector.
  coef : vector whose first component is the estimated intercept. The other components are the estimated non zero coefficients.
  fitted : vector with length n. Fitted value of the response.
  crit : vector containing the values of the criteria for each value of lambda.
  crit.err : vector containing the estimated standard-error of the criteria.
  lambda : vector containing the values of the tuning parameter of the lasso algorithm.

Glasso, if method contains "Glasso": the same as lasso.
Library elasticnet is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
See Baraud et al., 2010, http://hal.archives-ouvertes.fr/hal-00502156/fr/
Giraud et al., 2013, https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1356098553
#source("charge.R")
library("LINselect")

# simulate data
## Not run: ex <- simulData(p=100,n=100,r=0.8,rSN=5)

## Not run: ex1.tuneLasso <- tuneLasso(ex$Y,ex$X)

## Not run: data(diabetes)
## Not run: attach(diabetes)
## Not run: ex.diab <- tuneLasso(y,x2)
## Not run: detach(diabetes)
Estimation in the regression model.
Variable selection by choosing the best predictor among predictors emanating from different methods such as lasso, elastic-net, adaptive lasso, pls, randomForest.
VARselect(Y, X, dmax = NULL, normalize = TRUE, method = c("lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive"), pen.crit = NULL, lasso.dmax = NULL, ridge.dmax = NULL, pls.dmax = NULL, en.dmax = NULL, ALridge.dmax = NULL, ALpls.dmax = NULL, rF.dmax = NULL, exhaustive.maxdim = 5e+05, exhaustive.dmax = NULL, en.lambda = c(0.01, 0.1, 0.5, 1, 2, 5), ridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5), rF.lmtry = 2, pls.ncomp = 5, ALridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5), ALpls.ncomp = 5, max.steps = NULL, K = 1.1, verbose = TRUE, long.output = FALSE)
Y |
vector with n components : response variable. |
X |
matrix with n rows and p columns : covariates. |
dmax |
integer : maximum number of variables in the lasso estimator. |
normalize |
logical : if TRUE the columns of X are scaled |
method |
vector of characters whose components are subset of |
pen.crit |
vector with |
lasso.dmax |
integer lower than |
ridge.dmax |
integer lower than |
pls.dmax |
integer lower than |
en.dmax |
integer lower than |
ALridge.dmax |
integer lower than |
ALpls.dmax |
integer lower than |
rF.dmax |
integer lower than |
exhaustive.maxdim |
integer : maximum number of subsets of covariates considered in the exhaustive method. See details. |
exhaustive.dmax |
integer lower than |
en.lambda |
vector : tuning parameter of the ridge. It is the input parameter |
ridge.lambda |
vector : tuning parameter of the ridge. It is the input parameter lambda of function |
rF.lmtry |
vector : tuning parameter |
pls.ncomp |
integer : tuning parameter of the pls. It is the input parameter |
ALridge.lambda |
similar to |
ALpls.ncomp |
similar to |
max.steps |
integer. Maximum number of steps in the lasso procedure. Corresponds to the input |
K |
scalar : value of the parameter |
verbose |
logical : if TRUE a trace of the current process is displayed in real time. |
long.output |
logical : if FALSE only the component summary will be returned. See Value. |
When method is pls or ALpls, the LINselect procedure is carried out considering the number of components in the pls method as the tuning parameter. This tuning parameter varies from 1 to pls.ncomp.

When method is exhaustive, the maximum number of variates d is calculated as follows. Let q be the largest integer such that choose(p,q) < exhaustive.maxdim. Then d = min(q, exhaustive.dmax, dmax).
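The rule for d in the exhaustive method can be sketched in base R. The helper name exhaustive_d is hypothetical, and the sketch assumes that exhaustive.dmax and dmax are given as numbers (how NULL defaults are resolved is not covered here). Since choose(p,q) decreases again for q larger than p/2, "largest q" is interpreted as the last subset size before the limit is first reached. That reading is an assumption.

```r
# Illustrative sketch; `exhaustive_d` is a hypothetical helper name.
exhaustive_d <- function(p, exhaustive.maxdim, exhaustive.dmax, dmax) {
  below <- choose(p, seq_len(p)) < exhaustive.maxdim
  # first subset size whose count reaches the limit, minus one
  q <- match(FALSE, below, nomatch = p + 1) - 1
  min(q, exhaustive.dmax, dmax)
}

# choose(100, 3) = 161700 < 5e5 <= choose(100, 4) = 3921225, so q = 3
d <- exhaustive_d(p = 100, exhaustive.maxdim = 5e5,
                  exhaustive.dmax = 2, dmax = 10)
```

Here d is limited by exhaustive.dmax = 2; with exhaustive.dmax = 5 the same call would be limited by q instead.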
A list with at least length(method) components.
For each procedure in method, a list with components
  support : vector of integers. Estimated support of the parameters for the considered procedure.
  crit : scalar equal to the LINselect criterion calculated in the estimated support.
  fitted : vector with length n. Fitted value of the response calculated when the support of the parameter vector equals support.
  coef : vector whose first component is the estimated intercept. The other components are the estimated non zero coefficients when the support of the parameter vector equals support.
If length(method) > 1, the additional component summary is a list with three components:
  support : vector of integers. Estimated support of the parameters corresponding to the minimum of the criteria among all procedures.
  crit : scalar. Minimum value of the criteria among all procedures.
  method : vector of characters. Names of the procedures for which the minimum is reached.
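How the summary component relates to the per-procedure results can be sketched with made-up numbers. The criterion values and supports below are hypothetical, and so is the name summary_sketch. When several procedures reach the minimum, the sketch simply takes the support of the first one, which is an assumption and not documented behaviour.

```r
# Hypothetical per-procedure results (values invented for illustration)
crit <- c(lasso = 12.4, ridge = 11.9, pls = 11.9, en = 13.0)
support <- list(lasso = c(1, 2, 5), ridge = c(1, 2),
                pls = c(1, 2), en = c(1, 2, 5, 7))

# Names of all procedures reaching the minimum criterion
best <- names(crit)[crit == min(crit)]

summary_sketch <- list(
  support = support[[best[1]]],  # assumption: first minimizer's support
  crit    = min(crit),
  method  = best
)
```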
If pen.crit = NULL, the component pen.crit gives the values of the penalty calculated by the function penalty.
If long.output is TRUE, the component named chatty is a list with length(method) components.
For each procedure in method, a list with components
  support : support[[l]] is a vector of integers containing an estimator of the support of the parameters.
  crit : vector where crit[l] contains the value of the LINselect criteria calculated in support[[l]].
When method is lasso, library elasticnet is loaded.
When method is en, library elasticnet is loaded.
When method is ridge, library MASS is loaded.
When method is rF, library randomForest is loaded.
When method is pls, library pls is loaded.
When method is ALridge, libraries MASS and elasticnet are loaded.
When method is ALpls, libraries pls and elasticnet are loaded.
When method is exhaustive, library gtools is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
See Baraud et al., 2010, http://hal.archives-ouvertes.fr/hal-00502156/fr/
Giraud et al., 2013, https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1356098553
#source("charge.R")
library("LINselect")

# simulate data with
# beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15))
ex <- simulData(p=100,n=100,r=0.8,rSN=5)

## Not run: ex1.VARselect <- VARselect(ex$Y,ex$X,exhaustive.dmax=2)

## Not run: data(diabetes)
## Not run: attach(diabetes)
## Not run: ex.diab <- VARselect(y,x2,exhaustive.dmax=5)
## Not run: detach(diabetes)