| Title: | Selection of Linear Estimators |
|---|---|
| Description: | Estimate the mean of a Gaussian vector by choosing among a large collection of estimators, following the method developed by Y. Baraud, C. Giraud and S. Huet (2014) <doi:10.1214/13-AIHP539>. In particular, it solves the problem of variable selection by choosing the best predictor among predictors emanating from different methods such as lasso, elastic-net, adaptive lasso, pls and randomForest. Moreover, it can be used to choose the tuning parameter in a Gauss-lasso procedure. |
| Authors: | Yannick Baraud [aut], Christophe Giraud [aut], Sylvie Huet [aut], Benjamin Auder [cre] |
| Maintainer: | Benjamin Auder |
| License: | GPL (>= 3) |
| Version: | 1.1.6 |
| Built: | 2026-05-09 06:42:21 UTC |
| Source: | https://github.com/cran/LINselect |
Calculate the penalty function for estimator selection.
```r
penalty(Delta, n, p, K)
```
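
| Argument | Description |
|---|---|
| `Delta` | vector with Dmax+1 components, one for each d = 0, ..., Dmax. |
| `n` | integer : number of observations. |
| `p` | integer : number of variables. |
| `K` | scalar : constant in the penalty function. |
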
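A vector with the same length as Delta. For each $d = 0, \dots, \mathrm{Dmax}$, write $\Delta(d)$ for the corresponding component of Delta, let $N = n - d$ and $D = d + 1$, and set $\mathrm{pen}(d) = x K N/(N-1)$, where $x$ satisfies:

- when $\Delta(d) < 50$, $\phi(x) = \exp(-\Delta(d))$, where
  $$\phi(x) = P\left(F_{D+2,\,N-1} > \frac{x}{D+2}\right) - \frac{x}{D}\, P\left(F_{D,\,N+1} > \frac{(N+1)\,x}{D\,(N-1)}\right)$$
  and $F_{a,b}$ denotes a Fisher random variable with $a$ and $b$ degrees of freedom; in R, $\phi(x)$ is `pf(q=x/(D+2), df1=D+2, df2=N-1, lower.tail=FALSE) - (x/D)*pf(q=(N+1)*x/(D*(N-1)), df1=D, df2=N+1, lower.tail=FALSE)`;
- when $\Delta(d) \ge 50$, $\tilde\phi(x) = \Delta(d)$, where
  $$\tilde\phi(x) = \log B\left(1+\frac{D}{2}, \frac{N-1}{2}\right) - \log\frac{2\,(2x + (N-1)D)}{(N-1)(N+2)\,x} - \frac{N-1}{2}\log\frac{N-1}{N-1+x} - \frac{D}{2}\log\frac{x}{N-1+x}$$
  and $B$ is the Beta function (`lbeta(a, b)` in R returns $\log B(a, b)$).

Values of the penalty function greater than 1e+08 are set to 1e+08.

If, for some $d$, the equation $\phi(x) = \exp(-\Delta(d)/(d+1))$ has no solution, the execution is stopped.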
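
To make the $\Delta(d) < 50$ branch concrete, here is a minimal base-R sketch that solves $\phi(x) = \exp(-\Delta(d))$ with `uniroot` and returns $\mathrm{pen}(d)$. The helpers `phi` and `pen_d`, the search bracket and the tolerance are assumptions made for illustration; this is not the package's `penalty` implementation, it assumes $0 < \Delta(d) < 50$, and it does not cover the $\Delta(d) \ge 50$ branch.

```r
# Hypothetical helper (not exported by LINselect): phi(x) from the
# Delta(d) < 50 branch, for D = d + 1 and N = n - d.
phi <- function(x, D, N) {
  pf(q = x / (D + 2), df1 = D + 2, df2 = N - 1, lower.tail = FALSE) -
    (x / D) * pf(q = (N + 1) * x / (D * (N - 1)), df1 = D, df2 = N + 1,
                 lower.tail = FALSE)
}

# Hypothetical sketch of pen(d) = x K N / (N - 1) for a single d,
# assuming 0 < Delta_d < 50 so that the root lies inside (lower, upper).
pen_d <- function(d, Delta_d, n, K, lower = 1e-8, upper = 1e6) {
  D <- d + 1
  N <- n - d
  # phi decreases from 1 (at x = 0) towards 0, so phi(x) - exp(-Delta_d)
  # changes sign exactly once on the bracket
  x <- uniroot(function(x) phi(x, D, N) - exp(-Delta_d),
               lower = lower, upper = upper, tol = 1e-10)$root
  min(x * K * N / (N - 1), 1e8)  # cap at 1e+08, as stated above
}
```

For instance, `pen_d(5, log(choose(100, 5)), n = 100, K = 1.1)` gives the penalty for d = 5, taking $\Delta(5) = \log\binom{100}{5}$ purely as an illustrative value; a larger $\Delta(d)$ yields a larger $\mathrm{pen}(d)$.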
Yannick Baraud, Christophe Giraud, Sylvie Huet
Function to simulate data
```r
simulData(p = 100, n = 100, beta = NULL, C = NULL, r = 0.95, rSN = 10)
```
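
| Argument | Description |
|---|---|
| `p` | integer : number of variates. Should be greater than 15 if `beta` is NULL. |
| `n` | integer : number of observations. |
| `beta` | vector with p components. |
| `C` | matrix with p rows and p columns : covariance matrix of the rows of X. |
| `r` | scalar used to calculate the covariance of X when `C` is NULL. |
| `rSN` | scalar : signal-to-noise ratio. |
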
When beta is NULL, p should be greater than 15 and beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15)).

When C is NULL, C is block diagonal, with C[a,b] = r**abs(a-b) for indices a and b in the same block.

The rows of X are n i.i.d. Gaussian vectors with mean 0 and covariance matrix C.

The variance sigma**2 equals the squared Euclidean norm of the signal divided by rSN*n.
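
The default beta and the covariance structure described above can be sketched in base R. The block sizes are an input of the hypothetical helper `block_cov` because the text above does not state them; `default_beta`, `block_cov` and `draw_X` are illustrative helpers, not functions of LINselect, and `draw_X` uses a Cholesky factor instead of mvtnorm.

```r
# Hypothetical helper: the default beta used when beta is NULL (p > 15).
default_beta <- function(p) {
  stopifnot(p > 15)
  c(rep(2.5, 5), rep(1.5, 5), rep(0.5, 5), rep(0, p - 15))
}

# Hypothetical helper: block-diagonal matrix with C[a, b] = r^|a - b|
# inside each block; 'sizes' (block lengths) is an assumed input.
block_cov <- function(sizes, r) {
  p <- sum(sizes)
  C <- matrix(0, p, p)
  start <- 0
  for (k in sizes) {
    idx <- start + seq_len(k)
    C[idx, idx] <- r^abs(outer(seq_len(k), seq_len(k), "-"))
    start <- start + k
  }
  C
}

# Hypothetical helper: n i.i.d. rows with mean 0 and covariance C.
draw_X <- function(n, C) {
  matrix(rnorm(n * ncol(C)), n) %*% chol(C)
}
```

Multiplying standard normal rows by `chol(C)`, an upper-triangular R with `t(R) %*% R == C`, gives rows whose covariance is C.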
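A list with components:

| Component | Description |
|---|---|
| `Y` | vector : simulated response. |
| `X` | matrix with n rows and p columns. |
| `C` | matrix : covariance matrix of the rows of X. |
| `sigma` | scalar. See Details. |
| `beta` | vector with p components. |
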
Library mvtnorm is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
Tune the lasso parameter in the Gaussian linear regression model, using the lasso or the Gauss-lasso method.
```r
tuneLasso(Y, X, normalize = TRUE, method = c("lasso", "Glasso"), dmax = NULL,
          Vfold = TRUE, V = 10, LINselect = TRUE, a = 0.5, K = 1.1,
          verbose = TRUE, max.steps = NULL)
```

| Argument | Description |
|---|---|
| `Y` | vector with n components : response variable. |
| `X` | matrix with n rows and p columns : covariates. |
| `normalize` | logical : corresponds to the input |
| `method` | vector of characters whose components are a subset of ("lasso", "Glasso"). |
| `dmax` | integer : maximum number of variables in the lasso estimator. |
| `Vfold` | logical : if TRUE, the tuning is done by V-fold cross-validation. |
| `V` | integer : value of V in the V-fold cross-validation procedure. |
| `LINselect` | logical : if TRUE, the tuning is done by LINselect. |
| `a` | scalar : value of the parameter |
| `K` | scalar : value of the parameter |
| `verbose` | logical : if TRUE, a trace of the current process is displayed in real time. |
| `max.steps` | integer : maximum number of steps in the lasso procedure. |

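The arguments `Vfold` and `V` request V-fold cross-validation. As a rough illustration of what a cross-validated criterion and its standard error (compare the `crit` and `crit.err` components listed under Value) can look like, here is a base-R sketch that scores a least-squares fit restricted to a given support. The helper `cv_crit` is hypothetical; LINselect evaluates lasso fits along a lambda path and may compute these quantities differently.

```r
# Hypothetical sketch (not LINselect's code): V-fold CV prediction error of a
# least-squares fit with intercept on the columns in 'support', plus the
# standard error of the per-fold errors.
cv_crit <- function(Y, X, support, V = 10) {
  stopifnot(V >= 2)
  n <- length(Y)
  folds <- sample(rep(seq_len(V), length.out = n))  # random balanced folds
  err <- numeric(V)
  for (v in seq_len(V)) {
    test <- folds == v
    Xtr <- cbind(1, X[!test, support, drop = FALSE])
    fit <- lm.fit(Xtr, Y[!test])
    Xte <- cbind(1, X[test, support, drop = FALSE])
    err[v] <- mean((Y[test] - Xte %*% fit$coefficients)^2)  # held-out MSE
  }
  c(crit = mean(err), crit.err = sd(err) / sqrt(V))
}
```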
A list with one or two components, according to method:

- lasso, if method contains "lasso": a list with one or two components, according to Vfold and LINselect:
  - Ls, if LINselect=TRUE: a list with components
    - support: vector of integers. Estimated support of the parameter vector.
    - coef: vector whose first component is the estimated intercept; the other components are the estimated nonzero coefficients.
    - fitted: vector of length n. Fitted values of the response.
    - crit: vector containing the values of the criterion for each value of lambda.
    - lambda: vector containing the values of the tuning parameter of the lasso algorithm.
  - CV, if Vfold=TRUE: a list with the same components as Ls, plus
    - crit.err: vector containing the estimated standard errors of the criterion.
- Glasso, if method contains "Glasso": the same structure as lasso.
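
The Gauss-lasso ("Glasso") refits ordinary least squares on the support selected by the lasso. The following base-R sketch shows only that refit step (no lasso path and no tuning) and returns components named like those listed above. The helper `gauss_refit` and its coefficient names are hypothetical; this is not the package's implementation.

```r
# Hypothetical sketch of the Gauss-lasso refit: given a support (for
# instance, the one selected by the lasso), refit least squares with an
# intercept on those columns only.
gauss_refit <- function(Y, X, support) {
  Xs <- cbind(1, X[, support, drop = FALSE])
  fit <- lm.fit(Xs, Y)
  coef <- fit$coefficients
  # first component: intercept; then one coefficient per selected column
  names(coef) <- c("intercept", paste0("X", support))
  list(support = support, coef = coef, fitted = drop(Xs %*% coef))
}
```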
Library elasticnet is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
See Baraud et al. (2010), http://hal.archives-ouvertes.fr/hal-00502156/fr/

Giraud et al. (2013), https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1356098553
```r
#source("charge.R")
library("LINselect")
# simulate data with
# beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15))
## Not run: ex <- simulData(p=100,n=100,r=0.8,rSN=5)
## Not run: ex1.tuneLasso <- tuneLasso(ex$Y,ex$X)
## Not run: data(diabetes)
## Not run: attach(diabetes)
## Not run: ex.diab <- tuneLasso(y,x2)
## Not run: detach(diabetes)
```
Estimation in the Gaussian linear regression model: variable selection by choosing the best predictor among predictors emanating from different methods such as lasso, elastic-net, adaptive lasso, pls and randomForest.
```r
VARselect(Y, X, dmax = NULL, normalize = TRUE,
          method = c("lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive"),
          pen.crit = NULL, lasso.dmax = NULL, ridge.dmax = NULL, pls.dmax = NULL,
          en.dmax = NULL, ALridge.dmax = NULL, ALpls.dmax = NULL, rF.dmax = NULL,
          exhaustive.maxdim = 5e+05, exhaustive.dmax = NULL,
          en.lambda = c(0.01, 0.1, 0.5, 1, 2, 5),
          ridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5),
          rF.lmtry = 2, pls.ncomp = 5,
          ALridge.lambda = c(0.01, 0.1, 0.5, 1, 2, 5), ALpls.ncomp = 5,
          max.steps = NULL, K = 1.1, verbose = TRUE, long.output = FALSE)
```

| Argument | Description |
|---|---|
| `Y` | vector with n components : response variable. |
| `X` | matrix with n rows and p columns : covariates. |
| `dmax` | integer : maximum number of variables in the lasso estimator. |
| `normalize` | logical : if TRUE, the columns of X are scaled. |
| `method` | vector of characters whose components are a subset of ("lasso", "ridge", "pls", "en", "ALridge", "ALpls", "rF", "exhaustive"). |
| `pen.crit` | vector : values of the penalty in the LINselect criterion. If NULL, they are calculated by the function `penalty` (see Value). |
| `lasso.dmax` | integer lower than |
| `ridge.dmax` | integer lower than |
| `pls.dmax` | integer lower than |
| `en.dmax` | integer lower than |
| `ALridge.dmax` | integer lower than |
| `ALpls.dmax` | integer lower than |
| `rF.dmax` | integer lower than |
| `exhaustive.maxdim` | integer : maximum number of subsets of covariates considered in the exhaustive method. See Details. |
| `exhaustive.dmax` | integer lower than |
| `en.lambda` | vector : tuning parameter of the ridge (quadratic) penalty in the elastic net. It is the input parameter |
| `ridge.lambda` | vector : tuning parameter of the ridge. It is the input parameter lambda of function |
| `rF.lmtry` | vector : tuning parameter |
| `pls.ncomp` | integer : tuning parameter of the pls. It is the input parameter |
| `ALridge.lambda` | similar to `ridge.lambda`. |
| `ALpls.ncomp` | similar to `pls.ncomp`. |
| `max.steps` | integer : maximum number of steps in the lasso procedure. Corresponds to the input |
| `K` | scalar : value of the parameter |
| `verbose` | logical : if TRUE, a trace of the current process is displayed in real time. |
| `long.output` | logical : if FALSE, only the component summary is returned. See Value. |

When method is pls or ALpls, the
LINselect procedure is carried out considering the number
of components in the pls method as the tuning
parameter.
This tuning parameter varies from 1 to pls.ncomp.
When method is exhaustive, the maximum number of variables d is calculated as follows. Let q be the largest integer such that choose(p,q) < exhaustive.maxdim. Then d = min(q, exhaustive.dmax, dmax).
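
The rule for d can be sketched in a few lines of base R. Because choose(p, q) first increases and then decreases in q, this sketch assumes that "largest q" means the last q reached, counting up from 0, before choose(p, q) hits exhaustive.maxdim. The helper `exhaustive_d` is hypothetical, and its `Inf` defaults simply stand for "no bound".

```r
# Hypothetical helper (not exported by LINselect): maximal subset size d
# for the exhaustive method.
exhaustive_d <- function(p, exhaustive.maxdim = 5e5,
                         exhaustive.dmax = Inf, dmax = Inf) {
  q <- 0
  # increase q while the number of subsets of size q + 1 stays below the limit
  while (q + 1 <= p && choose(p, q + 1) < exhaustive.maxdim) q <- q + 1
  min(q, exhaustive.dmax, dmax)
}
```

For example, with p = 100 and the default exhaustive.maxdim = 5e+05, choose(100, 3) = 161700 is below the limit but choose(100, 4) = 3921225 is not, so q = 3.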
A list with at least length(method) components. For each procedure in method, a list with components:

- support: vector of integers. Estimated support of the parameters for the considered procedure.
- crit: scalar equal to the LINselect criterion calculated on the estimated support.
- fitted: vector of length n. Fitted values of the response, calculated on the estimated support.
- coef: vector whose first component is the estimated intercept; the other components are the estimated nonzero coefficients on the estimated support.

If length(method) > 1, the additional component summary is a list with three components:

- support: vector of integers. Estimated support of the parameters corresponding to the minimum of the criterion among all procedures.
- crit: scalar. Minimum value of the criterion among all procedures.
- method: vector of characters. Names of the procedures for which the minimum is reached.

If pen.crit = NULL, the component pen.crit gives the values of the penalty calculated by the function penalty.

If long.output is TRUE, the component named chatty is a list with length(method) components. For each procedure in method, it is a list with components:

- support: a list where support[[l]] is a vector of integers containing an estimator of the support of the parameters.
- crit: vector where crit[l] contains the value of the LINselect criterion calculated on support[[l]].
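
When several methods are run, the summary component keeps the procedure(s) reaching the smallest criterion. The bookkeeping can be sketched as follows, assuming each per-method result is a list with support and crit as described above. The helper `summarise_methods` is hypothetical, and returning the support of the first tied method is an assumption, since the text does not say which support is reported on ties.

```r
# Hypothetical sketch: build a summary-like list from per-method results,
# each a list with components 'support' and 'crit'.
summarise_methods <- function(res) {
  crits <- vapply(res, function(r) r$crit, numeric(1))
  best <- names(crits)[crits == min(crits)]  # all methods reaching the minimum
  list(support = res[[best[1]]]$support,     # assumption: first tied method
       crit = min(crits),
       method = best)
}
```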
When method is lasso, library elasticnet is loaded.
When method is en, library elasticnet is loaded.
When method is ridge, library MASS is loaded.
When method is rF, library randomForest is loaded.
When method is pls, library pls is loaded.
When method is ALridge, libraries MASS and elasticnet are loaded.
When method is ALpls, libraries pls and elasticnet are loaded.
When method is exhaustive, library gtools
is loaded.
Yannick Baraud, Christophe Giraud, Sylvie Huet
See Baraud et al. (2010), http://hal.archives-ouvertes.fr/hal-00502156/fr/

Giraud et al. (2013), https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1356098553
```r
#source("charge.R")
library("LINselect")
# simulate data with
# beta=c(rep(2.5,5),rep(1.5,5),rep(0.5,5),rep(0,p-15))
ex <- simulData(p=100,n=100,r=0.8,rSN=5)
## Not run: ex1.VARselect <- VARselect(ex$Y,ex$X,exhaustive.dmax=2)
## Not run: data(diabetes)
## Not run: attach(diabetes)
## Not run: ex.diab <- VARselect(y,x2,exhaustive.dmax=5)
## Not run: detach(diabetes)
```