Title: | CAlibrating Penalities Using Slope HEuristics |
---|---|
Description: | Calibration of penalized criteria for model selection. The calibration methods available are based on the slope heuristics. |
Authors: | Sylvain Arlot, Vincent Brault, Jean-Patrick Baudry, Cathy Maugis and Bertrand Michel |
Maintainer: | Vincent Brault |
License: | GPL (>= 2.0) |
Version: | 1.1.2 |
Built: | 2024-11-22 06:28:31 UTC |
Source: | CRAN |
This package includes functions for model selection via penalization. The model selection criterion has the following form: \(\gamma_n(\hat{s}_m) + \kappa \times pen_{shape}(m)\).
Two algorithms based on the slope heuristics are proposed to calibrate the parameter \(\kappa\) in the penalty: the data-driven slope estimation algorithm (DDSE) and the dimension jump algorithm (Djump).
The data-driven slope estimation algorithm and the dimension jump algorithm are implemented in the DDSE function and the Djump function respectively. Classes are defined for the outputs of DDSE and Djump, and a graphical display is available for each of these two classes. DDSE and Djump are both called by the capushe function, which is the main function of the package.
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
Djump and DDSE for model selection algorithms based on the slope heuristics. plot for a graphical display of the two algorithms. validation to check that the slope heuristics can be applied confidently.

    library(capushe)
    data(datacapushe)
    ## capushe returns the same model with DDSE and Djump:
    capushe(datacapushe)
    ## capushe also returns the model selected by AIC and BIC
    capushe(datacapushe, n = 1000)
    ## Djump only
    Djump(datacapushe)
    ## DDSE only
    DDSE(datacapushe)
    ## Graphical representations
    plot(Djump(datacapushe))
    plot(DDSE(datacapushe))
    plot(capushe(datacapushe))
    ## Validation procedure
    data(datapartialcapushe)
    capushepartial <- capushe(datapartialcapushe)
    plot(capushepartial)
    ## Additional data
    data(datavalidcapushe)
    validation(capushepartial, datavalidcapushe)
    ## The slope heuristics should not
    ## be applied for datapartialcapushe.

These functions return the model selected by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

    AICcapushe(data, n)
    BICcapushe(data, n)

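data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
n |
The sample size. |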
The penalty shape values should be increasing with respect to the complexity values (column 3). The complexity values have to be positive.
n is necessary to compute the AIC and BIC criteria. n is the size of the sample used to compute the contrast values given in the data matrix. Do not confuse n with the size of the model collection, which is the number of rows of the data matrix.
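As a rough illustration of how n enters the two criteria, the base R sketch below scores a small made-up model collection. It assumes that the contrast is the negative log-likelihood divided by n, so that AIC and BIC amount (up to a constant factor) to the contrast plus complexity/n and complexity*log(n)/(2n) respectively. That normalisation is an assumption made for this example, not a statement about the internals of AICcapushe or BICcapushe, and all values are hypothetical.

```r
# Hypothetical collection in the four-column layout (model, pen, complexity, contrast).
models <- data.frame(
  model      = paste0("m", 1:5),
  pen        = (1:5) / 1000,                       # penalty shape, here complexity / n
  complexity = 1:5,
  contrast   = c(2.00, 1.50, 1.40, 1.3985, 1.3980),
  stringsAsFactors = FALSE
)

n <- 1000  # sample size, NOT the number of models (nrow(models) is 5)

# Assumption: contrast = -(log-likelihood) / n.
aic <- models$contrast + models$complexity / n
bic <- models$contrast + models$complexity * log(n) / (2 * n)

selected_aic <- models$model[which.min(aic)]
selected_bic <- models$model[which.min(bic)]
```

Passing the number of rows instead of the sample size would change both penalties, which is why the two quantities must not be confused.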
model |
The model selected by AIC or BIC. |
AIC |
The corresponding value of AIC (for AICcapushe only). |
BIC |
The corresponding value of BIC (for BICcapushe only). |
Vincent Brault
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a model selection function including AIC, BIC, the DDSE algorithm and the Djump algorithm.

    library(capushe)
    data(datacapushe)
    AICcapushe(datacapushe, n = 1000)
    BICcapushe(datacapushe, n = 1000)

The capushe function provides two algorithms based on the slope heuristics to calibrate penalties in the context of model selection via penalization.

    capushe(data, n = 0, pct = 0.15, point = 0, psi.rlm = psi.bisquare,
            scoef = 2, Careajump = 0, Ctresh = 0)

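data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
n |
The sample size, required only to compute the AIC and BIC criteria. |
pct |
Minimum percentage of points for the plateau selection. See DDSE for more details. |
point |
Minimum number of points for the plateau selection. See DDSE for more details. |
psi.rlm |
Weight function used by rlm (package MASS). See DDSE for more details. |
scoef |
Ratio parameter. Default value is 2. |
Careajump |
Constant of jump area. See Djump for more details. |
Ctresh |
Maximal threshold for the complexity associated to the penalty coefficient. See Djump for more details. |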
The model \(\hat{m}\) selected by the procedure fulfills
\(\hat{m} = \mathrm{argmin}_m \; \gamma_n(\hat{s}_m) + scoef \times \kappa \times pen_{shape}(m)\)
where
\(\kappa\) is the penalty coefficient.
\(\gamma_n\) is the empirical contrast.
\(\hat{s}_m\) is the estimator for the model \(m\).
\(scoef\) is the ratio parameter.
\(pen_{shape}\) is the penalty shape.
The capushe function calls the functions DDSE and Djump to calibrate \(\kappa\); see the description of these functions for more details.
If two models have the same penalty shape value, only the model with the smallest contrast is considered.
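The selection rule above can be sketched in a few lines of base R, for a penalty coefficient that is assumed to be already known. The helper names dedup_pen and select_model are hypothetical and exist only for this illustration; calibrating the coefficient itself is the job of DDSE and Djump and is not shown here.

```r
# Keep one model per penalty shape value: the one with the smallest contrast.
dedup_pen <- function(d) {
  d <- d[order(d$pen, d$contrast), ]
  d[!duplicated(d$pen), ]
}

# Minimise contrast + scoef * kappa * pen over the de-duplicated collection.
select_model <- function(d, kappa, scoef = 2) {
  d <- dedup_pen(d)
  crit <- d$contrast + scoef * kappa * d$pen
  d$model[which.min(crit)]
}

# Made-up collection: models "b" and "c" share the same penalty shape value.
toy <- data.frame(
  model      = c("a", "b", "c", "d", "e"),
  pen        = c(1, 2, 2, 3, 4),
  complexity = c(1, 2, 2, 3, 4),
  contrast   = c(10, 6, 7, 5.5, 5.4),
  stringsAsFactors = FALSE
)

heavy_penalty <- select_model(toy, kappa = 0.5)   # large kappa favours small models
light_penalty <- select_model(toy, kappa = 0.01)  # small kappa favours complex models
```

Model "c" can never be selected, because "b" has the same penalty shape value and a smaller contrast.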
@DDSE |
A list returned by the DDSE function. |
@DDSE@model |
The model selected by the DDSE function. |
@DDSE@kappa |
The vector of the successive slope values. |
@DDSE@ModelHat |
A list providing details about the model selected by the DDSE function. |
@DDSE@interval |
A list about the "slope interval" corresponding to the plateau selected in DDSE. |
@DDSE@graph |
A list computed for the plot method. |
@Djump |
A list returned by the Djump function. |
@Djump@model |
The model selected by the Djump function. |
@Djump@ModelHat |
A list providing details about the model selected by the Djump function. |
@Djump@graph |
A list computed for the plot method. |
@AIC_capushe |
A list returned by the AICcapushe function. |
@BIC_capushe |
A list returned by the BICcapushe function. |
@n |
Sample size. |
Vincent Brault
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
Djump, DDSE, AICcapushe or BICcapushe to use only one of these model selection functions. plot for graphical displays of DDSE and Djump.

    library(capushe)
    data(datacapushe)
    capushe(datacapushe)
    capushe(datacapushe, 1000)

A data frame example for the capushe package, based on a simulated Gaussian mixture dataset.

    data(datacapushe)

A data frame with 50 rows (models) and the following 4 variables:
model: a character vector of model names.
pen: a numeric vector of model penalty shape values.
complexity: a numeric vector of model complexity values.
contrast: a numeric vector of model contrast values.
The simulated dataset is composed of n = 1000 observations. It consists of an equiprobable mixture of three large "bubble" groups, each centered at a different point. Each bubble group is simulated from a mixture of seven components. Thus the distribution of the dataset is actually a 21-component Gaussian mixture.
A model collection of spherical Gaussian mixtures is considered and the data frame datacapushe contains the maximum likelihood estimations for each of these models. The number of free parameters of each model is used for the complexity values, and the penalty shape is defined by this complexity divided by n.
datapartialcapushe and datavalidcapushe can be used to run the validation function. datapartialcapushe only contains the models with the fewest components. datavalidcapushe contains three additional models with more components.
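A table in the same four-column layout can be assembled by hand. The sketch below is only an illustration in base R: the helper make_collection and all numeric values are hypothetical, and the penalty shape is taken as complexity divided by the sample size, as for datacapushe. The checks mirror the documented constraints (positive complexities, penalty shape increasing with complexity).

```r
# Build a model collection table: model, pen, complexity, contrast.
make_collection <- function(model, complexity, contrast, n) {
  stopifnot(length(model) == length(complexity),
            length(model) == length(contrast),
            all(complexity > 0),      # complexity values have to be positive
            n > 0)
  d <- data.frame(
    model      = as.character(model),
    pen        = complexity / n,      # penalty shape = complexity / sample size
    complexity = complexity,
    contrast   = contrast,
    stringsAsFactors = FALSE
  )
  # The penalty shape must be increasing with respect to the complexity.
  stopifnot(!is.unsorted(d$pen[order(d$complexity)]))
  d
}

# Made-up contrasts for mixtures with 1 to 4 components.
toy <- make_collection(
  model      = paste0("K", 1:4),
  complexity = c(4, 9, 14, 19),
  contrast   = c(3.20, 2.90, 2.75, 2.74),
  n          = 1000
)
```

The column order matters, since the functions of the package refer to columns by position (column 2 for the penalty shape, column 3 for the complexity, column 4 for the contrast).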
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1

    library(capushe)
    data(datacapushe)
    capushe(datacapushe, n = 1000)
    ## BIC, DDSE and Djump all three select the true model
    plot(capushe(datacapushe))
    ## Validation:
    data(datapartialcapushe)
    capushepartial <- capushe(datapartialcapushe)
    data(datavalidcapushe)
    validation(capushepartial, datavalidcapushe)
    ## The slope heuristics should not
    ## be applied for datapartialcapushe.

DDSE is a model selection function based on the slope heuristics.

    DDSE(data, pct = 0.15, point = 0, psi.rlm = psi.bisquare, scoef = 2)

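data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
pct |
Minimum percentage of points for the plateau selection. It must be between 0 and 1. Default value is 0.15. |
point |
Minimum number of points for the plateau selection. If point is different from 0, it is used instead of pct. |
psi.rlm |
Weight function used by rlm (package MASS). Use psi.rlm = "lm" for an ordinary, non-robust linear regression. |
scoef |
Ratio parameter. Default value is 2. |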
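Let \(M\) be the model collection and \(P = \{pen_{shape}(m), m \in M\}\).
The DDSE algorithm proceeds in four steps:
1. If several models in the collection have the same penalty shape value (column 2), only the model having the smallest contrast value \(\gamma_n(\hat{s}_m)\) (column 4) is considered.
2. For any \(p \in P\), the slope \(\hat{\kappa}(p)\) (output @kappa) of the linear regression (argument psi.rlm) on the couples of points \(\{(pen_{shape}(m), -\gamma_n(\hat{s}_m)) ; pen_{shape}(m) \geq p\}\) is computed.
3. For any \(p \in P\), the model fulfilling the following condition is selected: \(\hat{m}(p) = \mathrm{argmin}_{m \in M} \; \gamma_n(\hat{s}_m) + scoef \times \hat{\kappa}(p) \times pen_{shape}(m)\). This gives an increasing sequence of change-points \((p_i)_{1 \leq i \leq I+1}\) (output @ModelHat$point_breaking). Let \((N_i)_{1 \leq i \leq I}\) (output @ModelHat$number_plateau) be the lengths of each "plateau".
4. If point is different from 0, let \(\hat{i} = \max\{1 \leq i \leq I ; N_i \geq point\}\), else let \(\hat{i} = \max\{1 \leq i \leq I ; N_i \geq pct \sum_{l=1}^{I} N_l\}\) (output @ModelHat$imax). The model \(\hat{m}(p_{\hat{i}})\) (output @model) is finally returned.
The "slope interval" is the interval \([a, b]\) where \(a = \inf\{\hat{\kappa}(p), p \in [p_{\hat{i}}, p_{\hat{i}+1}[ \cap P\}\) and \(b = \sup\{\hat{\kappa}(p), p \in [p_{\hat{i}}, p_{\hat{i}+1}[ \cap P\}\).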
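The four steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code of the package: the function ddse_sketch is hypothetical, it uses an ordinary lm regression in place of the robust rlm fit, it applies only the pct rule (no point argument), and the toy data are made up so that the contrast is exactly linear in the penalty shape from the fifth model onwards.

```r
ddse_sketch <- function(d, pct = 0.15, scoef = 2) {
  # Step 1: one model per penalty shape value (smallest contrast wins).
  d <- d[order(d$pen, d$contrast), ]
  d <- d[!duplicated(d$pen), ]
  P <- d$pen
  k <- length(P) - 1                  # a slope needs at least two points

  # Step 2: slope of -contrast against pen, using only the models with pen >= p.
  kappa <- vapply(seq_len(k), function(i) {
    sub <- d[d$pen >= P[i], ]
    unname(coef(lm(I(-contrast) ~ pen, data = sub))[2])
  }, numeric(1))

  # Step 3: model minimising the penalised criterion for each slope value.
  m_hat <- vapply(kappa, function(kp) {
    d$model[which.min(d$contrast + scoef * kp * d$pen)]
  }, character(1))
  runs <- rle(m_hat)                  # plateaus = runs of identical selections

  # Step 4: last plateau holding at least a proportion pct of the points.
  ok <- which(runs$lengths >= pct * sum(runs$lengths))
  if (length(ok) == 0) stop("no plateau is long enough")
  i_hat <- max(ok)
  list(model = runs$values[i_hat], kappa = kappa,
       number_plateau = runs$lengths, imax = i_hat)
}

# Made-up collection: a bias term vanishes from model 5 on, true slope 0.5.
D <- 1:20
toy <- data.frame(
  model      = paste0("m", D),
  pen        = D / 100,
  complexity = D,
  contrast   = 1 + 0.1 * pmax(0, 5 - D)^2 - 0.5 * (D / 100),
  stringsAsFactors = FALSE
)
res <- ddse_sketch(toy)
```

On these data the slopes computed from the most complex models are all close to 0.5, so the last plateau is long and stable, which is the situation in which the slope heuristics is expected to apply.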
@model |
The model selected by the DDSE algorithm. |
@kappa |
The vector of the successive slope values. |
@ModelHat |
A list describing the algorithm. |
@ModelHat$model_hat |
The vector of preselected models. |
@ModelHat$point_breaking |
The vector of the breaking points. |
@ModelHat$number_plateau |
The vector of the lengths of the plateaus. |
@ModelHat$imax |
The rank of the selected plateau. |
@interval |
A list about the "slope interval". |
@interval$interval |
The slope interval. |
@interval$percent_of_points |
The proportion of points belonging to the selected plateau. |
@graph |
A list computed for the plot method. |
Vincent Brault
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a model selection function including AIC, BIC, the DDSE algorithm and the Djump algorithm. plot for graphical displays of the DDSE algorithm and the Djump algorithm.

    library(capushe)
    data(datacapushe)
    DDSE(datacapushe)
    plot(DDSE(datacapushe))
    ## DDSE with "lm" for the regression
    DDSE(datacapushe, psi.rlm = "lm")

Djump is a model selection function based on the slope heuristics.

    Djump(data, scoef = 2, Careajump = 0, Ctresh = 0)

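data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
scoef |
Ratio parameter. Default value is 2. |
Careajump |
Constant of jump area. Default value is 0 (no area). In practice, it is advisable to take Careajump = sqrt(log(n)/n), where n is the number of observations. |
Ctresh |
Maximal threshold for the complexity associated to the penalty coefficient. Default value is 0 (maximal jump selected as the greatest jump). In practice, it is advisable to take Ctresh = n/log(n), where n is the number of observations. |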
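The Djump algorithm proceeds in three steps:
1. For all \(\kappa > 0\), compute \(m(\kappa) \in \mathrm{argmin}_{m \in M} \{\gamma_n(\hat{s}_m) + \kappa \times pen_{shape}(m)\}\) over the model collection \(M\). This gives a decreasing step function \(\kappa \mapsto C_{m(\kappa)}\), where \(C_m\) is the complexity of the model \(m\).
2. Find \(\hat{\kappa}^{dj}\) corresponding to the greatest jump of complexity if \(Ctresh = 0\), else \(\hat{\kappa}^{dj}\) such that \(\hat{\kappa}^{dj} = \inf\{\kappa > 0 : C_{m(\kappa)} \leq Ctresh\}\).
3. Select \(\hat{m} = m(scoef \times \hat{\kappa}^{dj})\) (output @model).
Arlot has proposed a jump area containing the maximal jump, defined by
\([\kappa (1 - Careajump) ; \kappa (1 + Careajump)]\).
If \(Careajump > 0\), Djump returns the area with the greatest jump. In practice, it is advisable to take \(Careajump = \sqrt{\log(n)/n}\), where \(n\) is the number of observations.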
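The dimension jump idea can be sketched in base R as below. This is a simplified illustration with hypothetical names (djump_sketch, toy): it follows the selected model as the penalty coefficient grows, records the coefficient and the drop in complexity at each change, and then applies the ratio parameter at the greatest jump. The jump area (Careajump) and the complexity threshold (Ctresh) are deliberately left out, and the data are made up.

```r
djump_sketch <- function(d, scoef = 2) {
  # One model per penalty shape value, sorted by increasing penalty shape.
  d <- d[order(d$pen, d$contrast), ]
  d <- d[!duplicated(d$pen), ]

  cur <- which.min(d$contrast)        # model selected when kappa = 0
  kappa <- numeric(0)                 # values of kappa at which the selection changes
  jump  <- numeric(0)                 # corresponding drops in complexity
  while (cur > 1) {                   # rows 1..(cur - 1) have a smaller penalty shape
    cand <- seq_len(cur - 1)
    # kappa at which each cheaper model overtakes the current one
    k_cross <- (d$contrast[cand] - d$contrast[cur]) / (d$pen[cur] - d$pen[cand])
    nxt <- cand[which.min(k_cross)]
    kappa <- c(kappa, min(k_cross))
    jump  <- c(jump, d$complexity[cur] - d$complexity[nxt])
    cur <- nxt
  }
  if (length(jump) == 0) stop("the same model is selected for every kappa")

  kappa_hat <- kappa[which.max(jump)]            # location of the greatest jump
  crit <- d$contrast + scoef * kappa_hat * d$pen # criterion at scoef * kappa_hat
  list(model = d$model[which.min(crit)], kappa_hat = kappa_hat,
       kappa = kappa, jump = jump)
}

# Made-up collection with pen = complexity / 100.
D <- c(1, 2, 3, 4, 6, 10, 15, 25)
toy <- data.frame(
  model      = paste0("m", D),
  pen        = D / 100,
  complexity = D,
  contrast   = c(3.0, 2.0, 1.5, 1.3, 1.27, 1.245, 1.225, 1.20),
  stringsAsFactors = FALSE
)
res <- djump_sketch(toy)
```

Here the complexity drops from 25 to 15 at the first change, which is the greatest jump, and the model finally selected is the one minimising the criterion at twice that coefficient.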
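@model |
The model selected by the Djump algorithm. |
@ModelHat |
A list describing the algorithm. |
@ModelHat$jump |
The vector of jump heights. |
@ModelHat$kappa |
The vector of the values of kappa at each jump. |
@ModelHat$model_hat |
The vector of the models selected at each jump. |
@ModelHat$JumpMax |
The location of the greatest jump. |
@ModelHat$Kopt |
The penalty coefficient used for the final selection, that is scoef times the value of kappa at the selected jump. |
@graph |
A list computed for the plot method. |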
Vincent Brault
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a model selection function including AIC, BIC, the DDSE algorithm and the Djump algorithm. plot for a graphical display of the DDSE algorithm and the Djump algorithm.

    library(capushe)
    data(datacapushe)
    Djump(datacapushe)
    plot(Djump(datacapushe))
    Djump(datacapushe, Careajump = sqrt(log(1000)/1000))
    plot(Djump(datacapushe, Careajump = sqrt(log(1000)/1000)))
    Djump(datacapushe, Ctresh = 1000/log(1000))
    plot(Djump(datacapushe, Ctresh = 1000/log(1000)))

The plot methods allow the user to check that the slope heuristics can be applied confidently.
Usage
plot(x, newwindow=TRUE, ask=TRUE) for capushe.
plot(x, newwindow=TRUE) for DDSE and Djump.
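x |
Output of DDSE, Djump or capushe. |
newwindow |
If newwindow=TRUE, the plot is drawn in a new graphical window. |
ask |
If ask=TRUE, the user is prompted before the next plot is displayed (only for objects of class capushe). |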
The graphical window of DDSE is composed of three graphics (see DDSE for more details):
The left plot shows \(-\gamma_n(\hat{s}_m)\) with respect to the penalty shape values.
The topright plot shows the successive slope values \(\hat{\kappa}(p)\).
The bottomright plot shows the selected models \(\hat{m}(p)\) with respect to the successive slope values. The plateau in blue is selected.
The graphical window of Djump shows the complexity \(C_{m(\kappa)}\) of the selected model with respect to \(\kappa\). \(\hat{\kappa}^{dj}\) corresponds to the greatest jump. \(\kappa_{opt}\) is defined by \(\kappa_{opt} = scoef \times \hat{\kappa}^{dj}\). The red line represents the slope interval computed by the DDSE algorithm (only for capushe). See Djump for more details.
signature(x = "Capushe")
This graphical function displays the DDSE plot and the Djump plot.
signature(x = "DDSE")
This graphical function displays the DDSE plot.
signature(x = "Djump")
This graphical function displays the Djump plot.
Use newwindow=FALSE to produce PDF files (for an object of class capushe, also use ask=FALSE).
validation checks that the slope heuristics can be applied confidently.

    validation(x, data2, ...)

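x |
An output of the capushe function (or of the DDSE function). |
data2 |
A matrix or data frame with the same four columns as the data argument of capushe, one row per additional model. |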
... |
|
The validation function plots the additional, more complex models given in data2 to check that the linear relation between the penalty shape values and the contrast values (which is recorded in x) is still valid for these more complex models.
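The check described above is graphical. As a complement, the base R sketch below shows one possible numeric version of the same idea: fit a line on the most complex models of the original collection, then measure how far the additional models fall from its extrapolation. The function extrapolation_gap, the choice of the last ten models and all data are assumptions made for this illustration and are not part of the package.

```r
# Largest vertical distance between the additional models and the line
# fitted on the n_last most complex models of the original collection.
extrapolation_gap <- function(d, extra, n_last = 10) {
  d <- d[order(d$pen), ]
  ref <- tail(d, n_last)
  fit <- lm(I(-contrast) ~ pen, data = ref)   # linear part of -contrast vs pen
  pred <- predict(fit, newdata = extra)
  max(abs(-extra$contrast - pred))
}

# Made-up original collection, exactly linear with slope 0.5.
D <- 5:20
base_models <- data.frame(pen = D / 100, contrast = 1 - 0.5 * (D / 100))

# Additional, more complex models: one set follows the line, one does not.
D2 <- c(30, 40, 50)
extra_ok  <- data.frame(pen = D2 / 100, contrast = 1 - 0.5 * (D2 / 100))
extra_bad <- data.frame(pen = D2 / 100, contrast = 1 - 0.2 * (D2 / 100))

gap_ok  <- extrapolation_gap(base_models, extra_ok)
gap_bad <- extrapolation_gap(base_models, extra_bad)
```

A large gap suggests that the original collection did not reach the linear regime, in which case the slope estimated from it should not be trusted.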
Vincent Brault
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
http://www.math.u-psud.fr/~brault/capushe.html
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a more general model selection function including AIC, BIC, the DDSE algorithm and the Djump algorithm.

    library(capushe)
    data(datapartialcapushe)
    capushepartial <- capushe(datapartialcapushe)
    data(datavalidcapushe)
    validation(capushepartial, datavalidcapushe)
    ## The slope heuristics should not
    ## be applied for datapartialcapushe.
    data(datacapushe)
    plot(capushe(datacapushe))
