Title: | Sparse-Group SLOPE: Adaptive Bi-Level Selection with FDR Control |
---|---|
Description: | Implementation of Sparse-group SLOPE (SGS) (Feser and Evangelou (2023) <doi:10.48550/arXiv.2305.09467>) models. Linear and logistic regression models are supported, both of which can be fit using k-fold cross-validation. Dense and sparse input matrices are supported. In addition, a general Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018) <doi:10.48550/arXiv.1804.02339>) implementation is provided. Group SLOPE (gSLOPE) (Brzyski et al. (2019) <doi:10.1080/01621459.2017.1411269>) and group-based OSCAR models (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) are also implemented. All models are available with strong screening rules (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) for computational speed-up. |
Authors: | Fabio Feser [aut, cre] |
Maintainer: | Fabio Feser <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.3.2 |
Built: | 2024-12-28 06:43:38 UTC |
Source: | CRAN |
Matrix Product in RcppArmadillo.
arma_mv(m, v)
m |
numeric matrix |
v |
numeric vector |
matrix product of m and v
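For orientation, the returned quantity is the ordinary matrix-vector product. The following is a minimal base-R sketch of the same computation; it uses `%*%` rather than the package's compiled routine, and the toy values are illustrative assumptions.

```r
# Base-R equivalent of a matrix-vector product (illustrative values only).
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # 2 x 3 matrix, filled column-wise
v <- c(1, 0, -1)                            # numeric vector of length 3

# Each entry of the result is the inner product of one row of m with v.
mv <- as.vector(m %*% v)
print(mv)
```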
Matrix Product in RcppArmadillo.
arma_sparse(m, v)
m |
numeric sparse matrix |
v |
numeric vector |
matrix product of m and v
Fits an SGS model using the noise estimation procedure, termed adaptively scaled SGS (Algorithm 2 from Feser and Evangelou (2023)).
This adaptively estimates the regularisation parameter lambda and then fits the model using the estimated value. It is an alternative approach to cross-validation (fit_sgs_cv()). The approach is only compatible with the SGS penalties.
as_sgs( X, y, groups, type = "linear", pen_method = 2, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, standardise = "l2", intercept = TRUE, verbose = FALSE )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
pen_method |
The type of penalty sequences to use.
|
alpha |
The value of |
vFDR |
Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1. |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
verbose |
Logical flag for whether to print fitting information. |
An object of type "sgs" containing model fit information (see fit_sgs()).
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Other model-selection: fit_goscar_cv(), fit_gslope_cv(), fit_sgo_cv(), fit_sgs_cv(), scaled_sgs()
Other SGS-methods: coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
Function for fitting adaptive three operator splitting (ATOS) with general convex penalties. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
atos( X, y, type = "linear", prox_1, prox_2, pen_prox_1 = 0.5, pen_prox_2 = 0.5, max_iter = 5000, backtracking = 0.7, max_iter_backtracking = 100, tol = 1e-05, prox_1_opts = NULL, prox_2_opts = NULL, standardise = "l2", intercept = TRUE, x0 = NULL, u = NULL, verbose = FALSE )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
type |
The type of regression to perform. Supported values are: |
prox_1 |
The proximal operator for the first function, |
prox_2 |
The proximal operator for the second function, |
pen_prox_1 |
The penalty for the first proximal operator. For the lasso, this would be the sparsity parameter, |
pen_prox_2 |
The penalty for the second proximal operator. |
max_iter |
Maximum number of ATOS iterations to perform. |
backtracking |
The backtracking parameter, |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
prox_1_opts |
Optional argument for first proximal operator. For the group lasso, this would be the group IDs. Note: this must be inserted as a list. |
prox_2_opts |
Optional argument for second proximal operator. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
x0 |
Optional initial vector for |
u |
Optional initial vector for |
verbose |
Logical flag for whether to print fitting information. |
atos() solves convex minimization problems of the form $f(x) + g(x) + h(x)$, where $f$ is convex and differentiable with $L_f$-Lipschitz gradient, and $g$ and $h$ are both convex.
The algorithm is not symmetrical, but the differences between the two variations are usually only small numerical values, which are filtered out.
Both variations should nevertheless be checked by inspecting x and u. An example for the sparse-group lasso (SGL) is given.
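To make the roles of the two non-smooth terms concrete, the sketch below writes out in base R the two proximal operators that the sparse-group lasso combines: soft-thresholding for the lasso part and group soft-thresholding for the group lasso part. The function names `prox_lasso` and `prox_group_lasso` and their argument order are hypothetical; this page does not state the signature that atos() requires for prox_1 and prox_2, so these are not drop-in arguments.

```r
# Soft-thresholding: proximal operator of lambda * ||x||_1 (lasso).
prox_lasso <- function(x, lambda) {
  sign(x) * pmax(abs(x) - lambda, 0)
}

# Group soft-thresholding: proximal operator of lambda * sum_g ||x_g||_2
# (unweighted group lasso). group_ids gives the group index of each entry.
prox_group_lasso <- function(x, lambda, group_ids) {
  out <- numeric(length(x))
  for (g in unique(group_ids)) {
    idx <- which(group_ids == g)
    norm_g <- sqrt(sum(x[idx]^2))
    # Shrink the whole group towards zero; set it to zero if its norm <= lambda.
    scale_g <- if (norm_g > lambda) 1 - lambda / norm_g else 0
    out[idx] <- scale_g * x[idx]
  }
  out
}

x <- c(3, -0.5, 4, 0.2)
x_lasso <- prox_lasso(x, lambda = 1)
x_group <- prox_group_lasso(x, lambda = 1, group_ids = c(1, 1, 2, 2))
print(x_lasso)
print(x_group)
```

The lasso operator acts on each coefficient separately, while the group operator shrinks the Euclidean norm of each group by lambda, which is why three operator splitting can treat the two penalties as separate terms.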
An object of class "atos"
containing:
beta |
The fitted values from the regression. Taken to be the more stable fit between |
x |
The solution to the original problem (see Pedregosa and Gidel (2018)). |
u |
The solution to the dual problem (see Pedregosa and Gidel (2018)). |
z |
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)). |
type |
Indicates which type of regression was performed. |
success |
Logical flag indicating whether ATOS converged, according to |
num_it |
Number of iterations performed. If convergence is not reached, this will be |
certificate |
Final value of convergence criteria. |
intercept |
Logical flag indicating whether an intercept was fit. |
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Extracts coefficients for one of the following object types: "sgs", "sgs_cv", "gslope", "gslope_cv".
Prints the coefficients of a model fitted with one of the following functions: fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv().
The coefficients are reported for each "lambda" value in the path.
## S3 method for class 'sgs'
coef(object, ...)
object |
Object of one of the following classes: |
... |
further arguments passed to stats function. |
The fitted coefficients
fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv()
Other SGS-methods: as_sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
Other gSLOPE-methods: fit_goscar(), fit_goscar_cv(), fit_gslope(), fit_gslope_cv(), plot.sgs(), predict.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGS
model = fit_sgs(X = data$X, y = data$y, groups = groups, type="linear", lambda = 1, alpha=0.95, vFDR=0.1, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)
# extract the fitted coefficients
model_coef = coef(model)
Group OSCAR (gOSCAR) main fitting function. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_goscar( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, max_iter = 5000, backtracking = 0.7, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, screen = TRUE, verbose = FALSE, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
max_iter |
Maximum number of ATOS iterations to perform. |
backtracking |
The backtracking parameter, |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
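The lambda, path_length and min_frac arguments above together describe a grid of regularisation values. The sketch below shows one common way such a grid can be built, assuming that path_length is the number of lambda values and min_frac is the ratio of the smallest to the largest value. The helper `make_lambda_path`, the log spacing, and the starting value `lambda_max` are assumptions made for illustration and are not taken from the package.

```r
# Hypothetical helper: a log-spaced grid of lambda values.
# lambda_max is assumed to be known (e.g. the value at which the model is empty).
make_lambda_path <- function(lambda_max, path_length = 20, min_frac = 0.05) {
  exp(seq(log(lambda_max), log(min_frac * lambda_max), length.out = path_length))
}

# Five values running from 2 down to 0.05 * 2 = 0.1.
path <- make_lambda_path(lambda_max = 2, path_length = 5, min_frac = 0.05)
print(path)
```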
fit_goscar() fits a gOSCAR model (Feser and Evangelou (2024)) using adaptive three operator splitting (ATOS). gOSCAR uses the same model set-up as gSLOPE, but with different weights (see Bao et al. (2020) and Feser and Evangelou (2024)).
The penalty for a group depends on the group index and on the total number of groups; the expressions are given in Feser and Evangelou (2024).
A list containing:
beta |
The fitted values from the regression. Taken to be the more stable fit between |
group_effects |
The group values from the regression. Taken by applying the |
selected_var |
A list containing the indices of the active/selected variables for each |
selected_grp |
A list containing the indices of the active/selected groups for each |
num_it |
Number of iterations performed. If convergence is not reached, this will be |
success |
Logical flag indicating whether ATOS converged, according to |
certificate |
Final value of convergence criteria. |
x |
The solution to the original problem (see Pedregosa and Gidel (2018)). |
u |
The solution to the dual problem (see Pedregosa and Gidel (2018)). |
z |
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)). |
screen_set |
List of groups that were kept after screening step for each |
epsilon_set |
List of groups that were used for fitting after screening for each |
kkt_violations |
List of groups that violated the KKT conditions each |
pen_gslope |
Vector of the group penalty sequence. |
screen |
Logical flag indicating whether screening was applied. |
type |
Indicates which type of regression was performed. |
intercept |
Logical flag indicating whether an intercept was fit. |
standardise |
Type of standardisation used. |
lambda |
Value(s) of |
Bao, R., Gu, B., Huang, H. (2020). Fast OSCAR and OWL Regression via Safe Screening Rules, https://proceedings.mlr.press/v119/bao20b
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Other gSLOPE-methods: coef.sgs(), fit_goscar_cv(), fit_gslope(), fit_gslope_cv(), plot.sgs(), predict.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run gOSCAR
model = fit_goscar(X = data$X, y = data$y, groups = groups, type="linear", path_length = 5, standardise = "l2", intercept = TRUE, verbose=FALSE)
Function to fit a pathwise solution of group OSCAR (gOSCAR) models using k-fold cross-validation. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_goscar_cv( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, nfolds = 10, backtracking = 0.7, max_iter = 5000, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, error_criteria = "mse", screen = TRUE, verbose = FALSE, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
nfolds |
The number of folds to use in cross-validation. |
backtracking |
The backtracking parameter, |
max_iter |
Maximum number of ATOS iterations to perform. |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
error_criteria |
The criteria used to discriminate between models along the path. Supported values are: |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
Fits gOSCAR models under a pathwise solution using adaptive three operator splitting (ATOS), picking the 1se model as optimum. Warm starts are implemented.
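The "1se" choice mentioned above can be illustrated with a generic one-standard-error rule: find the model with the smallest cross-validation error, then take the most regularised model whose error lies within one standard error of that minimum. The sketch below is a base-R illustration under that assumption; the table `cv_summary`, its column names, and the helper `pick_1se` are hypothetical and do not reflect the package's internal layout.

```r
# Hypothetical CV summary: one row per lambda, ordered from largest to smallest.
cv_summary <- data.frame(
  lambda   = c(1.00, 0.50, 0.25, 0.12, 0.06),
  cv_error = c(2.40, 1.60, 1.25, 1.20, 1.35),
  cv_se    = c(0.20, 0.15, 0.10, 0.10, 0.12)
)

# One-standard-error rule: locate the minimum-error model, then pick the
# largest lambda whose error is within one standard error of that minimum.
pick_1se <- function(tab) {
  i_min <- which.min(tab$cv_error)
  threshold <- tab$cv_error[i_min] + tab$cv_se[i_min]
  eligible <- which(tab$cv_error <= threshold)
  eligible[which.max(tab$lambda[eligible])]
}

best_id <- pick_1se(cv_summary)
best_lambda <- cv_summary$lambda[best_id]
print(c(best_id = best_id, best_lambda = best_lambda))
```

Preferring the larger lambda trades a small increase in estimated error for a sparser model.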
A list containing:
errors |
A table containing fitting information about the models on the path. |
all_models |
Fitting information for all models fit on the path, which is a |
fit |
The 1se chosen model, which is a |
best_lambda |
The value of |
best_lambda_id |
The path index for the chosen model. |
Bao, R., Gu, B., Huang, H. (2020). Fast OSCAR and OWL Regression via Safe Screening Rules, https://proceedings.mlr.press/v119/bao20b
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_gslope(), fit_gslope_cv(), plot.sgs(), predict.sgs(), print.sgs()
Other model-selection: as_sgs(), fit_gslope_cv(), fit_sgo_cv(), fit_sgs_cv(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run gOSCAR with cross-validation
cv_model = fit_goscar_cv(X = data$X, y = data$y, groups=groups, type = "linear", path_length = 5, nfolds=5, min_frac = 0.05, standardise="l2", intercept=TRUE, verbose=TRUE)
Group SLOPE (gSLOPE) main fitting function. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_gslope( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, gFDR = 0.1, pen_method = 1, max_iter = 5000, backtracking = 0.7, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, screen = TRUE, verbose = FALSE, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1. |
pen_method |
The type of penalty sequences to use (see Brzyski et al. (2019)):
|
max_iter |
Maximum number of ATOS iterations to perform. |
backtracking |
The backtracking parameter, |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the penalties from |
fit_gslope() fits a gSLOPE model (Brzyski et al. (2019)) using adaptive three operator splitting (ATOS). gSLOPE is a group selection method: it selects groups of variables, and every variable within a selected group is set as active.
It solves a convex optimisation problem in which a loss function is combined with a sorted group penalty; the penalty sequences are sorted, and the full objective is stated in Brzyski et al. (2019). In the linear model, the loss function is the mean-squared error loss. In the logistic model, the loss function is defined through the log-likelihood.
The penalty parameters in gSLOPE are sorted so that the largest group effects are matched with the largest penalties, to reduce the group FDR.
The gMean sequence (pen_method=1) and the gMax sequence (pen_method=2) are each defined through the cumulative distribution function of a distribution with a specified number of degrees of freedom; both sequences are given in Brzyski et al. (2019).
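The matching of the largest group effects with the largest penalties can be sketched in a few lines of base R. The example below evaluates a sorted group penalty for a fixed coefficient vector; it assumes that each group norm is scaled by the square root of the group size and that the weights are supplied in non-increasing order. The helper `sorted_group_penalty` and the weight values are hypothetical and are not the package's gMean or gMax sequences.

```r
# Sorted group penalty: sum_i w_i * (i-th largest scaled group norm).
# Assumes w is non-increasing and has one entry per group.
sorted_group_penalty <- function(beta, group_ids, w) {
  norms <- tapply(beta, group_ids, function(b) sqrt(length(b)) * sqrt(sum(b^2)))
  # Sorting in decreasing order pairs the largest group effect with w[1].
  sum(w * sort(as.vector(norms), decreasing = TRUE))
}

beta      <- c(0, 0, 0, 3, 4, 0, 0, 0, 1, 0)
group_ids <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
w         <- c(2.0, 1.5, 1.0, 0.5)  # illustrative weights, largest first

pen_value <- sorted_group_penalty(beta, group_ids, w)
print(pen_value)
```

Here group 2 has the largest scaled norm and therefore receives the largest weight, regardless of its position in the grouping vector.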
A list containing:
beta |
The fitted values from the regression. Taken to be the more stable fit between |
group_effects |
The group values from the regression. Taken by applying the |
selected_var |
A list containing the indices of the active/selected variables for each |
selected_grp |
A list containing the indices of the active/selected groups for each |
num_it |
Number of iterations performed. If convergence is not reached, this will be |
success |
Logical flag indicating whether ATOS converged, according to |
certificate |
Final value of convergence criteria. |
x |
The solution to the original problem (see Pedregosa and Gidel (2018)). |
u |
The solution to the dual problem (see Pedregosa and Gidel (2018)). |
z |
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)). |
screen_set |
List of groups that were kept after screening step for each |
epsilon_set |
List of groups that were used for fitting after screening for each |
kkt_violations |
List of groups that violated the KKT conditions each |
pen_gslope |
Vector of the group penalty sequence. |
screen |
Logical flag indicating whether screening was applied. |
type |
Indicates which type of regression was performed. |
intercept |
Logical flag indicating whether an intercept was fit. |
standardise |
Type of standardisation used. |
lambda |
Value(s) of |
Brzyski, D., Gossmann, A., Su, W., Bogdan, M. (2019). Group SLOPE – Adaptive Selection of Groups of Predictors, https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1411269
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_goscar_cv(), fit_gslope_cv(), plot.sgs(), predict.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run gSLOPE
model = fit_gslope(X = data$X, y = data$y, groups = groups, type="linear", path_length = 5, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)
Function to fit a pathwise solution of group SLOPE (gSLOPE) models using k-fold cross-validation. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_gslope_cv( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, nfolds = 10, gFDR = 0.1, pen_method = 1, backtracking = 0.7, max_iter = 5000, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, error_criteria = "mse", screen = TRUE, verbose = FALSE, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
nfolds |
The number of folds to use in cross-validation. |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the penalties. Must be between 0 and 1. |
pen_method |
The type of penalty sequences to use (see Brzyski et al. (2019)):
|
backtracking |
The backtracking parameter, |
max_iter |
Maximum number of ATOS iterations to perform. |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
error_criteria |
The criteria used to discriminate between models along the path. Supported values are: |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the penalties from |
Fits gSLOPE models under a pathwise solution using adaptive three operator splitting (ATOS), picking the 1se model as optimum. Warm starts are implemented.
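The k-fold splitting behind nfolds can be sketched in base R as follows. This is a generic illustration of assigning observations to folds and looping over train/test splits; the variable names and the random assignment scheme are assumptions, and the package may partition the data differently.

```r
set.seed(1)    # for a reproducible split
n      <- 20   # number of observations (illustrative)
nfolds <- 5

# Assign each observation to one of nfolds folds, as evenly as possible.
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

for (k in seq_len(nfolds)) {
  test_idx  <- which(fold_id == k)
  train_idx <- which(fold_id != k)
  # A model would be fit on train_idx and its error measured on test_idx.
  cat("fold", k, ": train =", length(train_idx), ", test =", length(test_idx), "\n")
}
```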
A list containing:
errors |
A table containing fitting information about the models on the path. |
all_models |
Fitting information for all models fit on the path, which is a |
fit |
The 1se chosen model, which is a |
best_lambda |
The value of |
best_lambda_id |
The path index for the chosen model. |
Brzyski, D., Gossmann, A., Su, W., Bogdan, M. (2019). Group SLOPE – Adaptive Selection of Groups of Predictors, https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1411269
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_goscar_cv(), fit_gslope(), plot.sgs(), predict.sgs(), print.sgs()
Other model-selection: as_sgs(), fit_goscar_cv(), fit_sgo_cv(), fit_sgs_cv(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run gSLOPE with cross-validation
cv_model = fit_gslope_cv(X = data$X, y = data$y, groups=groups, type = "linear", path_length = 5, nfolds=5, gFDR = 0.1, min_frac = 0.05, standardise="l2", intercept=TRUE, verbose=TRUE)
Sparse-group OSCAR (SGO) main fitting function. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_sgo( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, alpha = 0.95, max_iter = 5000, backtracking = 0.7, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, screen = TRUE, verbose = FALSE, w_weights = NULL, v_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
alpha |
The value of |
max_iter |
Maximum number of ATOS iterations to perform. |
backtracking |
The backtracking parameter, |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
v_weights |
Optional vector for the variable penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
fit_sgo() fits an SGO model (Feser and Evangelou (2024)) using adaptive three operator splitting (ATOS). SGO uses the same model set-up as SGS, but with different weights (see Bao et al. (2020) and Feser and Evangelou (2024)).
The penalties for a group and for a variable depend on their indices and on the total numbers of variables and groups; the expressions are given in Feser and Evangelou (2024).
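Since SGO shares its model set-up with SGS, the bi-level structure of the penalty can be sketched as a convex combination of a sorted variable penalty and a sorted group penalty, balanced by alpha. The base-R example below assumes that form, following the description of SGS in Feser and Evangelou (2023). The helper `sparse_group_penalty`, the square-root group-size scaling, and the weight values are illustrative assumptions and are not the OSCAR weights used by the package.

```r
# Sparse-group sorted penalty: alpha * variable part + (1 - alpha) * group part.
# v (one weight per variable) and w (one per group) are assumed non-increasing.
sparse_group_penalty <- function(beta, group_ids, v, w, alpha = 0.95) {
  # Variable part: sorted absolute coefficients matched with sorted weights v.
  var_part <- sum(v * sort(abs(beta), decreasing = TRUE))
  # Group part: sorted scaled group norms matched with sorted weights w.
  norms <- tapply(beta, group_ids, function(b) sqrt(length(b)) * sqrt(sum(b^2)))
  grp_part <- sum(w * sort(as.vector(norms), decreasing = TRUE))
  alpha * var_part + (1 - alpha) * grp_part
}

beta      <- c(3, 4, 0, 0)
group_ids <- c(1, 1, 2, 2)
v         <- c(4, 3, 2, 1)  # illustrative variable weights
w         <- c(2, 1)        # illustrative group weights

sg_value <- sparse_group_penalty(beta, group_ids, v, w, alpha = 0.5)
print(sg_value)
```

Setting alpha close to 1 makes the variable-level penalty dominate, while alpha close to 0 makes the penalty behave like a pure group penalty.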
A list containing:
beta |
The fitted values from the regression. Taken to be the more stable fit between |
group_effects |
The group values from the regression. Taken by applying the |
selected_var |
A list containing the indices of the active/selected variables for each |
selected_grp |
A list containing the indices of the active/selected groups for each |
num_it |
Number of iterations performed. If convergence is not reached, this will be |
success |
Logical flag indicating whether ATOS converged, according to |
certificate |
Final value of convergence criteria. |
x |
The solution to the original problem (see Pedregosa and Gidel (2018)). |
z |
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)). |
u |
The solution to the dual problem (see Pedregosa and Gidel (2018)). |
screen_set_var |
List of variables that were kept after screening step for each |
screen_set_grp |
List of groups that were kept after screening step for each |
epsilon_set_var |
List of variables that were used for fitting after screening for each |
epsilon_set_grp |
List of groups that were used for fitting after screening for each |
kkt_violations_var |
List of variables that violated the KKT conditions each |
kkt_violations_grp |
List of groups that violated the KKT conditions each |
pen_slope |
Vector of the variable penalty sequence. |
pen_gslope |
Vector of the group penalty sequence. |
screen |
Logical flag indicating whether screening was performed. |
type |
Indicates which type of regression was performed. |
intercept |
Logical flag indicating whether an intercept was fit. |
lambda |
Value(s) of |
Bao, R., Gu, B., Huang, H. (2020). Fast OSCAR and OWL Regression via Safe Screening Rules, https://proceedings.mlr.press/v119/bao20b
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGO
model = fit_sgo(X = data$X, y = data$y, groups = groups, type="linear", path_length = 5, alpha=0.95, standardise = "l2", intercept = TRUE, verbose=FALSE)
Function to fit a pathwise solution of sparse-group OSCAR (SGO) models using k-fold cross-validation. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_sgo_cv( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, alpha = 0.95, nfolds = 10, backtracking = 0.7, max_iter = 5000, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, error_criteria = "mse", screen = TRUE, verbose = FALSE, v_weights = NULL, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
alpha |
The value of |
nfolds |
The number of folds to use in cross-validation. |
backtracking |
The backtracking parameter, |
max_iter |
Maximum number of ATOS iterations to perform. |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
error_criteria |
The criteria used to discriminate between models along the path. Supported values are: |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
v_weights |
Optional vector for the variable penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
w_weights |
Optional vector for the group penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by |
Fits SGO models under a pathwise solution using adaptive three operator splitting (ATOS), picking the 1se model as optimum. Warm starts are implemented.
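The 1se rule named above can be illustrated in base R. This is a minimal sketch, not the package's internal code: the matrix `fold_errors` (one row per lambda, largest lambda first, one column per fold) and the helper `pick_1se()` are hypothetical names introduced for illustration, and the usual convention is assumed that the 1se model is the sparsest model whose mean error lies within one standard error of the minimum.

```r
# Sketch of the one-standard-error (1se) rule for choosing lambda.
# Assumption: rows of `fold_errors` are lambda values (largest first),
# columns are cross-validation folds. Layout and names are illustrative.
pick_1se <- function(fold_errors) {
  nfolds   <- ncol(fold_errors)
  mean_err <- rowMeans(fold_errors)
  se_err   <- apply(fold_errors, 1, sd) / sqrt(nfolds)
  id_min   <- which.min(mean_err)
  threshold <- mean_err[id_min] + se_err[id_min]
  # smallest index (largest lambda) whose mean error is within one SE of the minimum
  id_1se <- min(which(mean_err <= threshold))
  list(id_min = id_min, id_1se = id_1se, threshold = threshold)
}

fold_errors <- rbind(
  c(5.0, 5.2, 4.8),   # lambda 1 (largest)
  c(2.1, 2.5, 1.9),   # lambda 2
  c(1.6, 2.0, 1.8),   # lambda 3
  c(1.5, 1.9, 1.7),   # lambda 4
  c(1.9, 2.3, 2.1)    # lambda 5 (smallest)
)
res <- pick_1se(fold_errors)
```

Here the minimum mean error occurs at the fourth lambda, while the 1se rule prefers the third, which is sparser and statistically indistinguishable from the minimum under this criterion.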
A list containing:
all_models |
A list of all the models fitted along the path. |
fit |
The 1se chosen model, which is a |
best_lambda |
The value of |
best_lambda_id |
The path index for the chosen model. |
errors |
A table containing fitting information about the models on the path. |
type |
Indicates which type of regression was performed. |
Bao, R., Gu, B., Huang, H. (2020). Fast OSCAR and OWL Regression via Safe Screening Rules, https://proceedings.mlr.press/v119/bao20b
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Other model-selection: as_sgs(), fit_goscar_cv(), fit_gslope_cv(), fit_sgs_cv(), scaled_sgs()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGO with cross-validation
cv_model = fit_sgo_cv(X = data$X, y = data$y, groups=groups, type = "linear", path_length = 5, nfolds=5, alpha = 0.95, min_frac = 0.05, standardise="l2", intercept=TRUE, verbose=TRUE)
Sparse-group SLOPE (SGS) main fitting function. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_sgs( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, pen_method = 1, max_iter = 5000, backtracking = 0.7, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, screen = TRUE, verbose = FALSE, w_weights = NULL, v_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
alpha |
The value of |
vFDR |
Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1. |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1. |
pen_method |
The type of penalty sequences to use (see Feser and Evangelou (2023)):
|
max_iter |
Maximum number of ATOS iterations to perform. |
backtracking |
The backtracking parameter, |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
w_weights |
Optional vector for the group penalty weights. Overrides the penalties from |
v_weights |
Optional vector for the variable penalty weights. Overrides the penalties from |
fit_sgs() fits an SGS model (Feser and Evangelou (2023)) using adaptive three operator splitting (ATOS). SGS is a sparse-group method: it selects both variables and groups. Unlike group selection approaches, it does not force every variable within a selected group to be active.
It solves the convex optimisation problem given by
where is the loss function and
are the group sizes. The penalty parameters in SGS are sorted so that the largest coefficients are matched with the largest penalties, to reduce the FDR.
For the variables:
and
.
For the groups:
and
.
In the case of the linear model, the loss function is given by the mean-squared error loss:
In the logistic model, the loss function is given by
where the log-likelihood is given by
SGS can be seen to be a convex combination of SLOPE and gSLOPE, balanced through alpha, such that it reduces to SLOPE for alpha = 0 and to gSLOPE for alpha = 1.
For the group penalties, see fit_gslope(). For the variable penalties, the vMean SGS sequence (pen_method=1) (Feser and Evangelou (2023)) is given by
where is the cumulative distribution function of a standard Gaussian distribution. The vMax SGS sequence (pen_method=2) (Feser and Evangelou (2023)) is given by
The BH SLOPE sequence (pen_method=3) (Bogdan et al. (2015)) is given by
where is the quantile function of a standard normal distribution.
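The sorting of penalties against coefficients, and the BH SLOPE sequence, can be sketched in base R. The sketch uses the standard BH form from Bogdan et al. (2015), where the i-th weight is the standard normal quantile at 1 - i*q/(2p). The helper names `bh_sequence()` and `sorted_l1()` are hypothetical, `q` is assumed to play the role of vFDR, and the package's own sequences come from gen_pens(), which may differ in scaling.

```r
# BH SLOPE sequence: v_i = qnorm(1 - i * q / (2 * p)), i = 1, ..., p.
# `q` is the target FDR level (assumed to correspond to vFDR).
bh_sequence <- function(p, q) {
  qnorm(1 - seq_len(p) * q / (2 * p))
}

# Sorted-L1 penalty: the i-th largest |beta| is paired with the i-th largest weight.
sorted_l1 <- function(beta, v) {
  sum(sort(v, decreasing = TRUE) * sort(abs(beta), decreasing = TRUE))
}

v <- bh_sequence(p = 5, q = 0.1)
beta <- c(0.5, -3, 0, 1.2, -0.1)
penalty <- sorted_l1(beta, v)
```

Because the weights are decreasing, the largest coefficient (here -3) receives the largest penalty weight, which is the matching described above.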
A list containing:
beta |
The fitted values from the regression. Taken to be the more stable fit between |
group_effects |
The group values from the regression. Taken by applying the |
selected_var |
A list containing the indices of the active/selected variables for each |
selected_grp |
A list containing the indices of the active/selected groups for each |
num_it |
Number of iterations performed. If convergence is not reached, this will be |
success |
Logical flag indicating whether ATOS converged, according to |
certificate |
Final value of convergence criteria. |
x |
The solution to the original problem (see Pedregosa and Gidel (2018)). |
z |
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)). |
u |
The solution to the dual problem (see Pedregosa and Gidel (2018)). |
screen_set_var |
List of variables that were kept after screening step for each |
screen_set_grp |
List of groups that were kept after screening step for each |
epsilon_set_var |
List of variables that were used for fitting after screening for each |
epsilon_set_grp |
List of groups that were used for fitting after screening for each |
kkt_violations_var |
List of variables that violated the KKT conditions each |
kkt_violations_grp |
List of groups that violated the KKT conditions each |
pen_slope |
Vector of the variable penalty sequence. |
pen_gslope |
Vector of the group penalty sequence. |
screen |
Logical flag indicating whether screening was performed. |
type |
Indicates which type of regression was performed. |
intercept |
Logical flag indicating whether an intercept was fit. |
lambda |
Value(s) of |
Bogdan, M., Van den Berg, E., Sabatti, C., Su, W., Candes, E. (2015). SLOPE - Adaptive variable selection via convex optimization, https://projecteuclid.org/journals/annals-of-applied-statistics/volume-9/issue-3/SLOPEAdaptive-variable-selection-via-convex-optimization/10.1214/15-AOAS842.full
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGS
model = fit_sgs(X = data$X, y = data$y, groups = groups, type="linear", path_length = 5, alpha=0.95, vFDR=0.1, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)
Function to fit a pathwise solution of sparse-group SLOPE (SGS) models using k-fold cross-validation. Supports both linear and logistic regression, both with dense and sparse matrix implementations.
fit_sgs_cv( X, y, groups, type = "linear", lambda = "path", path_length = 20, min_frac = 0.05, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, pen_method = 1, nfolds = 10, backtracking = 0.7, max_iter = 5000, max_iter_backtracking = 100, tol = 1e-05, standardise = "l2", intercept = TRUE, error_criteria = "mse", screen = TRUE, verbose = FALSE, v_weights = NULL, w_weights = NULL )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
lambda |
The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
|
path_length |
The number of |
min_frac |
Smallest value of |
alpha |
The value of |
vFDR |
Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1. |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1. |
pen_method |
The type of penalty sequences to use (see Feser and Evangelou (2023)):
|
nfolds |
The number of folds to use in cross-validation. |
backtracking |
The backtracking parameter, |
max_iter |
Maximum number of ATOS iterations to perform. |
max_iter_backtracking |
Maximum number of backtracking line search iterations to perform per global iteration. |
tol |
Convergence tolerance for the stopping criteria. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
error_criteria |
The criteria used to discriminate between models along the path. Supported values are: |
screen |
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed. |
verbose |
Logical flag for whether to print fitting information. |
v_weights |
Optional vector for the variable penalty weights. Overrides the penalties from pen_method if specified. When entering custom weights, these are multiplied internally by |
w_weights |
Optional vector for the group penalty weights. Overrides the penalties from pen_method if specified. When entering custom weights, these are multiplied internally by |
Fits SGS models under a pathwise solution using adaptive three operator splitting (ATOS), picking the 1se model as optimum. Warm starts are implemented.
A list containing:
all_models |
A list of all the models fitted along the path. |
fit |
The 1se chosen model, which is a |
best_lambda |
The value of |
best_lambda_id |
The path index for the chosen model. |
errors |
A table containing fitting information about the models on the path. |
type |
Indicates which type of regression was performed. |
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Other model-selection: as_sgs(), fit_goscar_cv(), fit_gslope_cv(), fit_sgo_cv(), scaled_sgs()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGS with cross-validation
cv_model = fit_sgs_cv(X = data$X, y = data$y, groups=groups, type = "linear", path_length = 5, nfolds=5, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, min_frac = 0.05, standardise="l2", intercept=TRUE, verbose=TRUE)
Generates variable and group penalties for SGS.
gen_pens(gFDR, vFDR, pen_method, groups, alpha)
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. |
vFDR |
Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. |
pen_method |
The type of penalty sequences to use (see Feser and Evangelou (2023)):
|
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
alpha |
The value of |
The vMean and vMax SGS sequences are variable sequences derived specifically to give variable false discovery rate (FDR) control for SGS under orthogonal designs (see Feser and Evangelou (2023)). The BH SLOPE sequence is derived in Bogdan et al. (2015) and has links to the Benjamini-Hochberg critical values. The sequence provides variable FDR-control for SLOPE under orthogonal designs. The gMean gSLOPE sequence is derived in Brzyski et al. (2019) and provides group FDR-control for gSLOPE under orthogonal designs.
A list containing:
pen_slope_org |
A vector of the variable penalty sequence. |
pen_gslope_org |
A vector of the group penalty sequence. |
Bogdan, M., Van den Berg, E., Sabatti, C., Su, W., Candes, E. (2015). SLOPE — Adaptive variable selection via convex optimization, https://projecteuclid.org/journals/annals-of-applied-statistics/volume-9/issue-3/SLOPEAdaptive-variable-selection-via-convex-optimization/10.1214/15-AOAS842.full
Brzyski, D., Gossmann, A., Su, W., Bogdan, M. (2019). Group SLOPE – Adaptive Selection of Groups of Predictors, https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1411269
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
# specify a grouping structure
groups = c(rep(1:20, each=3), rep(21:40, each=4), rep(41:60, each=5), rep(61:80, each=6), rep(81:100, each=7))
# generate sequences
sequences = gen_pens(gFDR=0.1, vFDR=0.1, pen_method=1, groups=groups, alpha=0.5)
Generates different types of datasets, which can then be fitted using sparse-group SLOPE.
gen_toy_data( p, n, rho = 0, seed_id = 2, grouped = TRUE, groups, noise_level = 1, group_sparsity = 0.1, var_sparsity = 0.5, orthogonal = FALSE, data_mean = 0, data_sd = 1, signal_mean = 0, signal_sd = sqrt(10) )
p |
The number of input variables. |
n |
The number of observations. |
rho |
Correlation coefficient. Must be in range |
seed_id |
Seed to be used to generate the data matrix |
grouped |
A logical flag indicating whether grouped data is required. |
groups |
If |
noise_level |
Defines the level of noise ( |
group_sparsity |
Defines the level of group sparsity. Must be in the range |
var_sparsity |
Defines the level of variable sparsity. Must be in the range |
orthogonal |
Logical flag as to whether the input matrix should be orthogonal. |
data_mean |
Defines the mean of input predictors. |
data_sd |
Defines the standard deviation of the input predictors. |
signal_mean |
Defines the mean of the signal ( |
signal_sd |
Defines the standard deviation of the signal ( |
The data is generated under a Gaussian linear model. The generated data can be grouped and sparsity can be provided at both a group and/or variable level.
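A Gaussian linear model with sparsity at both the group and variable level can be sketched in base R as follows. This is an illustration of the idea only, not the internal code of gen_toy_data(): the variable names, the choice of two active groups, and the 0.5 within-group inclusion probability are all assumptions made for the example.

```r
# Sketch: grouped, sparse data from a Gaussian linear model y = X beta + noise.
# All names and constants are illustrative, not taken from gen_toy_data().
set.seed(3)
n <- 50
groups <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
p <- length(groups)

# input matrix with independent standard normal entries
X <- matrix(rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)

# group-level sparsity: only two of the four groups carry signal
active_groups <- sample(unique(groups), size = 2)
in_active <- groups %in% active_groups

# variable-level sparsity: inside active groups, keep each variable with prob. 0.5
keep <- in_active & (runif(p) < 0.5)
beta <- numeric(p)
beta[keep] <- rnorm(sum(keep), mean = 0, sd = sqrt(10))

# response with unit-variance Gaussian noise
y <- as.vector(X %*% beta + rnorm(n, sd = 1))
```

Variables outside the active groups always have a zero coefficient, while variables inside an active group may or may not be active, which is the bi-level structure the fitting functions aim to recover.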
A list containing:
y |
The response vector. |
X |
The input matrix. |
true_beta |
The true values of |
true_grp_id |
Indices of which groups are non-zero in |
# specify a grouping structure
groups = c(rep(1:20, each=3), rep(21:40, each=4), rep(41:60, each=5), rep(61:80, each=6), rep(81:100, each=7))
# generate data
data = gen_toy_data(p=500, n=400, groups = groups, seed_id=3)
Plot models of the following object types: "sgs", "sgs_cv", "gslope", "gslope_cv".
Plots the pathwise solution of a cross-validation fit, from a call to one of the following: fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv().
## S3 method for class 'sgs'
plot(x, how_many = 10, ...)
x |
Object of one of the following classes: |
how_many |
Defines how many predictors to plot. Plots the predictors in decreasing order of largest absolute value. |
... |
further arguments passed to base function. |
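The effect of how_many can be sketched in base R: predictors are ranked by their largest absolute coefficient along the path and only the top ones are kept for plotting. The p x path_length layout of `beta_path` and the helper `top_predictors()` are assumptions for illustration, not the package's internal representation.

```r
# Sketch: choose which predictors to draw, ranked by largest absolute value.
# Assumption: `beta_path` has one row per predictor and one column per lambda.
top_predictors <- function(beta_path, how_many = 10) {
  max_abs <- apply(abs(beta_path), 1, max)
  ranked <- order(max_abs, decreasing = TRUE)
  # never request more predictors than exist
  ranked[seq_len(min(how_many, length(ranked)))]
}

beta_path <- rbind(
  c(0,  0.1,  0.4),
  c(0, -2.0, -2.5),
  c(0,  0.0,  0.0),
  c(0,  0.9,  1.1)
)
idx <- top_predictors(beta_path, how_many = 2)
```

The rows selected in `idx` could then be passed to a base plotting function such as matplot() to draw the coefficient paths.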
A list containing:
response |
The predicted response. In the logistic case, this represents the predicted class probabilities. |
class |
The predicted class assignments. Only returned if type = "logistic" in the model object. |
fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), predict.sgs(), print.sgs(), scaled_sgs()
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_goscar_cv(), fit_gslope(), fit_gslope_cv(), predict.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,2,2,3)
# generate data
data = gen_toy_data(p=5, n=4, groups = groups, seed_id=3, signal_mean=20, group_sparsity=1)
# run SGS
model = fit_sgs(X = data$X, y = data$y, groups=groups, type = "linear", path_length = 20, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, min_frac = 0.05, standardise="l2", intercept=TRUE, verbose=FALSE)
plot(model, how_many = 10)
Predict using one of the following object types: "sgs", "sgs_cv", "gslope", "gslope_cv".
Performs prediction from one of the following fits: fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv(). The predictions are calculated for each "lambda" value in the path.
## S3 method for class 'sgs'
predict(object, x, ...)
object |
Object of one of the following classes: |
x |
Input data to use for prediction. |
... |
further arguments passed to stats function. |
A list containing:
response |
The predicted response. In the logistic case, this represents the predicted class probabilities. |
class |
The predicted class assignments. Only returned if type = "logistic" in the model object. |
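The relationship between the two returned elements in the logistic case can be sketched in base R. This is an illustration under stated assumptions, not the method's actual code: a standard logistic link is assumed, the 0.5 cut-off for class assignment is an assumption, and `predict_logistic_sketch()` is a hypothetical helper.

```r
# Sketch: turning a linear predictor into `response` and `class`.
# Assumptions: logistic link and a 0.5 cut-off for class assignment.
predict_logistic_sketch <- function(x, beta, intercept = 0) {
  eta <- as.vector(x %*% beta) + intercept
  response <- 1 / (1 + exp(-eta))         # predicted class probabilities
  class <- ifelse(response > 0.5, 1, 0)   # predicted class assignments
  list(response = response, class = class)
}

x <- rbind(c(1, 0), c(0, 1), c(2, 2))
beta <- c(1.5, -2)
out <- predict_logistic_sketch(x, beta)
```

For a path of lambda values, the same calculation would be repeated once per column of fitted coefficients.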
fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), print.sgs(), scaled_sgs()
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_goscar_cv(), fit_gslope(), fit_gslope_cv(), plot.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,1,2,2,3,3,3,4,4)
# generate data
data = gen_toy_data(p=10, n=5, groups = groups, seed_id=3, group_sparsity=1)
# run SGS
model = fit_sgs(X = data$X, y = data$y, groups = groups, type="linear", lambda = 1, alpha=0.95, vFDR=0.1, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)
# use predict function
model_predictions = predict(model, x = data$X)
Print information for one of the following object types: "sgs", "sgs_cv", "gslope", "gslope_cv".
Prints out useful metrics from a model fit.
## S3 method for class 'sgs'
print(x, ...)
x |
Object of one of the following classes: |
... |
further arguments passed to base function. |
A summary of the model fit(s).
fit_sgs(), fit_sgs_cv(), fit_gslope(), fit_gslope_cv(), fit_sgo(), fit_sgo_cv(), fit_goscar(), fit_goscar_cv()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), scaled_sgs()
Other gSLOPE-methods: coef.sgs(), fit_goscar(), fit_goscar_cv(), fit_gslope(), fit_gslope_cv(), plot.sgs(), predict.sgs()
# specify a grouping structure
groups = c(rep(1:20, each=3), rep(21:40, each=4), rep(41:60, each=5), rep(61:80, each=6), rep(81:100, each=7))
# generate data
data = gen_toy_data(p=500, n=400, groups = groups, seed_id=3)
# run SGS
model = fit_sgs(X = data$X, y = data$y, groups = groups, type="linear", lambda = 1, alpha=0.95, vFDR=0.1, gFDR=0.1, standardise = "l2", intercept = TRUE, verbose=FALSE)
# print model
print(model)
Fits an SGS model using the noise estimation procedure (Algorithm 5 from Bogdan et al. (2015)). This estimates and then fits the model using the estimated value. It is an alternative approach to cross-validation (fit_sgs_cv()).
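The general shape of an iterative noise-estimation loop can be sketched in base R. The sketch assumes the procedure alternates between estimating the noise level from least-squares residuals on the currently selected variables and refitting a penalised model scaled by that estimate, stopping when the selected set no longer changes. `estimate_noise()`, `select_fn` and `toy_select()` are hypothetical: the toy selector is a simple marginal-score threshold standing in for an SGS fit, so this is not the implementation used by scaled_sgs().

```r
# Sketch of an iterative noise-estimation loop.
# `select_fn(X, y, sigma)` is a hypothetical stand-in for a penalised fit
# that returns the indices of the selected variables.
estimate_noise <- function(X, y, select_fn, max_rounds = 20) {
  n <- nrow(X)
  support <- integer(0)
  for (round in seq_len(max_rounds)) {
    # residual sum of squares from least squares on the current support
    if (length(support) == 0) {
      rss <- sum((y - mean(y))^2)
    } else {
      rss <- sum(resid(lm(y ~ X[, support, drop = FALSE]))^2)
    }
    sigma <- sqrt(rss / (n - length(support) - 1))
    new_support <- sort(select_fn(X, y, sigma))
    if (identical(new_support, support)) break
    support <- new_support
  }
  list(sigma = sigma, support = support)
}

# toy stand-in selector: keep variables whose marginal score exceeds 2 * sigma
toy_select <- function(X, y, sigma) {
  score <- abs(crossprod(X, y - mean(y))) / sqrt(nrow(X))
  which(as.vector(score) > 2 * sigma)
}

set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- as.vector(3 * X[, 1] + rnorm(100))
fit <- estimate_noise(X, y, toy_select)
```

The `max_rounds` cap is a safeguard added for the sketch so that the loop always terminates even if the selected set oscillates.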
scaled_sgs( X, y, groups, type = "linear", pen_method = 1, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, standardise = "l2", intercept = TRUE, verbose = FALSE )
X |
Input matrix of dimensions |
y |
Output vector of dimension |
groups |
A grouping structure for the input data. Should take the form of a vector of group indices. |
type |
The type of regression to perform. Supported values are: |
pen_method |
The type of penalty sequences to use.
|
alpha |
The value of |
vFDR |
Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1. |
gFDR |
Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1. |
standardise |
Type of standardisation to perform on
|
intercept |
Logical flag for whether to fit an intercept. |
verbose |
Logical flag for whether to print fitting information. |
An object of type "sgs" containing model fit information (see fit_sgs()).
Bogdan, M., Van den Berg, E., Sabatti, C., Su, W., Candes, E. (2015). SLOPE — Adaptive variable selection via convex optimization, https://projecteuclid.org/journals/annals-of-applied-statistics/volume-9/issue-3/SLOPEAdaptive-variable-selection-via-convex-optimization/10.1214/15-AOAS842.full
Other model-selection: as_sgs(), fit_goscar_cv(), fit_gslope_cv(), fit_sgo_cv(), fit_sgs_cv()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs()
# specify a grouping structure
groups = c(1,1,2,2,3)
# generate data
data = gen_toy_data(p=5, n=4, groups = groups, seed_id=3, signal_mean=20, group_sparsity=1, var_sparsity=1)
# run noise estimation
model = scaled_sgs(X=data$X, y=data$y, groups=groups, pen_method=1)