Title: | Adaptive Double Sparse Iterative Hard Thresholding |
---|---|
Description: | Solves high-dimensional double sparse linear regression via an iterative hard thresholding algorithm. For more details, see Zhang et al. (2024, <DOI:10.48550/arXiv.2305.04182>). |
Authors: | Yanhang Zhang [aut, cre], Zhifan Li [aut], Jianxin Yin [aut], Shixiang Liu [aut], Lingren Kong [aut] |
Maintainer: | Yanhang Zhang <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2024-12-11 19:07:04 UTC |
Source: | CRAN |
An implementation of sparse group selection in the linear regression model via ADSIHT.
ADSIHT(
  x, y, group, s0,
  kappa = 0.9,
  ic.type = c("dsic", "loss"),
  ic.scale = 3, ic.coef = 3,
  L = 5,
  weight = rep(1, nrow(x)),
  coef1 = 1, coef2 = 1,
  eta = 0.8,
  max_iter = 20,
  method = "ols"
)
x |
Input matrix, of dimension |
y |
The response variable of |
group |
A vector indicating which group each variable belongs to.
Variables in the same group should be located in adjacent columns of x. |
s0 |
A vector that controls the degree of sparsity within each group.
Default is |
kappa |
A parameter that controls how rapidly the threshold decreases. Default is 0.9. |
ic.type |
The type of criterion for choosing the support size.
Available options are "dsic" and "loss". |
ic.scale |
A non-negative value used for multiplying the penalty term in the information criterion. Default: 3. |
ic.coef |
A non-negative value used for multiplying the penalty term for choosing the optimal stopping time. Default: 3. |
L |
The length of the sequence of s0. Default: 5. |
weight |
The weight of the samples, with the default value set to 1 for each sample. |
coef1 |
A positive value to control the sub-optimal stopping time. |
coef2 |
A positive value to control the overall stopping time. A small value leads to a larger search range. |
eta |
A parameter that controls the step size in the gradient descent step. Default: 0.8. |
max_iter |
A parameter that controls the maximum number of line-search iterations, ignored if |
method |
Whether |
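Because variables in the same group should sit in adjacent columns, it can help to check a group vector before fitting. The following is a minimal base R sketch of such a check; the helper name `is_contiguous_grouping` and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package):
# TRUE when every group label forms exactly one contiguous run.
is_contiguous_grouping <- function(group) {
  runs <- rle(as.vector(group))
  !anyDuplicated(runs$values)
}

# Equal-sized groups laid out in adjacent columns: m groups of size d.
m <- 4
d <- 3
group <- rep(seq_len(m), each = d)
is_contiguous_grouping(group)

# A scattered labelling fails the check ...
scattered <- c(1, 2, 1, 2)
is_contiguous_grouping(scattered)

# ... and can be repaired by reordering the columns of the design matrix.
set.seed(1)
x <- matrix(rnorm(5 * length(scattered)), nrow = 5)
ord <- order(scattered)
x_sorted <- x[, ord]
group_sorted <- scattered[ord]
is_contiguous_grouping(group_sorted)
```

If the columns are reordered this way, any selected indices refer to the reordered matrix, so `ord` is needed to map them back to the original columns.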
A list object comprising:
beta |
A |
intercept |
A |
.
lambda |
A |
A_out |
The selected variables given threshold value in |
ic |
The values of the specified criterion for each fitted model given threshold |
Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin.
library(ADSIHT)

# Simulate a sparse group linear model, then fit ADSIHT
n <- 200
m <- 100
d <- 10
s <- 5
s0 <- 5
data <- gen.data(n, m, d, s, s0)
fit <- ADSIHT(data$x, data$y, data$group)

# Selected variables for the model that minimizes the criterion
fit$A_out[which.min(fit$ic)]
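To give a feel for what "double sparse" hard thresholding means, the base R sketch below applies an element-wise threshold followed by a group-wise screen. It is a simplified illustration, not the package's internal implementation. The function name `double_sparse_threshold`, its arguments, and the group-level rule (keep a group only when the squared norm of its surviving entries is at least `s0 * lambda^2`) are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): one double sparse
# hard-thresholding pass over a coefficient vector.
double_sparse_threshold <- function(beta, group, lambda, s0) {
  stopifnot(length(beta) == length(group), lambda >= 0, s0 > 0)

  # Step 1: element-wise hard thresholding at level lambda.
  kept <- ifelse(abs(beta) >= lambda, beta, 0)

  # Step 2: group-wise screening on the entries that survived step 1.
  group_energy <- tapply(kept^2, group, sum)
  active <- names(group_energy)[group_energy >= s0 * lambda^2]
  kept[!(as.character(group) %in% active)] <- 0

  kept
}

# Toy input: three groups of three variables each.
beta  <- c(3, 0.2, 2.5, 1.2, 0.1, 0.3, 0.9, 1.1, 0.05)
group <- rep(1:3, each = 3)
out <- double_sparse_threshold(beta, group, lambda = 1, s0 = 2)
out
```

In this toy input only the first group carries enough signal after the element-wise step, so the isolated large entries in the second and third groups are also set to zero.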
An implementation of sparse group selection in the linear regression model via ADSIHT.
ADSIHT.ML(
  x_list, y_list, group_list, s0,
  kappa = 0.9,
  ic.type = c("dsic", "loss"),
  ic.scale = 3, ic.coef = 3,
  L = 5,
  weight,
  coef1 = 1, coef2 = 1,
  eta = 0.8,
  max_iter = 20,
  method = "ols"
)
x_list |
A list of input matrices. |
y_list |
A list of response variables. |
group_list |
A vector indicating which group each variable belongs to.
Variables in the same group should be located in adjacent columns of |
s0 |
A vector that controls the degree of sparsity within each group.
Default is |
kappa |
A parameter that controls how rapidly the threshold decreases. Default is 0.9. |
ic.type |
The type of criterion for choosing the support size.
Available options are "dsic" and "loss". |
ic.scale |
A non-negative value used for multiplying the penalty term in the information criterion. Default: 3. |
ic.coef |
A non-negative value used for multiplying the penalty term for choosing the optimal stopping time. Default: 3. |
L |
The length of the sequence of s0. Default: 5. |
weight |
The weight of the samples, with the default value set to 1 for each sample. |
coef1 |
A positive value to control the sub-optimal stopping time. |
coef2 |
A positive value to control the overall stopping time. A small value leads to a larger search range. |
eta |
A parameter that controls the step size in the gradient descent step. Default: 0.8. |
max_iter |
A parameter that controls the maximum number of line-search iterations, ignored if |
method |
Whether |
A list object comprising:
beta |
A |
intercept |
A |
.
lambda |
A |
A_out |
The selected variables given threshold value in |
ic |
The values of the specified criterion for each fitted model given threshold |
Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin.
library(ADSIHT)

set.seed(1)
n <- 200
p <- 100
K <- 4
s <- 5
s0 <- 2

# One n x p design matrix per task
x_list <- lapply(1:K, function(x) matrix(rnorm(n * p, 0, 1), nrow = n))

# Stacked coefficients: variable j across the K tasks forms one group
vec <- rep(0, K * p)
non_sparse_groups <- sample(1:p, size = s, replace = FALSE)
for (group in non_sparse_groups) {
  group_indices <- seq(group, K * p, by = p)
  non_zero_indices <- sample(group_indices, size = s0, replace = FALSE)
  vec[non_zero_indices] <- rep(2, s0)
}

# Responses for each task, using that task's slice of the coefficients
y_list <- lapply(1:K, function(i) {
  x_list[[i]] %*% vec[((i - 1) * p + 1):(i * p)] + rnorm(n, 0, 0.5)
})

fit <- ADSIHT.ML(x_list, y_list)
fit$A_out[, which.min(fit$ic)]
Generates simulated data for the sparse group linear model.
gen.data(
  n, m, d, s, s0,
  cor.type = 1,
  beta.type = 1,
  rho = 0.5,
  sigma1 = 1, sigma2 = 1,
  seed = 1
)
n |
The number of observations. |
m |
The number of groups of interest. |
d |
The size of each group. Only equal-sized groups are allowed here. |
s |
The number of important groups in the underlying regression model. |
s0 |
The number of important variables in each important group. |
cor.type |
The structure of correlation.
|
beta.type |
The structure of coefficients.
|
rho |
A parameter used to characterize the pairwise correlation in predictors. Default is 0.5. |
sigma1 |
The value controlling the strength of the Gaussian noise. A large value implies strong noise. Default is 1. |
sigma2 |
The value controlling the strength of the coefficients. A large value implies large coefficients. Default is 1. |
seed |
Random seed. Default: 1. |
A list object comprising:
x |
Design matrix of predictors. |
y |
Response variable. |
beta |
The coefficients used in the underlying regression model. |
group |
The group index of each variable. |
true.group |
The important groups in the sparse group linear model. |
true.variable |
The important variables in the sparse group linear model. |
Yanhang Zhang, Zhifan Li, Jianxin Yin.
# Generate simulated data
n <- 200
m <- 100
d <- 10
s <- 5
s0 <- 5
data <- gen.data(n, m, d, s, s0)
str(data)
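Since the returned list records the truly important variables in `true.variable`, a selected index set can be scored against it. The base R sketch below uses made-up index vectors in place of real output; the helper name `selection_metrics` and the example indices are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): true positive rate and
# false discovery rate of a selected index set against the truth.
selection_metrics <- function(selected, truth) {
  selected <- unique(selected)
  truth <- unique(truth)
  tp <- length(intersect(selected, truth))
  c(
    tpr = if (length(truth) > 0) tp / length(truth) else NA_real_,
    fdr = if (length(selected) > 0) (length(selected) - tp) / length(selected) else 0
  )
}

# Made-up indices standing in for data$true.variable and a selected set.
truth    <- c(1, 2, 3, 11, 12)
selected <- c(1, 2, 3, 11, 40)
metrics <- selection_metrics(selected, truth)
metrics
```

Here four of the five true variables are recovered and one of the five selections is spurious, so the sketch reports a true positive rate of 0.8 and a false discovery rate of 0.2.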