Title: | Subgroup Identification with Latent Factor Structure |
---|---|
Description: | In various domains, many datasets exhibit both high variable dependency and group structures, which necessitates their simultaneous estimation. This package provides functions for two subgroup identification methods based on penalized functions, both of which utilize factor model structures to adapt to data with cross-sectional dependency. The first method is the Subgroup Identification with Latent Factor Structure Method (SILFSM) we proposed. By employing Center-Augmented Regularization and factor structures, the SILFSM effectively eliminates data dependencies while identifying subgroups within datasets. For this model, we offer optimization functions based on two different methods: Coordinate Descent and our newly developed Difference of Convex-Alternating Direction Method of Multipliers (DC-ADMM) algorithms; the latter can be applied to cases where the distance function in Center-Augmented Regularization takes L1 and L2 forms. The other method is the Factor-Adjusted Pairwise Fusion Penalty (FA-PFP) model, which incorporates factor augmentation into the Pairwise Fusion Penalty (PFP) developed by Ma, S. and Huang, J. (2017) <doi:10.1080/01621459.2016.1148039>. Additionally, we provide a function for the Standard CAR (S-CAR) method, which does not consider the dependency and is for comparative analysis with other approaches. Furthermore, functions based on the Bayesian Information Criterion (BIC) of the SILFSM and the FA-PFP method are also included in 'SILFS' for selecting tuning parameters. For more details of Subgroup Identification with Latent Factor Structure Method, please refer to He et al. (2024) <doi:10.48550/arXiv.2407.00882>. |
Authors: | Yong He [aut], Dong Liu [aut], Fuxin Wang [aut, cre], Mingjuan Zhang [aut], Wenxin Zhou [aut] |
Maintainer: | Fuxin Wang <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-01 08:09:54 UTC |
Source: | CRAN |
This function is to select tuning parameters simultaneously for FA-PFP method via minimizing the BIC.
BIC_PFP( Y, Fhat, Uhat, alpha_init, lasso_start, lasso_stop, lam_start, lam_stop, grid_1, grid_2, epsilon )
BIC_PFP( Y, Fhat, Uhat, alpha_init, lasso_start, lasso_stop, lam_start, lam_stop, grid_1, grid_2, epsilon )
Y |
The response vector of length |
Fhat |
The estimated common factors matrix of size |
Uhat |
The estimated idiosyncratic factors matrix of size |
alpha_init |
The initialization of intercept parameter. |
lasso_start |
The user-supplied start search value of the tuning parameters for LASSO. |
lasso_stop |
The user-supplied stop search value of the tuning parameters for LASSO. |
lam_start |
The user-supplied start search value of the tuning parameters for Pairwise Fusion Penalty. |
lam_stop |
The user-supplied stop search value of the tuning parameters for Pairwise Fusion Penalty. |
grid_1 |
The user-supplied number of search grid points corresponding to the LASSO tuning parameter. |
grid_2 |
The user-supplied number of search grid points corresponding to the tuning parameter for Pairwise Fusion Penalty. |
epsilon |
The user-supplied stopping tolerance. |
A list with the following components:
lasso |
The tuning parameter of the LASSO penalty selected using BIC. |
lambda |
The tuning parameter of the Pairwise Concave Fusion Penalty selected using BIC. |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
n <- 50 p <- 50 r <- 3 lasso_start <- sqrt(log(p)/n)*0.1 lasso_stop <- sqrt(log(p)/n) lam_start <- 0.3 lam_stop <- 1 grid_1 <- 5 grid_2 <- 5 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) BIC_PFP(Y,F_1,U,alpha_init,lasso_start,lasso_stop,lam_start,lam_stop,grid_1,grid_2,0.3)
n <- 50 p <- 50 r <- 3 lasso_start <- sqrt(log(p)/n)*0.1 lasso_stop <- sqrt(log(p)/n) lam_start <- 0.3 lam_stop <- 1 grid_1 <- 5 grid_2 <- 5 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) BIC_PFP(Y,F_1,U,alpha_init,lasso_start,lasso_stop,lam_start,lam_stop,grid_1,grid_2,0.3)
This function is to select tuning parameters simultaneously for SILFS method via minimizing the BIC.
BIC_SILFS( Y, Fhat, Uhat, K, alpha_init, lasso_start, lasso_stop, CAR_start, CAR_stop, grid_1, grid_2, epsilon )
BIC_SILFS( Y, Fhat, Uhat, K, alpha_init, lasso_start, lasso_stop, CAR_start, CAR_stop, grid_1, grid_2, epsilon )
Y |
The response vector of length |
Fhat |
The estimated common factors matrix of size |
Uhat |
The estimated idiosyncratic factors matrix of size |
K |
The estimated subgroup number. |
alpha_init |
The initialization of intercept parameter. |
lasso_start |
The user-supplied start search value of the tuning parameters for LASSO. |
lasso_stop |
The user-supplied stop search value of the tuning parameters for LASSO. |
CAR_start |
The user-supplied start search value of the tuning parameters for Center-Augmented Regularization. |
CAR_stop |
The user-supplied stop search value of the tuning parameters for Center-Augmented Regularization. |
grid_1 |
The user-supplied number of search grid points corresponding to the LASSO tuning parameter. |
grid_2 |
The user-supplied number of search grid points corresponding to the tuning parameter for Center-Augmented Regularization. |
epsilon |
The user-supplied stopping tolerance. |
A list with the following components:
lasso |
The tuning parameter of the LASSO penalty selected using BIC. |
CAR |
The tuning parameter of the Center Augmented Regularization selected using BIC. |
n <- 50 p <- 50 r <- 3 K <- 2 lasso_start <- sqrt(log(p)/n)*0.01 lasso_stop <- sqrt(log(p)/n)*10^(0.5) CAR_start <- 0.001 CAR_stop <- 0.1 grid_1 <- 5 grid_2 <- 5 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) BIC_SILFS(Y,F_1,U,K,alpha_init,lasso_start,lasso_stop,CAR_start,CAR_stop,grid_1,grid_2,0.3)
n <- 50 p <- 50 r <- 3 K <- 2 lasso_start <- sqrt(log(p)/n)*0.01 lasso_stop <- sqrt(log(p)/n)*10^(0.5) CAR_start <- 0.001 CAR_stop <- 0.1 grid_1 <- 5 grid_2 <- 5 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) BIC_SILFS(Y,F_1,U,K,alpha_init,lasso_start,lasso_stop,CAR_start,CAR_stop,grid_1,grid_2,0.3)
This function employs SILFS method and uses the corresponding Difference of Convex functions-Alternating Direction Method of Multipliers (DC-ADMM) algorithm for optimization to identify subgroup structures and conduct variable selection under the L1 Distance.
DCADMM_iter_l1( Y, F_hat, U_hat, r_1, r_2, r_3, lambda_1, lambda_2, K, alpha_init, epsilon_1, epsilon_2 )
DCADMM_iter_l1( Y, F_hat, U_hat, r_1, r_2, r_3, lambda_1, lambda_2, K, alpha_init, epsilon_1, epsilon_2 )
Y |
The response vector of length |
F_hat |
The estimated factor matrix of size |
U_hat |
The estimated idiosyncratic factors matrix of size |
r_1 |
The Lagrangian augmentation parameter for constraints of intercepts. |
r_2 |
The Lagrangian augmentation parameter for constraints of group centers. |
r_3 |
The Lagrangian augmentation parameter for constraints of coefficients. |
lambda_1 |
The tuning parameter for Center-Augmented Regularization. |
lambda_2 |
The tuning parameter for LASSO. |
K |
The estimated group number. |
alpha_init |
The initialization of intercept parameter. |
epsilon_1 |
The user-supplied stopping error for outer loop. |
epsilon_2 |
The user-supplied stopping error for inner loop. |
A list with the following components:
alpha_curr |
The estimated intercept parameter vector of length |
gamma_curr |
The estimated vector of subgroup centers of length |
theta_curr |
The estimated regression coefficient vector, matched with common factor terms, with a dimension of |
beta_curr |
The estimated regression coefficients matched with idiosyncratic factors, with a dimension of |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
He, Y., Liu, D., Wang, F., Zhang, M., Zhou, W., 2024. High-Dimensional Subgroup Identification under Latent Factor Structures.
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) DCADMM_iter_l1(Y,F_1,U,0.5,0.5,0.5,0.01,0.05,K,alpha_init,1,0.3)
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) DCADMM_iter_l1(Y,F_1,U,0.5,0.5,0.5,0.01,0.05,K,alpha_init,1,0.3)
This function employs SILFS method and uses the corresponding Difference of Convex functions-Alternating Direction Method of Multipliers (DC-ADMM) algorithm for optimization to identify subgroup structures and conduct variable selection under the L2 Distance.
DCADMM_iter_l2( Y, F_hat, U_hat, r_1, r_2, r_3, lambda_1, lambda_2, K, alpha_init, epsilon_1, epsilon_2 )
DCADMM_iter_l2( Y, F_hat, U_hat, r_1, r_2, r_3, lambda_1, lambda_2, K, alpha_init, epsilon_1, epsilon_2 )
Y |
The response vector of length |
F_hat |
The estimated factor matrix of size |
U_hat |
The estimated idiosyncratic factors matrix of size |
r_1 |
The Lagrangian augmentation parameter for constraints of intercepts. |
r_2 |
The Lagrangian augmentation parameter for constraints of group centers. |
r_3 |
The Lagrangian augmentation parameter for constraints of coefficients. |
lambda_1 |
The tuning parameter for Center-Augmented Regularization. |
lambda_2 |
The tuning parameter for LASSO. |
K |
The estimated group number. |
alpha_init |
The initialization of intercept parameter. |
epsilon_1 |
The user-supplied stopping error for outer loop. |
epsilon_2 |
The user-supplied stopping error for inner loop. |
A list with the following components:
alpha_curr |
The estimated intercept parameter vector of length |
gamma_curr |
The estimated vector of subgroup centers of length |
theta_curr |
The estimated regression coefficient vector, matched with common factor terms, with a dimension of |
beta_curr |
The estimated regression coefficients matched with idiosyncratic factors, with a dimension of |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
He, Y., Liu, D., Wang, F., Zhang, M., Zhou, W., 2024. High-Dimensional Subgroup Identification under Latent Factor Structures.
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) DCADMM_iter_l2(Y,F_1,U,0.5,0.5,0.5,0.01,0.05,K,alpha_init,1,0.3)
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) DCADMM_iter_l2(Y,F_1,U,0.5,0.5,0.5,0.01,0.05,K,alpha_init,1,0.3)
This function utilizes the FA-PFP method implemented via the Alternating Direction Method of Multipliers (ADMM) algorithm to identify subgroup structures and conduct variable selection.
FA_PFP(Y, Fhat, Uhat, vartheta, lam, gam, alpha_init, lam_lasso, epsilon)
FA_PFP(Y, Fhat, Uhat, vartheta, lam, gam, alpha_init, lam_lasso, epsilon)
Y |
The response vector of length |
Fhat |
The estimated common factors matrix of size |
Uhat |
The estimated idiosyncratic factors matrix of size |
vartheta |
The Lagrangian augmentation parameter for intercepts. |
lam |
The tuning parameter for Pairwise Fusion Penalty. |
gam |
The user-supplied parameter for Alternating Direction Method of Multipliers (ADMM) algorithm. |
alpha_init |
The initialization of intercept parameter. |
lam_lasso |
The tuning parameter for LASSO. |
epsilon |
The user-supplied stopping tolerance. |
A list with the following components:
alpha_m |
The estimated intercept parameter vector of length |
theta_m |
The estimated regression coefficient vector, matched with common factor terms, with a dimension of |
beta_m |
The estimated regression coefficients matched with idiosyncratic factors, with a dimension of |
eta_m |
A numeric matrix storing the pairwise differences of the estimated intercepts, with size of |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
Ma, S., Huang, J., 2017. A concave pairwise fusion approach to subgroup analysis.
n <- 50 p <- 50 r <- 3 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) FA_PFP(Y,F_1,U,1,0.67,3,alpha_init,0.05,0.3)
n <- 50 p <- 50 r <- 3 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) FA_PFP(Y,F_1,U,1,0.67,3,alpha_init,0.05,0.3)
This function computes initial values for intercept parameter by solving a ridge regression problem.
INIT(Y, X, lam_ridge)
INIT(Y, X, lam_ridge)
Y |
The response vector of length |
X |
The design matrix of size |
lam_ridge |
The tuning parameter for ridge regression. |
A numeric vector of length , representing the initial estimation for intercept parameter.
n <- 100 p <- 100 beta <- rep(1,p) X <- matrix(rnorm(100*100), n, p) Y <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) + X%*%beta lam_ridge <- 0.1 alpha_init <- INIT(Y, X, lam_ridge)
n <- 100 p <- 100 beta <- rep(1,p) X <- matrix(rnorm(100*100), n, p) Y <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) + X%*%beta lam_ridge <- 0.1 alpha_init <- INIT(Y, X, lam_ridge)
This function employs the S-CAR method under L2 distance and uses the Coordinate Descent Algorithm for optimization to identify subgroup structures and execute variable selection.
SCAR(Y, X, lam_CAR, lam_lasso, alpha_init, K, epsilon)
SCAR(Y, X, lam_CAR, lam_lasso, alpha_init, K, epsilon)
Y |
The response vector of length |
X |
The design matrix of size |
lam_CAR |
The tuning parameter for Center-Augmented Regularization. |
lam_lasso |
The tuning parameter for lasso. |
alpha_init |
The initialization of intercept parameter. |
K |
The estimated group number. |
epsilon |
The user-supplied stopping tolerance. |
A list with the following components:
alpha_m |
The estimated intercept parameter vector of length |
gamma |
The estimated vector of subgroup centers of length |
beta_m |
The estimated regression coefficient vector of dimension |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,X,0.1) SCAR(Y,X,0.01,0.05,alpha_init,K,0.3)
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,X,0.1) SCAR(Y,X,0.01,0.05,alpha_init,K,0.3)
This function employs SILFS method under L2 distance and uses the Coordinate Descent Algorithm for optimization to effectively identify subgroup structures and perform variable selection.
SILFS(Y, X_aug, r, lam_CAR, lam_lasso, alpha_init, K, epsilon)
SILFS(Y, X_aug, r, lam_CAR, lam_lasso, alpha_init, K, epsilon)
Y |
The response vector of length |
X_aug |
The augmented design matrix created by row concatenation of common and idiosyncratic factor matrices, with a size of |
r |
The user supplied number of common factors. |
lam_CAR |
The tuning parameter for Center-Augmented Regularization. |
lam_lasso |
The tuning parameter for LASSO. |
alpha_init |
The initialization of intercept parameter. |
K |
The user-supplied group number. |
epsilon |
The user-supplied stopping tolerance. |
A vector containing the following components:
alpha_m |
The estimated intercept parameter vector of length |
gamma |
The estimated vector of subgroup centers of length |
theta_m |
The estimated regression coefficient vector, matched with common factor terms, with a dimension of |
beta_m |
The estimated regression coefficients matched with idiosyncratic factors, with a dimension of |
Yong He, Liu Dong, Fuxin Wang, Mingjuan Zhang, Wenxin Zhou.
He, Y., Liu, D., Wang, F., Zhang, M., Zhou, W., 2024. High-Dimensional Subgroup Identification under Latent Factor Structures.
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) SILFS(Y,cbind(F_1,U),3,0.01,0.05,alpha_init,K,0.3)
n <- 50 p <- 50 r <- 3 K <- 2 alpha <- sample(c(-3,3),n,replace=TRUE,prob=c(1/2,1/2)) beta <- c(rep(1,2),rep(0,48)) B <- matrix((rnorm(p*r,1,1)),p,r) F_1 <- matrix((rnorm(n*r,0,1)),n,r) U <- matrix(rnorm(p*n,0,0.1),n,p) X <- F_1%*%t(B)+U Y <- alpha + X%*%beta + rnorm(n,0,0.5) alpha_init <- INIT(Y,F_1,0.1) SILFS(Y,cbind(F_1,U),3,0.01,0.05,alpha_init,K,0.3)