Title: | Robust Gene-Environment Interaction Analysis |
---|---|
Description: | Description: For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The developed package consists of five robust approaches to address the outliers problems, among which two approaches can also accommodate missingness in environmental factors. Both continuous and right censored responses are considered. The proposed approaches are based on penalization and sparse boosting techniques for identifying important interactions, which are realized using efficient algorithms. Beyond the gene-environment analysis, the developed package can also be adopted to conduct analysis on interactions between other types of low-dimensional and high-dimensional data. (Mengyun Wu et al (2017), <doi:10.1080/00949655.2018.1523411>; Mengyun Wu et al (2017), <doi:10.1002/gepi.22055>; Yaqing Xu et al (2018), <doi:10.1080/00949655.2018.1523411>; Yaqing Xu et al (2019), <doi:10.1016/j.ygeno.2018.07.006>; Mengyun Wu et al (2021), <doi:10.1093/bioinformatics/btab318>). |
Authors: | Mengyun Wu [aut], Xing Qin [aut, cre], Shuangge Ma [aut] |
Maintainer: | Xing Qin <[email protected]> |
License: | GPL-2 |
Version: | 0.3.2 |
Built: | 2024-12-02 06:49:39 UTC |
Source: | CRAN |
The covariance matrix with an AR structure among variables, where the marginal variances are 1 and the j
th and k
th variables have correlation coefficient rho^abs(j-k)
.
AR(rho, p)
AR(rho, p)
rho |
The correlation coefficient indicating the AR relationship between the variables. |
p |
The dimension of variables. |
A covariance matrix.
We consider the scenario with missingness in environmental (E) measurements. Our approach
consists of two steps. We first develop a nonparametric kernel-based data augmentation
approach to accommodate missingness. Then, we adopt a penalization approach BLMCP
for regularized estimation and selection of important interactions and main genetic (G) effects,
where the "main effects-interactions" hierarchical structure is respected.
As E variables are usually preselected and have a low dimension, selection is not conducted on E
variables. With a well-designed weighting scheme, a nice "byproduct" is that the proposed
approach enjoys a certain robustness property.
Augmented.data(G, E, Y, h, family = c("continuous", "survival"), E_type)
Augmented.data(G, E, Y, h, family = c("continuous", "survival"), E_type)
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
h |
The bandwidths of the kernel functions with the first and second elements corresponding to the discrete and continuous E factors. |
family |
Response type of |
E_type |
A vector indicating the type of each E factor, with "ED" representing discrete E factor, and "EC" representing continuous E factor. |
E_w |
The augmented data corresponding to |
G_w |
The augmented data corresponding to |
y_w |
The augmented data corresponding to response |
weight |
The weights of the augmented observation data for accommodating missingness and also
right censoring if |
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing
Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
set.seed(100) sigmaG=AR(0.3,50) G=MASS::mvrnorm(100,rep(0,50),sigmaG) E=matrix(rnorm(100*5),100,5) E[,2]=E[,2]>0 E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,50) beta[1,1:7]=runif(7,2,3) beta[2:4,1]=runif(3,2,3) beta[2:3,2]=runif(2,2,3) beta[5,3]=runif(1,2,3) # continuous with Normal error N(0,4) y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=rnorm(100,0,4),family="continuous") # survival with Normal error N(0,1) y2=simulated_data(G,E,alpha,beta,rnorm(100,0,1),family="survival",0.7,0.9) # generate E measurements with missingness miss_label1=c(2,6,8,15) miss_label2=c(4,6,8,16) E1=E2=E;E1[miss_label1,1]=NA;E2[miss_label2,1]=NA # continuous data_new1<-Augmented.data(G,E1,y1,h=c(0.5,1), family="continuous", E_type=c("EC","ED","ED","EC","EC")) fit1<-BLMCP(data_new1$G_w, data_new1$E_w, data_new1$y_w, data_new1$weight, lambda1=0.025,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E[c(1,2),],G[c(1,2),]) plot(fit1) ## survival data_new2<-Augmented.data(G,E2,y2, h=c(0.5,1), family="survival", E_type=c("EC","ED","ED","EC","EC")) fit2<-BLMCP(data_new2$G_w, data_new2$E_w, data_new2$y_w, data_new2$weight, lambda1=0.04,lambda2=0.05,gamma1=3,gamma2=3,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E[c(1,2),],G[c(1,2),]) plot(fit2)
set.seed(100) sigmaG=AR(0.3,50) G=MASS::mvrnorm(100,rep(0,50),sigmaG) E=matrix(rnorm(100*5),100,5) E[,2]=E[,2]>0 E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,50) beta[1,1:7]=runif(7,2,3) beta[2:4,1]=runif(3,2,3) beta[2:3,2]=runif(2,2,3) beta[5,3]=runif(1,2,3) # continuous with Normal error N(0,4) y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=rnorm(100,0,4),family="continuous") # survival with Normal error N(0,1) y2=simulated_data(G,E,alpha,beta,rnorm(100,0,1),family="survival",0.7,0.9) # generate E measurements with missingness miss_label1=c(2,6,8,15) miss_label2=c(4,6,8,16) E1=E2=E;E1[miss_label1,1]=NA;E2[miss_label2,1]=NA # continuous data_new1<-Augmented.data(G,E1,y1,h=c(0.5,1), family="continuous", E_type=c("EC","ED","ED","EC","EC")) fit1<-BLMCP(data_new1$G_w, data_new1$E_w, data_new1$y_w, data_new1$weight, lambda1=0.025,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E[c(1,2),],G[c(1,2),]) plot(fit1) ## survival data_new2<-Augmented.data(G,E2,y2, h=c(0.5,1), family="survival", E_type=c("EC","ED","ED","EC","EC")) fit2<-BLMCP(data_new2$G_w, data_new2$E_w, data_new2$y_w, data_new2$weight, lambda1=0.04,lambda2=0.05,gamma1=3,gamma2=3,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E[c(1,2),],G[c(1,2),]) plot(fit2)
Selects a point along the regularization path of a fitted BLMCP
object according to
the BIC.
bic.BLMCP( G, E, Y, weight = NULL, lambda1_set = NULL, lambda2_set = NULL, nlambda1 = 20, nlambda2 = 20, gamma1 = 6, gamma2 = 6, max_iter = 200 )
bic.BLMCP( G, E, Y, weight = NULL, lambda1_set = NULL, lambda2_set = NULL, nlambda1 = 20, nlambda2 = 20, gamma1 = 6, gamma2 = 6, max_iter = 200 )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for continuous response. For survival response, |
weight |
Observation weights. |
lambda1_set |
A user supplied lambda sequence for group minimax concave penalty (MCP), where each main G effect and its corresponding interactions are regarded as a group. |
lambda2_set |
A user supplied lambda sequence for MCP accommodating interaction selection. |
nlambda1 |
The number of lambda1 values. |
nlambda2 |
The number of lambda2 values. |
gamma1 |
The regularization parameter of the group MCP penalty. |
gamma2 |
The regularization parameter of the MCP penalty. |
max_iter |
Maximum number of iterations. |
An object with S3 class "bic.BLMCP"
is returned, which is a list with the ingredients of the BIC fit.
call |
The call that produced this object. |
alpha |
The matrix of the coefficients for main E effects, each column corresponds to one combination of (lambda1,lambda2). |
beta |
The coefficients for main G effects and G-E interactions, each column corresponds to
one combination of (lambda1,lambda2). For each column, the first element is the first G effect and
the second to ( |
df |
The number of nonzeros for each value of (lambda1,lambda2). |
BIC |
Bayesian Information Criterion for each value of (lambda1,lambda2). |
alpha_estimate |
Final alpha estimate using Bayesian Information Criterion. |
beta_estimate |
Final beta estimate using Bayesian Information Criterion. |
lambda_combine |
The matrix of (lambda1, lambda2), with the first column being the values of lambda1, the second being the values of lambda2. |
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing
Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
predict
, coef
and plot
methods,
and the BLMCP
function.
set.seed(100) sigmaG=AR(0.3,50) G=MASS::mvrnorm(150,rep(0,50),sigmaG) E=matrix(rnorm(150*5),150,5) E[,2]=E[,2]>0;E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,50);beta[1,1:8]=runif(8,2,3) beta[2:4,1]=runif(3,2,3) beta[2:3,2]=runif(2,2,3) beta[5,3]=runif(1,2,3) # continuous with Normal error y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=rnorm(150),family="continuous") # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1),family="survival",0.8,1) # continuous fit1<-bic.BLMCP(G,E,y1,weight=NULL,lambda1_set=NULL,lambda2_set=NULL, nlambda1=10,nlambda2=10,gamma1=6,gamma2=6,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E,G) plot(fit1) ## survival fit2<-bic.BLMCP(G,E,y2,weight=NULL,lambda1_set=NULL,lambda2_set=NULL, nlambda1=20,nlambda2=20,gamma1=6,gamma2=6,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E,G) plot(fit2)
set.seed(100) sigmaG=AR(0.3,50) G=MASS::mvrnorm(150,rep(0,50),sigmaG) E=matrix(rnorm(150*5),150,5) E[,2]=E[,2]>0;E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,50);beta[1,1:8]=runif(8,2,3) beta[2:4,1]=runif(3,2,3) beta[2:3,2]=runif(2,2,3) beta[5,3]=runif(1,2,3) # continuous with Normal error y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=rnorm(150),family="continuous") # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1),family="survival",0.8,1) # continuous fit1<-bic.BLMCP(G,E,y1,weight=NULL,lambda1_set=NULL,lambda2_set=NULL, nlambda1=10,nlambda2=10,gamma1=6,gamma2=6,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E,G) plot(fit1) ## survival fit2<-bic.BLMCP(G,E,y2,weight=NULL,lambda1_set=NULL,lambda2_set=NULL, nlambda1=20,nlambda2=20,gamma1=6,gamma2=6,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E,G) plot(fit2)
Selects a point along the regularization path of a fitted PTReg
object according to
the BIC.
bic.PTReg( G, E, Y, lambda1_set, lambda2_set, gamma1, gamma2, max_init, h = NULL, tau = 0.4, mu = 2.5, family = c("continuous", "survival") )
bic.PTReg( G, E, Y, lambda1_set, lambda2_set, gamma1, gamma2, max_init, h = NULL, tau = 0.4, mu = 2.5, family = c("continuous", "survival") )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
lambda1_set |
A user supplied lambda sequence for minimax concave penalty (MCP) accommodating main G effect selection. |
lambda2_set |
A user supplied lambda sequence for MCP accommodating interaction selection. |
gamma1 |
The regularization parameter of the MCP penalty corresponding to G effects. |
gamma2 |
The regularization parameter of the MCP penalty corresponding to G-E interactions. |
max_init |
The number of initializations. |
h |
The number of the trimmed samples if the parameter |
tau |
The threshold value used in stability selection. |
mu |
The parameter for screening outliers with extreme absolute residuals if the number of
the trimmed samples |
family |
Response type of |
An object with S3 class "bic.PTReg"
is returned, which is a list with the ingredients of the BIC fit.
call |
The call that produced this object. |
alpha |
The matrix of the coefficients for main E effects, each column corresponds to one combination of (lambda1,lambda2). |
beta |
The coefficients for main G effects and G-E interactions, each column corresponds to
one combination of (lambda1,lambda2). For each column, the first element is the first G effect and
the second to ( |
intercept |
Matrix of the intercept estimate, each column corresponds to one combination of (lambda1,lambda2). |
df |
The number of nonzeros for each value of (lambda1,lambda2). |
BIC |
Bayesian Information Criterion for each value of (lambda1,lambda2). |
family |
The same as input |
intercept_estimate |
Final intercept estimate using Bayesian Information Criterion. |
alpha_estimate |
Final alpha estimate using Bayesian Information Criterion. |
beta_estimate |
Final beta estimate using Bayesian Information Criterion. |
lambda_combine |
Matrix of (lambda1, lambda2), with the first column being the values of lambda1, the second being the values of lambda2. |
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
sigmaG<-AR(rho=0.3,p=30) sigmaE<-AR(rho=0.3,p=3) set.seed(300) G=MASS::mvrnorm(150,rep(0,30),sigmaG) EC=MASS::mvrnorm(150,rep(0,2),sigmaE[1:2,1:2]) ED = matrix(rbinom((150),1,0.6),150,1) E=cbind(EC,ED) alpha=runif(3,0.8,1.5) beta=matrix(0,4,30) beta[1,1:4]=runif(4,1,1.5) beta[2,c(1,2)]=runif(2,1,1.5) lambda1_set=lambda2_set=c(0.2,0.25,0.3,0.35,0.4,0.5) #continuous response with outliers/contaminations in response variable y1=simulated_data(G,E,alpha,beta,error=c(rnorm(140),rcauchy(10,0,5)),family="continuous") fit1<-bic.PTReg(G,E,y1,lambda1_set,lambda2_set,gamma1=6,gamma2=6, max_init=50,tau=0.6,mu=2.5,family="continuous") coefficients1=coefficients(fit1) y_predict=predict(fit1,E,G) plot(fit1) # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1),family="survival",0.7,0.9) fit2<-bic.PTReg(G,E,y2,lambda1_set,lambda2_set,gamma1=6,gamma2=6, max_init=50,tau=0.6,mu=2.5,family="survival") coefficients2=coefficients(fit2) y_predict=predict(fit2,E,G) plot(fit2)
sigmaG<-AR(rho=0.3,p=30) sigmaE<-AR(rho=0.3,p=3) set.seed(300) G=MASS::mvrnorm(150,rep(0,30),sigmaG) EC=MASS::mvrnorm(150,rep(0,2),sigmaE[1:2,1:2]) ED = matrix(rbinom((150),1,0.6),150,1) E=cbind(EC,ED) alpha=runif(3,0.8,1.5) beta=matrix(0,4,30) beta[1,1:4]=runif(4,1,1.5) beta[2,c(1,2)]=runif(2,1,1.5) lambda1_set=lambda2_set=c(0.2,0.25,0.3,0.35,0.4,0.5) #continuous response with outliers/contaminations in response variable y1=simulated_data(G,E,alpha,beta,error=c(rnorm(140),rcauchy(10,0,5)),family="continuous") fit1<-bic.PTReg(G,E,y1,lambda1_set,lambda2_set,gamma1=6,gamma2=6, max_init=50,tau=0.6,mu=2.5,family="continuous") coefficients1=coefficients(fit1) y_predict=predict(fit1,E,G) plot(fit1) # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1),family="survival",0.7,0.9) fit2<-bic.PTReg(G,E,y2,lambda1_set,lambda2_set,gamma1=6,gamma2=6, max_init=50,tau=0.6,mu=2.5,family="survival") coefficients2=coefficients(fit2) y_predict=predict(fit2,E,G) plot(fit2)
The joint gene-environment (G-E) interaction analysis approach developed in Liu et al, 2013. To accommodate "main effects, interactions" hierarchy, two types of penalty, group minimax concave penalty (MCP) and MCP are adopted. Specifically, for each G factor, its main effect and corresponding G-E interactions are regarded as a group, where the group MCP is imposed to identify whether this G factor has any effect at all. In addition, the MCP is imposed on the interaction terms to further identify important interactions.
BLMCP( G, E, Y, weight = NULL, lambda1, lambda2, gamma1 = 6, gamma2 = 6, max_iter = 200 )
BLMCP( G, E, Y, weight = NULL, lambda1, lambda2, gamma1 = 6, gamma2 = 6, max_iter = 200 )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for continuous response. For survival response, |
weight |
Observation weights. |
lambda1 |
A user supplied lambda for group MCP, where each main G effect and its corresponding interactions are regarded as a group. |
lambda2 |
A user supplied lambda for MCP accommodating interaction selection. |
gamma1 |
The regularization parameter of the group MCP penalty. |
gamma2 |
The regularization parameter of the MCP penalty. |
max_iter |
Maximum number of iterations. |
An object with S3 class "BLMCP"
is returned, which is a list with the following components.
call |
The call that produced this object. |
alpha |
The matrix of the coefficients for main E effects. |
beta |
The matrix of the regression coefficients for all main G effects (the first row) and interactions. |
df |
The number of nonzeros. |
BIC |
Bayesian Information Criterion. |
aa |
The indicator representing whether the algorithm reaches convergence. |
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang,
Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
predict
, and coef
, and plot
, and bic.BLMCP
and
Augmentated.data
methods.
set.seed(100) sigmaG=AR(0.3,100) G=MASS::mvrnorm(250,rep(0,100),sigmaG) E=matrix(rnorm(250*5),250,5) E[,2]=E[,2]>0;E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,100);beta[1,1:8]=runif(8,2,3) beta[2:4,1]=runif(3,2,3);beta[2:3,2]=runif(2,2,3);beta[5,3]=runif(1,2,3) # continuous with Normal error y1=simulated_data(G,E,alpha,beta,error=rnorm(250),family="continuous") fit1<-BLMCP(G,E,y1,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E,G) plot(fit1) # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(250,0,1),family="survival",0.7,0.9) fit2<-BLMCP(G,E,y2,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E,G) plot(fit2)
set.seed(100) sigmaG=AR(0.3,100) G=MASS::mvrnorm(250,rep(0,100),sigmaG) E=matrix(rnorm(250*5),250,5) E[,2]=E[,2]>0;E[,3]=E[,3]>0 alpha=runif(5,2,3) beta=matrix(0,5+1,100);beta[1,1:8]=runif(8,2,3) beta[2:4,1]=runif(3,2,3);beta[2:3,2]=runif(2,2,3);beta[5,3]=runif(1,2,3) # continuous with Normal error y1=simulated_data(G,E,alpha,beta,error=rnorm(250),family="continuous") fit1<-BLMCP(G,E,y1,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef1=coef(fit1) y1_hat=predict(fit1,E,G) plot(fit1) # survival with Normal error y2=simulated_data(G,E,alpha,beta,rnorm(250,0,1),family="survival",0.7,0.9) fit2<-BLMCP(G,E,y2,weight=NULL,lambda1=0.05,lambda2=0.06,gamma1=3,gamma2=3,max_iter=200) coef2=coef(fit2) y2_hat=predict(fit2,E,G) plot(fit2)
This function extracts the coefficients of main effects and interactions
from a BIC BLMCP model, using the stored "bic.BLMCP"
object.
## S3 method for class 'bic.BLMCP' coef(object, ...)
## S3 method for class 'bic.BLMCP' coef(object, ...)
object |
Fitted |
... |
Not used. Other arguments to get coefficients. |
The object returned depends on the ... argument which is passed on to the coef
method for bic.BLMCP
objects.
alpha |
The matrix of the coefficients for main environmental effects. |
beta |
The matrix of the regression coefficients for all main genetic effects (the first row) and interactions. |
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing
Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
bic.BLMCP
, and predict
, and plot
methods, and the
BLMCP
function.
This function extracts the coefficients of main effects and interactions from a BIC PTReg model,
using the stored "bic.PTReg"
object.
## S3 method for class 'bic.PTReg' coef(object, ...)
## S3 method for class 'bic.PTReg' coef(object, ...)
object |
Fitted "bic.PTReg" model object. |
... |
Not used. Other arguments to get coefficients. |
The object returned depends on the ... argument which is passed on to the coef
method for bic.PTReg
objects.
intercept |
The intercept estimate. |
alpha |
The matrix of the coefficients for main environmental effects. |
beta |
The matrix of the regression coefficients for all main genetic effects (the first row) and interactions. |
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
bic.PTReg
, and predict
, and plot
methods, and
PTReg
.
This function extracts the coefficients of main effects and interactions from a BLMCP model,
using the stored "BLMCP"
object.
## S3 method for class 'BLMCP' coef(object, ...)
## S3 method for class 'BLMCP' coef(object, ...)
object |
Fitted |
... |
Not used. Other arguments to get coefficients. |
The object returned depends on the ... argument which is passed on to the coef
method for BLMCP
objects.
alpha |
The matrix of the coefficients for main environmental effects. |
beta |
The matrix of the regression coefficients for all main genetic effects (the first row) and interactions. |
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
BLMCP
, and predict
, plot
methods, and
bic.BLMCP
.
This function extracts main effect and interaction coefficients from a PTReg model, using the
stored "PTReg"
object.
## S3 method for class 'PTReg' coef(object, ...)
## S3 method for class 'PTReg' coef(object, ...)
object |
Fitted |
... |
Not used. Other arguments to get coefficients. |
The object returned depends on the ... argument which is passed on to the coef
method for PTReg
objects.
intercept |
The intercept estimate. |
alpha |
The matrix of the coefficients for main environmental effects. |
beta |
The matrix of the regression coefficients for all main genetic effects (the first row) and interactions. |
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
PTReg
, and predict
methods, and
bic.PTReg
.
This function extracts coefficients from a RobSBoosting model, using the stored
"RobSBoosting"
object.
## S3 method for class 'RobSBoosting' coef(object, ...)
## S3 method for class 'RobSBoosting' coef(object, ...)
object |
Fitted |
... |
Not used. Other arguments to get coefficients. |
intercept |
The intercept estimate. |
unique_variable |
A matrix with two columns that represents the variables that are selected
for the model after removing the duplicates, since the |
unique_coef |
Coefficients corresponding to |
unique_knots |
A list of knots corresponding to |
unique_Boundary.knots |
A list of boundary knots corresponding to |
unique_vtype |
A vector representing the variable type of |
estimation_results |
A list of estimation results for each variable. Here, the first
|
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
RobSBoosting
, and predict
, and plot
methods.
A data frame containing the 7 environmental (E)
effects (the first 7 columns), 2000 genetic (G) effects (column 8 to column 2007), logarithm of survival time
(column 2008), and censoring indicator (column 2009). All of them can be downloaded from TCGA Provisional using the
R package cgdsr
. See details.
data(HNSCC)
data(HNSCC)
A data frame with 484 rows and 2009 variables.
There are seven E effects, namely alcohol consumption frequency (ACF), smoking pack years (SPY), age, gender, PN, PT, and ICD O3 site. For G effects, 2,000 gene expressions are considered. Among 484 subjects, 343 subjects have missingness in ACF and/or SPY. For G effects, we analyze mRNA gene expressions. A total of 18,409 gene expression measurements are available, then prescreening is conducted using marginal Cox models, finally, the top 2,000 genes with the smallest p-values are selected for downstream analysis.
data(HNSCC) E=as.matrix(HNSCC[,1:7]) G=as.matrix(HNSCC[,8:2007]) Y=as.matrix(HNSCC[,2008:2009]) fit<-Miss.boosting(G,E,Y,im_time=10,loop_time=1000,v=0.25,num.knots=5,degree=3,tau=0.3, family="survival",E_type=c(rep("EC",3),rep("ED",4))) plot(fit)
data(HNSCC) E=as.matrix(HNSCC[,1:7]) G=as.matrix(HNSCC[,8:2007]) Y=as.matrix(HNSCC[,2008:2009]) fit<-Miss.boosting(G,E,Y,im_time=10,loop_time=1000,v=0.25,num.knots=5,degree=3,tau=0.3, family="survival",E_type=c(rep("EC",3),rep("ED",4))) plot(fit)
This gene-environment analysis approach includes three steps to accommodate both missingness
in environmental (E) measurements and long-tailed or contaminated outcomes. At the first step,
the multiple imputation approach based on sparse boosting method is developed to accommodate
missingness in E measurements, where we use NA
to represent those E measurments which
are missing. Here a semiparametric model is assumed to accommodate nonlinear effects, where we
model continuous E factors in a nonlinear way, and discrete E factors in a linear way. For
estimating the nonlinear functions, the B spline expansion is adopted. At the second step, for
each imputed data, we develop RobSBoosting
approach for identifying important main E
and genetic (G) effects, and G-E interactions, where the Huber loss function and Qn estimator are
adopted to accommodate long-tailed distribution/data contamination (see RobSBoosting
).
At the third step, the identification results from Step 2 are combined based on stability
selection technique.
Miss.boosting( G, E, Y, im_time = 10, loop_time = 500, num.knots = c(2), Boundary.knots, degree = c(2), v = 0.1, tau, family = c("continuous", "survival"), knots = NULL, E_type )
Miss.boosting( G, E, Y, im_time = 10, loop_time = 500, num.knots = c(2), Boundary.knots, degree = c(2), v = 0.1, tau, family = c("continuous", "survival"), knots = NULL, E_type )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
im_time |
Number of imputation for accommodating missingness in E variables. |
loop_time |
Number of iterations of the sparse boosting. |
num.knots |
Numbers of knots for the B spline basis. |
Boundary.knots |
The boundary of knots for the B spline basis. |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. Default is 0.1. |
tau |
Threshold used in the stability selection at the third step. |
family |
Response type of |
knots |
List of knots for the B spline basis. Default is NULL and knots can be generated
with the given |
E_type |
A vector indicating the type of each E factor, with "ED" representing discrete E factor, and "EC" representing continuous E factor. |
An object with S3 class "Miss.boosting"
is returned, which is a list with the following components
call |
The call that produced this object. |
alpha0 |
A vector with each element indicating whether the corresponding E factor is selected. |
beta0 |
A vector with each element indicating whether the corresponding G factor or G-E
interaction is selected. The first element is the first G effect and the second to
( |
intercept |
The intercept estimate. |
unique_variable |
A matrix with two columns that represents the variables that are
selected for the model after removing the duplicates, since the |
unique_coef |
Coefficients corresponding to |
unique_knots |
A list of knots corresponding to |
unique_Boundary.knots |
A list of boundary knots corresponding to
|
unique_vtype |
A vector representing the variable type of |
degree |
Degree for the B spline basis. |
NorM |
The values of B spline basis. |
E_type |
The type of E effects. |
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
data(Rob_data) G=Rob_data[,1:20];E=Rob_data[,21:24] Y=Rob_data[,25];Y_s=Rob_data[,26:27] knots=list();Boundary.knots=matrix(0,(20+4),2) for (i in 1:4){ knots[[i]]=c(0,1) Boundary.knots[i,]=c(0,1) } E2=E1=E ##continuous E1[7,1]=NA fit1<-Miss.boosting(G,E1,Y,im_time=1,loop_time=100,num.knots=c(2),Boundary.knots, degree=c(2),v=0.1,tau=0.3,family="continuous",knots=knots,E_type=c("EC","EC","ED","ED")) y1_hat=predict(fit1,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1)) plot(fit1) ##survival E2[4,1]=NA fit2<-Miss.boosting(G,E2,Y_s,im_time=2,loop_time=200,num.knots=c(2),Boundary.knots, degree=c(2),v=0.1,tau=0.3,family="survival",knots,E_type=c("EC","EC","ED","ED")) y2_hat=predict(fit2,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1)) plot(fit2)
data(Rob_data) G=Rob_data[,1:20];E=Rob_data[,21:24] Y=Rob_data[,25];Y_s=Rob_data[,26:27] knots=list();Boundary.knots=matrix(0,(20+4),2) for (i in 1:4){ knots[[i]]=c(0,1) Boundary.knots[i,]=c(0,1) } E2=E1=E ##continuous E1[7,1]=NA fit1<-Miss.boosting(G,E1,Y,im_time=1,loop_time=100,num.knots=c(2),Boundary.knots, degree=c(2),v=0.1,tau=0.3,family="continuous",knots=knots,E_type=c("EC","EC","ED","ED")) y1_hat=predict(fit1,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1)) plot(fit1) ##survival E2[4,1]=NA fit2<-Miss.boosting(G,E2,Y_s,im_time=2,loop_time=200,num.knots=c(2),Boundary.knots, degree=c(2),v=0.1,tau=0.3,family="survival",knots,E_type=c("EC","EC","ED","ED")) y2_hat=predict(fit2,matrix(E1[1,],nrow=1),matrix(G[1,],nrow=1)) plot(fit2)
Draw a heatmap for estimated coefficients in a fitted
"bic.BLMCP"
object.
## S3 method for class 'bic.BLMCP' plot(x, ...)
## S3 method for class 'bic.BLMCP' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
A heatmap for estimated coefficients.
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang,
Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization.
Genomics, 102(4):189-194, 2013.
predict
, coef
and BLMCP
methods.
Draw a heatmap for estimated coefficients in a fitted
"bic.PTReg"
object.
## S3 method for class 'bic.PTReg' plot(x, ...)
## S3 method for class 'bic.PTReg' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
A heatmap for estimated coefficients.
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
bic.PTReg
, and predict
, and coef
methods.
Draw a heatmap for estimated coefficients in a fitted
"BLMCP"
object.
## S3 method for class 'BLMCP' plot(x, ...)
## S3 method for class 'BLMCP' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
A heatmap for estimated coefficients.
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma. Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma. Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4):189-194, 2013.
BLMCP
, and predict
and coef
methods.
Draw plots for estimated parameters in a fitted
"Miss.boosting"
object, including a heatmap for discrete
environmental (E) effects, and selected genetic (G) effects and G-E interactions, and plots for each
of selected continuous E (EC) effect and interactions between EC and G.
## S3 method for class 'Miss.boosting' plot(x, ...)
## S3 method for class 'Miss.boosting' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
A heatmap for estimated coefficients.
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
Miss.boosting
, and predict
methods.
Draw a heatmap for estimated coefficients in a fitted
"PTReg"
object.
## S3 method for class 'PTReg' plot(x, ...)
## S3 method for class 'PTReg' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
A heatmap for estimated coefficients.
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
PTReg
, and predict
, and coef
methods.
Draw plots for estimated parameters in a fitted
"RobSBoosting"
object, including a heatmap for discrete
environmental (E) effects, and selected genetic (G) effects and G-E interactions, and plots for
each of selected continuous E (EC) effect and interactions between EC and G.
## S3 method for class 'RobSBoosting' plot(x, ...)
## S3 method for class 'RobSBoosting' plot(x, ...)
x |
Fitted |
... |
Other graphical parameters to plot. |
Plots for estimated coefficients.
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
RobSBoosting
, predict
and coef
methods.
This function makes predictions from a BIC BLMCP model, using the stored
"bic.BLMCP"
object. This function makes it easier to use the results of BIC to
make a prediction.
## S3 method for class 'bic.BLMCP' predict(object, newE, newG, ...)
## S3 method for class 'bic.BLMCP' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for BLMCP
objects.
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma. Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang, Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and
Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4):189-194, 2013.
http://europepmc.org/backend/ptpmcrender.fcgi?accid=PMC3869641&blobtype=pdf
coef
, and plot
and
bic.BLMCP
methods, and BLMCP
.
This function makes predictions from a BIC PTReg model, using the stored "bic.PTReg"
object. This function makes it easier to use the results of BIC to make
a prediction.
## S3 method for class 'bic.PTReg' predict(object, newE, newG, ...)
## S3 method for class 'bic.PTReg' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for PTReg
objects.
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
bic.PTReg
, and coef
, and plot
methods, and PTReg
.
This function makes predictions from a BLMCP
model, using the stored "BLMCP"
object.
## S3 method for class 'BLMCP' predict(object, newE, newG, ...)
## S3 method for class 'BLMCP' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for BLMCP
objects.
Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, and Shuangge Ma.
Accommodating missingness in environmental measurements in gene-environment interaction
analysis. Genetic Epidemiology, 41(6):523-554, 2017.
Jin Liu, Jian Huang, Yawei Zhang,
Qing Lan, Nathaniel Rothman, Tongzhang Zheng, and Shuangge Ma.
Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4):189-194, 2013.
BLMCP
, coef
, and plot
methods, and bic.BLMCP
method.
This function makes predictions from a Miss.boosting model, using the stored
"Miss.boosting"
object.
## S3 method for class 'Miss.boosting' predict(object, newE, newG, ...)
## S3 method for class 'Miss.boosting' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for Miss.boosting
objects.
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
Miss.boosting
, and plot
methods.
This function makes predictions from a PTReg model, using the stored "PTReg"
object.
## S3 method for class 'PTReg' predict(object, newE, newG, ...)
## S3 method for class 'PTReg' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for PTReg
objects.
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
PTReg
, coef
and plot
methods, and bic.PTReg
.
This function makes predictions from a RobSBoosting
model, using the stored "RobSBoosting"
object.
## S3 method for class 'RobSBoosting' predict(object, newE, newG, ...)
## S3 method for class 'RobSBoosting' predict(object, newE, newG, ...)
object |
Fitted |
newE |
Matrix of new values for |
newG |
Matrix of new values for |
... |
Not used. Other arguments to predict. |
The object returned depends on the ... argument which is passed
on to the predict
method for RobSBoosting
objects.
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
RobSBoosting
, coef
, and plot
methods.
Gene-environment interaction analysis using penalized trimmed regression, which is robust to outliers in both predictor and response spaces. The objective function is based on trimming technique, where the samples with extreme absolute residuals are trimmed. A decomposition framework is adopted for accommodating "main effects-interactions" hierarchy, and minimax concave penalty (MCP) is adopted for regularized estimation and interaction (and main genetic effect) selection.
PTReg( G, E, Y, lambda1, lambda2, gamma1 = 6, gamma2 = 6, max_init, h = NULL, tau = 0.4, mu = 2.5, family = c("continuous", "survival") )
PTReg( G, E, Y, lambda1, lambda2, gamma1 = 6, gamma2 = 6, max_init, h = NULL, tau = 0.4, mu = 2.5, family = c("continuous", "survival") )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
lambda1 |
A user supplied lambda for MCP accommodating main G effect selection. |
lambda2 |
A user supplied lambda for MCP accommodating G-E interaction selecton. |
gamma1 |
The regularization parameter of the MCP penalty corresponding to G effects. |
gamma2 |
The regularization parameter of the MCP penalty corresponding to G-E interactions. |
max_init |
The number of initializations. |
h |
The number of the trimmed samples if the parameter |
tau |
The threshold value used in stability selection. |
mu |
The parameter for screening outliers with extreme absolute residuals if the number
of the trimmed samples |
family |
Response type of |
An object with S3 class "PTReg"
is returned, which is a list with the following components.
call |
The call that produced this object. |
intercept |
The intercept estimate. |
alpha |
The matrix of the coefficients for main E effects. |
beta |
The matrix of the regression coefficients for all main G effects (the first row) and interactions. |
df |
The number of nonzeros. |
BIC |
Bayesian Information Criterion. |
select_sample |
Selected samples where samples with extreme absolute residuals are trimmed. |
family |
The same as input |
Yaqing Xu, Mengyun Wu, Shuangge Ma, and Syed Ejaz Ahmed. Robust gene-environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88(18):3502-3528, 2018.
coef
, predict
, and plot
methods, and bic.PTReg
method.
sigmaG<-AR(rho=0.3,p=30) sigmaE<-AR(rho=0.3,p=3) set.seed(300) G=MASS::mvrnorm(150,rep(0,30),sigmaG) EC=MASS::mvrnorm(150,rep(0,2),sigmaE[1:2,1:2]) ED = matrix(rbinom((150),1,0.6),150,1) E=cbind(EC,ED) alpha=runif(3,0.8,1.5) beta=matrix(0,4,30) beta[1,1:4]=runif(4,1,1.5) beta[2,c(1,2)]=runif(2,1,1.5) #continuous response y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=c(rnorm(130), rcauchy(20,0,5)),family="continuous") fit1<-PTReg(G=G,E=E,y1,lambda1=0.3,lambda2=0.3,gamma1=6,gamma2=6, max_init=50,h=NULL,tau=0.6,mu=2.5,family="continuous") coef1=coef(fit1) y_hat1=predict(fit1,E,G) plot(fit1) # survival response y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1), family="survival",0.7,0.9) fit2<-PTReg(G=G,E=E,y2,lambda1=0.3,lambda2=0.3,gamma1=6,gamma2=6, max_init=50,h=NULL,tau=0.6,mu=2.5,family="survival") coef2=coef(fit2) y_hat2=predict(fit2,E,G) plot(fit2)
sigmaG<-AR(rho=0.3,p=30) sigmaE<-AR(rho=0.3,p=3) set.seed(300) G=MASS::mvrnorm(150,rep(0,30),sigmaG) EC=MASS::mvrnorm(150,rep(0,2),sigmaE[1:2,1:2]) ED = matrix(rbinom((150),1,0.6),150,1) E=cbind(EC,ED) alpha=runif(3,0.8,1.5) beta=matrix(0,4,30) beta[1,1:4]=runif(4,1,1.5) beta[2,c(1,2)]=runif(2,1,1.5) #continuous response y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=c(rnorm(130), rcauchy(20,0,5)),family="continuous") fit1<-PTReg(G=G,E=E,y1,lambda1=0.3,lambda2=0.3,gamma1=6,gamma2=6, max_init=50,h=NULL,tau=0.6,mu=2.5,family="continuous") coef1=coef(fit1) y_hat1=predict(fit1,E,G) plot(fit1) # survival response y2=simulated_data(G,E,alpha,beta,rnorm(150,0,1), family="survival",0.7,0.9) fit2<-PTReg(G=G,E=E,y2,lambda1=0.3,lambda2=0.3,gamma1=6,gamma2=6, max_init=50,h=NULL,tau=0.6,mu=2.5,family="survival") coef2=coef(fit2) y_hat2=predict(fit2,E,G) plot(fit2)
A robust gene-environment interaction identification approach using the quantile partial correlation technique. This approach is a marginal analysis approach built on the quantile regression technique, which can accommodate long-tailed or contaminated outcomes. For response with right censoring, Kaplan-Meier (KM) estimator-based weights are adopted to easily accommodate censoring. In addition, it adopts partial correlation to identify important interactions while properly controlling for the main genetic (G) and environmental (E) effects.
QPCorr.matrix(G, E, Y, tau, w = NULL, family = c("continuous", "survival"))
QPCorr.matrix(G, E, Y, tau, w = NULL, family = c("continuous", "survival"))
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
tau |
Quantile. |
w |
Weight for accommodating censoring if |
family |
Response type of |
Matrix of (censored) quantile partial correlations for interactions.
Yaqing Xu, Mengyun Wu, Qingzhao Zhang, and Shuangge Ma. Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics, 111(5):1115-1123, 2019.
QPCorr.pval
method.
alpha=matrix(0,5,1) alpha[1:2]=1 beta=matrix(0,6,100) beta[1,1:5]=1 beta[2:3,1:5]=2 beta[4:6,6:7]=2 sigmaG<-AR(rho=0.3,100) sigmaE<-AR(rho=0.3,5) G<-MASS::mvrnorm(200,rep(0,100),sigmaG) E<-MASS::mvrnorm(200,rep(0,5),sigmaE) e1<-rnorm(200*.05,50,1);e2<-rnorm(200*.05,-50,1);e3<-rnorm(200*.9) e<-c(e1,e2,e3) # continuous y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=e,family="continuous") cpqcorr_stat1<-QPCorr.matrix(G,E,y1,tau=0.5,w=NULL,family="continuous") # survival y2=simulated_data(G,E,alpha,beta,rnorm(200,0,1),family="survival",0.7,0.9) cpqcorr_stat<-QPCorr.matrix(G,E,y2,tau=0.5,w=NULL,family="survival")
alpha=matrix(0,5,1) alpha[1:2]=1 beta=matrix(0,6,100) beta[1,1:5]=1 beta[2:3,1:5]=2 beta[4:6,6:7]=2 sigmaG<-AR(rho=0.3,100) sigmaE<-AR(rho=0.3,5) G<-MASS::mvrnorm(200,rep(0,100),sigmaG) E<-MASS::mvrnorm(200,rep(0,5),sigmaE) e1<-rnorm(200*.05,50,1);e2<-rnorm(200*.05,-50,1);e3<-rnorm(200*.9) e<-c(e1,e2,e3) # continuous y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=e,family="continuous") cpqcorr_stat1<-QPCorr.matrix(G,E,y1,tau=0.5,w=NULL,family="continuous") # survival y2=simulated_data(G,E,alpha,beta,rnorm(200,0,1),family="survival",0.7,0.9) cpqcorr_stat<-QPCorr.matrix(G,E,y2,tau=0.5,w=NULL,family="survival")
P-values of the "QPCorr.matrix "
obtained using a permutation approach, the
interactions with smaller p-values are regarded as more important.
QPCorr.pval( G, E, Y, tau, w = NULL, permutation_t = 1000, family = c("continuous", "survival") )
QPCorr.pval( G, E, Y, tau, w = NULL, permutation_t = 1000, family = c("continuous", "survival") )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
tau |
Quantile. |
w |
Weight for accommodating censoring if |
permutation_t |
Number of permutation. |
family |
Response type of |
Matrix of p-value, with the element in the i
th row and the j
column
represents the p-value of the (censored) quantile partial correlation corresponding to the
i
th E and the j
th G.
Yaqing Xu, Mengyun Wu, Qingzhao Zhang, and Shuangge Ma. Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics, 111(5):1115-1123, 2019.
QPCorr.matrix
method.
n=50 alpha=matrix(0,5,1) alpha[1:2]=1 beta=matrix(0,6,20) beta[1,1:4]=1 beta[2:3,1:4]=2 sigmaG<-AR(rho=0.3,20) sigmaE<-AR(rho=0.3,5) G<-MASS::mvrnorm(n,rep(0,20),sigmaG) E<-MASS::mvrnorm(n,rep(0,5),sigmaE) e1<-rnorm(n*.05,50,1);e2<-rnorm(n*.05,-50,1);e3<-rnorm((n-length(e1)-length(e2))) e<-c(e1,e2,e3) # continuous y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=e,family="continuous") cpqcorr_pvalue1<-QPCorr.pval(G,E,y1,tau=0.5,permutation_t=500,family="continuous") # survival y2=simulated_data(G,E,alpha,beta,rnorm(n,0,1),family="survival",0.7,0.9) cpqcorr_pvalue2<-QPCorr.pval(G,E,y2,tau=0.5,permutation_t=500,family="survival")
n=50 alpha=matrix(0,5,1) alpha[1:2]=1 beta=matrix(0,6,20) beta[1,1:4]=1 beta[2:3,1:4]=2 sigmaG<-AR(rho=0.3,20) sigmaE<-AR(rho=0.3,5) G<-MASS::mvrnorm(n,rep(0,20),sigmaG) E<-MASS::mvrnorm(n,rep(0,5),sigmaE) e1<-rnorm(n*.05,50,1);e2<-rnorm(n*.05,-50,1);e3<-rnorm((n-length(e1)-length(e2))) e<-c(e1,e2,e3) # continuous y1=simulated_data(G=G,E=E,alpha=alpha,beta=beta,error=e,family="continuous") cpqcorr_pvalue1<-QPCorr.pval(G,E,y1,tau=0.5,permutation_t=500,family="continuous") # survival y2=simulated_data(G,E,alpha,beta,rnorm(n,0,1),family="survival",0.7,0.9) cpqcorr_pvalue2<-QPCorr.pval(G,E,y2,tau=0.5,permutation_t=500,family="survival")
RobSBoosting
and Miss.boosting
methodsA matrix containing the simulated genetic (G) effects (the first 20 columns), environmental (E) effects (column 21 to column 24), continuous response (column 25), logarithm of survival time (column 26), and censoring indicator (column 27).
data(Rob_data)
data(Rob_data)
A matrix with 100 rows and 27 variables.
data(Rob_data)
data(Rob_data)
Robust semiparametric gene-environment interaction analysis using sparse boosting. Here a semiparametric model is assumed to accommodate nonlinear effects, where we model continuous environmental (E) factors in a nonlinear way, and discrete E factors and all genetic (G) factors in a linear way. For estimating the nonlinear functions, the B spline expansion is adopted. The Huber loss function and Qn estimator are adopted to accommodate long-tailed distribution/data contamination. For model estimation and selection of relevant variables, we adopt an effective sparse boosting approach, where the strong hierarchy is respected.
RobSBoosting( G, E, Y, loop_time, num.knots = NULL, Boundary.knots = NULL, degree = 1, v = 0.1, family = c("continuous", "survival"), knots = NULL, E_type )
RobSBoosting( G, E, Y, loop_time, num.knots = NULL, Boundary.knots = NULL, degree = 1, v = 0.1, family = c("continuous", "survival"), knots = NULL, E_type )
G |
Input matrix of |
E |
Input matrix of |
Y |
Response variable. A quantitative vector for |
loop_time |
Number of iterations of the sparse boosting. |
num.knots |
Numbers of knots for the B spline basis. |
Boundary.knots |
The boundary of knots for the B spline basis. |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. Default is 0.1. |
family |
Response type of |
knots |
List of knots for the B spline basis. Default is NULL and knots can be generated
with the given |
E_type |
A vector indicating the type of each E factor, with "ED" representing discrete E factor, and "EC" representing continuous E factor. |
An object with S3 class "RobSBoosting"
is returned, which is a list with the following components.
call |
The call that produced this object. |
max_t |
The stopping iteration time of the sparse boosting. |
spline_result |
A list of length |
BIC |
A vector of length max_t that includes Bayesian Information Criterion based on the Huber's prediction error. |
variable |
A vector of length max_t that includes the index of selected variable in each iteration. |
id |
The iteration time with the smallest BIC. |
variable_pair |
A matrix with two columns that include the set of variables that can potentially enter the regression model at the stopping iteration time. Here, the first and second columns correspond to the indexes of E factors and G factors. For example, (1, 0) represents that this variable is the first E factor, and (1,2) represents that the variable is the interaction between the first E factor and second G factor. |
v_type |
A vector whose length is the number of rows of |
family |
The same as input |
degree |
Degree for the B spline basis. |
v |
The step size used in the sparse boosting process. |
NorM |
The values of B spline basis. |
estimation_results |
A list of estimation results for each variable. Here, the first
|
Mengyun Wu and Shuangge Ma. Robust semiparametric gene-environment interaction analysis using sparse boosting. Statistics in Medicine, 38(23):4625-4641, 2019.
bs
method for B spline expansion, coef
, predict
, and plot
methods, and Miss.boosting
method.
data(Rob_data) G=Rob_data[,1:20];E=Rob_data[,21:24] Y=Rob_data[,25];Y_s=Rob_data[,26:27] knots = list();Boundary.knots = matrix(0, 24, 2) for(i in 1:4) { knots[[i]] = c(0, 1) Boundary.knots[i, ] = c(0, 1) } #continuous fit1= RobSBoosting(G,E,Y,loop_time = 80,num.knots = 2,Boundary.knots=Boundary.knots, degree = 2,family = "continuous",knots = knots,E_type=c("EC","EC","ED","ED")) coef1 = coef(fit1) predict1=predict(fit1,newE=E[1:2,],newG=G[1:2,]) plot(fit1) #survival fit2= RobSBoosting(G,E,Y_s,loop_time = 200, num.knots = 2, Boundary.knots=Boundary.knots, family = "survival", knots = knots,E_type=c("EC","EC","ED","ED")) coef2 = coef(fit2) predict2=predict(fit2,newE=E[1:2,],newG=G[1:2,]) plot(fit2)
data(Rob_data) G=Rob_data[,1:20];E=Rob_data[,21:24] Y=Rob_data[,25];Y_s=Rob_data[,26:27] knots = list();Boundary.knots = matrix(0, 24, 2) for(i in 1:4) { knots[[i]] = c(0, 1) Boundary.knots[i, ] = c(0, 1) } #continuous fit1= RobSBoosting(G,E,Y,loop_time = 80,num.knots = 2,Boundary.knots=Boundary.knots, degree = 2,family = "continuous",knots = knots,E_type=c("EC","EC","ED","ED")) coef1 = coef(fit1) predict1=predict(fit1,newE=E[1:2,],newG=G[1:2,]) plot(fit1) #survival fit2= RobSBoosting(G,E,Y_s,loop_time = 200, num.knots = 2, Boundary.knots=Boundary.knots, family = "survival", knots = knots,E_type=c("EC","EC","ED","ED")) coef2 = coef(fit2) predict2=predict(fit2,newE=E[1:2,],newG=G[1:2,]) plot(fit2)
Generate simulated response.
simulated_data( G, E, alpha, beta, error, family = c("continuous", "survival"), a1 = NULL, a2 = NULL )
simulated_data( G, E, alpha, beta, error, family = c("continuous", "survival"), a1 = NULL, a2 = NULL )
G |
Input matrix of |
E |
Input matrix of |
alpha |
Matrix of the true coefficients for main E effects. |
beta |
Matrix of the true regression coefficients for all main G effects (the first row) and interactions. |
error |
Error terms. |
family |
Type of the response variable. If |
a1 |
If |
a2 |
If |
Response variable. A quantitative vector for family="continuous"
. For
family="survival"
, it would be a two-column matrix with the first column being the
log(survival time) and the second column being the censoring indicator. The indicator
is a binary variable, with "1" indicating dead, and "0" indicating right censored.