Title: | Interaction Analysis of Repeated Measure Data |
---|---|
Description: | Extensive penalized variable selection methods have been developed in the past two decades for analyzing high dimensional omics data, such as gene expressions, single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and others. However, lipidomics data have been rarely investigated by using high dimensional variable selection methods. This package incorporates our recently developed penalization procedures to conduct interaction analysis for high dimensional lipidomics data with repeated measurements. The core module of this package is developed in C++. The development of this software package and the associated statistical methods have been partially supported by an Innovative Research Award from Johnson Cancer Research Center, Kansas State University. |
Authors: | Fei Zhou, Jie Ren, Yuwen Liu, Xiaoxi Li, Cen Wu, Yu Jiang |
Maintainer: | Fei Zhou <[email protected]> |
License: | GPL-2 |
Version: | 0.4.1 |
Built: | 2024-11-26 06:24:28 UTC |
Source: | CRAN |
This function does k-fold cross-validation for interep and returns the optimal value of lambda.
cv.interep(e, g, y, beta0, lambda1, lambda2, nfolds, corre, pmethod, maxits)
e |
matrix of environment factors. |
g |
matrix of omics factors. In the case study, the omics measurements are lipidomics data. |
y |
the longitudinal response. |
beta0 |
the initial value for the coefficient vector. |
lambda1 |
a user-supplied sequence of lambda1 values, the tuning parameter for individual predictors. |
lambda2 |
a user-supplied sequence of lambda2 values, the tuning parameter for interactions. |
nfolds |
the number of folds for cross-validation. |
corre |
the working correlation structure that is used in the estimation algorithm. interep provides three choices for the working correlation structure: "a" as "AR-1", "i" as "independence" and "e" as "exchangeable". |
pmethod |
the penalization method. "mixed" refers to MCP penalty to individual main effects and group MCP penalty to interactions; "individual" means MCP penalty to all effects. |
maxits |
the maximum number of iterations that is used in the estimation algorithm. |
When dealing with predictors with both main effects and interactions, this function returns two optimal tuning parameters, lambda1 and lambda2; when there are only main effects in the predictors, this function returns lambda1, which is the optimal tuning parameter for individual predictors containing main effects.
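The grid search behind a two-parameter cross-validation can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `cv_grid_sketch`, `fit_and_score` and `toy_score` are hypothetical names, folds are assumed to be split by subject (row) so that all repeated measurements of a subject stay together, and the error measure is whatever the supplied scoring function returns.

```r
# Hypothetical sketch of k-fold cross-validation over a (lambda1, lambda2) grid.
# fit_and_score(train_idx, test_idx, lam1, lam2) is a user-supplied function
# (an assumption of this sketch) that returns a numeric prediction error.
cv_grid_sketch <- function(n, lambda1, lambda2, nfolds, fit_and_score) {
  # Assign each subject (row) to one fold at random.
  fold_id <- sample(rep(seq_len(nfolds), length.out = n))
  # Enumerate every combination of the two tuning parameters.
  grid <- expand.grid(lam1 = lambda1, lam2 = lambda2)
  grid$error <- NA_real_
  for (i in seq_len(nrow(grid))) {
    fold_err <- numeric(nfolds)
    for (f in seq_len(nfolds)) {
      test_idx <- which(fold_id == f)
      train_idx <- which(fold_id != f)
      fold_err[f] <- fit_and_score(train_idx, test_idx,
                                   grid$lam1[i], grid$lam2[i])
    }
    # Average the held-out error across folds.
    grid$error[i] <- mean(fold_err)
  }
  best <- grid[which.min(grid$error), ]
  list(lam1 = best$lam1, lam2 = best$lam2)
}

# Toy usage with a made-up score that is smallest at lam1 = 0.5, lam2 = 1.
set.seed(1)
toy_score <- function(train_idx, test_idx, lam1, lam2) {
  (lam1 - 0.5)^2 + (lam2 - 1)^2
}
res <- cv_grid_sketch(n = 20, lambda1 = c(0.1, 0.5, 1),
                      lambda2 = c(0.5, 1, 2), nfolds = 5,
                      fit_and_score = toy_score)
```

In a real analysis the scoring function would fit the penalized model on the training subjects and evaluate it on the held-out subjects; the toy score here only demonstrates the selection logic.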
An object of class "cv.interep" is returned, which is a list with components:
lam1 |
the optimal lambda1. |
lam2 |
the optimal lambda2. |
Zhou, F., Ren, J., Li, G., Jiang, Y., Li, X., Wang, W. and Wu, C. (2019). Penalized variable selection for Lipid–environment interactions in a longitudinal lipidomics study. Genes, 10(12), 1002
Zhou, F., Ren, J., Liu, Y., Li, X., Wang, W. and Wu, C. (2022). Interep: An R package for high-dimensional interaction analysis of the repeated measurement data. Genes, 13(3): 544 doi:10.3390/genes13030544
Zhou, F., Ren, J., Lu, X., Ma, S. and Wu, C. (2021) Gene–Environment Interaction: a Variable Selection Perspective. Epistasis: Methods and Protocols, 191-223
Ren, J., Zhou, F., Li, X., Chen, Q., Zhang, H., Ma, S., Jiang, Y. and Wu, C. (2020). Semi-parametric Bayesian variable selection for Gene-Environment interactions. Statistics in Medicine, 39(5): 617-638 doi:10.1002/sim.8434
Wu, C., Zhou, F., Ren, J., Li, X., Jiang, Y., Ma, S. (2019). A Selective Review of Multi-Level Omics Data Integration Using Variable Selection. High-Throughput, 8(1) doi:10.3390/ht8010004
Ren, J., Du, Y., Li, S., Ma, S., Jiang, Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high-dimensional genomic data in cancer prognosis. Genetic epidemiology, 43(3), 276-291 doi:10.1002/gepi.22194
Ren, J., Jung, L., Du, Y., Wu, C., Jiang, Y. and Liu, J. (2019). regnet: Network-Based Regularization for Generalized Linear Models. R package, version 0.4.0
Wu, C., Zhang, Q., Jiang, Y. and Ma, S. (2018). Robust network-based analysis of the associations between (epi) genetic measurements. Journal of multivariate analysis, 168, 119-130 doi:10.1016/j.jmva.2018.06.009
Wu, C., Zhong, P.-S., and Cui, Y. (2018). Additive varying-coefficient model for nonlinear gene-environment interactions. Statistical Applications in Genetics and Molecular Biology, 17(2) doi:10.1515/sagmb-2017-0008
Wu, C., Jiang, Y., Ren, J., Cui, Y., Ma, S. (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37: 437–456 doi:10.1002/sim.7518
Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y. and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes. BMC genetics, 18(1), 44 doi:10.1186/s12863-017-0495-5
Jiang, Y., Huang, Y., Du, Y., Zhao, Y., Ren, J., Ma, S., & Wu, C. (2017). Identification of prognostic genes and pathways in lung adenocarcinoma using a Bayesian approach. Cancer Inform, 1(7) doi:10.1177/1176935116684825
Wu, C., and Ma, S. (2015). A selective review of robust variable selection with applications in bioinformatics. Briefings in Bioinformatics, 16(5), 873–883 doi:10.1093/bib/bbu046
Wu, C., Shi, X., Cui, Y. and Ma, S. (2015). A penalized robust semiparametric approach for gene-environment interactions. Statistics in Medicine, 34 (30): 4016–4030 doi:10.1002/sim.6609
Wu, C., Cui, Y., and Ma, S. (2014). Integrative analysis of gene–environment interactions under a multi–response partially linear varying coefficient model. Statistics in Medicine, 33(28), 4988–4998 doi:10.1002/sim.6287
Wu, C. and Cui, Y. (2013). A novel method for identifying nonlinear gene–environment interactions in case–control association studies. Human Genetics, 132(12):1413–1425 doi:10.1007/s00439-013-1350-z
Wu, C. and Cui, Y. (2013). Boosting signals in gene–based association studies via efficient SNP selection. Briefings in Bioinformatics, 15(2):279–291 doi:10.1093/bib/bbs087
Wu, C., Zhong, P.S. and Cui, Y. (2013). High dimensional variable selection for gene-environment interactions. Technical Report, Michigan State University.
Wu, C., Li, S., and Cui, Y. (2012). Genetic Association Studies: An Information Content Perspective. Current Genomics, 13(7), 566–573 doi:10.2174/138920212803251382
Simulated data for demonstrating the features of interep.
data("dat")
The dataset consists of six components: e, z, x, y, coef and index; index shows the location of the true coefficients used to generate y.
data("dat")
This function computes the first derivative of the MCP (Minimax Concave Penalty).
dmcp(theta, lambda, gamma = 3)
theta |
a coefficient vector. |
lambda |
the tuning parameter. |
gamma |
the regularization parameter in MCP (Minimax Concave Penalty). It balances between the unbiasedness and concavity of MCP. |
Strictly speaking, the regularization parameter needs to be obtained via a data-driven approach. Published studies suggest experimenting with a few values, such as 1.8, 3, 4.5, 6, and 10, then fixing its value. In our numerical study, we examined this sequence, found that the results are not sensitive to the choice of gamma, and set the value at 3. In practice, to be prudent, values other than 3 should also be investigated. Similar discussions can be found in the references below.
The first derivative of the MCP function.
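For orientation, the textbook form of the MCP derivative with respect to the magnitude of a coefficient is (lambda - |theta| / gamma) when |theta| <= gamma * lambda, and 0 otherwise. The base R sketch below implements that textbook form only; `mcp_derivative_sketch` is a hypothetical name, and the sign and scaling conventions used inside dmcp itself may differ.

```r
# Hypothetical sketch of the textbook MCP derivative, evaluated elementwise
# on the magnitude of each coefficient. Not the package's dmcp().
mcp_derivative_sketch <- function(theta, lambda, gamma = 3) {
  # (lambda - |theta| / gamma) truncated at zero: the penalty rate shrinks
  # linearly and vanishes once |theta| reaches gamma * lambda.
  pmax(lambda - abs(theta) / gamma, 0)
}

d <- mcp_derivative_sketch(c(0, 1.5, -1.5, 3, 5), lambda = 1, gamma = 3)
```

A larger gamma makes the derivative decay more slowly, so large coefficients keep being shrunk for longer; as gamma grows the penalty approaches the LASSO, whose derivative is constant at lambda.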
Ren, J., Du, Y., Li, S., Ma, S., Jiang, Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high-dimensional genomic data in cancer prognosis. Genetic epidemiology, 43(3), 276-291 doi:10.1002/gepi.22194
Ren, J., Jung, L., Du, Y., Wu, C., Jiang, Y. and Liu, J. (2019). regnet: Network-Based Regularization for Generalized Linear Models. R package, version 0.4.0
Wu, C., Zhang, Q., Jiang, Y. and Ma, S. (2018). Robust network-based analysis of the associations between (epi) genetic measurements. Journal of multivariate analysis, 168, 119-130 doi:10.1016/j.jmva.2018.06.009
Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y. and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case–control study of Type 2 diabetes. BMC genetics, 18(1), 44 doi:10.1186/s12863-017-0495-5
theta = runif(20, -5, 5)
lambda = 1
dmcp(theta, lambda, gamma = 3)
This function makes predictions for generalized estimating equations with given values of lambda. Typical usage is to have the cv.interep function compute the optimal lambda values, then provide them to the interep function.
interep(e, g, y, beta0, corre, pmethod, lam1, lam2, maxits)
e |
matrix of environment factors. |
g |
matrix of omics factors. In the case study, the omics measurements are lipidomics data. |
y |
the longitudinal response. |
beta0 |
the initial coefficient vector. |
corre |
the working correlation structure that is used in the estimation algorithm. interep provides three choices for the working correlation structure: "a" as "AR-1", "i" as "independence" and "e" as "exchangeable". |
pmethod |
the penalization method. "mixed" refers to MCP penalty to individual main effects and group MCP penalty to interactions; "individual" means MCP penalty to all effects. |
lam1 |
the tuning parameter lambda1 for individual predictors. |
lam2 |
the tuning parameter lambda2 for interactions. |
maxits |
the maximum number of iterations that is used in the estimation algorithm. The default value is 30. |
coef |
the coefficient vector. |
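The three working correlation structures named by the corre argument can be illustrated with a small base R sketch. It is written under assumptions: `working_corr_sketch` is a hypothetical helper, k is the number of repeated measurements, and the correlation parameter rho is fixed by hand here, whereas an estimating-equation fit would normally estimate it from the data.

```r
# Hypothetical sketch: build a k x k working correlation matrix for the
# codes "i" (independence), "e" (exchangeable) and "a" (AR-1).
working_corr_sketch <- function(corre, k, rho = 0.5) {
  switch(corre,
    # Independence: no correlation between repeated measurements.
    i = diag(k),
    # Exchangeable: the same correlation rho between every pair.
    e = {
      R <- matrix(rho, k, k)
      diag(R) <- 1
      R
    },
    # AR-1: correlation rho^|s - t| decays with the lag between time points.
    a = rho^abs(outer(seq_len(k), seq_len(k), "-")),
    stop("corre must be \"a\", \"i\" or \"e\"")
  )
}

R_ar1 <- working_corr_sketch("a", k = 3, rho = 0.5)
R_exc <- working_corr_sketch("e", k = 3, rho = 0.5)
R_ind <- working_corr_sketch("i", k = 3)
```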
Zhou, F., Ren, J., Li, G., Jiang, Y., Li, X., Wang, W. and Wu, C. (2019). Penalized variable selection for Lipid–environment interactions in a longitudinal lipidomics study. Genes, 10(12), 1002
Zhou, F., Ren, J., Liu, Y., Li, X., Wang, W. and Wu, C. (2022). Interep: An R package for high-dimensional interaction analysis of the repeated measurement data. Genes, 13(3): 544 doi:10.3390/genes13030544
Zhou, F., Ren, J., Lu, X., Ma, S. and Wu, C. (2021) Gene–Environment Interaction: a Variable Selection Perspective. Epistasis: Methods and Protocols, 191-223
data("dat")
e = dat$e
g = dat$z
y = dat$y
beta0 = dat$coef
index = dat$index
b = interep(e, g, y, beta0, corre = "e", pmethod = "mixed", lam1 = dat$lam1, lam2 = dat$lam2, maxits = 30)
b[abs(b) < 0.05] = 0
pos = which(b != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp = tp, fp = fp)
This function gives the penalty functions.
penalty(x, n, p, q, beta, lam1, pmethod, p1, lam2)
x |
matrix of covariates. |
n |
the sample size. |
p |
the number of predictors. |
q |
the number of environment factors. |
beta |
the coefficient vector. |
lam1 |
the tuning parameter lambda1 for individual penalty. |
pmethod |
the penalization method. "mixed" refers to MCP penalty to individual main effects and group MCP penalty to interactions; "individual" means MCP penalty to all effects. |
p1 |
the number of gene factors. |
lam2 |
the tuning parameter lambda2 for group penalty. |
E |
the penalty function. |
This function changes the format of the longitudinal data from wide format to long format.
reformat(k, y, x)
k |
the number of repeated measurements. |
y |
the longitudinal response. |
x |
a matrix of predictors, consisting of omics and environment factors, as well as their interactions. In the case study, the omics measurements are lipidomics data. |
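A wide-to-long conversion of this kind can be sketched in base R as below. The sketch rests on assumptions not stated in the text above: `wide_to_long_sketch` is a hypothetical stand-in (not the package's reformat), y is taken to be an n x k matrix with one row per subject, x is taken to be an n x p matrix of predictors that do not vary over time, and the names of the returned components are illustrative.

```r
# Hypothetical sketch of a wide-to-long conversion for repeated measurements.
wide_to_long_sketch <- function(k, y, x) {
  n <- nrow(y)
  stopifnot(ncol(y) == k, nrow(x) == n)
  # Subject identifier: each subject appears k times in the long format.
  id <- rep(seq_len(n), each = k)
  # Stack the response row by row, so a subject's k measurements are adjacent.
  y_long <- as.vector(t(y))
  # Repeat each subject's predictor row once per measurement.
  x_long <- x[id, , drop = FALSE]
  list(id = id, y = y_long, x = x_long)
}

y <- matrix(1:6, nrow = 2, byrow = TRUE)  # 2 subjects, 3 measurements each
x <- matrix(c(10, 20), nrow = 2)          # one predictor per subject
long <- wide_to_long_sketch(3, y, x)
```

Stacking by subject rather than by time point keeps each subject's measurements contiguous, which is the layout that a within-subject working correlation matrix expects.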