Title: | A Multiple Testing Procedure for High-Dimensional Mediation Hypotheses |
---|---|
Description: | A multiple-testing procedure for high-dimensional mediation hypotheses. Mediation analysis is of rising interest in epidemiology and clinical trials. Among existing methods for mediation analysis, the popular joint significance (JS) test is overly conservative, with a type I error rate below the nominal level, and therefore has low power. In the R package 'HDMT' we implement a multiple-testing procedure that accurately controls the family-wise error rate (FWER) and the false discovery rate (FDR) when using JS to test high-dimensional mediation hypotheses. The core of our procedure is estimating the proportions of three component null hypotheses and deriving the corresponding mixture distribution of null p-values. In the data examples, the procedure yields better-behaved quantile-quantile plots and improved detection of novel mediation relationships on the role of DNA methylation in genetic regulation of gene expression. With increasing interest in mediation by molecular intermediaries such as gene expression, the proposed method addresses an unmet methodological challenge. The methods implemented in the package are described in James Y. Dai, Janet L. Stanford & Michael LeBlanc (2020) <doi:10.1080/01621459.2020.1765785>. |
Authors: | James Dai [aut, cre], Xiaoyu Wang [aut] |
Maintainer: | James Dai <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.5 |
Built: | 2024-11-26 06:37:10 UTC |
Source: | CRAN |
A function to generate the quantiles of the estimated three-component mixture null distribution for pmax (the maximum of the two p-values for testing mediation), using either the approximation method or the exact method.
```r
adjust_quantile(alpha00, alpha01, alpha10, alpha1, alpha2, input_pvalues, exact = 0)
```
alpha00 |
A numeric value giving the proportion of the component null H00 (alpha==0 and beta==0) |
alpha01 |
A numeric value giving the proportion of the component null H01 (alpha==0 and beta!=0) |
alpha10 |
A numeric value giving the proportion of the component null H10 (alpha!=0 and beta==0) |
alpha1 |
A numeric value giving the proportion of mediators for which the null alpha==0 holds (alpha: association between exposure and mediator) |
alpha2 |
A numeric value giving the proportion of mediators for which the null beta==0 holds (beta: association between mediator and outcome, adjusted for exposure) |
input_pvalues |
A matrix with two columns of p-values for the candidate mediators. Column 1 is the p-value for testing whether the exposure is associated with the mediator (alpha!=0). Column 2 is the p-value for testing whether the mediator is associated with the outcome, adjusted for the exposure (beta!=0) |
exact |
The method used to derive the mixture null distribution. exact=0: the approximation method, which does not estimate the CDFs; exact=1: the exact method, which estimates the CDFs nonparametrically |
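To make the difference between the two options concrete, the sketch below writes out one way an exact-style mixture null CDF for pmax could be assembled. It assumes the two p-values are independent, so that pmax has CDF t^2 under H00, t * F2(t) under H01 and F1(t) * t under H10, where F1 and F2 are the non-null CDFs of the column 1 and column 2 p-values. F1 and F2 are recovered from the empirical CDF through F_n(t) = pi0 * t + (1 - pi0) * F(t), using alpha1 and alpha2 as pi0. The helper names `estimate_alt_cdf()` and `exact_mixture_null_cdf()` are hypothetical, and this is not the package's internal code.

```r
# Hypothetical sketch, not HDMT code: an exact-style mixture null CDF for
# pmax = max(p1, p2), assuming p1 and p2 are independent.

# Non-null CDF of one p-value column, recovered from the empirical CDF via
# F_n(t) = pi0 * t + (1 - pi0) * F_alt(t), then clipped to [0, 1].
estimate_alt_cdf <- function(p, pi0, t) {
  if (pi0 >= 1) {
    # No non-null component to estimate: fall back to F_alt(t) = 1
    return(rep(1, length(t)))
  }
  fn <- ecdf(p)(t)
  pmin(1, pmax(0, (fn - pi0 * t) / (1 - pi0)))
}

exact_mixture_null_cdf <- function(t, input_pvalues, alpha00, alpha01,
                                   alpha10, alpha1, alpha2) {
  f1_alt <- estimate_alt_cdf(input_pvalues[, 1], alpha1, t)  # column 1: test of alpha
  f2_alt <- estimate_alt_cdf(input_pvalues[, 2], alpha2, t)  # column 2: test of beta
  # H00: t * t;  H01: t * F2_alt(t);  H10: F1_alt(t) * t
  num <- alpha00 * t^2 + alpha01 * t * f2_alt + alpha10 * f1_alt * t
  # Normalize by the total null proportion so the CDF reaches 1 at t = 1
  num / (alpha00 + alpha01 + alpha10)
}
```

Setting alpha1 = alpha2 = 1 makes both estimated non-null CDFs equal to 1, which reduces the sketch to a CDF that needs no CDF estimation at all, in the spirit of exact=0 as documented above.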
This function computes the expected quantiles of the observed pmax values under the estimated mixture null distribution. Methodological details are given in Dai et al. (2020).
A vector containing the expected quantiles of the p-values under the estimated mixture null distribution. See Dai et al. (2020) for details of how the quantiles are computed with the approximation method (exact=0) or the exact method (exact=1).
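As an illustration of what "expected quantiles under the mixture null" means, here is a minimal sketch of an approximation, assuming pmax has CDF t^2 under H00 and approximately t under H01 and H10 (where the non-null p-value is usually much smaller than the null one). The normalized mixture null CDF is then F0(t) = (alpha00 t^2 + (alpha01 + alpha10) t) / (alpha00 + alpha01 + alpha10), which can be inverted in closed form. The helper names and the plotting positions (i - 0.5)/n are assumptions, not the package's code.

```r
# Hypothetical helpers, not HDMT functions.
# Inverse of F0(q) = (a * q^2 + b * q) / (a + b), with a = alpha00 and
# b = alpha01 + alpha10, evaluated at probabilities u in [0, 1].
approx_null_quantile <- function(u, alpha00, alpha01, alpha10) {
  a <- alpha00
  b <- alpha01 + alpha10
  if (a == 0) return(u)  # only H01/H10 nulls: F0 is uniform
  # Positive root of a * q^2 + b * q - u * (a + b) = 0
  (-b + sqrt(b^2 + 4 * a * u * (a + b))) / (2 * a)
}

# Expected null quantiles for n observed pmax values at positions (i - 0.5) / n
expected_pmax_quantiles <- function(n, alpha00, alpha01, alpha10) {
  approx_null_quantile((seq_len(n) - 0.5) / n, alpha00, alpha01, alpha10)
}
```

When every null is H00 (alpha01 = alpha10 = 0), the quantile is sqrt(u), which is larger than u; this is why the naive uniform reference makes JS look conservative.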
James Y. Dai and X. Wang
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
input_pvalues <- snp_input
# To save time for illustration, use a random 10 percent of the rows
set.seed(1)  # makes the random subset reproducible
input_pvalues <- input_pvalues[sample(seq_len(nrow(input_pvalues)),
                                      size = ceiling(nrow(input_pvalues) / 10)), ]
nullprop <- null_estimation(input_pvalues)
pnull <- adjust_quantile(nullprop$alpha00, nullprop$alpha01, nullprop$alpha10,
                         nullprop$alpha1, nullprop$alpha2, input_pvalues, exact = 0)
```
A function to draw the corrected quantile-quantile (Q-Q) plots. The corrected quantiles are computed from the mixture null distribution (green dots) and the naive quantiles from the uniform distribution (red dots).
```r
correct_qqplot(pmax, pnull, opt = "all")
```
pmax |
A vector of maximum p-values (pmax), one per candidate mediator |
pnull |
A vector of quantiles of pmax under the estimated mixture null distribution, such as the output of adjust_quantile() |
opt |
Option controlling which points are drawn. opt="all": use all the data points; opt="subset": use a subset of the data points, to avoid an overcrowded Q-Q plot with a prohibitively large image when there are many points, as in a genome-wide setting. |
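The plot described above can be approximated with base R graphics. In this hypothetical sketch (`qq_coordinates()` and `draw_qq()` are not HDMT functions), both sets of expected quantiles are placed on the -log10 scale that is common for genome-wide p-values. The scale and the (i - 0.5)/n uniform plotting positions are assumptions, and the sketch sorts pnull itself rather than relying on any particular ordering of the adjust_quantile() output.

```r
# Hypothetical helpers, not HDMT's correct_qqplot().
qq_coordinates <- function(pmax, pnull) {
  n <- length(pmax)
  data.frame(
    observed = -log10(sort(pmax)),             # observed pmax, smallest first
    mixture  = -log10(sort(pnull)),            # corrected (mixture null) quantiles
    uniform  = -log10((seq_len(n) - 0.5) / n)  # naive Uniform(0, 1) quantiles
  )
}

draw_qq <- function(xy) {
  lim <- c(0, max(unlist(xy)))
  # Red: observed vs naive uniform quantiles
  plot(xy$uniform, xy$observed, col = "red", pch = 20, xlim = lim, ylim = lim,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(pmax)")
  # Green: observed vs mixture null quantiles
  points(xy$mixture, xy$observed, col = "green", pch = 20)
  abline(0, 1, lty = 2)  # identity line
}
```

With pmax and pnull1 from this function's example, `draw_qq(qq_coordinates(pmax, pnull1))` draws the red and green points against the identity line.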
James Y. Dai and X. Wang
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
input_pvalues <- snp_input
# To save time for illustration, use a random 10 percent of the rows
set.seed(1)  # makes the random subset reproducible
input_pvalues <- input_pvalues[sample(seq_len(nrow(input_pvalues)),
                                      size = ceiling(nrow(input_pvalues) / 10)), ]
pmax <- apply(input_pvalues, 1, max)
nullprop <- null_estimation(input_pvalues)
# Named arguments match adjust_quantile(alpha00, alpha01, alpha10, ...) exactly
pnull1 <- adjust_quantile(alpha00 = nullprop$alpha00, alpha01 = nullprop$alpha01,
                          alpha10 = nullprop$alpha10, alpha1 = nullprop$alpha1,
                          alpha2 = nullprop$alpha2, input_pvalues = input_pvalues,
                          exact = 1)
correct_qqplot(pmax, pnull1)
```
This example dataset was included to assess the mediating role of DNA methylation in the effect of exercise on prostate cancer (PCa) progression in a Seattle-based cohort of patients diagnosed with clinically localized PCa. The full dataset contains two sets of p-values from genome-wide testing of 450K CpG sites. Due to space limits, a 10 percent subset of the full dataset is included in the package for illustration.
The dataset is a matrix with two columns of p-values for candidate mediators. Column 1 contains the p-values for testing whether the exposure is associated with the mediator (alpha!=0). Column 2 contains the p-values for testing whether the mediator is associated with the outcome after adjusting for the exposure (beta!=0).
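The package functions expect input in this two-column layout. The hypothetical helper below (`pmax_from_input()` is not part of HDMT) checks the layout and computes pmax, the larger of the two p-values for each candidate mediator, which is the statistic used by the joint significance test. Base R's vectorized `pmax()` gives the same result as `apply(input_pvalues, 1, max)` in the package examples.

```r
# Hypothetical helper, not part of HDMT.
pmax_from_input <- function(input_pvalues) {
  # Expect a two-column numeric matrix of p-values in [0, 1]
  stopifnot(is.matrix(input_pvalues), ncol(input_pvalues) == 2)
  stopifnot(all(input_pvalues >= 0 & input_pvalues <= 1, na.rm = TRUE))
  # Column 1: test of alpha (exposure -> mediator)
  # Column 2: test of beta (mediator -> outcome, adjusted for exposure)
  # Joint significance uses the larger of the two p-values
  pmax(input_pvalues[, 1], input_pvalues[, 2])
}
```

For example, `pmax_from_input(exercise_input)` returns one pmax value per row.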
```r
data("exercise_input")
```
The format of exercise_input is: num [1:47900, 1:2] 0.4966344 0.1048730 0.1005355 0.4946623 ...
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(exercise_input)
dim(exercise_input)
```
A function to compute the estimated pointwise FDR based on the proposed joint significance mixture null method (JS-mixture).
```r
fdr_est(alpha00, alpha01, alpha10, alpha1, alpha2, input_pvalues, exact = 0)
```
alpha00 |
A numeric value giving the proportion of the component null H00 (alpha==0 and beta==0) |
alpha01 |
A numeric value giving the proportion of the component null H01 (alpha==0 and beta!=0) |
alpha10 |
A numeric value giving the proportion of the component null H10 (alpha!=0 and beta==0) |
alpha1 |
A numeric value giving the proportion of mediators for which the null alpha==0 holds |
alpha2 |
A numeric value giving the proportion of mediators for which the null beta==0 holds |
input_pvalues |
A matrix with two columns of p-values for the candidate mediators. Column 1 is the p-value for testing whether the exposure is associated with the mediator (alpha!=0). Column 2 is the p-value for testing whether the mediator is associated with the outcome, adjusted for the exposure (beta!=0) |
exact |
The method used to derive the mixture null distribution. exact=0: approximation without estimating the CDFs; exact=1: estimate the CDFs nonparametrically |
This function estimates the pointwise FDR for the observed pmax values using the estimated mixture null distribution. See Dai et al. (2020) for details of the approximation method (exact=0) and the exact method (exact=1).
A vector of estimated pointwise FDRs for pmax.
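One simple way a pointwise FDR could be formed from the mixture null is sketched below: the expected fraction of tests that are nulls with pmax <= t, divided by the observed fraction of tests with pmax <= t, followed by a running minimum so that the estimate is non-decreasing in pmax, as with q-values. This Storey-style plug-in estimator, the approximate null CDF (t^2 under H00, about t under H01 and H10) and the helper name `plugin_fdr()` are all assumptions; fdr_est() may compute the pointwise FDR differently.

```r
# Hypothetical sketch, not HDMT's fdr_est().
plugin_fdr <- function(p_max, alpha00, alpha01, alpha10) {
  n <- length(p_max)
  # Expected fraction of all tests that are nulls with pmax <= t (approximation)
  null_frac <- alpha00 * p_max^2 + (alpha01 + alpha10) * p_max
  # Observed fraction of tests with pmax <= t
  obs_frac <- rank(p_max, ties.method = "max") / n
  fdr <- pmin(1, null_frac / obs_frac)
  # Running minimum from the largest pmax down keeps the estimate monotone
  ord <- order(p_max, decreasing = TRUE)
  fdr[ord] <- cummin(fdr[ord])
  fdr
}
```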
James Y. Dai and X. Wang
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
input_pvalues <- snp_input
# To save time for illustration, use a random 10 percent of the rows
set.seed(1)  # makes the random subset reproducible
input_pvalues <- input_pvalues[sample(seq_len(nrow(input_pvalues)),
                                      size = ceiling(nrow(input_pvalues) / 10)), ]
nullprop <- null_estimation(input_pvalues)
fdr <- fdr_est(nullprop$alpha00, nullprop$alpha01, nullprop$alpha10,
               nullprop$alpha1, nullprop$alpha2, input_pvalues, exact = 0)
```
A function to compute the FWER cutoff for p-max using the estimated mixture null distribution
```r
fwer_est(alpha10, alpha01, alpha00, alpha1, alpha2, input_pvalues, alpha = 0.05, exact = 0)
```
alpha00 |
A numeric value giving the proportion of the component null H00 (alpha==0 and beta==0) |
alpha01 |
A numeric value giving the proportion of the component null H01 (alpha==0 and beta!=0) |
alpha10 |
A numeric value giving the proportion of the component null H10 (alpha!=0 and beta==0) |
alpha1 |
A numeric value giving the proportion of mediators for which the null alpha==0 holds |
alpha2 |
A numeric value giving the proportion of mediators for which the null beta==0 holds |
input_pvalues |
A matrix with two columns of p-values for the candidate mediators. Column 1 is the p-value for testing whether the exposure is associated with the mediator (alpha!=0). Column 2 is the p-value for testing whether the mediator is associated with the outcome, adjusted for the exposure (beta!=0) |
alpha |
The designated FWER level (default 0.05) |
exact |
The method used to derive the mixture null distribution. exact=0: approximation without estimating the CDFs; exact=1: estimate the CDFs nonparametrically |
This function computes the FWER cutoff for pmax, accounting for the mixture null distribution. Methodological details are given in Dai et al. (2020).
A numeric value: the pmax cutoff at the designated FWER level alpha.
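One way to see why accounting for the mixture null relaxes the cutoff: assuming independent p-values, P(pmax <= t) is t^2 under H00 and at most t under H01 and H10, so by a union bound the FWER at cutoff t is at most n (alpha00 t^2 + (alpha01 + alpha10) t). Setting this bound equal to alpha gives a quadratic in t. The minimal sketch below solves it; the Bonferroni cutoff alpha / n is the special case alpha00 = 0, alpha01 + alpha10 = 1. `union_bound_cutoff()` is a hypothetical helper, and fwer_est() is not guaranteed to use this bound.

```r
# Hypothetical sketch, not HDMT's fwer_est().
# Solve n * (alpha00 * t^2 + (alpha01 + alpha10) * t) = alpha for t.
union_bound_cutoff <- function(n, alpha00, alpha01, alpha10, alpha = 0.05) {
  a <- n * alpha00
  b <- n * (alpha01 + alpha10)
  if (a == 0) return(min(1, alpha / b))  # linear case (Inf -> 1 if no nulls)
  # Positive root of a * t^2 + b * t - alpha = 0, capped at 1
  min(1, (-b + sqrt(b^2 + 4 * a * alpha)) / (2 * a))
}
```

For n = 100 tests that are all H00, the cutoff is sqrt(0.05 / 100), about 0.022, compared with the Bonferroni cutoff of 0.0005.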
James Y. Dai and X. Wang
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
input_pvalues <- snp_input
# To save time for illustration, use a random 10 percent of the rows
set.seed(1)  # makes the random subset reproducible
input_pvalues <- input_pvalues[sample(seq_len(nrow(input_pvalues)),
                                      size = ceiling(nrow(input_pvalues) / 10)), ]
nullprop <- null_estimation(input_pvalues)
# fwer_est() takes alpha10 first, then alpha01, then alpha00
fwercut0 <- fwer_est(nullprop$alpha10, nullprop$alpha01, nullprop$alpha00, nullprop$alpha1,
                     nullprop$alpha2, input_pvalues, alpha = 0.05, exact = 0)
fwercut1 <- fwer_est(nullprop$alpha10, nullprop$alpha01, nullprop$alpha00, nullprop$alpha1,
                     nullprop$alpha2, input_pvalues, alpha = 0.05, exact = 1)
```
This function estimates the proportions of the three component null hypotheses involved in mediation testing. We developed a three-component mixture model to estimate these proportions; the resulting mixture null distribution provides much more accurate control of the family-wise error rate (FWER) and the false discovery rate (FDR) than the standard approach based on the uniform null distribution.
```r
null_estimation(input_pvalues)
```
input_pvalues |
A matrix with two columns of p-values for candidate mediators. Column 1 is the p-value for testing whether the exposure is associated with the candidate mediator (alpha!=0). Column 2 is the p-value for testing whether the candidate mediator is associated with the outcome, adjusted for the exposure (beta!=0). |
A function to estimate the proportions of the three types of component null hypotheses:
H00: alpha==0 and beta==0
H01: alpha==0 and beta!=0
H10: alpha!=0 and beta==0
The methodology detail can be found in Dai et al (2020).
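A list with five elements:
alpha00 |
A numeric value giving the estimated proportion of the component null H00 (alpha==0 and beta==0) |
alpha01 |
A numeric value giving the estimated proportion of the component null H01 (alpha==0 and beta!=0) |
alpha10 |
A numeric value giving the estimated proportion of the component null H10 (alpha!=0 and beta==0) |
alpha1 |
A numeric value giving the estimated proportion of mediators for which the null alpha==0 holds |
alpha2 |
A numeric value giving the estimated proportion of mediators for which the null beta==0 holds |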
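For intuition about how such proportions can be estimated, the sketch below applies Storey's pi0 estimator with a fixed lambda = 0.5 to each column to get alpha1 and alpha2. It estimates alpha00 from the fraction of rows where both p-values exceed lambda, which is about alpha00 (1 - lambda)^2 if non-null p-values rarely exceed lambda. alpha01 and alpha10 then follow from alpha1 = alpha00 + alpha01 and alpha2 = alpha00 + alpha10, since alpha==0 holds exactly under H00 and H01, and beta==0 exactly under H00 and H10. The fixed lambda and the helper names are assumptions; this is not necessarily the estimator used by null_estimation().

```r
# Hypothetical Storey-type sketch, not HDMT's null_estimation().
storey_pi0 <- function(p, lambda = 0.5) {
  # Null p-values are uniform, so about pi0 * (1 - lambda) of them exceed lambda
  min(1, mean(p > lambda) / (1 - lambda))
}

sketch_null_proportions <- function(input_pvalues, lambda = 0.5) {
  p1 <- input_pvalues[, 1]
  p2 <- input_pvalues[, 2]
  alpha1 <- storey_pi0(p1, lambda)  # proportion with alpha == 0
  alpha2 <- storey_pi0(p2, lambda)  # proportion with beta == 0
  # Under H00 both p-values are uniform and independent, so both exceed
  # lambda with probability (1 - lambda)^2; other hypotheses rarely do
  alpha00 <- mean(p1 > lambda & p2 > lambda) / (1 - lambda)^2
  alpha00 <- min(alpha00, alpha1, alpha2)  # keeps alpha01, alpha10 >= 0
  list(alpha00 = alpha00,
       alpha01 = alpha1 - alpha00,  # alpha == 0, beta != 0
       alpha10 = alpha2 - alpha00,  # alpha != 0, beta == 0
       alpha1 = alpha1,
       alpha2 = alpha2)
}
```

The returned list mirrors the five elements documented above, so it can be compared directly with `null_estimation(input_pvalues)` on the same data.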
James Y. Dai and X. Wang
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
input_pvalues <- snp_input
# To save computing time for illustration, use a random 10 percent of the rows (p-values)
set.seed(1)  # makes the random subset reproducible
input_pvalues <- input_pvalues[sample(seq_len(nrow(input_pvalues)),
                                      size = ceiling(nrow(input_pvalues) / 10)), ]
nullprop <- null_estimation(input_pvalues)
```
This example dataset is included in 'HDMT' to assess the mediation role of DNA methylation in genetic regulation of gene expression in primary prostate cancer (PCa) samples from The Cancer Genome Atlas (TCGA) with risk SNPs as the exposure.
The dataset is a matrix with two columns of p-values for candidate mediators. Column 1 contains the p-values for testing whether the exposure is associated with the mediator (alpha!=0). Column 2 contains the p-values for testing whether the mediator is associated with the outcome after adjusting for the exposure (beta!=0).
```r
data("snp_input")
```
The format of snp_input is: num [1:69602, 1:2] 0.106 0.999 0.101 0.173 0.89 ...
James Y. Dai, Janet L. Stanford, Michael LeBlanc. A multiple-testing procedure for high-dimensional mediation hypotheses, Journal of the American Statistical Association, 2020, DOI: 10.1080/01621459.2020.1765785.
```r
library(HDMT)
data(snp_input)
dim(snp_input)
```