Title: | Power Analysis for Differential Expression Studies |
---|---|
Description: | Provides a convenient framework to simulate, test, power, and visualize data for differential expression studies with lognormal or negative binomial outcomes. Supported designs are two-sample comparisons of independent or dependent outcomes. Power may be summarized in the context of controlling the per-family error rate or family-wise error rate. Negative binomial methods are described in Yu, Fernandez, and Brock (2017) <doi:10.1186/s12859-017-1648-2> and Yu, Fernandez, and Brock (2020) <doi:10.1186/s12859-020-3541-7>. |
Authors: | Brett Klamer [aut, cre] , Lianbo Yu [aut] |
Maintainer: | Brett Klamer <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2024.12.4 |
Built: | 2024-12-09 09:37:32 UTC |
Source: | CRAN |
Generalized linear model for two independent negative binomial outcomes.
glm_nb(data, equal_dispersion = FALSE, test = "wald", ci_level = NULL, ...)
glm_nb(data, equal_dispersion = FALSE, test = "wald", ci_level = NULL, ...)
data |
(list) |
equal_dispersion |
(Scalar logical: |
test |
(String: |
ci_level |
(Scalar numeric: |
... |
Optional arguments passed to |
Uses glmmTMB::glmmTMB()
in the form
glmmTMB( formula = value ~ condition, data = data, dispformula = ~ condition, family = nbinom2 )
to model independent negative binomial outcomes
and
where
is the mean of group 1,
is the ratio of the means of
group 2 with respect to group 1,
is the dispersion parameter
of group 1, and
is the dispersion parameter of group 2.
The hypotheses for the LRT and Wald test of are
where is the population ratio of
arithmetic means for group 2 with respect to group 1 and
assumes the population means are identical.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (group 2 / group 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean1 |
Estimated mean of group 1. | |
5 | 1 | estimate |
Point estimate. |
5 | 2 | lower |
Confidence interval lower bound. |
5 | 3 | upper |
Confidence interval upper bound. |
6 | mean2 |
Estimated mean of group 2. | |
6 | 1 | estimate |
Point estimate. |
6 | 2 | lower |
Confidence interval lower bound. |
6 | 3 | upper |
Confidence interval upper bound. |
7 | dispersion1 |
Estimated dispersion of group 1. | |
7 | 1 | estimate |
Point estimate. |
7 | 2 | lower |
Confidence interval lower bound. |
7 | 3 | upper |
Confidence interval upper bound. |
8 | dispersion2 |
Estimated dispersion of group 2. | |
8 | 1 | estimate |
Point estimate. |
8 | 2 | lower |
Confidence interval lower bound. |
8 | 3 | upper |
Confidence interval upper bound. |
9 | n1 |
Sample size of group 1. | |
10 | n2 |
Sample size of group 2. | |
11 | method |
Method used for the results. | |
12 | test |
Type of hypothesis test. | |
13 | alternative |
The alternative hypothesis. | |
14 | equal_dispersion |
Whether or not equal dispersions were assumed. | |
15 | ci_level |
Confidence level of the intervals. | |
16 | hessian |
Information about the Hessian matrix. | |
17 | convergence |
Information about convergence. |
Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.
Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.
#---------------------------------------------------------------------------- # glm_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) lrt <- glm_nb(d, equal_dispersion = FALSE, test = "lrt", ci_level = 0.95) lrt wald <- glm_nb(d, equal_dispersion = FALSE, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition, data = d, dispformula = ~ condition, family = glmmTMB::nbinom2, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1, data = d, dispformula = ~ condition, family = glmmTMB::nbinom2, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
#---------------------------------------------------------------------------- # glm_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) lrt <- glm_nb(d, equal_dispersion = FALSE, test = "lrt", ci_level = 0.95) lrt wald <- glm_nb(d, equal_dispersion = FALSE, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition, data = d, dispformula = ~ condition, family = glmmTMB::nbinom2, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1, data = d, dispformula = ~ condition, family = glmmTMB::nbinom2, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
Generalized linear mixed model for bivariate negative binomial outcomes.
glmm_bnb(data, test = "wald", ci_level = NULL, ...)
glmm_bnb(data, test = "wald", ci_level = NULL, ...)
data |
(list) |
test |
(String: |
ci_level |
(Scalar numeric: |
... |
Optional arguments passed to |
Uses glmmTMB::glmmTMB()
in the form
glmmTMB( formula = value ~ condition + (1 | item), data = data, dispformula = ~ 1, family = nbinom2 )
to model dependent negative binomial outcomes
where
is the mean of
sample 1,
is the ratio of the means of sample 2 with respect to
sample 1, and
is the dispersion parameter.
The hypotheses for the LRT and Wald test of are
where is the population ratio of
arithmetic means for sample 2 with respect to sample 1 and
assumes the population means are identical.
When simulating data from sim_bnb()
, the mean is a function of the
item (subject) random effect which in turn is a function of the dispersion
parameter. Thus, glmm_bnb()
has biased mean and dispersion estimates. The
bias increases as the dispersion parameter gets smaller and decreases as
the dispersion parameter gets larger. However, estimates of the ratio and
standard deviation of the random intercept tend to be accurate. The p-value
for glmm_bnb()
is generally overconservative compared to glmm_poisson()
,
wald_test_bnb()
and lrt_bnb()
. In summary, the negative binomial
mixed-effects model fit by glmm_bnb()
is not recommended for the BNB data
simulated by sim_bnb()
. Instead, wald_test_bnb()
or lrt_bnb()
should
typically be used instead.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (sample 2 / sample 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean1 |
Estimated mean of sample 1. | |
5 | 1 | estimate |
Point estimate. |
5 | 2 | lower |
Confidence interval lower bound. |
5 | 3 | upper |
Confidence interval upper bound. |
6 | mean2 |
Estimated mean of sample 2. | |
6 | 1 | estimate |
Point estimate. |
6 | 2 | lower |
Confidence interval lower bound. |
6 | 3 | upper |
Confidence interval upper bound. |
7 | dispersion |
Estimated dispersion. | |
7 | 1 | estimate |
Point estimate. |
7 | 2 | lower |
Confidence interval lower bound. |
7 | 3 | upper |
Confidence interval upper bound. |
8 | item_sd |
Estimated standard deviation of the item (subject) random intercept. | |
8 | 1 | estimate |
Point estimate. |
8 | 2 | lower |
Confidence interval lower bound. |
8 | 3 | upper |
Confidence interval upper bound. |
9 | n1 |
Sample size of sample 1. | |
10 | n2 |
Sample size of sample 2. | |
11 | method |
Method used for the results. | |
12 | test |
Type of hypothesis test. | |
13 | alternative |
The alternative hypothesis. | |
14 | ci_level |
Confidence level of the interval. | |
15 | hessian |
Information about the Hessian matrix. | |
16 | convergence |
Information about convergence. |
Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.
Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.
#---------------------------------------------------------------------------- # glmm_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) lrt <- glmm_bnb(d, test = "lrt") lrt wald <- glmm_bnb(d, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition + (1 | item), data = d, dispformula = ~ 1, family = glmmTMB::nbinom2, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1 + (1 | item), data = d, dispformula = ~ 1, family = glmmTMB::nbinom2, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
#---------------------------------------------------------------------------- # glmm_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) lrt <- glmm_bnb(d, test = "lrt") lrt wald <- glmm_bnb(d, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition + (1 | item), data = d, dispformula = ~ 1, family = glmmTMB::nbinom2, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1 + (1 | item), data = d, dispformula = ~ 1, family = glmmTMB::nbinom2, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
Generalized linear mixed model for two dependent Poisson outcomes.
glmm_poisson(data, test = "wald", ci_level = NULL, ...)
glmm_poisson(data, test = "wald", ci_level = NULL, ...)
data |
(list) |
test |
(String: |
ci_level |
(Scalar numeric: |
... |
Optional arguments passed to |
Uses glmmTMB::glmmTMB()
in the form
glmmTMB( formula = value ~ condition + (1 | item), data = data, family = stats::poisson )
to model dependent Poisson outcomes and
where
is the mean of sample 1
and
is the ratio of the means of sample 2 with respect to sample 1.
The hypotheses for the LRT and Wald test of are
where is the population ratio of
arithmetic means for sample 2 with respect to sample 1 and
assumes the population means are identical.
When simulating data from sim_bnb()
, the mean is a function of the
item (subject) random effect which in turn is a function of the dispersion
parameter. Thus, glmm_poisson()
has biased mean estimates. The bias
increases as the dispersion parameter gets smaller and decreases as the
dispersion parameter gets larger. However, estimates of the ratio and
standard deviation of the random intercept tend to be accurate. In summary,
the Poisson mixed-effects model fit by glmm_poisson()
is not recommended
for the BNB data simulated by sim_bnb()
. Instead, wald_test_bnb()
or
lrt_bnb()
should typically be used instead.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (sample 2 / sample 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean1 |
Estimated mean of sample 1. | |
5 | 1 | estimate |
Point estimate. |
5 | 2 | lower |
Confidence interval lower bound. |
5 | 3 | upper |
Confidence interval upper bound. |
6 | mean2 |
Estimated mean of sample 2. | |
6 | 1 | estimate |
Point estimate. |
6 | 2 | lower |
Confidence interval lower bound. |
6 | 3 | upper |
Confidence interval upper bound. |
7 | item_sd |
Estimated standard deviation of the item (subject) random intercept. | |
7 | 1 | estimate |
Point estimate. |
7 | 2 | lower |
Confidence interval lower bound. |
7 | 3 | upper |
Confidence interval upper bound. |
8 | n1 |
Sample size of sample 1. | |
9 | n2 |
Sample size of sample 2. | |
10 | method |
Method used for the results. | |
11 | test |
Type of hypothesis test. | |
12 | alternative |
The alternative hypothesis. | |
13 | ci_level |
Confidence level of the interval. | |
14 | hessian |
Information about the Hessian matrix. | |
15 | convergence |
Information about convergence. |
Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.
Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.
#---------------------------------------------------------------------------- # glmm_poisson() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) lrt <- glmm_poisson(d, test = "lrt") lrt wald <- glmm_poisson(d, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition + (1 | item), data = d, family = stats::poisson, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1 + (1 | item), data = d, family = stats::poisson, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
#---------------------------------------------------------------------------- # glmm_poisson() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) lrt <- glmm_poisson(d, test = "lrt") lrt wald <- glmm_poisson(d, test = "wald", ci_level = 0.95) wald #---------------------------------------------------------------------------- # Compare results to manual calculation of chi-square statistic #---------------------------------------------------------------------------- # Use the same data, but as a data frame instead of list set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2, return_type = "data.frame" ) mod_alt <- glmmTMB::glmmTMB( formula = value ~ condition + (1 | item), data = d, family = stats::poisson, ) mod_null <- glmmTMB::glmmTMB( formula = value ~ 1 + (1 | item), data = d, family = stats::poisson, ) lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt))) lrt_chisq wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2 wald_chisq anova(mod_null, mod_alt)
Likelihood ratio test for the ratio of means from bivariate negative binomial outcomes.
lrt_bnb(data, ratio_null = 1, ...)
lrt_bnb(data, ratio_null = 1, ...)
data |
(list) |
ratio_null |
(Scalar numeric: |
... |
Optional arguments passed to the MLE function |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
where
is the random item (subject)
effect. Then
is the joint
distribution where
and
are dependent (though conditionally
independent),
is the count outcome for sample 1 of the items
(subjects),
is the count outcome for sample 2 of the items (subjects),
is the conditional mean of sample 1,
is the ratio of the
conditional means of sample 2 with respect to sample 1, and
is
the gamma distribution shape parameter which controls the dispersion and the
correlation between sample 1 and 2.
The hypotheses for the LRT of are
where is the population ratio of
arithmetic means for sample 2 with respect to sample 1 and
is
a constant for the assumed null population ratio of means (typically
).
The LRT statistic is
Under , the LRT test statistic is asymptotically distributed
as
. The approximate level
test rejects
if
. Note that
the asymptotic critical value is known to underestimate the exact critical
value. Hence, the nominal significance level may not be achieved for small
sample sizes (possibly
or
).
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (sample 2 / sample 1). | |
5 | alternative |
Point estimates under the alternative hypothesis. | |
5 | 1 | mean1 |
Estimated mean of sample 1. |
5 | 2 | mean2 |
Estimated mean of sample 2. |
5 | 3 | dispersion |
Estimated dispersion. |
6 | null |
Point estimates under the null hypothesis. | |
6 | 1 | mean1 |
Estimated mean of sample 1. |
6 | 2 | mean2 |
Estimated mean of sample 2. |
6 | 3 | dispersion |
Estimated dispersion. |
7 | n1 |
The sample size of sample 1. | |
8 | n2 |
The sample size of sample 2. | |
9 | method |
Method used for the results. | |
10 | ratio_null |
Assumed population ratio of means. | |
11 | mle_code |
Integer indicating why the optimization process terminated. | |
12 | mle_message |
Information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # lrt_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) |> lrt_bnb()
#---------------------------------------------------------------------------- # lrt_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) |> lrt_bnb()
Likelihood ratio test for the ratio of means from two independent negative binomial outcomes.
lrt_nb(data, equal_dispersion = FALSE, ratio_null = 1, ...)
lrt_nb(data, equal_dispersion = FALSE, ratio_null = 1, ...)
data |
(list) |
equal_dispersion |
(Scalar logical: |
ratio_null |
(Scalar numeric: |
... |
Optional arguments passed to the MLE function |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
where
and
are
independent,
is the count outcome for items in group 1,
is the count outcome for items in group 2,
is the arithmetic mean
count in group 1,
is the ratio of arithmetic means for group 2 with
respect to group 1,
is the dispersion parameter of group 1,
and
is the dispersion parameter of group 2.
The hypotheses for the LRT of are
where is the population ratio of
arithmetic means for group 2 with respect to group 1 and
is a
constant for the assumed null population ratio of means (typically
).
The LRT statistic is
Under , the LRT test statistic is asymptotically distributed
as
. The approximate level
test rejects
if
. Note that
the asymptotic critical value is known to underestimate the exact critical
value. Hence, the nominal significance level may not be achieved for small
sample sizes (possibly
or
).
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (group 2 / group 1). | |
5 | alternative |
Point estimates under the alternative hypothesis. | |
5 | 1 | mean1 |
Estimated mean of group 1. |
5 | 2 | mean2 |
Estimated mean of group 2. |
5 | 3 | dispersion1 |
Estimated dispersion of group 1. |
5 | 4 | dispersion2 |
Estimated dispersion of group 2. |
6 | null |
Point estimates under the null hypothesis. | |
6 | 1 | mean1 |
Estimated mean of group 1. |
6 | 2 | mean2 |
Estimated mean of group 2. |
6 | 3 | dispersion1 |
Estimated dispersion of group 1. |
6 | 4 | dispersion2 |
Estimated dispersion of group 2. |
7 | n1 |
Sample size of group 1. | |
8 | n2 |
Sample size of group 2. | |
9 | method |
Method used for the results. | |
10 | equal_dispersion |
Whether or not equal dispersions were assumed. | |
11 | ratio_null |
Assumed population ratio of means. | |
12 | mle_code |
Integer indicating why the optimization process terminated. | |
13 | mle_message |
Information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # lrt_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) |> lrt_nb()
#---------------------------------------------------------------------------- # lrt_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) |> lrt_nb()
Maximum likelihood estimates (MLE) for bivariate negative binomial outcomes.
mle_bnb_null(data, ratio_null = 1, method = "nlm_constrained", ...) mle_bnb_alt(data, method = "nlm_constrained", ...)
mle_bnb_null(data, ratio_null = 1, method = "nlm_constrained", ...) mle_bnb_alt(data, method = "nlm_constrained", ...)
data |
(list) |
ratio_null |
(Scalar numeric: |
method |
(string: |
... |
Optional arguments passed to the optimization method. |
These functions are primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
where
is the random item (subject) effect.
Then
is the joint distribution where
and
are dependent (though conditionally independent),
is the count outcome for sample 1 of the items (subjects),
is the count outcome for sample 2 of the items (subjects),
is the conditional mean of sample 1,
is the ratio of the
conditional means of sample 2 with respect to sample 1, and
is
the gamma distribution shape parameter which controls the dispersion and the
correlation between sample 1 and 2.
The MLEs of and
are
and
. The MLE of
is found by
maximizing the profile log-likelihood
with respect to
. When
is known, the MLE of
is
and
is obtained by maximizing the profile log-likelihood
with respect to
.
The backend method for numerical optimization is controlled by argument
method
which refers to stats::nlm()
, stats::nlminb()
, or
stats::optim()
. If you would like to see warnings from the optimizer,
include argument warnings = TRUE
.
For mle_bnb_alt
, a list with the following elements:
Slot | Name | Description |
1 | mean1 |
MLE for mean of sample 1. |
2 | mean2 |
MLE for mean of sample 2. |
3 | ratio |
MLE for ratio of means. |
4 | dispersion |
MLE for BNB dispersion. |
5 | nll |
Minimum of negative log-likelihood. |
6 | nparams |
Number of estimated parameters. |
7 | n1 |
Sample size of sample 1. |
8 | n2 |
Sample size of sample 2. |
9 | method |
Method used for the results. |
10 | mle_method |
Method used for optimization. |
11 | mle_code |
Integer indicating why the optimization process terminated. |
12 | mle_message |
Additional information from the optimizer. |
For mle_bnb_null
, a list with the following elements:
Slot | Name | Description |
1 | mean1 |
MLE for mean of sample 1. |
2 | mean2 |
MLE for mean of sample 2. |
3 | ratio_null |
Population ratio of means assumed for null hypothesis.
mean2 = mean1 * ratio_null . |
4 | dispersion |
MLE for BNB dispersion. |
5 | nll |
Minimum of negative log-likelihood. |
6 | nparams |
Number of estimated parameters. |
7 | n1 |
Sample size of sample 1. |
8 | n2 |
Sample size of sample 2. |
9 | method |
Method used for the results. |
10 | mle_method |
Method used for optimization. |
11 | mle_code |
Integer indicating why the optimization process terminated. |
12 | mle_message |
Additional information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # mle_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) mle_alt <- d |> mle_bnb_alt() mle_null <- d |> mle_bnb_null() mle_alt mle_null
#---------------------------------------------------------------------------- # mle_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) mle_alt <- d |> mle_bnb_alt() mle_null <- d |> mle_bnb_null() mle_alt mle_null
Maximum likelihood estimates (MLE) for two independent negative binomial outcomes.
mle_nb_null( data, equal_dispersion = FALSE, ratio_null = 1, method = "nlm_constrained", ... ) mle_nb_alt(data, equal_dispersion = FALSE, method = "nlm_constrained", ...)
mle_nb_null( data, equal_dispersion = FALSE, ratio_null = 1, method = "nlm_constrained", ... ) mle_nb_alt(data, equal_dispersion = FALSE, method = "nlm_constrained", ...)
data |
(list) |
equal_dispersion |
(Scalar logical: |
ratio_null |
(Scalar numeric: |
method |
(string: |
... |
Optional arguments passed to the optimization method. |
These functions are primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
, where
and
are
independent,
is the count outcome for items in group 1,
is the count outcome for items in group 2,
is the arithmetic mean
count in group 1,
is the ratio of arithmetic means for group 2 with
respect to group 1,
is the dispersion parameter of group 1,
and
is the dispersion parameter of group 2.
The MLEs of and
are
and
. The MLEs of
and
are found by maximizing the profile log-likelihood
with respect to
and
. When
is known, the MLE
of
is
and
and
are obtained by maximizing
the profile log-likelihood
.
The backend method for numerical optimization is controlled by argument
method
which refers to stats::nlm()
, stats::nlminb()
, or
stats::optim()
. If you would like to see warnings from the optimizer,
include argument warnings = TRUE
.
For mle_nb_alt()
, a list with the following elements:
Slot | Name | Description |
1 | mean1 |
MLE for mean of group 1. |
2 | mean2 |
MLE for mean of group 2. |
3 | ratio |
MLE for ratio of means. |
4 | dispersion1 |
MLE for dispersion of group 1. |
5 | dispersion2 |
MLE for dispersion of group 2. |
6 | equal_dispersion |
Were equal dispersions assumed. |
7 | n1 |
Sample size of group 1. |
8 | n2 |
Sample size of group 2. |
9 | nll |
Minimum of negative log-likelihood. |
10 | nparams |
Number of estimated parameters. |
11 | method |
Method used for the results. |
12 | mle_method |
Method used for optimization. |
13 | mle_code |
Integer indicating why the optimization process terminated. |
14 | mle_message |
Additional information from the optimizer. |
For mle_nb_null()
, a list with the following elements:
Slot | Name | Description |
1 | mean1 |
MLE for mean of group 1. |
2 | mean2 |
MLE for mean of group 2. |
3 | ratio_null |
Population ratio of means assumed for null hypothesis.
mean2 = mean1 * ratio_null . |
4 | dispersion1 |
MLE for dispersion of group 1. |
5 | dispersion2 |
MLE for dispersion of group 2. |
6 | equal_dispersion |
Were equal dispersions assumed. |
7 | n1 |
Sample size of group 1. |
8 | n2 |
Sample size of group 2. |
9 | nll |
Minimum of negative log-likelihood. |
10 | nparams |
Number of estimated parameters. |
11 | method |
Method used for the results. |
12 | mle_method |
Method used for optimization. |
13 | mle_code |
Integer indicating why the optimization process terminated. |
14 | mle_message |
Additional information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # mle_nb() examples #---------------------------------------------------------------------------- library(depower) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) mle_alt <- d |> mle_nb_alt() mle_null <- d |> mle_nb_null() mle_alt mle_null
#---------------------------------------------------------------------------- # mle_nb() examples #---------------------------------------------------------------------------- library(depower) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) mle_alt <- d |> mle_nb_alt() mle_null <- d |> mle_nb_null() mle_alt mle_null
The negative log-likelihood for bivariate negative binomial outcomes.
nll_bnb_null(param, value1, value2, ratio_null) nll_bnb_alt(param, value1, value2)
nll_bnb_null(param, value1, value2, ratio_null) nll_bnb_alt(param, value1, value2)
param |
(numeric)
for samples 1 and 2. |
value1 |
(numeric) |
value2 |
(numeric) |
ratio_null |
(Scalar numeric: |
These functions are primarily designed for speed in simulation. Arguments are not checked.
Suppose and
where
is the random item (subject) effect.
Then
is the joint distribution where
and
are dependent (though conditionally independent),
is the count outcome for sample 1 of the items (subjects),
is the count outcome for sample 2 of the items (subjects),
is the conditional mean of sample 1,
is the ratio of the
conditional means of sample 2 with respect to sample 1, and
is
the gamma distribution shape parameter which controls the dispersion and the
correlation between sample 1 and 2.
The likelihood is
and the parameter space is
.
The log-likelihood is
Scalar numeric negative log-likelihood.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
sim_nb()
, stats::nlminb()
, stats::nlm()
,
stats::optim()
#---------------------------------------------------------------------------- # nll_bnb*() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) nll_bnb_alt( param = c(mean1 = 10, mean2 = 12, dispersion = 2), value1 = d[[1L]], value2 = d[[2L]] ) nll_bnb_null( param = c(mean = 10, dispersion = 2), value1 = d[[1L]], value2 = d[[2L]], ratio_null = 1 )
#---------------------------------------------------------------------------- # nll_bnb*() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) nll_bnb_alt( param = c(mean1 = 10, mean2 = 12, dispersion = 2), value1 = d[[1L]], value2 = d[[2L]] ) nll_bnb_null( param = c(mean = 10, dispersion = 2), value1 = d[[1L]], value2 = d[[2L]], ratio_null = 1 )
The negative log-likelihood for two independent samples of negative binomial distributions.
nll_nb_null(param, value1, value2, equal_dispersion, ratio_null) nll_nb_alt(param, value1, value2, equal_dispersion)
nll_nb_null(param, value1, value2, equal_dispersion, ratio_null) nll_nb_alt(param, value1, value2, equal_dispersion)
param |
(numeric)
for groups 1 and 2. |
value1 |
(numeric) |
value2 |
(numeric) |
equal_dispersion |
(Scalar logical) |
ratio_null |
(Scalar numeric: |
These functions are primarily designed for speed in simulation. Arguments are not checked.
Suppose and
where
and
are
independent,
is the count outcome for items in group 1,
is the count outcome for items in group 2,
is the arithmetic mean
count in group 1,
is the ratio of arithmetic means for group 2 with
respect to group 1,
is the dispersion parameter of group 1,
and
is the dispersion parameter of group 2.
When the dispersion parameters are not equal, the likelihood is
and the parameter space is
.
The log-likelihood is
When the dispersion parameters are equal, the likelihood is
and the parameter space is
.
The log-likelihood is
Scalar numeric negative log-likelihood.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
sim_nb()
, stats::nlminb()
, stats::nlm()
,
stats::optim()
#---------------------------------------------------------------------------- # nll_nb_*() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) nll_nb_alt( param = c(mean1 = 10, mean2 = 15, dispersion = 2, dispersion2 = 8), value1 = d[[1L]], value2 = d[[2L]], equal_dispersion = FALSE ) nll_nb_null( param = c(mean = 10, dispersion = 2, dispersion2 = 8), value1 = d[[1L]], value2 = d[[2L]], equal_dispersion = FALSE, ratio_null = 1 )
#---------------------------------------------------------------------------- # nll_nb_*() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) d <- sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) nll_nb_alt( param = c(mean1 = 10, mean2 = 15, dispersion = 2, dispersion2 = 8), value1 = d[[1L]], value2 = d[[2L]], equal_dispersion = FALSE ) nll_nb_null( param = c(mean = 10, dispersion = 2, dispersion2 = 8), value1 = d[[1L]], value2 = d[[2L]], equal_dispersion = FALSE, ratio_null = 1 )
An automatic plot method for objects returned by power()
.
## S3 method for class 'depower' plot( x, x_axis = NULL, y_axis = NULL, color = NULL, facet_row = NULL, facet_col = NULL, hline = NULL, caption = TRUE, caption_width = 70L, ... )
## S3 method for class 'depower' plot( x, x_axis = NULL, y_axis = NULL, color = NULL, facet_row = NULL, facet_col = NULL, hline = NULL, caption = TRUE, caption_width = 70L, ... )
x |
(depower) |
x_axis |
(string: |
y_axis |
(string: |
color |
(string: |
facet_row |
(string: |
facet_col |
(string: |
hline |
(numeric: |
caption |
(Scalar logical: |
caption_width |
(Scalar integer: |
... |
Unused additional arguments. |
If you are limited by the output from plot.depower()
, keep in mind that the
object returned by power()
is a standard data frame. This allows
you to easily plot all results with standard plotting functions. In addition,
because plot.depower()
uses ggplot2, you can modify the plot as you
normally would. For example:
set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3), cv2 = c(0.3, 0.5), nsims = 1000 ) |> power(alpha = 0.05) |> plot(hline = 0.8, caption_width = 60) + ggplot2::theme_bw() + ggplot2::theme(plot.caption = ggplot2::element_text(hjust = 0)) + ggplot2::labs(title = "Power for the ratio of geometric means")
A ggplot2::ggplot()
object.
#---------------------------------------------------------------------------- # plot() examples #---------------------------------------------------------------------------- library(depower) # Power for independent two-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3), cv2 = c(0.3, 0.5), nsims = 500 ) |> power(alpha = 0.05) |> plot() # Power for dependent two-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3, 0.5), cv2 = c(0.3, 0.5), cor = c(0.3), nsims = 500 ) |> power(alpha = 0.01) |> plot() # Power for two-sample independent AND two-sample dependent t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3), cv2 = c(0.3), cor = c(0, 0.3, 0.6), nsims = 500 ) |> power(alpha = c(0.05, 0.01)) |> plot(facet_row = "cor", color = "test") # Power for one-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), ratio = c(1.2, 1.4), cv1 = c(0.3, 0.5), nsims = 500 ) |> power(alpha = c(0.05, 0.01)) |> plot() # Power for independent two-sample NB test set.seed(1234) sim_nb( n1 = c(10, 15), mean1 = 10, ratio = c(1.8, 2), dispersion1 = 10, dispersion2 = 3, nsims = 100 ) |> power(alpha = 0.01) |> plot() # Power for BNB test set.seed(1234) sim_bnb( n = c(10, 12), mean1 = 10, ratio = c(1.3, 1.5), dispersion = 5, nsims = 100 ) |> power(alpha = 0.01) |> plot()
#---------------------------------------------------------------------------- # plot() examples #---------------------------------------------------------------------------- library(depower) # Power for independent two-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3), cv2 = c(0.3, 0.5), nsims = 500 ) |> power(alpha = 0.05) |> plot() # Power for dependent two-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3, 0.5), cv2 = c(0.3, 0.5), cor = c(0.3), nsims = 500 ) |> power(alpha = 0.01) |> plot() # Power for two-sample independent AND two-sample dependent t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), n2 = c(10, 15), ratio = c(1.3, 1.5), cv1 = c(0.3), cv2 = c(0.3), cor = c(0, 0.3, 0.6), nsims = 500 ) |> power(alpha = c(0.05, 0.01)) |> plot(facet_row = "cor", color = "test") # Power for one-sample t-test set.seed(1234) sim_log_lognormal( n1 = c(10, 15), ratio = c(1.2, 1.4), cv1 = c(0.3, 0.5), nsims = 500 ) |> power(alpha = c(0.05, 0.01)) |> plot() # Power for independent two-sample NB test set.seed(1234) sim_nb( n1 = c(10, 15), mean1 = 10, ratio = c(1.8, 2), dispersion1 = 10, dispersion2 = 3, nsims = 100 ) |> power(alpha = 0.01) |> plot() # Power for BNB test set.seed(1234) sim_bnb( n = c(10, 12), mean1 = 10, ratio = c(1.3, 1.5), dispersion = 5, nsims = 100 ) |> power(alpha = 0.01) |> plot()
A method to calculate power for objects returned by sim_log_lognormal()
,
sim_nb()
, and sim_bnb()
.
power(data, ..., alpha = 0.05, list_column = FALSE, ncores = 1L)
power(data, ..., alpha = 0.05, list_column = FALSE, ncores = 1L)
data |
(depower) |
... |
(functions) |
alpha |
(numeric: |
list_column |
(Scalar logical: |
ncores |
(Scalar integer: |
Power is calculated as the proportion of hypothesis tests which result in a
p-value less than or equal to alpha
. e.g.
sum(p <= alpha, na.rm = TRUE) / nsims
Power is defined as the expected probability of rejecting the null hypothesis for a chosen value of the unknown effect. In a multiple comparisons scenario, power is defined as the marginal power, which is the expected power of the test for each individual null hypothesis assumed to be false.
Other forms of power under the multiple comparisons scenario include
disjunctive or conjunctive power. Disjunctive power is defined as the
expected probability of correctly rejecting one or more null hypotheses.
Conjunctive power is defined as the expected probability of correctly
rejecting all null hypotheses. In the simplest case, and where all hypotheses
are independent, if the marginal power is defined as and
is
the number of null hypotheses assumed to be false, then disjunctive power may
be calculated as
and conjunctive power may be calculated
as
. Disjunctive power tends to decrease with increasingly
correlated hypotheses and conjunctive power tends to increase with
increasingly correlated hypotheses.
...
...
are the name-value pairs for the functions used to perform the tests.
If not named, the functions coerced to character will be used for the
name-value pairs. Typical in non-standard evaluation, ...
accepts bare
functions and converts them to a list of expressions. Each element in this
list will be validated as a call
and then evaluated on the simulated data.
A base::call()
is simply an unevaluated function. Below are some examples
of specifying ...
in power()
.
# Examples of specifying ... in power() data <- sim_nb( n1 = 10, mean1 = 10, ratio = c(1.6, 2), dispersion1 = 2, dispersion2 = 2, nsims = 200 ) # ... is empty, so a an appropriate default function will be provided power(data) # This is equivalent to leaving ... empty power(data, "NB Wald test" = wald_test_nb()) # If not named, "wald_test_nb()" will be used to label the function power(data, wald_test_nb()) # You can specify any parameters in the call. The data argument # will automatically be inserted or overwritten. data |> power("NB Wald test" = wald_test_nb(equal_dispersion=TRUE, link="log")) # Multiple functions may be used. data |> power( wald_test_nb(link='log'), wald_test_nb(link='sqrt'), wald_test_nb(link='squared'), wald_test_nb(link='identity') ) # Just like functions in a pipe, the parentheses are required. # This will error because wald_test_nb is missing parentheses. try(power(data, wald_test_nb))
In most cases*, any user created test function may be utilized in ...
if the
following conditions are satisfied:
The function contains argument data
which is defined as a list with the
first and second elements for simulated data.
The return object is a list with element p
for the p-value of the
hypothesis test.
Validate with test cases beforehand.
*Simulated data of class log_lognormal_mixed_two_sample
has both independent
and dependent data. To ensure the appropriate test function is used,
power.log_lognormal_mixed_two_sample()
allows only
t_test_welch()
and t_test_paired()
in ...
. Each will
be evaluated on the simulated data according to column data$cor
. If one or
both of these functions are not included in ...
, the corresponding default
function will be used automatically. If any other test function is included,
an error will be returned.
alpha
is known as the type I assertion probability and is defined as
the expected probability of rejecting a null hypothesis when it was actually
true.
is compared with the p-value and used as the decision
boundary for rejecting or not rejecting the null hypothesis.
The family-wise error rate is the expected probability of making one or more
type I assertions among a family of hypotheses. Using Bonferroni's method,
is chosen for the family of hypotheses then divided by the
number of tests performed (
). Each individual hypothesis is tested at
. For example, if you plan to conduct 30 hypothesis
tests and want to control the family-wise error rate to no greater than
, you would set
alpha = 0.05/30
.
The per-family error rate is the expected number of type I assertions among a
family of hypotheses. If you calculate power for the scenario where you
perform 1,000 hypotheses and want to control the per-family error rate to no
greater than 10 type I assertions, you would choose alpha = 10/1000
. This
implicitly assumes all 1,000 hypotheses are truly null. Alternatively, if you
assume 800 of these hypotheses are truly null and 200 are not,
alpha = 10/1000
would control the per-family error rate to no greater than
8 type I assertions. If it is acceptable to keep the per-family error rate as
10, setting alpha = 10/800
would provide greater marginal power than the
previous scenario.
These two methods assume that the distribution of p-values for the truly null hypotheses are uniform(0,1), but remain valid under various other testing scenarios (such as dependent tests). Other multiple comparison methods, such as FDR control, are common in practice but don't directly fit into this power simulation framework.
nsims
The final number of valid simulations per unique set of simulation parameters
may be less than the original number requested. This may occur when the test
results in a missing p-value. For wald_test_bnb()
, pathological MLE
estimates, generally from small sample sizes and very small dispersions, may
result in a negative estimated standard deviation of the ratio. Thus the test
statistic and p-value would not be calculated. Note that simulated data from
sim_nb()
and sim_bnb()
may also reduce nsims
during the data simulation
phase.
The nsims
column in the return data frame is the effective number of
simulations for power results.
A data frame with the following columns appended to the data
object:
Column | Name | Description |
alpha |
Type I assertion probability. | |
test |
Name-value pair of the function and statistical test: c(as.character(...) = names(...) . |
|
data |
List-column of simulated data. | |
result |
List-column of test results. | |
power |
Power of the test for corresponding parameters. |
For power(list_column = FALSE)
, columns data
, and result
are excluded from
the data frame.
Yu L, Fernandez S, Brock G (2017). “Power analysis for RNA-Seq differential expression studies.” BMC Bioinformatics, 18(1), 234. ISSN 1471-2105, doi:10.1186/s12859-017-1648-2.
Yu L, Fernandez S, Brock G (2020). “Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.” BMC Bioinformatics, 21(1), 198. ISSN 1471-2105, doi:10.1186/s12859-020-3541-7.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.
Vickerstaff V, Omar RZ, Ambler G (2019). “Methods to adjust for multiple comparisons in the analysis and sample size calculation of randomised controlled trials with multiple primary outcomes.” BMC Medical Research Methodology, 19(1), 129. ISSN 1471-2288, doi:10.1186/s12874-019-0754-4.
#---------------------------------------------------------------------------- # power() examples #---------------------------------------------------------------------------- library(depower) # Power for independent two-sample t-Test set.seed(1234) data <- sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = 0, nsims = 1000 ) # Welch's t-test is used by default power(data) # But you can specify anything else that is needed power( data = data, "Welch's t-Test" = t_test_welch(alternative = "greater"), alpha = 0.01 ) # Power for dependent two-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = 0.5, nsims = 1000 ) |> power() # Power for mixed-type two-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = c(0, 0.5), nsims = 1000 ) |> power() # Power for one-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, nsims = 1000 ) |> power() # Power for independent two-sample NB test set.seed(1234) sim_nb( n1 = 10, mean1 = 10, ratio = c(1.6, 2), dispersion1 = 2, dispersion2 = 2, nsims = 200 ) |> power() # Power for BNB test set.seed(1234) sim_bnb( n = 10, mean1 = 10, ratio = c(1.4, 1.6), dispersion = 10, nsims = 200 ) |> power()
#---------------------------------------------------------------------------- # power() examples #---------------------------------------------------------------------------- library(depower) # Power for independent two-sample t-Test set.seed(1234) data <- sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = 0, nsims = 1000 ) # Welch's t-test is used by default power(data) # But you can specify anything else that is needed power( data = data, "Welch's t-Test" = t_test_welch(alternative = "greater"), alpha = 0.01 ) # Power for dependent two-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = 0.5, nsims = 1000 ) |> power() # Power for mixed-type two-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, n2 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, cv2 = 0.4, cor = c(0, 0.5), nsims = 1000 ) |> power() # Power for one-sample t-Test set.seed(1234) sim_log_lognormal( n1 = 20, ratio = c(1.2, 1.4), cv1 = 0.4, nsims = 1000 ) |> power() # Power for independent two-sample NB test set.seed(1234) sim_nb( n1 = 10, mean1 = 10, ratio = c(1.6, 2), dispersion1 = 2, dispersion2 = 2, nsims = 200 ) |> power() # Power for BNB test set.seed(1234) sim_bnb( n = 10, mean1 = 10, ratio = c(1.4, 1.6), dispersion = 10, nsims = 200 ) |> power()
Simulate data from the bivariate negative binomial (BNB) distribution. The
BNB distribution is used to simulate count data where the event counts are
jointly dependent (correlated). For independent data, see
sim_nb()
.
sim_bnb( n, mean1, mean2, ratio, dispersion, nsims = 1L, return_type = "list", max_zeros = 0.99, ncores = 1L )
sim_bnb( n, mean1, mean2, ratio, dispersion, nsims = 1L, return_type = "list", max_zeros = 0.99, ncores = 1L )
n |
(integer: |
mean1 |
(numeric: |
mean2 , ratio
|
(numeric:
|
dispersion |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
max_zeros |
(Scalar numeric: |
ncores |
(Scalar integer: |
The negative binomial distribution may be defined using a gamma-Poisson
mixture distribution. In this case, the Poisson parameter
is a random variable with gamma distribution. Equivalence between different
parameterizations are demonstrated below:
# Define constants and their relationships n <- 10000 dispersion <- 8 mu <- 4 p <- dispersion / (dispersion + mu) q <- mu / (mu + dispersion) variance <- mu + (mu^2 / dispersion) rate <- p / (1 - p) scale <- (1 - p) / p # alternative formula for mu mu_alt <- (dispersion * (1 - p)) / p stopifnot(isTRUE(all.equal(mu, mu_alt))) set.seed(20240321) # Using built-in rnbinom with dispersion and mean w <- rnbinom(n = n, size = dispersion, mu = mu) # Using gamma-Poisson mixture with gamma rate parameter x <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, rate = rate) ) # Using gamma-Poisson mixture with gamma scale parameter y <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, scale = scale) ) # Using gamma-Poisson mixture with multiplicative mean and # gamma scale parameter z <- rpois( n = n, lambda = mu * rgamma(n = n, shape = dispersion, scale = 1/dispersion) ) # Compare CDFs par(mar=c(4,4,1,1)) plot( x = sort(w), y = (1:n)/n, xlim = range(c(w,x,y,z)), ylim = c(0,1), col = 'green', lwd = 4, type = 'l', main = 'CDF' ) lines(x = sort(x), y = (1:n)/n, col = 'red', lwd = 2) lines(x = sort(y), y = (1:n)/n, col = 'yellow', lwd = 1.5) lines(x = sort(z), y = (1:n)/n, col = 'black')
The BNB distribution is implemented by compounding two conditionally
independent Poisson random variables
and
with a gamma random
variable
. The probability mass
function for the joint distribution of
is
where are specific values of the count
outcomes,
is the
dispersion
parameter
which controls the dispersion and level of correlation between the two
samples (otherwise known as the shape parameter of the gamma distribution),
is the mean parameter, and
is the
ratio
parameter
representing the multiplicative change in the mean of the second sample
relative to the first sample. denotes the random subject effect and
the gamma distribution scale parameter is assumed to be the inverse of the
dispersion parameter (
) for identifiability.
Correlation decreases from 1 to 0 as the dispersion parameter increases from 0 to infinity. For a given dispersion, increasing means also increases the correlation. See 'Examples' for a demonstration.
See 'Details' in sim_nb()
for additional information on the
negative binomial distribution.
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If return_type = "list"
, a list:
Slot | Name | Description |
1 | Simulated counts from sample 1. | |
2 | Simulated counts from sample 2. |
If return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item |
Subject/item indicator. |
2 | condition |
Sample/condition indicator. |
3 | value |
Simulated counts. |
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in a list-column named data
in
a depower simulation data frame:
Column | Name | Description |
1 | n1 |
Sample size of sample 1. |
2 | n2 |
Sample size of sample 2. |
3 | mean1 |
Mean for sample 1. |
4 | mean2 |
Mean for sample 2. |
5 | ratio |
Ratio of means (sample 2 / sample 1). |
6 | dispersion |
Gamma distribution shape parameter (dispersion). |
7 | nsims |
Number of valid simulation iterations. |
8 | distribution |
Distribution sampled from. |
9 | data |
List-column of simulated data. |
Yu L, Fernandez S, Brock G (2020). “Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.” BMC Bioinformatics, 21(1), 198. ISSN 1471-2105, doi:10.1186/s12859-020-3541-7.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
sim_nb()
, stats::rpois()
, stats::rgamma()
,
stats::rnbinom()
#---------------------------------------------------------------------------- # sim_bnb() examples #---------------------------------------------------------------------------- library(depower) # Paired two-sample data returned in a data frame sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 1, return_type = "data.frame" ) # Paired two-sample data returned in a list sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 1, return_type = "list" ) # Two simulations of paired two-sample data # returned as a list of data frames sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 2, return_type = "data.frame" ) # Two simulations of Paired two-sample data # returned as a list of lists sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 2, return_type = "list" ) #---------------------------------------------------------------------------- # Visualization of the BNB distribution as dispersion varies. #---------------------------------------------------------------------------- set.seed(1234) data <- lapply( X = c(1, 10, 100, 1000), FUN = function(x) { d <- sim_bnb( n = 1000, mean1 = 10, ratio = 1.5, dispersion = x, nsims = 1, return_type = "data.frame" ) cor <- cor( x = d[d$condition == "1", ]$value, y = d[d$condition == "2", ]$value ) cbind(dispersion = x, correlation = cor, d) } ) data <- do.call(what = "rbind", args = data) ggplot2::ggplot( data = data, mapping = ggplot2::aes(x = value, fill = condition) ) + ggplot2::facet_wrap( facets = ggplot2::vars(.data$dispersion), ncol = 2, labeller = ggplot2::labeller(.rows = ggplot2::label_both) ) + ggplot2::geom_density(alpha = 0.3) + ggplot2::coord_cartesian(xlim = c(0, 60)) + ggplot2::geom_text( mapping = ggplot2::aes( x = 30, y = 0.12, label = paste0("Correlation: ", round(correlation, 2)) ), check_overlap = TRUE ) + ggplot2::labs( x = "Value", y = "Density", fill = "Condition", caption = "Mean1=10, Mean2=15, ratio=1.5" )
#---------------------------------------------------------------------------- # sim_bnb() examples #---------------------------------------------------------------------------- library(depower) # Paired two-sample data returned in a data frame sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 1, return_type = "data.frame" ) # Paired two-sample data returned in a list sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 1, return_type = "list" ) # Two simulations of paired two-sample data # returned as a list of data frames sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 2, return_type = "data.frame" ) # Two simulations of Paired two-sample data # returned as a list of lists sim_bnb( n = 10, mean1 = 10, ratio = 1.6, dispersion = 3, nsims = 2, return_type = "list" ) #---------------------------------------------------------------------------- # Visualization of the BNB distribution as dispersion varies. #---------------------------------------------------------------------------- set.seed(1234) data <- lapply( X = c(1, 10, 100, 1000), FUN = function(x) { d <- sim_bnb( n = 1000, mean1 = 10, ratio = 1.5, dispersion = x, nsims = 1, return_type = "data.frame" ) cor <- cor( x = d[d$condition == "1", ]$value, y = d[d$condition == "2", ]$value ) cbind(dispersion = x, correlation = cor, d) } ) data <- do.call(what = "rbind", args = data) ggplot2::ggplot( data = data, mapping = ggplot2::aes(x = value, fill = condition) ) + ggplot2::facet_wrap( facets = ggplot2::vars(.data$dispersion), ncol = 2, labeller = ggplot2::labeller(.rows = ggplot2::label_both) ) + ggplot2::geom_density(alpha = 0.3) + ggplot2::coord_cartesian(xlim = c(0, 60)) + ggplot2::geom_text( mapping = ggplot2::aes( x = 30, y = 0.12, label = paste0("Correlation: ", round(correlation, 2)) ), check_overlap = TRUE ) + ggplot2::labs( x = "Value", y = "Density", fill = "Condition", caption = "Mean1=10, Mean2=15, ratio=1.5" )
Simulate data from the log transformed lognormal distribution (i.e. a normal distribution). This function handles all three cases:
One-sample data
Dependent two-sample data
Independent two-sample data
sim_log_lognormal( n1, n2 = NULL, ratio, cv1, cv2 = NULL, cor = 0, nsims = 1L, return_type = "list", ncores = 1L, messages = TRUE )
sim_log_lognormal( n1, n2 = NULL, ratio, cv1, cv2 = NULL, cor = 0, nsims = 1L, return_type = "list", ncores = 1L, messages = TRUE )
n1 |
(integer: |
n2 |
(integer: |
ratio |
(numeric:
See 'Details' for additional information. |
cv1 |
(numeric: |
cv2 |
(numeric: |
cor |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
ncores |
(Scalar integer: |
messages |
(Scalar logical: |
Based on assumed characteristics of the original lognormal distribution, data is simulated from the corresponding log-transformed (normal) distribution. This simulated data is suitable for assessing power of a hypothesis for the geometric mean or ratio of geometric means from the original lognormal data.
This method can also be useful for other population distributions which are positive and where it makes sense to describe the ratio of geometric means. However, the lognormal distribution is theoretically correct in the sense that you can log transform to a normal distribution, compute the summary statistic, then apply the inverse transformation to summarize on the original lognormal scale.
Let be the geometric mean and
be the
arithmetic mean. For independent lognormal samples
and
For dependent lognormal samples and
Unlike ratios and the arithmetic mean, for equal sample sizes of
and
it follows that
.
The coefficient of variation (CV) for is defined as
The relationship between sample statistics for the original lognormal data
() and the natural logged data (
) are
and
Based on the properties of correlation and variance,
The standard deviation of the differences gets smaller the more positive the correlation and conversely gets larger the more negative the correlation. For the special case where the two samples are uncorrelated and each has the same variance, it follows that
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If one-sample data with return_type = "list"
, a list:
Slot | Name | Description |
1 | One sample of simulated normal values. |
If one-sample data with return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item |
Pair/subject/item indicator. |
2 | value |
Simulated normal values. |
If two-sample data with return_type = "list"
, a list:
Slot | Name | Description |
1 | Simulated normal values from sample 1. | |
2 | Simulated normal values from sample 2. |
If two-sample data with return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item |
Pair/subject/item indicator. |
2 | condition |
Time/group/condition indicator. |
3 | value |
Simulated normal values. |
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in data frame, located in a
list-column named data
.
If one-sample data, a data frame:
Column | Name | Description |
1 | n1 |
The sample size. |
2 | ratio |
Geometric mean [GM(sample 1)]. |
3 | cv1 |
Coefficient of variation for sample 1. |
4 | nsims |
Number of data simulations. |
5 | distribution |
Distribution sampled from. |
6 | data |
List-column of simulated data. |
If two-sample data, a data frame:
Column | Name | Description |
1 | n1 |
Sample size of sample 1. |
2 | n2 |
Sample size of sample 2. |
3 | ratio |
Ratio of geometric means [GM(sample 2) / GM(sample 1)] or geometric mean ratio [GM(sample 2 / sample 1)]. |
4 | cv1 |
Coefficient of variation for sample 1. |
5 | cv2 |
Coefficient of variation for sample 2. |
6 | cor |
Correlation between samples. |
7 | nsims |
Number of data simulations. |
8 | distribution |
Distribution sampled from. |
9 | data |
List-column of simulated data. |
Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.
Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.
Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.
stats::rnorm()
, mvnfast::rmvn()
#---------------------------------------------------------------------------- # sim_log_lognormal() examples #---------------------------------------------------------------------------- library(depower) # Independent two-sample data returned in a data frame sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 1, return_type = "data.frame" ) # Independent two-sample data returned in a list sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 1, return_type = "list" ) # Dependent two-sample data returned in a data frame sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 1, return_type = "data.frame" ) # Dependent two-sample data returned in a list sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 1, return_type = "list" ) # One-sample data returned in a data frame sim_log_lognormal( n1 = 10, ratio = 1.3, cv1 = 0.35, nsims = 1, return_type = "data.frame" ) # One-sample data returned in a list sim_log_lognormal( n1 = 10, ratio = 1.3, cv1 = 0.35, nsims = 1, return_type = "list" ) # Independent two-sample data: two simulations for four parameter combinations. # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), n2 = c(10, 20), ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 2, return_type = "list" ) # Dependent two-sample data: two simulations for two parameter combinations. # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), n2 = c(10, 20), ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 2, return_type = "list" ) # One-sample data: two simulations for two parameter combinations # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), ratio = 1.3, cv1 = 0.35, nsims = 2, return_type = "list" )
#---------------------------------------------------------------------------- # sim_log_lognormal() examples #---------------------------------------------------------------------------- library(depower) # Independent two-sample data returned in a data frame sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 1, return_type = "data.frame" ) # Independent two-sample data returned in a list sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 1, return_type = "list" ) # Dependent two-sample data returned in a data frame sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 1, return_type = "data.frame" ) # Dependent two-sample data returned in a list sim_log_lognormal( n1 = 10, n2 = 10, ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 1, return_type = "list" ) # One-sample data returned in a data frame sim_log_lognormal( n1 = 10, ratio = 1.3, cv1 = 0.35, nsims = 1, return_type = "data.frame" ) # One-sample data returned in a list sim_log_lognormal( n1 = 10, ratio = 1.3, cv1 = 0.35, nsims = 1, return_type = "list" ) # Independent two-sample data: two simulations for four parameter combinations. # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), n2 = c(10, 20), ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0, nsims = 2, return_type = "list" ) # Dependent two-sample data: two simulations for two parameter combinations. # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), n2 = c(10, 20), ratio = 1.3, cv1 = 0.35, cv2 = 0.35, cor = 0.4, nsims = 2, return_type = "list" ) # One-sample data: two simulations for two parameter combinations # Returned as a list-column of lists within a data frame sim_log_lognormal( n1 = c(10, 20), ratio = 1.3, cv1 = 0.35, nsims = 2, return_type = "list" )
Simulate data from two independent negative binomial (NB) distributions. For
paired data, see sim_bnb()
.
sim_nb( n1, n2 = n1, mean1, mean2, ratio, dispersion1, dispersion2 = dispersion1, nsims = 1L, return_type = "list", max_zeros = 0.99, ncores = 1L )
sim_nb( n1, n2 = n1, mean1, mean2, ratio, dispersion1, dispersion2 = dispersion1, nsims = 1L, return_type = "list", max_zeros = 0.99, ncores = 1L )
n1 |
(integer: |
n2 |
(integer: |
mean1 |
(numeric: |
mean2 , ratio
|
(numeric:
|
dispersion1 |
(numeric: |
dispersion2 |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
max_zeros |
(Scalar numeric: |
ncores |
(Scalar integer: |
The negative binomial distribution has many parameterizations. In the regression modeling context, it is common to specify the distribution in terms of its mean and dispersion. We use the following probability mass function:
where ,
is the dispersion parameter, and
is the mean.
This is analogous to the typical formulation where
is counting
failures given
successes and
is the probability of success on each
trial. It follows that
and
. The
parameter
describes the 'dispersion' among observations. Smaller values of
lead to overdispersion and larger values of
decrease the
overdispersion, eventually converging to the Poisson distribution.
Described above is the 'indirect quadratic parameterization' of the negative
binomial distribution, which is commonly found in the R ecosystem. However, it
is somewhat counterintuitive because the smaller gets, the larger
the overdispersion. The 'direct quadratic parameterization' of the negative
binomial distribution may be found in some R packages and other languages
such as SAS and Stata. The direct parameterization is defined by substituting
(
) which results in
. In this case, the larger
gets
the larger the overdispersion, and the Poisson distribution is a special case
of the negative binomial distribution where
.
A general class of negative binomial models may be defined with mean
and variance
. The 'linear
parameterization' is then found by setting
, resulting in
. It's common to label the linear
parameterization as 'NB1' and the direct quadratic parameterization as 'NB2'.
See 'Details' in sim_bnb()
for additional information on the
gamma-Poisson mixture formulation of the negative binomial distribution.
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If return_type = "list"
, a list:
Slot | Name | Description |
1 | Simulated counts from group 1. | |
2 | Simulated counts from group 2. |
If return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item |
Subject/item indicator. |
2 | condition |
Group/condition indicator. |
3 | value |
Simulated counts. |
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in a list-column named data
in
a depower simulation data frame:
Column | Name | Description |
1 | n1 |
Sample size of group 1. |
2 | n2 |
Sample size of group 2. |
3 | mean1 |
Mean for group 1. |
4 | mean2 |
Mean for group 2. |
5 | ratio |
Ratio of means (group 2 / group 1). |
6 | dispersion1 |
Dispersion parameter for group 1. |
7 | dispersion2 |
Dispersion parameter for group 2. |
8 | nsims |
Number of valid simulation iterations. |
9 | distribution |
Distribution sampled from. |
10 | data |
List-column of simulated data. |
Yu L, Fernandez S, Brock G (2017). “Power analysis for RNA-Seq differential expression studies.” BMC Bioinformatics, 18(1), 234. ISSN 1471-2105, doi:10.1186/s12859-017-1648-2.
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.
Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.
Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, Econometric Society Monographs, 2 edition. Cambridge University Press. doi:10.1017/CBO9781139013567.
#---------------------------------------------------------------------------- # sim_nb() examples #---------------------------------------------------------------------------- library(depower) # Independent two-sample NB data returned in a data frame sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 1, return_type = "data.frame" ) # Independent two-sample NB data returned in a list sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 1, return_type = "list" ) # Two simulations of independent two-sample data # returned as a list of data frames sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 2, return_type = "data.frame" ) # Two simulations of independent two-sample data # returned as a list of lists sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 2, return_type = "list" ) #---------------------------------------------------------------------------- # Visualization of the NB distribution as dispersion varies between groups. #---------------------------------------------------------------------------- disp <- expand.grid(c(1, 10, 100), c(1, 10, 100)) set.seed(1234) data <- mapply( FUN = function(disp1, disp2) { d <- sim_nb( n1 = 1000, mean1 = 10, ratio = 1.5, dispersion1 = disp1, dispersion2 = disp2, nsims = 1, return_type = "data.frame" ) cbind(dispersion1 = disp1, dispersion2 = disp2, d) }, disp1 = disp[[1]], disp2 = disp[[2]], SIMPLIFY = FALSE ) data <- do.call(what = "rbind", args = data) ggplot2::ggplot( data = data, mapping = ggplot2::aes(x = value, fill = condition) ) + ggplot2::facet_grid( rows = ggplot2::vars(.data$dispersion2), cols = ggplot2::vars(.data$dispersion1), labeller = ggplot2::labeller( .rows = ggplot2::label_both, .cols = ggplot2::label_both ) ) + ggplot2::geom_density(alpha = 0.3) + ggplot2::coord_cartesian(xlim = c(0, 50)) + ggplot2::labs( x = "Value", y = "Density", fill = "Condition", caption = "Mean1=10, Mean2=15, ratio=1.5" )
#---------------------------------------------------------------------------- # sim_nb() examples #---------------------------------------------------------------------------- library(depower) # Independent two-sample NB data returned in a data frame sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 1, return_type = "data.frame" ) # Independent two-sample NB data returned in a list sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 1, return_type = "list" ) # Two simulations of independent two-sample data # returned as a list of data frames sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 2, return_type = "data.frame" ) # Two simulations of independent two-sample data # returned as a list of lists sim_nb( n1 = 10, mean1 = 5, ratio = 1.6, dispersion1 = 0.5, dispersion2 = 0.5, nsims = 2, return_type = "list" ) #---------------------------------------------------------------------------- # Visualization of the NB distribution as dispersion varies between groups. #---------------------------------------------------------------------------- disp <- expand.grid(c(1, 10, 100), c(1, 10, 100)) set.seed(1234) data <- mapply( FUN = function(disp1, disp2) { d <- sim_nb( n1 = 1000, mean1 = 10, ratio = 1.5, dispersion1 = disp1, dispersion2 = disp2, nsims = 1, return_type = "data.frame" ) cbind(dispersion1 = disp1, dispersion2 = disp2, d) }, disp1 = disp[[1]], disp2 = disp[[2]], SIMPLIFY = FALSE ) data <- do.call(what = "rbind", args = data) ggplot2::ggplot( data = data, mapping = ggplot2::aes(x = value, fill = condition) ) + ggplot2::facet_grid( rows = ggplot2::vars(.data$dispersion2), cols = ggplot2::vars(.data$dispersion1), labeller = ggplot2::labeller( .rows = ggplot2::label_both, .cols = ggplot2::label_both ) ) + ggplot2::geom_density(alpha = 0.3) + ggplot2::coord_cartesian(xlim = c(0, 50)) + ggplot2::labs( x = "Value", y = "Density", fill = "Condition", caption = "Mean1=10, Mean2=15, ratio=1.5" )
Performs paired and one-sample t-Tests.
t_test_paired(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)
t_test_paired(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)
data |
(list) |
alternative |
(string: |
ci_level |
(Scalar numeric: |
mean_null |
(Scalar numeric: |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
The one-sample test is used for both the true one-sample scenario and for the paired differences from a dependent two-sample scenario. Below we use paired difference language as that is the most common case. The hypotheses for the paired t-test are
where is the arithmetic mean of the paired
differences (sample 2 - sample 1) and
is a constant for the
assumed population mean difference (usually
).
The test statistic is
where is the sample mean of the differences,
is the population mean difference assumed under the null hypothesis,
is the sample size of the differences, and
is the sample variance.
The critical value of the test statistic has degrees of freedom
and the p-value is calculated as
Let be the geometric mean and
be the
arithmetic mean. For dependent lognormal samples
and
it
follows that
and
are dependent normally
distributed variables. Setting
we have
This forms the basis for making inference about the geometric mean ratio of the original lognormal data using the mean difference of the log transformed normal data.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | t |
Value of the t-statistic. | |
2 | df |
Degrees of freedom for the t-statistic. | |
3 | p |
p-value. | |
4 | mean_diff |
Estimated mean or mean of the differences (sample 2 – sample 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | n |
Number of paired observations. | |
6 | method |
Method used for the results. | |
7 | alternative |
The alternative hypothesis. | |
8 | ci_level |
The confidence level. | |
9 | mean_null |
Assumed population mean of the differences under the null hypothesis. |
Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.
Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.
Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.
#---------------------------------------------------------------------------- # t_test_paired() examples #---------------------------------------------------------------------------- library(depower) # One-sample t-test set.seed(1234) t_test1 <- sim_log_lognormal( n1 = 40, ratio = 1.5, cv1 = 0.4 ) |> t_test_paired(ci_level = 0.95) t_test1 # Paired t-test using two dependent samples set.seed(1234) t_test2 <- sim_log_lognormal( n1 = 40, n2 = 40, ratio = 1.5, cv1 = 0.4, cv2 = 0.2, cor = 0.3 ) |> t_test_paired(ci_level = 0.95) t_test2
#---------------------------------------------------------------------------- # t_test_paired() examples #---------------------------------------------------------------------------- library(depower) # One-sample t-test set.seed(1234) t_test1 <- sim_log_lognormal( n1 = 40, ratio = 1.5, cv1 = 0.4 ) |> t_test_paired(ci_level = 0.95) t_test1 # Paired t-test using two dependent samples set.seed(1234) t_test2 <- sim_log_lognormal( n1 = 40, n2 = 40, ratio = 1.5, cv1 = 0.4, cv2 = 0.2, cor = 0.3 ) |> t_test_paired(ci_level = 0.95) t_test2
Performs Welch's independent two-sample t-test.
t_test_welch(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)
t_test_welch(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)
data |
(list) |
alternative |
(string: |
ci_level |
(Scalar numeric: |
mean_null |
(Scalar numeric: |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
The hypotheses for Welch's independent two-sample t-test are
where is the population mean of group 1,
is the
population mean of group 2, and
is a constant for the assumed
difference of population means (usually
).
The test statistic is
where and
are the sample means,
is the difference of population means assumed under the null hypothesis,
and
are the sample sizes, and
and
are the sample variances.
The critical value of the test statistic uses the Welch–Satterthwaite degrees of freedom
and the p-value is calculated as
Let be the geometric mean and
be the
arithmetic mean. For independent lognormal variables
and
it follows that
and
are independent normally
distributed variables. Defining
we have
This forms the basis for making inference about the ratio of geometric means of the original lognormal data using the difference of means of the log transformed normal data.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | t |
Value of the t-statistic. | |
2 | df |
Degrees of freedom for the t-statistic. | |
3 | p |
p-value. | |
4 | diff_mean |
Estimated difference of means (group 2 – group 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean_1 |
Estimated mean of group 1. | |
6 | mean_2 |
Estimated mean of group 2. | |
7 | n1 |
Sample size of group 1. | |
8 | n2 |
Sample size of group 2. | |
9 | method |
Method used for the results. | |
10 | alternative |
The alternative hypothesis. | |
11 | ci_level |
The confidence level. | |
12 | mean_null |
Assumed population difference of the means under the null hypothesis. |
Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.
Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.
Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.
#---------------------------------------------------------------------------- # t_test_welch() examples #---------------------------------------------------------------------------- library(depower) # Welch's t-test set.seed(1234) sim_log_lognormal( n1 = 40, n2 = 40, ratio = 1.5, cv1 = 0.4, cv2 = 0.4 ) |> t_test_welch(ci_level = 0.95)
#---------------------------------------------------------------------------- # t_test_welch() examples #---------------------------------------------------------------------------- library(depower) # Welch's t-test set.seed(1234) sim_log_lognormal( n1 = 40, n2 = 40, ratio = 1.5, cv1 = 0.4, cv2 = 0.4 ) |> t_test_welch(ci_level = 0.95)
Wald test for the ratio of means from bivariate negative binomial outcomes.
wald_test_bnb(data, ci_level = NULL, link = "log", ratio_null = 1, ...)
wald_test_bnb(data, ci_level = NULL, link = "log", ratio_null = 1, ...)
data |
(list) |
ci_level |
(Scalar numeric: |
link |
(Scalar string: |
ratio_null |
(Scalar numeric: |
... |
Optional arguments passed to the MLE function |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
where
is the random item (subject)
effect. Then
is the joint
distribution where
and
are dependent (though conditionally
independent),
is the count outcome for sample 1 of the items
(subjects),
is the count outcome for sample 2 of the items (subjects),
is the conditional mean of sample 1,
is the ratio of the
conditional means of sample 2 with respect to sample 1, and
is
the gamma distribution shape parameter which controls the dispersion and the
correlation between sample 1 and 2.
The hypotheses for the Wald test of are
where is a one-to-one link function with nonzero derivative,
is the population ratio of arithmetic
means for sample 2 with respect to sample 1, and
is a constant
for the assumed null population ratio of means (typically
).
Rettiganti and Nagaraja (2012) found that ,
, and
had greatest power when
.
However, when
,
, the likelihood ratio test, and
had greatest power.
was biased when
. Both
and
produced
acceptable results for any
value. These results depend on the use of
asymptotic vs. exact critical values.
The Wald test statistic is
where
Under , the Wald test statistic is asymptotically distributed
as
. The approximate level
test rejects
if
. Note that
the asymptotic critical value is known to underestimate the exact critical
value. Hence, the nominal significance level may not be achieved for small
sample sizes (possibly
or
). The level of
significance inflation also depends on
and is most severe for
, where only the exact critical value is recommended.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (group 2 / group 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean1 |
Estimated mean of sample 1. | |
6 | mean2 |
Estimated mean of sample 2. | |
7 | dispersion |
Estimated dispersion. | |
8 | n1 |
The sample size of sample 1. | |
9 | n2 |
The sample size of sample 2. | |
10 | method |
Method used for the results. | |
11 | ci_level |
The confidence level. | |
12 | link |
Link function used to transform the ratio of means in the test hypotheses. | |
13 | ratio_null |
Assumed ratio of means under the null hypothesis. | |
14 | mle_code |
Integer indicating why the optimization process terminated. | |
15 | mle_message |
Information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # wald_test_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) |> wald_test_bnb()
#---------------------------------------------------------------------------- # wald_test_bnb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_bnb( n = 40, mean1 = 10, ratio = 1.2, dispersion = 2 ) |> wald_test_bnb()
Wald test for the ratio of means from two independent negative binomial outcomes.
wald_test_nb( data, equal_dispersion = FALSE, ci_level = NULL, link = "log", ratio_null = 1, ... )
wald_test_nb( data, equal_dispersion = FALSE, ci_level = NULL, link = "log", ratio_null = 1, ... )
data |
(list) |
equal_dispersion |
(Scalar logical: |
ci_level |
(Scalar numeric: |
link |
(Scalar string: |
ratio_null |
(Scalar numeric: |
... |
Optional arguments passed to the MLE function |
This function is primarily designed for speed in simulation. Missing values are silently excluded.
Suppose and
where
and
are
independent,
is the count outcome for items in group 1,
is the count outcome for items in group 2,
is the arithmetic mean
count in group 1,
is the ratio of arithmetic means for group 2 with
respect to group 1,
is the dispersion parameter of group 1,
and
is the dispersion parameter of group 2.
The hypotheses for the Wald test of are
where is a one-to-one link function with nonzero derivative,
is the population ratio of arithmetic
means for group 2 with respect to group 1, and
is a constant
for the assumed null population ratio of means (typically
).
Rettiganti and Nagaraja (2012) found that and
had greatest power when
. However, when
,
, the likelihood ratio test, and the Rao score
test have greatest power. Note that
, LRT, and RST were
unbiased tests while the
and
tests were
biased when
. The
, LRT, and RST produced
acceptable results for any
value. These results depend on the use of
asymptotic vs. exact critical values.
The Wald test statistic is
where
Under , the Wald test statistic is asymptotically distributed
as
. The approximate level
test rejects
if
. Note that
the asymptotic critical value is known to underestimate the exact critical
value. Hence, the nominal significance level may not be achieved for small
sample sizes (possibly
or
). The level of
significance inflation also depends on
and is most severe for
, where only the exact critical value is recommended.
A list with the following elements:
Slot | Subslot | Name | Description |
1 | chisq |
Asymptotic test statistic for the ratio of means. |
|
2 | df |
Degrees of freedom. | |
3 | p |
p-value. | |
4 | ratio |
Estimated ratio of means (group 2 / group 1). | |
4 | 1 | estimate |
Point estimate. |
4 | 2 | lower |
Confidence interval lower bound. |
4 | 3 | upper |
Confidence interval upper bound. |
5 | mean1 |
Estimated mean of group 1. | |
6 | mean2 |
Estimated mean of group 2. | |
7 | dispersion1 |
Estimated dispersion of group 1. | |
8 | dispersion2 |
Estimated dispersion of group 2. | |
9 | n1 |
Sample size of group 1. | |
10 | n2 |
Sample size of group 2. | |
11 | method |
Method used for the results. | |
12 | ci_level |
The confidence level. | |
13 | equal_dispersion |
Whether or not equal dispersions were assumed. | |
14 | link |
Link function used to transform the ratio of means in the test hypotheses. | |
15 | ratio_null |
Assumed ratio of means under the null hypothesis. | |
16 | mle_code |
Integer indicating why the optimization process terminated. | |
17 | mle_message |
Information from the optimizer. |
Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.
Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.
#---------------------------------------------------------------------------- # wald_test_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) |> wald_test_nb()
#---------------------------------------------------------------------------- # wald_test_nb() examples #---------------------------------------------------------------------------- library(depower) set.seed(1234) sim_nb( n1 = 60, n2 = 40, mean1 = 10, ratio = 1.5, dispersion1 = 2, dispersion2 = 8 ) |> wald_test_nb()