| Title: | Sample Size Calculations for Microarray Experiments |
|---|---|
| Description: | Functions that calculate appropriate sample sizes for one-sample t-tests, two-sample t-tests, and F-tests for microarray experiments based on desired power while controlling for false discovery rates. For all tests, the standard deviations (variances) among genes can be assumed fixed or random. This is also true for effect sizes among genes in one-sample and two-sample experiments. Functions also output a chart of power versus sample size, a table of power at different sample sizes, and a table of critical test values at different sample sizes. |
| Authors: | Megan Orr [aut, cre], Peng Liu [aut] |
| Maintainer: | Megan Orr <[email protected]> |
| License: | GPL-3 |
| Version: | 1.3 |
| Built: | 2026-05-20 06:07:02 UTC |
| Source: | https://github.com/cran/ssize.fdr |
This package calculates appropriate sample sizes for one-sample, two-sample, and multi-sample microarray experiments for a desired power of the test. Sample sizes are calculated under controlled false discovery rates and fixed proportions of non-differentially expressed genes. Outputs a graph of power versus sample size.
| Package: | ssize.fdr |
| Type: | Package |
| Version: | 1.3 |
| Date: | 2022-06-05 |
| License: | GPL-3 |
For all functions, the user inputs the desired power, the false
discovery rate to be controlled, the proportion(s) of non-
differentially expressed genes, and the maximum possible sample size
to be used in calculations. If the user inputs a vector of proportions
of non-differentially expressed genes, sample size calculations are
performed for each proportion. For the function ssize.twoSamp,
the user must additionally input the common difference in mean treatment
expressions as well as the common standard deviation for all genes.
This becomes the common effect size and common standard deviation for
all genes when using the function ssize.oneSamp. For the
function ssize.twoSampVary (ssize.oneSampVary)
the differences in mean treatment expressions (effect sizes) are assumed
to follow a normal distribution and the variances among genes are assumed
to follow an inverse gamma distribution, so parameters for these
distributions must be entered. For the function ssize.F,
the design matrix of the experiment, the parameter vector, and an optional
coefficient matrix or vector of linear contrasts of interest must also be
entered. The function ssize.Fvary allows the variances of
the genes to follow an inverse gamma distribution, so the shape and scale
parameters must be specified by the user.
Megan Orr <[email protected]>, Peng Liu <[email protected]>
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
    library(ssize.fdr)

    a <- 0.05                  ## false discovery rate to be controlled
    pwr <- 0.8                 ## desired power
    p0 <- c(0.5, 0.9, 0.95)    ## proportions of non-differentially expressed genes
    N <- 20; N1 <- 35          ## maximum sample sizes for calculations

    ## Example of function ssize.oneSamp
    d <- 1      ## effect size
    s <- 0.5    ## standard deviation
    os <- ssize.oneSamp(delta=d, sigma=s, fdr=a, power=pwr, pi0=p0, maxN=N, side="two-sided")
    os$ssize        ## first sample sizes to reach desired power
    os$power        ## calculated power for each sample size
    os$crit.vals    ## calculated critical value for each sample size

    ## Example of function ssize.oneSampVary
    dm <- 2; ds <- 1        ## the effect sizes of the genes follow a Normal(2,1) distribution
    alph <- 3; beta <- 1    ## the variances of the genes follow an Inverse Gamma(3,1) distribution
    osv <- ssize.oneSampVary(deltaMean=dm, deltaSE=ds, a=alph, b=beta, fdr=a, power=pwr,
                             pi0=p0, maxN=N1, side="two-sided")
    osv$ssize       ## first sample sizes to reach desired power
    osv$power       ## calculated power for each sample size
    osv$crit.vals   ## calculated critical value for each sample size

    ## Example of function ssize.twoSamp
    ## Calculates sample sizes for two-sample microarray experiments
    ## See Figure 1.(a) of Liu & Hwang (2007)
    d1 <- 1     ## difference in differentially expressed genes to be detected
    s1 <- 0.5   ## standard deviation
    ts <- ssize.twoSamp(delta=d1, sigma=s1, fdr=a, power=pwr, pi0=p0, maxN=N, side="two-sided")
    ts$ssize        ## first sample sizes to reach desired power
    ts$power        ## calculated power for each sample size
    ts$crit.vals    ## calculated critical value for each sample size

    ## Example of function ssize.twoSampVary
    ## Calculates sample sizes for two-sample microarray experiments in which both the
    ## differences in expressions between treatments and the variances vary among genes.
    ## See Figure 3.(a) of Liu & Hwang (2007)
    dm <- 2     ## mean parameter of normal distribution of differences
                ## between treatments among genes
    ds <- 1     ## standard deviation parameter of normal distribution
                ## of differences between treatments among genes
    alph <- 3   ## shape parameter of inverse gamma distribution followed by variances of genes
    beta <- 1   ## scale parameter of inverse gamma distribution followed by variances of genes
    tsv <- ssize.twoSampVary(deltaMean=dm, deltaSE=ds, a=alph, b=beta,
                             fdr=a, power=pwr, pi0=p0, maxN=N1, side="two-sided")
    tsv$ssize       ## first sample sizes to reach desired power
    tsv$power       ## calculated power for each sample size
    tsv$crit.vals   ## calculated critical value for each sample size

    ## Example of function ssize.F
    ## Sample size calculation for three-treatment loop design microarray experiment
    ## See Figure S2 of Liu & Hwang (2007)
    des <- matrix(c(1,-1,0,0,1,-1), ncol=2, byrow=FALSE)   ## design matrix of loop design experiment
    b <- c(1,-0.5)                     ## difference between first two treatments is 1 and
                                       ## second and third treatments is -0.5
    df <- function(n){3*n-2}           ## degrees of freedom for this design is 3n-2
    s <- 1                             ## standard deviation
    p0.F <- c(0.5, 0.9, 0.95, 0.995)   ## proportions of non-differentially expressed genes
    ft <- ssize.F(X=des, beta=b, dn=df, sigma=s, fdr=a, power=pwr, pi0=p0.F, maxN=N)
    ft$ssize        ## first sample sizes to reach desired power
    ft$power        ## calculated power for each sample size
    ft$crit.vals    ## calculated critical value for each sample size

    ## Example of function ssize.Fvary
    ## Sample size calculation for three-treatment loop design microarray experiment
    ## (des, b, and df as defined for ssize.F above)
    alph <- 3; beta <- 1               ## variances among genes follow an Inverse Gamma(3,1)
    a1 <- 0.05                         ## fdr to be fixed
    p0.F <- c(0.9, 0.95, 0.995)        ## proportions of non-differentially expressed genes
    ftv <- ssize.Fvary(X=des, beta=b, dn=df, a=alph, b=beta, fdr=a1, power=pwr, pi0=p0.F, maxN=N1)
    ftv$ssize       ## first sample sizes to reach desired power
    ftv$power       ## calculated power for each sample size
    ftv$crit.vals   ## calculated critical value for each sample size
Calculates appropriate sample sizes for multi-sample microarray experiments for a desired power. Sample size calculations are performed at controlled false discovery rates and user-specified proportions of non-differentially expressed genes, design matrix, and standard deviation. A graph of power versus sample size is created.
    ssize.F(X, beta, L = NULL, dn, sigma, fdr = 0.05, power = 0.8, pi0 = 0.95,
            maxN = 20, cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| X | design matrix of experiment |
| beta | parameter vector |
| L | coefficient matrix or vector for linear contrasts of interest |
| dn | a function of the degrees of freedom based on the design of the experiment |
| sigma | the standard deviation for all genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
Standard deviations are assumed to be identical for all genes.
See the function ssize.Fvary for sample size
calculations with varying standard deviations among genes.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
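As a rough illustration of how the design matrix, parameter vector, and degrees-of-freedom function fit together, the sketch below computes the non-centrality parameter of an F statistic for the loop design used in the examples. It is not taken from the package source. It assumes that each of the n replicates repeats the design matrix X (so that X'X scales by n), that L holds one contrast per row, and that L defaults to the identity; the helper name `f_ncp` is hypothetical. The critical value shown is an ordinary 5% level F quantile, not the FDR-based critical value that the package reports.

```r
# Sketch: non-centrality parameter of the F statistic for a replicated design.
# Assumption (not from the package source): each of the n replicates repeats
# the design matrix X, so t(X) %*% X scales by n.
f_ncp <- function(X, beta, sigma, n, L = diag(length(beta))) {
  XtX_inv <- solve(crossprod(X))              # (X'X)^{-1} for one replicate
  Lb      <- L %*% beta                       # contrasts of interest
  middle  <- solve(L %*% XtX_inv %*% t(L))    # [L (X'X)^{-1} L']^{-1}
  as.numeric(n * t(Lb) %*% middle %*% Lb) / sigma^2
}

des <- matrix(c(1, -1, 0, 0, 1, -1), ncol = 2, byrow = FALSE)  # loop design
b   <- c(1, -0.5)                                              # parameter vector
dn  <- function(n) 3 * n - 2                                   # denominator df

n      <- 5
lambda <- f_ncp(des, b, sigma = 1, n = n)   # equals 3.5 * n for this design
df1    <- ncol(des)                         # number of contrasts when L is the identity
crit   <- qf(0.95, df1, dn(n))              # plain 5% critical value, for illustration only
power  <- pf(crit, df1, dn(n), ncp = lambda, lower.tail = FALSE)
```

For this design, the quadratic form reduces to the squared length of `des %*% b`, which is 3.5, so `lambda` grows linearly in n.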
| Component | Description |
|---|---|
| ssize | sample sizes (for each treatment) at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations with corresponding sample sizes |
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >100.
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSamp, ssize.twoSampVary,
ssize.oneSamp, ssize.oneSampVary,
ssize.Fvary
    library(ssize.fdr)

    ## Sample size calculation for three-treatment loop design microarray experiment
    ## See Figure S2 of Liu & Hwang (2007)
    des <- matrix(c(1,-1,0,0,1,-1), ncol=2, byrow=FALSE)   ## design matrix of loop design experiment
    b <- c(1,-0.5)                   ## difference between first two treatments is 1 and
                                     ## second and third treatments is -0.5
    df <- function(n){3*n-2}         ## degrees of freedom for this design is 3n-2
    s <- 1                           ## standard deviation
    a <- 0.05                        ## false discovery rate to be controlled
    pwr1 <- 0.8                      ## desired power
    p0 <- c(0.5, 0.9, 0.95, 0.995)   ## proportions of non-differentially expressed genes
    N1 <- 20                         ## maximum sample size for calculations
    ft <- ssize.F(X=des, beta=b, dn=df, sigma=s, fdr=a, power=pwr1, pi0=p0, maxN=N1)
    ft$ssize       ## first sample sizes to reach desired power for each proportion of
                   ## non-differentially expressed genes
    ft$power       ## power for each sample size
    ft$crit.vals   ## critical value for each sample size
Calculates appropriate sample sizes for multi-sample microarray experiments in which standard deviations vary among genes. Sample sizes are determined based on a desired power, a controlled false discovery rate, and user-specified proportions of non-differentially expressed genes and design matrix. A graph of power versus sample size is created.
    ssize.Fvary(X, beta, L = NULL, dn, a, b, fdr = 0.05, power = 0.8, pi0 = 0.95,
                maxN = 20, cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| X | design matrix of experiment |
| beta | parameter vector |
| L | coefficient matrix or vector for linear contrasts of interest |
| dn | a function of the degrees of freedom based on the design of the experiment |
| a | shape parameter of inverse gamma distribution followed by variances of genes |
| b | scale parameter of inverse gamma distribution followed by variances of genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
The variances among genes are assumed to follow an Inverse Gamma
distribution with shape parameter a and scale parameter
b.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
| Component | Description |
|---|---|
| ssize | sample sizes (for each treatment) at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations with corresponding sample sizes |
The numerical integration used in the calculations is performed by the function
integrate, which uses adaptive quadrature.
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >100.
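To give a feel for the kind of integral involved, the following sketch uses `integrate` on an inverse gamma density with shape 3 and scale 1. Base R has no inverse gamma density, so `dinvgamma_sketch` is a hypothetical helper written here for illustration; it is not a function of this package, and the integrals shown are not the ones the package evaluates internally.

```r
# Inverse gamma density with shape a and scale b, computed on the log scale
# to avoid overflow for very small x. Hypothetical helper for illustration.
dinvgamma_sketch <- function(x, a, b) {
  log_dens <- a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x
  ifelse(x > 0, exp(log_dens), 0)
}

a <- 3; b <- 1

# The density should integrate to 1 over (0, Inf).
total <- integrate(function(x) dinvgamma_sketch(x, a, b), lower = 0, upper = Inf)

# The mean of the variances is b / (a - 1) when a > 1.
mean_var <- integrate(function(x) x * dinvgamma_sketch(x, a, b), lower = 0, upper = Inf)

total$value      # close to 1
mean_var$value   # close to b / (a - 1) = 0.5
```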
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSamp, ssize.twoSampVary,
ssize.oneSamp, ssize.oneSampVary,
ssize.F
    library(ssize.fdr)

    ## Sample size calculation for three-treatment loop design microarray experiment
    des <- matrix(c(1,-1,0,0,1,-1), ncol=2, byrow=FALSE)   ## design matrix of loop design experiment
    b <- c(1,-0.5)                ## difference between first two treatments is 1 and
                                  ## second and third treatments is -0.5
    df <- function(n){3*n-2}      ## degrees of freedom for this design is 3n-2
    alph <- 3; beta <- 1          ## variances among genes follow an Inverse Gamma(3,1)
    a1 <- 0.05                    ## fdr to be fixed
    pwr <- 0.8                    ## desired power
    p0 <- c(0.9, 0.95, 0.995)     ## proportions of non-differentially expressed genes
    N1 <- 35                      ## maximum sample size to be used in calculations
    ftv <- ssize.Fvary(X=des, beta=b, dn=df, a=alph, b=beta, fdr=a1, power=pwr, pi0=p0, maxN=N1)
    ftv$ssize       ## first sample sizes to reach desired power
    ftv$power       ## calculated power for each sample size
    ftv$crit.vals   ## calculated critical value for each sample size
Calculates appropriate sample sizes for one-sample microarray experiments for a desired power. Sample size calculations are performed at controlled false discovery rates and user-specified proportions of non-differentially expressed genes, effect size, and standard deviation. A graph of power versus sample size is created.
    ssize.oneSamp(delta, sigma, fdr = 0.05, power = 0.8, pi0 = 0.95, maxN = 35,
                  side = "two-sided", cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| delta | the common effect size for all genes |
| sigma | the standard deviation for all genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| side | options are "two-sided", "upper", or "lower" |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
Effect sizes and standard deviations are assumed to be identical
for all genes. See the function ssize.oneSampVary
for sample size calculations with varying effect sizes and
standard deviations among genes.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
| Component | Description |
|---|---|
| ssize | sample sizes at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations with corresponding sample sizes |
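The relationship between the power table and the reported sample size can be sketched in a few lines. The numbers below are made up for illustration and are not output of the package; `first_n_reaching` is a hypothetical helper that returns the first sample size whose power meets the target, or NA if none does.

```r
# Hypothetical power table: power at n = 2, ..., 6 for a single pi0 value.
n_grid <- 2:6
power  <- c(0.10, 0.50, 0.79, 0.85, 0.95)

# Return the first sample size whose power is at least the target.
first_n_reaching <- function(n, power, target) {
  hit <- which(power >= target)
  if (length(hit) == 0) NA_integer_ else n[hit[1]]
}

first_n_reaching(n_grid, power, target = 0.8)    # 5
first_n_reaching(n_grid, power, target = 0.99)   # NA: target never reached
```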
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >20.
Running this function with the side option of “lower” will
possibly result in multiple warnings. Calculating the probability
that an observation is less than the negative critical value under
a t-distribution with non-centrality parameter delta/sigma
(see argument section above) and the appropriate degrees of freedom
is a calculation that is performed many times while the function
runs. When the difference between the critical value and
delta/sigma is large, this probability is virtually zero.
This happens repeatedly while the function optimize
finds the appropriate critical value for each sample size. Because
of this, the function pt outputs a value <1e-8 in
addition to a warning of “full precision not achieved”. This has no
impact on the accuracy of the resulting calculations of sample size.
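When many such warnings are expected, one option is to collect them instead of letting them print. The sketch below is a generic base R pattern, not part of this package; `collect_warnings` is a hypothetical helper. Whether a particular call to `pt` warns can depend on its inputs and on the R version, so the example reports the number of warnings rather than assuming one.

```r
# Evaluate an expression, muffling warnings and recording their messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# A far lower-tail probability under a non-central t distribution.
res <- collect_warnings(pt(-30, df = 10, ncp = 5))
res$value              # a probability that is essentially zero
length(res$warnings)   # number of warnings that were muffled
```

The same wrapper could be placed around a call to one of the sample size functions so that the results are returned together with any warning messages.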
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSamp, ssize.twoSampVary,
ssize.oneSampVary, ssize.F,
ssize.Fvary
    library(ssize.fdr)

    d <- 2                    ## effect size
    s <- 1                    ## standard deviation
    a <- 0.05                 ## false discovery rate to be controlled
    pwr <- 0.8                ## desired power
    p0 <- c(0.5, 0.9, 0.95)   ## proportions of non-differentially expressed genes
    N <- 20                   ## maximum sample size for calculations
    os <- ssize.oneSamp(delta=d, sigma=s, fdr=a, power=pwr, pi0=p0, maxN=N, side="two-sided")
    os$ssize       ## first sample sizes to reach desired power
    os$power       ## calculated power for each sample size
    os$crit.vals   ## calculated critical value for each sample size
Calculates appropriate sample sizes for one-sample microarray experiments in which effect sizes as well as variances vary among genes. Sample sizes are determined based on a desired power, a controlled false discovery rate, and user-specified proportions of non-differentially expressed genes. A graph of power versus sample size is created.
    ssize.oneSampVary(deltaMean, deltaSE, a, b, fdr = 0.05, power = 0.8, pi0 = 0.95,
                      maxN = 35, side = "two-sided", cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| deltaMean | mean of normal distribution followed by effect sizes among genes |
| deltaSE | standard deviation of normal distribution followed by effect sizes among genes |
| a | shape parameter of inverse gamma distribution followed by variances of genes |
| b | scale parameter of inverse gamma distribution followed by variances of genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| side | options are "two-sided", "upper", or "lower" |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
The effect sizes among genes are assumed to follow a Normal distribution
with mean specified by deltaMean and standard deviation specified by
deltaSE. The variances among genes are assumed to follow an
Inverse Gamma distribution with shape parameter a and scale parameter
b.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
| Component | Description |
|---|---|
| ssize | sample sizes at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations with corresponding sample sizes |
The numerical integration used in the calculations is performed by the function
integrate, which uses adaptive quadrature.
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >20.
Running this function may result in many warnings. Probabilities under
different t-distributions with non-zero non-centrality parameters are
calculated many times while the function runs. If these probabilities
are virtually zero, the function pt outputs a value <1e-8
and outputs a warning of “full precision not achieved”. These values
have no impact on the accuracy of the resulting calculations.
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSamp, ssize.twoSampVary,
ssize.oneSamp, ssize.F,
ssize.Fvary
    library(ssize.fdr)

    dm <- 2; ds <- 1              ## the effect sizes of the genes follow a Normal(2,1) distribution
    alph <- 3; beta <- 1          ## the variances of the genes follow an Inverse Gamma(3,1) distribution
    a2 <- 0.05                    ## false discovery rate to be controlled
    pwr2 <- 0.8                   ## desired power
    p0 <- c(0.90, 0.95, 0.995)    ## proportions of non-differentially expressed genes
    N1 <- 35                      ## maximum sample size to be used in calculations
    osv <- ssize.oneSampVary(deltaMean=dm, deltaSE=ds, a=alph, b=beta, fdr=a2, power=pwr2,
                             pi0=p0, maxN=N1, side="two-sided")
    osv$ssize       ## first sample sizes to reach desired power
    osv$power       ## calculated power for each sample size
    osv$crit.vals   ## calculated critical value for each sample size
Calculates appropriate sample sizes for two-sample microarray experiments for a desired power. Sample size calculations are performed at controlled false discovery rates, user-specified proportions of non-differentially expressed genes, effect size, and standard deviation. A graph of power versus sample size is created.
    ssize.twoSamp(delta, sigma, fdr = 0.05, power = 0.8, pi0 = 0.95, maxN = 35,
                  side = "two-sided", cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| delta | the common difference in mean expressions between the two samples for all genes |
| sigma | the common standard deviation of expressions for all genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| side | options are "two-sided", "upper", or "lower" |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
The true difference between mean expressions of the two samples
as well as the standard deviations of expressions are assumed
identical for all genes. See the function
ssize.twoSampVary for sample size calculations
with varying differences between sample mean expressions and
standard deviations among genes.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
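The idea behind the calculation can be sketched in base R for the two-sided case. Following the criterion of Liu & Hwang (2007), the critical value is chosen so that the ratio of the type I error rate to the power equals fdr/(1 - fdr) * (1 - pi0)/pi0, and the power is then read off at that critical value. This is a minimal sketch and not the package's implementation: the helper name `fdr_power_two_samp`, the use of `uniroot`, and the search limit `c_max` are assumptions made here, and equal group sizes of n with pooled degrees of freedom 2n - 2 are assumed.

```r
# Sketch only: FDR-based critical value and power for a two-sided,
# two-sample t-test with n samples per group. Not the package's code.
fdr_power_two_samp <- function(n, delta, sigma, fdr = 0.05, pi0 = 0.95, c_max = 20) {
  df     <- 2 * n - 2                       # pooled degrees of freedom
  ncp    <- delta / (sigma * sqrt(2 / n))   # non-centrality under the alternative
  target <- fdr / (1 - fdr) * (1 - pi0) / pi0

  type1 <- function(crit) 2 * pt(-crit, df)
  pow   <- function(crit) suppressWarnings(
    pt(crit, df, ncp, lower.tail = FALSE) + pt(-crit, df, ncp)
  )
  gap <- function(crit) type1(crit) / pow(crit) - target

  # No critical value below c_max satisfies the criterion: report NA and zero power.
  if (gap(c_max) > 0) return(c(crit = NA_real_, power = 0))

  crit <- uniroot(gap, lower = 1e-8, upper = c_max)$root
  c(crit = crit, power = pow(crit))
}

# Power at a few per-group sample sizes for delta = 1, sigma = 0.5, pi0 = 0.95.
pw <- sapply(c(5, 10, 20), function(n) unname(fdr_power_two_samp(n, 1, 0.5)["power"]))
pw
```

Returning NA when no critical value is found below `c_max` is a design choice of this sketch; it loosely mirrors the note that critical values reported as NA exceed 20.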
| Component | Description |
|---|---|
| ssize | sample sizes (for each treatment) at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations of two-sample t-test with corresponding sample sizes |
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >20.
Running this function with the side option of "lower" will
possibly result in multiple warnings. Calculating the probability
that an observation is less than the negative critical value under
a t-distribution with non-centrality parameter delta/sigma
(see argument section above) and the appropriate degrees of freedom
is a calculation that is performed many times while the function
runs. When the difference between the critical value and
delta/sigma is large, this probability is virtually zero.
This happens repeatedly while the function optimize
finds the appropriate critical value for each sample size. Because
of this, the function pt outputs a value <1e-8 in
addition to a warning of “full precision not achieved”. This has no
impact on the accuracy of the resulting calculations of sample size.
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSampVary, ssize.oneSamp,
ssize.oneSampVary, ssize.F,
ssize.Fvary
    library(ssize.fdr)

    ## See Figure 1.(a) of Liu & Hwang (2007)
    d <- 1                    ## difference in differentially expressed genes to be detected
    s <- 0.5                  ## standard deviation
    a <- 0.05                 ## false discovery rate to be controlled
    pwr <- 0.8                ## desired power
    p0 <- c(0.5, 0.9, 0.95)   ## proportions of non-differentially expressed genes
    N <- 20                   ## maximum sample size for calculations
    ts <- ssize.twoSamp(delta=d, sigma=s, fdr=a, power=pwr, pi0=p0, maxN=N, side="two-sided")
    ts$ssize       ## first sample sizes to reach desired power for each proportion of
                   ## non-differentially expressed genes
    ts$power       ## calculated power for each sample size
    ts$crit.vals   ## calculated critical value for each sample size
Calculates appropriate sample sizes for two-sample microarray experiments in which the differences between mean treatment expression levels (delta.g for gene g) as well as standard deviations vary among genes. Sample sizes are determined based on a desired power, a controlled false discovery rate, and user-specified proportions of non-differentially expressed genes. A graph of power versus sample size is created.
    ssize.twoSampVary(deltaMean, deltaSE, a, b, fdr = 0.05, power = 0.8, pi0 = 0.95,
                      maxN = 35, side = "two-sided", cex.title=1.15, cex.legend=1)
| Argument | Description |
|---|---|
| deltaMean | location (mean) parameter of normal distribution followed by each delta.g |
| deltaSE | scale (standard deviation) parameter of normal distribution followed by each delta.g |
| a | shape parameter of inverse gamma distribution followed by variances of genes |
| b | scale parameter of inverse gamma distribution followed by variances of genes |
| fdr | the false discovery rate to be controlled |
| power | the desired power to be achieved |
| pi0 | a vector (or scalar) of proportions of non-differentially expressed genes |
| maxN | the maximum sample size used for power calculations |
| side | options are "two-sided", "upper", or "lower" |
| cex.title | controls size of chart titles |
| cex.legend | controls size of chart legend |
Each delta.g is assumed to follow a Normal distribution
with mean specified by deltaMean and standard deviation specified
by deltaSE. The variances among genes are assumed to follow an
Inverse Gamma distribution with shape parameter a and scale
parameter b.
If a vector is input for pi0, sample size calculations
are performed for each proportion.
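To see what these distributional assumptions look like for concrete parameter values, gene-level quantities can be simulated in base R. This is an illustrative sketch only and plays no part in the package's calculations; the number of genes `G` and the variable names are assumptions. An inverse gamma draw with shape a and scale b is obtained as the reciprocal of a gamma draw with shape a and rate b.

```r
set.seed(1)
G <- 1e5                                           # hypothetical number of genes

deltaMean <- 2; deltaSE <- 1                       # Normal(2, 1) for each delta.g
a <- 3; b <- 1                                     # Inverse Gamma(3, 1) for the variances

delta_g  <- rnorm(G, mean = deltaMean, sd = deltaSE)
sigma2_g <- 1 / rgamma(G, shape = a, rate = b)     # reciprocal of Gamma(shape a, rate b)

mean(delta_g)                 # close to deltaMean
mean(sigma2_g)                # close to b / (a - 1) = 0.5
summary(delta_g / sqrt(sigma2_g))   # spread of standardized differences across genes
```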
| Component | Description |
|---|---|
| ssize | sample sizes (for each treatment) at which desired power is first reached |
| power | power calculations with corresponding sample sizes |
| crit.vals | critical value calculations with corresponding sample sizes |
The numerical integration used in the calculations is performed by the function
integrate, which uses adaptive quadrature.
Powers calculated to be 0 may be negligibly conservative.
Critical values calculated as ‘NA’ are values >20.
Running this function may result in many warnings. Probabilities under
different t-distributions with non-zero non-centrality parameters are
calculated many times while the function runs. If these probabilities
are virtually zero, the function pt outputs a value <1e-8
and outputs a warning of “full precision not achieved”. These values
have no impact on the accuracy of the resulting calculations.
Megan Orr [email protected], Peng Liu [email protected]
Liu, Peng and J. T. Gene Hwang. 2007. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.
ssize.twoSamp, ssize.oneSamp,
ssize.oneSampVary, ssize.F,
ssize.Fvary
    library(ssize.fdr)

    ## See Figure 3.(a) of Liu & Hwang (2007)
    dm <- 2; ds <- 1              ## the delta.g's follow a Normal(2,1) distribution
    alph <- 3; beta <- 1          ## the variances of genes follow an Inverse Gamma(3,1) distribution
    a2 <- 0.05                    ## false discovery rate to be controlled
    pwr2 <- 0.8                   ## desired power
    p0 <- c(0.90, 0.95, 0.995)    ## proportions of non-differentially expressed genes
    N1 <- 35                      ## maximum sample size to be used in calculations
    tsv <- ssize.twoSampVary(deltaMean=dm, deltaSE=ds, a=alph, b=beta, fdr=a2, power=pwr2,
                             pi0=p0, maxN=N1, side="two-sided")
    tsv$ssize       ## first sample size(s) to reach desired power
    tsv$power       ## calculated power for each sample size
    tsv$crit.vals   ## calculated critical value for each sample size