{PRDA} allows performing a prospective or retrospective design analysis to evaluate inferential risks (i.e., power, Type M error, and Type S error) in a study considering Pearson’s correlation between two variables or mean comparisons (one-sample, paired, two-sample, and Welch’s t-test).
The term Design Analysis was introduced by Gelman and Carlin (2014) as a broader definition of Power Analysis. Traditional power analysis has a narrow focus on statistical significance; design analysis, instead, evaluates, together with power levels, other inferential risks (i.e., Type M error and Type S error) to assess estimate uncertainty under hypothetical replications of a study.
Given a hypothetical effect size and the study characteristics (i.e., sample size, statistical test directionality, α level), design analysis evaluates:

- Power: the probability of obtaining a statistically significant result, given the hypothetical population effect size.
- Type M error (Magnitude, or exaggeration ratio): the factor by which a statistically significant result overestimates, on average, the hypothetical population effect size.
- Type S error (Sign): the probability of obtaining a statistically significant result in the opposite direction to the hypothetical population effect.
Moreover, Gelman and Carlin (2014) distinguished between two types of design analysis according to the study phase:

- Prospective design analysis: conducted in the planning stage of a study, to determine the sample size required to obtain a given power level.
- Retrospective design analysis: conducted when the data have already been collected (or the sample size is otherwise fixed), to evaluate the inferential risks associated with the study design.
It is important not to mistake a retrospective design analysis for a post-hoc power analysis. The former defines the hypothetical effect size according to previous results in the literature or experts' indications, whereas the latter defines the hypothetical effect size based on the results of the study itself, a widely deprecated practice (Goodman 1994; Lenth 2007; Senn 2002).
Although Type M and Type S errors depend directly on the power level, they convey valuable information about estimate uncertainty that would otherwise be overlooked. This raises researchers' awareness of the inferential risks related to their studies and helps them interpret the results.
Although the chances are lower, a statistically significant result can be obtained even in an underpowered study (e.g., power = 20%). This might seem a promising finding: researchers might reason that reaching statistical significance despite low power makes the result especially reliable, and thus feel even more confident in its interpretation.
However, in this scenario statistically significant results are almost certain to be an overestimation of the population effect. As pointed out by Gelman, Hill, and Vehtari (2020), “a key risk for a low-power study is not so much that it has a small chance of succeeding, but rather that an apparent success merely masks a larger failure” (p. 292). This is also referred to as the “Winner's curse”: the apparent win of a statistically significant result is an actual loss, as the obtained estimate is inflated.
For example, in a study comparing two groups of 30 participants each with a two-sample t-test, if the hypothetical population effect size is small (e.g., a Cohen’s d of .25), the actual power is only 16%. The associated Type M error is around 2.60 and the Type S error is 0.01. That is, statistically significant results overestimate the hypothesized population effect by 160% on average, and there is a 1% probability of obtaining a statistically significant result in the opposite direction.
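These numbers can be obtained with the retrospective() function introduced below. A minimal sketch, assuming "two_sample" is the test_method value used for a two-sample t-test (see the function documentation for the exact options):

set.seed(2020) # set seed to make simulation results reproducible
retrospective(effect_size = .25, sample_n1 = 30, sample_n2 = 30,
              test_method = "two_sample") # assumed value; check ?retrospective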
In this scenario, knowing the Type M and Type S errors, researchers would be much more cautious in interpreting the results and might consider carrying out a replication study to obtain more reliable results.
To learn more about design analysis, see Gelman and Carlin (2014); for an introduction to design analysis with examples in psychology, see Altoè et al. (2020) and Bertoldo, Zandonella Callegher, and Altoè (2020).
Given a plausible value of effect size, {PRDA} performs a prospective or retrospective design analysis to evaluate the inferential risks (i.e., power, Type M error, and Type S error) related to the study design.
The {PRDA} package can be used for Pearson’s correlation between two variables or mean comparisons (i.e., one-sample, paired, two-sample, and Welch’s t-test), considering a hypothetical value of ρ or Cohen’s d, respectively. See vignette("retrospective") and vignette("prospective") to learn how to set the function arguments for the different effect types.
To install the GitHub version, type in R:
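For example, using the remotes package (the repository path below is an assumption; check the package homepage for the canonical one):

# install.packages("remotes") # if remotes is not already installed
remotes::install_github("ClaudioZandonella/PRDA", build_vignettes = TRUE)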
In {PRDA} there are two main functions:

- retrospective(). Given the hypothetical population effect size and the study sample size, the function retrospective() performs a retrospective design analysis. According to the defined alternative hypothesis and the significance level, the inferential risks (i.e., power level, Type M error, and Type S error) are computed together with the critical effect value (i.e., the minimum absolute effect size value that would result statistically significant). To know more about function arguments and examples, see the function documentation ?retrospective and vignette("retrospective").
- prospective(). Given the hypothetical population effect size and the required power level, the function prospective() performs a prospective design analysis. According to the defined alternative hypothesis and the significance level, the required sample size is computed together with the associated Type M error, Type S error, and the critical effect value (i.e., the minimum absolute effect size value that would result statistically significant). To know more about function arguments and examples, see the function documentation ?prospective and vignette("prospective").
The hypothetical population effect size can be defined as a single value according to previous results in the literature or experts' indications. Alternatively, {PRDA} allows users to specify a distribution of plausible values to account for their uncertainty about the hypothetical population effect size. To learn how to specify the hypothetical effect size according to a distribution, and for an example of application, see vignette("retrospective").
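As a sketch of what this could look like, assuming (as documented in vignette("retrospective")) that effect_size also accepts a function that samples plausible values:

set.seed(2020) # set seed to make results reproducible
# hypothetical effect centered on rho = .25, with sd = .10 expressing uncertainty
retrospective(effect_size = function(n) rnorm(n, mean = .25, sd = .10),
              sample_n1 = 13, test_method = "pearson")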
Eisenberger, Lieberman, and Williams (2003) claimed that social and physical pain seem to share similar neural underpinnings. Their experiment included 13 participants, and they found a statistically significant correlation between perceived distress due to social exclusion and activity in the brain area associated with physical pain. However, the magnitude of the estimated correlation (r = .88) is beyond what could be considered plausible. In this field correlations are likely to be around ρ = .25 (for a complete discussion see Bertoldo, Zandonella Callegher, and Altoè 2020).
The function retrospective() can be used to evaluate the inferential risks associated with the study.
set.seed(2020) # set seed to make results reproducible
retrospective(effect_size = .25, sample_n1 = 13, test_method = "pearson")
#>
#> Design Analysis
#>
#> Hypothesized effect: rho = 0.25
#>
#> Study characteristics:
#> test_method sample_n1 sample_n2 alternative sig_level df
#> pearson 13 NULL two_sided 0.05 11
#>
#> Inferential risks:
#> power typeM typeS
#> 0.127 2.583 0.028
#>
#> Critical value(s): rho = ± 0.553
In the output, we have the summary information about the hypothesized population effect, the study characteristics, and the inferential risks. We obtained a statistical power of almost 13%, associated with a Type M error of around 2.6 and a Type S error of 0.03. That is, statistically significant results overestimate the hypothesized population effect by 160% on average, and there is a 3% probability of obtaining a statistically significant result in the opposite direction.
To know more about function arguments and examples, see the function documentation ?retrospective and vignette("retrospective").
Given the previous results, researchers might plan a replication study to obtain more reliable results. The function prospective() can be used to compute the sample size needed to obtain a required level of power (e.g., power = 80%).
prospective(effect_size = .25, power = .8, test_method = "pearson",
            display_message = FALSE)
#>
#> Design Analysis
#>
#> Hypothesized effect: rho = 0.25
#>
#> Study characteristics:
#> test_method sample_n1 sample_n2 alternative sig_level df
#> pearson 126 NULL two_sided 0.05 124
#>
#> Inferential risks:
#> power typeM typeS
#> 0.807 1.111 0
#>
#> Critical value(s): rho = ± 0.175
In the output, we have again the summary information about the hypothesized population effect, the study characteristics, and the inferential risks. To obtain a power of around 80%, the required sample size is n = 126; the associated Type M error is around 1.10 and the Type S error is approximately 0.
To know more about function arguments and examples, see the function documentation ?prospective and vignette("prospective").