testflow provides statistical testing workflows
organized by study design.
library(testflow)
cardio <- make_cardio_data()
test_one_sample(cardio, sbp_3m, mu = 140)
#> Statistical test workflow
#>
#> Outcome: sbp_3m
#> Design: one numerical sample
#>
#> Assumptions
#> * Independence of observations: assumed: Assumed from study design.
#> * Normality: sbp_3m: acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.99; p=0.308)
#> * Symmetry of deviations: not checked: Normality made the symmetry check unnecessary. (method=Signed-rank teaching note)
#>
#> Recommended test
#> One-sample t-test
#>
#> Result
#> H0: the population mean or location of sbp_3m equals the reference value.
#> statistic = -0.46, df = 179.00, p = 0.646, 95% CI [136.47, 142.20]
#>
#> Effect size
#> Cohen's d: -0.03, negligible
#>
#> Report
#> The one numerical sample workflow for sbp_3m did not show a statistically significant result using One-sample t-test, statistic = -0.46, df = 179.00, p = 0.646. The 95% confidence interval was [136.47, 142.20]. The effect size was negligible (Cohen's d = -0.03). H0: the population mean or location of sbp_3m equals the reference value.

test_two_groups(sbp_3m ~ sex, data = cardio)
#> Statistical test workflow
#>
#> Outcome: sbp_3m
#> Group: sex
#> Design: two independent groups
#>
#> Assumptions
#> * Independence of observations: assumed: Assumed from study design.
#> * Normality: sbp_3m (female): acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.99; p=0.913)
#> * Normality: sbp_3m (male): acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.98; p=0.233)
#> * Variance homogeneity: acceptable: Variance homogeneity looks reasonable. (method=Levene test; statistic=1.57)
#> * Extreme outliers: warning: 4 potential outlier(s) flagged by IQR. (IQR rule, n = 4)
#> * Variance ratio check: acceptable: Variance ratio looks reasonable. (statistic=1.27)
#>
#> Recommended test
#> Student independent t-test
#>
#> Result
#> H0: the population mean or location of sbp_3m is equal across levels of sex.
#> statistic = -1.91, df = 178.00, p = 0.058, 95% CI [-11.22, 0.18]
#>
#> Effect size
#> Cohen's d: -0.29, small
#>
#> Report
#> The two independent groups workflow for sbp_3m did not show a statistically significant result using Student independent t-test, statistic = -1.91, df = 178.00, p = 0.058. The 95% confidence interval was [-11.22, 0.18]. The effect size was small (Cohen's d = -0.29). H0: the population mean or location of sbp_3m is equal across levels of sex.

test_paired(sbp_3m ~ sbp_baseline, data = cardio)
#> Statistical test workflow
#>
#> Outcome: sbp_3m - sbp_baseline
#> Design: paired measurements
#>
#> Assumptions
#> * Independence of observations: assumed: Paired observations from the same subjects are assumed by design.
#> * Normality: diff: acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.99; p=0.557)
#> * Symmetry of paired differences: not checked: Normality made the symmetry check unnecessary.
#> * Extreme outliers: warning: 1 potential outlier(s) flagged by IQR. (IQR rule, n = 1)
#>
#> Recommended test
#> Paired t-test
#>
#> Result
#> H0: the mean or median paired difference (sbp_3m - sbp_baseline) equals 0.
#> statistic = -9.20, df = 179.00, p = <0.001, 95% CI [-9.53, -6.16]
#>
#> Effect size
#> Cohen's dz: -0.69, moderate
#>
#> Report
#> The paired measurements workflow for sbp_3m - sbp_baseline showed a statistically significant result using Paired t-test, statistic = -9.20, df = 179.00, p = <0.001. The 95% confidence interval was [-9.53, -6.16]. The effect size was moderate (Cohen's dz = -0.69). H0: the mean or median paired difference (sbp_3m - sbp_baseline) equals 0.

test_groups(sbp_3m ~ treatment, data = cardio)
#> Statistical test workflow
#>
#> Outcome: sbp_3m
#> Group: treatment
#> Design: more than two independent groups
#>
#> Assumptions
#> * Independence of observations: assumed: Assumed from study design.
#> * Normality: sbp_3m (lifestyle): not acceptable: Normality may be violated. (method=Shapiro-Wilk; statistic=0.96; p=0.030)
#> * Normality: sbp_3m (medication): acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.98; p=0.647)
#> * Normality: sbp_3m (usual care): acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.98; p=0.349)
#> * Variance homogeneity: acceptable: Variance homogeneity looks reasonable. (method=Levene test; statistic=1.20)
#> * Bartlett test: acceptable: Variance homogeneity looks reasonable. (method=Bartlett test; statistic=0.66)
#> * Extreme outliers: warning: 4 potential outlier(s) flagged by IQR. (IQR rule, n = 4)
#>
#> Recommended test
#> Kruskal-Wallis test
#>
#> Result
#> H0: the population mean or location of sbp_3m is equal across levels of treatment.
#> statistic = 7.58, df = 2.00, p = 0.023
#>
#> Effect size
#> Kruskal epsilon squared: 0.03, small
#>
#> Report
#> The more than two independent groups workflow for sbp_3m showed a statistically significant result using Kruskal-Wallis test, statistic = 7.58, df = 2.00, p = 0.023. The effect size was small (Kruskal epsilon squared = 0.03). H0: the population mean or location of sbp_3m is equal across levels of treatment.

test_factorial(sbp_3m ~ sex * treatment, data = cardio)
#> Statistical test workflow
#>
#> Outcome: sbp_3m
#> Group: sex, treatment
#> Design: factorial design
#>
#> Assumptions
#> * Independence of observations: assumed: Assumed from study design.
#> * Normality of residuals: acceptable: Residuals appear approximately normal. (method=Shapiro-Wilk; statistic=0.99; p=0.560)
#> * Variance homogeneity: acceptable: Variance homogeneity looks reasonable. (method=Levene test; statistic=1.57; p=0.211; Df1=1; Df2=178)
#> * Balanced design: not required: Cell sizes are unbalanced; the workflow still reports the design.
#>
#> Recommended test
#> Factorial ANOVA
#>
#> Result
#> H0: the population mean or location of sbp_3m is equal across levels of sex, treatment.
#> statistic = 3.78, df = 1.00, p = 0.053
#>
#> Effect size
#> eta squared: 0.02, small
#>
#> Report
#> The factorial design workflow for sbp_3m did not show a statistically significant result using Factorial ANOVA, statistic = 3.78, df = 1.00, p = 0.053. The effect size was small (eta squared = 0.02). H0: the population mean or location of sbp_3m is equal across levels of sex, treatment.

test_repeated(cardio, c(sbp_baseline, sbp_3m, sbp_6m), id = id)
#> Statistical test workflow
#>
#> Outcome: sbp_baseline, sbp_3m, sbp_6m
#> Group: time
#> Design: repeated numeric measurements
#>
#> Assumptions
#> * Independence of observations: assumed: Repeated measurements from the same subjects are assumed by design.
#> * Normality: sbp_3m: acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.99; p=0.308)
#> * Normality: sbp_6m: acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=1.00; p=0.842)
#> * Normality: sbp_baseline: acceptable: Approximate normality looks reasonable. (method=Shapiro-Wilk; statistic=0.99; p=0.732)
#> * Sphericity: not checked: Sphericity is not checked here; use this as a teaching note unless a formal test is added.
#>
#> Recommended test
#> Repeated-measures ANOVA
#>
#> Result
#> H0: the population mean or location of sbp_baseline, sbp_3m, sbp_6m is equal across levels of time.
#> statistic = 3.76, df = 2.00, p = 0.024
#>
#> Effect size
#> eta squared: 0.05, small
#>
#> Report
#> The repeated numeric measurements workflow for sbp_baseline, sbp_3m, sbp_6m showed a statistically significant result using Repeated-measures ANOVA, statistic = 3.76, df = 2.00, p = 0.024. The effect size was small (eta squared = 0.05). H0: the population mean or location of sbp_baseline, sbp_3m, sbp_6m is equal across levels of time.

The repeated numeric workflow chooses repeated-measures ANOVA when the within-time normality checks are acceptable and the Friedman test otherwise. Post-hoc comparisons are paired t-tests for the parametric branch and paired Wilcoxon tests for the non-parametric branch.
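The branching described above can be sketched in base R. This is an illustration only, not testflow's internal code: the simulated data, the column names `t0`, `t1`, `t2`, the 0.05 threshold for the Shapiro-Wilk checks, and the use of the default Holm adjustment in the post-hoc step are all assumptions made for the example.

```r
# Illustrative sketch in base R; NOT testflow's implementation.
# Simulated wide data: one row per subject, three time points (hypothetical names).
set.seed(1)
n <- 30
wide <- data.frame(
  id = factor(seq_len(n)),
  t0 = rnorm(n, 150, 12),
  t1 = rnorm(n, 144, 12),
  t2 = rnorm(n, 141, 12)
)

# Long format: one row per subject and time point, ordered by id within time.
long <- data.frame(
  id    = rep(wide$id, times = 3),
  time  = factor(rep(c("t0", "t1", "t2"), each = n)),
  value = c(wide$t0, wide$t1, wide$t2)
)

# Assumed rule: Shapiro-Wilk within each time point, threshold alpha = 0.05.
alpha <- 0.05
normal_p <- vapply(wide[c("t0", "t1", "t2")],
                   function(x) shapiro.test(x)$p.value, numeric(1))
parametric <- all(normal_p > alpha)

if (parametric) {
  # Parametric branch: repeated-measures ANOVA, then paired t-tests.
  omnibus <- summary(aov(value ~ time + Error(id / time), data = long))
  posthoc <- pairwise.t.test(long$value, long$time, paired = TRUE)
} else {
  # Non-parametric branch: Friedman test, then paired Wilcoxon tests.
  omnibus <- friedman.test(as.matrix(wide[c("t0", "t1", "t2")]))
  posthoc <- pairwise.wilcox.test(long$value, long$time, paired = TRUE)
}

print(omnibus)
print(posthoc)
```

Both `pairwise.t.test()` and `pairwise.wilcox.test()` pair observations by position within each group, which is why the long data are kept in the same subject order for every time point.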
test_categorical(treatment ~ controlled_3m, data = cardio)
#> Statistical test workflow
#>
#> Outcome: treatment
#> Group: controlled_3m
#> Design: two categorical variables
#>
#> Assumptions
#> * Independence of observations: assumed: Assumed from study design.
#> * Expected cell counts: acceptable: Chi-square approximation is reasonable. (method=Pearson chi-square approximation; Min expected = 26.1)
#>
#> Recommended test
#> Chi-square test of independence
#>
#> Result
#> H0: treatment and controlled_3m are independent.
#> statistic = 5.02, df = 2.00, p = 0.081
#>
#> Effect size
#> Cramer's V: 0.17, small
#>
#> Report
#> The two categorical variables workflow for treatment did not show a statistically significant result using Chi-square test of independence, statistic = 5.02, df = 2.00, p = 0.081. The effect size was small (Cramer's V = 0.17). H0: treatment and controlled_3m are independent.

test_repeated_categorical(cardio, c(controlled_baseline, controlled_3m, controlled_6m))
#> Statistical test workflow
#>
#> Outcome: controlled_baseline, controlled_3m, controlled_6m
#> Design: repeated categorical measurements
#>
#> Assumptions
#> * Repeated binary measurements: assumed: Same subjects should be measured at 3 or more time points.
#> * Complete repeated data: acceptable: Missingness should be handled explicitly or via complete-case analysis.
#>
#> Recommended test
#> Cochran Q test
#>
#> Result
#> H0: the success proportions are equal across repeated categorical measures.
#> statistic = 39.58, df = 2.00, p = <0.001
#>
#> Effect size
#> Cochran Q Kendall's W: 0.11, small
#>
#> Report
#> The repeated categorical measurements workflow for controlled_baseline, controlled_3m, controlled_6m showed a statistically significant result using Cochran Q test, statistic = 39.58, df = 2.00, p = <0.001. The effect size was small (Cochran Q Kendall's W = 0.11). H0: the success proportions are equal across repeated categorical measures.

The repeated categorical workflow uses Cochran Q for binary repeated outcomes and pairwise McNemar tests for follow-up comparisons.
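For readers who want to see what that combination involves, here is a minimal base R sketch. It is not testflow's code: `cochran_q()` is a hypothetical helper written for this example from the textbook formula, the data are simulated, and the Holm adjustment of the pairwise p-values is an assumption.

```r
# Illustrative sketch in base R; NOT testflow's implementation.
# Simulated binary outcomes for the same subjects at three visits (hypothetical names).
set.seed(2)
n <- 60
bin <- data.frame(
  v0 = rbinom(n, 1, 0.30),
  v1 = rbinom(n, 1, 0.50),
  v2 = rbinom(n, 1, 0.65)
)

# Hypothetical helper: Cochran's Q from a subjects-by-occasions 0/1 matrix.
# Q = (k - 1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2)),
# with column totals C_j, row totals R_i, and grand total N.
cochran_q <- function(m) {
  m <- as.matrix(m)
  k <- ncol(m)
  col_tot <- colSums(m)
  row_tot <- rowSums(m)
  total <- sum(m)
  q <- (k - 1) * (k * sum(col_tot^2) - total^2) / (k * total - sum(row_tot^2))
  list(statistic = q, df = k - 1,
       p.value = pchisq(q, df = k - 1, lower.tail = FALSE))
}

omnibus <- cochran_q(bin)

# Follow-up: McNemar test for every pair of occasions.
pairs <- combn(names(bin), 2, simplify = FALSE)
raw_p <- vapply(pairs, function(p) {
  tab <- table(factor(bin[[p[1]]], levels = 0:1),
               factor(bin[[p[2]]], levels = 0:1))
  mcnemar.test(tab)$p.value
}, numeric(1))
names(raw_p) <- vapply(pairs, paste, character(1), collapse = " vs ")

# Assumed multiplicity correction (Holm); the source does not name one.
adj_p <- p.adjust(raw_p, method = "holm")

print(omnibus)
print(adj_p)
```

The Q statistic is undefined when every subject has the same response at all occasions, because the denominator is then zero; a production helper would need to handle that case.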
test_correlation(sbp_3m ~ age, data = cardio)
#> Statistical test workflow
#>
#> Outcome: sbp_3m
#> Group: age
#> Design: two numeric variables
#>
#> Assumptions
#> * Monotonic relationship: warning: Relationship may be non-monotonic. (method=Spearman correlation; statistic=793638.65; p=0.014)
#> * Extreme outliers: warning: 7 potential outlier(s) flagged by IQR. (IQR rule applied to age, sbp_3m)
#> * Normality: not required: Normality is not required for Spearman correlation.
#>
#> Recommended test
#> Spearman Correlation
#>
#> Result
#> H0: the correlation between age and sbp_3m is 0.
#> statistic = 793638.65, p = 0.014
#>
#> Effect size
#> Spearman Correlation r: 0.18, small
#>
#> Report
#> The two numeric variables workflow for sbp_3m showed a statistically significant result using Spearman Correlation, statistic = 793638.65, p = 0.014. The effect size was small (Spearman Correlation r = 0.18). H0: the correlation between age and sbp_3m is 0.

test_outliers(c(sbp_3m, ldl, crp), data = cardio)
#> Warning: `outliers` is a screening workflow, not a single hypothesis test.
#> Statistical test workflow
#>
#> Outcome: sbp_3m, ldl, crp
#> Design: outlier screening
#>
#> Assumptions
#> * Numeric variable: acceptable: IQR outlier detection is univariate and does not require normality.
#> * Skewness sensitivity: warning: Interpret IQR outliers with care when the distribution is strongly skewed.
#>
#> Recommended test
#> IQR outlier detection
#>
#> Result
#> flagged rows = 11
#>
#> Effect size
#> * Effect size not reported.
#>
#> Report
#> The outlier workflow flagged 11 rows for review.

Every workflow returns a testflow object. Use
report(x), plot(x), and
as_tibble(x). See effect-size-formulas.Rmd for
the exact formulas used by the reported effect-size estimates.