| Title: | An Elegant Approach to Summarizing Clinical Data |
|---|---|
| Description: | Streamlines the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types. For method details see Motulsky H J (2016) <https://www.graphpad.com/guides/prism/10/statistics/index.htm> and d'Agostino R B (1971) <doi:10.1093/biomet/58.2.341>. |
| Authors: | Xiang Li [aut, cre] |
| Maintainer: | Xiang Li <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-18 07:29:24 UTC |
| Source: | https://github.com/cran/tidysummary |
Calculates and appends p-values with optional statistical details to a summary table based on variable types and group comparisons. Handles both continuous and categorical variables with appropriate statistical tests.
```r
add_p(
  summary,
  digit = 3,
  asterisk = FALSE,
  add_method = FALSE,
  add_statistic_name = FALSE,
  add_statistic_value = FALSE
)
```
| Argument | Description |
|---|---|
| `summary` | A data frame that has been processed by |
| `digit` | A numeric value that determines the number of decimal places. Accepts: |
| `asterisk` | Logical indicating whether to show asterisk significance markers. |
| `add_method` | Control parameter for display of statistical methods. Accepts: |
| `add_statistic_name` | Logical indicating whether to include test statistic names. |
| `add_statistic_value` | Logical indicating whether to include test statistic values. |
A data frame merged with statistical test results, containing:
- Variable names
- Summary
- Formatted p-values
- Optional method names/codes
- Optional statistic names/values
```r
# `summary` is a data frame processed by `add_var()` and `add_summary()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
summary <- add_summary(data)

# Add statistical test results
result <- add_p(summary)
```
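To illustrate the idea of choosing a test from the variable type, the sketch below uses only base R. `pick_p_value()` is a hypothetical helper, and its choice of one-way ANOVA for numeric variables and a chi-squared test for categorical ones is an assumption made for the example, not the rule set that `add_p()` applies.

```r
# Hypothetical helper: pick a group-comparison test from the variable type.
# It mirrors the idea only; the package's own decision rules are not reproduced.
pick_p_value <- function(x, g) {
  g <- factor(g)
  if (is.numeric(x)) {
    # Continuous variable: one-way ANOVA across groups (assumed choice)
    fit <- oneway.test(x ~ g, var.equal = TRUE)
    list(method = "One-way ANOVA", p = fit$p.value)
  } else {
    # Categorical variable: chi-squared test on the contingency table
    fit <- suppressWarnings(chisq.test(table(x, g)))
    list(method = "Chi-squared test", p = fit$p.value)
  }
}

# A numeric variable and a two-level categorical variable, both compared by species
res_cont <- pick_p_value(iris$Sepal.Length, iris$Species)
res_cat  <- pick_p_value(cut(iris$Sepal.Width, 2), iris$Species)
```

Each result carries the method name next to the p-value, which is the kind of optional detail that `add_method` is described as controlling.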
This function generates summary statistics for variables from a data frame that has been processed by add_var(), with options to format outputs.
```r
add_summary(
  data,
  add_overall = TRUE,
  continuous_format = NULL,
  norm_continuous_format = "{mean} ± {SD}",
  unnorm_continuous_format = "{median} ({Q1}, {Q3})",
  categorical_format = "{n} ({pct})",
  binary_show = "last",
  digit = 2
)
```
| Argument | Description |
|---|---|
| `data` | A data frame that has been processed by |
| `add_overall` | Logical indicating whether to include an "Overall" summary column. |
| `continuous_format` | Format string that overrides both the normal and non-normal continuous formats. Accepted placeholders are |
| `norm_continuous_format` | Format string for normally distributed continuous variables. Default is |
| `unnorm_continuous_format` | Format string for non-normal continuous variables. Default is |
| `categorical_format` | Format string for categorical variables. Default is |
| `binary_show` | Display option for binary variables: |
| `digit` | A numeric value that determines the number of decimal places. |
A data frame containing summary statistics with the following columns:
variable: Variable name
Overall (n=X): Summary statistics for all data, if add_overall=TRUE
Group-specific columns named [group] (n=X) with summary statistics
```r
# `data` is a data frame processed by `add_var()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")

# Add summary statistics
result <- add_summary(data, add_overall = TRUE)
result <- add_summary(data, continuous_format = "{mean}, ({SD})")
```
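The format strings use `{name}` placeholders. As a rough illustration of how such a template could be filled, the base R sketch below substitutes rounded statistics into the two default continuous formats. `fill_template()` is a hypothetical helper written for this example and is not part of the package.

```r
# Hypothetical helper: replace each {name} placeholder with a rounded value
fill_template <- function(template, values, digit = 2) {
  out <- template
  for (name in names(values)) {
    shown <- formatC(values[[name]], format = "f", digits = digit)
    out <- gsub(paste0("{", name, "}"), shown, out, fixed = TRUE)
  }
  out
}

x <- iris$Sepal.Length
stats <- list(
  mean = mean(x),
  SD = sd(x),
  median = median(x),
  Q1 = unname(quantile(x, 0.25)),
  Q3 = unname(quantile(x, 0.75))
)

normal_text <- fill_template("{mean} ± {SD}", stats)          # "5.84 ± 0.83"
skewed_text <- fill_template("{median} ({Q1}, {Q3})", stats)  # "5.80 (5.10, 6.40)"
```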
This function prepares a dataset for statistical analysis by categorizing variables as continuous or categorical. It automatically checks normality, equality of variances, and expected-frequency assumptions.
```r
add_var(data, var = NULL, group = "group", norm = "auto", center = "median")
```
| Argument | Description |
|---|---|
| `data` | A data frame containing the variables to analyze, with variables in columns and observations in rows. |
| `var` | A character vector of variable names to include. If |
| `group` | A character string specifying the grouping variable in |
| `norm` | Control parameter for normality tests. Accepts: |
| `center` | A character string specifying the |
A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:
var: List of categorized variables:
valid: All valid variable names after checks
continuous: Sublist of continuous variables (further divided by normality/equal variance)
categorical: Sublist of categorical variables (further divided by ordered/expected frequency)
group: Grouping variable name
overall_n: Total number of observations
group_n: Observation counts per group
group_nlevels: Number of groups
group_levels: Group level names
norm: Normality check method used
```r
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
```
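The categorization step can be pictured with a small base R sketch. `classify_vars()` is a hypothetical helper, and treating `var = NULL` as "all columns except the grouping column" is an assumption made for the example. It only separates numeric from non-numeric columns and records group sizes, without the normality, variance, or expected-frequency checks described above.

```r
# Hypothetical helper: split variables into continuous and categorical
classify_vars <- function(data, var = NULL, group = "group") {
  # Assumption for this sketch: NULL selects every column except the group
  if (is.null(var)) var <- setdiff(names(data), group)
  is_num <- vapply(data[var], is.numeric, logical(1))
  list(
    continuous    = var[is_num],
    categorical   = var[!is_num],
    group         = group,
    overall_n     = nrow(data),
    group_n       = table(data[[group]]),
    group_nlevels = nlevels(factor(data[[group]]))
  )
}

info <- classify_vars(iris, group = "Species")
info$continuous     # the four numeric measurement columns
info$group_nlevels  # 3
```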
Performs Levene's test to assess equality of variances between groups.
```r
equal_test(data, var, group, center = "median")
```
| Argument | Description |
|---|---|
| `data` | A data frame containing the variables to be tested. |
| `var` | A character string specifying the numeric variable in |
| `group` | A character string specifying the grouping variable in |
| `center` | A character string specifying the |
Logical value:
TRUE: Variances are equal (p-value greater than 0.05)
FALSE: Variances are unequal, or an error occurred during testing
Levene's test is the default method in SPSS. The original Levene's test uses center = mean; this function defaults to center = median, which gives a more robust test.
```r
equal_test(iris, "Sepal.Length", "Species")
```
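For readers who want to see what the median-centered test computes, here is a minimal base R sketch. `levene_p()` is a hypothetical helper, not the package's implementation. It runs a one-way ANOVA on the absolute deviations from each group's center, which is the standard construction of Levene's test.

```r
# Hypothetical helper: Levene's test p-value via ANOVA on absolute deviations
levene_p <- function(x, g, center = median) {
  g <- factor(g)
  # Absolute deviation of each observation from its own group's center
  dev <- abs(x - ave(x, g, FUN = center))
  # The F test of a one-way ANOVA on these deviations is Levene's statistic
  anova(lm(dev ~ g))[["Pr(>F)"]][1]
}

p_median <- levene_p(iris$Sepal.Length, iris$Species)                 # median-centered
p_mean   <- levene_p(iris$Sepal.Length, iris$Species, center = mean)  # original form

# Same decision rule as described above: TRUE only when p > 0.05
variances_equal <- p_median > 0.05
```

Centering on the median makes the deviations less sensitive to skewed or heavy-tailed data, which is why it is the more robust choice.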
Formats p-values as strings with specified precision and optional significance asterisks.
```r
format_p(p, digit = 3, asterisk = FALSE)
```
| Argument | Description |
|---|---|
| `p` | A numeric p-value between 0 and 1. |
| `digit` | A numeric value that determines the number of decimal places. Accepts: |
| `asterisk` | Logical indicating whether to return significance asterisks. |
A character string containing the formatted p-value or significance asterisks.
```r
format_p(0.00009, 4)
format_p(0.03, 3)
format_p(0.02, asterisk = TRUE)
```
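The following base R sketch shows one plausible way to produce this kind of output. `format_p_sketch()` is hypothetical. The asterisk cut-offs (0.05, 0.01, 0.001) and the "<" bound for very small values are conventional choices assumed for the example, and the exact strings returned by `format_p()` may differ.

```r
# Hypothetical re-implementation; thresholds and output shape are assumptions
format_p_sketch <- function(p, digit = 3, asterisk = FALSE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (asterisk) {
    # Conventional cut-offs, assumed here: *** < 0.001, ** < 0.01, * < 0.05
    if (p < 0.001) return("***")
    if (p < 0.01) return("**")
    if (p < 0.05) return("*")
    return("")
  }
  limit <- 10^(-digit)
  if (p < limit) {
    # Values below the displayable precision are reported as an upper bound
    paste0("<", formatC(limit, format = "f", digits = digit))
  } else {
    formatC(p, format = "f", digits = digit)
  }
}

tiny  <- format_p_sketch(0.00009, 4)             # "<0.0001"
plain <- format_p_sketch(0.03, 3)                # "0.030"
stars <- format_p_sketch(0.02, asterisk = TRUE)  # "*"
```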
Conducts normality tests for a specified variable, optionally by group. Supports automatic testing and interactive visualization.
```r
normal_test(data = NULL, var = NULL, group = NULL, norm = "auto")
```
| Argument | Description |
|---|---|
| `data` | A data frame containing the variables to be tested. |
| `var` | A character string specifying the numeric variable in |
| `group` | A character string specifying the grouping variable in |
| `norm` | Control parameter for test behavior. Accepts: |
A logical value:
TRUE: data are normally distributed
FALSE: data are not normally distributed
The test is selected automatically from the sample size per group:
n < 3: Too small, assumed non-normal
(3, 50]: Shapiro-Wilk test
(50, 1000]: D'Agostino Chi2 test, used instead of the Kolmogorov-Smirnov test
n > 1000: Shows p-values, draws Q-Q plots, and prompts for a decision
```r
normal_test(iris, "Sepal.Length", "Species", norm = "auto")
normal_test(iris, "Sepal.Length", "Species", norm = TRUE)
```
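The size-based dispatch above can be sketched in base R as follows. `choose_normality_test()` is a hypothetical helper. Placing n = 3 in the Shapiro-Wilk branch is an assumption, since the ranges above leave that boundary open and `shapiro.test()` accepts samples of 3 or more. Only the Shapiro-Wilk branch is actually run here, because base R does not ship a D'Agostino test.

```r
# Hypothetical helper: name the test that the size-based rule would select
choose_normality_test <- function(n) {
  if (n < 3) {
    "too small"
  } else if (n <= 50) {
    "Shapiro-Wilk"
  } else if (n <= 1000) {
    "D'Agostino"
  } else {
    "manual review"
  }
}

# Sample size per group, then the test chosen for each group
n_per_group <- lengths(split(iris$Sepal.Length, iris$Species))
tests <- vapply(n_per_group, choose_normality_test, character(1))

# All iris groups have n = 50, so Shapiro-Wilk applies to each of them
p_values <- tapply(iris$Sepal.Length, iris$Species,
                   function(v) shapiro.test(v)$p.value)

# Treat the variable as normal only if every group passes at the 0.05 level
all_normal <- all(p_values > 0.05)
```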
This function determines if a contingency table meets the expected frequency assumptions for a valid chi-squared test. It categorizes the data into "not_small", "small", or "very_small" based on sample size and expected frequencies.
```r
small_test(data, var, group)
```
| Argument | Description |
|---|---|
| `data` | A data frame containing the variables to be tested. |
| `var` | A character string specifying the factor variable in |
| `group` | A character string specifying the grouping variable in |
A character string with one of three values:
"not_small": Sample size of at least 40 and all expected frequencies of at least 5
"small": Sample size of at least 40, all expected frequencies of at least 1, and at least one expected frequency below 5; applies only to 2×2 contingency tables
"very_small": All other cases, including sample size below 40 or any expected frequency below 1
```r
df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group = factor(c("X", "X", "Y", "Y"))
)
small_test(data = df, var = "category", group = "group")
```
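The three categories can be reproduced from the rules listed above with a short base R sketch. `classify_expected()` is a hypothetical helper that applies those rules to a contingency table. It is not the package's code, and edge cases such as empty factor levels may be handled differently by `small_test()`.

```r
# Hypothetical helper: classify a contingency table by its expected frequencies
classify_expected <- function(x, g) {
  tab <- table(x, g)
  n <- sum(tab)
  # Expected count under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / n
  is_2x2 <- all(dim(tab) == 2)
  if (n >= 40 && all(expected >= 5)) {
    "not_small"
  } else if (is_2x2 && n >= 40 && all(expected >= 1)) {
    "small"
  } else {
    "very_small"
  }
}

g <- rep(c("X", "Y"), times = 20)  # 40 observations, 20 per group

balanced <- classify_expected(rep(c("A", "B"), each = 20), g)          # "not_small"
sparse   <- classify_expected(rep(c("A", "B"), times = c(36, 4)), g)   # "small"
tiny     <- classify_expected(c("A", "B", "A", "B"), c("X", "X", "Y", "Y"))  # "very_small"
```

In the sparse case the expected counts are 18, 18, 2 and 2, so every cell is at least 1 but two cells fall below 5. The four-row table is "very_small" simply because its sample size is under 40.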