Title: | Run Permutation Tests and Construct Associated Confidence Intervals |
---|---|
Description: | Implements permutation tests for any test statistic and randomization scheme and constructs associated confidence intervals as described in Glazer and Stark (2024) <doi:10.48550/arXiv.2405.05238>. |
Authors: | Amanda Glazer [aut, cre] |
Maintainer: | Amanda Glazer <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-11-26 06:48:12 UTC |
Source: | CRAN |
This function takes an array of p-values and returns adjusted p-values using a user-specified FWER or FDR correction method.
adjust_p_value(pvalues, method = "holm-bonferroni")
pvalues |
Array of p-values |
method |
The FWER or FDR correction to use, either 'holm-bonferroni', 'bonferroni', or 'benjamini-hochberg' |
Adjusted p-values
adjust_p_value(pvalues = c(.05, .1, .5), method='holm-bonferroni')
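The three corrections named above also exist in base R as `stats::p.adjust()`. The sketch below is only a cross-check: it assumes that 'holm-bonferroni', 'bonferroni', and 'benjamini-hochberg' correspond to `p.adjust`'s `"holm"`, `"bonferroni"`, and `"BH"` methods. Whether `adjust_p_value` uses `p.adjust` internally is not stated in this manual.

```r
# Base R cross-check; the mapping between method names is an assumption.
pvalues <- c(0.05, 0.1, 0.5)

holm <- p.adjust(pvalues, method = "holm")        # step-down FWER control
bonf <- p.adjust(pvalues, method = "bonferroni")  # single-step FWER control
bh   <- p.adjust(pvalues, method = "BH")          # FDR control

print(rbind(holm, bonf, bh))
```

Holm never adjusts more aggressively than Bonferroni, which is why it is a common default when FWER control is required.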
This function takes a data frame, and group and outcome column names as input and returns the difference in mean outcome between the two groups
diff_in_means(df, group_col, outcome_col, treatment_value = NULL)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
treatment_value |
The value of group_col to be considered 'treatment' |
The difference in mean outcome between the two groups
data <- data.frame(group = c(rep(1, 4), rep(2, 4)), outcome = c(rep(3, 4), rep(5, 4)))
diff_in_means(df = data, group_col = "group", outcome_col = "outcome", treatment_value = 1)
This function takes a data frame, and group and outcome column names as input and returns the difference in median outcome between the two groups
diff_in_medians(df, group_col, outcome_col, treatment_value = NULL)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
treatment_value |
The value of group_col to be considered 'treatment' |
The difference in median outcome between the two groups
data <- data.frame(group = c(rep(1, 4), rep(2, 4)), outcome = c(rep(3, 4), rep(5, 4)))
diff_in_medians(df = data, group_col = "group", outcome_col = "outcome", treatment_value = 1)
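The difference-in-means and difference-in-medians statistics follow the same pattern, which a base R sketch can make concrete. `diff_in_summary` is a hypothetical helper, not part of the package, and it assumes the sign convention "treatment minus everything else"; the manual does not state the direction of the difference.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: statistic = summary(treatment rows) - summary(all other rows).
diff_in_summary <- function(df, group_col, outcome_col, treatment_value,
                            summary_fn = mean) {
  is_treated <- df[[group_col]] == treatment_value
  summary_fn(df[[outcome_col]][is_treated]) -
    summary_fn(df[[outcome_col]][!is_treated])
}

data <- data.frame(group = c(rep(1, 4), rep(2, 4)),
                   outcome = c(rep(3, 4), rep(5, 4)))

dm   <- diff_in_summary(data, "group", "outcome", treatment_value = 1)
dmed <- diff_in_summary(data, "group", "outcome", treatment_value = 1,
                        summary_fn = median)
print(c(mean_diff = dm, median_diff = dmed))
```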
This function takes an array of p-values and returns a combined p-value using Fisher's combining function: $-2 \sum_i \log(p_i)$
fisher(pvalues)
pvalues |
Array of p-values |
Combined p-value using Fisher's method
fisher(pvalues = c(.05, .1, .5))
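As a minimal sketch, Fisher's combining function in its standard form, $-2 \sum_i \log(p_i)$, takes one line of base R. `fisher_stat` is a hypothetical name. The chi-square step is the classical reference distribution for independent tests and is shown only for comparison; in a nonparametric combination the statistic would instead be compared against its permutation distribution.

```r
# Hypothetical helper: the standard Fisher combining statistic.
fisher_stat <- function(pvalues) -2 * sum(log(pvalues))

pvalues <- c(0.05, 0.1, 0.5)
stat <- fisher_stat(pvalues)

# Classical reference distribution, valid only if the tests are independent:
# chi-square with 2 * (number of p-values) degrees of freedom.
p_chisq <- pchisq(stat, df = 2 * length(pvalues), lower.tail = FALSE)
print(c(statistic = stat, chisq_p = p_chisq))
```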
This function takes an array of p-values and returns a combined p-value using Liptak's combining function: $\sum_i \Phi^{-1}(1 - p_i)$, where $\Phi$ is the CDF of the standard Normal distribution
liptak(pvalues)
pvalues |
Array of p-values |
Combined p-value using Liptak's method
liptak(pvalues = c(.05, .1, .5))
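A minimal base R sketch of Liptak's combining function, assuming the usual form $\sum_i \Phi^{-1}(1 - p_i)$ with $\Phi^{-1}$ given by `qnorm()`. `liptak_stat` is a hypothetical name, not the package function.

```r
# Hypothetical helper: sum of standard Normal quantiles of (1 - p).
liptak_stat <- function(pvalues) sum(qnorm(1 - pvalues))

stat <- liptak_stat(c(0.05, 0.1, 0.5))
print(stat)

# A p-value of 0.5 contributes nothing, since qnorm(0.5) is 0.
neutral <- liptak_stat(c(0.5, 0.5))
print(neutral)
```

A p-value of exactly 0 or 1 maps to an infinite quantile, so inputs at those boundaries make the statistic infinite.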
This function takes a data frame and group and outcome column names as input and returns the nonparametric combination of tests (NPC) omnibus p-value
npc( df, group_col, outcome_cols, strata_col = NULL, test_stat = "diff_in_means", perm_func = permute_group, combn = "fisher", shift = 0, reps = 10000, perm_set = NULL, complete_enum = FALSE, seed = NULL )
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_cols |
The names of the columns in df that correspond to the outcome variables |
strata_col |
The name of the column in df that corresponds to the strata |
test_stat |
Test statistic function |
perm_func |
Function to permute group, default is permute_group which randomly permutes group assignment |
combn |
Combining function method to use, takes values 'fisher', 'tippett', or 'liptak', or a user defined function |
shift |
Value of shift to apply in one- or two-sample problem |
reps |
Number of iterations to use when calculating permutation p-value |
perm_set |
Matrix of permutations to use instead of reps iterations of perm_func |
complete_enum |
Boolean, whether to calculate P-value under complete enumeration of permutations |
seed |
An integer seed value |
The omnibus p-value
data <- data.frame(group = c(rep(1, 4), rep(2, 4)), out1 = c(0, 1, 0, 0, 1, 1, 1, 0), out2 = rep(1, 8))
output <- npc(df = data, group_col = "group", outcome_cols = c("out1", "out2"), perm_func = permute_group, combn = "fisher", reps = 10^4, seed = 42)
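The general idea of a nonparametric combination can be sketched in base R. This is an illustration under stated assumptions, not the package's algorithm: `npc_sketch` is a hypothetical function, the partial tests are one-sided ("greater") differences in means, the combining function is Fisher's, and the observed data are counted as one of the permutations. The key point is that every outcome is evaluated on the same permuted labels, so dependence between outcomes is preserved.

```r
# Hypothetical sketch of NPC with Fisher's combining function.
npc_sketch <- function(group, outcomes, treatment_value = 1,
                       reps = 2000, seed = 42) {
  set.seed(seed)
  outcomes <- as.matrix(outcomes)
  stat <- function(g) {
    treated <- g == treatment_value
    colMeans(outcomes[treated, , drop = FALSE]) -
      colMeans(outcomes[!treated, , drop = FALSE])
  }
  # Row 1 holds the observed statistics; the other rows hold permuted ones.
  stats <- matrix(NA_real_, nrow = reps + 1, ncol = ncol(outcomes))
  stats[1, ] <- stat(group)
  for (r in seq_len(reps)) stats[r + 1, ] <- stat(sample(group))
  # Partial p-value of each row: share of rows with a statistic at least as large.
  partial_p <- apply(stats, 2, function(s) {
    rank(-s, ties.method = "max") / length(s)
  })
  # Combine the partial p-values row by row, then compare with the observed row.
  combined <- -2 * rowSums(log(partial_p))
  mean(combined >= combined[1])
}

data <- data.frame(group = c(rep(1, 4), rep(2, 4)),
                   out1 = c(0, 1, 0, 0, 1, 1, 1, 0),
                   out2 = rep(1, 8))
p_doc <- npc_sketch(data$group, data[, c("out1", "out2")])

# Clearly separated groups should give a small omnibus p-value.
p_sep <- npc_sketch(rep(c(1, 2), each = 5),
                    cbind(a = c(10:14, 1:5), b = c(20:24, 1:5)))
print(c(doc_example = p_doc, separated = p_sep))
```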
This function runs a permutation test for the one-sample problem by calling the permutation_test function using the one-sample mean test statistic.
one_sample(x, shift = 0, alternative = "greater", reps = 10^4, seed = NULL)
x |
array of data |
shift |
Value of shift to apply in one-sample problem |
alternative |
String, two-sided or one-sided (greater or less) p-value |
reps |
Number of iterations to use when calculating permutation p-value |
seed |
An integer seed value |
The permutation test p-value
one_sample(x = c(-1, 1, 2), seed = 42)
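For small samples the one-sample sign-flip test can be enumerated exactly, which makes the mechanics easy to inspect. The sketch below is a hypothetical base R helper (`one_sample_exact`) that assumes a zero shift and uses all 2^n sign assignments instead of `reps` random draws, so its result need not match the Monte Carlo p-value from `one_sample` exactly.

```r
# Hypothetical helper: exact one-sample sign-flip test for the mean.
one_sample_exact <- function(x, alternative = "greater") {
  n <- length(x)
  # All 2^n sign vectors, one per row.
  signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
  null <- as.vector(signs %*% x) / n  # mean of sign * x for each assignment
  obs <- mean(x)
  tol <- 1e-12                        # guard against floating-point ties
  switch(alternative,
         greater = mean(null >= obs - tol),
         less    = mean(null <= obs + tol),
         stop("alternative must be 'greater' or 'less'"))
}

p_exact <- one_sample_exact(c(-1, 1, 2))
print(p_exact)  # 3 of the 8 sign assignments give a mean of at least 2/3
```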
This function takes a data frame, and group and outcome column names as input and returns the mean of the product of the outcome and group. This test statistic is used for the one-sample problem.
one_sample_mean(df, group_col, outcome_col)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
The one-sample problem test statistic: the mean of the product of the outcome and group
data <- data.frame(group = c(rep(1, 4), rep(2, 4)), outcome = c(rep(3, 4), rep(5, 4)))
one_sample_mean(df = data, group_col = "group", outcome_col = "outcome")
This function takes a data frame, and group and outcome column names as input and returns the one-way ANOVA test statistic
one_way_anova_stat(df, group_col, outcome_col)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
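The one-way ANOVA test statistic: $\sum_{k=1}^{g} n_k (\overline{X}_k - \overline{X})^2$, where $k$ indexes the groups, $n_k$ and $\overline{X}_k$ are the size and mean outcome of group $k$, and $\overline{X}$ is the overall mean outcome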
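No example accompanies this statistic, so here is a minimal base R sketch. It assumes the statistic is the between-group sum of squares, that is, each group's size times the squared distance between its mean and the overall mean, summed over groups. `anova_stat_sketch` is a hypothetical name.

```r
# Hypothetical helper; assumes the between-group sum of squares form.
anova_stat_sketch <- function(df, group_col, outcome_col) {
  y <- df[[outcome_col]]
  g <- df[[group_col]]
  group_means <- tapply(y, g, mean)
  group_sizes <- tapply(y, g, length)
  sum(group_sizes * (group_means - mean(y))^2)
}

data <- data.frame(group = c(rep(1, 4), rep(2, 4)),
                   outcome = c(rep(3, 4), rep(5, 4)))
stat <- anova_stat_sketch(data, "group", "outcome")
print(stat)  # 4 * (3 - 4)^2 + 4 * (5 - 4)^2 = 8
```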
Runs a permutation test with user-supplied data, test statistic, and permutation function.
permutation_test( df, group_col, outcome_col, strata_col = NULL, test_stat = "diff_in_means", perm_func = permute_group, alternative = "two-sided", shift = 0, reps = 10000, perm_set = NULL, complete_enum = FALSE, return_test_dist = FALSE, return_perm_dist = FALSE, seed = NULL )
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
strata_col |
The name of the column in df that corresponds to the strata |
test_stat |
Test statistic function |
perm_func |
Function to permute group |
alternative |
String, two-sided or one-sided (greater or less) p-value; options are 'greater', 'less', or 'two-sided' |
shift |
Value of shift to apply in one- or two-sample problem |
reps |
Number of iterations to use when calculating permutation p-value |
perm_set |
Matrix of group assignments to use instead of reps iterations of perm_func |
complete_enum |
Boolean, whether to calculate P-value under complete enumeration of permutations |
return_test_dist |
Boolean, whether to return test statistic distribution under permutations |
return_perm_dist |
Boolean, whether to return a matrix where each row is the group assignment under that permutation |
seed |
An integer seed value |
p_value: the permutation test p-value
test_stat_dist: array, the distribution of the test statistic under the set of permutations, if return_test_dist is set to TRUE
perm_indices_mat: matrix, each row corresponds to a permutation used in the permutation test calculation
data <- data.frame(group = c(rep(1, 10), rep(2, 10)), outcome = c(rep(1, 10), rep(1, 10)))
permutation_test(df = data, group_col = "group", outcome_col = "outcome", test_stat = "diff_in_means", perm_func = permute_group, alternative = "greater", shift = 0, reps = 10, return_perm_dist = TRUE, return_test_dist = TRUE, seed = 42)
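The design above, a test statistic and a permutation function passed in as arguments, can be illustrated with a short base R sketch. `perm_test_sketch` and `diff_means` are hypothetical names. Two choices below are assumptions rather than documented package behavior: the p-value counts the observed statistic as one of the draws, (count + 1) / (reps + 1), and the two-sided p-value is twice the smaller tail, capped at 1.

```r
# Hypothetical Monte Carlo permutation test with pluggable components.
perm_test_sketch <- function(y, g, test_stat, perm_func,
                             alternative = "two-sided",
                             reps = 1000, seed = 42) {
  set.seed(seed)
  obs <- test_stat(y, g)
  # Recompute the statistic under reps random relabelings.
  null <- replicate(reps, test_stat(y, perm_func(g)))
  p_greater <- (sum(null >= obs) + 1) / (reps + 1)  # assumed estimator
  p_less    <- (sum(null <= obs) + 1) / (reps + 1)
  switch(alternative,
         greater = p_greater,
         less = p_less,
         "two-sided" = min(1, 2 * min(p_greater, p_less)),  # assumed convention
         stop("unknown alternative"))
}

diff_means <- function(y, g) mean(y[g == 1]) - mean(y[g == 2])
g <- rep(c(1, 2), each = 10)

# Constant outcomes, as in the example above: every relabeling ties.
p_flat <- perm_test_sketch(rep(1, 20), g, diff_means, sample,
                           alternative = "greater")
# Well-separated groups: almost no relabeling matches the observed value.
p_sep <- perm_test_sketch(c(11:20, 1:10), g, diff_means, sample,
                          alternative = "greater")
print(c(flat = p_flat, separated = p_sep))
```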
This function constructs a confidence interval by inverting permutation tests and applying the method in Glazer and Stark, 2024.
permutation_test_ci( df, group_col, outcome_col, strata_col = NULL, test_stat = "diff_in_means", perm_func = permute_group, upper_bracket = NULL, lower_bracket = NULL, cl = 0.95, e = 0.1, reps = 10000, perm_set = NULL, seed = 42 )
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
strata_col |
The name of the column in df that corresponds to the strata |
test_stat |
Test statistic function |
perm_func |
Function to permute group |
upper_bracket |
Array with 2 values that bracket upper confidence bound |
lower_bracket |
Array with 2 values that bracket lower confidence bound |
cl |
Confidence level, default 0.95 |
e |
Maximum distance from true confidence bound value |
reps |
Number of iterations to use when calculating permutation p-value |
perm_set |
Matrix of group assignments to use instead of reps iterations of perm_func |
seed |
An integer seed value |
A list containing the permutation test p-value, and the test statistic distribution if applicable
x <- c(35.3, 35.9, 37.2, 33.0, 31.9, 33.7, 36.0, 35.0, 33.3, 33.6, 37.9, 35.6, 29.0, 33.7, 35.7)
y <- c(32.5, 34.0, 34.4, 31.8, 35.0, 34.6, 33.5, 33.6, 31.5, 33.8, 34.6)
df <- data.frame(outcome = c(x, y), group = c(rep(1, length(x)), rep(0, length(y))))
permutation_test_ci(df = df, group_col = "group", outcome_col = "outcome", strata_col = NULL, test_stat = "diff_in_means", perm_func = permute_group, upper_bracket = NULL, lower_bracket = NULL, cl = 0.95, e = 0.01, reps = 10^3, seed = 42)
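Inverting a permutation test means collecting every hypothesized shift that the test does not reject. The base R sketch below illustrates the idea for a difference in means with groups coded 1 (treatment) and 0 (control). It is not the package's algorithm: `ci_by_inversion` is a hypothetical function, the starting brackets are a crude choice (the point estimate plus or minus the range of the outcomes), and the p-value estimator is assumed. One fixed set of permutations is reused for every shift, which keeps each one-sided p-value monotone in the shift so that bisection to within `e` is valid.

```r
# Hypothetical sketch: confidence interval for a shift by test inversion.
ci_by_inversion <- function(y, g, cl = 0.95, e = 0.01,
                            reps = 1000, seed = 42) {
  set.seed(seed)
  alpha <- 1 - cl
  stat <- function(y, g) mean(y[g == 1]) - mean(y[g == 0])
  # One fixed set of permuted assignments (columns), reused for every shift.
  perms <- replicate(reps, sample(g))

  # One-sided p-value for H0: shift = d, after removing d from treated units.
  p_value <- function(d, side) {
    y0 <- y - d * g
    obs <- stat(y0, g)
    null <- apply(perms, 2, function(gs) stat(y0, gs))
    count <- if (side == "greater") sum(null >= obs) else sum(null <= obs)
    (count + 1) / (reps + 1)  # assumed estimator
  }

  # Bisect between a rejected and an accepted shift until they are within e.
  bisect <- function(rejected, accepted, side) {
    while (abs(accepted - rejected) > e) {
      mid <- (rejected + accepted) / 2
      if (p_value(mid, side) <= alpha / 2) rejected <- mid else accepted <- mid
    }
    rejected  # the rejected endpoint gives the wider, conservative interval
  }

  est <- stat(y, g)
  width <- diff(range(y))  # crude bracket; an assumption of this sketch
  c(lower = bisect(est - width, est, "greater"),
    upper = bisect(est + width, est, "less"))
}

x <- c(35.3, 35.9, 37.2, 33.0, 31.9, 33.7, 36.0, 35.0,
       33.3, 33.6, 37.9, 35.6, 29.0, 33.7, 35.7)
y <- c(32.5, 34.0, 34.4, 31.8, 35.0, 34.6, 33.5, 33.6, 31.5, 33.8, 34.6)
ci <- ci_by_inversion(c(x, y), c(rep(1, length(x)), rep(0, length(y))))
print(ci)
```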
This function takes a data frame and group column name as input and returns the dataframe with the group column randomly permuted
permute_group(df, group_col, strata_col = NULL, seed = NULL)
df |
A data frame |
group_col |
String, the name of the column in df that corresponds to the group label |
strata_col |
The name of the column in df that corresponds to the strata, should be NULL for unstratified permutation |
seed |
An integer seed value |
The inputted data frame with the group column randomly shuffled
data <- data.frame(group_label = c(1, 2, 2, 1, 2, 1), outcome = 1:6)
permute_group(df = data, group_col = "group_label", strata_col = NULL, seed = 42)
This function takes a data frame and group column name as input and returns the data frame with the group column replaced with randomly assigned signs
permute_sign(df, group_col, strata_col = NULL, seed = NULL)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
strata_col |
The name of the column in df that corresponds to the strata, should be NULL for this function |
seed |
An integer seed value |
The inputted data frame with the group column replaced with randomly assigned signs
data <- data.frame(group_label = rep(1, 6), outcome = 1:6)
permute_sign(df = data, group_col = "group_label", strata_col = NULL, seed = 42)
This function takes a data frame and group and strata column name as input and returns the dataframe with the group column randomly permuted by strata
strat_permute_group(df, group_col, strata_col, seed = NULL)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
strata_col |
The name of the column in df that corresponds to the strata |
seed |
An integer seed value |
The inputted data frame with the group column randomly shuffled by strata
data <- data.frame(group_label = c(1, 2, 2, 1, 2, 1), stratum = c(1, 1, 1, 2, 2, 2), outcome = 1:6)
strat_permute_group(df = data, group_col = "group_label", strata_col = "stratum", seed = 42)
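Stratified permutation shuffles labels only within each stratum, so every stratum keeps its original number of units per group. A minimal base R sketch, with the hypothetical name `strat_permute_sketch`, is shown below; it is not the package's implementation. It indexes with `sample.int()` rather than calling `sample()` on the labels directly, because `sample()` treats a single number as a range to draw from, which would corrupt a stratum containing one row.

```r
# Hypothetical helper: shuffle group labels, optionally within strata.
strat_permute_sketch <- function(df, group_col, strata_col = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Shuffle by index so that a single-element input is returned unchanged.
  shuffle <- function(v) v[sample.int(length(v))]
  if (is.null(strata_col)) {
    df[[group_col]] <- shuffle(df[[group_col]])
  } else {
    # ave() applies shuffle() separately to the labels of each stratum.
    df[[group_col]] <- ave(df[[group_col]], df[[strata_col]], FUN = shuffle)
  }
  df
}

data <- data.frame(group_label = c(1, 2, 2, 1, 2, 1),
                   stratum = c(1, 1, 1, 2, 2, 2),
                   outcome = 1:6)
out <- strat_permute_sketch(data, "group_label", "stratum", seed = 42)
print(out)
```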
This function takes an array of p-values and returns a combined p-value using Tippett's combining function: $\max_i \{1 - p_i\}$
tippett(pvalues)
pvalues |
Array of p-values |
Combined p-value using Tippett's method
tippett(pvalues = c(.05, .1, .5))
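A one-line base R sketch of Tippett's combining function, assuming the form $\max_i \{1 - p_i\}$ that is common in the nonparametric combination literature. Some references instead define Tippett's statistic as the smallest p-value, so the orientation used here is an assumption. `tippett_stat` is a hypothetical name.

```r
# Hypothetical helper; assumes the "larger is more significant" orientation.
tippett_stat <- function(pvalues) max(1 - pvalues)

stat <- tippett_stat(c(0.05, 0.1, 0.5))
print(stat)

# Only the smallest p-value matters: changing the others has no effect.
same <- tippett_stat(c(0.05, 0.9, 0.99))
print(same)
```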
This function takes a data frame, and group and outcome column names as input and returns the t test statistic
ttest_stat(df, group_col, outcome_col)
df |
A data frame |
group_col |
The name of the column in df that corresponds to the group label |
outcome_col |
The name of the column in df that corresponds to the outcome variable |
The t test statistic
This function runs a permutation test with difference in means test statistic for the two-sample problem by calling the permutation_test function.
two_sample(x, y, shift = 0, alternative = "greater", reps = 10^4, seed = NULL)
x |
array of data for treatment group |
y |
array of data for control group |
shift |
Value of shift to apply in two-sample problem |
alternative |
String, two-sided or one-sided (greater or less) p-value; options are 'greater', 'less', or 'two-sided' |
reps |
Number of iterations to use when calculating permutation p-value |
seed |
An integer seed value |
The permutation test p-value
two_sample(x = c(10, 9, 11), y = c(12, 11, 13), alternative = "less", seed = 42)
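With samples this small, the two-sample test can be checked by full enumeration. The sketch below is a hypothetical base R helper (`two_sample_exact`) that assumes a zero shift and evaluates the difference in means over every way of choosing which pooled units are labeled as treatment. `two_sample` draws `reps` random permutations instead, so its p-value may differ slightly.

```r
# Hypothetical helper: exact two-sample permutation test (difference in means).
two_sample_exact <- function(x, y, alternative = "greater") {
  pooled <- c(x, y)
  obs <- mean(x) - mean(y)
  # Each column of idx is one choice of units labeled "treatment".
  idx <- combn(length(pooled), length(x))
  null <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  tol <- 1e-12  # guard against floating-point ties
  switch(alternative,
         greater = mean(null >= obs - tol),
         less    = mean(null <= obs + tol),
         stop("alternative must be 'greater' or 'less'"))
}

p_exact <- two_sample_exact(x = c(10, 9, 11), y = c(12, 11, 13),
                            alternative = "less")
print(p_exact)  # 2 of the 20 assignments give a difference of -2 or less
```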