Title: | Statistical Tools for Assessing Publication Integrity of Groups of Trials |
---|---|
Description: | Takes user-provided baseline data from groups of randomised controlled data and assesses whether the observed distribution of baseline p-values, numbers of participants in each group, or categorical variables are consistent with the expected distribution, as an aid to the assessment of integrity concerns in published randomised controlled trials. References (citations in PubMed format in details of each function): Bolland MJ, Avenell A, Gamble GD, Grey A. (2016) <doi:10.1212/WNL.0000000000003387>. Bolland MJ, Gamble GD, Avenell A, Grey A, Lumley T. (2019) <doi:10.1016/j.jclinepi.2019.05.006>. Bolland MJ, Gamble GD, Avenell A, Grey A. (2019) <doi:10.1016/j.jclinepi.2019.03.001>. Bolland MJ, Gamble GD, Grey A, Avenell A. (2020) <doi:10.1111/anae.15165>. Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. (2021) <doi:10.1016/j.jclinepi.2020.11.012>. Bolland MJ, Gamble GD, Avenell A, Grey A. (2021) <doi:10.1016/j.jclinepi.2021.05.002>. Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. (2023) <doi:10.1016/j.jclinepi.2022.12.018>. Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>. Carlisle JB. (2017) <doi:10.1111/anae.13938>. |
Authors: | Mark Bolland [aut, cre, cph] |
Maintainer: | Mark Bolland <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-11-29 09:08:59 UTC |
Source: | CRAN |
Creates plots of distribution of p-values for differences in baseline means calculated using Carlisle's montecarlo anova method.
anova_fn( df = anova_data, method = "alt", seed = 0, sims = -1, btsp = 500, title = "", verbose = TRUE )
anova_fn( df = anova_data, method = "alt", seed = 0, sims = -1, btsp = 500, title = "", verbose = TRUE )
df |
dataframe generated from load_clean function |
method |
"orig" is adapted from original code; "alt" avoids using loops in the code (see details) |
seed |
the seed to use for random number generation, default 0 = current date and time. Specify seed to make repeatable. |
sims |
number of simulations, default -1 = function selects based on number of variables and sample size |
btsp |
number of bootstrap repeats used to generate 95% confidence interval around AUC |
title |
optional title for plots |
verbose |
TRUE or FALSE indicates whether progress bar and comments show and prints plot |
Method is from Carlisle JB, Loadsman JA. Evidence for non-random sampling in randomised, controlled trials by Yuhji Saitoh. Anaesthesia. 2017;72:17-27.
R code is in appendix to paper. This function is adapted from that code.
The function has two methods. The published code selects each variable from each study then generates
simulations for that variable using a row-wise approach with several loops. The adapted method is method = "orig".
The method = "alt" generates all the simulations at once and initially I thought was considerably faster, but in practice the time savings are small.
The results from the two approaches will not be identical even if the same random number seed is used
because they use the generated random numbers in different orders but the p-values generated differ by about <0.1. Usually the
differences are close to 0.01 (although this depends on the number of simulations- more simulations = smaller differences).
The code that generates the p-value for each variable from the simulated means is essentially the same.
Returns a list containing 3 objects and (if verbose = TRUE) prints the plot anova_ecdf
list containing 3 objects as described
anova_ecdf = plot of cumulative distribution of calculated p-values compared to the expected uniform distribution
anova_pvalues = plots of distribution of calculated p-values and AUC, as for pval_cont_fn()
anova_all_results = list containing
anova_data = data frame of baseline data, with calculated p-values
anova_pvals = plot of distribution of calculated p-values from anova_pvalues
anova_auc = plot of AUC of calculated p-values from anova_pvalues
# load example data anova_data <- load_clean(import= "no", file.cont = "SI_pvals_cont",anova= "yes", format.cont = "wide")$anova_data # run function (takes only a few seconds) anova_fn(seed=10, sims = 100, btsp = 100)$anova_ecdf # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data anova_data <- load_clean(import= "yes", anova = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A:O", format.cont = "wide")$anova_data
# load example data anova_data <- load_clean(import= "no", file.cont = "SI_pvals_cont",anova= "yes", format.cont = "wide")$anova_data # run function (takes only a few seconds) anova_fn(seed=10, sims = 100, btsp = 100)$anova_ecdf # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data anova_data <- load_clean(import= "yes", anova = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A:O", format.cont = "wide")$anova_data
Creates plots of observed to expected numbers and ratios for the binomial variables and/or compares reported and calculated p-values for the variables
Reference: Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. Distributions of baseline categorical variables were different from the expected distributions in randomized trials with integrity concerns. J Clin Epidemiol. 2023;154:117-124
cat_all_fn( df = cat_all_data, comp.pvals = "no", fisher.sim = "y", fish.n.sims = 10000, binom = "no", two_levels = "no", del.disparate = "yes", excl.level = "yes", seed = 0, title = "", verbose = TRUE )
cat_all_fn( df = cat_all_data, comp.pvals = "no", fisher.sim = "y", fish.n.sims = 10000, binom = "no", two_levels = "no", del.disparate = "yes", excl.level = "yes", seed = 0, title = "", verbose = TRUE )
df |
data frame generated from load_clean function |
comp.pvals |
"yes" or "no" indicator whether reported and calculated p-values should be compared |
fisher.sim |
"yes" or "no" indicator whether to allow fisher test to simulate p-values for >2*2 tables |
fish.n.sims |
number of simulations to use in Fisher test, default 10,000 |
binom |
"yes" or "no" indicator whether observed to expected distributions of binomial variables should be calculated |
two_levels |
"yes" or "no" indicator whether variables with more than 2 levels should be collapsed to 2 levels |
del.disparate |
if yes, data in which the absolute difference between group sizes is >20% are deleted |
excl.level |
"yes" or "no" indicator whether one level of a variable should be deleted. Deleted level is chosen randomly using seed parameter. |
seed |
seed for random number generator, default 0 = current date and time. Specify seed to make repeatable. |
title |
title name for plots (optional) |
verbose |
TRUE or FALSE indicates whether progress bar and comments show and flextable or plot or both are printed |
Returns a list containing objects described below and (if verbose = TRUE) prints the flextable cat_all_diff_calc_rep_ft and/or graph cat_all_graph depending on options chosen
list containing objects as described
if p-value comparison used:
cat_all_pvals = data frame of data for comparison of reported and calculated p-values
cat_all_diff_calc_rep_ft = flextable of comparison of reported and calculated p-values
cat_all_diff_calc_rep_data = data frame used to make flextable
cat_all_diff_thresh_ft = flextable of comparison of reported and calculated p-values when only threshold given
cat_all_diff_thresh_data = data frame used to make flextable for p-value thresholds
if comparing categorical variables used
cat_all_graph = plot of observed to expected numbers and differences between groups, top panels are the absolute numbers, bottom panels are the differences between trial arms in two arm studies
cat_all_graph_pc = plot of observed to expected numbers expressed as percentages and differences between groups, top panels are the percentages, bottom panels are the differences between trial arms in two arm studies
cat_all_data_abs = data frame of data for absolute numbers
cat_all_data_df = data frame of data for difference between groups in two arm studies
cat_all_dataset_abs = data frame of dataset used for all trials
cat_all_dataset_df = data frame of dataset used for two arm trials
cat_all_all_graphs list containing
abs = plot for absolute numbers only
df = plot for difference between groups in two arm studies only
pc = plot for percentages only
all_pc = composite plot of percentages and absolute numbers
individual_graphs list of 6 individual plots making up composite figures
# load example data cat_all_data <- load_clean(import= "no", file.cat = "SI_cat_all", cat_all= "yes", format.cat = "wide")$cat_all_data # run function comparing p-values only (takes only a few seconds) cat_all_fn (comp.pvals = "yes")$cat_all_diff_calc_rep_ft # run function comparing distribution of binomial variables only # to speed example up limit to 12 2-arm trials with 20 variables # (takes close to 5 secs) cat_all_data <- cat_all_data [1:41, c(1:8,10:11,13:15)] cat_all_fn (binom = "yes", two_levels = "yes", del.disparate = "yes", excl.level = "yes", seed = 10)$cat_all_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cat_all_data <- load_clean(import= "yes", cat_all = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all", range.name.cat = "A:N", format.cat = "wide")$cat_all_data
# load example data cat_all_data <- load_clean(import= "no", file.cat = "SI_cat_all", cat_all= "yes", format.cat = "wide")$cat_all_data # run function comparing p-values only (takes only a few seconds) cat_all_fn (comp.pvals = "yes")$cat_all_diff_calc_rep_ft # run function comparing distribution of binomial variables only # to speed example up limit to 12 2-arm trials with 20 variables # (takes close to 5 secs) cat_all_data <- cat_all_data [1:41, c(1:8,10:11,13:15)] cat_all_fn (binom = "yes", two_levels = "yes", del.disparate = "yes", excl.level = "yes", seed = 10)$cat_all_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cat_all_data <- load_clean(import= "yes", cat_all = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all", range.name.cat = "A:N", format.cat = "wide")$cat_all_data
Creates plots of observed to expected numbers and ratios for the specified binomial variable
cat_fn( df = cat_data, x_title = "", prefix = "", del.disparate = "yes", title = "", verbose = TRUE )
cat_fn( df = cat_data, x_title = "", prefix = "", del.disparate = "yes", title = "", verbose = TRUE )
df |
data frame generated from load_clean function |
x_title |
name of the variable for use on the x-axis |
prefix |
letter for variable columns in data frame |
del.disparate |
if yes, data in which the absolute difference between group sizes is >20% are deleted |
title |
title name for plots (optional) |
verbose |
TRUE or FALSE indicates whether to print plot |
An example is for trial withdrawls in Bolland 2021
Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. Participant withdrawals were unusually distributed in randomized trials with integrity concerns: a statistical investigation. J Clin Epidemiol 2021;131:22-29.
Returns a list containing 4 objects and (if verbose = TRUE) prints the plot cat_graph
list containing 4 objects as described
cat_graph = plot of observed to expected numbers and differences between groups, top panels are the absolute numbers, bottom panels are the differences between trial arms in two arm studies
cat_data_abs = data frame of data for absolute numbers
cat_data_df = data frame of data for difference between groups in two arm studies
cat_all_graphs = list containing
abs = plot for absolute numbers only
df = plot for difference between groups in two arm studies only
individual_graphs list of 4 individual plots making up composite figures
# load example data cat_data <- load_clean(import= "no", file.cat = "SI_cat", cat= "yes", format.cat = "wide", cat.names = c("n", "w"))$cat_data # run function (takes only a few seconds) cat_fn(x_title= "withdrawals", prefix="w", del.disparate = "yes")$cat_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cat_data <- load_clean(import= "yes", cat = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat", range.name.cat = "A:G", cat.names = c("n", "w"), format.cat = "wide")$cat_data
# load example data cat_data <- load_clean(import= "no", file.cat = "SI_cat", cat= "yes", format.cat = "wide", cat.names = c("n", "w"))$cat_data # run function (takes only a few seconds) cat_fn(x_title= "withdrawals", prefix="w", del.disparate = "yes")$cat_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cat_data <- load_clean(import= "yes", cat = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat", range.name.cat = "A:G", cat.names = c("n", "w"), format.cat = "wide")$cat_data
Creates flextable of probability of matching mean, SD, and mean and SD for each variable in different cohorts in the
specified number of simulations
cohort_fn( df = cohort_data, seed = 0, sims = -1, n_vars = 10, popn = "", title = "", verbose = TRUE )
cohort_fn( df = cohort_data, seed = 0, sims = -1, n_vars = 10, popn = "", title = "", verbose = TRUE )
df |
data frame generated from load_clean function |
seed |
the seed to use for random number generation, default 0 = current date and time. Specify seed to make repeatable. |
sims |
number of simulations, default -1 = function selects based on number of variables and sample size. |
n_vars |
restrict analyses to variables in at least (>=) this number of cohorts, default = 10 (ie variable has mean in 10 or more cohorts). |
popn |
if dataset contains studies in different sub-populations, code this in cohort_data$population and studies are subsetted if match in this variable. 'All' overrides this and uses all data regardless of information in this variable. |
title |
title name for plots (optional) |
verbose |
TRUE or FALSE indicates whether progress bar and comments show and flextable is printed |
Reference data is from Bolland 2021
Bolland MJ, Gamble GD, Avenell A, Grey A. Identical summary statistics were uncommon in randomized trials and cohort studies. J Clin Epidemiol 2021;136:180-188.
Returns a list containing 6 objects and (if verbose = TRUE) prints the flextable cohort_ft
list containing 6 objects as described
cohort_ft = flextable of results
cohort_graph = plot of observed to expected numbers of matches per cohort for mean; SD; and mean and SD
all_graphs = list containing
all_graphs = all plots on single plot
both_graphs = list of 3 plots row by row used to form all_graphs
individual_graphs= list of 6 individual plots used to form all_graphs
cohort_cohort_data = data frame used to generate results data
cohort_prob_data = data frame used to make flextable
cohort_oe_data= data frame used to make observed to expected plots
# load example data cohort_data <- load_clean(import= "no", file.cont = "SI_cohort", cohort= "yes", format.cont = "long")$cohort_data # run function (takes close to 5 seconds) cohort_fn(seed=10, sims = 100)$cohort_ft # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cohort_data <- load_clean(import= "yes", cohort = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_cohort", range.name.cont = "A1:F101", format.cont = "long")$cohort_data
# load example data cohort_data <- load_clean(import= "no", file.cont = "SI_cohort", cohort= "yes", format.cont = "long")$cohort_data # run function (takes close to 5 seconds) cohort_fn(seed=10, sims = 100)$cohort_ft # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data cohort_data <- load_clean(import= "yes", cohort = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_cohort", range.name.cont = "A1:F101", format.cont = "long")$cohort_data
Creates graph of proportion of final digits for summary statistics of specified variables
final_digit_fn( df = generic_data, vars = "", dec.pl = "no", dec.pl.vars = "", title = "", verbose = TRUE )
final_digit_fn( df = generic_data, vars = "", dec.pl = "no", dec.pl.vars = "", title = "", verbose = TRUE )
df |
data frame generated from load_clean function |
vars |
vector of the summary statistics to be used |
dec.pl |
"yes" or "no" indicator whether columns for decimal places are included (yes) or should be calculated (no) |
dec.pl.vars |
vector of the names of the columns for decimal places for each statistics |
title |
title name for plots (optional) |
verbose |
TRUE or FALSE indicates whether print plot |
This approach is still in development and needs validation and discussion about its place in integrity assessment.
Requires data frame containing columns for study, variable (named var), summary statistic(s) (named with single letter eg m or s),
and optional columns for decimal places for each statistic (named dp_* eg dp_m, dp_s). Data can be imported using the generic option of load_clean function
Returns a list containing 5 objects and prints the plot digit_graph
list containing 5 objects as described
digit_graph = plot of proportions of final digits
digit_ft = flextable of results
digit_table = data frame of results
digit_dataset = data frame of data set used to generate results data
digit_data = results of analyses used to generate results data
# load example data generic_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", generic= "yes", gen.vars.del = c("p"), format.cont = "wide")$generic_data # run function (takes only a few seconds) final_digit_fn(vars = c("m","s"), dec.pl = "n")$digit_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data generic_data <- load_clean(import= "yes", generic = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", gen.vars.del = c("p"), format.cont = "wide")$generic_data
# load example data generic_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", generic= "yes", gen.vars.del = c("p"), format.cont = "wide")$generic_data # run function (takes only a few seconds) final_digit_fn(vars = c("m","s"), dec.pl = "n")$digit_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data generic_data <- load_clean(import= "yes", generic = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", gen.vars.del = c("p"), format.cont = "wide")$generic_data
Function loads and cleans data for the nine functions
load_clean( import = "yes", file.cont = "", file.cat = "", dir = "", file.name = "", pval_cont = "no", match = "no", cohort = "no", anova = "no", dir.cont = "", file.name.cont = "", sheet.name.cont = "Sheet1", range.name.cont = "", format.cont = "wide", cat = "no", sr = "no", cat_all = "no", pval_cat = "no", cat.names = c("n"), dir.cat = "", file.name.cat = "", sheet.name.cat = "Sheet1", range.name.cat = "", format.cat = "wide", generic = "", gen.vars.keep = "", gen.vars.del = "", verbose = TRUE )
load_clean( import = "yes", file.cont = "", file.cat = "", dir = "", file.name = "", pval_cont = "no", match = "no", cohort = "no", anova = "no", dir.cont = "", file.name.cont = "", sheet.name.cont = "Sheet1", range.name.cont = "", format.cont = "wide", cat = "no", sr = "no", cat_all = "no", pval_cat = "no", cat.names = c("n"), dir.cat = "", file.name.cat = "", sheet.name.cat = "Sheet1", range.name.cat = "", format.cat = "wide", generic = "", gen.vars.keep = "", gen.vars.del = "", verbose = TRUE )
import |
'yes' indicates import excel file. 'no' indicates takes dataset already loaded into R as data frame |
file.cont |
If import = 'no', name of data frame containing continuous data |
file.cat |
If import = 'no', name of data frame containing categorical data |
dir |
If import = 'yes', path to location of excel file for continuous and categorical data |
file.name |
If import = 'yes', file name of excel file containing continuous and categorical data |
pval_cont |
'yes'/'no' indicating if data will be used for pval_cont_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
match |
'yes'/'no' indicating if data will be used for match_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
cohort |
'yes'/'no' indicating if data will be used for cohort_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
anova |
'yes'/'no' indicating if data will be used for anova_fn. Only data for 1 continuous data function can be loaded with each run of this function. |
dir.cont |
If import = 'yes', path to location of excel file for continuous data |
file.name.cont |
If import = 'yes', file name of excel file containing continuous data |
sheet.name.cont |
Sheet name containing continuous data |
range.name.cont |
Range of cells containing continuous data. Can be in format 'a1:b20' or 'a:b' |
format.cont |
'wide'/'long' indicating continuous data is in wide or long format |
cat |
'yes'/'no' indicating if data will be used for cat_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
sr |
'yes'/'no' indicating if data will be used for sr_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
cat_all |
'yes'/'no' indicating if data will be used for cat_all_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
pval_cat |
'yes'/'no' indicating if data will be used for cat_all_fn. Only data for 1 categorical data function can be loaded with each run of this function. |
cat.names |
names of variables to be used in cat_fn and sr_fn |
dir.cat |
If import = 'yes', path to location of excel file for categorical data |
file.name.cat |
If import = 'yes', file name of excel file containing categorical data |
sheet.name.cat |
Sheet name containing categorical data |
range.name.cat |
Range of cells containing categorical. Can be in format 'a1:b20' or 'a:b' |
format.cat |
'wide'/'long' indicating categorical data is in wide or long format |
generic |
'yes'/'no' indicating if data to be loaded for generic use |
gen.vars.keep |
Vector of variables in data to keep |
gen.vars.del |
Vector of variables in data to delete |
verbose |
TRUE/FALSE TRUE indicates comments will be printed during loading |
Function can load continuous or categorical data.
Continuous data can be used for comparison of baseline p-values (pval_cont_fn),
matching summary stats within a trial (match_fn), matching summary stats in different cohorts (cohort_fn),
or comparing means of baseline p-values (anova_fn).
Categorical data can be used for comparisons of observed with expected distributions for single variable (cat_fn),
for group numbers in trials using simple randomisation (sr_fn), for all variables (cat_all_fn), and for comparison
of baseline p-values (pval_cat_fn).
There is one function in development that allows assessment of proportion of final digits in summary statistics (final_digit_fn). This function works using summary statistics but could be adapted to use on raw continuous or categorical data.
Only 1 continuous and/or 1 categorical data set allowed per load to avoid clashes
Data can be imported from a file (import = "yes") or taken from an existing data frame, import = "no"
If loading from an existing data use file.cont and file.cat
If loading from common directory or file, can use dir and file.name rather than more specific dir.cont, dir.cat, file.name.cont, or file.name.cat.
Comments about each indicator:
pval_cont
loads continuous data for pval_cont_fn, outputs as list of 1 containing named data frame pval_cont_data.
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation,
p = baseline p value (can omit if not reported)
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3..., p
long: study, var, group, m , s, n , p
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
match
loads continuous data for match_fn, outputs as list of 1 containing named data frame match_data
remainder is same as for pval_cont above.
only difference between pval_cont and match is that match allows for missing mean or SD whereas pval_cont does not
format should be study, variable or var, n, m, s. Can be in any order. n = sample size, m = mean, s = standard deviation
can be in wide or long format
wide: study, var, n1, n2, m1, m2, s1, s2, p
long: study, var, group, m , s, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
cohort
loads continuous data for cohort_fn, outputs as list of 1 containing named data frame cohort_data
same as pval_cont but allows a lookup variable for variable names
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3...
long: study, var, group, m , s, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
lookup table is var_name_final, var_name_orig and allows you to specify a list of all variables names (var_name_orig)
from all studies and a lookup table of standardised names (var_name_final) allowing different names in different studies to
be standardised
has optional variable 'population' which can be used to subset the data if trials in different populations are reported
anova
loads continuous data for anova_fn, outputs as list of 1 containing named data frame anova_data
same as for pval_cont above but allows for optional value for decimal place
format should be study, variable or var, n, m, s, p. Can be in any order. n = sample size, m = mean, s = standard deviation,
d= decimal place of mean (if omitted, this is calculated automatically in anova_fn)
can be in wide or long format
wide: study, var, n1, n2, n3 ..., m1, m2, m3 ... s1, s2, s3..., d
long: study, var, group, m , s, n , d
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
cat
loads categorical data for cat_fn, outputs as list of 1 containing named data frame cat_data
format should be study, n, v. Can be in any order, n= group size, v= number with characteristic
can be in wide or long format
wide: study, n1, n2, n3 ..., v1, v2, v3...
long: study, group, n, v
group or g or grp required for long format
use cat.names to name variable eg c("n", "v") , c("n", "g") ...
separators (eg n1 n_1 n.1) are stripped and replaced
sr
loads categorial data for sr_fn, outputs as list of 1 containing named data frame sr_data
as for cat but only requires study and n
format should be study, n. n= group size
can be in wide or long format
wide: study, n1, n2, n3 ...
long: study, group, n
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
cat_all
loads categorical data for cat_all_fn, outputs as list of 1 containing named data frame cat_all_data
format should be study, var or variable, n, N, level, stat, recode, p. Can be in any order, n = number with characteristic, N = group size,
p = baseline p value (can omit if not reported), can use "ns" for not significant or "<" or ">" to indicate threshold (eg "<0.05")
optional level - number for level of variable (eg y/n =1,2; high/med/low =1,2,3)
optional recode- for variables with >2 levels to tell how to recode into 2 groups
optional stat: statistical test used for p-value : chisq - Chisquare, chisqc- Chisquare with correction,
fisher- Fisher's exact, midp - midp -calculated using two different methods, lr- likelihood ratio,
mh - Mantel-Haenszel test
can be in wide or long format
wide study, var, n1, n2, n3, ... N1, N2, N3... p, stat, level, recode
long study, var, group, n, N, p, stat, level, recode
group or g or grp required for long format
if variable has 2 levels, only 1 required, other will be calculated.
separators (eg n1 n_1 n.1) are stripped and replaced
pval_cat
loads categorical data for pval_cat_fn, outputs as list of 1 containing named data frame pval_cat_data
as for cat_all but recode variable is not generated
format should be study, var or variable, n, N, p. Can be in any order, n = number with characteristic, N = group size,
p = baseline p value (can omit if not reported), can use "ns" for not significant or "<" or ">" to indicate threshold (eg "<0.05")
optional level - number for level of variable (eg y/n =1,2; high/med/low =1,2,3)
optional stat: statistical test used for p-value : chisq - Chisquare, fisher- Fisher's exact
can be in wide or long format
wide study, var, n1, n2, n3, ... N1, N2, N3... p, stat, level
long study, var, group, n, N, p, stat, level
group or g or grp required for long format
if variable has 2 levels, only 1 required, other will be calculated.
separators (eg n1 n_1 n.1) are stripped and replaced
generic
loads data for use generic use, outputs as list of 1 containing named data frame generic_data
use cont suffixes for file details: dir.cont (or dir), file.name.cont (or file.name), sheet.name,cont, range.name.cont)
format should be study, var or variable, variable names
optional gen.vars.keep = vector of variables to keep
optional gen.vars.del = vector of variables to delete
can be in wide or long format
wide study, var, a1, a2..., b1, b2 ...
long study, var, group, a, b, ....
group or g or grp required for long format
separators (eg n1 n_1 n.1) are stripped and replaced
no data checking or other transformations take place
list containing a named data frame containing data in suitable format for appropriate function as described in Details
# examples of loading data for each function are given in the individual functions. # Here is one- for pval_cont_fn(): pval_cont_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", pval_cont= "yes", format.cont = "wide")$pval_cont_data # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cont_data <- load_clean(import= "yes", pval_cont = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", format.cont = "wide")$pval_cont_data
# examples of loading data for each function are given in the individual functions. # Here is one- for pval_cont_fn(): pval_cont_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", pval_cont= "yes", format.cont = "wide")$pval_cont_data # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cont_data <- load_clean(import= "yes", pval_cont = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", format.cont = "wide")$pval_cont_data
Creates flextable of matching summary statistics by significant figures with Reference data
match_fn(df = match_data, verbose = TRUE)
match_fn(df = match_data, verbose = TRUE)
df |
data frame generated from load_clean function |
verbose |
TRUE or FALSE indicates whether to print flextable |
Reference data is from Bolland 2021
Bolland MJ, Gamble GD, Avenell A, Grey A. Identical summary statistics were uncommon in randomized trials and cohort studies. J Clin Epidemiol 2021;136:180-188.
Returns a list containing 6 objects and (if verbose = TRUE ) prints the flextable match_ft_all
list containing 6 objects as described
match_ft_all = flextable of matches with reference data
match_ft = flextable of matches (no reference data)
ref_match_ft = flextable of reference data
match_match_data = data frame of results used in calculations
match_table = data frame of matches used to make flextable
ref_table = data frame of reference data used to make flextable
# load example data match_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", match= "yes", format.cont = "wide")$match_data # run function (takes only a few seconds) match_fn()$match_ft_all # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data match_data <- load_clean(import= "yes", match = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A:O", format.cont = "wide")$match_data
# load example data match_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", match= "yes", format.cont = "wide")$match_data # run function (takes only a few seconds) match_fn()$match_ft_all # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data match_data <- load_clean(import= "yes", match = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A:O", format.cont = "wide")$match_data
Creates plots of calculated p-value distribution and AUC (area under curve)
pval_cat_fn( df = pval_cat_data, seed = 0, sims = -1, btsp = 500, title = "", stat = "chi_midp", stat.override = "no", fisher.sim = "y", fish.n.sims = 10000, method = "mix", verbose = TRUE )
pval_cat_fn( df = pval_cat_data, seed = 0, sims = -1, btsp = 500, title = "", stat = "chi_midp", stat.override = "no", fisher.sim = "y", fish.n.sims = 10000, method = "mix", verbose = TRUE )
df |
data frame generated from load_clean function |
seed |
the seed to use for random number generation, default 0 = current date and time. Specify seed to make repeatable. |
sims |
number of simulations, default -1 = function selects based on number of variables. |
btsp |
number of bootstrap repeats used to generate 95% confidence interval around AUC |
title |
optional title for plots |
stat |
statistical test to be used 'chisq', 'fisher', 'midp' or 'midp.epitools' (from epitools package), 'midp.sas' (as calculated in SAS), or combinations -if chisq is not appropriate because expected cells<5, use second test: 'chi_fish', 'chi_midp' or 'chi_midp.epi','chi_midp.sas' |
stat.override |
if 'yes' then test specified in stat will be used rather than values for stat in data frame |
fisher.sim |
"yes" or "no" indicator whether to allow fisher test to simulate p-values for >2*2 tables |
fish.n.sims |
number of simulations to use in Fisher test, default 10,000 |
method |
'sm', 'mix', or 'ind'. 'ind' does test on individual data, 'sm' summarises data and then does test on summary data, 'mix' does 'ind' for fisher and 'sm' for others. Duration varies with size of studies, test, and number of simulations. Experiment before running large simulations. |
verbose |
TRUE or FALSE indicates whether progress bar and comments show and prints plot |
See also Bolland MJ, Gamble GD, Avenell A, Grey A, Lumley T. Baseline P value distributions in randomized trials were uniform for continuous but not categorical variables. J Clin Epidemiol 2019;112:67-76.
Returns a list containing 3 objects and (if verbose = TRUE) prints the plot pval_cat_calculated_pvalues
list containing 3 objects as described
pval_cat_calculated_pvalues = plots of calculated p-value distribution and AUC
pval_cat_reported_pvalues = plots of reported p-value distribution and AUC (if p-values were reported)
all_results = list containing
pval_cat_baseline_pvalues_data = data frame of all results used in calculations
pval_cat_reported_pvalues= plot of reported p-value distribution
pval_cat_auc_reported_pvalues = AUC of reported p-values
pval_cat_calculated_pvalues = plot of calculated p-value distribution
pval_cat_auc_calculated_pvalues= AUC of calculated p-values
# load example data pval_cat_data <- load_clean(import= "no", file.cat = "SI_cat_all", pval_cat= "yes", format.cont = "wide")$pval_cat_data # run function (takes a few seconds) pval_cat_fn(seed=10, sims = 50, btsp = 100)$pval_cat_calculated_pvalues # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cat_data <- load_clean(import= "yes", pval_cat = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all", range.name.cat = "A:n", format.cat = "wide")$pval_cat_data
# load example data pval_cat_data <- load_clean(import= "no", file.cat = "SI_cat_all", pval_cat= "yes", format.cont = "wide")$pval_cat_data # run function (takes a few seconds) pval_cat_fn(seed=10, sims = 50, btsp = 100)$pval_cat_calculated_pvalues # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cat_data <- load_clean(import= "yes", pval_cat = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat_all", range.name.cat = "A:n", format.cat = "wide")$pval_cat_data
Creates plots of calculated p-value distribution and AUC (area under curve)
pval_cont_fn(df = pval_cont_data, btsp = 500, title = "", verbose = TRUE)
pval_cont_fn(df = pval_cont_data, btsp = 500, title = "", verbose = TRUE)
df |
data frame generated from load_clean function |
btsp |
number of bootstrap repeats used to generate 95% confidence interval around AUC |
title |
optional title for plots |
verbose |
TRUE or FALSE indicates whether progress bar and comments show and prints plot |
Reference data is from (Carlisle 2017, Bolland 2021)
Carlisle JB . Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 2017;72:944–52 .2017
Bolland MJ, Gamble GD, Grey A, Avenell A. Empirically generated reference proportions for baseline p values from rounded summary statistics. Anaesthesia 2020;75:1685-1687.
See also Bolland MJ, Gamble GD, Avenell A, Grey A, Lumley T. Baseline P value distributions in randomized trials were uniform for continuous but not categorical variables. J Clin Epidemiol 2019;112:67-76.
and Bolland MJ, Gamble GD, Avenell A, Grey A. Rounding, but not randomization method, non-normality, or correlation, affected baseline P-value distributions in randomized trials. J Clin Epidemiol 2019;110:50-62.
Returns a list containing 4 objects and (if verbose = TRUE) prints the plot pval_cont_calculated_pvalues
list containing 4 objects as described
pval_cont_calculated_pvalues = plots of calculated p-value distribution and AUC
pval_cont_reported_pvalues = plots of reported p-value distribution and AUC (if p-values were reported)
pval_cont_ft_diff_calc_rep_p = flextable of distribution of differences in calculated and reported results
all_results = list containing
pval_cont_baseline_pvalues_data = data frame of all results used in calculations
pval_cont_diff_calc_rep_p = data frame of differences between calculated and reported p-values
pval_cont_reported_pvalues= plot of reported p-value distribution
pval_cont_auc_reported_pvalues = AUC of reported p-values
pval_cont_calculated_pvalues = plot of calculated p-value distribution
pval_cont_auc_calculated_pvalues= AUC of calculated p-values
# load example data pval_cont_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", pval_cont= "yes", format.cont = "wide")$pval_cont_data # run function (takes only a few seconds) pval_cont_fn(btsp=100)$pval_cont_calculated_pvalues # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cont_data <- load_clean(import= "yes", pval_cont = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", format.cont = "wide")$pval_cont_data
# load example data pval_cont_data <- load_clean(import= "no", file.cont = "SI_pvals_cont", pval_cont= "yes", format.cont = "wide")$pval_cont_data # run function (takes only a few seconds) pval_cont_fn(btsp=100)$pval_cont_calculated_pvalues # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data pval_cont_data <- load_clean(import= "yes", pval_cont = "yes", dir = path, file.name.cont = "reappraised_examples.xlsx", sheet.name.cont = "SI_pvals_cont", range.name.cont = "A1:O51", format.cont = "wide")$pval_cont_data
Sample from Sato/Iwamoto dataset of 35 studies with withdrawal data
see Bolland MJ, Gamble GD, Avenell A, Cooper DJ, Grey A. Participant withdrawals were unusually distributed in randomized trials with integrity concerns: a statistical investigation. J Clin Epidemiol 2021;131:22-29.
SI_cat
SI_cat
A data frame with 20 rows and 8 variables:
study ID
number of participants in each group
number of withdrawals in each group
Sample from Sato/Iwamoto dataset of 31 studies with categorical data
SI_cat_all
SI_cat_all
A data frame with 106 rows and 14 variables:
study ID
variable
level of variable
value to recode level two if collapsing variable to two levels
total number of levels for each variable
number of trial arms
number of participants with characteristic in each group
number of participants in each group
reported p-value
reported statistical test used to calculate p-value
Sample from Sato/Iwamoto dataset of 226 baseline variables in 34 cohorts
see Bolland MJ, Gamble GD, Avenell A, Grey A. Identical summary statistics were uncommon in randomized trials and cohort studies. J Clin Epidemiol 2021;136:180-188.
SI_cohort
SI_cohort
A data frame with 100 rows and 6 variables:
study ID
variable name
number of participants in group
mean of variable
sd of variable
number of group
Sample from Sato/Iwamoto dataset of 500 baseline variables in 41 trials
for details see Bolland MJ, Avenell A, Gamble GD, Grey A. Systematic review and statistical analysis of the integrity of 33 randomized controlled trials. Neurology 2016;87:2391-2402.
SI_pvals_cont
SI_pvals_cont
A data frame with 50 rows and 15 variables:
study ID
variable name
size of group 1, 2, 3, 4
mean of variable for group 1, 2, 3, 4
sd of variable for group 1, 2, 3, 4
reported p-value
Creates plot of observed to expected numbers and ratios for differences in numbers of participants between trial groups
sr_fn( df = sr_data, br = "no", block = data.frame(study = "", fbsz = "", n_fb = "", df = ""), title = "", verbose = TRUE )
sr_fn( df = sr_data, br = "no", block = data.frame(study = "", fbsz = "", n_fb = "", df = ""), title = "", verbose = TRUE )
df |
data frame generated from load_clean function |
br |
block randomisation: 'yes' or 'no'. If 'no' runs function as if all trials used simple randomisation. If 'yes' performs simple calculations as if block randomised |
block |
an additional option for studies using block randomisation. block is a data frame containing columns named study (for study id); fbsz (for the final block size); n_fb (for number of participants in the final block); df (the difference between groups) |
title |
title name for plots (optional) |
verbose |
TRUE or FALSE indicates whether progress bar and comments in block randomisation function show and whether to print plot |
An example is for Sato and Iwamoto trials in Bolland 2016
Bolland MJ, Avenell A, Gamble GD, Grey A. Systematic review and statistical analysis of the integrity of 33 randomized controlled trials. Neurology 2016;87:2391-2402.
Returns a list containing 4 objects and (if verbose = TRUE) prints the plot sr_graph
list containing 4 objects as described
sr_graph = plot of observed to expected numbers for differences between numbers of participants in trial groups
sr_graph = plot of observed to expected numbers and ratios for differences between numbers of participants in trial groups
sr_individual_graphs = list containing 2 plots making up composite figure
sr_data = data frame containing data for plots
# load example data sr_data <- load_clean(import= "no", file.cat = "SI_cat", sr= "yes", format.cat = "wide")$sr_data # run function (takes only a few seconds) sr_fn()$sr_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data sr_data <- load_clean(import= "yes", sr = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat", range.name.cat = "A:D", format.cat = "wide")$sr_data # function has an additional option for block randomisation. # If studies are block randomised and the final block size is known, # the number of participants in the final block can be determined. # The distribution of differences between groups for the final block # can be compared to the expected distribution # # Few studies provide all these details so it seems unlikely this function # would get used often # Example takes only a few seconds to run sr_fn(br = "yes", block = data.frame(study = c(1,2,3,4,5,6,7,8,9,10), fb_sz= c(2,4,6,8,10,12,8,8,6,14), n_fb = c(1,1,4,5,7,8,4,6,2,10), df=c(1,1,0,1,3,4,2,2,0,0)))$sr_graph
# load example data sr_data <- load_clean(import= "no", file.cat = "SI_cat", sr= "yes", format.cat = "wide")$sr_data # run function (takes only a few seconds) sr_fn()$sr_graph # to import an excel spreadsheet (modify using local path, # file and sheet name, range, and format): # get path for example files path <- system.file("extdata", "reappraised_examples.xlsx", package = "reappraised", mustWork = TRUE) # delete file name from path path <- sub("/[^/]+$", "", path) # load data sr_data <- load_clean(import= "yes", sr = "yes", dir = path, file.name.cat = "reappraised_examples.xlsx", sheet.name.cat = "SI_cat", range.name.cat = "A:D", format.cat = "wide")$sr_data # function has an additional option for block randomisation. # If studies are block randomised and the final block size is known, # the number of participants in the final block can be determined. # The distribution of differences between groups for the final block # can be compared to the expected distribution # # Few studies provide all these details so it seems unlikely this function # would get used often # Example takes only a few seconds to run sr_fn(br = "yes", block = data.frame(study = c(1,2,3,4,5,6,7,8,9,10), fb_sz= c(2,4,6,8,10,12,8,8,6,14), n_fb = c(1,1,4,5,7,8,4,6,2,10), df=c(1,1,0,1,3,4,2,2,0,0)))$sr_graph