| Title: | Design and Modeling for Repeated Measures Studies |
|---|---|
| Description: | Provides complete functionality to analyse data from repeated measures experiments with hierarchical or crossed experimental designs. Supports testing modeling assumptions, identifying outlier observations and experimental units, estimating statistical power, and performing sample size calculations. Uses linear mixed effects models via 'lme4' and simulation-based power analysis via 'simr'. Handles both normal and non-normal error distributions including binomial and Poisson families. For more details see Shin et al. (2022) <doi:10.1101/2022.07.18.500490>, Bates et al. (2015) <doi:10.18637/jss.v067.i01>, Green and MacLeod (2016) <doi:10.1111/2041-210X.12504>, Hartig (2024) <doi:10.32614/CRAN.package.DHARMa>, Nieuwenhuis et al. (2012) <doi:10.32614/RJ-2012-011>, Millard (2013) <doi:10.1007/978-1-4614-8456-1> and Kuznetsova et al. (2017) <doi:10.18637/jss.v082.i13>. |
| Authors: | Min-Gyoung Shin [aut], Reuben Thomas [aut, cre] |
| Maintainer: | Reuben Thomas <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.1 |
| Built: | 2026-05-07 18:31:09 UTC |
| Source: | https://github.com/cran/RMeDPower2 |
This function estimates statistical power given the data, its underlying design, and the assumed probability model for the error distribution.
calculatePower(data, design, model, power_param)

| data | Input data frame with columns having all the necessary information regarding the dependent and independent variables of interest |
|---|---|
| design | an object of class RMeDesign with the necessary design information about the data |
| model | an object of class ProbabilityModel giving the error distribution of the data |
| power_param | an object of class PowerParams giving the target parameter of interest and the other parameters needed to perform the power estimation |

A power curve as a ggplot object or a power calculation result printed in a text file
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
data <- plate_assay_pilot_data
design <- readDesign(file.path(template_dir, "design_cell_assay.json"))
model <- readProbabilityModel(file.path(template_dir, "prob_model.json"))
power_param <- readPowerParams(file.path(template_dir, "power_param.json"))
power_res <- calculatePower(data, design, model, power_param)
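
To make the idea of simulation-based power estimation concrete, the following is a minimal conceptual sketch in base R. It is not how calculatePower is implemented (the package relies on 'lme4' and 'simr'); the function simulate_power, its arguments, and all numeric values are hypothetical and chosen only for illustration. It simulates experiments with a shared random intercept, tests a case/control difference on experiment-level means, and reports the fraction of simulations in which the effect is detected.

```r
# Conceptual sketch only: RMeDPower2 itself uses 'lme4' and 'simr'.
set.seed(42)

# Estimate power for a within-experiment case/control comparison.
# All names and default values here are illustrative assumptions.
simulate_power <- function(n_experiments, n_per_group, effect_size,
                           sd_residual = 1, alpha = 0.05, nsim = 500) {
  rejected <- replicate(nsim, {
    # Each experiment contributes a shared random intercept ...
    experiment_effect <- rnorm(n_experiments)
    # ... and one mean per condition, averaged over n_per_group replicates.
    se_mean <- sd_residual / sqrt(n_per_group)
    control <- experiment_effect + rnorm(n_experiments, sd = se_mean)
    case <- experiment_effect + effect_size + rnorm(n_experiments, sd = se_mean)
    # A paired t-test on experiment-level means accounts for the shared intercept.
    t.test(case, control, paired = TRUE)$p.value < alpha
  })
  mean(rejected)  # proportion of simulations that detect the effect
}

# Power over a range of numbers of experiments (a simple power curve).
n_grid <- c(3, 6, 9, 12)
power_curve <- vapply(n_grid, simulate_power, numeric(1),
                      n_per_group = 4, effect_size = 1)
names(power_curve) <- n_grid
print(round(power_curve, 2))
```

Increasing the number of experiments raises the estimated power in this toy setting, which is the kind of trade-off a power curve is meant to expose.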
This function generates diagnostic QC plots for the given model assumptions about the input data and identifies potential outlier observations and/or outlier experimental units.
diagnoseDataModel(data, design, model)

| data | Input data frame with columns having all the necessary information regarding the dependent and independent variables of interest |
|---|---|
| design | an object of class RMeDesign with the necessary design information about the data |
| model | an object of class ProbabilityModel giving the error distribution of the data |

A list with four elements:
1) models: the names of the models evaluated, based on different modifications of the response column. These include natural_scale; natural_scale_wo_outliers, if outliers were identified; log_scale, if the response column is continuous and the model was evaluated on the log-transformed responses; and log_scale_wo_outliers, if outliers were identified in the log_scale model.
2) Data_updated: the updated data frame, with additional columns for the modified response column corresponding to each of the models evaluated.
3) cooks_result: Cook's distance for each of the experimental columns for each of the models evaluated. For models based on the binomial probability distribution, Cook's distance is reported only for the first experimental column, on account of the increased computation time needed to evaluate this metric for the other experimental columns.
4) plots_info: a list with two elements, plots and captions. plots is a named list and captions is a character vector, both of the same length as the number of models evaluated. Each element of plots is itself a list of QC/diagnostic plots for the corresponding model fit, and captions holds the captions for the QC plots.
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
data <- plate_assay_pilot_data
design <- readDesign(file.path(template_dir, "design_cell_assay.json"))
model <- readProbabilityModel(file.path(template_dir, "prob_model.json"))
diagnose_res <- diagnoseDataModel(data, design, model)
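
The returned list can be summarised with a small helper. The sketch below assumes only the structure described above (a plots_info element holding a named list plots and a character vector captions of the same length); summarise_diagnostics is a hypothetical helper, not a package function, and mock_res is a hand-built stand-in for a real result so that the example runs without the package.

```r
# Hypothetical helper: one row per evaluated model, with the number of
# QC plots and the caption stored for it.
summarise_diagnostics <- function(res) {
  plots <- res$plots_info$plots
  data.frame(
    model = names(plots),
    n_plots = lengths(plots),
    caption = res$plots_info$captions,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Mock object mirroring the documented structure (contents are placeholders).
mock_res <- list(
  models = c("natural_scale", "log_scale"),
  Data_updated = data.frame(),
  cooks_result = list(),
  plots_info = list(
    plots = list(natural_scale = list("plot 1", "plot 2"),
                 log_scale = list("plot 1")),
    captions = c("Caption for natural_scale", "Caption for log_scale")
  )
)

diagnostic_summary <- summarise_diagnostics(mock_res)
print(diagnostic_summary)
```

With a real result, the same call would be summarise_diagnostics(diagnose_res), provided the returned object matches the structure described above.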
This function estimates the quantities of interest and visualizes the resulting association between the response and the condition.
getEstimatesOfInterest(data, design, model, print_plots = TRUE)

| data | Input data frame with columns having all the necessary information regarding the dependent and independent variables of interest |
|---|---|
| design | an object of class RMeDesign with the necessary design information about the data |
| model | an object of class ProbabilityModel giving the error distribution of the data |
| print_plots | Whether to print the plots. Irrespective of this argument, a ggplot version of the evaluated association between the response_column and the condition_column is returned. TRUE: print the plot; FALSE: do not print the plot. |

A list with two elements: 1) an object of class summary.merMod and 2) the output of the get_residuals function. This output is itself a list with three elements: 1) the updated input data with an additional column holding the model residuals of the individual observations; 2) a plot representing the purported association between the response column and the condition column; 3) the corresponding caption for this figure.
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
data <- plate_assay_pilot_data
design <- readDesign(file.path(template_dir, "design_cell_assay.json"))
model <- readProbabilityModel(file.path(template_dir, "prob_model.json"))
res <- getEstimatesOfInterest(data, design, model)
Example behavioral dataset containing measurements from a mouse Morris Water Maze (MWM) assay. The data represent repeated measures across trials and subjects and are suitable for illustrating repeated measures power analysis.
data(mouse_behavior_MWM_assay_data)
A data frame of behavioral measurements with information on mouse, trial, and experimental condition.
Example dataset containing electrophysiological measurements from mouse brain recordings. The data are used to demonstrate power analyses in experiments with repeated measurements and complex correlation structures.
data(mouse_brain_electro_physiology_data)
A data frame of electrophysiological measurements and associated experimental annotations.
Full plate assay dataset corresponding to the pilot data but including the complete experimental run. This dataset is used in examples demonstrating power calculations under more realistic sample sizes and hierarchies.
data(plate_assay_full_data)
A data frame containing the full plate assay data.
Column definitions follow those of plate_assay_pilot_data.
See also: plate_assay_pilot_data, plate_assay_pilot_data_wo_repeats
Pilot dataset from plate-based assays used in the RMeDPower2 documentation and examples. The data represent repeated measurements across plates and experimental units and are intended for illustrating experimental design specification, model diagnostics, and power calculations.
data(plate_assay_pilot_data)
A data frame with observations from a pilot plate assay. Column names and structure are documented in the package vignette and example code.
See also: plate_assay_pilot_data_wo_repeats, plate_assay_full_data
Version of plate_assay_pilot_data in which repeated measurements have been removed, suitable for power analyses that assume a single observation per experimental unit at each time point or condition.
data(plate_assay_pilot_data_wo_repeats)
A data frame containing the pilot plate assay data without repeated measurements. See the vignette for details on columns and preprocessing.
See also: plate_assay_pilot_data, plate_assay_full_data
Objects of PowerParams class store information required for sample size estimation for given data

| power_curve | 1: Power simulation over a range of sample sizes or levels. 0: Power calculation for a single sample size or level. |
|---|---|
| nsimn | The number of simulations to run. Default: 1000. |
| target_columns | Names of the experimental parameters to use for the power calculation. |
| levels | 1: Increase the number of levels of the corresponding target parameter. 0: Increase the number of samples drawn from each level of the corresponding target parameter. For example, if target_columns = c("experiment","cell_line") and you want to increase the number of experiments and sample more cells from each cell line, set levels = c(1,0). |
| max_size | Maximum number of levels or sample sizes to test. Default: the current number of levels or the current sample size x 5. For example, if max_size = c(10,5), up to 10 experiments and 5 cell lines are tested. |
| breaks | Levels or sample sizes of the variable to be evaluated along the power curve. Default: max(1, round(the number of current levels / 5)). |
| effect_size | If you know the effect size of your condition variable, it can be provided as a parameter. If the effect size is not provided, it is estimated from your data. |
| alpha | Threshold for Type I error |
| ICC | Intra-class correlation coefficients (ICC) for each parameter |

an object of class PowerParams
power_param <- new("PowerParams")
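
As a worked example of the documented defaults for max_size and breaks, the short base R sketch below applies the two formulas stated above to a hypothetical design. The counts in current_levels are made up for illustration, and the code only reproduces the arithmetic; it does not call the package.

```r
# Hypothetical current design: 3 experiments and 10 cell lines.
current_levels <- c(experiment = 3, cell_line = 10)

# Documented default for max_size: current number of levels (or sample size) x 5.
default_max_size <- current_levels * 5

# Documented default for breaks: max(1, round(current levels / 5)),
# applied element-wise to each target column.
default_breaks <- pmax(round(current_levels / 5), 1)

print(default_max_size)
print(default_breaks)
```

For very small designs the max(1, ...) guard matters: with fewer than three current levels, round(levels / 5) would be 0, and the guard keeps the step at 1.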
Objects of ProbabilityModel class store information on the assumed probability distribution for the model

| error_is_non_normal | Whether the observed variable is categorical. TRUE: categorical; FALSE: continuous (default). Categorical response variables will be implemented in the future. |
|---|---|
| family_p | The type of distribution family to specify when the response is categorical. If family is "binary" then binary(link="log") is used, if family is "poisson" then poisson(link="logit") is used, if family is "poisson_log" then poisson(link="log") is used. |

an object of class ProbabilityModel
model <- new("ProbabilityModel")
This function reads the underlying design for the data from a JSON file.
readDesign(jsonfile)

| jsonfile | the JSON file with the necessary design parameters: condition_column, experimental_columns, response_column, total_column, condition_is_categorical, covariate, method, crossed_columns, error_is_non_normal, family_p, outlier_alpha, na.action |
|---|---|

an object of class RMeDesign
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
design <- readDesign(file.path(template_dir, "design_cell_assay.json"))
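
Before writing a design file of your own, it can help to look at the bundled template. The sketch below uses only base R and the template path from the example above; it prints the raw JSON if the package and file are available and otherwise does nothing, so it makes no assumption about the file's contents.

```r
# Locate the bundled design template (path taken from the example above).
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
design_file <- file.path(template_dir, "design_cell_assay.json")

# Print the raw JSON when the package (and thus the file) is installed.
if (file.exists(design_file)) {
  cat(readLines(design_file, warn = FALSE), sep = "\n")
} else {
  message("Template not found; is RMeDPower2 installed?")
}
```

The same approach applies to the other bundled templates named in the examples, power_param.json and prob_model.json.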
This function reads the parameters for statistical power estimation from a JSON file.
readPowerParams(jsonfile)

| jsonfile | the JSON file with the necessary parameters for statistical power estimation: target_columns, power_curve, nsimn, levels, max_size, alpha, breaks, effect_size, icc |
|---|---|

an object of class PowerParams
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
power_param <- readPowerParams(file.path(template_dir, "power_param.json"))
This function reads the parameters of the probability model from a JSON file.
readProbabilityModel(jsonfile)

| jsonfile | the JSON file with the necessary parameters for the probability model: error_is_non_normal, family_p |
|---|---|

an object of class ProbabilityModel
template_dir <- system.file("input_templates/cell_assay_data", package = "RMeDPower2")
model <- readProbabilityModel(file.path(template_dir, "prob_model.json"))
Objects of RMeDesign class store information on the relevant repeated measures design for the given data

| data | Input data |
|---|---|
| condition_column | Name of the condition variable (e.g. a variable with values such as control/case). The input file must have a column with this name. |
| experimental_columns | Names of the variables related to the experimental design, such as "experiment", "plate", and "cell_line". They should be given in order; for example, "experiment" should always come first. |
| response_column | Name of the variable observed by performing the experiment, e.g. intensity. |
| total_column | Set this column only when family_p="binomial". It holds the total number of observations (number of cases plus number of controls) for a given number of cases. |
| outlier_alpha | numeric scalar between 0 and 1 indicating the Type I error associated with the test of outliers |
| condition_is_categorical | Specify whether the condition variable is categorical. TRUE: Categorical, FALSE: Continuous. |
| covariate | The name of the covariate to control for in the regression model |
| method | The method used to detect outliers. "rosner" (default) runs Rosner's test and "cook" runs Cook's distance. |
| crossed_columns | Names of experimental variables that may appear repeatedly with the same ID. For example, cell_line C1 may appear in multiple experiments, but plate P1 cannot appear in more than one experiment. |
| error_is_non_normal | Whether the observed variable is categorical. TRUE: categorical; FALSE: continuous (default). Categorical response variables will be implemented in the future. |
| family_p | The type of distribution family to specify when the response is categorical. If family is "binary" then binary(link="log") is used, if family is "poisson" then poisson(link="logit") is used, if family is "poisson_log" then poisson(link="log") is used. |
| na.action | "complete" (default): missing data are not allowed in any column. "unique": missing data are disallowed only in the condition, experimental, and response columns. Selecting "complete" removes an entire row when it has one or more missing values, which may affect the distribution of other features. |
| include_interaction | logical (TRUE or FALSE): whether to include the condition * covariate interaction |
| random_slope_variable | Variable for random slopes (typically one of "condition_column" or "covariate", assuming that they are numeric variables). A random slope term is added for each of the variables specified in the experimental columns, in addition to their corresponding random intercept terms. The random slope and intercept terms for each experimental_columns variable are assumed to be uncorrelated. |
| covariate_is_categorical | Specify whether the covariate variable is categorical. TRUE: Categorical, FALSE: Continuous. |

an object of class RMeDesign
design <- new("RMeDesign")
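
To illustrate how the design fields could translate into a mixed model, the sketch below builds an 'lme4'-style formula with one random intercept per experimental column and, optionally, an uncorrelated random slope for each. This is an assumption about the general shape of such a model, not the formula the package constructs internally. The helper build_formula_sketch and the column names "intensity" and "dose" are hypothetical, and nesting or crossing of the experimental columns is not handled here.

```r
# Hypothetical helper: assemble an lme4-style formula from design fields.
build_formula_sketch <- function(response, condition, experimental_columns,
                                 random_slope_variable = NULL) {
  # One random intercept per experimental column, e.g. (1 | experiment).
  intercepts <- sprintf("(1 | %s)", experimental_columns)
  # Writing the slope as a separate (0 + x | g) term keeps it uncorrelated
  # with the random intercept of the same grouping factor.
  slopes <- if (is.null(random_slope_variable)) {
    character(0)
  } else {
    sprintf("(0 + %s | %s)", random_slope_variable, experimental_columns)
  }
  rhs <- paste(c(condition, intercepts, slopes), collapse = " + ")
  as.formula(paste(response, "~", rhs))
}

sketch_formula <- build_formula_sketch(
  response = "intensity",
  condition = "dose",
  experimental_columns = c("experiment", "cell_line"),
  random_slope_variable = "dose"
)
print(sketch_formula)
```

A formula of this shape could be passed to lme4::lmer for a continuous response, but the terms the package actually fits depend on the full design, including crossed_columns, covariate, and include_interaction.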
Example dataset containing cluster-level count summaries from a single-nucleus RNA-seq experiment. The data are intended to illustrate how RMeDPower2 can be applied to hierarchical omics experiments with counts aggregated at the cluster level.
data(snRNAseq_cluster_count_data)
A data frame of cluster-level counts and associated annotations.
Example dataset containing gene-level count summaries from a single-nucleus RNA-seq experiment. This dataset can be used to demonstrate power calculations for differential expression-type analyses across experimental conditions.
data(snRNAseq_gene_count_data)
A data frame of gene-level counts and associated annotations.