Title: | Observational Database Study Planning using Exact Sequential Analysis for Poisson and Binomial Data |
---|---|
Description: | Functions to be used in conjunction with the 'Sequential' package that allows for planning of observational database studies that will be analyzed with exact sequential analysis. This package supports Poisson- and binomial-based data. The primary function, seq_wrapper(...), accepts parameters for simulation of a simple exposure pattern and for the 'Sequential' package setup and analysis functions. The exposure matrix is used to simulate the true and false positive and negative populations (Green (1983) <doi:10.1093/oxfordjournals.aje.a113521>, Brenner (1993) <doi:10.1093/oxfordjournals.aje.a116805>). Functions are then run from the 'Sequential' package on these populations, which allows for the exploration of outcome misclassification in data. |
Authors: | Judith Maro [aut, cre], Laura Hou [aut] |
Maintainer: | Judith Maro <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-12-10 06:56:49 UTC |
Source: | CRAN |
The create.exposure function is a sub-function used in conjunction with the initialize.data function to create an exposure matrix. The columns represent strata for the observational data and the rows represent new exposures in unit time. It takes cumulative data and segregates it by time period. Do not run create.exposure as a stand-alone function.
create.exposure(params)
params |
This is a set of parameters from the initialize.data function that allows for simulation of a sequence of sequential exposures in unit time. |
paramtest <- initialize.data(seed=8768, N=1, t0=0, tf=2, NStrata=2,
  strataRatio=c(0.2, 0.3, 0.3, 0.2), EventRate=c(0.4, 0.5), sensitivity=0.9, PPVest=0.9, RR=3.0,
  MatchRatio=1, maxSampleSize=200, maxTest=1, totalAlpha=0.05, minEvents=5, AlphaSpendType="Wald",
  AlphaParameter=0.5, address=getwd(), rate=20, offset=30)
create.exposure(paramtest)
The function takes a matrix of values that have accumulated over rows and returns a matrix of the incremental increases between successive rows. Do not run fun.decum as a stand-alone function.
fun.decum(matrix)
matrix |
This is a matrix of values to be decumulated where the cumulation occurs over rows within the same column. |
testarray <- array(NA, dim=c(5,2,2))
testarray[,,1] <- cbind(c(1:5), c(1:5)*2)
testarray[,,2] <- cbind(c(1:5)*1.5, c(1:5)*3)
fun.decum(testarray)
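As a rough illustration of what decumulating over rows means, the sketch below uses base R only. The helper `decumulate_rows` is hypothetical and written for this example; it is not the package's fun.decum and may differ from it in edge cases.

```r
# Hypothetical helper (not part of SequentialDesign): turn values that were
# accumulated down the rows of a matrix back into per-row increments.
decumulate_rows <- function(m) {
  # The first row is kept as-is; every later row becomes its difference
  # from the row above it (diff() on a matrix works row-wise).
  rbind(m[1, , drop = FALSE], diff(m))
}

# Same test array as in the fun.decum example.
testarray <- array(NA, dim = c(5, 2, 2))
testarray[, , 1] <- cbind(1:5, (1:5) * 2)
testarray[, , 2] <- cbind((1:5) * 1.5, (1:5) * 3)

# Apply the helper to each slice of the 3-D array.
increments <- testarray
for (k in seq_len(dim(testarray)[3])) {
  increments[, , k] <- decumulate_rows(testarray[, , k])
}
```

Because every column of the test array grows by a constant step, each column of `increments` holds that step repeated in every row.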
This function creates a dataset simulating the accumulation of individuals exposed to treatment under the self-controlled risk interval design.
fun.exposure(rate, offset, t)
rate |
Rate of accumulation. |
offset |
Initial exposed population. |
t |
Time at which individuals are exposed. |
fun.exposure(rate=100, offset=20, t=20)
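To make the roles of rate, offset, and t concrete, here is a minimal sketch that assumes exposure accrues linearly in time. The linear form and the name `linear_accrual` are assumptions made for illustration; the documentation above does not state the functional form that fun.exposure uses.

```r
# Hypothetical linear accrual curve; the functional form is an assumption,
# not taken from the fun.exposure source code.
linear_accrual <- function(rate, offset, t) {
  offset + rate * t
}

times <- 0:3
# Cumulative number of exposed individuals at each time point.
cumulative <- linear_accrual(rate = 100, offset = 20, t = times)
# New exposures in each unit of time (difference between time points).
new_exposed <- diff(cumulative)
```

Under this assumption, offset is the exposed population at time zero and rate is the number of new exposures added per unit time.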
The function creates a data frame with all the needed parameters for simulation and initializes the simulation problem. Do not run initialize.data as a stand-alone function.
initialize.data(seed, N, t0, tf, NStrata, strataRatio, EventRate, sensitivity, PPVest, RR, MatchRatio, maxSampleSize, maxTest, totalAlpha, minEvents, AlphaSpendType, AlphaParameter, address, rate, offset)
seed |
Seed used for randomization. |
N |
Number of simulations to be created. Because adverse event assignment is stochastic, this number is usually at least 10,000. |
t0 |
Initial time point, a number in units of either days, weeks, months, or years. It is important to be consistent. |
tf |
Final time point, a number in units of either days, weeks, months, or years. |
NStrata |
Number of strata in the observational study design, where a "stratum" can be defined by age category, sex, or any other defining characteristic. The event rate of the adverse event of interest and the database population size are both segregated by stratum. For example, a single stratum might be 0-17 year old females. |
strataRatio |
Ratio of individuals within a single stratum for exposed and unexposed individuals. The number of elements in this list should be 2*NStrata. |
EventRate |
Rate of event accrual given in events/person-time, where the time unit is the same one used throughout the study. Additionally, the number of elements in EventRate should equal NStrata. |
sensitivity |
True sensitivity of the outcome of interest. sensitivity = (true positive case) / (true positive case + false negative case). |
PPVest |
True positive predictive value of outcome in the unexposed group. PPV = (true positive case) / (true positive case + false positive case). |
RR |
Intended relative risk to detect (and therefore to simulate) in the dataset. |
MatchRatio |
Single numeric value. In a self-controlled risk interval design, it is the ratio of the length of the control window to the length of the risk window. |
maxSampleSize |
Maximum number of events before sequential analysis is ended or the upper limit on sample size expressed in terms of total number of events. This is the same variable as N from R Sequential. |
maxTest |
Number of tests to perform on simulation data. |
totalAlpha |
Total amount of alpha available to spend. |
minEvents |
Minimum number of events needed before the null hypothesis can be rejected. Represented as M in R Sequential. |
AlphaSpendType |
Method of alpha expenditure. Available values are "Wald" or "power-type". This is the same as AlphaSpending in R Sequential. |
AlphaParameter |
Rho parameter for power-type alpha spending function. This is the same as rho in R Sequential. |
address |
File directory where data for sequential analysis is stored for future tests. |
rate |
Rate of exposure/cohort accrual. |
offset |
Offset for exposure/cohort accrual. |
initialize.data(seed=8768, N=1, t0=0, tf=2, NStrata=2, strataRatio=c(0.2, 0.3, 0.3, 0.2), EventRate=c(0.4, 0.5), sensitivity=0.9, PPVest=0.9, RR=3.0, MatchRatio=1, maxSampleSize=200, maxTest=1, totalAlpha=0.05, minEvents=5, AlphaSpendType="Wald", AlphaParameter=0.5, address=getwd(), rate=20, offset=30)
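The argument descriptions state two length rules: strataRatio should have 2*NStrata elements and EventRate should have NStrata elements. A small pre-flight check along these lines might look as follows. The function `check_lengths` is hypothetical and not part of SequentialDesign; it only encodes the two rules quoted above.

```r
# Hypothetical pre-flight check (not part of SequentialDesign) for the length
# rules stated in the argument descriptions.
check_lengths <- function(NStrata, strataRatio, EventRate) {
  problems <- character(0)
  if (length(strataRatio) != 2 * NStrata) {
    problems <- c(problems, "strataRatio must have 2 * NStrata elements")
  }
  if (length(EventRate) != NStrata) {
    problems <- c(problems, "EventRate must have NStrata elements")
  }
  problems  # an empty character vector means the lengths are consistent
}

# Values from the initialize.data example: consistent lengths.
ok <- check_lengths(NStrata = 2, strataRatio = c(0.2, 0.3, 0.3, 0.2),
                    EventRate = c(0.4, 0.5))
# A deliberately inconsistent call: strataRatio is too short.
bad <- check_lengths(NStrata = 2, strataRatio = c(0.5, 0.5),
                     EventRate = c(0.4, 0.5))
```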
This function performs a prespecified number of binomial sequential analyses on real and misclassified binomial data that is designed to simulate a self-controlled risk interval design. Do not run SCRI.seq as a stand-alone function.
SCRI.seq(data, params)
data |
Output from sim.exposure that contains real and observed data. |
params |
Output from initialize.data function. |
#paramtest <- initialize.data(seed=8768, N=1, t0=0, tf=2, NStrata=2,
#strataRatio=c(0.2, 0.3, 0.3, 0.2), EventRate=c(0.4, 0.5), sensitivity=0.9, PPVest=0.9, RR=3.0,
#MatchRatio=1, maxSampleSize=200, maxTest=1, totalAlpha=0.05, minEvents=5, AlphaSpendType="Wald",
#AlphaParameter=0.5, address=getwd(), rate=20, offset=30)
#exposed1 <- create.exposure(paramtest)
#exposed2 <- sim.exposure(exposed1, paramtest)
#SCRI.seq(exposed2, paramtest)
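For orientation, a common way to reason about binomial data in a self-controlled risk interval design is as follows. If MatchRatio is the length of the control window divided by the length of the risk window, then an event falls in the risk window with probability 1/(1 + MatchRatio) under the null hypothesis and RR/(RR + MatchRatio) under a relative risk RR. The sketch below computes these two probabilities; `risk_window_prob` is a hypothetical helper, and the source does not state whether SCRI.seq uses exactly this parameterization internally.

```r
# Hypothetical helper: probability that an event falls in the risk window,
# given a relative risk and a control-to-risk window length ratio.
risk_window_prob <- function(RR, MatchRatio) {
  RR / (RR + MatchRatio)
}

# Null hypothesis (RR = 1) with equal-length windows.
p_null <- risk_window_prob(RR = 1, MatchRatio = 1)
# Alternative used in the examples above (RR = 3, MatchRatio = 1).
p_alt <- risk_window_prob(RR = 3, MatchRatio = 1)
```

With equal-length windows the null probability is one half, and a relative risk of 3 shifts three quarters of the expected events into the risk window.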
SequentialDesign is designed for planning observational database studies that use exact sequential analysis. It is designed to be used in conjunction with the R package Sequential. This package is appropriate to use when one is performing a multi-site observational database study (i.e., an epidemiologic study) and planning to use sequential statistical analysis. This package supports two types of observational study designs:
a self-controlled risk interval design which creates binomial data, and
a current vs. historical design which creates Poisson data.
The goal of this package is to allow the investigator to plan for the optimal study.
seq_wrapper(seed, N, t0, tf, NStrata, strataRatio, EventRate, sensitivity, PPVest, RR, MatchRatio, maxSampleSize, maxTest, totalAlpha, minEvents, AlphaSpendType, AlphaParameter, rate, offset, address, ...)
seed |
Seed used for randomization. |
N |
Number of simulations to be created. Because adverse event assignment is stochastic, this number is usually at least 10,000. |
t0 |
Initial time point, a number in units of either days, weeks, months, or years. It is important to be consistent. |
tf |
Final time point, a number in units of either days, weeks, months, or years. |
NStrata |
Number of strata in the observational study design, where a "stratum" can be defined by age category, sex, or any other defining characteristic. The event rate of the adverse event of interest and the database population size are both segregated by stratum. For example, a single stratum might be 0-17 year old females. |
strataRatio |
Ratio of individuals within a single stratum for exposed and unexposed individuals. The number of elements in this list should be 2*NStrata. |
EventRate |
Rate of event accrual given in events/person-time, where the time unit is the same one used throughout the study. Additionally, the number of elements in EventRate should equal NStrata. |
sensitivity |
True sensitivity of the outcome of interest. sensitivity = (true positive case) / (true positive case + false negative case). |
PPVest |
True positive predictive value of outcome in the unexposed group. PPV = (true positive case) / (true positive case + false positive case). |
RR |
Intended relative risk to detect (and therefore to simulate) in the dataset. |
MatchRatio |
Single numeric value. In a self-controlled risk interval design, it is the ratio of the length of the control window to the length of the risk window. |
maxSampleSize |
Maximum number of events before sequential analysis is ended or the upper limit on sample size expressed in terms of total number of events. This is the same variable as N from R Sequential. |
maxTest |
Number of tests to perform on simulation data. |
totalAlpha |
Total amount of alpha available to spend. |
minEvents |
Minimum number of events needed before the null hypothesis can be rejected. Represented as M in R Sequential. |
AlphaSpendType |
Method of alpha expenditure. Available values are "Wald" or "power-type". This is the same as AlphaSpending in R Sequential. |
AlphaParameter |
Rho parameter for power-type alpha spending function. This is the same as rho in R Sequential. |
rate |
Rate of exposure/cohort accrual. |
offset |
Offset for exposure/cohort accrual. |
address |
Output folder where Sequential TXT files are to be stored. These should be preserved between runs, as detailed within the Sequential package. |
... |
Additional arguments to be passed to or from methods. |
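To give a feel for how totalAlpha and AlphaParameter interact under "power-type" spending, the sketch below assumes the commonly used form in which the cumulative alpha spent is totalAlpha * t^rho, with t the fraction of maxSampleSize observed so far. The helper `alpha_spent` is hypothetical and belongs to neither package; consult the Sequential documentation for the exact definition it applies.

```r
# Hypothetical helper: cumulative alpha spent under an assumed power-type
# spending function, alpha(t) = totalAlpha * t^rho.
alpha_spent <- function(events, maxSampleSize, totalAlpha, rho) {
  frac <- pmin(events / maxSampleSize, 1)  # fraction of the maximum sample size, capped at 1
  totalAlpha * frac^rho
}

# Cumulative alpha available after 50, 100, and 200 events.
spent <- alpha_spent(events = c(50, 100, 200), maxSampleSize = 200,
                     totalAlpha = 0.05, rho = 0.5)
```

With rho = 0.5, half of the total alpha is already available after a quarter of the events, so smaller values of rho spend alpha earlier.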
The simulation has the following steps:
1. Perform sample size calculations for the study using the R Sequential package.
2. Given these sample size calculations and an exposure uptake function, calculate new exposure accrual in calendar time for the exposures of interest.
3. Given the simulated exposure information, generate adverse events of interest according to a pre-specified effect size.
4. Perform sequential analysis on these simulated data.
5. Generate calendar time descriptive statistics with respect to stopping points.
These steps will be discussed in more detail.
First, the investigator should work with the R package Sequential in order to
calculate design parameters for their study. These are the statistical
parameters that govern stopping points in statistical analysis. The relevant
ones required for this analysis are: maxSampleSize, totalAlpha, minEvents,
AlphaSpendType, AlphaParameter. If binomial data is being used for sequential
analysis of a self-controlled risk interval design, then MatchRatio is also
needed.
Second, this function will generate incident exposure to a simulated
study population based on the parameters of an exposure accrual function.
Third, with incremental exposure accrual information, new adverse events
will be assigned based on user-specified characteristics.
This function also allows for outcome misclassification, so true positive
adverse events, false positive adverse events, and false negative adverse
events are all simulated.
Fourth, sequential analysis is performed on these simulated data using
functions in R Sequential.
Fifth, the investigator can generate descriptive statistics in
calendar time to plan the analysis.
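The third step, assigning adverse events to incremental exposures, can be pictured with a toy base R sketch. Everything here is an assumption made for illustration: the exposure counts are invented, each newly exposed individual is taken to contribute one unit of person-time, and events are drawn as Poisson counts with mean equal to new exposures times EventRate times RR. The source does not describe the package's actual sampling scheme.

```r
set.seed(8768)

# Invented incremental exposures: rows are time periods, columns are strata.
new_exposed <- matrix(c(30, 20, 20,    # stratum 1, periods 1 to 3
                        45, 30, 30),   # stratum 2, periods 1 to 3
                      nrow = 3, ncol = 2)

EventRate <- c(0.4, 0.5)  # events per person-time, one value per stratum
RR <- 3.0                 # relative risk to simulate

# Expected events per cell: scale each stratum's column by its rate times RR.
expected <- sweep(new_exposed, 2, EventRate * RR, `*`)

# Stochastic event counts (assumed Poisson), arranged like the exposure matrix.
events <- matrix(rpois(length(expected), lambda = expected),
                 nrow = nrow(expected))
```

Because the event draw is random, repeating it many times (the N argument) is what allows the distribution of stopping points to be described.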
Simulating sequential analysis in observational data requires many parameter
inputs: (1) the parameters that control the epidemiologic study design,
(2) the parameters that describe the characteristics of the databases, and
(3) the parameters of the simulation.
In addition to the parameter inputs, there are many sub-functions that are needed to perform different steps in the simulation. These sub-functions are not intended to be run as stand-alone functions but rather always in the sequence specified in this function.
#paramtest <- initialize.data(seed=8768, N=1, t0=0, tf=2, NStrata=2,
#strataRatio=c(0.2, 0.3, 0.3, 0.2), EventRate=c(0.4, 0.5), sensitivity=0.9, PPVest=0.9, RR=3.0,
#MatchRatio=1, maxSampleSize=200, maxTest=1, totalAlpha=0.05, minEvents=5, AlphaSpendType="Wald",
#AlphaParameter=0.5, address=getwd(), rate=20, offset=30)
#exposed1 <- create.exposure(paramtest)
#exposed2 <- sim.exposure(exposed1, paramtest)
#SCRI.seq(exposed2, paramtest)
This function creates an exposure matrix with real and observed data after taking into account true positive, false negative, and false positive rates. The columns represent strata for the observational data and the rows represent new events in unit time. Do not run sim.exposure as a stand-alone function.
sim.exposure(exposed.matrix, params)
exposed.matrix |
Output exposure matrix from create.exposure function. |
params |
Output from initialize.data function. |
paramtest <- initialize.data(seed=8768, N=1, t0=0, tf=2, NStrata=2,
  strataRatio=c(0.2, 0.3, 0.3, 0.2), EventRate=c(0.4, 0.5), sensitivity=0.9, PPVest=0.9, RR=3.0,
  MatchRatio=1, maxSampleSize=200, maxTest=1, totalAlpha=0.05, minEvents=5, AlphaSpendType="Wald",
  AlphaParameter=0.5, address=getwd(), rate=20, offset=30)
exposed1 <- create.exposure(paramtest)
sim.exposure(exposed1, paramtest)
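The relationship between real and observed events follows from the sensitivity and PPV definitions given in the argument descriptions. The sketch below works through the expected counts only, with no random sampling. The helper `observed_counts` is hypothetical and is not how sim.exposure is necessarily implemented.

```r
# Hypothetical helper: expected true positive, false negative, and false
# positive counts implied by the sensitivity and PPV definitions.
observed_counts <- function(true_events, sensitivity, PPV) {
  TP <- true_events * sensitivity   # true cases that are detected
  FN <- true_events - TP            # true cases that are missed
  FP <- TP * (1 - PPV) / PPV        # solves PPV = TP / (TP + FP) for FP
  c(TP = TP, FN = FN, FP = FP, observed = TP + FP)
}

# 100 real events with the sensitivity and PPV used in the examples above.
counts <- observed_counts(true_events = 100, sensitivity = 0.9, PPV = 0.9)
```

In this case 10 real events go undetected and about 10 spurious events are recorded, so the observed total matches the real total even though the two sets of events differ.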