Title: | The Self-Controlled Case Series Method |
---|---|
Description: | Various self-controlled case series models used to investigate associations between time-varying exposures such as vaccines or other drugs or non drug exposures and an adverse event can be fitted. Detailed information on the self-controlled case series method and its extensions with more examples can be found in Farrington, P., Whitaker, H., and Ghebremichael Weldeselassie, Y. (2018, ISBN: 978-1-4987-8159-6. Self-controlled Case Series studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press) and <https://sccs-studies.info/index.html>. |
Authors: | Yonas Ghebremichael Weldeselassie, Heather Whitaker, Paddy Farrington |
Maintainer: | "Yonas Ghebremichael Weldeselassie" <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.7 |
Built: | 2024-11-28 06:43:11 UTC |
Source: | CRAN |
Fits the self-controlled case series model used to investigate the association between a time-varying exposures such as vaccines or other drugs and an adverse event. Some extensions of the SCCS method can be fitted with this package.
Note: Ages and times describing the start and end of observation, exposure times, and event times, should be expressed as integers. In most of the datasets, units of time are days; other choices may be appropriate according to context, but must be given as integers. Ages appearing as covariates need not be integers.
Package: | SCCS |
Type: | Package |
Version: | 1.7 |
Date: | 2024-04-01 |
License: | "GPL-2" |
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington
Maintainer: Yonas Ghebremichael-Weldeselassie <[email protected]>
https://sccs-studies.info/index.html
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
The data comprise ages in days at first gastro-intestinal (GI) bleed, treatments with non-steroidal anti-inflammatory drugs (NSAIDs), and treatments with antidepressants (SDs). There are 1000 simulated cases based on Tata et al (2005).
addat
addat
A data frame containing 3628 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'bleed' (age at first GI bleed), 'ns' (age at first day of NSAID treatment episode), 'endns' (age at last day of NSAID treatment episode), 'ad' (age at first day of AD treatment episode), 'endad' (age on last day of AD treatment episode). The data are in format 'stack' (see dataformat
for details) with different exposure episodes on different rows.
Tata, L.J., Fortun, P. J., Hubbard, R. B., Smeeth, L., Hawkey, C. J., Smith, P., Whitaker, H. J., Farrington, C. P., Card, T. R. and West J. (2005). Does concurrent prescription of selective serotonin reuptake inhibitors and non-steroidal anti-inflammatory drugs substantially increase the risk of upper gastrointestinal bleeding? Alimentary Pharmacology and Therapeutics 22, 175–181.
The data comprise ages in days at first fracture and start of treatment with a thiazolidinedione antidiabetic. There are 2000 simulated cases based on Douglas et al (2009). Observation is curtailed at the end of treatment.
adidat
adidat
A data frame containing 2000 rows and 6 columns. The column names are 'case' (individual identifier), 'frac' (age at fracture), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'adi' (age at first day of antidiabetic treatment), 'type' (fracture type: 1 for foot/ankle/wrist/hand, 2 for hip, 3 for spine).
Douglas, I. J., Evans, S. J., Pocock, S. and Smeeth L. (2009). The risk of fractures associated with thiazolidinediones: A self-controlled case-series study. PLoS Medicine 6 (9), e1000154.
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination and hospital admission for aseptic meningitis. There are 10 admissions in 10 children.
amdat
amdat
A data frame containing 10 rows and 5 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'am' (age at admission for aseptic meningitis, with a confirmed diagnosis of viral meningitis), and 'mmr' (age at mmr vaccination).
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Miller, E., Goldacre, M., Pugh, S., Colville, A., Farrington, C.P., Flower, A., Nash, J., MacFarlane, L. and Tettmar, R. Risk of aseptic meningitis after measles, mumps and rubella vaccine in UK children (1993). The Lancet 341, 979–982.
The data comprise ages in days at first stroke and treatments with antipsychotics. There are 2000 simulated cases based on Douglas and Smeeth (2008), including 1500 without dementia and 500 with dementia.
apdat
apdat
A data frame containing 11792 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'stro' (age at first stroke), 'ap' (age at first day of antipsychotic treatment episode), 'endap' (age at last day of antipsychotic treatment episode), 'cen' (1 if observation ended before the end of the study, 0 otherwise), 'dem' (0 if the patient does not have dementia, 1 if the patient has dementia). The data are in format 'stack' with different exposure episodes on different rows.
Douglas, I. J. and Smeeth L. (2008). Exposure to antipsychotics and risk of stroke: Self-controlled case series study. British Medical Journal 337, a1227.
The data comprise ages in days at measles mumps rubella (MMR) vaccination and autism diagnosis. There are 350 simulated cases based on Taylor et al (1999).
autdat
autdat
A data frame containing 350 rows and 5 columns. The column names are 'case' (individual identifier), 'diag' (age at autism diagnosis), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'mmr' (age at MMR vaccination.
Taylor, B., Miller, E., Farrington, C. P., Petropoulos, M.C., Favot-Mayaud, I., Li, J., and Waight, P. A. (1999). Autism and measles, mumps and rubella vaccine: No epidemiological evidence for a causal association. Lancet 353, 2026-2029.
The data comprise systolic and diastolic blood pressures taken twice a day for seven days, and start of headache in the next period. There are 71 headaches in 64 cases.
bpdat
bpdat
A data frame containing 896 rows and 6 columns. The column names are 'case' (individual identifier), 'dow' (day of week, coded 1 for Monday,...,7 for Sunday), 'time' (time of reading, coded 1 for am, 2 for pm), 'sys' (systolic blood pressure), 'dia' (diastolic blood pressure), 'head' (1 if a headache started in the period up to the next blood pressure reading, or within 12 hours of the last reading).
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
The data comprise simulated data on bupropion and sudden death. There are 121 cases. The start of observation coincides with age at first bupropion prescription; the nominal end of observation is age on 11 November 2003 (day 1136). Ages are in days.
bupdat
bupdat
A data frame containing 121 rows and 4 columns. The column names are 'case' (individual identifier), 'date' (date of first bupropion prescription, day 0 = 1 October 2000), 'bup' (age at first bupropion prescription), 'death' (age at sudden death).
Hubbard R., Lewis S., West J., Smith C., Godfrey C., Smeeth L., Farrington P. and Britton J. (2005). Bupropion and the risk of sudden death: a self-controlled case series analysis using The Health Improvement Network. Thorax 60, 848-850.
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination, Haemophilus influenzae type b (Hib) booster or catch-up vaccination, and febrile convulsion. There are 2435 convulsions in 2201 children. The ages have been jittered.
condat
condat
A data frame containing 2435 rows and 11 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'hib' (age at Hib booster/catch-up vaccination), 'mmr' (age at MMR vaccination), 'sex' (1 for males, 2 for females), 'gap' (days from convulsion to next convulsion within the same case, or to end of observation), 'cen' (0 if last admission for this case, 1 otherwise), 'rec' (within-case event number), 'ngrp' (1 if case has a unique event, 2 if case has 2+ events).
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
The data comprise ages in days at diphteria, tetanus and pertussis (DTP) vaccination and febrile convulsion. There are 1379 convulsions in 1214 children. The ages have been jittered.
dtpdat
dtpdat
A data frame containing 1379 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'dtp' (age at first dose of DTP), 'dtpd2' (age at second dose of DTP), 'dtpd3' (age at third dose of DTP), 'sex' (1 for males, 2 for females).
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
One of the assumptions of the self-controlled case series model is that occurence of an event does not affect subsequent exposure. This function fits the modified SCCS model when the assumption is not satisfied, see Farrington et al (2009). This modified method assumes that no exposure is possible following a unique event. It requires that exposure is of a fixed finite length and that the end of observation that would have applied in the absence of the event is known.
eventdepenexp(indiv, astart, aend, aevent, adrug, aedrug, expogrp=0, sameexpopar=T,agegrp=NULL, dataformat="stack", verbose=F, tolerance=1e-8,itermax=100, data)
eventdepenexp(indiv, astart, aend, aevent, adrug, aedrug, expogrp=0, sameexpopar=T,agegrp=NULL, dataformat="stack", verbose=F, tolerance=1e-8,itermax=100, data)
indiv |
a vector of individual identifiers of cases |
astart |
a vector of ages at which the observation periods start |
aend |
a vector of ages at end of observation periods, that would have applied in the absence of the event |
aevent |
a vector of ages at event (one event per case) |
adrug |
a vector of ages at which exposure starts or a matrix if there are multiple episodes of the same exposure type ( |
aedrug |
a vector of ages at which exposure-related risk ends or a matrix if there are multiple episodes of the same exposure type. The dimension of |
expogrp |
a vector of days to the start of exposure-related risk, counted from |
sameexpopar |
a logical value. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in |
agegrp |
a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of |
dataformat |
the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stacked in one column. In the "multi" |
verbose |
a logical value indicating whether information about the iterations should be printed. Default is FALSE |
tolerance |
the convergence tolerance when estimating the parameters. Defaults to 1e-8. |
itermax |
maximum number of iterations. 100 is the default. |
data |
a data frame containing the input data. The data should be in 'stack' or 'multi' (see |
This model fits a SCCS model with event-dependent exposures.
Relative incidence estimates along with their 95% confidence intervals.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington, C. P., Whitaker H.J., and Hocine M.N. (2009). Case series analysis for censored, perturbed or curtailed post-event exposures. Biotatistics, 10(1), 3-16.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
library(SCCS) # Analysis of rotavirus vaccination and intussusception data # Model 1: Three doses of the same vaccine exposure OPV (OPV, OPV2 and opv3), # only one risk period [adrug, aedrug] rot.mod1 <- eventdepenexp(indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(rv,rvd2), expogrp=1,aedrug=cbind(rv+21,rvd2+21), agegrp=seq(56,168,14), dataformat="multi", data=rotdat) rot.mod1 # Model 2: Two doses with two riks periods, 1-7 and 8-21 rot.mod2 <- eventdepenexp(indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(rv,rvd2), aedrug=cbind(rv+21,rvd2+21), expogrp=c(1,8), agegrp=seq(56,168,14), dataformat="multi", data=rotdat) rot.mod2
library(SCCS) # Analysis of rotavirus vaccination and intussusception data # Model 1: Three doses of the same vaccine exposure OPV (OPV, OPV2 and opv3), # only one risk period [adrug, aedrug] rot.mod1 <- eventdepenexp(indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(rv,rvd2), expogrp=1,aedrug=cbind(rv+21,rvd2+21), agegrp=seq(56,168,14), dataformat="multi", data=rotdat) rot.mod1 # Model 2: Two doses with two riks periods, 1-7 and 8-21 rot.mod2 <- eventdepenexp(indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(rv,rvd2), aedrug=cbind(rv+21,rvd2+21), expogrp=c(1,8), agegrp=seq(56,168,14), dataformat="multi", data=rotdat) rot.mod2
One of the assumptions of the self-controlled case series models is that the observation period for each individual is independent of event times. If an event increases the risk of death, such as myocardial infraction or stroke, this assumption is violated. This function fits the modified SCCS model when the assumption is not satisfied i.e ages at end of observation periods might depend on age at event as outlined in Farrington et al (2011).
eventdepenobs(formula, indiv, astart, aend, aevent, adrug, aedrug, censor, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, dataformat="stack", covariates=NULL, regress=F, initval=rep(0.1, 7), data)
eventdepenobs(formula, indiv, astart, aend, aevent, adrug, aedrug, censor, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, dataformat="stack", covariates=NULL, regress=F, initval=rep(0.1, 7), data)
formula |
a model formula. The dependent variable should always be "event" e.g. event ~ itp. If age effects are included in the model, the word 'age' must be used in the formula, e.g event ~ itp + age. |
indiv |
a vector of individual identifiers of cases |
astart |
a vector of ages at which the observation periods start |
aend |
a vector of ages at end of observation periods |
aevent |
a vector of ages at event (one event per case) |
adrug |
a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( |
aedrug |
a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of |
censor |
a vector of indicators for whether an observation periods were censored (1 = observation period ended early, 0 = fully observed). |
expogrp |
list of vectors of days to the start of exposure-related risk, counted from |
washout |
list of vectors with days on start of washout periods counted from |
sameexpopar |
a vector of logical values. If TRUE (the default) no dose effect is assumed, the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in |
agegrp |
a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of |
dataformat |
the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same exposure type are stacked in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. |
covariates |
list of covariates believed to affect the age at censoring (age at end of observation period) (e.g. covariates = gender). |
regress |
logical, |
initval |
a vector of intial values used in fitting the weight functions. These are given in the order of: 1. Log mean of the exponential component 2. Intercept of the EG/EW log mean function 3. Intercept of the EG/EW log shape function 4. Intercept of the logit mixing probability function 5. Regression parameter of the G/W log mean functions, if regress = T 6. Regression parameter of the G/W log shape function, if regress=T 7. Regression parameter of the G/W logit mixing probability function, if regress=T. When |
data |
a data frame containing the input data. The data should be in 'stack' or 'multi' (see |
This model is suitable when the event increases the risk of death, such as myocardial infraction (MI) or stroke. It is not suitable when the event itself is death. Four models are fitted to the interval between the age at end of observation and the event date, these are detailed in section 5.4 of Farrington et al (2011). The model with the lowest AIC is selected, and used to estimate weights that replace interval lengths in the model formula. This modification allows unbiased estimates of the exposure effect to be estimated, while age effects take on a different interpretation as they include the thinning effect of censoring.
summary |
exposure related relative incidence estimates along with their 95% confidence intervals, age related relative incidence estimates and estimates of interactions with covariates if there are any. |
modelfit |
model fit of the 4 different weight functions and their AIC values. |
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington, C. P., Anaya-Izquierdo, A., Whitaker, H. J., Hocine, M.N., Douglas, I., and Smeeth, L. (2011). Self-Controlled case series analysis With event-Dependent observation periods. Journal of the American Statistical Association 106 (494), 417–426.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
library(SCCS) # Nicotine replacement therapy and myocardial infarction (MI) # With no age effect included nrt.mod <- eventdepenobs(event~nrt, indiv=case, astart=nrt, aend=act, aevent=mi, adrug=nrt, aedrug=nrt+28, censor=cen, expogrp=c(0,8,15,22), agegrp=NULL, data=nrtdat) # Respiratory tract infections and MI # Age effect included # intial values provided and there are two risk periods uni <- (1-duplicated(midat$case)) ageq <- floor(quantile(midat$mi[uni==1], seq(0.1,0.9,0.1), names=FALSE)) # age groups mi.mod <- eventdepenobs(event~rti+age, indiv=case, astart=sta, aend=end, aevent=mi, adrug=rti, aedrug=rti+14, expogrp=c(0,8), agegrp=ageq, censor=cen, data=midat, initval=rep(1.1,4)) mi.mod
library(SCCS) # Nicotine replacement therapy and myocardial infarction (MI) # With no age effect included nrt.mod <- eventdepenobs(event~nrt, indiv=case, astart=nrt, aend=act, aevent=mi, adrug=nrt, aedrug=nrt+28, censor=cen, expogrp=c(0,8,15,22), agegrp=NULL, data=nrtdat) # Respiratory tract infections and MI # Age effect included # intial values provided and there are two risk periods uni <- (1-duplicated(midat$case)) ageq <- floor(quantile(midat$mi[uni==1], seq(0.1,0.9,0.1), names=FALSE)) # age groups mi.mod <- eventdepenobs(event~rti+age, indiv=case, astart=sta, aend=end, aevent=mi, adrug=rti, aedrug=rti+14, expogrp=c(0,8), agegrp=ageq, censor=cen, data=midat, initval=rep(1.1,4)) mi.mod
The data comprise ages in days at MMR vaccine and convusion in children aged 366 to 730 days of age. The convulsions are two types: febrile or non-febrile. There are 988 events in 894 cases.
febdat
febdat
A data frame containing 988 rows and 7 columns. The column names are 'case' (individual identifier), 'conv' (age at convulsion), 'sta' (age at start of observation period), 'end' (age at end of observation period), 'mmr' (age at MMR vaccine), 'sex' (coded 1 for males, 2 for females), 'type' (coded 1 for non-febrile convulsion, 2 for febrile convulsion).
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
Reformats the data based on age and/or season and exposure groups prior to fitting SCCS model.
formatdata(indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, seasongrp=NULL, dob=NULL, cov = cbind(), dataformat="stack", data)
formatdata(indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, seasongrp=NULL, dob=NULL, cov = cbind(), dataformat="stack", data)
indiv |
a vector of individual identifiers of cases |
astart |
a vector of ages at which the observation periods start |
aend |
a vector of ages at end of observation periods |
aevent |
a vector of ages at event, an individual can experience multiple events |
adrug |
a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( |
aedrug |
a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of |
expogrp |
list of vectors of days to the start of exposure-related risk, counted from |
washout |
list of vectors with days on start of washout periods counted from |
sameexpopar |
a vector of logical values. If TRUE (the default) no dose effect is assumed, the same exposure parameters are used for multiple doses/episodes of the same exposure type presented in |
agegrp |
a vector of cut points of the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at |
seasongrp |
a vector of cut points for seasonal effects. The values should be given in ddmm format, representing the first days of each season group. The seasonal effect is a factor, the reference level being the time interval starting at the earliest date in |
dob |
a vector of birth dates of the cases, in ddmmyyyy format. They are used if seasonal effects are included in the model. The default |
cov |
a vector (or a matrix if there are multiple) of fixed covariates. The default is NULL where no covariates are included. |
dataformat |
the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. |
data |
a data frame containing the input data. The data should be in 'stack' or 'multi' (see |
a data frame containing the following columns:
indivL |
an identfier for each individual event. |
event |
indicator for presence of an event within an interval. "1" where an event occured, "0" otherwise. |
age |
factor for age groups. |
Season |
a factor for season if |
exposures |
factors for exposure status of each exposure type. "0" for baseline/control periods, "1" for the first risk period. "1" for subsequent exposure risk periods if sameexpopar=TRUE, or increasing factor levels for each subsequent exposure if sameexpopar = FALSE. Indicators for washout periods (if there are any) are also included here. The column names of these factors are the same as the column names of the exposures in |
interval |
length of interval. Needed for offsets within the model. |
There are also columns for eventday (day of adverse event), lower (day a period starts), upper (day a period ends), indiv (original individual indentifier), aevent, astart, aend and any covariates included in cov
.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# MMR vaccine and ITP data # A single exposure with three risk periods and no age groups included itp.dat1 <- formatdata(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.dat1 # A single exposure with three risk periods and six age groups itp.dat2 <- formatdata(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.dat2
# MMR vaccine and ITP data # A single exposure with three risk periods and no age groups included itp.dat1 <- formatdata(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.dat1 # A single exposure with three risk periods and six age groups itp.dat2 <- formatdata(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.dat2
The data comprise days (day 1 is 1st October 2010) at seasonal influenza vaccination and at onset of Guillain-Barre Syndrome (GBS) in Italy, gathered in the 2010-2011 influenza season. There are 174 cases from Galeotti (2013). Times have been jittered.
gbsdat
gbsdat
A data frame containing 174 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (the first day of the observation period), 'end' (the last day of the observation period), 'gbs' (day of GBS onset), 'flu' (day of influenza vaccination), 'sage' (age in years on day 1, 1st October 2010).
Galeotti, F., M. Massari, R. D'Alessandro, E. Beghi, A. Chio, G. Logroscino, G. Filippini, M. D. Benedetti, M. Pugliatti, C. Santiccio, and R. Raschetti (2013). Risk of Guillain-Barre syndrome after 2010-2011 influenza vaccination. European Journal of Epidemiology 28 (5), 433-444.
The data comprise ages in days at first gastro-intestinal (GI) bleed and treatments with non-steroidal anti-inflammatory drugs (NSAIDs). There are 838 simulated cases based on Tata et al (2005).
gidat
gidat
A data frame containing 2920 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on the first day of the observation period), 'end' (age on last day of the observation period), 'bleed' (age at first GI bleed), 'ns' (age at first day of NSAID treatment episode), 'endns' (age at last day of NSAID treatment episode). The data are in format 'stack' with different exposure episodes on different rows.
Tata, L. J., Fortun, P. J., Hubbard, R. B., Smeeth, L., Hawkey, C. J., Smith, P., Whitaker, H. J., Farrington, C. P., Card, T. R., and West J. (2005). Does concurrent prescription of selective serotonin reuptake inhibitors and non-steroidal anti-inammatory drugs substantially increase the risk of upper gastrointestinal bleeding? Alimentary Pharmacology and Therapeutics 22, 175–181.
The data comprise ages at diphteria, tetanus and pertussis (DTP) vaccination, Haemophilus influenzae type b (Hib) vaccination, and febrile convulsion. There are 1378 convulsions in 1213 children. The ages have been jittered.
hibdat
hibdat
A data frame containing 1378 rows and 11 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'conv' (age at convulsion), 'dtp' (age at first dose of DTP), 'dtpd2' (age at second dose of DTP), 'dtpd3' (age at third dose of DTP), 'hib' (age at first dose of Hib), 'hibd2' (age at second dose of Hib), 'hibd3' (age at third dose of hib), 'sex' (1 for males, 2 for females).
The data comprise ages in days at first treatment with antidepressant and first hip fracture. There are 1000 simulated cases based on Hubbard et al (2003).
hipdat
hipdat
A data frame containing 1000 rows and 6 columns. The column names are 'case' (individual identifier), 'frac' (age at first hip fracture), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'ad' (age at start of first treatment with antidepressant), 'endad' (age at end of first antidepressant treatment).
Hubbard, R., Farrington, C.p., Smith, C., Smeeth, L., and Tattersfield A. (2003). Exposure to tricyclic and selective serotonin reuptake inhibitor antidepressants and the risk of hip fracture. American Journal of Epidemiology 158 (1), 77–84.
The data comprise ages in days at intussusception and oral polio vaccine (OPV) from Cuba. There are 273 intussusceptions in 273 children. The ages have been jittered.
intdat
intdat
A data frame containing 273 rows and 7 columns. The column names are 'case' (individual identifier), 'intus' (age at intussusception), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'opv' (age at first dose of OPV), 'opvd2' (age at second dose of OPV), 'dob' (date of birth, in ddmmyyyy format).
Sardinas, M.A.G., Cardenas, A.z., Marie, G.c., Santiago, M.A., Sanchez, M.V., and Farrington C. P. (2001). Lack of association between intussusception and oral polio vaccine in Cuban children. European Journal of Epidemiology 17 (8), 783-787.
Evaluates design matrix for integrals of I-splines and integrals of the integrals. The function evaluates first, second and third integrals of I-splines.
integrateIspline(x, knots1, m, int)
integrateIspline(x, knots1, m, int)
x |
a numeric vector of values at which to evaluate the integrals of I-spline functions |
knots1 |
a numeric vector of interior knot positions with non-decreasing values |
m |
a positive integer giving the order of the spline function. |
int |
a positive integer (1, 2, or 3) for first, second or third integral of an I-spline function. |
A matrix with length of (x) rows and length of (knots1) - m columns.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Ghebremichael-Weldeselassie Y. (2014). Smooth risk functions for self-cotrolled case series models. PhD thesis, The Open University.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Spline-based self controlled case series method. Statistics in Medicine 33:639-649.
The data comprise ages in days at measles, mumps and rubella (MMR) vaccination and hospital admission for idiopathic thrombocytopaenic purpura (ITP). There are 44 admissions in 35 children.
itpdat
itpdat
A data frame containing 44 rows and 9 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'itp' (age at admission for ITP), 'mmr' (age at mmr vaccination), 'sex' (1 for males, 2 for females), 'gap' (days from ITP admission to next ITP admission within the same case, or to end of observation), 'cen' (0 if last admission for this case, 1 otherwise), 'rec' (within-case event number).
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Miller, E., Waight, P., Farrington, P., Andrews, N., Stowe, J., and Taylor B. (2001). Idiopathic thrombocytopenic purpura and MMR vaccine. Archives of Disease in Childhood 84, 227–229.
The function performs the likelihood ratio test for SCCS models that are nested (up to combining of multinomial categories).
lrtsccs(model1, model2)
lrtsccs(model1, model2)
model1 |
an object fitted by the SCCS method e.g |
model2 |
an object fitted by the SCCS method e.g |
likelihood ratio test statistic, degrees of freedom and p-value.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
itp.mod1 <- standardsccs(event~mmr+age, indiv=case, astart=sta,aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod2 <- standardsccs(event~age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod3 <- standardsccs(event~mmr + age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427,488,549,610,671), data=itpdat) # Compare itp.mod1 a model with both age and exposure (mmr) and itpmod2 a model # with only age effect lrtsccs(itp.mod1,itp.mod2) # Compare itp.mod1 a model with both age and 3 exposure categories and itpmod3 # a model with age and only one exposure category lrtsccs(itp.mod3,itp.mod1) # order of the objects doesn't matter
itp.mod1 <- standardsccs(event~mmr+age, indiv=case, astart=sta,aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod2 <- standardsccs(event~age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod3 <- standardsccs(event~mmr + age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427,488,549,610,671), data=itpdat) # Compare itp.mod1 a model with both age and exposure (mmr) and itpmod2 a model # with only age effect lrtsccs(itp.mod1,itp.mod2) # Compare itp.mod1 a model with both age and 3 exposure categories and itpmod3 # a model with age and only one exposure category lrtsccs(itp.mod3,itp.mod1) # order of the objects doesn't matter
These simulated data comprise ages in days at respiratory tract infection and first myocardial infarction in 1000 cases.
midat
midat
A data frame containing 2187 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'mi' (age at first myocardial infarction), 'rti' (age at respiratory tract infection), 'cen' (the censoring indicator, equal to 1 if the observation period is censored, 0 if not). The data are in format "stack".
Smeeth L., Thomas, S. L., Hall, A. J., Hubbard, R., Farrington, P., Vallance, P. (2004). Risk of myocardial infarction and stroke after acute infection or vaccination. New England Journal of Medicine 351, 2611-2618.
Fits a spline-based non parametric SCCS model where both the exposure related relative incidence and age related relative incidence functions are represented by spline functions; that is, linear combinations of M-splines.
nonparasccs(indiv, astart, aend, aevent, adrug, aedrug, kn1=12, kn2=12, sp1=NULL, sp2=NULL, data)
nonparasccs(indiv, astart, aend, aevent, adrug, aedrug, kn1=12, kn2=12, sp1=NULL, sp2=NULL, data)
indiv |
a vector of individual identifiers of cases. |
astart |
a vector of ages at start of observation periods. |
aend |
a vector of ages at end of observation periods. |
aevent |
a vector of ages at event, an individual can experience multiple events. |
adrug |
a vector of ages at which exposure related risk period starts. |
aedrug |
a vector of ages at which exposure-related risk ends. |
kn1 |
an integer >= 5 representing the number of interior knots used to define the M-spline basis functions which are related to the age specific relative incidence function, usually between 8 and 12 knots is sufficient. It defaults to 12 knots. |
kn2 |
a an integer >= 5 representing the number of interior knots used to define the M-spline basis functions which are related to the exposure specific relative incidence function, usually between 8 and 12 knots is sufficient. The default value is 12. |
sp1 |
smoothing parameter value for age related relative incidence function. It defaults to "NULL" where the smoothing parameter is obtained automatically using an approximate cross-validation method. The value of "sp1" must be a number greater or equal to 0. |
sp2 |
smoothing parameter value for exposure related relative incidence function. It defaults to "NULL" where the smoothing paramter is obtained automatically using an approximate cross-validation method. The value of "sp1" must be a number greater or equal to 0. |
data |
A data frame containing the input data. |
The smoothing parameters for the age and exposure related relative incidence functions are chosen using a cross-validation method. To visualize the exposure-related relative incidence function, use the plot function.
Relative incidence estimates along with their 95% confidence intervals.
estimates |
exposure related relative incidence estimates at each point of time since start of exposure until the maximum difference between the start and end of exposure. |
timesinceexposure |
time units since the start of exposure. |
lci |
lower confidence limits of the exposure related relative incidence estimates. |
uci |
upper confidence limits of the exposure related relative incidence estimates. |
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2016). Flexible modelling of vaccine effects in self-controlled case series models. Biometrical Journal, 58(3):607-622.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2017). Spline-based self controlled case series method. Statistics in Medicine 33:639-649.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# ITP and MMR data itp.mod <- nonparasccs(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, sp1=28000, sp2=1200, data=itpdat) itp.mod # Plot the exposure and age related relative incidence functions plot(itp.mod)
# ITP and MMR data itp.mod <- nonparasccs(indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, sp1=28000, sp2=1200, data=itpdat) itp.mod # Plot the exposure and age related relative incidence functions plot(itp.mod)
The data comprise ages in days at first treatment with nicotine replacement therapy (NRT) and first subsequent myocardial infarction (MI). There are 141 simulated cases based on Tata et al (2005).
nrtdat
nrtdat
A data frame containing 141 rows and 7 columns. The column names are 'case' (individual identifier), 'nrt' (age at initiation of NRT treatment), 'mi' (age at first subsequent MI), 'end' (age at end of nominal observation period, nrt + 365), 'act' (age at earliest of end and actual end of observation period), 'cage' (centred age, in years, at NRT), 'cen' (1 if act = end, 0 if act < end).
Hubbard, R., Lewis, S., Smith, C., Godfrey, C., Smeeth, L., Farrington, P., and Britton J. (2005). Use of nicotine replacement therapy and the risk of acute myocardial infarction, stroke and death. Tobacco Control 14, 416-421.
The data comprise ages in days at oral polio vaccine (OPV) and first hospital admission for intussusception in children between the ages of 27 and 365 days. There are 207 cases.
opvdat
opvdat
A data frame containing 207 rows and 8 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'intus' (age at admission for intussusception), 'opv' (age at first dose of OPV), 'opvd2' (age at second dose of OPV), 'opvd3' (age at third dose of OPV), 'sex' (1 for males, 2 for females).
Whitaker, H. J., Farrington, C. P., Spiessens, B., and Musonda, P. (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine 25, 1768–1797.
Andrews N., Miller, E., Waight, P., Farrington, P., Crowcroft, N., Stowe, J., and Taylor B. (2002). Does oral polio vaccine cause intussusception in infants? Evidence from a sequence of three self-controlled case series studies in the United Kingdom. European Journal of Epidemiology 17, 701–706.
The data are a daily time series of hospital admissions for asthma in Nottingham, with levels (in micrograms per cubic metre) of particulate matter less than 10 micrometres in diameter (PM10 levels).
pmdat
pmdat
A data frame containing 2922 rows and 3 columns. The column names are 'day' (day of observation, 1 to 2922), 'asma' (number of asthma admissions on that day), and 'pm10' (PM10 level on that day). The data are in time series format, for SCCS analysis using generalised linear models.
Farrington, C. P. and Whitaker H. J. (2006). Semiparametric analysis of case series data (with discussion). Journal of the Royal Statistical Society, Series C 55 (5), 553-594.
This function fits the starndard SCCS model where the exposures are measured on a continuous scale.
quantsccs(formula, indiv, event, data)
quantsccs(formula, indiv, event, data)
formula |
model formula. The dependent variale should always be "event" e.g. event ~ expo, where expo is an exposure measured at each time unit. |
indiv |
a vector of individual identifiers of cases |
event |
number of events occuring at each time unit. |
data |
a data frame containing the input data. Data are assembled one line per time unit of observation. |
In this method exposures are measured at successive time points within the observation period for each case. And number of events experienced by each case at each time point are recorded.
Relative incidence estimates along with their 95% confidence intervals.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# Headaches and blood pressure data. sys and dia (continuoous exposures) are systolic and # diastolic blood pressure measurements respectively bp.mod <- quantsccs(event~sys+dia, indiv=case, event=head, data=bpdat) bp.mod
# Headaches and blood pressure data. sys and dia (continuoous exposures) are systolic and # diastolic blood pressure measurements respectively bp.mod <- quantsccs(event~sys+dia, indiv=case, event=head, data=bpdat) bp.mod
The data comprise ages in days at rotavirus vaccine (RV) and first symptoms of intussusception in children between the ages of 42 and 183 days. There are 566 cases. Ages have been jittered.
rotdat
rotdat
A data frame containing 566 rows and 6 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the observation period), 'intus' (age at first symptom of intussusception), 'rv' (age at first dose of RV), 'rvd2' (age at second dose of RV).
Stowe J., Andrews, N., Ladhani, S., and Miller E. (2016). The risk of intussusception following monovalent rotavirus vaccination in England: A self-controlled case-series evaluation. Vaccine 34, 3684-3689.
The data are a weekly time series of counts of respiratory syncytial virus (RSV) isolates from England and Wales and average temperatures (in degrees Celsius) in Central England.
rsvdat
rsvdat
A data frame containing 364 rows and 5 columns. The column names are 'year' (year of report, 1996 to 2003), 'week' (week of the year, numbered 1 to 52), 'win' (4-week window, numbered 1 to 91), 'temp' (average daily temperature for the week prior to the current week), and 'rsv' (count of RSV isolates for the current week). The data are in time series format, for SCCS analysis using generalised linear models.
Whitaker, H. J., Hocine, N. and Farrington C. P. (2007). On case-crossover methods for environmental time series data. Environmetrics 18 (2), 157-171.
The function calculates the sample size required for an SCCS analysis.
samplesize(eexpo, risk, astart, aend, p, alpha=0.05, power=0.8, eage=NULL, agegrp=NULL)
samplesize(eexpo, risk, astart, aend, p, alpha=0.05, power=0.8, eage=NULL, agegrp=NULL)
eexpo |
the design value of the exposure related relative incidence. |
risk |
a positive number showing the duration the risk period. It should not be greater than the duration of the shortest age group, if age groups are specified. |
astart |
age at start of the observation period. It is an integer greater or equal to 0, that is the same start of observation for all cases. |
aend |
age at end of an observation period. It is an integer greater than 0, that is the same end of observation for all cases. |
p |
a vector/scalar of proportions of exposed in each age group, the default is p=1 where there are no age effects and all cases are exposed. |
alpha |
level of significance e.g 0.05. |
power |
the power required to detect the relative incidence specified in |
eage |
age related relative incidence parameters, the reference group is the first one. |
agegrp |
a vector of cut points of the age groups where each value represents the start of an age category. The first element in the vector is the start of the second age group. The first age group starts at the minimum of |
a sample size is produced.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Musonda, P., Farrington, C. P., and Whitaker, H. (2006). Sample sizes for self-controlled case series studies. Statistics in Medicine 25, 2618–2631.
Farrington P., Whitaker H., and Ghebremichael-Weldeselassie Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# Sample size for exposure RI = 2.5 with 21 days risk period, # all cases exposed. The level of significance is 0.05 # with 80% power. The sample size of when p=1 is: ss1 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=1, alpha=0.05, power=0.8) ss1 # the sample size of events from cases exposed or not when 75% of the # population are exposed ss2 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=0.75, alpha=0.05, power=0.8) ss2 # Sample size when age effect is included and the proportions of the # target exposed population which are exposed in each age group # are p=c(0.50,0.35,0.1,0.05): ss3 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8, eage=c(1.2,1.6,2.0), agegrp=c(457,548,639)) ss3 # Suppose that the sample is from the entire population with 75% exposed, # then p=0.75*c(0.50,0.35,0.1,0.05) ss4 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=0.75*c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8, eage=c(1.2,1.6,2.0), agegrp=c(457,548,639)) ss4
# Sample size for exposure RI = 2.5 with 21 days risk period, # all cases exposed. The level of significance is 0.05 # with 80% power. The sample size of when p=1 is: ss1 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=1, alpha=0.05, power=0.8) ss1 # the sample size of events from cases exposed or not when 75% of the # population are exposed ss2 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=0.75, alpha=0.05, power=0.8) ss2 # Sample size when age effect is included and the proportions of the # target exposed population which are exposed in each age group # are p=c(0.50,0.35,0.1,0.05): ss3 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8, eage=c(1.2,1.6,2.0), agegrp=c(457,548,639)) ss3 # Suppose that the sample is from the entire population with 75% exposed, # then p=0.75*c(0.50,0.35,0.1,0.05) ss4 <- samplesize(eexpo=2.5, risk=21, astart=366, aend=730, p=0.75*c(0.50,0.35,0.1,0.05), alpha=0.05, power=0.8, eage=c(1.2,1.6,2.0), agegrp=c(457,548,639)) ss4
The function fits the semiparametric self-controlled case series method where the age effect is left unspecified, as published in Farrington and Whitaker (2006).
semisccs(formula, indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), dataformat="stack", data)
semisccs(formula, indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), dataformat="stack", data)
formula |
model formula. The dependent variable should always be "event" e.g. event ~ itp. There is no need to specify age in the model formula. |
indiv |
a vector of individual identifiers of cases. |
astart |
a vector of ages at which the observation periods start. |
aend |
a vector of ages at end of observation periods. |
aevent |
a vector of ages at event, an individual can experience multiple events. |
adrug |
a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( |
aedrug |
a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of |
expogrp |
list of vectors of days to the start of exposure-related risk, counted from |
washout |
list of vectors with days to start of washout periods counted from |
sameexpopar |
a vector of logical values. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses of the same exposure type, presented in |
dataformat |
the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same type are recorded as separate columns in the dataframe. |
data |
a data frame containing the input data. The data should be in 'stack' or 'multi' (see |
In the standard SCCS method both age and exposure effects are modelled using step functions. However, mis-specification of age groups in the standard SCCS may lead to bias in the exposure related relative incidence estimates. In the semiparametric SCCS no age groups are pre-specified. A parameter for each day an event occurred is fitted, which means that this method is only suitable for small to medium sized data sets. An alternative for large data sets is provided by smoothagesccs
.
The function returns age and exposure related relative incidence estimates along with 95% confidence limits.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington, C. P., Whitaker, H. J. (2006). Semiparametric analysis of case series data. Applied Statistics, 55(5): 553–594.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
standardsccs
, smoothagesccs
, smoothexposccs
# Example 1 # Semiparametric model for the ITP and MMR vaccine data itp.mod1 <- semisccs(event~mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.mod1 # Example 2 # Data on itp and mmr vaccine # Sex and mmr interaction included itp.mod2 <- semisccs(event~factor(sex)*mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.mod2
# Example 1 # Semiparametric model for the ITP and MMR vaccine data itp.mod1 <- semisccs(event~mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.mod1 # Example 2 # Data on itp and mmr vaccine # Sex and mmr interaction included itp.mod2 <- semisccs(event~factor(sex)*mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), data=itpdat) itp.mod2
These simulated data comprise ages in days at hexavalent vaccination and SIDS in 300 cases.
siddat
siddat
A data frame containing 300 rows and 5 columns. The column names are 'case' (individual identifier), 'sta' (age on first day of the observation period), 'end' (age on last day of the nominal observation period), 'sids' (age at SIDS), 'hex' (age at first hexavalent vaccination), 'hexd2' (age at second dose), 'hexd3' (age at third dose).
Kuhnert R., Hecker, H., Poethko-Muller, C., Schlaud, M., Vennemann, M., Whitaker, H.J., and Farrington C. P. (2011). A modified self-controlled case series method to examine association between multidose vaccinations and death. Statistics in Medicine 30, 666-677.
This function creates a simulated SCCS data set with given design parameters, and can be used to generate cases with observation and risk periods of different durations, multiple risk periods, repeated exposures, and washout periods.
simulatesccsdata(nindivs, astart, aend, adrug, aedrug, expogrp=c(0), eexpo, washout=NULL, ewashout=NULL, agegrp=NULL, eage=NULL)
simulatesccsdata(nindivs, astart, aend, adrug, aedrug, expogrp=c(0), eexpo, washout=NULL, ewashout=NULL, agegrp=NULL, eage=NULL)
nindivs |
a positive integer: number of cases to be generated (1 event per case). |
astart |
age at start of an observation period. It is a single number if the same start of observation for all cases is required or a vector of length equal to |
aend |
age at end of the observation period. A single number for the same end of observation periods for all cases or a vector to allow for different end of observation periods. |
adrug |
a vector (of length |
aedrug |
a vector of ages at which exposure-related risk ends or a matrix if there are multiple exposures. The number of columns of |
expogrp |
a vectors of days to the start of exposure-related risk, counted from |
eexpo |
a vector of exposure-related relative incidences. |
washout |
a vector of days to start of washout periods counted from |
ewashout |
a vector of true relative incidence values associated with washout periods; it defaults to NULL when washout=NULL. |
agegrp |
cut points of age groups, defaults to NULL (i.e no age effect included). These are given as the day of an age category starts, the first age category starts at the minimum of |
eage |
a vector of age-related relative incidences. The default is NULL where there is no age effect i.e agegrp = NULL. If age-specific relative incidences are from a continuous function |
The true relative incidences related to age and exposure could be generated from discrete or continuous distributions.
A data frame with columns "indiv" = individual identifier, "astart" = age on the day observation period starts, "adrug" = age on the day exposure starts, "aedrug" = age at the end of exposure related risk period, "aend" = age at the end of observation period, and "aevent" = age on the day of outcome event.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# Simulate data where all the cases have same start and end of # observation periods and no age effect set.seed(4321) arisk <- round(runif(110,366,730)) # ages at start of exposure simdata <- simulatesccsdata(nindivs=110, astart=366, aend=730, adrug=arisk, aedrug=arisk+20, eexpo=2.5) simdata
# Simulate data where all the cases have same start and end of # observation periods and no age effect set.seed(4321) arisk <- round(runif(110,366,730)) # ages at start of exposure simdata <- simulatesccsdata(nindivs=110, astart=366, aend=730, adrug=arisk, aedrug=arisk+20, eexpo=2.5) simdata
Fits a semiparametric SCCS model with smooth age effect, where the age related relative incidence function is represented by spline function; that is, linear combinations of M-splines. The exposure related relative incidence function is represented by step functions. One exposure group can be included.
smoothagesccs(indiv, astart, aend, aevent, adrug, aedrug, expogrp = 0, washout = NULL, kn=12, sp = NULL, data)
smoothagesccs(indiv, astart, aend, aevent, adrug, aedrug, expogrp = 0, washout = NULL, kn=12, sp = NULL, data)
indiv |
a vector of individual identifiers of cases. |
astart |
a vector of ages at which observation periods start. |
aend |
a vector of ages at end of observation periods. |
aevent |
a vector of ages at event, an individual can experience multiple events. |
adrug |
a vector of ages at which exposure starts, only a single exposure type can be included. |
aedrug |
a vector of ages at which the exposure-related risk periods end. |
expogrp |
a vector of days to the start of exposure-related risk, counted from |
washout |
a vector of days to start of washout periods counted from |
kn |
an integer >=5 representing the number of interior knots used to define the M-spline basis functions which are related to the age specific relative incidence function, usually between 8 and 12 knots is sufficient. It defaults to 12 knots. |
sp |
smoothing parameter value. It defaults to "auto" where the smoothing paramter is obtained automatically using a cross-validation method. The value of "sp" must be a number greater or equal to 0. |
data |
a data frame containing the input data. The data are assembled one line per event. |
The standard SCCS represents the age and exposure effects by piecewise constant step functions, however mis-specification of age group cut points might lead to biased estimates of the exposure related relative incidences. The semiparametric SCCS model, semisccs
, has numerical challenges when the number of cases is large. This splined-based semiparametric SCCS model with smooth age effect avoids these limitations of the standard and semiparametric SCCS models. The smoothing parameter for the age-related relative incidence function is chosen by an approximate cross-validation method. The method is outlined in Ghebremichael-Weldeselassie et al (2014).
Relative incidence estimates along with their 95% confidence limits.
coef |
log of the exposure related relative incidence estimates. |
se |
standard errors of the log of exposure related relative incidence estimates. |
age |
age related relative incidences at each day between the minimum age at start of observation and maximum age at end of observation periods. |
ageaxis |
sequence of ages between the minimum age at start of observations and maximum age at end of observation periods corresponding to the age related relative incidences. |
smoothingpara |
smoothing parameter chosen by maximizing an approximate cross-validation score or given as an argument in the function |
cv |
cross-validation score |
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Self-controlled case series method with smooth age effect. Statistics in Medicine, 33(4), 639-649.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# Fit the SCCS model with smooth age effect to the itp data and plot age effect. itp.mod <- smoothagesccs(indiv=case, astart=sta,aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), sp=2800, data=itpdat) itp.mod plot(itp.mod)
# Fit the SCCS model with smooth age effect to the itp data and plot age effect. itp.mod <- smoothagesccs(indiv=case, astart=sta,aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), sp=2800, data=itpdat) itp.mod plot(itp.mod)
Fits a spline-based SCCS model where the exposure-related relative incidence function is represented by a spline function, that is a linear combination of M-splines, and the age effects are represented by a piecewise constant function.
smoothexposccs(indiv, astart, aend, aevent, adrug, aedrug, agegrp, kn=12, sp = NULL, data)
smoothexposccs(indiv, astart, aend, aevent, adrug, aedrug, agegrp, kn=12, sp = NULL, data)
indiv |
a vector of individual identifiers of cases. |
astart |
a vector of ages at which the observation periods start. |
aend |
a vector of ages at end of observation periods. |
aevent |
a vector of ages at event (outcome of interest), an individual can experience multiple events. |
adrug |
a vector of ages at which exposure related risk period starts. |
aedrug |
a vector of ages at which exposure related risk period ends. |
agegrp |
a vector of cut points for the age groups where each value represents the start of an age catagory. The first element in the vector is the start of the second age group. The first age group starts at the minimum of |
kn |
number of interior knots >=5 used to define the M-spline basis functions, usually between 8 and 12 knots is sufficient. The default is 12. |
sp |
smoothing parameter value. It defaults to "auto" where the smoothing paramter is obtained automatically using a cross validation method. The value of "sp" must be a number greater or equal to 0. |
data |
a data frame containing the input data. |
The standardsccs
, semisccs
and smoothagesccs
use piecewise constant step functions to model the exposure effect. However mis-specification of exposure group cut points might result in biased estimates. This method represents exposure related relative incidence function by a spline function.
Relative incidence estimates along with their 95% confidence limits. Varaince-covariance matrix can also be obtained.
estimates |
exposure related relative incidence estimates at each point of time since start of exposure until the maximum duration of exposure. |
lci |
lower confidence limits of the exposure-related relative incidence estimates. |
uci |
upper confidence limits of the exposure-related relative incidence estimates. |
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Ghebremichael-Weldeselassie, Y., Whitaker, H. J., Farrington, C. P. (2015). Flexible modelling of vaccine effects in self-controlled case series models 25, 1768–1797.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series Studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
library(SCCS) # Fit smooth exposure SCCS to MMR vaccine and itp itp.mod1 <- smoothexposccs(sp=10, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427, 488, 549, 610, 671), data=itpdat) itp.mod1 plot(itp.mod1)
library(SCCS) # Fit smooth exposure SCCS to MMR vaccine and itp itp.mod1 <- smoothexposccs(sp=10, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427, 488, 549, 610, 671), data=itpdat) itp.mod1 plot(itp.mod1)
Fits the standard SCCS model where age and exposure effects are represented by piecewise constant functions.
standardsccs(formula, indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, seasongrp=NULL, dob=NULL, dataformat="stack", data)
standardsccs(formula, indiv, astart, aend, aevent, adrug, aedrug, expogrp = list(), washout = list(), sameexpopar = list(), agegrp = NULL, seasongrp=NULL, dob=NULL, dataformat="stack", data)
formula |
model formula. The dependent variable should always be "event" e.g. event ~ itp. If age and/or season effects are included, they should always be included as 'age' and 'season', e.g. event ~ itp + age. |
indiv |
a vector of individual identifiers of cases. |
astart |
a vector of ages at which the observation periods start. |
aend |
a vector of ages at end of observation periods. |
aevent |
a vector of ages at event; an individual can experience multiple events |
adrug |
a list of vectors of ages at start of exposures or a list of matrices if the exposures have multiple episodes ( |
aedrug |
a list of vectors of ages at which exposure-related risk ends or a list of matrices if there are multiple episodes (repeat exposures in different columns) of the same exposure type. The dimension of each item of |
expogrp |
list of vectors of days to the start of exposure-related risk, counted from |
washout |
list of vectors with days to start of washout periods counted from |
sameexpopar |
a vector of logical values. If TRUE (the default) no dose effect is assumed: the same exposure parameters are used for multiple doses/episodes of the same exposure type, presented in |
agegrp |
a vector of cut points of the age groups where each value represents the start of an age category. The first element in the vector is the start of the second age group. The first age group starts at the minimum of |
seasongrp |
a vector of cut points for seasonal effects. The values should be given in ddmm format, representing the first days of each season group. The seasonal effect is a factor, the reference level being the time interval starting at the earliest date in |
dob |
a vector of birth dates of the cases, in ddmmyyyy format. They are used if seasonal effects are included in the model. The default |
dataformat |
the way the input data are assembled. It accepts "multi" or "stack" (the default), where "multi" refers to a data assembled with one row representing one event and "stack" refers to a data frame where repeated exposures of the same type are stack in one column. In the "multi" dataformat different episodes of the same exposure type are recorded as separate columns in the dataframe. |
data |
a data frame containing the input data. The data should be in 'stack' or 'multi' (see |
In the standard SCCS model, originally described in Farrington (1995), age and exposure effects are represented by step functions. Suppose that individual has
events,
occurring in age group
and exposure group
and that the
event falls within age group
and exposure group
. The SCCS likelihood contribution for individual
is
This is a multinomial likelihood with index , responses
and probabilities
The standard SCCS likelihood is equivalent to a product multinomial likelihood.
The SCCS likelihood is equivalent to that of the conditional logistic model for matched case-control studies: see Breslow and Day (1980), Chapter 7. This means that SCCS models can be fit using conditional logistic regression, with a factor for each individual event.
The SCCS R package maximises the likelihood using clogit. Each event is assigned an identifier (indivL), and direct estimation is avoided using the option strata(indivL).
Relative incidence estimates along with their 95% confidence limits.
Yonas Ghebremichael-Weldeselassie, Heather Whitaker, Paddy Farrington.
Farrington, C.P. (1995). Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51, 228–235.
Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, volume I: The analysis of case-control studies. IARC Publications No.32.
Farrington, P., Whitaker, H., and Ghebremichael-Weldeselassie, Y. (2018). Self-controlled Case Series studies: A modelling Guide with R. Boca Raton: Chapman & Hall/CRC Press.
# Single exposure-related risk period with no age effect itp.mod1 <- standardsccs(event~mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, data=itpdat) itp.mod1 # Single exposure-related risk period and age effect included itp.mod2 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427,488,549,610,671), data=itpdat) itp.mod2 # Multiple risk periods and age effect included itp.mod3 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod3 # Multiple risk periods, washout periods and age effects ageq <- floor(quantile(hipdat$frac, seq(0.05,0.95,0.05), names=FALSE)) # Age group # cut points hip.mod1 <- standardsccs(event~ad+age, indiv=case, astart=sta, aend=end, aevent=frac, adrug=ad, aedrug=endad, expogrp=c(0,15,43), washout=c(1,92,182), agegrp=ageq, data=hipdat) # Multiple/repeat exposures of the same exposure type, dataformat="stack" ageq <- floor(quantile(gidat$bleed[duplicated(gidat$case)==0], seq(0.025,0.975,0.025), names=FALSE)) gi.mod1 <- standardsccs(event~ns+relevel(age,ref=21), indiv=case, astart=sta, aend=end, aevent=bleed, adrug=ns, aedrug=endns, agegrp=ageq, dataformat="stack", data=gidat) gi.mod1 # Multiple doses of a vaccine each with different parameter estimates (sameexpopar=F) ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points dtp.mod2 <- standardsccs(event~dtp+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=cbind(dtp,dtpd2, dtpd3), aedrug=cbind(dtp+14,dtpd2+14,dtpd3+14), expogrp=c(0,4,8),agegrp=ageg, dataformat="multi", sameexpopar=FALSE, data=dtpdat) dtp.mod2 # Multiple exposure types ageg <- seq(387,707,20) # Age group cut points con.mod <- standardsccs(event~hib+mmr+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=cbind(hib,mmr), aedrug=cbind(hib+14,mmr+14), expogrp=list(c(0,8), c(0,8)), agegrp=ageg, data=condat) con.mod # Multiple doses/episodes of several exposure types, the doses of each exposure type # have same paramter ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points hib.mod1 <- standardsccs(event~dtp+hib+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=list(cbind(dtp, dtpd2,dtpd3), cbind(hib,hibd2,hibd3)), aedrug=list(cbind(dtp+14,dtpd2+14,dtpd3+14), cbind(hib+14,hibd2+14,hibd3+14)), expogrp=list(c(0,4,8),c(0,8)),agegrp=ageg, dataformat="multi", data=hibdat) hib.mod1 # Multiple doses/episodes of several exposure types, the doses of "dtp" # different parameters and the doses of the second exposure hib have # same paramters ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points hib.mod2 <- standardsccs(event~dtp+hib+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=list(cbind(dtp,dtpd2,dtpd3), cbind(hib,hibd2,hibd3)), aedrug=list(cbind(dtp+3,dtpd2+3,dtpd3+3), cbind(hib+7,hibd2+7,hibd3+7)), sameexpopar=c(FALSE,TRUE), agegrp=ageg, dataformat="multi", data=hibdat) hib.mod2 # Season included in a model month <- c(0101,0102,0103,0104,0105,0106,0107,0108,0109,0110,0111,0112) # season cutpoints int.mod <- standardsccs(event~opv+age+season, indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(opv,opvd2), aedrug=cbind(opv+42,opvd2+42), expogrp=c(0,15,29), agegrp=seq(30,330,30), seasongrp=month,dob=dob, dataformat="multi", data=intdat) int.mod
# Single exposure-related risk period with no age effect itp.mod1 <- standardsccs(event~mmr, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, data=itpdat) itp.mod1 # Single exposure-related risk period and age effect included itp.mod2 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, agegrp=c(427,488,549,610,671), data=itpdat) itp.mod2 # Multiple risk periods and age effect included itp.mod3 <- standardsccs(event~mmr+age, indiv=case, astart=sta, aend=end, aevent=itp, adrug=mmr, aedrug=mmr+42, expogrp=c(0,15,29), agegrp=c(427,488,549,610,671), data=itpdat) itp.mod3 # Multiple risk periods, washout periods and age effects ageq <- floor(quantile(hipdat$frac, seq(0.05,0.95,0.05), names=FALSE)) # Age group # cut points hip.mod1 <- standardsccs(event~ad+age, indiv=case, astart=sta, aend=end, aevent=frac, adrug=ad, aedrug=endad, expogrp=c(0,15,43), washout=c(1,92,182), agegrp=ageq, data=hipdat) # Multiple/repeat exposures of the same exposure type, dataformat="stack" ageq <- floor(quantile(gidat$bleed[duplicated(gidat$case)==0], seq(0.025,0.975,0.025), names=FALSE)) gi.mod1 <- standardsccs(event~ns+relevel(age,ref=21), indiv=case, astart=sta, aend=end, aevent=bleed, adrug=ns, aedrug=endns, agegrp=ageq, dataformat="stack", data=gidat) gi.mod1 # Multiple doses of a vaccine each with different parameter estimates (sameexpopar=F) ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points dtp.mod2 <- standardsccs(event~dtp+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=cbind(dtp,dtpd2, dtpd3), aedrug=cbind(dtp+14,dtpd2+14,dtpd3+14), expogrp=c(0,4,8),agegrp=ageg, dataformat="multi", sameexpopar=FALSE, data=dtpdat) dtp.mod2 # Multiple exposure types ageg <- seq(387,707,20) # Age group cut points con.mod <- standardsccs(event~hib+mmr+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=cbind(hib,mmr), aedrug=cbind(hib+14,mmr+14), expogrp=list(c(0,8), c(0,8)), agegrp=ageg, data=condat) con.mod # Multiple doses/episodes of several exposure types, the doses of each exposure type # have same paramter ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points hib.mod1 <- standardsccs(event~dtp+hib+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=list(cbind(dtp, dtpd2,dtpd3), cbind(hib,hibd2,hibd3)), aedrug=list(cbind(dtp+14,dtpd2+14,dtpd3+14), cbind(hib+14,hibd2+14,hibd3+14)), expogrp=list(c(0,4,8),c(0,8)),agegrp=ageg, dataformat="multi", data=hibdat) hib.mod1 # Multiple doses/episodes of several exposure types, the doses of "dtp" # different parameters and the doses of the second exposure hib have # same paramters ageg <- c(57,85,113,141,169,197,225,253,281,309,337) # age group cut points hib.mod2 <- standardsccs(event~dtp+hib+age, indiv=case, astart=sta, aend=end, aevent=conv, adrug=list(cbind(dtp,dtpd2,dtpd3), cbind(hib,hibd2,hibd3)), aedrug=list(cbind(dtp+3,dtpd2+3,dtpd3+3), cbind(hib+7,hibd2+7,hibd3+7)), sameexpopar=c(FALSE,TRUE), agegrp=ageg, dataformat="multi", data=hibdat) hib.mod2 # Season included in a model month <- c(0101,0102,0103,0104,0105,0106,0107,0108,0109,0110,0111,0112) # season cutpoints int.mod <- standardsccs(event~opv+age+season, indiv=case, astart=sta, aend=end, aevent=intus, adrug=cbind(opv,opvd2), aedrug=cbind(opv+42,opvd2+42), expogrp=c(0,15,29), agegrp=seq(30,330,30), seasongrp=month,dob=dob, dataformat="multi", data=intdat) int.mod