Title: | Family Age-at-Onset Data Simulation and Penetrance Estimation |
---|---|
Description: | Simulates age-at-onset traits associated with a segregating major gene in family data obtained from population-based, clinic-based, or multi-stage designs. Appropriate ascertainment correction is utilized to estimate age-dependent penetrance functions either parametrically from the fitted model or nonparametrically from the data. The Expectation and Maximization algorithm can infer missing genotypes and carrier probabilities estimated from family's genotype and phenotype information or from a fitted model. Plot functions include pedigrees of simulated families and predicted penetrance curves based on specified parameter values. For more information see Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. |
Authors: | Yun-Hee Choi, Karen Kopciuk, Wenqing He, Laurent Briollais |
Maintainer: | Yun-Hee Choi <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 3.2 |
Built: | 2024-10-31 06:35:13 UTC |
Source: | CRAN |
Family-based studies are used to characterize the disease risk associated
with being a carrier of a major gene. When the disease risk can vary with
age of onset, penetrance or disease risk functions need to provide age-dependent
estimates of this disease risk over lifetime.
This FamEvent package can generate age-at-onset data in the context of familial studies,
with correction for ascertainment (selection) bias arising from a specified
study design based on proband's mutation and disease statuses.
Possible study designs are: "pop"
for population-based design where
families are ascertained through affected probands, "pop+"
are similar to
"pop"
but probands are also known mutation carriers, "cli"
for
clinic-based design that includes affected probands with at least one
parent and one sib affected, "cli+"
are similar to "cli"
but probands are
also known mutation carrriers. And "twostage"
for two-stage design that randomly samples families from
the population in the first stage and oversamples high risk families that includes at least two
affected members in the family at the second stage.
Ages at disease onset are generated specific to family members' gender and mutation status according to the specified study design with residual familial correlations induced by a shared frailty, correlated frailties with Kinship matrix, or a second gene.
For estimating age at onset risks with family data, an ascertainment corrected prospective likelihood approach is used to account for the population or clinic-based study designs while a composite likelihood approach is used for the two-stage sampling design. The Expectation and Maximization (EM) algorithm has been implemented for inferring missing genotypes conditional on observed genotypes and phenotypes in the families. For family members who have missing genotypes, their carrier probabilities are obtained either from the fitted model or from Mendelian transmission probabilities.
This package also provides functions to plot the age-dependent penetrance curves estimated parametrically from the fitted model or non-parametrically from the data, pedigree plots of simulated families and penetrance function curves for carriers and non-carriers of a major and second gene based on specified parameter values.
In summary, this package facilitates the following:
1. Family data generations from 1) a shared frailty model or 2) a correlated frailty model with Kinship and/or IBD matrix; 3) a competing risk model.
2. Covariates considered: sex, gene, options for additional covariates.
3. Time-varying covariate generation based on permanent exposure (PE) or the Cox-Oaks model (CO).
3. Model estimation for shared frailty models and competing risk models.
Yun-Hee Choi, Karen Kopciuk, Laurent Briollais, Wenqing He
Maintainer: Yun-Hee Choi < [email protected] >
Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
simfam
, summary.simfam
, plot.simfam
,
simfam2
, summary.simfam2
, plot.simfam2
,
simfam_tvc
, summary.simfam_tvc
,
plot.simfam_tvc
,
penplot
, carrierprob
, penmodel
, penmodelEM
,
print.penmodel
, summary.penmodel
,print.summary.penmodel
, plot.penmodel
,
simfam_cmp
, summary.simfam_cmp
, plot.simfam_cmp
, penplot_cmp
, penmodel_cmp
,
print.penmodel_cmp
, summary.penmodel_cmp
,
print.summary.penmodel_cmp
, plot.penmodel_cmp
## Not run: # Example1: Simulate family data set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), allelefreq = 0.02) # summary of simulated family data summary(fam) # Pedigree plots for family 1 and 2 plot(fam, famid = c(1,2)) # penetrance function plots given model parameter values for Weibull baseline penplot(base.parms = c(0.01, 3), vbeta = c(-1.3, 2.35), base.dist = "Weibull", variation = "none", agemin = 20) # model fit of family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms=c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE) # summary of estimated model parameters and penetrance estimates summary(fit) # penetrance curves useful for model checking plot(fit) ## Example 2: Simulate family data from a correlated frailty model # with Kinship and IBD matrices given pedigree data. # Inputdata and IBD matrix should be provided; # Inputdata was generated as an example using simfam. data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) summary(fam2) ### Example 3: Simulate correlated competing risks family data set.seed(4321) fam3 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) # summary of simulated family data summary(fam3) # Pedigree plots for family 1 plot(fam3, famid = 1) # penetrance function plot for event 1 given model parameter values for Weibull baseline penplot_cmp(event = 1, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.3, 2.35), c(-1, 2)), base.dist = "Weibull", variation = "frailty", frailty.dist = "cgamma", depend=c(1,2,0.5), agemin = 20) # Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data fit3 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)), base.dist = "Weibull", frailty.dist = "cgamma", data = fam2, robust = TRUE) # Summary of the model parameter estimates from the model fit summary(fit3) # Plot the lifetime penetrance curves with 95 # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit3, add.CIF = TRUE, conf.int = TRUE, MC = 100) ### Example 4: Simulate family data with a time-varying covariate set.seed(4321) fam4 <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam4) ## End(Not run)
## Not run: # Example1: Simulate family data set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), allelefreq = 0.02) # summary of simulated family data summary(fam) # Pedigree plots for family 1 and 2 plot(fam, famid = c(1,2)) # penetrance function plots given model parameter values for Weibull baseline penplot(base.parms = c(0.01, 3), vbeta = c(-1.3, 2.35), base.dist = "Weibull", variation = "none", agemin = 20) # model fit of family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms=c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE) # summary of estimated model parameters and penetrance estimates summary(fit) # penetrance curves useful for model checking plot(fit) ## Example 2: Simulate family data from a correlated frailty model # with Kinship and IBD matrices given pedigree data. # Inputdata and IBD matrix should be provided; # Inputdata was generated as an example using simfam. data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) summary(fam2) ### Example 3: Simulate correlated competing risks family data set.seed(4321) fam3 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) # summary of simulated family data summary(fam3) # Pedigree plots for family 1 plot(fam3, famid = 1) # penetrance function plot for event 1 given model parameter values for Weibull baseline penplot_cmp(event = 1, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.3, 2.35), c(-1, 2)), base.dist = "Weibull", variation = "frailty", frailty.dist = "cgamma", depend=c(1,2,0.5), agemin = 20) # Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data fit3 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)), base.dist = "Weibull", frailty.dist = "cgamma", data = fam2, robust = TRUE) # Summary of the model parameter estimates from the model fit summary(fit3) # Plot the lifetime penetrance curves with 95 # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit3, add.CIF = TRUE, conf.int = TRUE, MC = 100) ### Example 4: Simulate family data with a time-varying covariate set.seed(4321) fam4 <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam4) ## End(Not run)
Computes model- or data-based carrier probabilities for individuals with missing genotypes based on the observed mutation status of family members and the individual's phenotype.
carrierprob(condition = "geno", method = "data", fit = NULL, data, mode = "dominant", q = 0.02)
carrierprob(condition = "geno", method = "data", fit = NULL, data, mode = "dominant", q = 0.02)
condition |
Choice of conditional information to use for computing the carrier probability. Possible choices are |
method |
Choice of methods to calculate the carrier probability. Possible choices are |
fit |
An object of class |
data |
Family data that includes missing genotypes using the same data format generated by the function |
mode |
Choice of modes of inheritance when using |
q |
Frequency of the disease causing allele when using |
When method="model"
along with the choice of condition="geno+pheno"
, the carrier probability for individual is calculated by conditioning on her/his observed phenotype and carrier statuses of family members
P(Xi = 1 | Yi , Xo ) =
P(Yi | Xi = 1) * P(Xi = 1 | Xo) /
(P(Yi | Xi = 1) * P(Xi = 1 | Xo) + P(Yi | Xi = 0) * P(Xi = 0 | Xo))
where Xi indicates the unknown carrier status of individual
and Xo represents the observed carrier statuses in his or her family members; Yi represents the observed phenotype ti, δi of individual
in terms of age at onset ti and disease status indicator δi with 1 used for affected individuals and 0 for unaffected individuals.
When
method="mendelian"
along with the choice of condition="geno"
, the carrier probability is calculated based on Mendelian laws of genetic transmission with a fixed allele frequency.
Returns a data frame with a vector of carrier probabilities called carrp.geno
when condition="geno"
or carrp.pheno
when condtion="geno+pheno"
added after the last column of the family data.
Yun-Hee Choi
simfam, penmodelEM, plot.simfam, summary.simfam
# Simulated family data with 30% of members missing their genetic information. set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", mrate = 0.3, base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), agemin = 20) # EM algorithm for fitting family data with missing genotypes assuming a Weibull # baseline hazard and dominant mode of Mendelian inheritance for a major gene. fitEM <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", parms = c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull", method = "mendelian", mode = "dominant") # Carrier probability obtained by conditioning on the observed genotypes and phenotype, # assuming a dominant Mendelian mode of inheritance fam.added <- carrierprob(condition = "geno+pheno", method = "model", fit = fitEM, data = fam, mode = "dominant", q = 0.02) # pedigree plot for family 1 displaying carrier probabilities plot.simfam(fam.added, famid = 1)
# Simulated family data with 30% of members missing their genetic information. set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", mrate = 0.3, base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), agemin = 20) # EM algorithm for fitting family data with missing genotypes assuming a Weibull # baseline hazard and dominant mode of Mendelian inheritance for a major gene. fitEM <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", parms = c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull", method = "mendelian", mode = "dominant") # Carrier probability obtained by conditioning on the observed genotypes and phenotype, # assuming a dominant Mendelian mode of inheritance fam.added <- carrierprob(condition = "geno+pheno", method = "model", fit = fitEM, data = fam, mode = "dominant", q = 0.02) # pedigree plot for family 1 displaying carrier probabilities plot.simfam(fam.added, famid = 1)
Computes the power of detecting genetic effect in the penetrance model based on a family-based simulation study.
fampower(N.fam, N.sim, effectsize, beta.sex, alpha = 0.05, side = 2, design = "pop", variation = "none", interaction = FALSE, depend = NULL, base.dist = "Weibull", frailty.dist = NULL, base.parms, allelefreq = c(0.02, 0.2), dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
fampower(N.fam, N.sim, effectsize, beta.sex, alpha = 0.05, side = 2, design = "pop", variation = "none", interaction = FALSE, depend = NULL, base.dist = "Weibull", frailty.dist = NULL, base.parms, allelefreq = c(0.02, 0.2), dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
N.fam |
Number of families to generate. |
N.sim |
Number of simulations. |
effectsize |
Effect size of the major mutated gene ( |
beta.sex |
Gender effect that is fixed in the model. |
alpha |
Significance level. Default value is 0.05. |
side |
Number of sides for the alternative hypothesis. Possible choices are 1 for one-sided test and 2 for two-sided test. Default value is 2. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
depend |
Variance of the frailty distribution. Dependence within families increases with depend value. Default value is |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for baseline hazard function.
|
allelefreq |
Vector of population allele frequencies of major and second disease gene alleles. Frequencies must be between 0 and 1. Default frequencies are 0.02 for major gene allele and 0.2 for second gene allele, |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
The power of testing vs.
effectsize
is obtained by the proportion of times the null hypothesis is rejected out of the N.sim
simulations.
When interaction = TRUE
, the powers of both the main effect of mutated gend and the interaction effect of mutated gene and gender will be computed.
Returns
power |
Power of detecting the genetic effect. |
Yun-Hee Choi
## Example 1: obtain the power for testing the genetic effect # based on 50 POP families simulated using 100 simulations ## Not run: set.seed(4321) fampower(N.fam = 50, N.sim = 100, effectsize = 1, beta.sex = 0.8, alpha = 0.05, side = 2, design = "pop+", variation = "none", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3)) ## End(Not run) ## Example 2: obtain the power for both the main and interaction effects # based on 50 POP families simulated using 100 simulations ## Not run: set.seed(4321) fampower(N.fam = 50, N.sim = 100, effectsize = c(1.5, 1), beta.sex = 0.8, alpha = 0.05, side = 2, interaction = TRUE, design = "pop+", variation = "none", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3)) ## End(Not run)
## Example 1: obtain the power for testing the genetic effect # based on 50 POP families simulated using 100 simulations ## Not run: set.seed(4321) fampower(N.fam = 50, N.sim = 100, effectsize = 1, beta.sex = 0.8, alpha = 0.05, side = 2, design = "pop+", variation = "none", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3)) ## End(Not run) ## Example 2: obtain the power for both the main and interaction effects # based on 50 POP families simulated using 100 simulations ## Not run: set.seed(4321) fampower(N.fam = 50, N.sim = 100, effectsize = c(1.5, 1), beta.sex = 0.8, alpha = 0.05, side = 2, interaction = TRUE, design = "pop+", variation = "none", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3)) ## End(Not run)
Data from 32 Lynch Syndrome families segregating mismatch repair mutations selected from the Ontario Familial Colorectal Cancer Registry that includes 765 individuals, both probands and relatives. The families were ascertained throughout affected and mutation carrier probands.
data("LSfam")
data("LSfam")
A data frame with 765 observations on the following 11 variables.
famID
Family identification (ID) numbers.
indID
Individuals ID numbers.
fatherID
Father ID numbers.
motherID
Mother ID numbers.
gender
Gender indicators: 1 for male, 0 for female.
status
Disease statuses: 1 for affected, 0 for unaffected.
time
Ages at diagnosis of colorectal cancer for the affected or ages of last follow-up for the unaffected.
currentage
Current ages in years.
mgene
MLH1 or MSH2 mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or NA
if missing.
proband
Proband indicators: 1 for proband, 0 for non-proband.
relation
Family members' relationship with the proband. Relation codes:
1 | Proband (self) |
2 | Brother or sister |
3 | Son or daughter |
4 | Parent |
5 | Nephew or niece |
6 | Spouse |
7 | Brother or sister in law |
8 | Paternal grandparent |
9 | Paternal uncle or aunt |
10 | Paternal cousin |
11 | Maternal grandparent |
12 | Maternal uncle or aunt |
13 | Maternal cousin |
14 | Son or daughter in law |
15 | Grandchild |
16 | Uncle's or aunt's spouse. |
Choi, Y.-H., Cotterchio, M., McKeown-Eyssen, G., Neerav, M., Bapat, B., Boyd, K., Gallinger, S., McLaughlin, J., Aronson, M., and Briollais, L. (2009). Penetrance of Colorectal Cancer among MLH1/ MSH2 Carriers Participating in the Colorectal Cancer Familial Registry in Ontario, Hereditary Cancer in Clinical Practice, 7:14.
data(LSfam) # Summary of LSfam summary.simfam(LSfam) # Pedigree plot for the first family plot.simfam(LSfam) # Assign minimum age for fitting penmodel attr(LSfam, "agemin") <- 18 fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", parms = c(0.05, 2, 1, 3), data = LSfam[!is.na(LSfam$time) & LSfam$time > 18, ], method = "mendelian", base.dist = "Weibull", design = "pop+", robust = TRUE) summary(fit) penetrance(fit, fixed = c(1, 1), age = c(50, 60, 70), CI = TRUE, MC = 100)
data(LSfam) # Summary of LSfam summary.simfam(LSfam) # Pedigree plot for the first family plot.simfam(LSfam) # Assign minimum age for fitting penmodel attr(LSfam, "agemin") <- 18 fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", parms = c(0.05, 2, 1, 3), data = LSfam[!is.na(LSfam$time) & LSfam$time > 18, ], method = "mendelian", base.dist = "Weibull", design = "pop+", robust = TRUE) summary(fit) penetrance(fit, fixed = c(1, 1), age = c(50, 60, 70), CI = TRUE, MC = 100)
Estimates the cumulative disease risks (penetrances) and confidence intervals at given age(s) based on the fitted penetrance model.
penetrance(fit, fixed, age, CI = TRUE, MC = 100)
penetrance(fit, fixed, age, CI = TRUE, MC = 100)
fit |
An object class of |
fixed |
Vector of fixed values of the covariates used for penetrance calculation. |
age |
Vector of ages used for penetrance calculation. |
CI |
Logical; if |
MC |
Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method.
If |
The penetrance function is defined as the probability of developing a disease by age given fixed values of covariates
,
where is greater than the minimum age t0 and
is the survival distribution based on a proportional hazards model with a specified baseline hazard distribution.
The proportional hazards model is specified as:
h(t|x) = h0(t) exp(β*x),
where h0(t) is the baseline hazards function, is the vector of covariates and β is the vector of corresponding regression coefficients.
Calculations of standard errors of the penetrance estimates and 95% confidence intervals (CIs) for the penetrance at a given age are based on Monte-Carlo simulations of the estimated penetrance model.
A multivariate normal distribution is assumed for the parameter estimates, and MC = n
sets of the parameters are generated from the multivariate normal distribution with the parameter estimates and their variance-covariance matrix.
For each simulated set, a penetrance estimate is calculated at a given age by substituting the simulated parameters into the penetrance function.
The standard error of the penetrance estimate at a given age is calculated by the standard deviation of penetrance estimates obtained from simulations.
The 95% CI for the penetrance at a given age is calculated using the 2.5th and 97.5th percentiles of the penetrance estimates obtained from simulations.
Returns the following values:
age |
Ages at which the penetrances are calculated. |
penetrance |
Penetrance estimates at given ages. |
lower |
Lower limit of the 95% confidence interval; simulation-based 2.5th percentile of the penetrance estimates. |
upper |
Upper limit of the 95% confidence interval; simulation-based 97.5th percentile of the penetrance estimates. |
se |
Simulation-based standard errors of the penetrance estimates. |
Yun-Hee Choi
set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01,3), vbeta = c(-1.13, 2.35)) fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", design = "pop+") # Compute penetrance estimates for male carriers at age 40, 50, 60, and 70 and # their 95% CIs based on 100 Monte Carlo simulations. penetrance(fit, fixed = c(1,1), age = c(40, 50, 60, 70), CI = TRUE, MC = 100)
set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01,3), vbeta = c(-1.13, 2.35)) fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", design = "pop+") # Compute penetrance estimates for male carriers at age 40, 50, 60, and 70 and # their 95% CIs based on 100 Monte Carlo simulations. penetrance(fit, fixed = c(1,1), age = c(40, 50, 60, 70), CI = TRUE, MC = 100)
Estimates the cumulative disease risks (penetrances) and confidence intervals for the event of interest in the presence of competing event given fixed values of covariates based on the fitted competing risk model.
penetrance_cmp(fit, event = 1, fixed, age, CI = TRUE, MC = 100)
penetrance_cmp(fit, event = 1, fixed, age, CI = TRUE, MC = 100)
fit |
An object class of |
event |
Event of interest (either 1 or 2) for penetrance estimation. Default value is 1. |
fixed |
list of vectors of fixed values of the covariates for both events used for penetrance calculation. |
age |
Vector of ages used for penetrance calculation. |
CI |
Logical; if |
MC |
Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method.
If |
The cause-specific hazard for event is specified as:
hj(t|x) = h0j(t) exp(βj *xj),
where h0j(t) is the baseline hazards function for event
, xj is the vector of covariates associated with event
and βj is the vector of corresponding regression coefficients,
.
The penetrance function for event in the presence of competing risks based on cause-specific hazards (with no frailties assumed) model is defined as the probability of developing an event of interest by age
given fixed values of covariates
in the following form:
P(T < t, d = j |x) = ∫t0t hj(u|xj)exp(-H1(u|x1)-H2(u|x2))du,
where t0 is the minimum age of onset,
is the type of event which takes
.
The shared frailty competing risks model is:
hj(t|zj, x) = zj*h0j(t) exp(βj *xj),
where zj is the shared frailty for event within families whose distribution is specified by
frailty.dist
.
The penetrance function for event from the shared frailty competing risks model is obtained by integrating over the frailty distribution of G(z1, z2),
P(T < t, d = j | x) = ∫t0t
∫ ∫ hj(u|xj, zj) exp(-H1(u|x1 zj)-H2(u|x2, zj)) dG(z1 dz2)du,
where t0 is the minimum age of onset,
is the type of event which takes
.
See Choi et al. (2021) for more details about the penetrance functions.
Calculations of standard errors of the penetrance estimates and 95% confidence intervals (CIs) for the penetrance at a given age are based on Monte-Carlo simulations of the estimated penetrance model. A multivariate normal distribution is assumed for the parameter estimates, and MC = n
sets of the parameters are generated from the multivariate normal distribution with the parameter estimates and their variance-covariance matrix.
For each simulated set, a penetrance estimate is calculated at a given age by substituting the simulated parameters into the penetrance function.
The standard error of the penetrance estimate at a given age is calculated by the standard deviation of penetrance estimates obtained from simulations.
The 95% CI for the penetrance at a given age is calculated using the 2.5th and 97.5th percentiles of the penetrance estimates obtained from simulations.
Returns the following values:
age |
Ages at which the penetrances are calculated. |
penetrance |
Penetrance estimates at given ages. |
lower |
Lower limit of the 95% confidence interval; simulation-based 2.5th percentile of the penetrance estimates. |
upper |
Upper limit of the 95% confidence interval; simulation-based 97.5th percentile of the penetrance estimates. |
se |
Simulation-based standard errors of the penetrance estimates. |
Yun-Hee Choi
Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
## Not run: set.seed(4321) fam2 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", competing = TRUE, base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02) fit2 <- penmodel_cmp(Surv(time, status==1) ~ gender + mgene, Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", frailty.dist = "cgamma", parms = list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), data = fam2, design = "pop+", base.dist = "Weibull", agemin = NULL, robust = TRUE) # Compute penetrance estimates for event 1 for male carriers at age 40, 50, 60, 70 and # their 95 penetrance_cmp(fit2, event = 1, fixed = list(c(1,1), c(1,1)), age = c(40, 50, 60, 70), CI = TRUE, MC = 200) ## End(Not run)
## Not run: set.seed(4321) fam2 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", competing = TRUE, base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02) fit2 <- penmodel_cmp(Surv(time, status==1) ~ gender + mgene, Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", frailty.dist = "cgamma", parms = list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), data = fam2, design = "pop+", base.dist = "Weibull", agemin = NULL, robust = TRUE) # Compute penetrance estimates for event 1 for male carriers at age 40, 50, 60, 70 and # their 95 penetrance_cmp(fit2, event = 1, fixed = list(c(1,1), c(1,1)), age = c(40, 50, 60, 70), CI = TRUE, MC = 200) ## End(Not run)
Fits a penetrance model for family data based on a prospective likelihood with ascertainment correction and provides model parameter estimates.
penmodel(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", frailty.dist = "none", agemin = NULL, robust = FALSE)
penmodel(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", frailty.dist = "none", agemin = NULL, robust = FALSE)
formula |
A formula expression as for other regression models. The response should be a survival object as returned by the |
cluster |
Name of cluster variable. Default is |
gvar |
Name of genetic variable. Default is |
parms |
Vector of initial values for the parameters in the model including baseline parameters and regression coefficients. |
cuts |
Vector of cut points that define the intervals where the hazard function is constant. The |
data |
Data frame generated from |
design |
Study design of the family data. Possible choices are: |
base.dist |
Choice of baseline hazard distributions to fit. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
agemin |
Minimum age of disease onset or minimum age. Default is |
robust |
Logical; if |
When frailty.dist = "none"
, the following penetrance model is fitted to family data with a specified baseline hazard distribution
h(t|xs, xg) = h0(t - t0) exp(βs * xs + βg * xg),
where h0(t) is the baseline hazards function specified by base.dist
, which depends on the shape and scale parameters, λ and ρ; xs indicates male (1) and female (0) and xg indicates carrier (1) or non-carrier (0) of a gene of interest (major gene). Additional covariates can be added to formula
in the model.
When frailty.dist
is specified as either "gamma"
or "lognormal"
, the follwoing shared frailty model is fitted to family data
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, and Z represents a frailty shared within families whose distribution is specified by frailty.dist
.
Choice of frailty distributions
frailty.dist = "gamma"
assumes follows Gamma(
,
).
frailty.dist = "lognormal"
assumes follows log-normal distribution with mean 0 and variance
.
frailty.dist = "none"
shares no frailties within families and assumes independence among family members.
For family data arising from population- or clinic-based study designs (design="pop", "pop+"
, "cli"
, or "cli+"
), the parameters of the penetrance model are estimated using the ascertainment-corrected prospective likelihood approach (Choi, Kopciuk and Briollais, 2008).
For family data arising from a two-stage study design (design="twostage"
), model parameters are estimated using the composite likelihood approach (Choi and Briollais, 2011)
Note that the baseline parameters include lambda
and rho
, which represent the scale and shape parameters, respectively, and eta
, additional parameter to specify for "logBurr"
distribution. For the "lognormal"
baseline distribution, lambda
and rho
represent the location and scale parameters for the normally distributed logarithm, where lambda
can take any real values and rho
> 0. For the other baselinse distributions, lambda
> 0, rho
> 0, and eta
> 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise"
, baseparm
should specify the initial interval-constant values, one more than the cut points specified bycuts
.
Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters () for
"Weibull"
, "loglogistic"
, "Gompertz"
and "gamma"
baselines, to () for
"logBurr"
and to the piecewise constant parameters for a piecewise
baseline hazard. For "lognormal"
baseline distribution, the log transformation is applied only to , not to
, which represents the location parameter for the normally distributed logarithm.
Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance
function via Monte-Carlo simulations of the estimated penetrance model.
Returns an object of class 'penmodel'
, including the following elements:
estimates |
Parameter estimates of transformed baseline parameters and regression coefficients. |
varcov |
Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix. |
varcov.robust |
Robust ‘sandwich’ variance-covariance matrix of parameter estimates when |
se |
Standard errors of parameter estimates obtained from the inverse of Hessian matrix. |
se.robust |
Robust ‘sandwich’ standard errors of parameter estimates when |
logLik |
Loglikelihood value for the fitted penetrance model. |
AIC |
Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model. |
Yun-Hee Choi
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
penmodelEM
, simfam
, penplot
, print.penmodel
,
summary.penmodel
, print.summary.penmodel
, plot.penmodel
# Family data simulated from population-based design using a Weibull baseline hazard set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02) # Penetrance model fit for simulated family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull") # Summary of the model parameter estimates from the model fit summary(fit) # Plot the lifetime penetrance curves with 95% CIs from the model fit for specific # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit, add.KM = TRUE, conf.int = TRUE, MC = 100)
# Family data simulated from population-based design using a Weibull baseline hazard set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02) # Penetrance model fit for simulated family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull") # Summary of the model parameter estimates from the model fit summary(fit) # Plot the lifetime penetrance curves with 95% CIs from the model fit for specific # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit, add.KM = TRUE, conf.int = TRUE, MC = 100)
Fits a competing risks model for family data with ascertainment correction and provides model parameter estimates.
penmodel_cmp(formula1, formula2, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", frailty.dist = "none", agemin = NULL, robust = FALSE)
penmodel_cmp(formula1, formula2, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", frailty.dist = "none", agemin = NULL, robust = FALSE)
formula1 |
A formula expression for event 1 as for other regression models. The response should be a survival object as returned by the |
formula2 |
A formula expression for event 2 as for other regression models. The response should be a survival object as returned by the |
cluster |
Name of cluster variable. Default is |
gvar |
Name of genetic variable. Default is |
parms |
list of Vectors of initial values for the parameters in each model including baseline parameters and regression coefficients and frailty parameters.
|
cuts |
Vector of cut points that define the intervals when |
data |
Data frame generated from |
design |
Study design of the family data. Possible choices are: |
base.dist |
Vector of two baseline hazard distributions to be fitted for competing events. Possible choices for each event are: |
frailty.dist |
Choice of frailty distribution to fit a shared frailty model for competing events. Possible choices are: |
agemin |
Minimum age of disease onset or minimum age. Default is |
robust |
Logical; if |
The shared frailty comepting risks model is fitted to family data with specified baseline hazard distributions and frailty distribution
Event 1:
h1(t|X,Z) = h01(t - t0) Z1 exp(βs1 * xs + βg1 * xg),
Event 2:
h2(t|X,Z) = h02(t - t0) Z2 exp(βs2 * xs + βg2 * xg),
where h01(t) and h02(t) are the baseline hazard functions for event 1 and event 2, respectively, which can be specified by base.dist
. t0 is a minimum age of disease onset, Z1 and Z2 are frailties shared within families for each event and follow either a gamma, log-normal, correlateg gamma, or correlated log-normal distributions, which can be specified by frailty.dist
. xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively. Additional covariates can be added to formula1
for event 1 and formula2
for event 2 in the model.
Choice of frailty distributions for competing risk models
frailty.dist = "gamma"
shares the frailties within families generated from a gamma distribution independently for each competing event, where
Zj follows Gamma(kj, 1/kj).
frailty.dist = "lognormal"
shares the frailties within families generated from a log-normal distribution independently for each competing event, where
Zj follows log-normal distribution with mean 0 and variance 1/kj.
frailty.dist = "cgamma"
shares the frailties within families generated from a correlated gamma distribution to allow the frailties between two events to be correlated, where the correlated gamma frailties (Z1, Z2) are generated with three independent gamma frailties (Y0, Y1, Y2) as follows:
;
Z2 = k0/(k0 + k2) Y0 + Y2,
where Y0 from Gamma(k0, 1/k0), Y1 from Gamma(k1, 1/(k0 + k1)), Y2 from Gamma(k2, 1/(k0 + k2)).
frailty.dist = "clognormal"
shares the frailties within families generated from a correlated log-normal distribution where
log(Zj) follows a normal distribution with mean 0, variance 1/kj and correlation between two events k0.
depend
should specify the values of related frailty parameters: c(k1, k2)
with frailty.dist = "gamma"
or frailty.dist = "lognormal"
; c(k1, k2, k0)
for frailty.dist = "cgamma"
or frailty.dist = "clognormal"
.
More details about the competing risks model for family data arising from population-based study designs (design="pop", "pop+"
and their inference procedure based on the ascertainment-corrected likelihood approach can be found in Choi et al., 2021.
Note that the baseline parameters include lambda
and rho
, which represent the scale and shape parameters, respectively, and eta
, additional parameter to specify for "logBurr"
distribution. For the "lognormal"
baseline distribution, lambda
and rho
represent the location and scale parameters for the normally distributed logarithm, where lambda
can take any real values and rho
> 0. For the other baselinse distributions, lambda
> 0, rho
> 0, and eta
> 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise"
, baseparm
should specify the initial interval-constant values, one more than the cut points specified bycuts
.
Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters () for
"Weibull"
, "loglogistic"
, "Gompertz"
and "gamma"
baselines, to () for
"logBurr"
and to the piecewise constant parameters for a piecewise
baseline hazard. For "lognormal"
baseline distribution, the log transformation is applied only to , not to
, which represents the location parameter for the normally distributed logarithm.
Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance
function via Monte-Carlo simulations of the estimated penetrance model.
Returns an object of class 'penmodel_cmp'
, including the following elements:
estimates |
Parameter estimates of transformed baseline parameters and regression coefficients. |
varcov |
Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix. |
varcov.robust |
Robust ‘sandwich’ variance-covariance matrix of parameter estimates when |
se |
Standard errors of parameter estimates obtained from the inverse of Hessian matrix. |
se.robust |
Robust ‘sandwich’ standard errors of parameter estimates when |
logLik |
Loglikelihood value for the fitted penetrance model. |
AIC |
Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model. |
Yun-Hee Choi
Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
simfam_cmp
, penplot_cmp
, print.penmodel_cmp
,
summary.penmodel_cmp
, print.summary.penmodel_cmp
, plot.penmodel_cmp
# Competing risk family data simulated from population-based design # using Weibull baseline hazards with gamma frailty distribution. ## Not run: set.seed(4321) fam1 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(0.5, 1, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) # Fitting shared gamma frailty Penetrance model for simulated competing risk data fit1 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1)), base.dist = "Weibull", frailty.dist = "gamma", data = fam1, robust = TRUE) # Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data fit2 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)), base.dist = "Weibull", frailty.dist = "cgamma", data = fam1, robust = TRUE) # Summary of the model parameter estimates from the model fit summary(fit1) # Plot the lifetime penetrance curves with 95 # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit1, add.CIF = TRUE, conf.int = TRUE, MC = 100) ## End(Not run)
# Competing risk family data simulated from population-based design # using Weibull baseline hazards with gamma frailty distribution. ## Not run: set.seed(4321) fam1 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(0.5, 1, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) # Fitting shared gamma frailty Penetrance model for simulated competing risk data fit1 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1)), base.dist = "Weibull", frailty.dist = "gamma", data = fam1, robust = TRUE) # Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data fit2 <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", gvar = "mgene", design = "pop+", parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)), base.dist = "Weibull", frailty.dist = "cgamma", data = fam1, robust = TRUE) # Summary of the model parameter estimates from the model fit summary(fit1) # Plot the lifetime penetrance curves with 95 # gender and mutation status groups along with their nonparametric penetrance curves # based on data excluding probands. plot(fit1, add.CIF = TRUE, conf.int = TRUE, MC = 100) ## End(Not run)
Fits a penetrance model for family data with missing genotypes via the EM algorithm and provides model parameter estimates.
penmodelEM(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", agemin = NULL, robust = FALSE, method = "data", mode = "dominant", q = 0.02)
penmodelEM(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, design = "pop", base.dist = "Weibull", agemin = NULL, robust = FALSE, method = "data", mode = "dominant", q = 0.02)
formula |
A formula expression as for other regression models. The response should be a survival object as returned by the |
cluster |
Name of cluster variable. Default is |
gvar |
Name of genetic variable. Default is |
parms |
Vector of initial values for the parameters in the model including baseline parameters and regression coefficients. |
cuts |
Vector of cuts that define the intervals where the hazard function is constant. The |
data |
Data frame generated from |
design |
Study design of the family data. Possible choices are: |
base.dist |
Choice of baseline hazard distributions to fit. Possible choices are: |
agemin |
Minimum age of disease onset or minimum age. Default is |
robust |
Logical; if |
method |
Choice of methods for calculating the carrier probabilities for individuals with missing mutation status. Possible choices are If |
mode |
Choice of modes of inheritance for calculating carrier probabilies for individuals with missing mutation status. Possible choices are |
q |
Frequency of the disease causing allele used for calculating carrier pobabilities. The value should be between 0 and 1. If |
The expectation and maximization (EM) algorithm is applied for making inference about the missing genotypes. In the expectation step, for individuals with unknown carrier status, we first compute their carrier probabilities given their family's observed phenotype and genotype information based on current estimates of parameters θ as follows,
wfi = P(Xfi = 1 | Yfi, Xfo),
where Xfi represents the mutation carrier status and Yfi represents the phenotype in terms of age at onset tfi and disease status δfi for individual i, i = 1, ..., nf, in family and Xfo represents the observed genotypes in family
.
Then, we obtain the conditional expectation of the log-likelihood function () of the complete data given the observed data as a weighted log-likelihood, which has the form
Eθ [ l (θ | Y, Xo) ] = ∑f ∑i l fi (θ | Xfi = 1) * wfi + lfi (θ | Xfi = 0) * ( 1 - wfi ).
In the maximization step, the updated parameter estimates are obtained by maximizing the weighted log likelihood computed in the E-step. These expectation and maximization steps iterate until convergence to obtain the maximum likelihood estimates. See more details in Choi and Briollais (2011) or Choi et al. (2014).
Note that the baseline parameters include lambda
and rho
, which represent the scale and shape parameters, respectively, and eta
, additional parameter to specify for "logBurr"
distribution. For the "lognormal"
baseline distribution, lambda
and rho
represent the location and scale parameters for the normally distributed logarithm, where lambda
can take any real values and rho
> 0. For the other baselinse distributions, lambda
> 0, rho
> 0, and eta
> 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise"
, baseparm
should specify the initial interval-constant values, one more than the cut points specified bycuts
.
Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters () for
"Weibull"
, "loglogistic"
, "Gompertz"
and "gamma"
baselines, to () for
"logBurr"
and to the piecewise constant parameters for a piecewise
baseline hazard. For "lognormal"
baseline distribution, the log transformation is applied only to , not to
, which represents the location parameter for the normally distributed logarithm.
Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance
function via Monte-Carlo simulations of the estimated penetrance model.
Returns an object of class 'penmodel'
, including the following elements:
estimates |
Parameter estimates of transformed baseline parameters and regression coefficients. |
varcov |
Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix. |
varcov.robust |
Robust ‘sandwich’ variance-covariance matrix of parameter estimates when |
se |
Standard errors of parameter estimates obtained from the inverse of Hessian matrix. |
se.robust |
Robust ‘sandwich’ standard errors of parameter estimates when |
logLik |
Loglikelihood value for the fitted penetrance model. |
AIC |
Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model. |
Yun-Hee Choi
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H. and Briollais, L. (2011) An EM composite likelihood approach for multistage sampling of family data with missing genetic covariates, Statistica Sinica 21, 231-253.
Choi, Y.-H., Briollais, L., Green, J., Parfrey, P., and Kopciuk, K. (2014) Estimating successive cancer risks in Lynch Syndrome families using a progressive three-state model, Statistics in Medicine 33, 618-638.
simfam
, penmodel
, print.penmodel
, summary.penmodel
,
print.summary.penmodel
, plot.penmodel
, carrierprob
# Family data simulated with 20% of members missing their genetic information. set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(1, 2), agemin = 20, allelefreq = 0.02, mrate = 0.2) # EM algorithm for fitting family data with missing genotypes fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", parms = c(0.01, 3, 1, 2), data = fam, design="pop+", robust = TRUE, base.dist = "Weibull", method = "mendelian", mode = "dominant", q = 0.02) # Summary of the model parameter estimates from the model fit by penmodelEM summary(fit) # Plot the lifetime penetrance curves from model fit for gender and # mutation status groups along with their nonparametric penetrance curves # based on observed data excluding probands. plot(fit)
# Family data simulated with 20% of members missing their genetic information. set.seed(4321) fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(1, 2), agemin = 20, allelefreq = 0.02, mrate = 0.2) # EM algorithm for fitting family data with missing genotypes fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", parms = c(0.01, 3, 1, 2), data = fam, design="pop+", robust = TRUE, base.dist = "Weibull", method = "mendelian", mode = "dominant", q = 0.02) # Summary of the model parameter estimates from the model fit by penmodelEM summary(fit) # Plot the lifetime penetrance curves from model fit for gender and # mutation status groups along with their nonparametric penetrance curves # based on observed data excluding probands. plot(fit)
Plots the penetrance functions given the values of baseline parameters and regression coefficients and choices of baseline and frailty distributions.
penplot(base.parms, vbeta, cuts = NULL, variation = "none", base.dist = "Weibull", frailty.dist = NULL, depend = 1, agemin = 20, agemax = 80, print = TRUE, col = c("blue","red","blue","red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, x = "topleft", y = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
penplot(base.parms, vbeta, cuts = NULL, variation = "none", base.dist = "Weibull", frailty.dist = NULL, depend = 1, agemin = 20, agemax = 80, print = TRUE, col = c("blue","red","blue","red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, x = "topleft", y = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
base.parms |
Vector of parameter values for the specified baseline hazard function: |
vbeta |
Vector of regression coefficients for gender and majorgene, |
cuts |
Vector of cut points defining the intervals where the hazard function is constant. The |
variation |
Source of residual familial correlation. Possible choices are: |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are |
depend |
Variance of the frailty distribution. Dependence within families increases with |
agemin |
Minimum age of disease onset. Default is 20 years of age. |
agemax |
Maximum age of disease onset. Default is 80 years of age. |
print |
Logical; if |
col |
Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is |
lty |
Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is |
add.legend |
Logical; if |
add.title |
Logical; if |
x , y
|
Position of legend; see legend. Defaults are |
xlab |
Title for the x-axis. Default is |
ylab |
Title for the y-axis. Default is |
ylim |
Limits of the y-axis. Default is |
main |
Main title of the plot. Default is |
... |
Other parameters to be passed through to plotting functions. |
Proportional hazard models
The penetrance model conditional on the covariates X = c(xs, xg) is assumed to have the following hazard function: h(t|X) = h0(t - t0) exp(βs * xs + βg * xg), where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The penetrance function for the penetrance model has the form, 1 - exp(- H0(t - t0) * exp(βs * xs + βg * xg )), where H0(t) is the cumulative baseline hazard function.
Shared frailty models
The penetrance model conditional on the frailty and covariates
X = c(xs, xg) is assumed to have the following hazard function:
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
For example, when using a Weibull distribution for baseline hazard and a gamma distribution for frailty, the penetrance function has the form 1 - (1 + λρ * (t - t0)ρ * exp(βs * xs + βg * xg)/κ)-κ.
Two-gene models
The penetrance curve for the two-gene model is generated by 1 - exp(- H0(t - t0) * exp(βs * xs + β1 * x1 + β2 * x2)), where H0(t) is the cumulative baseline hazard function, x1 indicates carrior (1) or non-carrior (0) of a major gene and x2 indicates carrior (1) or non-carrior (0) of a second gene. When plotting with the two-gene model, the plot will generate separate curves for mutation carriers and noncarriers, and separate curves for the second gene carriers and noncarriers.
Displays plots of the penetrance functions and returns the following values:
pen70 |
Penetrance estimates by age 70 specific to gender and mutation-status subgroups. |
x.age |
Vector of ages of onset ranging from |
pen |
Penetrance estimates computed at each age of |
Yun-Hee Choi
# Penetrance function curves based on Weibull baseline hazard function penplot(base.parms = c(0.01,3), vbeta = c(0.5, 2), variation = "none", base.dist = "Weibull", agemin = 20, ylim = c(0,1))
# Penetrance function curves based on Weibull baseline hazard function penplot(base.parms = c(0.01,3), vbeta = c(0.5, 2), variation = "none", base.dist = "Weibull", agemin = 20, ylim = c(0,1))
Plots the penetrance functions from competing risk models given the values of baseline parameters and regression coefficients and choices of baseline and frailty distributions.
penplot_cmp(event, base.parms, vbeta, cuts = NULL, variation = "none", base.dist = "Weibull", frailty.dist = NULL, depend = c(1, 1), agemin = 20, agemax = 80, print = TRUE, col = c("blue","red","blue","red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, x = "topleft", y = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
penplot_cmp(event, base.parms, vbeta, cuts = NULL, variation = "none", base.dist = "Weibull", frailty.dist = NULL, depend = c(1, 1), agemin = 20, agemax = 80, print = TRUE, col = c("blue","red","blue","red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, x = "topleft", y = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
event |
Event of interest for penetrance function: either 1 or 2. Default is |
base.parms |
List of vectors of parameter values for the specified baseline hazard functions for both events. For example, |
vbeta |
List of vectors of regression coefficients for gender and majorgene, |
cuts |
Vector of cut points defining the intervals when |
variation |
Source of residual familial correlation. Possible choices are: |
base.dist |
Vector of two baseline hazard distributions chosen for competing events. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are |
depend |
Vector of frailty parameter values assumed for specified frailty distribution. They corresponds inverse of variance of the frailty distribution. Dependence within families decreases with |
agemin |
Minimum age of disease onset. Default is 20 years of age. |
agemax |
Maximum age of disease onset. Default is 80 years of age. |
print |
Logical; if |
col |
Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is |
lty |
Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is |
add.legend |
Logical; if |
add.title |
Logical; if |
x , y
|
Position of legend; see legend. Defaults are |
xlab |
Title for the x-axis. Default is |
ylab |
Title for the y-axis. Default is |
ylim |
Limits of the y-axis. Default is |
main |
Main title of the plot. Default is |
... |
Other parameters to be passed through to plotting functions. |
Cause-specific proportional hazard models
The penetrance models for competing events conditional on the covariates X = c(xs, xg) are assumed to have the following hazard functions for event :
hj(t|X) = h0j(t - t0) exp(βjs * xs + βjg * xg),
where h0j(t) is the baseline hazard function for event
,
, t0j is a minimum age of disease onset, xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a major gene of interest, respectively.
The penetrance function for the penetrance model has the form, 1 - exp(- H0(t - t0) * exp(βs * xs + βg * xg )), where H0(t) is the cumulative baseline hazard function.
Shared frailty models
The penetrance model conditional on the frailty and covariates
X = c(xs, xg) is assumed to have the following hazard function:
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
For example, when using a Weibull distribution for baseline hazard and a gamma distribution for frailty, the penetrance function has the form 1 - (1 + λρ * (t - t0)ρ * exp(βs * xs + βg * xg)/κ)-κ.
Two-gene models
The penetrance curve for the two-gene model is generated by 1 - exp(- H0(t - t0) * exp(βs * xs + β1 * x1 + β2 * x2)), where H0(t) is the cumulative baseline hazard function, x1 indicates carrior (1) or non-carrior (0) of a major gene and x2 indicates carrior (1) or non-carrior (0) of a second gene. When plotting with the two-gene model, the plot will generate separate curves for mutation carriers and noncarriers, and separate curves for the second gene carriers and noncarriers.
Displays plots of the penetrance functions and returns the following values:
pen70 |
Penetrance estimates by age 70 specific to gender and mutation-status subgroups. |
x.age |
Vector of ages of onset ranging from |
pen |
Penetrance estimates computed at each age of |
Yun-Hee Choi
# Penetrance function curves for event 1 # based on Weibull baselines (no frailty) penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), variation = "none", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on gamma frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2), variation = "frailty", frailty.dist="gamma", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on correlated gamma frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), variation = "frailty", frailty.dist="cgamma", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on correlated lognormal frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), variation = "frailty", frailty.dist="clognormal", base.dist = "Weibull", agemin = 20, ylim = c(0,1))
# Penetrance function curves for event 1 # based on Weibull baselines (no frailty) penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), variation = "none", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on gamma frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2), variation = "frailty", frailty.dist="gamma", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on correlated gamma frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), variation = "frailty", frailty.dist="cgamma", base.dist = "Weibull", agemin = 20, ylim = c(0,1)) # Penetrance function curves for event 1 # based on correlated lognormal frailty and Weibull baselines penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), variation = "frailty", frailty.dist="clognormal", base.dist = "Weibull", agemin = 20, ylim = c(0,1))
penmodel
Plots penetrance curves estimated from the fitted penetrance model and overlays non-parametric penetrance curves estimated from the data without proabands.
## S3 method for class 'penmodel' plot(x, agemax = 80, print = TRUE, mark.time = FALSE, conf.int = FALSE, add.KM = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, xpos = "topleft", ypos = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
## S3 method for class 'penmodel' plot(x, agemax = 80, print = TRUE, mark.time = FALSE, conf.int = FALSE, add.KM = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), add.legend = TRUE, add.title = TRUE, xpos = "topleft", ypos = NULL, xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL, ...)
x |
An object class of |
agemax |
Maximum age of disease onset or maximum age. Default is 80 years of age. |
print |
Logical; if |
mark.time |
Logical; if |
conf.int |
Logical; if |
add.KM |
Logical; if |
MC |
Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method.
If |
col |
Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is |
lty |
Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is |
add.legend |
Logical; if |
add.title |
Logical; if |
xpos , ypos
|
Position of legend; see legend. Defaults are |
xlab |
Title for the x-axis. Default is |
ylab |
Title for the y-axis. Default is |
ylim |
Limits for the y-axis. Default is |
main |
Main title of the plot. Default is |
... |
Other parameters to be passed through to plotting functions. |
The 95% confidence intervals for the parametric penetrance curves are
obtained based on simulations of the parameters, assuming a multivariate normal distribution for the estimated
parameters with their variance-covariance matrix. See penetrance
for more details.
Returns the following summary values:
coefficients |
Parameter estimates of transformed baseline parameters ( |
pen70 |
Penetrance estimates by age 70, specific to gender and mutation-status subgroups. |
x.age |
Vector of ages of onsest ranging from |
pen |
Penetrance estimates at each age in |
lower |
Lower limits of 95% confidence interval estimates for penetrance at each age in |
upper |
Upper limits of 95% confidence interval estimates for penetrance at each age in |
Yun-Hee Choi
penmodel, print.penmodel, penmodelEM, summary.penmodel,print.summary.penmodel
,
simfam
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 300, design = "pop+", base.dist = "Weibull", variation = "none", base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), allelefreq = 0.02, agemin = 20) # Fit family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE) # Plot penetrance function curves with 95% CIs plot(fit, agemax = 80, conf.int = TRUE)
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 300, design = "pop+", base.dist = "Weibull", variation = "none", base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), allelefreq = 0.02, agemin = 20) # Fit family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+", parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE) # Plot penetrance function curves with 95% CIs plot(fit, agemax = 80, conf.int = TRUE)
penmodel_cmp
Plots penetrance curves for each event estimated from the fitted competing risks model and overlays non-parametric cumulative incidence curves estimated from the data without proabands.
## S3 method for class 'penmodel_cmp' plot(x, agemax = 80, print = TRUE, conf.int = FALSE, add.CIF = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, ...)
## S3 method for class 'penmodel_cmp' plot(x, agemax = 80, print = TRUE, conf.int = FALSE, add.CIF = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, ...)
x |
An object class of |
agemax |
Maximum age of disease onset or maximum age. Default is 80 years of age. |
print |
Logical; if |
conf.int |
Logical; if |
add.CIF |
Logical; if |
MC |
Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method.
If |
col |
Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is |
lty |
Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is |
xlab |
Title for the x-axis. Default is |
ylab |
Title for the y-axis. Default is |
ylim |
Limits for the y-axis. Default is |
... |
Other parameters to be passed through to plotting functions. |
The 95% confidence intervals for the parametric penetrance curves are
obtained based on simulations of the parameters, assuming a multivariate normal distribution for the estimated
parameters with their variance-covariance matrix. See penetrance_cmp
for more details.
Returns the following summary values:
coefficients |
Parameter estimates from the competing risks model. |
pen70 |
Penetrance estimates by age 70, specific to gender and mutation-status subgroups. |
age |
Vector of ages of onsest ranging from |
pen1 |
Penetrance estimates for event 1 at each age in |
pen2 |
Penetrance estimates for event 2 at each age in |
lower1 |
Lower limits of 95% confidence interval estimates for penetrance for event 1 at each age in |
upper1 |
Upper limits of 95% confidence interval estimates for penetrance for event 1 at each age in |
lower2 |
Lower limits of 95% confidence interval estimates for penetrance for event 2 at each age in |
upper2 |
Upper limits of 95% confidence interval estimates for penetrance for event 2 at each age in |
Yun-Hee Choi
penmodel_cmp, print.penmodel_cmp, summary.penmodel_cmp,
print.summary.penmodel_cmp, simfam_cmp
## Not run: # Simulate family data set.seed(4321) fam2 <- simfam_cmp(N.fam = 500, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Fit family data fit2 <- penmodel_cmp(formula1 = Surv(time, status==1)~ gender + mgene, formula2 = Surv(time, status==2)~ gender + mgene, cluster = "famID", gvar = "mgene", frailty.dist = "cgamma", parms=list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), data=fam2, design="pop+", base.dist="Weibull", robust=TRUE) # Plot penetrance function curves with 95 plot(fit2, conf.int=TRUE, MC=200, ylim=c(0, 0.7)) ## End(Not run)
## Not run: # Simulate family data set.seed(4321) fam2 <- simfam_cmp(N.fam = 500, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Fit family data fit2 <- penmodel_cmp(formula1 = Surv(time, status==1)~ gender + mgene, formula2 = Surv(time, status==2)~ gender + mgene, cluster = "famID", gvar = "mgene", frailty.dist = "cgamma", parms=list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), data=fam2, design="pop+", base.dist="Weibull", robust=TRUE) # Plot penetrance function curves with 95 plot(fit2, conf.int=TRUE, MC=200, ylim=c(0, 0.7)) ## End(Not run)
simfam
or Plot pedigreesProvides pedigree plots for specified families generated from simfam
function with option to save plots into a pdf file.
## S3 method for class 'simfam' plot(x, famid, pdf = FALSE, file = NULL, ...)
## S3 method for class 'simfam' plot(x, famid, pdf = FALSE, file = NULL, ...)
x |
An object of class |
famid |
List of family IDs to plot. Default is the first family in given data set. |
pdf |
Logical; if |
file |
File name to save the pedigree plots; Default file name is |
... |
Additional arguments passed on to the plot function. |
Argument x
can be a data frame that contains famID
, indID
, fatherID
, motherID
, gender
(1 for male, 0 for female), status
(1 for affected, 0 for non-affected), mgene
(1 for mutation carrier, 0 for non-carrier, NA
for missing), and proband
(1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame")
.
Optionally, the data frame can contain a column named carrp.geno
or carrp.pheno
to replace missing values in mgene
with their carrier probabilities.
Returns pedigree plots for specified families created by simfam
function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.
When object inlcudes carrp.geno
and/or carrp.pheno
generated by carrierprob
function, the plot
function displays the carrier probabilities for those with missing carrier status.
simfam, summary.simfam, carrierprob
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20) # Pedigree plots for first three simulated families plot(fam, famid = c(1:3))
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20) # Pedigree plots for first three simulated families plot(fam, famid = c(1:3))
simfam_cmp
or Plot pedigreesProvides pedigree plots for specified families generated from simfam_cmp
function with option to save plots into a pdf file.
## S3 method for class 'simfam_cmp' plot(x, famid, pdf = FALSE, file = NULL, ...)
## S3 method for class 'simfam_cmp' plot(x, famid, pdf = FALSE, file = NULL, ...)
x |
An object of class |
famid |
List of family IDs to plot. Default is the first family in given data set. |
pdf |
Logical; if |
file |
File name to save the pedigree plots; Default file name is |
... |
Additional arguments passed on to the plot function. |
Argument x
can be a data frame that contains famID
, indID
, fatherID
, motherID
, gender
(1 for male, 0 for female), status
(1 for affected by event 1, 2 for affected by event 2, 0 for non-affected), mgene
(1 for mutation carrier, 0 for non-carrier, NA
for missing), and proband
(1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame")
.
Optionally, the data frame can contain a column named carrp.geno
or carrp.pheno
to replace missing values in mgene
with their carrier probabilities.
Returns pedigree plots for specified families created by simfam_cmp
function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.
When object inlcudes carrp.geno
and/or carrp.pheno
generated by carrierprob
function, the plot
function displays the carrier probabilities for those with missing carrier status.
simfam, summary.simfam, carrierprob
# Simulated competing risk data from gamma frailty model based on pop+ design set.seed(4321) fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend=c(2, 2), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Pedigree plots for first three simulated families plot(fam, famid = 1)
# Simulated competing risk data from gamma frailty model based on pop+ design set.seed(4321) fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend=c(2, 2), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Pedigree plots for first three simulated families plot(fam, famid = 1)
simfam_tvc
or Plot pedigreesProvides pedigree plots for specified families generated from simfam_tvc
function with option to save plots into a pdf file.
## S3 method for class 'simfam_tvc' plot(x, famid, pdf = FALSE, file = NULL, ...)
## S3 method for class 'simfam_tvc' plot(x, famid, pdf = FALSE, file = NULL, ...)
x |
An object of class |
famid |
List of family IDs to plot. Default is the first family in given data set. |
pdf |
Logical; if |
file |
File name to save the pedigree plots; Default file name is |
... |
Additional arguments passed on to the plot function. |
Argument x
can be a data frame that contains famID
, indID
, fatherID
, motherID
, gender
(1 for male, 0 for female), status
(1 for affected, 0 for non-affected), mgene
(1 for mutation carrier, 0 for non-carrier, NA
for missing), and proband
(1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame")
.
Optionally, the data frame can contain a column named carrp.geno
or carrp.pheno
to replace missing values in mgene
with their carrier probabilities.
Returns pedigree plots for specified families created by simfam_tvc
function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.
When object inlcudes carrp.geno
and/or carrp.pheno
generated by carrierprob
function, the plot
function displays the carrier probabilities for those with missing carrier status.
simfam_tvc, summary.simfam_tvc, carrierprob
# Simulated family data set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) # Pedigree plots for first three simulated families plot(fam, famid = c(1:2))
# Simulated family data set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) # Pedigree plots for first three simulated families plot(fam, famid = c(1:2))
simfam2
or Plot pedigreesProvides pedigree plots for specified families generated from simfam2
function with option to save plots into a pdf file.
## S3 method for class 'simfam2' plot(x, famid, pdf = FALSE, file = NULL, ...)
## S3 method for class 'simfam2' plot(x, famid, pdf = FALSE, file = NULL, ...)
x |
An object of class |
famid |
List of family IDs to plot. Default is the first family in given data set. |
pdf |
Logical; if |
file |
File name to save the pedigree plots; Default file name is |
... |
Additional arguments passed on to the plot function. |
Argument x
can be a data frame that contains famID
, indID
, fatherID
, motherID
, gender
(1 for male, 0 for female), status
(1 for affected, 0 for non-affected), mgene
(1 for mutation carrier, 0 for non-carrier, NA
for missing), and proband
(1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame")
.
Optionally, the data frame can contain a column named carrp.geno
or carrp.pheno
to replace missing values in mgene
with their carrier probabilities.
Returns pedigree plots for specified families created by simfam2
function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.
When object inlcudes carrp.geno
and/or carrp.pheno
generated by carrierprob
function, the plot
function displays the carrier probabilities for those with missing carrier status.
simfam2, summary.simfam2, carrierprob
set.seed(4321) data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) plot(fam2, famid = c(1:2))
set.seed(4321) data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) plot(fam2, famid = c(1:2))
penmodel
.
Prints a summary of parameter estimates of a fitted penetrance model.
## S3 method for class 'penmodel' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'penmodel' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Prints a short summary of the model and model fit.
Returns an object of class 'penmodel'
.
Yun-Hee Choi
penmodel, penmodelEM, summary.penmodel, print.summary.penmodel, plot.penmodel
penmodel_cmp
.
Prints a summary of parameter estimates of a fitted competing risk penetrance model.
## S3 method for class 'penmodel_cmp' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'penmodel_cmp' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Prints a short summary of the model and model fit.
Returns an object of class 'penmodel_cmp'
.
Yun-Hee Choi
penmodel_cmp, summary.penmodel_cmp, print.summary.penmodel_cmp, plot.penmodel_cmp
summary.penmodel
of a fitted penetrance model.
Prints a short summary of parameter and penetrance estimates of a 'summary.penmodel'
object.
## S3 method for class 'summary.penmodel' print(x, digits = max(3, getOption("digits") - 3), signif.stars=TRUE, ...)
## S3 method for class 'summary.penmodel' print(x, digits = max(3, getOption("digits") - 3), signif.stars=TRUE, ...)
x |
An object class of |
digits |
Number of significant digits to use when printing. |
signif.stars |
Logical; if |
... |
Further arguments passed to or from other methods. |
Prints a summary of parameter estimates, their standard errors, -statistics and corresponding two-sided
-values and additionally indicates significance stars if
signif.stars
is TRUE
.
Also prints penetrance estimates by age 70 specific to gender and mutation-status subgroups along with their standard errors and 95% confidence intervals.
Returns an object of class 'summary.penmodel'
.
Yun-Hee Choi
penmodel, penmodelEM, print.penmodel, summary.penmodel
summary.penmodel_cmp
of a fitted competing risks penetrance model.
Prints a short summary of parameter and penetrance estimates of a 'summary.penmodel_cmp'
object.
## S3 method for class 'summary.penmodel_cmp' print(x, digits = max(3, getOption("digits") - 3), signif.stars=TRUE, ...)
## S3 method for class 'summary.penmodel_cmp' print(x, digits = max(3, getOption("digits") - 3), signif.stars=TRUE, ...)
x |
An object class of |
digits |
Number of significant digits to use when printing. |
signif.stars |
Logical; if |
... |
Further arguments passed to or from other methods. |
Prints a summary of parameter estimates, their standard errors, -statistics and corresponding two-sided
-values and additionally indicates significance stars if
signif.stars
is TRUE
.
Also prints penetrance estimates for each event by age 70 specific to gender and mutation-status subgroups along with their standard errors and 95% confidence intervals.
Returns an object of class 'summary.penmodel_cmp'
.
Yun-Hee Choi
penmodel_cmp, print.penmodel_cmp, summary.penmodel_cmp
Generates familial time-to-event data for specified study design, genetic model and source of residual familial correlation; the generated data frame also contains family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, genotypes of major or second genes.
simfam(N.fam, design = "pop", variation = "none", interaction = FALSE, add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), vbeta = c(1, 1), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
simfam(N.fam, design = "pop", variation = "none", interaction = FALSE, add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), vbeta = c(1, 1), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
add.x |
Logical; if |
x.dist |
Distribution of the covairate. Possible choices to generate the covariate are: |
x.parms |
Parameter values for the specified distribution of the covariate. |
depend |
Inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default is |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for the specified baseline hazard function. |
vbeta |
Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if |
allelefreq |
Population allele frequencies of major disease gene. Value should be between 0 and 1.
Vector of population allele frequencies of major and second disease genes should be provided when |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
The design
argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop"
when proband is affected, "pop+"
when proband is affected mutation carrier, "cli"
when proband is affected and at least one parent and one sibling are affected, "cli+"
when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage"
, is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr
. High risk families often include multiple (at least two) affected members in the family. design = "noasc"
is to be used for no ascertainment correction.
The ages at onset are generated from the following penetrance models depending on the choice of variation = "none", "frailty", "secondgene", "kinship".
. When variation = "none"
, the ages at onset are independently generated from the proportional hazard model conditional on the gender and carrier status of major gene mutation, X = c(xs, xg).
The ages at onset correlated within families are generated from the shared frailty model (variation = "frailty"
) , the correlated shared frailty model with kinship matrix (variation = "kinship"
), or the two-gene model (variation = "secondene"
), where the residual familial correlation is induced by a frailty or a second gene, respectively, shared within the family.
The proportional hazard model
h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The shared frailty model
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, represents a frailty shared within families and follows either a gamma or log-normal distribution, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The correlated shared frailty model with kinship matrix
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, represents a vector of frailties following a multivariate log-normal distribution with mean
and variance
, where
represents the kinship matrix, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The two-gene model
h(t|X) = h0(t - t0) Z exp(βs * xs + β1 * x2 + β2 * x2),
where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.
The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.
Note that simulating family data under the clinic-based designs ("cli"
or "cli+"
) or the two-stage design can be slower since the ascertainment criteria for the high risk families are difficult to meet in such settings. Especially, "cli"
design could be slower than "cli+"
design since the proband's mutation status is randomly selected from a disease population in "cli"
design, so his/her family members are less likely to be mutation carriers and have less chance to be affected, whereas the probands are all mutation carriers, their family members have higher chance to be carriers and affected by disease. Therefore, "cli"
design requires more iterations to sample high risk families than "cli+"
design.
Returns an object of class 'simfam'
, a data frame which contains:
famID |
Family identification (ID) numbers. |
||||||||||||||
indID |
Individual ID numbers. |
||||||||||||||
gender |
Gender indicators: 1 for males, 0 for females. |
||||||||||||||
motherID |
Mother ID numbers. |
||||||||||||||
fatherID |
Father ID numbers. |
||||||||||||||
proband |
Proband indicators: 1 if the individual is the proband, 0 otherwise. |
||||||||||||||
generation |
Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings. |
||||||||||||||
majorgene |
Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. |
||||||||||||||
secondgene |
Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. |
||||||||||||||
ageonset |
Ages at disease onset in years. |
||||||||||||||
currentage |
Current ages in years. |
||||||||||||||
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
||||||||||||||
status |
Disease statuses: 1 for affected, 0 for unaffected (censored). |
||||||||||||||
mgene |
Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or |
||||||||||||||
relation |
Family members' relationship with the proband:
|
||||||||||||||
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
||||||||||||||
naff |
Number of affected members in family. |
||||||||||||||
weight |
Sampling weights. |
Yun-Hee Choi, Wenqing He
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam, plot.simfam, penplot
## Example 1: simulate family data from a population-based design using # a Weibull distribution for the baseline hazard and inducing # residual familial correlation through a shared gamma frailty. set.seed(4321) fam <- simfam(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) head(fam) ## Not run: famID indID gender motherID fatherID proband generation majorgene secondgene 1 1 1 1 0 0 0 1 2 0 2 1 2 0 0 0 0 1 2 0 3 1 3 0 2 1 1 2 2 0 4 1 4 1 0 0 0 0 3 0 5 1 9 0 3 4 0 3 2 0 6 1 10 1 3 4 0 3 3 0 ageonset currentage time status mgene relation fsize naff weight 1 103.76925 69.19250 69.19250 0 1 4 18 2 1 2 64.88982 67.31119 64.88982 1 1 4 18 2 1 3 45.84891 47.57119 45.84891 1 1 1 18 2 1 4 269.71990 47.37403 47.37403 0 0 6 18 2 1 5 69.78355 27.80081 27.80081 0 1 3 18 2 1 6 192.09392 25.34148 25.34148 0 0 3 18 2 1 ## End(Not run) summary(fam) plot(fam, famid = c(1:2)) # pedigree plots for families with IDs = 1 and 2 ## Example 2: simulate family data from a two-stage design to include # 30% of high risk families in the sample. set.seed(4321) fam <- simfam(N.fam = 50, design = "twostage", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), hr = 0.3, allelefreq = 0.02) summary(fam) ## Example 3: simulate family data from a correlated frailty model with kinship matrix set.seed(4321) fam <- simfam(N.fam = 50, design = "pop", variation = "kinship", base.dist = "Weibull", frailty.dist = "lognormal", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), depend = 1, allelefreq = 0.02) summary(fam)
## Example 1: simulate family data from a population-based design using # a Weibull distribution for the baseline hazard and inducing # residual familial correlation through a shared gamma frailty. set.seed(4321) fam <- simfam(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) head(fam) ## Not run: famID indID gender motherID fatherID proband generation majorgene secondgene 1 1 1 1 0 0 0 1 2 0 2 1 2 0 0 0 0 1 2 0 3 1 3 0 2 1 1 2 2 0 4 1 4 1 0 0 0 0 3 0 5 1 9 0 3 4 0 3 2 0 6 1 10 1 3 4 0 3 3 0 ageonset currentage time status mgene relation fsize naff weight 1 103.76925 69.19250 69.19250 0 1 4 18 2 1 2 64.88982 67.31119 64.88982 1 1 4 18 2 1 3 45.84891 47.57119 45.84891 1 1 1 18 2 1 4 269.71990 47.37403 47.37403 0 0 6 18 2 1 5 69.78355 27.80081 27.80081 0 1 3 18 2 1 6 192.09392 25.34148 25.34148 0 0 3 18 2 1 ## End(Not run) summary(fam) plot(fam, famid = c(1:2)) # pedigree plots for families with IDs = 1 and 2 ## Example 2: simulate family data from a two-stage design to include # 30% of high risk families in the sample. set.seed(4321) fam <- simfam(N.fam = 50, design = "twostage", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), hr = 0.3, allelefreq = 0.02) summary(fam) ## Example 3: simulate family data from a correlated frailty model with kinship matrix set.seed(4321) fam <- simfam(N.fam = 50, design = "pop", variation = "kinship", base.dist = "Weibull", frailty.dist = "lognormal", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), depend = 1, allelefreq = 0.02) summary(fam)
Generates familial competing risks data for specified study design, genetic model and source of residual familial correlation; the generated data frame has the same family structure as that simfam
function, including individual's id, father id, mother id, relationship to proband, generation, gender, current age, genotypes of major or second genes.
simfam_cmp(N.fam, design = "pop+", variation = "none", interaction = FALSE, depend = NULL, base.dist = c("Weibull", "Weibull"), frailty.dist = "none", base.parms = list(c(0.016, 3), c(0.016, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
simfam_cmp(N.fam, design = "pop+", variation = "none", interaction = FALSE, depend = NULL, base.dist = c("Weibull", "Weibull"), frailty.dist = "none", base.parms = list(c(0.016, 3), c(0.016, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
depend |
Two values shoud be specified for each competing event when |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are |
base.parms |
The list of two vectors of baseline parameters for each event should be specified. For example, Two parameters |
vbeta |
List of two vectors of regression coefficients for each event should be specified.
Each vector contains regression coefficients for gender, majorgene, interaction between gender and majorgene (if |
allelefreq |
Population allele frequencies of major disease gene. Value should be between 0 and 1.
Vector of population allele frequencies for major and second disease genes should be provided when |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
Competing risk model
Event 1:
h1(t|X,Z) = h01(t - t0) Z1 exp(βs1 * xs + βg1 * xg),
Event 2:
h2(t|X,Z) = h02(t - t0) Z2 exp(βs2 * xs + βg2 * xg),
where h01(t) and h02(t) are the baseline hazard functions for event 1 and event 2, respectively, t0 is a minimum age of disease onset, Z1 and Z2 are frailties shared within families for each event and follow either a gamma, log-normal, correlateg gamma, or correlated log-normal distributions, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
Choice of frailty distributions for competing risk models
frailty.dist = "gamma"
shares the frailties within families generated from a gamma distribution independently for each competing event, where
Zj follows Gamma(kj, 1/kj).
frailty.dist = "lognormal"
shares the frailties within families generated from a log-normal distribution independently for each competing event, where
Zj follows log-normal distribution with mean 0 and variance (1/kj.
frailty.dist = "cgamma"
shares the frailties within families generated from a correlated gamma distribution to allow the frailties between two events to be correlated, where the correlated gamma frailties (Z1, Z2) are generated with three independent gamma frailties (Y0, Y1, Y2) as follows:
where Y0 from Gamma(k0, 1/k0);
Y1from Gamma(k1, 1/(k0 + k1));
Y2from Gamma(k2, 1/(k0 + k2)).
frailty.dist = "clognormal"
shares the frailties within families generated from a correlated log-normal distribution where
log(Zj) follows a normal distribution with mean 0, variance 1/kj and correlation between two events k0.
depend
should specify the values of related frailty parameters: c(k1, k2)
with frailty.dist = "gamma"
or frailty.dist = "lognormal"
; c(k1, k2, k0)
for frailty.dist = "cgamma"
or frailty.dist = "clognormal"
.
The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.
The design
argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop"
when proband is affected, "pop+"
when proband is affected mutation carrier, "cli"
when proband is affected and at least one parent and one sibling are affected, "cli+"
when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage"
, is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr
. High risk families often include multiple (at least two) affected members in the family.
Note that simulating family data under the clinic-based designs ("cli"
or "cli+"
) or the two-stage design can be slower since the ascertainment criteria for the high risk families are difficult to meet in such settings. Especially, "cli"
design could be slower than "cli+"
design since the proband's mutation status is randomly selected from a disease population in "cli"
design, so his/her family members are less likely to be mutation carriers and have less chance to be affected, whereas the probands are all mutation carriers, their family members have higher chance to be carriers and affected by disease. Therefore, "cli"
design requires more iterations to sample high risk families than "cli+"
design.
Returns an object of class 'simfam'
, a data frame which contains:
famID |
Family identification (ID) numbers. |
||||||||||||||
indID |
Individual ID numbers. |
||||||||||||||
gender |
Gender indicators: 1 for males, 0 for females. |
||||||||||||||
motherID |
Mother ID numbers. |
||||||||||||||
fatherID |
Father ID numbers. |
||||||||||||||
proband |
Proband indicators: 1 if the individual is the proband, 0 otherwise. |
||||||||||||||
generation |
Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings. |
||||||||||||||
majorgene |
Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. |
||||||||||||||
secondgene |
Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. |
||||||||||||||
ageonset |
Ages at disease onset in years. |
||||||||||||||
currentage |
Current ages in years. |
||||||||||||||
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
||||||||||||||
status |
Disease statuses: 1 for affected by event 1, 2 for affected by event 2, 0 for unaffected (censored). |
||||||||||||||
mgene |
Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or |
||||||||||||||
relation |
Family members' relationship with the proband:
|
||||||||||||||
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
||||||||||||||
naff |
Number of affected members by either event 1 or 2 within family. |
||||||||||||||
df1 |
Number of affected members by event 1 within family. |
||||||||||||||
df2 |
Number of affected members by event 2 within family. |
||||||||||||||
weight |
Sampling weights. |
Yun-Hee Choi
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07.
Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam_cmp, plot.simfam_cmp, penplot_cmp
## Example 1: simulate competing risk family data from pop+ design using # Weibull distribution for both baseline hazards and inducing # residual familial correlation through a correlated gamma frailty. set.seed(4321) fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) head(fam) ## Not run: famID indID gender motherID fatherID proband generation majorgene secondgene ageonset 1 1 1 1 0 0 0 1 3 0 124.23752 2 1 2 0 0 0 0 1 2 0 54.66936 3 1 3 0 2 1 1 2 2 0 32.75208 4 1 4 1 0 0 0 0 3 0 136.44926 5 1 11 1 3 4 0 3 3 0 71.53672 6 1 12 1 3 4 0 3 3 0 152.47073 currentage time status true_status mgene relation fsize naff df1 df2 weight 1 65.30602 65.30602 0 2 0 4 25 2 1 1 1 2 68.62107 54.66936 1 1 1 4 25 2 1 1 1 3 47.07842 32.75208 2 2 1 1 25 2 1 1 1 4 45.09295 45.09295 0 2 0 6 25 2 1 1 1 5 25.32819 25.32819 0 1 0 3 25 2 1 1 1 6 22.95059 22.95059 0 2 0 3 25 2 1 1 1 ## End(Not run) summary(fam) plot(fam, famid = 1) # pedigree plots for family with ID = 1
## Example 1: simulate competing risk family data from pop+ design using # Weibull distribution for both baseline hazards and inducing # residual familial correlation through a correlated gamma frailty. set.seed(4321) fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1, 2))) head(fam) ## Not run: famID indID gender motherID fatherID proband generation majorgene secondgene ageonset 1 1 1 1 0 0 0 1 3 0 124.23752 2 1 2 0 0 0 0 1 2 0 54.66936 3 1 3 0 2 1 1 2 2 0 32.75208 4 1 4 1 0 0 0 0 3 0 136.44926 5 1 11 1 3 4 0 3 3 0 71.53672 6 1 12 1 3 4 0 3 3 0 152.47073 currentage time status true_status mgene relation fsize naff df1 df2 weight 1 65.30602 65.30602 0 2 0 4 25 2 1 1 1 2 68.62107 54.66936 1 1 1 4 25 2 1 1 1 3 47.07842 32.75208 2 2 1 1 25 2 1 1 1 4 45.09295 45.09295 0 2 0 6 25 2 1 1 1 5 25.32819 25.32819 0 1 0 3 25 2 1 1 1 6 22.95059 22.95059 0 2 0 3 25 2 1 1 1 ## End(Not run) summary(fam) plot(fam, famid = 1) # pedigree plots for family with ID = 1
Generates familial time-to-event data with a time-varying covariate for specified study design, genetic model and source of residual familial correlation; the generated data frame also contains family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, genotypes of major or second genes.
simfam_tvc(N.fam, design = "pop", variation = "none", interaction = FALSE, add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, add.tvc = FALSE, tvc.type = "PE", tvc.range = NULL, tvc.parms = 1, base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), vbeta = c(1, 1), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
simfam_tvc(N.fam, design = "pop", variation = "none", interaction = FALSE, add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, add.tvc = FALSE, tvc.type = "PE", tvc.range = NULL, tvc.parms = 1, base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), vbeta = c(1, 1), allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
add.x |
Logical; if |
x.dist |
Distribution of the covairate. Possible choices to generate the covariate are: |
x.parms |
Parameter values for the specified distribution of the covariate. |
depend |
Inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default is |
add.tvc |
Logical; if |
tvc.type |
Choice of time-varying covariate model. Possible choices are: |
tvc.range |
Range of ages at which the time-varying covariate occurs.
Default is |
tvc.parms |
Vector of parameter values used for the time-varying covariate model. Default value is 1. |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for the specified baseline hazard function. |
vbeta |
Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if |
allelefreq |
Population allele frequencies of major disease gene. Value should be between 0 and 1.
Vector of population allele frequencies of major and second disease genes should be provided when |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
Time-varying covariate
When add.tvc = TRUE
, the time at which the time-varying covariate (TVC) occurs, tvc.age
, is generated from a uniform distribution with the range specified by tvc.range
. A vector of minimum and maximum ages for the TVC should be specified in tve.range
. When tvc.range = NULL
, agemin
and agemax
are used as the range. In addition, tvc.type
should be either "PE"
or "CO"
and the parameter values for the specified TVC type should be provided in tvc.parms
.
tvc.type = "PE"
represents a permanent exposure model for TVC which assumes that the effect of the TVC stays constant after tvc.age
. The tvc.parms
for the PE model should be specified as a single value, which represents log hazard ratio.
tvc.type = "CO"
represents the Cox and Oaks model for TVC which assumes that the effect of the TVC decays exponentially over time in the form
β exp(-(t - t*) η) + η0,
where t* is the time at which the TVC occurs.
The tvc.parms
for the CO model should be specified by a vector of three parameters consisting of c(beta, eta, eta0)
.
Family-based study design
The design
argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop"
when proband is affected, "pop+"
when proband is affected mutation carrier, "cli"
when proband is affected and at least one parent and one sibling are affected, "cli+"
when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage"
, is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr
. High risk families often include multiple (at least two) affected members in the family. design = "noasc"
is to be used for no ascertainment correction.
Penetrance model
The ages at onset are generated from the following penetrance models depending on the choice of variation = "none", "frailty", "secondgene", "kinship".
. When variation = "none"
, the ages at onset are independently generated from the proportional hazard model conditional on the gender and carrier status of major gene mutation, X = c(xs, xg).
The ages at onset correlated within families are generated from the shared frailty model (variation = "frailty"
) , the correlated shared frailty model with kinship matrix (variation = "kinship"
), or the two-gene model (variation = "secondene"
), where the residual familial correlation is induced by a frailty or a second gene, respectively, shared within the family.
The proportional hazard model
h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The shared frailty model
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, represents a frailty shared within families and follows either a gamma or log-normal distribution, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The correlated shared frailty model with kinship matrix
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, represents a vector of frailties following a multivariate log-normal distribution with mean
and variance
, where
represents the kinship matrix, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The two-gene model
h(t|X) = h0(t - t0) Z exp(βs * xs + β1 * x2 + β2 * x2),
where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.
The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.
Returns an object of class 'simfam'
, a data frame which contains:
famID |
Family identification (ID) numbers. |
||||||||||||||
indID |
Individual ID numbers. |
||||||||||||||
gender |
Gender indicators: 1 for males, 0 for females. |
||||||||||||||
motherID |
Mother ID numbers. |
||||||||||||||
fatherID |
Father ID numbers. |
||||||||||||||
proband |
Proband indicators: 1 if the individual is the proband, 0 otherwise. |
||||||||||||||
generation |
Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings. |
||||||||||||||
majorgene |
Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. |
||||||||||||||
secondgene |
Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. |
||||||||||||||
ageonset |
Ages at disease onset in years. |
||||||||||||||
currentage |
Current ages in years. |
||||||||||||||
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
||||||||||||||
status |
Disease statuses: 1 for affected, 0 for unaffected (censored). |
||||||||||||||
mgene |
Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or |
||||||||||||||
newx |
Additional covariate when |
||||||||||||||
tvc.age |
Age at which the time-varying covariate occurs when |
||||||||||||||
tvc.status |
TVC status: 1 if |
||||||||||||||
relation |
Family members' relationship with the proband:
|
||||||||||||||
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
||||||||||||||
naff |
Number of affected members in family. |
||||||||||||||
weight |
Sampling weights. |
Yun-Hee Choi
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam_tvc, plot.simfam_tvc
## Example: simulate family data with TVC based on CO model. set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) ## Not run: > head(fam) famID indID gender motherID fatherID proband generation majorgene secondgene ageonset 1 1 1 1 0 0 0 1 2 0 61.80566 2 1 2 0 0 0 0 1 3 0 61.56996 3 1 3 0 2 1 1 2 2 0 39.42050 4 1 4 1 0 0 0 0 3 0 90.17320 5 1 13 0 3 4 0 3 3 0 51.49538 6 1 14 0 3 4 0 3 3 0 75.97238 currentage time status mgene tvc.age tvc.status relation fsize naff weight 1 68.26812 61.80566 1 1 59.16387 1 4 29 3 1 2 68.60174 61.56996 1 0 39.45786 1 4 29 3 1 3 47.05410 39.42050 1 1 35.01941 1 1 29 3 1 4 44.86501 44.86501 0 0 58.67013 0 6 29 3 1 5 22.73075 22.73075 0 0 30.19254 0 3 29 3 1 6 22.71399 22.71399 0 0 40.66258 0 3 29 3 1 > summary(fam) Study design: pop: population-based study with affected probands Baseline distribution: Weibull Frailty distribution: gamma Number of families: 10 Average number of affected per family: 3.1 Average number of carriers per family: 3.4 Average family size: 16.3 Average age of onset for affected: 48.19 Average number of TVC event per family: 4 Sampling weights used: 1 ## End(Not run)
## Example: simulate family data with TVC based on CO model. set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) ## Not run: > head(fam) famID indID gender motherID fatherID proband generation majorgene secondgene ageonset 1 1 1 1 0 0 0 1 2 0 61.80566 2 1 2 0 0 0 0 1 3 0 61.56996 3 1 3 0 2 1 1 2 2 0 39.42050 4 1 4 1 0 0 0 0 3 0 90.17320 5 1 13 0 3 4 0 3 3 0 51.49538 6 1 14 0 3 4 0 3 3 0 75.97238 currentage time status mgene tvc.age tvc.status relation fsize naff weight 1 68.26812 61.80566 1 1 59.16387 1 4 29 3 1 2 68.60174 61.56996 1 0 39.45786 1 4 29 3 1 3 47.05410 39.42050 1 1 35.01941 1 1 29 3 1 4 44.86501 44.86501 0 0 58.67013 0 6 29 3 1 5 22.73075 22.73075 0 0 30.19254 0 3 29 3 1 6 22.71399 22.71399 0 0 40.66258 0 3 29 3 1 > summary(fam) Study design: pop: population-based study with affected probands Baseline distribution: Weibull Frailty distribution: gamma Number of families: 10 Average number of affected per family: 3.1 Average number of carriers per family: 3.4 Average family size: 16.3 Average age of onset for affected: 48.19 Average number of TVC event per family: 4 Sampling weights used: 1 ## End(Not run)
Generate familial time-to-event data from correlated fraily model with Kinship or/and IBD matrices given pedigree data.
simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1, 1), agemin = 20, hr = NULL)
simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1, 1), agemin = 20, hr = NULL)
inputdata |
Dataframe contains variables |
IBD |
IBD matrix |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
depend |
Inverse of variance for the frailty distribution. A single value should be specified when |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
base.parms |
Vector of parameter values for the specified baseline hazard function. |
var_names |
Names of variables to be used in generating time-to-event data. Specified variables should be part of |
vbeta |
Vector of regression coefficients for the variables specified by |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
The ages at onset are generated from the correlated frailties and covariates using the following model:
The correlated shared frailty model with kinship and/or IBD matrices
h(t|X,Z) = h0(t - t0) Z exp( X*vbeta ),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, represents a vector of frailties following a multivariate log-normal distribution with mean
and variance
, where
represents the kinship matrix and D is IBD matrix,
and
are variance components related to each matrix and their values are specified by
depend = c(1/sig1, 1/sig2)
, and represents a vector of variables whose names are specified by
var_names
, and \beta is a vector of corresponding coefficients whose values are specified by vbeta
.
The variance structure of the frailties shared within families is chosen by either variation = "kinship"
or "IBD"
matrix or both variation = c("kinship", "IBD")
.
When variation = "none"
, the ages at onset are independently generated from the proportional hazard model conditional on the covariates .
The design
argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop"
when proband is affected, "pop+"
when proband is affected mutation carrier, "cli"
when proband is affected and at least one parent and one sibling are affected, "cli+"
when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage"
, is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr
. High risk families often include multiple (at least two) affected members in the family. design = "noasc"
is to be used for no ascertainment correction.
Returns an object of class 'simfam'
, a data frame which contains inputdata
and the following:
ageonset |
Ages at disease onset in years. |
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
status |
Disease statuses: 1 for affected, 0 for unaffected (censored). |
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
naff |
Number of affected members in family. |
weight |
Sampling weights. |
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam2, plot.simfam, penplot
## Example: simulate family data from a population-based design using # a Weibull distribution for the baseline hazard and inducing # residual familial correlation through kinship and IBD matrices. # Inputdata and IBD matrix should be provided; # simuated inputdata as an example here; data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) head(fam2) summary(fam2)
## Example: simulate family data from a population-based design using # a Weibull distribution for the baseline hazard and inducing # residual familial correlation through kinship and IBD matrices. # Inputdata and IBD matrix should be provided; # simuated inputdata as an example here; data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) head(fam2) summary(fam2)
penmodel
Provides a summary of a fitted penetrance model.
## S3 method for class 'penmodel' summary(object, correlation=FALSE, ...)
## S3 method for class 'penmodel' summary(object, correlation=FALSE, ...)
object |
An object class of |
correlation |
Logical; if |
... |
Further arguments passed to or from other methods. |
Returns the object of class 'summary.penmodel'
, including the following summary values:
estimates |
List of parameter estimates of transformed baseline parameters and regression coefficients, their standard errors, their robust standard errors if |
varcov |
Variance-covariance matrix of the parameter estimates. |
varcov.robust |
Robust variance-covariance matrix of the parameter estimates if |
correlation |
Correlation matrix obtained from the variance-covariance matrix. |
correlation.robust |
Correlation matrix obtained from the robust variance-covariance matrix if |
Yun-Hee Choi
penmodel, penmodelEM, print.penmodel, print.summary.penmodel plot.penmodel
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02) # Penetrance model fit for the simulated family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", parms=c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull") # Summary of the model parameter and penetrance estimates from model fit summary(fit) ## Not run: Estimates: Estimate Std. Error t value Pr(>|t|) log(lambda) -4.531 0.08583 -52.793 0.01206 * log(rho) 1.113 0.04688 23.737 0.02680 * gender -1.302 0.19233 -6.768 0.09339 . mgene 2.349 0.23825 9.859 0.06436 . Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## End(Not run)
# Simulated family data set.seed(4321) fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02) # Penetrance model fit for the simulated family data fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", parms=c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull") # Summary of the model parameter and penetrance estimates from model fit summary(fit) ## Not run: Estimates: Estimate Std. Error t value Pr(>|t|) log(lambda) -4.531 0.08583 -52.793 0.01206 * log(rho) 1.113 0.04688 23.737 0.02680 * gender -1.302 0.19233 -6.768 0.09339 . mgene 2.349 0.23825 9.859 0.06436 . Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## End(Not run)
penmodel_cmp
Provides a summary of a fitted competing risks penetrance model.
## S3 method for class 'penmodel_cmp' summary(object, correlation=FALSE, ...)
## S3 method for class 'penmodel_cmp' summary(object, correlation=FALSE, ...)
object |
An object class of |
correlation |
Logical; if |
... |
Further arguments passed to or from other methods. |
Returns the object of class 'summary.penmodel_cmp'
, including the following summary values:
estimates |
List of parameter estimates of transformed baseline parameters and regression coefficients, their standard errors, their robust standard errors if |
varcov |
Variance-covariance matrix of the parameter estimates. |
varcov.robust |
Robust variance-covariance matrix of the parameter estimates if |
correlation |
Correlation matrix obtained from the variance-covariance matrix. |
correlation.robust |
Correlation matrix obtained from the robust variance-covariance matrix if |
Yun-Hee Choi
penmodel_cmp, print.penmodel_cmp, print.summary.penmodel_cmp plot.penmodel_cmp
# Simulated family completing risks data ## Not run: set.seed(4321) fam1 <- simfam_cmp(N.fam = 300, design = "pop+", variation = "frailty", competing=TRUE, base.dist = "Weibull", frailty.dist = "gamma", depend=c(0.5, 1), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Penetrance model fit for the simulated family data fit <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", parms = list(c(0.01, 3, -1.13, 2.35), c(0.01, 3, -1, 2)), data = fam1, design = "pop+", base.dist = "Weibull") # Summary of the model parameter and penetrance estimates from model fit summary(fit) ## End(Not run)
# Simulated family completing risks data ## Not run: set.seed(4321) fam1 <- simfam_cmp(N.fam = 300, design = "pop+", variation = "frailty", competing=TRUE, base.dist = "Weibull", frailty.dist = "gamma", depend=c(0.5, 1), allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35),c(-1, 2))) # Penetrance model fit for the simulated family data fit <- penmodel_cmp( formula1 = Surv(time, status==1) ~ gender + mgene, formula2 = Surv(time, status==2) ~ gender + mgene, cluster = "famID", parms = list(c(0.01, 3, -1.13, 2.35), c(0.01, 3, -1, 2)), data = fam1, design = "pop+", base.dist = "Weibull") # Summary of the model parameter and penetrance estimates from model fit summary(fit) ## End(Not run)
simfam
Provides a summary of simulated data.
## S3 method for class 'simfam' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'simfam' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Displays a summary of simulated data and returns the following values:
num.fam |
Number of families simulated. |
avg.num.affected |
Average number of affected individuals per family. |
avg.num.carriers |
Average number of mutation carriers per family. |
avg.family.size |
Average family size. |
ave.ageonset |
Average age of onset for affected individuals. |
Yun-Hee Choi
set.seed(4321) fam <- simfam(N.fam = 50, design = "pop", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam) ## Not run: Study design: pop Baseline distribution: Weibull Number of families: 50 Average number of affected per family: 1.24 Average number of carriers per family: 1.3 Average family size: 17.02 Average age of onset for affected: 40.08 ## End(Not run)
set.seed(4321) fam <- simfam(N.fam = 50, design = "pop", variation = "none", base.dist = "Weibull", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam) ## Not run: Study design: pop Baseline distribution: Weibull Number of families: 50 Average number of affected per family: 1.24 Average number of carriers per family: 1.3 Average family size: 17.02 Average age of onset for affected: 40.08 ## End(Not run)
simfam_cmp
Provides a summary of simulated data from simfam_cmp
function.
## S3 method for class 'simfam_cmp' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'simfam_cmp' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Displays a summary of simulated data and returns the following values:
num.fam |
Number of families simulated. |
avg.num.affected1 |
Average number of affected individuals by event 1 per family. |
avg.num.affected2 |
Average number of affected individuals by event 2 per family. |
avg.num.carriers |
Average number of mutation carriers per family. |
avg.family.size |
Average family size. |
ave.ageonset1 |
Average age of onset for affected individuals by event 1. |
ave.ageonset2 |
Average age of onset for affected individuals by event 2. |
Yun-Hee Choi
set.seed(4321) fam <- simfam_cmp(N.fam = 50, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1,2))) summary(fam) ## Not run: Study design: pop+ Baseline distribution for event 1: Weibull Baseline distribution for event 2: Weibull Number of families: 50 Average number of event 1 per family: 1.24 Average number of event 2 per family: 0.7 Average number of carriers per family: 5.54 Average family size: 15.58 Average age of onset for event 1: 42.59 Average age of onset for event 2: 43.72 ## End(Not run)
set.seed(4321) fam <- simfam_cmp(N.fam = 50, design = "pop+", variation = "none", base.dist = "Weibull", base.parms = list(c(0.01, 3), c(0.01, 3)), vbeta = list(c(-1.13, 2.35), c(-1,2))) summary(fam) ## Not run: Study design: pop+ Baseline distribution for event 1: Weibull Baseline distribution for event 2: Weibull Number of families: 50 Average number of event 1 per family: 1.24 Average number of event 2 per family: 0.7 Average number of carriers per family: 5.54 Average family size: 15.58 Average age of onset for event 1: 42.59 Average age of onset for event 2: 43.72 ## End(Not run)
simfam_tvc
Provides a summary of simulated data.
## S3 method for class 'simfam_tvc' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'simfam_tvc' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Displays a summary of simulated data and returns the following values:
num.fam |
Number of families simulated. |
avg.num.affected |
Average number of affected individuals per family. |
avg.num.carriers |
Average number of mutation carriers per family. |
avg.family.size |
Average family size. |
ave.ageonset |
Average age of onset for affected individuals. |
ave.num.tvc |
Average number of TVC events per family. |
Yun-Hee Choi
set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam) ## Not run: Study design: pop: population-based study with affected probands Baseline distribution: Weibull Frailty distribution: gamma Number of families: 10 Average number of affected per family: 3.1 Average number of carriers per family: 3.4 Average family size: 16.3 Average age of onset for affected: 48.19 Average number of TVC event per family: 4 ## End(Not run)
set.seed(4321) fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", base.dist = "Weibull", frailty.dist = "gamma", depend = 1, add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35)) summary(fam) ## Not run: Study design: pop: population-based study with affected probands Baseline distribution: Weibull Frailty distribution: gamma Number of families: 10 Average number of affected per family: 3.1 Average number of carriers per family: 3.4 Average family size: 16.3 Average age of onset for affected: 48.19 Average number of TVC event per family: 4 ## End(Not run)
simfam2
Provides a summary of simulated data.
## S3 method for class 'simfam2' summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'simfam2' summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object class of |
digits |
Number of significant digits to use when printing. |
... |
Further arguments passed to or from other methods. |
Displays a summary of simulated data and returns the following values:
num.fam |
Number of families simulated. |
avg.num.affected |
Average number of affected individuals per family. |
avg.num.carriers |
Average number of mutation carriers per family. |
avg.family.size |
Average family size. |
ave.ageonset |
Average age of onset for affected individuals. |
Yun-Hee Choi
set.seed(4321) data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) summary(fam2) ## Not run: Study design: pop Baseline distribution: Weibull Frailty distribution: lognormal with kinship and IBD matrices Number of families: 50 Average number of affected per family: 1.24 Average number of carriers per family: 1.3 Average family size: 17.02 Average age of onset for affected: 40.08 ## End(Not run)
set.seed(4321) data <- simfam(N.fam = 10, design = "noasc", variation = "none", base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1)) IBDmatrix <- diag(1, dim(data)[1]) data <- data[ , c(1:7, 11, 14)] fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", variation = c("kinship","IBD"), depend = c(1, 1), base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), vbeta = c(1,1), agemin=20) summary(fam2) ## Not run: Study design: pop Baseline distribution: Weibull Frailty distribution: lognormal with kinship and IBD matrices Number of families: 50 Average number of affected per family: 1.24 Average number of carriers per family: 1.3 Average family size: 17.02 Average age of onset for affected: 40.08 ## End(Not run)