Package 'FamEvent'

Title: Family Age-at-Onset Data Simulation and Penetrance Estimation
Description: Simulates age-at-onset traits associated with a segregating major gene in family data obtained from population-based, clinic-based, or multi-stage designs. Appropriate ascertainment correction is utilized to estimate age-dependent penetrance functions either parametrically from the fitted model or nonparametrically from the data. The Expectation and Maximization algorithm can infer missing genotypes and carrier probabilities estimated from family's genotype and phenotype information or from a fitted model. Plot functions include pedigrees of simulated families and predicted penetrance curves based on specified parameter values. For more information see Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30.
Authors: Yun-Hee Choi, Karen Kopciuk, Wenqing He, Laurent Briollais
Maintainer: Yun-Hee Choi <[email protected]>
License: GPL (>= 2.0)
Version: 3.2
Built: 2024-10-31 06:35:13 UTC
Source: CRAN

Help Index


Family age-at-onset data simulation and penetrance estimation

Description

Family-based studies are used to characterize the disease risk associated with being a carrier of a major gene. When the disease risk can vary with age of onset, penetrance or disease risk functions need to provide age-dependent estimates of this disease risk over lifetime. This FamEvent package can generate age-at-onset data in the context of familial studies, with correction for ascertainment (selection) bias arising from a specified study design based on proband's mutation and disease statuses. Possible study designs are: "pop" for population-based design where families are ascertained through affected probands, "pop+" are similar to "pop" but probands are also known mutation carriers, "cli" for clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" are similar to "cli" but probands are also known mutation carrriers. And "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families that includes at least two affected members in the family at the second stage.

Ages at disease onset are generated specific to family members' gender and mutation status according to the specified study design with residual familial correlations induced by a shared frailty, correlated frailties with Kinship matrix, or a second gene.

For estimating age at onset risks with family data, an ascertainment corrected prospective likelihood approach is used to account for the population or clinic-based study designs while a composite likelihood approach is used for the two-stage sampling design. The Expectation and Maximization (EM) algorithm has been implemented for inferring missing genotypes conditional on observed genotypes and phenotypes in the families. For family members who have missing genotypes, their carrier probabilities are obtained either from the fitted model or from Mendelian transmission probabilities.

This package also provides functions to plot the age-dependent penetrance curves estimated parametrically from the fitted model or non-parametrically from the data, pedigree plots of simulated families and penetrance function curves for carriers and non-carriers of a major and second gene based on specified parameter values.

In summary, this package facilitates the following:

1. Family data generations from 1) a shared frailty model or 2) a correlated frailty model with Kinship and/or IBD matrix; 3) a competing risk model.

2. Covariates considered: sex, gene, options for additional covariates.

3. Time-varying covariate generation based on permanent exposure (PE) or the Cox-Oaks model (CO).

3. Model estimation for shared frailty models and competing risk models.

Author(s)

Yun-Hee Choi, Karen Kopciuk, Laurent Briollais, Wenqing He

Maintainer: Yun-Hee Choi < [email protected] >

References

Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

simfam, summary.simfam, plot.simfam, simfam2, summary.simfam2, plot.simfam2, simfam_tvc, summary.simfam_tvc, plot.simfam_tvc, penplot, carrierprob, penmodel, penmodelEM, print.penmodel, summary.penmodel,print.summary.penmodel, plot.penmodel, simfam_cmp, summary.simfam_cmp, plot.simfam_cmp, penplot_cmp, penmodel_cmp, print.penmodel_cmp, summary.penmodel_cmp, print.summary.penmodel_cmp, plot.penmodel_cmp

Examples

## Not run: 
# Example1:  Simulate family data 
set.seed(4321)
fam <- simfam(N.fam = 100, design = "pop+", variation = "none", base.dist = "Weibull", 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), allelefreq = 0.02)

# summary of simulated family data
summary(fam) 

# Pedigree plots for family 1 and 2
plot(fam, famid = c(1,2))

# penetrance function plots given model parameter values for Weibull baseline 
penplot(base.parms = c(0.01, 3), vbeta = c(-1.3, 2.35), base.dist = "Weibull", 
        variation = "none", agemin = 20)

# model fit of family data 
fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+",
       parms=c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE)

# summary of estimated model parameters and penetrance estimates 
summary(fit)

# penetrance curves useful for model checking 
plot(fit) 

## Example 2: Simulate family data from a correlated frailty model 
#  with Kinship and IBD matrices given pedigree data.

# Inputdata and IBD matrix should be provided; 
# Inputdata was generated as an example using simfam.

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

IBDmatrix <- diag(1, dim(data)[1])
data <- data[ , c(1:7, 11, 14)]

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

summary(fam2)

### Example 3: Simulate correlated competing risks family data 
set.seed(4321)
fam3 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), 
       allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
       vbeta = list(c(-1.13, 2.35), c(-1, 2)))

# summary of simulated family data
summary(fam3) 

# Pedigree plots for family 1 
plot(fam3, famid = 1)

# penetrance function plot for event 1 given model parameter values for Weibull baseline 
penplot_cmp(event = 1, base.parms = list(c(0.01, 3), c(0.01, 3)), 
          vbeta = list(c(-1.3, 2.35), c(-1, 2)), base.dist = "Weibull", 
          variation = "frailty", frailty.dist = "cgamma", 
          depend=c(1,2,0.5), agemin = 20)


# Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data

fit3 <- penmodel_cmp(
        formula1 = Surv(time, status==1) ~ gender + mgene, 
        formula2 = Surv(time, status==2) ~ gender + mgene,
        cluster = "famID", gvar = "mgene", design = "pop+",  
        parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)),
        base.dist = "Weibull", frailty.dist = "cgamma", data = fam2, robust = TRUE)

# Summary of the model parameter estimates from the model fit
summary(fit3)

# Plot the lifetime penetrance curves with 95
# gender and mutation status groups along with their nonparametric penetrance curves  
# based on data excluding probands. 

plot(fit3, add.CIF = TRUE, conf.int = TRUE, MC = 100)

### Example 4: Simulate family data with a time-varying covariate 
set.seed(4321)

fam4 <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "gamma", depend = 1, 
       add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), 
       tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))

summary(fam4)


## End(Not run)

Compute mutation carrier probabilities for individuals with missing gentoypes

Description

Computes model- or data-based carrier probabilities for individuals with missing genotypes based on the observed mutation status of family members and the individual's phenotype.

Usage

carrierprob(condition = "geno", method = "data", fit = NULL, data, mode = "dominant", 
q = 0.02)

Arguments

condition

Choice of conditional information to use for computing the carrier probability. Possible choices are "geno" for using observed genotypes and "geno+pheno" for using both observed genotype and phenotype information in the calculation of the carrier probability.

method

Choice of methods to calculate the carrier probability. Possible choices are "data" for empirical calculation of the carrier probabilities based on data, "mendelian" using Mendelian transmission probabilities based on observed carriers within families, or "model" using the parametric model fit; see details below. Default is "data". If method = "data", only data is required to be specified.

fit

An object of class penmodel, a fitted model by penmodelEM function for inferring missing mutation statuses in the family.

data

Family data that includes missing genotypes using the same data format generated by the function simfam.

mode

Choice of modes of inheritance when using method="model". Possible choices are "dominant" for dominant model or "recessive" for recessive model. Default is "dominant".

q

Frequency of the disease causing allele when using method="model". The value should be between 0 and 1. If NULL, the estimated allele frequency from data will be used. Default value is 0.02.

Details

When method="model" along with the choice of condition="geno+pheno", the carrier probability for individual ii is calculated by conditioning on her/his observed phenotype and carrier statuses of family members P(Xi = 1 | Yi , Xo ) = P(Yi | Xi = 1) * P(Xi = 1 | Xo) / (P(Yi | Xi = 1) * P(Xi = 1 | Xo) + P(Yi | Xi = 0) * P(Xi = 0 | Xo)) where Xi indicates the unknown carrier status of individual ii and Xo represents the observed carrier statuses in his or her family members; Yi represents the observed phenotype ti, δi of individual ii in terms of age at onset ti and disease status indicator δi with 1 used for affected individuals and 0 for unaffected individuals. When method="mendelian" along with the choice of condition="geno", the carrier probability is calculated based on Mendelian laws of genetic transmission with a fixed allele frequency.

Value

Returns a data frame with a vector of carrier probabilities called carrp.geno when condition="geno" or carrp.pheno when condtion="geno+pheno" added after the last column of the family data.

Author(s)

Yun-Hee Choi

See Also

simfam, penmodelEM, plot.simfam, summary.simfam

Examples

# Simulated family data with 30% of members missing their genetic information.

set.seed(4321)
fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", mrate = 0.3,
       base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), agemin = 20)
 
# EM algorithm for fitting family data with missing genotypes assuming a Weibull
# baseline hazard and dominant mode of Mendelian inheritance for a major gene.

fitEM <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", 
        parms = c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull", 
        method = "mendelian", mode = "dominant")

# Carrier probability obtained by conditioning on the observed genotypes and phenotype, 
# assuming a dominant Mendelian mode of inheritance

fam.added <- carrierprob(condition = "geno+pheno", method = "model", fit = fitEM, 
             data = fam, mode = "dominant", q = 0.02)

# pedigree plot for family 1 displaying carrier probabilities

plot.simfam(fam.added, famid = 1)

Simulation-based power calculation for genetic effect

Description

Computes the power of detecting genetic effect in the penetrance model based on a family-based simulation study.

Usage

fampower(N.fam, N.sim, effectsize, beta.sex, alpha = 0.05, side = 2, design = "pop", 
variation = "none", interaction = FALSE, depend = NULL, base.dist = "Weibull", 
frailty.dist = NULL, base.parms, allelefreq = c(0.02, 0.2), dominant.m = TRUE, 
dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)

Arguments

N.fam

Number of families to generate.

N.sim

Number of simulations.

effectsize

Effect size of the major mutated gene (beta.gene) to detect under the alternative hypothesis. When interaction=TRUE, both the main and interaction effects should be specified, effectsize = c(beta.gene, beta.int).

beta.sex

Gender effect that is fixed in the model.

alpha

Significance level. Default value is 0.05.

side

Number of sides for the alternative hypothesis. Possible choices are 1 for one-sided test and 2 for two-sided test. Default value is 2.

design

Family based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+" or "twostage", where "pop" is for the population-based design that families are ascertained by affected probands, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sibling affected, "cli+" is similar to "cli" but with mutation carrier probands and "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families in the second stage that includes at least two affected members in the family. Default is "pop".

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, or "none" for no residual familial correlation. Default is "none".

interaction

Logical; if TRUE, the interaction between gender and mutation status is allowed, otherwise no interaction is allowed. Default is FALSE.

depend

Variance of the frailty distribution. Dependence within families increases with depend value. Default value is NULL. Value > 0 should be specified when variation = "frailty".

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal" "gamma", or "logBurr". Default is "Weibull".

frailty.dist

Choice of frailty distribution. Possible choices are: "gamma" for gamma distribution or "lognormal" for log normal distribution when variation = "frailty". Default is NULL.

base.parms

Vector of parameter values for baseline hazard function.

base.parms = c(lambda, rho), where lambda and rho are the shape and scale parameters, respectively. If base.dist = "logBurr" is chosen, three parameters should be specified for base.parms = c(lambda, rho, eta).

allelefreq

Vector of population allele frequencies of major and second disease gene alleles. Frequencies must be between 0 and 1. Default frequencies are 0.02 for major gene allele and 0.2 for second gene allele, allelefreq = c(0.02, 0.2).

dominant.m

Logical; if TRUE, the genetic model of the major gene is dominant, otherwise recessive.

dominant.s

Logical; if TRUE, the genetic model of the second gene is dominant, otherwise recessive.

mrate

Proportion of missing genotypes, value between 0 and 1. Default value is 0.

hr

Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when design = "twostage". Default value is 0. Value should lie between 0 and 1.

probandage

Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, probandage = c(45, 2).

agemin

Minimum age of disease onset or minimum age. Default is 20 years of age.

agemax

Maximum age of disease onset or maximum age. Default is 100 years of age.

Details

The power of testing H0:βgene=0H_0: \beta_{gene} = 0 vs. H1:βgene=H_1: \beta_{gene} = effectsize is obtained by the proportion of times the null hypothesis is rejected out of the N.sim simulations.

When interaction = TRUE, the powers of both the main effect of mutated gend and the interaction effect of mutated gene and gender will be computed.

Value

Returns

power

Power of detecting the genetic effect.

Author(s)

Yun-Hee Choi

See Also

simfam

Examples

## Example 1: obtain the power for testing the genetic effect 
# based on 50 POP families simulated using 100 simulations
## Not run: 
set.seed(4321)
fampower(N.fam = 50, N.sim = 100, effectsize = 1, beta.sex = 0.8, alpha = 0.05, side = 2, 
design = "pop+", variation = "none", base.dist = "Weibull", allelefreq = 0.02, 
base.parms = c(0.01, 3))
## End(Not run)
    
## Example 2: obtain the power for both the main and interaction effects  
# based on 50 POP families simulated using 100 simulations
## Not run: 
set.seed(4321)
fampower(N.fam = 50, N.sim = 100, effectsize = c(1.5, 1), beta.sex = 0.8, alpha = 0.05, 
side = 2, interaction = TRUE,  design = "pop+", variation = "none", base.dist = "Weibull", 
allelefreq = 0.02, base.parms = c(0.01, 3))
## End(Not run)

Ontario Lynch Syndrom families

Description

Data from 32 Lynch Syndrome families segregating mismatch repair mutations selected from the Ontario Familial Colorectal Cancer Registry that includes 765 individuals, both probands and relatives. The families were ascertained throughout affected and mutation carrier probands.

Usage

data("LSfam")

Format

A data frame with 765 observations on the following 11 variables.

famID

Family identification (ID) numbers.

indID

Individuals ID numbers.

fatherID

Father ID numbers.

motherID

Mother ID numbers.

gender

Gender indicators: 1 for male, 0 for female.

status

Disease statuses: 1 for affected, 0 for unaffected.

time

Ages at diagnosis of colorectal cancer for the affected or ages of last follow-up for the unaffected.

currentage

Current ages in years.

mgene

MLH1 or MSH2 mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or NA if missing.

proband

Proband indicators: 1 for proband, 0 for non-proband.

relation

Family members' relationship with the proband. Relation codes:

1 Proband (self)
2 Brother or sister
3 Son or daughter
4 Parent
5 Nephew or niece
6 Spouse
7 Brother or sister in law
8 Paternal grandparent
9 Paternal uncle or aunt
10 Paternal cousin
11 Maternal grandparent
12 Maternal uncle or aunt
13 Maternal cousin
14 Son or daughter in law
15 Grandchild
16 Uncle's or aunt's spouse.

References

Choi, Y.-H., Cotterchio, M., McKeown-Eyssen, G., Neerav, M., Bapat, B., Boyd, K., Gallinger, S., McLaughlin, J., Aronson, M., and Briollais, L. (2009). Penetrance of Colorectal Cancer among MLH1/ MSH2 Carriers Participating in the Colorectal Cancer Familial Registry in Ontario, Hereditary Cancer in Clinical Practice, 7:14.

Examples

data(LSfam)

# Summary of LSfam
summary.simfam(LSfam)

# Pedigree plot for the first family
plot.simfam(LSfam)

# Assign minimum age for fitting penmodel
attr(LSfam, "agemin") <- 18

fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", 
       parms = c(0.05, 2, 1, 3), data = LSfam[!is.na(LSfam$time) & LSfam$time > 18, ], 
       method = "mendelian", base.dist = "Weibull", design = "pop+", robust = TRUE)

summary(fit)

penetrance(fit, fixed = c(1, 1), age = c(50, 60, 70), CI = TRUE, MC = 100)

Penetrance function and confidence intervals

Description

Estimates the cumulative disease risks (penetrances) and confidence intervals at given age(s) based on the fitted penetrance model.

Usage

penetrance(fit, fixed, age, CI = TRUE, MC = 100)

Arguments

fit

An object class of 'penmodel', a fitted model by penmodel or penmodelEM functions.

fixed

Vector of fixed values of the covariates used for penetrance calculation.

age

Vector of ages used for penetrance calculation.

CI

Logical; if TRUE, the 95% confidence interval will be obtained using a Monte-Carlo method, otherwise no confidence interval will be provided. Default is TRUE.

MC

Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method. If MC=0, no confidence intervals will be calculated. Default value is 100.

Details

The penetrance function is defined as the probability of developing a disease by age tt given fixed values of covariates xx,

P(T<tx)=1S(t;x),P(T < t | x) = 1 - S(t; x),

where tt is greater than the minimum age t0 and S(t;x)S(t; x) is the survival distribution based on a proportional hazards model with a specified baseline hazard distribution.

The proportional hazards model is specified as: h(t|x) = h0(t) exp(β*x), where h0(t) is the baseline hazards function, xx is the vector of covariates and β is the vector of corresponding regression coefficients.

Calculations of standard errors of the penetrance estimates and 95% confidence intervals (CIs) for the penetrance at a given age are based on Monte-Carlo simulations of the estimated penetrance model.

A multivariate normal distribution is assumed for the parameter estimates, and MC = n sets of the parameters are generated from the multivariate normal distribution with the parameter estimates and their variance-covariance matrix. For each simulated set, a penetrance estimate is calculated at a given age by substituting the simulated parameters into the penetrance function.

The standard error of the penetrance estimate at a given age is calculated by the standard deviation of penetrance estimates obtained from nn simulations.

The 95% CI for the penetrance at a given age is calculated using the 2.5th and 97.5th percentiles of the penetrance estimates obtained from nn simulations.

Value

Returns the following values:

age

Ages at which the penetrances are calculated.

penetrance

Penetrance estimates at given ages.

lower

Lower limit of the 95% confidence interval; simulation-based 2.5th percentile of the penetrance estimates.

upper

Upper limit of the 95% confidence interval; simulation-based 97.5th percentile of the penetrance estimates.

se

Simulation-based standard errors of the penetrance estimates.

Author(s)

Yun-Hee Choi

See Also

simfam, penmodel, penmodelEM

Examples

set.seed(4321)
  fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, 
         base.parms = c(0.01,3), vbeta = c(-1.13, 2.35))
	
  fit <- penmodel(Surv(time, status) ~ gender +  mgene, cluster = "famID", 
         parms = c(0.01, 3, -1.13, 2.35),  data = fam, base.dist = "Weibull", design = "pop+")
 
 # Compute penetrance estimates for male carriers at age 40, 50, 60, and 70 and
 # their 95% CIs based on 100 Monte Carlo simulations.
 
 penetrance(fit, fixed = c(1,1), age = c(40, 50, 60, 70), CI = TRUE, MC = 100)

Estimating Penetrances for competing risk models and confidence intervals

Description

Estimates the cumulative disease risks (penetrances) and confidence intervals for the event of interest in the presence of competing event given fixed values of covariates based on the fitted competing risk model.

Usage

penetrance_cmp(fit, event = 1, fixed, age, CI = TRUE, MC = 100)

Arguments

fit

An object class of 'penmodel_cmp', a fitted model by penmodel_cmp function.

event

Event of interest (either 1 or 2) for penetrance estimation. Default value is 1.

fixed

list of vectors of fixed values of the covariates for both events used for penetrance calculation.

age

Vector of ages used for penetrance calculation.

CI

Logical; if TRUE, the 95% confidence interval will be obtained using a Monte-Carlo method, otherwise no confidence interval will be provided. Default is TRUE.

MC

Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method. If MC=0, no confidence intervals will be calculated. Default value is 100.

Details

The cause-specific hazard for event jj is specified as: hj(t|x) = h0j(t) exp(βj *xj), where h0j(t) is the baseline hazards function for event jj, xj is the vector of covariates associated with event jj and βj is the vector of corresponding regression coefficients, j=1,2j = 1, 2.

The penetrance function for event jj in the presence of competing risks based on cause-specific hazards (with no frailties assumed) model is defined as the probability of developing an event of interest by age tt given fixed values of covariates xx in the following form: P(T < t, d = j |x) = ∫t0t hj(u|xj)exp(-H1(u|x1)-H2(u|x2))du, where t0 is the minimum age of onset, dd is the type of event which takes (j=1,2)(j = 1, 2).

The shared frailty competing risks model is: hj(t|zj, x) = zj*h0j(t) exp(βj *xj), where zj is the shared frailty for event jj within families whose distribution is specified by frailty.dist.

The penetrance function for event jj from the shared frailty competing risks model is obtained by integrating over the frailty distribution of G(z1, z2), P(T < t, d = j | x) = ∫t0t ∫ ∫ hj(u|xj, zj) exp(-H1(u|x1 zj)-H2(u|x2, zj)) dG(z1 dz2)du, where t0 is the minimum age of onset, dd is the type of event which takes (j=1,2)(j = 1, 2).

See Choi et al. (2021) for more details about the penetrance functions.

Calculations of standard errors of the penetrance estimates and 95% confidence intervals (CIs) for the penetrance at a given age are based on Monte-Carlo simulations of the estimated penetrance model. A multivariate normal distribution is assumed for the parameter estimates, and MC = n sets of the parameters are generated from the multivariate normal distribution with the parameter estimates and their variance-covariance matrix. For each simulated set, a penetrance estimate is calculated at a given age by substituting the simulated parameters into the penetrance function.

The standard error of the penetrance estimate at a given age is calculated by the standard deviation of penetrance estimates obtained from nn simulations.

The 95% CI for the penetrance at a given age is calculated using the 2.5th and 97.5th percentiles of the penetrance estimates obtained from nn simulations.

Value

Returns the following values:

age

Ages at which the penetrances are calculated.

penetrance

Penetrance estimates at given ages.

lower

Lower limit of the 95% confidence interval; simulation-based 2.5th percentile of the penetrance estimates.

upper

Upper limit of the 95% confidence interval; simulation-based 97.5th percentile of the penetrance estimates.

se

Simulation-based standard errors of the penetrance estimates.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

See Also

simfam_cmp, penmodel_cmp

Examples

## Not run: 
set.seed(4321)
fam2 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", competing = TRUE,
                base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), 
                base.parms = list(c(0.01, 3), c(0.01, 3)), 
                vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02)

fit2 <- penmodel_cmp(Surv(time, status==1) ~ gender + mgene, 
                   Surv(time, status==2) ~ gender + mgene, 
                   cluster = "famID", gvar = "mgene", frailty.dist = "cgamma",
                   parms = list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), 
                   data = fam2, design = "pop+", base.dist = "Weibull", 
                   agemin = NULL, robust = TRUE)

# Compute penetrance estimates for event 1 for male carriers at age 40, 50, 60, 70 and
# their 95
 
penetrance_cmp(fit2, event = 1, fixed = list(c(1,1), c(1,1)), 
             age = c(40, 50, 60, 70), CI = TRUE, MC = 200)

## End(Not run)

Fit a penetrance model

Description

Fits a penetrance model for family data based on a prospective likelihood with ascertainment correction and provides model parameter estimates.

Usage

penmodel(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, 
design = "pop", base.dist = "Weibull", frailty.dist = "none",
agemin = NULL, robust = FALSE)

Arguments

formula

A formula expression as for other regression models. The response should be a survival object as returned by the Surv function. See the documentation for Surv, lm and formula for details.

cluster

Name of cluster variable. Default is "famID".

gvar

Name of genetic variable. Default is "mgene".

parms

Vector of initial values for the parameters in the model including baseline parameters and regression coefficients. parms = c(baseparm, coef), where baseparm includes the initial values for baseline parameters used for base.dist, and coef includes the initial values for regression coefficients for the variables specified in formula. If frailty.dist is specified, the initial value of the frailty parameter should be specified parms = c(baseparm, coef, k), where k the initial value for the frailty parameter. See details for the baseline parameters.

cuts

Vector of cut points that define the intervals where the hazard function is constant. The cuts should be specified when base.dist="piecewise" and must be strictly positive and finite. Default is NULL.

data

Data frame generated from simfam or data frame containing variables named in the formula and specific variables: famID, indID, gender, currentage, mgene, time, status and weight with attr(data,"agemin") specified.

design

Study design of the family data. Possible choices are: "pop", "pop+", "cli", "cli+" or "twostage", where "pop" is for the population-based design with affected probands whose mutation status can be either carrier or non-carrier, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, and "twostage" is for the two-stage design with oversampling of high risks families. Default is "pop".

base.dist

Choice of baseline hazard distributions to fit. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", "logBurr", or "piecewise". Default is "Weibull".

frailty.dist

Choice of frailty distribution. Possible choices are: "gamma", "lognormal", or "none". Default is "none".

agemin

Minimum age of disease onset or minimum age. Default is NULL.

robust

Logical; if TRUE, the robust ‘sandwich’ standard errors and variance-covariance matrix are provided, otherwise the conventional standard errors and variance-covariance matrix are provided.

Details

When frailty.dist = "none", the following penetrance model is fitted to family data with a specified baseline hazard distribution h(t|xs, xg) = h0(t - t0) exp(βs * xs + βg * xg), where h0(t) is the baseline hazards function specified by base.dist, which depends on the shape and scale parameters, λ and ρ; xs indicates male (1) and female (0) and xg indicates carrier (1) or non-carrier (0) of a gene of interest (major gene). Additional covariates can be added to formula in the model.

When frailty.dist is specified as either "gamma" or "lognormal", the follwoing shared frailty model is fitted to family data h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg), where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, and Z represents a frailty shared within families whose distribution is specified by frailty.dist.

Choice of frailty distributions

frailty.dist = "gamma" assumes ZZ follows Gamma(kk, 1/k1/k).

frailty.dist = "lognormal" assumes ZZ follows log-normal distribution with mean 0 and variance 1/k1/k.

frailty.dist = "none" shares no frailties within families and assumes independence among family members.

For family data arising from population- or clinic-based study designs (design="pop", "pop+", "cli", or "cli+"), the parameters of the penetrance model are estimated using the ascertainment-corrected prospective likelihood approach (Choi, Kopciuk and Briollais, 2008).

For family data arising from a two-stage study design (design="twostage"), model parameters are estimated using the composite likelihood approach (Choi and Briollais, 2011)

Note that the baseline parameters include lambda and rho, which represent the scale and shape parameters, respectively, and eta, additional parameter to specify for "logBurr" distribution. For the "lognormal" baseline distribution, lambda and rho represent the location and scale parameters for the normally distributed logarithm, where lambda can take any real values and rho > 0. For the other baselinse distributions, lambda > 0, rho > 0, and eta > 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise", baseparm should specify the initial interval-constant values, one more than the cut points specified bycuts.

Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters (λ,ρ\lambda, \rho) for "Weibull", "loglogistic", "Gompertz" and "gamma" baselines, to (λ,ρ,η\lambda, \rho, \eta) for "logBurr" and to the piecewise constant parameters for a piecewise baseline hazard. For "lognormal" baseline distribution, the log transformation is applied only to ρ\rho, not to λ\lambda, which represents the location parameter for the normally distributed logarithm.

Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance function via Monte-Carlo simulations of the estimated penetrance model.

Value

Returns an object of class 'penmodel', including the following elements:

estimates

Parameter estimates of transformed baseline parameters and regression coefficients.

varcov

Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix.

varcov.robust

Robust ‘sandwich’ variance-covariance matrix of parameter estimates when robust=TRUE.

se

Standard errors of parameter estimates obtained from the inverse of Hessian matrix.

se.robust

Robust ‘sandwich’ standard errors of parameter estimates when robust=TRUE.

logLik

Loglikelihood value for the fitted penetrance model.

AIC

Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

penmodelEM, simfam, penplot, print.penmodel, summary.penmodel, print.summary.penmodel, plot.penmodel

Examples

# Family data simulated from population-based design using a Weibull baseline hazard 

set.seed(4321)
fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02)
 
# Penetrance model fit for simulated family data

fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+",
       parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull")

# Summary of the model parameter estimates from the model fit

summary(fit)

# Plot the lifetime penetrance curves with 95% CIs from the model fit for specific  
# gender and mutation status groups along with their nonparametric penetrance curves  
# based on data excluding probands. 

plot(fit, add.KM = TRUE, conf.int = TRUE, MC = 100)

Fit a penetrance model for competing risks data

Description

Fits a competing risks model for family data with ascertainment correction and provides model parameter estimates.

Usage

penmodel_cmp(formula1, formula2, cluster = "famID", gvar = "mgene", 
parms, cuts = NULL, data,  design = "pop", base.dist = "Weibull", 
frailty.dist = "none", agemin = NULL, robust = FALSE)

Arguments

formula1

A formula expression for event 1 as for other regression models. The response should be a survival object as returned by the Surv function. See the documentation for Surv, lm and formula for details.

formula2

A formula expression for event 2 as for other regression models. The response should be a survival object as returned by the Surv function. See the documentation for Surv, lm and formula for details.

cluster

Name of cluster variable. Default is "famID".

gvar

Name of genetic variable. Default is "mgene".

parms

list of Vectors of initial values for the parameters in each model including baseline parameters and regression coefficients and frailty parameters.

parms = list(c(baseparm1, coef1), c(baseparms2, coef2), c(k1, k2), where baseparm1 and baseparm2 include the initial values for baseline parameters used for each base.dist; coef1 and coef2 include the initial values for regression coefficients for the variables specified in formula1 and formula2, respectively, and c(k1, k2) are the initial values for frailty parameters used for the specified frailty.distribution. See Details for more details.

cuts

Vector of cut points that define the intervals when base.dist="piecewise" is specified and must be strictly positive and finite and greater than agemin. Default is NULL.

data

Data frame generated from simfam or data frame containing variables named in the formula and specific variables: famID, indID, gender, currentage, mgene, time, status and weight with attr(data,"agemin") specified.

design

Study design of the family data. Possible choices are: "pop" and "pop+", where "pop" is for the population-based design with affected probands whose mutation status can be either carrier or non-carrier, "pop+" is similar to "pop" but with mutation carrier probands. Default is "pop".

base.dist

Vector of two baseline hazard distributions to be fitted for competing events. Possible choices for each event are: "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", "logBurr", or "piecewise". If only one distribution is specified, the same distribution will be assumed for both events. A vector of two distributions should be specified if different baseline distributions are assumed for different events. Default is c("Weibull", "Weibull").

frailty.dist

Choice of frailty distribution to fit a shared frailty model for competing events. Possible choices are: "gamma" for independent gamma, "lognormal" for independent log-normal, "cgamma" for correlated gamma, "clognormal" for correlated log-normal distributions, or "none" can be chosen to ignoring frailties in the model assuming no residual familial correlation given covariates. Default is "gamma").

agemin

Minimum age of disease onset or minimum age. Default is NULL.

robust

Logical; if TRUE, the robust ‘sandwich’ standard errors and variance-covariance matrix are provided, otherwise the conventional standard errors and variance-covariance matrix are provided.

Details

The shared frailty comepting risks model is fitted to family data with specified baseline hazard distributions and frailty distribution

Event 1:

h1(t|X,Z) = h01(t - t0) Z1 exp(βs1 * xs + βg1 * xg),

Event 2:

h2(t|X,Z) = h02(t - t0) Z2 exp(βs2 * xs + βg2 * xg),

where h01(t) and h02(t) are the baseline hazard functions for event 1 and event 2, respectively, which can be specified by base.dist. t0 is a minimum age of disease onset, Z1 and Z2 are frailties shared within families for each event and follow either a gamma, log-normal, correlateg gamma, or correlated log-normal distributions, which can be specified by frailty.dist. xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively. Additional covariates can be added to formula1 for event 1 and formula2 for event 2 in the model.

Choice of frailty distributions for competing risk models

frailty.dist = "gamma" shares the frailties within families generated from a gamma distribution independently for each competing event, where Zj follows Gamma(kj, 1/kj).

frailty.dist = "lognormal" shares the frailties within families generated from a log-normal distribution independently for each competing event, where Zj follows log-normal distribution with mean 0 and variance 1/kj.

frailty.dist = "cgamma" shares the frailties within families generated from a correlated gamma distribution to allow the frailties between two events to be correlated, where the correlated gamma frailties (Z1, Z2) are generated with three independent gamma frailties (Y0, Y1, Y2) as follows:

Z1 = k0/(k0 + k1) Y0 + Y1

;

Z2 = k0/(k0 + k2) Y0 + Y2

,

where Y0 from Gamma(k0, 1/k0), Y1 from Gamma(k1, 1/(k0 + k1)), Y2 from Gamma(k2, 1/(k0 + k2)).

frailty.dist = "clognormal" shares the frailties within families generated from a correlated log-normal distribution where log(Zj) follows a normal distribution with mean 0, variance 1/kj and correlation between two events k0.

depend should specify the values of related frailty parameters: c(k1, k2) with frailty.dist = "gamma" or frailty.dist = "lognormal"; c(k1, k2, k0) for frailty.dist = "cgamma" or frailty.dist = "clognormal".

More details about the competing risks model for family data arising from population-based study designs (design="pop", "pop+" and their inference procedure based on the ascertainment-corrected likelihood approach can be found in Choi et al., 2021.

Note that the baseline parameters include lambda and rho, which represent the scale and shape parameters, respectively, and eta, additional parameter to specify for "logBurr" distribution. For the "lognormal" baseline distribution, lambda and rho represent the location and scale parameters for the normally distributed logarithm, where lambda can take any real values and rho > 0. For the other baselinse distributions, lambda > 0, rho > 0, and eta > 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise", baseparm should specify the initial interval-constant values, one more than the cut points specified bycuts.

Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters (λ,ρ\lambda, \rho) for "Weibull", "loglogistic", "Gompertz" and "gamma" baselines, to (λ,ρ,η\lambda, \rho, \eta) for "logBurr" and to the piecewise constant parameters for a piecewise baseline hazard. For "lognormal" baseline distribution, the log transformation is applied only to ρ\rho, not to λ\lambda, which represents the location parameter for the normally distributed logarithm.

Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance function via Monte-Carlo simulations of the estimated penetrance model.

Value

Returns an object of class 'penmodel_cmp', including the following elements:

estimates

Parameter estimates of transformed baseline parameters and regression coefficients.

varcov

Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix.

varcov.robust

Robust ‘sandwich’ variance-covariance matrix of parameter estimates when robust=TRUE.

se

Standard errors of parameter estimates obtained from the inverse of Hessian matrix.

se.robust

Robust ‘sandwich’ standard errors of parameter estimates when robust=TRUE.

logLik

Loglikelihood value for the fitted penetrance model.

AIC

Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

See Also

simfam_cmp, penplot_cmp, print.penmodel_cmp, summary.penmodel_cmp, print.summary.penmodel_cmp, plot.penmodel_cmp

Examples

# Competing risk family data simulated from population-based design 
# using Weibull baseline hazards with gamma frailty distribution.
## Not run: 
set.seed(4321)
fam1 <- simfam_cmp(N.fam = 200, design = "pop+", variation = "frailty", 
              base.dist = "Weibull", frailty.dist = "cgamma", depend=c(0.5, 1, 0.5), 
              allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
              vbeta = list(c(-1.13, 2.35), c(-1, 2)))

# Fitting shared gamma frailty Penetrance model for simulated competing risk data

fit1 <- penmodel_cmp(
        formula1 = Surv(time, status==1) ~ gender + mgene, 
        formula2 = Surv(time, status==2) ~ gender + mgene,
        cluster = "famID", gvar = "mgene", design = "pop+",  
        parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1)),
        base.dist = "Weibull", frailty.dist = "gamma", data = fam1, robust = TRUE)

# Fitting shared correlated gamma frailty Penetrance model for simulated competing risk data

fit2 <- penmodel_cmp(
        formula1 = Surv(time, status==1) ~ gender + mgene, 
        formula2 = Surv(time, status==2) ~ gender + mgene,
        cluster = "famID", gvar = "mgene", design = "pop+",  
        parms = list(c(0.01, 3, -1, 2), c(0.01, 3, -1, 2), c(0.5, 1, 0.5)),
        base.dist = "Weibull", frailty.dist = "cgamma", data = fam1, robust = TRUE)

# Summary of the model parameter estimates from the model fit

summary(fit1)

# Plot the lifetime penetrance curves with 95
# gender and mutation status groups along with their nonparametric penetrance curves  
# based on data excluding probands. 

plot(fit1, add.CIF = TRUE, conf.int = TRUE, MC = 100)

## End(Not run)

EM algorithm for estimating the penetrance model with missing genotypes

Description

Fits a penetrance model for family data with missing genotypes via the EM algorithm and provides model parameter estimates.

Usage

penmodelEM(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data, 
design = "pop", base.dist = "Weibull", agemin = NULL, robust = FALSE, method = "data", 
mode = "dominant", q = 0.02)

Arguments

formula

A formula expression as for other regression models. The response should be a survival object as returned by the Surv function. See the documentation for Surv, lm and formula for details.

cluster

Name of cluster variable. Default is "famID".

gvar

Name of genetic variable. Default is "mgene".

parms

Vector of initial values for the parameters in the model including baseline parameters and regression coefficients. parms = c(baseparm, coef), where baseparm includes the initial values for baseline parameters used for base.dist, and coef includes the initial values for regression coefficients for the variables specified in formula. See details for the baseline parameters.

cuts

Vector of cuts that define the intervals where the hazard function is constant. The cuts should be specified base.dist="piecewise" and must be strictly positive and finite. Default is NULL.

data

Data frame generated from simfam or data frame containing specific variables: famID, indID, gender, currentage, mgene, time, status and weight with attr(data,"agemin") specified.

design

Study design of the family data. Possible choices are: "pop", "pop+", "cli", "cli+" or "twostage", where "pop" is for the population-based design with affected probands whose mutation status can be either carrier or non-carrier, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, and "twostage" is for the two-stage design with oversampling of high risks families. Default is "pop".

base.dist

Choice of baseline hazard distributions to fit. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", "logBurr", or "piecewise". Default is "Weibull".

agemin

Minimum age of disease onset or minimum age. Default is NULL.

robust

Logical; if TRUE, the robust ‘sandwich’ standard errors and variance-covariance matrix are provided, otherwise the conventional standard errors and variance-covariance matrix are provided.

method

Choice of methods for calculating the carrier probabilities for individuals with missing mutation status. Possible choices are "data" for empirical calculation of the carrier probabilities based on the observed carriers' statuses in the entire sample, specific to generation and proband's mutation status or "mendelian" for calculating carrier probabilities based on Mendelian transmission probabilies with the given allele frequency and mutation statuses observed in the family. Default is "data".

If method = "mendelian", specify both mode of the inheritance and the allele frequency q.

mode

Choice of modes of inheritance for calculating carrier probabilies for individuals with missing mutation status. Possible choices are "dominant" for dominant model or "recessive" for recessive model. Default is "dominant".

q

Frequency of the disease causing allele used for calculating carrier pobabilities. The value should be between 0 and 1. If NULL, the estimated allele frequency from data will be used. Default value is 0.02.

Details

The expectation and maximization (EM) algorithm is applied for making inference about the missing genotypes. In the expectation step, for individuals with unknown carrier status, we first compute their carrier probabilities given their family's observed phenotype and genotype information based on current estimates of parameters θ as follows, wfi = P(Xfi = 1 | Yfi, Xfo), where Xfi represents the mutation carrier status and Yfi represents the phenotype in terms of age at onset tfi and disease status δfi for individual i, i = 1, ..., nf, in family f,f=1,...,n,f, f = 1, ..., n, and Xfo represents the observed genotypes in family ff.

Then, we obtain the conditional expectation of the log-likelihood function (\ell) of the complete data given the observed data as a weighted log-likelihood, which has the form Eθ [ l (θ | Y, Xo) ] = ∑fi l fi (θ | Xfi = 1) * wfi + lfi (θ | Xfi = 0) * ( 1 - wfi ). In the maximization step, the updated parameter estimates are obtained by maximizing the weighted log likelihood computed in the E-step. These expectation and maximization steps iterate until convergence to obtain the maximum likelihood estimates. See more details in Choi and Briollais (2011) or Choi et al. (2014).

Note that the baseline parameters include lambda and rho, which represent the scale and shape parameters, respectively, and eta, additional parameter to specify for "logBurr" distribution. For the "lognormal" baseline distribution, lambda and rho represent the location and scale parameters for the normally distributed logarithm, where lambda can take any real values and rho > 0. For the other baselinse distributions, lambda > 0, rho > 0, and eta > 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise", baseparm should specify the initial interval-constant values, one more than the cut points specified bycuts.

Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters (λ,ρ\lambda, \rho) for "Weibull", "loglogistic", "Gompertz" and "gamma" baselines, to (λ,ρ,η\lambda, \rho, \eta) for "logBurr" and to the piecewise constant parameters for a piecewise baseline hazard. For "lognormal" baseline distribution, the log transformation is applied only to ρ\rho, not to λ\lambda, which represents the location parameter for the normally distributed logarithm.

Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance function via Monte-Carlo simulations of the estimated penetrance model.

Value

Returns an object of class 'penmodel', including the following elements:

estimates

Parameter estimates of transformed baseline parameters and regression coefficients.

varcov

Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix.

varcov.robust

Robust ‘sandwich’ variance-covariance matrix of parameter estimates when robust=TRUE.

se

Standard errors of parameter estimates obtained from the inverse of Hessian matrix.

se.robust

Robust ‘sandwich’ standard errors of parameter estimates when robust=TRUE.

logLik

Loglikelihood value for the fitted penetrance model.

AIC

Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H. and Briollais, L. (2011) An EM composite likelihood approach for multistage sampling of family data with missing genetic covariates, Statistica Sinica 21, 231-253.

Choi, Y.-H., Briollais, L., Green, J., Parfrey, P., and Kopciuk, K. (2014) Estimating successive cancer risks in Lynch Syndrome families using a progressive three-state model, Statistics in Medicine 33, 618-638.

See Also

simfam, penmodel, print.penmodel, summary.penmodel, print.summary.penmodel, plot.penmodel, carrierprob

Examples

# Family data simulated with 20% of members missing their genetic information.

set.seed(4321)
fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", base.parms = c(0.01, 3),
       vbeta = c(1, 2), agemin = 20, allelefreq = 0.02, mrate = 0.2)
 
# EM algorithm for fitting family data with missing genotypes

fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene", 
       parms = c(0.01, 3, 1, 2), data = fam, design="pop+", robust = TRUE, 
       base.dist = "Weibull", method = "mendelian", mode = "dominant", q = 0.02)

# Summary of the model parameter estimates from the model fit by penmodelEM 

summary(fit)

# Plot the lifetime penetrance curves from model fit for gender and 
# mutation status groups along with their nonparametric penetrance curves 
# based on observed data excluding probands.
 
plot(fit)

Plot penetrance functions

Description

Plots the penetrance functions given the values of baseline parameters and regression coefficients and choices of baseline and frailty distributions.

Usage

penplot(base.parms, vbeta, cuts = NULL, variation = "none", base.dist = "Weibull", 
frailty.dist = NULL, depend = 1, agemin = 20, agemax = 80, print = TRUE, 
col = c("blue","red","blue","red"),  lty = c(1, 1, 2, 2), add.legend = TRUE, 
add.title = TRUE, x = "topleft", y = NULL, xlab = "Age at onset", ylab = "Penetrance", 
ylim = NULL, main = NULL, ...)

Arguments

base.parms

Vector of parameter values for the specified baseline hazard function: base.parms = c(lambda, rho) should be specified for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma", and "lognormal", c(lambda, rho, eta) for base.dist = "logBurr", or interval constant hazard values for the intervals produced by cuts for base.dist = "piecewise".

vbeta

Vector of regression coefficients for gender and majorgene, vbeta = c(beta.s, beta.g). If variation = "secondgene", regression coefficients for gender, major gene and second gene, vbeta = c(beta.s, beta.g1, beta.g2), should be specified.

cuts

Vector of cut points defining the intervals where the hazard function is constant. The cuts should be specified when base.dist = "piecewise" and must be strictly positive and finite. Default is NULL.

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, or "none" for no residual familial correlation. Default is "none".

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", or "piecewise". Default is "Weibull".

frailty.dist

Choice of frailty distribution. Possible choices are "gamma" for gamma distribution or "lognormal" for log normal distributions when variation = "frailty". Default is NULL.

depend

Variance of the frailty distribution. Dependence within families increases with depend value. Default value is 1.

agemin

Minimum age of disease onset. Default is 20 years of age.

agemax

Maximum age of disease onset. Default is 80 years of age.

print

Logical; if TRUE, prints the penetrance values by age 70 obtained from the assumed model for male carriers, female carriers, male noncarrers, and female noncarriers. Default is TRUE.

col

Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is c("blue", "red", "blue", "red").

lty

Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is c(1, 1, 2, 2).

add.legend

Logical; if TRUE, displays legend in the plot. Default is TRUE.

add.title

Logical; if TRUE, displays title in the plot. Default is TRUE.

x, y

Position of legend; see legend. Defaults are x = "topleft", y = NULL.

xlab

Title for the x-axis. Default is "Age at onset".

ylab

Title for the y-axis. Default is "Penetrance".

ylim

Limits of the y-axis. Default is NULL. If NULL, ylim will be automatically determined.

main

Main title of the plot. Default is NULL. If NULL, the title will be automatically created.

...

Other parameters to be passed through to plotting functions.

Details

Proportional hazard models

The penetrance model conditional on the covariates X = c(xs, xg) is assumed to have the following hazard function: h(t|X) = h0(t - t0) exp(βs * xs + βg * xg), where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The penetrance function for the penetrance model has the form, 1 - exp(- H0(t - t0) * exp(βs * xs + βg * xg )), where H0(t) is the cumulative baseline hazard function.

Shared frailty models

The penetrance model conditional on the frailty ZZ and covariates X = c(xs, xg) is assumed to have the following hazard function: h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg), where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

For example, when using a Weibull distribution for baseline hazard and a gamma distribution for frailty, the penetrance function has the form 1 - (1 + λρ * (t - t0)ρ * exp(βs * xs + βg * xg)/κ).

Two-gene models

The penetrance curve for the two-gene model is generated by 1 - exp(- H0(t - t0) * exp(βs * xs + β1 * x1 + β2 * x2)), where H0(t) is the cumulative baseline hazard function, x1 indicates carrior (1) or non-carrior (0) of a major gene and x2 indicates carrior (1) or non-carrior (0) of a second gene. When plotting with the two-gene model, the plot will generate separate curves for mutation carriers and noncarriers, and separate curves for the second gene carriers and noncarriers.

Value

Displays plots of the penetrance functions and returns the following values:

pen70

Penetrance estimates by age 70 specific to gender and mutation-status subgroups.

x.age

Vector of ages of onset ranging from agemin to agemax years

pen

Penetrance estimates computed at each age of x.age; if variation = "none" or "frailty", it includes subgroups specific to gender and mutation status for major gene. If variation = "secondgene", it includes subgroups specific to gender and both mutation statuses for major gene and second gene.

Author(s)

Yun-Hee Choi

See Also

simfam, plot.penmodel

Examples

# Penetrance function curves based on Weibull baseline hazard function

penplot(base.parms = c(0.01,3), vbeta = c(0.5, 2), variation = "none", base.dist = "Weibull", 
		agemin = 20, ylim = c(0,1))

Plot penetrance functions from competing risk models

Description

Plots the penetrance functions from competing risk models given the values of baseline parameters and regression coefficients and choices of baseline and frailty distributions.

Usage

penplot_cmp(event, base.parms, vbeta, cuts = NULL, variation = "none", 
          base.dist = "Weibull", frailty.dist = NULL, depend = c(1, 1), 
          agemin = 20, agemax = 80, print = TRUE, 
          col = c("blue","red","blue","red"),  lty = c(1, 1, 2, 2), 
          add.legend = TRUE, add.title = TRUE, x = "topleft", 
          y = NULL, xlab = "Age at onset", ylab = "Penetrance", 
          ylim = NULL, main = NULL, ...)

Arguments

event

Event of interest for penetrance function: either 1 or 2. Default is 1.

base.parms

List of vectors of parameter values for the specified baseline hazard functions for both events. For example, base.parms = list(c(lambda1, rho1), c(lambda2, rho2)) should be specified when base.dist = c("Weibull", "Weibull").

vbeta

List of vectors of regression coefficients for gender and majorgene, vbeta = list(c(beta1.s, beta1.g), c(beta2.s, beta2.g)) for both events. If variation = "secondgene", regression coefficients for gender, major gene and second gene, vbeta = list(c(beta1.s, beta1.g1, beta1.g2), c(beta2.s, beta2.g1, beta2.g2)), should be specified for each event.

cuts

Vector of cut points defining the intervals when base.dist = "piecewise" is specified and must be strictly positive and finite and greater than agemin. Default is NULL.

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, or "none" for no residual familial correlation. Default is "none".

base.dist

Vector of two baseline hazard distributions chosen for competing events. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", or "piecewise".If only one distribution is specified, the same distribution will be assumed for both events. A vector of two distributions should be specified if different baseline distributions are assumed for different events. Default is c("Weibull", "Weibull").

frailty.dist

Choice of frailty distribution. Possible choices are "gamma" for independent gamma, "lognormal" for independent log-normal, "cgamma" for correlated gamma, "clognormal" for correlated log-normal distribution when variation = "frailty" or "none" or NULL when no frailty distribution is assumed. Default is NULL.

depend

Vector of frailty parameter values assumed for specified frailty distribution. They corresponds inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default value is c(1,1).

agemin

Minimum age of disease onset. Default is 20 years of age.

agemax

Maximum age of disease onset. Default is 80 years of age.

print

Logical; if TRUE, prints the penetrance values by age 70 obtained from the assumed model for male carriers, female carriers, male noncarrers, and female noncarriers. Default is TRUE.

col

Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is c("blue", "red", "blue", "red").

lty

Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is c(1, 1, 2, 2).

add.legend

Logical; if TRUE, displays legend in the plot. Default is TRUE.

add.title

Logical; if TRUE, displays title in the plot. Default is TRUE.

x, y

Position of legend; see legend. Defaults are x = "topleft", y = NULL.

xlab

Title for the x-axis. Default is "Age at Onset".

ylab

Title for the y-axis. Default is "Penetrance".

ylim

Limits of the y-axis. Default is NULL. If NULL, ylim will be automatically determined.

main

Main title of the plot. Default is NULL. If NULL, the title will be automatically created.

...

Other parameters to be passed through to plotting functions.

Details

Cause-specific proportional hazard models The penetrance models for competing events conditional on the covariates X = c(xs, xg) are assumed to have the following hazard functions for event j=1,2j = 1, 2: hj(t|X) = h0j(t - t0) exp(βjs * xs + βjg * xg), where h0j(t) is the baseline hazard function for event jj, j=1,2j=1,2, t0j is a minimum age of disease onset, xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a major gene of interest, respectively.

The penetrance function for the penetrance model has the form, 1 - exp(- H0(t - t0) * exp(βs * xs + βg * xg )), where H0(t) is the cumulative baseline hazard function.

Shared frailty models

The penetrance model conditional on the frailty ZZ and covariates X = c(xs, xg) is assumed to have the following hazard function: h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg), where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

For example, when using a Weibull distribution for baseline hazard and a gamma distribution for frailty, the penetrance function has the form 1 - (1 + λρ * (t - t0)ρ * exp(βs * xs + βg * xg)/κ).

Two-gene models

The penetrance curve for the two-gene model is generated by 1 - exp(- H0(t - t0) * exp(βs * xs + β1 * x1 + β2 * x2)), where H0(t) is the cumulative baseline hazard function, x1 indicates carrior (1) or non-carrior (0) of a major gene and x2 indicates carrior (1) or non-carrior (0) of a second gene. When plotting with the two-gene model, the plot will generate separate curves for mutation carriers and noncarriers, and separate curves for the second gene carriers and noncarriers.

Value

Displays plots of the penetrance functions and returns the following values:

pen70

Penetrance estimates by age 70 specific to gender and mutation-status subgroups.

x.age

Vector of ages of onset ranging from agemin to agemax years

pen

Penetrance estimates computed at each age of x.age; if variation = "none" or "frailty", it includes subgroups specific to gender and mutation status for major gene. If variation = "secondgene", it includes subgroups specific to gender and both mutation statuses for major gene and second gene.

Author(s)

Yun-Hee Choi

See Also

simfam_cmp, plot.penmodel_cmp

Examples

# Penetrance function curves for event 1 
# based on Weibull baselines (no frailty)
penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), 
        vbeta = list(c(-1, 2), c(-1, 1)), variation = "none", 
        base.dist = "Weibull", agemin = 20, ylim = c(0,1))

# Penetrance function curves for event 1 
# based on gamma frailty and Weibull baselines
penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), 
        vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2), 
        variation = "frailty", frailty.dist="gamma", base.dist = "Weibull", 
        agemin = 20, ylim = c(0,1))

# Penetrance function curves for event 1 
# based on correlated gamma frailty and Weibull baselines
penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), 
        vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), 
        variation = "frailty", frailty.dist="cgamma", 
        base.dist = "Weibull", agemin = 20, ylim = c(0,1))

# Penetrance function curves for event 1 
# based on correlated lognormal frailty and Weibull baselines
penplot_cmp(event=1, base.parms = list(c(0.01,3), c(0.01, 3)), 
        vbeta = list(c(-1, 2), c(-1, 1)), depend=c(2, 2, 0.2), 
        variation = "frailty", frailty.dist="clognormal", 
        base.dist = "Weibull", agemin = 20, ylim = c(0,1))

Plot method for penmodel

Description

Plots penetrance curves estimated from the fitted penetrance model and overlays non-parametric penetrance curves estimated from the data without proabands.

Usage

## S3 method for class 'penmodel'
plot(x, agemax = 80, print = TRUE, mark.time = FALSE, conf.int = FALSE, 
add.KM = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), 
add.legend = TRUE, add.title = TRUE, xpos = "topleft", ypos = NULL, 
xlab = "Age at onset", ylab = "Penetrance", ylim = NULL, main = NULL,  ...)

Arguments

x

An object class of 'penmodel', a fitted model by penmodel or penmodelEM functions.

agemax

Maximum age of disease onset or maximum age. Default is 80 years of age.

print

Logical; if TRUE, displays parameter estimates and penetrance estimates by age 70.

mark.time

Logical; if TRUE, curves are marked at each censoring time, otherwise, no labeling is done.

conf.int

Logical; if TRUE, displays 95% confidence intervals for both parametric and non-parametric penetrance estimates for each subgroup and returns their lower and upper limits.

add.KM

Logical; if TRUE, displays Kaplan-Meier curves from data.

MC

Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method. If MC = 0, no confidence intervals will be calculated. Default value is 100.

col

Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is c("blue", "red", "blue", "red").

lty

Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is c(1, 1, 2, 2).

add.legend

Logical; if TRUE, displays a legend in the plot.

add.title

Logical; if TRUE, displays a title in the plot.

xpos, ypos

Position of legend; see legend. Defaults are xpos = "topleft", ypos = NULL.

xlab

Title for the x-axis. Default is "Age at onset".

ylab

Title for the y-axis. Default is "Penetrance".

ylim

Limits for the y-axis. Default is NULL. If NULL, ylim will be automatically determined.

main

Main title of the plot. Default is NULL. If NULL, the title will be automatically created.

...

Other parameters to be passed through to plotting functions.

Details

The 95% confidence intervals for the parametric penetrance curves are obtained based on simulations of the parameters, assuming a multivariate normal distribution for the estimated parameters with their variance-covariance matrix. See penetrance for more details.

Value

Returns the following summary values:

coefficients

Parameter estimates of transformed baseline parameters (λ,ρ\lambda, \rho) and regression coefficients for gender and mutation status (βs,βg\beta_s, \beta_g).

pen70

Penetrance estimates by age 70, specific to gender and mutation-status subgroups.

x.age

Vector of ages of onsest ranging from agemin to agemax years

pen

Penetrance estimates at each age in x.age, specific to gender and mutation-status subgroups.

lower

Lower limits of 95% confidence interval estimates for penetrance at each age in x.age, specific to gender and mutation status subgroups.

upper

Upper limits of 95% confidence interval estimates for penetrance at each age in x.age, specific to gender and mutation status subgroups.

Author(s)

Yun-Hee Choi

See Also

penmodel, print.penmodel, penmodelEM, summary.penmodel,print.summary.penmodel, simfam

Examples

# Simulated family data  

set.seed(4321)
fam <- simfam(N.fam = 300, design = "pop+", base.dist = "Weibull", variation = "none",
       base.parms = c(0.01,3), vbeta = c(-1.13, 2.35), allelefreq = 0.02, agemin = 20)
 
# Fit family data

fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", design = "pop+",
parms = c(0.01, 3, -1.13, 2.35), data = fam, base.dist = "Weibull", robust = TRUE)       

# Plot penetrance function curves with 95% CIs

plot(fit, agemax = 80, conf.int = TRUE)

Plot method for penmodel_cmp

Description

Plots penetrance curves for each event estimated from the fitted competing risks model and overlays non-parametric cumulative incidence curves estimated from the data without proabands.

Usage

## S3 method for class 'penmodel_cmp'
plot(x, agemax = 80, print = TRUE, conf.int = FALSE, 
add.CIF = TRUE, MC = 100, col = c("blue", "red", "blue", "red"), lty = c(1, 1, 2, 2), 
xlab = "Age at onset", ylab = "Penetrance", ylim = NULL,  ...)

Arguments

x

An object class of 'penmodel', a fitted model by penmodel or penmodelEM functions.

agemax

Maximum age of disease onset or maximum age. Default is 80 years of age.

print

Logical; if TRUE, displays parameter estimates and penetrance estimates by age 70.

conf.int

Logical; if TRUE, displays 95% confidence intervals for both parametric and non-parametric penetrance estimates for each subgroup and returns their lower and upper limits.

add.CIF

Logical; if TRUE, displays cumulative incidence curves from competing event data.

MC

Number of simulated samples used to calculate confidence intervals with a Monte-Carlo method. If MC = 0, no confidence intervals will be calculated. Default value is 100.

col

Colors of lines for male carriers, female carriers, male noncarrers, and female noncarriers. Default is c("blue", "red", "blue", "red").

lty

Types of lines for male carriers, female carriers, male noncarriers, and female noncarriers. Default is c(1, 1, 2, 2).

xlab

Title for the x-axis. Default is "Age at onset".

ylab

Title for the y-axis. Default is "Penetrance".

ylim

Limits for the y-axis. Default is NULL. If NULL, ylim will be automatically determined.

...

Other parameters to be passed through to plotting functions.

Details

The 95% confidence intervals for the parametric penetrance curves are obtained based on simulations of the parameters, assuming a multivariate normal distribution for the estimated parameters with their variance-covariance matrix. See penetrance_cmp for more details.

Value

Returns the following summary values:

coefficients

Parameter estimates from the competing risks model.

pen70

Penetrance estimates by age 70, specific to gender and mutation-status subgroups.

age

Vector of ages of onsest ranging from agemin to agemax years

pen1

Penetrance estimates for event 1 at each age in age, specific to gender and mutation-status subgroups.

pen2

Penetrance estimates for event 2 at each age in age, specific to gender and mutation-status subgroups.

lower1

Lower limits of 95% confidence interval estimates for penetrance for event 1 at each age in age, specific to gender and mutation status subgroups.

upper1

Upper limits of 95% confidence interval estimates for penetrance for event 1 at each age in x.age, specific to gender and mutation status subgroups.

lower2

Lower limits of 95% confidence interval estimates for penetrance for event 2 at each age in age, specific to gender and mutation status subgroups.

upper2

Upper limits of 95% confidence interval estimates for penetrance for event 2 at each age in x.age, specific to gender and mutation status subgroups.

Author(s)

Yun-Hee Choi

See Also

penmodel_cmp, print.penmodel_cmp, summary.penmodel_cmp, print.summary.penmodel_cmp, simfam_cmp

Examples

## Not run: 
# Simulate family data  
set.seed(4321)
fam2 <- simfam_cmp(N.fam = 500, design = "pop+", variation = "frailty", 
                  base.dist = "Weibull", frailty.dist = "cgamma", depend=c(2, 2, 2), 
                  allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
                  vbeta = list(c(-1.13, 2.35),c(-1, 2)))

# Fit family data
fit2 <- penmodel_cmp(formula1 = Surv(time, status==1)~ gender + mgene, 
                   formula2 = Surv(time, status==2)~ gender + mgene, 
                   cluster = "famID", gvar = "mgene", frailty.dist = "cgamma",
                   parms=list(c(0.01, 3, -1, 2.3), c(0.01, 3, -1, 2), c(2, 2, 2)), 
                   data=fam2, design="pop+", base.dist="Weibull", robust=TRUE)

# Plot penetrance function curves with 95
plot(fit2, conf.int=TRUE, MC=200, ylim=c(0, 0.7))


## End(Not run)

Plot method for simfam or Plot pedigrees

Description

Provides pedigree plots for specified families generated from simfam function with option to save plots into a pdf file.

Usage

## S3 method for class 'simfam'
plot(x, famid, pdf = FALSE, file = NULL, ...)

Arguments

x

An object of class 'simfam' created by simfam function or a data frame that has class attributes c("simfam", "data.frame").

famid

List of family IDs to plot. Default is the first family in given data set.

pdf

Logical; if TRUE, pedigree plots are saved in a pdf file. If FALSE, plot pedigrees on current plotting device. Default is FALSE.

file

File name to save the pedigree plots; Default file name is "pedigreeplot.pdf".

...

Additional arguments passed on to the plot function.

Details

Argument x can be a data frame that contains famID, indID, fatherID, motherID, gender (1 for male, 0 for female), status (1 for affected, 0 for non-affected), mgene (1 for mutation carrier, 0 for non-carrier, NA for missing), and proband (1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame").

Optionally, the data frame can contain a column named carrp.geno or carrp.pheno to replace missing values in mgene with their carrier probabilities.

Value

Returns pedigree plots for specified families created by simfam function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.

When object inlcudes carrp.geno and/or carrp.pheno generated by carrierprob function, the plot function displays the carrier probabilities for those with missing carrier status.

See Also

simfam, summary.simfam, carrierprob

Examples

# Simulated family data

set.seed(4321)
fam <- simfam(N.fam = 200, design = "pop+", base.dist = "Weibull", allelefreq = 0.02, 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20)
 
# Pedigree plots for first three simulated families

plot(fam, famid = c(1:3))

Plot method for simfam_cmp or Plot pedigrees

Description

Provides pedigree plots for specified families generated from simfam_cmp function with option to save plots into a pdf file.

Usage

## S3 method for class 'simfam_cmp'
plot(x, famid, pdf = FALSE, file = NULL, ...)

Arguments

x

An object of class 'simfam' created by simfam function or a data frame that has class attributes c("simfam", "data.frame").

famid

List of family IDs to plot. Default is the first family in given data set.

pdf

Logical; if TRUE, pedigree plots are saved in a pdf file. If FALSE, plot pedigrees on current plotting device. Default is FALSE.

file

File name to save the pedigree plots; Default file name is "pedigreeplot.pdf".

...

Additional arguments passed on to the plot function.

Details

Argument x can be a data frame that contains famID, indID, fatherID, motherID, gender (1 for male, 0 for female), status (1 for affected by event 1, 2 for affected by event 2, 0 for non-affected), mgene (1 for mutation carrier, 0 for non-carrier, NA for missing), and proband (1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame").

Optionally, the data frame can contain a column named carrp.geno or carrp.pheno to replace missing values in mgene with their carrier probabilities.

Value

Returns pedigree plots for specified families created by simfam_cmp function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.

When object inlcudes carrp.geno and/or carrp.pheno generated by carrierprob function, the plot function displays the carrier probabilities for those with missing carrier status.

See Also

simfam, summary.simfam, carrierprob

Examples

# Simulated competing risk data from gamma frailty model based on pop+ design

set.seed(4321)
fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", 
                  base.dist = "Weibull", frailty.dist = "gamma", depend=c(2, 2), 
                  allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
                  vbeta = list(c(-1.13, 2.35),c(-1, 2)))

# Pedigree plots for first three simulated families

plot(fam, famid = 1)

Plot method for simfam_tvc or Plot pedigrees

Description

Provides pedigree plots for specified families generated from simfam_tvc function with option to save plots into a pdf file.

Usage

## S3 method for class 'simfam_tvc'
plot(x, famid, pdf = FALSE, file = NULL, ...)

Arguments

x

An object of class 'simfam_tvc' created by simfam_tvc function or a data frame that has class attributes c("simfam_tvc", "data.frame").

famid

List of family IDs to plot. Default is the first family in given data set.

pdf

Logical; if TRUE, pedigree plots are saved in a pdf file. If FALSE, plot pedigrees on current plotting device. Default is FALSE.

file

File name to save the pedigree plots; Default file name is "pedigreeplot.pdf".

...

Additional arguments passed on to the plot function.

Details

Argument x can be a data frame that contains famID, indID, fatherID, motherID, gender (1 for male, 0 for female), status (1 for affected, 0 for non-affected), mgene (1 for mutation carrier, 0 for non-carrier, NA for missing), and proband (1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame").

Optionally, the data frame can contain a column named carrp.geno or carrp.pheno to replace missing values in mgene with their carrier probabilities.

Value

Returns pedigree plots for specified families created by simfam_tvc function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.

When object inlcudes carrp.geno and/or carrp.pheno generated by carrierprob function, the plot function displays the carrier probabilities for those with missing carrier status.

See Also

simfam_tvc, summary.simfam_tvc, carrierprob

Examples

# Simulated family data
set.seed(4321)
fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "gamma", depend = 1, 
       add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), 
       tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))
 
# Pedigree plots for first three simulated families

plot(fam, famid = c(1:2))

Plot method for simfam2 or Plot pedigrees

Description

Provides pedigree plots for specified families generated from simfam2 function with option to save plots into a pdf file.

Usage

## S3 method for class 'simfam2'
plot(x, famid, pdf = FALSE, file = NULL, ...)

Arguments

x

An object of class 'simfam2' created by simfam2 function or a data frame that has class attributes c("simfam2", "data.frame").

famid

List of family IDs to plot. Default is the first family in given data set.

pdf

Logical; if TRUE, pedigree plots are saved in a pdf file. If FALSE, plot pedigrees on current plotting device. Default is FALSE.

file

File name to save the pedigree plots; Default file name is "pedigreeplot.pdf".

...

Additional arguments passed on to the plot function.

Details

Argument x can be a data frame that contains famID, indID, fatherID, motherID, gender (1 for male, 0 for female), status (1 for affected, 0 for non-affected), mgene (1 for mutation carrier, 0 for non-carrier, NA for missing), and proband (1 for proband, 0 for non-proband) and should have class attributes class(x) <- c("simfam", "data.frame").

Optionally, the data frame can contain a column named carrp.geno or carrp.pheno to replace missing values in mgene with their carrier probabilities.

Value

Returns pedigree plots for specified families created by simfam2 function or for the data frame provided along with the affection and carrier mutation statuses of family members. Probands from each pedigree are indicated using red color.

When object inlcudes carrp.geno and/or carrp.pheno generated by carrierprob function, the plot function displays the carrier probabilities for those with missing carrier status.

See Also

simfam2, summary.simfam2, carrierprob

Examples

set.seed(4321)

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

IBDmatrix <- diag(1, dim(data)[1])
data <- data[ , c(1:7, 11, 14)]

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

plot(fam2, famid = c(1:2))

Print method for penmodel.

Description

Prints a summary of parameter estimates of a fitted penetrance model.

Usage

## S3 method for class 'penmodel'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

x

An object class of 'penmodel', a fitted model by penmodel or penmodelEM functions.

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Prints a short summary of the model and model fit.

Returns an object of class 'penmodel'.

Author(s)

Yun-Hee Choi

See Also

penmodel, penmodelEM, summary.penmodel, print.summary.penmodel, plot.penmodel


Print method for penmodel_cmp.

Description

Prints a summary of parameter estimates of a fitted competing risk penetrance model.

Usage

## S3 method for class 'penmodel_cmp'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

x

An object class of 'penmodel_cmp', a fitted model by penmodel_cmp function.

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Prints a short summary of the model and model fit.

Returns an object of class 'penmodel_cmp'.

Author(s)

Yun-Hee Choi

See Also

penmodel_cmp, summary.penmodel_cmp, print.summary.penmodel_cmp, plot.penmodel_cmp


Print method for summary.penmodel of a fitted penetrance model.

Description

Prints a short summary of parameter and penetrance estimates of a 'summary.penmodel' object.

Usage

## S3 method for class 'summary.penmodel'
print(x, digits = max(3, getOption("digits") - 3),
  signif.stars=TRUE, ...)

Arguments

x

An object class of 'summary.penmodel', a result of a call to summary.penmodel.

digits

Number of significant digits to use when printing.

signif.stars

Logical; if TRUE, provides stars to hightlight significant p-values. Default is TRUE.

...

Further arguments passed to or from other methods.

Value

Prints a summary of parameter estimates, their standard errors, tt-statistics and corresponding two-sided pp-values and additionally indicates significance stars if signif.stars is TRUE.

Also prints penetrance estimates by age 70 specific to gender and mutation-status subgroups along with their standard errors and 95% confidence intervals.

Returns an object of class 'summary.penmodel'.

Author(s)

Yun-Hee Choi

See Also

penmodel, penmodelEM, print.penmodel, summary.penmodel


Print method for summary.penmodel_cmp of a fitted competing risks penetrance model.

Description

Prints a short summary of parameter and penetrance estimates of a 'summary.penmodel_cmp' object.

Usage

## S3 method for class 'summary.penmodel_cmp'
print(x, digits = max(3, getOption("digits") - 3),
  signif.stars=TRUE, ...)

Arguments

x

An object class of 'summary.penmodel_cmp', a result of a call to summary.penmodel_cmp.

digits

Number of significant digits to use when printing.

signif.stars

Logical; if TRUE, provides stars to hightlight significant p-values. Default is TRUE.

...

Further arguments passed to or from other methods.

Value

Prints a summary of parameter estimates, their standard errors, tt-statistics and corresponding two-sided pp-values and additionally indicates significance stars if signif.stars is TRUE.

Also prints penetrance estimates for each event by age 70 specific to gender and mutation-status subgroups along with their standard errors and 95% confidence intervals.

Returns an object of class 'summary.penmodel_cmp'.

Author(s)

Yun-Hee Choi

See Also

penmodel_cmp, print.penmodel_cmp, summary.penmodel_cmp


Generate familial time-to-event data

Description

Generates familial time-to-event data for specified study design, genetic model and source of residual familial correlation; the generated data frame also contains family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, genotypes of major or second genes.

Usage

simfam(N.fam, design = "pop", variation = "none", interaction = FALSE, 
       add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, 
       base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), 
       vbeta = c(1, 1), 
       allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE,
       mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)

Arguments

N.fam

Number of families to generate.

design

Family based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+", "twostage", or "noasc", where "pop" is for the population-based design that families are ascertained by affected probands, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families in the second stage that include at least two affected members in the family, and "noasc" for no ascertainment correction that families are from simple random sampling. Default is "pop".

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, "kinship" for correlated frailties within families with kinship matrix, or "none" for no residual familial correlation. Default is "none".

interaction

Logical; if TRUE, allows the interaction between gender and mutation status. Default is FALSE.

add.x

Logical; if TRUE, generates a covariate in addition to gender and majorgene. Default is FALSE.

x.dist

Distribution of the covairate. Possible choices to generate the covariate are: "normal" from normal distribution and "binomial" from binominal distribution. Default is NULL.

x.parms

Parameter values for the specified distribution of the covariate. x.parms = c(mean, sd) should be specified for x.dist = "normal" and x.parms = c(size, probs) should be specified for "binomial"; when size = 1, it generates binary covariate. Default is NULL.

depend

Inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default is NULL. Value should be specified as a positive real number when variation = "frailty" or variation = "kinship".

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal" "gamma", "logBurr". Default is "Weibull".

frailty.dist

Choice of frailty distribution. Possible choices are: "gamma" or "lognormal" when variation="frailty". Default is NULL.

base.parms

Vector of parameter values for the specified baseline hazard function. base.parms = c(lambda, rho) should be specified for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma", and "lognormal". For base.dist = "logBurr", three parameters should be specified base.parms = c(lambda, rho, eta). Default value is base.parms = c(0.016, 3) for base.dist = "Weibull".

vbeta

Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if interaction = TRUE), secondgene (if variation = "secondgene"), and additional covariate (if add.x = TRUE).

allelefreq

Population allele frequencies of major disease gene. Value should be between 0 and 1. Vector of population allele frequencies of major and second disease genes should be provided when variation = "secondgene". Default value is allelefreq = 0.02.

dominant.m

Logical; if TRUE, the genetic model of major gene is dominant, otherwise recessive.

dominant.s

Logical; if TRUE, the genetic model of second gene is dominant, otherwise recessive.

mrate

Proportion of missing genotypes, value between 0 and 1. Default value is 0.

hr

Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when design="twostage". Default value is 0. Value should lie between 0 and 1.

probandage

Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, probandage = c(45, 2).

agemin

Minimum age of disease onset or minimum age. Default is 20 years of age.

agemax

Maximum age of disease onset or maximum age. Default is 100 years of age.

Details

The design argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop" when proband is affected, "pop+" when proband is affected mutation carrier, "cli" when proband is affected and at least one parent and one sibling are affected, "cli+" when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr. High risk families often include multiple (at least two) affected members in the family. design = "noasc" is to be used for no ascertainment correction.

The ages at onset are generated from the following penetrance models depending on the choice of variation = "none", "frailty", "secondgene", "kinship".. When variation = "none", the ages at onset are independently generated from the proportional hazard model conditional on the gender and carrier status of major gene mutation, X = c(xs, xg). The ages at onset correlated within families are generated from the shared frailty model (variation = "frailty") , the correlated shared frailty model with kinship matrix (variation = "kinship"), or the two-gene model (variation = "secondene"), where the residual familial correlation is induced by a frailty or a second gene, respectively, shared within the family.

The proportional hazard model

h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The shared frailty model

h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, ZZ represents a frailty shared within families and follows either a gamma or log-normal distribution, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The correlated shared frailty model with kinship matrix

h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, ZZ represents a vector of frailties following a multivariate log-normal distribution with mean 00 and variance 2K/depend2*K/depend, where KK represents the kinship matrix, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The two-gene model

h(t|X) = h0(t - t0) Z exp(βs * xs + β1 * x2 + β2 * x2),

where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.

The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.

Note that simulating family data under the clinic-based designs ("cli" or "cli+") or the two-stage design can be slower since the ascertainment criteria for the high risk families are difficult to meet in such settings. Especially, "cli" design could be slower than "cli+" design since the proband's mutation status is randomly selected from a disease population in "cli" design, so his/her family members are less likely to be mutation carriers and have less chance to be affected, whereas the probands are all mutation carriers, their family members have higher chance to be carriers and affected by disease. Therefore, "cli" design requires more iterations to sample high risk families than "cli+" design.

Value

Returns an object of class 'simfam', a data frame which contains:

famID

Family identification (ID) numbers.

indID

Individual ID numbers.

gender

Gender indicators: 1 for males, 0 for females.

motherID

Mother ID numbers.

fatherID

Father ID numbers.

proband

Proband indicators: 1 if the individual is the proband, 0 otherwise.

generation

Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings.

majorgene

Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene.

secondgene

Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene.

ageonset

Ages at disease onset in years.

currentage

Current ages in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected, 0 for unaffected (censored).

mgene

Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or NA if missing.

relation

Family members' relationship with the proband:

1 Proband (self)
2 Brother or sister
3 Son or daughter
4 Parent
5 Nephew or niece
6 Spouse
7 Brother or sister in law
fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members in family.

weight

Sampling weights.

Author(s)

Yun-Hee Choi, Wenqing He

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

summary.simfam, plot.simfam, penplot

Examples

## Example 1: simulate family data from a population-based design using
#  a Weibull distribution for the baseline hazard and inducing 
#  residual familial correlation through a shared gamma frailty.

set.seed(4321)
fam <- simfam(N.fam = 10, design = "pop+", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "gamma", depend = 1, 
       allelefreq = 0.02, base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))

head(fam) 

## Not run: 
  famID indID gender motherID fatherID proband generation majorgene secondgene
1     1     1      1        0        0       0          1         2          0
2     1     2      0        0        0       0          1         2          0
3     1     3      0        2        1       1          2         2          0
4     1     4      1        0        0       0          0         3          0
5     1     9      0        3        4       0          3         2          0
6     1    10      1        3        4       0          3         3          0
   ageonset currentage     time status mgene relation fsize naff weight
1 103.76925   69.19250 69.19250      0     1        4    18    2      1
2  64.88982   67.31119 64.88982      1     1        4    18    2      1
3  45.84891   47.57119 45.84891      1     1        1    18    2      1
4 269.71990   47.37403 47.37403      0     0        6    18    2      1
5  69.78355   27.80081 27.80081      0     1        3    18    2      1
6 192.09392   25.34148 25.34148      0     0        3    18    2      1

## End(Not run)

summary(fam)

plot(fam, famid = c(1:2)) # pedigree plots for families with IDs = 1 and 2

## Example 2: simulate family data from a two-stage design to include 
#  30% of high risk families in the sample. 

set.seed(4321)
fam <- simfam(N.fam = 50, design = "twostage", variation = "none", base.dist = "Weibull", 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), hr = 0.3, allelefreq = 0.02)

summary(fam)

## Example 3: simulate family data from a correlated frailty model with kinship matrix 

set.seed(4321)
fam <- simfam(N.fam = 50, design = "pop", variation = "kinship", base.dist = "Weibull", 
       frailty.dist = "lognormal", base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), 
       depend = 1, allelefreq = 0.02)

summary(fam)

Generate familial competing risks data

Description

Generates familial competing risks data for specified study design, genetic model and source of residual familial correlation; the generated data frame has the same family structure as that simfam function, including individual's id, father id, mother id, relationship to proband, generation, gender, current age, genotypes of major or second genes.

Usage

simfam_cmp(N.fam, design = "pop+", variation = "none", interaction = FALSE, 
         depend = NULL, base.dist = c("Weibull", "Weibull"), frailty.dist = "none", 
         base.parms = list(c(0.016, 3), c(0.016, 3)), 
         vbeta = list(c(-1.13, 2.35), c(-1, 2)), allelefreq = 0.02, dominant.m = TRUE, 
         dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), 
         agemin = 20, agemax = 100)

Arguments

N.fam

Number of families to generate.

design

Family based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+" or "twostage", where "pop" is for the population-based design that families are ascertained by affected probands, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands and "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families in the second stage that include at least two affected members in the family. Default is "pop+".

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, or "none" for no residual familial correlation. Default is "none".

interaction

Logical; if TRUE, allows the interaction between gender and mutation status. Two logical values should be specified for each competing event; if only one logical value is provided, the same logical value will be assumed for both events. Default is FALSE.

depend

Two values shoud be specified for each competing event when frailty.dist = "gamma" or frailty.dist = "lognormal", three values should be specified with frailty.dist = "cgamma" or frailty.dist = "clognormal". The first two values represent the inverse of the variance for each competing event and the third value represents the correlation between the two events.

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal" "gamma", "logBurr". Default is "Weibull". Two distributions should be specified for each competing event. If only one distribution is specified, the same distribution will be assumed for both events.

frailty.dist

Choice of frailty distribution. Possible choices are "gamma" for independent gamma, "lognormal" for independent lognormal, "cgamma" for correlated gamma, or "clognormal" for correlated lognormal distribution. Default is NULL.

base.parms

The list of two vectors of baseline parameters for each event should be specified. For example, base.parms=list(c(lambda1, rho1), c(lambda2, rho2)) should be specified for base.dist=c("Weibull", "Weibull").

Two parameters base.parms=c(lambda, rho) should be specified for base.dist="Weibull", "loglogistic", "Gompertz", "gamma", and "lognormal", and three parameters should be specified base.parms = c(lambda, rho, eta) for base.dist="logBurr".

vbeta

List of two vectors of regression coefficients for each event should be specified. Each vector contains regression coefficients for gender, majorgene, interaction between gender and majorgene (if interaction = TRUE), and secondgene (if variation = "secondgene").

allelefreq

Population allele frequencies of major disease gene. Value should be between 0 and 1. Vector of population allele frequencies for major and second disease genes should be provided when variation = "secondgene". Default value is allelefreq = 0.02.

dominant.m

Logical; if TRUE, the genetic model of major gene is dominant, otherwise recessive.

dominant.s

Logical; if TRUE, the genetic model of second gene is dominant, otherwise recessive.

mrate

Proportion of missing genotypes, value between 0 and 1. Default value is 0.

hr

Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when design="twostage". Default value is 0. Value should lie between 0 and 1.

probandage

Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, probandage = c(45, 2).

agemin

Minimum age of disease onset or minimum age. Default is 20 years of age.

agemax

Maximum age of disease onset or maximum age. Default is 100 years of age.

Details

Competing risk model

Event 1:

h1(t|X,Z) = h01(t - t0) Z1 exp(βs1 * xs + βg1 * xg),

Event 2:

h2(t|X,Z) = h02(t - t0) Z2 exp(βs2 * xs + βg2 * xg),

where h01(t) and h02(t) are the baseline hazard functions for event 1 and event 2, respectively, t0 is a minimum age of disease onset, Z1 and Z2 are frailties shared within families for each event and follow either a gamma, log-normal, correlateg gamma, or correlated log-normal distributions, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

Choice of frailty distributions for competing risk models

frailty.dist = "gamma" shares the frailties within families generated from a gamma distribution independently for each competing event, where Zj follows Gamma(kj, 1/kj).

frailty.dist = "lognormal" shares the frailties within families generated from a log-normal distribution independently for each competing event, where Zj follows log-normal distribution with mean 0 and variance (1/kj.

frailty.dist = "cgamma" shares the frailties within families generated from a correlated gamma distribution to allow the frailties between two events to be correlated, where the correlated gamma frailties (Z1, Z2) are generated with three independent gamma frailties (Y0, Y1, Y2) as follows:

Z1 = k0/(k0 + k1) Y0 + Y1 Z2 = k0/(k0 + k2) Y0 + Y2

where Y0 from Gamma(k0, 1/k0);

Y1

from Gamma(k1, 1/(k0 + k1));

Y2

from Gamma(k2, 1/(k0 + k2)).

frailty.dist = "clognormal" shares the frailties within families generated from a correlated log-normal distribution where log(Zj) follows a normal distribution with mean 0, variance 1/kj and correlation between two events k0.

depend should specify the values of related frailty parameters: c(k1, k2) with frailty.dist = "gamma" or frailty.dist = "lognormal"; c(k1, k2, k0) for frailty.dist = "cgamma" or frailty.dist = "clognormal".

The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.

The design argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop" when proband is affected, "pop+" when proband is affected mutation carrier, "cli" when proband is affected and at least one parent and one sibling are affected, "cli+" when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr. High risk families often include multiple (at least two) affected members in the family.

Note that simulating family data under the clinic-based designs ("cli" or "cli+") or the two-stage design can be slower since the ascertainment criteria for the high risk families are difficult to meet in such settings. Especially, "cli" design could be slower than "cli+" design since the proband's mutation status is randomly selected from a disease population in "cli" design, so his/her family members are less likely to be mutation carriers and have less chance to be affected, whereas the probands are all mutation carriers, their family members have higher chance to be carriers and affected by disease. Therefore, "cli" design requires more iterations to sample high risk families than "cli+" design.

Value

Returns an object of class 'simfam', a data frame which contains:

famID

Family identification (ID) numbers.

indID

Individual ID numbers.

gender

Gender indicators: 1 for males, 0 for females.

motherID

Mother ID numbers.

fatherID

Father ID numbers.

proband

Proband indicators: 1 if the individual is the proband, 0 otherwise.

generation

Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings.

majorgene

Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene.

secondgene

Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene.

ageonset

Ages at disease onset in years.

currentage

Current ages in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected by event 1, 2 for affected by event 2, 0 for unaffected (censored).

mgene

Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or NA if missing.

relation

Family members' relationship with the proband:

1 Proband (self)
2 Brother or sister
3 Son or daughter
4 Parent
5 Nephew or niece
6 Spouse
7 Brother or sister in law
fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members by either event 1 or 2 within family.

df1

Number of affected members by event 1 within family.

df2

Number of affected members by event 2 within family.

weight

Sampling weights.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07.

Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

summary.simfam_cmp, plot.simfam_cmp, penplot_cmp

Examples

## Example 1: simulate competing risk family data from pop+ design using
#  Weibull distribution for both baseline hazards and inducing 
#  residual familial correlation through a correlated gamma frailty.

set.seed(4321)
fam <- simfam_cmp(N.fam = 10, design = "pop+", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "cgamma", depend=c(1, 2, 0.5), 
       allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
       vbeta = list(c(-1.13, 2.35), c(-1, 2)))


head(fam) 

## Not run: 
  famID indID gender motherID fatherID proband generation majorgene secondgene  ageonset
1     1     1      1        0        0       0          1         3          0 124.23752
2     1     2      0        0        0       0          1         2          0  54.66936
3     1     3      0        2        1       1          2         2          0  32.75208
4     1     4      1        0        0       0          0         3          0 136.44926
5     1    11      1        3        4       0          3         3          0  71.53672
6     1    12      1        3        4       0          3         3          0 152.47073
  currentage     time status true_status mgene relation fsize naff df1 df2 weight
1   65.30602 65.30602      0           2     0        4    25    2   1   1      1
2   68.62107 54.66936      1           1     1        4    25    2   1   1      1
3   47.07842 32.75208      2           2     1        1    25    2   1   1      1
4   45.09295 45.09295      0           2     0        6    25    2   1   1      1
5   25.32819 25.32819      0           1     0        3    25    2   1   1      1
6   22.95059 22.95059      0           2     0        3    25    2   1   1      1

## End(Not run)

summary(fam)

plot(fam, famid = 1) # pedigree plots for family with ID = 1

Generate familial time-to-event data with a time-varying covariate

Description

Generates familial time-to-event data with a time-varying covariate for specified study design, genetic model and source of residual familial correlation; the generated data frame also contains family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, genotypes of major or second genes.

Usage

simfam_tvc(N.fam, design = "pop", variation = "none", interaction = FALSE, 
       add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL, 
       add.tvc = FALSE, tvc.type = "PE", tvc.range = NULL, tvc.parms = 1,
       base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3), 
       vbeta = c(1, 1), 
       allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE,
       mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)

Arguments

N.fam

Number of families to generate.

design

Family based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+", "twostage", or "noasc", where "pop" is for the population-based design that families are ascertained by affected probands, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families in the second stage that include at least two affected members in the family, and "noasc" for no ascertainment correction that families are from simple random sampling. Default is "pop".

variation

Source of residual familial correlation. Possible choices are: "frailty" for frailty shared within families, "secondgene" for second gene variation, "kinship" for correlated frailties within families with kinship matrix, or "none" for no residual familial correlation. Default is "none".

interaction

Logical; if TRUE, allows the interaction between gender and mutation status. Default is FALSE.

add.x

Logical; if TRUE, generates a covariate in addition to gender and majorgene. Default is FALSE.

x.dist

Distribution of the covairate. Possible choices to generate the covariate are: "normal" from normal distribution and "binomial" from binominal distribution. Default is NULL.

x.parms

Parameter values for the specified distribution of the covariate. x.parms = c(mean, sd) should be specified for x.dist = "normal" and x.parms = c(size, probs) should be specified for "binomial"; when size = 1, it generates binary covariate. Default is NULL.

depend

Inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default is NULL. Value should be specified as a positive real number when variation = "frailty" or variation = "kinship".

add.tvc

Logical; if TRUE, generates a time varying covariate. Default is FALSE.

tvc.type

Choice of time-varying covariate model. Possible choices are: "PE" and "CO". Default is "PE".

tvc.range

Range of ages at which the time-varying covariate occurs. Default is NULL.

tvc.parms

Vector of parameter values used for the time-varying covariate model. Default value is 1.

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal" "gamma", "logBurr". Default is "Weibull".

frailty.dist

Choice of frailty distribution. Possible choices are: "gamma" or "lognormal" when variation="frailty". Default is NULL.

base.parms

Vector of parameter values for the specified baseline hazard function. base.parms = c(lambda, rho) should be specified for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma", and "lognormal". For base.dist = "logBurr", three parameters should be specified base.parms = c(lambda, rho, eta). Default value is base.parms = c(0.016, 3) for base.dist = "Weibull".

vbeta

Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if interaction = TRUE), secondgene (if variation = "secondgene"), and additional covariate (if add.x = TRUE).

allelefreq

Population allele frequencies of major disease gene. Value should be between 0 and 1. Vector of population allele frequencies of major and second disease genes should be provided when variation = "secondgene". Default value is allelefreq = 0.02.

dominant.m

Logical; if TRUE, the genetic model of major gene is dominant, otherwise recessive.

dominant.s

Logical; if TRUE, the genetic model of second gene is dominant, otherwise recessive.

mrate

Proportion of missing genotypes, value between 0 and 1. Default value is 0.

hr

Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when design="twostage". Default value is 0. Value should lie between 0 and 1.

probandage

Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, probandage = c(45, 2).

agemin

Minimum age of disease onset or minimum age. Default is 20 years of age.

agemax

Maximum age of disease onset or maximum age. Default is 100 years of age.

Details

Time-varying covariate

When add.tvc = TRUE, the time at which the time-varying covariate (TVC) occurs, tvc.age, is generated from a uniform distribution with the range specified by tvc.range. A vector of minimum and maximum ages for the TVC should be specified in tve.range. When tvc.range = NULL, agemin and agemax are used as the range. In addition, tvc.type should be either "PE" or "CO" and the parameter values for the specified TVC type should be provided in tvc.parms.

tvc.type = "PE" represents a permanent exposure model for TVC which assumes that the effect of the TVC stays constant after tvc.age. The tvc.parms for the PE model should be specified as a single value, which represents log hazard ratio.

tvc.type = "CO" represents the Cox and Oaks model for TVC which assumes that the effect of the TVC decays exponentially over time in the form β exp(-(t - t*) η) + η0, where t* is the time at which the TVC occurs. The tvc.parms for the CO model should be specified by a vector of three parameters consisting of c(beta, eta, eta0).

Family-based study design

The design argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop" when proband is affected, "pop+" when proband is affected mutation carrier, "cli" when proband is affected and at least one parent and one sibling are affected, "cli+" when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr. High risk families often include multiple (at least two) affected members in the family. design = "noasc" is to be used for no ascertainment correction.

Penetrance model

The ages at onset are generated from the following penetrance models depending on the choice of variation = "none", "frailty", "secondgene", "kinship".. When variation = "none", the ages at onset are independently generated from the proportional hazard model conditional on the gender and carrier status of major gene mutation, X = c(xs, xg).

The ages at onset correlated within families are generated from the shared frailty model (variation = "frailty") , the correlated shared frailty model with kinship matrix (variation = "kinship"), or the two-gene model (variation = "secondene"), where the residual familial correlation is induced by a frailty or a second gene, respectively, shared within the family.

The proportional hazard model

h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The shared frailty model

h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, ZZ represents a frailty shared within families and follows either a gamma or log-normal distribution, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The correlated shared frailty model with kinship matrix

h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, ZZ represents a vector of frailties following a multivariate log-normal distribution with mean 00 and variance 2Kdepend2*K*depend, where KK represents the kinship matrix, xx and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.

The two-gene model

h(t|X) = h0(t - t0) Z exp(βs * xs + β1 * x2 + β2 * x2),

where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.

The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left truncated normal distribution as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is specified as 20 years apart.

Value

Returns an object of class 'simfam', a data frame which contains:

famID

Family identification (ID) numbers.

indID

Individual ID numbers.

gender

Gender indicators: 1 for males, 0 for females.

motherID

Mother ID numbers.

fatherID

Father ID numbers.

proband

Proband indicators: 1 if the individual is the proband, 0 otherwise.

generation

Individuals generation: 1=parents of probands,2=probands and siblings, 3=children of probands and siblings.

majorgene

Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene.

secondgene

Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene.

ageonset

Ages at disease onset in years.

currentage

Current ages in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected, 0 for unaffected (censored).

mgene

Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or NA if missing.

newx

Additional covariate when add.x = TRUE.

tvc.age

Age at which the time-varying covariate occurs when add.tvc = TRUE.

tvc.status

TVC status: 1 if tvc.age < time, 0 otherwise.

relation

Family members' relationship with the proband:

1 Proband (self)
2 Brother or sister
3 Son or daughter
4 Parent
5 Nephew or niece
6 Spouse
7 Brother or sister in law
fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members in family.

weight

Sampling weights.

Author(s)

Yun-Hee Choi

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

summary.simfam_tvc, plot.simfam_tvc

Examples

## Example: simulate family data with TVC based on CO model.

set.seed(4321)
fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "gamma", depend = 1, 
       add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), 
       tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))

## Not run: 
> head(fam) 
  famID indID gender motherID fatherID proband generation majorgene secondgene ageonset
1     1     1      1        0        0       0          1         2          0 61.80566
2     1     2      0        0        0       0          1         3          0 61.56996
3     1     3      0        2        1       1          2         2          0 39.42050
4     1     4      1        0        0       0          0         3          0 90.17320
5     1    13      0        3        4       0          3         3          0 51.49538
6     1    14      0        3        4       0          3         3          0 75.97238
  currentage     time status mgene  tvc.age tvc.status relation fsize naff weight
1   68.26812 61.80566      1     1 59.16387          1        4    29    3      1
2   68.60174 61.56996      1     0 39.45786          1        4    29    3      1
3   47.05410 39.42050      1     1 35.01941          1        1    29    3      1
4   44.86501 44.86501      0     0 58.67013          0        6    29    3      1
5   22.73075 22.73075      0     0 30.19254          0        3    29    3      1
6   22.71399 22.71399      0     0 40.66258          0        3    29    3      1

> summary(fam) 
Study design:           pop: population-based study with affected probands 
Baseline distribution:  Weibull 
Frailty distribution:   gamma 
Number of families:                     10 
Average number of affected per family:  3.1 
Average number of carriers per family:  3.4 
Average family size:                    16.3 
Average age of onset for affected:      48.19 
Average number of TVC event per family: 4 
Sampling weights used:                  1

## End(Not run)

Generate familial time-to-event data with Kinship or IBD matrices.

Description

Generate familial time-to-event data from correlated fraily model with Kinship or/and IBD matrices given pedigree data.

Usage

simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, 
base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), 
vbeta = c(1, 1), agemin = 20, hr = NULL)

Arguments

inputdata

Dataframe contains variables famID, indID, gender, motherID, fatherID, proband, generation, currentage and other variables to be used in generating time-to-event data.

IBD

IBD matrix

design

Family based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+", "twostage", or "noasc", where "pop" is for the population-based design that families are ascertained by affected probands, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, "twostage" for two-stage design that randomly samples families from the population in the first stage and oversamples high risk families in the second stage that include at least two affected members in the family, and "noasc" for no ascertainment correction that families are from simple random sampling. Default is "pop".

variation

Source of residual familial correlation. Possible choices are: "kinship" for correlated frailties within families generated by kinship matrix, "IBD" for correlated frailties by IBD matrix, c("kinship", "IBD") by both kinship and IBD matrices, or "none" for no residual familial correlation. Default is "none".

depend

Inverse of variance for the frailty distribution. A single value should be specified when variation = "IBD" or variation = "kinship" or a vector of two values when variation = c("kinship", "IBD"), where the first element corresponds to kinship matrix and the second element corresponds to IBD matrix. Default is NULL.

base.dist

Choice of baseline hazard distribution. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal" "gamma", "logBurr". Default is "Weibull".

base.parms

Vector of parameter values for the specified baseline hazard function. base.parms = c(lambda, rho) should be specified for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma", and "lognormal". For base.dist = "logBurr", three parameters should be specified base.parms = c(lambda, rho, eta). Default value is base.parms = c(0.016, 3) for base.dist = "Weibull".

var_names

Names of variables to be used in generating time-to-event data. Specified variables should be part of inputdata.

vbeta

Vector of regression coefficients for the variables specified by var_names.

hr

Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when design="twostage". Default value is 0. Value should lie between 0 and 1.

agemin

Minimum age of disease onset or minimum age. Default is 20 years of age.

Details

The ages at onset are generated from the correlated frailties and covariates using the following model:

The correlated shared frailty model with kinship and/or IBD matrices

h(t|X,Z) = h0(t - t0) Z exp( X*vbeta ),

where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, ZZ represents a vector of frailties following a multivariate log-normal distribution with mean 00 and variance 2Ksig1+Dsig22*K*sig1 + D*sig2, where KK represents the kinship matrix and D is IBD matrix, sig1sig1 and sig2sig2 are variance components related to each matrix and their values are specified by depend = c(1/sig1, 1/sig2), and XX represents a vector of variables whose names are specified by var_names, and \beta is a vector of corresponding coefficients whose values are specified by vbeta.

The variance structure of the frailties shared within families is chosen by either variation = "kinship" or "IBD" matrix or both variation = c("kinship", "IBD").

When variation = "none", the ages at onset are independently generated from the proportional hazard model conditional on the covariates XX.

The design argument defines the type of family based design to be simulated. Two variants of the population-based and clinic-based design can be chosen: "pop" when proband is affected, "pop+" when proband is affected mutation carrier, "cli" when proband is affected and at least one parent and one sibling are affected, "cli+" when proband is affected mutation-carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high risk families, where the proportion of high risks families to include in the sample is specified by hr. High risk families often include multiple (at least two) affected members in the family. design = "noasc" is to be used for no ascertainment correction.

Value

Returns an object of class 'simfam', a data frame which contains inputdata and the following:

ageonset

Ages at disease onset in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected, 0 for unaffected (censored).

fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members in family.

weight

Sampling weights.

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

summary.simfam2, plot.simfam, penplot

Examples

## Example: simulate family data from a population-based design using
#  a Weibull distribution for the baseline hazard and inducing 
#  residual familial correlation through kinship and IBD matrices.

# Inputdata and IBD matrix should be provided; 
# simuated inputdata as an example here;

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

IBDmatrix <- diag(1, dim(data)[1])
data <- data[ , c(1:7, 11, 14)]

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

head(fam2)

summary(fam2)

Summary method for class penmodel

Description

Provides a summary of a fitted penetrance model.

Usage

## S3 method for class 'penmodel'
summary(object, correlation=FALSE, ...)

Arguments

object

An object class of 'penmodel', a fitted model by penmodel or penmodelEM functions.

correlation

Logical; if TRUE, returns the correlation matrix of the estimated parameters. Default is FALSE.

...

Further arguments passed to or from other methods.

Value

Returns the object of class 'summary.penmodel', including the following summary values:

estimates

List of parameter estimates of transformed baseline parameters and regression coefficients, their standard errors, their robust standard errors if robust=TRUE was selected when fitting the penetrance model, tt-statistics and corresponding two-sided pp-values.

varcov

Variance-covariance matrix of the parameter estimates.

varcov.robust

Robust variance-covariance matrix of the parameter estimates if robust = TRUE was selected when fitting the penetrance model.

correlation

Correlation matrix obtained from the variance-covariance matrix.

correlation.robust

Correlation matrix obtained from the robust variance-covariance matrix if robust = TRUE was selected when fitting the penetrance model.

Author(s)

Yun-Hee Choi

See Also

penmodel, penmodelEM, print.penmodel, print.summary.penmodel plot.penmodel

Examples

# Simulated family data 

set.seed(4321)
fam <- simfam(N.fam = 200, design = "pop+", variation = "none", base.dist = "Weibull", 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35), agemin = 20, allelefreq = 0.02)
 
# Penetrance model fit for the simulated family data

fit <- penmodel(Surv(time, status) ~ gender + mgene, cluster = "famID", 
       parms=c(0.01, 3, -1.13, 2.35), data = fam, design = "pop+", base.dist = "Weibull")

# Summary of the model parameter and penetrance estimates from model fit

summary(fit)

## Not run: 
Estimates: 
            Estimate Std. Error t value Pr(>|t|)  
log(lambda)   -4.531    0.08583 -52.793  0.01206 *
log(rho)       1.113    0.04688  23.737  0.02680 *
gender        -1.302    0.19233  -6.768  0.09339 .
mgene          2.349    0.23825   9.859  0.06436 .
Signif. codes:   0 '***'  0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

## End(Not run)

Summary method for class penmodel_cmp

Description

Provides a summary of a fitted competing risks penetrance model.

Usage

## S3 method for class 'penmodel_cmp'
summary(object, correlation=FALSE, ...)

Arguments

object

An object class of 'penmodel_cmp', a fitted model by penmodel_cmp function.

correlation

Logical; if TRUE, returns the correlation matrix of the estimated parameters. Default is FALSE.

...

Further arguments passed to or from other methods.

Value

Returns the object of class 'summary.penmodel_cmp', including the following summary values:

estimates

List of parameter estimates of transformed baseline parameters and regression coefficients, their standard errors, their robust standard errors if robust=TRUE was selected when fitting the penetrance model, tt-statistics and corresponding two-sided pp-values.

varcov

Variance-covariance matrix of the parameter estimates.

varcov.robust

Robust variance-covariance matrix of the parameter estimates if robust = TRUE was selected when fitting the penetrance model.

correlation

Correlation matrix obtained from the variance-covariance matrix.

correlation.robust

Correlation matrix obtained from the robust variance-covariance matrix if robust = TRUE was selected when fitting the penetrance model.

Author(s)

Yun-Hee Choi

See Also

penmodel_cmp, print.penmodel_cmp, print.summary.penmodel_cmp plot.penmodel_cmp

Examples

# Simulated family completing risks data 

## Not run: 
set.seed(4321)
fam1 <- simfam_cmp(N.fam = 300, design = "pop+", variation = "frailty", competing=TRUE,
              base.dist = "Weibull", frailty.dist = "gamma", depend=c(0.5, 1), 
              allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)), 
              vbeta = list(c(-1.13, 2.35),c(-1, 2)))

# Penetrance model fit for the simulated family data

fit <- penmodel_cmp(
       formula1 = Surv(time, status==1) ~ gender + mgene, 
       formula2 = Surv(time, status==2) ~ gender + mgene, 
       cluster = "famID", 
       parms = list(c(0.01, 3, -1.13, 2.35), c(0.01, 3, -1, 2)),
       data = fam1, design = "pop+", base.dist = "Weibull")

# Summary of the model parameter and penetrance estimates from model fit

summary(fit)

## End(Not run)

Summary method for simfam

Description

Provides a summary of simulated data.

Usage

## S3 method for class 'simfam'
summary(object, digits = max(3, getOption("digits") - 3), ...)

Arguments

object

An object class of 'simfam' generated from simfam function

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Displays a summary of simulated data and returns the following values:

num.fam

Number of families simulated.

avg.num.affected

Average number of affected individuals per family.

avg.num.carriers

Average number of mutation carriers per family.

avg.family.size

Average family size.

ave.ageonset

Average age of onset for affected individuals.

Author(s)

Yun-Hee Choi

See Also

simfam

Examples

set.seed(4321)
fam <- simfam(N.fam = 50, design = "pop", variation = "none", base.dist = "Weibull", 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))
       
summary(fam)
## Not run: 
Study design:                           pop 
Baseline distribution:                  Weibull 
Number of families:                     50 
Average number of affected per family:  1.24 
Average number of carriers per family:  1.3 
Average family size:                    17.02 
Average age of onset for affected:      40.08 

## End(Not run)

Summary method for simfam_cmp

Description

Provides a summary of simulated data from simfam_cmp function.

Usage

## S3 method for class 'simfam_cmp'
summary(object, digits = max(3, getOption("digits") - 3), ...)

Arguments

object

An object class of 'simfam' generated from simfam_cmp function

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Displays a summary of simulated data and returns the following values:

num.fam

Number of families simulated.

avg.num.affected1

Average number of affected individuals by event 1 per family.

avg.num.affected2

Average number of affected individuals by event 2 per family.

avg.num.carriers

Average number of mutation carriers per family.

avg.family.size

Average family size.

ave.ageonset1

Average age of onset for affected individuals by event 1.

ave.ageonset2

Average age of onset for affected individuals by event 2.

Author(s)

Yun-Hee Choi

See Also

simfam_cmp

Examples

set.seed(4321)
fam <- simfam_cmp(N.fam = 50, design = "pop+", variation = "none", 
        base.dist = "Weibull", 
        base.parms = list(c(0.01, 3), c(0.01, 3)), 
        vbeta = list(c(-1.13, 2.35), c(-1,2)))
       
summary(fam)
## Not run: 
Study design:                           pop+ 
Baseline distribution for event 1:      Weibull 
Baseline distribution for event 2:      Weibull 
Number of families:                     50 
Average number of event 1 per family:   1.24 
Average number of event 2 per family:   0.7 
Average number of carriers per family:  5.54 
Average family size:                    15.58 
Average age of onset for event 1:       42.59 
Average age of onset for event 2:       43.72 

## End(Not run)

Summary method for simfam_tvc

Description

Provides a summary of simulated data.

Usage

## S3 method for class 'simfam_tvc'
summary(object, digits = max(3, getOption("digits") - 3), ...)

Arguments

object

An object class of 'simfam_tvc' generated from simfam_tvc function

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Displays a summary of simulated data and returns the following values:

num.fam

Number of families simulated.

avg.num.affected

Average number of affected individuals per family.

avg.num.carriers

Average number of mutation carriers per family.

avg.family.size

Average family size.

ave.ageonset

Average age of onset for affected individuals.

ave.num.tvc

Average number of TVC events per family.

Author(s)

Yun-Hee Choi

See Also

simfam_tvc

Examples

set.seed(4321)
fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty", 
       base.dist = "Weibull", frailty.dist = "gamma", depend = 1, 
       add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60), 
       tvc.parms = c(1, 0.1, 0), allelefreq = 0.02, 
       base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))

summary(fam)
## Not run: 
Study design:           pop: population-based study with affected probands 
Baseline distribution:  Weibull 
Frailty distribution:   gamma 
Number of families:                     10 
Average number of affected per family:  3.1 
Average number of carriers per family:  3.4 
Average family size:                    16.3 
Average age of onset for affected:      48.19 
Average number of TVC event per family: 4 

## End(Not run)

Summary method for simfam2

Description

Provides a summary of simulated data.

Usage

## S3 method for class 'simfam2'
summary(object, digits = max(3, getOption("digits") - 3), ...)

Arguments

object

An object class of 'simfam2' generated from simfam2 function

digits

Number of significant digits to use when printing.

...

Further arguments passed to or from other methods.

Value

Displays a summary of simulated data and returns the following values:

num.fam

Number of families simulated.

avg.num.affected

Average number of affected individuals per family.

avg.num.carriers

Average number of mutation carriers per family.

avg.family.size

Average family size.

ave.ageonset

Average age of onset for affected individuals.

Author(s)

Yun-Hee Choi

See Also

simfam2

Examples

set.seed(4321)

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

IBDmatrix <- diag(1, dim(data)[1])
data <- data[ , c(1:7, 11, 14)]

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

summary(fam2)
## Not run: 
Study design:          pop 
Baseline distribution: Weibull 
Frailty distribution:  lognormal with kinship and IBD matrices 
Number of families:                     50 
Average number of affected per family:  1.24 
Average number of carriers per family:  1.3 
Average family size:                    17.02 
Average age of onset for affected:      40.08 

## End(Not run)