Title: | Genotype-Specific Survival Estimation |
---|---|
Description: | We propose a fully efficient sieve maximum likelihood method to estimate genotype-specific distribution of time-to-event outcomes under a nonparametric model. We can handle missing genotypes in pedigrees. We estimate the time-dependent hazard ratio between two genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. The estimators are calculated via an expectation-maximization algorithm. |
Authors: | Baosheng Liang, Yuanjia Wang and Donglin Zeng |
Maintainer: | Baosheng Liang <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1 |
Built: | 2024-12-12 06:57:34 UTC |
Source: | CRAN |
The package ‘GSSE’ (Genotype-Specific Survival Estimation) is made to implement a fully efficient sieve maximum likelihood method to estimate genotype-specific distribution of time-to-event outcomes under a nonparametric model. The package can handle missing genotypes in pedigrees. We estimate time-dependent hazard ratio between two genetic mutation groups using B-splines and apply nonparametric maximum likelihood estimation to the reference baseline hazard function. The estimators are calculated via an expectation-maximization algorithm.
Package: | GSSE |
Type: | Package |
Version: | 0.1 |
Date: | 2015-10-17 |
License: | GPL (>= 2) |
The main function is Sieve_NPMLE_Switch()
. See the documentation file with examples.
Baosheng Liang <[email protected]>, Yuanjia Wang <[email protected]> and Donglin Zeng <[email protected]>
Maintainer: Baosheng Liang <[email protected]>
Wang, Y., Clark, L. N., Louis, E. D., Mejia-Santana, H., Harris, J., Cote, L. J., ... & Marder, K. (2008). Risk of Parkinson disease in carriers of parkin mutations: estimation using the kin-cohort method. Archives of neurology, 65(4), 467-474.
Qin, J., Garcia, T., Ma, Y., Tang, M., Marder, K. & Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics 8(2), 1182-1208.
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
This function is used to estimate the genotype-specific distribution of time-to-event outcomes using EM-PAVA algorithm (Qin et al. 2014).
EM_PAVA_Func (q, x, delta, timeval, p, ep = 1e-05, maxiter = 400)
EM_PAVA_Func (q, x, delta, timeval, p, ep = 1e-05, maxiter = 400)
q |
matrix of 2 columns, where the first and second columns are the probabilities of belonging to the carrier |
x |
observed event time or censoring time. |
delta |
indicator of event. |
timeval |
grid points at which the distribution function values are estimated. |
p |
number of groups. |
ep |
convergence criterion. Here, |
maxiter |
maximum number of EM iterations. |
Technical details can be found in Qin et al. (2014).
Returns a list of prediction values for classes
Fest |
estimated values at the points of |
Fest.all |
estimated values of cumulative distribution function on both carrier and non-carrier groups. |
Qin, J., Garcia, T., Ma, Y., Tang, M., Marder, K. & Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics 8(2), 1182-1208.
p0G_Func()
, Sieve_NPMLE_Switch()
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.01, 3.65, 0.01); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); EMpava_result = EM_PAVA_Func ( q = rbind(p0G,1-p0G), x = Y, delta = Delta, timeval = Grid, p = 2, ep = 1e-4 ); all = sort(c(Grid, Y)); F_carr_func = function(x){ EMpava_result$Fest.all[1, which.max(all[all <= x]) ] }; F_non_func = function(x){ EMpava_result$Fest.all[2, which.max(all[all <= x]) ] }; PAVA_F1.hat_fix_t = apply( matrix(fix_t1, ncol=1), 1, F_carr_func ); PAVA_F2.hat_fix_t = apply( matrix(fix_t2, ncol=1), 1, F_non_func ); PAVA_F.hat_fix_t = data.frame( fix_t1 = fix_t1, PAVA_F1.hat = PAVA_F1.hat_fix_t, fix_t2 = fix_t2, PAVA_F2.hat = PAVA_F2.hat_fix_t ); print(PAVA_F.hat_fix_t); # plot estimated curves F_carr = apply( matrix(Grid, ncol=1), 1, F_carr_func ); F_non = apply( matrix(Grid, ncol=1), 1, F_non_func ); plot( Grid, F_carr, type = 's', lty = 1, xlab = "Y", ylab = "Estimated Cumulative Distribution Function", ylim = c(0,1), col = 'blue' ); lines(Grid, F_non, type='s', lty=2, col='red'); legend("topleft", legend=c("Carrier group", "Non-Carrier group"), lty=c(1,2), col=c("blue", "red") );
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.01, 3.65, 0.01); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); EMpava_result = EM_PAVA_Func ( q = rbind(p0G,1-p0G), x = Y, delta = Delta, timeval = Grid, p = 2, ep = 1e-4 ); all = sort(c(Grid, Y)); F_carr_func = function(x){ EMpava_result$Fest.all[1, which.max(all[all <= x]) ] }; F_non_func = function(x){ EMpava_result$Fest.all[2, which.max(all[all <= x]) ] }; PAVA_F1.hat_fix_t = apply( matrix(fix_t1, ncol=1), 1, F_carr_func ); PAVA_F2.hat_fix_t = apply( matrix(fix_t2, ncol=1), 1, F_non_func ); PAVA_F.hat_fix_t = data.frame( fix_t1 = fix_t1, PAVA_F1.hat = PAVA_F1.hat_fix_t, fix_t2 = fix_t2, PAVA_F2.hat = PAVA_F2.hat_fix_t ); print(PAVA_F.hat_fix_t); # plot estimated curves F_carr = apply( matrix(Grid, ncol=1), 1, F_carr_func ); F_non = apply( matrix(Grid, ncol=1), 1, F_non_func ); plot( Grid, F_carr, type = 's', lty = 1, xlab = "Y", ylab = "Estimated Cumulative Distribution Function", ylim = c(0,1), col = 'blue' ); lines(Grid, F_non, type='s', lty=2, col='red'); legend("topleft", legend=c("Carrier group", "Non-Carrier group"), lty=c(1,2), col=c("blue", "red") );
This is a small data set used for illustration of the ‘p0G’ calculation. There are 2 columns, 20 rows in this data set.
data("p0G_data")
data("p0G_data")
A data frame with 20 observations on the following 2 columns with
relative
a factor with levels Child
, Parent
and Sibling
,
proband_pd
a factor with levels No
and Yes
.
This data set is a tiny subset of the data set of a Parkinson disease research project supported by Michael J. Fox Foundation.
Wang, Y., Clark, L. N., Louis, E. D., Mejia-Santana, H., Harris, J., Cote, L. J., ... & Marder, K. (2008). Risk of Parkinson disease in carriers of parkin mutations: estimation using the kin-cohort method. Archives of neurology, 65(4), 467-474.
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
p0G_Func()
data("p0G_data")
data("p0G_data")
This function is used to calculate the probability of a relative being a carrier.
p0G_Func (p, status, relative, model)
p0G_Func (p, status, relative, model)
p |
population frequency of mutation. |
status |
proband's carrier status ( |
relative |
relative's relationship to the proband ( |
model |
assumed genetic model ("dominant" or "recessive"). |
Technical details can be found in Wang et al. (2008).
The function p0G_Func
returns the probability of a relative being a carrier under the given genetic model.
Wang, Y., Clark, L. N., Louis, E. D., Mejia-Santana, H., Harris, J., Cote, L. J., ... & Marder, K. (2008). Risk of Parkinson disease in carriers of parkin mutations: estimation using the kin-cohort method. Archives of neurology, 65(4), 467-474.
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
p0G_data
, Sieve_NPMLE_Switch()
, Sieve_NPMLE_Bootstrap()
, EM_PAVA_Func()
and
# Example 1 # data("p0G_data"); attach(p0G_data); n = dim(p0G_data)[1]; Status = as.numeric( proband_pd == "Yes" ); Relative = rep(1, n) + 1*(relative == "Sibling") + 2*(relative == "Child"); detach(p0G_data); p0G = rep(0, n); for(i in 1:n) { p0G[i] = p0G_Func(p = 0.02, status = Status[i], relative = Relative[i], model="dominant"); } data.frame(pd_status = Status, relatives = Relative, prob = p0G); # Example 2 # n = 50; status = sample(x=c(0,1), size = n, replace = TRUE, prob = c(0.6, 0.4) ); relative = sample(x=1:3, size = n, replace = TRUE, prob = c(1/3, 1/3, 1/3) ); p0G = rep(0, n); for(i in 1:n) { p0G[i] = p0G_Func(p = 0.2, status = status[i], relative = relative[i], model="dominant"); } data.frame(status = status, relative = relative, p0G = p0G);
# Example 1 # data("p0G_data"); attach(p0G_data); n = dim(p0G_data)[1]; Status = as.numeric( proband_pd == "Yes" ); Relative = rep(1, n) + 1*(relative == "Sibling") + 2*(relative == "Child"); detach(p0G_data); p0G = rep(0, n); for(i in 1:n) { p0G[i] = p0G_Func(p = 0.02, status = Status[i], relative = Relative[i], model="dominant"); } data.frame(pd_status = Status, relatives = Relative, prob = p0G); # Example 2 # n = 50; status = sample(x=c(0,1), size = n, replace = TRUE, prob = c(0.6, 0.4) ); relative = sample(x=1:3, size = n, replace = TRUE, prob = c(1/3, 1/3, 1/3) ); p0G = rep(0, n); for(i in 1:n) { p0G[i] = p0G_Func(p = 0.2, status = status[i], relative = relative[i], model="dominant"); } data.frame(status = status, relative = relative, p0G = p0G);
Permutation test for the Sieve-NPMLE switch method with null hypothesis : F_carr = F_non and alternative hypothesis
: F_carr is not equal to F_non.
Permutation_Test (Grid, F_carr, F_non, OY, ODelta, Op0G, nperm)
Permutation_Test (Grid, F_carr, F_non, OY, ODelta, Op0G, nperm)
Grid |
time points at which the distribution function values are estimated. |
F_carr |
a vector of distribution function values at given grid points of the carrier group. |
F_non |
a vector of distribution function values at given grid points of the non-carrier group. |
OY |
observed event times. |
ODelta |
observed indicators of right censoring. |
Op0G |
observed probability values of carrier and non-carrier groups. |
nperm |
replication number used in permutation. |
Technical details can be found in Wang et al. (2015).
This function returns a list of prediction values for classes,
Test_Stat |
value of the Kolmogorov-Smirnov statistic with observed data. |
Pvalues |
p-value of the permutation test. |
Permutation.value |
values of Kolmogorov-Smirnov statistics under all permutations. |
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
test_stat()
and Sieve_NPMLE_Switch()
.
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.1, 3, 0.1); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=7, degree=3 ); Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp( max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp( max( Lambda_2.hat[Y <= x] ) ) } F_carr = apply( matrix(px, ncol=1), 1, F_carr_func ); F_non = apply( matrix(px, ncol=1), 1, F_non_func ); # Permutation test # Permutation_Test( Grid=Grid, F_carr=F_carr, F_non=F_non, OY=OY, ODelta=ODelta, Op0G=Op0G, nperm=10 );
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.1, 3, 0.1); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=7, degree=3 ); Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp( max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp( max( Lambda_2.hat[Y <= x] ) ) } F_carr = apply( matrix(px, ncol=1), 1, F_carr_func ); F_non = apply( matrix(px, ncol=1), 1, F_non_func ); # Permutation test # Permutation_Test( Grid=Grid, F_carr=F_carr, F_non=F_non, OY=OY, ODelta=ODelta, Op0G=Op0G, nperm=10 );
This function is used for calculating standard error estimates and 95% confidence bands in quantile using the bootstrap method.
Sieve_NPMLE_Bootstrap ( fam_ID, Y0, Delta0, p0G0, fix_t1, fix_t2, Grid, Knot, degree=3, Bn, maxiter=400, ep=1e-05)
Sieve_NPMLE_Bootstrap ( fam_ID, Y0, Delta0, p0G0, fix_t1, fix_t2, Grid, Knot, degree=3, Bn, maxiter=400, ep=1e-05)
fam_ID |
family ID numbers. |
Y0 |
observed event times or censoring times. |
Delta0 |
indicators of event. |
p0G0 |
probabilities of being a carrier. |
fix_t1 |
a vector of fixed points at which the carrier's cumulative distribution function values are estimated. |
fix_t2 |
a vector of fixed points at which the non-carrier's cumulative distribution function values are estimated. |
Grid |
a vector of grid points used for plotting the estimated distribution functions of carrier and non-carrier groups. |
Knot |
number of knots of the B-spline base functions. |
degree |
degree of the B-spline base functions. |
Bn |
number of bootstrap samples. |
maxiter |
maximum number of iterations. |
ep |
convergence criterion, default is ep= 1e-05. |
Using bootstrap for standard error estimation and 95% confidence bands calculation. We do the Bootstrap resample according to fam_ID
.
This function returns a list
Boot.L1 |
estimated cumulative hazard function for the carrier group. |
Boot.L2 |
estimated cumulative hazard function for the non-carrier group. |
SE_F1_fix_t |
estimated standard errors for the carrier group at given points |
SE_F2_fix_t |
estimated standard errors for the non-carrier group at given points |
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
p0G_Func()
, Sieve_NPMLE_Switch()
.
data("Simulated_data"); OID = Simulated_data[,1]; OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.1, 3, 0.1); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=7, degree=3 ); Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp( - max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp( - max( Lambda_2.hat[Y <= x] ) ) } est.f1 = apply(matrix(fix_t1, ncol=1), 1, F_carr_func ); est.f2 = apply(matrix(fix_t2, ncol=1), 1, F_non_func ); # ---------------- # # Bootstrap # # ---------------- # Boot = Sieve_NPMLE_Bootstrap( fam_ID=OID, Y0=OY, Delta0=ODelta, p0G0=Op0G, fix_t1=fix_t1, fix_t2=fix_t2, Grid = Grid, Knot=6, degree =3, Bn=10 ); SE1 = Boot$SE_F1_fix_t; SE2 = Boot$SE_F2_fix_t; estp = data.frame( fix_t1 = fix_t1, F1.hat = est.f1, SE_F1 = SE1, fix_t2 = fix_t2, F2.hat = est.f2, SE_F2 = SE2 ); print(estp)
data("Simulated_data"); OID = Simulated_data[,1]; OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.1, 3, 0.1); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=7, degree=3 ); Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp( - max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp( - max( Lambda_2.hat[Y <= x] ) ) } est.f1 = apply(matrix(fix_t1, ncol=1), 1, F_carr_func ); est.f2 = apply(matrix(fix_t2, ncol=1), 1, F_non_func ); # ---------------- # # Bootstrap # # ---------------- # Boot = Sieve_NPMLE_Bootstrap( fam_ID=OID, Y0=OY, Delta0=ODelta, p0G0=Op0G, fix_t1=fix_t1, fix_t2=fix_t2, Grid = Grid, Knot=6, degree =3, Bn=10 ); SE1 = Boot$SE_F1_fix_t; SE2 = Boot$SE_F2_fix_t; estp = data.frame( fix_t1 = fix_t1, F1.hat = est.f1, SE_F1 = SE1, fix_t2 = fix_t2, F2.hat = est.f2, SE_F2 = SE2 ); print(estp)
This function is used to estimate the genotype-specific distribution of time-to-event outcomes with the Sieve-NPMLE switch algorithm.
Sieve_NPMLE_Switch (Y, p0G, Delta, px, Grid, Knot, degree, maxiter=400, ep=1e-05)
Sieve_NPMLE_Switch (Y, p0G, Delta, px, Grid, Knot, degree, maxiter=400, ep=1e-05)
Y |
observed event times or censoring times. |
p0G |
probabilities of being a carrier. |
Delta |
indicators of event. |
px |
grid points at which the distribution function values are estimated. |
Grid |
grid points used for plots. |
Knot |
number of knots. |
degree |
degree of the B-spline base functions. |
maxiter |
maximum number of iterations. |
ep |
convergence criterion, default is |
This function is used to estimate the distribution of the age-at-onset of Parkinsons disease for carriers' and non-carriers' mutations in the leucine-rich repeat kinase 2, LRRK2, gene. Please refer to Wang et al. (2015) for more details.
This function returns a list of prediction values for classes
lamb1.hat |
estimated values of hazard function for the carrier group. |
lamb2.hat |
estimated values of hazard function for the non-carrier group. |
Lamb1 |
estimated values of cumulative hazard function for the carrier group. |
Lamb2 |
estimated values of cumulative hazard function for the non-carrier group. |
Converge |
convergence status. |
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
p0G_Func()
, Sieve_NPMLE_Bootstrap()
and Permutation_Test()
.
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.01, 3.65, 0.01); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=6, degree=3 ); # 'Sieve_NPMLE_Switch' only returns values of hazard function # or cumulative hazard function of mutation groups, in practice, we only # need to do a little bit more as follows for calculating the # cumulative distribution values of mutation groups. Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp(- max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp(- max( Lambda_2.hat[Y <= x] ) ) } est.f1 = apply(matrix(fix_t1, ncol=1), 1, F_carr_func ); est.f2 = apply(matrix(fix_t2, ncol=1), 1, F_non_func ); estp = data.frame( fix_t1 = fix_t1, F_carr_t1 = est.f1, fix_t2 = fix_t2, F_non_t2 = est.f2 ); print(estp); # plot estimated curves F_carr = apply( matrix(px, ncol=1), 1, F_carr_func ); F_non = apply( matrix(px, ncol=1), 1, F_non_func ); plot( px, F_carr, type='s', lty=1, ylim=c(0, 1), xlab="Y", ylab="Estimated Cumulative Distribution Function", col='blue' ); lines(px, F_non, type='s', lty=2, col='red'); legend("topleft", legend=c("Carrier group", "Non-Carrier group"), lty=c(1,2), col=c("blue","red") );
data("Simulated_data"); OY = Simulated_data[,2]; ind = order(OY); ODelta = Simulated_data[,3]; Op0G = Simulated_data[,4]; Y = OY[ind]; Delta = ODelta[ind]; p0G = Op0G[ind]; Grid = seq(0.2, 3.65, 0.05); fix_t1 = c(0.288, 0.693, 1.390); fix_t2 = c(0.779, 1.860, 3.650); px = seq(0.01, 3.65, 0.01); SieveNPMLE_result = Sieve_NPMLE_Switch( Y=Y, Delta=Delta, p0G=p0G, px=px, Grid=Grid, Knot=6, degree=3 ); # 'Sieve_NPMLE_Switch' only returns values of hazard function # or cumulative hazard function of mutation groups, in practice, we only # need to do a little bit more as follows for calculating the # cumulative distribution values of mutation groups. Lambda_1.hat = cumsum( SieveNPMLE_result$lamb1.hat ); Lambda_2.hat = cumsum( SieveNPMLE_result$lamb2.hat ); F_carr_func = function(x){ 1 - exp(- max( Lambda_1.hat[Y <= x] ) ) } F_non_func = function(x){ 1 - exp(- max( Lambda_2.hat[Y <= x] ) ) } est.f1 = apply(matrix(fix_t1, ncol=1), 1, F_carr_func ); est.f2 = apply(matrix(fix_t2, ncol=1), 1, F_non_func ); estp = data.frame( fix_t1 = fix_t1, F_carr_t1 = est.f1, fix_t2 = fix_t2, F_non_t2 = est.f2 ); print(estp); # plot estimated curves F_carr = apply( matrix(px, ncol=1), 1, F_carr_func ); F_non = apply( matrix(px, ncol=1), 1, F_non_func ); plot( px, F_carr, type='s', lty=1, ylim=c(0, 1), xlab="Y", ylab="Estimated Cumulative Distribution Function", col='blue' ); lines(px, F_non, type='s', lty=2, col='red'); legend("topleft", legend=c("Carrier group", "Non-Carrier group"), lty=c(1,2), col=c("blue","red") );
This is an artificial data set used for the illustration of the current R-package. There are 4 columns, 268 rows in this data set.
data("Simulated_data")
data("Simulated_data")
A data frame with 268 observations on the following 4 columns with.
FamID
a factor with levels Child
, Parent
and Sibling
Y
a factor with levels No
and Yes
delta
a numeric vector
p0G
a numeric vector, probability distribution value.
This is a simulated data set based on the real data analysis results in Wang et al. (2015).
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
Marder, K., Wang, Y., Alcalay, R. N., Mejia-Santana, H., Tang, M. X., Lee, A., ... & Miravite, J. (2015). Age-specific penetrance of LRRK2 G2019S in the Michael J. Fox Ashkenazi Jewish LRRK2 Consortium. Neurology, 85(1), 89–95.
Qin, J., Garcia, T., Ma, Y., Tang, M., Marder, K. & Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics 8(2), 1182-1208.
Sieve_NPMLE_Switch()
, Sieve_NPMLE_Bootstrap()
, EM_PAVA_Func()
.
data("Simulated_data")
data("Simulated_data")
This function is used to calculate the Kolmogorov-Smirnov statistic: .
test_stat (F1, F2)
test_stat (F1, F2)
F1 |
vector of one distribution function value. |
F2 |
vector of the other distribution function value. |
It is necessary to point out that F1
and F2
are based on common grids. Other technical details can be found in Wang et al. (2015).
This function returns a list of prediction values for classes,
out |
value of Kolmogorov-Smirnov statistics. |
Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. & Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika, 102(3), 515-532.
X = seq(0.01, 1, 0.05); Func_1 = function(x){ x + runif(length(x), min=0, max=1) }; Func_2 = function(x){ x + 2*runif(length(x), min=0, max=1) }; U1 = Func_1(X); U2 = Func_2(X); test_stat (F1 = U1, F2 = U2);
X = seq(0.01, 1, 0.05); Func_1 = function(x){ x + runif(length(x), min=0, max=1) }; Func_2 = function(x){ x + 2*runif(length(x), min=0, max=1) }; U1 = Func_1(X); U2 = Func_2(X); test_stat (F1 = U1, F2 = U2);