Title: | Combining Different Spatial Datasets in Cancer Risk Estimation |
---|---|
Description: | We propose a novel two-step procedure to combine epidemiological data obtained from diverse sources, with the aim of quantifying the risk factors that affect the probability that an individual develops a certain disease such as cancer. See Hui Huang, Xiaomei Ma, Rasmus Waagepetersen, Theodore R. Holford, Rong Wang, Harvey Risch, Lloyd Mueller & Yongtao Guan (2014) A New Estimation Approach for Combining Epidemiological Data From Multiple Sources, Journal of the American Statistical Association, 109:505, 11-23, <doi:10.1080/01621459.2013.870904>. |
Authors: | Ming Wang, Yongtao Guan, Kun Xu |
Maintainer: | Ming Wang <[email protected]> |
License: | GPL |
Version: | 0.1 |
Built: | 2024-12-10 06:55:45 UTC |
Source: | CRAN |
The main function solves the estimating equations constructed by combining all pairs (N1,M1), (N1,M2), (N2,M1) and (N2,M2), with the selection bias probability included.
DA_AllEE5(realdata_covariates, realdata_alpha, beta0)
realdata_covariates |
a list containing the following data matrices: CASEZ_1, CASEZ_2, CASEZhat_1, CASEZhat_2, CASEZhat_22, CONTZ_1, CONTZ_2, CONTZhat_1, CONTZhat_2, CONTZhat_22. For details, see the definitions in the help for realdata_covariates. Note that every matrix must use exactly the name listed above. |
realdata_alpha |
a list containing the following data matrices: prob_case_1, prob_case_11, prob_case_2, prob_case_22, prob_cont_1, prob_cont_2, pwt_cont_2. For details, see the definitions in the help for realdata_alpha. Note that every matrix must use exactly the name listed above. |
beta0 |
An initial parameter value for the solver "nleqslv". The default value is beta0=c(-5.4163,0.7790,-0.1289,0.2773,-0.5510,0.1568,0.4353,-0.6895). |
The function solves the GMM combined estimating equations while handling selection bias; see Huang et al. (2014).
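As a rough illustration of what "GMM combined" means, the sketch below combines two moment conditions for a single parameter by minimising a weighted quadratic form. It is a toy problem with simulated data, not the estimating equations of Huang et al. (2014); the names `g`, `W` and `Q` are hypothetical and exist only in this example.

```r
set.seed(3)
mu_true <- 1.5
x1 <- rnorm(300, mean = mu_true, sd = 1)   # precise source
x2 <- rnorm(100, mean = mu_true, sd = 3)   # noisy source

# Two moment conditions that share the single parameter mu.
g <- function(mu) c(mean(x1) - mu, mean(x2) - mu)

# Weight matrix: inverse of the estimated variance of each moment.
W <- diag(c(length(x1) / var(x1), length(x2) / var(x2)))

# GMM objective: quadratic form in the stacked moments.
Q <- function(mu) {
  gm <- g(mu)
  drop(t(gm) %*% W %*% gm)
}

mu_gmm <- optimize(Q, interval = range(c(x1, x2)), tol = 1e-8)$minimum

# For this toy problem the minimiser has a closed form:
# the inverse-variance weighted average of the two sample means.
w <- diag(W)
mu_closed <- sum(w * c(mean(x1), mean(x2))) / sum(w)
```

The numerical minimiser and the closed form agree, which shows how the weighting lets the more precise source dominate the combined estimate.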
A list containing the estimator and its standard deviation.
Huang, H., Ma, X., Waagepetersen, R., Holford, T.R., Wang, R., Risch, H., Mueller, L. & Guan, Y. (2014). A New Estimation Approach for Combining Epidemiological Data From Multiple Sources, Journal of the American Statistical Association, 109:505, 11-23.
# You can use glm to get an estimate to use as the initial value of beta0.
# beta0=c(-5.4163,0.7790,-0.1289,0.2773,-0.5510,0.1568,0.4353,-0.6895)
# DA_AllEE5(realdata_covariates,realdata_alpha,beta0=beta0)
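The comment above suggests obtaining the starting value from glm. A minimal sketch of that idea follows, using simulated data with hypothetical covariates `z1` and `z2`; the real call would need a starting vector whose length matches the number of parameters (8 in the package's example), so the three-coefficient fit here is only illustrative.

```r
set.seed(4)
n  <- 500
z1 <- rnorm(n)
z2 <- rbinom(n, 1, 0.4)
# Simulated disease indicator from a logistic model (toy data).
d  <- rbinom(n, 1, plogis(-1 + 0.8 * z1 - 0.5 * z2))

# Ordinary logistic regression; its coefficients serve as a starting value.
fit   <- glm(d ~ z1 + z2, family = binomial())
beta0 <- unname(coef(fit))
beta0
```

A starting value from a simple fit like this is usually closer to the root than an arbitrary vector, which tends to help an iterative solver converge.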
This is the internal function that evaluates the estimating equations constructed from the pairs (N1,M1), (N1,M2), (N2,M1) and (N2,M2), with the selection bias probability included. It is an internal helper for DA_AllEE5, so users do not normally need to call it directly.
DA_AllEE5_inside(beta, CASEZ_1, CASEZ_2, CASEZhat_1, CASEZhat_2, CASEZhat_22, CONTZ_1, CONTZ_2, CONTZhat_1, CONTZhat_2, CONTZhat_22, prob_case_1, prob_case_11, prob_case_2, prob_case_22, prob_cont_1, prob_cont_2, p, pi_case_1, pi_case_1_t, pi_case_2, pi_case_2_t, pi_cont_1, pi_cont_1_t, pi_cont_2, Z_case_pi_1, Z_case_pi_1_t, Z_case_pi_2, Z_case_pi_2_t, Z_cont_pi_1, Z_cont_pi_1_t, Z_cont_pi_2, J_step3, V_step3, pwt_cont_2, subset_2, subset_3, subset_4)
beta |
Parameter |
CASEZ_1 , CASEZhat_1
|
case data (N1) from the case-control study; for details, see the definitions in the help for realdata_covariates. |
CASEZ_2 , CASEZhat_2 , CASEZhat_22
|
CTR data (N2); for details, see the definitions in the help for realdata_covariates. |
CONTZ_1 , CONTZhat_1
|
control data (M1) from the case-control study; for details, see the definitions in the help for realdata_covariates. |
CONTZ_2 , CONTZhat_2 , CONTZhat_22
|
BRFSS data (M2); for details, see the definitions in the help for realdata_covariates. |
prob_cont_1 , prob_cont_2 , prob_case_1 , prob_case_11 , prob_case_2 , prob_case_22 , pwt_cont_2
|
see the definitions in the help for realdata_alpha. |
p |
Number of parameters, a constant value of 8. |
pi_case_1 , pi_case_1_t , pi_case_2 , pi_case_2_t , pi_cont_1 , pi_cont_1_t , pi_cont_2
|
selection bias |
Z_case_pi_1 , Z_case_pi_1_t , Z_case_pi_2 , Z_case_pi_2_t , Z_cont_pi_1 , Z_cont_pi_1_t , Z_cont_pi_2
|
subsets of the covariates, used for the estimation of the variance. |
J_step3 |
Derivative of the estimating equation. |
V_step3 |
Variance of the estimating equation. |
subset_2 |
A vector of 1:(p-2). |
subset_3 |
A vector of 1:p. |
subset_4 |
A vector of 1:(p-2). |
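The index arguments subset_2, subset_3 and subset_4 are plain integer vectors. The sketch below shows how such vectors can be built and used to pick columns; the matrix `Z` is a hypothetical stand-in, not one of the package's data sets. It also shows why the parentheses in 1:(p-2) matter in R.

```r
p <- 8
subset_2 <- 1:(p - 2)   # 1, 2, ..., 6
subset_3 <- 1:p         # 1, 2, ..., 8

# Operator precedence: ":" binds tighter than "-", so 1:p - 2 is (1:p) - 2.
wrong <- 1:p - 2        # -1, 0, ..., 6

# Hypothetical covariate matrix with p columns.
Z <- matrix(seq_len(3 * p), nrow = 3, ncol = p)

# drop = FALSE keeps the result a matrix even if one column is selected.
Z_sub <- Z[, subset_2, drop = FALSE]
dim(Z_sub)
```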
The function evaluates the GMM combined estimating equations while handling selection bias. It also accounts for the uncertainty due to the estimated value of eta. The function outputs the estimating equation at the current input value beta, so it can be passed to "nleqslv" to solve for beta. Because the function also outputs J and V, the asymptotic variance of the resulting estimator of beta can be calculated in a straightforward way. A covariate that may be highly correlated with Z_d is removed in the estimation, so care is needed in constructing f, J and V.
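To make the role of J and V concrete, the sketch below computes a sandwich variance of the form J^{-1} V J^{-T} / n for a toy logistic score equation. This is a generic illustration under simulated data; `fJV` is a hypothetical function, and the package's own f, J and V (which adjust for eta and selection bias) are not reproduced here.

```r
set.seed(1)
n <- 2000
x <- cbind(1, rnorm(n), rbinom(n, 1, 0.5))
beta_true <- c(-1, 0.5, -0.3)
y <- rbinom(n, 1, plogis(drop(x %*% beta_true)))

# The root of the logistic score equation is the glm estimate.
fit <- glm(y ~ x - 1, family = binomial())
beta_hat <- unname(coef(fit))

# Toy analogue of an "inside" function: returns f, J and V at beta.
fJV <- function(beta) {
  p_i <- plogis(drop(x %*% beta))
  U <- x * (y - p_i)                      # per-observation contributions
  list(f = colMeans(U),                   # estimating equation
       J = -crossprod(x * (p_i * (1 - p_i)), x) / n,  # derivative of f
       V = crossprod(U) / n)              # variance of the contributions
}

out <- fJV(beta_hat)
J_inv <- solve(out$J)
avar <- J_inv %*% out$V %*% t(J_inv) / n  # sandwich variance
se_sandwich <- sqrt(diag(avar))
```

In a correctly specified model such as this one, the sandwich standard errors should be close to the model-based ones from `vcov(fit)`.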
A list of (f,J,V)
f The final form of the estimating equation after adjusting eta.
J_step3 The derivative of the estimating equation.
V_step3 The variance of the estimating equation.
The main function solves the estimating equations constructed by combining the pairs (N2,M1) and (N2,M2). Since there is only one case data set, no selection bias adjustment is needed.
DA_FDN2M1M2(realdata_covariates, realdata_alpha, subset_2, subset_4, p, beta0)
realdata_covariates |
a list containing the following data matrices: CASEZ_2, CASEZhat_2, CASEZhat_22, CONTZ_1, CONTZhat_1, CONTZhat_2, CONTZhat_22 |
realdata_alpha |
a list containing the following data matrices: prob_case_22, prob_cont_1, prob_cont_2, pwt_cont_2 |
subset_2 |
A vector of 1:(p-2), which is the subset of |
subset_4 |
A vector of 1:(p-2), which is the subset of |
p |
number of parameters. |
beta0 |
an initial parameter for solver "nleqslv". |
The function solves the estimating equations based on (N2,M1) and (N2,M2); see Huang et al. (2014).
A list containing the estimator and its standard deviation.
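Given an estimator and its standard deviation, Wald-type confidence intervals and p-values follow directly. The sketch below assumes a result list with elements named `est` and `sd`; those names are hypothetical (the help text does not state the element names), and the list is built from a simulated glm fit rather than from the package.

```r
set.seed(5)
z <- rnorm(400)
d <- rbinom(400, 1, plogis(-0.5 + 0.7 * z))
fit <- glm(d ~ z, family = binomial())

# Stand-in for a returned list of estimator and standard deviation.
res <- list(est = unname(coef(fit)),
            sd  = unname(sqrt(diag(vcov(fit)))))

z_stat <- res$est / res$sd
ci <- cbind(lower = res$est - qnorm(0.975) * res$sd,
            upper = res$est + qnorm(0.975) * res$sd)
p_val <- 2 * pnorm(-abs(z_stat))

# Odds ratios with 95% intervals, on the exponentiated scale.
or <- exp(cbind(estimate = res$est, ci))
```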
Huang, H., Ma, X., Waagepetersen, R., Holford, T.R., Wang, R., Risch, H., Mueller, L. & Guan, Y. (2014). A New Estimation Approach for Combining Epidemiological Data From Multiple Sources, Journal of the American Statistical Association, 109:505, 11-23.
# p <- 8
# subset_2 <- 1:p
# subset_4 <- 1:p
# beta0=c(-5.4163,0.7790,-0.1289,0.2773,-0.5510,0.1568,0.4353,-0.6895)
# DA_FDN2M1M2(realdata_covariates,realdata_alpha,subset_2,subset_4,p=p,beta0=beta0)
The internal function that evaluates the estimating equations constructed by combining the pairs (N2,M1) and (N2,M2). Since there is only one case data set, no selection bias adjustment is needed. It is an internal helper for DA_FDN2M1M2, so users do not normally need to call it directly.
DA_FDN2M1M2_inside(beta, CASEZ_2, CASEZhat_2, CASEZhat_22, CONTZ_1, CONTZhat_1, CONTZhat_2, CONTZhat_22, prob_case_2, prob_case_22, prob_cont_1, prob_cont_2, p, J, V, subset_2, subset_4, pwt_cont_2)
beta |
Parameter |
CASEZ_2 , CASEZhat_2 , CASEZhat_22
|
CTR data (N2); for details, see the definitions in the help for realdata_covariates. |
CONTZ_1 , CONTZhat_1
|
control data (M1) from the case-control study; for details, see the definitions in the help for realdata_covariates. |
CONTZhat_2 , CONTZhat_22
|
BRFSS data (M2); for details, see the definitions in the help for realdata_covariates. |
prob_cont_1 , prob_cont_2 , prob_case_2 , prob_case_22 , pwt_cont_2
|
see the definitions in the help for realdata_alpha. |
p |
Number of parameters, a constant value of 8. |
J |
The derivative of the estimating equation. |
V |
The variance of the estimating equation. |
subset_2 |
A vector of 1:(p-2). |
subset_4 |
A vector of 1:(p-2). |
The function evaluates the estimating equations based on (N2,M1) and (N2,M2) while handling selection bias. It also accounts for the uncertainty due to the estimated value of eta. The function outputs the estimating equation at the current input value beta, so it can be passed to "nleqslv" to solve for beta. Because the function also outputs J and V, the asymptotic variance of the resulting estimator of beta can be calculated in a straightforward way. A covariate that may be highly correlated with Z_d is removed in the estimation.
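Because the function returns a list rather than a bare vector, a small wrapper that extracts only f is needed before handing it to a root finder. The sketch below shows that pattern with a hypothetical `toy_inside` function on simulated data. It calls `nleqslv::nleqslv` when that package is installed and otherwise falls back to plain Newton steps using J; the fallback is an assumption of this example, not part of the package.

```r
# Toy stand-in for an "inside" function: returns list(f, J, V) at beta.
toy_inside <- function(beta, x, y) {
  r <- y - drop(x %*% beta)
  list(f = drop(crossprod(x, r)) / nrow(x),  # estimating equation
       J = -crossprod(x) / nrow(x),          # derivative of f
       V = crossprod(x * r) / nrow(x))       # variance of contributions
}

set.seed(2)
x <- cbind(1, rnorm(200))
y <- drop(x %*% c(1, 2)) + rnorm(200)

# Wrapper returning only the vector the solver needs.
f_only <- function(beta) toy_inside(beta, x, y)$f
beta0 <- c(0, 0)

if (requireNamespace("nleqslv", quietly = TRUE)) {
  beta_hat <- nleqslv::nleqslv(beta0, f_only)$x
} else {
  # Fallback: Newton iterations beta <- beta - J^{-1} f.
  beta_hat <- beta0
  for (i in 1:25) {
    out <- toy_inside(beta_hat, x, y)
    beta_hat <- beta_hat - drop(solve(out$J, out$f))
  }
}
```

For this linear toy problem the root coincides with the least-squares fit, which gives a simple way to check the wrapper.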
A list of (f,J,V)
f The final form of the estimating equation after adjusting eta.
J The derivative of the estimating equation.
V The variance of the estimating equation.
The main function solves the estimating equations constructed from the pair (N2,M2). Since there is only one case data set, no selection bias adjustment is needed.
DA_FDN2M2(realdata_covariates, realdata_alpha, p, beta0)
realdata_covariates |
a list containing the following data matrices: CASEZhat_2, CASEZhat_22, CONTZhat_2, CONTZhat_22 |
realdata_alpha |
a list containing the following data matrices: prob_case_22, prob_cont_2, pwt_cont_2 |
p |
number of parameters. |
beta0 |
an initial parameter for solver "nleqslv". |
The function solves the estimating equations based on (N2,M2); see Huang et al. (2014).
A list containing the estimator and its standard deviation.
Huang, H., Ma, X., Waagepetersen, R., Holford, T.R., Wang, R., Risch, H., Mueller, L. & Guan, Y. (2014). A New Estimation Approach for Combining Epidemiological Data From Multiple Sources, Journal of the American Statistical Association, 109:505, 11-23.
# p <- 8
# beta0=c(-5.4163,0.7790,-0.1289,0.2773,-0.5510,0.1568,0.4353,-0.6895)
# DA_FDN2M2(realdata_covariates,realdata_alpha,p=p,beta0=beta0)
A list of matrices containing value of alpha at each location.
realdata_alpha
An object of class list of length 8.
A list of 8 matrices of calculated value of alpha for case and control points.
counts_agebysex_state |
Age-by-sex stratification in Connecticut, which is a matrix with 18 rows and 2 variables: Male, Female. In this dataset the age-by-sex distribution is based on the Census for the following ten age groups: 35-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, and 81-83. |
prob_case_1 |
Value of alpha for cases in the case-control study, matched to the controls' age-by-sex proportion. |
prob_case_11 |
Value of alpha for cases in the case-control study, matched to the BRFSS age-by-sex proportion. |
prob_case_2 |
Value of alpha for cases in CTR, which is a matrix with 1929 rows and 1 variable. This alpha in CTR is matched to the controls' age-by-sex proportion in the case-control study. |
prob_case_22 |
Value of alpha for cases in CTR, which is a matrix with 1929 rows and 1 variable. This alpha in CTR is matched to the BRFSS age-by-sex proportion. |
prob_cont_1 |
Value of alpha for controls in the case-control study, which is a matrix with 690 rows and 1 variable. It is the controls' own alpha. |
prob_cont_2 |
Value of alpha for controls in the BRFSS data, which is a matrix with 4459 rows and 1 variable. It is the BRFSS controls' own alpha. |
pwt_cont_2 |
Value of the weights (sampling probability) for controls in the BRFSS data, which appears in equation (14) of Huang et al. (2014); a matrix with 4459 rows and 1 variable. |
# For example of each matrix, type the command in R:
attributes(realdata_alpha)
# to obtain names of 8 matrices in the list:
# "counts_agebysex_state", "prob_case_1", "prob_case_11", "prob_case_2",
# "prob_case_22", "prob_cont_1", "prob_cont_2", "pwt_cont_2".
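Beyond attributes(), base R's names() and dim() give a quick overview of a list of matrices. The sketch below uses a mock list, `mock_alpha`, filled with random numbers so that it runs without the package; only the element names and row counts are taken from the descriptions above, and the values themselves are placeholders.

```r
set.seed(7)
# Mock stand-in with the documented shapes; values are random placeholders.
mock_alpha <- list(
  prob_case_2 = matrix(runif(1929), ncol = 1),
  prob_cont_1 = matrix(runif(690),  ncol = 1),
  pwt_cont_2  = matrix(runif(4459), ncol = 1)
)

names(mock_alpha)                  # element names
dims <- t(sapply(mock_alpha, dim)) # one row per matrix: rows, columns
colnames(dims) <- c("rows", "cols")
dims

head(mock_alpha$prob_cont_1, 3)    # first rows of one matrix
```

With the package loaded, the same calls applied to realdata_alpha should show all 8 matrices.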
The list includes 10 matrices of covariates for cases and controls from different sources. For some of them the missing data must be imputed; for others the variables are estimated even when they are not missing, so that all inputs have a consistent format.
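One common way to produce both an observed matrix and an "estimated" counterpart is regression imputation: fit a model where the covariate is observed and predict it everywhere. The sketch below illustrates that idea only; it is an assumption that the package's Zhat matrices are built along these lines, and the covariate names `age`, `sex` and `lifestyle` are hypothetical.

```r
set.seed(6)
# Hypothetical covariates: "age" and "sex" are observed in both sources,
# "lifestyle" is observed only in the complete source.
complete <- data.frame(age = rnorm(200, 60, 8), sex = rbinom(200, 1, 0.5))
complete$lifestyle <- 0.02 * complete$age + 0.5 * complete$sex + rnorm(200)
incomplete <- data.frame(age = rnorm(100, 60, 8), sex = rbinom(100, 1, 0.5))

# Imputation model fitted where the covariate is observed.
imp_fit <- lm(lifestyle ~ age + sex, data = complete)

# Observed matrix for the complete source.
Z_complete <- as.matrix(complete)

# Predicted versions for BOTH sources, so the two share one format.
Zhat_complete <- cbind(as.matrix(complete[, c("age", "sex")]),
                       lifestyle = predict(imp_fit, newdata = complete))
Zhat_incomplete <- cbind(as.matrix(incomplete),
                         lifestyle = predict(imp_fit, newdata = incomplete))
```

Predicting the covariate in the complete source as well, even though it is observed there, is what keeps the two inputs comparable.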
realdata_covariates
A list of 10 matrices
In the first case data set, which has complete cases:
CASEZ_1=
CASEZhat_1=
In the second case data set (CTR), which has missing lifestyle covariates:
CASEZ_2=
CASEZhat_2=
CASEZhat_22=
In the first control data set, which has complete controls:
CONTZ_1=
CONTZhat_1=
In the second control data set (BRFSS), which has missing traffic covariates:
CONTZ_2=
CONTZhat_2=
CONTZhat_22=
# For example of each matrix, type the command in R:
attributes(realdata_covariates)
# to obtain names of 10 built-in matrices in the list:
# "CASEZ_1", "CASEZhat_1", "CASEZ_2", "CASEZhat_2", "CASEZhat_22", "CONTZ_1",
# "CONTZhat_1", "CONTZ_2", "CONTZhat_2", "CONTZhat_22".