Title: | Efficient Estimation of Clustered Current Status Data |
---|---|
Description: | Current status data abound in epidemiology and public health, where the only observable data for a subject are the random inspection time and the event status at inspection. Motivated by current status data from a periodontal study in which the data are inherently clustered, we propose a unified methodology to analyze such complex data. |
Authors: | Tong Wang [aut, cre], Kejun He [aut], Wei Ma [aut], Dipankar Bandyopadhyay [aut], Samiran Sinha [aut] |
Maintainer: | Tong Wang <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-12-02 06:30:53 UTC |
Source: | CRAN |
CSDfit is used to analyze clustered current status data. The function provides parameter estimates, the maximum log likelihood value and the corresponding AIC value.
CSDfit(Rawdata, n_subject.raw, n_within.raw, r, n_quad = 30, lambda = 0, tolerance = 0.5, knots.num = 2, degree = 2, scale.numr = TRUE, clustering=TRUE)
Rawdata |
A data frame of current status data. The first column is the index of the subject (cluster). The second column is the inspection time. The next n_subject.raw columns are the subject-level (cluster-specific) covariates, followed by n_within.raw columns of within-subject covariates. The last column is the event indicator, where 1 means the event has happened by the inspection time and 0 means it has not. All covariates are assumed to be either numerical or binary, and the program automatically detects which type each covariate is. |
n_subject.raw |
The number of subject-level (cluster-specific) covariates. |
n_within.raw |
The number of within cluster covariates. |
r |
The index of the generalized odds ratio (GOR) model, a non-negative number that the user must specify. Setting r = 0 gives the proportional hazards model and r = 1 gives the proportional odds model. |
n_quad |
The number of Gauss-Hermite quadrature nodes used in numerical integration. The default value is 30. |
lambda |
The tuning parameter of the roughness penalty used for estimating the non-parametric component of the GOR model. The default value is 0. The roughness penalty must be used when the number of basis functions in the non-parametric component of the GOR model is large. |
tolerance |
The sum, over all parameters in the model, of the absolute values of the relative tolerances. It is used to define convergence of the parameter estimates. The default value is 0.5. |
knots.num |
The number of equidistant interior knots for the integrated B-spline approximation of the nonparametric component of the GOR model. The default value is 2. |
degree |
The degree of integrated B-splines. The default value is 2. |
scale.numr |
logical. If TRUE, all numeric covariates (cluster-specific and within-cluster) are scaled to have mean zero and standard deviation one. The default value is TRUE. |
clustering |
logical. If TRUE, a clustering effect is assumed; if FALSE, no clustering effect is assumed. The default value is TRUE. |
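The role of the index r can be illustrated with a minimal sketch of the standard generalized odds-rate survival function, written in base R. This is not the package's internal code: the function name gor_survival and the arguments cum_haz and lin_pred are hypothetical, and the fitted model may parametrize things differently (for example, by including a cluster-level random effect).

```r
# Sketch only: standard GOR-type survival function, not the CSDfit internals.
# Assumptions: cum_haz is a baseline function evaluated at time t and
# lin_pred is a linear predictor; both names are hypothetical.
gor_survival <- function(cum_haz, lin_pred, r) {
  stopifnot(r >= 0)
  u <- cum_haz * exp(lin_pred)
  if (r == 0) {
    exp(-u)                # r = 0: proportional hazards form
  } else {
    (1 + r * u)^(-1 / r)   # r = 1: proportional odds form
  }
}

# Illustrative values, not taken from any real data set
cum_haz <- c(0.1, 0.5, 1, 2)
s_ph <- gor_survival(cum_haz, lin_pred = 0.3, r = 0)
s_po <- gor_survival(cum_haz, lin_pred = 0.3, r = 1)

# Under r = 1 the odds of the event equal cum_haz * exp(lin_pred)
odds_po <- (1 - s_po) / s_po
```

With r = 1 the event odds scale by exp(lin_pred), which is why that choice corresponds to proportional odds; as r approaches 0 the second branch tends to the first.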
Function CSDfit returns a list containing the following components:
parameter.est |
A matrix. The column "par.est" contains the estimates of the regression parameters. The column "SE" contains the standard errors of these estimates. The columns "Z" and "p-value" are the corresponding Z-statistics and p-values. The last two columns are the lower and upper bounds of the 95% Wald confidence intervals for the parameters. |
log_likelihood |
The maximum log likelihood value. This includes the logarithm of the roughness penalty and the Cauchy penalty if Cauchy.pen=TRUE. |
AICvalue |
The AIC value. |
coefs |
The estimates of the coefficients of the integrated B-spline basis functions. |
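As a reading aid for the parameter.est matrix, the following base R sketch shows how Wald-type Z-statistics, two-sided p-values and 95% confidence bounds are conventionally derived from an estimate and its standard error. The numbers are made up for illustration, the covariate names x1 and x2 and the column names "lower" and "upper" are assumptions, and the package may compute these quantities differently.

```r
# Hypothetical estimates and standard errors, for illustration only
est <- c(x1 = 0.50, x2 = -0.20)
se  <- c(x1 = 0.10, x2 = 0.25)

z     <- est / se               # Wald Z-statistic
p_val <- 2 * pnorm(-abs(z))     # two-sided p-value from the standard normal
crit  <- qnorm(0.975)           # critical value for a 95% interval

# Column names "lower" and "upper" are assumptions, not the package's names
wald_table <- cbind(par.est = est, SE = se, Z = z, "p-value" = p_val,
                    lower = est - crit * se, upper = est + crit * se)
wald_table
```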
data(PD)
PDfit <- CSDfit(PD, 3, 1, 0, n_quad = 5)
A clustered current status dataset from a periodontal disease (PD) study in which tooth-level data are clustered within subjects. The first and second columns are the patient index and the inspection time for each tooth, respectively. The next three columns are the subject-level covariates (gender, smoking and Hba1c). These are followed by the tooth-level covariate (jaw). The last column is the event indicator.
data(PD)
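The column layout described above can be mimicked with simulated data, which may help when preparing another dataset in the same shape. The sketch below uses base R only; the values are random, the column names are hypothetical, and nothing here reproduces the real PD data.

```r
set.seed(1)
n_subjects <- 4          # number of clusters (patients)
teeth_per_subject <- 3   # observations per cluster
n <- n_subjects * teeth_per_subject

# Subject-level covariates: one value per patient (simulated)
subject <- data.frame(
  id      = seq_len(n_subjects),
  gender  = rbinom(n_subjects, 1, 0.5),
  smoking = rbinom(n_subjects, 1, 0.3),
  hba1c   = rnorm(n_subjects, mean = 7, sd = 1)
)

# One row per tooth: id, inspection time, 3 subject-level covariates,
# 1 tooth-level covariate, then the 0/1 event indicator
toy <- data.frame(
  id      = rep(subject$id, each = teeth_per_subject),
  time    = runif(n, min = 1, max = 10),
  gender  = rep(subject$gender, each = teeth_per_subject),
  smoking = rep(subject$smoking, each = teeth_per_subject),
  hba1c   = rep(subject$hba1c, each = teeth_per_subject),
  jaw     = rbinom(n, 1, 0.5),
  event   = rbinom(n, 1, 0.4)
)

# Column count implied by the layout: 2 + n_subject.raw + n_within.raw + 1
n_subject.raw <- 3
n_within.raw  <- 1
stopifnot(ncol(toy) == 2 + n_subject.raw + n_within.raw + 1)
str(toy)
```

A data frame shaped like toy would then be passed with n_subject.raw = 3 and n_within.raw = 1, matching the counts used in the PD example call.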