Title: | Doubly Robust Generalized Estimating Equations |
---|---|
Description: | Fit restricted mean models for the conditional association between an exposure and an outcome, given covariates. Three methods are implemented: O-estimation, where a nuisance model for the association between the covariates and the outcome is used; E-estimation where a nuisance model for the association between the covariates and the exposure is used, and doubly robust (DR) estimation where both nuisance models are used. In DR-estimation, the estimates will be consistent when at least one of the nuisance models is correctly specified, not necessarily both. For more information, see Zetterqvist and Sjölander (2015) <doi:10.1515/em-2014-0021>. |
Authors: | Johan Zetterqvist <drjohanzetterqvist at gmail.com> , Arvid Sjölander <arvid.sjolander at ki.se> with contributions from Alexander Ploner. |
Maintainer: | Johan Zetterqvist <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.1.10 |
Built: | 2024-12-26 06:39:25 UTC |
Source: | CRAN |
The main function is drgee
, which estimates a parameter
in a model defined as
. By supplying (nuisance) models for
and
, a consistent estimate of
is
obtained when at least one of these models is correctly specified.
In the function drgee
, three estimation methods are
implemented:
O-estimation, where a model for is used;
E-estimation, where a model for
is used; and
DR-estimation where models for both
and
are used.
The function gee
is an implementation of standard GEE with
independent working correlation matrix.
For conditional methods with clustered data, cluster-specific intercepts are assumed in all models.
The function drgeeData
is used for extraction and manipulation
of data, and is called by drgee
and gee
.
The function RobVcov
is used to calculate standard errors of
the estimates, given a vector of the residuals from estimating
equations, the Jacobian, and a cluster-identifying variable.
The function findRoots
solves a system of non linear equations.
Johan Zetterqvist, Arvid Sjölander with contributions from Alexander Ploner.
Given a design matrix and a cluster identification variable. A new design matrix is returned, where each column is centered around its cluster mean. Using C++ code for speed.
centerX(x, id)
centerX(x, id)
x |
A design matrix |
id |
A cluster identifying variable. |
centerX
is written as a help function with the aim of being fast.
A matrix with cluster centered columns.
Johan Zetterqvist, Arvid Sjölander
drgee
is used to estimate an exposure-outcome effect adjusted
for additional covariates. The estimation is based on regression
models for the outcome, exposure or a combination of both.
For clustered data the models may
have cluster-specific intercepts.
drgee(outcome, exposure, oformula, eformula, iaformula = formula(~1), olink = c("identity", "log", "logit"), elink = c("identity", "log", "logit"), data, subset = NULL, estimation.method = c("dr", "o", "e"), cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots, intercept = TRUE, ...)
drgee(outcome, exposure, oformula, eformula, iaformula = formula(~1), olink = c("identity", "log", "logit"), elink = c("identity", "log", "logit"), data, subset = NULL, estimation.method = c("dr", "o", "e"), cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots, intercept = TRUE, ...)
outcome |
The outcome as variable or as a character string naming a variable
in the |
exposure |
The exposure as variable or as a character string naming a variable
in the |
oformula |
An expression or formula for the outcome nuisance model. |
eformula |
An expression or formula for the exposure nuisance model. |
iaformula |
An expression or formula where the RHS should contain the variables
that "interact" (i.e. are supposed to be multiplied with) with the
exposure in the main model. "1" will always added. Default value is no
interactions, i.e. |
olink |
A character string naming the link function in the outcome nuisance
model. Has to be |
elink |
A character string naming the link function in the exposure nuisance
model. Has to be |
data |
A data frame or environment containing the variables used. If missing, variables are expected to be found in the calling environment of the calling environment. |
subset |
An optional vector defining a subset of the data to be used. |
estimation.method |
A character string naming the desired estimation method. Choose
|
cond |
A logical value indicating whether the nuisance models should have
cluster-specific intercepts. Requires a |
rootFinder |
A function to solve a system of non-linear equations. Default
is |
clusterid |
A cluster-defining variable or a character string naming a
cluster-defining variable in the |
clusterid.vcov |
A cluster-defining variable or a character string naming a
cluster-defining variable in the |
intercept |
A boolean to choose whether the nuisance parameters in doubly robust conditional logistic regression should be fitted with a model with an intercept. Only used for doubly robust condtional logistic regression. |
... |
Further arguments to be passed to the function |
drgee
estimates the parameter in a main
model
,
where
is the outcome of interest,
is the exposure of
interest, and
is a vector of covariates that we wish to
adjust for.
is a vector valued function of
. Note that
should be interpreted as a columnwise
multiplication and that
will always contain a column of 1's.
Given a specification of an outcome nuisance model
(where
is a function of
)
O-estimation is performed. Alternatively, leaving
unspecified and using an exposure nuisance model
(where
is a link
function and
is a function of
) E-estimation
is performed. When
is logit, the exposure nuisance
model is required be of the form
.
In this case the exposure needs to binary.
Given both an outcome and an exposure nuisance model, DR-estimation can be
performed. DR-estimation gives a consistent estimate of the parameter
when either the outcome nuisance model or
the exposure nuisance model
is correctly specified, not necessarily both.
Usage is best explained through an example. Suppose that we are
interested in the parameter vector in a main model
where
and
are the covariates that we wish
to adjust for. To adjust for
and
, we can use an outcome
nuisance model
or an
exposure nuisance model
to calculate estimates of
and
in the main model. We specify the outcome nuisance model as
oformula=Y~L_1
and olink = "logit"
. The exposure nuisance model is specified as
eformula = A~L_1+L_2
and elink = "logit"
.
Since the outcome and the exposure
are
identified as the LHS of
oformula
and eformla
respectively and since the outcome link is specified in the
olink
argument,
the only thing left to specify for the main model is the
(multiplicative) interactions . This
is done by specifying
as
iaformula = ~L_1
, since is always included in
.
We can then perform O-estimation, E-estimation or DR-estimation by
setting
estimation.method
to "o"
,
"e"
or "dr"
respectively. O-estimation uses only the
outcome nuisance model, and E-estimation uses only the exposure
nuisance model. DR-estimation uses both nuisance models, and gives a
consistent estimate of if either nuisance model is correct, not necessarily both.
When estimation.method = "o"
, the RHS of eformula
will be
ignored. The eformula
argument can also be replaced by an exposure
argument specifying what the exposure of interest is.
When estimation.method = "e"
, the RHS of oformula
will be
ignored. The oformula
argument can also be replaced by an outcome
argument specifying what the outcome of interest is.
When cond = TRUE
the nuisance models will be assumed to have
cluster-specific intercept. These intercepts will not estimated.
When E-estimation or DR-estimation is chosen with
olink = "logit"
, the exposure link will be
changed to "logit"
. Note that this choice
of outcome link does not work for DR-estimation
when cond = TRUE
.
Robust variance for the estimated parameter is calculated
using the function robVcov
. A cluster robust variance is calculated when
a character string naming a cluster variable is
supplied in the clusterid
argument.
For E-estimation when cond = FALSE
and is the identity
or log link, see Robins et al. (1992).
For DR-estimation when cond = TRUE
and is the identity
or log link, see Robins (1999). For DR-estimation when
is the logit link, see Tchetgen et al. (2010).
O-estimation can also be performed using the gee
function.
drgee
returns an object of class drgee
containing:
coefficients |
Estimates of the parameters in the main model. |
vcov |
Robust variance for all main model parameters. |
coefficients.all |
Estimates of all estimated parameters. |
vcov.all |
Robust variance of the all parameter estimates. |
optim.object |
An estimation object returned from the function specified
in the |
optim.object.o |
An estimation object returned from the function specified
in the |
optim.object.e |
An estimation object returned from the function specified
in the |
call |
The matched call. |
estimation.method |
The value of the input argument |
data |
The original data object, if given as an input argument |
oformula |
The original oformula object, if given as an input argument |
eformula |
The original eformula object, if given as an input argument |
iaformula |
The original iaformula object, if given as an input argument |
The class methods coef
and vcov
can be used to extract
the estimated parameters and their covariance matrix from a
drgee
object. summary.drgee
produces a summary of the
calculations.
Johan Zetterqvist, Arvid Sjölander
Orsini N., Belocco R., Sjölander A. (2013), Doubly Robust Estimation in Generalized Linear Models, Stata Journal, 13, 1, pp. 185–205
Robins J.M., Mark S.D., Newey W.K. (1992), Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders, Biometrics, 48, pp. 479–495
Robins JM (1999), Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models, Proceedings of the American Statistical Association Section on Bayesian Statistical Science, pp. 6–10
Tchetgen E.J.T., Robins J.M., Rotnitzky A. (2010), On Doubly Robust Estimation in a Semiparametric Odds Ratio Model, Biometrika, 97, 1, 171–180
Zetterqvist J., Vansteelandt S., Pawitan Y., Sjölander (2016), Doubly Robust Methods for Handling Confounding by Cluster, Biostatistics, 17, 2, 264–276
gee
for O-estimation, findRoots
for
nonlinear equation solving and robVcov
for
estimation of variance.
## DR-estimation when ## the main model is ## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1 ## and the outcome nuisance model is ## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2 ## and the exposure nuisance model is ## E(A|Y=0,L1,L2)=expit(alpha0+alpha1*L1+alpha2*l2) library(drgee) expit<-function(x) exp(x)/(1+exp(x)) n<-5000 ## nuisance l1<-rnorm(n, mean = 0, sd = 1) l2<-rnorm(n, mean = 0, sd = 1) beta0<-1.5 beta1<-1 gamma0<--1 gamma1<--2 gamma2<-2 alpha0<-1 alpha1<-5 alpha2<-3 ## Exposure generated from the exposure nuisance model a<-rbinom(n,1,expit(alpha0 + alpha1*l1 + alpha2*l2)) ## Outcome generated from the main model and the ## outcome nuisance model y<-rnorm(n, mean = beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2, sd = 1) simdata<-data.frame(y,a,l1,l2) ## outcome nuisance model misspecified and ## exposure nuisance model correctly specified ## DR-estimation dr.est <- drgee(oformula = formula(y~l1), eformula = formula(a~l1+l2), iaformula = formula(~l1), olink = "identity", elink = "logit", data = simdata, estimation.method = "dr") summary(dr.est) ## O-estimation o.est <- drgee(exposure = "a", oformula = formula(y~l1), iaformula = formula(~l1), olink = "identity", data = simdata, estimation.method = "o") summary(o.est) ## E-estimation e.est <- drgee(outcome = "y", eformula = formula(a~l1+l2), iaformula = formula(~l1), elink="logit", data = simdata, estimation.method = "e") summary(e.est)
## DR-estimation when ## the main model is ## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1 ## and the outcome nuisance model is ## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2 ## and the exposure nuisance model is ## E(A|Y=0,L1,L2)=expit(alpha0+alpha1*L1+alpha2*l2) library(drgee) expit<-function(x) exp(x)/(1+exp(x)) n<-5000 ## nuisance l1<-rnorm(n, mean = 0, sd = 1) l2<-rnorm(n, mean = 0, sd = 1) beta0<-1.5 beta1<-1 gamma0<--1 gamma1<--2 gamma2<-2 alpha0<-1 alpha1<-5 alpha2<-3 ## Exposure generated from the exposure nuisance model a<-rbinom(n,1,expit(alpha0 + alpha1*l1 + alpha2*l2)) ## Outcome generated from the main model and the ## outcome nuisance model y<-rnorm(n, mean = beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2, sd = 1) simdata<-data.frame(y,a,l1,l2) ## outcome nuisance model misspecified and ## exposure nuisance model correctly specified ## DR-estimation dr.est <- drgee(oformula = formula(y~l1), eformula = formula(a~l1+l2), iaformula = formula(~l1), olink = "identity", elink = "logit", data = simdata, estimation.method = "dr") summary(dr.est) ## O-estimation o.est <- drgee(exposure = "a", oformula = formula(y~l1), iaformula = formula(~l1), olink = "identity", data = simdata, estimation.method = "o") summary(o.est) ## E-estimation e.est <- drgee(outcome = "y", eformula = formula(a~l1+l2), iaformula = formula(~l1), elink="logit", data = simdata, estimation.method = "e") summary(e.est)
Given a main model, an outcome nuisance model and an
exposure nuisance model drgeeData
extracts the model
variables and matrices from a data.frame
or an environment
object. It also performs some data cleaning and error checking.
drgeeData(outcome, exposure, oformula, eformula, iaformula = formula(~1), olink = c("identity", "log", "logit"), elink = c("identity", "log", "logit"), data, subset = NULL, estimation.method = c("dr", "o", "e"), cond = FALSE, clusterid, clusterid.vcov)
drgeeData(outcome, exposure, oformula, eformula, iaformula = formula(~1), olink = c("identity", "log", "logit"), elink = c("identity", "log", "logit"), data, subset = NULL, estimation.method = c("dr", "o", "e"), cond = FALSE, clusterid, clusterid.vcov)
outcome |
The outcome as a variable or as a character string naming a variable in the
|
exposure |
The exposure as a variable or as a character string naming a variable in the
|
oformula |
An expression or formula for the outcome nuisance model. The outcome is identified as the response in this formula. |
eformula |
An expression or formula for the exposure nuisance model. The exposure is identified as the response in this formula. |
iaformula |
An expression or formula where the RHS should contain the variables
that "interact" (i.e. are supposed to be multiplied with) with the
exposure in the main model to create the terms associated with the
parameters of interest. "1" will always added. Default value is no
interactions, i.e. |
olink |
A character string naming the link function in the outcome nuisance
model. Have to be |
elink |
A character string naming the link function in the exposure nuisance
model. Have to be |
data |
A data frame or environment containing the variables in |
subset |
An optional vector defining a subset of the data to be used. |
estimation.method |
A character string naming the desired estimation method. Choose
|
cond |
A logical value indicating whether the nuisance models should have
cluster-specific intercepts. If |
clusterid |
A cluster-defining variable or a character string naming a cluster-defining variable in the
|
clusterid.vcov |
A cluster-defining variable or a character string naming a
cluster-defining variable in the |
drgeeData
is called by drgee
and gee
to extract
data from a data.frame
or environment
object. The data can then be used to for O-estimation, E-estimation or
DR-estimation. drgeeData
uses
model.frame
and model.matrix
to remove incomplete
observations and to convert factors to dummy variables. It also
performs check the supplied data for errors or inconsistencies.
The class method summary.drgeeData
produces strings for the
formulas with terms referring to the columns in the produced design
matrices.
drgee.data
returns an object of class drgeeData
containing
used.rows |
The rows numbers in the original data for the used rows (after subset selection and exlusions). |
orig.order |
The original order of the observations. |
y |
The outcome matrix. |
a |
The exposure matrix. |
x |
The matrix of of interactions defined in |
ax |
The matrix of elementwise product(s) of |
v |
The matrix of terms in the outcome nuisance model. |
z |
The matrix of terms in the exposure nuisance model. |
yx |
The matrix of elementwise product(s) of |
id |
A factor defining clusters. For independent observations, the number of levels equals the number of complete observations. |
clustname |
A string for the name of the cluster defining variable. |
y.names |
A string for the name of the outcome. |
a.names |
A string for the name of the exposure. |
x.names |
A string vector for the variable names in |
ax.names |
A string vector for the variable names in |
v.names |
A string vector for the variable names in |
z.names |
A string vector for the variable names in |
yx.names |
A string vector for the variable names in |
olink |
A character string naming the link function in the outcome nuisance model. |
elink |
A character string naming the link function in the outcome nuisance model. |
cond |
A logical value indicating whether cluster-specific intercepts should
be assumed. If |
oterms |
The |
eterms |
The |
Johan Zetterqvist, Arvid Sjölander
drgee
, gee
, model.frame
and model.matrix
.
A wrapper around nleqslv
from the nleqslv package
to solve a non linear system of equations.
findRoots(beta.init, eq.func, d.eq.func = NULL, arg.list, ...)
findRoots(beta.init, eq.func, d.eq.func = NULL, arg.list, ...)
beta.init |
An initial guess for the zero. |
eq.func |
A function of two variables for which the zero are sought. Its first
argument |
d.eq.func |
A function to return the Jacobian of |
arg.list |
The second argument to |
... |
A list of additional arguments to be passed to |
findRoots
calculates zeros fo the function eq.func
and is the default equation solving function in drgee
.
It is supplied as a separate function in order to allow users
to use other equation solvers by writing their own wrapper
with the same interface as findRoots
.
The value is a list containing the following arguments:
roots |
The zero(s) of the function |
optim.object |
The optimization object returned from |
Johan Zetterqvist, Arvid Sjölander
nleqslv
in package nleqslv.
gee
performs estimation of parameters in a restricted mean
model using standard GEEs with independent working correlation matrix. For
clustered data, cluster-robust standard errors are calculated. When
cond=TRUE
, cluster-specific
intercepts are assumed.
gee(formula, link = c("identity", "log", "logit"), data, subset, cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots, ...)
gee(formula, link = c("identity", "log", "logit"), data, subset, cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots, ...)
formula |
An expression or formula representing the expected outcome conditional on covariates. |
link |
A character string naming the link function to use. Has to be
|
data |
A data frame or environment containing the variables appearing in
|
subset |
An optional vector defining a subset of the data to be used. |
cond |
A logical value indicating whether cluster-specific intercepts should
be assumed. Requires a |
clusterid |
A cluster-defining variable or a character string naming a
cluster-defining variable in the |
clusterid.vcov |
A cluster-defining variable or a character string naming a
cluster-defining variable in the |
rootFinder |
A function to solve a system of non linear equations. Default
is |
... |
Further arguments to be passed to the function |
Estimates parameters in a regression model, defined by
formula
. When cond=FALSE
, the estimated coefficients are
identical to those obtained with glm
, but since no
distributional assumptions are made, a robust variance is
calculated. When cond=TRUE
and link
is "identity"
or "log"
, the coefficients will be calculated
using conditional estimating equations as described in Goetgeluk and
Vansteelandt (2008). When cond=TRUE
and link="logit"
,
the coefficients will be calculated by conditional logistic regression
(with robust standard errors).
gee
return an object of class gee
containing:
coefficients |
Estimates of the parameters. |
vcov |
Robust variance of the estimates. |
call |
The matched call. |
y |
The outcome vector. |
x |
The design matrix. For conditional methods there is no column corresponding to the intercept. |
optim.object |
An estimation object returned from the function specified
in the |
res |
The residuals from the estimating equations. |
d.res |
The derivative of the residuals from the estimating equations. |
data |
The original data object, if given as an input argument |
formula |
The original formula object, if given as an input argument |
The class methods coef
and vcov
can be used to extract
the estimated parameters and their covariance matrix from a
gee
object. To obtain the 'naive' variance, i.e. the variance
obtained from maximum likelihood estimation assuming correct
parameteric model and no clustering, use the class method
naiveVcov
. The class method summary.drgee
produces a
summary of the calculations.
Johan Zetterqvist, Arvid Sjölander
Goetgeluk S., & Vansteelandt S. (2008), Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics, 64(3), pp. 772–780.
glm
A function to obtain the conditional score residuals from conditional logistic regression as well as the sum of derivatives of these residuals with respect to the parameters evaluated at the estimated parameters.
getScoreResidualsFromClogit(fit, coefs, y, x, id)
getScoreResidualsFromClogit(fit, coefs, y, x, id)
fit |
A fitted object. Could be any object that has a
class.function |
coefs |
An estimate of the parameters. If the argument |
y |
The observed outcomes. |
x |
The design matrix. |
id |
The cluster identification variable. |
getScoreResidualsFromClogit
calculates the residuals from the
conditional score equations used in conditional logistic regression as
we as the sums of their derivatives with respect to the paramaters.
This is useful if one wants to use a calculate the sandwich estimate of a variance where the uncertainty in the estimation of a conditional odds ratio needs to be taken into account.
getScoreResidualsFromClogit
return a list containing:
U |
An |
dU.sum |
An |
dU.mean |
The mean cluster sum of the derivatives of the score residuals from the conditional likelihood. |
Johan Zetterqvist, Arvid Sjölander
robVcov
and robustVcov
calculates the asymptotic variance for Z-estimators.
robustVcov(U, d.U.sum, id = NULL) robVcov(U, d.U, id = NULL)
robustVcov(U, d.U.sum, id = NULL) robVcov(U, d.U, id = NULL)
U |
A n x q matrix of the estimating equations evaluated at the estimated model parameters, where n is the number of observations and q is the number of estimating equations. |
d.U.sum |
The sum of the jacobian of |
d.U |
The mean of |
id |
A factor with levels corresponding the clusters in the data. Default
is |
For robust variance estimation, see van der Vaart (2000).
For clustered data, the rows in U
are added clusterwise
resulting in a cluster robust estimate of the variance.
The estimated covariance matrix.
Johan Zetterqvist, Arvid Sjölander
van der Vaart, A.W. (2000), Asymptotic Statistics, Cambridge University Press, pp. 52–53.