An Introduction to Mestim

This vignette serves as a short introduction to computing the variance-covariance matrix of a multidimensional parameter using M-estimation and the empirical sandwich variance.

Baby review of M-estimation

Denoting $F$ a probability distribution, a $p$-dimensional M-estimator of $\psi$-type $T$ solves the vector equation $\int_{\mathcal{Z}}\psi\big(z, T(F)\big)\,dF(z) = 0$. In practice, this means that the estimator $T$ has an unbiased estimating function $\psi(z, \theta) = \{\psi_1(z, \theta), \ldots, \psi_p(z, \theta)\}^T$ with solution $\hat{\theta}$ $(p \times 1)$ solving in $\theta$ the $(p \times 1)$ set of "stacked" estimating equations given by $$ \sum_{i=1}^{n}\psi(Z_i,\theta)=\boldsymbol{0}.$$ For a $\psi$-type M-estimator $T$ with estimate $\hat{\theta}$, under suitable regularity conditions on $\psi$, the central limit theorem and Slutsky's theorem yield $$\sqrt{n}\big(\hat{\theta}-T(F)\big)\xrightarrow{\mathcal{D}}\mathcal{N}\big(0, \Sigma\big)$$ where $$\Sigma=\Bigg[\mathbb{E}\bigg\{\frac{\partial \psi\big(Z,T(F)\big) }{\partial\theta^T}\bigg\}\Bigg]^{-1}\mathbb{E}\Big\{ \psi\big(Z,T(F)\big) \psi^T\big(Z,T(F)\big)\Big\}\Bigg[\mathbb{E}\bigg\{\frac{\partial \psi\big(Z,T(F)\big) }{\partial\theta^T}\bigg\}\Bigg]^{-T}.$$ This implies that the $p$-dimensional M-estimator $\hat{\theta}$ is an asymptotically normal, $\sqrt{n}$-consistent estimator of $T(F)$. See Boos and Stefanski (2013) for a full introduction to M-estimation.
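
For intuition, consider the sample mean: taking $\psi(z, \theta) = z - \theta$ gives $\mathbb{E}\{\partial \psi(Z, T(F))/\partial\theta\} = -1$ and $\mathbb{E}\{\psi^2(Z, T(F))\} = \mathrm{Var}(Z)$, so the sandwich collapses to $$\Sigma = (-1)^{-1}\,\mathrm{Var}(Z)\,(-1)^{-1} = \mathrm{Var}(Z),$$ recovering the usual asymptotic variance of the sample mean.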

What Mestim does

Provided with observed data $(Z_i)_{1 \le i \le n}$, a $p$-dimensional vector of estimates $\hat{\theta}$ and a $(p \times 1)$ unbiased estimating function $\psi$, the Mestim package computes the sandwich formula $$\hat{\Sigma}=\Bigg[n^{-1}\sum_{i=1}^n\bigg\{\frac{\partial \psi\big(Z_i,\hat{\theta}\big) }{\partial\theta^T}\bigg\}\Bigg]^{-1} n^{-1}\sum_{i=1}^n\Big\{ \psi\big(Z_i,\hat{\theta}\big) \psi^T\big(Z_i,\hat{\theta}\big)\Big\} \Bigg[n^{-1}\sum_{i=1}^n\bigg\{\frac{\partial \psi\big(Z_i,\hat{\theta}\big) }{\partial\theta^T}\bigg\}\Bigg]^{-T}.$$ The estimated asymptotic variance-covariance matrix of $\hat{\theta}$ is $n^{-1}\hat{\Sigma}$, and so we have $$\hat{\theta} \mathrel{\dot\sim} \mathcal{N}\big(T(F), n^{-1} \hat{\Sigma}\big).$$ Under the hood, Mestim algorithmically computes the Jacobian matrix of $\psi$; derivatives and outer products in $\hat{\Sigma}$ are then computed in parallel. To compute the asymptotic variance-covariance matrix, the analyst thus only needs to provide a list detailing the "stacked" estimating functions in $\psi$. Below, we give examples of growing complexity to illustrate how Mestim can leverage the flexibility of M-estimation theory to calculate asymptotic standard errors (and confidence intervals) for parameter estimates $\hat{\theta}$.
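
To make the formula concrete, here is a minimal base-R sketch of $\hat{\Sigma}$ for the sample-mean example above, with the derivative $\partial\psi/\partial\theta = -1$ hard-coded (this is the differentiation step that Mestim automates):

set.seed(1)
z <- rnorm(100, mean = 2)            # toy data
theta_hat <- mean(z)                 # solves sum(z - theta) = 0
A_hat <- -1                          # n^-1 * sum of d(psi)/d(theta), here constant
B_hat <- mean((z - theta_hat)^2)     # n^-1 * sum of psi^2
Sigma_hat <- B_hat / A_hat^2         # scalar sandwich A^-1 B A^-T
Sigma_hat / length(z)                # estimated variance of theta_hat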

Example 1: Prediction task via logistic regression

Let's compute the asymptotic standard errors of the estimated parameters in a logistic regression model. This simple example serves to get familiar with the basic Mestim commands.

Let's generate synthetic data with two predictors and a binary outcome such that $Z = (X_1, X_2, Y)^T$.
The data-generating process is detailed below.


Here we use $X_1 \sim \mathcal{N}(0, 1)$, $X_2 \sim \mathcal{N}(0, 3)$, and $Y|X \sim \mathcal{B}\big(\mathrm{expit}(\beta_1^0 X_1 + \beta_2^0 X_2)\big)$ with $\beta_1^0 = 4$ and $\beta_2^0 = 5$ (here and throughout, the second argument of $\mathcal{N}$ is the standard deviation).

NB: We use superscript 0 to denote true values of the parameters.

gen_lr_dat <- function(n, seed=123)
{
set.seed(seed)
X_1 <- rnorm(n, sd = 1); X_2 <- rnorm(n, sd = 3) # generate x_1 and x_2
true_betas <- c(4,5) # generate true parameters
X <- model.matrix(~-1+X_1+X_2) # build the design matrix
Y <- rbinom(n, 1, 1/(1 + exp(-X %*% true_betas)) ) # generate Y from X and true_betas
dat  <-  data.frame(X_1=X_1, X_2=X_2, Y=Y) # build a simulated dataset
return(dat)
}


dat <- gen_lr_dat(5000)
head(dat)
##           X_1       X_2 Y
## 1 -0.56047565 -1.482522 0
## 2 -0.23017749  3.382780 1
## 3  1.55870831 -3.440849 0
## 4  0.07050839  4.443056 1
## 5  0.12928774  2.748574 1
## 6  1.71506499  1.005393 1

Let's fit a logistic regression model and put the estimated parameters in a list.

mod <- glm(Y~-1 + X_1 + X_2, data=dat, family = "binomial")
thetas_hat <- list(thetas_1=coef(mod)[1], thetas_2=coef(mod)[2])

Recall that the estimated parameters $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2)^T$ from this logistic regression model jointly solve $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}-Y_i\bigg)X_{1,i} =0$$ and $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}-Y_i\bigg)X_{2,i} =0.$$ Therefore, we can identify $\psi_1(Z_i, \theta) = \Big(\big[1+\exp\{-(\theta_1X_{1,i}+\theta_2X_{2,i})\}\big]^{-1}-Y_i\Big)X_{1,i}$ and $\psi_2(Z_i, \theta) = \Big(\big[1+\exp\{-(\theta_1X_{1,i}+\theta_2X_{2,i})\}\big]^{-1}-Y_i\Big)X_{2,i}$. With that in mind, let's build a list for the unbiased estimating function $\psi(z, \theta) = \big(\psi_1(z, \theta), \psi_2(z, \theta)\big)^T$.

psi_1 <- expression( ((1/(1+exp(-(thetas_1 * X_1 + thetas_2 * X_2)))) - Y) * X_1 )
psi_2 <- expression( ((1/(1+exp(-(thetas_1 * X_1 + thetas_2 * X_2)))) - Y) * X_2 )
psi <- list(psi_1, psi_2)
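
As an optional sanity check (a sketch, not required by Mestim), each stacked estimating function should sum to numerically zero when evaluated at the estimates:

sapply(psi, function(p) sum(eval(p, envir = c(dat, thetas_hat)))) # both ~ 0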

NB: parameter names (here thetas_1 and thetas_2) must be consistent with the names in the previous list.

We are finally ready to pass these arguments to the get_vcov function from the Mestim package.

library(Mestim)
res <- get_vcov(data=dat, thetas=thetas_hat, M=psi)

You can obtain the variance-covariance matrix from a get_vcov result as follows:

res$vcov
##            thetas_1   thetas_2
## thetas_1 0.05239025 0.05366863
## thetas_2 0.05366863 0.06795271
Let's verify that the variance-covariance matrix from get_vcov is similar to that of glm.


vcov(mod)
##            X_1        X_2
## X_1 0.05698408 0.06143937
## X_2 0.06143937 0.07963092
This is indeed very close to the results in res$vcov.


Asymptotic standard errors are the square roots of the diagonal elements of the estimated variance-covariance matrix. These are stored in the se attribute.

res$se
##  thetas_1  thetas_2 
## 0.2288892 0.2606774
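
From these, one can form Wald-type 95% confidence intervals; a minimal sketch, using the estimates and standard errors from above:

est <- unlist(thetas_hat)
cbind(lower = est - qnorm(0.975) * res$se,
      upper = est + qnorm(0.975) * res$se)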

Example 2: Average treatment effect via outcome regression

Let's generate synthetic observational data with treatment allocation $A$, continuous outcome $Y$, and a single confounder $X$ such that $Z = (X, A, Y)^T$.

The data-generating process is detailed below.


Here we use $X \sim \mathcal{N}(0, 1)$, $A|X \sim \mathcal{B}\big(\mathrm{expit}(2X)\big)$, $\epsilon \sim \mathcal{N}(0, 20)$, and $Y|X, \epsilon = \gamma_1^0 X + \gamma_2^0 A + \gamma_3^0 AX + \epsilon$ with $\gamma_1^0 = 4$, $\gamma_2^0 = 3$, and $\gamma_3^0 = 2$.

NB: We use superscript 0 to denote true values of the parameters.

gen_obs_dat <- function(n, seed=123)
{
set.seed(seed)
X <- rnorm(n) # generate X
A <- rbinom(n, 1, 1/(1 + exp(- 2 * X)) )  # generate treatment allocation A
X_mat <- model.matrix(~ -1 + X + A + A:X) # build the design matrix
true_gammas <- c(4,3,2) 
epsi <- rnorm(n,0,20) # generate gaussian noise 
Y <- (X_mat %*% true_gammas) + epsi # generate observed outcomes 
dat  <-  data.frame(X=X, A=A, Y=Y) # build a simulated dataset
return(dat)
}


dat <- gen_obs_dat(5000)
head(dat)
##             X A          Y
## 1 -0.56047565 0   4.758148
## 2 -0.23017749 0  15.368124
## 3  1.55870831 1   2.018927
## 4  0.07050839 1 -50.422238
## 5  0.12928774 1 -18.163366
## 6  1.71506499 1 -11.819112

In this example, the goal is to calculate standard errors for the outcome regression average causal effect estimator $$\hat{\delta}=n^{-1}\sum_{i=1}^n\big\{\hat{\mathbb{E}}(Y|X=X_i, A=1)-\hat{\mathbb{E}}(Y|X=X_i, A=0)\big\}.$$

For $\mathbb{E}(Y|X, A)$, let's specify the regression model $m(X, A; \boldsymbol{\gamma}) = \gamma_1 X + \gamma_2 A + \gamma_3 AX$ and store the estimated parameters.

m <- lm(Y~ -1 + X + A + A:X, data = dat)
gamma_1_hat <- coef(m)[1]
gamma_2_hat <- coef(m)[2]
gamma_3_hat <- coef(m)[3]

Recall that the estimated parameters $\hat{\boldsymbol{\gamma}}=(\hat{\gamma}_1,\hat{\gamma}_2,\hat{\gamma}_3)^T$ from this linear regression model jointly solve $$\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)X_i=0,$$ $$\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)A_i=0,$$ $$\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)A_iX_i=0.$$ Disregarding the summation sign, we can straightforwardly identify the first three elements of the estimating function $\psi(z, \theta)$. Before building a list detailing the function $\psi$, we need to identify the estimating function of our main parameter of interest $\delta$. To do so, recall that we can estimate $\delta$ as $$\hat{\delta}=n^{-1}\sum_{i=1}^n\big\{m(X_i,1;\hat{\boldsymbol{\gamma}})-m(X_i,0;\hat{\boldsymbol{\gamma}})\big\} \\ =n^{-1}\sum_{i=1}^n\{\hat{\gamma}_1X_i+ \hat{\gamma}_2 \times 1 +\hat{\gamma}_3 \times 1 \times X_i\} - \{\hat{\gamma}_1X_i+ \hat{\gamma}_2 \times 0 +\hat{\gamma}_3 \times 0 \times X_i\} \\ =n^{-1}\sum_{i=1}^n \big\{\hat{\gamma}_2 +\hat{\gamma}_3 X_i\big\}. $$ Let's first compute it.

delta_hat <- mean(gamma_2_hat + gamma_3_hat*dat$X)
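
Equivalently, a sketch of the same computation via predict() on counterfactual copies of the data:

dat_1 <- transform(dat, A = 1) # everyone treated
dat_0 <- transform(dat, A = 0) # everyone untreated
mean(predict(m, newdata = dat_1) - predict(m, newdata = dat_0)) # same as delta_hat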

Note that rearranging the last equality in the derivation of $\hat{\delta}$ above, we have $$\sum_{i=1}^n \big\{\hat{\gamma}_2 +\hat{\gamma}_3 X_i - \hat{\delta}\big\} = 0,$$ which straightforwardly yields the last element of the estimating function $\psi(z, \theta)$. Disregarding the summation sign yields the last estimating function, which we can now "stack" with the previous ones. Let's now build a list detailing the full function $\psi(z, \theta)$.

psi_1 <- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * X )
psi_2 <- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * A )
psi_3 <- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * A*X )
psi_4 <- expression( gamma_2 + gamma_3 * X - delta )
psi <- list(psi_1, psi_2, psi_3, psi_4)

Let's also store all the estimated parameters $\hat{\theta} = (\hat{\gamma}_1, \hat{\gamma}_2, \hat{\gamma}_3, \hat{\delta})^T$ in a list.

thetas_hat <- list(gamma_1=gamma_1_hat,
                  gamma_2=gamma_2_hat,
                  gamma_3=gamma_3_hat,
                  delta=delta_hat)

Let's pass the relevant arguments to get_vcov and check results for the variance-covariance matrix.

res <- get_vcov(data=dat, thetas=thetas_hat, M=psi)
res$vcov
##               gamma_1       gamma_2    gamma_3         delta
## gamma_1  1.686258e-01 -6.659882e-17 -0.1686258  2.291608e-05
## gamma_2 -4.168437e-17  2.510135e-01 -0.1497095  2.509786e-01
## gamma_3 -1.686258e-01 -1.497095e-01  0.4228791 -1.496732e-01
## delta    2.291608e-05  2.509786e-01 -0.1496732  2.512757e-01
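
As in Example 1, a 95% Wald confidence interval for the average causal effect follows from the delta entry of res$se (a sketch):

delta_hat + c(-1, 1) * qnorm(0.975) * res$se["delta"]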
Let's see how the results compare with standard errors obtained from the bootstrap.
The bootstrap procedure is detailed below.


boot_fun <- function(d, i=1:nrow(d)) {
  z<-d[i,]
  mod <- lm(Y~ -1 + X + A + A:X, data = z)
  gamma_1_hat <- coef(mod)[1]
  gamma_2_hat <- coef(mod)[2]
  gamma_3_hat <- coef(mod)[3]
  delta_hat <- mean(gamma_2_hat*1 + gamma_3_hat*1*z$X)
  return( c(gamma_1_hat, gamma_2_hat, gamma_3_hat, delta_hat) )
}
boot_start_time <- Sys.time()
res_boot <- boot::boot(dat, boot_fun, R=999)
boot_end_time <- Sys.time()
paste("Bootstrapping took", round(as.numeric(boot_end_time - boot_start_time), 2), "seconds.")
## [1] "Bootstrapping took 2.74 seconds."
##              [,1]        [,2]       [,3]         [,4]
## [1,]  0.166959339 -0.00255421 -0.1636470 -0.002452947
## [2,] -0.002554210  0.24003558 -0.1433816  0.240057058
## [3,] -0.163647006 -0.14338163  0.4128908 -0.143436074
## [4,] -0.002452947  0.24005706 -0.1434361  0.240449603

This is pretty close to the results in res$vcov that we obtained 41.9 times faster with Mestim.

Example 3: Value estimation for dynamic treatment regime

Let's generate synthetic observational data for dynamic clinical decisions. We denote by $S_t$ the observed state at time $t$, by $A_t$ the binary action taken at time $t$, and by $Y$ the terminal outcome for the sequence, where higher values indicate worse disease symptoms. For illustrative purposes, we consider only $T = 2$ decision points, so that we have data $Z = (S_1, A_1, S_2, A_2, Y)^T$.

The data-generating process is detailed below.


The data are generated via the following hidden Markov process, where we only get to observe $S_t$, a noisy version of the underlying state $X_t$: $$X_1 \sim \mathcal{N}(0, 0.1),$$ $$X_{t+1}|X_t, A_t \sim \mathbf{1}_{X_t > 0}\,\mathcal{N}(1.1X_t - 0.5A_t,\, 0.05) + \mathbf{1}_{X_t < 0}\,X_t,$$ $$S_t|X_t \sim \exp\big\{\mathcal{N}(X_t,\, 0.1)\big\},$$ $$A_1|S_1 \sim \mathcal{B}\big(\mathrm{expit}(-0.1 + \log S_1)\big),$$ $$A_2|S_2, A_1 \sim \mathcal{B}\big(\mathrm{expit}(0.1 + \log S_2 + 3A_1)\big),$$

$$Y|X_3, A_2, A_1 \sim \exp\big\{\mathcal{N}\big(X_3 + \lambda(A_2 + A_1),\, 0.1\big)\big\}.$$ We consider that receiving treatment actions has a side-effect penalty of $\lambda = 0.1$.

gen_dtr_dat <- function(n, seed=456)
{
set.seed(seed)
expit <- function(x) 1/(1+exp(-x))

X_1 <- rnorm(n, sd=.1)
S_1 <- exp(rnorm(n, mean = X_1, sd = .1))
A_1 <- rbinom(n, size = 1, prob = expit(-.1+log(S_1)))

X_2 <- (X_1>0) * rnorm(n, mean = 1.1*X_1 - .5 * A_1, sd=.05) + (X_1<0) * X_1
S_2 <- exp(rnorm(n, mean = X_2, sd = .1))
A_2 <- rbinom(n, size = 1, prob = expit(.1+log(S_2)+3*A_1))

X_3 <- (X_2>0) * rnorm(n, mean = 1.1*X_2 - .5 * A_2, sd=.05) + (X_2<0) * X_2
Y <- exp(rnorm(n, mean = X_3 + .1*(A_1 + A_2), sd = .1)) #0.1 penalty for treating

dat <- data.frame(S_1=S_1, A_1=A_1, S_2=S_2, A_2=A_2, Y)  
return(dat)
}


dat <- gen_dtr_dat(5000)
head(dat)
##         S_1 A_1       S_2 A_2        Y
## 1 0.8402181   1 0.9292966   1 1.046362
## 2 1.0033476   0 1.0623962   0 1.024897
## 3 1.0889107   0 1.0687287   0 1.152630
## 4 0.8716224   1 0.8938716   1 1.069209
## 5 0.9743708   1 0.9324451   1 1.189594
## 6 1.0222173   1 0.9174528   1 1.195341

Given any treatment regime $d = \{d_1, \ldots, d_T\}^T$, an estimator for $\mathbb{E}_{Z \sim d}(Y)$ is the inverse probability weighted (IPW) estimator $$\hat{\mathcal{V}}_{IPW}(d)=n^{-1}\sum_{i=1}^n \frac{Y_i \prod_{t=1}^{T}\mathbf{1}\big(d_t(H_{t,i})=A_{t,i}\big)}{\prod_{t=1}^{T}\hat{e}_t(H_{t,i})^{A_{t,i}}\big\{1-\hat{e}_t(H_{t,i})\big\}^{1-A_{t,i}}}, \tag{1}$$

where we use the history notation $H_t = (S_t, A_{t-1}, S_{t-1}, \ldots, S_1)^T$ and write the relevant generalization of the propensity score as $e_t(H_t) = \mathbb{E}(A_t|H_t)$ for clarity (see Tsiatis et al., 2019).

In this example we consider the regime $\tilde{d}(Z) = \{\tilde{d}_1(H_1) = \mathbf{1}_{S_1 > 1},\ \tilde{d}_2(H_2) = \mathbf{1}_{S_2 > 1}\}^T$. The goal is to calculate standard errors for $\hat{\mathcal{V}}_{IPW}(\tilde{d})$. As $T = 2$, we need to specify models for $e_1(H_1)$ and $e_2(H_2)$. Let's specify the parametric regression models $e_1(H_1; \boldsymbol{\delta}) = \mathrm{expit}(\delta_1 + \delta_2\log S_1)$ and $e_2(H_2; \boldsymbol{\phi}) = \mathrm{expit}(\phi_1 + \phi_2\log S_2 + \phi_3 A_1)$. We fit and store the estimated parameters as follows.

e_1 <- glm(A_1~I(log(S_1)), data=dat, family = "binomial")
delta_1_hat <- coef(e_1)[1]
delta_2_hat <- coef(e_1)[2]

e_2 <- glm(A_2~I(log(S_2)) + A_1 , data=dat, family = "binomial")
phi_1_hat <- coef(e_2)[1]
phi_2_hat <- coef(e_2)[2]
phi_3_hat <- coef(e_2)[3]

As in Example 1, recall that for $e_1$ the estimated parameters $\hat{\boldsymbol{\delta}}=(\hat{\delta}_1, \hat{\delta}_2)^T$ jointly solve $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}-A_{1,i}\bigg) =0,$$ $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}-A_{1,i}\bigg) \log S_{1,i}=0.$$ Similarly, for $e_2$ the estimated parameters $\hat{\boldsymbol{\phi}}=(\hat{\phi}_1, \hat{\phi}_2, \hat{\phi}_3)^T$ jointly solve $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}-A_{2,i}\bigg) =0,$$ $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}-A_{2,i}\bigg) \log S_{2,i}=0,$$ $$\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}-A_{2,i}\bigg) A_{1,i}=0.$$

Disregarding the summation sign, we can straightforwardly identify the first five elements of the estimating function $\psi(z, \theta)$. Let's store them before building our final list for $\psi$.

Note that, for programming convenience, we recommend storing all relevant variable transformations as columns in the original dataframe.

dat$log_S_1 <- log(dat$S_1) ; dat$log_S_2 <- log(dat$S_2) # For ease of programming

psi_1 <- expression( ((1/(1+exp(-(delta_1 + delta_2 * log_S_1)))) - A_1) * 1 )
psi_2 <- expression( ((1/(1+exp(-(delta_1 + delta_2 * log_S_1)))) - A_1) * log_S_1)

psi_3 <- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * 1 )
psi_4 <- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * log_S_2)
psi_5 <- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * A_1)

To obtain the last element of $\psi$, we need some algebraic manipulation. Denoting $\mathcal{C}_{\tilde{d}, i}=\prod_{t=1}^2\mathbf{1}\big({\tilde{d}_t(H_{t,i})=A_{t,i}}\big)$ for simplicity, after substitution of $\hat{e}_1(H_1; \hat{\boldsymbol{\delta}})$ and $\hat{e}_2(H_2; \hat{\boldsymbol{\phi}})$, equation (1) yields $$\hat{\mathcal{V}}_{IPW}(\tilde{d})=n^{-1}\sum_{i=1}^n \frac{Y_i\, \mathcal{C}_{\tilde{d},i}}{\prod_{t=1}^{2}\hat{e}_t(H_{t,i})^{\tilde{d}_t(H_{t,i})}\big\{1-\hat{e}_t(H_{t,i})\big\}^{1-\tilde{d}_t(H_{t,i})}},$$ where, thanks to the indicator $\mathcal{C}_{\tilde{d},i}$, the exponents $A_{t,i}$ may equivalently be replaced by $\tilde{d}_t(H_{t,i})$.

Rearranging the equation above yields $$\sum_{i=1}^n\Bigg[\frac{Y_i\, \mathcal{C}_{\tilde{d},i}}{\prod_{t=1}^{2}\hat{e}_t(H_{t,i})^{\tilde{d}_t(H_{t,i})}\big\{1-\hat{e}_t(H_{t,i})\big\}^{1-\tilde{d}_t(H_{t,i})}}-\hat{\mathcal{V}}_{IPW}(\tilde{d})\Bigg]=0.$$ We can now straightforwardly identify and store the last element of the estimating function $\psi$.

# The regime we are interested in
dat$d_1 <- dat$S_1>1
dat$d_2 <- dat$S_2>1

# For ease of programming
dat$C_d <- with(dat, d_1==A_1 & d_2==A_2)

# Store the last element of psi
psi_6 <- expression( Y * C_d *
          (  (1+exp(-(delta_1 + delta_2 * log_S_1)))^d_1 * # numerator
             (1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))^d_2  ) 
          
      /   (  (1-(1+exp(-(delta_1 + delta_2 * log_S_1)))^(-1))^(1-d_1) * # denominator
             (1-(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))^(-1))^(1-d_2)  ) 
- V
)

Let's now build a list detailing the full function $\psi(z, \theta)$.

psi <- list(psi_1, psi_2, psi_3, psi_4, psi_5, psi_6)

Now, let's compute $\hat{\mathcal{V}}_{IPW}(\tilde{d})$ and stack it in a list of all the estimated parameters $\hat{\theta}=\Big(\hat{\delta}_1,\hat{\delta}_2,\hat{\phi}_1,\hat{\phi}_2,\hat{\phi}_3, \hat{\mathcal{V}}_{IPW}(\tilde{d})\Big)^T$.

For simplicity in computing $\hat{\mathcal{V}}_{IPW}(\tilde{d})$, we recommend using a slightly modified version of psi_6 as follows.

# Just delete "- V" from in the previous expression
# add _hat as appropriate
ipw_estimator <- expression( Y * C_d *
          (  (1+exp(-(delta_1_hat + delta_2_hat * log_S_1)))^d_1 * # numerator
             (1+exp(-(phi_1_hat + phi_2_hat * log_S_2 + phi_3_hat * A_1)))^d_2  ) 
          
      /   (  (1-(1+exp(-(delta_1_hat + delta_2_hat * log_S_1)))^(-1))^(1-d_1) * # denominator
             (1-(1+exp(-(phi_1_hat + phi_2_hat * log_S_2 + phi_3_hat * A_1)))^(-1))^(1-d_2)  )  
)

# Compute individual contribution and take the average
V_hat <- with(dat, mean(eval(ipw_estimator))) # Other ways to compute this quantity are OK too.

thetas_hat <- list(delta_1=delta_1_hat,
                   delta_2=delta_2_hat,
                   phi_1=phi_1_hat,
                   phi_2=phi_2_hat,
                   phi_3=phi_3_hat,
                   V=V_hat)
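
Because the algebra behind psi_6 is easy to get wrong, a quick optional check (a sketch): the last estimating function should average to numerically zero at the estimated parameters.

mean(eval(psi_6, envir = c(dat, thetas_hat))) # ~ 0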

Let's pass the relevant arguments to get_vcov and check results for the standard errors.

res <- get_vcov(data=dat, thetas=thetas_hat, M=psi)
res$se
##    delta_1    delta_2      phi_1      phi_2      phi_3          V 
## 0.02836275 0.19963843 0.03921097 0.22778301 0.12032851 0.03641272
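
As before, a 95% Wald confidence interval for the value of the regime is obtained from the V entry of res$se (a sketch):

V_hat + c(-1, 1) * qnorm(0.975) * res$se["V"]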
Let's see how the results compare with standard errors obtained from the bootstrap.
The bootstrap procedure is detailed below.


boot_fun <- function(d, i=1:nrow(d)) {
  z<-d[i,]
  e_1 <- glm(A_1~I(log(S_1)), data=z, family = "binomial")
  e_2 <- glm(A_2~I(log(S_2)) + A_1 , data=z, family = "binomial")

  delta_1_hat <- coef(e_1)[1]
  delta_2_hat <- coef(e_1)[2]
  phi_1_hat <- coef(e_2)[1]
  phi_2_hat <- coef(e_2)[2]
  phi_3_hat <- coef(e_2)[3]

  ipw_estimator <- expression( z$Y * z$C_d *
            (  (1+exp(-(delta_1_hat + delta_2_hat * z$log_S_1)))^z$d_1 * # numerator
               (1+exp(-(phi_1_hat + phi_2_hat * z$log_S_2 + phi_3_hat * z$A_1)))^z$d_2  ) 
            
        /   (  (1-(1+exp(-(delta_1_hat + delta_2_hat * z$log_S_1)))^(-1))^(1-z$d_1) * # denominator
               (1-(1+exp(-(phi_1_hat + phi_2_hat * z$log_S_2 + phi_3_hat * z$A_1)))^(-1))^(1-z$d_2)  )  
  )

  V_hat <- mean(eval(ipw_estimator))
  return( c(delta_1_hat, delta_2_hat, phi_1_hat, phi_2_hat, phi_3_hat, V_hat) )
}
boot_start_time <- Sys.time()
res_boot <- boot::boot(dat, boot_fun, R=999)
boot_end_time <- Sys.time()
paste("Bootstrapping took", round(as.numeric(boot_end_time - boot_start_time), 2), "seconds.")
## [1] "Bootstrapping took 23.53 seconds."
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot::boot(data = dat, statistic = boot_fun, R = 999)
## 
## 
## Bootstrap Statistics :
##        original        bias    std. error
## t1* -0.10641382 -0.0002500751  0.02843059
## t2*  0.65733524  0.0080253525  0.20518095
## t3*  0.07486354  0.0005029474  0.03972660
## t4*  1.22872312 -0.0066237516  0.23196144
## t5*  3.12746280  0.0087706975  0.12208167
## t6*  0.83983316  0.0021767814  0.03644645

This is pretty close to the results in res$se that we obtained 201.3 times faster with Mestim.

References

Boos, D. D. and Stefanski, L. (2013). Essential Statistical Inference. Springer, New York. https://doi.org/10.1007/978-1-4614-4818-1.

Tsiatis, A. A., Davidian, M., Holloway, S. T. and Laber, E. B. (2019). Dynamic Treatment Regimes: Statistical Methods for Precision Medicine. CRC Press, Boca Raton. https://doi.org/10.1201/9780429192692.