The isni package
provides functions to compute, print and summarize the Index of
Sensitivity to Nonignorability (ISNI). One can compute the sensitivity
index without estimating any nonignorable models or positing specific
magnitude of nonignorability. Thus ISNI provides a simple quantitative
assessment of how robust the standard estimates assuming missing at
random is with respect to the assumption of ignorability. This vignette
serves as a quick start for how to use the package. Currently the
package provides ISNI computation for
It allows for arbitrary patterns of missingness in the longitudinal regression outcomes caused by dropout and/or intermittent missingness.
sos examplesos is dataset on a cross-sectional survey of sexual
practices among students at the University of Edinburgh. The response
variable is the students’ answer to the question ``Have you ever had
sexual intercourse?’’. Because of the sensitivity of this question, many
students declined to answer, leading to substantial missing data. We
consider a simplified data set consisting of the answer to this
question, with the student’s sex and faculty as predictors.
## sexact gender faculty
## 2186 <NA> male other
## 4540 no female other
## 4981 <NA> female other
## 305 yes male other
## 1263 yes male other
## 167 yes male other
## 3046 no male mdv
## 5486 <NA> female other
## 6030 no female mdv
## 2818 <NA> male other
The R code above loads the library isni and the data
frame sos, displaying a random subsample of \(10\) records. sos includes the
following factor variables: sexact is the response to the
question Have you ever had sexual intercourse? (two levels:
no (reference level), yes); gender is the student’s sex
(two levels: male (reference level), female); faculty is
the student’s faculty (medical/dental/veterinary, all other faculty
categories (reference level)).
Assuming ignorable nonresponse, one can fit a logistic model (using responders only) to predict the outcome by sex, faculty and their interaction. We estimated the model with function :
##
## Call:
## glm(formula = ymodel, family = binomial, data = sos)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.08153 0.05561 19.448 < 2e-16 ***
## genderfemale 0.03081 0.07958 0.387 0.699
## facultymdv -0.73389 0.14921 -4.918 8.73e-07 ***
## genderfemale:facultymdv 0.10213 0.20670 0.494 0.621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4450.3 on 3827 degrees of freedom
## Residual deviance: 4408.2 on 3824 degrees of freedom
## (2308 observations deleted due to missingness)
## AIC: 4416.2
##
## Number of Fisher Scoring iterations: 4
The estimates show that students in a medical faculty were less
likely to report having had sexual intercourse. Because only 62.4%
responded to the sexual practice question, there is concern that this
analysis is sensitive to the assumption of ignorability. For this
purpose one can conduct an ISNI analysis for this model with the
function isniglm(). We posit a nonignorable nonresponse
model in the following form \[\begin{eqnarray}
logit (Prob(is.na(sexact)=``yes''))=\gamma_{0}^T s
+\gamma_1*sexact
\end{eqnarray}\] where the observed missingness predictor
s including gender, faculty and
their interaction. In the above nonresponse model, the probability of
nonresponse to the sexual practice question is associated with the
observed missingness predictor s via the parameter \(\gamma_0\) and is associated with the
partially missing outcome sexact via the parameter \(\gamma_1\). The nonignorable parameter
\(\gamma_1\) captures the mangnitude
and nature of nonignrable missingness. When \(\gamma_1=0\), the nonresponse becomes
ignorable in the sense that the probability of missingness is indepdent
of unobserved values of sexact. The above MAR analysis
provides consistent and valid estimates. When \(\gamma_1\) departs from zero, the
nonresponse becomes nonignorable and the above MAR estimates are subject
to selection bias due to nonignorable nonresponse. The ISNI functions
(specifically the isniglm function for this example) can be
applied to evaluate the rate of change of model estimates in the
neighborhood of the MAR model where the missingness probability is
allowed to depend on the unobserved value of sexact, even
after conditioning on the other missingness predictors in
s.
A simple ISNI analysis can be conducted using the
isniglm function as follows:
## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
##
## ISNIs:
## (Intercept) genderfemale facultymdv
## 0.410141 -0.038983 -0.169859
## genderfemale:facultymdv
## 0.027542
##
## c statistics:
## (Intercept) genderfemale facultymdv
## 0.13559 2.04146 0.87846
## genderfemale:facultymdv
## 7.50482
##
## Residual Deviance of the MAR model: 4408.2
##
## AIC of the MAR model: 4416.2
The summary function in the package expresses the
isniglm() object:
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
##
## MAR Est. Std. Err ISNI c
## (Intercept) 1.081531 0.055611 0.410141 0.1356
## genderfemale 0.030808 0.079583 -0.038983 2.0415
## facultymdv -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv 0.102133 0.206696 0.027542 7.5048
The columns MAR Est. and Std. Err denote
the logistic model estimates and their standard errors under MAR;
ISNI and c denote ISNI values and
c statistics. Recall that ISNI denotes the approximate
change in the MLEs when \(\gamma_1\) in
the selection model is changed from \(0\) to \(1\). Under our nonignorable selection
model, assuming that \(\gamma_1=1\)
means that a student whose answer is yes has an increase of
2.7-fold in the odds of nonresponse. Thus, subjects whose true value is
yes would be more likely to have a missing value, and the
naive MAR estimate for (Intercept) should be less than the
(Intercept) estimate under the correct nonignorable model.
The positive sign of the ISNI value for (Intercept) is
consistent with this prediction. The ISNI for the faculty
predictor is \(-0.17\), indicating that
if, as is more plausible here, \(\gamma_1 =
1\), the MLE for the estimate should change from \(-0.73\) to \(-0.90\). If \(\gamma_1 = -1\), the estimate would change
from \(-0.73\) to \(-0.56\).
The column c presents the c statistics that
approximate the minimum magnitude of nonignorability that is needed for
the change in an MLE to equal one standard error (\(\text{SE}\)). One can then assess
sensitivity by evaluating whether this level of nonignorability is
plausible. For our sos example with a binary outcome, the
\(c\) statistic is defined as \[\begin{eqnarray}
c= \left| \frac{\text{SE} }{\text{ISNI}}\right|.
\end{eqnarray}\] The \(c\)
statistic here informs us that in order for selection bias to be as
large as the sampling error, the magnitude of nonignorability needs to
be at least as large as that with which one-unit change in
sexact is associated with an odds ratio of 2.7 in the
probability of being missing.
When \(c\) is large, only extreme
nonignorability can make the estimate change substantially, and
consequently sensitivity to nonignorability is of little concern. For
example, \(c=10\) implies that in order
for the error in an MAR estimate to be the same size as its sampling
error, the nonignorability needs to be strong enough that a \(0.1\)-unit change in sexact
causes a significant change in the odds of being missing. When \(c\) is small, modest departure from MAR can
cause the estimate to change substantially. For example, \(c=0.1\) implies that when even a \(10\)-unit change in sexact
causes a significant change in the odds of being missing, the estimate
may change substantially. As such a degree of nonignorability is
plausible in many applications, this small \(c\) value signals sensitivity. Prior
research suggests \(c<1\) as a rule
of thumb to signal significant sensitivity.
In the sos example, the \(c\) statistics for
(Intercept)} and faculty are both less than
\(1\), suggesting that these
coefficients are sensitive to nonignorability, confirming previous
findings. Prior research also found that neither the gender
nor the interaction term between gender and
faculty should be sensitive, as our findings using ISNI
confirm.
In the above we do not explicitly specify an missing data mechanism
model (MDM) via formula argument in the
isniglm function. The same analysis can be replicated by
explicitly specifying an MDM model using the code below. The
two-equation formula below
sexact | is.na(sexact) ~ gender*faculty | gender *faculty
uses the operator | to separately specify variables used in
the complete-data model and MDM. The two-equation formula means that the
complete-data model is sexact \(\sim\) gender*faculty and that
is.na(sexact) and gender*faculty are the
missingness indicator and the missingness predictor \(s\) in the nonresponse model described
above, respectively.
ygmodel <- sexact | is.na(sexact) ~ gender*faculty | gender *faculty
summary(isniglm(ygmodel, family=binomial, data=sos))## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ygmodel, family = binomial, data = sos)
##
## MAR Est. Std. Err ISNI c
## (Intercept) 1.081531 0.055611 0.410141 0.1356
## genderfemale 0.030808 0.079583 -0.038983 2.0415
## facultymdv -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv 0.102133 0.206696 0.027542 7.5048
Because all the covariates in are categorical variables, one can also
analyze the data as a grouped binomial outcome using the
weight argument as below.
gender <- c(0,0,1,1,0,0,1,1)
faculty <- c(0,0,0,0,1,1,1,1)
gender <- factor(gender, levels = c(0, 1), labels =c("male", "female"))
faculty <- factor(faculty, levels = c(0, 1), labels =c("other", "mdv"))
SAcount <- c(NA, 1277, NA, 1247, NA, 126, NA, 152)
total <- c(1189,1710,978,1657,68,215,73,246)
sosgrp <- data.frame(gender=gender, faculty=faculty, SAcount=SAcount, total=total)
ymodel <- SAcount/total ~gender*faculty
sosgrp.isni<-isniglm(ymodel, family=binomial, data=sosgrp, weight=total)## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sosgrp, weights = total)
##
## MAR Est. Std. Err ISNI c
## (Intercept) 1.081531 0.055611 0.410141 0.1356
## genderfemale 0.030808 0.079583 -0.038983 2.0415
## facultymdv -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv 0.102133 0.206696 0.027542 7.5048
A tutorial describing the ISNI methodology and containing examples for ISNI computation for nonignorable missing data in longitudinal setting can be download (via)