The isni package provides functions to compute, print and summarize the Index of
Sensitivity to Nonignorability (ISNI). One can compute the sensitivity
index without estimating any nonignorable models or positing a specific
magnitude of nonignorability. Thus ISNI provides a simple quantitative
assessment of how robust the standard estimates obtained under the missing at
random (MAR) assumption are to departures from ignorability. This vignette
serves as a quick start for how to use the package. Currently the
package provides ISNI computation for
It allows for arbitrary patterns of missingness in the longitudinal regression outcomes caused by dropout and/or intermittent missingness.
sos example
sos is a dataset from a cross-sectional survey of sexual
practices among students at the University of Edinburgh. The response
variable is the students’ answer to the question “Have you ever had
sexual intercourse?” Because of the sensitivity of this question, many
students declined to answer, leading to substantial missing data. We
consider a simplified data set consisting of the answer to this
question, with the student’s sex and faculty as predictors.
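The chunk that produced the listing that follows is not reproduced in this rendering. A minimal sketch of what it might look like is given here, assuming the isni package is installed and ships the sos data frame; the seed is hypothetical, so the sampled rows will differ from those shown.

```r
# Sketch only: assumes the isni package is installed. The guard lets the
# script run (and do nothing) when it is not.
n_show <- 10  # number of records to display, as described in the text

if (requireNamespace("isni", quietly = TRUE)) {
  library(isni)
  data("sos", package = "isni")
  set.seed(2024)  # hypothetical seed; the vignette's own seed is not shown
  print(sos[sample(nrow(sos), n_show), ])
}
```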
##      sexact gender faculty
## 2175   <NA>   male   other
## 4674     no female   other
## 3329    yes female   other
## 500     yes   male   other
## 2152   <NA>   male   other
## 4070    yes female   other
## 3703    yes female   other
## 1003    yes   male   other
## 2021   <NA>   male   other
## 748     yes   male   other
The listing above shows a random subsample of 10 records from the data
frame sos, obtained after loading the isni library. sos includes the
following factor variables: sexact is the response to the
question “Have you ever had sexual intercourse?” (two levels:
no (reference level), yes); gender is the student’s sex
(two levels: male (reference level), female); faculty is
the student’s faculty (two levels: mdv for medical/dental/veterinary,
other for all other faculty categories (reference level)).
Assuming ignorable nonresponse, one can fit a logistic model (using responders only) to predict the outcome by sex, faculty and their interaction. We estimated the model with the function glm():
##
## Call:
## glm(formula = ymodel, family = binomial, data = sos)
##
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)              1.08153    0.05561  19.448  < 2e-16 ***
## genderfemale             0.03081    0.07958   0.387    0.699
## facultymdv              -0.73389    0.14921  -4.918 8.73e-07 ***
## genderfemale:facultymdv  0.10213    0.20670   0.494    0.621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4450.3 on 3827 degrees of freedom
## Residual deviance: 4408.2 on 3824 degrees of freedom
## (2308 observations deleted due to missingness)
## AIC: 4416.2
##
## Number of Fisher Scoring iterations: 4
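The call that produced this output is not shown in this rendering. Judging from the Call line and the coefficient names, it was probably of the following form; this is a sketch that assumes the isni package (for the sos data) is installed.

```r
# Complete-data (MAR) model: outcome by gender, faculty and their interaction.
ymodel <- sexact ~ gender * faculty

# Sketch only: the sos data frame comes from the isni package.
if (requireNamespace("isni", quietly = TRUE)) {
  data("sos", package = "isni")
  # glm() drops records with a missing sexact, so only responders are used.
  print(summary(glm(ymodel, family = binomial, data = sos)))
}
```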
The estimates show that students in a medical faculty were less
likely to report having had sexual intercourse. Because only 62.4%
responded to the sexual practice question, there is concern that this
analysis is sensitive to the assumption of ignorability. For this
purpose one can conduct an ISNI analysis for this model with the
function isniglm(). We posit a nonignorable nonresponse model in which
the observed missingness predictors s include gender, faculty and
their interaction. In this nonresponse model, the probability of
nonresponse to the sexual practice question is associated with the
observed missingness predictors s via the parameter γ0 and with
the partially missing outcome sexact via the parameter γ1. The
nonignorable parameter γ1 captures the
magnitude and nature of nonignorable missingness. When γ1 = 0, the nonresponse
becomes ignorable in the sense that the probability of missingness is
independent of the unobserved values of sexact. In that case the above MAR
analysis provides consistent and valid estimates. When γ1 departs from zero, the
nonresponse becomes nonignorable and the above MAR estimates are subject
to selection bias due to nonignorable nonresponse. The ISNI functions
(specifically the isniglm function for this example) can be
applied to evaluate the rate of change of model estimates in the
neighborhood of the MAR model where the missingness probability is
allowed to depend on the unobserved value of sexact, even
after conditioning on the other missingness predictors in s.
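The nonresponse model described in this paragraph can be written compactly. The display below is a sketch that assumes a logistic link, which matches the odds-ratio interpretation of γ1 used later in this vignette; the missingness indicator G (equal to 1 when sexact is missing) is notation introduced here for illustration.

```latex
\operatorname{logit}\,\Pr(G = 1 \mid s, \text{sexact})
  = \gamma_0^{\top} s + \gamma_1\,\text{sexact}
```

Setting γ1 = 0 removes sexact from the right-hand side, which is the ignorable (MAR) case.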
A simple ISNI analysis can be conducted using the isniglm function as follows:
## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
##
## ISNIs:
##             (Intercept)            genderfemale              facultymdv
##                0.410141               -0.038983               -0.169859
## genderfemale:facultymdv
##                0.027542
##
## c statistics:
##             (Intercept)            genderfemale              facultymdv
##                 0.13559                 2.04146                 0.87846
## genderfemale:facultymdv
##                 7.50482
##
## Residual Deviance of the MAR model: 4408.2
##
## AIC of the MAR model: 4416.2
The summary function in the package tabulates the results stored in the
isniglm() object:
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
##
##                          MAR Est. Std. Err      ISNI      c
## (Intercept)              1.081531 0.055611  0.410141 0.1356
## genderfemale             0.030808 0.079583 -0.038983 2.0415
## facultymdv              -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133 0.206696  0.027542 7.5048
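The calls behind the two printouts above are not shown in this rendering. Based on the Call lines, they were probably of the following form; this is a sketch that assumes the isni package is installed, and the object name sos.isni is hypothetical.

```r
# Complete-data model; no missing data mechanism is specified explicitly.
ymodel <- sexact ~ gender * faculty

# Sketch only: isniglm() and the sos data come from the isni package.
if (requireNamespace("isni", quietly = TRUE)) {
  library(isni)
  data("sos", package = "isni")
  sos.isni <- isniglm(ymodel, family = binomial, data = sos)  # hypothetical name
  print(sos.isni)    # ISNIs and c statistics
  summary(sos.isni)  # MAR estimates, standard errors, ISNI and c in one table
}
```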
The columns MAR Est. and Std. Err denote
the logistic model estimates and their standard errors under MAR;
ISNI and c denote the ISNI values and c statistics.
Recall that ISNI denotes the approximate
change in the MLEs when γ1 in the selection model
is changed from 0 to 1. Under our nonignorable selection model,
assuming that γ1 = 1 means that a
student whose answer is yes has odds of nonresponse 2.7 times
those of a student whose answer is no. Thus, subjects whose true value is
yes would be more likely to have a missing value, and the
naive MAR estimate for (Intercept) should be less than the
(Intercept) estimate under the correct nonignorable model.
The positive sign of the ISNI value for (Intercept) is
consistent with this prediction. The ISNI for the faculty
predictor is −0.17, indicating that if,
as is more plausible here, γ1 = 1, the MLE
should change from −0.73 to
−0.90. If γ1 = −1, the estimate
would change from −0.73 to −0.56.
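The arithmetic in this paragraph can be reproduced in base R. The sketch below copies the MAR estimates and ISNI values from the summary table and treats ISNI as a first-order (linear) approximation, so the estimate at a given γ1 is taken to be the MAR estimate plus γ1 times ISNI; the helper name shifted is hypothetical.

```r
# Values copied from the summary table above.
mar_est <- c("(Intercept)" = 1.081531, genderfemale = 0.030808,
             facultymdv = -0.733886, "genderfemale:facultymdv" = 0.102133)
isni <- c("(Intercept)" = 0.410141, genderfemale = -0.038983,
          facultymdv = -0.169859, "genderfemale:facultymdv" = 0.027542)

# Linear approximation: estimate at gamma1 = MAR estimate + gamma1 * ISNI.
shifted <- function(gamma1) mar_est + gamma1 * isni

# Odds ratio of nonresponse (yes vs. no) implied by gamma1 = 1.
odds_ratio <- exp(1)

print(round(shifted(1), 2))
print(round(shifted(-1), 2))
print(round(odds_ratio, 1))
```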
The column c presents the c statistics that
approximate the minimum magnitude of nonignorability that is needed for
the change in an MLE to equal one standard error (SE). One can then assess sensitivity by
evaluating whether this level of nonignorability is plausible. For our
sos example with a binary outcome, the c statistic is defined as
c = |SE/ISNI|. The c statistic here informs us that in
order for selection bias to be as large as the sampling error, the
magnitude of nonignorability needs to be at least as large as that with
which a one-unit change in sexact is associated with an odds
ratio of exp(c) for being missing (equivalently, a change of 1/c units
is associated with an odds ratio of 2.7).
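As a check on the c column, the sketch below recomputes it from the Std. Err and ISNI columns of the summary table, assuming that for a binary outcome c is the absolute ratio of the standard error to ISNI. The recomputed values agree with the reported ones to the printed precision.

```r
# Values copied from the summary table above.
se <- c("(Intercept)" = 0.055611, genderfemale = 0.079583,
        facultymdv = 0.149215, "genderfemale:facultymdv" = 0.206696)
isni <- c("(Intercept)" = 0.410141, genderfemale = -0.038983,
          facultymdv = -0.169859, "genderfemale:facultymdv" = 0.027542)

# Assumed definition for a binary outcome: c = |SE / ISNI|.
c_stat <- abs(se / isni)
print(round(c_stat, 4))
```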
When c is large, only
extreme nonignorability can make the estimate change substantially, and
consequently sensitivity to nonignorability is of little concern. For
example, c = 10 implies that
in order for the error in an MAR estimate to be the same size as its
sampling error, the nonignorability needs to be strong enough that a
0.1-unit change in sexact causes a significant change in the odds of
being missing. When c is small, a modest departure from
MAR can cause the estimate to change substantially. For example,
c = 0.1 implies that even when it takes a
10-unit change in sexact to cause a significant change in the odds of
being missing, the estimate
may change substantially. As such a degree of nonignorability is
plausible in many applications, this small c value signals sensitivity. Prior
research suggests c < 1 as
a rule of thumb to signal significant sensitivity.
In the sos example, the c statistics for (Intercept) and faculty
are both less than 1, suggesting that these coefficients
are sensitive to nonignorability, confirming previous findings. Prior
research also found that neither the gender coefficient nor the
interaction term between gender and faculty
should be sensitive, as our findings using ISNI confirm.
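The rule of thumb can be applied mechanically to the reported c statistics. The sketch below copies the c column from the summary table and flags coefficients with c < 1; the variable names are hypothetical.

```r
# c statistics copied from the summary table.
c_stat <- c("(Intercept)" = 0.1356, genderfemale = 2.0415,
            facultymdv = 0.8785, "genderfemale:facultymdv" = 7.5048)

# Rule of thumb cited in the text: c < 1 signals sensitivity.
threshold <- 1
sensitive <- names(c_stat)[c_stat < threshold]
print(sensitive)
```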
In the above we do not explicitly specify a missing data mechanism (MDM)
model via the formula argument of the isniglm function.
The same analysis can be replicated by
explicitly specifying an MDM model using the code below. The
two-equation formula below
sexact | is.na(sexact) ~ gender*faculty | gender*faculty
uses the operator | to separately specify variables used in
the complete-data model and MDM. The two-equation formula means that the
complete-data model is sexact ~ gender*faculty and that
is.na(sexact) and gender*faculty are the
missingness indicator and the missingness predictors s in the nonresponse model described
above, respectively.
# Two-part formula: outcome | missingness indicator ~ outcome-model terms | MDM terms
ygmodel <- sexact | is.na(sexact) ~ gender * faculty | gender * faculty
summary(isniglm(ygmodel, family = binomial, data = sos))
## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ygmodel, family = binomial, data = sos)
##
##                          MAR Est. Std. Err      ISNI      c
## (Intercept)              1.081531 0.055611  0.410141 0.1356
## genderfemale             0.030808 0.079583 -0.038983 2.0415
## facultymdv              -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133 0.206696  0.027542 7.5048
Because all the covariates in sos are categorical variables, one can also
analyze the data as a grouped binomial outcome using the
weights argument as below.
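# One row per (response status, gender, faculty) cell.
gender  <- c(0, 0, 1, 1, 0, 0, 1, 1)
faculty <- c(0, 0, 0, 0, 1, 1, 1, 1)
gender  <- factor(gender, levels = c(0, 1), labels = c("male", "female"))
faculty <- factor(faculty, levels = c(0, 1), labels = c("other", "mdv"))
# SAcount: number answering yes; NA marks the nonrespondent cells.
SAcount <- c(NA, 1277, NA, 1247, NA, 126, NA, 152)
# total: number of students in each cell.
total   <- c(1189, 1710, 978, 1657, 68, 215, 73, 246)
sosgrp  <- data.frame(gender = gender, faculty = faculty, SAcount = SAcount, total = total)
# Grouped binomial model: proportion answering yes, weighted by cell size.
ymodel <- SAcount / total ~ gender * faculty
sosgrp.isni <- isniglm(ymodel, family = binomial, data = sosgrp, weights = total)
summary(sosgrp.isni)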
## # weights: 5 (4 variable)
## initial value 4253.151100
## final value 4027.954580
## converged
##
## Call:
## isniglm(formula = ymodel, family = binomial, data = sosgrp, weights = total)
##
##                          MAR Est. Std. Err      ISNI      c
## (Intercept)              1.081531 0.055611  0.410141 0.1356
## genderfemale             0.030808 0.079583 -0.038983 2.0415
## facultymdv              -0.733886 0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133 0.206696  0.027542 7.5048
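The MAR Est. column and the 62.4% response rate quoted earlier can be cross-checked from the grouped counts with base R alone. The sketch below rebuilds the grouped data frame from the counts given above and fits an ordinary weighted glm(); it does not call isniglm(), and the names mar_fit, n_resp and resp_rate are hypothetical.

```r
# Grouped counts copied from the sosgrp example above.
sosgrp <- data.frame(
  gender  = factor(c(0, 0, 1, 1, 0, 0, 1, 1), levels = c(0, 1),
                   labels = c("male", "female")),
  faculty = factor(c(0, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1),
                   labels = c("other", "mdv")),
  SAcount = c(NA, 1277, NA, 1247, NA, 126, NA, 152),
  total   = c(1189, 1710, 978, 1657, 68, 215, 73, 246)
)

# glm() drops the rows with a missing response, so only responders are used.
mar_fit <- glm(SAcount / total ~ gender * faculty, family = binomial,
               data = sosgrp, weights = total)
mar_est <- coef(mar_fit)

# Share of students who answered the question.
n_resp    <- sum(sosgrp$total[!is.na(sosgrp$SAcount)])
resp_rate <- n_resp / sum(sosgrp$total)

print(round(mar_est, 6))
print(round(100 * resp_rate, 1))
```

Because the model is saturated (four cells, four coefficients), the fitted coefficients are simply differences of the observed log odds in each cell.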
A tutorial describing the ISNI methodology and containing examples of ISNI computation for nonignorable missing data in the longitudinal setting can be downloaded (via)