Package 'dstat'

Title: Conditional Sensitivity Analysis for Matched Observational Studies
Description: A d-statistic tests the null hypothesis of no treatment effect in a matched, nonrandomized study of the effects caused by treatments. A d-statistic focuses on subsets of matched pairs that demonstrate insensitivity to unmeasured bias in such an observational study, correcting for double-use of the data by conditional inference. This conditional inference can, in favorable circumstances, substantially increase the power of a sensitivity analysis (Rosenbaum (2010) <doi:10.1007/978-1-4419-1213-8_14>). There are two examples, one concerning unemployment from Lalive et al. (2006) <doi:10.1111/j.1467-937X.2006.00406.x>, the other concerning smoking and periodontal disease from Rosenbaum (2017) <doi:10.1214/17-STS621>.
Authors: Paul R. Rosenbaum
Maintainer: Paul R. Rosenbaum <[email protected]>
License: GPL-2
Version: 1.0.4
Built: 2024-12-13 06:51:23 UTC
Source: CRAN


Conditional Sensitivity Analysis for Matched Observational Studies

Description

A d-statistic tests the null hypothesis of no treatment effect in a matched, nonrandomized study of the effects caused by treatments. A d-statistic focuses on subsets of matched pairs that demonstrate insensitivity to unmeasured bias in such an observational study, correcting for double-use of the data by conditional inference. This conditional inference can, in favorable circumstances, substantially increase the power of a sensitivity analysis (Rosenbaum (2010) <doi:10.1007/978-1-4419-1213-8_14>). There are two examples, one concerning unemployment from Lalive et al. (2006) <doi:10.1111/j.1467-937X.2006.00406.x>, the other concerning smoking and periodontal disease from Rosenbaum (2017) <doi:10.1214/17-STS621>.

Details

The DESCRIPTION file:

Package: dstat
Type: Package
Title: Conditional Sensitivity Analysis for Matched Observational Studies
Version: 1.0.4
Author: Paul R. Rosenbaum
Maintainer: Paul R. Rosenbaum <[email protected]>
Description: A d-statistic tests the null hypothesis of no treatment effect in a matched, nonrandomized study of the effects caused by treatments. A d-statistic focuses on subsets of matched pairs that demonstrate insensitivity to unmeasured bias in such an observational study, correcting for double-use of the data by conditional inference. This conditional inference can, in favorable circumstances, substantially increase the power of a sensitivity analysis (Rosenbaum (2010) <doi:10.1007/978-1-4419-1213-8_14>). There are two examples, one concerning unemployment from Lalive et al. (2006) <doi:10.1111/j.1467-937X.2006.00406.x>, the other concerning smoking and periodontal disease from Rosenbaum (2017) <doi:10.1214/17-STS621>.
License: GPL-2
Encoding: UTF-8
LazyData: true
Imports: stats
NeedsCompilation: no
Packaged: 2019-04-15 13:23:56 UTC; Rosenbaum
Repository: CRAN
Date/Publication: 2019-04-16 09:42:41 UTC

Index of help topics:

amplify                 Amplification of sensitivity analysis in
                        observational studies.
dental                  Dental Problems Caused by Smoking
dstat                   Sensitivity Analysis Focusing on Subgroups with
                        Demonstrated Insensitivity to Unmeasured Bias
dstat-package           Conditional Sensitivity Analysis for Matched
                        Observational Studies
lalive                  Unemployment Duration Following an Increase in
                        Unemployment Benefits

The package provides a sensitivity analysis for a conditional test of the null hypothesis of no treatment effect in a matched observational study in which the unmeasured bias in treatment assignment is quantified by a sensitivity parameter gamma>=1. The test uses only those categories of pairs that demonstrate insensitivity to a bias of magnitude gamma, correcting for data-dependent selection of categories by conditional inference. The main function in the package is dstat().

Author(s)

Paul R. Rosenbaum

Maintainer: Paul R. Rosenbaum <[email protected]>

References

Rosenbaum, P. R. (1999). Using quantile averages in matched observational studies. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(1), 63-78. <doi:10.1111/1467-9876.00140>

Examples

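data("dental")
attach(dental)
head(dental)
# Conditional sensitivity analysis at gamma=4.1; f=dose:age subdivides
# the pairs, and fscore doubles the weight of the two high-dose levels
dstat(y,gamma=4.1,f=dose:age,fscore=c(1,1,2,2))
# Amplification of gamma=4 into (lambda,delta) for lambda=5,6,7
amplify(4,c(5,6,7))
detach(dental)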

Amplification of sensitivity analysis in observational studies.

Description

Uses the method in Rosenbaum and Silber (2009) to interpret a value of the sensitivity parameter gamma. Each value of gamma amplifies to a curve (lambda,delta) in a two-dimensional sensitivity analysis, the inference being the same for all points on the curve. That is, a one-dimensional sensitivity analysis in terms of gamma has a two-dimensional interpretation in terms of (lambda,delta).

Usage

amplify(gamma, lambda)

Arguments

gamma

The value of the sensitivity parameter, gamma > 1, for instance the parameter in senmv. gamma must be a single number; length(gamma)>1 will generate an error.

lambda

lambda is a vector of values > gamma. An error will result unless lambda[i] > gamma > 1 for every i.

Details

A single value of gamma, say gamma = 2.2, corresponds to a curve of values of (lambda, delta), including (3, 7), (4, 4.33), (5, 3.57), and (7, 3). An unobserved covariate that is associated with a lambda = 3 fold increase in the odds of treatment and a delta = 7 fold increase in the odds of a positive pair difference is equivalent to gamma = 2.2.

The curve is gamma = (lambda*delta + 1)/(lambda+delta). The function amplify is given one gamma and a vector of lambdas and solves for the vector of deltas. The calculation is elementary.
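
Solving the curve equation for delta gives delta = (gamma*lambda - 1)/(lambda - gamma). The following is a minimal sketch of that calculation, not the package's source code. The name amplify_sketch is hypothetical, and its input checks simply mirror the requirements stated under Arguments.

```r
# Hypothetical re-implementation for illustration only;
# in practice use amplify() from the package.
amplify_sketch <- function(gamma, lambda) {
  # One gamma > 1, and every lambda strictly greater than gamma
  stopifnot(length(gamma) == 1, gamma > 1, all(lambda > gamma))
  # Solve gamma = (lambda*delta + 1)/(lambda + delta) for delta
  delta <- (gamma * lambda - 1) / (lambda - gamma)
  names(delta) <- lambda
  delta
}

lambda <- c(3, 4, 5, 7)
delta <- amplify_sketch(2.2, lambda)
round(delta, 2)
```

Rounded to two decimals, the deltas for lambda = 3, 4, 5 and 7 are 7, 4.33, 3.57 and 3, matching the points on the gamma = 2.2 curve quoted above. As lambda approaches gamma from above the denominator approaches zero and delta grows without bound, which is one of the two asymptotes of the curve.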

This interpretation of gamma is developed in detail in Rosenbaum and Silber (2009), and it makes use of Wolfe's (1974) family of semiparametric deformations of an arbitrary symmetric distribution.

Strictly speaking, the amplification describes matched pairs, not matched sets. The senm function views a k-to-1 matched set with k controls matched to one treated individual as a collection of k correlated treated-minus-control matched pair differences; see Rosenbaum (2007). For matched sets, it is natural to think of the amplification as describing any one of the k matched pair differences in a k-to-1 matched set.

The curve has asymptotes that the function amplify does not compute: gamma corresponds with (lambda,delta) = (gamma, Inf) and (Inf, gamma).

A related though distinct idea is developed in Gastwirth et al. (1998). The two approaches agree when the outcome is binary, that is, for McNemar's test.

Value

Returns a vector of values of delta of length(lambda) with names lambda.

Note

The amplify function is also in the sensitivitymv package where a different example is used.

Author(s)

Paul R. Rosenbaum

References

Gastwirth, J. L., Krieger, A. M., Rosenbaum, P. R. (1998) Dual and simultaneous sensitivity analysis for matched pairs. Biometrika, 85, 907-920.

Rosenbaum, P. R. (2007). Sensitivity analysis for m-estimates, tests and confidence intervals in matched observational studies. Biometrics, 63, 456-464. (R package sensitivitymv) <doi:10.1111/j.1541-0420.2006.00717.x>

Rosenbaum, P. R. (2016) Using Scheffe projections for multiple outcomes in an observational study of smoking and periodontal disease. Annals of Applied Statistics, 10, 1447-1471. <doi:10.1214/16-AOAS942>

Rosenbaum, P. R. and Silber, J. H. (2009) Amplification of sensitivity analysis in observational studies. Journal of the American Statistical Association, 104, 1398-1405. <doi:10.1198/jasa.2009.tm08470>

Rosenbaum, P. R. (2015). Two R packages for sensitivity analysis in observational studies. Observational Studies, v. 1. (Free on-line.)

Wolfe, D. A. (1974) A characterization of population weighted symmetry and related results. Journal of the American Statistical Association, 69, 819-822.

Examples

amplify(4,c(5,6,7))

Dental Problems Caused by Smoking

Description

Data from NHANES 2011-2012 containing 441 matched pairs of a daily cigarette smoker and a never smoker, recording the extent of periodontal disease. Pairs were matched for sex, age, black race, education in five categories, and ratio of family income to the poverty level.

Usage

data("dental")

Format

A data frame with 441 observations on the following 5 variables.

smoker

Periodontal disease in the daily smoker

control

Periodontal disease in the never smoker

y

Smoker-minus-control pair difference

age

Age, in two categories: <=50 or >50

dose

Cigarettes smoked per day by the smoker, in two categories: <10 or >=10

Details

Excluding wisdom teeth, 6 measurements are taken for each tooth that is present, up to 28 teeth. Following Tomar and Asma (2000), a measurement indicates periodontal disease if either there is a loss of attachment of at least 4mm or a pocket depth of at least 4mm. The first individual has 11 measurements indicative of periodontal disease, out of 106 measurements, so pcteither is 100*11/106 = 10.38 percent.

Source

Data are from the National Health and Nutrition Examination Survey 2011-2012 and were used as an example in Rosenbaum (2017). In the second edition of Design of Observational Studies, these data are discussed in the chapter entitled Evidence Factors. Although the same 2x441 individuals were used here and in Rosenbaum (2017), the pairing was slightly changed to be exact for age>50.

References

Rosenbaum, P. R. (2017) The general structure of evidence factors in observational studies. Statist Sci 32, 514-530. <doi:10.1214/17-STS621>

Tomar, S. L. and Asma, S. (2000) Smoking attributable periodontitis in the US: Findings from NHANES III. J Periodont 71, 743-751.

US National Health and Nutrition Examination Survey 2011-2012. www.cdc.gov/nchs/nhanes/index.htm

Examples

data(dental)
attach(dental)
boxplot(y~dose:age)
abline(h=0)
detach(dental)
rm(dental)

Sensitivity Analysis Focusing on Subgroups with Demonstrated Insensitivity to Unmeasured Bias

Description

Sensitivity analysis using a d-statistic employing conditional inference to focus on those subgroups with demonstrated insensitivity to unmeasured biases.

Usage

dstat(y, qs = c(1/3, 2/3), gamma = 1, f = NULL, fscore = NULL, fr = 1, alpha = 0.05)

Arguments

y

A numeric vector of treated-minus-control matched pair differences in outcomes.

qs

Quantiles of |y| that partly define the d-statistic. Each coordinate of qs must be a number strictly between 0 and 1; otherwise, an error will result. See Details.

gamma

The sensitivity parameter, a number gamma>=1.

f

If f is not NULL, then it must be a factor that further subdivides y beyond the subdivisions implied by qs. If f is not NULL, then the length of f must equal the length of y; otherwise, an error will result.

fscore

If fscore is not NULL, then fscore contains integer scores to be attached to the levels of f. If f is not NULL but fscore is NULL, then the levels of f are viewed as nominal with equal emphasis. An error will result if fscore is not NULL but: (i) the scores are not integers, (ii) f is NULL, or (iii) the number of scores does not equal the number of levels of f.

fr

A nonnegative number. If fr=0, then the test is simply a group-rank test, using every category, without conditional inference. The recommended default of fr=1 uses a category only if the proportion of positive y's in this category is at least equal to gamma/(1+gamma), and the conditional inference corrects for selection of categories based on y. In general, a category is used if the proportion of positive y is at least fr*gamma/(1+gamma), reducing to all categories if fr=0.

alpha

The level of the test. This argument is of limited importance: it affects only a text message that interprets the numerical results in terms of rejection or not of the null hypothesis of no treatment effect in a one-sided, level-alpha test in the presence of a bias in treatment assignment of at most gamma>=1.
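
The selection rule described under fr can be illustrated with a small sketch. This is not how dstat() is implemented. The helper selection_cutoff and the toy data are hypothetical, and the categories here come from a made-up factor alone, whereas dstat() also subdivides pairs using the quantiles in qs.

```r
# Hypothetical helper, not part of the dstat package:
# the cutoff fr*gamma/(1+gamma) described under fr.
selection_cutoff <- function(gamma, fr = 1) {
  stopifnot(gamma >= 1, fr >= 0)
  fr * gamma / (1 + gamma)
}

# Toy pair differences in two made-up categories "a" and "b"
y <- c(2.1, 0.4, -0.3, 1.7, 0.9, -1.2, -3.0, 0.6)
f <- factor(rep(c("a", "b"), each = 4))

# Proportion of positive pair differences in each category
prop_positive <- tapply(y > 0, f, mean)

# With gamma = 2 and fr = 1 the cutoff is 2/3
used_fr1 <- prop_positive >= selection_cutoff(gamma = 2, fr = 1)
# With fr = 0 the cutoff is 0, so every category is used
used_fr0 <- prop_positive >= selection_cutoff(gamma = 2, fr = 0)
```

In this toy example category "a" has 3 of 4 positive differences and passes the cutoff of 2/3, while category "b" has 2 of 4 and does not. Setting fr=0 keeps both categories.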

Details

The method is from Rosenbaum (2019). The example reproduces aspects of this manuscript.

The default values of qs, 1/3 and 2/3, are from the test of Brown (1981). See Markowski and Hettmansperger (1982) for discussion of other choices. See Rosenbaum (2015) for comparisons of performance of different fixed choices of qs; here, a fixed choice is obtained by setting fr=0.

If a pair difference in y is zero, it falls in the lowest quantile of pairs and therefore receives weight zero along with other pair differences with small |y|.
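
As a rough illustration of how qs groups pairs by |y|, the sketch below assigns each pair a step score equal to the number of qs quantiles of |y| that its own |y| exceeds, so the smallest differences, including zeros, score 0. This is an assumption made for illustration: the helper step_scores is hypothetical, and the exact scoring and tie handling inside dstat() may differ.

```r
# Hypothetical helper, not part of the dstat package:
# count how many of the qs quantiles of |y| each |y| strictly exceeds.
step_scores <- function(y, qs = c(1/3, 2/3)) {
  stopifnot(all(qs > 0), all(qs < 1))
  cut_points <- quantile(abs(y), probs = qs, names = FALSE)
  # One column of comparisons per quantile; row sums give the step score
  rowSums(outer(abs(y), cut_points, ">"))
}

y <- c(-3, 0, 1, 2, 5, -6)
scores <- step_scores(y)
# Pair differences alongside their step scores
data.frame(y = y, score = scores)
```

With the default qs, the pairs with the smallest |y| get score 0, the middle group gets score 1, and the largest get score 2. Because the comparison is strict, a zero pair difference can never exceed a quantile of |y| and so always scores 0 in this sketch.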

Value

T

The test statistic

comp2

The sharp upper bound on the one-sided, exact P-value testing the null hypothesis of no treatment effect in the presence of a bias in treatment assignment of at most gamma.

scores

A vector reminding you of the scores, fscore, that you may have attached to the levels of f.

table

A table showing how individual categories contribute to the overall test. The notation in this table is from Rosenbaum (2019).

summary

A text summary of the conclusion.

Author(s)

Paul R. Rosenbaum

References

Brown, B. M. (1981). Symmetric quantile averages and related estimators. Biometrika, 68(1), 235-242.

Lalive, R., Van Ours, J., & Zweimüller, J. (2006). How changes in financial incentives affect the duration of unemployment. The Review of Economic Studies, 73, 1009-1038.

Markowski, E. P., Hettmansperger, T. P. (1982). Inference based on simple rank step score statistics for the location model. Journal of the American Statistical Association, 77(380), 901-907.

Noether, G. E. (1973). Some simple distribution-free confidence intervals for the center of a symmetric distribution. Journal of the American Statistical Association, 68(343), 716-719.

Rosenbaum, P. R. (1999). Using quantile averages in matched observational studies. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(1), 63-78. <doi:10.1111/1467-9876.00140>

Rosenbaum, P. R. and Silber, J. H. (2009) Amplification of sensitivity analysis in observational studies. Journal of the American Statistical Association, 104, 1398-1405. <doi:10.1198/jasa.2009.tm08470>

Rosenbaum, P. R. (2010) The power of a sensitivity analysis and its limit. Chapter 14 of Design of Observational Studies. NY: Springer. <doi:10.1007/978-1-4419-1213-8_14>

Rosenbaum, P. R. (2015). Bahadur efficiency of sensitivity analyses in observational studies. Journal of the American Statistical Association, 110(509), 205-217. <doi:10.1080/01621459.2014.960968>

Rosenbaum, P. R. (2017). Observation and Experiment: An Introduction to Causal Inference. Cambridge, MA: Harvard University Press.

Rosenbaum, P. R. (2019). A highly adaptive test for matched observational studies. Manuscript.

Examples

# First example is from Rosenbaum (2019)
data("lalive")
attach(lalive)
y<-log2((1+dur[after==1])/52)-log2((1+dur[after==0])/52)
dstat(y,qs=c(1/3,2/3),fr=0,gamma=1.15) # Brown's (1981) test
dstat(y,qs=c(2/3),fr=0,gamma=1.25) # Noether's (1973) Test
#Amplification: see Rosenbaum and Silber (2009), Rosenbaum (2017, Table 9.1)
amplify(1.25,2)

bothseasonal<-(2==(seasonal[after==1]+seasonal[after==0]))*1
bothseasonal<-factor(bothseasonal,levels=1:0,
   labels=c("S","O"),ordered=TRUE)
straddle<-1*((pmin(dur[after==1],dur[after==0])<=(bdur[after==0]))
   &(pmax(dur[after==1],dur[after==0])>(bdur[after==0])))
straddle<-factor(straddle,levels=c(0,1),
   labels=c("N","Y"),ordered=TRUE)
dose<-1*((bdur[after==1]-bdur[after==0])>9.5)
dose<-factor(dose,levels=c(0,1),
   labels=c("L","H"),ordered=TRUE)
f<-bothseasonal:dose:straddle
dstat(y,qs=c(1/3,2/3),f=f,gamma=1.25)
# Reproduces Table 2 in Rosenbaum (2019)
dstat(y,qs=c(1/3,2/3),f=f,gamma=1.45)
amplify(1.45,c(2,2.5,3,4))

# Doubling the weight for high-dose matched pairs
levels(f)
fs<-c(1,1,2,2,1,1,2,2)
dstat(y,qs=c(1/3,2/3),f=f,fscore=fs,gamma=1.45)

rm(y,f,dose,straddle,bothseasonal)
detach(lalive)
rm(lalive)

# Second example uses dental data
data(dental)
attach(dental)
f<-age:dose
levels(f)
# Doubles the weight at high dose using fscore
# For qs = (.4,.8), see Markowski and Hettmansperger (1982)
dstat(y,qs=c(.4,.8),gamma=4.25,f=f,fscore=c(1,2,1,2))
rm(f)
detach(dental)
rm(dental)

Unemployment Duration Following an Increase in Unemployment Benefits

Description

Data from a study by Lalive, van Ours and Zweimüller (2006) concerning the duration of unemployment before and after an increase in unemployment benefits, both the benefit amount and the duration of benefits. The original study takes account of many relevant considerations not included in the current subset of the data. The data were used as a methodological example in Rosenbaum (2019).

Usage

data("lalive")

Format

A data frame with 2782 observations on the following 17 variables.

id

ID number

mset

Matched pair, 1,2,...,1391.

after

Treatment indicator, 1=after benefits increase, 0=before benefits increase

type

a factor with levels PBD and RR

dur

Duration of unemployment in weeks.

bdur

Duration of unemployment benefits in weeks

e3_5

1 if worked for at least 3 of the past 5 years, 0 otherwise.

lehre

1 if apprenticeship, 0 otherwise

married

1 if married, 0 otherwise

divorced

1 if divorced, 0 otherwise

bc

1 if lost a blue collar job, 0 otherwise

seasonal

1 if lost a seasonal job, 0 otherwise

manuf

1 if lost a manufacturing job, 0 otherwise

age

Age in years

nwage_pj

Wage in the prior job in Austrian schillings

educ

0 if primary education, 1 if secondary education, 2 if tertiary education

propensity

An estimated propensity score

Details

The data are from Lalive, van Ours and Zweimüller (2006), by way of the web-page for the textbook Cahuc, P., Carcillo, S. and Zylberberg, A. (2014).

In August 1989, Austria increased its unemployment benefits for certain categories of workers. The category considered here, type=PBD and RR, had an increase in the duration of unemployment benefits and an increase in unemployment compensation. There are two groups, those unemployed in the two years before the benefit increase, after=0, and those unemployed in the two years after the increase, after=1.

The data are 1391 matched pairs, matched for e3_5, lehre, married, divorced, bc, seasonal, manuf, age, nwage_pj, and educ, with fine balance for quintiles of the propensity score. All are men, and none were temporarily laid off. The matching used a simplified version of the method in Rosenbaum (2017).

The original study by Lalive et al. (2006) sensibly takes account of many relevant considerations not included in the current subset of the data. The limited data available here were used to illustrate certain methodological issues in Rosenbaum (2019).

Source

Lalive, R., Van Ours, J., & Zweimüller, J. (2006).

References

Cahuc, P., Carcillo, S. and Zylberberg, A. (2014). Labor Economics, Second Edition. Cambridge, MA: MIT Press. https://mitpress.mit.edu/books/labor-economics-second-edition

Lalive, R., Van Ours, J., & Zweimüller, J. (2006). How changes in financial incentives affect the duration of unemployment. The Review of Economic Studies, 73, 1009-1038. <doi:10.1111/j.1467-937X.2006.00406.x>

Rosenbaum, P. R. (2017). Imposing minimax and quantile constraints on optimal matching in observational studies. Journal of Computational and Graphical Statistics, 26, 66-78.

Rosenbaum, P. R. (2019). A highly adaptive test for matched observational studies. Manuscript.

Examples

data(lalive)
attach(lalive)
# covariate balance
boxplot(propensity~after,names=c("Before","After"),ylab="Propensity Score")
boxplot(age~after,names=c("Before","After"),ylab="Age")
boxplot(nwage_pj~after,names=c("Before","After"),ylab="Prior Wage")
table(after,seasonal)
# outcome
y<-log2((1+dur[after==1])/52)-log2((1+dur[after==0])/52)
boxplot(y,ylab="Pair Difference in base 2 logs",
    main="Unemployment Duration")
abline(h=c(-1,0,1),lty=2)
rm(y)
detach(lalive)