Title: | Methods and Examples from Introduction to the Theory of Observational Studies |
Description: | Supplements for a book, "iTOS" = "Introduction to the Theory of Observational Studies." Data sets are 'aHDL' from Rosenbaum (2023a) <doi:10.1111/biom.13558> and 'bingeM' from Rosenbaum (2023b) <doi:10.1111/biom.13921>. The function makematch() uses two-criteria matching from Zhang et al. (2023) <doi:10.1080/01621459.2021.1981337> to create the matched data 'bingeM' from 'binge'. The makematch() function also implements optimal matching (Rosenbaum (1989) <doi:10.2307/2290079>) and matching with fine or near-fine balance (Rosenbaum et al. (2007) <doi:10.1198/016214506000001059> and Yang et al (2012) <doi:10.1111/j.1541-0420.2011.01691.x>). The book makes use of two other R packages, 'weightedRank' and 'tightenBlock'. |
Authors: | Paul R. Rosenbaum [aut, cre] |
Maintainer: | Paul R. Rosenbaum <[email protected]> |
License: | GPL-2 |
Version: | 1.0.3 |
Built: | 2024-12-05 06:44:22 UTC |
Source: | CRAN |
The DESCRIPTION file:
Package: | iTOS |
Type: | Package |
Title: | Methods and Examples from Introduction to the Theory of Observational Studies |
Version: | 1.0.3 |
Authors@R: | person(given = c("Paul", "R."), family = "Rosenbaum", role = c("aut", "cre"), email = "[email protected]") |
Author: | Paul R. Rosenbaum [aut, cre] |
Maintainer: | Paul R. Rosenbaum <[email protected]> |
Description: | Supplements for a book, "iTOS" = "Introduction to the Theory of Observational Studies." Data sets are 'aHDL' from Rosenbaum (2023a) <doi:10.1111/biom.13558> and 'bingeM' from Rosenbaum (2023b) <doi:10.1111/biom.13921>. The function makematch() uses two-criteria matching from Zhang et al. (2023) <doi:10.1080/01621459.2021.1981337> to create the matched data 'bingeM' from 'binge'. The makematch() function also implements optimal matching (Rosenbaum (1989) <doi:10.2307/2290079>) and matching with fine or near-fine balance (Rosenbaum et al. (2007) <doi:10.1198/016214506000001059> and Yang et al (2012) <doi:10.1111/j.1541-0420.2011.01691.x>). The book makes use of two other R packages, 'weightedRank' and 'tightenBlock'. |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | stats, MASS, rcbalance, BiasedUrn, xtable |
Suggests: | weightedRank |
Depends: | R (>= 3.5.0) |
NeedsCompilation: | no |
Packaged: | 2024-09-02 19:37:06 UTC; rosenbap |
Repository: | CRAN |
Date/Publication: | 2024-09-05 16:00:02 UTC |
Index of help topics:
aHDL            Alcohol and HDL Cholesterol
addMahal        Rank-Based Mahalanobis Distance Matrix
addNearExact    Add a Near-exact Penalty to an Existing Distance Matrix
addcaliper      Add a Caliper to an Existing Cost Matrix
addinteger      Add an Integer Penalty to an Existing Distance Matrix
addquantile     Cut a Covariate at Quantiles and Add a Penalty for Different Quantile Categories
amplify         Amplification of sensitivity analysis in observational studies
binge           Binge Drinking and High Blood Pressure
bingeM          Binge Drinking and High Blood Pressure - Matched With Two Control Groups
computep        Computes individual and pairwise treatment assignment probabilities
ev              Computes the null expectation and variance for one stratum
evalBal         Evaluate Covariate Balance in a Matched Sample
evall           Compute expectations and variances for one stratum
gconv           Convolution of Two Probability Generating Functions
iTOS-package    Methods and Examples from Introduction to the Theory of Observational Studies
makematch       Two-Criteria Matching
makenetwork     Make the Network Used for Matching with Two Criteria
noether         Sensitivity Analysis Using Noether's Test for Matched Pairs
startcost       Initialize a Distance Matrix
zeta            zeta function in sensitivity analysis
Paul R. Rosenbaum [aut, cre]
Maintainer: Paul R. Rosenbaum <[email protected]>
Rosenbaum, Paul R. Introduction to the Theory of Observational Studies. Manuscript, 2024.
Rosenbaum, P. R. (1989) <doi:10.2307/2290079> Optimal matching for observational studies. Journal of the American Statistical Association, 84, 1024-1032.
Rosenbaum, Paul R., Richard N. Ross, and Jeffrey H. Silber (2007) <doi:10.1198/016214506000001059> Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association 102, 75-83.
Rosenbaum, P. R. (2023a) <doi:10.1111/biom.13558> Sensitivity analyses informed by tests for bias in observational studies. Biometrics 79, 475-487.
Rosenbaum, P. R. (2023b) <doi:10.1111/biom.13921> A second evidence factor for a second control group. Biometrics, 79, 3968-3980.
Yang, D., Small, D. S., Silber, J. H. and Rosenbaum, P. R. (2012) <doi:10.1111/j.1541-0420.2011.01691.x> Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes. Biometrics, 68, 628-636.
Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151.
data(binge)
table(binge$AlcGroup)
data(aHDL)
table(aHDL$grp)
For one covariate, adds a caliper to an existing cost matrix.
addcaliper(costmatrix, z, p, caliper = NULL, penalty = 1000, twostep = TRUE)
costmatrix |
An existing cost matrix with sum(z) rows and sum(1-z) columns. The function checks the compatibility of costmatrix, z and p; so, it may stop with an error if these are not of appropriate dimensions. In particular, costmatrix may come from startcost(). |
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of costmatrix refer to treated individuals and the columns refer to controls. |
p |
A vector with the same length as z. The vector p is the covariate for which a caliper is needed. |
caliper |
Determines the type and length of the caliper. The caliper becomes a vector cvec with length 2. If is.null(caliper), then the caliper is +/- 0.2 times the standard deviation of p, namely cvec = c(-.2,.2)*sd(p). If caliper is a single number, then the caliper is +/- caliper, or cvec = c(-1,1)*abs(caliper). If caliper is a vector of length 2, then an asymmetric caliper is used, cvec = c(min(caliper),max(caliper)), where min(caliper) must be negative and max(caliper) must be positive. |
penalty |
Let I be the index of the ith treated individual, i=1,...,sum(z), and let J be the index of the jth control, j=1,...,sum(1-z), so that 1 <= I <= length(z) and 1 <= J <= length(z). The penalty added to costmatrix[i,j] is 0 if cvec[1] <= p[I]-p[J] <= cvec[2]; otherwise, penalty is added to costmatrix[i,j]. |
twostep |
If twostep is FALSE, then no further action is taken. If twostep is TRUE, no further action is taken if 2*cvec[1] <= p[I]-p[J] <= 2*cvec[2]; otherwise, costmatrix[i,j] is further increased by adding penalty. In words, the penalty is doubled if p[I]-p[J] falls outside twice the caliper; see the sketch after this argument list. |
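The caliper and twostep rules above can be written out directly. The following is a minimal sketch under those stated rules, not the package's internal code; the name caliperSketch is hypothetical.

# Sketch of the caliper logic described above: penalize treated-control
# pairs whose difference in p falls outside cvec, and add the penalty a
# second time outside 2*cvec when twostep=TRUE.
caliperSketch <- function(costmatrix, z, p, caliper = NULL, penalty = 1000, twostep = TRUE) {
  # Turn the caliper argument into the vector cvec described above
  if (is.null(caliper)) cvec <- c(-0.2, 0.2) * sd(p)
  else if (length(caliper) == 1) cvec <- c(-1, 1) * abs(caliper)
  else cvec <- c(min(caliper), max(caliper))
  dif <- outer(p[z == 1], p[z == 0], "-")   # p[I]-p[J]: treated rows, control columns
  costmatrix <- costmatrix + penalty * ((dif < cvec[1]) | (dif > cvec[2]))
  if (twostep) {
    costmatrix <- costmatrix + penalty * ((dif < 2 * cvec[1]) | (dif > 2 * cvec[2]))
  }
  costmatrix
}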
For discussion of directional calipers, see Yu and Rosenbaum (2019).
A penalized costmatrix.
Paul R. Rosenbaum
Cochran, William G., and Donald B. Rubin. Controlling bias in observational studies: A review. Sankhya: The Indian Journal of Statistics, Series A 1973;35:417-446.
Yu, Ruoqi, and Paul R. Rosenbaum. <doi:10.1111/biom.13098> Directional penalties for optimal matching in observational studies. Biometrics 2019;75:1380-1390.
data(binge)
# Select two treated and three controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290)),]
z<-1*(d$AlcGroup=="B")
names(z)<-d$SEQN
attach(d)
x<-data.frame(age,female)
detach(d)
rownames(x)<-d$SEQN
dist<-startcost(z)
z
x
dist
# Ten-year age caliper
addcaliper(dist,z,x$age,caliper=10,twostep=FALSE)
# Ten-year age caliper with twostep=TRUE
addcaliper(dist,z,x$age,caliper=10,twostep=TRUE)
# Same ten-year age caliper with twostep=TRUE
addcaliper(dist,z,x$age,caliper=c(-10,10))
# Asymmetric, directional age caliper with twostep=TRUE
addcaliper(dist,z,x$age,caliper=c(-2,10))
# Treated 109315 aged 30 is more than 2 years younger
# than control 109273 aged 36, 30-36<(-2), so
# row 109315 column 109273 is penalized, indeed
# double penalized, as 30-36<2*(-2)
rm(z,x,dist,d)
Takes an integer-valued covariate and adds a penalty proportional to the difference in the integer values, with proportionality constant penalty.
addinteger(costmatrix, z, iscore, penalty = 1000)
costmatrix |
An existing cost matrix with sum(z) rows and sum(1-z) columns. The function checks the compatibility of costmatrix, z and iscore; so, it may stop with an error if these are not of appropriate dimensions. In particular, costmatrix may come from startcost(). |
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of costmatrix refer to treated individuals and the columns refer to controls. |
iscore |
A vector of integers with length equal to length(z). |
penalty |
One positive number used to penalize mismatches for iscore. |
If a treated and control individual differ on iscore in absolute value by dif, then the distance between them is increased by adding dif*penalty.
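A minimal sketch of that rule (illustrative, not the package's internal code; the name integerSketch is hypothetical):

# Sketch of the integer penalty: add penalty times the absolute difference
# in iscore between each treated individual (rows) and each control (columns).
integerSketch <- function(costmatrix, z, iscore, penalty = 1000) {
  dif <- abs(outer(iscore[z == 1], iscore[z == 0], "-"))
  costmatrix + penalty * dif
}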
A penalized distance matrix.
Paul R. Rosenbaum
data(binge)
# Select two treated and four controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290,109332)),]
attach(d)
z<-1*(AlcGroup=="B")
names(z)<-d$SEQN
rbind(z,education)
dist<-startcost(z)
addinteger(dist,z,education,penalty=3)
detach(d)
rm(d,dist,z)
Adds a rank-based Mahalanobis distance to an existing distance matrix.
addMahal(costmatrix, z, X)
costmatrix |
An existing cost matrix with sum(z) rows and sum(1-z) columns. The function checks the compatibility of costmatrix, z and X; so, it may stop with an error if these are not of appropriate dimensions. In particular, costmatrix may come from startcost(). |
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of costmatrix refer to treated individuals and the columns refer to controls. |
X |
A matrix with length(z) rows containing covariates. |
The rank-based Mahalanobis distance is defined in section 9.3 of Rosenbaum (2020).
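As a rough sketch of that construction, following the usual recipe in Rosenbaum (2020, Section 9.3) rather than the package's internal code (the name rankMahalSketch is hypothetical), each covariate is replaced by its ranks, the covariance of the ranks is rescaled so that ties do not shrink a covariate's influence, and Mahalanobis distances are then computed between treated and control rows:

# Sketch of a rank-based Mahalanobis distance matrix (treated x control).
rankMahalSketch <- function(z, X) {
  X <- as.matrix(X)
  X <- apply(X, 2, rank)                  # replace each covariate by its ranks
  cv <- cov(X)
  vuntied <- var(1:nrow(X))               # variance of untied ranks
  rat <- sqrt(vuntied / diag(cv))
  cv <- diag(rat) %*% cv %*% diag(rat)    # rescale so ties do not reduce a covariate's weight
  icv <- MASS::ginv(cv)                   # generalized inverse of the rank covariance
  Xt <- X[z == 1, , drop = FALSE]
  Xc <- X[z == 0, , drop = FALSE]
  out <- matrix(NA, nrow(Xt), nrow(Xc))
  for (i in 1:nrow(Xt)) {
    out[i, ] <- mahalanobis(Xc, Xt[i, ], icv, inverted = TRUE)
  }
  out
}

The resulting matrix could then be added to an existing cost matrix, which is what addMahal() does with costmatrix.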
A new distance matrix that is the sum of costmatrix and the rank-based Mahalanobis distances.
Paul R. Rosenbaum
Rosenbaum, P. R. (2020) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.
Rubin, D. B. (1980) <doi:10.2307/2529981> Bias reduction using Mahalanobis-metric matching. Biometrics, 36, 293-298.
data(binge)
# Select two treated and three controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290)),]
z<-1*(d$AlcGroup=="B")
names(z)<-d$SEQN
attach(d)
x<-cbind(age,female)
detach(d)
rownames(x)<-d$SEQN
dist<-startcost(z)
z
x
dist
dist<-addMahal(dist,z,x)
dist
rm(z,x,dist,d)
Add a Near-exact Penalty to an Existing Distance Matrix.
addNearExact(costmatrix, z, exact, penalty = 1000)
costmatrix |
An existing cost matrix with sum(z) rows and sum(1-z) columns. The function checks the compatibility of costmatrix, z and exact; so, it may stop with an error if these are not of appropriate dimensions. In particular, costmatrix may come from startcost(). |
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of costmatrix refer to treated individuals and the columns refer to controls. |
exact |
A vector with the same length as z. Typically, exact takes a small or moderate number of values. |
penalty |
One positive number. |
If the ith treated individual and the jth control have different values of exact, then the distance between them in costmatrix is increased by adding penalty.
A penalized distance matrix.
A sufficiently large penalty will maximize the number of individuals exactly matched for exact. A smaller penalty will tend to increase the number of individuals matched exactly, without prioritizing one covariate over all others.
If the left distance matrix is penalized, it will affect pairing and balance; however, if the right distance matrix is penalized it will affect balance only, as in the near-fine balance technique of Yang et al. (2012).
Adding several near-exact penalties for different covariates on the right distance matrix implements a Hamming distance on the joint distribution of those covariates, as discussed in Zhang et al. (2023).
Near-exact matching for a nominal covariate is discussed and contrasted with exact matching in Sections 10.3 and 10.4 of Rosenbaum (2020). Near-exact matching is always feasible, because it implements a constraint using a penalty. Exact matching may be infeasible, but when feasible it may be used to speed up computations. For an alternative method of speeding computations, see Yu et al. (2020) who identify feasible constraints very quickly prior to matching with those constraints.
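A minimal sketch of the penalty rule stated in the Details above (illustrative, not the package's internal code; the name nearExactSketch is hypothetical):

# Sketch of the near-exact penalty: add penalty wherever the treated
# individual (row) and the control (column) disagree on exact.
nearExactSketch <- function(costmatrix, z, exact, penalty = 1000) {
  mismatch <- outer(exact[z == 1], exact[z == 0], "!=")
  costmatrix + penalty * mismatch
}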
Paul R. Rosenbaum
Rosenbaum, P. R. (2020) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.
Yang, D., Small, D. S., Silber, J. H. and Rosenbaum, P. R. (2012) <doi:10.1111/j.1541-0420.2011.01691.x> Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes. Biometrics, 68, 628-636. (Extension of fine balance useful when fine balance is infeasible. Comes as close as possible to fine balance. Implemented in makematch() by placing a large near-exact penalty on a nominal/integer covariate x1 on the right distance matrix.)
Yu, R., Silber, J. H., Rosenbaum, P. R. (2020) <doi:10.1214/19-STS699> Matching Methods for Observational Studies Derived from Large Administrative Databases. Statistical Science, 35, 338-355.
Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2011) <doi:10.1198/tas.2011.11072> Matching for several sparse nominal variables in a case control study of readmission following surgery. The American Statistician, 65(4), 229-238.
data(binge)
# Select two treated and three controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290)),]
z<-1*(d$AlcGroup=="B")
names(z)<-d$SEQN
attach(d)
x<-data.frame(age,female)
detach(d)
rownames(x)<-d$SEQN
dist<-startcost(z)
z
x
dist
addNearExact(dist,z,x$female)
addNearExact(dist,z,x$age<40,penalty=10)
# Combine several penalties
dist<-addNearExact(dist,z,x$female)
dist<-addNearExact(dist,z,x$age<40,penalty=10)
dist
dist<-addNearExact(dist,z,x$age<60,penalty=5)
dist
# This distance suggests pairing 109315-109266
# and 109365-109290
rm(z,x,dist,d)
Cut a Covariate at Quantiles and Add a Penalty for Different Quantile Categories
addquantile(costmatrix, z, p, pct = c(0.2, 0.4, 0.6, 0.8), penalty = 1000)
costmatrix |
An existing cost matrix with sum(z) rows and sum(1-z) columns. The function checks the compatibility of costmatrix, z and p; so, it may stop with an error if these are not of appropriate dimensions. In particular, costmatrix may come from startcost(). |
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of costmatrix refer to treated individuals and the columns refer to controls. |
p |
A vector of length equal to length(z). Quantiles of p will penalize the distance. |
pct |
A vector of numbers strictly between 0 and 1. These determine the quantiles of p. For instance, c(.25,.5,.75) uses the quartiles of p. |
penalty |
One positive number used as a penalty. |
The vector p is cut at its quantiles defined by pct, and the integer difference in quantile categories is multiplied by penalty and added to the distance matrix. The function is similar to addinteger(), except the integer values are not specified, but rather are deduced from the quantiles.
If you cannot match for the quantile category of p, then addquantile() prefers to match from an adjacent quantile category.
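A minimal sketch of that description (illustrative, not the package's internal code; the name quantileSketch is hypothetical, and it assumes the quantile cutpoints are distinct):

# Sketch of addquantile(): cut p at the quantiles given by pct, convert
# the categories to integer scores, and penalize as addinteger() would.
quantileSketch <- function(costmatrix, z, p, pct = c(0.2, 0.4, 0.6, 0.8), penalty = 1000) {
  cuts <- quantile(p, pct)
  iscore <- as.integer(cut(p, c(-Inf, cuts, Inf)))  # quantile category as an integer
  dif <- abs(outer(iscore[z == 1], iscore[z == 0], "-"))
  costmatrix + penalty * dif
}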
A penalized distance matrix.
Paul R. Rosenbaum
data(binge)
d<-binge[binge$AlcGroup!="N",]
attach(d)
z<-1*(AlcGroup=="B")
names(z)<-SEQN
dist<-startcost(z)
quantile(age,c(1/4,1/2,3/4))
rbind(z,age)[,1:20]
addquantile(dist,z,d$age,pct=c(1/4,1/2,3/4),penalty=5)[1:5,1:7]
detach(d)
rm(z,dist,d)
A small observational study of light daily alcohol consumption and HDL cholesterol – so-called good cholesterol – derived from NHANES 2013-2014 and 2015-2016. There are 406 matched sets of four individuals, making 1624 individuals in total. Sets were matched for age, female and education in five ordered categories.
data("aHDL")
data("aHDL")
A data frame with 1624 observations on the following 11 variables.
nh
NHANES 2013-2014 is 1314, and NHANES 2015-2016 is 1516
SEQN
NHANES ID number
age
Age in years
female
1=female, 0=male
education
1 is <9th grade, 3 is high school, 5 is a BA degree
z
1=light almost daily alcohol, 0=little or no alcohol last year.
grp
Treated group and control groups. Daily=light almost daily alcohol, Never=fewer than 12 drinks during entire life, Rarely=more than 12 drinks in life, but fewer than 12 in the past year, and never had a period of daily binge drinking, PastBinge = a past history of binge drinking on most days, but currently drinks once a week or less. For details, see Rosenbaum (2022a, Appendix).
grpL
Short labels for plotting, formed as the first letters of grp. An ordered factor with levels D < N < R < B.
hdl
HDL cholesterol level mg/dL
mmercury
Methylmercury level ug/L
mset
Matched set indicator, 1, 2, ..., 406. The 1624 observations are in 406 matched sets, each of size 4.
There is a debate about whether light daily alcohol consumption – a single glass of red wine – shortens or lengthens life. LoConte et al. (2018) emphasize that alcohol is a carcinogen. Suh et al. (1992) claim reduced cardiovascular mortality brought about by an increase in high-density lipoprotein (HDL) cholesterol, the so-called good cholesterol. There is ongoing debate about whether there are cardiovascular benefits, and if they exist, whether they are large enough to offset an increased risk of cancer. This example looks at a small corner of the larger debate, namely the effect on HDL cholesterol.
The example contains several attempts to detect unmeasured confounding bias, if present. There is a secondary outcome thought to be unaffected by alcohol consumption, namely methylmercury levels in the blood, likely an indicator of the consumption of fish, not of alcohol; see Pedersen et al. (1994) and WHO (2021). There are also three control groups, all with little present alcohol consumption, but with different uses of alcohol in the past; see the definition of variable grp above.
The appendix to Rosenbaum (2023) describes the data and matching in detail. It is used as an example in Rosenbaum (2022).
The help file for boxplotTT() applies the tail transformation to this example, reproducing a plot from Rosenbaum (2022).
This data set is also included in the tailTransform package. See also the informedSen package which contains a part of this data set.
US National Health and Nutrition Examination Survey (NHANES), 2013-2014 and 2015-2016.
LoConte, N. K., Brewster, A. M., Kaur, J. S., Merrill, J. K., and Alberg, A. J. <doi:10.1200/JCO.2017.76.1155> Alcohol and cancer: a statement of the American Society of Clinical Oncology. Journal of Clinical Oncology 2018;36:83-93.
Pedersen, G. A., Mortensen, G. K. and Larsen, E. H. <doi:10.1080/02652039409374234> Beverages as a source of toxic trace element intake. Food Additives and Contaminants, 1994;11:351–363.
Rosenbaum, P. R. (1987) <doi:10.1214/ss/1177013232> The role of a second control group in an observational study. Statistical Science, 1987;2:292-306.
Rosenbaum, P. R. <doi:10.2307/2531497> The role of known effects in observational studies. Biometrics, 1989;45:557-569.
Rosenbaum, P. R. <doi:10.1214/aos/1176347131> On permutation tests for hidden biases in observational studies. The Annals of Statistics, 1989;17:643-653.
Rosenbaum, P. R. <doi:10.1080/01621459.2013.879261> Weighted M-statistics with superior design sensitivity in matched observational studies with multiple controls. Journal of the American Statistical Association, 2014;109(507):1145-1158
Rosenbaum, P. R. <doi:10.1080/00031305.2022.2063944> (2022). A new transformation of treated-control matched-pair differences for graphical display. American Statistician, 2022;76(4):346-352.
Rosenbaum, P. R. <doi:10.1111/biom.13558> Sensitivity analyses informed by tests for bias in observational studies. Biometrics 2023;79(1):475-487.
Suh, I., Shaten, B. J., Cutler, J. A., and Kuller, L. H. <doi:10.7326/0003-4819-116-11-881> Alcohol use and mortality from coronary heart disease: The role of high-density lipoprotein cholesterol. Annals of Internal Medicine 1992;116:881-887.
World Health Organization (2021). Mercury and Health, <https://www.who.int/news-room/fact-sheets/detail/mercury-and-health>, (Accessed 30 August 2021).
data(aHDL)
table(aHDL$grp,aHDL$grpL) # Short labels for plotting
boxplot(aHDL$age~aHDL$grp,xlab="Group",ylab="Age")
boxplot(aHDL$education~aHDL$grp,xlab="Group",ylab="Education")
table(aHDL$female,aHDL$grpL)
table(aHDL$z,aHDL$grpL)
#
# In iTOS book, for Table 8.1
y<-t(matrix(aHDL$hdl,4,406))
weightedRank::wgtRank(y,gamma=5,phi="wilc")
weightedRank::wgtRank(y,gamma=5,phi="quade")
weightedRank::wgtRank(y,gamma=5,phi="u868")
weightedRank::wgtRank(y,gamma=5,phi="u878")
#
# The sets were also matched for is.na(aHDL$mmercury), for use
# in Rosenbaum (2023). About half of the matched sets
# have values for mmercury. Discussed in Chapter 12 of iTOS.
table(is.na(aHDL$mmercury),aHDL$grp)
# See also the informedSen package for additional analysis
Uses the method in Rosenbaum and Silber (2009) to interpret a value of the sensitivity parameter gamma. Each value of gamma amplifies to a curve (lambda,delta) in a two-dimensional sensitivity analysis, the inference being the same for all points on the curve. That is, a one-dimensional sensitivity analysis in terms of gamma has a two-dimensional interpretation in terms of (lambda,delta).
amplify(gamma, lambda)
gamma |
gamma > 1 is the value of the sensitivity parameter, for instance the parameter in senmv. length(gamma)>1 will generate an error. |
lambda |
lambda is a vector of values > gamma. An error will result unless lambda[i] > gamma > 1 for every i. |
A single value of gamma, say gamma = 2.2 in the example, corresponds to a curve of values of (lambda, delta), including (3, 7), (4, 4.33), (5, 3.57), and (7, 3) in the example. An unobserved covariate that is associated with a lambda = 3 fold increase in the odds of treatment and a delta = 7 fold increase in the odds of a positive pair difference is equivalent to gamma = 2.2.
The curve is gamma = (lambda*delta + 1)/(lambda+delta). Amplify is given one gamma and a vector of lambdas and solves for the vector of deltas. The calculation is elementary.
This interpretation of gamma is developed in detail in Rosenbaum and Silber (2009), and it makes use of Wolfe's (1974) family of semiparametric deformations of an arbitrary symmetric distribution. See also Rosenbaum (2020, Section 3.6). For an elementary discussion, see Rosenbaum (2017, Table 9.1).
Strictly speaking, the amplification describes matched pairs, not matched sets. The senm function views a k-to-1 matched set with k controls matched to one treated individual as a collection of k correlated treated-minus-control matched pair differences; see Rosenbaum (2007). For matched sets, it is natural to think of the amplification as describing any one of the k matched pair differences in a k-to-1 matched set.
The curve has asymptotes that the function amplify does not compute: gamma corresponds with (lambda,delta) = (gamma, Inf) and (Inf, gamma).
A related though distinct idea is developed in Gastwirth et al. (1998). The two approaches agree when the outcome is binary, that is, for McNemar's test.
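The elementary calculation mentioned in the Details can be written out directly. The following sketch (the name amplifySketch is hypothetical, not the package's internal code) solves gamma = (lambda*delta + 1)/(lambda + delta) for delta, which is what amplify() returns:

# Solve gamma = (lambda*delta + 1)/(lambda + delta) for delta,
# requiring every lambda > gamma > 1 as described above.
amplifySketch <- function(gamma, lambda) {
  stopifnot(length(gamma) == 1, gamma > 1, all(lambda > gamma))
  delta <- (gamma * lambda - 1) / (lambda - gamma)
  names(delta) <- lambda
  delta
}
amplifySketch(4, 7)   # delta = 9, agreeing with the example amplify(4,7) below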
Returns a vector of values of delta of length(lambda) with names lambda.
The amplify function is also in the sensitivitymv package where a different example is used.
Paul R. Rosenbaum
Gastwirth, J. L., Krieger, A. M., Rosenbaum, P. R. (1998) <doi:10.1093/biomet/85.4.907> Dual and simultaneous sensitivity analysis for matched pairs. Biometrika, 85, 907-920.
Rosenbaum, P. R. and Silber, J. H. (2009) <doi:10.1198/jasa.2009.tm08470> Amplification of sensitivity analysis in observational studies. Journal of the American Statistical Association, 104, 1398-1405.
Rosenbaum, P. R. (2017) <doi:10.4159/9780674982697> Observation and Experiment: An Introduction to Causal Inference. Cambridge, MA: Harvard University Press. Table 9.1.
Rosenbaum, P. R. (2020) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd ed.) NY: Springer. Section 3.6.
Wolfe, D. A. (1974) <doi:10.2307/2286025> A characterization of population weighted symmetry and related results. Journal of the American Statistical Association, 69, 819-822.
# Consider a treated-control matched pair as the unit of measure,
# analogous to one meter or one foot. The calculation
# amplify(4,7) says that, in a matched pair, gamma=4
# is the same as a bias that increases the odds of treatment
# 7-fold and increases the odds of a positive matched-pair
# difference in outcomes 9-fold.
amplify(4,7)
# It is also true that, in a matched pair, gamma=4
# is the same as a bias that increases the odds of treatment
# 9-fold and increases the odds of a positive matched-pair
# difference in outcomes 7-fold.
amplify(4,9)
# It is also true that, in a matched pair, gamma=4
# is the same as a bias that increases the odds of treatment
# 5-fold and increases the odds of a positive matched-pair
# difference in outcomes 19-fold.
amplify(4,5)
# The amplify function can produce the entire curve at once:
amplify(4,5:19)
These unmatched data are from NHANES, and they illustrate multivariate matching. The matched version of the data is bingeM, and it was produced by the example in the documentation for the makematch() function.
data("binge")
data("binge")
A data frame with 4627 observations on the following 17 variables.
SEQN
NHANES identification number
age
Age in years
ageC
Age cut into four categories, [20,30), [30,45), [45,60) and [60,Inf).
female
1=female, 0=male
educationf
Education in five categories: <9th grade, 9th-11th grade without a high school degree or equivalent, high school degree or equivalent, some college, at least a BA degree. An ordered factor with levels <9th < 9-11 < HS < SomeCol < >=BA.
education
The previous variable, educationf, as integer scores, 1-5.
bmi
BMI or body mass index. A measure of obesity.
waisthip
Waist-to-hip ratio. A measure of obesity.
vigor
1 if engages in vigorous activity, either in recreation or at work, 0 otherwise
smokenowf
Do you smoke now? An ordered factor with levels Everyday < SomeDays < No.
smokenow
The previous variable, smokenowf, as integer scores.
bpRX
1=currently taking medication to control high blood pressure, 0=other
smokeQuit
1=used to smoke regularly but quit, 0=other. A current smoker and a never smoker both have value 0.
AlcGroup
An ordered factor with levels B < N < P.
bpSystolic
Systolic blood pressure. The average of up to three measurements.
bpDiastolic
Diastolic blood pressure. The average of up to three measurements.
bpCombined
A combined measure of blood pressure.
This data set is intended to illustrate multivariate matching. See the example in the documentation for the makematch() function.
In the examples below, the simple B-N match in Section 4.3 of the iTOS book is constructed. It matches for the propensity score and nothing else.
bpCombined is the sum of two standardized measures, one for systolic blood pressure and one for diastolic blood pressure. In the larger NHANES data set of individuals at least 20 years of age who are not pregnant, the median and the mad (=median absolute deviation from the median) were determined separately for systolic and diastolic blood pressure. The calculation used the median and mad functions in the 'stats' package, so the mad was by default scaled to resemble the standard deviation for a Normal distribution. bpCombined is the sum of two quantities: (systolic blood pressure minus its median) divided by its mad, plus (diastolic blood pressure minus its median) divided by its mad.
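As a hedged sketch of that construction (the vectors sysBP and diaBP below stand in for the larger NHANES variables, so the names are illustrative, not the exact code used to build the data set):

# Illustrative sketch of how bpCombined is formed from systolic and
# diastolic blood pressure, each centered at its median and scaled by
# its mad (mad() is scaled to resemble a Normal standard deviation).
combineBP <- function(sysBP, diaBP) {
  (sysBP - median(sysBP)) / mad(sysBP) + (diaBP - median(diaBP)) / mad(diaBP)
}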
The data are from the US National Health and Nutrition Examination Survey, NHANES 2017-March 2020 Pre-pandemic. The 2017-2020 data were affected by COVID-19 and are not a survey. The complete data are available from the CDC web page. With minor differences, the data were used as an example in Rosenbaum (2023).
Rosenbaum, P. R. (2023) <doi:10.1111/biom.13921> A second evidence factor for a second control group. Biometrics, 79(4), 3968-3980.
Rosenbaum, P.R. and Rubin, D.B. (1985) <doi:10.2307/2683903> Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1):33-38.
US National Health and Nutrition Examination Survey, 2017-2020. Atlanta: US Centers for Disease Control.
data(binge)
attach(binge)
# A simple goal is to match each of the 206 binge drinkers
# to one control from each of the two control groups.
table(AlcGroup)
# The matching ratios are very different, 19-to-1 or 2.4 to 1.
table(AlcGroup)/206
# Before matching, the age distributions are very different.
boxplot(age~AlcGroup)
# NHANES does not ask every question of every person, often
# using either age or sex to determine what will be asked.
# If we match exactly for age categories that determine what
# will be asked, then both members of a matched pair will
# either have an answer or no answer, and this simplifies
# some analyses, while also beginning the work of controlling
# an important covariate.
table(AlcGroup,ageC)
# Age is going to be a problem for the P controls. There
# are 29 bingers B under age 30, but only 19 past bingers P.
table(AlcGroup,smokenowf)
# Smoking is a challenge, too. The 19-to-1 matching ratio
# for N controls drops to about 4 = 364/92 for everyday
# smokers, and from 2.43 to 1.44 = 133/92 for P controls.
# There are too few P controls who smoke SomeDays to
# match exactly with bingers B.
detach(binge)
####################################################
# This example produces the elementary match in
# Section 4.3 of the iTOS book.
# It matches binge drinkers (B) to never binge
# controls (N), matching just for the propensity
# score. The match uses both a caliper and a fine
# balance constraint for the propensity score.
# Generally, one does not just match for the
# propensity score, but this is an example.
# See also the documentation for makematch.
library(iTOS)
data(binge)
# Make the treatment indicator z for B versus N
z<-rep(NA,dim(binge)[1])
z[binge$AlcGroup=="B"]<-1
z[binge$AlcGroup=="N"]<-0
# I find it convenient to write a small function
# to create a match. In that way, small
# changes in the function can improve the
# match by improving covariate balance.
# It also documents the details of the match.
# It also makes it possible to reproduce the match.
matchPropensity<-function(z,ncontrols=1){
  #
  # Some bookkeeping.
  # Select the B versus N part of the binge data
  dt<-binge[!is.na(z),]
  z<-z[!is.na(z)]
  # Sort data, placing treated (B) first,
  # and then ordered by SEQN.
  dt<-dt[order(1-z,dt$SEQN),]
  z<-z[order(1-z,dt$SEQN)]
  rownames(dt)<-dt$SEQN
  names(z)<-dt$SEQN
  # Initialize the distance matrix on
  # the left and right to zero distances
  left<-startcost(z)
  right<-startcost(z)
  # Create the propensity score
  attach(dt)
  propmod<-glm(z~age+female+education+smokenow+smokeQuit+bpRX+
                 bmi+vigor+waisthip,family=binomial)
  p<-propmod$fitted.values
  # The left distance matrix is changed from zero
  # by adding a caliper on the propensity score.
  # The caliper is almost the one used in
  # Rosenbaum and Rubin (1985, American Statistician).
  # If two individuals differ on the propensity score
  # by more than 0.2 times the standard deviation of p,
  # the distance between them is increased from 0 to 10.
  # That was the caliper in Rosenbaum and Rubin (1985).
  # By default, addcaliper doubles the distance (to 20) at
  # 0.4 = 2 x 0.2 times the standard deviation of p.
  # Without this doubling feature, the caliper views
  # everyone who violates the 0.2 caliper as equivalent.
  left<-addcaliper(left,z,p,penalty=10)
  # The right distance matrix now adds a fine balance
  # constraint. The variable (p>0.05)+(p>.1)+(p>.15)+(p>.2)
  # takes integer values from 0 to 4, taking steps up as
  # the propensity score increases.
  # The right distance matrix is 0 if two people fall in
  # the same category, is 1000 if they fall in adjacent
  # categories, 2000 if there is a category
  # between them, and so on. Because 1000 is so much
  # larger than 10, the fine balance constraint takes
  # priority over the caliper.
  right<-addinteger(right,z,(p>0.05)+(p>.1)+(p>.15)+(p>.2))
  # Some more bookkeeping
  detach(dt)
  dt<-cbind(dt,z,p)
  # The big step: Use the distance matrices to make the match
  m<-makematch(dt,left,right,ncontrols=ncontrols)
  # Final bookkeeping
  m$mset<-as.integer(m$mset)
  treated<-m$SEQN[m$z==1]
  treated<-as.vector(t(matrix(treated,length(treated),ncontrols+1)))
  m<-cbind(m,treated)
  list(m=m,dt=dt)
}
# Call the function above to make the match
mProp<-matchPropensity(z)
m<-mProp$m
# Make Table 4.3 in the iTOS book.
t.test(m$age~m$z)$p.value
t.test(m$female~m$z)$p.value
t.test(m$education~m$z)$p.value
t.test(m$bmi~m$z)$p.value
t.test(m$waisthip~m$z)$p.value
t.test(m$vigor~m$z)$p.value
t.test(m$smokenow~m$z)$p.value
t.test(m$smokeQuit~m$z)$p.value
t.test(m$bpRX~m$z)$p.value
t.test(m$p~m$z)$p.value
dt<-mProp$dt
tr<-m$z==1
co<-m$z==0
un<-!is.element(dt$SEQN,m$SEQN)
vnames<-c("age","female","education","bmi","waisthip","vigor",
          "smokenow","smokeQuit","bpRX","p")
o<-matrix(NA,10,3)
pval<-rep(NA,10)
names(pval)<-vnames
rownames(o)<-vnames
colnames(o)<-c("Treated","Control","Unmatched")
for (i in 1:10){
  vname<-vnames[i]
  vm<-as.vector(m[,colnames(m)==vname])
  vdt<-as.vector(dt[,colnames(dt)==vname])
  o[i,1]<-mean(vm[tr])
  o[i,2]<-mean(vm[co])
  o[i,3]<-mean(vdt[un])
  pval[i]<-t.test(vm[tr],vm[co])$p.value
}
library(xtable)
o<-cbind(o,pval)
xtable(o,digits=c(NA,2,2,2,3))
m[5:6,]
The bingeM data set is the matched data set built from the unmatched binge data using the makematch() function. The documentation for the makematch() function builds bingeM from binge.
data("bingeM")
data("bingeM")
A data frame with 618 observations on the following 20 variables.
SEQN
NHANES identification number
age
Age in years
ageC
Age cut into four categories, [20,30), [30,45), [45,60) and [60,Inf).
female
1=female, 0=male
educationf
Education in five categories: <9th grade, 9th-11th grade without a high school degree or equivalent, high school degree or equivalent, some college, at least a BA degree. An ordered factor with levels <9th < 9-11 < HS < SomeCol < >=BA.
education
The previous variable, educationf, as integer scores, 1-5.
bmi
BMI or body mass index. A measure of obesity.
waisthip
Waist-to-hip ratio. A measure of obesity.
vigor
1 if engages in vigorous activity, either in recreation or at work, 0 otherwise
smokenowf
Do you smoke now? An ordered factor with levels Everyday < SomeDays < No.
smokenow
The previous variable, smokenowf, as integer scores.
bpRX
1=currently taking medication to control high blood pressure, 0=other
smokeQuit
1=used to smoke regularly but quit, 0=other. A current smoker and a never smoker both have value 0.
AlcGroup
An ordered factor with levels B < N < P.
bpSystolic
Systolic blood pressure. The average of up to three measurements.
bpDiastolic
Diastolic blood pressure. The average of up to three measurements.
bpCombined
A combined measure of blood pressure.
z
Treatment/control indicator, z[i]=1 if i is in AlcGroup category B and z[i]=0 otherwise.
mset
Indicator of the matched set, 1, 2, ..., 206.
treated
The SEQN for the treated individual in this matched set.
bingeM is the matched data set built from the unmatched binge data using the makematch() function. The documentation for the makematch() function builds bingeM from binge.
bpCombined is the sum of two standardized measures, one for systolic blood pressure and one for diastolic blood pressure. In the larger NHANES data set of individuals at least 20 years of age who are not pregnant, the median and the mad (=median absolute deviation from the median) were determined separately for systolic and diastolic blood pressure. The calculation used the median and mad functions in the 'stats' package, so the mad was by default scaled to resemble the standard deviation for a Normal distribution. bpCombined is the sum of two quantities: (systolic blood pressure minus its median) divided by its mad, plus (diastolic blood pressure minus its median) divided by its mad.
The data are from the US National Health and Nutrition Examination Survey, NHANES 2017-March 2020 Pre-pandemic. The 2017-2020 data were affected by COVID-19 and are not a survey. The complete data are available from the CDC web page. With minor differences, the data were used as an example in Rosenbaum (2023).
Rosenbaum, P. R. (2023) <doi:10.1111/biom.13921> A second evidence factor for a second control group. Biometrics, 79(4), 3968-3980.
US National Health and Nutrition Examination Survey, 2017-2020. Atlanta: US Centers for Disease Control.
data(bingeM)
table(bingeM$AlcGroup)
tapply(bingeM$age,bingeM$AlcGroup,median)
tapply(bingeM$bmi,bingeM$AlcGroup,median)
tapply(bingeM$education,bingeM$AlcGroup,mean)
tapply(bingeM$female,bingeM$AlcGroup,mean)
tapply(bingeM$smokenow,bingeM$AlcGroup,mean)
tapply(bingeM$vigor,bingeM$AlcGroup,mean)
tapply(bingeM$smokeQuit,bingeM$AlcGroup,mean)
boxplot(bingeM$bpCombined~bingeM$AlcGroup,ylab="Combined BP",
        xlab="Alcohol Group",las=1)
y<-t(matrix(bingeM$bpCombined,3,206))
weightedRank::wgtRank(y,gamma=2)
amplify(2,3)
# In iTOS book, comparison of Tables 13.2 and 13.3
weightedRank::ef2C(y,gamma=2,upsilon=2,m1=c(6,6),range=TRUE,scores=1:3)
weightedRank::ef2C(y,gamma=2,upsilon=2.3,m1=c(6,8),range=FALSE,scores=c(1,2,5))
Of limited interest to most users, the computep function plays an internal role in 2-sample and stratified sensitivity analyses. The computep function implements equations (9) and (10), page 496, of Rosenbaum and Krieger (1990). The computep function is from the senstrat package.
computep(bigN, n, m, g)
bigN |
Total sample size in this stratum. |
n |
Treated sample size in this stratum. |
m |
The number of 1's in the vector u of unobserved covariates. Here, u has bigN-m 0's followed by m 1's. |
g |
The sensitivity parameter |
p1 |
Equation (9), page 496, in Rosenbaum and Krieger (1990) evaluated with u[i]=1. |
p0 |
Equation (9), page 496, in Rosenbaum and Krieger (1990) evaluated with u[i]=0. |
p11 |
Equation (10), page 496, in Rosenbaum and Krieger (1990) evaluated with u[i]=1, u[j]=1. |
p10 |
Equation (10), page 496, in Rosenbaum and Krieger (1990) evaluated with u[i]=1, u[j]=0. |
p00 |
Equation (10), page 496, in Rosenbaum and Krieger (1990) evaluated with u[i]=0, u[j]=0. |
The function computep is called by the function ev.
Paul R. Rosenbaum
Rosenbaum, P. R. and Krieger, A. M. (1990) <doi:10.2307/2289789> Sensitivity of two-sample permutation inferences in observational studies. Journal of the American Statistical Association, 85, 493-498.
Rosenbaum, P. R. (2002). Observational Studies (2nd edition). New York: Springer. Section 4.6.
computep(10,5,6,2)
Of limited interest to most users, the ev function plays an internal role in 2-sample and stratified sensitivity analyses. The expectation and variance returned by the ev function are defined in the third paragraph of section 4, page 495, of Rosenbaum and Krieger (1990).
ev(sc, z, m, g, method)
sc |
A vector of scored outcomes for one stratum. For instance, for Wilcoxon's rank sum test, these would be the ranks of the outcomes in the current stratum. |
z |
Treatment indicators, with length(z)=length(sc). Here, z[i]=1 if i is treated and z[i]=0 if i is control. |
m |
The unobserved covariate u has length(z)-m 0's followed by m 1's. |
g |
The sensitivity parameter |
method |
If method="RK" or if method="BU", exact expectations and variances are used in a large sample approximation. Methods "RK" and "BU" should give the same answer, but "RK" uses formulas from Rosenbaum and Krieger (1990), while "BU" obtains exact moments for the extended hypergeometric distribution using the BiasedUrn package and then applies Proposition 20, page 155, section 4.7.4 of Rosenbaum (2002). In contrast, method="LS" does not use exact expectations and variances, but rather uses the large sample approximations in section 4.6.4 of Rosenbaum (2002). Finally, method="AD" uses method="LS" for large strata and method="BU" for smaller strata. |
The function ev() is called by the function evall(). The ev() function is from the senstrat package.
expect |
Null expectation of the test statistic. |
vari |
Null variance of the test statistic. |
Paul R. Rosenbaum
Rosenbaum, P. R. and Krieger, A. M. (1990) <doi:10.2307/2289789> Sensitivity of two-sample permutation inferences in observational studies. Journal of the American Statistical Association, 85, 493-498.
Rosenbaum, P. R. (2002). Observational Studies (2nd edition). New York: Springer. Section 4.6.
Rosenbaum, P. R. (2018) <doi:10.1214/18-AOAS1153> Sensitivity analysis for stratified comparisons in an observational study of the effect of smoking on homocysteine levels. The Annals of Applied Statistics, 12(4), 2312-2334.
ev(1:5,c(0,1,0,1,0),3,2,"RK") ev(1:5,c(0,1,0,1,0),3,2,"BU")
ev(1:5,c(0,1,0,1,0),3,2,"RK") ev(1:5,c(0,1,0,1,0),3,2,"BU")
The covariate balance in a matched sample is compared to the balance that would have been obtained in a completely randomized experiment built from the same people. The existing matched sample is randomized to treatment or control many times, and various measures of covariate balance are computed from the one matched sample and the many randomized experiments. The main elements of the method are from Hansen and Bowers (2008), Pimentel et al. (2015, Table 1), and Yu (2021).
evalBal(z, x, statistic = "s", reps = 1000, trunc = 0.2, nunique = 2, alpha = 0.05)
z |
z is a vector, with z[i]=1 for treated and z[i]=0 for control |
x |
x is a matrix or a data-frame containing covariates with no NAs. An error will result if length(z) does not equal the number of rows of x. |
statistic |
If statistic="t", the default two-sample t-test from the 'stats' package computes a two-sided P-value for each covariate in the matched sample and the many randomized experiments. If statistic="w", then the two-sample Wilcoxon rank sum test is used, as implemented in the 'stats' package. If statistic="s", the two-sample Wilcoxon rank sum test is used for numeric covariates with more than nunique distinct values, or the chi-square test for a two-way table is used for factors and for numeric covariates with at most nunique distinct values. The default value is nunique=2. |
reps |
A positive integer. A total of reps randomized experiments are compared to the one matched sample. |
trunc |
For each simulated randomized experiment, a P-value is computed for each covariate. Also computed is the statistic proposed by Zaykin et al. (2002), defined as the product of those covariate-specific P-values that do not exceed trunc. This truncated product is not a P-value, but it is a statistic. See Details. |
nunique |
See the option statistic="s" above. |
alpha |
For each simulated randomized experiment, a P-value is computed for each covariate. Also computed is the number of these P-values that are less than or equal to alpha. |
Truncated Product: For independent uniform P-values, Zaykin et al. (2002) derive a true P-value from the null distribution of their truncated product of P-values. That null distribution can be computed using the truncatedP() function in the 'sensitivitymv' package; however, it is not used here, because the P-values for dependent covariates are not independent. Rather, the actual randomization distribution of the truncated product is simulated. Taking trunc=1 yields Fisher's statistic, namely the product of all of the P-values.
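A minimal sketch of the truncated product statistic described above (illustrative, not the internal code of evalBal(); the name truncProd is hypothetical):

# Zaykin et al. (2002) truncated product: the product of those P-values
# that do not exceed trunc; it equals 1 when no P-value is <= trunc,
# and taking trunc=1 gives Fisher's product of all P-values.
truncProd <- function(pvals, trunc = 0.2) {
  prod(pvals[pvals <= trunc])
}
truncProd(c(0.03, 0.15, 0.7), trunc = 0.2)   # 0.03 * 0.15 = 0.0045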
test.name |
The name of the test used. |
actual |
For each covariate, the usual two sample P-values comparing the distributions of treated and control groups for each covariate in x. Also, the minimum P-value, the truncated product of P-values, and the number of P-values less than or equal to alpha – none of these quantities is a P-value. |
simBetter |
Comparison of the covariate imbalance in the actual matched sample and the many simulated randomized experiments: for each balance criterion, the number of the reps randomized experiments that were strictly better balanced than the one matched observational study. |
sim |
Details of the simulated randomized experiments. |
Paul R. Rosenbaum
Hansen, B. B., and Bowers, J. (2008) <doi:10.1214/08-STS254> Covariate balance in simple, stratified and clustered comparative studies. Statistical Science, 23, 219-236.
Pimentel, S. D., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2015) <doi:10.1080/01621459.2014.997879> Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. Journal of the American Statistical Association, 110, 515-527.
Yu, R. (2021) <doi:10.1111/biom.13098> Evaluating and improving a matched comparison of antidepressants and bone density. Biometrics, 77(4), 1276-1288.
Zaykin, D. V., Zhivotovsky, L. A., Westfall, P. H. and Weir, B. S. (2002) <doi:10.1002/gepi.0042> Truncated product method of combining P-values. Genetic Epidemiology, 22, 170-185.
# Evaluate the balance in the bingeM matched sample.
# The more difficult control group, P, will be evaluated.
data(bingeM)
attach(bingeM)
xBP<-data.frame(age,female,education,bmi,waisthip,vigor,smokenow,bpRX,smokeQuit)
xBP<-xBP[bingeM$AlcGroup!="N",]
detach(bingeM)
z<-bingeM$z[bingeM$AlcGroup!="N"]
# In a serious evaluation, take reps=1000 or reps=10000.
# For a quick example, reps is set to reps=100 here.
set.seed(5)
balBP<-evalBal(z,xBP,reps=100)
balBP$test.name
# This says that age is compared using the Wilcoxon two-sample test,
# and female is compared using the chi-square test for a 2x2 table.
# Because the default, nunique=2, was used, education was evaluated
# using Wilcoxon's test; however, changing nunique to 5 would evaluate
# the 5 levels of education using a chi-square test for a 2x5 table.
balBP$actual
# In the matched sample, none of the 9 covariates has a P-value
# of 0.05 or less. The smallest of the 9 P-values is .366, and
# their truncated product is 1, because, by definition, the truncated
# product is 1 if all of the P-values are above trunc.
apply(balBP$sim,2,median)
# In the simulated randomized experiments, the median of the 100
# P-values is close to 1/2 for all covariates.
balBP$simBetter
# Of the 100 simulated randomized experiments, only 3 were better
# balanced than the matched sample in terms of the minimum P-value,
# and none were better balanced in terms of the truncated product
# of P-values.
#
# There were too few controls in the P control group who smoked
# on somedays to match exactly for smokenow. Nonetheless, only
# 13/100 randomized experiments were better balanced for smokenow.
#
# Now compare the binge group B to the combination of the two
# control groups.
attach(bingeM)
x<-data.frame(age,female,education,bmi,waisthip,vigor,smokenow,bpRX,smokeQuit)
detach(bingeM)
set.seed(5)
balAll<-evalBal(bingeM$z,x,reps=100,trunc=1)
balAll$actual
balAll$simBetter
# This time, Fisher's product of all P-values is used, with trunc=1.
# In terms of the minimum P-value and the product of P-values,
# none of the 100 randomized experiments is better balanced than the
# matched sample.
Of limited interest to most users, the evall() function plays an internal role in 2-sample and stratified sensitivity analyses. The expectation and variance returned by the evall() function are defined in the third paragraph of section 4, page 495, of Rosenbaum and Krieger (1990). The function evall() calls the function ev() to determine the expectation and variance of the test statistic for an unobserved covariate u with length(z)-m 0's followed by m 1's, doing this for m=1,...,length(z)-1.
evall(sc, z, g, method)
sc |
A vector of scored outcomes for one stratum. For instance, for Wilcoxon's rank sum test, these would be the ranks of the outcomes in the current stratum. |
z |
Treatment indicators, with length(z)=length(sc). Here, z[i]=1 if i is treated and z[i]=0 if i is control. |
g |
The sensitivity parameter gamma, a number greater than or equal to 1. |
method |
If method="RK" or if method="BU", exact expectations and variances are used in a large sample approximation. Methods "RK" and "BU" should give the same answer, but "RK" uses formulas from Rosenbaum and Krieger (1990), while "BU" obtains exact moments for the extended hypergeometric distribution using the BiasedUrn package and then applies Proposition 20, page 155, section 4.7.4 of Rosenbaum (2002). In contrast, method="LS" does not use exact expectations and variances, but rather uses the large sample approximations in section 4.6.4 of Rosenbaum (2002). Finally, method="AD" uses method="LS" for large strata and method="BU" for smaller strata. |
The evall() function is called by the sen2sample() function and the senstrat() function.
A data.frame with length(z)-1 rows and three columns. The first column, m, gives the number of 1's in the unobserved covariate vector, u. The second column, expect, and the third column, var, give the expectation and variance of the test statistic for this u.
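As a quick, informal check of the claim in the method argument that "RK" and "BU" give the same answer, one might compute both on a tiny artificial stratum; this is a minimal sketch, not an example from the book, and the data are made up.

# Sketch: exact moments from methods "RK" and "BU" should agree
# (method "BU" assumes the BiasedUrn package is installed).
z <- c(0, 0, 0, 1, 1)
sc <- rank(c(3.1, 0.5, 2.2, 4.0, 1.7))
eRK <- evall(sc, z, 2, "RK")
eBU <- evall(sc, z, 2, "BU")
all.equal(eRK, eBU)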
The example is from Table 1, page 497, of Rosenbaum and Krieger (1990). The example is also Table 4.15, page 146, in Rosenbaum (2002). The example refers to Cu cells. The data are originally from Skerfving et al. (1974). The evall function is from the senstrat package.
Paul R. Rosenbaum
Rosenbaum, P. R. and Krieger, A. M. (1990) <doi:10.2307/2289789> Sensitivity of two-sample permutation inferences in observational studies. Journal of the American Statistical Association, 85, 493-498.
Rosenbaum, P. R. (2002). Observational Studies (2nd edition). New York: Springer. Section 4.6.
Rosenbaum, P. R. (2018) <doi:10.1214/18-AOAS1153> Sensitivity analysis for stratified comparisons in an observational study of the effect of smoking on homocysteine levels. The Annals of Applied Statistics, 12(4), 2312-2334.
z<-c(rep(0,16),rep(1,23))
CuCells<-c(2.7, .5, 0, 0, 5, 0, 0, 1.3, 0, 1.8, 0, 0, 1.0, 1.8, 0, 3.1,
           .7, 4.6, 0, 1.7, 5.2, 0, 5, 9.5, 2, 3, 1, 3.5, 2, 5, 5.5, 2,
           3, 4, 0, 2, 2.2, 0, 2)
evall(rank(CuCells),z,2,"RK")
Computes the convolution of two probability generating functions using the convolve function in the stats package. The convolve function uses the fast Fourier transform.
gconv(g1,g2)
g1 |
A probability generating function. A vector g1 for a random variable X taking values 0, 1, 2, ..., length(g1)-1, where g1[i] = Pr(X=i-1). For example, g1 = c(2/3, 1/3) is the generating function of a binary random variable X with Pr(X=0)=2/3 and Pr(X=1)=1/3. The random variable that is 0 with probability 1 has g1=1. |
g2 |
Another probability generating function for a random variable Y. For a fair die, g2 = c(0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6). |
The probability generating function of X+Y when X and Y are independent.
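A minimal sanity check, not from the book: convolving the generating function of a Bernoulli(1/3) random variable with itself should reproduce the Binomial(2, 1/3) distribution.

# Sketch: gconv() of c(2/3,1/3) with itself versus dbinom()
g <- c(2/3, 1/3)
gconv(g, g)          # probabilities of 0, 1, 2 successes: 4/9, 4/9, 1/9
dbinom(0:2, 2, 1/3)  # the same Binomial(2, 1/3) probabilities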
The gconv function is a slight modification of a similar function in the sensitivity2x2xk package.
Pagano, M. and Tritchler, D. (1983) <doi:10.2307/2288653> On obtaining permutation distributions in polynomial time. Journal of the American Statistical Association, 78, 435-440.
Rosenbaum, P. R. (2020) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies. New York: Springer. Chapter 3 Appendix: Exact Computations for Sensitivity Analysis, page 103.
gconv(c(2/3,1/3),c(2/3,1/3))

gconv(1,c(2/3,1/3))

round(gconv(c(0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6),
            c(0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6)),3)
#
# Compute the exact distribution of Quade's treated-control
# statistic for I=3 blocks of size J=3.
#
# Block with range rank = 1
rk1<-c(0,1/3,1/3,1/3)
names(rk1)<-0:3
rk1
#
# Block with range rank = 2
rk2<-c(0,0,1/3,0,1/3,0,1/3)
names(rk2)<-0:6
rk2
#
# Block with range rank = 3
rk3<-c(0,0,0,1/3,0,0,1/3,0,0,1/3)
names(rk3)<-0:9
rk3
#
# Convolution of rk1 and rk2
round(gconv(rk1,rk2),3)
1/(3^2)
#
# Convolution of rk1, rk2 and rk3
round(gconv(gconv(rk1,rk2),rk3),3)
1/(3^3)
Implements the method of Zhang et al. (2023) <doi:10.1080/01621459.2021.1981337>. As special cases, this includes minimum distance (optimal) matching and matching with fine balance, near-fine balance, or refined balance; see Chapter 5 of the iTOS book or the references below.
makematch(dat, costL, costR, ncontrols = 1, controlcosts = NULL,solver='rlemon')
dat |
A data frame. Typically, this is the entire data set. Part of it will be returned as a matched sample with some added variables. |
costL |
The distance matrix on the left side of the network, used for pairing. This matrix would most often be made by adding distances to a zero distance matrix created by startcost(), for instance, using addMahal(). In section 5.4 of the book iTOS or Figure 1 of Zhang et al. (2023), these are the costs on the left treated-control edges. |
costR |
The distance matrix on the right side of the network, used for balancing. This matrix would most often be made by adding distances to a zero distance matrix created by startcost(), for instance, using addNearExact(). If you do not need a right distance matrix, then initialize it to zero using startcost() and do not add additional distances to its initial form. In section 5.4 of the book iTOS or Figure 1 of Zhang et al. (2023), these are the costs on the right control-treated edges. |
ncontrols |
One positive integer, 1 for pair matching, 2 for matching two controls to each treated individual, etc. When ncontrols=2 is feasible, it is often useful to compare the quality of the match obtained with ncontrols=1 and ncontrols=2. In Figure 1 of Zhang et al. (2023), ncontrols determines the total flow that leaves the source and is collected by the sink as ncontrols times the number of treated individuals. |
controlcosts |
An optional vector of costs used to penalize the control-control edges. For instance, one might penalize the use of controls with low propensity scores. This is illustrated in the example for the B-P match. In Figure 1 of Zhang et al. (2023), these are the costs on the central control-control edges. |
solver |
The solver must be either 'rlemon' or 'rrelaxiv'. Both solvers find a minimum cost flow in a network, but rlemon is publicly available while rrelaxiv has an academic license, so rrelaxiv requires a separate installation; see details. |
This function calls the callrelax() function in Samuel Pimentel's package rcbalance. There are two solvers for callrelax, namely rlemon and the rrelaxiv code of Bertsekas and Tseng (1988). The default is rlemon. rrelaxiv may be the better solver, but it has an academic license, so it is not distributed by CRAN and must be obtained separately from https://errickson.net/rrelaxiv/ or <https://github.com/josherrickson/rrelaxiv/>; see the documentation for callrelax(). There is no need to install rrelaxiv: the package works without it.
In principle, two different matches can have the same minimum cost, and in that case the two solvers might produce different but equally good matched samples. This is not a problem, but it can come as a surprise. It is unlikely to happen unless the distance matrix has many tied distances, and tied distances are rare when a distance matrix is built from many covariates using modern matching techniques.
Returns a matched data set. The matched rows of dat are returned with a new variable mset indicating the matched set. The returned file is sorted by mset and z.
Paul R. Rosenbaum
Bertsekas, D. P., Tseng, P. (1988) <doi:10.1007/BF02288322> The Relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13, 125-190.
Bertsekas, D. P. (1990) <doi:10.1287/inte.20.4.133> The auction algorithm for assignment and other network flow problems: A tutorial. Interfaces, 20(4), 133-149.
Bertsekas, D. P., Tseng, P. (1994) <http://web.mit.edu/dimitrib/www/Bertsekas_Tseng_RELAX4_!994.pdf> RELAX-IV: A Faster Version of the RELAX Code for Solving Minimum Cost Flow Problems.
Hansen, B. B. and Klopfer, S. O. (2006) <doi:10.1198/106186006X137047> "Optimal full matching and related designs via network flows". Journal of Computational and Graphical Statistics, 15(3), 609-627. ('optmatch' package)
Hansen, B. B. (2007) <https://www.r-project.org/conferences/useR-2007/program/presentations/hansen.pdf> Flexible, optimal matching for observational studies. R News, 7, 18-24. ('optmatch' package)
Pimentel, S. D., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2015) <doi:10.1080/01621459.2014.997879> Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. Journal of the American Statistical Association, 110, 515-527. (Introduces an extension of fine balance called refined balance that is implemented in Pimentel's package 'rcbalance'. This can be implemented using makematch() by, say, placing on the right a very large near-exact penalty on a nominal/integer covariate x1, and a still large but smaller penalty on the nominal/integer covariate as.integer(factor(x1):factor(x2)), etc.)
Pimentel, S. D. (2016) "Large, Sparse Optimal Matching with R Package rcbalance" <https://obsstudies.org/large-sparse-optimal-matching-with-r-package-rcbalance/> Observational Studies, 2, 4-23. (Discusses and illustrates the use of Pimentel's 'rcbalance' package.)
Rosenbaum, P. R. and Rubin, D. B. (1985) <doi:10.1080/00031305.1985.10479383> Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39, 33-38. (This paper suggested emphasizing the propensity score in a match, but also attempting to obtain a close match for key covariates using a Mahalanobis distance.)
Rosenbaum, P. R. (1989) <doi:10.1080/01621459.1989.10478868> Optimal matching for observational studies. Journal of the American Statistical Association, 84(408), 1024-1032. (Discusses and illustrates fine balance using minimum cost flow in a network in section 3.2. This is implemented using makematch() by placing a large near-exact penalty on a nominal/integer covariate x1 on the right distance matrix.)
Rosenbaum, P. R., Ross, R. N. and Silber, J. H. (2007) <doi:10.1198/016214506000001059> Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association, 102, 75-83.
Rosenbaum, P. R. (2020) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.
Yang, D., Small, D. S., Silber, J. H. and Rosenbaum, P. R. (2012) <doi:10.1111/j.1541-0420.2011.01691.x> Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes. Biometrics, 68, 628-636. (Extension of fine balance useful when fine balance is infeasible. Comes as close as possible to fine balance. Implemented in makematch() by placing a large near-exact penalty on a nominal/integer covariate x1 on the right distance matrix.)
Yu, Ruoqi, and P. R. Rosenbaum. <doi:10.1111/biom.13098> Directional penalties for optimal matching in observational studies. Biometrics 75, no. 4 (2019): 1380-1390. (If a covariate is very out-of-balance, we should prefer a mismatch that works against the imbalance to an equally large mismatch that supports the imbalance. This is illustrated with the propensity score in the example below. Because a directional penalty tolerates a big mismatch in the good direction, it makes sense to place the penalty on the right distance matrix.)
Yu, R. (2023) <doi:10.1111/biom.13771> How well can fine balance work for covariate balancing? Biometrics. 79(3), 2346-2356.
Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151. (This is the basic reference for the makematch() function. It generalizes the concepts of fine balance, near-fine balance and refined balance that were developed in other references.)
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2011) <doi:10.1198/tas.2011.11072> Matching for several sparse nominal variables in a case control study of readmission following surgery. The American Statistician, 65(4), 229-238. (This paper combines near-exact matching and fine balance for the same nominal covariate. It is implemented in makematch() by placing the same covariate on the left and the right, as with smokenow in the example below.)
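Before the full construction of bingeM in the example below, a minimal sketch of a makematch() call on a tiny subset of binge may help fix ideas. The subset, caliper, and penalties here are illustrative only, echoing the makenetwork() example later in this documentation; they are not the choices used to build bingeM.

# Sketch: pair the two binge drinkers in a tiny subset of 'binge' with
# two of the three candidate controls, pairing on age (left distance)
# while balancing female (right distance).
data(binge)
d <- binge[is.element(binge$SEQN, c(109315, 109365, 109266, 109273, 109290)), ]
z <- 1*(d$AlcGroup == "B")
o <- order(1 - z, d$SEQN)   # treated rows first, as in the full example below
d <- d[o, ]
z <- z[o]
rownames(d) <- d$SEQN
names(z) <- d$SEQN
left <- addcaliper(startcost(z), z, d$age, caliper = 10, penalty = 5)
right <- addNearExact(startcost(z), z, d$female)
m <- makematch(cbind(d, z), left, right, ncontrols = 1)
m[, c("SEQN", "AlcGroup", "age", "female", "mset")]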
# See also the examples for binge in the documentation for the dataset binge.
data(binge)
# matchNever creates the B-N match for the binge data.
# matchPast creates the B-P match for the binge data.
# Each match has its own propensity score, and these
# mean different things in the B-N match and the B-P
# match.  The propensity score is denoted p.
# The Mahalanobis distance is the rank-based method
# described in Rosenbaum (2020, section 9.3).
# A directional caliper from Yu and Rosenbaum (2019)
# favors controls with high propensity scores, p.
#
# The two matches share most features, including:
# 1. A heavy left penalty for mismatching for female and bpRX.
# 2. Left penalties for mismatching for ageC, vigor, smokenow.
# 3. A left Mahalanobis distance for all covariates.
# 4. A heavy right penalty for mismatching smokenow.
# 5. Right penalties for education and smokeQuit.
# 6. A right Mahalanobis distance for just female, age and p.
# 7. An asymmetric, directional caliper on p.
# Because the left distance determines pairing, it emphasizes
# covariates that are out-of-balance and thought to be related
# to blood pressure.  Because the right distance affects balance
# but not pairing, it worries about education and smoking in the
# distant past, as well as the propensity score which again is
# focused on covariate balance.  The asymmetric caliper is
# tolerant of mismatches in the desired direction, so it too
# is placed on the right.  Current smoking, smokenow, is placed
# on both the left and the right: even if we cannot always pair
# for it, perhaps we can balance it.
# In common practice, before examining outcomes, one compares
# several matched designs, fixing imperfections by adjusting
# penalties and other considerations.
#
# The B-P match is more difficult, because the P group is
# smaller than the N group.  The B-P match uses the
# controlcosts feature while the B-N match does not.
# In the B-P match, there are too few young controls
# and too few controls with high propensity scores.
# Therefore, controlcosts penalize the use of controls
# with both low propensity scores (p<.4) and higher
# ages (age>42), thereby minimizing the use of these
# controls, even though the use of some of these
# controls is unavoidable in a pair match.

matchNever<-function(z,ncontrols=1){
  dt<-binge[!is.na(z),]
  z<-z[!is.na(z)]
  dt<-dt[order(1-z,dt$SEQN),]
  z<-z[order(1-z,dt$SEQN)]
  rownames(dt)<-dt$SEQN
  names(z)<-dt$SEQN
  left<-startcost(z)
  right<-startcost(z)
  attach(dt)
  propmod<-glm(z~age+female+education+smokenow+smokeQuit+bpRX+
                 bmi+vigor+waisthip,family=binomial)
  p<-propmod$fitted.values
  left<-addinteger(left,z,as.integer(ageC),penalty=100)
  left<-addNearExact(left,z,female,penalty = 10000)
  left<-addNearExact(left,z,bpRX,penalty = 10000)
  left<-addNearExact(left,z,vigor,penalty=10)
  left<-addinteger(left,z,smokenow,penalty=10)
  left<-addMahal(left,z,cbind(age,bpRX,female,education,smokenow,
                              smokeQuit,bmi,vigor,waisthip))
  right<-addMahal(right,z,cbind(female,age,p))
  right<-addinteger(right,z,education,penalty=10)
  right<-addinteger(right,z,smokenow,penalty=1000)
  right<-addinteger(right,z,smokeQuit,penalty=10)
  right<-addcaliper(right,z,p,caliper=c(-1,.03),penalty=10)
  detach(dt)
  dt<-cbind(dt,z,p)
  m<-makematch(dt,left,right,ncontrols=ncontrols)
  m$mset<-as.integer(m$mset)
  treated<-m$SEQN[m$z==1]
  treated<-as.vector(t(matrix(treated,length(treated),ncontrols+1)))
  m<-cbind(m,treated)
  list(m=m,dt=dt)
}

matchPast<-function(z,ncontrols=1){
  dt<-binge[!is.na(z),]
  z<-z[!is.na(z)]
  dt<-dt[order(1-z,dt$SEQN),]
  z<-z[order(1-z,dt$SEQN)]
  rownames(dt)<-dt$SEQN
  names(z)<-dt$SEQN
  left<-startcost(z)
  right<-startcost(z)
  attach(dt)
  propmod<-glm(z~age+female+education+smokenow+smokeQuit+bpRX+
                 bmi+vigor+waisthip,family=binomial)
  p<-propmod$fitted.values
  left<-addinteger(left,z,as.integer(ageC),penalty=100)
  left<-addNearExact(left,z,female,penalty = 10000)
  left<-addNearExact(left,z,bpRX,penalty = 10000)
  left<-addNearExact(left,z,vigor,penalty=10)
  left<-addinteger(left,z,smokenow,penalty=10)
  left<-addMahal(left,z,cbind(age,bpRX,female,education,smokenow,
                              smokeQuit,bmi,vigor,waisthip))
  right<-addMahal(right,z,cbind(female,age,p))
  right<-addinteger(right,z,education,penalty=10)
  right<-addinteger(right,z,smokenow,penalty=1000)
  right<-addinteger(right,z,smokeQuit,penalty=10)
  right<-addcaliper(right,z,p,caliper=c(-1,.03),penalty=10)
  controlcosts<-((p[z==0]<.4)&(age[z==0]>42))*1000
  detach(dt)
  dt<-cbind(dt,z,p)
  m<-makematch(dt,left,right,ncontrols=ncontrols,controlcosts=controlcosts)
  m$mset<-as.integer(m$mset)
  treated<-m$SEQN[m$z==1]
  treated<-as.vector(t(matrix(treated,length(treated),ncontrols+1)))
  m<-cbind(m,treated)
  list(m=m,dt=dt)
}

z<-rep(NA,dim(binge)[1])
z[binge$AlcGroup=="B"]<-1
z[binge$AlcGroup=="P"]<-0
mPastComplete<-matchPast(z,ncontrols=1)
mPast<-mPastComplete$m
mPastComplete<-mPastComplete$dt
rm(z)

z<-rep(NA,dim(binge)[1])
z[binge$AlcGroup=="B"]<-1
z[binge$AlcGroup=="N"]<-0
mNeverComplete<-matchNever(z,ncontrols=1)
mNever<-mNeverComplete$m
mNeverComplete<-mNeverComplete$dt
rm(z)

bingeM<-rbind(mNever,mPast[mPast$z==0,])
bingeM<-bingeM[order(bingeM$treated,bingeM$AlcGroup,bingeM$SEQN),]
w<-which(colnames(bingeM)=="p")[1]
bingeM<-bingeM[,-w]
rm(binge,matchNever,matchPast,w)

old.par <- par(no.readonly = TRUE)
par(mfrow=c(1,2))
boxplot(mNever$p[mNever$z==1],mNever$p[mNever$z==0],
        mNeverComplete$p[!is.element(mNeverComplete$SEQN,mNever$SEQN)],
        names=c("B","mN","uN"),ylab="Propensity Score",main="Never",
        ylim=c(0,.8))
boxplot(mPast$p[mPast$z==1],mPast$p[mPast$z==0],
        mPastComplete$p[!is.element(mPastComplete$SEQN,mPast$SEQN)],
        names=c("B","mP","uP"),ylab="Propensity Score",main="Past",
        ylim=c(0,.8))
par(mfrow=c(1,2))
boxplot(mNever$age[mNever$z==1],mNever$age[mNever$z==0],
        mNeverComplete$age[!is.element(mNeverComplete$SEQN,mNever$SEQN)],
        names=c("B","mN","uN"),ylab="Age",main="Never",
        ylim=c(20,80))
boxplot(mPast$age[mPast$z==1],mPast$age[mPast$z==0],
        mPastComplete$age[!is.element(mPastComplete$SEQN,mPast$SEQN)],
        names=c("B","mP","uP"),ylab="Age",main="Past",
        ylim=c(20,80))
par(mfrow=c(1,2))
boxplot(mNever$education[mNever$z==1],mNever$education[mNever$z==0],
        mNeverComplete$education[!is.element(mNeverComplete$SEQN,mNever$SEQN)],
        names=c("B","mN","uN"),ylab="Education",main="Never",
        ylim=c(1,5))
boxplot(mPast$education[mPast$z==1],mPast$education[mPast$z==0],
        mPastComplete$education[!is.element(mPastComplete$SEQN,mPast$SEQN)],
        names=c("B","mP","uP"),ylab="Education",main="Past",
        ylim=c(1,5))
par(mfrow=c(1,2))
boxplot(mNever$bmi[mNever$z==1],mNever$bmi[mNever$z==0],
        mNeverComplete$bmi[!is.element(mNeverComplete$SEQN,mNever$SEQN)],
        names=c("B","mN","uN"),ylab="BMI",main="Never",
        ylim=c(14,70))
boxplot(mPast$bmi[mPast$z==1],mPast$bmi[mPast$z==0],
        mPastComplete$bmi[!is.element(mPastComplete$SEQN,mPast$SEQN)],
        names=c("B","mP","uP"),ylab="BMI",main="Past",
        ylim=c(14,70))
par(mfrow=c(1,2))
boxplot(mNever$waisthip[mNever$z==1],mNever$waisthip[mNever$z==0],
        mNeverComplete$waisthip[!is.element(mNeverComplete$SEQN,mNever$SEQN)],
        names=c("B","mN","uN"),ylab="Waist/Hip",main="Never",
        ylim=c(.65,1.25))
boxplot(mPast$waisthip[mPast$z==1],mPast$waisthip[mPast$z==0],
        mPastComplete$waisthip[!is.element(mPastComplete$SEQN,mPast$SEQN)],
        names=c("B","mP","uP"),ylab="Waist/Hip",main="Past",
        ylim=c(.65,1.25))
par(old.par)
This function is of limited interest to most users, and is called by other functions in the package. It makes the network used in the two-criteria matching method of Zhang et al. (2023).
makenetwork(costL, costR, ncontrols = 1, controlcosts = NULL)
costL |
The distance matrix on the left side of the network, used for pairing. |
costR |
The distance matrix on the right side of the network, used for balancing. |
ncontrols |
One positive integer, 1 for pair matching, 2 for matching two controls to each treated individual, etc. |
controlcosts |
An optional vector of costs used to penalize the control-control edges. |
This function creates the network depicted in Figure 1 of Zhang et al. (2023).
A minimum cost flow in this network is found by passing net to callrelax() in the package 'rcbalance'. If you use callrelax(), I strongly suggest you do this with solver set to 'rrelaxiv'. The 'rrelaxiv' package has an academic license; it uses Fortran code from RELAX IV, developed by Bertsekas and Tseng (1988, 1994) based on Bertsekas' (1990) auction algorithm.
idtreated |
Row identifications for treated individuals |
idcontrol |
Row identifications for control individuals |
net |
A network for use with callrelax in the 'rcbalance' package. |
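To see what is returned, one might inspect the three components on a tiny problem; this is a minimal sketch that uses zero cost matrices and the same illustrative subset of binge as the example below.

# Sketch: look at the components returned by makenetwork()
data(binge)
d <- binge[is.element(binge$SEQN, c(109315, 109365, 109266, 109273, 109290)), ]
z <- 1*(d$AlcGroup == "B")
names(z) <- d$SEQN
nw <- makenetwork(startcost(z), startcost(z))
nw$idtreated  # identifications of the treated individuals
nw$idcontrol  # identifications of the controls
str(nw$net)   # the network object that callrelax() expects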
Paul R. Rosenbaum
Bertsekas, D. P., Tseng, P. (1988) <doi:10.1007/BF02288322> The relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13, 125-190.
Bertsekas, D. P. (1990) <doi:10.1287/inte.20.4.133> The auction algorithm for assignment and other network flow problems: A tutorial. Interfaces, 20(4), 133-149.
Bertsekas, D. P., Tseng, P. (1994) <http://web.mit.edu/dimitrib/www/Bertsekas_Tseng_RELAX4_!994.pdf> RELAX-IV: A Faster Version of the RELAX Code for Solving Minimum Cost Flow Problems.
Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151.
data(binge)
# Select two treated and three controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290)),]
z<-1*(d$AlcGroup=="B")
names(z)<-d$SEQN
attach(d)
x<-data.frame(age,female)
detach(d)
rownames(x)<-d$SEQN
Ldist<-startcost(z)
Ldist<-addcaliper(Ldist,z,x$age,caliper=10,penalty=5)
Rdist<-startcost(z)
Rdist<-addNearExact(Rdist,z,x$female)
makenetwork(Ldist,Rdist)
Computes a sensitivity analysis for treated-minus-control matched pair differences in observational studies.
noether(y, f = 2/3, gamma = 1, alternative = "greater")
y |
A vector of treated-minus-control matched pair differences. |
f |
A nonnegative number strictly less than 1. Suppose that there are I matched pair differences, length(y)=I. Rank the absolute pair differences from 1 to I, with average ranks for ties. Noether's statistic looks at the roughly (1-f)I pair differences with absolute ranks of at least fI and computes the sign test from these pair differences. With f=0, Noether's statistic is the usual sign test statistic. With f=2/3, Noether's statistic focuses on the 1/3 of pairs with the largest absolute pair differences. In his article, Noether suggested f=1/3 for randomized matched pair differences from a Normal distribution, but f=2/3 is better for sensitivity analyses in observational studies. Pair differences that are zero are not counted, but zero differences are uncommon among the pairs used when f=2/3. |
gamma |
A number greater than or equal to 1. gamma is the sensitivity parameter, where gamma=1 for a randomization test, and gamma>1 for a sensitivity analysis. |
alternative |
The possible alternatives are "greater", "less" or "two.sided"; however, "two.sided" is available only for gamma=1. |
Noether's (1973) statistic strengthens the sign test. In a randomized experiment, it increases power. In an observational study, it increases the design sensitivity and the Bahadur efficiency of a sensitivity analysis.
Because the test has a binomial null distribution in both a randomized experiment and an observational study, Noether's test is used in a number of problems in Introduction to the Theory of Observational Studies.
Noether's test is related to methods of Gastwirth (1966), Brown (1981), and Markowski and Hettmansperger (1982). Its properties in an observational study are discussed in Rosenbaum (2012, 2015).
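The role of f can be seen in a tiny made-up example; this sketch is not from the book.

# Sketch: with I = 9 pair differences and f = 2/3, Noether's statistic
# keeps only the pairs whose absolute differences have ranks of at least
# f*I = 6, roughly the (1-f)*I = 3 largest, and applies the sign test to them.
y <- c(0.1, -0.2, 0.3, -0.4, 0.5, 0.6, -0.7, 0.8, 0.9)
noether(y, f = 0, gamma = 1)    # the ordinary sign test using all 9 pairs
noether(y, f = 2/3, gamma = 1)  # the sign test restricted to the largest |y|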
number.pairs |
Number of pairs used by Noether's statistic, roughly (1-f)I. |
positive.pairs |
Number of positive pair differences among used pairs. |
pval |
P-value testing the null hypothesis of no treatment effect. Obtained from the binomial distribution. |
As noted in the Preface to Introduction to the Theory of Observational Studies, Noether's statistic is used in a sequence of Problems that appear in various chapters.
Paul R. Rosenbaum
Brown, B. M. (1981) <doi:10.1093/biomet/68.1.235> Symmetric quantile averages and related estimators. Biometrika, 68(1), 235-242.
Gastwirth, J. L. (1966) <doi:10.1080/01621459.1966.10482185> On robust procedures. Journal of the American Statistical Association, 61(316), 929-948.
Markowski, E. P. and Hettmansperger, T. P. (1982) <doi:10.1080/01621459.1982.10477905> Inference based on simple rank step score statistics for the location model. Journal of the American Statistical Association, 77(380), 901-907.
Noether, G. E. (1973) <doi:10.1080/01621459.1973.10481411> Some simple distribution-free confidence intervals for the center of a symmetric distribution. Journal of the American Statistical Association, 68(343), 716-719.
Rosenbaum, P. R. (2012) <doi:10.1214/11-AOAS508> An exact adaptive test with superior design sensitivity in an observational study of treatments for ovarian cancer. Annals of Applied Statistics, 6, 83-105.
Rosenbaum, P. R. (2015) <doi:10.1080/01621459.2014.960968> Bahadur efficiency of sensitivity analyses in observational studies. Journal of the American Statistical Association, 110(509), 205-217.
set.seed(1)
y<-rnorm(1000)+.5
noether(y,f=0,gamma=3)
noether(y,f=2/3,gamma=3)
Creates a distance matrix of zeros with dimensions compatible with the treatment indicator vector z.
startcost(z)
z |
A vector with z[i]=1 if individual i is treated or z[i]=0 if individual i is control. The rows of the returned cost matrix refer to treated individuals and the columns refer to controls. Although not strictly required, it is best if z has names that are the same as the row names of the data frame dat that will be used in matching. |
A matrix of zeros with sum(z) rows and sum(1-z) columns. If z has names, then they become the row and column names of this matrix.
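As a quick illustration of the dimensions and names, consider a tiny made-up treatment indicator; this sketch is not from the book.

# Sketch: 2 treated and 3 controls give a 2 x 3 matrix of zeros whose
# row and column names are taken from names(z).
z <- c(a = 1, b = 0, c = 1, d = 0, e = 0)
startcost(z)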
Paul R. Rosenbaum
data(binge)
# Select two treated and three controls from binge
d<-binge[is.element(binge$SEQN,c(109315,109365,109266,109273,109290)),]
z<-1*(d$AlcGroup=="B")
names(z)<-d$SEQN
dist<-startcost(z)
dist
rm(z,dist,d)
Of limited interest to most users, the zeta function plays an internal role in 2-sample and stratified sensitivity analyses. The zeta function is equation (8), page 495, in Rosenbaum and Krieger (1990).
zeta(bigN, n, m, g)
bigN |
Total sample size in this stratum. |
n |
Treated sample size in this stratum. |
m |
The number of 1's in the vector u of unobserved covariates. Here, u has bigN-m 0's followed by m 1's. |
g |
The sensitivity parameter gamma, a number greater than or equal to 1. |
The value of the zeta function.
The zeta function is called by computep. The zeta function is from the senstrat package.
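Because the sensitivity analysis considers every possible count m of 1's in the unobserved covariate u (see evall()), it can be instructive to see how zeta varies with m; this is a minimal sketch, not from the book.

# Sketch: zeta for a stratum of size 10 with 5 treated individuals, at
# sensitivity parameter g = 2, evaluated at every m from 1 to 9.
sapply(1:9, function(m) zeta(10, 5, m, 2))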
Paul R. Rosenbaum
Rosenbaum, P. R. and Krieger, A. M. (1990) <doi:10.2307/2289789> Sensitivity of two-sample permutation inferences in observational studies. Journal of the American Statistical Association, 85, 493-498.
Rosenbaum, P. R. (2002). Observational Studies (2nd edition). New York: Springer. Section 4.6.
zeta(10,5,6,2)