Package 'evident'

Title: Evidence Factors in Observational Studies
Description: Contains a collection of examples of evidence factors in observational studies from the book Replication and Evidence Factors in Observational Studies by Paul R. Rosenbaum (2021) <doi:10.1201/9781003039648>.
Authors: Paul R. Rosenbaum
Maintainer: Paul R. Rosenbaum <[email protected]>
License: GPL-2
Version: 1.0.4
Built: 2024-12-18 06:34:40 UTC
Source: CRAN

Help Index


Biomonitoring of Workers Exposed to Antineoplastic Drugs

Description

Two groups of exposed workers compared to unexposed controls, where exposed workers prepared antineoplasitic drugs, protected either only by gloves or by gloves and a laminar hood with vertical air flow. The outcome is the comet assay applied to blood lymphocytes. The comet assay is a measure of damage to DNA. Data from Kopjar and Garaj-Vrhovac (2001). These data illustrate two evidence factors formed when comparing three treatment groups.

Usage

data("antineoplastic")

Format

A data frame with 59 observations on the following 9 variables.

id

ID number. See Tables I and III of Kopjar and Garaj-Vrhovac (2001).

age

Age in years

str

Three age strata, cut at the thirds of age.

grp

Gloves = exposed nurses/doctors protected only by gloves, Hood = exposed nurses/doctors protected by both gloves and a laminar hood with vertical air flow, Control = students and office workers not exposed to antineoplastic drugs.

tailmoment

Tail moment of the comet assay.

taillength

Tail length of the comet assay mu-m.

z1

1 if exposed, 0 if control

z2

1 if Gloves, 0 if Hood, NA if control

f2

TRUE if in factor 2, FALSE otherwise

Details

The data set is intended to illustrate evidence factors, comparing exposed workers to controls, and workers protected only by gloves to workers with the additional protecting of a laminar hood with vertical air flow.

Source

From Tables I and III of Kopjar and Garaj-Vrhovac (2001).

Used as an example in Rosenbaum (2021).

References

Kopjar, N. and Garaj-Vrhovac, V. (2001) <doi:10.1093/mutage/16.1.71> "Application of the alkaline comet assay in human biomonitoring for genotoxicity: a study on Croation medical personnel handling antineoplastic drugs". Mutagenesis, 16, 71-78.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(antineoplastic)
attach(antineoplastic)
table(str)
table(grp)
oldpar<-par(mfrow=c(1,2))
boxplot(tailmoment~grp,ylab="Tail Moment",
       main="3 Groups",ylim=c(8,20))
boxplot(tailmoment~z1,names=c("Control","Exposed"),
       main="Factor 1",ylab="Tail Moment",ylim=c(8,20))
boxplot(tailmoment[f2]~z2[f2],names=c("Hood","Gloves"),
       ylab="Tail Moment",main="Factor 2",ylim=c(8,20))
y<-senstrat::hodgeslehmann(tailmoment,z1,str)
# First factor
senstrat::senstrat(y,z1,str,gamma=20)
# Second factor
senstrat::senstrat(y[f2],z2[f2],str[f2],gamma=2.75)
detach(antineoplastic)
par(oldpar)

Chromosome Damage from Exposure to Benzene

Description

Examines chromosome aberrations among shoe workers exposed to benzene and unexposed controls.

Usage

data("benzene")

Format

A data frame with 78 observations on the following 5 variables.

age

Age in years

exposure

Years employed as a shoe worker. For controls, exposure = 0

alcohol

0=no, 1=yes

smoking

Cigarette smoking, packs per day

totalplus

Total chromosome aberrations, including gaps.

Source

The data are frome Tunca, B.T. and Egeli, U. (1996). Paz-y-Mino (2002) discuss whether to include gaps in chromosome aberrations. Used as an example in Rosenbaum (2001) and Rosenbaum (2021).

References

Paz-y-Mino, C., Davalos, M.V., Sanchez, M.E., Arevalo, M. and Leone, P.E. (2002). <doi:10.1016/S1383-5718(02)00021-9> "Should gaps be included in chromosomal aberration analysis? Evidence based on the comet assay". Mutation Research: Genetic Toxicology and Environmental Mutagenesis, 516, 57-61.

Rosenbaum, P.R. (2001). <doi:10.1093/biomet/88.1.219> "Effects attributable to treatment: Inference in experiments and observational studies with a discrete pivot". Biometrika, 88, 219-231.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Tunca, B.T., Egeli, U. (1996). <doi:10.1289/ehp.961041313> "Cytogenetic findings on shoe workers exposed long-term to benzene". Environmental Health Perspectives, 104, 1313-1317.

Examples

data(benzene)
attach(benzene)
boxplot(totalplus[exposure==0],totalplus[exposure>0],names=c("C","W"),ylab="Chromosome aberrations",
       main=c("Controls vs Workers"),xlab="C=control, W=worker",ylim=c(0,36))
plot(exposure[exposure>0],totalplus[exposure>0],ylim=c(0,36),main="Shoe Workers Only",
       xlab="Years of Exposure",ylab="Chromosome aberrations",pch=16)
wilcox.test(totalplus[exposure>0],totalplus[exposure==0],conf.int=TRUE,alternative = "greater")
cor.test(exposure[exposure>0],totalplus[exposure>0],method="k",alternative = "greater")
detach(benzene)

Minimum Wages and Employment

Description

Data from Card and Krueger's study of minimum wages and employment in New Jersey and Pennsylvania.

Usage

data("ck")

Format

A data frame with 198 observations on the following 7 variables.

mset

Matched set indicator, 1, 2, ..., 66

grp

Group indicator NJhigh NJlow PA

fte

Full time employees, Februrary 1992, before the increase in the minimum wage in New Jersey (NJ).

fte2

Full time employees, November 1992, after the increase in the minimum wage in New Jersey (NJ).

CHAIN

Restaurant chain, Burger King, Wendy's, KFC and Roy Rogers

HRSOPEN

Number of hours the restaurant was open in February 1992

SHEET

Identifier that may be used to link the limited data here to the full data set provided by Card and Krueger.

Details

On 1 April 1992, the state of New Jersy increased its minimum wage from 4.25 dollars per hour to 5.05 dollars per hour, while adjacent Pennsylvania made no change. David Card and Alan Krueger studied the change in employment at fast food restaurants, from before the wage increase to after, based on survey data they collected.

There are three groups matched for HRSOPEN and indicators of the CHAIN. PA are restaurants in eastern Pennsylvania, adjacent to western New Jersey. NJlow are New Jersey restaurants with a starting wage of at most 4.50 dollars in February 1992, and NJhigh are New Jersey restaurants with a starting wage of more than 4.50 dollars. The NJlow restaurants were required to increase the starting wage by substantially more than the NJhigh restaurants.

The match was built using the algorithm of Karmakar et al. (2019) and the approximatch package in R.

Used as an example in Rosenbaum (2021).

Using a different matched sample, an analysis of two evidence factors in Card and Krueger's study is given in Rosenbaum (2010).

Source

Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.

References

Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.

Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2019) <doi:10.1080/10618600.2019.1584900> "Using approximation algorithms to build evidence factors and related designs for observational studies". Journal of Computational and Graphical Statistics, 28, 698-709.

Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2021) <doi:10.1080/01621459.2020.1745811> Reinforced designs: Multiple instruments plus control groups as evidence factors in an observational study of the effectiveness of Catholic schools. Journal of the American Statistical Association, 116, 82-92.

Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(ck)
boxplot((ck$fte2-ck$fte)~ck$grp)

Matching the Minimum Wage Data

Description

The data are Card and Krueger's (1994) minimum wage case study, prior to matching for CHAIN and HRSOPEN. The data illustrate matching to form three groups using Bikram Karmakar's approxmatch package. A similar matched data set after matching is in ck in this package: ck is a matched subset of all of the data in ckA.

Usage

data("ckA")

Format

A data frame with 351 observations on the following 9 variables.

SHEET

Identifier that may be used to link the limited data here to the full data set provided by Card and Krueger.

HRSOPEN

Number of hours the restaurant was open in February 1992

CHAIN

Restaurant chain, Burger King, Wendy's, KFC and Roy Rogers

chain1

Binary indicator, chain 1

chain2

Binary indicator, chain 2

chain3

Binary indicator, chain 3

FTE

Full time employees, Februrary 1992, before the increase in the minimum wage in New Jersey (NJ).

FTE2

Full time employees, November 1992, after the increase in the minimum wage in New Jersey (NJ).

grp

Group indicator NJhigh NJlow PA

Details

These data are used to illustrate matching to form matched triples from three groups, Pennsylvania restaurants, New Jersey restaurants with low starting wages before the wage increase, and New Jersey restaurants with high starting wages before the wage increase. The low/high wage cut is at 4.50 dollars per hour. The method illustrated is from Karmakar et al. (2019) as implemented in Karmakar's approxmatch package.

If you would like to create the match in the ck data using the ckA data, then you must install the optmach package, accept its academic license, and install the approxmatch package. The optmatch packages uses the relax code of Bertsekas and Tseng (1988). The following code will then recreate the matched data set. As the matched data is available in ck, there is no need to run this code, unless you are interested in how to use the approxmatch package.

dist<-approxmatch::multigrp_dist_struc(ckA,as.character(ckA$grp), list(mahal=c("chain1","chain2","chain3", "HRSOPEN")),wgts=1) mtch<-approxmatch::tripletmatching(dist,as.character(ckA$grp), indexgroup="PA",ckA,"CHAIN",design=c(1,1,1))

Used as an example in Rosenbaum (2021).

Using a different matched sample, an analysis of two evidence factors in Card and Krueger's study is given in Rosenbaum (2010).

Source

Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.

References

Bertsekas, D. P. and Tseng, P. (1988) <doi:10.1007/BF02288322> "The Relax codes for linear minimum cost network flow problems". Annals of Operations Research, 13(1), 125-190.

Card, D. and Krueger, A. B. (1994) <doi:10.1257/aer.90.5.1397> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.

Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2019) <doi:10.1080/10618600.2019.1584900> "Using approximation algorithms to build evidence factors and related designs for observational studies". Journal of Computational and Graphical Statistics, 28, 698-709.

Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(ckA)
table(ckA$CHAIN)
table(ckA$grp)

Smoking and Homocysteine

Description

Data from NHANES 2005-2006 concerning homocysteine levels in daily smokers (z=1) and never smokers (z=0), aged 20 and older. Daily smokers smoked every day for the last 30 days, smoking an average of at least 10 cigarettes per day. Never smokers smoked fewer than 100 cigarettes in their lives, do not smoke now, and had no tobacco use in the previous 5 days. There are 2475 individuals in 104 strata defined by gender, age, education, income, and BMI.

Usage

data("hsmoke")

Format

A data frame with 2475 observations on the following 15 variables.

SEQN

2005-2006 NHANES ID number.

homocysteine

Homocysteine level, umol/L. Based on LBXHCY.

z

z=1 for a daily smoker, z=0 for a never smoker. Based on SMQ020, SMQ040, SMD641, SMD650, SMQ680.

female

1=female, 0=male. Based on RIAGENDR

age

Age in years. Based on RIDAGEYR.

education

Education. Based on DMDEDUC2.

povertyr

Ratio of family income to the poverty level, capped at five times the poverty level. Based on INDFMPIR.

bmi

BMI = body-mass-index. Based on BMXBMI.

cotinine

Cotinine level (ng/mL), a marker for recent smoking. Based on LBXCOT.

st

Numeric strata indicating female, age, education, BMI and poverty.

stf

A factor for strata indicating female, age, education, BMI and poverty.

age3

Three age categories, 20-39, 40-50, >=60. Used to define the strata.

ed3

Three education categories, <High School, High School, at least some College. Used to define the strata.

bmi3

Three categories of the body-mass-index, BMI, <30, [30,35), >= 35. Used to define the strata.

pov2

TRUE=income at least twice the poverty level, FALSE otherwise. Used to define the strata.

Details

Bazzano et al. (2003) noted higher homocysteine levels in the blood of smokers. This data set reexamines this issue using more recent data from NHANES 2005-2006, the most recent NHANES that measured homocysteine.

The example reproduces analyses from Rosenbaum (2018). The example illustrates use a sensitivity analysis with 104 strata defined by covariates, with and without additional adjustment for continuous versions of these covariates using robust covariance adjustment; see Rosenbaum (2002).

Used as an example in Rosenbaum (2021).

Other analyses of smoking and homocysteine are in Karmakar et al. (2020) and Pimentel et al. (2015).

Source

NHANES, the US National Health and Nutrition Examination Survey, 2005-2006. Publicly available from the US National Center for Health Statistics, US Centers for Disease Control.

References

Bazzano, L. A., He, J., Muntner, P., Vupputuri, S. and Whelton, P. K. (2003) <doi:10.7326/0003-4819-138-11-200306030-00010> "Relationship between cigarette smoking and novel risk factors for cardiovascular disease in the United States". Annals of Internal Medicine, 138, 891-897.

Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2020) <doi:10.1093/aje/kwz263> "Using evidence factors to clarify exposure biomarkers". American Journal of Epidemiology, 189, 243–249.

Pimentel, S. D., Small, D. S. and Rosenbaum, P. R. (2016) <doi:10.1080/01621459.2015.1076342> "Constructed second control groups and attenuation of unmeasured biases". Journal of the American Statistical Association, 111, 1157-1167.

Rosenbaum, P. R. (2002). <doi:10.1214/ss/1042727942> "Covariance adjustment in randomized experiments and observational studies". Statistical Science, 17(3), 286-327.

Rosenbaum, P. R. (2018) <doi:10.1214/18-AOAS1153> "Sensitivity analysis for stratified comparisons in an observational study of the effect of smoking on homocysteine levels". Annals of Applied Statistics 12(4):2312–2334.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(hsmoke)
attach(hsmoke)
# Calculations discussed in Rosenbaum (2018).

# Stratified analysis of log2(homocysteine)
# Uses Hodges-Lehmann aligned ranks
l2h<-log2(homocysteine)
hll2h<-senstrat::hodgeslehmann(l2h,z,st,align="hl")
senstrat::senstrat(hll2h,z,st,gamma=1.95)
senstrat::senstrat(hll2h,z,st,gamma=2.1)

# Stratification + Covariance Adjustment
# Covariance model does not include the treatment, z
mod<-MASS::rlm(l2h~female+age+povertyr+bmi+education)
l2hr<-as.vector(mod$residual)
hll2hr<-senstrat::hodgeslehmann(l2hr,z,st,align="hl")
senstrat::senstrat(hll2hr,z,st,gamma=2.1)

# Uses M-scores in place of aligned ranks
msr<-senstrat::mscores(l2hr,z,st=st)
senstrat::senstrat(msr,z,st,gamma=2.35)

# Evidence factor analysis from Section 6.2 of Rosenbaum (2018)
# Among smokers, is homocysteine higher with high cotinine?
summary(cotinine[z==1]) # Cotinine among smokers
table(cotinine[z==1]<=185.750)
table(cotinine[z==1]>=342)
use<-(z==1)&((cotinine<=185.750)|(cotinine>=342))
cot<-1*((cotinine>=342))
# Figure 4 in Rosenbaum (2018)
graphics::boxplot(l2hr[use]~cot[use],names=c("Low","High"),
      ylab="Residual",xlab="Cotinine")
cotl2hr<-senstrat::mscores(l2hr[use],cot[use],st=st[use])
senstrat::senstrat(cotl2hr,cot[use],st[use],gamma=2.5)
detach(hsmoke)

Lead in Children

Description

Data from Morton et al. (1982) concerning exposed children whose fathers worked in a battery plant where lead was used in the manufacture of batteries. Exposed children were matched to controls for age and neighborhood. For exposed children, also given are the father's level of exposure to lead at work (level) and the father's hygiene upon leaving the battery plant at the end of the day.

Usage

data("lead")

Format

A data frame with 33 observations on the following 6 variables.

control

Blood lead level for the control, micrograms of lead per decaliter of whole blood.

exposed

Blood lead level for the exposed/treated child, micrograms of lead per decaliter of whole blood.

level

Father's level of exposure to lead: a factor with levels high low medium

hyg

Father's hygiene before going home: a factor with levels good mod poor

both

A factor built from level and hyg: a factor with levels high.ok high.poor low medium

dif

Exposed-minus-control pair difference in blood lead levels.

Details

The data were assembled from two published tables in Morton et al. (1982). One matched pair with no control is omitted here. Small ambiguities in assembling a complete data set from two tables were resolved by a throw of the dice; however, it is a reasonable example to illustrate statistical methods. One table described the exposed-versus-control matched pairs, and these are as in the paper. The second table described the exposed individuals, their level of exposure and their hygiene, and again these are as in the paper. The two tables were linked using the blood lead level of the exposed children, and a couple of ties in these lead levels made for small ambiguities about which control responses belong with which hygienes and exposure levels for the exposed children. See, for instance, rows 18 and 22, where both exposed children have blood lead level 34.

Source

Data are from Morton et al. (1982). They were used as an example in Rosenbaum (1991, 2002, 2011, 2017, 2021).

References

Morton, D. E., Saah, A. J., Silberg, S. L., Owens, W. L., Roberts, M. A., & Saah, M. D. (1982) <doi:10.1093/oxfordjournals.aje.a113336> "Lead absorption in children of employees in a lead-related industry". American Journal of Epidemiology, 115(4), 549-555.

Rosenbaum, P. R. (1991) <doi:10.1214/aos/1176348141> "Some poset statistics". The Annals of Statistics, 19(2), 1091-1097.

Rosenbaum, P. R. (2002) <doi:10.1007/978-1-4757-3692-2_3> "Observational Studies" (2nd edition). New York: Springer. Section 4.3.

Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.

Rosenbaum, P. R. (2011) <doi:10.1111/j.1541-0420.2010.01535.x> "A new U-statistic with superior design sensitivity in matched observational studies". Biometrics, 67(3), 1017-1027.

Rosenbaum, P. R. (2017) <https://www.hup.harvard.edu/catalog.php?isbn=9780674975576> "Observation and Experiment: An Introduction to Causal Inference". Cambridge, MA: Harvard University Press. Chapters 7 and 9.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(lead)
# Reproduces parts of Table 2 in Rosenbaum (2011)
DOS2::senU(lead$dif,gamma=5.8,m=8,m1=5,m2=8)
DOS2::senU(lead$dif,gamma=5,m=5,m1=4,m2=5)

# m=2, m1=2, m2=2 is the U-statistic that closely
# resembles Wilcoxon's signed rank test.  Note
# that the results are almost the same.
DOS2::senWilcox(lead$dif,gamma=5) # In Table 2
DOS2::senU(lead$dif,gamma=5,m=2,m1=2,m2=2)

DNA Damage in Lead Workers

Description

Data from Table 1 of Wu et al. (2002) concerning DNA damage among lead workers compared to controls. In this example, the data have been matched for age and smoking, making 11 matched pairs. Used as an example in Rosenbaum (2021).

Usage

data("leadworker")

Format

A data frame with 22 observations on the following 7 variables.

mset

Matched set indicator, 1, 2, ..., 11

group

1=lead worker, 0=control

age

Age in years

smoking

Smoking, pack-years

bll

Blood lead level, mu-g/dl

dpc

DNA-protein cross-links, percent

id

ID number in the original study, before matching

Details

The outcome, a measure of DNA damage is dpc.

Source

Fang-Yang Wu, Pao-Wen Chang, Chin-Ching Wu and Hsien-Wen Kuo (2002). "Correlations of Blood Lead with DNA-Protein Cross-Links and Sister Chromatid Exchanges in Lead Workers". Cancer Epidemiology, Biomarkers and Prevtion. March 1 2002 11 (3) 287-290.

References

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Examples

data(leadworker)
boxplot(leadworker$dpc~leadworker$group)

Smoking and Periodontal Disease

Description

Data from NHANES 2011-2012 containing 441 matched pairs of a daily cigarette smoker and a never smoker, recording the extent of periodontal disease. See Rosenbaum (2017, 2021).

Usage

data("periodontal")

Format

A data frame with 882 observations on the following 12 variables.

SEQN

NHANES 2011-2012 sequence number

female

=1 for female, 0 for male

age

Age in years

black

=1 for black, 0 for other

educf

Education, in five categories. An ordered factor with levels <9 for less than 9th grade, 9 to 11 for 9th to 11th grade, HS/GED for high school or GED degree, SomeCol for some college, College for college degree.

income

Ratio of family income to the poverty level, capped at 5 times the poverty level.

cigsperday

Cigarettes smoked per day for daily smokers, 0 for never smokers

either

Number of periodonal measurements indicative of periodontal disease.

neither

Number of periodonal measurements

pcteither

Percent indicative of periodontal disease, =100*either/neither.

z

Treatment indicator, 1=daily smoker, 0=never smoker

mset

Matched set indicator, 1 to 441.

Details

Excluding wisdom teeth, 6 measurements are taken for each tooth that is present, up to 28 teeth. Following Tomar and Asma (2000), a measurement indicates periodontal disease if either there is a loss of attachment of at least 4mm or a pocket depth of at least 4mm. The first individual has 11 measurements indicative of periodontal disease, out of 106 measurements, so pcteither is 100*11/106 = 10.38 percent.

Source

Data are from the National Health and Nutrition Examination Survey 2011-2012 and were used as an example in Rosenbaum (2017).

References

Rosenbaum, P. R. (2015) <https://obsstudies.org/two-r-packages-for-sensitivity-analysis-in-observational-studies/> "Two R packages for sensitivity analysis in observational studies". Observational Studies, 1(1), 1-17.

Rosenbaum, P. R. (2017) <doi:10.1214/17-STS621> "The general structure of evidence factors in observational studies". Statistical Science 32, 514-530.

Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Tomar, S. L. and Asma, S. (2000) <doi:10.1902/jop.2000.71.5.743> "Smoking attributable periodontitis in the US: Findings from NHANES III". J Periodont 71, 743-751.

"US National Health and Nutrition Examination Survey 2011-2012". www.cdc.gov/nchs/nhanes/index.htm

Examples

# Figure 1 in Rosenbaum (2017)
data(periodontal)
attach(periodontal)
oldpar<-par()
m<-matrix(1:2,1,2)
layout(m,widths=c(1,2))
boxplot(pcteither[z==1]-pcteither[z==0],ylab="Smoker-Control Difference",
        main="(i)",xlab="Matched Pairs",ylim=c(-100,100))
abline(h=0,lty=2)
DOS2::crosscutplot(cigsperday[z==1],
    pcteither[z==1]-pcteither[z==0],
    ylab="Smoker-Control Difference",
    xlab="Cigarettes per Day",main="(ii)",
    ylim=c(-100,100))
abline(h=0,lty=2)

# Sensitivity analysis in Section 2.3 of Rosenbaum (2017)
y<-pcteither[z==1]-pcteither[z==0]
x<-cigsperday[z==1]
DOS2::senWilcox(y,gamma=2.76)
# The following is the same as sensitivitymw::senmw(y,gamma=2.77,method="p")
sensitivitymult::senm(pcteither,z,mset,gamma=2.77,inner=.5,trim=2)
# The following is the same as sensitivitymw::senmw(y,gamma=3.5,method="p")
sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=.5,trim=2)
# Second evidence factor
DOS2::crosscut(x,y)
DOS2::crosscut(x,y,gamma=1.6)

# Note, however, that other statistics report greater insensitivity to
# bias by virtue of having larger design sensitivity:
sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=1,trim=4)
sensitivitymult::senm(pcteither,z,mset,gamma=4.2,inner=1,trim=4)
DOS2::senU(y,m1=4,m2=5,m=5,gamma=2.77)
DOS2::senU(y,m1=6,m2=8,m=8,gamma=2.77)
DOS2::senU(y,m1=6,m2=8,m=8,gamma=3.5)
detach(periodontal)
par(oldpar)

DNA Damage Among Tannery Workers

Description

Data are from Zhang et al. (2008, Table 2) who studied DNA damage among tannery workers often exposed to trivalent chromium. The outcome is the mean tail moment (mtm) of the comet assay, a standard measure of DNA damage, with higher values signifying greater damage. The study describes 90 males in 30 blocks of 3 individuals. There are three groups, each with 30 individuals. The three groups had a simlar distribution of ages, and the blocks control for smoking as closely as possible. Group e1 consists of 30 exposed workers at the tannery who worked in the tannery department, where the highest exposures to trivalent chromium are expected. Group e2 consists of 30 workers at the tannery who worked in the finishing department, where exposure to trivalent chromium is expected to be much lower. Group c consists of 30 controls who did not work at the tannery. This example is discussed in Chapter 20 of "Design of Observational Studies", 2nd ed.

Usage

data("tannery")

Format

A data frame with 30 observations on the following 4 variables.

block

Block indicator, 1 to 30.

e1mtm

mtm for the tannery worker from the tannery department.

e2mtm

mtm for the tannery worker from the finishing department.

cmtm

mtm for the control who did not work in the tannery

Details

The comet assay is described by Collins (2004). It is thought to measure DNA strand breaks, producing an image that resembles the tail of a comet, a larger, longer tail suggesting more extensive strand breaks. This example was discussed in Rosenbaum (2011, 2021).

Source

Data from Zhang et al. (2008, Table 2).

References

Collins, A. R. (2004) <doi:10.1385/MB:26:3:249> "The comet assay for DNA damage and repair: principles, applications, and limitations". Molecular Biotechnology 26(3), 249-261.

Rosenbaum, P. R. (2011) <doi:10.1198/jasa.2011.tm10422> "Some approximate evidence factors in observational studies". Journal of the American Statistical Association, 106, 285-293.

Rosenbaum, P. R. (2013)<doi:10.1111/j.1541-0420.2012.01821.x> "Impact of multiple matched controls on design sensitivity in observational studies". Biometrics 69 118-127. (Introduces inner trimming.)

Rosenbaum, P. R. (2015) <https://obsstudies.org/two-r-packages-for-sensitivity-analysis-in-observational-studies/> "Two R packages for sensitivity analysis in observational studies". Observational Studies, 1(1), 1-17.

Rosenbaum, P. R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.

Zhang, M., Chen, Z., Chen, Q., Zou, H., Lou, J. He, J. (2008) <doi:10.1016/j.mrgentox.2008.04.011> "Investigating DNA damage in tannery workers occupationally exposed to trivalent chromium using comet assay". Mutation Research/Genetic Toxicology and Environmental Mutagenesis, 654(1), 45-51.

Examples

data(tannery)
boxplot(tannery[,2:4],names=c("Tannery E1","Finishing E2",
    "Control C"),ylab="Mean Tail Moment")
oldpar<-par(mfrow=c(1,2))
boxplot(tannery[,2:3],names=c("E1","E2"),ylab="Mean Tail Moment",
     main="Tannery vs. Finishing",ylim=c(0,12))
boxplot(as.vector(unlist(tannery[,2:3])),tannery[,4],
     names=c("E1+E2","C"),ylab="Mean Tail Moment",
     main="Exposed vs. Control",ylim=c(0,12))

# Stratified Wilcoxon analysis from the chapter Evidence Factors
# of Design of Observational Studies, Second Edition
# Also reproduces the F1, F2 example in Rosenbaum (2011, sec 6).
y<-tannery[,2:4]
rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2,C)
sum(rkc[,3]) # Stratified rank sum for C in (E1, E2, C)
(35-60)/sqrt(20)

y<-tannery[,2:3]
rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2)
sum(rkc[,2]) # Stratified rank sum for E2 in (E1, E2)

# Reorganize y for input to 'separable1v' from 'sensitivitymult'
# 'separable1v' is one-sided, looking for a large rank sum

# Factor 1
y<-tannery[,4:2]*(-1)
rkc<-t(apply(y,1,rank)) # Ranks for -y for (C,E2,E1)
sensitivitymult::separable1v(rkc,gamma=1)
# Test for C in (E1, E2, C)
(85-60)/sqrt(20)
(35-60)/sqrt(20)
sensitivitymult::separable1v(rkc,gamma=6)
# Test for C in (E1, E2, C)
p1<-sensitivitymult::separable1v(rkc,gamma=7)$pval

#Factor 2
y<-tannery[,3:2]*(-1)
rkc<-t(apply(y,1,rank)) # Ranks for -y for (E2,E1)
sensitivitymult::separable1v(rkc,gamma=1)
# Test for E2 in (E2, E1)
# Combine P-values using Fisher's method
sensitivitymv::truncatedP(c(1.134237e-08,0.001743502),trunc=1)

# Larger gammas
sensitivitymult::separable1v(rkc,gamma=1.7)
p2<-sensitivitymult::separable1v(rkc,gamma=2)$pval
# Combine P-values using Fisher's method
c(p1,p2)
sensitivitymv::truncatedP(c(p1,p2),trunc=1)

# Nearly reproduces calculations from Section 6 of Rosenbaum (2011)
# However, in Rosenbaum (2011), the second factor
# uses a pooled scale factor, whereas senm does not,
# so the result is very slightly different.
attach(tannery)
mset<-rep(block,3)
zC<-c(rep(0,60),rep(1,30))
z12<-c(rep(1,30),rep(0,30),rep(NA,30))
y<-c(e1mtm,e2mtm,cmtm)
detach(tannery)
use<-!is.na(z12)
# Factor 1
sensitivitymult::senm(y,zC,mset,gamma=1,
   alternative="less",trim=1)
sensitivitymult::senm(y,zC,mset,gamma=11.7,
   alternative="less",trim=1)
# Factor 2
sensitivitymult::senm(y[use],z12[use],mset[use],
   gamma=2,alternative="greater",trim=1)

# Combine two evidence factors
p1<-sensitivitymult::senm(y,zC,mset,gamma=12,
    alternative="less",trim=1)$pval
p2<-sensitivitymult::senm(y[use],z12[use],mset[use],gamma=3,
    alternative="greater",trim=1)$pval
c(p1,p2)
sensitivitymv::truncatedP(c(p1,p2),trunc=1)
# Combine p-values using Fisher's method

# Other psi-functions often have higher design
# sensitivity; see Rosenbaum (2013)
par(oldpar)