Title: | Evidence Factors in Observational Studies |
---|---|
Description: | Contains a collection of examples of evidence factors in observational studies from the book Replication and Evidence Factors in Observational Studies by Paul R. Rosenbaum (2021) <doi:10.1201/9781003039648>. |
Authors: | Paul R. Rosenbaum |
Maintainer: | Paul R. Rosenbaum <[email protected]> |
License: | GPL-2 |
Version: | 1.0.4 |
Built: | 2024-12-18 06:34:40 UTC |
Source: | CRAN |
Two groups of exposed workers compared to unexposed controls, where exposed workers prepared antineoplasitic drugs, protected either only by gloves or by gloves and a laminar hood with vertical air flow. The outcome is the comet assay applied to blood lymphocytes. The comet assay is a measure of damage to DNA. Data from Kopjar and Garaj-Vrhovac (2001). These data illustrate two evidence factors formed when comparing three treatment groups.
data("antineoplastic")
data("antineoplastic")
A data frame with 59 observations on the following 9 variables.
id
ID number. See Tables I and III of Kopjar and Garaj-Vrhovac (2001).
age
Age in years
str
Three age strata, cut at the thirds of age.
grp
Gloves = exposed nurses/doctors protected only by gloves, Hood = exposed nurses/doctors protected by both gloves and a laminar hood with vertical air flow, Control = students and office workers not exposed to antineoplastic drugs.
tailmoment
Tail moment of the comet assay.
taillength
Tail length of the comet assay mu-m.
z1
1 if exposed, 0 if control
z2
1 if Gloves, 0 if Hood, NA if control
f2
TRUE if in factor 2, FALSE otherwise
The data set is intended to illustrate evidence factors, comparing exposed workers to controls, and workers protected only by gloves to workers with the additional protecting of a laminar hood with vertical air flow.
From Tables I and III of Kopjar and Garaj-Vrhovac (2001).
Used as an example in Rosenbaum (2021).
Kopjar, N. and Garaj-Vrhovac, V. (2001) <doi:10.1093/mutage/16.1.71> "Application of the alkaline comet assay in human biomonitoring for genotoxicity: a study on Croation medical personnel handling antineoplastic drugs". Mutagenesis, 16, 71-78.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(antineoplastic) attach(antineoplastic) table(str) table(grp) oldpar<-par(mfrow=c(1,2)) boxplot(tailmoment~grp,ylab="Tail Moment", main="3 Groups",ylim=c(8,20)) boxplot(tailmoment~z1,names=c("Control","Exposed"), main="Factor 1",ylab="Tail Moment",ylim=c(8,20)) boxplot(tailmoment[f2]~z2[f2],names=c("Hood","Gloves"), ylab="Tail Moment",main="Factor 2",ylim=c(8,20)) y<-senstrat::hodgeslehmann(tailmoment,z1,str) # First factor senstrat::senstrat(y,z1,str,gamma=20) # Second factor senstrat::senstrat(y[f2],z2[f2],str[f2],gamma=2.75) detach(antineoplastic) par(oldpar)
data(antineoplastic) attach(antineoplastic) table(str) table(grp) oldpar<-par(mfrow=c(1,2)) boxplot(tailmoment~grp,ylab="Tail Moment", main="3 Groups",ylim=c(8,20)) boxplot(tailmoment~z1,names=c("Control","Exposed"), main="Factor 1",ylab="Tail Moment",ylim=c(8,20)) boxplot(tailmoment[f2]~z2[f2],names=c("Hood","Gloves"), ylab="Tail Moment",main="Factor 2",ylim=c(8,20)) y<-senstrat::hodgeslehmann(tailmoment,z1,str) # First factor senstrat::senstrat(y,z1,str,gamma=20) # Second factor senstrat::senstrat(y[f2],z2[f2],str[f2],gamma=2.75) detach(antineoplastic) par(oldpar)
Examines chromosome aberrations among shoe workers exposed to benzene and unexposed controls.
data("benzene")
data("benzene")
A data frame with 78 observations on the following 5 variables.
age
Age in years
exposure
Years employed as a shoe worker. For controls, exposure = 0
alcohol
0=no, 1=yes
smoking
Cigarette smoking, packs per day
totalplus
Total chromosome aberrations, including gaps.
The data are frome Tunca, B.T. and Egeli, U. (1996). Paz-y-Mino (2002) discuss whether to include gaps in chromosome aberrations. Used as an example in Rosenbaum (2001) and Rosenbaum (2021).
Paz-y-Mino, C., Davalos, M.V., Sanchez, M.E., Arevalo, M. and Leone, P.E. (2002). <doi:10.1016/S1383-5718(02)00021-9> "Should gaps be included in chromosomal aberration analysis? Evidence based on the comet assay". Mutation Research: Genetic Toxicology and Environmental Mutagenesis, 516, 57-61.
Rosenbaum, P.R. (2001). <doi:10.1093/biomet/88.1.219> "Effects attributable to treatment: Inference in experiments and observational studies with a discrete pivot". Biometrika, 88, 219-231.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
Tunca, B.T., Egeli, U. (1996). <doi:10.1289/ehp.961041313> "Cytogenetic findings on shoe workers exposed long-term to benzene". Environmental Health Perspectives, 104, 1313-1317.
data(benzene) attach(benzene) boxplot(totalplus[exposure==0],totalplus[exposure>0],names=c("C","W"),ylab="Chromosome aberrations", main=c("Controls vs Workers"),xlab="C=control, W=worker",ylim=c(0,36)) plot(exposure[exposure>0],totalplus[exposure>0],ylim=c(0,36),main="Shoe Workers Only", xlab="Years of Exposure",ylab="Chromosome aberrations",pch=16) wilcox.test(totalplus[exposure>0],totalplus[exposure==0],conf.int=TRUE,alternative = "greater") cor.test(exposure[exposure>0],totalplus[exposure>0],method="k",alternative = "greater") detach(benzene)
data(benzene) attach(benzene) boxplot(totalplus[exposure==0],totalplus[exposure>0],names=c("C","W"),ylab="Chromosome aberrations", main=c("Controls vs Workers"),xlab="C=control, W=worker",ylim=c(0,36)) plot(exposure[exposure>0],totalplus[exposure>0],ylim=c(0,36),main="Shoe Workers Only", xlab="Years of Exposure",ylab="Chromosome aberrations",pch=16) wilcox.test(totalplus[exposure>0],totalplus[exposure==0],conf.int=TRUE,alternative = "greater") cor.test(exposure[exposure>0],totalplus[exposure>0],method="k",alternative = "greater") detach(benzene)
Data from Card and Krueger's study of minimum wages and employment in New Jersey and Pennsylvania.
data("ck")
data("ck")
A data frame with 198 observations on the following 7 variables.
mset
Matched set indicator, 1, 2, ..., 66
grp
Group indicator NJhigh
NJlow
PA
fte
Full time employees, Februrary 1992, before the increase in the minimum wage in New Jersey (NJ).
fte2
Full time employees, November 1992, after the increase in the minimum wage in New Jersey (NJ).
CHAIN
Restaurant chain, Burger King, Wendy's, KFC and Roy Rogers
HRSOPEN
Number of hours the restaurant was open in February 1992
SHEET
Identifier that may be used to link the limited data here to the full data set provided by Card and Krueger.
On 1 April 1992, the state of New Jersy increased its minimum wage from 4.25 dollars per hour to 5.05 dollars per hour, while adjacent Pennsylvania made no change. David Card and Alan Krueger studied the change in employment at fast food restaurants, from before the wage increase to after, based on survey data they collected.
There are three groups matched for HRSOPEN and indicators of the CHAIN. PA are restaurants in eastern Pennsylvania, adjacent to western New Jersey. NJlow are New Jersey restaurants with a starting wage of at most 4.50 dollars in February 1992, and NJhigh are New Jersey restaurants with a starting wage of more than 4.50 dollars. The NJlow restaurants were required to increase the starting wage by substantially more than the NJhigh restaurants.
The match was built using the algorithm of Karmakar et al. (2019) and the approximatch package in R.
Used as an example in Rosenbaum (2021).
Using a different matched sample, an analysis of two evidence factors in Card and Krueger's study is given in Rosenbaum (2010).
Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.
Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.
Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2019) <doi:10.1080/10618600.2019.1584900> "Using approximation algorithms to build evidence factors and related designs for observational studies". Journal of Computational and Graphical Statistics, 28, 698-709.
Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2021) <doi:10.1080/01621459.2020.1745811> Reinforced designs: Multiple instruments plus control groups as evidence factors in an observational study of the effectiveness of Catholic schools. Journal of the American Statistical Association, 116, 82-92.
Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(ck) boxplot((ck$fte2-ck$fte)~ck$grp)
data(ck) boxplot((ck$fte2-ck$fte)~ck$grp)
The data are Card and Krueger's (1994) minimum wage case study, prior to matching for CHAIN and HRSOPEN. The data illustrate matching to form three groups using Bikram Karmakar's approxmatch package. A similar matched data set after matching is in ck in this package: ck is a matched subset of all of the data in ckA.
data("ckA")
data("ckA")
A data frame with 351 observations on the following 9 variables.
SHEET
Identifier that may be used to link the limited data here to the full data set provided by Card and Krueger.
HRSOPEN
Number of hours the restaurant was open in February 1992
CHAIN
Restaurant chain, Burger King, Wendy's, KFC and Roy Rogers
chain1
Binary indicator, chain 1
chain2
Binary indicator, chain 2
chain3
Binary indicator, chain 3
FTE
Full time employees, Februrary 1992, before the increase in the minimum wage in New Jersey (NJ).
FTE2
Full time employees, November 1992, after the increase in the minimum wage in New Jersey (NJ).
grp
Group indicator NJhigh
NJlow
PA
These data are used to illustrate matching to form matched triples from three groups, Pennsylvania restaurants, New Jersey restaurants with low starting wages before the wage increase, and New Jersey restaurants with high starting wages before the wage increase. The low/high wage cut is at 4.50 dollars per hour. The method illustrated is from Karmakar et al. (2019) as implemented in Karmakar's approxmatch package.
If you would like to create the match in the ck data using the ckA data, then you must install the optmach package, accept its academic license, and install the approxmatch package. The optmatch packages uses the relax code of Bertsekas and Tseng (1988). The following code will then recreate the matched data set. As the matched data is available in ck, there is no need to run this code, unless you are interested in how to use the approxmatch package.
dist<-approxmatch::multigrp_dist_struc(ckA,as.character(ckA$grp), list(mahal=c("chain1","chain2","chain3", "HRSOPEN")),wgts=1) mtch<-approxmatch::tripletmatching(dist,as.character(ckA$grp), indexgroup="PA",ckA,"CHAIN",design=c(1,1,1))
Used as an example in Rosenbaum (2021).
Using a different matched sample, an analysis of two evidence factors in Card and Krueger's study is given in Rosenbaum (2010).
Card, D. and Krueger, A. B. (1994) <doi:10.1080/10618600.2019.1584900> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.
Bertsekas, D. P. and Tseng, P. (1988) <doi:10.1007/BF02288322> "The Relax codes for linear minimum cost network flow problems". Annals of Operations Research, 13(1), 125-190.
Card, D. and Krueger, A. B. (1994) <doi:10.1257/aer.90.5.1397> "Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania". American Economic Review, 84, 772-793.
Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2019) <doi:10.1080/10618600.2019.1584900> "Using approximation algorithms to build evidence factors and related designs for observational studies". Journal of Computational and Graphical Statistics, 28, 698-709.
Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(ckA) table(ckA$CHAIN) table(ckA$grp)
data(ckA) table(ckA$CHAIN) table(ckA$grp)
Data from NHANES 2005-2006 concerning homocysteine levels in daily smokers (z=1) and never smokers (z=0), aged 20 and older. Daily smokers smoked every day for the last 30 days, smoking an average of at least 10 cigarettes per day. Never smokers smoked fewer than 100 cigarettes in their lives, do not smoke now, and had no tobacco use in the previous 5 days. There are 2475 individuals in 104 strata defined by gender, age, education, income, and BMI.
data("hsmoke")
data("hsmoke")
A data frame with 2475 observations on the following 15 variables.
SEQN
2005-2006 NHANES ID number.
homocysteine
Homocysteine level, umol/L. Based on LBXHCY.
z
z=1 for a daily smoker, z=0 for a never smoker. Based on SMQ020, SMQ040, SMD641, SMD650, SMQ680.
female
1=female, 0=male. Based on RIAGENDR
age
Age in years. Based on RIDAGEYR.
education
Education. Based on DMDEDUC2.
povertyr
Ratio of family income to the poverty level, capped at five times the poverty level. Based on INDFMPIR.
bmi
BMI = body-mass-index. Based on BMXBMI.
cotinine
Cotinine level (ng/mL), a marker for recent smoking. Based on LBXCOT.
st
Numeric strata indicating female, age, education, BMI and poverty.
stf
A factor for strata indicating female, age, education, BMI and poverty.
age3
Three age categories, 20-39, 40-50, >=60. Used to define the strata.
ed3
Three education categories, <High School, High School, at least some College. Used to define the strata.
bmi3
Three categories of the body-mass-index, BMI, <30, [30,35), >= 35. Used to define the strata.
pov2
TRUE=income at least twice the poverty level, FALSE otherwise. Used to define the strata.
Bazzano et al. (2003) noted higher homocysteine levels in the blood of smokers. This data set reexamines this issue using more recent data from NHANES 2005-2006, the most recent NHANES that measured homocysteine.
The example reproduces analyses from Rosenbaum (2018). The example illustrates use a sensitivity analysis with 104 strata defined by covariates, with and without additional adjustment for continuous versions of these covariates using robust covariance adjustment; see Rosenbaum (2002).
Used as an example in Rosenbaum (2021).
Other analyses of smoking and homocysteine are in Karmakar et al. (2020) and Pimentel et al. (2015).
NHANES, the US National Health and Nutrition Examination Survey, 2005-2006. Publicly available from the US National Center for Health Statistics, US Centers for Disease Control.
Bazzano, L. A., He, J., Muntner, P., Vupputuri, S. and Whelton, P. K. (2003) <doi:10.7326/0003-4819-138-11-200306030-00010> "Relationship between cigarette smoking and novel risk factors for cardiovascular disease in the United States". Annals of Internal Medicine, 138, 891-897.
Karmakar, B., Small, D. S. and Rosenbaum, P. R. (2020) <doi:10.1093/aje/kwz263> "Using evidence factors to clarify exposure biomarkers". American Journal of Epidemiology, 189, 243–249.
Pimentel, S. D., Small, D. S. and Rosenbaum, P. R. (2016) <doi:10.1080/01621459.2015.1076342> "Constructed second control groups and attenuation of unmeasured biases". Journal of the American Statistical Association, 111, 1157-1167.
Rosenbaum, P. R. (2002). <doi:10.1214/ss/1042727942> "Covariance adjustment in randomized experiments and observational studies". Statistical Science, 17(3), 286-327.
Rosenbaum, P. R. (2018) <doi:10.1214/18-AOAS1153> "Sensitivity analysis for stratified comparisons in an observational study of the effect of smoking on homocysteine levels". Annals of Applied Statistics 12(4):2312–2334.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(hsmoke) attach(hsmoke) # Calculations discussed in Rosenbaum (2018). # Stratified analysis of log2(homocysteine) # Uses Hodges-Lehmann aligned ranks l2h<-log2(homocysteine) hll2h<-senstrat::hodgeslehmann(l2h,z,st,align="hl") senstrat::senstrat(hll2h,z,st,gamma=1.95) senstrat::senstrat(hll2h,z,st,gamma=2.1) # Stratification + Covariance Adjustment # Covariance model does not include the treatment, z mod<-MASS::rlm(l2h~female+age+povertyr+bmi+education) l2hr<-as.vector(mod$residual) hll2hr<-senstrat::hodgeslehmann(l2hr,z,st,align="hl") senstrat::senstrat(hll2hr,z,st,gamma=2.1) # Uses M-scores in place of aligned ranks msr<-senstrat::mscores(l2hr,z,st=st) senstrat::senstrat(msr,z,st,gamma=2.35) # Evidence factor analysis from Section 6.2 of Rosenbaum (2018) # Among smokers, is homocysteine higher with high cotinine? summary(cotinine[z==1]) # Cotinine among smokers table(cotinine[z==1]<=185.750) table(cotinine[z==1]>=342) use<-(z==1)&((cotinine<=185.750)|(cotinine>=342)) cot<-1*((cotinine>=342)) # Figure 4 in Rosenbaum (2018) graphics::boxplot(l2hr[use]~cot[use],names=c("Low","High"), ylab="Residual",xlab="Cotinine") cotl2hr<-senstrat::mscores(l2hr[use],cot[use],st=st[use]) senstrat::senstrat(cotl2hr,cot[use],st[use],gamma=2.5) detach(hsmoke)
data(hsmoke) attach(hsmoke) # Calculations discussed in Rosenbaum (2018). # Stratified analysis of log2(homocysteine) # Uses Hodges-Lehmann aligned ranks l2h<-log2(homocysteine) hll2h<-senstrat::hodgeslehmann(l2h,z,st,align="hl") senstrat::senstrat(hll2h,z,st,gamma=1.95) senstrat::senstrat(hll2h,z,st,gamma=2.1) # Stratification + Covariance Adjustment # Covariance model does not include the treatment, z mod<-MASS::rlm(l2h~female+age+povertyr+bmi+education) l2hr<-as.vector(mod$residual) hll2hr<-senstrat::hodgeslehmann(l2hr,z,st,align="hl") senstrat::senstrat(hll2hr,z,st,gamma=2.1) # Uses M-scores in place of aligned ranks msr<-senstrat::mscores(l2hr,z,st=st) senstrat::senstrat(msr,z,st,gamma=2.35) # Evidence factor analysis from Section 6.2 of Rosenbaum (2018) # Among smokers, is homocysteine higher with high cotinine? summary(cotinine[z==1]) # Cotinine among smokers table(cotinine[z==1]<=185.750) table(cotinine[z==1]>=342) use<-(z==1)&((cotinine<=185.750)|(cotinine>=342)) cot<-1*((cotinine>=342)) # Figure 4 in Rosenbaum (2018) graphics::boxplot(l2hr[use]~cot[use],names=c("Low","High"), ylab="Residual",xlab="Cotinine") cotl2hr<-senstrat::mscores(l2hr[use],cot[use],st=st[use]) senstrat::senstrat(cotl2hr,cot[use],st[use],gamma=2.5) detach(hsmoke)
Data from Morton et al. (1982) concerning exposed children whose fathers worked in a battery plant where lead was used in the manufacture of batteries. Exposed children were matched to controls for age and neighborhood. For exposed children, also given are the father's level of exposure to lead at work (level) and the father's hygiene upon leaving the battery plant at the end of the day.
data("lead")
data("lead")
A data frame with 33 observations on the following 6 variables.
control
Blood lead level for the control, micrograms of lead per decaliter of whole blood.
exposed
Blood lead level for the exposed/treated child, micrograms of lead per decaliter of whole blood.
level
Father's level of exposure to lead: a factor with levels high
low
medium
hyg
Father's hygiene before going home: a factor with levels good
mod
poor
both
A factor built from level and hyg: a factor with levels high.ok
high.poor
low
medium
dif
Exposed-minus-control pair difference in blood lead levels.
The data were assembled from two published tables in Morton et al. (1982). One matched pair with no control is omitted here. Small ambiguities in assembling a complete data set from two tables were resolved by a throw of the dice; however, it is a reasonable example to illustrate statistical methods. One table described the exposed-versus-control matched pairs, and these are as in the paper. The second table described the exposed individuals, their level of exposure and their hygiene, and again these are as in the paper. The two tables were linked using the blood lead level of the exposed children, and a couple of ties in these lead levels made for small ambiguities about which control responses belong with which hygienes and exposure levels for the exposed children. See, for instance, rows 18 and 22, where both exposed children have blood lead level 34.
Data are from Morton et al. (1982). They were used as an example in Rosenbaum (1991, 2002, 2011, 2017, 2021).
Morton, D. E., Saah, A. J., Silberg, S. L., Owens, W. L., Roberts, M. A., & Saah, M. D. (1982) <doi:10.1093/oxfordjournals.aje.a113336> "Lead absorption in children of employees in a lead-related industry". American Journal of Epidemiology, 115(4), 549-555.
Rosenbaum, P. R. (1991) <doi:10.1214/aos/1176348141> "Some poset statistics". The Annals of Statistics, 19(2), 1091-1097.
Rosenbaum, P. R. (2002) <doi:10.1007/978-1-4757-3692-2_3> "Observational Studies" (2nd edition). New York: Springer. Section 4.3.
Rosenbaum, P. R. (2010). <doi:10.1093/biomet/asq019> "Evidence factors in observational studies". Biometrika, 97(2), 333-345.
Rosenbaum, P. R. (2011) <doi:10.1111/j.1541-0420.2010.01535.x> "A new U-statistic with superior design sensitivity in matched observational studies". Biometrics, 67(3), 1017-1027.
Rosenbaum, P. R. (2017) <https://www.hup.harvard.edu/catalog.php?isbn=9780674975576> "Observation and Experiment: An Introduction to Causal Inference". Cambridge, MA: Harvard University Press. Chapters 7 and 9.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(lead) # Reproduces parts of Table 2 in Rosenbaum (2011) DOS2::senU(lead$dif,gamma=5.8,m=8,m1=5,m2=8) DOS2::senU(lead$dif,gamma=5,m=5,m1=4,m2=5) # m=2, m1=2, m2=2 is the U-statistic that closely # resembles Wilcoxon's signed rank test. Note # that the results are almost the same. DOS2::senWilcox(lead$dif,gamma=5) # In Table 2 DOS2::senU(lead$dif,gamma=5,m=2,m1=2,m2=2)
data(lead) # Reproduces parts of Table 2 in Rosenbaum (2011) DOS2::senU(lead$dif,gamma=5.8,m=8,m1=5,m2=8) DOS2::senU(lead$dif,gamma=5,m=5,m1=4,m2=5) # m=2, m1=2, m2=2 is the U-statistic that closely # resembles Wilcoxon's signed rank test. Note # that the results are almost the same. DOS2::senWilcox(lead$dif,gamma=5) # In Table 2 DOS2::senU(lead$dif,gamma=5,m=2,m1=2,m2=2)
Data from Table 1 of Wu et al. (2002) concerning DNA damage among lead workers compared to controls. In this example, the data have been matched for age and smoking, making 11 matched pairs. Used as an example in Rosenbaum (2021).
data("leadworker")
data("leadworker")
A data frame with 22 observations on the following 7 variables.
mset
Matched set indicator, 1, 2, ..., 11
group
1=lead worker, 0=control
age
Age in years
smoking
Smoking, pack-years
bll
Blood lead level, mu-g/dl
dpc
DNA-protein cross-links, percent
id
ID number in the original study, before matching
The outcome, a measure of DNA damage is dpc.
Fang-Yang Wu, Pao-Wen Chang, Chin-Ching Wu and Hsien-Wen Kuo (2002). "Correlations of Blood Lead with DNA-Protein Cross-Links and Sister Chromatid Exchanges in Lead Workers". Cancer Epidemiology, Biomarkers and Prevtion. March 1 2002 11 (3) 287-290.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
data(leadworker) boxplot(leadworker$dpc~leadworker$group)
data(leadworker) boxplot(leadworker$dpc~leadworker$group)
Data from NHANES 2011-2012 containing 441 matched pairs of a daily cigarette smoker and a never smoker, recording the extent of periodontal disease. See Rosenbaum (2017, 2021).
data("periodontal")
data("periodontal")
A data frame with 882 observations on the following 12 variables.
SEQN
NHANES 2011-2012 sequence number
female
=1 for female, 0 for male
age
Age in years
black
=1 for black, 0 for other
educf
Education, in five categories. An ordered factor with levels <9
for less than 9th grade, 9 to 11
for 9th to 11th grade, HS/GED
for high school or GED degree,
SomeCol
for some college, College
for college degree.
income
Ratio of family income to the poverty level, capped at 5 times the poverty level.
cigsperday
Cigarettes smoked per day for daily smokers, 0 for never smokers
either
Number of periodonal measurements indicative of periodontal disease.
neither
Number of periodonal measurements
pcteither
Percent indicative of periodontal disease, =100*either/neither.
z
Treatment indicator, 1=daily smoker, 0=never smoker
mset
Matched set indicator, 1 to 441.
Excluding wisdom teeth, 6 measurements are taken for each tooth that is present, up to 28 teeth. Following Tomar and Asma (2000), a measurement indicates periodontal disease if either there is a loss of attachment of at least 4mm or a pocket depth of at least 4mm. The first individual has 11 measurements indicative of periodontal disease, out of 106 measurements, so pcteither is 100*11/106 = 10.38 percent.
Data are from the National Health and Nutrition Examination Survey 2011-2012 and were used as an example in Rosenbaum (2017).
Rosenbaum, P. R. (2015) <https://obsstudies.org/two-r-packages-for-sensitivity-analysis-in-observational-studies/> "Two R packages for sensitivity analysis in observational studies". Observational Studies, 1(1), 1-17.
Rosenbaum, P. R. (2017) <doi:10.1214/17-STS621> "The general structure of evidence factors in observational studies". Statistical Science 32, 514-530.
Rosenbaum, Paul R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
Tomar, S. L. and Asma, S. (2000) <doi:10.1902/jop.2000.71.5.743> "Smoking attributable periodontitis in the US: Findings from NHANES III". J Periodont 71, 743-751.
"US National Health and Nutrition Examination Survey 2011-2012". www.cdc.gov/nchs/nhanes/index.htm
# Figure 1 in Rosenbaum (2017) data(periodontal) attach(periodontal) oldpar<-par() m<-matrix(1:2,1,2) layout(m,widths=c(1,2)) boxplot(pcteither[z==1]-pcteither[z==0],ylab="Smoker-Control Difference", main="(i)",xlab="Matched Pairs",ylim=c(-100,100)) abline(h=0,lty=2) DOS2::crosscutplot(cigsperday[z==1], pcteither[z==1]-pcteither[z==0], ylab="Smoker-Control Difference", xlab="Cigarettes per Day",main="(ii)", ylim=c(-100,100)) abline(h=0,lty=2) # Sensitivity analysis in Section 2.3 of Rosenbaum (2017) y<-pcteither[z==1]-pcteither[z==0] x<-cigsperday[z==1] DOS2::senWilcox(y,gamma=2.76) # The following is the same as sensitivitymw::senmw(y,gamma=2.77,method="p") sensitivitymult::senm(pcteither,z,mset,gamma=2.77,inner=.5,trim=2) # The following is the same as sensitivitymw::senmw(y,gamma=3.5,method="p") sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=.5,trim=2) # Second evidence factor DOS2::crosscut(x,y) DOS2::crosscut(x,y,gamma=1.6) # Note, however, that other statistics report greater insensitivity to # bias by virtue of having larger design sensitivity: sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=1,trim=4) sensitivitymult::senm(pcteither,z,mset,gamma=4.2,inner=1,trim=4) DOS2::senU(y,m1=4,m2=5,m=5,gamma=2.77) DOS2::senU(y,m1=6,m2=8,m=8,gamma=2.77) DOS2::senU(y,m1=6,m2=8,m=8,gamma=3.5) detach(periodontal) par(oldpar)
# Figure 1 in Rosenbaum (2017) data(periodontal) attach(periodontal) oldpar<-par() m<-matrix(1:2,1,2) layout(m,widths=c(1,2)) boxplot(pcteither[z==1]-pcteither[z==0],ylab="Smoker-Control Difference", main="(i)",xlab="Matched Pairs",ylim=c(-100,100)) abline(h=0,lty=2) DOS2::crosscutplot(cigsperday[z==1], pcteither[z==1]-pcteither[z==0], ylab="Smoker-Control Difference", xlab="Cigarettes per Day",main="(ii)", ylim=c(-100,100)) abline(h=0,lty=2) # Sensitivity analysis in Section 2.3 of Rosenbaum (2017) y<-pcteither[z==1]-pcteither[z==0] x<-cigsperday[z==1] DOS2::senWilcox(y,gamma=2.76) # The following is the same as sensitivitymw::senmw(y,gamma=2.77,method="p") sensitivitymult::senm(pcteither,z,mset,gamma=2.77,inner=.5,trim=2) # The following is the same as sensitivitymw::senmw(y,gamma=3.5,method="p") sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=.5,trim=2) # Second evidence factor DOS2::crosscut(x,y) DOS2::crosscut(x,y,gamma=1.6) # Note, however, that other statistics report greater insensitivity to # bias by virtue of having larger design sensitivity: sensitivitymult::senm(pcteither,z,mset,gamma=3.5,inner=1,trim=4) sensitivitymult::senm(pcteither,z,mset,gamma=4.2,inner=1,trim=4) DOS2::senU(y,m1=4,m2=5,m=5,gamma=2.77) DOS2::senU(y,m1=6,m2=8,m=8,gamma=2.77) DOS2::senU(y,m1=6,m2=8,m=8,gamma=3.5) detach(periodontal) par(oldpar)
Data are from Zhang et al. (2008, Table 2) who studied DNA damage among tannery workers often exposed to trivalent chromium. The outcome is the mean tail moment (mtm) of the comet assay, a standard measure of DNA damage, with higher values signifying greater damage. The study describes 90 males in 30 blocks of 3 individuals. There are three groups, each with 30 individuals. The three groups had a simlar distribution of ages, and the blocks control for smoking as closely as possible. Group e1 consists of 30 exposed workers at the tannery who worked in the tannery department, where the highest exposures to trivalent chromium are expected. Group e2 consists of 30 workers at the tannery who worked in the finishing department, where exposure to trivalent chromium is expected to be much lower. Group c consists of 30 controls who did not work at the tannery. This example is discussed in Chapter 20 of "Design of Observational Studies", 2nd ed.
data("tannery")
data("tannery")
A data frame with 30 observations on the following 4 variables.
block
Block indicator, 1 to 30.
e1mtm
mtm for the tannery worker from the tannery department.
e2mtm
mtm for the tannery worker from the finishing department.
cmtm
mtm for the control who did not work in the tannery
The comet assay is described by Collins (2004). It is thought to measure DNA strand breaks, producing an image that resembles the tail of a comet, a larger, longer tail suggesting more extensive strand breaks. This example was discussed in Rosenbaum (2011, 2021).
Data from Zhang et al. (2008, Table 2).
Collins, A. R. (2004) <doi:10.1385/MB:26:3:249> "The comet assay for DNA damage and repair: principles, applications, and limitations". Molecular Biotechnology 26(3), 249-261.
Rosenbaum, P. R. (2011) <doi:10.1198/jasa.2011.tm10422> "Some approximate evidence factors in observational studies". Journal of the American Statistical Association, 106, 285-293.
Rosenbaum, P. R. (2013)<doi:10.1111/j.1541-0420.2012.01821.x> "Impact of multiple matched controls on design sensitivity in observational studies". Biometrics 69 118-127. (Introduces inner trimming.)
Rosenbaum, P. R. (2015) <https://obsstudies.org/two-r-packages-for-sensitivity-analysis-in-observational-studies/> "Two R packages for sensitivity analysis in observational studies". Observational Studies, 1(1), 1-17.
Rosenbaum, P. R. (2021) <doi:10.1201/9781003039648> Replication and Evidence Factors in Observational Studies. Chapman and Hall/CRC.
Zhang, M., Chen, Z., Chen, Q., Zou, H., Lou, J. He, J. (2008) <doi:10.1016/j.mrgentox.2008.04.011> "Investigating DNA damage in tannery workers occupationally exposed to trivalent chromium using comet assay". Mutation Research/Genetic Toxicology and Environmental Mutagenesis, 654(1), 45-51.
data(tannery) boxplot(tannery[,2:4],names=c("Tannery E1","Finishing E2", "Control C"),ylab="Mean Tail Moment") oldpar<-par(mfrow=c(1,2)) boxplot(tannery[,2:3],names=c("E1","E2"),ylab="Mean Tail Moment", main="Tannery vs. Finishing",ylim=c(0,12)) boxplot(as.vector(unlist(tannery[,2:3])),tannery[,4], names=c("E1+E2","C"),ylab="Mean Tail Moment", main="Exposed vs. Control",ylim=c(0,12)) # Stratified Wilcoxon analysis from the chapter Evidence Factors # of Design of Observational Studies, Second Edition # Also reproduces the F1, F2 example in Rosenbaum (2011, sec 6). y<-tannery[,2:4] rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2,C) sum(rkc[,3]) # Stratified rank sum for C in (E1, E2, C) (35-60)/sqrt(20) y<-tannery[,2:3] rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2) sum(rkc[,2]) # Stratified rank sum for E2 in (E1, E2) # Reorganize y for input to 'separable1v' from 'sensitivitymult' # 'separable1v' is one-sided, looking for a large rank sum # Factor 1 y<-tannery[,4:2]*(-1) rkc<-t(apply(y,1,rank)) # Ranks for -y for (C,E2,E1) sensitivitymult::separable1v(rkc,gamma=1) # Test for C in (E1, E2, C) (85-60)/sqrt(20) (35-60)/sqrt(20) sensitivitymult::separable1v(rkc,gamma=6) # Test for C in (E1, E2, C) p1<-sensitivitymult::separable1v(rkc,gamma=7)$pval #Factor 2 y<-tannery[,3:2]*(-1) rkc<-t(apply(y,1,rank)) # Ranks for -y for (E2,E1) sensitivitymult::separable1v(rkc,gamma=1) # Test for E2 in (E2, E1) # Combine P-values using Fisher's method sensitivitymv::truncatedP(c(1.134237e-08,0.001743502),trunc=1) # Larger gammas sensitivitymult::separable1v(rkc,gamma=1.7) p2<-sensitivitymult::separable1v(rkc,gamma=2)$pval # Combine P-values using Fisher's method c(p1,p2) sensitivitymv::truncatedP(c(p1,p2),trunc=1) # Nearly reproduces calculations from Section 6 of Rosenbaum (2011) # However, in Rosenbaum (2011), the second factor # uses a pooled scale factor, whereas senm does not, # so the result is very slightly different. attach(tannery) mset<-rep(block,3) zC<-c(rep(0,60),rep(1,30)) z12<-c(rep(1,30),rep(0,30),rep(NA,30)) y<-c(e1mtm,e2mtm,cmtm) detach(tannery) use<-!is.na(z12) # Factor 1 sensitivitymult::senm(y,zC,mset,gamma=1, alternative="less",trim=1) sensitivitymult::senm(y,zC,mset,gamma=11.7, alternative="less",trim=1) # Factor 2 sensitivitymult::senm(y[use],z12[use],mset[use], gamma=2,alternative="greater",trim=1) # Combine two evidence factors p1<-sensitivitymult::senm(y,zC,mset,gamma=12, alternative="less",trim=1)$pval p2<-sensitivitymult::senm(y[use],z12[use],mset[use],gamma=3, alternative="greater",trim=1)$pval c(p1,p2) sensitivitymv::truncatedP(c(p1,p2),trunc=1) # Combine p-values using Fisher's method # Other psi-functions often have higher design # sensitivity; see Rosenbaum (2013) par(oldpar)
data(tannery) boxplot(tannery[,2:4],names=c("Tannery E1","Finishing E2", "Control C"),ylab="Mean Tail Moment") oldpar<-par(mfrow=c(1,2)) boxplot(tannery[,2:3],names=c("E1","E2"),ylab="Mean Tail Moment", main="Tannery vs. Finishing",ylim=c(0,12)) boxplot(as.vector(unlist(tannery[,2:3])),tannery[,4], names=c("E1+E2","C"),ylab="Mean Tail Moment", main="Exposed vs. Control",ylim=c(0,12)) # Stratified Wilcoxon analysis from the chapter Evidence Factors # of Design of Observational Studies, Second Edition # Also reproduces the F1, F2 example in Rosenbaum (2011, sec 6). y<-tannery[,2:4] rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2,C) sum(rkc[,3]) # Stratified rank sum for C in (E1, E2, C) (35-60)/sqrt(20) y<-tannery[,2:3] rkc<-t(apply(y,1,rank)) # Ranks for (E1,E2) sum(rkc[,2]) # Stratified rank sum for E2 in (E1, E2) # Reorganize y for input to 'separable1v' from 'sensitivitymult' # 'separable1v' is one-sided, looking for a large rank sum # Factor 1 y<-tannery[,4:2]*(-1) rkc<-t(apply(y,1,rank)) # Ranks for -y for (C,E2,E1) sensitivitymult::separable1v(rkc,gamma=1) # Test for C in (E1, E2, C) (85-60)/sqrt(20) (35-60)/sqrt(20) sensitivitymult::separable1v(rkc,gamma=6) # Test for C in (E1, E2, C) p1<-sensitivitymult::separable1v(rkc,gamma=7)$pval #Factor 2 y<-tannery[,3:2]*(-1) rkc<-t(apply(y,1,rank)) # Ranks for -y for (E2,E1) sensitivitymult::separable1v(rkc,gamma=1) # Test for E2 in (E2, E1) # Combine P-values using Fisher's method sensitivitymv::truncatedP(c(1.134237e-08,0.001743502),trunc=1) # Larger gammas sensitivitymult::separable1v(rkc,gamma=1.7) p2<-sensitivitymult::separable1v(rkc,gamma=2)$pval # Combine P-values using Fisher's method c(p1,p2) sensitivitymv::truncatedP(c(p1,p2),trunc=1) # Nearly reproduces calculations from Section 6 of Rosenbaum (2011) # However, in Rosenbaum (2011), the second factor # uses a pooled scale factor, whereas senm does not, # so the result is very slightly different. attach(tannery) mset<-rep(block,3) zC<-c(rep(0,60),rep(1,30)) z12<-c(rep(1,30),rep(0,30),rep(NA,30)) y<-c(e1mtm,e2mtm,cmtm) detach(tannery) use<-!is.na(z12) # Factor 1 sensitivitymult::senm(y,zC,mset,gamma=1, alternative="less",trim=1) sensitivitymult::senm(y,zC,mset,gamma=11.7, alternative="less",trim=1) # Factor 2 sensitivitymult::senm(y[use],z12[use],mset[use], gamma=2,alternative="greater",trim=1) # Combine two evidence factors p1<-sensitivitymult::senm(y,zC,mset,gamma=12, alternative="less",trim=1)$pval p2<-sensitivitymult::senm(y[use],z12[use],mset[use],gamma=3, alternative="greater",trim=1)$pval c(p1,p2) sensitivitymv::truncatedP(c(p1,p2),trunc=1) # Combine p-values using Fisher's method # Other psi-functions often have higher design # sensitivity; see Rosenbaum (2013) par(oldpar)