Title: | R for Health Care Research |
---|---|
Description: | A collection of datasets that accompany the forthcoming book "R for Health Care Research". |
Authors: | Jason L. Oke [aut, cre, cph] |
Maintainer: | Jason L. Oke <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1 |
Built: | 2025-02-14 06:58:29 UTC |
Source: | CRAN |
Data from a randomised controlled trial (RCT) of acupuncture therapy for chronic headaches. The primary outcome was headache severity score measured using a 6-item Likert-type scale at the one-year follow-up.
Acupuncture
A data frame with 301 observations on the following 4 variables.
group
Randomisation group (0 = Usual care, 1 = Acupuncture treatment).
pk1
Headache severity score at baseline.
pk5
Headache severity score at 1 year.
change
Change score (pk5 - pk1).
These are data from a randomised controlled trial comparing acupuncture therapy to usual care (no acupuncture therapy) on headache severity scores in patients with chronic headaches. 401 patients with chronic headache (predominantly migraine) were recruited from general practices in England and Wales. Patients were randomly allocated to receive up to 12 acupuncture treatments over three months or to a control intervention offering usual care. The primary outcome measure was headache score at the one-year follow-up.
Teaching of Statistics in the Health Sciences Resources Portal Community https://www.causeweb.org/tshs/?s=Acupuncture
Vickers, A.J., Rees, R.W., Zollman, C.E., McCarney, R., Smith, C.M., Ellis, N., Fisher, P. and Van Haselen, R., 2004. Acupuncture for chronic headache in primary care: large, pragmatic, randomised trial. BMJ, 328(7442), p.744.
data(Acupuncture, package = "R4HCR")
# Checking baseline balance
with(Acupuncture, tapply(pk1, group, mean))
# Correlation between change scores and baseline scores
with(Acupuncture, cor(I(pk5 - pk1), pk1))
# ANCOVA model
lm(pk5 ~ group + pk1, data = Acupuncture)
Data from a meta-analysis of 13 studies of the efficacy of BCG vaccine against Tuberculosis (TB).
BCG
A data frame with 13 observations on the following 8 variables.
trialnam
Name of the trial.
authors
Authors of the paper.
startyr
Start year.
latitude
Latitude in degrees from the equator.
cases1
Number of TB cases in intervention group.
tot1
Total number in intervention group.
cases0
Number of TB cases in control group.
tot0
Total number in control group.
https://www.biostat.jhsph.edu/~fdominic/teaching/bio656/software/meta.analysis.pdf
Colditz GA, Brewer TF, Berkey CS, et al. Efficacy of BCG Vaccine in the Prevention of Tuberculosis: Meta-analysis of the Published Literature. JAMA. 1994;271(9):698–702. doi:10.1001/jama.1994.03510330076038.
require(meta)
data(BCG, package = "R4HCR")
# Meta-analysis using relative risk summary measure
ma5 <- metabin(
  sm = "RR",
  event.e = cases1, n.e = tot1,
  event.c = cases0, n.c = tot0,
  studlab = trialnam,
  data = BCG)
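For orientation, the relative risk that is pooled across trials can be computed by hand for a single trial. The following is a minimal sketch using made-up counts (not values from BCG); the object names mirror the columns described above, and the interval uses the usual large-sample standard error of the log relative risk.

```r
# Hypothetical counts for one trial (illustrative only, not taken from BCG)
cases1 <- 10; tot1 <- 1000   # intervention group
cases0 <- 30; tot0 <- 1000   # control group

# Relative risk: risk in the intervention group divided by risk in the control group
rr <- (cases1 / tot1) / (cases0 / tot0)

# Large-sample standard error of log(RR)
se_log_rr <- sqrt(1 / cases1 - 1 / tot1 + 1 / cases0 - 1 / tot0)

# 95% confidence interval, computed on the log scale and back-transformed
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

rr
ci
```

A relative risk below 1 would indicate fewer TB cases in the intervention group than in the control group.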
A simplified version of the data set printed in Klein and Moeschberger, 2003. Briefly, these data are from a study of 137 patients with acute myelocytic leukemia (AML) or acute lymphoblastic leukemia (ALL) aged 7 to 52 from four centres. Failure time is defined as the time (in days) to relapse or death.
BMT
A data frame with 137 observations on the following 3 variables.
group
Categorisation of the patients' Leukemia (ALL = Acute Lymphoblastic Leukemia, AML-High Risk = High risk Acute Myelocytic Leukemia, AML-Low Risk = Low risk Acute Myelocytic Leukemia).
time
Failure time, defined as time (in days) to relapse or death.
status
Disease-free survival indicator (1 = Dead or Relapsed, 0 = Alive Disease Free).
Bone marrow transplants are a standard treatment for acute leukemia. Recovery following bone marrow transplantation is a complex process and prognosis may depend on a number of different risk factors. Transplantation can be considered a failure when a patient's leukemia returns (relapse) or when he or she dies while in remission (treatment related death).
Klein, J.P. and Moeschberger, M.L., 2003. Survival analysis: techniques for censored and truncated data (Vol. 1230). New York: Springer.
See also: Copelan, Biggs, Thompson, et al. Treatment for Acute Myelocytic Leukemia With Allogeneic Bone Marrow Transplantation Following Preparation With BuCy2. Blood, Volume 78, Issue 3, 1991, Pages 838-843, ISSN 0006-4971.
data(BMT, package = "R4HCR")
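The example above only loads the data. As a hedged sketch of how the time and status coding described here could be used, the code below fits Kaplan-Meier curves by group with the survival package (assumed to be installed). The data frame `toy` and its values are hypothetical and are not taken from BMT.

```r
library(survival)

# Hypothetical failure times in days (illustrative only, not taken from BMT)
# status: 1 = dead or relapsed, 0 = alive and disease free (censored)
toy <- data.frame(
  time   = c(5, 8, 12, 12, 20, 25, 30, 41),
  status = c(1, 0, 1, 1, 0, 1, 0, 1),
  group  = rep(c("A", "B"), each = 4)
)

# Kaplan-Meier estimate of disease-free survival in each group
km <- survfit(Surv(time, status) ~ group, data = toy)
summary(km)

# Log-rank test comparing the groups
survdiff(Surv(time, status) ~ group, data = toy)
```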
Data from a diagnostic accuracy review of imaging techniques and tumor markers for the diagnosis of pancreatic carcinoma.
CA19
A data frame with 22 observations on the following 5 variables.
study
Name of study.
TP
The number of true positive test results.
FP
The number of false positive test results.
FN
The number of false negative test results.
TN
The number of true negative test results.
Cancer antigen 19-9 (CA 19-9) is a protein; the CA 19-9 test is used to monitor response to treatment for cancers such as pancreatic, bile duct, colorectal, stomach, ovarian and bladder cancer.
Niederau C, Grendell JH. Diagnosis of pancreatic carcinoma. Imaging techniques and tumor markers. Pancreas. 1992;7(1):66-86. doi: 10.1097/00006676-199201000-00011. PMID: 1557348.
require(mada)
data(CA19, package = "R4HCR")
# Bivariate Reitsma model/HSROC analysis.
reitsma(CA19, method = "ml")
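The four counts per study determine that study's sensitivity and specificity. A minimal sketch, using made-up counts in a hypothetical data frame `toy` with the same column names as described above (the values are not from CA19):

```r
# Hypothetical 2x2 counts for three studies (illustrative only)
toy <- data.frame(
  study = c("A", "B", "C"),
  TP = c(40, 25, 60),
  FP = c(5, 10, 8),
  FN = c(10, 5, 20),
  TN = c(45, 60, 112)
)

# Sensitivity: proportion of diseased patients with a positive test
toy$sens <- with(toy, TP / (TP + FN))

# Specificity: proportion of non-diseased patients with a negative test
toy$spec <- with(toy, TN / (TN + FP))

toy
```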
These data are a subset of a larger set of data collected by Low et al and reprinted in Hollander et al. The data correspond to two methods for measuring ciliary activity (ciliary beat frequency, CBF): (1) nasal brushing and (2) the more invasive but accepted method of endobronchial forceps biopsy. The subjects in the study were all men undergoing bronchoscopies for diagnoses of various lung problems. The CBF values are averages of 10 consecutive measurements on each subject.
CBF
A data frame with 15 observations on the following 2 variables.
Nasal
CBF (hertz) measured using nasal brushing method.
Biopsy
CBF (hertz) measured using endobronchial forceps biopsy method.
Originally from P. P. Low, C. K. Luk, M. J. Dulfano, and P. J. P. Finch (1984).
Hollander, M., Wolfe, D.A. and Chicken, E., 2013. Nonparametric statistical methods. John Wiley & Sons.
data(CBF, package = "R4HCR")
# Pearson's r
with(CBF, cor(Nasal, Biopsy))
Duplicate salivary cotinine measurements for 20 Scottish schoolchildren.
Cotinine
A data frame with 20 observations on the following 3 variables.
subject
Subject identifier
cotinine1
First of two cotinine measurements (ng/ml).
cotinine2
Second of two cotinine measurements (ng/ml).
Cited as originating from D Strachan (by personal communication), first printed in Bland and Altman (1996).
Bland, J.M. and Altman, D.G., 1996. Measurement error proportional to the mean. BMJ: British Medical Journal, 313(7049), p.106.
data(Cotinine, package = "R4HCR")
mean <- rowMeans(Cotinine[, c(2, 3)])
range <- abs(Cotinine[, 2] - Cotinine[, 3])
# error vs the mean.
plot(mean, range, pch = 16, xlab = "Average of first and second measurement")
Cardiac output measured using Doppler echocardiography by two different observers.
Doppler
A data frame with 23 observations on the following 2 variables.
A
Cardiac output measured by observer A (litres/minute).
B
Cardiac output measured by observer B (litres/minute).
In a study to assess the inter-observer reproducibility of cardiac output measurements, twenty-three ventilated patients were measured non-invasively by Doppler echocardiography. From the four-chamber view of the heart, the readings were made by positioning the Doppler sample volume at the mitral anulus plane.
Müller, R. and Büttner, P., 1994. A critical discussion of intraclass correlation coefficients. Statistics in Medicine, 13(23‐24), pp.2465-2476.
require(irr)
data(Doppler, package = "R4HCR")
# Intra-class correlation.
icc(Doppler, model = "twoway", type = "agreement", unit = "single")
Diagnostic performance of duplex and color-guided duplex for detecting peripheral arterial disease (PAD) in 14 studies. PAD is defined as stenosis of 50-99% or an occlusion.
Duplex
A data frame with 14 observations on the following 6 variables.
study
Name of study
test
Type of ultrasound (Color or Duplex)
tp
The number of true positive test results.
fn
The number of false negative test results.
tn
The number of true negative test results.
fp
The number of false positive test results.
de Vries SO, Hunink MG, Polak JF. Summary receiver operating characteristic curves as a technique for meta-analysis of the diagnostic performance of duplex ultrasonography in peripheral arterial disease. Acad Radiol. 1996 Apr;3(4):361-9. doi: 10.1016/s1076-6332(96)80257-1. PMID: 8796687.
require(metafor); require(meta)
data(Duplex, package = "R4HCR")
# Fitting the common effects model.
Duplex <- escalc(
  measure = "OR", add = 0.5, to = "all",
  ai = tp, bi = fp, ci = fn, di = tn,
  data = Duplex)
Duplex <- within(Duplex, {
  S = log((fp + 0.5)/(tn + 0.5)) + log((tp + 0.5)/(fn + 0.5))
})
# escalc() returns the sampling variance vi; metagen() expects a standard error
ma <- metagen(TE = yi, seTE = sqrt(vi), data = Duplex, sm = "OR")
metareg(ma, formula = S, method = "FE")
Data from a survey of adult Americans in 1994.
Earnings
A data frame with 1192 observations on the following 4 variables.
earn
Annual earnings (in dollars).
sex
Sex (1 = men, 2 = women).
yearbn
Year of birth.
height
Height (in inches).
This is a subset of the data used in a number of regression examples in Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill (2006).
http://www.stat.columbia.edu/~gelman/arm/software/
Gelman, Andrew, and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.
Persico, Nicola, Andrew Postlewaite, and Dan Silverman. "The effect of adolescent experience on labor market outcomes: the case of height (No. w10522)." (2004).
data(Earnings, package = "R4HCR")
mod <- lm(earn ~ height, data = Earnings)
# % variation explained
summary(mod)$adj.r.squared
# regression coefficients.
coef(mod)
# log earnings model
logm <- lm(I(log(earn)) ~ height, data = Earnings)
coef(logm)
This matched case-control study investigated the effect of exogenous oestrogens on the risk of endometrial cancer.
Endometrial
A data frame with 126 observations on the following 8 variables.
set
Matched pair indicator (1 - 63).
case
Indicator for case/control status (0 = control, 1 = case).
gallbladder
History of gallbladder disease (0 = No, 1 = Yes).
hypertension
History of hypertension (0 = No, 1 = Yes).
obesity
Obesity (0 = No, 1 = Yes).
estrogen
Any use of estrogen (0 = No, 1 = Yes).
age
Age of the women.
dose
Conjugated estrogen dose (1 = none, 2 = 0.1-0.299 mg, 3 = 0.3-0.625 mg and 4 = 0.626+ mg).
Investigators matched each of 63 cases of endometrial cancer with four control women who were alive and living in the community at the time the case was diagnosed, who were born within one year of the case, who had the same marital status, and who had entered the community at approximately the same time. This data set includes all 63 cases and the first matched control for each case, as per the results in Table 7.3 (page 255) of Breslow and Day (1980).
Breslow, N.E., Day, N.E. and Heseltine, E., 1980. Statistical Methods in Cancer Research.
Mack, T.M., Pike, M.C., Henderson, B.E., Pfeffer, R.I., Gerkins, V.R., Arthur, M. and Brown, S.E., 1976. Estrogens and endometrial cancer in a retirement community. New England Journal of Medicine, 294(23), pp.1262-1267.
require(survival)
data(Endometrial, package = "R4HCR")
# Conditional logistic regression.
mod2 <- clogit(case ~ estrogen + strata(set), data = Endometrial)
summary(mod2)
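With 1:1 matching and a single binary exposure, the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The following is a minimal sketch of that calculation with made-up pairs; `case_exp` and `control_exp` are hypothetical vectors, not columns of Endometrial.

```r
# Hypothetical exposure status for 10 matched pairs (illustrative only)
case_exp    <- c(1, 1, 1, 0, 1, 0, 1, 1, 0, 1)  # exposure of the case in each pair
control_exp <- c(0, 0, 1, 0, 0, 1, 0, 1, 0, 0)  # exposure of its matched control

# Discordant pairs: only these carry information about the odds ratio
n10 <- sum(case_exp == 1 & control_exp == 0)  # case exposed, control unexposed
n01 <- sum(case_exp == 0 & control_exp == 1)  # case unexposed, control exposed

# Matched-pairs odds ratio
or_matched <- n10 / n01
or_matched

# McNemar's test uses the same discordant pairs
mcnemar.test(table(case_exp, control_exp))
```

Concordant pairs (both exposed or both unexposed) drop out of the estimate, which is why the matched analysis can differ from an unmatched 2x2 table.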
Data from a cross-over randomised controlled study on the effect of face-masks while taking exercise.
Facemasks
A data frame with 216 observations on the following 3 variables.
patid
Participant identification number.
comparison
Variable indicating which of the three comparisons the outcome corresponds to (Cloth vs None, Surgical vs None, FFP3 vs none).
delta
Difference in oxygen saturation (SaO2) in percent (%).
These data are from a cross-over randomised controlled study, completed between June 2021 and January 2022. Volunteers were aged 18–35 years, exercised regularly, and had no significant pre-existing health conditions. The primary outcome was change in oxygen saturation. Oxygen saturation levels were measured after exercise whilst wearing a cloth mask, a surgical mask, or a filtering facepiece (FFP3) mask, and compared to oxygen saturation levels without any mask, during four 15-minute bouts of exercise. The exercise was running outdoors or indoor rowing at moderate-to-high intensity, with the consistency of distance traveled between bouts confirmed using a smartphone application (Strava). Each participant completed each bout in random order.
Jones N, Oke JL, Marsh S, et al. Face masks while exercising trial (MERIT): a cross-over randomised controlled study. BMJ Open 2023;13:e063014.
data(Facemasks, package = "R4HCR")
# focus on cloth - none comparison
t.test(delta ~ 1, data = Facemasks, subset = comparison == "Cloth - None")
Pairs of measurements of Forced Expiratory Volume (FEV), taken a few weeks apart from 20 Scottish schoolchildren.
FEV
A data frame with 20 observations on the following 3 variables.
child
Child identification number
fev1
First FEV measurement
fev2
Second FEV measurement
The data in Table 1 of the original Bland and Altman paper do not correspond to the ANOVA analysis of Table 2. The corrected data do recreate the ANOVA analysis and so are given here.
Corrected data can be found here https://www.bmj.com/content/suppl/1999/03/16/313.7048.41.DC1
Bland, JM. & Altman, DG. 1996. Measurement Error and Correlation Coefficients. Br Med J., 313, pp.41-42.
data(FEV, package = "R4HCR")
# reshape to long
FEVl <- reshape(FEV, direction = "long", idvar = "child", varying = list(2:3), v.names = "fev")
# one-way ANOVA - as per table 2 of Bland and Altman.
anova(lm(fev ~ factor(child), data = FEVl))
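The residual mean square of a one-way ANOVA like the one above estimates the within-subject variance, so its square root is the within-subject standard deviation (the measurement error). A minimal sketch with made-up pairs of readings, assuming two measurements per child; `x1` and `x2` are hypothetical and not taken from FEV. For paired data the same quantity can be obtained directly from the differences, which the last lines check.

```r
# Hypothetical repeated measurements on five children (illustrative only)
x1 <- c(1.2, 1.5, 1.8, 1.4, 1.6)
x2 <- c(1.3, 1.4, 1.9, 1.6, 1.5)

# Long format: one row per measurement
long <- data.frame(
  child = factor(rep(seq_along(x1), times = 2)),
  fev   = c(x1, x2)
)

# Within-subject SD from the residual mean square of a one-way ANOVA
aov_tab <- anova(lm(fev ~ child, data = long))
sw <- sqrt(aov_tab["Residuals", "Mean Sq"])

# Equivalent shortcut for pairs: sqrt(sum of squared differences / 2n)
sw_direct <- sqrt(sum((x1 - x2)^2) / (2 * length(x1)))

c(anova = sw, direct = sw_direct)
```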
Many versions of the Framingham heart disease dataset exist; this one includes over 4,000 records and several cardiovascular disease risk factors such as blood pressure, blood chemistry, smoking history, markers of disease, and cardiovascular outcomes.
Framingham
A data frame with 4240 observations on the following 16 variables.
sex
Sex of participant (0 = female, 1 = male).
age
Age (in years).
education
1 = 0-11 years, 2 = High School Diploma, GED, 3 = Some College, Vocational School, 4 = College (BS, BA) degree or more.
currentsmoker
Current cigarette smoking at exam (0 = Not current smoker, 1 = Current smoker).
cigsperday
Number of cigarettes smoked each day (0 = Not current smoker, 1 = 1-90 cigarettes per day).
bpmeds
Use of anti-hypertensive medication at exam (0 = Not currently used, 1 = Current use).
prevalentstroke
Prevalent stroke (0 = Free of disease, 1 = Prevalent disease).
prevalenthyp
Prevalent hypertension (0 = Free of disease, 1 = Prevalent disease).
diabetes
Diabetic according to criteria of first exam treated or first exam with casual glucose of 200 mg/dL or more (0 = No diabetes, 1 = Diabetes).
totchol
Serum Total Cholesterol (mg/dL).
sysbp
Systolic Blood Pressure (mean of last two of three measurements) (mmHg).
diabp
Diastolic Blood Pressure (mean of last two of three measurements) (mmHg).
bmi
Body Mass Index, weight in kilograms/height in meters squared.
heartrate
Heart rate (Ventricular rate) in beats/min.
glucose
Casual serum glucose (mg/dL).
tenyearchd
Whether the individual developed Coronary Heart Disease within ten years (0 = no, 1 = yes).
The Framingham Heart Study is a long-term, ongoing cardiovascular cohort study of residents of the city of Framingham, Massachusetts. It began in 1948 and is now on its third generation of participants.
https://www.kaggle.com/datasets/aasheesh200/framingham-heart-study-dataset?resource=download https://www.framinghamheartstudy.org
For a description of the full data set see here; https://biolincc.nhlbi.nih.gov/media/teachingstudies/FHS_Teaching_Longitudinal_Data_Documentation_2021a.pdf?link_time=2024-05-26_10:36:20.705109
For more details on the Heart study see for example: Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet. 2014 Mar 15;383(9921):999-1008. PMID: 24084292; PMCID: PMC4159698.
data(Framingham, package = "R4HCR")
These data are from Galton's 1886 study of human height.
Galton
A data frame with 898 observations on the following 9 variables.
family
Indicator variable for family unit (or parentages).
father
Height of the father in inches.
mother
Height of the mother in inches.
sex
Sex of the child (M = Male, F = Female).
height
Height of the child.
no.children
Number of children in family unit.
mother.adj
Mother's height multiplied by 1.08.
height.adj
Adjusted height of the children (see details).
mid.parent
The “mid-parent” height (see details).
Galton's data comprised 898 adult children from 197 family units (father-and-mother couples). Mid-parent height is the mean of the father's height and the mother's height multiplied by 1.08. Adjusted height applies the same correction to the children: female children's heights are multiplied by 1.08 and male children's heights are left unchanged.
Francis Galton, 2017, "Galton height data", Harvard Dataverse
Galton, Francis. "Regression towards mediocrity in hereditary stature." The Journal of the Anthropological Institute of Great Britain and Ireland 15 (1886): 246-263.
Stephen Senn, Francis Galton and Regression to the Mean, Significance, Volume 8, Issue 3, September 2011, Pages 124–126.
data(Galton, package = "R4HCR")
# Regression to the mean
lm.mod <- lm(height.adj ~ mid.parent, data = Galton)
su <- summary(lm.mod)
coef(lm.mod)
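The derived variables can be reconstructed from the raw heights using the 1.08 multiplier described in the details. A minimal sketch on a hypothetical two-row data frame `fam` (the heights are made up, not taken from Galton), using the same column names as the data set:

```r
# Hypothetical heights in inches (illustrative only)
fam <- data.frame(
  father = c(70, 68),
  mother = c(64, 62),
  sex    = c("M", "F"),
  height = c(71, 63)
)

# Mother's height scaled by 1.08
fam$mother.adj <- 1.08 * fam$mother

# Mid-parent height: mean of father's height and adjusted mother's height
fam$mid.parent <- (fam$father + fam$mother.adj) / 2

# Adjusted child height: females scaled by 1.08, males unchanged
fam$height.adj <- ifelse(fam$sex == "F", 1.08 * fam$height, fam$height)

fam
```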
Data from the study by Shen et al, 'Comparison of impedance to insulin-mediated glucose uptake in normal subjects and in subjects with latent diabetes'.
Glucose
A data frame with 14 observations on the following 3 variables.
diabetes
Indicator of whether the person had diabetes (1) or not (0).
glucose
Weighted glucose response to an oral glucose tolerance test (mg/100ml).
impedance
Glucose Impedance (ohms).
These data are originally from Shen et al (1970) and reprinted in Hollander et al (2013). Glucose impedance represents the tissues' insensitivity or resistance to insulin-mediated glucose uptake. It was hypothesised that the newly developed technique of estimating impedance would allow the detection of a difference in glucose uptake efficiency between normal and mildly diabetic subjects. Two groups of normal-weight subjects were studied, one had maturity onset latent diabetes, and the other (matched for age, weight, and percent adiposity) were 'normal'. Impedance data is taken from Table II 'Results of Standard Infusion Studies', whereas the glucose response data is shown in Table 1.
Shen SW, Reaven GM, Farquhar JW. Comparison of impedance to insulin-mediated glucose uptake in normal subjects and in subjects with latent diabetes. J Clin Invest. 1970 Dec;49(12):2151-60. doi: 10.1172/JCI106433. PMID: 5480843; PMCID: PMC322715.
Hollander, M., Wolfe, D.A. and Chicken, E., 2013. Nonparametric statistical methods. John Wiley & Sons.
data(Glucose, package = "R4HCR")
# Kendall's Tau.
with(
  subset(Glucose, diabetes == 0),
  cor.test(glucose, impedance, exact = TRUE, method = "kendall")
)
The number of false positives in negative samples in each evaluation stage of the Innova lateral flow device.
Innova
A data frame with 8 observations on the following 3 variables.
phase
Evaluation phase
fp
Number of false positives
total
Total number of tests conducted
The Innova LFD was a first-generation Lateral Flow Device (LFD) for rapid point-of-care (POC) SARS-CoV-2 testing. Peto et al conducted a phased evaluation of available SARS-CoV-2 antigen LFDs from 15th August to December 2020 and reported the diagnostic performance of the Innova LFD.
Peto, T., Affron, D., Afrough, B., Agasu, A., Ainsworth, M., Allanson, A., Allen, K., Allen, C., Archer, L., Ashbridge, N. and Aurfan, I., 2021. COVID-19: Rapid antigen detection for SARS-CoV-2 by lateral flow assay: A national systematic evaluation of sensitivity and specificity for mass-testing. EClinicalMedicine, 36.
require(meta)
data(Innova, package = "R4HCR")
# Meta-analysis of false-positive fraction
ma1 <- metaprop(event = fp, n = total, studlab = phase, backtransf = TRUE, data = Innova)
The performance of an artificial intelligence (AI) risk stratification tool for Indeterminate Pulmonary Nodules (IPNs) on chest CT scans.
IPNs
A data frame with 200 observations on the following 2 variables.
cancer
Indicator for a cancerous IPN (1) or non-cancerous IPN (0).
rating
AI algorithm score for the likelihood of cancer.
This data set is taken from a retrospective multireader multicase study performed in June and July 2020 on chest CT studies of Indeterminate Pulmonary Nodules (IPNs). An artificial intelligence tool was used to evaluate CT images and provide an estimated probability of cancer (from 0 to 100).
This data set represents a subset of the original data.
Kim, R.Y., Oke, J.L., Pickup, L.C., Munden, R.F., Dotson, T.L., Bellinger, C.R., Cohen, A., Simoff, M.J., Massion, P.P., Filippini, C. and Gleeson, F.V., 2022. Artificial intelligence tool for assessment of indeterminate pulmonary nodules detected with CT. Radiology, 304(3), pp.683-691.
data(IPNs, package = "R4HCR")
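One common way to summarise how well a score like rating separates cancerous from non-cancerous nodules is the area under the ROC curve (AUC), which equals the Mann-Whitney statistic divided by the number of case/non-case pairs. This is a hedged sketch in base R with made-up values; the vectors below are hypothetical, not taken from IPNs, and this is not necessarily the analysis used in the source paper.

```r
# Hypothetical outcomes and scores for eight nodules (illustrative only)
cancer <- c(0, 0, 0, 0, 1, 1, 1, 1)
rating <- c(10, 35, 20, 60, 55, 80, 70, 30)

n1 <- sum(cancer == 1)  # number of cancerous nodules
n0 <- sum(cancer == 0)  # number of non-cancerous nodules

# Rank-sum (Mann-Whitney) form of the AUC: the proportion of
# cancer/non-cancer pairs in which the cancerous nodule scores higher
r <- rank(rating)
auc <- (sum(r[cancer == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc
```

An AUC of 0.5 corresponds to a score that discriminates no better than chance, and 1 to perfect separation.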
Data on man-years of risk and observed number of lung cancer deaths.
LungCa
A data frame with 63 observations on the following 4 variables.
yrs_smk
Years of smoking (15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59).
pys
Person-years of follow-up.
num_cigs
Number of cigarettes smoked per day (0, 1-9, 10-14, 15-19, 20-24, 25-34, 35+).
deaths
Number of lung cancer deaths.
These data come from Table 24-4, page 702 of Kleinbaum et al (1988).
Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A., 1988. Applied regression analysis and other multivariable methods (Vol. 601). Belmont, CA: Duxbury press
data(LungCa, package = "R4HCR")
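Counts of deaths with person-years of follow-up are typically analysed with Poisson regression, using the log of person-years as an offset so that coefficients are log rate ratios. A minimal sketch under that assumption, on a hypothetical data frame `toy` with a made-up two-level exposure (not the LungCa categories or values):

```r
# Hypothetical grouped data (illustrative only)
toy <- data.frame(
  exposure = factor(c("low", "low", "high", "high"), levels = c("low", "high")),
  pys      = c(10000, 12000, 8000, 9000),
  deaths   = c(5, 7, 20, 24)
)

# Poisson model for the death rate; offset(log(pys)) accounts for follow-up time
fit <- glm(deaths ~ exposure + offset(log(pys)), family = poisson, data = toy)

# Exponentiated coefficients: baseline rate and rate ratio (high vs low)
exp(coef(fit))
```

With a single categorical predictor the fitted rate ratio matches the ratio of the crude rates, which is a useful sanity check before adding further terms.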
Transoesophageal measurements of left ventricular length (cm).
LVD
Four matrices, each representing a block of 36 LVD measurements.
block1
a 6x6 matrix, representing indices 1 - 36
block2
a 6x6 matrix, representing indices 37 - 72
block3
a 6x6 matrix, representing indices 73 - 108
block4
a 6x6 matrix, representing indices 109 - 144
These data were used to teach confidence intervals to first-year undergraduate medical students in Oxford. Each student (from classes of 20-25 students) draws a set of 12 numbers from a much larger list (the 'population'), the mean of which is known to us but not revealed to them. We instruct the students to use dice to select 12 numbers from the list in order to mimic a random sample. Each student then calculates a sample mean and a 95% confidence interval. Students are invited to write their confidence intervals on the board at the front of the class, and the concept of confidence intervals is demonstrated.
With thanks to Dr Thomas Fanshawe, Prof Richard Stevens and Prof Rafael Perera.
data(LVD, package = "R4HCR")
# population is 144 individuals arranged in 4 blocks
# sampling is done with two dice -
# scores indicate which row and column to select
# sample, three from each of the four blocks
# sample size n = 12
# simulate 12 throws of 2 dice
die1 <- sample(x = 1:6, 12, TRUE)
die2 <- sample(x = 1:6, 12, TRUE)
# drawing the numbers from the blocks
smp <- c(
  LVD[[1]][cbind(die1[1:3], die2[1:3])],
  LVD[[2]][cbind(die1[4:6], die2[4:6])],
  LVD[[3]][cbind(die1[7:9], die2[7:9])],
  LVD[[4]][cbind(die1[10:12], die2[10:12])]
)
# the first four numbers of our sample
smp[1:4]
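The step each student performs on their 12 numbers can be sketched as follows. This assumes a t-based 95% confidence interval, which is one common choice and may differ from the formula used in class; the sample here is simulated with made-up parameters and is not drawn from LVD.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical sample of 12 measurements in cm (simulated, not from LVD)
smp <- round(rnorm(12, mean = 8, sd = 1), 1)

n    <- length(smp)
xbar <- mean(smp)           # sample mean
se   <- sd(smp) / sqrt(n)   # standard error of the mean

# 95% confidence interval based on the t distribution with n - 1 df
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se
ci

# The same interval from t.test()
t.test(smp)$conf.int
```

Repeating this for many samples and checking how often the interval covers the population mean reproduces the classroom demonstration.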
Data from a prospective study of maternal drinking and congenital malformation. Alcohol consumption was measured using a questionnaire (3 months after pregnancy). The presence or absence of congenital sex organ malformation was recorded following childbirth.
Malformation
A data frame with 5 observations on the following four variables.
Alcohol_consumption
Alcohol consumption measured as average number of drinks per day.
Absent
Absence of any congenital malformation
Present
Congenital malformation present
Midpoints
Midpoints of the alcohol consumption categories
This data set appears in An Introduction to Categorical Data Analysis by Agresti (section 2.5.2, page 35). The original source is cited as B.I.Graubard and E.L.Korn, Biometrics 43: 471-476 (1987).
Agresti, A., 2012. Categorical data analysis (Vol. 792). John Wiley & Sons.
data(Malformation, package = "R4HCR")
# Chi-square test.
with(Malformation, chisq.test(cbind(Absent, Present), simulate.p.value = TRUE))
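The chi-square test above ignores the ordering of the alcohol categories. One option for ordered categories is a test for trend in proportions that uses numeric scores, such as category midpoints. This is a hedged sketch with `prop.trend.test()` from the stats package; all counts and scores below are made up and are not the values in Malformation.

```r
# Hypothetical counts for five ordered exposure categories (illustrative only)
absent  <- c(980, 490, 195, 48, 9)
present <- c(20, 10, 5, 2, 1)

# Hypothetical scores for the categories (for example, midpoints)
score <- c(0, 0.5, 1.5, 4, 7)

# Chi-squared test for a linear trend in the proportion with a malformation
res <- prop.trend.test(x = present, n = absent + present, score = score)
res
```

The result depends on the scores chosen, so it is worth checking how sensitive the conclusion is to alternative scorings.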
Medical humanities courses and average world ranking in 109 US medical schools. Two rankings were used for medical schools: the Times Higher Education in the ‘clinical, pre-clinical, and health’ category and the U.S. News and World Report (USNWR) ranking.
MedSchools
A data frame with 109 observations on the following 4 variables.
School
Name of the medical school.
Ranking
Average world ranking for the medical school.
Humanities
The number of medical humanities courses offered to students.
Compulsory
Whether at least one humanities course was offered.
Medical humanities are believed to positively impact medical education and medical practice, yet the extent of medical humanities teaching in medical schools is largely unknown. As part of a larger study, Howick et al explored whether there was a relationship between the number (mandatory or not) of medical humanities topics offered and the average world ranking in 109 accredited medical schools in the US.
Howick, J., Zhao, L., McKaig, B., Rosa, A., Campaner, R., Oke, J.L. and Ho, D., 2022. Do medical schools teach medical humanities? Review of curricula in the United States, Canada and the United Kingdom. Journal of Evaluation in Clinical Practice, 28(1), pp.86-92.
data(MedSchools, package = "R4HCR")
Fat content of human milk determined by enzymic procedure for the determination of triglycerides and measured by the standard Gerber method (g/100 ml).
Milk
A data frame with 45 observations on the following 2 variables.
Gerber
Fat content measured by the standard Gerber method (g/100 ml).
Trig
Fat content measured by determination of triglycerides (g/100 ml).
Fat content of human milk determined by enzymic procedure for the determination of triglycerides (standard Gerber method) and determined by the measurement of glycerol released by enzymic hydrolysis of triglycerides.
Bland, J.M. and Altman, D.G., 1999. Measuring agreement in method comparison studies. Statistical methods in medical research, 8(2), pp.135-160.
data(Milk, package = "R4HCR")
d <- with(Milk, Trig - Gerber)
a <- with(Milk, (Trig + Gerber)/2)
# regression approach for nonuniform differences
M <- lm(d ~ a)
# as per Bland and Altman (1999) page 147.
coef(M)
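The regression approach above is used when the differences vary with the magnitude of the measurement. For comparison, the simpler calculation assumes uniform differences and gives 95% limits of agreement as the mean difference plus or minus 1.96 standard deviations of the differences. A minimal sketch with made-up paired values (`m1` and `m2` are hypothetical, not taken from Milk):

```r
# Hypothetical paired measurements by two methods (illustrative only)
m1 <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.6)
m2 <- c(2.3, 3.1, 1.9, 4.4, 2.7, 3.9)

d <- m1 - m2          # differences between methods
a <- (m1 + m2) / 2    # averages of the two methods

bias <- mean(d)                           # mean difference
loa  <- bias + c(-1.96, 1.96) * sd(d)     # 95% limits of agreement

bias
loa

# Difference-against-average plot with the bias and limits marked
plot(a, d, pch = 16, xlab = "Average of methods", ylab = "Difference")
abline(h = c(bias, loa), lty = c(1, 2, 2))
```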
A subset of retrospectively collected data from patients aged 18 years or older with pulmonary nodule(s) of up to 15 mm detected on routinely performed CT chest scans, from 3 academic centres in the UK.
Nodules
A data frame with 999 observations on the following 8 variables.
sex
Sex of the patient (F = female, M = male)
age
Age of the patient at CT scan (years)
num.annotated
Number of nodules annotated
location
Location of the nodule within the lung (Lingular Segment, Left Lower Lobe, Left Upper Lobe, Right Lower Lobe, Right Middle Lobe, Right Upper Lobe)
spiculate
Is the nodule spiculated (No or Yes)
smoke.status
Smoking status (with levels current, exsmoke, never, unknown, NR - not recorded)
diameter
Maximum diameter measured on a 2D axial CT slice (mm)
malignant
Ground truth of the nodule (0 = benign, 1 = malignant)
Small pulmonary nodules are a common finding on computed tomographic (CT) scans of the chest. Up to 75% of smokers scanned either as part of their clinical care or in lung cancer screening trials have sub-centimeter pulmonary nodules detected. Most nodules detected on CT scans of the chest are not malignant and detection of nodules is expensive and time-consuming with potential associated patient morbidity and mortality. The outcome or ground truth for each nodule was established routinely in clinical care using the accepted published standards of Histology, 1 year for volume stability or 2 year for diameter stability (for benign nodules only), Expert opinion (for subpleural or perifissural lymph nodes only), or Nodule resolution (i.e. infection clears up). Benign nodules are coded as zero, malignant nodules as 1.
Oke, J.L., Pickup, L.C., Declerck, J., Callister, M.E., Baldwin, D., Gustafson, J., Peschl, H., Ather, S., Tsakok, M., Exell, A. and Gleeson, F., 2018. Development and validation of clinical prediction models to risk stratify patients presenting with small pulmonary nodules: a research protocol. Diagnostic and prognostic research, 2, pp.1-6.
data(Nodules, package = "R4HCR")
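A minimal sketch of how these variables might be used: a logistic regression of malignancy on nodule diameter and spiculation. The toy data frame is invented purely so that the block runs on its own; it borrows only the documented column names (diameter, spiculate, malignant) and says nothing about the real Nodules data. The model is an illustration, not the prediction model developed in the cited protocol.

```r
# Toy data: invented values that mimic the documented column names only.
toy <- data.frame(
  diameter  = c(4, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14),
  spiculate = factor(c("No", "No", "No", "Yes", "No", "No",
                       "Yes", "No", "Yes", "No", "Yes", "Yes"),
                     levels = c("No", "Yes")),
  malignant = c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1)
)

# Logistic regression of malignancy on diameter (mm) and spiculation.
fit <- glm(malignant ~ diameter + spiculate, family = binomial, data = toy)

# Odds ratios (exponentiated coefficients) and fitted probabilities.
odds_ratios <- exp(coef(fit))
risk <- predict(fit, type = "response")
odds_ratios
```

With the package installed, the same glm() call could be pointed at Nodules instead of the toy data frame.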
Data from a meta-analysis of natriuretic peptide-guided (NP-guided) treatment for heart failure.
NPguided
A data frame with 18 observations on the following 7 variables.
studyid
Name and year of study.
year
Year of publication.
eventsnp
Number of events (all-cause mortality) in NP-guided monitoring group.
totalnp
Total number of participants in NP-guided monitoring group.
eventscntrl
Number of events (all-cause mortality) with treatment guided by clinical assessment alone.
totalcntrl
Total number of participants with treatment guided by clinical assessment alone.
comparator
Indicator for type of comparator arm in study (0 = usual care, 1 = clinical assessment).
Natriuretic peptides (NP) are released by the myocardium in response to pressure or fluid overload and are raised in patients with heart failure (HF). NP is a collective term for N-terminal pro-B-type natriuretic peptide (NT-proBNP) and B-type natriuretic peptide (BNP). Studies compared NP-guided treatment to treatment guided by clinical assessment alone. These data are from a study that aimed to determine whether NP-guided treatment of patients with HF reduces all-cause mortality, amongst other outcomes.
McLellan J, Bankhead CR, Oke JL, Hobbs FDR, Taylor CJ, Perera R. Natriuretic peptide-guided treatment for heart failure: a systematic review and meta-analysis. BMJ Evid Based Med. 2020 Feb;25(1):33-37. doi: 10.1136/bmjebm-2019-111208. Epub 2019 Jul 20. PMID: 31326896; PMCID: PMC7029248.
require(meta)
data(NPguided, package = "R4HCR")
metabin(sm = "RR", method = "MH",
        event.e = eventsnp, n.e = totalnp,
        event.c = eventscntrl, n.c = totalcntrl,
        studlab = studyid, data = NPguided)
Faecal immunochemical testing for adults with symptoms of colorectal cancer attending English primary care.
OXFIT
A data frame with 9,999 observations on the following 10 variables.
sex
Sex of patient (1 = male, 2 = female).
fit_val
Faecal immunochemical test (FIT) result in micrograms of haemoglobin per gram of faeces (µg Hb/g faeces).
albumin
Blood albumin in grams per decilitre (g/dL).
alkphosphatase
Alkaline phosphatase (ALK) in units per litre (U/L).
crp
C-reactive protein (CRP) in mg/dL.
haemoglobin
Haemoglobin in grams per decilitre (g/dL).
mean_cell_hgb
Mean cell haemoglobin in picograms per cell (pg).
mean_cell_vol
Mean cell volume (MCV) in cubic micrometres (µm^3).
platelets
Platelets in millilitres per kilogram (mL/kg).
cancer
Whether the patient had colorectal cancer (0 = No, 1 = Yes).
Faecal samples and blood tests were collected in routine primary care practice in Oxfordshire, UK, between March 2017 and March 2020. FIT was analysed using the HM-JACKarc FIT method. Patients were followed for up to 36 months in linked hospital records for evidence of benign and serious colorectal disease (e.g. colorectal cancer, high-risk adenomas, and bowel inflammation).
This is a synthetic data set generated from the original data set and therefore does not contain actual patient data, only data from simulated patients that share similar attributes to those of the original cohort.
Nicholson BD, James T, Paddon M, et al. Faecal immunochemical testing for adults with symptoms of colorectal cancer attending English primary care: a retrospective cohort study of 14 487 consecutive test requests. Aliment Pharmacol Ther. 2020; 52: 1031–1041.
data(OXFIT, package = "R4HCR")
Repeated measurements of lung function (peak expiratory flow rate (PEFR)) in 20 schoolchildren (taken from a larger study).
PEFR
A data frame with 20 observations on the following 7 variables.
child
Child ID number.
pefr1
First PEFR measurement (l/min).
pefr2
Second PEFR measurement (l/min).
pefr3
Third PEFR measurement (l/min).
pefr4
Fourth PEFR measurement (l/min).
mean
Row mean of the four PEFR measurements (l/min).
sd
Row SD of the four PEFR measurements (l/min).
Bland JM, Altman DG. Measurement error. BMJ. 1996 Sep 21;313(7059):744.
data(PEFR, package = "R4HCR")
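One common summary of repeated measurements like these is the within-subject standard deviation discussed in Bland and Altman (1996). The sketch below computes it as the square root of the mean within-child variance, together with the repeatability coefficient sqrt(2) x 1.96 x s_w. The toy values are invented so that the block runs on its own; only the column names follow the description above.

```r
# Toy data: invented PEFR readings (l/min) for four children.
toy <- data.frame(
  child = 1:4,
  pefr1 = c(190, 220, 260, 210),
  pefr2 = c(220, 200, 260, 300),
  pefr3 = c(200, 240, 240, 280),
  pefr4 = c(200, 230, 280, 265)
)

m <- as.matrix(toy[, c("pefr1", "pefr2", "pefr3", "pefr4")])

# Variance of the four readings within each child.
subject_var <- apply(m, 1, var)

# Within-subject SD: square root of the mean within-child variance
# (valid here because every child has the same number of readings).
s_w <- sqrt(mean(subject_var))

# Repeatability: two readings on the same child are expected to differ
# by less than this amount for about 95% of pairs.
repeatability <- sqrt(2) * 1.96 * s_w
c(s_w = s_w, repeatability = repeatability)
```

Because PEFR already carries the row SD in the sd column, the same quantity could presumably be obtained from the packaged data as sqrt(mean(PEFR$sd^2)).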
An amino acid bioactive peptide considered to be neurotoxic in the adult brain and a potential key driver of neurodegeneration is measured in samples from 17 men and 21 women.
Peptide
A data frame with 38 observations on the following 2 variables.
peptide
Peptide concentrations.
sex
Sex of patient (M = male, F = female).
data(Peptides, package = "R4HCR")
# Compare levels in men and women.
t.test(peptide ~ sex, data = Peptides)
Measurements of plasma volume expressed as a percentage of normal in 99 subjects, using two alternative sets of normal values due to Nadler and Hurley.
PlasmaVolume
A data frame with 99 observations on the following 3 variables.
Nadler
Plasma volume expressed as a percentage of normal using Nadler normal values.
Hurley
Plasma volume expressed as a percentage of normal using Hurley normal values.
Data originally supplied by C Dore, reprinted in Bland and Altman (1999).
Bland, J.M. and Altman, D.G., 1999. Measuring agreement in method comparison studies. Statistical methods in medical research, 8(2), pp.135-160.
data(PlasmaVolume, package = "R4HCR")
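A minimal sketch of the basic limits-of-agreement calculation for two methods measured on the same subjects: the mean difference (bias) plus or minus 1.96 standard deviations of the differences. The toy values are invented so that the block runs on its own; only the column names Nadler and Hurley come from the description above. These simple limits assume the differences do not depend on the size of the measurement, which may not hold for the real data.

```r
# Toy data: invented plasma volumes (% of normal) under the two sets of normal values.
toy <- data.frame(
  Nadler = c(99.9, 105.1, 110.3, 95.2, 120.7, 101.4),
  Hurley = c(90.4, 95.0, 99.8, 86.9, 108.5, 92.3)
)

d <- toy$Nadler - toy$Hurley        # paired differences
a <- (toy$Nadler + toy$Hurley) / 2  # paired averages (x-axis of a Bland-Altman plot)

bias <- mean(d)                         # mean difference between methods
loa  <- bias + c(-1.96, 1.96) * sd(d)   # 95% limits of agreement
c(bias = bias, lower = loa[1], upper = loa[2])
```

Plotting d against a, for example with plot(a, d), is the usual way to check whether the differences grow with the mean before relying on these limits.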
Data from a study of the potencies of four cardiac substances (from Kleinbaum et al., 1988).
Potency
A data frame with 40 observations on the following 2 variables.
dosage
Dosage at which the guinea pig died.
substance
The type of cardiac substance (sub1-sub4).
In this experiment, a dilution of one of the substances was infused into an anaesthetized guinea pig, and the dosage at which the pig died was recorded. There were ten replicates in each group (cardiac substance).
This data is featured in Kleinbaum et al (1988).
Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A., 1988. Applied regression analysis and other multivariable methods (Vol. 601). Belmont, CA: Duxbury press.
data(Potency, package = "R4HCR")
A synthesised data set from a multicentre, blinded, fully crossed multi-reader multi-case (MRMC) study conducted between October 2021 and January 2022.
PTX
A data frame with 200 observations on the following 6 variables.
PTX1
The judgment from one reader on whether a pneumothorax (PTX) is present (1) or absent (0) on an image.
Conf1
The confidence score (1-4) from one reader on whether a pneumothorax is present.
PTX2
The judgment from a second reader on whether a pneumothorax is present or absent on an image.
Conf2
The confidence score (1-4) from a second reader on whether a pneumothorax is present.
PTX3
The judgment from a third reader on whether a pneumothorax is present or absent on an image.
Conf3
The confidence score (1-4) from a third reader on whether a pneumothorax is present.
The original data consisted of 400 retrospectively collected and de-identified chest X-ray images of patients aged 18 years or older, identified from the CRIS database in Oxford University Hospitals NHS Trust. The study included two reader phases. In the first phase (from which these data are synthesised), readers were asked to interpret the entire dataset over three weeks, recording the perceived presence or absence of a pneumothorax on each image and their degree of confidence on a Likert-type scale. A second phase (not included here) repeated the exercise, with readers re-interpreting the images with assistance from artificial intelligence (AI).
This is a synthetic data set generated from the original data set and therefore does not contain actual patient data, only data from simulated patients that share similar attributes to those of the original cohort.
Novak, Alex, Ather, S, Gleeson, F, Espinosa, M, et al. Evaluation of the Impact of Artificial Intelligence-Assisted Image Interpretation on the Diagnostic Performance of Clinicians When Identifying Pneumothoraces on Plain Chest X-Ray: A Multi-Case Multi-Reader Study.
data(PTX, package = "R4HCR")
Subjective confidence rating in the presence of a pneumothorax (PTX) on X-ray. This dataset represents a subset of one reader's confidence scores, in one phase of the study.
PTXII
A data frame with 300 observations on the following 2 variables.
response
Indicator for presence (1) or absence (0) of a pneumothorax on X-ray.
predictor
Subjective confidence score (1-8) in the absence or presence of a pneumothorax on an X-ray.
The original data consisted of 400 retrospectively collected and de-identified chest X-ray images of patients aged 18 years or older, identified from the CRIS database in Oxford University Hospitals NHS Trust. The study included two reader phases. In the first phase (from which these data are synthesised), readers were asked to interpret the entire dataset over three weeks, recording the perceived presence or absence of a pneumothorax on each image and their degree of confidence on a Likert-type scale. A second phase (not included here) repeated the exercise, with readers re-interpreting the images with assistance from artificial intelligence (AI).
The dataset represents a subset of one reader, in one phase of the study.
Novak, Alex, Ather, S, Gleeson, F, Espinosa, M, et al. Evaluation of the Impact of Artificial Intelligence-Assisted Image Interpretation on the Diagnostic Performance of Clinicians When Identifying Pneumothoraces on Plain Chest X-Ray: A Multi-Case Multi-Reader Study.
data(PTXII, package = "R4HCR")
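With a binary response and an ordinal confidence score, a natural summary is the area under the ROC curve (AUC). The sketch below computes it in base R as the proportion of (present, absent) pairs in which the image with a pneumothorax received the higher score, counting ties as one half. The toy values are invented so that the block runs on its own; only the column names response and predictor come from the description above.

```r
# Toy data: invented responses and confidence scores (1-8).
toy <- data.frame(
  response  = c(0, 0, 0, 0, 1, 1, 1, 1),
  predictor = c(1, 2, 3, 6, 4, 6, 7, 8)
)

pos <- toy$predictor[toy$response == 1]  # scores where a pneumothorax is present
neg <- toy$predictor[toy$response == 0]  # scores where it is absent

# Score difference for every (present, absent) pair.
cmp <- outer(pos, neg, "-")

# AUC: share of pairs ranked correctly, with ties contributing 1/2.
auc <- mean((cmp > 0) + 0.5 * (cmp == 0))
auc
```

This pairwise definition is equivalent to the Mann-Whitney statistic divided by the number of pairs, so no additional package is needed for a point estimate.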
Duration of remission for acute leukemia patients on active treatment or placebo.
Remission
A data frame with 42 observations on the following 5 variables.
sex
Sex of the patient (0 = male, 1 = female).
wbc
Log white blood cell count (WBC).
time
Time to event, where the event is either relapse or loss to follow-up.
event
Indicator of event type, either Relapse or Censored.
grp
Treatment group (6-MP = allocated to active treatment, or Placebo).
In this study, patients in remission were randomly assigned to maintenance therapy with 6-mercaptopurine (6-MP), an active antileukemic compound, or to a placebo. White blood cell count was also recorded, as it was considered a prognostic indicator of survival for leukemia patients, with higher values being associated with a worse prognosis.
Kleinbaum, D.G. and Klein, M., 1996. Survival Analysis: A Self-Learning Text. Springer.
Acute Leukemia Group B, Freireich, E.J., Gehan, E., Frei III, E.M.I.L., Schroeder, L.R., Wolman, I.J., Anbari, R., Burgert, E.O., Mills, S.D., Pinkel, D. and Selawry, O.S., 1963. The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21(6), pp.699-716.
data(Remission, package = "R4HCR")
# Number of events/censored by group
aggregate(event ~ grp, data = Remission, FUN = table)
# median survival times, ignoring the censoring.
aggregate(time ~ grp, data = Remission, FUN = median)
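The medians above ignore censoring. A sketch of an analysis that accounts for it is a Kaplan-Meier estimate per treatment group, here using the survival package (a recommended package normally installed alongside R). The toy data frame is invented so that the block runs on its own; it only copies the documented column names and factor labels (Relapse, Censored, 6-MP, Placebo), which are assumed to match the packaged data exactly.

```r
library(survival)

# Toy data: invented remission times with the documented column names.
toy <- data.frame(
  time  = c(6, 7, 10, 13, 16, 22, 1, 2, 3, 4, 5, 8),
  event = factor(c("Relapse", "Censored", "Relapse", "Relapse",
                   "Censored", "Relapse", "Relapse", "Relapse",
                   "Relapse", "Relapse", "Relapse", "Relapse")),
  grp   = factor(rep(c("6-MP", "Placebo"), each = 6))
)

# Kaplan-Meier curves by group; a relapse counts as the event,
# anything else is treated as censored.
km <- survfit(Surv(time, event == "Relapse") ~ grp, data = toy)
print(km)

# Log-rank test comparing the two groups.
lr <- survdiff(Surv(time, event == "Relapse") ~ grp, data = toy)
print(lr)
```

With the package data loaded, replacing toy by Remission in both calls should give the corresponding estimates for the trial data.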
Blood test results from people presenting to primary care with non-specific symptoms of cancer.
SCAN
A data frame with 750 observations on the following 8 variables.
age
Age of the patient (in years).
comorbidity
Charlson comorbidity score.
haemoglobin
Haemoglobin (g/dL)
albumin
Blood Albumin (g/dL)
alaninetrans
Alanine Transaminase (U/L)
whitebloodcell
White blood cell count (x 10^9/L)
bilirubin
Bilirubin (umol/L)
calcium
Calcium (mg/dL)
This is a synthetic data set generated from the original data set and therefore does not contain actual patient data, only data from simulated patients that share similar attributes to those of the original cohort.
Nicholson BD, Oke JL, Friedemann Smith C, et al. The Suspected CANcer (SCAN) pathway: protocol for evaluating a new standard of care for patients with non-specific symptoms of cancer. BMJ Open 2018;8:e018168.
data(SCAN, package = "R4HCR")
The number of deaths registered in Scotland per week for the first 42 weeks of 2021, stratified by cause of death.
Scotland
A matrix with five rows and 42 columns.
rows
Cancer, Dementia, Respiratory, SARS-CoV-2 and Other causes of death.
columns
Registration weeks (Wk1 - Wk42).
Downloaded from https://www.nrscotland.gov.uk/research/guides/birth-death-and-marriage-records in Nov 2021.
data(Scotland, package = "R4HCR")
# A stacked barplot.
barplot(Scotland,
        legend.text = c("Cancer", "Dementia/Alzheimers", "Circulatory",
                        "Respiratory", "Covid-19", "Other"),
        beside = FALSE, cex.names = 0.8,
        angle = c(45, 90, 135, 180, 215), density = 45,
        args.legend = list(ncol = 3, cex = 0.65, x = 45))
The objective of this study was to evaluate the diagnostic accuracy for CIN2+ detection of a combined approach (naked-eye visual inspection with acetic acid (VIA) plus digital VIA using a Samsung Galaxy J5 smartphone) compared with traditional naked-eye VIA alone.
Smartphone
A data frame with 181 observations on the following 10 variables.
hpv16
HPV16 result (negative or positive).
hpv1845
HPV18 and/or HPV45 (present or absent).
hpvother
Other high-risk HPV types (present or absent).
naked_via
Conventional visual assessment using the naked eye alone (negative, positive).
smart_via
Digital VIA result (negative or positive).
treatment
Decision to treat (no or yes).
combined_via
Combined naked-eye and digital VIA diagnosis (neither positive or either positive).
histology
Histological result (negative, CIN1, CIN2, CIN3, cancer).
cytology
Cytological result (negative, LSIL, HSIL, ASC-US, AGC, ASC-H, cancer, non-interpretable).
CIN2plus
Histological result CIN2 or higher (<CIN2, CIN2+).
These data are from a screening trial conducted in Dschang (West Cameroon) between February 2019 and March 2020. Women aged 30 to 49 were invited to participate in a free cervical cancer screening campaign. Primary HPV-based screening was followed by a pelvic exam for visual assessment (viewing the cervix with the naked eye to identify colour changes on the cervix) and then cervical biopsy and endocervical curettage. The study aimed to assess whether the use, in addition to normal visual inspection, of images captured using a smartphone could improve the detection of precancerous lesions or cancer.
Data directly available from https://yareta.unige.ch/archives/ffbeb6d7-b390-4755-987e-8faf85f97c67
Dufeil, E., Kenfack, B., Tincho, E., Fouogue, J., Wisniak, A., Sormani, J., Vassilakos, P. and Petignat, P., 2022. Addition of digital VIA/VILI to conventional naked-eye examination for triage of HPV-positive women: A study conducted in a low-resource setting. Plos one, 17(5), p.e0268015.
data(Smartphone, package = "R4HCR")
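Since the study is about diagnostic accuracy, a simple first step is to cross-tabulate a test result against the histological reference and read off sensitivity and specificity. The sketch below does this for combined_via against CIN2plus. The toy data are invented so that the block runs on its own, and the factor labels ("neither positive", "either positive", "<CIN2", "CIN2+") are assumed to match those listed above exactly.

```r
# Toy data: invented test results and reference standard for eight women.
toy <- data.frame(
  combined_via = factor(c("either positive", "either positive",
                          "neither positive", "either positive",
                          "neither positive", "neither positive",
                          "either positive", "neither positive"),
                        levels = c("neither positive", "either positive")),
  CIN2plus = factor(c("CIN2+", "CIN2+", "CIN2+", "<CIN2",
                      "<CIN2", "<CIN2", "<CIN2", "<CIN2"),
                    levels = c("<CIN2", "CIN2+"))
)

# 2 x 2 table: rows are the test result, columns the histological truth.
tab <- table(test = toy$combined_via, truth = toy$CIN2plus)

# Sensitivity: test-positive share among women with CIN2+.
sens <- unname(tab["either positive", "CIN2+"] / sum(tab[, "CIN2+"]))
# Specificity: test-negative share among women with <CIN2.
spec <- unname(tab["neither positive", "<CIN2"] / sum(tab[, "<CIN2"]))
c(sensitivity = sens, specificity = spec)
```

Repeating the calculation with naked_via in place of combined_via would give the comparison that the study set out to make.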
Systolic blood pressure measurements made simultaneously by two observers (J and R) using a sphygmomanometer and an automatic blood pressure measuring machine (S), each making three observations in quick succession.
Systolic
A data frame with 85 observations on the following 9 variables.
J1
First (of three) measurements made by observer J.
J2
Second (of three) measurements made by observer J.
J3
Third (of three) measurements made by observer J.
R1
First (of three) measurements made by observer R.
R2
Second (of three) measurements made by observer R.
R3
Third (of three) measurements made by observer R.
S1
First (of three) measurements made using a machine.
S2
Second (of three) measurements made using a machine.
S3
Third (of three) measurements made using a machine.
Data originally supplied by Dr E O'Brien, and reprinted in Bland and Altman (1999).
Bland, J.M. and Altman, D.G., 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), pp.135-160.
data(Systolic, package = "R4HCR")
Data from the study of Doll and Hill (1966) on the mortality of British doctors in relation to smoking (observations on coronary thrombosis), as used in Agresti (1996).
Thrombosis
A data frame with 10 observations on the following 4 variables.
age
Age band of stratum (35-44, 45-54, 55-64, 65-74).
smoking
Smoking status (Nonsmokers or Smokers).
deaths
Number of deaths from coronary thrombosis per stratum.
pyrs
Sum of person-years in the stratum.
Agresti, A., 1996. An introduction to categorical data analysis.
Doll R, Hill AB. Mortality of British doctors in relation to smoking: observations on coronary thrombosis. Natl Cancer Inst Monogr. 1966 Jan;19:205-68. PMID: 5905669.
data(Thrombosis, package = "R4HCR")
with(Thrombosis, xtabs(cbind(deaths, pyrs) ~ age + smoking))
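Counts of deaths with person-years at risk are commonly analysed as rates. The sketch below computes crude death rates per 1,000 person-years and then fits a Poisson regression with log person-years as an offset, so that the exponentiated smoking coefficient is an age-adjusted rate ratio. The toy numbers are invented so that the block runs on its own; only the column names and the smoking labels come from the description above.

```r
# Toy data: invented deaths and person-years for two age bands.
toy <- data.frame(
  age     = factor(rep(c("35-44", "45-54"), each = 2)),
  smoking = factor(rep(c("Nonsmokers", "Smokers"), times = 2)),
  deaths  = c(3, 30, 12, 100),
  pyrs    = c(20000, 50000, 10000, 40000)
)

# Crude death rate per 1,000 person-years in each stratum.
toy$rate <- 1000 * toy$deaths / toy$pyrs

# Poisson model for the death rate: log(pyrs) enters as an offset,
# so the coefficients are log rate ratios.
fit <- glm(deaths ~ age + smoking + offset(log(pyrs)),
           family = poisson, data = toy)

# Age-adjusted rate ratio, smokers versus non-smokers.
rate_ratio <- unname(exp(coef(fit)["smokingSmokers"]))
rate_ratio
```

The same glm() call could be applied to Thrombosis directly; an age-by-smoking interaction may be worth considering if the rate ratio appears to vary across age bands.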
US Incidence, mortality, and survival statistics for 20 solid tumor types.
USCancerStats
A data frame with 20 observations on the following 4 variables.
site
The site (or organ) of the cancer.
survival
Absolute change in site-specific five-year survival.
mortality
Percentage change in site-specific mortality.
incidence
Percentage change in site-specific incidence.
Incidence, mortality, and survival statistics for 20 solid tumor types reported by the SEER program. For each tumor, the absolute difference in 5-year survival between 1989-1995 and 1950-1954 is reported, along with the percentage change in mortality and incidence for 1950-1996.
Welch, H.G., Schwartz, L.M. and Woloshin, S., 2000. Are increasing 5-year survival rates evidence of success against cancer?. JAMA, 283(22), pp.2975-2978.
data(USCancerStats, package = "R4HCR")
cor.test(~ survival + mortality, data = USCancerStats,
         exact = FALSE, method = "spearman")
Number of people with at least one vaccination against SARS-CoV-2 as of November 2021.
Vaccinated
A data frame with 15 observations on the following 3 variables.
country
Name of European country.
vaccinated
Percentage of people with at least one vaccination against SARS-CoV-2.
fully_vaccinated
Percentage of people fully vaccinated against SARS-CoV-2.
These data give the number of people per hundred with at least one vaccination against SARS-CoV-2 (the virus that causes Covid-19) as of the week ending 12 November 2021, for countries in Europe with a population greater than 10 million. Fully vaccinated refers to having completed all vaccinations (including boosters) for that country.
data(Vaccinated, package = "R4HCR")
heights <- Vaccinated$vaccinated
countries <- Vaccinated$country
bp <- barplot(height = heights, col = "white", ylim = c(0, 100),
              names.arg = countries, cex.names = 0.9, las = 2,
              ylab = "People vaccinated per 100")
# using round here to save space
labels <- round(Vaccinated$vaccinated, 0)
text(x = bp, y = labels - 2, labels = labels, cex = 0.9, pos = 3)
Mortality associated with volatile substance abuse (VSA). This study collated all known deaths associated with VSA from 1971 to 1983 (inclusive).
VSA
A data frame with 9 observations on the following 4 variables.
age
Age band in nine categories (0-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-59, 60+).
country
The country in which the deaths were recorded (Great Britain or Scotland).
pop
Population size of the age band.
deaths
The number of deaths associated with VSA per age band.
The data were taken from Bland (2015), who cites Anderson et al. (1985) as the source. Note that Scotland is one of the three countries that make up Great Britain, along with England and Wales.
Bland, M., 2015. An introduction to medical statistics. Oxford University Press.
Anderson, H.R., Macnair, R.S. and Ramsey, J.D., 1985. Deaths from abuse of volatile substances: a national epidemiological study. Br Med J (Clin Res Ed), 290(6464), pp.304-307.
data(VSA, package = "R4HCR")