Title: | A Toolbox for Public Health and Epidemiology |
---|---|
Description: | A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand 'ggplot2' plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only. |
Authors: | Josie Athens [aut, cre], Frank Harell [ctb], John Fox [ctb], R-Core [ctb] |
Maintainer: | Josie Athens <[email protected]> |
License: | GPL-2 |
Version: | 2.0.0 |
Built: | 2025-01-07 06:51:13 UTC |
Source: | CRAN |
axis_labs
takes labels from labelled data to use them as axis-labels for plots generated by gformula
or ggplot2
.
axis_labs(object)
axis_labs(object)
object |
ggplot2 object (see examples). |
This functions is helpful when data has been already labelled by sjlabelled
. It retrives variable labels and use them for plotting.
A ggplot2 object.
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> gf_point(weight ~ dl.milk) |> gf_lm(col = 2, interval = "confidence", col = 2) |> axis_labs() kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> axis_labs() |> gf_star(x1 = 1, y1 = 10.9, x2 = 2, y2 = 11, y3 = 11.2)
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> gf_point(weight ~ dl.milk) |> gf_lm(col = 2, interval = "confidence", col = 2) |> axis_labs() kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> axis_labs() |> gf_star(x1 = 1, y1 = 10.9, x2 = 2, y2 = 11, y3 = 11.2)
bar_error
constructs bar charts in with error bars showing 95
confidence intervals around mean values. High of bars represent mean values.
bar_error( object = NULL, formula = NULL, data = NULL, fill = "indianred3", col = "black", alpha = 0.7, ... )
bar_error( object = NULL, formula = NULL, data = NULL, fill = "indianred3", col = "black", alpha = 0.7, ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
fill |
Colour used to fill the bars. |
col |
Colour used for the borders of the bars. |
alpha |
Opacity of the colour fill (0 = invisible, 1 = opaque). |
... |
Additional information passed to |
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> bar_error(bwt ~ smoke, fill = "plum3") birthwt |> bar_error(bwt ~ smoke | Race, fill = "plum3") birthwt |> bar_error(bwt ~ smoke, fill = ~Race)
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> bar_error(bwt ~ smoke, fill = "plum3") birthwt |> bar_error(bwt ~ smoke | Race, fill = "plum3") birthwt |> bar_error(bwt ~ smoke, fill = ~Race)
A randomised, double-blind, placebo-controlled trial of intravenous ibuprofen in 455 patients who had sepsis, defined as fever, tachycardia, tachypnea, and acute failure of at least one organ system.
Bernard
Bernard
A labelled tibble with 455 rows and 9 variables:
Patient ID
Treatment, factor with levels "Placebo" and "Ibuprofen".
Race/ethnicity, factor with levels "White", "African American" and "Other".
Mortality status at 30 days, factor with levels "Alive" and "Dead".
Baseline APACHE score.
Oxygen delivery at baseline.
Follow-up time in hours.
Baseline temperature in centigrades.
Temperature after 36 hr in centigrades.
Bernard, GR, et al. (1997) The effects of ibuprofen on the physiology and survival of patients with sepsis, N Engl J Med 336: 912–918.
Bland-Altman agreement plots.
bland_altman( object = NULL, formula = NULL, data = NULL, pch = 20, size = 1, col = "black", transform = FALSE, ... )
bland_altman( object = NULL, formula = NULL, data = NULL, pch = 20, size = 1, col = "black", transform = FALSE, ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: y ~ x (see details). |
data |
A data frame where the variables in the |
pch |
Symbol for plotting data. |
size |
Size of the symbol using to plot data. |
col |
Colour used for the symbol to plot data. |
transform |
Logical, should ratios instead of difference be used to construct the plot? |
... |
Further arguments passed to |
bland_altman
constructs Bland-Altman agreement plots.
Variables in formula
are continuous paired observations. When the distribution of the outcome
is not normal, but becomes normal with a log-transformation, bland_altman
can plot the ratio between
outcomes (difference in the log scale) by using option transform = TRUE
.
data(wright, package = "ISwR") wright |> bland_altman(mini.wright ~ std.wright, pch = 16, ylab = "Large-mini expiratory flow rate (l/min)", xlab = "Mean expiratory flow rate (l/min)" ) |> gf_labs( y = "Large-mini expiratory flow rate (l/min)", x = "Mean expiratory flow rate (l/min)" ) data(Sharples) Sharples |> bland_altman(srweight ~ weight, transform = TRUE) |> gf_labs(x = "Mean of weights (kg)", y = "Measured weight / Self-reported weight")
data(wright, package = "ISwR") wright |> bland_altman(mini.wright ~ std.wright, pch = 16, ylab = "Large-mini expiratory flow rate (l/min)", xlab = "Mean expiratory flow rate (l/min)" ) |> gf_labs( y = "Large-mini expiratory flow rate (l/min)", x = "Mean expiratory flow rate (l/min)" ) data(Sharples) Sharples |> bland_altman(srweight ~ weight, transform = TRUE) |> gf_labs(x = "Mean of weights (kg)", y = "Measured weight / Self-reported weight")
box_plot
is a wrap function that calls gf_boxplot
to construct more aesthetic box plots.
box_plot( object = NULL, formula = NULL, data = NULL, fill = "indianred3", alpha = 0.7, outlier.shape = 20, outlier.size = 1, ... )
box_plot( object = NULL, formula = NULL, data = NULL, fill = "indianred3", alpha = 0.7, outlier.shape = 20, outlier.size = 1, ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
fill |
Colour used for the box passed to |
alpha |
Opacity (0 = invisible, 1 = opaque). |
outlier.shape |
Shape ( |
outlier.size |
Size of the outlier symbol. |
... |
Further arguments passed to |
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) t.test(dl.milk ~ sex, data = kfm) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(1, 10.9, 2, 11, 11.4, legend = "p = 0.035", size = 2.5)
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) t.test(dl.milk ~ sex, data = kfm) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(1, 10.9, 2, 11, 11.4, legend = "p = 0.035", size = 2.5)
A data set containing the prevalence of Helicobacter pylori infection in preschool children according to parental history of duodenal or gastric ulcer.
Brenner
Brenner
A labelled tibble with 863 rows and 2 variables:
History of duodenal or gastric ulcer, factor with levels "No" and "Yes".
Infected with Helicobacter pylori, factor with levels "No" and "Yes".
Brenner H, Rothenbacher D, Bode G, Adler G (1998) Parental history of gastric or duodenal ulcer and prevalence of Helicobacter pylori infection in preschool children: population based study. BMJ 316:665.
bst
estimates confidence intervals around the mean
, median
or geo_mean
.
bst(x, stat = "mean", n = 1000, CI = 95, digits = 2)
bst(x, stat = "mean", n = 1000, CI = 95, digits = 2)
x |
A numerical variable. Missing observations are removed by default. |
stat |
Statistic, either "mean" (default), "median" or "gmean" (geometric mean). |
n |
Number of replicates for the bootstrap (n=1000 by default). |
CI |
Confidence intervals (CI=95 by default). |
digits |
Number of digits for rounding (default = 2). |
A data frame with the estimate and confidence intervals.
data(IgM, package = "ISwR") bst(IgM, "median") bst(IgM, "gmean")
data(IgM, package = "ISwR") bst(IgM, "median") bst(IgM, "gmean")
coef_det
estimates the coefficient of determination (r-squared) from fitted (predicted) and
observed values. Outcome from the model is assumed to be numerical.
coef_det(obs, fit)
coef_det(obs, fit)
obs |
Vector with observed values (numerical outcome). |
fit |
Vector with fitted (predicted) values. |
A scalar, the coefficient of determination (r-squared).
## Linear regression: Riboflavin <- seq(0, 80, 10) OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03) titration <- data.frame(Riboflavin, OD) model1 <- lm(OD ~ Riboflavin, data = titration) summary(model1) coef_det(titration$OD, fitted(model1)) ## Non-linear regression: library(nlme, quietly = TRUE) data(Puromycin) mm.tx <- gnls(rate ~ SSmicmen(conc, Vm, K), data = Puromycin, subset = state == "treated" ) summary(mm.tx) coef_det(Puromycin$rate[1:12], mm.tx$fitted)
## Linear regression: Riboflavin <- seq(0, 80, 10) OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03) titration <- data.frame(Riboflavin, OD) model1 <- lm(OD ~ Riboflavin, data = titration) summary(model1) coef_det(titration$OD, fitted(model1)) ## Non-linear regression: library(nlme, quietly = TRUE) data(Puromycin) mm.tx <- gnls(rate ~ SSmicmen(conc, Vm, K), data = Puromycin, subset = state == "treated" ) summary(mm.tx) coef_det(Puromycin$rate[1:12], mm.tx$fitted)
estat
calculates descriptives of numerical variables.
estat(object = NULL, formula = NULL, data = NULL, digits = 2, label = NULL)
estat(object = NULL, formula = NULL, data = NULL, digits = 2, label = NULL)
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: ~ x or ~ x|z (for groups). |
data |
A data frame where the variables in the |
digits |
Number of digits for rounding (default = 2). |
label |
Label used to display the name of the variable (see examples). |
A data frame with descriptive statistics.
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> estat(~dl.milk) estat(~ dl.milk | sex, data = kfm) kfm |> estat(~ weight | sex)
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> estat(~dl.milk) estat(~ dl.milk | sex, data = kfm) kfm |> estat(~ weight | sex)
expand_df
expands a data frame by a vector of frequencies.
expand_df(aggregate.data, index.var = "Freq", retain.freq = FALSE)
expand_df(aggregate.data, index.var = "Freq", retain.freq = FALSE)
aggregate.data |
A data frame. |
index.var |
A numerical variable with the frequencies (counts). |
retain.freq |
Logical expression indicating if frequencies should be kept. |
This is a generic function that resembles weighted frequencies in other statistical packages
(for example, Stata). expand.df
was adapted from a function developed by deprecated package
epicalc
(now package epiDisplay
).
An expanded data frame with replicates given by the frequencies.
Freq <- c(5032, 5095, 41, 204) Mortality <- gl(2, 2, labels = c("No", "Yes")) Calcium <- gl(2, 1, 4, labels = c("No", "Yes")) anyca <- data.frame(Freq, Mortality, Calcium) anyca anyca.exp <- expand_df(anyca) with(anyca.exp, table(Calcium, Mortality))
Freq <- c(5032, 5095, 41, 204) Mortality <- gl(2, 2, labels = c("No", "Yes")) Calcium <- gl(2, 1, 4, labels = c("No", "Yes")) anyca <- data.frame(Freq, Mortality, Calcium) anyca anyca.exp <- expand_df(anyca) with(anyca.exp, table(Calcium, Mortality))
Randomised control trial on children suffering from frequent and severe migraine. Control group represents untreated children. The active treatments were either relaxation alone or relaxation with biofeedback.
Fentress
Fentress
A labelled tibble with 18 rows and 2 variables:
Reduction in weekly headache activity expressed as percentage of baseline data.
Group, a factor with levels "Untreated", "Relaxation" (alone) and "Biofeedback" (relaxation and biofeedback).
Fentress, DW, et al. (1986) Biofeedback and relaxation-response in the treatment of pediatric migraine. Dev Med Child Neurol 28:1 39-46.
Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.
data(Fentress) Fentress |> strip_error(pain ~ group)
data(Fentress) Fentress |> strip_error(pain ~ group)
freq_cont
tabulates a continuous variable by given classes.
freq_cont(x, bks, dg = 2)
freq_cont(x, bks, dg = 2)
x |
A numerical (continuous) variable. Ideally, relatively long (greater than 100 observations). |
bks |
Breaks defining the classes (see example). |
dg |
Number of digits for rounding (default = 2). |
A data frame with the classes, the mid-point, the frequencies, the relative and cumulative frequencies.
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) freq_cont(IgM, seq(0, 4.5, 0.5))
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) freq_cont(IgM, seq(0, 4.5, 0.5))
gen_bst_df
is a function called that generates a data frame with
confidence intervals of a continuous variable by levels of one or two categorical ones (factors).
gen_bst_df(object = NULL, formula = NULL, data = NULL, stat = "mean", ...)
gen_bst_df(object = NULL, formula = NULL, data = NULL, stat = "mean", ...)
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
stat |
Statistic used for bst. |
... |
Passes optional arguments to bst. |
A data frame with the confidence intervals by level.
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> gen_bst_df(dl.milk ~ sex) data(birthwt, package = "MASS") require(dplyr, quietly = TRUE) birthwt <- mutate(birthwt, smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) birthwt <- birthwt |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) gen_bst_df(bwt ~ smoke | Race, data = birthwt)
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> gen_bst_df(dl.milk ~ sex) data(birthwt, package = "MASS") require(dplyr, quietly = TRUE) birthwt <- mutate(birthwt, smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) birthwt <- birthwt |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) gen_bst_df(bwt ~ smoke | Race, data = birthwt)
Geometric mean.
geo_mean(x)
geo_mean(x)
x |
A numeric variable with no negative values. |
A scalar, the calculated geometric mean.
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) geo_mean(IgM)
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) geo_mean(IgM)
get_r2
is a is a wrap function that calls r2
from package performance
.
Calculates the R2 or pseudo-R2 value for different regression model objects, returning a character object for easy printing in tables of coefficients.
get_r2(model, ...)
get_r2(model, ...)
model |
A statistical regression model. |
... |
Additional arguments passed to |
The main purpose of get_r2
is to allow easy printing of R2 value in tables of coefficients (see examples).
r2
.
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) get_r2(model_norm)
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) get_r2(model_norm)
gf_star
Is a function used to display differences between groups (see details).
gf_star(fig, x1, y1, x2, y2, y3, legend = "*", ...)
gf_star(fig, x1, y1, x2, y2, y3, legend = "*", ...)
fig |
A gformula object. |
x1 |
Position in |
y1 |
Position in |
x2 |
Position in |
y2 |
Position in |
y3 |
Position in |
legend |
Character text used for annotating the plot. |
... |
Additional information passed to |
This function draws an horizontal line from coordinate (x1
, y2
)
to coordinate (x2
, y2
). Draws vertical lines below the horizontal line,
towards data, from (x1
, y1
) to (x1
, y2
) and from
(x2
, y1
) to (x2
, y2
). Finally, adds
text above the horizontal line, at the mid point between x1
and x2
.
See examples.
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(x1 = 1, y1 = 10.9, x2 = 2, y2 = 11, y3 = 11.2) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(1, 10.9, 2, 11, 11.4, legend = "p = 0.035", size = 2.5) data(energy, package = "ISwR") energy <- energy |> var_labels( expend = "Energy expenditure (MJ/day)", stature = "Stature" ) energy |> strip_error(expend ~ stature, col = "red") |> gf_star(1, 13, 2, 13.2, 13.4, "**")
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(x1 = 1, y1 = 10.9, x2 = 2, y2 = 11, y3 = 11.2) kfm |> box_plot(dl.milk ~ sex, fill = "thistle", alpha = 0.8) |> gf_star(1, 10.9, 2, 11, 11.4, legend = "p = 0.035", size = 2.5) data(energy, package = "ISwR") energy <- energy |> var_labels( expend = "Energy expenditure (MJ/day)", stature = "Stature" ) energy |> strip_error(expend ~ stature, col = "red") |> gf_star(1, 13, 2, 13.2, 13.4, "**")
glm_coef
displays estimates with confidence intervals and p-values from generalised linear models (see Details).
glm_coef( model, digits = 2, alpha = 0.05, labels = NULL, se_rob = FALSE, type = "cond", exp_norm = FALSE )
glm_coef( model, digits = 2, alpha = 0.05, labels = NULL, se_rob = FALSE, type = "cond", exp_norm = FALSE )
model |
A model from any of the classes listed in the details section. |
digits |
A scalar, number of digits for rounding the results (default = 2). |
alpha |
Significant level (default = 0.05) used to calculate confidence intervals. |
labels |
An optional character vector with the names of the coefficients (including intercept). |
se_rob |
Logical, should robust errors be used to calculate confidence intervals? (default = FALSE). |
type |
Character, either "cond" (condensed) or "ext" (extended). See details. |
exp_norm |
Logical, should estimates and confidence intervals should be exponentiated? (for family == "gaussian"). |
glm_coef
recognises objects (models) from the following classes: clm
, clogit
,
coxph
, gee
, glm
, glmerMod
, lm
, lme
, lmerMod
, multinom
, negbin
,
polr
and surveg
For models from logistic regression (including conditional logistic, ordinal and multinomial), Poisson or survival analysis, coefficient estimates and corresponding confidence intervals are automatically exponentiated (back-transformed).
By default, glm_coef
uses naive standard errors for calculating confidence intervals but has the option of using robust standard errors instead.
glm_coef
can display two different data frames depending on the option of type
,
for type type = "cond"
(the default), glm_coef
displays the standard table of coefficients
with confidence intervals and p-values; for type = "ext"
, glm_coef
displays additional statistics including standard errors.
Please read the Vignette on Regression for more details.
A data frame with estimates, confidence intervals and p-values from glm
objects.
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) ## Continuous outcome. data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) glm_coef(model_norm, labels = model_labels(model_norm)) ## Logistic regression. data(diet, package = "Epi") model_binom <- glm(chd ~ fibre, data = diet, family = binomial) model_binom |> glm_coef(labels = c("Constant", "Fibre intake (g/day)")) model_binom |> glm_coef(labels = c("Constant", "Fibre intake (g/day)"), type = "ext")
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) ## Continuous outcome. data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) glm_coef(model_norm, labels = model_labels(model_norm)) ## Logistic regression. data(diet, package = "Epi") model_binom <- glm(chd ~ fibre, data = diet, family = binomial) model_binom |> glm_coef(labels = c("Constant", "Fibre intake (g/day)")) model_binom |> glm_coef(labels = c("Constant", "Fibre intake (g/day)"), type = "ext")
Harmonic mean.
harm_mean(x)
harm_mean(x)
x |
A numeric variable with no zero values. |
A scalar, the calculated harmonic mean.
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) harm_mean(IgM)
data(IgM, package = "ISwR") Ab <- data.frame(IgM) estat(~IgM, data = Ab) harm_mean(IgM)
hist_norm
constructs histograms and adds corresponding Normal density curve.
hist_norm( object = NULL, formula = NULL, data = NULL, bins = 20, fill = "indianred3", color = "black", alpha = 0.4, ... )
hist_norm( object = NULL, formula = NULL, data = NULL, bins = 20, fill = "indianred3", color = "black", alpha = 0.4, ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
bins |
Number of bins of the histogram. |
fill |
Colour to fill the bars of the histogram. |
color |
Colour used for the border of the bars. |
alpha |
Opacity (0 = invisible, 1 = opaque). |
... |
Further arguments passed to |
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> hist_norm(~bwt, alpha = 0.7, bins = 20, fill = "cadetblue") birthwt |> hist_norm(~ bwt | smoke, alpha = 0.7, bins = 20, fill = "cadetblue")
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> hist_norm(~bwt, alpha = 0.7, bins = 20, fill = "cadetblue") birthwt |> hist_norm(~ bwt | smoke, alpha = 0.7, bins = 20, fill = "cadetblue")
Number of CD4+ T-cells and CD8+ T-cells in blood samples from patients in remission from Hodgkin's disease or in remission from disseminated malignancies.
Hodgkin
Hodgkin
A labelled tibble with 40 rows and 3 variables:
Concentration of CD4+ T-cells (cells / mm^3).
Concentration of CD8+ T-cells (cells / mm^3).
Group, factor with levels "Non-Hodgkin" and "Hodgkin".
Shapiro, CM, et al (1986) Immunologic status of patients in remission from Hodgkin's disease and disseminated malignancies. Am J Med Sci 293:366-370.
Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.
data(Hodgkin) require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) Hodgkin <- Hodgkin |> mutate( Ratio = CD4 / CD8 ) |> var_labels( Ratio = "CD4+ / CD8+ T-cells" ) estat(~ Ratio | Group, data = Hodgkin) Hodgkin |> qq_plot(~ Ratio | Group) Hodgkin$Ratio <- Hodgkin$CD4 / Hodgkin$CD8 estat(~ Ratio | Group, data = Hodgkin) qq_plot(~ Ratio | Group, data = Hodgkin)
data(Hodgkin) require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) Hodgkin <- Hodgkin |> mutate( Ratio = CD4 / CD8 ) |> var_labels( Ratio = "CD4+ / CD8+ T-cells" ) estat(~ Ratio | Group, data = Hodgkin) Hodgkin |> qq_plot(~ Ratio | Group) Hodgkin$Ratio <- Hodgkin$CD4 / Hodgkin$CD8 estat(~ Ratio | Group, data = Hodgkin) qq_plot(~ Ratio | Group, data = Hodgkin)
inv_logit
Calculates the inverse of the logit (probability in logistic regression)
inv_logit(x)
inv_logit(x)
x |
Numerical value used to compute the inverse of the logit. |
jack_knife
Ranks the squared differences between mean values from Jackknife analysis
(arithmetic mean estimated by removing one observation at a time) and the original mean value.
jack_knife(x)
jack_knife(x)
x |
A numeric variable. Missing values are removed by default. |
Data frame with the ranked squared differences.
x <- rnorm(10, 170, 8) x mean(x) jack_knife(x) x <- rnorm(100, 170, 8) mean(x) head(jack_knife(x))
x <- rnorm(10, 170, 8) x mean(x) jack_knife(x) x <- rnorm(100, 170, 8) mean(x) head(jack_knife(x))
Body weight and plasma volume in eight healthy men.
Kirkwood
Kirkwood
A labelled data frame with 8 rows and 3 variables:
Subject ID.
Body weight in kg.
Plasma volume in litres.
Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.
data(Kirkwood) Kirkwood |> gf_point(volume ~ weight) |> gf_lm(col = "indianred3", interval = "confidence", fill = "indianred3")
data(Kirkwood) Kirkwood |> gf_point(volume ~ weight) |> gf_lm(col = "indianred3", interval = "confidence", fill = "indianred3")
knife_mean
is an internal function. Calculates arithmetic means by removing one observation
at a time.
knife_mean(x)
knife_mean(x)
x |
A numerical variable. Missing values are removed for the mean calculation. |
A vector with the mean calculations.
x <- rnorm(10, 170, 8) x mean(x) knife_mean(x)
x <- rnorm(10, 170, 8) x mean(x) knife_mean(x)
leverage
is an internal function called by rank_leverage
.
leverage(x)
leverage(x)
x |
A numeric variable. Missing values are removed by default. |
Estimates the leverage of each observation around the arithmetic mean.
Variable with corresponding leverage estimations
x <- rnorm(10, 170, 8) x mean(x) leverage(x) rank_leverage(x)
x <- rnorm(10, 170, 8) x mean(x) leverage(x) rank_leverage(x)
logistic_gof
performs the Hosmer and Lemeshow test to test the goodness of fit of a logistic
regression model. This function is part of residuals.lrm
from package rms
.
logistic_gof(model)
logistic_gof(model)
model |
A logistic regression model object. |
Frank Harell, Vanderbilt University <[email protected]>
Hosmer DW, Hosmer T, Lemeshow S, le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat in Med 16:965–980, 1997.
data(diet, package = "Epi") model <- glm(chd ~ fibre, data = diet, family = binomial) glm_coef(model, labels = c("Constant", "Fibre intake (g/day)")) logistic_gof(model)
data(diet, package = "Epi") model <- glm(chd ~ fibre, data = diet, family = binomial) glm_coef(model, labels = c("Constant", "Fibre intake (g/day)")) logistic_gof(model)
An international case-control study to test the hypothesis that breast cancer is related to the age that a woman gives childbirth.
Macmahon
Macmahon
A labelled tibble with 185 rows and 2 variables:
Diagnosed with breast cancer, a factor with levels "No" and "Yes".
Age mother gives childbirth, factor with levels "<20", "20-24", "25-29", "30-34" and ">34".
Macmahon, B. et al. (1970). Age at first birth and breast cancer risk. Bull WHO 43, 209-221.
mhor
computes odds ratios by levels of the stratum variable as well as the Mantel-Haenszel
pooled odds ratio. The test for effect modification (test for interaction) is also displayed.
mhor(object = NULL, formula = NULL, data = NULL)
mhor(object = NULL, formula = NULL, data = NULL)
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: outcome ~ stratum/exposure. |
data |
A data frame containing the variables used in |
Odds ratios with 95
outcome
by levels of stratum
. The Mantel-Haenszel pooled OR and the test
for effect modification is also reported.
data(oswego, package = "epitools") require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) oswego <- oswego |> mutate( ill = factor(ill, labels = c("No", "Yes")), sex = factor(sex, labels = c("Female", "Male")), chocolate.ice.cream = factor(chocolate.ice.cream, labels = c("No", "Yes")) ) |> var_labels( ill = "Developed illness", sex = "Sex", chocolate.ice.cream = "Consumed chocolate ice cream" ) oswego |> mhor(ill ~ sex / chocolate.ice.cream)
data(oswego, package = "epitools") require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) oswego <- oswego |> mutate( ill = factor(ill, labels = c("No", "Yes")), sex = factor(sex, labels = c("Female", "Male")), chocolate.ice.cream = factor(chocolate.ice.cream, labels = c("No", "Yes")) ) |> var_labels( ill = "Developed illness", sex = "Sex", chocolate.ice.cream = "Consumed chocolate ice cream" ) oswego |> mhor(ill ~ sex / chocolate.ice.cream)
model_labels
replaces row names in glm_coef
with labels from the original data frame.
model_labels(model, intercept = TRUE)
model_labels(model, intercept = TRUE)
model |
A generalised linear model. |
intercept |
Logical, should the intercept be added to the list of coefficients? |
model_labels
does not handle yet interaction terms, see examples.
Please read the Vignette on Regression for more examples.
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) glm_coef(model_norm, labels = model_labels(model_norm)) model_int <- lm(formula = bwt ~ smoke * race, data = birthwt) model_int |> glm_coef(labels = c( model_labels(model_int), "Smoker: African American", "Smoker: Other" ))
require(dplyr, quietly = TRUE) require(sjlabelled, quietly = TRUE) data(birthwt, package = "MASS") birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), race = factor(race, labels = c("White", "African American", "Other")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status", race = "Race" ) model_norm <- lm(bwt ~ smoke + race, data = birthwt) glm_coef(model_norm, labels = model_labels(model_norm)) model_int <- lm(formula = bwt ~ smoke * race, data = birthwt) model_int |> glm_coef(labels = c( model_labels(model_int), "Smoker: African American", "Smoker: Other" ))
multiple
displays results from post-doc analysis and constructs corresponding plot.
multiple( model, formula, adjust = "mvt", type = "response", reverse = TRUE, level = 0.95, digits = 2, ... )
multiple( model, formula, adjust = "mvt", type = "response", reverse = TRUE, level = 0.95, digits = 2, ... )
model |
A fitted model supported by |
formula |
A formula with shape: |
adjust |
Method to adjust CIs and p-values (see details). |
type |
Type of prediction (matching "linear.predictor", "link", or "response"). |
reverse |
Logical argument. Determines the direction of comparisons. |
level |
Confidence interval significance level. |
digits |
Number of digits for rounding (default = 2). |
... |
Further arguments passed to |
The default adjusting method is "mvt" which uses the multivariate t distribution.
Other options are: "bonferroni", "holm", "hochberg", "tukey" and "none". The default option for argument reverse
is to make reverse comparisons, i.e., against the reference level matching comparisons from lm
and glm
.
A list with objects: df
A data frame with adjusted p-values, fig_ci
a plot with estimates and adjusted confidence intervals and fig_pval
a plot comparing adjusted p-values.
data(birthwt, package = "MASS") birthwt$race <- factor(birthwt$race, labels = c("White", "African American", "Other")) model_1 <- aov(bwt ~ race, data = birthwt) multiple(model_1, ~race)$df multiple(model_1, ~race)$fig_ci |> gf_labs(y = "Race", x = "Difference in birth weights (g)") multiple(model_1, ~race)$fig_pval |> gf_labs(y = "Race")
data(birthwt, package = "MASS") birthwt$race <- factor(birthwt$race, labels = c("White", "African American", "Other")) model_1 <- aov(bwt ~ race, data = birthwt) multiple(model_1, ~race)$df multiple(model_1, ~race)$fig_ci |> gf_labs(y = "Race", x = "Difference in birth weights (g)") multiple(model_1, ~race)$fig_pval |> gf_labs(y = "Race")
odds_trend
calculates the odds ratio with confidence intervals (Wald) for different levels
(three or more) of the exposure variable, constructs the corresponding plot and calculates if the trend is
significant or not.
odds_trend(formula, data, angle = 45, hjust = 1, method = "wald", ...)
odds_trend(formula, data, angle = 45, hjust = 1, method = "wald", ...)
formula |
A formula with shape: outcome ~ exposure. |
data |
A data frame where the variables in the |
angle |
Angle of for the x labels (default = 45). |
hjust |
Horizontal adjustment for x labels (default = 1). |
method |
Method for calculating confidence interval around odds ratio. |
... |
Passes optional arguments to |
odds_trend
is a wrap function that calls oddsratio
from package epitools
.
Additional methods for confidence intervals include: "midp"
, "fisher"
, and "small"
.
A list with components df
a data frame with the results and fig
corresponding plot.
## A cross-sectional study looked at the association between obesity and a biopsy resulting ## from mammography screening. Freq <- c(3441, 34, 39137, 519, 20509, 280, 12149, 196, 11882, 199) Biopsy <- gl(2, 1, 10, labels = c("No", "Yes")) Weight <- gl(5, 2, 10, labels = c( "Underweight", "Normal", "Over (11-24%)", "Over (25-39%)", "Over (> 39%)" )) breast <- data.frame(Freq, Biopsy, Weight) breast breast <- expand_df(breast) require(sjlabelled, quietly = TRUE) breast <- var_labels(breast, Weight = "Weight group" ) odds_trend(Biopsy ~ Weight, data = breast)$df odds_trend(Biopsy ~ Weight, data = breast)$fig
## A cross-sectional study looked at the association between obesity and a biopsy resulting ## from mammography screening. Freq <- c(3441, 34, 39137, 519, 20509, 280, 12149, 196, 11882, 199) Biopsy <- gl(2, 1, 10, labels = c("No", "Yes")) Weight <- gl(5, 2, 10, labels = c( "Underweight", "Normal", "Over (11-24%)", "Over (25-39%)", "Over (> 39%)" )) breast <- data.frame(Freq, Biopsy, Weight) breast breast <- expand_df(breast) require(sjlabelled, quietly = TRUE) breast <- var_labels(breast, Weight = "Weight group" ) odds_trend(Biopsy ~ Weight, data = breast)$df odds_trend(Biopsy ~ Weight, data = breast)$fig
Study of onchocerciasis ("river blindness") in Sierra Leone, in which subjects were classified according to whether they lived in villages in savannah or rainforest area.
Oncho
Oncho
A labelled tibble with 1302 rows and 7 variables:
Subject ID.
Infected with Onchocerciasis volvulus, factor with levels "Not-infected" and "Infected".
Area of residence, factor with levels "Savannah" and "Rainforest".
Age group in years, factor with levels "5-9", "10-19", "20-39" and "40+".
Subject sex, factor with levels "Male" and "Female".
Microfiliariae load.
Severe eye lesions, factor with levels "No" and "Yes".
McMahon, JE, Sowa, SIC, Maude, GH and Kirkwood BR (1988) Onchocerciasis in Sierra Leone 2: a comparison of forest and savannah villages. Trans Roy Soc Trop Med Hyg 82: 595-600.
Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.
predict_inv
Calculates the value the predictor x that generates value y with a simple linear model.
predict_inv(model, y)
predict_inv(model, y)
model |
A simple linear model object (class lm). |
y |
A numerical scalar, the value of the outcome for which we want to calculate the predictor x. |
The estimated value of the predictor.
## Spectrophotometry example. Titration curve for riboflavin (nmol/ml). The sample has an absorbance ## of 1.15. Aim is to estimate the concentration of riboflavin in the sample. Riboflavin <- seq(0, 80, 10) OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03) titration <- data.frame(Riboflavin, OD) require(sjlabelled, quietly = TRUE) titration <- titration |> var_labels( Riboflavin = "Riboflavin (nmol/ml)", OD = "Optical density" ) titration |> gf_point(OD ~ Riboflavin) |> gf_smooth(col = "indianred3", se = TRUE, lwd = 0.5, method = "loess") ## Model with intercept different from zero: model <- lm(OD ~ Riboflavin, data = titration) glm_coef(model) predict_inv(model, 1.15)
## Spectrophotometry example. Titration curve for riboflavin (nmol/ml). The sample has an absorbance ## of 1.15. Aim is to estimate the concentration of riboflavin in the sample. Riboflavin <- seq(0, 80, 10) OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03) titration <- data.frame(Riboflavin, OD) require(sjlabelled, quietly = TRUE) titration <- titration |> var_labels( Riboflavin = "Riboflavin (nmol/ml)", OD = "Optical density" ) titration |> gf_point(OD ~ Riboflavin) |> gf_smooth(col = "indianred3", se = TRUE, lwd = 0.5, method = "loess") ## Model with intercept different from zero: model <- lm(OD ~ Riboflavin, data = titration) glm_coef(model) predict_inv(model, 1.15)
prop_or
is a simple function to calculate a proportion, from another proportion and the odds
ratio between them.
prop_or(p2, or)
prop_or(p2, or)
p2 |
The value of a proportion in the unexposed group (p2). |
or |
The odds ratio of p1/p2. |
p1
, the proportion in the exposed group (p1).
flu <- matrix(c(20, 80, 220, 140), nrow = 2) colnames(flu) <- c("Yes", "No") rownames(flu) <- c("Vaccine", "Placebo") flu or <- (20 * 140) / (80 * 220) p2 <- 80 / 220 prop_or(p2 = p2, or = or) 20 / 240
flu <- matrix(c(20, 80, 220, 140), nrow = 2) colnames(flu) <- c("Yes", "No") rownames(flu) <- c("Vaccine", "Placebo") flu or <- (20 * 140) / (80 * 220) p2 <- 80 / 220 prop_or(p2 = p2, or = or) 20 / 240
pseudo_r2
Calculates R2 analogues (pseudo R2) of logistic regression.Pseudo R2 (logistic regression)
pseudo_r2
Calculates R2 analogues (pseudo R2) of logistic regression.
pseudo_r2(model)
pseudo_r2(model)
model |
A logistic regression model. |
pseudo_r2
calculates three pseudo R2 of logistic regression models: 1) Nagelkerke, @0 Cox and Snell, 3) Hosmer and Lemeshow.
A data frame with the calculated pseudo R2 values.
data(Oncho) model_oncho <- glm(mf ~ area, data = Oncho, binomial) glm_coef(model_oncho, labels = c("Constant", "Area (rainforest/savannah)")) pseudo_r2(model_oncho)
data(Oncho) model_oncho <- glm(mf ~ area, data = Oncho, binomial) glm_coef(model_oncho, labels = c("Constant", "Area (rainforest/savannah)")) pseudo_r2(model_oncho)
qq_plot
constructs quantile-quantile plots against the standard normal distribution
(also known as quantile-normal plots).
qq_plot( object = NULL, formula = NULL, data = NULL, pch = 20, col = "indianred3", ylab = NULL, ... )
qq_plot( object = NULL, formula = NULL, data = NULL, pch = 20, col = "indianred3", ylab = NULL, ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
pch |
Point character passed to |
col |
Colour of the reference line, passed to |
ylab |
Optional character passed as label for the y-axis. |
... |
Further arguments passed to |
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> qq_plot(~dl.milk) qq_plot(~ dl.milk | sex, data = kfm)
data(kfm, package = "ISwR") require(sjlabelled, quietly = TRUE) kfm <- kfm |> var_labels( dl.milk = "Breast-milk intake (dl/day)", sex = "Sex", weight = "Child weight (kg)", ml.suppl = "Milk substitute (ml/day)", mat.weight = "Maternal weight (kg)", mat.height = "Maternal height (cm)" ) kfm |> qq_plot(~dl.milk) qq_plot(~ dl.milk | sex, data = kfm)
rank_influence
calculates influence measures of each data observation on models and then ranks them.
rank_influence(model)
rank_influence(model)
model |
A generalised linear model object. |
rank_influence
is a wrap function that calls influence.measures
, ranks observations on
their significance influence on the model and displays the 10 most influential observations
(if they are significant).
data(diet, package = "Epi") model <- glm(chd ~ fibre, data = diet, family = binomial) rank_influence(model)
data(diet, package = "Epi") model <- glm(chd ~ fibre, data = diet, family = binomial) rank_influence(model)
rank_leverage
ranks observations by their leverage (influence) on the arithmetic mean.
rank_leverage(x)
rank_leverage(x)
x |
A numeric variable. Missing values are removed by default. |
A data frame ranking observations by their leverage around the mean.
x <- rnorm(10, 170, 8) x mean(x) rank_leverage(x) x <- rnorm(100, 170, 8) mean(x) head(rank_leverage(x))
x <- rnorm(10, 170, 8) x mean(x) rank_leverage(x) x <- rnorm(100, 170, 8) mean(x) head(rank_leverage(x))
reference_range
estimates the reference range (reference interval) of a numerical variable.
reference_range(avg, std)
reference_range(avg, std)
avg |
The arithmetic mean (a scalar numerical value). |
std |
The standard deviation (a scalar numerical value). |
The reference range assumes normality and represents the limits that would include 95 observations.
A data frame with the reference range limits.
x <- rnorm(100, 170, 8) round(mean(x), 2) round(sd(x), 2) round(reference_range(mean(x), sd(x)), 2)
x <- rnorm(100, 170, 8) round(mean(x), 2) round(sd(x), 2) round(reference_range(mean(x), sd(x)), 2)
Calculates the coefficient of variation (relative dispersion) of a variable. The relative dispersion is defined as the standard deviation over the arithmetic mean.
rel_dis(x)
rel_dis(x)
x |
A numerical variable. NA's observations are removed by default. |
The coefficient of variation (relative dispersion).
height <- rnorm(100, 170, 8) rel_dis(height)
height <- rnorm(100, 170, 8) rel_dis(height)
A clinical trial on the value of extracorporeal membrane oxygenation for term neonates with severe respiratory failure. RCT compares active treatment against conventional management.
Roberts
Roberts
A labelled tibble with 185 rows and 2 variables:
Extracorporeal membrane oxygenation treatment, factor with levels "No" and "Yes".
One year survival, factor with levels "No" and "Yes".
Roberts, TE (1998) Extracorporeal Membrane Oxygenation Economics Working Group. Economic evaluation and randomised controlled trial of extracorporeal membrane oxygenation: UK collaborative trial. Brit Med J 317:911-16.
A case-control study of oral contraceptives and stroke in young women with presence or absence of hypertension. Cases represent thrombotic stroke and controls are hospital controls. The group of no hypertension includes normal blood pressure (<140/90 mm Hg) and borderline hypertension (140-159/90-94 mm Hg). Hypertension group includes moderate hypertension (160-179/95-109 mm Hg) and severe hypertension (180+/110+ mm Hg). This data has been used as an example of join exposure by Rothman for measuring interactions (see examples).
Rothman
Rothman
A labelled tibble with 477 rows and 3 variables:
Thrombotic stroke, factor with levels "No" and "Yes".
Current user of oral contraceptives, factor with levels "Non-user" and "User".
Hypertension, factor with levels "No" (<160/95 mm Hg) and "Yes".
Collaborative Group for the Study of Stroke in Young Women (1975) Oral contraceptives and stroke in young women. JAMA 231:718-722.
Rothman, KJ (2002) Epidemiology. An Introduction. Oxford University Press.
data(Rothman) mhor(stroke ~ ht / oc, data = Rothman) ## Model with standard interaction term: model1 <- glm(stroke ~ ht * oc, data = Rothman, family = binomial) glm_coef(model1) ## Model considering join exposure: Rothman$join <- 0 Rothman$join[Rothman$oc == "Non-user" & Rothman$ht == "Yes"] <- 1 Rothman$join[Rothman$oc == "User" & Rothman$ht == "No"] <- 2 Rothman$join[Rothman$oc == "User" & Rothman$ht == "Yes"] <- 3 Rothman$join <- factor(Rothman$join, labels = c( "Unexposed", "Hypertension", "OC user", "OC and hypertension" )) require(sjlabelled, quietly = TRUE) Rothman$join <- set_label(Rothman$join, label = "Exposure") model2 <- glm(stroke ~ join, data = Rothman, family = binomial) glm_coef(model2)
data(Rothman) mhor(stroke ~ ht / oc, data = Rothman) ## Model with standard interaction term: model1 <- glm(stroke ~ ht * oc, data = Rothman, family = binomial) glm_coef(model1) ## Model considering join exposure: Rothman$join <- 0 Rothman$join[Rothman$oc == "Non-user" & Rothman$ht == "Yes"] <- 1 Rothman$join[Rothman$oc == "User" & Rothman$ht == "No"] <- 2 Rothman$join[Rothman$oc == "User" & Rothman$ht == "Yes"] <- 3 Rothman$join <- factor(Rothman$join, labels = c( "Unexposed", "Hypertension", "OC user", "OC and hypertension" )) require(sjlabelled, quietly = TRUE) Rothman$join <- set_label(Rothman$join, label = "Exposure") model2 <- glm(stroke ~ join, data = Rothman, family = binomial) glm_coef(model2)
round_pval
is an internal function called by glm_coef
to round p-values from model coefficients.
round_pval(pval)
round_pval(pval)
pval |
vector of p-values, numeric. |
A case-control study to investigate the effects of passive smoking on cancer. Passive smoking was defined as exposure to the cigarette smoke of a spouse who smoked at least one cigarette per day for at least 6 months.
Sandler
Sandler
A labelled tibble with 998 rows and 3 variables:
Passive smoker, factor with levels "No" and "Yes".
Diagnosed with cancer, factor with levels "No" and "Yes".
Active smoker, factor with levels "No" and "Yes".
Sandler, DP, Everson, RB, Wilcox, AJ (1985). Passive smoking in adulthood and cancer risk. Amer J Epidem, 121: 37-48.
data(Sandler) mhor(cancer ~ smoke / passive, data = Sandler)
data(Sandler) mhor(cancer ~ smoke / passive, data = Sandler)
Data on measured and self-reported weight from 40–50 year old participants in the 1989/1990 Life In New Zealand Survey.
Sharples
Sharples
A tibble with 343 rows and 4 variables:
Self-reported weight in kg.
Measured weight in kg.
Body mass index calculated from self-reported weight and self-reported height in kg/m^2.
Body mass index calculated from measured weight and measured height in kg/m^2.
Sharples, H, et al. (2012) Agreement between measured and self-reported height, weight and BMI in predominantly European middle-aged New Zealanders: findings from a nationwide 1989 survey. New Zealand Med J 125: 60-69.
Sharples |> bland_altman(srweight ~ weight, transform = TRUE) |> gf_labs(x = "Mean of weights (kg)", y = "Measured weight / Self-reported weight")
Sharples |> bland_altman(srweight ~ weight, transform = TRUE) |> gf_labs(x = "Mean of weights (kg)", y = "Measured weight / Self-reported weight")
ss_jk
is an internal function called by jack_knife
. It calculates the squared
difference of a numerical variable around a given value (for example, the mean).
ss_jk(obs, stat)
ss_jk(obs, stat)
obs |
A numerical vector with no missing values (NA's). |
stat |
The value of the statistic that is used as a reference. |
The squared difference between a variable and a given value.
x <- rnorm(10, 170, 8) x mean(x) ss_jk(x, mean(x)) jack_knife(x)
x <- rnorm(10, 170, 8) x mean(x) ss_jk(x, mean(x)) jack_knife(x)
stats_quotes
is an internal function called by estat
.
stats_quotes(x, data2, digits = 2)
stats_quotes(x, data2, digits = 2)
x |
a numeric variable |
data2 |
A data frame where |
digits |
Number of digits for rounding. |
strip_error
constructs strip plots with error bars showing 95
confidence intervals around mean values.
strip_error( object = NULL, formula = NULL, data = NULL, pch = 20, size = 1, alpha = 0.7, col = "indianred3", ... )
strip_error( object = NULL, formula = NULL, data = NULL, pch = 20, size = 1, alpha = 0.7, col = "indianred3", ... )
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
pch |
|
size |
Size of the symbol ( |
alpha |
Opacity of the symbol (0 = invisible, 1 = opaque). |
col |
A colour or formula used for mapping colour. |
... |
data(energy, package = "ISwR") require(sjlabelled, quietly = TRUE) energy <- energy |> var_labels( expend = "Energy expenditure (MJ/day)", stature = "Stature" ) energy |> strip_error(expend ~ stature, col = "red") t.test(expend ~ stature, data = energy) ## Adding an horizontal line to show significant difference: energy |> strip_error(expend ~ stature, col = "red") |> gf_star(1, 13, 2, 13.2, 13.4, "**") data(birthwt, package = "MASS") require(dplyr, quietly = TRUE) birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> strip_error(bwt ~ smoke | Race, col = "darksalmon") birthwt |> strip_error(bwt ~ smoke, col = ~Race) birthwt |> strip_error(bwt ~ smoke, pch = ~Race, col = ~Race) birthwt |> strip_error(bwt ~ smoke | Race)
data(energy, package = "ISwR") require(sjlabelled, quietly = TRUE) energy <- energy |> var_labels( expend = "Energy expenditure (MJ/day)", stature = "Stature" ) energy |> strip_error(expend ~ stature, col = "red") t.test(expend ~ stature, data = energy) ## Adding an horizontal line to show significant difference: energy |> strip_error(expend ~ stature, col = "red") |> gf_star(1, 13, 2, 13.2, 13.4, "**") data(birthwt, package = "MASS") require(dplyr, quietly = TRUE) birthwt <- birthwt |> mutate( smoke = factor(smoke, labels = c("Non-smoker", "Smoker")), Race = factor(race > 1, labels = c("White", "Non-white")) ) |> var_labels( bwt = "Birth weight (g)", smoke = "Smoking status" ) birthwt |> strip_error(bwt ~ smoke | Race, col = "darksalmon") birthwt |> strip_error(bwt ~ smoke, col = ~Race) birthwt |> strip_error(bwt ~ smoke, pch = ~Race, col = ~Race) birthwt |> strip_error(bwt ~ smoke | Race)
Randomised control trial of an antiepilectic drug (prograbide), in which the number of seizures of 59 patients at baseline and other four follow-up visits were recorded.
Thall
Thall
A tibble with 59 rows and 8 variables:
Subject ID.
Treatment, factor with levels "Control" and "Prograbide".
Number of seizures at baseline.
Age in years at baseline.
Number of seizures at year one follow-up.
Number of seizures at year two follow-up.
Number of seizures at year three follow-up.
Number of seizures at year four follow-up.
Thall, PF and Vail, SC (1990) Some covariance models for longitudinal count data with over-dispersion. Biometrics, 46: 657-671.
Stukel, TA (1993) Comparison of methods for the analysis of longitudinal data. Statistics Med 12: 1339-1351.
Shoukri, MM and Chaudhary, MA (2007) Analysis of correlated data with SAS and R. Third Edition. Chapman & Hall/CRC.
Data of peak knee velocity in walking at flexion and extension in studies about functional performance in cerebral palsy.
Tuzson
Tuzson
A labelled tibble with 18 rows and 2 variables:
Peak knee velocity in gait: flexion (degree/s).
Peak knee velocity in gait: extension (degree/s).
Tuzson, AE, Granata, KP, and Abel, MF (2003) Spastic velocity threshold constrains functional performance in cerebral palsy. Arch Phys Med Rehabil 84: 1363-1368.
data(Tuzson) Tuzson |> gf_point(flexion ~ extension) cor.test(~ flexion + extension, data = Tuzson)
data(Tuzson) Tuzson |> gf_point(flexion ~ extension) cor.test(~ flexion + extension, data = Tuzson)
Data represents women participating in a health survey in Whickham, England in 1972-1974.
Vanderpump
Vanderpump
A labelled tibble with 1314 rows and 3 variables:
Vitality status, factor with levels "Alive" and "Death".
Smoking status, factor with levels "Non-smoker" and "Smoker".
Age group, factor with levels "18-44", "45-64" and "64+".
Vanderpump, MP, et al (1996) Thyroid, 6:155-160.
Appleton, DR, French, JM and Vanderpump, PJ (1996) Ignoring a covariate: An example of Simpson's paradox. The American Statistician 50:340-341.
Vittinghoff, E, Glidden, DV, Shiboski, SC and McCulloh, CE (2005) Regression methods in Biostatistics. Springer.
data(Vanderpump) mhor(vstatus ~ agegrp / smoker, data = Vanderpump)
data(Vanderpump) mhor(vstatus ~ agegrp / smoker, data = Vanderpump)