Package 'distdichoR'

Title: Distributional Method for the Dichotomisation of Continuous Outcomes
Description: Contains a range of functions covering the present development of the distributional method for the dichotomisation of continuous outcomes. The method provides estimates with standard error of a comparison of proportions (difference, odds ratio and risk ratio) derived, with similar precision, from a comparison of means. See the URL below or <arXiv:1809.03279> for more information.
Authors: Odile Sauzet
Maintainer: Odile Sauzet <[email protected]>
License: MIT + file LICENSE
Version: 0.1-1
Built: 2024-12-06 06:42:08 UTC
Source: CRAN

BMI of 1,781 mothers

Description

A dataset containing the Body Mass Index (BMI) of mothers at the beginning of their pregnancy and a variable showing whether it is the first pregnancy (parity = 0) or a subsequent one (parity > 0).

Usage

bmi

Format

A data frame with 1781 rows and 4 variables:

bmi

Body Mass Index (15.04-45.18)

inv_bmi

inverse Body Mass Index (0.022-0.066)

parity

parity of the mother (primi, multi)

group_par

0 = primipara, 1 = multipara
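
The examples later in this manual dichotomise inv_bmi at cp = 0.033 to study obesity (BMI of >30 kg/m^2). A minimal sketch of why that works, using hypothetical toy values rather than the dataset itself: taking the inverse maps the BMI cut point 30 to 1/30 and reverses the ordering, so "BMI above 30" becomes "inverse BMI below 1/30", i.e. the lower tail.

```r
# Cut point for obesity on the BMI scale (kg/m^2)
bmi_cut <- 30
# Equivalent cut point on the inverse BMI scale
inv_cut <- 1 / bmi_cut
round(inv_cut, 3)  # 0.033

# Toy BMI values (hypothetical, not taken from the dataset)
bmi_toy <- c(18.5, 24.0, 29.9, 30.1, 41.0)
inv_toy <- 1 / bmi_toy

# The same mothers are flagged on both scales, but the tail flips
obese_bmi <- bmi_toy > bmi_cut
obese_inv <- inv_toy < inv_cut
identical(obese_bmi, obese_inv)
```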


BMI of 1,560 mothers

Description

A dataset containing the Body Mass Index (BMI) of employed and unemployed mothers.

Usage

bmi2

Format

A data frame with 1560 rows and 4 variables:

bmi

Body Mass Index (15.04-45.18)

inv_bmi

inverse Body Mass Index (0.022-0.066)

employ

employment status (unemployed, employed)

group_emp

1 = employed, 2 = unemployed


Birth weight of 1,458 babies

Description

A dataset containing the smoking status of mothers during pregnancy, the gestational age and the birth weight of their babies.

Usage

bwsmoke

Format

A data frame with 1458 rows and 3 variables:

birthwt

birth weight in grams (1780-4870)

smoke

smoking status of the mother (smoker, non-smoker)

gest

gestational age (37-44.43)


Apgar score of 1755 babies

Description

A dataset containing the Apgar score after 5 minutes, the condition of the babies, the birth weight of the babies, the gestational age and the smoking status of the mothers.

Usage

bwsmokecompl

Format

A data frame with 1755 rows and 7 variables:

apgar5

Apgar score after 5 minutes

babycon

Condition of the baby

birthwt

Birthweight of the babies

gest

gestational age

smoke

smoking status of the mother (0 non-smoker, 1 smoker)

apgar_10

10 minus apgar5, so that the reversed Apgar score can be treated as gamma distributed

smoke2

Smoking status with three categories (0 non-smoker, 1 smoker, 2 no information)

momid

ID-number of the mother
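
As a minimal sketch of what the apgar_10 transformation does, using hypothetical scores rather than the dataset: reversing the score puts the best outcome at 0 and the rare poor outcomes in a right tail, which is the shape a gamma distribution can approximate. The threshold "Apgar below 7" is only an illustrative assumption; it shows that a lower-tail event on the original scale becomes an upper-tail event on the reversed scale.

```r
# Hypothetical 5-minute Apgar scores (not taken from the dataset)
apgar5 <- c(10, 9, 9, 8, 6, 10, 4)
# Reversed score: 0 is the best outcome, larger values are worse
apgar_10 <- 10 - apgar5

# Illustrative threshold: "Apgar below 7" becomes "reversed score above 3"
low_apgar <- apgar5 < 7
low_apgar_rev <- apgar_10 > 3
identical(low_apgar, low_apgar_rev)
```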


normal data

Description

The distributional method for dichotomising normal data allowing for assumptions of unequal variances (based on Sauzet et al. 2014 and Peacock et al. 2012).

Usage

distdicho(x, ...)

## Default S3 method:
distdicho(x, y, cp = 0, tail = c("lower", "upper"),
  R = 1, correction = FALSE, unequal = FALSE, conf.level = 0.95,
  bootci = FALSE, nrep = 2000, ...)

## S3 method for class 'formula'
distdicho(formula, data, exposed, ...)

Arguments

x

A numeric vector of data values.

...

Further arguments to be passed to or from methods.

y

A numeric vector of data values.

cp

A numeric value specifying the cut point under or over which the distributional proportions are computed (see tail).

tail

A character string specifying the tail of the distribution in which the proportions are computed. Must be either 'lower' (default) or 'upper'.

R

A numeric value indicating the true ratio of variances (R = Var(x)/Var(y)). A value of 0 specifies that the true ratio of variances is unknown.

correction

A logical indicating whether to use a correction factor for large effect sizes (>0.7) (valid for difference in proportions only).

unequal

A logical variable indicating if a correction for an unknown variance ratio should be used if no assumption can be made about the variance ratio.

conf.level

Confidence level of the interval.

bootci

A logical variable indicating whether bootstrap bias-corrected confidence intervals are calculated instead of distributional ones.

nrep

A numeric value specifying the number of bootstrap replications (nrep must be higher than the number of observations).

formula

A formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding exposed and unexposed groups.

data

An optional matrix or data frame containing the variables in the formula. By default, the variables are taken from environment(formula).

exposed

A character string specifying the grouping value of the exposed group.
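
The relationship between the two interfaces can be sketched in base R. This is an illustration on a hypothetical toy data frame, not the package's internal code: the level named in exposed plays the role of x and the other level plays the role of y.

```r
# Toy data frame (hypothetical values, not the bwsmoke data)
toy <- data.frame(
  birthwt = c(3100, 2900, 3500, 3650, 2450, 3300),
  smoke   = factor(c("smoker", "smoker", "non-smoker",
                     "non-smoker", "smoker", "non-smoker"))
)
exposed <- "smoker"

# Outcome values of the exposed group (role of x) ...
x <- toy$birthwt[toy$smoke == exposed]
# ... and of the remaining, unexposed group (role of y)
y <- toy$birthwt[toy$smoke != exposed]

length(x)  # 3
length(y)  # 3
```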

Details

distdicho first returns the results of a two-group unpaired t-test (allowing for unequal variances in the unequal-variance cases), followed by the distributional estimates and their standard errors (see Sauzet et al. 2014 and Peacock et al. 2012) for a difference in proportions, risk ratio and odds ratio. It also provides the distributional confidence intervals for the estimated statistics; these assume an asymptotic normal distribution of the estimates and might not be valid for small sample sizes (see Sauzet et al. 2014 for details). Estimates are calculated under either the assumption of equal variances in both groups (default, R = 1) or the assumption of unequal variances (R != 1 and R != 0 for a known variance ratio, R = 0 for the correction for an unknown variance ratio). The data can be given either as two variables, which provide the outcome in each group, or as a formula of the form lhs ~ rhs, where lhs is a numeric variable giving the data values and rhs is a factor with two levels giving the corresponding exposed and unexposed groups. In all cases, it is assumed that there are only two groups.
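
To make the idea of a distributional estimate concrete, here is a minimal base-R sketch for the equal-variance normal case. It is an illustration under stated assumptions (normal outcome, pooled standard deviation, lower tail), not the package's implementation, and it leaves out the standard errors and confidence intervals. The summary figures are the ones used in the distdichoi example of this manual.

```r
# Summary statistics (same figures as the distdichoi example in this manual)
n1 <- 494; m1 <- 3267.4; s1 <- 441.3   # exposed group
n2 <- 983; m2 <- 3452.0; s2 <- 435.9   # unexposed group
cp <- 2500                              # cut point, lower tail

# Pooled standard deviation under the equal-variance assumption (R = 1)
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Proportions below the cut point implied by the normal distribution
p1 <- pnorm((cp - m1) / s_pooled)
p2 <- pnorm((cp - m2) / s_pooled)

# Comparisons of proportions
diff_prop  <- p1 - p2
risk_ratio <- p1 / p2
odds_ratio <- (p1 / (1 - p1)) / (p2 / (1 - p2))

c(p1 = p1, p2 = p2, diff = diff_prop, RR = risk_ratio, OR = odds_ratio)
```

Because the proportions are functions of the group means and the common standard deviation, their precision is inherited from the comparison of means rather than from counts of observations below the cut point.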

Value

A list with class 'distdicho' containing the following components:

data.name

The names of the data.

arguments

A list with the specified arguments.

parameter

The mean, standard error and number of observations for both groups.

prop

The estimated proportions below / above the cut point for both groups.

dist.estimates

The difference in proportions, risk ratio and odds ratio of the groups.

se

The estimated standard error of the difference in proportions, the risk ratio and the odds ratio.

ci

The confidence intervals of the difference in proportions, the risk ratio and the odds ratio.

method

A character string indicating the method used.

ttest

A list containing the results of a t-test.
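
The ci component relies on the asymptotic normality mentioned under Details. As a hedged sketch of what such an interval looks like for a difference in proportions, with a hypothetical estimate and standard error (the scale on which the package builds the intervals for the risk ratio and odds ratio is not stated here and is not assumed):

```r
# Hypothetical estimate and standard error of a difference in proportions
est <- 0.025
se  <- 0.006
conf.level <- 0.95

# Normal quantile for a two-sided interval
z <- qnorm(1 - (1 - conf.level) / 2)

# Symmetric normal-approximation interval around the estimate
ci <- c(lower = est - z * se, upper = est + z * se)
ci
```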

References

Peacock, J.L., Sauzet, O., Ewings, S.M., Kerry, S.M. Dichotomising continuous data while retaining statistical power using a distributional approach. Statist. Med. 2012; 26:3089-3103.

Sauzet, O., Peacock, J.L. Estimating dichotomised outcomes in two groups with unequal variances: a distributional approach. Statist. Med. 2014; 33:4547-4559. DOI: 10.1002/sim.6255.

Peacock, J.L., Bland, J.M., Anderson, H.R. Preterm delivery: effects of socioeconomic factors, psychological stress, smoking, alcohol, and caffeine. BMJ 1995; 311(7004):531-535.

See Also

distdichoi, distdichogen, distdichoigen, regdistdicho

Examples

## Proportions of low birth weight babies among smoking and non-smoking mothers
## (data from Peacock et al. 1995). Returns distributional estimates, standard 
## errors and distributional confidence intervals for differences in proportions,
## RR and OR of babies having a birth weight under 2500g (low birth weight)
## for group smoker (mother smokes) over the odds of LBW in group non-smoker 
## (mother doesn't smoke)
# Formula interface
distdicho(birthwt ~ smoke, cp = 2500, data = bwsmoke, exposed = 'smoker')
# Data stored in two vectors
bw_smoker <- bwsmoke$birthwt[bwsmoke$smoke == 'smoker']
bw_nonsmoker <- bwsmoke$birthwt[bwsmoke$smoke == 'non-smoker']
distdicho(x = bw_smoker, y = bw_nonsmoker, cp = 2500)


## Inverse Body Mass Index (transformation required to have a normal outcome)
## and parity (data from Peacock et al. 1995). Returns distributional estimates,
## standard errors and distributional confidence intervals for differences in 
## proportions, RR and OR of obese mothers (BMI of >30 kg/m^2) for multiparas 
## (group_par=1) over the odds of obesity in group primiparity (group_par=0).
distdicho(inv_bmi ~ group_par, cp = 0.033, data = bmi, exposed = '1')


## Inverse Body Mass Index (BMI) and employment. Returns distributional estimates,
## standard errors and distributional confidence intervals for differences in
## proportions, RR and OR with correction for unknown variance ratio of obese 
## mothers (BMI of >30 kg/m^2) for group_emp = 2 (mother unemployed) over
## the odds of obesity in group_emp = 1 (mother employed)
distdicho(inv_bmi ~ group_emp, cp = 0.033, R = 0, data = bmi2, exposed = '2')


## Inverse Body Mass Index (BMI) and employment. Returns distributional estimates,
## standard errors and distributional confidence intervals for differences in
## proportions, RR and OR computed under the hypothesis that the ratio of variances
## is equal to 1.3 of obese mothers (BMI of >30 kg/m^2) for group_emp = 2
## (mother unemployed) over the odds of obesity in group_emp = 1 (mother employed)
distdicho(inv_bmi ~ group_emp, cp = 0.033, R = 1.3, data = bmi2, exposed = '2')

normal, skew-normal or gamma distributed data

Description

distdichogen first returns the results of a two-group unpaired t-test, followed by the distributional estimates and their standard errors (see Sauzet et al. 2014 and Peacock et al. 2012) for a difference in proportions, risk ratio and odds ratio. It also provides the distributional confidence intervals for the estimated statistics. distdichogen takes normal (dist = 'normal'), skew normal (dist = 'sk_normal') and gamma (dist = 'gamma') distributed data. The data can be given either as two variables, which provide the outcome in each group, or as a formula of the form lhs ~ rhs, where lhs is a numeric variable giving the data values and rhs is a factor with two levels giving the corresponding exposed and unexposed groups. In all cases, it is assumed that there are only two groups.

Usage

distdichogen(x, ...)

## Default S3 method:
distdichogen(x, y, cp = 0, tail = c("lower", "upper"),
  conf.level = 0.95, dist = c("normal", "sk_normal", "gamma"),
  bootci = FALSE, nrep = 2000, ...)

## S3 method for class 'formula'
distdichogen(formula, data, exposed, ...)

Arguments

x

A numeric vector of data values.

...

Further arguments to be passed to or from methods.

y

A numeric vector of data values.

cp

A numeric value specifying the cut point under or over which the distributional proportions are computed (see tail).

tail

A character string specifying the tail of the distribution in which the proportions are computed. Must be either 'lower' (default) or 'upper'.

conf.level

Confidence level of the interval.

dist

A character string specifying the distribution of the data. Must be either 'normal' (default), 'sk_normal' or 'gamma'.

bootci

A logical variable indicating whether bootstrap bias-corrected confidence intervals are calculated instead of distributional ones.

nrep

A numeric value specifying the number of bootstrap replications (nrep must be higher than the number of observations).

formula

A formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding exposed and unexposed groups.

data

An optional matrix or data frame containing the variables in the formula. By default, the variables are taken from environment(formula).

exposed

A character string specifying the grouping value of the exposed group.
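
The roles of bootci and nrep can be illustrated with a small bootstrap sketch on simulated data. Note the assumptions: the data are hypothetical, the outcome is treated as normal with a pooled standard deviation, and the interval shown is a plain percentile interval, which is simpler than the bias-corrected interval the package refers to. The helper dist_diff is hypothetical and not part of the package.

```r
set.seed(1)
# Simulated outcomes for two groups (hypothetical data)
x <- rnorm(80, mean = 3250, sd = 450)   # exposed
y <- rnorm(120, mean = 3450, sd = 450)  # unexposed
cp <- 2500

# Hypothetical helper: distributional difference in proportions below cp,
# assuming normality and a pooled standard deviation
dist_diff <- function(x, y, cp) {
  n1 <- length(x); n2 <- length(y)
  s <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  pnorm((cp - mean(x)) / s) - pnorm((cp - mean(y)) / s)
}

# Resample each group with replacement and recompute the estimate
nrep <- 2000
boot <- replicate(nrep, {
  dist_diff(sample(x, replace = TRUE), sample(y, replace = TRUE), cp)
})

# Plain percentile interval (NOT the bias-corrected interval of the package)
boot_ci <- quantile(boot, c(0.025, 0.975))
boot_ci
```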

Value

A list with class 'distdicho' containing the following components:

data.name

The names of the data.

arguments

A list with the specified arguments.

parameter

The mean, standard error and number of observations for both groups.

prop

The estimated proportions below / above the cut point for both groups.

dist.estimates

The difference in proportions, risk ratio and odds ratio of the groups.

se

The estimated standard error of the difference in proportions, the risk ratio and the odds ratio.

ci

The confidence intervals of the difference in proportions, the risk ratio and the odds ratio.

method

A character string indicating the method used.

ttest

A list containing the results of a t-test.

References

Peacock, J.L., Sauzet, O., Ewings, S.M., Kerry, S.M. Dichotomising continuous data while retaining statistical power using a distributional approach. Statist. Med. 2012; 26:3089-3103.

Sauzet, O., Peacock, J.L. Estimating dichotomised outcomes in two groups with unequal variances: a distributional approach. Statist. Med. 2014; 33:4547-4559. DOI: 10.1002/sim.6255.

Sauzet, O., Ofuya, M., Peacock, J.L. Dichotomisation using a distributional approach when the outcome is skewed. BMC Medical Research Methodology 2015; 15:40. DOI: 10.1186/s12874-015-0028-8.

Peacock, J.L., Bland, J.M., Anderson, H.R. Preterm delivery: effects of socioeconomic factors, psychological stress, smoking, alcohol, and caffeine. BMJ 1995; 311(7004):531-535.

See Also

distdicho, distdichoi, distdichoigen, regdistdicho

Examples

## Proportions of low birth weight babies among smoking and non-smoking mothers
## (data from Peacock et al. 1995). Returns distributional estimates, standard 
## errors and distributional confidence intervals for differences in proportions,
## RR and OR of babies having a birth weight under 2500g (low birth weight)
## for group smoker (mother smokes) over the odds of LBW in group non-smoker 
## (mother doesn't smoke)
# Formula interface
distdichogen(birthwt ~ smoke, cp = 2500, data = bwsmoke, exposed = 'smoker',
             dist = 'sk_normal')
# Data stored in two vectors
bw_smoker <- bwsmoke$birthwt[bwsmoke$smoke == 'smoker']
bw_nonsmoker <- bwsmoke$birthwt[bwsmoke$smoke == 'non-smoker']
distdichogen(x = bw_smoker, y = bw_nonsmoker, 
              cp = 2500, tail = 'lower', dist = 'sk_normal')


## Body Mass Index (BMI) and parity. Returns distributional estimates, standard
## errors and distributional confidence intervals for difference in proportions,
## RR and OR of obese mothers (BMI of >30kg/m^2) for group_par=1 (multiparity) 
## over the odds of obesity in group_par=0 (primiparity)
distdichogen(bmi ~ group_par, cp = 30, data = bmi, exposed = '1',
             tail = 'upper', dist = 'sk_normal')

normal data (immediate form, allowing unequal variances)

Description

Immediate form of the distributional method for dichotomising normal data allowing for assumptions of unequal variances (based on Sauzet et al. 2014 and Peacock et al. 2012).

Usage

distdichoi(n1, m1, s1, n2, m2, s2, cp = 0, tail = c("lower", "upper"),
  R = 1, conf.level = 0.95)

Arguments

n1

A number specifying the number of observations in the exposed group.

m1

A number specifying the mean of the exposed group.

s1

A number specifying the standard deviation of the exposed group.

n2

A number specifying the number of observations in the unexposed (reference) group.

m2

A number specifying the mean of the unexposed (reference) group.

s2

A number specifying the standard deviation of the unexposed (reference) group.

cp

A numeric value specifying the cut point under or over which the distributional proportions are computed.

tail

A character string specifying the tail of the distribution in which the proportions are computed. Must be either 'lower' (default) or 'upper'.

R

A numeric value indicating the true ratio of variances (R = Var(group1)/Var(group2)). A value of 0 specifies that the true ratio of variances is unknown.

conf.level

Confidence level of the interval.
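
R is defined as the true variance ratio, which is rarely known exactly. As a rough guide when deciding between R = 1, a fixed value, or R = 0, one might inspect the sample ratio in the same orientation (group 1 over group 2). The sketch below uses simulated, hypothetical data and only base R.

```r
set.seed(42)
# Simulated outcomes (hypothetical data)
group1 <- rnorm(200, mean = 0.040, sd = 0.0060)  # exposed
group2 <- rnorm(300, mean = 0.042, sd = 0.0050)  # unexposed

# Sample variance ratio, in the orientation used for R
R_hat <- var(group1) / var(group2)
R_hat

# The same quantity from summary standard deviations, as used by distdichoi
s1 <- sd(group1); s2 <- sd(group2)
all.equal(R_hat, s1^2 / s2^2)
```

The sample ratio is itself an estimate, so using it as if it were the true R understates the uncertainty; that is the situation the R = 0 option is described as addressing.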

Details

distdichoi takes no data, but the number of observations as well as the mean and standard deviation of both groups. It first returns the results of a two-group unpaired t-test (allowing for unequal variances in the unequal-variance cases), followed by the distributional estimates and their standard errors (see Sauzet et al. 2014 and Peacock et al. 2012) for a difference in proportions, risk ratio and odds ratio. It also provides the distributional confidence intervals for the estimated statistics; these assume an asymptotic normal distribution of the estimates and might not be valid for small sample sizes (see Sauzet et al. 2014 for details). Estimates are calculated under either the assumption of equal variances in both groups (default, R = 1) or the assumption of unequal variances (R != 1 and R != 0 for a known variance ratio, R = 0 for the correction for an unknown variance ratio).
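
A two-group unpaired t-test can be computed from summary statistics alone, which is what makes an immediate form possible. The following base-R sketch shows the standard pooled and Welch versions; it is not taken from the package source, and the figures are those of the first example in this section.

```r
# Summary statistics (figures from the example in this section)
n1 <- 494; m1 <- 3267.4; s1 <- 441.3
n2 <- 983; m2 <- 3452.0; s2 <- 435.9

# Equal-variance (pooled) t statistic
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
t_pooled <- (m1 - m2) / (s_pooled * sqrt(1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_pooled <- 2 * pt(-abs(t_pooled), df_pooled)

# Welch t statistic with Satterthwaite degrees of freedom (unequal variances)
v1 <- s1^2 / n1; v2 <- s2^2 / n2
t_welch <- (m1 - m2) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_welch <- 2 * pt(-abs(t_welch), df_welch)

c(t_pooled = t_pooled, df_pooled = df_pooled, p_pooled = p_pooled)
c(t_welch = t_welch, df_welch = df_welch, p_welch = p_welch)
```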

Value

A list with class 'distdicho' containing the following components:

data.name

The names of the data.

arguments

A list with the specified arguments.

parameter

The mean, standard error and number of observations for both groups.

prop

The estimated proportions below / above the cut point for both groups.

dist.estimates

The difference in proportions, risk ratio and odds ratio of the groups.

se

The estimated standard error of the difference in proportions, the risk ratio and the odds ratio.

ci

The confidence intervals of the difference in proportions, the risk ratio and the odds ratio.

method

A character string indicating the method used.

ttest

A list containing the results of a t-test.

References

Peacock, J.L., Sauzet, O., Ewings, S.M., Kerry, S.M. Dichotomising continuous data while retaining statistical power using a distributional approach. Statist. Med. 2012; 26:3089-3103.

Sauzet, O., Peacock, J.L. Estimating dichotomised outcomes in two groups with unequal variances: a distributional approach. Statist. Med. 2014; 33:4547-4559. DOI: 10.1002/sim.6255.

Peacock, J.L., Bland, J.M., Anderson, H.R. Preterm delivery: effects of socioeconomic factors, psychological stress, smoking, alcohol, and caffeine. BMJ 1995; 311(7004):531-535.

See Also

distdicho, distdichogen, distdichoigen, regdistdicho

Examples

# Immediate form of distdicho
distdichoi(n1 = 494, m1 = 3267.4, s1 = 441.3,
           n2 = 983, m2 = 3452, s2 = 435.9,
           cp = 2500, tail = 'upper')

## Proportions of low birth weight babies among smoking and non-smoking mothers
## (data from Peacock et al. 1995). Returns distributional estimates, standard 
## errors and distributional confidence intervals for differences in proportions,
## RR and OR of babies having a birth weight under 2500g (low birth weight LBW)
## for group smoker (mother smokes) over the odds of LBW in group non-smoker 
## (mother doesn't smoke)
# distdicho and distdichoi return the same results
bw_smoker <- bwsmoke$birthwt[bwsmoke$smoke == 'smoker']
bw_nonsmoker <- bwsmoke$birthwt[bwsmoke$smoke == 'non-smoker']
distdicho(x = bw_smoker, y = bw_nonsmoker, cp = 2500)
distdichoi(n1 = length(bw_smoker[!is.na(bw_smoker)]), 
           m1 = mean(bw_smoker, na.rm = TRUE), 
           s1 = sd(bw_smoker, na.rm = TRUE),
           n2 = length(bw_nonsmoker[!is.na(bw_nonsmoker)]), 
           m2 = mean(bw_nonsmoker, na.rm = TRUE), 
           s2 = sd(bw_nonsmoker, na.rm = TRUE), 
           cp = 2500)

normal, skew-normal or gamma distributed data (immediate form)

Description

Immediate form of the distributional method for dichotomising normal, skew normal or gamma distributed data (based on Sauzet et al. 2015).

Usage

distdichoigen(n1, m1, s1, n2, m2, s2, alpha = 1, cp = 0, tail = c("lower",
  "upper"), conf.level = 0.95, dist = c("normal", "sk_normal", "gamma"))

Arguments

n1

A number specifying the number of observations in the exposed group.

m1

A number specifying the mean of the exposed group.

s1

A number specifying the standard deviation of the exposed group.

n2

A number specifying the number of observations in the unexposed (reference) group.

m2

A number specifying the mean of the unexposed (reference) group.

s2

A number specifying the standard deviation of the unexposed (reference) group.

alpha

A numeric value specifying a further parameter of the skew normal / gamma distribution (see Details).

cp

A numeric value specifying the cut point under or over which the distributional proportions are computed (see tail).

tail

A character string specifying the tail of the distribution in which the proportions are computed, must be either 'lower' (default) or 'upper'.

conf.level

Confidence level of the interval.

dist

A character string specifying the distribution, must be either 'normal' (default), 'sk_normal' or 'gamma'.

Details

distdichoigen takes no data, but the number of observations as well as the mean and standard deviation of both groups. It first returns the results of a two-group unpaired t-test, followed by the distributional estimates and their standard errors (see Sauzet et al. 2014 and Peacock et al. 2012) for a difference in proportions, risk ratio and odds ratio. It also provides the distributional confidence intervals for the estimated statistics. If a skew normal (dist = 'sk_normal') or gamma (dist = 'gamma') distribution is assumed, a third parameter alpha needs to be specified. For dist = 'sk_normal', alpha is the parameter described in psn. For dist = 'gamma', alpha is the shape parameter as described in pgamma.
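
For the gamma case, the role of alpha as the shape argument of pgamma can be sketched as follows. The values are hypothetical, and the way the rate is derived here (from the shape and the group mean, since a gamma mean equals shape / rate) is an assumption made for illustration; how the package combines the mean, standard deviation and alpha is not described in this manual.

```r
# Hypothetical gamma-distributed outcome, e.g. a reversed score
alpha <- 2            # shape, as in pgamma
m1 <- 1.4; m2 <- 1.0  # group means (hypothetical)
cp <- 3               # cut point

# With shape alpha and mean m, the rate is alpha / m (mean = shape / rate)
# lower.tail = FALSE gives the proportion above the cut point (upper tail)
p1 <- pgamma(cp, shape = alpha, rate = alpha / m1, lower.tail = FALSE)
p2 <- pgamma(cp, shape = alpha, rate = alpha / m2, lower.tail = FALSE)

c(p1 = p1, p2 = p2, diff = p1 - p2, RR = p1 / p2)
```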

Value

A list with class 'distdicho' containing the following components:

data.name

The names of the data.

arguments

A list with the specified arguments.

parameter

The mean, standard error and number of observations for both groups.

prop

The estimated proportions below / above the cut point for both groups.

dist.estimates

The difference in proportions, risk ratio and odds ratio of the groups.

se

The estimated standard error of the difference in proportions, the risk ratio and the odds ratio.

ci

The confidence intervals of the difference in proportions, the risk ratio and the odds ratio.

method

A character string indicating the method used.

ttest

A list containing the results of a t-test.

References

Peacock, J.L., Sauzet, O., Ewings, S.M., Kerry, S.M. Dichotomising continuous data while retaining statistical power using a distributional approach. Statist. Med. 2012; 26:3089-3103.

Sauzet, O., Peacock, J.L. Estimating dichotomised outcomes in two groups with unequal variances: a distributional approach. Statist. Med. 2014; 33:4547-4559. DOI: 10.1002/sim.6255.

Sauzet, O., Ofuya, M., Peacock, J.L. Dichotomisation using a distributional approach when the outcome is skewed. BMC Medical Research Methodology 2015; 15:40. DOI: 10.1186/s12874-015-0028-8.

Peacock, J.L., Bland, J.M., Anderson, H.R. Preterm delivery: effects of socioeconomic factors, psychological stress, smoking, alcohol, and caffeine. BMJ 1995; 311(7004):531-535.

See Also

distdicho, distdichoi, distdichogen, regdistdicho

Examples

# Immediate form of distdichogen with a skew normal outcome
distdichoigen(n1 = 75, m1 = 3250, s1 = 450, n2 = 110, m2 = 2950, s2 = 475,
               cp = 2500, tail = 'lower', alpha = -2.3, dist = 'sk_normal')

normal, skew-normal or gamma distributed data (via linear regression)

Description

Provides adjusted distributional estimates for the comparison of proportions for a dichotomised dependent continuous variable derived from a linear regression of the continuous outcome on the grouping variable and other covariates as described in Sauzet et al. 2015.

Usage

regdistdicho(mod, group_var, cp = 0, tail = c("lower", "upper"),
  conf.level = 0.95, dist = c("normal", "sk_normal", "gamma"), alpha = 1)

Arguments

mod

A linear model of the form lm(lhs ~ rhs) where lhs is a numeric variable giving the data values and rhs is the grouping variable and other covariates.

group_var

A character string specifying the name of the grouping variable.

cp

A numeric value specifying the cut point under or over which the distributional proportions are computed (see tail).

tail

A character string specifying the tail of the distribution in which the proportions are computed, must be either 'lower' (default) or 'upper'.

conf.level

Confidence level of the interval.

dist

A character string specifying the distribution of the error variable in the linear regression, must be either 'normal' (default), 'sk_normal' or 'gamma'.

alpha

A numeric value specifying a further parameter of the skew normal / gamma distribution.

Details

regdistdicho returns the distributional estimates and their standard errors (see Sauzet et al. 2014 and Peacock et al. 2012) for a difference in proportions, risk ratio and odds ratio. It also provides the distributional confidence intervals for the statistics estimated. The estimation is based on the marginal means of a linear regression of the outcome on the grouping variable and other covariates.
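
The idea of basing the comparison on marginal means can be sketched with base R. The example below uses simulated, hypothetical data, normal errors and the residual standard deviation of the model, and it evaluates both groups at the mean of the covariate; whether the package marginalises in exactly this way is an assumption, and standard errors are omitted.

```r
set.seed(7)
# Simulated data (hypothetical, loosely shaped like bwsmoke)
n <- 400
smoke <- factor(sample(c("non-smoker", "smoker"), n, replace = TRUE))
gest <- runif(n, 37, 44)
birthwt <- 3400 - 180 * (smoke == "smoker") + 120 * (gest - 40) +
  rnorm(n, sd = 420)
dat <- data.frame(birthwt, smoke, gest)

mod <- lm(birthwt ~ smoke + gest, data = dat)

# Adjusted (marginal) means: predict for each group at the mean covariate value
newdat <- data.frame(smoke = factor(c("smoker", "non-smoker"),
                                    levels = levels(dat$smoke)),
                     gest = mean(dat$gest))
marg <- predict(mod, newdata = newdat)

# Proportions below the cut point, using the residual standard deviation
cp <- 2500
s <- summary(mod)$sigma
p <- pnorm((cp - marg) / s)
names(p) <- c("smoker", "non-smoker")
c(p, diff = unname(p["smoker"] - p["non-smoker"]))
```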

Value

A list with class 'distdicho' containing the following components:

data.name

The names of the data.

arguments

A list with the specified arguments.

parameter

The marginal mean, standard error and number of observations for both groups.

prop

The estimated proportions below / above the cut point for both groups.

dist.estimates

The difference in proportions, risk ratio and odds ratio of the groups.

se

The estimated standard error of the difference in proportions, the risk ratio and the odds ratio.

ci

The confidence intervals of the difference in proportions, the risk ratio and the odds ratio.

method

A character string indicating the method used.

References

Peacock, J.L., Sauzet, O., Ewings, S.M., Kerry, S.M. Dichotomising continuous data while retaining statistical power using a distributional approach. Statist. Med. 2012; 26:3089-3103.

Sauzet, O., Peacock, J.L. Estimating dichotomised outcomes in two groups with unequal variances: a distributional approach. Statist. Med. 2014; 33:4547-4559. DOI: 10.1002/sim.6255.

Sauzet, O., Brekenkamp, J., Brenne, S., Borde, T., David, M., Razum, O., Peacock, J.L. A distributional approach to obtain adjusted differences in population at risk with a comparison with other regressions methods using perinatal data. 2015. In preparation.

Peacock, J.L., Bland, J.M., Anderson, H.R. Preterm delivery: effects of socioeconomic factors, psychological stress, smoking, alcohol, and caffeine. BMJ 1995; 311(7004):531-535.

See Also

distdicho, distdichoi, distdichogen, distdichoigen

Examples

## Proportions of low birth weight babies among smoking and non-smoking mothers
## (data from Peacock et al. 1995)
mod_smoke <- lm(birthwt ~ smoke + gest, data = bwsmoke)
regdistdicho(mod = mod_smoke, group_var = 'smoke', cp = 2500, tail = 'lower')