Title: | Analysis of Overdispersed Data using S3 Methods |
---|---|
Description: | Provides functions to analyse overdispersed counts or proportions. These functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM). aods3 is an S3 re-implementation of the deprecated S4 package aod. |
Authors: | Matthieu Lesnoff [aut], Renaud Lancelot [aut], Aurélie Siberchicot [ctb, cre] |
Maintainer: | Aurélie Siberchicot <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5 |
Built: | 2024-11-07 13:38:22 UTC |
Source: | CRAN |
Hypothetical drug trial to compare the effect of four antibiotics against Shipping fever in calves (Shoukri and Pause, 1999, Table 3.11).
data(antibio)
data(antibio)
A data frame with 24 observations on the following 3 variables.
A factor with levels 1
, 2
, 3
and 4
A numeric vector: the number of treated animals within a two-week period.
A numeric vector: the number of deaths at the end of the two weeks.
Shoukri, M.M., Pause, C.A., 1999, 2nd ed. Statistical methods for health sciences. CRC Press, London.
The function fits a beta-binomial (BB) or a negative binomial (NB) generalized linear model from clustered data.
For the BB model, data have the form {}, where
is the size of cluster
,
the number of “successes”, and
the number of clusters. The response is the proportion
For the NB model, data can be of two types. When modeling simple counts, data have the form {}, where
is the number of occurences of the event under study. When modeling rates (e.g. hazard rates), data have the same form as for the BB model, where
is the denominator of the rate for cluster
(considered as an “offset”, i.e. a constant known value) and
the number of occurences of the event. For both types of data, the response is the count
.
aodml(formula, data, family = c("bb", "nb"), link = c("logit", "cloglog", "probit"), phi.formula = ~ 1, phi.scale = c("identity", "log", "inverse"), phi.start = NULL, fixpar = list(), hessian = TRUE, method = c("BFGS", "Nelder-Mead", "CG", "SANN"), control = list(maxit = 3000, trace = 0), ...) ## S3 method for class 'aodml' print(x, ...) ## S3 method for class 'aodml' summary(object, ...) ## S3 method for class 'aodml' anova(object, ...) ## S3 method for class 'anova.aodml' print(x, digits, ...) ## S3 method for class 'aodml' fitted(object, ..., what = c("mu", "nu", "eta", "phi")) ## S3 method for class 'aodml' residuals(object, ..., type = c("deviance", "pearson", "response")) ## S3 method for class 'aodml' coef(object, ...) ## S3 method for class 'aodml' df.residual(object, ...) ## S3 method for class 'aodml' logLik(object, ...) ## S3 method for class 'aodml' deviance(object, ...) ## S3 method for class 'aodml' AIC(object, ..., k = 2) ## S3 method for class 'aodml' vcov(object, ...) ## S3 method for class 'aodml' predict(object, ..., type = c("link", "response"), se.fit = FALSE, newdata = NULL)
aodml(formula, data, family = c("bb", "nb"), link = c("logit", "cloglog", "probit"), phi.formula = ~ 1, phi.scale = c("identity", "log", "inverse"), phi.start = NULL, fixpar = list(), hessian = TRUE, method = c("BFGS", "Nelder-Mead", "CG", "SANN"), control = list(maxit = 3000, trace = 0), ...) ## S3 method for class 'aodml' print(x, ...) ## S3 method for class 'aodml' summary(object, ...) ## S3 method for class 'aodml' anova(object, ...) ## S3 method for class 'anova.aodml' print(x, digits, ...) ## S3 method for class 'aodml' fitted(object, ..., what = c("mu", "nu", "eta", "phi")) ## S3 method for class 'aodml' residuals(object, ..., type = c("deviance", "pearson", "response")) ## S3 method for class 'aodml' coef(object, ...) ## S3 method for class 'aodml' df.residual(object, ...) ## S3 method for class 'aodml' logLik(object, ...) ## S3 method for class 'aodml' deviance(object, ...) ## S3 method for class 'aodml' AIC(object, ..., k = 2) ## S3 method for class 'aodml' vcov(object, ...) ## S3 method for class 'aodml' predict(object, ..., type = c("link", "response"), se.fit = FALSE, newdata = NULL)
formula |
A formula for the mean For the BB model, the left-hand side of the formula must be of the form
where the fitted proportion is For the NB model, the left-hand side of the formula must be of the form
where the fitted count is |
data |
A data frame containing the response ( |
family |
Define the model which is fitted: “bb” for the BB model and “nb” for the NB model. |
link |
For the BB model only. Define the link function |
phi.formula |
A right-hand side formula to model optional heterogeneity for the over-dispersion parameter Default to |
phi.scale |
Scale on which |
phi.start |
Optional starting values for |
fixpar |
An optional list of 2 vectors of same length ( The first vector indicates which parameters are set as constant (i.e., not optimized) in the global parameter vector The second vector indicates the corresponding values. For instance,
means that the 4th and 5th components of vector |
hessian |
A logical (default to |
method |
Define the method used for the optimization (see |
control |
A list to control the optimization parameters. See |
object |
An object of class |
x |
An object of class |
digits |
Number of digits to print in print.summary.aodml and print.anova.aodml.
Default to |
... |
Further arguments passed to |
what |
For function |
type |
For function |
k |
Numeric scalar for the penalty parameter used to compute the information criterion. The default value ( |
se.fit |
Logical scalar indicating whether standard errors should be computed for the predicted values. Default to |
newdata |
A |
Beta-binomial model (BB)
For a given cluster , the model is
where follows a Beta distribution
. Noting
the beta function, the marginal (beta-binomial) distribution of
is
Using the re-parameterization and
, the marginal mean and variance are
The response in aodml
is . The mean is
, defined such as
, where
is the link function,
is a design-matrix,
a vector of fixed effects and
is the corresponding linear predictor. The variance is
.
Negative binomial model (NB)
—— Simple counts (model with no offset)
For a given cluster , the model is
where follows a Gamma distribution of mean
and shape
(
). Noting
the gamma function, the marginal (negative binomial) distribution of
is
Using the re-parameterization , the marginal mean and variance are
The response in aodml
is . The mean is
, defined such as
. The variance is
.
—— Rates (model with offset)
For a given cluster , the model is
The marginal (negative binomial) distribution is the same as for the case with no offset (
). The response in
aodml
is . The mean is
, defined such as
, where
is the offset. The variance is
.
Other details
Argument phi.scale
of function aodml
enables to estimate the over-dispersion parameter under different scales.
If phi.scale = "identity"
(Default), the function estimates .
If phi.scale = "log"
, the function estimates .
If phi.scale = "inverse"
, the function estimates .
The full parameter vector returned by aodml
, say param
, is equal to . This vector is estimated by maximizing the log-likelihood of the marginal model using function
optim
. The estimated variances-covariances matrix of param
is calculated by the inverse of the observed hessian matrix returned by optim
, and is referred to as varparam
.
An object of class aodml
, printed and summarized by various functions. Function deviance.aodml
returns the value -2 * (logL - logL_max)
. The “deviance” used in function AIC.aodml
to calculate AIC and AICc is -2 * logL
.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application
to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15(3): 209-225.
McCullagh, P., Nelder, J. A., 1989, 2nd ed. Generalized linear models. New York, USA: Chapman and Hall.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of
correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
#------ Beta-binomial model data(orob2) fm1 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") names(fm1) # summaries fm1 summary(fm1) coef(fm1) vcov(fm1) logLik(fm1) deviance(fm1) AIC(fm1) gof(fm1) # predictions cbind( fitted(fm1), fitted(fm1, what = "nu"), fitted(fm1, what = "eta"), fitted(fm1, what = "phi") ) predict(fm1, type = "response", se.fit = TRUE) newdat <- data.frame(seed = c("O73", "O75")) predict(fm1, type = "response", se.fit = TRUE, newdata = newdat) # model with heterogeneity in phi fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.formula = ~ seed) summary(fm) AIC(fm1, fm) # various phi scales fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") fm$phi fm$phi.scale fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.scale = "log") fm$phi fm$phi.scale fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.scale = "inverse") fm$phi fm$phi.scale ### Models with coefficient(s) set as constant # model with 1 phi coefficient, set as constant "0.02" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", fixpar = list(5, 0.02)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", phi.formula = ~ seed, fixpar = list(5, 1e-8)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0", # and the mu intercept (1rst coef of vector b) set as as constant "-0.5" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", phi.formula = ~ seed, fixpar = list(c(1, 5), c(-0.5, 1e-8))) fm$param fm$varparam ### Model tests # LR tests - non-constant phi fm0 <- aodml(cbind(m, n - m) ~ 1, data = orob2, family = "bb") fm2 <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb") fm3 <- aodml(cbind(m, n - m) ~ seed * root, data = orob2, family = "bb") anova(fm0, fm1, fm2, fm3) # LR tests - constant phi # phi is assumed to be estimated from fm3 fm2.bis <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb", fixpar = list(4, fm3$phi)) LRstat <- 2 * (logLik(fm3) - logLik(fm2.bis)) pchisq(LRstat, df = 1, lower.tail = FALSE) # Wald test of the seed factor in fm1 wald.test(b = coef(fm3), varb = vcov(fm3), Terms = 4) #------ Negative binomial model ### Modelling counts data(salmonella) fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb") ## fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", ## method = "Nelder-Mead") # summaries fm1 summary(fm1) coef(fm1) vcov(fm1) logLik(fm1) deviance(fm1) AIC(fm1) gof(fm1) # predictions cbind( fitted(fm1), fitted(fm1, what = "nu"), fitted(fm1, what = "eta"), fitted(fm1, what = "phi") ) predict(fm1, type = "response", se.fit = TRUE) newdat <- data.frame(dose = c(20, 40)) predict(fm1, type = "response", se.fit = TRUE, newdata = newdat) # various phi scales fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb") fm$phi fm$phi.scale fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", phi.scale = "log") fm$phi fm$phi.scale fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", phi.scale = "inverse") fm$phi fm$phi.scale # LR and Wald tests of the "log(dose + 10) + dose" factors fm0 <- aodml(m ~ 1, data = salmonella, family = "nb") anova(fm0, fm1) fm0.bis <- aodml(m ~ 1, data = salmonella, family = "nb", fixpar = list(2, fm1$phi)) LRstat <- 2 * (logLik(fm1) - logLik(fm0.bis)) pchisq(LRstat, df = 2, lower.tail = FALSE) wald.test(b = coef(fm1), varb = vcov(fm1), Terms = 2:3) ### Modelling a rate data(dja) # rate "m / trisk" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb") summary(fm) fm <- aodml(formula = m ~ group + offset(log(trisk)), phi.formula = ~ group, data = dja, family = "nb", phi.scale = "log") summary(fm) # model with 1 phi coefficient, set as constant "0.8" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~1, fixpar = list(3, 0.8)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0" in the identity scale fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~ group, phi.scale = "log", fixpar = list(4, -15)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant "0" in the identity scale, # and the mu intercept coefficient (1rst coef of vector b) set as as constant "-0.5" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~ group, phi.scale = "log", fixpar = list(c(1, 4), c(-0.5, -15))) fm$param fm$varparam
#------ Beta-binomial model data(orob2) fm1 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") names(fm1) # summaries fm1 summary(fm1) coef(fm1) vcov(fm1) logLik(fm1) deviance(fm1) AIC(fm1) gof(fm1) # predictions cbind( fitted(fm1), fitted(fm1, what = "nu"), fitted(fm1, what = "eta"), fitted(fm1, what = "phi") ) predict(fm1, type = "response", se.fit = TRUE) newdat <- data.frame(seed = c("O73", "O75")) predict(fm1, type = "response", se.fit = TRUE, newdata = newdat) # model with heterogeneity in phi fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.formula = ~ seed) summary(fm) AIC(fm1, fm) # various phi scales fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") fm$phi fm$phi.scale fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.scale = "log") fm$phi fm$phi.scale fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb", phi.scale = "inverse") fm$phi fm$phi.scale ### Models with coefficient(s) set as constant # model with 1 phi coefficient, set as constant "0.02" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", fixpar = list(5, 0.02)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", phi.formula = ~ seed, fixpar = list(5, 1e-8)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0", # and the mu intercept (1rst coef of vector b) set as as constant "-0.5" fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2, family = "bb", phi.formula = ~ seed, fixpar = list(c(1, 5), c(-0.5, 1e-8))) fm$param fm$varparam ### Model tests # LR tests - non-constant phi fm0 <- aodml(cbind(m, n - m) ~ 1, data = orob2, family = "bb") fm2 <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb") fm3 <- aodml(cbind(m, n - m) ~ seed * root, data = orob2, family = "bb") anova(fm0, fm1, fm2, fm3) # LR tests - constant phi # phi is assumed to be estimated from fm3 fm2.bis <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb", fixpar = list(4, fm3$phi)) LRstat <- 2 * (logLik(fm3) - logLik(fm2.bis)) pchisq(LRstat, df = 1, lower.tail = FALSE) # Wald test of the seed factor in fm1 wald.test(b = coef(fm3), varb = vcov(fm3), Terms = 4) #------ Negative binomial model ### Modelling counts data(salmonella) fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb") ## fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", ## method = "Nelder-Mead") # summaries fm1 summary(fm1) coef(fm1) vcov(fm1) logLik(fm1) deviance(fm1) AIC(fm1) gof(fm1) # predictions cbind( fitted(fm1), fitted(fm1, what = "nu"), fitted(fm1, what = "eta"), fitted(fm1, what = "phi") ) predict(fm1, type = "response", se.fit = TRUE) newdat <- data.frame(dose = c(20, 40)) predict(fm1, type = "response", se.fit = TRUE, newdata = newdat) # various phi scales fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb") fm$phi fm$phi.scale fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", phi.scale = "log") fm$phi fm$phi.scale fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb", phi.scale = "inverse") fm$phi fm$phi.scale # LR and Wald tests of the "log(dose + 10) + dose" factors fm0 <- aodml(m ~ 1, data = salmonella, family = "nb") anova(fm0, fm1) fm0.bis <- aodml(m ~ 1, data = salmonella, family = "nb", fixpar = list(2, fm1$phi)) LRstat <- 2 * (logLik(fm1) - logLik(fm0.bis)) pchisq(LRstat, df = 2, lower.tail = FALSE) wald.test(b = coef(fm1), varb = vcov(fm1), Terms = 2:3) ### Modelling a rate data(dja) # rate "m / trisk" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb") summary(fm) fm <- aodml(formula = m ~ group + offset(log(trisk)), phi.formula = ~ group, data = dja, family = "nb", phi.scale = "log") summary(fm) # model with 1 phi coefficient, set as constant "0.8" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~1, fixpar = list(3, 0.8)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant ~ "0" in the identity scale fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~ group, phi.scale = "log", fixpar = list(4, -15)) fm$param fm$varparam # model with 2 phi coefficients, with the first set as constant "0" in the identity scale, # and the mu intercept coefficient (1rst coef of vector b) set as as constant "-0.5" fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja, family = "nb", phi.formula = ~ group, phi.scale = "log", fixpar = list(c(1, 4), c(-0.5, -15))) fm$param fm$varparam
From clustered data, the function fits generalized linear models containing an over-dispersion parameter using quasi-likelihood estimating equations for the mean
and a moment estimator for
.
For binomial-type models, data have the form {}, where
is the size of cluster
,
the number of “successes”, and
the number of clusters. The response is the proportion
.
For Poisson-type models, data can be of two forms. When modeling “simple counts”, data have the form {}, where
is the number of occurences of the event under study. When modeling rates (e.g. hazard rates), data have the same form as for the BB model, where
is the denominator of the rate for cluster
(considered as an “offset”, i.e. a constant known value) and
the number of occurences of the event. For both forms of data, the response is the count
.
aodql(formula, data, family = c("qbin", "qpois"), link = c("logit", "cloglog", "probit"), method = c("chisq", "dev"), phi = NULL, tol = 1e-5, ...) ## S3 method for class 'aodql' anova(object, ...) ## S3 method for class 'aodql' coef(object, ...) ## S3 method for class 'aodql' deviance(object, ...) ## S3 method for class 'aodql' df.residual(object, ...) ## S3 method for class 'aodql' fitted(object, ...) ## S3 method for class 'aodql' logLik(object, ...) ## S3 method for class 'aodql' predict(object, ...) ## S3 method for class 'aodql' print(x, ...) ## S3 method for class 'aodql' residuals(object, ...) ## S3 method for class 'aodql' summary(object, ...) ## S3 method for class 'aodql' vcov(object, ...)
aodql(formula, data, family = c("qbin", "qpois"), link = c("logit", "cloglog", "probit"), method = c("chisq", "dev"), phi = NULL, tol = 1e-5, ...) ## S3 method for class 'aodql' anova(object, ...) ## S3 method for class 'aodql' coef(object, ...) ## S3 method for class 'aodql' deviance(object, ...) ## S3 method for class 'aodql' df.residual(object, ...) ## S3 method for class 'aodql' fitted(object, ...) ## S3 method for class 'aodql' logLik(object, ...) ## S3 method for class 'aodql' predict(object, ...) ## S3 method for class 'aodql' print(x, ...) ## S3 method for class 'aodql' residuals(object, ...) ## S3 method for class 'aodql' summary(object, ...) ## S3 method for class 'aodql' vcov(object, ...)
formula |
A formula for the mean |
data |
A data frame containing the response ( |
family |
Define the model which is fitted: “qbin” for binomial-type models and “qpois” for Poisson-type models. |
link |
For binomial-type models only. Define the link function |
method |
For function |
phi |
An optional value defining the over-dispersion parameter |
tol |
A positive scalar (default to 0.001). The algorithm stops at iteration |
... |
Further arguments to passed to the appropriate functions. |
object |
An object of class “aodql”. |
x |
An object of class “aodql”. |
Binomial-type models
For a given cluster , the model is
where follows a random variable of mean
and variance
. The marginal mean and variance of
are
The response in aodql
is . The mean is
, defined such as
, where
is the link function,
is a design-matrix,
a vector of fixed effects and
is the corresponding linear predictor. The variance is
.
Poisson-type models
—— Simple counts (model with no offset)
For a given cluster , the model is
where follows a random distribution of mean
and variance
. The mean and variance of the marginal distribution of
are
The response in aodql
is . The mean is
, defined such as
. The variance is
.
—— Rates (model with offset)
For a given cluster , the model is
where follows the same random distribution as for the case with no offset. The marginal mean and variance are
The response in aodql
is . The mean is
, defined such as
, where
is the offset. The variance is
.
Other details
Vector and parameter
are estimated iteratively, using procedures referred to as "Model I" in Williams (1982) for binomial-type models, and "Procedure II" in Breslow (1984) for Poisson-type models.
Iterations are as follows. Quasi-likelihood estimating equations (McCullagh & Nelder, 1989) are used to estimate (using function
glm
and its weights
argument), being set to a constant value. Then,
is calculated by the moment estimator, obtained by equalizing the goodness-of-fit statistic (chi-squared
X2
or deviance D
) of the model to its degrees of freedom.
Parameter can be set as constant, using argument
phi
. In that case, only is estimated.
An object of class aodql
, printed and summarized by various functions.
Breslow, N.E., 1984. Extra-Poisson variation in log-linear models. Appl. Statist. 33, 38-44.
Moore, D.F., 1987, Modelling the extraneous variance in the presence of extra-binomial variation.
Appl. Statist. 36, 8-14.
Moore, D.F., Tsiatis, A., 1991. Robust estimation of the variance in moment methods for extra-binomial
and extra-poisson variation. Biometrics 47, 383-401.
McCullagh, P., Nelder, J. A., 1989, 2nd ed. Generalized linear models. New York, USA: Chapman and Hall.
Williams, D.A., 1982, Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-148.
#------ Binomial-type models data(orob2) fm <- aodql(cbind(m, n - m) ~ seed, data = orob2, family = "qbin") coef(fm) vcov(fm) summary(fm) # chi2 tests of the seed factor in fm wald.test(b = coef(fm), varb = vcov(fm), Terms = 2) # chi-2 vs. deviance statistic to estimate phi fm1 <- aodql(cbind(m, n - m) ~ seed + root, data = orob2, family = "qbin") fm2 <- aodql(cbind(m, n - m) ~ seed + root, data = orob2, family = "qbin", method = "dev") coef(fm1) coef(fm2) fm1$phi fm2$phi vcov(fm1) vcov(fm2) gof(fm1) gof(fm2) # estimate with fixed phi fm <- aodql(cbind(m, n - m) ~ seed, data = orob2, family = "qbin", phi = 0.05) coef(fm) vcov(fm) summary(fm) #------ Poisson-type models data(salmonella) fm <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois") coef(fm) vcov(fm) summary(fm) # chi2 tests of the "log(dose + 10) + dose" factors wald.test(b = coef(fm), varb = vcov(fm), Terms = 2:3) # chi-2 vs. deviance statistic to estimate phi fm1 <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois") fm2 <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois", method = "dev") coef(fm1) coef(fm2) fm1$phi fm2$phi vcov(fm1) vcov(fm2) gof(fm1) gof(fm2) # estimate with fixed phi fm <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois", phi = 0.05) coef(fm) vcov(fm) summary(fm) # modelling a rate data(dja) # rate "m / trisk" fm <- aodql(formula = m ~ group + offset(log(trisk)), data = dja, family = "qpois") summary(fm)
#------ Binomial-type models data(orob2) fm <- aodql(cbind(m, n - m) ~ seed, data = orob2, family = "qbin") coef(fm) vcov(fm) summary(fm) # chi2 tests of the seed factor in fm wald.test(b = coef(fm), varb = vcov(fm), Terms = 2) # chi-2 vs. deviance statistic to estimate phi fm1 <- aodql(cbind(m, n - m) ~ seed + root, data = orob2, family = "qbin") fm2 <- aodql(cbind(m, n - m) ~ seed + root, data = orob2, family = "qbin", method = "dev") coef(fm1) coef(fm2) fm1$phi fm2$phi vcov(fm1) vcov(fm2) gof(fm1) gof(fm2) # estimate with fixed phi fm <- aodql(cbind(m, n - m) ~ seed, data = orob2, family = "qbin", phi = 0.05) coef(fm) vcov(fm) summary(fm) #------ Poisson-type models data(salmonella) fm <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois") coef(fm) vcov(fm) summary(fm) # chi2 tests of the "log(dose + 10) + dose" factors wald.test(b = coef(fm), varb = vcov(fm), Terms = 2:3) # chi-2 vs. deviance statistic to estimate phi fm1 <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois") fm2 <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois", method = "dev") coef(fm1) coef(fm2) fm1$phi fm2$phi vcov(fm1) vcov(fm2) gof(fm1) gof(fm2) # estimate with fixed phi fm <- aodql(m ~ log(dose + 10) + dose, data = salmonella, family = "qpois", phi = 0.05) coef(fm) vcov(fm) summary(fm) # modelling a rate data(dja) # rate "m / trisk" fm <- aodql(formula = m ~ group + offset(log(trisk)), data = dja, family = "qpois") summary(fm)
This package provides functions to analyse overdispersed counts or proportions. The functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
Functions Index :
aodml | ML Estimation of Generalized Linear Models for Overdispersed Count Data |
aodql | QL/MM Estimation of Generalized Linear Models for Overdispersed Count Data |
drs | Test of Proportion Homogeneity between Groups using Donner's and Rao-Scott's Adjustments |
gof | Test of Goodness-of-Fit of Models for Count data |
iccbin | Intra-Cluster Correlation for Clustered Binomial data |
invlink | Transformation from the Link Scale to the Observation Scale |
link | Transformation from the Observation Scale to the Link Scale |
splitbin | Split Grouped Data Into Individual Data |
varbin | Estimate of a Probability from Clustered Binomial Data |
wald.test | Wald Test for Model Coefficients |
Data sets Index :
antibio | Antibiotics against Shipping Fever in Calves |
cohorts | Data set: Age, Period and Cohort Effects for Vital Rates |
dja | Mortality of Djallonke Lambs in Senegal |
lizards | A Comparison of Site Preferences of Two Species of Lizard |
mice | Pregnant Female Mice Experiment |
orob1 | Germination Data |
orob2 | Germination Data |
rabbits | Rabbits Foetuses Survival Experiment |
rats | Rats Diet Experiment |
salmonella | Salmonella Reverse Mutagenicity Assay |
Matthieu Lesnoff <[email protected]> and Renaud Lancelot <[email protected]>
Maintainer: Renaud Lancelot <[email protected]>
Number of prostate cancer deaths and midperiod population for nonwhites in the USA by age and period.
The cohort index is related to age and period indices (
and
, respectively) by
, where
(Holford, 1983, Table 2).
data(cohorts)
data(cohorts)
A data frame with 49 observations on the following 4 variables.
A factor with levels 1935-
, 1940-
, ..., 1965-
.
A factor with levels 50-
, 55-
, ..., 80-
.
Numeric: the number of prostate cancer deaths.
Numeric: the midperiod population size.
Holford, T.R., 1983. The estimation of age, period and cohort effects for vital rates. Biometrics 39, 311-324.
Field trial to assess the effect of ewes deworming (prevention of gastro-intestinal parasitism) on the mortality of their offspring (age < 1 year). This data set is extracted from a large database on small ruminants production and health in Senegal (Lancelot et al., 1998). Data were collected in a sample of herds in Kolda (Upper Casamance, Senegal) during a multi-site survey (Faug?re et al., 1992). See also the references below for a presentation of the follow-up survey (Faug?re and Faug?re, 1986) and a description of the farming systems (Faug?re et al., 1990).
data(dja)
data(dja)
A data frame with 21 observations on the following 4 variables.
a factor with 2 levels: CTRL
and TREAT
, indicating the treatment.
a factor indicating the village of the herd.
a factor indicating the herd.
a numeric vector: the number of animals exposed to mortality.
a numeric vector: the exposition time to mortality (in year).
a numeric vector: the number of deaths.
Faug?re, O., Faug?re, B., 1986. Suivi de troupeaux et contr?le des performances individuelles des petits ruminants en milieu traditionnel africain. Aspects m?thodologiques. Rev. Elev. M?d. v?t. Pays trop., 39 (1): 29-40.
Faug?re, O., Dock?s, A.-C., Perrot, C., Faug?re, B., 1990. L'?levage traditionnel des petits ruminants
au S?n?gal. I. Pratiques de conduite et d'exploitation des animaux chez les ?leveurs de la r?gion de Kolda. Revue
Elev. M?d. v?t. Pays trop. 43: 249-259.
Faug?re, O., Tillard, E., Faug?re, B., 1992. Prophylaxie chez les petits ruminants au S?n?gal : r?gionalisation d'une politique nationale de protection sanitaire. In: B. Rey, S. H. B. Lebbie, L. Reynolds (Ed.), First biennial conference of the African Small Ruminant Research Network, ILCA, 1990, ILRAD, Nairobi, pp. 307-314.
Lancelot, R., Faye, B., Juan?s, X., Ndiaye, M., P?rochon, L., Tillard, E., 1998. La base de donn?es BAOBAB: un outil pour mod?liser la production et la sant? des petits ruminants dans les syst?mes d'?levage traditionnels au S?n?gal. Revue Elev. M?d. v?t. Pays trop., 51 (2): 135-146.
The function tests the homogeneity of probabilities between groups (H_0:
) from clustered binomial data {
}, where
is the size of cluster
,
the number of “successes” (proportions are
), and
the number of clusters. The function uses adjusted chi-squared statistics, with either the correction proposed by proposed by Donner (1989) or the correction proposed by Rao and Scott (1993).
drs(formula, data, type = c("d", "rs"), C = NULL, pooled = FALSE) ## S3 method for class 'drs' print(x, ...)
drs(formula, data, type = c("d", "rs"), C = NULL, pooled = FALSE) ## S3 method for class 'drs' print(x, ...)
formula |
An formula where the left-hand side is a matrix of the form |
type |
A character string: either “d” for the Donner's test and “rs” for the Rao and Scott's test. |
data |
A data frame containing |
C |
An optional vector of a priori |
pooled |
Logical indicating if a pooled design effect is estimated over the |
x |
An object of class “drf”. |
... |
Further arguments to be passed to |
Donner's test
The chi-squared statistic is adjusted with the correction factor computed in each group
. The test statistic is given by:
where and
.
is a scalar depending on the cluster sizes, and
is the ANOVA estimate of the intra-cluster correlation assumed common across groups (see Donner, 1989 or Donner et al., 1994). The statistic is compared to a chi-squared distribution with
degrees of freedom. Fixed correction factors
can be specified with the argument
C
.
Rao ans Scott's test
The method uses design effects and “effective” sample sizes. The design effect in each group
is estimated by
, where
is the variance of the ratio estimate of the probability in group
(Cochran, 1999, p. 32 and p. 66) and
is the standard binomial variance. The
are used to compute the effective sample sizes
, the effective numbers of successes
in each group
, and the overall effective proportion
. The test statistic is obtained by substituting these quantities in the usual chi-squared statistic, yielding:
which is compared to a chi-squared distribution with degrees of freedom.
A pooled design effect over the groups is estimated if argument
pooled = TRUE
(see Rao and Scott, 1993, Eq. 6). Fixed design effects can be specified with the argument
C
.
An object of class drs
, printed with print.drs
.
Donner, A., 1989. Statistical methods in ophthalmology: an adjusted chi-squared approach. Biometrics 45, 605-611.
Donner, A., 1993. The comparison of proportions in the presence of litter effects. Prev. Vet. Med. 18, 17-26.
Donner, A., Eliasziw, M., Klar, N., 1994. A comparison of methods for testing homogeneity of proportions in teratologic studies. Stat. Med. 13, 1253-1264.
data(dja) # Donner drs(formula = cbind(m, n - m) ~ group, data = dja, type = "d") # Rao and Scott drs(formula = cbind(m, n - m) ~ group, type = "rs", data = dja) drs(formula = cbind(m, n - m) ~ group, type = "rs", data = dja, pooled = TRUE) # standard chi2 test drs(formula = cbind(m, n - m) ~ group, data = dja, type = "d", C = c(1:1)) drs(formula = cbind(m, n - m) ~ group, data = dja, type = "rs", C = c(1:1))
data(dja) # Donner drs(formula = cbind(m, n - m) ~ group, data = dja, type = "d") # Rao and Scott drs(formula = cbind(m, n - m) ~ group, type = "rs", data = dja) drs(formula = cbind(m, n - m) ~ group, type = "rs", data = dja, pooled = TRUE) # standard chi2 test drs(formula = cbind(m, n - m) ~ group, data = dja, type = "d", C = c(1:1)) drs(formula = cbind(m, n - m) ~ group, data = dja, type = "rs", C = c(1:1))
The function returns a chi-squared test of goodness of fit for models of class glm
, aodml
or aodql
.
gof(object) gof.default(object) ## S3 method for class 'gof' print(x, ..., digits = max(3, getOption("digits") - 3))
gof(object) gof.default(object) ## S3 method for class 'gof' print(x, ..., digits = max(3, getOption("digits") - 3))
object |
An object of class |
x |
An object of class |
digits |
A numerical scalar indicating the number of digits to be printed after the decimal place. |
... |
Further arguments passed to |
Function gof
calculates the deviance and the Pearson chi-squared
statistics for the model under consideration. Let
be the observed response, and
and
its mean and variance estimated from the model, statistic
is calculated by:
Assuming that the data length is and the number of the parameters in the model is
,
and
are compared to a chi-squared distribution with
degrees of freedom.
An object of class gof
, printed with print.gof
.
Agresti, A. Categorical data analysis. Wiley, 1990.
data(orob2) fm1 <- glm(cbind(m, n - m) ~ seed, data = orob2, family = binomial) fm2 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") gof(fm1) gof(fm2)
data(orob2) fm1 <- glm(cbind(m, n - m) ~ seed, data = orob2, family = binomial) fm2 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb") gof(fm1) gof(fm2)
The function estimates the intraclass correlation from clustered binomial data:
{},
where is the size of cluster
,
the number of “successes” (proportions are
), and
the number of clusters. The function uses a one-way random effect model. Three estimates, corresponding to methods referred to as “A”, “B” and “C” in Goldstein et al. (2002), can be returned.
iccbin(n, m, method = c("A", "B", "C"), nAGQ = 1, M = 1000) ## S3 method for class 'iccbin' print(x, ...)
iccbin(n, m, method = c("A", "B", "C"), nAGQ = 1, M = 1000) ## S3 method for class 'iccbin' print(x, ...)
n |
A vector of the sizes of the clusters. |
m |
A vector of the numbers of successes (proportions are |
method |
A character (“A”, “B” or “C”) defining the calculation method. See Details. |
nAGQ |
Same as in function |
M |
Number of Monte Carlo (MC) replicates used in method “B”. Default to 1000. |
x |
An object of class “iccbin”. |
... |
Further arguments to ba passed to “print”. |
Before computations, the clustered data are split to binary “0/1” observations (observation
in cluster
). The methods of calculation are described in Goldstein et al. (2002).
Methods "A" and "B" use the 1-way logistic binomial-Gaussian model
where is a constant and
a cluster random effect with
. The ML estimate of the variance component
is calculated with the function
glmer
of package lme4. The intra-class correlation is then calculated from a first-order model linearization around
in method “A”, and with Monte Carlo simulations in method “B”.
Method "C" provides the common ANOVA (moment) estimate of . For details, see for instance Donner (1986), Searle et al. (1992) or Ukoumunne (2002).
An object of class iccbin
, printed with print.iccbin
.
Donner A., 1986, A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review 54, 67-82.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance components. Wiley, New York.
Ukoumunne, O. C., 2002. A comparison of confidence interval methods for the intraclass correlation coefficient in cluster randomized trials. Statistics in Medicine 21, 3757-3774.
Golstein, H., Browne, H., Rasbash, J., 2002. Partitioning variation in multilevel models. Understanding Statistics 1(4), 223-231.
data(rats) z <- rats[rats$group == "TREAT", ] # A: glmm (model linearization) iccbin(z$n, z$m, method = "A") iccbin(z$n, z$m, method = "A", nAGQ = 10) # B: glmm (Monte Carlo) iccbin(z$n, z$m, method = "B") iccbin(z$n, z$m, method = "B", nAGQ = 10, M = 1500) # C: lmm (ANOVA moments) iccbin(z$n, z$m, method = "C") ## Not run: # Example of CI calculation with nonparametric bootstrap require(boot) foo <- function(X, ind) { n <- X$n[ind] m <- X$m[ind] iccbin(n = n, m = m, method = "C")$rho } res <- boot(data = z[, c("n", "m")], statistic = foo, R = 500, sim = "ordinary", stype = "i") res boot.ci(res, conf = 0.95, type = "basic") ## End(Not run)
data(rats) z <- rats[rats$group == "TREAT", ] # A: glmm (model linearization) iccbin(z$n, z$m, method = "A") iccbin(z$n, z$m, method = "A", nAGQ = 10) # B: glmm (Monte Carlo) iccbin(z$n, z$m, method = "B") iccbin(z$n, z$m, method = "B", nAGQ = 10, M = 1500) # C: lmm (ANOVA moments) iccbin(z$n, z$m, method = "C") ## Not run: # Example of CI calculation with nonparametric bootstrap require(boot) foo <- function(X, ind) { n <- X$n[ind] m <- X$m[ind] iccbin(n = n, m = m, method = "C")$rho } res <- boot(data = z[, c("n", "m")], statistic = foo, R = 500, sim = "ordinary", stype = "i") res boot.ci(res, conf = 0.95, type = "basic") ## End(Not run)
The function transforms a variable from the link scale to the observation scale (probability or count).
invlink(x, type = c("cloglog", "log", "logit", "probit"))
invlink(x, type = c("cloglog", "log", "logit", "probit"))
x |
A vector of real numbers. |
type |
A character string: “cloglog”, “log”, “logit” or “probit”. |
clog-log:
log:
logit:
probit:
x <- seq(-5, 5, length = 100) plot(x, invlink(x, type = "logit"), type = "l", lwd = 2, ylab = "Probability") lines(x, invlink(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(-5, 1, legend = c("alogit(x)", "acloglog(x)"), lty = c(1, 2), bg = "white")
x <- seq(-5, 5, length = 100) plot(x, invlink(x, type = "logit"), type = "l", lwd = 2, ylab = "Probability") lines(x, invlink(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(-5, 1, legend = c("alogit(x)", "acloglog(x)"), lty = c(1, 2), bg = "white")
The function transforms a variable from the observation scale (probability or count) to the link scale.
link(x, type = c("cloglog", "log", "logit", "probit"))
link(x, type = c("cloglog", "log", "logit", "probit"))
x |
A vector of real numbers. |
type |
A character string: “cloglog”, “log”, “logit” or “probit”. |
clog-log:
log:
logit:
probit:
x <- seq(.001, .999, length = 100) plot(x, link(x, type = "logit"), type = "l", lwd = 2, ylab = "link(proba.)") lines(x, link(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(0, 6, legend = c("logit(x)", "cloglog(x)"), lty = c(1, 2), bg = "white")
x <- seq(.001, .999, length = 100) plot(x, link(x, type = "logit"), type = "l", lwd = 2, ylab = "link(proba.)") lines(x, link(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(0, 6, legend = c("logit(x)", "cloglog(x)"), lty = c(1, 2), bg = "white")
“These data describe the daytime habits of two species of lizards, grahami and opalinus. They were collected by observing occupied sites or perches and recording the appropriate description, namely species involved, time of the day, height and diameter of the perch and whether the site was sunny or shaded. Time of the day is recorded as early, mid-day or late.” (McCullagh and Nelder, 1989, p.129).
data(lizards)
data(lizards)
A data frame with 24 observations on the following 6 variables.
A factor with levels Sun
and Shade
.
A factor with levels D <= 2
and D > 2
(inches).
A factor with levels H < 5
and H >= 5
(feet).
A factor with levels Early
, Mid-day
and Late
.
A numeric vector giving the observed sample size for grahami lizards.
A numeric vector giving the observed sample size for opalinus lizards.
The data were originally published in Fienberg (1970).
McCullagh, P., Nelder, J. A., 1989, 2nd ed. Generalized linear models. New York, USA: Chapman and Hall.
Fienberg, S.E., 1970. The analysis of multidimensional contingency tables. Ecology 51: 419-433.
Unpublished laboratory data on the proportion of affected foetuses in two groups (control and treatment) of 10 pregnant female mice (Kupper and Haseman, 1978, p. 75).
data(mice)
data(mice)
A data frame with 20 observations on the following 3 variables.
a factor with levels CTRL
and TREAT
a numeric vector: the total number of foetuses.
a numeric vector: the number of affected foetuses.
Kupper, L.L., Haseman, J.K., 1978. The use of a correlated binomial model for the analysis of a certain toxicological experiments. Biometrics 34, 69-76.
Data describing the germination for seed Orobanche cernua cultivated in three dilutions of a bean root extract (Crowder, 1978, Table 1). The mean proportions of the three groups are 0.142, 0.872 and 0.842, and the overall mean is 0.614.
data(orob1)
data(orob1)
A data frame with 16 observations on the following 3 variables.
a factor with 3 levels: 1/1
, 1/25
and 1/625
.
a numeric vector: the number of seeds exposed to germination.
a numeric vector: the number of seeds which actually germinated.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
A 2 x 2 factorial experiment comparing 2 types of seed and 2 root extracts (Crowder, 1978, Table 3). There are 5 or 6 replicates in each of the 4 treatment groups, and each replicate comprises a number of seeds varying between 4 and 81. The response variable is the proportion of seeds germinating in each replicate.
data(orob2)
data(orob2)
A data frame with 21 observations on the following 4 variables.
a factor with 2 levels: O73
and O75
.
a factor with 2 levels BEAN
and CUCUMBER
.
a numeric vector: the number of seeds exposed to germination.
a numeric vector: the number of seeds which actually germinated.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Experimental data for analyzing the effect of an increasing dose of a compound on the proportion of live foetuses affected (Paul, 1982, Table 1). Four treatment-groups were considered: control “C”, low dose “L”, medium dose “M” and high dose “H”. The animal species used in the experiment was banded Dutch rabbit.
data(rabbits)
data(rabbits)
A data frame with 84 observations on the following 3 variables.
a factor with levels C
, H
, L
and M
a numeric vector: the total number of foetuses.
a numeric vector: the number of affected foetuses.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments. Biometrics 38, 361-370.
“Weil (1970) in Table 1 gives the results from an experiment comprising two treatments. One group of 16 pregnant female rats was fed a control diet during pregnancy and lactation, the diet of a second group of 16 pregnant females was treated with a chemical. For each litter the number of pups alive at 4 days and the number
of pups that survived the 21 day lactation period were recorded.” (Williams, 1975, p. 951).
data(rats)
data(rats)
A data frame with 32 observations on the following 3 variables.
A factor with levels CTRL
and TREAT
A numeric vector: the number of pups alive at 4 days.
A numeric vector: the number of pups that survived the 21 day lactation.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.
Weil, C.S., 1970. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Fd. Cosmet. Toxicol. 8, 177-182.
“Data for our third example were compiled by Margolin et al. (1981) from an Ames Salmonella reverse mutagenicity assay. Table 1 shows the number of revertant colonies observed on each of 3 replicate plates tested at each of 6 dose levels of quinoline.” (Breslow, 1984, Table 1).
data(salmonella)
data(salmonella)
A data frame with 18 observations on the following 2 variables.
a numeric vector: the dose level of quinoline (microgram per plate).
a numeric vector: the number of revertant colonies of TA98 Salmonella.
Breslow, N.E., 1984. Extra-Poisson variation in log-linear models. Applied Statistics 33(1), 38-44.
Margolin, B.H., Kaplan, N., Zeiger, E., 1981. Statistical analysis of the Ames Salmonella / microsome test. Proc. Natl Acad. Sci. USA 76, 3779-3783.
The function splits grouped binomial data and optional covariates to individual binary data. Two types of grouped data are managed by splitbin
:
Grouped data with weights;
Grouped data of form cbind(success, failure)
.
When weights, successes or failures involve non-integer numbers, these numbers are rounded (using round()
) before splitting.
splitbin(formula, data, id = "idbin")
splitbin(formula, data, id = "idbin")
formula |
A formula. The left-hand side describes the grouped data. The right-hand side describes the covariates. See examples for syntax. |
data |
A data frame where all the variables described in |
id |
An optional character string naming the identifier (= grouping factor). Default to “idbin”. |
A data frame built according to the formula and function used in the call.
# grouped data with weights z <- data.frame( m = c(0, 1, 0, 1), f1 = c("A", "A", "B", "B"), f2 = c("C", "D", "C", "D"), n = c(4, 2, 1, 3) ) z splitbin(formula = n ~ f1, data = z)$tab splitbin(formula = n ~ f1 + f2 + m , data = z)$tab # grouped data of form "cbind(success, failure)" z <- data.frame( m = c(4, 1), n = c(5, 3), f1 = c("A", "B"), f2 = c("C", "D") ) z splitbin(formula = cbind(m, n - m) ~ 1, data = z)$tab splitbin(formula = cbind(m, n - m) ~ f1 + f2, data = z)$tab splitbin(formula = cbind(m, n - m) ~ f1 + f2, data = z)$tab
# grouped data with weights z <- data.frame( m = c(0, 1, 0, 1), f1 = c("A", "A", "B", "B"), f2 = c("C", "D", "C", "D"), n = c(4, 2, 1, 3) ) z splitbin(formula = n ~ f1, data = z)$tab splitbin(formula = n ~ f1 + f2 + m , data = z)$tab # grouped data of form "cbind(success, failure)" z <- data.frame( m = c(4, 1), n = c(5, 3), f1 = c("A", "B"), f2 = c("C", "D") ) z splitbin(formula = cbind(m, n - m) ~ 1, data = z)$tab splitbin(formula = cbind(m, n - m) ~ f1 + f2, data = z)$tab splitbin(formula = cbind(m, n - m) ~ f1 + f2, data = z)$tab
The function estimates a probability and its variance from clustered binomial data
{},
where is the size of cluster
,
the number of “successes” (proportions are
), and
the number of clusters. Confidence intervals are calculated using a normal approximation, which might be inappropriate when the probability is close to 0 or 1.
varbin(n, m, alpha = 0.05, R = 5000) ## S3 method for class 'varbin' print(x, ...)
varbin(n, m, alpha = 0.05, R = 5000) ## S3 method for class 'varbin' print(x, ...)
n |
A vector of the sizes of the clusters. |
m |
A vector of the numbers of successes (proportions are |
alpha |
The significance level for the confidence intervals. Default to 0.05, providing 95% CI's. |
R |
The number of bootstrap replicates to compute bootstrap mean and variance. Default to 5000. |
x |
An object of class “varbin”. |
... |
Further arguments to be passed to “print”. |
Five methods are used for the estimations. Let us consider clusters of sizes
with observed count responses
. We note
the observed proportions. The underlying assumption is that the probability, say
, is homogeneous across the clusters.
Binomial method: the probability estimate and its variance are calculated by
(ratio estimate) and
, respectively.
Ratio method: the probability is estimated as for the binomial method (ratio estimate). The one-stage cluster sampling formula is used to calculate the variance of
(see Cochran, 1999, p. 32 and p. 66).
Arithmetic method: the probability is estimated by . The variance of
is estimated by
.
Jackknife method: the probability is estimated by defined by the arithmetic mean of the pseudovalues
. The variance is estimated by
(Gladen, 1977, Paul, 1982).
Bootstrap method: samples of clusters of size
are drawn with equal probability from the initial sample
(Efron and Tibshirani, 1993). The bootstrap estimate
and its estimated variance are the arithmetic mean and the empirical variance (computed with denominator
) of the
binomial ratio estimates, respectively.
An object of class varbin
, printed with print.varbin
.
Cochran, W.G., 1999, 3th ed. Sampling techniques. Wiley, New York.
Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. Chapman and Hall, London.
Gladen, B., 1977. The use of the jackknife to estimate proportions from toxicological data in the presence
of litter effects. JASA 74(366), 278-283.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments.
Biometrics 38, 361-370.
data(rabbits) z <- rabbits[rabbits$group == "M", ] varbin(z$n, z$m) by(rabbits, list(group = rabbits$group), function(x) varbin(n = x$n, m = x$m, R = 1000))
data(rabbits) z <- rabbits[rabbits$group == "M", ] varbin(z$n, z$m) by(rabbits, list(group = rabbits$group), function(x) varbin(n = x$n, m = x$m, R = 1000))
The function returns a Wald chi-squared test or a test for a vector of model coefficients (possibly of length one), given its variance-covariance matrix.
wald.test(b, varb, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE, ...) ## S3 method for class 'wald.test' print(x, ..., digits = max(3, getOption("digits") - 3))
wald.test(b, varb, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE, ...) ## S3 method for class 'wald.test' print(x, ..., digits = max(3, getOption("digits") - 3))
b |
A vector of coefficients with their var-cov matrix |
varb |
A var-cov matrix of coefficients |
Terms |
An optional integer vector specifying which coefficients should be jointly tested, using a Wald chi-squared test or a |
L |
An optional matrix conformable to |
H0 |
A numeric vector giving the null hypothesis |
df |
A numeric vector giving the degrees of freedom to be used in an |
verbose |
A logical scalar controlling the amount of output information. The default is |
x |
An object of class “wald.test” |
digits |
A numeric scalar indicating the number of digits to be kept after the decimal place. |
... |
Additional arguments to |
The assumption is that the coefficients follow asymptotically a multivariate normal distribution with mean equal to the model coefficients b
and variance equal to their var-cov matrix varb
.
One (and only one) of Terms
or L
must be given. When L
is given, it must have the same number of columns as the length of b
, and the same number of rows as the number of linear combinations of coefficients.
When df
is given, the chi-squared Wald statistic is divided by m
, the number of linear combinations of coefficients to be tested (i.e., length(Terms)
or nrow(L)
). Under the null hypothesis , this new statistic follows an
distribution.
An object of class wald.test
, printed with print.wald.test
.
Diggle, P.J., Liang, K.-Y., Zeger, S.L., 1994. Analysis of longitudinal data. Oxford, Clarendon Press, 253 p.
Draper, N.R., Smith, H., 1998. Applied Regression Analysis. New York, John Wiley & Sons, Inc., 706 p.
data(orob2) fm <- aodql(cbind(m, n - m) ~ seed * root, data = orob2, family = "qbin") # Wald chi2 test for the effect of root wald.test(b = coef(fm), varb = vcov(fm), Terms = 3:4) L <- matrix(c(0, 0, 1, 0, 0, 0, 0, 1), nrow = 2, byrow = TRUE) wald.test(b = coef(fm), varb = vcov(fm), L = L)
data(orob2) fm <- aodql(cbind(m, n - m) ~ seed * root, data = orob2, family = "qbin") # Wald chi2 test for the effect of root wald.test(b = coef(fm), varb = vcov(fm), Terms = 3:4) L <- matrix(c(0, 0, 1, 0, 0, 0, 0, 1), nrow = 2, byrow = TRUE) wald.test(b = coef(fm), varb = vcov(fm), L = L)