Title: | Analysis of Overdispersed Data |
---|---|
Description: | Provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM). |
Authors: | Matthieu Lesnoff <[email protected]> and Renaud Lancelot <[email protected]> |
Maintainer: | Renaud Lancelot <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.3 |
Built: | 2024-12-08 06:54:19 UTC |
Source: | CRAN |
Representation of the output of function AIC
.
A data frame with 3 columns describing the models indicated by the row names:
number of parameters in the model
,
Akaike information criterion for the model (see AIC
)
,
small-sample corrected Akaike information criterion for the model (see AIC
).
signature(object = "aic")
signature(object = "aic")
Extracts the Akaike information criterion (AIC) and the corrected AIC (AICc) from fitted models of formal class “glimML” and possibly computes derived statistics.
## S4 method for signature 'glimML' AIC(object, ..., k = 2)
## S4 method for signature 'glimML' AIC(object, ..., k = 2)
object |
fitted model of formal class “glimML” (functions |
... |
optional list of fitted models separated by commas. |
k |
numeric scalar, with a default value set to 2, thus providing the regular AIC. |
, where
represents the number of parameters in the fitted model.
,
where
is the number of observations used to compute the log-likelihood. It should be used when the number
of fitted parameters is large compared to sample size, i.e., when
(Hurvich and Tsai, 1995).
Extracts the AIC and AICc from models of formal class “glimML”, fitted by functions
betabin
and negbin
.
Burnham, K.P., Anderson, D.R., 2002. Model selection and multimodel inference: a practical
information-theoretic approach. New-York, Springer-Verlag, 496 p.
Hurvich, C.M., Tsai, C.-L., 1995. Model selection for extended quasi-likelihood models in small samples.
Biometrics, 51 (3): 1077-1084.
Examples in betabin
and see AIC
in package stats.
Performs likelihood-ratio tests on nested models. Currently, one method was implemented
for beta-binomial models (betabin
) or negative-binomial models (negbin
).
## S4 method for signature 'glimML' anova(object, ...)
## S4 method for signature 'glimML' anova(object, ...)
object |
Fitted model of class “glimML”. |
... |
Further models to be tested or arguments passed to the |
The anova
method for models of formal class “glimML” needs at least 2 nested models of the
same type (either beta-binomial or negative-binomial models: they cannot be mixed). The quantity of interest is
the deviance difference between the compared models: it is a log-likelihood ratio statistic. Under the null
hypothesis that 2 nested models fit the data equally well, the deviance difference has an approximate
distribution with degrees of freedom = the difference in the number of parameters between
the compared models (Mc Cullagh and Nelder, 1989).
An object of formal class “anova.glimML” with 3 slots:
models |
A vector of character strings with each component giving the name of the models and the formulas for the fixed and random effects. |
|||||||||||||||||||||||||||||||||
anova.table |
A data frame containing the results. Row names correspond to the models.
|
|||||||||||||||||||||||||||||||||
type |
A character chain indicating the kind of fitted model: “BB” for beta-binomial, or “NB” for negative-binomial model. |
The comparison between 2 or more models will only be valid if they are fitted to the same data set.
McCullagh, P., Nelder, J.A., 1989. Generalized linear models. London, Chapman & Hall, 511 p.
See Appendix C. Likelihood ratio statistics, p. 476-478.
data(orob2) # likelihood ratio test for the effect of root fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2) anova(fm1, fm2)
data(orob2) # likelihood ratio test for the effect of root fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2) anova(fm1, fm2)
Hypothetical drug trial to compare the effect of four antibiotics against Shipping fever in calves (Shoukri and Pause, 1999, Table 3.11).
data(antibio)
data(antibio)
A data frame with 24 observations on the following 3 variables.
A factor with levels 1
, 2
, 3
and 4
A numeric vector: the number of treated animals within a two-week period.
A numeric vector: the number of deaths at the end of the two weeks.
Shoukri, M.M., Pause, C.A., 1999, 2nd ed. Statistical methods for health sciences. CRC Press, London.
This package provides a set of functions to analyse overdispersed counts or proportions. Most of the methods are already available elsewhere but are scattered in different packages. The proposed functions should be considered as complements to more sophisticated methods such as generalized estimating equations (GEE) or generalized linear mixed effect models (GLMM).
Package: | aod |
Version: | 1.1-32 |
Date: | 2010-04-02 |
Depends: | R (>= 2.0.0), methods, stats |
Suggests: | MASS, nlme, boot |
License: | GPL version 2 or newer |
URL: | http://cran.r-project.org/package=aod |
LazyLoad: | yes |
LazyData: | yes |
Index :
AIC-methods | Akaike Information Criteria |
aic-class | Representation of Objects of Formal Class “aic” |
anova-method | Likelihood-Ratio Tests for Nested ML Models |
antibio | Antibiotics against Shipping Fever in Calves |
betabin | Beta-Binomial Model for Proportions |
coef-methods | Methods for Function “coef” in Package aod |
cohorts | Data set: Age, Period and Cohort Effects for Vital Rates |
deviance-methods | Methods for Function deviance in Package aod
|
df.residual-methods | Methods for Function df.residual in Package aod
|
dja | Mortality of Djallonke Lambs in Senegal |
donner | Test of Proportion Homogeneity using Donner's Adjustment |
drs-class | Representation of Objects of Formal Class “drs” |
fitted-methods | Methods for Function fitted in Package aod
|
glimML-class | Representation of Models of Formal Class “glimML” |
glimQL-class | Representation of Models of Formal Class “glimQL” |
icc | Intra-Cluster Correlation |
icc-class | Representation of Objects of Formal Class “icc” |
invlink | Transformation from the Link Scale to the Observation Scale |
link | Transformation from the Observation Scale to the Link Scale |
lizards | A Comparison of Site Preferences of Two Species of Lizard |
logLik-methods | Methods for Functions “logLik” in Package aod |
mice | Pregnant Female Mice Experiment |
negbin | Negative-Binomial Model for Counts |
orob1 | Germination Data |
orob2 | Germination Data |
predict-methods | Methods for Function “predict” in Package aod |
quasibin | Quasi-Likelihood Model for Proportions |
quasipois | Quasi-Likelihood Model for Counts |
rabbits | Rabbits Foetuses Survival Experiment |
raoscott | Test of Proportion Homogeneity using Rao and Scott's Adjustment |
rats | Rats Diet Experiment |
residuals-methods | Residuals for Maximum-Likelihood and Quasi-Likelihood Models |
salmonella | Salmonella Reverse Mutagenicity Assay |
splitbin | Splits Binomial Data into Bernoulli Data |
summary,aic-method | Akaike Information Statistics |
summary.glimML-class | Summary of Objects of Class “summary.glimML” |
varbin | Mean, Variance and Confidence Interval of a Proportion |
varbin-class | Representation of Objects of Formal Class “varbin” |
vcov-methods | Methods for Function “vcov” in Package aod |
wald.test | Wald Test for Model Coefficients |
Matthieu Lesnoff <[email protected]> and Renaud Lancelot <[email protected]>
Maintainer: Renaud Lancelot
Fits a beta-binomial generalized linear model accounting for overdispersion in clustered binomial data .
betabin(formula, random, data, link = c("logit", "cloglog"), phi.ini = NULL, warnings = FALSE, na.action = na.omit, fixpar = list(), hessian = TRUE, control = list(maxit = 2000), ...)
betabin(formula, random, data, link = c("logit", "cloglog"), phi.ini = NULL, warnings = FALSE, na.action = na.omit, fixpar = list(), hessian = TRUE, control = list(maxit = 2000), ...)
formula |
A formula for the fixed effects |
random |
A right-hand formula for the overdispersion parameter(s) |
link |
The link function for the mean |
data |
A data frame containing the response ( |
phi.ini |
Initial values for the overdispersion parameter(s) |
warnings |
Logical to control the printing of warnings occurring during log-likelihood maximization.
Default to |
na.action |
A function name: which action should be taken in the case of missing value(s). |
fixpar |
A list with 2 components (scalars or vectors) of the same size, indicating which parameters are
fixed (i.e., not optimized) in the global parameter vector |
hessian |
A logical. When set to |
control |
A list to control the optimization parameters. See |
... |
Further arguments passed to |
For a given cluster , the model is:
with following a Beta distribution
.
If denotes the beta function, then:
The marginal beta-binomial distribution is:
The function uses the parameterization and
,
where
is the inverse of the link function (logit or complementary log-log),
is a design-matrix,
is a vector of fixed effects,
is the linear predictor and
is the overdispersion
parameter (i.e., the intracluster correlation coefficient, which is here restricted to be positive).
The marginal mean and variance are:
The parameters and
are estimated by maximizing the log-likelihood of the marginal model (using the
function
optim
). Several explanatory variables are allowed in , only one in
.
An object of formal class “glimML”: see glimML-class
for details.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application
to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of
correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
glimML-class
, glm
and optim
data(orob2) fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2) fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2) # show the model fm1; fm2; fm3 # AIC AIC(fm1, fm2, fm3) summary(AIC(fm1, fm2, fm3), which = "AICc") # Wald test for root effect wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4) # likelihood ratio test for root effect anova(fm1, fm3) # model predictions New <- expand.grid(seed = levels(orob2$seed), root = levels(orob2$root)) data.frame(New, predict(fm3, New, se = TRUE, type = "response")) # Djallonke sheep data data(dja) betabin(cbind(y, n - y) ~ group, ~ 1, dja) # heterogeneous phi betabin(cbind(y, n - y) ~ group, ~ group, dja, control = list(maxit = 1000)) # phi fixed to zero in group TREAT betabin(cbind(y, n - y) ~ group, ~ group, dja, fixpar = list(4, 0)) # glim without overdispersion summary(glm(cbind(y, n - y) ~ group, family = binomial, data = dja)) # phi fixed to zero in both groups betabin(cbind(y, n - y) ~ group, ~ group, dja, fixpar = list(c(3, 4), c(0, 0)))
data(orob2) fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2) fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2) # show the model fm1; fm2; fm3 # AIC AIC(fm1, fm2, fm3) summary(AIC(fm1, fm2, fm3), which = "AICc") # Wald test for root effect wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4) # likelihood ratio test for root effect anova(fm1, fm3) # model predictions New <- expand.grid(seed = levels(orob2$seed), root = levels(orob2$root)) data.frame(New, predict(fm3, New, se = TRUE, type = "response")) # Djallonke sheep data data(dja) betabin(cbind(y, n - y) ~ group, ~ 1, dja) # heterogeneous phi betabin(cbind(y, n - y) ~ group, ~ group, dja, control = list(maxit = 1000)) # phi fixed to zero in group TREAT betabin(cbind(y, n - y) ~ group, ~ group, dja, fixpar = list(4, 0)) # glim without overdispersion summary(glm(cbind(y, n - y) ~ group, family = binomial, data = dja)) # phi fixed to zero in both groups betabin(cbind(y, n - y) ~ group, ~ group, dja, fixpar = list(c(3, 4), c(0, 0)))
Extract the fixed-effect coefficients from fitted objects.
Generic function: see coef
.
Extract the estimated fixed-effect coefficients from objects of formal class “glimML”.
Presently, these objects are generated by functions betabin
and negbin
.
Extract the estimated fixed-effect coefficients from objects of formal class “glimQL”.
Presently, these objects are generated by functions quasibin
and quasipois
.
Number of prostate cancer deaths and midperiod population for nonwhites in the USA by age and period.
The cohort index is related to age and period indices (
and
, respectively) by
,
where
(Holford, 1983, Table 2).
data(cohorts)
data(cohorts)
A data frame with 49 observations on the following 4 variables.
A factor with levels 1935-
, 1940-
, ..., 1965-
.
A factor with levels 50-
, 55-
, ..., 80-
.
Numeric: the number of prostate cancer deaths.
Numeric: the midperiod population size.
Holford, T.R., 1983. The estimation of age, period and cohort effects for vital rates. Biometrics 39, 311-324.
Extracts the deviance fitted models.
Generic function: see deviance
.
Extracts the deviance from models fitted by betabin
or negbin
.
Computes the number of degrees of freedom of the residuals from fitted objects.
Generic function: see df.residual
.
Computes the df of residuals for models fitted by betabin
or negbin
.
Computes the df of residuals for models fitted by quasibin
or quasipois
.
Field trial to assess the effect of ewes deworming (prevention of gastro-intestinal parasitism) on the mortality of their offspring (age < 1 year). This data set is extracted from a large database on small ruminants production and health in Senegal (Lancelot et al., 1998). Data were collected in a sample of herds in Kolda (Upper Casamance, Senegal) during a multi-site survey (Faugère et al., 1992). See also the references below for a presentation of the follow-up survey (Faugère and Faugère, 1986) and a description of the farming systems (Faugère et al., 1990).
data(dja)
data(dja)
A data frame with 21 observations on the following 4 variables.
a factor with 2 levels: CTRL
and TREAT
, indicating the treatment.
a factor indicating the village of the herd.
a factor indicating the herd.
a numeric vector: the number of animals exposed to mortality.
a numeric vector: the exposition time to mortality (in year).
a numeric vector: the number of deaths.
Faugère, O., Faugère, B., 1986. Suivi de troupeaux et contrôle des performances individuelles des petits
ruminants en milieu traditionnel africain. Aspects méthodologiques. Rev. Elev. Méd. vét. Pays trop., 39 (1): 29-40.
Faugère, O., Dockès, A.-C., Perrot, C., Faugère, B., 1990. L'élevage traditionnel des petits ruminants
au Sénégal. I. Pratiques de conduite et d'exploitation des animaux chez les éleveurs de la région de Kolda. Revue
Elev. Méd. vét. Pays trop. 43: 249-259.
Faugère, O., Tillard, E., Faugère, B., 1992. Prophylaxie chez les petits ruminants au Sénégal : régionalisation
d'une politique nationale de protection sanitaire. In: B. Rey, S. H. B. Lebbie, L. Reynolds (Ed.), First biennial
conference of the African Small Ruminant Research Network, ILCA, 1990, ILRAD, Nairobi, pp. 307-314.
Lancelot, R., Faye, B., Juanès, X., Ndiaye, M., Pérochon, L., Tillard, E., 1998. La base de données BAOBAB:
un outil pour modéliser la production et la santé des petits ruminants dans les systèmes d'élevage traditionnels
au Sénégal. Revue Elev. Méd. vét. Pays trop., 51 (2): 135-146.
Tests the homogeneity of proportions between groups (H0:
) from clustered
binomial data
using the adjusted
statistic proposed by Donner (1989).
donner(formula = NULL, response = NULL, weights = NULL, group = NULL, data, C = NULL)
donner(formula = NULL, response = NULL, weights = NULL, group = NULL, data, C = NULL)
formula |
An optional formula where the left-hand side is either a matrix of the form |
response |
An optional argument indicating either a matrix of the form |
weights |
An optional argument used when the left-hand side of |
group |
An optional argument only used when |
data |
A data frame containing the response ( |
C |
If not NULL, a numerical vector of |
The statistic is adjusted with the correction factor
computed in each group
. The test statistic is given by:
where ,
is a scalar depending on the cluster sizes,
and
is the ANOVA estimate of the intra-cluster correlation, assumed common
across groups (see Donner, 1989 or Donner et al., 1994). The statistic is compared to a
distribution with
degrees of freedom. Fixed correction factors can be specified
with the argument
C
.
An object of formal class “drs”: see drs-class
for details. The slot tab
provides the proportion of successes and the correction factor for each group.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Donner, A., 1989. Statistical methods in ophthalmology: an adjusted chi-squared approach. Biometrics 45, 605-611.
Donner, A., 1993. The comparison of proportions in the presence of litter effects. Prev. Vet. Med. 18, 17-26.
Donner, A., Eliasziw, M., Klar, N., 1994. A comparison of methods for testing homogeneity of proportions in
teratologic studies. Stat. Med. 13, 1253-1264.
chisq.test
, raoscott
, drs-class
data(rats) donner(formula = cbind(y, n - y) ~ group, data = rats) donner(formula = y/n ~ group, weights = n, data = rats) donner(response = cbind(y, n - y), group = group, data = rats) donner(response = y/n, weights = n, group = group, data = rats) # standard test donner(cbind(y, n - y) ~ group, data = rats, C = c(1, 1)) data(antibio) donner(cbind(y, n - y) ~ treatment, data = antibio)
data(rats) donner(formula = cbind(y, n - y) ~ group, data = rats) donner(formula = y/n ~ group, weights = n, data = rats) donner(response = cbind(y, n - y), group = group, data = rats) donner(response = y/n, weights = n, group = group, data = rats) # standard test donner(cbind(y, n - y) ~ group, data = rats, C = c(1, 1)) data(antibio) donner(cbind(y, n - y) ~ treatment, data = antibio)
Representation of the output of functions donner
and raoscott
.
Objects can be created by calls of the form new("drs", ...)
or, more commonly,
via the donner
or raoscott
functions.
CALL
The call of the function.
tab
A data frame containing test information. The content of the data frame depends on the type of the function which generated it.
rho
The ANOVA estimate of the intra-cluster correlation (function donner
).
X2
The adjusted statistic.
Extracts the fitted values from models.
Generic function: see fitted
.
Extract the fitted values from models of formal class “glimML”, presently generated
by functions betabin
and negbin
.
Extract the fitted values from models of formal class “glimQL”, presently generated
by functions quasibin
and quasibin
.
Representation of models of formal class "glimML" fitted by maximum-likelihood method.
Objects can be created by calls of the form new("glimML", ...)
or,
more commonly, via the functions betabin
or negbin
.
CALL
The call of the function.
link
The link function used to transform the mean: “logit”, “cloglog” or “log”.
method
The type of fitted model: “BB” for beta-binomial and “NB” for negative-binomial models.
formula
The formula used to model the mean.
random
The formula used to model the overdispersion parameter .
data
Data set to which model was fitted. Different from the original data in case of missing value(s).
param
The vector of the ML estimated parameters and
.
varparam
The variance-covariance matrix of the ML estimated parameters and
.
fixed.param
The vector of the ML estimated fixed-effect parameters .
random.param
The vector of the ML estimated random-effect (correlation) parameters .
logL
The log-likelihood of the fitted model.
logL.max
The log-likelihood of the maximal model (data).
dev
The deviance of the model, i.e., - 2 * (logL - logL.max)
.
df.residual
The residual degrees of freedom of the fitted model.
nbpar
The number of estimated parameters, i.e., nbpar = total number of parameters - number
of fixed parameters. See argument fixpar
in betabin
or negbin
.
iterations
The number of iterations performed in optim
.
code
An integer (returned by optim
) indicating why the optimization process terminated.
Relative gradient is close to 0, current iterate is probably solution.
Successive iterates within tolerance, current iterate is probably solution.
Last global step failed to locate a point lower than estimate. Either estimate is an approximate
local minimum of the function or steptol
is too small.
Iteration limit exceeded.
Maximum step size stepmax
exceeded 5 consecutive times. Either the function is unbounded below,
becomes asymptotic to a finite value from above in some direction or stepmax
is too small.
msg
Message returned by optim
.
singular.hessian
Logical: true when fitting provided a singular hessian, indicating an overparamaterized model.
param.ini
The initial values provided to the ML algorithm.
na.action
A function defining the action taken when missing values are encountered.
Representation of models of formal class "glimQL" fitted by quasi-likelihood method.
Objects can be created by calls of the form new("glimQL", ...)
or,
more commonly, via the quasibin
or quasipois
functions.
CALL
The call of the function.
fm
A fitted model of class “glm”.
phi
The overdispersion parameter.
signature(object = "glimQL")
: Main results of “glimQL” models.
This function calculates point estimates of the intraclass correlation
from clustered binomial data
(with
the number of clusters),
using a 1-way random effect model. Three estimates, following methods referred to as “A”, “B” and “C”
in Goldstein et al. (2002), can be obtained.
iccbin(n, y, data, method = c("A", "B", "C"), nAGQ = 1, M = 1000)
iccbin(n, y, data, method = c("A", "B", "C"), nAGQ = 1, M = 1000)
n |
Vector of the denominators of the proportions. |
y |
Vector of the numerators of the proportions. |
data |
A data frame containing the variables n and y. |
method |
A character (“A”, “B” or “C”) defining the calculation method. See Details. |
nAGQ |
Same as in function |
M |
Number of Monte Carlo (MC) replicates used in method “B”. Default to 1000. |
Before computations, the clustered data are split into binary (0/1) observations (obs.
in cluster
).
The calculation methods are described in Goldstein et al. (2002).
Methods "A" and "B" assume a 1-way generalized linear mixed model,
and method "C" a 1-way linear mixed model.
For "A" and "B", function iccbin
uses the logistic binomial-Gaussian model:
where is a constant and
a cluster random effect with
.
The ML estimate of the variance component
is calculated with the function
glmer
of package lme4.
The intra-class correlation is then calculated with a first-order model linearization
around
in method “A”, and with Monte Carlo simulations in method “B”.
For “C”, function iccbin
provides the common ANOVA (moments) estimate of .
For details, see for instance Donner (1986), Searle et al. (1992) or Ukoumunne (2002).
An object of formal class “iccbin”, with 3 slots:
CALL |
The call of the function. |
features |
A character vector summarizing the main features of the method used. |
rho |
The point estimate of the intraclass correlation |
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Donner A., 1986, A review of inference procedures for the intraclass correlation coefficient in the one-way random
effects model. International Statistical Review 54, 67-82.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance components. Wiley, New York.
Ukoumunne, O. C., 2002. A comparison of confidence interval methods for the intraclass
correlation coefficient in cluster randomized trials. Statistics in Medicine 21, 3757-3774.
Golstein, H., Browne, H., Rasbash, J., 2002. Partitioning variation in multilevel models.
Understanding Statistics 1(4), 223-231.
data(rats) tmp <- rats[rats$group == "TREAT", ] # A: glmm (model linearization) iccbin(n, y, data = tmp, method = "A") iccbin(n, y, data = tmp, method = "A", nAGQ = 10) # B: glmm (Monte Carlo) iccbin(n, y, data = tmp, method = "B") iccbin(n, y, data = tmp, method = "B", nAGQ = 10, M = 1500) # C: lmm (ANOVA moments) iccbin(n, y, data = tmp, method = "C") ## Not run: # Example of confidence interval calculation with nonparametric bootstrap require(boot) foo <- function(X, ind) { n <- X$n[ind] y <- X$y[ind] X <- data.frame(n = n, y = y) iccbin(n = n, y = y, data = X, method = "C")@rho[1] } res <- boot(data = tmp[, c("n", "y")], statistic = foo, R = 500, sim = "ordinary", stype = "i") res boot.ci(res, conf = 0.95, type = "basic") ## End(Not run)
data(rats) tmp <- rats[rats$group == "TREAT", ] # A: glmm (model linearization) iccbin(n, y, data = tmp, method = "A") iccbin(n, y, data = tmp, method = "A", nAGQ = 10) # B: glmm (Monte Carlo) iccbin(n, y, data = tmp, method = "B") iccbin(n, y, data = tmp, method = "B", nAGQ = 10, M = 1500) # C: lmm (ANOVA moments) iccbin(n, y, data = tmp, method = "C") ## Not run: # Example of confidence interval calculation with nonparametric bootstrap require(boot) foo <- function(X, ind) { n <- X$n[ind] y <- X$y[ind] X <- data.frame(n = n, y = y) iccbin(n = n, y = y, data = X, method = "C")@rho[1] } res <- boot(data = tmp[, c("n", "y")], statistic = foo, R = 500, sim = "ordinary", stype = "i") res boot.ci(res, conf = 0.95, type = "basic") ## End(Not run)
Representation of the output of function iccbin
.
Objects can be created by calls of the form new("iccbin", ...)
or, more commonly,
via the function iccbin
.
CALL
The call of the function.
features
A character vector summarizing the main features of the method used.
rho
A numeric scalar giving the intra-cluster correlation.
signature(object = "iccbin")
: see iccbin
.
The function transforms a variable from the link scale to the observation scale: probability or count.
invlink(x, type = c("cloglog", "log", "logit"))
invlink(x, type = c("cloglog", "log", "logit"))
x |
A vector of real numbers. |
type |
A character string. Legal values are “cloglog”, “log” and “logit”. |
x <- seq(-5, 5, length = 100) plot(x, invlink(x, type = "logit"), type = "l", lwd = 2, ylab = "Probability") lines(x, invlink(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(-5, 1, legend = c("alogit(x)", "acloglog(x)"), lty = c(1, 2), bg = "white")
x <- seq(-5, 5, length = 100) plot(x, invlink(x, type = "logit"), type = "l", lwd = 2, ylab = "Probability") lines(x, invlink(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(-5, 1, legend = c("alogit(x)", "acloglog(x)"), lty = c(1, 2), bg = "white")
The function transforms a variable from the observation scale (probability or count) to the link scale.
link(x, type = c("cloglog", "log", "logit"))
link(x, type = c("cloglog", "log", "logit"))
x |
A vector of real numbers. |
type |
A character string. Legal values are “cloglog”, “log” and “logit”. |
x <- seq(.001, .999, length = 100) plot(x, link(x, type = "logit"), type = "l", lwd = 2, ylab = "link(proba.)") lines(x, link(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(0, 6, legend = c("logit(x)", "cloglog(x)"), lty = c(1, 2), bg = "white")
x <- seq(.001, .999, length = 100) plot(x, link(x, type = "logit"), type = "l", lwd = 2, ylab = "link(proba.)") lines(x, link(x, type = "cloglog"), lty = 2, lwd = 2) grid(col = "black") legend(0, 6, legend = c("logit(x)", "cloglog(x)"), lty = c(1, 2), bg = "white")
“These data describe the daytime habits of two species of lizards, grahami and opalinus. They were collected by observing occupied sites or perches and recording the appropriate description, namely species involved, time of the day, height and diameter of the perch and whether the site was sunny or shaded. Time of the day is recorded as early, mid-day or late.” (McCullagh and Nelder, 1989, p.129).
data(lizards)
data(lizards)
A data frame with 24 observations on the following 6 variables.
A factor with levels Sun
and Shade
.
A factor with levels D <= 2
and D > 2
(inches).
A factor with levels H < 5
and H >= 5
(feet).
A factor with levels Early
, Mid-day
and Late
.
A numeric vector giving the observed sample size for grahami lizards.
A numeric vector giving the observed sample size for opalinus lizards.
The data were originally published in Fienberg (1970).
McCullagh, P., Nelder, J.A., 1989. Generalized linear models. London, Chapman & Hall, 511 p.
Fienberg, S.E., 1970. The analysis of multidimensional contingency tables. Ecology 51: 419-433.
data(lizards)
data(lizards)
Extracts the maximized log-likelihood from fitted models of formal class “glimML”.
## S4 method for signature 'glimML' logLik(object, ...)
## S4 method for signature 'glimML' logLik(object, ...)
object |
A fitted model of formal class “glimML” (functions |
... |
Other arguments passed to methods. |
A numeric scalar with 2 attributes: “df” (number of parameters in the model) and “nobs” (number of observations = degrees of freedom of the residuals + number of parameters in the model).
Generic function: see logLik
.
Extract the maximized log-likelihood from models of formal class “glimML”, fitted by functions
betabin
and negbin
.
logLik
in package stats.
Unpublished laboratory data on the proportion of affected foetuses in two groups (control and treatment) of 10 pregnant female mice (Kupper and Haseman, 1978, p. 75).
data(mice)
data(mice)
A data frame with 20 observations on the following 3 variables.
a factor with levels CTRL
and TREAT
a numeric vector: the total number of foetuses.
a numeric vector: the number of affected foetuses.
Kupper, L.L., Haseman, J.K., 1978. The use of a correlated binomial model for the analysis of a certain toxicological experiments. Biometrics 34, 69-76.
The function fits a negative-binomial log linear model accounting for overdispersion in counts .
negbin(formula, random, data, phi.ini = NULL, warnings = FALSE, na.action = na.omit, fixpar = list(), hessian = TRUE, control = list(maxit = 2000), ...)
negbin(formula, random, data, phi.ini = NULL, warnings = FALSE, na.action = na.omit, fixpar = list(), hessian = TRUE, control = list(maxit = 2000), ...)
formula |
A formula for the fixed effects. The left-hand side of the formula must be the counts |
random |
A right-hand formula for the overdispersion parameter(s) |
data |
A data frame containing the response ( |
phi.ini |
Initial values for the overdispersion parameter(s) |
warnings |
Logical to control printing of warnings occurring during log-likelihood maximization. Default to FALSE (no printing). |
na.action |
A function name. Indicates which action should be taken in the case of missing value(s). |
fixpar |
A list with 2 components (scalars or vectors) of the same size, indicating which parameters are
fixed (i.e., not optimized) in the global parameter vector |
hessian |
A logical. When set to |
control |
A list to control the optimization parameters. See |
... |
Further arguments passed to |
For a given count , the model is:
with following a Gamma distribution
.
If denote the gamma function, then:
The marginal negative-binomial distribution is:
The function uses the parameterization and
,
where
is a design-matrix,
is a vector of fixed effects,
is the linear predictor and
the overdispersion parameter.
The marginal mean and variance are:
The parameters and
are estimated by maximizing the log-likelihood of the marginal model (using the
function
optim()
). Several explanatory variables are allowed in . Only one is allowed in
.
An offset can be specified in the formula
argument to model rates . The offset and the marginal mean
are
and
, respectively.
An object of formal class “glimML”: see glimML-class
for details.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15(3): 209-225.
glimML-class
, glm
and optim
,glm.nb
in the recommended package MASS,gnlr
in package gnlm available at https://www.commanster.eu/rcode.html.
# without offset data(salmonella) negbin(y ~ log(dose + 10) + dose, ~ 1, salmonella) library(MASS) # function glm.nb in MASS fm.nb <- glm.nb(y ~ log(dose + 10) + dose, link = log, data = salmonella) coef(fm.nb) 1 / fm.nb$theta # theta = 1 / phi c(logLik(fm.nb), AIC(fm.nb)) # with offset data(dja) negbin(y ~ group + offset(log(trisk)), ~ group, dja) # phi fixed to zero in group TREAT negbin(y ~ group + offset(log(trisk)), ~ group, dja, fixpar = list(4, 0)) # glim without overdispersion summary(glm(y ~ group + offset(log(trisk)), family = poisson, data = dja)) # phi fixed to zero in both groups negbin(y ~ group + offset(log(trisk)), ~ group, dja, fixpar = list(c(3, 4), c(0, 0)))
# without offset data(salmonella) negbin(y ~ log(dose + 10) + dose, ~ 1, salmonella) library(MASS) # function glm.nb in MASS fm.nb <- glm.nb(y ~ log(dose + 10) + dose, link = log, data = salmonella) coef(fm.nb) 1 / fm.nb$theta # theta = 1 / phi c(logLik(fm.nb), AIC(fm.nb)) # with offset data(dja) negbin(y ~ group + offset(log(trisk)), ~ group, dja) # phi fixed to zero in group TREAT negbin(y ~ group + offset(log(trisk)), ~ group, dja, fixpar = list(4, 0)) # glim without overdispersion summary(glm(y ~ group + offset(log(trisk)), family = poisson, data = dja)) # phi fixed to zero in both groups negbin(y ~ group + offset(log(trisk)), ~ group, dja, fixpar = list(c(3, 4), c(0, 0)))
[Data describing the germination] “for seed Orobanche cernua cultivated in three dilutions of a bean root extract. The mean proportions of the three sets are 0.142, 0.872 and 0.842, and the overall mean is 0.614.” (Crowder, 1978, Table 1).
data(orob1)
data(orob1)
A data frame with 16 observations on the following 3 variables.
a factor with 3 levels: 1/1
, 1/25
and 1/625
.
a numeric vector: the number of seeds exposed to germination.
a numeric vector: the number of seeds which actually germinated.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
“A 2 x 2 factorial experiment comparing 2 types of seed and 2 root extracts. There are 5 or 6 replicates in each of the 4 treatment groups, and each replicate comprises a number of seeds varying between 4 and 81. The response variable is the proportion of seeds germinating in each replicate.” (Crowder, 1978, Table 3).
data(orob2)
data(orob2)
A data frame with 21 observations on the following 4 variables.
a factor with 2 levels: O73
and O75
.
a factor with 2 levels BEAN
and CUCUMBER
.
a numeric vector: the number of seeds exposed to germination.
a numeric vector: the number of seeds which actually germinated.
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
“predict” methods for fitted models generated by functions in package aod.
## S4 method for signature 'glimML' predict(object, newdata = NULL, type = c("response", "link"), se.fit = FALSE, ...) ## S4 method for signature 'glimQL' predict(object, newdata = NULL, type = c("response", "link"), se.fit = FALSE, ...)
## S4 method for signature 'glimML' predict(object, newdata = NULL, type = c("response", "link"), se.fit = FALSE, ...) ## S4 method for signature 'glimQL' predict(object, newdata = NULL, type = c("response", "link"), se.fit = FALSE, ...)
object |
A fitted model of formal class “glimML” (functions |
newdata |
A data.frame providing all the explanatory variables necessary for predictions. |
type |
A character string indicating the scale on which predictions are made: either “response” for predictions on the observation scale, or “link” for predictions on the scale of the link. |
se.fit |
A logical scalar indicating whether pointwise standard errors should be computed for the predictions. |
... |
Other arguments passed to methods. |
Compute predictions for models of formal class “glimML”, presently generated
by functions betabin
and negbin
. See the examples for these functions.
Compute predictions for models of formal class “glimQL”, presently generated by the
functions quasibin
and quasibin
. See the examples for these functions.
The function fits the generalized linear model “II” proposed by Williams (1982) accounting
for overdispersion in clustered binomial data .
quasibin(formula, data, link = c("logit", "cloglog"), phi = NULL, tol = 0.001)
quasibin(formula, data, link = c("logit", "cloglog"), phi = NULL, tol = 0.001)
formula |
A formula for the fixed effects. The left-hand side of the formula must be of the form
|
link |
The link function for the mean |
data |
A data frame containing the response ( |
phi |
When |
tol |
A positive scalar (default to 0.001). The algorithm stops at iteration |
For a given cluster , the model is:
with a random variable of mean
and variance
.
The marginal mean and variance are:
The overdispersion parameter corresponds to the intra-cluster correlation coefficient,
which is here restricted to be positive.
The function uses the function glm
and the parameterization: , where
is the
inverse of a given link function,
is a design-matrix,
is a vector of fixed effects and
is the linear predictor.
The estimate of maximizes the quasi log-likelihood of the marginal model.
The parameter
is estimated with the moment method or can be set to a constant
(a regular glim is fitted when
is set to zero). The literature recommends to estimate
from the saturated model. Several explanatory variables are allowed in
. None is allowed in
.
An object of formal class “glimQL”: see glimQL-class
for details.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Moore, D.F., 1987, Modelling the extraneous variance in the presence of extra-binomial variation.
Appl. Statist. 36, 8-14.
Williams, D.A., 1982, Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-148.
glm
, geese
in the contributed package geepack,
glm.binomial.disp
in the contributed package dispmod.
data(orob2) fm1 <- glm(cbind(y, n - y) ~ seed * root, family = binomial, data = orob2) fm2 <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2, phi = 0) fm3 <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2) rbind(fm1 = coef(fm1), fm2 = coef(fm2), fm3 = coef(fm3)) # show the model fm3 # dispersion parameter and goodness-of-fit statistic c(phi = fm3@phi, X2 = sum(residuals(fm3, type = "pearson")^2)) # model predictions predfm1 <- predict(fm1, type = "response", se = TRUE) predfm3 <- predict(fm3, type = "response", se = TRUE) New <- expand.grid(seed = levels(orob2$seed), root = levels(orob2$root)) predict(fm3, New, se = TRUE, type = "response") data.frame(orob2, p1 = predfm1$fit, se.p1 = predfm1$se.fit, p3 = predfm3$fit, se.p3 = predfm3$se.fit) fm4 <- quasibin(cbind(y, n - y) ~ seed + root, data = orob2, phi = fm3@phi) # Pearson's chi-squared goodness-of-fit statistic # compare with fm3's X2 sum(residuals(fm4, type = "pearson")^2)
data(orob2) fm1 <- glm(cbind(y, n - y) ~ seed * root, family = binomial, data = orob2) fm2 <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2, phi = 0) fm3 <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2) rbind(fm1 = coef(fm1), fm2 = coef(fm2), fm3 = coef(fm3)) # show the model fm3 # dispersion parameter and goodness-of-fit statistic c(phi = fm3@phi, X2 = sum(residuals(fm3, type = "pearson")^2)) # model predictions predfm1 <- predict(fm1, type = "response", se = TRUE) predfm3 <- predict(fm3, type = "response", se = TRUE) New <- expand.grid(seed = levels(orob2$seed), root = levels(orob2$root)) predict(fm3, New, se = TRUE, type = "response") data.frame(orob2, p1 = predfm1$fit, se.p1 = predfm1$se.fit, p3 = predfm3$fit, se.p3 = predfm3$se.fit) fm4 <- quasibin(cbind(y, n - y) ~ seed + root, data = orob2, phi = fm3@phi) # Pearson's chi-squared goodness-of-fit statistic # compare with fm3's X2 sum(residuals(fm4, type = "pearson")^2)
The function fits the log linear model (“Procedure II”) proposed by Breslow (1984) accounting for
overdispersion in counts .
quasipois(formula, data, phi = NULL, tol = 0.001)
quasipois(formula, data, phi = NULL, tol = 0.001)
formula |
A formula for the fixed effects. The left-hand side of the formula must be the counts |
data |
A data frame containing the response ( |
phi |
When |
tol |
A positive scalar (default to 0.001). The algorithm stops at iteration |
For a given count , the model is:
with a random variable of mean
and variance
.
The marginal mean and variance are:
The function uses the function glm
and the parameterization: , where
is a design-matrix,
is a vector of fixed effects and
is the linear predictor.
The estimate of maximizes the quasi log-likelihood of the marginal model.
The parameter
is estimated with the moment method or can be set to a constant
(a regular glim is fitted when
is set to 0). The literature recommends to estimate
with the saturated model. Several explanatory variables are allowed in
. None is allowed in
.
An offset can be specified in the argument formula
to model rates (see examples). The offset and the
marginal mean are
and
, respectively.
An object of formal class “glimQL”: see glimQL-class
for details.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Breslow, N.E., 1984. Extra-Poisson variation in log-linear models. Appl. Statist. 33, 38-44.
Moore, D.F., Tsiatis, A., 1991. Robust estimation of the variance in moment methods for extra-binomial
and extra-poisson variation. Biometrics 47, 383-401.
glm
, negative.binomial
in the recommended package MASS,
geese
in the contributed package geepack,
glm.poisson.disp
in the contributed package dispmod.
# without offset data(salmonella) quasipois(y ~ log(dose + 10) + dose, data = salmonella) quasipois(y ~ log(dose + 10) + dose, data = salmonella, phi = 0.07180449) summary(glm(y ~ log(dose + 10) + dose, family = poisson, data = salmonella)) quasipois(y ~ log(dose + 10) + dose, data = salmonella, phi = 0) # with offset data(cohorts) i <- cohorts$age ; levels(i) <- 1:7 j <- cohorts$period ; levels(j) <- 1:7 i <- as.numeric(i); j <- as.numeric(j) cohorts$cohort <- j + max(i) - i cohorts$cohort <- as.factor(1850 + 5 * cohorts$cohort) fm1 <- quasipois(y ~ age + period + cohort + offset(log(n)), data = cohorts) fm1 quasipois(y ~ age + cohort + offset(log(n)), data = cohorts, phi = fm1@phi)
# without offset data(salmonella) quasipois(y ~ log(dose + 10) + dose, data = salmonella) quasipois(y ~ log(dose + 10) + dose, data = salmonella, phi = 0.07180449) summary(glm(y ~ log(dose + 10) + dose, family = poisson, data = salmonella)) quasipois(y ~ log(dose + 10) + dose, data = salmonella, phi = 0) # with offset data(cohorts) i <- cohorts$age ; levels(i) <- 1:7 j <- cohorts$period ; levels(j) <- 1:7 i <- as.numeric(i); j <- as.numeric(j) cohorts$cohort <- j + max(i) - i cohorts$cohort <- as.factor(1850 + 5 * cohorts$cohort) fm1 <- quasipois(y ~ age + period + cohort + offset(log(n)), data = cohorts) fm1 quasipois(y ~ age + cohort + offset(log(n)), data = cohorts, phi = fm1@phi)
Experimental data for analyzing the effect of an increasing dose of a compound on the proportion of live foetuses affected. Four treatment-groups were considered: control “C”, low dose “L”, medium dose “M” and high dose “H”. The animal species used in the experiment was banded Dutch rabbit (Paul, 1982, Table 1).
data(rabbits)
data(rabbits)
A data frame with 84 observations on the following 3 variables.
a factor with levels C
, H
, L
and M
a numeric vector: the total number of foetuses.
a numeric vector: the number of affected foetuses.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments. Biometrics 38, 361-370.
Tests the homogeneity of proportions between groups (H0:
) from clustered binomial
data
using the adjusted
statistic proposed by Rao and Scott (1993).
raoscott(formula = NULL, response = NULL, weights = NULL, group = NULL, data, pooled = FALSE, deff = NULL)
raoscott(formula = NULL, response = NULL, weights = NULL, group = NULL, data, pooled = FALSE, deff = NULL)
formula |
An optional formula where the left-hand side is either a matrix of the form |
response |
An optional argument: either a matrix of the form |
weights |
An optional argument used when the left-hand side of |
group |
An optional argument only used when |
data |
A data frame containing the response ( |
pooled |
Logical indicating if a pooled design effect is estimated over the |
deff |
A numerical vector of |
The method is based on the concepts of design effect and effective sample size.
The design effect in each group is estimated by
, where
is
the variance of the ratio estimate of the probability in group
(Cochran, 1999, p. 32 and p. 66)
and
is the standard binomial variance. A pooled design effect (i.e., over the
groups)
is estimated if argument
pooled = TRUE
(see Rao and Scott, 1993, Eq. 6). Fixed design effects can be specified
with the argument deff
.
The are used to compute the effective sample sizes
, the effective numbers
of successes
in each group
, and the overall effective proportion
.
The test statistic is obtained by substituting these quantities in the usual
statistic,
yielding:
which is compared to a distribution with
degrees of freedom.
An object of formal class “drs”: see drs-class
for details. The slot tab
provides the proportion of successes, the variances of the proportion and the design effect for each group.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Cochran, W.G., 1999, 2nd ed. Sampling techniques. John Wiley & Sons, New York.
Rao, J.N.K., Scott, A.J., 1992. A simple method for the analysis of clustered binary data.
Biometrics 48, 577-585.
chisq.test
, donner
, iccbin
, drs-class
data(rats) # deff by group raoscott(cbind(y, n - y) ~ group, data = rats) raoscott(y/n ~ group, weights = n, data = rats) raoscott(response = cbind(y, n - y), group = group, data = rats) raoscott(response = y/n, weights = n, group = group, data = rats) # pooled deff raoscott(cbind(y, n - y) ~ group, data = rats, pooled = TRUE) # standard test raoscott(cbind(y, n - y) ~ group, data = rats, deff = c(1, 1)) data(antibio) raoscott(cbind(y, n - y) ~ treatment, data = antibio)
data(rats) # deff by group raoscott(cbind(y, n - y) ~ group, data = rats) raoscott(y/n ~ group, weights = n, data = rats) raoscott(response = cbind(y, n - y), group = group, data = rats) raoscott(response = y/n, weights = n, group = group, data = rats) # pooled deff raoscott(cbind(y, n - y) ~ group, data = rats, pooled = TRUE) # standard test raoscott(cbind(y, n - y) ~ group, data = rats, deff = c(1, 1)) data(antibio) raoscott(cbind(y, n - y) ~ treatment, data = antibio)
“Weil (1970) in Table 1 gives the results from an experiment comprising two treatments. One group of
16 pregnant female rats was fed a control diet during pregnancy and lactation, the diet of a second group of 16 pregnant
females was treated with a chemical. For each litter the number of pups alive at 4 days and the number
of pups that survived the 21 day lactation period were recorded.” (Williams, 1975, p. 951).
data(rats)
data(rats)
A data frame with 32 observations on the following 3 variables.
A factor with levels CTRL
and TREAT
A numeric vector: the number of pups alive at 4 days.
A numeric vector: the number of pups that survived the 21 day lactation.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.
Weil, C.S., 1970. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Fd. Cosmet. Toxicol. 8, 177-182.
Residuals of models fitted with functions betabin
and negbin
(formal class “glimML”), or
quasibin
and quasipois
(formal class “glimQL”).
## S4 method for signature 'glimML' residuals(object, type = c("pearson", "response"), ...) ## S4 method for signature 'glimQL' residuals(object, type = c("pearson", "response"), ...)
## S4 method for signature 'glimML' residuals(object, type = c("pearson", "response"), ...) ## S4 method for signature 'glimQL' residuals(object, type = c("pearson", "response"), ...)
object |
Fitted model of formal class “glimML” or “glimQL”. |
type |
Character string for the type of residual: “pearson” (default) or “response”. |
... |
Further arguments to be passed to the function, such as |
For models fitted with betabin
or quasibin
, Pearson's residuals are computed as:
where and
are respectively the numerator and the denominator of the response,
is the fitted probability and
is the fitted overdispersion parameter. When
, the
residual is set to 0. Response residuals are computed as
.
For models fitted with negbin
or quasipois
, Pearson's residuals are computed as:
where and
are the observed and fitted counts, respectively. Response residuals are
computed as
.
A numeric vector of residuals.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
data(orob2) fm <- betabin(cbind(y, n - y) ~ seed, ~ 1, link = "logit", data = orob2) #Pearson's chi-squared goodness-of-fit statistic sum(residuals(fm, type = "pearson")^2)
data(orob2) fm <- betabin(cbind(y, n - y) ~ seed, ~ 1, link = "logit", data = orob2) #Pearson's chi-squared goodness-of-fit statistic sum(residuals(fm, type = "pearson")^2)
“Data for our third example were compiled by Margolin et al. (1981) from an Ames Salmonella reverse mutagenicity assay. Table 1 shows the number of revertant colonies observed on each of 3 replicate plates tested at each of 6 dose levels of quinoline.” (Breslow, 1984, Table 1).
data(salmonella)
data(salmonella)
A data frame with 18 observations on the following 2 variables.
a numeric vector: the dose level of quinoline (microgram per plate).
a numeric vector: the number of revertant colonies of TA98 Salmonella.
Breslow, N.E., 1984. Extra-Poisson variation in log-linear models. Applied Statistics 33(1), 38-44.
Margolin, B.H., Kaplan, N., Zeiger, E., 1981. Statistical analysis of the Ames Salmonella/microsome test. Proc. Natl Acad. Sci. USA 76, 3779-3783.
The function splits grouped data and optional covariates into individual data.
Two types of grouped data are managed by splitbin
:
Grouped data with weights;
Grouped data of form cbind(success, failure)
.
When weights, successes or failures involve non-integer numbers, these numbers are rounded before splitting.
splitbin(formula, data, id = "idbin")
splitbin(formula, data, id = "idbin")
formula |
A formula. The left-hand side represents grouped data. The right-hand side defines the covariates. See examples for syntax. |
data |
A data frame where all the variables described in |
id |
An optional character string naming the identifier (= grouping factor). Default to “idbin”. |
A data frame built according to the formula and function used in the call.
# Grouped data with weights mydata <- data.frame( success = c(0, 1, 0, 1), f1 = c("A", "A", "B", "B"), f2 = c("C", "D", "C", "D"), n = c(4, 2, 1, 3) ) mydata splitbin(formula = n ~ f1, data = mydata)$tab splitbin(formula = n ~ f1 + f2 + success , data = mydata)$tab # Grouped data of form "cbind(success, failure)" mydata <- data.frame( success = c(4, 1), failure = c(1, 2), f1 = c("A", "B"), f2 = c("C", "D") ) mydata$n <- mydata$success + mydata$failure mydata splitbin(formula = cbind(success, failure) ~ 1, data = mydata)$tab splitbin(formula = cbind(success, failure) ~ f1 + f2, data = mydata)$tab splitbin(formula = cbind(success, n - success) ~ f1 + f2, data = mydata)$tab splitbin(formula = cbind(success, n - 0.5 * failure - success) ~ f1 + f2, data = mydata)$tab
# Grouped data with weights mydata <- data.frame( success = c(0, 1, 0, 1), f1 = c("A", "A", "B", "B"), f2 = c("C", "D", "C", "D"), n = c(4, 2, 1, 3) ) mydata splitbin(formula = n ~ f1, data = mydata)$tab splitbin(formula = n ~ f1 + f2 + success , data = mydata)$tab # Grouped data of form "cbind(success, failure)" mydata <- data.frame( success = c(4, 1), failure = c(1, 2), f1 = c("A", "B"), f2 = c("C", "D") ) mydata$n <- mydata$success + mydata$failure mydata splitbin(formula = cbind(success, failure) ~ 1, data = mydata)$tab splitbin(formula = cbind(success, failure) ~ f1 + f2, data = mydata)$tab splitbin(formula = cbind(success, n - success) ~ f1 + f2, data = mydata)$tab splitbin(formula = cbind(success, n - 0.5 * failure - success) ~ f1 + f2, data = mydata)$tab
Computes Akaike difference and Akaike weights from an object of formal class “aic”.
## S4 method for signature 'aic' summary(object, which = c("AIC", "AICc"))
## S4 method for signature 'aic' summary(object, which = c("AIC", "AICc"))
object |
An object of formal class “aic”. |
which |
A character string indicating which information criterion is selected to compute Akaike difference and Akaike weights: either “AIC” or “AICc”. |
The models are ordered according to AIC or AICc and 3 statistics are computed:
- the Akaike difference : the change in AIC (or AICc) between successive (ordered) models,
- the Akaike weight : when
models are compared,
,
- the cumulative Akaike weight : the Akaike weights sum to 1 for the
models which
are compared.
Burnham, K.P., Anderson, D.R., 2002. Model selection and multimodel inference: a practical
information-theoretic approach. New-York, Springer-Verlag, 496 p.
Hurvich, C.M., Tsai, C.-L., 1995. Model selection for extended quasi-likelihood models in small samples.
Biometrics, 51 (3): 1077-1084.
Examples in betabin
and AIC
in package stats.
Summary of a model of formal class “glimML” fitted by betabin
or negbin
.
Objects can be created by calls of the form new("summary.glimML", ...)
or,
more commonly, via the summary
or show
method for objects of formal class “glimML”.
object
An object of formal class “glimML”.
Coef
A data frame containing the estimates, standard error, z and P values for the fixed-effect coefficients which were estimated by the fitting function.
FixedCoef
A data frame containing the values of the fixed-effect coefficients which were set to a fixed value.
Phi
A data frame containing the estimates, standard error, z and P values for the overdispersion
coefficients which were estimated by the fitting function. Because the overdispersion
coefficients are , P values correspond to unilateral tests.
FixedPhi
A data frame containing the values of the overdispersion coefficients which were set to a fixed value.
signature(object = "summary.glimML")
signature(object = "glimML")
signature(object = "glimML")
data(orob2) fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) # show objects of class "glimML" fm1 # summary for objects of class "glimML" res <- summary(fm1) res@Coef # show objects of class "summary.glimML" res
data(orob2) fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2) # show objects of class "glimML" fm1 # summary for objects of class "glimML" res <- summary(fm1) res@Coef # show objects of class "summary.glimML" res
This function computes the mean and variance of a proportion from clustered binomial data , using various
methods. Confidence intervals are computed using a normal approximation, which might be inappropriate when the
proportion is close to 0 or 1.
varbin(n, y, data, alpha = 0.05, R = 5000)
varbin(n, y, data, alpha = 0.05, R = 5000)
n |
The denominator of the proportion. |
y |
The numerator of the proportion. |
data |
A data frame containing the data. |
alpha |
The significance level for the confidence intervals. Default to 0.05, providing 95% CI's. |
R |
The number of bootstrap replicates to compute the bootstrap mean and variance. |
Five methods are used for the estimations. Let us
consider clusters of sizes
with observed responses (counts)
.
We note
the observed proportions
. An underlying assumption is that the
theoretical proportion is homogeneous across the clusters.
Binomial method: the proportion and its variance are estimated as and
, respectively.
Ratio method: the one-stage cluster sampling formula is used to estimate the variance of the ratio estimate
(see Cochran, 1999, p. 32 and p. 66). The proportion is estimated as above ().
Arithmetic method: the proportion is estimated as , with estimated variance
.
Jackknife method: the proportion is the arithmetic mean of the pseudovalues
, with estimated
variance
(Gladen, 1977, Paul, 1982).
Bootstrap method: samples of size
are drawn with equal probability from the initial sample
(Efron and Tibshirani, 1993). The bootstrap estimate
and its estimated variance
are the arithmetic mean and the empirical variance (computed with denominator
) of the
binomial
estimates, respectively.
An object of formal class “varbin”, with 5 slots:
CALL |
The call of the function. |
tab |
A 4-column data frame giving for each estimation method the mean, variance, upper and lower limits
of the |
boot |
A numeric vector containing the R bootstrap replicates of the proportion. Might be used to compute other kinds of CI's for the proportion. |
alpha |
The significance level used to compute the |
features |
A numeric vector with 3 components summarizing the main features of the data: |
The “show” method displays the slot tab
described above, substituting the standard error to the variance.
Matthieu Lesnoff [email protected], Renaud Lancelot [email protected]
Cochran, W.G., 1999, 3th ed. Sampling techniques. Wiley, New York.
Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. Chapman and Hall, London.
Gladen, B., 1977. The use of the jackknife to estimate proportions from toxicological data in the presence
of litter effects. JASA 74(366), 278-283.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments.
Biometrics 38, 361-370.
data(rabbits) varbin(n, y, rabbits[rabbits$group == "M", ]) by(rabbits, list(group = rabbits$group), function(x) varbin(n = n, y = y, data = x, R = 1000))
data(rabbits) varbin(n, y, rabbits[rabbits$group == "M", ]) by(rabbits, list(group = rabbits$group), function(x) varbin(n = n, y = y, data = x, R = 1000))
Representation of the output of function varbin
used to estimate proportions and their variance
under various distribution assumptions.
Objects can be created by calls of the form new("varbin", ...)
or, more commonly,
via the function varbin
.
CALL
The call of the function.
tab
A data frame containing the estimates, their variance and the confidence limits.
pboot
A numeric vector containing the bootstrap replicates.
alpha
The level to compute confidence intervals.
features
A named numeric vector summarizing the design.
signature(object = "varbin")
: see varbin
.
Extract the approximate var-cov matrix of estimated coefficients from fitted models.
Generic function: see vcov
.
Extract the var-cov matrix of estimated coefficients for fitted models of formal class “glimML”.
Extract the var-cov matrix of estimated coefficients for fitted models of formal class “glimQL”.
Extract the var-cov matrix of estimated coefficients for fitted models of class “geese” (contributed package geepack).
Extract the var-cov matrix of estimated coefficients for fitted objects of class “geeglm” (contributed package geepack).
Computes a Wald test for 1 or more coefficients, given their variance-covariance matrix.
wald.test(Sigma, b, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE) ## S3 method for class 'wald.test' print(x, digits = 2, ...)
wald.test(Sigma, b, Terms = NULL, L = NULL, H0 = NULL, df = NULL, verbose = FALSE) ## S3 method for class 'wald.test' print(x, digits = 2, ...)
Sigma |
A var-cov matrix, usually extracted from one of the fitting functions (e.g., |
b |
A vector of coefficients with var-cov matrix |
Terms |
An optional integer vector specifying which coefficients should be jointly tested, using a Wald
|
L |
An optional matrix conformable to |
H0 |
A numeric vector giving the null hypothesis for the test. It must be as long as |
df |
A numeric vector giving the degrees of freedom to be used in an |
verbose |
A logical scalar controlling the amount of output information. The default is |
x |
Object of class “wald.test” |
digits |
Number of decimal places for displaying test results. Default to 2. |
... |
Additional arguments to |
The key assumption is that the coefficients asymptotically follow a (multivariate) normal distribution with mean =
model coefficients and variance = their var-cov matrix.
One (and only one) of Terms
or L
must be given. When L
is given, it must have the same number of
columns as the length of b
, and the same number of rows as the number of linear combinations of coefficients.
When df
is given, the Wald statistic is divided by
m
= the number of
linear combinations of coefficients to be tested (i.e., length(Terms)
or nrow(L)
). Under the null
hypothesis H0
, this new statistic follows an distribution.
An object of class wald.test
, printed with print.wald.test
.
Diggle, P.J., Liang, K.-Y., Zeger, S.L., 1994. Analysis of longitudinal data. Oxford, Clarendon Press, 253 p.
Draper, N.R., Smith, H., 1998. Applied Regression Analysis. New York, John Wiley & Sons, Inc., 706 p.
data(orob2) fm <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2) # Wald test for the effect of root wald.test(b = coef(fm), Sigma = vcov(fm), Terms = 3:4)
data(orob2) fm <- quasibin(cbind(y, n - y) ~ seed * root, data = orob2) # Wald test for the effect of root wald.test(b = coef(fm), Sigma = vcov(fm), Terms = 3:4)