Title: | Count Regression Models Based on the Bell Distribution |
---|---|
Description: | Bell regression models for count data with overdispersion. The implemented models account for ordinary and zero-inflated regression models under both frequentist and Bayesian approaches. Theoretical details regarding the models implemented in the package can be found in Castellares et al. (2018) <doi:10.1016/j.apm.2017.12.014> and Lemonte et al. (2020) <doi:10.1080/02664763.2019.1636940>. |
Authors: | Fabio Demarqui [aut, cre, cph], Marcos Prates [ctb], Fredy Caceres [ctb], Andrew Johnson [ctb] |
Maintainer: | Fabio Demarqui <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.2.2 |
Built: | 2024-10-24 04:28:04 UTC |
Source: | CRAN |
Bell regression models for count data with overdispersion. The implemented models account for ordinary and zero-inflated regression models under both frequentist and Bayesian approaches. Theoretical details regarding the models implemented in the package can be found in Castellares et al. (2018) and Lemonte et al. (2020).
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.19.3. https://mc-stan.org
Castellares F, Ferrari SL, Lemonte AJ (2018). “On the Bell distribution and its associated regression model for count data.” Applied Mathematical Modelling, 56, 172-185. doi:10.1016/j.apm.2017.12.014.
Lemonte AJ, Moreno-Arenas G, Castellares F (2020). “Zero-inflated Bell regression models for count data.” Journal of Applied Statistics, 47(2), 265-286. doi:10.1080/02664763.2019.1636940.
Akaike information criterion
## S3 method for class 'bellreg'
AIC(object, ..., k = 2)
object |
an object of the class bellreg. |
... |
further arguments passed to or from other methods. |
k |
numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. |
the Akaike information criterion value when a single model is passed to the function; otherwise, a data.frame with the Akaike information criterion values and the number of parameters is returned.
library(bellreg)
data(faults)
fit1 <- bellreg(nf ~ 1, data = faults, approach = "mle")
fit2 <- bellreg(nf ~ lroll, data = faults, approach = "mle")
AIC(fit1, fit2)
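As a rough illustration of the quantity described above, the criterion equals -2 times the maximized log-likelihood plus k times the number of parameters. The helper `aic_value` and the log-likelihood values below are hypothetical; they are not part of the package and are not computed from the faults data.

```r
# Generic AIC: -2 * logLik + k * (number of estimated parameters).
# NOTE: aic_value() is a hypothetical helper, not a bellreg function.
aic_value <- function(loglik, npar, k = 2) {
  -2 * loglik + k * npar
}

# Made-up maximized log-likelihoods for two nested models
loglik <- c(intercept_only = -100.5, with_covariate = -95.2)
npar   <- c(1, 2)

# One row per model, mirroring the data.frame returned for several models
data.frame(npar = npar, AIC = aic_value(loglik, npar))
```

Setting `k = log(n)`, with `n` the number of observations, gives the BIC-type penalty instead of the classical AIC.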
Akaike information criterion for zibellreg objects
## S3 method for class 'zibellreg'
AIC(object, ..., k = 2)
object |
an object of the class zibellreg. |
... |
further arguments passed to or from other methods. |
k |
numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. |
the Akaike information criterion value when a single model is passed to the function; otherwise, a data.frame with the Akaike information criterion values and the number of parameters is returned.
library(bellreg)
data(cells)
fit1 <- zibellreg(cells ~ 1|1, data = cells, approach = "mle")
fit2 <- zibellreg(cells ~ 1|smoker+gender, data = cells, approach = "mle")
fit3 <- zibellreg(cells ~ smoker+gender|smoker+gender, data = cells, approach = "mle")
AIC(fit1, fit2, fit3)
Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for the details on how such model fitting takes place.
bell(link = "log")
link |
a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector, or an object of class "link-glm". |
family is a generic function with methods for classes "glm" and "lm" (the latter returning gaussian()).
For the binomial and quasibinomial families the response can be specified in one of three ways:
As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124–8). Although they show that there is (under some restrictions) a model with variance proportional to mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.
An object of class "family" (which has a concise print method). This is a list with elements
family |
character: the family name. |
link |
character: the link name. |
linkfun |
function: the link. |
linkinv |
function: the inverse of the link function. |
variance |
function: the variance as a function of the mean. |
dev.resids |
function giving the deviance for each observation as a function of |
aic |
function giving the AIC value if appropriate (but |
mu.eta |
function: derivative of the inverse-link function with respect to the linear predictor. If the inverse-link function is |
initialize |
expression. This needs to set up whatever data objects are needed for the family as well as |
validmu |
logical function. Returns |
valideta |
logical function. Returns |
simulate |
(optional) function |
dispersion |
(optional since R version 4.3.0) numeric: value of the dispersion parameter, if fixed, or |
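The list elements described above can be inspected directly on any family object. The sketch below uses `stats::poisson()` from base R so that it runs without this package; it is only an assumption that an object returned by `bell()` can be explored in the same way.

```r
# Inspect the components of a "family" object using base R's poisson().
fam <- stats::poisson(link = "log")

class(fam)    # "family"
fam$family    # family name, as a character string
fam$link      # link name, as a character string

eta <- c(-1, 0, 2)            # values of the linear predictor
mu  <- fam$linkinv(eta)       # inverse link: exp(eta) for the log link

fam$linkfun(mu)               # link applied to mu recovers eta
fam$variance(mu)              # variance function; equals mu for Poisson
fam$mu.eta(eta)               # d mu / d eta; equals exp(eta) for the log link
fam$validmu(mu)               # TRUE when mu is in the domain of variance
```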
The link and variance arguments have rather awkward semantics for back-compatibility. The recommended way is to supply them as quoted character strings, but they can also be supplied unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for link, of class "link-glm"). The restrictions apply only to links given as names: when given as a character string all the links known to make.link are accepted.
This is potentially ambiguous: supplying link = logit could mean the unquoted name of a link or the value of object logit. It is interpreted if possible as the name of an allowed link, then as an object. (You can force the interpretation to always be the value of an object via logit[1].)
The design was inspired by S functions of the same names described in Hastie & Pregibon (1992) (except quasibinomial and quasipoisson).
McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.
Cox, D. R. and Snell, E. J. (1981). Applied Statistics; Principles and Examples. London: Chapman and Hall.
Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial and NegBinomial.
library(bellreg)
data(faults)
fit <- glm(nf ~ lroll, data = faults, family = bell("log"))
summary(fit)
Probability function, distribution function, quantile function and random generation for the Bell distribution with parameter theta.
dbell(x, theta, log = FALSE)
pbell(q, theta, lower.tail = TRUE, log.p = FALSE)
qbell(p, theta, log.p = FALSE)
rbell(n, theta)
x |
vector of (non-negative integer) quantiles. |
theta |
parameter of the Bell distribution (theta > 0). |
log, log.p |
logical; if TRUE, probabilities p are given as log(p). |
q |
vector of quantiles. |
lower.tail |
logical; if TRUE (default), probabilities are P[X ≤ x]; otherwise, P[X > x]. |
p |
vector of probabilities. |
n |
number of random values to return. |
Probability mass function
$$P(X = x) = \frac{\theta^{x} \exp\left(-e^{\theta} + 1\right) B_x}{x!},$$
where $B_x$ is the Bell number, and x = 0, 1, ....
dbell gives the (log) probability function, pbell gives the (log) distribution function, qbell gives the quantile function, and rbell generates random deviates.
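The probability function can be checked numerically in base R. The sketch below is not the package's implementation: `bell_numbers` and `bell_pmf` are hypothetical helpers that build the Bell numbers from the recurrence B_n = sum_k choose(n - 1, k) B_k and evaluate the standard Bell probability mass function theta^x exp(1 - e^theta) B_x / x!.

```r
# Bell numbers B_0, ..., B_{n_max} via B_n = sum_{k=0}^{n-1} choose(n-1, k) * B_k.
# NOTE: hypothetical helper, not exported by bellreg.
bell_numbers <- function(n_max) {
  B <- numeric(n_max + 1)
  B[1] <- 1                                   # B_0 = 1
  for (n in seq_len(n_max)) {
    k <- 0:(n - 1)
    B[n + 1] <- sum(choose(n - 1, k) * B[k + 1])
  }
  B
}

# Bell pmf: theta^x * exp(1 - e^theta) * B_x / x!  (hypothetical helper)
bell_pmf <- function(x, theta) {
  stopifnot(theta > 0, all(x >= 0), all(x == round(x)))
  B <- bell_numbers(max(x))
  exp(x * log(theta) - (exp(theta) - 1) - lgamma(x + 1)) * B[x + 1]
}

x <- 0:60
p <- bell_pmf(x, theta = 0.8)

sum(p)        # numerically close to 1
sum(x * p)    # numerically close to the mean, theta * exp(theta)
```

If the package is installed, `dbell(x, theta = 0.8)` should be comparable with `p`, assuming it uses the same parameterization.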
Fits the Bell regression model to overdispersed count data.
bellreg(
  formula,
  data = NULL,
  approach = c("mle", "bayes"),
  hessian = TRUE,
  link = c("log", "sqrt", "identity"),
  hyperpars = list(mu_beta = 0, sigma_beta = 10),
  ...
)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which bellreg is called. |
approach |
approach to be used to fit the model (mle: maximum likelihood; bayes: Bayesian approach). |
hessian |
logical; if TRUE (default), the Hessian matrix is returned when approach = "mle". |
link |
assumed link function (log, sqrt or identity); default is log. |
hyperpars |
a list containing the hyperparameters associated with the prior distribution of the regression coefficients; if not specified, the default is hyperpars = list(mu_beta = 0, sigma_beta = 10). |
... |
further arguments passed to either |
bellreg returns an object of class "bellreg" containing the fitted model.
data(faults)
# ML approach:
mle <- bellreg(nf ~ lroll, data = faults, approach = "mle")
summary(mle)
# Bayesian approach:
bayes <- bellreg(nf ~ lroll, data = faults, approach = "bayes", refresh = FALSE)
summary(bayes)
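To make the role of the link function more concrete, the sketch below assumes the mean parameterization of Castellares et al. (2018), in which the Bell mean is mu = theta * exp(theta), so that theta is the Lambert W function of mu and the variance is mu * (1 + theta). The helper `lambert_w0`, the coefficient values and the covariate values are hypothetical and are not taken from the package.

```r
# Principal branch of the Lambert W function for mu > 0, by Newton's method:
# solves w * exp(w) = mu.  NOTE: hypothetical helper, not a bellreg function.
lambert_w0 <- function(mu, tol = 1e-12, max_iter = 100) {
  stopifnot(all(mu > 0))
  w <- log1p(mu)                               # starting value at or above the root
  for (i in seq_len(max_iter)) {
    step <- (w * exp(w) - mu) / (exp(w) * (w + 1))
    w <- w - step
    if (max(abs(step)) < tol) break
  }
  w
}

beta <- c(0.5, 0.02)                 # made-up regression coefficients
x    <- c(10, 50, 100)               # made-up covariate values

mu       <- exp(beta[1] + beta[2] * x)   # log link: mean of the response
theta    <- lambert_w0(mu)               # Bell parameter implied by the mean
variance <- mu * (1 + theta)             # variance exceeds the mean (overdispersion)

cbind(x, mu, theta, variance)
```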
Data set taken from Crawley (2012) and subsequently analyzed by Lemonte et al. (2020). The data include the count of infected blood cells per square millimetre on microscope slides prepared from n = 511 randomly selected individuals.
A data frame with 511 rows and 5 variables:
cells: count of infected blood cells per square millimetre on microscope slides
smoker: smoking status of the subject (0: smoker; 1: non-smoker).
gender: subject's gender (1: male; 0: female).
age: subject's age categorized into three levels: young (≤ 20), mid (21 to 59), and old (≥ 60).
weight: body mass score categorized into three levels: normal, overweight, obese.
Crawley MJ (2012). The R Book, 2nd edition. Wiley Publishing. ISBN 0470973927.
Lemonte AJ, Moreno-Arenas G, Castellares F (2020). “Zero-inflated Bell regression models for count data.” Journal of Applied Statistics, 47(2), 265-286. doi:10.1080/02664763.2019.1636940.
Estimated regression coefficients for the bellreg model
## S3 method for class 'bellreg'
coef(object, ...)
object |
an object of the class bellreg. |
... |
further arguments passed to or from other methods. |
a vector with the estimated regression coefficients.
data(faults)
fit <- bellreg(nf ~ lroll, data = faults)
coef(fit)
Estimated regression coefficients for zibellreg model
## S3 method for class 'zibellreg'
coef(object, ...)
object |
an object of the class zibellreg. |
... |
further arguments passed to or from other methods |
a list containing the estimated regression coefficients associated with the degenerate and Bell count distributions, respectively.
data(cells)
fit <- zibellreg(cells ~ smoker + gender|smoker + gender, data = cells)
coef(fit)
Confidence intervals for the regression coefficients
## S3 method for class 'bellreg'
confint(object, parm = NULL, level = 0.95, ...)
object |
an object of the class bellreg |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required |
... |
further arguments passed to or from other methods |
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in % (by default 2.5% and 97.5%).
data(faults)
fit <- bellreg(nf ~ lroll, data = faults)
confint(fit)
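For intuition about the returned matrix, the sketch below builds Wald-type intervals (estimate plus or minus a normal quantile times the standard error) from a coefficient vector and a variance-covariance matrix. Whether the package's confint method uses exactly this construction is an assumption; `wald_confint` is a hypothetical helper and the numbers are made up.

```r
# Wald-type confidence intervals from estimates and their covariance matrix.
# NOTE: hypothetical helper; the inputs below are illustrative, not fitted values.
wald_confint <- function(est, V, level = 0.95) {
  a  <- (1 - level) / 2
  se <- sqrt(diag(V))                     # standard errors
  z  <- stats::qnorm(1 - a)               # normal quantile
  ci <- cbind(est - z * se, est + z * se)
  colnames(ci) <- paste0(format(100 * c(a, 1 - a), trim = TRUE), " %")
  rownames(ci) <- names(est)
  ci
}

est <- c("(Intercept)" = 1.0, lroll = 0.002)        # made-up estimates
V   <- matrix(c(0.04, -1e-4, -1e-4, 1e-6), 2, 2)    # made-up covariance matrix

wald_confint(est, V)                 # 95% intervals
wald_confint(est, V, level = 0.90)   # narrower 90% intervals
```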
Confidence intervals for the regression coefficients
## S3 method for class 'zibellreg'
confint(object, parm = NULL, level = 0.95, ...)
object |
an object of the class zibellreg |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required |
... |
further arguments passed to or from other methods |
100(1-alpha)% confidence intervals for the regression coefficients
data(cells)
fit <- zibellreg(cells ~ smoker+gender|smoker+gender, data = cells, approach = "mle")
confint(fit)
This function extracts the pointwise log-likelihood for a bellreg model.
extract_log_lik(object, ...)
object |
an object of the class bellreg. |
... |
further arguments passed to or from other methods. |
a matrix with the pointwise extracted log-likelihood associated with a bellreg model.
data(faults)
fit <- bellreg(nf ~ lroll, data = faults, approach = "bayes")
loglik <- extract_log_lik(fit)
data(cells)
fit <- zibellreg(cells ~ 1|smoker+gender, data = cells, approach = "bayes", chains = 1, iter = 100)
loglik <- extract_log_lik(fit)
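A pointwise log-likelihood matrix is typically used for predictive model comparison. The sketch below computes the log pointwise predictive density and WAIC in base R from a simulated stand-in matrix; it assumes posterior draws in rows and observations in columns, which may differ from the orientation returned by extract_log_lik.

```r
# Simulated stand-in for a pointwise log-likelihood matrix
# (ASSUMPTION: rows = posterior draws, columns = observations).
set.seed(1)
n_draws <- 200
n_obs   <- 5
loglik  <- matrix(rnorm(n_draws * n_obs, mean = -2, sd = 0.1), n_draws, n_obs)

# Numerically stable log(mean(exp(v)))
log_mean_exp <- function(v) {
  m <- max(v)
  m + log(mean(exp(v - m)))
}

lppd_i   <- apply(loglik, 2, log_mean_exp)   # log pointwise predictive density
p_waic_i <- apply(loglik, 2, var)            # effective number of parameters, per observation
waic     <- -2 * sum(lppd_i - p_waic_i)      # WAIC on the deviance scale

waic
```

With a real fit, the matrix returned by extract_log_lik would replace the simulated `loglik`, after checking its orientation.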
Data set taken from Hinde (1982) and subsequently analyzed by Castellares et al. (2018). The data contain the number of faults in rolls of fabric of different lengths.
A data frame with 32 rows and 2 variables:
nf: number of faults in rolls of fabric of different lengths.
lroll: length of the roll.
Castellares F, Ferrari SL, Lemonte AJ (2018). “On the Bell distribution and its associated regression model for count data.” Applied Mathematical Modelling, 56, 172-185. doi:10.1016/j.apm.2017.12.014.
Hinde J (ed.) (1982). Compound Poisson Regression Models, volume 14 of Lecture Notes in Statistics. ISBN 978-0-387-90777-2, doi:10.1007/978-1-4612-5771-4_11.
This function returns the fitted values.
## S3 method for class 'bellreg'
fitted(object, ...)
object |
an object of the class bellreg. |
... |
further arguments passed to or from other methods. |
a vector with the fitted values (for MLE approach) or a matrix containing the posterior sample of the fitted values.
data(faults)
fit <- bellreg(nf ~ lroll, data = faults)
fitted.values(fit)
Print the summary.bellreg output
## S3 method for class 'summary.bellreg'
print(x, ...)
x |
an object of the class summary.bellreg. |
... |
further arguments passed to or from other methods. |
a summary of the fitted model.
Print the summary.zibellreg output
## S3 method for class 'summary.zibellreg'
print(x, ...)
x |
an object of the class summary.zibellreg. |
... |
further arguments passed to or from other methods. |
a summary of the fitted model.
Summary for the bellreg model
## S3 method for class 'bellreg'
summary(object, ...)
object |
an object of the class 'bellreg'. |
... |
further arguments passed to or from other methods. |
Summary for the zibellreg model
## S3 method for class 'zibellreg'
summary(object, ...)
object |
an object of the class 'zibellreg'. |
... |
further arguments passed to or from other methods. |
This function extracts and returns the variance-covariance matrix associated with the regression coefficients when the maximum likelihood estimation approach is used in the model fitting.
## S3 method for class 'bellreg'
vcov(object, ...)
object |
an object of the class bellreg. |
... |
further arguments passed to or from other methods. |
the variance-covariance matrix associated with the regression coefficients.
data(faults)
fit <- bellreg(nf ~ lroll, data = faults)
vcov(fit)
Covariance of the regression coefficients
## S3 method for class 'zibellreg'
vcov(object, ...)
object |
an object of the class zibellreg. |
... |
further arguments passed to or from other methods. |
the variance-covariance matrix associated with the regression coefficients.
data(cells)
fit <- zibellreg(cells ~ smoker + gender|smoker + gender, data = cells)
vcov(fit)
Fits the zero-inflated Bell regression model to overdispersed count data.
zibellreg(
  formula,
  data,
  approach = c("mle", "bayes"),
  hessian = TRUE,
  link1 = c("logit", "probit", "cloglog", "cauchy"),
  link2 = c("log", "sqrt", "identity"),
  hyperpars = list(mu_psi = 0, sigma_psi = 10, mu_beta = 0, sigma_beta = 10),
  ...
)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which zibellreg is called. |
approach |
approach to be used to fit the model (mle: maximum likelihood; bayes: Bayesian approach). |
hessian |
logical; if TRUE (default), the Hessian matrix is returned when approach = "mle". |
link1 |
assumed link function for degenerate distribution (logit, probit, cloglog, cauchy); default is logit. |
link2 |
assumed link function for count distribution (log, sqrt or identity); default is log. |
hyperpars |
a list containing the hyperparameters associated with the prior distribution of the regression coefficients; if not specified, the default is hyperpars = list(mu_psi = 0, sigma_psi = 10, mu_beta = 0, sigma_beta = 10). |
... |
further arguments passed to either |
zibellreg returns an object of class "zibellreg" containing the fitted model.
# ML approach:
data(cells)
mle <- zibellreg(cells ~ smoker+gender|smoker+gender, data = cells, approach = "mle")
summary(mle)
# Bayesian approach:
bayes <- zibellreg(cells ~ 1|smoker+gender, data = cells, approach = "bayes", refresh = FALSE)
summary(bayes)
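The effect of zero inflation can be sketched with the usual mixture construction: with probability omega the response is a structural zero, and otherwise it follows a Bell distribution with parameter theta. The code below is an illustration in base R, not the package's implementation; the names `psi`, `omega` and `theta`, the coefficient values and the use of a logit link for omega are assumptions made for the example.

```r
# Zero-inflated Bell mixture (illustrative values only):
#   P(Y = 0) = omega + (1 - omega) * exp(1 - e^theta)
#   E(Y)     = (1 - omega) * theta * exp(theta)
psi    <- c(-1.0, 0.5)        # made-up coefficients for the zero-inflation part
smoker <- c(0, 1)             # made-up covariate values
theta  <- 0.8                 # made-up Bell parameter

omega <- stats::plogis(psi[1] + psi[2] * smoker)   # logit link for the mixing probability

p0_bell <- exp(1 - exp(theta))                     # P(Y = 0) under the plain Bell model
p0_zi   <- omega + (1 - omega) * p0_bell           # P(Y = 0) under zero inflation
mean_zi <- (1 - omega) * theta * exp(theta)        # mean under zero inflation

cbind(smoker, omega, p0_zi, mean_zi)
```

The zero probability under the mixture is always at least as large as under the plain Bell model, which is what allows the model to absorb excess zeros.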