Title: | Good Regression |
---|---|
Description: | Fit Good regression models to count data (Tur et al., 2021) <doi:10.48550/arXiv.2105.01557>. The package provides functions for model estimation and model prediction. Density, distribution function, quantile function and random generation for the Good distribution are also provided. |
Authors: | Jordi Tur [aut, cre], David Moriña [ctb], Pere Puig [ctb], Argimiro Arratia [ctb], Alejandra Cabaña [ctb], David Agis [ctb], Amanda Fernández-Fontelo [aut] |
Maintainer: | Jordi Tur <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.2 |
Built: | 2024-12-08 07:19:37 UTC |
Source: | CRAN |
Probability mass function for the Good distribution with parameters z and s.
dgood(x, z, s)
x | vector of non-negative integer quantiles. |
z | vector of first parameter for the Good distribution. |
s | vector of second parameter for the Good distribution. |
The Good distribution has the probability mass function (pmf):
$$P(X = x) = \frac{1}{F(z,s)} \cdot \frac{z^{x+1}}{(x+1)^{s}},$$
where $x = 0, 1, \ldots$. Parameter z should be within the interval $(0, 1)$, and parameter s in the reals. $F(z,s)$ is the polylogarithm function:
$$F(z,s) = \sum_{n=1}^{\infty} \frac{z^{n}}{n^{s}},$$
and acts in the pmf as the normalizing constant.
If the numerical evaluation of $F(z,s)$ does not converge (e.g., for large negative values of the parameter s), an approximation of $F(z,s)$ is used instead, and dgood returns approximated probabilities.
dgood gives the probability mass function for the Good distribution with parameters z and s. x should be a vector of non-negative integer quantiles. If x is non-integer and/or negative, dgood returns 0 with a warning. z and s can be vectors with values within the interval (0, 1) and the reals, respectively. If vector z has values outside the interval (0, 1), dgood returns NaN with a warning.
If function polylog from package copula returns Inf (e.g., for large negative values of parameter s), dgood uses the approximation described above for probabilities, and additionally returns an informative warning.
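The pmf above can be illustrated in base R without the package. The following is a minimal sketch under stated assumptions, not the implementation used by dgood: the helper name good_pmf_sketch and the truncation length n_terms are hypothetical, and the normalizing constant is approximated by a truncated series evaluated in log space so that large negative s does not overflow.

```r
# Hypothetical helper (not part of the good package): Good pmf with the
# normalizing constant approximated by the first n_terms terms of the series.
good_pmf_sketch <- function(x, z, s, n_terms = 1000L) {
  stopifnot(z > 0, z < 1, x >= 0, x == floor(x))
  n <- seq_len(n_terms)
  log_terms <- n * log(z) - s * log(n)        # log(z^n / n^s)
  m <- max(log_terms)
  log_F <- m + log(sum(exp(log_terms - m)))   # log-sum-exp of the series
  exp((x + 1) * log(z) - s * log(x + 1) - log_F)
}

p <- good_pmf_sketch(0:200, z = 0.6, s = -3)
sum(p)  # close to 1 when the range of x covers the bulk of the mass

# A large negative s stays finite because everything is done on the log scale
good_pmf_sketch(330:331, z = 0.6, s = -170)
```

Working on the log scale is a design choice of this sketch; it avoids forming terms such as n^170 directly.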
Jordi Tur, David Moriña, Pere Puig, Alejandra Cabaña, Argimiro Arratia, Amanda Fernández-Fontelo
Good, I.J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40: 237–264.
Zörnig, P. and Altmann, G. (1995). Unified representation of Zipf distributions. Computational Statistics & Data Analysis, 19: 461–473.
Kulasekera, K.B. and Tonkyn, D. (1992). A new distribution with applications to survival dispersal and dispersion. Communications in Statistics - Simulation and Computation, 21: 499–518.
Doray, L.G. and Luong, A. (1997). Efficient estimators for the Good family. Communications in Statistics - Simulation and Computation, 26: 1075–1088.
Johnson, N.L., Kemp, A.W. and Kotz, S. (2005). Univariate Discrete Distributions. Wiley, Hoboken.
Kemp, A.W. (2010). Families of power series distributions, with particular reference to the Lerch family. Journal of Statistical Planning and Inference, 140: 2255–2259.
Wood, D.C. (1992). The Computation of Polylogarithms. Technical report. UKC, University of Kent, Canterbury, UK (KAR id:21052).
See also polylog from copula, and pgood, qgood and rgood from good.
# if x is not a non-negative integer, dgood returns 0 with a warning
dgood(x = -3, z = c(0.6, 0.5), s = -3)
dgood(x = 4.5, z = c(0.6, 0.5), s = -3)
# if z is not within 0 and 1, dgood returns NaN with a warning
dgood(x = 4, z = c(0.6, 0.5, -0.9), s = -3)
# if the approximation is used, dgood returns a warning
dgood(x = 330:331, z = c(0.6, 0.5), s = -170)
dgood(x = 4, z = 0.6, s = -3)
dgood(x = 4, z = c(0.6, 0.5), s = -3)
dgood(x = 4:5, z = c(0.6, 0.5), s = c(-3, -10))
dgood(x = 4:6, z = c(0.6, 0.5), s = c(-3, -10))
dgood(x = 3:5, z = c(0.6, 0.5, 0.9, 0.4), s = c(-3, -10))
glm.good is used to fit generalized linear models with a response variable following a Good distribution with parameters z and s. glm.good allows incorporating predictors in the model through a link function (log, logit or identity) that relates parameter z to the predictors. A summary method for an object of class glm.good provides essential information about the fitted model, such as parameter estimates, standard errors, and some goodness-of-fit measures. A prediction method for an object of class glm.good provides the fitted values of the estimated model and, optionally, standard errors and predictions for a new data set.
glm.good(formula, data, link = "log", start = NULL)
formula | symbolic description of the model to be fitted. A typical predictor has the form response ~ terms, where response is the integer-valued response vector following a Good distribution with parameters s and z, and terms is a series of predictors. |
data | an optional data frame with the variables in the model. |
link | character specification of the link function: "logit", "log" or "identity". By default link = "log". |
start | a vector with the starting values for the model parameters, used to numerically maximize the likelihood function for parameter estimation. By default start = NULL. |
glm.good returns an object of class glm.good that is a list including:
coefs | The vector of coefficients. |
loglik | Log-likelihood of the fitted model. |
vcov | Variance-covariance matrix of all model parameters (derived from the Hessian matrix returned by nlm()). |
hess | Hessian matrix, returned by nlm(). |
fitted.values | The fitted mean values. These are obtained by transforming the linear predictors by the inverse of the link function. |
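To make the estimation step concrete, the sketch below fits an intercept-only Good model by minimizing a negative log-likelihood with nlm() in base R. It is an illustration under stated assumptions rather than the code inside glm.good: the function negloglik_sketch, the truncation length n_terms, the logit parametrization of z and the starting values are all hypothetical choices made here.

```r
strikes <- c(rep(0, 46), rep(1, 76), rep(2, 24), rep(3, 9), rep(4, 1))

# Hypothetical negative log-likelihood for an intercept-only model.
# par[1] is the intercept on the logit scale (z = plogis(par[1])), par[2] is s.
negloglik_sketch <- function(par, y, n_terms = 1000L) {
  z <- plogis(par[1])
  s <- par[2]
  n <- seq_len(n_terms)
  log_terms <- n * log(z) - s * log(n)
  m <- max(log_terms)
  log_F <- m + log(sum(exp(log_terms - m)))   # truncated normalizing constant
  -sum((y + 1) * log(z) - s * log(y + 1) - log_F)
}

fit <- nlm(negloglik_sketch, p = c(0, 0), y = strikes, hessian = TRUE)

z_hat  <- plogis(fit$estimate[1])   # estimated z on its natural scale
s_hat  <- fit$estimate[2]           # estimated s
loglik <- -fit$minimum              # maximized log-likelihood

# Variance-covariance matrix from the inverse Hessian, if it can be inverted
vcov_sketch <- tryCatch(solve(fit$hessian), error = function(e) NULL)
```

The logit parametrization is used here only because it keeps z inside (0, 1) during optimization; the package itself offers log, logit and identity links.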
Jordi Tur, David Moriña, Pere Puig, Alejandra Cabaña, Argimiro Arratia, Amanda Fernández-Fontelo
See also polylog from copula, dgood, pgood, qgood and rgood from good, and maxLik from maxLik.
strikes <- c(rep(0, 46), rep(1, 76), rep(2, 24), rep(3, 9), rep(4, 1))
mle <- glm.good(strikes ~ 1, link = "log")
names(mle)
mle$coefficients
mle$fitted.values
mean(strikes)
summary(mle)
predict(mle, newdata = NULL, se.fit = TRUE)
Distribution function for the Good distribution with parameters z and s.
pgood(q, z, s, lower.tail = TRUE)
q | vector of non-negative integer quantiles. |
z | vector of first parameter for the Good distribution. |
s | vector of second parameter for the Good distribution. |
lower.tail | logical; if TRUE (default), probabilities are $P(X \le x)$, otherwise $P(X > x)$. |
pgood returns the cumulative distribution function (cdf) for the Good distribution with parameters z and s. Parameter z should be within the interval (0, 1), and parameter s in the reals. If q is non-integer, pgood returns the cdf of floor(q) with a warning. If q is negative, pgood also issues a warning. pgood calls dgood from package good.
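The relation between the cdf and the pmf can be sketched in base R as a cumulative sum. This is an illustrative sketch, not the code of pgood: the helper good_cdf_sketch and the truncation length n_terms are hypothetical, and inputs are assumed to be valid (z in (0, 1), q non-negative and smaller than n_terms).

```r
# Hypothetical helper (not part of the good package): cdf as a cumulative
# sum of a truncated, normalized pmf.
good_cdf_sketch <- function(q, z, s, lower.tail = TRUE, n_terms = 1000L) {
  n <- seq_len(n_terms)
  log_terms <- n * log(z) - s * log(n)
  w <- exp(log_terms - max(log_terms))
  pmf <- w / sum(w)                    # pmf[k] approximates P(X = k - 1)
  cdf <- cumsum(pmf)[floor(q) + 1]     # non-integer q is floored
  if (lower.tail) cdf else 1 - cdf
}

good_cdf_sketch(0:2, z = 0.6, s = -3)
good_cdf_sketch(3.4, z = 0.6, s = -3)                      # same as q = 3
good_cdf_sketch(0:1, z = 0.6, s = -3, lower.tail = FALSE)  # upper tail
```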
Jordi Tur, David Moriña, Pere Puig, Alejandra Cabaña, Argimiro Arratia, Amanda Fernández-Fontelo
See also polylog from copula, and dgood, qgood and rgood from good.
# if q < 0, pgood returns NaN with a warning
pgood(q = -3, z = 0.6, s = -3)
# if q is non-integer, pgood returns the cdf of floor(q) with a warning
pgood(q = 3.4, z = 0.6, s = -3)
# if z is not within 0 and 1, pgood returns NaN with a warning
pgood(q = 3.4, z = c(-0.6, 0.6), s = -3)
pgood(q = 0:2, z = 0.6, s = -3)
pgood(q = 0:1, z = c(0.6, 0.9), s = -3)
pgood(q = 0:1, z = c(0.6, 0.9), s = -3, lower.tail = FALSE)
pgood(q = 0:2, z = c(0.6, 0.9), s = c(-3, -4, -5))
This data set contains records of polar bears live-captured between late March 1992 and the beginning of May 2017 at Svalbard, Norway.
data(polar)
A data frame with 231 rows and 7 columns.
Catch year
Number of the day of the catch year
Unique specimen id
Age of the specimen, estimated using premolar tooth
Categorized age of the specimen
Body straight length (cm)
Litter size
Folio, Dorinda Marie et al. (2019), Data from: How many cubs can a mum nurse? Maternal age and size influence litter size in polar bears, Dryad, Dataset.
Folio, D.M., Aars, J., Gimenez, O., Derocher, A.E., Wiig, O. and Cubaynes, S. (2019). How many cubs can a mum nurse? Maternal age and size influence litter size in polar bears. Biology Letters, 15.
data(polar)
head(polar)
Quantile function for the Good distribution with parameters z and s.
qgood(p, z, s, lower.tail = TRUE)
p | vector of probabilities. |
z | vector of first parameter for the Good distribution. |
s | vector of second parameter for the Good distribution. |
lower.tail | logical; if TRUE (default), probabilities are $P(X \le x)$, otherwise $P(X > x)$. |
The smallest integer x such that $P(X \le x) \ge p$ (or such that $P(X > x) \le p$ if lower.tail is FALSE), where X is a random variable following a Good distribution with parameters z and s. Parameter z should be within the interval (0, 1), and parameter s in the reals. Vector p should have values between 0 and 1. If vector p has values outside the interval [0, 1], qgood returns NaN with a warning. If vector p contains 1, qgood returns Inf. qgood calls dgood from package good.
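The "smallest integer x" definition can be illustrated with a direct search over a tabulated cdf. The sketch below uses base R only and is not the algorithm used by qgood: good_quantile_sketch and n_terms are hypothetical names, only the lower-tail case is covered, and p is restricted to [0, 1) for simplicity.

```r
# Hypothetical helper (not part of the good package): lower-tail quantiles
# found as the first point where the tabulated cdf reaches p.
good_quantile_sketch <- function(p, z, s, n_terms = 1000L) {
  stopifnot(all(p >= 0 & p < 1), z > 0, z < 1)
  n <- seq_len(n_terms)
  log_terms <- n * log(z) - s * log(n)
  w <- exp(log_terms - max(log_terms))
  cdf <- cumsum(w / sum(w))            # cdf[k] approximates P(X <= k - 1)
  # first index with cdf >= prob, shifted by one because the support starts at 0
  vapply(p, function(prob) which(cdf >= prob)[1] - 1L, integer(1))
}

good_quantile_sketch(c(0.025, 0.5, 0.975), z = 0.6, s = -3)
```

A linear scan is the simplest way to express the definition; for long tails a bisection over the cdf would do the same job with fewer comparisons.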
Jordi Tur, David Moriña, Pere Puig, Alejandra Cabaña, Argimiro Arratia, Amanda Fernández-Fontelo
See also polylog from copula, and dgood, pgood and rgood from good.
# if p is not within [0, 1], NaN is returned with a warning
qgood(p = c(-0.6, 1.3), z = 0.5, s = -3)
# if z is not within 0 and 1, NaN is returned with a warning
qgood(p = 0.5, z = c(-0.6, -9, 0.5), s = -3)
qgood(p = 0.5, z = 0.6, s = -3)
qgood(p = c(0.025, 0.5, 0.975), z = 0.6, s = -3)
qgood(p = c(0.025, 0.5, 0.975), z = c(0.6, 0.3, 0.1), s = -5)
qgood(p = c(0.025, 0.5, 0.975), z = c(0.6, 0.3, 0.5), s = -3, lower.tail = FALSE)
qgood(p = c(0.025, 0.5, 0.975), z = c(0.6, 0.3), s = -3)
Random generation for the Good distribution with parameters z and s.
rgood(n, z, s, th = 10^-6)
n | vector of number of observations to be generated, accounting for all possible combinations of parameters |
z | vector of first parameter for the Good distribution |
s | vector of second parameter for the Good distribution |
th | defines the lower ( |
A vector containing n random deviates from a Good distribution with parameters z and s. Parameter z should be within the interval (0, 1), and parameter s in the reals. rgood returns NaN if either argument n or th is negative. rgood calls qgood and pgood from package good.
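Random generation from a discrete distribution can be sketched with plain inverse-transform sampling: draw uniform numbers and map them through the tabulated cdf. This is a swapped-in textbook technique shown for illustration and may differ from what rgood does internally (the th argument is not modelled here); good_rng_sketch, n_draws and n_terms are hypothetical names.

```r
# Hypothetical helper (not part of the good package): inverse-transform
# sampling from a truncated, normalized Good pmf.
good_rng_sketch <- function(n_draws, z, s, n_terms = 1000L) {
  stopifnot(n_draws >= 0, z > 0, z < 1)
  n <- seq_len(n_terms)
  log_terms <- n * log(z) - s * log(n)
  w <- exp(log_terms - max(log_terms))
  cdf <- cumsum(w / sum(w))            # cdf[k] approximates P(X <= k - 1)
  u <- runif(n_draws)
  # number of cdf values <= u, which is the smallest x whose cdf exceeds u
  findInterval(u, cdf)
}

set.seed(1)
x <- good_rng_sketch(1000, z = 0.6, s = -3)
table(x)
```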
Jordi Tur, David Moriña, Pere Puig, Alejandra Cabaña, Argimiro Arratia, Amanda Fernández-Fontelo
See also polylog from copula, and dgood, pgood and qgood from good.
# if n is not a non-negative integer, function returns NaN with a warning
rgood(n = -100, z = 0.5, s = -3)
# if th is not positive, th is replaced by 1e-06 and a warning is provided
rgood(n = 1, z = 0.5, s = -3, th = -9)
# if z is not within 0 and 1, NaN is returned with a warning
rgood(n = 2, z = c(-0.5, 0.5), s = -3)
rgood(n = 10, z = 0.6, s = -3)
rgood(n = 1000, z = 0.6, s = -3)
rgood(n = c(3, 10), z = 0.6, s = -3)
rgood(n = c(3, 10), z = c(0.2, 0.8), s = -3)
rgood(n = c(3, 10, 6), z = c(0.2, 0.8), s = c(-3, -2))
rgood(n = 1000, z = 0.3, s = -170)