Title: | Functions and Datasets for "Methods of Statistical Model Estimation" |
---|---|
Description: | Functions and datasets from Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC. |
Authors: | Joseph Hilbe and Andrew Robinson |
Maintainer: | Andrew Robinson <[email protected]> |
License: | GPL-3 |
Version: | 0.5.3 |
Built: | 2025-02-06 06:30:34 UTC |
Source: | CRAN |
This function computes the asymptotic likelihood ratio test of two models. It compares twice the difference in their log-likelihoods with a Chi-squared distribution whose degrees of freedom equal the difference in the models' degrees of freedom.
alrt(x1, x2, boundary = FALSE)
x1 |
A fitted model object for which logLik is defined. |
x2 |
A fitted model object for which logLik is defined. |
boundary |
A logical flag indicating whether a boundary correction should be made. |
out.tab |
A data frame that summarizes the test. |
jll.diff |
The difference between the log-likelihoods. |
df.diff |
The difference between the degrees of freedom. |
p.value |
The p-value of the statistical test of the null hypothesis that there is no difference between the fit of the models. |
The function does not provide any checks for nesting, data equivalence, etc.
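The computation described above can be reproduced with base R alone. The following is a minimal sketch, not the code of alrt itself. It assumes two nested models fitted to the same data and uses simulated Poisson data in place of medpar. The helper name lrt_sketch is hypothetical. The boundary correction is not reproduced here. A common convention for a parameter tested on the boundary of its space is to halve the p-value, but whether alrt does exactly that is not confirmed here.

```r
# Sketch of an asymptotic likelihood ratio test using base R only (hypothetical helper).
# Assumes fit_big and fit_small are nested models fitted to the same data.
lrt_sketch <- function(fit_big, fit_small) {
  ll_big <- logLik(fit_big)
  ll_small <- logLik(fit_small)
  stat <- 2 * (as.numeric(ll_big) - as.numeric(ll_small))   # twice the log-likelihood difference
  df_diff <- attr(ll_big, "df") - attr(ll_small, "df")       # difference in estimated parameters
  p_value <- pchisq(stat, df = df_diff, lower.tail = FALSE)  # upper tail of the Chi-squared
  list(statistic = stat, df.diff = df_diff, p.value = p_value)
}

# Simulated Poisson data standing in for medpar (illustration only)
set.seed(1)
n <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
y <- rpois(n, exp(0.5 + 0.4 * x1))
fit_small <- glm(y ~ x1, family = poisson)
fit_big <- glm(y ~ x1 + x2, family = poisson)
res <- lrt_sketch(fit_big, fit_small)
print(res)
```

For Poisson models, the same statistic is the drop in residual deviance, which anova(fit_small, fit_big, test = "Chisq") also reports.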
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.poi.1 <- ml_glm(los ~ hmo + white, family = "poisson", link = "log", data = medpar)
ml.poi.2 <- ml_glm(los ~ hmo, family = "poisson", link = "log", data = medpar)
alrt(ml.poi.1, ml.poi.2)
The data are a record of physician smoking habits and the frequency of death by myocardial infarction, or heart attack.
data(doll)
A data frame with 10 observations on the following variables.
age
Ordinal age group
smokes
Smoking status
deaths
Count of deaths in category
pyears
Number of physician-years in scope of data
a1
Dummy variable for age level 1
a2
Dummy variable for age level 2
a3
Dummy variable for age level 3
a4
Dummy variable for age level 4
a5
Dummy variable for age level 5
The physicians were divided into five age groups. Deaths are the response, person-years (pyears) are the binomial denominator, and both smoking behavior (smokes) and age group (a1–a5) are predictors.
Doll, R. and A.B. Hill (1966). Mortality of British doctors in relation to smoking; observations on coronary thrombosis. In Epidemiological Approaches to the Study of Cancer and Other Chronic Diseases, W. Haenszel (ed), 19: 204–268. National Cancer Institute Monograph.
Hilbe, J., and A.P. Robinson. 2012. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(doll)
i.glog <- irls(deaths ~ smokes + ordered(age), family = "binomial", link = "logit", data = doll, m = doll$pyears)
summary(i.glog)
glm.glog <- glm(cbind(deaths, pyears - deaths) ~ smokes + ordered(age), data = doll, family = binomial)
coef(summary(glm.glog))
This function uses QR decomposition to determine the hat matrix of a model given its design matrix X. It is specific to objects of class msme.
## S3 method for class 'msme'
hatvalues(model, ...)
model |
A fitted model of class msme. |
... |
other arguments, retained for compatibility with generic method. |
An n × n matrix of hat values, where n is the number of observations used to fit the model. The hat matrix is needed to standardize the residuals.
Leverages can be obtained as the diagonal of the output. See the examples.
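The QR route to the hat matrix is short enough to show directly. Write the QR factorization of the design matrix as QR. The hat matrix is then QQ', and the leverages are its diagonal. The sketch below applies this to a base-R Poisson glm rather than an msme object. It assumes that, for a GLM, the design matrix is first scaled by the square roots of the final IRLS working weights, as base R does. The msme method may differ in that detail.

```r
# Sketch: hat matrix via QR decomposition for a base-R Poisson glm.
# Assumption: the GLM hat matrix uses the weighted design matrix sqrt(W) X.
set.seed(2)
n <- 50
x <- rnorm(n)
y <- rpois(n, exp(0.3 + 0.5 * x))
fit <- glm(y ~ x, family = poisson)

X <- model.matrix(fit)
w <- fit$weights                 # IRLS working weights at convergence
Xw <- sqrt(w) * X                # scales row i of X by sqrt(w[i])
Q <- qr.Q(qr(Xw))                # n x p matrix with orthonormal columns
H <- Q %*% t(Q)                  # n x n hat matrix
leverage <- diag(H)              # leverages are the diagonal of H

print(all.equal(unname(leverage), unname(hatvalues(fit))))
```

The trace of H equals the number of estimated coefficients, which is a quick sanity check on any hat matrix.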
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.poi <- ml_glm(los ~ hmo + white, family = "poisson", link = "log", data = medpar)
str(diag(hatvalues(ml.poi)))
The data consist of Canadian patients who underwent either a Coronary Artery Bypass Graft surgery (CABG) or a Percutaneous Transluminal Coronary Angioplasty (PTCA) heart procedure.
data(heart)
A grouped binomial data frame with 15 observations.
death
number of patients who died within 48 hours of hospital admission
cases
number of patients monitored
anterior
1: anterior site damage heart attack; 0: other site damage
hcabg
1: previous CABG procedure; 0: previous PTCA procedure
killip
1: normal heart; 2: angina; 3: minor heart blockage; 4: heart attack or myocardial infarction
The data are presented as a grouped binomial dataset, with each row representing a different combination of the predictor variables.
National Canadian Registry of Cardiovascular Disease
Hilbe, Joseph M (2009), Logistic Regression Models, Chapman & Hall/CRC; first used in Hardin, JW and JM Hilbe (2001, 2007), Generalized Linear Models and Extensions, Stata Press.
data(heart)
heart.nb <- irls(death ~ anterior + hcabg + factor(killip), a = 0.0001, offset = log(heart$cases), family = "negBinomial", link = "log", data = heart)
This function fits a wide range of generalized linear models using the iteratively reweighted least squares (IRLS) algorithm. It is intended primarily for teaching. Its scope is similar to that of R's glm function, which should be preferred for operational use.
irls(formula, data, family, link, tol = 1e-06, offset = 0, m = 1, a = 1, verbose = 0)
formula |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. (See the help for 'glm' for more details). |
data |
a data frame containing the variables in the model. |
family |
a description of the error distribution to be used in the model. This must be a character string naming a family. |
link |
a description of the link function to be used in the model. This must be a character string naming a link function. |
tol |
an optional quantity to use as the convergence criterion for the change in deviance. |
offset |
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 0 or a numeric vector of length equal to the number of cases. |
m |
the number of cases per observation for binomial regression. |
a |
the scale for negative binomial regression. |
verbose |
a flag to control the amount of output printed by the function. |
The containing package, msme, provides the needed functions to use the irls function to fit the Poisson, negative binomial (2), Bernoulli, and binomial families, and supports the use of the identity, log, logit, probit, complementary log-log, inverse, inverse^2, and negative binomial link functions. All statistics are computed at the final iteration of the IRLS algorithm. The convergence criterion is the magnitude of the change in deviance. The object returned by the function is designed to be reported by the print.glm function.
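The IRLS loop with a deviance-based stopping rule can be sketched in a few lines for a single case, the Poisson family with log link. This is a teaching sketch under that assumption, not the package's irls code. The function name irls_poisson_sketch is hypothetical. Each iteration forms the working response and weights, solves a weighted least-squares problem, and stops when the change in deviance falls below tol.

```r
# Teaching sketch of IRLS for a Poisson model with log link (hypothetical helper).
irls_poisson_sketch <- function(X, y, tol = 1e-6, max_iter = 50) {
  mu <- y + 0.5                        # start values kept away from zero
  eta <- log(mu)
  dev_old <- Inf
  for (i in seq_len(max_iter)) {
    z <- eta + (y - mu) / mu           # working response: eta + (y - mu) * d(eta)/d(mu)
    w <- mu                            # working weight for log link: (dmu/deta)^2 / V(mu) = mu
    beta <- qr.solve(sqrt(w) * X, sqrt(w) * z)  # weighted least-squares step
    eta <- drop(X %*% beta)
    mu <- exp(eta)
    # Poisson deviance, with the y * log(y / mu) term set to 0 when y = 0
    dev <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
    if (abs(dev - dev_old) < tol) break  # convergence on the change in deviance
    dev_old <- dev
  }
  list(coefficients = beta, deviance = dev, iterations = i)
}

set.seed(3)
n <- 100
x <- runif(n)
y <- rpois(n, exp(1 + 0.8 * x))
X <- cbind(1, x)
res <- irls_poisson_sketch(X, y)
print(res$coefficients)
print(coef(glm(y ~ x, family = poisson)))
```

Other families and links change only the lines that compute z and w, which is why IRLS can cover many GLMs with one loop.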
coefficients |
parameter estimates. |
se.beta.hat |
standard errors of parameter estimates. |
model |
the final, weighted linear model. |
call |
the function call used to create the object. |
nobs |
the number of observations. |
eta |
the linear predictor at the final iteration. |
mu |
the estimated mean at the final iteration. |
df.residual |
the residual degrees of freedom. |
df.null |
the degrees of freedom for the null model. |
deviance |
the residual deviance. |
null.deviance |
a placeholder for the null deviance; returned as NA. |
p.dispersion |
Pearson's Chi-squared statistic. |
pearson |
Pearson's deviance. |
loglik |
the maximized log-likelihood. |
family |
the chosen family. |
X |
the design matrix. |
i |
the number of iterations required for convergence. |
residuals |
the deviance residuals. |
aic |
Akaike's Information Criterion. |
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
irls.poi <- irls(los ~ hmo + white, family = "poisson", link = "log", data = medpar)
summary(irls.poi)
irls.probit <- irls(died ~ hmo + white, family = "binomial", link = "probit", data = medpar)
summary(irls.probit)
The hospital database referred to as the Medpar data is prepared yearly from hospital filing records. Medpar files for each state are also prepared. The full Medpar data consist of 115 variables. The national Medpar has some 14 million records, with one record for each hospitalization. The data in the medpar file come from 1991 Medicare files for the state of Arizona. The data are limited to only one diagnostic group (DRG 112). Patient data have been randomly selected from the original data.
data(medpar)
A data frame with 1495 observations on the following 10 variables.
los
length of hospital stay
hmo
Patient belongs to a Health Maintenance Organization, binary
white
Patient identifies themselves as Caucasian, binary
died
Patient died, binary
age80
Patient age 80 and over, binary
type
Type of admission, categorical
type1
Elective admission, binary
type2
Urgent admission, binary
type3
Emergency admission, binary
provnum
Provider ID
Medpar is saved as a data frame. Count models use los as the response variable; zero counts are structurally excluded.
1991 National Medpar data, National Health Economics & Research Co.
Hilbe, Joseph M (2007, 2011), Negative Binomial Regression, Cambridge University Press.
Hilbe, Joseph M (2009), Logistic Regression Models, Chapman & Hall/CRC; first used in Hardin, JW and JM Hilbe (2001, 2007), Generalized Linear Models and Extensions, Stata Press.
data(medpar)
glmp <- glm(los ~ hmo + white + factor(type), family = poisson, data = medpar)
summary(glmp)
exp(coef(glmp))
ml.p <- ml_glm(los ~ hmo + white + factor(type), family = "poisson", link = "log", data = medpar)
summary(ml.p)
library(MASS)
glmnb <- glm.nb(los ~ hmo + white + factor(type), data = medpar)
summary(glmnb)
exp(coef(glmnb))
This function demonstrates the use of maximum likelihood to fit ordinary least-squares regression models, by maximizing the likelihood as a function of the parameters. Only conditional normal errors are supported.
ml_g(formula, data)
formula |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. (See the help for 'lm' for more details). |
data |
a data frame containing the variables in the model. |
This function has limited functionality compared with R's internal lm function, which should be preferred in general.
fit |
the output of optim. |
X |
the design matrix. |
y |
the response variable. |
call |
the call used for the function. |
beta.hat |
the parameter estimates. |
se.beta.hat |
estimated standard errors of the parameter estimates. |
sigma.hat |
the estimated conditional standard deviation of the response variable. |
We use least squares to get initial estimates, which is a pretty barbaric hack. But the purpose of this function is as a starting point, not to replace existing functions.
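The following sketch shows the maximum-likelihood approach to normal regression in base R. It uses a least-squares start followed by optim on the negative log-likelihood. It is an illustration, not ml_g's source. It assumes the residual standard deviation is optimized on the log scale, which may not match ml_g's internal parameterization. Simulated data stand in for ufc, and the name negll_normal is hypothetical. Because the least-squares estimates are already the ML estimates of the coefficients, optim has very little left to do, which is why the note calls this start a hack.

```r
# Sketch: ordinary regression fitted by maximum likelihood with optim (hypothetical helper).
# The residual SD is optimized on the log scale so that it stays positive.
negll_normal <- function(par, X, y) {
  p <- ncol(X)
  beta <- par[1:p]
  sigma <- exp(par[p + 1])
  -sum(dnorm(y, mean = drop(X %*% beta), sd = sigma, log = TRUE))
}

# Simulated data standing in for the ufc height/diameter data
set.seed(4)
n <- 100
x <- runif(n, 10, 50)
y <- 5 + 0.3 * x + rnorm(n, sd = 2)
X <- cbind(1, x)

# Least-squares start values
ls_fit <- lm.fit(X, y)
start <- c(ls_fit$coefficients, log(sqrt(mean(ls_fit$residuals^2))))
fit <- optim(start, negll_normal, X = X, y = y, method = "BFGS", hessian = TRUE)

beta_hat <- fit$par[1:2]
se_beta_hat <- sqrt(diag(solve(fit$hessian)))[1:2]  # inverse observed information
sigma_hat <- exp(fit$par[3])                          # ML estimate divides RSS by n, not n - p
```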
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(ufc)
ufc <- na.omit(ufc)
ufc.g.reg <- ml_g(height.m ~ dbh.cm, data = ufc)
summary(ufc.g.reg)
This function fits generalized linear models by maximizing the joint log-likelihood, which is set in a separate function. Only single-parameter members of the exponential family are covered. The post-estimation output is designed to work with existing reporting functions.
ml_glm(formula, data, family, link, offset = 0, start = NULL, verbose = FALSE, ...)
formula |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. (See the help for 'glm' for more details). |
data |
a data frame containing the variables in the model. |
family |
a description of the error distribution to be used in the model. This must be a character string naming a family. |
link |
a description of the link function to be used in the model. This must be a character string naming a link function. |
offset |
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 0 or a numeric vector of length equal to the number of cases. |
start |
optional starting points for the parameter estimation. |
verbose |
logical flag affecting the detail of printing. Defaults to FALSE. |
... |
optional arguments to pass within the function. |
The containing package, msme, provides the needed functions to use the ml_glm function to fit the Poisson and Bernoulli families, and supports the use of the identity, log, logit, probit, and complementary log-log link functions. The object returned by the function is designed to be reported by the print.glm function.
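The following base-R sketch illustrates the separate-log-likelihood design for the Poisson family with log link. The function names jll_poisson, negll_poisson and grad_poisson are hypothetical, and this is not msme's internal code. The log-likelihood is defined once and then maximized with optim. Standard errors come from the inverse of the numerically evaluated Hessian. An analytic gradient is supplied to make the optimum more precise.

```r
# Sketch: a Poisson GLM fitted by maximizing a separately defined joint log-likelihood.
jll_poisson <- function(y, mu) sum(dpois(y, lambda = mu, log = TRUE))

negll_poisson <- function(beta, X, y, offset = 0) {
  mu <- exp(drop(X %*% beta) + offset)   # inverse of the log link
  -jll_poisson(y, mu)
}

grad_poisson <- function(beta, X, y, offset = 0) {
  mu <- exp(drop(X %*% beta) + offset)
  -drop(crossprod(X, y - mu))            # gradient of the negative log-likelihood
}

set.seed(5)
n <- 150
x <- rbinom(n, 1, 0.4)
y <- rpois(n, exp(1.2 - 0.3 * x))
X <- cbind(1, x)

fit <- optim(c(log(mean(y)), 0), negll_poisson, grad_poisson,
             X = X, y = y, method = "BFGS", hessian = TRUE)
beta_hat <- fit$par
se_beta_hat <- sqrt(diag(solve(fit$hessian)))   # inverse observed information
```

Swapping in another single-parameter family means replacing jll_poisson and the inverse link, while the optim call stays the same.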
fit |
the output of optim. |
X |
the design matrix. |
y |
the response variable. |
call |
the call used for the function. |
obs |
the number of observations. |
df.null |
the degrees of freedom for the null model. |
df.residual |
the residual degrees of freedom. |
deviance |
the residual deviance. |
null.deviance |
the residual deviance for the null model. |
residuals |
the deviance residuals. |
coefficients |
parameter estimates. |
se.beta.hat |
standard errors of parameter estimates. |
aic |
Akaike's Information Criterion. |
i |
the number of iterations required for convergence. |
This function is neither as comprehensive nor as stable as the inbuilt glm function. It is a lot easier to read, however.
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.poi <- ml_glm(los ~ hmo + white, family = "poisson", link = "log", data = medpar)
ml.poi
summary(ml.poi)
This function fits generalized linear models by maximizing the joint log-likelihood, which is set in a separate function. Two-parameter members of the exponential family are covered. The post-estimation output is designed to work with existing reporting functions.
ml_glm2(formula1, formula2 = ~1, data, family, mean.link, scale.link, offset = 0, start = NULL, verbose = FALSE)
formula1 |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the mean function for the model to be fitted. (See the help for 'glm' for more details). |
formula2 |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the scale function for the model to be fitted. (See the help for 'glm' for more details). |
data |
a data frame containing the variables in the model. |
family |
a description of the error distribution to be used in the model. This must be a character string naming a family. |
mean.link |
a description of the link function to be used for the mean in the model. This must be a character string naming a link function. |
scale.link |
a description of the link function to be used for the scale in the model. This must be a character string naming a link function. |
offset |
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 0 or a numeric vector of length equal to the number of cases. |
start |
optional starting points for the parameter estimation. |
verbose |
logical flag affecting the detail of printing. Defaults to FALSE. |
The containing package, msme, provides the needed functions to use the ml_glm2 function to fit the normal and negative binomial (2) families, and supports the use of the identity and log link functions.
The object returned by the function is designed to be reported by the print.glm function.
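The two-formula idea can be sketched as a single likelihood with two linear predictors. The mean uses X and an identity link, and the standard deviation uses Z and a log link, loosely mirroring formula2 = ~ dbh.cm with scale.link = "log_s" in the example. This is a base-R illustration on simulated data, not ml_glm2's code. The helper name negll_hetero is hypothetical, and treating "log_s" as a log link on the standard deviation is an assumption.

```r
# Sketch: two-parameter model with its own linear predictor for the scale.
# Mean: identity link on X %*% beta; SD: log link on Z %*% gamma (assumed parameterization).
negll_hetero <- function(par, X, Z, y) {
  p <- ncol(X)
  beta <- par[seq_len(p)]
  gamma <- par[-seq_len(p)]
  mu <- drop(X %*% beta)              # identity mean link
  sigma <- exp(drop(Z %*% gamma))     # log scale link keeps sigma > 0
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

set.seed(6)
n <- 300
x <- runif(n, 0, 4)
y <- 1 + 2 * x + rnorm(n, sd = exp(-0.5 + 0.4 * x))  # spread grows with x
X <- cbind(1, x)   # mean model, like formula1 = y ~ x
Z <- cbind(1, x)   # scale model, like formula2 = ~ x

start <- c(coef(lm(y ~ x)), log(sd(y)), 0)
fit <- optim(start, negll_hetero, X = X, Z = Z, y = y,
             method = "BFGS", hessian = TRUE, control = list(maxit = 500))
est <- setNames(fit$par, c("b0", "b1", "g0", "g1"))
se <- sqrt(diag(solve(fit$hessian)))
print(cbind(estimate = est, se = se))
```

Setting Z to a single column of ones reduces this to ordinary normal regression with a constant scale, the analogue of formula2 = ~1.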
fit |
the output of optim. |
loglike |
the maximized log-likelihood. |
X |
the design matrix. |
y |
the response variable. |
p |
the number of parameters estimated. |
rank |
the rank of the design matrix for the mean function. |
call |
the call used for the function. |
obs |
the number of observations. |
fitted.values |
estimated response variable. |
linear.predictor |
linear predictor. |
df.null |
the degrees of freedom for the null model. |
df.residual |
the residual degrees of freedom. |
pearson |
the Pearson Chi2. |
null.pearson |
the Pearson Chi2 for the null model. |
dispersion |
the dispersion. |
deviance |
the residual deviance. |
null.deviance |
the residual deviance for the null model. |
residuals |
the deviance residuals. |
presiduals |
the Pearson residuals. |
coefficients |
parameter estimates. |
se.beta.hat |
standard errors of parameter estimates. |
aic |
Akaike's Information Criterion. |
offset |
the offset used. |
i |
the number of iterations required for convergence. |
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.nb2 <- ml_glm2(los ~ hmo + white, formula2 = ~1, data = medpar, family = "negBinomial", mean.link = "log", scale.link = "inverse_s")
data(ufc)
ufc <- na.omit(ufc)
ml.g <- ml_glm2(height.m ~ dbh.cm, formula2 = ~ dbh.cm, data = ufc, family = "normal", mean.link = "identity", scale.link = "log_s")
summary(ml.g)
This function fits generalized linear models by maximizing the joint log-likelihood, which is set in a separate function. The null model is not fitted. The post-estimation output is designed to work with existing reporting functions.
ml_glm3(formula, data, family, link, offset = 0, start = NULL, verbose = FALSE, ...)
formula |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. (See the help for 'glm' for more details). |
data |
a data frame containing the variables in the model. |
family |
a description of the error distribution to be used in the model. This must be a character string naming a family. |
link |
a description of the link function to be used in the model. This must be a character string naming a link function. |
offset |
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 0 or a numeric vector of length equal to the number of cases. |
start |
optional starting points for the parameter estimation. |
verbose |
logical flag affecting the detail of printing. Defaults to FALSE. |
... |
other arguments to pass to the likelihood function, e.g. group structure. |
This function is essentially the same as ml_glm, but includes the dots argument to allow a richer set of model likelihoods to be fit, and omits computation of the null deviance. The function is presently set up to only fit the conditional fixed-effects negative binomial model.
fit |
the output of optim. |
X |
the design matrix. |
y |
the response variable. |
call |
the call used for the function. |
obs |
the number of observations. |
df.null |
the degrees of freedom for the null model. |
df.residual |
the residual degrees of freedom. |
deviance |
the residual deviance. |
null.deviance |
the residual deviance for the null model, set to NA. |
residuals |
the deviance residuals. |
coefficients |
parameter estimates. |
se.beta.hat |
standard errors of parameter estimates. |
aic |
Akaike's Information Criterion. |
i |
the number of iterations required for convergence. |
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
med.nb.g <- ml_glm3(los ~ hmo + white, family = "gNegBinomial", link = "log", group = medpar$provnum, data = medpar)
summary(med.nb.g)
This function fits generalized linear models by maximizing the joint log-likelihood, which is set in a separate function. Two-parameter members of the negative binomial family are covered. The post-estimation output is designed to work with existing reporting functions.
nbinomial(formula1, formula2 = ~1, data, family="nb2", mean.link="log", scale.link="inverse_s", offset=0, start=NULL, verbose=FALSE)
formula1 |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the mean function for the model to be fitted. (See the help for 'glm' for more details). |
formula2 |
an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the scale function for the model to be fitted. (See the help for 'glm' for more details). |
data |
a data frame containing the variables in the model. |
family |
a description of the error distribution to be used in the model. This must be a character string naming a family. |
mean.link |
a description of the link function to be used for the mean in the model. This must be a character string naming a link function. |
scale.link |
a description of the link function to be used for the scale in the model. This must be a character string naming a link function. |
offset |
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 0 or a numeric vector of length equal to the number of cases. |
start |
optional starting points for the parameter estimation. |
verbose |
logical flag affecting the detail of printing. Defaults to FALSE. |
The containing package, msme, provides the needed functions to use the nbinomial function to fit the negative binomial (2) family, and supports the use of the identity and log link functions.
The object returned by the function is designed to be reported by the print.glm function.
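The examples contrast an NB2 model reported with the dispersion alpha against a glm.nb-style model reported with theta. The sketch below writes the NB2 log-likelihood in base R with Var(y) = mu + alpha * mu^2. It uses dnbinom's size argument, which is theta = 1 / alpha. The model is fitted with optim on simulated data. It is an illustration of the parameterization, not nbinomial's code, and the helper name negll_nb2 is hypothetical.

```r
# Sketch: NB2 regression by maximum likelihood (hypothetical helper, not msme code).
negll_nb2 <- function(par, X, y) {
  p <- ncol(X)
  mu <- exp(drop(X %*% par[1:p]))    # log mean link
  alpha <- exp(par[p + 1])           # dispersion alpha > 0, optimized on the log scale
  # dnbinom's 'size' is theta = 1 / alpha, so Var(y) = mu + alpha * mu^2
  -sum(dnbinom(y, size = 1 / alpha, mu = mu, log = TRUE))
}

set.seed(7)
n <- 500
x <- rbinom(n, 1, 0.5)
y <- rnbinom(n, size = 2, mu = exp(1.5 + 0.5 * x))   # theta = 2, i.e. alpha = 0.5
X <- cbind(1, x)

start <- c(coef(glm(y ~ x, family = poisson)), 0)     # Poisson estimates, alpha = 1
fit <- optim(start, negll_nb2, X = X, y = y, method = "BFGS")
alpha_hat <- exp(fit$par[3])
theta_hat <- 1 / alpha_hat           # the scale that MASS::glm.nb reports
print(c(alpha = alpha_hat, theta = theta_hat))
```

If MASS is installed, MASS::glm.nb(y ~ x)$theta should agree closely with theta_hat.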
fit |
the output of optim. |
loglike |
the maximized log-likelihood. |
X |
the design matrix. |
y |
the response variable. |
p |
the number of parameters estimated. |
rank |
the rank of the design matrix for the mean function. |
call |
the call used for the function. |
obs |
the number of observations. |
fitted.values |
estimated response variable. |
linear.predictor |
linear predictor. |
df.null |
the degrees of freedom for the null model. |
df.residual |
the residual degrees of freedom. |
pearson |
the Pearson Chi2. |
null.pearson |
the Pearson Chi2 for the null model. |
dispersion |
the dispersion. |
deviance |
the residual deviance. |
null.deviance |
the residual deviance for the null model. |
residuals |
the deviance residuals. |
presiduals |
the Pearson residuals. |
coefficients |
parameter estimates. |
se.beta.hat |
standard errors of parameter estimates. |
aic |
Akaike's Information Criterion. |
offset |
the offset used. |
i |
the number of iterations required for convergence. |
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
# TRADITIONAL NB REGRESSION WITH ALPHA
mynb1 <- nbinomial(los ~ hmo + white, data = medpar)
summary(mynb1)
# TRADITIONAL NB -- SHOWING ALL OPTIONS
mynb2 <- nbinomial(los ~ hmo + white, formula2 = ~ 1, data = medpar, family = "nb2", mean.link = "log", scale.link = "inverse_s")
summary(mynb2)
# R GLM.NB - LIKE INVERTED DISPERSION BASED M
mynb3 <- nbinomial(los ~ hmo + white, formula2 = ~ 1, data = medpar, family = "negBinomial", mean.link = "log", scale.link = "inverse_s")
summary(mynb3)
# R GLM.NB-TYPE INVERTED DISPERSION -- THETA; WITH DEFAULTS
mynb4 <- nbinomial(los ~ hmo + white, family = "negBinomial", data = medpar)
summary(mynb4)
# HETEROGENEOUS NB; DISPERSION PARAMETERIZED
mynb5 <- nbinomial(los ~ hmo + white, formula2 = ~ hmo + white, data = medpar, family = "negBinomial", mean.link = "log", scale.link = "log_s")
summary(mynb5)
This function calculates the Pearson Chi2 statistic and the Pearson-based dispersion statistic. Dispersion values greater than 1 indicate overdispersion; values below 1 indicate underdispersion.
P__disp(x)
x |
the fitted model. |
Intended for use with models fitted by the glm and glm.nb functions.
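The following base-R sketch computes both statistics. It assumes the dispersion is the Pearson Chi2 divided by the residual degrees of freedom, the usual definition, though P__disp's exact formula is not shown here. The helper name pearson_dispersion is hypothetical. Poisson fits to equidispersed data and to overdispersed (negative binomial) data are compared on simulated counts.

```r
# Sketch: Pearson Chi2 and Pearson-based dispersion for a fitted glm (hypothetical helper).
pearson_dispersion <- function(fit) {
  chi2 <- sum(residuals(fit, type = "pearson")^2)
  c(pearson.chi2 = chi2, dispersion = chi2 / df.residual(fit))
}

set.seed(8)
n <- 200
x <- rnorm(n)
y_pois <- rpois(n, exp(1 + 0.3 * x))                 # equidispersed counts
y_over <- rnbinom(n, size = 1, mu = exp(1 + 0.3 * x)) # overdispersed counts
d_pois <- pearson_dispersion(glm(y_pois ~ x, family = poisson))
d_over <- pearson_dispersion(glm(y_over ~ x, family = poisson))
print(rbind(poisson_data = d_pois, overdispersed_data = d_over))
```

The dispersion computed this way matches the scale estimate that summary() reports for the same model fitted with family = quasipoisson.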
pearson.chi2 |
Pearson Chi2 value. |
dispersion |
Pearson-based dispersion. |
Joseph Hilbe and Andrew Robinson
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
mymod <- glm(los ~ hmo + white + factor(type), family = poisson, data = medpar)
P__disp(mymod)
This function provides a four-way plot for fitted models.
## S3 method for class 'ml_g_fit'
plot(x, ...)
x |
the fitted model. |
... |
other arguments, retained for compatibility with generic method. |
The function plots a summary. The output is structured to broadly match the default options of the plot.lm function.
Run for its side effect of producing a plot object.
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(ufc)
ufc <- na.omit(ufc)
ufc.g.reg <- ml_g(height.m ~ dbh.cm, data = ufc)
plot(ufc.g.reg)
Function to produce deviance and standardized deviance residuals from a model of class msme.
## S3 method for class 'msme'
residuals(object, type = c("deviance", "standard"), ...)
object |
a model of class msme. |
type |
the type of residual requested. Defaults to deviance. |
... |
arguments to pass on. Retained for compatibility with generic method. |
Presently only deviance or standardized deviance residuals are computed.
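A standardized deviance residual is usually the deviance residual divided by sqrt(phi * (1 - h)). Here h is the observation's leverage and phi is the dispersion, which is fixed at 1 for the Poisson family. The sketch below applies that formula to a base-R Poisson glm and checks it against rstandard(). That msme's "standard" type uses exactly this formula is an assumption.

```r
# Sketch: standardized deviance residuals for a Poisson glm.
set.seed(9)
n <- 80
x <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.4 * x))
fit <- glm(y ~ x, family = poisson)

r_dev <- residuals(fit, type = "deviance")  # raw deviance residuals
h <- hatvalues(fit)                          # leverages from the hat matrix
r_std <- r_dev / sqrt(1 - h)                 # phi = 1 for the Poisson family
print(all.equal(r_std, rstandard(fit)))
```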
A vector of residuals.
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.poi <- ml_glm(los ~ hmo + white, family = "poisson", link = "log", data = medpar)
str(residuals(ml.poi))
German health registry for the years 1984-1988. Health information for years immediately prior to health reform.
data(rwm5yr)
A data frame with 19,609 observations on the following 17 variables.
id
patient ID (1-7028)
docvis
number of visits to doctor during year (0-121)
hospvis
number of days in hospital during year (0-51)
year
year; (categorical: 1984, 1985, 1986, 1987, 1988)
edlevel
educational level (categorical: 1-4)
age
age: 25-64
outwork
out of work=1; 0=working
female
female=1; 0=male
married
married=1; 0=not married
kids
have children=1; no children=0
hhninc
household yearly income (in Marks)
educ
years of formal education (7-18)
self
self-employed=1; not self employed=0
edlevel1
(1/0) not high school graduate
edlevel2
(1/0) high school graduate
edlevel3
(1/0) university/college
edlevel4
(1/0) graduate school
rwm5yr is saved as a data frame. Count models typically use docvis as the response variable; zero counts are included.
German Health Reform Registry, pre-reform years 1984-1988.
Hilbe, Joseph M (2007, 2011), Negative Binomial Regression, Cambridge University Press
data(rwm5yr)
glmrp <- glm(docvis ~ outwork + female + age + factor(edlevel), family = poisson, data = rwm5yr)
summary(glmrp)
exp(coef(glmrp))
ml_p <- ml_glm(docvis ~ outwork + female + age + factor(edlevel), family = "poisson", link = "log", data = rwm5yr)
summary(ml_p)
exp(coef(ml_p))
library(MASS)
glmrnb <- glm.nb(docvis ~ outwork + female + age + factor(edlevel), data = rwm5yr)
summary(glmrnb)
exp(coef(glmrnb))
## Not run:
library(gee)
mygee <- gee(docvis ~ outwork + age + factor(edlevel), id = id, corstr = "exchangeable", family = poisson, data = rwm5yr)
summary(mygee)
exp(coef(mygee))
## End(Not run)
This function provides a compact summary for fitted models.
## S3 method for class 'ml_g_fit'
summary(object, dig = 3, ...)
object |
the fitted model. |
dig |
an optional integer detailing the number of significant digits for printing. |
... |
other arguments, retained for compatibility with generic method. |
The function prints out a summary and returns an invisible list with useful objects. The output is structured to match the print.summary.lm function.
call |
the call used to fit the model. |
coefficients |
a dataframe of estimates, standard errors, etc. |
residuals |
deviance residuals from the model. |
aliased |
included to match the print.summary.lm function. Lazily set to FALSE for all parameters. |
sigma |
the estimate of the conditional standard deviation of the response variable. |
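As a rough illustration of what such a summary contains, the following base-R sketch (not the package's own code) fits a normal linear model by maximum likelihood with optim() on simulated data, then assembles a coefficient table with Wald z statistics from the inverse Hessian. The column names Estimate, SE, Z and p, and the simulated data, are assumptions made for illustration only.

```r
# Sketch: maximum-likelihood fit of y ~ x under a normal model, followed by a
# print.summary.lm-style coefficient table. Simulated data, for illustration.
set.seed(1)
x <- runif(50, 10, 60)                 # hypothetical predictor
y <- 5 + 0.3 * x + rnorm(50, sd = 2)   # hypothetical response
X <- cbind("(Intercept)" = 1, x = x)

# Negative log-likelihood; sigma is optimized on the log scale so the
# parameter is unconstrained.
nll <- function(par) {
  beta <- par[1:2]
  sigma <- exp(par[3])
  -sum(dnorm(y, mean = drop(X %*% beta), sd = sigma, log = TRUE))
}

# Start from the least-squares solution and log(sd(y))
start <- c(coef(lm.fit(X, y)), log(sd(y)))
fit <- optim(start, nll, method = "BFGS", hessian = TRUE)

est <- fit$par[1:2]
se  <- sqrt(diag(solve(fit$hessian)))[1:2]   # SEs from the inverse Hessian
z   <- est / se
coef_table <- data.frame(Estimate = est,
                         SE = se,
                         Z = z,
                         p = 2 * pnorm(-abs(z)),   # two-sided Wald p-values
                         row.names = colnames(X))
sigma_hat <- exp(fit$par[3])   # ML estimate of the conditional SD
print(coef_table, digits = 3)
```

The maximum-likelihood coefficient estimates coincide with least squares for this model; only the estimate of sigma differs, because ML divides the residual sum of squares by n rather than n - p.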
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(ufc)
ufc <- na.omit(ufc)
ufc.g.reg <- ml_g(height.m ~ dbh.cm, data = ufc)
summary(ufc.g.reg)
This function provides a compact summary of a model of class msme, such as one fitted by ml_glm.
## S3 method for class 'msme' summary(object, ...)
object |
the fitted model. |
... |
optional arguments to be passed through. |
The function prints a summary and invisibly returns a list with the following components.
call |
the call used to fit the model. |
coefficients |
a data frame of estimates, standard errors, etc. |
deviance |
deviance from the model fit. |
null.deviance |
deviance from the null model fit. |
df.residual |
residual degrees of freedom from the model fit. |
df.null |
residual degrees of freedom from the null model fit. |
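The four deviance components above are enough to form an overall likelihood-ratio test of the fitted model against the null model. The sketch below uses a base-R glm() fit on the built-in warpbreaks data as a stand-in, assuming the msme components carry the same meaning as the identically named components of a glm object.

```r
# Sketch: overall likelihood-ratio (deviance) test from the components
# deviance, null.deviance, df.residual and df.null. A glm() fit is used as a
# stand-in for an msme object.
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

lr_stat <- fit$null.deviance - fit$deviance   # drop in deviance
lr_df   <- fit$df.null - fit$df.residual      # number of added parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```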
Andrew Robinson and Joe Hilbe.
Hilbe, J.M., and Robinson, A.P. 2013. Methods of Statistical Model Estimation. Chapman & Hall / CRC.
data(medpar)
ml.poi <- ml_glm(los ~ hmo + white, family = "poisson", link = "log",
                 data = medpar)
summary(ml.poi)
Passenger survival data from the 1912 Titanic shipping disaster.
data(titanic)
A data frame with 1316 observations on the following 4 variables.
survived
1 = survived; 0 = died
age
1 = adult; 0 = child
sex
1 = male; 0 = female
class
ticket class: 1 = first class; 2 = second class; 3 = third class
The titanic data are stored as a data frame. They are used to assess risk ratios; although they are not a standard count-model dataset, they are well suited to binary response models.
The data are found in many other texts.
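Because the response and predictors are coded 1/0, the distinction between a risk ratio and an odds ratio is easy to see directly. The sketch below uses hypothetical helper functions risk_ratio() and odds_ratio() and small made-up toy vectors, not the titanic data; note that exp(coef()) from a logistic (logit-link) model estimates odds ratios, not risk ratios.

```r
# Sketch: risk ratio versus odds ratio for a binary outcome y and a binary
# exposure x, both coded 1/0. Helper names are hypothetical.
risk_ratio <- function(y, x) {
  mean(y[x == 1]) / mean(y[x == 0])    # ratio of the two proportions
}
odds_ratio <- function(y, x) {
  p1 <- mean(y[x == 1])
  p0 <- mean(y[x == 0])
  (p1 / (1 - p1)) / (p0 / (1 - p0))    # ratio of the two odds
}

# Hypothetical toy vectors, not the titanic data
y <- c(1, 0, 1, 1, 0, 0)
x <- c(1, 1, 1, 0, 0, 0)
risk_ratio(y, x)   # [1] 2
odds_ratio(y, x)   # [1] 4
```

With a common outcome, as here, the odds ratio overstates the risk ratio considerably; the two agree closely only when the outcome is rare.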
Hilbe, Joseph M (2007, 2011), Negative Binomial Regression, Cambridge University Press
Hilbe, Joseph M (2009), Logistic Regression Models, Chapman & Hall/CRC
data(titanic)
# Logistic regression; exp(coef()) gives odds ratios
glm.lr <- glm(survived ~ age + sex + factor(class), family = binomial,
              data = titanic)
summary(glm.lr)
exp(coef(glm.lr))
# Complementary log-log models fitted by IRLS and by maximum likelihood
glm.irls <- irls(survived ~ age + sex + factor(class), family = "binomial",
                 link = "cloglog", data = titanic)
summary(glm.irls)
exp(coef(glm.irls))
glm.ml <- ml_glm(survived ~ age + sex + factor(class), family = "bernoulli",
                 link = "cloglog1", data = titanic)
summary(glm.ml)
exp(coef(glm.ml))
These are a subset of the tree measurement data from the Upper Flat Creek unit of the University of Idaho Experimental Forest, which was measured in 1991.
data(ufc)
A data frame with 336 observations on the following 5 variables.
plot
plot label
tree
tree label
species
tree species, with levels DF, GF, WC, WL
dbh.cm
tree diameter at 1.37 m from the ground, measured in centimetres
height.m
tree height, measured in metres
The inventory was based on variable-radius plots with a basal area factor (BAF) of 6.43 m² per ha. The forest stand covered 121.5 ha. This version of the data omits erroneous records, trees with missing heights, and uncommon species. The four species are Douglas-fir, grand fir, western red cedar, and western larch.
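On a variable-radius (angle-count) plot, each tallied tree represents BAF square metres of basal area per hectare, so it stands for BAF / g stems per hectare, where g is that tree's own basal area. The sketch below applies this standard expansion with the BAF stated above; the function name stems_per_ha is hypothetical and not part of the package, and diameters are assumed to be in centimetres, as in the dbh measurement described above.

```r
# Sketch: per-hectare expansion factor for a tree tallied on a
# variable-radius plot. stems_per_ha() is a hypothetical helper.
baf <- 6.43   # basal area factor, m^2 per ha

stems_per_ha <- function(dbh_cm, baf) {
  g <- pi * (dbh_cm / 200)^2   # basal area in m^2 (dbh / 200 = radius in m)
  baf / g                      # stems per ha represented by one tallied tree
}

stems_per_ha(c(20, 40, 60), baf)
```

Because basal area grows with the square of diameter, doubling the diameter cuts the number of stems per hectare represented by a tallied tree to one quarter.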
The data are provided courtesy of Harold Osborne and Ross Appelgren of the University of Idaho Experimental Forest.
Robinson, A.P., and J.D. Hamann. 2010. Forest Analytics with R: an Introduction. Springer.
data(ufc)
ufc <- na.omit(ufc)
ml.g <- ml_glm2(height.m ~ dbh.cm, formula2 = ~1, data = ufc,
                family = "normal", mean.link = "identity",
                scale.link = "log_s")
lm.g <- lm(height.m ~ dbh.cm, data = ufc)
ml.g
lm.g
summary(ml.g)
summary(lm.g)
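When comparing a maximum-likelihood normal fit with lm(), the mean coefficients agree, but the two estimates of sigma differ: lm() divides the residual sum of squares by n - p, whereas maximum likelihood divides by n. The sketch below shows this on the built-in cars data as a stand-in for ufc, assuming ml_glm2 reports the maximum-likelihood estimate of the scale (possibly on the log scale, given scale.link = "log_s").

```r
# Sketch: least-squares versus maximum-likelihood estimates of sigma for a
# normal linear model, using built-in data as a stand-in.
fit <- lm(dist ~ speed, data = cars)
n   <- nobs(fit)
p   <- length(coef(fit))
rss <- sum(residuals(fit)^2)

sigma_ls <- sqrt(rss / (n - p))   # residual standard error from summary(lm)
sigma_ml <- sqrt(rss / n)         # maximum-likelihood estimate

c(sigma_ls = sigma_ls, sigma_ml = sigma_ml,
  ratio = sigma_ml / sigma_ls, expected = sqrt((n - p) / n))
```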