| Title: | Maximum Likelihood Models and Tools for Estimation, Prediction, and Testing |
|---|---|
| Description: | Provides a collection of maximum likelihood estimators with a consistent S3 interface. Supported models include Gaussian (linear and log-normal), logit, probit, Poisson, negative binomial (NB1 and NB2), gamma, and beta regression. A distinctive feature is flexible modeling of the scale parameter (variance, dispersion, precision, or shape) alongside the location/mean parameters. The package offers unified predict() methods, multiple variance-covariance estimators (observed information, outer product of gradients, robust/Huber-White, cluster-robust, bootstrap, jackknife), and a full suite of hypothesis tests (Wald, likelihood ratio, information matrix, Vuong, overdispersion, and goodness-of-fit). It is fully compatible with 'marginaleffects' for post-estimation analysis. Methods implemented include Cameron and Trivedi (1990) <doi:10.1016/0304-4076(90)90014-K>, for Poisson overdispersion testing, Manjon and Martinez (2014) <doi:10.1177/1536867X1401400406>, for goodness-of-fit testing of count data models, Vuong (1989) <doi:10.2307/1912557>, for non-nested likelihood ratio testing, and White (1982) <doi:10.2307/1912526>, for information matrix tests. |
| Authors: | Alfonso Sanchez-Penalver [aut, cre] (ORCID: <https://orcid.org/0000-0001-8491-4632>) |
| Maintainer: | Alfonso Sanchez-Penalver <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-08 19:19:05 UTC |
| Source: | https://github.com/cran/mlmodels |
Extract AIC from mlmodel objects
## S3 method for class 'mlmodel' AIC(object, ..., k = 2) ## S3 method for class 'summary.mlmodel' AIC(object, ..., k = 2)## S3 method for class 'mlmodel' AIC(object, ..., k = 2) ## S3 method for class 'summary.mlmodel' AIC(object, ..., k = 2)
object |
An object of class |
... |
Further arguments passed to methods. |
k |
Numeric. The penalty per parameter. Default is |
For mlmodel objects, AIC is computed as -2 * logLik(object) + k * npar.
For summary.mlmodel objects, the pre-computed AIC (with k = 2) is
returned; the k argument is accepted for compatibility but ignored.
A numeric value with the AIC.
Extract BIC from mlmodel objects
## S3 method for class 'mlmodel' BIC(object, ...) ## S3 method for class 'summary.mlmodel' BIC(object, ...)## S3 method for class 'mlmodel' BIC(object, ...) ## S3 method for class 'summary.mlmodel' BIC(object, ...)
object |
An object of class |
... |
Further arguments passed to methods. |
BIC is computed as -2 * logLik(object) + log(nobs) * npar.
A numeric value with the BIC.
Extract Model Coefficients
## S3 method for class 'mlmodel' coef(object, ...)## S3 method for class 'mlmodel' coef(object, ...)
object |
An |
... |
Currently not used. |
A named numeric vector of estimated coefficients.
Confidence Intervals for mlmodel Coefficients
## S3 method for class 'mlmodel' confint( object, parm, level = 0.95, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )## S3 method for class 'mlmodel' confint( object, parm, level = 0.95, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )
object |
An |
parm |
A specification of which parameters are to be given confidence intervals (names or numeric indices). If missing, all parameters are used. |
level |
The confidence level required. Default is 0.95. |
vcov |
Optional user-supplied variance-covariance matrix. |
vcov.type |
Type of variance-covariance matrix to use. See vcov.mlmodel. |
cl_var |
Clustering variable (name as string or vector). |
repetitions |
Number of bootstrap replications when |
seed |
Random seed for bootstrap/jackknife. |
progress |
Show progress bar? Default |
... |
Further arguments passed to methods. |
Confidence intervals are constructed as
,
where the standard errors come from the requested variance-covariance matrix.
The function supports all variance types available in vcov.mlmodel, including robust, clustered, bootstrap, and jackknife estimators.
Matrix with the confidence intervals for the requested parameters.
data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default 95% confidence intervals (using OIM) confint(fit) # 90% confidence intervals confint(fit, level = 0.90) # Confidence intervals for specific parameters confint(fit, parm = c("value::educ", "value::huswage")) confint(fit, parm = 4:5) # by position # Using different variance types confint(fit, vcov.type = "robust") # Clustered confidence intervals confint(fit, vcov.type = "robust", cl_var = "age") # Using a pre-computed bootstrap variance matrix v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) confint(fit, vcov = v_boot)data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default 95% confidence intervals (using OIM) confint(fit) # 90% confidence intervals confint(fit, level = 0.90) # Confidence intervals for specific parameters confint(fit, parm = c("value::educ", "value::huswage")) confint(fit, parm = 4:5) # by position # Using different variance types confint(fit, vcov.type = "robust") # Clustered confidence intervals confint(fit, vcov.type = "robust", cl_var = "age") # Using a pre-computed bootstrap variance matrix v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) confint(fit, vcov = v_boot)
Cross sectional sample for 2003.
docvisdocvis
A data frame with 3677 rows and 22 variables.
Binary: employer offers insurance
Ratio of SSI income to total income
Age in years
Years of education
Number of visits to doctor
Number of visits to health professional (not doctor)
Binary: has medicaid public insurance
Binary: has private supplementary insurance
Binary: female
Binary: physical limitation
Binary: activity limitation
Income (in thousands)
Number of chronic conditions
Same as private
age squared
log(income)
Binary: black or hispanic
Number of doctor visits
log(docvis) if docvis > 0
log(docvis + .01)
Constant term (1)
Binary: docvis > 0
https://www.stata-press.com/data/mus2.html mus220mepsdocvis.dta
Cameron, A. C., and P. K. Trivedi. 2022. Microeconometrics Using Stata, Second Edition. Volumes I and II. College Station, TX: Stata Press.
Extract the predictors used in the model (for insight/marginaleffects compatibility)
## S3 method for class 'mlmodel' find_predictors(x, ...)## S3 method for class 'mlmodel' find_predictors(x, ...)
x |
An object of class |
... |
Further arguments passed to methods |
A character vector with the names of the predictor variables.
Extract the variables used in the model (for insight/marginaleffects compatibility)
find_variables.mlmodel(x, ...)find_variables.mlmodel(x, ...)
x |
An object of class |
... |
Further arguments passed to methods |
A list with components response (character) and conditional (character vector).
Extract Fitted Values from mlmodel
## S3 method for class 'mlmodel' fitted(object, ...) ## S3 method for class 'values.mlmodel' fitted(object, ...)## S3 method for class 'mlmodel' fitted(object, ...) ## S3 method for class 'values.mlmodel' fitted(object, ...)
object |
An |
... |
Further arguments passed to methods (currently ignored). |
A numeric vector of fitted values, aligned to the original data.
Dropped observations (due to NAs or subset) return NA.
Extract value formula from mlmodel objects
## S3 method for class 'mlmodel' formula(x, ...)## S3 method for class 'mlmodel' formula(x, ...)
x |
An |
... |
Currently not implemented |
The formula of the value equation (object of class formula).
Extract data used to fit the model (for insight/marginaleffects compatibility)
## S3 method for class 'mlmodel' get_data(x, ...) get_modeldata.mlmodel(x, ...)## S3 method for class 'mlmodel' get_data(x, ...) get_modeldata.mlmodel(x, ...)
x |
An object of class |
... |
Further arguments (currently ignored) |
The original data frame used when fitting the model
Performs the Manjon and Martinez (2014) chi-squared goodness-of-fit test for count data models.
GOFtest(object, bins = 0:5) ## S3 method for class 'mlmodel' GOFtest(object, bins = 0:5)GOFtest(object, bins = 0:5) ## S3 method for class 'mlmodel' GOFtest(object, bins = 0:5)
object |
An object of class |
bins |
Integer vector. Defines the boundaries of the bins used
to group counts. Default is |
The test compares the observed frequencies with the expected frequencies predicted by the model across different count bins. It produces both a binned comparison table and an overall regression-based chi-squared test statistic.
A low p-value indicates that the model's predicted probabilities do not adequately match the observed count distribution (model misspecification).
An object of class "GOFtest.mlmodel" with components:
Description of the fitted model.
A table with observed and predicted frequencies, proportions, absolute differences, and Pearson contributions per bin.
A list containing teststat, df, and pval for the overall
goodness-of-fit test.
Alfonso Sanchez-Penalver
Manjon, M., & Martinez, O. (2014). 'The chi-squared goodness-of-fit test for count-data models.' The Stata Journal, 14(4), 798-816. doi:10.1177/1536867X1401400406
# Poisson model fit_pois <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) GOFtest(fit_pois, bins = 0:5) # Negative binomial model fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) GOFtest(fit_nb2)# Poisson model fit_pois <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) GOFtest(fit_pois, bins = 0:5) # Negative binomial model fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) GOFtest(fit_nb2)
Extract the per-observation gradients (scores) evaluated at the estimated
parameters from an mlmodel object.
gradientObs(object) ## S3 method for class 'mlmodel' gradientObs(object)gradientObs(object) ## S3 method for class 'mlmodel' gradientObs(object)
object |
An |
These are the individual contributions to the score vector. They are mainly useful for advanced users who want to implement custom tests or diagnostics.
A numeric matrix with one row per observation and one column per parameter. Each row contains the gradient of the log-likelihood for that observation.
Extract the per-observation Hessian matrices evaluated at the estimated
parameters from an mlmodel object.
hessianObs(object) ## S3 method for class 'mlmodel' hessianObs(object)hessianObs(object) ## S3 method for class 'mlmodel' hessianObs(object)
object |
An |
This is mainly intended for advanced use (e.g., custom diagnostics or
information matrix tests). For most users, the functions IMtest or
vcov are more convenient.
A numeric matrix of dimension (N*K) x K, where N is the number
of observations and K is the number of parameters. The Hessian for each
observation is stacked vertically.
Performs the Information Matrix (IM) test for misspecification on models
fitted with the mlmodels package.
IMtest(object, ...) ## S3 method for class 'mlmodel' IMtest(object, method = "quad", repetitions = 999, seed = 1234L, ...)IMtest(object, ...) ## S3 method for class 'mlmodel' IMtest(object, method = "quad", repetitions = 999, seed = 1234L, ...)
object |
A fitted model object inheriting from |
... |
Further arguments passed to methods (currently not used). |
method |
Character string. Specifies the version of the test:
|
repetitions |
Integer. Number of bootstrap replications when using a bootstrap method. Default is 999. |
seed |
Integer. Random seed for reproducibility in bootstrap methods.
If |
The Information Matrix test checks whether the model is correctly specified by testing the equality between the Hessian and the outer product of the gradient (information matrix equality). Rejection of the null hypothesis indicates model misspecification (e.g., incorrect functional form, heteroskedasticity not properly modeled, omitted variables, etc.).
Two main versions are implemented:
Quadratic form ("quad"): Generally preferred for its better finite-sample properties.
OPG version ("opg"): Chesher and Lancaster (1983) version.
Bootstrap versions ("boot_quad" and "boot_opg") provide p-values based on
the empirical distribution of the test statistic and are useful when asymptotic
approximations may be unreliable.
An object of class "IMtest.mlmodel" containing the analytical test
statistic, degrees of freedom and p-value, plus the bootstrapped p-value
(if a bootstrap method was selected).
Alfonso Sanchez-Penalver
Chesher, A. (1983). The information matrix test: Simplified calculation via a score test interpretation. Economics Letters, 13(1), 45-48.
Lancaster, T. (1984). The covariance matrix of the information matrix test. Econometrica, 52(4), 1051-1053.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1), 1-25.
waldtest(), lrtest(), vuongtest()
# Linear model example data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default quadratic form test IMtest(fit) # OPG version IMtest(fit, method = "opg") # Bootstrap p-value (quadratic form) IMtest(fit, method = "boot_quad", repetitions = 50, seed = 123) # Heteroskedastic model fit_het <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, scale = ~ educ, data = mroz) IMtest(fit_het)# Linear model example data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default quadratic form test IMtest(fit) # OPG version IMtest(fit, method = "opg") # Bootstrap p-value (quadratic form) IMtest(fit, method = "boot_quad", repetitions = 50, seed = 123) # Heteroskedastic model fit_het <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, scale = ~ educ, data = mroz) IMtest(fit_het)
Extract Log-Likelihood from mlmodel objects
## S3 method for class 'mlmodel' logLik(object, ...) ## S3 method for class 'summary.mlmodel' logLik(object, ...)## S3 method for class 'mlmodel' logLik(object, ...) ## S3 method for class 'summary.mlmodel' logLik(object, ...)
object |
An object of class |
... |
Additional arguments passed to methods. |
The returned object is of class "logLik" and has two important attributes:
nobs: number of observations used in estimation.
df: number of estimated parameters (usually called K),
computed as length(coef(object)). This includes coefficients from both
the location (mean/value) and scale equations when present.
An object of class "logLik" with the log-likelihood value and the
attributes nobs and df.
Extract the per-observation log-likelihood contributions from an mlmodel
object.
loglikeObs(object) ## S3 method for class 'mlmodel' loglikeObs(object)loglikeObs(object) ## S3 method for class 'mlmodel' loglikeObs(object)
object |
An |
These individual contributions are useful for Vuong tests, robust variance estimation, or custom model diagnostics.
A numeric vector of length nobs(object) containing the
log-likelihood contribution of each observation.
Performs a likelihood ratio test comparing two nested models fitted with
the same estimator (e.g. ml_lm, ml_logit, ml_negbin, etc.).
lrtest(object_1, object_2, ...) ## S3 method for class 'mlmodel' lrtest(object_1, object_2, ...)lrtest(object_1, object_2, ...) ## S3 method for class 'mlmodel' lrtest(object_1, object_2, ...)
object_1 |
A fitted model object inheriting from |
object_2 |
A fitted model object inheriting from |
... |
Further arguments passed to methods (currently not used). |
The likelihood ratio test statistic is calculated as:
Under the null hypothesis that the restricted model is correct, LR follows
a distribution with degrees of freedom equal to the difference
in the number of parameters between the two models.
Important: The two models must be nested (the restricted model must be a special case of the unrestricted one) and fitted on exactly the same sample. The restricted model must have a lower (or equal) log-likelihood.
An object of class "lrtest.mlmodel" with the test statistic,
degrees of freedom, and p-value.
Alfonso Sanchez-Penalver
waldtest(), IMtest(), vuongtest()
# Linear model example data(mroz) mroz$incthou <- mroz$faminc / 1000 fit_small <- ml_lm(incthou ~ age + huswage, data = mroz) fit_large <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) lrtest(fit_small, fit_large) # You can also reverse the order — the function detects the restricted model lrtest(fit_large, fit_small)# Linear model example data(mroz) mroz$incthou <- mroz$faminc / 1000 fit_small <- ml_lm(incthou ~ age + huswage, data = mroz) fit_large <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) lrtest(fit_small, fit_large) # You can also reverse the order — the function detects the restricted model lrtest(fit_large, fit_small)
Fit Beta Model by Maximum Likelihood
ml_beta( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_beta( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Formula for the conditional log(mean) equation. |
scale |
Formula for log(phi) equation (precision parameter - optional).
If |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
noint_scale |
Logical. Should the scale equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value or scale equations. Use the dedicated
arguments noint_value and noint_scale instead.
Coefficient names in the fitted object use the prefixes value:: and
scale:: to clearly identify to which equation each coefficient belongs to,
and to avoid confusion when the same variable(s) appear(s) in both the value
and scale equations.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_lm automatically chooses the optimizer:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
The Beta model is only defined for a strictly fractional response variable
in the open interval (0, 1). Observations where the response is
y <= 0 or y >= 1 are automatically dropped with a warning.
If your data contains boundary values (0 or 1), consider using
ml_logit() instead.
An object of class ml_beta that extends mlmodel.
Alfonso Sanchez-Penalver
# Homoskedastic beta regression (fractional response) data(pw401k) # Beta regression requires 0 < y < 1. # Observations at the boundaries (0 or 1) are automatically dropped. fit_beta <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, data = pw401k, subset = prate < 1) # drop y = 1 summary(fit_beta, vcov.type = "robust") # Heteroskedastic beta regression fit_beta_het <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, scale = ~ totemp + sole, data = pw401k, subset = prate < 1) summary(fit_beta_het, vcov.type = "robust") # Note: All predictions (including those from predict(), fitted(), and # residuals()) return values aligned to the original data, with NA # for observations dropped due to subset or boundary values. # Different predict types head(predict(fit_beta, type = "response")$fit) # Expected value E[y] head(predict(fit_beta, type = "variance")$fit) # Variance of y head(predict(fit_beta, type = "phi")$fit) # Precision parameter # Fitted values and residuals head(fitted(fit_beta)) head(residuals(fit_beta)) head(residuals(fit_beta, type = "pearson"))# Homoskedastic beta regression (fractional response) data(pw401k) # Beta regression requires 0 < y < 1. # Observations at the boundaries (0 or 1) are automatically dropped. fit_beta <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, data = pw401k, subset = prate < 1) # drop y = 1 summary(fit_beta, vcov.type = "robust") # Heteroskedastic beta regression fit_beta_het <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, scale = ~ totemp + sole, data = pw401k, subset = prate < 1) summary(fit_beta_het, vcov.type = "robust") # Note: All predictions (including those from predict(), fitted(), and # residuals()) return values aligned to the original data, with NA # for observations dropped due to subset or boundary values. # Different predict types head(predict(fit_beta, type = "response")$fit) # Expected value E[y] head(predict(fit_beta, type = "variance")$fit) # Variance of y head(predict(fit_beta, type = "phi")$fit) # Precision parameter # Fitted values and residuals head(fitted(fit_beta)) head(residuals(fit_beta)) head(residuals(fit_beta, type = "pearson"))
Fit Gamma Model by Maximum Likelihood
ml_gamma( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_gamma( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Formula for the conditional log(mean) equation. |
scale |
Formula for log(nu) equation (shape parameter - optional).
If |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
noint_scale |
Logical. Should the scale equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value or scale equations. Use the dedicated
arguments noint_value and noint_scale instead.
Coefficient names in the fitted object use the prefixes value:: and
scale:: to clearly identify to which equation each coefficient belongs to,
and to avoid confusion when the same variable(s) appear(s) in both the value
and scale equations.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_lm automatically chooses the optimizer:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
The Gamma model requires a strictly positive response variable
(y > 0). Observations where y <= 0 are automatically
dropped with a warning.
If your data contains zeros or non-positive values, consider using
ml_poisson() or ml_negbin() instead, as they are frequently
applied to continuous non-negative outcomes.
An object of class ml_gamma that extends mlmodel.
Alfonso Sanchez-Penalver
# Homoskedastic gamma regression data(mroz) fit_gamma <- ml_gamma(faminc ~ hours + hushrs + age + educ, data = mroz) summary(fit_gamma, vcov.type = "robust") # Heteroskedastic gamma regression fit_gamma_het <- ml_gamma(faminc ~ hours + hushrs + age + educ, scale = ~ kidslt6, data = mroz) summary(fit_gamma_het, vcov.type = "robust") # Different predict types head(predict(fit_gamma, type = "response")$fit) # Expected value E[y] head(predict(fit_gamma, type = "variance")$fit) # Variance of y # Fitted values and residuals head(fitted(fit_gamma)) head(residuals(fit_gamma)) head(residuals(fit_gamma, type = "pearson")) # Comparison with lognormal model (often very similar mean predictions) fit_lognorm <- ml_lm(log(faminc) ~ hours + hushrs + age + educ, data = mroz) head(predict(fit_gamma, type = "response")$fit) head(predict(fit_lognorm, type = "response")$fit)# Homoskedastic gamma regression data(mroz) fit_gamma <- ml_gamma(faminc ~ hours + hushrs + age + educ, data = mroz) summary(fit_gamma, vcov.type = "robust") # Heteroskedastic gamma regression fit_gamma_het <- ml_gamma(faminc ~ hours + hushrs + age + educ, scale = ~ kidslt6, data = mroz) summary(fit_gamma_het, vcov.type = "robust") # Different predict types head(predict(fit_gamma, type = "response")$fit) # Expected value E[y] head(predict(fit_gamma, type = "variance")$fit) # Variance of y # Fitted values and residuals head(fitted(fit_gamma)) head(residuals(fit_gamma)) head(residuals(fit_gamma, type = "pearson")) # Comparison with lognormal model (often very similar mean predictions) fit_lognorm <- ml_lm(log(faminc) ~ hours + hushrs + age + educ, data = mroz) head(predict(fit_gamma, type = "response")$fit) head(predict(fit_lognorm, type = "response")$fit)
Fit linear model by Maximum Likelihood
ml_lm( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_lm( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Formula for the conditional mean (value) equation. |
scale |
Formula for log(sigma) (optional). If |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
noint_scale |
Logical. Should the scale equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value or scale equations. Use the dedicated
arguments noint_value and noint_scale instead.
Coefficient names in the fitted object use the prefixes value:: and
scale:: to clearly identify to which equation each coefficient belongs to,
and to avoid confusion when the same variable(s) appear(s) in both the value
and scale equations.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_lm automatically chooses the optimizer:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
An object of class ml_lm that extends mlmodel.
Alfonso Sanchez-Penalver
# Homoskedastic linear model data(mroz) fit_lin <- ml_lm(faminc ~ age + I(age^2) + huswage + educ + unem, data = mroz) summary(fit_lin, vcov.type = "robust") # Heteroskedastic linear model fit_het <- ml_lm(faminc ~ age + I(age^2) + huswage + educ + unem, scale = ~ educ + exper, data = mroz) summary(fit_het, vcov.type = "robust") # Lognormal (log-linear) model fit_log <- ml_lm(log(faminc) ~ age + I(age^2) + huswage + educ + unem, data = mroz) summary(fit_log, vcov.type = "robust") # Different predict types head(predict(fit_log, type = "response")$fit) # Expected value E[y] head(predict(fit_log, type = "median")$fit) # Median of y head(predict(fit_log, type = "variance_y")$fit) # Variance of y head(predict(fit_log, type = "var")$fit) # Variance of log(y) # Fitted values and residuals head(fitted(fit_lin)) head(residuals(fit_lin)) head(residuals(fit_lin, type = "pearson"))# Homoskedastic linear model data(mroz) fit_lin <- ml_lm(faminc ~ age + I(age^2) + huswage + educ + unem, data = mroz) summary(fit_lin, vcov.type = "robust") # Heteroskedastic linear model fit_het <- ml_lm(faminc ~ age + I(age^2) + huswage + educ + unem, scale = ~ educ + exper, data = mroz) summary(fit_het, vcov.type = "robust") # Lognormal (log-linear) model fit_log <- ml_lm(log(faminc) ~ age + I(age^2) + huswage + educ + unem, data = mroz) summary(fit_log, vcov.type = "robust") # Different predict types head(predict(fit_log, type = "response")$fit) # Expected value E[y] head(predict(fit_log, type = "median")$fit) # Median of y head(predict(fit_log, type = "variance_y")$fit) # Variance of y head(predict(fit_log, type = "var")$fit) # Variance of log(y) # Fitted values and residuals head(fitted(fit_lin)) head(residuals(fit_lin)) head(residuals(fit_lin, type = "pearson"))
Fit Binary Logit Model by Maximum Likelihood
ml_logit( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_logit( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Two-sided formula for the probability equation. |
scale |
Optional one-sided formula for heteroskedasticity. |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value equation. Use the dedicated argument
noint_value instead. For the scale equation (if modeling heteroskedasticity),
the formula must contain only the predictors (right-hand side).
ml_logit() handles both strictly binary (0/1), and fractional response
(0 <= y <= 1) outcomes. When using fractional responses, it is recommended
to use robust standard errors (vcov.type = "robust").
Coefficient names in the fitted object use the prefixes value:: and
scale:: (when heteroskedasticity is modeled) to clearly identify which
equation each coefficient belongs to.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_logit automatically chooses method:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
An object of class ml_logit that extends mlmodel.
Alfonso Sanchez-Penalver
# Homoskedastic binary logit model data(smoke) smoke$smokes <- smoke$cigs > 0 fit_logit <- ml_logit(smokes ~ cigpric + income + age, data = smoke) summary(fit_logit, vcov.type = "robust") # Heteroskedastic binary logit model fit_logit_het <- ml_logit(smokes ~ cigpric + income + age, scale = ~ educ, data = smoke) summary(fit_logit_het, vcov.type = "robust") # Different predict types head(predict(fit_logit, type = "response")$fit) # Predicted probability head(predict(fit_logit, type = "link")$fit) # Linear predictor (log-odds) # Fitted values and residuals head(fitted(fit_logit)) head(residuals(fit_logit)) head(residuals(fit_logit, type = "pearson"))# Homoskedastic binary logit model data(smoke) smoke$smokes <- smoke$cigs > 0 fit_logit <- ml_logit(smokes ~ cigpric + income + age, data = smoke) summary(fit_logit, vcov.type = "robust") # Heteroskedastic binary logit model fit_logit_het <- ml_logit(smokes ~ cigpric + income + age, scale = ~ educ, data = smoke) summary(fit_logit_het, vcov.type = "robust") # Different predict types head(predict(fit_logit, type = "response")$fit) # Predicted probability head(predict(fit_logit, type = "link")$fit) # Linear predictor (log-odds) # Fitted values and residuals head(fitted(fit_logit)) head(residuals(fit_logit)) head(residuals(fit_logit, type = "pearson"))
Fit negative binomial models by Maximum Likelihood
ml_negbin( value, scale = NULL, dispersion = "NB2", weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_negbin( value, scale = NULL, dispersion = "NB2", weights = NULL, data, subset = NULL, noint_value = FALSE, noint_scale = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Formula for the conditional mean (value) equation. |
scale |
Formula for the dispersion parameter log(alpha) (optional).
If |
dispersion |
Either NB1 (proportional to mean variance), or NB2 (quadratic to mean variance). Defaults to NB2. |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
noint_scale |
Logical. Should the scale equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value or scale equations. Use the dedicated
arguments noint_value and noint_scale instead.
Coefficient names in the fitted object use the prefixes value:: and
scale:: to clearly identify to which equation each coefficient belongs to,
and to avoid confusion when the same variable(s) appear(s) in both the value
and scale equations.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_lm automatically chooses the optimizer:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
An object of class ml_negbin that extends mlmodel.count and mlmodel.
Alfonso Sanchez-Penalver
# Homoskedastic NB2 model (default dispersion) data(docvis) fit_nb2 <- ml_negbin(docvis ~ age + educyr + totchr, data = docvis) summary(fit_nb2, vcov.type = "robust") # Homoskedastic NB1 model fit_nb1 <- ml_negbin(docvis ~ age + educyr + totchr, dispersion = "NB1", data = docvis) summary(fit_nb1, vcov.type = "robust") # Heteroskedastic NB2 model fit_nb2_het <- ml_negbin(docvis ~ age + educyr + totchr, scale = ~ female + bh, data = docvis) summary(fit_nb2_het, vcov.type = "robust") # Different predict types head(predict(fit_nb2, type = "response")$fit) # Expected count head(predict(fit_nb2, type = "var")$fit) # Variance head(predict(fit_nb2, type = "alpha")$fit) # Dispersion parameter # Fitted values and residuals head(fitted(fit_nb2)) head(residuals(fit_nb2)) head(residuals(fit_nb2, type = "pearson"))# Homoskedastic NB2 model (default dispersion) data(docvis) fit_nb2 <- ml_negbin(docvis ~ age + educyr + totchr, data = docvis) summary(fit_nb2, vcov.type = "robust") # Homoskedastic NB1 model fit_nb1 <- ml_negbin(docvis ~ age + educyr + totchr, dispersion = "NB1", data = docvis) summary(fit_nb1, vcov.type = "robust") # Heteroskedastic NB2 model fit_nb2_het <- ml_negbin(docvis ~ age + educyr + totchr, scale = ~ female + bh, data = docvis) summary(fit_nb2_het, vcov.type = "robust") # Different predict types head(predict(fit_nb2, type = "response")$fit) # Expected count head(predict(fit_nb2, type = "var")$fit) # Variance head(predict(fit_nb2, type = "alpha")$fit) # Dispersion parameter # Fitted values and residuals head(fitted(fit_nb2)) head(residuals(fit_nb2)) head(residuals(fit_nb2, type = "pearson"))
Fit Poisson model by Maximum Likelihood
ml_poisson( value, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_poisson( value, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Formula for the conditional mean (value) equation. |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value equation. Use the dedicated
argument noint_value instead.
Coefficient names in the fitted object use the prefixes value::. This is for
consistency with other mlmodel estimators that model the scale (dispersion)
as well.
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_lm automatically chooses the optimizer:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
The Poisson model assumes equidispersion (mean = variance). When the data show overdispersion (as is common), consider using ml_negbin instead.
An object of class ml_poisson that extends mlmodel.count and mlmodel.
Alfonso Sanchez-Penalver
# Poisson model data(docvis) fit_pois <- ml_poisson(docvis ~ age + educyr + totchr, data = docvis) summary(fit_pois, vcov.type = "robust") # Different predict types head(predict(fit_pois, type = "response")$fit) # Expected count head(predict(fit_pois, type = "P(2,)")$fit) # Probability of at least 2 head(predict(fit_pois, type = "P(3)")$fit) # Probability of exactly 3 # Fitted values and residuals head(fitted(fit_pois)) head(residuals(fit_pois)) head(residuals(fit_pois, type = "pearson"))# Poisson model data(docvis) fit_pois <- ml_poisson(docvis ~ age + educyr + totchr, data = docvis) summary(fit_pois, vcov.type = "robust") # Different predict types head(predict(fit_pois, type = "response")$fit) # Expected count head(predict(fit_pois, type = "P(2,)")$fit) # Probability of at least 2 head(predict(fit_pois, type = "P(3)")$fit) # Probability of exactly 3 # Fitted values and residuals head(fitted(fit_pois)) head(residuals(fit_pois)) head(residuals(fit_pois, type = "pearson"))
Fit Binary Probit Model by Maximum Likelihood
ml_probit( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )ml_probit( value, scale = NULL, weights = NULL, data, subset = NULL, noint_value = FALSE, constraints = NULL, start = NULL, method = "NR", control = NULL, ... )
value |
Two-sided formula for the probability equation. |
scale |
Optional one-sided formula for heteroskedasticity. |
weights |
Optional weights variable. It can be either the name of the
variable in |
data |
Data frame. |
subset |
Optional subset expression. Only observations for which this
expression evaluates to |
noint_value |
Logical. Should the value equation omit the intercept?
Default is |
constraints |
Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details. |
start |
Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details. |
method |
A string with the method used for optimization. See maxLik for options, and see Details. |
control |
A list of control parameters passed to maxLik.
If |
... |
Additional arguments passed to maxLik. |
Important: Do not use the usual R syntax to remove the intercept in the
formula (- 1 or + 0) for the value equation. Use the dedicated argument
noint_value instead. For the scale equation (if modeling heteroskedasticity),
the formula must contain only the predictors (right-hand side).
The dependent variable must be binary (0/1 or TRUE/FALSE).
Coefficient names in the fitted object use the prefixes value:: and
scale:: (when heteroskedasticity is modeled) to clearly identify which
equation each coefficient belongs to.
ml_probit() handles both strictly binary (0/1), and fractional response
(0 <= y <= 1) outcomes. When using fractional responses, it is recommended
to use robust standard errors (vcov.type = "robust").
Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.
Important: When constraints are supplied, start cannot be NULL.
You must provide initial values that yield a feasible log-likelihood.
If no constraints are used, any supplied start is ignored.
When constraints are used, ml_probit automatically chooses method:
Equality constraints => Nelder-Mead ("NM")
Inequality constraints => BFGS ("BFGS")
In these cases your supplied method argument (if any) is ignored.
An object of class ml_probit that extends mlmodel.
Alfonso Sanchez-Penalver
# Probit model (strictly binary outcome) data(smoke) smoke$smokes <- smoke$cigs > 0 fit_probit <- ml_probit(smokes ~ cigpric + income + age, data = smoke) summary(fit_probit, vcov.type = "robust") # Heteroskedastic probit model fit_probit_het <- ml_probit(smokes ~ cigpric + income + age, scale = ~ educ, data = smoke) summary(fit_probit_het, vcov.type = "robust") # Different predict types head(predict(fit_probit, type = "response")$fit) # Probability of success head(predict(fit_probit, type = "prob0")$fit) # Probability of failure head(predict(fit_probit, type = "link")$fit) # Linear predictor (z) # Fitted values and residuals head(fitted(fit_probit)) head(residuals(fit_probit)) head(residuals(fit_probit, type = "pearson"))# Probit model (strictly binary outcome) data(smoke) smoke$smokes <- smoke$cigs > 0 fit_probit <- ml_probit(smokes ~ cigpric + income + age, data = smoke) summary(fit_probit, vcov.type = "robust") # Heteroskedastic probit model fit_probit_het <- ml_probit(smokes ~ cigpric + income + age, scale = ~ educ, data = smoke) summary(fit_probit_het, vcov.type = "robust") # Different predict types head(predict(fit_probit, type = "response")$fit) # Probability of success head(predict(fit_probit, type = "prob0")$fit) # Probability of failure head(predict(fit_probit, type = "link")$fit) # Linear predictor (z) # Fitted values and residuals head(fitted(fit_probit)) head(residuals(fit_probit)) head(residuals(fit_probit, type = "pearson"))
Sample of 753 between the ages of 30 and 60 in 1975 (PSID interview year 1976)
mrozmroz
A data frame with 753 observations on 22 variables.
Binary: in labor force in 1975
Hours worked in 1975
Number of children less than 6 years old
Number of children 6 years old or older
Woman's age in years
Woman's years of education
Woman's estimated average hourly earnings
Representative wage in 1976 (interview year)
Hours worked by husband in 1975
Husband's age in years
Husband's years of education
Husband's estimated average hourly earnings
Total family income
Woman's Federal marginal income tax rate
Mother's years of education
Father's years of education
Unemployment rate in county of residence
Binary: lives in Standard Metropolitan Statistical Area (SMSA)
Years of labor market experience
Income from other sources than the wife's labor, in thousands
log(wage)
exper^2
https://cran.r-project.org/package=wooldridge
Wooldridge, J. M. (2020). Introductory Econometrics: A Modern Approach. 7th Edition. Boston, MA: Cengage Learning.
Mroz, T. A. (1987). "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions." Econometrica 55, 765-799.
Extract the Number of Observations from an mlmodel
## S3 method for class 'mlmodel' nobs(object, ...)## S3 method for class 'mlmodel' nobs(object, ...)
object |
An object of class |
... |
Further arguments passed to methods (currently not used). |
An integer giving the number of observations used in the estimation
(after removing missing values and applying any subset).
Provides a convenient infix operator for replacing NULL values.
x \%||\% y is equivalent to if (is.null(x)) y else x.
x %||% yx %||% y
x, y
|
Any R objects. |
The value of x if it is not NULL, otherwise the value of y.
NULL %||% "fallback" list(a = 1) %||% list(b = 2)NULL %||% "fallback" list(a = 1) %||% list(b = 2)
Performs Cameron and Trivedi's (1990) regression-based tests for overdispersion in count models.
OVDtest(object)OVDtest(object)
object |
An object of class |
These tests evaluate the null hypothesis that the conditional variance equals the conditional mean (the Poisson assumption). Rejection indicates overdispersion and suggests that a negative binomial model may be more appropriate.
When the input object is not a Poisson model, a Poisson regression is fitted
internally using the value (mean) equation specification from object in order
to perform the test.
A list containing the results of the tests against NB1 and NB2 alternatives, with coefficient estimates, t-statistics, and p-values.
Alfonso Sanchez-Penalver
Cameron, A. C., & Trivedi, P. K. (1990). 'Regression-based tests for overdispersion in the Poisson model.' Journal of Econometrics, 46(3), 347-364. doi:10.1016/0304-4076(90)90014-K
Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data (2nd ed.). Cambridge University Press. doi:10.1017/CBO9781139013567
# Poisson model fit_pois <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) OVDtest(fit_pois) # Negative binomial model (the test still fits a Poisson internally using # only the value equation, so results are identical) fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) OVDtest(fit_nb2)# Poisson model fit_pois <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) OVDtest(fit_pois) # Negative binomial model (the test still fits a Poisson internally using # only the value equation, so results are identical) fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) OVDtest(fit_nb2)
Methods for computing predictions from models fitted with the
mlmodels package.
## S3 method for class 'ml_beta' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_gamma' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_lm' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_logit' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'mlmodel' predict(object, ...) ## S3 method for class 'ml_negbin' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_poisson' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_probit' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )## S3 method for class 'ml_beta' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_gamma' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_lm' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_logit' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'mlmodel' predict(object, ...) ## S3 method for class 'ml_negbin' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_poisson' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_probit' predict( object, newdata = NULL, type = "response", se.fit = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )
object |
An object from an estimation with one of our models. |
newdata |
Optional data frame for out-of-sample predictions. |
type |
Character string indicating what to predict. See Details. |
se.fit |
Logical. If |
vcov |
Optional user-supplied variance-covariance matrix. |
vcov.type |
Type of variance-covariance matrix. See vcov. |
cl_var |
Clustering variable (name or vector). |
repetitions |
Number of bootstrap replications when |
seed |
Random seed for bootstrapping, for reproducibility. |
progress |
Logical. Show bootstrap/jackknife progress bar? Default is
|
... |
Additional arguments passed to methods. |
The type argument controls what quantity is returned.
| Type | Description | Notes |
"link" |
Linear mean predictor ( xb ) | logit-mean |
"response" |
Expected proportion (outcome) | Default |
"mean" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
- |
"odds" |
Odds ratio | exp(xb) |
"zd" |
Linear precision predictor | log-phi |
"phi" |
Dispersion parameter | - |
"shape1" |
Shape parameter of the beta distribution | mu * phi |
"shape2" |
Shape parameter of the beta distribution | (1 - mu) * phi |
"mode" |
Mode prediction (See below) | (shape1 - 1) / (shape1 + shape2 - 2) |
"variance" |
Variance of the outcome variable | mu * (1 - mu) / (1 + phi) |
"var" |
Alias for "variance" |
- |
"sigma" |
Standard deviation of outcome variable | sqrt("variance") |
"sd" |
Alias for "sigma" |
- |
When se.fit = TRUE, standard errors are computed using the delta method
for all supported types.
Mode Indeterminations
The mode is only defined if shape1 > 1 and shape2 > 1 and shape1 + shape2 != 2. If these conditions are not met the prediction and standard
error will be NA.
The type argument controls what quantity is returned.
| Type | Description | Notes |
"link" |
Linear mean predictor ( xb ) | log-mean |
"response" |
Expected outcome | Default |
"mean" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
- |
"zd" |
Linear shape predictor | log-nu |
"nu" |
Shape parameter | - |
"variance" |
Variance of the outcome variable | - |
"var" |
Alias for "variance" |
- |
"sigma" |
Standard deviation of outcome variable | sqrt("variance") |
"sd" |
Alias for "sigma" |
- |
When se.fit = TRUE, standard errors are computed using the delta method
for all supported types.
The type argument controls what quantity is returned. Behavior differs
depending on whether the outcome was modeled in logs (log(y)).
| Type | Normal (linear) case | Lognormal case (log(y)) |
Notes |
link |
Linear predictor for scale (zd) | Linear predictor on log scale (mu-log) | Scale equation |
fitted |
xb (mean predictor) | xb (original log-scale predictor) | Mean equation |
response, mean, mu |
xb (E[y]) |
E[y] = exp(mu-log + sigma^2/2) - shift |
Proper expected value on original scale |
median |
xb (same as mean) | exp(mu-log) - shift | Median of y |
sigma, sd |
sd of y | sd of log(y) |
On log scale |
sigma_y, sd_y |
same as sigma |
sd of y | Only meaningful in lognormal case |
variance, var |
sigma^2 | sigma^2 (variance of log(y)) |
On log scale |
variance_y, var_y |
same as variance |
Var(y) = exp(2 mu-log + sigma^2)(exp(sigma^2) - 1) | Only meaningful in lognormal case |
zd |
Linear predictor for scale (zd) | Linear predictor for scale (zd) | Alias for link
|
When the outcome is log-transformed, response (or mean) returns the
correct lognormal expected value on the original scale of y. The median
is the simple exponential back-transform.
The type argument controls what quantity is returned. Behavior differs
depending on whether the model is homoskedastic or heteroskedastic.
| Type | Homoskedastic case | Heteroskedastic case | Notes |
"xb" |
Linear predictor xb | Linear predictor xb | Linear predictor for value |
"response" |
P(y=1 | x) | P(y=1 | x) | Prob. of success (default) |
"prob" |
Alias for "response" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
Alias for "response" |
- |
"prob0" |
P(y=0 | x) | P(y=0 | x) | Prob. of failure |
"link" |
Linear predictor xb | xb / exp(zd) | Log-odds |
"odds" |
Odds = exp(xb) | Odds = exp(xb / exp(zd)) | - |
"sigma" |
1 (constant) | Std. Deviation: exp(zd) | Only available if heteroskedastic |
"variance" |
1 (constant) | Variance: exp(2*zd) | Only available if heteroskedastic |
"zd" |
0 (constant) | Linear predictor zd | Linear predictor for scale |
In binary logit models, the overall scale of the latent error term is
not identified and is normalized to 1. In the homoskedastic case there is
no scale equation, so sigma is fixed at 1. In the heteroskedastic case,
the scale equation has no intercept. Therefore, the predicted "sigma"
and "variance" represent individual-level deviations from the
normalized overall scale, not the absolute standard deviation or variance.
When se.fit = TRUE, standard errors are computed using the delta method.
Standard errors are not available (and will return NA) for "sigma",
"variance", and "zd" in homoskedastic models.
The type argument controls what quantity is returned. In addition to
standard types, Negative Binomial models support flexible probability requests
using the P(...) syntax.
| Type | Description | Notes |
"link" |
Linear mean predictor ( xb ) | log-mean |
"response" |
Expected count ( mu = exp(xb) ) |
Default |
"mean" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
- |
"zd" |
Linear dispersion predictor | log-alpha |
"alpha" |
Dispersion parameter | - |
"variance" |
Variance of the outcome variable | - |
"var" |
Alias for "variance" |
- |
"sigma" |
Standard deviation of outcome variable | sqrt("variance") |
"sd" |
Alias for "sigma" |
- |
P(k) |
P(Y = k) | Exact probability, k integer >= 0 |
P(,k) |
P(Y <= k) | Cumulative (lower tail) |
P(k,) |
P(Y >= k) | Survival (upper tail) |
P(a,b) |
P(a <= Y <= b) | Interval probability, a <= b, a >= 0 |
When se.fit = TRUE, standard errors are computed using the delta method
for all supported types.
The type argument controls what quantity is returned. In addition to
standard types, Poisson models support flexible probability requests
using the P(...) syntax.
| Type | Description | Notes |
"link" |
Linear predictor ( xb ) | log-mean |
"response" |
Expected count ( mu = exp(xb) ) |
Default |
"mean" |
Alias for "response" |
- |
"mu" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
- |
P(k) |
P(Y = k) | Exact probability, k integer >= 0 |
P(,k) |
P(Y <= k) | Cumulative (lower tail) |
P(k,) |
P(Y >= k) | Survival (upper tail) |
P(a,b) |
P(a <= Y <= b) | Interval probability, a <= b, a >= 0 |
When se.fit = TRUE, standard errors are computed using the delta method
for all supported types.
The type argument controls what quantity is returned. Behavior differs
depending on whether the model is homoskedastic or heteroskedastic.
| Type | Homoskedastic case | Heteroskedastic case | Notes |
"xb" |
Linear predictor xb | Linear predictor xb | Linear predictor for value |
"response" |
P(y=1 | x) | P(y=1 | x) | Prob. of success (default) |
"prob" |
Alias for "response" |
Alias for "response" |
- |
"fitted" |
Alias for "response" |
Alias for "response" |
- |
"prob0" |
P(y=0 | x) | P(y=0 | x) | Prob. of failure |
"link" |
Linear predictor xb | xb / exp(zd) | Probit index |
"odds" |
Odds = prob / prob0 | Odds = prob / prob0. | - |
"sigma" |
1 (constant) | Std. Deviation: exp(zd) | Only available if heteroskedastic |
"variance" |
1 (constant) | Variance: exp(2*zd) | Only available if heteroskedastic |
"zd" |
0 (constant) | Linear predictor zd | Linear predictor for scale |
In binary probit models, the overall scale of the latent error term is
not identified and is normalized to 1. In the homoskedastic case there is
no scale equation, so sigma is fixed at 1. In the heteroskedastic case,
the scale equation has no intercept. Therefore, the predicted "sigma"
and "variance" represent individual-level deviations from the
normalized overall scale, not the absolute standard deviation or variance.
The "link" type returns the value on the probit scale, which is the
inverse of the standard normal cumulative distribution function
(p = Phi^(-1)(p)). This is the linear prediction (p = xb) ih homoskedastic models,
and the standardized linear predictor (p = xb / sigma) in heteroskedastic models.
When se.fit = TRUE, standard errors are computed using the delta method.
Standard errors are not available (and will return NA) for "sigma",
"variance", and "zd" in homoskedastic models.
An object that inherits from predict.mlmodel and has two elements:
Vector with the predictions.
If se.fit is TRUE a vector with the delta-method standard
errors, using analytical gradients. If se.fit is FALSE, it is set to
NULL.
Alfonso Sanchez-Penalver
# Basic usage and different predict types data(docvis) fit_pois <- ml_poisson(docvis ~ age + educyr + totchr, data = docvis) head(predict(fit_pois, type = "response")$fit) # Expected count head(predict(fit_pois, type = "P(3)")$fit) # Prob of exactly 3 # Prediction at the mean (typical case) typical <- data.frame(age = mean(docvis$age), educyr = mean(docvis$educyr), totchr = mean(docvis$totchr)) predict(fit_pois, newdata = typical, type = "response") # In-sample vs full-data prediction with subset / boundary dropping data(pw401k) fit_beta <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, data = pw401k, subset = prate < 1) # In-sample prediction (NAs for dropped observations) head(predict(fit_beta, type = "response")$fit) # Full-data prediction (predicts for all rows, including dropped ones) head(predict(fit_beta, newdata = pw401k, type = "response")$fit)# Basic usage and different predict types data(docvis) fit_pois <- ml_poisson(docvis ~ age + educyr + totchr, data = docvis) head(predict(fit_pois, type = "response")$fit) # Expected count head(predict(fit_pois, type = "P(3)")$fit) # Prob of exactly 3 # Prediction at the mean (typical case) typical <- data.frame(age = mean(docvis$age), educyr = mean(docvis$educyr), totchr = mean(docvis$totchr)) predict(fit_pois, newdata = typical, type = "response") # In-sample vs full-data prediction with subset / boundary dropping data(pw401k) fit_beta <- ml_beta(prate ~ mrate + I(mrate^2) + log(totemp) + I(log(totemp)^2) + age + I(age^2) + sole, data = pw401k, subset = prate < 1) # In-sample prediction (NAs for dropped observations) head(predict(fit_beta, type = "response")$fit) # Full-data prediction (predicts for all rows, including dropped ones) head(predict(fit_beta, newdata = pw401k, type = "response")$fit)
A dataset containing participation rates and firm characteristics used in Papke and Wooldridge (1996).
pw401kpw401k
A data frame with 4734 rows and 5 variables:
participation rate, proportion
matching rate, proportion
total number of employees
years the plan has been in place
binary indicating if it's sole plan offered by employer
Papke, L. E. and Wooldridge, J. M. (1996).
Papke, L. E., & Wooldridge, J. M. (1996). "Econometric methods for fractional response variables with an application to 401(k) plan participation rates." Journal of Applied Econometrics, 11(6), 619-632. doi:10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1
Extract Model Residuals
## S3 method for class 'mlmodel' residuals(object, type = c("response", "pearson"), ...)## S3 method for class 'mlmodel' residuals(object, type = c("response", "pearson"), ...)
object |
An |
type |
Character string. Type of residuals to return.
Currently supported: |
... |
Further arguments passed to methods (currently not used). |
"response" residuals are the raw residuals: observed minus fitted values.
"pearson" residuals are standardized by the model-implied standard deviation:
.
For Poisson models they use , for binary models
, and for other models the appropriate
variance from predict(object, type = "var") or type = "var_y".
A numeric vector of residuals aligned to the original data frame.
Observations dropped during estimation (due to NAs or subset)
return NA.
Extract Standard Errors from mlmodel Objects
Extracts standard errors for an mlmodel object.
se(object, ...) ## S3 method for class 'mlmodel' se( object, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )se(object, ...) ## S3 method for class 'mlmodel' se( object, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )
object |
An object of class |
... |
Further arguments passed to methods (currently not used). |
vcov |
An optional user-supplied variance-covariance matrix. |
vcov.type |
Character string specifying the type of variance-covariance
matrix to use. One of |
cl_var |
Character string or vector. Name of the clustering variable
(or the vector itself) when |
repetitions |
Integer. Number of bootstrap replications when
|
seed |
Integer. Random seed for reproducibility when bootstrapping.
If |
progress |
Logical. Should a progress bar be shown during bootstrapping
or jackknifing? Default is |
A named numeric vector of standard errors, with the same names
as coef(object).
vcov.mlmodel, summary.mlmodel, confint.mlmodel
Sample of males in 1979 and early 1980 from the smoking supplement.
smokesmoke
A data frame with 807 observations and 10 variables.
Years of education
State's cigarette price (cents per pack)
Binary: white
Age in years
Annual income in dollars
Number of cigarettes smoked per day
Binary: restaurant restrictions on smoking
log(income)
age^2
log(cigprice)
https://cran.r-project.org/package=wooldridge
Wooldridge, J. M. (2020). Introductory Econometrics: A Modern Approach. 7th Edition. Boston, MA: Cengage Learning.
Mullahy, J. (1997). "Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking and Infant Health." Review of Economics and Statistics 79, 586-593.
Summary for mlmodel objects
## S3 method for class 'ml_beta' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_gamma' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_lm' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_logit' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'mlmodel' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_negbin' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_poisson' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_probit' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )## S3 method for class 'ml_beta' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_gamma' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_lm' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_logit' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'mlmodel' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_negbin' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_poisson' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... ) ## S3 method for class 'ml_probit' summary( object, correlation = FALSE, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )
object |
A fitted model object of class |
correlation |
Logical. Should the correlation matrix of the estimated
parameters be included in the output? Default is |
vcov |
Optional user-supplied variance-covariance matrix. If provided, it will be used instead of computing one internally. |
vcov.type |
Character string specifying the type of variance-covariance matrix to use. See vcov. |
cl_var |
Character string or vector. Name of the clustering variable or the vector itself. See vcov. |
repetitions |
Integer. Number of bootstrap replications when
|
seed |
Integer. Random seed for reproducibility when |
progress |
Logical. Should a progress bar be displayed during
bootstrapping or jackknifing? Default is |
... |
Further arguments passed to methods. |
Coefficient names in the fitted object use the prefixes value:: and
scale:: to identify to which equation they belong to, and to avoid
confusion when the same variable(s) appear(s) in both the value and scale
equations.
An object of class summary.mlmodel and any other class that extends
it.
Alfonso Sanchez-Penalver
data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default: observed information matrix summary(fit) # Different variance types summary(fit, vcov.type = "opg") # Outer product of gradients summary(fit, vcov.type = "robust") # Robust/sandwich estimator # Clustered robust standard errors summary(fit, vcov.type = "robust", cl_var = "age") # Using a pre-computed variance matrix (e.g. bootstrap) v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) summary(fit, vcov = v_boot)data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Default: observed information matrix summary(fit) # Different variance types summary(fit, vcov.type = "opg") # Outer product of gradients summary(fit, vcov.type = "robust") # Robust/sandwich estimator # Clustered robust standard errors summary(fit, vcov.type = "robust", cl_var = "age") # Using a pre-computed variance matrix (e.g. bootstrap) v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) summary(fit, vcov = v_boot)
Update an mlmodel Call
## S3 method for class 'mlmodel' update( object, formula. = NULL, scale. = NULL, data = NULL, weights = NULL, subset = NULL, ..., evaluate = TRUE ) ## S3 method for class 'ml_poisson' update( object, formula. = NULL, data = NULL, weights = NULL, subset = NULL, ..., evaluate = TRUE )## S3 method for class 'mlmodel' update( object, formula. = NULL, scale. = NULL, data = NULL, weights = NULL, subset = NULL, ..., evaluate = TRUE ) ## S3 method for class 'ml_poisson' update( object, formula. = NULL, data = NULL, weights = NULL, subset = NULL, ..., evaluate = TRUE )
object |
An |
formula. |
An updated formula for the value (location/mean) equation. |
scale. |
An updated formula for the scale equation (if supported by the model). |
data |
A data frame to be used when re-fitting the model. |
weights |
Optional case weights. |
subset |
An expression or logical vector to subset, or a vector of indices
to resample ( |
... |
Further arguments passed to methods (currently ignored). |
evaluate |
Logical. If |
This method re-evaluates the original model call after modifying selected
arguments. It is used internally by IMtest() and serves as a fallback
mechanism in bootstrap and jackknife variance estimation when a model-specific
implementation is not available.
sandwich package compatibility
The functions sandwich::vcovBS() and sandwich::vcovJK() are supported
through this update() method. They produce numerically equivalent results
to our own vcov(object, type = "boot") and vcov(object, type = "jack")
when all bootstrap/jackknife replications converge, taking longer to compute
them.
Important difference: When some replications fail to converge,
sandwich includes those failed iterations in the variance calculation,
while our vcov() implementation uses only successful replications.
The latter is statistically more appropriate.
We therefore strongly recommend using the native vcov() methods provided
by mlmodels for bootstrap and jackknife variance-covariance matrices.
And updated mlmodel object (or the class of the estimator that
extends it) with the modified formula/call and refitted parameters.
Returns the variance-covariance matrix of the estimated parameters using different methods.
## S3 method for class 'mlmodel' vcov( object, type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = TRUE, ... )## S3 method for class 'mlmodel' vcov( object, type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = TRUE, ... )
object |
An object of class |
type |
Character string specifying the type of variance-covariance
matrix. One of |
cl_var |
Character string or vector. Name of the clustering variable in the data, or the vector itself. |
repetitions |
Integer. Number of bootstrap replications to use when
|
seed |
Integer. Random seed for reproducibility when |
progress |
Logical. Should a progress bar be displayed? Default is
|
... |
Further arguments passed to methods. |
The package provides several variance-covariance estimators through the type argument:
"oim" - Observed Information Matrix (default)
"opg" - Outer Product of Gradients (BHHH)
"robust" - Robust (sandwich) estimator
"cluster" - Alias for "robust" when clustering the variance but requires cl_var to be set.
"boot" - Bootstrap (with optional clustering)
"jack" - Jackknife (with optional clustering)
"jackknife" - alias for "jack"
Clustered standard errors are obtained by setting cl_var when using "robust"/"cluster", "boot", or "jack".
A symmetric variance-covariance matrix with coefficient names on the rows and columns.
Alfonso Sanchez-Penalver
data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Different variance-covariance estimators v_oim <- vcov(fit, type = "oim") # Observed Information Matrix (default) v_opg <- vcov(fit, type = "opg") # Outer Product of Gradients (BHHH) v_robust <- vcov(fit, type = "robust") # Robust / Sandwich estimator # Clustered robust standard errors v_clust <- vcov(fit, type = "robust", cl_var = "age") # Bootstrap variance-covariance matrix v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) # Jackknife variance-covariance matrix v_jack <- vcov(fit, type = "jack") # Compare standard errors across methods sterrors <- data.frame( oim = sqrt(diag(v_oim)), opg = sqrt(diag(v_opg)), robust = sqrt(diag(v_robust)), cluster = sqrt(diag(v_clust)), bootstrap = sqrt(diag(v_boot)), jackknife = sqrt(diag(v_jack)) ) sterrorsdata(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # Different variance-covariance estimators v_oim <- vcov(fit, type = "oim") # Observed Information Matrix (default) v_opg <- vcov(fit, type = "opg") # Outer Product of Gradients (BHHH) v_robust <- vcov(fit, type = "robust") # Robust / Sandwich estimator # Clustered robust standard errors v_clust <- vcov(fit, type = "robust", cl_var = "age") # Bootstrap variance-covariance matrix v_boot <- vcov(fit, type = "boot", repetitions = 100, seed = 123) # Jackknife variance-covariance matrix v_jack <- vcov(fit, type = "jack") # Compare standard errors across methods sterrors <- data.frame( oim = sqrt(diag(v_oim)), opg = sqrt(diag(v_opg)), robust = sqrt(diag(v_robust)), cluster = sqrt(diag(v_clust)), bootstrap = sqrt(diag(v_boot)), jackknife = sqrt(diag(v_jack)) ) sterrors
Performs Vuong's (1989) test for comparing two non-nested models fitted
via maximum likelihood with the mlmodels package.
vuongtest(object_1, object_2, ...) ## S3 method for class 'mlmodel' vuongtest(object_1, object_2, ...)vuongtest(object_1, object_2, ...) ## S3 method for class 'mlmodel' vuongtest(object_1, object_2, ...)
object_1 |
A fitted model object inheriting from |
object_2 |
A fitted model object inheriting from |
... |
Further arguments passed to methods (currently not used). |
Vuong's test compares two non-nested models by testing the null hypothesis that the two models are equally close to the true data generating process.
The test statistic is based on the difference in the per-observation
log-likelihood contributions between the two models. A positive significant
value favors object_1, a negative significant value favors object_2,
and a non-significant value leads to an "inconclusive" result.
Both models must be estimated on exactly the same sample.
An object of class "vuongtest.mlmodel" containing the test
statistic, p-value, and a conclusion (which model is preferred or
"inconclusive").
Vuong, Q. H. (1989). 'Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.' Econometrica, 57(2), 307-333. doi:10.2307/1912557
lrtest(), waldtest(), IMtest()
# Linear models example (lognormal vs gamma) data(mroz) mroz$incthou <- mroz$faminc / 1000 fit_lognormal <- ml_lm(log(incthou) ~ age + I(age^2) + huswage + educ + unem, data = mroz) fit_gamma <- ml_gamma(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) vuongtest(fit_lognormal, fit_gamma) # Count models example fit_poi <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) fit_nb1 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis, dispersion = "NB1") fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) # Poisson vs. NB1 vuongtest(fit_poi, fit_nb1) # NB1 vs. NB2 vuongtest(fit_nb1, fit_nb2) # Binary models example data(smoke) smoke$smokes <- smoke$cigs > 0 fit_logit <- ml_logit(smokes ~ cigpric + income + age, data = smoke) fit_probit <- ml_probit(smokes ~ cigpric + income + age, data = smoke) vuongtest(fit_logit, fit_probit)# Linear models example (lognormal vs gamma) data(mroz) mroz$incthou <- mroz$faminc / 1000 fit_lognormal <- ml_lm(log(incthou) ~ age + I(age^2) + huswage + educ + unem, data = mroz) fit_gamma <- ml_gamma(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) vuongtest(fit_lognormal, fit_gamma) # Count models example fit_poi <- ml_poisson(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) fit_nb1 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis, dispersion = "NB1") fit_nb2 <- ml_negbin(docvis ~ private + medicaid + age + I(age^2) + educyr + actlim + totchr, data = docvis) # Poisson vs. NB1 vuongtest(fit_poi, fit_nb1) # NB1 vs. NB2 vuongtest(fit_nb1, fit_nb2) # Binary models example data(smoke) smoke$smokes <- smoke$cigs > 0 fit_logit <- ml_logit(smokes ~ cigpric + income + age, data = smoke) fit_probit <- ml_probit(smokes ~ cigpric + income + age, data = smoke) vuongtest(fit_logit, fit_probit)
Performs a Wald test of linear restrictions on the parameters of an
mlmodel object.
waldtest(object, ...) ## S3 method for class 'mlmodel' waldtest( object, indices = NULL, coef_names = NULL, rest_matrix = NULL, rhs = 0, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )waldtest(object, ...) ## S3 method for class 'mlmodel' waldtest( object, indices = NULL, coef_names = NULL, rest_matrix = NULL, rhs = 0, vcov = NULL, vcov.type = "oim", cl_var = NULL, repetitions = 999, seed = NULL, progress = FALSE, ... )
object |
An object of class |
... |
Further arguments passed to methods. |
indices |
Integer vector. Positions of the coefficients to be tested. |
coef_names |
Character vector. Names of the coefficients to test. |
rest_matrix |
Numeric matrix. A q × k restriction matrix (advanced use). |
rhs |
Numeric vector. Value(s) the linear combination(s) should equal. Default is 0. |
vcov |
Optional user-supplied variance-covariance matrix. |
vcov.type |
Character string. Type of variance-covariance matrix to use.
One of |
cl_var |
Character string or vector. Clustering variable when
|
repetitions |
Integer. Number of bootstrap replications when
|
seed |
Integer. Random seed for bootstrap. |
progress |
Logical. Show progress bar during bootstrapping? Default |
The Wald test evaluates linear restrictions of the form .
Three convenient interfaces are provided:
indices or coef_names: Test individual coefficients (or groups of
coefficients) against the value(s) in rhs (defaults to 0, which is
useful for joint significance tests).
rest_matrix + rhs: Test general linear combinations of coefficients
(advanced use).
The test statistic follows a distribution with degrees of
freedom equal to the number of restrictions under the null hypothesis.
An object of class "waldtest.mlmodel".
Alfonso Sanchez-Penalver
lrtest(), IMtest(), confint.mlmodel()
data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # 1. Test single coefficients using indices (default OIM) waldtest(fit, indices = c(2, 5)) # 2. Test using coefficient names and robust standard errors waldtest(fit, coef_names = c("value::educ", "value::unem"), vcov.type = "robust") # 3. Test explicit constraints waldtest(fit, coef_names = "value::educ", rhs = 1, vcov.type = "robust") # 4. Test a linear combination of two coefficients using a restriction matrix # H0: educ + huswage = 3 R <- matrix(c(0, 0, 0, 1, 1, 0, 0), nrow = 1) waldtest(fit, rest_matrix = R, rhs = 3, vcov.type = "boot", repetitions = 100, seed = 123)data(mroz) mroz$incthou <- mroz$faminc / 1000 fit <- ml_lm(incthou ~ age + I(age^2) + huswage + educ + unem, data = mroz) # 1. Test single coefficients using indices (default OIM) waldtest(fit, indices = c(2, 5)) # 2. Test using coefficient names and robust standard errors waldtest(fit, coef_names = c("value::educ", "value::unem"), vcov.type = "robust") # 3. Test explicit constraints waldtest(fit, coef_names = "value::educ", rhs = 1, vcov.type = "robust") # 4. Test a linear combination of two coefficients using a restriction matrix # H0: educ + huswage = 3 R <- matrix(c(0, 0, 0, 1, 1, 0, 0), nrow = 1) waldtest(fit, rest_matrix = R, rhs = 3, vcov.type = "boot", repetitions = 100, seed = 123)