Title: | Model Selection and Multimodel Inference Made Easy |
---|---|
Description: | Automated model selection and model-averaging. Provides a wrapper for glm and other functions, automatically generating all possible models (under constraints set by the user) with the specified response and explanatory variables, and finding the best models in terms of some Information Criterion (AIC, AICc or BIC). Can handle very large numbers of candidate models. Features a Genetic Algorithm to find the best models when an exhaustive screening of the candidates is not feasible. |
Authors: | Vincent Calcagno [aut, cre] |
Maintainer: | Vincent Calcagno <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.8 |
Built: | 2024-11-26 06:27:41 UTC |
Source: | CRAN |
These functions are used by glmulti
to compute Information Criteria from a fitted model object.
They are S4 generics, currently with methods for glm
/lm
objects.
You may define your own methods if needed, to support other fitting functions or to use other IC.
aic(object, ...) aicc(object, ...) bic(object, ...) qaic(object, ...) qaicc(object, ...)
aic(object, ...) aicc(object, ...) bic(object, ...) qaic(object, ...) qaicc(object, ...)
object |
A fitted model object. |
... |
Further arguments that may be useful for future methods. |
When using qaic
or qaicc
, you will need to provide an estimate of the inflation factor c.
This is done through the global variable glmultiqaiccvalue
. Set this varaible equal to your estimated c before running glmulti
.
Otherwise it will stop with a warning.
The IC value of the model.
Vincent Calcagno, McGill University
Used to compute the AIC value for several classes of models.
Default method for e.g. glm
and lm
Used to compute the AICc value for several classes of models.
Default method for e.g. glm
and lm
Used to compute the BIC value for several classes of models.
Default method for e.g. glm
and lm
These functions, applied on a glmulti
object, produce model-averaged estimates, unconditional confidence intervals, and predictions
from the models in the confidence set (or a subset of them). They are equivalents of the standard coef
and predict
for single models.
# S3 coef method for class 'glmulti' ## S3 method for class 'glmulti' coef(object, select="all", varweighting="Buckland", icmethod="Lukacs", alphaIC=0.05, ...) # S3 predict method for class 'glmulti' ## S3 method for class 'glmulti' predict(object, select="all", newdata=NA, se.fit=FALSE, varweighting="Buckland", icmethod="Lukacs", alphaIC=0.05, ...)
# S3 coef method for class 'glmulti' ## S3 method for class 'glmulti' coef(object, select="all", varweighting="Buckland", icmethod="Lukacs", alphaIC=0.05, ...) # S3 predict method for class 'glmulti' ## S3 method for class 'glmulti' predict(object, select="all", newdata=NA, se.fit=FALSE, varweighting="Buckland", icmethod="Lukacs", alphaIC=0.05, ...)
object |
an object of class |
select |
A specification of which models should be used for inference. By default all models are used, see below. |
varweighting |
The method to be used to compute the unconditional variance. "Buckland" (the default) (implements the approach presented in Buckland et al. 1997. "Johnson" implements a slightly different approach recommended in Johnson \& Omland 2004 and proposed at page 235 in Burnham \& Anderson 2002. The latter results in slightly bigger estimates of the unconditional variance of model coefficients. |
icmethod |
Method to construct confidence intervals. One of "Standard", "Burnham" or "Lukacs". The three methods differ in their use (or not) of degrees of freedom. |
newdata |
New data.frame of data for which to predict values |
se.fit |
Whether to return unconditional variances and confidence intervals associated with predicted values |
alphaIC |
The alpha risk when building the confidence intervals |
... |
Further arguments to single-model |
select can be used to specify which models should be used for inference. By default all are used. If specifying an integer value x, only the x best models are used. If a numeric value is provided, if it less than one, models that sum up to x% of evidence weight are used. If it more than one, models within x IC units from the best model are used.
coef
returns a data.frame with model-averaged estimates of the different parameters in the models, as well as their unconditional variance, importance, and confidence interval according to one of three methods: "Standard" simply assumes a Normal distribution of the estimator (Buckland 1997), "Lukacs" assumes a Student distribution with degrees of freedom taken to be averaged across models (see Lukacs et al. 2010), and "Burnham" is a more sophisticated Student-based method proposed by Burnham \& Anderson 2002.
predict
returns a list of three elements: the multi-model predictions, their variability (unconditional variance and confidence interval, if se.fit=T
), and the number of NA predicted values that were treated as zeros when averaging models.
Vincent Calcagno, McGill University
Buckland et al. 1997. Model selection: an integral part of inference. Biometrics. Burnham \& Anderson. 2002. Model Selection and Multimodel Inference. An Information Theoretic Practical Approach. Johnson \& Omland. 2004. Model selection in ecology and evolution. TREE. Lukacs et al. 2010..Model selection bias and Freedman's paradox. Annals of the Institute of Statistical Mathematics.
The function takes a list of glmulti
objects and returns a new glmulti
object.
This object contains the best models found in all passed objects, with duplicates removed.
Useful to bring back together the different parts of a parallelized exhaustive screen, or to make the consensus
of several replicate genetic algorithms.
consensus(xs, confsetsize = NA, ...)
consensus(xs, confsetsize = NA, ...)
xs |
A list containing |
confsetsize |
The number of best models to be included in the consensus object. If |
... |
Further arguments, allowing to write custom S4 methods for this generic. |
The name of the returned object is the name of the first object passed, with "consensus-" prepended.
The params
slot of the returned object is taken from the first object too.
Note that if not all glmulti
objects in xs do contain fitted model objects, then no model object will be included in the returned glmulti
object.
A standard object of class glmulti
Vincent Calcagno, McGill University
Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12
Concatenates several glmulti
objects and makes a consensus of them.
a list of glmulti
objects to make a consensus of. Can include actual objects or filenames pointing to objects written on the disk
(typically glmulti
objects that have been saved using write
with file="filename|object"
).
Generic S4 function used to access the coefficients (and their standard error) from a fitted model.
It is used by coef.glmulti
.
getfit(object, ...)
getfit(object, ...)
object |
A fitted model object, of class |
... |
Further arguments that may be required by some methods of this generic. |
Define your own method for this generic when using glmulti
with some custom fitting function.
A data.frame, with as rows the coefficients that are fitted in the model, and three columns: the first with the estimated coefficients, the second with the associated standard errors, and the third with degrees of freedom.
Vincent Calcagno, McGill University
Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12
Used by coef.glmulti
to access the contents of fitted model objects of different classes.
Default method, for e.g. glm
objects.
Method for coxph
objects.
Method for coxph.null
objects.
glmulti
finds what are the n
best models (the confidence set of models) among all possible models (the candidate set, as specified by the user).
Models are fitted with the specified fitting function (default is glm
) and are ranked with the specified Information Criterion (default is aicc
).
The best models are found either through exhaustive screening of the candidates, or using a genetic algorithm, which allows very large candidate sets to be adressed.
The output can be used for model selection, variable selection, and multimodel inference.
# glmulti S4 generic glmulti(y, xr, data, exclude = c(), name = "glmulti.analysis", intercept = TRUE, marginality = FALSE, bunch=30, chunk = 1, chunks = 1, level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1, method = "h", crit = "aic", confsetsize = 100, popsize = 100, mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm", resumefile = "id", includeobjects=TRUE, ...)
# glmulti S4 generic glmulti(y, xr, data, exclude = c(), name = "glmulti.analysis", intercept = TRUE, marginality = FALSE, bunch=30, chunk = 1, chunks = 1, level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1, method = "h", crit = "aic", confsetsize = 100, popsize = 100, mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm", resumefile = "id", includeobjects=TRUE, ...)
y |
A formula, character string, or fitted model (of class lm or glm) specifying the response variable and the terms (main effects and/or interactions) to be used in the candidate models (e.g. height~age*sex+mass). Alternatively, a character string naming the variable to be used as response (e.g. "height") (in which case the names of the predictors must be passed through the xr argument) Alternatively, a custom list of (fitted) model objects can also be passed (can be convenient for small candidate sets). |
xr |
An optional character array specifying the variables (categorical or quantitative) to be used as predictors, e.g. c("age", "height" , "mass") |
exclude |
Optional character vector naming terms (main effects or interactions) to be excluded from the candidate models, e.g. c("mass:height") |
intercept |
Whether to include an intercept in the candidate models or not. |
level |
If 1, only main effects (terms of order 1) are used to build the candidate set. If 2, pairwise interactions are also used (higher order interactions are currently ignored) |
data |
A data.frame containing the data. If not specified, |
name |
The name of this |
marginality |
Whether to apply the marginality rule or not. If TRUE, only marginal models will be considered. |
minsize |
This sets a constraint on candidate models. Minimal number of TERMS (main effects or interactions) to be included in candidate models (negative = no constraint) |
maxsize |
This sets a constraint on candidate models. Maximal number of TERMS to be included in candidate models (negative = no constraint) |
minK |
This sets a constraint on candidate models. Minimal complexity of candidate models (negative = no constraint) |
maxK |
This sets a constraint on candidate models. Maximal complexity of candidate models (negative = no constraint) |
method |
The method to be used to explore the candidate set of models. If "h" an exhaustive screening is undertaken. If "g" the genetic algorithm is employed (recommended for large candidate sets). If "l", a very fast exhaustive branch-and-bound algorithm is used. Package leaps must then be loaded, and this can only be applied to linear models with covariates and no interactions. If "d", a simple summary of the candidate set is printed, including the number of candidate models. |
crit |
The Information Criterion to be used. This should be a function that accepts a fitted model as first argument. Default is the original Akaike IC (aic).
Other provided functions are the Bayes IC ( |
fitfunction |
The fitting function to be used. Any function similar to |
confsetsize |
The number of models to be looked for, i.e. the size of the returned confidence set. |
plotty |
Whether to plot the progress of the IC profile when running. |
report |
Whether to report about the progress at run time. |
bunch |
The number of model formulas to be returned (to be fitted) at each call to the enumerator. Exhaustive screening only. |
chunk |
When using an exhaustive screening approach, it can be splitted in several parts to take advantage of multiple CPUs. chunk is an integer specifying which part the current call should perform. |
chunks |
When splitting an exhaustive screening, the total number of parts the task should be divided into. For example, with a quad-core processor, 4 may be useful. Use consensus to bring back the pieces into a single object. |
popsize |
The population size for the genetic algorithm |
mutrate |
The per locus (i.e. per term) mutation rate for genetic algorithm, between 0 and 1 |
sexrate |
The rate of sexual reproduction for the genetic algorithm, between 0 and 1 |
imm |
The rate of immigration for the genetic algorithm, between 0 and 1 |
deltaM |
The target change in mean IC (defines the stop rules for the genetic algorithm) |
deltaB |
The target change in best IC (defines the stop rules for the genetic algorithm) |
conseq |
The target successive number of times with no improvement (i.e. target changes have been attained) (defines the stop rule for the GA). The greater it is, the more stringent the stop rule. |
resumefile |
When resuming an analysis (method="r"), the name of the files from which to resume. Default uses the same as name |
includeobjects |
Whether or not to include fiited models as objects. This makes |
... |
Further arguments to be passed to the fitting function. E.g. maxit=50 or family=binomial |
glmulti
is defined as a S4 function. It acts as a frontend that calls background compiled functions (contained if archive glmulti.jar).
Running the function therefore requires a Java Running Environment, and package rJava
.
A thorough description of this function and package can be found in the article by Calcagno and de Mazancourt (see References).
print.glmulti
and summary.glmulti
are S3 methods which provide a synthesis of glmulti
analyses.
NOTE: When calling glmulti
with a model object as y, only the formula will be extracted from the object. This means that optional arguments to the fitting function (e.g. family or maxit) will NOT be extracted. These arguments should be passed to glmulti
through the ....
An object of class glmulti
is returned. It is a S4 object with several slots containing relevant data for model selection and beyond.
Several standard S3 functions are provided to help access the content of this object.
Several glmulti
objects can be shrunk to one using the function consensus. This is useful to get the best of several replicates (of the genetic algorithm) or to bring together the different parts of a splitted exhaustive screening.
When running a genetic algorithm, two tiny java
files (serialized objects) are also written to the disk at regular intervals. They can be used to resume the calculation (method="r") if it was interrupted for any reason. This can also be used to continue a GA with modified parameters (e.g. mutation rate).
Vincent Calcagno, McGill University, Canada
Buckland (1997) Model Selection: an Integral Part of Inference. Biometrics 10:41 Burnham & Anderson (2002) Model Selection and Multimodel Inference: an Information Theoretic Approach Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12
consensus
, aic
, weightable
, summary.glmulti
, coef.glmulti
, step
# See the document "glmulti.pdf" included in the package. # It explains the general approach and shows how to use # glmulti with mixed models from the lme4 package. # Other examples: # A. This shows how to do the same for zero-inflated poisson models # we load the required package library(pscl) # a random vector of count data round(runif(100, 0,20)*round(runif(100)))-> vy2 # dummy predictors va = runif(100) vb = runif(100) # 1. The wrapper function zeroinfl.glmulti=function(formula, data, inflate = "|1",...) { zeroinfl(as.formula(paste(deparse(formula), inflate)),data=data,...) } # The default getfit and aicc method will work for zeroinfl objects, # so no need to redefine them # we can proceed directly glmulti(vy2~va*vb,fitfunc=zeroinfl.glmulti,inflate="|1")->bab # B. This shows how to include some terms in ALL the models # As above, we just prepare a wrapper of the fitting function glm.redefined = function(formula, data, always="", ...) { glm(as.formula(paste(deparse(formula), always)), data=data, ...) } # we then use this fitting function in glmulti glmulti(vy2~va,level=1,fitfunc=glm.redefined,always="+vb")-> bab # va will be shuffled but vb is always included in the models # this procedure allows support of arbitrarily any fitting function, # or the use of sophisticated constraints on the model structure
# See the document "glmulti.pdf" included in the package. # It explains the general approach and shows how to use # glmulti with mixed models from the lme4 package. # Other examples: # A. This shows how to do the same for zero-inflated poisson models # we load the required package library(pscl) # a random vector of count data round(runif(100, 0,20)*round(runif(100)))-> vy2 # dummy predictors va = runif(100) vb = runif(100) # 1. The wrapper function zeroinfl.glmulti=function(formula, data, inflate = "|1",...) { zeroinfl(as.formula(paste(deparse(formula), inflate)),data=data,...) } # The default getfit and aicc method will work for zeroinfl objects, # so no need to redefine them # we can proceed directly glmulti(vy2~va*vb,fitfunc=zeroinfl.glmulti,inflate="|1")->bab # B. This shows how to include some terms in ALL the models # As above, we just prepare a wrapper of the fitting function glm.redefined = function(formula, data, always="", ...) { glm(as.formula(paste(deparse(formula), always)), data=data, ...) } # we then use this fitting function in glmulti glmulti(vy2~va,level=1,fitfunc=glm.redefined,always="+vb")-> bab # va will be shuffled but vb is always included in the models # this procedure allows support of arbitrarily any fitting function, # or the use of sophisticated constraints on the model structure
Contains the results of a glmulti
analysis.
Objects will never be created directly but through calls of glmulti
or by applying
consensus
on a list of glmulti
objects.
name
:Object of class "character"
: the name of the analysis.
params
:Object of class "list"
: parameter values used when calling glmulti
to produce the object.
nbmods
:Object of class "integer"
: the number of models that have been found by glmulti
.
crits
:Object of class "numeric"
: the IC values of the models found, in ascending order.
K
:Object of class "integer"
: for each model, its complexity, from best to worst model.
formulas
:Object of class "list"
: for each model, its formula, from best to worst.
call
:Object of class "call"
: the original glmulti
call that produced this object.
adi
:Object of class "list"
: additional arguments that had been passed to the fitting function through glmulti
.
objects
:Object of class "list"
: The list of fitted model objects for the confidence set, if includeobjects=T
.
Vincent Calcagno, McGill University
Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12
summary.glmulti
, consensus
, coef.glmulti
showClass("glmulti")
showClass("glmulti")
codeglmulti finds what are the n
best models (the confidence set of models) among all possible models (the candidate set, as specified by the user).
Models are fitted with the specified fitting function (default is glm
) and are ranked with the specified Information Criterion (default is aic
).
The best models are found either through exhaustive screening of the candidates or using a genetic algorithm, which allows very large candidate sets to be adressed.
The output can be used for model selection, variable selection, and multimodel inference.
This will stop with a warning that an improper call has been attempted.
Calling glmulti
with the names of the response variable and of the predictors as
character strings. This is the original interface used in versions earlier than 0.6-1.
Calling glmulti
with a model formula represented as a character string. E.g. "u~c+x"
Calling glmulti
with a model formula containing all the terms to be included in candidate models.
E.g. u~c+x
Calling glmulti
on a list of (fitted) model objects. Models will not be refitted, but the information criteria will be computed and a regular glmulti
object is returned.
Calling glmulti
with a glm
object from which the formula and other parameters will be extracted.
Calling glmulti
on a lm
object from which the formula and other parameters will be extracted.
If y
is missing the glmulti
version currently used is printed.
Used to compute the QAIC value for several classes of models. Specify the estimated c value to be used through the "glmulti-cvalue" option.
Default method for e.g. glm
and lm
Used to compute the QAICc value for several classes of models. Specify the estimated c value to be used through the "glmulti-cvalue" option.
Default method, for e.g. glm
and lm
These standard S3 functions can be applied on glmulti
to print a short report, obtain a more detailed summary,
or produce different types of graphics.
# S3 summary method for class 'glmulti' ## S3 method for class 'glmulti' summary(object, ...) # S3 print method for class 'glmulti' ## S3 method for class 'glmulti' print(x, ...) # S3 plot method for class 'glmulti' ## S3 method for class 'glmulti' plot(x, type="p", ...)
# S3 summary method for class 'glmulti' ## S3 method for class 'glmulti' summary(object, ...) # S3 print method for class 'glmulti' ## S3 method for class 'glmulti' print(x, ...) # S3 plot method for class 'glmulti' ## S3 method for class 'glmulti' plot(x, type="p", ...)
object |
A |
x |
An object of class |
type |
The type of graph to be produced. One of "p", "r", "s" or "w" (see below). |
... |
Further arguments. |
The name of the returned object is the name of the first object passed, with "consensus-" prepended.
The params
slot of the returned object is taken from the first object too.
plot
can be used to have a graphical representation of the results. Two types are proposed:
type="p"
plots the IC profile (the IC values form the best to the worst model). A horizontal line delineates
models that are less than 2 IC units away from the best model.
type="r"
shows diagnostics of the fit (residuals versus predicted values, and QQ plot of residuals) for the five best models. It calls the plot
functions on the fitted model objects, which should be defined (e.g. plot.lm
, plot.glm
). This type of plot can only be used if model objects are included in the glmulti
object (i.e. if includeobjects
was set to true).
type="s"
plots the relative importance of model terms, i.e. the overall support for each variable across all models.
A vertical line is drawn at 80
type="w"
plots the normalized evidence weights of the models. A vertical line delineates models that sum um to 95
print
prints a brief synthesis of the analysis (e.g. the best model found, its IC value and evidence weight...)
summary
returns a list with more detailed elements:
name |
the name of the analysis |
method |
The method used |
fitting |
The fitting function used |
crit |
The IC used |
level |
Whether interactions between predictors were considered or not |
marginality |
Whether the marginality rule was applied |
confsetsize |
The requested size of the confidence set |
bestic |
The lowest IC found |
icvalues |
The IC values of the models in the confidence set, from lower to greater |
bestmodel |
A list containing the formula of the best model found, or of the best models found if several had the same IC value |
generations |
The number of generations it took to converge. For genetic algorithm only. |
elapsed |
The actual (system) time it took. For genetic algorithm only. |
includeobjects |
A boolean indicating whether fitted model objects are contained in the object. |
Vincent Calcagno, McGill University
Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12
Prepares a table with model formulas, IC values and IC relative supports
weightable(object, ...)
weightable(object, ...)
object |
A |
... |
For further improvements |
A data frame with the list of formulas, IC values and IC weights
Jarret Byrnes and Vincent Calcagno, McGill University
Prepares a table with model formulas, IC values and IC relative supports
Default call
The S4 method provided for glmulti
objects
writes a glmulti
object as a data.frame, or alternatively as a raw R object.
write(x, file = "data", ncolumns = if (is.character(x)) 1 else 5, append = FALSE, sep = " ")
write(x, file = "data", ncolumns = if (is.character(x)) 1 else 5, append = FALSE, sep = " ")
x |
An object of class glmulti |
file |
The name of the file to write into. If missing, the name is taken to be the name of the glmulti object.* If "\object" is at the end of the filename, then the object is written as a raw R object using saveRDS. |
ncolumns |
Not used with glmulti objects |
append |
Whether to append the output to the file, if existing. |
sep |
The character to be used to separate columns. |
When using "\object", the file written can be read directly from R using readRDS, or with consensus.
A data.frame. Rows correspond to all models, sorted from best to worse.. The first columns indicate whether the different terms are found in the formula of each model (0/1). The next columns contain model complexity (K), model IC, and model formula (as a character string).
Vincent Calcagno, McGill University
Methods for function write
Method to write glmulti
objects.