Title: | Bayesian Additive Models for Location, Scale, and Shape (and Beyond) |
---|---|
Description: | Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2021) <doi:10.18637/jss.v100.i04>. |
Authors: | Nikolaus Umlauf [aut, cre], Nadja Klein [aut], Achim Zeileis [aut], Meike Koehler [ctb], Thorsten Simon [aut], Stanislaus Stadlmann [ctb], Alexander Volkmann [ctb]
Maintainer: | Nikolaus Umlauf <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.2-5 |
Built: | 2024-11-11 06:49:59 UTC |
Source: | CRAN |
The bamlss package is a general tool for complex Bayesian regression modeling with structured additive predictors based on Markov chain Monte Carlo simulation. The design of the package focuses on maximum flexibility and easy integration of new code and/or standalone systems. The package makes heavy use of mgcv infrastructures to build up all necessary model matrices and information, from which it is relatively easy for the user to construct estimation algorithms or interfaces to existing software packages. The package can also be seen as a harmonized framework for regression modeling, since it is not restricted to any particular type of problem. The main function of the package is bamlss(), a wrapper that calls optimizer and/or sampling functions for fitting Bayesian additive models for location, scale, and shape (and beyond). These model fitting functions can be exchanged by the user. Moreover, the package contains numerous functions for creating post-estimation results such as summary statistics and effect plots.
Nikolaus Umlauf, Nadja Klein, Achim Zeileis.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1–53. doi:10.18637/jss.v100.i04
This is the main model fitting function of the package. Function bamlss() is a wrapper function that parses the data and the model formula, or extended bamlss.formula, as well as the bamlss.family into a bamlss.frame. The bamlss.frame then holds all model matrices and information that is needed for setting up estimation engines. The model matrices are based on mgcv infrastructures, i.e., smooth terms are constructed using smooth.construct and smoothCon. Therefore, all mgcv model term constructors like s, te, t2 and ti can be used. Identifiability conditions are imposed using function gam.side.
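For illustration, a minimal sketch of a formula mixing several mgcv constructors, here with an ANOVA-type ti() interaction (illustrative only; the covariates are taken from the simulated GAMart() data):
d <- GAMart()
f <- num ~ s(x1) + s(x2) + ti(x1, x2)
## b <- bamlss(f, data = d)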
After the bamlss.frame is set up, function bamlss() applies optimizer and/or sampling functions. These functions can also be provided by the user. See the details below on how to create new engines to be used with function bamlss().
Finally, the estimated parameters and/or samples are used to create model output results like summary statistics or effect plots. The computation of results may also be controlled by the user.
bamlss(formula, family = "gaussian", data = NULL, start = NULL, knots = NULL, weights = NULL, subset = NULL, offset = NULL, na.action = na.omit, contrasts = NULL, reference = NULL, transform = NULL, optimizer = NULL, sampler = NULL, samplestats = NULL, results = NULL, cores = NULL, sleep = NULL, combine = TRUE, model = TRUE, x = TRUE, light = FALSE, ...)
formula |
A formula or extended formula, i.e., the |
family |
A |
data |
A |
start |
A named numeric vector containing starting values to be sent to the |
knots |
An optional list containing user specified knots, see the documentation of
function |
weights |
Prior weights on the data. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Can be used to supply model offsets for use in fitting. |
na.action |
A function which indicates what should happen when the data
contain |
contrasts |
An optional list. See the |
reference |
A |
transform |
A transformer function that is applied on the |
optimizer |
An optimizer function that returns, e.g., posterior mode estimates
of the parameters as a named numeric vector. The default optimizer function is
|
sampler |
A sampler function that returns a matrix of samples, the columns represent the
parameters, the rows the iterations. The returned matrix must be coerced to an object of
class |
samplestats |
A function computing statistics from samples, per default function
|
results |
A function computing results from the parameters and/or samples, e.g., for
creating effect plots, see function |
cores |
An integer specifying the number of cores that should be used for the sampler
function. This is based on function |
sleep |
Time the system should sleep before the next core is started. |
combine |
If samples are computed on multiple cores, should the samples be combined into
one |
model |
If set to |
x |
If set to |
light |
Should the returned object be lighter, i.e., if |
... |
Arguments passed to the |
The main idea of this function is to provide infrastructures that make it relatively easy to create estimation engines for new problems, or write interfaces to existing software packages.
The steps that are performed within the function are:
First, the function parses the data, the formula or the extended bamlss.formula, as well as the bamlss.family into a model.frame-like object, the bamlss.frame. This object holds all necessary model matrices and information that is needed for subsequent model fitting engines. Per default, all package mgcv smooth term constructor functions like s, te, t2 and ti can be used (see also function smooth.construct), however, even special user defined constructors can be included, see the manual of bamlss.frame.
In a second step, the bamlss.frame can be transformed, e.g., if a mixed model representation of smooth terms is needed, see function randomize.
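As an illustrative sketch (assuming the simulated GAMart() data), a mixed model representation of the smooth terms can be requested by supplying randomize as transformer function:
d <- GAMart()
f <- num ~ s(x1) + s(x2)
## b <- bamlss(f, data = d, transform = randomize)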
Then an optimizer function is started, e.g., a function that finds posterior mode estimates of the parameters. A convention for model fitting engines is that such functions should have the following arguments:
optimizer(x, y, family, start, weights, offset, ...)
Internally, function bamlss() will send the x object that holds all model matrices, the response y object, the family object, starting values for the parameters, possible weights and offsets of the created bamlss.frame to the optimizer function (see the manual of bamlss.frame for more details on the x, y and other objects). The job of the optimizer is to return a named numeric vector of optimum parameters. The names of the parameters should be such that they can be uniquely mapped to the corresponding model matrices in x. See function parameters for more details on parameter names. The default optimizer function is opt_bfit. The optimizer can return more information than only the optimum parameters. It is possible to return a list; the convention here is that an element named "parameters" then holds the named vector of estimated parameters. Possible other return values could be fitted values, the Hessian matrix, information criteria or information about convergence of the algorithm, etc. Note that the parameters are simply added to the bamlss.frame in a (list) entry named "parameters".
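A minimal sketch of a user-supplied optimizer following this convention (an assumption for illustration: it simply delegates the actual work to the default opt_bfit() engine):
my_optimizer <- function(x, y, family, start = NULL,
  weights = NULL, offset = NULL, ...)
{
  ## By convention the return value is a named numeric vector of parameters,
  ## or a list with an element "parameters"; opt_bfit() already complies.
  opt_bfit(x = x, y = y, family = family, start = start,
    weights = weights, offset = offset, ...)
}
## Hypothetical usage:
## b <- bamlss(num ~ s(x1), data = GAMart(), optimizer = my_optimizer, sampler = FALSE)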
After the optimization step, a sampler function is started. The arguments of such sampler functions are the same as for the optimizer functions:
sampler(x, y, family, start, weights, offset, ...)
Sampler functions must return a matrix of samples, where each row represents one iteration and the matrix can be coerced to mcmc objects. The function may return a list of samples, e.g., if multiple chains are returned, each list entry then holds one sample matrix of one chain. The column names of the sample matrix should be the same as the names of the estimated parameters. For a possible naming convention see function parameters, which ensures unique mapping of samples with the model matrices in the x object of the bamlss.frame. The samples are added to the bamlss.frame in a (list) entry named "samples".
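Similarly, a sketch of a user-supplied sampler (again only wrapping the default sam_GMCMC() engine for illustration; a genuinely new sampler would construct the sample matrix itself):
my_sampler <- function(x, y, family, start = NULL,
  weights = NULL, offset = NULL, ...)
{
  ## The returned samples must be coercible to class "mcmc", with column
  ## names matching the parameter names (see function parameters()).
  sam_GMCMC(x = x, y = y, family = family, start = start,
    weights = weights, offset = offset, ...)
}
## Hypothetical usage:
## b <- bamlss(num ~ s(x1), data = GAMart(), sampler = my_sampler)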
Next, the samplestats function is applied. This function can compute any quantity from the samples and the x object; the arguments of such functions are
samplestats(samples, x, y, family, ...)
where argument samples holds the samples returned from the sampler function, and x, y and family are the same objects as passed to the optimizer and/or sampler functions. For example, the default function in bamlss() for this task is also called samplestats and returns the mean of the log-likelihood and the log-posterior computed from all samples, as well as the DIC.
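A sketch of a custom samplestats() function (purely illustrative: it only counts the stored iterations, whereas the default also returns the mean log-likelihood, the log-posterior and the DIC):
my_samplestats <- function(samples, x = NULL, y = NULL, family = NULL, ...) {
  ## Samples may come as a list of chains; stack them into one matrix.
  if(is.list(samples))
    samples <- do.call("rbind", lapply(samples, as.matrix))
  list("n.saved" = nrow(as.matrix(samples)))
}
## Hypothetical usage:
## b <- bamlss(num ~ s(x1), data = GAMart(), samplestats = my_samplestats)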
The last step is to compute more complex information about the model using the results function. The arguments of such results functions are
results(bamlss.frame, ...)
Here, the full bamlss.frame including possible parameters and samples is passed to the function within bamlss(). The default function for this task is results.bamlss.default, which returns an object of class "bamlss.results" for which generic plotting functions and a summary function are provided. Hence, the user can also control the output of the model, the plotting and summary statistics.
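A sketch of a custom results() function (illustrative only: it merely records the component names of the "bamlss.frame" instead of building "bamlss.results" objects for plotting):
my_results <- function(bamlss.frame, ...) {
  structure(list("components" = names(bamlss.frame)), class = "my.results")
}
## Hypothetical usage:
## b <- bamlss(num ~ s(x1), data = GAMart(), results = my_results)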
Note that the functions transform(), optimizer(), sampler(), samplestats() and results() can also be provided by the bamlss.family object, e.g., if a bamlss.family object has an element named "optimizer" which represents a valid optimizer function such as opt_bfit, exactly this optimizer function will be used as a default when using the family.
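As a sketch (assuming a copy of gaussian_bamlss() may be modified on the fly), a default engine can be attached to a family object like this:
fam <- gaussian_bamlss()
fam$optimizer <- opt_bfit   ## used as default optimizer for this family
## Hypothetical usage:
## b <- bamlss(num ~ s(x1), data = GAMart(), family = fam)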
An object of class "bamlss". The object is in principle only a slight extension of a bamlss.frame, i.e., if an optimizer is applied it will hold the estimated parameters in an additional element named "parameters". If a sampler function is applied, it will additionally hold the samples in an element named "samples". The same mechanism is used for the results function.
If the optimizer function computes additional output next to the parameters, this will be saved in an element named "model.stats". If a samplestats function is applied, the output will also be saved in the "model.stats" element.
Additionally, all functions that are called are saved as attribute "functions" in the returned object.
Nikolaus Umlauf, Nadja Klein, Achim Zeileis.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1–53. doi:10.18637/jss.v100.i04
bamlss.frame, family.bamlss, bamlss.formula, randomize, bamlss.engine.setup, opt_bfit, sam_GMCMC, continue, coef.bamlss, parameters, predict.bamlss, plot.bamlss
## Not run:
## Simulated data example.
d <- GAMart()
f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b <- bamlss(f, data = d)
summary(b)
plot(b)
plot(b, which = 3:4)
plot(b, which = "samples")

## Use of optimizer and sampler functions:
## * first run optimizer,
b1 <- bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE)
print(b1)
summary(b1)

## * afterwards, start sampler with starting values,
b2 <- bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC)
print(b2)
summary(b2)

## Continue sampling.
b3 <- continue(b2, n.iter = 12000, burnin = 0, thin = 10)
plot(b3, which = "samples")
plot(b3, which = "max-acf")
plot(b3, which = "max-acf", burnin = 500, thin = 4)
## End(Not run)
A single class and corresponding methods encompassing all bamlss.family
distributions (from the bamlss package) using the workflow from the
distributions3 package.
BAMLSS(family, ...)
family |
object. BAMLSS family specifications recognized by
|
... |
further arguments passed as parameters to the BAMLSS family. Can be scalars or vectors. |
The constructor function BAMLSS sets up a distribution object, representing a distribution from the BAMLSS (Bayesian additive model of location, scale, and shape) framework by the corresponding parameters plus a family attribute, e.g., gaussian_bamlss for the normal distribution or binomial_bamlss for the binomial distribution. The parameters employed by the family vary across the families but typically capture different distributional properties (like location, scale, shape, etc.).
All parameters can also be vectors, so that it is possible to define a vector of BAMLSS distributions from the same family with potentially different parameters. All parameters need to have the same length or must be scalars (i.e., of length 1) which are then recycled to the length of the other parameters.
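For example, a sketch of such recycling (assuming the "gaussian" family with parameters mu and sigma): a scalar sigma is recycled across three distributions.
library("distributions3")
Y <- BAMLSS("gaussian", mu = c(-1, 0, 1), sigma = 1)
Y
random(Y, 2)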
For the BAMLSS distribution objects there is a wide range of standard methods available to the generics provided in the distributions3 package: pdf and log_pdf for the (log-)density (PDF), cdf for the probability from the cumulative distribution function (CDF), quantile for quantiles, random for simulating random variables, and support for the support interval (minimum and maximum). Internally, these methods rely on the usual d/p/q/r functions provided in bamlss, see the manual pages of the individual families. The methods is_discrete and is_continuous can be used to query whether the distributions are discrete on the entire support or continuous on the entire support, respectively.
See the examples below for an illustration of the workflow for the class and methods.
A BAMLSS distribution object.
## package and random seed
library("distributions3")
set.seed(6020)

## three Weibull distributions
X <- BAMLSS("weibull", lambda = c(1, 1, 2), alpha = c(1, 2, 2))
X

## moments (FIXME: mean and variance not provided by weibull_bamlss)
## mean(X)
## variance(X)

## support interval (minimum and maximum)
support(X)
is_discrete(X)
is_continuous(X)

## simulate random variables
random(X, 5)

## histograms of 1,000 simulated observations
x <- random(X, 1000)
hist(x[1, ], main = "Weibull(1,1)")
hist(x[2, ], main = "Weibull(1,2)")
hist(x[3, ], main = "Weibull(2,2)")

## probability density function (PDF) and log-density (or log-likelihood)
x <- c(2, 2, 1)
pdf(X, x)
pdf(X, x, log = TRUE)
log_pdf(X, x)

## cumulative distribution function (CDF)
cdf(X, x)

## quantiles
quantile(X, 0.5)

## cdf() and quantile() are inverses
cdf(X, quantile(X, 0.5))
quantile(X, cdf(X, 1))

## all methods above can either be applied elementwise or for
## all combinations of X and x, if length(X) = length(x),
## also the result can be assured to be a matrix via drop = FALSE
p <- c(0.05, 0.5, 0.95)
quantile(X, p, elementwise = FALSE)
quantile(X, p, elementwise = TRUE)
quantile(X, p, elementwise = TRUE, drop = FALSE)

## compare theoretical and empirical mean from 1,000 simulated observations
## (FIXME: mean not provided by weibull_bamlss)
## cbind(
##   "theoretical" = mean(X),
##   "empirical" = rowMeans(random(X, 1000))
## )
These functions can be useful when setting up new model fitting engines that are based on the setup function bamlss.engine.setup. See the examples.
## Functions to extract parameter states.
get.par(x, what = NULL)
get.state(x, what = NULL)
set.par(x, replacement, what)

## Function for setting starting values.
set.starting.values(x, start)
x |
For function |
what |
The name of the parameter(s) that should be extracted or replaced. |
replacement |
The value(s) that should be used for replacement. |
start |
The named numeric vector of starting values. The name convention is based
on function |
## Create a bamlss.frame.
d <- GAMart()
bf <- bamlss.frame(num ~ s(x1) + s(x2) + te(lon,lat),
  data = d, family = "gaussian")
names(bf$x$mu$smooth.construct)

## Use the setup function for
## adding state elements.
bf$x <- bamlss.engine.setup(bf$x, df = c("s(x1)" = 1, "s(x2)" = 3))
names(bf$x$mu$smooth.construct)

## Extract regression coefficients.
get.state(bf$x$mu$smooth.construct[["te(lon,lat)"]], "b")

## Extract smoothing variances.
get.state(bf$x$mu$smooth.construct[["te(lon,lat)"]], "tau2")

## More examples.
state <- bf$x$mu$smooth.construct[["te(lon,lat)"]]$state
get.par(state$parameters, "b")
get.par(state$parameters, "tau2")
state$parameters <- set.par(state$parameters, c(0.1, 0.5), "tau2")
get.par(state$parameters, "tau2")

## Setting starting values.
start <- c("mu.s.s(x1).b" = 1:9, "mu.s.s(x1).tau2" = 0.1)
bf$x <- set.starting.values(bf$x, start = start)
get.state(bf$x$mu$smooth.construct[["s(x1)"]], "b")
get.state(bf$x$mu$smooth.construct[["s(x1)"]], "tau2")
This function takes the x object of a bamlss.frame and adds additional objects that are useful for model fitting engines. This is applied only for 'regular' terms, e.g., as created by s and te. For special model terms the corresponding smooth.construct method is in charge of this (see also the examples for function bfit).
bamlss.engine.setup(x, update = "iwls", propose = "iwlsC_gp", do.optim = NULL, df = NULL, parametric2smooth = TRUE, ...)
x |
The |
update |
Sets the updating function for model terms, see function |
propose |
Sets the propose function for model terms, see function |
do.optim |
Adds list element |
df |
The initial degrees of freedom that should be assigned to a smooth model term,
based on the trace of the smoother matrix. Note that |
parametric2smooth |
Should parametric model terms be transformed into an artificial
smooth model term and be added to the |
... |
Currently not used. |
For each model term in the "smooth.construct" object of the x list (as returned from bamlss.frame), this function adds a named list called "state" with the following entries:
"parameters": A numeric vector. Regression coefficients are named with "b", smoothing variances are named with "tau2".
"fitted.values": Given the "parameters", the actual fitted values of the model term.
"edf": Given the smoothing variances, the actual equivalent degrees of freedom (edf) of the model term.
"do.optim": Should an optimizer function try to find optimum smoothing variances?
The state will be changed in each iteration and can be passed outside an updating function. Additionally, if missing in the xt argument of a model term (see, e.g., function s for xt), the function adds the corresponding log-prior and its first and second order derivatives w.r.t. regression coefficients in functions grad() and hess().
Also, objects named "lower" and "upper" are added to each model term. These indicate the lower and upper boundaries of the parameter space.
A transformed x list, as returned from function bamlss.frame.
bamlss.frame, bfit, GMCMC, get.par, set.par, get.state
d <- GAMart()
bf <- bamlss.frame(num ~ s(x1) + s(x2), data = d, family = "gaussian")
names(bf$x$mu$smooth.construct)
bf$x <- bamlss.engine.setup(bf$x, df = c("s(x1)" = 1, "s(x2)" = 3))
names(bf$x$mu$smooth.construct)
names(bf$x$mu$smooth.construct[["s(x1)"]])
names(bf$x$mu$smooth.construct[["s(x1)"]]$state)
sapply(bf$x$mu$smooth.construct, function(x) {
  c(x$state$edf, get.state(x, "tau2"))
})
This function creates an extended BAMLSS formula. In combination with a bamlss.family object, each parameter of the response distribution is linked to a single formula. If no formula is supplied for a parameter, a simple intercept-only model is created. Moreover, the function identifies hierarchical structures, see the examples. This function is useful for creating complex model.frames for (hierarchical) multi-parameter models and is used by function bamlss.frame.
bamlss.formula(formula, family = NULL, specials = NULL, env = NULL, ...)
formula |
A simple |
family |
A |
specials |
A character vector specifying special functions to be used within
formulae, see |
env |
The environment that should be assigned to the formula. |
... |
Arguments passed to the |
A named list of class "bamlss.formula". Each list entry specifies a model, e.g., for one parameter of a provided bamlss.family object. Each entry (parameter model) then holds:
formula |
A simple |
fake.formula |
A formula with all function calls being dropped, e.g., the formula
|
bamlss, bamlss.frame, bamlss.family
## Simple formula without family object.
f <- bamlss.formula(y ~ x1 + s(x2))
print(f)
print(str(f))

## Complex formula with list of formulae.
f <- list(
  y1 ~ x1 + s(x2),
  y2 ~ x3 + te(lon, lat),
  u ~ x4 + x1
)
f <- bamlss.formula(f)
print(f)
print(names(f))

## Same formula but using extended formulae
## of package Formula.
f <- y1|y2|u ~ x1 + s(x2)|x3 + te(lon,lat)|x4 + x1
f <- bamlss.formula(f)
print(f)
print(names(f))

## Using a bamlss family object, e.g., gaussian_bamlss().
## The family has two parameters, mu and sigma, for
## each parameter one formula is returned. If no
## formula is specified an intercept only model is
## generated for the respective parameter.
f <- bamlss.formula(y ~ x1 + s(x2), family = gaussian_bamlss)
## Note, same as:
f <- bamlss.formula(y ~ x1 + s(x2), family = "gaussian")
print(f)

## Specify model for parameter sigma, too
f <- list(
  y ~ x1 + s(x2),
  sigma ~ x2 + te(lon, lat)
)
f <- bamlss.formula(f, family = "gaussian")
print(f)

## With complex hierarchical structures,
## each parameter is another list of formulae,
## indicated by the h1,...,hk, names.
f <- list(
  y ~ x1 + s(x2) + id1,
  sigma ~ x2 + te(lon, lat) + id2,
  id1 ~ s(x3) + x4 + s(id3),
  id3 ~ x5 + s(x5, x6),
  id2 ~ x7
)
f <- bamlss.formula(f, family = "gaussian")
print(f)
This function parses the data and the model formula, or extended bamlss.formula, as well as the bamlss.family into a bamlss.frame object. The bamlss.frame then holds all model matrices and information that is needed for setting up estimation engines.
bamlss.frame(formula, data = NULL, family = "gaussian", weights = NULL, subset = NULL, offset = NULL, na.action = na.omit, contrasts = NULL, knots = NULL, specials = NULL, reference = NULL, model.matrix = TRUE, smooth.construct = TRUE, ytype = c("matrix", "vector", "integer"), scale.x = FALSE, scale.d = FALSE, ...)
formula |
A formula or extended formula, i.e., the |
data |
A |
family |
A |
weights |
Prior weights on the data. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Can be used to supply model offsets for use in fitting. |
na.action |
A function which indicates what should happen when the data
contain |
contrasts |
An optional list. See the |
knots |
An optional list containing user specified knots, see the documentation of
function |
specials |
Specify new special terms here to be used with the |
reference |
A |
model.matrix |
Logical, should model matrices for linear parts be returned? |
smooth.construct |
Logical, should model matrices, e.g., as returned from
|
ytype |
For categorical responses, should the response be a vector or matrix. If
|
scale.x |
Logical, should the model matrices of the linear parts be scaled? |
scale.d |
Logical, should the numeric variables in the model frame be scaled? |
... |
Arguments passed to function |
The function parses the data, the formula or the extended bamlss.formula, as well as the bamlss.family into a model.frame-like object, the bamlss.frame. This object holds all necessary model matrices and information that is needed for model fitting engines. Per default, all package mgcv smooth term constructor functions like s, te, t2 and ti can be used (see also function smooth.construct), however, even special user defined constructors can be included, see the examples below.
Function bamlss.frame() uses function model.matrix.bamlss.frame to compute all design matrices for simple linear parts; all smooth terms are parsed with function smooth.construct.bamlss.frame.
It is also possible to create a "bamlss.frame" using hierarchical formulae, see the example below.
A list of class "bamlss.frame" with the following elements:
call |
The initial call. |
model.frame |
The |
formula |
The |
family |
The |
terms |
The |
x |
A named list, the elements correspond to the parameters that are specified
within the |
y |
The response data. |
bamlss, bamlss.formula, bamlss.family, smooth.construct.bamlss.frame, model.matrix.bamlss.frame
## Create a 'bamlss.frame'.
d <- GAMart()
f <- list(
  num ~ fac + s(x1) + s(x2) + te(lon, lat),
  sigma ~ id + s(x2) + s(x3)
)
bf <- bamlss.frame(f, data = d, family = "gaussian")

## Show parts of the 'bamlss.frame'.
print(bf)

## Categorical responses.
f <- list(
  cat ~ fac + s(x1) + s(x2)
)
bf <- bamlss.frame(f, data = d, family = "multinomial", reference = "low")
print(bf)

## The response is a matrix per default.
head(bf$y)

## 0/1 responses.
d <- cbind(d, model.matrix(~ -1 + cat, data = d))
f <- list(
  catnone ~ fac + s(x1),
  catlow ~ s(x2),
  catmedium ~ s(x3)
)
bf <- bamlss.frame(f, data = d, family = "multinomial")
print(bf)

## Hierarchical structures.
f <- list(
  num ~ s(x1) + s(x2) + id,
  id ~ te(lon, lat),
  sigma ~ s(x1) + fac
)
bf <- bamlss.frame(f, data = d, family = "gaussian")
print(bf)

## Special model term constructors,
## set up "new" constructor function and eval
## with bamlss.frame().
s77 <- function(...) {
  sm <- s(...)
  sm$label <- paste("s77(", paste(sm$term, collapse = ","), ")", sep = "")
  sm
}
f <- list(
  num ~ s77(x1) + s(x2) + id,
  sigma ~ s77(x1)
)
bf <- bamlss.frame(f, data = d, family = "gaussian", specials = "s77")
print(bf)
names(bf$x$mu$smooth.construct)
Wrapper function for applying bootstrap estimation using gradient boosting.
## Bootstrap boosting.
bboost(..., data, type = 1, cores = 1, n = 2, prob = 0.623,
  fmstop = NULL, trace = TRUE, drop = FALSE, replace = FALSE)

## Plotting function.
bboost_plot(object, col = NULL)

## Predict method.
## S3 method for class 'bboost'
predict(object, newdata, ..., cores = 1, pfun = NULL)
... |
Arguments passed to |
data |
The data frame to be used for modeling. |
type |
Type of algorithm, |
cores |
The number of cores to be used. |
n |
The number of bootstrap iterations. |
prob |
The fraction that should be used to fit the model in each bootstrap iteration. |
fmstop |
The function that should return the optimum stopping iteration. The function must
have two arguments: (1) the |
trace |
Prints out the current state of the bootstrap algorithm. |
drop |
Should only the best set of parameters be saved? |
replace |
Sampling with replacement, or sampling |
object |
The |
col |
The color that should be used for plotting. |
newdata |
The data frame predictions should be made for. |
pfun |
The prediction function that should be used, for example |
A list of bamlss objects.
## Not run:
## Simulate data.
set.seed(123)
d <- GAMart()

## Estimate model.
f <- num ~ s(x1) + s(x2) + s(x3) + s(lon,lat)

## Function for evaluation of hold out sample
## criterion to find the optimum mstop.
fmstop <- function(model, data) {
  p <- predict(model, newdata = data, model = "mu")
  mse <- NULL
  for(i in 1:nrow(model$parameters))
    mse <- c(mse, mean((data$num - p[, i])^2))
  list("MSE" = mse, "mstop" = which.min(mse))
}

## Bootstrap boosted models.
b <- bboost(f, data = d, n = 50, cores = 3, fmstop = fmstop)

## Plot hold out sample MSE.
bboost_plot(b)

## Predict for each bootstrap sample.
nd <- data.frame("x2" = seq(0, 1, length = 100))
p <- predict(b, newdata = nd, model = "mu", term = "x2")
plot2d(p ~ x2, data = nd)
## End(Not run)
Some simple shortcuts to model fitting engines.
## BayesX.
bayesx2(...)

## Gradient boosting.
boost2(...)

## Lasso.
lasso2(...)
... |
Arguments passed to |
A bamlss object.
## Not run:
## Simulate data.
set.seed(123)
d <- GAMart()

## Estimate model.
f <- num ~ s(x1) + s(x2) + s(x3) + s(lon,lat)

## Boosted model.
b <- boost2(f, data = d)

## Plot estimated effects.
plot(b)
## End(Not run)
Small helper function that computes the 2.5% and 97.5% quantiles and the mean of a vector. Useful, for example, when using function predict.bamlss.
c95(x)
x |
A numeric vector. |
x <- rnorm(100)
c95(x)

## Not run:
## Example computing predictions.
set.seed(123)
d <- data.frame("x" = seq(-3, 3, length = 30))
d$y <- sin(d$x) + rnorm(30, sd = 0.3)

## Estimate model and compute predictions.
## with c95().
b <- bamlss(y ~ s(x), data = d)
p <- predict(b, model = "mu", FUN = c95)
plot(d)
matplot(d$x, p, type = "l", lty = c(2, 1, 2), col = "black", add = TRUE)

## Example extracting coefficients.
coef(b, FUN = c95)
## End(Not run)
Methods to extract coefficients of fitted bamlss objects, either coefficients returned from optimizer functions, or samples from sampler functions. Method confint.bamlss() produces credible intervals or parameter samples using quantiles.
## S3 method for class 'bamlss'
coef(object, model = NULL, term = NULL, FUN = NULL,
  parameters = NULL, pterms = TRUE, sterms = TRUE,
  hyper.parameters = TRUE, list = FALSE, full.names = TRUE,
  rescale = FALSE, ...)

## S3 method for class 'bamlss'
confint(object, parm, level = 0.95, model = NULL,
  pterms = TRUE, sterms = FALSE, full.names = FALSE,
  hyper.parameters = FALSE, ...)
object |
An object of class |
model |
Character or integer. For which model should coefficients be extracted? |
term |
Character or integer. For which term should coefficients be extracted? |
FUN |
A function that is applied on the parameter samples. |
parameters |
If set to |
pterms |
Should coefficients of parametric terms be included? |
sterms |
Should coefficients of smooth terms be included? |
hyper.parameters |
For smooth terms, should hyper parameters such as smoothing variances be included? |
list |
Should the returned object have a list structure for each distribution parameter? |
full.names |
Should full names be assigned, indicating whether a term is parametric "p" or smooth "s". |
rescale |
Should parameters of the linear parts be rescaled if the |
parm |
Character or integer. For which term should coefficients intervals be extracted? |
level |
The credible level which defines the lower and upper quantiles that should be computed from the samples. |
... |
Arguments to be passed to |
Depending on argument list and the number of distributional parameters, either a list or a vector/matrix of model coefficients.
## Not run:
## Simulate data.
d <- GAMart()

## Model formula.
f <- list(
  num ~ s(x1) + s(x2) + s(x3),
  sigma ~ s(x1) + s(x2) + s(x3)
)

## Estimate model.
b <- bamlss(f, data = d)

## Extract coefficients based on MCMC samples.
coef(b)

## Now only the mean.
coef(b, FUN = mean)

## As list without the full names.
coef(b, FUN = mean, list = TRUE, full.names = FALSE)

## Coefficients only for "mu".
coef(b, model = "mu")

## And "s(x2)".
coef(b, model = "mu", term = "s(x2)")

## With optimizer parameters.
coef(b, model = "mu", term = "s(x2)", parameters = TRUE)

## Only parametric part.
coef(b, sterms = FALSE, hyper.parameters = FALSE)

## For sigma.
coef(b, model = "sigma", sterms = FALSE, hyper.parameters = FALSE)

## 95 perc. credible interval based on samples.
confint(b)
## End(Not run)
Function to generate a color legend, the legend may be added to an existing plot or drawn in a separate plotting window.
colorlegend(color = NULL, ncol = NULL, x = NULL, breaks = NULL, pos = "center", shift = 0.02, side.legend = 1L, side.ticks = 1L, range = NULL, lrange = NULL, width = 0.25, height = 0.05, scale = TRUE, xlim = NULL, ylim = NULL, plot = NULL, full = FALSE, add = FALSE, col.border = "black", lty.border = 1L, lwd.border = 1L, ticks = TRUE, at = NULL, col.ticks = "black", lwd.ticks = 1L, lty.ticks = 1L, length.ticks = 0.3, labels = NULL, distance.labels = 0, col.labels = "black", cex.labels = 1L, digits = 2L, swap = FALSE, symmetric = TRUE, xpd = NULL, title = NULL, side.title = 2, shift.title = c(0, 0), cex.title = 1, ...)
color |
Character, integer. The colors for the legend, may also be a function, e.g.
|
ncol |
Integer, the number of different colors that should be generated if |
x |
Numeric, values for which the color legend should be drawn. |
breaks |
Numeric, a set of breakpoints for the colors: must give one more breakpoint than
|
pos |
Character, numeric. The position of the legend. Either a numeric vector, e.g.
|
shift |
Numeric, if argument |
side.legend |
Integer, if set to |
side.ticks |
Integer, if set to |
range |
Numeric, specifies a range for |
lrange |
Numeric, specifies the range of legend. |
width |
Numeric, the width of the legend, if |
height |
Numeric, the height of the legend, if |
scale |
Logical, if set to |
xlim |
Numeric, the x-limits of the plotting window the legend should be added for, numeric
vector, e.g., returned from function |
ylim |
Numeric, the y-limits of the plotting window the legend should be added for, numeric
vector, e.g., returned from function |
plot |
Logical, if set to |
full |
Logical, if set to |
add |
Logical, if set to |
col.border |
The color of the surrounding border line of the legend. |
lty.border |
The line type of the surrounding border line of the legend. |
lwd.border |
The line width of the surrounding border line of the legend. |
ticks |
Logical, if set to |
at |
Numeric, specifies at which locations ticks and labels should be added. |
col.ticks |
The colors of the ticks. |
lwd.ticks |
The line width of the ticks. |
lty.ticks |
The line type of the ticks. |
length.ticks |
Numeric, the length of the ticks as percentage of the |
labels |
Character, specifies labels that should be added to the ticks. |
distance.labels |
Numeric, the distance of the labels to the ticks, proportional to the length of the ticks. |
col.labels |
The colors of the labels. |
cex.labels |
Text size of the labels. |
digits |
Integer, the decimal places if labels are numerical. |
swap |
Logical, if set to |
symmetric |
Logical, if set to |
xpd |
Sets the |
title |
Character, a title for the legend. |
side.title |
Integer, |
shift.title |
Numeric vector of length 2. Specifies a possible shift of the title in either x- or y-direction. |
cex.title |
Text size for the title. |
... |
Other graphical parameters to be passed to function |
A named list with the colors generated, the breaks and the function map, which may be used for mapping of x values to the colors specified in argument color; please see the examples below.
## Play with colorlegend. colorlegend() colorlegend(side.legend = 2) colorlegend(side.legend = 2, side.ticks = 2) colorlegend(height = 2) colorlegend(width = 1, height = 0.8, scale = FALSE, pos = c(0, 0.2), length.ticks = 0.5) colorlegend(color = heat.colors, ncol = 9) colorlegend(color = heat.colors, ncol = 9, swap = TRUE) colorlegend(pos = "bottomleft") colorlegend(pos = "topleft") colorlegend(pos = "topright") colorlegend(pos = "bottomright") ## Take x values for the color legend. x <- runif(100, -2, 2) colorlegend(color = diverge_hcl, x = x) colorlegend(color = diverge_hcl, x = x, at = c(-1.5, 0, 1.5)) colorlegend(color = diverge_hcl, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high")) colorlegend(color = rainbow_hcl, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5) colorlegend(color = heat_hcl, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5, lwd.border = 2, lwd.ticks = 2, cex.labels = 1.5, font = 2) colorlegend(color = topo.colors, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5, lwd.border = 2, lwd.ticks = 2, cex.labels = 1.5, font = 2, col.border = "green3", col.ticks = c(2, 5, 2), col.labels = c(6, 4, 3)) colorlegend(color = diverge_hsv, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5, lwd.border = 2, lwd.ticks = 2, cex.labels = 1.5, font = 2, col.border = "green3", col.ticks = c(2, 5, 2), col.labels = c(6, 4, 3), lty.border = 2, lty.ticks = c(2, 3, 2)) colorlegend(color = diverge_hsv, x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5, lwd.border = 2, lwd.ticks = 2, cex.labels = 1.5, font = 2, col.border = "green3", col.ticks = c(2, 5, 2), col.labels = c(6, 4, 3), lty.border = 2, lty.ticks = c(2, 3, 2), ncol = 3) colorlegend(color = c("red", "white", "red"), x = x, at = c(-1.5, 0, 1.5), labels = c("low", "middle", "high"), length.ticks = 1.5, lwd.border = 2, lwd.ticks = 2, cex.labels = 1.5, font = 2, col.border = "green3", col.ticks = c(2, 5, 2), col.labels = c(6, 4, 3), lty.border = 2, lty.ticks = c(2, 3, 2), ncol = 3, breaks = c(-2, -1, 1, 2)) colorlegend(color = diverge_hcl, x = x, range = c(-3, 3)) colorlegend(color = diverge_hcl, x = x, range = c(-3, 3), lrange = c(-6, 6)) ## Combine plot with color legend. n <- 100 x <- y <- seq(-3, 3, length.out = n) z <- outer(sin(x), cos(x)) pal <- colorlegend(color = diverge_hcl, x = z, plot = FALSE) op <- par(no.readonly = TRUE) par(mar = c(4.1, 4.1, 1.1, 1.1)) layout(matrix(c(1, 2), nrow = 1), widths = c(1, 0.3)) image(x = x, y = y, z = z, col = pal$colors, breaks = pal$breaks) par(mar = c(4.1, 0.1, 1.1, 3.1)) colorlegend(color = diverge_hcl, x = z, plot = TRUE, full = TRUE, side.legend = 2, side.ticks = 2) par(op) ## Another example with different plot. n <- 50 x <- sin(seq(-3, 3, length.out = n)) pal <- colorlegend(color = diverge_hcl, x = x, plot = FALSE) op <- par(no.readonly = TRUE) par(mar = c(7.1, 4.1, 1.1, 1.1)) barplot(x, border = "transparent", col = pal$map(x)) colorlegend(color = diverge_hcl, x = x, plot = FALSE, add = TRUE, xlim = c(0, 60), ylim = c(-1, 1), pos = c(0, -0.15), xpd = TRUE, scale = FALSE, width = 60, height = 0.15, at = seq(min(x), max(x), length.out = 9)) par(op)
This function takes a bamlss object which was created using a sampler function and continues sampling from the last state of the MCMC chain.
continue(object, cores = NULL, combine = TRUE, sleep = NULL, results = TRUE, ...)
object |
A |
cores |
An integer specifying the number of cores that should be used for the sampler
function. This is based on function |
combine |
Should the new samples be combined with the old samples into one
|
sleep |
Time the system should sleep before the next core is started. |
results |
If a results function was used to create the |
... |
Arguments passed to the sampler function. |
A bamlss object.
## Not run:
## Simulate and run model with MCMC.
set.seed(123)
d <- GAMart()
b <- bamlss(num ~ s(x1) + s(x2) + s(x3) + te(lon,lat), data = d)

## Continue sampling.
a <- continue(b)

## Plot all samples.
## plot(a, which = "samples")
## End(Not run)
This function takes a fitted Cox model, i.e., a model estimated by opt_Cox or sam_Cox, and computes predictions given a new data set or for the original values. Survival probabilities are computed using numerical integration, therefore, computation may take some time. To avoid problems with computer memory, the prediction of survival probabilities can be split into chunks and computed in parallel on different cores.
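As a sketch of chunked, parallel prediction (assuming a fitted Cox model b as created in the example below):
## Not run:
p <- predict(b, type = "probabilities", time = 3,
  subdivisions = 100, cores = 2, chunks = 4, FUN = c95)
## End(Not run)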
cox_predict(object, newdata, type = c("link", "parameter", "probabilities"), FUN = function(x) { mean(x, na.rm = TRUE) }, time = NULL, subdivisions = 100, cores = NULL, chunks = 1, verbose = FALSE, ...)
object |
A |
newdata |
A data frame or list containing the values of the model
covariates at which predictions are required. If missing |
type |
Specifies the type of predictions that should be computed. |
FUN |
A function that should be applied on each row of the samples
of the additive predictor, parameter or probabilities. Per default
the function computes means of samples, however, other functions like |
time |
|
subdivisions |
How many time points should be created for each individual. |
cores |
Specifies the number of cores that should be used for prediction. The problem is
split into |
chunks |
The number of chunks that should be processed sequentially on one core. This way memory problems can be avoided when computing survival times for large problems. |
verbose |
Print progress information. |
... |
Arguments passed to |
Depending on the type of function provided in argument FUN, a numeric vector or matrix.
sam_Cox, cox_bamlss, surv_transform, simSurv, bamlss, predict.bamlss
## Not run: library("survival") set.seed(123) ## Simulate survival data. d <- simSurv(n = 500) ## Formula of the survival model, note ## that the baseline is given in the first formula by s(time). f <- list( Surv(time, event) ~ s(time) + s(time, by = x3), gamma ~ s(x1) + s(x2) ) ## Cox model with continuous time. ## Note the the family object cox_bamlss() sets ## the default optimizer and sampler function! ## First, posterior mode estimates are computed ## using function opt_Cox(), afterwards the ## sampler sam_Cox() is started. b <- bamlss(f, family = "cox", data = d) ## Predict survival probabilities P(T > t). p <- predict(b, type = "probabilities", time = 3, subdivisions = 100, FUN = c95) ## End(Not run)
## Not run: library("survival") set.seed(123) ## Simulate survival data. d <- simSurv(n = 500) ## Formula of the survival model, note ## that the baseline is given in the first formula by s(time). f <- list( Surv(time, event) ~ s(time) + s(time, by = x3), gamma ~ s(x1) + s(x2) ) ## Cox model with continuous time. ## Note the the family object cox_bamlss() sets ## the default optimizer and sampler function! ## First, posterior mode estimates are computed ## using function opt_Cox(), afterwards the ## sampler sam_Cox() is started. b <- bamlss(f, family = "cox", data = d) ## Predict survival probabilities P(T > t). p <- predict(b, type = "probabilities", time = 3, subdivisions = 100, FUN = c95) ## End(Not run)
This function creates simulated data based on a crazy function.
Crazy(n = 1000)
n |
The number of observations to be simulated. |
A data frame with response y and covariate x.
d <- Crazy(1000)
head(d)
plot(d)
The function computes the continuous rank probability score (CRPS). Note that the function uses numerical integration, for highly efficient computation please see the scoringRules package.
CRPS(object, newdata = NULL, interval = c(-Inf, Inf), FUN = mean, term = NULL, ...)
object |
An object returned from |
newdata |
Optional new data that should be used for calculation. |
interval |
The interval that should be used for numerical integration |
FUN |
Function to be applied on the CRPS scores. |
term |
If required, specify the model terms that should be used within the
|
... |
Arguments passed to function |
Gneiting T, Raftery AE (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378. doi:10.1198/016214506000001437
Gneiting T, Balabdaoui F, Raftery AE (2007). Probabilistic Forecasts, Calibration and Sharpness. Journal of the Royal Statistical Society B, 69(2), 243–268. doi:10.1111/j.1467-9868.2007.00587.x
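For intuition, here is a minimal sketch of a CRPS computed by numerical integration (illustration only, not the internal CRPS() code; the helper crps_num(), the predictive CDF pfun, and the integration interval are assumptions):

crps_num <- function(y, pfun, interval = c(-10, 10), ...) {
  ## CRPS(F, y) = integral of (F(z) - 1(z >= y))^2 dz, approximated
  ## on a finite interval using integrate().
  integrate(function(z) (pfun(z, ...) - as.numeric(z >= y))^2,
    lower = interval[1], upper = interval[2])$value
}
## Example: standard normal predictive distribution and observation y = 0.3.
crps_num(0.3, pnorm, mean = 0, sd = 1)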
## Not run: ## Simulate data. d <- GAMart() ## Model only including covariate x1. b1 <- bamlss(num ~ s(x1), data = d) ## Now, also including x2 and x3. b2 <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d) ## Compare using the CRPS score. CRPS(b1) CRPS(b2) ## End(Not run)
This function interfaces keras infrastructures for high-level neural networks. The function can be used as a standalone model fitting engine such as bamlss
, or as an on-top model engine to capture special features in the data that could not be captured by other model fitting engines.
## Deep distributional neural net. ddnn(object, optimizer = "adam", learning_rate = 0.01, epochs = 100, batch_size = NULL, nlayers = 2, units = 100, activation = "relu", l1 = NULL, l2 = NULL, validation_split = 0.2, early_stopping = TRUE, patience = 50, verbose = TRUE, ...) ## Predict method. ## S3 method for class 'ddnn' predict(object, newdata, model = NULL, type = c("link", "parameter"), drop = TRUE, ...) ## CV method for optimizing ## the number of epochs using ## the CRPS. cv_ddnn(formula, data, folds = 10, min_epochs = 300, max_epochs = 400, interval = c(-Inf, Inf), ...)
object |
An object of class |
optimizer |
Character or call to optimizer functions to be used within |
learning_rate |
The learning rate of the optimizer. |
epochs |
Number of times to iterate over the training data arrays, see
|
batch_size |
Number of samples per gradient update, see |
nlayers |
Number of hidden layers. |
units |
Number of nodes per hidden layer, can be a vector. |
activation |
Activation functions used for the hidden layers, can be a vector. |
l1 |
Shrinkage parameter for L1 penalty. |
l2 |
Shrinkage parameter for L2 penalty. |
validation_split |
Proportion of data that should be used for validation. |
early_stopping |
Logical, should early stopping of the optimizer be applied? |
patience |
Integer, the number of iterations the optimizer waits until early stopping is applied after changes in the validation data set become small. |
verbose |
Print information during runtime of the algorithm. |
newdata |
A |
model |
Character or integer specifying for which distributional parameter predictions should be computed. |
type |
If |
drop |
If predictions for only one |
formula |
The model formula. |
data |
The data used for estimation. |
folds |
The number of folds that should be generated. |
min_epochs , max_epochs
|
Defines the minimum and maximum number of epochs that should be used. |
interval |
Response interval, see function |
... |
Arguments passed to |
The default keras model is a sequential model with two hidden layers, each using the "relu"
activation function and 100 units. Between the layers a dropout layer with a dropout rate of 0.1 is placed.
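For illustration, a minimal sketch of this default architecture written directly with the keras R package (this is not the internal ddnn() code; the single linear output layer is an assumption for a model with one distributional parameter):

## Not run:
library("keras")
## Two hidden layers with "relu" activation and 100 units each,
## separated by dropout layers with rate 0.1.
m <- keras_model_sequential() %>%
  layer_dense(units = 100, activation = "relu") %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 100, activation = "relu") %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 1)
## End(Not run)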
For function ddnn()
an object of class "ddnn"
. Note that extractor
functions fitted
and residuals.bamlss
can be applied.
For function predict.ddnn()
a list or vector of predicted values.
The deep learning infrastructure is experimental!
## Not run: ## Simulate data. set.seed(123) n <- 300 x <- runif(n, -3, 3) fsigma <- -2 + cos(x) y <- sin(x) + rnorm(n, sd = exp(fsigma)) ## Setup model formula. f <- list( y ~ x, sigma ~ x ) ## Fit neural network. library("keras") b <- ddnn(f, epochs = 2000) ## Plot estimated functions. par(mfrow = c(1, 2)) plot(x, y) plot2d(fitted(b)$mu ~ x, add = TRUE) plot2d(fitted(b)$sigma ~ x, ylim = range(c(fitted(b)$sigma, fsigma))) plot2d(fsigma ~ x, add = TRUE, col.lines = "red") ## Predict with newdata. nd <- data.frame(x = seq(-6, 6, length = 100)) nd$p <- predict(b, newdata = nd, type = "link") par(mfrow = c(1, 2)) plot(x, y, xlim = c(-6, 6), ylim = range(c(nd$p$mu, y))) plot2d(p$mu ~ x, data = nd, add = TRUE) plot2d(p$sigma ~ x, data = nd, ylim = range(c(nd$p$sigma, fsigma))) plot2d(fsigma ~ x, add = TRUE, col.lines = "red") ## Plot quantile residuals. e <- residuals(b) plot(e) ## End(Not run)
Generic function returning the deviance information criterion (DIC) of a fitted model object.
DIC(object, ...)
object |
A fitted model object for which there exists a |
... |
Optionally more fitted model objects. |
## Not run: d <- GAMart() b1 <- bamlss(num ~ s(x1), data = d) b2 <- bamlss(num ~ s(x1) + s(x2), data = d) DIC(b1, b2) ## End(Not run)
disttree Families for MVN with Cholesky Parameterization
dist_mvnchol(k, r = k - 1L, type = c("basic", "modified", "chol"), ...)
k |
integer. The dimension of the multivariate distribution. |
r |
Integer, the number of off-diagonals to model (AD-r covariance). |
type |
character. Choose |
... |
not used. |
NOTE: These functions are under development! disttree families that model a multivariate Normal (Gaussian) distribution by a (modified) Cholesky decomposition of the covariance matrix.
a bamlss family.
The function shows available optimizer and sampling engines for a given family object.
engines(family, ...)
family |
A family object or the name of the family. |
... |
Further family objects or names. |
engines(gaussian_bamlss, "gamma", cox_bamlss)
Family objects in bamlss specify the information that is needed for using (different)
model fitting engines, e.g., the parameter names and corresponding link functions, the
density function, derivatives of the log-likelihood w.r.t. the predictors, and so
forth. The optimizer or sampler functions that are called by bamlss
must know how
much information is needed to interpret the model since the family objects are simply
passed through. Family objects are also used for computing post-modeling statistics, e.g., for
residual diagnostics or random number generation. See the details and examples.
## Family objects in bamlss: ALD_bamlss(..., tau = 0.5, eps = 0.01) beta_bamlss(...) binomial_bamlss(link = "logit", ...) cnorm_bamlss(...) cox_bamlss(...) dw_bamlss(...) DGP_bamlss(...) dirichlet_bamlss(...) ELF_bamlss(..., tau = 0.5) gaussian_bamlss(...) gaussian2_bamlss(...) Gaussian_bamlss(...) gamma_bamlss(...) logNN_bamlss(...) multinomial_bamlss(...) mvnorm_bamlss(k = 2, ...) mvnormAR1_bamlss(k = 2, ...) poisson_bamlss(...) gpareto_bamlss(...) glogis_bamlss(...) AR1_bamlss(...) beta1_bamlss(ar.start, ...) nbinom_bamlss(...) ztnbinom_bamlss(...) lognormal_bamlss(...) weibull_bamlss(...) Sichel_bamlss(...) GEV_bamlss(...) gumbel_bamlss(...) mix_bamlss(f1, f2, ...) ZANBI_bamlss(...) ## Extractor functions: ## S3 method for class 'bamlss' family(object, ...) ## S3 method for class 'bamlss.frame' family(object, ...)
object |
An object of class |
k |
The dimension of the multivariate normal. Note, if |
ar.start |
Logical vector of length equal to the number of rows of the full data set
used for modeling. Must hold entries |
link |
Possible link functions. |
tau |
The quantile that should be fitted. |
eps |
Constant to be used for the approximation of the absolute function. |
f1 , f2
|
A family of class |
... |
Arguments passed to functions that are called within the family object. |
The following lists the minimum requirements on a bamlss family object to be used with
bamlss
and bamlss.frame
:
The family object must return a list
of class "family.bamlss"
.
The object must contain the family name as a character string.
The object must contain the names of the parameters as a character string, as well as the corresponding link functions as character string.
For most optimizer and sampling functions at least the density function, including a log
argument, should be provided.
When using generic model fitting engines like opt_bfit
or sam_GMCMC
, as
well as for computing post-modeling statistics with function samplestats
, and others,
it is assumed that the density function in a family object has the following arguments:
d(y, par, log = FALSE, ...)
where argument y
is the response (possibly a matrix) and par
is a named list
holding the evaluated parameters of the distribution, e.g., using a normal distribution par
has two elements, one for the mean par$mu
and one for the standard deviation
par$sigma
. The dots argument is for passing special internally used objects; depending on the type of model, this feature is usually not needed.
Similarly, for derivative-based algorithms, e.g., using iteratively weighted least squares (IWLS, see function opt_bfit
), the family object holds derivative functions evaluating derivatives of the log-likelihood w.r.t. the predictors (or expectations of derivatives). For each parameter, these functions take the following arguments:
score(y, par, ...)
for computing the first derivative of the log-likelihood w.r.t. a predictor and
hess(y, par, ...)
for computing the negative second derivatives. Within the family object these functions are organized in a named list, see the examples below.
In addition, for the cumulative distribution function (p(y, par, ...)
), for the quantile
function (q(y, par, ...)
) or for creating random numbers (r(n, par, ...)
) the same
structure is assumed. See, e.g., the code of function gaussian_bamlss()
.
Some model fitting engines can initialize the distributional parameters which oftentimes
leads to much faster convergence. The initialize functions are again organized within a named list,
one entry for each parameter, similar to the score
and hess
functions, e.g.,
see the code of family object gaussian_bamlss
.
Using function bamlss
, residuals.bamlss
and
predict.bamlss
the family objects may
also specify the transform()
er, optimizer()
, sampler()
,
samplestats()
, results()
, residuals()
and predict()
function that
should be used with this family. See for example the setup of cox_bamlss
.
For using specialized estimation engines like sam_JAGS
it is recommended to supply
any extra arguments needed by those engines with an additional list entry within the family
object, e.g., using gaussian_bamlss()
with sam_JAGS
the family objects holds
special details in an element named "bugs"
.
The examples below illustrate this setup. See also the code of the bamlss family functions.
## New family object for the normal distribution, ## can be used by function opt_bfit() and sam_GMCMC(). normal_bamlss <- function(...) { f <- list( "family" = "normal", "names" = c("mu", "sigma"), "links" = c("identity", "log"), "d" = function(y, par, log = FALSE) { dnorm(y, mean = par$mu, sd = par$sigma, log = log) }, "score" = list( "mu" = function(y, par, ...) { drop((y - par$mu) / (par$sigma^2)) }, "sigma" = function(y, par, ...) { drop(-1 + (y - par$mu)^2 / (par$sigma^2)) } ), "hess" = list( "mu" = function(y, par, ...) { drop(1 / (par$sigma^2)) }, "sigma" = function(y, par, ...) { rep(2, length(y)) } ) ) class(f) <- "family.bamlss" return(f) } ## Not run: ## Test on simulated data. d <- GAMart() b <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d, family = "normal") plot(b) ## Compute the log-likelihood using the family object. f <- family(b) sum(f$d(y = d$num, par = f$map2par(fitted(b)), log = TRUE)) ## For using JAGS() more details are needed. norm4JAGS_bamlss <- function(...) { f <- normal_bamlss() f$bugs <- list( "dist" = "dnorm", "eta" = BUGSeta, "model" = BUGSmodel, "reparam" = c(sigma = "1 / sqrt(sigma)") ) return(f) } ## Now with opt_bfit() and sam_JAGS(). b <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d, optimizer = opt_bfit, sampler = sam_JAGS, family = "norm4JAGS") plot(b) ## End(Not run)
This data set includes weekly fatalities in Austria from 2000 up to week 46 of 2020. The data is taken from the Eurostat database.
data("fatalities")
data("fatalities")
The fatalities
data contains the following variables:
Integer, the number of fatalities.
Integer, the corresponding year fatalities are recorded.
Integer, the corresponding week fatalities are recorded.
Eurostat Database (2020). Population and social conditions, demography and migration, mortality, weekly deaths, deaths by week and NUTS 3 region, Austria https://ec.europa.eu/eurostat/
data("fatalities") plot(num ~ week, data = fatalities)
data("fatalities") plot(num ~ week, data = fatalities)
Function to compute fitted values for bamlss
models. The function calls
predict.bamlss
to compute fitted values from samples.
## S3 method for class 'bamlss' fitted(object, model = NULL, term = NULL, type = c("link", "parameter"), samples = TRUE, FUN = c95, nsamps = NULL, ...)
object |
An object of class |
model |
Character or integer, specifies the model for which fitted values should be computed. |
term |
Character or integer, specifies the model terms for which fitted values are required.
Note that if |
type |
If |
samples |
Should fitted values be computed using samples of parameters or estimated parameters
as returned from optimizer functions (e.g., function |
FUN |
A function that should be applied on the samples of predictors or
parameters, depending on argument |
nsamps |
If the fitted |
... |
Arguments passed to function |
Depending on arguments model
, FUN
and the structure of the bamlss
model, a list of fitted values or simple vectors or matrices of fitted values.
## Not run: ## Generate some data. d <- GAMart() ## Model formula. f <- list( num ~ s(x1) + s(x2) + s(x3) + te(lon,lat), sigma ~ s(x1) + s(x2) + s(x3) + te(lon,lat) ) ## Estimate model. b <- bamlss(f, data = d) ## Fitted values returned from optimizer. f1 <- fitted(b, model = "mu", samples = FALSE) ## Fitted values returned from sampler. f2 <- fitted(b, model = "mu", samples = TRUE, FUN = mean) plot(f1, f2) ## End(Not run)
This function creates artificial GAM-type data.frame
s.
The function is mainly used for testing purposes.
GAMart(n = 500, sd = 0.1, seed = FALSE, ti = c("none", "vcm", "main", "both"))
n |
The number of observations. |
sd |
Standard deviation of the normal errors. |
seed |
Sets the |
ti |
For tensor product interaction term, the type of interaction. |
d <- GAMart() head(d) ## Not run: b <- bamlss(num ~ s(x1) + s(x2) + s(x3) + te(lon,lat), data = d) plot(b) ## End(Not run)
The function searches the gamlss.dist namespace for available
distributions. It returns a named list of family functions which can be used
with bamlss
.
gamlss_distributions(type = c("continuous", "discrete"))
type |
Character specifying the type of distribution to be extracted. |
## Not run: dists <- gamlss_distributions(type = "continuous") print(dists) ## End(Not run)
Function to get a family.bamlss
object to be used for fitting. The main
purpose of this function is to ease the handling of extra arguments to the family object.
gF(x, ...)
x |
The name of the |
... |
Arguments passed to the family object. |
A family.bamlss
object.
f <- gF(gaussian) print(f)
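Since the main purpose of gF() is to ease the handling of extra arguments, here is a hedged sketch of passing such an argument through to the family constructor (the choice of family and dimension is only for illustration):

## Forward an extra argument, here the dimension k of the
## multivariate normal family.
f2 <- gF(mvnorm, k = 3)
print(f2)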
This dataset is taken from the Regression Book and is about prices of used VW Golf cars.
data("Golf")
data("Golf")
The Golf
data contains the following variables:
Numeric, sale price in 1000 Euro.
Numeric, age of the car in months.
Numeric, kilometer reading in 1000 kilometers.
Numeric, months until the next TIA appointment (German TUEV).
Factor, does the car have ABS?
Factor, does the car have a sunroof?
Fahrmeir, L., Kneib, T., Lang, S. and Marx, B. (2013).
Regression - Models, Methods and Applications, Springer.
https://www.uni-goettingen.de/de/551357.html
.
data("Golf") plot(price ~ age, data = Golf) plot(price ~ kilometer, data = Golf)
data("Golf") plot(price ~ age, data = Golf) plot(price ~ kilometer, data = Golf)
This function downloads and compiles the HOMSTART-project data set. The data is downloaded from the Zentralanstalt fuer Meteorologie und Geodynamik (ZAMG, http://www.zamg.ac.at), was funded by the Austrian Climate Research Programme (ACRP), and is free for research purposes.
homstart_data(dir = NULL, load = TRUE, tdir = NULL)
dir |
The directory where the |
load |
Should the |
tdir |
An optional temporary directory where all downloaded files are processed. |
A data frame containing the following variables:
raw |
The daily precipitation observations. |
cens |
Precipitation observations censored at 0. |
bin |
Factor with levels |
cat |
Factor with levels |
trend |
A numeric time trend |
month |
Month of the observation. |
year |
Year of the observation. |
day |
Day of the year. |
lon |
The longitude coordinate of the corresponding meteorological station. |
lat |
The latitude coordinate of the corresponding meteorological station. |
id |
Factor, meteorological station identifier. |
cos1 , cos2 , sin1 , sin2
|
Transformed time trend for harmonic regression. |
weekend |
Factor, with levels |
elevation |
Numeric, the elevation of the meteorological station. |
Nemec J, Gruber C, Chimani B, Auer I (2012). Trends in extreme temperature indices in Austria based on a new homogenised dataset. International Journal of Climatology. DOI 10.1002/joc.3532.
Nemec J, Chimani B, Gruber C, Auer I (2011). Ein neuer Datensatz homogenisierter Tagesdaten. OEGM Bulletin, 1, 19–20. https://www.meteorologie.at/docs/OEGM_bulletin_2011_1.pdf
Umlauf N, Mayr G, Messner J, Zeileis A (2012). Why does it always rain on me? A spatio-temporal analysis of precipitation in Austria. Austrian Journal of Statistics, 41(1), 81–92. doi:10.17713/ajs.v41i1.190
## Not run: homstart_data(load = TRUE) head(homstart) ## End(Not run)
Family object to fit a flexible additive joint model for longitudinal and survival data under a Bayesian approach as presented in Koehler et al. (2017a, b). All parts of the joint model can be specified as structured additive predictors. See the details and examples.
## JM family object. jm_bamlss(...) ## "bamlss.frame" transformer function ## to set up joint models. jm_transform(x, y, data, terms, knots, formula, family, subdivisions = 25, timedependent = c("lambda", "mu", "alpha", "dalpha"), timevar = NULL, idvar = NULL, alpha = .Machine$double.eps, mu = NULL, sigma = NULL, sparse = TRUE, nonlinear = FALSE, edf_alt = FALSE, start_mu = NULL, k_mu = 6, ...) ## Posterior mode optimizing engine. opt_JM(x, y, start = NULL, weights = NULL, offset = NULL, criterion = c("AICc", "BIC", "AIC"), maxit = c(100, 1), nu = c("lambda" = 0.1, "gamma" = 0.1, "mu" = 1, "sigma" = 1, "alpha" = 1, "dalpha" = 1), update.nu = FALSE, eps = 0.0001, alpha.eps = 0.001, ic.eps = 1e-08, nback = 40, verbose = TRUE, digits = 4, ...) jm_mode(x, y, start = NULL, weights = NULL, offset = NULL, criterion = c("AICc", "BIC", "AIC"), maxit = c(100, 1), nu = c("lambda" = 0.1, "gamma" = 0.1, "mu" = 1, "sigma" = 1, "alpha" = 1, "dalpha" = 1), update.nu = FALSE, eps = 0.0001, alpha.eps = 0.001, ic.eps = 1e-08, nback = 40, verbose = TRUE, digits = 4, ...) ## Sampler function. sam_JM(x, y, family, start = NULL, weights = NULL, offset = NULL, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, digits = 4, step = 20, ...) jm_mcmc(x, y, family, start = NULL, weights = NULL, offset = NULL, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, digits = 4, step = 20, ...) ## Predict function, set to default in jm_bamlss(). jm_predict(object, newdata, type = c("link", "parameter", "probabilities", "cumhaz", "loglik"), dt, steps, id, FUN = function(x) { mean(x, na.rm = TRUE) }, subdivisions = 100, cores = NULL, chunks = 1, verbose = FALSE, ...) ## Survival plot. jm_survplot(object, id = 1, dt = NULL, steps = 10, points = TRUE, rug = !points)
x |
The |
y |
The model response, as returned from function |
data |
A |
terms |
The corresponding |
knots |
An optional list containing user specified knots, see the documentation of
function |
formula |
The corresponding |
family |
The |
subdivisions |
How many time points should be created for each individual. |
timedependent |
A character vector specifying the names of parameters in |
timevar |
A character specifying the name of the survival time variable in the data set. |
idvar |
Depending on the type of data set, this is the name of the variable specifying identifier of individuals. |
alpha |
Numeric, a starting value for the intercept of the association parameter alpha. |
mu |
Numeric, a starting value for the intercept of the mu parameter. |
sigma |
Numeric, a starting value for the intercept of the sigma parameter. |
sparse |
Logical, indicating if sparse matrix structures are used for updating and sampling of mu parameter model terms. |
nonlinear |
Logical, indicating if association is nonlinear in mu. See Details on the different model specifications. |
edf_alt |
Logical, indicating if an alternative computation of estimated degrees of freedom for penalized model terms should be used. |
start_mu |
Starting values for the computation of mu. For estimating associations which are nonlinear in mu, knot placement is based on these starting values which can improve stability. |
k_mu |
Number of knots for spline basis of association nonlinear in mu. Reducing this number improves stability of the estimation. |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Currently not supported. |
offset |
Currently not supported. |
criterion |
Information criterion to be used, e.g., for smoothing
variance selection. Options are the corrected AIC |
maxit |
Vector containing the maximum number of iterations for the backfitting
algorithm with |
nu |
Vector of step lengths for parameter updates of one Newton-Raphson update for each predictor of the joint model. |
update.nu |
Should the updating step length be optimized in each iteration
of the backfitting algorithm? Uses |
eps |
The relative convergence tolerance of the backfitting algorithm. |
alpha.eps |
The relative convergence tolerance of the backfitting algorithm for predictor alpha. |
ic.eps |
The relative convergence tolerance of the information criterion used, e.g., for smoothing variance selection. |
nback |
For computing |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
n.iter |
the number of MCMC iterations. |
burnin |
the burn-in phase of the sampler, i.e., the number of starting samples that should be removed. |
thin |
the thinning parameter for MCMC simulation. E.g., |
step |
How many times should algorithm runtime information be printed, divides |
object |
A |
newdata |
Dataset for which to create predictions. Not needed for conditional survival probabilities. |
type |
Character string indicating which type of predictions to compute. |
id |
Integer or character, that specifies the individual for which the plot should be created. |
dt |
The time window after the last observed measurement for which predictions should be computed.
The default is |
steps |
Integer, the number of steps for which to evaluate the conditional survival probability
up to |
FUN |
A function that should be applied on the samples of predictors or
parameters, depending on argument |
cores |
Specifies the number of cores that should be used for prediction. Note that
this functionality is based on the |
chunks |
Should computations be split into |
points |
Should longitudinal observations be added to the plot. |
rug |
Should longitudinal observed time points be added on the x-axis to the plot. |
... |
Currently not used. |
We refer to the papers of Koehler et al. (2017a, b) for details on the flexible
additive joint model. In short, we model the hazard of subject $i$ for an event at time $t$ as

$$h_i(t) = \exp\{\eta_{\lambda i}(t) + \eta_{\gamma i} + \eta_{\alpha i}(\mu_i(t), t)\},$$

with predictor $\eta_{\lambda}$ for all survival covariates that are time-varying or have a
time-varying coefficient (including the log baseline hazard), predictor $\eta_{\gamma}$ for
baseline survival covariates, and predictor $\eta_{\alpha}$ representing the potentially
time-varying or nonlinear association between the longitudinal marker $\mu_i(t)$ and the hazard. The
longitudinal response $y_{ij}$ at time points $t_{ij}$ is modeled as

$$y_{ij} = \eta_{\mu i}(t_{ij}) + \varepsilon_{ij}$$

with independent normal errors $\varepsilon_{ij}$.

Each predictor is a structured additive predictor, i.e., a sum of functions of
covariates $x$. Each of these functions can be
modeled parametrically or using basis function evaluations from the smooth constructors in
mgcv such as
s
, te
and ti
and can
include smooth time-varying, random or spatial effects. For the Bayesian estimation of these effects
we specify corresponding priors: For linear or parametric terms we use vague normal priors, smooth
and random effect terms are regularized by placing generic multivariate normal priors on the
coefficients and for anisotropic smooths, when multiple smoothing variance parameters are involved,
more complex priors are in place (cf. Koehler et al., 2017a). We use inverse Gamma
hyper-priors, i.e. IG(0.001, 0.001) to obtain an inverse Gamma full conditional for the variance
parameters. We estimate the posterior mode by maximizing the log-posterior of the model using a
Newton-Raphson procedure; the posterior mean is obtained via derivative-based Metropolis-Hastings
sampling. We recommend using posterior mode estimates for a quick model assessment. In order to
draw correct inferences from the model, posterior mean estimates should be computed.
We approximate integration in the survival part of the likelihood using the trapezoidal rule for posterior mode estimation.
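As an illustration of this kind of approximation, consider the following minimal sketch of a trapezoidal rule for a cumulative hazard (illustration only, not the internal jm code; the helper cumhaz_trapez() and the equidistant grid are assumptions):

cumhaz_trapez <- function(eta_fun, Ti, subdivisions = 25) {
  ## Approximate the integral of exp(eta(t)) from 0 to Ti
  ## on an equidistant grid via the trapezoidal rule.
  grid <- seq(0, Ti, length.out = subdivisions)
  h <- exp(eta_fun(grid))
  sum(diff(grid) * (head(h, -1) + tail(h, -1)) / 2)
}
## Example with a simple linear log-hazard.
cumhaz_trapez(function(t) -1 + 0.1 * t, Ti = 5)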
A variety of specifications of the association are possible, with an important distinction between
associations which are nonlinear in $\mu$ for nonlinear = TRUE
(Koehler et al. 2017b) and associations which are linear in $\mu$ for nonlinear = FALSE
(Koehler et al. 2017a).
The indicator nonlinear
for associations which are linear or nonlinear in $\mu$ was named
interaction
in earlier stages of the development.
Koehler M, Umlauf N, Beyerlein A, Winkler C, Ziegler AG, Greven S (2017). Flexible Bayesian Additive Joint Models with an Application to Type 1 Diabetes Research. Biometrical Journal. doi:10.1002/bimj.201600224
Koehler M, Umlauf N, Greven S (2018). Nonlinear Association Structures in Flexible Bayesian Additive Joint Models. Statistics in Medicine. doi:10.1002/sim.7967
## Not run: set.seed(123) ## Simulate survival data ## with random intercepts/slopes and a linear effect of time, ## constant association alpha and no effect of the derivative d <- simJM(nsub = 200, long_setting = "linear", alpha_setting = "constant", dalpha_setting = "zero", full = FALSE) ## Formula of the according joint model f <- list( Surv2(survtime, event, obs = y) ~ s(survtime, bs = "ps"), gamma ~ s(x1, bs = "ps"), mu ~ obstime + s(id, bs = "re") + s(id, obstime, bs = "re"), sigma ~ 1, alpha ~ 1, dalpha ~ -1 ) ## Joint model estimation ## jm_bamlss() sets the default optimizer and sampler function. ## First, posterior mode estimates are computed using function ## opt_JM(), afterwards the sampler sam_JM() is started. b <- bamlss(f, data = d, family = "jm", timevar = "obstime", idvar = "id") ## Plot estimated effects. plot(b) ## Predict event probabilities for two individuals ## at 12 time units after their last longitudinal measurement. ## The event probability is conditional on their survival ## up to their last observed measurement. p <- predict(b, type = "probabilities", id = c(1, 2), dt = 12, FUN = c95) print(p) ## Plot of survival probabilities and ## corresponding longitudinal effects ## for individual id. jm_survplot(b, id = 3) jm_survplot(b, id = 30) ## Simulate survival data ## with functional random intercepts and a nonlinear effect ## of time, time-varying association alpha and no effect ## of the derivative. ## Note: This specification is the simJM default. d <- simJM(nsub = 200, full = FALSE) ## Formula of the according joint model ## number of knots for the smooth nonlinear effect of time klong <- 8 f <- list( Surv2(survtime, event, obs = y) ~ s(survtime, bs = "ps"), gamma ~ s(x1, bs = "ps"), mu ~ ti(id, bs = "re") + ti(obstime, bs = "ps", k = klong) + ti(id, obstime, bs = c("re", "ps"), k = c(nlevels(d$id), klong)) + s(x2, bs = "ps"), sigma ~ 1, alpha ~ s(survtime, bs = "ps"), dalpha ~ -1 ) ## Estimating posterior mode only using opt_JM() b_mode <- bamlss(f, data = d, family = "jm", timevar = "obstime", idvar = "id", sampler = FALSE) ## Estimating posterior means using sam_JM() ## with starting values generated from posterior mode b_mean <- bamlss(f, data = d, family = "jm", timevar = "obstime", idvar = "id", optimizer = FALSE, start = parameters(b_mode), results = FALSE) ## Plot effects. plot(b_mean, model = "alpha") ## Simulate survival data ## with functional random intercepts and an association nonlinear in mu set.seed(234) d <- simJM(nsub = 300, long_setting = "functional", alpha_setting = "nonlinear", nonlinear = TRUE, full = FALSE, probmiss = 0.9) ## Calculate longitudinal model to obtain starting values for mu long_df <- 7 f_start <- y ~ ti(id, bs = "re") + ti(obstime, bs = "ps", k = long_df) + ti(id, obstime, bs = c("re", "ps"), k = c(nlevels(d$id), long_df)) + s(x2, bs = "ps") b_start <- bamlss(f_start, data = d, sampler = FALSE) mu <- predict(b_start)$mu ## Fit joint model with nonlinear association (nonlinear = TRUE) f <- list( Surv2(survtime, event, obs = y) ~ s(survtime, bs = "ps"), gamma ~ x1, mu ~ ti(id, bs = "re") + ti(obstime, bs = "ps", k = long_df) + ti(id, obstime, bs = c("re", "ps"), k = c(nlevels(d$id), long_df)) + s(x2, bs = "ps"), sigma ~ 1, alpha ~ 1, dalpha ~ -1 ) b <- bamlss(f, data = d, family = "jm", timevar = "obstime", idvar = "id", nonlinear = TRUE, start_mu = mu, n.iter = 6000, burnin = 2000, thin = 2) plot(b) samplestats(b$samples) ## End(Not run)
Smooth constructors and optimizer for Lasso penalization with bamlss
. The
penalization is based on a Taylor series approximation of the Lasso penalty.
## Smooth constructor function. la(formula, type = c("single", "multiple"), ...) ## Single Lasso smoothing parameter optimizer. opt_lasso(x, y, start = NULL, adaptive = TRUE, lower = 0.001, upper = 1000, nlambda = 100, lambda = NULL, multiple = FALSE, verbose = TRUE, digits = 4, flush = TRUE, nu = NULL, stop.nu = NULL, ridge = .Machine$double.eps^0.5, zeromodel = NULL, ...) lasso(x, y, start = NULL, adaptive = TRUE, lower = 0.001, upper = 1000, nlambda = 100, lambda = NULL, multiple = FALSE, verbose = TRUE, digits = 4, flush = TRUE, nu = NULL, stop.nu = NULL, ridge = .Machine$double.eps^0.5, zeromodel = NULL, ...) ## Lasso transformation function to set ## adaptive weights from an unpenalized model. lasso_transform(x, zeromodel, nobs = NULL, ...) ## Plotting function for opt_lasso() optimizer. lasso_plot(x, which = c("criterion", "parameters"), spar = TRUE, model = NULL, name = NULL, mstop = NULL, retrans = FALSE, color = NULL, show.lambda = TRUE, labels = NULL, digits = 2, ...) ## Extract optimum stopping iteration for opt_lasso() optimizer. ## Based on the minimum of the information criterion. lasso_stop(x) ## Extract retransformed Lasso coefficients. lasso_coef(x, ...)
formula |
A formula like |
type |
Should one single penalty parameter be used or multiple parameters, one for each
covariate in |
x |
For function |
y |
The model response, as returned from function |
start |
A vector of starting values. Note, Lasso smoothing parameters will be dropped. |
adaptive |
Should adaptive weights be used for fused Lasso terms? |
lower |
Numeric. The minimum lambda value. |
upper |
Numeric. The maximum lambda value. |
nlambda |
Integer. The number of smoothing parameters for which coefficients should be
estimated, i.e., the vector of smoothing parameters is build up as a sequence from
|
lambda |
Numeric. A sequence/vector of lambda parameters that should be used. |
multiple |
Logical. Should the lambda grid be expanded to search for multiple lambdas, one for each distributional parameter. |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
flush |
use |
nu |
Numeric or logical. Defines the step length for parameter updating of a model term,
useful when the algorithm encounters convergence problems. If |
stop.nu |
Integer. Should step length reduction be stopped after |
ridge |
A ridge penalty parameter that should be used when finding adaptive weights, i.e., parameters from an unpenalized model. The ridge penalty is used to stabilize the estimation in complex models. |
zeromodel |
A model containing the unpenalized parameters, e.g., for each |
nobs |
Integer, number of observations of the data used for modeling. If not supplied
|
which |
Which of the two provided plots should be created, character or integer |
spar |
Should graphical parameters be set by the plotting function? |
model |
Character selecting for which model the plot should be created. |
name |
Character, the name of the coefficient group that should be plotted. Note that
the string provided in |
mstop |
Integer vector, defines the path length to be plotted. |
retrans |
Logical, should coefficients be re-transformed before plotting? |
color |
Colors or color function that creates colors for the group paths. |
show.lambda |
Logical. Should the optimum value of the penalty parameter lambda be shown? |
labels |
A character string of labels that should be used on the 4th axis. |
... |
Arguments passed to the subsequent smooth constructor function.
|
For function la()
, similar to function s
a simple smooth
specification object.
For function opt_lasso()
a list containing the following objects:
fitted.values |
A named list of the fitted values based on the last lasso iteration of the modeled parameters of the selected distribution. |
parameters |
A matrix, each row corresponds to the parameter values of one boosting iteration. |
lasso.stats |
A matrix containing information about the log-likelihood, log-posterior and the information criterion for each lambda. |
Andreas Groll, Julien Hambuckers, Thomas Kneib, and Nikolaus Umlauf (2019). Lasso-type penalization in the framework of generalized additive models for location, scale and shape. Computational Statistics & Data Analysis. doi:10.1016/j.csda.2019.06.005
Oelker M-R, Tutz G (2015). A Uniform Framework for the Combination of Penalties in Generalized Structured Models. Advances in Data Analysis and Classification. doi:10.1007/s11634-015-0205-y
## Not run: ## Simulated fusion Lasso example. bmu <- c(0,0,0,2,2,2,4,4,4) bsigma <- c(0,0,0,-2,-2,-2,-1,-1,-1) id <- factor(sort(rep(1:length(bmu), length.out = 300))) ## Response. set.seed(123) y <- bmu[id] + rnorm(length(id), sd = exp(bsigma[id])) ## Estimate model: ## fuse=1 -> nominal fusion, ## fuse=2 -> ordinal fusion, ## first, unpenalized model to be used for adaptive fusion weights. f <- list(y ~ la(id,fuse=2,fx=TRUE), sigma ~ la(id,fuse=1,fx=TRUE)) b0 <- bamlss(f, sampler = FALSE) ## Model with single lambda parameter. f <- list(y ~ la(id,fuse=2), sigma ~ la(id,fuse=1)) b1 <- bamlss(f, sampler = FALSE, optimizer = opt_lasso, criterion = "BIC", zeromodel = b0) ## Plot information criterion and coefficient paths. lasso_plot(b1, which = 1) lasso_plot(b1, which = 2) lasso_plot(b1, which = 2, model = "mu", name = "mu.s.la(id).id") lasso_plot(b1, which = 2, model = "sigma", name = "sigma.s.la(id).id") ## Extract coefficients for optimum Lasso parameter. coef(b1, mstop = lasso_stop(b1)) ## Predict with optimum Lasso parameter. p1 <- predict(b1, mstop = lasso_stop(b1)) ## Full MCMC, needs lasso_transform() to assign the ## adaptive weights from unpenalized model b0. b2 <- bamlss(f, optimizer = FALSE, transform = lasso_transform, zeromodel = b0, nobs = length(y), start = coef(b1, mstop = lasso_stop(b1)), n.iter = 4000, burnin = 1000) summary(b2) plot(b2) ci <- confint(b2, model = "mu", pterms = FALSE, sterms = TRUE) lasso_plot(b1, which = 2, model = "mu", name = "mu.s.la(id).id", spar = FALSE) for(i in 1:8) { abline(h = ci[i, 1], lty = 2, col = "red") abline(h = ci[i, 2], lty = 2, col = "red") } ## End(Not run)
This smooth constructor implements simple linear effects. The columns of the design matrix
are automatically scaled. The main advantage of this constructor is speed when used in
the BAMLSS boosting algorithm boost
. Optionally, a ridge penalty can be
added; please see the example.
## Linear smooth constructor. lin(...) ## For mgcv. ## S3 method for class 'linear.smooth.spec' smooth.construct(object, data, knots, ...)
... |
For function |
object , data , knots
|
See |
Function lin()
, similar to function s
a simple smooth specification
object.
bamlss
, predict.bamlss
, bfit
, boost
## Not run: ## Simulate data. set.seed(123) d <- GAMart() ## Estimate model. f <- num ~ lin(~x1+x2+x3+fac,ridge=TRUE) b <- bamlss(f, data = d) summary(b) ## End(Not run)
Provides the compiled dwelling fire data of London in 2015. The data is provided
as a SpatialPointsDataFrame
in object LondonFire
. In addition
the boundary and borough information is provided in objects LondonBoundaries
and
LondonBoroughs
. Locations of all fire stations in London of 2015 are provided in
object LondonFStations
.
data("LondonFire")
data("LondonFire")
The LondonFire
data contains the following variables:
Numeric, the time after the emergency call until the first fire engine arrived.
Numeric, the time of day at which the emergency call was received.
Numeric, the fire station intensity at the location of the fire scene. The
intensity is measured using a kernel density estimate of the LondonFStations
using the
spatstat package.
London Fire (2015). London Fire Brigade Incident Records. London Data Store, UK Open Government Licence (OGL v2). https://data.london.gov.uk/dataset/london-fire-brigade-incident-records
London Boroughs/Boundaries (2015). Statistical GIS Boundary Files for London. London Data Store, UK Open Government Licence (OGL v2). https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london
London Fire Stations (2015). London Fire Brigade,
https://www.london-fire.gov.uk/
.
(old url http://www.london-fire.gov.uk/A-ZFireStations.asp
)
Taylor BM (2016). Spatial Modelling of Emergency Service Response Times. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(2), 433–453. doi:10.1111/rssa.12192
data("LondonFire") plot(LondonFire, col = "red") plot(LondonFStations, col = "blue", cex = 1.5, pch = 16, add = TRUE) plot(LondonBoroughs, lwd = 0.5, add = TRUE) plot(LondonBoundaries, add = TRUE, lwd = 1.5) head(LondonFire@data)
data("LondonFire") plot(LondonFire, col = "red") plot(LondonFStations, col = "blue", cex = 1.5, pch = 16, add = TRUE) plot(LondonBoroughs, lwd = 0.5, add = TRUE) plot(LondonBoundaries, add = TRUE, lwd = 1.5) head(LondonFire@data)
Generate a formula for a MVN Cholesky model.
make_formula(formula, type = "basic")
formula |
formula. |
type |
character. Type of Cholesky decomposition. |
This is a helper to generate a formula for a bamlss model with k
-dimensional
multivariate normal distribution and a Cholesky-decomposed variance-covariance matrix.
It is helpful if one formula should be used for means, another for all diagonal
entries of the Cholesky factor, and a third one for all lower triangular entries
of the Cholesky factor.
The left hand side has k
elements separated by |
.
The right hand side has one to three elements separated by |
specifying
the formulas used for all means, diagonal entries of the Cholesky factor and
lower triangular entries of the Cholesky factor, respectively.
f <- O | C | E | A | N ~ s(x1) + s(x2) | s(y) | z f2 <- make_formula(f) f2
Extracts the model frame of a bamlss
or bamlss.frame
object.
## S3 method for class 'bamlss' model.frame(formula, ...) ## S3 method for class 'bamlss.frame' model.frame(formula, ...) ## Internal helper function for creating ## the model frame. bamlss.model.frame(formula, data, family = gaussian_bamlss(), weights = NULL, subset = NULL, offset = NULL, na.action = na.omit, specials = NULL, contrasts.arg = NULL, drop.unused.levels = TRUE, ...)
formula |
An object of class |
data |
A |
family |
A |
weights |
Prior weights on the data. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Can be used to supply model offsets for use in fitting. |
na.action |
A function which indicates what should happen when the data
contain |
specials |
Special function in formulas, see |
contrasts.arg |
An optional list. See the |
drop.unused.levels |
Should factors have unused levels dropped? |
... |
Arguments to be passed to |
The data.frame
containing the variables used for modeling
with function bamlss
.
bamlss
, bamlss.frame
,
model.matrix.bamlss.frame
.
## Not run: ## Generate some data. d <- GAMart() ## Model formula. f <- list( log(pnum) ~ s(x1) + log(x2), sigma ~ s(x3) ) ## Estimate model. b <- bamlss(f, data = d) ## Extract the model frame. head(model.frame(b)) ## End(Not run)
The function creates design (or model) matrices for BAMLSS, i.e., for
each parameter of a bamlss.family
object.
## S3 method for class 'bamlss.frame' model.matrix(object, data = NULL, model = NULL, drop = TRUE, scale.x = FALSE, ...) ## S3 method for class 'bamlss.formula' model.matrix(object, data = NULL, model = NULL, drop = TRUE, scale.x = FALSE, ...) ## S3 method for class 'bamlss.terms' model.matrix(object, data = NULL, model = NULL, drop = TRUE, scale.x = FALSE, ...)
object |
A |
data |
A data frame or list. |
model |
Character or integer, specifies the model for which design matrices should be returned. |
drop |
If model matrices for only one |
scale.x |
Logical, should the model matrices of the linear parts be scaled? |
... |
Not used. |
Depending on the type of model a named list of model matrices or a single model matrix.
model.matrix
, bamlss.frame
,
bamlss.formula
, terms.bamlss
.
## Generate some data. d <- GAMart() ## Model formula. f <- list( num ~ x1 + x2 + id, sigma ~ x3 + fac + lon + lat ) ## Create a "bamlss.frame". bf <- bamlss.frame(f, data = d) ## Get the model matrices. X <- model.matrix(bf) head(X$sigma) ## Same with "bamlss.formula". X <- model.matrix(bamlss.formula(f), data = d) head(X$sigma)
BAMLSS Family for MVN with Cholesky Parameterization
mvn_chol(k = 2L, ...)
k |
integer. The dimension of the multivariate distribution. |
... |
not used. |
This is a prototype implementation of a BAMLSS family that models a multivariate Normal (Gaussian) distribution by a Cholesky decomposition of the covariance matrix.
a bamlss family.
BAMLSS Family for MVN with Modified Cholesky Parameterization
mvn_modchol(k = 2L, ...)
k |
integer. The dimension of the multivariate distribution. |
... |
not used. |
This is a prototype implementation of a BAMLSS family that models a multivariate Normal (Gaussian) distribution by a modified Cholesky decomposition of the covariance matrix.
a bamlss family.
BAMLSS Families for MVN with Cholesky Parameterization
mvnchol_bamlss(k, type = c("basic", "modified", "chol"), ...)
k |
integer. The dimension of the multivariate distribution. |
type |
character. Choose |
... |
not used. |
BAMLSS families that model a multivariate normal (Gaussian) distribution via a (modified) Cholesky decomposition of the covariance matrix.
For examples see TempIbk.
a bamlss family.
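A brief usage sketch (untested); it assumes the usual structure of bamlss family objects, i.e., that elements such as $family and $names are available for inspection:

## Construct a 3-dimensional MVN family with the basic Cholesky parameterization.
fam <- mvnchol_bamlss(k = 3, type = "basic")
fam$family   ## the family label
fam$names    ## names of the modeled distribution parameters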
This smooth constructor implements single-hidden-layer neural networks.
## The neural network smooth constructor.
n(..., k = 10, type = 2)

## Initialize weights.
n.weights(nodes, k, r = NULL, s = NULL,
  type = c("sigmoid", "gauss", "softplus", "cos", "sin"),
  x = NULL, ...)

## Second weights initializer, internally calls n.weights.
make_weights(object, data, dropout = 0.2)

## Boosted neural net predictions.
predictn(object, newdata, model = NULL, mstop = NULL,
  type = c("link", "parameter"))
... |
For function |
k |
For function |
type |
Integer. Type |
nodes |
Number of nodes for each layer, i.e., can also be a vector. |
r , s
|
Parameters controlling the shape of the activation functions. |
x |
A scaled covariate matrix, the data will be used to identify the range of the weights. |
object , data
|
See |
dropout |
The fraction of inner weights that should be set to zero. |
newdata |
The data frame that should be used for prediction. |
model |
For which parameter of the distribution predictions should be computed. |
mstop |
The stopping iteration for which predictions should be computed. The default is to return a matrix of predictions, each column represents the prediction of one boosting iteration. |
Function n() returns, similar to function s, a simple smooth specification object.
bamlss, predict.bamlss, bfit, boost.
## ... coming soon ...!
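A minimal hedged sketch (untested); it assumes that the covariates of the neural network term are passed via the ... argument of n(), as suggested by the usage section, and it reuses the GAMart() data from the other examples:

## Not run: 
set.seed(123)
d <- GAMart()

## Single hidden layer neural network term for x1 and x2 with k = 10 nodes.
b <- bamlss(num ~ n(x1, x2, k = 10), data = d, sampler = FALSE)

## Plot estimated effects.
## plot(b)
## End(Not run)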
The function takes a SpatialPolygonsDataFrame
and computes
the neighbor penalty matrix that can be used to fit a Markov
random field, e.g., using the smooth constructor
smooth.construct.mrf.smooth.spec
.
## Compute the neighborhood matrix.
neighbormatrix(x, type = c("boundary", "dist", "delaunay", "knear"),
  k = 1, id = NULL, nb = FALSE, names = NULL, ...)

## Plot neighborhood structure.
plotneighbors(x, add = FALSE, ...)
x |
An object of class |
type |
Which type of neighborhood structure should be used,
|
k |
For |
id |
An identifier variable for which the penalty matrix should be computed. |
nb |
Should only the neighborhood object be returned. |
names |
Specifies the column where the regions names are provided in the data
slot in the |
add |
Should the neighborhood structure be added to an existing plot? |
... |
Arguments to be passed to function |
smooth.construct.mrf.smooth.spec, dnearneigh, tri2nb, knn2nb.
## Not run: 
data("LondonFire")

## Compute polygon boundary based
## neighborhood matrix.
nm <- neighbormatrix(LondonBoroughs)
print(nm)

## Plot neighborhood structures.
plotneighbors(LondonBoroughs)
plotneighbors(LondonBoroughs, type = "delaunay")
plotneighbors(LondonBoroughs, type = "dist", d1 = 0, d2 = 0.15)
## End(Not run)
Batchwise backfitting estimation engine for fitting GAMLSS to very large data sets.
## Batchwise backfitting engine.
opt_bbfit(x, y, family, shuffle = TRUE, start = NULL, offset = NULL,
  epochs = 1, nbatch = 10, verbose = TRUE, ...)

bbfit(x, y, family, shuffle = TRUE, start = NULL, offset = NULL,
  epochs = 1, nbatch = 10, verbose = TRUE, ...)

## Parallel version.
opt_bbfitp(x, y, family, mc.cores = 1, ...)

## Loglik contribution plot.
contribplot(x, ...)
x |
For function |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
shuffle |
Should observations be shuffled? |
start |
A named numeric vector containing possible starting values, the names are based on
function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
epochs |
For how many epochs should the algorithm run? |
nbatch |
Number of batches. Can also be a number between 0 and 1, i.e., determining the fraction of observations that should be used for fitting. |
verbose |
Print information during runtime of the algorithm. |
mc.cores |
On how many cores should estimation be started? |
... |
For |
The algorithm uses batch-wise estimation of regression coefficients and smoothing variances.
The smoothing variances are estimated on a hold-out batch. This way, models for very large
data sets can be estimated. Note that the algorithm can work in combination with the ff
and ffbase packages, i.e., the entire data never needs to be held in the computer's RAM.
Therefore, the data can either be stored as a comma-separated file on disk or provided as
an "ffdf" data frame, see also the examples.
The optimizer functions use additional arguments:

batch_ids. This argument can either be a list of indices specifying the batches that should
be used for estimation, or a vector of length 2, where the first element specifies the number
of observations that should be sampled for each batch and the second element specifies the
number of batches, see the example and the short sketch below.

nu, the step length control parameter. Defaults to nu = 0.05. If argument slice = TRUE,
then nu = 1.

loglik, defaults to loglik = FALSE. If set to loglik = TRUE, the "out-of-sample"
log-likelihood is used for smoothing variance estimation.

aic, defaults to aic = FALSE. If set to aic = TRUE, the "out-of-sample" AIC is used for
smoothing variance estimation.

eps_loglik, defaults to eps_loglik = 0.01. This argument specifies the relative change in
the "out-of-sample" log-likelihood that is needed such that a model term gets updated.

select, defaults to select = FALSE. If set to select = TRUE, the algorithm only selects the
model term with the largest contribution to the "out-of-sample" log-likelihood for updating
in each iteration/batch.

always, defaults to always = FALSE. If set to always = TRUE, no log-likelihood contribution
checks are performed and model terms are always updated.

K, defaults to K = 2. This argument controls the penalty on the degrees of freedom in the
computation of the AIC.

slice, defaults to slice = FALSE. If set to slice = TRUE, slice sampling using the
"out-of-sample" log-likelihood or AIC is used for smoothing variance estimation. Moreover,
always = TRUE, eps_loglik = -Inf and nu = 1 are set. If slice is an integer n, slice
sampling is started after n iterations; before that, smoothing variances are optimized.
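For illustration, the two ways of specifying batch_ids described above (the values are taken from the example below and are purely illustrative):

## 50 batches, each consisting of 1000 randomly sampled observations.
batch_ids <- c("nobs" = 1000, "nbatch" = 50)

## Or an explicit list of row indices, here for a data set with 27000 rows.
batch_ids <- lapply(1:50, function(i) { sample(1:27000, size = 1000) })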
When using function opt_bbfitp
, the parameter updates are stored as "mcmc"
objects. In this case the traceplots can be visualized using plot.bamlss
.
For function opt_bbfit()
a list containing the following objects:
fitted.values |
A named list of the fitted values of the modeled parameters of the selected distribution. |
parameters |
The estimated set regression coefficients and smoothing variances. |
shuffle |
Logical |
runtime |
The runtime of the algorithm. |
## Not run: 
## Simulate data.
set.seed(123)
d <- GAMart(n = 27000, sd = -1)

## Write data to disc.
tf <- tempdir()
write.table(d, file.path(tf, "d.raw"), quote = FALSE,
  row.names = FALSE, sep = ",")

## Model formula.
f <- list(
  y ~ s(x1,k=40) + s(x2,k=40) + s(x3,k=40) + te(lon,lat,k=10),
  sigma ~ s(x1,k=40) + s(x2,k=40) + s(x3,k=40) + te(lon,lat,k=10)
)

## Specify 50 batches with 1000 observations.
batch_ids <- c("nobs" = 1000, "nbatch" = 50)

## Note, can also be a list of indices, e.g.
## batch_ids <- lapply(1:50, function(i) { sample(1:nrow(d), size = 1000) })

## Different flavors:

## (1) Using "out-of-sample" aic for smoothing
##     variance estimation. An update is only accepted
##     if the "out-of-sample" log-likelihood is
##     increased. If data is a filepath, the data set is
##     read into R using package ff and model and
##     design matrices are processed with ff. This may
##     take some time depending on the size of the data.
set.seed(1)
b1 <- bamlss(f, data = file.path(tf, "d.raw"),
  sampler = FALSE, optimizer = opt_bbfit,
  batch_ids = batch_ids, nu = 0.1, aic = TRUE,
  eps_loglik = -Inf, always = FALSE)

## Plot estimated effects.
## plot(b1)

## Plot coefficient paths for x3 in mu.
## pathplot(b1, name = "mu.s.s(x3).b")

## (2) Same but always update, this mimics the classic SGD.
##     Note, for prediction only the last iteration is
##     used in this case. To use more iterations use opt_bbfitp(),
##     then iterations are stored as "mcmc" objects and we can
##     predict using the burnin argument, e.g.,
##     p <- predict(b2, model = "mu", burnin = 35)
set.seed(2)
b2 <- bamlss(f, data = file.path(tf, "d.raw"),
  sampler = FALSE, optimizer = opt_bbfit,
  batch_ids = batch_ids, nu = 0.1, aic = TRUE,
  eps_loglik = -Inf, always = TRUE)

## Plot coefficient paths for x3 in mu.
## pathplot(b2, name = "mu.s.s(x3).b")

## (3) Boosting type flavor, only update the model term with
##     the largest contribution in the "out-of-sample"
##     log-likelihood. In this case, if edf = 0 during
##     runtime of the algorithm, no model term has an additional
##     contribution and the algorithm converges. This
##     behavior is controlled by argument eps_loglik, the
##     higher eps_loglik, the more restrictive is the
##     updating step.

## Initialize intercepts.
set.seed(0)
batch_ids <- lapply(1:400, function(i) { sample(1:nrow(d), size = 1000) })
b0 <- bamlss(y ~ 1, data = d, sampler = FALSE,
  optimizer = opt_bbfitp, batch_ids = batch_ids)

## Compute starting values, remove the first
## 200 iterates (burnin) and compute the mean
## of the remaining iterates.
start <- coef(b0, FUN = mean, burnin = 200)

## Start boosting, only update if the change in the
## "out-of-sample" log-likelihood is at least
## eps_loglik = 0.001.
b3 <- bamlss(f, data = d, sampler = FALSE,
  optimizer = opt_bbfit, batch_ids = batch_ids,
  nu = 0.1, aic = TRUE, eps_loglik = 0.001,
  select = TRUE, always = FALSE, start = start)

## Plot log-likelihood contributions.
## contribplot(b3)

## In this case, the algorithm did not converge,
## we need more iterations/batches.
## Note, prediction uses the last iterate.
p3 <- predict(b3, model = "mu")

## (4) Use slice sampling under the "out-of-sample"
##     log-likelihood for estimation of smoothing
##     variances. In this case model terms are always
##     updated and parameter paths behave like an MCMC
##     chain. Therefore, use opt_bbfitp(), which stores
##     parameter paths as "mcmc" objects that we can
##     inspect using traceplots. Note nu = 1 if
##     slice = TRUE.
set.seed(4)
b4 <- bamlss(f, data = d, sampler = FALSE,
  optimizer = opt_bbfitp, batch_ids = batch_ids,
  aic = TRUE, slice = TRUE)
## plot(b4)

## Plot parameter updates/samples.
## plot(b4, which = "samples")

## Predict with burnin and compute mean
## prediction of the last 20 iterates.
p4 <- predict(b4, model = "mu", burnin = 30, FUN = mean)
## End(Not run)
This optimizer function is a generic tool for fitting BAMLSS using a backfitting algorithm. The backfitting procedure is based on iteratively weighted least squares (IWLS) for finding posterior mode estimates, however, the updating methods for model terms can be more general, see the details section. In addition, the default IWLS updating scheme implements optimum smoothing variance selection based on information criteria using a stepwise approach.
## Optimizer functions:
opt_bfit(x, y, family, start = NULL, weights = NULL, offset = NULL,
  update = "iwls", criterion = c("AICc", "BIC", "AIC"),
  eps = .Machine$double.eps^0.25, maxit = 400,
  outer = NULL, inner = FALSE, mgcv = FALSE,
  verbose = TRUE, digits = 4, flush = TRUE,
  nu = TRUE, stop.nu = NULL, ...)

bfit(x, y, family, start = NULL, weights = NULL, offset = NULL,
  update = "iwls", criterion = c("AICc", "BIC", "AIC"),
  eps = .Machine$double.eps^0.25, maxit = 400,
  outer = NULL, inner = FALSE, mgcv = FALSE,
  verbose = TRUE, digits = 4, flush = TRUE,
  nu = TRUE, stop.nu = NULL, ...)

## Model term updating functions:
bfit_iwls(x, family, y, eta, id, weights, criterion, ...)
bfit_iwls_Matrix(x, family, y, eta, id, weights, criterion, ...)
bfit_lm(x, family, y, eta, id, weights, criterion, ...)
bfit_optim(x, family, y, eta, id, weights, criterion, ...)
bfit_glmnet(x, family, y, eta, id, weights, criterion, ...)
x |
For function |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
update |
Sets the updating function for model terms, e.g. for a term |
criterion |
Set the information criterion that should be used, e.g., for smoothing
variance selection. Options are the corrected AIC |
eps |
The relative convergence tolerance of the backfitting algorithm. |
maxit |
The maximum number of iterations for the backfitting algorithm |
outer |
Should the current working observations and weights be computed in one outer iteration, otherwise the working observations are computed anew for each model term updating step. The default will run one outer iteration first, afterwards model weights are computed for each term anew. |
inner |
Should the model terms for one parameter of the modeled distribution be fully updated until convergence in an inner iteration, i.e., the algorithm waits until coefficients for the current distribution parameter do not change anymore before updating the next parameter. |
mgcv |
Should the mgcv |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
flush |
use |
nu |
Logical, numeric or |
stop.nu |
Integer. Should step length reduction be stopped after |
eta |
The current value of the predictors, provided as a named list, one list entry
for each parameter. The names correspond to the parameter names in the family object,
see |
id |
Character, the name of the current parameter for which the model term should be updated. |
... |
For function |
This algorithm is based on iteratively weighted least squares (IWLS) for BAMLSS, i.e., a Newton-Raphson or Fisher scoring algorithm is applied, similar to Rigby and Stasinopoulos (2005). The algorithm utilizes the chain rule for computing derivatives of the log-posterior w.r.t. regression coefficients, therefore, to compute the working observations and weights only the derivatives of the log-likelihood w.r.t. the predictors are required.
It is assumed that the provided family
object holds functions for computing the first
and second order derivatives of the log-likelihood w.r.t. the predictors. These functions
are provided within the named lists "score"
and "hess"
within the family
object. See the documentation of family.bamlss
and the code of the provided
families, e.g. gaussian_bamlss
, for examples of the required structure.
The algorithm either updates each model term over all distributional parameters sequentially,
or does a full update until convergence for model terms for one distributional parameter before
updating the next parameter, see argument inner
. Additionally, working observations and
weights can be computed only once in an outer
iteration.
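A minimal sketch (untested) of how these two control arguments can be passed through bamlss() to opt_bfit(); the data set d is assumed to be generated with GAMart() as in the examples below:

## Fully update all terms of one parameter before moving to the next (inner),
## and compute working observations/weights once per outer iteration.
b <- bamlss(num ~ s(x1) + s(x2), data = d, family = "gaussian",
  sampler = FALSE, optimizer = opt_bfit, inner = TRUE, outer = TRUE)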
Starting values of regression coefficients and smoothing variances can be supplied, moreover,
if a family object holds functions for initializing the distributional parameters, see also
family.bamlss
, starting values are based on the initialize functions.
The default updating function for model terms is based on IWLS, which is assigned by function
bamlss.engine.setup
, however, special updating functions can be used.
This is achieved by providing an updating function to argument
update
, which should be used for all model terms. Another option is to set the updating
function within the xt
argument of the mgcv smooth term constructor functions, see
e.g. function s
. If the xt
list then holds an element named "update"
,
which is a valid updating function, this updating function is used for the corresponding model
term. This way it is possible to call different (special) updating functions for specific terms,
e.g., that do not fit in the IWLS scheme. See the examples below. Note that this does not work if
mgcv = TRUE
, since the gam
function assumes a strict linear
representation of smooth terms.
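A minimal sketch (untested) of a term-specific updating function supplied via the xt argument; here bfit_lm is used for the s(x2) term only, while s(x1) keeps the default IWLS update (the data set d is again assumed to come from GAMart()):

b <- bamlss(num ~ s(x1) + s(x2, xt = list("update" = bfit_lm)),
  data = d, family = "gaussian", sampler = FALSE)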
A model term updating function has the following arguments:
update(x, family, y, eta, id, weights, criterion, ...)
Here x
is an object as returned from function smooth.construct
or smoothCon
. The x
object is preprocessed by function
bamlss.engine.setup
, i.e., an element called "state"
is assigned. The state
element represents the current state of the model term holding the current values of the
parameters with corresponding fitted values, as well as equivalent degrees of freedom, see
also the values that are returned by such functions below. The backfitting algorithm uses the
state of a model term for generating updates of the parameters. Note that for special model
terms the state list should already be provided within the call to the corresponding
smooth constructor function, see the growth curve example below.
In addition, for special model terms the fitted values may not be computed by a linear combination
of the design matrix and the coefficients. Therefore, the x
object should hold an element
named "fit.fun"
which is a function for computing the fitted values.
See also smooth.construct.bamlss.frame
and predict.bamlss
that use
this setup. The arguments of fitting functions are
fit.fun(X, b, ...)
where X
is the design matrix and b
is the vector of coefficients. Hence, for
usual IWLS updating the fitted values are computed by X %*% b
. For special terms like
nonlinear growth curves this may not be the case, see the example below. The fitting functions
are assigned by bamlss.engine.setup
, unless the function is already provided
after calling the constructor function smooth.construct
or
smoothCon
. Note that the dots argument is usually not needed by the user.
The default updating function is bfit_iwls()
. Function bfit_iwls_Matrix()
uses the
sparse matrix infrastructures of package Matrix. The Matrix package and
bfit_iwls_Matrix()
are used for model terms where
the maximum number of non-zero entries in the design matrix is less than half of the total number
of columns, if an additional argument force.Matrix
is set to TRUE
in the
opt_bfit()
call.
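A minimal sketch (untested) showing how force.Matrix can be passed through bamlss() so that the sparse updater bfit_iwls_Matrix() is used where applicable:

b <- bamlss(num ~ s(x1) + s(x2), data = d, family = "gaussian",
  sampler = FALSE, optimizer = opt_bfit, force.Matrix = TRUE)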
The IWLS updating functions find optimum smoothing variances according to an information criterion using a stepwise approach, i.e., in each iteration and for each model term update the updating functions try to find a better smoothing variance to control the trade-off between over-smoothing and nonlinear functional estimation. The search interval is centered around the current state of the smoothing variances, hence, in each iteration only a slight improvement is achieved. This algorithm is based on Belitz and Lang (2008) and can also be viewed as a boosting approach for optimization.
For function opt_bfit()
a list containing the following objects:
fitted.values |
A named list of the fitted values of the modeled parameters of the selected distribution. |
parameters |
The estimated set regression coefficients and smoothing variances. |
edf |
The equivalent degrees of freedom used to fit the model. |
logLik |
The value of the log-likelihood. |
logPost |
The value of the log-posterior. |
IC |
The value of the information criterion. |
converged |
Logical, indicating convergence of the backfitting algorithm. |
For updating functions, a list providing the current state of the model term:
fitted.values |
The resulting fitted values after updating. |
parameters |
The resulting named numeric vector of updated model term parameters.
Coefficients should be named with |
edf |
The equivalent degrees of freedom used to produce the fitted values. |
hessian |
Optional, the coefficient Hessian information |
log.prior |
Optional, the value of the log-prior of the model term. |
Belitz C, Lang S (2008). Simultaneous Selection of Variables and Smoothing Parameters in Structured Additive Regression Models. Computational Statistics & Data Analysis, 53, pp 61-81.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612-627. doi:10.1080/10618600.2017.1407325
Rigby RA, Stasinopoulos DM (2005). Generalized Additive Models for Location, Scale and Shape (with Discussion). Applied Statistics, 54(3), 507-554.
bamlss, bamlss.frame, bamlss.engine.setup, set.starting.values, s2.
## Not run: 
## Simulated data example illustrating
## how to call the optimizer function.
## This is done internally within
## the setup of function bamlss().
d <- GAMart(n = 200)
f <- num ~ s(x1) + s(x2) + s(x3)
bf <- bamlss.frame(f, data = d, family = "gaussian")
opt <- with(bf, opt_bfit(x, y, family))
print(str(opt))

## Same with bamlss().
b <- bamlss(f, data = d, family = "gaussian", sampler = FALSE)
plot(b)
summary(b)

## Use of different updating function.
b <- bamlss(f, data = d, family = "gaussian",
  sampler = FALSE, update = bfit_lm)
plot(b)

## Use mgcv gam() function for updating.
b <- bamlss(f, data = d, family = "gaussian",
  sampler = FALSE, mgcv = TRUE)
plot(b)

## Special smooth constructor including updating/sampler
## function for nonlinear Gompertz curves.
## Note: element special.npar is needed here since this
## function has 3 parameters but the design matrix only
## one column!
smooth.construct.gc.smooth.spec <- function(object, data, knots)
{
  object$X <- matrix(as.numeric(data[[object$term]]), ncol = 1)
  center <- if(!is.null(object$xt$center)) {
    object$xt$center
  } else TRUE
  object$by.done <- TRUE
  if(object$by != "NA")
    stop("by variables not supported!")
  object$fit.fun <- function(X, b, ...) {
    f <- b[1] * exp(-b[2] * exp(-b[3] * drop(X)))
    if(center)
      f <- f - mean(f)
    f
  }
  object$update <- bfit_optim
  object$propose <- GMCMC_slice
  object$prior <- function(b) { sum(dnorm(b, sd = 1000, log = TRUE)) }
  object$fixed <- TRUE
  object$state$parameters <- c("b1" = 0, "b2" = 0.5, "b3" = 0.1)
  object$state$fitted.values <- rep(0, length(object$X))
  object$state$edf <- 3
  object$special.npar <- 3  ## Important!
  class(object) <- c("gc.smooth", "no.mgcv", "special")
  object
}

## Work around for the "prediction matrix" of a growth curve.
Predict.matrix.gc.smooth <- function(object, data, knots)
{
  X <- matrix(as.numeric(data[[object$term]]), ncol = 1)
  X
}

## Heteroscedastic growth curve data example.
set.seed(111)
d <- data.frame("time" = 1:30)
d$y <- 2 + 1 / (1 + exp(0.5 * (15 - d$time))) +
  rnorm(30, sd = exp(-3 + 2 * cos(d$time/30 * 6 - 3)))

## Special model terms must be called with s2()!
f <- list(
  y ~ s2(time, bs = "gc"),
  sigma ~ s(time)
)

## Fit model with special model term.
b <- bamlss(f, data = d,
  optimizer = opt_bfit, sampler = sam_GMCMC)

## Plot the fitted curves.
plot(b)

## Predict with special model term.
nd <- data.frame("time" = seq(1, 30, length = 100))
p <- predict(b, newdata = nd, model = "mu", FUN = c95)
plot(d, ylim = range(c(d$y, p)))
matplot(nd$time, p, type = "l",
  lty = c(2, 1, 2), col = "black", add = TRUE)
## End(Not run)
Optimizer functions for gradient and likelihood boosting with bamlss
. In each
boosting iteration the function selects the model term with the largest contribution to the
log-likelihood, AIC or BIC.
## Gradient boosting optimizer.
opt_boost(x, y, family, weights = NULL, offset = NULL,
  nu = 0.1, nu.adapt = TRUE, df = 4, maxit = 400, mstop = NULL,
  maxq = NULL, qsel.splitfactor = FALSE,
  verbose = TRUE, digits = 4, flush = TRUE,
  eps = .Machine$double.eps^0.25,
  nback = NULL, plot = TRUE, initialize = TRUE,
  stop.criterion = NULL, select.type = 1,
  force.stop = TRUE, hatmatrix = !is.null(stop.criterion),
  reverse.edf = FALSE, approx.edf = FALSE,
  always = FALSE, ...)

boost(x, y, family, weights = NULL, offset = NULL,
  nu = 0.1, nu.adapt = TRUE, df = 4, maxit = 400, mstop = NULL,
  maxq = NULL, qsel.splitfactor = FALSE,
  verbose = TRUE, digits = 4, flush = TRUE,
  eps = .Machine$double.eps^0.25,
  nback = NULL, plot = TRUE, initialize = TRUE,
  stop.criterion = NULL, select.type = 1,
  force.stop = TRUE, hatmatrix = !is.null(stop.criterion),
  reverse.edf = FALSE, approx.edf = FALSE,
  always = FALSE, ...)

## Modified likelihood based boosting.
opt_boostm(x, y, family, offset = NULL,
  nu = 0.1, df = 3, maxit = 400, mstop = NULL,
  verbose = TRUE, digits = 4, flush = TRUE,
  eps = .Machine$double.eps^0.25, plot = TRUE,
  initialize = TRUE, stop.criterion = "BIC",
  force.stop = !is.null(stop.criterion),
  do.optim = TRUE, always = FALSE, ...)

boostm(x, y, family, offset = NULL,
  nu = 0.1, df = 3, maxit = 400, mstop = NULL,
  verbose = TRUE, digits = 4, flush = TRUE,
  eps = .Machine$double.eps^0.25, plot = TRUE,
  initialize = TRUE, stop.criterion = "BIC",
  force.stop = !is.null(stop.criterion),
  do.optim = TRUE, always = FALSE, ...)

## Boosting summary extractor.
boost_summary(object, ...)

## Plot all boosting paths.
boost_plot(x, which = c("loglik", "loglik.contrib", "parameters",
  "aic", "bic", "user"), intercept = TRUE, spar = TRUE,
  mstop = NULL, name = NULL, drop = NULL, labels = NULL,
  color = NULL, ...)

## Boosting summary printing and plotting.
## S3 method for class 'boost_summary'
print(x, summary = TRUE, plot = TRUE,
  which = c("loglik", "loglik.contrib"),
  intercept = TRUE, spar = TRUE, ...)

## S3 method for class 'boost_summary'
plot(x, ...)

## Model frame for out-of-sample selection.
boost_frame(formula, train, test, family = "gaussian", ...)
x |
For function |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
nu |
Numeric, between [0, 1], controls the step size, i.e., the amount that should be added to model term parameters. |
nu.adapt |
Logical. If set to TRUE (default) step size |
df |
Integer, defines the initial degrees of freedom that should be assigned
to each smooth model term. May also be a named vector, the names must match the model term
labels, e.g., as provided in |
maxit |
Integer, the maximum number of boosting iterations. |
mstop |
For convenience, overwrites |
maxq |
Integer, defines the maximum number of selected base-learners. The algorithm stops if this number is exceeded. |
qsel.splitfactor |
Logical, if set to |
name |
Character, the name of the coefficient (group) that should be plotted. Note that
the string provided in |
drop |
Character, the name of the coefficient (group) that should not be plotted. |
labels |
A character string of labels that should be used on the 4 axis. |
color |
Colors or color function that creates colors for the (group) paths. |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
flush |
use |
eps |
The tolerance used as stopping mechanism, see argument |
nback |
Integer. If |
plot |
Should the boosting summary be printed and plotted? |
initialize |
Logical, should intercepts be initialized? |
stop.criterion |
Character, selects the information criterion that should be used
to determine the optimum number of boosting iterations. Either |
select.type |
Should model terms be selected by the log-likelihood contribution,
|
force.stop |
Logical, should the algorithm stop if the information criterion increases? |
do.optim |
Logical. Should smoothing parameters be optimized in each boosting iteration? |
hatmatrix |
Logical, if set to |
reverse.edf |
Logical. Instead of computing degrees of freedom with hat-matrices, the actual smoothing parameters are reverse engineered to compute the corresponding actual smoother matrix. Note that this option is still experimental. |
approx.edf |
Logical. Another experimental and fast approximation of the degrees of freedom. |
always |
Logical or character. Should the intercepts be forced to be updated in each boosting iteration?
If |
object |
A |
summary |
Should the summary be printed? |
which |
Which of the three provided plots should be created? |
intercept |
Should the coefficient paths of intercepts be dropped in the plot? |
spar |
Should graphical parameters be set with |
formula |
See |
train , test
|
Data frames used for training and testing the model. |
... |
For function |
For function boost_summary()
a list containing information on selection frequencies etc.
For function opt_boost()
and opt_boostm()
a list containing the following objects:
fitted.values |
A named list of the fitted values based on the last boosting iteration of the modeled parameters of the selected distribution. |
parameters |
A matrix, each row corresponds to the parameter values of one boosting iteration. |
boost_summary |
The boosting summary which can be printed and plotted. |
The function does not take care of variable scaling for the linear parts! This must be done by the
user, e.g., one option is to use argument scale.d in function bamlss.frame, which uses scale.
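A minimal sketch (untested, with data from GAMart() as in the examples below) that lets the bamlss.frame() scale the linear parts via scale.d:

b <- bamlss(num ~ x1 + x2 + s(x3), data = d,
  optimizer = opt_boost, sampler = FALSE, scale.d = TRUE)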
Function opt_boost()
does not select the optimum stopping iteration! The modified likelihood
based algorithm implemented in function opt_boostm()
is still experimental!
## Not run: 
## Simulate data.
set.seed(123)
d <- GAMart()

## Estimate model.
f <- num ~ x1 + x2 + x3 + lon + lat +
  s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat)

b <- bamlss(f, data = d, optimizer = opt_boost,
  sampler = FALSE, scale.d = TRUE, nu = 0.01,
  maxit = 1000, plot = FALSE)

## Plot estimated effects.
## plot(b)

## Print and plot the boosting summary.
boost_summary(b, plot = FALSE)
## boost_plot(b, which = 1)
## boost_plot(b, which = 2)
## boost_plot(b, which = 3, name = "mu.s.te(lon,lat).")

## Extract estimated parameters for certain
## boosting iterations.
parameters(b, mstop = 1)
parameters(b, mstop = 100)

## Also works with predict().
head(do.call("cbind", predict(b, mstop = 1)))
head(do.call("cbind", predict(b, mstop = 100)))

## Another example using the modified likelihood
## boosting algorithm.
f <- list(
  num ~ x1 + x2 + x3 + lon + lat +
    s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat),
  sigma ~ x1 + x2 + x3 + lon + lat +
    s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat)
)

b <- bamlss(f, data = d, optimizer = opt_boostm,
  sampler = FALSE, scale.d = TRUE, nu = 0.05,
  maxit = 400, stop.criterion = "AIC", force.stop = FALSE)

## Plot estimated effects.
## plot(b)

## Plot AIC and log-lik contributions.
## boost_plot(b, "AIC")
## boost_plot(b, "loglik.contrib")

## Out-of-sample selection of model terms.
set.seed(123)
d <- GAMart(n = 5000)

## Split data into training and testing.
i <- sample(1:2, size = nrow(d), replace = TRUE)
dtest <- subset(d, i == 1)
dtrain <- subset(d, i == 2)

## Model formula.
f <- list(
  num ~ s(x1) + s(x2) + s(x3),
  sigma ~ s(x1) + s(x2) + s(x3)
)

## Create model frame for out-of-sample selection.
sm <- boost_frame(f, train = dtrain, test = dtest, family = "gaussian")

## Out-of-sample selection function.
sfun <- function(parameters) {
  sm$parameters <- parameters
  p <- predict(sm, type = "parameter")
  -1 * sum(sm$family$d(dtest$num, p, log = TRUE))
}

## Start boosting with out-of-sample negative
## log-likelihood selection of model terms.
b <- bamlss(f, data = dtrain, sampler = FALSE, optimizer = opt_boost,
  selectfun = sfun, always = "best")

## Plot curve of negative out-of-sample log-likelihood.
## boost_plot(b, which = "user")
## End(Not run)
This function computes posterior mode estimates of the parameters of a flexible Cox model
with structured additive predictors using a Newton-Raphson algorithm. Integrals are solved
numerically. Moreover, optimum smoothing variances are computed using a stepwise optimization,
see also the details section of function bfit
.
opt_Cox(x, y, start, weights, offset, criterion = c("AICc", "BIC", "AIC"),
  nu = 0.1, update.nu = TRUE, eps = .Machine$double.eps^0.25,
  maxit = 400, verbose = TRUE, digits = 4, ...)

cox_mode(x, y, start, weights, offset, criterion = c("AICc", "BIC", "AIC"),
  nu = 0.1, update.nu = TRUE, eps = .Machine$double.eps^0.25,
  maxit = 400, verbose = TRUE, digits = 4, ...)
x |
The |
y |
The model response, as returned from function |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
criterion |
Set the information criterion that should be used, e.g., for smoothing
variance selection. Options are the corrected AIC |
nu |
Calibrates the step length of parameter updates of one Newton-Raphson update. |
update.nu |
Should the updating step length be optimized in each iteration of the backfitting algorithm. |
eps |
The relative convergence tolerance of the backfitting algorithm. |
maxit |
The maximum number of iterations for the backfitting algorithm |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
... |
Currently not used. |
A list containing the following objects:
fitted.values |
A named list of the fitted values of the modeled parameters of the selected distribution. |
parameters |
The estimated set regression coefficients and smoothing variances. |
edf |
The equivalent degrees of freedom used to fit the model. |
logLik |
The value of the log-likelihood. |
logPost |
The value of the log-posterior. |
hessian |
The Hessian matrix evaluated at the posterior mode. |
converged |
Logical, indicating convergence of the backfitting algorithm. |
time |
The runtime of the algorithm. |
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612-627. doi:10.1080/10618600.2017.1407325
sam_Cox, cox_bamlss, surv_transform, simSurv, bamlss.
## Please see the examples of function sam_Cox()!
This optimizer performs an implicit stochastic gradient descent
algorithm. It is mainly used within a bamlss
call.
opt_isgd(x, y, family, weights = NULL, offset = NULL, gammaFun = function(i) 1/(1 + i), shuffle = TRUE, CFun = function(beta) diag(length(beta)), start = NULL, i.state = 0)
x |
For function |
y |
The model response, as returned from function
|
family |
A bamlss family object, see
|
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
gammaFun |
Function specifying the step length. |
shuffle |
Should the data be shuffled? |
CFun |
Hessian approximating function. |
start |
Vector of starting values. |
i.state |
Added to |
tpf
For function opt_isgd()
a list containing the following objects:
fitted.values |
A named list of the fitted values based on the last iteration of the modeled parameters of the selected distribution. |
parameters |
A matrix, each row corresponds to the parameter values of one iteration. |
sgd.summary |
The summary of the stochastic gradient descent algorithm which can be printed and plotted. |
CAUTION: Arguments weights
and offset
are not implemented yet!
Motivated by the lecture 'Regression modelling with large data sets' given by Ioannis Kosmidis in Innsbruck, January 2017.
Thorsten Simon
Toulis, P and Airoldi, EM (2015): Scalable estimation strategies based on stochastic approximations: Classical results and new insights. Statistics and Computing, 25, no. 4, 781–795. doi: 10.1007/s11222-015-9560-y
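The step-length function gammaFun() maps the running observation index i to a step size. A small sketch of an alternative decay schedule (taken from the comment in the example below); the particular constants are illustrative:

gammaFun <- function(i) 0.3 / sqrt(1 + i) + 0.001
gammaFun(c(1, 10, 100, 1000))   ## decreasing step sizes

## Pass it on via, e.g., bamlss(..., optimizer = opt_isgd, gammaFun = gammaFun).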
## Not run: 
set.seed(111)
d <- GAMart(n = 10000)

f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)

b <- bamlss(f, data = d, optimizer = opt_isgd, sampler = FALSE)
plot(b, ask = FALSE)

## Loop over observations a 2nd time.
b <- bamlss(f, data = d, optimizer = opt_isgd, sampler = FALSE,
  start = parameters(b),
  i.state = b$model.stats$optimizer$sgd.summary$i.state)
plot(b, ask = FALSE)

## Try different gammaFuns, e.g.,
# gammaFun <- function(i) .3/sqrt((1+i)) + 0.001

## Testing some families.
f2 <- bin ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b2 <- bamlss(f2, data = d, optimizer = opt_isgd, sampler = FALSE,
  family = "binomial")

f3 <- cens ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b3 <- bamlss(f3, data = d, optimizer = opt_isgd, sampler = FALSE,
  family = "cnorm")
## End(Not run)
The function either sets up a list of all parameters of a bamlss.frame
, which
can be used for setting up models, or extracts the estimated parameters of a bamlss
object.
parameters(x, model = NULL, start = NULL, fill = c(0, 1e-04), list = FALSE, simple.list = FALSE, extract = FALSE, ...)
x |
A |
model |
The model name for which parameters should be initialized or extracted. |
start |
A named numeric vector which should be used when creating the parameter list.
See also function |
fill |
Numeric, when setting up a parameter list, the values the should be used for regression
coefficients (first element of |
list |
Should the function return a list of all parameters? |
simple.list |
Should the names of parameter vectors be dropped? |
extract |
Should parameters of a |
... |
Currently not used. |
Parameters for BAMLSS are used for optimizer functions in function bamlss
.
The function is useful for initializing all parameters given a bamlss.frame
(which is done internally in function bamlss
), but also for extracting all
estimated parameters of some optimizer.
The naming convention of the parameter list is used by a couple of functions in this package.
For each parameter of the modeled distribution, e.g., gaussian_bamlss
has
parameters "mu"
and "sigma"
, a list element is created. These elements then contain
the list of all model term parameters. Parametric model terms are indicated with "p"
and
smooth model terms with "s"
. If the design matrix of a model term in the x
list
of a bamlss.frame
does not contain any columns names, then the parameters are named
with a leading "b"
, otherwise the column names of the design matrix are used. Smoothing
variances parameter vectors are named with a leading "tau2"
.
The naming convention is useful when setting up new model fitting engines for bamlss
and is used, e.g., by bfit
and GMCMC
, which are based on parameter
state list objects as provided by function bamlss.engine.setup
.
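To make the naming convention concrete, a short sketch (untested, reusing the bamlss.frame from the example below) that inspects the generated names; the list structure shown follows the description above:

d <- GAMart()
bf <- bamlss.frame(num ~ s(x1) + te(lon,lat), data = d)
p <- parameters(bf, list = TRUE)
names(p)       ## one element per distribution parameter, e.g., "mu" and "sigma"
names(p$mu)    ## "p" for parametric and "s" for smooth model term parameters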
A named list of all parameters of a bamlss.frame or bamlss object.
bamlss.frame, bamlss, opt_bfit, sam_GMCMC, get.par, set.par.
## Create a "bamlss.frame".
set.seed(123)
d <- GAMart()
bf <- bamlss.frame(num ~ s(x1) + te(lon,lat), data = d)

## Create list of all parameters from "bamlss.frame".
p <- parameters(bf, list = TRUE)
str(p)

## Not run: 
## Estimate model.
f <- list(num ~ s(x1) + te(lon,lat), sigma ~ s(x1))
b <- bamlss(f, data = d, sampler = FALSE)

## Extract estimated parameters.
parameters(b)
parameters(b, list = TRUE)
## End(Not run)
This is a simple wrapper function to plot coefficient paths obtained from
the boosting optimizer function boost
and the LASSO optimizer
lasso
.
pathplot(object, ...)
object |
An object of class |
... |
Arguments passed to |
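A minimal usage sketch (untested), assuming a boosted fit as in the opt_boost() examples; the coefficient name passed to name is illustrative:

## Not run: 
d <- GAMart()
b <- bamlss(num ~ s(x1) + s(x2), data = d,
  optimizer = opt_boost, sampler = FALSE, scale.d = TRUE)

## Plot all coefficient paths and a single group of paths.
pathplot(b)
pathplot(b, name = "mu.s.s(x1).")
## End(Not run)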
Plotting methods for objects of class "bamlss"
and "bamlss.results"
, which can be
used for producing effect plots of model terms, trace plots of samples or residual plots.
Note that effect plots of model terms with more than two covariates are not supported, for this
purpose use function predict.bamlss
.
## S3 method for class 'bamlss'
plot(x, model = NULL, term = NULL, which = "effects",
  parameters = FALSE, ask = dev.interactive(),
  spar = TRUE, ...)

## S3 method for class 'bamlss.results'
plot(x, model = NULL, term = NULL, ask = dev.interactive(),
  scale = 1, spar = TRUE, ...)
x |
An object of class |
model |
Character or integer. For which model should the plots be created? |
term |
Character or integer. For which model term should a plot be created? |
which |
Character or integer, selects the type of plot: |
parameters |
For trace plots of parameters, should corresponding parameter values as returned
from an optimizer function (e.g., |
ask |
For multiple plots, the user is asked to show the next plot. |
scale |
If set to |
spar |
Should graphical parameters be set? |
... |
Arguments to be passed to |
bamlss, results.bamlss.default, residuals.bamlss.
## Not run: 
## Generate some data.
d <- GAMart()

## Model formula.
f <- list(
  num ~ s(x1) + s(x2) + s(x3) + te(lon,lat),
  sigma ~ s(x2) + te(lon,lat)
)

## Estimate model.
b <- bamlss(f, data = d)

## Effect plots.
plot(b, ask = FALSE)
plot(b, model = "mu")
plot(b, model = "sigma", term = "te(lon,lat)")

## Trace plots.
plot(b, which = "samples")

## Residual plots.
plot(b, which = 3:4)
## End(Not run)
Function to plot simple 2D graphics for univariate effects/functions.
plot2d(x, residuals = FALSE, rug = FALSE, jitter = TRUE,
  col.residuals = NULL, col.lines = NULL, col.polygons = NULL,
  col.rug = NULL, c.select = NULL, fill.select = NULL,
  data = NULL, sep = "", month = NULL, year = NULL, step = 12,
  shift = NULL, trans = NULL, scheme = 2, s2.col = NULL,
  grid = 50, ...)
x |
A matrix or data frame, containing the covariate for which the effect should be plotted
in the first column and at least a second column containing the effect. Another possibility is
to specify the plot via a |
residuals |
If set to |
rug |
Add a |
jitter |
|
col.residuals |
The color of the partial residuals. |
col.lines |
The color of the lines. |
col.polygons |
Specify the background color of polygons, if |
col.rug |
Specify the color of the rug representation. |
c.select |
Integer vector of maximum length of columns of |
fill.select |
Integer vector, select pairwise the columns of the resulting data matrix
that should form one polygon with a certain background color specified in argument |
data |
If |
sep |
The field separator character when |
month , year , step
|
Provide specific annotation for plotting estimation results for temporal
variables. |
shift |
Numeric constant to be added to the smooth before plotting. |
trans |
Function to be applied to the smooth before plotting, e.g., to transform the plot to the response scale. |
scheme |
Sets the plotting scheme for polygons, possible values are |
s2.col |
The color for the second plotting scheme. |
grid |
Integer, specifies the number of polygons for the second plotting scheme. |
... |
Other graphical parameters, please see the details. |
For 2D plots the following graphical parameters may be specified additionally:
cex: Specify the size of partial residuals,
lty: The line type for each column that is plotted, e.g. lty = c(1, 2),
lwd: The line width for each column that is plotted, e.g. lwd = c(1, 2),
poly.lty: The line type to be used for the polygons,
poly.lwd: The line width to be used for the polygons,
density, angle, border: See polygon,
...: Other graphical parameters, see function plot.
plot3d
, plotmap
,
plotblock
, sliceplot
.
## Generate some data. set.seed(111) n <- 500 ## Regressor. d <- data.frame(x = runif(n, -3, 3)) ## Response. d$y <- with(d, 10 + sin(x) + rnorm(n, sd = 0.6)) ## Not run: ## Estimate model. b <- bamlss(y ~ s(x), data = d) summary(b) ## Plot estimated effect. plot(b) plot(b, rug = FALSE) ## Extract fitted values. f <- fitted(b, model = "mu", term = "s(x)") f <- cbind(d["x"], f) ## Now use plot2d. plot2d(f) plot2d(f, fill.select = c(0, 1, 0, 1)) plot2d(f, fill.select = c(0, 1, 0, 1), lty = c(2, 1, 2)) plot2d(f, fill.select = c(0, 1, 0, 1), lty = c(2, 1, 2), scheme = 2) ## Variations. plot2d(sin(x) ~ x, data = d) d$f <- with(d, sin(d$x)) plot2d(f ~ x, data = d) d$f1 <- with(d, f + 0.1) d$f2 <- with(d, f - 0.1) plot2d(f1 + f2 ~ x, data = d) plot2d(f1 + f2 ~ x, data = d, fill.select = c(0, 1, 1), lty = 0) plot2d(f1 + f2 ~ x, data = d, fill.select = c(0, 1, 1), lty = 0, density = 20, poly.lty = 1, poly.lwd = 2) plot2d(f1 + f + f2 ~ x, data = d, fill.select = c(0, 1, 0, 1), lty = c(0, 1, 0), density = 20, poly.lty = 1, poly.lwd = 2) ## End(Not run)
Function to plot 3D graphics or image and/or contour plots for bivariate effects/functions.
plot3d(x, residuals = FALSE, col.surface = NULL, ncol = 99L, swap = FALSE, col.residuals = NULL, col.contour = NULL, c.select = NULL, grid = 30L, image = FALSE, contour = FALSE, legend = TRUE, cex.legend = 1, breaks = NULL, range = NULL, digits = 2L, d.persp = 1L, r.persp = sqrt(3), outscale = 0, data = NULL, sep = "", shift = NULL, trans = NULL, type = "mba", linear = FALSE, extrap = FALSE, k = 40, ...)
x |
A matrix or data frame, containing the covariates for which the effect should be plotted
in the first and second column and at least a third column containing the effect. Another
possibility is to specify the plot via a |
residuals |
If set to |
col.surface |
The color of the surface, may also be a function, e.g.
|
ncol |
the number of different colors that should be generated, if |
swap |
If set to |
col.residuals |
The color of the partial residuals, or if |
col.contour |
The color of the contour lines. |
c.select |
Integer vector of maximum length of columns of |
grid |
The grid size of the surface(s). |
image |
If set to |
contour |
If set to |
legend |
If |
cex.legend |
The expansion factor for the legend text, see |
breaks |
A set of breakpoints for the colors: must give one more breakpoint than
|
range |
Specifies a certain range values should be plotted for. |
digits |
Specifies the legend decimal places. |
d.persp |
See argument |
r.persp |
See argument |
outscale |
Scales the outer ranges of |
data |
If |
sep |
The field separator character when |
shift |
Numeric constant to be added to the smooth before plotting. |
trans |
Function to be applied to the smooth before plotting, e.g., to transform the plot to the response scale. |
type |
Character, which type of interpolation method should be used. The default is
|
linear |
Logical, should linear interpolation be used within function
|
extrap |
Logical, should interpolations be computed outside the observation area (i.e., extrapolated)? |
k |
Integer, the number of basis functions to be used to compute the interpolated surface
when |
... |
Parameters passed to |
For 3D plots the following graphical parameters may be specified additionally:
cex: Specify the size of partial residuals,
col: It is possible to specify the color of the surfaces if se > 0, e.g. col = c("green", "black", "red"),
pch: The plotting character of the partial residuals,
...: Other graphical parameters passed to functions persp, image.plot and contour.
Function plot3d can use the akima package to construct smooth interpolated surfaces; therefore, package akima needs to be installed. The akima package has an ACM license that restricts applications to non-commercial usage, see
https://www.acm.org/publications/policies/software-copyright-notice.
Function plot3d prints a note referring to the ACM license. This note can be suppressed by setting
options("use.akima" = TRUE).
colorlegend
, plot2d
, plotmap
,
plotblock
, sliceplot
.
## Generate some data. set.seed(111) n <- 500 ## Regressors. d <- data.frame(z = runif(n, -3, 3), w = runif(n, 0, 6)) ## Response. d$y <- with(d, 1.5 + cos(z) * sin(w) + rnorm(n, sd = 0.6)) ## Not run: ## Estimate model. b <- bamlss(y ~ s(z,w), data = d) summary(b) ## Plot estimated effect. plot(b, model = "mu", term = "s(z,w)") ## Extract fitted values. f <- fitted(b, model = "mu", term = "s(z,w)", intercept = FALSE) f <- cbind(d[, c("z", "w")], f) ## Now use plot3d(). plot3d(f) plot3d(f, swap = TRUE) plot3d(f, grid = 100, border = NA) ## Only works if columns are named with ## '2.5%' and '97.5%'. plot3d(f, c.select = 95, border = c("red", NA, "green"), col.surface = c(1, NA, 1), resid = TRUE, cex.resid = 0.2) ## Now some image and contour. # plot3d(f, image = TRUE, legend = FALSE) # plot3d(f, image = TRUE, legend = TRUE) # plot3d(f, image = TRUE, contour = TRUE) # plot3d(f, image = TRUE, contour = TRUE, swap = TRUE) # plot3d(f, image = TRUE, contour = TRUE, col.contour = "white") # plot3d(f, contour = TRUE) # plot3d(f, image = TRUE, contour = TRUE, c.select = 3) # plot3d(f, image = TRUE, contour = TRUE, c.select = "Mean") # plot3d(f, image = TRUE, contour = TRUE, c.select = "97.5%") ## End(Not run) ## Variations. d$f1 <- with(d, sin(z) * cos(w)) with(d, plot3d(cbind(z, w, f1))) ## Same with formula. plot3d(sin(z) * cos(w) ~ z + w, zlab = "f(z,w)", data = d) plot3d(sin(z) * cos(w) ~ z + w, zlab = "f(z,w)", data = d, ticktype = "detailed") ## Play with palettes. plot3d(sin(z) * cos(w) ~ z + w, col.surface = heat.colors, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = topo.colors, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = cm.colors, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = rainbow, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = terrain.colors, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = rainbow_hcl, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = diverge_hcl, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = sequential_hcl, data = d) plot3d(sin(z) * cos(w) ~ z + w, col.surface = rainbow_hcl(n = 99, c = 300, l = 80, start = 0, end = 100), data = d) # plot3d(sin(z) * cos(w) ~ z + w, # col.surface = rainbow_hcl(n = 99, c = 300, l = 80, start = 0, end = 100), # image = TRUE, grid = 200, data = d)
Function to plot effects for model terms including factor, or group variables for random effects.
plotblock(x, residuals = FALSE, range = c(0.3, 0.3), col.residuals = "black", col.lines = "black", c.select = NULL, fill.select = NULL , col.polygons = NULL, data = NULL, shift = NULL, trans = NULL, labels = NULL, ...)
x |
Either a |
residuals |
If set to |
range |
Numeric vector, specifying the left and right bound of the block. |
col.residuals |
The color of the partial residuals. |
col.lines |
Vector of maximum length of columns of |
c.select |
Integer vector of maximum length of columns of |
fill.select |
Integer vector, select pairwise the columns of the resulting data matrix
that should form one polygon with a certain background color specified in argument |
col.polygons |
Specify the background color for the upper and lower confidence bands, e.g.
|
data |
If |
shift |
Numeric constant to be added to the smooth before plotting. |
trans |
Function to be applied to the smooth before plotting, e.g., to transform the plot to the response scale. |
labels |
Character, labels for the factor levels. |
... |
Graphical parameters, please see the details. |
Function plotblock draws the effect of every factor or group level as a "block" in one graphic, similar to boxplots: the estimated fitted effects, e.g., quantiles of MCMC samples, are drawn as one block, where the upper lines represent upper quantiles, the middle line the mean or median, and the lower lines lower quantiles; also see the examples. The following graphical parameters may be supplied additionally:
cex: Specify the size of partial residuals,
lty: The line type for each column that is plotted, e.g. lty = c(1, 2),
lwd: The line width for each column that is plotted, e.g. lwd = c(1, 2),
poly.lty: The line type to be used for the polygons,
poly.lwd: The line width to be used for the polygons,
density, angle, border: See polygon,
...: Other graphical parameters, see function plot.
plot2d
, plot3d
,
plotmap
, sliceplot
.
## Generate some data. set.seed(111) n <- 500 ## Regressors. d <- data.frame(fac = factor(rep(1:10, n/10))) ## Response. d$y <- with(d, 1.5 + rnorm(10, sd = 0.6)[fac] + rnorm(n, sd = 0.6)) ## Not run: ## Estimate model. b <- bamlss(y ~ s(fac,bs="re"), data = d) summary(b) ## Plot random effects. plot(b) ## Extract fitted values. f <- fitted(b, model = "mu", term = "fac") f <- cbind(d["fac"], f) ## Now use plotblock. plotblock(f) ## Variations. plotblock(f, fill.select = c(0, 1, 0, 1), col.poly = "red") plotblock(f, fill.select = c(0, 1, 0, 1), col.poly = "lightgray", lty = c(2, 1, 2), lwd = c(2, 1, 2)) ## End(Not run) ## More examples. plotblock(y ~ fac, data = d, range = c(0.45, 0.45)) d <- data.frame(fac = factor(rep(1:10, n/10))) d$y <- with(d, c(2.67, 5, 6, 3, 4, 2, 6, 7, 9, 7.5)[fac]) plotblock(y ~ fac, data = d) plotblock(cbind(y - 0.1, y + 0.1) ~ fac, data = d)
The function takes a list of polygons and draws the corresponding map. Different colors for each polygon can be used.
plotmap(map, x = NA, id = NULL, select = NULL, legend = TRUE, names = FALSE, values = FALSE, ...)
map |
The map to be plotted, usually an object that inherits from
|
x |
A vector, data.frame or matrix. In the latter case |
id |
If argument |
select |
Select the column of the data in |
legend |
Should a color legend be added to the plot, see also function |
names |
If set to |
values |
If set to |
... |
Arguments to be passed to |
plot2d
, plot3d
,
sliceplot
, plotblock
.
## Example from mgcv ?mrf. ## Load Columbus Ohio crime data (see ?columbus for details and credits). data("columb", package = "mgcv") data("columb.polys", package = "mgcv") ## Plot the map. plotmap(columb.polys) ## Plot aggregated data. a <- with(columb, aggregate(crime, by = list("district" = district), FUN = mean)) plotmap(columb.polys, x = a$x, id = a$district) plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft") plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft", side.legend = 2) plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft", side.legend = 2, side.ticks = 2) plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft", side.legend = 2, side.ticks = 2, col = heat_hcl, swap = TRUE) plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft", side.legend = 2, side.ticks = 2, col = heat_hcl, swap = TRUE, range = c(10, 50)) plotmap(columb.polys, x = a$x, id = a$district, pos = "topleft", side.legend = 2, side.ticks = 2, col = heat_hcl(5), swap = TRUE, range = c(10, 50), lrange = c(0, 60))
Takes a fitted bamlss object and computes predictions. Predictions can be based on estimated parameters of optimizer functions or on samples returned from sampler functions. It is possible to compute predictions on multiple cores using the parallel package and to chunk predictions to save computation time and memory storage. Predictions can be computed for full distributional parameters or specific model terms. If a bamlss model was fitted on multiple cores, i.e., the samples are provided as a mcmc.list where each list entry represents samples from one core, function predict.bamlss() computes combined predictions based on the samples of all cores.
## S3 method for class 'bamlss' predict(object, newdata, model = NULL, term = NULL, match.names = TRUE, intercept = TRUE, type = c("link", "parameter"), FUN = function(x) { mean(x, na.rm = TRUE) }, trans = NULL, what = c("samples", "parameters"), nsamps = NULL, verbose = FALSE, drop = TRUE, cores = NULL, chunks = 1, ...)
object |
An object of class |
newdata |
A data frame or list containing the values of the model
covariates at which predictions are required. Note that depending
on argument |
model |
Character or integer, specifies the model for which predictions should be computed. |
term |
Character or integer, specifies the model terms for which predictions are required.
Note that, e.g., |
match.names |
Should partial string matching be used to select the |
intercept |
Should the intercept be included? |
type |
If |
FUN |
A function that should be applied on the samples of predictors or
parameters, depending on argument |
trans |
A transformer function or named list of transformer functions that computes
transformed predictions. If |
what |
Predictions can be computed from samples or estimated parameters of optimizer functions. If no samples are available the default is to use estimated parameters. |
nsamps |
If the fitted |
verbose |
If predictions are chunked, information on the prediction process can be printed. |
drop |
If predictions for only one |
cores |
Specifies the number of cores that should be used for prediction. Note that
this functionality is based on the |
chunks |
Should computations be split into |
... |
Arguments passed to prediction functions that are part of a |
Depending on arguments model, FUN and the structure of the bamlss model, a list of predictions or simple vectors or matrices of predictions.
bamlss, fitted.bamlss.
## Not run: ## Generate some data. d <- GAMart() ## Model formula. f <- list( num ~ s(x1) + s(x2) + s(x3) + te(lon,lat), sigma ~ s(x1) + s(x2) + s(x3) + te(lon,lat) ) ## Estimate model. b <- bamlss(f, data = d) ## Predictions. p <- predict(b) str(b) ## Prediction for "mu" model and term "s(x2)". p <- predict(b, model = "mu", term = "s(x2)") ## Plot effect plot2d(p ~ x2, data = d) ## Same for "sigma" model. p <- predict(b, model = "sigma", term = "s(x2)") plot2d(p ~ x2, data = d) ## Prediction for "mu" model and term "s(x1)" + "s(x2)" ## without intercept. p <- predict(b, model = "mu", term = c("s(x1)", "s(x2)"), intercept = FALSE) ## Prediction based on quantiles. p <- predict(b, model = "mu", term = "s(x2)", FUN = c95) plot2d(p ~ x2, data = d) ## Extract samples of predictor for "s(x2)". p <- predict(b, model = "mu", term = "s(x2)", intercept = FALSE, FUN = function(x) { x }) print(dim(p)) plot2d(p ~ x2, data = d, col.lines = rgb(0.1, 0.1, 0.1, alpha = 0.1)) ## Or using specific combinations of terms. p <- predict(b, model = "mu", term = c("s(x2)", "te(lon,lat)"), intercept = FALSE, FUN = function(x) { x }) head(p) ## Prediction using new data. ## Only need x3 data when predicting ## for s(x3). nd <- data.frame("x3" = seq(0, 1, length = 100)) nd <- cbind(nd, predict(b, newdata = nd, term = "s(x3)")) print(head(nd)) plot2d(mu ~ x3, data = nd) plot2d(sigma ~ x3, data = nd) ## End(Not run)
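As a small sketch (not run) of the parallel and chunked prediction facilities described above, reusing the model b from the example; the number of cores is arbitrary and forking via the parallel package requires a Unix-alike system:
## Not run:
## Split predictions into 4 chunks and compute them on 2 cores.
p <- predict(b, model = "mu", cores = 2, chunks = 4, verbose = TRUE)
## End(Not run)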
The transformer function takes a bamlss.frame
object and transforms all
smooth.construct
s into a random effects representation. Note that this
is only possible for smooth terms with a single smoothing variance. The function
is based on function smooth2random
.
trans_random(x) randomize(x)
x |
Object returned from function |
The decomposition is achieved by a spectral decomposition of the penalty and design matrix by
finding a basis of the null space of the penalty matrix. This feature is used, e.g., for the
JAGS
sampler function. For more details see also jagam
.
A transformed bamlss.frame
. To each smooth.construct
model term
an element named "Xf"
, the fixed effects design matrix, and an element "rand$Xr"
,
the random effects design matrix, is added. In addition, for re-transforming parameters elements
"trans.U"
and "trans.D"
are supplied. See also function
smooth2random
.
Fahrmeir L, Kneib T, Lang S, Marx B (2013). Regression - Models, Methods and Applications. Springer-Verlag, Berlin. ISBN 978-3-642-34332-2.
Wood S.N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.
bamlss.frame
, bamlss
, smooth2random
.
## Simulate data. d <- GAMart() ## Create a "bamlss.frame". bf <- bamlss.frame(num ~ s(x1) + s(x2) + s(x3) + s(lon,lat), data = d) ## Structure of the "s(x1)" smooth.construct. str(bf$x$mu$smooth.construct[["s(x1)"]]) ## Transform. bf <- randomize(bf) ## New structure adding fixed ## and random effect matrices. str(bf$x$mu$smooth.construct[["s(x1)"]])
This smooth constructor implements random bits model terms. Note that this is experimental.
## Linear smooth constructor. rb(..., k = 50) ## For mgcv. ## S3 method for class 'randombits.smooth.spec' smooth.construct(object, data, knots, ...)
... |
For function |
k |
Integer, number of random bit columns in the design matrix. |
object , data , knots
|
See |
Function rb() returns, similar to function s, a simple smooth specification object.
bamlss
, predict.bamlss
, bfit
, boost
## Not run: ## Simulate data. set.seed(123) d <- GAMart() ## Estimate model. f <- num ~ rb(x1) + rb(x2) + rb(x3) + rb(~lon+lat) b <- bamlss(f, data = d) plot(b) ## End(Not run)
Function to compute quantile and response residuals.
## S3 method for class 'bamlss' residuals(object, type = c("quantile", "response"), nsamps = NULL, ...) ## S3 method for class 'bamlss.residuals' plot(x, which = c("hist-resid", "qq-resid", "wp"), spar = TRUE, ...)
object |
An object of class |
type |
The type of residuals wanted, possible types are
|
nsamps |
If the fitted |
x |
Object returned from function |
which |
Should a histogram with kernel density estimates be plotted, a qq-plot or a worm plot? |
spar |
Should graphical parameters be set by the plotting function? |
... |
For function |
Response residuals are the raw residuals, i.e., the response data minus the fitted distributional
mean. If the bamlss.family
object contains a function $mu(par, ...)
, then
raw residuals are computed with y - mu(par)
where par
is the named list of fitted
values of distributional parameters. If $mu(par, ...)
is missing, then the fitted values
of the first distributional parameter are used.
Randomized quantile residuals are based on the cumulative distribution function of the
bamlss.family
object, i.e., the $p(y, par, ...)
function.
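To make the construction concrete, the following sketch (not run) computes quantile residuals by hand for a simple Gaussian model, i.e., it evaluates the fitted normal CDF at the response and maps the result to the standard normal scale; this mirrors the idea, not the exact package code:
## Not run:
d <- GAMart()
b <- bamlss(num ~ s(x1), data = d)
par <- predict(b, type = "parameter")              ## fitted mu and sigma.
u <- pnorm(d$num, mean = par$mu, sd = par$sigma)   ## CDF at the response.
qres <- qnorm(u)                                   ## map to the normal scale.
qqnorm(qres); abline(0, 1)
## End(Not run)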
A vector of residuals.
Dunn P. K., and Smyth G. K. (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics 5, 236–244.
van Buuren S., and Fredriks M. (2001). Worm Plot: A Simple Diagnostic Device for Modelling Growth Reference Curves. Statistics in Medicine, 20, 1259–1277.
bamlss
, predict.bamlss
, fitted.bamlss
.
## Not run: ## Generate data. d <- GAMart() ## Estimate models. b1 <- bamlss(num ~ s(x1), data = d) b2 <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d) ## Extract quantile residuals. e1 <- residuals(b1, type = "quantile") e2 <- residuals(b2, type = "quantile") ## Plots. plot(e1) plot(e2) ## End(Not run)
This is a small helper function to quickly extract the response name(s) of an object of
class "bamlss.frame"
or "bamlss"
.
response_name(object, ...)
object |
An object of class |
... |
Not used. |
## Simulate some data. d <- GAMart() ## Create a bamlss.frame. bf <- bamlss.frame(num ~ s(x1) + s(x2) + s(x3), data = d) ## Extract the response name. response_name(bf)
The results function combines estimated parameters and/or samples with the
bamlss.frame
and computes the data that can be used, e.g., for
creating effect plots or summary statistics. The function is usually used
internally within bamlss
. The object returned is of class "bamlss.results",
which has a plotting method, see plot.bamlss.results
.
results.bamlss.default(x, what = c("samples", "parameters"), grid = -1, nsamps = NULL, burnin = NULL, thin = NULL, ...)
x |
A |
what |
Should the results data be prepared using estimated parameters or samples? |
grid |
Integer, sets the number of grid points for univariate functions to be used for creating results data, e.g., for plotting. This is more efficient when using data sets with a large number of unique covariate values. If negative suitable defaults are chosen. |
nsamps |
Integer, if results are computed using parameter samples, this argument controls
the number of samples that should be used, e.g., if |
burnin |
Integer, sets the number of samples that should be dropped from the beginning of the MCMC chain when creating results. |
thin |
Integer, should the MCMC chain be thinned additionally? |
... |
Currently not used. |
An object of class "bamlss.results"
.
## Not run: ## Simulate data. d <- GAMart() ## Estimate model with no results. b <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d, results = FALSE) ## Compute model results a <- results.bamlss.default(b) ## Plot results for smooth terms. plot(a) ## End(Not run)
A simple helper function that removes special characters from a character string.
rmf(x)
x |
A character string. |
A character string with special characters removed.
rmf("ba*&m^l$$:s.s")
This is a simple wrapper function to define special smooth terms
in BAMLSS formulae. The function calls the smooth term constructor function
s
. The return value of s
is only slightly
modified, such that function bamlss.frame
identifies this term as a special
term and uses the appropriate (internal) infrastructures.
This structure is useful when the model term structure is, e.g., not a
linear combination of a design matrix and coefficients. See the example section of
function bfit
on how to use this setup.
s2(...)
... |
Arguments passed to function |
Slightly modified return value of function s
.
bamlss
, bamlss.frame
, bamlss.formula
, opt_bfit
print(names(s(x))) print(names(s2(x)))
This sampler function for BAMLSS is an interface to the BayesX
(https://www.uni-goettingen.de/de/bayesx/550513.html
)
command-line binary from R. The sampler is based on the command line version and functions
provided in the BayesXsrc package, which can be installed using
function get_BayesXsrc()
.
## Sampler functions: sam_BayesX(x, y, family, start = NULL, weights = NULL, offset = NULL, data = NULL, control = BayesX.control(...), ...) BayesX(x, y, family, start = NULL, weights = NULL, offset = NULL, data = NULL, control = BayesX.control(...), ...) ## Sampler control: BayesX.control(n.iter = 1200, thin = 1, burnin = 200, seed = NULL, predict = "light", model.name = "bamlss", data.name = "d", prg.name = NULL, dir = NULL, verbose = FALSE, show.prg = TRUE, modeonly = FALSE, ...) ## Special BayesX smooth term constructor. sx(x, z = NULL, bs = "ps", by = NA, ...) ## Special BayesX tensor product smooth term constructor. tx(..., bs = "ps", k = -1, ctr = c("center", "main", "both", "both1", "both2", "none", "meanf", "meanfd", "meansimple", "nullspace"), xt = NULL, special = TRUE) tx2(...) tx3(..., bs = "ps", k = c(10, 5), ctr = c("main", "center"), xt = NULL, special = TRUE) tx4(..., ctr = c("center", "main", "both", "both1", "both2")) ## Smooth constructors and predict matrix. ## S3 method for class 'tensorX.smooth.spec' smooth.construct(object, data, knots, ...) ## S3 method for class 'tensorX.smooth' Predict.matrix(object, data) ## S3 method for class 'tensorX3.smooth.spec' smooth.construct(object, data, knots, ...) ## S3 method for class 'tensorX3.smooth' Predict.matrix(object, data) ## Family object for quantile regression with BayesX. quant_bamlss(prob = 0.5) ## Download the newest version of BayesXsrc. get_BayesXsrc(dir = NULL, install = TRUE)
x |
For function |
y |
The model response, as returned from function |
z |
Second variable in a |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
data |
The model frame that should be used for modeling. Note that argument |
control |
List of control arguments to be sent to BayesX. See below. |
n.iter |
Sets the number of MCMC iterations. |
thin |
Defines the thinning parameter for MCMC simulation. E.g., |
burnin |
Sets the burn-in phase of the sampler, i.e., the number of starting samples that should be removed. |
seed |
Sets the seed. |
predict |
Not supported at the moment, do not modify! |
model.name |
The name that should be used for the model when calling BayesX. |
data.name |
The name that should be used for the data set when calling BayesX. |
prg.name |
The name that should be used for the |
dir |
Specifies the directory where BayesX should store all output files. For function
|
verbose |
Print information during runtime of the algorithm. |
show.prg |
Show the BayesX |
modeonly |
Should only the posterior mode be computed? Note that this is done using fixed smoothing parameters/variances. |
bs |
A |
by |
A by variable for varying coefficient model terms. |
k |
The dimension(s) of the bases used to represent the |
... |
Not used in |
ctr |
Specifies the type of constraints that should be applied. |
xt |
A list of extra arguments to be passed to BayesX. |
special |
Should the |
object , knots
|
See, function |
prob |
Numeric, specifies the quantile to be modeled, see the examples. |
install |
Should package BayesXsrc be installed? |
Function sam_BayesX() writes a BayesX .prg file and processes the data. Then, the function calls the BayesX binary via function run.bayesx() of the BayesXsrc package. After the BayesX sampler has finished, the function reads back in all parameter samples, which can then be used for further processing within bamlss.
The smooth term constructor functions s
and te
can
be used with the sam_BayesX()
sampler. When using te
note that only
one smoothing variance is estimated by BayesX.
For anisotropic penalties use function tx()
and tx3()
, the former currently
supports smooth functions of two variables, while tx3()
is supposed to model space-time
interactions. Note that in tx3()
the first variable represents time and the 2nd and 3rd
variable the coordinates in space.
Function sam_BayesX()
returns samples of parameters. The samples are provided as a
mcmc
matrix.
Function BayesX.control()
returns a list
with control arguments for
BayesX.
Function sx() returns a list of class "xx.smooth.spec" and "no.mgcv", where "xx" is a basis/type identifying code given by the bs argument.
Functions tx() and tx2() return a list of class tensorX.smooth.spec.
Note that this interface is still experimental and needs the newest version of the BayesX
source code, which is not yet part of the BayesXsrc package on CRAN. The newest version
can be installed with function get_BayesXsrc
. Note that the function assumes that sh,
subversion (svn) and R can be run from the command line!
Note that for setting up a new family object to be used with sam_BayesX() additional information needs to be supplied. The extra information must be placed within the family object in a named list element named "bayesx". For each parameter of the distribution a character string with the corresponding BayesX family name and the equationtype must be supplied. See, e.g., the R code of gaussian_bamlss for how the setup works.
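As a purely illustrative sketch, such an entry added to an existing family object fam could look roughly as follows; the BayesX family name and equationtype strings below are placeholders, the working setup can be found in the R code of gaussian_bamlss():
## Sketch only: "mydist_bayesx" and the equationtypes are placeholders.
fam$bayesx <- list(
  "mu"    = c("mydist_bayesx", "mu"),     ## BayesX family name, equationtype.
  "sigma" = c("mydist_bayesx", "sigma")
)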
For function sx() the following basis types are currently supported:
"ps": P-spline with second order difference penalty.
"mrf": Markov random fields: Defines a Markov random field prior for a spatial covariate, where geographical information is provided by a map object in boundary or graph file format (see functions read.bnd, read.gra and shp2bnd), as an additional argument named map.
"re": Gaussian i.i.d. random effects of a unit or cluster identification covariate.
Function tx()
currently supports smooth terms with two variables.
## Get newest version of BayesXsrc. ## Note: needs sh, svn and R build tools! ## get_BayesXsrc() ## Not run: if(require("BayesXsrc")) { ## Simulate some data set.seed(123) d <- GAMart() ## Estimate model with BayesX. Note ## that BayesX computes starting values, so ## these are not required by some optimizer function ## in bamlss() b1 <- bamlss(num ~ s(x1) + s(x2) + s(x3) + s(lon,lat), data = d, optimizer = FALSE, sampler = sam_BayesX) plot(b1) ## Same model with anisotropic penalty. b2 <- bamlss(num ~ s(x1) + s(x2) + s(x3) + tx(lon,lat), data = d, optimizer = FALSE, sampler = sam_BayesX) plot(b2) ## Quantile regression. b3_0.1 <- bamlss(num ~ s(x1) + s(x2) + s(x3) + tx(lon,lat), data = d, optimizer = FALSE, sampler = sam_BayesX, family = gF("quant", prob = 0.1)) b3_0.9 <- bamlss(num ~ s(x1) + s(x2) + s(x3) + tx(lon,lat), data = d, optimizer = FALSE, sampler = sam_BayesX, family = gF("quant", prob = 0.9)) ## Predict quantiles. p_0.1 <- predict(b3_0.1, term = "s(x2)") p_0.9 <- predict(b3_0.9, term = "s(x2)") ## Plot. plot2d(p_0.1 + p_0.9 ~ x2, data = d) } ## End(Not run)
This sampler function implements a derivative based MCMC algorithm for flexible Cox models with structured additive predictors.
sam_Cox(x, y, family, start, weights, offset, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, digits = 4, step = 20, ...) cox_mcmc(x, y, family, start, weights, offset, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, digits = 4, step = 20, ...)
x |
The |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
n.iter |
Sets the number of MCMC iterations. |
burnin |
Sets the burn-in phase of the sampler, i.e., the number of starting samples that should be removed. |
thin |
Defines the thinning parameter for MCMC simulation. E.g., |
verbose |
Print information during runtime of the algorithm. |
digits |
Set the digits for printing when |
step |
How many times should algorithm runtime information be printed, divides |
... |
Currently not used. |
The sampler uses derivative-based proposal functions to create samples of parameters. For time-dependent functions the proposals are based on one Newton-Raphson iteration centered at the last state, while for time-constant functions proposals can be based on iteratively reweighted least squares (IWLS), see also function GMCMC.
The integrals that are part of the time-dependent function updates are solved numerically.
In addition, smoothing variances are sampled using slice sampling.
The function returns samples of parameters. The samples are provided as a
mcmc
matrix.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
opt_Cox
, cox_bamlss
, surv_transform
,
simSurv
, bamlss
## Not run: library("survival") set.seed(123) ## Simulate survival data. d <- simSurv(n = 500) ## Formula of the survival model, note ## that the baseline is given in the first formula by s(time). f <- list( Surv(time, event) ~ s(time) + s(time, by = x3), gamma ~ s(x1) + s(x2) ) ## Cox model with continuous time. ## Note the the family object cox_bamlss() sets ## the default optimizer and sampler function! ## First, posterior mode estimates are computed ## using function opt_Cox(), afterwards the ## sampler sam_Cox() is started. b <- bamlss(f, family = "cox", data = d) ## Plot estimated effects. plot(b) ## End(Not run)
These functions provide a quite general infrastructure for sampling BAMLSS. The default proposal function is based on iteratively weighted least squares (IWLS), however, each model term may have a different updating function, see the details.
## Sampler functions: sam_GMCMC(x, y, family, start = NULL, weights = NULL, offset = NULL, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, step = 20, propose = "iwlsC_gp", chains = NULL, ...) GMCMC(x, y, family, start = NULL, weights = NULL, offset = NULL, n.iter = 1200, burnin = 200, thin = 1, verbose = TRUE, step = 20, propose = "iwlsC_gp", chains = NULL, ...) ## Propose functions: GMCMC_iwls(family, theta, id, eta, y, data, weights = NULL, offset = NULL, ...) GMCMC_iwlsC(family, theta, id, eta, y, data, weights = NULL, offset = NULL, zworking, resids, rho, ...) GMCMC_iwlsC_gp(family, theta, id, eta, y, data, weights = NULL, offset = NULL, zworking, resids, rho, ...) GMCMC_slice(family, theta, id, eta, y, data, ...)
x |
For function |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
weights |
Prior weights on the data, as returned from function |
offset |
Can be used to supply model offsets for use in fitting,
returned from function |
n.iter |
Sets the number of MCMC iterations. |
burnin |
Sets the burn-in phase of the sampler, i.e., the number of starting samples that should be removed. |
thin |
Defines the thinning parameter for MCMC simulation. E.g., |
verbose |
Print information during runtime of the algorithm. |
step |
How many times should algorithm runtime information be printed, divides |
propose |
Sets the propose function for model terms, e.g. for a term |
chains |
How many chains should be started? Chains are sampled sequentially. |
theta |
The current state of parameters, provided as a named list. The first level
represents the parameters of the distribution, the second level the parameters of
the model terms. E.g., using the |
id |
The parameter identifier, a character vector of length 2. The first character specifies the current distributional parameter, the second the current model term. |
eta |
The current value of the predictors, provided as a named list, one list entry
for each parameter. The names correspond to the parameter names in the family object,
see |
data |
An object as returned from function
|
zworking |
Preinitialized numeric vector of |
resids |
Preinitialized numeric vector of |
rho |
An environment, only for internal usage. |
... |
Arguments passed to function |
The sampler function sam_GMCMC()
cycles through all distributional parameters and corresponding
model terms in each iteration of the MCMC chain. Samples of the parameters of a model term (e.g.,
s(x)
) are generated by proposal functions, e.g. GMCMC_iwls()
.
The default proposal function that should be used for all model terms is set with argument
propose
. For smooth terms, e.g. terms created with function s
, if
a valid propose function is supplied within the extra xt
list, this propose function will
be used. This way each model term may have its own propose function for creating samples
of the parameters. See the example section.
The default proposal function GMCMC_iwlsC_gp
allows for general priors for the
smoothing variances and general penalty functions. Samples of smoothing variances are computed
using slice sampling. Function GMCMC_iwlsC
samples smoothing variances of univariate terms
assuming an inverse gamma prior. Terms of higher dimensions use again slice sampling for creating
samples of smoothing variances.
Function GMCMC_iwls
is similar to function GMCMC_iwlsC
but uses plain R code.
Function GMCMC_slice
applies slice sampling also for the regression coefficients and
is therefore relatively slow.
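As a hedged sketch (not run) of a term-specific propose function: the name of the xt entry is assumed here to be "propose", following the description above; see the package sources and the examples below for the authoritative setup:
## Not run:
## Use the slice sampler for s(x2) only, the default IWLS propose elsewhere.
d <- GAMart()
f <- num ~ s(x1) + s(x2, xt = list("propose" = GMCMC_slice))
b <- bamlss(f, data = d, sampler = sam_GMCMC)
## End(Not run)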
The function returns samples of parameters; depending on the return value of the propose functions, other quantities can be returned. The samples are provided as a mcmc matrix. If chains > 1, the samples are provided as a mcmc.list.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
bamlss
, bamlss.frame
,
bamlss.engine.setup
, set.starting.values
, s2
## Not run: ## Simulated data example illustrating ## how to call the sampler function. ## This is done internally within ## the setup of function bamlss(). d <- GAMart() f <- num ~ s(x1, bs = "ps") bf <- bamlss.frame(f, data = d, family = "gaussian") ## First, find starting values with optimizer. opt <- with(bf, bfit(x, y, family)) ## Sample. samps <- with(bf, sam_GMCMC(x, y, family, start = opt$parameters)) plot(samps) ## End(Not run)
This sampler function for BAMLSS is an interface to the JAGS library
using package rjags
. The function basically interprets the
bamlss.frame
into BUGS code, similar to the jagam
function of
package mgcv
, i.e., the function uses the random effects representation of smooth terms (see the transformer function randomize) to generate the BUGS code.
Note that estimating BAMLSS with JAGS is not very efficient.
Also note that this function is more experimental and support is only provided for a small
number of bamlss.family
objects.
Function BUGSeta()
therefore computes the code and data for one parameter of
the modeled distribution. Function BUGSmodel()
then collects all parameter model code
and data, which can be send to JAGS.
## Sampler functions: sam_JAGS(x, y, family, start = NULL, tdir = NULL, n.chains = 1, n.adapt = 100, n.iter = 4000, thin = 2, burnin = 1000, seed = NULL, verbose = TRUE, set.inits = TRUE, save.all = FALSE, modules = NULL, ...) JAGS(x, y, family, start = NULL, tdir = NULL, n.chains = 1, n.adapt = 100, n.iter = 4000, thin = 2, burnin = 1000, seed = NULL, verbose = TRUE, set.inits = TRUE, save.all = FALSE, modules = NULL, ...) ## Function to interpret an additive predictor into BUGS code: BUGSeta(x, id = NULL, ...) ## Function to interpret the full BAMLSS: BUGSmodel(x, family, is.stan = FALSE, reference = NULL, ...)
x |
For function |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
tdir |
The path to the temporary directory that should be used. |
n.chains |
Specifies the number of sequential MCMC chains that should be run with JAGS. |
n.adapt |
Specifies the number of iterations that should be used as an initial adaptive phase. |
n.iter |
Sets the number of MCMC iterations. |
thin |
Defines the thinning parameter for MCMC simulation. E.g., |
burnin |
Sets the burn-in phase of the sampler, i.e., the number of starting samples that should be removed. |
seed |
Sets the seed. |
verbose |
Print information during runtime of the algorithm. |
set.inits |
Should initial values of BAMLSS |
save.all |
Should all JAGS files be saved in |
modules |
Specify additional modules that should be loaded, see function |
id |
Character, the current parameter name for which the BUGS code should be produced. |
is.stan |
Should the BUGS code be translated to STAN code. Note that this is only experimental. |
reference |
A |
... |
Currently not used. |
Function sam_JAGS()
returns samples of parameters. The samples are provided as a mcmc
matrix. If n.chains > 1
, the samples are provided as a mcmc.list
.
Function BUGSeta()
returns the BUGS model code and preprocessed data for one
additive predictor. Function BUGSmodel()
then combines all single BUGS code chunks and
the data and creates the final BUGS model code that can be sent to JAGS.
Note that for setting up a new family object to be used with sam_JAGS() additional information needs to be supplied. The extra information must be placed within the family object in an element named "bugs". The following entries should be supplied within the ..$bugs list:
"dist": The name of the distribution in BUGS/JAGS model language.
"eta": The function that computes the BUGS code for one structured additive predictor. Function BUGSeta() is used per default.
"model": The function that merges all single predictor BUGS model code and data. The default function is BUGSmodel().
"reparam": A named vector of character strings that specify a re-parametrization. See also the example code of family.bamlss.
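The following sketch shows such a "bugs" entry for a family fam with parameters mu and sigma; the distribution name and the re-parametrization are illustrative assumptions (chosen so that JAGS' dnorm() receives a precision), see the example code of family.bamlss for an authoritative setup:
## Sketch only: entries are illustrative.
fam$bugs <- list(
  "dist"    = "dnorm",                      ## distribution name in BUGS/JAGS.
  "eta"     = BUGSeta,                      ## builds BUGS code per predictor.
  "model"   = BUGSmodel,                    ## merges all predictor code chunks.
  "reparam" = c(sigma = "pow(sigma, -2)")   ## sd -> precision for dnorm().
)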
bamlss
, bamlss.frame
,
bamlss.engine.setup
, set.starting.values
, bfit
,
GMCMC
## Not run: ## Simulated data example illustrating ## how to call the sampler function. ## This is done internally within ## the setup of function bamlss(). d <- GAMart() f <- num ~ s(x1, bs = "ps") bf <- bamlss.frame(f, data = d, family = "gaussian") ## First, find starting values with optimizer. opt <- with(bf, opt_bfit(x, y, family)) ## Sample with JAGS. if(require("rjags")) { samps <- with(bf, sam_JAGS(x, y, family, start = opt$parameters)) plot(samps) b <- bamlss(f, data = d, family = "gaussian", sampler = sam_JAGS) plot(b) } ## End(Not run)
This sampler function for BAMLSS uses estimated parameters
and the Hessian
information to create samples from a multivariate normal distribution. Note that smoothing
variance uncertainty is not accounted for, therefore, the resulting credible intervals
are most likely too narrow.
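Conceptually, the sampler draws from a multivariate normal approximation of the posterior, centered at the estimated parameters with covariance taken from the (inverse) Hessian. The following is a small standalone sketch of that idea with made-up numbers; it is not the internal code of `sam_MVNORM()`.

```r
## Sketch: draw approximate posterior samples from N(theta, solve(H)),
## where theta and H stand in for estimated parameters and the Hessian.
set.seed(1)
theta <- c("mu.p.(Intercept)" = 0.1, "sigma.p.(Intercept)" = -0.5)
H <- diag(c(100, 50))                    ## assumed curvature (negative Hessian)
Sigma <- solve(H)                        ## approximate posterior covariance
Z <- matrix(rnorm(500 * length(theta)), nrow = 500)
samps <- sweep(Z %*% chol(Sigma), 2, theta, FUN = "+")
colnames(samps) <- names(theta)
```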
sam_MVNORM(x, y = NULL, family = NULL, start = NULL, n.samples = 500, hessian = NULL, ...) MVNORM(x, y = NULL, family = NULL, start = NULL, n.samples = 500, hessian = NULL, ...)
x |
The |
y |
The model response, as returned from function |
family |
A bamlss family object, see |
start |
A named numeric vector containing possible starting values, the names are based on
function |
n.samples |
Sets the number of samples that should be generated. |
hessian |
The Hessian matrix that should be used. Note that the row and column names
must be the same as the names of the |
... |
Arguments passed to function |
Function MVNORM()
returns samples of parameters. The samples are provided as a
mcmc
matrix.
bamlss
, bamlss.frame
,
bamlss.engine.setup
, set.starting.values
, opt_bfit
,
sam_GMCMC
## Simulated data example illustrating ## how to call the sampler function. ## This is done internally within ## the setup of function bamlss(). d <- GAMart() f <- num ~ s(x1, bs = "ps") bf <- bamlss.frame(f, data = d, family = "gaussian") ## First, find starting values with optimizer. o <- with(bf, opt_bfit(x, y, family)) ## Sample. samps <- with(bf, sam_MVNORM(x, y, family, start = o$parameters)) plot(samps)
Generic function to extract samples from objects.
## Generic. samples(object, ...) ## Method for "bamlss" objects. ## S3 method for class 'bamlss' samples(object, model = NULL, term = NULL, combine = TRUE, drop = TRUE, burnin = NULL, thin = NULL, coef.only = FALSE, ...)
object |
An object for which samples should be extracted. |
model |
Character or integer, specifies the model for which samples should be extracted. |
term |
Character or integer, specifies the term for which samples should be extracted. |
combine |
Samples stored as a |
drop |
If there is only one |
burnin |
Integer, specifies the number of samples that should be discarded as burn-in. |
thin |
Integer, specifies the step length of samples that should be extracted, e.g.,
|
coef.only |
Logical, should only samples of model coefficients be returned? |
... |
Other arguments. |
## Not run: ## Generate data. d <- GAMart() ## Estimate model. b <- bamlss(num ~ s(x1) + s(x2) + s(x3), data = d) ## Extract samples for "s(x2)". sa <- samples(b, term = "s(x2)") head(sa) ## Trace plot. plot(sa) ## End(Not run)
The function computes the average log-likelihood, the average log-posterior, the deviance
information criterion (DIC) and the estimated degrees of freedom from samples of, e.g., a
bamlss
object.
samplestats(samples, x = NULL, y = NULL, family = NULL, logLik = FALSE, ...)
samples |
An object of class |
x |
The |
y |
The model response, as returned by function |
family |
A |
logLik |
Logical, should the log-likelihood be computed? This may take some time. |
... |
Currently not used. |
If the log-likelihood is not available in the samples, the function tries to compute
the information. Depending on the complexity of the model, this may take some time.
Computations are based on the $d()
or $loglik()
function of the
bamlss.family
object.
If a bamlss.family
object contains a function $p2d()
or $p2loglik()
,
which computes the log-likelihood from parameters, these functions are used for computation.
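As a quick check, one can inspect which of these functions a given family provides before calling `samplestats()`; a small illustration, assuming a fitted `"bamlss"` object `b` exists:

```r
## Which density/likelihood functions does the family of a fitted model
## provide? (b is assumed to be a fitted "bamlss" object.)
fam <- family(b)
c("d", "loglik", "p2d", "p2loglik") %in% names(fam)
```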
A list with the following entries (if available):
logLik |
The average log-likelihood. |
logPost |
The average log-posterior. |
DIC |
The deviance information criterion. |
pd |
The estimated degrees of freedom. |
## Not run: ## Generate some data. d <- GAMart() ## Estimate model without sampling statistics b <- bamlss(num ~ s(x1) + s(x2) + s(x3) + te(lon,lat), data = d, samplestats = FALSE) ## Note: needs the $d() or $loglik() function in the family! names(family(b)) ## Compute sampling statistics. samplestats(b) ## End(Not run)
The function scales numeric objects to specific ranges.
scale2(x, lower = -1.5, upper = 1.5)
x |
Numeric, vector or matrix. |
lower |
The lower bound of the target range. |
upper |
The upper bound of the target range. |
A scaled numeric vector or matrix, scaled to the range provided in lower
and upper
.
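The rescaling is plain linear min-max scaling; a sketch of the presumed computation (not necessarily the exact internal code) is:

```r
## Presumed min-max rescaling behind scale2() (illustrative sketch).
scale2_sketch <- function(x, lower = -1.5, upper = 1.5) {
  (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE)) * (upper - lower) + lower
}
scale2_sketch(runif(5), 0, 10)
```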
set.seed(123) x <- runif(5) scale2(x, -1, 1) scale2(x, 0, 10)
Simulated data to test the implementation of the bamlss families.
data("simdata")
data("simdata")
An object of class list
of length 3.
## Not run: ## Reproducing code. set.seed(111) n <- 2000 ## build orthogonal rotation matrix thetax <- pi/4 thetay <- pi/4 thetaz <- pi/4 Rx <- matrix( c(1,0,0, 0,cos(thetax),sin(thetax), 0,-sin(thetax),cos(thetax) ), 3, 3 ) Ry <- matrix( c(cos(thetay),0,-sin(thetay), 0,1,0, sin(thetay),0,cos(thetay) ), 3, 3 ) Rz <- matrix( c(cos(thetaz),sin(thetaz),0, -sin(thetaz),cos(thetaz),0, 0,0,1 ), 3, 3 ) R <- Rx %*% Ry %*% Rz ## non-linear functions f1 <- function(x) (sin(pi * x))^2 f2 <- function(x) (cos(pi * x))^2 ## random derivitates x <- runif(n) ## eigenvalues val1 <- f1(x) val2 <- f2(x) val3 <- rep(0, n) ## initialize vectors for parameter lists p12 <- NULL p13 <- NULL p23 <- NULL sig <- matrix(0, n, 3) lamdiag <- matrix(0, n, 3) lambda <- matrix(0, n, 3) y <- matrix(0, n, 3) log_dens_ref <- rep(0, n) tau <- .1 ## offset on diagonal l <- 0 ## count occasions with invertible cv dens1 <- NULL for ( ii in seq(n) ) { mu <- rep(0, 3) val <- diag( c(val1[ii], val2[ii], val3[ii]) ) + diag(tau, 3) ## compute covariance matrix from rotation matrix and eigenvalues cv <- R %*% val %*% t(R) ## compute parameters for parameter list sig[ii,] <- sqrt(diag(cv)) p12[ii] <- cv[1,2] p13[ii] <- cv[1,3] p23[ii] <- cv[2,3] ## compute paramters for Cholesky family chol_cv <- solve(chol(cv)) # lambdas come from L^-1 not L lamdiag[ii,] <- diag(chol_cv) lambda[ii,] <- chol_cv[upper.tri(chol_cv)] ## Check if cv is invertible if ( !is.matrix(try(chol(cv))) ) l <- l + 1 y[ii,] <- mvtnorm::rmvnorm(1, mu, cv) log_dens_ref[ii] <- mvtnorm::dmvnorm(y[ii,], mu, cv, log = TRUE) } print(l) ## Data d <- as.data.frame(y) names(d) <- paste0("y", 1:3) d$x <- x ## make parameter list for mvn chol family par <- list() par[["mu1"]] <- rep(0,n) par[["mu2"]] <- rep(0,n) par[["mu3"]] <- rep(0,n) par[["lamdiag1"]] <- lamdiag[,1] par[["lamdiag2"]] <- lamdiag[,2] par[["lamdiag3"]] <- lamdiag[,3] par[["lambda12"]] <- lambda[,1] par[["lambda13"]] <- lambda[,2] par[["lambda23"]] <- lambda[,3] simdata <- list( d = d, par = par, y = y ) ## save(simdata, file = "simdata.rda") ## End of simulation ## End(Not run)
Simulates longitudinal data with normal error and (Cox-type) survival times
using the inversion method. The function simJM()
is a wrapper specifying
all predictors and the resulting data sets. The wrapper calls rJM()
to sample
the survival times, a modified version of rSurvtime()
from the R package
CoxFlexBoost.
simJM(nsub = 300, times = seq(0, 120, 1), probmiss = 0.75, long_setting = "functional", alpha_setting = if(nonlinear) "linear" else "nonlinear", dalpha_setting = "zero", sigma = 0.3, long_df = 6, tmax = NULL, seed = NULL, full = FALSE, file = NULL, nonlinear = FALSE, fac = FALSE) rJM(hazard, censoring, x, r, subdivisions = 1000, tmin = 0, tmax, file = NULL, ...)
nsub |
number of individuals for which longitudinal data and survival times should be simulated. |
times |
vector of time points at which longitudinal measurements are "sampled". |
probmiss |
proportion of longitudinal measurements to be set to missing. Used to induce sparsity in the longitudinal measurements. |
long_setting |
Specification of the longitudinal trajectories of the sampled subjects.
Preset specifications are |
alpha_setting |
specification of the association between survival and longitudinal. Preset
specifications are |
dalpha_setting |
specification of the association between survival and the derivative of the longitudinal. Work in progress. |
sigma |
standard deviation of the normal error around the true longitudinal measurements. |
long_df |
number of basis functions from which functional random intercepts are sampled. |
tmax |
For function |
seed |
numeric scalar setting the random seed. |
full |
logical indicating if only the longitudinal data set should be returned ( |
file |
name of the data file the generated data set should be stored into (e.g., "simdata.RData") or NULL if the dataset should directly be returned in R. |
nonlinear |
If set to |
fac |
If set to |
hazard |
complete hazard function to specify the joint model. Time must be the first argument. |
censoring |
function to compute (random) censoring. |
x |
matrix of sampled covariate values. |
r |
matrix of sampled random coefficients. |
subdivisions |
the maximum number of subintervals for the integration. |
tmin |
earliest time point to sample a survival time. |
... |
further arguments to be passed to |
The function simulates longitudinal data based on the given specification at the given times.
The full hazard is built from all joint model predictors (lambda, gamma, alpha, and mu),
as presented in Koehler, Umlauf, and Greven (2016), see also jm_bamlss.
Survival times are sampled using the inversion method (cf. Bender, Augustin, and Blettner, 2005).
Additional censoring and missingness are introduced. The longitudinal information is censored
according to the survival information. Users can also specify their own predictors and use only
rJM to simulate survival times accordingly.
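The inversion method itself amounts to drawing U ~ Uniform(0, 1) and solving Lambda(T) = -log(U) for T, where Lambda is the cumulative hazard obtained by numerical integration. A small illustration under an assumed hazard (not the internal `rJM()` code):

```r
## Inversion method for one survival time, assuming a simple hazard;
## rJM() implements this idea for the full joint model hazard.
hazard <- function(t) 0.1 * exp(0.5 * sin(t / 10))
cumhaz <- function(t) integrate(hazard, lower = 0, upper = t)$value
u <- runif(1)
T <- uniroot(function(t) cumhaz(t) + log(u), lower = 1e-8, upper = 1000)$root
```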
Pre-specified functions for the longitudinal trajectory in long_setting comprise a linear,
a nonlinear, and a functional specification; the functional specification additionally includes
a sum of B-spline basis functions whose penalized coefficients are sampled per person via gen_b.
Pre-specified functions for the association in alpha_setting comprise a constant, a linear,
and two nonlinear specifications. Additionally, fixed functions for the baseline hazard lambda
and the time-constant predictor gamma are employed.
For full = TRUE
a list of the three data.frame
s is returned:
data |
Simulated dataset in long format including all longitudinal and survival covariates. |
data_grid |
Dataset of the time-varying survival predictors which are not subject specific, evaluated at a grid of fixed time points. |
data_full |
Simulated data set prior to generating longitudinal missings. Useful to assess the longitudinal fit. |
For full = FALSE
only the first dataset is returned.
Covariates within these datasets include a subject identifier id
, the sampled survival
times survtime
, the event indicator event
, the time points of longitudinally
"observed" measurements obstime
, the longitudinal response y
, the cumulative
hazard at the survival time cumhaz
, as well as covariates x1, x2
, random effects
r1, r2, b1, ...
, and the true predictors alpha, lambda, gamma, mu, sigma
.
Hofner B (2016). CoxFlexBoost: Boosting Flexible Cox Models (with Time-Varying Effects). R package version 0.7-0.
Bender R, Augustin T, Blettner M (2005). Generating Survival Times to Simulate Cox Proportional Hazards Models. Statistics in Medicine, 24, 1713-1723.
Koehler N, Umlauf N, Beyerlein A, Winkler C, Ziegler A, Greven S (2016). Flexible Bayesian Additive Joint Models with an Application to Type 1 Diabetes Research. (submitted)
jm_bamlss
, opt_JM
, sam_JM
, bamlss
.
## Not run: ## Simulate survival data ## with functional random intercepts and a nonlinear effect ## of time, time-varying association alpha. d <- simJM(nsub = 300) head(d) ## Simulate survival data ## with random intercepts/slopes and a linear effect of time, ## constant association alpha. d <- simJM(nsub = 200, long_setting = "linear", alpha_setting = "constant") head(d) ## End(Not run)
Functions simSurv() and rSurvTime2() simulate arbitrary (Cox-type) survival times
using the inversion method. Function simSurv() is a simple wrapper that calls
rSurvTime2(). The functions are based on the rSurvtime() implementation from the
R package CoxFlexBoost and only slightly modify the code.
## Simulate a pre-specified survival times data set. simSurv(n = 300) ## Simulate arbitrary survival times. rSurvTime2(lambda, x, cens_fct, upper = 1000, ..., file = NULL, subdivisions = 1000)
n |
The number of individuals for which survival times should be simulated. |
lambda |
function. Baseline hazard |
x |
matrix. (Sampled) values for covariates (without time). |
cens_fct |
function. Function to compute (random) censoring. |
upper |
upper boundary of the interval the random survival times fall into. |
... |
further arguments to be passed to |
file |
character. name of the data file the generated data set should be stored into (e.g., "survtimes.RData") or NULL if the dataset should directly be returned in R. |
subdivisions |
The maximum number of subintervals for the integration. |
This is basically a slight modification regarding the computation of the integral;
see the manual page of function rSurvtime() of package CoxFlexBoost for details.
A data.frame
consisting of the observed survival time
(time
), the non-censoring indicator (event
) and further
covariates x
is returned. If file
is specified, the
data.frame is additionally stored on disk.
Hofner B (2016). CoxFlexBoost: Boosting Flexible Cox Models (with Time-Varying Effects). R package version 0.7-0.
Bender R, Augustin T, Blettner M (2005). Generating Survival Times to Simulate Cox Proportional Hazards Models. Statistics in Medicine, 24, 1713-1723.
cox_bamlss
, opt_Cox
, sam_Cox
, bamlss
## The following shows the code of the ## wrapper function simSurv(). set.seed(111) n <- 100 X <- matrix(NA, nrow = n, ncol = 3) X[, 1] <- runif(n, -1, 1) X[, 2] <- runif(n, -3, 3) X[, 3] <- runif(n, -1, 1) ## Specify censoring function. cens_fct <- function(time, mean_cens) { ## Censoring times are independent exponentially distributed. censor_time <- rexp(n = length(time), rate = 1 / mean_cens) event <- (time <= censor_time) t_obs <- apply(cbind(time, censor_time), 1, min) ## Return matrix of observed survival times and event indicator. return(cbind(t_obs, event)) } ## log(time) is the baseline hazard. lambda <- function(time, x) { exp(log(time) + 0.7 * x[1] + sin(x[2]) + sin(time * 2) * x[3]) } ## Simulate data with lambda() and cens_fct(). d <- rSurvTime2(lambda, X, cens_fct, mean_cens = 5)
This function plots slices of bivariate surfaces at user-defined values.
sliceplot(x, y = NULL, z = NULL, view = 1, c.select = NULL, values = NULL, probs = c(0.1, 0.5, 0.9), grid = 100, legend = TRUE, pos = "topright", digits = 2, data = NULL, rawdata = FALSE, type = "mba", linear = FALSE, extrap = FALSE, k = 40, rug = TRUE, rug.col = NULL, jitter = TRUE, ...)
x |
A matrix or data frame, containing the covariates for which the effect should be plotted
in the first and second column and at least a third column containing the effect. Another
possibility is to specify the plot via a |
y |
If |
z |
If |
view |
Which variable should be used for the x-axis of the plot, the other variable will be
used to compute the slices. May also be a |
c.select |
Integer, selects the column that is used in the resulting matrix to be
used as the |
values |
The values of the |
probs |
Numeric vector of probabilities with values in [0,1] to be used within function
|
grid |
The grid size of the surface where the slices are generated from. |
legend |
If set to |
pos |
The position of the legend, see also function |
digits |
The number of decimal places the legend values are rounded to. |
data |
If |
rawdata |
If set to |
type |
Character, which type of interpolation method should be used. The default is
|
linear |
Logical, should linear interpolation be used within function
|
extrap |
Logical, should interpolations be computed outside the observation area (i.e., extrapolated)? |
k |
Integer, the number of basis functions to be used to compute the interpolated surface
when |
rug |
Add a |
jitter |
|
rug.col |
Specify the color of the rug representation. |
... |
Similar to function plot3d
, this function first applies bivariate interpolation
on a regular grid. Afterwards, the slices are computed from the resulting surface.
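If no values are supplied, the slice positions are presumably taken as empirical quantiles of the slicing variable via the probs argument; a tiny illustration of that choice:

```r
## Slice positions as empirical quantiles of the variable not shown on
## the x-axis (w is a stand-in for that variable).
w <- runif(500, 0, 6)
quantile(w, probs = c(0.1, 0.5, 0.9))
```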
Function sliceplot
can use the akima package to construct smooth interpolated
surfaces, therefore, package akima needs to be installed. The akima package has an ACM
license that restricts applications to non-commercial usage, see
https://www.acm.org/publications/policies/software-copyright-notice
Function sliceplot
prints a note referring to the ACM license. This note can be suppressed by
setting
options("use.akima" = TRUE)
plot2d
, plot3d
,
plotmap
, plotblock
.
## Generate some data. set.seed(111) n <- 500 ## Regressors. d <- data.frame(z = runif(n, -3, 3), w = runif(n, 0, 6)) ## Response. d$y <- with(d, 1.5 + cos(z) * sin(w) + rnorm(n, sd = 0.6)) ## Not run: ## Estimate model. b <- bamlss(y ~ te(z, w), data = d) summary(b) ## Plot estimated effect. plot(b, term = "te(z,w)", sliceplot = TRUE) plot(b, term = "te(z,w)", sliceplot = TRUE, view = 2) plot(b, term = "te(z,w)", sliceplot = TRUE, view = "w") plot(b, term = "te(z,w)", sliceplot = TRUE, probs = seq(0, 1, length = 10)) ## End(Not run) ## Variations. d$f1 <- with(d, sin(z) * cos(w)) sliceplot(cbind(z = d$z, w = d$w, f1 = d$f1)) ## Same with formula. sliceplot(sin(z) * cos(w) ~ z + w, ylab = "f(z)", data = d) ## Compare with plot3d(). plot3d(sin(z) * 1.5 * w ~ z + w, zlab = "f(z,w)", data = d) sliceplot(sin(z) * 1.5 * w ~ z + w, ylab = "f(z)", data = d) sliceplot(sin(z) * 1.5 * w ~ z + w, view = 2, ylab = "f(z)", data = d)
For each smooth term estimated with MCMC, the function computes pointwise 95% credible intervals and the fraction of cases where the interval does not contain zero.
smooth_check(object, newdata = NULL, model = NULL, term = NULL, ...)
object |
A fitted model object which contains MCMC samples. |
newdata |
Optionally, use new data for computing the check. |
model |
Character, for which model should the check be computed? |
term |
Character, for which term should the check be computed? |
... |
Arguments passed to |
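The check boils down to a simple computation on the MCMC samples of a term's fitted values: form pointwise 95% credible intervals and count how often zero lies outside. A rough sketch, assuming a matrix of sampled fitted values is available:

```r
## fit_samps is assumed to be a matrix of MCMC samples (rows) of the
## fitted values of one smooth term (columns).
set.seed(1)
fit_samps <- matrix(rnorm(1000 * 50, mean = 0.2), nrow = 1000)
ci <- apply(fit_samps, 2, quantile, probs = c(0.025, 0.975))
mean(ci[1, ] > 0 | ci[2, ] < 0)   ## fraction of intervals excluding zero
```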
## Not run: ## Simulate some data. d <- GAMart() ## Model formula. f <- list( num ~ s(x1) + s(x2) + s(x3), sigma ~ s(x1) + s(x2) + s(x3) ) ## Estimate model with MCMC. b <- bamlss(f, data = d) ## Run the check, note that all variables ## for sigma should have no effect. smooth_check(b) ## End(Not run)
The generic function is only a copy of smooth.construct
adding
a ...
argument. For objects of class "bamlss.frame"
and "bamlss"
the
method extracts all smooth model terms, see function bamlss.frame
for details
on the setup of BAMLSS.
## Function as in package mgcv ## but with additional dots argument. smooth.construct(object, data, knots, ...) ## For 'bamlss.frame's. ## S3 method for class 'bamlss.frame' smooth.construct(object, data = NULL, knots = NULL, model = NULL, drop = TRUE, ...) ## S3 method for class 'bamlss.formula' smooth.construct(object, data = NULL, knots = NULL, model = NULL, drop = TRUE, ...) ## S3 method for class 'bamlss.terms' smooth.construct(object, data = NULL, knots = NULL, model = NULL, drop = TRUE, ...)
object |
Either a smooth specification object, or object of class |
data |
A data frame or list, see also function |
knots |
See function |
model |
Character, specifies for which model parameter the smooth constructs should be created. |
drop |
If there is only one model parameter the returned named list is simplified. |
... |
Arguments passed to the smooth term constructor functions. |
For smooth specification objects see function smooth.construct
. For
objects of class "bamlss.frame"
or "bamlss"
the list of smooth constructs,
see function bamlss.frame
for more details.
bamlss.frame
, bamlss.formula
,
bamlss
, smooth.construct
.
## Generate some data. d <- GAMart() ## Create a "bamlss.frame". bf <- bamlss.frame(num ~ s(x1) + s(x2), data = d) ## Extract the smooth construct. sc <- smooth.construct(bf) str(sc) ## Also possible with formulas. f <- bamlss.formula(list( num ~ s(x1) + te(lon,lat), sigma ~ s(x2) ), family = "gaussian") sc <- smooth.construct(f, data = d) str(sc)
This smooth constructor implements a kriging-based model term.
## S3 method for class 'kr.smooth.spec' smooth.construct(object, data, knots, ...) ## S3 method for class 'kriging.smooth' Predict.matrix(object, data)
object , data , knots
|
See |
... |
Currently not used. |
This smooth constructor implements univariate and bivariate Kriging terms. The
basis functions are based on the Matern covariance function. For finding
knots, a space-filling algorithm is used, see cover.design
.
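For orientation, one common member of the Matern family (smoothness 3/2) looks as follows; the exact variant and parameterization used by the `"kr"` constructor is not spelled out here, so treat this as an assumption:

```r
## Matern correlation with smoothness 3/2 (illustrative choice; the exact
## variant used by the "kr" constructor is an assumption here).
matern32 <- function(d, range = 1) {
  (1 + sqrt(3) * d / range) * exp(-sqrt(3) * d / range)
}
curve(matern32(x, range = 0.5), from = 0, to = 3,
  xlab = "distance", ylab = "correlation")
```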
A smooth specification object, see also smooth.construct.
Fahrmeir, L., Kneib, T. , Lang, S., Marx, B. (2013): Regression. Models, Methods and Applications,
Springer Verlag. https://www.uni-goettingen.de/de/551357.html
## Not run: ## Simulate data. set.seed(123) d <- GAMart() ## Estimate model. f <- num ~ s(x1,bs="kr") + s(x2,bs="kr") + s(x3,bs="kr") + s(lon,lat,bs="kr",k=30) ## Set the seed, estimate model. set.seed(111) b <- bamlss(f, data = d) ## Plot estimated effects. plot(b) ## End(Not run)
The function sets up a smooth term for shape-constrained estimation of P-spline model terms. Note that this currently only works with boosting and backfitting.
## S3 method for class 'ms.smooth.spec' smooth.construct(object, data, knots, ...)
object |
Either a smooth specification object, or object of class |
data |
A data frame or list, see also function |
knots |
See function |
... |
Arguments passed to the smooth term constructor functions. |
See function smooth.construct
.
bamlss.frame
, bamlss.formula
,
bamlss
, smooth.construct
.
## Not run: ## Generate some data. set.seed(123) n <- 300 x <- runif(n, -2, 3) y <- sin(x) + rnorm(n, sd = 0.1) d <- data.frame("y" = y, "x" = x) ## Increasing: constr = 1. ## Decreasing: constr = 2. b <- bamlss(y ~ s2(x,bs="ms",xt=list(constr=1)), data = d, optimizer = opt_bfit, sampler = sam_MVNORM) ## Predict and plot. p <- predict(b, model = "mu", FUN = c95) plot(y ~ x) plot2d(p ~ x, add = TRUE, col.lines = 4, lwd = 2) ## End(Not run)
This smooth constructor implements the random effects representation of a P-spline.
## S3 method for class 'sr.smooth.spec' smooth.construct(object, data, knots, ...)
object , data , knots
|
See |
... |
Currently not used. |
See smooth.construct
bamlss
, predict.bamlss
, opt_bfit
, opt_boost
## Not run: ## Simulate data. set.seed(123) d <- GAMart() ## Estimate model. f <- num ~ x1 + x2 + x3 + s2(x1,bs="sr") + s2(x2,bs="sr") + s2(x3,bs="sr") b <- bamlss(f, data = d, optimizer = boost, sampler = FALSE) plot(b) ## End(Not run)
Performs stability selection based on gradient boosting.
stabsel(formula, data, family = "gaussian", q, maxit, B = 100, thr = .9, fraction = 0.5, seed = NULL, ...) ## Plot selection frequencies. ## S3 method for class 'stabsel' plot(x, show = NULL, pal = function(n) gray.colors(n, start = 0.9, end = 0.3), ...)
formula |
A formula or extended formula. |
data |
A |
family |
A |
q |
An integer specifying how many terms to select in each boosting run. |
maxit |
An integer specifying the maximum number of boosting iterations.
See |
B |
An integer. The boosting is run B times. |
thr |
Cut-off threshold of relative frequencies (between 0 and 1) for selection. |
fraction |
Numeric between 0 and 1. The fraction of data to be used in each boosting run. |
seed |
A seed to be set before the stability selection. |
x |
An object of class "stabsel". |
show |
Number of terms to be shown. |
pal |
Color palette for different model terms. |
... |
Not used yet in |
stabsel
performs stability selection based on gradient
boosting (opt_boost
): The boosting algorithm is run
B
times on a randomly drawn fraction
of the data
.
Each boosting run is stopped either when q
terms have been selected,
or when maxit
iterations have been performed, i.e. either q
or maxit
can be used to tune the regularization of the boosting.
After the boosting the relative selection frequencies are evaluated.
Terms with a relative selection frequency larger than
are suggested for a final regression model.
If neither q
nor maxit
has been specified, q
will be set to the square root of the number of columns in data
.
Gradient boosting does not depend on random numbers. Thus, the individual boosting runs differ only in the subset of data which is used.
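Put as pseudo-code, the procedure is a plain loop over random subsamples. The sketch below only illustrates the idea; `selected_terms()` is a hypothetical helper for extracting the terms picked in one boosting run and is not part of the package API.

```r
## Pseudo-code sketch of stability selection via gradient boosting.
stabsel_sketch <- function(formula, data, q, B = 100, thr = 0.9, fraction = 0.5) {
  sel_all <- character(0)
  for(b in seq_len(B)) {
    idx <- sample(nrow(data), floor(fraction * nrow(data)))
    fit <- bamlss(formula, data = data[idx, , drop = FALSE],
      optimizer = opt_boost, sampler = FALSE)
    sel_all <- c(sel_all, selected_terms(fit, q = q))  ## hypothetical helper
  }
  freq <- table(sel_all) / B
  names(freq)[freq >= thr]   ## terms suggested for the final model
}
```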
An object of class "stabsel".
Thorsten Simon
## Not run: ## Simulate some data. set.seed(111) d <- GAMart() n <- nrow(d) ## Add some noise variables. for(i in 4:9) d[[paste0("x",i)]] <- rnorm(n) f <- paste0("~ ", paste("s(x", 1:9, ")", collapse = "+", sep = "")) f <- paste(f, "+ te(lon,lat)") f <- as.formula(f) f <- list(update(f, num ~ .), f) ## Run stability selection. sel <- stabsel(f, data = d, q = 6, B = 10) plot(sel) ## Estimate selected model. nf <- formula(sel) b <- bamlss(nf, data = d) plot(b) ## End(Not run)
The function takes an object of class "bamlss"
and produces summaries
of optimizer and sampler function outputs.
## S3 method for class 'bamlss' summary(object, model = NULL, FUN = NULL, parameters = TRUE, ...) ## S3 method for class 'summary.bamlss' print(x, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class "bamlss". |
x |
An object of class |
model |
Character or integer, specifies the model for which a summary should be computed. |
FUN |
Function that should be applied on samples, see also function |
parameters |
If an optimizer function is applied within the |
digits |
Controls number of digits printed in output. |
... |
Other arguments. |
If the fitted model contains samples, summaries according to the supplied function can be
computed, e.g., different quantiles of samples. See also function coef.bamlss
that extracts the coefficient summaries.
If an optimizer function was used within the bamlss
call, estimated parameters
will be included per default into the summary.
Note that summaries not based on samples can be user defined, e.g., as returned from function
samplestats
or the return values of optimizer function, e.g., see function
opt_bfit
.
summary.bamlss
produces the following summary:
call |
The initial |
family |
The family that is used for modeling. |
formula |
The model formula. |
model.matrix |
Summary of parametric terms. |
smooth.construct |
Summary of smooth terms. |
model.stats |
Other model statistics, e.g., as returned from
optimizer functions and/or produced by function |
## Not run: ## Generate some data. d <- GAMart() ## Model formula. f <- list( num ~ s(x1) + s(x2), sigma ~ s(x3) + te(lon,lat) ) ## Estimate model. b <- bamlss(f, data = d) ## Print the summary. print(summary(b)) ## End(Not run)
This function takes a bamlss.frame
and computes design matrices of model terms
based on a time grid for time-dependent structured additive predictors in a survival context.
Note that this transformer function is usually used internally by function bamlss
and is the default transformer function using the cox_bamlss
family object.
The time grid design matrices can be used to construct the full structured additive predictor for each time point. This way it is possible to solve the integrals that are part of, e.g., a Newton-Raphson updating scheme, numerically.
See the example section on how to extract the time grid design matrices.
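The point of the time grid is that integrals over the hazard can then be approximated by simple numerical quadrature of the predictor evaluated at the grid points; a small illustration with the trapezoidal rule and an assumed predictor:

```r
## Approximate the cumulative hazard of one individual by the
## trapezoidal rule over a time grid (eta_lambda is an assumed predictor).
grid <- seq(0, 2, length.out = 25)     ## corresponds to subdivisions = 25
eta_lambda <- -1 + 0.5 * grid          ## assumed time-varying predictor
h <- exp(eta_lambda)                   ## hazard evaluated on the grid
cumhaz <- sum(diff(grid) * (head(h, -1) + tail(h, -1)) / 2)
```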
surv_transform(x, y, data, family, subdivisions = 100, timedependent = "lambda", timevar = NULL, idvar = NULL, is.cox = FALSE, alpha = 0.1, ...)
x |
The |
y |
The model response, as returned from function |
data |
The data.frame that should be used for setting up all matrices. |
family |
A bamlss family object, see |
subdivisions |
How many time points should be created for each individual. |
timedependent |
A character vector specifying the names of parameters in |
timevar |
A character specifying the name of the survival time variable in the data set. |
idvar |
Depending on the type of data set, this is the name of the variable specifying the identifier of individuals. |
is.cox |
Should the |
alpha |
A value for the intercept of a parameter named alpha, typically the association parameter of a longitudinal and survival process in a joint model. |
... |
Arguments passed to function |
A bamlss.frame
including the time grid design matrices.
cox_bamlss
, opt_Cox
, sam_Cox
,
simSurv
, bamlss
library("survival") set.seed(111) ## Simulate survival data. d <- simSurv(n = 20) ## Formula of the survival model, note ## that the baseline is given in the first formula by s(time). f <- list( Surv(time, event) ~ s(time) + s(time, by = x3), gamma ~ s(x1) + s(x2) ) ## Create the bamlss.frame. bf <- bamlss.frame(f, family = "cox", data = d) ## Lambda is the time-dependent parameter. print(bf) ## Apply the transformer. bf <- with(bf, surv_transform(x, y, data = model.frame, family = family, is.cox = TRUE, subdivisions = 25)) ## Extract the time grid design matrix for term s(time). X <- bf$x$lambda$smooth.construct[["s(time)"]]$fit.fun_timegrid(NULL) dim(X) ## Compute fitted values for each time point. grid <- attr(bf$y[[1]], "grid") gdim <- c(length(grid), length(grid[[1]])) b <- runif(ncol(X)) fit <- X %*% b fit <- matrix(fit, nrow = gdim[1], ncol = gdim[2], byrow = TRUE) plot(as.vector(fit) ~ unlist(grid), type = "n", xlab = "Survival time", ylab = "Effect") for(j in seq_along(grid)) { lines(fit[j, ] ~ grid[[j]], lwd = 2, col = rgb(0.1, 0.1, 0.1, alpha = 0.3)) points(grid[[j]][gdim[2]], fit[j, gdim[2]], col = "red") }
library("survival") set.seed(111) ## Simulate survival data. d <- simSurv(n = 20) ## Formula of the survival model, note ## that the baseline is given in the first formula by s(time). f <- list( Surv(time, event) ~ s(time) + s(time, by = x3), gamma ~ s(x1) + s(x2) ) ## Create the bamlss.frame. bf <- bamlss.frame(f, family = "cox", data = d) ## Lambda is the time-dependent parameter. print(bf) ## Apply the transformer. bf <- with(bf, surv_transform(x, y, data = model.frame, family = family, is.cox = TRUE, subdivisions = 25)) ## Extract the time grid design matrix for term s(time). X <- bf$x$lambda$smooth.construct[["s(time)"]]$fit.fun_timegrid(NULL) dim(X) ## Compute fitted values for each time point. grid <- attr(bf$y[[1]], "grid") gdim <- c(length(grid), length(grid[[1]])) b <- runif(ncol(X)) fit <- X %*% b fit <- matrix(fit, nrow = gdim[1], ncol = gdim[2], byrow = TRUE) plot(as.vector(fit) ~ unlist(grid), type = "n", xlab = "Survival time", ylab = "Effect") for(j in seq_along(grid)) { lines(fit[j, ] ~ grid[[j]], lwd = 2, col = rgb(0.1, 0.1, 0.1, alpha = 0.3)) points(grid[[j]][gdim[2]], fit[j, gdim[2]], col = "red") }
This function is only a slight extension of Surv
for
joint models.
Surv2(..., obs = NULL)
... |
Arguments passed to function |
obs |
The observed longitudinal response. |
An object of class "Surv2"
and "matrix"
.
## Surv2(time, event, obs = y) ## See the examples of opt_JM() and sam_JM()!
Temperature Data for Innsbruck Airport
data("TempIbk")
data("TempIbk")
An object of class data.frame
with 1798 rows and 17 columns.
Numerical weather predictions (NWP) and observations of 2 meter temperature at Innsbruck Airport. The observations from the SYNOP station 11120 cover 5 years from 2015-01-01 to 2019-12-31. The NWP data are derived from GEFS reforecasts (Hamill et al. 2013). The data contain the following variables:
init
: Time of initialization of the NWP model.
obs_*
: Observations for lead time *
.
mean_ens_*
: NWP ensemble mean for lead time *
.
logsd_ens_*
: NWP logarithm of ensemble standard deviation for lead time *
.
yday
: Yearday.
Hamill TM, Bates GT, Whitaker JS, Murray DR, Fiorino M, Galarneau Jr TJ, Zhu Y, Lapenta W (2013). NOAA's Second-Generation Global Medium-Range Ensemble Reforecast Data Set. Bulletin of the American Meteorological Society, 94(10), 1553-1565.
## Not run: ## Innsbruck temperature data. data("TempIbk", package = "bamlss") ## Five lead times. lead <- seq(192, 216, by = 6) ## Set up formulas. f <- c( ## mu equations sprintf('obs_%s ~ s(yday, bs = "cc") + s(yday, bs = "cc", by = mean_ens_%s)', lead, lead), ## lambda diag equations sprintf('lamdiag%s ~ s(yday, bs = "cc") + s(yday, bs = "cc", by = logsd_ens_%s)', 1:5, lead), ## lambda off-diag equations sprintf('lambda%s ~ s(yday, bs = "cc")', apply(combn(1:5, 2), 2, paste, collapse = "")) ) f <- lapply(f, as.formula) ## Multivariate normal family with basic Cholesky parameterization. fam <- mvnchol_bamlss(k = 5, type = "basic") ## Fit model. set.seed(123) b <- bamlss(f, family = fam, data = TempIbk, optimizer = opt_boost, maxit = 1000) ## Show estimated effects. par(mfrow = c(2, 2)) plot(b, model = "mu1", scale = 0, spar = FALSE) plot(b, model = "lamdiag2", term = "s(yday)", spar = FALSE) plot(b, model = "lambda12") ## Predict sample case. nd <- subset(TempIbk, format(init, "%Y-%m-%d") %in% c("2015-01-03", "2015-10-10")) fit <- predict(b, newdata = nd, type = "parameter") ## Plot correlation matrix for GEFS initialization 2015-10-10. plot_cor <- function(i) { image(lead, lead, fam$correlation(fit)[[i]][5:1, ], zlim = c(0, 1), col = hcl.colors(10, "Blues 3", rev = TRUE), axes = FALSE, xlab = "lead time in hours", ylab = "lead time in hours", main = sprintf("Correlation matrix fitted for %s", nd[i, "init"])) axis(1, lead) axis(2, lead, rev(lead)) box() } par(mfrow = c(1, 2)) plot_cor(1) plot_cor(2) ## Plot means and standard deviations. plot_ms <- function(i) { stdev <- fam$stdev(fit)[[i]] means <- fam$means(fit)[[i]] lower <- means - stdev upper <- means + stdev plot(lead, means, type = 'b', cex = 2, lwd = 1, lty = 2, axes = FALSE, ylim = c(-6, 16), # c(min(lower), max(upper)), ylab = expression("Temperature in " * degree * "C"), xlab = "lead time in hours", main = sprintf("Means +/- one st. dev. for %s", nd[i, "init"])) segments(lead, y0 = lower, y1 = upper) axis(1, lead) axis(2) box() } par(mfrow = c(1, 2)) plot_ms(1) plot_ms(2) ## End(Not run)
Extract terms.object
s for BAMLSS.
## S3 method for class 'bamlss' terms(x, specials = NULL, data = NULL, model = NULL, pterms = TRUE, sterms = TRUE, drop = TRUE, ...) ## S3 method for class 'bamlss.frame' terms(x, specials = NULL, data = NULL, model = NULL, pterms = TRUE, sterms = TRUE, drop = TRUE, ...) ## S3 method for class 'bamlss.formula' terms(x, specials = NULL, data = NULL, model = NULL, pterms = TRUE, sterms = TRUE, drop = TRUE, ...)
x |
An |
specials |
See |
data |
Data passed to |
model |
Character or integer, specifies the model for which terms should be returned. |
pterms |
Should parametric terms be part of the object? |
sterms |
Should smooth terms be part of the object? |
drop |
If terms for only one |
... |
Arguments passed to |
Object of class "bamlss.terms"
, a list of terms.object
s, depending on
the structure of the bamlss.formula
object.
bamlss
, bamlss.frame
, bamlss.formula
.
## Model formula. f <- list( num ~ x1 + x2 + id, sigma ~ x3 + fac + lon + lat ) ## Create the terms object. terms(bamlss.formula(f))
The transformer function takes a bamlss.frame
object and transforms
the response and the design matrices to account for lag 1 autocorrelation. The method
is also known as Prais-Winsten estimation.
trans_AR1(rho = 0.1) AR1(rho = 0.1)
rho |
Specifies the correlation parameter at lag 1. |
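The underlying transformation is the usual Prais-Winsten quasi-differencing: the first observation is scaled by sqrt(1 - rho^2), and every later observation is replaced by its value minus rho times the previous one (likewise for each column of the design matrix). A minimal sketch of this standard transformation follows; the package implementation may organize it differently.

```r
## Prais-Winsten (AR1) transformation of a response y and design matrix X.
ar1_transform <- function(y, X, rho = 0.1) {
  n <- length(y)
  ystar <- c(sqrt(1 - rho^2) * y[1], y[-1] - rho * y[-n])
  Xstar <- rbind(sqrt(1 - rho^2) * X[1, , drop = FALSE],
    X[-1, , drop = FALSE] - rho * X[-n, , drop = FALSE])
  list(y = ystar, X = Xstar)
}
```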
A transformer function which can be used in the bamlss
call.
Johnston, John (1972). Econometric Methods (2nd ed.). New York: McGraw-Hill. pp. 259–265.
bamlss.frame
, bamlss
, smooth2random
.
## Not run: ## Simulate AR1 data. set.seed(111) n <- 240 d <- data.frame("t" = 1:n) ## Nonlinear function. f <- function(x) { 2 + sin(x / n * 2 * pi - pi) } ## Correlated errors. rho <- 0.8 e <- rnorm(n, sd = 0.1) u <- c(e[1], rep(NA, n - 1)) for(i in 2:n){ u[i] <- rho * u[i - 1] + e[i] } ## Response. d$y <- f(d$t) + u ## Plot time-series data. plot(d, type = "l") ## Estimate models without and with AR1 transformation. b0 <- bamlss(y ~ s(t,k=20), data = d, criterion = "BIC") b1 <- bamlss(y ~ s(t,k=20), data = d, criterion = "BIC", transform = AR1(rho = 0.8)) ## Estimate full AR1 model. b2 <- bamlss(y ~ s(t,k=20), data = d, criterion = "BIC", family = "AR1") rho <- predict(b2, model = "rho", type = "parameter") print(range(rho)) ## Estimated standard deviations. sd0 <- predict(b0, model = "sigma", type = "parameter") sd1 <- predict(b1, model = "sigma", type = "parameter") sd2 <- predict(b2, model = "sigma", type = "parameter") print(round(c(sd0[1], sd1[1], sd2[1]), 2)) ## Plot fitted trends. p0 <- predict(b0, model = "mu") p1 <- predict(b1, model = "mu") p2 <- predict(b2, model = "mu") plot(d, type = "l") lines(f(d$t) ~ d$t, col = 2, lwd = 2) lines(p0 ~ d$t, col = 4, lwd = 2) lines(p1 ~ d$t, col = 3, lwd = 3) lines(p2 ~ d$t, col = 5, lwd = 3) legend("topleft", c("no trans", "with trans", "AR1 model", "truth"), lwd = 2, col = c(4, 3, 5, 2), bty = "n") ## End(Not run)
This function creates a data set based on the volcano
data by adding normal errors to the topographic information.
Volcano(sd = 0.3)
sd |
The standard deviation of the normal errors. |
A data frame with coordinates and noisy elevation.
d <- Volcano() head(d) ## Not run: b <- bamlss(y ~ te(lon,lat,k=10), data = d) plot(b, theta = -130) ## End(Not run)
Function returning the Watanabe-Akaike Information Criterion (WAIC) of a fitted model object.
WAIC(object, ..., newdata = NULL)
object |
A fitted model object which contains MCMC samples. |
... |
Optionally more fitted model objects. |
newdata |
Optionally, use new data for computing the WAIC. |
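In its standard form the criterion is computed from a matrix of pointwise log-likelihood values evaluated at the MCMC samples as WAIC = -2 * (lppd - pWAIC), where lppd sums the log of the posterior mean of the pointwise likelihoods and pWAIC sums the posterior variances of the pointwise log-likelihoods. A small sketch of this computation with stand-in values (whether the package uses exactly this variant is not detailed here):

```r
## WAIC from a matrix ll of pointwise log-likelihoods
## (rows = MCMC samples, columns = observations); values are stand-ins.
set.seed(1)
ll <- matrix(dnorm(rnorm(200), log = TRUE), nrow = 20)
lppd   <- sum(log(colMeans(exp(ll))))
p_waic <- sum(apply(ll, 2, var))
waic   <- -2 * (lppd - p_waic)
```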
A data frame containing the WAIC and estimated number of parameters.
Watanabe S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. The Journal of Machine Learning Research, 11, 3571–3594. https://jmlr.org/papers/v11/watanabe10a.html
## Not run: d <- GAMart() b1 <- bamlss(num ~ s(x1), data = d) b2 <- bamlss(num ~ s(x1) + s(x2), data = d) WAIC(b1, b2) ## End(Not run)