Title: | Bayesian Non- And Semi-Parametric Model Fitting |
---|---|
Description: | MCMC algorithms & processing functions for: 1. single response multiple regression, see Papageorgiou, G. (2018) <doi: 10.32614/RJ-2018-069>, 2. multivariate response multiple regression, with nonparametric models for the means, the variances and the correlation matrix, with variable selection, see Papageorgiou, G. and Marshall, B. C. (2020) <doi: 10.1080/10618600.2020.1739534>, 3. joint mean-covariance models for multivariate responses, see Papageorgiou, G. (2022) <doi: 10.1002/sim.9376>, and 4.Dirichlet process mixtures, see Papageorgiou, G. (2019) <doi: 10.1111/anzs.12273>. |
Authors: | Georgios Papageorgiou |
Maintainer: | Georgios Papageorgiou <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.2.3 |
Built: | 2024-12-30 08:24:24 UTC |
Source: | CRAN |
Markov chain Monte Carlo algorithms for non- and semi-parametric models: 1. spike-slab variable selection in multivariate mean/variance regression models with function mvrm
, 2. joint mean-covariance models for multivariate longitudinal responses with function lmrm
, and 3. Dirichlet process mixture models with function dpmj
.
Package: | BNSP |
Type: | Package |
Version: | 2.2.3 |
Date: | 2023-05-25 |
License: | GPL (>=2) |
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
For details on the GNU General Public License see http://www.gnu.org/copyleft/gpl.html or write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Georgios Papageorgiou (2014)
Maintainer: Georgios Papageorgiou <[email protected]>
Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. https://arxiv.org/abs/2012.09833
Papageorgiou, G. and Marshall, B.C. (2020). Bayesian semiparametric analysis of multivariate continuous responses, with variable selection. Journal of Computational and Graphical Statistics, 29:4, 896-909, DOI: 10.1080/10618600.2020.1739534
Papageorgiou, G. (2018). BNSP: an R Package for fitting Bayesian semiparametric regression models and variable selection. The R Journal, 10(2):526-548.
Papageorgiou, G. (2018). Bayesian density regression for discrete outcomes. Australian and New Zealand Journal of Statistics, arXiv:1603.09706v3 [stat.ME].
Papageorgiou, G., Richardson, S. and Best, N. (2015). Bayesian nonparametric models for spatially indexed data of mixed type. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77:973-999.
Amitriptyline is a prescription antidepressant. The dataset consists of measurements on 17 patients who had over-dosed on amitriptyline.
data(ami)
data(ami)
A data frame containing 17 rows and 7 columns. The columns represent
tot
total blood plasma level.
ami
amount of amitriptyline found in the plasma.
gen
gender (1 for female).
amt
amount of the drug taken.
pr
PR wave measurement.
bp
diastolic blood pressure.
qrs
QRS wave measurement.
Johnson, R. A., and Wichern, D. W. (2007), Applied Multivariate Statistical Analysis, Essex: Pearson, page 426.
Johnson, R. A., and Wichern, D. W. (2007). Applied Multivariate Statistical Analysis, Essex: Pearson.
Computes the Cholesky factorization and modified Cholesky factorizations of a real symmetric positive-definite square matrix.
chol(x, mod = TRUE, p = 1, ...)
chol(x, mod = TRUE, p = 1, ...)
x |
A symmetric, positive-definite matrix. |
mod |
Defaults to TRUE. With this choice, the function returns the modified Cholesky decomposition. When mod = FALSE, the function returns the usual Cholesky decomposition. |
p |
Relevant only when |
... |
other arguments. |
The function computes the modified Cholesky decomposition of a real symmetric positive-definite square matrix . This is given by
where is a lower tringular matrix with ones on its main diagonal and D is a block diagonal matrix with block size determined by argument
p
.
The function returns matrices and
.
Georgios Papageorgiou [email protected]
The default function from base, chol
Sigma <- matrix(c(1.21,0.18,0.13,0.41,0.06,0.23, 0.18,0.64,0.10,-0.16,0.23,0.07, 0.13,0.10,0.36,-0.10,0.03,0.18, 0.41,-0.16,-0.10,1.05,-0.29,-0.08, 0.06,0.23,0.03,-0.29,1.71,-0.10, 0.23,0.07,0.18,-0.08,-0.10,0.36),6,6) LD <- chol(Sigma) L <- LD$L D <- LD$D round(L,5) round(D,5) solve(L) %*% D %*% solve(t(L)) LD <- chol(Sigma, p = 2) L <- LD$L D <- LD$D round(L, 5) round(D, 5) solve(L) %*% D %*% solve(t(L))
Sigma <- matrix(c(1.21,0.18,0.13,0.41,0.06,0.23, 0.18,0.64,0.10,-0.16,0.23,0.07, 0.13,0.10,0.36,-0.10,0.03,0.18, 0.41,-0.16,-0.10,1.05,-0.29,-0.08, 0.06,0.23,0.03,-0.29,1.71,-0.10, 0.23,0.07,0.18,-0.08,-0.10,0.36),6,6) LD <- chol(Sigma) L <- LD$L D <- LD$D round(L,5) round(D,5) solve(L) %*% D %*% solve(t(L)) LD <- chol(Sigma, p = 2) L <- LD$L D <- LD$D round(L, 5) round(D, 5) solve(L) %*% D %*% solve(t(L))
Computes the similarity matrix.
clustering(object, ...)
clustering(object, ...)
object |
an object of class "mvrm", usually a result of a call to |
... |
other arguments. |
The function computes the similarity matrix for clustering based on corrrelations or variables.
Similarity matrix.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Allows the user to continue the sampler from the state it stopped in the previous call to mvrm
.
continue(object, sweeps, burn = 0, thin, discard = FALSE,...)
continue(object, sweeps, burn = 0, thin, discard = FALSE,...)
object |
An object of class "mvrm", usually a result of a call to |
sweeps |
The number of additional sweeps, maintaining the same thinning interval as specified in the original call to |
burn |
length of burn-in period. Defaults to zero. |
thin |
thinning parameter. Defaults to the thinning parameter chosen for |
discard |
If set to true, the previous samples are discarded. |
... |
other arguments. |
The function allows the sampler to continue from the state it last stopped.
The function returns an object of class mvrm
.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Fits Dirichlet process mixtures of joint response-covariate models, where the covariates are of mixed type while the discrete responses are represented utilizing continuous latent variables. See ‘Details’ section for a full model description and Papageorgiou (2018) for all technical details.
dpmj(formula, Fcdf, data, offset, sampler = "truncated", Xpred, offsetPred, StorageDir, ncomp, sweeps, burn, thin = 1, seed, H, Hdf, d, D, Alpha.xi, Beta.xi, Alpha.alpha, Beta.alpha, Trunc.alpha, ...)
dpmj(formula, Fcdf, data, offset, sampler = "truncated", Xpred, offsetPred, StorageDir, ncomp, sweeps, burn, thin = 1, seed, H, Hdf, d, D, Alpha.xi, Beta.xi, Alpha.alpha, Beta.alpha, Trunc.alpha, ...)
formula |
a formula defining the response and the covariates e.g. |
Fcdf |
a description of the kernel of the response variable. Currently five options are supported: 1. "poisson", 2. "negative binomial", 3. "generalized poisson", 4. "binomial" and 5. "beta binomial". The first three kernels are used for count data analysis, where the third kernel allows for both over- and under-dispersion relative to the Poisson distribution. The last two kernels are used for binomial data analysis. See ‘Details’ section for some of the kernel details. |
data |
an optional data frame, list or environment (or object coercible by ‘as.data.frame’ to a data frame) containing the variables in the model. If not found in ‘data’, the variables are taken from ‘environment(formula)’. |
offset |
this can be used to specify an a priori known component to be included in the model. This should be ‘NULL’ or a numeric vector of length equal to the sample size. One ‘offset’ term can be included in the formula, and if more are required, their sum should be used. |
sampler |
the MCMC algorithm to be utilized. The two options are |
Xpred |
an optional design matrix the rows of which include the values of the covariates |
offsetPred |
the offset term associated with the new covariates |
StorageDir |
a directory to store files with the posterior samples of models parameters and other quantities of interest. If a directory is not provided, files are created in the current directory and removed when the sampler completes. |
ncomp |
number of mixture components. It defines where the countable mixture of densities [in (1) below] is truncated.
Even if |
sweeps |
total number of posterior samples, including those discarded in burn-in period (see argument |
burn |
length of burn-in period. |
thin |
thinning parameter. |
seed |
optional seed for the random generator. |
H |
optional scale matrix of the Wishart-like prior assigned to the restricted covariance matrices |
Hdf |
optional degrees of freedom of the prior Wishart-like prior assigned to the restricted covariance matrices |
d |
optional prior mean of the mean vector |
D |
optional prior covariance matrix of the mean vector |
Alpha.xi |
an optional parameter that depends on the specified
See ‘Details’ section. |
Beta.xi |
an optional parameter that depends on the specified family.
See ‘Details’ section. |
Alpha.alpha |
optional shape parameter |
Beta.alpha |
optional rate parameter |
Trunc.alpha |
optional truncation point |
... |
Other options that will be ignored. |
Function dpmj
returns samples from the posterior distributions of the parameters of the model:
where is a univariate discrete response,
is a
-dimensional vector of mixed type covariates, and
are obtained according to
Sethuraman's (1994) stick-breaking construction:
, and for
, where
are iid samples
Beta
Let denote a discrete variable (response or covariate). It is represented as discretized version of a continuous
latent variable
.
Observed discrete
and continuous latent variable
are connected by:
where the cut-points are obtained as: ,
while for
,
Here
is the cumulative distribution function (cdf) of a standard normal variable
and
denotes an appropriate cdf. Further, latent variables are assumed to
independently follow a
distribution, where the mean and variance are restricted to be zero and one as
they are non-identifiable by the data. Choices for
are described next.
For counts, three options are supported. First, can be specified as the
cdf of a Poisson
variable. Here
denotes the Poisson rate
associated with cluster
, and
the offset term associated with sampling unit
.
Second,
can be specified as the negative binomial cdf, where
. This option allows for overdispersion within each cluster relative to the
Poisson distribution. Third,
can be specified as the Generalized Poisson cdf, where, again,
. This option allows for both over- and under-dispersion within each
cluster.
For Binomial data, two options are supported. First, may be taken to be the cdf of a
Binomial
variable, where
denotes the success probability of cluster
and
the number of trials associated with sampling unit
.
Second,
may be specified to be the beta-binomial cdf, where
.
The special case of Binomial data is treated as
Details on all kernels are provided in the two tables below. The first table provides the probability mass functions and the mean in the presence of an offset term (which may be taken to be one). The column ‘Sample’ indicates for which parameters the routine provides posterior samples. The second table provides information on the assumed priors along with the default values of the parameters of the prior distributions and it also indicates the function arguments that allow the user to alter these.
Kernel | PMF | Offset | Mean | Sample |
Poisson | |
|
|
|
Negative Binomial |
|
|
|
|
Generalized Poisson | |
|
|
|
|
||||
Binomial | |
|
|
|
Beta Binomial |
|
|
|
|
Kernel | Priors | Default Values |
Poisson | Gamma |
Alpha.xi = 1.0, Beta.xi = 0.1 |
Negative Binomial | Gamma |
Alpha.xi = c(1.0,1.0), Beta.xi = c(0.1,0.1) |
Generalized Poisson | Gamma |
|
N |
Alpha.xi = c(1.0,1.0), Beta.xi = c(0.1,1.0) | |
where denotes st.dev. |
||
Binomial | Beta |
Alpha.xi = 1.0, Beta.xi = 1.0 |
Beta Binomial | Gamma |
Alpha.xi = c(1.0,1.0), Beta.xi = c(0.1,0.1) |
Let denote the joint vector of observed continuous and discrete variables and
the corresponding vector of continuous observed and latent variables. With
denoting model parameters
associated with the
th cluster, the joint density
takes the form
where
where is the covariance matrix of the latent continuous variables and it has
diagonal elements equal to one i.e. it is a correlation matrix.
In addition to the priors defined in the table above, we specify the following:
The restricted covariance matrix is assigned a prior distribution that is based on the Wishart
distribution with degrees of freedom set by default to dimension of matrix plus two and diagonal scale matrix,
with the sub-matrix that corresponds to discrete variables taken to be the identity matrix and with sub-matrix
that corresponds to continuous variables having entries equal to 1/8 of the square of
the observed data range. Default values can be changed using arguments
H
and Hdf
.
The prior on , the non-zero part of
, is taken to be multivariate normal
.
The mean
is taken to be equal to the center of the dataset. The covariance matrix
is taken to be diagonal.
Its elements that correspond to continuous variables are set equal to 1/8 of the square of the observed data range while the
elements that correspond to binary variables are set equal to 5.
Arguments
Mu.mu
and Sigma.mu
allow the user to change the default values.
The concentration parameter is assigned a Gamma
prior over the range
, that is,
,
where
is the indicator function. The default values are
,
and
. Users can alter the default using using arguments
Alpha.alpha
, Beta.alpha
and
Turnc.alpha
.
Function dpmj
returns the following:
call |
the matched call. |
seed |
the seed that was used (in case replication of the results is needed). |
meanReg |
if |
medianReg |
if |
q1Reg |
if |
q3Reg |
if |
modeReg |
if |
denReg |
if |
denVar |
if |
Further, function dpmj
creates files where the posterior samples are written. These files are (with all file names
preceded by ‘BNSP.’):
alpha.txt |
this file contains samples from the posterior of the concentration parameters |
compAlloc.txt |
this file contains the allocations to clusters obtained during posterior sampling.
It consists of |
MeanReg.txt |
this file contains the conditional means of the response |
MedianReg.txt |
this file contains the 50% conditional quantile of the response |
muh.txt |
this file contains samples from the posteriors of the |
nmembers.txt |
this file contains |
Q05Reg.txt |
this file contains the 5% conditional quantile of the response |
Q10Reg.txt |
as above, for the 10% conditional quantile. |
Q15Reg.txt |
as above, for the 15% conditional quantile. |
Q20Reg.txt |
as above, for the 20% conditional quantile. |
Q25Reg.txt |
as above, for the 25% conditional quantile. |
Q75Reg.txt |
as above, for the 75% conditional quantile. |
Q80Reg.txt |
as above, for the 80% conditional quantile. |
Q85Reg.txt |
as above, for the 85% conditional quantile. |
Q90Reg.txt |
as above, for the 90% conditional quantile. |
Q95Reg.txt |
as above, for the 95% conditional quantile. |
Sigmah.txt |
this file contains samples from the posteriors of the |
xih.txt |
this file contains samples from the posteriors of parameters |
Updated.txt |
this file contains |
Georgios Papageorgiou [email protected]
Consul, P. C. & Famoye, G. C. (1992). Generalized Poisson regression model. Communications in Statistics - Theory and Methods, 1992, 89-109.
Papageorgiou, G. (2018). Bayesian density regression for discrete outcomes. arXiv:1603.09706v3 [stat.ME].
Papaspiliopoulos, O. (2008). A note on posterior sampling from Dirichlet mixture models. Technical report, University of Warwick.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639-650.
Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics Simulation and Computation, 36(1), 45-54.
#Bayesian nonparametric joint model with binomial response Y and one predictor X data(simD) pred<-seq(with(simD,min(X))+0.1,with(simD,max(X))-0.1,length.out=30) npred<-length(pred) # fit1 and fit2 define the same model but with different numbers of # components and posterior samples fit1 <- dpmj(cbind(Y,(E-Y))~X, Fcdf="binomial", data=simD, ncomp=10, sweeps=20, burn=10, sampler="truncated", Xpred=pred, offsetPred=30) fit2 <- dpmj(cbind(Y,(E-Y))~X, Fcdf="binomial", data=simD, ncomp=50, sweeps=5000, burn=1000, sampler="truncated", Xpred=pred, offsetPred=30) plot(with(simD,X),with(simD,Y)/with(simD,E)) lines(pred,fit2$medianReg/30,col=3,lwd=2) # with discrete covariate simD<-data.frame(simD,Xd=sample(c(0,1),300,replace=TRUE)) pred<-c(0,1) fit3 <- dpmj(cbind(Y,(E-Y))~Xd, Fcdf="binomial", data=simD, ncomp=10, sweeps=20, burn=10, sampler="truncated", Xpred=pred, offsetPred=30)
#Bayesian nonparametric joint model with binomial response Y and one predictor X data(simD) pred<-seq(with(simD,min(X))+0.1,with(simD,max(X))-0.1,length.out=30) npred<-length(pred) # fit1 and fit2 define the same model but with different numbers of # components and posterior samples fit1 <- dpmj(cbind(Y,(E-Y))~X, Fcdf="binomial", data=simD, ncomp=10, sweeps=20, burn=10, sampler="truncated", Xpred=pred, offsetPred=30) fit2 <- dpmj(cbind(Y,(E-Y))~X, Fcdf="binomial", data=simD, ncomp=50, sweeps=5000, burn=1000, sampler="truncated", Xpred=pred, offsetPred=30) plot(with(simD,X),with(simD,Y)/with(simD,E)) lines(pred,fit2$medianReg/30,col=3,lwd=2) # with discrete covariate simD<-data.frame(simD,Xd=sample(c(0,1),300,replace=TRUE)) pred<-c(0,1) fit3 <- dpmj(cbind(Y,(E-Y))~Xd, Fcdf="binomial", data=simD, ncomp=10, sweeps=20, burn=10, sampler="truncated", Xpred=pred, offsetPred=30)
This function plots the posterior distribution of the elements of correlation matrices.
histCorr(x, term = "R", plotOptions = list(),...)
histCorr(x, term = "R", plotOptions = list(),...)
x |
an object of class ‘mvrm’, as generated by function |
term |
Admits two possible values: "R" to plot samples from the posterior of the correlation matrix |
plotOptions |
ggplot type options. |
... |
other arguments. |
Use this function to visualize the elements of a correlation matrix.
Posterior distributions of elements of correlation matrices.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See ‘Details’ section for a full description of the model.
lmrm(formula, data = list(), centre=TRUE, id, time, sweeps, burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)", sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)", c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)", tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)
lmrm(formula, data = list(), centre=TRUE, id, time, sweeps, burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)", sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)", c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)", tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)
formula |
a formula defining the responses and the covariates in the 5 regression models e.g. |
data |
a data frame. |
centre |
Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred. |
id |
identifiers of the individuals or other sampling units that are observed over time. |
time |
a vector input that specifies the time of observation |
sweeps |
total number of posterior samples, including those discarded in burn-in period
(see argument |
burn |
length of burn-in period. |
thin |
thinning parameter. |
seed |
optional seed for the random generator. |
StorageDir |
a required directory to store files with the posterior samples of models parameters. |
c.betaPrior |
The inverse Gamma prior of |
pi.muPrior |
The Beta prior of |
c.alphaPrior |
The inverse Gamma prior of |
pi.phiPrior |
The Beta prior of |
c.psiPrior |
The prior of |
sigmaPrior |
The prior of |
pi.sigmaPrior |
The Beta prior of |
corr.Model |
Specifies the model for the correlation matrices |
DP.concPrior |
The Gamma prior for the Dirichlet process concentration parameter. |
c.etaPrior |
The inverse Gamma prior of |
pi.nuPrior |
The Beta prior of |
pi.fiPrior |
The Beta prior of |
c.omegaPrior |
The prior of |
sigmaCorPrior |
The prior of |
tuneCalpha |
Starting value of the tuning parameter for sampling |
tuneSigma2 |
Starting value of the tuning parameter for sampling |
tuneCbeta |
Starting value of the tuning parameter for sampling |
tuneAlpha |
Starting value of the tuning parameter for sampling regression coefficients of the variance models
|
tuneSigma2R |
Starting value of the tuning parameter for sampling |
tuneR |
Starting value of the tuning parameter for sampling correlation matrices.
Defaults at |
tuneCpsi |
Starting value of the tuning parameter for sampling variances |
.
tuneCbCor |
Starting value of the tuning parameter for sampling |
tuneOmega |
Starting value of the tuning parameter for sampling regression coefficients of the variance models
|
tuneComega |
Starting value of the tuning parameter for sampling |
tau |
The |
FT |
Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled. |
... |
Other options that will be ignored. |
Function lmrm
returns samples from the posterior distributions of the parameters of
a regression model with normally distributed multivariate longitudinal responses. To describe the model,
let denote the vector of
responses
observed on individual
at time point
.
The observational time points
are allowed to be unequally spaced.
Further, let
denote the covariate vector that is observed along with
and that may include time,
other time-dependent covariates and time-independent ones.
In addition, let
denote the
th response vector.
With
and
cov
, the model assumes multivariate normality,
.
The means
and covariance matrices
are modelled semiparametrically in terms of covariates.
For the means one can specify semiparametric models,
This is the first of the 5 regression submodels.
To model the covariance matrix, first consider the modified Cholesky decomposition,
where
is a unit block lower triangular matrix
and
is a block diagonal matrix,
For modelling in terms of covariates,
first we separate the variances and the correlations
.
It is easy to model matrix
in terms of covariates as the only requirement on its diagonal
elements is that they are nonnegative,
This is the second of the 5 regression submodels.
For , the
element of
, one can specify semiparametric models
This is the third of the 5 regression submodels.
The elements of the correlations matrices are modelled in terms of covariate time only, hence they are denoted by
.
Subject to the positive definiteness constraint, the elements of
are modelled using a normal distribution
with location and scale parameters,
and
, modelled as
and these are the last 2 of the 5 submodels.
Function lmrm
returns samples from the posteriors of the model parameters.
Georgios Papageorgiou [email protected]
Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.
# Fit a joint mean-covariance model on the simulated dataset simD2 require(ggplot2) data(simD2) model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 # the above defines the responses and the regression models on the left and # right of "~", respectively # the first model, for the mean, is a linear function of time, this is sufficient as # the 2 responses have constant mean. # the second model, for the variances, is a smooth function of time # the third model, for the dependence structure, is a smooth function of lag, # that lmrm figures out and it does not need to be computed by the user # the fourth model, for location of the correlations, is a smooth function of time # the fifth model, for scale of the correlations, is just an intercept model ## Not run: m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2, id = id, time = time, sweeps = 2500, burn = 500, thin = 2, StorageDir = getwd(), seed = 1) plot(m1) ## End(Not run)
# Fit a joint mean-covariance model on the simulated dataset simD2 require(ggplot2) data(simD2) model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 # the above defines the responses and the regression models on the left and # right of "~", respectively # the first model, for the mean, is a linear function of time, this is sufficient as # the 2 responses have constant mean. # the second model, for the variances, is a smooth function of time # the third model, for the dependence structure, is a smooth function of lag, # that lmrm figures out and it does not need to be computed by the user # the fourth model, for location of the correlations, is a smooth function of time # the fifth model, for scale of the correlations, is just an intercept model ## Not run: m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2, id = id, time = time, sweeps = 2500, burn = 500, thin = 2, StorageDir = getwd(), seed = 1) plot(m1) ## End(Not run)
Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous multivariate responses and additive models for the mean and variance functions. The model utilizes spike-slab priors for variable selection and regularization. See ‘Details’ section for a full description of the model.
mvrm(formula, distribution = "normal", data = list(), centre = TRUE, sweeps, burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5, 0.5 * n * p)", pi.muPrior = "Beta(1, 1)", c.alphaPrior = "IG(1.1, 1.1)", sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1, 1)", c.psiPrior = "HN(1)", phiPrior = "HN(2)", pi.omegaPrior = "Beta(1, 1)", mu.RPrior = "N(0, 1)", sigma.RPrior = "HN(1)", corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5, 2)", breaksPrior = "SBeta(1, 2)", tuneCbeta, tuneCalpha, tuneAlpha, tuneSigma2, tuneCpsi, tunePsi, tunePhi, tuneR, tuneSigma2R, tuneHar, tuneBreaks, tunePeriod, tau, FT = 1, compDeviance = FALSE, ...)
mvrm(formula, distribution = "normal", data = list(), centre = TRUE, sweeps, burn = 0, thin = 1, seed, StorageDir, c.betaPrior = "IG(0.5, 0.5 * n * p)", pi.muPrior = "Beta(1, 1)", c.alphaPrior = "IG(1.1, 1.1)", sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1, 1)", c.psiPrior = "HN(1)", phiPrior = "HN(2)", pi.omegaPrior = "Beta(1, 1)", mu.RPrior = "N(0, 1)", sigma.RPrior = "HN(1)", corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5, 2)", breaksPrior = "SBeta(1, 2)", tuneCbeta, tuneCalpha, tuneAlpha, tuneSigma2, tuneCpsi, tunePsi, tunePhi, tuneR, tuneSigma2R, tuneHar, tuneBreaks, tunePeriod, tau, FT = 1, compDeviance = FALSE, ...)
formula |
a formula defining the responses and the covariates in the mean and variance models e.g. |
distribution |
The distribution for the response variables. Currently two options are supported: "normal" and "t". |
data |
a data frame. |
centre |
Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred. |
sweeps |
total number of posterior samples, including those discarded in burn-in period
(see argument |
burn |
length of burn-in period. |
thin |
thinning parameter. |
seed |
optional seed for the random generator. |
StorageDir |
a required directory to store files with the posterior samples of models parameters. |
c.betaPrior |
The inverse Gamma prior of |
pi.muPrior |
The Beta prior of |
c.alphaPrior |
The prior of |
sigmaPrior |
The prior of |
pi.sigmaPrior |
The Beta prior of |
c.psiPrior |
The prior of |
phiPrior |
The prior of |
pi.omegaPrior |
The Beta prior of |
mu.RPrior |
The normal prior for |
sigma.RPrior |
The half normal prior for |
corr.Model |
Specifies the model for the correlation matrix |
DP.concPrior |
The Gamma prior for the Dirichlet process concentration parameter. |
breaksPrior |
The prior for the shifts associated with the growth break points. The shift is taken to have a scaled Beta prior with support the (0, p) interval, where p is the period of the sin curve. The default SBeta(1, 2) is a scaled Beta(1, 2) distribution supported in the (0, p) interval. The shifts are in increasing order. |
tuneCbeta |
Starting value of the tuning parameter for sampling |
tuneCalpha |
Starting value of the tuning parameter for sampling |
tuneAlpha |
Starting value of the tuning parameter for sampling regression coefficients of the variance model
|
tuneSigma2 |
Starting value of the tuning parameter for sampling variances |
tuneCpsi |
Starting value of the tuning parameter for sampling |
tunePsi |
Starting value of the tuning parameter for sampling |
tunePhi |
Starting value of the tuning parameter for sampling |
tuneR |
Starting value of the tuning parameter for sampling correlation matrices.
Defaults at |
tuneSigma2R |
Starting value of the tuning parameter for sampling |
tuneHar |
Starting value of the tuning parameter for sampling the regression coefficients of the harmonics. Defaults at 100. |
tuneBreaks |
Starting value of the tuning parameter for sampling the shift parameters associated with growth breaks. Defaults at 0.01 times the period of the sin wave. |
tunePeriod |
Starting value of the tuning parameter for sampling the period parameter of the sin curve. Defaults to 0.01. |
tau |
The |
FT |
Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled. |
compDeviance |
Binary indicator. If set equal to 1, the deviance is computed. |
... |
Other options that will be ignored. |
Function mvrm
returns samples from the posterior distributions of the parameters of
a regression model with normally distributed multivariate responses and mean and variance functions modeled
in terms of covariates. For instance, in the presence of two responses () and two covariates
in the mean model (
) and two in the variance model (
), we may choose to fit
parametrically modelling the effects of and
and non-parametrically modelling
the effects of
and
. Smooth functions, such as
and
, are represented by basis function expansion,
where are the basis functions and
and
are regression coefficients.
The variance model can equivalently be expressed as
where . This is the parameterization that we adopt in this implementation.
Positive prior probability that the regression coefficients in the mean model are exactly
zero is achieved by defining binary variables that take value
if the associated coefficient
and
if
.
Indicators
that take value
if the associated coefficient
and
if
for the variance function
are defined analogously. We note that all coefficients in the mean and variance functions are
subject to selection except the intercepts,
and
.
Prior specification:
For the vector of non-zero regression coefficients we specify a g-prior
where is a scaled version of design matrix
of the mean model.
For the vector of non-zero regression coefficients we specify a normal prior
Independent priors are specified for the indicators variables and
as
and
.
Further, Beta priors are specified for
and
We note that blocks of regression coefficients associated with distinct covariate effects
have their own probability of selection ( or
)
and this probability has its own prior distribution.
Further, we specify inverse Gamma priors for and
For we consider inverse Gamma and half-normal priors
Lastly, for the elements of the correlation matrix, we specify normal distributions with mean and variance
,
with the priors on these two parameters being normal and half-normal, respectively. This is the common correlations model. Further,
the grouped correlations model can be specified. It considers a mixture of normal distributions for the means
. The grouped correlations model can also be specified. It clusters the variables instead of the correlations.
Function mvrm
returns the following:
call |
the matched call. |
formula |
model formula. |
seed |
the seed that was used (in case replication of the results is needed). |
data |
the dataset |
X |
the mean model design matrix. |
Z |
the variance model design matrix. |
LG |
the length of the vector of indicators |
LD |
the length of the vector of indicators |
mcpar |
the MCMC parameters: length of burn in period, total number of samples, thinning period. |
nSamples |
total number of posterior samples |
DIR |
the storage directory |
Further, function mvrm
creates files where the posterior samples are written.
These files are (with all file names preceded by ‘BNSP.’):
alpha.txt |
contains samples from the posterior of vector |
beta.txt |
contains samples from the posterior of vector |
gamma.txt |
contains samples from the posterior of the vector of the indicators
|
delta.txt |
contains samples from the posterior of the vector of the indicators
|
sigma2.txt |
contains samples from the posterior of the error variance |
cbeta.txt |
contains samples from the posterior of |
calpha.txt |
contains samples from the posterior of |
R.txt |
contains samples from the posterior of the correlation matrix |
theta.txt |
contains samples from the posterior of |
muR.txt |
contains samples from the posterior of |
sigma2R.txt |
contains samples from the posterior of |
deviance.txt |
contains the deviance, minus twice the log likelihood evaluated at the sampled values of the parameters. |
In addition to the above, for models that cluster the correlations we have
compAlloc.txt |
The cluster at which the correlations were allocated, |
nmembers.txt |
The numbers of correlations assigned to each cluster. |
DPconc.txt |
Contains samples from the posterior of the Dirichlet process concentration parameter. |
In addition to the above, for models that cluster the variables we have
compAllocV.txt |
The cluster at which the variables were allocated, |
nmembersV.txt |
The numbers of variables assigned to each cluster. |
Georgios Papageorgiou [email protected]
Papageorgiou, G. and Marshall, B.C. (2019). Bayesian semiparametric analysis of multivariate continuous responses, with variable selection. arXiv.
Papageorgiou, G. (2018). BNSP: an R Package for fitting Bayesian semiparametric regression models and variable selection. The R Journal, 10(2):526-548.
Chan, D., Kohn, R., Nott, D., & Kirby, C. (2006). Locally adaptive semiparametric estimation of the mean and variance functions in regression models. Journal of Computational and Graphical Statistics, 15(4), 915-936.
# Fit a mean/variance regression model on the cps71 dataset from package np. #This is a univariate response model require(np) require(ggplot2) data(cps71) model <- logwage ~ sm(age, k = 30, bs = "rd") | sm(age, k = 30, bs = "rd") DIR <- getwd() ## Not run: m1 <- mvrm(formula = model, data = cps71, sweeps = 10000, burn = 5000, thin = 2, seed = 1, StorageDir = DIR) #Print information and summarize the model print(m1) summary(m1) #Summarize and plot one parameter of interest alpha <- mvrm2mcmc(m1, "alpha") summary(alpha) plot(alpha) #Obtain a plot of a term in the mean model wagePlotOptions <- list(geom_point(data = cps71, aes(x = age, y = logwage))) plot(x = m1, model = "mean", term = "sm(age)", plotOptions = wagePlotOptions) plot(m1) #Obtain predictions for new values of the predictor "age" predict(m1, data.frame(age = c(21, 65)), interval = "credible") # Fit a bivariate mean/variance model on the marks dataset from package ggm # two responses: marks mechanics and vectors, and one covariate: marks on algebra model2 <- mechanics | vectors ~ sm(algebra, k = 5) | sm(algebra, k = 3) m2 <- mvrm(formula = model2, data = marks, sweeps = 100000, burn = 50000, thin = 2, seed = 1, StorageDir = DIR) plot(m2) ## End(Not run)
# Fit a mean/variance regression model on the cps71 dataset from package np. #This is a univariate response model require(np) require(ggplot2) data(cps71) model <- logwage ~ sm(age, k = 30, bs = "rd") | sm(age, k = 30, bs = "rd") DIR <- getwd() ## Not run: m1 <- mvrm(formula = model, data = cps71, sweeps = 10000, burn = 5000, thin = 2, seed = 1, StorageDir = DIR) #Print information and summarize the model print(m1) summary(m1) #Summarize and plot one parameter of interest alpha <- mvrm2mcmc(m1, "alpha") summary(alpha) plot(alpha) #Obtain a plot of a term in the mean model wagePlotOptions <- list(geom_point(data = cps71, aes(x = age, y = logwage))) plot(x = m1, model = "mean", term = "sm(age)", plotOptions = wagePlotOptions) plot(m1) #Obtain predictions for new values of the predictor "age" predict(m1, data.frame(age = c(21, 65)), interval = "credible") # Fit a bivariate mean/variance model on the marks dataset from package ggm # two responses: marks mechanics and vectors, and one covariate: marks on algebra model2 <- mechanics | vectors ~ sm(algebra, k = 5) | sm(algebra, k = 3) m2 <- mvrm(formula = model2, data = marks, sweeps = 100000, burn = 50000, thin = 2, seed = 1, StorageDir = DIR) plot(m2) ## End(Not run)
mvrm
into an object of class ‘mcmc’Reads in files where the posterior samples were written and creates an object of class
‘mcmc’ so that functions like summary
and plot
from package coda
can be used
mvrm2mcmc(mvrmObj, labels)
mvrm2mcmc(mvrmObj, labels)
mvrmObj |
An object of class ‘mvrm’ as created by a call to function |
labels |
The labels of the files to be read in. These can be one or more of: "alpha", "beta", "gamma", "delta", "sigma2", "cbeta", "calpha", "R", "muR", "sigma2R", "nmembers", "nmembersV", "compAlloc", "compAllocV", and "DPconc" and they correspond to the parameters of the model that a call to functions |
An object of class ‘mcmc’ that holds the samples from the posterior of the selected parameter.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
This function plots estimated terms that appear in the mean and variance models.
## S3 method for class 'mvrm' plot(x, model, term, response, response2, intercept = TRUE, grid = 30, centre = mean, quantiles = c(0.1, 0.9), contour = TRUE, static = TRUE, centreEffects = FALSE, plotOptions = list(), nrow, ask = FALSE, plotEmptyCluster = FALSE, combine = FALSE, ...)
## S3 method for class 'mvrm' plot(x, model, term, response, response2, intercept = TRUE, grid = 30, centre = mean, quantiles = c(0.1, 0.9), contour = TRUE, static = TRUE, centreEffects = FALSE, plotOptions = list(), nrow, ask = FALSE, plotEmptyCluster = FALSE, combine = FALSE, ...)
x |
an object of class ‘mvrm’ as generated by function |
model |
one of "mean", "stdev", or "both", specifying which model to be visualized. |
term |
the term in the selected model to be plotted. |
response |
integer number denoting the response variable to be plotted (in case there is more than one). |
response2 |
only relevant for multivariate longitudinal data. |
intercept |
specifies if an intercept should be included in the calculations. |
grid |
the length of the grid on which the term will be evaluated. |
centre |
a description of how the centre of the posterior should be measured. Usually |
quantiles |
the quantiles to be used when plotting credible regions. Plots without credible intervals may be obtained by setting this argument to NULL. |
contour |
relevant for 3D plots only. If |
static |
relevant for 3D plots only. If |
centreEffects |
if TRUE then the effects in the mean functions are centred around zero over the range of the predictor while the effects in the variance function are scaled around one. |
plotOptions |
for plots of univariate smooth terms or for plots of bivariate smooth terms where one of the
two covariates is discrete, this is a list of plot elements to give to |
nrow |
the number of rows in the figure with the plots. |
ask |
if set to TRUE, plots will be displayed one at a time. |
plotEmptyCluster |
if set to TRUE, plots of empty clusters will be displayed. Relevant for multivariate longitudinal datasets. |
combine |
Binary indicator. It can be set to TRUE to simultaneously plot two terms. One of the terms must be continuous and the other must be discrete. This makes sense to set to TRUE when wanting to visualize groups that have a common slope. |
... |
other arguments. |
Use this function to obtain predictions.
Predictions along with credible/pediction intervals
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
This function plots the posterior mean and credible intervals of the elements of correlation matrices.
plotCorr(x, term = "R", centre = mean, quantiles = c(0.1, 0.9), ...)
plotCorr(x, term = "R", centre = mean, quantiles = c(0.1, 0.9), ...)
x |
an object of class ‘mvrm’ as generated by function |
term |
R or muR, |
centre |
a description of how the centre of the posterior should be measured. Usually |
quantiles |
the quantiles to be used when plotting credible regions. Plots without credible intervals may be obtained by setting this argument to NULL. |
... |
other arguments. |
Use this function to visualize the elements of a correlation matrix.
Posterior means and credible intervals of elements of correlation matrices.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Provides predictions and posterior credible/prediction intervals for given feature vectors.
## S3 method for class 'mvrm' predict(object, newdata, interval = c("none", "credible", "prediction"), level = 0.95, ind.preds=FALSE, ...)
## S3 method for class 'mvrm' predict(object, newdata, interval = c("none", "credible", "prediction"), level = 0.95, ind.preds=FALSE, ...)
object |
an object of class "mvrm", usually a result of a call to |
newdata |
data frame of feature vectors to obtain predictions for. If newdata is missing, the function will use the feature vectors in the data frame used to fit the mvrm object. |
interval |
type of interval calculation. |
level |
the level of the credible interval. |
ind.preds |
Binary indicator. If set to TRUE the function returns additionally the predictions per individual MCMC sample. |
... |
other arguments. |
The function returns predictions of new responses or the means of the responses for given feature vectors. Predictions for new responses or the means of new responses are the same. However, the two differ in the associated level of uncertainty: response predictions are associated with wider (prediction) intervals than mean response predictions. To obtain prediction intervals (for new responses) the function samples from the normal distributions with means and variances as sampled during the MCMC run.
Predictions for given covariate/feature vectors.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Provides basic information from an mvrm fit.
## S3 method for class 'mvrm' print(x, digits = 5, ...)
## S3 method for class 'mvrm' print(x, digits = 5, ...)
x |
an object of class "mvrm", usually a result of a call to |
digits |
the number of significant digits to use when printing. |
... |
other arguments. |
The function prints information about mvrm fits.
The function provides a matched call, the number of posterior samples obtained and marginal inclusion probabilities of the terms in the mean and variance models.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
s
Provides interface between mgcv::s and BNSP. s(...)
calls
mgcv::smoothCon(mgcv::s(...),...
s(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = -1, fx = FALSE, bs = "tp", m = NA, by = NA, xt = NULL, id = NULL, sp = NULL, pc = NULL)
s(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = -1, fx = FALSE, bs = "tp", m = NA, by = NA, xt = NULL, id = NULL, sp = NULL, pc = NULL)
... |
a list of variables. See mgcv::s |
data |
see mgcv::smoothCon |
knots |
see mgcv::knots |
absorb.cons |
see mgcv::smoothCon |
scale.penalty |
see mgcv::smoothCon |
n |
see mgcv::smoothCon |
dataX |
see mgcv::smoothCon |
null.space.penalty |
see mgcv::smoothCon |
sparse.cons |
see mgcv::smoothCon |
diagonal.penalty |
see mgcv::smoothCon |
apply.by |
see mgcv::smoothCon |
modCon |
see mgcv::smoothCon |
k |
see mgcv::s |
fx |
see mgcv::s |
bs |
see mgcv::s |
m |
see mgcv::s |
by |
see mgcv::s |
xt |
see mgcv::s |
id |
see mgcv::s |
sp |
see mgcv::s |
pc |
see mgcv::s |
The most relevant arguments for BNSP users are the list of variables ...
, knots
, absorb.cons
, bs
, and by
.
A design matrix that specifies a smooth term in a model.
Georgios Papageorgiou [email protected]
Just a simulated dataset to illustrate the DO mixture model. The success probability and the covariate have a non-linear relationship.
data(simD)
data(simD)
A data frame with 300 independent observations. Three numerical vectors contain information on
Y
number of successes.
E
number of trials.
X
explanatory variable.
A simulated dataset to illustrate the multivariate longitudinal model. It consists of a bivariate vector of responses observed over 6 time points.
data(simD2)
data(simD2)
A data frame that includes observations on 50 sampling units. The data frame has 300 rows for the 50 sampling units observed over 6 time points. It has 4 columns
Y1
first response.
Y2
second response.
time
the time of observation.
id
unique sampling unit identifier.
Function used to define sinusoidal curves in the mean formula of function mvrm
.
The function is used internally to construct design matrices.
sinusoid(..., harmonics = 1, amplitude = 1, period = 0, periodRange = NULL, breaks = NULL, knots = NULL)
sinusoid(..., harmonics = 1, amplitude = 1, period = 0, periodRange = NULL, breaks = NULL, knots = NULL)
... |
a single covariate that the sinusoid term is a function of. |
harmonics |
an integer value that denotes the number of sins and cosines to be utilized in the representation of a sinusoidal curve. |
amplitude |
a positive integer. If set equal to one, it denotes a fixed amplitude. Otherwise, if set to an integer that is greater than one, it denotes the number of knots to be utilized in the representation of the time-varying amplitude. |
period |
the period of the sinusoidal wave. Values less than or equal to zero signify that the period is unknown. Positive values signify that the period is known and fixed. |
periodRange |
a vector of length two with the range of possible period values. It is required when the period is unknown. |
breaks |
the growth break points. |
knots |
the knots to be utilized in the representation of the time-varying amplitude. Relevant only when |
Use this function within calls to function mvrm
to specify sinusoidal waves in the mean function
of a regression model.
Consider the sinusoidal curve
where is the response at time
,
is an intercept term,
is a time-varying amplitude,
is the phase shift parameter,
is the period taken to be known, and
is the error term.
The period is defined by the argument
period
.
The time-varying amplitude is represented using ,
where
, the number of knots, is defined by argument
amplitude
. If amplitude = 1
, then
the amplitude is taken to be fixed: .
Further, is represented utilizing
,
where
, the number of harmonics, is defined by argument
harmonics
.
Specifies the design matrices of an mvrm
call
Georgios Papageorgiou [email protected]
# Simulate and fit a sinusoidal curve # First load releveant packages require(BNSP) require(ggplot2) require(gridExtra) require(Formula) # Simulate the data mu <- function(u) {cos(0.5 * u) * sin(2 * pi * u + 1)} set.seed(1) n <- 100 u <- sort(runif(n, min = 0, max = 2*pi)) y <- rnorm(n, mu(u), 0.1) data <- data.frame(y, u) # Define the model and call function \code{mvrm} that perfomes posterior sampling for the given # dataset and defined model model <- y ~ sinusoid(u, harmonics = 2, amplitude = 20, period = 1) ## Not run: m1 <- mvrm(formula = model, data = data, sweeps = 10000, burn = 5000, thin = 2, seed = 1, StorageDir = getwd()) # Plot x1 <- seq(min(u), max(u), length.out = 100) plotOptionsM <- list(geom_line(aes_string(x = x1, y = mu(x1)), col = 2, alpha = 0.5, lty = 2), geom_point(data = data, aes(x = u, y = y))) plot(x = m1, term = 1, plotOptions = plotOptionsM, intercept = TRUE, quantiles = c(0.005, 0.995), grid = 100, combine = 1) ## End(Not run)
# Simulate and fit a sinusoidal curve # First load releveant packages require(BNSP) require(ggplot2) require(gridExtra) require(Formula) # Simulate the data mu <- function(u) {cos(0.5 * u) * sin(2 * pi * u + 1)} set.seed(1) n <- 100 u <- sort(runif(n, min = 0, max = 2*pi)) y <- rnorm(n, mu(u), 0.1) data <- data.frame(y, u) # Define the model and call function \code{mvrm} that perfomes posterior sampling for the given # dataset and defined model model <- y ~ sinusoid(u, harmonics = 2, amplitude = 20, period = 1) ## Not run: m1 <- mvrm(formula = model, data = data, sweeps = 10000, burn = 5000, thin = 2, seed = 1, StorageDir = getwd()) # Plot x1 <- seq(min(u), max(u), length.out = 100) plotOptionsM <- list(geom_line(aes_string(x = x1, y = mu(x1)), col = 2, alpha = 0.5, lty = 2), geom_point(data = data, aes(x = u, y = y))) plot(x = m1, term = 1, plotOptions = plotOptionsM, intercept = TRUE, quantiles = c(0.005, 0.995), grid = 100, combine = 1) ## End(Not run)
Function used to define smooth effects in the mean and variance formulae of function mvrm
.
The function is used internally to construct the design matrices.
sm(..., k = 10, knots = NULL, bs = "rd")
sm(..., k = 10, knots = NULL, bs = "rd")
... |
one or two covariates that the smooth term is a function of. If two covariates are used,
they may be both continuous or one continuous and one discrete. Discrete variables should be defined as |
k |
the number of knots to be utilized in the basis function expansion. |
knots |
the knots to be utilized in the basis function expansion. |
bs |
a two letter character indicating the basis functions to be used. Currently, the options are
|
Use this function within calls to function mvrm
to specify smooth terms in the mean and/or variance function
of the regression model.
Univariate radial basis functions with basis functions or
knots are defined by
where denotes the Euclidean norm of
and
are the knots that
are chosen as the quantiles of the observed values of explanatory variable
,
with
and the remaining knots chosen as equally spaced quantiles between
and
.
Thin plate splines are defined by
where .
Radial basis functions for bivariate smooths are defined by
Specifies the design matrices of an mvrm
call
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
Provides basic information from an mvrm fit.
## S3 method for class 'mvrm' summary(object, nModels = 5, digits = 5, printTuning = FALSE, ...)
## S3 method for class 'mvrm' summary(object, nModels = 5, digits = 5, printTuning = FALSE, ...)
object |
an object of class "mvrm", usually a result of a call to |
nModels |
integer number of models with the highest posterior probability to be displayed. |
digits |
the number of significant digits to use when printing. |
printTuning |
if set to TRUE, the starting and finishig values of the tuninf parameters are displayed. |
... |
other arguments. |
Use this function to summarize mvrm fits.
The functions provides a detailed description of the specified model and priors. In addition, the function provides information about the Markov chain ran (length, burn-in, thinning) and the folder where the files with posterior samples are stored. Lastly, the function provides the mean posterior and null deviance and the mean/variance models visited most often during posterior sampling.
Georgios Papageorgiou [email protected]
#see \code{mvrm} example
#see \code{mvrm} example
te
Provides interface between mgcv::te and BNSP. te(...)
calls
mgcv::smoothCon(mgcv::te(...),...
te(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = NA, bs = "cr", m = NA, d = NA, by = NA, fx = FALSE, np = TRUE, xt = NULL, id = NULL, sp = NULL, pc = NULL)
te(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = NA, bs = "cr", m = NA, d = NA, by = NA, fx = FALSE, np = TRUE, xt = NULL, id = NULL, sp = NULL, pc = NULL)
... |
a list of variables. See mgcv::te |
data |
see mgcv::smoothCon |
knots |
see mgcv::knots |
absorb.cons |
see mgcv::smoothCon |
scale.penalty |
see mgcv::smoothCon |
n |
see mgcv::smoothCon |
dataX |
see mgcv::smoothCon |
null.space.penalty |
see mgcv::smoothCon |
sparse.cons |
see mgcv::smoothCon |
diagonal.penalty |
see mgcv::smoothCon |
apply.by |
see mgcv::smoothCon |
modCon |
see mgcv::smoothCon |
k |
see mgcv::te |
bs |
see mgcv::te |
m |
see mgcv::te |
d |
see mgcv::te |
by |
see mgcv::te |
fx |
see mgcv::te |
np |
see mgcv::te |
xt |
see mgcv::te |
id |
see mgcv::te |
sp |
see mgcv::te |
pc |
see mgcv::te |
The most relevant arguments for BNSP users are the list of variables ...
, knots
, absorb.cons
, bs
, and by
.
A design matrix that specifies a smooth term in a model.
Georgios Papageorgiou [email protected]
ti
Provides interface between mgcv::ti and BNSP. ti(...)
calls
mgcv::smoothCon(mgcv::ti(...),...
ti(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = NA, bs = "cr", m = NA, d = NA, by = NA, fx = FALSE, np = TRUE, xt = NULL, id = NULL, sp = NULL, mc = NULL, pc = NULL)
ti(..., data, knots = NULL, absorb.cons = FALSE, scale.penalty = TRUE, n = nrow(data), dataX = NULL, null.space.penalty = FALSE, sparse.cons = 0, diagonal.penalty = FALSE, apply.by = TRUE, modCon = 0, k = NA, bs = "cr", m = NA, d = NA, by = NA, fx = FALSE, np = TRUE, xt = NULL, id = NULL, sp = NULL, mc = NULL, pc = NULL)
... |
a list of variables. See mgcv::ti |
data |
see mgcv::smoothCon |
knots |
see mgcv::knots |
absorb.cons |
see mgcv::smoothCon |
scale.penalty |
see mgcv::smoothCon |
n |
see mgcv::smoothCon |
dataX |
see mgcv::smoothCon |
null.space.penalty |
see mgcv::smoothCon |
sparse.cons |
see mgcv::smoothCon |
diagonal.penalty |
see mgcv::smoothCon |
apply.by |
see mgcv::smoothCon |
modCon |
see mgcv::smoothCon |
k |
see mgcv::ti |
bs |
see mgcv::ti |
m |
see mgcv::ti |
d |
see mgcv::ti |
by |
see mgcv::ti |
fx |
see mgcv::ti |
np |
see mgcv::ti |
xt |
see mgcv::ti |
id |
see mgcv::ti |
sp |
see mgcv::ti |
mc |
see mgcv::ti |
pc |
see mgcv::ti |
The most relevant arguments for BNSP users are the list of variables ...
, knots
, absorb.cons
, bs
, and by
.
A design matrix that specifies a smooth term in a model.
Georgios Papageorgiou [email protected]