Title: | Efficient Implementation of Gaussian Process in Bayesian Hierarchical Models |
---|---|
Description: | Implements Bayesian hierarchical models with flexible Gaussian process priors, focusing on Extended Latent Gaussian Models and incorporating various Gaussian process priors for Bayesian smoothing. Computations leverage finite element approximations and adaptive quadrature for efficient inference. Methods are detailed in Zhang, Stringer, Brown, and Stafford (2023) <doi:10.1177/09622802221134172>; Zhang, Stringer, Brown, and Stafford (2024) <doi:10.1080/10618600.2023.2289532>; Zhang, Brown, and Stafford (2023) <doi:10.48550/arXiv.2305.09914>; and Stringer, Brown, and Stafford (2021) <doi:10.1111/biom.13329>. |
Authors: | Ziang Zhang [aut, cre], Yongwei Lin [aut], Alex Stringer [aut], Patrick Brown [aut] |
Maintainer: | Ziang Zhang <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.3 |
Built: | 2024-11-13 07:27:33 UTC |
Source: | CRAN |
A simulated dataset from the case-crossover model.
ccData
ccData
'ccData' A data frame with 3596 rows and 6 columns.
Compute the SD correction factor for sgp
compute_d_step_sgpsd(d, a)
compute_d_step_sgpsd(d, a)
d |
A numeric value for the prediction step. |
a |
The frequency parameter of the sgp. |
The correction factor c that should be used to compute the d-step PSD as c*SD.
Computing the posterior samples of the function or its derivative using the posterior samples of the basis coefficients for iwp
compute_post_fun_iwp( samps, global_samps = NULL, knots, refined_x, p, degree = 0, intercept_samps = NULL )
compute_post_fun_iwp( samps, global_samps = NULL, knots, refined_x, p, degree = 0, intercept_samps = NULL )
samps |
A matrix that consists of posterior samples for the O-spline basis coefficients. Each column represents a particular sample of coefficients, and each row is associated with one basis function. This can be extracted using 'sample_marginal' function from 'aghq' package. |
global_samps |
A matrix that consists of posterior samples for the global basis coefficients. If NULL, assume there will be no global polynomials and the boundary conditions are exactly zero. |
knots |
A vector of knots used to construct the O-spline basis, first knot should be viewed as "0", the reference starting location. These k knots will define (k-1) basis function in total. |
refined_x |
A vector of locations to evaluate the O-spline basis |
p |
An integer value indicates the order of smoothness |
degree |
The order of the derivative to take, if zero, implies to consider the function itself. |
intercept_samps |
A matrix that consists of posterior samples for the intercept parameter. If NULL, assume the function evaluated at zero is zero. |
A data.frame that contains different samples of the function or its derivative, with the first column being the locations of evaluations x = refined_x.
knots <- c(0, 0.2, 0.4, 0.6) samps <- matrix(rnorm(n = (3 * 10)), ncol = 10) result <- compute_post_fun_iwp(samps = samps, knots = knots, refined_x = seq(0, 1, by = 0.1), p = 2) plot(result[, 2] ~ result$x, type = "l", ylim = c(-0.3, 0.3)) for (i in 1:9) { lines(result[, (i + 1)] ~ result$x, lty = "dashed", ylim = c(-0.1, 0.1)) } global_samps <- matrix(rnorm(n = (2 * 10), sd = 0.1), ncol = 10)
knots <- c(0, 0.2, 0.4, 0.6) samps <- matrix(rnorm(n = (3 * 10)), ncol = 10) result <- compute_post_fun_iwp(samps = samps, knots = knots, refined_x = seq(0, 1, by = 0.1), p = 2) plot(result[, 2] ~ result$x, type = "l", ylim = c(-0.3, 0.3)) for (i in 1:9) { lines(result[, (i + 1)] ~ result$x, lty = "dashed", ylim = c(-0.1, 0.1)) } global_samps <- matrix(rnorm(n = (2 * 10), sd = 0.1), ncol = 10)
Computing the posterior samples of the function using the posterior samples of the basis coefficients for sGP
compute_post_fun_sgp( samps, global_samps = NULL, k, refined_x, a, region, boundary = TRUE, m, intercept_samps = NULL, initial_location = NULL )
compute_post_fun_sgp( samps, global_samps = NULL, k, refined_x, a, region, boundary = TRUE, m, intercept_samps = NULL, initial_location = NULL )
samps |
A matrix that consists of posterior samples for the O-spline basis coefficients. Each column represents a particular sample of coefficients, and each row is associated with one basis function. This can be extracted using 'sample_marginal' function from 'aghq' package. |
global_samps |
A matrix that consists of posterior samples for the global basis coefficients. If NULL, assume there will be no global polynomials and the boundary conditions are exactly zero. |
k |
The number of the sB basis. |
refined_x |
A vector of locations to evaluate the sB basis |
a |
The frequency of sGP. |
region |
The region to define the sB basis |
boundary |
A boolean variable to indicate whether the boundary condition should be considered in the prediction. |
m |
The number of harmonics to consider |
intercept_samps |
A matrix that consists of posterior samples for the intercept parameter. If NULL, assume there is no intercept samples to adjust. |
initial_location |
The initial location of the sGP. |
A data.frame that contains different samples of the function, with the first column being the locations of evaluations x = refined_x.
Constructing the precision matrix given the knot sequence
compute_weights_precision(object)
compute_weights_precision(object)
object |
A fitted model object. |
A precision matrix of the corresponding basis function, should be diagonal matrix with size (k-1) by (k-1).
Constructing the precision matrix given the knot sequence (helper)
compute_weights_precision_helper(x)
compute_weights_precision_helper(x)
x |
A vector of knots used to construct the O-spline basis, first knot should be viewed as "0", the reference starting location. These k knots will define (k-1) basis function in total. |
A precision matrix of the corresponding basis function, should be diagonal matrix with size (k-1) by (k-1).
compute_weights_precision_helper(x = c(0,0.2,0.4,0.6,0.8))
compute_weights_precision_helper(x = c(0,0.2,0.4,0.6,0.8))
A subset of the the COVID-19 daily death data collected between 2020 to 2022. The data is obtained from COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Dong et al., 2020).
covid_canada
covid_canada
'covid_canada' A data frame with 787 rows and 5 columns:
The date of the measurement.
The number of new deaths at that date.
The converted numerical value of 'Date'.
Coded as 1 if the date is the corresponding weekday, -1 else if the date is on Sunday, and 0 otherwise.
The index of that observation.
Dong, E., H. Du, and L. Gardner (2020). An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases 20 (5), 533–534.
This function allows for the dynamic modification of a C++ template within the BayesGP package. Users can specify custom content for the log-likelihood as well as the log-prior of the SD parameter in the template.
custom_template( SETUP = NULL, LOG_LIKELIHOOD = NULL, LOG_PRIOR = NULL, compile_template = FALSE )
custom_template( SETUP = NULL, LOG_LIKELIHOOD = NULL, LOG_PRIOR = NULL, compile_template = FALSE )
SETUP |
A character string or vector containing the lines of C++ code to be inserted before the computation of log-likelihood. |
LOG_LIKELIHOOD |
A character string or vector containing the lines of C++ code to be inserted in the log-likelihood section of the template. Should be NULL if no changes are to be made to this section. |
LOG_PRIOR |
A character string or vector containing the lines of C++ code to be inserted in the log-prior (of the SD parameter) section of the template. Should be NULL if no changes are to be made to this section. |
compile_template |
A indicator of whether the new template should be compiled. default is FALSE. |
A string representing the path to the temporary .so (or .dll) file containing the compiled custom C++ code.
Construct posterior inference given samples
extract_mean_interval_given_samps(samps, level = 0.95, quantiles = NULL)
extract_mean_interval_given_samps(samps, level = 0.95, quantiles = NULL)
samps |
Posterior samples of f or its derivative, with the first column being evaluation points x. This can be yielded by 'compute_post_fun_iwp' function. |
level |
The level to compute the pointwise interval. Ignored when quantiles are provided. |
quantiles |
A numeric vector of quantiles to be computed. |
A dataframe with a column for evaluation locations x, and posterior mean and pointwise intervals at that set of locations.
Function defined to enhance the usability for users on IDEs.
f( smoothing_var, model = "iid", sd.prior = NULL, boundary.prior = NULL, initial_location = c("middle", "left", "right") )
f( smoothing_var, model = "iid", sd.prior = NULL, boundary.prior = NULL, initial_location = c("middle", "left", "right") )
smoothing_var |
The variable name of the smoothing variable. |
model |
The name of the smoothing model. |
sd.prior |
A list/vector that specifies the prior of the sd parameter. |
boundary.prior |
A list/vector that specifies the prior of the boundary parameter. |
initial_location |
A character/number that specifies the initial location of the smoothing variable. |
A call object that can be used in the model_fit formula to indicate a smooth term or random effect.
This function takes an optional list of options and fills in any missing values with a set of default MCMC options.
get_default_option_list_MCMC(option_list = list())
get_default_option_list_MCMC(option_list = list())
option_list |
A list of options to be passed to the MCMC. If some options are missing, the function will use default values. |
A list containing the complete set of options with defaults where necessary.
# Example: Get the default option list options <- get_default_option_list_MCMC() print(options)
# Example: Get the default option list options <- get_default_option_list_MCMC() print(options)
Constructing and evaluating the global polynomials, to account for boundary conditions (design matrix)
global_poly_helper(x, p = 2)
global_poly_helper(x, p = 2)
x |
A vector of locations to evaluate the global polynomials |
p |
An integer value indicates the order of smoothness |
A matrix with i,j componet being the value of jth basis function value at ith element of x, the ncol should equal to p, and nrow should equal to the number of elements in x
global_poly_helper(x = c(0, 0.2, 0.4, 0.6, 0.8), p = 2)
global_poly_helper(x = c(0, 0.2, 0.4, 0.6, 0.8), p = 2)
Constructing and evaluating the global polynomials, to account for boundary conditions (design matrix) of sgp
global_poly_helper_sgp(refined_x, a, m, initial_location = NULL)
global_poly_helper_sgp(refined_x, a, m, initial_location = NULL)
refined_x |
A vector of locations to evaluate the sB basis |
a |
The frequency of sgp. |
m |
The number of harmonics to consider |
initial_location |
The value of the initial location. If NULL, the minimum value of refined_x will be used. |
A matrix with i,j componet being the value of jth basis function value at ith element of x, the ncol should equal to (2*m), and nrow should equal to the number of elements in x
Constructing and evaluating the local O-spline basis (design matrix)
local_poly_helper(knots, refined_x, p = 2, neg_sign_order = 0)
local_poly_helper(knots, refined_x, p = 2, neg_sign_order = 0)
knots |
A vector of knots used to construct the O-spline basis, first knot should be viewed as "0", the reference starting location. These k knots will define (k-1) basis function in total. |
refined_x |
A vector of locations to evaluate the O-spline basis |
p |
An integer value indicates the order of smoothness |
neg_sign_order |
An integer value N such that D = ((-1)^N)*D for the splines at negative knots. Default is 0. |
A matrix with i,j component being the value of jth basis function value at ith element of refined_x, the ncol should equal to number of knots minus 1, and nrow should equal to the number of elements in refined_x.
local_poly_helper(knots = c(0, 0.2, 0.4, 0.6, 0.8), refined_x = seq(0, 0.8, by = 0.1), p = 2)
local_poly_helper(knots = c(0, 0.2, 0.4, 0.6, 0.8), refined_x = seq(0, 0.8, by = 0.1), p = 2)
Fitting a hierarchical model based on the provided formula, data and parameters such as type of method and family of response. Returning the S4 objects for the random effects, concatenated design matrix for the intercepts and fixed effects, fitted model, indexes to partition the posterior samples.
model_fit( formula, data, method = "aghq", family = "gaussian", control.family, control.fixed, aghq_k = 4, size = NULL, cens = NULL, weight = NULL, strata = NULL, M = 3000, customized_template = NULL, Customized_RE = NULL, option_list = list(), envir = parent.frame(), extra_theta_num = NULL )
model_fit( formula, data, method = "aghq", family = "gaussian", control.family, control.fixed, aghq_k = 4, size = NULL, cens = NULL, weight = NULL, strata = NULL, M = 3000, customized_template = NULL, Customized_RE = NULL, option_list = list(), envir = parent.frame(), extra_theta_num = NULL )
formula |
A formula that contains one response variable, and covariates with either random or fixed effect. |
data |
A dataframe that contains the response variable and other covariates mentioned in the formula. |
method |
The inference method used in the model. By default, the method is set to be "aghq". |
family |
The family of response used in the model. By default, the family is set to be "gaussian". |
control.family |
Parameters used to specify the priors for the family parameters, such as the standard deviation parameter of Gaussian family. For example control.family = 1 in the Gaussian family corresponds to an Exponential prior to the standard deviation parameter of the Gaussian noise with median 1. When left unspecified, the default prior is an Exponential prior with median 1. |
control.fixed |
Parameters used to specify the priors for the fixed effects. For example control.fixed = list(intercept = list(prec = 0.001, mean = 0)) will setup the prior N(0,1/0.001) for the intercept parameter. When left unspecified, all fixed effect parameters will be assigned independent N(0,1/0.001) priors. |
aghq_k |
An integer to specify the number of quadrature points used in the aghq method. By default, the value is 4. |
size |
The name of the size variable, should be one of the variables in 'data'. The default value is "NULL", corresponding to a vector of 1s. This is only used for the Binomial family, and denotes the number of binomial trails. |
cens |
The name of the right-censoring indicator, should be one of the variables in 'data'. The default value is "NULL". This is only used for the CoxPH family. |
weight |
The name of the weight variable, should be one of the variables in 'data'. The default value is "NULL", corresponding to a vector of 1s. This is only used for the Case-Crossover family, and denotes the weight of each observation. |
strata |
The name of the strata variable, should be one of the variables in 'data'. The default value is "NULL". This is only used for the Case-Crossover family, and denotes the strata of each observation. |
M |
The number of posterior samples to be taken, by default is 3000. |
customized_template |
The name of the customized cpp template that the user wants to use instead. By default this is NULL, and the cpp template 'BayesGP' will be used. |
Customized_RE |
The list that contains the compute_B and compute_P functions for the customized random effect. By default, this is NULL and there is not customized random effect in the model. |
option_list |
A list that controls the details of the inference algorithm, by default is an empty list. |
envir |
The environment in which the formula and other expressions are to be evaluated. Defaults to 'parent.frame()', which refers to the environment from which the function was called. This allows the function to access variables that are defined in the calling function's scope. |
extra_theta_num |
An integer number to indicate the extra number of parameters required in the 'theta' vector. Should only be specified when using customized template. |
A list that contains following items: the S4 objects for the random effects (instances), concatenated design matrix for the fixed effects (design_mat_fixed), fitted aghq (mod) and indexes to partition the posterior samples (boundary_samp_indexes, random_samp_indexes and fixed_samp_indexes).
Performs repeated model fitting over a sequence of values for a specified variable within a hierarchical model. This function repeatedly fits a model for each value of the looping variable, compiles the log marginal likelihoods, and calculates the posterior probabilities for the variable's values.
model_fit_loop( loop_holder = "LOOP", loop_values, prior_func = function(x) { 1 }, parallel = FALSE, cores = (parallel::detectCores() - 1), ... )
model_fit_loop( loop_holder = "LOOP", loop_values, prior_func = function(x) { 1 }, parallel = FALSE, cores = (parallel::detectCores() - 1), ... )
loop_holder |
A string specifying the name of the variable to loop over. The default value is 'LOOP'. |
loop_values |
A numeric vector containing the values to loop over for the specified variable. |
prior_func |
A function that takes the specified loop_values and returns the values of the prior for the loop variable. By default, it is a uniform prior which returns a constant value, indicating equal probability for all values. |
parallel |
Logical, indicating whether or not to run the model fitting in parallel (default is FALSE). |
cores |
The number of cores to use for parallel execution (default is detected cores - 1). |
... |
Additional arguments passed to the model fitting function 'model_fit'. |
A data frame containing the values of the looping variable, their corresponding log marginal likelihoods, and posterior probabilities.
Obtain the posterior and prior density of all the parameters in the fitted model
para_density(object)
para_density(object)
object |
The fitted object from the function 'model_fit'. |
A list of data frames, each data frame contains the posterior density of the corresponding parameter.
The monthly all-cause mortality for male with age less than 40 in Pennsylvania.
PEN_death
PEN_death
'PEN_death' A data frame with 299 rows of observations.
Obtain the posterior summary table for all the parameters in the fitted model
post_table(object, quantiles = c(0.025, 0.975), digits = 3)
post_table(object, quantiles = c(0.025, 0.975), digits = 3)
object |
The fitted object from the function 'model_fit'. |
quantiles |
The specified quantile to display the posterior summary, default is c(0.025, 0.975). |
digits |
The significant digits to be kept in the result, default is 3. |
A data frame that contains the posterior summary of all the parameters in the fitted model.
To predict the GP component in the fitted model, at the locations specified in 'newdata'.
## S3 method for class 'FitResult' predict( object, newdata = NULL, variable, deriv = 0, include.intercept = TRUE, only.samples = FALSE, quantiles = c(0.025, 0.5, 0.975), boundary.condition = "Yes", ... )
## S3 method for class 'FitResult' predict( object, newdata = NULL, variable, deriv = 0, include.intercept = TRUE, only.samples = FALSE, quantiles = c(0.025, 0.5, 0.975), boundary.condition = "Yes", ... )
object |
The fitted object from the function 'model_fit'. |
newdata |
The dataset that contains the locations to be predicted for the specified GP. Its column names must include 'variable'. |
variable |
The name of the variable to be predicted, should be in the 'newdata'. |
deriv |
The degree of derivative that the user specifies for inference. Only applicable for a GP in the 'iwp' type. |
include.intercept |
A logical variable specifying whether the intercept should be accounted when doing the prediction. The default is TRUE. For Coxph model, this variable will be forced to FALSE. |
only.samples |
A logical variable indicating whether only the posterior samples are required. The default is FALSE, and the summary of posterior samples will be reported. |
quantiles |
A numeric vector of quantiles that predict.FitResult will produce, the default is c(0.025, 0.5, 0.975). |
boundary.condition |
A string specifies whether the boundary.condition should be considered in the prediction, should be one of c("yes", "no", "only"). The default option is "Yes". |
... |
Other arguments to be passed to the function. |
A data.frame that contains the posterior mean and pointwise intervals (or posterior samples) at the locations specified in 'newdata'.
Construct prior based on d-step prediction SD (for iwp)
prior_conversion_iwp(d, prior, p)
prior_conversion_iwp(d, prior, p)
d |
A numeric value for the prediction step. |
prior |
A list that contains alpha and u. This specifies the target prior on the d-step SD |
p |
An integer for the order of iwp. |
A list that contains alpha and u. The prior for the smoothness parameter such that
, that yields the ideal prior on the d-step SD.
Construct prior based on d-step prediction SD (for sgp)
prior_conversion_sgp(d, prior, freq, period, a, m = 1)
prior_conversion_sgp(d, prior, freq, period, a, m = 1)
d |
A numeric value for the prediction step. |
prior |
A list that contains alpha and u. This specifies the target prior on the d-step SD |
freq |
The frequency of the sgp, ignored if a is provided. |
period |
The period of the sgp, ignored if a or freq is provided. |
a |
The frequency parameter of the sgp. |
m |
The number of harmonics that should be considered, by default m = 1 represents only the sgp. |
A list that contains alpha and u. The prior for the smoothness parameter such that
, that yields the ideal prior on the d-step SD.
Extract the posterior samples from the fitted model for the target fixed variables.
sample_fixed_effect(model_fit, variables)
sample_fixed_effect(model_fit, variables)
model_fit |
The result from model_fit(). |
variables |
A vector of names of the target fixed variables to sample. |
A matrix with columns being the posterior samples of the target fixed effect variables.
Obtain the posterior density of a SD parameter in the fitted model
sd_density( object, component = NULL, h = NULL, theta_logprior = NULL, MCMC_samps_only = FALSE )
sd_density( object, component = NULL, h = NULL, theta_logprior = NULL, MCMC_samps_only = FALSE )
object |
The fitted object from the function 'model_fit'. |
component |
The component of the SD parameter that you want to show. By default this is 'NULL', indicating the family.sd is of interest. |
h |
For PSD, the unit of predictive step to consider, by default is set to 'NULL', indicating the result is using the same 'h' as in the model fitting. |
theta_logprior |
The log prior function used on the selected SD parameter. By default is 'NULL', and the current Exponential prior will be used. |
MCMC_samps_only |
For model fitted with MCMC, whether only the posterior samples are needed. |
A data.frame that contains the posterior and prior densities of the SD parameter. The SD parameter will be converted into PSD if applicable.
Plot the posterior density of a SD parameter in the fitted model
sd_plot(object, component = NULL, h = NULL, theta_logprior = NULL)
sd_plot(object, component = NULL, h = NULL, theta_logprior = NULL)
object |
The fitted object from the function 'model_fit'. |
component |
The component of the SD parameter that you want to show. By default this is 'NULL', indicating the family.sd is of interest. |
h |
For PSD, the unit of predictive step to consider, by default is set to 'NULL', indicating the result is using the same 'h' as in the model fitting. |
theta_logprior |
The log prior function used on the selected SD parameter. By default is 'NULL', and the current Exponential prior will be used. |
A plot that shows the posterior and prior density of the SD parameter. The SD parameter will be converted into PSD if applicable.