Package 'growfunctions' reference manual

Title:	Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data
Description:	Estimates a collection of time-indexed functions under either of Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality.
Authors:	Terrance Savitsky
Maintainer:	Terrance Savitsky <[email protected]>
License:	GPL (>= 3)
Version:	0.16
Built:	2025-03-07 07:08:04 UTC
Source:	CRAN

Bayesian Non-Parametric Models for Estimating a Set of Denoised, Latent Functions From an Observed Collection of Domain-Indexed Time-Series

Description

Package:	growfunctions
Type:	Package
Version:	0.16
Date:	2023-12-08
License:	GPL (>= 3)
LazyLoad:	yes

Details

Parameterizes a model for a collection of noisy time-series indexed by domain or observation unit as an additive function process plus noise process. The latent functions are modeled under both Gaussian process (GP) and intrinsic Gaussian Markov random (iGMRF) field priors. Dependence is estimated among the set of functions by selecting a prior, G, on the covariance or precision parameters of the GP and iGMRF, respectively, where G receives a Dirichlet process (DP) prior. The resulting marginal prior on the functions is a nonparametric scale mixture over the covariance or precision parameters, with G the unknown mixing measure. Draws from a DP are almost surely discrete, allowing for ties (interpreted as clusters) among the covariance or precision parameters indexed by domain or observation unit. Functions are included that permit additional inference on the clustering properties over the domains. The mixture models allow specification of multiple additive latent functions for each of the GP and iGMRF DP mixture formulations. Each function (under a GP prior) may be specified with a covariance function selected from a set of available options that allow for combinations of trend and seasonal components. The same is true for precision matrix constructions under the iGMRF DP mixture.

ESTIMATION FUNCTIONS

gpdpgrow performs Bayesian nonparametric estimation of a set of dependent, denoised latent functions under a DP mixture of GP's in an unsupervised fashion based on user input of an N x T data matrix, y, where N denotes the number of domains or units and T denotes the number of time points. The DP prior estimates the dependent structure among the covariance parameters generating each T x 1 function across the N domains. The user may specify multiple latent functions in an additive regression formulation, where each covariance kernel may be selected from the squared exponential ("se"), the rational quadratic ("rq"), or a quasi-periodic (product of a period and squared exponential to allow the periodic kernel to evolve) ("sn").

gmrfdpgrow also inputs an N x T data matrix, y, but replaces the GP prior used in gpdpgrow() with an iGMRF prior. The DP mixture is over the precision parameter, kappa, that multiplies a fixed matrix, Q, which specifies the length-scale of dependence. The user may specify multiple functions in an additive formulation, each with its own precision matrix. The precision matrix is specified via two options. Input q_type indicates whether the precision matrix is a trend ("tr") precision matrix or a seasonal ("sn") precision matrix. Integer innput q_order controls the length scale of the estimated functions and specifies the difference order of the precision matrix in the case of q_type = "tr", or the periodicity, in the case q_type = "sn". Typically, q_order is set to 1 or 2 when q_type = "tr", though other values are possible. An optional N x R predictor matrix, ksi, may be input to adjust the prior for cluster assignments, s, to produce a dependent product partition model.

MSPE Inputs a gpdpgrow() or gmrfdpgrow() object estimated where some data values deliberately set to missing (NA) and produces an out-of-sample mean square prediction error (MSPE), and a normalized MSPE (normalized by the variance of the missing test set). Both gpdpgrow() and gmrfdpgrow() returned objects also include a leave-one-out log-pseudo marginal likelihood LPML fit statistic estimated on the training set (so that no data are required to be left out). One would expect the nMSPE statistic to provide a greater penalty for model complexity since it is computed on data not used for estimation.

predict_functions Uses the model-estimated GP covariance parameters from gpdpgrow() or iGMRF precision parameters from gmrfdpgrow() to predict functions at future time points beyond the range of the data from their posterior predictive distributions. Both gpdpgrow() and gmrfdpgrow() will predict missing function values in the range of the data.

PLOT FUNCTIONS

cluster_plot inputs a returned object from either of gpdpgrow() or gmrfdpgrow() and produces two plots. The first plot creates panels for the clusters and aggregates line plots of posterior mean estimates for member denoised functions. An optional smoother may be drawn through the set of functions in each panel to differentiate the patterns expressed across clusters. A second plot renders the estimated posterior mean values (with an option for credible intervals) for a single or group of randomly-selected latent functions against the actual data values, y.

informative_plot inputs a list of returned objects, all from either of gpdpgrow() or gmrfdpgrow() (where the model type, GP or iGMRF, is communicated with input model) that compares credible intervals for covariance or precision parameters where the data are drawn from an informative sampling design (rather than as iid), so that the distribution for the population is not the same as that for the sample. One model conducts estimation in a fashion that ignores the informative design and the other incorporates sampling weights to account for the informativeness. Comparing the resulting estimated credible intervals provides a means to assess the sensitivity of estimated parameters to the sampling design. The set of objects are labeled with objects_labels with allowable inputs c("ignore","weight","iid"). One of the objects must have label "ignore", and the other must have label, weight. An additional object may also be input that is estimated from an iid sample drawn from the same population as the informative sample used for estimation under objects "ignore" and "weight". Both informative and iid samples are generated from the synthetic data function, gen_informative_sample().

fit_compare inputs a list of returned objects, each from either gpdpgrow() or gmrfdpgrow(), and plots the posterior mean estimate for a randomly-selected latent function as compared to the actual data in a set panels indexed by cluster and object. Allows comparison of fit performance of functions to data across varied model specifications.

predict_plot uses a returned object from predict_functions() to plot both estimated and predicted function values (with the option for credible intervals) where the prediction interval is outside the range of data. The plot function, cluster_plot, should be used where the predicted values are in the range of the data (and may, hence, be treated as missing values. The estimated functions are plotted in cluster_plot for the missing, as well as observed, data points).

DATA SETS (and functions to generate synthetic data sets)

cps Data derived from the Current Population Survey (CPS) administered to households in local areas within each of 51 states. These data capture a rectangular matrix of monthly-indexed all-other employment direct estimates computed from the set of household responses. The data capture 156 month observations (from 2000 - 2013) for each of the 51 states. Two objects are included: y_raw contains the 51 x 156 matrix of all-other employment counts. y contains the 51 x 156 matrix of all-other employment counts are standardization to (0,1), by state, to all them to be modeled together.

gen_informative_sample Generates an N x T population data matrix, y, and an associated n x T sample data matrix, y_obs, where the sample is drawn using a 1 or 2-stage informative process. The 1-stage sample uses unequal probability stratified sampling, while the 2-stage process samples the first stage in blocks, while the second stage samples with unequal probability from strata for selected blocks. Both block and strata memberships for population units are generated based on the variance of their time-series, y. The resulting sample is informative because the block and cluster memberships and selection probabilities are determined based on y.

Author(s)

Terrance Savitsky [email protected]

References

Terrance D. Savitsky (2016) Bayesian Nonparametric Mixture Estimation for Time-Indexed Functional Data in R, Journal of Statistical Software, Volume 72, Number 2, pages 1 – 34, doi:10.18637/jss.v072.i02.

T. D. Savitsky, D. Toth (2016) Bayesian Estimation Under Informative Sampling, Electronic Journal of Statistics, Volume 10, Number 1.

T. D. Savitsky (2016) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Survey Methodology.

Examples

{
library(growfunctions)

## load the monthly employment count 
## data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, 
## associated with 
## the years 2011 - 2013
## to examine the state level 
## employment levels 
## during the "great recession"
y_short <- cps$y[,(cps$yr_label %in% 
                  c(2011:2013))]

## run DP mixture of GP's to estimate 
## posterior distributions 
## for model parameters
## uses default setting of a single 
## "rational quadratic" 
## covariance formula
## A short number of iterations is used 
## for illustration
## Run for 500 iterations with half 
## as burn-in to 
## get a more useful result
res_gp     <- gpdpgrow(
                   y = y_short, 
                   n.iter = 3, 
                   n.burn = 1, 
                   n.thin = 1, 
                   n.tune = 0)  
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected 
## functions
fit_plots_gp   <- cluster_plot( 
  object = res_gp, units_name = "state", 
  units_label = cps$st, single_unit = FALSE, 
  credible = TRUE )
                                   
## Run the DP mixture of iGMRF's 
## to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## A short number of iterations 
## is used for illustration
## Run for 2000 iterations with 
## half as burn-in to 
## get a more useful result
res_gmrf <- gmrfdpgrow(
                 y = y_short, 
                 n.iter = 11, 
                 n.burn = 4, 
                 n.thin = 1) 
                                     
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of 
## randomly-selected functions
fit_plots_gmrf <- cluster_plot( 
  object = res_gmrf, units_name = "state", 
  units_label = cps$st, single_unit = FALSE, 
  credible = TRUE )                                    
                                     
## visual comparison of fit performance 
## between gpdpgrow() and gmrfdpgrow()
## or any two objects returned from any
## combination of these estimation
## functions
objects        <- vector("list",2)
objects[[1]]   <- res_gmrf
objects[[2]]   <- res_gp
label.object   <- c("gmrf_tr2","gp_rq")
## the map data.frame 
## object from fit_plots gp 
## includes a field that 
## identifies cluster assignments
## for each unit (or domain)
H        <- fit_plots_gp$map$cluster
fit_plot_compare_facet <- 
fit_compare( objects = objects, 
 H = H, label.object = label.object,
 y.axis.label = "normalized y",
 units_name = "state", units_label = cps$st)                                
}

{
library(growfunctions)

## load the monthly employment count 
## data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, 
## associated with 
## the years 2011 - 2013
## to examine the state level 
## employment levels 
## during the "great recession"
y_short <- cps$y[,(cps$yr_label %in% 
                  c(2011:2013))]

## run DP mixture of GP's to estimate 
## posterior distributions 
## for model parameters
## uses default setting of a single 
## "rational quadratic" 
## covariance formula
## A short number of iterations is used 
## for illustration
## Run for 500 iterations with half 
## as burn-in to 
## get a more useful result
res_gp     <- gpdpgrow(
                   y = y_short, 
                   n.iter = 3, 
                   n.burn = 1, 
                   n.thin = 1, 
                   n.tune = 0)  
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected 
## functions
fit_plots_gp   <- cluster_plot( 
  object = res_gp, units_name = "state", 
  units_label = cps$st, single_unit = FALSE, 
  credible = TRUE )
                                   
## Run the DP mixture of iGMRF's 
## to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## A short number of iterations 
## is used for illustration
## Run for 2000 iterations with 
## half as burn-in to 
## get a more useful result
res_gmrf <- gmrfdpgrow(
                 y = y_short, 
                 n.iter = 11, 
                 n.burn = 4, 
                 n.thin = 1) 
                                     
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of 
## randomly-selected functions
fit_plots_gmrf <- cluster_plot( 
  object = res_gmrf, units_name = "state", 
  units_label = cps$st, single_unit = FALSE, 
  credible = TRUE )                                    
                                     
## visual comparison of fit performance 
## between gpdpgrow() and gmrfdpgrow()
## or any two objects returned from any
## combination of these estimation
## functions
objects        <- vector("list",2)
objects[[1]]   <- res_gmrf
objects[[2]]   <- res_gp
label.object   <- c("gmrf_tr2","gp_rq")
## the map data.frame 
## object from fit_plots gp 
## includes a field that 
## identifies cluster assignments
## for each unit (or domain)
H        <- fit_plots_gp$map$cluster
fit_plot_compare_facet <- 
fit_compare( objects = objects, 
 H = H, label.object = label.object,
 y.axis.label = "normalized y",
 units_name = "state", units_label = cps$st)                                
}

Plot estimated functions for experimental units faceted by cluster versus data to assess fit.

Description

Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.

Usage

cluster_plot(
  object,
  N_clusters = NULL,
  time_points = NULL,
  units_name = "unit",
  units_label = NULL,
  date_field = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL,
  smoother = TRUE,
  sample_rate = 1,
  single_unit = FALSE,
  credible = FALSE,
  num_plot = NULL
)
cluster_plot(
  object,
  N_clusters = NULL,
  time_points = NULL,
  units_name = "unit",
  units_label = NULL,
  date_field = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL,
  smoother = TRUE,
  sample_rate = 1,
  single_unit = FALSE,
  credible = FALSE,
  num_plot = NULL
)

Arguments

`object`	A `gpdpgrow` or `gmrfdpgrow` object.
`N_clusters`	Denotes the number of largest sized (in terms of membership) clusters to plot. Defaults to all clusters.
`time_points`	Inputs a vector of common time points at which the collections of functions were observed (with the possibility of intermittent missingness). The length of `time_points` should be equal to the number of columns in the data matrix, `y`. Defaults to `time_points = 1:ncol(y)`.
`units_name`	The plot label for observation units. Defaults to `units_name = "function"`.
`units_label`	A vector of labels to apply to the observation units with length equal to the number of unique units. Defaults to sequential numeric values as input with data, `y`.
`date_field`	A vector of `Date` values for labeling the x-axis tick marks. Defaults to `1:T` .
`x.axis.label`	Text label for x-axis. Defaults to `"time"`.
`y.axis.label`	Text label for y-axis. Defaults to `"function values"`.
`smoother`	A scalar boolean input indicating whether to co-plot a smoother line through the functions in each cluster.
`sample_rate`	A numeric value in (0,1] indicating percent of functions to randomly sample within each cluster to address over-plotting. Defaults to 1.
`single_unit`	A scalar boolean indicating whether to plot the fitted vs data curve for only a single experimental units (versus a random sample of 6). Defaults to `single_unit = FALSE`.
`credible`	A scalar boolean indicating whether to plot 95 percent credible intervals for estimated functions, `bb`, when plotting fitted functions versus data. Defaults to `credible = FALSE`
`num_plot`	A scalar integer indicating how many randomly-selected functions to plot (each in it's own plot panel) in the plot of functions versus the observed time series in the case that `single_unit == TRUE`. Defaults to `num_plot = 6`.

Value

A list object containing the plot of estimated functions, faceted by cluster, and the associated data.frame object.

`p.cluster`	A `ggplot2` plot object
`dat.cluster`	A `data.frame` object used to generate `p.cluster`.

Author(s)

Terrance Savitsky [email protected]

Examples

{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short             <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )   
}
{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short             <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )   
}

Monthly employment counts from 1990 - 2013 from the Current Population Survey

Description

Monthly employment counts published by the U.S. Bureau of Labor Statistics in the Current Population Survey (CPS) for each of N = 51 states (including the District of Columbia). This dataset covers T = 278 months from 1990 the first two months of 2013. The data include a N x T matrix, y_raw, of raw employment counts, as well as set of standardized values, y, where the standardization is done within state. The standardized data matrix is used in our gpdpgrow and gmrfdpgrow estimating functions because the standardization facilitates comparisons of the time-series across states.

Format

A list object of 5 objects supporting a data matrix of N = 51 state time series for T = 278 months.

Details

y. An (N = 51) x (T = 278) matrix of standardized employment count estimates for N = 51 states for T = 278 months, beginning in 1990. The counts are standardized to (0,1) for each state series
y_raw. An N x T matrix of estimated monthly employment counts for N = 51 states.
st. Two-digit labels for each of the N states in the order presented in y and y_raw.
dte. A Date vector of length T that presents the set of dates (in y-m-d format) associated to the T time points presented in y and y_raw.
yr. A number vector listing sequence of years, 1990 - 2013 included in the data set.
yr_label. A numerical vector of length T = 278 with year labels for each monthly employment count in the cps data set.

Side-by-side plot panels that compare latent function values to data for different estimation models

Description

Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.

Usage

fit_compare(
  objects,
  H = NULL,
  label.object = c("gp_rq", "gmrf_rw2"),
  units_name = "Observation_Unit",
  units_label = NULL,
  date_field = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL
)
fit_compare(
  objects,
  H = NULL,
  label.object = c("gp_rq", "gmrf_rw2"),
  units_name = "Observation_Unit",
  units_label = NULL,
  date_field = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL
)

Arguments

`objects`	A list input where each element is a returned object from estimation with either of `gpdpgrow` or `gmrfdpgrow` or an object that contains true `N x T` matrix of true latent function values, `f`. This latter input is only needed if want to compare estimated to true latent function values.
`H`	An `N x 1` with entries in `1,...,M` of cluster assignments for the `N` units of `y` under a known clustering.
`label.object`	A character vector of length equal to `objects` that contains labels for each element of `objects` to be used in rendering comparison plots. Defaults to `label.object = c("gp_rq","gmrf_rw2")`.
`units_name`	A character input that provides a label for the set of `N` observation units. Defaults to `units_name = "Observation_Unit"`.
`units_label`	A vector of labels to apply to the observation units with length equal to the number of unique units. Defaults to sequential numeric values as input with data, `y`.
`date_field`	A vector of `Date` values for labeling the x-axis tick marks. Defaults to `1:T` .
`x.axis.label`	Text label for x-axis. Defaults to `"time"`.
`y.axis.label`	Text label for y-axis. Defaults to `"function values"`.

Value

A list object containing the plot of estimated functions, faceted by cluster, and the associated data.frame object.

`p.t`	A `ggplot2` plot object
`map`	A `data.frame` object that contains clustering structure of observation units.

Author(s)

Terrance Savitsky [email protected]

Examples

{
library(growfunctions)

## load the monthly employment count data 
## for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, 
## associated with 
## the years 2009 - 2013
## to examine the state level 
## employment levels 
## during the "great recession"
y_short <- cps$y[,(cps$yr_label %in% 
                 c(2010:2013))]

## run DP mixture of GP's to 
## estimate posterior distributions 
## for model parameters
## uses default setting of a 
## single "rational quadratic" 
## covariance formula
res_gp         <- gpdpgrow(
                     y = y_short, 
                     n.iter = 3, 
                     n.burn = 1, 
                     n.thin = 1, 
                     n.tune = 0)  
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected 
## functions
fit_plots_gp   <- cluster_plot( 
 object = res_gp,  units_name = "state", 
 units_label = cps$st, single_unit = FALSE, 
 credible = TRUE )
                                   
## Run the DP mixture of iGMRF's to 
## estimate posterior 
## distributions for model parameters
## Under default 
## RW2(kappa) = order 2 trend 
## precision term
res_gmrf     <- gmrfdpgrow(y = y_short, 
                       n.iter = 13, 
                       n.burn = 4, 
                       n.thin = 1) 
                                     
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf   <- cluster_plot( object = res_gmrf, 
  units_name = "state", units_label = cps$st, 
  single_unit = FALSE, 
  credible = TRUE )                                    
                                     
## visual comparison of fit performance 
## between gpdpgrow() and gmrfdpgrow()
## or any two objects returned from any
## combination of these estimation
## functions
objects        <- vector("list",2)
objects[[1]]   <- res_gmrf
objects[[2]]   <- res_gp
label.object   <- c("gmrf_tr2","gp_rq")
## the map data.frame object 
## from fit_plots gp 
## includes a field that 
## identifies cluster assignments
## for each unit (or domain)
H        <- fit_plots_gp$map$cluster
fit_plot_compare_facet <- 
fit_compare( objects = objects, 
 H = H, label.object = label.object,
 y.axis.label = "normalized y",
 units_name = "state", units_label = cps$st)                                  
}
{
library(growfunctions)

## load the monthly employment count data 
## for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, 
## associated with 
## the years 2009 - 2013
## to examine the state level 
## employment levels 
## during the "great recession"
y_short <- cps$y[,(cps$yr_label %in% 
                 c(2010:2013))]

## run DP mixture of GP's to 
## estimate posterior distributions 
## for model parameters
## uses default setting of a 
## single "rational quadratic" 
## covariance formula
res_gp         <- gpdpgrow(
                     y = y_short, 
                     n.iter = 3, 
                     n.burn = 1, 
                     n.thin = 1, 
                     n.tune = 0)  
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected 
## functions
fit_plots_gp   <- cluster_plot( 
 object = res_gp,  units_name = "state", 
 units_label = cps$st, single_unit = FALSE, 
 credible = TRUE )
                                   
## Run the DP mixture of iGMRF's to 
## estimate posterior 
## distributions for model parameters
## Under default 
## RW2(kappa) = order 2 trend 
## precision term
res_gmrf     <- gmrfdpgrow(y = y_short, 
                       n.iter = 13, 
                       n.burn = 4, 
                       n.thin = 1) 
                                     
## 2 plots of estimated functions: 
## 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf   <- cluster_plot( object = res_gmrf, 
  units_name = "state", units_label = cps$st, 
  single_unit = FALSE, 
  credible = TRUE )                                    
                                     
## visual comparison of fit performance 
## between gpdpgrow() and gmrfdpgrow()
## or any two objects returned from any
## combination of these estimation
## functions
objects        <- vector("list",2)
objects[[1]]   <- res_gmrf
objects[[2]]   <- res_gp
label.object   <- c("gmrf_tr2","gp_rq")
## the map data.frame object 
## from fit_plots gp 
## includes a field that 
## identifies cluster assignments
## for each unit (or domain)
H        <- fit_plots_gp$map$cluster
fit_plot_compare_facet <- 
fit_compare( objects = objects, 
 H = H, label.object = label.object,
 y.axis.label = "normalized y",
 units_name = "state", units_label = cps$st)                                  
}

Generate a finite population and take an informative single or two-stage sample.

Description

Used to compare performance of sample design-weighted and unweighted estimation procedures.

Usage

gen_informative_sample(
  clustering = TRUE,
  two_stage = FALSE,
  theta = c(0.2, 0.7, 1),
  M = 3,
  theta_star = matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow =
    TRUE),
  gp_type = "rq",
  N = 10000,
  T = 15,
  L = 10,
  R = 8,
  I = 4,
  n = 750,
  noise_to_signal = 0.05,
  incl_gradient = "medium"
)
gen_informative_sample(
  clustering = TRUE,
  two_stage = FALSE,
  theta = c(0.2, 0.7, 1),
  M = 3,
  theta_star = matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow =
    TRUE),
  gp_type = "rq",
  N = 10000,
  T = 15,
  L = 10,
  R = 8,
  I = 4,
  n = 750,
  noise_to_signal = 0.05,
  incl_gradient = "medium"
)

Arguments

`clustering`	Boolean input on whether want population generated from clusters of covariance parameters. Defaults to `clustering = FALSE`
`two_stage`	Boolean input on whether want two stage sampling, with first stage defining set of `L` blocks, where membership in blocks determined by quantiles of observation unit variance functions. (They are structured like strata, though they are sub-sampled).
`theta`	A numeric vector of global covariance parameters in the case of `clustering = FALSE`. The length, `P`, of `theta` must be consistent with the selected `gp_type`. Defaults to `theta = c(0.30.7,1.0)` in the case of `clustering = FALSE`.
`M`	Scalar input denoting number of clusters to employ if `clustering = TRUE`. Defaults to `M = 3`
`theta_star`	An P x M matrix of cluster location values associated with the choice of `M` and the selected `gp_type`. Defaults to `matrix(c(0.3,0.3,0.3,0.31,0.72,2.04,0.58,0.83,1.00),3,3,byrow=TRUE))`.
`gp_type`	Input of choice for covariance matrix formulation to be used to generate the functions for the `N` population units. Choices are `c("se","rq")`, where `"se"` denotes the squared exponential covariance function and `"rq"` denotes the rational quadratic. Defaults to `gp_type = "se"`
`N`	A scalar input denoting the number of population units (or establishments).
`T`	A scalar input denoting the number of time points in each of `N`, T x 1 functions that contribute to the N x T population data matrix, `y`. Defaults to `T = 15`.
`L`	A scalar input that denotes the number of blocks in which to assign the population units to be sub-sampled in the first stage of sampling. Defaults to `L = 10`.
`R`	A scalar input that denotes the number of blocks to sample from `L = 10` with probability proportional to the average variance of member functions in each block.
`I`	A scalar input denoting the number of strata to form within each block. Population units are divided into equally-sized strata based on variance quantiles. Defaults to `I = 4`.
`n`	Sample size to be generated. Both an informative sample under either single (`two_stage = FALSE`) or 2-stage (`two_stage = TRUE`) sample is taken, along with a non-informative, iid sample of the same size (`n`) from the finite population (generated with (`clustering = TRUE`) or without clustering). Defaults to `n = 770`.
`noise_to_signal`	A numeric input in the interval, `(0,1)`, denoting the ratio of noise variance to the average variance of the generated functions, `bb_i`. Defaults to `noise_to_signal = 0.05`
`incl_gradient`	A character input on whether stratum probabilities from lowest-to-highest is to `"high"`, in which case they are proportional to the exponential of the cluster number. If set to `"medium"` , the inclusion probabilities are proportional to the square of the cluster number. Note that population units are assigned to each stratum proportional to a progressively increasing quantile variance. The `incl_gradient` setting is used for both `two_stage = TRUE`, in which case it is applied to strata within block, as well as `two_stage = FALSE`, in which case a simple stratified random sample is conducted. Defaults to `incl_gradient = "medium"`

Value

A list object named dat_sim containing objects related to the generated sample finite population, the informative sample and the non-informative, iid, sample. Some important objects, include:

`H`	A vector of length `N`, the population size, with cluster assignments for each establishment (unit) in `1,..M` clusters.
`map.tot`	A `data.frame` object including unit label identifiers (under `establishment`), the cluster assignment (if `clustering = TRUE`), the block (if`two_stage = TRUE`) and stratum assignments and the sample inclusion probabilities.
`map.obs`	A `data.frame` object configured the same as `map.tot`, only confined to those establishments/units selected into the informative sample of size `n`.
`map.iid`	A `data.frame` object configured the same as `map.tot`, only confined to those establishments/units selected into the non-informative, iid sample of size `n`.
`(y`, `bb)`	N x T `matrix` objects containing data responses and de-noised ' functions, respectively, for each of the `N` population units. The order of the `N` units is consistent with `map`.
`(y_obs`, `bb_obs)`	N x T `matrix` objects containing observed responses and de-noised ' functions, respectively, for each of the `n` units sampled under an informative sampling design. The order of the `n` units is consistent with `map_obs`.
`(y_iid`, `bb_iid)`	N x T `matrix` objects containing observed responses and de-noised ' functions, respectively, for each of the `n` units sampled under a non-informative / iid sampling design. The order of the `n` units is consistent with `map_iid`.

Author(s)

Terrance Savitsky [email protected]

Examples

## Not run: 
library(growfunctions)
## use gen_informative_sample() to generate an 
## N X T population drawn from a dependent GP
## By default, 3 clusters are used to generate 
## the population.
## A single stage stratified random sample of size n 
## is drawn from the population using I = 4 strata. 
## The resulting sample is informative in that the 
## distribution for this sample is
## different from the population from which 
## it was drawn because the strata inclusion
## probabilities are proportional to a feature 
## of the response, y (in the case, the variance.
## The stratified random sample over-samples 
## large variance strata).
## (The user may also select a 2-stage 
## sample with the first stage
## sampling "blocks" of the population and 
## the second stage sampling strata within blocks). 
dat_sim        <- gen_informative_sample(N = 10000, 
                                n = 500, T = 10,
                                noise_to_signal = 0.1)

## extract n x T observed sample under informative
## stratified sampling design.
y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)

## End(Not run)
## Not run: 
library(growfunctions)
## use gen_informative_sample() to generate an 
## N X T population drawn from a dependent GP
## By default, 3 clusters are used to generate 
## the population.
## A single stage stratified random sample of size n 
## is drawn from the population using I = 4 strata. 
## The resulting sample is informative in that the 
## distribution for this sample is
## different from the population from which 
## it was drawn because the strata inclusion
## probabilities are proportional to a feature 
## of the response, y (in the case, the variance.
## The stratified random sample over-samples 
## large variance strata).
## (The user may also select a 2-stage 
## sample with the first stage
## sampling "blocks" of the population and 
## the second stage sampling strata within blocks). 
dat_sim        <- gen_informative_sample(N = 10000, 
                                n = 500, T = 10,
                                noise_to_signal = 0.1)

## extract n x T observed sample under informative
## stratified sampling design.
y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)

## End(Not run)

Run a Bayesian functional data model under an instrinsic GMRF prior whose precision parameters employ a DP prior for a COUNT data response type where: y ~ poisson(E*exp(Psi)) Psi ~ N(gamma,tau_e^-1) which is a Poisson-lognormal model

Description

An internal function to gmrfdpgrow

Usage

gmrfdpcountPost(
  y,
  E,
  ksi,
  ipr,
  C,
  D,
  q_order,
  q_type,
  n.iter,
  n.burn,
  n.thin,
  M_init,
  w_star,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)
gmrfdpcountPost(
  y,
  E,
  ksi,
  ipr,
  C,
  D,
  q_order,
  q_type,
  n.iter,
  n.burn,
  n.thin,
  M_init,
  w_star,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions
`E`	A multivariate offset variable, specified as an N x T matrix, in the case that `y` is of type count data. The offset will be used to model the `y` as under a poisson lognormal where `y ~ Pois(E*exp(Psi))`. Defaults to `NULL`, in which case the response type is assumed continuous.
`ksi`	An N x P matrix of N observations of P predictors to be used in prior probability of co-clustering of set of N, T x 1 observations. Defaults to `ksi = NULL` such that predictors are not used to a priori determine co-clustering probabilities.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`C`	A list object of length, `K`, the number of iGMRF precision terms. Each entry contains a T x T normalized adjacency matrix. The diagonal entries are `0` and row `i` contains the weight for each entry `!=i` divided by the sum of the weights.
`D`	A K x T matrix, where `K` denotes the number of iGMRF terms. Row `k` contains the `T` elements of the diagonal of the term-`k` precision matrix, `Q_k`. Will increase with order and be equal, except for boundary corrections.
`q_order`	An integer vector where each entry contains the order of the associated `K` iGMRF precision terms matrix of Euclidean distances associated to each seasonal covariance term.
`q_type`	A vector of length `K`, the number of iGMRF precision terms, with each entry indicating whether the associated term is a trend (`"tr"`) or seasonality (`"sn"`) term.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`M_init`	Starting value of number of clusters for sampling cluster assignments.
`w_star`	Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces samplin auto-correlation, but increases computational burden. Defaults to `w_star = 2`.
`q_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`q_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`tau_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error precision parameter. Defaults to `tau_shape = 1.0`.
`tau_rate`	The rate parameter of the Gamma prior distribution on `tau_e`. Defaults to `tau_rate = 1`.
`dp_shape`	The shape parameter for the $\Gamma$ prior on the DP concentration parameter. The rate parameter is set of `1`.
`dp_rate`	The rate parameter for the $\Gamma$ prior on the DP concentration parameter. Default value is `1`.
`nu`	The degree of freedom parameter for the Huang and Wand prior on precision matrix locations, `Lambda_star`, in the case that predictors, `ksi`, are entered to instantiate a predictor-dependent prior for co-clustering. Default value is `4`
`Rep`	The number of times to draw samples of the `N x T` log-mean parameters, `Psi`, for each MCMC iteration under a Poisson-lognormal model when the response type for `y` is count (not continuous) Default value is `1`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.
`jitter`	A scalar double indicating amount of jitter to subract from the posterior rate and shape hyperparameters of `tau_e` to stabilize computation. Defaults to `jitter = 0.0`.
`kappa_fast`	Boolean for whether to generate rate hyperparameter from full conditionals versus joint Gaussian (on random effects, `bb`, given `kappa`. The former is faster, but numerically less stable. Defaults to `kappa_fast = FALSE`.
`stable_launch`	A boolean indicator on whether to generate initial values for `N x T` log-mean, `Psi`, when `y` are count data from a Gaussian prior distribution, which can induce numerical launch instabilities, or whether to initialize as `y/E` (where missing values in `y` are imputed). Defaults to `stable_launch = TRUE`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gmrfdpgrow

Author(s)

Terrance Savitsky [email protected]

Bayesian instrinsic Gaussian Markov Random Field model for dependent time-indexed functions

Description

Estimates a collection of time-indexed functions under intrinsic Gaussian Markov random field prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same iGMRF precision parameter. The iGMRF formulation supports any number of additive precision terms, expressing either or both of multiple trend and seasonality.

Usage

gmrfdpgrow(
  y,
  ksi,
  E,
  ipr,
  q_order,
  q_type,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  M_init,
  w_star,
  n.iter,
  n.burn,
  n.thin,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)
gmrfdpgrow(
  y,
  ksi,
  E,
  ipr,
  q_order,
  q_type,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  M_init,
  w_star,
  n.iter,
  n.burn,
  n.thin,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)

Arguments

`y`	A multivariate continuous response, specified as an N x T matrix, where `N` denotes the number of functions and `T`, the number of time points per function. Intermittent missing-at-random values are allowed and will be estimated from the posterior predictive distribution. Missing cells should be denoted with `NA`.
`ksi`	An N x P matrix of N observations of P predictors to be used in prior probability of co-clustering of set of N, T x 1 observations. Defaults to `ksi = NULL` such that predictors are not used to a priori determine co-clustering probabilities.
`E`	A multivariate offset variable, specified as an N x T matrix, in the case that `y` is of type count data. The offset will be used to model the `y` as under a poisson lognormal where `y ~ Pois(E*exp(Psi))`. Defaults to `NULL`, in which case the response type is assumed continuous.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`q_order`	An integer vector of length `K` to select the order for each iGMRF precision term. e.g. If the first term is a RW2 and there is a second is a 3-month seasonality term, where the time points are indexed by month, then `q_order = c(2,3)`. Defaults to `q_order = 2`
`q_type`	A vector of length `K`, the number of iGMRF precision terms, with each entry indicating whether the associated term is a trend (`"tr"`) or seasonality (`"sn"`) term. So all entries must be one of `c("tr","sn")`. Defaults to `q_type = "tr"`.
`q_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma base distribution for the iGMRF scale parameters, `kappa_star(k,m)`, where `k` denotes the term and `m`, the cluster. Defaults to `q_shape = 0.3`.
`q_rate`	The rate parameter of the Gamma base distribution on `kappa_star`. Defaults to `q_rate = 0.0005`.
`tau_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error precision parameter. Defaults to `tau_shape = 1.0`.
`tau_rate`	The rate parameter of the Gamma prior distribution on `tau_e`. Defaults to `tau_rate = 1`.
`dp_shape`	The shape parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_shape = 1`.
`dp_rate`	The rate parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_rate = 1`.
`M_init`	Starting number of clusters of `nrow(y)` units to initialize sampler. Defaults to `M_init = nrow(y)`.
`w_star`	Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces sampling auto-correlation, but increases computational burden. Defaults to `w_star = 2`. The auxiliary Gibbs sampler is only used in the case that `!is.null(ksi)`, for predictor- assisted clustering; otherwise, the full conditionals for cluster assigment are conjugate.
`n.iter`	Total number of MCMC iterations.
`n.burn`	Number of MCMC iterations to discard. `gmrfdpgrow` will return `(n.iter - n.burn)` posterior samples.
`n.thin`	Gap between successive sampling iterations to save.
`nu`	The degree of freedom parameter for the Huang and Wand prior on precision `R x R` matrix locations, `Lambda_star`, in the case that `N x R` predictors, `ksi`, are entered to instantiate a predictor-dependent prior for co-clustering. Default value is `4`.
`Rep`	The number of times to draw samples of the `N x T` log-mean parameters, `Psi`, for each MCMC iteration under a Poisson-lognormal model when the response type for `y` is count (not continuous), which is indicated by inputing an offset matrix, `E`. Default value is `1`.
`progress`	A boolean value denoting whether to display a progress bar during model execution. Defaults to `progress = true`.
`jitter`	A scalar double indicating amount of jitter to subract from the posterior rate and shape hyperparameters of `tau_e` to stabilize computation. Defaults to `jitter = 0.0`.
`kappa_fast`	Boolean for whether to generate rate hyperparameter from full conditionals versus joint Gaussian (on random effects, `bb`, given `kappa`. The former is faster, but numerically less stable. Defaults to `kappa_fast = FALSE`.
`stable_launch`	A boolean indicator on whether to generate initial values for `N x T` log-mean, `Psi`, when `y` are count data from a Gaussian prior distribution, which can induce numerical launch instabilities, or whether to initialize as `y/E` (where missing values in `y` are imputed). Defaults to `stable_launch = TRUE`.

Value

S3 gmrfdpgrow object, for which many methods are available to return and view results. Generic functions applied to an object, res of class gmrfdpgrow, includes:

`plot(res)`	returns results plots, including fit functions versus data and allocation of fitted functions into clusters
`samples(res)`	contains (`n.iter - n.burn`) posterior sampling iterations for every model parameter
`resid(res)`	contains the model residuals.

Note

The intended focus for this package are data composed of observed noisy functions (each of length T) for a set of experimental units where the functions may express dependence among the experimental units

Author(s)

Terrance Savitsky [email protected] Daniell toth [email protected]

References

T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).

T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.

T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.

Examples

{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated 
## with the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## Run for 1500 iterations, with half as burn-in for a
## more useful (converged) result.
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )    
}
{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated 
## with the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## Run for 1500 iterations, with half as burn-in for a
## more useful (converged) result.
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )    
}

Run a Bayesian functional data model under an instrinsic GMRF prior whose precision parameters employ a DP prior

Description

An internal function to gmrfdpgrow

Usage

gmrfdpPost(
  y,
  ksi,
  ipr,
  C,
  D,
  q_order,
  q_type,
  n.iter,
  n.burn,
  n.thin,
  M_init,
  w_star,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  nu,
  progress,
  jitter,
  kappa_fast
)
gmrfdpPost(
  y,
  ksi,
  ipr,
  C,
  D,
  q_order,
  q_type,
  n.iter,
  n.burn,
  n.thin,
  M_init,
  w_star,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  nu,
  progress,
  jitter,
  kappa_fast
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions
`ksi`	An N x P matrix of N observations of P predictors to be used in prior probability of co-clustering of set of N, T x 1 observations. Defaults to `ksi = NULL` such that predictors are not used to a priori determine co-clustering probabilities.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`C`	A list object of length, `K`, the number of iGMRF precision terms. Each entry contains a T x T normalized adjacency matrix. The diagonal entries are `0` and row `i` contains the weight for each entry `!=i` divided by the sum of the weights.
`D`	A K x T matrix, where `K` denotes the number of iGMRF terms. Row `k` contains the `T` elements of the diagonal of the term-`k` precision matrix, `Q_k`. Will increase with order and be equal, except for boundary corrections.
`q_order`	An integer vector where each entry contains the order of the associated `K` iGMRF precision terms matrix of Euclidean distances associated to each seasonal covariance term.
`q_type`	A vector of length `K`, the number of iGMRF precision terms, with each entry indicating whether the associated term is a trend (`"tr"`) or seasonality (`"sn"`) term.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`M_init`	Starting value of number of clusters for sampling cluster assignments.
`w_star`	Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces samplin auto-correlation, but increases computational burden. Defaults to `w_star = 2`.
`q_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`q_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`tau_shape`	The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error precision parameter. Defaults to `tau_shape = 1.0`.
`tau_rate`	The rate parameter of the Gamma prior distribution on `tau_e`. Defaults to `tau_rate = 1`.
`dp_shape`	The shape parameter for the $\Gamma$ prior on the DP concentration parameter. The rate parameter is set of `1`.
`dp_rate`	The rate parameter for the $\Gamma$ prior on the DP concentration parameter. Default value is `1`.
`nu`	The degree of freedom parameter for the Huang and Wand prior on precision matrix locations, `Lambda_star`, in the case that predictors, `ksi`, are entered to instantiate a predictor-dependent prior for co-clustering. Default value is `4`
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.
`jitter`	A scalar double indicating amount of jitter to subract from the posterior rate and shape hyperparameters of `tau_e` to stabilize computation. Defaults to `jitter = 0.0`.
`kappa_fast`	Boolean for whether to generate rate hyperparameter from full conditionals versus joint Gaussian (on random effects, `bb`, given `kappa`. The former is faster, but numerically less stable. Defaults to `kappa_fast = FALSE`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gmrfdpgrow

Author(s)

Terrance Savitsky [email protected]

Run a Bayesian functional data model under a GP prior with a fixed clustering structure that co-samples latent functions, `bb_i`.

Description

An internal function to gpdpgrow

Usage

gpBFixPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  s
)
gpBFixPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  s
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions. `y` may\ have intermittant missing values. They should be input with `NA`.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`Omega_t`	A T x T matrix of squared Eucidean distances for `T` time points
`Omega_s`	A `list` object of length `L_s`, where each contains the T x T matrix of Euclidean distances associated to each seasonal covariance term.
`gp_mod`	An L x 1 numeric vector denoting the selected covariance function for each of `L` terms. `gp_mod = 1` is `"rq"`. `gp_mod = 2` is `"se"`. `gp_mod = 3` is `"sn"`.
`jitter`	Numeric value added to diagonals of GP covariance matrix to stabilize inversion.
`gp_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`gp_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`lower`	Minimum in range of support for GP covariance parameters, `kappa`.
`upper`	Maximum in range of support for GP covariance parameters, `kappa`.
`w`	Tuning parameter for slice sampling interval width used for GP covariance parameters, `kappa`.
`n_slice_iter`	Maximum number of steps to widen slice samplind width for GP covariance parameters, `kappa`.
`y_index`	List object where each contains index of time points to use in `n` progressively coarser distribution for sampling `kappa` in tempered update steps.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`n.tune`	The number of tuning iterations to update the slice sampler width, `w`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.
`s`	An integer vector inputting cluster membership structure if select `fix == TRUE`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gpdpgrow

Author(s)

Terrance Savitsky [email protected]

Run a Bayesian functional data model under a GP prior whose parameters employ a DP prior

Description

An internal function to gpdpgrow

Usage

gpdpbPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  b_move,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w_star,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  M_init,
  dp_shape,
  dp_rate,
  progress
)
gpdpbPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  b_move,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w_star,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  M_init,
  dp_shape,
  dp_rate,
  progress
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions. `y` may\ have intermittant missing values. They should be input with `NA`.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`Omega_t`	A T x T matrix of squared Eucidean distances for `T` time points
`Omega_s`	A `list` object of length `L_s`, where each contains the T x T matrix of Euclidean distances associated to each seasonal covariance term.
`gp_mod`	An L x 1 numeric vector denoting the selected covariance function for each of `L` terms. `gp_mod = 1` is `"rq"`. `gp_mod = 2` is `"se"`. `gp_mod = 3` is `"sn"`.
`jitter`	Numeric value added to diagonals of GP covariance matrix to stabilize inversion.
`b_move`	An indicator in `{0,1}` denoting whether to sample GP functions, `(bb_i)` in a T x 1 Gibbs step or through elliptical slice sampling. `b_move = 1` samples in a Gibbs step. Defaults to `b_move = 1`.
`gp_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`gp_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`lower`	Minimum in range of support for GP covariance parameters, `kappa`.
`upper`	Maximum in range of support for GP covariance parameters, `kappa`.
`w_star`	Tuning parameter for number of locations to sample not linked to observations in the auxiliary Gibbs sampler for cluster assignments.
`w`	Tuning parameter for slice sampling interval width used for GP covariance parameters, `kappa`.
`n_slice_iter`	Maximum number of steps to widen slice samplind width for GP covariance parameters, `kappa`.
`y_index`	List object where each contains index of time points to use in `n` progressively coarser distribution for sampling `kappa` in tempered update steps.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`n.tune`	The number of tuning iterations to update the slice sampler width, `w`.
`M_init`	Starting value of number of clusters for sampling cluster assignments.
`dp_shape`	The shape parameter for the $\Gamma$ prior on the DP concentration parameter. The rate parameter is set of `1`.
`dp_rate`	The rate parameter for the $\Gamma$ prior on the DP concentration parameter. Default value is `1`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gpdpgrow

Author(s)

Terrance Savitsky [email protected]

Bayesian non-parametric dependent Gaussian process model for time-indexed functional data

Description

Estimates a collection of time-indexed functions with Gaussian process (GP) formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same GP covariance parameters. The GP formulation supports any number of additive GP covariance terms, expressing either or both of multiple trend and seasonality.

Usage

gpdpgrow(
  y,
  ipr,
  time_points,
  gp_cov,
  sn_order,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  dp_shape,
  dp_rate,
  M_init,
  lower,
  upper,
  sub_size,
  w_star,
  w,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  b_move,
  cluster,
  s
)
gpdpgrow(
  y,
  ipr,
  time_points,
  gp_cov,
  sn_order,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  dp_shape,
  dp_rate,
  M_init,
  lower,
  upper,
  sub_size,
  w_star,
  w,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  b_move,
  cluster,
  s
)

Arguments

`y`	A multivariate continuous response, specified as an N x T matrix, where `N` denotes the number of functions and `T`, the number of time points per function. Intermittent missing-at-random values are allowed and will be estimated from the posterior predictive distribution. Missing cells should be denoted with `NA`. The sampling of missing values requires co-sampling the functions, `bb`, while these functions are marginalized out for sampling under the case of no missing data. So the sampler will run more slowly in the case of intermittent missingness.
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`time_points`	Inputs a vector of common time points at which the collections of functions were observed (with the possibility of intermittent missingness). The length of `time_points` should be equal to the number of columns in the data matrix, `y`. Defaults to `time_points = 1:ncol(y)`.
`gp_cov`	A vector of length `L` to select the covariance function for each of `L` terms. Allowed inputs are `c("rq","se","sn")`, where "rq" denotes the rational quadratic covariance function, "se", the square exponential, and "sn", seasonality. The seasonality covariance is a quasi-periodic multiplicative combination of a fixed length- scale periodic covariance kernel (where the period length is specified in 'sn_order') and a squared exponential kernel (of varying length-scale) to allow the periodicity in the seasonal term to evolve. e.g. If 4 terms are wanted - 2 trend terms with "se" and 2 seasonality terms, the input would be `gp_cov = c("se","se","sn","sn")`. Defaults to `gp_cov = "rq"`.
`sn_order`	A vector of length `L_s`, the number of terms in `gp_cov` where `"sn"` is selected, that denotes the seasonality order for each term; e.g. if the two "sn" terms above are for 3 and 12 month seasonality, respectively, for monthly data, then `sn_order = c(3,12)`. Defaults to `sn_order = NULL`.
`jitter`	A scalar numerical value added to the diagonal elements of the T x T GP covariance matrix to stabilize computation. Defaults to `jitter = 0.01`.
`gp_shape`	The shape parameter of the Gamma base distribution for the DP prior on the P x N matrix of GP covariance parameters (where P denotes the number of parameters for each of the N experimental units). Defaults to `gp_shape = 1`
`gp_rate`	The rate parameter of the Gamma base distribution on GP covariance parameters. Defaults to `gp_rate = 1`.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`dp_shape`	The shape parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_shape = 1`.
`dp_rate`	The rate parameter for the Gamma prior on the DP concentration parameter, `conc`. Defaults to `dp_rate = 1`.
`M_init`	Starting number of clusters of `nrow(y)` units to initialize sampler. Defaults to `M_init = nrow(y)`.
`lower`	The lower end of the range to be used in conditionally sampling the GP covariance parameters (`kappa,tau_e`) in the slice sampler. Defaults to `lower = 0`.
`upper`	The upper end of the range to be used in conditionally sampling the GP covariance parameters (`kappa,tau_e`) in the slice sampler. Defaults to `upper = 1e10`.
`sub_size`	Integer vector whose length, `n`, equals the number of progressively coarser GP covariance matrices to use for tempered sampling steps in an alternative space to sample the GP covariance parameters. Each entry denotes the number of sub-sample time points in `T` to draw under a latin hypercube design from the N x T data matrix, `y` to employ for that distribution; for example, suppose `T = 300`. `sub_size = c(100,50)`, would randomly select first `100` and then `50` time points from `y` to use for each of the `n = 2` distributions. Defaults to `sub_size = c(floor(0.25T),floor(0.1T))`.
`w_star`	Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces samplin auto-correlation, but increases computational burden. Defaults to `w_star = 2`.
`w`	Numeric value denoting the step width used to construct the interval from which to draw a sample for each GP covariance parameter in the slice sampler. This value is adaptively updated in the sampler tuning stage for each parameter to be equal to the difference in the 0.95 and 0.05 sample quantiles for each of 5 block updates. Defaults to `w = 1.5`.
`n.iter`	Total number of MCMC iterations.
`n.burn`	Number of MCMC iterations to discard. `gpdpgrow` will return `(n.iter - n.burn)` posterior samples.
`n.thin`	Gap between successive sampling iterations to save.
`n.tune`	Number of iterations (before ergodic chain instantiated) to adapt `w`, separately, for each covariance term, `p = 1,...,P`. Sets each `w_p` to lie in the 90 percent credible interval computed from the tuning sample (that is divided into 5 blocks so that `w_p` is successively updated in each block of runs).
`progress`	A boolean value denoting whether to display a progress bar during model execution. Defaults to `progress = TRUE`
`b_move`	A boolean value denoting whether to sample the GP function, `bb`, in T x 1 Gibbs steps `b_move = TRUE` or through elliptical slice sampling. Defaults to `b_move = TRUE`. Only used in the case there is any intermittent missingness; otherwise `bb` is marginalized out of the sampler and post-sampled from the its predictive distribution.
`cluster`	A boolean value denoting whether to employ DP mix model over set of GP functions or to just use GP model with no clustering of covariance function parameters. Defaults to `cluster = TRUE`
`s`	An N x 1 integer vector that inputs a fixed clustering, rather than sampling it. Defaults to `s = NULL`

Value

S3 gpdpgrow object, for which many methods are available to return and view results. Generic functions applied to an object, res of class gpdpgrow, includes:

`samples(res)`	contains (`n.iter - n.burn`) posterior sampling iterations for every model parameter
`resid(res)`	contains the model residuals.

Note

Author(s)

Terrance Savitsky [email protected] Daniell Toth [email protected]

References

T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).

T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.

T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.

Examples

{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2011 - 2013
## to examine the state level employment 
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2011:2013))]

## uses default setting of a single "rational quadratic" covariance
## run for 500 iterations, with half discarded as burn-in to 
## obtain a more useful result.
res_gp               <- gpdpgrow(y = y_short, 
                                 n.iter = 4, 
                                 n.burn = 1, 
                                 n.thin = 1, 
                                 n.tune = 0)  

## Two plots of estimated functions, 
## 1. faceted by cluster 
## 2. fitted functions vs noisy observations
## first plot will plot estimated denoised function, 
## bb_i, for a single (randomly-selected) "state"
fit_plots_gp        <- cluster_plot( object = res_gp,  
                           units_name = "state", 
                           units_label = cps$st, 
                           single_unit = TRUE, 
                           credible = TRUE )
## second plot will randomly select 6 states 
## and plot their estimated denoised functions, bb_i.
## with setting "single_unit = FALSE". 
## (Option "num_plot" may be set to plot 
## any integer number of 
## randomly-selected units.)
fit_plots_gp        <- cluster_plot( object = res_gp,  
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )

}
{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2011 - 2013
## to examine the state level employment 
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2011:2013))]

## uses default setting of a single "rational quadratic" covariance
## run for 500 iterations, with half discarded as burn-in to 
## obtain a more useful result.
res_gp               <- gpdpgrow(y = y_short, 
                                 n.iter = 4, 
                                 n.burn = 1, 
                                 n.thin = 1, 
                                 n.tune = 0)  

## Two plots of estimated functions, 
## 1. faceted by cluster 
## 2. fitted functions vs noisy observations
## first plot will plot estimated denoised function, 
## bb_i, for a single (randomly-selected) "state"
fit_plots_gp        <- cluster_plot( object = res_gp,  
                           units_name = "state", 
                           units_label = cps$st, 
                           single_unit = TRUE, 
                           credible = TRUE )
## second plot will randomly select 6 states 
## and plot their estimated denoised functions, bb_i.
## with setting "single_unit = FALSE". 
## (Option "num_plot" may be set to plot 
## any integer number of 
## randomly-selected units.)
fit_plots_gp        <- cluster_plot( object = res_gp,  
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )

}

Run a Bayesian functional data model under a GP prior whose parameters employ a DP prior

Description

An internal function to gpdpgrow

Usage

gpdpPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w_star,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  M_init,
  dp_shape,
  dp_rate,
  progress
)
gpdpPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w_star,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  M_init,
  dp_shape,
  dp_rate,
  progress
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`Omega_t`	A T x T matrix of squared Eucidean distances for `T` time points
`Omega_s`	A `list` object of length `L_s`, where each contains the T x T matrix of Euclidean distances associated to each seasonal covariance term.
`gp_mod`	An L x 1 numeric vector denoting the selected covariance function for each of `L` terms. `gp_mod = 1` is `"rq"`. `gp_mod = 2` is `"se"`. `gp_mod = 3` is `"sn"`.
`jitter`	Numeric value added to diagonals of GP covariance matrix to stabilize inversion.
`gp_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`gp_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`lower`	Minimum in range of support for GP covariance parameters, `kappa`.
`upper`	Maximum in range of support for GP covariance parameters, `kappa`.
`w_star`	Tuning parameter for number of locations to sample not linked to observations in the auxiliary Gibbs sampler for cluster assignments.
`w`	Tuning parameter for slice sampling interval width used for GP covariance parameters, `kappa`.
`n_slice_iter`	Maximum number of steps to widen slice samplind width for GP covariance parameters, `kappa`.
`y_index`	List object where each contains index of time points to use in `n` progressively coarser distribution for sampling `kappa` in tempered update steps.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`n.tune`	The number of tuning iterations to update the slice sampler width, `w`.
`M_init`	Starting value of number of clusters for sampling cluster assignments.
`dp_shape`	The shape parameter for the $\Gamma$ prior on the DP concentration parameter. The rate parameter is set of `1`.
`dp_rate`	The rate parameter for the $\Gamma$ prior on the DP concentration parameter. Default value is `1`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gpdpgrow

Author(s)

Terrance Savitsky [email protected]

Run a Bayesian functional data model under a GP prior whose parameters employ a DP prior

Description

An internal function to gpdpgrow

Usage

gpFixPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  s
)
gpFixPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress,
  s
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`Omega_t`	A T x T matrix of squared Eucidean distances for `T` time points
`Omega_s`	A `list` object of length `L_s`, where each contains the T x T matrix of Euclidean distances associated to each seasonal covariance term.
`gp_mod`	An L x 1 numeric vector denoting the selected covariance function for each of `L` terms. `gp_mod = 1` is `"rq"`. `gp_mod = 2` is `"se"`. `gp_mod = 3` is `"sn"`.
`jitter`	Numeric value added to diagonals of GP covariance matrix to stabilize inversion.
`gp_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`gp_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`lower`	Minimum in range of support for GP covariance parameters, `kappa`.
`upper`	Maximum in range of support for GP covariance parameters, `kappa`.
`w`	Tuning parameter for slice sampling interval width used for GP covariance parameters, `kappa`.
`n_slice_iter`	Maximum number of steps to widen slice samplind width for GP covariance parameters, `kappa`.
`y_index`	List object where each contains index of time points to use in `n` progressively coarser distribution for sampling `kappa` in tempered update steps.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`n.tune`	The number of tuning iterations to update the slice sampler width, `w`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.
`s`	An integer vector inputting cluster membership structure if select `fix == TRUE`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gpdpgrow

Author(s)

Terrance Savitsky [email protected]

Run a Bayesian functional data model under a GP prior whose parameters employ a DP prior

Description

An internal function to gpdpgrow

Usage

gpPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress
)
gpPost(
  y,
  ipr,
  Omega_t,
  Omega_s,
  gp_mod,
  jitter,
  gp_shape,
  gp_rate,
  noise_shape,
  noise_rate,
  lower,
  upper,
  w,
  n_slice_iter,
  y_index,
  n.iter,
  n.burn,
  n.thin,
  n.tune,
  progress
)

Arguments

`y`	An N x T matrix of N observations of T x 1 functions
`ipr`	An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample. Defaults to `ipr = rep(1,nrow(y))` indicating an iid sample.
`Omega_t`	A T x T matrix of squared Eucidean distances for `T` time points
`Omega_s`	A `list` object of length `L_s`, where each contains the T x T matrix of Euclidean distances associated to each seasonal covariance term.
`gp_mod`	An L x 1 numeric vector denoting the selected covariance function for each of `L` terms. `gp_mod = 1` is `"rq"`. `gp_mod = 2` is `"se"`. `gp_mod = 3` is `"sn"`.
`jitter`	Numeric value added to diagonals of GP covariance matrix to stabilize inversion.
`gp_shape`	The shape parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`gp_rate`	The rate parameter of the Gamma base distribution for the `kappa_star` locations used to sample the DP prior on the `P` GP covariance parameters, `kappa`, for each experimental unit.
`noise_shape`	The shape parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_shape = 3`.
`noise_rate`	The rate parameter of the Gamma base distribution on `tau_e`, the model noise precision parameter. Defaults to `noise_rate = 1`.
`lower`	Minimum in range of support for GP covariance parameters, `kappa`.
`upper`	Maximum in range of support for GP covariance parameters, `kappa`.
`w`	Tuning parameter for slice sampling interval width used for GP covariance parameters, `kappa`.
`n_slice_iter`	Maximum number of steps to widen slice samplind width for GP covariance parameters, `kappa`.
`y_index`	List object where each contains index of time points to use in `n` progressively coarser distribution for sampling `kappa` in tempered update steps.
`n.iter`	The number of MCMC sampling iterations
`n.burn`	The number of warm-up iterations to discard
`n.thin`	The interval or step size of post-burn-in samples to return
`n.tune`	The number of tuning iterations to update the slice sampler width, `w`.
`progress`	An indicator in `{0,1}` denoting whether to display a progress bar during model execution. `progress = 1` displays a progress bar. Defaults to `progress = 1`.

Value

res A list object containing MCMC runs for all model parameters.

Note

Intended as an internal function for gpdpgrow

Author(s)

Terrance Savitsky [email protected]

Plot credible intervals for parameters to compare ignoring with weighting an informative sample

Description

Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.

Usage

informative_plot(
  objects = NULL,
  objects_labels = c("ignore", "weight"),
  map = NULL,
  units_name = NULL,
  model = "gp",
  true_star = NULL,
  map_true = NULL
)
informative_plot(
  objects = NULL,
  objects_labels = c("ignore", "weight"),
  map = NULL,
  units_name = NULL,
  model = "gp",
  true_star = NULL,
  map_true = NULL
)

Arguments

`objects`	A list of objects, either all outputs from gpdpgrow(), or all from gmrfdpgrow(). `objects` includes a model estimated under ignoring the informativeness of the sampling design and another that employs weighting to account for the informativeness. An additional object may be added to represent a separate "iid" (or non-informative) sample from the same population, which will typically be available if the dataset was generated as synthetic data using function, "gen_informative_sample()".
`objects_labels`	A character vector of length equal to `objects` that provides labels for each entry in `objects`. Allowed entries in `objects_labels` are `c("ignore","weight","iid")`, where "ignore" denotes a model that ignores the informativeness, while "weight" denotes a model that employs sampling weights and "iid" denotes a model run on a non-informative, iid sample from the same population. Defaults to `objects_labels = c("ignore","weight")`.
`map`	A list matrices, where each entry is produced from `cluster_plot(object)$map`. It is comprised of unit labels and cluster assignments for each object in `objects`. The length of `map` must be equal to the length of `objects`.
`units_name`	The label in each "map" matrix for the observation units. Will be the same as the `units_name` entry for the previously run, `cluster_plot()` function.
`model`	A scalar character input indicating the estimation model for all of the entries in `objects`. Allowable values for `model` are c("gp","gmrf"). Defaults to `model = "gp"`.
`true_star`	An optional, P x M matrix, of true parameter location values, where `P` denotes the number of parameters per cluster and `M` denotes the number of clusters. For example, in `model = "gp"` with a single, rational quadratic covariance, `P = 3` and if there are 3 clusters, then `M = 3`. For a `model = "gmrf"`, with a single covariance, `P = 1`.
`map_true`	An optional `data.frame` object with `n` rows, the size of the informative sample used for `c("ignore","weight")` objects that maps the `units_name` to a true cluster. `map_true` must have 2 columns (and the rest are ignored), one must be named the same value as input for `units_name`. The second column must be named, `cluster`. If the true values derive from running `gen_informative_sample()` as the source of the true values, one may just input the `map_obs` `data.frame` that is listed in the object returned by `gen_informative_sample()`.

Value

A list object containing the plot of estimated functions, faceted by cluster, and the associated data.frame object.

`p.compare`	A `ggplot2` plot object
`dat.compare`	A `data.frame` object used to generate `p.compare`.

Author(s)

Terrance Savitsky [email protected]

Examples

## Not run: 
library(growfunctions)
## use gen_informative_sample() to generate an 
## N X T population drawn from a dependent GP
## By default, 3 clusters are used to generate 
## the population.
## A single stage stratified random sample of size n 
## is drawn from the population using I = 4 strata. 
## The resulting sample is informative in that the 
## distribution for this sample is
## different from the population from which 
## it was drawn because the strata inclusion
## probabilities are proportional to a feature 
## of the response, y (in the case, the variance.
## The stratified random sample over-samples 
## large variance strata).
## (The user may also select a 2-stage 
## sample with the first stage
## sampling "blocks" of the population and 
## the second stage sampling strata within blocks). 
dat_sim        <- gen_informative_sample(N= 10000, 
                                n = 500, T = 5,
                                noise_to_signal = 0.1)

y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)

an informative sampling design that inputs inclusion
probabilities, ipr
res_gp_w            <- gpdpgrow(y = y_obs, 
                               ipr = dat_sim$map_obs$incl_prob, 
                               n.iter = 5, n.burn = 2,  
                               n.thin = 1, n.tune = 0)
and fit vs. data for experimental units
fit_plots_w         <- cluster_plot( object = res_gp_w,  
                           units_name = "establishment", 
                           units_label = dat_sim$map_obs$establishment, 
                           single_unit = FALSE, credible = TRUE )

## estimate parameters ignoring sampling design
res_gp_i            <- gpdpgrow(y = y_obs, 
                               n.iter = 5, n.burn = 2, 
                               n.thin = 1, n.tune = 0)
## plots of estimated functions, faceted by cluster and fit vs. 
## data for experimental units
fit_plots_i         <- cluster_plot( object = res_gp_i,  
                                    units_name = "establishment", 
                                    units_label = dat_sim$map_obs$establishment, 
                                    single_unit = FALSE, credible = TRUE )

## We also draw an iid (non-informative, exchangeable) 
## sample from the same population to 
## compare estimation results to our weighted 
## (w) and unweighted/ignoring (i) models

## estimate parameters under an iid sampling design
res_gp_iid          <- gpdpgrow(y = dat_sim$y_iid, 
                               n.iter = 5, n.burn = 2,   
                               n.thin = 1, n.tune = 0)
## plots of estimated functions, faceted by cluster and 
## fit vs. data for experimental units
fit_plots_iid       <- cluster_plot( object = res_gp_iid,  
                           units_name = "establishment", 
                           units_label = dat_sim$map_iid$establishment, 
                           single_unit = FALSE, credible = TRUE )

## compare estimations of covariance parameter credible 
## intervals when ignoring informativeness vs.
## weighting to account for informativeness
objects                  <- map <- vector("list",3)
objects[[1]]             <- res_gp_i
objects[[2]]             <- res_gp_iid
objects[[3]]             <- res_gp_w
map[[1]]                 <- fit_plots_i$map
map[[2]]                 <- fit_plots_iid$map
map[[3]]                 <- fit_plots_w$map
objects_labels           <- c("ignore","iid","weight")

parms_plots_compare      <- informative_plot( objects = objects, 
                                objects_labels = objects_labels,
                                map = map, units_name = "establishment", 
                                model = "gp",
                                true_star = dat_sim$theta_star, 
                                map_true = dat_sim$map_obs)


## End(Not run)
## Not run: 
library(growfunctions)
## use gen_informative_sample() to generate an 
## N X T population drawn from a dependent GP
## By default, 3 clusters are used to generate 
## the population.
## A single stage stratified random sample of size n 
## is drawn from the population using I = 4 strata. 
## The resulting sample is informative in that the 
## distribution for this sample is
## different from the population from which 
## it was drawn because the strata inclusion
## probabilities are proportional to a feature 
## of the response, y (in the case, the variance.
## The stratified random sample over-samples 
## large variance strata).
## (The user may also select a 2-stage 
## sample with the first stage
## sampling "blocks" of the population and 
## the second stage sampling strata within blocks). 
dat_sim        <- gen_informative_sample(N= 10000, 
                                n = 500, T = 5,
                                noise_to_signal = 0.1)

y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)

an informative sampling design that inputs inclusion
probabilities, ipr
res_gp_w            <- gpdpgrow(y = y_obs, 
                               ipr = dat_sim$map_obs$incl_prob, 
                               n.iter = 5, n.burn = 2,  
                               n.thin = 1, n.tune = 0)
and fit vs. data for experimental units
fit_plots_w         <- cluster_plot( object = res_gp_w,  
                           units_name = "establishment", 
                           units_label = dat_sim$map_obs$establishment, 
                           single_unit = FALSE, credible = TRUE )

## estimate parameters ignoring sampling design
res_gp_i            <- gpdpgrow(y = y_obs, 
                               n.iter = 5, n.burn = 2, 
                               n.thin = 1, n.tune = 0)
## plots of estimated functions, faceted by cluster and fit vs. 
## data for experimental units
fit_plots_i         <- cluster_plot( object = res_gp_i,  
                                    units_name = "establishment", 
                                    units_label = dat_sim$map_obs$establishment, 
                                    single_unit = FALSE, credible = TRUE )

## We also draw an iid (non-informative, exchangeable) 
## sample from the same population to 
## compare estimation results to our weighted 
## (w) and unweighted/ignoring (i) models

## estimate parameters under an iid sampling design
res_gp_iid          <- gpdpgrow(y = dat_sim$y_iid, 
                               n.iter = 5, n.burn = 2,   
                               n.thin = 1, n.tune = 0)
## plots of estimated functions, faceted by cluster and 
## fit vs. data for experimental units
fit_plots_iid       <- cluster_plot( object = res_gp_iid,  
                           units_name = "establishment", 
                           units_label = dat_sim$map_iid$establishment, 
                           single_unit = FALSE, credible = TRUE )

## compare estimations of covariance parameter credible 
## intervals when ignoring informativeness vs.
## weighting to account for informativeness
objects                  <- map <- vector("list",3)
objects[[1]]             <- res_gp_i
objects[[2]]             <- res_gp_iid
objects[[3]]             <- res_gp_w
map[[1]]                 <- fit_plots_i$map
map[[2]]                 <- fit_plots_iid$map
map[[3]]                 <- fit_plots_w$map
objects_labels           <- c("ignore","iid","weight")

parms_plots_compare      <- informative_plot( objects = objects, 
                                objects_labels = objects_labels,
                                map = map, units_name = "establishment", 
                                model = "gp",
                                true_star = dat_sim$theta_star, 
                                map_true = dat_sim$map_obs)


## End(Not run)

Compute normalized mean squared prediction error based on accuracy to impute missing data values

Description

Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.

Usage

MSPE(object, y_true, pos)
MSPE(object, y_true, pos)

Arguments

`object`	A `gpdpgrow` or `gmrfdpgrow` object.
`y_true`	An `N x T` numeric matrix of test set values.
`pos`	An `N x T` matrix with all entries either `0` or `1`, where a `1` indexes a missing entry or test point in `y_true`.

Value

A list object containing various MSPE fit statistics that measure the accuracy of of predicting the values in y_true indexed by pos.

`SSE`	Sum of squared errors based on full `N x T`, `y_true - y_hat`.
`MSE`	Mean squared error computed from `SSE`.
`RMSE`	Square root of `MSE`.
`SSPE`	Sum of squared prediction error based on missing values.
`MSPE`	Mean squared prediction error based on missing values.
`nMSPE`	Mean squared prediction error based on missing values that is normalized by the variance of the test observations to produce a value in `[0,1]`.
`RMSPE`	Square root of `MSPE`.

Author(s)

Terrance Savitsky [email protected]

Examples

{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2009 - 2013
## to examine the state level employment 
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2009:2013))]

## dimensions
T         <- ncol(y_short)  ## time points per unit
N         <- nrow(y_short)  ## number of units

## Demonstrate estimation of intermittent missing data 
## from posterior predictive distribution by randomly
## selecting 10 percent of entries in y_short and 
## setting them to NA.

## randomly assign missing positions in y.
## assume every unit has equal number of 
## missing positions
## randomly select number of missing 
## observations for each unit
m_factor  <- .1
M         <- floor(m_factor*N*T)
m_vec     <- rep(floor(M/N),N)
if( sum(m_vec) < M )
{
    m_left              <- M - sum(m_vec)
    pos_i               <- sample(1:N, m_left, 
                                replace = FALSE)
     m_vec[pos_i]        <- m_vec[pos_i] + 1
} # end conditional statement on whether all 
  # missing cells allocated
  # randomly select missing 
  # positions for each unit
pos       <- matrix(0,N,T)
for( i in 1:N )
{
    sel_ij              <- sample(3:(T-3), m_vec[i], 
                           replace = FALSE) 
                           ## avoid edge effects
    pos[i,sel_ij]       <- 1
}

## configure N x T matrix, y_obs, with 10 percent missing 
## observations (filled with NA)
## to use for sampling.  Entries in y_short 
## that are set to missing (NA) are
## determined by entries of "1" in the 
## N x T matrix, pos.
y_obs               <- y_short
y_obs[pos == 1]     <- NA       

## Conduct dependent GP model estimation under 
## missing observations in y_obs.
## We illustrate the ability to have multiple 
## function or covariance terms
## by seting gp_cov = c("se","sn"), which indicates 
## the first term is a
## squared exponential ("se") trend covariance term 
## and the "sn" is a seasonality
## term.  The setting, sn_order = 4, indicates the 
## length scale of the seasonality
## term is 4 month.  The season term is actually 
## "quasi" seasonal, in that the
## seasonal covariance kernel is multiplied by a 
## squared exponential, which allows
## the pattern of seasonality to evolve over time.
res_gp_2               <- gpdpgrow(y = y_obs, 
                                  gp_cov=c("se","sn"), 
                                  sn_order = 4, 
                                  n.iter = 10, 
                                  n.burn = 4, 
                                  n.thin = 1, 
                                  n.tune = 0) 

## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gp_2        <- cluster_plot( object = res_gp_2,  
                                      units_name = "state", 
                                      units_label = cps$st, 
                                      single_unit = TRUE, 
                                      credible = TRUE )
                                      
## Compute out-of-sample fit statistic, normalized mean-square 
## prediction error (MSPE)
## The normalized MSPE will take the predicted values 
## for the entries in y_short purposefully
## set to NA and will subtract them from the known true 
## values in y_short (and square them).
## This MSE on predicted (test) data is then 
## divided by the variance of the test observations
## to output something akin to a percent error.
( nmspe_gp  <- MSPE(res_gp_2, y_short, pos)$nMSPE )    
}
{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated with 
## the years 2009 - 2013
## to examine the state level employment 
## levels during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2009:2013))]

## dimensions
T         <- ncol(y_short)  ## time points per unit
N         <- nrow(y_short)  ## number of units

## Demonstrate estimation of intermittent missing data 
## from posterior predictive distribution by randomly
## selecting 10 percent of entries in y_short and 
## setting them to NA.

## randomly assign missing positions in y.
## assume every unit has equal number of 
## missing positions
## randomly select number of missing 
## observations for each unit
m_factor  <- .1
M         <- floor(m_factor*N*T)
m_vec     <- rep(floor(M/N),N)
if( sum(m_vec) < M )
{
    m_left              <- M - sum(m_vec)
    pos_i               <- sample(1:N, m_left, 
                                replace = FALSE)
     m_vec[pos_i]        <- m_vec[pos_i] + 1
} # end conditional statement on whether all 
  # missing cells allocated
  # randomly select missing 
  # positions for each unit
pos       <- matrix(0,N,T)
for( i in 1:N )
{
    sel_ij              <- sample(3:(T-3), m_vec[i], 
                           replace = FALSE) 
                           ## avoid edge effects
    pos[i,sel_ij]       <- 1
}

## configure N x T matrix, y_obs, with 10 percent missing 
## observations (filled with NA)
## to use for sampling.  Entries in y_short 
## that are set to missing (NA) are
## determined by entries of "1" in the 
## N x T matrix, pos.
y_obs               <- y_short
y_obs[pos == 1]     <- NA       

## Conduct dependent GP model estimation under 
## missing observations in y_obs.
## We illustrate the ability to have multiple 
## function or covariance terms
## by seting gp_cov = c("se","sn"), which indicates 
## the first term is a
## squared exponential ("se") trend covariance term 
## and the "sn" is a seasonality
## term.  The setting, sn_order = 4, indicates the 
## length scale of the seasonality
## term is 4 month.  The season term is actually 
## "quasi" seasonal, in that the
## seasonal covariance kernel is multiplied by a 
## squared exponential, which allows
## the pattern of seasonality to evolve over time.
res_gp_2               <- gpdpgrow(y = y_obs, 
                                  gp_cov=c("se","sn"), 
                                  sn_order = 4, 
                                  n.iter = 10, 
                                  n.burn = 4, 
                                  n.thin = 1, 
                                  n.tune = 0) 

## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gp_2        <- cluster_plot( object = res_gp_2,  
                                      units_name = "state", 
                                      units_label = cps$st, 
                                      single_unit = TRUE, 
                                      credible = TRUE )
                                      
## Compute out-of-sample fit statistic, normalized mean-square 
## prediction error (MSPE)
## The normalized MSPE will take the predicted values 
## for the entries in y_short purposefully
## set to NA and will subtract them from the known true 
## values in y_short (and square them).
## This MSE on predicted (test) data is then 
## divided by the variance of the test observations
## to output something akin to a percent error.
( nmspe_gp  <- MSPE(res_gp_2, y_short, pos)$nMSPE )    
}

Plot estimated functions, faceted by cluster numbers, for a known clustering

Description

An internal function of the growfunctions package.

Usage

plot_cluster(
  y,
  H,
  sort = FALSE,
  sample_rate = 0.05,
  y.axis.label = NULL,
  smoother = TRUE,
  fade = 0.2,
  cluster_order = NULL,
  plot_render = TRUE
)
plot_cluster(
  y,
  H,
  sort = FALSE,
  sample_rate = 0.05,
  y.axis.label = NULL,
  smoother = TRUE,
  fade = 0.2,
  cluster_order = NULL,
  plot_render = TRUE
)

Arguments

`y`	An `N x T` matrix for a collection of functions.
`H`	An `N x 1` with entries in `1,...,M` of cluster assignments for the `N` units of `y` under a known clustering.
`sort`	An optional boolean input on whether to sort the cluster-indexed plot panels of function by size of cluster. Defaults `sort = FALSE`.
`sample_rate`	An optional numeric value in (0,1] indicating percent of functions to randomly sample within each cluster to address over-plotting. Defaults to 1.
`y.axis.label`	An optional text label for y-axis. Defaults to `"function values"`.
`smoother`	An optional scalar boolean input indicating whether to co-plot a smoother line through the functions in each cluster.
`fade`	An optional numeric input in `(0,1)` indicating the degree of fading to apply to the plots of functions in each cluster-indexed panel. Defaults to `fade = 0.2`.
`cluster_order`	An optional numeric vector of length `M`, the number of clusters, indicating the order, from-to-right, for plotting the cluster-indexed panels.
`plot_render`	An optional boolean input indicating whether to render the plot. Defaults to `plot_render = TRUE`.

Value

A list object containing the plot of estimated functions, faceted by cluster, and the associated data.frame object.

`p.basis`	A `ggplot2` plot object
`map`	A `data.frame` object listing the unit and associated cluster membership.

Author(s)

Terrance Savitsky [email protected]

Use the model-estimated covariance parameters from gpdpgrow() or gmrdpgrow to predict the function at future time points.

Description

This is the generic predict_functions method. See the following functions for the details about different data structures:

Usage

predict_functions(object, J, ...)
predict_functions(object, J, ...)

Arguments

`object`	Object of class `gpdpgrow` or`gmrfdpgrow()`.
`J`	Scalar denoting number of draws to take from posterior predictive for each unit. Defaults to `J = 500`.
`...`	further arguments passed to or from other methods.

Details

predict_functions.gpdpgrow for objects of class gpdpgrow
predict_functions.gmrfdpgrow for objects of class gmrfdpgrow

Use the model-estimated iGMRF precision parameters from gmrfdpgrow() to predict the iGMRF function at future time points. Inputs the `gmrfdpgrow` object of estimated parameters.

Description

A companion function to gmrfdpgrow

Usage

## S3 method for class 'gmrfdpgrow'
predict_functions(object, J = 500, T_test, ...)
## S3 method for class 'gmrfdpgrow'
predict_functions(object, J = 500, T_test, ...)

Arguments

`object`	Object of class `gmrfdpgrow` returned from model run of `gmrfdpgrow()`
`J`	Scalar denoting number of draws to take from posterior predictive for each unit. Defaults to `J = 500`.
`T_test`	The number of equally-spaced time points to predict the iGMRF functions ahead of of the functions estimated at `T_train` time points.
`...`	further arguments passed to or from other methods.

Value

out A list object containing containing two matrices; the first is a P x (N*T) matrix of predicted function values for each of P sampled iterations. N is slow index and denotes the number of experimental units. The second matrix is an N x T average over the P sampled draws, composed in Rao-Blackwellized fashion.

Note

Intended as a companion function for gmrfdpgrow for prediction

Author(s)

Terrance Savitsky [email protected]

Examples

## Not run: 
library(growfunctions)
data(cps)
y_short   <- cps$y[,(cps$yr_label %in% c(2010:2013))]
t_train   <- ncol(y_short)
N         <- nrow(y_short)
t_test    <- 4
 
## Model Runs

res_gmrf            = gmrfdpgrow(y = y_short, 
                                q_order = c(2,4), 
                                q_type = c("tr","sn"), 
                                n.iter = 100, 
                                n.burn = 50, 
                                n.thin = 1) 
## Prediction Model Runs
T_test             <- 4

pred_gmrf          <- predict_functions( object = res_gmrf,
                                     J = 1000, 
                                     T_test = T_test )

## plot estimated and predicted functions
plot_gmrf       <- predict_plot(object = pred_gmrf, 
                               units_label = cps$st, 
                               single_unit = TRUE, 
                               credible = FALSE)

## End(Not run)
## Not run: 
library(growfunctions)
data(cps)
y_short   <- cps$y[,(cps$yr_label %in% c(2010:2013))]
t_train   <- ncol(y_short)
N         <- nrow(y_short)
t_test    <- 4
 
## Model Runs

res_gmrf            = gmrfdpgrow(y = y_short, 
                                q_order = c(2,4), 
                                q_type = c("tr","sn"), 
                                n.iter = 100, 
                                n.burn = 50, 
                                n.thin = 1) 
## Prediction Model Runs
T_test             <- 4

pred_gmrf          <- predict_functions( object = res_gmrf,
                                     J = 1000, 
                                     T_test = T_test )

## plot estimated and predicted functions
plot_gmrf       <- predict_plot(object = pred_gmrf, 
                               units_label = cps$st, 
                               single_unit = TRUE, 
                               credible = FALSE)

## End(Not run)

Use the model-estimated GP covariance parameters from gpdpgrow() to predict the GP function at future time points. Inputs the `gpdpgrow` object of estimated parameters.

Description

A companion function to gpdpgrow

Usage

## S3 method for class 'gpdpgrow'
predict_functions(
  object,
  J = 500,
  test_times,
  time_points = NULL,
  sn_order = NULL,
  ...
)
## S3 method for class 'gpdpgrow'
predict_functions(
  object,
  J = 500,
  test_times,
  time_points = NULL,
  sn_order = NULL,
  ...
)

Arguments

`object`	Object of class `gpdpgrow` returned from model run of `gpdpgrow()`
`J`	Scalar denoting number of draws to take from posterior predictive for each unit. Defaults to `J = 500`.
`test_times`	A numeric vector holding test times at which to predict GP function values Will use the estimated covariance parameters from the training data to predict functions at the test_times for the `N` observation units.
`time_points`	Inputs a vector of common time points at which the collections of functions were observed (with the possibility of intermittent missingness). The length of `time_points` should be equal to the number of columns in the data matrix, `y`. Defaults to `time_points = 1:ncol(y)`.
`sn_order`	An integer vector of length, `L_s`, equal to the number of seasonal terms. Conveys the order of the seasonality for each term on the scale of T; for example, if T is dimensioned in months, and one wishes to model quarterly seasonality, then the applicable seasonality term would be of order `3`.
`...`	further arguments passed to or from other methods.

Value

out A list object containing containing two matrices; the first is a K x (N*T) matrix of predicted function values for each of K sampled iterations. N is slow index and denotes the number of experimental units. The second matrix is an N x T average over the K sampled draws, composed in Rao-Blackwellized fashion.

Note

Intended as a companion function for gpdpgrow for prediction

Author(s)

Terrance Savitsky [email protected]

Examples

## Not run: 
library(growfunctions)
data(cps)
y_short   <- cps$y[,(cps$yr_label %in% c(2010:2013))]
t_train   <- ncol(y_short)
N         <- nrow(y_short)
t_test    <- 4
 
## Model Runs
res_gp              = gpdpgrow(y = y_short
                              n.iter = 50, 
                              n.burn = 25, 
                              n.thin = 1, 
                              n.tune = 0) 

## Prediction Model Runs
T_test             <- 4
T_yshort           <- ncol(y_short)
pred_gp            <- predict_functions( object = res_gp, 
                       test_times = (T_yshort+1):(T_yshort+T_test) )

## plot estimated and predicted functions
plot_gp         <- predict_plot(object = pred_gp, 
                               units_label = cps$st, 
                               single_unit = FALSE, 
                               credible = TRUE)  

## End(Not run)
## Not run: 
library(growfunctions)
data(cps)
y_short   <- cps$y[,(cps$yr_label %in% c(2010:2013))]
t_train   <- ncol(y_short)
N         <- nrow(y_short)
t_test    <- 4
 
## Model Runs
res_gp              = gpdpgrow(y = y_short
                              n.iter = 50, 
                              n.burn = 25, 
                              n.thin = 1, 
                              n.tune = 0) 

## Prediction Model Runs
T_test             <- 4
T_yshort           <- ncol(y_short)
pred_gp            <- predict_functions( object = res_gp, 
                       test_times = (T_yshort+1):(T_yshort+T_test) )

## plot estimated and predicted functions
plot_gp         <- predict_plot(object = pred_gp, 
                               units_label = cps$st, 
                               single_unit = FALSE, 
                               credible = TRUE)  

## End(Not run)

Plot estimated functions both at estimated and predicted time points with 95% credible intervals.

Description

Uses as input the output object from the gpdpgrow.predict() and gmrfdpgrow.predict() methods.

Usage

predict_plot(
  object = NULL,
  units_label = NULL,
  type_label = c("fitted", "predicted"),
  time_points = NULL,
  date_label = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL,
  single_unit = FALSE,
  credible = TRUE
)
predict_plot(
  object = NULL,
  units_label = NULL,
  type_label = c("fitted", "predicted"),
  time_points = NULL,
  date_label = NULL,
  x.axis.label = NULL,
  y.axis.label = NULL,
  single_unit = FALSE,
  credible = TRUE
)

Arguments

`object`	A `gpdpgrow.predict` or `gmrfdpgrow.predict` object, obtained from `predict_functions(object,...)`.
`units_label`	A vector of labels to apply to experimental units with length equal to the number of unique units. Defaults to sequential numeric values as input with data, `y`.
`type_label`	A character vector assigning a "fitted" or "predicted label for the `time_points` input. Defaults to `type_label = c("fitted","predicted")`.
`time_points`	A list input of length 2 with each entry containing a numeric vector of times - one for the observed times for the set of "fitted" functions and the other denotes time values at which "predicted" values were rendered for the functions. This input variable only applies to `gpdpgrow` objects and not `gmrfdpgrow` objects because the latter covariance structure is based on adjacency for equally-spaced time points. Defaults to `1:T_train` for the list entry pointed to "fitted" and `(T_train+1):(T_train + T_test)` for the list entry pointed to "predicted".
`date_label`	A vector of `Date` values for labeling the x-axis tick marks. Defaults to `1:T` .
`x.axis.label`	Text label for x-axis. Defaults to `"time"`.
`y.axis.label`	Text label for y-axis. Defaults to `"function values"`.
`single_unit`	A scalar boolean indicating whether to plot the fitted vs data curve for only a single experimental units (versus a random sample of 6). Defaults to `FALSE`.
`credible`	A scalar boolean indicating whether to plot 95 percent credible intervals for estimated functions, `bb`, when plotting fitted functions versus data. Defaults to `credible = TRUE` .

Value

A list object containing the plot of estimated functions, faceted by cluster, and the associated data.frame object.

`p.cluster`	A `ggplot2` plot object
`dat.cluster`	A `data.frame` object used to generate `p.cluster`.

Author(s)

Terrance Savitsky [email protected]

Examples

## Not run: 
library(growfunctions)
data(cps)
y_short             <- cps$y[,(cps$yr_label %in% c(2008:2013))]
t_train             <- ncol(y_short)
N                   <- nrow(y_short)
t_test              <- 4
 
## Model Runs

res_gmrf            <- gmrfdpgrow(y = y_short, 
                                q_order = c(2,4), 
                                q_type = c("tr","sn"), 
                                n.iter = 40, 
                                n.burn = 20, 
                                n.thin = 1) 

res_gp              <- gpdpgrow(y = y_short
                              n.iter = 10, 
                              n.burn = 4, 
                              n.thin = 1, 
                              n.tune = 0) 

## Prediction Model Runs
T_test             <- 4

pred_gmrf          <- predict_functions( object = res_gmrf,
                                     J = 1000, 
                                     T_test = T_test )

T_yshort           <- ncol(y_short)
pred_gp            <- predict_functions( object = res_gp, 
                     test_times = (T_yshort+1):(T_yshort+T_test) )

## plot estimated and predicted functions
plot_gmrf       <- predict_plot(object = pred_gmrf, 
                               units_label = cps$st, 
                               single_unit = TRUE, 
                               credible = FALSE)

plot_gp         <- predict_plot(object = pred_gp, 
                               units_label = cps$st, 
                               single_unit = FALSE, 
                               credible = TRUE)  

## End(Not run)
## Not run: 
library(growfunctions)
data(cps)
y_short             <- cps$y[,(cps$yr_label %in% c(2008:2013))]
t_train             <- ncol(y_short)
N                   <- nrow(y_short)
t_test              <- 4
 
## Model Runs

res_gmrf            <- gmrfdpgrow(y = y_short, 
                                q_order = c(2,4), 
                                q_type = c("tr","sn"), 
                                n.iter = 40, 
                                n.burn = 20, 
                                n.thin = 1) 

res_gp              <- gpdpgrow(y = y_short
                              n.iter = 10, 
                              n.burn = 4, 
                              n.thin = 1, 
                              n.tune = 0) 

## Prediction Model Runs
T_test             <- 4

pred_gmrf          <- predict_functions( object = res_gmrf,
                                     J = 1000, 
                                     T_test = T_test )

T_yshort           <- ncol(y_short)
pred_gp            <- predict_functions( object = res_gp, 
                     test_times = (T_yshort+1):(T_yshort+T_test) )

## plot estimated and predicted functions
plot_gmrf       <- predict_plot(object = pred_gmrf, 
                               units_label = cps$st, 
                               single_unit = TRUE, 
                               credible = FALSE)

plot_gp         <- predict_plot(object = pred_gp, 
                               units_label = cps$st, 
                               single_unit = FALSE, 
                               credible = TRUE)  

## End(Not run)

Produce samples of MCMC output

Description

provides posterior sampled values for every model parameter of a gpdpgrow object

Usage

samples(object, ...)
samples(object, ...)

Arguments

`object`	A `gpdpgrow` object
`...`	Ignored

Produce samples of MCMC output

Description

provides posterior sampled values for every model parameter of a gmrfdpgrow object

Usage

## S3 method for class 'gmrfdpgrow'
samples(object, ...)
## S3 method for class 'gmrfdpgrow'
samples(object, ...)

Arguments

`object`	A `gmrfdpgrow` object
`...`	Ignored

Package 'growfunctions'

Help Index

Bayesian Non-Parametric Models for Estimating a Set of Denoised, Latent Functions From an Observed Collection of Domain-Indexed Time-Series

Description

Details

Author(s)

References

Examples

Plot estimated functions for experimental units faceted by cluster versus data to assess fit.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Monthly employment counts from 1990 - 2013 from the Current Population Survey

Description

Format

Details

Side-by-side plot panels that compare latent function values to data for different estimation models

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Generate a finite population and take an informative single or two-stage sample.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Run a Bayesian functional data model under an instrinsic GMRF prior whose precision parameters employ a DP prior for a COUNT data response type where: y ~ poisson(E*exp(Psi)) Psi ~ N(gamma,tau_e^-1) which is a Poisson-lognormal model

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Bayesian instrinsic Gaussian Markov Random Field model for dependent time-indexed functions

Description

Usage

Arguments

Value

Note

Author(s)

References

See Also

Examples

Run a Bayesian functional data model under an instrinsic GMRF prior whose precision parameters employ a DP prior

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Run a Bayesian functional data model under a GP prior with a fixed clustering structure that co-samples latent functions, bb_i.

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Run a Bayesian functional data model under a GP prior whose parameters employ a DP prior

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Bayesian non-parametric dependent Gaussian process model for time-indexed functional data

Description

Run a Bayesian functional data model under a GP prior with a fixed clustering structure that co-samples latent functions, `bb_i`.

Use the model-estimated iGMRF precision parameters from gmrfdpgrow() to predict the iGMRF function at future time points. Inputs the `gmrfdpgrow` object of estimated parameters.

Use the model-estimated GP covariance parameters from gpdpgrow() to predict the GP function at future time points. Inputs the `gpdpgrow` object of estimated parameters.