Title: | Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data |
---|---|
Description: | Estimates a collection of time-indexed functions under either of Gaussian process (GP) or intrinsic Gaussian Markov random field (iGMRF) prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same covariance or precision parameters. The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing either or both of multiple trend and seasonality. |
Authors: | Terrance Savitsky |
Maintainer: | Terrance Savitsky <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.16 |
Built: | 2024-12-07 06:57:03 UTC |
Source: | CRAN |
Package: | growfunctions |
Type: | Package |
Version: | 0.16 |
Date: | 2023-12-08 |
License: | GPL (>= 3) |
LazyLoad: | yes |
Parameterizes a model for a collection of noisy time-series indexed by domain or observation unit as an additive function process plus noise process. The latent functions are modeled under both Gaussian process (GP) and intrinsic Gaussian Markov random (iGMRF) field priors. Dependence is estimated among the set of functions by selecting a prior, G, on the covariance or precision parameters of the GP and iGMRF, respectively, where G receives a Dirichlet process (DP) prior. The resulting marginal prior on the functions is a nonparametric scale mixture over the covariance or precision parameters, with G the unknown mixing measure. Draws from a DP are almost surely discrete, allowing for ties (interpreted as clusters) among the covariance or precision parameters indexed by domain or observation unit. Functions are included that permit additional inference on the clustering properties over the domains. The mixture models allow specification of multiple additive latent functions for each of the GP and iGMRF DP mixture formulations. Each function (under a GP prior) may be specified with a covariance function selected from a set of available options that allow for combinations of trend and seasonal components. The same is true for precision matrix constructions under the iGMRF DP mixture.
ESTIMATION FUNCTIONS
gpdpgrow
performs Bayesian nonparametric estimation of a set of dependent, denoised
latent functions under a DP mixture of GP's in an unsupervised fashion based on user input of
an N x T
data matrix, y
, where N
denotes the number of domains or units
and T
denotes the number of time points. The DP prior estimates the dependent structure
among the covariance parameters generating each T x 1
function across the N
domains.
The user may specify multiple latent functions in an additive
regression formulation, where
each covariance kernel may be selected from the squared exponential ("se"
), the
rational quadratic ("rq"
), or a quasi-periodic (product of a period and
squared exponential
to allow the periodic kernel to evolve) ("sn"
).
gmrfdpgrow
also inputs an N x T
data matrix, y
, but replaces the
GP prior used in gpdpgrow()
with an iGMRF prior. The DP mixture is over the precision
parameter, kappa
, that multiplies a fixed matrix, Q
, which specifies
the length-scale
of dependence. The user may specify multiple functions in an additive formulation,
each with its own precision matrix. The precision matrix is specified
via two options. Input q_type
indicates whether the precision matrix is a trend ("tr"
)
precision matrix or a
seasonal ("sn"
) precision matrix. Integer innput q_order
controls the length scale of the estimated functions and specifies the
difference order of the precision matrix in the case of q_type = "tr"
,
or the periodicity, in the case q_type = "sn"
. Typically, q_order
is set to
1
or 2
when q_type = "tr"
, though other values are possible. An optional
N x R
predictor matrix, ksi, may be input to adjust the prior for cluster assignments,
s
, to produce a dependent product partition model.
MSPE
Inputs a gpdpgrow()
or gmrfdpgrow()
object estimated where
some data values deliberately set to missing (NA
) and produces an out-of-sample
mean square prediction error (MSPE), and a normalized MSPE (normalized by the variance of
the missing test set). Both gpdpgrow()
and gmrfdpgrow()
returned objects also
include a leave-one-out log-pseudo marginal likelihood LPML
fit statistic estimated
on the training set (so that no data are required to be left out). One would expect the
nMSPE
statistic to provide a greater penalty for model complexity since it is computed
on data not used for estimation.
predict_functions
Uses the model-estimated GP covariance parameters from
gpdpgrow()
or iGMRF precision parameters from gmrfdpgrow()
to predict functions at future time points beyond the range of the data
from their posterior predictive distributions. Both gpdpgrow()
and gmrfdpgrow()
will predict missing function values in the range of the data.
PLOT FUNCTIONS
cluster_plot
inputs a returned object from either of gpdpgrow()
or
gmrfdpgrow()
and produces two plots. The first plot creates panels for the clusters
and aggregates line plots of posterior mean estimates for member denoised functions.
An optional
smoother may be drawn through the set of functions in each panel to differentiate the
patterns expressed across clusters. A second plot renders the estimated posterior mean
values (with an option for credible intervals) for a single or group of randomly-selected
latent functions against the actual data values, y
.
informative_plot
inputs a list of returned objects, all from either
of gpdpgrow()
or gmrfdpgrow()
(where the model type, GP or
iGMRF, is communicated with input model
) that compares
credible intervals for covariance or precision
parameters where the data are drawn from an informative
sampling design (rather than as iid), so that the distribution
for the population is not the same as that for
the sample. One model conducts estimation in a fashion that ignores the informative
design and the other incorporates sampling weights to account for the informativeness.
Comparing the resulting estimated credible intervals provides a means to assess the
sensitivity of estimated parameters to the sampling design. The set of objects are
labeled with objects_labels
with allowable inputs c("ignore","weight","iid")
.
One of the objects must have label "ignore"
, and the other must have label,
weight
. An additional object may also be input that is estimated from an iid sample
drawn from the same population as the informative sample used for estimation under
objects "ignore"
and "weight"
. Both informative and iid samples are generated
from the synthetic data function, gen_informative_sample()
.
fit_compare
inputs a list of returned objects, each from either
gpdpgrow()
or gmrfdpgrow()
, and plots the posterior mean estimate
for a randomly-selected latent function as compared to the actual data in a set
panels indexed by cluster and object. Allows comparison of fit performance of
functions to data across varied model specifications.
predict_plot
uses a returned object from predict_functions()
to plot both estimated and predicted function values (with the option for
credible intervals) where the prediction interval is outside the range of data.
The plot function, cluster_plot
, should be used where
the predicted values are in the range
of the data (and may, hence, be treated as missing values. The estimated functions are
plotted in cluster_plot
for the missing, as well as observed, data points).
DATA SETS (and functions to generate synthetic data sets)
cps
Data derived from the Current Population Survey (CPS) administered
to households in
local areas within each of 51 states. These data capture a rectangular matrix of
monthly-indexed
all-other employment direct estimates computed from the set of household responses.
The data capture
156 month observations (from 2000 - 2013) for each of the 51 states. Two objects are included:
y_raw
contains the 51 x 156 matrix of all-other employment counts.
y
contains the 51 x 156 matrix of all-other
employment counts are standardization to (0,1), by state, to all
them to be modeled together.
gen_informative_sample
Generates an N x T
population data matrix,
y
, and an associated n x T
sample data matrix, y_obs
, where the
sample is drawn using a 1 or 2-stage informative process. The 1-stage sample uses
unequal probability stratified sampling, while the 2-stage process samples the first stage
in blocks, while the second stage samples with unequal probability from strata for selected
blocks. Both block and strata memberships for population units are generated based on the
variance of their time-series, y
. The resulting sample is informative because the
block and cluster memberships and selection probabilities are determined based on y
.
Terrance Savitsky [email protected]
Terrance D. Savitsky (2016) Bayesian Nonparametric Mixture Estimation for Time-Indexed Functional Data in R, Journal of Statistical Software, Volume 72, Number 2, pages 1 – 34, doi:10.18637/jss.v072.i02.
T. D. Savitsky, D. Toth (2016) Bayesian Estimation Under Informative Sampling, Electronic Journal of Statistics, Volume 10, Number 1.
T. D. Savitsky (2016) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Survey Methodology.
{ library(growfunctions) ## load the monthly employment count ## data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, ## associated with ## the years 2011 - 2013 ## to examine the state level ## employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2011:2013))] ## run DP mixture of GP's to estimate ## posterior distributions ## for model parameters ## uses default setting of a single ## "rational quadratic" ## covariance formula ## A short number of iterations is used ## for illustration ## Run for 500 iterations with half ## as burn-in to ## get a more useful result res_gp <- gpdpgrow( y = y_short, n.iter = 3, n.burn = 1, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected ## functions fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## Run the DP mixture of iGMRF's ## to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term ## A short number of iterations ## is used for illustration ## Run for 2000 iterations with ## half as burn-in to ## get a more useful result res_gmrf <- gmrfdpgrow( y = y_short, n.iter = 11, n.burn = 4, n.thin = 1) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of ## randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## visual comparison of fit performance ## between gpdpgrow() and gmrfdpgrow() ## or any two objects returned from any ## combination of these estimation ## functions objects <- vector("list",2) objects[[1]] <- res_gmrf objects[[2]] <- res_gp label.object <- c("gmrf_tr2","gp_rq") ## the map data.frame ## object from fit_plots gp ## includes a field that ## identifies cluster assignments ## for each unit (or domain) H <- fit_plots_gp$map$cluster fit_plot_compare_facet <- fit_compare( objects = objects, H = H, label.object = label.object, y.axis.label = "normalized y", units_name = "state", units_label = cps$st) }
{ library(growfunctions) ## load the monthly employment count ## data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, ## associated with ## the years 2011 - 2013 ## to examine the state level ## employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2011:2013))] ## run DP mixture of GP's to estimate ## posterior distributions ## for model parameters ## uses default setting of a single ## "rational quadratic" ## covariance formula ## A short number of iterations is used ## for illustration ## Run for 500 iterations with half ## as burn-in to ## get a more useful result res_gp <- gpdpgrow( y = y_short, n.iter = 3, n.burn = 1, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected ## functions fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## Run the DP mixture of iGMRF's ## to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term ## A short number of iterations ## is used for illustration ## Run for 2000 iterations with ## half as burn-in to ## get a more useful result res_gmrf <- gmrfdpgrow( y = y_short, n.iter = 11, n.burn = 4, n.thin = 1) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of ## randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## visual comparison of fit performance ## between gpdpgrow() and gmrfdpgrow() ## or any two objects returned from any ## combination of these estimation ## functions objects <- vector("list",2) objects[[1]] <- res_gmrf objects[[2]] <- res_gp label.object <- c("gmrf_tr2","gp_rq") ## the map data.frame ## object from fit_plots gp ## includes a field that ## identifies cluster assignments ## for each unit (or domain) H <- fit_plots_gp$map$cluster fit_plot_compare_facet <- fit_compare( objects = objects, H = H, label.object = label.object, y.axis.label = "normalized y", units_name = "state", units_label = cps$st) }
Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.
cluster_plot( object, N_clusters = NULL, time_points = NULL, units_name = "unit", units_label = NULL, date_field = NULL, x.axis.label = NULL, y.axis.label = NULL, smoother = TRUE, sample_rate = 1, single_unit = FALSE, credible = FALSE, num_plot = NULL )
cluster_plot( object, N_clusters = NULL, time_points = NULL, units_name = "unit", units_label = NULL, date_field = NULL, x.axis.label = NULL, y.axis.label = NULL, smoother = TRUE, sample_rate = 1, single_unit = FALSE, credible = FALSE, num_plot = NULL )
object |
A |
N_clusters |
Denotes the number of largest sized (in terms of membership) clusters to plot. Defaults to all clusters. |
time_points |
Inputs a vector of common time points at which the collections of functions were
observed (with the possibility of intermittent missingness). The length of |
units_name |
The plot label for observation units. Defaults to |
units_label |
A vector of labels to apply to the observation units with length equal to the number of
unique units. Defaults to sequential numeric values as input with data, |
date_field |
A vector of |
x.axis.label |
Text label for x-axis. Defaults to |
y.axis.label |
Text label for y-axis. Defaults to |
smoother |
A scalar boolean input indicating whether to co-plot a smoother line through the functions in each cluster. |
sample_rate |
A numeric value in (0,1] indicating percent of functions to randomly sample within each cluster to address over-plotting. Defaults to 1. |
single_unit |
A scalar boolean indicating whether to plot the fitted vs data curve for
only a single experimental units (versus a random sample of 6).
Defaults to |
credible |
A scalar boolean indicating whether to plot 95 percent credible intervals for
estimated functions, |
num_plot |
A scalar integer indicating how many randomly-selected functions to plot
(each in it's own plot panel) in the plot of functions versus the observed time series
in the case that |
A list object containing the plot of estimated functions, faceted by cluster,
and the associated data.frame
object.
p.cluster |
A |
dat.cluster |
A |
Terrance Savitsky [email protected]
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2008 - 2013 ## to examine the state level employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] ## Run the DP mixture of iGMRF's to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 40, n.burn = 20, n.thin = 1) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2008 - 2013 ## to examine the state level employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] ## Run the DP mixture of iGMRF's to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 40, n.burn = 20, n.thin = 1) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
Monthly employment counts published by the U.S. Bureau of Labor Statistics in the
Current Population Survey (CPS) for each of N = 51
states (including the District
of Columbia). This dataset covers T = 278
months from 1990 the first two
months of 2013. The data include a N x T
matrix, y_raw
, of raw employment counts,
as well as set of standardized values, y
, where the standardization is done within state.
The standardized data matrix is used in our gpdpgrow
and gmrfdpgrow
estimating functions because the standardization facilitates comparisons of the time-series across
states.
A list object of 5 objects supporting a data matrix of N = 51 state time series for T = 278 months.
y. An (N = 51) x (T = 278)
matrix of standardized employment count
estimates for N = 51
states for T = 278
months, beginning in 1990. The counts
are standardized to (0,1) for each state series
y_raw. An N x T
matrix of estimated monthly employment counts for N = 51
states.
st. Two-digit labels for each of the N
states in the order presented in
y
and y_raw
.
dte. A Date
vector of length T
that presents the set of dates (in y-m-d
format) associated to the T
time points presented in y
and y_raw
.
yr. A number vector listing sequence of years, 1990 - 2013 included in the data set.
yr_label. A numerical vector of length T = 278
with year labels for each
monthly employment count in the cps
data set.
Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.
fit_compare( objects, H = NULL, label.object = c("gp_rq", "gmrf_rw2"), units_name = "Observation_Unit", units_label = NULL, date_field = NULL, x.axis.label = NULL, y.axis.label = NULL )
fit_compare( objects, H = NULL, label.object = c("gp_rq", "gmrf_rw2"), units_name = "Observation_Unit", units_label = NULL, date_field = NULL, x.axis.label = NULL, y.axis.label = NULL )
objects |
A list input where each element is a returned object
from estimation with either of |
H |
An |
label.object |
A character vector of length equal to |
units_name |
A character input that provides a label
for the set of |
units_label |
A vector of labels to apply to the observation units
with length equal to the
number of unique units. Defaults to sequential
numeric values as input with data, |
date_field |
A vector of |
x.axis.label |
Text label for x-axis. Defaults to |
y.axis.label |
Text label for y-axis. Defaults to |
A list object containing the plot of estimated functions, faceted by cluster,
and the associated data.frame
object.
p.t |
A |
map |
A |
Terrance Savitsky [email protected]
{ library(growfunctions) ## load the monthly employment count data ## for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, ## associated with ## the years 2009 - 2013 ## to examine the state level ## employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] ## run DP mixture of GP's to ## estimate posterior distributions ## for model parameters ## uses default setting of a ## single "rational quadratic" ## covariance formula res_gp <- gpdpgrow( y = y_short, n.iter = 3, n.burn = 1, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected ## functions fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## Run the DP mixture of iGMRF's to ## estimate posterior ## distributions for model parameters ## Under default ## RW2(kappa) = order 2 trend ## precision term res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 13, n.burn = 4, n.thin = 1) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## visual comparison of fit performance ## between gpdpgrow() and gmrfdpgrow() ## or any two objects returned from any ## combination of these estimation ## functions objects <- vector("list",2) objects[[1]] <- res_gmrf objects[[2]] <- res_gp label.object <- c("gmrf_tr2","gp_rq") ## the map data.frame object ## from fit_plots gp ## includes a field that ## identifies cluster assignments ## for each unit (or domain) H <- fit_plots_gp$map$cluster fit_plot_compare_facet <- fit_compare( objects = objects, H = H, label.object = label.object, y.axis.label = "normalized y", units_name = "state", units_label = cps$st) }
{ library(growfunctions) ## load the monthly employment count data ## for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, ## associated with ## the years 2009 - 2013 ## to examine the state level ## employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] ## run DP mixture of GP's to ## estimate posterior distributions ## for model parameters ## uses default setting of a ## single "rational quadratic" ## covariance formula res_gp <- gpdpgrow( y = y_short, n.iter = 3, n.burn = 1, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected ## functions fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## Run the DP mixture of iGMRF's to ## estimate posterior ## distributions for model parameters ## Under default ## RW2(kappa) = order 2 trend ## precision term res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 13, n.burn = 4, n.thin = 1) ## 2 plots of estimated functions: ## 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) ## visual comparison of fit performance ## between gpdpgrow() and gmrfdpgrow() ## or any two objects returned from any ## combination of these estimation ## functions objects <- vector("list",2) objects[[1]] <- res_gmrf objects[[2]] <- res_gp label.object <- c("gmrf_tr2","gp_rq") ## the map data.frame object ## from fit_plots gp ## includes a field that ## identifies cluster assignments ## for each unit (or domain) H <- fit_plots_gp$map$cluster fit_plot_compare_facet <- fit_compare( objects = objects, H = H, label.object = label.object, y.axis.label = "normalized y", units_name = "state", units_label = cps$st) }
Used to compare performance of sample design-weighted and unweighted estimation procedures.
gen_informative_sample( clustering = TRUE, two_stage = FALSE, theta = c(0.2, 0.7, 1), M = 3, theta_star = matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow = TRUE), gp_type = "rq", N = 10000, T = 15, L = 10, R = 8, I = 4, n = 750, noise_to_signal = 0.05, incl_gradient = "medium" )
gen_informative_sample( clustering = TRUE, two_stage = FALSE, theta = c(0.2, 0.7, 1), M = 3, theta_star = matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow = TRUE), gp_type = "rq", N = 10000, T = 15, L = 10, R = 8, I = 4, n = 750, noise_to_signal = 0.05, incl_gradient = "medium" )
clustering |
Boolean input on whether want population generated from clusters of covariance
parameters. Defaults to |
two_stage |
Boolean input on whether want two stage sampling, with first stage defining set
of |
theta |
A numeric vector of global covariance parameters in the case of |
M |
Scalar input denoting number of clusters to employ if |
theta_star |
An P x M matrix of cluster location values associated with the choice of
|
gp_type |
Input of choice for covariance matrix formulation to be used to generate the functions
for the |
N |
A scalar input denoting the number of population units (or establishments). |
T |
A scalar input denoting the number of time points in each of |
L |
A scalar input that denotes the number of blocks in which to assign the population
units to be sub-sampled in the first stage of sampling.
Defaults to |
R |
A scalar input that denotes the number of blocks to sample from |
I |
A scalar input denoting the number of strata to form within each block. Population units
are divided into equally-sized strata based on variance quantiles. Defaults to |
n |
Sample size to be generated. Both an informative sample under either single
( |
noise_to_signal |
A numeric input in the interval, |
incl_gradient |
A character input on whether stratum probabilities from lowest-to-highest
is to |
A list object named dat_sim
containing objects related to the generated sample
finite population, the informative sample and the non-informative, iid, sample.
Some important objects, include:
H |
A vector of length |
map.tot |
A |
map.obs |
A |
map.iid |
A |
(y , bb)
|
N x T |
(y_obs , bb_obs)
|
N x T |
(y_iid , bb_iid)
|
N x T |
Terrance Savitsky [email protected]
## Not run: library(growfunctions) ## use gen_informative_sample() to generate an ## N X T population drawn from a dependent GP ## By default, 3 clusters are used to generate ## the population. ## A single stage stratified random sample of size n ## is drawn from the population using I = 4 strata. ## The resulting sample is informative in that the ## distribution for this sample is ## different from the population from which ## it was drawn because the strata inclusion ## probabilities are proportional to a feature ## of the response, y (in the case, the variance. ## The stratified random sample over-samples ## large variance strata). ## (The user may also select a 2-stage ## sample with the first stage ## sampling "blocks" of the population and ## the second stage sampling strata within blocks). dat_sim <- gen_informative_sample(N = 10000, n = 500, T = 10, noise_to_signal = 0.1) ## extract n x T observed sample under informative ## stratified sampling design. y_obs <- dat_sim$y_obs T <- ncol(y_obs) ## End(Not run)
## Not run: library(growfunctions) ## use gen_informative_sample() to generate an ## N X T population drawn from a dependent GP ## By default, 3 clusters are used to generate ## the population. ## A single stage stratified random sample of size n ## is drawn from the population using I = 4 strata. ## The resulting sample is informative in that the ## distribution for this sample is ## different from the population from which ## it was drawn because the strata inclusion ## probabilities are proportional to a feature ## of the response, y (in the case, the variance. ## The stratified random sample over-samples ## large variance strata). ## (The user may also select a 2-stage ## sample with the first stage ## sampling "blocks" of the population and ## the second stage sampling strata within blocks). dat_sim <- gen_informative_sample(N = 10000, n = 500, T = 10, noise_to_signal = 0.1) ## extract n x T observed sample under informative ## stratified sampling design. y_obs <- dat_sim$y_obs T <- ncol(y_obs) ## End(Not run)
An internal function to gmrfdpgrow
gmrfdpcountPost( y, E, ksi, ipr, C, D, q_order, q_type, n.iter, n.burn, n.thin, M_init, w_star, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, nu, Rep, progress, jitter, kappa_fast, stable_launch )
gmrfdpcountPost( y, E, ksi, ipr, C, D, q_order, q_type, n.iter, n.burn, n.thin, M_init, w_star, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, nu, Rep, progress, jitter, kappa_fast, stable_launch )
y |
An N x T matrix of N observations of T x 1 functions |
E |
A multivariate offset variable, specified as an N x T matrix, in the case
that |
ksi |
An N x P matrix of N observations of P predictors to be used
in prior probability of co-clustering of set of N, T x 1 observations.
Defaults to |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
C |
A list object of length, |
D |
A K x T matrix, where |
q_order |
An integer vector where each entry contains the order of the associated |
q_type |
A vector of length |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
M_init |
Starting value of number of clusters for sampling cluster assignments. |
w_star |
Integer value denoting the number of cluster locations to sample ahead of
observations in the auxiliary Gibbs sampler used to sample the number of clusters
and associated cluster assignments. A higher value reduces samplin auto-correlation,
but increases computational burden. Defaults to |
q_shape |
The shape parameter of the Gamma base distribution for the |
q_rate |
The rate parameter of the Gamma base distribution for the |
tau_shape |
The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error
precision parameter. Defaults to |
tau_rate |
The rate parameter of the Gamma prior distribution on |
dp_shape |
The shape parameter for the |
dp_rate |
The rate parameter for the |
nu |
The degree of freedom parameter for the Huang and Wand prior on precision
matrix locations, |
Rep |
The number of times to draw samples of the |
progress |
An indicator in |
jitter |
A scalar double indicating amount of jitter to subract from the posterior
rate and shape hyperparameters of |
kappa_fast |
Boolean for whether to generate rate hyperparameter from full conditionals
versus joint Gaussian (on random effects, |
stable_launch |
A boolean indicator on whether to generate initial values for
|
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gmrfdpgrow
Terrance Savitsky [email protected]
Estimates a collection of time-indexed functions under intrinsic Gaussian Markov random field prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same iGMRF precision parameter. The iGMRF formulation supports any number of additive precision terms, expressing either or both of multiple trend and seasonality.
gmrfdpgrow( y, ksi, E, ipr, q_order, q_type, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, M_init, w_star, n.iter, n.burn, n.thin, nu, Rep, progress, jitter, kappa_fast, stable_launch )
gmrfdpgrow( y, ksi, E, ipr, q_order, q_type, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, M_init, w_star, n.iter, n.burn, n.thin, nu, Rep, progress, jitter, kappa_fast, stable_launch )
y |
A multivariate continuous response, specified as an N x T matrix, where |
ksi |
An N x P matrix of N observations of P predictors to be used
in prior probability of co-clustering of set of N, T x 1 observations.
Defaults to |
E |
A multivariate offset variable, specified as an N x T matrix, in the case
that |
ipr |
An optional input vector of inclusion probabilities for each observation unit
in the case the observed data were acquired through an informative sampling design, so
that unbiased inference about the population requires adjustments to the observed sample
Defaults to |
q_order |
An integer vector of length |
q_type |
A vector of length |
q_shape |
The value (in (0,infty)) for the shape hyperparameter for the Gamma base distribution for
the iGMRF scale parameters, |
q_rate |
The rate parameter of the Gamma base distribution on |
tau_shape |
The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error
precision parameter. Defaults to |
tau_rate |
The rate parameter of the Gamma prior distribution on |
dp_shape |
The shape parameter for the Gamma prior on the DP concentration parameter,
|
dp_rate |
The rate parameter for the Gamma prior on the DP concentration parameter,
|
M_init |
Starting number of clusters of |
w_star |
Integer value denoting the number of cluster locations to sample ahead of
observations in the auxiliary Gibbs sampler used to sample the number of clusters
and associated cluster assignments. A higher value reduces sampling auto-correlation,
but increases computational burden. Defaults to |
n.iter |
Total number of MCMC iterations. |
n.burn |
Number of MCMC iterations to discard.
|
n.thin |
Gap between successive sampling iterations to save. |
nu |
The degree of freedom parameter for the Huang and Wand prior on precision
|
Rep |
The number of times to draw samples of the |
progress |
A boolean value denoting whether to display a progress bar during model execution.
Defaults to |
jitter |
A scalar double indicating amount of jitter to subract from the posterior
rate and shape hyperparameters of |
kappa_fast |
Boolean for whether to generate rate hyperparameter from full conditionals
versus joint Gaussian (on random effects, |
stable_launch |
A boolean indicator on whether to generate initial values for
|
S3 gmrfdpgrow
object, for which many methods are available to return and view results.
Generic functions applied to an object, res
of class gmrfdpgrow
, includes:
plot(res) |
returns results plots, including fit functions versus data and allocation of fitted functions into clusters |
samples(res) |
contains ( |
resid(res) |
contains the model residuals. |
The intended focus for this package are data composed of observed noisy functions (each of
length T
) for a set of experimental units where the functions may express dependence
among the experimental units
Terrance Savitsky [email protected] Daniell toth [email protected]
T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).
T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.
T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional
Data for R
. Submitted to: Journal of Statistical Software.
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated ## with the years 2008 - 2013 ## to examine the state level employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] ## Run the DP mixture of iGMRF's to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term ## Run for 1500 iterations, with half as burn-in for a ## more useful (converged) result. res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 40, n.burn = 20, n.thin = 1) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated ## with the years 2008 - 2013 ## to examine the state level employment levels ## during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] ## Run the DP mixture of iGMRF's to estimate posterior ## distributions for model parameters ## Under default RW2(kappa) = order 2 trend ## precision term ## Run for 1500 iterations, with half as burn-in for a ## more useful (converged) result. res_gmrf <- gmrfdpgrow(y = y_short, n.iter = 40, n.burn = 20, n.thin = 1) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gmrf <- cluster_plot( object = res_gmrf, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
An internal function to gmrfdpgrow
gmrfdpPost( y, ksi, ipr, C, D, q_order, q_type, n.iter, n.burn, n.thin, M_init, w_star, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, nu, progress, jitter, kappa_fast )
gmrfdpPost( y, ksi, ipr, C, D, q_order, q_type, n.iter, n.burn, n.thin, M_init, w_star, q_shape, q_rate, tau_shape, tau_rate, dp_shape, dp_rate, nu, progress, jitter, kappa_fast )
y |
An N x T matrix of N observations of T x 1 functions |
ksi |
An N x P matrix of N observations of P predictors to be used
in prior probability of co-clustering of set of N, T x 1 observations.
Defaults to |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
C |
A list object of length, |
D |
A K x T matrix, where |
q_order |
An integer vector where each entry contains the order of the associated |
q_type |
A vector of length |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
M_init |
Starting value of number of clusters for sampling cluster assignments. |
w_star |
Integer value denoting the number of cluster locations to sample ahead of
observations in the auxiliary Gibbs sampler used to sample the number of clusters
and associated cluster assignments. A higher value reduces samplin auto-correlation,
but increases computational burden. Defaults to |
q_shape |
The shape parameter of the Gamma base distribution for the |
q_rate |
The rate parameter of the Gamma base distribution for the |
tau_shape |
The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error
precision parameter. Defaults to |
tau_rate |
The rate parameter of the Gamma prior distribution on |
dp_shape |
The shape parameter for the |
dp_rate |
The rate parameter for the |
nu |
The degree of freedom parameter for the Huang and Wand prior on precision
matrix locations, |
progress |
An indicator in |
jitter |
A scalar double indicating amount of jitter to subract from the posterior
rate and shape hyperparameters of |
kappa_fast |
Boolean for whether to generate rate hyperparameter from full conditionals
versus joint Gaussian (on random effects, |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gmrfdpgrow
Terrance Savitsky [email protected]
bb_i
.An internal function to gpdpgrow
gpBFixPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress, s )
gpBFixPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress, s )
y |
An N x T matrix of N observations of T x 1 functions. |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
Omega_t |
A T x T matrix of squared Eucidean distances for |
Omega_s |
A |
gp_mod |
An L x 1 numeric vector denoting the selected covariance function for each
of |
jitter |
Numeric value added to diagonals of GP covariance matrix to stabilize inversion. |
gp_shape |
The shape parameter of the Gamma base distribution for the |
gp_rate |
The rate parameter of the Gamma base distribution for the |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
lower |
Minimum in range of support for GP covariance parameters, |
upper |
Maximum in range of support for GP covariance parameters, |
w |
Tuning parameter for slice sampling interval width used for GP
covariance parameters, |
n_slice_iter |
Maximum number of steps to widen slice samplind width for
GP covariance parameters, |
y_index |
List object where each contains index of time points to use in |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
n.tune |
The number of tuning iterations to update the slice sampler width, |
progress |
An indicator in |
s |
An integer vector inputting cluster membership structure if select |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gpdpgrow
Terrance Savitsky [email protected]
An internal function to gpdpgrow
gpdpbPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, b_move, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w_star, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, M_init, dp_shape, dp_rate, progress )
gpdpbPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, b_move, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w_star, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, M_init, dp_shape, dp_rate, progress )
y |
An N x T matrix of N observations of T x 1 functions. |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
Omega_t |
A T x T matrix of squared Eucidean distances for |
Omega_s |
A |
gp_mod |
An L x 1 numeric vector denoting the selected covariance function for each
of |
jitter |
Numeric value added to diagonals of GP covariance matrix to stabilize inversion. |
b_move |
An indicator in |
gp_shape |
The shape parameter of the Gamma base distribution for the |
gp_rate |
The rate parameter of the Gamma base distribution for the |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
lower |
Minimum in range of support for GP covariance parameters, |
upper |
Maximum in range of support for GP covariance parameters, |
w_star |
Tuning parameter for number of locations to sample not linked to observations in the auxiliary Gibbs sampler for cluster assignments. |
w |
Tuning parameter for slice sampling interval width used for GP
covariance parameters, |
n_slice_iter |
Maximum number of steps to widen slice samplind width for
GP covariance parameters, |
y_index |
List object where each contains index of time points to use in |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
n.tune |
The number of tuning iterations to update the slice sampler width, |
M_init |
Starting value of number of clusters for sampling cluster assignments. |
dp_shape |
The shape parameter for the |
dp_rate |
The rate parameter for the |
progress |
An indicator in |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gpdpgrow
Terrance Savitsky [email protected]
Estimates a collection of time-indexed functions with Gaussian process (GP) formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same GP covariance parameters. The GP formulation supports any number of additive GP covariance terms, expressing either or both of multiple trend and seasonality.
gpdpgrow( y, ipr, time_points, gp_cov, sn_order, jitter, gp_shape, gp_rate, noise_shape, noise_rate, dp_shape, dp_rate, M_init, lower, upper, sub_size, w_star, w, n.iter, n.burn, n.thin, n.tune, progress, b_move, cluster, s )
gpdpgrow( y, ipr, time_points, gp_cov, sn_order, jitter, gp_shape, gp_rate, noise_shape, noise_rate, dp_shape, dp_rate, M_init, lower, upper, sub_size, w_star, w, n.iter, n.burn, n.thin, n.tune, progress, b_move, cluster, s )
y |
A multivariate continuous response, specified as an N x T matrix, where |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
time_points |
Inputs a vector of common time points at which the collections of functions were
observed (with the possibility of intermittent missingness). The length of |
gp_cov |
A vector of length |
sn_order |
A vector of length |
jitter |
A scalar numerical value added to the diagonal elements of the T x T GP covariance
matrix to stabilize computation. Defaults to |
gp_shape |
The shape parameter of the Gamma base distribution for the DP prior on
the P x N matrix of GP covariance parameters (where P
denotes the number of parameters for each of the N experimental units).
Defaults to |
gp_rate |
The rate parameter of the Gamma base distribution on GP covariance parameters.
Defaults to |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
dp_shape |
The shape parameter for the Gamma prior on the DP concentration parameter,
|
dp_rate |
The rate parameter for the Gamma prior on the DP concentration parameter,
|
M_init |
Starting number of clusters of |
lower |
The lower end of the range to be used in conditionally sampling the GP covariance
parameters ( |
upper |
The upper end of the range to be used in conditionally sampling the GP covariance
parameters ( |
sub_size |
Integer vector whose length, |
w_star |
Integer value denoting the number of cluster locations to sample ahead of
observations in the auxiliary Gibbs sampler used to sample the number of clusters
and associated cluster assignments. A higher value reduces samplin auto-correlation, but
increases computational burden. Defaults to |
w |
Numeric value denoting the step width used to construct the interval from
which to draw a sample for each GP covariance parameter in the slice sampler. This
value is adaptively updated in the sampler tuning stage for each parameter to be equal
to the difference in the 0.95 and 0.05 sample quantiles for each of 5 block updates.
Defaults to |
n.iter |
Total number of MCMC iterations. |
n.burn |
Number of MCMC iterations to discard.
|
n.thin |
Gap between successive sampling iterations to save. |
n.tune |
Number of iterations (before ergodic chain instantiated) to adapt |
progress |
A boolean value denoting whether to display a progress bar during model execution.
Defaults to |
b_move |
A boolean value denoting whether to sample the GP function, |
cluster |
A boolean value denoting whether to employ DP mix model over set of GP functions or
to just use GP model with no clustering of covariance function parameters.
Defaults to |
s |
An N x 1 integer vector that inputs a fixed clustering, rather than sampling it.
Defaults to |
S3 gpdpgrow
object, for which many methods are available to return and view results. Generic functions applied
to an object, res
of class gpdpgrow
, includes:
samples(res) |
contains ( |
resid(res) |
contains the model residuals. |
The intended focus for this package are data composed of observed noisy functions (each of
length T
) for a set of experimental units where the functions may express dependence
among the experimental units
Terrance Savitsky [email protected] Daniell Toth [email protected]
T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).
T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.
T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional
Data for R
. Submitted to: Journal of Statistical Software.
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2011 - 2013 ## to examine the state level employment ## levels during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2011:2013))] ## uses default setting of a single "rational quadratic" covariance ## run for 500 iterations, with half discarded as burn-in to ## obtain a more useful result. res_gp <- gpdpgrow(y = y_short, n.iter = 4, n.burn = 1, n.thin = 1, n.tune = 0) ## Two plots of estimated functions, ## 1. faceted by cluster ## 2. fitted functions vs noisy observations ## first plot will plot estimated denoised function, ## bb_i, for a single (randomly-selected) "state" fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = TRUE, credible = TRUE ) ## second plot will randomly select 6 states ## and plot their estimated denoised functions, bb_i. ## with setting "single_unit = FALSE". ## (Option "num_plot" may be set to plot ## any integer number of ## randomly-selected units.) fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2011 - 2013 ## to examine the state level employment ## levels during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2011:2013))] ## uses default setting of a single "rational quadratic" covariance ## run for 500 iterations, with half discarded as burn-in to ## obtain a more useful result. res_gp <- gpdpgrow(y = y_short, n.iter = 4, n.burn = 1, n.thin = 1, n.tune = 0) ## Two plots of estimated functions, ## 1. faceted by cluster ## 2. fitted functions vs noisy observations ## first plot will plot estimated denoised function, ## bb_i, for a single (randomly-selected) "state" fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = TRUE, credible = TRUE ) ## second plot will randomly select 6 states ## and plot their estimated denoised functions, bb_i. ## with setting "single_unit = FALSE". ## (Option "num_plot" may be set to plot ## any integer number of ## randomly-selected units.) fit_plots_gp <- cluster_plot( object = res_gp, units_name = "state", units_label = cps$st, single_unit = FALSE, credible = TRUE ) }
An internal function to gpdpgrow
gpdpPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w_star, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, M_init, dp_shape, dp_rate, progress )
gpdpPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w_star, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, M_init, dp_shape, dp_rate, progress )
y |
An N x T matrix of N observations of T x 1 functions |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
Omega_t |
A T x T matrix of squared Eucidean distances for |
Omega_s |
A |
gp_mod |
An L x 1 numeric vector denoting the selected covariance function for each
of |
jitter |
Numeric value added to diagonals of GP covariance matrix to stabilize inversion. |
gp_shape |
The shape parameter of the Gamma base distribution for the |
gp_rate |
The rate parameter of the Gamma base distribution for the |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
lower |
Minimum in range of support for GP covariance parameters, |
upper |
Maximum in range of support for GP covariance parameters, |
w_star |
Tuning parameter for number of locations to sample not linked to observations in the auxiliary Gibbs sampler for cluster assignments. |
w |
Tuning parameter for slice sampling interval width used for GP
covariance parameters, |
n_slice_iter |
Maximum number of steps to widen slice samplind width for
GP covariance parameters, |
y_index |
List object where each contains index of time points to use in |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
n.tune |
The number of tuning iterations to update the slice sampler width, |
M_init |
Starting value of number of clusters for sampling cluster assignments. |
dp_shape |
The shape parameter for the |
dp_rate |
The rate parameter for the |
progress |
An indicator in |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gpdpgrow
Terrance Savitsky [email protected]
An internal function to gpdpgrow
gpFixPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress, s )
gpFixPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress, s )
y |
An N x T matrix of N observations of T x 1 functions |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
Omega_t |
A T x T matrix of squared Eucidean distances for |
Omega_s |
A |
gp_mod |
An L x 1 numeric vector denoting the selected covariance function for each
of |
jitter |
Numeric value added to diagonals of GP covariance matrix to stabilize inversion. |
gp_shape |
The shape parameter of the Gamma base distribution for the |
gp_rate |
The rate parameter of the Gamma base distribution for the |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
lower |
Minimum in range of support for GP covariance parameters, |
upper |
Maximum in range of support for GP covariance parameters, |
w |
Tuning parameter for slice sampling interval width used for GP
covariance parameters, |
n_slice_iter |
Maximum number of steps to widen slice samplind width for
GP covariance parameters, |
y_index |
List object where each contains index of time points to use in |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
n.tune |
The number of tuning iterations to update the slice sampler width, |
progress |
An indicator in |
s |
An integer vector inputting cluster membership structure if select |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gpdpgrow
Terrance Savitsky [email protected]
An internal function to gpdpgrow
gpPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress )
gpPost( y, ipr, Omega_t, Omega_s, gp_mod, jitter, gp_shape, gp_rate, noise_shape, noise_rate, lower, upper, w, n_slice_iter, y_index, n.iter, n.burn, n.thin, n.tune, progress )
y |
An N x T matrix of N observations of T x 1 functions |
ipr |
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to
|
Omega_t |
A T x T matrix of squared Eucidean distances for |
Omega_s |
A |
gp_mod |
An L x 1 numeric vector denoting the selected covariance function for each
of |
jitter |
Numeric value added to diagonals of GP covariance matrix to stabilize inversion. |
gp_shape |
The shape parameter of the Gamma base distribution for the |
gp_rate |
The rate parameter of the Gamma base distribution for the |
noise_shape |
The shape parameter of the Gamma base distribution on |
noise_rate |
The rate parameter of the Gamma base distribution on |
lower |
Minimum in range of support for GP covariance parameters, |
upper |
Maximum in range of support for GP covariance parameters, |
w |
Tuning parameter for slice sampling interval width used for GP
covariance parameters, |
n_slice_iter |
Maximum number of steps to widen slice samplind width for
GP covariance parameters, |
y_index |
List object where each contains index of time points to use in |
n.iter |
The number of MCMC sampling iterations |
n.burn |
The number of warm-up iterations to discard |
n.thin |
The interval or step size of post-burn-in samples to return |
n.tune |
The number of tuning iterations to update the slice sampler width, |
progress |
An indicator in |
res A list object containing MCMC runs for all model parameters.
Intended as an internal function for gpdpgrow
Terrance Savitsky [email protected]
Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.
informative_plot( objects = NULL, objects_labels = c("ignore", "weight"), map = NULL, units_name = NULL, model = "gp", true_star = NULL, map_true = NULL )
informative_plot( objects = NULL, objects_labels = c("ignore", "weight"), map = NULL, units_name = NULL, model = "gp", true_star = NULL, map_true = NULL )
objects |
A list of objects, either all outputs from gpdpgrow(), or all from gmrfdpgrow().
|
objects_labels |
A character vector of length equal to |
map |
A list matrices, where each entry is produced
from |
units_name |
The label in each "map" matrix for the observation units. Will be the same as the
|
model |
A scalar character input indicating the estimation model for all of the entries
in |
true_star |
An optional, P x M matrix, of true parameter location values, where
|
map_true |
An optional |
A list object containing the plot of estimated functions, faceted by cluster,
and the associated data.frame
object.
p.compare |
A |
dat.compare |
A |
Terrance Savitsky [email protected]
## Not run: library(growfunctions) ## use gen_informative_sample() to generate an ## N X T population drawn from a dependent GP ## By default, 3 clusters are used to generate ## the population. ## A single stage stratified random sample of size n ## is drawn from the population using I = 4 strata. ## The resulting sample is informative in that the ## distribution for this sample is ## different from the population from which ## it was drawn because the strata inclusion ## probabilities are proportional to a feature ## of the response, y (in the case, the variance. ## The stratified random sample over-samples ## large variance strata). ## (The user may also select a 2-stage ## sample with the first stage ## sampling "blocks" of the population and ## the second stage sampling strata within blocks). dat_sim <- gen_informative_sample(N= 10000, n = 500, T = 5, noise_to_signal = 0.1) y_obs <- dat_sim$y_obs T <- ncol(y_obs) an informative sampling design that inputs inclusion probabilities, ipr res_gp_w <- gpdpgrow(y = y_obs, ipr = dat_sim$map_obs$incl_prob, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) and fit vs. data for experimental units fit_plots_w <- cluster_plot( object = res_gp_w, units_name = "establishment", units_label = dat_sim$map_obs$establishment, single_unit = FALSE, credible = TRUE ) ## estimate parameters ignoring sampling design res_gp_i <- gpdpgrow(y = y_obs, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) ## plots of estimated functions, faceted by cluster and fit vs. ## data for experimental units fit_plots_i <- cluster_plot( object = res_gp_i, units_name = "establishment", units_label = dat_sim$map_obs$establishment, single_unit = FALSE, credible = TRUE ) ## We also draw an iid (non-informative, exchangeable) ## sample from the same population to ## compare estimation results to our weighted ## (w) and unweighted/ignoring (i) models ## estimate parameters under an iid sampling design res_gp_iid <- gpdpgrow(y = dat_sim$y_iid, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) ## plots of estimated functions, faceted by cluster and ## fit vs. data for experimental units fit_plots_iid <- cluster_plot( object = res_gp_iid, units_name = "establishment", units_label = dat_sim$map_iid$establishment, single_unit = FALSE, credible = TRUE ) ## compare estimations of covariance parameter credible ## intervals when ignoring informativeness vs. ## weighting to account for informativeness objects <- map <- vector("list",3) objects[[1]] <- res_gp_i objects[[2]] <- res_gp_iid objects[[3]] <- res_gp_w map[[1]] <- fit_plots_i$map map[[2]] <- fit_plots_iid$map map[[3]] <- fit_plots_w$map objects_labels <- c("ignore","iid","weight") parms_plots_compare <- informative_plot( objects = objects, objects_labels = objects_labels, map = map, units_name = "establishment", model = "gp", true_star = dat_sim$theta_star, map_true = dat_sim$map_obs) ## End(Not run)
## Not run: library(growfunctions) ## use gen_informative_sample() to generate an ## N X T population drawn from a dependent GP ## By default, 3 clusters are used to generate ## the population. ## A single stage stratified random sample of size n ## is drawn from the population using I = 4 strata. ## The resulting sample is informative in that the ## distribution for this sample is ## different from the population from which ## it was drawn because the strata inclusion ## probabilities are proportional to a feature ## of the response, y (in the case, the variance. ## The stratified random sample over-samples ## large variance strata). ## (The user may also select a 2-stage ## sample with the first stage ## sampling "blocks" of the population and ## the second stage sampling strata within blocks). dat_sim <- gen_informative_sample(N= 10000, n = 500, T = 5, noise_to_signal = 0.1) y_obs <- dat_sim$y_obs T <- ncol(y_obs) an informative sampling design that inputs inclusion probabilities, ipr res_gp_w <- gpdpgrow(y = y_obs, ipr = dat_sim$map_obs$incl_prob, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) and fit vs. data for experimental units fit_plots_w <- cluster_plot( object = res_gp_w, units_name = "establishment", units_label = dat_sim$map_obs$establishment, single_unit = FALSE, credible = TRUE ) ## estimate parameters ignoring sampling design res_gp_i <- gpdpgrow(y = y_obs, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) ## plots of estimated functions, faceted by cluster and fit vs. ## data for experimental units fit_plots_i <- cluster_plot( object = res_gp_i, units_name = "establishment", units_label = dat_sim$map_obs$establishment, single_unit = FALSE, credible = TRUE ) ## We also draw an iid (non-informative, exchangeable) ## sample from the same population to ## compare estimation results to our weighted ## (w) and unweighted/ignoring (i) models ## estimate parameters under an iid sampling design res_gp_iid <- gpdpgrow(y = dat_sim$y_iid, n.iter = 5, n.burn = 2, n.thin = 1, n.tune = 0) ## plots of estimated functions, faceted by cluster and ## fit vs. data for experimental units fit_plots_iid <- cluster_plot( object = res_gp_iid, units_name = "establishment", units_label = dat_sim$map_iid$establishment, single_unit = FALSE, credible = TRUE ) ## compare estimations of covariance parameter credible ## intervals when ignoring informativeness vs. ## weighting to account for informativeness objects <- map <- vector("list",3) objects[[1]] <- res_gp_i objects[[2]] <- res_gp_iid objects[[3]] <- res_gp_w map[[1]] <- fit_plots_i$map map[[2]] <- fit_plots_iid$map map[[3]] <- fit_plots_w$map objects_labels <- c("ignore","iid","weight") parms_plots_compare <- informative_plot( objects = objects, objects_labels = objects_labels, map = map, units_name = "establishment", model = "gp", true_star = dat_sim$theta_star, map_true = dat_sim$map_obs) ## End(Not run)
Uses as input the output object from the gpdpgrow() and gmrfdpgrow() functions.
MSPE(object, y_true, pos)
MSPE(object, y_true, pos)
object |
A |
y_true |
An |
pos |
An |
A list object containing various MSPE fit statistics that measure the accuracy of
of predicting the values in y_true
indexed by pos
.
SSE |
Sum of squared errors based on full |
MSE |
Mean squared error computed from |
RMSE |
Square root of |
SSPE |
Sum of squared prediction error based on missing values. |
MSPE |
Mean squared prediction error based on missing values. |
nMSPE |
Mean squared prediction error based on missing values that is
normalized by the variance of the test observations to produce a value in
|
RMSPE |
Square root of |
Terrance Savitsky [email protected]
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2009 - 2013 ## to examine the state level employment ## levels during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2009:2013))] ## dimensions T <- ncol(y_short) ## time points per unit N <- nrow(y_short) ## number of units ## Demonstrate estimation of intermittent missing data ## from posterior predictive distribution by randomly ## selecting 10 percent of entries in y_short and ## setting them to NA. ## randomly assign missing positions in y. ## assume every unit has equal number of ## missing positions ## randomly select number of missing ## observations for each unit m_factor <- .1 M <- floor(m_factor*N*T) m_vec <- rep(floor(M/N),N) if( sum(m_vec) < M ) { m_left <- M - sum(m_vec) pos_i <- sample(1:N, m_left, replace = FALSE) m_vec[pos_i] <- m_vec[pos_i] + 1 } # end conditional statement on whether all # missing cells allocated # randomly select missing # positions for each unit pos <- matrix(0,N,T) for( i in 1:N ) { sel_ij <- sample(3:(T-3), m_vec[i], replace = FALSE) ## avoid edge effects pos[i,sel_ij] <- 1 } ## configure N x T matrix, y_obs, with 10 percent missing ## observations (filled with NA) ## to use for sampling. Entries in y_short ## that are set to missing (NA) are ## determined by entries of "1" in the ## N x T matrix, pos. y_obs <- y_short y_obs[pos == 1] <- NA ## Conduct dependent GP model estimation under ## missing observations in y_obs. ## We illustrate the ability to have multiple ## function or covariance terms ## by seting gp_cov = c("se","sn"), which indicates ## the first term is a ## squared exponential ("se") trend covariance term ## and the "sn" is a seasonality ## term. The setting, sn_order = 4, indicates the ## length scale of the seasonality ## term is 4 month. The season term is actually ## "quasi" seasonal, in that the ## seasonal covariance kernel is multiplied by a ## squared exponential, which allows ## the pattern of seasonality to evolve over time. res_gp_2 <- gpdpgrow(y = y_obs, gp_cov=c("se","sn"), sn_order = 4, n.iter = 10, n.burn = 4, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gp_2 <- cluster_plot( object = res_gp_2, units_name = "state", units_label = cps$st, single_unit = TRUE, credible = TRUE ) ## Compute out-of-sample fit statistic, normalized mean-square ## prediction error (MSPE) ## The normalized MSPE will take the predicted values ## for the entries in y_short purposefully ## set to NA and will subtract them from the known true ## values in y_short (and square them). ## This MSE on predicted (test) data is then ## divided by the variance of the test observations ## to output something akin to a percent error. ( nmspe_gp <- MSPE(res_gp_2, y_short, pos)$nMSPE ) }
{ library(growfunctions) ## load the monthly employment count data for a collection of ## U.S. states from the Current ## Population Survey (cps) data(cps) ## subselect the columns of N x T, y, associated with ## the years 2009 - 2013 ## to examine the state level employment ## levels during the "great recession" y_short <- cps$y[,(cps$yr_label %in% c(2009:2013))] ## dimensions T <- ncol(y_short) ## time points per unit N <- nrow(y_short) ## number of units ## Demonstrate estimation of intermittent missing data ## from posterior predictive distribution by randomly ## selecting 10 percent of entries in y_short and ## setting them to NA. ## randomly assign missing positions in y. ## assume every unit has equal number of ## missing positions ## randomly select number of missing ## observations for each unit m_factor <- .1 M <- floor(m_factor*N*T) m_vec <- rep(floor(M/N),N) if( sum(m_vec) < M ) { m_left <- M - sum(m_vec) pos_i <- sample(1:N, m_left, replace = FALSE) m_vec[pos_i] <- m_vec[pos_i] + 1 } # end conditional statement on whether all # missing cells allocated # randomly select missing # positions for each unit pos <- matrix(0,N,T) for( i in 1:N ) { sel_ij <- sample(3:(T-3), m_vec[i], replace = FALSE) ## avoid edge effects pos[i,sel_ij] <- 1 } ## configure N x T matrix, y_obs, with 10 percent missing ## observations (filled with NA) ## to use for sampling. Entries in y_short ## that are set to missing (NA) are ## determined by entries of "1" in the ## N x T matrix, pos. y_obs <- y_short y_obs[pos == 1] <- NA ## Conduct dependent GP model estimation under ## missing observations in y_obs. ## We illustrate the ability to have multiple ## function or covariance terms ## by seting gp_cov = c("se","sn"), which indicates ## the first term is a ## squared exponential ("se") trend covariance term ## and the "sn" is a seasonality ## term. The setting, sn_order = 4, indicates the ## length scale of the seasonality ## term is 4 month. The season term is actually ## "quasi" seasonal, in that the ## seasonal covariance kernel is multiplied by a ## squared exponential, which allows ## the pattern of seasonality to evolve over time. res_gp_2 <- gpdpgrow(y = y_obs, gp_cov=c("se","sn"), sn_order = 4, n.iter = 10, n.burn = 4, n.thin = 1, n.tune = 0) ## 2 plots of estimated functions: 1. faceted by cluster and fit; ## 2. data for experimental units. ## for a group of randomly-selected functions fit_plots_gp_2 <- cluster_plot( object = res_gp_2, units_name = "state", units_label = cps$st, single_unit = TRUE, credible = TRUE ) ## Compute out-of-sample fit statistic, normalized mean-square ## prediction error (MSPE) ## The normalized MSPE will take the predicted values ## for the entries in y_short purposefully ## set to NA and will subtract them from the known true ## values in y_short (and square them). ## This MSE on predicted (test) data is then ## divided by the variance of the test observations ## to output something akin to a percent error. ( nmspe_gp <- MSPE(res_gp_2, y_short, pos)$nMSPE ) }
An internal function of the growfunctions
package.
plot_cluster( y, H, sort = FALSE, sample_rate = 0.05, y.axis.label = NULL, smoother = TRUE, fade = 0.2, cluster_order = NULL, plot_render = TRUE )
plot_cluster( y, H, sort = FALSE, sample_rate = 0.05, y.axis.label = NULL, smoother = TRUE, fade = 0.2, cluster_order = NULL, plot_render = TRUE )
y |
An |
H |
An |
sort |
An optional boolean input on whether to sort the cluster-indexed plot panels
of function by size of cluster. Defaults |
sample_rate |
An optional numeric value in (0,1] indicating percent of functions to randomly sample within each cluster to address over-plotting. Defaults to 1. |
y.axis.label |
An optional text label for y-axis. Defaults to |
smoother |
An optional scalar boolean input indicating whether to co-plot a smoother line through the functions in each cluster. |
fade |
An optional numeric input in |
cluster_order |
An optional numeric vector of length |
plot_render |
An optional boolean input indicating whether to render the plot.
Defaults to |
A list object containing the plot of estimated functions, faceted by cluster,
and the associated data.frame
object.
p.basis |
A |
map |
A |
Terrance Savitsky [email protected]
This is the generic predict_functions method. See the following functions for the details about different data structures:
predict_functions(object, J, ...)
predict_functions(object, J, ...)
object |
Object of class |
J |
Scalar denoting number of draws to take from posterior predictive for each unit.
Defaults to |
... |
further arguments passed to or from other methods. |
predict_functions.gpdpgrow
for objects of class gpdpgrow
predict_functions.gmrfdpgrow
for objects of class gmrfdpgrow
gmrfdpgrow
object of estimated parameters.A companion function to gmrfdpgrow
## S3 method for class 'gmrfdpgrow' predict_functions(object, J = 500, T_test, ...)
## S3 method for class 'gmrfdpgrow' predict_functions(object, J = 500, T_test, ...)
object |
Object of class |
J |
Scalar denoting number of draws to take from posterior predictive for each unit.
Defaults to |
T_test |
The number of equally-spaced time points to predict the iGMRF functions ahead of
of the functions estimated at |
... |
further arguments passed to or from other methods. |
out A list object containing containing two matrices; the first is a P x (N*T) matrix of predicted function values for each of P sampled iterations. N is slow index and denotes the number of experimental units. The second matrix is an N x T average over the P sampled draws, composed in Rao-Blackwellized fashion.
Intended as a companion function for gmrfdpgrow
for prediction
Terrance Savitsky [email protected]
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gmrf = gmrfdpgrow(y = y_short, q_order = c(2,4), q_type = c("tr","sn"), n.iter = 100, n.burn = 50, n.thin = 1) ## Prediction Model Runs T_test <- 4 pred_gmrf <- predict_functions( object = res_gmrf, J = 1000, T_test = T_test ) ## plot estimated and predicted functions plot_gmrf <- predict_plot(object = pred_gmrf, units_label = cps$st, single_unit = TRUE, credible = FALSE) ## End(Not run)
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gmrf = gmrfdpgrow(y = y_short, q_order = c(2,4), q_type = c("tr","sn"), n.iter = 100, n.burn = 50, n.thin = 1) ## Prediction Model Runs T_test <- 4 pred_gmrf <- predict_functions( object = res_gmrf, J = 1000, T_test = T_test ) ## plot estimated and predicted functions plot_gmrf <- predict_plot(object = pred_gmrf, units_label = cps$st, single_unit = TRUE, credible = FALSE) ## End(Not run)
gpdpgrow
object of estimated parameters.A companion function to gpdpgrow
## S3 method for class 'gpdpgrow' predict_functions( object, J = 500, test_times, time_points = NULL, sn_order = NULL, ... )
## S3 method for class 'gpdpgrow' predict_functions( object, J = 500, test_times, time_points = NULL, sn_order = NULL, ... )
object |
Object of class |
J |
Scalar denoting number of draws to take from posterior predictive for each unit.
Defaults to |
test_times |
A numeric vector holding test times at which to predict GP function values
Will use the estimated covariance parameters from the training data to predict
functions at the test_times for the |
time_points |
Inputs a vector of common time points at which the collections of functions were
observed (with the possibility of intermittent missingness). The length of |
sn_order |
An integer vector of length, |
... |
further arguments passed to or from other methods. |
out A list object containing containing two matrices; the first is a K x (N*T) matrix of predicted function values for each of K sampled iterations. N is slow index and denotes the number of experimental units. The second matrix is an N x T average over the K sampled draws, composed in Rao-Blackwellized fashion.
Intended as a companion function for gpdpgrow
for prediction
Terrance Savitsky [email protected]
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gp = gpdpgrow(y = y_short n.iter = 50, n.burn = 25, n.thin = 1, n.tune = 0) ## Prediction Model Runs T_test <- 4 T_yshort <- ncol(y_short) pred_gp <- predict_functions( object = res_gp, test_times = (T_yshort+1):(T_yshort+T_test) ) ## plot estimated and predicted functions plot_gp <- predict_plot(object = pred_gp, units_label = cps$st, single_unit = FALSE, credible = TRUE) ## End(Not run)
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2010:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gp = gpdpgrow(y = y_short n.iter = 50, n.burn = 25, n.thin = 1, n.tune = 0) ## Prediction Model Runs T_test <- 4 T_yshort <- ncol(y_short) pred_gp <- predict_functions( object = res_gp, test_times = (T_yshort+1):(T_yshort+T_test) ) ## plot estimated and predicted functions plot_gp <- predict_plot(object = pred_gp, units_label = cps$st, single_unit = FALSE, credible = TRUE) ## End(Not run)
Uses as input the output object from the gpdpgrow.predict() and gmrfdpgrow.predict() methods.
predict_plot( object = NULL, units_label = NULL, type_label = c("fitted", "predicted"), time_points = NULL, date_label = NULL, x.axis.label = NULL, y.axis.label = NULL, single_unit = FALSE, credible = TRUE )
predict_plot( object = NULL, units_label = NULL, type_label = c("fitted", "predicted"), time_points = NULL, date_label = NULL, x.axis.label = NULL, y.axis.label = NULL, single_unit = FALSE, credible = TRUE )
object |
A |
units_label |
A vector of labels to apply to experimental units with length equal to the number of
unique units. Defaults to sequential numeric values as input with data, |
type_label |
A character vector assigning a "fitted" or "predicted
label for the |
time_points |
A list input of length 2 with each entry containing a numeric vector
of times - one for the observed times for the set of "fitted" functions and the other denotes
time values at which "predicted" values were rendered for the functions. This input variable
only applies to |
date_label |
A vector of |
x.axis.label |
Text label for x-axis. Defaults to |
y.axis.label |
Text label for y-axis. Defaults to |
single_unit |
A scalar boolean indicating whether to plot the fitted vs data curve for
only a single experimental units (versus a random sample of 6).
Defaults to |
credible |
A scalar boolean indicating whether to plot 95 percent credible intervals for
estimated functions, |
A list object containing the plot of estimated functions, faceted by cluster,
and the associated data.frame
object.
p.cluster |
A |
dat.cluster |
A |
Terrance Savitsky [email protected]
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gmrf <- gmrfdpgrow(y = y_short, q_order = c(2,4), q_type = c("tr","sn"), n.iter = 40, n.burn = 20, n.thin = 1) res_gp <- gpdpgrow(y = y_short n.iter = 10, n.burn = 4, n.thin = 1, n.tune = 0) ## Prediction Model Runs T_test <- 4 pred_gmrf <- predict_functions( object = res_gmrf, J = 1000, T_test = T_test ) T_yshort <- ncol(y_short) pred_gp <- predict_functions( object = res_gp, test_times = (T_yshort+1):(T_yshort+T_test) ) ## plot estimated and predicted functions plot_gmrf <- predict_plot(object = pred_gmrf, units_label = cps$st, single_unit = TRUE, credible = FALSE) plot_gp <- predict_plot(object = pred_gp, units_label = cps$st, single_unit = FALSE, credible = TRUE) ## End(Not run)
## Not run: library(growfunctions) data(cps) y_short <- cps$y[,(cps$yr_label %in% c(2008:2013))] t_train <- ncol(y_short) N <- nrow(y_short) t_test <- 4 ## Model Runs res_gmrf <- gmrfdpgrow(y = y_short, q_order = c(2,4), q_type = c("tr","sn"), n.iter = 40, n.burn = 20, n.thin = 1) res_gp <- gpdpgrow(y = y_short n.iter = 10, n.burn = 4, n.thin = 1, n.tune = 0) ## Prediction Model Runs T_test <- 4 pred_gmrf <- predict_functions( object = res_gmrf, J = 1000, T_test = T_test ) T_yshort <- ncol(y_short) pred_gp <- predict_functions( object = res_gp, test_times = (T_yshort+1):(T_yshort+T_test) ) ## plot estimated and predicted functions plot_gmrf <- predict_plot(object = pred_gmrf, units_label = cps$st, single_unit = TRUE, credible = FALSE) plot_gp <- predict_plot(object = pred_gp, units_label = cps$st, single_unit = FALSE, credible = TRUE) ## End(Not run)
provides posterior sampled values for every model parameter of a
gpdpgrow
object
samples(object, ...)
samples(object, ...)
object |
A |
... |
Ignored |
provides posterior sampled values for every model parameter of a
gmrfdpgrow
object
## S3 method for class 'gmrfdpgrow' samples(object, ...)
## S3 method for class 'gmrfdpgrow' samples(object, ...)
object |
A |
... |
Ignored |