Title: | Multivariate Spatio-Temporal Models using Structural Equations |
---|---|
Description: | Fits a wide variety of multivariate spatio-temporal models with simultaneous and lagged interactions among variables (including vector autoregressive spatio-temporal ('VAST') dynamics) for areal, continuous, or network spatial domains. It includes time-variable, space-variable, and space-time-variable interactions using dynamic structural equation models ('DSEM') as expressive interface, and the 'mgcv' package to specify splines via the formula interface. See Thorson et al. (2024) <doi:10.48550/arXiv.2401.10193> for more details. |
Authors: | James T. Thorson [aut, cre]
|
Maintainer: | James T. Thorson <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2025-03-13 14:27:29 UTC |
Source: | CRAN |
Given user-provided newdata
, expand the object tmb_data
to include predictions corresponding to those new observations
add_predictions(object, newdata, remove_origdata = FALSE)
add_predictions(object, newdata, remove_origdata = FALSE)
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response. |
remove_origdata |
Whether to remove original-data to allow faster evaluation.
|
the object fit$tmb_inputs$tmb_data
representing data used during fitting,
but with updated values for slots associated with predictions, where this
updated object can be recompiled by TMB to provide predictions
Shapefile defining the spatial domain for the eastern and northern Bering Sea bottom trawl surveys.
data(bering_sea)
data(bering_sea)
Data used to demonstrate and test model-based age expansion, using density= dependence corrected survey catch rates after first=stage expansion from the bottom trawl survey for ages 1-15, conducted by by the Alaska Fisheries Science Center, including annual surveys in the eastern Bering Sea 1982-2019 and 2021-2023, as well as the northern Bering Sea in 1982/85/88/91 and 2010/17/18/19/21/22/23.
data(bering_sea_pollock_ages)
data(bering_sea_pollock_ages)
Estimated proporrtion-at-age for Alaska pollock using the package VAST, for comparison with output using tinyVAST.
data(bering_sea_pollock_vast)
data(bering_sea_pollock_vast)
Calculates the conditional Akaike Information criterion (cAIC).
cAIC(object)
cAIC(object)
object |
Output from |
cAIC is designed to optimize the expected out-of-sample predictive performance for new data that share the same random effects as the in-sample (fitted) data, e.g., spatial interpolation. In this sense, it should be a fast approximation to optimizing the model structure based on k-fold cross-validation.
By contrast, AIC()
calculates the marginal Akaike Information Criterion,
which is designed to optimize expected predictive performance for new data
that have new random effects, e.g., extrapolation, or inference about
generative parameters.
Both cAIC and EDF are calculated using Eq. 6 of Zheng, Cadigan, and Thorson (2024).
For models that include profiled fixed effects, these profiles are turned off.
cAIC value
Zheng, N., Cadigan, N., & Thorson, J. T. (2024). A note on numerical evaluation of conditional Akaike information for nonlinear mixed-effects models (arXiv:2411.14185). arXiv. doi:10.48550/arXiv.2411.14185
data( red_snapper ) red_snapper = droplevels(subset(red_snapper, Data_type=="Biomass_KG")) # Define mesh mesh = fmesher::fm_mesh_2d( red_snapper[,c('Lon','Lat')], cutoff = 1 ) # define formula with a catchability covariate for gear formula = Response_variable ~ factor(Year) + offset(log(AreaSwept_km2)) # make variable column red_snapper$var = "logdens" # fit using tinyVAST fit = tinyVAST( data = red_snapper, formula = formula, sem = "logdens <-> logdens, sd_space", space_columns = c("Lon",'Lat'), spatial_graph = mesh, family = tweedie(link="log"), variable_column = "var", control = tinyVASTcontrol( getsd = FALSE, profile = "alpha_j" ) ) cAIC(fit) # conditional AIC AIC(fit) # marginal AIC
data( red_snapper ) red_snapper = droplevels(subset(red_snapper, Data_type=="Biomass_KG")) # Define mesh mesh = fmesher::fm_mesh_2d( red_snapper[,c('Lon','Lat')], cutoff = 1 ) # define formula with a catchability covariate for gear formula = Response_variable ~ factor(Year) + offset(log(AreaSwept_km2)) # make variable column red_snapper$var = "logdens" # fit using tinyVAST fit = tinyVAST( data = red_snapper, formula = formula, sem = "logdens <-> logdens, sd_space", space_columns = c("Lon",'Lat'), spatial_graph = mesh, family = tweedie(link="log"), variable_column = "var", control = tinyVASTcontrol( getsd = FALSE, profile = "alpha_j" ) ) cAIC(fit) # conditional AIC AIC(fit) # marginal AIC
classify_variables
is copied from sem:::classifyVariables
classify_variables(model)
classify_variables(model)
model |
syntax for structural equation model |
Copied from package sem
under licence GPL (>= 2) with permission from John Fox
Tagged-list defining exogenous and endogenous variables
Data used to demonstrate and test a bivariate model for morphometric condition (i.e., residuals in a weight-at-length relationship) and density for fishes, using the same example as was provided as a wiki example for VAST. Data are from doi:10.3354/meps13213
data(condition_and_density)
data(condition_and_density)
deviance_explained
fits a null model, calculates the deviance relative to
a saturated model for both the original and the null model, and uses these
to calculate the proportion of deviance explained.
This implementation conditions upon the maximum likelihood estimate of fixed effects and the empirical Bayes ("plug-in") prediction of random effects. It can be described as "conditional deviance explained". A state-space model that estimates measurement error variance approaching zero (i.e., collapses to a process-error-only model) will have a conditional deviance explained that approaches 1.0
deviance_explained(x, null_formula, null_delta_formula = ~1)
deviance_explained(x, null_formula, null_delta_formula = ~1)
x |
output from |
null_formula |
formula for the null model. If missing, it uses
|
null_delta_formula |
formula for the null model for the delta component.
If missing, it uses
|
the proportion of conditional deviance explained.
Additional families compatible with tinyVAST()
.
delta_lognormal(link1, link2 = "log", type = c("standard", "poisson-link")) delta_gamma(link1, link2 = "log", type = c("standard", "poisson-link"))
delta_lognormal(link1, link2 = "log", type = c("standard", "poisson-link")) delta_gamma(link1, link2 = "log", type = c("standard", "poisson-link"))
link1 |
Link for first part of delta/hurdle model. |
link2 |
Link for second part of delta/hurdle model. |
type |
Delta/hurdle family type. |
link |
Link. |
A list with elements common to standard R family objects including family
,
link
, linkfun
, and linkinv
. Delta/hurdle model families also have
elements delta
(logical) and type
(standard vs. Poisson-link).
Poisson-link delta families:
Thorson, J.T. 2018. Three problems with the conventional delta-model for biomass sampling data, and a computationally efficient alternative. Canadian Journal of Fisheries and Aquatic Sciences, 75(9), 1369-1382. doi:10.1139/cjfas-2017-0266
Poisson-link delta families:
Thorson, J.T. 2018. Three problems with the conventional delta-model for biomass sampling data, and a computationally efficient alternative. Canadian Journal of Fisheries and Aquatic Sciences, 75(9), 1369-1382. doi:10.1139/cjfas-2017-0266
delta_lognormal() delta_gamma()
delta_lognormal() delta_gamma()
Calculates an estimator for a derived quantity by summing across multiple predictions. This can be used to approximate an integral when estimating area-expanded abundance, abundance-weighting a covariate to calculate distribution shifts, and/or weighting one model variable by another.
integrate_output( object, newdata, area, type = rep(1, nrow(newdata)), weighting_index, covariate, getsd = TRUE, bias.correct = TRUE, apply.epsilon = FALSE, intern = FALSE )
integrate_output( object, newdata, area, type = rep(1, nrow(newdata)), weighting_index, covariate, getsd = TRUE, bias.correct = TRUE, apply.epsilon = FALSE, intern = FALSE )
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response,
where a total value is calculated by combining across these individual predictions.
If these locations are randomly drawn from a specified spatial domain, then
|
area |
vector of values used for area-weighted expansion of
estimated density surface for each row of |
type |
Integer-vector indicating what type of expansion to apply to
each row of
|
weighting_index |
integer-vector used to indicate a previous row
that is used to calculate a weighted average that is then
applied to the given row of |
covariate |
numeric-vector used to provide a covariate
that is used in expansion, e.g., to provide positional
coordinates when calculating the abundance-weighted centroid with respect
to that coordinate. Only used for when |
getsd |
logical indicating whether to get the standard error, where
|
bias.correct |
logical indicating if bias correction should be applied using
standard methods in |
apply.epsilon |
Apply epsilon bias correction using a manual calculation rather than using the conventional method in TMB::sdreport? See details for more information. |
intern |
Do Laplace approximation on C++ side? Passed to |
Analysts will often want to calculate some value by combining the predicted response at multiple
locations, and potentially from multiple variables in a multivariate analysis.
This arises in a univariate model, e.g., when calculating the integral under a predicted
density function, which is approximated using a midpoint or Monte Carlo approximation
by calculating the linear predictors at each location newdata
,
applying the inverse-link-trainsformation,
and calling this predicted response mu_g
. Total abundance is then be approximated
by multiplying mu_g
by the area associated with each midpoint or Monte Carlo
approximation point (supplied by argument area
),
and summing across these area-expanded values.
In more complicated cases, an analyst can then use covariate
to calculate the weighted average
of a covariate for each midpoint location. For example, if the covariate is
positional coordinates or depth/elevation, then type=2
measures shifts in the average habitat utilization with respect to that covariate.
Alternatively, an analyst fitting a multivariate model might weight one variable
based on another using weighting_index
, e.g.,
to calculate abundance-weighted average condition, or
predator-expanded stomach contents.
In practice, spatial integration in a multivariate model requires two passes through the rows of
newdata
when calculating a total value. In the following, we
write equations using C++ indexing conventions such that indexing starts with 0,
to match the way that integrate_output
expects indices to be supplied.
Given inverse-link-transformed predictor ,
function argument
type
as
function argument
area
as ,
function argument
covariate
as ,
function argument
weighting_index
as \eqn{ h_g }
function argument weighting_index
as \eqn{ h_g }
the first pass calculates:
where the total value from this first pass is calculated as:
The second pass then applies a further weighting, which depends upon ,
and potentially upon
and
.
If then
If then
If then
If then
If then
Finally, the total value from this second pass is calculated as:
and is outputted by
integrate_output
,
along with a standard error and potentially using
the epsilon bias-correction estimator to correct for skewness and retransformation
bias.
Standard bias-correction using bias.correct=TRUE
can be slow, and in
some cases it might be faster to do apply.epsilon=TRUE
and intern=TRUE
.
However, that option is somewhat experimental, and a user might want to confirm
that the two options give identical results. Similarly, using bias.correct=TRUE
will still calculate the standard-error, whereas using
apply.epsilon=TRUE
and intern=TRUE
will not.
A vector containing the plug-in estimate, standard error, the epsilon bias-corrected
estimate if available, and the standard error for the bias-corrected estimator.
Depending upon settings, one or more of these will be NA
values, and the
function can be repeatedly called to get multiple estimators and/or statistics.
Extract the (marginal) log-likelihood of a tinyVAST model
## S3 method for class 'tinyVAST' logLik(object, ...)
## S3 method for class 'tinyVAST' logLik(object, ...)
object |
output from |
... |
not used |
object of class logLik
with attributes
val |
log-likelihood |
df |
number of parameters |
make_dsem_ram
converts SEM arrow notation to ram
describing SEM parameters
make_dsem_ram( dsem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )
make_dsem_ram( dsem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )
dsem |
dynamic structural equation model structure,
passed to either |
times |
A character vector listing the set of times in order |
variables |
A character vector listing the set of variables |
covs |
optional: a character vector of one or more elements, with each element
giving a string of variable names, separated by commas. Variances and covariances
among all variables in each such string are added to the model. For confirmatory
factor analysis models specified via |
quiet |
Boolean indicating whether to print messages to terminal |
remove_na |
Boolean indicating whether to remove NA values from RAM (default) or not.
|
RAM specification using arrow-and-lag notation
Each line of the RAM specification for make_dsem_ram
consists of four (unquoted) entries,
separated by commas:
This is a simple formula, of the form
A -> B
or, equivalently, B <- A
for a regression
coefficient (i.e., a single-headed or directional arrow);
A <-> A
for a variance or A <-> B
for a covariance
(i.e., a double-headed or bidirectional arrow). Here, A
and
B
are variable names in the model. If a name does not correspond
to an observed variable, then it is assumed to be a latent variable.
Spaces can appear freely in an arrow specification, and
there can be any number of hyphens in the arrows, including zero: Thus,
e.g., A->B
, A --> B
, and A>B
are all legitimate
and equivalent.
An integer specifying whether the linkage
is simultaneous (lag=0
) or lagged (e.g., X -> Y, 1, XtoY
indicates that X in time T affects Y in time T+1), where
only one-headed arrows can be lagged. Using positive values to indicate lags
then matches the notational convention used in package dynlm.
The name of the regression coefficient, variance,
or covariance specified by the arrow. Assigning the same name to two or
more arrows results in an equality constraint. Specifying the parameter name
as NA
produces a fixed parameter.
start value for a free parameter or value of a fixed parameter.
If given as NA
(or simply omitted), the model is provide a default
starting value.
Lines may end in a comment following #. The function extends code copied from package
sem
under licence GPL (>= 2) with permission from John Fox.
Simultaneous autoregressive process for simultaneous and lagged effects
This text then specifies linkages in a multivariate time-series model for variables
with dimensions
for
times and
variables.
make_dsem_ram
then parses this text to build a path matrix with
dimensions
, where
represents the impact of
on
, where
and
. This path matrix defines a simultaneous equation
where is a matrix of exogenous errors with covariance
,
where
is the Cholesky of exogenous covariance. This
simultaneous autoregressive (SAR) process then results in
having covariance:
Usefully, it is also easy to compute the inverse-covariance (precision) matrix :
Example: univariate and first-order autoregressive model
This simultaneous autoregressive (SAR) process across variables and times
allows the user to specify both simultaneous effects (effects among variables within
year ) and lagged effects (effects among variables among years
).
As one example, consider a univariate and first-order autoregressive process where
.
with independent errors. This is specified by passing
dsem = X -> X, 1, rho; X <-> X, 0, sigma
to make_dsem_ram
.
This is then parsed to a RAM:
heads | to | from | paarameter | start |
1 | 2 | 1 | 1 | NA |
1 | 3 | 2 | 1 | NA |
1 | 4 | 3 | 1 | NA |
2 | 1 | 1 | 2 | NA |
2 | 2 | 2 | 2 | NA |
2 | 3 | 3 | 2 | NA |
2 | 4 | 4 | 2 | NA |
Rows of this RAM where heads=1
are then interpreted to construct the path matrix :
\deqn{ \mathbf P = \begin{bmatrix} 0 & 0 & 0 & 0 \ \rho & 0 & 0 & 0 \ 0 & \rho & 0 & 0 \ 0 & 0 & \rho & 0\ \end{bmatrix} }
While rows where heads=2
are interpreted to construct the Cholesky of exogenous covariance :
\deqn{ \mathbf \Gamma = \begin{bmatrix} \sigma & 0 & 0 & 0 \ 0 & \sigma & 0 & 0 \ 0 & 0 & \sigma & 0 \ 0 & 0 & 0 & \sigma\ \end{bmatrix} }
with two estimated parameters . This then results in covariance:
\deqn{ \mathrm{Cov}(\mathbf X) = \sigma^2 \begin{bmatrix} 1 & \rho^1 & \rho^2 & \rho^3 \ \rho^1 & 1 & \rho^1 & \rho^2 \ \rho^2 & \rho^1 & 1 & \rho^1 \ \rho^3 & \rho^2 & \rho^1 & 1\ \end{bmatrix} }
Similarly, the arrow-and-lag notation can be used to specify a SAR representing a conventional structural equation model (SEM), cross-lagged (a.k.a. vector autoregressive) models (VAR), dynamic factor analysis (DFA), or many other time-series models.
A reticular action module (RAM) describing dependencies
# Univariate AR1 dsem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Univariate AR2 dsem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Bivariate VAR dsem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( dsem=dsem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) dsem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0 # No additional variance Y <-> Y, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) dsem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) dsem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 )
# Univariate AR1 dsem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Univariate AR2 dsem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Bivariate VAR dsem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( dsem=dsem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) dsem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0 # No additional variance Y <-> Y, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) dsem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) dsem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 )
make_eof_ram
converts SEM arrow notation to ram
describing SEM parameters
make_eof_ram( times, variables, n_eof, remove_na = TRUE, standard_deviations = "unequal" )
make_eof_ram( times, variables, n_eof, remove_na = TRUE, standard_deviations = "unequal" )
times |
A character vector listing the set of times in order |
variables |
A character vector listing the set of variables |
n_eof |
Number of EOF modes of variability to estimate |
remove_na |
Boolean indicating whether to remove NA values from RAM (default) or not.
|
standard_deviations |
One of |
A reticular action module (RAM) describing dependencies
# Two EOFs for two variables make_eof_ram( times = 2010:2020, variables = c("pollock","cod"), n_eof=2 )
# Two EOFs for two variables make_eof_ram( times = 2010:2020, variables = c("pollock","cod"), n_eof=2 )
make_sem_ram
converts SEM arrow notation to ram
describing SEM parameters
make_sem_ram(sem, variables, quiet = FALSE, covs = variables)
make_sem_ram(sem, variables, quiet = FALSE, covs = variables)
sem |
structural equation model structure, passed to either |
variables |
A character vector listing the set of variables |
quiet |
if |
covs |
optional: a character vector of one or more elements, with each element
giving a string of variable names, separated by commas. Variances and covariances
among all variables in each such string are added to the model. For confirmatory
factor analysis models specified via |
An S3-class "sem_ram"
containing:
model
Output from specifyEquations
or specifyModel
that defines paths and parameters
ram
reticular action module (RAM) describing dependencies
parse_path
is copied from sem::parse.path
parse_path(path)
parse_path(path)
path |
character string indicating a one-headed or two-headed path in a structural equation model |
Copied from package sem
under licence GPL (>= 2) with permission from John Fox
Tagged-list defining variables and direction for a specified path coefficient
Predicts values given new covariates using a tinyVAST model
## S3 method for class 'tinyVAST' predict( object, newdata, remove_origdata = FALSE, what = c("mu_g", "p_g", "palpha_g", "pgamma_g", "pepsilon_g", "pomega_g", "pdelta_g", "pxi_g", "p2_g", "palpha2_g", "pgamma2_g", "pepsilon2_g", "pomega2_g", "pdelta2_g", "pxi2_g"), se.fit = FALSE, ... )
## S3 method for class 'tinyVAST' predict( object, newdata, remove_origdata = FALSE, what = c("mu_g", "p_g", "palpha_g", "pgamma_g", "pepsilon_g", "pomega_g", "pdelta_g", "pxi_g", "p2_g", "palpha2_g", "pgamma2_g", "pepsilon2_g", "pomega2_g", "pdelta2_g", "pxi2_g"), se.fit = FALSE, ... )
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response. |
remove_origdata |
Whether to eliminate the original data from the TMB object, thereby speeding up the TMB object construction. However, this also eliminates information about random-effect variance, and is not appropriate when requesting predictive standard errors or epsilon bias-correction. |
what |
What REPORTed object to output, where
|
se.fit |
Calculate standard errors? |
... |
Not used. |
Either a vector with the prediction for each row of newdata
, or a named list
with the prediction and standard error (when se.fit = TRUE
).
print summary of tinyVAST model
## S3 method for class 'tinyVAST' print(x, ...)
## S3 method for class 'tinyVAST' print(x, ...)
x |
output from |
... |
not used |
invisibly returns a named list of key model outputs and summary statements
Data used to demonstrate and test analysis using multiple data types
data(red_snapper)
data(red_snapper)
Spatial extent used for red snapper analysis, derived from Chap-7 of doi:10.1201/9781003410294
data(red_snapper_shapefile)
data(red_snapper_shapefile)
reload_model
allows a user to save a fitted model, reload it in a new
R terminal, and then relink the DLLs so that it functions as expected.
reload_model(x, check_gradient = TRUE)
reload_model(x, check_gradient = TRUE)
x |
Output from |
check_gradient |
Whether to check the gradients of the reloaded model |
Output from tinyVAST
with DLLs relinked
Calculate residuals
## S3 method for class 'tinyVAST' residuals(object, type = c("deviance", "response"), ...)
## S3 method for class 'tinyVAST' residuals(object, type = c("deviance", "response"), ...)
object |
Output from |
type |
which type of residuals to compute (only option is |
... |
Note used |
a vector residuals, associated with each row of data
supplied during fitting
This function provides a random number generator for
the multivariate normal distribution with mean equal
to mean
and sparse precision matrix Q
.
rmvnorm_prec(Q, n = 1, mean = rep(0, nrow(Q)))
rmvnorm_prec(Q, n = 1, mean = rep(0, nrow(Q)))
Q |
sparse precision (inverse-covariance) matrix. |
n |
number of observations. |
mean |
mean vector. |
a matrix with dimension length(mean)
by
n
, containing realized draws from the specified
mean and precision
Rotate lower-triangle loadings matrix to order factors from largest to smallest variance.
rotate_pca( L_tf, x_sf = matrix(0, nrow = 0, ncol = ncol(L_tf)), order = c("none", "increasing", "decreasing") )
rotate_pca( L_tf, x_sf = matrix(0, nrow = 0, ncol = ncol(L_tf)), order = c("none", "increasing", "decreasing") )
L_tf |
Loadings matrix with dimension |
x_sf |
Spatial response with dimensions |
order |
Options for resolving label-switching via reflecting
each factor to achieve a given order across dimension |
List containing the rotated loadings L_tf
,
the inverse-rotated response matrix x_sf
,
and the rotation H
Data used to demonstrate and test multivariate second-order autoregressive models using a simultaneous autoregressive (SAR) process across regions. Data are from doi:10.1002/mcf2.10023
data(salmon_returns)
data(salmon_returns)
sample_variable
samples from the joint distribution of random and fixed effects to approximate the predictive distribution for a variable
Using sample_fixed=TRUE
(the default) in sample_variable
propagates variance in both fixed and random effects, while
using sample_fixed=FALSE
does not.
Sampling fixed effects will sometimes cause numerical under- or overflow (i.e., output values of NA
) in cases when
variance parameters are estimated imprecisely. In these cases, the multivariate normal approximation being used is a poor
representation of the tail probabilities, and results in some samples with implausibly high (or negative) variances,
such that the associated random effects then have implausibly high magnitude.
sample_variable( x, variable_name = "mu_i", n_samples = 100, sample_fixed = TRUE, seed = 123456 )
sample_variable( x, variable_name = "mu_i", n_samples = 100, sample_fixed = TRUE, seed = 123456 )
x |
output from |
variable_name |
name of variable available in report using |
n_samples |
number of samples from the joint predictive distribution for fixed and random effects. Default is 100, which is slow. |
sample_fixed |
whether to sample fixed and random effects, |
seed |
integer used to set random-number seed when sampling variables, as passed to |
A matrix with a row for each data
supplied during fitting, and
n_samples
columns, where each column in a vector of samples
for a requested quantity given sampled uncertainty in fixed and/or random effects
set.seed(101) x = runif(n = 100, min = 0, max = 2*pi) y = 1 + sin(x) + 0.1 * rnorm(100) # Do fit with getJointPrecision=TRUE fit = tinyVAST( formula = y ~ s(x), data = data.frame(x=x,y=y), control = tinyVASTcontrol(getJointPrecision = TRUE) ) # samples from distribution for the mean sample_variable(fit)
set.seed(101) x = runif(n = 100, min = 0, max = 2*pi) y = 1 + sin(x) + 0.1 * rnorm(100) # Do fit with getJointPrecision=TRUE fit = tinyVAST( formula = y ~ s(x), data = data.frame(x=x,y=y), control = tinyVASTcontrol(getJointPrecision = TRUE) ) # samples from distribution for the mean sample_variable(fit)
Data used to demonstrate and test empirical orthogonal function generalized linear latent variable model (EOF-GLLVM)
data(sea_ice)
data(sea_ice)
Make sparse matrix to project from stream-network nodes to user-supplied points
sfnetwork_evaluator(stream, loc, tolerance = 0.01)
sfnetwork_evaluator(stream, loc, tolerance = 0.01)
stream |
sfnetworks object representing stream network |
loc |
sf object representing points to which are being projected |
tolerance |
error-check tolerance |
the sparse interpolation matrix, with rows for each row of data
supplied during fitting and columns for each spatial random effect.
make an object representing spatial information required
to specify a stream-network spatial domain, similar in usage to
link[fmesher]{fm_mesh_2d}
for a 2-dimensional continuous domain
sfnetwork_mesh(stream)
sfnetwork_mesh(stream)
stream |
sfnetworks object representing stream network |
An object (list) of class sfnetwork_mesh
. Elements include:
The number of random effects used to represent the network
a table containing a description of parent nodes (from), childen nodes (to), and the distance separating them
copy of the stream network object passed as argument
Simulate values from a GMRF using a tail-up exponential model on a stream network
simulate_sfnetwork(sfnetwork_mesh, theta, n = 1, what = c("samples", "Q"))
simulate_sfnetwork(sfnetwork_mesh, theta, n = 1, what = c("samples", "Q"))
sfnetwork_mesh |
Output from |
theta |
Decorrelation rate |
n |
number of simulated GMRFs |
what |
Whether to return the simulated GMRF or its precision matrix |
a matrix of simulated values for a Gaussian Markov random field
arising from a stream-network spatial domain, with row for each spatial random
effect and n
columns, using the sparse precision matrix
defined in Charsley et al. (2023)
Charsley, A. R., Gruss, A., Thorson, J. T., Rudd, M. B., Crow, S. K., David, B., Williams, E. K., & Hoyle, S. D. (2023). Catchment-scale stream network spatio-temporal models, applied to the freshwater stages of a diadromous fish species, longfin eel (Anguilla dieffenbachii). Fisheries Research, 259, 106583. doi:10.1016/j.fishres.2022.106583
simulate.tinyVAST
is an S3 method for producing a matrix of simulations from
a fitted model. It can be used with the DHARMa package
among other uses. Code is modified from the version in sdmTMB
## S3 method for class 'tinyVAST' simulate( object, nsim = 1L, seed = sample.int(1e+06, 1L), type = c("mle-eb", "mle-mvn"), ... )
## S3 method for class 'tinyVAST' simulate( object, nsim = 1L, seed = sample.int(1e+06, 1L), type = c("mle-eb", "mle-mvn"), ... )
object |
output from |
nsim |
how many simulations to do |
seed |
random seed |
type |
How parameters should be treated. |
... |
not used |
A matrix with row for each row of data
in the fitted model and nsim
columns, containing new samples from the fitted model.
set.seed(101) x = seq(0, 2*pi, length=100) y = sin(x) + 0.1*rnorm(length(x)) fit = tinyVAST( data=data.frame(x=x,y=y), formula = y ~ s(x) ) simulate(fit, nsim=100, type="mle-mvn") if(requireNamespace("DHARMa")){ # simulate new data conditional on fixed effects # and sampling random effects from their predictive distribution y_iz = simulate(fit, nsim=500, type="mle-mvn") # Visualize using DHARMa res = DHARMa::createDHARMa( simulatedResponse = y_iz, observedResponse = y, fittedPredictedResponse = fitted(fit) ) plot(res) }
set.seed(101) x = seq(0, 2*pi, length=100) y = sin(x) + 0.1*rnorm(length(x)) fit = tinyVAST( data=data.frame(x=x,y=y), formula = y ~ s(x) ) simulate(fit, nsim=100, type="mle-mvn") if(requireNamespace("DHARMa")){ # simulate new data conditional on fixed effects # and sampling random effects from their predictive distribution y_iz = simulate(fit, nsim=500, type="mle-mvn") # Visualize using DHARMa res = DHARMa::createDHARMa( simulatedResponse = y_iz, observedResponse = y, fittedPredictedResponse = fitted(fit) ) plot(res) }
summarize parameters from a fitted tinyVAST
## S3 method for class 'tinyVAST' summary( object, what = c("space_term", "time_term", "spacetime_term", "fixed"), predictor = c("one", "two"), ... )
## S3 method for class 'tinyVAST' summary( object, what = c("space_term", "time_term", "spacetime_term", "fixed"), predictor = c("one", "two"), ... )
object |
Output from |
what |
What component to summarize, whether |
predictor |
whether to get the 1st or 2nd linear predictor (the latter is only applicable in delta models) |
... |
Not used |
tinyVAST
includes three components:
a separable Gaussian Markov random field (GMRF) constructed from a structural equation model (SEM) and a spatial variable
a separable GMRF constructed from a a dynamic SEM (a nonseparable time-variable interaction) and a spatial variable
a generalized additive model (GAM), representing exogenous covariates
Each of these are summarized and interpreted differently, and summary.tinyVAST
facilitates this.
Regarding the DSEM componennt, tinyVAST includes an "arrow and lag"
notation, which specifies the set of
path coefficients and exogenous variance parameters to be estimated. Function tinyVAST
then estimates the maximum likelihood value for those coefficients and parameters
by maximizing the log-marginal likelihood.
However, many users will want to associate individual parameters and standard errors
with the path coefficients that were specified using the "arrow and lag" notation.
This task is complicated in
models where some path coefficients or variance parameters are specified to share a single value a priori,
or were assigned a name of NA and hence assumed to have a fixed value a priori (such that
these coefficients or parameters have an assigned value but no standard error).
The summary
function therefore compiles the MLE for coefficients (including duplicating
values for any path coefficients that assigned the same value) and standard error
estimates, and outputs those in a table that associates them with the user-supplied path and parameter names.
It also outputs the z-score and a p-value arising from a two-sided Wald test (i.e.
comparing the estimate divided by standard error against a standard normal distribution).
A data-frame containing the estimate (and standard errors, two-sided Wald-test
z-value, and associated p-value if the standard errors are available) for
model parameters, including the fixed-effects specified via formula
,
or the path coefficients for the spatial SEM specified via space_term
,
the dynamic SEM specified via time_term
, or the spatial dynamic SEM
specified via spacetime_term
Fits a vector autoregressive spatio-temporal (VAST) model using a minimal feature-set and a widely used interface.
tinyVAST( formula, data, time_term = NULL, space_term = NULL, spacetime_term = NULL, family = gaussian(), space_columns = c("x", "y"), spatial_domain = NULL, time_column = "time", times = NULL, variable_column = "var", variables = NULL, distribution_column = "dist", delta_options = list(formula = ~1), spatial_varying = NULL, weights = NULL, control = tinyVASTcontrol(), ... )
tinyVAST( formula, data, time_term = NULL, space_term = NULL, spacetime_term = NULL, family = gaussian(), space_columns = c("x", "y"), spatial_domain = NULL, time_column = "time", times = NULL, variable_column = "var", variables = NULL, distribution_column = "dist", delta_options = list(formula = ~1), spatial_varying = NULL, weights = NULL, control = tinyVASTcontrol(), ... )
formula |
Formula with response on left-hand-side and predictors on right-hand-side,
parsed by |
data |
Data-frame of predictor, response, and offset variables. Also includes
variables that specify space, time, variables, and the distribution for samples,
as identified by arguments |
time_term |
Specification for time-series structural equation model structure for
constructing a time-variable interaction that defines a time-varying intercept
for each variable (i.e., applies uniformly across space).
|
space_term |
Specification for structural equation model structure for
constructing a space-variable interaction.
|
spacetime_term |
Specification for time-series structural equation model structure
including lagged or simultaneous effects for
constructing a time-variable interaction, which is then combined in
a separable process with the spatial correlation to form a
space-time-variable interaction (i.e., the interaction occurs locally at each site).
|
family |
A function returning a class |
space_columns |
A string or character vector that indicates
the column(s) of |
spatial_domain |
Object that represents spatial relationships, either using
|
time_column |
A character string indicating the column of |
times |
A integer vector listing the set of times in order.
If |
variable_column |
A character string indicating the column of |
variables |
A character vector listing the set of variables.
if |
distribution_column |
A character string indicating the column of |
delta_options |
a named list with slots for |
spatial_varying |
a formula specifying spatially varying coefficients. |
weights |
A numeric vector representing optional likelihood weights for the data likelihood. Weights do not have to sum to one and are not internally modified. Thee weights argument needs to be a vector and not a name of the variable in the data frame. |
control |
Output from |
... |
Not used. |
tinyVAST
includes four basic inputs that specify the model structure:
formula
specifies covariates and splines in a Generalized Additive Model;
space_term
specifies interactions among variables and over time, constructing
the space-variable interaction.
spacetime_term
specifies interactions among variables and over time, constructing
the space-time-variable interaction.
spatial_domain
specifies spatial correlations
the default spacetime_term=NULL
and space_term=NULL
turns off all multivariate
and temporal indexing, such that spatial_domain
is then ignored, and the model collapses
to a generalized additive model using gam
. To specify a univariate spatial model,
the user must specify spatial_domain
and either space_term=""
or spacetime_term=""
, where the latter
two are then parsed to include a single exogenous variance for the single variable
Model type | How to specify |
Generalized additive model | specify spatial_domain=NULL space_term="" and spacetime_term="" , and then use formula to specify splines and covariates |
Dynamic structural equation model (including vector autoregressive, dynamic factor analysis, ARIMA, and structural equation models) | specify spatial_domain=NULL and use spacetime_term to specify interactions among variables and over time |
Univariate spatio-temporal model, or multiple independence spatio-temporal variables | specify spatial_domain and spacetime_term="" , where the latter is then parsed to include a single exogenous variance for the single variable |
Multivariate spatial model including interactions | specify spatial_domain and use space_term to specify spatial interactions |
Vector autoregressive spatio-temporal model (i.e., lag-1 interactions among variables) | specify spatial_domain and use spacetime_term="" to specify interactions among variables and over time, where spatio-temporal variables are constructed via the separable interaction of spacetime_term and spatial_domain |
An object (list) of class tinyVAST
. Elements include:
Data-frame supplied during model fitting
the spatial domain supplied during fitting
the formula specified during model fitting
The TMB object from MakeADFun
The output from nlminb
The report from obj$report()
The output from sdreport
The list of inputs passed to MakeADFun
A record of the function call
Total time to run model
Objects useful for package function, i.e., all arguments passed during the call
output from deviance_explained
Details section of make_dsem_ram()
for a summary of the math involved with constructing the DSEM, and doi:10.1111/2041-210X.14289 for more background on math and inference
doi:10.48550/arXiv.2401.10193 for more details on how GAM, SEM, and DSEM components are combined from a statistical and software-user perspective
summary.tinyVAST()
to visualize parameter estimates related to SEM and DSEM model components
# Simulate a seperable two-dimensional AR1 spatial process n_x = n_y = 25 n_w = 10 R_xx = exp(-0.4 * abs(outer(1:n_x, 1:n_x, FUN="-")) ) R_yy = exp(-0.4 * abs(outer(1:n_y, 1:n_y, FUN="-")) ) z = mvtnorm::rmvnorm(1, sigma=kronecker(R_xx,R_yy) ) # Simulate nuissance parameter z from oscillatory (day-night) process w = sample(1:n_w, replace=TRUE, size=length(z)) Data = data.frame( expand.grid(x=1:n_x, y=1:n_y), w=w, z=as.vector(z) + cos(w/n_w*2*pi)) Data$n = Data$z + rnorm(nrow(Data), sd=1) # Add columns for multivariate and/or temporal dimensions Data$var = "n" # make SPDE mesh for spatial term mesh = fmesher::fm_mesh_2d( Data[,c('x','y')], n=100 ) # fit model with cyclic confounder as GAM term out = tinyVAST( data = Data, formula = n ~ s(w), spatial_domain = mesh, space_term = "n <-> n, sd_n" )
# Simulate a seperable two-dimensional AR1 spatial process n_x = n_y = 25 n_w = 10 R_xx = exp(-0.4 * abs(outer(1:n_x, 1:n_x, FUN="-")) ) R_yy = exp(-0.4 * abs(outer(1:n_y, 1:n_y, FUN="-")) ) z = mvtnorm::rmvnorm(1, sigma=kronecker(R_xx,R_yy) ) # Simulate nuissance parameter z from oscillatory (day-night) process w = sample(1:n_w, replace=TRUE, size=length(z)) Data = data.frame( expand.grid(x=1:n_x, y=1:n_y), w=w, z=as.vector(z) + cos(w/n_w*2*pi)) Data$n = Data$z + rnorm(nrow(Data), sd=1) # Add columns for multivariate and/or temporal dimensions Data$var = "n" # make SPDE mesh for spatial term mesh = fmesher::fm_mesh_2d( Data[,c('x','y')], n=100 ) # fit model with cyclic confounder as GAM term out = tinyVAST( data = Data, formula = n ~ s(w), spatial_domain = mesh, space_term = "n <-> n, sd_n" )
Control parameters for tinyVAST
tinyVASTcontrol( nlminb_loops = 1, newton_loops = 0, eval.max = 1000, iter.max = 1000, getsd = TRUE, silent = getOption("tinyVAST.silent", TRUE), trace = getOption("tinyVAST.trace", 0), verbose = getOption("tinyVAST.verbose", FALSE), profile = c(), tmb_par = NULL, tmb_map = NULL, gmrf_parameterization = c("separable", "projection"), reml = FALSE, getJointPrecision = FALSE, calculate_deviance_explained = TRUE, run_model = TRUE )
tinyVASTcontrol( nlminb_loops = 1, newton_loops = 0, eval.max = 1000, iter.max = 1000, getsd = TRUE, silent = getOption("tinyVAST.silent", TRUE), trace = getOption("tinyVAST.trace", 0), verbose = getOption("tinyVAST.verbose", FALSE), profile = c(), tmb_par = NULL, tmb_map = NULL, gmrf_parameterization = c("separable", "projection"), reml = FALSE, getJointPrecision = FALSE, calculate_deviance_explained = TRUE, run_model = TRUE )
nlminb_loops |
Integer number of times to call |
newton_loops |
Integer number of Newton steps to do after running
|
eval.max |
Maximum number of evaluations of the objective function
allowed. Passed to |
iter.max |
Maximum number of iterations allowed. Passed to |
getsd |
Boolean indicating whether to call |
silent |
Disable terminal output for inner optimizer? |
trace |
Parameter values are printed every |
verbose |
Output additional messages about model steps during fitting? |
profile |
Parameters to profile out of the likelihood (this subset will be appended to |
tmb_par |
list of parameters for starting values, with shape identical
to |
tmb_map |
input passed to TMB::MakeADFun as argument |
gmrf_parameterization |
Parameterization to use for the Gaussian Markov
random field, where the default |
reml |
Logical: use REML (restricted maximum likelihood) estimation rather than maximum likelihood? Internally, this adds the fixed effects to the list of random effects to integrate over. |
getJointPrecision |
whether to get the joint precision matrix. Passed
to |
calculate_deviance_explained |
whether to calculate proportion of deviance
explained. See |
run_model |
whether to run the model of export TMB objects prior to compilation (useful for debugging) |
An object (list) of class tinyVASTcontrol
, containing either default or
updated values supplied by the user for model settings
extract the covariance of fixed effects, or both fixed and random effects.
## S3 method for class 'tinyVAST' vcov(object, which = c("fixed", "random", "both"), ...)
## S3 method for class 'tinyVAST' vcov(object, which = c("fixed", "random", "both"), ...)
object |
output from |
which |
whether to extract the covariance among fixed effects, random effects, or both |
... |
ignored, for method compatibility |
A square matrix containing the estimated covariances among the parameter estimates in the model.
The dimensions dependend upon the argument which
, to determine whether fixed, random effects,
or both are outputted.