| Title: | Functional Data Analysis using Variational Inference |
|---|---|
| Description: | Implements a variational Expectation-Maximization (VEM) algorithm for smoothing one or multiple functional observations via basis function selection. The algorithm estimates all model parameters simultaneously and automatically, while accounting for within-curve correlation. The approach provides a flexible and computationally efficient framework for smoothing correlated functional data. The algorithm is described in da Cruz, A. C., de Souza, C. P., and Sousa, P. H. (2024). 'Fast Bayesian basis selection for functional data representation with correlated errors.' <doi:10.48550/arXiv.2405.20758>. |
| Authors: | Camila de Souza [cre], Stephen Kinsey [aut], Ana Carolina da Cruz [aut], Pedro Henrique Toledo Oliveira Sousa [aut] |
| Maintainer: | Camila de Souza <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-20 17:30:24 UTC |
| Source: | https://github.com/cran/fda.vi |
Returns a matrix of estimated basis coefficients.
Each column corresponds to one curve; each row to one basis function.
Coefficients are set to zero when the posterior inclusion probability
threshold (inactive bases). When
is.composite = TRUE, the matrix has dimension ,
where is the highest selected by GCV across all
curves; coefficients for curves with smaller optimal are
zero-padded (structural padding).
## S3 method for class 'vem_fit' coef(object, threshold = 0.5, ...)## S3 method for class 'vem_fit' coef(object, threshold = 0.5, ...)
object |
A |
threshold |
Numeric in |
... |
Currently unused. |
A numeric matrix of dimension , with row
names B1, B2, ... and column names Curve_1, Curve_2, ....
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, predict.vem_fit,
summary.vem_fit
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # K x m matrix of active coefficients coefs <- coef(fit) dim(coefs) # 8 x 3 # Compare estimated vs true coefficients for curve 1 cbind(estimated = coefs[, 1], true = toy_curves$true_coef) # Stricter threshold — only very confident inclusions coef(fit, threshold = 0.9)data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # K x m matrix of active coefficients coefs <- coef(fit) dim(coefs) # 8 x 3 # Compare estimated vs true coefficients for curve 1 cbind(estimated = coefs[, 1], true = toy_curves$true_coef) # Stricter threshold — only very confident inclusions coef(fit, threshold = 0.9)
Computes the generalized cross-validation (GCV) score for each curve from a
vem_smooth model object. GCV approximates leave-one-out prediction
error and is used by tune_vem_by_gcv to select the optimal
number of basis functions .
The smoother matrix maps observed values to fitted values and is
constructed from the variational posteriors. Its trace provides the effective
degrees of freedom used in the GCV penalty.
gcv_vem(out, threshold = 0.5)gcv_vem(out, threshold = 0.5)
out |
A fitted object returned by |
threshold |
Numeric in |
A named numeric vector of length of per-curve GCV scores.
Lower scores indicate better fit relative to model complexity.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
Plots observed data, the posterior mean fitted curve, and an optional 95%
credible band for a single curve from a vem_fit object. The credible
band provides uncertainty quantification by sampling from the
variational posteriors: and
.
Predictions are automatically back-transformed if the model was fitted with
center = TRUE or scale = TRUE.
## S3 method for class 'vem_fit' plot( x, curve_idx = 1, type = c("polygon", "lines"), show_CI = TRUE, n_samples = 200, alpha_shade = 0.25, ylim = NULL, xlab = "t", ylab = "Value", show_basis = FALSE, ... )## S3 method for class 'vem_fit' plot( x, curve_idx = 1, type = c("polygon", "lines"), show_CI = TRUE, n_samples = 200, alpha_shade = 0.25, ylim = NULL, xlab = "t", ylab = "Value", show_basis = FALSE, ... )
x |
A |
curve_idx |
Integer. Index of the curve to plot. Default |
type |
Character. Credible band style: |
show_CI |
Logical. If |
n_samples |
Integer. Number of posterior draws used to construct the
credible band. Default |
alpha_shade |
Numeric in |
ylim |
Optional numeric vector of length 2. If |
xlab |
Character. Label for the horizontal axis. Default |
ylab |
Character. Label for the vertical axis. Default |
show_basis |
Logical. If |
... |
Additional graphical parameters passed to |
Invisibly returns NULL. Called for its side effect of
producing a plot.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Default: shaded credible band for curve 1 plot(fit) # Dashed credible band for curve 2 plot(fit, curve_idx = 2, type = "lines") # With basis selection subplot plot(fit, curve_idx = 1, show_basis = TRUE) # Suppress credible band plot(fit, show_CI = FALSE, main = "Mean fit only")data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Default: shaded credible band for curve 1 plot(fit) # Dashed credible band for curve 2 plot(fit, curve_idx = 2, type = "lines") # With basis selection subplot plot(fit, curve_idx = 1, show_basis = TRUE) # Suppress credible band plot(fit, show_CI = FALSE, main = "Mean fit only")
Returns posterior mean curve estimates from a vem_fit object.
Active basis functions are selected by applying a 0.5 probability threshold
on the posterior inclusion probabilities. If newdata is supplied,
a new basis matrix is
constructed at those time points; otherwise the original fitted time points
are used. Predictions are automatically back-transformed if the model was
fitted with center = TRUE or scale = TRUE.
## S3 method for class 'vem_fit' predict(object, newdata = NULL, ...)## S3 method for class 'vem_fit' predict(object, newdata = NULL, ...)
object |
A |
newdata |
Optional numeric vector of new time points at which to
evaluate the fitted curves. Must lie within the original domain
|
... |
Currently unused. |
A list of length . Each element is a numeric vector of
predicted values on the original (back-transformed) scale.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, plot.vem_fit,
coef.vem_fit
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Predictions at original time points preds <- predict(fit) length(preds) # 3 — one vector per curve # Predictions at a denser grid Xt_new <- seq(0, 1, length.out = 200) preds_dense <- predict(fit, newdata = Xt_new) # Plot observed vs predicted for curve 1 plot(toy_curves$Xt, toy_curves$y[[1]], pch = 16, cex = 0.6, col = "grey50", xlab = "t", ylab = "y") lines(Xt_new, preds_dense[[1]], col = "firebrick", lwd = 2)data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) # Predictions at original time points preds <- predict(fit) length(preds) # 3 — one vector per curve # Predictions at a denser grid Xt_new <- seq(0, 1, length.out = 200) preds_dense <- predict(fit, newdata = Xt_new) # Plot observed vs predicted for curve 1 plot(toy_curves$Xt, toy_curves$y[[1]], pch = 16, cex = 0.6, col = "grey50", xlab = "t", ylab = "y") lines(Xt_new, preds_dense[[1]], col = "firebrick", lwd = 2)
Provides a displayed summary of the results from vem_fit and
invisibly returns a list of summary statistics, including the basis type,
number of curves, selected , active basis counts per curve,
estimated model parameters, and GCV tuning results if applicable.
Reported variational posterior parameters for and
are the shape and scale of their respective
Inverse-Gamma variational distributions:
and
.
For composite fits (selection_metric = "per_curve"), parameters
from the first curve are shown as representative values.
## S3 method for class 'vem_fit' summary(object, ...)## S3 method for class 'vem_fit' summary(object, ...)
object |
A |
... |
Currently unused. |
Invisibly returns a list with element active_bases: an
integer vector of active basis counts per curve.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
data(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) summary(fit) # Active basis counts are returned invisibly s <- summary(fit) s$active_basesdata(toy_curves) fit <- vem_fit(y = toy_curves$y, Xt = toy_curves$Xt, K = 8) summary(fit) # Active basis counts are returned invisibly s <- summary(fit) s$active_bases
A small simulated dataset of three functional curves used in package examples. Curves are generated from a known cubic B-spline expansion with correlated errors, making it suitable for demonstrating basis selection and recovery of true coefficients.
toy_curvestoy_curves
A list with the following elements:
yNamed list of 3 numeric vectors of length 50, one per curve.
XtNumeric vector of 50 equally spaced time points on .
true_coefNumeric vector of length 8. True basis coefficients:
c(1.5, 0, -1, 0.8, 0, -0.5, 1.2, -0.9).
KInteger. Number of basis functions used (8).
mInteger. Number of curves (3).
sigmaNumeric. True noise standard deviation (0.1).
wNumeric. True correlation decay parameter (6).
Each curve is generated as:
where
for all , and with
and (correlation function of an
Ornstein-Uhlenbeck (OU) process).
Basis functions 2 and 5 have zero coefficients, providing a ground truth
for evaluating basis selection.
Generated via data-raw/generate_toy_curves.R.
data(toy_curves) str(toy_curves) # Plot the three raw curves plot(toy_curves$Xt, toy_curves$y[[1]], type = "l", ylab = "y", xlab = "t", main = "Toy curves") lines(toy_curves$Xt, toy_curves$y[[2]], col = "blue") lines(toy_curves$Xt, toy_curves$y[[3]], col = "red")data(toy_curves) str(toy_curves) # Plot the three raw curves plot(toy_curves$Xt, toy_curves$y[[1]], type = "l", ylab = "y", xlab = "t", main = "Toy curves") lines(toy_curves$Xt, toy_curves$y[[2]], col = "blue") lines(toy_curves$Xt, toy_curves$y[[3]], col = "red")
Fits vem_smooth across a grid of candidate basis sizes
K_grid and selects the best using GCV scores from
gcv_vem. Called internally by vem_fit when
a vector of values is supplied; not typically called directly.
Two selection modes are supported: "mean" selects the single
minimizing the mean GCV across all curves; "per_curve" selects the
that minimizes the GCV criterion for each individual curve, producing a composite fit.
tune_vem_by_gcv( y, Xt, K_grid, build_B, initial_values_fn, threshold = 0.5, mode = c("mean", "per_curve"), ... )tune_vem_by_gcv( y, Xt, K_grid, build_B, initial_values_fn, threshold = 0.5, mode = c("mean", "per_curve"), ... )
y |
List of curves. |
Xt |
Numeric vector of time points. |
K_grid |
Integer vector of candidate |
build_B |
Function with signature |
initial_values_fn |
Function with signature |
threshold |
Posterior inclusion probability (PIP) threshold passed to |
mode |
Character. |
... |
Additional arguments passed to |
A list with:
fitsNamed list of fitted vem_smooth objects, one per
candidate .
gcv_matrixNumeric matrix ( length(K_grid))
of per-curve GCV scores.
best_K_meanInteger. Best by mean GCV.
best_K_per_curveInteger vector of length . Best
for each curve.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
Fits one or more functional curves using Bayesian basis function selection
via the Variational EM algorithm, with an Ornstein-Uhlenbeck within-curve
correlation structure. Internally calls vem_smooth to run the
VEM algorithm.
If a single value is provided for K, the model is fitted using that fixed
number of basis functions. If a vector of candidate values is supplied, the
function tune_vem_by_gcv is called to automatically select the
optimal K based on the GCV criterion. The resulting fitted object provides
methods for plot, predict, coef, and summary via the corresponding S3
methods plot.vem_fit, predict.vem_fit,
coef.vem_fit, and summary.vem_fit.
vem_fit( y, Xt, K = NULL, basis_type = c("cubic_bspline", "fourier"), selection_metric = c("mean", "per_curve"), threshold = 0.5, center = FALSE, scale = FALSE, period = NULL, initial_values_fn = NULL, lambda_1 = NULL, lambda_2 = NULL, delta_1 = NULL, delta_2 = NULL, ... )vem_fit( y, Xt, K = NULL, basis_type = c("cubic_bspline", "fourier"), selection_metric = c("mean", "per_curve"), threshold = 0.5, center = FALSE, scale = FALSE, period = NULL, initial_values_fn = NULL, lambda_1 = NULL, lambda_2 = NULL, delta_1 = NULL, delta_2 = NULL, ... )
y |
Named list of numeric vectors, one per curve. |
Xt |
Numeric vector of time points, common across all curves. |
K |
Integer or integer vector of candidate basis sizes. If a single
value, fits directly at that |
basis_type |
Character. One of |
selection_metric |
Character. |
threshold |
Numeric in |
center |
Logical. If |
scale |
Logical. If |
period |
Numeric. Period for Fourier bases. Defaults to the domain
range |
initial_values_fn |
Function with signature |
lambda_1, lambda_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
delta_1, delta_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
... |
Additional arguments passed to |
An object of class "vem_fit" containing:
modelThe fitted vem_smooth object (global fit), or a
named list of per-curve model objects (composite fit).
selected_KInteger vector of length m. The K
used for each curve.
best_KThe single selected K (global fit), or a vector
(composite fit).
tuningOutput of tune_vem_by_gcv, including
the full GCV matrix and all candidate fits. NULL if a single
K was supplied.
scaling_paramsList with means and sds used
for standardization. Used by predict.vem_fit and
plot.vem_fit to back-transform predictions.
data_origThe input curves in their original scales.
basis_type, is_composite, Xt, call
Metadata stored for use by S3 methods.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_smooth, plot.vem_fit,
predict.vem_fit, coef.vem_fit,
summary.vem_fit
data(toy_curves) fit <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = 8 ) summary(fit) plot(fit, curve_idx = 1) coef(fit) predict(fit) # GCV tuning over a grid of K values fit_gcv <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = c(6, 8, 10) ) fit_gcv$best_Kdata(toy_curves) fit <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = 8 ) summary(fit) plot(fit, curve_idx = 1) coef(fit) predict(fit) # GCV tuning over a grid of K values fit_gcv <- vem_fit( y = toy_curves$y, Xt = toy_curves$Xt, K = c(6, 8, 10) ) fit_gcv$best_K
Fits functional curves simultaneously via Bayesian basis function
selection with an Ornstein-Uhlenbeck within-curve correlation structure.
This function is called internally by vem_fit and only runs
the VEM algorithm itself, without performing basis construction,
standardization, or GCV tuning. Most users should call
vem_fit instead, which handles those steps automatically.
vem_smooth( y, B, Xt = Xt, m = length(y), K = K, mu_ki = 0.5, lambda_1 = 1e-10, lambda_2 = 1e-10, delta_1 = 1e-10, delta_2 = 1e-10, maxIter = 1000, initial_values, convergence_threshold = 0.01, lower_opt = 0.1 )vem_smooth( y, B, Xt = Xt, m = length(y), K = K, mu_ki = 0.5, lambda_1 = 1e-10, lambda_2 = 1e-10, delta_1 = 1e-10, delta_2 = 1e-10, maxIter = 1000, initial_values, convergence_threshold = 0.01, lower_opt = 0.1 )
y |
List of length |
B |
List of length |
Xt |
Numeric vector of |
m |
Integer. Number of curves. Defaults to |
K |
Integer. Number of basis functions. |
mu_ki |
Numeric scalar in |
lambda_1, lambda_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
delta_1, delta_2
|
Positive scalars. Inverse-Gamma prior hyperparameters
for |
maxIter |
Integer. Maximum VEM iterations. Default |
initial_values |
Named list with elements |
convergence_threshold |
Positive scalar. Absolute ELBO tolerance for
convergence. Default |
lower_opt |
Positive scalar. Lower bound for |
The algorithm alternates between an E-step — sequential coordinate ascent variational inference (CAVI) updates for
, , , ,
and — and an M-step that maximizes the ELBO with
respect to the correlation decay parameter via L-BFGS-B with an
analytic gradient. Convergence is declared when the absolute ELBO change
between iterations falls below convergence_threshold.
For hyperparameter initialization, set delta_1 and delta_2
such that delta_2 / (delta_1 - 1) is a rough estimate of the noise
variance, and initialize w consistent with the expected correlation
strength in the data.
A named list containing:
mu_betaPosterior means (length ).
Sigma_betaPosterior covariance array ().
probPosterior inclusion probabilities (length ).
Basis is active for curve when .
delta1, delta2
Final parameters.
lambda1, lambda2
Final parameters.
wEstimated correlation decay parameter (range-normalized scale).
cor_matThe Ornstein-Uhlenbeck correlation
matrix evaluated at the final estimated decay parameter
, as returned by computePsiMatrix.
elbo_valuesELBO trajectory across iterations.
convergedLogical. Whether the convergence criterion was met.
n_iterationsNumber of iterations run.
da Cruz, A. C., de Souza, C. P. E., & Sousa, P. H. T. O. (2024). Fast Bayesian basis selection for functional data representation with correlated errors. arXiv:2405.20758. https://arxiv.org/abs/2405.20758
vem_fit, plot.vem_fit,
predict.vem_fit, coef.vem_fit