State-space modeling in tidyILD with KFAS

What is a state-space model?

In a state-space (or dynamic linear) model, you observe a sequence \(y_1,\ldots,y_T\) and posit latent states \(\alpha_t\) that evolve over time and drive the observations. A minimal Gaussian local level model is:

State: \(\alpha_t = \alpha_{t-1} + \eta_t\) with \(\eta_t \sim \mathcal{N}(0, Q)\) (random walk).
Observation: \(y_t = \alpha_t + \varepsilon_t\) with \(\varepsilon_t \sim \mathcal{N}(0, H)\).

So the “level” of the process follows a random walk, with the speed of its drift set by Q, and the data are noisy measurements of that level with error variance H. In tidyILD, ild_kfas(..., state_spec = "local_level") fits this structure (via KFAS) for a single time series per call, that is, one distinct .ild_id after ild_prepare().
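To make the two equations concrete, the following base-R sketch simulates one series from the local level model. The variance values (Q = 0.1, H = 1), the starting level of 0, and all variable names are illustrative assumptions, not tidyILD defaults.

```r
# Simulate a Gaussian local level model (illustrative values, not package defaults).
set.seed(123)
n_obs <- 60   # number of occasions T
Q <- 0.1      # state (level) disturbance variance
H <- 1.0      # observation noise variance

eta   <- rnorm(n_obs, mean = 0, sd = sqrt(Q))  # state disturbances eta_t
alpha <- cumsum(eta)                           # alpha_t = alpha_{t-1} + eta_t, with alpha_0 = 0
eps   <- rnorm(n_obs, mean = 0, sd = sqrt(H))  # measurement errors epsilon_t
y     <- alpha + eps                           # y_t = alpha_t + epsilon_t

sim <- data.frame(time = seq_len(n_obs), level = alpha, y = y)
head(sim)
```

Raising Q relative to H makes the latent level wander more between occasions; lowering it makes the observed series look like noise around a nearly constant level.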

When to use this instead of mixed-model residual correlation?

Multilevel models (ild_lme(), ild_brms()) are the right tool when you want population inference: fixed and random effects across many persons, within/between decomposition, and residual dynamics (e.g. AR1 or CAR1 on the within-person residuals) as a nuisance correlation structure.

State-space models in this package focus on explicit latent dynamics for one series at a time: estimating a time-varying level (or, in future specs, trend or AR) in state space, with diagnostics built on one-step-ahead innovations from the Kalman filter.

Conceptual contrast:

AR1 on residuals (nlme / lme): correlation among errors around a smooth mean structure.
Local level (KFAS): the mean itself is a random walk and is smoothed; the “residual” is often summarized as standardized prediction errors (innovations).

Neither replaces the other in general—choose based on whether your primary goal is hierarchical population inference or structured univariate latent dynamics for one person (or one series).
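The contrast can be seen in a small base-R simulation. This is only a sketch under assumed values (AR coefficient 0.5, unit innovation SD, 2000 replicates); it does not use nlme, KFAS, or tidyILD. AR(1) errors keep roughly the same spread around the mean at every occasion, whereas a random-walk level spreads out as occasions accumulate.

```r
# Contrast: stationary AR(1) errors around a fixed mean vs. a random-walk level.
set.seed(2024)
n_rep <- 2000   # number of simulated series (assumed)
n_obs <- 100    # occasions per series (assumed)
phi   <- 0.5    # AR(1) coefficient (assumed)
sigma <- 1      # innovation SD for both processes (assumed)

simulate_ar1 <- function(n, phi, sigma) {
  e <- numeric(n)
  e[1] <- rnorm(1, sd = sigma / sqrt(1 - phi^2))  # start in the stationary distribution
  for (t in 2:n) e[t] <- phi * e[t - 1] + rnorm(1, sd = sigma)
  e
}

ar1 <- replicate(n_rep, simulate_ar1(n_obs, phi, sigma))   # n_obs x n_rep matrix
rw  <- replicate(n_rep, cumsum(rnorm(n_obs, sd = sigma)))  # n_obs x n_rep matrix

# Across-series variance at an early (t = 10) and a late (t = 100) occasion.
var_ar1 <- apply(ar1, 1, var)[c(10, 100)]
var_rw  <- apply(rw, 1, var)[c(10, 100)]
round(rbind(ar1 = var_ar1, random_walk = var_rw), 2)
```

In theory the AR(1) variance is constant at sigma^2 / (1 - phi^2), while the random-walk variance grows as t * sigma^2, which is why a local level model treats the mean itself as the moving quantity.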

Filtered vs smoothed states

In ILD terms:

Filtered (“online” / nowcast): your best estimate of the latent level at occasion t using only measurements up through t—what the model would have said about the state at that moment as data arrived. Useful when you think about sequential self-report or real-time summaries.
Smoothed (“offline” / full-history): your best estimate of the level at each occasion using the entire series—what you report after seeing all waves, including revising earlier time points. This is usually what you want for scientific summaries of a completed diary or EMA study.

Formally, after fitting, KFAS runs KFS() to obtain:

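Filtered state: \(E(\alpha_t \mid y_1,\ldots,y_t)\). In KFS() output this is the att component, returned when state filtering is requested.
Smoothed state: \(E(\alpha_t \mid y_1,\ldots,y_T)\). In KFS() output this is the alphahat component, returned when state smoothing is requested.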
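The two conditional expectations above can be computed by hand for the local level model. The function below is a teaching sketch in base R, assuming Q and H are known and using a large prior variance P1 as a stand-in for a diffuse prior; the function name and its arguments are hypothetical, it does not handle missing values, and it is not the code path used by KFAS or tidyILD.

```r
# Scalar Kalman filter and fixed-interval smoother for the local level model.
# Assumes known Q and H and no missing observations (teaching sketch only).
local_level_filter_smooth <- function(y, Q, H, a1 = 0, P1 = 1e6) {
  n <- length(y)
  a <- P <- att <- Ptt <- v <- Fv <- numeric(n)
  a_pred <- a1
  P_pred <- P1
  for (t in seq_len(n)) {
    a[t]   <- a_pred                 # predicted state E(alpha_t | y_1..y_{t-1})
    P[t]   <- P_pred                 # its variance
    v[t]   <- y[t] - a[t]            # one-step-ahead innovation
    Fv[t]  <- P[t] + H               # innovation variance
    K      <- P[t] / Fv[t]           # Kalman gain
    att[t] <- a[t] + K * v[t]        # filtered state E(alpha_t | y_1..y_t)
    Ptt[t] <- P[t] * H / Fv[t]       # filtered variance, equal to P[t] * (1 - K)
    a_pred <- att[t]                 # random walk: next prediction = current filtered state
    P_pred <- Ptt[t] + Q
  }
  # Backward pass: smoothed state E(alpha_t | y_1..y_T).
  alphahat <- att
  V <- Ptt
  if (n > 1) {
    for (t in (n - 1):1) {
      J <- Ptt[t] / P[t + 1]
      alphahat[t] <- att[t] + J * (alphahat[t + 1] - a[t + 1])
      V[t] <- Ptt[t] + J^2 * (V[t + 1] - P[t + 1])
    }
  }
  data.frame(time = seq_len(n), filtered = att, smoothed = alphahat,
             filtered_var = Ptt, smoothed_var = V,
             std_innovation = v / sqrt(Fv))
}

set.seed(7)
y <- cumsum(rnorm(60, sd = sqrt(0.1))) + rnorm(60)  # simulated local level data
res <- local_level_filter_smooth(y, Q = 0.1, H = 1)
tail(res, 3)
```

At the last occasion the filtered and smoothed estimates coincide, because no later data exist to revise them; at every earlier occasion the smoothed variance is no larger than the filtered variance.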

In tidyILD, ild_kfas(..., smoother = TRUE) requests smoothing in KFS(). With smoother = FALSE, smoothed states may be unavailable. Use ild_plot_filtered_vs_smoothed() to compare the filtered and smoothed estimates of the first latent state over time.
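For readers who want to see where att and alphahat come from, the sketch below calls KFAS directly on simulated data. It uses the KFAS functions SSModel(), SSMtrend(), fitSSM(), and KFS(); the simulated series, starting values, and optimizer choice are assumptions, and the exact calls that ild_kfas() makes internally may differ.

```r
# Fit a local level model directly with KFAS and extract filtered and smoothed states.
if (requireNamespace("KFAS", quietly = TRUE)) {
  library(KFAS)
  set.seed(1)
  y <- cumsum(rnorm(60, sd = 0.3)) + rnorm(60)  # simulated series (assumed values)

  # NA marks the variances Q and H as parameters to be estimated.
  model <- SSModel(y ~ SSMtrend(degree = 1, Q = list(matrix(NA))), H = matrix(NA))

  # Starting values are on the log-variance scale for the default update function.
  fit <- fitSSM(model, inits = c(0, 0), method = "BFGS")

  # Request both state filtering and state smoothing.
  out <- KFS(fit$model, filtering = "state", smoothing = "state")

  states <- data.frame(
    time     = seq_along(y),
    filtered = as.numeric(out$att[, 1]),       # E(alpha_t | y_1..y_t)
    smoothed = as.numeric(out$alphahat[, 1])   # E(alpha_t | y_1..y_T)
  )

  # Standardized one-step-ahead prediction errors (innovations).
  innov <- rstandard(out, type = "recursive")
  print(head(states))
}
```

The requireNamespace() guard lets the block run without error when KFAS is absent; in that case it simply does nothing.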

Minimal example

If the KFAS package is installed, you can run:

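library(tidyILD)
set.seed(1)
# Simulate a single series: 1 person, 60 occasions
d <- ild_simulate(n_id = 1, n_obs_per = 60, seed = 42)
# Declare the id and time columns
x <- ild_prepare(d, id = "id", time = "time")
# Center the outcome y
x <- ild_center(x, y)
# Fit the local level model; suppressWarnings() silences warnings raised during the fit
fit <- suppressWarnings(
  ild_kfas(x, outcome = "y", state_spec = "local_level", time_units = "sim_steps")
)
# Collect diagnostics and plot the residual autocorrelation function
b <- ild_diagnose(fit)
class(b)
#> [1] "ild_diagnostics_bundle" "list"
ild_autoplot(b, section = "residual", type = "acf")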

If KFAS is not installed, install it with install.packages("KFAS") and load tidyILD; the same code then runs end-to-end.

What the backend does not yet do

Read this section before relying on KFAS in a paper or preregistration. The normative scope document inst/dev/KFAS_V1_BACKEND.md in the package source has full detail; the points below are the trust boundaries for v1.

What ild_kfas() is:

Discrete-time state-space modeling: the latent state advances one step per observation (row order after ild_prepare() for that series). This is standard dynamic linear modeling on an index, not a continuous-time differential equation.

What it is not (today):

Not ctsem-style (or similar) continuous-time latent dynamics with unequal physical intervals baked into the transition model. Those workflows are a later tier; this backend does not replace them for that use case.
Not a multilevel latent-state model: there is no pooled latent trajectory across persons in v1. Fitting one series per call is the supported semantics; pooling mode across IDs is limited (see the backend doc). Stacking independent per-person fits is explicit via fit_context and guardrails—not hierarchical partial pooling of a shared state.
local_level only in v1; other state_spec labels (local_trend, ar1_state, regression_local_level, …) are reserved for later releases.
Optional short-horizon forecasts and richer uncertainty quantification are planned; see ?ild_plot_forecast and NEWS.md.

Irregular timing: see vignette("kfas-irregular-timing-spacing", package = "tidyILD")—tidyILD diagnoses spacing; the KFAS wrapper fits under discrete-time choices and does not, by itself, “solve” irregular measurement in a continuous-time sense.
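The difference between a discrete-time index and a continuous-time treatment can be illustrated with a few lines of base R. The observation times, the value of Q, and the per-hour calibration below are hypothetical; the snippet reproduces neither tidyILD's spacing diagnostics nor any continuous-time package, and only shows how state variance per step would differ under the two views.

```r
# Hypothetical observation times (hours) with unequal gaps.
obs_time <- c(0, 2, 4, 5, 12, 14, 26, 28)
gap <- diff(obs_time)  # elapsed time between consecutive occasions

# Discrete-time local level: every step receives the same state variance Q,
# regardless of how much time has elapsed.
Q <- 0.1
var_discrete <- rep(Q, length(gap))

# Continuous-time analogue (Brownian-motion level): variance scales with the gap.
q_rate <- Q / median(gap)        # variance per hour (assumed calibration)
var_continuous <- q_rate * gap

data.frame(gap, var_discrete, var_continuous)
```

When gaps are nearly equal the two columns agree closely; the larger the variation in gaps, the more a discrete-time fit understates uncertainty across long gaps and overstates it across short ones.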