We report all the formulae used in the main computations carried out by the package, as a reference for users and developers.
The basic model is given by the state-space representation
The Kalman filtering recursions, written in scalar form to gain computational speed and insight, are the following (for generality, we also let the variance of the measurement error vary over time).
The initial innovation, its variance and the Kalman gains are:
When one or more values of $y_t$ are missing, the only modifications to the above recursions are the following.
Since the two state variables are nonstationary, their initialisation
should be diffuse:
As will be clear from the computations below, when $v$ is infinite the mean squared errors of $a_t^{(1)}$ and $a_t^{(2)}$ and the variances of the innovations are infinite for $t = 1, 2$, while from $t = 3$ onwards they are finite.
Let us carry out the computations for $t = 1, 2, 3$ and then take the limit as $v \to \infty$.
The smoothing recursions start from $t = n$ and work backwards
down to $t = 1$. The following
quantities are auxiliary to the computation of the smoothed values of $\alpha_t^{(1)}$
and of their MSEs.
Since the smoother is linear in the observations, the vector of
smoothed $\alpha_t^{(1)}$,
say $s$, is just a
linear transformation of the vector of observations $y$: $s = W y$.
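Since the smoother is a linear map, the matrix $W$ can also be recovered numerically by running the smoother on the unit vectors: column $j$ of $W$ is the smoothed series obtained from the $j$-th unit vector. The following Python sketch illustrates this property with a hypothetical moving-average smoother standing in for the state-space smoother of the package (the functions \texttt{smoother\_weight\_matrix} and \texttt{moving\_average} are purely illustrative, not part of the package).
\begin{verbatim}
import numpy as np

def smoother_weight_matrix(smooth, n):
    # Column j of W is the smoother applied to the j-th unit vector,
    # so that smooth(y) == W @ y for every y (linearity).
    W = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        W[:, j] = smooth(e)
    return W

def moving_average(y, k=2):
    # Hypothetical linear smoother used only for illustration.
    n = len(y)
    out = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - k), min(n, t + k + 1)
        out[t] = y[lo:hi].mean()
    return out

n = 50
y = np.cumsum(np.random.default_rng(0).normal(size=n))
W = smoother_weight_matrix(moving_average, n)

assert np.allclose(moving_average(y), W @ y)   # s = W y
print(np.trace(W))   # trace of W: the effective degrees of freedom (below)
\end{verbatim}
Only the diagonal of $W$ is needed for the trace, which is what the closed-form expressions reported below provide directly.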
The number of effective degrees of freedom is the trace of the weighting
matrix $W$ (cf. Hastie,
Tibshirani and Friedman, 2009, \emph{The Elements of Statistical
Learning}, Section 5.4.1). The formulae for computing such weights
in a general state-space form can be found in Koopman and Harvey (2003,
\emph{Journal of Economic Dynamics and Control}, vol. 27). In our
framework, the diagonal elements of the matrix $W$ are given by
The log-likelihood must be maximised with respect to a very large number of parameters ($n + 3$). Thus, providing the numerical optimiser with analytical scores is important for stability and speed. Since all of our parameters are related to quantities in the disturbance covariance matrices, we can adapt the results in Koopman and Shephard (1992, \emph{Biometrika}, vol. 79).
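As a sketch of how analytical scores enter the optimisation, the snippet below passes a gradient function to SciPy's L-BFGS-B routine; the Gaussian negative log-likelihood is only a toy stand-in for the actual likelihood of the model, and the optimiser shown is one possible choice rather than necessarily the one used in the package.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the model's negative log-likelihood: Gaussian data with
# unknown mean mu and log-variance eta, theta = (mu, eta).
y = np.random.default_rng(1).normal(loc=2.0, scale=1.5, size=200)

def negloglik(theta):
    mu, eta = theta
    sig2 = np.exp(eta)
    return 0.5 * np.sum(np.log(2 * np.pi * sig2) + (y - mu) ** 2 / sig2)

def score(theta):
    # Analytical gradient of negloglik with respect to (mu, eta).
    mu, eta = theta
    sig2 = np.exp(eta)
    d_mu = -np.sum(y - mu) / sig2
    d_eta = 0.5 * np.sum(1.0 - (y - mu) ** 2 / sig2)
    return np.array([d_mu, d_eta])

# jac= supplies the analytical score, avoiding finite-difference gradients;
# this is what matters when there are n + 3 parameters, as in the model above.
res = minimize(negloglik, x0=np.zeros(2), jac=score, method="L-BFGS-B")
print(res.x)
\end{verbatim}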
Recall that our (slightly re-parametrised) model is
If $\lambda$ is not fixed and $\ell(\boldsymbol{\theta})$ denotes the
log-likelihood function, with $\boldsymbol{\theta}$ the vector of all the
parameters, then
Generally, constrained optimisation problems also need the
derivatives of the constraint function, which in our case is $g(\boldsymbol{\theta}) = \sum_{t=1}^n
\sigma_t$. The solution to the regularised maximum likelihood
problem must satisfy $g(\boldsymbol{\theta}) \leq M$.
The derivatives are trivial:
If $\lambda$ is fixed, $\sigma^2_\varepsilon = \lambda\sigma^2$
and, in the log-likelihood function $\ell(\boldsymbol{\theta})$, the vector of
parameters $\boldsymbol{\theta}$ does not
contain $\lambda$ or $\sigma^2_\varepsilon$.
The derivatives are now