---
title: "Estimate a model"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Estimate a model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(
  fig.width = 10, # Set default plot width (adjust as needed)
  fig.height = 8, # Set default plot height (adjust as needed)
  fig.align = "center" # Center align all plots
)

# knitr::opts_chunk$set(eval = FALSE)
```

# Model and estimator
The `gmwmx2` `R` package allows to estimate linear model with correlated residuals in presence of missing data.

More precisely, we assume the following model:
\begin{equation}
    \boldsymbol{Y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{F}\left\{\boldsymbol{\Sigma}\left(\boldsymbol{\gamma}\right)\right\},
\end{equation}

where $\boldsymbol{X} \in \mathbb{R}^{n \times p}$ is a design matrix of observed predictors, $\boldsymbol{\beta} \in \mathbb{R}^p$ is the regression parameter vector and $\boldsymbol{\varepsilon} = \{\varepsilon_{i}\}_{i=1,\ldots,n}$ is a zero-mean process following an unspecified joint distribution $\mathcal{F}$ with positive-definite covariance function $\boldsymbol{\Sigma}(\boldsymbol{\gamma}) \in \mathbb{R}^{n\times n}$ characterizing the second-order dependence structure of the process and parameterized by the vector $\boldsymbol{\gamma} \in \boldsymbol{\Gamma} \subset \mathbb{R}^q$. 

We then define the a random variable $\boldsymbol{Z} =\{Z_{i}\}_{i=1,\ldots,n}$ which describes the missing observation mechanism. More specifically, the vector $\boldsymbol{Z} \in \{0, 1\}^n$ is a binary-valued stationary process independent of $\boldsymbol{Y}$ with expectation $\mu(\boldsymbol{\vartheta}) = \mathbb{E}[Z_i] \in [0, \, 1)$, $\forall \, i$, and covariance matrix $\boldsymbol{\Lambda}(\boldsymbol{\vartheta}) = \mathbb{V}[\boldsymbol{Z}] \in \mathbb{R}^{n\times n}$ whose structure is assumed known up to the parameter vector $\boldsymbol{\vartheta} \in \boldsymbol{\Upsilon} \subset \mathbb{R}^k$


We then assume that we only observe the stochastic process $\tilde{\boldsymbol{Y}} = \boldsymbol{Z} \odot \boldsymbol{Y}$, where $\odot$ denotes the Hadamard product. Hence, $\tilde{\boldsymbol{Y}} \in \mathbb{R}^n = \boldsymbol{Y} \odot \boldsymbol{Z}$ represents the observed process vector with null elements in the positions where observations are missing. 

Using $\otimes$ to denote the Kronecker product, we then define $\tilde{\boldsymbol{X}} = \boldsymbol{Z} \otimes \boldsymbol{1}^T \odot \boldsymbol{X} \in \mathbb{R}^{n \times p}$ as the design matrix $\boldsymbol{X}$ with zero-valued vectors for the rows where observations are missing in $\boldsymbol{Y}$ (where $\boldsymbol{1}$ represents a vector of ones of dimension $p$).

We estimate the parameters $\boldsymbol{\beta}$ with the least square estimator:

\begin{equation}
    \hat{\boldsymbol{\beta}} = \left(\tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{X}}\right)^{-1} \tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{Y}}.
\end{equation}

We compute the estimated residuals as $\hat{\boldsymbol{\varepsilon}} = {\tilde{\boldsymbol{Y}}} - \tilde{{\boldsymbol{X}}} \hat{\boldsymbol{\beta}}$.

We then estimate with the Maximum Likelihood Estimator the parameters $\boldsymbol{\vartheta}$ of the missingness process $\boldsymbol{Z}$ assuming that $\boldsymbol{Z}$ is generated from a Markov model with the following transition probabilities:

\begin{equation}
    \label{eq:markov_model_def}
    \begin{aligned}
& P\left( Z_2=1 \mid Z_1=1\right)=1 - p_1 \\
& P\left(Z_2=1 \mid Z_1=0\right) = p_2 \\
& P\left(Z_2=0 \mid Z_1=1\right)=p_1 \\
& P\left(Z_2=0 \mid Z_1=0\right)=1-p_2.
\end{aligned}
\end{equation}

We then estimate the parameters $\boldsymbol{\gamma}$ using a Generalized method of Wavelet Moments approach [@guerrier2013wavelet] and using the fact that the variance-covariance matrix of $\hat{\boldsymbol{\varepsilon}}$ is given by:


$$\boldsymbol{\Sigma}_{\hat{\boldsymbol{\varepsilon}}}(\boldsymbol{\gamma}) = [\boldsymbol{\Lambda}(\hat{\boldsymbol{\vartheta}}) + \mu(\hat{\boldsymbol{\vartheta}})^2 \mathbf{1} \mathbf{1}^T ] \odot ( \boldsymbol{I} - \boldsymbol{P})\boldsymbol{\Sigma}(\boldsymbol{\gamma}) (\boldsymbol{I} - \boldsymbol{P})$$

where $\boldsymbol{P} = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T$ and $\boldsymbol{I}$ is the identity matrix of dimension $n\times n$.

More precisely, we rely on the result of @xu2017study that provide a computationally efficient form of the theoretical Allan variance (equivalent to the Haar wavelet variance up to a constant) for zero-mean stochastic processes such as $\boldsymbol{\varepsilon}$ to avoid computing these large matrices multiplication in the objective function. Indeed in @xu2017study they generalize the results in @zhang2008allan to zero-mean non-stationary processes by using averages of the diagonals and super-diagonals of the covariance matrix of $\boldsymbol{\varepsilon}$. What this implies is that the GMWM, which uses this form, does not require the storage of the $n \times n$ covariance matrix of $\boldsymbol{\varepsilon}$, but only of a vector of dimension $n$ which is then plugged into an explicit formula consisting in a linear combination of the elements of this vector (these elements being averages of the diagonal and super-diagonals of the covariance matrix).


# Estimating tectonic velocities and crustal uplift

While the GMWMX as described above and in more details in @voirol2024inference, is a general method for estimating large linear model with complex dependence structure in presence of missing data, the `gmwmx2` `R` package is currently developed specifically to estimate tectonic velocities from position times series in graticule distance coordinates system (GD) provided by the Nevada geodetic Laboratory [@blewitt2024improved; @blewitt2018harnessing].

## Trajectory model

To estimate the trajectory model (see e.g., @bevis2014 for more details), we construct the design matrix $\boldsymbol{X}$ such that $i$-th component of the vector $\mathbf{X} {{\boldsymbol{\beta}}}$ can be described as follows with $t_i$ representing the $i^{th}$ ordered time point (epoch) indicated in Modified Julian Date and $t_0$ representing the reference epoch located exactly in the middle of start and end point of
the time series:

\begin{split}
 \mathbb{E}[\mathbf{Y}_i] &= \mathbf{X}_i^T {{\boldsymbol{\beta}}}  \\
&= a + b \left(t_{i} - t_0\right) +  \sum_{h=1}^{m}\left[c_{h} \sin \left(2 \pi f_{h} t_{i}\right)  + d_h \cos \left(2 \pi f_h t_i\right)\right] + \\& \sum_{j=1}^{r}e_j H\left(t_i - t_j\right) + \sum_{k = 1 }^{s} l_k \left[1- \exp\left(\frac{-(t_i-t_k)}{\tau_k}\right)\right]H\left(t_{i}-\tau_k\right) 
\end{split}

where $a$ is the initial position at the reference epoch $t_0$, $b$ is the velocity parameter, $c_h$  and $d_h$ are the periodic motion parameters ($h = 1$ and $h = 2$ represent the annual and semi-annual seasonal terms, respectively with $f_1 = \frac{1}{365.25}$ and $f_2 = \frac{2}{365.25}$). The offset terms models earthquakes, equipment changes or human intervention in which $e_j$ is the magnitude of the step at epochs $t_j$, $r$ is the total number of offsets, $H$ is the Heaviside step function defined as $H(x):= \begin{cases}1, & x \geq 0 \\ 0, & x<0\end{cases}$ and the last term allow to model post-seismic deformation (see e.g., @sobrero2020logarithmic) where $s$ is the number of post seismic relaxation time specified, $t_k$ is the time when the relaxation $k$
starts in Modified Julian Date (MJD), $\tau_k$ is the relaxation period in days for the post-seismic relaxation $k$ and $l_k$ is the amplitude of the transient. Note that by default the estimates of the functional parameters  are provided in unit/day. 

When loading data from a specific station using the function `gmwmx2::download_station_ngl()`, we extract from the Nevada Geodetic Laboratory the position time series in  GD coordinates, the time steps associated with a equipment or software change and the time steps associated with an earthquake near the station. All these objects are stored in a object of class `gnss_ts_ngl`.  

When applying the function `gmwmx2::gmwmx2()` to an object of class `gnss_ts_ngl`, we construct the design matrix $\boldsymbol{X}$ by considering an offset term for all equipment or software changes steps and all earthquakes indicated by the NGL. We also specify a post-seismic relaxation term for all earthquakes indicated by the NGL. If no relaxation time is specified in the argument `vec_earthquakes_relaxation_time`, we consider a default relaxation time of $365.25$ days. 

## Stochastic model

It is generally recognized that noise in GNSS time series is best described by a combination of colored noise plus white noise [@he2017review; @langbein2008noise; @williams2004error; @bos2013fast] where the white noise generally model noise at high frequencies and the colored noise model the lower frequencies of the spectrum. In a large study on the noise properties of GNSS signals, \cite{he2019investigation} concluded that the optimal
noise models for 80–90\% of GNSS time series signals are the power law and white noise model or  white noise and flicker/pink noise with model. The \texttt{gmwmx} \texttt{R} package currently support both stochastic model specification.


More precisely, the power spectrum of a power-law noise has the following form:
$$
P(f)=P_0\left(f / f_s\right)^\kappa
$$
where $f$ is the frequency, $P_0$ is a constant, $f_s$ the sampling frequency and the exponent $\kappa$ is called the spectral index.

Many stochastic noise can be expressed as such, for example, a spectral index $\kappa=0$ produces a white noise, a spectral index $\kappa=-2$ produces a red noise or random walk and a spectral index $\kappa=-1$ produce a flicker noise, also called pink noise. 

@granger1980long and @hosking1981fractional showed that power-law noise with a spectral index between $-1$ and $1$ can be obtained by using fractional differencing of Gaussian noise:

$$
(1-B)^{-\kappa / 2} \mathbf{v}
$$

where $B$ is the backward-shift operator $\left(B v_i=v_{i-1}\right)$ and $\mathbf{v}$ a vector with independent and identically distributed (IID) Gaussian noise. 

Following from Hosking's definition of the fractional differencing, we obtain

$$
\begin{aligned}
(1-B)^{-\kappa / 2} & =\sum_{i=0}^{\infty}\binom{-\kappa / 2}{i}(-B)^i \\
& =1-\frac{\kappa}{2} B-\frac{1}{2} \frac{\kappa}{2}\left(1-\frac{\kappa}{2}\right) B^2+\ldots \\
& =\sum_{i=0}^{\infty} h_i
\end{aligned}
$$
with the coefficient $h_i$ that can be computed using the following recurrence relation [@kasdin1992discrete]:

$$
\begin{aligned}
h_0 & =1 \\
h_i & =\left(i-\frac{\kappa}{2}-1\right) \frac{h_{i-1}}{i} \quad \text { for } i>0
\end{aligned}
$$

Assuming an infinite sequence of zero-mean white noise $\mathbf{v}$, with variance $\sigma_{P L}^2$, and a spectral index $kappa > -1$ then the autocovariance $\gamma(\tau)=\operatorname{Cov}\left(X_t, X_{t+\tau}\right)=\mathbb{E}\left[\left(X_t-\mu\right)\left(X_{t+\tau}-\mu\right)\right]$ is [@bos2008fast]:

$$
\begin{aligned}
& \gamma(0)=\sigma_{p l}^2 \frac{\Gamma(1-\alpha)}{\left(\Gamma\left(1-\frac{\alpha}{2}\right)\right)^2} \\
& \gamma(\tau)=\frac{\frac{\alpha}{2}+\tau-1}{-\frac{\alpha}{\alpha}+\tau} \gamma(\tau-1) \text { for } \tau>0
\end{aligned}
$$
When the argument `stochastic_model` is set to `"wn + pl"`, the stochastic model considered includes both white noise and power-law with the specified above stationary autocovariance structure. The parameters estimated are: $\sigma^2_{W N}$, $\kappa$ (constrained to be greater than $-1$) and $\sigma^2_{P L}$.

When the argument `stochastic_model` is set to `"wn + fl"`, the stochastic model considered includes both white noise and flicker noise (not stationary power-law noise with a spectral index $\kappa=-1$) where the variance covariance of the flicker noise $\omega$ is obtained as follows (see e.g., [@bos2008fast]): 

$$
\operatorname{Cov}(\omega) = \sigma^2_{F L}\mathbf{U}^T \mathbf{U}
$$

where 
$$
\mathbf{U}=\left(\begin{array}{cccc}
h_0 & h_1 & \ldots & h_n \\
0 & h_0 & & h_{n-1} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \ldots & h_0
\end{array}\right)
$$
with the coefficients $h_i$ computed considering a spectral index $\kappa=-1$.

The stochastic parameters estimated are: $\sigma^2_{W N}$, $\sigma^2_{F L}$ and $\kappa$ is fixed to $-1$.


# Example of estimation
Let us showcase how to estimate the tectonic velocity in for one specific component (North, East or Vertical) of one station.

Let us first load the `gmwmx2` package.
```{r}
library(gmwmx2)
```

# Download a station from Nevada Geodetic Laboratory
```{r}
station_data <- download_station_ngl("CHML")
```

# Plot Station
```{r}
plot(station_data)
```


# Plot Northing component
```{r}
plot(station_data, component = "N")
```


# Estimate model on station data
```{r}
fit1 <- gmwmx2(station_data, n_seasonal = 2, component = "N", stochastic_model = "wn + pl")
```

# Extract estimated parameters
```{r}
summary(fit1)
```

By default, the estimated parameters are provided in m/day, we can optionally scale the estimated functional parameters so that they are returned in m/year with the argument `scale_parameters`.
```{r}
summary(fit1, scale_parameters = TRUE)
```


# Plot estimated model
```{r}
plot(fit1)
```
```{r}
fit2 <- gmwmx2(station_data, n_seasonal = 2, component = "N", stochastic_model = "wn + fl")
```

```{r}
summary(fit2)
```

```{r}
plot(fit2)
```


# References