---
title: "Satterthwaite"
package: mmrm
bibliography: '`r system.file("REFERENCES.bib", package = "mmrm")`'
csl: '`r system.file("jss.csl", package = "mmrm")`'
output:
  rmarkdown::html_vignette:
    toc: true
vignette: |
  %\VignetteIndexEntry{Satterthwaite}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Here we describe the details of the Satterthwaite degrees of freedom calculations.

## Satterthwaite degrees of freedom for asymptotic covariance

In @Christensen2018 the Satterthwaite degrees of freedom approximation for normal models is described in detail, together with the computational approach for models fitted with the `lme4` package. We follow their algorithm and explain the implementation in the `mmrm` package.

The model definition is the same as in [Details of the model fitting in `mmrm`](algorithm.html), and we use the same notation as in the [Details of the Kenward-Roger calculations](kenward.html). In particular, we assume we have a contrast matrix $C \in \mathbb{R}^{c\times p}$ with which we want to test the linear hypothesis $C\beta = 0$. Further, $W(\hat\theta)$ is the inverse of the Hessian matrix of the negative log-likelihood function of $\theta$ evaluated at the estimate $\hat\theta$, i.e. the inverse of the observed Fisher information matrix, which is a consistent estimator of the variance-covariance matrix of $\hat\theta$. $\Phi(\theta) = \left\{X^\top \Omega(\theta)^{-1} X\right\}^{-1}$ is the asymptotic covariance matrix of $\hat\beta$.

### One-dimensional contrast

We start with the case of a one-dimensional contrast, i.e. $c = 1$. The Satterthwaite adjusted degrees of freedom for the corresponding t-test are then defined as:
\[
\hat\nu(\hat\theta) = \frac{2f(\hat\theta)^2}{f'(\hat\theta)^\top W(\hat\theta) f'(\hat\theta)},
\]
where $f(\hat\theta) = C \Phi(\hat\theta) C^\top$ is the scalar in the numerator, which we can identify as the variance estimate of the estimated scalar contrast $C\hat\beta$.

The computational challenge is essentially to evaluate the denominator in the expression for $\hat\nu(\hat\theta)$, which amounts to computing the $k$-dimensional gradient $f'(\hat\theta)$ of $f(\theta)$ (for the given contrast matrix $C$) at the estimate $\hat\theta$. We already have the variance-covariance matrix $W(\hat\theta)$ of the variance parameter estimate $\hat\theta$ from the model fitting.

#### Jacobian approach

If we proceeded naively here, we would need to recompute the denominator for every chosen $C$. This would be slow, e.g. when changing $C$ every time we want to test a single coefficient within $\beta$. It is better to instead evaluate the gradient of the matrix-valued function $\Phi(\theta)$ with respect to $\theta$, i.e. its Jacobian $\mathcal{J}(\theta) = \nabla_\theta \Phi(\theta)$. Imagine $\mathcal{J}(\theta)$ as a 3-dimensional array with $k$ faces of size $p\times p$. Left- and right-multiplying each face by $C$ and $C^\top$, respectively, leads to the $k$-dimensional gradient $f'(\theta) = C \mathcal{J}(\theta) C^\top$. Therefore, for each new contrast $C$ we only need to perform simple matrix multiplications, which is fast (see `h_gradient()`, where this is implemented).
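
To make the two steps concrete, here is a minimal R sketch using small made-up matrices in place of the quantities obtained from a model fit: `phi` stands in for $\Phi(\hat\theta)$, `jac_list` for the $k$ faces of $\mathcal{J}(\hat\theta)$, `W` for $W(\hat\theta)$, and `contrast` for a one-dimensional $C$. This only illustrates the formulas above; it is not the `mmrm` implementation.

```{r}
# Toy dimensions: p = 2 fixed effects, k = 2 variance parameters (made-up values).
phi <- matrix(c(0.5, 0.1, 0.1, 0.4), nrow = 2)    # stands in for Phi(theta hat)
jac_list <- list(                                 # stands in for the k Jacobian faces
  matrix(c(0.20, 0.05, 0.05, 0.10), nrow = 2),
  matrix(c(0.05, 0.02, 0.02, 0.15), nrow = 2)
)
W <- matrix(c(0.30, 0.02, 0.02, 0.25), nrow = 2)  # stands in for W(theta hat)
contrast <- matrix(c(1, 0), nrow = 1)             # one-dimensional contrast C

# f(theta hat) = C Phi C': variance estimate of the scalar contrast estimate.
f_value <- as.numeric(contrast %*% phi %*% t(contrast))

# Gradient of f: left- and right-multiply each Jacobian face by C and C'.
f_grad <- vapply(
  jac_list,
  function(face) as.numeric(contrast %*% face %*% t(contrast)),
  numeric(1)
)

# Satterthwaite degrees of freedom: nu = 2 f^2 / (grad' W grad).
2 * f_value^2 / as.numeric(t(f_grad) %*% W %*% f_grad)
```

Because the Jacobian faces and $W(\hat\theta)$ do not depend on $C$, they can be computed once per fit and reused for every contrast, which is exactly what makes repeated single-coefficient tests cheap.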
Thus, having computed the estimated Jacobian $\mathcal{J}(\hat\theta)$, it is only a matter of putting the different quantities together to compute the estimate of the denominator degrees of freedom, $\hat\nu(\hat\theta)$.

#### Jacobian calculation

Currently, we evaluate the gradient of $\Phi(\theta)$ through the function `h_jac_list()`, which uses the automatic differentiation provided by `TMB`. We first obtain the Jacobian of the inverse of the covariance matrix of the coefficients, $\Phi(\theta)^{-1}$, following the [Kenward-Roger calculations](kenward.html#special-considerations-for-mmrm-models). Note that we only need the $P_h$ matrices here. Then, to obtain the Jacobian of the covariance matrix of the coefficients, we follow the [algorithm](kenward.html#derivative-of-the-sigma-1) and use the $\Phi(\theta)$ estimated in the fit. The result is a list of length $k$, where $k$ is the dimension of the variance parameter $\theta$, containing $p \times p$ matrices, where $p$ is the dimension of $\beta$.

### Multi-dimensional contrast

When $c > 1$ we are testing multiple contrasts at once. Here an F-statistic
\[
F = \frac{1}{c} (C\hat\beta)^\top (C \Phi(\hat\theta) C^\top)^{-1} (C\hat\beta)
\]
is calculated, and we are interested in estimating an appropriate denominator degrees of freedom for $F$, while assuming $c$ numerator degrees of freedom. Note that only in special cases, such as orthogonal or balanced designs, is the F distribution exact under the null hypothesis. In general, it is an approximation.

The calculations are described in detail in @Christensen2018, so we do not repeat them here. The implementation is in `h_df_md_sat()` and starts with an eigen-decomposition of the asymptotic variance-covariance matrix of the contrast estimate, i.e. $C \Phi(\hat\theta) C^\top$. The F-statistic can be rewritten as a sum of $t^2$ statistics based on these eigenvalues. The corresponding random variables are independent (by design, because they are derived from the orthogonal eigenvectors) and essentially have one degree of freedom each. Hence, each of the $t$ statistics is treated as above in the one-dimensional contrast case, i.e. the denominator degrees of freedom are calculated for each of them. Finally, using properties of the F distribution's expectation, the denominator degrees of freedom for the whole F statistic are derived.

## Satterthwaite degrees of freedom for empirical covariance

In @bell2002bias the Satterthwaite degrees of freedom in combination with a sandwich covariance matrix estimator are described.

### One-dimensional contrast

For a one-dimensional contrast, following the same notation as in [Details of the model fitting in `mmrm`](algorithm.html) and [Details of the Kenward-Roger calculations](kenward.html), we have the following derivation. Consider a variance estimator of the form
\[
v = s c^\top(X^\top X)^{-1}\sum_{i}{X_i^\top A_i \epsilon_i \epsilon_i^\top A_i X_i} (X^\top X)^{-1} c,
\]
where $s$ takes the value $\frac{n}{n-1}$, $1$, or $\frac{n-1}{n}$, $A_i$ correspondingly takes the value $I_i$, $(I_i - H_{ii})^{-\frac{1}{2}}$, or $(I_i - H_{ii})^{-1}$, and $c$ is a column vector. Then $v$ can be decomposed into a weighted sum of independent $\chi_1^2$ distributed random variables, where the weights are the eigenvalues of the $n\times n$ matrix $G$ with elements
\[
G_{ij} = g_i^\top V g_j,
\]
where
\[
g_i = s^{\frac{1}{2}} (I - H)_i^\top A_i X_i (X^\top X)^{-1} c,
\]
\[
H = X(X^\top X)^{-1}X^\top,
\]
and $(I - H)_i$ corresponds to the rows of subject $i$.
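
To illustrate these quantities, the following small R sketch builds $H$, the $g_i$, and $G$ for a toy data set with $n = 3$ subjects and $m_i = 2$ observations each, using the choice $s = 1$, $A_i = I_i$, and taking $V$ as the identity matrix (a choice discussed below). The design matrix and contrast are made up for illustration; this is not the `mmrm` implementation.

```{r}
set.seed(123)

# Toy design: n = 3 subjects, m_i = 2 observations each, p = 2 covariates.
subject <- rep(1:3, each = 2)
X <- cbind(1, rnorm(6))
xtx_inv <- solve(t(X) %*% X)

# Hat matrix H = X (X'X)^{-1} X' and a contrast c for the second coefficient.
H <- X %*% xtx_inv %*% t(X)
I_minus_H <- diag(6) - H
contrast <- c(0, 1)

# g_i = s^{1/2} (I - H)_i' A_i X_i (X'X)^{-1} c, here with s = 1 and A_i = I_i.
g <- lapply(unique(subject), function(i) {
  rows_i <- which(subject == i)
  t(I_minus_H[rows_i, , drop = FALSE]) %*%
    X[rows_i, , drop = FALSE] %*% xtx_inv %*% contrast
})

# G_ij = g_i' g_j when V is the identity matrix.
g_mat <- do.call(cbind, g)
G <- t(g_mat) %*% g_mat
eigen(G)$values
```

The eigenvalues of this $G$ are the weights $\lambda_i$ entering the degrees of freedom formula below.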
The degrees of freedom can then be represented as
\[
\nu = \frac{(\sum_{i}{\lambda_i})^2}{\sum_{i}{\lambda_i^2}},
\]
where $\lambda_i, i = 1, \dotsc, n$ are the eigenvalues of $G$. @bell2002bias also suggests that $V$ can be chosen as the identity matrix, so that $G_{ij} = g_i^\top g_j$.

Following the [Weighted Least Squares Estimator](algorithm.html#weighted-least-squares-estimator), we can transform the original $X$ into $\tilde{X}$ to use the above equations.

To avoid repeated computation of the matrices $A_i$, $H$, etc. for different contrasts, we calculate and cache
\[
G^\ast_i = (I - H)_i^\top A_i X_i (X^\top X)^{-1},
\]
which is a $\sum_i{m_i} \times p$ matrix. For each new contrast we then only need to calculate
\[
g_i = G^\ast_i c
\]
to obtain a $\sum_i{m_i} \times 1$ vector, from which $G$ can be computed.

To obtain the degrees of freedom, and to avoid an eigen-decomposition of a large matrix, we can use the identity
\[
\nu = \frac{(\sum_{i}{\lambda_i})^2}{\sum_{i}{\lambda_i^2}} = \frac{tr(G)^2}{\sum_{i}{\sum_{j}{G_{ij}^2}}}.
\]
The scale parameter $s$ is not used throughout the package. The proof is as follows:

1.  Proof of
    \[
    tr(AB) = tr(BA)
    \]
    Let $A$ have dimension $p\times q$ and $B$ have dimension $q\times p$. Then
    \[
    tr(AB) = \sum_{i=1}^{p}{(AB)_{ii}} = \sum_{i=1}^{p}{\sum_{j=1}^{q}{A_{ij}B_{ji}}},
    \]
    \[
    tr(BA) = \sum_{i=1}^{q}{(BA)_{ii}} = \sum_{i=1}^{q}{\sum_{j=1}^{p}{B_{ij}A_{ji}}},
    \]
    so $tr(AB) = tr(BA)$.

2.  Proof of
    \[
    tr(G) = \sum_{i}{\lambda_i}
    \]
    and
    \[
    \sum_{i}{\lambda_i^2} = \sum_{i}{\sum_{j}{G_{ij}^2}}
    \]
    if $G = G^\top$.
    Using the eigen-decomposition, we have
    \[
    G = Q \Lambda Q^\top,
    \]
    where $\Lambda$ is a diagonal matrix and $Q$ is an orthogonal matrix. Using the previous result that $tr(AB) = tr(BA)$, we have
    \[
    tr(G) = tr(Q \Lambda Q^\top) = tr(\Lambda Q^\top Q) = tr(\Lambda) = \sum_{i}{\lambda_i},
    \]
    \[
    tr(G^\top G) = tr(Q \Lambda Q^\top Q \Lambda Q^\top) = tr(Q \Lambda^2 Q^\top) = tr(\Lambda^2 Q^\top Q) = tr(\Lambda^2) = \sum_{i}{\lambda_i^2},
    \]
    and $tr(G^\top G)$ can be further expressed as
    \[
    tr(G^\top G) = \sum_{i}{(G^\top G)_{ii}} = \sum_{i}{\sum_{j}{G^\top_{ij}G_{ji}}} = \sum_{i}{\sum_{j}{G_{ij}^2}}.
    \]

### Multi-dimensional contrast

For multi-dimensional contrasts we use the same technique as for multi-dimensional contrasts with the asymptotic covariance.

# References