---
title: "Some Distributions Related to the Wishart"
author: "Geoffrey  Thompson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: wishart.bib
vignette: >
  %\VignetteIndexEntry{Some Distributions Related to the Wishart}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##"
)
```
The `stats` implementation of `rWishart` is in `C` and is very fast. It is often the case that we do not want a sample from the Wishart distribution, but rather from the inverse of it or from the Cholesky decomposition of a sample from the Wishart distribution. Or even from the inverse of the Cholesky decomposition of a draw from the Wishart distribution. Funnily enough (if you have a weird sense of humor), when you inspect the [source code](https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/rWishart.c) for the `rWishart` distribution (@rstatsteam), it generates the Cholesky decomposition and then multiplies it out. Meanwhile, drawing from the `rWishart` and then inverting or doing a Cholesky decomposition or whatever in R is just slow -- comparatively.

This suggests some obvious efficiencies: perhaps, if we would rather have the Cholesky decomposition of the Wishart random matrix, we could tell the function to stop right there.

```{r chol}
library('CholWishart')
set.seed(20180220)
A <- stats::rWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
B <- rInvWishart(1,10,.2*diag(4))[,,1]
set.seed(20180220)
C <- rCholWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
D <- rInvCholWishart(1,10,.2*diag(4))[,,1]
```

## How the Wishart distribution works

Suppose $X_i \sim MVN(0, \Sigma)$ are independent $p$-variate normal 
random variables, $i = 1, 2, \ldots n$ with $n > p-1$. 
Then $S = \sum X_i^T X_i$, called the "scatter matrix", is almost surely 
positive definite if $\Sigma$ is positive definite. 
The random variable $S$ is said to be distributed as a Wishart random variable:
$S \sim W_p(n, \Sigma)$, see @gupta1999matrix. This can be extended to the non-integer case
as well.

How does `rWishart(n, df, Sigma)` work (supposing `Sigma` is a $p \times p$ matrix)? First, it generates a sample from the Cholesky decomposition of a Wishart distribution with $\Sigma = \mathbf{I}_p$. How this is done: on the $i^{th}$ element of the main diagonal, draw from $\sqrt{\chi_{p-i+1}^2}$. On the upper triangle of the matrix, sample from an independent $N(0,1)$ for each entry in the matrix. Then, this can be multiplied by the Cholesky decomposition of the provided `Sigma` to obtain the Cholesky factor of the desired sample from the Wishart random variable (this construction is due to Bartlett and is also known as the Bartlett Decomposition) (see @Anderson84a). The `rWishart` function multiplies this out. Therefore, if the Cholesky decomposition is desired, one only needs to stop there.

## The Inverse Wishart distribution

If $X \sim \textrm{W}_p(\nu,\Sigma)$, then we define the Inverse Wishart as $X^{-1}  = Y \sim \textrm{IW}_p(\nu , \Sigma^{-1})$. There are other parameterizations of the distribution, mostly coming down to different ways of writing the $\nu$ parameter - be aware of this when using any package drawing from the Inverse Wishart distribution (see @dawid for an alternative; this presentation follows @gupta1999matrix). This comes up directly in Bayesian statistics. We are also interested in the Cholesky decomposition of this, as it is required in the generation of the matrix variate $t$-distribution. In this package it is done by taking the covariance matrix, inverting it, computing the Cholesky decomposition of the inverted covariance matrix, drawing the Cholesky factor of a Wishart matrix using that, and then inverting based on that (as finding $\Psi^{-1}$ given the Cholesky factorization of $\Psi$ is relatively fast). This can then be converted into the Cholesky factor of the Inverse Wishart if that is what is desired. This would be slow to do in R, but in C it is not so bad. 

Here is what happens with the results of the above:

```{r results}
A %*% B 
crossprod(C) %*% crossprod(D) # note: we do not expect C = D^-1, we expect this! 
crossprod(D) %*% A
crossprod(C) %*% B
```

There is some roundoff error.

## The Pseudo-Wishart and Generalized Inverse Wishart

Suppose, instead of the above definition of the Wishart, we have $n \leq p-1$. 
Then the scatter matrix defined above will not be positive definite. This is called
the pseudo Wishart distribution. If we then take the Moore-Penrose pseudo-inverse
of this, we have the generalized inverse Wishart distribution. 

```{r pseudo}
A <- rPseudoWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
A
qr(A)$rank
B <- rGenInvWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
B
qr(B)$rank
```

Note that the rank of both of these matrices is less than the dimension.


## Density computation

This package also has functions for density computations with the Wishart distribution. 
Densities are only defined for positive-definite input matrices and $\nu$ parameters 
larger than the dimension $p$.

The return value is on the `log` scale but it can be specified otherwise.

```{r density}
dWishart(diag(3), df = 5, 5*diag(3))

dInvWishart(diag(3), df = 5, .2*diag(3))

```

Note that, in general, these will not agree even if their covariance matrix parameters are inverses of each other. One of the reasons this works is that the determinant of $\mathbf{X}$ is $1$.

The density functions can take 3-D array input indexed on the third dimension and will output a vector of densities.

```{r threedensities}
set.seed(20180311)
A <- rWishart(n = 3, df = 3, Sigma = diag(3))
dWishart(A, df = 3, Sigma = diag(3))

```

## Other functions

The multivariate gamma ($\Gamma_p$) and digamma ($\psi_p$) functions are extensions of the univariate gamma ($\Gamma$) and digamma ($\psi$) functions (@mardia1982multivariate). They are useful in calculating the densities above. They come up in other distributions as well. The digamma is the first derivative of the gamma function. When the dimension $p = 1$, they coincide with the usual definitions of the digamma and gamma functions.

The multivariate gamma also comes in a logarithmic form (`lmvgamma`).

```{r lmvgamma}
lmvgamma(1:4,1) # note how they agree when p = 1
lgamma(1:4)
```

# References