The stats
implementation of rWishart
is in C
and is very
fast. It is often the case that we do not want a sample from the Wishart
distribution, but rather from the inverse of it or from the Cholesky
decomposition of a sample from the Wishart distribution. Or even from
the inverse of the Cholesky decomposition of a draw from the Wishart
distribution. Funnily enough (if you have a weird sense of humor), when
you inspect the source
code for the rWishart
distribution (R Core Team (2017)), it generates the Cholesky
decomposition and then multiplies it out. Meanwhile, drawing from the
rWishart
and then inverting or doing a Cholesky
decomposition or whatever in R is just slow – comparatively.
This suggests some obvious efficiencies: perhaps, if we would rather have the Cholesky decomposition of the Wishart random matrix, we could tell the function to stop right there.
library('CholWishart')
set.seed(20180220)
A <- stats::rWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
B <- rInvWishart(1,10,.2*diag(4))[,,1]
set.seed(20180220)
C <- rCholWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
D <- rInvCholWishart(1,10,.2*diag(4))[,,1]
Suppose Xi ∼ MVN(0, Σ) are independent p-variate normal random variables, i = 1, 2, …n with n > p − 1. Then S = ∑XiTXi, called the “scatter matrix”, is almost surely positive definite if Σ is positive definite. The random variable S is said to be distributed as a Wishart random variable: S ∼ Wp(n, Σ), see Gupta and Nagar (1999). This can be extended to the non-integer case as well.
How does rWishart(n, df, Sigma)
work (supposing
Sigma
is a p × p matrix)? First, it
generates a sample from the Cholesky decomposition of a Wishart
distribution with Σ = Ip.
How this is done: on the ith
element of the main diagonal, draw from $\sqrt{\chi_{p-i+1}^2}$. On the upper
triangle of the matrix, sample from an independent N(0, 1) for each entry in the
matrix. Then, this can be multiplied by the Cholesky decomposition of
the provided Sigma
to obtain the Cholesky factor of the
desired sample from the Wishart random variable (this construction is
due to Bartlett and is also known as the Bartlett Decomposition) (see
Anderson (1984)). The
rWishart
function multiplies this out. Therefore, if the
Cholesky decomposition is desired, one only needs to stop there.
If X ∼ Wp(ν, Σ), then we define the Inverse Wishart as X−1 = Y ∼ IWp(ν, Σ−1). There are other parameterizations of the distribution, mostly coming down to different ways of writing the ν parameter - be aware of this when using any package drawing from the Inverse Wishart distribution (see Dawid (1981) for an alternative; this presentation follows Gupta and Nagar (1999)). This comes up directly in Bayesian statistics. We are also interested in the Cholesky decomposition of this, as it is required in the generation of the matrix variate t-distribution. In this package it is done by taking the covariance matrix, inverting it, computing the Cholesky decomposition of the inverted covariance matrix, drawing the Cholesky factor of a Wishart matrix using that, and then inverting based on that (as finding Ψ−1 given the Cholesky factorization of Ψ is relatively fast). This can then be converted into the Cholesky factor of the Inverse Wishart if that is what is desired. This would be slow to do in R, but in C it is not so bad.
Here is what happens with the results of the above:
A %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 8.976532e-17 -1.256786e-16 -6.985020e-17
## [2,] -6.179054e-16 1.000000e+00 9.857855e-17 -9.551455e-17
## [3,] -2.371165e-16 -9.684781e-17 1.000000e+00 9.102991e-18
## [4,] 1.388748e-16 -9.861383e-17 -3.128188e-17 1.000000e+00
crossprod(C) %*% crossprod(D) # note: we do not expect C = D^-1, we expect this!
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 8.976532e-17 -1.256786e-16 -6.985020e-17
## [2,] -6.179054e-16 1.000000e+00 9.857855e-17 -9.551455e-17
## [3,] -2.371165e-16 -9.684781e-17 1.000000e+00 9.102991e-18
## [4,] 1.388748e-16 -9.861383e-17 -3.128188e-17 1.000000e+00
crossprod(D) %*% A
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -6.179054e-16 -2.371165e-16 1.388748e-16
## [2,] 8.976532e-17 1.000000e+00 -9.684781e-17 -9.861383e-17
## [3,] -1.256786e-16 9.857855e-17 1.000000e+00 -3.128188e-17
## [4,] -6.985020e-17 -9.551455e-17 9.102991e-18 1.000000e+00
crossprod(C) %*% B
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 8.976532e-17 -1.256786e-16 -6.985020e-17
## [2,] -6.179054e-16 1.000000e+00 9.857855e-17 -9.551455e-17
## [3,] -2.371165e-16 -9.684781e-17 1.000000e+00 9.102991e-18
## [4,] 1.388748e-16 -9.861383e-17 -3.128188e-17 1.000000e+00
There is some roundoff error.
Suppose, instead of the above definition of the Wishart, we have n ≤ p − 1. Then the scatter matrix defined above will not be positive definite. This is called the pseudo Wishart distribution. If we then take the Moore-Penrose pseudo-inverse of this, we have the generalized inverse Wishart distribution.
A <- rPseudoWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.9553698 0.3134085 -0.8487259 -0.5174803 0.7487007
## [2,] 0.3134085 2.3856728 -0.6106572 1.4471373 1.5976931
## [3,] -0.8487259 -0.6106572 0.5838199 -0.7509533 -0.7863404
## [4,] -0.5174803 1.4471373 -0.7509533 4.8520003 1.6696925
## [5,] 0.7487007 1.5976931 -0.7863404 1.6696925 1.4396817
qr(A)$rank
## [1] 3
B <- rGenInvWishart(n = 1, df = 3, Sigma = diag(5))[, , 1]
B
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.33422500 0.189387794 0.07086651 -0.052283855 -0.04241885
## [2,] 0.18938779 0.155647000 -0.02450637 0.001357582 -0.07549583
## [3,] 0.07086651 -0.024506366 0.10489215 -0.052325490 0.03515182
## [4,] -0.05228386 0.001357582 -0.05232549 0.028056042 -0.02793470
## [5,] -0.04241885 -0.075495830 0.03515182 -0.027934699 0.24217199
qr(B)$rank
## [1] 3
Note that the rank of both of these matrices is less than the dimension.
This package also has functions for density computations with the Wishart distribution. Densities are only defined for positive-definite input matrices and ν parameters larger than the dimension p.
The return value is on the log
scale but it can be
specified otherwise.
dWishart(diag(3), df = 5, 5*diag(3))
## [1] -19.45038
dInvWishart(diag(3), df = 5, .2*diag(3))
## [1] -19.45038
Note that, in general, these will not agree even if their covariance matrix parameters are inverses of each other. One of the reasons this works is that the determinant of X is 1.
The density functions can take 3-D array input indexed on the third dimension and will output a vector of densities.
The multivariate gamma (Γp) and digamma (ψp) functions are extensions of the univariate gamma (Γ) and digamma (ψ) functions (Mardia, Bibby, and Kent (1982)). They are useful in calculating the densities above. They come up in other distributions as well. The digamma is the first derivative of the gamma function. When the dimension p = 1, they coincide with the usual definitions of the digamma and gamma functions.
The multivariate gamma also comes in a logarithmic form
(lmvgamma
).