The tsmarch package represents a rewrite and rethink of the models in rmgarch. It is written using simpler S3 methods and classes, has a cleaner code base with extensive documentation and unit tests, provides speed gains by making use of parallelization both in R (via the future package) and in the C++ code (via the RcppParallel package), and works with the new univariate GARCH package tsgarch.
To simplify usage, and similar to the approach adopted in tsgarch, conditional mean dynamics are not included. Since the distributions are location/scale invariant, the series must be pre-filtered for conditional mean dynamics before the data are submitted. However, there is an option to pass the conditional mean (cond_mean) as an argument to the specification, filtering, prediction and simulation methods, so that it is incorporated into the output of the various methods at each step of the way. Alternatively, the user can location-shift (re-center) the simulated predictive distribution matrix by the pre-calculated conditional mean forecast, or pass the zero-mean correlated innovations of the predictive distribution as an input to the conditional mean prediction or simulation dynamics.
The nature of the models implemented allows for separable dynamics in estimation, meaning that individual series have univariate dynamics which can be estimated in parallel. The drawback is that standard errors then require a partitioned Hessian, which may be very time consuming to calculate. The package therefore defaults to the fast OPG (outer product of gradients) estimator for standard errors, but there is always the option to calculate the full partitioned Hessian for other types of standard error estimators.
The next sections provide a detailed overview of each of the 3 models available, namely the Dynamic Conditional Correlation GARCH model of R. F. Engle (2002) (DCC), the Copula-DCC model (see Patton (2009) and Ausin and Lopes (2010)) (CGARCH) and the Generalized Orthogonal GARCH model with multivariate affine Generalized Hyperbolic distribution (see Chen, Härdle, and Spokoiny (2010) and Broda and Paolella (2009)) (GOGARCH).
Code snippets are provided in each section to highlight the implementation, whilst a later section provides diagrams for the methods in the package.
Covariance Decomposition Models (dcc and cgarch)

The covariance decomposition type models decompose the dynamics of the conditional covariance into conditional standard deviations and correlations, so that they may be expressed in such a way that the univariate and multivariate dynamics can be separated, thus simplifying the estimation process. This decomposition allows a great deal of flexibility, by defining individual univariate dynamics for each marginal series and separate dynamics for the conditional correlation, using a 2-stage estimation process.
In the Constant Conditional Correlation (CCC) model of Bollerslev (1990) the conditional covariance can be represented as:

$$H_t = D_t R D_t,$$

where $D_t = \operatorname{diag}\left(\sqrt{h_{11,t}},\ldots,\sqrt{h_{nn,t}}\right)$ and $R$ is the positive definite constant correlation matrix. The conditional variances, $h_{ii,t}$, which can be estimated separately, usually admit some type of GARCH(p,q) model, but can be generalized to any admissible model for the variance dynamics. The conditions for positivity of the covariance matrix $H_t$ are that $R$ is positive definite and that the diagonal elements of $D_t$, $\sqrt{h_{ii,t}}$, are positive. Formally, the model may be represented as follows:

where $y_t$ is a vector representing the series values at time $t$, $\mu_t$ a vector of the conditional mean at time $t$ for each series, $\varepsilon_t$ a vector of zero mean residuals which have a multivariate Normal distribution with conditional covariance $\Sigma_t$, and $LL_t(\theta)$ is the log-likelihood at time $t$ given a parameter vector $\theta$ of univariate GARCH (or any other model) parameters. The constant correlation matrix $R$ is usually approximated by its sample counterpart:

$$\bar R = \frac{1}{T}\sum_{t=1}^{T} z_t z_t',$$

where $z_t = D_t^{-1}\varepsilon_t$ is the vector of GARCH standardized residuals at time $t$.
Code Snippet
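To make the decomposition concrete, the following is a minimal base-R sketch (not the package's internal code) of the CCC construction, assuming hypothetical inputs `eps` (a T x n matrix of zero-mean residuals) and `h` (a T x n matrix of univariate GARCH conditional variances):

```r
# CCC sketch: standardize the residuals, estimate the constant correlation by
# its sample counterpart, and rebuild H_t = D_t R D_t at each time point.
ccc_covariance <- function(eps, h) {
  z <- eps / sqrt(h)                  # standardized residuals z_t = D_t^{-1} eps_t
  R <- crossprod(z) / nrow(z)         # constant correlation (sample counterpart)
  H <- lapply(seq_len(nrow(eps)), function(t) {
    D <- diag(sqrt(h[t, ]))
    D %*% R %*% D                     # H_t = D_t R D_t
  })
  list(R = R, H = H)
}
```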
Allowing the correlation to be time varying, whilst retaining the 2-stage separation of dynamics, creates an additional challenge in guaranteeing positive definiteness of the conditional correlation at each point in time. R. F. Engle (2002) proposed the following proxy process:

$$Q_t = \left(1 - \sum_{j=1}^{q} a_j - \sum_{j=1}^{p} b_j\right)\bar Q + \sum_{j=1}^{q} a_j z_{t-j} z_{t-j}' + \sum_{j=1}^{p} b_j Q_{t-j},$$

where $a_j$ and $b_j$ are non-negative coefficients with the constraint that they sum to less than 1 for stationarity and positive definiteness, and $\bar Q$ is the unconditional matrix of the standardized residuals $z_t$. To recover $R_t$, the following re-scaling operation is performed:

$$R_t = \operatorname{diag}\left(Q_t\right)^{-1/2} Q_t \operatorname{diag}\left(Q_t\right)^{-1/2}.$$

Under the assumption that $\varepsilon_t \sim N(0, \Sigma_t)$, the log-likelihood is:

where $LL_{V,t}(\phi)$ is the log-likelihood of the conditional variances given the GARCH parameters $\phi$, and $LL_{R,t}(\psi|\phi)$ is the log-likelihood of the conditional correlation given the DCC parameters $\psi$, conditioned on the GARCH parameters $\phi$.
Code Snippet
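The recursion and re-scaling above can be written in a few lines of base R. This is an expository sketch of the DCC(1,1) filter (the package implements it in C++), taking a T x n matrix `z` of standardized residuals and scalar parameters `a`, `b` with a + b < 1:

```r
# DCC(1,1) proxy recursion followed by re-scaling to a correlation matrix.
dcc_filter <- function(z, a, b) {
  n <- ncol(z)
  Q_bar <- crossprod(z) / nrow(z)             # unconditional matrix of z_t
  Q <- Q_bar                                  # initialize the recursion at Q_bar
  R <- vector("list", nrow(z))
  for (t in seq_len(nrow(z))) {
    if (t > 1) Q <- (1 - a - b) * Q_bar + a * tcrossprod(z[t - 1, ]) + b * Q
    d <- 1 / sqrt(diag(Q))
    R[[t]] <- d * Q * rep(d, each = n)        # diag(Q)^(-1/2) Q diag(Q)^(-1/2)
  }
  R
}
```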
Estimation of the CCC and DCC models is performed in 2 steps. The first step estimates the conditional variances of each series using a univariate GARCH model. The univariate models do not need to conform to any single type of dynamics, but can be any admissible model. The conditional variances are then used to standardize the residuals, which are in turn used to estimate the conditional correlation matrix. Since the first step is separable, it can be performed in parallel across series. After considerable experimentation, we chose to avoid TMB for automatic differentiation in the second stage and to use numerical derivatives instead, both for reasons of speed and because little gain in accuracy was observed.
Code Snippet
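A hedged sketch of this two-step workflow is shown below. The function names (`garch_modelspec()`, `to_multi_estimate()`, `dcc_modelspec()`, `estimate()`) follow the documented tsgarch/tsmarch interfaces, but exact argument names may differ across versions, so treat this as indicative usage rather than canonical code; `y` is assumed to be an xts matrix of pre-filtered returns:

```r
library(future)
library(tsgarch)
library(tsmarch)

plan(multisession)   # stage 1 univariate fits can run in parallel via future

# Stage 1: univariate GARCH model per series.
univariate <- lapply(colnames(y), function(series) {
  spec <- garch_modelspec(y[, series], model = "garch")
  estimate(spec)
})
names(univariate) <- colnames(y)
univariate <- to_multi_estimate(univariate)

# Stage 2: DCC dynamics estimated on the stage 1 standardized residuals.
dcc_mod <- estimate(dcc_modelspec(univariate, dynamics = "dcc"))
```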
Since estimation is performed in 2 stages, the log-likelihood is the sum of these 2 stages. More specifically, the second stage log-likelihood is estimated conditional on the parameters of the first stage. Details of the assumptions behind this, and the arguments for the asymptotic distribution of the estimates, are put forth in R. F. Engle and Sheppard (2001), who posit that the asymptotic distribution of the 2-stage DCC model estimates with parameters $\theta = (\phi, \psi)$ is:

$$\sqrt{T}\left(\hat\theta - \theta_0\right) \overset{d}{\to} N\left(0,\ B_0^{-1} M_0 B_0^{-1}\right),$$

where $B_0$ and $M_0$ are the Hessian and the covariance matrix of the score vector respectively, also called the bread and meat of the sandwich estimator. In the case of this 2-stage estimation procedure, the Hessian is partitioned into blocks $B_{11}$, $B_{12}$, $B_{22}$, and likewise $M$ into $M_{11}$, $M_{12}$, $M_{22}$, where $B_{11}$ and $M_{11}$ are the Hessian and covariance matrix of the first stage log-likelihood, $B_{22}$ and $M_{22}$ are the Hessian and covariance matrix of the second stage log-likelihood, and $B_{12}$ and $M_{12}$ involve the cross derivatives of the first and second stage log-likelihoods.
Because the Hessian is partitioned, the calculation of the standard errors is more computationally intensive than the standard OPG (outer product of gradients) method. The package defaults to the OPG method, but the user can choose to calculate the full partitioned Hessian, for which parallel functionality has been implemented to provide some speed gains. The package uses numerical derivatives for the second stage estimation, as well as for the calculation of the standard errors. When the partitioned Hessian is present, a number of additional estimators become available, including those of the QML and HAC (using the bandwidth of Newey and West (1994)) type.
Code Snippet
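The difference between the two estimators reduces to a small amount of linear algebra once the scores and (partitioned) Hessian are available. A minimal base-R sketch, assuming `scores` is a T x k matrix of per-observation score contributions and `B` the k x k Hessian assembled elsewhere from the $B_{11}$, $B_{12}$, $B_{22}$ blocks:

```r
# OPG standard errors use only the scores; the sandwich (QML) estimator
# combines the inverse Hessian (bread) with the score covariance (meat).
se_opg <- function(scores) {
  sqrt(diag(solve(crossprod(scores))))
}
se_sandwich <- function(scores, B) {
  M    <- crossprod(scores)           # meat: covariance of the score vector
  Binv <- solve(B)                    # bread: inverse partitioned Hessian
  sqrt(diag(Binv %*% M %*% t(Binv)))
}
```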
The original DCC model was based on the assumption that the residuals are Normally distributed, as shown in the equations above. However, this is not always the case in practice. There are 2 possible approaches to this problem: the first is to use a different multivariate distribution for the DCC model, and the second is to use a copula approach.
Multivariate Student Distribution (mvt)

Introducing a multivariate distribution which requires estimation of higher moment parameters implies that the univariate margins must also have the same distribution, and usually the same higher moment parameters. This in turn implies the need for joint estimation of all dynamics in the model, with specific constraints on the joint parameters. This is computationally intensive and may not be feasible for anything but a few series. However, it is possible to estimate the first stage by QML (quasi-maximum likelihood), essentially using a Normal distribution for the residuals, and then estimate the higher moment parameters in the second stage, conditional on the first stage estimates. This is the approach taken in the tsmarch package when using the multivariate Student distribution, with log-likelihood equal to:
where $\nu$ is the degrees of freedom parameter of the Student distribution, and $n$ is the number of series. The post-estimation assumption is that the marginal distribution of each series is Student with the same degrees of freedom parameter $\nu$.
Code Snippet
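For reference, the second stage multivariate Student term can be written directly in base R. This expository version (the package computes the likelihood in C++ and applies its own standardization conventions) takes a T x n matrix `z` of standardized residuals, a list `Rlist` of conditional correlation matrices and the degrees of freedom `nu`:

```r
# Multivariate Student log-likelihood evaluated at the standardized residuals.
mvt_loglik <- function(z, Rlist, nu) {
  n <- ncol(z)
  sum(vapply(seq_len(nrow(z)), function(t) {
    R    <- Rlist[[t]]
    quad <- drop(t(z[t, ]) %*% solve(R) %*% z[t, ])
    lgamma((nu + n) / 2) - lgamma(nu / 2) - (n / 2) * log(nu * pi) -
      0.5 * as.numeric(determinant(R, logarithm = TRUE)$modulus) -
      ((nu + n) / 2) * log1p(quad / nu)
  }, numeric(1)))
}
```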
Copula Distribution (cgarch)

The copula approach is more general, allowing for the use of different marginal distributions for each series. In simple terms, the copula is a function which links the marginal distributions to the joint distribution, by applying a transformation to the marginal residuals to make them uniformly distributed. The joint distribution is then obtained by applying the inverse distribution of the copula to the uniform margins.
Specifically, by Sklar's theorem:

$$F\left(\varepsilon_{1,t},\ldots,\varepsilon_{n,t}\right) = C\left(F_1\left(\varepsilon_{1,t}\right),\ldots,F_n\left(\varepsilon_{n,t}\right)\right),$$

where $F_i$ is the marginal distribution function of series $i$, $F$ is the multivariate distribution function, $C$ is the copula function, and $u_{i,t} = F_i(\varepsilon_{i,t})$ are the uniform margins collected in the vector $u_t$. The transformation of the margins is called the Probability Integral Transform (PIT). There are a number of ways to transform the margins to uniform distributions, including a parametric transform (also called Inference-Functions-For-Margins by Joe (1997)), the empirical CDF (also called pseudo-likelihood, investigated by Genest, Ghoudi, and Rivest (1995)) and the semi-parametric approach (see Davison and Smith (1990)). All three approaches are implemented in the tsmarch package. The package implements both a Normal and a Student copula, with either constant or dynamic correlation. For the Student copula, the log-likelihood is given by:
where $u_{i,t} = F_{i,t}(\varepsilon_{i,t}|\phi_i)$ is the PIT transformation of each series by its conditional distribution, estimated in the first stage GARCH process with any choice of dynamics and distribution; $F_i^{-1}(u_{i,t}|\eta)$ represents the quantile transformation of the uniform margins subject to the common shape parameter of the multivariate Student copula; and $f_t\left(F_1^{-1}(u_{1,t}|\eta),\ldots,F_n^{-1}(u_{n,t}|\eta)|R_t,\eta\right)$ is the joint density of the Student copula with shape parameter $\eta$ and correlation matrix $R_t$.
The log-likelihood is therefore composed of 3 parts: the first stage univariate models, the univariate PIT transformation and its inversion given the copula distribution, and the second stage multivariate model. The correlation can be modeled as either constant or dynamic in the copula model. The dynamic correlation is modeled in the same way as in the DCC model, but the residuals are transformed to uniform margins before estimating the correlation. The appendix provides a more detailed overview of the constant correlation case. Essentially, the difference between the standard DCC type model and the one with a copula distribution is the PIT transformation step, which is reflected in the inference (standard errors), forecasting and simulation methods for this model.
Code Snippet
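The pieces fit together in a few lines of base R. The sketch below evaluates the Student copula log-density at one time point as the joint Student density over the product of its margins, with the PIT handled here by the empirical CDF variant for brevity (the package also offers the parametric and semi-parametric transforms):

```r
# Student copula log-density at one observation: joint density of the quantile
# transformed margins minus the sum of the marginal log-densities.
student_copula_loglik <- function(u, R, eta) {
  q     <- qt(u, df = eta)                 # quantile transform of uniform margins
  n     <- length(q)
  quad  <- drop(t(q) %*% solve(R) %*% q)
  joint <- lgamma((eta + n) / 2) - lgamma(eta / 2) - (n / 2) * log(eta * pi) -
    0.5 * as.numeric(determinant(R, logarithm = TRUE)$modulus) -
    ((eta + n) / 2) * log1p(quad / eta)
  joint - sum(dt(q, df = eta, log = TRUE))
}

# PIT via the empirical CDF (pseudo-likelihood approach) for one series of
# standardized residuals z1:
# u1 <- rank(z1) / (length(z1) + 1)
```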
A number of extensions to the DCC model have been proposed in the literature, including the use of asymmetric and generalized dynamics. Cappiello, Engle, and Sheppard (2006) generalize the DCC model with the introduction of the Asymmetric Generalized DCC (AGDCC), where the dynamics of $Q_t$ become:

$$Q_t = \left(\bar Q - A'\bar Q A - B'\bar Q B - G'\bar N G\right) + A' z_{t-1} z_{t-1}' A + B' Q_{t-1} B + G' \bar z_{t-1} \bar z_{t-1}' G,$$

where $A$, $B$ and $G$ are the $N \times N$ parameter matrices, $\bar z_t$ are the zero-threshold standardized residuals (equal to $z_t$ when less than zero, and zero otherwise), and $\bar Q$ and $\bar N$ are the unconditional matrices of $z_t$ and $\bar z_t$ respectively. The generalization allows for cross dynamics between the standardized residuals, and for cross-asymmetric dynamics. However, this comes at the cost of increased computational complexity, making it less feasible for large scale systems. Additionally, covariance targeting in such a high dimensional model, where the parameters are no longer scalars, creates difficulties in imposing positive definiteness constraints. The tsmarch package does not implement this generalized model, but instead a scalar version to capture asymmetric dynamics (adcc):

$$Q_t = \left(1 - a - b\right)\bar Q - g \bar N + a\, z_{t-1} z_{t-1}' + b\, Q_{t-1} + g\, \bar z_{t-1} \bar z_{t-1}'.$$
Code Snippet
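The adcc recursion differs from the plain DCC filter only in the threshold term. Extending the earlier expository sketch, with scalar parameters `a`, `b`, `g`:

```r
# Scalar adcc recursion: adds the asymmetry term driven by the zero-threshold
# standardized residuals zbar.
adcc_filter <- function(z, a, b, g) {
  n     <- ncol(z)
  zbar  <- z * (z < 0)                 # equal to z when negative, else zero
  Q_bar <- crossprod(z) / nrow(z)
  N_bar <- crossprod(zbar) / nrow(z)
  Q <- Q_bar
  R <- vector("list", nrow(z))
  for (t in seq_len(nrow(z))) {
    if (t > 1) {
      Q <- (1 - a - b) * Q_bar - g * N_bar +
        a * tcrossprod(z[t - 1, ]) + b * Q + g * tcrossprod(zbar[t - 1, ])
    }
    d <- 1 / sqrt(diag(Q))
    R[[t]] <- d * Q * rep(d, each = n)
  }
  R
}
```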
Because of the non-linearity of the DCC evolution process, the multi-step ahead forecast of the correlation cannot be solved directly. R. F. Engle and Sheppard (2001) provide a couple of suggestions which form a reasonable approximation. In the tsmarch package we instead use a simulation approach, similar to the other packages in the tsmodels universe, and in doing so generate a simulated distribution of the h-step ahead correlation and covariance. This also generalizes to the copula model, which does not have a closed form solution. Formally, for a draw $s$ at time $t$, the simulated correlation is given by:

$$Q_t^s = \left(1 - a - b\right)\bar Q + a\, z_{t-1}^s \left(z_{t-1}^s\right)' + b\, Q_{t-1}^s, \qquad R_t^s = \operatorname{diag}\left(Q_t^s\right)^{-1/2} Q_t^s \operatorname{diag}\left(Q_t^s\right)^{-1/2},$$

where the correlated standardized innovations are generated as

$$z_t^s = E \Lambda^{1/2} \epsilon_t^s,$$

with $\epsilon_t^s$ a vector of standard Normal random variables, $\Lambda$ the diagonal matrix of the eigenvalues of $R_t^s$ and $E$ the matrix of its eigenvectors, such that $R_t^s = E \Lambda E'$. For the multivariate Student distribution, the draws for $z_t^s$ are generated as:

$$z_t^s = \sqrt{\frac{\nu}{W^s}}\, E \Lambda^{1/2} \epsilon_t^s,$$

where $W^s$ is a random draw from a chi-squared distribution with $\nu$ degrees of freedom. For $\epsilon_t^s$, it is also possible to instead use the empirical standardized and de-correlated residuals, an option which is offered in the package.
Code Snippet
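A single draw of the correlated innovation vector can be sketched in base R as follows, with the chi-squared mixing applied for the Student case:

```r
# One simulation draw of correlated innovations from a correlation matrix R,
# via its eigen decomposition; if `nu` is supplied, the Normal draw is mixed
# with a chi-squared variate to produce a multivariate Student draw.
correlated_draw <- function(R, nu = NULL) {
  ev  <- eigen(R, symmetric = TRUE)
  eps <- rnorm(ncol(R))                                    # standard Normal vector
  z   <- ev$vectors %*% (sqrt(pmax(ev$values, 0)) * eps)   # E Lambda^(1/2) eps
  if (!is.null(nu)) {
    W <- rchisq(1, df = nu)                                # chi-squared mixing variable
    z <- z * sqrt(nu / W)
  }
  drop(z)
}
```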
Since all predictions are generated through simulation, the weighted predictive distribution $\bar y_t^s$ can be obtained as:

$$\bar y_t^s = \sum_{i=1}^{n} w_{i,t}\, y_{i,t}^s,$$

where $s$ indexes the sample draws from the simulated predictive distribution of the $n$ series, and $w_{i,t}$ is the weight on series $i$ at time $t$.
Code Snippet
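Given a simulated prediction array, the weighting is a one-liner. The sketch assumes a hypothetical draws x horizon x series array `sims` and a static weight vector `w`:

```r
# Collapse a simulated prediction array into the weighted predictive
# distribution: one value per (draw, horizon) pair.
weighted_distribution <- function(sims, w) {
  apply(sims, c(1, 2), function(y) sum(w * y))   # draws x horizon matrix
}
```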
Robert F. Engle and Ng (1993) introduced the news impact curve as a tool for evaluating the effect of news on the conditional variances; it was further extended by Kroner and Ng (1998) to the multivariate GARCH case. The news impact surface shows the effect of shocks from a pair of assets on their conditional covariance (or correlation). Formally, the correlation $R_{i,j}$ given a pair of shocks $(z_i, z_j)$ is given by:

where $\bar z$, $\bar Q$ and $\bar N$ are defined as in the adcc equation above.
Code Snippet
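A sketch of how such a surface can be traced out is shown below, evaluating the Q recursion over a grid of shock pairs while (as an illustrative assumption made here) holding the lagged quantities at their unconditional levels:

```r
# DCC news impact correlation surface for the pair (i, j).
dcc_news_impact <- function(Q_bar, a, b, i, j,
                            grid = seq(-3, 3, length.out = 50)) {
  surface <- outer(grid, grid, Vectorize(function(zi, zj) {
    z <- numeric(nrow(Q_bar)); z[i] <- zi; z[j] <- zj
    Q <- (1 - a - b) * Q_bar + a * tcrossprod(z) + b * Q_bar
    Q[i, j] / sqrt(Q[i, i] * Q[j, j])            # re-scale to a correlation
  }))
  list(x = grid, y = grid, z = surface)
}
```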
The DCC model has come under a considerable amount of criticism in the literature, particularly by Caporin and McAleer (2008), who argue that consistency and asymptotic Normality of the estimators have not been fully established, and who propose instead the indirect DCC model (essentially a scalar BEKK), which models the conditional covariance matrix directly and forgoes univariate GARCH dynamics. Aielli (2013) also points out that the estimation of $Q_t$ as the empirical counterpart of the correlation matrix of $z_t$ is inconsistent, since $E[z_t z_t'] = E[R_t] \neq E[Q_t]$. He proposes a corrected DCC (cDCC) model which includes a corrective step eliminating this inconsistency, albeit at the cost of targeting, which is no longer allowed.
Generalized Orthogonal GARCH (gogarch)

The GOGARCH model in the tsmarch package is based on the
package is based on the
approach described in Chen, Härdle, and Spokoiny
(2010), Broda
and Paolella (2009) and Ghalanos, Rossi, and Urga (2015). Unlike the covariance
decomposition models, the Generalized Orthogonal GARCH model is akin to
a full distributional decomposition model rather than just the
covariance, since it makes use of higher moments to achieve independent
margins from which univariate models can then be used to estimate the
conditional dynamics (see Schmidt, Hrycej, and
Stützle (2006) for details on the
multivariate affine Generalized Hyperbolic distribution).
Formally, the stationary return series $y_t$ follows a linear dynamic model:

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = A f_t,$$

where $A$ is an invertible and constant mixing matrix, which transforms the independent components $f_t$ back into the residuals $\varepsilon_t$. It can be further decomposed into de-whitening and rotation matrices, $\Sigma^{1/2}$ and $U$ respectively:

$$A = \Sigma^{1/2} U,$$

where $\Sigma$ is the unconditional covariance matrix of the residuals, whose inverse square root performs the role of de-correlating (whitening) the residuals prior to rotation (independence). It is at this juncture that dimensionality reduction may also be performed, by selecting the first $m\ (\leq n)$ principal components of the residuals. In that case $\varepsilon_t \approx \hat\varepsilon_t = A f_t$, since the dimensionality reduced system only approximates the original residuals.
The rotation matrix $U$ is calculated through the use of Independent Component Analysis (ICA), a method for separating multivariate mixed signals into additive, statistically independent and non-Gaussian components (see Hyvärinen and Oja (2000)). The algorithm used for ICA in the tsmarch package is based on the RADICAL algorithm of Learned-Miller and Fisher (2003), which uses an efficient entropy estimator (due to Vasicek (1976)) that is robust to outliers and requires no strong characterization assumptions about the data generating process.
The independent components $f_t$ can be further decomposed as:

$$f_t = H_t^{1/2} z_t,$$

where $H_t$ is the conditional covariance matrix of the independent components, with $H_t = E[f_t f_t'|\mathfrak{F}_{t-1}]$ being a diagonal matrix with elements $(h_{1,t},\ldots,h_{n,t})$ which are the conditional variances of the factors, and $z_t = (z_{1,t},\ldots,z_{n,t})'$. The random variable $z_{i,t}$ is independent of $z_{j,t-s}$ $\forall j \neq i$ and $\forall s$, with $E[z_{i,t}|\mathfrak{F}_{t-1}] = 0$ and $E[z_{i,t}^2] = 1$, implying that $E[f_t|\mathfrak{F}_{t-1}] = 0$ and $E[\varepsilon_t|\mathfrak{F}_{t-1}] = 0$. The factor conditional variances, $h_{i,t}$, can be modeled as a GARCH-type process. The unconditional distribution of the factors is characterized by:

$$E[f_t] = 0, \qquad E[f_t f_t'] = I_n,$$

which, in turn, implies that:

$$E[\varepsilon_t] = 0, \qquad E[\varepsilon_t \varepsilon_t'] = AA'.$$

It then follows that the return series can be expressed as:

$$y_t = \mu_t + A H_t^{1/2} z_t.$$

The conditional covariance matrix, $\Sigma_t \equiv E[(y_t - \mu_t)(y_t - \mu_t)'|\mathfrak{F}_{t-1}]$, is given by $\Sigma_t = A H_t A'$.
The Orthogonal Factor model of Alexander (2002), which uses only the information in the covariance matrix, leads to uncorrelated components, but these are not necessarily independent unless a multivariate Normal distribution is assumed. However, while whitening is not sufficient for independence, it is nevertheless an important pre-processing step in the search for independent factors, since exhausting the second order information contained in the covariance matrix makes it easier to infer higher order information, reducing the problem to one of rotation (orthogonalization).
Code Snippet
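The whitening step is plain linear algebra and can be sketched exactly in base R; the RADICAL rotation itself is implemented inside the package and is represented here only by a placeholder `U`:

```r
# PCA whitening of the residuals, with optional dimensionality reduction to
# the first m principal components.
whiten <- function(eps, m = ncol(eps)) {
  S  <- cov(eps)                                   # unconditional covariance
  ev <- eigen(S, symmetric = TRUE)
  K  <- ev$vectors[, 1:m, drop = FALSE] %*% diag(1 / sqrt(ev$values[1:m]), m)
  list(white = eps %*% K, K = K)                   # whitened data and matrix
}
# w <- whiten(eps, m = 4)   # optional reduction to m factors
# f <- w$white %*% U        # U: orthogonal rotation found by ICA (RADICAL)
```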
The estimation of the GOGARCH model is performed in 3 steps: whitening of the (pre-filtered) residuals, rotation to independent components via ICA, and estimation of the univariate dynamics of each component. For the last step, in the tsmarch package we use the tsgarch package for estimating the GARCH model of each factor, with either a Generalized Hyperbolic (gh) or Normal Inverse Gaussian (nig) distribution. The log-likelihood of the GOGARCH model is composed of the individual log-likelihoods of the univariate independent factor GARCH models and a term for the mixing matrix:
where $GH_{\lambda_i}$ is the Generalized Hyperbolic distribution with shape parameter $\lambda_i$ for the i-th factor. Since the inverse of the mixing matrix is the un-mixing matrix, $A^{-1} = W = K'U$, and since the determinant of $U$ is 1 ($|U| = 1$), the log-likelihood simplifies to:
In the presence of dimensionality reduction, $K$ is no longer square and we cannot directly compute its determinant. Instead, we use the determinant of $KK'$ to reflect the combined effect of whitening and dimensionality reduction:

where $m$ is the number of factors. A key advantage of this approach is the ability to estimate multivariate dynamics using only univariate methods, providing for feasible estimation of large scale systems. The speed advantage does not come for free, however, as there is a corresponding need for memory to handle the parallelization offered, particularly when generating the higher co-moment matrices.
Code Snippet
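Indicative usage of the GOGARCH interface is sketched below; the function and argument names follow the package documentation but may differ across versions, and `y` is again assumed to be a pre-filtered xts matrix:

```r
library(tsmarch)

# Specify and estimate a GOGARCH model with NIG factor distributions and
# (optional) dimensionality reduction to 4 independent components.
spec <- gogarch_modelspec(y, distribution = "nig", components = 4)
mod  <- estimate(spec)

# Conditional covariance built from Sigma_t = A H_t A'.
H <- tscov(mod)
```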
The linear affine representation of the GOGARCH model provides closed-form expressions for the conditional co-moments. The first 3 co-moment matrices are given by:

$$M_t^2 = A M_{f,t}^2 A', \qquad M_t^3 = A M_{f,t}^3 \left(A' \otimes A'\right), \qquad M_t^4 = A M_{f,t}^4 \left(A' \otimes A' \otimes A'\right),$$

where $M_{f,t}^2$, $M_{f,t}^3$ and $M_{f,t}^4$ are the $(N \times N)$ conditional covariance matrix, the $(N \times N^2)$ conditional third co-moment matrix and the $(N \times N^3)$ conditional fourth co-moment matrix of the factors, respectively, represented as unfolded tensors. These are calculated from the univariate factor dynamics, and defined as:

$$M_{f,t}^3 = \left[M_{1,f,t}^3,\, M_{2,f,t}^3,\, \ldots,\, M_{n,f,t}^3\right], \qquad M_{f,t}^4 = \left[M_{1,1,f,t}^4,\, M_{1,2,f,t}^4,\, \ldots,\, M_{n,n,f,t}^4\right],$$

where $M_{k,f,t}^3$, $k = 1,\ldots,n$ and $M_{k,l,f,t}^4$, $k,l = 1,\ldots,n$ are the $N \times N$ sub-matrices of $M_{f,t}^3$ and $M_{f,t}^4$, respectively. Since the factors $f_{i,t}$ can be decomposed as $z_{i,t}\sqrt{h_{i,t}}$, and given the assumptions on $z_{i,t}$, it follows that $E[f_{i,t} f_{j,t} f_{k,t}|\mathfrak{F}_{t-1}] = 0$ for $i \neq j \neq k$. It is also true that $E[f_{i,t} f_{j,t} f_{k,t} f_{l,t}|\mathfrak{F}_{t-1}] = 0$ for $i \neq j \neq k \neq l$, while for $i = j$ and $k = l$, $E[f_{i,t}^2 f_{k,t}^2|\mathfrak{F}_{t-1}] = h_{i,t} h_{k,t}$. Thus, under the assumption of mutual independence, all elements in the conditional co-moment matrices with at least 3 different indices are zero. Finally, standardizing the conditional co-moments one obtains the conditional co-skewness and co-kurtosis of $y_t$:

$$S_{i,j,k,t} = \frac{m_{i,j,k,t}^3}{\sigma_{i,t}\sigma_{j,t}\sigma_{k,t}}, \qquad K_{i,j,k,l,t} = \frac{m_{i,j,k,l,t}^4}{\sigma_{i,t}\sigma_{j,t}\sigma_{k,t}\sigma_{l,t}},$$

where $S_{i,j,k,t}$ represents the co-skewness between elements $i,j,k$ of $y_t$ (the skewness of series $i$ at time $t$ in the case $i = j = k$), $m_{i,j,k,t}^3$ the corresponding element of $M_t^3$, and $\sigma_{i,t}$ the standard deviation of $y_{i,t}$; the co-kurtosis tensor $K_{i,j,k,l,t}$ is defined analogously. Two natural applications of the co-moment matrices are Taylor type utility expansions in portfolio allocation and higher moment news impact surfaces.
Code Snippet
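The third co-moment construction translates directly into code. A base-R sketch for a single time point, where `A` is the n x m mixing matrix, `h` the vector of m factor conditional variances and `skew` the vector of factor skewness values (of the standardized innovations):

```r
# Unfolded third co-moment of the returns, M^3 = A M_f^3 (A' x A').
# Under independence only the E[f_i^3] entries of M_f^3 are non-zero.
comoment3 <- function(A, h, skew) {
  m   <- length(h)
  M3f <- matrix(0, m, m^2)
  for (i in 1:m) {
    M3f[i, (i - 1) * m + i] <- skew[i] * h[i]^1.5   # E[f_i^3] = s_i h_i^(3/2)
  }
  A %*% M3f %*% kronecker(t(A), t(A))               # N x N^2 unfolded tensor
}
```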
The weighted moments are calculated as follows:

$$\sigma_t^2 = w_t' M_t^2 w_t, \qquad s_t = w_t' M_t^3 \left(w_t \otimes w_t\right), \qquad k_t = w_t' M_t^4 \left(w_t \otimes w_t \otimes w_t\right),$$

where $w_t$ is the vector of weights at time $t$. Using the vectorization operator $\operatorname{vec}$, the weighted moments can be re-written compactly as:

$$s_t = \left(w_t \otimes w_t \otimes w_t\right)'\operatorname{vec}\left(M_t^3\right), \qquad k_t = \left(w_t \otimes w_t \otimes w_t \otimes w_t\right)'\operatorname{vec}\left(M_t^4\right).$$

One application of this is in the calculation of the Constant Absolute Risk Aversion (CARA) utility function, using a Taylor series expansion as in Ghalanos, Rossi, and Urga (2015):

$$E_{t-1}\left[U\left(\bar W_t\right)\right] \approx -e^{-\eta \bar W_t}\left(1 + \frac{\eta^2}{2}\sigma_t^2 - \frac{\eta^3}{3!} s_t + \frac{\eta^4}{4!} k_t\right),$$

where $\eta$ represents the investor's constant absolute risk aversion, with higher (lower) values representing higher (lower) aversion, and $\bar W_t = w_t' E_{t-1}[y_t]$ is the expected portfolio return at time $t$. Note that this approximation includes two moments more than the mean-variance criterion, with skewness and kurtosis directly related to investor preference for odd moments and dislike of even moments, under certain mild assumptions given in Scott and Horvath (1980).
Code Snippet
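Putting the weighted moments and the utility approximation together, assuming unfolded co-moment matrices `M2`, `M3`, `M4` (as produced conceptually by the expressions above), weights `w`, conditional mean `mu` and risk aversion `eta`:

```r
# CARA expected utility approximation from the weighted portfolio moments.
cara_utility <- function(w, mu, M2, M3, M4, eta) {
  ww   <- kronecker(w, w)
  www  <- kronecker(ww, w)
  s2   <- drop(t(w) %*% M2 %*% w)       # portfolio variance
  s3   <- drop(t(w) %*% M3 %*% ww)      # portfolio third central moment
  s4   <- drop(t(w) %*% M4 %*% www)     # portfolio fourth central moment
  Wbar <- sum(w * mu)                   # expected portfolio return
  -exp(-eta * Wbar) *
    (1 + eta^2 / 2 * s2 - eta^3 / 6 * s3 + eta^4 / 24 * s4)
}
```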
The weighted conditional density of the model may be obtained via inversion of the Generalized Hyperbolic characteristic function through the fast Fourier transform (FFT) method, as in Chen, Härdle, and Spokoiny (2010), or by simulation as in the prediction section above. The tsmarch package generates both simulated predictive distributions as well as the weighted p*, d* and q* functions based on FFT. It should be noted that while the n-dimensional NIG distribution is closed under convolution, when the distributional higher moment parameters are allowed to differ across assets, as is the case here, this property no longer holds.
Provided that $\epsilon_t$ is an n-dimensional vector of innovations, marginally distributed as 1-dimensional standardized GH, the density of the weighted asset return $w_{i,t} y_{i,t}$ is:

where $\bar w_t = w_t' A H_t^{1/2}$, and hence $\bar w_{i,t}$ is the i-th element of this vector, $\mu_{i,t}$ is the conditional mean of series $i$ at time $t$, and the parameter vector $(\bar\mu_{i,t}, \bar\delta_{i,t}, \bar\beta_{i,t}, \bar\alpha_{i,t}, \lambda_i)$ represents the Generalized Hyperbolic distributional parameters after conversion from the $(\rho, \zeta)$ parameterization used in the tsgarch package estimation routine. In order to obtain the weighted density, we sum the individual log characteristic functions (exponents) of $\epsilon_{i,t}$, equivalent to taking the product of the characteristic functions of the independent margins:

where $\gamma = \bar\alpha_j^2 - \bar\beta_j^2$, $\upsilon = \bar\alpha_j^2 - \left(\bar\beta_j + iu\right)^2$, and $(\bar\alpha_j, \bar\beta_j, \bar\delta_j, \bar\mu_j)$ are the scaled versions of the parameters $(\alpha_j, \beta_j, \delta_j, \mu_j)$ as shown in the previous equation. The density may then be accurately approximated by FFT inversion of the characteristic function $\varphi$:

$$f(y) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-iuy}\, \varphi(u)\, du.$$
Once the density is obtained, we use smooth approximations to obtain functions for the density (d*), quantile (q*) and distribution (p*), with random number generation provided by the inverse transform method applied to the quantile function.
Code Snippet
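The inversion step itself can be illustrated in a few lines of base R. For brevity, this sketch uses direct quadrature of the inversion integral rather than the FFT used by the package, and a hypothetical zero-skewness standardized NIG characteristic function as the input:

```r
# Invert a characteristic function phi into a density on a grid of points x,
# using f(x) = (1/pi) * Integral_0^Inf Re(exp(-i*u*x) * phi(u)) du.
cf_invert <- function(phi, x, upper = 200, n = 2^12) {
  u  <- seq(1e-6, upper, length.out = n)
  du <- u[2] - u[1]
  vapply(x, function(xi) sum(Re(exp(-1i * u * xi) * phi(u))) * du / pi,
         numeric(1))
}

# Hypothetical example: NIG characteristic function with beta = 0, mu = 0.
phi_nig <- function(u, alpha = 2, delta = 2) {
  exp(delta * (alpha - sqrt(alpha^2 + u^2)))
}

x  <- seq(-5, 5, length.out = 201)
fx <- cf_invert(phi_nig, x)
# Smooth d/p/q approximations can then be built, e.g. via splinefun(x, fx).
```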
Consider the equation for the construction of the conditional covariance matrix $\Sigma_t$:

$$\Sigma_t = A H_t A',$$

where $H_t$ is the diagonal conditional covariance matrix of the independent components. The factor news impact covariance surface measures the effect of shocks from a pair of independent factors at time $t-1$ on the conditional covariance of the returns of assets $i$ and $j$. In order to construct it, we first generate the news impact curves for all the factors using a common grid of shock values, and then calculate the effect of these on the conditional covariance of the returns of assets $i$ and $j$. Since the factors are independent, we simply loop through all pairwise combinations of these shocks to calculate the covariance surface, keeping the values of the other factors constant. This means that the covariance will be affected by both the relative sign and magnitude of the factor loadings, and the relative magnitude of the variance of the factors.
The news impact surface is not restricted to the covariance matrix, and can be extended to the conditional co-skewness and co-kurtosis; however, that is only truly meaningful when the higher moment distributional parameters are allowed to vary, as in Ghalanos, Rossi, and Urga (2015).
Code Snippet
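A sketch of the surface construction follows, with `nic` a hypothetical function returning factor k's news impact variance at shock z (obtainable from the fitted factor GARCH models), and the non-shocked factor variances held at constant levels `h0`:

```r
# Factor news impact covariance surface for assets (i, j), shocking factors
# k and l over a grid while holding the remaining factor variances at h0.
factor_news_surface <- function(A, h0, nic, i, j, k, l,
                                grid = seq(-3, 3, length.out = 50)) {
  outer(grid, grid, Vectorize(function(zk, zl) {
    h <- h0
    h[k] <- nic(zk, k)
    h[l] <- nic(zl, l)
    sum(A[i, ] * A[j, ] * h)       # Sigma_ij = sum_m A[i,m] A[j,m] h_m
  }))
}
```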
The figures in this section provide a visual summary of the main methods in the package for the DCC and GOGARCH type models. Blue boxes denote a method which produces a custom class that can be further manipulated with other methods, whilst the appendix provides a summary of some of the methods and output classes in the tsmarch package and the type of parallelization they use.
Pearson's product moment correlation $R$ totally characterizes the dependence structure in the multivariate Normal case, where zero correlation also implies independence, but it can only characterize the ellipses of equal density when the distribution belongs to the elliptical class. In the latter case, for instance with a distribution such as the multivariate Student, the correlation cannot capture tail dependence, which is determined by the shape parameter. Furthermore, it is not invariant under monotone transformations of the original variables, making it inadequate in many cases. An alternative measure which does not suffer from these drawbacks is Kendall's $\tau$ (see Kruskal (1958)), which is based on rank correlations, makes no assumption about the marginal distributions, and depends only on the copula $C$. It is a pairwise measure of concordance, calculated as:

$$\tau\left(x_i, x_j\right) = P\left[\left(x_i - \tilde x_i\right)\left(x_j - \tilde x_j\right) > 0\right] - P\left[\left(x_i - \tilde x_i\right)\left(x_j - \tilde x_j\right) < 0\right],$$

where $(\tilde x_i, \tilde x_j)$ is an independent copy of $(x_i, x_j)$. For elliptical distributions, Lindskog, Mcneil, and Schmock (2003) proved that there is a one-to-one relationship between this measure and Pearson's correlation coefficient $\rho$, which under certain assumptions (e.g. in the multivariate Normal case) simplifies to:

$$\tau_{ij} = \frac{2}{\pi} \arcsin \rho_{ij}.$$

Kendall's $\tau$ is invariant under monotone transformations, making it more suitable when working with non-elliptical distributions. A useful application arises in the case of the multivariate Student distribution, where the maximum likelihood approach to estimating the correlation matrix $R$ becomes infeasible for large dimensions. Instead, estimating the sample counterpart of Kendall's $\tau$ from the transformed margins, and then translating that into a correlation matrix via the relationship above, provides a method of moments type estimator. The shape parameter $\nu$ is then estimated keeping the correlation matrix constant, with little loss in efficiency.
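In code, this method of moments estimator is a two-liner in base R. The sketch assumes `u` is a T x n matrix of PIT-transformed margins:

```r
# Estimate the copula correlation matrix from pairwise sample Kendall's tau,
# inverting tau = (2/pi) * arcsin(rho).
tau_to_correlation <- function(u) {
  tau <- cor(u, method = "kendall")
  R   <- sin(pi / 2 * tau)
  # If needed, project to the nearest positive definite correlation matrix,
  # e.g. with Matrix::nearPD(R, corr = TRUE).
  R
}
```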
The table below provides a summary of selected methods, the classes they belong to, the classes they output, whether they are parallelized, and the dimensions of the objects involved. For example, the tsaggregate method applied to predicted and simulated objects returns a list of length 4, with each slot holding an object of class tsmodel.distribution of size sim x h, representing the simulated predictive distribution of each of the weighted moments generated by the model. The output of the tsconvolve method, which is list(h):list(sim), means that the output is a list of length h where each slot contains a list of length sim.