Title: | Estimate Dynamic Factor Models with Sparse Loadings |
---|---|
Description: | Implementation of various estimation methods for dynamic factor models (DFMs) including principal components analysis (PCA) Stock and Watson (2002) <doi:10.1198/016214502388618960>, 2Stage Giannone et al. (2008) <doi:10.1016/j.jmoneco.2008.05.010>, expectation-maximisation (EM) Banbura and Modugno (2014) <doi:10.1002/jae.2306>, and the novel EM-sparse approach for sparse DFMs Mosley et al. (2023) <arXiv:2303.11892>. Options to use classic multivariate Kalman filter and smoother (KFS) equations from Shumway and Stoffer (1982) <doi:10.1111/j.1467-9892.1982.tb00349.x> or fast univariate KFS equations from Koopman and Durbin (2000) <doi:10.1111/1467-9892.00186>, and options for independent and identically distributed (IID) white noise or auto-regressive (AR(1)) idiosyncratic errors. Algorithms coded in 'C++' and linked to R via 'RcppArmadillo'. |
Authors: | Luke Mosley [aut], Tak-Shing Chan [aut], Alex Gibberd [aut, cre] |
Maintainer: | Alex Gibberd <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0 |
Built: | 2024-11-04 06:50:40 UTC |
Source: | CRAN |
A full dataset used for nowcasting UK trade in goods (Exports) including the 9 export target series and 436 monthly indicator series.
exports
A data frame with 226 observations and 445 variables:
Export target series (9) and monthly indicators (436).
Monthly values from Jan 2004 to Oct 2022.
...
# load exports data
data = exports
Internal missing data is filled in using a cubic spline. Start and end of sample missing data is filled in using the median of the series and then smoothed with an MA(3) process.
fillNA(X)
X |
n x p numeric matrix of stationary and standardized time series |
X numeric matrix with missing data interpolated
idx.na logical matrix with TRUE if missing and FALSE otherwise.
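The two-part rule described above (spline for interior gaps, median plus moving average at the sample edges) can be sketched for a single series in base R. This is only an illustration: `fill_series` is a hypothetical helper, not the package's `fillNA()`, and the choice of an "fmm" spline and a centred 3-term average are assumptions.

```r
# Hypothetical helper for one numeric series; not the package implementation.
fill_series <- function(x) {
  n <- length(x)
  obs <- which(!is.na(x))
  stopifnot(length(obs) >= 4)          # a few observed points are needed for the spline
  inner <- min(obs):max(obs)           # span between first and last observed value
  out <- x

  # Interior gaps: cubic spline through the observed points.
  gaps <- inner[is.na(x[inner])]
  if (length(gaps) > 0) {
    out[gaps] <- stats::spline(obs, x[obs], xout = gaps, method = "fmm")$y
  }

  # Leading/trailing gaps: series median, then a centred 3-term moving average
  # wherever that average is defined (it is undefined at the very first and last point).
  edge <- setdiff(seq_len(n), inner)
  if (length(edge) > 0) {
    out[edge] <- stats::median(x, na.rm = TRUE)
    ma <- as.numeric(stats::filter(out, rep(1 / 3, 3), sides = 2))
    ok <- edge[!is.na(ma[edge])]
    out[ok] <- ma[ok]
  }
  out
}

x <- c(NA, NA, 1, 2, NA, 4, 5, 6, NA)
idx.na <- is.na(x)          # logical mask, analogous to the idx.na output
filled <- fill_series(x)
```

Observed values are left untouched; only positions flagged in `idx.na` change.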
A subset of quarterly CPI Index data from the ONS Inflation data (Q4 2022 release).
inflation
A data frame with 135 observations and 36 variables:
Different classes of inflation index
Quarterly values of the relevant CPI index, benchmarked to 2015=100
...
https://www.ons.gov.uk/economy/inflationandpriceindices/datasets/consumerpriceindices
# load inflation data
data = inflation
Implementation of the classic multivariate Kalman filter and smoother equations of Shumway and Stoffer (1982).
kalmanMultivariate(X, a0_0, P0_0, A, Lambda, Sig_e, Sig_u)
X |
n x p, numeric matrix of (stationary) time series |
a0_0 |
k x 1, initial state mean vector |
P0_0 |
k x k, initial state covariance matrix |
A |
k x k, state transition matrix |
Lambda |
p x k, measurement matrix |
Sig_e |
p x p, measurement equation residuals covariance matrix (diagonal) |
Sig_u |
k x k, state equation residuals covariance matrix |
For full details of the classic multivariate KFS approach, please refer to Mosley et al. (2023). Note that n is the number of observations, p is the number of time series, and k is the number of states.
logl log-likelihood of the innovations from the Kalman filter
at_t , filtered state mean vectors
Pt_t , filtered state covariance matrices
at_n , smoothed state mean vectors
Pt_n , smoothed state covariance matrices
Pt_tlag_n , smoothed state covariance with lag
Mosley, L., Chan, TS., & Gibberd, A. (2023). sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings.
Shumway, R. H., & Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253-264.
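To make the roles of the arguments concrete, here is a minimal base-R sketch of the forward (filtering) pass only. It assumes the usual state-space form x_t = Lambda f_t + e_t, f_t = A f_{t-1} + u_t, with complete data; `kalman_filter_sketch` is a hypothetical name, and the package's C++ routine additionally handles missing values and runs the smoother.

```r
# Illustrative forward pass; argument names mirror kalmanMultivariate() for readability.
kalman_filter_sketch <- function(X, a0_0, P0_0, A, Lambda, Sig_e, Sig_u) {
  n <- nrow(X); p <- ncol(X); k <- length(a0_0)
  at_t <- matrix(0, k, n)              # filtered state means, one column per time point
  Pt_t <- array(0, c(k, k, n))         # filtered state covariances
  logl <- 0
  a <- matrix(a0_0, k, 1)
  P <- P0_0
  for (tt in seq_len(n)) {
    # Prediction step.
    a_pred <- A %*% a
    P_pred <- A %*% P %*% t(A) + Sig_u
    # Innovation and its covariance.
    v <- matrix(X[tt, ], p, 1) - Lambda %*% a_pred
    S <- Lambda %*% P_pred %*% t(Lambda) + Sig_e
    S_inv <- solve(S)
    # Update step.
    K <- P_pred %*% t(Lambda) %*% S_inv
    a <- a_pred + K %*% v
    P <- P_pred - K %*% Lambda %*% P_pred
    # Gaussian log-likelihood contribution of the innovation.
    log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    logl <- logl - 0.5 * (p * log(2 * pi) + log_det + as.numeric(t(v) %*% S_inv %*% v))
    at_t[, tt] <- a
    Pt_t[, , tt] <- P
  }
  list(logl = logl, at_t = at_t, Pt_t = Pt_t)
}

# Small simulated example with one AR(1) factor and three series.
set.seed(1)
n <- 50; p <- 3; k <- 1
A <- matrix(0.8); Lambda <- matrix(c(1, 0.5, -0.3), p, k)
Sig_e <- diag(0.5, p); Sig_u <- matrix(1)
f <- as.numeric(stats::filter(rnorm(n), 0.8, method = "recursive"))
X <- f %o% as.numeric(Lambda) + matrix(rnorm(n * p, sd = sqrt(0.5)), n, p)
out <- kalman_filter_sketch(X, a0_0 = 0, P0_0 = matrix(1), A, Lambda, Sig_e, Sig_u)
```

The p x p inversion of S at every step is what the univariate treatment avoids.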
Univariate treatment (sequential processing) of the multivariate Kalman filter and smoother equations for fast implementation. Refer to Koopman and Durbin (2000).
kalmanUnivariate(X, a0_0, P0_0, A, Lambda, Sig_e, Sig_u)
X |
n x p, numeric matrix of (stationary) time series |
a0_0 |
k x 1, initial state mean vector |
P0_0 |
k x k, initial state covariance matrix |
A |
k x k, state transition matrix |
Lambda |
p x k, measurement matrix |
Sig_e |
p x p, measurement equation residuals covariance matrix (diagonal) |
Sig_u |
k x k, state equation residuals covariance matrix |
For full details of the univariate filtering approach, please refer to Mosley et al. (2023). Note that n is the number of observations, p is the number of time series, and k is the number of states.
logl log-likelihood of the innovations from the Kalman filter
at_t , filtered state mean vectors
Pt_t , filtered state covariance matrices
at_n , smoothed state mean vectors
Pt_n , smoothed state covariance matrices
Pt_tlag_n , smoothed state covariance with lag
Koopman, S. J., & Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3), 281-296.
Mosley, L., Chan, TS., & Gibberd, A. (2023). sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings.
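The idea behind sequential processing can be checked numerically: when Sig_e is diagonal, updating the state with one series at a time gives the same filtered mean and covariance as the joint update, while only ever dividing by scalars. The sketch below is a base-R illustration of a single measurement update under that assumption; `joint_update` and `sequential_update` are hypothetical helpers, not package functions.

```r
# Joint (multivariate) measurement update: requires a p x p inverse.
joint_update <- function(a, P, x, Lambda, Sig_e) {
  S <- Lambda %*% P %*% t(Lambda) + Sig_e
  K <- P %*% t(Lambda) %*% solve(S)
  list(a = a + K %*% (x - Lambda %*% a), P = P - K %*% Lambda %*% P)
}

# Sequential (univariate) update: one series at a time, scalar divisions only.
sequential_update <- function(a, P, x, Lambda, Sig_e) {
  for (i in seq_along(x)) {
    lam <- Lambda[i, , drop = FALSE]                       # 1 x k loading row
    Fi <- as.numeric(lam %*% P %*% t(lam)) + Sig_e[i, i]   # scalar innovation variance
    Ki <- (P %*% t(lam)) / Fi                              # k x 1 gain
    a <- a + Ki * (x[i] - as.numeric(lam %*% a))
    P <- P - Ki %*% lam %*% P
  }
  list(a = a, P = P)
}

set.seed(2)
k <- 2; p <- 4
Lambda <- matrix(rnorm(p * k), p, k)
Sig_e <- diag(c(0.5, 1, 1.5, 2))       # diagonal, as the argument description requires
a <- matrix(c(0.1, -0.2), k, 1)
P <- diag(k)
x <- rnorm(p)

jt <- joint_update(a, P, matrix(x, p, 1), Lambda, Sig_e)
sq <- sequential_update(a, P, x, Lambda, Sig_e)
diff_mean <- max(abs(jt$a - sq$a))     # numerically zero
diff_cov  <- max(abs(jt$P - sq$P))     # numerically zero
```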
Produce a vector of log10-spaced values.
logspace(x1, x2, n = 50)
x1 |
lower bound |
x2 |
upper bound |
n |
length |
Vector of log10 spaced values of length n
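A base-R equivalent is easy to write down, assuming (as the default grid `logspace(-2, 3, 100)` used for the LASSO parameters suggests) that x1 and x2 are base-10 exponents rather than the end points themselves. `logspace_sketch` is a hypothetical stand-in for illustration.

```r
# Assumption: bounds are exponents, so the grid runs from 10^x1 to 10^x2.
logspace_sketch <- function(x1, x2, n = 50) {
  10^seq(x1, x2, length.out = n)
}

alphas <- logspace_sketch(-2, 3, n = 6)
alphas   # 0.01, 0.1, 1, 10, 100, 1000
```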
Visualise the amount of missing data in a data matrix or data frame.
missing_data_plot(
  data,
  present.colour = "grey80",
  missing.colour = "grey20",
  use.names = TRUE
)
data |
Numeric matrix or data frame with NA for missing values. |
present.colour |
The colour for data that is present. Default is 'grey80'. |
missing.colour |
The colour for data that is missing. Default is 'grey20'. |
use.names |
Logical. Label the axis with the variable names of the data. Default is TRUE. Set to FALSE to remove. |
A matrix plot showing where missing data is present.
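For readers without the package at hand, a comparable picture can be drawn with base graphics. This is a rough sketch of the same idea (a present/missing indicator matrix shown as an image), not the function's actual drawing code; the toy data and axis layout are assumptions.

```r
# Toy data with some missing values at the start and end of two series.
set.seed(3)
X <- matrix(rnorm(60), nrow = 12, ncol = 5,
            dimnames = list(NULL, paste0("series", 1:5)))
X[1:3, 2] <- NA
X[11:12, 4] <- NA

miss <- is.na(X)                 # TRUE where a value is missing
share_missing <- colMeans(miss)  # proportion missing per series

# One column per series, one row per observation; colours follow the function defaults.
image(x = seq_len(ncol(X)), y = seq_len(nrow(X)), z = t(miss) * 1,
      col = c("grey80", "grey20"), xlab = "", ylab = "observation", axes = FALSE)
axis(1, at = seq_len(ncol(X)), labels = colnames(X), las = 2)
axis(2)
```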
Make plots for the output of sparseDFM(). Options include:
factor - plot factor estimate series on top of the original standardized stationary data
loading.heatmap - make a heatmap of the loadings matrix
loading.lineplot - make a lineplot of variable loadings for a given factor
loading.grouplineplot - separate variable groups into colours for better visualisation
residual - boxplot or scatterplot of residuals
lasso.bic - BIC values for the LASSO tuning parameter
em.convergence - log-likelihood convergence of EM iterations
## S3 method for class 'sparseDFM'
plot(
  x,
  type = "factor",
  which.factors = 1:(dim(x$state$factors)[2]),
  scale.factors = TRUE,
  which.series = 1:(dim(x$params$Lambda)[1]),
  loading.factor = 1,
  series.col = "grey",
  factor.col = "black",
  factor.lwd = 2,
  factor.lab = NULL,
  use.series.names = FALSE,
  series.lab = NULL,
  series.labpos = NULL,
  colorkey = TRUE,
  col.regions = NULL,
  group.names = NULL,
  group.cols = NULL,
  group.legend = TRUE,
  residual.type = "boxplot",
  scatter.series = 1,
  min.bic.col = "red",
  alpha_index = "best",
  ...
)
x |
an object of class 'sparseDFM'. |
type |
character. The type of plot: "factor", "loading.heatmap", "loading.lineplot", "loading.grouplineplot", "residual", "lasso.bic" or "em.convergence". Default is "factor". |
which.factors |
numeric vector of integers giving the factors to plot. Default is 1:(dim(x$state$factors)[2]), i.e. all factors. |
scale.factors |
logical. Standardize the factor estimates when plotting. Default is TRUE. |
which.series |
numeric vector of integers giving the series to plot. Default is 1:(dim(x$params$Lambda)[1]), i.e. all series. |
loading.factor |
integer. The factor whose loadings are shown in the loading line plots. Default is 1. |
series.col |
character. The colour of the background series in the factor plot. Default is "grey". |
factor.col |
character. The colour of the factor estimate line. Default is "black". |
factor.lwd |
integer. The line width of the factor estimate line. Default is 2. |
factor.lab |
vector of characters to label each factor. Default is NULL. |
use.series.names |
logical. Set to TRUE if the plot should display the series names in the data matrix X. Default is FALSE. |
series.lab |
vector of characters to label each data series. Default is NULL. |
series.labpos |
numeric vector of integers giving the positions of the series that are labelled. Default is NULL. |
colorkey |
logical. Display the colour key of the heatmap. Default is TRUE. |
col.regions |
vector of gradually varying colours for the heatmap. Default is NULL. |
group.names |
vector of characters of the same dimension as … Default is NULL. |
group.cols |
vector of characters with one colour per distinct group. Default is NULL. |
group.legend |
logical. Display the legend. Default is TRUE. |
residual.type |
character. The type of residual plot: boxplot or scatterplot. Default is "boxplot". |
scatter.series |
integer. The series to plot when a residual scatterplot is requested. Default is 1. |
min.bic.col |
character. Colour used to mark the best (minimum BIC) LASSO parameter. Default is "red". |
alpha_index |
Choose which L1 penalty parameter to display the results for. Default is 'best'. Otherwise, input an integer in 1:length(alpha_grid) that indicates the required alpha parameter. |
... |
Further arguments for the underlying plotting functions. |
Plots for the output of sparseDFM().
Predict the next h steps ahead for the factor estimates and the data series. Given information up to time $t$, an h-step ahead forecast is $\hat{X}_{t+h} = \hat{\Lambda}\hat{A}^{h}\hat{F}_{t} + \Phi^{h}\hat{\epsilon}_{t}$, where $\Phi = 0$ for the IID idiosyncratic error case.
## S3 method for class 'sparseDFM'
predict(object, h = 1, standardize = FALSE, alpha_index = "best", ...)

## S3 method for class 'sparseDFM_forecast'
print(x, ...)
object |
an object of class 'sparseDFM'. |
h |
integer. The number of steps ahead to compute the forecast for. Default is 1. |
standardize |
logical. Returns data series forecasts in the original data scale if set to FALSE. Default is FALSE. |
alpha_index |
Choose which L1 penalty parameter to display the results for. Default is 'best'. Otherwise, input an integer in 1:length(alpha_grid) that indicates the required alpha parameter. |
... |
Further arguments. |
x |
an object of class 'sparseDFM_forecast' from predict.sparseDFM. |
X_hat numeric matrix of data series forecasts.
F_hat numeric matrix of factor forecasts.
e_hat numeric matrix of AR(1) idiosyncratic error forecasts if err = "AR1" in sparseDFM.
h forecasts produced for h steps ahead.
err the type of idiosyncratic errors used in sparseDFM.
Prints out the h-step ahead forecast from predict.sparseDFM.
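The forecast recursion can be illustrated in base R. The sketch assumes the usual state-space form, in which the factors are propagated with the transition matrix A, mapped to the data with the loadings Lambda, and any AR(1) idiosyncratic component decays with per-series coefficients Phi (zero in the IID case). `forecast_sketch` and its arguments are hypothetical and work on the standardized scale only.

```r
# Hypothetical h-step forecast from the last filtered factor F_t.
# Phi: optional vector of AR(1) coefficients, e_t: last idiosyncratic errors.
forecast_sketch <- function(Lambda, A, F_t, h = 1, Phi = NULL, e_t = NULL) {
  p <- nrow(Lambda); k <- ncol(Lambda)
  F_hat <- matrix(NA_real_, h, k)
  X_hat <- matrix(NA_real_, h, p)
  f <- matrix(F_t, k, 1)
  e <- if (is.null(e_t)) rep(0, p) else e_t
  phi <- if (is.null(Phi)) rep(0, p) else Phi   # IID case: no error persistence
  for (j in seq_len(h)) {
    f <- A %*% f                                # factor j steps ahead: A^j F_t
    e <- phi * e                                # error j steps ahead: Phi^j e_t
    F_hat[j, ] <- f
    X_hat[j, ] <- Lambda %*% f + e
  }
  list(X_hat = X_hat, F_hat = F_hat)
}

A <- diag(c(0.5, 0.2))
Lambda <- matrix(c(1, 0, 1,
                   0, 1, 1), nrow = 3, ncol = 2)
fc <- forecast_sketch(Lambda, A, F_t = c(2, 1), h = 2)
fc$F_hat   # rows: (1, 0.2) and (0.5, 0.04)
fc$X_hat   # rows: (1, 0.2, 1.2) and (0.5, 0.04, 0.54)
```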
Generate a ragged edge structure for a data matrix
raggedEdge(X, lags)
X |
numeric data matrix |
lags |
vector of integers representing publication lag of each variable |
ragged edge version of X
data = matrix(rnorm(100),ncol=10)
pub_lags = c(rep(2,5),rep(1,3),rep(0,2))
new_data = raggedEdge(data, pub_lags)
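What a ragged edge looks like can be shown without the package. The sketch below assumes that a publication lag of L means the last L observations of that series are not yet available and are therefore set to NA; `ragged_edge_sketch` is a hypothetical helper and the package function may treat lags differently.

```r
# Assumption: a lag of L blanks out the final L rows of the corresponding column.
ragged_edge_sketch <- function(X, lags) {
  stopifnot(length(lags) == ncol(X), all(lags >= 0), all(lags < nrow(X)))
  n <- nrow(X)
  for (j in seq_len(ncol(X))) {
    if (lags[j] > 0) {
      X[(n - lags[j] + 1):n, j] <- NA
    }
  }
  X
}

set.seed(4)
data <- matrix(rnorm(100), ncol = 10)
pub_lags <- c(rep(2, 5), rep(1, 3), rep(0, 2))
new_data <- ragged_edge_sketch(data, pub_lags)
tail(is.na(new_data), 3)   # the missing values sit at the end of the sample
```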
Obtain the residuals or fitted values of the sparseDFM fit.
## S3 method for class 'sparseDFM'
fitted(object, standardize = FALSE, alpha_index = "best", ...)

## S3 method for class 'sparseDFM'
residuals(object, standardize = FALSE, alpha_index = "best", ...)
object |
an object of class 'sparseDFM'. |
standardize |
logical. The residuals and fitted values should be standardized. Default is FALSE. |
alpha_index |
Choose which L1 penalty parameter to display the results for. Default is 'best'. Otherwise, input an integer in 1:length(alpha_grid) that indicates the required alpha parameter. |
... |
Further arguments. |
Residuals or fitted values of sparseDFM.
Main function to allow estimation of a DFM or a sparse DFM (with sparse loadings) on stationary data that may have arbitrary patterns of missing data. We allow the user:
an option for estimation method - "PCA", "2Stage", "EM" or "EM-sparse"
an option for IID or AR1 idiosyncratic errors
an option for Kalman Filter/Smoother estimation using standard multivariate equations or fast univariate filtering equations
sparseDFM(
  X,
  r,
  q = 0,
  alphas = logspace(-2, 3, 100),
  alg = "EM-sparse",
  err = "IID",
  kalman = "univariate",
  store.parameters = FALSE,
  standardize = TRUE,
  max_iter = 100,
  threshold = 1e-04
)
X |
n x p numeric matrix of stationary time series, possibly with missing values. |
r |
Integer. Number of factors. |
q |
Integer. The first q series (columns of X) should not be made sparse. Default q = 0. |
alphas |
Numeric vector or value of LASSO regularisation parameters. Default is alphas = logspace(-2,3,100). |
alg |
Character. Option for estimation algorithm: "PCA", "2Stage", "EM" or "EM-sparse". Default is "EM-sparse". |
err |
Character. Option for idiosyncratic errors: "IID" or "AR1". Default is "IID". |
kalman |
Character. Option for Kalman filter and smoother equations: the standard multivariate equations or the fast univariate equations. Default is "univariate". |
store.parameters |
Logical. Store outputs for every alpha L1 penalty parameter. Default is FALSE. |
standardize |
Logical. Standardize the data before estimating the model. Default is TRUE. |
max_iter |
Integer. Maximum number of EM iterations. Default is 100. |
threshold |
Numeric. Tolerance on EM iterates. Default is 1e-4. |
For full details of the model please refer to Mosley et al. (2023).
A list-of-lists-like S3 object of class 'sparseDFM' with the following elements:
data |
A list containing information about the data. |
params |
A list containing the estimated parameters of the model. |
state |
A list containing the estimated states and state covariances. |
em |
A list containing information about the EM algorithm. |
alpha.output |
Parameter and state outputs for each L1-norm penalty parameter in alphas, if store.parameters = TRUE. |
Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.
Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.
Giannone, D., Reichlin, L., & Small, D. (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55(4), 665-676.
Koopman, S. J., & Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3), 281-296.
Mosley, L., Chan, TS., & Gibberd, A. (2023). sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings.
Shumway, R. H., & Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253-264.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167-1179.
# load inflation data set
data = inflation

# reduce the size for these examples - full data found in vignette
data = data[1:60,]

# make stationary by taking first differences
new_data = transformData(data, rep(2,ncol(data)))

# tune for the number of factors to use
tuneFactors(new_data, type = 2)

# fit a PCA using 3 PC's
fit.pca <- sparseDFM(new_data, r = 3, alg = 'PCA')

# fit a DFM using the two-stage approach
fit.2stage <- sparseDFM(new_data, r = 3, alg = '2Stage')

# fit a DFM using EM algorithm with 3 factors
fit.dfm <- sparseDFM(new_data, r = 3, alg = 'EM')

# fit a Sparse DFM with 3 factors
fit.sdfm <- sparseDFM(new_data, r = 3, alg = 'EM-sparse')

# observe the factor loadings of the sparse DFM
plot(fit.sdfm, type = 'loading.heatmap')

# observe the factors
plot(fit.sdfm, type = 'factor')

# observe the residuals
plot(fit.sdfm, type = 'residual')

# observe the LASSO parameter selected and BIC values
plot(fit.sdfm, type = 'lasso.bic')

# predict 3 steps ahead
predict(fit.sdfm, h = 3)
Summary and print outputs for class 'sparseDFM'.
## S3 method for class 'sparseDFM'
print(x, ...)

## S3 method for class 'sparseDFM'
summary(object, ...)
x |
an object of class 'sparseDFM' |
... |
Further arguments. |
object |
an object of class 'sparseDFM' |
Information on the model fitted.
Summary information on estimation details.
Methods to transform the data to make it stationary. Input a numeric data matrix and the transform required for each data series. Returns a matrix of the transformed data.
transformData(X, stationary_transform)
X |
n x p numeric data matrix |
stationary_transform |
p-dimensional vector of integer codes, one per series, selecting the transform for that series (for example, 2 is used in the sparseDFM example to take first differences). |
Transformed stationary version of X.
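As a small illustration of the first-difference transform used in the sparseDFM example, the base-R sketch below differences every column of a matrix. `first_difference` is a hypothetical helper; in particular, dropping the first row (so the result has n - 1 rows) is an assumption and the package may handle the lost observation differently.

```r
# First differences of each column: row t becomes X[t, ] - X[t - 1, ].
first_difference <- function(X) {
  X <- as.matrix(X)
  diff(X)              # base R differences a matrix column by column
}

X <- cbind(a = c(1, 3, 6, 10), b = c(2, 2, 5, 4))
dX <- first_difference(X)
dX   # column a: 2, 3, 4; column b: 0, 3, -1
```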
Uses the Bai and Ng (2002) information criteria approach. Missing data is interpolated using the fillNA function.
tuneFactors(
  X,
  type = 2,
  standardize = TRUE,
  r.max = min(15, ncol(X) - 1),
  plot = TRUE
)
X |
n x p numeric matrix of stationary time series, possibly with missing values. |
type |
Character. Option for which information criterion to use. Default is 2. |
standardize |
Logical. Standardize the data before estimating the model. Default is TRUE. |
r.max |
Integer. Maximum number of factors to search for. Default is min(15,ncol(X)-1). |
plot |
Logical. Make a plot showing the IC value for each of the number of factors considered. Default is TRUE. |
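To calculate the number of factors to use in the model, the information criteria approach of Bai and Ng (2002) is used. This can be done before sparseDFM is fitted to the data to determine r. Bai and Ng (2002) consider 3 types of information criteria with different penalties of the form:
$IC_1(r) = \ln V_r(\bar{F},\bar{\Lambda}) + r \frac{n+p}{np} \ln\left(\frac{np}{n+p}\right)$
$IC_2(r) = \ln V_r(\bar{F},\bar{\Lambda}) + r \frac{n+p}{np} \ln\left(\min(n,p)\right)$
$IC_3(r) = \ln V_r(\bar{F},\bar{\Lambda}) + r \frac{\ln\left(\min(n,p)\right)}{\min(n,p)}$
The sum of squared residuals for $r$ factors, $V_r(\bar{F},\bar{\Lambda}) = \frac{1}{np}\sum_{i=1}^{p}\sum_{t=1}^{n}\bar{\epsilon}_{i,t}^{2}$ with $\bar{\epsilon}_{i,t} = \bar{X}_{t,i} - \bar{F}_{t}\bar{\Lambda}_{i}$, is found using PCA on the standardized data set $\bar{X}$. The estimated factors $\bar{F}$ correspond to the principal components and the estimated loadings $\bar{\Lambda}$ to the eigenvectors. Should the data contain missing values, the missing data is interpolated using fillNA.
The number of factors to use is $\arg\min_{r} IC_i(r)$ for $i = 1, 2$ or $3$. The type 2 penalty is the largest in finite samples, so type 2 is the default.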
The number of factors to use according to Bai and Ng (2002) information criteria.
Bai, J., & Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1), 191-221.
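The three Bai and Ng (2002) criteria can be computed in a few lines of base R. The sketch assumes complete data with n > p and computes the residual variance from a PCA of the standardized data; `bai_ng_ic` is a hypothetical helper for illustration, not the code behind tuneFactors().

```r
# Returns an r.max x 3 matrix of IC values; the chosen r minimises the relevant column.
bai_ng_ic <- function(X, r.max = min(15, ncol(X) - 1)) {
  X <- scale(X)                                   # standardize each series
  n <- nrow(X); p <- ncol(X); m <- min(n, p)
  eig <- eigen(crossprod(X) / n, symmetric = TRUE)
  penalty <- c((n + p) / (n * p) * log(n * p / (n + p)),   # type 1
               (n + p) / (n * p) * log(m),                 # type 2
               log(m) / m)                                 # type 3
  ic <- matrix(NA_real_, r.max, 3, dimnames = list(NULL, paste0("IC", 1:3)))
  for (r in seq_len(r.max)) {
    L <- eig$vectors[, seq_len(r), drop = FALSE]  # loadings: leading eigenvectors
    Fhat <- X %*% L                               # factors: principal components
    V <- sum((X - Fhat %*% t(L))^2) / (n * p)     # average squared residual
    ic[r, ] <- log(V) + r * penalty
  }
  ic
}

# Simulated data with two strong factors.
set.seed(5)
n <- 200; p <- 20
Fac <- matrix(rnorm(n * 2), n, 2)
Lam <- matrix(rnorm(p * 2), p, 2)
X <- Fac %*% t(Lam) + matrix(rnorm(n * p, sd = 0.5), n, p)
ic <- bai_ng_ic(X, r.max = 8)
apply(ic, 2, which.min)    # selected number of factors under each criterion
```

Because the penalty is linear in r and type 2 has the largest slope, IC2 never selects more factors than it would under the other two penalties on the same residual variances.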