--- title: "Getting Started with pvarife" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with pvarife} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ## Introduction **pvarife** implements the Panel VAR estimator with Interactive Fixed Effects (IFE) of Tuğan (2021). The model is $$ y_{i,t} = \sum_{\ell=1}^{L} \Theta_\ell \, y_{i,t-\ell} + F_t \lambda_i + e_{i,t}, \quad i = 1,\ldots,I,\; t = 1,\ldots,T, $$ where $y_{i,t}$ is a $K\times 1$ vector of endogenous variables, $F_t$ is an $r\times 1$ vector of **unobservable** common factors, $\lambda_i$ is a unit-specific loading vector, and $e_{i,t}$ is an idiosyncratic error. The estimator jointly identifies $\beta = [\Theta_1,\ldots,\Theta_L]$, $F = \{F_t\}$, and $\Lambda = \{\lambda_i\}$ via an inner/outer EM algorithm (Bai 2009). Impulse responses under recursive (Cholesky) identification and asymptotic confidence bands (Theorem 2.3 of Tuğan 2021) are provided. > **Reference:** > Tuğan, M. (2021). Panel VAR models with interactive fixed effects. > *Econometrics Journal*, 24, 225–246. > --- ## 1 Data Format The main input is a three-dimensional array `y[i, t, k]`: | Dimension | Index | Meaning | |-----------|-------|---------| | 1 | `i` | cross-sectional unit | | 2 | `t` | time period | | 3 | `k` | VAR variable | `NA` values are allowed for **unbalanced panels**; the algorithm imputes missing entries via the EM step. --- ## 2 Simulate Data ```{r simulate} library(pvarife) set.seed(1) sim <- sim_pvarife( n_units = 50, # I n_time = 30, # T n_vars = 2, # K n_lags = 1, # lag order n_factors = 1, # number of interactive fixed effects seed = 42 ) dim(sim$y) # I x T x K sim$beta_true # true coefficient vector sim$sigma_true # true reduced-form covariance ``` The function generates data from the DGP of Tuğan (2021), Section S10: - VAR coefficient matrix $\Theta_1 = \begin{pmatrix}0.65 & 0.30\\0.20 & 0.60\end{pmatrix}$ - Reduced-form covariance $\Sigma_e = \begin{pmatrix}1 & 0.5\\0.5 & 1\end{pmatrix}$ - Common factor: AR(1) with coefficient 0.5 - Loadings $\lambda_{k,i} \sim N(1, 1)$ independently --- ## 3 Estimate the Model ```{r estimate} fit <- pvarife( y = sim$y, n_lags = 1, n_factors = 1, n_out = 20, # outer GLS iterations (paper default: 50) n_in = 5 # inner PCA/EM iterations (paper default: 10) ) print(fit) ``` The `n_out` and `n_in` parameters control the convergence of the EM algorithm. For publication-quality results use the defaults (`n_out = 50`, `n_in = 10`); smaller values speed up experimentation. ### Extract coefficients ```{r coef} coef(fit) ``` The first $K$ elements are equation-specific intercepts; the remaining $K^2 L$ elements are the VAR lag matrices $\Theta_1, \ldots, \Theta_L$ (stacked row-major). --- ## 4 Impulse Response Functions ### Point estimates (no confidence bands) ```{r irf-point} ir <- compute_irf( fit, n_periods = 8, shock = 1L # shock to first variable (Cholesky ordering) ) dim(ir) # K x n_periods plot(ir, var_names = c("Variable 1", "Variable 2")) ``` ### Confidence bands (asymptotic simulation) The `irf_bands()` function draws from the joint asymptotic distribution of $\hat\beta$ and the common component (Theorem 2.3 of Tuğan 2021). ```{r irf-bands, eval=FALSE} bands <- irf_bands( fit, n_periods = 8, n_draw = 200, # paper default: 500 level = 0.95, seed = 1 ) plot(bands, var_names = c("Variable 1", "Variable 2")) ``` ### Classical residual bootstrap (optional) For robustness checks that do not rely on asymptotic approximations: ```{r bootstrap, eval=FALSE} bsb <- bootstrap_irf_bands( fit, n_periods = 8, n_boot = 100, # computationally expensive level = 0.95, seed = 2 ) plot(bsb, var_names = c("Variable 1", "Variable 2")) ``` --- ## 5 Long-Run Identification (Blanchard-Quah) By default `compute_irf()` uses **short-run (Cholesky) identification**: the impact matrix $A_0 = \mathrm{chol}(\hat\Sigma)^\top$ is lower-triangular, so shock 1 has no *contemporaneous* effect on variable 2. For **long-run identification**, the *cumulative* impact matrix $C(1) = (I - \Theta_1)^{-1} A_0$ is constrained to be lower-triangular: shock 1 has no *long-run* (permanent) effect on variable 2. The impact matrix becomes $$A_0 = (I - \Theta_1)\,\mathrm{chol}(D)^\top, \quad D = (I-\Theta_1)^{-1}\hat\Sigma(I-\Theta_1)^{-\top}.$$ This matches the MATLAB Monte Carlo code `IRs_to_Shocks_LR_Identification.m`. ### Long-run DGP ```{r lr-sim, eval=FALSE} sim_lr <- sim_pvarife( n_units = 50, n_time = 30, identification = "long_run", # Blanchard-Quah DGP seed = 42 ) sim_lr$identification # "long_run" sim_lr$diff_vars_suggested # 1L (display cumulative IRF for variable 1) ``` ### Long-run IRFs and bands ```{r lr-irf, eval=FALSE} fit_lr <- pvarife(sim_lr$y, n_lags = 1, n_factors = 1, n_out = 20, n_in = 5) # Point estimate at estimated beta (uncorrected) ir_lr <- compute_irf( fit_lr, n_periods = 8, identification = "long_run", diff_vars = sim_lr$diff_vars_suggested # cumulate variable 1 ) plot(ir_lr, var_names = c("Variable 1", "Variable 2")) # Bias-corrected point estimate ir_lr_bc <- compute_irf( fit_lr, n_periods = 8, identification = "long_run", diff_vars = 1L, bias_correct = TRUE # apply Theorem 2.3 bias correction ) # Full confidence bands (median is already bias-corrected) bands_lr <- irf_bands( fit_lr, n_periods = 8, identification = "long_run", diff_vars = 1L, n_draw = 200, seed = 1 ) plot(bands_lr, var_names = c("Variable 1", "Variable 2")) ``` --- ## 6 Bias-Corrected Point Estimate The `pvarife()` estimator has a bias of order $O(1/\sqrt{TC})$ documented in Theorem 2.3 of Tuğan (2021). Setting `bias_correct = TRUE` in `compute_irf()` subtracts the analytical bias from $\hat\beta$ before computing the MA representation. | Call | Description | |------|-------------| | `compute_irf(fit)` | Fast uncorrected point estimate | | `compute_irf(fit, bias_correct = TRUE)` | Bias-corrected point estimate | | `irf_bands(fit)` | Full inference; **median is implicitly bias-corrected** | ```{r bias-correct, eval=FALSE} ir_unc <- compute_irf(fit, n_periods = 8) ir_bc <- compute_irf(fit, n_periods = 8, bias_correct = TRUE) # ir_bc is close to the median returned by irf_bands() ``` --- ## 7 Asymptotic Inference ```{r avar, eval=FALSE} avar <- asymptotic_var(fit) se <- sqrt(diag(avar$variance)) # Bias-corrected 95% confidence intervals for each element of beta beta_hat <- as.numeric(fit$beta) beta_biasC <- beta_hat - avar$bias ci_lower <- beta_biasC - 1.96 * se ci_upper <- beta_biasC + 1.96 * se data.frame( parameter = names(coef(fit)), estimate = round(beta_hat, 4), std_err = round(se, 4), ci_lower = round(ci_lower, 4), ci_upper = round(ci_upper, 4) ) ``` --- ## 8 Working with Unbalanced Panels Simply set missing cells to `NA` before calling `pvarife()`: ```{r unbalanced, eval=FALSE} y_unbal <- sim$y # Randomly set 15% of unit-time observations to NA set.seed(10) mask <- array(runif(prod(dim(y_unbal))) < 0.15, dim = dim(y_unbal)) y_unbal[mask] <- NA fit_unbal <- pvarife(y_unbal, n_lags = 1, n_factors = 1, n_out = 10, n_in = 5) print(fit_unbal) ``` The EM step in the inner loop imputes missing observations using the current factor estimates, consistent with the approach of Bai (2009). --- ## 9 Tuning Advice | Parameter | Recommended | Notes | |-----------|-------------|-------| | `n_out` | 50 | Increase if coefficients have not converged | | `n_in` | 10 | Increase for more precise factor extraction | | `n_factors` | Use IC criteria | See paper Section 3 for selection | | `n_draw` | 500 | For `irf_bands()`; reduce for testing | | `n_boot` | 200 | For `bootstrap_irf_bands()`; very slow | To check convergence, compare results with `n_out = 50` and `n_out = 100`. --- ## 10 Replicating the Liaqat (2019) Application The paper's empirical application uses panel data on government debt and capital formation from Liaqat (2019). The dataset is not bundled with this package due to redistribution restrictions, but the replication code is available in the MATLAB package at `documentforrepl/replication package/Economic Application/`. The R workflow would be: 1. Load the Liaqat dataset and compute first-differenced variables 2. Set `n_lags = 1`, `n_factors = 2` (for Panel B of Figure 1, `rmax = 2`) 3. Call `pvarife()`, `compute_irf()`, and `irf_bands()` with `n_draw = 500` 4. Normalize IRFs by dividing by `ir[1, 1]` (shock to first variable at horizon 1) --- ## References - Tuğan, M. (2021). Panel VAR models with interactive fixed effects. *Econometrics Journal*, **24**, 225–246. - Bai, J. (2009). Panel data models with interactive fixed effects. *Econometrica*, **77**(4), 1229–1279. - Liaqat, Z. (2019). Does public debt cause crowding out? A cross-country empirical evidence. *Economics Letters*, **181**, 1–4.