--- title: "Different Variance-Covariance Estimators" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Different Variance-Covariance Estimators} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```r library(capybara) ``` A very quick verification of Ross (2004) is to obtain the coefficients for the OLS model from table 1: ```r felm(ltrade ~ bothin + onein + gsp + ldist + lrgdp + lrgdppc + regional + custrict + comlang + border + landl + island + lareap + comcol + curcol + colony + comctry | year, data = ross2004) ``` ```r Formula: ltrade ~ bothin + onein + gsp + ldist + lrgdp + lrgdppc + regional + custrict + comlang + border + landl + island + lareap + comcol + curcol + colony + comctry | year Estimates: | | Estimate | Std. Error | z value | Pr(>|z|) | |----------|----------|------------|-----------|-----------| | bothin | -0.0423 | 0.0159 | -2.6540 | 0.0080 ** | | onein | -0.0583 | 0.0154 | -3.7722 | 0.0002 ** | | gsp | 0.8585 | 0.0111 | 77.0829 | 0.0000 ** | | ldist | -1.1190 | 0.0061 | -183.0863 | 0.0000 ** | | lrgdp | 0.9159 | 0.0026 | 355.9087 | 0.0000 ** | | lrgdppc | 0.3214 | 0.0039 | 83.2804 | 0.0000 ** | | regional | 1.1988 | 0.0360 | 33.2956 | 0.0000 ** | | custrict | 1.1181 | 0.0374 | 29.8774 | 0.0000 ** | | comlang | 0.3125 | 0.0110 | 28.3085 | 0.0000 ** | | border | 0.5257 | 0.0266 | 19.7328 | 0.0000 ** | | landl | -0.2706 | 0.0093 | -29.0310 | 0.0000 ** | | island | 0.0419 | 0.0095 | 4.4335 | 0.0000 ** | | lareap | -0.0967 | 0.0020 | -47.5692 | 0.0000 ** | | comcol | 0.5846 | 0.0162 | 36.1094 | 0.0000 ** | | curcol | 1.0751 | 0.1067 | 10.0763 | 0.0000 ** | | colony | 1.1638 | 0.0312 | 37.3391 | 0.0000 ** | | comctry | -0.0163 | 0.2623 | -0.0622 | 0.9504 | Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10 R-squared : 0.648 Adj. R-squared: 0.6479 Fixed effects: year: 52 Number of observations: Full 234597; Missing 0; Perfect classification 0 ``` This does not involve many fixed effects. However, capybara will be particularly useful to obtain different standard errors for the same functional form, which is the main focus of table 4B in Cameron & Miller (2014). Table 4B shows the following clustering methods that we can replicate with the third part of the model formula (e.g. `y ~ x1 + x2 + ... | fe1 + fe2 | cl1 + cl2`) and using the `vcov` argument to select how the clustering variables are used: | `vcov` value | Description | |:-----------------|:-----------------------------------------------| | `"iid"` | Default OLS (i.i.d. errors) | | `"hetero"` | Heteroskedastic-robust (HC0) | | `"cluster"` | One-way cluster sandwich | | `"m-estimator"` | One-way M-estimator sandwich | | `"dyadic"` | Dyadic-robust (Cameron & Miller, 2014) | Capybara provides an `update()` method to easily modify the formula for each model, as we need to change the clustering variables for each model, which avoids error-prone copy-pasting of the full formula every time. ## IID (no cluster part in formula) ```r fml <- ltrade ~ bothin + onein + gsp + ldist + lrgdp + lrgdppc + regional + custrict + comlang + border + landl + island + lareap + comcol + curcol + colony + comctry | year fit_iid <- felm( fml, data = ross2004, vcov = "iid" ) ``` ## Heteroskedastic-robust (HC0) ```r fit_hetero <- felm( fml, data = ross2004, vcov = "hetero" ) ``` ## One-way: cluster on country-pair (g,h) Note that multi-part formulas (e.g., `y ~ x | fe | cl`) are handled by the Formula package. Therefore, updating those needs an explicit `Formula::as.Formula()` call if the formula type was not specified initially. ```r fml2 <- update(Formula::as.Formula(fml), . ~ . | . | pair) fit_pairs <- felm( fml2, data = ross2004, vcov = "cluster" ) ``` ## One-way: cluster on country 1 (g) ```r fit_ctry1 <- felm( update(fml2, . ~ . | . | ctry1), data = ross2004, vcov = "cluster" ) ``` ## One-way: cluster on country 2 (h) ```r fit_ctry2 <- felm( update(fml2, . ~ . | . | ctry2), data = ross2004, vcov = "cluster" ) ``` ## Two-way: cluster on (g) and (h) simultaneously ```r fit_2way <- felm( update(fml2, . ~ . | . | ctry1 + ctry2), data = ross2004, vcov = "cluster" ) ``` ## Dyadic-robust: Cameron-Miller (2014) sandwich with cross-dyad correlations ```r fit_dyadic <- felm( update(fml2, . ~ . | . | ctry1 + ctry2), data = ross2004, vcov = "dyadic" ) ``` With the above models, we can replicate Table 4B from Cameron & Miller (2014) using `summary_table()`, another convenience function in capybara to display multiple models side-by-side. This completely avoids calling `summary()` on each model and then conducting post-processing to extract the relevant information for the table. ```r summary_table( fit_iid, fit_hetero, fit_pairs, fit_ctry1, fit_ctry2, fit_2way, fit_dyadic, model_names = c("IID", "Hetero", "Pairs", "Country 1", "Country 2", "2-Way", "Dyadic") ) ``` ```r | Variable | IID | Hetero | Pairs | Country 1 | Country 2 | 2-Way | Dyadic | |------------------|--------------------|--------------------------|--------------------|--------------------|--------------------|--------------------|--------------------| | bothin | -0.042** | -0.042* | -0.042 | -0.042 | -0.042 | -0.042 | -0.042 | | | (0.016) | (0.018) | (0.053) | (0.122) | (0.107) | (0.122) | (0.186) | | onein | -0.058** | -0.058** | -0.058 | -0.058 | -0.058 | -0.058 | -0.058 | | | (0.015) | (0.018) | (0.049) | (0.095) | (0.071) | (0.095) | (0.126) | | gsp | 0.859** | 0.859** | 0.859** | 0.859** | 0.859** | 0.859** | 0.859** | | | (0.011) | (0.009) | (0.032) | (0.098) | (0.074) | (0.098) | (0.131) | | ldist | -1.119** | -1.119** | -1.119** | -1.119** | -1.119** | -1.119** | -1.119** | | | (0.006) | (0.006) | (0.022) | (0.051) | (0.050) | (0.051) | (0.078) | | lrgdp | 0.916** | 0.916** | 0.916** | 0.916** | 0.916** | 0.916** | 0.916** | | | (0.003) | (0.003) | (0.010) | (0.024) | (0.028) | (0.024) | (0.043) | | lrgdppc | 0.321** | 0.321** | 0.321** | 0.321** | 0.321** | 0.321** | 0.321** | | | (0.004) | (0.004) | (0.014) | (0.033) | (0.038) | (0.033) | (0.052) | | regional | 1.199** | 1.199** | 1.199** | 1.199** | 1.199** | 1.199** | 1.199** | | | (0.036) | (0.029) | (0.106) | (0.222) | (0.185) | (0.222) | (0.333) | | custrict | 1.118** | 1.118** | 1.118** | 1.118** | 1.118** | 1.118** | 1.118** | | | (0.037) | (0.035) | (0.122) | (0.165) | (0.176) | (0.165) | (0.235) | | comlang | 0.313** | 0.313** | 0.313** | 0.313** | 0.313** | 0.313** | 0.313** | | | (0.011) | (0.011) | (0.040) | (0.081) | (0.065) | (0.081) | (0.111) | | border | 0.526** | 0.526** | 0.526** | 0.526** | 0.526** | 0.526** | 0.526* | | | (0.027) | (0.026) | (0.111) | (0.148) | (0.150) | (0.148) | (0.207) | | landl | -0.271** | -0.271** | -0.271** | -0.271** | -0.271** | -0.271** | -0.271* | | | (0.009) | (0.010) | (0.031) | (0.069) | (0.079) | (0.069) | (0.110) | | island | 0.042** | 0.042** | 0.042 | 0.042 | 0.042 | 0.042 | 0.042 | | | (0.009) | (0.009) | (0.036) | (0.105) | (0.083) | (0.105) | (0.154) | | lareap | -0.097** | -0.097** | -0.097** | -0.097** | -0.097** | -0.097** | -0.097* | | | (0.002) | (0.002) | (0.008) | (0.025) | (0.025) | (0.025) | (0.043) | | comcol | 0.585** | 0.585** | 0.585** | 0.585** | 0.585** | 0.585** | 0.585** | | | (0.016) | (0.019) | (0.067) | (0.126) | (0.108) | (0.126) | (0.178) | | curcol | 1.075** | 1.075** | 1.075** | 1.075* | 1.075** | 1.075* | 1.075* | | | (0.107) | (0.070) | (0.235) | (0.462) | (0.256) | (0.462) | (0.480) | | colony | 1.164** | 1.164** | 1.164** | 1.164** | 1.164** | 1.164** | 1.164** | | | (0.031) | (0.023) | (0.117) | (0.193) | (0.115) | (0.193) | (0.209) | | comctry | -0.016 | -0.016 | -0.016 | -0.016 | -0.016 | -0.016 | -0.016 | | | (0.262) | (0.205) | (1.081) | (0.884) | (1.077) | (0.884) | (0.859) | | | | | | | | | | | Fixed effects | | | | | | | | | year | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | | | | | | | | | N | 234,597 | 234,597 | 234,597 | 234,597 | 234,597 | 234,597 | 234,597 | | R-squared | 0.648 | 0.648 | 0.648 | 0.648 | 0.648 | 0.648 | 0.648 | | SE type | IID | Heteroskedastic-robust | Cluster-robust | Cluster-robust | Cluster-robust | Cluster-robust | Dyadic-robust | Standard errors in parenthesis Significance levels: ** p < 0.01; * p < 0.05; + p < 0.10 ```