The search.sur() function is one of the three main functions in the ldt
package. This vignette explains the basic usage of this function using
the World Bank dataset (World Bank, 2022). Output growth is a widely
discussed topic in economics. Several factors can influence the rate and
quality of output growth, including physical and human capital,
technological progress, institutions, trade openness, and macroeconomic
stability (Chirwa and Odhiambo, 2016). We will use this package to
identify the long-run determinants of GDP per capita growth while making
minimal assumptions.
To minimize user discretion, we use all available data to select the set of potential regressors. Additionally, to avoid the endogeneity problem, we use information from before 2005 to explain the dependent variable after this year. This results in 571 potential regressors and 208 observations.
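The timing split described above can be sketched in base R. This is only an illustration under stated assumptions: the panel, its column names (country, year, gdp_pc, invest), and the avg_growth() helper are hypothetical and are not part of ldt or of the World Bank data. Regressors are summarized over the years up to the cutoff, and the dependent variable over the years from the cutoff onward.

```r
# Hypothetical long-format panel: one row per country and year.
# All names and numbers are illustrative, not the World Bank data.
# Rows are assumed to be ordered by year within each country.
panel <- data.frame(
  country = rep(c("A", "B"), each = 6),
  year    = rep(2002:2007, times = 2),
  gdp_pc  = c(100, 102, 105, 107, 110, 114,
              200, 198, 201, 205, 204, 210),
  invest  = c(20, 21, 22, 23, 24, 25,
              30, 29, 31, 30, 32, 33)
)

cutoff <- 2005

# Average annual growth rate (in percent) of a series of levels
avg_growth <- function(x) 100 * mean(diff(log(x)))

# The cutoff-year level is the end point of the "before" window
# and the starting point of the "after" window.
before <- panel[panel$year <= cutoff, ]
after  <- panel[panel$year >= cutoff, ]

# Regressor measured before the cutoff, dependent variable after it
x_pre  <- sapply(split(before$invest, before$country), avg_growth)
y_post <- sapply(split(after$gdp_pc, after$country), avg_growth)

# One row per country: a cross-section with a lagged-information regressor
cross_section <- data.frame(y_post = y_post, x_pre = x_pre[names(y_post)])
print(cross_section)
```

Because every regressor is dated before the period over which the dependent variable is measured, later outcomes cannot feed back into the regressors in this toy setup.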
For this illustration, however, we use only the dependent variable and the first five of the potential regressors.
Here are the last few observations from this subset of the data:
tail(data)
#> NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> WSM 0.6948973 NA NA NA
#> XKX 3.5026405 NA NA NA
#> YEM -5.7036924 NA NA NA
#> ZAF -0.2084907 0.83394060 -1.533726 0.2149429
#> ZMB 1.9830446 -0.63088082 NA NA
#> ZWE 1.2915497 -0.05297394 NA -3.1003477
#> AG.CON.FERT.ZS AG.LND.AGRI.K2
#> WSM 5.16498292 -0.65382289
#> XKX NA NA
#> YEM 14.88937834 0.01804223
#> ZAF 2.20864028 -0.08807695
#> ZMB 4.42032159 0.37414717
#> ZWE -0.01642054 0.86883765
And here are some summary statistics for each variable:
sapply(as.data.frame(data), summary)
#> NY.GDP.PCAP.KD NY.GDP.PCAP.KD.lag AG.AGR.TRAC.NO AG.CON.FERT.PT.ZS
#> Min. -5.7036924 -2.7562067 -1.533726 -16.9560997
#> 1st Qu. -0.1431228 0.7235014 1.308611 -2.9008332
#> Median 1.0235845 1.7697597 2.876800 -1.2511855
#> Mean 1.1094147 1.9232678 3.856278 -1.6759932
#> 3rd Qu. 2.4052532 2.8698123 5.600846 0.2268538
#> Max. 7.1613101 12.7823340 20.814750 7.3208970
#> NA's 9.0000000 73.0000000 134.000000 146.0000000
#> AG.CON.FERT.ZS AG.LND.AGRI.K2
#> Min. -6.526751 -6.62157767
#> 1st Qu. 1.310606 -0.29306446
#> Median 4.326556 0.01489903
#> Mean 4.329299 0.05915160
#> 3rd Qu. 6.856201 0.57567407
#> Max. 15.949830 2.23869809
#> NA's 80.000000 7.00000000
The columns of the data represent the following variables:
NY.GDP.PCAP.KD: GDP per capita (constant 2015 US$)
AG.AGR.TRAC.NO: Agricultural machinery, tractors
AG.CON.FERT.PT.ZS: Fertilizer consumption (% of fertilizer production)
AG.CON.FERT.ZS: Fertilizer consumption (kilograms per hectare of arable land)
AG.LND.AGRI.K2: Agricultural land (sq. km)
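We use the AIC metric to find the four best explanatory models. Note
that we restrict the model set by limiting the model sizes through the
sizes argument. Because there is only one endogenous variable here,
these sizes apply to the number of explanatory variables in the
equation, which is consistent with the five models reported in the
output below. Note also that the intercept and the lag of the dependent
variable are included in all equations through the numFixPartitions
argument.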
search_res <- search.sur(data = get.data(data, endogenous = 1),
                         combinations = get.combinations(sizes = c(1, 2, 3),
                                                         numTargets = 1,
                                                         numFixPartitions = 2),
                         metrics = get.search.metrics(typesIn = c("aic")),
                         items = get.search.items(bestK = 4))
print(search_res)
#> LDT search result:
#> Method in the search process: SUR
#> Expected number of models: 5, searched: 5 , failed: 0 (0%)
#> Elapsed time: 0.01667507 minutes
#> Length of results: 4
#> --------
#> Target (NY.GDP.PCAP.KD):
#> Evaluation (aic):
#> Best model:
#> endogenous: NY.GDP.PCAP.KD
#> exogenous: (3x1) (Intercept), NY.GDP.PCAP.KD.lag, AG.CON.FERT.PT.ZS
#> metric: 213.8385
#> --------
#> ** results for 4 best model(s) are saved
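The "Expected number of models: 5" line above can be reproduced with a short base R enumeration. This is a sketch, not the way ldt builds its model set internally. It assumes that the sizes count the explanatory variables of the single equation (intercept included) and that the first two variables are the fixed ones.

```r
# Candidate explanatory variables, with the two fixed ones first
exo <- c("(Intercept)", "NY.GDP.PCAP.KD.lag", "AG.AGR.TRAC.NO",
         "AG.CON.FERT.PT.ZS", "AG.CON.FERT.ZS", "AG.LND.AGRI.K2")

num_fixed <- 2           # variables that appear in every model
sizes     <- c(1, 2, 3)  # allowed model sizes (assumed to count regressors)

fixed <- exo[seq_len(num_fixed)]
free  <- exo[-seq_len(num_fixed)]

models <- list()
for (s in sizes) {
  k <- s - num_fixed  # number of free variables to add to the fixed set
  if (k < 0 || k > length(free)) next  # this size cannot hold the fixed set
  if (k == 0) {
    # Only the fixed variables
    models[[length(models) + 1]] <- fixed
    next
  }
  # Every way of choosing k free variables
  for (p in combn(free, k, simplify = FALSE)) {
    models[[length(models) + 1]] <- c(fixed, p)
  }
}

length(models)  # 5
```

Under these assumptions, size 1 yields no model (it cannot hold both fixed variables), size 2 yields the model with only the fixed variables, and size 3 yields one model for each of the four remaining regressors.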
The output of the search.sur() function does not contain any estimation
results, only the information required to replicate them. The summary()
function returns a similar structure, but with the estimation results
included.
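The following code generates a table that presents the results.
# Assumption: search_sum holds the summary of the search result described above
search_sum <- summary(search_res)
# Extract the estimated model at each rank (info is 0-based) among the best models
models <- lapply(0:3, function(i)
  search_sum$results[which(sapply(search_sum$results, function(d)
    d$info == i && d$typeName == "best model"))][[1]]$value)
names(models) <- paste("Best", 1:4)
# Build a coefficient table that also reports observations, AIC, and SIC
table <- coefs.table(models, latex = FALSE,
                     regInfo = c("obs", "aic", "sic"))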
| | Best 1 | Best 2 | Best 3 | Best 4 |
|---|---|---|---|---|
| (Intercept) | 0.34 | 0.80* | 0.41 | 0.85*** |
| NY.GDP.PCAP.KD.lag | 0.41* | -0.10 | 0.20* | 0.04 |
| AG.CON.FERT.PT.ZS | 0.08 | | | |
| AG.AGR.TRAC.NO | | 0.07 | | |
| AG.CON.FERT.ZS | | | 0.08** | |
| AG.LND.AGRI.K2 | | | | 0.21 |
| obs | 51 | 58 | 106 | 133 |
| aic | 213.84 | 234.47 | 430.61 | 546.35 |
| sic | 219.63 | 240.65 | 438.60 | 555.02 |
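The obs row differs across the four models. One plausible reading, which the source does not confirm, is that each model uses only the rows with no missing value in any of its own variables. The toy sketch below (made-up data and a hypothetical usable_rows() helper) shows how that rule produces a different sample size for each set of regressors.

```r
# Toy data with missing values (illustrative only, not the World Bank data)
toy <- data.frame(
  y  = c(1.2, 0.5, NA, 2.1, 1.7, 0.9),
  x1 = c(0.3, NA, 0.8, 1.1, NA, 0.4),
  x2 = c(2.0, 1.5, 1.1, NA, 0.7, 0.2)
)

# Rows usable for a model: those with no NA in any of its variables
usable_rows <- function(df, vars) {
  sum(complete.cases(df[, vars, drop = FALSE]))
}

n_x1   <- usable_rows(toy, c("y", "x1"))        # 3
n_x2   <- usable_rows(toy, c("y", "x2"))        # 4
n_both <- usable_rows(toy, c("y", "x1", "x2"))  # 2
```

If the package does drop rows this way, models with sparsely observed regressors are estimated on smaller samples, which is worth keeping in mind when reading the aic and sic rows side by side.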
This package can be a useful tool for empirical studies that need to
reduce assumptions and summarize the results of an uncertainty analysis.
This vignette is only a demonstration; the search.sur() function offers
other options to explore. For instance, you can experiment with
different evaluation metrics or restrict the model set based on your
specific needs. Additionally, you can combine modeling with Principal
Component Analysis (PCA) (see the estim.sur() function). I encourage you
to experiment with these options and see how they can enhance your data
analysis.
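As a rough idea of what combining modeling with PCA means, the sketch below uses only base R (prcomp() and lm()) on simulated data. It does not use estim.sur() or any ldt PCA option, whose arguments are not shown in this vignette. The 90% variance threshold is an arbitrary choice for the example.

```r
set.seed(1)
n <- 100

# Simulated regressors (illustrative only): two pairs of correlated columns
z1 <- rnorm(n)
z2 <- rnorm(n)
x <- cbind(x1 = z1 + rnorm(n, sd = 0.1),
           x2 = z1 + rnorm(n, sd = 0.1),
           x3 = z2 + rnorm(n, sd = 0.1),
           x4 = z2 + rnorm(n, sd = 0.1))
y <- 0.5 * z1 - 0.3 * z2 + rnorm(n, sd = 0.5)

# Principal components of the standardized regressors
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Keep the leading components that explain at least 90% of the variance
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(explained >= 0.9)[1]
scores <- pca$x[, seq_len(k), drop = FALSE]

# Regress the dependent variable on the retained components
fit <- lm(y ~ scores)
AIC(fit)
```

Replacing many correlated regressors with a few components keeps the number of estimated coefficients small, at the cost of coefficients that no longer refer to individual variables.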