For unbiased statistical analysis, the data must first be transformed to fit the model's assumptions. The AFR package ships with a default time-series dataset, macroKZ, which contains macroeconomic parameters for the 2010-2022 period. The dataset is raw and unordered, and may contain missing values and similar defects.
AFR recommends the following steps:
Step 1. Check the data for format, missing values, outliers and summary statistics (min, max, etc.).
Step 2. Check the data for stationarity.
Step 3. If the data are non-stationary, make them stationary with a suitable transformation method.
Step 4. Once the data are transformed, choose the regressors for the model.
Once the default dataset macroKZ is loaded, inspect it with the checkdata and summary functions. Depending on the output, apply the functions needed to remove undesirable properties of the data. For instance, if there are missing values, delete them.
data(macroKZ)
checkdata(macroKZ)
#> There are 0 missing items in the dataset.
#> There are 0 items in non-numeric format in the dataset.
#> There are 0 outliers in the dataset.
#> --------------------------------------------------------------------------------
#> Missing items
#> --------------------------------------------------------------------------------
#> [[1]]
#> real_gdp GDD_Agr_R GDD_Min_R
#> 0 0 0
#> GDD_Man_R GDD_Elc_R GDD_Con_R
#> 0 0 0
#> GDD_Trd_R GDD_Trn_R GDD_Inf_R
#> 0 0 0
#> GDD_Est_R GDD_R Rincpop_q
#> 0 0 0
#> Rexppop_q Rwage_q imp
#> 0 0 0
#> exp usdkzt eurkzt
#> 0 0 0
#> rurkzt poil GDP_DEF
#> 0 0 0
#> cpi realest_resed_prim realest_resed_sec
#> 0 0 0
#> realest_comm index_stock_weighted ntrade_Agr
#> 0 0 0
#> ntrade_Min ntrade_Man ntrade_Elc
#> 0 0 0
#> ntrade_Con ntrade_Trd ntrade_Trn
#> 0 0 0
#> ntrade_Inf fed_fund_rate govsec_rate_kzt_3m
#> 0 0 0
#> govsec_rate_kzt_1y govsec_rate_kzt_7y govsec_rate_kzt_10y
#> 0 0 0
#> tonia_rate rate_kzt_mort_0y_1y rate_kzt_mort_1y_iy
#> 0 0 0
#> rate_kzt_corp_0y_1y rate_usd_corp_0y_1y rate_kzt_corp_1y_iy
#> 0 0 0
#> rate_usd_corp_1y_iy rate_kzt_indv_0y_1y rate_kzt_indv_1y_iy
#> 0 0 0
#> realest_resed_prim_rus realest_resed_sec_rus cred_portfolio
#> 0 0 0
#> coef_k1 coef_k3 provisions
#> 0 0 0
#> percent_margin com_inc com_exp
#> 0 0 0
#> oper_inc oth_inc DR
#> 0 0 0
#>
#> --------------------------------------------------------------------------------
#>
#> --------------------------------------------------------------------------------
#> Numeric format
#> --------------------------------------------------------------------------------
#> [[1]]
#> real_gdp GDD_Agr_R GDD_Min_R
#> 0 0 0
#> GDD_Man_R GDD_Elc_R GDD_Con_R
#> 0 0 0
#> GDD_Trd_R GDD_Trn_R GDD_Inf_R
#> 0 0 0
#> GDD_Est_R GDD_R Rincpop_q
#> 0 0 0
#> Rexppop_q Rwage_q imp
#> 0 0 0
#> exp usdkzt eurkzt
#> 0 0 0
#> rurkzt poil GDP_DEF
#> 0 0 0
#> cpi realest_resed_prim realest_resed_sec
#> 0 0 0
#> realest_comm index_stock_weighted ntrade_Agr
#> 0 0 0
#> ntrade_Min ntrade_Man ntrade_Elc
#> 0 0 0
#> ntrade_Con ntrade_Trd ntrade_Trn
#> 0 0 0
#> ntrade_Inf fed_fund_rate govsec_rate_kzt_3m
#> 0 0 0
#> govsec_rate_kzt_1y govsec_rate_kzt_7y govsec_rate_kzt_10y
#> 0 0 0
#> tonia_rate rate_kzt_mort_0y_1y rate_kzt_mort_1y_iy
#> 0 0 0
#> rate_kzt_corp_0y_1y rate_usd_corp_0y_1y rate_kzt_corp_1y_iy
#> 0 0 0
#> rate_usd_corp_1y_iy rate_kzt_indv_0y_1y rate_kzt_indv_1y_iy
#> 0 0 0
#> realest_resed_prim_rus realest_resed_sec_rus cred_portfolio
#> 0 0 0
#> coef_k1 coef_k3 provisions
#> 0 0 0
#> percent_margin com_inc com_exp
#> 0 0 0
#> oper_inc oth_inc DR
#> 0 0 0
#>
#> --------------------------------------------------------------------------------
#>
#> --------------------------------------------------------------------------------
#> Outliers
#> --------------------------------------------------------------------------------
#> [[1]]
#> real_gdp GDD_Agr_R GDD_Min_R
#> 0 0 0
#> GDD_Man_R GDD_Elc_R GDD_Con_R
#> 0 0 0
#> GDD_Trd_R GDD_Trn_R GDD_Inf_R
#> 0 0 0
#> GDD_Est_R GDD_R Rincpop_q
#> 0 0 0
#> Rexppop_q Rwage_q imp
#> 0 0 0
#> exp usdkzt eurkzt
#> 0 0 0
#> rurkzt poil GDP_DEF
#> 0 0 0
#> cpi realest_resed_prim realest_resed_sec
#> 0 0 0
#> realest_comm index_stock_weighted ntrade_Agr
#> 0 0 0
#> ntrade_Min ntrade_Man ntrade_Elc
#> 0 0 0
#> ntrade_Con ntrade_Trd ntrade_Trn
#> 0 0 0
#> ntrade_Inf fed_fund_rate govsec_rate_kzt_3m
#> 0 0 0
#> govsec_rate_kzt_1y govsec_rate_kzt_7y govsec_rate_kzt_10y
#> 0 0 0
#> tonia_rate rate_kzt_mort_0y_1y rate_kzt_mort_1y_iy
#> 0 0 0
#> rate_kzt_corp_0y_1y rate_usd_corp_0y_1y rate_kzt_corp_1y_iy
#> 0 0 0
#> rate_usd_corp_1y_iy rate_kzt_indv_0y_1y rate_kzt_indv_1y_iy
#> 0 0 0
#> realest_resed_prim_rus realest_resed_sec_rus cred_portfolio
#> 0 0 0
#> coef_k1 coef_k3 provisions
#> 0 0 0
#> percent_margin com_inc com_exp
#> 0 0 0
#> oper_inc oth_inc DR
#> 0 0 0
#>
#> --------------------------------------------------------------------------------
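The cleaning step can be sketched in base R as follows. This is only an illustration on a small made-up data frame: `toy` and its columns are hypothetical and are not taken from macroKZ, and the sketch does not reproduce what checkdata does internally.

```r
# Hypothetical toy data with two missing values (not taken from macroKZ)
toy <- data.frame(
  gdp = c(100, 102, NA, 107, 110),
  cpi = c(5.1, 5.3, 5.2, NA, 5.6)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Summary statistics (min, max, quartiles, mean) for each column
print(summary(toy))

# Drop every row that contains at least one missing value
toy_clean <- na.omit(toy)
print(nrow(toy_clean))
```

Note that dropping rows from a time series leaves gaps in the time index, so deletion is one option among several and should be checked against the needs of the later steps.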
Once the dataset has been preliminarily cleaned, the time series must be checked for stationarity. Stationarity means that the statistical properties of a series, such as its mean and variance, do not depend on the time period, i.e. they are constant over time. In R, stationarity can be checked with the Augmented Dickey-Fuller (adf.test) and/or Kwiatkowski-Phillips-Schmidt-Shin (kpss.test) tests, available for example in the tseries package.
In more detail, the sapply function can be applied to macroKZ to see which parameters are stationary and which are not.
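A minimal sketch of such a column-wise check, assuming the tseries package is installed. The data frame `toy` and its two simulated series are hypothetical stand-ins rather than macroKZ columns, and the 5% cut-off is an illustrative choice.

```r
library(tseries)  # provides adf.test() and kpss.test()

set.seed(1)
n <- 200
# Hypothetical series: white noise is stationary, a random walk is not
toy <- data.frame(
  noise = rnorm(n),
  walk  = cumsum(rnorm(n))
)

# ADF: the null hypothesis is a unit root (non-stationarity),
# so a small p-value supports stationarity
adf_p <- sapply(toy, function(x) suppressWarnings(adf.test(x)$p.value))

# KPSS: the null hypothesis is stationarity,
# so a small p-value points to non-stationarity
kpss_p <- sapply(toy, function(x) suppressWarnings(kpss.test(x)$p.value))

print(round(rbind(adf_p, kpss_p), 3))

# Flag a series as stationary when ADF rejects its null at the 5% level
adf_stationary <- adf_p < 0.05
print(adf_stationary)
```

The two tests have opposite null hypotheses, so reading them together is more informative than relying on either alone. sapply iterates over columns only for data frames and lists; if the data are stored as a matrix or a multivariate ts object, convert with as.data.frame() first or use apply(x, 2, ...).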
If the dataset as a whole, or individual parameters, are non-stationary, it is recommended to apply transformation techniques to make the data stationary. The most common transformations are differencing (first and second order), taking logarithms, differencing the logarithms, and detrending. After applying the transformation(s), check again that the data are stationary.
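The transformations listed above can be sketched with base R alone. The series `x` is simulated for illustration and is not a macroKZ variable; the detrending shown is a simple linear detrend, which is one of several possible approaches.

```r
set.seed(42)
n <- 100
# Hypothetical positive series with an exponential trend
x <- exp(0.02 * seq_len(n) + rnorm(n, sd = 0.05))

d1  <- diff(x)                   # first difference
d2  <- diff(x, differences = 2)  # second difference
lx  <- log(x)                    # logarithm (requires x > 0)
dlx <- diff(log(x))              # difference of logarithms (approximate growth rate)

# Linear detrending: residuals from regressing log(x) on a time index
t_idx     <- seq_len(n)
detrended <- residuals(lm(lx ~ t_idx))

# Each difference shortens the series by one observation
print(c(length(d1), length(d2), length(dlx), length(detrended)))
```

Because differencing drops observations, the transformed series have to be re-aligned before they are combined in one model.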
To build a reliable regression model, the regressors (independent variables) should not be strongly correlated with each other. If this condition is violated, multicollinearity is present: the coefficient estimates become unstable and their standard errors are inflated. The AFR package offers the corsel function, which estimates the correlation between regressors in the dataset against a threshold set by the user. The result can be returned as numeric correlations or as logical values (TRUE/FALSE).
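The idea behind a threshold-based correlation check can be illustrated with base R. This sketch does not call corsel and makes no claim about how it is implemented; the regressors `x1`, `x2`, `x3` and the cut-off of 0.65 are hypothetical.

```r
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1
x3 <- rnorm(n)                 # unrelated to x1 and x2
regs <- data.frame(x1, x2, x3)

threshold <- 0.65              # hypothetical user-chosen cut-off

cor_num <- cor(regs)                  # numeric view: pairwise correlations
cor_lgl <- abs(cor_num) > threshold   # logical view: TRUE where the cut-off is exceeded
diag(cor_lgl) <- FALSE                # ignore each variable's correlation with itself

print(round(cor_num, 2))
print(cor_lgl)
```

Pairs flagged TRUE are candidates for dropping one of the two variables before the model is fitted.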
Once the regressors are chosen, a linear regression model can be built with the lm function.
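A minimal lm sketch on simulated data follows. The variables `y`, `x1` and `x2` are hypothetical and the true coefficients are chosen only for illustration; with macroKZ, the formula would instead name the selected (and transformed) regressors.

```r
set.seed(123)
n <- 80
# Hypothetical regressors and a response generated from a known linear model
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + rnorm(n, sd = 0.3)

# Fit the linear regression of y on both regressors
fit <- lm(y ~ x1 + x2, data = toy)

print(coef(fit))               # estimated intercept and slopes
print(summary(fit)$r.squared)  # share of variance explained
```

summary(fit) also reports standard errors and p-values for each coefficient, which is where the effect of any remaining multicollinearity would show up.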