Title: Simulate Time Series Diagnostics
Description: These are tools that allow users to do time series diagnostics, primarily tests of unit root, by way of simulation. While there is nothing necessarily wrong with the received wisdom of critical values generated decades ago, simulation provides its own perks. Not only is simulation broadly informative as to what these various test statistics do and what their plausible values are, simulation provides more flexibility for assessing unit root by way of different thresholds or different hypothesized distributions.
Authors: Steven Miller [aut, cre]
Maintainer: Steven Miller <[email protected]>
License: GPL (>= 2)
Version: 0.2.0
Built: 2025-02-26 07:00:34 UTC
Source: CRAN
adf_lag_select() runs a series of (Augmented) Dickey-Fuller tests and returns information that may (or may not) be useful in identifying a potential lag order for unit root tests.
adf_lag_select(x, min_lag = 0, max_lag = NULL)
x: a vector
min_lag: the minimum lag order to use. Defaults to 0.
max_lag: the maximum lag order to use. Defaults to Schwert's (1989) upper lag.
This function removes missing values from the vector before calculating test statistics.
The lower bound lag order suggested by Schwert (1989) and the default suggested
by Said and Dickey (1984) do not meaningfully separate from each other until
the length of the series reaches 127. Under those conditions, if the note
column returned by this function for a finite series does not identify the
Said and Dickey (1984) default, but identifies the Schwert (1989) lower
bound, interpret the latter as the former.
adf_lag_select() returns a list of length 3. The first element in the list is a data frame of a series of (Augmented) Dickey-Fuller tests for no drift, no trend. The second is a data frame of a series of (Augmented) Dickey-Fuller tests for drift, no trend. The third is a data frame of a series of (Augmented) Dickey-Fuller tests for drift and trend. Each data frame has columns communicating the following information:
- The lag order.
- The (A)DF statistic for the lag order.
- The Akaike information criterion for the model.
- Schwarz's (Bayesian) criterion for the model.
- The absolute value of the last lagged first difference in the model.
- The "modified" Akaike information criterion for the model.
- The "modified" Schwarz's (Bayesian) criterion for the model.
- A note indicating if the lag was suggested by Schwert (1989) or Said and Dickey (1984).
Steven V. Miller
x <- head(tbills$tb3m, 500)
adf_lag_select(x)
A data frame on monthly dyadic foreign policy behavior from 1948 to 1978 for select dyads, using COPDAB data. The data offer the opportunity for a basic replication of Lebo and Moore (2003).
exCopdab
A data frame with 372 observations on the following 12 variables.
ym: a year-month indicator, in the format of YYMM
eg2is: an estimate of the dyadic foreign policy behavior of Egypt to Israel
is2eg: an estimate of the dyadic foreign policy behavior of Israel to Egypt
us2ussr: an estimate of the dyadic foreign policy behavior of the U.S. to the Soviet Union
ussr2us: an estimate of the dyadic foreign policy behavior of the Soviet Union to the U.S.
us2fra: an estimate of the dyadic foreign policy behavior of the U.S. to France
fra2us: an estimate of the dyadic foreign policy behavior of France to the U.S.
us2is: an estimate of the dyadic foreign policy behavior of the U.S. to Israel
is2us: an estimate of the dyadic foreign policy behavior of Israel to the U.S.
suez: a dummy variable indicating if the observation corresponds with the Suez Crisis
sixday: a dummy variable indicating if the observation corresponds with the Six-Day War
yomk: a dummy variable indicating if the observation corresponds with the Yom Kippur War
Lebo and Moore (2003, 22-24) offer more context about how these variables are coded. Important details for replication from scratch are assuredly lost to history, but the authors are clear about what they're doing and the procedure they used to weight fundamentally ordinal data to create some kind of continuous estimate. Context clues offer more information as well.
Lebo, Matthew J. and Will H. Moore. 2003. "Dynamic Foreign Policy Behavior." Journal of Conflict Resolution 47(1): 13-32.
A data frame on various suggestions for lags for your time series, given the length of your time series. You are not compelled to use these. These are just suggestions.
lag_suggests
A data frame with 1000 observations on the following 5 variables.
n: an integer corresponding with an assumed length of your time series
schwert_ub: the upper bound lag order suggested by Schwert (1989) for a time series of that length
schwert_lb: the lower bound lag order suggested by Schwert (1989) for a time series of that length
qiuetal2013: the suggested lag order from Qiu et al. (2013)
sd84: the suggested lag order from Said and Dickey (1984)
The lower bound lag order suggested by Schwert (1989) and the default suggested by Said and Dickey (1984) do not meaningfully separate from each other until the length of the series reaches 127. You should think long and hard about doing any of this if your time series is so finite that it has fewer than 25 observations.
The Qiu et al. (2013) suggestion is the default lag if you're using the aTSA package. It is almost equivalent to the Schwert (1989) lower bound, except the length of the series is raised to 2/9 and not 2/8. The two do not meaningfully separate until the length of the series reaches 5,720 observations (which is when the difference between the two reaches two lags of separation).
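This separation is easy to check in base R. A quick sketch, assuming the formulas just described (the Schwert (1989) lower bound as floor(4*(n/100)^(1/4)) and the Qiu et al. (2013) lag as floor(4*(n/100)^(2/9))):

n <- 1:6000
schwert_lb <- floor(4 * (n/100)^(1/4))
qiuetal2013 <- floor(4 * (n/100)^(2/9))
min(n[schwert_lb - qiuetal2013 >= 2]) # first length with two lags of separation: 5720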
Qiu, D., Q. Shao, and L. Yang. 2013. "Efficient Inference for Autoregressive Coefficients in the Presence of Trends." Journal of Multivariate Analysis 114: 40–53.
Said, Said E. and David A. Dickey. 1984. "Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order." Biometrika 71(3): 599-607.
Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.
A data frame of quarterly indicators useful for modeling the demand for money in the United States. Data go from the first quarter of 1960 to the third quarter of 2024.
money_demand
A data frame with 259 observations on the following 7 variables.
date: a date
m1: the so-called 'narrow' money supply (M1), in billions, not seasonally adjusted
m2: the monetary supply (M2), in billions, not seasonally adjusted
gnpdef: an implicit price deflator for gross national product (index, 2017 = 100)
ffer: the federal funds effective rate
rgnp: real gross national product (in 2017 dollars)
pcepi: the chain-type price index for personal consumption expenditures (index, 2017 = 100)
Data come by way of a fredr call. Be mindful of changes in the definition of the money supply, especially as they manifest in May 2020. Subject domain expertise goes a long way here. The "M2" indicator is the "M1" indicator plus small time deposits that are "close substitutes" for M1.
sadf_test() provides a simulation approach to assessing unit root in a time series by way of the (Augmented) Dickey-Fuller test. It takes a vector and performs three (Augmented) Dickey-Fuller tests (no drift, no trend; drift, no trend; drift and trend) and calculates tau statistics as one normally would. Rather than interpolate or approximate a p-value, it simulates some user-specified number of (Augmented) Dickey-Fuller tests of either a known, non-stationary time series or a known, white-noise time series matching the length of the time series the user provides. This allows the user to make assessments of non-stationarity or stationarity by way of simulation rather than approximation from received critical values by way of books or tables some years out of date.
sadf_test(x, n_lags = NULL, n_sims = 1000, sim_hyp = "nonstationary")
x: a vector
n_lags: defaults to NULL, but must be 0 or a positive integer. This argument determines the number of lagged first differences to include in the estimation procedure. Recall that the test statistic (tau) is still the t-statistic for the level value of the vector at t-1, whether the constant (drift) and time trend are included or not. If this value is 0, the procedure is the classic Dickey-Fuller test. If this value is greater than 0, this is the "augmented" Dickey-Fuller test, so-called because it is "augmented" by the number of lagged first differences to assess higher-order AR processes. If no argument is specified, the default lag is Schwert's suggested lower bound.
n_sims: the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.
sim_hyp: can be either "stationary" or "nonstationary". If "stationary", the function runs (A)DF tests on simulated stationary (pure white noise) data. This allows the user to assess compatibility/plausibility of the test statistic against a distribution of test statistics that are known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets of a pure random walk, a random walk with a drift, and a random walk with a drift and trend. It then runs (A)DF tests on all those. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form.
The Dickey-Fuller test and its "augmented" corollary are curious statistical procedures, even if the underlying concept is straightforward. I have seen various implementations of these procedures use slightly different terminology to describe the procedure, though this particular implementation imposes nomenclature in which the classic Dickey-Fuller procedure, assuming just the AR(1) process, is one in which n_lags is 0. The addition of lags (of first differences) is what ultimately makes the Dickey-Fuller procedure "augmented."
The function employs the default suggested by Schwert (1989) for the number of lagged first differences to include in this procedure. Schwert (1989) recommends taking the length of the series, dividing it by 100, and raising that number to the power of 1/4. Thereafter, multiply it by 12 and round the result down to the nearest integer. There are other suggested defaults you can consider. adf.test in aTSA takes the length of the series, divides it by 100, and raises it to the power of 2/9. It multiplies that by 4 and floors the result. adf.test in tseries subtracts 1 from the length of the series before raising it to the power of 1/3 (flooring that result as well). The Examples section will show you how you can do this.
This function specifies three different types of tests: 1) no drift, no trend, 2) drift, no trend, and 3) drift and trend. In the language of the lm() function, the first is lm(y ~ ly - 1), where y is the value of the series and ly is its first-order lag. The second test is lm(y ~ ly), intuitively suggesting the y-intercept in this equation is the "drift". The third would be lm(y ~ ly + t), with t being a simple integer that increases by 1 for each observation (i.e. a time trend).
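A minimal sketch of those three specifications on a simulated random walk. One caveat: in this level form, tau is the t-statistic for testing that the coefficient on ly equals 1, not the zero-null t-statistic that summary() prints, so the last line computes it by hand.

set.seed(8675309)
y <- cumsum(rnorm(100)) # a pure random walk
ly <- c(NA, head(y, -1)) # first-order lag of y
tt <- seq_along(y) # the time trend (t in the text; renamed to avoid masking base::t)

m1 <- lm(y ~ ly - 1) # no drift, no trend
m2 <- lm(y ~ ly) # drift, no trend
m3 <- lm(y ~ ly + tt) # drift and trend

(coef(summary(m1))["ly", "Estimate"] - 1)/coef(summary(m1))["ly", "Std. Error"] # tau, by hand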
None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationarity, and I will confess their expertise on these matters outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.
This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).
This function removes missing values from the vector before calculating test statistics.
sadf_test() returns a list of length 3. The first element in the list is a matrix of tau statistics calculated by the test. The second element is a data frame of the simulated tau statistics of either a known white-noise time series or three different non-stationary time series (pure random walk, random walk with drift, random walk with drift and trend). The third element contains some attributes about the procedure for post-processing.
Steven V. Miller
Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.
y <- na.omit(USDSEK[1:500, ])$close # there is one missing value here; n = 499
sadf_test(y, n_sims = 25) # doing 25, just to make it quick
sim_df_mod() is designed as a helper function, to be used internally in this package in sadf_test(). But you can use it here to simulate a time series and perform a(n Augmented) Dickey-Fuller test.
sim_df_mod(x, ts_type, df_lags = NULL, classic_df = FALSE, wn = FALSE)
x: a numeric vector corresponding to a series to replicate/simulate
ts_type: a type of time series to simulate (either 'ndnt', 'dnt', or 'dt')
df_lags: a numeric vector for the number of lags to calculate for the test
classic_df: logical, defaults to FALSE. If FALSE, the function calculates an "augmented" Dickey-Fuller test on a simulated series with the number of lagged first differences requested in the df_lags argument. If TRUE, the function calculates the classic Dickey-Fuller test (i.e. no lagged first differences).
wn: logical, defaults to FALSE. If FALSE, generates a random walk of some description for a DF/ADF test. If TRUE, the series to be simulated for a DF/ADF test is white noise.
classic_df = TRUE suppresses the need to specify df_lags = 0, but df_lags cannot be 0 if classic_df = FALSE.
This might change in future iterations, but it's worth clarifying the values assigned to the parameters of a drift and trend. The drift is randomly generated from a Rademacher distribution for both the time series with drift and with drift-and-trend. The series with a deterministic trend divides the value from the Rademacher distribution by 10. My rationale is largely based on what I've seen other pedagogical guides do, to the extent they talk about simulating values for these types of random walks.
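A hedged sketch of those draws through this package's own sim_ts(); treating sim_ts() as the vehicle here is my illustration, not necessarily how the package wires it internally:

b0 <- sample(c(-1, 1), size = 1) # a Rademacher draw: +1 or -1 with equal probability
sim_ts(100, b0 = b0) # random walk with drift
sim_ts(100, b0 = b0, bt = b0/10) # random walk with drift and trend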
sim_df_mod() returns the output of a linear model (with class lm) that performs a(n Augmented) Dickey-Fuller test on a simulated time series. This is mostly for internal use, but it might pique the user's interest to see such a test in action independent of simulated summaries generated by sadf_test().
Steven V. Miller
set.seed(8675309) # don't want new numbers in documentation every time...
sim_df_mod(rnorm(25), ts_type = 'ndnt', classic_df = TRUE)
sim_df_mod(rnorm(25), ts_type = 'ndnt', df_lags = 2, classic_df = FALSE)
sim_ts() is mostly a helper function, to be used internally in this package, but you can use it here to simulate a time series.
sim_ts(n, b0 = 0, bt = 0, rho = 1, white_noise = FALSE, rsd = 1)
n: a numeric vector for the length of the series
b0: a numeric vector for a potential drift in the series. Defaults to 0.
bt: a numeric vector for a potential trend in the series. Defaults to 0.
rho: a numeric vector for the simple autoregressive parameter. Defaults to 1.
white_noise: logical, defaults to FALSE. If FALSE, generates a random walk. If TRUE, the series is white noise.
rsd: the standard deviation for a normal distribution to be simulated. Defaults to 1.
sim_ts() returns a numeric vector of a simulated time series that would follow the user's input.
Steven V. Miller
set.seed(8675309) # don't want new numbers in documentation every time...
sim_ts(25)
sim_ts(25, b0 = 1)
sim_ts(25, b0 = 1, bt = .05)
skpss_test() provides a simulation approach to assessing unit root in a time series by way of the KPSS test, first proposed by Kwiatkowski et al. (1992). It takes a vector and extracts the residuals from two models to assess stationarity around a level or trend, producing a KPSS test statistic (eta). Rather than interpolate or approximate a p-value, it simulates some user-specified number of KPSS tests of either a known, stationary time series (default) or a known, non-stationary time series matching the length of the time series the user provides. This allows the user to make assessments of non-stationarity or stationarity by way of simulation rather than approximation from received critical values by way of various books/tables.
skpss_test(x, lag_short = TRUE, n_sims = 1000, sim_hyp = "stationary")
x: a vector
lag_short: logical, defaults to TRUE. If TRUE, the short-term lag is used for the Bartlett kernel. If FALSE, the long-term lag is used.
n_sims: the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.
sim_hyp: can be either "stationary" or "nonstationary". If "stationary" (default), the function runs KPSS tests on simulated stationary (pure white noise) data. This allows the user to assess compatibility/plausibility of the test statistic against a distribution of test statistics that are known to be pure white noise (in expectation). If "nonstationary", the simulations are conducted on two different random walks. The "trend" test includes a level drawn from a Rademacher distribution with a time trend of that level, divided by 10.
Recall that this procedure differs from almost every other unit root test by making stationarity the null hypothesis. Non-stationarity is the alternative hypothesis.
As of writing, the default lags are those suggested by Schwert (1989) to apply to the Bartlett kernel generating the KPSS test statistic (eta).
aTSA has a particularly interesting approach to this test that draws on insights seemingly proposed by Hobijn et al. (2004). Future updates may include those, but this function, as is, performs calculations of eta identical to what tseries or urca would produce. Right now, I don't have the time or energy to get into the weeds of what Hobijn et al. (2004) are doing or what aTSA is implementing, but it seems pretty cool.
This function removes missing values from the vector before calculating test statistics.
skpss_test() returns a list of length 3. The first element in the list is a matrix of eta statistics calculated by the test. The first of those is the level statistic and the second is the trend statistic. The second element is a data frame of the simulated eta statistics, where the type of simulation (level, trend) is communicated in the cat column. The third element contains some attributes about the procedure for post-processing.
Steven V. Miller
Hobijn, Bart, Philip Hans Franses, and Marius Ooms. 2004. "Generalizations of the KPSS-test for Stationarity." Statistica Neerlandica 58(4): 483–502.
Kwiatkowski, Denis, Peter C.B. Phillips, Peter Schmidt, and Yongcheol Shin. 1992. "Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root: How Sure Are We that Economic Time Series Have a Unit Root?" Journal of Econometrics 54(1-3): 159–78.
x <- USDSEK$close[1:500] # note: one missing obs here; becomes series of 499
skpss_test(x, n_sims = 25) # make it quick...
spp_test() provides a simulation approach to assessing unit root in a time series by way of the Phillips-Perron test. It takes a vector and performs three Phillips-Perron tests (no drift, no trend; drift, no trend; drift and trend) and calculates both rho and tau statistics as one normally would. Rather than interpolate or approximate a p-value, it simulates some user-specified number of Phillips-Perron tests of either a known, non-stationary time series or a known, white-noise time series matching the length of the time series the user provides. This allows the user to make assessments of non-stationarity or stationarity by way of simulation rather than approximation from received critical values by way of various books/tables.
spp_test(x, lag_short = TRUE, n_sims = 1000, sim_hyp = "nonstationary")
x: a vector
lag_short: logical, defaults to TRUE. If TRUE, the short-term lag is used for the test. If FALSE, the long-term lag is used.
n_sims: the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.
sim_hyp: can be either "stationary" or "nonstationary". If "stationary", the function runs Phillips-Perron tests on simulated stationary (pure white noise) data. This allows the user to assess compatibility/plausibility of the test statistic against a distribution of test statistics that are known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets of a pure random walk, a random walk with a drift, and a random walk with a drift and trend. It then runs Phillips-Perron tests on all those. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form.
Some knowledge of Augmented Dickey-Fuller and the Phillips-Perron procedure is assumed here. Generally, the Phillips-Perron test purports to build on the Augmented Dickey-Fuller (ADF) procedure through two primary means. The first is relaxing the need to be as deliberate about lag structures as ADF is, typically asking for some kind of long- or short-term lag for the bandwidth/kernel. The second, related to this, is that its nonparametric procedure makes it robust to various forms of heteroskedasticity in the error term.
The short-term and long-term lags follow the convention introduced in the Phillips-Perron test. The short-term lag uses the default number of Newey-West lags, defined as the floor of 4*(n/100)^.25 where n is the length of the time series. The long-term lag substitutes 12 for 4 in this equation. We have Schwert (1989) to thank for these defaults. Future iterations of this function may permit manual lag selection, but this is fine for now.
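As a quick base-R illustration of those two defaults for a series of length 499:

n <- 499
floor(4 * (n/100)^.25) # short-term lag (lag_short = TRUE)
floor(12 * (n/100)^.25) # long-term lag (lag_short = FALSE)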
This function specifies three different types of tests: 1) no drift, no trend, 2) drift, no trend, and 3) drift and trend. In the language of the lm() function, the first is lm(y ~ ly - 1), where y is the value of the series and ly is its first-order lag. The second test is lm(y ~ ly), intuitively suggesting the y-intercept in this equation is the "drift". The third would be lm(y ~ ly + t), with t being a simple integer that increases by 1 for each observation (i.e. a time trend).
There are two types of statistics in the Phillips-Perron test: rho and tau. Of the two, tau is the more intuitive statistic and compares favorably to its corollary statistic in the Augmented Dickey-Fuller test. It's why you'll typically see tau reported as the statistic of interest in other implementations. rho has its utility for more advanced diagnostics, though. Both are calculated in this function, though tau is the default statistic.
None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationarity, and I will confess their expertise on these matters outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.
This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).
This function removes missing values from the vector before calculating test statistics.
spp_test() returns a list of length 3. The first element in the list is a matrix of rho statistics and tau statistics calculated by the Phillips-Perron test. The second element is a data frame of the simulated rho and tau statistics of either a known white-noise time series or three different non-stationary time series (pure random walk, random walk with drift, random walk with drift and trend). The third element is some attributes about the procedure for post-processing.
Steven V. Miller
a <- rnorm(25)  # white noise
b <- cumsum(a)  # random walk
spp_test(a, n_sims = 25)
spp_test(b, n_sims = 25)
A data frame on daily (when applicable/available) U.S. Treasury Bill rates. These are the yield received for investing in a government-issued treasury security that has a maturity of a given period of time (three months, six months, or a year).
tbills
A data frame with 17,741 observations on the following 4 variables.
date: a date
tb3m: the three-month treasury bill rate
tb6m: the six-month treasury bill rate
tb1y: the one-year treasury bill rate
Data come by way of a fredr call. The one-year (52-week) treasury bill rate was discontinued in 2001 and re-introduced in 2008. Be mindful of that gap in the series.
ur_summary() provides a summary of the unit root tests included in this package.
ur_summary(obj, pp_stat = "tau", ...)
obj: the object to be summarized, of class 'spp_test'
pp_stat: a statistic to be summarized: either "tau" or "rho". Applicable only to Phillips-Perron tests generated by functions in this package.
...: additional arguments, currently ignored
This function makes ample use of the "attributes" element in the list produced by the unit root simulations.
ur_summary() produces console output that offers a summary assessment about the presence of a unit root based on your simulations.
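Beyond the default tau summary shown in the Examples below, a short usage sketch: pp_stat = "rho" summarizes the rho statistics instead, and you can peek directly at the attributes element the summary consults.

A <- spp_test(money_demand$ffer, n_sims = 100)
ur_summary(A, pp_stat = "rho") # summarize rho instead of the default tau
str(A[[3]]) # the attributes element ur_summary() reads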
Steven V. Miller
A <- spp_test(money_demand$ffer, n_sims = 100)
ur_summary(A)
A data frame on personal consumption expenditures and disposable personal income in the United States.
USDICE
A data frame with 299 observations on the following 5 variables.
date: a date
pce: personal consumption expenditures, seasonally adjusted, in billions
dpi: disposable personal income, seasonally adjusted, in billions
pira: personal income receipts on assets (personal dividend income), in billions
cpiu: consumer price index for all urban consumers (all items in U.S. city average)
Data come by way of a fredr call. Data are quarterly. Personal consumption expenditures, disposable personal income, and personal dividend income are not inflation-adjusted. The data on the consumer price index allow for such inflation adjustment to "real" dollars based on researcher discretion.
A data frame on the USD/SEK exchange rate (i.e. how many Swedish crowns one dollar gets you).
USDSEK
A data frame with 3905 observations on the following 2 variables.
date: a date
close: the exchange rate at the close of trading
Data come by way of quantmod.