Title: | Smoothing Long-Memory Time Series |
---|---|
Description: | The nonparametric trend and its derivatives in equidistant time series (TS) with long-memory errors can be estimated. The estimation is conducted via local polynomial regression using an automatically selected bandwidth obtained by a built-in iterative plug-in algorithm or a bandwidth fixed by the user. The smoothing methods of the package are described in Letmathe, S., Beran, J. and Feng, Y., (2023) <doi:10.1080/03610926.2023.2276049>. |
Authors: | Yuanhua Feng [aut] (Paderborn University, Germany), Jan Beran [aut] (University of Konstanz, Germany), Sebastian Letmathe [aut] (Paderborn University, Germany), Dominik Schulz [aut, cre] (Paderborn University, Germany) |
Maintainer: | Dominik Schulz <[email protected]> |
License: | GPL-3 |
Version: | 2.0.1 |
Built: | 2024-11-04 06:27:28 UTC |
Source: | CRAN |
A dataset that contains daily observations of individual air pollutants from 2014 to 2020.
airLDN
airLDN
A data frame with 2552 rows and 8 variables:
the observation date
particle matter with an aerodynamic diameter smaller than 2.5
m
particle matter with an aerodynamic diameter smaller than 10
m
ozone or trioxygen
nitrogen dioxide
sulphur dioxide
carbon monoxide
composite air quality index
The data can be obtained from the World Air Quality Project.
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
arma_to_ar(ar = numeric(0), ma = numeric(0), max_i = 1000)
arma_to_ar(ar = numeric(0), ma = numeric(0), max_i = 1000)
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
max_i |
the maximum index up until which to return the coefficient series. |
Consider the ARMA model
where are the innovations.
,
, are the AR-coefficients to pass to the
argument
ar
, ,
, are the MA-coefficients
to pass to the argument
ma
.The function then returns the coefficients
from the corresponding infinite-order AR-representation
where ,
, are the coefficients. Following this
notation,
by definition.
A numeric vector is returned.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
arma_to_ar(ar = 0.75, ma = 0.5, max_i = 100)
arma_to_ar(ar = 0.75, ma = 0.5, max_i = 100)
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
arma_to_ma(ar = numeric(0), ma = numeric(0), max_i = 1000)
arma_to_ma(ar = numeric(0), ma = numeric(0), max_i = 1000)
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
max_i |
the maximum index up until which to return the coefficient series. |
Consider the ARMA model
where are the innovations.
,
, are the AR-coefficients to pass to the
argument
ar
, ,
, are the MA-coefficients
to pass to the argument
ma
.The function then returns the coefficients
from the corresponding infinite-order MA-representation
where ,
, are the coefficients. Following this
notation,
by definition.
A numeric vector is returned.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
arma_to_ma(ar = 0.75, ma = 0.5, max_i = 100)
arma_to_ma(ar = 0.75, ma = 0.5, max_i = 100)
An information criterion is calculated for different orders of a fractionally integrated autoregressive-moving-average (FARIMA) model.
critMatlm(X, p.max = 5, q.max = 5, criterion = c("bic", "aic"))
critMatlm(X, p.max = 5, q.max = 5, criterion = c("bic", "aic"))
X |
a numeric vector that contains the observed time series ordered from past to present; the series is assumed to follow an FARIMA process. |
p.max |
an integer value |
q.max |
an integer value |
criterion |
a character value that defines the information criterion
that will be calculated; the Bayesian Information Criterion ( |
The series passed to X
is assumed to follow an
FARIMA() model. A
p.max + 1
by q.max + 1
matrix is
calculated for this series. More precisely, the criterion chosen via the
argument criterion
is calculated for all combinations of orders
and
.
Within the function, two information criteria are supported: the Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC). The AIC is given by
where is the estimated
innovation variance,
and
are the ARMA orders and
is
the number of observations.
The BIC, on the other hand, is defined by
with being the number of estimated parameters and
being the estimated Log-Likelihood. Since the parameter
only differs with respect to the orders
and
for all
estimated models, the term
is reduced to
within the function. Exemplary,
if the mean of the series is estimated as well, it is usually considered
within the parameter
when calculating the BIC.
However, since the mean is estimated for all models, not considering this
estimated parameter within the calculation of the BIC will reduce all BIC
values by the same amount of
. Therefore, the selection
via this simplified criterion is still valid, if the number of the estimated
parameters only differs with respect to
and
between the
models that the BIC is obtained for.
The optimal orders are considered to be the ones which minimize either the BIC or the AIC. The use of the BIC is however recommended, because the BIC is consistent, whereas the AIC is not.
NOTE:
Within this function, the fracdiff
function of the
fracdiff
package is used throughout for
the estimation of FARIMA models.
The function returns a p.max + 1
by q.max + 1
matrix, where the
rows represent the AR orders from to
and the columns represent the MA orders from
to
. The values within the matrix are the values of
the previously selected information criterion for the different combinations
of
and
.
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn
University),
Author
# Simulate a FARIMA(2, 0.3 ,1) process set.seed(1) X.sim <- fracdiff::fracdiff.sim(n = 1000, ar = c(1.2, -0.71), ma = -0.46, d = 0.3)$series # Application of the function with BIC-criterion BIC_mat <- critMatlm(X.sim) BIC_mat # determining the optimal order smoots::optOrd(BIC_mat)
# Simulate a FARIMA(2, 0.3 ,1) process set.seed(1) X.sim <- fracdiff::fracdiff.sim(n = 1000, ar = c(1.2, -0.71), ma = -0.46, d = 0.3)$series # Application of the function with BIC-criterion BIC_mat <- critMatlm(X.sim) BIC_mat # determining the optimal order smoots::optOrd(BIC_mat)
Output is with positive signs on the left-hand side of the equation.
d_to_coef(d, max_i = 1000)
d_to_coef(d, max_i = 1000)
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Consider the FARIMA model
where are the innovations and where
.
is the fractional differencing
coefficient.
The fractional differencing operator can alternatively be expressed
as an infinite coefficient series, so that
where is the backshift operator and where
,
,
are the coefficients. Note that
by definition.
The function returns the series of coefficients .
A numeric vector is returned.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
d_to_coef(d = 0.3, max_i = 100)
d_to_coef(d = 0.3, max_i = 100)
This function runs through an iterative process in order to find the optimal bandwidth for the nonparametric estimation of the first or second derivative of the trend in an equidistant time series (with long-memory errors) and subsequently employs the obtained bandwidth via local polynomial regression.
dsmoothlm( y, d = c(1, 2), pmin = c(0, 1, 2, 3, 4, 5), pmax = c(0, 1, 2, 3, 4, 5), qmin = c(0, 1, 2, 3, 4, 5), qmax = c(0, 1, 2, 3, 4, 5), mu = c(0, 1, 2, 3), mu.p = c(0, 1, 2, 3), pp = c(1, 3), bStart.p = 0.15, InfR.p = c("Opt", "Nai", "Var") )
dsmoothlm( y, d = c(1, 2), pmin = c(0, 1, 2, 3, 4, 5), pmax = c(0, 1, 2, 3, 4, 5), qmin = c(0, 1, 2, 3, 4, 5), qmax = c(0, 1, 2, 3, 4, 5), mu = c(0, 1, 2, 3), mu.p = c(0, 1, 2, 3), pp = c(1, 3), bStart.p = 0.15, InfR.p = c("Opt", "Nai", "Var") )
y |
a numeric vector that contains the time series ordered from past to present. |
||||||||||
d |
an integer |
||||||||||
pmin |
an integer value |
||||||||||
pmax |
an integer value |
||||||||||
qmin |
an integer value |
||||||||||
qmax |
an integer value |
||||||||||
mu |
an integer |
||||||||||
mu.p |
an integer
|
||||||||||
pp |
an integer |
||||||||||
bStart.p |
a numeric object that indicates the starting value of the
bandwidth for the iterative process to obtain initial estimates for |
||||||||||
InfR.p |
a character object that represents the inflation
rate in the form |
The trend is estimated based on the additive nonparametric regression model for an equidistant time series
where is the observed time series,
is the rescaled time
on the interval
,
is a smooth and deterministic
trend function and
are stationary errors with
and is assumed to follow a FARIMA(
)
model (see also Beran and Feng, 2002).
The iterative-plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Squared Error (AMISE), is based on the proposal of Beran and Feng (2002a).
The variance factor , the long memory parameter
and the
starting bandwidth
are first obtained from a pilot-estimation of
the time series' nonparametric trend (
) with polynomial order
. The estimate is then plugged into the iterative procedure for
estimating the first or second derivative (
or
).
For further details on the asymptotic theory or the algorithm, we refer the
user to Letmathe, Beran and Feng (2023).
The function itself is applicable in the following way: Based on a data input
y
, an order of polynomial pp
for the variance factor estimation
procedure, a starting value for the relative bandwidth bStart.p
in the
variance factor estimation procedure and a kernel function defined by the
smoothness parameter mu
, an optimal bandwidth is numerically calculated
for the trend's derivative of order d
. In fact, aside from the input
vector y
, every argument has a default setting that can be adjusted for
the individual case. However, it is recommended to initially use the default
values for the estimation of the
first derivative and adjust the argument d
to d = 2
for the
estimation of the second derivative.
The initial bandwidth does not affect the resulting optimal bandwidth in
theory. However in practice, local minima of the AMISE can influence the
results. For more specific information on the input arguments consult the
section Arguments.
After the bandwidth estimation, the nonparametric derivative of the series is calculated with respect to the obtained optimal bandwidth by means of a local polynomial regression. The output object is then a list that contains, among other components, the original time series, the estimates of the derivative and the estimated optimal bandwidth.
The default print method for this function delivers key numbers such as
the iteration steps and the generated optimal bandwidth rounded to the fourth
decimal. The exact numbers and results such as the estimated nonparametric
trend series are saved within the output object and can be addressed via the
$
sign.
The function returns a list with different components:
the optimal bandwidth chosen by the IPI-algorithm.
the starting bandwidth for the nonparametric trend estimation that leads to the initial estimates; input argument.
the estimated variance factor.
the inflation rate setting.
the bandwidths of the single iterations steps
the smoothness parameter of the second order kernel; input argument.
the number of observations.
the total number of iterations until convergence.
the original input series; input argument.
the order of polynomial for the local polynomial regression used within derivative estimation procedure.
the order of polynomial for the local polynomial regression used in the pilot estimation; input argument.
the considered order of the trend's derivative; input argument
d
.
the weighting system matrix used within the local polynomial
regression; this matrix is a condensed version of a complete weighting system
matrix; in each row of ws
, the weights for conducting the smoothing
procedure at a specific observation time point can be found; the first
rows, where
corresponds to the number of
observations,
is the bandwidth considered for smoothing and
denotes the integer part, contain the weights at the
left-hand boundary points; the weights in row
are representative for the estimation at all
interior points and the remaining rows contain the weights for the right-hand
boundary points; each row has exactly
elements,
more specifically the weights for observations of the nearest
time points; moreover, the weights are normalized,
i.e. the weights are obtained under consideration of the time points
, where
.
the nonparametric estimates of the derivative.
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Sebastian Letmathe (Scientific Employee) (Department of Economics,
Paderborn
University),
Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn
University),
Author
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
# Logarithm of test data test_data <- gdpG7 y <- log(test_data$gdp) n <- length(y) t <- seq(from = 1962, to = 2020, length.out = n) # Applied dsmooth function for the trend's first derivative result_d <- dsmoothlm(y, d = 1, pp = 1, pmax = 1, qmax = 1, InfR.p = "Opt") estim <- result_d$ye # Plot of the results plot(t, estim, xlab = "Year", ylab = "First derivative", type = "l", main = paste0("Estimated first derivative of the trend for log-quarterly ", "G7-GDP, Q1 1962 - Q4 2019"), cex.axis = 0.8, cex.main = 0.8, cex.lab = 0.8, bty = "n") # Print result result_d # The main function "dsmoothlm"------------------------------------------
# Logarithm of test data test_data <- gdpG7 y <- log(test_data$gdp) n <- length(y) t <- seq(from = 1962, to = 2020, length.out = n) # Applied dsmooth function for the trend's first derivative result_d <- dsmoothlm(y, d = 1, pp = 1, pmax = 1, qmax = 1, InfR.p = "Opt") estim <- result_d$ye # Plot of the results plot(t, estim, xlab = "Year", ylab = "First derivative", type = "l", main = paste0("Estimated first derivative of the trend for log-quarterly ", "G7-GDP, Q1 1962 - Q4 2019"), cex.axis = 0.8, cex.main = 0.8, cex.lab = 0.8, bty = "n") # Print result result_d # The main function "dsmoothlm"------------------------------------------
The esemifar
package provides different applicable functions for the
estimation of the trend or its derivatives in equidistant time series.
The main functions include an automated bandwidth selection method for time
series with long-memory errors.
An alternative method to create an ESEMIFAR estimation object stitched together from a nonparametric and a parametric part.
esemifar(nonpar_model, par_model)
esemifar(nonpar_model, par_model)
nonpar_model |
an estimation object returned by |
par_model |
an estimation object returned by |
The main function tsmoothlm
already returns a fully estimated
ESEMIFAR model. In some instances, alternative specifications of the
nonparametric and parametric model parts are needed, for which
tsmoothlm
with its automated estimation algorithm does not
provide sufficient flexibility. Therefore, this function allows to stitch
together a nonparametric model part returned by gsmooth
and
a FARIMA part for the residuals obtained via fracdiff
.
The resulting object can then be used for forecasting.
The function returns a list of class "esemifar"
with elements
nonpar_model
and par_model
.
The esemifar
functions are either meant for calculating nonparametric
estimates of the trend of a time series or its derivatives.
dsmoothlm
is a function that calculates the derivatives of the
trend after obtaining the optimal bandwidth by an iterative plug-in
algorithm.
tsmoothlm
is the central function of the package. It allows
the user to conduct a local polynomial regression of the trend based on
an optimal bandwidth that is obtained by an iterative plug-in algorithm.
Inflation rate (and other factors) can be manually
and individually adjusted as arguments in the function
(see also: tsmoothlm
).
critMatlm
is a quick tool for the calculation of information criteria
for FARIMA() models with different order combinations
and
. The function returns a matrix with the obtained values of the
selected criterion for the different combinations of
and
(see also:
critMatlm
).
The package includes two datasets: airLDN
(see also:
airLDN
) with daily observations of individual air pollutants
from 2014 to 2020 and gdpG7
(see also: gdpG7
) that has
data concerning the quarterly G7 GDP between Q1 1962 and Q4 2019.
The package is distributed under the General Public License v3 ([GPL-3](https://tldrlegal.com/license/gnu-general-public-license-v3-(gpl-3))).
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Sebastian Letmathe,
Author (and Original Package Creator and Maintainer)
Dominik Schulz,
Author and Current Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
Beran, J. and Y. Feng (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics 11(3), 690-713.
Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, longrange dependence and nonstationarity. Computational Statistics & Data Analysis 40(2), 393-419.
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
lgdp <- log(esemifar::gdpG7$gdp) nonpar <- gsmooth(lgdp, b = 0.15) res <- nonpar$res par <- fracdiff::fracdiff(res, nar = 1, nma = 1) model <- esemifar(nonpar_model = nonpar, par_model = par) model
lgdp <- log(esemifar::gdpG7$gdp) nonpar <- gsmooth(lgdp, b = 0.15) res <- nonpar$res par <- fracdiff::fracdiff(res, nar = 1, nma = 1) model <- esemifar(nonpar_model = nonpar, par_model = par) model
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
farima_to_ar(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
farima_to_ar(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Consider the FARIMA model
where are the innovations and where
.
,
, are the AR-coefficients to pass to the
argument
ar
, ,
, are the MA-coefficients
to pass to the argument
ma
. is the fractional differencing
coefficient. The function then returns the coefficients
from the corresponding infinite-order AR-representation
where ,
, are the coefficients. Following this
notation,
by definition.
A numeric vector is returned.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
farima_to_ar(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
farima_to_ar(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
farima_to_ma(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
farima_to_ma(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Consider the FARIMA model
where are the innovations and where
.
,
, are the AR-coefficients to pass to the
argument
ar
, ,
, are the MA-coefficients
to pass to the argument
ma
. is the fractional differencing coefficient.
The function then returns the coefficients
from the corresponding infinite-order AR-representation
where ,
, are the coefficients. Following this
notation,
by definition.
A numeric vector is returned.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
farima_to_ma(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
farima_to_ma(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
Generic function which extracts fitted values from a esemifar
class
object. Both fitted
and fitted.values
can be called.
## S3 method for class 'esemifar' fitted(object, ...)
## S3 method for class 'esemifar' fitted(object, ...)
object |
an object from the |
... |
included for consistency with the generic function. |
Fitted values extracted from a esemifar
class object.
Sebastian Letmathe (Scientific Employee) (Department of Economics,
Paderborn University),
A dataset that contains the (seasonally adjusted) Gross Domestic Product of the G7 nations from the first quarter of 1962 to the fourth quarter of 2019
gdpG7
gdpG7
A data frame with 232 rows and 3 variables:
the observation year
the observation quarter in the given year
the volume Index of the Gross Domestic Product of the G7 nations
The data was obtained from the Organization for Economic Co-operation and Development (OECD)
https://data.oecd.org/gdp/quarterly-gdp.htm#indicator-chart
This function is an R function for estimating the trend function and its derivatives in an equidistant time series with local polynomial regression and a fixed bandwidth given beforehand.
gsmooth(y, v = 0, p = v + 1, mu = 1, b = 0.15, bb = c(0, 1))
gsmooth(y, v = 0, p = v + 1, mu = 1, b = 0.15, bb = c(0, 1))
y |
a numeric vector that contains the time series data ordered from past to present. |
|||||||||||||||||||||||||
v |
an integer
|
|||||||||||||||||||||||||
p |
an integer Exemplary for
|
|||||||||||||||||||||||||
mu |
an integer
|
|||||||||||||||||||||||||
b |
a real number |
|||||||||||||||||||||||||
bb |
can be set to
|
The trend or its derivatives are estimated based on the additive nonparametric regression model for an equidistant time series
where is the observed time series,
is the rescaled time
on the interval
,
is a smooth and deterministic
trend function and
are stationary errors with
(see also Beran and Feng, 2002).
This function is part of the package smoots
and is used in
the field of analyzing equidistant time series data. It applies the local
polynomial regression method to the input data with an arbitrarily
selectable bandwidth. By these means, the trend as well as its derivatives
can be estimated nonparametrically, even though the result will strongly
depend on the bandwidth given beforehand as an input.
NOTE:
The estimates are obtained with regard to the rescaled time points on the
interval . Thus, if
, the estimates might not
reflect the values for the actual time points. To rescale the estimates, we
refer the user to the
rescale
function of the smoots
package.
With package version 1.1.0, this function implements C++ code by means
of the Rcpp
and
RcppArmadillo
packages for
better performance.
The output object is a list with different components:
the chosen (relative) bandwidth; input argument.
the chosen bandwidth option at the boundaries; input argument.
the chosen smoothness parameter for the second order kernel; input argument.
the number of observations.
the original input series; input argument.
the chosen order of polynomial; input argument.
a vector with the estimated residual series; is set to NULL
for v > 0
.
the order of derivative; input argument.
the weighting system matrix used within the local polynomial
regression; this matrix is a condensed version of a complete weighting system
matrix; in each row of ws
, the weights for conducting the smoothing
procedure at a specific observation time point can be found; the first
rows, where
corresponds to the number of
observations,
is the bandwidth considered for smoothing and
denotes the integer part, contain the weights at the
left-hand boundary points; the weights in row
are representative for the estimation at all
interior points and the remaining rows contain the weights for the right-hand
boundary points; each row has exactly
elements,
more specifically the weights for observations of the nearest
time points; moreover, the weights are normalized,
i.e. the weights are obtained under consideration of the time points
, where
.
a vector with the estimates of the selected nonparametric order of
derivative on the rescaled time interval .
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn
University),
Package Creator and Maintainer
Beran, J. and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.
Feng, Y., Gries, T., Letmathe, S. and Schulz, D. (2019). The smoots package in R for semiparametric modeling of trend stationary time series. Discussion Paper. Paderborn University. Unpublished.
# Logarithm of test data y <- log(esemifar::gdpG7$gdp) # Applied gsmooth function for the trend with two different bandwidths results1 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.28, bb = 1) results2 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.11, bb = 1) trend1 <- results1$ye trend2 <- results2$ye # Plot of the results t <- seq(from = 1962, to = 2019.75, by = 0.25) plot(t, y, type = "l", xlab = "Year", ylab = "log(US-GDP)", bty = "n", lwd = 2, main = "Estimated trend for log-quarterly US-GDP, Q1 1947 - Q2 2019") points(t, trend1, type = "l", col = "red", lwd = 1) points(t, trend2, type = "l", col = "blue", lwd = 1) legend("bottomright", legend = c("Trend (b = 0.28)", "Trend (b = 0.11)"), fill = c("red", "blue"), cex = 0.6) title(sub = expression(italic("Figure 1")), col.sub = "gray47", cex.sub = 0.6, adj = 0)
# Logarithm of test data y <- log(esemifar::gdpG7$gdp) # Applied gsmooth function for the trend with two different bandwidths results1 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.28, bb = 1) results2 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.11, bb = 1) trend1 <- results1$ye trend2 <- results2$ye # Plot of the results t <- seq(from = 1962, to = 2019.75, by = 0.25) plot(t, y, type = "l", xlab = "Year", ylab = "log(US-GDP)", bty = "n", lwd = 2, main = "Estimated trend for log-quarterly US-GDP, Q1 1947 - Q2 2019") points(t, trend1, type = "l", col = "red", lwd = 1) points(t, trend2, type = "l", col = "blue", lwd = 1) legend("bottomright", legend = c("Trend (b = 0.28)", "Trend (b = 0.11)"), fill = c("red", "blue"), cex = 0.6) title(sub = expression(italic("Figure 1")), col.sub = "gray47", cex.sub = 0.6, adj = 0)
This function regulates how objects created by the package esemifar
are
plotted.
## S3 method for class 'esemifar' plot(x, t = NULL, rescale = TRUE, which = NULL, ...)
## S3 method for class 'esemifar' plot(x, t = NULL, rescale = TRUE, which = NULL, ...)
x |
an input object of class |
t |
an optional vector with time points that will be considered for
the x-axis within the plot; is set to NULL by default and uses a vector
|
rescale |
a single logical value; is set to |
which |
one of 1 (observations), 2 (trend), 3 (residuals) or 4
(observations with trend); the default |
... |
additional arguments of the standard plot method. |
None
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn
University),
"esemifar_fc"
Create basic R plots for forecasting objects of class "esemifar_fc"
.
## S3 method for class 'esemifar_fc' plot(x, y = NULL, t = NULL, ...)
## S3 method for class 'esemifar_fc' plot(x, y = NULL, t = NULL, ...)
x |
an object of class |
y |
currently without use; for compatibility only. |
t |
a numeric vector with series of time points for observations and forecasts. |
... |
further arguments of |
This is a plot method to visualize the forecasting results for an ESEMIFAR model. Common plot arguments can be implemented to change the appearance.
This method returns NULL
.
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn
University),
Author and Package Creator
lgdp <- log(esemifar::gdpG7$gdp) est <- tsmoothlm(lgdp, pmax = 1, qmax = 1) # Under normality fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE) plot(fc)
lgdp <- log(esemifar::gdpG7$gdp) est <- tsmoothlm(lgdp, pmax = 1, qmax = 1) # Under normality fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE) plot(fc)
Point and interval forecasts (under the normality assumption or via a bootstrap) for fitted ESEMIFAR models.
## S3 method for class 'esemifar' predict( object, n.ahead = 5, alpha = c(0.95, 0.99), method = c("norm", "boot"), bootMethod = c("simple", "advanced"), npaths = 5000, quant.type = 8, boot_progress = TRUE, expo = FALSE, trend_extrap = c("linear", "constant"), future = TRUE, num_cores = future::availableCores() - 1, ... )
## S3 method for class 'esemifar' predict( object, n.ahead = 5, alpha = c(0.95, 0.99), method = c("norm", "boot"), bootMethod = c("simple", "advanced"), npaths = 5000, quant.type = 8, boot_progress = TRUE, expo = FALSE, trend_extrap = c("linear", "constant"), future = TRUE, num_cores = future::availableCores() - 1, ... )
object |
|
n.ahead |
a single numeric value that represents the forecasting horizon. |
alpha |
a numeric vector with confidence levels for the forecasting
intervals; the default |
method |
whether to obtain the forecasting intervals under the
normality assumption ( |
bootMethod |
only for |
npaths |
only for |
quant.type |
only for |
boot_progress |
only for |
expo |
whether to exponentiate all results at the end. |
trend_extrap |
how to extrapolate the estimated trend into the future:
linearly ( |
future |
only for |
num_cores |
only for |
... |
no purpose; for compatibility only. |
Produce point and interval forecasts based on ESEMIFAR models. Throughout, the infinite-order AR-representation of the parametric FARIMA part is considered to produce point forecasts and future paths of the series. The trend is usually extrapolated linearly (or constantly as an alternative).
The function returns a list of class "esemifar"
with elements
nonpar_model
and par_model
.
A list with various elements is returned.
the observed series.
the point forecasts.
the lower bounds of the forecasting intervals.
the upper bounds of the forecasting intervals.
the fitted ESEMIFAR model object.
the confidence levels for the forecasting intervals.
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
lgdp <- log(esemifar::gdpG7$gdp) est <- tsmoothlm(lgdp, pmax = 1, qmax = 1) # Under normality fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE) fc$mean fc$lower fc$upper
lgdp <- log(esemifar::gdpG7$gdp) est <- tsmoothlm(lgdp, pmax = 1, qmax = 1) # Under normality fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE) fc$mean fc$lower fc$upper
This function regulates how objects created by the package esemifar
are
printed.
## S3 method for class 'esemifar' print(x, ...)
## S3 method for class 'esemifar' print(x, ...)
x |
an input object of class |
... |
included for compatibility; additional arguments will however not affect the output. |
None
Dominik Schulz (Scientific employee) (Department of Economics, Paderborn
University),
Generic function which extracts model residuals from a esemifar
class
object. Both residuals
and its abbreviated form resid
can be called.
## S3 method for class 'esemifar' residuals(object, ...)
## S3 method for class 'esemifar' residuals(object, ...)
object |
an object from the |
... |
included for consistency with the generic function. |
Residuals extracted from a esemifar
class object.
Sebastian Letmathe (Scientific Employee) (Department of Economics,
Paderborn University),
This function runs an iterative plug-in algorithm to find the optimal bandwidth for the estimation of the nonparametric trend in equidistant time series (with long-memory errors) and then employs the resulting bandwidth via either local polynomial or kernel regression.
tsmoothlm( y, pmin = c(0, 1, 2, 3, 4, 5), pmax = c(0, 1, 2, 3, 4, 5), qmin = c(0, 1, 2, 3, 4, 5), qmax = c(0, 1, 2, 3, 4, 5), p = c(1, 3), mu = c(0, 1, 2, 3), InfR = c("Opt", "Nai", "Var"), bStart = 0.15, bb = c(0, 1), cb = 0.05, method = c("lpr", "kr") )
tsmoothlm( y, pmin = c(0, 1, 2, 3, 4, 5), pmax = c(0, 1, 2, 3, 4, 5), qmin = c(0, 1, 2, 3, 4, 5), qmax = c(0, 1, 2, 3, 4, 5), p = c(1, 3), mu = c(0, 1, 2, 3), InfR = c("Opt", "Nai", "Var"), bStart = 0.15, bb = c(0, 1), cb = 0.05, method = c("lpr", "kr") )
y |
a numeric vector that contains the time series ordered from past to present. |
||||||||||
pmin |
an integer value |
||||||||||
pmax |
an integer value |
||||||||||
qmin |
an integer value |
||||||||||
qmax |
an integer value |
||||||||||
p |
an integer |
||||||||||
mu |
an integer
|
||||||||||
InfR |
a character object that represents the inflation
rate in the form
|
||||||||||
bStart |
a numeric object that indicates the starting value of the
bandwidth for the iterative process; should be |
||||||||||
bb |
can be set to
|
||||||||||
cb |
a numeric value that indicates the percentage of omitted
observations on each side of the observation period for the automated
bandwidth selection; is set to |
||||||||||
method |
the final smoothing approach; |
The trend is estimated based on the additive nonparametric regression model for an equidistant time series
where is the observed time series,
is the rescaled time
on the interval
,
is a smooth and deterministic
trend function and
are stationary errors with
and is assumed to follow a FARIMA(
)
model (see also Beran and Feng, 2002a, Beran and Feng, 2002b and Beran
and Feng, 2002c).
The iterative-plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Squared Error (AMISE), is based on the proposal of Beran and Feng (2002a).
The function calculates suitable estimates for , the variance
factor, and
over different iterations. In each
iteration, a bandwidth is obtained in accordance with the AMISE that once
more serves as an input for the following iteration. The process repeats
until either convergence or the 40th iteration is reached. For further
details on the asymptotic theory or the algorithm, please see Letmathe et
al., 2023.
To apply the function, the following arguments are needed: a data input
y
, an order of polynomial p
, a kernel weighting function
defined by the smoothness parameter mu
, an inflation rate setting
InfR
(see also Beran and Feng, 2002b), a starting value for the
relative bandwidth bStart
, a
boundary method bb
, a boundary cut-off percentage cb
and a
final smoothing method method
. In fact, aside from the input vector
y
, every argument has a default setting that can be adjusted for the
individual case. Theoretically, the initial bandwidth does not affect the
selected optimal bandwidth. However, in practice local minima of the AMISE
might exist and influence the selected bandwidth. Therefore, the default
setting is bStart = 0.15
. In the rare
case of a clearly unsuitable optimal bandwidth, a starting bandwidth that
differs from the default value is a first possible approach to obtain a
better result. Other argument adjustments can be tried as well. For more
specific information on the input arguments consult the section
Arguments.
When applying the function, an optimal bandwidth is obtained based on a
strongly modified version of the IPI algorithm of Beran and Feng (2002a). In
a second step, the nonparametric trend of the series is calculated with
respect to the chosen bandwidth and the selected regression method (lpf
or kr
). Please note that method = "lpf"
is strongly recommended
by the authors. Moreover, it is notable that p
is automatically set to
1
for method = "kr"
. The output object is then a list that
contains, among other components, the original time series, the estimated
trend values and the series without the trend.
The default print method for this function delivers only key numbers such as
the iteration steps and the generated optimal bandwidth rounded to the fourth
decimal. The exact numbers and results such as the estimated nonparametric
trend series are saved within the output object and can be addressed via the
$
sign.
The function returns a list with different components:
the Bayesian Information Criterion of the optimal
FARIMA() model.
the percentage of omitted observations on each side of the observation period.
the optimal bandwidth chosen by the IPI-algorithm.
the boundary bandwidth method used within the IPI; always equal to 1.
the starting value of the (relative) bandwidth; input argument.
the estimated variance factor; in contrast to the definitions
given in the Details section, this object actually contains an
estimated value of , i.e. it corresponds to the estimated sum
of autocovariances.
the long-memory parameter of the optimal FARIMA()
model.
the model fit of the selected FARIMA( model.
the estimated value of .
the setting for the inflation rate according to the chosen algorithm.
the bandwidths of the single iterations steps
the smoothness parameter of the second order kernel; input argument.
the number of observations.
the total number of iterations until convergence.
the original input series; input argument.
the order p of the optimal FARIMA() model.
the order of polynomial used in the IPI-algorithm; also used for the
final smoothing, if method = "lpr"
; input argument.
the order of the optimal FARIMA(
)
model.
the estimated residual series.
the considered order of derivative of the trend; is always zero for this function.
the weighting system matrix used within the local polynomial
regression; this matrix is a condensed version of a complete weighting system
matrix; in each row of ws
, the weights for conducting the smoothing
procedure at a specific observation time point can be found; the first
rows, where
corresponds to the number of
observations,
is the bandwidth considered for smoothing and
denotes the integer part, contain the weights at the
left-hand boundary points; the weights in row
are representative for the estimation at all
interior points and the remaining rows contain the weights for the right-hand
boundary points; each row has exactly
elements,
more specifically the weights for observations of the nearest
time points; moreover, the weights are normalized,
i.e. the weights are obtained under consideration of the time points
, where
.
the nonparametric estimates of the trend.
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/
Sebastian Letmathe (Scientific Employee) (Department of Economics,
Paderborn University),
Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics,
Paderborn University),
Author
Beran, J. and Y. Feng (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics 11(3), 690-713.
Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, longrange dependence and nonstationarity. Computational Statistics & Data Analysis 40(2), 393-419.
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
### Example 1: G7-GDP ### # Logarithm of test data # -> the logarithm of the data is assumed to follow the additive model test_data <- gdpG7 y <- log(test_data$gdp) n <- length(y) # Applied tsmooth function for the trend result <- tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt") trend1 <- result$ye # Plot of the results t <- seq(from = 1962, to = 2020, length.out = n) plot(t, y, type = "l", xlab = "Year", ylab = "log(G7-GDP)", bty = "n", lwd = 1, lty = 3, main = "Estimated trend for log-quarterly G7-GDP, Q1 1962 - Q4 2019") points(t, trend1, type = "l", col = "red", lwd = 1) title(sub = expression(italic("Figure 1")), col.sub = "gray47", cex.sub = 0.6, adj = 0) result
### Example 1: G7-GDP ### # Logarithm of test data # -> the logarithm of the data is assumed to follow the additive model test_data <- gdpG7 y <- log(test_data$gdp) n <- length(y) # Applied tsmooth function for the trend result <- tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt") trend1 <- result$ye # Plot of the results t <- seq(from = 1962, to = 2020, length.out = n) plot(t, y, type = "l", xlab = "Year", ylab = "log(G7-GDP)", bty = "n", lwd = 1, lty = 3, main = "Estimated trend for log-quarterly G7-GDP, Q1 1962 - Q4 2019") points(t, trend1, type = "l", col = "red", lwd = 1) title(sub = expression(italic("Figure 1")), col.sub = "gray47", cex.sub = 0.6, adj = 0) result