Title: | Empirical Dynamic Modeling ('EDM') |
---|---|
Description: | An implementation of 'EDM' algorithms based on research software developed for internal use at the Sugihara Lab ('UCSD/SIO'). The package is implemented with 'Rcpp' wrappers around the 'cppEDM' library. It implements the 'simplex' projection method from Sugihara & May (1990) <doi:10.1038/344734a0>, the 'S-map' algorithm from Sugihara (1994) <doi:10.1098/rsta.1994.0106>, convergent cross mapping described in Sugihara et al. (2012) <doi:10.1126/science.1227079>, and 'multiview embedding' described in Ye & Sugihara (2016) <doi:10.1126/science.aag0863>. |
Authors: | Joseph Park [aut, cre], Cameron Smith [aut], George Sugihara [aut, ccp], Ethan Deyle [aut], Erik Saberski [ctb], Hao Ye [ctb], The Regents of the University of California [cph] |
Maintainer: | Joseph Park |
License: | BSD_2_clause + file LICENSE |
Version: | 1.15.4 |
Built: | 2025-01-02 07:00:31 UTC |
Source: | CRAN |
Time series generated from a discrete-time coupled Lotka-Volterra model exhibiting chaotic dynamics.
block_3sp
A data frame with 198 rows and 10 columns:
time | time index (number of generations) |
x_t | abundance of simulated species x at time t |
x_t-1 | abundance of simulated species x at time t-1 |
x_t-2 | abundance of simulated species x at time t-2 |
y_t | abundance of simulated species y at time t |
y_t-1 | abundance of simulated species y at time t-1 |
y_t-2 | abundance of simulated species y at time t-2 |
z_t | abundance of simulated species z at time t |
z_t-1 | abundance of simulated species z at time t-1 |
z_t-2 | abundance of simulated species z at time t-2 |
The state space of a multivariate dynamical system (as opposed to a purely stochastic one) encodes coherent phase-space trajectories of its variables. If enough information is available, one can infer the presence or absence of cross-variable interactions associated with causal links between variables. CCM measures the extent to which states of variable Y can reliably estimate states of variable X. This can happen if X causally influences Y.
If cross-variable state predictability converges (improves and then saturates) as more state-space information is provided, this is taken as evidence of a causal link. CCM performs this cross-variable mapping using Simplex, with convergence assessed across a range of observational library sizes as described in Sugihara et al. 2012.
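The cross-mapping and convergence idea can be illustrated without rEDM. The base-R sketch below is a simplification under stated assumptions, not the cppEDM algorithm. `cross_map_rho` is a hypothetical helper. The library is a random subsample of embedded rows, and each estimate uses E+1 simplex neighbors with the estimated row excluded. The coupled logistic maps (growth rates 3.8 and 3.5, X-to-Y coupling 0.1, Y-to-X coupling 0.02) are an assumed test system in the style of Sugihara et al. 2012. Because X drives Y here, estimates of X made from the delay embedding of Y should improve as the library grows.

```r
# Minimal CCM sketch in base R (hypothetical helper, not the rEDM API).
# Estimates tgt from the delay embedding ("shadow manifold") of src using
# simplex weighting of E+1 neighbors drawn from a random library subsample.
cross_map_rho <- function(src, tgt, E, lib_size, n_samples = 10, lag = 1) {
  n <- length(src)
  # Row i holds (src[i], src[i-lag], ..., src[i-(E-1)*lag]); NA where undefined
  M <- vapply(0:(E - 1),
              function(k) c(rep(NA_real_, k * lag), src[seq_len(n - k * lag)]),
              numeric(n))
  rows <- which(complete.cases(M))
  rhos <- replicate(n_samples, {
    lib <- sample(rows, lib_size)                  # random library subsample
    est <- vapply(rows, function(i) {
      cand <- setdiff(lib, i)                      # exclude the point itself
      d <- sqrt(colSums((t(M[cand, , drop = FALSE]) - M[i, ])^2))
      nn <- order(d)[seq_len(E + 1)]               # E+1 nearest neighbors
      w <- exp(-d[nn] / max(d[nn[1]], 1e-12))      # simplex-style weights
      sum(w * tgt[cand[nn]]) / sum(w)              # cross-mapped estimate (Tp = 0)
    }, numeric(1))
    cor(est, tgt[rows])
  })
  mean(rhos)                                       # mean rho over random libraries
}

# Assumed test system: coupled logistic maps in which X drives Y more strongly
set.seed(42)
n <- 300
x <- y <- numeric(n)
x[1] <- 0.4
y[1] <- 0.2
for (k in 1:(n - 1)) {
  x[k + 1] <- x[k] * (3.8 - 3.8 * x[k] - 0.02 * y[k])
  y[k + 1] <- y[k] * (3.5 - 3.5 * y[k] - 0.1 * x[k])
}

lib_sizes <- c(10, 50, 150, 250)
rho_x_from_y <- vapply(lib_sizes,
                       function(L) cross_map_rho(y, x, E = 2, lib_size = L),
                       numeric(1))
print(data.frame(LibSize = lib_sizes, rho = rho_x_from_y))
```

Mean rho at the largest library should exceed mean rho at the smallest. In rEDM, the corresponding quantities are the per-LibSize correlations reported by CCM.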
CCM(pathIn = "./", dataFile = "", dataFrame = NULL, E = 0, Tp = 0, knn = 0, tau = -1, exclusionRadius = 0, columns = "", target = "", libSizes = "", sample = 0, random = TRUE, seed = 0, embedded = FALSE, includeData = FALSE, parameterList = FALSE, verbose = FALSE, showPlot = FALSE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
E |
embedding dimension. |
Tp |
prediction horizon (number of time column rows). |
knn |
number of nearest neighbors. If knn=0, knn is set to E+1. |
tau |
lag of time delay embedding specified as number of time column rows. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
libSizes |
string of 3 whitespace-separated integer values specifying the initial library size, the final library size, and the library size increment. Can also be a list of strictly increasing library sizes. |
sample |
integer specifying the number of random samples to draw at each library size evaluation. |
random |
logical to specify random (TRUE) or sequential library sampling. |
seed |
integer specifying the random sampler seed. If seed = 0, a random seed is generated. |
embedded |
logical specifying if the input data are embedded. |
includeData |
logical to include statistics and predictions for every prediction in the ensemble. |
parameterList |
logical to add list of invoked parameters. |
verbose |
logical to produce additional console reporting. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
CCM computes the X:Y and Y:X cross-mappings in parallel using threads.
A data.frame with 3 columns. The first column is LibSize
specifying the subsampled library size. Columns 2 and 3 report
Pearson correlation coefficients for the prediction of X from Y, and
Y from X.
If includeData = TRUE, a named list with the following data.frames:
data.frame Combo_rho
columns:
LibMeans | CCM mean correlations for each library size |
CCM1_PredictStat | Forward cross map prediction statistics |
CCM1_Predictions | Forward cross map prediction values |
CCM2_PredictStat | Reverse cross map prediction statistics |
CCM2_Predictions | Reverse cross map prediction values |
If includeData = TRUE and parameterList = TRUE, a named list "parameters" is added.
Sugihara G., May R., Ye H., Hsieh C., Deyle E., Fogarty M., Munch S., 2012. Detecting Causality in Complex Ecosystems. Science 338:496-500.
data(sardine_anchovy_sst)
df = CCM( dataFrame = sardine_anchovy_sst, E = 3, Tp = 0, columns = "anchovy",
          target = "np_sst", libSizes = "10 70 10", sample = 100 )
Time series of a circle in 2-D (sin and cos).
circle
A data frame with 200 rows and 3 columns:
Time | time index. |
x | sin component. |
y | cos component. |
ComputeError evaluates the Pearson correlation coefficient, mean absolute error and root mean square error between two numeric vectors.
ComputeError(obs, pred)
obs |
vector of observations. |
pred |
vector of predictions. |
A named list with components:
rho | Pearson correlation |
MAE | mean absolute error |
RMSE | root mean square error |
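For reference, the three statistics can be computed directly in base R. `compute_error_sketch` is a hypothetical name. Discarding non-finite pairs is an assumption of this sketch, not documented rEDM behavior. It is a convenient choice because forecast tables typically contain NaN rows at their edges.

```r
# Sketch of the three ComputeError statistics in base R (hypothetical helper,
# not the rEDM implementation). Dropping non-finite pairs is an assumption.
compute_error_sketch <- function(obs, pred) {
  ok <- is.finite(obs) & is.finite(pred)
  o <- obs[ok]
  p <- pred[ok]
  list(rho  = cor(o, p),               # Pearson correlation
       MAE  = mean(abs(o - p)),        # mean absolute error
       RMSE = sqrt(mean((o - p)^2)))   # root mean square error
}

# The NA pair is dropped; MAE = 0.25 and RMSE = sqrt(0.125) for the rest
err <- compute_error_sketch(obs = c(1, 2, 3, 4, NA), pred = c(1.5, 2, 2.5, 4, 3))
str(err)
```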
data(block_3sp)
smplx <- Simplex( dataFrame=block_3sp, lib="1 99", pred="105 190", E=3, columns="x_t", target="x_t")
err <- ComputeError( smplx$Observations, smplx$Predictions )
Embed performs Takens time-delay embedding on columns.
Embed(path = "./", dataFile = "", dataFrame = NULL, E = 0, tau = -1, columns = "", verbose = FALSE)
path |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values. The first row must be column names. One of dataFile or dataFrame is required. |
dataFrame |
input data.frame. The first column must be a time index or time values. The columns must be named. One of dataFile or dataFrame is required. |
E |
embedding dimension. |
tau |
integer time delay embedding lag specified as number of time column rows. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
verbose |
logical to produce additional console reporting. |
Each item in columns will have E-1 time-lagged vectors created. The column name is appended with (t-n). For example, data columns X, Y with E = 2 will have columns named X(t-0) X(t-1) Y(t-0) Y(t-1).
The returned data.frame does not have a time column. The returned data.frame is truncated by abs(tau) * (E-1) rows to remove state vectors with partial data (NaN elements).
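The lagging and truncation rules can be mirrored in a few lines of base R. `embed_sketch` is a hypothetical helper, not rEDM's Embed(). It only handles backward lags (tau < 0). Its column names follow the X(t-0) X(t-1) pattern described above, which the source illustrates only for tau = -1.

```r
# Base-R sketch of Takens time-delay embedding with backward lags (tau < 0).
# Hypothetical helper, not rEDM's Embed().
embed_sketch <- function(df, columns, E, tau = -1) {
  stopifnot(E >= 1, tau < 0)
  n <- nrow(df)
  step <- abs(tau)
  out <- list()
  for (col in columns) {
    x <- df[[col]]
    for (k in 0:(E - 1)) {
      shift <- k * step
      # value at row i is x[i - shift]; the first 'shift' rows have no data
      out[[sprintf("%s(t-%d)", col, as.integer(shift))]] <-
        c(rep(NA_real_, shift), x[seq_len(n - shift)])
    }
  }
  block <- data.frame(out, check.names = FALSE)   # no time column, as in Embed
  n_partial <- step * (E - 1)                      # rows with partial vectors
  if (n_partial > 0) block <- block[-seq_len(n_partial), , drop = FALSE]
  rownames(block) <- NULL
  block
}

df <- data.frame(Time = 1:5, X = c(10, 20, 30, 40, 50), Y = c(1, 2, 3, 4, 5))
blk <- embed_sketch(df, columns = c("X", "Y"), E = 2)
print(blk)   # 4 rows; first row is X(t-0) = 20, X(t-1) = 10, Y(t-0) = 2, Y(t-1) = 1
```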
A data.frame with lagged columns: E columns for each variable specified in columns.
data(circle)
embed <- Embed( dataFrame = circle, E = 2, tau = -1, columns = "x y" )
EmbedDimension uses Simplex to evaluate prediction accuracy as a function of embedding dimension.
EmbedDimension(pathIn = "./", dataFile = "", dataFrame = NULL, pathOut = "", predictFile = "", lib = "", pred = "", maxE = 10, Tp = 1, tau = -1, exclusionRadius = 0, columns = "", target = "", embedded = FALSE, verbose = FALSE, validLib = vector(), numThreads = 4, showPlot = TRUE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
pathOut |
path for predictFile. |
predictFile |
output file name. |
lib |
string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of time series observation segments used to create the library. |
pred |
string with start and stop indices of input data rows used for predictions. A single contiguous range is supported. |
maxE |
maximum value of E to evaluate. |
Tp |
prediction horizon (number of time column rows). |
tau |
lag of time delay embedding specified as number of time column rows. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
embedded |
logical specifying if the input data are embedded. |
verbose |
logical to produce additional console reporting. |
validLib |
logical vector the same length as the number of data rows. Any data row marked FALSE in this vector is excluded from the library. |
numThreads |
number of parallel threads for computation. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
A data.frame with columns E, rho.
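The scan that EmbedDimension performs can be sketched as a loop over E that runs a simplex forecast and records rho. Both helpers below are hypothetical and simplified. They use backward lags of size |tau| = 1, knn = E + 1, and leave-one-out exclusion. A chaotic logistic-map series stands in for the TentMap data, which is not reproduced here.

```r
# Sketch of an EmbedDimension-style scan (hypothetical helpers, not rEDM code).
simplex_rho <- function(x, E, Tp, lib, pred, lag = 1) {
  n <- length(x)
  # Row i of M is (x[i], x[i-lag], ..., x[i-(E-1)*lag]); NA where undefined
  M <- vapply(0:(E - 1),
              function(k) c(rep(NA_real_, k * lag), x[seq_len(n - k * lag)]),
              numeric(n))
  ok <- function(r) r[r + Tp <= n & complete.cases(M[r, , drop = FALSE])]
  L <- ok(lib)
  P <- ok(pred)
  est <- vapply(P, function(i) {
    cand <- setdiff(L, i)                         # leave-one-out
    d <- sqrt(colSums((t(M[cand, , drop = FALSE]) - M[i, ])^2))
    nn <- order(d)[seq_len(E + 1)]                # knn = E + 1
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))
    sum(w * x[cand[nn] + Tp]) / sum(w)
  }, numeric(1))
  cor(est, x[P + Tp])                             # forecast skill
}

embed_dimension_sketch <- function(x, lib, pred, maxE = 10, Tp = 1) {
  data.frame(E = 1:maxE,
             rho = vapply(1:maxE, function(E) simplex_rho(x, E, Tp, lib, pred),
                          numeric(1)))
}

# Assumed example series: a chaotic logistic map
x <- numeric(300)
x[1] <- 0.4
for (k in 2:300) x[k] <- 3.8 * x[k - 1] * (1 - x[k - 1])
E.rho <- embed_dimension_sketch(x, lib = 1:100, pred = 201:290, maxE = 6)
print(E.rho)
```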
data(TentMap)
E.rho = EmbedDimension( dataFrame = TentMap, lib = "1 100", pred = "201 500",
                        columns = "TentMap", target = "TentMap", showPlot = FALSE )
Cumulative weekly water flow into northeast Everglades from water control structures S12C, S12D and S333 from 1980 through 2005.
EvergladesFlow
A data frame with 1379 rows and 2 columns:
Date | Date. |
S12CD_S333_CFS | Cumulative weekly flow (CFS, cubic feet per second). |
5-D Lorenz'96 time series with forcing F = 8.
Lorenz5D
A data frame with 1000 rows and 6 columns:
Time | Time. |
V1 | variable 1. |
V2 | variable 2. |
V3 | variable 3. |
V4 | variable 4. |
V5 | variable 5. |
Lorenz, Edward (1996). Predictability - A problem partly solved, Seminar on Predictability, Vol. I, ECMWF.
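For reference, the Lorenz'96 model with N variables and forcing F is commonly written with cyclic indices as follows. For this dataset, N = 5 and F = 8. The integration scheme, step size and initial conditions used to generate the data are not stated here.

$$
\frac{dx_i}{dt} = \left(x_{i+1} - x_{i-2}\right) x_{i-1} - x_i + F, \qquad i = 1, \dots, N, \qquad x_{-1} = x_{N-1},\; x_0 = x_N,\; x_{N+1} = x_1
$$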
MakeBlock performs Takens time-delay embedding on columns. It is an internal function called by Embed that does not perform input error checking or validation.
MakeBlock(dataFrame, E = 0, tau = -1, columns = "", deletePartial = FALSE)
dataFrame |
input data.frame. The first column must be a time index or time values. The columns must be named. |
E |
embedding dimension. |
tau |
integer time delay embedding lag specified as number of time column rows. |
columns |
string of whitespace separated column name(s) in the input data to be embedded. |
deletePartial |
logical to delete rows with partial data. |
Each item in columns will have E-1 time-lagged vectors created. The column name is appended with (t-n). For example, data columns X, Y with E = 2 will have columns named X(t-0) X(t-1) Y(t-0) Y(t-1).
The returned data.frame does not have a time column. If deletePartial is TRUE, the returned data.frame is truncated by abs(tau) * (E-1) rows to remove state vectors with partial data (NaN elements).
A data.frame with lagged columns: E columns for each variable specified in columns.
data(TentMap)
embed <- MakeBlock(TentMap, 3, 1, "TentMap")
Multiview applies the method of Ye & Sugihara to find optimal combinations of variables that best represent the dynamics.
Multiview(pathIn = "./", dataFile = "", dataFrame = NULL, lib = "", pred = "", D = 0, E = 1, Tp = 1, knn = 0, tau = -1, columns = "", target = "", multiview = 0, exclusionRadius = 0, trainLib = TRUE, excludeTarget = FALSE, parameterList = FALSE, verbose = FALSE, numThreads = 4, showPlot = FALSE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
lib |
a 2-column matrix, data.frame, 2-element vector or string of row index pairs, where each pair specifies the first and last *rows* of the time series to create the library. |
pred |
(same format as lib), but specifying the sections of the time series to forecast. |
D |
multivariate dimension. |
E |
embedding dimension. |
Tp |
prediction horizon (number of time column rows). |
knn |
number of nearest neighbors. If knn=0, knn is set to E+1. |
tau |
lag of time delay embedding specified as number of time column rows. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
multiview |
number of multiview ensembles to average for the final prediction estimate. |
exclusionRadius |
number of adjacent observation vector rows to exclude as nearest neighbors in prediction. |
trainLib |
logical to use in-sample (lib=pred) projections for the ranking of column combinations. |
excludeTarget |
logical to exclude embedded target column from combinations. |
parameterList |
logical to add list of invoked parameters. |
verbose |
logical to produce additional console reporting. |
numThreads |
number of CPU threads to use in multiview processing. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
Multiview embedding is a method to identify variables in a multivariate dynamical system that are most likely to contribute to the observed dynamics. It is a multistep algorithm with these general steps:
Compute D-dimensional variable combination forecasts.
Rank forecasts.
Compute predictions of top combinations.
Compute multiview averaged prediction.
If E > 1, all variables are embedded to dimension E.
If trainLib is TRUE, initial forecasts and ranking are done in-sample (lib = pred), and predictions using the top-ranked combinations use the specified lib and pred.
If trainLib is FALSE, initial forecasts and ranking use the specified lib and pred, and the step of computing predictions of the top combinations is skipped.
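The four steps can be sketched in base R for the trainLib = TRUE case. This is a simplified illustration under stated assumptions, not the rEDM implementation:

- `simplex_block` and `multiview_sketch` are hypothetical helpers.
- knn = D + 1, and neighbors exclude the predicted row.
- The number of averaged views defaults to floor(sqrt(number of combinations)), an assumption loosely modeled on Ye & Sugihara 2016.
- The returned View table is simpler than rEDM's View data.frame.

```r
# Base-R sketch of the Multiview steps (hypothetical helpers, not rEDM code).
simplex_block <- function(X, y, Tp, lib, pred) {
  est <- vapply(pred, function(i) {
    cand <- setdiff(lib, i)                              # leave-one-out
    d <- sqrt(colSums((t(X[cand, , drop = FALSE]) - X[i, ])^2))
    nn <- order(d)[seq_len(ncol(X) + 1)]                 # knn = D + 1
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))
    sum(w * y[cand[nn] + Tp]) / sum(w)
  }, numeric(1))
  list(obs = y[pred + Tp], pred = est)
}

multiview_sketch <- function(df, columns, target, E, D, lib, pred, Tp = 1, k = NULL) {
  n <- nrow(df)
  # Embed every column to dimension E with backward lags: col(t-0) ... col(t-(E-1))
  emb <- do.call(cbind, lapply(columns, function(col) {
    m <- vapply(0:(E - 1), function(j) c(rep(NA_real_, j), df[[col]][seq_len(n - j)]),
                numeric(n))
    colnames(m) <- sprintf("%s(t-%d)", col, 0:(E - 1))
    m
  }))
  y <- df[[target]]
  valid <- function(r) r[r + Tp <= n & complete.cases(emb[r, , drop = FALSE])]
  lib <- valid(lib)
  pred <- valid(pred)
  combos <- combn(ncol(emb), D, simplify = FALSE)
  if (is.null(k)) k <- max(1, floor(sqrt(length(combos))))
  # Steps 1-2: in-sample (lib = pred) forecasts of every D-combination, ranked by rho
  rho <- vapply(combos, function(cc) {
    r <- simplex_block(emb[, cc, drop = FALSE], y, Tp, lib, lib)
    cor(r$obs, r$pred)
  }, numeric(1))
  top <- combos[order(rho, decreasing = TRUE)[seq_len(k)]]
  # Step 3: out-of-sample predictions of the top-ranked combinations
  runs <- lapply(top, function(cc) simplex_block(emb[, cc, drop = FALSE], y, Tp, lib, pred))
  # Step 4: multiview average of those predictions
  avg <- rowMeans(vapply(runs, function(r) r$pred, numeric(length(pred))))
  list(View = data.frame(columns = vapply(top, function(cc)
                           paste(colnames(emb)[cc], collapse = " "), ""),
                         rho = sort(rho, decreasing = TRUE)[seq_len(k)]),
       Predictions = data.frame(Observations = runs[[1]]$obs, Predictions = avg))
}

# Assumed example data: two coupled logistic maps
n <- 300
x <- y <- numeric(n)
x[1] <- 0.4
y[1] <- 0.2
for (k in 1:(n - 1)) {
  x[k + 1] <- x[k] * (3.8 - 3.8 * x[k] - 0.02 * y[k])
  y[k + 1] <- y[k] * (3.5 - 3.5 * y[k] - 0.1 * x[k])
}
mv <- multiview_sketch(data.frame(x = x, y = y), columns = c("x", "y"), target = "x",
                       E = 2, D = 2, lib = 1:150, pred = 151:280)
print(mv$View)
```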
Named list with data.frames [[View, Predictions]].
data.frame View
columns:
Col_1 | column index |
... | column index |
Col_D | column index |
rho | Pearson correlation |
MAE | mean absolute error |
RMSE | root mean square error |
name_1 | column name |
... | column name |
name_D | column name |
If parameterList = TRUE, a named list "parameters" is added.
Ye H., and G. Sugihara, 2016. Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality. Science 353:922-925.
data(block_3sp)
L = Multiview( dataFrame = block_3sp, lib = "1 100", pred = "101 190", E = 2,
               columns = "x_t y_t z_t", target = "x_t" )
Time series of Paramecium and Didinium abundances (#/mL) from an experiment by Veilleux (1979)
paramecium_didinium
PredictInterval uses Simplex to evaluate prediction accuracy as a function of forecast interval Tp.
PredictInterval(pathIn = "./", dataFile = "", dataFrame = NULL, pathOut = "./", predictFile = "", lib = "", pred = "", maxTp = 10, E = 1, tau = -1, exclusionRadius = 0, columns = "", target = "", embedded = FALSE, verbose = FALSE, validLib = vector(), numThreads = 4, showPlot = TRUE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
pathOut |
path for predictFile. |
predictFile |
output file name. |
lib |
string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of time series observation segments used to create the library. |
pred |
string with start and stop indices of input data rows used for predictions. A single contiguous range is supported. |
maxTp |
maximum value of Tp to evaluate. |
E |
embedding dimension. |
tau |
lag of time delay embedding specified as number of time column rows. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
embedded |
logical specifying if the input data are embedded. |
verbose |
logical to produce additional console reporting. |
validLib |
logical vector the same length as the number of data rows. Any data row marked FALSE in this vector is excluded from the library. |
numThreads |
number of parallel threads for computation. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
A data.frame with columns Tp, rho.
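The Tp scan can be sketched the same way by holding E fixed and varying the forecast horizon. The helpers are hypothetical and simplified: backward lags with |tau| = 1, knn = E + 1, and leave-one-out exclusion. For a chaotic series such as the assumed logistic map, rho is expected to fall as Tp grows.

```r
# Sketch of a PredictInterval-style scan (hypothetical helpers, not rEDM code).
simplex_rho <- function(x, E, Tp, lib, pred, lag = 1) {
  n <- length(x)
  M <- vapply(0:(E - 1),
              function(k) c(rep(NA_real_, k * lag), x[seq_len(n - k * lag)]),
              numeric(n))
  ok <- function(r) r[r + Tp <= n & complete.cases(M[r, , drop = FALSE])]
  L <- ok(lib)
  P <- ok(pred)
  est <- vapply(P, function(i) {
    cand <- setdiff(L, i)                       # leave-one-out
    d <- sqrt(colSums((t(M[cand, , drop = FALSE]) - M[i, ])^2))
    nn <- order(d)[seq_len(E + 1)]
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))
    sum(w * x[cand[nn] + Tp]) / sum(w)          # neighbors projected Tp steps ahead
  }, numeric(1))
  cor(est, x[P + Tp])
}

predict_interval_sketch <- function(x, lib, pred, E, maxTp = 10) {
  data.frame(Tp = 1:maxTp,
             rho = vapply(1:maxTp, function(Tp) simplex_rho(x, E, Tp, lib, pred),
                          numeric(1)))
}

x <- numeric(300)
x[1] <- 0.4
for (k in 2:300) x[k] <- 3.8 * x[k - 1] * (1 - x[k - 1])   # chaotic logistic map
Tp.rho <- predict_interval_sketch(x, lib = 1:100, pred = 201:290, E = 2)
print(Tp.rho)
```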
data(TentMap)
Tp.rho = PredictInterval( dataFrame = TentMap, lib = "1 100", pred = "201 500", E = 2,
                          columns = "TentMap", target = "TentMap", showPlot = FALSE )
PredictNonlinear uses SMap to evaluate prediction accuracy as a function of the localisation parameter theta.
PredictNonlinear(pathIn = "./", dataFile = "", dataFrame = NULL, pathOut = "./", predictFile = "", lib = "", pred = "", theta = "", E = 1, Tp = 1, knn = 0, tau = -1, exclusionRadius = 0, columns = "", target = "", embedded = FALSE, verbose = FALSE, validLib = vector(), ignoreNan = TRUE, numThreads = 4, showPlot = TRUE, noTime = FALSE )
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
pathOut |
path for predictFile. |
predictFile |
output file name. |
lib |
string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of time series observation segments used to create the library. |
pred |
string with start and stop indices of input data rows used for predictions. A single contiguous range is supported. |
theta |
A whitespace-delimited string with values of the S-map localisation parameter. An empty string selects a default set of values. |
E |
embedding dimension. |
Tp |
prediction horizon (number of time column rows). |
knn |
number of nearest neighbors. If knn=0, knn is set to the library size. |
tau |
lag of time delay embedding specified as number of time column rows. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
embedded |
logical specifying if the input data are embedded. |
verbose |
logical to produce additional console reporting. |
validLib |
logical vector the same length as the number of data rows. Any data row marked FALSE in this vector is excluded from the library. |
ignoreNan |
logical to internally redefine library to avoid nan. |
numThreads |
number of parallel threads for computation. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
The localisation parameter theta weights nearest neighbors according to exp(-theta * D / D_avg), where D is the distance between the observation vector and a neighbor, and D_avg is the mean distance. If theta = 0, all weights are unity, corresponding to a global autoregressive model. As theta increases, neighbors closer to the observation receive relatively more weight, so the model becomes increasingly local.
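The weighting rule can be written directly. `smap_weights` is a hypothetical helper. Following the description above, D_avg is taken as the mean of the supplied distances.

```r
# Sketch of the S-map localisation weights (hypothetical helper).
# d: distances from the observation vector to each library vector.
smap_weights <- function(d, theta) {
  exp(-theta * d / mean(d))        # D_avg = mean(d)
}

d  <- c(0.5, 1.0, 1.5)             # mean distance D_avg = 1
w0 <- smap_weights(d, theta = 0)   # 1 1 1: uniform weights, global linear model
w2 <- smap_weights(d, theta = 2)   # exp(-1), exp(-2), exp(-3): nearer points dominate
```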
A data.frame with columns Theta, rho.
data(TentMapNoise)
theta.rho = PredictNonlinear( dataFrame = TentMapNoise, E = 2, lib = "1 100", pred = "201 500",
                              columns = "TentMap", target = "TentMap", showPlot = FALSE )
rEDM provides tools for data-driven time series analyses. It is based on reconstructing multivariate state space representations from univariate or multivariate time series, then projecting state changes using various metrics applied to nearest neighbors.
rEDM is an Rcpp interface to the cppEDM library of Empirical Dynamic Modeling tools. Functionality includes:
Simplex projection (Sugihara and May 1990)
Sequential Locally Weighted Global Linear Maps (S-map) (Sugihara 1994)
Multivariate embeddings (Dixon et al. 1999)
Convergent cross mapping (Sugihara et al. 2012)
Multiview embedding (Ye and Sugihara 2016)
Main Functions:
Simplex - simplex projection
SMap - S-map projection
CCM - convergent cross mapping
Multiview - multiview forecasting
Helper Functions:
Embed - time delay embedding
ComputeError - forecast skill metrics
EmbedDimension - optimal embedding dimension
PredictInterval - optimal prediction interval
PredictNonlinear - evaluate nonlinearity
Maintainer: Joseph Park
Authors: Joseph Park, Cameron Smith, Ethan Deyle, Erik Saberski, George Sugihara
Sugihara G. and May R. 1990. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344:734-741.
Sugihara G. 1994. Nonlinear forecasting for the classification of natural time series. Philosophical Transactions: Physical Sciences and Engineering, 348(1688):477-495.
Dixon, P. A., M. Milicich, and G. Sugihara, 1999. Episodic fluctuations in larval supply. Science 283:1528-1530.
Sugihara G., May R., Ye H., Hsieh C., Deyle E., Fogarty M., Munch S., 2012. Detecting Causality in Complex Ecosystems. Science 338:496-500.
Ye H., and G. Sugihara, 2016. Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality. Science 353:922-925.
Time series of Pacific sardine landings (CA), Northern anchovy landings (CA), and sea-surface temperature (3-year average) at the SIO pier and Newport pier
sardine_anchovy_sst
year | year of measurement |
anchovy | anchovy landings, scaled to mean = 0, sd = 1 |
sardine | sardine landings, scaled to mean = 0, sd = 1 |
sio_sst | 3-year running average of sea surface temperature at SIO pier, scaled to mean = 0, sd = 1 |
np_sst | 3-year running average of sea surface temperature at Newport pier, scaled to mean = 0, sd = 1 |
Simplex performs time series forecasting based on weighted nearest-neighbor projection in the time series phase space, as described in Sugihara and May.
Simplex(pathIn = "./", dataFile = "", dataFrame = NULL, pathOut = "./", predictFile = "", lib = "", pred = "", E = 0, Tp = 1, knn = 0, tau = -1, exclusionRadius = 0, columns = "", target = "", embedded = FALSE, verbose = FALSE, validLib = vector(), generateSteps = 0, parameterList = FALSE, showPlot = FALSE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
pathOut |
path for predictFile. |
predictFile |
output file name. |
lib |
string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of time series observation segments used to create the library. |
pred |
string with start and stop indices of input data rows used for predictions. A single contiguous range is supported. |
E |
embedding dimension. |
Tp |
prediction horizon (number of time column rows). |
knn |
number of nearest neighbors. If knn=0, knn is set to E+1. |
tau |
lag of time delay embedding specified as number of time column rows. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
embedded |
logical specifying if the input data are embedded. |
verbose |
logical to produce additional console reporting. |
validLib |
logical vector the same length as the number of data rows. Any data row marked FALSE in this vector is excluded from the library. |
generateSteps |
number of predictive feedback generative steps. |
parameterList |
logical to add list of invoked parameters. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
If embedded is FALSE, the data column(s) are embedded to dimension E with time lag tau. This embedding forms an E-dimensional phase space for the Simplex projection.
If embedded is TRUE, the data are assumed to contain an E-dimensional embedding with E equal to the number of columns.
Predictions are made using leave-one-out cross-validation, i.e. each observation vector is excluded from its own prediction simplex. EmbedDimension can be applied to assess an optimal embedding dimension, and accuracy statistics can be estimated with ComputeError.
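A minimal base-R sketch of simplex projection makes the mechanics concrete. It assumes backward lags of size lag = |tau|, knn = E + 1 (the knn = 0 default described above), exponential weights relative to the nearest-neighbor distance, and leave-one-out exclusion. `simplex_sketch` is hypothetical. It omits features of rEDM's Simplex() such as exclusionRadius, validLib and a time column in the output.

```r
# Base-R sketch of simplex projection (hypothetical helper, not rEDM's Simplex()).
simplex_sketch <- function(x, E, Tp = 1, lib, pred, lag = 1) {
  n <- length(x)
  # Row i of M is the state vector (x[i], x[i-lag], ..., x[i-(E-1)*lag])
  M <- vapply(0:(E - 1),
              function(k) c(rep(NA_real_, k * lag), x[seq_len(n - k * lag)]),
              numeric(n))
  ok <- function(r) r[r + Tp <= n & complete.cases(M[r, , drop = FALSE])]
  L <- ok(lib)
  P <- ok(pred)
  est <- vapply(P, function(i) {
    cand <- setdiff(L, i)                     # leave-one-out: drop the row itself
    d <- sqrt(colSums((t(M[cand, , drop = FALSE]) - M[i, ])^2))
    nn <- order(d)[seq_len(E + 1)]            # the E+1 nearest neighbors
    w <- exp(-d[nn] / max(d[nn[1]], 1e-12))   # weights relative to nearest distance
    sum(w * x[cand[nn] + Tp]) / sum(w)        # weighted average of neighbors' futures
  }, numeric(1))
  data.frame(Row = P + Tp, Observations = x[P + Tp], Predictions = est)
}

x <- numeric(300)
x[1] <- 0.4
for (k in 2:300) x[k] <- 3.8 * x[k - 1] * (1 - x[k - 1])   # chaotic logistic map
smplx <- simplex_sketch(x, E = 2, lib = 1:100, pred = 101:190)
cor(smplx$Observations, smplx$Predictions)
```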
A data.frame with columns Observations, Predictions. The first column contains the time values.
If parameterList = TRUE, a named list with "predictions" holding the data.frame and "parameters" holding a named list of invoked parameters.
Sugihara G. and May R. 1990. Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344:734-741.
data( block_3sp )
smplx = Simplex( dataFrame = block_3sp, lib = "1 100", pred = "101 190", E = 3,
                 columns = "x_t", target = "x_t" )
ComputeError( smplx $ Observations, smplx $ Predictions )
SMap performs time series forecasting based on localised (or global) nearest neighbor projection in the time series phase space, as described in Sugihara 1994.
SMap(pathIn = "./", dataFile = "", dataFrame = NULL, lib = "", pred = "", E = 0, Tp = 1, knn = 0, tau = -1, theta = 0, exclusionRadius = 0, columns = "", target = "", embedded = FALSE, verbose = FALSE, validLib = vector(), ignoreNan = TRUE, generateSteps = 0, parameterList = FALSE, showPlot = FALSE, noTime = FALSE)
pathIn |
path to dataFile. |
dataFile |
.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names. |
dataFrame |
input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named. |
lib |
string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of time series observation segments used to create the library. |
pred |
string with start and stop indices of input data rows used for predictions. A single contiguous range is supported. |
E |
embedding dimension. |
Tp |
prediction horizon (number of time column rows). |
knn |
number of nearest neighbors. If knn=0, knn is set to the library size. |
tau |
lag of time delay embedding specified as number of time column rows. |
theta |
neighbor localisation exponent. |
exclusionRadius |
excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius. |
columns |
string of whitespace-separated column name(s), or vector of column names used to create the library. If individual column names contain whitespace, place the names in a vector, or append ',' to each name. |
target |
column name used for prediction. |
embedded |
logical specifying if the input data are embedded. |
verbose |
logical to produce additional console reporting. |
validLib |
logical vector the same length as the number of data rows. Any data row represented in this vector as FALSE, will not be included in the library. |
ignoreNan |
logical to internally redefine library to avoid nan. |
generateSteps |
number of predictive feedback generative steps. |
parameterList |
logical to add list of invoked parameters. |
showPlot |
logical to plot results. |
noTime |
logical to allow input data with no time column. |
If embedded is FALSE, the data column(s) are time-delay embedded to dimension E with time lag tau. This embedding forms a phase space of dimension (number of columns) * E for the SMap projection. If embedded is TRUE, the data are assumed to already contain an E-dimensional embedding, with E equal to the number of columns. See the Note below for proper use of multivariate data (number of columns > 1).
If ignoreNan is TRUE, the library (lib) is internally redefined to exclude embedding vectors containing nan. If ignoreNan is FALSE, no library adjustment is made; in that case lib can be specified explicitly to exclude library vectors containing nan.
Predictions are made using leave-one-out cross-validation, i.e. the observation row being predicted is excluded from its own prediction regression.
In contrast to Simplex, SMap uses all available neighbors and weights them by an exponential decay in phase-space distance with exponent theta. Setting theta = 0 weights all neighbors equally, corresponding to a global linear autoregressive model. As theta increases, neighbors closer to the observation receive relatively more weight, giving an increasingly local model.
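The weighting described above can be sketched in base R. This is a hypothetical illustration, not the rEDM or cppEDM code: it assumes Euclidean distances normalized by their mean, weights w = exp(-theta * d / mean(d)), and each row of the local linear system scaled by w. The helper names smap_weights and smap_predict_one are invented for this example. With theta = 0 every weight is 1, so the fit reduces to a single global linear regression over the library.

```r
# Hypothetical sketch of S-map neighbor weighting; not the rEDM/cppEDM source.
# Assumption: distances are normalized by their mean before the exponential decay.
smap_weights <- function(X, x_new, theta) {
  d <- sqrt(rowSums(sweep(X, 2, x_new)^2))   # Euclidean distance to each library vector
  d_mean <- mean(d)
  if (d_mean == 0) return(rep(1, length(d)))
  exp(-theta * d / d_mean)                    # theta = 0 gives equal weights of 1
}

# Hypothetical helper: one locally weighted linear prediction.
# X: library embedding vectors (one per row), y: their Tp-ahead target values.
smap_predict_one <- function(X, y, x_new, theta) {
  w <- smap_weights(X, x_new, theta)
  A <- cbind(1, X)                  # intercept column plus E state variables
  fit <- lm.fit(A * w, y * w)       # scale each row by its weight, then least squares
  sum(c(1, x_new) * fit$coefficients)
}
```

For nonlinear data, increasing theta makes the fitted coefficients depend mainly on library vectors near x_new, which is the localisation the text describes.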
A named list with three data.frames: predictions, coefficients, and singularValues.
The predictions data.frame has columns Observations and Predictions; its first column contains time or index values.
The coefficients data.frame has time or index values in the first column. Columns 2 through E+2 (E+1 columns) are the SMap coefficients.
The singularValues data.frame has time or index values in the first column. Columns 2 through E+2 (E+1 columns) are the SVD singular values. The first value corresponds to the SVD bias (intercept) term.
If parameterList = TRUE, a named list "parameters" is added.
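Why there are E+1 coefficients and E+1 singular values can be illustrated by solving one weighted S-map system with a singular value decomposition. The base-R sketch below is hypothetical: it assumes rows are scaled by the S-map weights w (an assumption following the usual S-map formulation), places the intercept coefficient first, and does not reproduce cppEDM's internals or its ordering of singular values. The function name smap_svd_solve is invented for this example.

```r
# Hypothetical sketch: solve the weighted S-map system by SVD.
# Returns E+1 coefficients (intercept first) and the E+1 singular values
# of the weighted design matrix. Not the cppEDM implementation.
smap_svd_solve <- function(X, y, w) {
  A <- cbind(1, X) * w                # weighted design matrix: intercept + E columns
  b <- y * w                          # weighted targets
  s <- svd(A)
  # Pseudo-inverse: drop singular values that are numerically zero
  d_inv <- ifelse(s$d > max(s$d) * 1e-12, 1 / s$d, 0)
  coef <- as.vector(s$v %*% (d_inv * crossprod(s$u, b)))
  list(coefficients = coef, singularValues = s$d)
}
```

Truncating tiny singular values keeps the solve stable when the local neighborhood is nearly degenerate, which is one reason an SVD is a natural choice for this regression.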
SMap should be called with columns explicitly corresponding to dimensions E. In the univariate case (number of columns = 1) with the default embedded = FALSE, the time series is time-delay embedded to dimension E, and the SMap coefficients correspond to each embedding dimension.
If a multivariate data set is used (number of columns > 1), it must use embedded = TRUE with E equal to the number of columns. This prevents the function from internally time-delay embedding the multiple columns to dimension E. If the internal time-delay embedding were performed, the state-space columns would not correspond to the intended dimensions in the matrix inversion, coefficient assignment, and prediction. In the multivariate case, the user should first prepare the embedding (using Embed for time-delay embedding), then pass this embedding to SMap with appropriately specified columns, E, and embedded = TRUE.
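The time-delay embedding that SMap performs internally, and that Embed prepares for the multivariate case, can be sketched with a hypothetical base-R helper. It is not rEDM::Embed: it uses a positive lag count looking into the past (rEDM's tau sign convention, default -1, is not reproduced), and the column names are illustrative only.

```r
# Hypothetical helper illustrating time-delay embedding; not rEDM::Embed.
# Column j holds x lagged by (j - 1) * tau rows; unavailable lags are NA.
delay_embed <- function(x, E, tau = 1) {
  n <- length(x)
  out <- matrix(NA_real_, nrow = n, ncol = E)
  for (j in seq_len(E)) {
    lag <- (j - 1) * tau                      # row offset into the past
    if (lag < n) out[(lag + 1):n, j] <- x[1:(n - lag)]
  }
  colnames(out) <- paste0("x(t-", (seq_len(E) - 1) * tau, ")")
  out
}
```

The leading rows contain NA because their lagged values fall before the start of the series; these are the kind of nan embedding vectors that ignoreNan = TRUE excludes from the library. For a multivariate state space, such lagged columns (or raw columns of several variables) would be combined with cbind and passed with embedded = TRUE and E equal to the number of columns.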
Sugihara G. 1994. Nonlinear forecasting for the classification of natural time series. Philosophical Transactions: Physical Sciences and Engineering, 348 (1688):477-495.
data(circle)
L = SMap(dataFrame = circle, lib = "1 100", pred = "110 190", theta = 4, E = 2, embedded = TRUE, columns = "x y", target = "x")
SurrogateData generates surrogate data under several different null models.
SurrogateData( ts, method = c("random_shuffle", "ebisuzaki", "seasonal"), num_surr = 100, T_period = 1, alpha = 0 )
Argument | Description |
---|---|
ts | the original time series |
method | which algorithm to use to generate surrogate data |
num_surr | the number of null surrogates to generate |
T_period | the period of seasonality for seasonal surrogates (ignored for other methods) |
alpha | additive noise factor: N(0,alpha) |
Method "random_shuffle" creates surrogates by randomly permuting the values of the original time series.
Method "ebisuzaki" creates surrogates by randomizing the phases of a Fourier transform, so that the null surrogates preserve the power spectrum of the original series.
Method "seasonal" creates surrogates by computing a mean seasonal trend of the specified period and shuffling the residuals. It is presumed that the seasonal trend can be extracted with a smoothing spline. Additive Gaussian noise is included according to N(0,alpha).
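The three null models can be approximated in base R as below. This is an illustrative sketch, not the rEDM source, and all function names are hypothetical. Assumptions: the seasonal trend is the per-phase mean rather than a smoothing-spline fit; alpha is treated as the noise standard deviation and applied only in the seasonal method; the phase randomization is a generic Fourier version in the spirit of the Ebisuzaki method named above.

```r
# Illustrative base-R sketches of three surrogate null models.
# Function names are hypothetical; this is not the rEDM implementation.

# Fourier phase randomization: keep each frequency's amplitude, draw new phases.
phase_randomize <- function(ts) {
  n <- length(ts)
  m <- floor((n - 1) / 2)              # positive frequencies below Nyquist
  if (m < 1) return(ts)
  z <- fft(ts)
  idx <- 2:(m + 1)
  z[idx] <- Mod(z[idx]) * exp(1i * runif(m, 0, 2 * pi))
  z[n - idx + 2] <- Conj(z[idx])       # Hermitian symmetry keeps the result real
  Re(fft(z, inverse = TRUE) / n)       # R's inverse fft is unnormalized
}

# Seasonal surrogate: per-phase mean trend plus shuffled residuals plus noise.
# Assumption: per-phase mean instead of a smoothing spline; alpha is the noise sd.
seasonal_surrogate <- function(ts, T_period, alpha = 0) {
  phase <- (seq_along(ts) - 1) %% T_period   # position within each cycle
  trend <- ave(ts, phase)                    # mean value at each phase
  resid <- ts - trend
  trend + resid[sample.int(length(resid))] + rnorm(length(ts), 0, alpha)
}

# Matrix of surrogates, one per column, mirroring the documented return shape.
make_surrogates <- function(ts, method = c("random_shuffle", "ebisuzaki", "seasonal"),
                            num_surr = 100, T_period = 1, alpha = 0) {
  method <- match.arg(method)
  one <- switch(method,
    random_shuffle = function() ts[sample.int(length(ts))],
    ebisuzaki      = function() phase_randomize(ts),
    seasonal       = function() seasonal_surrogate(ts, T_period, alpha))
  replicate(num_surr, one())
}
```

The mean (zero-frequency) term and, for even lengths, the Nyquist term are left untouched, so each phase-randomized surrogate keeps the original mean and amplitude spectrum.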
A matrix where each column is a separate surrogate with the same length as ts.
data("block_3sp")
ts <- block_3sp$x_t
SurrogateData(ts, method = "ebisuzaki")
First-differenced time series generated from the tent map recurrence relation with mu = 2.
TentMap
A data frame with 999 rows and 2 columns:
Time
time index.
TentMap
tent map values.
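For reference, the standard tent map recurrence and the first difference stored in the data are written below. The initial value and the exact generation procedure are not stated in this documentation and are not assumed here. This is given as math rather than as a simulation because a naive double-precision iteration with mu exactly 2 collapses to 0 after a few dozen steps.

```latex
x_{t+1} =
\begin{cases}
\mu \, x_t, & x_t < \tfrac{1}{2} \\
\mu \, (1 - x_t), & x_t \ge \tfrac{1}{2}
\end{cases}
\qquad \mu = 2,
\qquad \Delta x_t = x_t - x_{t-1}
```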
First-differenced time series generated from the tent map recurrence relation with mu = 2 and random noise.
TentMapNoise
A data frame with 999 rows and 2 columns:
Time
time index.
TentMap
tent map values.
Seasonal outbreaks of Thrips imaginis.
Davidson and Andrewartha, Annual trends in a natural population of Thrips imaginis Thysanoptera, Journal of Animal Ecology, 17, 193-199, 1948.