---
title: "GPoM : 5 Models predictability"
author: "Sylvain Mangiarotti & Mireille Huc"
date: "`r Sys.Date()`"
output:
  pdf_document:
    number_sections: no
    latex_engine: xelatex
vignette: >
  %\VignetteIndexEntry{GPoM : 5 Models predictability}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
load package: "`r library(GPoM)`"
---

## Model performances

Evaluating a model's performance is often a difficult problem.
It is in particular the case when
the studied dynamics is characterized by a high sensitivity
to the initial conditions, as in many real-world
systems (climatic, hydrological, epidemiological,
ecological systems, etc.).
Indeed, such high sensitivity leads to the following situation:
a small difference in the initial state of the system
(real or modeled) can lead to diverging trajectories,
even under the same governing equations.
As a consequence, for slightly
different initial conditions, the model simulations may
be very different from the observed measurements,
even if the model is very good (or even perfect).

One way to circumvent this difficulty is to use nonlinear
invariants for model validation.
Nonlinear invariants are independent of the initial
conditions and are thus, by design, well suited
to this purpose.
Three types of nonlinear invariants can be used:
dynamical^[A. Wolf, J. B. Swift, H. L. Swinney, & J. A. Vastano, 1985. Determining Lyapunov exponents from a time series, *Physica D*, **16**, 285-317.], 
geometrical^[P. Grassberger and I. Procaccia, 1983. Characterization of strange attractors, *Physical Review Letters*, **50**, 346-349.]
and topological^[R. Gilmore, 1998. Topological analysis of chaotic dynamical systems, *Reviews of Modern Physics*, **70**, 1455-1530.].
In principle, these invariants can provide
powerful measures for validation, calibration,
etc.
Unfortunately, in practice, they may also have
important limitations.
One difficulty met with dynamical and geometrical
invariants is that the algorithms used to estimate
them are generally quite sensitive to noise, and thus
difficult to use for a robust validation when datasets
from real-world conditions are considered.
Moreover, these two types of invariants correspond to
statistical properties and cannot guarantee
that all the dynamical properties are preserved
(dynamics characterized by very different topologies
can have very similar statistical properties).
Topological properties are much richer because
they can provide a detailed description of
the dynamical properties based on integers,
and topological invariants are less sensitive
to noise.
At present, their application is mainly restricted
to three-dimensional systems, although attempts
have been made to extend their use to higher
dimensions^[R. Gilmore & M. Lefranc,
*The Topology of Chaos* (Wiley, New York), 2002.]$^,$ ^[S. Mangiarotti, Modélisation globale et caractérisation topologique de dynamiques environnementales: de l’analyse des enveloppes fluides et du couvert de surface de la Terre à la caractérisation topolodynamique du chaos, Habilitation thesis, Université de Toulouse, 2014.]$^,$ ^[S. Mangiarotti & C. Letellier, 2015. Topological analysis for designing a suspension of the Hénon map,
*Physics Letters A*, **379**, 3069-3074.].

Alternative approaches may thus sometimes be preferred,
especially when considering real-world situations
and noisy time series.
Several alternative validation approaches are given
in reference^[S. Mangiarotti, M. Peyre & M. Huc, 2016.
A chaotic model for the epidemic of Ebola virus disease in West
Africa (2013–2016). *Chaos*, **26**, 113112.].
Model predictability is one of these alternatives
and is based on the growth of the forecast
error^[S. Mangiarotti, L. Drapeau & C. Letellier, 2014.
Two chaotic global models for cereal crops cycles
observed from satellite in Northern Morocco,
*Chaos*, **24**, 023130.]$^,$ ^[S. Mangiarotti, 2015.
Low dimensional chaotic models for the plague epidemic
in Bombay (1896-1911), *Chaos, Solitons & Fractals*,
**81**(A), 184-196.].
The use of the $d_{M-\Psi}$ distance may be another
alternative solution^[S. Mangiarotti, A.K. Sharma, S. Corgne, L. Hubert-Moy,
L. Ruiz, M. Sekhar, Y. Kerr, 2018. Can the global modelling
technique be used for crop classification? *Chaos, Solitons & Fractals*,
**106**, 363-378.].

## Predictability

One practical way to analyze a model's performance
is to quantify its forecast error growth.
To do so, it is necessary to launch a forecast
from some known initial condition and to compare
this forecast to the original time series.

To illustrate the method, the global modelling technique
is first applied in order to get an ensemble of models
for which predictability will be tested.

```{r, eval = TRUE, include=FALSE}
# load data
data("Ross76")
# time vector
tin <- Ross76[seq(1, 3000, by = 8), 1]
# single time series, here y(t) is chosen
data <- Ross76[seq(1, 3000, by = 8), 3]
# global modelling
# results are put in list outputGPoM
outputGPoM <- gPoMo(data[1:300], tin = tin[1:300], dMax = 2, nS=c(3),
                    show = 0, method = 'rk4',
                    nPmin = 3, nPmax = 12, IstepMin = 400, IstepMax = 401)
```

Eight models are obtained here:

```{r, eval = TRUE}
sum(outputGPoM$okMod)
```
```{r, eval = TRUE}
which(outputGPoM$okMod == 1)
```

For a single initial condition, the forecasting error
is defined as:

$e(t,h) = y(t+h) - y^{obs}(t+h)$,

with $t$ the time at which forecasting is launched,
and $h$ the horizon of prediction.
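This definition can be written directly in base R. Below is a minimal sketch, in which `yobs`, `ypred`, and the sampling step `dt` are illustrative variables, not part of the `GPoM` API:

```r
# forecast error e(t, h) at a fixed launch time t:
# ypred[i] is the forecast at horizon h = (i - 1) * dt,
# yobs[i] the corresponding observation
forecastError <- function(ypred, yobs) ypred - yobs

# illustrative series (time sampling dt = 0.08, as in this vignette)
dt <- 0.08
h  <- seq(0, 1.2, by = dt)
yobs  <- sin(h)
ypred <- sin(1.01 * h)   # a slightly mismatched "model"
e <- forecastError(ypred, yobs)
```

The error `e[i]` then corresponds to the horizon `h[i]`; by construction `e[1] = 0`, since the forecast is launched from the observed state itself.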

Such a forecasting error $e$ can be estimated for
one model, starting from one chosen initial state.
Model #1 is defined by the following set of equations:

```{r, eval = TRUE}
visuEq(outputGPoM$models$model1)
```

The initial state can be chosen as:

```{r, eval = TRUE}
x0 <- head(outputGPoM$filtdata, 1)[1:3]
```

This model can be integrated numerically with the
`numicano` function, starting from the initial condition `x0`,
which provides the prediction $y(t)$
to be compared to the original time series $y^{obs}(t)$.

```{r, eval = TRUE, fig.align='center'}
###############
# forecasting #
###############
outNumi <- numicano(nVar = 3, dMax = 2, Istep = 100, onestep = 0.08, 
                    KL = outputGPoM$models$model1, v0 = x0, method = 'rk4')

plot(outputGPoM$tfilt, outputGPoM$filtdata[,1], xlim = c(0,10),
     type='l', main = 'Observed and simulated',
     xlab = expression(italic(h)), ylab = expression(italic(y(h))))

t0 = outputGPoM$tfilt[1]
lines(outNumi$reconstr[,1] + t0,outNumi$reconstr[,2], type='l', col = 'red')

nbpt <- length(outNumi$reconstr[,2])
lines(c(-5,30), c(0,0), type='l', col = 'gray')
lines(outNumi$reconstr[,1] + t0, outNumi$reconstr[,2] - outputGPoM$filtdata[1:nbpt,1],
      type='l', col = 'green')
legend(0,-4, c("simulated", "observed", "difference"), col=c('red', 'black', 'green'), 
       lty=1, cex = 0.6)
```

Since the system is characterized by a high sensitivity
to the initial conditions, the forecast error growth may
depend on the chosen initial state.
To get a more general picture of the error growth
(plotted above in green for the chosen initial condition `x0`),
the error should be re-estimated several times, starting from
other initial states.
The aim of the `predictab` function is to perform these
iterations automatically.
With this function, the predictability of any of the
eight models can be assessed (here model #9 is selected)
for an ensemble of `Nech` initial conditions,
and over a prediction horizon of `hp`
steps, corresponding here to a time interval of 1.2 (since
the time sampling of `tin` equals 0.08, see above):

```{r, eval = TRUE}
#######################
# test predictability #
#######################
outpred <- predictab(outputGPoM, hp = 15, Nech = 30, selecmod = 9, show = 0)
```


```{r, eval = TRUE, fig.show='hold'}
# manual visualisation of the outputs (e.g. for model 9):
plot(c(outpred$hpE[1], max(outpred$hpE)), c(0,0),
     type='l', main = 'Error growth',
     xlab = expression(italic(h)), ylab = expression(italic(e(h))),
     ylim = c(min(outpred$Errmod9), max(outpred$Errmod9)))

for (i in 1:dim(outpred$Errmod9)[2]) {
  lines(outpred$hpE, outpred$Errmod9[,i], col = 'green')
}
lines(c(outpred$hpE[1], max(outpred$hpE)), c(0,0), type='l')

# in terms of variance
# manual visualisation of the outputs (e.g. for model 9):
plot(c(outpred$hpE[1], max(outpred$hpE)), c(0,0),
     type='l', main = 'Square error growth',
     xlab = expression(italic(h)), ylab = expression(italic(e^2) (italic(h))),
     ylim = c(0, 0.25*max(outpred$Errmod9)^2))

for (i in 1:dim(outpred$Errmod9)[2]) {
  lines(outpred$hpE, outpred$Errmod9[,i]^2, col = 'green')
}
lines(c(outpred$hpE[1], max(outpred$hpE)), c(0,0), type='l')
```
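The ensemble of error curves can also be summarized into a single root-mean-square error profile as a function of the horizon. The following sketch assumes, as above, that the error matrix has one row per horizon and one column per initial condition (the helper `rmseGrowth` and the matrix `Err` are illustrative, not part of the `GPoM` API):

```r
# RMSE at each horizon, averaged over the ensemble of initial
# conditions (rows = horizons, columns = ensemble members)
rmseGrowth <- function(Errmat) sqrt(rowMeans(Errmat^2))

# illustrative error matrix: 3 horizons, 2 ensemble members
Err <- matrix(c(0, 1, 2,
                0, -1, -2), ncol = 2)
rmseGrowth(Err)   # -> 0 1 2: the error amplitude grows with the horizon
```

Applied to the output above, `plot(outpred$hpE, rmseGrowth(outpred$Errmod9), type = 'l')` would give the mean error growth of model 9.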

The `predictab` function can be applied to several models
at once, and the forecast error growth $e(t,h)$ can also be
plotted as a two-dimensional representation, where the color
represents the forecasting error (note that different palettes
are used here for the two plots):

```{r, eval = TRUE}
#######################
# test predictability #
#######################
outpred <- predictab(outputGPoM, hp = 15, Nech = 30, selecmod = c(1,9), show = 0)
```

```{r, eval = TRUE, fig.show='hold'}
# manual visualisation of the outputs (e.g. for model 1):
image(outpred$tE, outpred$hpE, t(outpred$Errmod1),
      xlab = expression(italic(t)), ylab = expression(italic(h)),
      main = expression(italic(e[model1](t,h))))
# (e.g. for model 9):
image(outpred$tE, outpred$hpE, t(outpred$Errmod9),
      xlab = expression(italic(t)), ylab = expression(italic(h)),
      main = expression(italic(e[model9])(italic(t),italic(h))))
```

For an interpretation of this kind of plot,
see reference^[S. Mangiarotti, P. Mazzega, P. Hiernaux, and E. Mougin, 2012.
Predictability of vegetation cycles over the semi-arid region of
Gourma (Mali) from forecasts of AVHRR-NDVI signals,
*Remote Sensing of Environment*, **123**, 246-257.].
By default, the `predictab` function is applied to all the
unclassified models (that is, models with `okMod == 1`).
An overview can also be obtained at a glance using the default
parameterization (that is, with `show = 1`):

```{r, eval = FALSE}
#######################
# test predictability #
#######################
outpred <- predictab(outputGPoM, hp = 15, Nech = 30)
```

## Conclusions

You have reached the end of this tutorial.
You should now be able to use most of the tools
provided in the `GPoM` package.
We hope you will enjoy applying this package
to your favorite data sets.
Two other vignettes are available: one aims
to show the robustness of global modelling
under noisy conditions, subsampling, and short
time series (`6 Sensitivity`);
the other illustrates its performance on other
dynamical systems (`7 Retromodelling`),
since the previous tests were performed
almost exclusively on the same system.

When presenting your results, please kindly quote
the appropriate references to refer to the package,
and to the various techniques used
(see the introductory vignette `GeneralIntro`).