---
title: Goodness-of-fit measures to compare observed and simulated time series with hydroGOF
author:
- Mauricio Zambrano-Bigiarini^[mauricio.zambrano@ufrontera.cl]
date: "version 0.3, 21-Jan-2024"
output: 
  pdf_document:
    number_sections: yes
  rmarkdown::
    html_vignette
vignette: |
  %\VignetteIndexEntry{Goodness-of-fit measures to compare observed and simulated time series with hydroGOF}
  %\VignetteKeyword{hydrology}
  %\VignetteKeyword{hydrological modelling}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Citation

If you use *[hydroGOF](https://cran.r-project.org/package=hydroGOF)*, please cite it as Zambrano-Bigiarini (2024):

Zambrano-Bigiarini, M. (2024) hydroGOF: Goodness-of-fit functions for comparison of simulated and observed hydrological time series R package version 0.5-4. URL: https://cran.r-project.org/package=hydroGOF. doi:10.5281/zenodo.839854.


# Installation

Installing the latest stable version (from [CRAN](https://cran.r-project.org/package=hydroGOF)):
```{r installation1, eval=FALSE}
install.packages("hydroGOF")
```


\noindent Alternatively, you can also try the under-development version (from  [Github](https://github.com/hzambran/hydroGOF)):
```{r installation2, eval=FALSE}
if (!require(devtools)) install.packages("devtools")
library(devtools)
install_github("hzambran/hydroGOF")
```


# Setting up the environment

Loading the *hydroGOF* package, which contains data and functions used in this analysis:

```{r LoadingPkg}
library(hydroGOF)
```

# Example using NSE

The following examples use the well-known Nash-Sutcliffe efficiency (NSE), but you can repeat the computations using any of the goodness-of-fit measures included in the *hydroGOF* package (e.g., KGE, ubRMSE, dr).


## Example 1

Basic ideal case with a numeric sequence of integers:

```{r Example1}
obs <- 1:10
sim <- 1:10
NSE(sim, obs)

obs <- 1:10
sim <- 2:11
NSE(sim, obs)
```


## Example 2

From this example onwards, a streamflow time series will be used.

First, we load the daily streamflows of the Ega River (Spain), from 1961 to 1970:

```{r Example2-Loading}
data(EgaEnEstellaQts)
obs <- EgaEnEstellaQts
```


Generating a simulated daily time series, initially equal to the observed series:
```{r Example2-1}
sim <- obs 
```

Computing the 'NSE' for the "best" (unattainable) case
```{r Example2-2}
NSE(sim=sim, obs=obs)
```


## Example 3

NSE for simulated values equal to observations plus random noise on the first half of the observed values. 

This random noise has more relative importance for low flows than for medium and high flows.
  
Randomly changing the first 1826 elements of 'sim', by using a normal distribution with mean 10 and standard deviation equal to 1 (default of 'rnorm').

```{r Example3-1, fig.width=8, fig.height=5}
sim[1:1826] <- obs[1:1826] + rnorm(1826, mean=10)
ggof(sim, obs)

NSE(sim=sim, obs=obs)
```

Let's have a look at other goodness-of-fit measures:

```{r Example3-2, fig.width=8, fig.height=5}
mNSE(sim=sim, obs=obs)               # modified NSE
rNSE(sim=sim, obs=obs)               # relative NSE

KGE(sim=sim, obs=obs)                # Kling-Gupta efficiency (KGE), 2009
KGE(sim=sim, obs=obs, method="2012") # Kling-Gupta efficiency (KGE), 2012
KGElf(sim=sim, obs=obs)              # KGE for low flows
KGEnp(sim=sim, obs=obs)              # Non-parametric KGE
sKGE(sim=sim, obs=obs)               # Split KGE

d(sim=sim, obs=obs)                  # Index of agreement (d)
rd(sim=sim, obs=obs)                 # Relative d
md(sim=sim, obs=obs)                 # Modified d
dr(sim=sim, obs=obs)                 # Refined d

VE(sim=sim, obs=obs)                 # Volumetric efficiency
cp(sim=sim, obs=obs)                 # Coefficient of persistence

pbias(sim=sim, obs=obs)              # Percent bias (PBIAS)
pbiasfdc(sim=sim, obs=obs)           # PBIAS in the slope of the midsegment of the FDC

rmse(sim=sim, obs=obs)               # Root mean square error (RMSE)
ubRMSE(sim=sim, obs=obs)             # Unbiased RMSE

rPearson(sim=sim, obs=obs)           # Pearson correlation coefficient
rSpearman(sim=sim, obs=obs)          # Spearman rank correlation coefficient
R2(sim=sim, obs=obs)                 # Coefficient of determination (R2)
br2(sim=sim, obs=obs)                # R2 multiplied by the slope of the regression line
```

## Example 4: 
NSE for simulated values equal to observations plus random noise on the first half of the observed values and applying (natural) logarithm to 'sim' and 'obs' during computations.

```{r Example4-1}
NSE(sim=sim, obs=obs, fun=log)
```

Verifying the previous value:

```{r Example4-2}
lsim <- log(sim)
lobs <- log(obs)
NSE(sim=lsim, obs=lobs)
```

Let's have a look at other goodness-of-fit measures:

```{r Example4-3, fig.width=8, fig.height=5}
mNSE(sim=sim, obs=obs, fun=log)               # modified NSE
rNSE(sim=sim, obs=obs, fun=log)               # relative NSE

KGE(sim=sim, obs=obs, fun=log)                # Kling-Gupta efficiency (KGE), 2009
KGE(sim=sim, obs=obs, method="2012", fun=log) # Kling-Gupta efficiency (KGE), 2012
KGElf(sim=sim, obs=obs)                       # KGE for low flows (it does not allow 'fun' argument)
KGEnp(sim=sim, obs=obs, fun=log)              # Non-parametric KGE
sKGE(sim=sim, obs=obs, fun=log)               # Split KGE

d(sim=sim, obs=obs, fun=log)                  # Index of agreement (d)
rd(sim=sim, obs=obs, fun=log)                 # Relative d
md(sim=sim, obs=obs, fun=log)                 # Modified d
dr(sim=sim, obs=obs, fun=log)                 # Refined d

VE(sim=sim, obs=obs, fun=log)                 # Volumetric efficiency
cp(sim=sim, obs=obs, fun=log)                 # Coefficient of persistence

pbias(sim=sim, obs=obs, fun=log)              # Percent bias (PBIAS)
pbiasfdc(sim=sim, obs=obs, fun=log)           # PBIAS in the slope of the midsegment of the FDC

rmse(sim=sim, obs=obs, fun=log)               # Root mean square error (RMSE)
ubRMSE(sim=sim, obs=obs, fun=log)             # Unbiased RMSE

rPearson(sim=sim, obs=obs, fun=log)           # Pearson correlation coefficient (r)
rSpearman(sim=sim, obs=obs, fun=log)          # Spearman rank correlation coefficient (rho)
R2(sim=sim, obs=obs, fun=log)                 # Coefficient of determination (R2)
br2(sim=sim, obs=obs, fun=log)                # R2 multiplied by the slope of the regression line
```

## Example 5

NSE for simulated values equal to observations plus random noise on the first half of the observed values and applying (natural) logarithm to 'sim' and 'obs' and adding the Pushpalatha2012 constant during computations

```{r Example5-1}
NSE(sim=sim, obs=obs, fun=log, epsilon.type="Pushpalatha2012")
```

Verifying the previous value, with the epsilon value following Pushpalatha2012:

```{r Example5-2}
eps  <- mean(obs, na.rm=TRUE)/100
lsim <- log(sim+eps)
lobs <- log(obs+eps)
NSE(sim=lsim, obs=lobs)
```

Let's have a look at other goodness-of-fit measures:

```{r Example5-3}
gof(sim=sim, obs=obs, fun=log, epsilon.type="Pushpalatha2012", do.spearman=TRUE, do.pbfdc=TRUE)
```

## Example 6

NSE for simulated values equal to observations plus random noise on the first half of the observed values and applying (natural) logarithm to 'sim' and 'obs' and adding a user-defined constant during computations

```{r Example6-1}
eps <- 0.01
NSE(sim=sim, obs=obs, fun=log, epsilon.type="otherValue", epsilon.value=eps)
```

Verifying the previous value:

```{r Example6-2}
lsim <- log(sim+eps)
lobs <- log(obs+eps)
NSE(sim=lsim, obs=lobs)
```

Let's have a look at other goodness-of-fit measures:

```{r Example6-3}
gof(sim=sim, obs=obs, fun=log, epsilon.type="otherValue", epsilon.value=eps, do.spearman=TRUE, do.pbfdc=TRUE)
```

## Example 7

NSE for simulated values equal to observations plus random noise on the first half of the observed values and applying (natural) logarithm to 'sim' and 'obs' and using a user-defined factor to multiply the mean of the observed values to obtain the constant to be added to 'sim' and 'obs' during computations

```{r Example7-1}
fact <- 1/50
NSE(sim=sim, obs=obs, fun=log, epsilon.type="otherFactor", epsilon.value=fact)
```

Verifying the previous value:

```{r Example7-2}
fact <- 1/50
eps  <- fact*mean(obs, na.rm=TRUE)
lsim <- log(sim+eps)
lobs <- log(obs+eps)
NSE(sim=lsim, obs=lobs)
```

Let's have a look at other goodness-of-fit measures:

```{r Example7-3}
gof(sim=sim, obs=obs, fun=log, epsilon.type="otherFactor", epsilon.value=fact, do.spearman=TRUE, do.pbfdc=TRUE)
```

## Example 8

NSE for simulated values equal to observations plus random noise on the first half of the observed values and applying a user-defined function to 'sim' and 'obs' during computations:

```{r Example8-1}
fun1 <- function(x) {sqrt(x+1)}
NSE(sim=sim, obs=obs, fun=fun1)
```

Verifying the previous value, with the epsilon value following Pushpalatha2012:

```{r Example8-2}
sim1 <- sqrt(sim+1)
obs1 <- sqrt(obs+1)
NSE(sim=sim1, obs=obs1)
```

```{r Example8-3}
gof(sim=sim, obs=obs, fun=fun1, do.spearman=TRUE, do.pbfdc=TRUE)
```

# A short example from hydrological modelling

Loading observed streamflows of the Ega River (Spain), with daily data from 1961-Jan-01 up to 1970-Dec-31

```{r }
require(zoo)
data(EgaEnEstellaQts)
obs <- EgaEnEstellaQts
```

Generating a simulated daily time series, initially equal to the observed values (simulated values are usually read from the output files of the hydrological model)

```{r }
sim <- obs 
```

Computing the numeric goodness-of-fit measures for the "best" (unattainable) case

```{r }
gof(sim=sim, obs=obs)
```

-  Randomly changing the first 1826 elements of 'sim' (half of the ts), by using a normal distribution with mean 10 and standard deviation equal to 1 (default of 'rnorm').

```{r }
sim[1:1826] <- obs[1:1826] + rnorm(1826, mean=10)
```

Plotting the graphical comparison of 'obs' against 'sim', along with the numeric goodness-of-fit measures for the daily and monthly time series 

```{r fig=TRUE, pdf=TRUE, eps=FALSE, fig.width=10, fig.height=7}
ggof(sim=sim, obs=obs, ftype="dm", FUN=mean)
```


## Removing warm-up period

Using the first two years (1961-1962) as warm-up period, and removing the corresponding observed and simulated values from the computation of the goodness-of-fit measures:
```{r fig=TRUE, pdf=TRUE, eps=FALSE, fig.width=10, fig.height=7}
ggof(sim=sim, obs=obs, ftype="dm", FUN=mean, cal.ini="1963-01-01")
```


Verification of the goodness-of-fit measures for the daily values after removing the warm-up period:
```{r }
sim <- window(sim, start="1963-01-01")
obs <- window(obs, start="1963-01-01")

gof(sim, obs)
```

## Plotting uncertainty bands

Generating fictitious lower and upper uncertainty bounds:

```{r ubands1, fig.width=8, fig.height=5}
lband <- obs - 5
uband <- obs + 5
plotbands(obs, lband, uband)
```

Plotting the previously generated uncertainty bands:

```{r ubands2, fig.width=8, fig.height=5}
plotbands(obs, lband, uband)
```

Randomly generating a simulated time series:

```{r ubands3}
sim <- obs + rnorm(length(obs), mean=3)
```

Plotting the previously generated simualted time series along the obsertations and the uncertainty bounds:

```{r ubands4, fig.width=8, fig.height=5}
plotbands(obs, lband, uband, sim)
```


## Analysis of the residuals


Computing the daily residuals (even if this is a dummy example, it is enough for illustrating the capability)

```{r }
r <- sim-obs
```

Summarizing and plotting the residuals (it requires the hydroTSM package):

```{r }
library(hydroTSM)
smry(r) 
```

```{r fig=TRUE, pdf=TRUE, fig.width=10, fig.height=12}
# daily, monthly and annual plots, boxplots and histograms
hydroplot(r, FUN=mean)
```


Seasonal plots and boxplots
```{r fig=TRUE, eval=TRUE, pdf=TRUE, eps=FALSE, fig.width=8, fig.height=8}
# daily, monthly and annual plots, boxplots and histograms
hydroplot(r, FUN=mean, pfreq="seasonal")
```


# Software details
This tutorial was built under: 

```{r echo=FALSE}
sessionInfo()$platform
sessionInfo()$R.version$version.string 
paste("hydroGOF", sessionInfo()$otherPkgs$hydroGOF$Version)
```

# Version history

* v0.3: Jan-2024
* v0.2: Mar-2020
* v0.1: Aug 2011