Discharge simulations are affected by several sources of uncertainty (e.g. errors in meteorological forcings, sub-optimal parameter estimates).
Data assimilation (DA) allows to combine measurements and model simulations under the assumption that both supply useful information to obtain the most likely estimate of the system state. Among the sequential methods, the ensemble-based techniques allow to explicitly handle different sources of uncertainty and to quantify the unknown errors of both model states and observations.
After the study of Piazzi et al.
(accepted), airGRdatassim package (Piazzi
and Delaigue 2021) provides functions allowing to perform
DA-based ensemble discharge simulations. airGRdatassim is a package
based on the airGR hydrological
modeling package (Coron et al. 2017,
2023). This package is designed to flexibly use the Ensemble
Kalman filter (EnKF) or Particle filter (PF) technique to assimilate
daily discharge observations into GR4J, GR5J, and GR6J models (use the
command ?airGR
to get the references of the GR models).
The uncertainty in both the meteorological forcings (i.e. precipitation and potential evapotranspiration) and model states (i.e. production store level, routing store level and unit hydrographs) can be easily taken into account by enabling specific perturbation procedures.
In this user guide we go through the assimilation procedure with the aim of explaining how to perform DA-based discharge simulations. Along with the description of the modelling system, key recommendations are provided to support the definition of the operational configuration.
airGRdatassim is a package based on the airGR hydrological modeling package (Coron et al. 2017, 2023).
All the examples rely on the following data set, which is available in airGR package:
## Loading required package: airGR
## DatesR P T E Qls Qmm
## 1 1984-01-01 4.1 0.5 0.2 2640 0.6336
## 2 1984-01-02 15.9 0.2 0.2 3440 0.8256
## 3 1984-01-03 0.8 0.9 0.3 12200 2.9280
## 4 1984-01-04 0.0 0.5 0.3 7600 1.8240
## 5 1984-01-05 0.0 -1.6 0.1 6250 1.5000
## 6 1984-01-06 0.0 0.9 0.3 5650 1.3560
This section introduces and explains the main settings of the DA scheme that the user needs to define.
The ensemble size can be defined by assigning NbMbr
with
the number of ensemble members.
It is noteworthy that the PF technique is more sensitive to the ensemble size than the EnKF scheme. Indeed, compared to the EnKF scheme, the PF generally needs a larger ensemble size to efficiently perform the weighting and resampling of particles. It is recommended to generate and use an ensemble of at least 20 or 30 members, when using the EnKF or the PF scheme, respectively.
The DA methodology can be chosen between EnKF and PF by defining
DaMethod
. To perform the open-loop simulation (i.e.,
without DA), the assimilation of discharge observations can be
disabled.
If DaMethod = "PF"
, the PF performs the joint update of
all the state variables.
If DaMethod = "EnKF"
, the EnKF performs the update of
the state variables specified in StateEnKF
.
If DaMethod = "none"
, discharge assimilation is not
performed (i.e., open-loop simulation).
Among the model states, the update of the routing store level ensures the most benefit from the assimilation of observed discharges. Conversely, the update of the unit hydrograph only has a slight impact on the accuracy of the DA-based discharge simulations (Piazzi et al. accepted).
Both the uncetainties in meteorological forcings and state variables can be taken into account in the assimilation scheme.
For both DA techniques, the state variables defined in
StatePert
are perturbed.
The perturbation of model states is performed through a normally-distributed null-mean noise. The noise variance is assumed equal to the variance of the state variables resulting from the analysis procedure and it is restricted between upper and lower limits (Salamon and Feyen 2009).
When considering the uncertainty in the forcings, the
CreateInputsPert()
function generates an ensemble for
precipitation or/and potential evapotranspiration, provided as function
argument/s. It is noteworthy that a significant reduction in the
ensemble spread may occur especially in no-rain periods, when accounting
only for meteorological uncertainty.
Even though a more comprehensive representation of the state uncertainty can prevent the ensemble shrinkage, it has a different impact on the performance of the PF and the EnKF schemes (Piazzi et al. accepted). Indeed, while a higher model error covariance can lead to an overweighting of observations in the EnKF-based analysis procedure, a larger spread of the ensemble simulations allows for a more straightforward weighting and thus a more efficient resampling of particles in the PF scheme.
The example below considers EnKF-based discharge simulations relying
on a 100-member ensemble, when accounting for the uncertainty in model
state variables (see StatePert
). More specifically, only
the levels of the production ("Prod"
) and the routing
("Rout"
) stores are updated and perturbed (see
StatePert
).
It is of key importance to consider that StateEnKF
allows to select the state variables to be updated and
StatePert
identifies the state variables to be perturbed in
the EnKF scheme. Conversely, in the PF scheme StatePert
identifies the state variables of interest only within the perturbation
procedure, as this DA technique jointly updates all the model
states.
When accounting for the uncertainty in meteorological forcings, the
CreateInputsPert()
function generates an ensemble for
precipitation or/and potential evapotranspiration, provided as function
argument/s.
Probabilistic model inputs result from the perturbation of the meteorological data through a multiplicative stochastic noise, according to the methodology proposed by Clark et al. (2008) The random perturbations are provided through a first-order autoregressive model in order to guarantee a physical consistency and a temporal correlation of the time-variant forcings. For the generation of perturbations of both the meteorological variables, the fractional error parameter is set equal to 0.65 and the temporal decorrelation length is defined as 1 day for rainfall and 2 days for potential evapotranspiration, by default.
Deterministic model inputs are still needed for the warm-up period
and they are generated through the CreateInputsModel()
function of the airGR package.
In this example, discharge simulations are provided by GR5J model throughout the period from 2006-09-01 to 2006-09-31.
## simulation period
IndRun <- seq(which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-09-01"),
which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-10-31"))
## preparation of InputsModel object
InputsModel <- CreateInputsModel(FUN_MOD = RunModel_GR5J, DatesR = BasinObs$DatesR,
Precip = BasinObs$P, PotEvap = BasinObs$E)
## generation of probabilistic model inputs
InputsPert <- CreateInputsPert(FUN_MOD = RunModel_GR5J,
DatesR = BasinObs$DatesR,
Precip = BasinObs$P, PotEvap = BasinObs$E,
NbMbr = NbMbr)
Please note that if the uncertainty in meteorological forcings is not
taken into account, the same deterministic InputsModel
object is used to force each ensemble member.
The estimate of the observation error is of critical importance, as it defines how the filter trusts the observations and thus to what extent they are assimilated into the model.
By default, the measurement noise is generated from a normal distribution with a zero-valued mean and a variance parameterized as a function of the observed discharge rate (Weerts and El Serafy 2006; Clark et al. 2008). With the aim of preventing underestimated error variances in case of low discharge, the minimum threshold of the observation error variance is evaluated as the quantile 10 of discharge observations, by default (Thirel et al. 2010).
We assume that the GR5J model has been previously calibrated by using
the Calibration()
function of airGR package.
Therefore, we refer to the calibrated values of model parameters as
Param
.
The RunModel_DA()
function supplies the ensemble
discharge simulations resulting from the sequential assimilation of
daily discharge observations via EnKF or PF. Because of the recursive
approach, the analysis states resulting from the assimilation procedure
at time step t
are used as initial states at the following
prediction time step t + 1
.
If the state uncertainty is taken into account
(StatePert
), the initial state at the prediction time step
t + 1
is the perturbed analysis state resulting from the
assimilation and perturbation procedures at time step
t
.
The function provides three outputs:
the ensemble discharge simulations
(QsimEns
)
the ensemble values of background model states, namely before the
filter update (EnsStateBkg
)
the ensemble values of analysis model states, namely after the
filter update (EnsStateA
)
The following example supplies probabilistic discharge simulations relying on the EnKF-based assimilation of daily discharge observations.
Param <- c(X1 = 194.243, X2 = -0.088, X3 = 117.740, X4 = 1.680, X5 = 0.000)
ResEnKF <- RunModel_DA(InputsModel = InputsModel,
InputsPert = InputsPert,
Qobs = BasinObs$Qmm,
IndRun = IndRun,
FUN_MOD = RunModel_GR5J, Param = Param,
DaMethod = "EnKF", NbMbr = NbMbr,
StateEnKF = StateEnKF, StatePert = StatePert)
As defined in StateEnKF
, in this example the EnKF scheme
updates only the levels of the production ("Prod"
) and the
routing ("Rout"
) stores through the assimilation of
discharge observations. Along with the uncertainty in meterological
forcings (InputsPert
is supplied), also the uncertainty in
model states (StatePert
) is taken into account. Therefore,
the selected state variables (i.e. the store levels) are consistently
perturbed after their update.
The EnKF scheme relies on mass-constraints to guarantee the non-negativity of state variables by ensuring minimum state values. It is noteworthy that the level of production store cannot drop below the 5 percent of its capacity. A further constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities.
The example below provides ensemble discharge simulations resulting from the daily PF-based assimilation of discharge observations.
The PF scheme relies on the Sequential Importance Resampling (SIR) technique (Gordon, Salmond, and Smith 1993). According to the SIR approach, the importance weights associated to the particles are updated according to the likelihood of particle states with respect to the observed one. A resampling procedure allows to discard particles having a low probability and to replicate those having a high importance weight. If the ensemble is too squeezed to properly evaluate the importance weights, all particles are assigned equal weights.
ResPF <- RunModel_DA(InputsModel = InputsModel,
InputsPert = InputsPert,
Qobs = BasinObs$Qmm,
IndRun = IndRun,
FUN_MOD = RunModel_GR5J, Param = Param,
DaMethod = "PF", NbMbr = NbMbr,
StatePert = "Rout")
Unlike the previous example, here only the uncertainty in the level
of the routing store (see StatePert
) is taken into account.
Therefore, the routing store level is the only state variable to be
perturbed after the updating procedure.
After the state perturbation, a mass-constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities.
Here the performances of the EnKF- and PF-based modelling systems are assessed against the open-loop run, which provides discharge simulations without the assimilation of discharge observations.
RunOptions <- CreateRunOptions(FUN_MOD = RunModel_GR5J,
InputsModel = InputsModel,
IndPeriod_Run = IndRun)
InputsCritMulti <- CreateInputsCrit(FUN_CRIT = list(ErrorCrit_KGE, ErrorCrit_NSE),
InputsModel = InputsModel, RunOptions = RunOptions,
Obs = list(Qobs, Qobs),
VarObs = list("Q", "Q"),
transfo = list("", "sqrt"),
Weights = NULL)
# open-loop run (without DA)
OutputsModel_OL <- RunModel_GR5J(InputsModel = InputsModel, RunOptions = RunOptions,
Param = Param)
CritOL <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_OL)
# EnKF run
OutputsModel_EnKF <- OutputsModel_OL
OutputsModel_EnKF$Qsim <- rowMeans(ResEnKF$QsimEns)
CritEnKF <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_EnKF)
# PF run
OutputsModel_PF <- OutputsModel_OL
OutputsModel_PF$Qsim <- rowMeans(ResPF$QsimEns)
CritPF <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_PF)
When comparing the performance of these 3 modelling configurations, it is noteworthy that the assimilation of discharge observations generally succeeds in enhancing the accuracy of discharge simulations, though they rely on different DA settings (i.e. different DA techniques, system uncertainties and updated states).
## Method KGE[Q] NSE[sqrt(Q)]
## 1 OL 0.747 0.771
## 2 EnKF 0.795 0.709
## 3 PL 0.869 0.795