Introduction to hydropeak

Introduction

Hydropeaking has a substantial environmental impact on running water ecosystems, and many affected rivers have a poor ecological status. In rivers affected by hydropeaking, the flow conditions are highly complex and difficult to grasp. The event-based algorithm implemented in hydropeak detects flow fluctuations and classifies them as increase events (IC) or decrease events (DC). For each event, a set of parameters related to the fluctuation intensity is calculated: maximum flow fluctuation rate (MAFR), mean flow fluctuation rate (MEFR), amplitude (AMP), flow ratio (RATIO), and duration (DUR).

Greimel et al. (2016) introduced a framework for detecting and characterising sub-daily flow fluctuations. By analysing more than 500 Austrian hydrographs, covering the whole range from unimpacted to heavily impacted rivers, different fluctuation types could be identified according to the potential source: e.g., sub-daily flow fluctuations caused by hydropeaking, rainfall or snow and glacier melt. The hydropeak package enables detecting flow fluctuation events in a given time series by computing differences between consecutive time steps and calculating flow fluctuation parameters.
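The idea of detecting events from differences between consecutive time steps can be sketched in base R. This is a minimal illustration on a made-up flow vector, not the package's implementation: the vector q and the trend, dur and start columns are assumptions introduced here.

```r
# Hypothetical flow series in m^3/s at equal time steps (not taken from Q)
q <- c(1.0, 1.0, 1.2, 1.5, 1.5, 1.3, 1.1, 1.1)

# Sign of the difference between consecutive time steps:
#  1 = increasing, -1 = decreasing, 0 = constant
trend <- sign(diff(q))

# Consecutive time steps with the same sign form one event
runs <- rle(trend)

events <- data.frame(
  trend = runs$values,                          # direction of each event
  dur   = runs$lengths,                         # number of time steps
  start = cumsum(c(1, head(runs$lengths, -1)))  # index in q where the event starts
)
events
```

Each row of events is one run of equal flow trend. The increasing and decreasing runs correspond to what the text calls IC and DC events.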

Data

To detect flow fluctuation events, hydropeak needs an input data frame that contains at least a column with an ID of the gauging station, a column with date-time values, and a column with flow rates (Q). To use the functions of hydropeak properly, the input data frame has to be converted to an S3 class called flow. This happens by default in the main function get_events(), where the column indices of these three variables can be passed. By default this is cols = c(1, 2, 3) for ID, Time, Q. Converting the data frame to a flow object makes sure that a standardised date-time format and valid data types are used.

Q is an example input dataset with 3 variables and 960 flow measurements (Q in \(m^{3}/s\)) from two different gauging stations. One time step is 15 minutes, which corresponds to high-resolution data. The dataset is documented in ?Q. We will use these data to demonstrate the detection of increase (IC) and decrease (DC) events and the computation of the metrics from Greimel et al. (2016).

dim(Q)
#> [1] 960   3
head(Q)
#>       ID             Time     Q
#> 1 200000 01.01.2021 00:00 0.753
#> 2 200000 01.01.2021 00:15 0.753
#> 3 200000 01.01.2021 00:30 0.753
#> 4 200000 01.01.2021 00:45 0.752
#> 5 200000 01.01.2021 01:00 0.752
#> 6 200000 01.01.2021 01:15 0.752
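The Time column above is stored as text in day.month.year order. As a rough illustration of what a standardised date-time format means, such strings can be parsed in base R as follows. This is a sketch only: the format string and the UTC time zone are assumptions made here, not settings taken from the package.

```r
# Date-time strings in the same layout as the Time column of Q
time_chr <- c("01.01.2021 00:00", "01.01.2021 00:15", "01.01.2021 00:30")

# Parse to POSIXct; format and time zone are assumptions for this sketch
time <- as.POSIXct(time_chr, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Standardised representation
format(time, "%Y-%m-%d %H:%M:%S")

# Time step between consecutive observations, in minutes
step_min <- as.numeric(difftime(time[2], time[1], units = "mins"))
step_min
```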

To verify the results, we use the Events dataset, which also shows the output format of the main function get_events(). Events is the output of an ORACLE® database from the Institute of Hydrobiology and Aquatic Ecosystem Management, BOKU University, Vienna, Austria. It contains 165 IC and DC events and 8 variables and is documented in ?Events. The rows shown below also include constant and NA events.

dim(Events)
#> [1] 296   8
head(Events)
#>       ID EVENT_TYPE                Time   AMP  MAFR  MEFR DUR    RATIO
#> 1 200000          0 2021-01-01 00:00:00 0.000 0.000 0.000   2 1.000000
#> 2 200000          4 2021-01-01 00:30:00 0.001 0.001 0.001   1 1.001330
#> 3 200000          1 2021-01-01 00:45:00 0.000 0.000 0.000   3 1.000000
#> 4 200000          5 2021-01-01 01:30:00    NA    NA    NA   5       NA
#> 5 200000          0 2021-01-01 02:45:00 0.000 0.000 0.000   1 1.000000
#> 6 200000          4 2021-01-01 03:00:00 0.001 0.001 0.001   1 1.001332

Compute events and metrics with `get_events()`

get_events() is the main function and processes an input dataset such as Q as follows:

flow() converts Q to a flow object which is formatted to be compatible with the functions in hydropeak.
change_points() computes change points of the flow fluctuation where the flow is increasing (IC) or decreasing (DC). Optionally, constant events or NA events can be included.
all_metrics() calculates all metrics according to Greimel et al. (2016) for each event determined by change_points().
A data frame with all events and metrics is returned.

result <- get_events(Q, omit.constant = FALSE, omit.na = FALSE)
head(result)
#>       ID EVENT_TYPE                Time   AMP  MAFR  MEFR DUR    RATIO   MAX
#> 1 200000          0 2021-01-01 00:00:00 0.000 0.000 0.000   2 1.000000 0.753
#> 2 200000          4 2021-01-01 00:30:00 0.001 0.001 0.001   1 1.001330 0.753
#> 3 200000          1 2021-01-01 00:45:00 0.000 0.000 0.000   3 1.000000 0.752
#> 4 200000          5 2021-01-01 01:30:00    NA    NA    NA   5       NA    NA
#> 5 200000          0 2021-01-01 02:45:00 0.000 0.000 0.000   1 1.000000 0.752
#> 6 200000          4 2021-01-01 03:00:00 0.001 0.001 0.001   1 1.001332 0.752
#>     MIN Q_from  Q_to  Q_mid DUR_mid MEFR_half1 MEFR_half2
#> 1 0.753  0.753 0.753 0.7530       0         NA         NA
#> 2 0.752  0.753 0.752 0.7525       1      5e-04      5e-04
#> 3 0.752  0.752 0.752 0.7520       0         NA         NA
#> 4    NA     NA    NA     NA      NA         NA         NA
#> 5 0.752  0.752 0.752 0.7520       0         NA         NA
#> 6 0.751  0.752 0.751 0.7515       1      5e-04      5e-04

all.equal(Events, result)
#> [1] "Length mismatch: comparison on first 8 components"
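The length mismatch is reported because result has more columns than Events, as the two printed outputs show. One way to compare only the columns that both data frames share is sketched below with small stand-in data frames. The names reference and computed and their values are hypothetical and only mimic the situation.

```r
# Toy stand-ins (hypothetical): `computed` has one more column than `reference`
reference <- data.frame(ID = c("a", "a"), AMP = c(0.001, 0.002))
computed  <- data.frame(ID = c("a", "a"), AMP = c(0.001, 0.002),
                        MAX = c(0.753, 0.752))

# Comparing the full data frames reports differences instead of TRUE
full_check <- all.equal(reference, computed)

# Restrict both data frames to the shared columns before comparing
shared <- intersect(names(reference), names(computed))
shared_check <- all.equal(reference[shared], computed[shared])
shared_check
```

Applied to the objects above, the same pattern would compare Events with the columns of result that carry the same names.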

Compute events and metrics from input files and directories with `get_events_file()` and `get_events_dir()`

get_events_file() takes a file path as an argument, reads the file and calls get_events(). By default it returns the computed events; this can be disabled by setting the argument return to FALSE. The events can optionally be written to a single file or, if the argument split is set to TRUE, to a separate file for each gauging station ID and event type. If no output directory is provided, the files are written to tempdir(). The naming scheme of the output files is ID_event_type_date-from_date-to.csv.

Q_file <- system.file("extdata", "Q.csv", package = "hydropeak")
outdir <- file.path(tempdir(), "Events1")

events <- get_events_file(Q_file, inputsep = ",", inputdec = ".", 
                          save = TRUE, split = TRUE, return = TRUE,
                          outdir = outdir)
head(events)
#>       ID EVENT_TYPE                Time   AMP  MAFR  MEFR DUR    RATIO   MAX
#> 1 200000          4 2021-01-01 00:30:00 0.001 0.001 0.001   1 1.001330 0.753
#> 2 200000          4 2021-01-01 03:00:00 0.001 0.001 0.001   1 1.001332 0.752
#> 3 200000          4 2021-01-01 05:30:00 0.001 0.001 0.001   1 1.001333 0.751
#> 4 200000          4 2021-01-01 07:45:00 0.001 0.001 0.001   1 1.001335 0.750
#> 5 200000          4 2021-01-01 10:15:00 0.001 0.001 0.001   1 1.001337 0.749
#> 6 200000          4 2021-01-01 12:45:00 0.001 0.001 0.001   1 1.001339 0.748
#>     MIN Q_from  Q_to  Q_mid DUR_mid MEFR_half1 MEFR_half2
#> 1 0.752  0.753 0.752 0.7525       1      5e-04      5e-04
#> 2 0.751  0.752 0.751 0.7515       1      5e-04      5e-04
#> 3 0.750  0.751 0.750 0.7505       1      5e-04      5e-04
#> 4 0.749  0.750 0.749 0.7495       1      5e-04      5e-04
#> 5 0.748  0.749 0.748 0.7485       1      5e-04      5e-04
#> 6 0.747  0.748 0.747 0.7475       1      5e-04      5e-04
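Writing one file per gauging station ID and event type can be pictured with base R's split(). The sketch below is an illustration under assumptions, not the package's code: the toy events data frame, the split_sketch directory and the shortened file names (without the date part) are all made up here.

```r
# Toy events table (hypothetical values); IDs are kept as character strings
events <- data.frame(
  ID         = c("200000", "200000", "210000", "210000"),
  EVENT_TYPE = c(4, 4, 2, 4),
  AMP        = c(0.001, 0.001, 0.5, 0.4)
)

# One group per combination of ID and event type; empty combinations are dropped
groups <- split(events, list(events$ID, events$EVENT_TYPE), drop = TRUE)

# Write each group to its own CSV file in a temporary directory
outdir <- file.path(tempdir(), "split_sketch")
dir.create(outdir, showWarnings = FALSE)
for (key in names(groups)) {
  # group names look like "200000.4"; turn the dot into an underscore
  out_file <- file.path(outdir, paste0(sub(".", "_", key, fixed = TRUE), ".csv"))
  write.csv(groups[[key]], out_file, row.names = FALSE)
}
sort(list.files(outdir))
```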

get_events_dir() reads the input files in the provided directory and calls get_events_file() for each of them. The resulting events are split into separate files for each gauging station ID and event type and written to the given output directory. If no output directory is provided, the files are written to tempdir(). The function does not return the computed events. The naming scheme of the output files is ID_event_type_date-from_date-to.csv.

Q_dir <- system.file("extdata", package = "hydropeak")
outdir <- file.path(tempdir(), "Events2")

get_events_dir(Q_dir, inputsep = ",", inputdec = ".", outdir = outdir)
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
list.files(outdir)
#> [1] "200000_4_2020-12-31_2021-01-05.csv" "210000_2_2021-01-01_2021-01-05.csv"
#> [3] "210000_4_2020-12-31_2021-01-05.csv"
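The file names listed above consist of four underscore-separated fields, so the station ID, event type and date range can be recovered from them. The following base R sketch shows one way to do this. The helper field() and the data frame meta are hypothetical names introduced for the illustration.

```r
# File names as listed in the output directory above
files <- c("200000_4_2020-12-31_2021-01-05.csv",
           "210000_2_2021-01-01_2021-01-05.csv",
           "210000_4_2020-12-31_2021-01-05.csv")

# Drop the extension and split each name at the underscores
parts <- strsplit(sub("\\.csv$", "", files), "_", fixed = TRUE)

# Hypothetical helper: extract the i-th field of every file name
field <- function(i) vapply(parts, `[`, character(1), i)

meta <- data.frame(
  ID         = field(1),
  EVENT_TYPE = as.integer(field(2)),
  from       = as.Date(field(3)),
  to         = as.Date(field(4))
)
meta
```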

Using individual metrics

The implemented metrics can be used individually. All of these functions take a single event as their first argument, either increasing or decreasing. To use individual metrics, the event data frame has to be converted first using flow().

Q_event <- Q[3:4, ]
Q_event # decreasing event by 0.001 m^3/s within 15 minutes
#>       ID             Time     Q
#> 3 200000 01.01.2021 00:30 0.753
#> 4 200000 01.01.2021 00:45 0.752

Using get_events() for this DC event results in:

get_events(Q_event)
#>       ID EVENT_TYPE                Time   AMP  MAFR  MEFR DUR   RATIO   MAX
#> 1 200000          4 2021-01-01 00:30:00 0.001 0.001 0.001   1 1.00133 0.753
#>     MIN Q_from  Q_to  Q_mid DUR_mid MEFR_half1 MEFR_half2
#> 1 0.752  0.753 0.752 0.7525       1      5e-04      5e-04

When using the functions separately, first the data set has to be converted with flow():

Q_event <- flow(Q_event)
Q_event
#>       ID                Time     Q
#> 1 200000 2021-01-01 00:30:00 0.753
#> 2 200000 2021-01-01 00:45:00 0.752

The amplitude (AMP, unit: \(m^3/s\)) of an event is defined as the difference between the flow maximum and the flow minimum:

amp(Q_event)
#> [1] 0.001

The maximum flow fluctuation rate (MAFR, unit: \(m^3/s^2\)) represents the highest absolute flow change between two consecutive time steps within an event.

mafr(Q_event)
#> [1] 0.001
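For the two-row event above, AMP and MAFR coincide because there is only one time step. A longer event shows the difference. The sketch below applies the two definitions from the text in base R to a made-up increasing event. The vector q and the names amp_sketch and mafr_sketch are assumptions, and the code does not call the package functions.

```r
# Hypothetical increasing event in m^3/s (not taken from Q)
q <- c(2.0, 2.5, 4.0, 4.5)

# Amplitude: difference between the flow maximum and the flow minimum
amp_sketch <- max(q) - min(q)

# Maximum flow fluctuation rate: largest absolute change
# between two consecutive time steps
mafr_sketch <- max(abs(diff(q)))

amp_sketch   # 2.5
mafr_sketch  # 1.5
```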

The mean flow fluctuation rate (MEFR, unit: \(m^3/s^2\)) is calculated as the event amplitude divided by the number of time steps (duration) within an event.

mefr(Q_event)
#> [1] 0.001
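The same definition can be written out in base R for a made-up event with several time steps. This is a sketch under the assumption that an event with n observations spans n - 1 time steps, which matches the two-row event above having a duration of 1. The names q, dur_sketch and mefr_sketch are hypothetical.

```r
# Hypothetical increasing event in m^3/s (not taken from Q)
q <- c(2.0, 2.5, 4.0, 4.5)

# Amplitude of the event
amp_sketch <- max(q) - min(q)

# Four observations span three time steps
dur_sketch <- length(q) - 1

# Mean flow fluctuation rate: amplitude divided by the number of time steps
mefr_sketch <- amp_sketch / dur_sketch
mefr_sketch
```

Here the mean rate (2.5 / 3) is smaller than the largest single-step change of 1.5, which is the distinction between MEFR and MAFR.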

The duration of an event is specified as the number of consecutive time steps with equal flow trend.

dur(Q_event)
#> [1] 1

The metric flow ratio (RATIO) is defined as the flow maximum divided by the flow minimum.

ratio(Q_event, event_type = event_type(Q_event))
#> [1] 1.00133
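The definition of the flow ratio can be checked by hand for the event above. The sketch uses the two flow values of the event directly and does not call ratio(). The name ratio_sketch is an assumption made for the illustration.

```r
# Flow values of the decreasing event shown above, in m^3/s
q <- c(0.753, 0.752)

# Flow ratio: flow maximum divided by flow minimum
ratio_sketch <- max(q) / min(q)

round(ratio_sketch, 5)  # 1.00133
```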