--- title: "(02) SNIH dataset" author: "Ezequiel Toum" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{(02) SNIH dataset} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(hydrotoolbox) ``` ## Servicio Nacional de Información Hídrica (SNHI) dataset Sin lugar a dudas, el SNIH posee la más extensa base de datos hidro-meteorológicos (tanto desde el punto de vista espacial como temporal) para la República Argentina (SNIH). En él se pueden encontrar los registros de estaciones desde la Quiaca a Tierra del Fuego, además contiene series que datan de principios del siglo pasado. *** Without a doubt, the SNIH has the most extensive hydro-meteorological database (both from the spatial and temporal point of view) for the Argentine Republic (SNIH). In it the user can find the records of stations from La Quiaca to Tierra del Fuego (northernmost and southernmost places respectively), it also contains series dating from the beginning of the last century. ## Reading individual files La página web permite descargar las variables medidas en cada estación de a una por vez. El paquete **hydrotoolbox** ofrece la posibilidad de leer estos archivos (formato *.xlsx*) de manera automática mediante la función `read_snih()`. Al hacerlo, se cargará al *Global Environment* de **R** un `data.frame` con los datos del archivo original. Cabe destacar que esta función rellena automáticamente los vacíos existentes entre registros con `NA_real_`. Las siguientes líneas de código muestran cómo aplicar esta función con la serie de caudales medios diarios registradas en la estación Guido (provincia de Mendoza). *** The website allows you to download the variables measured at each station one at a time. **hydrotoolbox ** allows to read these files (*.xlsx* format) automatically using the `read_snih()` function. Doing so will load to the *Global Environment* a `data.frame` with the data from the original file. It should be noted that this function automatically fills the gaps between records with `NA_real_`. In the following code lines I show how to apply this function with the daily mean streamflow series recorded at the Guido station (Mendoza province). ```{r read_fun, eval=FALSE, fig.width = 6, fig.height = 4} # set path to file path_file <- system.file('extdata', 'snih_qd_guido.xlsx', package = 'hydrotoolbox') # read daily mean streamflow with default column name guido_qd <- read_snih(path = path_file, by = 'day') head(guido_qd) # now we use the function with column name rm(guido_qd) guido_qd <- read_snih(path = path_file, by = 'day', out_name = 'qd(m3/s)') head(guido_qd) # plot the series plot(x = guido_qd[ , 1], y = guido_qd[ , 2], type = 'l', main = 'Daily mean streamflow at Guido (Mendoza basin)', xlab = 'Date', ylab = 'Q(m3/s)', col = 'dodgerblue', lwd = 1, ylim = c(0, 200)) ``` Si bien esta función resulta de gran utilidad, a medida que la cantidad de variables a analizar crece, cargar estas tablas, ordenarlas y modificarlas, se vuelve tarea complicada. La solución que ofrece **hydrotoolbox** es la de trabajar con los objetos y métodos que el paquete provee. En las siguientes secciones muestro cómo usarlos. *** Although this function is very useful, as the number of variables to be analyzed grows, loading these tables, ordering and modifying them becomes a complicated task. The solution that **hydrotoolbox** offers is to work with the objects and methods that the package provides. In the following sections I will show you how to use them. ## Using classes and methods to build a meteorological station Como menciono en los principios de diseño de este paquete (`vignette('package_overview', package = 'hydrotoolbox')`), los datos que se registran en las estaciones deben almacenarse en un mismo objeto. Por ello primero habrá que crear dicho objeto (o estación hidro-meteorológica) y luego usar `hm_build_generic()`, un método que permite cargar automáticamente al objeto todas las variables que la estación real registra. *** As I mentioned in the design principles of this package (`vignette ('package_overview', package = 'hydrotoolbox')`), the data that is recorded in the stations must be stored in the same object. For this reason, you must first create the object (or hydro-meteorological station) and then use `hm_build_generic()`, a method that allows you to automatically load all variables to the object that the real world station records. ```{r build, eval=FALSE, fig.width = 6, fig.height = 4} # in this path you will find the raw example data path <- system.file('extdata', package = 'hydrotoolbox') list.files(path) # we load in a single object (hydromet_station class) # the streamflow and water height series guido <- hm_create() %>% # create the met-station hm_build_generic(path = path, file_name = c('snih_qd_guido.xlsx'), slot_name = c('qd'), FUN = read_excel, by = c('day'), sheet = 1L ) # we can explore the data-set inside it by using hm_show guido %>% hm_show() # you can also rename the column names guido <- guido %>% hm_name(slot_name = 'qd', col_name = 'q(m3/s)') guido %>% hm_show(slot_name = 'qd') ``` ## Data visualization Una de las herramientas más útiles para analizar series hidrológicas y sintetizar resultados son los gráficos. En esta sección muestro cómo emplear `hm_plot()`, método que permite graficar series de tiempo de forma estática y dinámica a través de argumentos intuitivos y por lo tanto sencillos de aplicar. `hm_plot()` usa internamente parte de la funcionalidad de los paquetes `ggplot2` y `plotly`. *** One of the most useful tools to analyze hydrological series and synthesize results are graphics. In this section I show how to use `hm_plot ()`, a method that allows to plot time series statically and dynamically through intuitive arguments. `hm_plot ()` uses some of the functionality of the `ggplot2` and `plotly` packages. ```{r plot_1, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # we ask hydrotolkit to show all the variables # with data in our station guido %>% hm_show() # if want to analyze the daily mean streamflow records guido %>% hm_plot(slot_name = 'qd', col_name = list('q(m3/s)'), interactive = TRUE, line_color = 'dodgerblue', x_lab = 'Date', y_lab = 'Q(m3/s)' ) ``` ```{r plot_2, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # just show the discharge for the hydrological year 2016/2017 # for publishing guido %>% hm_plot(slot_name = 'qd', col_name = list('q(m3/s)'), interactive = FALSE, line_color = 'dodgerblue', x_lab = 'Date', y_lab = 'Q(m3/s)', from = '2016-07-01', to = '2017-06-30', legend_lab = 'Guido station', title_lab = 'Daily mean discharge' ) ``` ## Access to met-satation information En esta sección muestro cómo usar los métodos `hm_show()`, `hm_report()` y `hm_get()`. Éstos sirven para obtener información cuantitativa acerca de los datos y para extraer las tablas de la estación. *** In this section I show how to use the `hm_show()`, `hm_report()` and `hm_get()` methods. They are used to obtain quantitative information about the data and to extract out of the `hydromet_station` object the `data.frames`. ```{r show, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # the show method allows to get an idea about the stored variables guido %>% hm_show() # or maybe we want to specify the slots guido %>% hm_show(slot_name = c('id', 'qd', 'tair') ) ``` ```{r report, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # suppose that to get an idea about the basic statistics of our data # and we want to know how many missing data we have guido %>% hm_report(slot_name = 'qd') ``` ```{r get, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # now you want to extract the table guido %>% hm_get(slot_name = 'qd') %>% head() ``` ## Data transformation Como menciono en los principios de diseño del paquete, las modificaciones se deben poder almacenar en el mismo archivo con el fin de evitar las múltiples vesiones. En esta sección vamos a ver algunos ejemplos en el uso de los métodos `hm_mutate()` y `hm_melt()`. *** As I mention in the package design principles, modifications must be able to be stored in the same file, in order to avoid the multiple versioning issue. In this section we will see some examples with `hm_mutate()` and `hm_melt()` methods. ```{r mutate, eval=FALSE, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # apply a moving average windows to streamflow records guido %>% hm_mutate(slot_name = 'qd', FUN = mov_avg, k = 10, pos = 'c', out_name = 'mov_avg') %>% # see ?mov_avg() hm_plot(slot_name = 'qd', col_name = list(c('q(m3/s)', 'mov_avg') ), interactive = TRUE, line_color = c('dodgerblue', 'red3'), y_lab = 'Q(m3/s)', legend_lab = c('obs', 'mov_avg') ) ``` > NOTE: hm_mutate() can also be combined with the dplyr package function mutate(). ```{r melt, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # lets say that we want to put together snow water equivalent from Toscas (dgi) # and daily streamflow discharge from Guido (snih) # on the first place we build the Toscas station # dgi file toscas <- hm_create() %>% hm_build_generic(path = path, file_name = 'dgi_toscas.xlsx', slot_name = c('swe', 'tmax', 'tmin', 'tmean', 'rh', 'patm'), by = 'day', FUN = read_dgi, sheet = 1L:6L ) # now we melt the required data in a new object hm_create(class_name = 'compact') %>% hm_melt(melt = c('toscas', 'guido'), slot_name = list(toscas = 'swe', guido = 'qd'), col_name = 'all', out_name = c('swe(mm)', 'qd(m3/s)') ) %>% hm_plot(slot_name = 'compact', col_name = list( c('swe(mm)', 'qd(m3/s)') ), interactive = TRUE, legend_lab = c('swe-Toscas', 'qd-Guido'), line_color = c('dodgerblue', 'red'), y_lab = c('q(m3/s)', 'swe(mm)'), dual_yaxis = c('right', 'left') ) ``` ## Quality flags and non-numeric columns Desde la versión *1.1.0* del paquete, los objetos `hydromet_station` y `hydromet_compact` admiten columnas no numéricas. Esto permite agregar metadatos varios a las series en cuestión. *** Since version *1.1.0* of the package, the `hydromet_station` and `hydromet_compact` objects support non-numeric columns. This allows to add several metadata-types to the tables. ```{r quality-flag, eval=FALSE, fig.width = 6, fig.height = 4, warning = FALSE} # we are going to add come quality-flags to the data library(tibble) my_station <- hm_create(class_name = "station") my_tb <- tibble( date = seq.POSIXt(from = ISOdate(2022, 1, 1, 0, 0, 0), to = ISOdate(2022, 1, 1, 23, 0, 0), by = "hour" ), random_var = runif(n = 24, min = 0, max = 10), unit = "my_units", quality_flag = c(rep("good", 20), rep("bad", 4)) ) my_station <- my_station %>% hm_set(unvar = my_tb) my_station %>% hm_show() ```