--- title: "Deployment of star databases with incremental refresh" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Deployment of star databases with incremental refresh} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, echo = FALSE, message=FALSE} library(rolap) ``` # Introduction Once we developed a star database in R, we would like to exploit it directly in R to develop multidimensional queries, but that is part of a future project. Currently we may be interested in deploying it in a RDBMS (*Relational Database Management Systems*) to use it with OLAP (*On-Line Analytical Processing*) query tools. The **deployment process** consists of **exporting the tables** that make up the star databases to the RDBMS and also **keeping them updated**. The vignette titled *Incremental refresh of star databases*, `vignette("v40-refresh")`, describes how star databases can be periodically updated based on the new data obtained. These updates are transmitted to the RDBMS automatically using the procedure described in this document. This document shows by means of an example the possibilities offered by the package in this context. First, the star database and the deployment processes are presented. The next section shows how the incremental refresh is applied and the result obtained in the relational databases where the deployments have been carried out. Finally, it finish with the conclusions. # Star databases and deployments This section shows the starting star databases and their deployment process. ## Star databases The starting star databases is the content in the variable `mrs_db`, obtained in the vignette titled *Obtaining and transforming flat tables*, `vignette("v05-flat-table-op")`. It contains the constellation, formed by two star databases. Next we get their names. ```{r} library(rolap) mrs_db |> get_fact_names() ``` The code to generate the constellation from the initial data is available in the vignette. Below is a graphic representation of the tables that make it up. ```{r} db_dm <- mrs_db |> as_dm_class(pk_facts = FALSE) db_dm |> dm::dm_draw(rankdir = "LR", view_type = "all") ``` ## Deployment To deploy, we need a file to store the star database. In a real case, it would be a file under our control in our folder tree: we only need a file name. For the vignette, let's get a temporary one. ```{r} mrs_rdb_file <- tempfile() ``` We are going to carry out the deployment on *SQLite*. We will also need a file to store the database. As before, if we use this RDBMS, we would use a file under our control. For the vignette we will use a temporary one. ```{r} mrs_sqlite_file <- tempfile("mrs", fileext = ".sqlite") ``` The deployment functions must be able to access the database that we use. To give them access to the database, we provide a connection function (it must return an object of class `DBIConnection`) and a disconnection function (it must accept a parameter of class `DBIConnection`). ```{r} mrs_sqlite_connect <- function() { DBI::dbConnect(RSQLite::SQLite(), dbname = mrs_sqlite_file) } mrs_sqlite_disconnect <- function(con) { DBI::dbDisconnect(con) } ``` If the disconnection function is like the previous one, we could avoid indicating it because it is the default one. With these parameters, we deploy with the `deploy` function, as shown below. ```{r} mrs_db <- mrs_db |> deploy( name = "mrs_sqlite", connect = mrs_sqlite_connect, disconnect = mrs_sqlite_disconnect, file = mrs_rdb_file ) ``` As a result, fact and dimension tables are explored to the database. The `star_database` object is stored (in the file indicated) so that corresponds to the version stored in the relational database. Additionally, from this moment on, the star database and the relational database are linked: refresh operations to the star databases will automatically be propagated to the relational database. ## Contents of the constellation and the relational database Next, we are going to show the content of the star databases and the associated relational database. For facts and dimensions the number of instances in the star databases is shown. ```{r} l_db <- mrs_db |> as_tibble_list() names <- sort(names(l_db)) for (name in names){ cat(sprintf("name: %s, %d rows\n", name, nrow(l_db[[name]]))) } ``` We get the same information for the relational database. ```{r} mrs_con <- mrs_sqlite_connect() tables <- DBI::dbListTables(mrs_con) for (t in tables) { res <- DBI::dbGetQuery(mrs_con, sprintf('SELECT COUNT(*) FROM `%s`', t)) cat(sprintf("name: %s, %d rows\n", t, res[[1]])) } mrs_sqlite_disconnect(mrs_con) ``` We can see that the tables and their number of instances are the same in both cases. ## Another deployment We can perform more than one deployment associated with a star database. Next, we get a second temporary file for the new *SQLite* relational database. ```{r} mrs_sqlite_file_2 <- tempfile("mrs", fileext = ".sqlite") ``` We need a new connection function to use the new file. We define it below. ```{r} mrs_sqlite_connect_2 <- function() { DBI::dbConnect(RSQLite::SQLite(), dbname = mrs_sqlite_file_2) } ``` In this case we are going to use the default disconnection function and, since a previous deployment has already been carried out, we do not need to indicate any file to store the star databases because it is already being stored in one. The call to the deployment function is as shown below. ```{r} mrs_db <- mrs_db |> deploy( name = "mrs_sqlite_2", connect = mrs_sqlite_connect_2 ) ``` We can consult the current deployments using the following function. ```{r} mrs_db |> get_deployment_names() ``` If necessary, we can also cancel a deployment using the `cancel_deployment()` function: The database is not affected, but it will not be updated with subsequent updates. # Incremental refresh This section shows how incremental refresh updates are transferred from the star databases to the relational databases included in the deployments carried out. To perform the incremental update, we take the data and process presented in vignette *Incremental refresh of star databases*, `vignette("v40-refresh")`. ## New data The update data is stored in the form of a flat table in the package's `mrs_ft_new` variable. Below are the first records of the table. We access the table using the `get_table()` function for the object of the `flat_table` class. ```{r} ft <- mrs_ft_new |> get_table() ft ``` ## Incremental refresh process We apply the modification process carried out on the star databases to the data in the flat table. This process is stored in the `star_database` object itself so we do not have to search for the applied functions. ```{r} mrs_db_age_refresh <- mrs_ft_new |> update_according_to(mrs_db, star = "mrs_age") mrs_db_cause_refresh <- mrs_ft_new |> update_according_to(mrs_db, star = "mrs_cause") ``` The result is that we replicate the structure of each of the components of the constellation on the new data. With this structure we can now carry out the refresh process itself for each of the star databases. ```{r} mrs_db <- mrs_db |> incremental_refresh(mrs_db_age_refresh) |> incremental_refresh(mrs_db_cause_refresh, existing_instances = "group") ``` In each case, using the `existing_instances` parameter, we can decide what to do with the instances that appear in the update and that were already included in the star database. ## Incremental refresh result To check the result obtained, we are going to obtain again the name of the tables and the number of instances for both the star databases and each of the deployments. #### Star database ```{r} l_db <- mrs_db |> as_tibble_list() names <- sort(names(l_db)) for (name in names){ cat(sprintf("name: %s, %d rows\n", name, nrow(l_db[[name]]))) } ``` #### First deployment ```{r} mrs_con <- mrs_sqlite_connect() tables <- DBI::dbListTables(mrs_con) for (t in tables) { res <- DBI::dbGetQuery(mrs_con, sprintf('SELECT COUNT(*) FROM `%s`', t)) cat(sprintf("name: %s, %d rows\n", t, res[[1]])) } mrs_sqlite_disconnect(mrs_con) ``` #### Second deployment ```{r} mrs_con_2 <- mrs_sqlite_connect_2() tables <- DBI::dbListTables(mrs_con_2) for (t in tables) { res <- DBI::dbGetQuery(mrs_con_2, sprintf('SELECT COUNT(*) FROM `%s`', t)) cat(sprintf("name: %s, %d rows\n", t, res[[1]])) } mrs_sqlite_disconnect(mrs_con_2) ``` We can observe that in all three cases the number of instances is the same and has changed with respect to the situation prior to carrying out the refresh operations. ## Checking star database copy synchronization In addition to updating the deployments in the relational databases, the copy of the star databases is automatically kept updated in its file. We can load it into a variable using the `load_star_database()` function and check that its content is identical to that of the original star database. ```{r} mrs_db_new <- load_star_database(mrs_rdb_file) identical(mrs_db, mrs_db_new) ``` If the file has extension *rds*, the `readRDS()` function could also be used directly. The `star_database` object contained in the new variable is fully operational. ```{r, echo = FALSE} unlink(mrs_rdb_file) unlink(mrs_sqlite_file) unlink(mrs_sqlite_file_2) ``` # Conclusions This document shows the functions supporting the deployment in relational databases and the automatic incremental refresh of these offered by the `rolap` package. If we need to perform OLAP analysis using tools that can obtain their data from relational databases, this functionality allows R developers to perform transformations in R in an even more productive way instead of having to work with other tools.