Star Database Export Features

Introduction

Once we developed a star database in R, we would like to exploit it directly in R to develop multidimensional queries, but that is part of a future project. Currently we need to export the result to other formats to be able to use other OLAP query tools.

This document shows the possibilities offered by the package in this context: Export an object of the star_database class to other formats. After presenting the object that we will use as an example and how to select it, a section is dedicated to each type of element obtained: dm, list of tibble objects, relational databases, xlsx and csv files, and geomultistar::multistar object. Finally, it finish with the conclusions.

Example star_database object

The variable mrs_db, obtained in the vignette titled Obtaining and transforming flat tables, vignette("v05-flat-table-op"), contains an object of class star_database that we will use in the examples.

class(mrs_db)
#> [1] "star_database"

We can see a representation of the tables it contains using the draw_tables() function, as shown below.

mrs_db |>
  draw_tables()

Select only one star database

Strictly speaking, a star database is composed of a fact table and several associated dimension tables. A constellation is made up of several star databases that can share dimensions. In the rolap package they are treated in a unified way under the star_database class: It is used both to define star databases and to define constellations.

It is possible that we are interested in exporting only a star database of the constellation, or a subset of it (also a constellation). For this reason, in this situation, the first step before performing the export operation would be to select the star databases that interest us from the constellation. This operation can be carried out using the get_star_database() function, where the names of the star databases are indicated. The names can be obtained using the get_fact_names() function.

mrs_db |> 
  get_fact_names()
#> [1] "mrs_age"   "mrs_cause"

Next, we select one of the star databases (also an object of class star_database) and display its tables.

mrs_cause <- mrs_db |> 
  get_star_database("mrs_cause")

class(mrs_cause)
#> [1] "star_database"

mrs_cause |>
  draw_tables()

dm class object

The dm package allows us to work in R with tables that correspond to others from relational databases, both to import and export them. It also allows them to be represented graphically (the graphical representations of the tables presented in the previous section have been made using the dm package).

We can directly obtain an object of the dm class from the tables of our star databases using the as_dm_class() function.

mrs_dm <- mrs_db |>
  as_dm_class()

class(mrs_dm)
#> [1] "dm"

mrs_dm
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `when`, `where`, `who`, `mrs_cause`, `mrs_age`
#> Columns: 25
#> Primary keys: 5
#> Foreign keys: 5

List of tibble objects

We can generate a tibble list from the component tables or from the flat tables obtained from them.

From component tables

Using the as_tibble_list() function, we get a tibble list with the dimension and fact tables, as shown below.

tl <- mrs_db |>
  as_tibble_list()

tl
#> $when
#> # A tibble: 2,076 × 4
#>    when_key year  week  week_ending_date
#>       <int> <chr> <chr> <chr>           
#>  1        1 1962  " 1"  01/06/1962      
#>  2        2 1962  " 2"  01/13/1962      
#>  3        3 1962  " 5"  02/03/1962      
#>  4        4 1962  " 6"  02/10/1962      
#>  5        5 1962  " 8"  02/24/1962      
#>  6        6 1962  " 9"  03/03/1962      
#>  7        7 1962  "12"  03/24/1962      
#>  8        8 1962  "14"  04/07/1962      
#>  9        9 1962  "15"  04/14/1962      
#> 10       10 1962  "18"  05/05/1962      
#> # ℹ 2,066 more rows
#> 
#> $where
#> # A tibble: 122 × 9
#>    where_key region state city        city_state     status    pop   lat   long 
#>        <int> <chr>  <chr> <chr>       <chr>          <chr>     <chr> <chr> <chr>
#>  1         1 1      CT    Bridgeport  Bridgeport CT  non-capi… "  1… 41.2  " -7…
#>  2         2 1      CT    Hartford    Hartford CT    state ca… "  1… 41.8  " -7…
#>  3         3 1      CT    New Haven   New Haven CT   non-capi… "  1… 41.3  " -7…
#>  4         4 1      CT    Waterbury   Waterbury CT   non-capi… "  1… 41.6  " -7…
#>  5         5 1      MA    Cambridge   Cambridge MA   non-capi… "  1… 42.4  " -7…
#>  6         6 1      MA    Fall River  Fall River MA  non-capi… "   … 41.7  " -7…
#>  7         7 1      MA    Lowell      Lowell MA      non-capi… "  1… 42.6  " -7…
#>  8         8 1      MA    Lynn        Lynn MA        non-capi… "   … 42.5  " -7…
#>  9         9 1      MA    New Bedford New Bedford MA non-capi… "   … 41.7  " -7…
#> 10        10 1      MA    Somerville  Somerville MA  non-capi… "   … 42.4  " -7…
#> # ℹ 112 more rows
#> 
#> $who
#> # A tibble: 5 × 2
#>   who_key age        
#>     <int> <chr>      
#> 1       1 1-24 years 
#> 2       2 25-44 years
#> 3       3 45-64 years
#> 4       4 65+ years  
#> 5       5 <1 year    
#> 
#> $mrs_cause
#> # A tibble: 3,677 × 5
#>    when_key where_key pneumonia_and_influenza_deaths all_deaths nrow_agg
#>       <int>     <int>                          <int>      <int>    <int>
#>  1        1        87                              2         69        1
#>  2        2         8                              1         28        1
#>  3        2        10                              2         21        1
#>  4        2        64                             10        220        1
#>  5        2        97                              2         12        1
#>  6        3        15                              2         32        1
#>  7        4         6                              0         31        1
#>  8        4        15                              1         32        1
#>  9        4       119                              2         51        1
#> 10        5        36                              0         39        1
#> # ℹ 3,667 more rows
#> 
#> $mrs_age
#> # A tibble: 18,228 × 5
#>    when_key where_key who_key all_deaths nrow_agg
#>       <int>     <int>   <int>      <int>    <int>
#>  1        1        87       1          1        1
#>  2        1        87       2          5        1
#>  3        1        87       3         17        1
#>  4        1        87       4         41        1
#>  5        1        87       5          5        1
#>  6        2         8       1          0        1
#>  7        2         8       2          2        1
#>  8        2         8       3         11        1
#>  9        2         8       4         14        1
#> 10        2         8       5          1        1
#> # ℹ 18,218 more rows

From flat tables

Using the as_single_tibble_list() function, we also get a tibble list but in this case the fact and dimension tables have been merged to form a flat table as shown below.

tl <- mrs_db |>
  as_single_tibble_list()

tl
#> $mrs_cause
#> # A tibble: 3,677 × 14
#>    year  week  week_ending_date region state city  city_state status pop   lat  
#>    <chr> <chr> <chr>            <chr>  <chr> <chr> <chr>      <chr>  <chr> <chr>
#>  1 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  2 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#>  3 1962  " 2"  01/13/1962       1      MA    Some… Somervill… non-c… "   … 42.4 
#>  4 1962  " 2"  01/13/1962       5      DC    Wash… Washingto… non-c… "  5… 38.9 
#>  5 1962  " 2"  01/13/1962       8      CO    Colo… Colorado … non-c… "  3… 38.9 
#>  6 1962  " 5"  02/03/1962       2      NJ    Eliz… Elizabeth… non-c… "  1… 40.7 
#>  7 1962  " 6"  02/10/1962       1      MA    Fall… Fall Rive… non-c… "   … 41.7 
#>  8 1962  " 6"  02/10/1962       2      NJ    Eliz… Elizabeth… non-c… "  1… 40.7 
#>  9 1962  " 6"  02/10/1962       9      WA    Spok… Spokane WA non-c… "  1… 47.7 
#> 10 1962  " 8"  02/24/1962       3      IL    Rock… Rockford … non-c… "  1… 42.3 
#> # ℹ 3,667 more rows
#> # ℹ 4 more variables: long <chr>, pneumonia_and_influenza_deaths <int>,
#> #   all_deaths <int>, nrow_agg <int>
#> 
#> $mrs_age
#> # A tibble: 18,228 × 14
#>    year  week  week_ending_date region state city  city_state status pop   lat  
#>    <chr> <chr> <chr>            <chr>  <chr> <chr> <chr>      <chr>  <chr> <chr>
#>  1 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  2 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  3 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  4 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  5 1962  " 1"  01/06/1962       7      OK    Tulsa Tulsa OK   non-c… "  3… 36.1 
#>  6 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#>  7 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#>  8 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#>  9 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#> 10 1962  " 2"  01/13/1962       1      MA    Lynn  Lynn MA    non-c… "   … 42.5 
#> # ℹ 18,218 more rows
#> # ℹ 4 more variables: long <chr>, age <chr>, all_deaths <int>, nrow_agg <int>

Relational database

To export the component tables to a relational database, we can use the as_rdb() function. We have to pass it as a parameter a connection to the database, which we manage.

con <- DBI::dbConnect(RSQLite::SQLite())

mrs_db |>
  as_rdb(con)

DBI::dbListTables(con)
#> [1] "mrs_age"   "mrs_cause" "when"      "where"     "who"

DBI::dbDisconnect(con)

Additionally, in the the vignette titled Deployment of star databases with incremental refresh, vignette("v50-deploy"), it is described how to deploy a star database in a relational database so that it is automatically updated through periodic refresh operations.

xlsx file

Using the as_xlsx_file() function, we get an xslx file where each table is stored in a sheet. We have to indicate the name of the file. For the example, we select a temporary file.

f <- mrs_db |>
  as_xlsx_file(file = tempfile())

f
#> [1] "/tmp/RtmpNIkwmQ/file144e17510c5c.xlsx"

As we can see, it is responsible for assigning the appropriate extension.

csv files

Using the as_csv_files() function we can store each table in a different csv file, in the indicated folder, as shown below.

d <- mrs_db |>
  as_csv_files(dir = tempdir())

list.files(d, pattern = "*.csv")
#> [1] "mrs_age.csv"   "mrs_cause.csv" "when.csv"      "where.csv"    
#> [5] "who.csv"

geomultistar::multistar object

Using the as_multistar() function we can get a geomultistar::multistar object; with this object we can use the query and integration functions with geographic information offered by the geomultistar package.

ms <- mrs_db |>
  as_multistar()

class(ms)
#> [1] "multistar"

Conclusions

This document presents the functions to export the tables that make up the star databases to other types of formats.

The objective of these functions is that the rolap package can be used to transform the data and that other tools can be easily used to analyse it.