--- title: "CDMConnector and dbplyr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{CDMConnector and dbplyr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(CDMConnector) if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia")) if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER")) if (!eunomia_is_available()) downloadEunomiaData() knitr::opts_chunk$set( collapse = TRUE, eval = rlang::is_installed("duckdb"), comment = "#>" ) ``` ## Set up First let's load the required packages for the code in this vignette. If you haven't already installed them, all the other packages can be installed using ´install.packages()´ ```{r} library(CDMConnector) library(dplyr, warn.conflicts = FALSE) library(ggplot2) ``` ## Creating the cdm reference Now let´s connect to a duckdb database with the Eunomia data (https://github.com/OHDSI/Eunomia). ```{r} con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir()) cdm <- cdm_from_con(con, cdm_name = "eunomia", cdm_schema = "main", write_schema = "main") cdm ``` This cdm object is now what we´ll use going forward. It provides a reference to the OMOP CDM tables. We can see that these tables are still in the database, but now we have a reference to each of the ones we might want to use in our analysis. For example, the person table can be referenced like so ## Putting it all together Say we want to make a histogram of year of birth in the person table. We can select that variable, bring it into memory, and then use ggplot to make the histogram. ```{r, message=FALSE} cdm$person %>% select(year_of_birth) %>% collect() %>% ggplot(aes(x = year_of_birth)) + geom_histogram(bins = 30) ``` If we wanted to make a boxplot for length of observation periods we could do the computation on the database side, bring in the new variable into memory, and use ggplot to produce the boxplot ```{r} cdm$observation_period %>% select(observation_period_start_date, observation_period_end_date) %>% mutate(observation_period = (observation_period_end_date - observation_period_start_date)/365, 25) %>% select(observation_period) %>% collect() %>% ggplot(aes(x = observation_period)) + geom_boxplot() ``` ## Behind the scenes We use show_query to check the sql that is being run against duckdb ```{r} cdm$person %>% tally() %>% show_query() ``` ```{r} cdm$person %>% summarise(median(year_of_birth))%>% show_query() ``` ```{r, warning=FALSE} cdm$person %>% mutate(gender = case_when( gender_concept_id == "8507" ~ "Male", gender_concept_id == "8532" ~ "Female", TRUE ~ NA_character_))%>% show_query() ``` ```{r} DBI::dbDisconnect(con, shutdown = TRUE) ```