Concatenating cohort records

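library(CDMConnector)  # provides eunomia_dir() and cdm_from_con()
library(dplyr)         # provides filter()
library(CohortConstructor)
library(CohortCharacteristics)
library(ggplot2)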

For this example we’ll use the Eunomia synthetic data from the CDMConnector package.

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main", 
                    write_schema = c(prefix = "my_study_", schema = "main"))

Let’s start by creating two drug cohorts, one for users of diclofenac and another for users of acetaminophen.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = list("diclofenac" = 1124300,
                                                   "acetaminophen" = 1127433), 
                                 name = "medications")
cohortCount(cdm$medications)
#> # A tibble: 2 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1           9365            2580
#> 2                    2            830             830

We can merge cohort records using the collapseCohorts() function in the CohortConstructor package. The function lets us specify a gap, in days, between two cohort entries; entries of the same person that fall within this gap of each other are merged into a single record.

Let’s first define a new cohort where records within 1095 days (~ 3 years) of each other will be merged.

cdm$medications_collapsed <- collapseCohorts(
  cohort = cdm$medications,
  gap = 1095,
  name = "medications_collapsed"
)
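As a rough illustration of the merging rule, the base R sketch below collapses one person's records on toy data. collapse_records() is a hypothetical helper written for this example, not the package's implementation. It assumes that the gap is counted from the end of one record to the start of the next, that a gap exactly equal to the threshold still merges, and that the input records do not overlap.

```r
# Hypothetical helper, not part of CohortConstructor: merge one person's
# records when the next start is within `gap` days of the previous end.
collapse_records <- function(start, end, gap) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  # Assumption: the gap runs from the end of one record to the start of the next
  days_between <- as.integer(start[-1] - end[-length(end)])
  # A new merged record begins whenever the threshold is exceeded
  group <- cumsum(c(TRUE, days_between > gap))
  first <- !duplicated(group)                  # first record of each group
  last <- !duplicated(group, fromLast = TRUE)  # last record of each group
  data.frame(cohort_start_date = start[first],
             cohort_end_date = end[last])
}

# Toy data: the first two records are 22 days apart, the third is years later
toy_start <- as.Date(c("2000-01-01", "2000-02-01", "2005-01-01"))
toy_end   <- as.Date(c("2000-01-10", "2000-02-10", "2005-01-10"))

collapsed <- collapse_records(toy_start, toy_end, gap = 30)
collapsed
```

With a 30-day gap, the first two toy records are combined into one record running from 2000-01-01 to 2000-02-10, and the 2005 record is left as it is.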

Let’s see how this function changes the records of a single individual by comparing the original and the collapsed cohort.

cdm$medications |>
  filter(subject_id == 1)
#> # Source:   SQL [4 x 4]
#> # Database: DuckDB v1.1.2 [unknown@Linux 6.5.0-1025-azure:R 4.4.2//tmp/RtmpDCFQ9q/file1a424990aae8.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                  <int>      <int> <date>            <date>         
#> 1                    1          1 1980-03-15        1980-03-29     
#> 2                    1          1 1971-01-04        1971-01-18     
#> 3                    1          1 1982-09-11        1982-10-02     
#> 4                    1          1 1976-10-20        1976-11-03
cdm$medications_collapsed |>
  filter(subject_id == 1)
#> # Source:   SQL [3 x 4]
#> # Database: DuckDB v1.1.2 [unknown@Linux 6.5.0-1025-azure:R 4.4.2//tmp/RtmpDCFQ9q/file1a424990aae8.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                  <int>      <int> <date>            <date>         
#> 1                    1          1 1971-01-04        1971-01-18     
#> 2                    1          1 1976-10-20        1976-11-03     
#> 3                    1          1 1980-03-15        1982-10-02

Subject 1 initially had four records between 1971 and 1982. After merging records within three years (1095 days) of each other, the number of records decreases to three. The record from 1980-03-15 to 1980-03-29 and the record from 1982-09-11 to 1982-10-02 are merged into a single record from 1980-03-15 to 1982-10-02. The two earlier records are each more than three years away from the next record, so they are left unchanged.
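The arithmetic behind this result can be checked with plain date subtraction. The snippet below is a small sketch that re-enters subject 1's dates by hand from the output above. It assumes the gap is measured from the end of one record to the start of the next, which may differ in detail from how collapseCohorts() counts days internally.

```r
# Subject 1's records, copied from the output above and sorted by start date
starts <- as.Date(c("1971-01-04", "1976-10-20", "1980-03-15", "1982-09-11"))
ends   <- as.Date(c("1971-01-18", "1976-11-03", "1980-03-29", "1982-10-02"))

# Days from the end of each record to the start of the next one
gaps <- as.integer(starts[-1] - ends[-length(ends)])
gaps
#> [1] 2102 1228  896

# Only the last gap is within the 1095-day threshold
gaps <= 1095
#> [1] FALSE FALSE  TRUE
```

Only the final pair of records is close enough to be merged, which matches the three records seen in the collapsed cohort.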

Now let’s look at how the cohorts have been changed.

summary_attrition <- summariseCohortAttrition(cdm$medications_collapsed)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates the changes to cohort 1 (users of acetaminophen) when entries within 3 years of each other are merged. We see that collapsing the cohort has led to 1,390 fewer records.

# summary_attrition was already computed above, so it is reused here
plotCohortAttrition(summary_attrition, cohortId = 2)

The flow chart above illustrates the changes to cohort 2 (users of diclofenac) when entries within 3 years of each other are merged. Since this cohort has only one record per individual (830 records for 830 subjects), collapseCohorts() had no impact on the final number of records.