Package: CohortGenerator 0.11.2

Anthony Sena

CohortGenerator: Cohort Generation for the OMOP Common Data Model

Generate cohorts and subsets using an Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) Database. Cohorts are defined using 'CIRCE' (<https://github.com/ohdsi/circe-be>) or SQL compatible with 'SqlRender' (<https://github.com/OHDSI/SqlRender>).

Authors: Anthony Sena [aut, cre], Jamie Gilbert [aut], Gowtham Rao [aut], Martijn Schuemie [aut], Observational Health Data Science and Informatics [cph]


# Install 'CohortGenerator' in R:
install.packages('CohortGenerator', repos = 'https://cloud.r-project.org')

Bug tracker: https://github.com/ohdsi/cohortgenerator/issues (22 issues)

Pkgdown site: https://ohdsi.github.io

Uses libs:
  • openjdk: OpenJDK Java runtime, using Hotspot JIT

On CRAN:

Conda: openjdk

3.48 score, 478 downloads, 47 exports, 55 dependencies

Last updated 5 months ago from: 89f9d8ecc5. Checks: 1 OK, 1 WARNING. Indexed: no.

Target            Result    Latest binary
Doc / Vignettes   OK        Feb 28 2025
R-4.5-linux       WARNING   Feb 28 2025

Exports: addCohortSubsetDefinition, checkAndFixCohortDefinitionSetDataTypes, CohortSubsetDefinition, CohortSubsetOperator, computeChecksum, createCohortSubset, createCohortSubsetDefinition, createCohortTables, createDemographicSubset, createEmptyCohortDefinitionSet, createEmptyNegativeControlOutcomeCohortSet, createLimitSubset, createResultsDataModel, createSubsetCohortWindow, DemographicSubsetOperator, dropCohortStatsTables, exportCohortStatsTables, generateCohortSet, generateNegativeControlOutcomeCohorts, getCohortCounts, getCohortDefinitionSet, getCohortInclusionRules, getCohortStats, getCohortTableNames, getDataMigrator, getRequiredTasks, getResultsDataModelSpecifications, getSubsetDefinitions, insertInclusionRuleNames, isCamelCase, isCohortDefinitionSet, isFormattedForDatabaseUpload, isSnakeCase, isTaskRequired, LimitSubsetOperator, migrateDataModel, readCsv, recordTasksDone, runCohortGeneration, sampleCohortDefinitionSet, saveCohortDefinitionSet, saveCohortSubsetDefinition, saveIncremental, SubsetCohortWindow, SubsetOperator, uploadResults, writeCsv

Dependencies: backports, bit, bit64, blob, checkmate, cli, clipr, cpp11, crayon, DatabaseConnector, DBI, dbplyr, digest, dplyr, fansi, fastmap, generics, glue, hms, jsonlite, later, lifecycle, lubridate, magrittr, ParallelLogger, pillar, pkgconfig, pool, prettyunits, progress, purrr, R6, Rcpp, readr, ResultModelManager, rJava, RJSONIO, rlang, snow, SqlRender, stringi, stringr, tibble, tidyr, tidyselect, timechange, triebeard, tzdb, urltools, utf8, vctrs, vroom, withr, xml2, zip

Creating Cohort Subset Definitions

Rendered from CreatingCohortSubsetDefinitions.Rmd using knitr::rmarkdown on Feb 28 2025.

Last update: 2024-10-01
Started: 2024-09-16

Generating Cohorts

Rendered from GeneratingCohorts.Rmd using knitr::rmarkdown on Feb 28 2025.

Last update: 2024-10-01
Started: 2024-09-16

Sampling Cohorts

Rendered from SamplingCohorts.Rmd using knitr::rmarkdown on Feb 28 2025.

Last update: 2024-10-01
Started: 2024-09-16

Citation

To cite package ‘CohortGenerator’ in publications use:

Sena A, Gilbert J, Rao G, Schuemie M (2024). CohortGenerator: Cohort Generation for the OMOP Common Data Model. R package version 0.11.2, https://CRAN.R-project.org/package=CohortGenerator.

Corresponding BibTeX entry:

  @Manual{,
    title = {CohortGenerator: Cohort Generation for the OMOP Common
      Data Model},
    author = {Anthony Sena and Jamie Gilbert and Gowtham Rao and
      Martijn Schuemie},
    year = {2024},
    note = {R package version 0.11.2},
    url = {https://CRAN.R-project.org/package=CohortGenerator},
  }

Readme and manuals

CohortGenerator

CohortGenerator is part of HADES.

Introduction

This R package contains functions for generating cohorts using data in the CDM.

Features

  • Create a cohort table and generate cohorts against an OMOP CDM.
  • Get the count of subjects and events in a cohort.
  • Run tasks incrementally. CohortGenerator uses this to skip any cohorts that were successfully generated in a previous run. The functionality is generic enough for other packages to use for their own incremental tasks.
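
The incremental feature can be pictured as checksum-based record keeping: a task is skipped when its definition has not changed since it was last recorded as done. The following is a minimal base-R sketch of that idea only. It is not the package's implementation; the helper names (checksumOf, taskIsRequired, recordTaskDone) and the two-column record file layout are hypothetical. The package exports its own functions for this purpose (isTaskRequired, recordTasksDone, computeChecksum), whose signatures are not reproduced here.

```r
# Hypothetical helpers illustrating checksum-based skipping. These are NOT
# the CohortGenerator functions, and the record file layout is an assumption.

# Checksum of a string, via a temp file because tools::md5sum hashes files.
checksumOf <- function(text) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(text, tmp)
  unname(tools::md5sum(tmp))
}

# A task is required unless the record file holds the same id and checksum.
taskIsRequired <- function(taskId, checksum, recordFile) {
  if (!file.exists(recordFile)) {
    return(TRUE)
  }
  done <- read.csv(recordFile, colClasses = c("integer", "character"))
  !any(done$taskId == taskId & done$checksum == checksum)
}

# Record a finished task, replacing any earlier entry for the same id.
recordTaskDone <- function(taskId, checksum, recordFile) {
  done <- data.frame(taskId = integer(), checksum = character(),
                     stringsAsFactors = FALSE)
  if (file.exists(recordFile)) {
    done <- read.csv(recordFile, colClasses = c("integer", "character"))
  }
  done <- done[done$taskId != taskId, , drop = FALSE]
  done <- rbind(done, data.frame(taskId = taskId, checksum = checksum,
                                 stringsAsFactors = FALSE))
  write.csv(done, recordFile, row.names = FALSE)
  invisible(done)
}

recordFile <- tempfile(fileext = ".csv")
sqlV1 <- "SELECT 1;"

# Nothing has been recorded yet, so the task must run.
firstRun <- taskIsRequired(1L, checksumOf(sqlV1), recordFile)
recordTaskDone(1L, checksumOf(sqlV1), recordFile)

# Same definition as before: the task can be skipped.
secondRun <- taskIsRequired(1L, checksumOf(sqlV1), recordFile)

# Edited definition: the checksum differs, so the task must run again.
changedRun <- taskIsRequired(1L, checksumOf("SELECT 2;"), recordFile)
```

Keying the record on both an identifier and a checksum means an edited cohort definition is regenerated even though its id is unchanged, while untouched definitions are skipped.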

Example

# First construct a cohort definition set: an empty 
# data frame with the cohorts to generate
cohortsToCreate <- CohortGenerator::createEmptyCohortDefinitionSet()

# Fill the cohort set using cohorts included in this 
# package as an example
cohortJsonFiles <- list.files(path = system.file("testdata/name/cohorts", package = "CohortGenerator"), full.names = TRUE)
for (i in seq_along(cohortJsonFiles)) {
  cohortJsonFileName <- cohortJsonFiles[i]
  cohortName <- tools::file_path_sans_ext(basename(cohortJsonFileName))
  # Here we read in the JSON in order to create the SQL
  # using [CirceR](https://ohdsi.github.io/CirceR/)
  # If you have your JSON and SQL stored differently, you can
  # modify this to read your JSON/SQL files however you require
  cohortJson <- readChar(cohortJsonFileName, file.info(cohortJsonFileName)$size)
  cohortExpression <- CirceR::cohortExpressionFromJson(cohortJson)
  cohortSql <- CirceR::buildCohortQuery(cohortExpression, options = CirceR::createGenerateOptions(generateStats = FALSE))
  cohortsToCreate <- rbind(cohortsToCreate, data.frame(cohortId = i,
                                                       cohortName = cohortName, 
                                                       sql = cohortSql,
                                                       stringsAsFactors = FALSE))
}

# Generate the cohort set against Eunomia. 
# cohortsGenerated contains a list of the cohortIds 
# successfully generated against the CDM
connectionDetails <- Eunomia::getEunomiaConnectionDetails()

# Create the cohort tables to hold the cohort generation results
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "my_cohort_table")
CohortGenerator::createCohortTables(connectionDetails = connectionDetails,
                                    cohortDatabaseSchema = "main",
                                    cohortTableNames = cohortTableNames)
# Generate the cohorts
cohortsGenerated <- CohortGenerator::generateCohortSet(connectionDetails = connectionDetails,
                                                       cdmDatabaseSchema = "main",
                                                       cohortDatabaseSchema = "main",
                                                       cohortTableNames = cohortTableNames,
                                                       cohortDefinitionSet = cohortsToCreate)

# Get the cohort counts
cohortCounts <- CohortGenerator::getCohortCounts(connectionDetails = connectionDetails,
                                                 cohortDatabaseSchema = "main",
                                                 cohortTable = cohortTableNames$cohortTable)
print(cohortCounts)

Technology

CohortGenerator is an R package.

System requirements

Requires R (version 3.6.0 or higher).

Getting Started

  1. Make sure your R environment is properly configured. This means that Java must be installed. See these instructions for how to configure your R environment.

  2. In R, use the following commands to download and install CohortGenerator:

    install.packages("remotes")
    remotes::install_github("OHDSI/CohortGenerator")
    

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available:

Support

Contributing

Read here how you can contribute to this package.

License

CohortGenerator is licensed under Apache License 2.0.

Development

This package is being developed in RStudio.

Development status

Beta

Help Manual

Help page | Topics
Add cohort subset definition to a cohort definition set | addCohortSubsetDefinition
Check if a cohort definition set is using the proper data types | checkAndFixCohortDefinitionSetDataTypes
Cohort Subset Definition | CohortSubsetDefinition
Cohort Subset Operator | CohortSubsetOperator
Computes the checksum for a value | computeChecksum
A definition of subset functions to be applied to a set of cohorts | createCohortSubset
Create Subset Definition | createCohortSubsetDefinition
Create cohort tables | createCohortTables
Create createDemographicSubset Subset | createDemographicSubset
Create an empty cohort definition set | createEmptyCohortDefinitionSet
Create an empty negative control outcome cohort set | createEmptyNegativeControlOutcomeCohortSet
Create Limit Subset | createLimitSubset
Create the results data model tables on a database server. | createResultsDataModel
A definition of subset functions to be applied to a set of cohorts | createSubsetCohortWindow
Demographic Subset Operator | DemographicSubsetOperator
Drop cohort statistics tables | dropCohortStatsTables
Export the cohort statistics tables to the file system | exportCohortStatsTables
Generate a set of cohorts | generateCohortSet
Generate a set of negative control outcome cohorts | generateNegativeControlOutcomeCohorts
Count the cohort(s) | getCohortCounts
Get a cohort definition set | getCohortDefinitionSet
Get Cohort Inclusion Rules from a cohort definition set | getCohortInclusionRules
Get Cohort Inclusion Stats Table Data | getCohortStats
Used to get a list of cohort table names to use when creating the cohort tables | getCohortTableNames
Get database migrations instance | getDataMigrator
Get a list of tasks required when running in incremental mode | getRequiredTasks
Get specifications for CohortGenerator results data model | getResultsDataModelSpecifications
Get cohort subset definitions from a cohort definition set | getSubsetDefinitions
Used to insert the inclusion rule names from a cohort definition set when generating cohorts that include cohort statistics | insertInclusionRuleNames
Used to check if a string is in lower camel case | isCamelCase
Is the data.frame a cohort definition set? | isCohortDefinitionSet
Is the data.frame formatted for uploading to a database? | isFormattedForDatabaseUpload
Used to check if a string is in snake case | isSnakeCase
Is a task required when running in incremental mode | isTaskRequired
Limit Subset Operator | LimitSubsetOperator
Migrate Data model | migrateDataModel
Used to read a .csv file | readCsv
Record a task as complete | recordTasksDone
Run a cohort generation and export results | runCohortGeneration
Sample Cohort Definition Set | sampleCohortDefinitionSet
Save the cohort definition set to the file system | saveCohortDefinitionSet
Save cohort subset definitions to json | saveCohortSubsetDefinition
Used in incremental mode to save values to a file | saveIncremental
Time Window For Cohort Subset Operator | SubsetCohortWindow
Abstract base class for subsets. | SubsetOperator
Upload results to the database server. | uploadResults
Used to write a .csv file | writeCsv