--- title: "Implementing custom filters" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Implementing custom filters} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options("tibble.print_min" = 5L, "tibble.print_max" = 5L) library(dplyr, exclude = "filter") library(cohortBuilder) ``` `cohortBuilder` provides seven built-in filter types suitable for most common filtering tasks: - discrete, - discrete_text, - range, - date_range, - datetime_range, - multi_discrete, - query. If none of the above filters meets your need, you can create a custom one. Below we describe how custom filters work and walk through an example: creating a **logical** filter. ## Filter architecture Filters in `cohortBuilder` are S7 objects that inherit from the `CbFilter` base class. Each filter type is an S7 class with properties describing its configuration (dataset, variable, value, etc.). Filtering logic is implemented via **S7 multi-dispatch generics** that dispatch on both the filter class and the source class. This means you can define how a filter type behaves for different source backends (e.g. `tblist`, database, etc.). The key generics are: | Generic | Purpose | |---------|---------| | `cb_filter_data(filter, source, data_object)` | Apply filter to data | | `cb_get_filter_stats(filter, source, data_object)` | Compute filter statistics | | `cb_plot_filter_data(filter, source, data_object)` | Plot filter data summary | | `cb_get_filter_data(filter, source, data_object)` | Get filter-related data subset | | `cb_get_filter_defaults(filter, source, data_object, cache_object)` | Get default parameter values | | `cb_filter_to_expr(filter, source)` | Generate reproducible code expression | ### Built-in filter example Let's look at a built-in discrete filter to understand the structure: ```{r} spec_filter <- filter("discrete", value = "setosa", dataset = "iris", variable = "Species") spec_filter ``` The filter is an S7 object with properties accessible via `@`: ```{r} spec_filter@type spec_filter@variable spec_filter@value spec_filter@dataset ``` All user-facing parameters can be retrieved as a list: ```{r} str(get_filter_params(spec_filter)) ``` ## Creating a custom filter: logical filter To create a custom filter you need to: 1. Define an S7 class inheriting from `CbFilter`. 2. Register the type so `filter("logical", ...)` works. 3. Implement S7 methods for each source type you want to support. ### Step 1: Define the S7 filter class ```{r} CbFilterLogical <- S7::new_class("CbFilterLogical", parent = CbFilter, properties = list( dataset = S7::class_character, variable = S7::class_character, value = S7::class_any, keep_na = S7::class_logical ), constructor = function(id = NULL, name = NULL, variable, value = NA, dataset, keep_na = TRUE, description = NULL, active = getOption("cb_active_filter", default = TRUE), ...) { if (is.null(id)) id <- paste(dataset, variable, sep = "-") if (is.null(name)) name <- variable S7::new_object(S7::S7_object(), type = "logical", id = id, name = name, input_param = "value", variable = variable, value = value, dataset = dataset, keep_na = keep_na, active = active, description = description, extra = list(...) ) } ) ``` Key points: - `parent = CbFilter` makes it part of the filter hierarchy. - `input_param = "value"` tells the system which property holds the filtering value. - `extra = list(...)` captures any additional parameters (e.g. `gui_input` for shinyCohortBuilder). - `value = NA` means no filtering is applied by default. ### Step 2: Register the filter type ```{r} register_filter_type("logical", CbFilterLogical) ``` Now `filter("logical", ...)` works: ```{r} my_filter <- filter("logical", variable = "is_setosa", dataset = "iris", value = TRUE) my_filter@type my_filter@value ``` ### Step 3: Implement methods for tblist source Each method dispatches on `(filter_class, source_class)`. We use `tblist_class` (provided by cohortBuilder) as the source class for the built-in list-of-tables backend. #### filter_data -- filtering logic ```{r} S7::method(cb_filter_data, list(CbFilterLogical, tblist_class)) <- function(filter, source, data_object, ...) { dataset <- filter@dataset variable <- filter@variable value <- filter@value keep_na <- filter@keep_na if (keep_na && !identical(value, NA)) { data_object[[dataset]] <- data_object[[dataset]] |> dplyr::filter(!!sym(variable) == !!value | is.na(!!sym(variable))) } if (!keep_na && identical(value, NA)) { data_object[[dataset]] <- data_object[[dataset]] |> dplyr::filter(!is.na(!!sym(variable))) } if (!keep_na && !identical(value, NA)) { data_object[[dataset]] <- data_object[[dataset]] |> dplyr::filter(!!sym(variable) == !!value) } attr(data_object[[dataset]], "filtered") <- TRUE data_object } ``` Note: - When `value = NA`, no filtering is applied (unless `keep_na = FALSE`, which removes `NA` rows). - The `filtered` attribute signals to binding logic that this dataset was touched. #### get_stats -- data statistics ```{r} S7::method(cb_get_filter_stats, list(CbFilterLogical, tblist_class)) <- function( filter, source, data_object, name, ... ) { dataset <- filter@dataset variable <- filter@variable if (missing(name)) { name <- c("n_data", "choices", "n_missing") } column <- data_object[[dataset]][[variable]] stats <- list( choices = if ("choices" %in% name) as.list(table(stats::na.omit(column))), n_data = if ("n_data" %in% name) length(stats::na.omit(column)), n_missing = if ("n_missing" %in% name) sum(is.na(column)) ) if (length(name) == 1L) stats[[name]] else stats[name] } ``` Tip: evaluating only the requested stat (via the `name` parameter) improves performance with large data. #### plot_data -- visualization ```{r} S7::method(cb_plot_filter_data, list(CbFilterLogical, tblist_class)) <- function(filter, source, data_object, ...) { dataset <- filter@dataset variable <- filter@variable if (nrow(data_object[[dataset]])) { data_object[[dataset]][[variable]] |> table() |> prop.table() |> graphics::barplot(...) } else { graphics::barplot(0.0, ylim = c(0.0, 0.1), main = "No data") } } ``` #### get_data -- filter-related data ```{r} S7::method(cb_get_filter_data, list(CbFilterLogical, tblist_class)) <- function(filter, source, data_object, ...) { data_object[[filter@dataset]][[filter@variable]] } ``` #### get_defaults -- default parameter values ```{r} S7::method(cb_get_filter_defaults, list(CbFilterLogical, tblist_class)) <- function( filter, source, data_object, cache_object, ... ) { list(value = names(cache_object$choices)) } ``` #### filter_to_expr -- reproducible code ```{r} S7::method(cb_filter_to_expr, list(CbFilterLogical, tblist_class)) <- function(filter, source, ...) { dataset <- filter@dataset variable <- filter@variable value <- filter@value keep_na <- filter@keep_na if (keep_na && !identical(value, NA)) { rlang::expr({ data_object[[!!dataset]] <- data_object[[!!dataset]] |> dplyr::filter(!!sym(variable) == !!value | is.na(!!sym(variable))) }) } else if (!keep_na && identical(value, NA)) { rlang::expr({ data_object[[!!dataset]] <- data_object[[!!dataset]] |> dplyr::filter(!is.na(!!sym(variable))) }) } else if (!keep_na && !identical(value, NA)) { rlang::expr({ data_object[[!!dataset]] <- data_object[[!!dataset]] |> dplyr::filter(!!sym(variable) == !!value) }) } else { NULL } } ``` ### Using the filter Now we can use our filter for building a cohort. For the example we'll use an extended `iris` table: ```{r} iris2 <- dplyr::mutate(iris, is_setosa = Species == "setosa") coh <- set_source(tblist(iris = iris2)) |> cohort( filter("logical", dataset = "iris", variable = "is_setosa", value = TRUE) ) |> run() ``` Let's verify the filtering worked: ```{r} nrow(get_data(coh)$iris) collapse::funique(get_data(coh)$iris$is_setosa) ``` We can also check statistics and reproducible code: ```{r} stat(coh, step_id = 1L, filter_id = 1L) ``` ```{r} code(coh, include_methods = NULL, include_action = NULL) ``` ## Extra parameters Filter constructors accept `...` which is stored in `filter@extra`. This is useful for passing additional configuration consumed by other packages (e.g. `gui_input` for shinyCohortBuilder): ```{r} my_filter <- filter("logical", dataset = "iris", variable = "is_setosa", value = TRUE, gui_input = "checkbox" ) my_filter@extra$gui_input get_filter_params(my_filter)$gui_input ``` Extra parameters are included in `get_filter_params()` output, persisted in `get_state()`, and can be updated via `update_filter()`. ## Implementing filters in a package When creating custom filters in a separate package: 1. Define your S7 filter class(es) and export them. 2. Call `register_filter_type()` in your package's `.onLoad()`. 3. Register S7 methods for your filter × source combinations. 4. Add `S7 (>= 0.2.0)` and `cohortBuilder` to `Imports` in DESCRIPTION. 5. Call `S7::methods_register()` in your `.onLoad()` to register S7 methods. ```r # R/zzz.R .onLoad <- function(libname, pkgname) { S7::methods_register() cohortBuilder::register_filter_type("logical", CbFilterLogical) } ```