---
title: "Creating an OCCDS ADaM"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating an OCCDS ADaM}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(admiraldev)
```

# Introduction

This article describes creating an OCCDS ADaM. Examples are currently
presented and tested in the context of `ADAE`. However, the examples could be
applied to other OCCDS ADaMs such as `ADCM`, `ADMH`, `ADDV`, etc.

**Note**: *All examples assume CDISC SDTM and/or ADaM format as input unless
otherwise specified.*

# Programming Workflow

* [Read in Data](#readdata)
* [Derive/Impute End and Start Analysis Date/time and Relative Day](#datetime)
* [Derive Durations](#duration)
* [Derive ATC variables](#atc)
* [Derive Planned and Actual Treatment](#trtpa)
* [Derive Date/Date-time of Last Dose](#last_dose)
* [Derive Treatment Dose and Unit](#dose_unit)
* [Derive Severity, Causality, and Toxicity Grade](#severity)
* [Derive Treatment Emergent Flag](#trtflag)
* [Derive Occurrence Flags](#occflag)
* [Derive Query Variables](#query)
* [Add ADSL variables](#adsl_vars)
* [Derive Analysis Sequence Number](#aseq)
* [Add Labels and Attributes](#attributes)

## Read in Data {#readdata}

To start, all data frames needed for the creation of `ADAE` should be read into
the environment. This will be a company-specific process. Some of the
data frames needed may be `AE` and `ADSL`.

For example purposes, the CDISC Pilot SDTM and ADaM datasets (which are included
in `{pharmaversesdtm}`) are used.
```{r, message=FALSE, warning=FALSE}
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)

ae <- pharmaversesdtm::ae
adsl <- admiral::admiral_adsl
ex <- pharmaversesdtm::ex

ae <- convert_blanks_to_na(ae)
ex <- convert_blanks_to_na(ex)
```

As the start and end datetimes of the dosing records are required by multiple
derivations, they are derived once at the beginning. Please note that it depends
on the study whether imputation is required and, if so, which imputation method
is appropriate. In the example below, we impute the time as the first time of
the day. The `min_dates` argument is used in the second call to ensure that the
end datetime is not imputed to be before the start datetime.

```{r}
ex <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST",
    time_imputation = "first",
    flag_imputation = "none"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "first",
    flag_imputation = "none",
    min_dates = exprs(EXSTDTM)
  ) %>%
  derive_vars_dtm_to_dt(exprs(EXSTDTM, EXENDTM))
```

```{r echo = FALSE}
ae <- filter(ae, USUBJID %in% c("01-701-1015", "01-701-1023", "01-701-1211", "01-703-1086", "01-703-1096", "01-707-1037", "01-716-1024"))
```

At this step, it may be useful to join `ADSL` to your `AE` domain as well. Only the
`ADSL` variables used for derivations are selected at this step. The rest of the
relevant `ADSL` variables would be added later.

```{r eval=TRUE}
adsl_vars <- exprs(TRTSDT, TRTEDT, TRTEDTM, TRT01A, TRT01P, DTHDT, EOSDT)

# Merge only the ADSL variables needed for the derivations below
adae <- derive_vars_merged(
  ae,
  dataset_add = adsl,
  new_vars = adsl_vars,
  by_vars = exprs(STUDYID, USUBJID)
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AESEQ, AETERM, AESTDTC, TRTSDT, TRTEDT,
    TRT01A, TRT01P, DTHDT, EOSDT
  )
)
```

## Derive/Impute End and Start Analysis Date/time and Relative Day {#datetime}

This part derives `ASTDTM`, `ASTDT`, `ASTDY`, `AENDTM`, `AENDT`, and `AENDY`.
The function `derive_vars_dtm()` can be used to derive `ASTDTM` and `AENDTM`,
although the derivation of `ASTDTM` may be company-specific. `ASTDT` and `AENDT`
can be derived from `ASTDTM` and `AENDTM`, respectively, using the function
`derive_vars_dtm_to_dt()`. `derive_vars_dy()` can be used to create `ASTDY` and
`AENDY`.

```{r eval=TRUE}
adae <- adae %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    min_dates = exprs(TRTSDT)
  ) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, EOSDT)
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM, AENDTM)) %>%
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ASTDT, AENDT)
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AESTDTC, AEENDTC, ASTDTM, ASTDT, ASTDY, AENDTM, AENDT, AENDY
  )
)
```

See also [Date and Time Imputation](imputation.html).

## Derive Durations {#duration}

The function `derive_vars_duration()` can be used to create the variables
`ADURN` and `ADURU`.

```{r eval=TRUE}
adae <- adae %>%
  derive_vars_duration(
    new_var = ADURN,
    new_var_unit = ADURU,
    start_date = ASTDT,
    end_date = AENDT
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AESTDTC, AEENDTC, ASTDT, AENDT, ADURN, ADURU
  )
)
```

## Derive ATC variables {#atc}

The function `derive_vars_atc()` can be used to derive ATC Class Variables.

It adds Anatomical Therapeutic Chemical class variables from `FACM` to `ADCM`.

The expected result is the input dataset with ATC variables added.
```{r eval=TRUE}
cm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~CMGRPID, ~CMREFID,  ~CMDECOD,
  "STUDY01", "BP40257-1001", "14",     "1192056", "PARACETAMOL",
  "STUDY01", "BP40257-1001", "18",     "2007001", "SOLUMEDROL",
  "STUDY01", "BP40257-1002", "19",     "2791596", "SPIRONOLACTONE"
)
facm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~FAGRPID, ~FAREFID,  ~FATESTCD,  ~FASTRESC,
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC1CD", "N",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC2CD", "N02",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC3CD", "N02B",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC4CD", "N02BE",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC2CD", "D10",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC3CD", "D10A",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC4CD", "D10AA",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC2CD", "D07",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC3CD", "D07A",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC4CD", "D07AA",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC1CD", "H",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC2CD", "H02",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC3CD", "H02A",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC4CD", "H02AB",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC1CD", "C",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC2CD", "C03",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC3CD", "C03D",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC4CD", "C03DA"
)

derive_vars_atc(cm, dataset_facm = facm, id_vars = exprs(FAGRPID))
```

## Derive Planned and Actual Treatment {#trtpa}

`TRTA` and `TRTP` must match at least one value of the character treatment
variables in ADSL (e.g., `TRTxxA`/`TRTxxP`, `TRTSEQA`/`TRTSEQP`,
`TRxxAGy`/`TRxxPGy`).
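One way to verify this rule is a simple membership check. The sketch below uses base R only, with small hypothetical data frames (`adsl_demo`, `adae_demo`) and a hypothetical helper `check_trt_values()`. It assumes a study without periods and therefore only compares against `TRT01P` and `TRT01A`; studies with more treatment variables would need a wider lookup.

```r
# Hypothetical ADSL and ADAE extracts, for illustration only
adsl_demo <- data.frame(
  USUBJID = c("01", "02"),
  TRT01P = c("Placebo", "Drug X"),
  TRT01A = c("Placebo", "Drug X"),
  stringsAsFactors = FALSE
)
adae_demo <- data.frame(
  USUBJID = c("01", "02", "02"),
  TRTP = c("Placebo", "Drug X", "Drug X"),
  TRTA = c("Placebo", "Drug X", "Drug Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the values that do not occur in `allowed`
check_trt_values <- function(values, allowed) {
  setdiff(unique(values), unique(allowed))
}

# Empty result: every TRTP value is a known planned treatment
bad_trtp <- check_trt_values(adae_demo$TRTP, adsl_demo$TRT01P)

# "Drug Y" is reported because it is not an ADSL actual treatment
bad_trta <- check_trt_values(adae_demo$TRTA, adsl_demo$TRT01A)
```

An empty character vector means all treatment values in the occurrence dataset trace back to ADSL; anything else points to a value that needs review.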
An example of a simple implementation for a study without periods could be:

```{r eval=TRUE}
adae <- mutate(adae, TRTP = TRT01P, TRTA = TRT01A)

count(adae, TRTP, TRTA, TRT01P, TRT01A)
```

For studies with periods see the ["Visit and Period Variables"
vignette](visits_periods.html#treatment_bds).

## Derive Date/Date-time of Last Dose {#last_dose}

Before deriving the last dose date, it may be necessary to create an `ex_single`
dataset from the `EX` domain. If the exposure dataset contains multi-day dosing
records (e.g., one record per treatment period rather than one record per dose),
use `create_single_dose_dataset()` to expand them into one record per dose.
Whether this step is necessary depends on how dosing data were collected in
your study.

For ongoing studies, you may also need to impute missing end dates (e.g., with
the data cut-off date) before calling `create_single_dose_dataset()`. For
examples including handling of missing end dates, see
`?create_single_dose_dataset`.

The test data contains one record per treatment period and the dose frequency is
daily (`QD`). The following call creates one record per day.

```{r eval=TRUE}
ex_single <- ex %>%
  filter(!is.na(EXSTDT), !is.na(EXENDT)) %>%
  create_single_dose_dataset(
    dose_freq = EXDOSFRQ,
    start_date = EXSTDT,
    start_datetime = EXSTDTM,
    end_date = EXENDT,
    end_datetime = EXENDTM,
    keep_source_vars = exprs(
      STUDYID, USUBJID, EXTRT, EXDOSE, EXDOSU, EXSTDT, EXENDT, EXSTDTM, EXENDTM
    )
  )
```

The function `derive_vars_joined()` can be used to derive the last dose date
before the start of the event.
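
Conceptually, this derivation selects, for each event, the latest dose start that is not after the event start. The following base-R sketch illustrates that selection rule on hypothetical timestamps (`dose_times`, `event_times`) for a single subject. It is only an illustration of the rule, not how `derive_vars_joined()` is implemented, and it leaves out by-subject grouping and the dose filter.

```r
# Hypothetical single-dose start times and event start times for one subject
dose_times <- as.POSIXct(
  c("2020-01-01 08:00", "2020-01-02 08:00", "2020-01-03 08:00"),
  tz = "UTC"
)
event_times <- as.POSIXct(
  c("2019-12-31 12:00", "2020-01-02 07:00", "2020-01-03 09:30"),
  tz = "UTC"
)

# For each event, find the index of the latest dose at or before the event
# start; NA if no dose was given before the event
last_dose_idx <- vapply(
  seq_along(event_times),
  function(i) {
    eligible <- which(dose_times <= event_times[i])
    if (length(eligible) == 0) {
      NA_integer_
    } else {
      eligible[which.max(dose_times[eligible])]
    }
  },
  integer(1)
)

# Indexing with NA yields a missing datetime for events before the first dose
ldosedtm <- dose_times[last_dose_idx]
```

The `derive_vars_joined()` call in the next chunk expresses the same rule through `filter_join = EXSTDTM <= ASTDTM` combined with `order = exprs(EXSTDTM)` and `mode = "last"`.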
```{r eval=TRUE}
adae <- derive_vars_joined(
  adae,
  ex_single,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(LDOSEDTM = EXSTDTM),
  join_vars = exprs(EXSTDTM),
  join_type = "all",
  order = exprs(EXSTDTM),
  filter_add = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) & !is.na(EXSTDTM),
  filter_join = EXSTDTM <= ASTDTM,
  mode = "last"
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AEDECOD, AESEQ, AESTDTC, AEENDTC,
    ASTDT, AENDT, LDOSEDTM
  )
)
```

## Derive Treatment Dose and Unit {#dose_unit}

In a similar manner, you could derive the treatment dose (`DOSEON`) and unit
(`DOSEU`) at the time of the event.

Please note that drug clearance duration should be considered when matching
exposure records with adverse events. `EXSTDTC` and `EXENDTC` typically
represent the administration period only, not the time the drug remains in the
body. To account for drug clearance, you may extend the last exposure end date
by the appropriate clearance duration.

For example, if the exposure dataset contains one record per dose and the drug
administration is instantaneous, e.g., a pill, we have `EXSTDTC == EXENDTC`.
That is, without adding a clearance duration, the dose will only be considered
as active at the exact time of administration.

Adding a clearance duration to the end date may result in overlapping dosing
intervals for some subjects. Therefore, the last dosing record before the
adverse event is selected in the example below. If overlapping is not expected
based on the study design and data collection, the `order` and the `mode`
arguments in the `derive_vars_joined()` call below can be removed. The function
will then throw an error if overlapping dosing intervals are found.

Replace `days(1)` in `filter_join` below with the study-specific drug clearance
period.
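
To see why the clearance period matters, consider the following base-R sketch. The timestamps and the helper `dose_active()` are hypothetical and only mirror the join condition used in this section; they are not part of `{admiral}`.

```r
# Hypothetical dosing record: a pill, so administration start and end coincide
exstdtm <- as.POSIXct("2020-01-01 08:00:00", tz = "UTC")
exendtm <- exstdtm

# Hypothetical adverse event starting six hours after the dose
astdtm <- as.POSIXct("2020-01-01 14:00:00", tz = "UTC")

# Hypothetical helper: is the dose considered active at the event start,
# given a clearance period in days added to the end of administration?
dose_active <- function(astdtm, exstdtm, exendtm, clearance_days = 0) {
  window_end <- exendtm + as.difftime(clearance_days, units = "days")
  exstdtm <= astdtm & (is.na(exendtm) | astdtm < window_end)
}

# Without a clearance period the event falls outside the dosing interval
active_no_clearance <- dose_active(astdtm, exstdtm, exendtm)

# With a one-day clearance period the same dose is matched to the event
active_one_day <- dose_active(astdtm, exstdtm, exendtm, clearance_days = 1)
```

Here `active_no_clearance` is `FALSE` and `active_one_day` is `TRUE`, which is the difference the clearance period makes for instantaneous administrations.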
```{r eval=TRUE}
adae <- derive_vars_joined(
  adae,
  ex,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(DOSEON = EXDOSE, DOSEU = EXDOSU),
  order = exprs(EXSTDTM),
  mode = "last",
  join_vars = exprs(EXSTDTM, EXENDTM),
  join_type = "all",
  filter_add = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) & !is.na(EXSTDTM),
  filter_join = EXSTDTM <= ASTDTM & (ASTDTM < EXENDTM + days(1) | is.na(EXENDTM))
)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AEDECOD, AESEQ, AESTDTC, AEENDTC,
    ASTDT, AENDT, DOSEON, DOSEU
  )
)
```

If no time is collected for exposure or adverse events, it may be better to use
the date variables (`EXSTDT`, `EXENDT`, and `ASTDT`) instead of the datetime
variables.

## Derive Severity, Causality, and Toxicity Grade {#severity}

The variables `ASEV`, `AREL`, and `ATOXGR` can be added using simple
`dplyr::mutate()` assignments, if no imputation is required.

```{r eval=TRUE}
adae <- adae %>%
  mutate(
    ASEV = AESEV,
    AREL = AEREL
  )
```

## Derive Treatment Emergent Flag {#trtflag}

To derive the treatment emergent flag `TRTEMFL`, one can call
`derive_var_trtemfl()`. In the example below, we use 30 days in the flag
derivation.

```{r eval=TRUE}
adae <- adae %>%
  derive_var_trtemfl(
    trt_start_date = TRTSDT,
    trt_end_date = TRTEDT,
    end_window = 30
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, TRTSDT, TRTEDT, AESTDTC, ASTDT, TRTEMFL
  )
)
```

To derive the on-treatment flag (`ONTRTFL`) in an ADaM dataset with a single
occurrence date, we use `derive_var_ontrtfl()`.

The expected result is the input dataset with an additional column named
`ONTRTFL` with a value of `"Y"` or `NA`.

If you want to also check an end date, you could add the `end_date` argument.
Note that in this scenario you could set `span_period = TRUE` if you want
occurrences that started prior to drug intake, and were ongoing or ended after
this time, to be considered as on-treatment.
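
The idea behind spanning occurrences can be pictured as an interval rule. The base-R sketch below is a deliberately simplified illustration with a hypothetical helper `on_treatment_span()` and hypothetical dates; it assumes non-missing start dates, ignores reference end windows and pre-dose timepoints, and should not be read as the logic `derive_var_ontrtfl()` applies internally.

```r
# Hypothetical helper: flag occurrences that start within the treatment window
# or that start earlier and are ongoing or end on or after treatment start
on_treatment_span <- function(start, end, ref_start, ref_end) {
  started_in_window <- start >= ref_start & start <= ref_end
  spans_ref_start <- start < ref_start & (is.na(end) | end >= ref_start)
  ifelse(started_in_window | spans_ref_start, "Y", NA_character_)
}

# Hypothetical occurrences relative to treatment from 2020-01-01 to 2020-03-01
start <- as.Date(c("2019-12-20", "2019-12-20", "2020-02-01", "2019-12-20"))
end <- as.Date(c("2019-12-25", "2020-01-05", "2020-02-10", NA))
ref_start <- as.Date("2020-01-01")
ref_end <- as.Date("2020-03-01")

# 1: ended before treatment, 2: spans treatment start,
# 3: starts during treatment, 4: started earlier and is ongoing
flags <- on_treatment_span(start, end, ref_start, ref_end)
```

In this sketch only the first occurrence, which ended before treatment start, is left unflagged. The `derive_var_ontrtfl()` examples that follow use a single date, so neither `end_date` nor `span_period` is needed there.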
```{r eval=TRUE}
bds1 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-02-24"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-01-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2019-12-31"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds1,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT
)

bds2 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-07-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-04-30"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2020-03-15"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds2,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  ref_end_window = 60
)

bds3 <- tibble::tribble(
  ~ADTM,              ~TRTSDTM,           ~TRTEDTM,           ~TPT,
  "2020-01-02T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA,
  "2020-01-01T12:00", "2020-01-01T12:00", "2020-03-01T12:00", "PRE",
  "2019-12-31T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA
) %>%
  mutate(
    ADTM = ymd_hm(ADTM),
    TRTSDTM = ymd_hm(TRTSDTM),
    TRTEDTM = ymd_hm(TRTEDTM)
  )
derive_var_ontrtfl(
  bds3,
  start_date = ADTM,
  ref_start_date = TRTSDTM,
  ref_end_date = TRTEDTM,
  filter_pre_timepoint = TPT == "PRE"
)
```

## Derive Occurrence Flags {#occflag}

The function `derive_var_extreme_flag()` can help derive variables such as
`AOCCIFL`, `AOCCPIFL`, `AOCCSIFL`, and `AOCCzzFL`.

If grades were collected, `ATOXGR` should first be derived from the source data
(e.g., `mutate(ATOXGR = AETOXGR)`) and then the following can be used to flag
the first occurrence of the maximum toxicity grade. Note that the example below
is for illustration only and is not evaluated, as the test data does not contain
toxicity grade information.
```{r, eval=FALSE}
adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(desc(ATOXGR), ASTDTM, AESEQ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )
```

Similarly, `ASEV` can also be used to derive the occurrence flags, if severity
is collected. In this case, the variable will need to be recoded to a numeric
variable. Flag first occurrence of most severe adverse event:

```{r, eval=TRUE}
adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(
        as.integer(factor(
          ASEV,
          levels = c("DEATH THREATENING", "SEVERE", "MODERATE", "MILD")
        )),
        ASTDTM, AESEQ
      ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, ASTDTM, ASEV, AESEQ, TRTEMFL, AOCCIFL
  )
)
```

## Derive Query Variables {#query}

For deriving query variables `SMQzzNAM`, `SMQzzCD`, `SMQzzSC`, `SMQzzSCN`, or
`CQzzNAM` the `derive_vars_query()` function can be used. As input it expects a
queries dataset, which provides the definition of the queries. See [Queries
dataset documentation](queries_dataset.html) for a detailed description of the
queries dataset. The `create_query_data()` function can be used to create
queries datasets.

The following example shows how to derive query variables for Standardized
MedDRA Queries (SMQs) in ADAE.
```{r, eval=TRUE}
queries <- admiral::queries
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(queries)
```

```{r, eval=TRUE}
adae1 <- tibble::tribble(
  ~USUBJID, ~ASTDTM, ~AETERM, ~AESEQ, ~AEDECOD, ~AELLT, ~AELLTCD,
  "01", "2020-06-02 23:59:59", "ALANINE AMINOTRANSFERASE ABNORMAL",
  3, "Alanine aminotransferase abnormal", NA_character_, NA_integer_,
  "02", "2020-06-05 23:59:59", "BASEDOW'S DISEASE",
  5, "Basedow's disease", NA_character_, 1L,
  "03", "2020-06-07 23:59:59", "SOME TERM",
  2, "Some query", "Some term", NA_integer_,
  "05", "2020-06-09 23:59:59", "ALVEOLAR PROTEINOSIS",
  7, "Alveolar proteinosis", NA_character_, NA_integer_
)

adae_query <- derive_vars_query(dataset = adae1, dataset_queries = queries)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(adae_query)
```

Similar to SMQs, the `derive_vars_query()` function can be used to derive
Standardized Drug Groupings (SDG).

```{r, eval=TRUE}
sdg <- tibble::tribble(
  ~PREFIX, ~GRPNAME,          ~GRPID, ~SCOPE,  ~SCOPEN, ~SRCVAR,   ~TERMCHAR,          ~TERMNUM,
  "SDG01", "Diuretics",       11,     "BROAD", 1,       "CMDECOD", "Diuretic 1",       NA,
  "SDG01", "Diuretics",       11,     "BROAD", 1,       "CMDECOD", "Diuretic 2",       NA,
  "SDG02", "Corticosteroids", 12,     "BROAD", 1,       "CMDECOD", "Corticosteroid 1", NA,
  "SDG02", "Corticosteroids", 12,     "BROAD", 1,       "CMDECOD", "Corticosteroid 2", NA,
  "SDG02", "Corticosteroids", 12,     "BROAD", 1,       "CMDECOD", "Corticosteroid 3", NA
)
adcm <- tibble::tribble(
  ~USUBJID, ~ASTDTM,               ~CMDECOD,
  "01",     "2020-06-02 23:59:59", "Diuretic 1",
  "02",     "2020-06-05 23:59:59", "Diuretic 1",
  "03",     "2020-06-07 23:59:59", "Corticosteroid 2",
  "05",     "2020-06-09 23:59:59", "Diuretic 2"
)
adcm_query <- derive_vars_query(adcm, sdg)
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(adcm_query)
```

## Add the `ADSL` variables {#adsl_vars}

If needed, the other `ADSL` variables can now be added:

```{r eval=TRUE, echo=TRUE}
adae <- adae %>%
  derive_vars_merged(
    dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
    by_vars = exprs(STUDYID, USUBJID)
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(
  adae,
  display_vars = exprs(
    USUBJID, AEDECOD, ASTDTM, DTHDT, RFSTDTC,
    RFENDTC, AGE, AGEU, SEX
  )
)
```

## Derive Analysis Sequence Number {#aseq}

The function `derive_var_obs_number()` can be used for deriving the `ASEQ`
variable to ensure the uniqueness of subject records within the dataset.

Note that creating `ASEQ` is not required for all ADaM datasets according to
the ADaM IG, and this is just for demonstration purposes.

For example, there can be multiple records present in `ADCM` for a single
subject with the same `ASTDTM` and `CMSEQ` variables. But these records still
differ at ATC level:

```{r eval=TRUE, echo=TRUE}
adcm <- tibble::tribble(
  ~USUBJID,       ~ASTDTM,          ~CMSEQ, ~CMDECOD,         ~ATC1CD, ~ATC2CD, ~ATC3CD, ~ATC4CD,
  "BP40257-1001", "2013-07-05 UTC", "14",   "PARACETAMOL",    "N",     "N02",   "N02B",  "N02BE",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D10",   "D10A",  "D10AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D07",   "D07A",  "D07AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "H",     "H02",   "H02A",  "H02AB",
  "BP40257-1002", "2012-12-15 UTC", "19",   "SPIRONOLACTONE", "C",     "C03",   "C03D",  "C03DA"
)

adcm_aseq <- adcm %>%
  # Calculate ASEQ (Optional Variable)
  derive_var_obs_number(
    by_vars = exprs(USUBJID),
    order = exprs(ASTDTM, CMSEQ, ATC1CD, ATC2CD, ATC3CD, ATC4CD),
    new_var = ASEQ,
    check_type = "error"
  )
```

```{r, eval=TRUE, echo=FALSE}
dataset_vignette(adcm_aseq)
```

```{r, results='asis', echo=FALSE}
admiral_labels_attrs_section()
```

# Example Scripts

ADaM | Sourcing Command
---- | --------------
`ADAE` | `use_ad_template("ADAE")`
`ADCM` | `use_ad_template("ADCM")`