Writing data to REDCap is more difficult than reading data from REDCap. When you read, you receive data in the structure that REDCap provides. You have some control over the columns, rows, and data types, but there is not a lot you have to be concerned about.
In contrast, the structure of the dataset you send to the REDCap server must be precise. You need to pass special variables so that the REDCap server understands the hierarchical structure of the data points. This vignette walks you through that process.
If you are new to REDCap and its API, please first understand the concepts described in these two vignettes:
As described in the Retrieving Longitudinal and Repeating Structures vignette, the best way to read and write data from projects with longitudinal/repeating elements is to break up the “block matrix” dataset into individual datasets. Each rectangle should have a coherent grain.
Following this strategy, we’ll write to the REDCap server in two distinct steps:
The actual upload phase is pretty straightforward: it’s just a call to REDCapR::redcap_write(). Most of the vignette’s code prepares the dataset so that the upload will run smoothly.
See the Typical REDCap Workflow for a Data Analyst vignette, and please closely read the Retrieve Protected Token section, which has important security implications. The current vignette imports a fake dataset into REDCap, and we’ll use a token stored in a local file.
To keep this vignette focused on writing/importing/uploading to the server, we’ll start with the data that needs to be written. These example tables were prepared by Raymond Balise for our 2023 R/Medicine workshop, “Using REDCap and R to Rapidly Produce Biomedical Publications”.
There are two tables, each with a different granularity:
- ds_patient: each row represents one patient,
- ds_daily: each row represents one daily measurement per patient.

Besides the data.frame to write to REDCap, the only required arguments of the REDCapR::redcap_write() function are redcap_uri and token; both are contained in the credential object created in the previous section.
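As a minimal sketch of that call, the wrapper below passes only the three required pieces. The names upload_to_redcap, write_fun, and fake_credential are hypothetical and introduced only for illustration; the credential is assumed to be a list with redcap_uri and token elements. The write_fun argument exists only so the sketch can be exercised without contacting a server.

```r
# Hypothetical wrapper; `write_fun` defaults to the real writer but can be
# swapped for a stand-in so nothing is sent to a server while experimenting.
upload_to_redcap <- function(ds, credential, write_fun = REDCapR::redcap_write) {
  write_fun(
    ds_to_write = ds,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token
  )
}

# Dry run with a stand-in writer and a fake credential (not a real token).
fake_credential <- list(
  redcap_uri = "https://redcap.example.org/api/",
  token      = "FAKE"
)
dry_run <- upload_to_redcap(
  ds         = data.frame(record_id = 1:2),
  credential = fake_credential,
  write_fun  = function(ds_to_write, redcap_uri, token) {
    list(rows = nrow(ds_to_write), uri = redcap_uri)
  }
)
dry_run$rows
```

Leaving write_fun at its default sends the data.frame to the project identified by the credential, so try it against a test project first.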
As discussed in the Troubleshooting vignette, we recommend running these two preliminary checks before trying to write the dataset to the server for the very first time.
If the REDCap project isn’t longitudinal and doesn’t have arms, uploading a patient-level data.frame to REDCap doesn’t require adding variables. However, we typically populate the *_complete variables to communicate the record’s status.
If the row needs a human to add more values or inspect the existing values, consider marking the instrument “incomplete” or “unverified”; the patient’s instrument record will appear red or yellow in REDCap’s Record Dashboard. Otherwise, consider marking the instrument “complete” so it will appear green.
With this example project, the only patient-level instrument is “enrollment”, so the corresponding variable is enrollment_complete.
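One way to populate that status variable is sketched below. The data.frame ds_patient_demo and its height column are hypothetical stand-ins (the real ds_patient has different columns), and the rule "a missing value means a human should look" is only an assumption for illustration. The integers are REDCap's form-status codes: 0 for incomplete, 1 for unverified, and 2 for complete.

```r
# Hypothetical patient-level rows, not the workshop data.
ds_patient_demo <- data.frame(
  id_code = c(1L, 2L, 3L),
  height  = c(172.5, NA, 160.2)
)

# 0 = incomplete (red), 1 = unverified (yellow), 2 = complete (green).
# Assumed rule: a row with a missing height still needs human attention.
ds_patient_demo$enrollment_complete <-
  ifelse(is.na(ds_patient_demo$height), 0L, 2L)

ds_patient_demo
```

REDCapR::constant("form_complete") and its siblings return the same codes, which avoids hard-coding the integers.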
REDCapR::validate_for_write()
REDCapR::validate_for_write() inspects a data frame to anticipate potential problems before writing with REDCap’s API. A tibble is returned, with one row per potential problem (and a suggestion for how to avoid it). Ideally, a 0-row tibble is returned.
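A small sketch of that check follows. The data.frame ds_demo and its is_smoker column are hypothetical; the logical column is included on purpose because logical values are one of the things the validator is expected to flag unless convert_logical_to_integer = TRUE is passed. The requireNamespace() guard is only there so the sketch also runs on a machine without REDCapR installed.

```r
# Hypothetical data with a deliberate problem: a logical column.
ds_demo <- data.frame(
  record_id = 1:3,
  is_smoker = c(TRUE, FALSE, NA)
)

if (requireNamespace("REDCapR", quietly = TRUE)) {
  # One row per potential problem; zero rows would mean nothing was caught.
  problems <- REDCapR::validate_for_write(ds_demo)
  print(problems)
}
```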
If you encounter problems that can be checked with automation, please tell us in an issue. We’ll work with you to incorporate the new check into REDCapR::validate_for_write().
When a dataset’s problems are caught before reaching the server, the solutions are easier to identify and implement.
If this is your first time with a complicated project, consider loading a small subset of rows and columns. In this case, we start with only three columns and two rows.
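Carving out such a subset might look like the sketch below. The data.frame and its column names are hypothetical placeholders, not the workshop tables; the point is only that the trial upload keeps the record identifier plus a couple of easy columns.

```r
# Hypothetical patient-level table with more rows and columns than needed.
ds_patient_demo <- data.frame(
  id_code             = 1:5,
  height              = c(172.5, 168.0, 160.2, 181.3, 175.9),
  weight              = c(70.1, 65.4, 58.8, 90.2, 77.7),
  enrollment_complete = 2L
)

# Keep three columns and the first two rows for the trial upload.
columns_to_try <- c("id_code", "height", "enrollment_complete")
ds_small <- utils::head(ds_patient_demo[, columns_to_try], n = 2)

dim(ds_small)
```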
Some variables in the data.frame might be represented differently than in REDCap. A common transformation is changing strings into the integers that underlie radio buttons. Common approaches are dplyr::case_match() and joining to lookup tables (if the mappings are expressed in a csv). Here’s an in-line example of dplyr::case_match().
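A minimal sketch of both approaches follows, assuming dplyr 1.1.0 or later (when case_match() was introduced). The smoking variable and its integer codes are hypothetical; the real codes must match the choices defined in your project's data dictionary.

```r
# Hypothetical strings that need to become radio-button integers.
ds_demo <- data.frame(
  id_code = 1:4,
  smoking = c("never", "current", "former", "unknown")
)

# Approach 1: recode in-line. Unmatched strings (here "unknown") become NA.
ds_demo$smoking_code <- dplyr::case_match(
  ds_demo$smoking,
  "never"   ~ 1L,
  "former"  ~ 2L,
  "current" ~ 3L
)

# Approach 2: join to a lookup table, which could instead be read from a csv.
ds_lookup <- data.frame(
  smoking             = c("never", "former", "current"),
  smoking_code_joined = 1:3
)
ds_joined <- dplyr::left_join(ds_demo, ds_lookup, by = "smoking")

ds_joined
```

The lookup table tends to scale better when a variable has many levels or when non-programmers maintain the mapping.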
If the small subset works, we usually jump ahead and try all columns and rows.
If this larger table fails, split the difference between (a) the smaller working example and (b) the larger failing example. See if this middle point (that has fewer rows and/or columns than the failing point) succeeds or fails. Then repeat. This “bisection” or “binary search” debugging technique is helpful in many areas of programming and statistical modeling.
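The row-wise version of that search can be sketched as follows. The function find_first_bad_row() and the succeeds() predicate are hypothetical; in practice succeeds() might wrap an upload attempt in tryCatch() against a test project, since every probe really writes rows. The sketch assumes the full table fails, an empty table succeeds, and the failure is caused by a single earliest offending row.

```r
# Returns the index of the first row whose inclusion makes `succeeds()` fail.
find_first_bad_row <- function(ds, succeeds) {
  lo <- 0L        # largest row count known to succeed
  hi <- nrow(ds)  # smallest row count known to fail
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (succeeds(ds[seq_len(mid), , drop = FALSE])) {
      lo <- mid
    } else {
      hi <- mid
    }
  }
  hi
}

# Stand-in predicate: pretend the server rejects negative values.
ds_demo <- data.frame(id = 1:10, value = c(1, 2, 3, 4, 5, 6, -7, 8, 9, 10))
pretend_upload_succeeds <- function(d) !any(d$value < 0)

find_first_bad_row(ds_demo, pretend_upload_succeeds)
```

The same idea applies to columns: halve the set of candidate columns instead of the set of rows.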
As stated in the vignette’s intro, the structure of the dataset uploaded to the server must be precise. When uploading repeating instruments, there are several important columns:
- record_id: typically indicates the patient’s id. (This field can be renamed for the project.)
- redcap_event_name: If the project is longitudinal or has arms, this indicates the event. Otherwise, you don’t need to add this variable.
- redcap_repeat_instrument: Indicates the instrument/form that is repeating for these columns.
- redcap_repeat_instance: Typically a sequential positive integer (e.g., 1, 2, 3, …) indicating the order.

The combination of these variables needs to be unique. Please read the Retrieving Longitudinal and Repeating Structures vignette for details of these variables and their meanings.
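The uniqueness requirement can be checked locally before uploading. The sketch below uses a hypothetical three-row table and assumes the record identifier was renamed to id_code and that the project has no events, so only three key columns are involved.

```r
# Hypothetical repeating-instrument rows.
ds_daily_demo <- data.frame(
  id_code                  = c(1L, 1L, 2L),
  redcap_repeat_instrument = "daily",
  redcap_repeat_instance   = c(1L, 2L, 1L),
  temperature              = c(36.6, 37.1, 36.9)
)

key_columns <- c("id_code", "redcap_repeat_instrument", "redcap_repeat_instance")

# anyDuplicated() returns 0 when every combination of the keys is distinct;
# otherwise it returns the index of the first duplicated row.
keys_are_unique <- anyDuplicated(ds_daily_demo[, key_columns]) == 0L
keys_are_unique
```

For a longitudinal project, redcap_event_name would be added to key_columns.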
You need to pass specific variables so that the REDCap server understands the hierarchical structure of the data points.
# repeat-plumbing
ds_daily <-
  ds_daily |>
  dplyr::group_by(id_code) |>
  dplyr::mutate(
    redcap_repeat_instrument = "daily",
    redcap_repeat_instance   = dplyr::row_number(da_date),
    daily_complete           = REDCapR::constant("form_complete"),
  ) |>
  dplyr::ungroup() |>
  dplyr::select(
    id_code,                    # Or `record_id`, if you didn't rename it
    # redcap_event_name,        # If the project is longitudinal or has arms
    redcap_repeat_instrument,   # The name of the repeating instrument/form
    redcap_repeat_instance,     # The sequence of the repeating instrument
    tidyselect::everything(),   # All columns not explicitly passed to `dplyr::select()`
    daily_complete,             # Indicates incomplete, unverified, or complete
  )

# Check for potential problems. (Remember zero rows are good.)
REDCapR::validate_for_write(ds_daily, convert_logical_to_integer = TRUE)

ds_daily
This vignette required only two data.frames, but more complex projects sometimes need more. For example, each repeating instrument should be its own data.frame and writing step. Arms and longitudinal events need to be considered too.
By default, REDCapR::redcap_write() uploads the dataset in batches of 100 at a time (the batch_size argument) rather than in a single request. This can be adjusted to improve performance; the ‘Details’ section of REDCapR::redcap_write() discusses the trade-offs. I usually shoot for ~10 seconds per batch.
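The arithmetic behind batching and the ~10-second target can be sketched as below. This illustrates the idea only and is not REDCapR's internal code; suggest_batch_size() is a hypothetical helper, and it assumes upload time grows roughly linearly with batch size, which may not hold on a busy server.

```r
# How 250 rows would be divided with a batch size of 100.
row_count  <- 250L
batch_size <- 100L
batch_id   <- ceiling(seq_len(row_count) / batch_size)
table(batch_id)  # number of rows in each batch

# Hypothetical helper: scale the batch size toward a target duration,
# assuming duration is proportional to batch size.
suggest_batch_size <- function(current_size, elapsed_seconds, target_seconds = 10) {
  as.integer(floor(current_size * target_seconds / elapsed_seconds))
}

# If a batch of 100 took 2.5 seconds, aim for roughly four times as many.
suggest_batch_size(current_size = 100L, elapsed_seconds = 2.5)
```

The suggested value would then be passed as batch_size to REDCapR::redcap_write().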
Manual downloading/uploading might make sense if you’re doing the operation only once. But when does it ever stop after the first time?
If you have trouble uploading, consider adding a few fake patients and measurements and then downloading the csv. It might reveal something you didn’t anticipate. But be aware that it will be in the block matrix format (i.e., everything jammed into one rectangle).
The Clinical Data Interoperability Services (CDIS) use FHIR to move data from your institution’s EMR/EHR (e.g., Epic, Cerner) to REDCap. Research staff have control over which patient records are selected or eligible. Conceptually it’s similar to writing to REDCap with the API, but at a much bigger scale. Realistically, it takes months to get through your institution’s human layers. Once established, a project would be populated with EMR data in much less development time, assuming the desired data models correspond with FHIR endpoints.
This vignette was originally designed for the 2023 R/Medicine workshop, Using REDCap and R to Rapidly Produce Biomedical Publications Cleaning Medical Data, with Raymond R. Balise, Belén Hervera, Daniel Maya, Anna Calderon, Tyler Bartholomew, Stephan Kadauke, and João Pedro Carmezim Correia, and for the 2024 R/Medicine workshop, REDCap + R: Teaming Up in the Tidyverse, with Stephan Kadauke. The workshop slides are available for 2023 and 2024.
This work was made possible in part by the NIH grant U54GM104938 to the Oklahoma Shared Clinical and Translational Resource.
For the sake of documentation and reproducibility, the current report was rendered in the following environment.
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-10-24
#> pandoc 3.2.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [2] RSPM (R 4.4.0)
#> bit 4.5.0 2024-09-20 [2] RSPM (R 4.4.0)
#> bit64 4.5.2 2024-09-22 [2] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> buildtools 1.0.0 2024-10-09 [3] local (/pkg)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> checkmate 2.3.2 2024-07-29 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [2] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> curl 5.2.3 2024-09-20 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> dplyr 1.1.4 2023-11-17 [2] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [2] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> highr 0.11 2024-05-26 [2] RSPM (R 4.4.0)
#> hms 1.1.3 2023-03-21 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [2] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> kableExtra 1.4.0 2024-01-24 [2] RSPM (R 4.4.0)
#> knitr * 1.48 2024-07-07 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> magrittr * 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> maketools 1.3.1 2024-10-09 [3] Github (jeroen/maketools@d46f92c)
#> munsell 0.5.1 2024-04-01 [2] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> readr 2.1.5 2024-01-10 [2] RSPM (R 4.4.0)
#> REDCapR * 1.3.0 2024-10-23 [1] CRAN (R 4.4.1)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown * 2.28 2024-08-17 [2] RSPM (R 4.4.0)
#> rstudioapi 0.17.1 2024-10-22 [2] RSPM (R 4.4.0)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> scales 1.3.0 2023-11-28 [2] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> svglite 2.1.3 2023-12-08 [2] RSPM (R 4.4.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.4.0)
#> systemfonts 1.1.0 2024-05-15 [2] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [2] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [2] RSPM (R 4.4.0)
#> vroom 1.6.5 2023-12-05 [2] RSPM (R 4.4.0)
#> withr 3.0.1 2024-07-31 [2] RSPM (R 4.4.0)
#> xfun 0.48 2024-10-03 [2] RSPM (R 4.4.0)
#> xml2 1.3.6 2023-12-04 [2] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#>
#> [1] /tmp/RtmpkO8eZW/Rinst107a75c7dcff
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────