REDCap Projects as R Data Packages

The purpose of this vignette is to show how to export a REDCap project into a R data package.

Possible use cases for this are:

  1. You have data in a REDCap project that needs to be archived.

  2. Snapshots of REDCap projects.

  3. Sharing data with other analysts who have authority to see and work on the data, but for some reason may not have access to REDCap.

This vignette will assume you are able to call export_core successfully. Given that call requires access to REDCap, the example data set avs_raw_core is provided.

data(avs_raw_core, package = "REDCapExporter")
str(avs_raw_core)
## List of 4
##  $ project_raw : 'rcer_raw_project' chr "project_id,project_title,creation_time,production_time,in_production,project_language,purpose,purpose_other,pro"| __truncated__
##   ..- attr(*, "url")= chr "https://redcap.ucdenver.edu/api/"
##   ..- attr(*, "status_code")= int 200
##   ..- attr(*, "times")= Named num [1:6] 0 0.000017 0 0.000105 0.142771 ...
##   .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##   ..- attr(*, "Content-Type")= chr "text/csv; charset=utf-8"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2024-09-19 17:09:21"
##  $ metadata_raw: 'rcer_raw_metadata' chr "field_name,form_name,section_header,field_type,field_label,select_choices_or_calculations,field_note,text_valid"| __truncated__
##   ..- attr(*, "url")= chr "https://redcap.ucdenver.edu/api/"
##   ..- attr(*, "status_code")= int 200
##   ..- attr(*, "times")= Named num [1:6] 0 0.000017 0 0.000099 0.133926 ...
##   .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##   ..- attr(*, "Content-Type")= chr "text/csv; charset=utf-8"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2024-09-19 17:09:21"
##  $ user_raw    : 'rcer_raw_user' chr "username,email,firstname,lastname,expiration,data_access_group,data_access_group_id,design,alerts,user_rights,d"| __truncated__
##   ..- attr(*, "url")= chr "https://redcap.ucdenver.edu/api/"
##   ..- attr(*, "status_code")= int 200
##   ..- attr(*, "times")= Named num [1:6] 0 0.000022 0 0.0001 0.118193 ...
##   .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##   ..- attr(*, "Content-Type")= chr "text/csv; charset=utf-8"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2024-09-19 17:09:21"
##  $ record_raw  : 'rcer_raw_record' chr "record_id,uniform_number,firstname,lastname,hof,nationality,position,birthdate,first_nhl_game,last_nhl_game,hei"| __truncated__
##   ..- attr(*, "url")= chr "https://redcap.ucdenver.edu/api/"
##   ..- attr(*, "status_code")= int 200
##   ..- attr(*, "times")= Named num [1:6] 0 0.000015 0 0.0001 0.155303 ...
##   .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##   ..- attr(*, "Content-Type")= chr "text/csv; charset=utf-8"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2024-09-19 17:09:21"
##  - attr(*, "class")= chr "rcer_rccore"

avs_raw_core is the result of calling export_core and contains data on the 2000-2001 Stanley Cup Champion Colorado Avalanche. The data was transcribed from Hockey Reference into a REDCap Project hosed at the University of Colorado Denver.

Exporting a REDCap Project to a R Data Package

Exporting a REDCap project to a R data package is done with a call to build_r_data_package. If the user passes the uri for the API and an API token a call to export_core will be made. Alternatively, build_r_data_package is an S3 method and can be applied to a rcer_rccore object.

To build the skeleton of a R data package you will need to pass in the core export from the REDCap project, a path for were the source code for the data package will be written, and some some information about the users. In this context, users are the persons who have, or had, access to the REDCap project and are listed under the UserRights section of the REDCap project. The user data from REDCap is used to construct the Author section of the DESCRIPTION file for the R data package to be constructed. By default, all users are listed as ‘contributors’. Modification of the roles can be provide by a named list object. In the example below, the user dewittp is going to assigned the creator and author role. To be a valid R package, at least one user will need to have the creator role assigned.

temppath <- tempdir()

build_r_data_package(
  x            = avs_raw_core,
  path         = temppath,
  author_roles = list(dewittp = c("cre", "aut"))
)
## Creating source package at /tmp/Rtmp0viml2/rcd14465
## ℹ Updating rcd14465 documentation
## First time using roxygen2. Upgrading automatically...
## Setting `RoxygenNote` to "7.3.2"
## ℹ Loading rcd14465
## Writing 'NAMESPACE'
## Writing 'project.Rd'
## Writing 'metadata.Rd'
## Writing 'user.Rd'
## Writing 'record.Rd'

The resulting directory is: echo = FALSE, results = “markup”

fs::dir_tree(temppath)
## /tmp/Rtmp0viml2
## └── rcd14465
##     ├── DESCRIPTION
##     ├── LICENSE
##     ├── NAMESPACE
##     ├── R
##     │   └── datasets.R
##     ├── data
##     │   ├── metadata.rda
##     │   ├── project.rda
##     │   ├── record.rda
##     │   └── user.rda
##     ├── inst
##     │   └── raw-data
##     │       ├── metadata.rds
##     │       ├── project.rds
##     │       ├── record.rds
##     │       └── user.rds
##     └── man
##         ├── metadata.Rd
##         ├── project.Rd
##         ├── record.Rd
##         └── user.Rd

Details on exported Files

First, the package directory name. Exported packages from REDCapExporter will have the directory name rcd. This name is also used in the DESCRIPTION file.

The DESCRIPTION file is

prj_dir <- list.dirs(temppath)
prj_dir <- prj_dir[grepl("/rcd\\d+$", prj_dir)]
t(read.dcf(paste(prj_dir, "DESCRIPTION", sep = "/")))
##                 [,1]                                                                                                                                                                                                                                                                                                                                                      
## Package         "rcd14465"                                                                                                                                                                                                                                                                                                                                                
## Title           "2000-2001 Colorado Avalanche"                                                                                                                                                                                                                                                                                                                            
## Version         "2024.11.07.13.37"                                                                                                                                                                                                                                                                                                                                        
## Authors@R       "c(person(given = \"Tell\", family = \"Bennett\", email = \"[email protected]\", role = c(\"ctb\")),\nperson(given = \"Peter\", family = \"DeWitt\", email = \"[email protected]\", role = c(\"cre\", \"aut\")),\nperson(given = \"Alexandria\", family = \"Jensen\", email = \"[email protected]\", role = c(\"ctb\")))"
## Description     "Data and documentation from the REDCap Project."                                                                                                                                                                                                                                                                                                         
## License         "file LICENSE"                                                                                                                                                                                                                                                                                                                                            
## Encoding        "UTF-8"                                                                                                                                                                                                                                                                                                                                                   
## LazyData        "true"                                                                                                                                                                                                                                                                                                                                                    
## Suggests        "knitr"                                                                                                                                                                                                                                                                                                                                                   
## VignetteBuilder "knitr"                                                                                                                                                                                                                                                                                                                                                   
## RoxygenNote     "7.3.2"

The title comes from the project info recorded in REDCap. The version number is set as the year.month.day.hour.minute of the export. As noted above, the Author field is built from the user data stored in REDCap.

The LICENSE file notes that the package is proprietary and should not be installed or distributed to others who are not authorized to have access to the data.

cat(readLines(paste(prj_dir[1], "LICENSE", sep = "/")), sep = "\n")
## Proprietary
## 
## 
##       Do not distribute to anyone or to machines which are not authorized to hold the data.

The raw data exports are stored as .rds files under inst/raw-data so that these files will be available in R sessions after installing the package.

The data directory has data.frame versions of the data sets.

The R/datasets.R file provides the documentation for the data sets which can be accessed in an interactive R session.

Using the Exported Package

Let’s install the package and explore the contents.

tar_ball <- devtools::build(pkg = prj_dir)
## ── R CMD build ─────────────────────────────────────────────────────────────────
## * checking for file ‘/tmp/Rtmp0viml2/rcd14465/DESCRIPTION’ ... OK
## * preparing ‘rcd14465’:
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
##   NB: this package now depends on R (>= 3.5.0)
##   WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##   serialize/load version 3 cannot be read in older versions of R.
##   File(s) containing such objects:
##     ‘rcd14465/data/metadata.rda’ ‘rcd14465/data/project.rda’
##     ‘rcd14465/data/record.rda’ ‘rcd14465/data/user.rda’
##     ‘rcd14465/inst/raw-data/metadata.rds’
##     ‘rcd14465/inst/raw-data/project.rds’
##     ‘rcd14465/inst/raw-data/record.rds’ ‘rcd14465/inst/raw-data/user.rds’
## * building ‘rcd14465_2024.11.07.13.37.tar.gz’
tar_ball
## [1] "/tmp/Rtmp0viml2/rcd14465_2024.11.07.13.37.tar.gz"

install.packages(pkgs = tar_ball, lib = temppath)
## inferring 'repos = NULL' from 'pkgs'
library(rcd14465, lib.loc = temppath)

The available data sets:

data(package = "rcd14465")$results
##      Package    LibPath           Item       Title     
## [1,] "rcd14465" "/tmp/Rtmp0viml2" "metadata" "Metadata"
## [2,] "rcd14465" "/tmp/Rtmp0viml2" "project"  "Project" 
## [3,] "rcd14465" "/tmp/Rtmp0viml2" "record"   "Record"  
## [4,] "rcd14465" "/tmp/Rtmp0viml2" "user"     "User"

A simple data analysis question: how many goals were scored by position?

aggregate(goals ~ position, data = record, FUN = sum)
##     position goals
## 1       Goal     0
## 2  Left Wing    71
## 3 Right Wing    72
## 4     Center    93
## 5    Defence    34