Title: | Extract 'REDCap' Databases into Tidy 'Tibble's |
---|---|
Description: | Convert 'REDCap' exports into tidy tables for easy handling of 'REDCap' repeat instruments and event arms. |
Authors: | Richard Hanna [aut, cre] , Stephan Kadauke [aut] , Ezra Porter [aut] |
Maintainer: | Richard Hanna <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0 |
Built: | 2024-10-24 04:25:12 UTC |
Source: | CRAN |
Add default skim metrics to the redcap_data list elements of a supertibble output from read_redcap().
add_skimr_metadata(supertbl)
supertbl |
a supertibble generated using |
For more information on the default metrics provided, check the get_default_skimmer_names documentation.
A supertibble with skimr metadata metrics
superheroes_supertbl
add_skimr_metadata(superheroes_supertbl)
## Not run:
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAP_TOKEN")
supertbl <- read_redcap(redcap_uri, token)
add_skimr_metadata(supertbl)
## End(Not run)
Take a supertibble generated with read_redcap() and bind its data tibbles (i.e. the tibbles in the redcap_data column) to an environment. The default is the global environment.
bind_tibbles(supertbl, environment = global_env(), tbls = NULL)
supertbl |
A supertibble generated by |
environment |
The environment to bind the tibbles to. Default is
|
tbls |
A vector of the |
This function returns nothing as it's used solely for its side effect of modifying an environment.
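To make the side effect concrete, here is a minimal base-R sketch of what binding named tables to an environment looks like. It does not call bind_tibbles() and is not the package's implementation: the list `tbls` and its contents are hypothetical stand-ins for the data tibbles of a supertibble, and `list2env()` is used only as an analogy for the binding step.

```r
# Hypothetical stand-ins for two data tibbles, named by instrument
tbls <- list(
  heroes_information = data.frame(record_id = 1:2, name = c("A", "B")),
  super_hero_powers  = data.frame(record_id = c(1, 1, 2), power = c("x", "y", "z"))
)

# Create an empty environment; nothing is bound yet
my_env <- new.env()
bound_before <- ls(my_env)

# Bind each list element to a variable of the same name inside my_env
invisible(list2env(tbls, envir = my_env))
bound_after <- ls(my_env)

# The tables are now reachable by name through the environment
n_powers <- nrow(my_env$super_hero_powers)
```

Binding to a dedicated environment, as in this sketch, keeps the tables out of the global workspace until they are explicitly looked up.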
## Not run:
# Create an empty environment
my_env <- new.env()
ls(my_env)
superheroes_supertbl
bind_tibbles(superheroes_supertbl, my_env)
ls(my_env)
## End(Not run)
combine_checkboxes() consolidates multiple checkbox fields in a REDCap data tibble into a single column. This transformation simplifies analysis by merging several binary columns into one labeled factor column, making the data more interpretable and easier to analyze.
combine_checkboxes(
  supertbl,
  tbl,
  cols,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_repair = "check_unique",
  multi_value_label = "Multiple",
  values_fill = NA,
  raw_or_label = "label",
  keep = TRUE
)
supertbl |
A supertibble generated by |
tbl |
The |
cols |
Checkbox columns to combine to single column. Required. |
names_prefix |
String added to the start of every variable name. |
names_sep |
String to separate new column names from |
names_glue |
Instead of |
names_repair |
What happens if the output has invalid column names?
The default, "check_unique", is to error if the columns are duplicated.
Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate
by adding numeric suffixes. See |
multi_value_label |
A string specifying the value to be used when multiple checkbox fields are selected. Default "Multiple". |
values_fill |
Value to use when no checkboxes are selected. Default |
raw_or_label |
Either 'raw' or 'label' to specify whether to use raw coded values or labels for the options. Default 'label'. |
keep |
Logical indicating whether to keep the original checkbox fields in
the output. Default |
combine_checkboxes() operates on the data and metadata tibbles produced by the read_redcap() function. Since it relies on the checkbox field naming conventions used by REDCap, changes to the checkbox variable names or their associated metadata field_names could lead to errors.
REDCap checkbox fields are typically expanded into separate variables for each checkbox option, with names formatted as checkbox_var___1, checkbox_var___2, etc. combine_checkboxes() detects these variables and combines them into a single column. If the expected variables are not found, an error is returned.
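The combination rule described above can be illustrated with a short base-R sketch. This is an assumption-laden illustration of the idea, not the package's implementation: `option_labels`, `combine_row()`, and the column prefix `multi` are hypothetical names, and the `"Multiple"` and `NA` results simply mirror the documented defaults of multi_value_label and values_fill.

```r
# Sample data following REDCap's checkbox naming convention (<field>___<code>)
data_tbl <- data.frame(
  study_id  = 1:3,
  multi___1 = c(TRUE, TRUE, FALSE),
  multi___2 = c(FALSE, TRUE, FALSE),
  multi___3 = c(FALSE, FALSE, FALSE)
)

# Option labels keyed by raw code, as in "1, Red | 2, Yellow | 3, Blue"
option_labels <- c("1" = "Red", "2" = "Yellow", "3" = "Blue")

# Detect the expanded checkbox columns via the triple-underscore convention
checkbox_cols <- grep("^multi___", names(data_tbl), value = TRUE)
codes <- sub("^multi___", "", checkbox_cols)

# Collapse one row of checkbox values into a single label
combine_row <- function(checked) {
  n_checked <- sum(checked)
  if (n_checked == 0) return(NA_character_)  # mirrors values_fill = NA
  if (n_checked > 1) return("Multiple")      # mirrors multi_value_label
  unname(option_labels[codes[checked]])
}

combined <- apply(data_tbl[checkbox_cols], 1, combine_row)

# Store the result as a factor whose levels are the labels plus "Multiple"
data_tbl$multi <- factor(combined, levels = c(unname(option_labels), "Multiple"))
```

In this sketch, a row with exactly one checked box receives that option's label, a row with several receives the multi-value label, and a row with none is left missing.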
A modified supertibble.
# Set up sample data tibble
data_tbl <- tibble::tribble(
  ~"study_id", ~"multi___1", ~"multi___2", ~"multi___3",
  1, TRUE, FALSE, FALSE,
  2, TRUE, TRUE, FALSE,
  3, FALSE, FALSE, FALSE
)
# Set up sample metadata tibble
metadata_tbl <- tibble::tribble(
  ~"field_name", ~"field_type", ~"select_choices_or_calculations",
  "study_id", "text", NA,
  "multi___1", "checkbox", "1, Red | 2, Yellow | 3, Blue",
  "multi___2", "checkbox", "1, Red | 2, Yellow | 3, Blue",
  "multi___3", "checkbox", "1, Red | 2, Yellow | 3, Blue"
)
# Create sample supertibble
supertbl <- tibble::tribble(
  ~"redcap_form_name", ~"redcap_data", ~"redcap_metadata",
  "tbl", data_tbl, metadata_tbl
)
class(supertbl) <- c("redcap_supertbl", class(supertbl))
# Combine checkboxes under column "multi"
combine_checkboxes(
  supertbl = supertbl,
  tbl = "tbl",
  cols = starts_with("multi")
) |>
  dplyr::pull(redcap_data) |>
  dplyr::first()
## Not run:
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAP_TOKEN")
supertbl <- read_redcap(redcap_uri, token)
combine_checkboxes(
  supertbl = supertbl,
  tbl = "tbl",
  cols = starts_with("col"),
  multi_value_label = "Multiple",
  values_fill = NA
)
## End(Not run)
Take a supertibble generated with read_redcap() and return one of its data tibbles.
extract_tibble(supertbl, tbl)
supertbl |
A supertibble generated by |
tbl |
The |
This function makes it easy to extract a single instrument's data from a REDCapTidieR supertibble.
A tibble.
superheroes_supertbl
extract_tibble(superheroes_supertbl, "heroes_information")
Take a supertibble generated with read_redcap() and return a named list of data tibbles.
extract_tibbles(supertbl, tbls = everything())
supertbl |
A supertibble generated by |
tbls |
A vector of |
This function makes it easy to extract multiple instruments' data from a REDCapTidieR supertibble into a named list. Specifying instruments using tidyselect helper functions such as dplyr::starts_with() or dplyr::ends_with() is supported.
A named list of tibbles
superheroes_supertbl
# Extract all data tibbles
extract_tibbles(superheroes_supertbl)
# Only extract data tibbles starting with "heroes"
extract_tibbles(superheroes_supertbl, starts_with("heroes"))
Use these functions with the format_labels argument of make_labelled() to define how variable labels should be formatted before being applied to the data columns of redcap_data. These functions are helpful to create pretty variable labels from REDCap field labels.
fmt_strip_whitespace() removes extra white space inside and at the start and end of a string. It is a thin wrapper of stringr::str_trim() and stringr::str_squish().
fmt_strip_trailing_colon() removes a colon character at the end of a string.
fmt_strip_trailing_punct() removes punctuation at the end of a string.
fmt_strip_html() removes HTML tags from a string.
fmt_strip_field_embedding() removes text between curly braces {} which REDCap uses for special "field embedding" logic. Note that read_redcap() removes HTML tags and field embedding logic from field labels in the metadata by default.
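Functions with the same shape can also be written by hand. According to the make_labelled() documentation, a label formatting function takes a character vector and returns a modified character vector of the same length. The following base-R sketch defines one such function; `fmt_strip_trailing_question()` is a hypothetical name that is not part of REDCapTidieR, and the sample labels are made up for illustration.

```r
# Hypothetical custom formatter (not part of REDCapTidieR): removes a trailing
# question mark, then trims leading and trailing whitespace from each label
fmt_strip_trailing_question <- function(x) {
  trimws(sub("\\?[[:space:]]*$", "", x))
}

# Made-up field labels to format
field_labels <- c("Hero name:", "REDCap instrument completed? ", "  Eye color")
formatted <- fmt_strip_trailing_question(field_labels)

# A label formatter must return one output element per input element
same_length <- length(formatted) == length(field_labels)
```

A function like this could presumably be supplied through format_labels in the same way the make_labelled() examples pass tolower.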
fmt_strip_whitespace(x)
fmt_strip_trailing_colon(x)
fmt_strip_trailing_punct(x)
fmt_strip_html(x)
fmt_strip_field_embedding(x)
x |
a character vector |
a modified character vector
fmt_strip_whitespace("Poorly Spaced Label ")
fmt_strip_trailing_colon("Label:")
fmt_strip_trailing_punct("Label-")
fmt_strip_html("<b>Bold Label</b>")
fmt_strip_field_embedding("Label{another_field}")
superheroes_supertbl
make_labelled(superheroes_supertbl, format_labels = fmt_strip_trailing_colon)
Take a supertibble and use the labelled package to apply variable labels to the columns of the supertibble as well as to each tibble in the redcap_data, redcap_metadata, and redcap_events columns of that supertibble.
make_labelled(supertbl, format_labels = NULL)
supertbl |
a supertibble generated using |
format_labels |
one or multiple optional label formatting functions. A label formatting function is a function that takes a character vector and returns a modified character vector of the same length. This function is applied to field labels before attaching them to variables. One of:
|
The variable labels for the data tibbles are derived from the field_label column of the metadata tibble.
A labelled supertibble.
superheroes_supertbl
make_labelled(superheroes_supertbl)
make_labelled(superheroes_supertbl, format_labels = tolower)
## Not run:
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAP_TOKEN")
supertbl <- read_redcap(redcap_uri, token)
make_labelled(supertbl)
## End(Not run)
Query the REDCap API to retrieve data and metadata about a project, and transform the output into a "supertibble" that contains data and metadata organized into tibbles, broken down by instrument.
read_redcap(
  redcap_uri,
  token,
  raw_or_label = "label",
  forms = NULL,
  export_survey_fields = NULL,
  export_data_access_groups = NULL,
  suppress_redcapr_messages = TRUE,
  guess_max = Inf,
  allow_mixed_structure = getOption("redcaptidier.allow.mixed.structure", FALSE)
)
redcap_uri |
The URI/URL of the REDCap server (e.g., "https://server.org/apps/redcap/api/"). Required. |
token |
The user-specific string that serves as the password for a project. Required. |
raw_or_label |
A string (either 'raw', 'label', or 'haven') that specifies whether
to export the raw coded values or the labels for the options of categorical
fields. Default is 'label'. If 'haven' is supplied, categorical fields are converted
to |
forms |
A character vector of REDCap instrument names that specifies
which instruments to import. Default is |
export_survey_fields |
A logical that specifies whether to export
survey identifier and timestamp fields. The default, |
export_data_access_groups |
A logical that specifies whether to export
the data access group field. The default, |
suppress_redcapr_messages |
A logical to control whether to suppress messages
from REDCapR API calls. Default |
guess_max |
A positive base::numeric value
passed to |
allow_mixed_structure |
A logical to allow for support of mixed repeating/non-repeating
instruments. Setting to |
This function uses the REDCapR package to query the REDCap API. The REDCap API returns a block matrix that mashes data from all data collection instruments together. The read_redcap() function deconstructs the block matrix and splices the data into individual tibbles, where one tibble represents the data from one instrument.
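The idea of deconstructing a block matrix can be sketched in base R. This is a simplified illustration under stated assumptions, not how read_redcap() is implemented: the `flat` table, the instrument name `powers`, and the field names `hero_name` and `power` are hypothetical, and the layout assumes the export marks repeating rows with redcap_repeat_instrument and redcap_repeat_instance columns.

```r
# Hypothetical flat export: fields from two instruments share one table.
# Rows of the repeating instrument leave the other instrument's fields NA.
flat <- data.frame(
  record_id                = c(1, 1, 1, 2, 2),
  redcap_repeat_instrument = c(NA, "powers", "powers", NA, "powers"),
  redcap_repeat_instance   = c(NA, 1, 2, NA, 1),
  hero_name                = c("A", NA, NA, "B", NA),
  power                    = c(NA, "flight", "speed", NA, "strength"),
  stringsAsFactors = FALSE
)

# Rows without a repeat instrument belong to the non-repeating instrument
is_repeat <- !is.na(flat$redcap_repeat_instrument)

# Keep only the rows and columns that belong to each instrument
heroes <- flat[!is_repeat, c("record_id", "hero_name")]
powers <- flat[is_repeat, c("record_id", "redcap_repeat_instance", "power")]

# One table per instrument, collected in a named list
tbls <- list(heroes = heroes, powers = powers)
```

After the split, neither table carries the structurally missing cells that the combined export needed to hold both instruments at once.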
A tibble in which each row represents a REDCap instrument. It contains the following columns:
redcap_form_name, the name of the instrument
redcap_form_label, the label for the instrument
redcap_data, a tibble with the data for the instrument
redcap_metadata, a tibble of data dictionary entries for each field in the instrument
redcap_events, a tibble with information about the arms and longitudinal events represented in the instrument. Only if the project has longitudinal events enabled
structure, the instrument structure, either "repeating" or "nonrepeating"
data_rows, the number of rows in the instrument's data tibble
data_cols, the number of columns in the instrument's data tibble
data_size, the size in memory of the instrument's data tibble computed by lobstr::obj_size()
data_na_pct, the percentage of cells in the instrument's data columns that are NA excluding identifier and form completion columns
## Not run:
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAP_TOKEN")
read_redcap(
  redcap_uri,
  token,
  raw_or_label = "label"
)
## End(Not run)
A dataset of superheroes in a REDCapTidieR supertbl object
superheroes_supertbl
heroes_information
A tibble with 734 rows and 12 columns:
REDCap record ID
Hero name
Gender
Eye color
Race
Hair color
Height
Weight
Publisher
Skin color
Alignment
REDCap instrument completed?
super_hero_powers
A tibble with 5,966 rows and 4 columns:
REDCap record ID
REDCap repeat instance
Super power
REDCap instrument completed?
tbl_sum() gives a brief textual description of a table-like object, which should include the dimensions and the data source in the first element, and additional information in the other elements (such as grouping for dplyr). The default implementation forwards to obj_sum().
## S3 method for class 'redcap_supertbl'
tbl_sum(x)
x |
Object to summarise. |
A named character vector, describing the dimensions in the first element and the data source in the name of the first element.
vec_ptype_full() displays the full type of the vector. vec_ptype_abbr() provides an abbreviated summary suitable for use in a column heading.
## S3 method for class 'redcap_supertbl'
vec_ptype_abbr(x, ..., prefix_named, suffix_shape)
x |
A vector. |
... |
These dots are for future extensions and must be empty. |
prefix_named |
If |
suffix_shape |
If |
A string.
Transform a supertibble into an XLSX file, with each REDCap data tibble in a separate sheet.
write_redcap_xlsx(
  supertbl,
  file,
  add_labelled_column_headers = NULL,
  use_labels_for_sheet_names = TRUE,
  include_toc_sheet = TRUE,
  include_metadata_sheet = TRUE,
  table_style = "tableStyleLight8",
  column_width = "auto",
  recode_logical = TRUE,
  na_replace = "",
  overwrite = FALSE
)
supertbl |
A supertibble generated using |
file |
The name of the file to which the output will be written. |
add_labelled_column_headers |
If |
use_labels_for_sheet_names |
If |
include_toc_sheet |
If |
include_metadata_sheet |
If |
table_style |
Any Excel table style name or "none". For more details, see
the
"formatting" vignette
of the |
column_width |
Sets the width of columns throughout the workbook. The default is "auto", but you can specify a numeric value. |
recode_logical |
If |
na_replace |
The value used to replace |
overwrite |
If |
An openxlsx2 workbook object, invisibly
## Not run:
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAP_TOKEN")
supertbl <- read_redcap(redcap_uri, token)
supertbl %>%
  write_redcap_xlsx(file = "supertibble.xlsx")
# Add variable labels
library(labelled)
supertbl %>%
  make_labelled() %>%
  write_redcap_xlsx(file = "supertibble.xlsx", add_labelled_column_headers = TRUE)
## End(Not run)