Title: | REDCap Metadata Casting and Castellated Data Handling |
---|---|
Description: | Casting metadata for REDCap database creation and handling of castellated data using repeated instruments and longitudinal projects in 'REDCap'. Keeps a focused data export approach by allowing export of only the required data from the database. Also supports casting new REDCap databases based on datasets from other sources. Originally forked from the R part of 'REDCapRITS' by Paul Egeler. See <https://github.com/pegeler/REDCapRITS>. 'REDCap' (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies, providing 1) an intuitive interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for data integration and interoperability with external sources (Harris et al (2009) <doi:10.1016/j.jbi.2008.08.010>; Harris et al (2019) <doi:10.1016/j.jbi.2019.103208>). |
Authors: | Andreas Gammelgaard Damsbo [aut, cre] , Paul Egeler [aut] |
Maintainer: | Andreas Gammelgaard Damsbo <[email protected]> |
License: | GPL (>= 3) |
Version: | 24.11.2 |
Built: | 2024-11-23 17:05:12 UTC |
Source: | CRAN |
Check if vector is all NA
all_na(data)
data |
vector or data.frame |
logical
rep(NA,4) |> all_na()
This extends [forcats::as_factor()] as well as [haven::as_factor()] by appending the original attributes, except for "class", after converting to factor. This avoids data loss in the case of richly formatted and labelled data.
as_factor(x, ...)

## S3 method for class 'factor'
as_factor(x, ...)

## S3 method for class 'logical'
as_factor(x, ...)

## S3 method for class 'numeric'
as_factor(x, ...)

## S3 method for class 'character'
as_factor(x, ...)

## S3 method for class 'haven_labelled'
as_factor(
  x,
  levels = c("default", "labels", "values", "both"),
  ordered = FALSE,
  ...
)

## S3 method for class 'labelled'
as_factor(
  x,
  levels = c("default", "labels", "values", "both"),
  ordered = FALSE,
  ...
)
x |
Object to coerce to a factor. |
... |
Other arguments passed down to method. |
levels |
How to create the levels of the generated factor: * "default": uses labels where available, otherwise the values. Labels are sorted by value. * "both": like "default", but pastes together the level and value * "labels": use only the labels; unlabelled values become 'NA' * "values": use only the values |
ordered |
If 'TRUE' create an ordered (ordinal) factor, if 'FALSE' (the default) create a regular (nominal) factor. |
Please refer to the parent functions for extended documentation. To avoid redundant calls and errors, the functions are copy-pasted here.
# will preserve all attributes
c(1, 4, 3, "A", 7, 8, 1) |> as_factor()
structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10)
) |>
  as_factor() |>
  dput()

structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10),
  class = "haven_labelled"
) |>
  as_factor()
Mimics case_when for list of regex patterns and values. Used for date/time validation generation from name vector. Like case_when, the matches are in order of priority. Primarily used in REDCapCAST to do data type coding from systematic variable naming.
case_match_regex_list(data, match.list, .default = NA)
data |
vector |
match.list |
list of case matches |
.default |
Default value for non-matches. Default is NA. |
vector
case_match_regex_list(
  c("test_date", "test_time", "test_tida", "test_tid"),
  list(date_dmy = "_dat[eo]$", time_hh_mm_ss = "_ti[md]e?$")
)
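The priority-ordered matching described for case_match_regex_list() can be sketched in base R. This is a minimal illustration and not the package's implementation; regex_case_match() is a hypothetical helper introduced only for this example.

```r
# Hypothetical helper (not part of REDCapCAST): returns the name of the
# first pattern in `patterns` that matches each element of `x`.
regex_case_match <- function(x, patterns, default = NA) {
  out <- rep(default, length(x))
  # Walk the patterns in reverse so that earlier patterns overwrite
  # later ones and therefore take priority.
  for (nm in rev(names(patterns))) {
    out[grepl(patterns[[nm]], x)] <- nm
  }
  out
}

res <- regex_case_match(
  c("test_date", "test_time", "test_tida", "test_tid"),
  list(date_dmy = "_dat[eo]$", time_hh_mm_ss = "_ti[md]e?$")
)
res
# "date_dmy" "time_hh_mm_ss" NA "time_hh_mm_ss"
```

"test_tida" stays NA because neither pattern matches at the end of the string.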
Overview of REDCapCAST data for shiny
cast_data_overview(data)
data |
list with class 'REDCapCAST' |
gt object
Overview of REDCapCAST meta data for shiny
cast_meta_overview(data)
data |
list with class 'REDCapCAST' |
gt object
Simple function to generate REDCap choices from character vector
char2choice(data, char.split = "/", raw = NULL, .default = NA)
data |
vector |
char.split |
splitting character(s) |
raw |
specific values. Can be used for options of same length. |
.default |
default value for missing. Default is NA. |
vector
char2choice(c("yes/no"," yep. / nope ","",NA,"what"),.default=NA)
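REDCap data dictionaries store choices as "raw, label" pairs separated by "|". The base R sketch below shows one way to build such a string from a delimited character value. choices_from_string() is a hypothetical helper, and the actual output of char2choice() (for example for empty strings, NA, or a supplied 'raw' argument) may differ.

```r
# Hypothetical helper (not part of REDCapCAST): builds a REDCap choices
# string from a single delimited character value.
choices_from_string <- function(x, split = "/") {
  # Split on the delimiter and trim surrounding white space
  parts <- trimws(strsplit(x, split, fixed = TRUE)[[1]])
  # Number the options from 1 and join them in REDCap choice syntax
  paste(paste0(seq_along(parts), ", ", parts), collapse = " | ")
}

choices_from_string("yes/no")
# "1, yes | 2, no"
choices_from_string(" yep. / nope ")
# "1, yep. | 2, nope"
```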
Simple function to generate REDCap branching logic from character vector
char2cond( data, minor.split = ",", major.split = ";", major.sep = " or ", .default = NA )
data |
vector |
minor.split |
minor split |
major.split |
major split |
major.sep |
argument separation. Default is " or ". |
.default |
default value for missing. Default is NA. |
vector
# data <- dd_inst$betingelse
# c("Extubation_novent, 2; Pacu_delay, 1") |> char2cond()
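REDCap branching logic refers to fields in square brackets, for example [field]='1'. The base R sketch below illustrates the major/minor splitting idea behind char2cond(). cond_from_string() is a hypothetical helper; details such as name cleaning or quoting in the package's real output are not confirmed here.

```r
# Hypothetical helper (not part of REDCapCAST): converts
# "var, value; var, value" into a REDCap branching logic string.
cond_from_string <- function(x, minor = ",", major = ";", sep = " or ") {
  # Major split: one clause per condition
  clauses <- trimws(strsplit(x, major, fixed = TRUE)[[1]])
  # Minor split: variable name and value within each clause
  terms <- vapply(
    strsplit(clauses, minor, fixed = TRUE),
    function(p) {
      p <- trimws(p)
      sprintf("[%s]='%s'", p[1], p[2])
    },
    character(1)
  )
  paste(terms, collapse = sep)
}

cond_from_string("extubation_novent, 2; pacu_delay, 1")
# "[extubation_novent]='2' or [pacu_delay]='1'"
```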
Stepwise removal of non-alphanumeric characters and trailing white space, substitution of spaces with underscores, and conversion to lower case. Tries to make up for different naming conventions.
clean_redcap_name(x)
x |
vector or data frame for cleaning |
vector or data frame, same format as input
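The cleaning steps listed for clean_redcap_name() can be approximated in base R as follows. clean_name_sketch() is a hypothetical helper for a character vector only; the order of steps and the exact character classes used by the package are assumptions.

```r
# Hypothetical helper (not part of REDCapCAST): approximates the
# described name cleaning for a character vector.
clean_name_sketch <- function(x) {
  x <- trimws(x)                      # drop leading/trailing white space
  x <- gsub("[^[:alnum:] _]", "", x)  # drop non-alphanumeric characters
  x <- gsub(" +", "_", x)             # spaces become underscores
  tolower(x)                          # lower case
}

clean_name_sketch(c("Baseline Visit (1) ", "Follow-Up"))
# "baseline_visit_1" "followup"
```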
Compacting a vector of any length with or without names
compact_vec(data, nm.sep = ": ", val.sep = "; ")
data |
vector, optionally named |
nm.sep |
string separating name from value if any |
val.sep |
string separating values |
character string
sample(seq_len(4), 20, TRUE) |>
  as_factor() |>
  named_levels() |>
  sort() |>
  compact_vec()

1:6 |> compact_vec()

"test" |> compact_vec()

sample(letters[1:9], 20, TRUE) |> compact_vec()
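The role of 'nm.sep' and 'val.sep' can be illustrated with a small base R sketch. compact_sketch() is a hypothetical helper and only mirrors the argument names of compact_vec(); the package function may format its output differently.

```r
# Hypothetical helper (not part of REDCapCAST): collapses an optionally
# named vector into a single character string.
compact_sketch <- function(x, nm.sep = ": ", val.sep = "; ") {
  vals <- as.character(x)
  # Prefix each value with its name when names are present
  if (!is.null(names(x))) vals <- paste0(names(x), nm.sep, vals)
  paste(vals, collapse = val.sep)
}

compact_sketch(c(low = 1, high = 2))
# "low: 1; high: 2"
compact_sketch(1:3)
# "1; 2; 3"
```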
Create two-column HTML table for data piping in REDCap instruments
create_html_table(text, variable)
text |
descriptive text |
variable |
variable to pipe |
character vector
create_html_table(text = "Patient ID", variable = c("[cpr]"))
create_html_table(text = paste("assessor", 1:2, sep = "_"), variable = c("[cpr]"))
# create_html_table(text = c("CPR nummer","Word"), variable = c("[cpr][1]", "[cpr][2]", "[test]"))
Metadata can be added by editing the data dictionary of a project in the initial design phase. If you want to later add new instruments, this function can be used to create (an) instrument(s) to add to a project in production.
create_instrument_meta(data, dir = here::here(""), record.id = TRUE)
data |
metadata for the relevant instrument. Could be from 'ds2dd_detailed()' |
dir |
destination dir for the instrument zip. Default is the current WD. |
record.id |
flag to omit the first row of the data dictionary assuming this is the record_id field which should not be included in the instrument. Default is TRUE. |
list
data <- iris |>
  ds2dd_detailed(
    add.auto.id = TRUE,
    form.name = sample(c("b", "c"), size = 6, replace = TRUE, prob = rep(.5, 2))
  ) |>
  purrr::pluck("meta")
# data |> create_instrument_meta()

data <- iris |>
  ds2dd_detailed(add.auto.id = FALSE) |>
  purrr::pluck("data")
# One form prefix is sampled per column (size = ncol(iris))
iris |>
  setNames(glue::glue("{sample(x = c('a','b'),size = ncol(iris), replace=TRUE,prob = rep(x=.5,2))}__{names(iris)}")) |>
  ds2dd_detailed(form.sep = "__")
# data |>
#   purrr::pluck("meta") |>
#   create_instrument_meta(record.id = FALSE)
Convert single digits to words
d2w(x, lang = "en", neutrum = FALSE, everything = FALSE)
x |
data. Handle vectors, data.frames and lists |
lang |
language. Danish (da) and English (en), Default is "en" |
neutrum |
for numbers depending on counted word |
everything |
flag to also split numbers >9 to single digits |
returns characters in same format as input
d2w(c(2:8, 21))
d2w(data.frame(2:7, 3:8, 1), lang = "da", neutrum = TRUE)

## If everything=T, also larger numbers are reduced.
## Elements in the list are same length as input
d2w(list(2:8, c(2, 6, 4, 23), 2), everything = TRUE)
Works well with 'project.aid::docx2list()'. Allows defining a database in a text document (see provided template) for easier database creation. This approach allows easier collaboration when defining the database. The generic case is a data frame with variable names as values in a column. This format resembles the REDCap data dictionary, but gives a few options for formatting.
doc2dd( data, instrument.name, col.variables = 1, list.datetime.format = list(date_dmy = "_dat[eo]$", time_hh_mm_ss = "_ti[md]e?$"), col.description = NULL, col.condition = NULL, col.subheader = NULL, subheader.tag = "h2", condition.minor.sep = ",", condition.major.sep = ";", col.calculation = NULL, col.choices = NULL, choices.char.sep = "/", missing.default = NA )
data |
tibble or data.frame with all variable names in one column |
instrument.name |
character vector length one. Instrument name. |
col.variables |
variable names column (default = 1), allows dplyr subsetting |
list.datetime.format |
formatting for date/time detection. See 'case_match_regex_list()' |
col.description |
descriptions column, allows dplyr subsetting. If empty, variable names will be used. |
col.condition |
conditions for branching column, allows dplyr subsetting. See 'char2cond()'. |
col.subheader |
sub-header column, allows dplyr subsetting. See 'format_subheader()'. |
subheader.tag |
formatting tag. Default is "h2" |
condition.minor.sep |
condition split minor. See 'char2cond()'. Default is ",". |
condition.major.sep |
condition split major. See 'char2cond()'. Default is ";". |
col.calculation |
calculations column. Has to be written exact. Character vector. |
col.choices |
choices column. See 'char2choice()'. |
choices.char.sep |
choices split. See 'char2choice()'. Default is "/". |
missing.default |
value for missing fields. Default is NA. |
tibble or data.frame (same as data)
# data <- dd_inst
# data |> doc2dd(instrument.name = "evt",
#   col.description = 3,
#   col.condition = 4,
#   col.subheader = 2,
#   col.calculation = 5,
#   col.choices = 6)
Creates a very basic data dictionary skeleton. Please see 'ds2dd_detailed()' for a more advanced function.
ds2dd( ds, record.id = "record_id", form.name = "basis", field.type = "text", field.label = NULL, include.column.names = FALSE, metadata = metadata_names )
ds |
data set |
record.id |
name or column number of id variable, moved to first row of data dictionary, character or integer. Default is "record_id". |
form.name |
vector of form names, character string, length 1 or length equal to number of variables. Default is "basis". |
field.type |
vector of field types, character string, length 1 or length equal to number of variables. Default is "text". |
field.label |
vector of field labels, character string, length 1 or length equal to number of variables. Default is NULL and is then identical to field names. |
include.column.names |
Flag to give detailed output including new column names for original data set for upload. |
metadata |
Metadata column names. Default is the included REDCapCAST::metadata_names. |
Migrated from stRoke ds2dd(). Fits better with the functionality of 'REDCapCAST'.
data.frame or list of data.frame and vector
redcapcast_data$record_id <- seq_len(nrow(redcapcast_data))
ds2dd(redcapcast_data, include.column.names = TRUE)
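To show what a "very basic data dictionary skeleton" amounts to, the base R sketch below builds one by hand for a toy data set. The column names field_name, form_name, field_type and field_label follow the REDCap data dictionary; the toy data and the reduction to four columns are assumptions made for illustration, and ds2dd() itself returns the full set of metadata columns.

```r
# Toy data set (made up for illustration)
ds <- data.frame(
  record_id = 1:3,
  age = c(34, 51, 29),
  sex = c("f", "m", "f")
)

# Minimal skeleton: one row per variable, same form and type for all
dd <- data.frame(
  field_name = names(ds),
  form_name = "basis",
  field_type = "text",
  field_label = names(ds)  # labels default to the field names
)
dd
```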
Extract data from stata file for data dictionary
ds2dd_detailed( data, add.auto.id = FALSE, date.format = "dmy", form.name = NULL, form.sep = NULL, form.prefix = TRUE, field.type = NULL, field.label = NULL, field.label.attr = "label", field.validation = NULL, metadata = names(REDCapCAST::redcapcast_meta), convert.logicals = TRUE )
data |
data frame |
add.auto.id |
flag to add id column |
date.format |
date format, character string. ymd/dmy/mdy. Default is dmy. |
form.name |
manually specify form name(s). Vector of length 1 or ncol(data). Default is NULL and "data" is used. |
form.sep |
If the supplied dataset has form names as suffix or prefix to the column/variable names, the separator can be specified. If supplied, form.name is ignored. Default is NULL. |
form.prefix |
Flag to set if form is prefix (TRUE) or suffix (FALSE) to the column names. Assumes all columns have pre- or suffix if specified. |
field.type |
manually specify field type(s). Vector of length 1 or ncol(data). Default is NULL and "text" is used for everything but factors, which will get "radio". |
field.label |
manually specify field label(s). Vector of length 1 or ncol(data). Default is NULL and colnames(data) is used or attribute 'field.label.attr' for haven_labelled data set (imported .dta file with 'haven::read_dta()'). |
field.label.attr |
attribute name for named labels for haven_labelled data set (imported .dta file with 'haven::read_dta()'). Default is "label" |
field.validation |
manually specify field validation(s). Vector of length 1 or ncol(data). Default is NULL and 'levels()' are used for factors or attribute 'factor.labels.attr' for haven_labelled data set (imported .dta file with 'haven::read_dta()'). |
metadata |
redcap metadata headings. Default is REDCapCAST:::metadata_names. |
convert.logicals |
convert logicals to factor. Default is TRUE. |
This function is a natural development of the ds2dd() function. It assumes that the first column is the ID column and performs no checks. Always inspect the data dictionary before upload.
Ensure that the data set is formatted with as much information as possible.
'field.type' can be supplied
list of length 2
## Basic parsing with default options
REDCapCAST::redcapcast_data |>
  dplyr::select(-dplyr::starts_with("redcap_")) |>
  ds2dd_detailed()

## Adding a record_id field
iris |> ds2dd_detailed(add.auto.id = TRUE)

## Passing form name information to function
iris |>
  ds2dd_detailed(
    add.auto.id = TRUE,
    form.name = sample(c("b", "c"), size = 6, replace = TRUE, prob = rep(.5, 2))
  ) |>
  purrr::pluck("meta")
mtcars |> ds2dd_detailed(add.auto.id = TRUE)

## Using column name suffix to carry form name
data <- iris |>
  ds2dd_detailed(add.auto.id = TRUE) |>
  purrr::pluck("data")
names(data) <- glue::glue("{sample(x = c('a','b'),size = length(names(data)), replace=TRUE,prob = rep(x=.5,2))}__{names(data)}")
data |> ds2dd_detailed(form.sep = "__")
Secure API key storage and data acquisition in one
easy_redcap(project.name, widen.data = TRUE, uri, ...)
project.name |
The name of the current project (for key storage with 'keyring::key_set()', using the default keyring) |
widen.data |
argument to widen the exported data |
uri |
REDCap database API uri |
... |
arguments passed on to 'REDCapCAST::read_redcap_tables()' |
data.frame or list depending on widen.data
Metadata can be added by editing the data dictionary of a project in the initial design phase. If you want to later add new instruments, this function can be used to create (an) instrument(s) to add to a project in production.
export_redcap_instrument(data, file, force = FALSE, record.id = "record_id")
data |
metadata for the relevant instrument. Could be from 'ds2dd_detailed()' |
file |
destination file name. |
force |
force instrument creation and ignore different form names by just using the first. |
record.id |
record id variable name. Default is 'record_id'. |
exports zip-file
# iris |>
#   ds2dd_detailed(
#     add.auto.id = TRUE,
#     form.name = sample(c("b", "c"), size = 6, replace = TRUE, prob = rep(.5, 2))
#   ) |>
#   purrr::pluck("meta") |>
#   (\(.x){
#     split(.x, .x$form_name)
#   })() |>
#   purrr::imap(function(.x, .i){
#     export_redcap_instrument(.x,file=here::here(paste0(.i,Sys.Date(),".zip")))
#   })

# iris |>
#   ds2dd_detailed(
#     add.auto.id = TRUE
#   ) |>
#   purrr::pluck("meta") |>
#   export_redcap_instrument(file=here::here(paste0("instrument",Sys.Date(),".zip")))
Allows conversion of factor to numeric values preserving original levels
fct2num(data)
data |
vector |
numeric vector
c(1, 4, 3, "A", 7, 8, 1) |>
  as_factor() |>
  fct2num()

structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10),
  class = "haven_labelled"
) |>
  as_factor() |>
  fct2num()

structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10),
  class = "labelled"
) |>
  as_factor() |>
  fct2num()

# Outlier with labels, but no class of origin, handled like numeric vector
# structure(c(1, 2, 3, 2, 10, 9),
#   labels = c(Unknown = 9, Refused = 10)
# ) |>
#   as_factor() |>
#   fct2num()

v <- sample(6:19, 20, TRUE) |> factor()
dput(v)
named_levels(v)
fct2num(v)
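Why a dedicated conversion is useful can be seen from plain base R behaviour: as.numeric() on a factor returns the internal level codes, not the original values. The short example below uses only base R and does not call fct2num().

```r
v <- factor(c(10, 7, 19, 7))
levels(v)
# "7" "10" "19"

# Internal integer codes, not the original values
codes <- as.numeric(v)
codes
# 2 1 3 1

# Going through character recovers the original values
values <- as.numeric(as.character(v))
values
# 10 7 19 7
```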
DEPRECATED Helper to import files correctly
file_extension(filenames)
filenames |
file names |
character vector
file_extension(list.files(here::here(""))[[2]])[[1]]
file_extension(c("file.cd..ks", "file"))
Extracts limited metadata for variables in a dataset
focused_metadata(metadata, vars_in_data)
metadata |
A dataframe containing metadata |
vars_in_data |
Vector of variable names in the dataset |
A dataframe containing metadata for the variables in the dataset
Sub-header formatting wrapper
format_subheader(data, tag = "h2")
data |
character vector |
tag |
character vector length 1 |
character vector
"Instrument header" |> format_subheader()
Retrieve project API key if stored, if not, set and retrieve
get_api_key(key.name)
key.name |
character vector of key name |
character vector
Extract attribute. Returns NA if none
get_attr(data, attr = NULL)
data |
vector |
attr |
attribute name |
character vector
attr(mtcars$mpg, "label") <- "testing"
do.call(c, sapply(mtcars, get_attr))
## Not run:
mtcars |>
  numchar2fct(numeric.threshold = 6) |>
  ds2dd_detailed()
## End(Not run)
Get the id name
get_id_name(data)
data |
data frame or list |
character vector
This is for repairing data with time variables with appended "1970-01-01"
guess_time_only( data, validate.time = FALSE, time.var.sel.pos = "[Tt]i[d(me)]", time.var.sel.neg = "[Dd]at[eo]" )
data |
data.frame or tibble |
validate.time |
Flag to validate guessed time columns |
time.var.sel.pos |
Positive selection regex string passed to 'guess_time_only_filter()' as sel.pos. |
time.var.sel.neg |
Negative selection regex string passed to 'guess_time_only_filter()' as sel.neg. |
data.frame or tibble
redcapcast_data |> guess_time_only(validate.time = TRUE)
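The problem being repaired is that a time-only value read as a date-time carries the placeholder date "1970-01-01". A minimal base R sketch of detecting and stripping that date is shown below. It returns character strings rather than the "hms" class used by the package, and the sample values are made up for illustration.

```r
# Time-only values that were parsed as date-times (made-up sample)
x <- as.POSIXct(
  c("1970-01-01 08:30:00", "1970-01-01 14:05:10"),
  tz = "UTC"
)

# A column is treated as time-only if every date is the placeholder date
time_only <- all(as.Date(x, tz = "UTC") == as.Date("1970-01-01"))

# Keep only the clock time when the column is time-only
times <- if (time_only) format(x, "%H:%M:%S", tz = "UTC") else x
times
# "08:30:00" "14:05:10"
```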
This is just a try at guessing data type based on data class and column names hoping for a tiny bit of naming consistency. R does not include a time-only data format natively, so the "hms" class from 'readr' is used. This has to be converted to character class before REDCap upload.
guess_time_only_filter( data, validate = FALSE, sel.pos = "[Tt]i[d(me)]", sel.neg = "[Dd]at[eo]" )
data |
data set |
validate |
flag to output validation data. Will output list. |
sel.pos |
Positive selection regex string |
sel.neg |
Negative selection regex string |
character vector or list depending on 'validate' flag.
data <- redcapcast_data
data |> guess_time_only_filter()
data |>
  guess_time_only_filter(validate = TRUE) |>
  lapply(head)
Finish incomplete haven attributes substituting missings with values
haven_all_levels(data)
data |
haven labelled variable |
named vector
ds <- structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10),
  class = "haven_labelled"
)
haven::is.labelled(ds)
attributes(ds)
ds |> haven_all_levels()
Change "hms" to "character" for REDCap upload.
hms2character(data)
data |
data set |
data.frame or tibble
data <- redcapcast_data
## data |> time_only_correction() |> hms2character()
Simple html tag wrapping for REDCap text formatting
html_tag_wrap(data, tag = "h2", extra = NULL)
data |
character vector |
tag |
character vector length 1 |
extra |
character vector |
character vector
html_tag_wrap("Titel", tag = "div", extra = 'class="rich-text-field-label"')
html_tag_wrap("Titel", tag = "h2")
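Tag wrapping of this kind is plain string concatenation. The sketch below is a hypothetical base R equivalent, wrap_tag(), written only to show the expected shape of the HTML; the exact spacing in the output of html_tag_wrap() is not confirmed here.

```r
# Hypothetical helper (not part of REDCapCAST): wraps text in an HTML
# tag, with optional extra attributes in the opening tag.
wrap_tag <- function(x, tag = "h2", extra = NULL) {
  open <- if (is.null(extra)) tag else paste(tag, extra)
  paste0("<", open, ">", x, "</", tag, ">")
}

wrap_tag("Titel")
# "<h2>Titel</h2>"
wrap_tag("Titel", tag = "div", extra = 'class="rich-text-field-label"')
# "<div class=\"rich-text-field-label\">Titel</div>"
```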
Multi missing check
is_missing(data, nas = c("", "NA"))
data |
character vector |
nas |
character vector of strings considered as NA |
logical vector
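A check that treats both true NA values and selected strings as missing can be sketched in base R as below. missing_sketch() is a hypothetical helper that mirrors the 'nas' argument; whether is_missing() works element by element in exactly this way is an assumption.

```r
# Hypothetical helper (not part of REDCapCAST): TRUE for NA values and
# for any string listed in `nas`.
missing_sketch <- function(x, nas = c("", "NA")) {
  is.na(x) | x %in% nas
}

missing_sketch(c("a", "", NA, "NA"))
# FALSE TRUE TRUE TRUE
```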
Test if repeatable or longitudinal
is_repeated_longitudinal( data, generics = c("redcap_event_name", "redcap_repeat_instrument", "redcap_repeat_instance") )
data |
data set |
generics |
default is "redcap_event_name", "redcap_repeat_instrument" and "redcap_repeat_instance" |
logical
is_repeated_longitudinal(c("record_id", "age", "record_id", "gender"))
is_repeated_longitudinal(redcapcast_data)
is_repeated_longitudinal(list(redcapcast_data))
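The test comes down to asking whether any of the generic REDCap columns are present. The base R sketch below shows that idea for a character vector of names or a data frame. has_generics() is a hypothetical helper; the package function also accepts lists, which this sketch does not handle.

```r
# Hypothetical helper (not part of REDCapCAST): TRUE if any generic
# REDCap column name is present.
has_generics <- function(data,
                         generics = c(
                           "redcap_event_name",
                           "redcap_repeat_instrument",
                           "redcap_repeat_instance"
                         )) {
  # Accept either a vector of names or a data frame
  nms <- if (is.character(data)) data else names(data)
  any(generics %in% nms)
}

has_generics(c("record_id", "age", "gender"))
# FALSE
has_generics(data.frame(record_id = 1, redcap_event_name = "baseline"))
# TRUE
```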
Completion marking based on completed upload
mark_complete(upload, ls)
upload |
output list from 'REDCapR::redcap_write()' |
ls |
output list from 'ds2dd_detailed()' |
list with 'REDCapR::redcap_write()' results
Match fields to forms
match_fields_to_form(metadata, vars_in_data)
metadata |
A data frame containing field names and form names |
vars_in_data |
A character vector of variable names |
A data frame containing field names and form names
mtcars dataset slightly modified to use for Shiny app upload demonstration
data(mtcars_redcap)
A data frame with 13 variables:
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
ID, numeric
original rownames, character
Get named vector of factor levels and values
named_levels(data, label = "labels", na.label = NULL, na.value = 99)
data |
factor |
label |
character string of attribute with named vector of factor labels |
na.label |
character string to refactor NA values. Default is NULL. |
na.value |
new value for NA strings. Ignored if na.label is NULL. Default is 99. |
named vector
## Not run:
structure(c(1, 2, 3, 2, 10, 9),
  labels = c(Unknown = 9, Refused = 10),
  class = "haven_labelled"
) |>
  as_factor() |>
  named_levels()
## End(Not run)
Individual thresholds for character and numeric columns
numchar2fct(data, numeric.threshold = 6, character.throshold = 6)
data |
dataset. data.frame or tibble |
numeric.threshold |
threshold for var2fct for numeric columns. Default is 6. |
character.throshold |
threshold for var2fct for character columns. Default is 6. |
data.frame or tibble
mtcars |> str()
## Not run:
mtcars |>
  numchar2fct(numeric.threshold = 6) |>
  str()
## End(Not run)
Helper to auto-parse un-formatted data with haven and readr
parse_data( data, guess_type = TRUE, col_types = NULL, locale = readr::default_locale(), ignore.vars = "cpr", ... )
data |
data.frame or tibble |
guess_type |
logical to guess type with readr |
col_types |
specify col_types using readr semantics. Ignored if guess_type is TRUE |
locale |
option to specify locale. Defaults to readr::default_locale(). |
ignore.vars |
specify column names of columns to ignore when parsing |
... |
ignored |
data.frame or tibble
mtcars |> parse_data() |> str()
Test if vector can be interpreted as roman numerals
possibly_roman(data)
data |
character vector |
logical
sample(1:100, 10) |>
  as.roman() |>
  possibly_roman()
sample(c(TRUE, FALSE), 10, TRUE) |> possibly_roman()
rep(NA, 10) |> possibly_roman()
User input processing
process_user_input(x)
x |
input |
processed input
User input processing character
## S3 method for class 'character'
process_user_input(x, ...)
x |
input |
... |
ignored |
processed input
User input processing data.frame
## S3 method for class 'data.frame'
process_user_input(x, ...)
x |
input |
... |
ignored |
processed input
User input processing default
## Default S3 method:
process_user_input(x, ...)
x |
input |
... |
ignored |
processed input
User input processing response
## S3 method for class 'response'
process_user_input(x, ...)
x |
input |
... |
ignored |
processed input
Flexible file import based on extension
read_input(file, consider.na = c("NA", "\"\"", ""))
file |
file name |
consider.na |
character vector of strings to consider as NAs |
tibble
read_input("https://raw.githubusercontent.com/agdamsbo/cognitive.index.lookup/main/data/sample.csv")
Convenience function to download complete instrument, using token storage in keyring.
read_redcap_instrument( key, uri, instrument, raw_or_label = "raw", id_name = "record_id", records = NULL )
key |
key name in standard keyring for token retrieval. |
uri |
REDCap database API uri |
instrument |
instrument name |
raw_or_label |
raw or label passed to 'REDCapR::redcap_read()' |
id_name |
id variable name. Default is "record_id". |
records |
Records to download, given as a numeric vector of index numbers. |
data.frame
Implementation of REDCap_split with a focused data acquisition approach: REDCapR::redcap_read is used to download only the specified fields, forms and/or events, relying on the built-in focused_metadata and including some clean-up. Works with classical and longitudinal projects, with or without repeating instruments.
read_redcap_tables( uri, token, records = NULL, fields = NULL, events = NULL, forms = NULL, raw_or_label = "label", split_forms = "all" )
uri |
REDCap database API uri |
token |
API token |
records |
records to download |
fields |
fields to download |
events |
events to download |
forms |
forms to download |
raw_or_label |
raw or label tags |
split_forms |
Whether to split "repeating" or "all" forms. Default is "all". |
list of instruments
# Examples will be provided later
This will take output from a REDCap export and split it into a base table and child tables for each repeating instrument. Metadata is used to determine which fields should be included in each resultant table.
REDCap_split( records, metadata, primary_table_name = "", forms = c("repeating", "all") )
records |
Exported project records. May be a |
metadata |
Project metadata (the data dictionary). May be a
|
primary_table_name |
Name given to the list element for the primary
output table (as described in README.md). Ignored if
|
forms |
Indicate whether to create separate tables for repeating instruments only or for all forms. |
A list of "data.frame"s. The number of tables will differ depending on the forms option selected.
'repeating': one base table and one or more tables for each repeating instrument.
'all': a data.frame for each instrument, regardless of whether it is a repeating instrument or not.
Paul W. Egeler, M.S., GStat
## Not run:
# Using an API call -------------------------------------------------------

library(RCurl)

# Get the records
records <- postForm(
  uri = api_url, # Supply your site-specific URI
  token = api_token, # Supply your own API token
  content = "record",
  format = "json",
  returnFormat = "json"
)

# Get the metadata
metadata <- postForm(
  uri = api_url, # Supply your site-specific URI
  token = api_token, # Supply your own API token
  content = "metadata",
  format = "json"
)

# Convert exported JSON strings into a list of data.frames
REDCapRITS::REDCap_split(records, metadata)

# Using a raw data export -------------------------------------------------

# Get the records
records <- read.csv("/path/to/data/ExampleProject_DATA_2018-06-03_1700.csv")

# Get the metadata
metadata <- read.csv(
  "/path/to/data/ExampleProject_DataDictionary_2018-06-03.csv"
)

# Split the tables
REDCapRITS::REDCap_split(records, metadata)

# In conjunction with the R export script ---------------------------------

# You must set the working directory first since the REDCap data export
# script contains relative file references.
old <- getwd()
setwd("/path/to/data/")

# Run the data export script supplied by REDCap.
# This will create a data.frame of your records called 'data'
source("ExampleProject_R_2018-06-03_1700.r")

# Get the metadata
metadata <- read.csv("ExampleProject_DataDictionary_2018-06-03.csv")

# Split the tables
REDCapRITS::REDCap_split(data, metadata)
setwd(old)

## End(Not run)
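The idea of splitting an export into a base table and a child table per repeating instrument can be shown on a toy data set with base R alone. The records, the `field_forms` lookup (standing in for the data dictionary) and the form names below are invented for illustration; this is a conceptual sketch, not the logic REDCap_split uses internally.

```r
# Toy export: one repeating instrument ("visit") next to baseline data
records <- data.frame(
  record_id = c(1, 1, 1, 2),
  redcap_repeat_instrument = c("", "visit", "visit", ""),
  redcap_repeat_instance = c(NA, 1, 2, NA),
  age = c(40, NA, NA, 55),
  visit_weight = c(NA, 80, 82, NA)
)

# Hypothetical field-to-form lookup standing in for the metadata
field_forms <- c(age = "baseline", visit_weight = "visit")

is_repeat <- records$redcap_repeat_instrument != ""
base_fields <- names(field_forms)[field_forms == "baseline"]
visit_fields <- names(field_forms)[field_forms == "visit"]

# Base table: non-repeating rows with the fields of the non-repeating form.
# Child table: repeating rows with the instance number and the form's fields.
tables <- list(
  base = records[!is_repeat, c("record_id", base_fields), drop = FALSE],
  visit = records[
    is_repeat,
    c("record_id", "redcap_repeat_instance", visit_fields),
    drop = FALSE
  ]
)
tables
```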
Converts a list of REDCap data frames from long to wide format. Handles longitudinal projects, but not yet repeated instruments.
redcap_wider( data, event.glue = "{.value}_{redcap_event_name}", inst.glue = "{.value}_{redcap_repeat_instance}" )
data |
A list of data frames. |
event.glue |
A glue string (glue::glue() syntax) for naming repeated events |
inst.glue |
A glue string (glue::glue() syntax) for naming repeated instruments |
The list of data frames in wide format.
# Longitudinal
list1 <- list(
  data.frame(
    record_id = c(1, 2, 1, 2),
    redcap_event_name = c("baseline", "baseline", "followup", "followup"),
    age = c(25, 26, 27, 28)
  ),
  data.frame(
    record_id = c(1, 2),
    redcap_event_name = c("baseline", "baseline"),
    gender = c("male", "female")
  )
)
redcap_wider(list1)
# Simple with two instruments
list2 <- list(
  data.frame(
    record_id = c(1, 2),
    age = c(25, 26)
  ),
  data.frame(
    record_id = c(1, 2),
    gender = c("male", "female")
  )
)
redcap_wider(list2)
# Simple with single instrument
list3 <- list(data.frame(
  record_id = c(1, 2),
  age = c(25, 26)
))
redcap_wider(list3)
# Longitudinal with repeatable instruments
list4 <- list(
  data.frame(
    record_id = c(1, 2, 1, 2),
    redcap_event_name = c("baseline", "baseline", "followup", "followup"),
    age = c(25, 26, 27, 28)
  ),
  data.frame(
    record_id = c(1, 1, 1, 1, 2, 2, 2, 2),
    redcap_event_name = c(
      "baseline", "baseline", "followup", "followup",
      "baseline", "baseline", "followup", "followup"
    ),
    redcap_repeat_instrument = "walk",
    redcap_repeat_instance = c(1, 2, 1, 2, 1, 2, 1, 2),
    dist = c(40, 32, 25, 33, 28, 24, 23, 36)
  ),
  data.frame(
    record_id = c(1, 2),
    redcap_event_name = c("baseline", "baseline"),
    gender = c("male", "female")
  )
)
redcap_wider(list4)
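The long-to-wide step for a single longitudinal table can be reproduced with base `reshape()`, which makes the naming scheme of the default `event.glue` ("{.value}_{redcap_event_name}") concrete. This is only a sketch of the reshaping idea on one data frame; it does not replicate how redcap_wider joins several instruments.

```r
# One longitudinal instrument in long format
long <- data.frame(
  record_id = c(1, 2, 1, 2),
  redcap_event_name = c("baseline", "baseline", "followup", "followup"),
  age = c(25, 26, 27, 28)
)

# One row per record; value columns are suffixed with the event name
wide <- reshape(
  long,
  idvar = "record_id",
  timevar = "redcap_event_name",
  direction = "wide",
  sep = "_"
)
names(wide)
```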
This is a small dataset from a REDCap database for demonstration purposes. It contains only synthetic data.
data(redcapcast_data)
A data frame with 22 variables:
ID, numeric
Event name, character
Repeat instrument, character
Repeat instance, numeric
CPR number, character
Inclusion date, Date
Inclusion time, hms
Date of birth, Date
Age decimal, numeric
Age integer, numeric
Legal sex, character
Cohabitation status, character
con_calc
con_mrs
consensus_complete
Hypertension, character
diabetes, character
region, character
Completed, character
mRS Assessed, character
Assessment date, Date
Categorical score, numeric
Complete, numeric
Event datetime, POSIXct
Age at time of event, numeric
Event type, character
Completed, character
This metadata dataset from a REDCap database is for demonstration purposes.
data(redcapcast_meta)
A data frame with 22 variables:
field_name, character
form_name, character
section_header, character
field_type, character
field_label, character
select_choices_or_calculations, character
field_note, character
text_validation_type_or_show_slider_number, character
text_validation_min, character
text_validation_max, character
identifier, character
branching_logic, character
required_field, character
custom_alignment, character
question_number, character
matrix_group_name, character
matrix_ranking, character
field_annotation, character
Copied from textclean, which has not been updated since 2018 and is not on CRAN. GitHub: https://github.com/trinker/textclean
replace_curly_quote(x)
x |
character vector |
character vector
Removing empty rows
sanitize_split( l, generic.names = c("redcap_event_name", "redcap_repeat_instrument", "redcap_repeat_instance") )
l |
A list of data frames. |
generic.names |
A vector of generic names to be excluded. |
A list of data frames with generic names excluded.
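A rough idea of removing empty rows while ignoring the generic REDCap columns can be sketched in base R. The helper `drop_empty_rows()`, the assumption that the ID column is called `record_id`, and the rule that a row is empty when every remaining field is NA or an empty string are all illustrative choices, not the package's definition.

```r
generic_names <- c(
  "redcap_event_name", "redcap_repeat_instrument", "redcap_repeat_instance"
)

# Hypothetical helper: keep rows with at least one filled content field
drop_empty_rows <- function(df, generic = generic_names, id = "record_id") {
  content_cols <- setdiff(names(df), c(id, generic))
  if (length(content_cols) == 0) return(df)
  is_filled <- function(col) !is.na(col) & as.character(col) != ""
  has_data <- Reduce(`|`, lapply(df[content_cols], is_filled))
  df[has_data, , drop = FALSE]
}

split_list <- list(
  baseline = data.frame(
    record_id = c(1, 2),
    redcap_event_name = "baseline",
    age = c(25, NA)
  ),
  visit = data.frame(
    record_id = c(1, 2),
    redcap_event_name = "followup",
    weight = c(NA, NA)
  )
)

cleaned <- lapply(split_list, drop_empty_rows)
cleaned
```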
Sets the value of a named attribute. Appends if attr is NULL.
set_attr(data, label, attr = NULL, overwrite = FALSE)
data |
vector |
label |
label |
attr |
attribute name |
overwrite |
overwrite existing attributes. Default is FALSE. |
vector with attribute
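The overwrite behaviour for a named attribute can be illustrated with base `attr()`. The helper `add_attr()` is hypothetical and covers only the case where an attribute name is given; the append behaviour for `attr = NULL` is not modelled.

```r
# Hypothetical helper: set an attribute unless it exists and overwrite is FALSE
add_attr <- function(x, value, name, overwrite = FALSE) {
  if (is.null(attr(x, name, exact = TRUE)) || overwrite) {
    attr(x, name) <- value
  }
  x
}

age <- add_attr(c(25, 26), "Age in years", "label")
kept <- add_attr(age, "Changed", "label")                       # existing label is kept
replaced <- add_attr(age, "Changed", "label", overwrite = TRUE) # label is replaced
attr(kept, "label")
attr(replaced, "label")
```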
Wraps shiny::runApp()
shiny_cast(...)
... |
Arguments passed to shiny::runApp() |
shiny app
# shiny_cast()
Split a data frame into separate tables for each form
split_non_repeating_forms(table, universal_fields, fields)
table |
A data frame |
universal_fields |
A character vector of fields that should be included in every table |
fields |
A two-column matrix containing the names of fields that should be included in each form |
A list of data frames, one for each non-repeating form
# Create a table
table <- data.frame(
  id = c(1, 2, 3, 4, 5),
  form_a_name = c("John", "Alice", "Bob", "Eve", "Mallory"),
  form_a_age = c(25, 30, 25, 15, 20),
  form_b_name = c("John", "Alice", "Bob", "Eve", "Mallory"),
  form_b_gender = c("M", "F", "M", "F", "F")
)
# Create the universal fields
universal_fields <- c("id")
# Create the fields
fields <- matrix(
  c(
    "form_a_name", "form_a",
    "form_a_age", "form_a",
    "form_b_name", "form_b",
    "form_b_gender", "form_b"
  ),
  ncol = 2, byrow = TRUE
)
# Split the table
split_non_repeating_forms(table, universal_fields, fields)
Can be used as a substitute for base strsplit(). Its main advantage is easing splits around the defined delimiter; see the example.
strsplitx(x, split, type = "classic", perl = FALSE, ...)
x |
data |
split |
delimiter |
type |
Split type. Can be c("classic", "before", "after", "around") |
perl |
perl param from strsplit() |
... |
additional parameters are passed to base strsplit handling splits |
list
test <- c("12 months follow-up", "3 steps", "mRS 6 weeks", "Counting to 231 now")
strsplitx(test, "[0-9]", type = "around")
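Base `strsplit()` discards the delimiter, which is what makes splitting "around" it awkward. One base R way to keep the delimiter as its own piece is to tokenise with `regmatches()` and `gregexpr()`, as sketched below. The helper `split_around_digits()` is hypothetical and treats a run of digits as one piece, which may differ from what `strsplitx(type = "around")` returns.

```r
# Hypothetical helper: alternate between runs of digits and runs of non-digits
split_around_digits <- function(x) {
  regmatches(x, gregexpr("[0-9]+|[^0-9]+", x))
}

pieces <- split_around_digits(c("12 months follow-up", "mRS 6 weeks"))
pieces
```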
Correction based on time_only_filter function
time_only_correction(data, ...)
data |
data set |
... |
arguments passed on to 'guess_time_only_filter()' |
tibble
data <- redcapcast_data
## data |> time_only_correction()
This is a wrapper of forcats::as_factor, which sorts numeric vectors before factoring, but levels character vectors in order of appearance.
var2fct(data, unique.n)
data |
vector or data.frame column |
unique.n |
threshold to convert class to factor |
vector
sample(seq_len(4), 20, TRUE) |> var2fct(6) |> summary()
sample(letters, 20) |> var2fct(6) |> summary()
sample(letters[1:4], 20, TRUE) |> var2fct(6)
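The threshold logic and the two level orderings described above can be sketched in base R. The helper `to_factor_if_few()` is hypothetical, and treating the threshold as "at most `unique_n` distinct values" is an assumption; the package may compare differently.

```r
# Hypothetical helper: factor only when there are few distinct values
to_factor_if_few <- function(x, unique_n) {
  if (length(unique(x)) > unique_n) return(x)  # too many values: unchanged
  if (is.numeric(x)) {
    factor(x, levels = sort(unique(x)))        # numeric: sorted levels
  } else {
    factor(x, levels = unique(x))              # other: order of appearance
  }
}

num_fct <- to_factor_if_few(c(3, 1, 3, 2), 6)
chr_fct <- to_factor_if_few(c("b", "a", "b"), 6)
unchanged <- to_factor_if_few(letters, 6)
levels(num_fct)
levels(chr_fct)
```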
Named vector to REDCap choices (wrapping compact_vec())
vec2choice(data)
data |
named vector |
character string
sample(seq_len(4), 20, TRUE) |> as_factor() |> named_levels() |> sort() |> vec2choice()
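REDCap data dictionaries store choices as a single string of "code, label" pairs separated by " | ". The sketch below builds such a string from a named vector in base R. The helper `choices_string()` is hypothetical, and it assumes that the vector values are the codes and the names are the labels; it does not call `vec2choice()` or `compact_vec()`.

```r
# Hypothetical helper: values are taken as codes, names as labels (assumption)
choices_string <- function(x) {
  paste(paste0(x, ", ", names(x)), collapse = " | ")
}

choices <- choices_string(c(Yes = 1, No = 2))
choices
```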