Title: | Working with Code Lists |
---|---|
Description: | Functions for working with code lists and vectors with codes. These are an alternative for factor that keep track of both the codes and labels. Methods allow for transforming between codes and labels. Also supports hierarchical code lists. |
Authors: | Jan van der Laan [aut, cre] |
Maintainer: | Jan van der Laan <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-20 19:19:06 UTC |
Source: | CRAN |
Convert object to code
as.code(x)

## S3 method for class 'code'
as.code(x)

## Default S3 method:
as.code(x)
x |
object to convert |
By default objects are first converted to factor using as.factor before being converted to code using as.code.
Returns an object of type code.
Convert an object to a codelist object
as.codelist(x, ...)

## S3 method for class 'codelist'
as.codelist(x, ...)

## S3 method for class 'data.frame'
as.codelist(
  x,
  code = names(x)[1],
  label = names(x)[2],
  description = "description",
  parent = "parent",
  locale = "locale",
  missing = "missing",
  format = c("regular", "wide"),
  locales = NULL,
  locale_sep = "[-_@. ]",
  ...
)
x |
data.frame with the code list |
... |
used to pass extra arguments on to other methods. |
code |
the name of the column in |
label |
the name of the column in |
description |
the name of the column in |
parent |
the name of the column in |
locale |
the name of the column in |
missing |
the name of the column in |
format |
the format of the data.frame. In case of 'wide', it is assumed that columns are repeated for each locale. For example, there are columns 'label_locale1' and 'label_locale2'. In case of 'regular', there are multiple rows, one for each locale. |
locales |
only used for |
locale_sep |
the separator separating the locale from the column name.
This is interpreted as a regular expression (see the 'split' argument of
|
When there is no column with the name given by label in x, a new column 'label' is derived containing codes converted to character.
Returns a codelist object which is a data.frame with at minimum the columns 'code' and 'label' and optionally 'description', 'parent', 'locale' and 'missing'. When x contains additional columns these are kept.
See codelist for a description of the codelist object.
# Examples below show the same codelist in both regular and wide format
dta <- data.frame(codes = c(1:3, 1:3),
  labels = c(letters[1:3], LETTERS[1:3]),
  locale = c("en", "en", "en", "nl", "nl", "nl"))
as.codelist(dta, format = "regular")
dta <- data.frame(codes = 1:3, labels_en = letters[1:3], labels_nl = LETTERS[1:3])
as.codelist(dta, format = "wide")
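The relation between the 'wide' and 'regular' formats can be sketched in base R. This is only an illustration under assumptions, not the package's implementation: the names `label_cols`, `locales` and `regular` are hypothetical, and the column prefix `labels` is taken from the example data. The pattern reuses the default `locale_sep` of `"[-_@. ]"`.

```r
# Wide format: one label column per locale
wide <- data.frame(codes = 1:3, labels_en = letters[1:3],
  labels_nl = LETTERS[1:3], stringsAsFactors = FALSE)

# Find the label columns and split off the locale using the separator pattern
label_cols <- grep("^labels[-_@. ]", names(wide), value = TRUE)
locales <- sub("^labels[-_@. ]", "", label_cols)

# Stack one block of rows per locale to obtain the regular format
regular <- do.call(rbind, lapply(seq_along(label_cols), function(i) {
  data.frame(codes = wide$codes, labels = wide[[label_cols[i]]],
    locale = locales[i], stringsAsFactors = FALSE)
}))
print(regular)
```

The result has one row per combination of code and locale, which is the layout expected with format = "regular".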
Label character vector as label to use in comparisons with a code vector
as.label(x)
x |
character vector that is to be interpreted as a label. If |
Returns a character vector with the class "label". This can be used in comparisons to a 'code' vector, or to assign to a 'code' vector.
Uses codes.
data(objectcodes)
data(objectsales)
objectsales$product <- code(objectsales$product, objectcodes)
objectsales$product[1] <- as.label("Hammer")
objectsales$product == as.label("Hammer")
subset(objectsales, product == as.label("Hammer"))
# This is the same as
subset(objectsales, product == codes("Hammer", cl(product)))
Get the code list associated with the object
cl(x)

## Default S3 method:
cl(x)

## S3 method for class 'code'
cl(x)
x |
the object to get the |
An object of type 'codelist'.
Filter a code list
cl_filter(codelist, locale, levels, check_levels = TRUE)
codelist |
a |
locale |
use the codes from the given locale. Should be character
vector of length 1. When NA the default locale is used (as returned by
|
levels |
vector with levels on which to filter a hierarchical code list. Levels are numbered from 0, with 0 the topmost level. See 'Details'. When a code list does not have a 'parent' column and is, therefore, not hierarchical, all codes are in level 0. |
check_levels |
if TRUE the parent column (if present) is removed from the result when the resulting code list would not be a valid hierarchy. |
When a code list has a 'parent' column, the codes without a parent are assigned level 0. Codes with a parent in level 0 are assigned to level 1, and so on. When the code list does not have a 'parent' column, all codes are assigned to level 0 (all codes are in the top level).
Returns a codelist with the selected encoding and/or levels.
Check if the codelist is valid
cl_is_valid(codelist)
codelist |
a |
Returns TRUE when the code list is valid; returns a character vector of length 1 with a description of the problem when it is not valid.
Get the hierarchical level for each code in a code list
cl_levels(codelist)
codelist |
the |
Levels are numbered with 0 being the top-most level, which contains the codes without a parent (parent missing). Level 1 contains the codes that have a parent in level 0, and so on.
When the code list does not have a 'parent' column, all codes are in level 0.
An integer vector with the same length as the number of rows in the code list.
data(objectcodes)
cl_levels(objectcodes)
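The level numbering described above can be sketched in base R. The helper `compute_levels` is hypothetical and is not the package's implementation; it assumes a single locale (each code occurs once) and that the 'parent' values form a tree.

```r
# Hypothetical helper: derive the level of each code from its parent
compute_levels <- function(code, parent) {
  level <- rep(NA_integer_, length(code))
  # Codes without a parent are in the top-most level
  level[is.na(parent)] <- 0L
  repeat {
    todo <- which(is.na(level))
    if (length(todo) == 0L) break
    # Level of the parent of each unresolved code (NA when not yet known)
    parent_level <- level[match(parent[todo], code)]
    resolved <- !is.na(parent_level)
    if (!any(resolved)) stop("hierarchy is not a tree: some parents cannot be resolved")
    level[todo[resolved]] <- parent_level[resolved] + 1L
  }
  level
}

code   <- c("A", "B", "A1", "A2", "B1", "B2", "A1.1", "B2.2", "X")
parent <- c(NA, NA, "A", "A", "B", "B", "A1", "B2", NA)
lev <- compute_levels(code, parent)
print(lev)
# Number of levels in the hierarchy
print(max(lev) + 1L)
```

Here "A", "B" and "X" end up in level 0, the codes directly below them in level 1, and "A1.1" and "B2.2" in level 2, so the hierarchy has three levels.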
Get the locale to use with the codelist
cl_locale(codelist, preferred = getOption("CLLOCALE", NA_character_))
codelist |
a |
preferred |
the preferred locale. If missing or not present in the code list, the first locale in the code list will be used. |
A character vector of length 1 with the locale. Can be NA when the codelist does not have locales.
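The selection rule described for `preferred` can be sketched in base R. The helper `pick_locale` is hypothetical and only mirrors the documented behaviour (preferred locale when present, otherwise the first locale, NA when there is no 'locale' column); the package's own logic may differ in detail.

```r
# Hypothetical helper mirroring the documented rule
pick_locale <- function(codelist, preferred = NA_character_) {
  # No 'locale' column: the code list has no locales
  if (!("locale" %in% names(codelist))) return(NA_character_)
  locales <- unique(codelist$locale)
  # Use the preferred locale when it occurs, otherwise the first one
  if (!is.na(preferred) && preferred %in% locales) preferred else locales[1]
}

cl_df <- data.frame(code = c(1, 1), label = c("a", "A"),
  locale = c("en", "nl"), stringsAsFactors = FALSE)
print(pick_locale(cl_df))
print(pick_locale(cl_df, "nl"))
print(pick_locale(cl_df, "fr"))
```

In the package, the default for `preferred` is read from the option "CLLOCALE", as shown in the usage above.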
Get the number of hierarchical levels in a code list
cl_nlevels(codelist)
codelist |
the |
A single integer value (>= 1) with the number of levels.
A code vector is a vector with an associated code list. The values in the vector should come from this code list. The values also have an associated label and optionally additional properties such as a description. See codelist for more information on what should and could be in a code list.
code(x, codelist, ...)
x |
vector to convert to code vector |
codelist |
code list to associate with the values in |
... |
Ignored; used to pass extra arguments to other methods |
When codelist is omitted and x is a factor, a code list is generated from the factor values.
Returns an object of type 'code'. Except when x is a factor, the result keeps the classes and attributes associated with x. This object is a copy of x with a codelist attribute added.
When x is a factor, it is converted to an integer vector. The labels are the levels of the factor.
x <- code(c(1,4,2), codelist(codes = 1:4, labels = letters[1:4]))
print(x)
labels(x)
x <- code(factor(letters[1:3]))
print(x)
attr(x, "codelist")
Create a codelist object
codelist(
  codes,
  labels = NULL,
  descriptions = NULL,
  parent = NULL,
  locale = NULL,
  missing = NULL
)
codes |
a vector with the codes. |
labels |
optional vector with the labels. Will be converted to character
and should have the same length as |
descriptions |
optional vector with the descriptions of the codes. Will be
converted to character and should have the same length as |
parent |
optional vector with the parents of the codes. Should be of the
same type and length as |
locale |
optional vector with the locale of the labels, descriptions etc.
of the codes. This should be a character vector with the same length as
|
missing |
optional logical vector indicating whether or not the corresponding code can be treated as a missing value. This can be used to encode different types of missingness. |
Returns a codelist object which is a data.frame with at minimum the columns 'code' and 'label' and optionally 'description', 'parent', 'locale' and 'missing'. See below for a description of the columns:
code |
The codes. It is expected that these are either characters or integers although other types are probably supported. For a given locale (see below) they should be unique. Missing values are not allowed. |
label |
The labels of the codes. These are characters. Missing values are not allowed. |
description |
Optional. The description of the codes. These are characters. Missing values are not allowed. |
missing |
Optional. Logical vector indicating whether or not the corresponding code can be treated as a special value. This can be used to have different codes for different types of missingness. Missing values are not allowed. |
locale |
Optional. Character vector indicating for the given row which locale the label and description belong to. The default use is to have different translations of the labels and descriptions. However, this can also be used, for example, to specify short and long labels. When there is more than one locale, there should be multiple lines for each code, one for each locale. |
parent |
Optional. The parent of the code. This can be used to specify simple hierarchies. These should be of the same type as the 'code' column and values should be present in the 'code' column or be 'NA'. When the parent is 'NA' it is assumed this is a top level code. The hierarchy should form a tree. |
The validity of the code list can be checked using cl_is_valid.
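The column requirements listed above can be turned into a rough check in base R. The helper `check_codelist` is hypothetical and covers only some of the rules (required columns, no missing codes or labels, unique codes within a locale, parents present in 'code'); it is a sketch and not a substitute for cl_is_valid, whose exact checks are not reproduced here.

```r
# Hypothetical helper: TRUE when the checks pass, otherwise a message
check_codelist <- function(cl) {
  if (!all(c("code", "label") %in% names(cl)))
    return("columns 'code' and 'label' are required")
  if (anyNA(cl$code)) return("missing values in 'code'")
  if (anyNA(cl$label)) return("missing values in 'label'")
  # Codes should be unique within each locale
  locale <- if ("locale" %in% names(cl)) cl$locale else rep(NA_character_, nrow(cl))
  if (anyDuplicated(data.frame(code = cl$code, locale = locale)) > 0)
    return("codes are not unique within a locale")
  # Parents should be NA (top level) or occur in the 'code' column
  if ("parent" %in% names(cl)) {
    ok <- is.na(cl$parent) | cl$parent %in% cl$code
    if (!all(ok)) return("some parents are not present in 'code'")
  }
  TRUE
}

good <- data.frame(code = c("A", "A1"), label = c("a", "a1"),
  parent = c(NA, "A"), stringsAsFactors = FALSE)
bad <- data.frame(code = c("A", "A1"), label = c("a", "a1"),
  parent = c(NA, "Z"), stringsAsFactors = FALSE)
print(check_codelist(good))
print(check_codelist(bad))
```

Returning either TRUE or a single message follows the convention documented for cl_is_valid.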
Get the codes belonging to given labels
codes(x, ...)

## Default S3 method:
codes(x, codelist, locale = cl_locale(codelist), ...)

## S3 method for class 'code'
codes(x, ...)

to_codes(x, codelist, locale = cl_locale(codelist))
x |
character vector with labels. |
... |
used to pass arguments to other methods. |
codelist |
a |
locale |
use the codes from the given locale. Should be a character vector of length 1. |
to_codes has the same functionality as a call to codes.default.
Returns a vector of codes. Will give an error when one of the labels cannot be found in the codelist for the given locale. When x is an object of type 'code', the codes themselves are returned, stripped of the 'code' class and with the 'codelist' attribute removed.
See as.label for an alternative in comparisons.
data(objectcodes)
data(objectsales)
objectsales$product <- code(objectsales$product, objectcodes)
codes(c("Hammer", "Electric Drill"), objectcodes)
codes(c("Hammer", "Electric Drill"), cl(objectsales$product))
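Conceptually, converting labels to codes is a lookup in the code list that fails when a label is unknown. A minimal base R sketch of that idea follows; the helper `lookup_codes` and the small code list are hypothetical, and locales are ignored for brevity.

```r
# Hypothetical helper: look up the codes belonging to labels
lookup_codes <- function(labels, cl) {
  idx <- match(labels, cl$label)
  # Mirror the documented behaviour: unknown labels are an error
  if (anyNA(idx))
    stop("labels not found in code list: ",
      paste(labels[is.na(idx)], collapse = ", "))
  cl$code[idx]
}

cl_df <- data.frame(code = c("A", "B"), label = c("Hammer", "Saw"),
  stringsAsFactors = FALSE)
print(lookup_codes(c("Saw", "Hammer"), cl_df))
```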
Format a code object for pretty printing
## S3 method for class 'code'
format(x, maxlen = getOption("CLMAXLEN", 8L), ...)
x |
a |
maxlen |
maximum length of the label. A length of 0 or lower will suppress adding the label to the output. |
... |
ignored When |
A character vector with the formatted code.
Match codes based on label
in_labels(
  x,
  labels,
  codelist = attr(x, "codelist"),
  locale = cl_locale(codelist)
)
x |
vector with codes. Should be of the same type as the codes in the codelist. |
labels |
vector with labels. |
codelist |
a |
locale |
use the codes from the given locale. Should be a character vector of length 1. |
A logical vector of the same length as x indicating for each value if the code has a label present in labels.
data(objectcodes)
data(objectsales)
objectsales$product <- code(objectsales$product, objectcodes)
in_labels(objectsales$product, c("Electric Drill", "Toys"))
subset(objectsales, in_labels(product, c("Electric Drill", "Hammer")))
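The matching done by in_labels can be thought of as a two-step lookup: first find the codes that carry one of the labels, then test membership. The base R sketch below uses a small hypothetical code list and ignores locales; how the package treats NA values in x is not specified here, so the NA result in this sketch is an assumption.

```r
# Hypothetical code list and code vector
cl_df <- data.frame(code = c("A", "B", "C"),
  label = c("Hammer", "Saw", "Electric Drill"), stringsAsFactors = FALSE)
x <- c("A", "C", "B", "C", NA)

# Codes whose label is one of the requested labels
wanted <- cl_df$code[cl_df$label %in% c("Electric Drill", "Hammer")]
# Membership test per element of x (NA gives FALSE in this sketch)
res <- x %in% wanted
print(res)
```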
Check if object is a code
is.code(x)
x |
object to check |
Returns a logical of length 1 indicating whether or not x is of type 'code'.
Check if an object is a Code List
is.codelist(x)
x |
object to test. |
Returns a logical of length 1. Returns TRUE if x is of type codelist or a data.frame that conforms to the requirements of a code list.
Find out which elements of a vector have missing values
is.missing(x, codelist = attr(x, "codelist"))
x |
vector for which the missing elements have to be detected. |
codelist |
a |
Unlike is.na, is.missing will also return TRUE for elements of x whose values are indicated in the code list to be missing values. For that to work, codelist needs to be a valid codelist with a 'missing' column. This column needs to be interpretable as a logical vector. When codelist is missing or does not contain a 'missing' column, the result of is.missing is the same as is.na.
Returns a logical vector of the same length as x with TRUE indicating corresponding values in x that can be considered to be missing.
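The rule described above (NA values plus codes flagged in the 'missing' column) can be sketched in base R. The code list and vector below are hypothetical, and the expression is an illustration of the rule rather than the package's implementation.

```r
# Hypothetical code list where code 9 denotes "unknown"
cl_df <- data.frame(code = c(1, 2, 9), label = c("yes", "no", "unknown"),
  missing = c(FALSE, FALSE, TRUE), stringsAsFactors = FALSE)
x <- c(1, 9, NA, 2)

# Codes flagged as missing in the code list
missing_codes <- cl_df$code[as.logical(cl_df$missing)]
# An element is missing when it is NA or one of the flagged codes
res <- is.na(x) | x %in% missing_codes
print(res)
```

Without a 'missing' column, `missing_codes` would be empty and the result would reduce to is.na(x), in line with the description above.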
Convert vector with codes to factor using a code list
## S3 method for class 'code'
labels(
  object,
  missing = TRUE,
  droplevels = FALSE,
  codelist = attr(object, "codelist"),
  locale = cl_locale(codelist),
  ...
)

to_labels(
  x,
  codelist = attr(x, "codelist"),
  missing = TRUE,
  droplevels = FALSE,
  locale = cl_locale(codelist)
)
object |
vector with codes. Should be of the same type as the codes in the codelist. |
missing |
convert codes that are marked as missing values in the code list to missing values. |
droplevels |
remove labels that do not occur in |
codelist |
a |
locale |
use the codes from the given locale. Should be a character vector of length 1. |
... |
ignored |
x |
vector with codes. Should be of the same type as the codes in the codelist. |
to_labels calls labels.code directly and is meant as a substitute for labels.code for objects that are not of type 'code'.
A factor vector with the same length as x.
data(objectsales)
data(objectcodes)
objectsales$product <- code(objectsales$product, objectcodes)
labels(objectsales$product) |> table(useNA = "ifany")
labels(objectsales$product, missing = FALSE) |> table(useNA = "ifany")
labels(objectsales$product, droplevels = TRUE) |> table(useNA = "ifany")
to_labels(c("A", "B"), codelist = objectcodes)
# is the same as
labels.code(c("A", "B"), codelist = objectcodes)
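The conversion from codes to a factor of labels can be approximated with base R's factor(). The sketch below uses a hypothetical code list and mimics the effect of missing = TRUE by leaving the flagged codes out of the levels, so they become NA. Whether the package also drops those labels from the factor levels is an assumption of this sketch.

```r
# Hypothetical code list where code 9 is flagged as a missing value
cl_df <- data.frame(code = c(1, 2, 9), label = c("yes", "no", "unknown"),
  missing = c(FALSE, FALSE, TRUE), stringsAsFactors = FALSE)
x <- c(1, 9, 2, 1)

# Keep only codes that are not flagged as missing; other codes become NA
keep <- !cl_df$missing
f <- factor(x, levels = cl_df$code[keep], labels = cl_df$label[keep])
print(f)
table(f, useNA = "ifany")
```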
Recode codes to a higher level in a hierarchy
levelcast(
  x,
  level,
  codelist = attr(x, "codelist"),
  over_level = c("error", "missing", "ignore"),
  filter_codelist = TRUE
)
x |
vector of codes to recode. This can be an object of type
|
level |
level to which to cast the codes. |
codelist |
the |
over_level |
how to handle codes that are in a higher level than the level that is cast to. The default 'error' will generate an error; 'missing' will result in missing values for those codes; 'ignore' will keep these codes. |
filter_codelist |
if |
When handling codes that are in a higher level than the level that is cast to, codes that are missing values are ignored as these are often in the highest level.
A vector with the same length as x.
cl <- codelist(
  codes = c("A", "B", "A1", "A2", "B1", "B2", "A1.1", "B2.2", "X"),
  parent = c(NA, NA, "A", "A", "B", "B", "A1", "B2", NA),
  missing = c(0, 0, 0, 0, 0, 0, 0, 0, 1)
)
x <- code(c("A1.1", "A1", "A2", "B2.2", "B2.2", NA, "B2", "X"), cl)
levelcast(x, 1)
levelcast(x, 2, over_level = "ignore")
levelcast(x, 0)
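Casting to a higher level amounts to repeatedly replacing a code by its parent until the target level is reached. The base R sketch below illustrates that idea with a hypothetical helper `cast_up` and hard-coded levels; codes that are already above the target level are left unchanged, which corresponds roughly to over_level = "ignore". It is not the package's implementation.

```r
# Hypothetical hierarchy with precomputed levels (0 = top-most)
codes  <- c("A", "B", "A1", "A2", "B1", "B2", "A1.1", "B2.2")
parent <- c(NA, NA, "A", "A", "B", "B", "A1", "B2")
level  <- c(0L, 0L, 1L, 1L, 1L, 1L, 2L, 2L)

# Hypothetical helper: walk up the tree until every code is at or above 'target'
cast_up <- function(x, target) {
  repeat {
    i <- match(x, codes)
    too_deep <- !is.na(i) & level[i] > target
    if (!any(too_deep)) break
    # Replace codes that are too deep by their parent
    x[too_deep] <- parent[i[too_deep]]
  }
  x
}

x <- c("A1.1", "A1", "A2", "B2.2", "B2")
print(cast_up(x, 1L))
print(cast_up(x, 0L))
```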
Contains fictional codes for various types of objects
Data frame with 16 records and 5 columns.
code
the code used for the object
label
label of the code
parent
the parent of the object in the hierarchy
locale
the locale of the label of the code
missing
should the code be treated as a missing value
Contains fictional data with sales of various types of objects.
Data frame with 100 records and 4 columns.
product
the code used for the object. Corresponds to codes in objectcodes.
unitprice
price per object.
quantity
number of objects sold.
totalprice
total price of sold objects.