Title: | Label Data Frames, Variables, and Values |
---|---|
Description: | Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference. |
Authors: | Robert Hartman [aut, cre], The MITRE Corporation [cph] |
Maintainer: | Robert Hartman <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.9 |
Built: | 2025-01-07 06:46:42 UTC |
Source: | CRAN |
add_factor_info
searches a data.frame for labelr-specific factor meta-data,
which it records and retains for future use. It is used by other labelr
functions and need not be used directly by labelr end users.
add_factor_info(data)
add_factor_info(data)
data |
the data.frame to which you wish to add labelr-specific factor variable meta-data attributes (if any factors are present). |
a data.frame.
# this function does not strictly require prior or other use of labelr ir2 <- add_factor_info(iris) mt2 <- add_factor_info(mtcars) get_factor_info(mtcars) # none get_factor_info(iris) # none get_factor_info(mt2) # none get_factor_info(ir2) # some!
# this function does not strictly require prior or other use of labelr ir2 <- add_factor_info(iris) mt2 <- add_factor_info(mtcars) get_factor_info(mtcars) # none get_factor_info(iris) # none get_factor_info(mt2) # none get_factor_info(ir2) # some!
Add a 500-or-fewer-characters high-level descriptive label for your data.frame as whole (e.g., nature, originator, population / sample, year created, general contents, article citation).
add_frame_lab(data, frame.lab = NULL) afl(data, frame.lab = NULL)
add_frame_lab(data, frame.lab = NULL) afl(data, frame.lab = NULL)
data |
a data.frame. |
frame.lab |
quoted text of the descriptive data.frame label that you wish to add to the data.frame. |
add_frame_lab
assigns an overall descriptive "frame label" for a data.frame,
which can be retrieved using get_frame_lab.
Note: afl
is a compact alias for add_frame_lab
: they do the same thing,
and the former is easier to type
A data.frame, with a frame.lab attribute added to the attributes meta-data
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") attr(mt2, "frame.lab") # check for attribute get_frame_lab(mt2) # return frame.lab alongside data.frame name as a data.frame
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") attr(mt2, "frame.lab") # check for attribute get_frame_lab(mt2) # return frame.lab alongside data.frame name as a data.frame
add_lab_atts
allows one to apply a list of labelr label attribute meta-data
(created by get_all_lab_atts
) to a data.frame.
add_lab_atts( data, lab.atts.list, strip.first = FALSE, num.convert = FALSE, clean = TRUE )
add_lab_atts( data, lab.atts.list, strip.first = FALSE, num.convert = FALSE, clean = TRUE )
data |
a data.frame object. |
lab.atts.list |
a list previously created using |
strip.first |
FALSE if you do not wish to strip the data.frame of all label attribute information it may already have (this information may still be overwritten, depending on what is in the lab.atts.list list). |
num.convert |
attempt to convert to numeric any data.frame variables where this can be done without producing new NA values. |
clean |
after adding label attributes, put them into a neat, logical order and drop any attributes that describe variables (columns) not present in the data.frame to which they have been added. |
See get_all_lab_atts
.
add_lab_atts
allows one to add or restore label attributes from a free-
standing list (created by get_all_lab_atts
) to a data.frame.
a data.frame object with label attribute information (re-) attached (if it exists in the specified lab.atts.list).
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_val_labs(df, "raceth") # it's here zlab.df <- get_all_lab_atts(df) # back up labelr attributes for df df <- strip_labs(df) # this removes labs from df get_val_labs(df, "raceth") # it's gone check_any_lab_atts(df) # FALSE (means "no labs here") df <- add_lab_atts(df, zlab.df) # restore them
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_val_labs(df, "raceth") # it's here zlab.df <- get_all_lab_atts(df) # back up labelr attributes for df df <- strip_labs(df) # this removes labs from df get_val_labs(df, "raceth") # it's gone check_any_lab_atts(df) # FALSE (means "no labs here") df <- add_lab_atts(df, zlab.df) # restore them
For a single value-labeled column of a data.frame, create a copy of that column that replaces all of its values with the corresponding value labels and added that copy to the supplied data.frame.
add_lab_col1(data, var, suffix = "_lab") alc1(data, var, suffix = "_lab")
add_lab_col1(data, var, suffix = "_lab") alc1(data, var, suffix = "_lab")
data |
a data.frame. |
var |
the unquoted name of the column (variable) whose values you wish to replace with the corresponding value labels. |
suffix |
a suffix that will be appended to the name of the labels-on column that is added to the data.frame (e.g., if suffix = "_lab," the labels-on version of "x1" will be "x1_lab"). |
Note 1: add_lab_col1
is a variant of add_lab_cols
that allows you to
specify only one variable at a time but that allows you to pass its name
without quoting it (compare add_lab_col1(mtcars, am) to
add_lab_cols(mtcars, "am")).
Note 2: alc1
is a compact alias for add_lab_col1
: they do the same thing,
and the former is easier to type.
Note 3: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use add_lab_cols()
.
add_lab_col1
creates a "labels-on" version of a value-labeled column and
adds that new column to the supplied data.frame. Here, "labels-on" means that
the column's original values are replaced with the corresponding value
labels. Note that this column does not replace but is added to its
parent/source columns in the returned data.frame. The resulting "labels-on"
column is a simple, self-contained character column that cannot itself be
converted or reverted to the original ("labels-off") values of its
parent/source column. See add_lab_cols
for a list of other functions that
may be useful in working with value labels.
A data.frame consisting of the originally supplied data.frame, along with the labels-on column added to it.
# add "labels-on" version of "am" to copy of mtcars df <- mtcars # copy of mtcars # now, add value labels df <- add_val1( data = df, var = am, vals = c(0, 1), labs = c("automatic", "manual") ) # add value labels-on version of "am" to df, assign to df_plus df_plus <- add_lab_col1(df, am) head(df_plus[c("am", "am_lab")])
# add "labels-on" version of "am" to copy of mtcars df <- mtcars # copy of mtcars # now, add value labels df <- add_val1( data = df, var = am, vals = c(0, 1), labs = c("automatic", "manual") ) # add value labels-on version of "am" to df, assign to df_plus df_plus <- add_lab_col1(df, am) head(df_plus[c("am", "am_lab")])
Add copies of value-labeled columns to a data.frame, where the new columns' values are replaced with the corresponding value labels.
add_lab_cols(data, vars = NULL, suffix = "_lab") alc(data, vars = NULL, suffix = "_lab")
add_lab_cols(data, vars = NULL, suffix = "_lab") alc(data, vars = NULL, suffix = "_lab")
data |
a data.frame. |
vars |
the names of the columns (variables) for which "labels-on" (values replaced with value labels) versions of the variable will be added to the returned data.frame. |
suffix |
a suffix that will be added to the names of all labels-on variables added to the data.frame (the non-suffix portion of the variable name will be identical to the original variable, e.g., the labels-on version of "x1" will be "x1_lab" (or whatever alternative suffix you supply)). |
Note: alc
is a compact alias for add_lab_cols
: they do the same thing,
and the former is easier to type.
add_lab_cols
adds one or more "labels-on" columns to a data.frame, where
"labels-on" means that the column's original values are replaced with the
corresponding value labels. Note that these columns do not replace but are
added to their parent/source columns in the returned data.frame. The
resulting "labels-on" columns are simple, self-contained character columns
that cannot themselves be converted or reverted to the original
("labels-off") values of their parent/source columns.
For other ways of accessing or leveraging value labels, see, e.g.,
use_val_labs
, val_labs_vec
, add_lab_dummies
, lab_int_to_factor
,
flab
, slab
, get_val_labs
, with_val_labs
, headl
, taill
, somel
,
and tabl
. In particular, see use_val_labs
if, rather than adding a
"labels-on" column to a data.frame, you wish to replace a column's values
with the corresponding value labels. See val_labs_vec
if you wish to
convert a single, value-labeled column's values to labels and return the
result as a stand-alone vector.
A data.frame consisting of the originally supplied data.frame, along with (all or the select) labels-on variable versions added to it.
# one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) # add value labels-on versions of the foregoing to df and return as "df_plus" df_plus <- add_lab_cols(df) head(df_plus) head(df_plus[c("am", "am_lab")])
# one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) # add value labels-on versions of the foregoing to df and return as "df_plus" df_plus <- add_lab_cols(df) head(df_plus) head(df_plus[c("am", "am_lab")])
For a single value-labeled data.frame column, create a dummy (aka indicator) variable for each of that column's unique value labels.
add_lab_dumm1( data, var, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 ) ald1( data, var, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 )
add_lab_dumm1( data, var, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 ) ald1( data, var, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 )
data |
a data.frame with at least one value-labeled variable (column). |
var |
the unquoted name of the value-labeled variable (column) from which dummy variable columns will be generated. |
simple.names |
if TRUE (the default), dummy variable names will be the parent variable's name, followed by the sep separator (see above), followed by an automatically generated numerical id suffix. For example two dummy variable columns created from value-labeled column "tacos" using the sep argument of "." would be given the respective names "tacos.1" and "tacos.2"). |
sep |
the separator character to use in constructing dummy variable column names (appears between the dummy variable name prefix and suffix). |
prefix.length |
(NOTE: This argument is ignored if simple.names = TRUE). A 1L integer indicating the number of leading characters of the parent column's name to use in constructing dummy variable column names. For example, if simple.names = FALSE, if prefix.length = 2, and for a parent column named "tacos", each derivative dummy variable column name will begin with the prefix string "ta," (corresponding to the first two characters of "tacos"), followed by the sep separator character (see sep param, above), followed by the suffix string (see suffix.length param, below). |
suffix.length |
(NOTE: This argument is ignored if simple.names = TRUE). A 1L integer indicating the number of leading characters of each variable value label to use use in constructing dummy variable column names. For example, consider the following setup: parent column name is "tacos"; prefix.length = 3; sep = "_", and suffix.length = 2. In this case, if simple.names = FALSE, then a dummy variable column named "tac_so" would be created to represent those values of the tacos" column that have the value label "soft" (because "tac" are the first three letters of the parent column name, the separator is ".", and "so" are the first two characters in "soft"). |
Note 1: add_lab_dumm1
is a variant of add_lab_dummies
that allows you
to specify only one var to label at a time but that allows you to pass its
name without quoting.
Note 2: ald1
is a compact alias for add_lab_dumm1
: they do the same thing,
and the former is easier to type
Note 3: If the default of simple.names is used, dummy variable column names will be the "parent" variable column name, followed by a separator character (by default, "_"), followed by a number, to differentiate each dummy variable from the others in the set. If one of the automatically generated dummy column names is already "taken" by a pre-existing data.frame column, an error to this effect will be thrown. If simple.names = FALSE, then prefix.length and suffix.length arguments will be used to construct dummy variable column names using the leading characters of the parent column name, followed by a separator character, followed by the leading characters of the value label. (white spaces in the value label will be replaced with the separator character).
Note 4: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use add_lab_dummies()
.
A data.frame with dummy variables added for all value labels of the value-labeled column supplied to the var argument.
# one variable at a time, mtcars df <- mtcars # now, add 1-to-1 value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) # add many-to-1 value labels df <- add_m1_lab( data = df, vars = "gear", vals = 4:5, lab = "4+" ) # add quartile-based numerical range value labels df <- add_quant_labs( data = df, vars = "disp", qtiles = 4 ) # add "pretty" cut-based numerical range value labels (mpg_bins <- pretty(range(df$mpg, na.rm = TRUE))) df <- add_quant_labs(data = df, vars = "mpg", vals = mpg_bins) # add dummy variables for value labels of column "mpg" df1 <- add_lab_dumm1(df, var = mpg, simple.names = TRUE ) # simple.names = TRUE is default df1 # add dummy variables for value labels of column "am" df2 <- add_lab_dumm1(df, am, sep = ".", simple.names = FALSE, prefix.length = 2, suffix.length = 6 ) df2
# one variable at a time, mtcars df <- mtcars # now, add 1-to-1 value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) # add many-to-1 value labels df <- add_m1_lab( data = df, vars = "gear", vals = 4:5, lab = "4+" ) # add quartile-based numerical range value labels df <- add_quant_labs( data = df, vars = "disp", qtiles = 4 ) # add "pretty" cut-based numerical range value labels (mpg_bins <- pretty(range(df$mpg, na.rm = TRUE))) df <- add_quant_labs(data = df, vars = "mpg", vals = mpg_bins) # add dummy variables for value labels of column "mpg" df1 <- add_lab_dumm1(df, var = mpg, simple.names = TRUE ) # simple.names = TRUE is default df1 # add dummy variables for value labels of column "am" df2 <- add_lab_dumm1(df, am, sep = ".", simple.names = FALSE, prefix.length = 2, suffix.length = 6 ) df2
For one or more value-labeled data.frame columns, create a dummy (aka indicator) variable for each unique value label.
add_lab_dummies( data, vars, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 ) ald( data, vars, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 )
add_lab_dummies( data, vars, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 ) ald( data, vars, simple.names = TRUE, sep = "_", prefix.length = 4, suffix.length = 7 )
data |
a data.frame with at least one value-labeled variable (column). |
vars |
the value-labeled variable or variables from which dummy variable columns will be generated (variable names must be quoted). |
simple.names |
if TRUE (the default), dummy variable names will be the parent variable's name, followed by the sep separator (see above), followed by an automatically generated numerical id suffix. For example two dummy variable columns created from value-labeled column "tacos" using the sep argument of "." would be given the respective names "tacos.1" and "tacos.2"). |
sep |
the separator character to use in constructing dummy variable column names (appears between the dummy variable name prefix and suffix). |
prefix.length |
(NOTE: This argument is ignored if simple.names = TRUE). A 1L integer indicating the number of leading characters of the parent column's name to use in constructing dummy variable column names. For example, if simple.names = FALSE, if prefix.length = 2, and for a parent column named "tacos", each derivative dummy variable column name will begin with the prefix string "ta," (corresponding to the first two characters of "tacos"), followed by the sep separator character (see sep param, above), followed by the suffix string (see suffix.length param, below). |
suffix.length |
(NOTE: This argument is ignored if simple.names = TRUE). A 1L integer indicating the number of leading characters of each variable value label to use in constructing dummy variable column names. For example, consider the following setup: parent column name is "tacos"; prefix.length = 3; sep = "", and suffix.length = 2. In this case, if simple.names = FALSE, then a dummy variable column named "tac_so" would be created to represent those values of the tacos" column that have the value label "soft" (because "tac" are the first three letters of the parent column name, the separator is "", and "so" are the first two characters in "soft"). |
If the default of simple.names is used, dummy variable column names will be the "parent" variable column name, followed by a separator character (by default, "_"), followed by a number, to differentiate each dummy variable from the others in the set. If one of the automatically generated dummy column names is already "taken" by a pre-existing data.frame column, an error to this effect will be thrown. If simple.names = FALSE, then prefix.length and suffix.length arguments will be used to construct dummy variable column names using the leading characters of the parent column name, followed by a separator character, followed by the leading characters of the value label. (white spaces in the value label will be replaced with the separator character).
Note: ald()
is an alias function that behaves identically to
add_lab_dummies
.
A data.frame with dummy variables added for all value labels of the value-labeled columns supplied to the vars argument.
# one variable at a time, mtcars df <- mtcars # now, add 1-to-1 value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) # add many-to-1 value labels df <- add_m1_lab( data = df, vars = "gear", vals = 4:5, lab = "4+" ) # add quartile-based numerical range value labels df <- add_quant_labs( data = df, vars = "disp", qtiles = 4 ) # add "pretty" cut-based numerical range value labels (mpg_bins <- pretty(range(df$mpg, na.rm = TRUE))) df <- add_quant_labs(data = df, vars = "mpg", vals = mpg_bins) # add dummy variables for the labels of column "am" df2 <- add_lab_dummies(df, "am", sep = ".", simple.names = FALSE, prefix.length = 2, suffix.length = 6 ) df2 # add dummy variables for the labels of columns "mpg", "gear", and "cyl df3 <- add_lab_dummies(df, c("mpg", "gear", "cyl"), simple.names = TRUE) # default df3
# one variable at a time, mtcars df <- mtcars # now, add 1-to-1 value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) # add many-to-1 value labels df <- add_m1_lab( data = df, vars = "gear", vals = 4:5, lab = "4+" ) # add quartile-based numerical range value labels df <- add_quant_labs( data = df, vars = "disp", qtiles = 4 ) # add "pretty" cut-based numerical range value labels (mpg_bins <- pretty(range(df$mpg, na.rm = TRUE))) df <- add_quant_labs(data = df, vars = "mpg", vals = mpg_bins) # add dummy variables for the labels of column "am" df2 <- add_lab_dummies(df, "am", sep = ".", simple.names = FALSE, prefix.length = 2, suffix.length = 6 ) df2 # add dummy variables for the labels of columns "mpg", "gear", and "cyl df3 <- add_lab_dummies(df, c("mpg", "gear", "cyl"), simple.names = TRUE) # default df3
Apply a single variable value label to multiple values of a variable ("m1" is shorthand for "many values get one label").
add_m1_lab( data, vars, vals, lab, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE ) am1l( data, vars, vals, lab, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE )
add_m1_lab( data, vars, vals, lab, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE ) am1l( data, vars, vals, lab, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE )
data |
a data.frame. |
vars |
a character vector that corresponds to the name(s) of one or more variables to which value labels will be added. |
vals |
a vector of distinct values of the actual variable, each of which is to be associated with the single label supplied to the lab argument. Note: NA and other "irregular" (e.g., NaN, Inf) values all are automatically assigned the label "NA", and this cannot be overridden. |
lab |
a single distinct label that will be associated with all values specified in your vals argument. Note: NA and other "irregular" (e.g., NaN, Inf) values are automatically assigned the label "NA" and may not be assigned another label. |
partial |
To apply the same value labeling scheme to many variables at once, you can provide those variable names explicitly (e.g., vars = c("x1","x2", "x3") or vars = paste0("x", 1:3), or you can provide a substring only and set partial = TRUE (default is FALSE). For example, to apply the same labeling scheme to vars "x1", "x2" ... sequentially through "x10", you could use vars = c("x"), along with partial = TRUE. Be careful with this, as it also will attempt to apply the scheme to "sex" or "tax.bracket", etc. |
not.vars |
use of the partial argument can result in situations where
you inadvertently attempt to value-label a variable. For example, if vars="x"
and partial=TRUE, then |
max.unique.vals |
|
init |
assign placeholder labels for variables that lack decimals and meet the max.unique.vals threshold. |
'
add_m1_lab(and
add1m1) allows the user to assign the same value label to multiple distinct values of a variable ("m1" is short for "many-to-one"). This is in contrast to
add_val_labsand
add_val1', which require a strict
one-to-one mapping of distinct variable values and distinct value labels.
Note 1: Each call to add_m1_lab
accepts only one value label, which may be
applied to multiple distinct values of the specified column(s). Additional
labels can be applied to other values of the same column(s) by making
additional calls to add_m1_lab
(see the example).
Note 2: am1l
is a compact alias for add_m1_lab
: they do the same thing,
and the former is easier to type
A data.frame, with new variable value labels added (call
get_val_labs
to see them), other provisional/default labelr label
information added, and previous user-added labelr label information
preserved.
df <- mtcars df <- add_m1_lab(df, vars = "carb", vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add_m1_lab(df, vars = "carb", vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) get_val_lab1(df, carb) head(use_val_labs(df), 8) # they're there
df <- mtcars df <- add_m1_lab(df, vars = "carb", vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add_m1_lab(df, vars = "carb", vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) get_val_lab1(df, carb) head(use_val_labs(df), 8) # they're there
Add descriptive variable name labels (up to one per column) to the columns of a data.frame.
add_name_labs( data, name.labs = NULL, vars = NULL, labs = NULL, init.max = NULL ) anl(data, name.labs = NULL, vars = NULL, labs = NULL, init.max = NULL)
add_name_labs( data, name.labs = NULL, vars = NULL, labs = NULL, init.max = NULL ) anl(data, name.labs = NULL, vars = NULL, labs = NULL, init.max = NULL)
data |
the data.frame you wish to begin labeling. |
name.labs |
a named character vector, where names are current data.frame column (variable) names, and where values are the proposed labels. If this is NULL, vars and labs arguments may not be NULL. If latter are not NULL, this (name.labs) argument must be NULL. |
vars |
the names of the columns (variables) to which name labels will be applied. If NULL, labs arg must also be NULL and name.labs cannot be NULL. |
labs |
the proposed variable name labels to applied to the columns (variables). If non-NULL, vars arg must also be non-NULL. |
init.max |
If non-NULL, this must be a 1L integer, indicating the maximum number of unique values that a variable may have for it to receive placeholder value labels, which will consist of the variable's actual values coerced to character values. If NULL, or if the variable is numeric with decimal values, the variable will not be given initialized variable value labels. |
Note: anl
is a compact alias for add_name_labs
: they do the same thing,
and the former is easier to type.
add_name_labs
works with get_name_labs
, use_name_labs
, and
use_var_names
to facilitate the creation, accessing, and substitution of
variable name labels for variable names.
Each variable (column) of a data.frame can receive one and only one "name
label," which typically is a noun phrase that expounds the meaning of
contents of the variable's name (e.g., "Weight in ounces at birth" might be a
name label for a column called "wgt"). add_name_labs
takes a data.frame and
either a named character vector (names are current variable names, values are
proposed name labels) supplied to the name.labs arg or two separate character
vectors (one each for current variable names and proposed variable name
labels, respectively) supplied to vars and labs args, respectively. If using
the second approach, the order of each entry matters (e.g., the first
variable name entry to the vars argument will be given the label of the first
name label entry to the labs argument, and so on).
Note that any non-name-labeled columns will receive their own names as
default name labels (e.g., if var "mpg" of mtcars is not assigned a name
label, it will be given the default name label of "mpg"). Note also that
other labelr functions (e.g., add_val_labs
) will initialize name labels
and other labelr attribute meta-data in this same fashion. Name labels can
be removed with drop_name_labs
.
A data.frame, with new name labels added (call get_name_labs
to
see them), other provisional/default labelr label information added, and
previous user-added labelr label information preserved.
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df) # use these df_labs_as_names <- use_name_labs(df) head(df_labs_as_names)[1:3] # these are verbose, so, only show first three head(df)[1:3] # now revert back df_names_as_before <- use_var_names(df_labs_as_names) head(df_names_as_before)[1:3] # indeed, they are as before identical(head(df), head(df_names_as_before)) # strip name label meta-data information from df # NOT same as use_var_names(), which preserves the info but "turns it off" # this strips the name labels meta-data from df altogether df <- drop_name_labs(df) # see what we have get_name_labs(df) # they're gone # alternative syntax (if you have a named vector like names_labs_vec) # assign variable name labels df <- add_name_labs(df, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) # replace two variable name labels, keeping the others df <- add_name_labs(df, name.labs = c( "disp" = toupper("displacement"), "mpg" = toupper("miles per gallon") ) ) attributes(df) # show all attributes get_name_labs(df) # show only the variable name labels get_name_labs(df, var = c("disp", "mpg")) # again, strip name label meta-data information from df # NOT same as use_var_names(), which preserves the info but "turns it off" df <- drop_name_labs(df) # see what we have get_name_labs(df) # they're gone # alternative syntax to add name labels df <- add_name_labs(df, vars = c("carb", "am"), labs = c("how many carburetors?", "automatic or stick?") ) # see what we have get_name_labs(df) # they're back! (and placeholders for others) # add another df <- add_name_labs(df, vars = c("mpg"), labs = c("miles per gallon, of course") ) # see what we have get_name_labs(df) # it's been added, and others preserved head(use_name_labs(df)[c(1, 9, 11)]) # verbose, but they're there
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df) # use these df_labs_as_names <- use_name_labs(df) head(df_labs_as_names)[1:3] # these are verbose, so, only show first three head(df)[1:3] # now revert back df_names_as_before <- use_var_names(df_labs_as_names) head(df_names_as_before)[1:3] # indeed, they are as before identical(head(df), head(df_names_as_before)) # strip name label meta-data information from df # NOT same as use_var_names(), which preserves the info but "turns it off" # this strips the name labels meta-data from df altogether df <- drop_name_labs(df) # see what we have get_name_labs(df) # they're gone # alternative syntax (if you have a named vector like names_labs_vec) # assign variable name labels df <- add_name_labs(df, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) # replace two variable name labels, keeping the others df <- add_name_labs(df, name.labs = c( "disp" = toupper("displacement"), "mpg" = toupper("miles per gallon") ) ) attributes(df) # show all attributes get_name_labs(df) # show only the variable name labels get_name_labs(df, var = c("disp", "mpg")) # again, strip name label meta-data information from df # NOT same as use_var_names(), which preserves the info but "turns it off" df <- drop_name_labs(df) # see what we have get_name_labs(df) # they're gone # alternative syntax to add name labels df <- add_name_labs(df, vars = c("carb", "am"), labs = c("how many carburetors?", "automatic or stick?") ) # see what we have get_name_labs(df) # they're back! (and placeholders for others) # add another df <- add_name_labs(df, vars = c("mpg"), labs = c("miles per gallon, of course") ) # see what we have get_name_labs(df) # it's been added, and others preserved head(use_name_labs(df)[c(1, 9, 11)]) # verbose, but they're there
Add variable-specific value labels based on threshold cuts of a numerical variable.
add_quant_labs( data, vars, qtiles = NULL, vals = NULL, labs = NULL, partial = FALSE, not.vars = NULL ) aql( data, vars, qtiles = NULL, vals = NULL, labs = NULL, partial = FALSE, not.vars = NULL )
add_quant_labs( data, vars, qtiles = NULL, vals = NULL, labs = NULL, partial = FALSE, not.vars = NULL ) aql( data, vars, qtiles = NULL, vals = NULL, labs = NULL, partial = FALSE, not.vars = NULL )
data |
a data.frame. |
vars |
a character vector that corresponds to the name(s) of one or more variables to which value threshold-based labels will be added. |
qtiles |
the number of quantile categories to employ (e.g., 4 would indicate quartiles, 5 would indicate quintiles, 10 for deciles, etc.). If NULL, vals must be non-NULL. |
vals |
one more values of vars that will define range cutpoints, such that all values at or below a given number and above the preceding val will be treated as part of the same numerical range for labeling purposes. If NULL, qtiles must be non-NULL. |
labs |
a character vector of distinct labels to identify the quantiles. If left NULL, convention "q" + quantile (e.g., "q10") will be used for qtile-based labels (i.e., if qtiles arg is non-NULL), and convention "<=" + val will be used for vals argument-based labels (i.e., if vals arg is non-NULL). Note that the labels "NA" and "Other" are (non-case-sensitively) reserved and may not be user-supplied. |
partial |
To apply the same numerical value labeling scheme to many variables at once, you can provide those variable names explicitly (e.g., vars = c("x1","x2", "x3") or vars = paste0("x", 1:3), or you can provide a substring only and set partial = TRUE (default is FALSE). For example, to apply the same labeling scheme to vars "x1", "x2" ... sequentially through "x10", you could use vars = c("x"), along with partial = TRUE. Be careful with this, as it also will attempt to apply the scheme to "sex" or "tax.bracket", etc. (See not.vars argument for a way to mitigate this.) |
not.vars |
use of the partial argument can result in situations where
you inadvertently attempt to value-label a variable. For example, if vars="x"
and partial=TRUE, then |
Note: aql
is a compact alias for add_quant_labs
: they do the same thing,
and the former is easier to type.
Numerical variables that feature decimals or large numbers of distinct values
are not eligible to receive conventional value labels. add_quant_labs
allows
one to label such variables according to user-supplied value thresholds
(i.e., cutpoints) OR quantile membership, Thus, unlike value labels added
with add_val_labs
(and add_val1
), add_quant_labs
(and add_quant1
)
will apply the same value label to all values that fall within the numerical
value range defined by each threshold (cutpoint). For still another
value-labeling approach, see add_m1_lab
(and add1m1
).
Note: Quantity labels cannot be added incrementally through repeated calls
to add_quant_labs
: each new call will overwrite all value labels applied to
the specified vars in any previous add_quant_labs
calls. This is in
contrast to add_val_labs
(which allows for incremental value-labeling) and
add_m1_lab
(which requires incremental value-labeling).
A data.frame, with new variable value labels added (call
get_val_labs
to see them), other provisional/default labelr label
information added, and previous user-added labelr label information
preserved.
# mtcars demo df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # label variable "mpg" in terms of 5 quintiles df <- add_quant_labs(data = df, vars = "mpg", qtiles = 5) # label variable "disp" in terms of "pretty" cutpoints vals2use <- pretty(c(min(df$disp), max(df$disp)))[-1] # establish cutpoints df <- add_quant_labs(data = df, vars = "disp", vals = vals2use) df_labson <- use_val_labs(df) head(df_labson)
# mtcars demo df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # label variable "mpg" in terms of 5 quintiles df <- add_quant_labs(data = df, vars = "mpg", qtiles = 5) # label variable "disp" in terms of "pretty" cutpoints vals2use <- pretty(c(min(df$disp), max(df$disp)))[-1] # establish cutpoints df <- add_quant_labs(data = df, vars = "disp", vals = vals2use) df_labson <- use_val_labs(df) head(df_labson)
Add variable-specific value labels based on threshold cuts of a single numerical variable.
add_quant1(data, var, qtiles = NULL, vals = NULL, labs = NULL) aql1(data, var, qtiles = NULL, vals = NULL, labs = NULL)
add_quant1(data, var, qtiles = NULL, vals = NULL, labs = NULL) aql1(data, var, qtiles = NULL, vals = NULL, labs = NULL)
data |
a data.frame. |
var |
the unquoted name of the variable to which value labels will be added. |
qtiles |
the number of quantile categories to employ (e.g., 4 would indicate quartiles, 5 would indicate quintiles, 10 for deciles, etc.). If NULL, vals must be non-NULL. |
vals |
one more values of var that will define range cutpoints, such that all values at or below a given number and above the preceding val will be treated as part of the same numerical range for labeling purposes. If NULL, qtiles must be non-NULL. |
labs |
a character vector of distinct labels to identify the quantiles. If left NULL, convention "q" + quantile (e.g., "q10") will be used for qtile-based labels (i.e., if qtiles arg is non-NULL), and convention "<=" + val will be used for vals argument-based labels (i.e., if vals arg is non-NULL). Note that the labels "NA" and "Other" are (non-case-sensitively) reserved and may not be user-supplied. |
add_quant1
is a variant of add_quant_labs
that allows you to specify only
one var to label but allows you to pass its name without quoting it
(compare add_quant1
(mtcars, mpg) to add_quant_labs
(mtcars, "mpg").
Numerical variables that feature decimals or large numbers of distinct values
are not eligible to receive conventional value labels. add_quant1
allows
one to label such variables according to user-supplied value thresholds
(i.e., cutpoints) OR quantile membership, Thus, unlike value labels added
with add_val_labs
(and add_val1
), add_quant1
(and add_quant_labs
)
will apply the same value label to all values that fall within the numerical
value range defined by each threshold (cutpoint). For still another
value-labeling approach, see add_m1_lab
(and add1m1
).
Note 1: Quantity labels cannot be added incrementally through repeated calls
to add_quant1
: each new call will overwrite all value labels applied to
the specified vars in any previous add_quant1
calls. This is in contrast to
add_val_labs
(which allows for incremental value-labeling) and add_m1_lab
(which requires incremental value-labeling).
Note 2: aql1
is a compact alias for add_quant1
: they do the same thing, and
the former is easier to type
Note 3: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use add_quant_labs()
.
A data.frame, with new variable value labels added (call
get_val_labs
to see them), other provisional/default labelr label
information added, and previous user-added labelr label information
preserved.
# mtcars demo df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # label variable "mpg" in terms of 5 quintiles df <- add_quant1(data = df, mpg, qtiles = 5) # label variable "disp" in terms of "pretty" cutpoints vals2use <- pretty(c(min(df$disp), max(df$disp)))[-1] # establish cutpoints df <- add_quant1(data = df, disp, vals = vals2use) df_labson <- use_val_labs(df) head(df_labson)
# mtcars demo df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # label variable "mpg" in terms of 5 quintiles df <- add_quant1(data = df, mpg, qtiles = 5) # label variable "disp" in terms of "pretty" cutpoints vals2use <- pretty(c(min(df$disp), max(df$disp)))[-1] # establish cutpoints df <- add_quant1(data = df, disp, vals = vals2use) df_labson <- use_val_labs(df) head(df_labson)
Add variable value-specific, descriptive value labels to a data.frame.
add_val_labs( data, vars, vals, labs, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE ) avl( data, vars, vals, labs, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE )
add_val_labs( data, vars, vals, labs, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE ) avl( data, vars, vals, labs, partial = FALSE, not.vars = NULL, max.unique.vals = 10, init = FALSE )
data |
a data.frame. |
vars |
a character vector that corresponds to the name(s) of one or more variables to which value labels will be added. |
vals |
a vector of distinct values of the actual variable, each of which
is to be associated with a label supplied to the labs argument in the same
positional order (e.g., vals = c(1,0), labs = c("manual", "automatic") will
associate lab "manual" with val 1 and lab "automatic" with val 0.). Note:
NA and other "irregular" (e.g., NaN, Inf) values all are automatically
assigned the label "NA", and this cannot be overridden. Note that you do not
need to specify all unique vals of var, and you can supply value labels
incrementally, one (or a few, or all) unique vals of var at a time. Once
you've added the value label, it is bound to that value until you drop it
(see |
labs |
a character vector of distinct label values, each of which is to be associated with exactly one corresponding distinct value (vals argument element) of the variable(s) identified in the vars argument. The order of labs argument must match that of vals argument entries (e.g., if a three-element vector of values is supplied to vals, then a three-element vector of proposed labels must be supplied to labs, and the first value of vals will get the first label of labs, the second value of vals will get the second label of labs, etc.). Note: NA and other "irregular" (e.g., NaN, Inf) values are automatically assigned the label "NA" and may not be assigned another label. |
partial |
To apply the same value labeling scheme to many variables at once, you can provide those variable names explicitly (e.g., vars = c("x1","x2", "x3") or vars = paste0("x", 1:3), or you can provide a substring only and set partial = TRUE (default is FALSE). For example, to apply the same labeling scheme to vars "x1", "x2" ... sequentially through "x10", you could use vars = c("x"), along with partial = TRUE. Be careful with this, as it also will attempt to apply the scheme to "sex" or "tax.bracket", etc. |
not.vars |
use of the partial argument can result in situations where
you inadvertently attempt to value-label a variable. For example, if vars="x"
and partial=TRUE, then |
max.unique.vals |
|
init |
assign placeholder labels for variables that lack decimals and meet the max.unique.vals threshold. |
Note: avl
is a compact alias for add_val_labs
: they do the same thing,
and the former is easier to type
add_val_labs
is intended for associating value labels with binary,
nominal, or ordinal (e.g., integer) variables, where each of a limited number
of distinct values is to be associated one-to-one with a distinct value label.
To assign labels to ranges of numerical variables, see add_quant_labs
(or
add_quant1
). To apply the same label to multiple distinct values of a
variable, see add_m1_lab
or add1m1
.
add_val_labs
works with other labelr functions (e.g., add_val1
,
drop_val_labs
, get_val_labs
, use_val_labs
, add_lab_cols
) to
facilitate the creation, accessing, modification, use, or deletion of
variable value labels.
When using add_val_labs
or add_val1
, each distinct variable value can
receive one and only one value label, and for any given variable, each unique
label can be assigned to only one unique value (e.g., mtcars$gear==3 and
mtcars$gear==4 cannot both share a single "3 or 4 gears" label: each of these
two distinct values must have its own label). This latter constraint may be
relaxed by using add_m1_lab
.
If partial = TRUE, add_val_labs
will apply the specified labeling scheme to
all variables that contain a key variable name substring of interest
(supplied to the vars argument), which may be one or more variables found in
the data.frame (see Example #2).
A data.frame, with new variable value labels added (call
get_val_labs
to see them), other provisional/default labelr label
information added, and previous user-added labelr label information
preserved.
# Example #1 - mtcars example, one variable at a time # one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars" arg) df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) head(use_val_labs(df), 3) # they're there # Example #2 - (Fake) Likert Data # add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1) # keep a copy dflik_conv <- use_val_labs(dflik) head(dflik_conv, 3)
# Example #1 - mtcars example, one variable at a time # one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars" arg) df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) head(use_val_labs(df), 3) # they're there # Example #2 - (Fake) Likert Data # add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1) # keep a copy dflik_conv <- use_val_labs(dflik) head(dflik_conv, 3)
Add variable value-specific, descriptive value labels to a data.frame.
add_val1(data, var, vals, labs, max.unique.vals = 10, init = FALSE) avl1(data, var, vals, labs, max.unique.vals = 10, init = FALSE)
add_val1(data, var, vals, labs, max.unique.vals = 10, init = FALSE) avl1(data, var, vals, labs, max.unique.vals = 10, init = FALSE)
data |
a data.frame. |
var |
the unquoted name of the variable (column) to which value labels will be added. |
vals |
a vector of distinct values of the actual variable, each of which
is to be associated with a label supplied to the labs argument in the same
positional order (e.g., vals = c(1,0), labs = c("manual", "automatic") will
associate lab "manual" with val 1 and lab "automatic" with val 0.). Note:
NA and other "irregular" (e.g., NaN, Inf) values all are automatically
assigned the label "NA", and this cannot be overridden. Note that you do not
need to specify all unique vals of var, and you can supply value labels
incrementally, one (or a few, or all) unique vals of var at a time. Once
you've added the value label, it is bound to that value until you drop it
(see |
labs |
a character vector of distinct label values, each of which is to be associated with exactly one corresponding distinct value (vals argument element) of the variable identified in the var argument. The order of labs argument must match that of vals argument entries (e.g., if a three-element vector of values is supplied to vals, then a three- element vector of proposed labels must be supplied to labs, and the first value of vals will get the first label of labs, the second value of vals will get the second label of labs, etc.). Note: NA and other "irregular" (e.g., NaN, Inf) values are automatically assigned the label "NA" and may not be assigned another label. |
max.unique.vals |
|
init |
assign placeholder labels for variables that lack decimals and meet the max.unique.vals threshold. |
add_val1
is intended for associating value labels with binary, nominal, or
ordinal (e.g., integer) variables, where each of a limited number of distinct
values is to be associated one-to-one with a distinct value label. To assign
labels to ranges of numerical variables, see add_quant_labs
(or
add_quant1
). To apply the same label to multiple distinct values of a
variable, see add_m1_lab
or add1m1
.
add_val1
works with other labelr functions (e.g., add_val_labs
,
drop_val_labs
, get_val_labs
, use_val_labs
, add_lab_cols
) to
facilitate the creation, accessing, modification, use, or deletion of
variable value labels.
Note 1: add_val1
is a variant of add_val_labs
that allows you to specify
only one var to label at a time but that allows you to pass its name without
quoting it (compare add_val1(mtcars, am) to add_val_labs(mtcars, "am").
Note 2: avl1
is a compact alias for add_val1
: they do the same thing,
and the former is easier to type
Note 3: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use add_val_labs()
.
A data.frame, with new name labels added (call get_val_labs
to see
them), other provisional/default labelr label information added, and previous
user-added labelr label information preserved.
# one variable at a time, mtcars df <- mtcars # add value labels # first, using add_val_labs() -- add_val1() example is below df <- add_val_labs( data = df, vars = "carb", # note, vars arg; add_val1() takes var arg vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # now, using add_val1(), where single var arg can be unquoted (cyl, not "cyl") # note that this is not add_val_labs(); df <- add_val1( data = df, var = cyl, # note, var arg, not vars arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) )
# one variable at a time, mtcars df <- mtcars # add value labels # first, using add_val_labs() -- add_val1() example is below df <- add_val_labs( data = df, vars = "carb", # note, vars arg; add_val1() takes var arg vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # now, using add_val1(), where single var arg can be unquoted (cyl, not "cyl") # note that this is not add_val_labs(); df <- add_val1( data = df, var = cyl, # note, var arg, not vars arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) )
Apply a single variable value label to multiple values of a variable ("m1" is shorthand for "many values get one label").
add1m1(data, var, vals, lab, max.unique.vals = 10, init = FALSE)
add1m1(data, var, vals, lab, max.unique.vals = 10, init = FALSE)
data |
a data.frame. |
var |
the unquoted name of the variable (column) to which value labels will be added. |
vals |
a vector of distinct values of the actual variable, each of which
is to be associated with the label supplied to the lab argument. Note: NA and
other "irregular" (e.g., NaN, Inf) values all are automatically assigned the
label "NA", and this cannot be overridden. Note that you do not need to
specify all unique vals of var, and you can supply value labels
incrementally, one (or a few, or all) unique vals of var at a time. However,
if you do this, do not re-use a value label or repeat a value-label
assignment you have already made: Once you've added the value label, it is
bound to those values until you drop the label (see |
lab |
a single distinct label that will be associated with all values specified in your vals argument. Note: NA and other "irregular" (e.g., NaN, Inf) values are automatically assigned the label "NA" and may not be assigned another label. |
max.unique.vals |
|
init |
assign placeholder labels for variables that lack decimals and meet the max.unique.vals threshold. |
Note 1: add1m1
is a variant of add_m1_lab
that allows you to specify
only one var to label but allows you to pass its name without quoting it
(compare add1m1
(mtcars, am, ...) to add_m1_lab
(mtcars, "carb", ...).
Note 2: add1m1
(and add_m1_lab
) allows the user to assign the same value
label to multiple distinct values of a variable ("m1" is short for
"many-to-one"). This is in contrast to add_val1
(and add_val_labs
), which
requires a strict one-to-one mapping of distinct variable values and distinct
value labels.
Note 3: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use add_m1_lab()
.
A data.frame, with new variable value labels added (call
get_val_labs
to see them), other provisional/default labelr label
information added, and previous user-added labelr label information
preserved.
df <- mtcars df <- add1m1(df, var = carb, vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add1m1(df, var = carb, vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) head(use_val_labs(df), 8) # they're there
df <- mtcars df <- add1m1(df, var = carb, vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add1m1(df, var = carb, vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) head(use_val_labs(df), 8) # they're there
Add variable-specific quantile-based value labels to all numeric variables of a data.frame that meet specified conditions.
all_quant_labs(data, qtiles = 5, not.vars = NULL, unique.vals.thresh = 10) allq(data, qtiles = 5, not.vars = NULL, unique.vals.thresh = 10)
all_quant_labs(data, qtiles = 5, not.vars = NULL, unique.vals.thresh = 10) allq(data, qtiles = 5, not.vars = NULL, unique.vals.thresh = 10)
data |
a data.frame. |
qtiles |
the number of quantile categories to employ (e.g., 4 would indicate quartiles, 5 would indicate quintiles, 10 for deciles, etc.). |
not.vars |
used to specify any numeric variables that should be exempted from this operation. |
unique.vals.thresh |
an integer. Numeric variables with fewer than this many unique variables will be exempted from the operation (i.e., will NOT receive quantile value labels). |
Note: allq
is a compact alias for all_quant_labs
: they do the same thing,
and the former is easier to type.
Numerical variables that feature decimals or large numbers of distinct values
are not eligible to receive conventional add_val_labs()
-style value labels.
all_quant_labs
allows one to label such variables based on quantile
thresholds.
A data.frame, with new variable value labels added.
# mtcars demo df <- mtcars get_val_labs(df) # none # add quintile val labs for all numeric vars with >10 unique vals df <- all_quant_labs(data = df, qtiles = 5, unique.vals.thresh = 10) get_val_labs(df) # here now headl(df) # show them; note this is labelr::headl(), not utils::head()
# mtcars demo df <- mtcars get_val_labs(df) # none # add quintile val labs for all numeric vars with >10 unique vals df <- all_quant_labs(data = df, qtiles = 5, unique.vals.thresh = 10) get_val_labs(df) # here now headl(df) # show them; note this is labelr::headl(), not utils::head()
For a given vector, does the length of (number of values in) the vector equal the number of unique values in the vector?
Note: all_univ
is a compact alias for all_uniquev
: they do the same
thing, and the former is easier to type
all_uniquev(x, na.rm = TRUE) all_univ(x, na.rm = TRUE)
all_uniquev(x, na.rm = TRUE) all_univ(x, na.rm = TRUE)
x |
a vector. |
na.rm |
a logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds. |
a 1L logical.
all_uniquev(mtcars$am) # FALSE set.seed(35994) z <- runif(25) all_univ(z) # TRUE; all_univ is an alias for all_uniquev() z[c(1, 2)] <- NA # two NA values added all_univ(z, na.rm = FALSE) # FALSE, because the two NA values are not unique
all_uniquev(mtcars$am) # FALSE set.seed(35994) z <- runif(25) all_univ(z) # TRUE; all_univ is an alias for all_uniquev() z[c(1, 2)] <- NA # two NA values added all_univ(z, na.rm = FALSE) # FALSE, because the two NA values are not unique
as_base_data_frame
noisily converts an augmented data.frame to a Base R
data.frame.
as_base_data_frame(data, fact.to.char = FALSE, irreg.to.na = FALSE) adf(data, fact.to.char = FALSE, irreg.to.na = FALSE)
as_base_data_frame(data, fact.to.char = FALSE, irreg.to.na = FALSE) adf(data, fact.to.char = FALSE, irreg.to.na = FALSE)
data |
a data.frame object. |
fact.to.char |
coerce all factor variables to character variables. |
irreg.to.na |
convert all irregular values (see |
Note: adf
is a compact alias for as_base_data_frame
: they do the same
thing, and the former is easier to type
To minimize dependencies and complexities, labelr label-assigning functions are designed to work exclusively with Base R data.frames, not alternative data structures like matrices or augmented data.frames, such as data.tables or tibbles. The suggested labeling workflow is to first assign and work with labels using a Base R data.frame and then convert the resulting object to an augmented data.frame as desired and without any assumption that labelr labels or functions will smoothly interoperate with the augmented data.frame construct or functions that depend on it.
as_base_data_frame
determines whether data argument is a conventional Base
R data.frame, some kind of augmented data.frame (e.g., data.table, tibble),
or not a data.frame at all (e.g., matrix). If the object has multiple
classes, one of which is a data.frame, the object is coerced to be a
conventional Base R data.frame, and a message to that effect is issued. If
the supplied object is not any kind of data.frame (i.e., a matrix is not any
kind of data.frame, while a data.table is a kind of data.frame), an error is
thrown. If the supplied object already is a Base R data.frame with no
additional classes (i.e., not an augmented data.frame), that supplied object
is returned with no changes made and no messages.
a data.frame object with any additional classes removed.
x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) dft <- tibble::as_tibble(df) class(dft) df_vanilla <- as_base_data_frame(dft) class(df_vanilla)
x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) dft <- tibble::as_tibble(df) class(dft) df_vanilla <- as_base_data_frame(dft) class(df_vanilla)
as_base_data_frame2
noisily converts an augmented data.frame to a Base R
data.frame, with any factors converted to character vectors, and any irregular
values (see irregular2()
) converted to NA.
as_base_data_frame2(data, fact.to.char = TRUE, irreg.to.na = TRUE) adf2(data, fact.to.char = TRUE, irreg.to.na = TRUE)
as_base_data_frame2(data, fact.to.char = TRUE, irreg.to.na = TRUE) adf2(data, fact.to.char = TRUE, irreg.to.na = TRUE)
data |
a data.frame object. |
fact.to.char |
coerce all factor variables to character variables. |
irreg.to.na |
convert all irregular values (see |
Note: adf2
is a compact alias for as_base_data_frame
: they do the same
thing, and the former is easier to type
as_base_data_frame2
is a variant of as_base_data_frame
with different
default values for fact.to.char and irreg.to.na. Whereas both of these
default to FALSE in as_base_data_frame
, they both default to TRUE in
as_base_data_frame2
. This is the only difference between the two functions.
As such, as_base_data_frame2
is intended as a simple shortcut to save
typing if one prefers to reverse these default logical argument values.
a data.frame object with any additional classes removed.
iris_tib <- tibble::as_tibble(iris) class(iris_tib) iris_tib$Sepal.Length[1] <- Inf head(iris_tib, 1) iris_df <- as_base_data_frame2(iris_tib) class(iris_df) sapply(iris_df, class) head(iris_df, 1)
iris_tib <- tibble::as_tibble(iris) class(iris_tib) iris_tib$Sepal.Length[1] <- Inf head(iris_tib, 1) iris_df <- as_base_data_frame2(iris_tib) class(iris_df) sapply(iris_df, class) head(iris_df, 1)
as_labeled_data_frame
quietly assigns the class labeled.data.frame to a
data.frame and discards other augmented data.frame classes, returning a
data.frame with classes labeled.data.frame and data.frame.
as_labeled_data_frame(data, fact.to.char = FALSE, irreg.to.na = FALSE) aldf(data, fact.to.char = FALSE, irreg.to.na = FALSE)
as_labeled_data_frame(data, fact.to.char = FALSE, irreg.to.na = FALSE) aldf(data, fact.to.char = FALSE, irreg.to.na = FALSE)
data |
a data.frame object. |
fact.to.char |
coerce all factor variables to character variables. |
irreg.to.na |
convert all irregular values (see |
Note 1: aldf
is a compact alias for as_labeled_data_frame
: they do the same
thing, and the former is easier to type
Note 2: as_labeled_data_frame
is used internally by other labelr commands
and is not intended for interactive use.
To minimize dependencies and complexities, labelr label-assigning functions are designed to work exclusively with Base R data.frames, not alternative data structures like matrices or augmented data.frames, such as data.tables or tibbles.
as_labeled_data_frame
determines whether data argument is a conventional
Base R data.frame, some kind of augmented data.frame (e.g., data.table,
tibble), or not a data.frame at all (e.g., matrix). If the submitted object
is a type of data.frame, the object will be returned with the class
labeled.data.frame applied, the class data.frame retained, and any other
previous class attributes discarded. If the supplied object is not any kind
of data.frame (i.e., a matrix is not any kind of data.frame, while a data.table
is a kind of data.frame), an error is thrown.
an object of classes labeled.data.frame and data.frame, with any additional classes removed.
x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) dft <- tibble::as_tibble(df) class(dft) dfl <- as_labeled_data_frame(dft) class(dfl)
x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) dft <- tibble::as_tibble(df) class(dft) dfl <- as_labeled_data_frame(dft) class(dfl)
as_num
identifies the character variables of a data.frame that can be
coerced to numeric without generating new NA values and, for those variables
where this can be done, it makes those conversions (similar to Stata's
destring command).
as_num(data, nan2na = TRUE, inf2na = TRUE)
as_num(data, nan2na = TRUE, inf2na = TRUE)
data |
a data.frame object. |
nan2na |
a logical argument. TRUE if the non-case-sensitive string "nan" should be converted to NA. |
inf2na |
a logical argument. TRUE if the non-case-sensitive strings "inf" or "-inf should be converted to NA. |
Core labelr functions coerce integers to characters and back, which as_num
facilitates. Note that character values of "NA" (including "na", "Na",
and "nA") will be converted to NA and, by default, so will other "irregular"
values (in the sense of check_irregular
).
a data.frame object with all applicable character variables coerced to numeric.
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) head(df, 3) sapply(df, class) class(df$x2) df <- as_num(df) head(df,3) sapply(df, class) class(df$x2)
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) head(df, 3) sapply(df, class) class(df$x2) df <- as_num(df) head(df,3) sapply(df, class) class(df$x2)
as_numv
determines whether a character vector can be coerced to numeric
without generating new NA values and, if so, it makes that conversion
(similar to Stata's destring command).
as_numv(x, nan2na = TRUE, inf2na = TRUE)
as_numv(x, nan2na = TRUE, inf2na = TRUE)
x |
a character vector. |
nan2na |
a logical argument. TRUE if the non-case-sensitive string "nan" should be converted to NA. |
inf2na |
a logical argument. TRUE if the non-case-sensitive strings "inf" or "-inf should be converted to NA. |
Core labelr functions coerce integers to characters and back, which
as_numv
facilitates. Note that character values of "NA" (including "na",
"Na", and "nA") will be converted to NA and, by default, so will other
"irregular" values (in the sense of check_irregular
).
a vector, converted to numeric if feasible (else, the same character vector that was supplied).
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x2_num <- as_numv(x2) class(x2) class(x2_num) head(x2)
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x2_num <- as_numv(x2) class(x2) class(x2_num) head(x2)
axis_lab
accepts a data.frame and single unquoted variable name and returns
that variable's name label for use in axis labeling or plot labeling function
options.
axis_lab(data, var) alb(data, var)
axis_lab(data, var) alb(data, var)
data |
a data.frame. |
var |
the unquoted name of a variable that exists in the data.frame and
is name-labeled (using |
Note 1: alb
is a compact alias for axis_lab
: they do the same thing,
and the former is easier to type.
Note 2: This command is intended exclusively for interactive use. In particular, the var argument must be the literal name of a single variable (column) found in the supplied data.frame and may NOT be, e.g., the name of a character vector that contains the variable (column name) of interest.
a 1L character vector with var's name label.
# copy mtcars to df # create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "wt" = "Weight (1000 lbs)" ) df <- add_name_labs(df, name.labs = names_labs_vec) # ggplot example of axis_lab() library(ggplot2) p <- ggplot(df, aes(mpg, wt, color = cyl)) + geom_point() p <- p + labs(color = axis_lab(df, cyl)) + xlab(axis_lab(df, mpg)) + ylab(axis_lab(df, wt)) # Base R plot example (using alb() alias) with(df, plot(mpg, wt, xlab = alb(df, mpg), ylab = alb(df, wt) ))
# copy mtcars to df # create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "wt" = "Weight (1000 lbs)" ) df <- add_name_labs(df, name.labs = names_labs_vec) # ggplot example of axis_lab() library(ggplot2) p <- ggplot(df, aes(mpg, wt, color = cyl)) + geom_point() p <- p + labs(color = axis_lab(df, cyl)) + xlab(axis_lab(df, mpg)) + ylab(axis_lab(df, wt)) # Base R plot example (using alb() alias) with(df, plot(mpg, wt, xlab = alb(df, mpg), ylab = alb(df, wt) ))
check_any_lab_atts
returns FALSE if your data.frame has no labelr-generated
meta-data attributes (still) associated with it (at all or of a specific
sub-type), and TRUE if it does.
check_any_lab_atts(data, labs = "any")
check_any_lab_atts(data, labs = "any")
data |
the data.frame you are checking for the presence (or absence) of labelr meta-data. |
labs |
which label meta-data you are looking for. Default of "any" will look for types "frame.lab", "name.labs", "val.labs", and "factor." (period is part of the substring), which are the core labelr meta-data. To search more narrowly, you can try things like labs = "val.labs", labs="name.labs", etc. |
By default (labs = "any"), this function looks to see if your data.frame's attributes includes any attribute with a name containing the substring "frame.lab", "name.labs", "val.labs", and/or "factor." These are the core substrings of the label meta-data attributes that labelr creates and manipulates. If you wish to narrow your search for a specific labelr, attribute, you may supply this as a character (sub)string (e.g., check_any_lab_atts(df, "val.labs.cyl") to see if the variable "cyl" has variable value label meta-data). But make sure that your second argument is meaningful (e.g., check_any_lab_atts(iris, "row.") will return TRUE true based on the presence of a standard "row.names" attribute, which has nothing to do with labels.
TRUE if any instance of the default or user-specified meta-data attribute is found, FALSE if not.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) check_any_lab_atts(df)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) check_any_lab_atts(df)
check_class
determines whether a vector's class is among those specified.
check_class( x, classes = c("numeric", "integer", "logical", "character", "factor", "ordered"), strict = TRUE )
check_class( x, classes = c("numeric", "integer", "logical", "character", "factor", "ordered"), strict = TRUE )
x |
the vector to check against specified classes. |
classes |
a character vector of classes against which x is checked. |
strict |
If TRUE, all of x's classes must be among those specified in the classes argument. If FALSE, at least one but not necessarily all of x's classes must be among those specified in the classes argument. |
By default (strict = TRUE), if a vector is of multiple classes, all of its classes must be among those specified via the classes argument.
a 1L logical vector indicating whether x's class is found among those passed to the classes argument.
check_class(mtcars$mpg) # TRUE check_class(mtcars$mpg, classes = c("numeric", "factor")) # TRUE check_class(iris$Species) # TRUE check_class(iris$Species, classes = c("logical", "numeric")) # FALSE check_class(mtcars$mpg, classes = c("logical", "character", "factor")) # FALSE
check_class(mtcars$mpg) # TRUE check_class(mtcars$mpg, classes = c("numeric", "factor")) # TRUE check_class(iris$Species) # TRUE check_class(iris$Species, classes = c("logical", "numeric")) # FALSE check_class(mtcars$mpg, classes = c("logical", "character", "factor")) # FALSE
Check a vector for the presence of "irregular" values, defined as NA values, other arbitrary values you specify, and (by default): NaN, Inf, -Inf, and character variants of same (i.e., upper, lower, or mixed-case variants of "NA","NAN","INF","-INF").
check_irregular( x, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF"), other = NULL, any = FALSE )
check_irregular( x, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF"), other = NULL, any = FALSE )
x |
an atomic vector to checked for the presence of (any) NA values. |
nan.include |
treat NaN values as NA (i.e., return TRUE if present). |
inf.include |
treat Inf and -Inf values as NA (i.e., return TRUE if present). |
special |
a modifiable set of default character values that will be treated as equivalent to NA values (i.e., will return TRUE if present). |
other |
an argument for additional values of arbitrary but consistent
class (e.g., all numeric, all character) that will be treated as equivalent
to NA values (i.e., |
any |
if TRUE, return a 1L vector that is TRUE if any irregular/NA-esque value is found in the vector x, FALSE if no such value is found; if any=FALSE, function will return a logical value for every element of x (TRUE if that specific value meets the "irregular"-ity test). |
check_irregular
is used by core labelr functions (e.g., add_val_labs
) to
ensure that NA and other irregular (e.g., Inf) values are handled in a simple
and consistent – and, hence, rigid – fashion. It is not intended as a user-
facing command as part of a labelr data-analytic workflow, though it may be
useful in other applications where one wishes to test a vector against a
focal and user-extensible class of NA-esque (or other) offending values.
A logical vector (1L if any==TRUE; length of x if any==FALSE).
# below is FALSE, because there is nothing NA-like in this vector check_irregular(1:10) # below is TRUE, because we're treating 99 as "NA-esque" check_irregular(1:100, other = 99) # below is TRUE, because of NA val check_irregular(c(1:100, NA)) # below is TRUE, because nan.include is on (by default) check_irregular(c(1:100, NaN), nan.include = TRUE) # below is TRUE, because inf.include is on (by default) check_irregular(c(1:100, Inf), inf.include = TRUE) # below is TRUE, because inf.include is on (by default) check_irregular(c(1:100, -Inf), inf.include = TRUE) # below is FALSE, it's just letters check_irregular(letters) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "NA")) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "NAN")) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "-iNf")) # below is FALSE, search for irregular vals is not substring/regex-based check_irregular(c(letters, "nan-iNf"))
# below is FALSE, because there is nothing NA-like in this vector check_irregular(1:10) # below is TRUE, because we're treating 99 as "NA-esque" check_irregular(1:100, other = 99) # below is TRUE, because of NA val check_irregular(c(1:100, NA)) # below is TRUE, because nan.include is on (by default) check_irregular(c(1:100, NaN), nan.include = TRUE) # below is TRUE, because inf.include is on (by default) check_irregular(c(1:100, Inf), inf.include = TRUE) # below is TRUE, because inf.include is on (by default) check_irregular(c(1:100, -Inf), inf.include = TRUE) # below is FALSE, it's just letters check_irregular(letters) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "NA")) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "NAN")) # below is TRUE - see default vals for arg special (function not case-sens) check_irregular(c(letters, "-iNf")) # below is FALSE, search for irregular vals is not substring/regex-based check_irregular(c(letters, "nan-iNf"))
check_labs_att
returns TRUE if your data.frame has the specific attribute
indicated and FALSE if it does not.
check_labs_att(data, att = NULL)
check_labs_att(data, att = NULL)
data |
the data.frame you are checking for the presence (or absence) of labelr meta-data. |
att |
the specific label meta-data you are looking for. Default of NULL will return TRUE if any valid labelr meta-data item of types "frame.lab", "name.labs", "val.labs", or "factor." (period is part of the substring) is present. |
TRUE if any instance of the default or user-specified meta-data attribute is found, FALSE if not.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "race" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) check_labs_att(df) # is any valid labelr lab(el) attribute present? check_labs_att(df, "val.labs.race") # "race" lab specifically TRUE
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "race" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) check_labs_att(df) # is any valid labelr lab(el) attribute present? check_labs_att(df, "val.labs.race") # "race" lab specifically TRUE
Drops name.lab and val.lab attributes associated with columns that are not present in the data.frame (i.e., have been dropped) and re-arranges data.frame attributes so that they appear in a clean, logical order.
clean_data_atts(data)
clean_data_atts(data)
data |
a data.frame. |
labelr meta-data exist as data.frame attributes, added through interactive
use in a potentially haphazard order. This function, which is used inside
other labelr functions, drops labels for variables that are not (no longer)
present in the data.frame and re-arranges label and other data.frame
attributes to put them in a more, logical, user-readable order when accessed
via, e.g., attributes()
.
A data.frame, with attributes re-arranged.
# make toy demographic (age, gender, raceth) data set set.seed(555) df <- make_demo_data(n = 1000) # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Latino", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" df <- add_val1( data = df, gender, vals = c(0, 1, 2), labs = c("M", "F", "O"), max.unique.vals = 50 ) # let's add variable NAME labels df <- add_name_labs(df, name.labs = c( "age" = "Age in years", "raceth" = "raceth category", "gender" = "gender assigned at birth" )) # let's add a frame label df <- add_frame_lab(df, frame.lab = "This is a fictional data set that includes demographic variables. It is generated by labelr::make_demo_data") # show attributes attributes(df) # re-arrange and show attributes df2 <- clean_data_atts(df) attributes(df2) # confirm that attributes from df are all present in df2 all(attributes(df) %in% attributes(df2)) # TRUE
# make toy demographic (age, gender, raceth) data set set.seed(555) df <- make_demo_data(n = 1000) # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Latino", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" df <- add_val1( data = df, gender, vals = c(0, 1, 2), labs = c("M", "F", "O"), max.unique.vals = 50 ) # let's add variable NAME labels df <- add_name_labs(df, name.labs = c( "age" = "Age in years", "raceth" = "raceth category", "gender" = "gender assigned at birth" )) # let's add a frame label df <- add_frame_lab(df, frame.lab = "This is a fictional data set that includes demographic variables. It is generated by labelr::make_demo_data") # show attributes attributes(df) # re-arrange and show attributes df2 <- clean_data_atts(df) attributes(df2) # confirm that attributes from df are all present in df2 all(attributes(df) %in% attributes(df2)) # TRUE
Convert a data.frame with Haven package-style labels to a data.frame with
labelr name labels and add_val_labs
-style one-to-one, value labels.
convert_labs(data, max.unique.vals = 50)
convert_labs(data, max.unique.vals = 50)
data |
the data.frame with Haven-style vector value label attributes. |
max.unique.vals |
constrains the variables that may receive value labels to those whose total unique values do not exceed the integer value supplied to this argument. Note that labelr sets a hard ceiling of 5000 on the total number of unique value labels that any variable is permitted to have under any circumstance, as labelr is primarily intended for interactive use with moderately-sized (<=~1M-row) data.frames. |
a data.frame.
# convert haven vector labels to labelr value labels library(haven) library(tibble) x1 <- labelled(1:8, c(good = 1, bad = 5)) x2 <- labelled(1:8, c(good = 1, mediocre = 4, bad = 5, horrible = 8)) # make this a tibble hdf <- tibble::tibble(x1, x2) hdf # how it looks # convert value labels to labelr label values hdf1 <- convert_labs(hdf) # show select values of hdf1 head(hdf1) # show that labelr labels are there for the using head(use_val_labs(hdf1)) # filter hdf1 using x1's "bad" labelr value label (with flab()) head(flab(hdf1, x1 == "bad"), 3) # filter hdf1 using x1's "good" value label (with flab()) head(flab(hdf1, x1 == "good"), 3) # return select rows and columns with slab() slab(hdf1, x2 %in% c("good", 2), x2) slab(hdf1, x2 %in% c("good", 2), x1)
# convert haven vector labels to labelr value labels library(haven) library(tibble) x1 <- labelled(1:8, c(good = 1, bad = 5)) x2 <- labelled(1:8, c(good = 1, mediocre = 4, bad = 5, horrible = 8)) # make this a tibble hdf <- tibble::tibble(x1, x2) hdf # how it looks # convert value labels to labelr label values hdf1 <- convert_labs(hdf) # show select values of hdf1 head(hdf1) # show that labelr labels are there for the using head(use_val_labs(hdf1)) # filter hdf1 using x1's "bad" labelr value label (with flab()) head(flab(hdf1, x1 == "bad"), 3) # filter hdf1 using x1's "good" value label (with flab()) head(flab(hdf1, x1 == "good"), 3) # return select rows and columns with slab() slab(hdf1, x2 %in% c("good", 2), x2) slab(hdf1, x2 %in% c("good", 2), x1)
Note: copy_var
copies an existing variable and its value labels from a
data.frame to another new or (if force = TRUE) existing variable of the
data.frame.
copy_var(data, from.var, to.var, force = FALSE)
copy_var(data, from.var, to.var, force = FALSE)
data |
a data.frame to which variable value labels will be added. |
from.var |
the unquoted name of the variable whose values and labels will be assigned to the to.var. This variable must presently exist in the data.frame. |
to.var |
the unquoted name of the variable to which the from.var's values and labels will be assigned. If force = FALSE, this must be a new variable name (one that does not refer to a variable that already exists in the data.frame). |
force |
if to.var already exists in the data.frame, allow it to be overwritten. If FALSE, this will not be allowed, and an error will be issued. |
Any non-labelr R operation that changes a variable's (column's) name or that
copies its contents to another variable (column) with a different name will
not associate the original variable's value labels with the new variable name.
To mitigate this, copy_var
allows one to copy both a variable (column) and
its value labels and assign those to another variable.
A data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 4) df <- copy_var(df, from.var = raceth, to.var = re_copy) df <- copy_var(df, from.var = x1, to.var = var1) head(df, 4) get_val_labs(df)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 4) df <- copy_var(df, from.var = raceth, to.var = re_copy) df <- copy_var(df, from.var = x1, to.var = var1) head(df, 4) get_val_labs(df)
Remove the frame label attribute (see add_frame_lab
) from a data.frame, if
one is present.
drop_frame_lab(data) dfl(data)
drop_frame_lab(data) dfl(data)
data |
the data.frame with a frame label that you wish to drop (and which
was added using |
See add_frame_lab
for more on this labeling construct.
Note: dfl
is a compact alias for drop_frame_lab
: they do the same thing,
and the former is easier to type.
a data.frame (with any previously applied frame.lab attribute removed).
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") get_frame_lab(mt2) # return frame.lab alongside data.frame name as a data.frame drop_frame_lab(mt2) # remove this frame.lab get_frame_lab(mt2) # the data.frame name now doubles as its frame label is.null(attributes(data)[["frame.lab"]]) # the attribute is NULL
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") get_frame_lab(mt2) # return frame.lab alongside data.frame name as a data.frame drop_frame_lab(mt2) # remove this frame.lab get_frame_lab(mt2) # the data.frame name now doubles as its frame label is.null(attributes(data)[["frame.lab"]]) # the attribute is NULL
Remove one or more descriptive variable name label attributes previously
added to a data.frame using add_name_labs
.
Note: dnl
is a compact alias for drop_name_labs
: they do the same thing,
and the former is easier to type
drop_name_labs(data, vars = NULL) dnl(data, vars = NULL)
drop_name_labs(data, vars = NULL) dnl(data, vars = NULL)
data |
the data.frame with name labels that you wish to drop (and which
were added using |
vars |
the names of the columns/variables (not the name labels) whose name labels you wish to drop. If NULL, all variables' name labels will be dropped. |
drop_name_labs
works with add_name_labs
, get_name_labs
and
use_name_labs
to facilitate creation, accessing, substitution, and removal
of variable name labels for variable names. Each variable (column) of a
data.frame can receive one and only one "name label," which typically is a
noun phrase that expounds the meaning or contents of the variable's name
(e.g., "Weight in ounces at birth" might be a viable name label for a column
called "wgt"). drop_name_labs
takes a data.frame and (optionally) a
character vector of variables whose name labels should be discarded: If only
a data.frame is provided, all variable name labels will be dropped. You can
assign new name labels using new calls to add_name_labs
(If you wish to
change some or all name labels, you do not need to call drop_name_labs
: you
can simply pass the new name labels to add_name_labs
, and they will
overwrite the old ones (including any automatically generated provisional
ones), while leaving in place any previously added name labels that you do
not explicitly replace).
The same data.frame you submitted, except that the selected name label attribute meta-data has been removed.
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df) # drop the name label for var/col "am" df <- drop_name_labs(df, "am") # see what this did to the name label for "am" get_name_labs(df) # now, drop all of the name labels df <- drop_name_labs(df) get_name_labs(df) # they're gone
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df) # drop the name label for var/col "am" df <- drop_name_labs(df, "am") # see what this did to the name label for "am" get_name_labs(df) # now, drop all of the name labels df <- drop_name_labs(df) get_name_labs(df) # they're gone
Drop all value labels previously applied to one or more variables using
add_val_labs
, add_quant_labs
,add_m1_lab
, and related functions (e.g.,
add_val1
) or aliases (e.g., avl
).
Note: dvl
is a compact alias for drop_val_labs
: they do the same thing,
and the former is easier to type
drop_val_labs(data, vars = NULL, partial = FALSE, not.vars = NULL) dvl(data, vars = NULL, partial = FALSE, not.vars = NULL)
drop_val_labs(data, vars = NULL, partial = FALSE, not.vars = NULL) dvl(data, vars = NULL, partial = FALSE, not.vars = NULL)
data |
a data.frame. |
vars |
a character vector that corresponds to the name(s) (or substring within the name(s), if partial = TRUE) of one or more variables form which value labels will be removed. If NULL, all value labels will be removed from all value-labeled variables. |
partial |
To drop labels for many, similarly named variables (e.g., "x1" through "x20"), you can provide a substring only and set partial = TRUE (default is FALSE). For example, to drop value labels for colnames "x1" through "x20", you could use vars = c("x"), along with partial = TRUE. Be careful with this, as it also will attempt to drop value labels for columns with colnames "sex" or "tax.bracket" (etc.), because they, too, contain an "x" in their names). |
not.vars |
use of the partial argument can result in situations where
you inadvertently attempt to drop value labels for a variable. For example,
if vars="x", and partial=TRUE, then |
drop_val_labs
works with other labelr functions (e.g., add_val_labs
,
get_val_labs
, use_val_labs
, add_lab_cols
) to facilitate the creation,
accessing, modification, use, or deletion of variable value labels.
A data.frame, with all value labels dropped from specified variables.
# make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) dfdrop <- drop_val_labs(dflik, vars = c("x2", "y3"), partial = FALSE ) # var x2's value labels are gone, like we asked for get_val_labs(dfdrop, "x2") # var x1's value labels are intact, b/c we didn't ask to drop them get_val_labs(dfdrop, "x1") dfxgone <- drop_val_labs(dflik, c("x"), partial = TRUE ) # still a lot of value labels, but all are for "y" vars, # ...none is left for "x" vars get_val_labs(dfxgone)
# make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) dfdrop <- drop_val_labs(dflik, vars = c("x2", "y3"), partial = FALSE ) # var x2's value labels are gone, like we asked for get_val_labs(dfdrop, "x2") # var x1's value labels are intact, b/c we didn't ask to drop them get_val_labs(dfdrop, "x1") dfxgone <- drop_val_labs(dflik, c("x"), partial = TRUE ) # still a lot of value labels, but all are for "y" vars, # ...none is left for "x" vars get_val_labs(dfxgone)
Drop all value labels previously applied to one or more variables using
add_val_labs
, add_quant_labs
,add_m1_lab
, and related functions (e.g.,
add_val1
) or aliases (e.g., avl
).
Note: dvl1
is a compact alias for drop_val1
: they do the same thing,
and the former is easier to type
drop_val1(data, var) dvl1(data, var)
drop_val1(data, var) dvl1(data, var)
data |
a data.frame. |
var |
the unquoted name of the variable whose value labels will be dropped. |
Note: drop_val1
is the drop_val_labs
analogue to add_val1
: just as
add_val1
is a variant of add_val_labs
that allows you to specify only one
variable at a time unquoted for value labeling, drop_val1
allows you to
pass one unquoted variable name at a time for value dropping. See those
functions for further details regarding the conventions.
A data.frame, with all value labels dropped from specified variable.
# make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val1( data = dflik, var = x3, vals = vals2label, labs = labs2use ) # see what this did get_val_labs(dflik, "x3") dfdrop <- drop_val1(dflik, var = x3 ) # odd choice, but ok # var x3's value labels are gone, like we asked for get_val_labs(dfdrop, "x3")
# make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val1( data = dflik, var = x3, vals = vals2label, labs = labs2use ) # see what this did get_val_labs(dflik, "x3") dfdrop <- drop_val1(dflik, var = x3 ) # odd choice, but ok # var x3's value labels are gone, like we asked for get_val_labs(dfdrop, "x3")
Convenience function to convert all factor variables to character.
fact2char(data) f2c(data)
fact2char(data) f2c(data)
data |
a data.frame object. |
a data.frame identical to data, with exception that any factors have been converted to character variables.
sapply(iris, class) head(iris) iris_ch <- fact2char(iris) sapply(iris_ch, class) head(iris_ch)
sapply(iris, class) head(iris) iris_ch <- fact2char(iris) sapply(iris_ch, class) head(iris_ch)
factor_to_lab_int
converts a factor variable (column) of a data.frame to a
value-labeled integer variable and converts the factor level labels to labelr
value labels, returning the modified data.frame.
factor_to_lab_int(data, var) f2int(data, var)
factor_to_lab_int(data, var) f2int(data, var)
data |
a data.frame object. |
var |
the (unquoted) name of a factor variable found in data. |
Note 1: f2int
is a compact alias for factor_to_lab_int
: they do the same thing,
and the former is easier to type.
Note 2: factor_to_lab_int()
is NOT an "undo" for lab_int_to_factor()
.
factor_to_lab_int()
will assign sequential integer values from 1 to k (the
number of distinct factor levels) in factor level order, and this will not
necessarily match the integer values of a variable previously subjected to a
lab_int_to_factor()
call. See extended second example below for demo.
a data.frame.
class(iris[["Species"]]) iris_df <- factor_to_lab_int(iris, Species) class(iris_df[["Species"]]) get_val_labs(iris_df, "Species") # copy data.frame mtcars to mt2 carb_orig_int <- mtcars # !NOTE! factor_to_lab_int() is NOT an "undo" for lab_int_to_factor() # Integer values will be sequential integers in factor level order # Demo this # add value labels to mtcars$carb; and assign data.frame to carb_orig_int carb_orig_int <- add_val_labs( data = mtcars, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1c", "2c", # a tad silly, but these value labels will demo the principle "3c", "4c", "6c", "8c" ) ) # carb as labeled numeric class(carb_orig_int$carb) # numeric levels(carb_orig_int$carb) # none, not a factor head(carb_orig_int$carb, 3) # compare to carb_to_int (below) mean(carb_orig_int$carb) # compare to carb_to_int (below) lm(mpg ~ carb, data = carb_orig_int) # compare to carb_to_int (below) (adj_r2_int <- summary(lm(mpg ~ carb, data = carb_orig_int))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_orig_int)) # compare to carb_to_int (below) # carb as factor carb_fac <- carb_orig_int # copy carb_orig_int to new data.frame carb_fac carb_fac <- lab_int_to_factor(carb_fac, carb) # alias int2f() also works class(carb_fac$carb) # factor levels(carb_fac$carb) # has levels head(carb_fac$carb, 3) lm(mpg ~ carb, data = carb_fac) # factor (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_fac))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_fac)) # compare to R2, AIC for carb_to_int # ??back?? to integer? Not quite. Compare carb_to_int to carb_orig_int carb_to_int <- carb_fac # copy carb_fac to carb_to_int carb_to_int <- factor_to_lab_int(carb_to_int, carb) # alias int2f() also works class(carb_to_int$carb) # Is an integer levels(carb_to_int$carb) # NOT a factor mean(carb_to_int$carb) # NOT the same as carb_orig_int lm(mpg ~ carb, data = carb_to_int) # NOT the same as carb_orig_int (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_to_int))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_to_int)) # NOT the same as carb_orig_int
class(iris[["Species"]]) iris_df <- factor_to_lab_int(iris, Species) class(iris_df[["Species"]]) get_val_labs(iris_df, "Species") # copy data.frame mtcars to mt2 carb_orig_int <- mtcars # !NOTE! factor_to_lab_int() is NOT an "undo" for lab_int_to_factor() # Integer values will be sequential integers in factor level order # Demo this # add value labels to mtcars$carb; and assign data.frame to carb_orig_int carb_orig_int <- add_val_labs( data = mtcars, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1c", "2c", # a tad silly, but these value labels will demo the principle "3c", "4c", "6c", "8c" ) ) # carb as labeled numeric class(carb_orig_int$carb) # numeric levels(carb_orig_int$carb) # none, not a factor head(carb_orig_int$carb, 3) # compare to carb_to_int (below) mean(carb_orig_int$carb) # compare to carb_to_int (below) lm(mpg ~ carb, data = carb_orig_int) # compare to carb_to_int (below) (adj_r2_int <- summary(lm(mpg ~ carb, data = carb_orig_int))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_orig_int)) # compare to carb_to_int (below) # carb as factor carb_fac <- carb_orig_int # copy carb_orig_int to new data.frame carb_fac carb_fac <- lab_int_to_factor(carb_fac, carb) # alias int2f() also works class(carb_fac$carb) # factor levels(carb_fac$carb) # has levels head(carb_fac$carb, 3) lm(mpg ~ carb, data = carb_fac) # factor (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_fac))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_fac)) # compare to R2, AIC for carb_to_int # ??back?? to integer? Not quite. Compare carb_to_int to carb_orig_int carb_to_int <- carb_fac # copy carb_fac to carb_to_int carb_to_int <- factor_to_lab_int(carb_to_int, carb) # alias int2f() also works class(carb_to_int$carb) # Is an integer levels(carb_to_int$carb) # NOT a factor mean(carb_to_int$carb) # NOT the same as carb_orig_int lm(mpg ~ carb, data = carb_to_int) # NOT the same as carb_orig_int (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_to_int))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_to_int)) # NOT the same as carb_orig_int
flab
("filter using labels") allows one to filter-subset a data.frame
based on variable-specific value label attributes.
flab(data, condition)
flab(data, condition)
data |
the data.frame from which columns will be selected. |
condition |
row-filtering conditions along the lines of base::subset() and/or dplyr::filter(), which may involve a combination of value labels (for value-labeled variables only) and actual values (for non-value-labeled variables only). |
flab
accepts a labelr value-labeled data.frame, followed by condition-
based row-filtering instructions (akin to base::subset
or dplyr::filter
),
expressed in terms of variable value labels that exist only as meta-data
(i.e., not visible using View()
, head()
, etc.), and returns the filtered
data.frame in terms of the values themselves. In other words, value labels
are supplied to the flab()
call to direct the filtering process, but those
value labels are not displayed in the cells of the returned data.frame – the
raw values themselves are. This functionality may be useful for interactively
subsetting a data.frame, where character value labels may be more intuitive
and easily recalled than the underlying variable values themselves (e.g.,
raceth=="White" & gender="F" may be more intuitive or readily recalled than
raceth==3 & gender==2).
Note 1: When using flab
, any conditional row-filtering syntax involving
value-labeled variables must be expressed in terms of those variables' value
labels, not the raw values themselves. Filtering on non-value-labeled
variables is also permitted, with those variables' filtering conditions being
expressed in terms of raw values. Further, flab()
calls may reference both
types of columns (i.e., value-labeled variables and non-value-labeled
variables), provided filtering conditions for the former are expressed in
terms of value labels.
Note 2: flab
(and labelr more broadly) is intended for moderate-sized (or
smaller) data.frames, defined loosely as those with a few million or fewer
rows. With a conventional (c. 2024) laptop, labelr operations on modest-
sized (~100K rows) take seconds (or less); with larger (> a few million rows)
data.frames, labelr may take several minutes (or run out of memory and fail
altogether!), depending on the complexity of the call and the number and type
of cells implicated in it.
See also slab
, use_val_labs
, add_val_labs
, add_val1
, add_quant_labs
, add_quant1
, get_val_labs
, drop_val_labs
. For label-preserving subsetting tools that subset in
terms of raw values (not value labels), see sfilter
, sbrac
, ssubset
, sdrop
.
a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s).
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # see what we did # get_val_labs(df) get_val_labs(df, "gender") get_val_labs(df, "raceth") # use --labels-- to filter w/ flab() ("*F*ilter *lab*el") dflab <- flab(df, raceth == "Asian" & gender == "F") head(dflab, 4) # equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter") dfsf <- sfilter(df, raceth == 3 & gender == 1) head(dfsf, 4)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # see what we did # get_val_labs(df) get_val_labs(df, "gender") get_val_labs(df, "raceth") # use --labels-- to filter w/ flab() ("*F*ilter *lab*el") dflab <- flab(df, raceth == "Asian" & gender == "F") head(dflab, 4) # equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter") dfsf <- sfilter(df, raceth == 3 & gender == 1) head(dfsf, 4)
get_all_factors
returns a list of character vectors, where each character
vector is a given factor variable's unique levels, and where the vector
is given the same name as the factor variable itself. If the data.frame
contains no factors, an empty (length 0) list is returned.
get_all_factors(data)
get_all_factors(data)
data |
the data.frame you are checking for factor variables. |
A list of 0, 1, or more character variables.
class(get_all_factors(iris)) length(get_all_factors(iris)) zz <- iris zz$u <- zz$Species # zz has two factor variables class(get_all_factors(zz)) length(get_all_factors(zz)) get_all_factors(mtcars) length(get_all_factors(mtcars))
class(get_all_factors(iris)) length(get_all_factors(iris)) zz <- iris zz$u <- zz$Species # zz has two factor variables class(get_all_factors(zz)) length(get_all_factors(zz)) get_all_factors(mtcars) length(get_all_factors(mtcars))
get_all_lab_atts
returns a list of labelr-generated meta-data attributes
attached to a data.frame, all of which should have names beginning with one
of these character strings: "frame.lab", "name.labs",
"val.labs", "factor.".
get_all_lab_atts(data, atts = c("name.labs|val.labs|frame.lab|factor."))
get_all_lab_atts(data, atts = c("name.labs|val.labs|frame.lab|factor."))
data |
the data.frame you are checking for labelr meta-data attributes. |
atts |
default is to look for all/any, but you can specify a more narrow subset (or some other, altogether-irrelevant attribute, but do not do that). |
A free-standing list of labelr attributes.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_all_lab_atts(df) # returns all; is default get_all_lab_atts(df, "val.labs") # returns only "val.labs" attributes get_all_lab_atts(df, "class") # You can (but probably should not) use this way.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_all_lab_atts(df) # returns all; is default get_all_lab_atts(df, "val.labs") # returns only "val.labs" attributes get_all_lab_atts(df, "class") # You can (but probably should not) use this way.
get_factor_atts
searches a labelr labeled data.frame for factors. If any
are found, a list of character vectors of factor levels is returned, with
each character vector being the set of unique levels for a factor variable,
and with each character vector named according to the convention "factor." +
variable name (e.g.,"factor.Species" for iris$Species). Used internally by
other labelr functions to get information about factors in labeled
data.frames.
get_factor_atts(data)
get_factor_atts(data)
data |
the labelr labeled data.frame you are checking for factor variables. |
A list of character vectors, each named according to the convention "factor." + variable name (e.g.,"factor.Species" for iris$Species). If the data.frame lacks labelr attributes or lacks factors, an empty list will be returned.
ir2 <- iris unique(ir2$Species) ir2 <- add_val_labs(ir2, vars = "Species", vals = c( "setosa", "versicolor", "virginica" ), labs = c("se", "ve", "vi") ) get_val_labs(ir2) head(use_val_labs(ir2)) get_factor_atts(iris) # no such info: iris is not labelr labeled get_factor_atts(ir2) # this one has info: it's labelr labeled
ir2 <- iris unique(ir2$Species) ir2 <- add_val_labs(ir2, vars = "Species", vals = c( "setosa", "versicolor", "virginica" ), labs = c("se", "ve", "vi") ) get_val_labs(ir2) head(use_val_labs(ir2)) get_factor_atts(iris) # no such info: iris is not labelr labeled get_factor_atts(ir2) # this one has info: it's labelr labeled
get_factor_info
searches a labelr labeled data.frame for factors. If any
are found, a data.frame is returned with the name and unique factor levels of
each, along with a logical indicator of whether the factor is ordered, with
one row per level per factor. If none are found, a one-row data.frame of NA
values is returned.
get_factor_info(data, var = NULL)
get_factor_info(data, var = NULL)
data |
the labelr labeled data.frame you are checking for factor variables. |
var |
a 1L character vector with the name of a specific variable (column) of data, if you wish to restrict query to a single variable (else, keep NULL and will get info for any relevant variables). |
A data.frame with three columns: "factor.var" (the name of the factor variable in question), "levels" (the given level of that factor, expressed as a character), and "ordered" (TRUE if so, FALSE if not). If no factors are present in the supplied data.frame, a one-row data.frame of same structure with all three cells set to NA.
ir2 <- iris unique(ir2$Species) ir2 <- add_val_labs(ir2, vars = "Species", vals = c( "setosa", "versicolor", "virginica" ), labs = c("se", "ve", "vi") ) get_val_labs(ir2) head(use_val_labs(ir2)) get_factor_info(iris) # no such info: iris is not labelr labeled get_factor_info(ir2) # this one has info: it's labelr labeled
ir2 <- iris unique(ir2$Species) ir2 <- add_val_labs(ir2, vars = "Species", vals = c( "setosa", "versicolor", "virginica" ), labs = c("se", "ve", "vi") ) get_val_labs(ir2) head(use_val_labs(ir2)) get_factor_info(iris) # no such info: iris is not labelr labeled get_factor_info(ir2) # this one has info: it's labelr labeled
For a frame-labeled data.frame, get_frame_lab
returns a derivative
1x2 data.frame that lists the data.frame name and its frame.lab attribute.
Note: gfl
is a compact alias for get_frame_lab
: they do the same thing,
and the former is easier to type.
get_frame_lab(data) gfl(data)
get_frame_lab(data) gfl(data)
data |
a data.frame. |
get_frame_lab
returns the overall descriptive "frame label" that is assigned
to a data.frame using add_frame_lab.
A 1x2 data.frame, consisting of "data.frame" and "frame.lab" values for the supplied data.frame. If the supplied data.frame does not have a frame label, its name also will be used as its frame label (i.e., both entries of the returned 1x2 data.frame will be the data.frame name).
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") attr(mt2, "frame.lab") # check for attribute # return frame.lab alongside data.frame name as a data.frame get_frame_lab(mt2)
# add frame.lab to mtcars and assign to new data.frame mt2 mt2 <- add_frame_lab(mtcars, frame.lab = "Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.") attr(mt2, "frame.lab") # check for attribute # return frame.lab alongside data.frame name as a data.frame get_frame_lab(mt2)
get_labs_att
returns the specified piece of labelr lab(el) attribute meta-
data information if it is present.
get_labs_att(data, att)
get_labs_att(data, att)
data |
the data.frame you are checking for the presence (or absence) of labelr meta-data. |
att |
the specific label meta-data you are looking for. Default of NULL will return any and all meta-data with name substring "name.labs", "val.labs", or "factor." (period is part of the substring). |
A list.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 )
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 )
For a name-labeled data.frame, get_name_labs
returns a derivative
data.frame that lists each variable and its variable name label.
Note: gnl
is a compact alias for get_name_labs
: they do the same thing,
and the former is easier to type
get_name_labs(data, vars = NULL) gnl(data, vars = NULL)
get_name_labs(data, vars = NULL) gnl(data, vars = NULL)
data |
a data.frame. |
vars |
a character vector with the name(s) of any specific variable(s) (If NULL, returned data.frame will contain all variable name labels). |
A two-column data.frame, consisting of "var" and "lab" columns, where each row corresponds to a unique variable (column) from the user- supplied data.frame.
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df)
# create a data set df <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels df <- add_name_labs(df, vars = names(names_labs_vec), labs = names_labs_vec ) # see what we have get_name_labs(df)
For a data.frame with value-labeled variables, get_val_lab1
returns a
derivative data.frame or vector that shows the value-to-label mapping for
each unique value of that value-labeled variable.
get_val_lab1(data, var, simplify = FALSE) gvl1(data, var, simplify = FALSE)
get_val_lab1(data, var, simplify = FALSE) gvl1(data, var, simplify = FALSE)
data |
a data.frame. |
var |
the unquoted name of the variable (column) for which a value-to-label look-up mapping is sought. |
simplify |
return the mapping as a named vector, not a data.frame (defaults to FALSE). |
get_val1
is a variant of get_val_labs
that allows you to specify
only one var whose value-to-label mapping you wish to look up.
Note 1: As with get_val_labs()
, get_val_lab1()
exists to provide a visual,
human-interpretable quick look at how value labels map to underlying values
and is NOT intended for use in automated querying, subsetting, or other
manipulation of those value labels. Further: Unlike get_val_labs()
, which
may return value-to-label mappings for –several– variables of potentially
different atomic types, get_val_lab1()
limits itself to returning the value
labels of a –single– variable (column) of the supplied data.frame.
For this reason, and in contrast to the behavior of get_val_labs()
, if
get_val_lab1()
's simplify argument is set to FALSE (the default), the
returned data.frame will express var values as numeric if this can be done
without creating new NA values (i.e., in the sense of as_numv()
). In
contrast, if simplify is TRUE, the look-up table information will be returned
as a named character vector.
Note 2: gvl1
is a compact alias for get_val_lab1
: they do the same thing,
and the former is easier to type
By default, a three-column data.frame, consisting of "var", "vals",
and "labs" columns, where each row corresponds to a unique value of var OR –
for variables labeled using add_quant_labs
(or add_quant1
) – the
approximate (i.e., possibly rounded) upper bound of numerical values that
fall within that label's range of coverage. If simplify is FALSE, a character
vector will returned.
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" get_val_lab1(dflik, x1) get_val_lab1(dflik, x1, simplify = TRUE)
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" get_val_lab1(dflik, x1) get_val_lab1(dflik, x1, simplify = TRUE)
For a data.frame with value-labeled variables, get_val_labs
returns a
derivative data.frame that shows the value-to-label mapping for each unique
value of each value-labeled variable.
get_val_labs(data, var = NULL) gvl(data, var = NULL)
get_val_labs(data, var = NULL) gvl(data, var = NULL)
data |
a data.frame. |
var |
a character vector with the name(s) of any specific variable(s) (If NULL, returned data.frame will contain all variable value labels). |
Note 1: get_val_labs
returns a data.frame that is intended strictly to
facilitate human-in-the-loop –visual– display and inspection of what (if
any) value label has been associated with each variable value. It is –not–
intended for use in automated querying or subsetting or as an indicator of
of the supplied data.frame's columns' underlying classes or atomic types. In
particular, all columns of the –returned– data.frame object are coerced to
character for display purposes, as a result of concatenating value
information from different variables of potentially different atomic types or
classes. For example, all elements of the "vals" column are expressed as
character even if the underlying values themselves are numeric.
Note 2: gvl
is a compact alias for get_val_labs
: they do the same thing,
and the former is easier to type
A three-column data.frame, consisting of "var", "vals", and "labs"
columns, where each row corresponds to a unique value of a value-labeled
variable (column) from the user-supplied data.frame OR – for variables
labeled using add_quant_labs
(or add_quant1
) – the upper bound of
numerical values that fall within that label's range of coverage. Note that
all variables of the returned data.frame are coerced to character (see Note 1
of details).
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" get_val_labs(dflik) get_val_labs(dflik, "x1")
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" get_val_labs(dflik) get_val_labs(dflik, "x1")
greml
takes two character vectors of strings and, for each pattern
represented by an element of the first vector, searches all elements of the
second vector to see if any of those elements of the second vector matches
that pattern element of the first vector.
greml(patterns, x, ignore.case = TRUE, vals = FALSE)
greml(patterns, x, ignore.case = TRUE, vals = FALSE)
patterns |
a character vector of comma-separated, quoted character strings or regular expressions. |
x |
a character vector that will be tested for presence/absence of the patterns passed via the patterns argument. |
ignore.case |
search in a non-case-sensitive fashion if TRUE. |
vals |
by default, vals = FALSE and will return a named vector that indicates, for each unique element of patterns, whether a match for that pattern was found somewhere in the x vector (one named logical element returned for each patterns argument vector element). If TRUE, vals returns the unique values of patterns that were matched (one character element for each matched patterns argument vector element). |
This function accepts a character vector of text substring patterns or regular expressions (patterns argument), and searches a second character vector (x argument) to determine which elements of the first vector (patterns) match at least one element anywhere in the second vector (x). If vals = TRUE (default is FALSE), each matched pattern element is returned; if vals = FALSE, a vector of named logical values equal in length to the object supplied to the patterns argument is returned, indicating for each patterns element whether a match for it was found anywhere in x (TRUE if so, FALSE if not), with the names corresponding to the elements of the patterns vector. If ignore.case = TRUE (the default), neither vector is treated case- sensitively (both are coerced to lower-case before other operations occur).
Used internally by various labelr functions (e.g., use_val_labs
). Note that
this is the same search and syntax that is performed by gremlr()
(gremlr
means "greml
in reverse"), except that, whereas greml
returns
matches in terms of the patterns argument (i.e., which patterns elements
match at least one x element), gremlr
returns matches in terms of the x
argument.
a vector, either character (if vals = TRUE) or logical (if vals = FALSE).
# search for "AB" (case-sensitively) anywhere in subsequent vector greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) # character(0) # search for "AB" (non-case-sensitively; the default) anywhere in next vector greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = TRUE) # [1] "AB" # other searches greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) greml(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = TRUE) # compare greml (above) to gremlr() (here) gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) gremlr(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("bq", "z"), vals = FALSE)
# search for "AB" (case-sensitively) anywhere in subsequent vector greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) # character(0) # search for "AB" (non-case-sensitively; the default) anywhere in next vector greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = TRUE) # [1] "AB" # other searches greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) greml(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = TRUE) # compare greml (above) to gremlr() (here) gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) gremlr(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("bq", "z"), vals = FALSE)
gremlr
accepts two character vectors of strings and, for each element of
the second vector, determines whether that element matches any of the
patterns supplied via any of the elements of the first vector.
gremlr(patterns, x, ignore.case = TRUE, vals = FALSE)
gremlr(patterns, x, ignore.case = TRUE, vals = FALSE)
patterns |
a character vector of comma-separated, quoted character strings or regular expressions. |
x |
a character vector that will be tested for presence/absence of the patterns passed via the patterns argument. |
ignore.case |
search in a non-case-sensitive fashion if TRUE. |
vals |
by default, vals = FALSE and will return a named vector that indicates, for each unique element of x, whether that x element was a match for any element of patterns (one named logical element returned for each x vector element). If TRUE, vals returns the unique values of x that were matched (one character element for each matched x argument vector element). |
This function accepts a character vector of text substring patterns or
regular expressions (patterns argument), and searches another character
vector (x argument) to determine for each element of x, whether that element
is a match (in the sense of base::grepl()
) for any pattern element of
of patterns. If vals = TRUE (default is FALSE), each matched x element is
returned; if vals = FALSE, a vector of named logical values equal in length
to x is returned, indicating for each x element whether it contains any text
substring or pattern found in any element of patterns (TRUE if so, FALSE if
not), with the names corresponding to the elements of the x vector. Used
internally by various labelr functions. If ignore.case = TRUE (the default),
neither vector is treated case-sensitively (both are coerced to lower-case
before other operations).
Used internally by various labelr functions Note that this is the same
search and syntax that is performed by greml
(gremlr
is "greml
in
reverse"), except that, whereas greml
returns matches in terms of the
patterns argument, gremlr
returns matches in terms of x argument.
a vector, either character (if vals = TRUE) or logical (if vals = FALSE).
# search for "AB" (case-sensitively) anywhere in subsequent vector gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) # character(0) # search for "AB" (non-case-sensitively; the default) anywhere in next vector gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = TRUE) # [1] "ab" "ab" "abc" # other searches gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) gremlr(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("bq", "z"), vals = FALSE) # compare gremlr (above) to greml() (here) greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) greml(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = FALSE)
# search for "AB" (case-sensitively) anywhere in subsequent vector gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) # character(0) # search for "AB" (non-case-sensitively; the default) anywhere in next vector gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = TRUE) # [1] "ab" "ab" "abc" # other searches gremlr(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) gremlr(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) gremlr(c("a|b", "Q"), c("bq", "z"), vals = FALSE) # compare gremlr (above) to greml() (here) greml(c("AB"), c("ab", "ab", "abc", "z"), vals = TRUE, ignore.case = FALSE) greml(c("ab"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("ab", "Q"), c("ab", "ab", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "abc", "z"), vals = FALSE) greml(c("a|b", "Q"), c("a", "b", "z"), vals = FALSE) greml(c("a|b", "Q"), c("bq", "z"), vals = FALSE)
add_val_labs()
-style Value-labeled Variable (Column)?Determine whether a specific variable of a data.frame has value labels
associated with it that were added using add_val_labs()
or add_val1()
.
has_avl_labs(data, var) h11l(data, var) hql(data, var)
has_avl_labs(data, var) h11l(data, var) hql(data, var)
data |
a data.frame. |
var |
the unquoted name of the variable (column) to check for the
presence of |
h11l
is a compact alias for has_avl_labs
: they do the same thing, and the
former is easier to type
A 1L logical.
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "y" in their name, except "y3" set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("y"), # note the vars args not.vars = "y3", vals = vals2label, labs = labs2use, partial = TRUE ) has_avl_labs(dflik, y1) # TRUE has_avl_labs(dflik, y3) # FALSE, see not.vars arg above
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "y" in their name, except "y3" set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("y"), # note the vars args not.vars = "y3", vals = vals2label, labs = labs2use, partial = TRUE ) has_avl_labs(dflik, y1) # TRUE has_avl_labs(dflik, y3) # FALSE, see not.vars arg above
has_decv
determines whether a vector has decimal values, using all values
for smaller vectors and using a non-random sample of observations for larger
vectors.
has_decv(x, sample.after = 1000)
has_decv(x, sample.after = 1000)
x |
the vector to check for presence of decimals. |
sample.after |
for larger (length(x)>1000) vectors, test for decimals using a non-random sample of 1000 observations from the vector. |
This function is used by core labelr functions to detect vectors that are
bad candidates for one-to-one value labeling (as implemented by, e.g.,
add_val_labs
).
a 1L vector indicating whether x has decimal values.
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) head(df, 3) sapply(df, class) class(df$x2) df <- as_num(df) head(df, 3) sapply(df, class) sapply(mtcars, is.double) sapply(mtcars, is.numeric) sapply(mtcars, is.integer) sapply(mtcars, has_decv)
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x3 <- sample(letters, size = 10, replace = TRUE) df <- data.frame(x1, x2, x3) head(df, 3) sapply(df, class) class(df$x2) df <- as_num(df) head(df, 3) sapply(df, class) sapply(mtcars, is.double) sapply(mtcars, is.numeric) sapply(mtcars, is.integer) sapply(mtcars, has_decv)
add_m1_lab()
Many-to-One-Style Value-labeled Variable (Column)?Determine whether a specific variable of a data.frame has many-to-one-style
value labels associated with it (i.e., via add_m1_lab()
or add1m1()
).
has_m1_labs(data, var) hm1l(data, var)
has_m1_labs(data, var) hm1l(data, var)
data |
a data.frame. |
var |
the unquoted name of the variable (column) to check for the presence of many-to-one-style value labels. |
hm1l
is a compact alias for has_m1_labs
: they do the same thing, and the
former is easier to type
A 1L logical.
# add many-to-one style labels for "carb" and one-to-one style for "am" df <- mtcars df <- add_m1_lab(df, vars = "carb", vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add_m1_lab(df, vars = "carb", vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) df <- add_val_labs(df, vars = "am", vals = c(0, 1), labs = c("autom", "manu"), max.unique.vals = 10 ) has_m1_labs(df, carb) # TRUE, carb has m1-style value labels has_val_labs(df, am) # TRUE, am does have value labels has_m1_labs(df, am) # FALSE, am's value labels are not not m1-style labels
# add many-to-one style labels for "carb" and one-to-one style for "am" df <- mtcars df <- add_m1_lab(df, vars = "carb", vals = 1:3, lab = "<=3", max.unique.vals = 10 ) df <- add_m1_lab(df, vars = "carb", vals = c(4, 6, 8), lab = ">=4", max.unique.vals = 10 ) df <- add_val_labs(df, vars = "am", vals = c(0, 1), labs = c("autom", "manu"), max.unique.vals = 10 ) has_m1_labs(df, carb) # TRUE, carb has m1-style value labels has_val_labs(df, am) # TRUE, am does have value labels has_m1_labs(df, am) # FALSE, am's value labels are not not m1-style labels
add_quant_labs()
-style Value-labeled Variable (Column)?Determine whether a specific variable of a data.frame has value labels
associated with it that were added using add_quant_labs()
or add_quant1()
.
has_quant_labs(data, var)
has_quant_labs(data, var)
data |
a data.frame. |
var |
the unquoted name of the variable (column) to check for the
presence of |
hql
is a compact alias for has_quant_labs
: they do the same thing, and
the former is easier to type
A 1L logical.
# copy mtcars to mt2 and assign various types of value labels mt2 <- mtcars # add 1-to-1 value labels mt2 <- add_val_labs( data = mt2, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) has_val_labs(mt2, am) # TRUE, it does has_m1_labs(mt2, am) # FALSE, they are NOT add_m1_lab()-style has_quant_labs(mt2, am) # FALSE, they are NOT add_quant_labs() -style # add many-to-1 value labels mt2 <- add_m1_lab( data = mt2, vars = "gear", vals = 4:5, lab = "4+" ) has_val_labs(mt2, gear) # TRUE, it does has_m1_labs(mt2, gear) # TRUE, they ARE add_m1_lab()-style has_quant_labs(mt2, gear) # FALSE, they NOT not add_quant_labs() -style # add quartile-based numerical range value labels mt2 <- add_quant_labs( data = mt2, vars = "disp", qtiles = 4 ) has_val_labs(mt2, disp) # TRUE, it does has_m1_labs(mt2, disp) # FALSE, they are NOT add_m1_lab()-style has_quant_labs(mt2, disp) # TRUE, they ARE add_quant_labs() -style
# copy mtcars to mt2 and assign various types of value labels mt2 <- mtcars # add 1-to-1 value labels mt2 <- add_val_labs( data = mt2, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) has_val_labs(mt2, am) # TRUE, it does has_m1_labs(mt2, am) # FALSE, they are NOT add_m1_lab()-style has_quant_labs(mt2, am) # FALSE, they are NOT add_quant_labs() -style # add many-to-1 value labels mt2 <- add_m1_lab( data = mt2, vars = "gear", vals = 4:5, lab = "4+" ) has_val_labs(mt2, gear) # TRUE, it does has_m1_labs(mt2, gear) # TRUE, they ARE add_m1_lab()-style has_quant_labs(mt2, gear) # FALSE, they NOT not add_quant_labs() -style # add quartile-based numerical range value labels mt2 <- add_quant_labs( data = mt2, vars = "disp", qtiles = 4 ) has_val_labs(mt2, disp) # TRUE, it does has_m1_labs(mt2, disp) # FALSE, they are NOT add_m1_lab()-style has_quant_labs(mt2, disp) # TRUE, they ARE add_quant_labs() -style
Determine whether a specific variable of a data.frame has value labels associated with it.
has_val_labs(data, var, type = "any") hvl(data, var, type = "any")
has_val_labs(data, var, type = "any") hvl(data, var, type = "any")
data |
a data.frame. |
var |
the unquoted name of the variable (column) to check for the presence of value labels. |
type |
the type of value label to check the variable for: "any,"
"1to1," "m1," or "q." If "1to1," check will be for value labels added via
|
hvl
is a compact alias for has_val_labs
: they do the same thing, and the
former is easier to type
A 1L logical.
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "y" in their name, except "y3" set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("y"), # note the vars args not.vars = "y3", vals = vals2label, labs = labs2use, partial = TRUE ) has_val_labs(dflik, y1) # TRUE has_val_labs(dflik, y3) # FALSE, see not.vars arg above
# add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "y" in their name, except "y3" set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("y"), # note the vars args not.vars = "y3", vals = vals2label, labs = labs2use, partial = TRUE ) has_val_labs(dflik, y1) # TRUE has_val_labs(dflik, y3) # FALSE, see not.vars arg above
headl
accepts a labelr value-labeled data.frame and returns the first n
value-labeled rows of that data.frame
headl(data, n = 6L)
headl(data, n = 6L)
data |
a data.frame. |
n |
the last row to select (i.e., rows 1 through n will be returned). |
Whereas utils::head
returns the first n rows of a data.frame, headl
does
the same thing, substituting value labels for values wherever the former exist.
See also taill
and somel
.
a data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) head(df) # utils::head headl(df) # same, but with value labels in place of values
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) head(df) # utils::head headl(df) # same, but with value labels in place of values
init_labs
pre-populates a data.frame with "placeholder" labelr label meta-
data, which will be overwritten if/when you explicitly assign your own
preferred label attributes.
init_labs(data, max.unique.vals = 5000)
init_labs(data, max.unique.vals = 5000)
data |
the data.frame that you will be labeling via functions like
|
max.unique.vals |
constrains the variables that may receive value labels to those whose total unique values do not exceed the integer value supplied to this argument. Note that labelr sets a hard ceiling of 5000 on the total number of unique value labels that any variable is permitted to have under any circumstance, as labelr is primarily intended for interactive use with moderately-sized (<=~1M-row) data.frames. |
init_labs
is used inside other labelr functions but is not intended for
interactive use at the console.
a data.frame with initial placeholder labelr meta-data added.
# make toy demographic (gender, race, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function df2 <- init_labs(df) # df2 is not df get_all_lab_atts(df) # this is df; is not df2 get_all_lab_atts(df2) # this is df2
# make toy demographic (gender, race, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function df2 <- init_labs(df) # df2 is not df get_all_lab_atts(df) # this is df; is not df2 get_all_lab_atts(df2) # this is df2
Check all (or specified) columns of a data.frame for the presence of "irregular" values (e.g., NA, Inf, NaN) and, if found, replace them with NA (or some other specified value).
irregular2( data, vars = NULL, to = NA, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF") )
irregular2( data, vars = NULL, to = NA, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF") )
data |
a data.frame object. |
vars |
a character vector that corresponds to the name(s) of one or more variables (columns) to which the operational will be applied. If NULL, will be applied to all variables. |
to |
a single value (by default: NA) to which all irregular values will be converted (Note: if arg is character, returned vector will be coerced to character). |
nan.include |
convert NaN values to NA. |
inf.include |
convert Inf and -Inf values to NA. |
special |
additional specific character values that will be converted to NA values. |
For purposes of irregular2
, irregular values consist of: NA values, other
arbitrary values you specify, and (by default): NaN, Inf, -Inf, and character
variants of same (i.e., upper, lower, or mixed-case variants of "NA","NAN",
"INF","-INF"). This function converts all such values to NA (or some other
specified value).
a data.frame identical to data, with exception that irregular values have been converted to the value specified in "to" argument (NA, by default).
set.seed(123) x <- c(NA, Inf, -Inf, NaN, runif(6)) y <- c("a", "inf", "NaN", NA, sample(letters, 6, replace = TRUE)) df <- data.frame(x, y) head(df, 10) df_1 <- irregular2(df) head(df_1, 10) df_2 <- irregular2(df, vars = "x", to = ";-)") head(df_2)
set.seed(123) x <- c(NA, Inf, -Inf, NaN, runif(6)) y <- c("a", "inf", "NaN", NA, sample(letters, 6, replace = TRUE)) df <- data.frame(x, y) head(df, 10) df_1 <- irregular2(df) head(df_1, 10) df_2 <- irregular2(df, vars = "x", to = ";-)") head(df_2)
Check a vector for the presence of "irregular" values (e.g., NA) and, if found, replace with some other (single) user-specified value.
irregular2v( x, to = 99, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF"), other = NULL )
irregular2v( x, to = 99, nan.include = TRUE, inf.include = TRUE, special = c("NA", "NAN", "INF", "-INF"), other = NULL )
x |
a vector. |
to |
a single value to which all NA (and other irregular) values should be converted (Note: if arg is character, returned vector will be coerced to character). |
nan.include |
treat NaN values as NA. |
inf.include |
treat Inf and -Inf values as NA. |
special |
a modifiable set of default character values that will be treated as equivalent to NA values. |
other |
additional user-specified values (of consistent class) that will be treated as equivalent to NA values. |
For purposes of irregular2v
, irregular values consist of: NA values, other
arbitrary values you specify, and (by default): NaN, Inf, -Inf, and character
variants of same (i.e., upper, lower, or mixed-case variants of "NA","NAN",
"INF","-INF"). This function is used insider core labelr functions to manage
such values, but the typical labelr user will have no explicit need to use it
as part of an interactive session.
a vector identical to x, with exception that NA (and other irregular) values have been converted to the value specified in "to" argument.
set.seed(123) x <- c(NA, Inf, -Inf, NaN, runif(6)) x x1 <- irregular2v(x, to = 33) x1 set.seed(123) x1 <- c(NA, Inf, -Inf, NaN, runif(6)) x x1 <- irregular2v(x, to = 33, nan.include = TRUE, inf.include = FALSE) x1 set.seed(123) x <- c(NA, "INF", "in", "nan", "NA", sample(letters, 5)) x x1 <- irregular2v(x, to = "<-X->") x1
set.seed(123) x <- c(NA, Inf, -Inf, NaN, runif(6)) x x1 <- irregular2v(x, to = 33) x1 set.seed(123) x1 <- c(NA, Inf, -Inf, NaN, runif(6)) x x1 <- irregular2v(x, to = 33, nan.include = TRUE, inf.include = FALSE) x1 set.seed(123) x <- c(NA, "INF", "in", "nan", "NA", sample(letters, 5)) x x1 <- irregular2v(x, to = "<-X->") x1
is_numable
determines whether a character vector can be coerced to numeric
without generating new NA values.
is_numable(x, nan2na = TRUE, inf2na = TRUE)
is_numable(x, nan2na = TRUE, inf2na = TRUE)
x |
a character vector. |
nan2na |
treat NaN (including, e.g., "nan") values as NA values. |
inf2na |
treat Inf, -Inf values (including, e.g., "inf") as NA values. treated as equivalent to NA values. |
Core labelr functions coerce integers to characters and back, which
is_numable
facilitates.
a 1L (scalar) logical vector.
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x2_num_test <- is_numable(x2) x2_num_test x3 <- sample(LETTERS, 10, replace = TRUE) x3_num_test <- is_numable(x3) x3_num_test
set.seed(123) x1 <- runif(10) x2 <- as.character(sample(c(1:20), 10, replace = TRUE)) x2_num_test <- is_numable(x2) x2_num_test x3 <- sample(LETTERS, 10, replace = TRUE) x3_num_test <- is_numable(x3) x3_num_test
lab_int_to_factor
converts a value-labeled integer variable to a factor,
using labelr value labels as factor level labels and returning the modified
data.frame.
lab_int_to_factor(data, var, ordered = FALSE) int2f(data, var, ordered = FALSE)
lab_int_to_factor(data, var, ordered = FALSE) int2f(data, var, ordered = FALSE)
data |
a data.frame object. |
var |
the (unquoted) name of a value-labeled integer variable found in data. |
ordered |
logical flag to determine if resulting factor levels should be regarded as ordered (in ascending order of the integer values of var). |
Note 1: int2f
is a compact alias for lab_int_to_factor
: they do the same
thing, and the former is easier to type.
Note 2: This function can be used to produce ordered factors but will not do so by default (see argument ordered).
Note 3: This function's effects are NOT straightforwardly "undone" by
factor_to_lab_int()
See the latter's documentation for more information and
an example demonstration.
a data.frame.
class(iris[["Species"]]) iris_sp_int <- factor_to_lab_int(iris, Species) class(iris_sp_int[["Species"]]) get_val_labs(iris_sp_int, "Species") iris_sp_fac <- lab_int_to_factor(iris_sp_int, Species) class(iris_sp_fac[["Species"]]) levels(iris_sp_fac[["Species"]]) # copy data.frame mtcars to mt2 mt2 <- mtcars # add value labels to mtcars$carb and assign data.frame to object mt2 mt2 <- add_val_labs( data = mt2, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1c", "2c", # a tad silly, but these val labels will demo the principle "3c", "4c", "6c", "8c" ) ) # carb as labeled integer class(mt2$carb) levels(mt2$carb) head(mt2$carb, 3) lm(mpg ~ carb, data = mt2) (adj_r2_int <- summary(lm(mpg ~ carb, data = mt2))$adj.r.squared) AIC(lm(mpg ~ carb, data = mt2)) # carb as factor carb_fac <- mt2 # copy mt2 to new data.frame carb_fac carb_fac <- lab_int_to_factor(carb_fac, carb) # alias int2f() also works class(carb_fac$carb) levels(carb_fac$carb) head(carb_fac$carb, 3) lm(mpg ~ carb, data = carb_fac) (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_fac))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_fac))
class(iris[["Species"]]) iris_sp_int <- factor_to_lab_int(iris, Species) class(iris_sp_int[["Species"]]) get_val_labs(iris_sp_int, "Species") iris_sp_fac <- lab_int_to_factor(iris_sp_int, Species) class(iris_sp_fac[["Species"]]) levels(iris_sp_fac[["Species"]]) # copy data.frame mtcars to mt2 mt2 <- mtcars # add value labels to mtcars$carb and assign data.frame to object mt2 mt2 <- add_val_labs( data = mt2, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1c", "2c", # a tad silly, but these val labels will demo the principle "3c", "4c", "6c", "8c" ) ) # carb as labeled integer class(mt2$carb) levels(mt2$carb) head(mt2$carb, 3) lm(mpg ~ carb, data = mt2) (adj_r2_int <- summary(lm(mpg ~ carb, data = mt2))$adj.r.squared) AIC(lm(mpg ~ carb, data = mt2)) # carb as factor carb_fac <- mt2 # copy mt2 to new data.frame carb_fac carb_fac <- lab_int_to_factor(carb_fac, carb) # alias int2f() also works class(carb_fac$carb) levels(carb_fac$carb) head(carb_fac$carb, 3) lm(mpg ~ carb, data = carb_fac) (adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_fac))$adj.r.squared) AIC(lm(mpg ~ carb, data = carb_fac))
make_demo_data
generates a data.frame with select (entirely fictional)
"demographic" variables purely for the purposes of demonstrating or exploring
common labelr behaviors and uses and is not designed to accurately emulate or
represent the frequencies or relationships among demographic variables.
make_demo_data( n = 1000, age.mean = 43, age.sd = 15, gend.prob = c(0.45, 0.45, 0.045, 0.045, 0.01), raceth.prob = c(1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7), edu.prob = c(0.03, 0.32, 0.29, 0.24, 0.12), rownames = TRUE )
make_demo_data( n = 1000, age.mean = 43, age.sd = 15, gend.prob = c(0.45, 0.45, 0.045, 0.045, 0.01), raceth.prob = c(1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7), edu.prob = c(0.03, 0.32, 0.29, 0.24, 0.12), rownames = TRUE )
n |
number of observations (rows) of hypothetical data set to create. |
age.mean |
mean value of (fictional) age variable (assuming a normal distribution) recorded in a hypothetical data set. |
age.sd |
standard deviation of (fictional) age variable (assuming a normal distribution) recorded in a hypothetical data set. |
gend.prob |
probabilities of four categories of a gender identity variable recorded in a hypothetical data set. |
raceth.prob |
probabilities of categories of a hypothetical race/ethnicity variable recorded in a hypothetical data set. |
edu.prob |
probabilities of categories of a hypothetical "highest level of education" variable recorded in a hypothetical data set. |
rownames |
create memorable but arbitrary rownames for inspection (if TRUE). |
a data.frame.
# make toy demographic (gender, race, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df) summary(df)
# make toy demographic (gender, race, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df) summary(df)
make_likert_data
generates a data.frame with select (entirely fictional)
numerically coded responses of fictional people to fictional survey items to
demonstrate and explore labelr functionalities.
make_likert_data(n = 1000, scale = 1:7, rownames = TRUE)
make_likert_data(n = 1000, scale = 1:7, rownames = TRUE)
n |
number of observations (rows) of hypothetical data set to create. |
scale |
the sequence of distinct integer values describing the raw / naive numerical codings of Likert-type survey items. |
rownames |
create memorable but arbitrary rownames for inspection (if TRUE). |
Data is entirely fictional and strictly for purposes of demonstrating labelr.
a data.frame.
# add_val_labs() "vars" arg will do partial matching if partial = TRUE # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1)
# add_val_labs() "vars" arg will do partial matching if partial = TRUE # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1)
Takes a stand-alone vector (x), and recodes select values (bef) to some other set of values (aft), returning the recoded vector.
recode_vals(x, bef, aft, default.lab = "bef", unique = FALSE)
recode_vals(x, bef, aft, default.lab = "bef", unique = FALSE)
x |
an integer, character, factor, or logical vector. |
bef |
the "before" (i.e., current) values of x to be recoded. |
aft |
the "after" (recoded) values to be substituted in the returned vector in place of the positionally corresponding bef values of the x vector ("positionally corresponding" means that the first element of aft is the replacement (recode) for all x instances of the first element of bef, and so on for the respective second bef and aft elements, etc.; see examples). variables to which value labels will be added. |
default.lab |
the "aft" value to be used for values of x for which no "bef" value is specified. default.lab = "bef" (the default) will use (retain) the existing value of x as its own recode, coercing to character as needed. For example, if the value x=4 is observed in x but is not included in the "bef" argument, the returned vector will have values of 4 (integer) or "4" (character), depending on whether the recodes that –are– supplied are numeric (then 4) or character (then "4"). |
unique |
if TRUE, return only the mapping itself (bef argument values as names, aft argument values as values), else if FALSE (default), return the full recoded vector of x values. |
While labelr users do not need to engage recode_vals
directly, it is the
underlying function that powers certain core labelr functions, including
add_val_labs
and add_name_labs
. The bef argument identifies the values of
x to recode, and aft argument indicates what each bef value should be recoded
to (order matters: bef=c("a", "b", "c"), aft=c(1, 2, 3) means that "a" values
of x will be recoded to 1 values in returned vector, "b" values will be
recoded to 2, and "c" values will be recoded to 3).
A vector of length equal length of supplied vector, with x values found in bef argument switched to the corresponding values found in the aft argument.
z <- mtcars$gear z[1] <- NA z recode_vals(z, c(5, 3, 4), c("five", "three", "four")) irsp <- iris$Species[c(1:3, 60:62, 148:150)] irsp recode_vals(irsp, c("setosa", "versicolor", "virginica"), c("SE", "VE", "VI")) class(irsp) # factor class(recode_vals( irsp, c("setosa", "versicolor", "virginica"), c("SE", "VE", "VI") )) # coerced to character set.seed(112) x_logic <- sample(c(TRUE, FALSE), 10, replace = TRUE) x_logic recode_vals(x_logic, bef = c(FALSE), c("Fake News!"))
z <- mtcars$gear z[1] <- NA z recode_vals(z, c(5, 3, 4), c("five", "three", "four")) irsp <- iris$Species[c(1:3, 60:62, 148:150)] irsp recode_vals(irsp, c("setosa", "versicolor", "virginica"), c("SE", "VE", "VI")) class(irsp) # factor class(recode_vals( irsp, c("setosa", "versicolor", "virginica"), c("SE", "VE", "VI") )) # coerced to character set.seed(112) x_logic <- sample(c(TRUE, FALSE), 10, replace = TRUE) x_logic recode_vals(x_logic, bef = c(FALSE), c("Fake News!"))
restore_factor_info
searches a data.frame for labelr-specific factor
meta-data (added by add_factor_info()
) and, if found, uses that information
to coerce a character vector that was formerly a factor back into a
factor, with former levels and (if applicable) "ordered" factor status,
as well.
restore_factor_info(data)
restore_factor_info(data)
data |
the data.frame to which labelr-specific factor attribute meta-
data may have been applied via |
A data.frame.
# this function does not strictly require prior or other use of labelr zz <- add_factor_info(iris) # we'll find out what this does sapply(zz, class) # Species is a factor zz$Species <- as.character(zz) # now it's a character sapply(zz, class) # yup, it's a character zz <- restore_factor_info(zz) # we'll find out what this does sapply(zz, class) # now it's back to a factor levels(zz$Species) # levels are back, too.
# this function does not strictly require prior or other use of labelr zz <- add_factor_info(iris) # we'll find out what this does sapply(zz, class) # Species is a factor zz$Species <- as.character(zz) # now it's a character sapply(zz, class) # yup, it's a character zz <- restore_factor_info(zz) # we'll find out what this does sapply(zz, class) # now it's back to a factor levels(zz$Species) # levels are back, too.
sbrac
allows one to do "bracket-like" row and/or column selection
(without actual brackets) on a labelr-labeled data.frame in a manner that
preserves its labelr label attributes.
sbrac(data, ri = NULL, ci = NULL, partial = FALSE)
sbrac(data, ri = NULL, ci = NULL, partial = FALSE)
data |
the data.frame from which columns will be selected. |
ri |
row indices (integer positional or logical) or row.names (or partial row.names if partial = TRUE) to be selected. |
ci |
column indices (integer positional or logical) or column / variable names (or partial column names if partial = TRUE) to be selected. |
partial |
if TRUE, any row or column that contains the relevant character substring will be selected (e.g., sbrac(mtcars, c("Maz"), c("di")) will return all "Mazda" car rows and the column "disp"). |
sbrac
allows one to select rows and columns of a data.frame in a bracket-
like fashion, using integers (positional indices), logicals, or (partial)
character names (row names and column/variable names). For example,
sbrac(df, 1:5, 2:4) returns a data.frame (with any labelr attribute
meta-data intact), consisting of rows 1-5 and columns 2-4 of data.frame df;
while sbrac(mtcars, "Maz", "a", partial = TRUE) returns all variables with
substring "a" in their names for all rows with substring "Maz" in their row
names. Integer indices (only) can be enclosed in -c() for negative index
selection (i.e., "select not these"), where, e.g.,
sbrac(df, -c(1,2), "x", partial = TRUE) means select all rows of df other
than rows 1 and 2 and all columns that feature the letter "x" in their names.
See also ssubset
, sfilter
, sselect
, sdrop
, ssort
, and srename
, as
well as slab
and flab
for other labelr attribute-preserving subsetting
tools.
a labelr label attribute-preserving data.frame, consisting of the selected rows and/or columns index-selected.
# create a copy of the mtcars data set mtc2 <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels mtc2 <- add_name_labs(mtc2, vars = names(names_labs_vec), labs = names_labs_vec ) # examples of sbrac() functionality sbrac(mtc2, 1:4, ) # commas used in a bracket-like way: row 1:4 and all cols sbrac(mtc2, , 1:4) # commas used in a bracket-like way: all rows and cols 1:4 sbrac(mtc2, 1, 2) # 1 is row, 2 is col sbrac(mtc2, -c(8:32), -c(1:8)) # select NOT-these rows and cols (-) sbrac(mtc2, 1:5, 1:2) # rows 1-5, cols 1 and 2 # if partial = TRUE, partial matching to get all Mazda or Merc + all # ...vars with "ar" in name sbrac(mtc2, c("Mazda", "Merc"), c("ar"), partial = TRUE) # see what this does mtc3 <- sbrac(mtc2, c("45"), 1:2, partial = TRUE) # see what this does get_labs_att(mtc3, "name.labs") # name.labs still there
# create a copy of the mtcars data set mtc2 <- mtcars # variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # assign variable labels mtc2 <- add_name_labs(mtc2, vars = names(names_labs_vec), labs = names_labs_vec ) # examples of sbrac() functionality sbrac(mtc2, 1:4, ) # commas used in a bracket-like way: row 1:4 and all cols sbrac(mtc2, , 1:4) # commas used in a bracket-like way: all rows and cols 1:4 sbrac(mtc2, 1, 2) # 1 is row, 2 is col sbrac(mtc2, -c(8:32), -c(1:8)) # select NOT-these rows and cols (-) sbrac(mtc2, 1:5, 1:2) # rows 1-5, cols 1 and 2 # if partial = TRUE, partial matching to get all Mazda or Merc + all # ...vars with "ar" in name sbrac(mtc2, c("Mazda", "Merc"), c("ar"), partial = TRUE) # see what this does mtc3 <- sbrac(mtc2, c("45"), 1:2, partial = TRUE) # see what this does get_labs_att(mtc3, "name.labs") # name.labs still there
scbind
allows one to bind columns together into a data.frame, while
preserving any labelr labels of the inputted data.frames.
scbind(...)
scbind(...)
... |
data.frames to be column-bound |
Precedence is given to the labels of earlier-appearing arguments, such that, if both an earlier and a later data.frame include a label attribute with the same name, the attribute from the earlier data.frame will be preserved, and the same-named attribute later data.frame(s) will be discarded.
a data.frame.
# assign mtcars to df df <- mtcars # add value labels to "am" df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # add numerical range value labels to "mpg" df <- add_quant1(df, mpg, qtiles = 4) df_a <- sselect(df, "am") df_b <- sselect(df, "mpg") df_c <- sselect(mtcars, "cyl") df_all <- scbind(df_a, df_b, df_c) head(df_all) get_val_labs(df_all)
# assign mtcars to df df <- mtcars # add value labels to "am" df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # add numerical range value labels to "mpg" df <- add_quant1(df, mpg, qtiles = 4) df_a <- sselect(df, "am") df_b <- sselect(df, "mpg") df_c <- sselect(mtcars, "cyl") df_all <- scbind(df_a, df_b, df_c) head(df_all) get_val_labs(df_all)
schange
allows one to modify or add a column to a data.frame while
preserving the labelr attributes attached to the inputted data.frame.
schange(data, ...)
schange(data, ...)
data |
a data.frame. |
... |
an expression that will create or update a column in data. |
Note that, while schange
preserves label meta-data of the supplied
data.frame, it does not update or add any new labels. Therefore, if you are
altering the range of values for an extant variable or adding a new variable
to the data.frame, you will need to explicitly instantiate any new or
modified labels that you desire via follow-up calls to functions such as
add_val_labs()
, drop_val_labs()
, etc.
a data.frame.
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- schange(df, mpg_sq = mpg^2) # create var "mpg_sq" df <- schange(df, am = ifelse(am == 0, 2, am)) # modify/replace var "am" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) # labels are intact; "val.labs.am" needs updating! df <- drop_val_labs( data = df, vars = "am" ) df <- add_val_labs( data = df, vars = "am", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- schange(df, mpg_sq = mpg^2) # create var "mpg_sq" df <- schange(df, am = ifelse(am == 0, 2, am)) # modify/replace var "am" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) # labels are intact; "val.labs.am" needs updating! df <- drop_val_labs( data = df, vars = "am" ) df <- add_val_labs( data = df, vars = "am", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
sdrop
allows one to remove columns from a data.frame, returning the
remaining columns as a data.frame that preserves the labelr attributes
attached to the inputted data.frame.
sdrop(data, ...)
sdrop(data, ...)
data |
the data.frame from which columns will be removed. |
... |
comma-separated, unquoted column/variable names to be discarded (e.g., cyl, mpg, not c("cyl", "mpg")), with no other special characters or symbols, such as quotes, parentheses, colons, minus signs, exclamation points, or other operators. |
This function accepts a data.frame, followed by a set of comma-separated,
non-quoted column names to be discarded and returns the remaining columns as
a data.frame that preserves labelr attribute information. NOTE: This command
does NOT allow for positive specification of columns to be retained; rather,
all variables not specified will be retained by default. Further, sdrop
does not supported quoted column names, dplyr-like helper functions or
other special selection syntax or idioms. See also ssubset
, sselect
, or
sbrac
); see also sfilter
, ssort
, srename
, slab
, and flab
.
a labelr label attribute-preserving data.frame consisting of the remaining (i.e., non-specified, non-discarded) subset of columns of the supplied data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfless <- sdrop(df, id, raceth) # drop the vars id and raceth head(dfless, 3) # selection worked check_labs_att(dfless, "val.labs.raceth") # "raceth" value labels are gone
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfless <- sdrop(df, id, raceth) # drop the vars id and raceth head(dfless, 3) # selection worked check_labs_att(dfless, "val.labs.raceth") # "raceth" value labels are gone
sfilter
allows one to filter-subset a data.frame, selecting only rows that
satisfy conditions (after the fashion of base::subset() or dplyr::filter()),
returning the selected rows as a data.frame that preserves the labelr
attributes attached to the inputted data.frame.
sfilter(data, condition)
sfilter(data, condition)
data |
the data.frame from which columns will be selected. |
condition |
row-filtering conditions along the lines of base::subset() and/or dplyr::select(). |
This function accepts a data.frame, followed by conditional filtering
instructions and returns the selected rows (and all inputted data.frame
columns) as a data.frame that preserves the labelr attribute information of
the originally supplied data.frame. See ssubset
for a variant that combines
sfilter
row-filtering and sselect
column selection in a single function.
See sbrac
for a labelr attribute-preserving approach to row and/or column
indexing. See also sdrop
, ssort
, srename
, slab
, and flab
.
a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s).
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dffil1 <- sfilter(df, raceth %in% c(1, 2, 3) & id < 50 & gender == 1) head(dffil1, 3) check_labs_att(dffil1, "val.labs.raceth") # "raceth" lab specifically TRUE dffil2 <- sfilter(df, !raceth %in% 1:5 | (age == 60)) head(dffil2, 8) check_labs_att(dffil2, "val.labs.raceth") # "raceth" lab specifically TRUE
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dffil1 <- sfilter(df, raceth %in% c(1, 2, 3) & id < 50 & gender == 1) head(dffil1, 3) check_labs_att(dffil1, "val.labs.raceth") # "raceth" lab specifically TRUE dffil2 <- sfilter(df, !raceth %in% 1:5 | (age == 60)) head(dffil2, 8) check_labs_att(dffil2, "val.labs.raceth") # "raceth" lab specifically TRUE
sgen
allows one to add a column to a data.frame while preserving the labelr
attributes attached to the inputted data.frame.
sgen(data, ...)
sgen(data, ...)
data |
a data.frame. |
... |
an expression that will create a new column in the data.frame. |
Note that, while sgen
preserves label meta-data of the supplied
data.frame, it does not update or add any new labels. Therefore, you will
need to explicitly instantiate any new or modified labels that you desire via
follow-up calls to functions such as add_val_labs()
, drop_val_labs()
, etc.
You may not use sgen
to replace a variable already present in a the supplied
data.frame. For that, see sreplace
or schange
.
a data.frame.
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sgen(df, mpg_sq = mpg^2) # create var "mpg_sq" df <- sgen(df, am2 = ifelse(am == 0, 2, am)) # create var "am2" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) df <- add_quant_labs( data = df, vars = "mpg_sq", vals = c(200, 400, 600, 1000, 1500), labs = NULL ) df <- add_val_labs( data = df, vars = "am2", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sgen(df, mpg_sq = mpg^2) # create var "mpg_sq" df <- sgen(df, am2 = ifelse(am == 0, 2, am)) # create var "am2" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) df <- add_quant_labs( data = df, vars = "mpg_sq", vals = c(200, 400, 600, 1000, 1500), labs = NULL ) df <- add_val_labs( data = df, vars = "am2", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
slab
("subset using labels") allows one to filter rows and select columns
from a data.frame based on variable-specific value label attributes.
slab(data, condition, ...)
slab(data, condition, ...)
data |
the data.frame from which columns rows will be filtered (and, possibly, columns selected) |
condition |
row-filtering conditions along the lines of base::subset()
and/or dplyr::filter(). Note: Row-filtering conditions (to include
condition==NULL) must be supplied. Conditions involving value-labeled
variables must be expressed in terms of the value labels (see examples),
else try |
... |
Optionally supply one or more unquoted, comma-separated column names that identify columns to be retained (Note: If no columns are listed, all columns will be retained). Note: While row-filtering conditions may leverage standard operators (e.g., &, |, ==, !=), the column-selection portion of call may not incorporate special characters or symbols, such as quotes, parentheses, colons, minus signs, exclamation points, or other operators. Only positive selection of columns is permitted (negative selection – i.e., "select not / select-all-except" specified columns is not supported) |
slab
does base::subset-style data subsetting using variable value label
meta-data that are associated with variable values but are not themselves
values (i.e., will not appear in response to View()
, head()
, etc.).
In other words, value labels are supplied to the slab()
call to direct the
filtering process, but those value labels are not displayed in the cells of
the returned data.frame – the raw values themselves are. This functionality
may be useful for interactively subsetting a data.frame, where character
value labels may be more intuitive and easily recalled than the underlying
variable values themselves (e.g., raceth=="White" & gender="F" may be more intuitive or
readily recalled than raceth==3 & gender==2).
slab
takes as its arguments a labelr value-labeled data.frame, followed by
condition-based row-filtering instructions (required) and a list of unquoted
names of variables to be retained (optional; all variables returned by
default).
Note 1: When using slab
, any conditional row-filtering syntax involving
value-labeled variables must be expressed in terms of those variables' value
labels, not the raw values themselves. Filtering on non-value-labeled
variables is also permitted, with those variables' filtering conditions being
expressed in terms of raw values. Further, slab()
calls may reference both
types of columns (i.e., value-labeled variables and non-value-labeled
variables), provided filtering conditions for the former are expressed in
terms of value labels.
Note 2: slab
(and labelr more broadly) is intended for moderate-sized (or
smaller) data.frames, defined loosely as those with a few million or fewer
rows. With a conventional (c. 2024) laptop, labelr operations on modest-
sized (~100K rows) take seconds (or less); with larger (> a few million rows)
data.frames, labelr may take several minutes (or run out of memory and fail
altogether!), depending on the complexity of the call and the number and type
of cells implicated in it.
See also flab
, use_val_labs
, add_val_labs
, add_val1
,add_quant_labs
,
add_quant1
, get_val_labs
, drop_val_labs
. For label-preserving
subsetting tools that subset in terms of raw values (not value labels), see
sfilter
, sbrac
, ssubset
, sdrop
.
a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s) and the columns whose (unquoted, comma-separated) names are passed to dots (...) (or all columns if no column names are passed).
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more # meaningful when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # see what we did # get_val_labs(df) get_val_labs(df, "gender") get_val_labs(df, "raceth") # use --labels-- to subset w/ slab() ("*S*ubset using *lab*els") dflab <- slab(df, raceth == "Asian" & gender == "F", id, gender) head(dflab, 4) # equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter") dfsf <- ssubset(df, raceth == 3 & gender == 1, gender, raceth) head(dfsf, 4)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more # meaningful when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # see what we did # get_val_labs(df) get_val_labs(df, "gender") get_val_labs(df, "raceth") # use --labels-- to subset w/ slab() ("*S*ubset using *lab*els") dflab <- slab(df, raceth == "Asian" & gender == "F", id, gender) head(dflab, 4) # equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter") dfsf <- ssubset(df, raceth == 3 & gender == 1, gender, raceth) head(dfsf, 4)
smerge
allows one to merge two data.frames, one or both of which may be
labeled, preserving the labelr attributes of the inputted data.frames.
smerge(x, y, ...)
smerge(x, y, ...)
x |
a data.frame to be merged with y. |
y |
a data.frame to be merged with x. |
... |
additional arguments passed to |
Precedence is given to the labels of the first data.frame (argument x), such that, if both data.frames include a label attribute with the same name, the attribute from data.frame x will be preserved, and the same-named attribute from data.frame y will be discarded.
a data.frame.
# assign mtcars to df df <- mtcars # assign the rownames to a column id <- rownames(df) df <- cbind(id, df) # split the data.frame into two df_a <- df[c("id", "am")] df_b <- df[c("id", "mpg")] # add value labels to df_a$am df_a <- add_val_labs( data = df_a, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # add numerical range value labels to df_b$mpg df_b <- add_quant1(df_b, mpg, qtiles = 4) # now, safely merge them df_m <- smerge(df_a, df_b) head(df_m) get_val_labs(df_m)
# assign mtcars to df df <- mtcars # assign the rownames to a column id <- rownames(df) df <- cbind(id, df) # split the data.frame into two df_a <- df[c("id", "am")] df_b <- df[c("id", "mpg")] # add value labels to df_a$am df_a <- add_val_labs( data = df_a, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) # add numerical range value labels to df_b$mpg df_b <- add_quant1(df_b, mpg, qtiles = 4) # now, safely merge them df_m <- smerge(df_a, df_b) head(df_m) get_val_labs(df_m)
somel
accepts a labelr value-labeled data.frame and returns a random sample
of n value-labeled rows of that data.frame
somel(data, n = 6L)
somel(data, n = 6L)
data |
a data.frame. |
n |
the number of random rows of the data.frame to return. |
somel
is inspired by the function some
from the car package. See also
headl
and taill
.
a data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) somel(df) # six random rows with value labels visible
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) somel(df) # six random rows with value labels visible
sort_val_labs
sorts the presentation order of variable value label meta-data.
sort_val_labs(data)
sort_val_labs(data)
data |
a data.frame |
This function is used in internally by other labelr functions to ensure that value label meta-data is sorted in a logical, intuitive order. It is not intended for interactive use.
a data.frame
# note that this example is trivial, as value labels already are in order df <- mtcars df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sort_val_labs(df)
# note that this example is trivial, as value labels already are in order df <- mtcars df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sort_val_labs(df)
srbind
allows one to bind rows together into a data.frame, while preserving
any labelr labels of the inputted data.frames.
srbind(...)
srbind(...)
... |
data.frames to be row-bound |
Precedence is given to the labels of earlier-appearing arguments, such that, if both an earlier and a later data.frame include a label attribute with the same name, the attribute from the earlier data.frame will be preserved, and the same-named attribute later data.frame(s) will be discarded.
a data.frame.
# assign mtcars to df df <- mtcars # assign the rownames to a column id <- rownames(df) df <- cbind(id, df) # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # split the data.frame into three aa <- df[1:5, ] bb <- df[6:11, ] cc <- df[12:32, ] # put them back together df2 <- srbind(aa, bb, cc, cc) get_val_labs(df2)
# assign mtcars to df df <- mtcars # assign the rownames to a column id <- rownames(df) df <- cbind(id, df) # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # split the data.frame into three aa <- df[1:5, ] bb <- df[6:11, ] cc <- df[12:32, ] # put them back together df2 <- srbind(aa, bb, cc, cc) get_val_labs(df2)
Note: srename
renames and existing variable and preserves its value labels,
overwriting an existing variable only if option force = TRUE.
srename(data, old.name, new.name, force = FALSE)
srename(data, old.name, new.name, force = FALSE)
data |
a data.frame. |
old.name |
the unquoted name of the existing variable being renamed (to new.name). |
new.name |
the unquoted name that will be used to rename the variable specified in the old.name argument. |
force |
if a variable with the same name as new.name already exists in the data.frame, allow it to be overwritten. If FALSE, this will not be allowed, and an error will be issued. |
Any non-labelr R operation that changes a variable's (column's) name or that
copies its contents to another variable (column) with a different name will
not associate the original variable's value labels with the new variable name.
To mitigate this, srename
allows one to rename a data.frame variable while
preserving its value labels – that is, by associating the old.name's value
labels with the new.name. If the old.name variable (column) has a name label
(in "name.labs" attribute), the column name associated with that name label
will be changed from old.name to new.name.
A data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 4) df <- srename(df, old.name = gender, new.name = genid) df <- srename(df, old.name = raceth, new.name = racid) df <- srename(df, old.name = x1, new.name = var1) head(df, 4)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 4) df <- srename(df, old.name = gender, new.name = genid) df <- srename(df, old.name = raceth, new.name = racid) df <- srename(df, old.name = x1, new.name = var1) head(df, 4)
sreplace
allows one to replace a data.frame column with new values, while
preserving the labelr attributes attached to the inputted data.frame.
sreplace(data, ...)
sreplace(data, ...)
data |
a data.frame. |
... |
an expression that replaces a column in the data.frame. |
Note that, while sreplace
preserves label meta-data of the supplied
data.frame, it does not update or add any new labels. Therefore, if you are
altering the range of values for an extant variable, you will need to
explicitly instantiate any new or modified labels that you desire via
follow-up calls to functions such as add_val_labs()
, drop_val_labs()
, etc.
You may not use sreplace
to generate a new variable (i.e., one not already
present in a the supplied data.frame). For that, see sgen
or schange
.
a data.frame.
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sreplace(df, mpg = mpg^2) # replace var "mpg" df <- sreplace(df, am = ifelse(am == 0, 2, am)) # replace var "am" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) df <- add_quant_labs( data = df, vars = "mpg", vals = c(200, 400, 600, 1000, 1500), labs = NULL ) df <- drop_val_labs( data = df, vars = "am" ) df <- add_val_labs( data = df, vars = "am", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) df <- sreplace(df, mpg = mpg^2) # replace var "mpg" df <- sreplace(df, am = ifelse(am == 0, 2, am)) # replace var "am" head(df, 4) # show that data.frame modifications have been made get_all_lab_atts(df) df <- add_quant_labs( data = df, vars = "mpg", vals = c(200, 400, 600, 1000, 1500), labs = NULL ) df <- drop_val_labs( data = df, vars = "am" ) df <- add_val_labs( data = df, vars = "am", vals = c(1, 2), labs = c("manual", "automatic") ) get_val_labs(df)
sselect
allows one to subset (select) columns from a data.frame,
returning the selected columns as a data.frame that preserves the labelr
attributes attached to the inputted data.frame.
sselect(data, ...)
sselect(data, ...)
data |
the data.frame from which columns will be selected. |
... |
comma-separated, unquoted column/variable names to be selected (e.g., cyl, mpg, not c("cyl", "mpg")), with no other special characters or symbols, such as quotes, parentheses, colons, minus signs, exclamation points, or other operators. |
This function accepts a data.frame, followed by a set of comma-separated,
non-quoted column names to be retained and returns the selected columns in a
data.frame that preserves labelr attribute information. NOTE: This command
does not allow for negative selection, quoted columns, or dplyr-like helper
functions or special selection idioms, but: see sdrop
for negative selection
("return all columns except these"); see sbrac
for a more flexible
subsetting command; and see also ssubset
, sfilter
, ssort
, and srename
,
as well as slab
and flab
.
a labelr label attribute-preserving data.frame consisting of the selected subset of columns.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfless <- sselect(df, id, raceth) # select only the vars id and raceth head(dfless, 3) # selection worked check_labs_att(dfless, "val.labs.raceth") # "raceth" value labels preserved
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfless <- sselect(df, id, raceth) # select only the vars id and raceth head(dfless, 3) # selection worked check_labs_att(dfless, "val.labs.raceth") # "raceth" value labels preserved
ssort
allows one to sort (after the fashion of base::order or
dplyr::arrange) the rows of a data.frame based on column values.
ssort(data, vars, descending = FALSE, na.last = TRUE, fact.to.char = TRUE)
ssort(data, vars, descending = FALSE, na.last = TRUE, fact.to.char = TRUE)
data |
the data.frame to be sorted. |
vars |
the variables to be sorted on, specified as a quoted character vector of variable names or an integer vector of column position indices. |
descending |
whether to sort the given variable of vars in descending or ascending order. Default is FALSE, which will be recycled to all vars arguments. |
na.last |
force NA values to appear last in a variable's sort order if TRUE (default). |
fact.to.char |
coerce all factor variables to character variables. |
This function accepts a data.frame, followed by a quoted vector of column
names (or an integer vector of column position indices), followed by
an indication of which are to be sorted ascending (default) or descending.
If multiple columns are supplied to vars, sorting prioritizes the columns
that appear earlier, with values of subsequent columns being sorted within
distinct values of earlier columns. Note: ssort
is fast enough on small
data.frames and very slow on "larger" (>500K records) data.frames,
particularly for more complex or demanding sort requests. Other R packages
may provide faster sorting while preserving labelr attributes.
a labelr label attribute-preserving data.frame consisting of the re-sorted data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort1 <- ssort(df, c("raceth", "gender", "age"), descending = c(TRUE, FALSE, FALSE)) head(dfsort1, 20) check_labs_att(dfsort1, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort2 <- ssort(df, c("age", "gender")) head(dfsort2, 20) check_labs_att(dfsort2, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort3 <- ssort(df, c("raceth")) head(dfsort3, 10) check_labs_att(dfsort3, "val.labs.raceth") # "raceth" lab specifically TRUE
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort1 <- ssort(df, c("raceth", "gender", "age"), descending = c(TRUE, FALSE, FALSE)) head(dfsort1, 20) check_labs_att(dfsort1, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort2 <- ssort(df, c("age", "gender")) head(dfsort2, 20) check_labs_att(dfsort2, "val.labs.raceth") # "raceth" lab specifically TRUE dfsort3 <- ssort(df, c("raceth")) head(dfsort3, 10) check_labs_att(dfsort3, "val.labs.raceth") # "raceth" lab specifically TRUE
ssubset
allows one to simultaneously filter-select rows that
satisfy conditions AND return only selected columns as a data.frame that
preserves the labelr attributes
attached to the inputted data.frame.
ssubset(data, condition, ...)
ssubset(data, condition, ...)
data |
the data.frame from which columns rows will be filtered (and, possibly, columns selected). |
condition |
row-filtering conditions along the lines of base::subset() and/or dplyr::select(). Note: Row-filtering conditions (to include condition==NULL) must be supplied. |
... |
Optionally supply one or more unquoted, comma-separated column names that identify columns to be retained (Note: If no columns are listed, all columns will be retained). Note: While row-filtering conditions may leverage standard operators (e.g., &, |, ==, !=), the column-selection portion of call may not incorporate special characters or symbols, such as quotes, parentheses, colons, minus signs, exclamation points, or other operators. |
Combining sfilter
and sselect
functionality (along the lines of
base::subset(), this function accepts a data.frame, followed by conditional
filtering instructions (required) and (optional) comma-separated unquoted
column names to be selected (see examples), returning the selected rows and
columns as a data.frame that preserves the labelr attribute information of
the originally supplied data.frame. See ssubset
for a variant that combines
sfilter
row-filtering and sselect
column selection in a single function.
See also sdrop
, sbrac
, ssort
, srename
, slab
and flab
.
a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s).
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfsub1 <- ssubset(df, raceth == 2 & age > 70, id, raceth, gender) head(dfsub1, 3) check_labs_att(dfsub1, "val.labs.raceth") # "raceth" lab specifically TRUE dfsub2 <- ssubset(df, raceth %in% c(2:4), age, raceth) head(dfsub2, 3) check_labs_att(dfsub2, "val.labs.raceth") # "raceth" lab specifically TRUE # even if NULL, must supply explicit condition argument head(ssubset(df, condition = NULL, age, raceth), 3) # better to just use sselect() head(ssubset(df, age == 60), 3) # skip column selection (will return all cols)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) head(df, 3) check_labs_att(df, "val.labs.raceth") # "raceth" lab specifically TRUE dfsub1 <- ssubset(df, raceth == 2 & age > 70, id, raceth, gender) head(dfsub1, 3) check_labs_att(dfsub1, "val.labs.raceth") # "raceth" lab specifically TRUE dfsub2 <- ssubset(df, raceth %in% c(2:4), age, raceth) head(dfsub2, 3) check_labs_att(dfsub2, "val.labs.raceth") # "raceth" lab specifically TRUE # even if NULL, must supply explicit condition argument head(ssubset(df, condition = NULL, age, raceth), 3) # better to just use sselect() head(ssubset(df, age == 60), 3) # skip column selection (will return all cols)
strip_labs
removes all labelr meta-data attributes ("name.labs", "val.labs",
"frame.lab", and "factor.") from a data.frame, because you no longer want/need
it for some reason or another.
strip_labs(data)
strip_labs(data)
data |
a data.frame object. |
Some labelr functions automatically use strip_labs
, but you should only
use it if you no longer want or need a given data.frame's labelr meta-data
(i.e., labels). If you have saved your labelr attributes (using
get_all_lab_atts
), you can restore them to the data.frame using
add_lab_atts
).
a data.frame object with label attribute meta-data stripped from it.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_val_labs(df, "raceth") # it's here df <- strip_labs(df) # this removes labs from df get_val_labs(df, "raceth") # it's gone check_any_lab_atts(df) # FALSE (means "no labs here")
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) get_val_labs(df, "raceth") # it's here df <- strip_labs(df) # this removes labs from df get_val_labs(df, "raceth") # it's gone check_any_lab_atts(df) # FALSE (means "no labs here")
tabl
calculates raw or weighted frequency counts (or proportions) over
arbitrary categorical values (including integer values), which may be
expressed in terms of raw variable values or labelr label values.
tabl( data, vars = NULL, labs.on = TRUE, qtiles = 4, prop.digits = NULL, wt = NULL, div.by = NULL, max.unique.vals = 10, sort.freq = TRUE, zero.rm = FALSE, irreg.rm = FALSE, wide.col = NULL )
tabl( data, vars = NULL, labs.on = TRUE, qtiles = 4, prop.digits = NULL, wt = NULL, div.by = NULL, max.unique.vals = 10, sort.freq = TRUE, zero.rm = FALSE, irreg.rm = FALSE, wide.col = NULL )
data |
a data.frame. |
vars |
a quoted character vector of variable names of variables you wish
to include in defining category groups to tabulate over in the table. If NULL
|
labs.on |
if TRUE (the default), then value labels – rather than the raw variable values – will be displayed in the returned table for any value-labeled variables. Variables need not be value-labeled: This command (with this option set to TRUE or FALSE) will work even when no variables are value-labeled. |
qtiles |
if not NULL, must be a 1L integer between 2 and 100 indicating the number of quantile categories to employ in temporarily (for purposes of tabulation) auto-value-labeling numeric columns that exceed the max.unique.vals threshold. If NULL, no such auto-value-labeling will take place. Note: When labs.on = TRUE, any pre-existing variable value labels will be used in favor of the quantile value labels generated by this argument. By default, qtiles = 4, and the automatically generated quantile category levels will be labeled as "q025" (i.e., first quartile), "q050", "q075", and "q100". |
prop.digits |
if non-NULL, cell percentages (proportions) will be returned instead of frequency counts, and these will be rounded to the digit specified (e.g., prop.digits = 3 means a value of 0.157 would be returned for a cell that accounted for 8 observations if the total number of observations were 51). If NULL (the default), frequency counts will be returned. |
wt |
an optional vector that includes cell counts or some other idiosyncratic "importance" weight. If NULL, no weighting will be employed. |
div.by |
Divide the returned counts by a constant for scaling purposes. This may be a number (e.g., div.by = 10 to divide by 10) or a character that follows the convention "number followed by 'K', 'M', or 'B'", where, e.g., "10K" is translated as 10000, "1B" is translated as 1000000000, etc. |
max.unique.vals |
Integer to specify the maximum number of unique values of a variable that may be observed for that variable to be included in tabulations. Note that labelr sets a hard ceiling of 5000 on the total number of unique value labels that any variable is permitted to have under any circumstance, as labelr is primarily intended for interactive use with moderately-sized data.frames. See the qtiles argument for an approach to incorporating many-valued numeric variables that exceed the max.unique.vals threshold. |
sort.freq |
By default, returned table rows are sorted in descending order of cell frequency (most frequent categories/combinations first). If set to FALSE, table rows will be sorted by the the distinct values of the vars (in the order vars are specified). |
zero.rm |
If TRUE, zero-frequency vars categories/combinations (i.e., those not observed in the data.frame) will be filtered from the table. For tables that would produce more than 10000 rows, this is done automatically. |
irreg.rm |
If TRUE, tabulations exclude cases where any applicable variable (see vars argument) features any of the following "irregular" values: NA, NaN, Inf, -Inf, or any non-case-sensitive variation on "NA", "NAN", "INF", or "-INF." If FALSE, all "irregular" values (as just defined) are assigned to a "catch-all" category of NA that is featured in the returned table (if/where present). |
wide.col |
If non-NULL, this is the quoted name of a single column / var of supplied data.frame whose distinct values (category levels) you wish to be columns of the returned table. For example, if you are interested in a cross-tab of "edu" (highest level of education) and "race" (a race/ethnicity variable), you could supply vars= c("edu") and wide.col = "race", and the different racial-ethnic group categories would appear as distinct columns, with "edu" category levels appearing as distinct rows, and cell values representing the cross-tabbed cell "edu" level frequencies for the respective "race" groups (see examples). You may supply one wide.col at most. |
This function creates a labelr-friendly data.frame representation of multi-variable tabular data, where either value labels or values can be displayed (use of value labels is the default), and where various convenience options are provided, such as using frequency weights, using proportions instead of counts, rounding those percentages, automatically expressing many-valued, non-value-labeled numerical variables in terms of quantile category groups, or pivoting / casting one of the categorical variables' levels (labels) to serve as columns in a cross-tab-like table.
a data.frame.
# assign mtcars to new data.frame df df <- mtcars # add na values to make things interesting df[1, 1:11] <- NA rownames(df)[1] <- "Missing Car" # add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" arg instead of "vars df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = 3:5, labs = c( "3-speed", "4-speed", "5-speed" ) ) # lookup mapping get_val_labs(df) # introduce other "irregular" values df$am[1] <- NA df[2, "am"] <- NaN df[3, "am"] <- -Inf df[5, "cyl"] <- "NAN" # take a look head(df) # demonstrate tabl() frequency tabulation function # this is the "first call" that will be referenced repeatedly below # labels on, sort by variable values, suppress/exclude NA/irregular values # ...return counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return counts, not proportions # same as "first call", except now value labels are off tabl(df, vars = c("cyl", "am"), labs.on = FALSE, # use variable values sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return counts, not proportions # same as "first call," except now proportions instead of counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = 3 ) # return proportions, rounded to 3rd decimal # same as "first call," except now sort by frequency counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort in order of descending frequency irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return proportions, rounded to 3rd decimal # same as "first call," except now use weights set.seed(2944) # for reproducibility df$freqwt <- sample(10:50, nrow(df), replace = TRUE) # create (fake) freq wts tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL ) # return counts, not proportions df$freqwt <- NULL # we don't need this anymore # now, with extremely large weights to illustrate div.by set.seed(428441) # for reproducibility df$freqwt <- sample(1000000:10000000, nrow(df), replace = TRUE) # large freq wts tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL ) # return counts, not proportions # show div by - Millions tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "1M" ) # one million # show div by - Tens of millions tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "10M" ) # ten million # show div by - 10000 tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = 10000 ) # ten thousand; could've used div.by = "10K" # show div by - 10000, but different syntax tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "10K" ) # ten thousand; could've used div.by = 10000 df$freqwt <- NULL # we don't need this anymore # turn labels off, to make this more compact # do not show zero values (zero.rm) # do not show NA values (irreg.rm) # many-valued numeric variables will be converted to quantile categories by # ...qtiles argument tabl(df, vars = c("am", "gear", "carb", "mpg"), qtiles = 4, # many-valued numerics converted to quantile labs.on = FALSE, # use values, not variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed zero.rm = TRUE, # variable combinations that never occur are suppressed prop.digits = NULL, # return counts, not proportions max.unique.vals = 10 ) # drop from table any var with >10 distinct values # same as above, but include NA/irregular category values, # zero.rm is TRUE; include unobserved (zero-count) category combinations tabl(df, vars = c("am", "gear", "carb", "mpg"), qtiles = 4, labs.on = FALSE, # use values, not variable value labels sort.freq = TRUE, # sort by frequency irreg.rm = FALSE, # preserve/include NAs and irregular values zero.rm = FALSE, # include non-observed combinations prop.digits = NULL, # return counts, not proportions max.unique.vals = 10 ) # drop from table any var with >10 distinct values # show cross-tab view with wide.col arg tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL, # return counts, not proportions wide.col = "am" ) # use "am" as a column variable in a cross-tab view tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL, # return counts, not proportions wide.col = "cyl" ) # use "cyl" as a column variable in a cross-tab view # verify select counts using base::subset() nrow(subset(df, am == 0 & cyl == 4)) nrow(subset(df, am == 0 & cyl == 8)) nrow(subset(df, am == 1 & cyl == 8)) nrow(subset(df, am == 0 & cyl == 6)) nrow(subset(df, am == 1 & cyl == 6)) # will work on an un-labeled data.frame tabl(mtcars, vars = c("am", "gear", "carb", "mpg"))
# assign mtcars to new data.frame df df <- mtcars # add na values to make things interesting df[1, 1:11] <- NA rownames(df)[1] <- "Missing Car" # add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" arg instead of "vars df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = 3:5, labs = c( "3-speed", "4-speed", "5-speed" ) ) # lookup mapping get_val_labs(df) # introduce other "irregular" values df$am[1] <- NA df[2, "am"] <- NaN df[3, "am"] <- -Inf df[5, "cyl"] <- "NAN" # take a look head(df) # demonstrate tabl() frequency tabulation function # this is the "first call" that will be referenced repeatedly below # labels on, sort by variable values, suppress/exclude NA/irregular values # ...return counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return counts, not proportions # same as "first call", except now value labels are off tabl(df, vars = c("cyl", "am"), labs.on = FALSE, # use variable values sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return counts, not proportions # same as "first call," except now proportions instead of counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = 3 ) # return proportions, rounded to 3rd decimal # same as "first call," except now sort by frequency counts tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort in order of descending frequency irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL ) # return proportions, rounded to 3rd decimal # same as "first call," except now use weights set.seed(2944) # for reproducibility df$freqwt <- sample(10:50, nrow(df), replace = TRUE) # create (fake) freq wts tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL ) # return counts, not proportions df$freqwt <- NULL # we don't need this anymore # now, with extremely large weights to illustrate div.by set.seed(428441) # for reproducibility df$freqwt <- sample(1000000:10000000, nrow(df), replace = TRUE) # large freq wts tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL ) # return counts, not proportions # show div by - Millions tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "1M" ) # one million # show div by - Tens of millions tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "10M" ) # ten million # show div by - 10000 tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = 10000 ) # ten thousand; could've used div.by = "10K" # show div by - 10000, but different syntax tabl(df, vars = c("cyl", "am"), wt = "freqwt", # use frequency weights labs.on = TRUE, # use variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = FALSE, # NAs and the like are included/shown prop.digits = NULL, # return counts, not proportions div.by = "10K" ) # ten thousand; could've used div.by = 10000 df$freqwt <- NULL # we don't need this anymore # turn labels off, to make this more compact # do not show zero values (zero.rm) # do not show NA values (irreg.rm) # many-valued numeric variables will be converted to quantile categories by # ...qtiles argument tabl(df, vars = c("am", "gear", "carb", "mpg"), qtiles = 4, # many-valued numerics converted to quantile labs.on = FALSE, # use values, not variable value labels sort.freq = FALSE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed zero.rm = TRUE, # variable combinations that never occur are suppressed prop.digits = NULL, # return counts, not proportions max.unique.vals = 10 ) # drop from table any var with >10 distinct values # same as above, but include NA/irregular category values, # zero.rm is TRUE; include unobserved (zero-count) category combinations tabl(df, vars = c("am", "gear", "carb", "mpg"), qtiles = 4, labs.on = FALSE, # use values, not variable value labels sort.freq = TRUE, # sort by frequency irreg.rm = FALSE, # preserve/include NAs and irregular values zero.rm = FALSE, # include non-observed combinations prop.digits = NULL, # return counts, not proportions max.unique.vals = 10 ) # drop from table any var with >10 distinct values # show cross-tab view with wide.col arg tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL, # return counts, not proportions wide.col = "am" ) # use "am" as a column variable in a cross-tab view tabl(df, vars = c("cyl", "am"), labs.on = TRUE, # use variable value labels sort.freq = TRUE, # sort by vars values (not frequencies) irreg.rm = TRUE, # NAs and the like are suppressed prop.digits = NULL, # return counts, not proportions wide.col = "cyl" ) # use "cyl" as a column variable in a cross-tab view # verify select counts using base::subset() nrow(subset(df, am == 0 & cyl == 4)) nrow(subset(df, am == 0 & cyl == 8)) nrow(subset(df, am == 1 & cyl == 8)) nrow(subset(df, am == 0 & cyl == 6)) nrow(subset(df, am == 1 & cyl == 6)) # will work on an un-labeled data.frame tabl(mtcars, vars = c("am", "gear", "carb", "mpg"))
taill
accepts a labelr value-labeled data.frame and returns the last n
value-labeled rows of that data.frame
taill(data, n = 6L)
taill(data, n = 6L)
data |
a data.frame. |
n |
the number of consecutive rows at the end / bottom of the data.frame to return. |
Whereas utils::tail
returns the last n rows of a data.frame, taill
does
the same thing, substituting value labels for values wherever the former exist.
See also headl
and somel
.
a data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) tail(df) # utils::tail taill(df) # same, but with value labels in place of values
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) tail(df) # utils::tail taill(df) # same, but with value labels in place of values
Note: transfer_labs
searches a data.frame's name.labs and val.labs
attributes and transfers the labels associated with one variable name to
another, so that the first variable no longer has name or value labels
associated with it, and so that whatever name or value labels previously were
associated with it are now associated with the second variable.
transfer_labs(data, from, to)
transfer_labs(data, from, to)
data |
a data.frame. |
from |
the unquoted variable name from which labels will be transferred.
Note, even if the variable itself has been dropped from the data.frame (to
include being renamed), its label attribute meta-data may still be present and
available for use by this function (use |
to |
the unquoted name of the variable to which the labels will be transferred. |
Certain non-labelr data management functions will preserve the labelr labels
that are attached to the passed data.frame, but they will not update those
labels to reflect any changes the function makes to the variable(s). For
example, if one were to use dplyr::rename to change the name of a
value-labeled variable from old name "x1" to new name "satisfaction", the
labelr attributes associated with "x1" would not be transferred to label
"satisfaction." transfer_labs
allows one to transfer those labels, dis-
associating them with the old name (here, "x1") and associating them with new
name (here, "satisfaction").
A data.frame.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # let's add variable NAME labels df <- add_name_labs(df, name.labs = c( "age" = "Age in years", "raceth" = "racial-ethnic group category", "gender" = "gender identity" )) head(df, 4) get_name_labs(df) get_val_labs(df) df <- dplyr::rename(df, race = raceth) # new name is on left of = sign df <- dplyr::rename(df, gend = gender) # new name is on left of = sign head(df, 4) get_name_labs(df) get_val_labs(df) df <- transfer_labs(df, from = raceth, to = race) # labs info transferred from raceth df <- transfer_labs(df, from = gender, to = gend) # labs info transferred to gend df <- transfer_labs(df, from = gend, to = nothere) # var nothere does not exist! head(df, 4) get_name_labs(df) get_val_labs(df)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # let's add variable NAME labels df <- add_name_labs(df, name.labs = c( "age" = "Age in years", "raceth" = "racial-ethnic group category", "gender" = "gender identity" )) head(df, 4) get_name_labs(df) get_val_labs(df) df <- dplyr::rename(df, race = raceth) # new name is on left of = sign df <- dplyr::rename(df, gend = gender) # new name is on left of = sign head(df, 4) get_name_labs(df) get_val_labs(df) df <- transfer_labs(df, from = raceth, to = race) # labs info transferred from raceth df <- transfer_labs(df, from = gender, to = gend) # labs info transferred to gend df <- transfer_labs(df, from = gend, to = nothere) # var nothere does not exist! head(df, 4) get_name_labs(df) get_val_labs(df)
Replace data.frame variable names with their corresponding name labels
(previously assigned using add_name_labs
).
use_name_labs(data, vars = NULL) unl(data, vars = NULL)
use_name_labs(data, vars = NULL) unl(data, vars = NULL)
data |
the data.frame whose name labels you wish to "use" (aka swap, turn on, activate, etc.). |
vars |
the names of the columns (variables) to which name labels will be applied. If NULL, all available name labels will be swapped in for the corresponding variable (column) names. |
Note: unl
is a compact alias for use_name_labs
: they do the same thing,
and the former is easier to type
use_name_labs
works with add_name_labs
, get_name_labs
, use_var_names
,
and drop_name_labs
, to facilitate the creation, accessing, substitution
(swap out, swap back in), and destruction of variable name labels for
variable names. Each variable (column) of a data.frame can receive one and
only one "name label," which typically is a noun phrase that expounds the
meaning of contents of the variable's name (e.g., "Weight in ounces at birth"
might be a name label for a column called "wgt"). add_name_labs
associates
these labels with variables in a data.frame, use_name_labs
applies or
"turns on" those name labels, i.e., swaps out variable names for
corresponding labels, and you can assign the name-label-swapped data.frame
to an object, or you may use it strictly for display purposes (e.g.,
head(use_name_labs(df), 5)). Because they are intended to be more descriptive
than column names, they tend to be more verbose – possibly so verbose as to
undermine their value or convenience for anything other than an on-demand
"What is this variable again?" cheat sheet via get_name_labs(). That said,
this may have some uses (see examples).
A data.frame, with (all or the select) name labels swapped in for the variable names.
# variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # add the above name labeling scheme mt2 <- add_name_labs(mtcars, name.labs = names_labs_vec) # use the name labeling scheme (i.e., swap out column/variable names for # ...their name labels) mt2 <- use_name_labs(mt2) # compare these two - concision vs. informativeness as.data.frame(sapply(mtcars, mean)) as.data.frame(sapply(mt2, mean)) # compare the plot labeling we get with mtcars with(mtcars, hist(mpg)) get_name_labs(mt2) # get the lab of interest, and paste it into `` below with(mt2, hist(`Miles/(US) gallon`)) # regression - this is easier to type lm(mpg ~ cyl, data = mtcars) # regression with name labs - more painful to type/copy-paste, but may be # ...the more informative labels are worth it (your mileage/mpg may vary) # let's see the name labels, then copy paste mpg and cyl labs from console to # ...where we need them in the lm() call get_name_labs(mt2) # copy from this call's console output lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) # paste into `` here # same results, more informative labels, more steps/hand-jamming pain # can also turn them on (semi) permanently # ...then you can use mt2$ syntax in RStudio, and RStudio will autocomplete, # then you can backspace delete the "mt2$" # if you like mt2 <- use_name_labs(mt2) lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) lm(mpg ~ cyl, data = use_var_names(mt2)) # let's turn them back off mt2 <- use_var_names(mt2) # use_var_names() as "undo" of use_name_labs() # back to our previous variable names head(mt2) # even with name labels "off," mt2 retains labelr attribute meta-data # ...which we can strip away using strip_labs() identical(strip_labs(mt2), mtcars) # and we're back
# variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # add the above name labeling scheme mt2 <- add_name_labs(mtcars, name.labs = names_labs_vec) # use the name labeling scheme (i.e., swap out column/variable names for # ...their name labels) mt2 <- use_name_labs(mt2) # compare these two - concision vs. informativeness as.data.frame(sapply(mtcars, mean)) as.data.frame(sapply(mt2, mean)) # compare the plot labeling we get with mtcars with(mtcars, hist(mpg)) get_name_labs(mt2) # get the lab of interest, and paste it into `` below with(mt2, hist(`Miles/(US) gallon`)) # regression - this is easier to type lm(mpg ~ cyl, data = mtcars) # regression with name labs - more painful to type/copy-paste, but may be # ...the more informative labels are worth it (your mileage/mpg may vary) # let's see the name labels, then copy paste mpg and cyl labs from console to # ...where we need them in the lm() call get_name_labs(mt2) # copy from this call's console output lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) # paste into `` here # same results, more informative labels, more steps/hand-jamming pain # can also turn them on (semi) permanently # ...then you can use mt2$ syntax in RStudio, and RStudio will autocomplete, # then you can backspace delete the "mt2$" # if you like mt2 <- use_name_labs(mt2) lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) lm(mpg ~ cyl, data = use_var_names(mt2)) # let's turn them back off mt2 <- use_var_names(mt2) # use_var_names() as "undo" of use_name_labs() # back to our previous variable names head(mt2) # even with name labels "off," mt2 retains labelr attribute meta-data # ...which we can strip away using strip_labs() identical(strip_labs(mt2), mtcars) # and we're back
For a single value-labeled column of a data.frame, replace all of its values with the corresponding value labels and return the modified data.frame.
use_val_lab1(data, var) uvl1(data, var)
use_val_lab1(data, var) uvl1(data, var)
data |
the data.frame. |
var |
the unquoted name of the column (variable) whose values you wish to replace with the corresponding value labels. |
Note 1: use_val_lab1
is a variant of use_val_labs
that allows you to
specify only one variable at a time but that allows you to pass its name
without quoting it (compare use_val_lab1(mtcars, am) to
use_val_labs(mtcars, "am")).
Note 2: uvl1
is a compact alias for use_val_lab1
: they do the same thing,
and the former is easier to type.
Note 3: This command is intended exclusively for interactive use. In
particular, the var argument must be the literal name of a single variable
(column) found in the supplied data.frame and may NOT be, e.g., the name of a
character vector that contains the variable (column name) of interest. If you
wish to supply a character vector with the names of variables (columns) of
interest, use use_val_labs()
.
use_val_lab1
replaces a single, value-labeled data.frame column with a
"value labels-on" version of that column. Here, "labels-on" means that the
column's original values are replaced with the corresponding value labels.
Note that the modified column is a simple, self-contained character variable
that cannot itself be converted or reverted back to the original ("labels-off")
values of its parent/source column.
A data.frame consisting of the originally supplied data.frame, with the var argument variable's values replaced with its value labels.
# swap in "am" value labels for values in mtcars df <- mtcars # copy of mtcars # now, add value labels df <- add_val1( data = df, var = am, vals = c(0, 1), labs = c("automatic", "manual") ) # switch out "am" values for value labels, assign to df_plus df_plus <- use_val_lab1(df, am) head(df_plus[c("am")])
# swap in "am" value labels for values in mtcars df <- mtcars # copy of mtcars # now, add value labels df <- add_val1( data = df, var = am, vals = c(0, 1), labs = c("automatic", "manual") ) # switch out "am" values for value labels, assign to df_plus df_plus <- use_val_lab1(df, am) head(df_plus[c("am")])
Replace the actual values of data.frame variables with the corresponding
value labels (previous assigned using add_val_labs
or a related function).
use_val_labs(data, vars = NULL) uvl(data, vars = NULL)
use_val_labs(data, vars = NULL) uvl(data, vars = NULL)
data |
a data.frame. |
vars |
the names of the columns (variables) for which value labels will will replace original values in the returned data.frame. |
Note: uvl
is a compact alias for use_val_labs
: they do the same thing,
and the former is easier to type.
use_val_labs
takes a variable value-labeled data.frame and substitutes each
(labeled) variable's value labels for its values, returning a data.frame whose
dimensions, names, and members are the same as the inputted data.frame. This
may be useful if one wishes to view data.frame information using the
(potentially) more intuitively meaningful value labels (e.g., gender=1 values
displayed as "Male" instead of 1).
Warning: use_val_labs
will replace existing variable values with value
labels and cannot be undone. If you wish to preserve variable values, be sure
to assign the result of use_val_labs
to a new object. For other ways to
leverage value labels for common data management or inspection tasks, while
preserving raw data values in returned object, see add_lab_cols
,
add_lab_dummies
, flab
, slab
, tabl
, headl
, taill
, and somel
.
A data.frame, with (all or the select) variable value labels substituted for original variable values and any affected variables coerced to character if they were not already.
# Example #1 - mtcars example, one variable at a time # one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) head(use_val_labs(df), 3) # they're there # Example #2 - (Fake) Likert Data # add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1) # keep a copy dflik_conv <- use_val_labs(dflik) head(dflik_conv, 3)
# Example #1 - mtcars example, one variable at a time # one variable at a time, mtcars df <- mtcars # now, add value labels df <- add_val_labs( data = df, vars = "am", vals = c(0, 1), labs = c("automatic", "manual") ) df <- add_val_labs( data = df, vars = "carb", vals = c(1, 2, 3, 4, 6, 8), labs = c( "1-carb", "2-carbs", "3-carbs", "4-carbs", "6-carbs", "8-carbs" ) ) # var arg can be unquoted if using add_val1() # note that this is not add_val_labs(); add_val1() has "var" (not "vars) arg df <- add_val1( data = df, var = cyl, # note, "var," not "vars" arg vals = c(4, 6, 8), labs = c( "four-cyl", "six-cyl", "eight-cyl" ) ) df <- add_val_labs( data = df, vars = "gear", vals = c(3, 4), labs = c( "3-speed", "4-speed" ) ) # Oops, we forgot 5-speeds; let's finish the job. df <- add_val_labs( data = df, vars = "gear", vals = 5, labs = "5-speed" ) head(use_val_labs(df), 3) # they're there # Example #2 - (Fake) Likert Data # add val labs to multiple variables at once # make a "Likert"-type fake data set to demo # note, by default, add_val_labs() "vars" arg will do partial matching # in this case, we catch all vars with "x" in their name set.seed(272) dflik <- make_likert_data(scale = 1:7) vals2label <- 1:7 labs2use <- c( "VSD", "SD", "D", "N", "A", "SA", "VSA" ) dflik <- add_val_labs( data = dflik, vars = c("x", "y3"), # note the vars args vals = vals2label, labs = labs2use, partial = TRUE ) # note, all "x" vars get the labs, as does "y3" # see vars = args above lik1 <- use_val_labs(dflik) head(lik1) # keep a copy dflik_conv <- use_val_labs(dflik) head(dflik_conv, 3)
"Undo" or reverse use_name_labs
operation, restoring the data.frame's
original variable names (assuming they were previously swapped out for labels
using use_name_labs
).
use_var_names(data, vars = NULL) uvn(data, vars = NULL)
use_var_names(data, vars = NULL) uvn(data, vars = NULL)
data |
a data.frame. |
vars |
the names of the columns (variables) to which the action will be
applied. If NULL, all current variable names (presumably, the labels you
turned on using |
Note: uvn
is a compact alias for use_var_names
: they do the same thing,
and the former is easier to type
use_var_names
works with add_name_labs
, get_name_labs
, use_name_labs
,
and drop_name_labs
, to facilitate the creation, accessing, substitution
(swap out, swap back in), and destruction of variable name labels for
variable names. Each variable (column) of a data.frame can receive one and
only one "name label," which typically is a noun phrase that expounds the
meaning of contents of the variable's name (e.g., "Weight in ounces at birth"
might be a name label for a column called "wgt"). add_name_labs
associates
these labels with variables in a data.frame, use_var_names
"turns off" the
name labels are currently being used as variable names as a result of calling
use_name_labs
– that is, use_var_names
"undoes" or reverses
use_name_labs
– assuming none of your intervening calls have altered or
dropped your data.frame's name label meta-data attributes.
A data.frame.
# variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # add the above name labeling scheme mt2 <- add_name_labs(mtcars, name.labs = names_labs_vec) # use the name labeling scheme (i.e., swap out column/variable names for # ...their name labels) mt2 <- use_name_labs(mt2) # compare these two - concision vs. informativeness as.data.frame(sapply(mtcars, mean)) as.data.frame(sapply(mt2, mean)) # compare the plot labeling we get with mtcars with(mtcars, hist(mpg)) get_name_labs(mt2) # get the lab of interest, and paste it into `` below with(mt2, hist(`Miles/(US) gallon`)) # regression - this is easier to type lm(mpg ~ cyl, data = mtcars) # regression with name labs - more painful to type/copy-paste, but maybe # ...the more informative labels are worth it (your mileage may vary) # let's see the name labels, then copy paste mpg and cyl labs from console to # ...where we need them in the lm() call get_name_labs(mt2) # copy from this call's console output lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) # paste into `` here # same results, more informative labels, more steps/hand-jamming pain # can also turn them on (semi) permanently # ...then you can use mt2$ syntax in Rstudio, and Rstudio will autocomplete, # then you can backspace delete the "mt2$" # if you like mt2 <- use_name_labs(mt2) lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) lm(mpg ~ cyl, data = use_var_names(mt2)) # let's turn them back off mt2 <- use_var_names(mt2) # use_var_names() as "undo" of use_name_labs() # back to our previous variable names head(mt2) # even with name labels "off," mt2 retains labelr attribute meta-data # ...which we can strip away using strip_labs() identical(strip_labs(mt2), mtcars) # and we're back
# variable names and their labels names_labs_vec <- c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) # add the above name labeling scheme mt2 <- add_name_labs(mtcars, name.labs = names_labs_vec) # use the name labeling scheme (i.e., swap out column/variable names for # ...their name labels) mt2 <- use_name_labs(mt2) # compare these two - concision vs. informativeness as.data.frame(sapply(mtcars, mean)) as.data.frame(sapply(mt2, mean)) # compare the plot labeling we get with mtcars with(mtcars, hist(mpg)) get_name_labs(mt2) # get the lab of interest, and paste it into `` below with(mt2, hist(`Miles/(US) gallon`)) # regression - this is easier to type lm(mpg ~ cyl, data = mtcars) # regression with name labs - more painful to type/copy-paste, but maybe # ...the more informative labels are worth it (your mileage may vary) # let's see the name labels, then copy paste mpg and cyl labs from console to # ...where we need them in the lm() call get_name_labs(mt2) # copy from this call's console output lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) # paste into `` here # same results, more informative labels, more steps/hand-jamming pain # can also turn them on (semi) permanently # ...then you can use mt2$ syntax in Rstudio, and Rstudio will autocomplete, # then you can backspace delete the "mt2$" # if you like mt2 <- use_name_labs(mt2) lm(`Miles/(US) gallon` ~ `Number of cylinders`, data = mt2) lm(mpg ~ cyl, data = use_var_names(mt2)) # let's turn them back off mt2 <- use_var_names(mt2) # use_var_names() as "undo" of use_name_labs() # back to our previous variable names head(mt2) # even with name labels "off," mt2 retains labelr attribute meta-data # ...which we can strip away using strip_labs() identical(strip_labs(mt2), mtcars) # and we're back
Alternative to the base c()
combine operator that allows one to select
columns by passing unquoted comma-separated column names instead of quoted,
comma-separated column names.
v(...)
v(...)
... |
a vector of unquoted, comma-separated column names. |
Does not support or combine with other subsetting operators, such as negative indexing or colon: names must be full, individual column names, separated by commas (see examples).
A character vector of quoted, comma-separated, column names.
mt2a <- mtcars[c("am", "cyl", "mpg")] mt2b <- mtcars[v(am, cyl, mpg)] identical(mt2a, mt2b) # TRUE # silly demo mtlabs <- mtcars mtlabs <- add_val_labs( data = mtlabs, vars = "am", vals = c(0, 1), labs = v(a, m) # equivalent to c("a", "m") ) get_val_labs(mtlabs)
mt2a <- mtcars[c("am", "cyl", "mpg")] mt2b <- mtcars[v(am, cyl, mpg)] identical(mt2a, mt2b) # TRUE # silly demo mtlabs <- mtcars mtlabs <- add_val_labs( data = mtlabs, vars = "am", vals = c(0, 1), labs = v(a, m) # equivalent to c("a", "m") ) get_val_labs(mtlabs)
Select a single, value-labeled data.frame column (variable), replace each of its values with the corresponding value labels, and return the result as a vector.
val_labs_vec(data, var) vlv(data, var)
val_labs_vec(data, var) vlv(data, var)
data |
a data.frame. |
var |
the unquoted name of the column (variable) whose values will be converted to the associated value labels in the returned vector. |
Note 1: vlv
is a compact alias for val_labs_vec
: they do the same thing,
and the former is easier to type.
Note 2: This command is intended exclusively for interactive use. In particular, the var argument must be the literal name of a single variable (column) found in the supplied data.frame and may NOT be, e.g., the name of a character vector that contains the variable (column name) of interest.
val_labs_vec
works with other labelr functions to facilitate creation,
modification, accessing, use, and destruction of variable-specific value
labels. This functionality is equivalent to calling use_val_labs
with a
single variable passed to the vars argument, except that the latter returns
the entire data.frame with that variable modified, while val_labs_vec
returns only that single modified variable itself (as a vector)
A vector containing the original data.frame variable (var), after its values have been converted to their corresponding value labels.
df <- mtcars # add value labels df <- add_val_labs( data = df, var = "am", vals = c(0, 1), labs = c("automatic", "manual") ) am_labs <- val_labs_vec(df, am) length(df$am) class(df$am) df$am length(am_labs) class(am_labs) am_labs
df <- mtcars # add value labels df <- add_val_labs( data = df, var = "am", vals = c(0, 1), labs = c("automatic", "manual") ) am_labs <- val_labs_vec(df, am) length(df$am) class(df$am) df$am length(am_labs) class(am_labs) am_labs
with_both_labs
instructs R function calls to swap in variable name labels
for column names AND variable value labels for variable values in the objects
they return or side effects they produce.
Note: wbl
is a compact alias for with_both_labs
: they do the same thing,
and the former is easier to type
with_both_labs(data, ...) wbl(data, ...)
with_both_labs(data, ...) wbl(data, ...)
data |
a data.frame with variable name labels and variable value labels. |
... |
an additional expression passed to dots (quotes and dollar signs are not needed or permitted). |
with_both_labs
(see also alias wbl
) is intended for interactive use. With
it, you pass a name-labeled data.frame followed, followed by a comma, followed
by an unquoted R expression (function call) to be evaluated within that
data.frame, and both name and value labels will be substituted for their
corresponding, respective column names and variable value in any returned
object or side effects. Your function call (expression) should refer to
columns of the data.frame passed via your data argument, NOT their name
labels, as the intent is to allow you to pass functions in terms of the
(typically much more concise and familiar) column names while having the
results displayed / presented in terms of the more informative (but more
verbose and typically non-standard) name labels. See examples.
Caution 1: Typically, with_name_labs
will be more appropriate than
with_both_labs
, since conversion of variables' values to their corresponding
labels frequently entails conversion from numeric to character.
Caution 2: with_both_labs
is a rudimentary function that leverages basic
regular expressions and eval(parse(text=))
to substitute name labels for
variable names behind the scenes. It appears to be robust to a range of the
most common commands and operators (e.g., formula or modeling operators, such
as ~ , * , +, :, =, and |). However, it is intended strictly as a convenience
for relatively simple, interactive, single-line-expression, data exploration,
description, or simple model-fitting use cases. It is expressly NOT intended
for: (1) multi-step workflows or pipes, (2) expressions that require or make
reference to objects existing outside the supplied data.frame, or (3) data
management operations aimed at modifying the supplied data.frame. Again, see
the examples for the types of expressions/use cases envisioned.
the value of the evaluated expr
, with name and value labels
substituted for variable (column) names and values, respectively.
# assign mtcars to new data.frame mt2 mt2 <- mtcars # add name labs mt2 <- add_name_labs(mt2, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) # add many-to-1 value labels mt2 <- add_m1_lab( data = mt2, vars = "gear", vals = 4:5, lab = "4+" ) # add many-to-1 value labels mt2 <- add_val_labs( data = mt2, vars = "am", vals = c(0, 1), lab = c("auto", "man") ) with_both_labs(mt2, t.test(mpg ~ am)) with_both_labs(mt2, lm(mpg ~ am)) with_both_labs(mt2, xtabs(~gear)) xtabs(~ mt2$gear)
# assign mtcars to new data.frame mt2 mt2 <- mtcars # add name labs mt2 <- add_name_labs(mt2, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) # add many-to-1 value labels mt2 <- add_m1_lab( data = mt2, vars = "gear", vals = 4:5, lab = "4+" ) # add many-to-1 value labels mt2 <- add_val_labs( data = mt2, vars = "am", vals = c(0, 1), lab = c("auto", "man") ) with_both_labs(mt2, t.test(mpg ~ am)) with_both_labs(mt2, lm(mpg ~ am)) with_both_labs(mt2, xtabs(~gear)) xtabs(~ mt2$gear)
with_name_labs
instructs R function calls to swap in variable name labels
for column names in the objects they return or side effects they produce.
with_name_labs(data, ...) wnl(data, ...)
with_name_labs(data, ...) wnl(data, ...)
data |
a data.frame with name labels. |
... |
an additional expression passed to dots (quotes and dollar signs are not needed or permitted). |
Note: wnl
is a compact alias for with_name_labs
: they do the same thing,
and the former is easier to type
with_name_labs
is intended for interactive use. With it, you pass a
name-labeled data.frame, followed by a comma, followed by an unquoted R
expression (function call) to be evaluated within that data.frame, and name
labels will be substituted for column names in any returned object or side
effects. Your function call (expression) should refer to columns of the
data.frame, NOT their name labels, as the intent is to allow you to pass
functions in terms of the (typically much more concise and familiar) column
names while having the results displayed / presented in terms of the more
informative (but more verbose and typically non-standard) name labels.
See examples.
Caution: with_name_labs
is a rudimentary function that leverages basic,
fairly literal (and potentially brittle!) regular expressions and
eval(parse(text=))
to substitute name labels for variable names behind the
scenes. It appears to be robust to a range of common commands and operators
(e.g., ~ , * , +, :, ::, =, and |). However, it is intended strictly as a
convenience for relatively simple, interactive, single-line-expression use
cases, involving data exploration, description, or simple model-fitting. It
is expressly NOT intended for: (1) multi-step workflows or pipes, (2)
expressions that require or make reference to objects existing outside the
supplied data.frame, or (3) data management operations aimed at modifying
(e.g., subsetting, merging, renaming) – as opposed to merely describing or
analyzing – the supplied data.frame. Again, see the examples for the types
of expressions/use cases envisioned.
the value of the evaluated expr
, with name labels substituted for
variable (column) names.
# assign mtcars to new data.frame mt2 mt2 <- mtcars # add name labs mt2 <- add_name_labs(mt2, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) with_name_labs(mt2, t.test(mpg ~ am)) with_name_labs(mt2, lm(mpg ~ am)) with_name_labs(mt2, summary(mt2)) with_name_labs(mt2, cor(mt2, use = "pairwise.complete.obs")) with_name_labs(mt2, xtabs(~gear)) xtabs(~ mt2$gear) with_name_labs(mt2, cor(mpg, carb)) with_name_labs(mt2, hist(mpg)) with_name_labs(mt2, plot(mpg, carb)) with_name_labs(mt2, head(gear)) with_name_labs(mt2, summary(mt2))
# assign mtcars to new data.frame mt2 mt2 <- mtcars # add name labs mt2 <- add_name_labs(mt2, name.labs = c( "mpg" = "Miles/(US) gallon", "cyl" = "Number of cylinders", "disp" = "Displacement (cu.in.)", "hp" = "Gross horsepower", "drat" = "Rear axle ratio", "wt" = "Weight (1000 lbs)", "qsec" = "1/4 mile time", "vs" = "Engine (0 = V-shaped, 1 = straight)", "am" = "Transmission (0 = automatic, 1 = manual)", "gear" = "Number of forward gears", "carb" = "Number of carburetors" ) ) with_name_labs(mt2, t.test(mpg ~ am)) with_name_labs(mt2, lm(mpg ~ am)) with_name_labs(mt2, summary(mt2)) with_name_labs(mt2, cor(mt2, use = "pairwise.complete.obs")) with_name_labs(mt2, xtabs(~gear)) xtabs(~ mt2$gear) with_name_labs(mt2, cor(mpg, carb)) with_name_labs(mt2, hist(mpg)) with_name_labs(mt2, plot(mpg, carb)) with_name_labs(mt2, head(gear)) with_name_labs(mt2, summary(mt2))
with_val_labs
wraps a data.frame in use_val_labs
and wraps the resulting
data.frame in base::with
in support of base::with
-like non-standard
evaluation (see examples).
Note: wvl
is a compact alias for with_val_labs
: they do the same thing,
and the former is easier to type
with_val_labs(data, ...) wvl(data, ...)
with_val_labs(data, ...) wvl(data, ...)
data |
a data.frame with value-labeled columns. |
... |
additional arguments passed to dots, typically an expression involving a function called on unquoted variable(s) (see examples). |
with_val_labs
(see also alias wvl
) is useful for applying certain
nominal-variable-friendly functions (chiefly, table
and the like) to
value-labeled data.frames. See also base::with
. See also tabl
,
with_name_labs
, and with_both_labs
.
the value of the evaluated expr
, with value labels
substituted for variable values.
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # "with_val_labs" - with()-like function that swaps value labels out for value values # compare with(df, ...) to with_val_labs(df,...) with(df, table(gender, raceth)) # without labels # the same data (note that presentation order changes d/t alphabetical ordering) with_val_labs(df, table(gender, raceth)) # with labels with(use_val_labs(df), table(gender, raceth)) # above is shorthand for this # just raceth with(df, table(raceth)) # with with_val_labs(df, table(raceth)) # with_val_labs # another use case with(df, unique(raceth)) # with with_val_labs(df, unique(raceth)) # with_val_labs # another with(df, modelr::typical(raceth)) # numerical median! with_val_labs(df, modelr::typical(raceth)) # modal label (not the median!)
# make toy demographic (gender, raceth, etc.) data set set.seed(555) df <- make_demo_data(n = 1000) # another labelr:: function # let's add variable VALUE labels for variable "raceth" df <- add_val_labs(df, vars = "raceth", vals = c(1:7), labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"), max.unique.vals = 50 ) # let's add variable VALUE labels for variable "gender" # note that, if we are labeling a single variable, we can use add_val1() # distinction between add_val1() and add_val_labs() will become more meaningful # when we get to our Likert example df <- add_val1( data = df, gender, vals = c(0, 1, 2, 3, 4), labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50 ) # "with_val_labs" - with()-like function that swaps value labels out for value values # compare with(df, ...) to with_val_labs(df,...) with(df, table(gender, raceth)) # without labels # the same data (note that presentation order changes d/t alphabetical ordering) with_val_labs(df, table(gender, raceth)) # with labels with(use_val_labs(df), table(gender, raceth)) # above is shorthand for this # just raceth with(df, table(raceth)) # with with_val_labs(df, table(raceth)) # with_val_labs # another use case with(df, unique(raceth)) # with with_val_labs(df, unique(raceth)) # with_val_labs # another with(df, modelr::typical(raceth)) # numerical median! with_val_labs(df, modelr::typical(raceth)) # modal label (not the median!)