Title: | Libraries, Data Dictionaries, and a Data Step for R |
---|---|
Description: | Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datestep() function will perform row-by-row data processing. |
Authors: | David Bosak [aut, cre], Brian Varney [ctb], Archytas Clinical Solutions [cph] |
Maintainer: | David Bosak <[email protected]> |
License: | CC0 |
Version: | 1.3.7 |
Built: | 2024-11-07 13:44:04 UTC |
Source: | CRAN |
A custom indexer for the Datastep Array. The indexer will
return a value for all columns or a specified column. To
access all columns, leave the indexer empty. Otherwise, specify the
the column name(s) or number(s) to return data for. The indexer will
always act upon the current row in the datastep.
For additional details, see the dsarray
function.
## S3 method for class 'dsarray' x[i = NULL]
## S3 method for class 'dsarray' x[i = NULL]
x |
The |
i |
The index of the datastep array item to return a value for. This index can be a column name or position in the array. It can also be a vector of column names or positions. If no index is supplied, a vector of all array values will be returned. |
The value of the specified column for the current row in the datastep. If no index is supplied, a vector of all column values will be returned.
Other datastep:
datastep()
,
delete()
,
dsarray()
,
dsattr()
,
length.dsarray()
,
output()
library(libr) # Create AirPassengers Data Frame df <- as.data.frame(t(matrix(AirPassengers, 12, dimnames = list(month.abb, seq(1949, 1960)))), stringsAsFactors = FALSE) # Use datastep array to get sums by quarter # Examine different ways of referencing data inside datastep dat <- datastep(df, keep = c("Q1", "Q2", "Q3", "Q4", "Tot"), arrays = list(months = dsarray(names(df))), { # Reference by column name Q1 <- Jan + Feb + Mar # Reference by array positions Q2 <- sum(months[4:6]) # Reference by array names Q3 <- sum(months[c("Jul", "Aug", "Sep")]) # Reference by row position Q4 <- rw$Oct + rw[["Nov"]] + rw[[12]] # Empty indexer returns all column values in array Tot <- sum(months[]) }) dat # Q1 Q2 Q3 Q4 Tot # 1949 362 385 432 341 1520 # 1950 382 409 498 387 1676 # 1951 473 513 582 474 2042 # 1952 544 582 681 557 2364 # 1953 628 707 773 592 2700 # 1954 627 725 854 661 2867 # 1955 742 854 1023 789 3408 # 1956 878 1005 1173 883 3939 # 1957 972 1125 1336 988 4421 # 1958 1020 1146 1400 1006 4572 # 1959 1108 1288 1570 1174 5140 # 1960 1227 1468 1736 1283 5714
library(libr) # Create AirPassengers Data Frame df <- as.data.frame(t(matrix(AirPassengers, 12, dimnames = list(month.abb, seq(1949, 1960)))), stringsAsFactors = FALSE) # Use datastep array to get sums by quarter # Examine different ways of referencing data inside datastep dat <- datastep(df, keep = c("Q1", "Q2", "Q3", "Q4", "Tot"), arrays = list(months = dsarray(names(df))), { # Reference by column name Q1 <- Jan + Feb + Mar # Reference by array positions Q2 <- sum(months[4:6]) # Reference by array names Q3 <- sum(months[c("Jul", "Aug", "Sep")]) # Reference by row position Q4 <- rw$Oct + rw[["Nov"]] + rw[[12]] # Empty indexer returns all column values in array Tot <- sum(months[]) }) dat # Q1 Q2 Q3 Q4 Tot # 1949 362 385 432 341 1520 # 1950 382 409 498 387 1676 # 1951 473 513 582 474 2042 # 1952 544 582 681 557 2364 # 1953 628 707 773 592 2700 # 1954 627 725 854 661 2867 # 1955 742 854 1023 789 3408 # 1956 878 1005 1173 883 3939 # 1957 972 1125 1336 988 4421 # 1958 1020 1146 1400 1006 4572 # 1959 1108 1288 1570 1174 5140 # 1960 1227 1468 1736 1283 5714
The datastep
function allows you to perform
row-wise conditional processing on a data frame, data table, or tibble.
The function
contains parameters to drop, keep, or rename variables, perform
by-group processing, and perform row-wise or column-wise calculations.
datastep( data, steps, keep = NULL, drop = NULL, rename = NULL, by = NULL, calculate = NULL, retain = NULL, attrib = NULL, arrays = NULL, sort_check = TRUE, format = NULL, label = NULL, where = NULL, set = NULL, merge = NULL, merge_by = NULL, merge_in = NULL, log = TRUE, subset = NULL )
datastep( data, steps, keep = NULL, drop = NULL, rename = NULL, by = NULL, calculate = NULL, retain = NULL, attrib = NULL, arrays = NULL, sort_check = TRUE, format = NULL, label = NULL, where = NULL, set = NULL, merge = NULL, merge_by = NULL, merge_in = NULL, log = TRUE, subset = NULL )
data |
The data to step through. This parameter is required. Valid
values are a data frame or a NULL. In the case of a NULL, you must include
at least one |
steps |
The operations to perform on the data. This parameter is specified as a set of R statements contained within curly braces. If no steps are desired, pass empty curly braces. |
keep |
A vector of quoted variable names to keep in the output data set. By default, all variables are kept. |
drop |
A vector of quoted variable names to drop from the output data set. By default, no variables are dropped. |
rename |
A named vector of quoted variables to rename. The current variable name should be on the left hand side of the name/value pair, and the new variable name should be on the right. The rename operation is performed after the data step, the keep, and the drop. Therefore, the data steps should use the input variable name. By default, all variables retain their original names. |
by |
A vector of quoted variable names to use for by-group processing.
This parameter will activate the |
calculate |
Steps to set up calculated variables.
Calculated variables are commonly generated with summary functions such as
|
retain |
A list of variable names and initial values
to retain. Retained variables will begin the data step with the initial
value. Then for each iteration of the data step, the variable will
be populated with the ending value from the previous step. The retain
functionality allows you to perform cumulative operations or decisions
based on the value of the previous iteration of the data step. Initial
values should be of the expected data type for the column. For example,
for a numeric column set the initial value to a zero, and for a
character column, set the initial value to an empty string, i.e.
|
attrib |
A named list of attributes. The list can be either
|
arrays |
A named list of |
sort_check |
Checks to see if the input data is sorted according to
the |
format |
A named list of formats to assign to the data frame. Formats will be assigned both before and after the datastep. |
label |
A named list of labels to assign to the output data frame. |
where |
An expression to filter the output dataset. The where
clause will be applied prior to any drop, keep, or rename statement.
Use the |
set |
A dataset or list of datasets to append to the input data frame. The set operation will occur at the beginning of the datastep, prior to the execution of any steps. The columns in the set datasets do not have to match. Where there are no matching columns, the missing values will be filled with NA. |
merge |
A dataset or list of datasets to merge with the input
data. The merge operation will occur at the beginning of the datastep,
prior to the execution of any steps. When the |
merge_by |
If the |
merge_in |
A vector of column names to be used to hold the merge flags.
The number of names should correspond to the number of
datasets being merged. The
merge flags will be populated with 0 or 1 values to indicate whether the record
came from the corresponding table. Use the |
log |
Whether or not to log the datastep. Default is TRUE. This parameter is used internally. |
subset |
The |
Two parameters on the datastep
function are required:
data and steps. The data parameter is
the input data to the data step. The steps parameter contains
the code statements you want to apply to the data. The steps
should be wrapped in curly braces. When running, the data step
will loop through the input data row-by-row, and execute the steps for
each row. Variables inside the data step can be accessed using
non-standard evaluation (meaning they do not have to be quoted).
Note that the data step is pipe-friendly. It can be used within a dplyr pipeline. The data step allows you to perform deeply nested and complex conditionals within the pipeline. The data step is also very readable compared to other pipeline conditionals.
The processed data frame, tibble, or data table.
The datastep
function provides five automatic variables. These
variables are generated for every data step, and can
be accessed at any point within the data step:
data: Represents the entire input data frame.
rw: Represents the current row.
n.: Contains the row number.
first.: Indicates the beginning of a by-group.
last.: Indicates the end of a by-group.
Automatic variables will be dropped from the data frame at the end of the data step. If you wish to keep the automatic variable values, assign the automatic variable to a new variable and keep that variable.
If there are multiple by group variables, the first.
and last.
automatic variables indicates a either/or combination of all by variables.
In addition,
first.X
and last.X
automatic variables will be created for
each variable, where "X" represents the name of the specified variable.
As always, these names are case-sensitive.
To set attributes for a column on your data, use the attrib
parameter. Example attributes include 'label', 'description',
and 'format'. These types of attributes are set using a named list and a
dsattr
object. The name of the list item
is the column name you want to set attributes on.
The value of the list item is the dsattr
object.
For a complete list of available attributes,
see the dsattr
documentation.
It should be mentioned that the dsattr
object is not required.
You can also set attributes with a name and a default value.
The default value can be any valid data value, such as a number or string.
The label and format attributes may also be set with the 'label' and 'format' parameters. These parameters accept a named list with the labels or formats, and will be assigned to the output data frame.
Optional parameters on the datastep
allow you to shape
the output dataset or enhance the operation of the datastep
. Some
parameters are classified as input parameters, and others as output
parameters. Input parameters modify the data before the data step
operations takes place. Output parameters operate on the data
after the data step.
The keep
, drop
, and rename
parameters
are output parameters. These parameters will be applied after the
data step statements are executed. Therefore, within the data step,
refer to variables using the input variable name. New variables may
be created on the fly, just by assigning a value to the new
variable name.
The keep
, drop
, and rename
parameters require
quoted variable names, as the variables may not yet exist at the
time they are passed into the function. Within a data step or
calculate block, however,
variable names do not need to be quoted.
The calculate
parameter is used to perform vectorized functions
on the data prior to executing the data step. For example, you
may want to determine a mean for a variable in the calculate
block, and then make decisions on that mean in the data step block.
The retain
parameter allows you to access the prior row value.
At the start of the data step, the retained variable is seeded with the
initial value. For each subsequent step, the variable is seeded with the
value of the prior step/row. This functionality allows you to increment
values or perform cumulative operations.
calculate
and retain
are both input parameters.
The subset
and where
parameters can both be used to filter
the datastep data. The difference is that subset
is an input
parameter, and where
is an output parameter.
The datastep
function allows you to join one or more input datasets
into a single output dataset. There are two operations in this regard:
"set" and "merge". A set operation stacks the datasets vertically, one
on top of another. The merge operation joins the datasets horizontally,
left and right.
The datastep
set and merge operations are unusually flexible compared
to other join functions in R. The set operation does not require the same
number of columns in each dataset. Likewise, the merge operation does
not require the same number of rows. In both cases, where there is no
corresponding column or row, the function will fill in with NA values.
The merge operation can perform both inner and outer joins. By default,
the merge performs a full outer join. If you wish to limit the operation
to an inner join, use the "merge_in" parameter to set up variables with
which you can filter the desired rows. The "merge_in" variables will
be populated with 1 or 0 values, which indicate whether or not the
dataset contained that row. Once these variables are populated, you
can easily limit the results using a where
expression, or the
delete
or output
functions from inside the datastep.
There are times you may want to iterate over columns in your data step. Such iteration is particularly useful when you have a wide dataset, and wish to perform the same operation on several columns. For instance, you may want to calculate the mean for 10 different variables on your dataset.
The arrays
parameter allows you to iterate across columns.
This parameter accepts a named list of
dsarray
objects. The dsarray
is essentially
a list of columns. You can use a for
loop to iterate over the
dsarray
, and also send it into a vectorized function. Data
step arrays allow to you to perform row-wise calculations.
For instance, you can calculate
a sum or mean by row for the variables in your array.
By default, the data step will retain the column order of any variables that already exist on the input data set. New variables created in a data step will be appended to the right of existing variables. Yet these new variables can sometimes appear in an order that is unexpected or undesirable.
There are two ways to control the order of output columns:
the keep
parameter and the attrib
parameter.
Columns names included on the
keep
parameter will appear in the order indicated on the keep
vector. This ordering mechanism is appropriate when you have a small
number of columns and can easily pass the entire keep list.
To control the order of new variables only, use the attrib
parameter.
New variables for which attributes are defined will appear in the
order indicated on the attrib
list. The attrib
list
is useful when you are adding a relatively small number of columns to
an existing data set, and don't want to pass all the column names.
Remember
that you can supply an attribute list with default values only,
such as attrib = list(column1 = 0, column2 = "")
. This style of
attribute definition is convenient if you are only trying to control
the order of columns.
If the above two mechanisms to control column order are not sufficient, use the data frame subset operators or column ordering functions provided by other packages.
The datastep
is intended to be used on small and medium-sized
datasets. It is not recommended for large datasets.
If your dataset is greater than one million rows, you should consider
other techniques for processing your data. While there is no
built-in restriction on the number of rows, performance of the
datastep
can become unacceptable with a large number of rows.
libname
function to create a data library, and
the dictionary
function to create a data dictionary.
Other datastep:
[.dsarray()
,
delete()
,
dsarray()
,
dsattr()
,
length.dsarray()
,
output()
# Example #1: Simple Data Step df <- datastep(mtcars[1:10,], keep = c("mpg", "cyl", "disp", "mpgcat", "recdt", "is8cyl"), { if (mpg >= 20) mpgcat <- "High" else mpgcat <- "Low" recdt <- as.Date("1974-06-10") if (cyl == 8) is8cyl <- TRUE else is8cyl <- FALSE }) df # mpg cyl disp mpgcat recdt # Mazda RX4 21.0 6 160.0 High 1974-06-10 # Mazda RX4 Wag 21.0 6 160.0 High 1974-06-10 # Datsun 710 22.8 4 108.0 High 1974-06-10 # Hornet 4 Drive 21.4 6 258.0 High 1974-06-10 # Hornet Sportabout 18.7 8 360.0 Low 1974-06-10 # Valiant 18.1 6 225.0 Low 1974-06-10 # Duster 360 14.3 8 360.0 Low 1974-06-10 # Merc 240D 24.4 4 146.7 High 1974-06-10 # Merc 230 22.8 4 140.8 High 1974-06-10 # Merc 280 19.2 6 167.6 Low 1974-06-10 # Example #2: By-group Processing df <- datastep(mtcars[1:10,], keep = c("mpg", "cyl", "gear", "grp"), by = c("gear"), sort_check = FALSE, { if (first.) grp <- "Start" else if (last.) grp <- "End" else grp <- "-" }) df # mpg cyl gear grp # Mazda RX4 21.0 6 4 Start # Mazda RX4 Wag 21.0 6 4 - # Datsun 710 22.8 4 4 End # Hornet 4 Drive 21.4 6 3 Start # Hornet Sportabout 18.7 8 3 - # Valiant 18.1 6 3 - # Duster 360 14.3 8 3 End # Merc 240D 24.4 4 4 Start # Merc 230 22.8 4 4 - # Merc 280 19.2 6 4 End # Example #3: Calculate Block df <- datastep(mtcars, keep = c("mpg", "cyl", "mean_mpg", "mpgcat"), calculate = { mean_mpg = mean(mpg) }, { if (mpg >= mean_mpg) mpgcat <- "High" else mpgcat <- "Low" }) df[1:10,] # mpg cyl mean_mpg mpgcat # Mazda RX4 21.0 6 20.09062 High # Mazda RX4 Wag 21.0 6 20.09062 High # Datsun 710 22.8 4 20.09062 High # Hornet 4 Drive 21.4 6 20.09062 High # Hornet Sportabout 18.7 8 20.09062 Low # Valiant 18.1 6 20.09062 Low # Duster 360 14.3 8 20.09062 Low # Merc 240D 24.4 4 20.09062 High # Merc 230 22.8 4 20.09062 High # Merc 280 19.2 6 20.09062 Low # Example #4: Data pipeline library(dplyr) library(magrittr) # Add datastep to dplyr pipeline df <- mtcars %>% select(mpg, cyl, gear) %>% mutate(mean_mpg = mean(mpg)) %>% datastep({ if (mpg >= mean_mpg) mpgcat <- "High" else mpgcat <- "Low" }) %>% filter(row_number() <= 10) df # mpg cyl gear mean_mpg mpgcat # 1 21.0 6 4 20.09062 High # 2 21.0 6 4 20.09062 High # 3 22.8 4 4 20.09062 High # 4 21.4 6 3 20.09062 High # 5 18.7 8 3 20.09062 Low # 6 18.1 6 3 20.09062 Low # 7 14.3 8 3 20.09062 Low # 8 24.4 4 4 20.09062 High # 9 22.8 4 4 20.09062 High # 10 19.2 6 4 20.09062 Low # Example #5: Drop, Retain and Rename df <- datastep(mtcars[1:10, ], drop = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"), retain = list(cumwt = 0 ), rename = c(mpg = "MPG", cyl = "Cylinders", wt = "Wgt", cumwt = "Cumulative Wgt"), { cumwt <- cumwt + wt }) df # MPG Cylinders Wgt Cumulative Wgt # Mazda RX4 21.0 6 2.620 2.620 # Mazda RX4 Wag 21.0 6 2.875 5.495 # Datsun 710 22.8 4 2.320 7.815 # Hornet 4 Drive 21.4 6 3.215 11.030 # Hornet Sportabout 18.7 8 3.440 14.470 # Valiant 18.1 6 3.460 17.930 # Duster 360 14.3 8 3.570 21.500 # Merc 240D 24.4 4 3.190 24.690 # Merc 230 22.8 4 3.150 27.840 # Merc 280 19.2 6 3.440 31.280 # Example #6: Attributes and Arrays # Create sample data dat <- read.table(header = TRUE, text = ' Year Q1 Q2 Q3 Q4 2000 125 137 152 140 2001 132 145 138 87 2002 101 104 115 121') # Use attrib list to control column order and add labels # Use array to calculate row sums and means, and get best quarter df <- datastep(dat, attrib = list(Tot = dsattr(0, label = "Year Total"), Avg = dsattr(0, label = "Year Average"), Best = dsattr(0, label = "Best Quarter")), arrays = list(qtrs = dsarray("Q1", "Q2", "Q3", "Q4")), drop = "q", steps = { # Empty brackets return all array values Tot <- sum(qtrs[]) Avg <- mean(qtrs[]) # Iterate to find best quarter for (q in qtrs) { if (qtrs[q] == max(qtrs[])) Best <- q } }) df # Year Q1 Q2 Q3 Q4 Tot Avg Best # 1 2000 125 137 152 140 554 138.50 Q3 # 2 2001 132 145 138 87 502 125.50 Q2 # 3 2002 101 104 115 121 441 110.25 Q4 dictionary(df) # A tibble: 8 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <int> <chr> <int> <int> # 1 df Year integer NA NA NA NA NA 3 0 # 2 df Q1 integer NA NA NA NA NA 3 0 # 3 df Q2 integer NA NA NA NA NA 3 0 # 4 df Q3 integer NA NA NA NA NA 3 0 # 5 df Q4 integer NA NA NA NA NA 3 0 # 6 df Tot integer Year Total NA NA NA NA 3 0 # 7 df Avg numeric Year Average NA NA NA NA 3 0 # 8 df Best character Best Quarter NA NA 2 NA 3 0 # Example #7: Set and Merge Operations # Create sample data grp1 <- read.table(header = TRUE, text = ' GROUP NAME G01 Group1 G02 Group2 ', stringsAsFactors = FALSE) grp2 <- read.table(header = TRUE, text = ' GROUP NAME G03 Group3 G04 Group4 ', stringsAsFactors = FALSE) dat <- read.table(header = TRUE, text = ' ID AGE SEX GROUP A01 58 F G01 A02 20 M G02 A03 47 F G05 A04 11 M G03 A05 23 F G01 ', stringsAsFactors = FALSE) # Set operation grps <- datastep(grp1, set = grp2, {}) grps # GROUP NAME # 1 G01 Group1 # 2 G02 Group2 # 3 G03 Group3 # 4 G04 Group4 # Merge operation - Outer Join res <- datastep(dat, merge = grps, merge_by = "GROUP", merge_in = c("inA", "inB"), {}) # View results res # ID AGE SEX GROUP NAME inA inB # 1 A01 58 F G01 Group1 1 1 # 2 A05 23 F G01 Group1 1 1 # 3 A02 20 M G02 Group2 1 1 # 4 A04 11 M G03 Group3 1 1 # 5 A03 47 F G05 <NA> 1 0 # 6 <NA> NA <NA> G04 Group4 0 1 # Merge operation - Inner Join res <- datastep(dat, merge = grps, merge_by = "GROUP", merge_in = c("inA", "inB"), where = expression(inA & inB), {}) # View results res # ID AGE SEX GROUP NAME inA inB # 1 A01 58 F G01 Group1 1 1 # 2 A05 23 F G01 Group1 1 1 # 3 A02 20 M G02 Group2 1 1 # 4 A04 11 M G03 Group3 1 1 # Example #8: Data NULL # Create new dataset using output() functions. res <- datastep(NULL, { ID <- 10 Item <- "Pencil" output() ID <- 20 Item <- "Scissors" output() }) # View results res # ID Item # 1 10 Pencil # 2 20 Scissors
# Example #1: Simple Data Step df <- datastep(mtcars[1:10,], keep = c("mpg", "cyl", "disp", "mpgcat", "recdt", "is8cyl"), { if (mpg >= 20) mpgcat <- "High" else mpgcat <- "Low" recdt <- as.Date("1974-06-10") if (cyl == 8) is8cyl <- TRUE else is8cyl <- FALSE }) df # mpg cyl disp mpgcat recdt # Mazda RX4 21.0 6 160.0 High 1974-06-10 # Mazda RX4 Wag 21.0 6 160.0 High 1974-06-10 # Datsun 710 22.8 4 108.0 High 1974-06-10 # Hornet 4 Drive 21.4 6 258.0 High 1974-06-10 # Hornet Sportabout 18.7 8 360.0 Low 1974-06-10 # Valiant 18.1 6 225.0 Low 1974-06-10 # Duster 360 14.3 8 360.0 Low 1974-06-10 # Merc 240D 24.4 4 146.7 High 1974-06-10 # Merc 230 22.8 4 140.8 High 1974-06-10 # Merc 280 19.2 6 167.6 Low 1974-06-10 # Example #2: By-group Processing df <- datastep(mtcars[1:10,], keep = c("mpg", "cyl", "gear", "grp"), by = c("gear"), sort_check = FALSE, { if (first.) grp <- "Start" else if (last.) grp <- "End" else grp <- "-" }) df # mpg cyl gear grp # Mazda RX4 21.0 6 4 Start # Mazda RX4 Wag 21.0 6 4 - # Datsun 710 22.8 4 4 End # Hornet 4 Drive 21.4 6 3 Start # Hornet Sportabout 18.7 8 3 - # Valiant 18.1 6 3 - # Duster 360 14.3 8 3 End # Merc 240D 24.4 4 4 Start # Merc 230 22.8 4 4 - # Merc 280 19.2 6 4 End # Example #3: Calculate Block df <- datastep(mtcars, keep = c("mpg", "cyl", "mean_mpg", "mpgcat"), calculate = { mean_mpg = mean(mpg) }, { if (mpg >= mean_mpg) mpgcat <- "High" else mpgcat <- "Low" }) df[1:10,] # mpg cyl mean_mpg mpgcat # Mazda RX4 21.0 6 20.09062 High # Mazda RX4 Wag 21.0 6 20.09062 High # Datsun 710 22.8 4 20.09062 High # Hornet 4 Drive 21.4 6 20.09062 High # Hornet Sportabout 18.7 8 20.09062 Low # Valiant 18.1 6 20.09062 Low # Duster 360 14.3 8 20.09062 Low # Merc 240D 24.4 4 20.09062 High # Merc 230 22.8 4 20.09062 High # Merc 280 19.2 6 20.09062 Low # Example #4: Data pipeline library(dplyr) library(magrittr) # Add datastep to dplyr pipeline df <- mtcars %>% select(mpg, cyl, gear) %>% mutate(mean_mpg = mean(mpg)) %>% datastep({ if (mpg >= mean_mpg) mpgcat <- "High" else mpgcat <- "Low" }) %>% filter(row_number() <= 10) df # mpg cyl gear mean_mpg mpgcat # 1 21.0 6 4 20.09062 High # 2 21.0 6 4 20.09062 High # 3 22.8 4 4 20.09062 High # 4 21.4 6 3 20.09062 High # 5 18.7 8 3 20.09062 Low # 6 18.1 6 3 20.09062 Low # 7 14.3 8 3 20.09062 Low # 8 24.4 4 4 20.09062 High # 9 22.8 4 4 20.09062 High # 10 19.2 6 4 20.09062 Low # Example #5: Drop, Retain and Rename df <- datastep(mtcars[1:10, ], drop = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"), retain = list(cumwt = 0 ), rename = c(mpg = "MPG", cyl = "Cylinders", wt = "Wgt", cumwt = "Cumulative Wgt"), { cumwt <- cumwt + wt }) df # MPG Cylinders Wgt Cumulative Wgt # Mazda RX4 21.0 6 2.620 2.620 # Mazda RX4 Wag 21.0 6 2.875 5.495 # Datsun 710 22.8 4 2.320 7.815 # Hornet 4 Drive 21.4 6 3.215 11.030 # Hornet Sportabout 18.7 8 3.440 14.470 # Valiant 18.1 6 3.460 17.930 # Duster 360 14.3 8 3.570 21.500 # Merc 240D 24.4 4 3.190 24.690 # Merc 230 22.8 4 3.150 27.840 # Merc 280 19.2 6 3.440 31.280 # Example #6: Attributes and Arrays # Create sample data dat <- read.table(header = TRUE, text = ' Year Q1 Q2 Q3 Q4 2000 125 137 152 140 2001 132 145 138 87 2002 101 104 115 121') # Use attrib list to control column order and add labels # Use array to calculate row sums and means, and get best quarter df <- datastep(dat, attrib = list(Tot = dsattr(0, label = "Year Total"), Avg = dsattr(0, label = "Year Average"), Best = dsattr(0, label = "Best Quarter")), arrays = list(qtrs = dsarray("Q1", "Q2", "Q3", "Q4")), drop = "q", steps = { # Empty brackets return all array values Tot <- sum(qtrs[]) Avg <- mean(qtrs[]) # Iterate to find best quarter for (q in qtrs) { if (qtrs[q] == max(qtrs[])) Best <- q } }) df # Year Q1 Q2 Q3 Q4 Tot Avg Best # 1 2000 125 137 152 140 554 138.50 Q3 # 2 2001 132 145 138 87 502 125.50 Q2 # 3 2002 101 104 115 121 441 110.25 Q4 dictionary(df) # A tibble: 8 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <int> <chr> <int> <int> # 1 df Year integer NA NA NA NA NA 3 0 # 2 df Q1 integer NA NA NA NA NA 3 0 # 3 df Q2 integer NA NA NA NA NA 3 0 # 4 df Q3 integer NA NA NA NA NA 3 0 # 5 df Q4 integer NA NA NA NA NA 3 0 # 6 df Tot integer Year Total NA NA NA NA 3 0 # 7 df Avg numeric Year Average NA NA NA NA 3 0 # 8 df Best character Best Quarter NA NA 2 NA 3 0 # Example #7: Set and Merge Operations # Create sample data grp1 <- read.table(header = TRUE, text = ' GROUP NAME G01 Group1 G02 Group2 ', stringsAsFactors = FALSE) grp2 <- read.table(header = TRUE, text = ' GROUP NAME G03 Group3 G04 Group4 ', stringsAsFactors = FALSE) dat <- read.table(header = TRUE, text = ' ID AGE SEX GROUP A01 58 F G01 A02 20 M G02 A03 47 F G05 A04 11 M G03 A05 23 F G01 ', stringsAsFactors = FALSE) # Set operation grps <- datastep(grp1, set = grp2, {}) grps # GROUP NAME # 1 G01 Group1 # 2 G02 Group2 # 3 G03 Group3 # 4 G04 Group4 # Merge operation - Outer Join res <- datastep(dat, merge = grps, merge_by = "GROUP", merge_in = c("inA", "inB"), {}) # View results res # ID AGE SEX GROUP NAME inA inB # 1 A01 58 F G01 Group1 1 1 # 2 A05 23 F G01 Group1 1 1 # 3 A02 20 M G02 Group2 1 1 # 4 A04 11 M G03 Group3 1 1 # 5 A03 47 F G05 <NA> 1 0 # 6 <NA> NA <NA> G04 Group4 0 1 # Merge operation - Inner Join res <- datastep(dat, merge = grps, merge_by = "GROUP", merge_in = c("inA", "inB"), where = expression(inA & inB), {}) # View results res # ID AGE SEX GROUP NAME inA inB # 1 A01 58 F G01 Group1 1 1 # 2 A05 23 F G01 Group1 1 1 # 3 A02 20 M G02 Group2 1 1 # 4 A04 11 M G03 Group3 1 1 # Example #8: Data NULL # Create new dataset using output() functions. res <- datastep(NULL, { ID <- 10 Item <- "Pencil" output() ID <- 20 Item <- "Scissors" output() }) # View results res # ID Item # 1 10 Pencil # 2 20 Scissors
The delete
function will remove an observation
from the output of a datastep. The function takes no parameters. To use
the function, simply call it on the rows you want to delete. Typically
it is called within a conditional.
delete()
delete()
Observation is marked with a delete flag. No return value.
Other datastep:
[.dsarray()
,
datastep()
,
dsarray()
,
dsattr()
,
length.dsarray()
,
output()
#' # Remove all cars that are not 4 cylinder df <- datastep(mtcars, keep = c("mpg", "cyl", "disp"), { if (cyl != 4) delete() }) df # mpg cyl disp # 1 22.8 4 108.0 # 2 24.4 4 146.7 # 3 22.8 4 140.8 # 4 32.4 4 78.7 # 5 30.4 4 75.7 # 6 33.9 4 71.1 # 7 21.5 4 120.1 # 8 27.3 4 79.0 # 9 26.0 4 120.3 # 10 30.4 4 95.1 # 11 21.4 4 121.0
#' # Remove all cars that are not 4 cylinder df <- datastep(mtcars, keep = c("mpg", "cyl", "disp"), { if (cyl != 4) delete() }) df # mpg cyl disp # 1 22.8 4 108.0 # 2 24.4 4 146.7 # 3 22.8 4 140.8 # 4 32.4 4 78.7 # 5 30.4 4 75.7 # 6 33.9 4 71.1 # 7 21.5 4 120.1 # 8 27.3 4 79.0 # 9 26.0 4 120.3 # 10 30.4 4 95.1 # 11 21.4 4 121.0
A function to create a data dictionary for a data frame, a tibble, or a data library. The function will generate a tibble of information about the data. The tibble will contain the following columns:
Name: The name of the data object.
Column: The name of the column.
Class: The class of the column.
Label: The value of the label attribute.
Description: A description applied to this column.
Format: The value of the format attribute.
Width: The value of the width attribute if any have been assigned.
Justify: The justification or alignment attribute value.
Rows: The number of data rows.
NAs: The number of NA values in this column.
MaxChar: The maximum character length of the values in this column with no padding.
dictionary(x)
dictionary(x)
x |
The input library, data frame, or tibble. |
libname
to create a data library. Also
see the dsattr
function to set attributes for your
dataset from within a datastep
. To render attributes,
see the fmtr package.
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, beaver1) lib_add(dat, iris) # Examine the dictionary for the library dictionary(dat) # A tibble: 9 x 10 # Name Column Class Label Description Format Width Justify Rows NAs MaxChar # <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <int> <int> <int> # 1 beaver1 day numeric NA NA NA NA NA 114 0 3 # 2 beaver1 time numeric NA NA NA NA NA 114 0 4 # 3 beaver1 temp numeric NA NA NA NA NA 114 0 5 # 4 beaver1 activ numeric NA NA NA NA NA 114 0 1 # 5 iris Sepal.Length numeric NA NA NA NA NA 150 0 3 # 6 iris Sepal.Width numeric NA NA NA NA NA 150 0 3 # 7 iris Petal.Length numeric NA NA NA NA NA 150 0 3 # 8 iris Petal.Width numeric NA NA NA NA NA 150 0 3 # 9 iris Species factor NA NA NA NA NA 150 0 10 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, beaver1) lib_add(dat, iris) # Examine the dictionary for the library dictionary(dat) # A tibble: 9 x 10 # Name Column Class Label Description Format Width Justify Rows NAs MaxChar # <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <int> <int> <int> # 1 beaver1 day numeric NA NA NA NA NA 114 0 3 # 2 beaver1 time numeric NA NA NA NA NA 114 0 4 # 3 beaver1 temp numeric NA NA NA NA NA 114 0 5 # 4 beaver1 activ numeric NA NA NA NA NA 114 0 1 # 5 iris Sepal.Length numeric NA NA NA NA NA 150 0 3 # 6 iris Sepal.Width numeric NA NA NA NA NA 150 0 3 # 7 iris Petal.Length numeric NA NA NA NA NA 150 0 3 # 8 iris Petal.Width numeric NA NA NA NA NA 150 0 3 # 9 iris Species factor NA NA NA NA NA 150 0 10 # Clean up lib_delete(dat)
A data step array is an object that allows you to iterate
across a set of columns inside a datastep
. This structure is
useful when you need to perform the same or similar operations on many columns.
dsarray(...)
dsarray(...)
... |
Column names to include as part of the datastep array. The
names can be provided as quoted strings or a vector of strings.
If names are provided as quoted strings, separate the strings with commas
(i.e. |
The datastep array has an indexer that allows you to access a particular column value. The indexer can be used within a for loop to iterate over the array. In this manner, you can place a set of conditions inside the for loop and run the same conditional logic on all the columns in the array.
You can also use the datastep array with an empty indexer in vectorized
functions like sum
, mean
,
and max
. The empty indexer will return all the
values in the array for the current row.
The datastep array object.
libname
to create a data library, and
dictionary
for generating a data dictionary
Other datastep:
[.dsarray()
,
datastep()
,
delete()
,
dsattr()
,
length.dsarray()
,
output()
library(libr) # Create AirPassengers Data Frame df <- as.data.frame(t(matrix(AirPassengers, 12, dimnames = list(month.abb, seq(1949, 1960)))), stringsAsFactors = FALSE) # Use datastep array to get year tot, mean, and top month dat <- datastep(df, arrays = list(months = dsarray(names(df))), attrib = list(Tot = 0, Mean = 0, Top = ""), drop = "mth", { Tot <- sum(months[]) Mean <- mean(months[]) for (mth in months) { if (months[mth] == max(months[])) { Top <- mth } } }) dat # Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Tot Mean Top # 1949 112 118 132 129 121 135 148 148 136 119 104 118 1520 126.6667 Aug # 1950 115 126 141 135 125 149 170 170 158 133 114 140 1676 139.6667 Aug # 1951 145 150 178 163 172 178 199 199 184 162 146 166 2042 170.1667 Aug # 1952 171 180 193 181 183 218 230 242 209 191 172 194 2364 197.0000 Aug # 1953 196 196 236 235 229 243 264 272 237 211 180 201 2700 225.0000 Aug # 1954 204 188 235 227 234 264 302 293 259 229 203 229 2867 238.9167 Jul # 1955 242 233 267 269 270 315 364 347 312 274 237 278 3408 284.0000 Jul # 1956 284 277 317 313 318 374 413 405 355 306 271 306 3939 328.2500 Jul # 1957 315 301 356 348 355 422 465 467 404 347 305 336 4421 368.4167 Aug # 1958 340 318 362 348 363 435 491 505 404 359 310 337 4572 381.0000 Aug # 1959 360 342 406 396 420 472 548 559 463 407 362 405 5140 428.3333 Aug # 1960 417 391 419 461 472 535 622 606 508 461 390 432 5714 476.1667 Jul
library(libr) # Create AirPassengers Data Frame df <- as.data.frame(t(matrix(AirPassengers, 12, dimnames = list(month.abb, seq(1949, 1960)))), stringsAsFactors = FALSE) # Use datastep array to get year tot, mean, and top month dat <- datastep(df, arrays = list(months = dsarray(names(df))), attrib = list(Tot = 0, Mean = 0, Top = ""), drop = "mth", { Tot <- sum(months[]) Mean <- mean(months[]) for (mth in months) { if (months[mth] == max(months[])) { Top <- mth } } }) dat # Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Tot Mean Top # 1949 112 118 132 129 121 135 148 148 136 119 104 118 1520 126.6667 Aug # 1950 115 126 141 135 125 149 170 170 158 133 114 140 1676 139.6667 Aug # 1951 145 150 178 163 172 178 199 199 184 162 146 166 2042 170.1667 Aug # 1952 171 180 193 181 183 218 230 242 209 191 172 194 2364 197.0000 Aug # 1953 196 196 236 235 229 243 264 272 237 211 180 201 2700 225.0000 Aug # 1954 204 188 235 227 234 264 302 293 259 229 203 229 2867 238.9167 Jul # 1955 242 233 267 269 270 315 364 347 312 274 237 278 3408 284.0000 Jul # 1956 284 277 317 313 318 374 413 405 355 306 271 306 3939 328.2500 Jul # 1957 315 301 356 348 355 422 465 467 404 347 305 336 4421 368.4167 Aug # 1958 340 318 362 348 363 435 491 505 404 359 310 337 4572 381.0000 Aug # 1959 360 342 406 396 420 472 548 559 463 407 362 405 5140 428.3333 Aug # 1960 417 391 419 461 472 535 622 606 508 461 390 432 5714 476.1667 Jul
An object to assign attributes
to a column in a datastep
. The parameters allow you
to set the following attributes: 'class', 'label', 'description', 'width',
'justify', and 'format'. Any other desired attributes can be set
with ...
.
The attributes available in the dsattr
class are closely aligned
with those available on the dictionary
object.
dsattr( default = NA, label = NULL, description = NULL, width = NULL, format = NULL, justify = NULL, ... )
dsattr( default = NA, label = NULL, description = NULL, width = NULL, format = NULL, justify = NULL, ... )
default |
The default value of the column. The default value can be any valid data value. Typical default values might be an empty string ("") or a zero (0). If no default value is specified, the column will be defaulted to NA. |
label |
The label to associate with this column. Accepts any string value. The label will appear as a column header on some data viewers and reporting packages. |
description |
A description for this column. Accepts any string value. The description is intended to be a longer explanation of the purpose or source of the variable. |
width |
The desired width for the column in number of characters. |
format |
The format associated with this column. See the fmtr package for more information about formatting. |
justify |
The desired justification for the column. This parameter is normally used only for fixed-width, character columns. Valid values are 'left', 'right', 'center', and 'centre'. |
... |
Any other attributes you wish to assign to this column. Pass these additional attributes as a name/value pair. |
The data step attributes object.
dictionary
function to observe the attributes
associated with a dataset. Also see the fdata
function in the fmtr
package for more information on formatting and rendering data frames.
Other datastep:
[.dsarray()
,
datastep()
,
delete()
,
dsarray()
,
length.dsarray()
,
output()
library(libr) # Create small sample dataframe dat <- mtcars[1:10, c("mpg", "cyl")] # Perform datastep and assign attributes dat1 <- datastep(dat, attrib = list(mpg = dsattr(label = "Miles Per Gallon"), cyl = dsattr(label = "Cylinders"), mpgcat = dsattr(label = "Fuel Efficiency")), { if (mpg >= 20) mpgcat = "High" else mpgcat = "Low" }) # Print results dat1 # mpg cyl mpgcat # Mazda RX4 21.0 6 High # Mazda RX4 Wag 21.0 6 High # Datsun 710 22.8 4 High # Hornet 4 Drive 21.4 6 High # Hornet Sportabout 18.7 8 Low # Valiant 18.1 6 Low # Duster 360 14.3 8 Low # Merc 240D 24.4 4 High # Merc 230 22.8 4 High # Merc 280 19.2 6 Low # Examine label attributes attr(dat1$mpg, "label") # [1] "Miles Per Gallon" attr(dat1$cyl, "label") # [1] "Cylinders" attr(dat1$mpgcat, "label") # [1] "Fuel Efficiency" # See labels in viewer # View(dat1)
library(libr) # Create small sample dataframe dat <- mtcars[1:10, c("mpg", "cyl")] # Perform datastep and assign attributes dat1 <- datastep(dat, attrib = list(mpg = dsattr(label = "Miles Per Gallon"), cyl = dsattr(label = "Cylinders"), mpgcat = dsattr(label = "Fuel Efficiency")), { if (mpg >= 20) mpgcat = "High" else mpgcat = "Low" }) # Print results dat1 # mpg cyl mpgcat # Mazda RX4 21.0 6 High # Mazda RX4 Wag 21.0 6 High # Datsun 710 22.8 4 High # Hornet 4 Drive 21.4 6 High # Hornet Sportabout 18.7 8 Low # Valiant 18.1 6 Low # Duster 360 14.3 8 Low # Merc 240D 24.4 4 High # Merc 230 22.8 4 High # Merc 280 19.2 6 Low # Examine label attributes attr(dat1$mpg, "label") # [1] "Miles Per Gallon" attr(dat1$cyl, "label") # [1] "Cylinders" attr(dat1$mpgcat, "label") # [1] "Fuel Efficiency" # See labels in viewer # View(dat1)
A function to create the import specifications for a
particular data file. This information can be used on the
libname
function to correctly assign the data types for
columns on imported data. The import specifications are defined as
name/value pairs, where the name is the column name and the value is the
data type indicator. Available data type indicators are
'guess', 'logical', 'character', 'integer', 'numeric',
'date', 'datetime', and 'time'. See the specs
function
for an example of using import specs.
import_spec(..., na = NULL, trim_ws = NULL)
import_spec(..., na = NULL, trim_ws = NULL)
... |
Named pairs of column names and column data types. Available types are: 'guess', 'logical', 'character', 'integer', 'numeric', 'date', 'datetime', and 'time'. The date/time data types accept an optional input format. To supply the input format, append it after the data type following an equals sign, e.g.: 'date=%d%B%Y' or 'datetime=%d%m%Y %H:%M:%S'. Default is NULL, meaning no column types are specified, and the function should make its best guess for each column. |
na |
A vector of values to be treated as NA. For example, the
vector |
trim_ws |
Whether or not to trim white space from the input data values.
The default is NULL, meaning the value of the |
The import specification object.
libname
to create a data library, and
specs
for an example using import specs.
Other specs:
print.specs()
,
read.specs()
,
specs()
,
write.specs()
This function tests whether an object is a data library. The data library has a class of "lib".
is.lib(x)
is.lib(x)
x |
The object to test. |
TRUE or FALSE, depending on whether or not the object is a data library.
Other lib:
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create format catalog libname(dat, tempdir()) # Test for "lib" class is.lib(dat) # [1] TRUE is.lib(list()) # [1] FALSE # Clean up lib_delete(dat)
# Create format catalog libname(dat, tempdir()) # Test for "lib" class is.lib(dat) # [1] TRUE is.lib(list()) # [1] FALSE # Clean up lib_delete(dat)
A length function for the data step array dsarray
.
The length function can by used either inside or outside the data step.
## S3 method for class 'dsarray' length(x)
## S3 method for class 'dsarray' length(x)
x |
The |
The number of items in the specified dsarray
.
Other datastep:
[.dsarray()
,
datastep()
,
delete()
,
dsarray()
,
dsattr()
,
output()
# Define datastep array carr <- dsarray(names(mtcars)) length(carr) # 11
# Define datastep array carr <- dsarray(names(mtcars)) length(carr) # 11
The lib_add
function adds a data frame
or tibble to an existing data library. The function will both add the data
to the library list, and immediately write the data to the library
directory location. The data will be written to disk
in the file format associated with the library engine.
If the library is loaded, the function will also
add the data to the workspace environment.
lib_add(x, ..., name = NULL)
lib_add(x, ..., name = NULL)
x |
The library to add data to. |
... |
The data frame(s) to add to the library. If more than one, separate with commas. |
name |
The reference name to use for the data. By default, the name will be the variable name. To assign a name different from the variable name, assign a quoted name to this parameter. If more than one data set is being appended, assign a vector of quoted names. |
Other lib:
is.lib()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
#' # Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # # library 'dat': 0 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Add data to the library lib_add(dat, mtcars, beaver1, iris) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # 2 beaver1 rds 114 4 5.1 Kb 2020-11-05 19:32:04 # 3 iris rds 150 5 7.5 Kb 2020-11-05 19:32:08 # Clean up lib_delete(dat)
#' # Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # # library 'dat': 0 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Add data to the library lib_add(dat, mtcars, beaver1, iris) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # 2 beaver1 rds 114 4 5.1 Kb 2020-11-05 19:32:04 # 3 iris rds 150 5 7.5 Kb 2020-11-05 19:32:08 # Clean up lib_delete(dat)
The lib_copy
function copies a data library. The
function accepts a library and a destination path. If the destination
path does not exist, the function will attempt to create it.
Note that the copy will result in the current data in memory written to the new destination directory. If the library is loaded into the workspace, the workspace version will be considered the most current version, and that is the version that will be copied.
lib_copy(x, nm, directory_path, standard_eval = FALSE)
lib_copy(x, nm, directory_path, standard_eval = FALSE)
x |
The library to copy. |
nm |
The variable name to hold the new library.
The parameter will assume non-standard
(unquoted) evaluation unless the |
directory_path |
The path to copy the library to. |
standard_eval |
A TRUE or FALSE value which indicates whether to
use standard (quoted) or non-standard (unquoted) evaluation on the
|
The new library.
Other lib:
is.lib()
,
lib_add()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat1, tmp) # Add dat to library lib_add(dat1, mtcars, iris) # Copy dat1 to dat2 lib_copy(dat1, dat2, file.path(tmp, "copy")) # library 'dat2': 2 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc/copy # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 21:14:54 # 2 iris rds 150 5 7.5 Kb 2020-11-05 21:14:54 # Clean up lib_delete(dat1) lib_delete(dat2)
# Create temp directory tmp <- tempdir() # Create library libname(dat1, tmp) # Add dat to library lib_add(dat1, mtcars, iris) # Copy dat1 to dat2 lib_copy(dat1, dat2, file.path(tmp, "copy")) # library 'dat2': 2 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc/copy # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 21:14:54 # 2 iris rds 150 5 7.5 Kb 2020-11-05 21:14:54 # Clean up lib_delete(dat1) lib_delete(dat2)
The lib_delete
function deletes a data library from
the file system and from memory. All data files associated with the library
and the specified engine will be deleted.
If other files exist in the library directory, they will not be affected
by the delete operation.
The directory that contains the data will also
not be affected by the delete operation. To delete the data directory,
use the unlink
function or other packaged functions.
lib_delete(x)
lib_delete(x)
x |
The data library to delete. |
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to library lib_add(dat, mtcars) lib_add(dat, iris) # Load library lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.mtcars" "tmp" # Examine library dat # library 'dat': 2 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 21:18:17 # 2 iris rds 150 5 7.5 Kb 2020-11-05 21:18:17 # Delete library lib_delete(dat) #' # Examine workspace again ls() # [1] "tmp"
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to library lib_add(dat, mtcars) lib_add(dat, iris) # Load library lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.mtcars" "tmp" # Examine library dat # library 'dat': 2 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 21:18:17 # 2 iris rds 150 5 7.5 Kb 2020-11-05 21:18:17 # Delete library lib_delete(dat) #' # Examine workspace again ls() # [1] "tmp"
The lib_export
function exports a data library to
another library with a different directory and file format. The
function accepts a library to export, the new library name,
a destination path, and an engine name.
If the destination
path does not exist, the function will attempt to create it.
Note that the export will result in the current data in memory written to the new destination directory. If the library is loaded into the workspace, the workspace version will be considered the most current version, and that is the version that will be exported.
lib_export(x, nm, directory_path, engine, filter = NULL, standard_eval = FALSE)
lib_export(x, nm, directory_path, engine, filter = NULL, standard_eval = FALSE)
x |
The library to export. |
nm |
The variable name to hold the new library.
The parameter will assume non-standard
(unquoted) evaluation unless the |
directory_path |
The path to export the library to. |
engine |
The name of the engine to use for the exported data. The engine name corresponds to the standard file extension of the data file type. Valid values are 'rds', 'Rdata', 'rda', 'sas7bdat', 'xpt', 'xls', 'xlsx', 'dbf', 'csv', and 'parquet'. |
filter |
A filter string to limit which datasets are exported. The filter parameter accepts wildcards. |
standard_eval |
A TRUE or FALSE value which indicates whether to
use standard (quoted) or non-standard (unquoted) evaluation on the
|
The newly exported library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat1, tmp) # Add dat to library lib_add(dat1, mtcars, iris) # Export dat1 to dat2 lib_export(dat1, dat2, file.path(tmp, "export"), "rdata") # library 'dat2': 2 items # - attributes: rdata not loaded # - path: C:\Users\User\AppData\Local\Temp\Rtmp0Sq3kt/export # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rdata 32 11 8.1 Kb 2022-06-23 00:10:52 # 2 iris rdata 150 5 8.1 Kb 2022-06-23 00:10:52 # Clean up lib_delete(dat1) lib_delete(dat2)
# Create temp directory tmp <- tempdir() # Create library libname(dat1, tmp) # Add dat to library lib_add(dat1, mtcars, iris) # Export dat1 to dat2 lib_export(dat1, dat2, file.path(tmp, "export"), "rdata") # library 'dat2': 2 items # - attributes: rdata not loaded # - path: C:\Users\User\AppData\Local\Temp\Rtmp0Sq3kt/export # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rdata 32 11 8.1 Kb 2022-06-23 00:10:52 # 2 iris rdata 150 5 8.1 Kb 2022-06-23 00:10:52 # Clean up lib_delete(dat1) lib_delete(dat2)
The lib_info
function returns a data frame of information
about each item in the data library. That information includes the item
name, file extension, number of rows, number of columns, size in bytes,
and the last modified date.
lib_info(x)
lib_info(x)
x |
The data library. |
A data frame of information about the library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create data library libname(dat, tmp) # Add data to library lib_add(dat, trees, rock, beaver1) # Get library information info <- lib_info(dat) # Examine info info # Name Extension Rows Cols Size LastModified # 1 beaver1 rds 114 4 5.3 Kb 2020-11-05 21:27:57 # 2 rocks rds 48 4 3.1 Kb 2020-11-05 21:27:56 # 3 trees rds 31 3 2.4 Kb 2020-11-05 21:27:56 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create data library libname(dat, tmp) # Add data to library lib_add(dat, trees, rock, beaver1) # Get library information info <- lib_info(dat) # Examine info info # Name Extension Rows Cols Size LastModified # 1 beaver1 rds 114 4 5.3 Kb 2020-11-05 21:27:57 # 2 rocks rds 48 4 3.1 Kb 2020-11-05 21:27:56 # 3 trees rds 31 3 2.4 Kb 2020-11-05 21:27:56 # Clean up lib_delete(dat)
The lib_load
function loads a data library into
an environment. The environment used is associated with the library at
the time it is created with the libname
function.
When the lib_load
function is called, the data frames/tibbles
will be loaded with <library>.<data set> syntax. Loading the data frames
into the environment makes them easy to access and use in your program.
Note that the lib_load
function is optional, and calling the function
is not needed to access data in the libname
. You may also access
data directly from the libname
using the dollar sign ($) syntax.
lib_load(x, filter = NULL)
lib_load(x, filter = NULL)
x |
The data library to load. |
filter |
One or more quoted strings to use as filters for the data names to load into the workspace. For more than one filter string, pass them as a vector of strings. The filter string can be a full or partial name. If using a partial name, use a wild-card character (*) to identify the missing portion. The match will be case-insensitive. |
The loaded data library.
lib_unload
to unload the library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Save some data to temp directory for illustration purposes saveRDS(iris, file.path(tmp, "iris.rds")) saveRDS(ToothGrowth, file.path(tmp, "ToothGrowth.rds")) saveRDS(PlantGrowth, file.path(tmp, "PlantGrowth.rds")) # Create library libname(dat, tmp) # Load library into workspace lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.PlantGrowth" "dat.ToothGrowth" "tmp" # Use some data summary(dat.PlantGrowth) summary(dat.ToothGrowth) # Unload library lib_unload(dat) # Examine workspace again ls() # [1] "dat" "tmp" # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Save some data to temp directory for illustration purposes saveRDS(iris, file.path(tmp, "iris.rds")) saveRDS(ToothGrowth, file.path(tmp, "ToothGrowth.rds")) saveRDS(PlantGrowth, file.path(tmp, "PlantGrowth.rds")) # Create library libname(dat, tmp) # Load library into workspace lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.PlantGrowth" "dat.ToothGrowth" "tmp" # Use some data summary(dat.PlantGrowth) summary(dat.ToothGrowth) # Unload library lib_unload(dat) # Examine workspace again ls() # [1] "dat" "tmp" # Clean up lib_delete(dat)
The lib_path
function returns the current path of the
the library as a string.
lib_path(x)
lib_path(x)
x |
The data library. |
The path of the data library as a single string.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Examine library path lib_path(dat) # [1] "C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc" # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Examine library path lib_path(dat) # [1] "C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc" # Clean up lib_delete(dat)
The lib_remove
function removes an item from the
data library, and deletes the source file for that data. If the library
is loaded, it will also remove that item from the workspace environment.
lib_remove(x, name)
lib_remove(x, name)
x |
The data library. |
name |
The quoted name of the item to remove from the data library. For more than one name, pass a vector of quoted names. |
The library with the requested item removed.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, mtcars, beaver1, iris) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # 2 beaver1 rds 114 4 5.1 Kb 2020-11-05 19:32:04 # 3 iris rds 150 5 7.5 Kb 2020-11-05 19:32:08 # Remove items from the library lib_remove(dat, c("beaver1", "iris")) # library 'dat': 1 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:40 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, mtcars, beaver1, iris) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # 2 beaver1 rds 114 4 5.1 Kb 2020-11-05 19:32:04 # 3 iris rds 150 5 7.5 Kb 2020-11-05 19:32:08 # Remove items from the library lib_remove(dat, c("beaver1", "iris")) # library 'dat': 1 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:40 # Clean up lib_delete(dat)
The lib_replace
function replaces a data frame
in an existing data library. The function will replace the data
in the library list, the data in the workspace (if loaded),
and immediately write the new data to the library
directory location. The data will be written in the file format
associated with the library engine.
lib_replace(x, ..., name = NULL)
lib_replace(x, ..., name = NULL)
x |
The library to replace data in. |
... |
The data frame(s) to replace. If you wish to replace more than one data set, separate with commas. |
name |
The reference name to use for the data. By default, the name will be the variable name. To assign a name different from the variable name, assign a quoted name to this parameter. If more than one data set is being replaced, assign a vector of quoted names. |
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
#' # Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, mtcars) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # Replace data with a subset lib_replace(dat, mtcars[1:10, 1:5], name = "mtcars") # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 10 5 7.5 Kb 2020-11-05 19:33:00 # Clean up lib_delete(dat)
#' # Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to the library lib_add(dat, mtcars) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 32 11 7.5 Kb 2020-11-05 19:32:00 # Replace data with a subset lib_replace(dat, mtcars[1:10, 1:5], name = "mtcars") # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars rds 10 5 7.5 Kb 2020-11-05 19:33:00 # Clean up lib_delete(dat)
The lib_size
function returns the number of bytes used
by the data library, as stored on disk.
lib_size(x)
lib_size(x)
x |
The data library. |
The size of the data library in bytes as stored on the file system.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add some data to library lib_add(dat, mtcars) lib_add(dat, iris) # Check size of library lib_size(dat) # [1] 9757 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add some data to library lib_add(dat, mtcars) lib_add(dat, iris) # Check size of library lib_size(dat) # [1] 9757 # Clean up lib_delete(dat)
The lib_sync
function synchronizes the data
loaded into the working environment with the data stored
in the library list. Synchronization is necessary only
for libraries that have been loaded into the working environment.
The function
copies data from the working environment to the library
list, overwriting any data in the list. The function is useful when
you want to update the library list, but are not yet ready to
unload the data from working memory.
Note that the lib_sync
function does not
write any data to disk. Also note that the lib_sync
function will
not automatically remove any variables from the library list that
have been removed from the workspace. To remove items from the library
list, use the lib_remove
function. To write data to
disk, use the lib_write
function.
lib_sync(x, name = NULL)
lib_sync(x, name = NULL)
x |
The data library to synchronize. |
name |
The name of the library to sync if not the variable name. Used internally. |
The synchronized data library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_unload()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # library 'dat': 0 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Load the library lib_load(dat) # Add data to the workspace dat.mtcars <- mtcars dat.beaver1 <- beaver1 dat.iris <- iris # Sync the library lib_sync(dat) # library 'dat': 3 items # - attributes: loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 NA 114 4 4.6 Kb <NA> # 2 iris NA 150 5 7.1 Kb <NA> # 3 mtcars NA 32 11 7 Kb <NA> # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # library 'dat': 0 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Load the library lib_load(dat) # Add data to the workspace dat.mtcars <- mtcars dat.beaver1 <- beaver1 dat.iris <- iris # Sync the library lib_sync(dat) # library 'dat': 3 items # - attributes: loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 NA 114 4 4.6 Kb <NA> # 2 iris NA 150 5 7.1 Kb <NA> # 3 mtcars NA 32 11 7 Kb <NA> # Clean up lib_delete(dat)
The lib_unload
function unloads a data library from
the workspace environment. The unload function does not delete the data
or remove the library. It simply removes the data frames from working
memory. By default, the lib_unload
function will also synchronize the
data in working memory with the data stored in the library list, as these
two instances can become out of sync if you change the data in working memory.
lib_unload(x, sync = TRUE, name = NULL)
lib_unload(x, sync = TRUE, name = NULL)
x |
The data library to unload. |
sync |
Whether to sync the workspace with the library list before it is unloaded. Default is TRUE. If you want to unload the workspace without saving the workspace data, set this parameter to FALSE. |
name |
The name of the library to unload, if the name is different than the variable name. Used internally. |
The unloaded data library.
lib_load
to load the library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_write()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to library lib_add(dat, iris, ToothGrowth, PlantGrowth) # Load library into workspace lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.PlantGrowth" "dat.ToothGrowth" "tmp" # Use some data summary(dat.PlantGrowth) summary(dat.ToothGrowth) # Unload library lib_unload(dat) # Examine workspace again ls() # [1] "dat" "tmp" # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # Add data to library lib_add(dat, iris, ToothGrowth, PlantGrowth) # Load library into workspace lib_load(dat) # Examine workspace ls() # [1] "dat" "dat.iris" "dat.PlantGrowth" "dat.ToothGrowth" "tmp" # Use some data summary(dat.PlantGrowth) summary(dat.ToothGrowth) # Unload library lib_unload(dat) # Examine workspace again ls() # [1] "dat" "tmp" # Clean up lib_delete(dat)
The lib_write
function writes the data library
to the file system. The library will be written to the
directory for which it was defined, and each data frame will be written
in the format associated with the library data engine. See the
libname
function for further elaboration on the types
of engines available, and the assumptions/limitations of each.
By default, the lib_write
function will not write data that has
not changed. Prior to writing a file, lib_write
will compare the
data in memory to the data on disk. If there are differences in the data,
the function will overwrite the version on disk. To override the default
behavior, use the force
option to force lib_write
to write
every data file to disk.
Note that writing sas7bdat files to disk is not supported.
lib_write(x, force = FALSE)
lib_write(x, force = FALSE)
x |
The data library to write. |
force |
Force writing each data file to disk, even if it has not changed. |
The saved data library.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
libname()
,
print.lib()
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # # library 'dat': 0 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Load the empty library lib_load(dat) # Add data to the library dat.mtcars <- mtcars dat.beaver1 <- beaver1 dat.iris <- iris # Unload the library lib_unload(dat) # library 'dat': 3 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 NA 114 4 4.6 Kb <NA> # 2 iris NA 150 5 7.1 Kb <NA> # 3 mtcars NA 32 11 7 Kb <NA> # Write the library to the file system lib_write(dat) # library 'dat': 3 items #- attributes: not loaded #- path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc #- items: # Name Extension Rows Cols Size LastModified #1 beaver1 rds 114 4 4.8 Kb 2020-11-05 20:47:16 #2 iris rds 150 5 7.3 Kb 2020-11-05 20:47:16 #3 mtcars rds 32 11 7.3 Kb 2020-11-05 20:47:16 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create library libname(dat, tmp) # # library 'dat': 0 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # NULL # Load the empty library lib_load(dat) # Add data to the library dat.mtcars <- mtcars dat.beaver1 <- beaver1 dat.iris <- iris # Unload the library lib_unload(dat) # library 'dat': 3 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 NA 114 4 4.6 Kb <NA> # 2 iris NA 150 5 7.1 Kb <NA> # 3 mtcars NA 32 11 7 Kb <NA> # Write the library to the file system lib_write(dat) # library 'dat': 3 items #- attributes: not loaded #- path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc #- items: # Name Extension Rows Cols Size LastModified #1 beaver1 rds 114 4 4.8 Kb 2020-11-05 20:47:16 #2 iris rds 150 5 7.3 Kb 2020-11-05 20:47:16 #3 mtcars rds 32 11 7.3 Kb 2020-11-05 20:47:16 # Clean up lib_delete(dat)
A data library is a collection of data sets. The purpose of
the data library is to combine related data sets, and provides the opportunity
to manipulate all of them as a single object. A data library is created using
the libname
function. The libname
function allows you to
load an entire directory of data into memory in one step. The libr
package contains additional functions to add and remove data from the library,
copy the library, and write any changed data to the file system.
libname( name, directory_path, engine = "rds", read_only = FALSE, env = parent.frame(), import_specs = NULL, filter = NULL, standard_eval = FALSE, quiet = FALSE, log = TRUE, where = NULL )
libname( name, directory_path, engine = "rds", read_only = FALSE, env = parent.frame(), import_specs = NULL, filter = NULL, standard_eval = FALSE, quiet = FALSE, log = TRUE, where = NULL )
name |
The unquoted name of the library to create. The library name will
be created as a variable in the environment specified on the |
directory_path |
A directory path to associate with the library. If
the directory contains data files of the type specified on the
|
engine |
The engine to associate with the library. The specified engine will be used to import and export data. The engine name corresponds to the standard file extension of the data file type. The default engine is 'rds'. Valid values are 'rds', 'Rdata', 'rda', 'sas7bdat', 'xpt', 'xls', 'xlsx', 'dbf', 'csv', 'parquet'. |
read_only |
Whether the library should be created as read-only. Default is FALSE. If TRUE, the user will be restricted from appending, removing, or writing any data from memory to the file system. |
env |
The environment to use for the libname.
Default is |
import_specs |
A collection of import specifications,
defined using the |
filter |
One or more quoted strings to use as filters for the incoming file names. For more than one filter string, pass them as a vector of strings. The filter string can be a full or partial file name, without extension. If using a partial file name, use a wild-card character (*) to identify the missing portion. The match will be case-insensitive. |
standard_eval |
A TRUE or FALSE value which indicates whether to
use standard (quoted) or non-standard (unquoted) evaluation on the library
|
quiet |
When TRUE, minimizes output to the console when loading files. Default is FALSE. |
log |
Whether to log the libname operation. Default is TRUE. This parameter is used internally. |
where |
An expression used to subset all datasets in the library.
The where clause will be executed when the library is created. Use the
Base R |
For most projects, a data file does not exist in isolation. There are sets of
related files of the same file type. The aim of the libname
function
is to take advantage of this fact, and give you an easy way to manage
the entire set.
The libname
function points to a directory of data files, and associates
a name with that set of data. The name refers to an object of class 'lib',
which at its heart is a named list. When the libname
function
executes, it will load all the data in the directory into the list, and assign
the file name (without extension) as the list item name. Data can be accessed
using list syntax, or loaded directly into the local environment using the
lib_load
function.
The libname
function provides several data engines to read
data of different types. For example, there is an engine for Excel
files, and another engine for SAS® datasets. The engines are identified
by the extension of the file type they handle. The available engines are
'rds', 'RData', 'rda', 'csv', 'xlsx', 'xls', 'sas7bdat', 'xpt', 'dbf', and
'parquet'.
Once an engine has been assigned to a library, all other read/write
operations will be performed by that engine.
The data engines largely hide file import details from you.
The purpose of the libname
function is to make it easy to
import a set of related data files that follow standard conventions.
The function assumes that the
data has file extensions that match the file type, and then makes further
assumptions based on each type of file. As a result, there are very few
import options on the libname
function. If your data does not
follow standard conventions, it is recommended that you import your
data using a package that gives you more control over import options.
The library object, with all data files loaded into the library list. Items in the list will be named according the the file name, minus the file extension.
The libname
function currently provides seven different engines for
seven different types of data files.
Here is a complete list of available engines and some commentary about each:
rds: For R data sets. This engine is the default. Because detailed data type and attribute information can be stored inside the rds file, the rds engine is the most reliable and easiest to use.
Rdata and rda: Older R data storage formats. Like the 'rds' engine, these storage types retain column attributes and data types.
csv: For comma separated value files. This engine assumes
that the first row has column names, and that strings containing commas are
quoted. Blank values and the string 'NA' will be interpreted as NA.
Because data type information is not stored in csv files, the
csv engine will attempt to guess the data types based on the available data.
For most columns, the csv engine is able to guess accurately. Where
it fails most commonly is with date and time columns. For csv date and time
columns, it is therefore recommended to assign an import spec that tells
the engine how to read the date or time. See the specs
documentation for additional details.
xlsx: For Excel files produced with the current version of Excel. Excel provides more data type information than csv, but it is not as accurate as rds. Therefore, you may also need to provide import specifications with Excel files. Also note that currently the xlsx import engine will only import the first sheet of an Excel workbook. If you need to import a sheet that is not the first sheet, use a different package to import the data.
xls: An Excel file format used between 1997 and 2003, and still used in some organizations. As with xlsx, this file format provides more information than csv, but is not entirely reliable. Therefore, you may need to provide import specifications to the xls engine. Also note that the xls engine can read, but not write xls files. Any xls files read with the xls engine will be written as an xlsx file. Like the xlsx engine, the xls engine can only read the first sheet of a workbook.
sas7bdat: Handles SAS® datasets. SAS® datasets provide better type information than either csv or Excel. In most cases, you will not need to define import specifications for SAS® datasets. The sas7bdat engine interprets empty strings, single blanks, and a single dot (".") as missing values. While the import of SAS® datasets is fairly reliable, sas7bdat files cannot be written or exported with the sas7bdat engine. In these cases, it is recommended to export to another file format, such as csv, dbf, or parquet, and then import into SAS®.
xpt: The SAS® transport file engine. Transport format is a platform independent file format. Similar to SAS® datasets, it provides data type information. In most cases, you will not need to define import specifications. The xpt engine also interprets empty strings, single blanks, and a single dot (".") as missing values.
dbf: The DBASE file format engine. The DBASE engine was added to the libr package because many types of software can read and write in DBASE format reliably. Therefore it is a useful file format for interchange between software systems. The DBASE file format contains type information.
parquet: The parquet file format is an open source format supported by many languages and software. The format is strongly typed, and offers fast data storage and retrieval. The libr package can both read from and write to parquet files.
If you wish to import only a portion of your data files into a library,
you may accomplish it with the filter
parameter. The filter
parameter allows you to pass a vector of strings corresponding to the
names of the files you want to import. The function allows a
wild-card (*) for partial matching. For example, "te*"
means any
file name that that begins with a "te", and "*st"
means any file name
that ends with an "st".
In most cases, it is not necessary to specify the data types for incoming columns in your data. Either the file format will preserve the appropriate data type information, or the assigned engine will guess correctly.
However, in some cases it will be necessary to control the column data types.
For these cases, use the
import_specs
parameter. The import_specs
parameter allows you
to specify the data types by data set and column name. All the data type
specifications are contained within a specs
collection, and the
specifications for a particular data set are defined by an
import_spec
function. See the specs
and
import_spec
documentation for further information
and examples of defining an import spec.
specs
to define import specifications,
dictionary
to view the data dictionary for a library,
and datastep
to perform a data step.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
print.lib()
# Create temp directory tmp <- tempdir() # Save some data to temp directory # for illustration purposes saveRDS(trees, file.path(tmp, "trees.rds")) saveRDS(rock, file.path(tmp, "rocks.rds")) saveRDS(beaver1, file.path(tmp, "beaver1.rds")) # Create data library libname(dat, tmp) # # library 'dat': 3 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpklJcfl # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 rds 114 4 5.9 Kb 2020-12-06 15:21:30 # 2 rocks rds 48 4 3.6 Kb 2020-12-06 15:21:30 # 3 trees rds 31 3 2.9 Kb 2020-12-06 15:21:30 # Print dictionary for library dictionary(dat) # A tibble: 11 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <int> <int> # 1 beaver1 day numeric NA NA NA NA NA 114 0 # 2 beaver1 time numeric NA NA NA NA NA 114 0 # 3 beaver1 temp numeric NA NA NA NA NA 114 0 # 4 beaver1 activ numeric NA NA NA NA NA 114 0 # 5 rocks area integer NA NA NA NA NA 48 0 # 6 rocks peri numeric NA NA NA NA NA 48 0 # 7 rocks shape numeric NA NA NA NA NA 48 0 # 8 rocks perm numeric NA NA NA NA NA 48 0 # 9 trees Girth numeric NA NA NA NA NA 31 0 # 10 trees Height numeric NA NA NA NA NA 31 0 # 11 trees Volume numeric NA NA NA NA NA 31 0 # Load library into workspace lib_load(dat) # Print summaries for each data frame # Note that once loaded into the workspace, # data can be accessed using two-level syntax. summary(dat.rocks) summary(dat.trees) summary(dat.beaver1) #Unload from workspace lib_unload(dat) # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Save some data to temp directory # for illustration purposes saveRDS(trees, file.path(tmp, "trees.rds")) saveRDS(rock, file.path(tmp, "rocks.rds")) saveRDS(beaver1, file.path(tmp, "beaver1.rds")) # Create data library libname(dat, tmp) # # library 'dat': 3 items # - attributes: rds not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpklJcfl # - items: # Name Extension Rows Cols Size LastModified # 1 beaver1 rds 114 4 5.9 Kb 2020-12-06 15:21:30 # 2 rocks rds 48 4 3.6 Kb 2020-12-06 15:21:30 # 3 trees rds 31 3 2.9 Kb 2020-12-06 15:21:30 # Print dictionary for library dictionary(dat) # A tibble: 11 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <int> <int> # 1 beaver1 day numeric NA NA NA NA NA 114 0 # 2 beaver1 time numeric NA NA NA NA NA 114 0 # 3 beaver1 temp numeric NA NA NA NA NA 114 0 # 4 beaver1 activ numeric NA NA NA NA NA 114 0 # 5 rocks area integer NA NA NA NA NA 48 0 # 6 rocks peri numeric NA NA NA NA NA 48 0 # 7 rocks shape numeric NA NA NA NA NA 48 0 # 8 rocks perm numeric NA NA NA NA NA 48 0 # 9 trees Girth numeric NA NA NA NA NA 31 0 # 10 trees Height numeric NA NA NA NA NA 31 0 # 11 trees Volume numeric NA NA NA NA NA 31 0 # Load library into workspace lib_load(dat) # Print summaries for each data frame # Note that once loaded into the workspace, # data can be accessed using two-level syntax. summary(dat.rocks) summary(dat.trees) summary(dat.beaver1) #Unload from workspace lib_unload(dat) # Clean up lib_delete(dat)
The output
function will output an observation
from a datastep. The function takes no parameters. To use
the function, simply call it on the rows you want to output. Typically
it is called within a conditional. The output function is interesting
in that you can output multiple rows for the same input observation.
output()
output()
Observation is marked with a output flag. No return value.
Other datastep:
[.dsarray()
,
datastep()
,
delete()
,
dsarray()
,
dsattr()
,
length.dsarray()
#' # Example 1: Output all cars that are 4 cylinder df <- datastep(mtcars, keep = c("mpg", "cyl", "disp"), { if (cyl == 4) output() }) df # mpg cyl disp # 1 22.8 4 108.0 # 2 24.4 4 146.7 # 3 22.8 4 140.8 # 4 32.4 4 78.7 # 5 30.4 4 75.7 # 6 33.9 4 71.1 # 7 21.5 4 120.1 # 8 27.3 4 79.0 # 9 26.0 4 120.3 # 10 30.4 4 95.1 # 11 21.4 4 121.0 # Example 2: Output two rows for each 6 cylinder car # Prepare sample data dat <- data.frame(name = rownames(mtcars), mtcars, stringsAsFactors = FALSE) # Perform datastep df <- datastep(dat, keep = c("name", "mpg", "cyl", "disp", "seq"), { if (cyl == 6) { seq <- 1 output() seq <- 2 output() } }) df # name mpg cyl disp seq # 1 Mazda RX4 21.0 6 160.0 1 # 2 Mazda RX4 21.0 6 160.0 2 # 3 Mazda RX4 Wag 21.0 6 160.0 1 # 4 Mazda RX4 Wag 21.0 6 160.0 2 # 5 Hornet 4 Drive 21.4 6 258.0 1 # 6 Hornet 4 Drive 21.4 6 258.0 2 # 7 Valiant 18.1 6 225.0 1 # 8 Valiant 18.1 6 225.0 2 # 9 Merc 280 19.2 6 167.6 1 # 10 Merc 280 19.2 6 167.6 2 # 11 Merc 280C 17.8 6 167.6 1 # 12 Merc 280C 17.8 6 167.6 2 # 13 Ferrari Dino 19.7 6 145.0 1 # 14 Ferrari Dino 19.7 6 145.0 2 # Example 3: Create data frame using output() functions df <- datastep(data.frame(), { # Row 1 COL1 <- 1 COL2 <- "One" output() # Row 2 COL1 <- 2 COL2 <- "Two" output() }) df # COL1 COL2 # 1 1 One # 2 2 Two
#' # Example 1: Output all cars that are 4 cylinder df <- datastep(mtcars, keep = c("mpg", "cyl", "disp"), { if (cyl == 4) output() }) df # mpg cyl disp # 1 22.8 4 108.0 # 2 24.4 4 146.7 # 3 22.8 4 140.8 # 4 32.4 4 78.7 # 5 30.4 4 75.7 # 6 33.9 4 71.1 # 7 21.5 4 120.1 # 8 27.3 4 79.0 # 9 26.0 4 120.3 # 10 30.4 4 95.1 # 11 21.4 4 121.0 # Example 2: Output two rows for each 6 cylinder car # Prepare sample data dat <- data.frame(name = rownames(mtcars), mtcars, stringsAsFactors = FALSE) # Perform datastep df <- datastep(dat, keep = c("name", "mpg", "cyl", "disp", "seq"), { if (cyl == 6) { seq <- 1 output() seq <- 2 output() } }) df # name mpg cyl disp seq # 1 Mazda RX4 21.0 6 160.0 1 # 2 Mazda RX4 21.0 6 160.0 2 # 3 Mazda RX4 Wag 21.0 6 160.0 1 # 4 Mazda RX4 Wag 21.0 6 160.0 2 # 5 Hornet 4 Drive 21.4 6 258.0 1 # 6 Hornet 4 Drive 21.4 6 258.0 2 # 7 Valiant 18.1 6 225.0 1 # 8 Valiant 18.1 6 225.0 2 # 9 Merc 280 19.2 6 167.6 1 # 10 Merc 280 19.2 6 167.6 2 # 11 Merc 280C 17.8 6 167.6 1 # 12 Merc 280C 17.8 6 167.6 2 # 13 Ferrari Dino 19.7 6 145.0 1 # 14 Ferrari Dino 19.7 6 145.0 2 # Example 3: Create data frame using output() functions df <- datastep(data.frame(), { # Row 1 COL1 <- 1 COL2 <- "One" output() # Row 2 COL1 <- 2 COL2 <- "Two" output() }) df # COL1 COL2 # 1 1 One # 2 2 Two
A class-specific instance of the print
function for
data libraries. The function prints the library in a summary manner.
Use verbose = TRUE
to print the library as a list.
## S3 method for class 'lib' print(x, ..., verbose = FALSE)
## S3 method for class 'lib' print(x, ..., verbose = FALSE)
x |
The library to print. |
... |
Any follow-on parameters. |
verbose |
Whether or not to print the library in verbose style. By default, the parameter is FALSE, meaning to print in summary style. |
The object, invisibly.
Other lib:
is.lib()
,
lib_add()
,
lib_copy()
,
lib_delete()
,
lib_export()
,
lib_info()
,
lib_load()
,
lib_path()
,
lib_remove()
,
lib_replace()
,
lib_size()
,
lib_sync()
,
lib_unload()
,
lib_write()
,
libname()
# Create temp directory tmp <- tempdir() # Create data library libname(dat, tmp) # Add data to library lib_add(dat, iris, ToothGrowth, PlantGrowth) # Print library summary print(dat) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 iris rds 150 5 7.8 Kb 2020-11-05 22:26:59 # 2 PlantGrowth rds 30 2 2.5 Kb 2020-11-05 22:26:59 # 3 ToothGrowth rds 60 3 3.4 Kb 2020-11-05 22:26:59 # Clean up lib_delete(dat)
# Create temp directory tmp <- tempdir() # Create data library libname(dat, tmp) # Add data to library lib_add(dat, iris, ToothGrowth, PlantGrowth) # Print library summary print(dat) # library 'dat': 3 items # - attributes: not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc # - items: # Name Extension Rows Cols Size LastModified # 1 iris rds 150 5 7.8 Kb 2020-11-05 22:26:59 # 2 PlantGrowth rds 30 2 2.5 Kb 2020-11-05 22:26:59 # 3 ToothGrowth rds 60 3 3.4 Kb 2020-11-05 22:26:59 # Clean up lib_delete(dat)
A function to print the import specification collection.
## S3 method for class 'specs' print(x, ..., verbose = FALSE)
## S3 method for class 'specs' print(x, ..., verbose = FALSE)
x |
The specifications to print. |
... |
Any follow-on parameters to the print function. |
verbose |
Whether or not to print the specifications in verbose style. By default, the parameter is FALSE, meaning to print in summary style. |
The specification object, invisibly.
Other specs:
import_spec()
,
read.specs()
,
specs()
,
write.specs()
A function to read import specifications from the file system.
The function accepts a full or relative path to the spec file, and returns
the specs as an object. If the file_path
parameter is passed
as a directory name, the function will search for a file with a '.specs'
extension and read it.
read.specs(file_path = getwd())
read.specs(file_path = getwd())
file_path |
The full or relative path to the file system. Default is
the current working directory. If the |
The specifications object.
Other specs:
import_spec()
,
print.specs()
,
specs()
,
write.specs()
A function to capture a set of import specifications for a
directory of data files. These specs can be used on the
libname
function to correctly assign the data types for
imported data files. The
import engines will guess at the data types for any columns that
are not explicitly defined in the import specifications. Import
specifications are defined with the import_spec
function.
The import spec syntax is the same for all data engines.
Note that the na
and trim_ws
parameters on the specs
function will be applied globally to all files in the library.
These global settings can be overridden on the import_spec
for any particular data file.
Also note that the specs
collection is defined as an object
so it can be stored and reused.
See the write.specs
and read.specs
functions
for additional information on saving specs.
specs(..., na = c("", "NA"), trim_ws = TRUE)
specs(..., na = c("", "NA"), trim_ws = TRUE)
... |
Named input specs. The name should correspond to the file name,
without the file extension.
The spec is defined as an |
na |
A vector of values to be treated as NA. For example, the
vector |
trim_ws |
Whether or not to trim white space from the input data values.
Valid values are TRUE, and FALSE. Default is TRUE. The value of the
|
The import specifications object.
libname
to create a data library,
dictionary
for generating a data dictionary, and
import_spec
for additional information on defining an
import spec.
Other specs:
import_spec()
,
print.specs()
,
read.specs()
,
write.specs()
library(readr) # Create temp path tmp <- file.path(tempdir(), "mtcars.csv") # Create data for illustration purposes df <- data.frame(vehicle = rownames(mtcars), mtcars[c("mpg", "cyl", "disp")], stringsAsFactors = FALSE) # Kill rownames rownames(df) <- NULL # Add some columns df <- datastep(df[1:10, ], { recdt <- "10JUN1974" if (mpg >= 20) mpgcat <- "High" else mpgcat <- "Low" if (cyl == 8) cyl8 <- TRUE }) df # vehicle mpg cyl disp recdt mpgcat cyl8 # 1 Mazda RX4 21.0 6 160.0 10JUN1974 High NA # 2 Mazda RX4 Wag 21.0 6 160.0 10JUN1974 High NA # 3 Datsun 710 22.8 4 108.0 10JUN1974 High NA # 4 Hornet 4 Drive 21.4 6 258.0 10JUN1974 High NA # 5 Hornet Sportabout 18.7 8 360.0 10JUN1974 Low TRUE # 6 Valiant 18.1 6 225.0 10JUN1974 Low NA # 7 Duster 360 14.3 8 360.0 10JUN1974 Low TRUE # 8 Merc 240D 24.4 4 146.7 10JUN1974 High NA # 9 Merc 230 22.8 4 140.8 10JUN1974 High NA # 10 Merc 280 19.2 6 167.6 10JUN1974 Low NA # Save to temp directory for this example write_csv(df, tmp) ## Start Example ## # Define import spec spcs <- specs(mtcars = import_spec(vehicle = "character", cyl = "integer", recdt = "date=%d%b%Y", mpgcat = "guess", cyl8 = "logical")) # Create library libname(dat, tempdir(), "csv", import_specs = spcs) # $mtcars # library 'dat': 1 items # - attributes: csv not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpqAMV6L # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars csv 10 7 9.3 Kb 2020-11-29 09:47:52 # View data types dictionary(dat) # # A tibble: 7 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <int> <chr> <int> <int> # 1 mtcars vehicle character NA NA NA 17 NA 10 0 # 2 mtcars mpg numeric NA NA NA NA NA 10 0 # 3 mtcars cyl integer NA NA NA NA NA 10 0 # 4 mtcars disp numeric NA NA NA NA NA 10 0 # 5 mtcars mpgcat character NA NA NA 4 NA 10 0 # 6 mtcars recdt Date NA NA NA NA NA 10 0 # 7 mtcars cyl8 logical NA NA NA NA NA 10 8 # Clean up lib_delete(dat)
library(readr) # Create temp path tmp <- file.path(tempdir(), "mtcars.csv") # Create data for illustration purposes df <- data.frame(vehicle = rownames(mtcars), mtcars[c("mpg", "cyl", "disp")], stringsAsFactors = FALSE) # Kill rownames rownames(df) <- NULL # Add some columns df <- datastep(df[1:10, ], { recdt <- "10JUN1974" if (mpg >= 20) mpgcat <- "High" else mpgcat <- "Low" if (cyl == 8) cyl8 <- TRUE }) df # vehicle mpg cyl disp recdt mpgcat cyl8 # 1 Mazda RX4 21.0 6 160.0 10JUN1974 High NA # 2 Mazda RX4 Wag 21.0 6 160.0 10JUN1974 High NA # 3 Datsun 710 22.8 4 108.0 10JUN1974 High NA # 4 Hornet 4 Drive 21.4 6 258.0 10JUN1974 High NA # 5 Hornet Sportabout 18.7 8 360.0 10JUN1974 Low TRUE # 6 Valiant 18.1 6 225.0 10JUN1974 Low NA # 7 Duster 360 14.3 8 360.0 10JUN1974 Low TRUE # 8 Merc 240D 24.4 4 146.7 10JUN1974 High NA # 9 Merc 230 22.8 4 140.8 10JUN1974 High NA # 10 Merc 280 19.2 6 167.6 10JUN1974 Low NA # Save to temp directory for this example write_csv(df, tmp) ## Start Example ## # Define import spec spcs <- specs(mtcars = import_spec(vehicle = "character", cyl = "integer", recdt = "date=%d%b%Y", mpgcat = "guess", cyl8 = "logical")) # Create library libname(dat, tempdir(), "csv", import_specs = spcs) # $mtcars # library 'dat': 1 items # - attributes: csv not loaded # - path: C:\Users\User\AppData\Local\Temp\RtmpqAMV6L # - items: # Name Extension Rows Cols Size LastModified # 1 mtcars csv 10 7 9.3 Kb 2020-11-29 09:47:52 # View data types dictionary(dat) # # A tibble: 7 x 10 # Name Column Class Label Description Format Width Justify Rows NAs # <chr> <chr> <chr> <chr> <chr> <lgl> <int> <chr> <int> <int> # 1 mtcars vehicle character NA NA NA 17 NA 10 0 # 2 mtcars mpg numeric NA NA NA NA NA 10 0 # 3 mtcars cyl integer NA NA NA NA NA 10 0 # 4 mtcars disp numeric NA NA NA NA NA 10 0 # 5 mtcars mpgcat character NA NA NA 4 NA 10 0 # 6 mtcars recdt Date NA NA NA NA NA 10 0 # 7 mtcars cyl8 logical NA NA NA NA NA 10 8 # Clean up lib_delete(dat)
A function to write import specifications to the file system. The function accepts a specifications object and a full or relative path. The function returns the full file path. This function is useful so that you can define import specifications once, and reuse them in multiple programs or across multiple teams.
write.specs(x, dir_path = getwd(), file_name = NULL)
write.specs(x, dir_path = getwd(), file_name = NULL)
x |
A specifications object of class 'specs'. |
dir_path |
A full or relative path to save the specs. Default is the current working directory. |
file_name |
The file name to save to specs, without a file extension. The file extension will be added automatically. If no file name is supplied, the function will use the variable name as the file name. |
The full file path.
Other specs:
import_spec()
,
print.specs()
,
read.specs()
,
specs()