Title: | Reading and Writing Open Data Format Files |
---|---|
Description: | The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>. |
Authors: | Tom Hartl [aut, cre], Claudia Saalbach [ctb] |
Maintainer: | Tom Hartl <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.1.1 |
Built: | 2024-12-04 15:49:08 UTC |
Source: | CRAN |
The package supports working with the Open Data Format. It provides three main functions:
Import data from the Open Data Format to an R data frame.
Export data from an R data frame to the Open Data Format.
Access information about the dataset and variables via the RStudio Viewer or the web browser.
Tom Hartl ([email protected]), Claudia Saalbach ([email protected])
Other Contributors: KonsortSWD/NFDI, DIW Berlin
More information about the Open Data Format specification and data examples are available here: https://git.soep.de/opendata/
Example data with attributes specified for the Open Data Format.
data_odf
A data frame with 20 rows and 7 variables:
Current Health.
Hours of sleep, normal workday.
Pressed For Time Last 4 Weeks.
Run-down, Melancholy Last 4 Weeks.
Well-balanced Last 4 Weeks.
Height.
Firstname.
https://github.com/opendataformat/Specification/tree/main/Example
Get access to information about the dataset and variables via the RStudio Viewer or the web browser.
docu_odf( input, languages = "current", style = "viewer", replace_missing_language = FALSE, variables = "yes" )
input | R data frame (df) or variable from an R data frame (df$var). |
languages | Select the language in which the descriptions and labels of the data will be displayed. |
style | Selects where the output is displayed: the console (style = "print") or the viewer (style = "viewer"). By default, the metadata is displayed in the viewer if the viewer is available. |
replace_missing_language | If only one language is specified in languages and replace_missing_language is set to TRUE, then in case of a missing label or description, the default or English label/description is displayed additionally (if one of these is available). |
variables | Indicates whether a list of all variables should be displayed with the dataset metadata. If the input is a variable/column, the variables argument is ignored. |
Documentation.
# get example data from the opendataformat package
df <- get(data("data_odf"))

# view documentation about the dataset in the language that is currently set
docu_odf(df)

# view information from a selected variable in language "en"
docu_odf(df$bap87, languages = "en")

# view dataset information for all available languages
docu_odf(df, languages = "all")

# print information to the R console
docu_odf(df$bap87, style = "print")

# print information to the R viewer
docu_odf(df$bap87, style = "viewer")

# Since the label for language de is missing, in this case the
# english label will be displayed additionally.
attributes(df$bap87)["label_de"] <- ""
docu_odf(df$bap87, languages = "de", replace_missing_language = TRUE)
Retrieve metadata about the dataset or its variables, such as labels, descriptions, URLs, types, value labels, or available languages.
getmetadata_odf(input, type, language = "active")
input | R data frame (df) or variable from an R data frame (df$var). |
type | The metadata type you want to retrieve. Possible options are "label", "description", "url", "type", "valuelabels", or "languages". |
language | Select the language in which the labels of the variables will be displayed. If no language is selected, the current/active language of the data frame will be used. |
Documentation.
# get example data from the opendataformat package
df <- get(data("data_odf"))

# view the variable labels for all variables in English
getmetadata_odf(input = df, type = "label", language = "en")

# view the value labels for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "valuelabel", language = "en")

# view the description for variable bap87 in English
getmetadata_odf(input = df$bap87, type = "description", language = "en")
Merge two odf tibbles in R while keeping attributes with metadata.
## S3 method for class 'odf'
merge(
  x,
  y,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  all = FALSE,
  all.x = all,
  all.y = all,
  sort = TRUE,
  suffixes = c(".x", ".y"),
  no.dups = TRUE,
  allow.cartesian = getOption("datatable.allow.cartesian"),
  incomparables = NULL,
  ...
)
x, y | odf data.frames, or objects to be coerced to one. |
by | A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x. |
by.x, by.y | Vectors of column names in x and y to merge on. |
all | logical; all = TRUE is shorthand to save setting both all.x = TRUE and all.y = TRUE. |
all.x | logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output. |
all.y | logical; analogous to all.x above. |
sort | logical. If TRUE (default), the rows of the merged data.table are sorted by setting the key to the by / by.x columns. If FALSE, unlike base R's merge for which row order is unspecified, the row order in x is retained (including retaining the position of missings when all.x=TRUE), followed by y rows that don't match x (when all.y=TRUE) retaining the order those appear in y. |
suffixes | A character(2) specifying the suffixes to be used for making non-by column names unique. The suffix behaviour works in a similar fashion as the merge.data.frame method does. |
no.dups | logical indicating that suffixes are also appended to non-by.y column names in y when they have the same column name as any by.x. |
allow.cartesian | See allow.cartesian in |
incomparables | values which cannot be matched and therefore are excluded from by columns. |
... | Not used at this time. |
merge is a generic function in base R. It dispatches to either the merge.data.frame, merge.odf, or merge.data.table method, depending on the class of its first argument. merge.odf uses merge.data.table to join the data frames and adds the attributes containing metadata from the two original odf data.frames.
Note that, unlike SQL join, NA is matched against NA (and NaN against NaN) while merging.
For a more data.table-centric way of merging two data.tables, see data.table. See FAQ 1.11 for a detailed comparison of merge.
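The NA matching behaviour noted above can be illustrated with a small sketch. It uses base R's merge.data.frame as a stand-in (an assumption made so the example runs without any extra packages); base R documents the same default, with NA keys matching each other unless they are listed in incomparables. The data frames x and y are made up for illustration.

```r
# Hypothetical toy data: each table has one NA in the key column
x <- data.frame(key = c(1, NA), a = c("x1", "x2"))
y <- data.frame(key = c(NA, 2), b = c("y1", "y2"))

# Default behaviour: the NA key in x is matched against the NA key in y
matched <- merge(x, y, by = "key")

# Listing NA in incomparables excludes NA keys from matching
unmatched <- merge(x, y, by = "key", incomparables = NA)

nrow(matched)
nrow(unmatched)
```

If NA keys should not be joined, filtering them out before merging or passing incomparables is one way to make that explicit.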
A new odf data.frame built from the two input data.frames with the variable attributes from the original data.frames. It is sorted by the columns set in (or inferred for) the by argument if sort is set to TRUE. For variables/columns occurring in both x and y, attributes are taken from x.
# get path to example data from the opendataformat package (data.zip)
path <- system.file("extdata", "data.zip", package = "opendataformat")

# read four columns of example data specified as ODF from ZIP file
df <- read_odf(file = path, select = 1:4)

# read other columns of example data specified as ODF from ZIP file
df2 <- read_odf(file = path, select = 4:7)

# generate a variable for joining both datasets:
df$id <- 1:20
df2$id <- 1:20

# merge both datasets by id column
merged_df <- merge(df, df2, by = "id")

# merge both datasets by shared key columns between the two tables
merged_df2 <- merge(df, df2)
Import data from the Open Data Format to an R data frame.
read_odf(file, languages = "all", nrows = Inf, skip = 0, select = NULL)
file | The name of the file from which the data are to be read. |
languages | Select the language variants to import. By default (languages = "all"), all available language variants are imported. |
nrows | integer: the maximum number of rows to read in. Negative and other invalid values are ignored. |
skip | Select the number of rows to be skipped (without the column names). |
select | A vector of column names or numbers to keep, drop the rest. In all forms of select, the order in which the columns are specified determines the order of the columns in the result. |
R data frame with attributes including dataset and variable information.
# get path to example data from the opendataformat package (data.zip)
path <- system.file("extdata", "data.zip", package = "opendataformat")
path

# read example data specified as Open Data Format from ZIP file
df <- read_odf(file = path)
attributes(df)
attributes(df$bap87)

# read example data with language selection
df <- read_odf(file = path, languages = "de")
attributes(df$bap87)
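The nrows and select arguments described above can be combined to preview part of a file without loading all of it. The following is a minimal sketch, assuming the opendataformat package is installed and that the bundled example file data.zip is used; the object name preview is chosen only for illustration.

```r
library(opendataformat)

# path to the example file bundled with the package
path <- system.file("extdata", "data.zip", package = "opendataformat")

# read at most five rows of the first two columns, with English metadata only
preview <- read_odf(file = path, languages = "en", nrows = 5, select = 1:2)

# inspect the size of the result and the dataset-level attributes
dim(preview)
names(attributes(preview))
```

Reading a small slice first can be a cheap way to check column names and metadata before importing a large dataset.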
Changes the active language of a dataframe with metadata for the docu_odf function.
setlanguage_odf(dataframe, language)
dataframe | R data frame (df) enriched with metadata in the odf-format. |
language | Select the language to which you want to switch the metadata. |
Dataframe
# get example data from the opendataformat package
df <- get(data("data_odf"))

# Switch dataset df to language "en"
df <- setlanguage_odf(df, language = "en")

# Display dataset information for dataset df in language "en"
docu_odf(df)
Export data from an R data frame to a ZIP file that stores the data as Open Data Format.
write_odf( x, file, languages = "all", export_data = TRUE, verbose = TRUE, compression_level = 5 )
x | R data frame (df) to be written. |
file | Path to ZIP file or name of zip file to save the odf-dataset in the working directory. |
languages | Select the language in which the descriptions and labels of the data will be exported. |
export_data | Choose whether to export the file that holds the data (data.csv). Default is TRUE. |
verbose | Display more messages. |
compression_level | A number between 1 and 9. 9 compresses best, but it also takes the longest. |
ZIP file and unzipped directory containing the data as a CSV file and the metadata as an XML file (DDI Codebook 2.5).
# get example data from the opendataformat package
df <- get(data("data_odf"))

# write R data frame with attributes to the file my_data.zip specified
# as Open Data Format.
write_odf(x = df, paste0(tempdir(), "/my_data.zip"))

# write R data frame with attributes to the file my_data.zip
# with selected language.
write_odf(x = df, paste0(tempdir(), "/my_data.zip"), languages = "en")

# write R data frame with attributes to the file my_data.zip but only
# metadata, no data.
write_odf(x = df, file = paste0(tempdir(), "/my_data.zip"), export_data = FALSE)
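As a quick sanity check, an exported file can be read back with read_odf. This is a minimal round-trip sketch, assuming the opendataformat package is installed; the file name roundtrip.zip and the object names out and df_back are hypothetical. It compares only the basic structure and does not show that every metadata attribute survives unchanged.

```r
library(opendataformat)

# example data shipped with the package
df <- get(data("data_odf"))

# write to a temporary ZIP file (hypothetical file name), without extra messages
out <- file.path(tempdir(), "roundtrip.zip")
write_odf(x = df, file = out, verbose = FALSE)

# read the exported file back in
df_back <- read_odf(file = out)

# compare column names and dimensions of the original and re-imported data
identical(names(df), names(df_back))
identical(dim(df), dim(df_back))
```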