Title: | Spreadsheet Interface for Relational Databases |
---|---|
Description: | Use 'SQLite3' as a database system via a complete SQL free R interface, treating the data as if it was a single spreadsheet. |
Authors: | Benjamin Becker [aut, cre] |
Maintainer: | Benjamin Becker <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5.0 |
Built: | 2024-12-04 07:15:17 UTC |
Source: | CRAN |
Creates a relational data base from a list of data.frames (dfList
). The list structure including the naming of dfList
, pkList
and fkList
needs to be exactly the same. Keys (pkList
and fkList$Keys
) can either be character vectors with a single variable name or multiple variable names. Primary keys (pkList
) have to be unique within a single data.frame. Foreign Keys (fkList
) have to consist of a list with the referenced data frame (fkList$References
) and the referencing keys (fkList$Keys
). If a single data frame is to be converted to a data base, pkList
can be dropped. Otherwise, both elements of fkList
need to be set to NULL
.
createDB(dfList, pkList, fkList = NULL, metaData = NULL, filePath)
createDB(dfList, pkList, fkList = NULL, metaData = NULL, filePath)
dfList |
Named list of data frames. The order of the data.frames determines the merge order. |
pkList |
Named list of the primary keys corresponding to the data.frames. |
fkList |
Named list of a list per data.frame, including referenced data frame ( |
metaData |
[optional] Data.frame including meta data information about the other data.frames. |
filePath |
Path to the db file to write (including name); has to end on |
Primary keys guarantee uniqueness of cases within a single data.frame, and are single variables or combinations of variables. Foreign keys are used to merge data.frames. The foreign key for the first data set always has to be set to list(References = NULL, Keys = NULL)
. The order in which the data.frames are supplied determines the merge order. Currently, left joins are performed when merging data.frames. However, data.frames are stored separately in the relational data base and are only merged if pulled from the data base. \
Conventions for naming variables (columns) follow naming conventions of SQLite3. '.' and sqlite_keywords
are prohibited. Two additional tables within the SQLite3 data base are created: Meta_Information
, which contains a single character with the merge order that is used by dbPull
and Meta_Data
, which contains the meta data.frame supplied to the argument metaData
.
Creates a data base in the given path, returns NULL.
# Set up data frames NoImp <- data.frame(ID = 1:5, age = sample(12:17, size = 5, replace = TRUE), weight = sample(40:60, size = 5, replace = TRUE)) Imp <- data.frame(ID = rep(1:5, 3), imp = c(rep(1, 5), rep(2, 5), rep(3, 5)), noBooks = sample(1:200, 15, replace = TRUE)) PVs <- data.frame(ID = rep(rep(1:5, 3), 2), imp = rep(c(rep(1, 5), rep(2, 5), rep(3, 5)), 2), subject = c(rep("math", 15), rep("reading", 15)), pv = sample(seq(from = -1.75, to = 1.75, by = 0.05), 30, replace = TRUE), stringsAsFactors = FALSE) # Combine into named list dfList <- list(NoImp = NoImp, Imp = Imp, PVs = PVs) # Define primary and foreign keys accordingly pkList <- list(NoImp = "ID", Imp = c("ID", "imp"), PVs = c("ID", "imp", "subject")) fkList <- list(NoImp = list(References = NULL, Keys = NULL), Imp = list(References = "NoImp", Keys = "ID"), PVs = list(References = "Imp", Keys = c("ID", "imp"))) # Optional metaData metaData <- data.frame(varName = c("ID", "age", "weight", "imp", "noBooks", "subject", "pv"), varLabel = c("ID variable", "Age in years", "Body weight in kilogram", "Multiple Imputation number", "Number of books at home (self reported)", "Competence domain (Mathematical Literacy/Reading Literacy", "Plausible value"), data_table = c(rep("NoImp", 3), rep("Imp", 2), rep("PVs", 2)), stringsAsFactors = FALSE) # Create in memory data base createDB(dfList = dfList, pkList = pkList, fkList = fkList, metaData = metaData, filePath = ":memory:")
# Set up data frames NoImp <- data.frame(ID = 1:5, age = sample(12:17, size = 5, replace = TRUE), weight = sample(40:60, size = 5, replace = TRUE)) Imp <- data.frame(ID = rep(1:5, 3), imp = c(rep(1, 5), rep(2, 5), rep(3, 5)), noBooks = sample(1:200, 15, replace = TRUE)) PVs <- data.frame(ID = rep(rep(1:5, 3), 2), imp = rep(c(rep(1, 5), rep(2, 5), rep(3, 5)), 2), subject = c(rep("math", 15), rep("reading", 15)), pv = sample(seq(from = -1.75, to = 1.75, by = 0.05), 30, replace = TRUE), stringsAsFactors = FALSE) # Combine into named list dfList <- list(NoImp = NoImp, Imp = Imp, PVs = PVs) # Define primary and foreign keys accordingly pkList <- list(NoImp = "ID", Imp = c("ID", "imp"), PVs = c("ID", "imp", "subject")) fkList <- list(NoImp = list(References = NULL, Keys = NULL), Imp = list(References = "NoImp", Keys = "ID"), PVs = list(References = "Imp", Keys = c("ID", "imp"))) # Optional metaData metaData <- data.frame(varName = c("ID", "age", "weight", "imp", "noBooks", "subject", "pv"), varLabel = c("ID variable", "Age in years", "Body weight in kilogram", "Multiple Imputation number", "Number of books at home (self reported)", "Competence domain (Mathematical Literacy/Reading Literacy", "Plausible value"), data_table = c(rep("NoImp", 3), rep("Imp", 2), rep("PVs", 2)), stringsAsFactors = FALSE) # Create in memory data base createDB(dfList = dfList, pkList = pkList, fkList = fkList, metaData = metaData, filePath = ":memory:")
Function to get the primary and foreign keys of the data frames in the relational data base.
dbKeys(filePath, includeMeta = FALSE)
dbKeys(filePath, includeMeta = FALSE)
filePath |
Path of the existing db file. |
includeMeta |
Should information about the |
Data in a relational data base are indexed by primary and foreign keys. Primary keys are unique identifiers inside a single data table. Foreign keys reference (link) to other data tables. This function returns the key structure of a relational data base.
Returns a list named as the data tables in the db. Each elements contains a list with the primary key, the data table it references to and the corresponding foreign keys.
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") keys <- dbKeys(db_path) ## primary key structure of the database keys$pkList ## foreign key structure of the database keys$fkList
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") keys <- dbKeys(db_path) ## primary key structure of the database keys$pkList ## foreign key structure of the database keys$fkList
Function to get the names of the variables included in the relational data base.
dbNames(filePath, includeMeta = FALSE)
dbNames(filePath, includeMeta = FALSE)
filePath |
Path of an existing |
includeMeta |
Should the variable names of the |
Extracts names of all variables included in the relational data base, structured as a list with the individual data tables as elements. The ordering in the list is equivalent to the merge order used when data is pulled from the data base.
Returns a named list of character vectors with the variables names included in the data tables.
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") varNames <- dbNames(db_path) ## Names of data tables names(varNames) ## Variable names in data table "NoImp" varNames$NoImp
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") varNames <- dbNames(db_path) ## Names of data tables names(varNames) ## Variable names in data table "NoImp" varNames$NoImp
Function to extract specific variables from various data tables. Variables are merged in the specified merge order via left joins and using the foreign keys. If variables are selected from a specific data table, the corresponding primary keys are also always extracted. If no variables from the first data tables in the mergeOrder
are selected, these data tables are skipped (up till the first variable - data table match). If only variables of a single data table are selected, this data table is extracted with all variables and sub setting is performed in R
.
dbPull(vSelect = NULL, filePath)
dbPull(vSelect = NULL, filePath)
vSelect |
Character vector of variables that should be pulled from data base. If |
filePath |
Path to an the existing db file. |
Note that the exact merging process is determined when the data base is created via createDB
and can not be altered post hoc. Further options (e.g. filtering cases, full joins) are still under development. If you want to use the package and have specific requests, please contact the package author.
Returns a data frame, including the selected variables.
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") ## Extract variables from the first data table by name # primary and foreign keys are added as required dat1 <- dbPull(vSelect = c("age"), filePath = db_path) ## Extract all variables from the first data table varNames <- dbNames(db_path) dat2 <- dbPull(vSelect = varNames$NoImp, filePath = db_path) ## Extract variables from different data table (merged automatically) dat3 <- dbPull(vSelect = c("weight", "noBooks", "pv"), filePath = db_path) ## Extract all variables from the data base dat4 <- dbPull(filePath = db_path)
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") ## Extract variables from the first data table by name # primary and foreign keys are added as required dat1 <- dbPull(vSelect = c("age"), filePath = db_path) ## Extract all variables from the first data table varNames <- dbNames(db_path) dat2 <- dbPull(vSelect = varNames$NoImp, filePath = db_path) ## Extract variables from different data table (merged automatically) dat3 <- dbPull(vSelect = c("weight", "noBooks", "pv"), filePath = db_path) ## Extract all variables from the data base dat4 <- dbPull(filePath = db_path)
Function to extract a single, complete data table from a relational data base. Especially useful for the extraction of the meta information stored in Meta_Data
.
dbSingleDF(dfName = "Meta_Data", filePath)
dbSingleDF(dfName = "Meta_Data", filePath)
dfName |
Name of the data table which should be extracted. |
filePath |
Path of the existing db file. |
This function makes use of the DBI::dbReadTable
function and extracts a complete data table from a data base. All variables are extracted and all rows are used. For extracting only some variables or merging data tables see dbPull
.
Returns a data frame with all variables and cases as in the corresponding data table.
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") ## Extract all meta information meta_data <- dbSingleDF(dfName = "Meta_Data", filePath = db_path) meta_data ## Extract a specific data table NoImp <- dbSingleDF(dfName = "NoImp", filePath = db_path) NoImp
db_path <- system.file("extdata", "example_dataBase.db", package = "eatDB") ## Extract all meta information meta_data <- dbSingleDF(dfName = "Meta_Data", filePath = db_path) meta_data ## Extract a specific data table NoImp <- dbSingleDF(dfName = "NoImp", filePath = db_path) NoImp
A character vector containing the current SQLite Keywords. These strings can not be used as variable names in eatDB
.
sqlite_keywords
sqlite_keywords
A character vector.
https://www.sqlite.org/c3ref/keyword_check.html