Title: | Database Queries Using 'data.table' Syntax |
---|---|
Description: | Query database tables over a 'DBI' connection using 'data.table' syntax. Attach database schemas to the search path. Automatically merge using foreign key constraints. |
Authors: | Kjell P. Konis [aut, cre], Luis Rocha [ctb] (Chinook Database - see example_files/LICENSE) |
Maintainer: | Kjell P. Konis <[email protected]> |
License: | MPL-2.0 |
Version: | 1.0.3 |
Built: | 2025-03-06 07:07:47 UTC |
Source: | CRAN |
A dbi.table is a data structure that describes a SQL query (called the
dbi.table's underlying SQL query). This query can be manipulated
using data.table
's [i, j, by]
syntax.
dbi.table(conn, id) ## S3 method for class 'dbi.table' x[i, j, by, nomatch = NA, on = NULL]
dbi.table(conn, id) ## S3 method for class 'dbi.table' x[i, j, by, nomatch = NA, on = NULL]
conn |
A |
id |
An |
x |
A |
i |
A logical expression of the columns of When When When |
j |
A list of expressions, a literal character vector of column names of
|
by |
A list of expressions, a literal character vector of column names of
|
nomatch |
Either |
on |
|
A dbi.table
.
as.data.frame
to retrieve the
results set as a data.frame
,
csql
to see the underlying SQL query.
# open a connection to the Chinook example database using duckdb duck <- chinook.duckdb() # create a dbi.table corresponding to the Album table on duck Album <- dbi.table(duck, DBI::Id(table_name = "Album")) # the print method displays a 5 row preview # print(Album) Album # 'id' can also be 'SQL'; use the same DBI connection as Album Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre")) # use the extract (\code{[}) method to subset the dbi.table Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))] # use csql to see the underlying SQL query csql(Album[AlbumId < 5, #WHERE .(Title, #SELECT nchar = paste(nchar(Title), "characters"))])
# open a connection to the Chinook example database using duckdb duck <- chinook.duckdb() # create a dbi.table corresponding to the Album table on duck Album <- dbi.table(duck, DBI::Id(table_name = "Album")) # the print method displays a 5 row preview # print(Album) Album # 'id' can also be 'SQL'; use the same DBI connection as Album Genre <- dbi.table(Album, DBI::SQL("chinook_duckdb.main.Genre")) # use the extract (\code{[}) method to subset the dbi.table Album[AlbumId < 5, .(Title, nchar = paste(nchar(Title), "characters"))] # use csql to see the underlying SQL query csql(Album[AlbumId < 5, #WHERE .(Title, #SELECT nchar = paste(nchar(Title), "characters"))])
Execute a dbi.table
's underlying SQL query and return the
result set as a data.frame
. By default, the
result set is limited to 10,000 rows. See Details.
## S3 method for class 'dbi.table' as.data.frame( x, row.names = NULL, optional = FALSE, ..., n = getOption("dbi_table_max_fetch", 10000L) )
## S3 method for class 'dbi.table' as.data.frame( x, row.names = NULL, optional = FALSE, ..., n = getOption("dbi_table_max_fetch", 10000L) )
x |
a |
row.names |
a logical value. This argument is not used. |
optional |
a logical value. This argument is not used. |
... |
additional arguments are ignored. |
n |
an integer value. When nonnegative, the underlying SQL query includes a
'LIMIT |
By default, as.data.frame
returns up to 10,000 rows (see the
n
argument). To override this limit, either call
as.data.frame
and provide the n
argument (e.g., n = -1
to return the entire result set), or set the option
dbi_table_max_fetch
to the desired default value of n
.
a data.frame
.
as.data.frame
(the generic method in the
base package).
duck <- chinook.duckdb() Artist <- dbi.table(duck, DBI::Id("Artist")) as.data.frame(Artist, n = 7)[]
duck <- chinook.duckdb() Artist <- dbi.table(duck, DBI::Id("Artist")) as.data.frame(Artist, n = 7)[]
Test whether an object is a dbi.table
, or coerce it if possible.
is.dbi.table(x) as.dbi.table(conn, x, type = c("auto", "query", "temporary"))
is.dbi.table(x) as.dbi.table(conn, x, type = c("auto", "query", "temporary"))
x |
any R object. |
conn |
a connection handle returned by |
type |
a character string. Possible choices are |
Two types of tables are provided: Temporary (when
type == "temporary"
) and In Query
(when type == "query"
). For Temporary, the data are
written to a SQL temporary table and the associated
dbi.table
is returned. For In Query, the data are
written into a CTE as part of the query itself - useful when the
connection does not permit creating temporary tables.
a dbi.table
.
The temporary tables created by this function are dropped
(by calling dbRemoveTable
) during garbage
collection when they are no longer referenced.
duck <- dbi.catalog(chinook.duckdb) csql(as.dbi.table(duck, iris[1:4, 1:3], type = "query"))
duck <- dbi.catalog(chinook.duckdb) csql(as.dbi.table(duck, iris[1:4, 1:3], type = "query"))
View a dbi.table
's underlying SQL query.
csql(x, n = getOption("dbi_table_max_fetch", 10000L))
csql(x, n = getOption("dbi_table_max_fetch", 10000L))
x |
a |
n |
a single integer value. When nonnegative, limits the number of rows
returned by the query to |
none (invisible NULL
).
dbi.table
sCall DBI methods using the underlying DBI connection.
## S4 method for signature 'dbi.catalog,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.schema,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,missing' dbSendStatement( conn, statement, ..., n = getOption("dbi_table_max_fetch", 10000L) ) ## S4 method for signature 'dbi.table,missing' dbGetQuery(conn, statement, ..., n = getOption("dbi_table_max_fetch", 10000L)) ## S4 method for signature 'dbi.catalog' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.schema' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table' dbGetInfo(dbObj, ...)
## S4 method for signature 'dbi.catalog,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.schema,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,SQL' dbExecute(conn, statement, ...) ## S4 method for signature 'dbi.table,missing' dbSendStatement( conn, statement, ..., n = getOption("dbi_table_max_fetch", 10000L) ) ## S4 method for signature 'dbi.table,missing' dbGetQuery(conn, statement, ..., n = getOption("dbi_table_max_fetch", 10000L)) ## S4 method for signature 'dbi.catalog' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.schema' dbGetInfo(dbObj, ...) ## S4 method for signature 'dbi.table' dbGetInfo(dbObj, ...)
conn |
a |
statement |
a |
... |
other parameters passed on to methods. |
n |
an integer value. A nonnegative value limits the number of records returned by the query. A negative value omits the LIMIT (or TOP) clause entirely. |
dbObj |
a |
dbExecute
, dbGetInfo
,
dbSendStatement
The database schema is attached to the R search path. This means that the
schema is searched by R when evaluating a variable, so that
dbi.table
s in the schema can be accessed by simply giving
their names.
dbi.attach( what, pos = 2L, name = NULL, warn.conflicts = FALSE, schema = NULL, graphics = TRUE )
dbi.attach( what, pos = 2L, name = NULL, warn.conflicts = FALSE, schema = NULL, graphics = TRUE )
what |
a connection handle returned by |
pos |
an integer specifying position in |
name |
a character string specifying the name to use for the attached database. |
warn.conflicts |
a logical value. If |
schema |
a character string specifying the name of the schema to attach. |
graphics |
a logical value; passed to |
an environment
, the attached schema is invisibly returned.
dbi.catalog
A dbi.catalog
represents a database catalog.
dbi.catalog(conn, schemas = NULL)
dbi.catalog(conn, schemas = NULL)
conn |
a connection handle returned by |
schemas |
a character vector of distinct schema names. These schemas will be loaded
into the |
a dbi.catalog
.
# chinook.duckdb is a zero-argument function that returns a DBI handle (db <- dbi.catalog(chinook.duckdb)) # list schemas ls(db) # list the tables in the schema 'main' ls(db$main)
# chinook.duckdb is a zero-argument function that returns a DBI handle (db <- dbi.catalog(chinook.duckdb)) # list schemas ls(db) # list the tables in the schema 'main' ls(db$main)
These zero-argument functions return connections to the example databases included in the dbi.table package.
chinook.sqlite() chinook.duckdb()
chinook.sqlite() chinook.duckdb()
a DBIConnection
object, as
returned by dbConnect
.
Merge two dbi.table
s. The dbi.table
method is similar
to the data.table
method except that the result
set is only determined up to row order and is not sorted by default.
Default merge columns: if x
has a foreign key constraint that
references y
then the columns comprising this key are used; see
details. When a foreign key cannot be found, then the common columns
between the two dbi.tables
s are used.
Use the by
, by.x
, and by.y
arguments explicitly to
override this default.
## S3 method for class 'dbi.table' merge( x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE, all.x = all, all.y = all, sort = FALSE, suffixes = c(".x", ".y"), no.dups = TRUE, recursive = FALSE, ... )
## S3 method for class 'dbi.table' merge( x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE, all.x = all, all.y = all, sort = FALSE, suffixes = c(".x", ".y"), no.dups = TRUE, recursive = FALSE, ... )
x , y
|
|
by |
A vector of shared column names in |
by.x , by.y
|
character vectors of column names in |
all |
a logical value. |
all.x |
a logical value. When |
all.y |
a logical value. Analogous to |
sort |
a logical value. Currently ignored. |
suffixes |
a length-2 character vector. The suffixes to be used for making
non- |
no.dups |
a logical value. When |
recursive |
a logical value. Only used when |
... |
additional arguments are ignored. |
Foreign key constraints. Foreign keys can only be queried when (1) the
dbi.table
's schema is loaded, and (2) dbi.table
understands
the underlying database's information schema.
merge.dbi.table
uses sql.join
to join x
and
y
then formats the result set to match the typical merge
output.
a dbi.table
.
chinook <- dbi.catalog(chinook.duckdb) #The Album table has a foreign key constriant that references Artist merge(chinook$main$Album, chinook$main$Artist) #When y is omitted, x's foreign key relationship is used to determine y merge(chinook$main$Album) #Multiple foreign keys are supported csql(merge(chinook$main$Track)) #Track references Album but not Artist, Album references Artist #This dbi.table includes Artist.Name as well csql(merge(chinook$main$Track, recursive = TRUE))
chinook <- dbi.catalog(chinook.duckdb) #The Album table has a foreign key constriant that references Artist merge(chinook$main$Album, chinook$main$Artist) #When y is omitted, x's foreign key relationship is used to determine y merge(chinook$main$Album) #Multiple foreign keys are supported csql(merge(chinook$main$Track)) #Track references Album but not Artist, Album references Artist #This dbi.table includes Artist.Name as well csql(merge(chinook$main$Track, recursive = TRUE))
dbi.table
vs. Reference ImplementationEvaluate an expression including at least one dbi.table
and compare
the result with the Reference Implementation. This function is
primarily for testing and is potentially very slow for large tables.
reference.test( expr, envir = parent.frame(), ignore.row.order = TRUE, verbose = TRUE )
reference.test( expr, envir = parent.frame(), ignore.row.order = TRUE, verbose = TRUE )
expr |
an expression involving at least one |
envir |
an environment. Where to evaluate |
ignore.row.order |
a logical value. This argument is passed to |
verbose |
a logical value. When |
a logical value.
Suppose that id1
identifies a table in a SQL database and that
[i, j, by]
describes a subset/select/summarize operation using
data.table
syntax. The Reference Implementation for this
operation is:
setDT(dbReadTable(conn, id1))[i, j, by]
More generally, for an expression involving multiple SQL database objects
and using data.table
syntax, the Reference Implementation
would be to download each of these objects in their entirety, convert them
to data.table
s, then evaluate the expression.
The goal of the dbi.table is to generate an SQL query that produces the same results set as the Reference Implementation up to row ordering.
library(data.table) duck <- dbi.catalog(chinook.duckdb) Album <- duck$main$Album Artist <- duck$main$Artist reference.test(merge(Album, Artist, by = "ArtistId"))
library(data.table) duck <- dbi.catalog(chinook.duckdb) Album <- duck$main$Album Artist <- duck$main$Artist reference.test(merge(Album, Artist, by = "ArtistId"))
dbi.table
sA SQL
-like join of two dbi.table
s that share the
same DBI connection
. All columns from
both dbi.table
s are returned.
sql.join(x, y, type = "inner", on = NULL, prefixes = c("x.", "y."))
sql.join(x, y, type = "inner", on = NULL, prefixes = c("x.", "y."))
x , y
|
|
type |
a character string specifying the join type. Valid choices are
|
on |
a |
prefixes |
a 2-element character vector of distinct values. When |
a dbi.table
.
chinook <- dbi.catalog(chinook.duckdb) Album <- chinook$main$Album Artist <- chinook$main$Artist sql.join(Album, Artist, type = "inner", on = Album.ArtistId == Artist.ArtistId, prefixes = c("Album.", "Artist."))
chinook <- dbi.catalog(chinook.duckdb) Album <- chinook$main$Album Artist <- chinook$main$Artist sql.join(Album, Artist, type = "inner", on = Album.ArtistId == Artist.ArtistId, prefixes = c("Album.", "Artist."))