Title: | Connect to ODBC Compatible Databases (using the DBI Interface) |
---|---|
Description: | A DBI-compatible interface to ODBC databases. |
Authors: | Jim Hester [aut], Hadley Wickham [aut, cre], Oliver Gjoneski [aut], lexicalunit [cph] (nanodbc library), Google Inc. [cph] (cctz library), Posit Software, PBC [cph, fnd] |
Maintainer: | Hadley Wickham <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.5.0 |
Built: | 2024-11-03 06:26:01 UTC |
Source: | CRAN |
Connect to Databricks clusters and SQL warehouses via the Databricks ODBC driver.
In particular, the custom dbConnect()
method for the Databricks ODBC driver
implements a subset of the Databricks client unified authentication
model, with support for personal access tokens, OAuth machine-to-machine
credentials, and OAuth user-to-machine credentials supplied via Posit
Workbench or the Databricks CLI on desktop.
All of these credentials are detected automatically if present using
standard environment variables.
databricks() ## S4 method for signature 'DatabricksOdbcDriver' dbConnect( drv, httpPath, workspace = Sys.getenv("DATABRICKS_HOST"), useNativeQuery = TRUE, driver = NULL, HTTPPath, uid = NULL, pwd = NULL, ... )
databricks() ## S4 method for signature 'DatabricksOdbcDriver' dbConnect( drv, httpPath, workspace = Sys.getenv("DATABRICKS_HOST"), useNativeQuery = TRUE, driver = NULL, HTTPPath, uid = NULL, pwd = NULL, ... )
drv |
an object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection). |
httpPath , HTTPPath
|
To query a cluster, use the HTTP Path value found
under |
workspace |
The URL of a Databricks workspace, e.g.
|
useNativeQuery |
Suppress the driver's conversion from ANSI SQL 92 to
HiveSQL? The default ( |
driver |
The name of the Databricks ODBC driver, or |
uid , pwd
|
Manually specify a username and password for authentication. Specifying these options will disable automated credential discovery. |
... |
Further arguments passed on to |
An OdbcConnection
object with an active connection to a Databricks
cluster or SQL warehouse.
## Not run: DBI::dbConnect( odbc::databricks(), httpPath = "sql/protocolv1/o/4425955464597947/1026-023828-vn51jugj" ) ## End(Not run)
## Not run: DBI::dbConnect( odbc::databricks(), httpPath = "sql/protocolv1/o/4425955464597947/1026-023828-vn51jugj" ) ## End(Not run)
Convenience functions for reading/writing DBMS tables
## S4 method for signature 'OdbcConnection,character,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection,Id,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection,SQL,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection' dbAppendTable( conn, name, value, batch_rows = getOption("odbc.batch_rows", NA), ..., row.names = NULL ) ## S4 method for signature 'OdbcConnection' sqlCreateTable( con, table, fields, row.names = NA, temporary = FALSE, ..., field.types = NULL )
## S4 method for signature 'OdbcConnection,character,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection,Id,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection,SQL,data.frame' dbWriteTable( conn, name, value, overwrite = FALSE, append = FALSE, temporary = FALSE, row.names = NULL, field.types = NULL, batch_rows = getOption("odbc.batch_rows", NA), ... ) ## S4 method for signature 'OdbcConnection' dbAppendTable( conn, name, value, batch_rows = getOption("odbc.batch_rows", NA), ..., row.names = NULL ) ## S4 method for signature 'OdbcConnection' sqlCreateTable( con, table, fields, row.names = NA, temporary = FALSE, ..., field.types = NULL )
conn |
a |
name |
a character string specifying a table name. Names will be automatically quoted so you can use any sequence of characters, not just any valid bare table name. |
value |
A data.frame to write to the database. |
overwrite |
Allow overwriting the destination table. Cannot be
|
append |
Allow appending to the destination table. Cannot be
|
temporary |
If |
row.names |
Either If A string is equivalent to For backward compatibility, |
field.types |
Additional field types used to override derived types. |
batch_rows |
The number of rows to retrieve. Defaults to |
... |
Other arguments used by individual methods. |
con |
A database connection. |
table |
The table name, passed on to
|
fields |
Either a character vector or a data frame. A named character vector: Names are column names, values are types.
Names are escaped with A data frame: field types are generated using
|
## Not run: library(DBI) con <- dbConnect(odbc::odbc()) dbListTables(con) dbWriteTable(con, "mtcars", mtcars, temporary = TRUE) dbReadTable(con, "mtcars") dbListTables(con) dbExistsTable(con, "mtcars") # A zero row data frame just creates a table definition. dbWriteTable(con, "mtcars2", mtcars[0, ], temporary = TRUE) dbReadTable(con, "mtcars2") dbDisconnect(con) ## End(Not run)
## Not run: library(DBI) con <- dbConnect(odbc::odbc()) dbListTables(con) dbWriteTable(con, "mtcars", mtcars, temporary = TRUE) dbReadTable(con, "mtcars") dbListTables(con) dbExistsTable(con, "mtcars") # A zero row data frame just creates a table definition. dbWriteTable(con, "mtcars2", mtcars[0, ], temporary = TRUE) dbReadTable(con, "mtcars2") dbDisconnect(con) ## End(Not run)
dbListTables()
provides names of remote tables accessible through this
connection; dbListFields()
provides names of columns within a table.
## S4 method for signature 'OdbcConnection' dbListTables( conn, catalog_name = NULL, schema_name = NULL, table_name = NULL, table_type = NULL, ... )
## S4 method for signature 'OdbcConnection' dbListTables( conn, catalog_name = NULL, schema_name = NULL, table_name = NULL, table_type = NULL, ... )
conn |
A DBIConnection object, as returned by
|
catalog_name , schema_name , table_name
|
Catalog, schema, and table names. By default, |
table_type |
The type of the table to return, the default returns all table types. |
... |
Other parameters passed on to methods. |
A character vector of table or field names respectively.
Currently implemented only for select back-ends where we have a use for it (SQL Server, for example). Generic, in case we develop a broader use case.
isTempTable(conn, name, ...) ## S4 method for signature 'OdbcConnection,Id' isTempTable(conn, name, ...) ## S4 method for signature 'OdbcConnection,SQL' isTempTable(conn, name, ...)
isTempTable(conn, name, ...) ## S4 method for signature 'OdbcConnection,Id' isTempTable(conn, name, ...) ## S4 method for signature 'OdbcConnection,SQL' isTempTable(conn, name, ...)
conn |
OdbcConnection |
name |
Table name |
... |
additional parameters to methods |
The dbConnect()
method documented here is invoked when DBI::dbConnect()
is called with the first argument odbc()
. Connecting to a database via
an ODBC driver is likely the first step in analyzing data using the odbc
package; for an overview of package concepts, see the Overview section
below.
odbc() ## S4 method for signature 'OdbcDriver' dbConnect( drv, dsn = NULL, ..., timezone = "UTC", timezone_out = "UTC", encoding = "", bigint = c("integer64", "integer", "numeric", "character"), timeout = 10, driver = NULL, server = NULL, database = NULL, uid = NULL, pwd = NULL, dbms.name = NULL, attributes = NULL, interruptible = getOption("odbc.interruptible", interactive()), .connection_string = NULL )
odbc() ## S4 method for signature 'OdbcDriver' dbConnect( drv, dsn = NULL, ..., timezone = "UTC", timezone_out = "UTC", encoding = "", bigint = c("integer64", "integer", "numeric", "character"), timeout = 10, driver = NULL, server = NULL, database = NULL, uid = NULL, pwd = NULL, dbms.name = NULL, attributes = NULL, interruptible = getOption("odbc.interruptible", interactive()), .connection_string = NULL )
drv |
An |
dsn |
The data source name. For currently available options, see the
|
... |
Additional ODBC keywords. These will be joined with the other arguments to form the final connection string. Note that ODBC parameter names are case-insensitive so that (e.g.) Any values containing a leading or trailing space, a |
timezone |
The server time zone. Useful if the database has an internal
timezone that is not 'UTC'. If the database is in your local timezone,
set this argument to |
timezone_out |
The time zone returned to R. If you want to display
datetime values in the local timezone, set to |
encoding |
The text encoding used on the Database. If the database is
not using UTF-8 you will need to set the encoding to get accurate
re-encoding. See |
bigint |
The R type that |
timeout |
Time in seconds to timeout the connection attempt. Setting a
timeout of |
driver |
The ODBC driver name or a path to a driver. For currently
available options, see the |
server |
The server hostname. Some drivers use |
database |
The database on the server. Not required when configured for
the supplied |
uid |
The user identifier. Some drivers use |
pwd |
The password. Some drivers use |
dbms.name |
The database management system name. This should normally
be queried automatically by the ODBC driver. This name is used as the class
name for the OdbcConnection object returned from |
attributes |
A list of connection attributes that are passed prior to the connection being established. See ConnectionAttributes. |
interruptible |
Logical. If |
.connection_string |
A complete connection string, useful if you are copy pasting it from another source. If this argument is used, any additional arguments will be appended to this string. |
Internally, dbConnect()
creates a connection string using the supplied
arguments. Connection string keywords are driver-dependent; the arguments
documented here are common, but some drivers may not accept them.
Alternatively to configuring DSNs and driver names with the driver manager,
you can pass a complete connection string directly as the
.connection_string
argument.
The Connection Strings Reference is a
useful resource that has example connection strings for a large variety of
databases.
The odbc package is one piece of the R interface to databases with support for ODBC:
The package supports any Database Management System (DBMS) with ODBC support. Support for a given DBMS is provided by an ODBC driver, which defines how to interact with that DBMS using the standardized syntax of ODBC and SQL. Drivers can be downloaded from the DBMS vendor or, if you're a Posit customer, using the professional drivers. To manage information about each driver and the data sources they provide access to, our computers use a driver manager. Windows is bundled with a driver manager, while MacOS and Linux require installation of one; this package supports the unixODBC driver manager.
In the R interface, the DBI package provides a front-end while odbc implements a back-end to communicate with the driver manager. The odbc package is built on top of the nanodbc C++ library.
Interfacing with DBMSs using R and odbc involves three high-level steps:
Configure drivers and data sources: the functions odbcListDrivers()
and odbcListDataSources()
help to interface with the driver manager.
Connect to a database: The dbConnect()
function, called with the
first argument odbc(), connects to a database using the specified ODBC
driver to create a connection object.
Interface with connections: The resulting connection object can be
passed to various functions to retrieve information on database
structure (dbListTables()
), iteratively develop queries (dbSendQuery()
,
dbColumnInfo()
), and query data objects (dbFetch()
).
To learn more about databases:
"Best Practices in Working with Databases" documents how to use the odbc package with various popular databases.
The pyodbc "Drivers and Driver Managers" Wiki provides further context on drivers and driver managers.
Microsoft's "Introduction to ODBC" is a thorough resource on the ODBC interface.
odbcConnectionColumns()
If the catalog, or the schema arguments are NULL, attempt to infer by querying for CURRENT_DATABASE() and CURRENT_SCHEMA(). We do this to aid with performance, as the SQLColumns method is more performant when restricted to a particular DB/schema.
## S4 method for signature 'Snowflake,character' dbExistsTableForWrite(conn, name, ..., catalog_name = NULL, schema_name = NULL)
## S4 method for signature 'Snowflake,character' dbExistsTableForWrite(conn, name, ..., catalog_name = NULL, schema_name = NULL)
conn |
A DBIConnection object, as returned by
|
name |
The table name, passed on to
|
... |
Other parameters passed on to methods. |
catalog_name , schema_name
|
Catalog and schema names. |
This is used when creating a new table with dbWriteTable()
.
Databases with default methods defined are:
MySQL
PostgreSQL
SQL Server
Oracle
SQLite
Spark
Hive
Impala
Redshift
Vertica
BigQuery
Teradata
Access
Snowflake
odbcDataType(con, obj, ...)
odbcDataType(con, obj, ...)
con |
A driver connection object, as returned by |
obj |
An R object. |
... |
Additional arguments passed to methods. |
If you are using a different database and dbWriteTable()
fails with a SQL
parsing error the default method is not appropriate, you will need to write
a new method. The object type for your method will be the database name
retrieved by dbGetInfo(con)$dbms.name
. Use the documentation provided with
your database to determine appropriate values for each R data type.
Corresponding SQL type for the obj
.
Lists the names and types of each column (field) of a specified object.
odbcListColumns(connection, ...)
odbcListColumns(connection, ...)
connection |
A connection object, as returned by |
... |
Parameters specifying the object. |
The object to inspect must be specified as one of the arguments
(e.g. table = "employees"
); depending on the driver and underlying
data store, additional specification arguments may be required.
A data frame with name
and type
columns, listing the
object's fields.
On MacOS and Linux, odbc uses the unixODBC driver manager to manage information about driver and data sources. This helper returns the filepaths where the driver manager will look for that information.
This function is a wrapper around the command line call odbcinst -j
.
Windows does not use .ini
configuration files; this function will return a
0-length vector on Windows.
odbcListConfig()
odbcListConfig()
The odbcListDrivers()
and odbcListDataSources()
helpers return
information on the contents of odbcinst.ini
and odbc.ini
files,
respectively.
Learn more about unixODBC and the odbcinst
utility
here.
configs <- odbcListConfig() file.edit(configs[1])
configs <- odbcListConfig() file.edit(configs[1])
Collect information about the available data source names (DSNs). A DSN must
be both installed and configured with the driver manager to be included in
this list. Configuring a DSN just sets up a lookup table (e.g. in
odbc.ini
) to allow users to pass only the DSN to dbConnect()
.
DSNs that are not configured with the driver manager can still be
connected to with dbConnect()
by providing DSN metadata directly.
odbcListDataSources()
odbcListDataSources()
A data frame with two columns:
Name of the data source. The entries in this column can be
passed to the dsn
argument of dbConnect()
.
Data source description.
This function interfaces with the driver manager to collect information about the available data source names.
For MacOS and Linux, the odbc package supports the unixODBC driver
manager. unixODBC looks to the odbc.ini
configuration file for information
on DSNs. Find the location(s) of your odbc.ini
file(s) with odbcinst -j
.
In this example odbc.ini
file:
[MySQL] Driver = MySQL Driver Database = test Server = 127.0.0.1 User = root password = root Port = 3306
...the data source name is MySQL
, which will appear in the name
column of this function's output. To pass the DSN as the dsn
argument to
dbConnect()
, pass it as a string, like "MySQL"
.
Driver = MySQL Driver
references the driver name
in odbcListDrivers()
output.
Windows is bundled with an ODBC driver manager.
When a DSN is configured with a driver manager, information on the DSN will
be automatically passed on to dbConnect()
when its dsn
argument is set.
For example, with the MySQL
data source name configured, and the driver
name MySQL Driver
appearing in odbcListDrivers()
output, the code:
con <- dbConnect( odbc::odbc(), Driver = "MySQL Driver", Database = "test", Server = "127.0.0.1", User = "root", password = "root", Port = 3306 )
...can be written:
con <- dbConnect(odbc::odbc(), dsn = "MySQL")
In this case, dbConnect()
will look up the information defined for MySQL
in the driver manager (in our example, odbc.ini
) and automatically
pass the needed arguments.
Collect information about the configured driver names. A driver must be both
installed and configured with the driver manager to be included in this list.
Configuring a driver name just sets up a lookup table (e.g. in
odbcinst.ini
) to allow users to pass only the driver name to dbConnect()
.
Driver names that are not configured with the driver manager (and thus
do not appear in this function's output) can still be
used in dbConnect()
by providing a path to a driver directly.
odbcListDrivers( keep = getOption("odbc.drivers_keep"), filter = getOption("odbc.drivers_filter") )
odbcListDrivers( keep = getOption("odbc.drivers_keep"), filter = getOption("odbc.drivers_filter") )
keep , filter
|
A character vector of driver names to keep in or remove
from the results, respectively. If Driver names are first processed with |
A data frame with three columns.
Name of the driver. The entries in this column can be
passed to the driver
argument of dbConnect()
(as long as the
driver accepts the argument).
Driver attribute name.
Driver attribute value.
If a driver has multiple attributes, there will be one row per attribute,
each with the same driver name
. If a given driver name does not have any
attributes, the function will return one row with the driver name
, but
the last two columns will be NA
.
This function interfaces with the driver manager to collect information about the available driver names.
For MacOS and Linux, the odbc package supports the unixODBC driver
manager. unixODBC looks to the odbcinst.ini
configuration file for
information on driver names. Find the location(s) of your odbcinst.ini
file(s) with odbcinst -j
.
In this example odbcinst.ini
file:
[MySQL Driver] Driver=/opt/homebrew/Cellar/mysql/8.2.0_1/lib/libmysqlclient.dylib
Then the driver name is MySQL Driver
, which will appear in the name
column of this function's output. To pass the driver name as the driver
argument to dbConnect()
, pass it as a string, like "MySQL Driver"
.
Windows is bundled with an ODBC driver manager.
In this example, function output would include 1 row: the name
column
would read "MySQL Driver"
, attribute
would be "Driver"
, and value
would give the file path to the driver. Additional key-value pairs
under the driver name would add additional rows with the same name
entry.
When a driver is configured with a driver manager, information on the driver
will be automatically passed on to dbConnect()
when its driver
argument
is set. For an example, see the same section in the odbcListDataSources()
help-file. Instead of configuring driver information with a driver manager,
it is also possible to provide a path to a driver directly to dbConnect()
.
odbcListDrivers()
odbcListDrivers()
Lists all of the objects in the connection, or all the objects which have specific attributes.
odbcListObjects(connection, ...)
odbcListObjects(connection, ...)
connection |
A connection object, as returned by |
... |
Attributes to filter by. |
When used without parameters, this function returns all of the objects known
by the connection. Any parameters passed will filter the list to only objects
which have the given attributes; for instance, passing schema = "foo"
will return only objects matching the schema foo
.
A data frame with name
and type
columns, listing the
objects.
Lists the object types and metadata known by the connection, and how those object types relate to each other.
odbcListObjectTypes(connection)
odbcListObjectTypes(connection)
connection |
A connection object, as returned by |
The returned hierarchy takes the form of a nested list, in which each object type supported by the connection is a named list with the following attributes:
A list of other object types contained by the object, or "data" if the object contains data
An optional path to an icon representing the type
For instance, a connection in which the top-level object is a schema that contains tables and views, the function will return a list like the following:
list(schema = list(contains = list( list(name = "table", contains = "data") list(name = "view", contains = "data"))))
The hierarchy of object types supported by the connection.
Return the data inside an object as a data frame.
odbcPreviewObject(connection, rowLimit, ...)
odbcPreviewObject(connection, rowLimit, ...)
connection |
A connection object, as returned by |
rowLimit |
The maximum number of rows to display. |
... |
Parameters specifying the object. |
The object to previewed must be specified as one of the arguments
(e.g. table = "employees"
); depending on the driver and underlying
data store, additional specification arguments may be required.
A data frame containing the data in the object.
Set the Transaction Isolation Level for a Connection
odbcSetTransactionIsolationLevel(conn, levels)
odbcSetTransactionIsolationLevel(conn, levels)
conn |
A DBIConnection object, as returned by
|
levels |
One or more of 'read_uncommitted', 'read_committed', 'repeatable_read', 'serializable'. |
## Not run: # Can use spaces or underscores in between words. odbcSetTransactionIsolationLevel(con, "read uncommitted") # Can also use the full constant name. odbcSetTransactionIsolationLevel(con, "SQL_TXN_READ_UNCOMMITTED") ## End(Not run)
## Not run: # Can use spaces or underscores in between words. odbcSetTransactionIsolationLevel(con, "read uncommitted") # Can also use the full constant name. odbcSetTransactionIsolationLevel(con, "SQL_TXN_READ_UNCOMMITTED") ## End(Not run)
When connecting to a database using odbc, all the arguments are concatenated
into a single connection string that looks like name1=value1;name2=value2
.
That means if your value contains =
or ;
then it needs to be quoted.
Other rules mean that you need to quote any values that starts or ends with
white space, or contains {
or }
.
This function quotes a string in a way that should work for most drivers, but unfortunately there doesn't seem to be an approach that works everywhere. If this function doesn't work for you, you'll need to carefully read the docs for your driver.
quote_value(x)
quote_value(x)
x |
A string to quote. |
A quoted string, wrapped in I()
.
quote_value("abc") quote_value("ab'c") # Real usage is more likely to look like: ## Not run: library(DBI) con <- dbConnect( odbc::odbc(), dsn = "reallycooldatabase" password = odbc::quote_value(Sys.getenv("MY_PASSWORD")) ) ## End(Not run)
quote_value("abc") quote_value("ab'c") # Real usage is more likely to look like: ## Not run: library(DBI) con <- dbConnect( odbc::odbc(), dsn = "reallycooldatabase" password = odbc::quote_value(Sys.getenv("MY_PASSWORD")) ) ## End(Not run)
Connect to a Snowflake account via the Snowflake ODBC driver.
In particular, the custom dbConnect()
method for the Snowflake ODBC driver
detects ambient OAuth credentials on platforms like Snowpark Container
Services or Posit Workbench.
snowflake() ## S4 method for signature 'Snowflake' dbConnect( drv, account = Sys.getenv("SNOWFLAKE_ACCOUNT"), driver = NULL, warehouse = NULL, database = NULL, schema = NULL, uid = NULL, pwd = NULL, ... )
snowflake() ## S4 method for signature 'Snowflake' dbConnect( drv, account = Sys.getenv("SNOWFLAKE_ACCOUNT"), driver = NULL, warehouse = NULL, database = NULL, schema = NULL, uid = NULL, pwd = NULL, ... )
drv |
an object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection). |
account |
A Snowflake account identifier,
e.g. |
driver |
The name of the Snowflake ODBC driver, or |
warehouse |
The name of a Snowflake compute warehouse, or |
database |
The name of a Snowflake database, or |
schema |
The name of a Snowflake database schema, or |
uid , pwd
|
Manually specify a username and password for authentication. Specifying these options will disable ambient credential discovery. |
... |
Further arguments passed on to |
An OdbcConnection
object with an active connection to a Snowflake
account.
## Not run: # Use ambient credentials. DBI::dbConnect(odbc::snowflake()) # Use browser-based SSO (if configured). Only works on desktop. DBI::dbConnect( odbc::snowflake(), account = "testorg-test_account", authenticator = "externalbrowser" ) # Use a traditional username & password. DBI::dbConnect( odbc::snowflake(), account = "testorg-test_account", uid = "me", pwd = rstudioapi::askForPassword() ) ## End(Not run)
## Not run: # Use ambient credentials. DBI::dbConnect(odbc::snowflake()) # Use browser-based SSO (if configured). Only works on desktop. DBI::dbConnect( odbc::snowflake(), account = "testorg-test_account", authenticator = "externalbrowser" ) # Use a traditional username & password. DBI::dbConnect( odbc::snowflake(), account = "testorg-test_account", uid = "me", pwd = rstudioapi::askForPassword() ) ## End(Not run)