Connect to a Databricks Workspace

Defining Credentials

The {brickster} package connects to a Databricks workspace in two ways:

  1. OAuth user-to-machine (U2M) authentication
  2. Personal Access Tokens (PAT)

It’s recommended to use option (1) when using {brickster} interactively. If you need to run code via an automated process, the only option currently is (2).

{brickster} will automatically detect when a session has Posit Workbench managed Databricks OAuth credentials enabled. For more information about this authentication flow see the section Posit Workbench Managed Databricks OAuth Credentials.

Personal Access Tokens can be generated in a few steps. For a step-by-step breakdown, refer to the documentation.

Once you have a token you’ll be able to store it alongside the workspace URL in an .Renviron file. The .Renviron file stores environment variables, including those which may be sensitive (e.g. credentials), and decouples them from the code (additional reading).

To get started add the following to your .Renviron:

  • DATABRICKS_HOST: The workspace URL

  • DATABRICKS_TOKEN: Personal access token (not required if using OAuth U2M)

  • DATABRICKS_WSID: The workspace ID (docs)

DATABRICKS_WSID is only required for the RStudio IDE integration with the connection pane.

Example of entries in .Renviron:

DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapi123456789012345678a9bc01234defg5
DATABRICKS_WSID=123123123123123

Note: It’s recommended to create an .Renviron for each project. You can also create an .Renviron within your user home directory if required.

Restarting your R session will allow those variables to be picked up by the {brickster} package.
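After restarting, a quick check in base R can confirm that the variables are visible to the session. This is a minimal sketch that uses only Sys.getenv(); it assumes the default (unsuffixed) variable names shown above and deliberately avoids printing the token itself.

```r
# names of the variables defined in .Renviron (default profile)
vars <- c("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_WSID")

# TRUE where a variable is set to a non-empty value; the values
# themselves are not printed so the token stays out of the console
is_set <- nzchar(Sys.getenv(vars))
names(is_set) <- vars
is_set
```

DATABRICKS_TOKEN is expected to be FALSE when relying on OAuth U2M, as is DATABRICKS_WSID when the connection pane integration isn't used.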

Using Credentials with {brickster}

Authentication should now be possible without specifying the credentials in your R code. You can load {brickster} and list the clusters within the workspace using db_cluster_list(). To access the host and token, use db_host() and db_token() respectively.

library(brickster)

# using db_host() and db_token() to get credentials
clusters <- db_cluster_list(host = db_host(), token = db_token())

All {brickster} functions have their host/token parameters default to calling db_host()/db_token(), so the explicit calls can be omitted.

# all host/token parameters default to db_host()/db_token()
clusters <- db_cluster_list()

When using OAuth U2M authentication you don’t define a token in .Renviron and therefore db_token() will return NULL.

Managing Multiple Credentials

There are two methods that {brickster} supports to simplify switching of credentials within an R project/session:

  1. Adding multiple credentials to .Renviron, where each additional set of credentials is differentiated via a suffix (e.g. DATABRICKS_TOKEN_DEV)
  2. Using a .databrickscfg file (primary method in Databricks CLI)

The option use_databrickscfg selects between (1) and (2). The following example shows how to switch the session to use .databrickscfg.

# will use the `DEFAULT` profile in `.databrickscfg`
options(use_databrickscfg = TRUE)

# values returned should be those in profile of `.databrickscfg`
db_host()
db_token()

The default behaviour is to read credentials from .Renviron. If you wish to change this it’s recommended to set the option within .Rprofile so that it’s set during initialization of the R session.
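As a sketch, a project-level .Rprofile could contain the line below. The option name is the one used above; placing the file in the project root is an assumption about your setup.

```r
# .Rprofile
# read credentials from .databrickscfg instead of .Renviron
options(use_databrickscfg = TRUE)
```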

Switching Between Credentials

The db_profile option controls which profile’s credentials are returned by db_host()/db_token()/db_wsid().

Profiles enable you to switch contexts between:

  • Different workspaces (e.g. development or production)

  • Different permissions (e.g. admin or restricted user)

This behaviour works when using credentials specified in either .Renviron or .databrickscfg:

# using .Renviron
db_host() # returns `DATABRICKS_HOST` (.Renviron)

# switch profile to 'prod'
options(db_profile = "prod")
db_host() # returns `DATABRICKS_HOST_PROD` (.Renviron)

# set back to default (NULL)
options(db_profile = NULL)
# use .databrickscfg
options(use_databrickscfg = TRUE)
db_host() # returns host from `DEFAULT` profile (.databrickscfg)

options(db_profile = "prod")
db_host() # returns host from `prod` profile (.databrickscfg)

Profiles in .Renviron are expected to follow the same naming convention as the default, with the profile name appended as a suffix.

Here is an example of an .Renviron file that has three profiles (default, dev, prod):

# default
DATABRICKS_HOST=xxxxxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID=123123123123123
# dev
DATABRICKS_HOST_DEV=xxxxxxx-dev.cloud.databricks.com
DATABRICKS_TOKEN_DEV=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_DEV=123123123123124
# prod
DATABRICKS_HOST_PROD=xxxxxxx-prod.cloud.databricks.com
DATABRICKS_TOKEN_PROD=dapixxxxxxxxxxxxxxxxxxxxxxxxx
DATABRICKS_WSID_PROD=123123123123125
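To make the naming convention concrete, the sketch below builds the variable name for a given field and profile. The helper profile_env_name() is hypothetical and is not part of {brickster}; it only illustrates the suffix pattern used in the .Renviron example, and it assumes the profile name is upper-cased when forming the suffix, as the `prod` to `_PROD` mapping suggests.

```r
# Hypothetical helper (not part of {brickster}): builds the environment
# variable name for a field and profile, following the convention above.
profile_env_name <- function(field, profile = getOption("db_profile")) {
  name <- paste0("DATABRICKS_", toupper(field))
  if (is.null(profile)) {
    # default profile: no suffix
    return(name)
  }
  paste(name, toupper(profile), sep = "_")
}

profile_env_name("host", profile = NULL)    # "DATABRICKS_HOST"
profile_env_name("token", profile = "dev")  # "DATABRICKS_TOKEN_DEV"
profile_env_name("wsid", profile = "prod")  # "DATABRICKS_WSID_PROD"

# look up the value for a profile; returns "" when the variable is unset
Sys.getenv(profile_env_name("host", profile = "prod"))
```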

Configuring .databrickscfg

For details on configuring .databrickscfg, please refer to the Databricks CLI documentation.

There is only one {brickster} specific feature and it is the inclusion of wsid alongside host/token.

wsid is used by the connections pane integration in RStudio, as the underlying APIs require it.
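The sketch below shows what a file with a `DEFAULT` and a `prod` profile, each including wsid, might look like. It is written from R to a temporary path so that an existing .databrickscfg is left untouched. All values are placeholders that mirror the .Renviron example, and the host format in particular is an assumption; the Databricks CLI documentation is the authority on the exact syntax.

```r
# Illustrative contents only: placeholder values mirror the .Renviron example
cfg_lines <- c(
  "[DEFAULT]",
  "host = xxxxxxx.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123123",
  "",
  "[prod]",
  "host = xxxxxxx-prod.cloud.databricks.com",
  "token = dapixxxxxxxxxxxxxxxxxxxxxxxxx",
  "wsid = 123123123123125"
)

# write to a temporary file rather than the real configuration file
cfg_path <- tempfile(fileext = ".databrickscfg")
writeLines(cfg_lines, cfg_path)

# print the file back to inspect it
cat(readLines(cfg_path), sep = "\n")
```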

Posit Workbench Managed Databricks OAuth Credentials

Posit Workbench has a managed Databricks OAuth credentials feature, which allows users to sign into a Databricks workspace from the Workbench home page when launching a session and then access Databricks resources as their own identity. In an RStudio Pro session running on Posit Workbench with managed Databricks OAuth credentials selected, {brickster} functions that rely on db_host()/db_token() should work without any credentials being specified in your R code. See the code below as an example.

library(brickster)
db_cluster_list()

{brickster} will automatically detect when a session has Workbench managed OAuth credentials and then use the workbench profile defined in the .databrickscfg file at the location specified by DATABRICKS_CONFIG_FILE. Workbench generates this .databrickscfg file in a temporary directory, and it should not be modified directly.

To use an alternative .databrickscfg file, a different profile, or your own DATABRICKS_HOST or DATABRICKS_TOKEN environment variables, launch an RStudio Pro session without the Databricks managed credentials box selected.