| Title: | Open-Access Computational Biology Datasets |
|---|---|
| Description: | Efficiently access the 'Bedrock Bio' library of open-access computational biology datasets. Lazily query datasets backed by 'DuckDB' and 'Apache Iceberg', with support for predicate pushdown and column projection to the cloud storage backend. This enables quick, iterative access to otherwise massive, unwieldy datasets without downloading them in full. See <https://bedrock.bio> for available datasets and documentation. |
| Authors: | Liam Abbott [aut, cre, cph] |
| Maintainer: | Liam Abbott <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.0 |
| Built: | 2026-07-02 21:22:03 UTC |
| Source: | https://github.com/cran/bedrockbio |
Describe a namespace: its name, citation, license, context, and tables

    describe_namespace(name)

| Argument | Description |
|---|---|
| name | Namespace identifier. |

A named list with name, citation, license, context, and
tables (fully-qualified table identifiers). Use describe_table() for
per-table details.

    ## Not run:
    library(bedrockbio)
    describe_namespace("ukb_ppp")$tables
    ## End(Not run)

Describe a table: its context, columns, and partitions

    describe_table(name)

| Argument | Description |
|---|---|
| name | Table identifier. |

A named list with name, context, columns (each with name,
type, description, nullable), and partitions (a named list mapping
each partition column to its values and default). Filter on partition
columns for the fastest reads.

    ## Not run:
    library(bedrockbio)
    describe_table("ukb_ppp.pqtls")$name
    ## End(Not run)

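
Since the returned list names its partition columns, it can be inspected before querying to see which filters are likely to prune data. The sketch below runs offline on a hand-built list that mimics the documented shape. The values shown, the helpers `partition_columns()` and `column_names()`, and the inner field names `values` and `default` are assumptions made for illustration, not confirmed output of the package.

```r
# Hypothetical stand-in for the list describe_table() is documented to return.
# With the package installed, the real call would be:
#   info <- describe_table("dbsnp.vcf")
info <- list(
  name = "dbsnp.vcf",
  context = "illustrative only",
  columns = list(
    list(name = "rsid", type = "string", description = "", nullable = FALSE),
    list(name = "position", type = "integer", description = "", nullable = FALSE)
  ),
  # Assumed layout: one entry per partition column, holding its values and default.
  partitions = list(
    assembly   = list(values = c("GRCh37", "GRCh38"), default = "GRCh38"),
    chromosome = list(values = c("1", "22"), default = "1")
  )
)

# Hypothetical helper: names of the partition columns worth filtering on.
partition_columns <- function(info) names(info$partitions)

# Hypothetical helper: names of all columns, one per entry in info$columns.
column_names <- function(info) {
  vapply(info$columns, function(col) col$name, character(1))
}

partition_columns(info)
column_names(info)
```

If the real structure differs (for example, if partition values are stored under another field name), only the mock list needs adjusting; the helpers rely solely on the documented top-level fields.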
List available namespaces (data sources)

    list_namespaces()

A character vector of namespace identifiers.

    ## Not run:
    library(bedrockbio)
    list_namespaces()
    ## End(Not run)

List available tables, optionally filtered to one namespace

    list_tables(namespace = NULL)

| Argument | Description |
|---|---|
| namespace | If given, return only that namespace's tables; otherwise all tables. |

A character vector of fully-qualified table identifiers.

    ## Not run:
    library(bedrockbio)
    list_tables("ukb_ppp")
    ## End(Not run)

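
When no namespace is given, the result mixes tables from every namespace. A minimal base-R sketch of grouping such a vector by namespace follows. It assumes identifiers take the form `<namespace>.<table>`, as the identifiers `ukb_ppp.pqtls` and `dbsnp.vcf` used in the examples suggest. The vector `ids` is a hand-written stand-in for the output of `list_tables()`, and `group_by_namespace()` is a hypothetical helper.

```r
# Stand-in for list_tables(); with the package installed one would use
#   ids <- list_tables()
ids <- c("ukb_ppp.pqtls", "dbsnp.vcf")

# Hypothetical helper: group fully-qualified identifiers by namespace,
# assuming the namespace is everything before the first dot.
group_by_namespace <- function(ids) {
  namespaces <- sub("\\..*$", "", ids)
  split(ids, namespaces)
}

by_namespace <- group_by_namespace(ids)
by_namespace[["ukb_ppp"]]
```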
Lazily query a table

    load_table(name)

| Argument | Description |
|---|---|
| name | Table identifier. |

A lazy tbl backed by DuckDB, compatible with dplyr verbs. Filter
on partition columns (see describe_table()) for fastest reads.

    ## Not run:
    library(bedrockbio)
    library(dplyr)
    load_table("dbsnp.vcf") |>
      filter(assembly == "GRCh38", chromosome == "22") |>
      select(rsid, position, ref_allele, alt_allele) |>
      head(5) |>
      collect()
    ## End(Not run)

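
The filter-then-project pattern can be wrapped in a small reusable function, sketched below. `slice_variants()` is a hypothetical helper, not part of the package. So that the block runs offline, a small data frame with made-up rows stands in for the lazy tbl. With the package installed one would pass `load_table("dbsnp.vcf")` instead, assuming it behaves like a standard dplyr-compatible lazy table.

```r
library(dplyr)

# Hypothetical helper: filter on the partition columns first, then keep only
# the columns needed. Nothing is collected inside the function, so a lazy
# tbl stays lazy until the caller calls collect().
slice_variants <- function(tbl, asm, chrom, n = 5) {
  tbl |>
    filter(assembly == !!asm, chromosome == !!chrom) |>
    select(rsid, position, ref_allele, alt_allele) |>
    head(n)
}

# Stand-in data frame with made-up rows, using the column names from the
# example above. It replaces load_table("dbsnp.vcf") for offline use.
variants <- data.frame(
  assembly   = c("GRCh38", "GRCh38", "GRCh37"),
  chromosome = c("22", "1", "22"),
  rsid       = c("rs0001", "rs0002", "rs0003"),
  position   = c(100L, 200L, 300L),
  ref_allele = c("A", "C", "G"),
  alt_allele = c("G", "T", "A"),
  stringsAsFactors = FALSE
)

result <- slice_variants(variants, "GRCh38", "22") |> collect()
result
```

Keeping `collect()` outside the helper leaves the caller free to add further dplyr verbs before any data is materialized.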