Architecture

This vignette provides a high-level overview of the core architectural components in LaminDB. Understanding these concepts will help you navigate the system and effectively manage your data and metadata.

Core concepts

LaminDB is built around a few key ideas:

Instance

A LaminDB instance is a self-contained environment for storing and managing data and metadata. You can think of it like a database or a project directory. Each instance has its own:

  • Schema: Defines the structure of the metadata.
  • Storage: Where the actual data files are stored (locally, on S3, etc.).
  • Database: Stores the metadata records in registries.

For more information about instances, see ?connect() and ?Instance.

Module

A module in LaminDB is a collection of related registries that provide functionality in a specific domain. For example:

  • core: Provides registries for general data management (Artifacts, Collections, Transforms, etc.). This module is included by default in every LaminDB instance.
  • bionty: Offers registries for managing biological entities (genes, proteins, cell types) and links them to public ontologies.
  • wetlab: Includes registries for managing experimental metadata (samples, treatments, etc.).
  • And many more…

Modules help organize the system and make it easier to find the specific registries you need.

For more information about modules, see ?Module. The core module is documented in the module_core vignette: vignette("module_core", package = "laminr").

Registry

A registry is a centralized collection of related records. It’s like a table in a database, where each row represents a specific entity. Examples of registries include:

  • Artifacts: Datasets, models, or other data entities.
  • Collections: Groupings of related artifacts.
  • Transforms: Data processing operations.
  • Features: Variables or measurements within datasets.
  • Labels: Annotations or classifications applied to data.

Each registry has a defined structure with specific fields that hold relevant information.

For more information about registries, see ?Registry. The core registries are documented in the module_core vignette: vignette("module_core", package = "laminr").

Field

A field is a single piece of information within a registry. It’s analogous to a column in a database table. For example, the Artifact registry might have fields like:

  • key: Storage key, the relative path within the storage location.
  • storage: Storage location, e.g. an S3 or GCP bucket or a local directory.
  • description: A description of the artifact.
  • created_by: The user who created the artifact.

Fields define the type of data that can be stored in a registry and provide a way to organize and query the metadata.

For more information about fields, see ?Field. The fields of core registries are documented in the module_core vignette: vignette("module_core", package = "laminr").

Record

A record is a single entry within a registry. It’s like a row in a database table. A record combines multiple fields to represent a specific entity. For example, a record in the Artifact registry might represent a single dataset with its key, storage location, description, creator, and other relevant information.

Putting it together

In essence, you have instances that contain modules. Each module contains registries, which in turn hold records. Every record is composed of multiple fields. This hierarchical structure allows for flexible and organized management of data and metadata within LaminDB.

Class structure

The laminr package provides a set of classes that mirror the core concepts of LaminDB. These classes allow you to interact with instances, modules, registries, fields, and records in a programmatic way.

The package provides two sets of classes: the base classes and the sugar syntax classes.

Base classes

These classes provide the core functionality for interacting with LaminDB instances, modules, registries, fields, and records. These are the classes that are documented via ?Instance, ?Module, ?Registry, ?Field, and ?Record.

The class diagram below illustrates the relationships between these classes.

However, they are not intended to be used directly in most cases. Instead, the sugar syntax classes provide a more user-friendly interface for working with LaminDB data.

classDiagram
%% # nolint start
    laminr --> Instance
    laminr --> UserSettings
    laminr --> InstanceSettings
    Instance --> InstanceAPI
    Instance --> Module
    Module --> Registry
    Registry --> Field
    Registry --> Record
    Field --> RelatedRecords
    Record --> RelatedRecords
    UserSettings --> InstanceSettings
    InstanceSettings --> Instance
    InstanceAPI --> Module
    Instance --> Registry
    InstanceAPI --> Registry
    Instance --> Record
    InstanceAPI --> Record
    Instance --> RelatedRecords
    InstanceAPI --> RelatedRecords
    
    %% Methods must be on one line to be shown in the right diagram section
    %% Use \n for newlines and #emsp; to create indents in the rendered
    %% diagram when necessary
    
    class laminr{
        +connect(String slug): RichInstance
    }
    class UserSettings{
        +initialize(...): UserSettings
        +email: String
        +access_token: String
        +uid: String
        +uuid: String
        +handle: String
        +name: String
    }
    class InstanceSettings{
        +initialize(...): InstanceSettings
        +owner: String
        +name: String
        +id: String
        +schema_id: String
        +api_url: String
    }
    class Instance{
        +initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map<String, any> schema\n): Instance
        +get_modules(): Module[]
        +get_module(String module_name): Module
        +get_module_names(): String[]
        +get_api(): InstanceAPI
        +get_settings(): InstanceSettings
        +get_py_lamin(Boolean check, String what): PythonModule
        +track(String path, String transform): NULL
        +finish(): NULL
        +is_default: Boolean
    }
    class InstanceAPI{
        +initialize(InstanceSettings Instance_settings)
        +get_schema(): Map<String, Any>
        +get_record(...): Map<String, Any>
        +get_records(...): Map<String, Any>
        +delete_record(...): NULL
    }
    class Module{
        +initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map<String, any> module_schema\n): Module
        +name: String
        +get_registries(): Registry[]
        +get_registry(String registry_name): Registry
        +get_registry_names(): String[]
    }
    class Registry{
        +initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map<String, Any> registry_schema\n): Registry
        +name: String
        +class_name: String
        +is_link_table: Bool
        +get_fields(): Field[]
        +get_field(String field_name): Field
        +get_field_names(): String[]
        +get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord
        +get_record_class(): RichRecordClass
        +get_temporary_record_class(): TemporaryRecordClass
        +df(Integer limit, Bool verbose): DataFrame
        +from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord
        +from_path(\n#emsp;Path path, String key, String description, String run\n): TemporaryRecord
        +from_anndata(\n#emsp;AnnData adata, String key, String description, String run\n): TemporaryRecord
    }
    class Field{
        +initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field
        +type: String
        +through: Map
        +field_name: String
        +registry_name: String
        +column_name: String
        +module_name: String
        +is_link_table: Bool
        +relation_type: String
        +related_field_name: String
        +related_registry_name: String
        +related_module_name: String
    }
    class Record{
        +initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map<String, Any> data\n): Record
        +get_value(String field_name): Any
        +delete(): NULL
    }
    class RelatedRecords{
        +initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords
        +df(): DataFrame
        +field: Field
    }
%% # nolint end

Sugar syntax classes

The sugar syntax classes provide a more user-friendly way to interact with LaminDB data. These classes are designed to make it easier to access and manipulate instances, modules, registries, fields, and records.

For example, to get an artifact with a specific ID using only base classes, you might write:

db <- connect("laminlabs/cellxgene")

artifact <- db$get_module("core")$get_registry("artifact")$get("KBW89Mf7IGcekja2hADu")

artifact$get_value("id")

With the sugar syntax classes, you can achieve the same result more concisely:

db <- connect("laminlabs/cellxgene")

artifact <- db$Artifact$get("KBW89Mf7IGcekja2hADu")

artifact$id

This sugar syntax is achieved by creating RichInstance and RichRecord classes that inherit from Instance and Record, respectively. These classes provide additional methods and properties to simplify working with LaminDB data.

Class diagram

The class diagram below illustrates the relationships between the sugar syntax classes in the laminr package. These classes provide a more user-friendly interface for interacting with LaminDB data.

classDiagram
%% # nolint start
    %% --- Copied from base diagram --------------------------------------------
    laminr --> UserSettings
    laminr --> InstanceSettings
    Instance --> InstanceAPI
    Instance --> Module
    Module --> Registry
    Registry --> Field
    Field --> RelatedRecords
    Record --> RelatedRecords
    UserSettings --> InstanceSettings
    InstanceSettings --> Instance
    InstanceAPI --> Module
    Instance --> Registry
    InstanceAPI --> Registry
    Instance --> Record
    InstanceAPI --> Record
    Instance --> RelatedRecords
    InstanceAPI --> RelatedRecords
    %% -------------------------------------------------------------------------
    
    %% --- New links for Rich classes ------------------------------------------
    RichInstance --|> Instance
    laminr --> RichInstance
    Core --|> Module
    RichInstance --> Core
    Bionty --|> Module
    RichInstance --> Bionty
    Registry --> RichRecord
    Registry --> TemporaryRecord
    RichRecord --|> Record
    TemporaryRecord --|> RichRecord
    Registry --> Artifact
    Artifact --|> RichRecord
    %% -------------------------------------------------------------------------
    
    %% --- Copied from base diagram --------------------------------------------
    class laminr{
        +connect(String slug): RichInstance
    }
    class UserSettings{
        +initialize(...): UserSettings
        +email: String
        +access_token: String
        +uid: String
        +uuid: String
        +handle: String
        +name: String
    }
    class InstanceSettings{
        +initialize(...): InstanceSettings
        +owner: String
        +name: String
        +id: String
        +schema_id: String
        +api_url: String
    }
    class Instance{
        +initialize(\n#emsp;InstanceSettings Instance_settings, API api, \n#emsp;Map<String, any> schema\n): Instance
        +get_modules(): Module[]
        +get_module(String module_name): Module
        +get_module_names(): String[]
        +get_api(): InstanceAPI
        +get_settings(): InstanceSettings
        +get_py_lamin(Boolean check, String what): PythonModule
        +track(String path, String transform): NULL
        +finish(): NULL
        +is_default: Boolean
    }
    class InstanceAPI{
        +initialize(InstanceSettings Instance_settings)
        +get_schema(): Map<String, Any>
        +get_record(...): Map<String, Any>
        +get_records(...): Map<String, Any>
        +delete_record(...): NULL
    }
    class Module{
        +initialize(\n#emsp;Instance Instance, API api, String module_name,\n#emsp;Map<String, any> module_schema\n): Module
        +name: String
        +get_registries(): Registry[]
        +get_registry(String registry_name): Registry
        +get_registry_names(): String[]
    }
    class Registry{
        +initialize(\n#emsp;Instance Instance, Module module, API api,\n#emsp;String registry_name, Map<String, Any> registry_schema\n): Registry
        +name: String
        +class_name: String
        +is_link_table: Bool
        +get_fields(): Field[]
        +get_field(String field_name): Field
        +get_field_names(): String[]
        +get(\n#emsp;String id_or_uid, Bool include_foreign_keys,\n#emsp;List~String~ select, Bool verbose\n): RichRecord
        +get_record_class(): RichRecordClass
        +get_temporary_record_class(): TemporaryRecordClass
        +df(Integer limit, Bool verbose): DataFrame
        +from_df(\n#emsp;DataFrame dataframe, String key,\n#emsp;String description, String run\n): TemporaryRecord
        +from_path(\n#emsp;Path path, String key, String description, String run\n): TemporaryRecord
        +from_anndata(\n#emsp;AnnData adata, String key, String description, String run\n): TemporaryRecord
    }
    class Field{
        +initialize(\n#emsp;String type, String through, String field_name,\n#emsp;String registry_name, String column_name, String module_name,\n#emsp;Bool is_link_table, String relation_type, String related_field_name,\n#emsp;String related_registry_name, String related_module_name\n): Field
        +type: String
        +through: Map
        +field_name: String
        +registry_name: String
        +column_name: String
        +module_name: String
        +is_link_table: Bool
        +relation_type: String
        +related_field_name: String
        +related_registry_name: String
        +related_module_name: String
    }
    class Record{
        +initialize(\n#emsp;Instance Instance, Registry registry,\n#emsp;API api, Map<String, Any> data\n): Record
        +get_value(String field_name): Any
        +delete(): NULL
    }
    class RelatedRecords{
        +initialize(\n#emsp;Instance instance, Registry registry, Field field,\n#emsp;String related_to, API api\n): RelatedRecords
        +df(): DataFrame
        +field: Field
    }
    %% -------------------------------------------------------------------------
    
    %% --- New Rich classes ----------------------------------------------------
    class RichInstance{
        +initialize(
          #emsp;InstanceSettings Instance_settings, API api,
          #emsp;Map<String, any> schema
        ): RichInstance
        +Registry Artifact
        +Registry Collection
        +...registry accessors...
        +Registry User
        +Bionty bionty
    }
    style RichInstance fill:#ffe1c9
    class Core{
        +Registry Artifact
        +Registry Collection
        +...registry accessors...
        +Registry User
    }
    style Core fill:#ffe1c9
    class Bionty{
        +Registry CellLine
        +Registry CellMarker
        +...registry accessors...
        +Registry Tissue
    }
    style Bionty fill:#ffe1c9
    class RichRecord{
        +...field value accessors...
    }
    style RichRecord fill:#ffe1c9
    class TemporaryRecord{
        +save(): NULL
    }
    style TemporaryRecord fill:#ffe1c9
    class Artifact{
        +...field value accessors...
        +cache(): String
        +load(): AnnData | DataFrame | ...
        +describe(): NULL
    }
    style Artifact fill:#ffe1c9
    %% -------------------------------------------------------------------------
%% # nolint end