Title: | Automated Doublet Detection and Classification for Cytometry Data |
---|---|
Description: | Automated method for doublet detection in flow or mass cytometry data, based on simulating doublets and finding events whose protein expression patterns are similar to the simulated doublets. |
Authors: | Matei Ionita [aut, cre] |
Maintainer: | Matei Ionita <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-12-19 17:16:41 UTC |
Source: | CRAN |
Extends a classification of singlets into a classification of doublets.
classify_doublets(cleanet_res, singlet_clas, max_multi = 4)
cleanet_res | The output of a call to the cleanet function. |
singlet_clas | An array giving a classification of the singlets, whose length must match the number of singlet events returned in cleanet_res. |
max_multi | The highest cardinality of a multiplet to be considered. |
An array with the same length as the number of doublets found in cleanet_res, specifying the composition of each doublet.
library(Cleanet)

# Load the example mass cytometry data shipped with the package.
path <- system.file("extdata", "df_mdipa.csv", package="Cleanet")
df_mdipa <- read.csv(path, check.names=FALSE)
cols <- c("CD45", "CD123", "CD19", "CD11c", "CD16", "CD56", "CD294", "CD14",
          "CD3", "CD20", "CD66b", "CD38", "HLA-DR", "CD45RA", "DNA1", "DNA2")

# Detect doublets, then extend the singlet labels to the doublets.
cleanet_res <- cleanet(df_mdipa, cols, cofactor=5)
singlet_clas <- df_mdipa$label[which(cleanet_res$status!="Doublet")]
doublet_clas <- classify_doublets(cleanet_res, singlet_clas)
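The length requirement on singlet_clas can be checked before calling classify_doublets. The following is a minimal sketch in base R on mock data; the status vector and labels are hypothetical stand-ins, and "Singlet" is an assumed status value (only "Doublet" appears in the example above).

```r
# Mock stand-ins: in real use, status would be cleanet_res$status and
# labels would be a per-event annotation such as df_mdipa$label.
status <- c("Singlet", "Doublet", "Singlet", "Singlet", "Doublet")  # "Singlet" is an assumed value
labels <- c("T cell", "unknown", "B cell", "T cell", "unknown")

# Keep labels only for events that were not flagged as doublets.
keep <- which(status != "Doublet")
singlet_clas <- labels[keep]

# The classification must have one entry per non-doublet event.
stopifnot(length(singlet_clas) == sum(status != "Doublet"))
print(table(singlet_clas))
```

A mismatch here usually means the labels were subset with a different status vector than the one in cleanet_res.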
Augments the data with simulated doublets, computes nearest neighbors in the augmented dataset, and identifies as doublets those events with a high share of simulated doublets among their nearest neighbors.
cleanet(df, cols, cofactor, thresh = 5, is_debris = NULL)
df | A data frame containing protein expression data. |
cols | Columns to use in analysis. |
cofactor | Parameter of arcsinh transformation, applied before computing nearest neighbors. Recommended values are 5 for mass cytometry and 500-1000 for flow cytometry. |
thresh | Number of simulated doublets required among an event's 15 nearest neighbors for the event to be classified as a doublet. |
is_debris | Optional, binary array with length matching the number of rows in df. TRUE for debris events, FALSE for everything else. This package includes helper functions to compute this for flow or mass cytometry data. |
A list with multiple elements, among them status, which gives the singlet/doublet status of each event.
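To show what the cofactor argument controls, here is a small base R sketch of an arcsinh transformation. It assumes the usual cytometry form asinh(x / cofactor); the helper name arcsinh_transform and the raw values are hypothetical and not part of the package.

```r
# Raw intensities for three hypothetical events in one channel.
raw <- c(0, 5, 500)

# arcsinh transform with a cofactor: roughly linear near zero,
# logarithmic for values much larger than the cofactor.
arcsinh_transform <- function(x, cofactor) asinh(x / cofactor)

mass_scale <- arcsinh_transform(raw, cofactor = 5)    # typical for mass cytometry
flow_scale <- arcsinh_transform(raw, cofactor = 500)  # lower end of the flow range

print(round(mass_scale, 3))
print(round(flow_scale, 3))
```

A larger cofactor widens the near-linear region around zero, which is why flow cytometry data, with its much larger raw values, calls for a larger cofactor.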
library(Cleanet)

# Load the example mass cytometry data shipped with the package.
path <- system.file("extdata", "df_mdipa.csv", package="Cleanet")
df_mdipa <- read.csv(path, check.names=FALSE)
cols <- c("CD45", "CD123", "CD19", "CD11c", "CD16", "CD56", "CD294", "CD14",
          "CD3", "CD20", "CD66b", "CD38", "HLA-DR", "CD45RA", "DNA1", "DNA2")

# Detect doublets using the cofactor recommended for mass cytometry.
cleanet_res <- cleanet(df_mdipa, cols, cofactor=5)
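The idea behind cleanet (augment with simulated doublets, find nearest neighbors, count simulated neighbors) can be sketched in base R on toy data. This is not the package's implementation. It assumes that a simulated doublet is the sum of the raw signals of two random events and that an event is flagged when its count reaches the threshold. It also uses a brute-force distance matrix, which would not scale to real datasets.

```r
set.seed(1)

# Toy data: 200 hypothetical events in 2 channels (raw scale).
n <- 200
events <- matrix(rexp(n * 2, rate = 1 / 20), ncol = 2)

# Assumption: a simulated doublet is the sum of the raw signals of two random events.
n_sim <- 100
i <- sample(n, n_sim, replace = TRUE)
j <- sample(n, n_sim, replace = TRUE)
simulated <- events[i, , drop = FALSE] + events[j, , drop = FALSE]

# Transform real and simulated events together (cofactor 5).
augmented <- asinh(rbind(events, simulated) / 5)
is_simulated <- c(rep(FALSE, n), rep(TRUE, n_sim))

# Brute-force pairwise distances; fine for a toy example only.
d <- as.matrix(dist(augmented))
diag(d) <- Inf  # an event is not its own neighbor
k <- 15
thresh <- 5

# For each real event, count simulated doublets among its k nearest neighbors.
n_sim_neighbors <- vapply(seq_len(n), function(r) {
  nn <- order(d[r, ])[seq_len(k)]
  sum(is_simulated[nn])
}, integer(1))

# Assumption: an event is flagged when the count reaches the threshold.
flagged <- n_sim_neighbors >= thresh
print(table(flagged))
```

Raising thresh makes the rule stricter, so fewer events are flagged; lowering it flags more events at the cost of more false positives.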
Given compatible classifications of singlets and doublets, this function computes expected proportions of doublets as the product of the proportions of their components.
compare_doublets_exp_obs(doublet_clas, singlet_clas, cleanet_res)
doublet_clas | An array giving a classification of the doublets, whose length must match the number of doublet events returned in cleanet_res. |
singlet_clas | An array giving a classification of the singlets, whose length must match the number of singlet events returned in cleanet_res. |
cleanet_res | The output of a call to the cleanet function. |
A data frame tabulating expected and observed proportions for each unique doublet type.
library(Cleanet)

# Load the example mass cytometry data shipped with the package.
path <- system.file("extdata", "df_mdipa.csv", package="Cleanet")
df_mdipa <- read.csv(path, check.names=FALSE)
cols <- c("CD45", "CD123", "CD19", "CD11c", "CD16", "CD56", "CD294", "CD14",
          "CD3", "CD20", "CD66b", "CD38", "HLA-DR", "CD45RA", "DNA1", "DNA2")

# Detect doublets, classify them, then compare expected and observed proportions.
cleanet_res <- cleanet(df_mdipa, cols, cofactor=5)
singlet_clas <- df_mdipa$label[which(cleanet_res$status!="Doublet")]
doublet_clas <- classify_doublets(cleanet_res, singlet_clas)
df_exp_obs <- compare_doublets_exp_obs(doublet_clas, singlet_clas, cleanet_res)
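The expected-proportion rule stated in the description can be worked through by hand. The base R sketch below uses hypothetical singlet labels and a hypothetical helper, expected_doublet. It takes the plain product of component proportions and ignores any combinatorial factor or normalization that the package may apply.

```r
# Hypothetical singlet labels.
singlet_clas <- c("T", "T", "T", "B", "B", "Mono")

# Proportion of each singlet type.
p <- prop.table(table(singlet_clas))

# Expected proportion of a doublet type: product of its components' proportions.
expected_doublet <- function(a, b) as.numeric(p[a]) * as.numeric(p[b])

exp_T_B <- expected_doublet("T", "B")        # 3/6 * 2/6
exp_T_Mono <- expected_doublet("T", "Mono")  # 3/6 * 1/6
print(c(T_B = exp_T_B, T_Mono = exp_T_Mono))
```

Doublet types observed far more often than this baseline are candidates for genuine cell-cell interactions rather than random coincidences.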
Detect events that lie close to the origin in protein expression space. This function aims for high specificity, but not high sensitivity: for Cleanet's purposes, it suffices to deplete debris, even if not all of it is eliminated.
filter_debris_cytof(df, cols, cols_plot = c("DNA1", "CD45"), cofactor = 5, threshold = 0.3)
df | A data frame containing protein expression data. |
cols | Columns to use in analysis. It is recommended to use the same ones in the call to cleanet. |
cols_plot | Two columns that are used for visual feedback. |
cofactor | Parameter for arcsinh transformation used before computing distances. 5 is a good default for mass cytometry data. |
threshold | Number between 0 and 1; distances are scaled between 0 and 1 and events whose distance to the origin is smaller than the threshold are flagged. |
A binary array with the same length as the number of rows in df. TRUE for debris, FALSE for everything else.
library(Cleanet)

# Load the example mass cytometry data shipped with the package.
path <- system.file("extdata", "df_mdipa.csv", package="Cleanet")
df_mdipa <- read.csv(path, check.names=FALSE)
cols <- c("CD45", "CD123", "CD19", "CD11c", "CD16", "CD56", "CD294", "CD14",
          "CD3", "CD20", "CD66b", "CD38", "HLA-DR", "CD45RA", "DNA1", "DNA2")

# Flag events close to the origin as debris.
is_debris <- filter_debris_cytof(df_mdipa, cols)
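The thresholding rule described for filter_debris_cytof can be illustrated in base R. This is a sketch under assumptions, not the package's code: it uses Euclidean distance to the origin after asinh(x / cofactor) and min-max scaling to map distances onto the interval from 0 to 1. The four events are hypothetical.

```r
# Hypothetical raw intensities: rows are events, columns are channels.
raw <- matrix(
  c(0.5,   1,    # near the origin
    200, 150,
    400, 300,
    2,   0.5),   # near the origin
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("DNA1", "CD45"))
)

cofactor <- 5
threshold <- 0.3

# arcsinh transform, then Euclidean distance of each event to the origin.
transformed <- asinh(raw / cofactor)
dist_origin <- sqrt(rowSums(transformed^2))

# Assumption: min-max scaling maps distances onto [0, 1].
scaled <- (dist_origin - min(dist_origin)) / (max(dist_origin) - min(dist_origin))

# Events whose scaled distance is below the threshold are flagged as debris.
is_debris <- scaled < threshold
print(is_debris)
```

Because the scaling is relative to the data, the same threshold can flag different absolute intensities in different samples.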
Detect events in the lower left corner of FSC-A/SSC-A plots. This function aims for high specificity, but not high sensitivity: for Cleanet's purposes, it suffices to deplete debris, even if not all of it is eliminated.
filter_debris_flow(df, sum_max = 50000, cols = c("FSC-A", "SSC-A"))
df | A data frame containing scattering channels and protein expression data. |
sum_max | Numeric; events whose sum of FSC-A and SSC-A is smaller than this value are flagged. |
cols | Names of columns to use. This function is intended for use with the area channel of forward and side scatter. |
A binary array with the same length as the number of rows in df. TRUE for debris, FALSE for everything else.
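The rule documented for sum_max can be reproduced in a few lines of base R. The sketch below uses a hypothetical data frame of scatter values and is meant only to illustrate the rule, not to replace the function.

```r
# Hypothetical scatter values for five events.
df <- data.frame(
  "FSC-A" = c(12000, 90000, 30000, 150000, 20000),
  "SSC-A" = c(8000, 60000, 25000, 80000, 20000),
  check.names = FALSE  # keep the hyphenated column names as they are
)
sum_max <- 50000
cols <- c("FSC-A", "SSC-A")

# Flag events whose summed forward and side scatter falls below sum_max.
is_debris <- rowSums(df[, cols]) < sum_max
print(unname(is_debris))
```

A vector of this shape, TRUE for debris and one entry per row of df, is what the is_debris argument of cleanet expects.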