| Title: | Find Neighbor Species of a Bacteria of Interest in the Human Gut Microbiota |
|---|---|
| Description: | Implementation of the local approach described in Sola et al., 2026 <doi:10.64898/2025.12.05.692507> to identify companion species of a bacterium of interest. From several abundance tables of metagenomic data, 'NeighborFinder' suggests a shortlist of companion species by integrating the results across tables. The results can be visualized as a network. |
| Authors: | Mathilde Sola [aut, cre] (ORCID: <https://orcid.org/0009-0009-0436-5078>), Mahendra Mariadassou [aut] (ORCID: <https://orcid.org/0000-0003-2986-354X>), Magali Berland [aut] (ORCID: <https://orcid.org/0000-0002-6762-5350>) |
| Maintainer: | Mathilde Sola <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-06-30 16:49:30 UTC |
| Source: | https://github.com/cran/NeighborFinder |
Apply NeighborFinder on raw data
apply_NeighborFinder( data_with_annotation, object_of_interest, col_module_id, annotation_level, prev_level = 0.3, filtering_top = 20, .seed = NULL, ... )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module name (e.g. species), the second the module ID (e.g. msp), and each subsequent column a sample.
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition.
col_module_id |
String. The name of the column containing the module IDs in 'data_with_annotation' (e.g. "msp_id").
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
prev_level |
Numeric. The prevalence level to be studied, given as a decimal (e.g. 0.20 for 20% prevalence).
filtering_top |
Numeric. The top filtering percentage to be studied, given as a whole number (e.g. 10 for the top 10%).
.seed |
Integer or NULL. Top-level RNG seed from which the seeds of the inner loop are generated; use NULL if reproducibility is not required.
... |
Additional arguments passed on to |
Dataframe. For each module ID matching 'object_of_interest', the names of its neighbors and the corresponding coefficients computed by cv.glmnet().
data(data)
res_CRC_JPN <- apply_NeighborFinder(
  data$CRC_JPN[, 1:100],
  object_of_interest = "Escherichia coli",
  col_module_id = "msp_id",
  annotation_level = "species",
  .seed = 123
)
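The '.seed' argument is documented as a top-level seed that drives the seeds of an inner loop. The base-R sketch below illustrates that two-level pattern under stated assumptions: the helper `run_inner_loop` is hypothetical, inner seeds are drawn with sample.int(), and runif() stands in for a stochastic inner step such as cross-validation. It is not the package's code.

```r
# Hypothetical helper illustrating a two-level seeding scheme:
# one top-level seed generates one seed per inner iteration.
run_inner_loop <- function(n_iter, .seed = NULL) {
  if (!is.null(.seed)) set.seed(.seed)
  # Draw one integer seed per inner iteration from the top-level stream
  inner_seeds <- sample.int(.Machine$integer.max, n_iter)
  vapply(seq_len(n_iter), function(i) {
    set.seed(inner_seeds[i])  # each iteration gets its own reproducible seed
    stats::runif(1)           # stand-in for a stochastic step (assumption)
  }, numeric(1))
}

a <- run_inner_loop(3, .seed = 123)
b <- run_inner_loop(3, .seed = 123)
identical(a, b)  # TRUE: the same top-level seed reproduces every inner draw
```

Drawing all inner seeds up front means each iteration's result depends only on the top-level seed, not on how many random numbers earlier iterations consumed.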
Apply NeighborFinder simplest version on raw data
apply_NF_simple( data_with_annotation, object_of_interest, col_module_id, annotation_level, prev_level = 0.3, filtering_top = 20, seed = NULL, ... )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module name (e.g. species), the second the module ID (e.g. msp), and each subsequent column a sample.
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition.
col_module_id |
String. The name of the column containing the module IDs in 'data_with_annotation' (e.g. "msp_id").
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
prev_level |
Numeric. The prevalence level to be studied, given as a decimal (e.g. 0.20 for 20% prevalence).
filtering_top |
Numeric. The top filtering percentage to be studied, given as a whole number (e.g. 10 for the top 10%).
seed |
Numeric. The seed number, ensuring reproducibility |
... |
Additional arguments passed on to |
Dataframe. For each module ID matching 'object_of_interest', the names of its neighbors and the corresponding coefficients computed by cv.glmnet().
data(data)
res_CRC_JPN <- apply_NF_simple(
  data$CRC_JPN[, 1:100],
  object_of_interest = "Escherichia coli",
  col_module_id = "msp_id",
  annotation_level = "species",
  seed = 20242025
)
Render a table to guide the choice of the prevalence level and the top filtering percentage
choose_params_values( data_with_annotation, object_of_interest, sample_size, prev_list = c(0.2, 0.3, 0.4), filtering_list = c(10, 20, 30), graph_file = NULL, col_module_id, annotation_level, seed = NULL )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module name (e.g. species), the second the module ID (e.g. msp), and each subsequent column a sample.
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition.
sample_size |
Numeric. Number of samples in each dataset. |
prev_list |
Numeric vector. The prevalence levels to be studied, given as decimals (e.g. 0.20 for 20% prevalence).
filtering_list |
Numeric vector. The top filtering percentages to be studied, given as whole numbers (e.g. 10 for the top 10%).
graph_file |
Dataframe. The object generated by the graph_step() function.
col_module_id |
String. The name of the column containing the module IDs in 'data_with_annotation' (e.g. "msp_id").
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
seed |
Numeric. The seed number, ensuring reproducibility |
Dataframe. The F1 scores obtained before and after applying apply_NeighborFinder()
data(data)
data(graphs)
choose_params_values(
  data_with_annotation = data$CRC_JPN,
  object_of_interest = "Escherichia coli",
  sample_size = 100,
  prev_list = c(0.20, 0.30),
  filtering_list = c(10, 20),
  graph_file = graphs$CRC_JPN,
  col_module_id = "msp_id",
  annotation_level = "species",
  seed = 123
)
Compute precision rate
compute_precision(true, detected)
true |
Character vector. The true neighbors
detected |
Character vector. The detected neighbors
Numeric. The precision rate, i.e. the proportion of detected neighbors that are true neighbors
compute_precision(c("a"), c("a", "b", "c"))
compute_precision(c("a", "b"), c("a", "c"))
Compute recall rate
compute_recall(true, detected)
true |
Character vector. The true neighbors
detected |
Character vector. The detected neighbors
Numeric. The recall rate, i.e. the proportion of true neighbors that are detected
compute_recall(c("a"), c("a", "b", "c"))
compute_recall(c("a", "b"), c("a", "c"))
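Precision and recall compare detected neighbors with true ones: precision is the share of detected neighbors that are true, recall the share of true neighbors that are detected, and the F1 score reported by choose_params_values() is their harmonic mean. The base-R sketch below uses these standard definitions; the `_sketch` function names are hypothetical, and returning 0 for empty inputs is an assumption that may differ from the package.

```r
# Standard set-based definitions (hypothetical helpers, not the package code)
precision_sketch <- function(true, detected) {
  detected <- unique(detected)
  if (length(detected) == 0) return(0)  # assumption: no detections -> 0
  length(intersect(true, detected)) / length(detected)
}

recall_sketch <- function(true, detected) {
  true <- unique(true)
  if (length(true) == 0) return(0)      # assumption: no true neighbors -> 0
  length(intersect(true, detected)) / length(true)
}

f1_sketch <- function(true, detected) {
  p <- precision_sketch(true, detected)
  r <- recall_sketch(true, detected)
  if (p + r == 0) return(0)
  2 * p * r / (p + r)                   # harmonic mean of precision and recall
}

precision_sketch(c("a"), c("a", "b", "c"))  # 1/3
recall_sketch(c("a", "b"), c("a", "c"))     # 0.5
f1_sketch(c("a", "b"), c("a", "c"))         # 0.5
```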
Apply cv.glmnet() for a list of module IDs and for each prevalence level
cvglm_to_coeffs_by_object( list_dfs, test_module = identify_module(), seed = NULL, ... )
list_dfs |
List of dataframes. Normalized dataframes, e.g. as returned by norm_data()
test_module |
Character vector. The module IDs to test
seed |
Numeric. The seed number, ensuring reproducibility |
... |
Additional arguments passed on to |
Dataframe. For each tested module ID, its detected neighbors and the corresponding coefficients
data(data)
data(metadata)
# Simple example
normed_JPN <- norm_data(
  data$CRC_JPN,
  col_module_id = "msp_id",
  annotation_level = "species",
  prev_list = c(0.25, 0.30)
)
neighbors_JPN <- cvglm_to_coeffs_by_object(
  list_dfs = normed_JPN,
  test_module = c("msp_0030", "msp_0345"),
  seed = 20242025
)
# Example with covariate
# normed_CHN <- norm_data(
#   data$CRC_CHN,
#   col_module_id = "msp_id",
#   annotation_level = "species",
#   prev_list = c(0.25, 0.30)
# )
# neighbors_CHN <- cvglm_to_coeffs_by_object(
#   list_dfs = normed_CHN,
#   test_module = c("msp_0030", "msp_0345"),
#   seed = 20242025,
#   covar = ~study_accession,
#   meta_df = metadata$CRC_CHN,
#   sample_col = "secondary_sample_accession"
# )
A list of dataframes corresponding to abundance tables merged with taxonomic information.
dataframe with only Japanese patients diagnosed with colorectal cancer
dataframe with only Chinese patients diagnosed with colorectal cancer
dataframe with only European patients diagnosed with colorectal cancer
dataframe with patients diagnosed with colorectal cancer from the three countries above
data
An object of class list of length 4.
https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/7IVO3E
Gather the lists of true neighbors (from the graph) and detected neighbors (from cv.glmnet())
final_step(df_truth, df_glm, robustness_step = NULL)
df_truth |
Dataframe. The output of truth_by_prevalence()
df_glm |
Dataframe. The output of cvglm_to_coeffs_by_object()
robustness_step |
Boolean. When TRUE, filtering_top differs from 100%; when FALSE, the results from the naïve method are used.
Dataframe. For each prevalence level and module ID, the list of true and/or detected neighbors and the corresponding list of coefficients
# Dataframe with true neighbors
df_true <- list(
  tibble::tibble(
    node1 = c("msp_1", "msp_1", "msp_2", "msp_3"),
    node2 = c("msp_55", "msp_20", "msp_3", "msp_18"),
    prev1 = c(0.28, 0.28, 0.96, 0.75),
    prev2 = c(0.76, 0.25, 0.75, 0.60)
  ),
  tibble::tibble(
    node1 = c("msp_2", "msp_3"),
    node2 = c("msp_3", "msp_18"),
    prev1 = c(0.96, 0.75),
    prev2 = c(0.75, 0.60)
  )
) %>%
  rlang::set_names(c("0.20", "0.30"))
# Dataframe with detected neighbors
df_detected <- list(
  tibble::tibble(
    prev_level = c("0.20", "0.30", "0.30", "0.30"),
    node1 = c("msp_2", "msp_2", "msp_3", "msp_3"),
    node2 = c("msp_3", "msp_3", "msp_18", "msp_8"),
    coef = c(0.406, -0.025, 0.160, 0.005),
    filtering_top = c(100, 100, 100, 100)
  ),
  tibble::tibble()
) %>%
  rlang::set_names(c("0.20", "0.30"))
# Use final_step() to gather both
neighbors <- final_step(df_true, df_detected, robustness_step = FALSE)
Apply cv.glmnet() for a list of module IDs
find_all_module_neighbors(df, test_module, seed = NULL, ...)
df |
Dataframe. A normalized dataframe, e.g. one element of the list returned by norm_data()
test_module |
Character vector. The module IDs to test
seed |
Numeric. The seed number, ensuring reproducibility |
... |
Additional arguments passed on to |
Dataframe. For each tested module ID, its detected neighbors and the corresponding coefficients
data(data)
data(metadata)
# Simple example
x <- norm_data(data$CRC_JPN, 0.30, annotation_level = "species")[[1]]
neighbors_JPN <- find_all_module_neighbors(
  df = x,
  test_module = c("msp_0030", "msp_0345"),
  seed = 20242025
)
# Example with covariate
# x <- norm_data(data$CRC_CHN, 0.30, annotation_level = "species")[[1]]
# neighbors_CHN <- find_all_module_neighbors(
#   df = x,
#   test_module = c("msp_0030", "msp_0345"),
#   seed = 20242025,
#   covar = ~study_accession,
#   meta_df = metadata$CRC_CHN,
#   sample_col = "secondary_sample_accession"
# )
Apply cv.glmnet() for a given module ID
find_module_neighbors( df, module, seed = NULL, covar = NULL, meta_df = NULL, sample_col = NULL )
df |
Dataframe. A normalized dataframe, e.g. one element of the list returned by norm_data()
module |
String. The module ID
seed |
Numeric. The seed number, ensuring reproducibility |
covar |
String or formula. A formula, or the name of the covariate column in the metadata table. Note that "study_accession" is equivalent to ~study_accession.
meta_df |
Dataframe. The metadata table
sample_col |
String. The name of the column in 'meta_df' holding the sample names; these should match the column names of 'df'
Dataframe. The module ID, its detected neighbors and the corresponding coefficients
data(data)
data(metadata)
# Simple example
x <- norm_data(data$CRC_JPN, 0.30)[[1]]
neighbors_JPN <- find_module_neighbors(
  df = x,
  module = "msp_0030",
  seed = 20242025
)
# Example with covariate
# x <- norm_data(data$CRC_CHN, 0.30)[[1]]
# neighbors_CHN <- find_module_neighbors(
#   df = x,
#   module = "msp_0030",
#   seed = 20242025,
#   covar = ~study_accession,
#   meta_df = metadata$CRC_CHN,
#   sample_col = "secondary_sample_accession"
# )
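find_module_neighbors() regresses one module on the others with cv.glmnet() and reports the non-zero coefficients as neighbors. The base-R sketch below shows the underlying idea with a plain lasso solved by cyclic coordinate descent at a single, fixed penalty `lambda`. It deliberately omits the cross-validation that cv.glmnet() performs, assumes samples are rows and modules are columns (which may not match the package's normalized tables), and the name `lasso_neighbors_sketch` and its defaults are hypothetical.

```r
# Lasso by cyclic coordinate descent at a fixed lambda (no cross-validation).
# Objective: (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1, on standardized X.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

lasso_neighbors_sketch <- function(df, module, lambda = 0.1, n_iter = 200) {
  y <- df[, module]
  y <- y - mean(y)                                   # centre the response
  X <- scale(df[, setdiff(colnames(df), module), drop = FALSE])
  n <- nrow(X)
  b <- numeric(ncol(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]     # partial residual without j
      z <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  names(b) <- colnames(X)
  b[b != 0]  # "neighbors" = predictors with non-zero coefficients
}

# Toy data: the target depends on m1 only
set.seed(1)
n <- 200
df <- data.frame(m1 = rnorm(n), m2 = rnorm(n), m3 = rnorm(n))
df$target <- 2 * df$m1 + rnorm(n, sd = 0.1)
nb <- lasso_neighbors_sketch(df, "target", lambda = 0.5)
names(nb)  # "m1"
```

Cross-validation would choose `lambda` from the data instead of fixing it; the selection rule (keep non-zero coefficients) stays the same.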
Convert an abundance table to a count table, with a prevalence filter
get_count_table( abund.path = NULL, abund.table = NULL, sample.id = NULL, prev.min, verbatim = TRUE, msp = NULL )
abund.path |
String. Path to the abundance table |
abund.table |
Dataframe. Abundance table whose first column holds the bacterial species names
sample.id |
String vector. IDs of samples to keep in the final table |
prev.min |
Numeric, between 0 and 1. The minimal prevalence a bacterial species must reach to be kept in the final table
verbatim |
Boolean. Controls verbosity |
msp |
String vector. Bacterial species names, to supply when they are not given in the first column of the abundance table
A list containing
data: |
the final count table (tibble) |
prevalences: |
a tibble gathering the prevalence of each bacterial species |
This function is adapted from the function of the same name in the OneNet package (version 0.3.1), which is licensed under the MIT License. Original copyright (c) 2021-2024 INRAE.
The MIT License text for the original package is as follows:
---
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
tiny_data <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
# Apply a prevalence filter of 30% to build the count table
count_table <- get_count_table(
  abund.table = tiny_data,
  sample.id = colnames(tiny_data),
  prev.min = 0.3
)
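The prevalence filter keeps species that are present in at least a fraction prev.min of the samples. A minimal base-R sketch of that filtering step, assuming prevalence means the share of samples with a non-zero abundance and that the threshold is inclusive; the helper `prevalence_filter_sketch` is hypothetical and skips the count conversion that get_count_table() also performs.

```r
# Hypothetical helper: prevalence filter on a species-by-sample table
prevalence_filter_sketch <- function(abund, prev.min) {
  values <- as.matrix(abund[, -1])    # drop the species-name column
  prevalence <- rowMeans(values > 0)  # share of samples where the species is present
  keep <- prevalence >= prev.min      # assumption: inclusive threshold
  list(
    data = abund[keep, , drop = FALSE],
    prevalences = data.frame(species = abund[[1]], prevalence = unname(prevalence))
  )
}

tiny_data <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
res <- prevalence_filter_sketch(tiny_data, prev.min = 0.3)
res$data$msp_name  # "msp_2" "msp_3" "msp_4" (msp_1 is present in 1 of 4 samples)
```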
Generate a graph with a "cluster-like" structure, only needed for simulation purposes
graph_step( data_with_annotation, col_module_id, annotation_level, seed = 10010, data_type = "shotgun", ... )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module name (e.g. species), the second the module ID (e.g. msp), and each subsequent column a sample.
col_module_id |
String. The name of the column containing the module IDs in 'data_with_annotation' (e.g. "msp_name").
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
seed |
Numeric. Seed number for data generation (new_synth_data) |
data_type |
String. Type of data: "16S" enables the treatment of 16S data; the default is "shotgun"
... |
Additional arguments passed on to the |
Dataframe of 0s and 1s, where 1 indicates an edge between two nodes of the graph and 0 its absence.
tiny_data <- data.frame(
  species = c(
    "One bacteria", "One bacterium L", "One bacterium G",
    "Two bact", "Three bact A", "Three bact B"
  ),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4", "msp_5", "msp_6"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07, 0, 0),
  SAMPLE2 = c(1.251707e-07, 0, 3.985320e-07, 0, 1.33607e-04, 0.8675e-03),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06, 0, 0.662e-03),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0, 1.275e-04, 0),
  SAMPLE5 = c(0.0976, 0.9862, 2.98320e-03, 0, 3.9754e-03, 0),
  SAMPLE6 = c(0.26417e-06, 0, 1.0077e-05, 3.983320e-08, 0, 0)
)
tiny_graph <- graph_step(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  seed = 20242025
) %>%
  suppressWarnings()
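graph_step() returns a 0/1 adjacency structure with a cluster-like layout, and new_synth_data() describes `k` groups with `r` controlling the within/between connection ratio. The base-R sketch below draws such a graph under stated assumptions: nodes are split cyclically into `k` groups, the within-group edge probability is `r` times the between-group probability `p_out` (a hypothetical free parameter), and the helper name is hypothetical. It is not the EMtree or OneNet generator.

```r
# Hypothetical cluster-graph generator (not the EMtree/OneNet implementation)
cluster_graph_sketch <- function(p, k = 3, r = 50, p_out = 0.005, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p_in <- min(1, r * p_out)             # within-group probability, capped at 1
  groups <- rep_len(seq_len(k), p)      # assign nodes to k groups in turn
  same <- outer(groups, groups, "==")   # TRUE when two nodes share a group
  probs <- ifelse(same, p_in, p_out)
  upper <- upper.tri(probs)
  adj <- matrix(0L, p, p)
  adj[upper] <- as.integer(stats::runif(sum(upper)) < probs[upper])
  adj <- adj + t(adj)                   # symmetrize; the diagonal stays 0
  ids <- paste0("msp_", seq_len(p))
  dimnames(adj) <- list(ids, ids)
  adj
}

g <- cluster_graph_sketch(p = 30, k = 3, seed = 1)
```

With `r = 50`, edges concentrate inside groups, which is what gives the graph its cluster-like structure.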
graphs
graphs
A list of dataframes corresponding to graphs based on synthetic data.
graph corresponding to data with only Japanese patients diagnosed with colorectal cancer
graph corresponding to data gathering Japanese, Chinese and European patients diagnosed with colorectal cancer
List the modules corresponding to a given object of interest
identify_module( object_of_interest, annotation_table, col_module_id, annotation_level = "species" )
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition
annotation_table |
Dataframe. The dataframe gathering the taxonomic or functional module correspondence information |
col_module_id |
String. The name of the column containing the module IDs in the annotation table
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1. Default value is set to the species level |
Character vector. The module IDs corresponding to the object of interest
df_taxo <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  genus = c("One", "One", "One", "Two"),
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact")
)
identify_module(
  object_of_interest = "bacterium",
  annotation_table = df_taxo,
  col_module_id = "msp_name",
  annotation_level = "species"
)
identify_module(
  object_of_interest = "One",
  annotation_table = df_taxo,
  col_module_id = "msp_name",
  annotation_level = "species"
)
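The examples above suggest a keyword-style, partial match: "bacterium" picks out two species, "One" picks out three. A base-R sketch of such a lookup, assuming a case-sensitive fixed-string match with grepl(); the package may match differently, and `identify_module_sketch` is a hypothetical name.

```r
# Hypothetical keyword lookup: module IDs whose annotation contains the keyword
identify_module_sketch <- function(object_of_interest, annotation_table,
                                   col_module_id, annotation_level = "species") {
  hits <- grepl(object_of_interest, annotation_table[[annotation_level]],
                fixed = TRUE)  # assumption: literal substring match
  annotation_table[[col_module_id]][hits]
}

df_taxo <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  genus = c("One", "One", "One", "Two"),
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact")
)
identify_module_sketch("bacterium", df_taxo, "msp_name")  # "msp_2" "msp_3"
identify_module_sketch("One", df_taxo, "msp_name")        # "msp_1" "msp_2" "msp_3"
```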
Display the intersection network from 2 or more datasets
intersections_network( res_list, threshold, annotation_table, col_module_id, annotation_level, object_of_interest, annotation_option = FALSE, node_size = 12, label_size = 4, edge_label_size = 2, object_color = "cadetblue2", seed = NULL )
res_list |
List of dataframes. The results from apply_NeighborFinder() on several datasets |
threshold |
Integer. The minimum number of datasets in which a neighbor must have been found
annotation_table |
Dataframe. The dataframe gathering the taxonomic or functional module correspondence information |
col_module_id |
String. The name of the column containing the module IDs in 'annotation_table'
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition
annotation_option |
Boolean. Default is FALSE. If TRUE, nodes are labelled with module names instead of module IDs
node_size |
Numeric. Size of the nodes
label_size |
Numeric. Size of the labels
edge_label_size |
Numeric. Size of the edge labels
object_color |
String. Color used to distinguish the nodes corresponding to 'object_of_interest' from the other module IDs
seed |
Numeric. The seed number, ensuring reproducibility |
Network. Visualization of NeighborFinder results from several datasets. Edge color encodes the mean coefficient sign: green if positive, red if negative.
data(taxo)
data(result_example)
intersections_network(
  res_list = list(
    result_example$res_CRC_JPN,
    result_example$res_CRC_CHN,
    result_example$res_CRC_EUR
  ),
  annotation_table = taxo,
  threshold = 2,
  object_of_interest = "Escherichia coli",
  col_module_id = "msp_id",
  annotation_level = "species",
  label_size = 7,
  edge_label_size = 4,
  node_size = 15,
  annotation_option = TRUE,
  seed = 3
)
Display the intersection table summarizing the results from 2 or more datasets
intersections_table( res_list, threshold, annotation_table, col_module_id, annotation_level, object_of_interest )
res_list |
List of dataframes. The results from apply_NeighborFinder() on several datasets |
threshold |
Integer. The minimum number of datasets in which a neighbor must have been found
annotation_table |
Dataframe. The dataframe gathering the taxonomic or functional module correspondence information |
col_module_id |
String. The name of the column containing the module IDs in 'annotation_table'
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
object_of_interest |
String. The name of the bacterium or species of interest, or a keyword from the functional module definition
Dataframe. The intersection of NeighborFinder results across several datasets. The column 'datasets' lists the datasets in which each neighbor was found, and the column 'intersections' gives the number of those datasets.
data(taxo)
data(result_example)
intersections_table(
  res_list = list(
    result_example$res_CRC_JPN,
    result_example$res_CRC_CHN,
    result_example$res_CRC_EUR
  ),
  threshold = 2,
  annotation_table = taxo,
  col_module_id = "msp_id",
  annotation_level = "species",
  object_of_interest = "Escherichia coli"
)
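The 'datasets' and 'intersections' columns come from counting, for each neighbor, the datasets in which it appears, then keeping neighbors found in at least 'threshold' of them. A base-R sketch of that counting step, assuming the inputs are reduced to a named list of neighbor ID vectors (the real inputs are apply_NeighborFinder() result dataframes); the helper name and the IDs are hypothetical.

```r
# Hypothetical counting step behind an intersection table
intersections_sketch <- function(neighbor_sets, threshold) {
  all_neighbors <- sort(unique(unlist(neighbor_sets)))
  # Membership matrix: one row per neighbor, one column per dataset
  hits <- matrix(
    vapply(neighbor_sets, function(s) all_neighbors %in% s,
           logical(length(all_neighbors))),
    nrow = length(all_neighbors),
    dimnames = list(all_neighbors, names(neighbor_sets))
  )
  out <- data.frame(
    neighbor = all_neighbors,
    datasets = unname(apply(hits, 1, function(h) paste(colnames(hits)[h], collapse = ", "))),
    intersections = unname(rowSums(hits)),
    row.names = NULL
  )
  out[out$intersections >= threshold, , drop = FALSE]
}

# Illustrative neighbor sets (hypothetical IDs), one per dataset
sets <- list(
  JPN = c("msp_1", "msp_2", "msp_4"),
  CHN = c("msp_1", "msp_2"),
  EUR = c("msp_1", "msp_3")
)
res <- intersections_sketch(sets, threshold = 2)
res  # msp_1 found in JPN, CHN, EUR (3); msp_2 found in JPN, CHN (2)
```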
Modified centered log-ratio (mclr) transformation, taken from the SPRING package
mclr(dat, base = exp(1), tol = 1e-16, eps = NULL, atleast = 1)
dat |
Raw count data or compositional data (n by p); either can be used.
base |
Base of the logarithm; exp(1) gives the natural log.
tol |
Tolerance for detecting zeros: values below 'tol' are treated as zeros.
eps |
Epsilon in Eq. (2) of Yoon, Gaynanova and Müller (2019), Frontiers in Genetics: a positive shift applied to all non-zero compositions. When NULL, eps is the absolute value of the minimum log-ratio plus c (see 'atleast'). Refer to the paper for more details.
atleast |
Constant c ensuring that all non-zero values are strictly positive. Default is 1.
This function implements the mclr normalization introduced in Yoon G, Gaynanova I and Müller CL (2019) Microbial Networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data. Front. Genet. 10:516. doi:10.3389/fgene.2019.00516
A data matrix with the same dimensions as the input data matrix.
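Following the argument descriptions above, mclr takes the log of each sample's non-zero values, centres them by their mean, and then shifts every non-zero entry by a common constant (the absolute minimum plus 'atleast') so that zeros stay zero and all other values are strictly positive. A simplified base-R sketch of the default path (eps = NULL), assuming samples are rows as in the n-by-p convention; it is an illustration, not the SPRING source.

```r
# Simplified mclr (default eps = NULL path); rows are samples, columns are features
mclr_sketch <- function(dat, base = exp(1), tol = 1e-16, atleast = 1) {
  dat <- as.matrix(dat)
  nzero <- dat >= tol                          # non-zero entries
  LOG <- ifelse(nzero, log(dat, base), 0)      # log of non-zero values only
  centre <- rowSums(LOG) / rowSums(nzero)      # mean log of each row's non-zero part
  clr <- ifelse(nzero, LOG - centre, 0)        # centre recycles down each column, row-wise
  shift <- if (min(clr) < 0) abs(min(clr)) + atleast else 0
  ifelse(nzero, clr + shift, 0)                # zeros stay zero
}

x <- matrix(c(0, 2, 8,
              1, 0, 4), nrow = 2, byrow = TRUE)
out <- mclr_sketch(x)
out  # row 1: 0, 1, 1 + 2*log(2); row 2: 1, 0, 1 + 2*log(2)
```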
A list of dataframes corresponding to metadata, giving more information on each patient.
dataframe with only Japanese patients diagnosed with colorectal cancer
dataframe with only Chinese patients diagnosed with colorectal cancer
dataframe with only European patients diagnosed with colorectal cancer
dataframe with patients diagnosed with colorectal cancer from the three countries above
metadata
An object of class list of length 4.
https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/7IVO3E
Correspondence between the module ID (msp or functional module) and its name (bacteria or function)
module_to_node( module, annotation_table, col_module_id = "msp_name", annotation_level )
module |
String or vector of strings. The ID(s) of the biological object(s) (msp or functional module)
annotation_table |
Dataframe. The dataframe gathering the taxonomic or functional module correspondence information |
col_module_id |
String. The name of the column containing the module IDs in the annotation table
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
Dictionary. The name(s) of the module(s), for one or several module IDs
df_taxo <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  genus = c("A", "B", "C", "D"),
  species = c("A a", "B a", "C c", "D b")
)
# Correspondence for one specific msp
module_to_node(
  "msp_1",
  annotation_table = df_taxo,
  col_module_id = "msp_name",
  annotation_level = "species"
)
# or for several msps
module_to_node(
  c("msp_1", "msp_3", "msp_4"),
  annotation_table = df_taxo,
  col_module_id = "msp_name",
  annotation_level = "species"
)
# and if one msp is repeated
module_to_node(
  c("msp_1", "msp_1", "msp_2"),
  annotation_table = df_taxo,
  col_module_id = "msp_name",
  annotation_level = "genus"
)
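The ID-to-name correspondence, repeated IDs included, can be expressed as a positional lookup with match(). The base-R sketch below returns a named character vector as a stand-in for the documented 'Dictionary'; the helper name and the return type are assumptions, not the package's implementation.

```r
# Hypothetical ID-to-name lookup; unknown IDs map to NA
module_to_node_sketch <- function(module, annotation_table,
                                  col_module_id = "msp_name", annotation_level) {
  idx <- match(module, annotation_table[[col_module_id]])
  stats::setNames(annotation_table[[annotation_level]][idx], module)
}

df_taxo <- data.frame(
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  genus = c("A", "B", "C", "D"),
  species = c("A a", "B a", "C c", "D b")
)
res <- module_to_node_sketch(c("msp_1", "msp_1", "msp_2"), df_taxo,
                             annotation_level = "genus")
res  # msp_1 = "A", msp_1 = "A", msp_2 = "B"
```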
Simulate data from some empirical count dataset with a "cluster-like" structure
new_synth_data( real_data, graph_type = "cluster", must_connect = TRUE, graph = NULL, n = 300, seed = NULL, r = 50, dens = 4, k = 3, verbatim = TRUE, signed = FALSE )
real_data |
Matrix. Empirical count table |
graph_type |
String. Type of conditional dependency structure. Only "cluster" is kept here; see the EMtree package for more options
must_connect |
Boolean. TRUE to force the output graph to be connected |
graph |
Matrix. Optional graph to be used; it must have row and column names referencing all features of real_data
n |
Numeric. Number of samples to simulate |
seed |
Numeric. Seed number for data generation (rmvnorm) |
r |
Numeric. For cluster structures, controls the ratio of within-group to between-group connection probabilities
dens |
Numeric. Graph density (for cluster graphs) or edge probability (for Erdős-Rényi graphs)
k |
Numeric. For cluster structure, number of groups |
verbatim |
Boolean. Controls verbosity |
signed |
Boolean. TRUE to simulate both positive and negative partial correlations. Default is FALSE, which implies only negative partial correlations
List containing the simulated discrete counts, the corresponding true partial correlation matrix from the latent Gaussian layer of the model, and the original graph structure that was used
This function is adapted from the function of the same name in the OneNet package (version 0.3.1), which is licensed under the MIT License. Original copyright (c) 2021-2024 INRAE.
The MIT License text for the original package is as follows:
---
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
tiny_data <- data.frame(
  species = c(
    "One bacteria", "One bacterium L", "One bacterium G",
    "Two bact", "Three bact A", "Three bact B"
  ),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4", "msp_5", "msp_6"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07, 0, 0),
  SAMPLE2 = c(1.251707e-07, 0, 3.985320e-07, 0, 1.33607e-04, 0.8675e-03),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06, 0, 0.662e-03),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0, 1.275e-04, 0),
  SAMPLE5 = c(0.0976, 0.9862, 2.98320e-03, 0, 3.9754e-03, 0),
  SAMPLE6 = c(0.26417e-06, 0, 1.0077e-05, 3.983320e-08, 0, 0)
)
count_table <- get_count_table(
  abund.table = tiny_data %>% dplyr::select(-species),
  sample.id = colnames(tiny_data),
  prev.min = 0.1
)
tiny_graph <- graph_step(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  seed = 20242025
) %>% suppressWarnings()
sim_data <- new_synth_data(
  count_table$data,
  n = 50,
  graph = as.matrix(tiny_graph %>% dplyr::select(-species)),
  verbatim = FALSE,
  seed = 20242025
) %>% suppressWarnings()
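To make the `r` and `k` parameters of `new_synth_data()` more concrete, here is a minimal sketch of a cluster-structured random graph in which within-group edges are `r` times more likely than between-group edges. It is an illustration under stated assumptions, not the EMtree/OneNet generator: `sim_cluster_graph()` and `p_between` are hypothetical names, `dens` and `must_connect` are not modelled, and the real formula may differ.

```r
# Hypothetical sketch of a cluster-structured random graph; the exact
# formula used by EMtree/OneNet (including how dens is used) may differ.
sim_cluster_graph <- function(p, k, r, p_between, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  groups <- sample(seq_len(k), p, replace = TRUE)   # group label of each node
  p_within <- min(1, r * p_between)                  # within-group edges r times more likely
  same_group <- outer(groups, groups, "==")
  probs <- ifelse(same_group, p_within, p_between)   # p x p matrix of edge probabilities
  draws <- matrix(runif(p * p), p, p) < probs
  upper <- draws & upper.tri(probs)                  # keep one draw per node pair
  adjacency <- (upper | t(upper)) * 1                # symmetric 0/1 matrix, empty diagonal
  list(adjacency = adjacency, groups = groups)
}

g <- sim_cluster_graph(p = 30, k = 3, r = 50, p_between = 0.01, seed = 1)
```

Drawing only the upper triangle and mirroring it keeps the graph undirected, which is what a conditional dependency structure requires.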
Normalize data and filter it by prevalence level
norm_data( data_with_annotation, col_module_id, prev_list = c(0.3), annotation_level, data_type = "shotgun" )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module names (e.g. species), the second the module IDs (e.g. msp), and each subsequent column is a sample
col_module_id |
String. The name of the column with the module IDs in data_with_annotation
prev_list |
Vector of numeric. The prevalences to be studied. Required format is decimal: 0.20 for 20% of prevalence |
annotation_level |
String. Annotation level to aggregate the taxa |
data_type |
String. Enables the treatment of 16S data with "16S", default value is "shotgun" |
List of dataframes. Each element of the list is the normalized 'data_with_annotation' filtered at one level of prevalence
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
tiny_normed <- norm_data(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  prev_list = c(0.20, 0.30)
)
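The prevalence part of `norm_data()` can be illustrated with a short sketch, assuming prevalence means the share of samples in which a module has a non-zero abundance and that modules at or above each level are kept. `filter_by_prevalence()` is a hypothetical name; the normalization itself is not reproduced.

```r
# Hypothetical sketch of prevalence filtering only; norm_data() also
# normalizes abundances, which is not reproduced here.
filter_by_prevalence <- function(data_with_annotation, prev_list) {
  # Columns 1-2 hold the module name and ID; the rest are samples.
  abund <- as.matrix(data_with_annotation[, -(1:2)])
  prevalence <- rowMeans(abund > 0)                 # share of samples where the module is present
  res <- lapply(prev_list, function(level) {
    data_with_annotation[prevalence >= level, , drop = FALSE]
  })
  names(res) <- sprintf("%.2f", prev_list)          # e.g. "0.20", "0.30"
  res
}

tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0),
  stringsAsFactors = FALSE
)
tiny_filtered <- filter_by_prevalence(tiny_data, c(0.20, 0.30))
```

With this data, msp_1 is non-zero in 1 of 4 samples (prevalence 0.25), so it is kept at 0.20 and dropped at 0.30.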
Extract the edges of the graph that involve any module matching object_of_interest
prev_for_selected_nodes( data_with_annotation, graph_file, col_module_id, annotation_level, object_of_interest = NULL )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module names (e.g. species), the second the module IDs (e.g. msp), and each subsequent column is a sample
graph_file |
Dataframe. The object generated by graph_step() function |
col_module_id |
String. The name of the column with the module IDs in data_with_annotation
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
object_of_interest |
String. The name of the bacteria or species of interest or a key word in the functional module definition |
Dataframe. The dataframe of edges in the graph involving modules corresponding to object_of_interest and their corresponding prevalences.
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
tiny_graph <- graph_step(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  seed = 20242025
) %>% suppressWarnings()
tiny_truth <- prev_for_selected_nodes(
  tiny_data,
  tiny_graph,
  col_module_id = "msp_name",
  annotation_level = "species",
  object_of_interest = "bacterium"
)
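The selection done by `prev_for_selected_nodes()` (keep edges touching a module whose annotation contains the keyword, then attach both nodes' prevalences) can be sketched as follows. This is a hypothetical illustration: `edges_of_interest()` and the `node1`/`node2` edge list are assumptions, not necessarily the structure returned by `graph_step()`, and prevalence is assumed to be the share of non-zero samples.

```r
# Hypothetical sketch; the edge list with node1/node2 columns is an assumed
# format, not necessarily the structure returned by graph_step().
edges_of_interest <- function(edges, data_with_annotation, col_module_id,
                              annotation_level, object_of_interest) {
  ids_all <- as.character(data_with_annotation[[col_module_id]])
  # Prevalence: share of samples (columns 3 onward) with a non-zero abundance.
  prevalence <- rowMeans(as.matrix(data_with_annotation[, -(1:2)]) > 0)
  names(prevalence) <- ids_all
  # Modules whose annotation contains the keyword (literal substring match).
  hits <- grepl(object_of_interest, data_with_annotation[[annotation_level]], fixed = TRUE)
  ids <- ids_all[hits]
  out <- edges[edges$node1 %in% ids | edges$node2 %in% ids, , drop = FALSE]
  out$prev1 <- unname(prevalence[as.character(out$node1)])
  out$prev2 <- unname(prevalence[as.character(out$node2)])
  out
}

tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0),
  stringsAsFactors = FALSE
)
# Hypothetical toy edges, not the output of graph_step()
toy_edges <- data.frame(node1 = c("msp_1", "msp_2", "msp_1"),
                        node2 = c("msp_4", "msp_3", "msp_3"),
                        stringsAsFactors = FALSE)
res <- edges_of_interest(toy_edges, tiny_data, "msp_name", "species", "bacterium")
```

Here "bacterium" matches msp_2 and msp_3 but not "One bacteria", so the msp_1-msp_4 edge is dropped.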
result_example
result_example
A list of dataframes corresponding to apply_NeighborFinder() results.
Edge table of Escherichia coli neighbors, computed on Japanese patients diagnosed with colorectal cancer only
Edge table of Escherichia coli neighbors, computed on Chinese patients diagnosed with colorectal cancer only
Edge table of Escherichia coli neighbors, computed on European patients diagnosed with colorectal cancer only
List the simulated count tables by level of prevalence
simulate_by_prevalence( data_with_annotation, prev_list, graph_file = NULL, col_module_id, annotation_level, sample_size = 500, seed = NULL, verbatim = FALSE, data_type = "shotgun" )
data_with_annotation |
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module names (e.g. species), the second the module IDs (e.g. msp), and each subsequent column is a sample
prev_list |
List of numeric. The prevalences to be studied. Required format is decimal: 0.20 for 20% of prevalence. |
graph_file |
Dataframe. The object generated by graph_step() function |
col_module_id |
String. The name of the column with the module names in data_with_annotation |
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
sample_size |
Numeric. The sample size to be considered; a value of 500 is recommended
seed |
Numeric. The seed number, ensuring reproducibility |
verbatim |
Boolean. Controls verbosity |
data_type |
String. Enables the treatment of 16S data with "16S", default value is "shotgun" |
List of dataframes. Each element of the list corresponds to a level of prevalence and is a simulated abundance table
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
tiny_graph <- graph_step(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  seed = 20242025
) %>% suppressWarnings()
tiny_sims <- simulate_by_prevalence(
  tiny_data,
  prev_list = c(0.20, 0.30),
  graph_file = tiny_graph,
  col_module_id = "msp_name",
  annotation_level = "species",
  sample_size = 500,
  seed = 20242025
)
Simulate data: generates synthetic count data based on the empirical cumulative distribution function (ECDF) of real count data
simulate_from_ecdf(real_data, Sigma, n, seed = NULL, verbatim = FALSE)
real_data |
Matrix. Contains real count data of size n by p |
Sigma |
Matrix. Covariance structure of size p by p |
n |
Numeric. Number of samples |
seed |
Numeric. Seed number for data generation |
verbatim |
Boolean. If TRUE: iteration and index calculation for each step are printed out |
Matrix. The simulated synthetic count data
This function is adapted from the function of the same name in the OneNet package (version 0.3.1), which is licensed under the MIT License. Original copyright (c) 2021-2024 INRAE.
The MIT License text for the original package is as follows:
---
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
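The ECDF-based simulation described for `simulate_from_ecdf()` can be sketched with a Gaussian copula: draw correlated Gaussian vectors from `Sigma`, map them to uniforms with `pnorm()`, then invert each feature's empirical CDF. This is an assumption about the approach, not the package's code; `simulate_counts_ecdf()` is a hypothetical name and `verbatim` is omitted.

```r
# Sketch of ECDF-based simulation through a Gaussian copula; an assumption
# about the approach, not the code of simulate_from_ecdf().
simulate_counts_ecdf <- function(real_data, Sigma, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  p <- ncol(real_data)
  R <- chol(cov2cor(Sigma))                 # upper triangular, t(R) %*% R is the correlation matrix
  Z <- matrix(rnorm(n * p), n, p) %*% R     # n latent Gaussian vectors with that correlation
  U <- pnorm(Z)                             # standard normal margins -> uniform margins
  sim <- matrix(0, n, p, dimnames = list(NULL, colnames(real_data)))
  for (j in seq_len(p)) {
    # Type 1 quantiles invert the empirical CDF, so only observed values are returned.
    sim[, j] <- quantile(real_data[, j], probs = U[, j], type = 1, names = FALSE)
  }
  sim
}

set.seed(1)
real_counts <- matrix(rpois(200, lambda = 5), nrow = 50, ncol = 4)
Sigma <- matrix(0.5, 4, 4)
diag(Sigma) <- 1
sim <- simulate_counts_ecdf(real_counts, Sigma, n = 500, seed = 42)
```

Because each margin is taken from the observed data, the simulated counts keep the zero inflation and skew of the real table, while the dependence comes from `Sigma`.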
A dataframe with 2537 MSPs (rows) and 4 columns:
string
string
string
string indicating gut and/or oral
taxo
An object of class data.frame with 2537 rows and 4 columns.
https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/7IVO3E
Render a table gathering precision and recall rates before and after filtering on coefficient values
test_filter(df_before, df_after, prevs = NULL)
df_before |
Dataframe. The table of true neighbors
df_after |
Dataframe. The table of detected neighbors
prevs |
List of numeric. The prevalences to be studied. Required format is decimal: 0.20 for 20% of prevalence |
Dataframe. Returns the precision and recall rates before and after the modification
# Dataframe with true neighbors
list_true <- list(
  tibble::tibble(
    node1 = c("msp_1", "msp_1", "msp_2", "msp_3"),
    node2 = c("msp_55", "msp_20", "msp_3", "msp_18"),
    prev1 = c(0.28, 0.28, 0.96, 0.75),
    prev2 = c(0.76, 0.25, 0.75, 0.60)
  ),
  tibble::tibble(
    node1 = c("msp_2", "msp_3"),
    node2 = c("msp_3", "msp_18"),
    prev1 = c(0.96, 0.75),
    prev2 = c(0.75, 0.60)
  )
) %>% rlang::set_names(c("0.20", "0.30"))
# Dataframes with detected neighbors
list_detected <- list(
  tibble::tibble(
    prev_level = c("0.20", "0.30", "0.30", "0.30"),
    node1 = c("msp_2", "msp_2", "msp_3", "msp_3"),
    node2 = c("msp_3", "msp_3", "msp_18", "msp_8"),
    coef = c(0.406, -0.025, 0.160, 0.005),
    filtering_top = c(100, 100, 100, 100)
  ),
  tibble::tibble()
) %>% rlang::set_names(c("0.20", "0.30"))
list_detected2 <- list(
  tibble::tibble(
    prev_level = c("0.20", "0.20"),
    node1 = c("msp_2", "msp_3"),
    node2 = c("msp_3", "msp_18"),
    coef = c(0.160, 0.005),
    filtering_top = c(100, 100)
  ),
  tibble::tibble()
) %>% rlang::set_names(c("0.20", "0.30"))
# Use final_step() to gather both
neighbors <- final_step(list_true, list_detected, robustness_step = FALSE)
neighbors2 <- final_step(list_true, list_detected2, robustness_step = FALSE) %>%
  dplyr::mutate(filtering_top = 10)
# Calculate scores
scores <- test_filter(neighbors, neighbors2)
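For reference, precision is the share of detected edges that are true, and recall is the share of true edges that are detected. The sketch below computes both for one prevalence level, using the 0.30 edges from the `test_filter()` example. `precision_recall()` and `edge_key()` are hypothetical helpers; that `test_filter()` uses exactly these definitions, and treats edges as undirected, is an assumption.

```r
# Hypothetical helpers, not test_filter(): precision and recall of detected
# edges against true edges, treating edges as undirected.
edge_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")

precision_recall <- function(true_edges, detected_edges) {
  truth <- unique(edge_key(true_edges$node1, true_edges$node2))
  found <- unique(edge_key(detected_edges$node1, detected_edges$node2))
  tp <- length(intersect(truth, found))       # true positives
  c(
    precision = if (length(found) > 0) tp / length(found) else NA_real_,
    recall = if (length(truth) > 0) tp / length(truth) else NA_real_
  )
}

# Edges at prevalence 0.30, taken from list_true and list_detected in the example
true_030 <- data.frame(node1 = c("msp_2", "msp_3"), node2 = c("msp_3", "msp_18"),
                       stringsAsFactors = FALSE)
detected_030 <- data.frame(node1 = c("msp_2", "msp_3", "msp_3"),
                           node2 = c("msp_3", "msp_18", "msp_8"),
                           stringsAsFactors = FALSE)
pr <- precision_recall(true_030, detected_030)
```

Both true edges are found, and one of the three detected edges (msp_3 to msp_8) is spurious, so precision is 2/3 and recall is 1.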
Return the true neighbors by level of prevalence
truth_by_prevalence(edge_table, prev_list)
edge_table |
Dataframe. The result of prev_for_selected_nodes() |
prev_list |
List of numeric. The prevalences to be studied. Required format is decimal: 0.20 for 20% of prevalence |
List of dataframes. Each element of the list corresponds to a dataframe of true edges by level of prevalence
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
tiny_graph <- graph_step(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  seed = 20242025
) %>% suppressWarnings()
tiny_truth <- prev_for_selected_nodes(
  tiny_data,
  tiny_graph,
  col_module_id = "msp_name",
  annotation_level = "species",
  object_of_interest = "bacterium"
)
tiny_true_edges <- truth_by_prevalence(tiny_truth, c(0.20, 0.30))
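One plausible rule for splitting true edges by prevalence is to keep an edge at a given level only when both of its nodes reach that level. This matches `list_true` in the `test_filter()` example, where the 0.30 element drops the two msp_1 edges (prevalence 0.28), but it is an assumption about `truth_by_prevalence()`, sketched here with the hypothetical `true_edges_by_prevalence()`.

```r
# Hypothetical sketch, not truth_by_prevalence(): an edge is kept at a
# prevalence level when both of its nodes reach that level.
true_edges_by_prevalence <- function(edge_table, prev_list) {
  res <- lapply(prev_list, function(level) {
    keep <- edge_table$prev1 >= level & edge_table$prev2 >= level
    edge_table[keep, , drop = FALSE]
  })
  names(res) <- sprintf("%.2f", prev_list)    # e.g. "0.20", "0.30"
  res
}

edge_table <- data.frame(
  node1 = c("msp_1", "msp_1", "msp_2", "msp_3"),
  node2 = c("msp_55", "msp_20", "msp_3", "msp_18"),
  prev1 = c(0.28, 0.28, 0.96, 0.75),
  prev2 = c(0.76, 0.25, 0.75, 0.60),
  stringsAsFactors = FALSE
)
true_edges <- true_edges_by_prevalence(edge_table, c(0.20, 0.30))
```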
Display network after applying NeighborFinder
visualize_network( res_NeighborFinder, annotation_table, col_module_id, annotation_level, object_of_interest, annotation_option = FALSE, node_size = 12, label_size = 4, object_color = "cadetblue2", seed = NULL )
res_NeighborFinder |
Dataframe. The result from apply_NeighborFinder() |
annotation_table |
Dataframe. The dataframe gathering the taxonomic or functional module correspondence information |
col_module_id |
String. The name of the column with the module IDs in annotation_table
annotation_level |
String. The name of the column with the level to be studied. Examples: species, genus, level_1 |
object_of_interest |
String. The name of the bacteria or species of interest or a key word in the functional module definition |
annotation_option |
Boolean. Default is FALSE. If TRUE, node labels show module names instead of module IDs
node_size |
Numeric. Size of the nodes
label_size |
Numeric. Size of the labels
object_color |
String. The color used to distinguish the nodes corresponding to 'object_of_interest' from the other nodes
seed |
Numeric. The seed number, ensuring reproducibility |
Network. Visualization of NeighborFinder results. Edge color encodes coefficient sign: green if positive, red if negative; edge width encodes magnitude.
data(taxo)
visualize_network(
  result_example$res_CRC_JPN,
  taxo,
  object_of_interest = "Escherichia coli",
  col_module_id = "msp_id",
  annotation_level = "species",
  label_size = 5
)
# # With species names instead of msp names
# visualize_network(
#   result_example$res_CRC_JPN,
#   taxo,
#   object_of_interest = "Escherichia coli",
#   col_module_id = "msp_id",
#   annotation_level = "species",
#   label_size = 5,
#   annotation_option = TRUE,
#   seed = 2
# )
# # With aesthetic changes
# visualize_network(
#   result_example$res_CRC_JPN,
#   taxo,
#   object_of_interest = "Escherichia coli",
#   col_module_id = "msp_id",
#   annotation_level = "species",
#   annotation_option = TRUE,
#   node_size = 15,
#   label_size = 6,
#   object_color = "orange",
#   seed = 2
# )
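The edge encoding documented for `visualize_network()` (green for positive coefficients, red for negative, width by magnitude) can be expressed as a small mapping. This is a hypothetical helper, `edge_style()`, not part of the package; the linear width scaling and `max_width` are assumptions, and zero coefficients fall into the red branch here.

```r
# Hypothetical helper, not part of visualize_network(): map coefficients to
# the edge styling described above. Zero coefficients fall in the "red" branch.
edge_style <- function(coef, max_width = 5) {
  scale <- max(abs(coef))
  if (scale == 0) scale <- 1                    # avoid division by zero when all coefficients are 0
  data.frame(
    coef = coef,
    color = ifelse(coef > 0, "green", "red"),   # sign -> color
    width = max_width * (abs(coef) / scale),    # magnitude -> width, largest edge gets max_width
    stringsAsFactors = FALSE
  )
}

# Arbitrary coefficients, for illustration only
styles <- edge_style(c(0.406, -0.025, 0.160))
```

Scaling by the largest absolute coefficient keeps widths comparable within one network, at the cost of making widths not comparable across networks.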