---
title: "Handling MATLAB Results"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Handling MATLAB Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

This vignette demonstrates how to process and refine annotated and automatically classified Imaging FlowCytobot (IFCB) data in R using the `iRfcb` package. The workflow assumes that MATLAB-based preprocessing has already been conducted using the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository (Sosik and Olson 2007). This preprocessing includes generating `.mat` files for annotated and classified images. With `iRfcb`, you can further analyze and manage IFCB data, including summarizing annotations and classification results and refining annotations.

## Getting Started

### Installation

You can install the package from GitHub using the `remotes` package:

```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("EuropeanIFCBGroup/iRfcb")
```

Some functions from the `iRfcb` package used in this tutorial require `Python` to be installed. You can download `Python` from the official website: [python.org/downloads](https://www.python.org/downloads/).

Load the `iRfcb` library:

```{r, eval=FALSE}
library(iRfcb)
```

```{r, include=FALSE}
library(iRfcb)
```

### Download Sample Data

To get started, download sample data from the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455.v3) (Torstensson et al.
2024) with the following function:

```{r, eval=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
```

```{r, include=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
if (!dir.exists(data_dir)) {
  # Download and extract test data if the folder does not exist
  ifcb_download_test_data(dest_dir = data_dir,
                          max_retries = 10,
                          sleep_time = 30,
                          verbose = FALSE)
}
```

## Classified Results from MATLAB

The `iRfcb` package facilitates the processing and analysis of data classified using a random forest algorithm from the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository. This workflow supports tasks such as extracting classified results, reading summary files, and calculating biovolume and carbon content.

This section provides an overview of key functions available in `iRfcb` for handling classified IFCB data. Step-by-step examples guide users through extracting results, summarizing data, and applying the same functions to both automatically classified and manually annotated datasets.

### Extract Classified Images from a Sample

To begin working with classified data, you can extract all classified images from a specific sample. This is especially useful for isolating ROIs based on specific taxa or classification thresholds.

```{r}
# Extract all classified images from a sample
ifcb_extract_classified_images(
  sample = "D20230314T001205_IFCB134",
  classified_folder = "data/classified",
  roi_folder = "data/data",
  out_folder = "data/classified_images",
  taxa = "Tripos_lineatus", # A specific taxon or "All"
  threshold = "opt") # or specify another threshold
```

### Read a Summary File

Summary files generated by the MATLAB function `countcells_allTBnew_user_training` provide aggregated classified data. Use the following function to read and process these files.
```{r}
# Read a MATLAB summary file generated by `countcells_allTBnew_user_training`
summary_data <- ifcb_read_summary("data/classified/2023/summary/summary_allTB_2023.mat",
                                  biovolume = FALSE,
                                  threshold = "opt")

# Print output
head(summary_data)
```

Alternatively, `iRfcb` can directly aggregate data and compute carbon content from classification files using the [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) function demonstrated below.

### Summarize Counts, Biovolumes and Carbon Content from Classified IFCB Data

The [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) function calculates aggregated biovolumes and carbon content from IFCB samples based on feature and MATLAB classification result files, without summarizing the data in MATLAB. The function can also be adapted to process classification results from non-MATLAB machine learning algorithms (e.g., a CNN model) by providing custom lists of image names and class labels through the `custom_images` and `custom_classes` arguments.

Biovolumes are converted to carbon according to Menden-Deuer and Lessard (2000) for individual ROIs, where different conversion factors are applied to diatoms and non-diatom protists. If `.hdr` files are provided, the function also uses their sample volume data to compute biovolume and carbon content per liter of sample. See details in the help pages for [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) and [`ifcb_extract_biovolumes`](../reference/ifcb_extract_biovolumes.html).
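As a rough illustration of the conversion described above, the following plain-R sketch converts per-ROI biovolumes to carbon and scales the total to one liter. It does not call `iRfcb`: the helper names `pixels_to_um3()` and `volume_to_carbon()` are hypothetical, the regression coefficients are the commonly quoted values from Menden-Deuer and Lessard (2000) for diatoms and for other protists, the cubic scaling by `micron_factor` is an assumption about how pixel-based volumes map to cubic microns, and all input values are made up. Check the help pages for the exact coefficients and size classes the package applies.

```r
# Hypothetical helpers for illustration only; these are not iRfcb functions.

# Convert a biovolume in cubic pixels to cubic microns.
# Assumption: micron_factor is microns per pixel, so volumes scale with its cube.
pixels_to_um3 <- function(volume_px3, micron_factor = 1 / 3.4) {
  volume_px3 * micron_factor^3
}

# Carbon per cell (pg C) from biovolume (um^3) via log10(C) = log_a + b * log10(V).
# Coefficients as commonly quoted from Menden-Deuer and Lessard (2000):
#   diatoms:        log_a = -0.541, b = 0.811
#   other protists: log_a = -0.665, b = 0.939
volume_to_carbon <- function(volume_um3, is_diatom) {
  stopifnot(length(volume_um3) == length(is_diatom), all(volume_um3 > 0))
  log_a <- ifelse(is_diatom, -0.541, -0.665)
  b <- ifelse(is_diatom, 0.811, 0.939)
  10^(log_a + b * log10(volume_um3))
}

# Made-up example values: three ROIs from one sample
rois <- data.frame(
  volume_px3 = c(40000, 120000, 8000),
  is_diatom  = c(TRUE, FALSE, FALSE)
)
rois$volume_um3 <- pixels_to_um3(rois$volume_px3)
rois$carbon_pg  <- volume_to_carbon(rois$volume_um3, rois$is_diatom)

# Scale the sample total to one liter, given the analyzed volume in mL (made up)
ml_analyzed <- 4.5
carbon_pg_per_liter <- sum(rois$carbon_pg) / ml_analyzed * 1000

print(rois)
print(carbon_pg_per_liter)
```

Because the conversion is applied per ROI before summing, a sample dominated by a few large cells is not treated the same as one with many small cells of equal total volume; the exponents below 1 make carbon density decrease with cell size.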
```{r}
# Summarize biovolume data using IFCB data from classified data folder
biovolume_data <- ifcb_summarize_biovolumes(
  feature_folder = "data/features/2023",
  mat_folder = "data/classified",
  hdr_folder = "data/data/2023",
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  threshold = "opt",
  verbose = FALSE) # Do not print progress bars

# Print output
head(biovolume_data)
```

### Summarize Counts, Biovolumes and Carbon Content from Manually Annotated IFCB Data

The [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) function can also be used to calculate aggregated biovolumes and carbon content from manually annotated IFCB image data. See details in the help pages for [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html), [`ifcb_extract_biovolumes`](../reference/ifcb_extract_biovolumes.html) and [`ifcb_count_mat_annotations`](../reference/ifcb_count_mat_annotations.html).

```{r}
# Summarize biovolume data using IFCB data from manual data folder
manual_biovolume_data <- ifcb_summarize_biovolumes(
  feature_folder = "data/features",
  mat_folder = "data/manual",
  class2use_file = "data/config/class2use.mat",
  hdr_folder = "data/data",
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  verbose = FALSE) # Do not print progress bars

# Print output
head(manual_biovolume_data)
```

## Manually Annotated Data from MATLAB

### Count and Summarize Annotated Image Data

#### PNG Directory

Summarize counts of annotated images at the sample and class levels.
The `hdr_folder` can be included to add GPS positions to the sample data frame:

```{r}
# Summarize counts on sample level
png_per_sample <- ifcb_summarize_png_counts(png_folder = "data/png",
                                            hdr_folder = "data/data",
                                            sum_level = "sample")

# Print output
head(png_per_sample)

# Summarize counts on class level
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/png",
                                           sum_level = "class")

# Print output
head(png_per_class)
```

#### MATLAB Files

Count the annotations in the MATLAB files, similar to [`ifcb_summarize_png_counts`](../reference/ifcb_summarize_png_counts.html):

```{r}
# Summarize counts from MATLAB files
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified", # Or class ID
  sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
```

### Run Image Gallery

To visually inspect and correct annotations, run the image gallery.

```{r, eval=FALSE}
# Run Shiny app
ifcb_run_image_gallery()
```

![image_gallery](../man/figures/image_gallery.png)

Individual images can be selected, and the list of selected images can be downloaded as a `correction` file. This file can then be used to correct `.mat` annotations with the [`ifcb_correct_annotation`](../reference/ifcb_correct_annotation.html) function, as described in the next section.

### Correct .mat Files After Checking Images in the App

```{r, include=FALSE}
library(reticulate)

# Define path to virtual environment
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path

# Install python virtual environment
tryCatch({
  ifcb_py_install(envname = env_path)
}, error = function(e) {
  warning("Python environment could not be installed.")
})
```

```{r, echo=FALSE}
# Check if Python is available
if (!py_available(initialize = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
  warning("Python is not available. Skipping the rest of the vignette evaluation.")
} else {
  # List available packages
  available_packages <- py_list_packages(python = reticulate::py_discover_config()$python)

  # Check if scipy is available
  if (!"scipy" %in% available_packages$package) {
    knitr::opts_chunk$set(eval = FALSE)
    warning("Required python modules are not available. Skipping the rest of the vignette evaluation.")
  }
}
```

After reviewing images in the gallery, correct the `.mat` files using the `correction` file with selected images:

```{r, eval=FALSE}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified", class2use))

# Initialize the python session if not already set up
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path
ifcb_py_install(envname = env_path)

# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(
  manual_folder = "data/manual",
  out_folder = "data/manual",
  correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
  correct_classid = unclassified_id)
```

```{r, include=FALSE}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified", class2use))

# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(
  manual_folder = "data/manual",
  out_folder = "data/manual",
  correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
  correct_classid = unclassified_id)
```

### Replace Specific Class Annotations

Replace all instances of a specific class with **unclassified** (class id 1):

```{r}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of Alexandrium_pseudogonyaulax
ap_id <- which(grepl("Alexandrium_pseudogonyaulax", class2use))

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified", class2use))

# Move all Alexandrium_pseudogonyaulax images to unclassified
ifcb_replace_mat_values(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        target_id = ap_id,
                        new_id = unclassified_id)
```

### Verify Correction

Verify that the corrections have been applied:

```{r}
# Summarize new counts after correction
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified", # Or class ID
  sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
```

### Annotate Images in Batch

Images can be batch annotated using the [`ifcb_annotate_batch`](../reference/ifcb_annotate_batch.html) function. If a manual file already exists for the sample, the ROI class list will be updated accordingly. If no file is found, a new `.mat` file will be created, with all unannotated ROIs marked as unclassified.
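The update logic described above can be sketched in plain R. This is an illustration only, not the implementation of `ifcb_annotate_batch`: the helpers `split_image_name()` and `update_class_list()` are hypothetical, and the sketch assumes image names of the form `<sample>_<ROI number>.png` and that class id 1 denotes unclassified, as elsewhere in this tutorial.

```r
# Hypothetical helpers for illustration only; these are not iRfcb functions.

# Split an image file name into its sample name and ROI number.
# Assumption: names look like "<sample>_<ROI number>.png".
split_image_name <- function(image_filename) {
  base <- sub("\\.png$", "", basename(image_filename))
  data.frame(
    sample = sub("_[0-9]+$", "", base),
    roi = as.integer(sub("^.*_([0-9]+)$", "\\1", base)),
    stringsAsFactors = FALSE
  )
}

# Assign class_id to the selected ROIs.
# With an existing class list, only the selected ROIs change.
# Without one, start from a list where every ROI is unclassified.
update_class_list <- function(n_rois, roi, class_id,
                              existing = NULL, unclassified_id = 1L) {
  class_list <- if (is.null(existing)) rep(unclassified_id, n_rois) else existing
  stopifnot(length(class_list) == n_rois, all(roi >= 1), all(roi <= n_rois))
  class_list[roi] <- class_id
  class_list
}

# Made-up example: two selected images from a sample with five ROIs
images <- c("D20230314T001205_IFCB134_00002.png",
            "D20230314T001205_IFCB134_00004.png")
parsed <- split_image_name(images)

# No existing manual file: unselected ROIs default to unclassified
new_list <- update_class_list(n_rois = 5, roi = parsed$roi, class_id = 7L)

# Existing manual file: previous annotations are kept for unselected ROIs
old_list <- c(3L, 3L, 5L, 5L, 5L)
updated_list <- update_class_list(n_rois = 5, roi = parsed$roi, class_id = 7L,
                                  existing = old_list)

print(parsed)
print(new_list)
print(updated_list)
```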
```{r}
# Read a file with selected images, generated by the image gallery app
correction <- read.table(
  "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
  header = TRUE)

# Print image names to be annotated
print(correction$image_filename)

# Re-annotate the images that were moved to unclassified earlier in the tutorial
ifcb_annotate_batch(png_images = correction$image_filename,
                    class = "Alexandrium_pseudogonyaulax",
                    manual_folder = "data/manual",
                    adc_folder = "data/data",
                    class2use_file = "data/config/class2use.mat")

# Summarize new counts after re-annotation
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified",
  sum_level = "class")

# Print output and check if Alexandrium pseudogonyaulax is back
head(mat_count)
```

This concludes the tutorial for the `iRfcb` package. For more detailed information, refer to the package documentation or the other [tutorials](../articles/index.html). See how data pipelines can be constructed using `iRfcb` in the following [Example Project](https://github.com/nodc-sweden/ifcb-data-pipeline). Happy analyzing!

## Citation

```{r, echo=FALSE}
# Print citation
citation("iRfcb")
```

```{r, include=FALSE}
# Clean up
unlink(file.path(data_dir, "classified_images"), recursive = TRUE)
unlink(file.path(data_dir, "zip"), recursive = TRUE)
```

## References

- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Menden-Deuer, S. and Lessard, E. J. (2000). Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography, 45(3). https://doi.org/10.4319/lo.2000.45.3.0569
- Sosik, H. M. and Olson, R. J.
  (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnology and Oceanography: Methods, 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3