---
title: "Handling MATLAB Results"
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Handling MATLAB Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

This vignette demonstrates how to process and refine annotated and automatically classified Imaging FlowCytobot (IFCB) data in R using the `iRfcb` package. The workflow assumes that MATLAB-based preprocessing has already been conducted using the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository (Sosik and Olson 2007). This preprocessing includes generating `.mat` files for annotated and classified images.

With `iRfcb`, you can further analyze and manage IFCB data, including summarizing annotations and classification results and refining annotations.

## Getting Started

### Installation

You can install the package from GitHub using the `remotes` package:
```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("EuropeanIFCBGroup/iRfcb")
```
Some functions from the `iRfcb` package used in this tutorial require `Python` to be installed. You can download `Python` from the official website: [python.org/downloads](https://www.python.org/downloads/).

Load the `iRfcb` library:
```{r, eval=FALSE}
library(iRfcb)
```

```{r, include=FALSE}
library(iRfcb)
```

### Download Sample Data

To get started, download sample data from the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455.v3) (Torstensson et al. 2024) with the following function:
```{r, eval=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
```

```{r, include=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
if (!dir.exists(data_dir)) {
  # Download and extract test data if the folder does not exist
  ifcb_download_test_data(dest_dir = data_dir,
                          max_retries = 10,
                          sleep_time = 30,
                          verbose = FALSE)
}
```

## Classified Results from MATLAB

The `iRfcb` package facilitates the processing and analysis of data classified using a random forest algorithm from the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository. This workflow supports various tasks such as extracting classified results, reading summary files, and calculating biovolume and carbon content.

This section provides an overview of key functions available in `iRfcb` for handling classified IFCB data. Step-by-step examples are included to guide users through extracting results, summarizing data, and leveraging functionalities for both automated and manually annotated datasets.

### Extract Classified Images from a Sample

To begin working with classified data, you can extract all classified images from a specific sample. This is especially useful for isolating ROIs based on specific taxa or classification thresholds.

```{r}
# Extract all classified images from a sample
ifcb_extract_classified_images(
  sample = "D20230314T001205_IFCB134",
  classified_folder = "data/classified",
  roi_folder = "data/data",
  out_folder = "data/classified_images",
  taxa = "Tripos_lineatus", # A specific taxon or "All"
  threshold = "opt") # or specify another threshold
```
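
The idea of a classification threshold can be illustrated with a small base R sketch. It is a simplified illustration only, not the exact logic used by `iRfcb` or ifcb-analysis: the score matrix, the class names used as columns, and the threshold values are all hypothetical. Each ROI is assigned to its highest-scoring class, unless that score falls below the class threshold, in which case the ROI becomes unclassified.

```{r, eval=FALSE}
# Hypothetical classifier scores: one row per ROI, one column per class
scores <- matrix(
  c(0.80, 0.15, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.20, 0.70),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("roi_", 1:3),
                  c("Tripos_lineatus", "Dinophysis_acuminata", "Skeletonema_marinoi"))
)

# Hypothetical per-class thresholds (real values would come from the classifier)
thresholds <- c(Tripos_lineatus = 0.5,
                Dinophysis_acuminata = 0.5,
                Skeletonema_marinoi = 0.6)

# Highest-scoring class per ROI and the corresponding score
winner <- colnames(scores)[max.col(scores, ties.method = "first")]
winner_score <- apply(scores, 1, max)

# ROIs whose winning score is below that class's threshold become "unclassified"
assigned <- ifelse(winner_score >= thresholds[winner], winner, "unclassified")

# Names of the ROIs assigned to the taxon of interest
rownames(scores)[assigned == "Tripos_lineatus"]
```

In this toy example, the second ROI also scores highest for `Tripos_lineatus`, but it is excluded because its score (0.40) is below the threshold (0.5).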

### Read a Summary File

Summary files generated by the MATLAB function `countcells_allTBnew_user_training` provide aggregated classified data. Use the following function to read and process these files.

```{r}
# Read a MATLAB summary file generated by `countcells_allTBnew_user_training`
summary_data <- ifcb_read_summary("data/classified/2023/summary/summary_allTB_2023.mat",
                                  biovolume = FALSE,
                                  threshold = "opt")

# Print output
head(summary_data)
```

Alternatively, `iRfcb` can directly aggregate data and compute carbon content from classification files using the [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) function demonstrated below.

### Summarize Counts, Biovolumes and Carbon Content from Classified IFCB Data

This function calculates aggregated biovolumes and carbon content from IFCB samples based on feature and MATLAB classification result files, without summarizing the data in MATLAB. The function can also be adapted to process classification results from other non-MATLAB machine learning algorithms (e.g., a CNN model) by providing custom lists of image names and class labels through the `custom_images` and `custom_classes` arguments.

Biovolumes are converted to carbon for individual ROIs according to Menden-Deuer and Lessard (2000), with different conversion factors applied to diatoms and non-diatom protists. If an `hdr_folder` is provided, the function also incorporates sample volume data from the `.hdr` files to compute biovolume and carbon content per liter of sample. See details in the help pages for [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) and [`ifcb_extract_biovolumes`](../reference/ifcb_extract_biovolumes.html).

```{r}
# Summarize biovolume data using IFCB data from classified data folder
biovolume_data <- ifcb_summarize_biovolumes(
  feature_folder = "data/features/2023",
  mat_folder = "data/classified",
  hdr_folder = "data/data/2023",
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  threshold = "opt",
  verbose = FALSE) # Do not print progress bars

# Print output
head(biovolume_data)
```
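
The conversion steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the ROI volumes, the `is_diatom` flag, the helper `vol_to_carbon()` and the analyzed volume are hypothetical, and the regression coefficients are given as recalled from Menden-Deuer and Lessard (2000), so verify them against the paper and the package help pages before relying on them. The package may also select between the diatom equations differently.

```{r, eval=FALSE}
# Hypothetical ROI biovolumes in cubic pixels, with a flag for diatoms
roi_biovolume_px <- c(39304, 393040, 3930400)
is_diatom <- c(TRUE, FALSE, TRUE)

# Convert cubic pixels to cubic microns (micron_factor is microns per pixel)
micron_factor <- 1 / 3.4
biovolume_um3 <- roi_biovolume_px * micron_factor^3

# Regressions of the form log10(pg C) = a + b * log10(volume in um^3).
# Coefficients are assumptions recalled from Menden-Deuer and Lessard (2000).
vol_to_carbon <- function(volume_um3, diatom) {
  large <- volume_um3 > 3000
  a <- ifelse(diatom, ifelse(large, -0.933, -0.541), -0.665)
  b <- ifelse(diatom, ifelse(large, 0.881, 0.811), 0.939)
  10^(a + b * log10(volume_um3))
}

# Carbon per ROI in picograms
carbon_pg <- vol_to_carbon(biovolume_um3, is_diatom)

# Total carbon per liter, assuming a hypothetical analyzed volume of 4.5 ml
ml_analyzed <- 4.5
carbon_ug_per_liter <- sum(carbon_pg) / 1e6 / (ml_analyzed / 1000)
carbon_ug_per_liter
```

Because the relationship is a power law, the conversion is applied to each ROI before summing. Converting the summed biovolume in a single step would give a different result.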

### Summarize Counts, Biovolumes and Carbon Content from Manually Annotated IFCB Data

The [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html) function can also be used to calculate aggregated biovolumes and carbon content from manually annotated IFCB image data. See details in the help pages for [`ifcb_summarize_biovolumes`](../reference/ifcb_summarize_biovolumes.html), [`ifcb_extract_biovolumes`](../reference/ifcb_extract_biovolumes.html) and [`ifcb_count_mat_annotations`](../reference/ifcb_count_mat_annotations.html).

```{r}
# Summarize biovolume data using IFCB data from manual data folder
manual_biovolume_data <- ifcb_summarize_biovolumes(
  feature_folder = "data/features",
  mat_folder = "data/manual",
  class2use_file = "data/config/class2use.mat",
  hdr_folder = "data/data",
  micron_factor = 1/3.4,
  diatom_class = "Bacillariophyceae",
  verbose = FALSE) # Do not print progress bars

# Print output
head(manual_biovolume_data)
```

## Manually Annotated Data from MATLAB

### Count and Summarize Annotated Image Data

#### PNG Directory

Summarize counts of annotated images at the sample and class levels. The `hdr_folder` can be included to add GPS positions to the sample data frame:
```{r}
# Summarize counts on sample level
png_per_sample <- ifcb_summarize_png_counts(png_folder = "data/png",
                                            hdr_folder = "data/data",
                                            sum_level = "sample")

# Print output
head(png_per_sample)

# Summarize counts on class level
png_per_class <- ifcb_summarize_png_counts(png_folder = "data/png",
                                           sum_level = "class")

# Print output
head(png_per_class)
```

#### MATLAB Files

Count the annotations in the MATLAB files, similar to [`ifcb_summarize_png_counts`](../reference/ifcb_summarize_png_counts.html):
```{r}
# Summarize counts from MATLAB files
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified", # Or class ID
  sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
```

### Run Image Gallery

To visually inspect and correct annotations, run the image gallery. 
```{r, eval=FALSE}
# Run Shiny app
ifcb_run_image_gallery()
```

![image_gallery](../man/figures/image_gallery.png)

Individual images can be selected and a list of selected images can be downloaded as a `correction` file. This file can be used to correct `.mat` annotations below using the [`ifcb_correct_annotation`](../reference/ifcb_correct_annotation.html) function.

### Correct .mat Files After Checking Images in the App

```{r, include=FALSE}
library(reticulate)

# Define path to virtual environment
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path

# Install python virtual environment
tryCatch({
  ifcb_py_install(envname = env_path)
}, error = function(e) {
  warning("Python environment could not be installed.")
})
```

```{r, echo=FALSE}
# Check if Python is available
if (!py_available(initialize = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
  warning("Python is not available. Skipping the rest of the vignette evaluation.")
} else {
  # List available packages
  available_packages <- py_list_packages(python = reticulate::py_discover_config()$python)
  
  # Check if scipy is available
  if (!"scipy" %in% available_packages$package) {
    knitr::opts_chunk$set(eval = FALSE)
    warning("Required python modules are not available. Skipping the rest of the vignette evaluation.")
  }
}
```

After reviewing images in the gallery, correct the `.mat` files using the `correction` file with selected images:
```{r, eval=FALSE}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                               class2use))

# Initialize the Python session if not already set up
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path
ifcb_py_install(envname = env_path)

# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(
  manual_folder = "data/manual",
  out_folder = "data/manual",
  correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
  correct_classid = unclassified_id)
```

```{r, include=FALSE}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                         class2use))

# Correct the annotation with the output from the image gallery
ifcb_correct_annotation(
  manual_folder = "data/manual",
  out_folder = "data/manual",
  correction = "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt",
  correct_classid = unclassified_id)
```

### Replace Specific Class Annotations

Replace all instances of a specific class with **unclassified** (class id 1):
```{r}
# Get class2use
class_name <- ifcb_get_mat_names("data/config/class2use.mat")
class2use <- ifcb_get_mat_variable("data/config/class2use.mat",
                                   variable_name = class_name)

# Find the class id of Alexandrium_pseudogonyaulax
ap_id <- which(grepl("Alexandrium_pseudogonyaulax",
                     class2use))

# Find the class id of unclassified
unclassified_id <- which(grepl("unclassified",
                         class2use))

# Move all Alexandrium_pseudogonyaulax images to unclassified
ifcb_replace_mat_values(manual_folder = "data/manual",
                        out_folder = "data/manual",
                        target_id = ap_id,
                        new_id = unclassified_id)
```
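
The lookup and replacement logic above can be illustrated on plain vectors. The sketch below uses a hypothetical class list and hypothetical per-ROI class ids, and assumes the class list is a plain character vector. It also shows why pattern matching with `grepl()` deserves a quick check: when several class names contain the same substring, more than one id is returned, whereas `match()` returns a single exact match.

```{r, eval=FALSE}
# Hypothetical class list; a real list would be read from class2use.mat
toy_classes <- c("unclassified", "Alexandrium_pseudogonyaulax",
                 "Alexandrium_spp", "unclassified_detritus")

# Pattern matching can return more than one id
which(grepl("unclassified", toy_classes))

# Exact matching returns a single id (or NA if the class is absent)
toy_unclassified_id <- match("unclassified", toy_classes)
toy_target_id <- match("Alexandrium_pseudogonyaulax", toy_classes)

# Hypothetical per-ROI class ids, as they might be stored in a manual file
roi_ids <- c(2, 1, 3, 2, 4)

# Replace every occurrence of the target id with the new id
roi_ids[roi_ids == toy_target_id] <- toy_unclassified_id
roi_ids
```

If the pattern-based lookup returns more than one id for your own class list, switching to an exact comparison avoids replacing the wrong class.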

### Verify Correction

Verify that the corrections have been applied:
```{r}
# Summarize new counts after correction
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified", # Or class ID
  sum_level = "class") # Or per "sample"

# Print output
head(mat_count)
```
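
If you keep a copy of the counts from before the correction, the two tables can be compared directly. The sketch below is a base R illustration with made-up numbers. The column names `class` and `n` are assumptions, so adjust them to match the data frame actually returned by `ifcb_count_mat_annotations()`.

```{r, eval=FALSE}
# Hypothetical counts before and after a correction (column names are assumptions)
counts_before <- data.frame(
  class = c("Alexandrium_pseudogonyaulax", "Dinophysis_acuminata"),
  n = c(12, 30)
)
counts_after <- data.frame(
  class = "Dinophysis_acuminata",
  n = 30
)

# Full join on class so that classes missing from either table are kept
comparison <- merge(counts_before, counts_after,
                    by = "class", all = TRUE,
                    suffixes = c("_before", "_after"))

# A class missing from one table has a count of zero there
comparison$n_before[is.na(comparison$n_before)] <- 0
comparison$n_after[is.na(comparison$n_after)] <- 0

# Show only the classes whose counts changed
comparison$change <- comparison$n_after - comparison$n_before
comparison[comparison$change != 0, ]
```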

### Annotate Images in Batch

Images can be batch annotated using the [`ifcb_annotate_batch`](../reference/ifcb_annotate_batch.html) function. If a manual file already exists for the sample, the ROI class list will be updated accordingly. If no file is found, a new `.mat` file will be created, with all unannotated ROIs marked as unclassified.

```{r}
# Read a file with selected images, generated by the image gallery app
correction <- read.table(
  "data/manual/correction/Alexandrium_pseudogonyaulax_selected_images.txt", 
  header = TRUE)

# Print image names to be annotated
print(correction$image_filename)

# Re-annotate the images that were moved to unclassified earlier in the tutorial
ifcb_annotate_batch(png_images = correction$image_filename,
                    class = "Alexandrium_pseudogonyaulax",
                    manual_folder = "data/manual",
                    adc_folder = "data/data",
                    class2use_file = "data/config/class2use.mat")

# Summarize new counts after re-annotation
mat_count <- ifcb_count_mat_annotations(
  manual_files = "data/manual",
  class2use_file = "data/config/class2use.mat",
  skip_class = "unclassified",
  sum_level = "class")

# Print output and check if Alexandrium pseudogonyaulax is back
head(mat_count)
```
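
Image filenames carry the sample name and the ROI number, which is useful for checking which samples a batch annotation will touch. The sketch below parses a few hypothetical filenames in base R, assuming they follow the `D<date>T<time>_IFCB<number>_<roi>.png` pattern seen in the sample names used in this tutorial.

```{r, eval=FALSE}
# Hypothetical image filenames following the IFCB naming pattern
toy_images <- c("D20230314T001205_IFCB134_00002.png",
                "D20230314T001205_IFCB134_00015.png",
                "D20220522T003051_IFCB134_00007.png")

# Strip the file extension, then split off the trailing ROI number
image_base <- sub("\\.png$", "", toy_images)
parsed <- data.frame(
  sample = sub("_[0-9]+$", "", image_base),
  roi = as.integer(sub("^.*_([0-9]+)$", "\\1", image_base)),
  stringsAsFactors = FALSE
)

# The sample name starts with "D" followed by a timestamp (assumed to be UTC)
parsed$timestamp <- as.POSIXct(substr(parsed$sample, 2, 16),
                               format = "%Y%m%dT%H%M%S", tz = "UTC")

# Number of selected images per sample
table(parsed$sample)
```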

This concludes the tutorial for the `iRfcb` package. For more detailed information, refer to the package documentation or the other [tutorials](../articles/index.html). To see how data pipelines can be constructed using `iRfcb`, visit the [Example Project](https://github.com/nodc-sweden/ifcb-data-pipeline). Happy analyzing!

## Citation

```{r, echo=FALSE}
# Print citation
citation("iRfcb")
```

```{r, include=FALSE}
# Clean up
unlink(file.path(data_dir, "classified_images"), recursive = TRUE)
unlink(file.path(data_dir, "zip"), recursive = TRUE)
```

## References
- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Menden-Deuer, S. and Lessard, E. J. (2000). Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography 45(3), 569–579. https://doi.org/10.4319/lo.2000.45.3.0569
- Sosik, H. M. and Olson, R. J. (2007) Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3