---
title: "Sharing Annotated IFCB Images"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Sharing Annotated IFCB Images}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Annotated images can be shared as zipped `.png` packages through various data repositories (e.g., Kraft et al., 2022; Torstensson et al., 2024), enabling others to train or improve their image classifiers. This vignette provides a step-by-step guide to extracting and preparing such images for publication using the `iRfcb` package. The workflow assumes that Regions of Interest (ROIs) have been annotated using the MATLAB code from the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository (Sosik and Olson, 2007). However, the methods presented can be adapted to process images generated by other software platforms.

The archive can be shared through various repositories, such as [Figshare](https://figshare.com/), [Zenodo](https://zenodo.org/), and [EUDAT](https://b2share.eudat.eu/). Links to some repositories from Northern Europe are gathered at the [Nordic Microalgae](https://nordicmicroalgae.org/annotated-images/) webpage. Images may also be shared through EcoTaxa, as demonstrated in the [Prepare IFCB Images for EcoTaxa](../articles/ecotaxa-tutorial.html) tutorial.

Additionally, this vignette shows how users of the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) package can share and merge multiple datasets of manually annotated images, enabling MATLAB users to incorporate external datasets into their random forest algorithms.

## Getting Started

### Installation

You can install the package from GitHub using the `remotes` package:

```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("EuropeanIFCBGroup/iRfcb")
```

Load the `iRfcb` library:

```{r, eval=FALSE}
library(iRfcb)
```

```{r, include=FALSE}
library(iRfcb)
```

### Download Sample Data

To get started, download sample data from the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455.v3) (Torstensson et al., 2024) with the following function:

```{r, eval=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir,
                        max_retries = 10,
                        sleep_time = 30,
                        verbose = FALSE)
```

```{r, include=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
if (!dir.exists(data_dir)) {
  # Download and extract test data if the folder does not exist
  ifcb_download_test_data(dest_dir = data_dir,
                          max_retries = 10,
                          sleep_time = 30,
                          verbose = FALSE)
}
```

## Extract Annotated Images

Extract annotated ROIs as `.png` images in subfolders for each class, skipping the **unclassified** (class id 1) category:

```{r}
# Extract .png images
ifcb_extract_annotated_images(manual_folder = "data/manual",
                              class2use_file = "data/config/class2use.mat",
                              roi_folders = "data/data",
                              out_folder = "data/extracted_images",
                              skip_class = 1, # or "unclassified"
                              verbose = FALSE) # Do not print messages
```

## Package PNG Directory

Prepare the PNG directory for publication as a zip-archive, similar to the files in the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455) (Torstensson et al., 2024). This function reads, updates, and incorporates a **README** file into the zip archive. A template **README** file is included with the `iRfcb` package.

```{r}
# Create zip-archive
ifcb_zip_pngs(png_folder = "data/extracted_images",
              zip_filename = "data/zip/ifcb_annotated_images_corrected.zip",
              readme_file = system.file("exdata/README-template.md",
                                        package = "iRfcb"), # Template included in `iRfcb`
              email_address = "tutorial@test.com",
              version = "1.1",
              print_progress = FALSE)
```

## Package MATLAB Directory

Prepare the MATLAB directory for publication as a zip-archive, similar to the files in the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455):

```{r}
# Create zip-archive
ifcb_zip_matlab(manual_folder = "data/manual",
                features_folder = "data/features",
                class2use_file = "data/config/class2use.mat",
                zip_filename = "data/zip/ifcb_matlab_files_corrected.zip",
                data_folder = "data/data",
                readme_file = system.file("exdata/README-template.md",
                                          package = "iRfcb"), # Template included in `iRfcb`
                matlab_readme_file = system.file("exdata/MATLAB-template.md",
                                                 package = "iRfcb"), # Template included in `iRfcb`
                email_address = "tutorial@test.com",
                version = "1.1",
                print_progress = FALSE)
```

## Create MANIFEST.txt

Create a manifest file for the zip-archive (required by some data repositories):

```{r}
# Create MANIFEST.txt listing the contents of the zip folder
ifcb_create_manifest("data/zip")
```

## Merge Manual Datasets

Datasets that have been manually annotated using the MATLAB code from the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) repository (Sosik and Olson, 2007) can be merged using the [`ifcb_merge_manual`](../reference/ifcb_merge_manual.html) function. It is a wrapper around the [`ifcb_create_class2use`](../reference/ifcb_create_class2use.html), [`ifcb_replace_mat_values`](../reference/ifcb_replace_mat_values.html) and [`ifcb_adjust_classes`](../reference/ifcb_adjust_classes.html) functions.

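To make the idea of merging more concrete, the chunk below is a minimal conceptual sketch in base R of what combining two class lists involves: the class names are pooled into one list, and the class indices used in the additional dataset are remapped so that they point to the same names in the merged list. The class names and annotation values are made up for illustration, and the sketch is not the implementation used by `ifcb_merge_manual`, which works on the `.mat` files themselves.

```{r}
# Hypothetical class lists standing in for the contents of two class2use files
classes_base <- c("unclassified", "Dinophysis_acuminata", "Skeletonema_marinoi")
classes_additions <- c("unclassified", "Skeletonema_marinoi", "Tripos_muelleri")

# Merged list: keep the base order, then append classes found only in the additions
classes_merged <- union(classes_base, classes_additions)

# For each class index in the additions, find its index in the merged list
index_map <- match(classes_additions, classes_merged)

# Hypothetical manual annotations from the additional dataset (one class index per ROI)
annotations_additions <- c(2, 3, 3, 1)

# Remap the annotations so that they refer to the merged class list
annotations_remapped <- index_map[annotations_additions]

# The remapped indices should resolve to the same class names as before the merge
identical(classes_merged[annotations_remapped],
          classes_additions[annotations_additions])
```

The final comparison is a simple sanity check that every ROI keeps its class name after remapping, which is the property any merge of annotated datasets needs to preserve.
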
In this example, two datasets from the Swedish west coast are downloaded from the [SMHI IFCB Plankton Image Reference Library (version 3)](https://doi.org/10.17044/scilifelab.25883455.v3) (Torstensson et al., 2024) and combined into a single dataset. Please note that these datasets are large, and the downloading and merging processes may take considerable time.

```{r, eval=FALSE}
# Define data directories
skagerrak_kattegat_dir <- "data_skagerrak_kattegat"
tangesund_dir <- "data_tangesund"
merged_dir <- "data_skagerrak_kattegat_tangesund_merged"

# Download and extract the Skagerrak-Kattegat data
ifcb_download_test_data(dest_dir = skagerrak_kattegat_dir,
                        figshare_article = "48158725")

# Download and extract the Tångesund data
ifcb_download_test_data(dest_dir = tangesund_dir,
                        figshare_article = "48158731")

# Initialize the Python session if not already set up
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path
ifcb_py_install(envname = env_path)

# Merge Skagerrak-Kattegat and Tångesund into a single dataset
ifcb_merge_manual(
  class2use_file_base = file.path(skagerrak_kattegat_dir, "config/class2use.mat"),
  class2use_file_additions = file.path(tangesund_dir, "config/class2use.mat"),
  class2use_file_output = file.path(merged_dir, "config/class2use.mat"),
  manual_folder_base = file.path(skagerrak_kattegat_dir, "manual"),
  manual_folder_additions = file.path(tangesund_dir, "manual"),
  manual_folder_output = file.path(merged_dir, "manual"))
```

This concludes the tutorial for the `iRfcb` package. For more detailed information, refer to the package documentation or the other [tutorials](../articles/index.html). See how data pipelines can be constructed using `iRfcb` in the following [Example Project](https://github.com/nodc-sweden/ifcb-data-pipeline). Happy analyzing!

## Citation

```{r, echo=FALSE}
# Print citation
citation("iRfcb")
```

```{r, include=FALSE}
# Clean up
unlink(file.path(data_dir, "extracted_images"), recursive = TRUE)
unlink(file.path(data_dir, "zip"), recursive = TRUE)
```

## References

- Kraft, K., Velhonoja, O., Seppälä, J., Hällfors, H., Suikkanen, S., Ylöstalo, P., Anglès, S., Kielosto, S., Kuosa, H., Lehtinen, S., Oja, J., Tamminen, T. (2022). SYKE-plankton_IFCB_2022 [Data set]. https://b2share.eudat.eu. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
- Sosik, H. M. and Olson, R. J. (2007). Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr.: Methods 5, 204–216.
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3