---
title: "Importing from qiime2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Importing from qiime2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
if (!requireNamespace("h5lite", quietly = TRUE)) {
  message("Suggested package: 'h5lite' is not installed")
  knitr::opts_chunk$set(eval = FALSE)
} else {
  library(h5lite)
}
library(strollur)
```

*strollur* includes the function `read_qiime2()` as well as several functions to read [qiime2](https://qiime2.org) output files individually. To create a dataset from the outputs of the qiime2, run the following:

```{r}
qza_files <- c(
  strollur_example("rep_seqs.qza"),
  strollur_example("table.qza"),
  strollur_example("taxonomy.qza"),
  strollur_example("rooted-tree.qza")
)

data <- read_qiime2(
  qza = qza_files,
  metadata = strollur_example("sample_metadata.tsv"),
  dataset_name = "qiime_moving_pictures"
)
```

To view a summary of data:

```{r}
data
```

## Reading Individual Files

* `unpack_qiime2_artifact()` unpacks qiime2 *qza* files and returns an artifact
* `read_qiime2_feature_table()` reads a qiime2 *qza* file containing bin data
* `read_qiime2_taxonomy()` reads a qiime2 *qza* file containing containing taxonomy data
* `read_qiime2_metadata()` read qiime2 *tsv* table containing metadata

To create a dataset and read the individual files, you can use the functions below. First let's create a dataset named my_data.

```{r}
my_data <- new_dataset(dataset_name = "my_data")
```

To decompress the individual qza files, you can use the `unpack_qiime2_artifact()`. unpack_qiime2_artifact will extract a folder containing the data, versioning and provenance information. Let's take a look at the rep_seqs.qza file first.

```{r}
fasta_artifact <- unpack_qiime2_artifact(qza = strollur_example("rep_seqs.qza"))
```

fasta_artifact is a list. To add the [FASTA](https://www.ncbi.nlm.nih.gov/genbank/fastaformat/) data to your dataset you can use the `read_fasta()` function:

```{r}
fasta_file <- file.path(
  getwd(), "rep_seqs", fasta_artifact$uuid, "data",
  "dna-sequences.fasta"
)
fasta_data <- strollur::read_fasta(fasta = fasta_file)
```

fasta_data is a data.frame containing sequence names, sequence nucleotide strings, and comments if provided. You can add the FASTA sequences to your strollur object using the `add()` function:

```{r}
add(
  my_data,
  table = fasta_data,
  type = "sequence"
)
```

You can remove the fasta_artifact using the unlink function:

```{r}
unlink(file.path(getwd(), "rep_seqs"), recursive = TRUE)
```

Now that we have added the FASTA sequences, let's read the abundance data found in the table.qza file. Since we know that the sequences are in fact features, we will assign the abundances to the sequences and then we can assign the sequences to bins to create [Amplicon Sequence Variant](https://en.wikipedia.org/wiki/Amplicon_sequence_variant) *asv* clusters. Let's use the `read_qiime2_feature_table()` function to extract the abundance table.

```{r}
abundance <- read_qiime2_feature_table(strollur_example("table.qza"))

assign(
  my_data,
  table = abundance$data,
  type = "sequence_abundance",
  table_names = list(sequence_name = "bin_name")
)

assign(
  my_data,
  table = abundance$data,
  type = "bin",
  bin_type = "asv",
  table_names = list(sequence_name = "bin_name")
)

my_data
```

Now that we have added the abundance data and assigned the sequences to bins, let's take a look at the taxonomic classifications provided in the taxonomy.qza file. We will use the `read_qiime2_taxonomy()` function to extract the feature classification table.

```{r}
taxonomy <- read_qiime2_taxonomy(strollur_example("taxonomy.qza"))

assign(
  my_data,
  table = taxonomy$data,
  type = "sequence_taxonomy",
  table_names = list(sequence_name = "bin_name")
)
```


Now, lets add a tree that shows the relationships between the sequences (features).

```{r}
tree_artifact <- unpack_qiime2_artifact(
  qza =
    strollur_example("rooted-tree.qza")
)
```

We can read the tree using the `ape::read.tree()` function.

```{r}
tree_file <- file.path(
  getwd(), "rooted-tree", tree_artifact$uuid, "data",
  "tree.nwk"
)
sequence_tree <- ape::read.tree(tree_file)
```

You can add the tree to your dataset as follows:

```{r}
my_data$add_sequence_tree(tree = sequence_tree)
```

You can remove the tree_artifact using the following:

```{r}
unlink(file.path(getwd(), "rooted-tree"), recursive = TRUE)
```

Lastly, let's read the metadata provided by qiime2 using the `read_qiime2_metadata()` function. 

```{r}
metadata <- read_qiime2_metadata(
  metadata =
    strollur_example("sample_metadata.tsv")
)
```

To add the metadata to the my_data dataset, run the following:

```{r}
add(my_data, table = metadata, type = "metadata")
```

To view a summary of your imported dataset, 'my_data', run the following:

```{r}
my_data
```