---
title: "Using minimapR"
author: "Jake Reed"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using minimapR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction
'minimapR' is an R wrapper for 'minimap2', a long-read aligner developed by Heng Li <me@liheng.org>. 'minimap2' aligns long reads from the PacBio and Oxford Nanopore Technologies (ONT) sequencing platforms to a reference genome and is used in many bioinformatics workflows.

```{r setup}
library(minimapR)
```


# Installation
'minimap2' and 'samtools' must be installed on your system, along with the R packages 'Rsamtools', 'git2r', and 'pafr'. To install 'minimap2', run the function below or follow the instructions it prints for your operating system. Replace "/your/path/to/directory/for/install" with the path to the directory where you want 'minimap2' installed.
```{r installation, eval = FALSE}
minimap2_installation("/your/path/to/directory/for/install")
```
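
The R package dependencies can be checked before going further. The following is a minimal sketch using only base R; `missing_packages()` is a hypothetical helper written for illustration, not a 'minimapR' function.

```r
# Hypothetical helper (not part of minimapR): return the packages from
# `pkgs` that cannot be loaded from the current R library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# R packages named in the installation notes above
missing_packages(c("Rsamtools", "git2r", "pafr"))
```

An empty result means all three packages are available. 'Rsamtools' is distributed through Bioconductor, so it is typically installed with `BiocManager::install("Rsamtools")` rather than `install.packages()`.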

Check that the dependencies were installed successfully.
```{r check}
minimap2_check()
samtools_check()
```
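
If either check fails, it can help to confirm whether the executables are visible on the `PATH` of the R session. The sketch below uses base R's `Sys.which()` and is independent of `minimap2_check()` and `samtools_check()`. `tools_on_path()` is a hypothetical helper, and a tool installed in a custom directory will be reported as missing until that directory is added to `PATH`.

```r
# Hypothetical helper (not part of minimapR): report which command-line
# tools can be found on the PATH of the current R session.
tools_on_path <- function(tools) {
  found <- nzchar(Sys.which(tools))  # Sys.which() returns "" when not found
  names(found) <- tools
  found
}

tools_on_path(c("minimap2", "samtools"))
```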

# Usage
## Downloading data for example
We will download the reference genomes and query sequences used in this example. The references are the human genome (GRCh38) and the yeast genome (S288C). The query sequences are human ONT reads and yeast PacBio HiFi reads.
```{r download}
tmp_folder <- tempdir()
cat("Temporary folder is:", tmp_folder, "\n")
href_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/GRCh38_chr1_50m.fa"
hfq_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/ont_hs_sample.fastq.gz"
yref_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/S288C_ref_genome.fasta"
yfq_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/yeast_sample_hifi.fastq.gz"
url_list <- c(href_url, hfq_url, yref_url, yfq_url)
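# mode = "wb" keeps the gzipped FASTQ files intact on Windows
invisible(lapply(url_list, function(x) {
  download.file(x, destfile = file.path(tmp_folder, basename(x)), mode = "wb")
}))

# Paths to the downloaded files, in the same order as url_list:
# human reference, human reads, yeast reference, yeast reads
fa_list <- file.path(tmp_folder, basename(url_list))
cat("Downloaded files are:", "\n")
fa_list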
```
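
Before aligning, it is worth confirming that every download completed. A minimal sketch in base R follows. `check_files()` is a hypothetical helper, and a non-zero size only shows that something was written, not that the file is complete or valid.

```r
# Hypothetical helper (not part of minimapR): summarise whether each file
# exists and how large it is on disk.
check_files <- function(paths) {
  data.frame(
    file = basename(paths),
    exists = file.exists(paths),
    size_bytes = file.size(paths),  # NA for files that do not exist
    stringsAsFactors = FALSE
  )
}

# Self-contained demonstration with one real and one missing file
demo_file <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGT"), demo_file)
check_files(c(demo_file, file.path(tempdir(), "no_such_file_xyz.fa")))
```

In this vignette the equivalent call would be `check_files(fa_list)`.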

## Aligning reads to the reference genomes
We will align the human ONT reads to the human genome and the yeast HiFi reads to the yeast genome using the 'minimap2' function. The function, called here with `return = TRUE`, returns a data frame with the alignment information.
```{r align}
# Human ONT alignment
h_out <- file.path(tmp_folder, "ont_hs_sample")
h_alignment <- minimap2(reference = fa_list[1], 
                        query_sequences = fa_list[2], 
                        output_file_prefix = h_out, 
                        preset_string = "map-ont", 
                        threads = 4, 
                        return = TRUE, 
                        verbose = FALSE)

# Yeast HiFi alignment
y_out <- file.path(tmp_folder, "yeast_sample_hifi")
y_alignment <- minimap2(reference = fa_list[3], 
                        query_sequences = fa_list[4], 
                        output_file_prefix = y_out, 
                        preset_string = "map-hifi", 
                        threads = 4, 
                        return = TRUE, 
                        verbose = TRUE)

# Check the alignment
cat("Alignment for the human sample is:\n")
h_alignment[1:5, 1:7]
cat("Alignment for the yeast sample is:\n")
y_alignment[1:5, 1:7]
```
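
For orientation, each call above corresponds roughly to a 'minimap2' command line of the form `minimap2 -ax <preset> -t <threads> <reference> <reads>`. The sketch below builds that argument vector in base R. It illustrates the 'minimap2' command-line interface only and is an assumption about, not a description of, how 'minimapR' invokes the aligner. `minimap2_args()` is a hypothetical helper, and the file names are placeholders.

```r
# Hypothetical helper (not part of minimapR): build the arguments for a
# plain minimap2 run that writes SAM output.
minimap2_args <- function(reference, query, preset, threads = 1) {
  c("-a",                          # output alignments in SAM format
    "-x", preset,                  # preset, e.g. "map-ont" or "map-hifi"
    "-t", as.character(threads),   # number of threads
    shQuote(reference),            # quote paths in case they contain spaces
    shQuote(query))
}

mm2_args <- minimap2_args("ref.fa", "reads.fastq.gz",
                          preset = "map-ont", threads = 4)
cat("minimap2", mm2_args, "\n")

# With minimap2 on the PATH, the command could be run as:
# system2("minimap2", mm2_args, stdout = "alignment.sam")
```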

# Clean up
```{r cleanup}
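# Empty the session temporary directory but keep the directory itself,
# which R still needs for the rest of the session
unlink(list.files(tmp_folder, full.names = TRUE), recursive = TRUE)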
sessionInfo()
```