---
title: "Parallel Testing"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parallel Testing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(segtest)
```

We provide a walk-through of how to run tests for segregation distortion at many loci in parallel.

Data sets `ufit`, `ufit2`, and `ufit3` contain the genotyping output of `updog::multidog()` using three different models.

- `ufit`: Uses the `norm` model without specifying who the parents are.
- `ufit2`: Uses the `f1pp` model, specifying the parents.
- `ufit3`: Uses the `f1` model, specifying the parents.

You can convert this genotyping output to what `multi_lrt()` expects using `multidog_to_g()`.

If you did *not* use the `f1pp` or `f1` models, use either the `all_gl` (to run tests using genotype log-likelihoods) or `all_g` (to run tests assuming genotypes are known) options. In this case you must also specify the IDs of the parents.

```{r}
o1 <- multidog_to_g(ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp")
o2 <- multidog_to_g(ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp")
```

If you *did* use the `f1pp` or `f1` models, use either the `off_gl` (to run tests using genotype log-likelihoods) or `off_g` (to run tests assuming genotypes are known) options.

```{r}
o3 <- multidog_to_g(ufit2, type = "off_g")
o4 <- multidog_to_g(ufit2, type = "off_gl")
o5 <- multidog_to_g(ufit3, type = "off_g")
o6 <- multidog_to_g(ufit3, type = "off_gl")
```

We recommend *always* using genotype log-likelihoods. The known-genotype options remain available if you need them.

Parallelization support is provided through the `future` package. You first specify how to implement parallelization using `future::plan()`, then run `multi_lrt()`, and finally shut down the parallelization by calling `future::plan()` again. The most common plan is `future::multisession()`, where you specify the number of parallel processes with the `workers` argument. You can get the maximum number of workers via `future::availableCores()`:

```{r}
future::availableCores()
```

So a typical workflow looks something like this:
```{r}
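# Start two background R sessions as parallel workers
future::plan(future::multisession, workers = 2)
# Test every locus using the genotype log-likelihoods
mout <- multi_lrt(g = o2$g, p1 = o2$p1, p2 = o2$p2)
# Shut down the workers by returning to sequential processing
future::plan(future::sequential)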
```
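
If you would rather not manage the plan by hand, the set-up and tear-down can be wrapped in a small helper. The following is only a sketch: `with_parallel_plan()` is a hypothetical name, not a function from segtest or future. It relies on `future::plan()` returning the previous plan, which `on.exit()` then restores even if the computation fails.

```{r}
# Hypothetical helper (not part of segtest or future): run `fun()` under a
# temporary multisession plan, then restore whatever plan was active before.
with_parallel_plan <- function(fun, workers = 2) {
  # plan() invisibly returns the previously active plan
  oplan <- future::plan(future::multisession, workers = workers)
  # Restore the old plan on exit, including when `fun()` throws an error
  on.exit(future::plan(oplan), add = TRUE)
  fun()
}
```

Under these assumptions, the workflow above could be written as `mout <- with_parallel_plan(function() multi_lrt(g = o2$g, p1 = o2$p1, p2 = o2$p2))`.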

The output is a data frame whose most important columns are `snp` and `p_value`. As a reminder, please ignore the `alpha`, `xi1`, and `xi2` estimates. They are noisy and should not be used for real work.
```{r}
mout[c("snp", "p_value")]
```
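
With many loci, it can help to sort the results and account for multiple testing before picking out SNPs by eye. The sketch below is one possible approach, not a segtest recommendation: `rank_loci()` is a hypothetical helper, the Benjamini-Hochberg default is an assumption made here, and the only things taken from the output above are the `snp` and `p_value` column names.

```{r}
# Hypothetical helper (not part of segtest): add adjusted p-values and
# sort loci from most to least significant.
rank_loci <- function(res, method = "BH") {
  # stats::p.adjust() leaves NA p-values as NA
  res$p_adj <- stats::p.adjust(res$p_value, method = method)
  # order() places NA values last by default
  res[order(res$p_adj), c("snp", "p_value", "p_adj")]
}
```

Calling `rank_loci(mout)` would then list the loci with the smallest adjusted p-values first.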

From the p-values, SNPs *12_31029646* and *2_20070837* appear to be in possible segregation distortion. Let's look at the posterior mode genotypes of SNP 2_20070837:
```{r}
o1$g["2_20070837", ]
o1$p1["2_20070837"]
o1$p2["2_20070837"]
```
So SNP 2_20070837 likely got flagged because two individuals very likely have a genotype of 3, which is impossible if the parents have genotypes 0 and 2.

For SNP 12_31029646, let's compare the expected frequencies against the observed modes.
```{r}
offspring_gf_2(alpha = 0.1249, xi1 = 1/3, p1 = 2, p2 = 0)
o1$g["12_31029646", ] / sum(o1$g["12_31029646", ])
o1$p1["12_31029646"]
o1$p2["12_31029646"]
```
So SNP 12_31029646 likely got flagged because there are too many individuals with a genotype of 2, and not enough with a genotype of 0.
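
A side-by-side table can make this kind of comparison easier to read. The helper below is a minimal sketch: `compare_gf()` is a hypothetical name, and it assumes that the expected frequencies and the observed counts are both ordered by genotype, starting at genotype 0.

```{r}
# Hypothetical helper (not part of segtest): tabulate expected genotype
# frequencies next to observed proportions.
compare_gf <- function(expected, counts) {
  stopifnot(length(expected) == length(counts))
  data.frame(
    genotype = seq_along(counts) - 1,  # assumes the first entry is genotype 0
    expected = as.numeric(expected),
    observed = as.numeric(counts) / sum(counts)
  )
}
```

For the SNP above, this could be called as `compare_gf(offspring_gf_2(alpha = 0.1249, xi1 = 1/3, p1 = 2, p2 = 0), o1$g["12_31029646", ])`.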