---
title: "Introduction - How to use EcoDiet"
author: "Heloise Thero, Pierre-Yves Hernvann"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{1. Introduction - How to use EcoDiet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

EcoDiet is a new tool for assimilating data in food-web studies. The goal of the package is to quantify trophic interactions between food-web components by combining stomach content analyses and biotracers in a Bayesian hierarchical model. To do so, the model simultaneously estimates a probabilistic topology matrix of the trophic network and the diet matrix. The topology matrix is a probabilistic description of "who eats whom" in the food web nad is composed of the probabilities that trophic links exist between food-web components, i.e. the probabilities $\eta$ that any prey is eaten by a given predator. The diet matrix is estimated conditionally to this topology and contains all diet proportions $\Pi$, hence it expresses the contribution (in %) of any prey to a given predator's diet.

The full EcoDiet model and its application to real and simulated datasets are described in *Hernvann et al. 2022, Ecol Appl*. Use `citation("EcoDiet")` to get the full reference.

Several options are available within the present package:

* *Default option* - only stomach content and biotracer data are integrated into EcoDiet.

* *Literature option* - stomach content and biotracer data are integrated and results from expert knowledge or the literature are used to formulate informative priors on the topology and diet matrices.

The example showcased here is an artificial dataset, created to be simple to visualize and understand. In this example as in *Hernvann et al. 2022, Ecol Appl* the biotracers used are stable isotopes but EcoDiet could be used to treat other analyses (e.g., fatty acids, specific-compounds stable isotopes).

Before running the following code, please ensure that EcoDiet and the required tools are correctly installed and set up on your computer. Instructions can be found in the [README](https://github.com/pyhernvann/EcoDiet/blob/master/README.md)'s instructions. If you're all set, let's load the EcoDiet package:

```{r}
library(EcoDiet)
```

## 1. Load and check your data

To apply EcoDiet to your own data, your stomach content, biotracer and literature data should be in a specific format, similar to those of following example. This data can for instance be imported into R using `.csv` files:

```{r, eval = FALSE}
example_stomach_data <- read.csv("./data/my_stomach_data.csv")
example_biotracer_data <- read.csv("./data/my_biotracer_data.csv")
```

Here we will demonstrate how EcoDiet works on a simulated dataset stored within the EcoDiet package. 

### Stomach content data

The stomach content table gathers the sum of occurrences of each prey trophic group in the analyzed stomachs of each consumer trophic group. The first column of the table contains the names of the prey trophic groups and the headers of the following columns contain the names of all the predator trophic groups. The last row of the table should be named "full", and indicates how many (non-empty) stomachs have been analyzed for each trophic group.

```{r}
example_stomach_data <- read.csv(system.file("extdata", "example_stomach_data.csv",
                                    package = "EcoDiet"))
knitr::kable(example_stomach_data)
```

In this example, for the "huge" animals, `r example_stomach_data$huge[5]` stomachs were analyzed and contained remainings. Among these stomachs, `r example_stomach_data$huge[2]` contained "large" animal remainings and `r example_stomach_data$huge[3]` contained "medium" animal remainings.

If some trophic groups of the studied food web don't have associated stomach content analyses, you should add a line in this table indicating the name of this group and *0* in the other rows (it is the case here for "small" animals that are at the base of the trophic network).

### Biotracer data

The Biotracer table contains the analyses of different tracer concentrations. Each line of the table represents one individual on which were conducted biotracer analyses for various elements (here, stable isotope analyses for carbon and nitrogen). The first column of the table should be called "group" and contain name of the trophic group the individual belongs. The rest of the table contains the measures, one column corresponding to one biotracer.

```{r}
example_biotracer_data <- read.csv(system.file("extdata", "example_biotracer_data.csv",
                                    package = "EcoDiet"))
knitr::kable(example_biotracer_data)
```


All the trophic groups of the studied food web should be present in this biotracer table. Thus, if biotracer analyses are missing for one given trophic group you still have to enter one line indicating the name of the trophic group and *NA* in the following columnds (as it is the case for the "huge" group).

The biotracers like stable isotope analyses are integrated in the model based on the trophic enrichment concept. Thus, for each biotracer informed, you also need to define the corresponding trophic discrimination factors corresponding to your the biotracers. In this example, we use common trophic discrimination factors for carbon and nitrogen:

```{r, eval = FALSE}
trophic_discrimination_factor = c(0.8, 3.4)
```

## 2. Preprocess your data without literature data (*Default option*)

**If you have data extracted from the literature, skip this and go to section 3.**

Read this section if you **don't** want to integrate information from the literature / expert knowledge.
This is specified using the `literature_configuration` argument.

```{r}
literature_configuration <- FALSE
```

### Preprocess the data

The `preprocess_data` function checks and rearranges the input data into a specific format that is read when running the EcoDiet model:

```{r}
data <- preprocess_data(stomach_data = example_stomach_data,
                        biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration
                        )
```

Error messages will appear if the input data hasn't been provided in the expected shape. Please read carefully the error message and rearrange your data in the correct format.

Stomach content dataset dominated by numerous small values of occurrence might be problematical as to few groups could be identified as potential prey. To avoid such situations, you can choose to upscale the stomach content data so that occurrences are artificially increased but their relative importance are conserved. This is performed by specifying `rescale_stomach = TRUE` when preprocessing the data.

```{r}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        rescale_stomach = TRUE)
```

### Check the trophic links

By default, the links that will be investigated by the model will be those derived from at least one observation in the stomach content analyses. This is a conservative assumption to avoid investigating all the possible trophic links in the food web, since it would make the model too complex and require too long runs, while also reduce the discrimination power of the model (see Hernvann et al. 2022). The trophic links that will be investigated can be summarized by a fixed topology computed by the `preprocess_data` based on stomach contents (i.e., `data$o`):

```{r}
topology <- 1 * (data$o > 0)
print(topology)
```

If you want to add other important link that may not be observed in scat contents (or, on the contrary, remove false positive trophic links identified in the stomach contents), you can modify directly this binary topology matrix. This operation can be useful, for instance, if your stomach sampling size is too low and you missed a prey you are definitely sure that the predator feeds on, or if you know that the identification of a prey will be reduced due to very small size or high digestibility. To specify that the "huge" animal can also eat "small" animals, you can do:

```{r}
topology["small", "huge"] <- 1
print(topology)
```

To run EcoDiet on this updated binary topology, you will have to append the previous one by re-running the `preprocess_data` as follows:

```{r}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data)
```


## 3. Preprocess your data with literature data (*Literature option*)

**If you don't have data extracted from the literature, skip this and go to section 4.**

Read this section if you **do** want to integrate information from the literature / expert knowledge.
This is specified using the `literature_configuration` argument.

```{r}
literature_configuration <- TRUE
```

### Define the priors

A literature diet table is used to set priors on the trophic link probabilities $\eta$ and the diet proportions $\Pi$. This table is similar to the stomach contents table, as all trophic groups must be included in the columns and rows. The numbers are the average diet proportions found in the literature. Here, the selected studies have identified that "huge" animals eat equally "large" and "medium" animals (thus the 0.5 and 0.5 numbers in the first column). The proportions for a given predator (i.e., within a given column) must sum to 1. The "small" animals are at the base of the ecosystem, so the column is filled with zeros.

The last row of the table corresponds to the literature pedigree score. This score (a number from 0 to 1) quantifies the literature reliability on each predator's diet. Here the dietary proportions from the literature are used to produce reliable estimates for the "huge" animals, e.g., the pedigree score associated is high (0.9). On the contrary, the diet proportions for the "medium" animals come from an older article focusing on a very different ecosystem so estimates produced are less reliable, e.g, the pedigree score is low (0.2). The pedigree score for the "small" animals is set at 1, because this group eats nothing. For more details please read the reference article.

```{r}
example_literature_diets_path <- system.file("extdata", "example_literature_diets.csv",
                                    package = "EcoDiet")
example_literature_diets <- read.csv(example_literature_diets_path)
knitr::kable(example_literature_diets)
```

This summary of the literature data will be used to formulate:

1. The priors on the topology matrix's $\eta$s. If a given literature diet proportion is zero, the corresponding prior Beta distribution of $\eta$ will be shifted toward 0. If the proportion is positive, the distribution will be shifted toward 1.

2. The priors on the diet matrix's $\Pi$s. The literature diet proportions are entered as the hyperparameters of the prior Dirichlet distribution of $\Pi$.

The Pedigree scores are used to determine the priors' precision. Other parameters can be used to adjust the prior distributions:

* the `nb_literature` parameter. The higher the number, the stronger the weight will be of the literature in the final inference on $\eta$. Setting this parameter to 10 is like saying that the prior from the literature will weigh as much as the additional data from 10 stomachs. Thus for any particular application, `nb_literature` should be set to a value smaller than the sample size in the available stomach content data.

```{r}
nb_literature = 10
```

* the `literature_slope` parameter (a value between 0 and 1). The higher the number, the stronger the weight will be of the literature in the final inference on $\Pi$. You should set this value depending on the value of your data (number of biotracers, etc).

```{r}
literature_slope = 0.5
```

### Preprocess the data

The `preprocess_data` function then checks and rearranges the data in a specific format so that the EcoDiet model can be run:

```{r}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)
```

If any error appears, it means your data is not in the correct format. Please read the error message and try to rearrange the data in the correct format.

If you have a lot of small values in the stomach occurences, you can choose to upscale the stomach content data:
```{r}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = example_stomach_data,
                        rescale_stomach = TRUE,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)
```

### Check the trophic links to investigate

These are the links that will be investigated by the model. It is not wise to assume that all the trophic links are possible. You therefore need to keep only the reasonnable trophic links (e.g., a shrimp cannot eat a whale). The matrix displayed by the `preprocess_data` function is based by default on the stomach content data (`data$o`) and on the literature diet matrix (`data$alpha_lit`):

```{r}
topology <- 1 * ((data$o > 0) | (data$alpha_lit > 0))
print(topology)
```

If you want to add another trophic link, you can modify directly the binary topology matrix. It can be useful if you are sure that a prey is consumed by a given predator. However the trophic link is not observed in the stomach content data, and the study extracted from the literature did not identify the prey. To specify that the "huge" animal can also eat "small" animals, you can do:

```{r}
topology["small", "huge"] <- 1
print(topology)
```

The new topology matrix can now be entered as an argument of the `preprocess_data` function:
```{r}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 10,
                        literature_slope = 0.5)
```

## 4. Plot the data and the priors

You can visualize your data with the `plot_data` function:

```{r, fig1, fig.height = 4, fig.width = 6}
plot_data(biotracer_data = example_biotracer_data,
          stomach_data = example_stomach_data)
```

You can save the figures as PNG in the current folder using: 

```{r, eval = FALSE}
plot_data(biotracer_data = example_biotracer_data,
          stomach_data = example_stomach_data,
          save = TRUE, save_path = ".")
```

Whether the priors are non-informative or informed by the literature, you can plot the mean of the prior distributions for the trophic link probabilities $\eta$ and the diet proportions $\Pi$:

```{r, fig.height = 4, fig.width = 6}
plot_prior(data, literature_configuration)
```

You can also see the prior distributions for one trophic group (or predator):

```{r, fig.height = 4, fig.width = 6}
plot_prior(data, literature_configuration, pred = "huge")
```

This way, you can change the prior parameters and see how it affects the prior distributions. Here, we will change the `nb_literature` parameter from 10 to 2:

```{r, fig.height = 4, fig.width = 6}
data <- preprocess_data(biotracer_data = example_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        topology = topology,
                        stomach_data = example_stomach_data,
                        literature_diets = example_literature_diets,
                        nb_literature = 2,
                        literature_slope = 0.5)

plot_prior(data, literature_configuration, pred = "huge", variable = "eta")
```


## 5. Run the model

The `write_model` function writes the model in the BUGS syntax. Here You need to specify whether you want to use or not priors form the literature since the model structure will be affected by this choice.
The model is written as a *.txt* for which you should specify a path and name through the  `file.name` argument.

To see the written model, you can turn on visualization using `print.model=TRUE` or opening the saved *.txt* file

```{r}
filename <- "mymodel.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)
```

First, run the model as a test (i.e., low adaption phase and low number of iterations) to check that the model is compiling properly. Specifying `run_param="test"` will by default run the model with nb_iter=1000 and nb_burnin=500.

```{r, eval = TRUE}
mcmc_output <- run_model(filename, data, run_param = "test")
```

The low numbers won't be enough to achieve a satisfactory model convergence. You should progressively increase the number of adaptation steps `nb_adapt` and of iterations `nb_iter` while checking the "Convergence warnings" message. You can specify these model run parameters by specifying `run_param=list(nb_iter=iter, nb_burnin=burnin, nb_thin=thin)`. There are also a couple of default run parameters (*"very short"*,*"short"*,*"normal"*,*"long"*,*"very long"*) that you can use in the with `run_param` and learn about by calling `?run_param`

Be aware that, depending on your data and especially the number of trophic groups and investigate trophic interactions, the model can take **hours or days** to run.


```{r, eval = FALSE}
mcmc_output <- run_model(filename, data, run_param=list(nb_iter=10000, nb_burnin=5000, nb_thin=5))
mcmc_output <- run_model(filename, data, run_param=list(nb_iter=50000, nb_burnin=25000, nb_thin=25))
mcmc_output <- run_model(filename, data, run_param=list(nb_iter=100000, nb_burnin=50000, nb_thin=50))

mcmc_output_example <- run_model(filename, data, run_param=list(nb_iter=50000, nb_burnin=25000, nb_thin=25))
```

Note that for much complex case studies, the dimension of the model may be considerable hence the time required by the runs may be dramatically long. In that context, while analyzing your results carefully, you might be willing to use your model outputs if a only a very limited number of variables don't reach convergence after especially long runs. 

Once a model reaches convergence objectives, don't forget to save it on your machine.

```{r, eval = FALSE}
save(mcmc_output_example, file = "./data/mcmc_output_example.rda")
```

### Diagnoses

Here we provide the `diagnose_model` function to perform simple diagnoses on the EcoDiet run. It displays the number of variables for which the Gelman-Rubin test remains higher than different thresholds, highlighting also the 10 worst variables in terms of convergence. The object created contains these values for all the model variables.

```{r}
Gelman_model <- diagnose_model(mcmc_output_example)
print(Gelman_model)
```

The function can also be used to produce diagnostic plots and save them as a *.pdf* file. Complex models will have so many variables that the process can be very long so the  user can also specify the variable for which graphs should be produced and stored.

```{r, eval = FALSE}
diagnose_model(mcmc_output_example, var.to.diag = "all", save = TRUE)
```

## 6. Plot and save the results

The object output by the `run_model` function contains all MCMC samples, summary statistics and the configuration of the run:

```{r}
str(mcmc_output_example)
```

Please read the documentation of the [jagsUI package](https://CRAN.R-project.org/package=jagsUI) for more information.

Below we provide some functions to for a simple exploration of the results.

### The mean results

The model's outputs are the approximated *a posteriori* distributions for the trophic links probabilities $\eta$ and the diet proportions $\Pi$. You can visualize the mean of these distribitions with the `plot_results` function:

```{r, fig.height = 4, fig.width = 6}
plot_results(mcmc_output_example, data)
```

You can access the main statistics by variable directly from the `jagsUI` object returned by the r`run_model` function, including the mean value of the variables:

```{r}
print(mcmc_output_example$summary[,"mean"])
```

### The probability distributions

The probability distributions can be plotted for one predator:

```{r, fig.height = 4, fig.width = 6}
plot_results(mcmc_output_example, data, pred = "huge")
```

```{r, fig.height = 4, fig.width = 6}
plot_results(mcmc_output_example, data, pred = "large")
```


## 7. Save another variable than $\Pi$ and $\eta$

You actually have the possibility to monitor and output the statistics of all the model parameters. For example you may be interested by the variable $\delta$ that represents the trophic discrimination factor. In the EcoDiet model, a different trophic discrimination factor is used for each trophic group and for each element, allowing some differences between species. We can chose to monitor these parameters when running the model by using the `variables_to_save` argument:

```{r, eval=FALSE}
mcmc_output_delta <- run_model(filename, data, 
          variables_to_save = c("delta"),
          run_param = "test")
```

And now you can access the mean value using:

```{r, eval=FALSE}
print(mcmc_output_delta$summary[,"mean"])
```