<!--
%\VignetteEngine{knitr::docco_linear}
%\VignetteIndexEntry{synlik_vignette}
-->
  
  An Introduction to **synlik**
=======================================
  
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(out.extra='style="display:block; margin: auto"', fig.align="center", tidy=FALSE)
```


Introduction
------------
  
The `synlik` package provides Synthetic Likelihood methods for intractable likelihoods. The package is meant to be 
as general purpose as possible: as long as you are able to simulate data from your model you should be able to fit it.


Creating a **synlik** object
----------------------------
A `synlik` object is mainly composed of the `simulator`, which is the function that simulates data from the model of interest.
In addition, it is possible to specify a function `summaries`, which transforms the data generated by `simulator` into 
summary statistics. The `simulator` can generate any kind of output, as long as `summaries` is able to transform it into a matrix
where each row is a simulated vector of statistics. If `summaries` is not specified, then `simulator` has to output such a matrix.
  
Here we set-up a `synlik` object representing the Ricker map. The observations are given by $Y_t \sim Pois(\phi N_t)$, where the hidden state has the following dynamics: $N_t = r N_{t-1} exp( -N_{t-1} + e_t )$. This is how we create the object:
```{r ricker_co0nstr, results='hide'}
library(synlik)
ricker_sl <- synlik(simulator = rickerSimul,
                    summaries = rickerStats,
                    param = c( logR = 3.8, logSigma = log(0.3), logPhi = log(10) ),
                    extraArgs = list("nObs" = 50, "nBurn" = 50)
)
```
Here:
* `rickerSimul` and `rickerStats` are functions provided by `synlik`.
* `param` is a named vector that contains the log-parameters that will be used by `rickerSimul(param, nsim, extraArgs, ...)`. 
* `extraArgs` contains additional arguments required by `rickerSimul`, see `?rickerSimul` for details.

Now we are ready to simulate data from the object:
```{r ricker_simul}
ricker_sl@data <- simulate(ricker_sl, nsim = 1, seed = 54)
```
Here `ricker_sl@data` is just a vector, but **synlik** allows the simulator to simulate any kind of object, so it is often
necessary encapsulate an adequate plotting function into the object:
```{r ricker_plot}
ricker_sl@plotFun <- function(input, ...) plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
plot(ricker_sl)
```
If we want to simulate several datasets we simply do:
```{r ricker_simul_stats}
tmp <- simulate(ricker_sl, nsim = 10)
dim(tmp)
```

So far we have just simulated data, not summary statistics. In this particular example `rickerStats` needs to be passed the reference data in `ricker_sl@data` in order to be able to calculate the statistics. We can do that by using the slot `extraArgs`:
```{r}
ricker_sl@extraArgs$obsData <- ricker_sl@data
```
Now we are ready to simulate summary statistics:
```{r}
tmp <- simulate(ricker_sl, nsim = 2, stats = TRUE)
tmp
``` 
 and to check their approximate normality:
```{r, results='hide'}
checkNorm(ricker_sl)
```  
 
 
Looking at the synthetic likelihood
----------------------------- 

If we want to estimate the value of the synthetic likelihood at a certain location in the parameter space, we can do it by using the function `slik` as follows:
```{r ricker_slik}
slik(ricker_sl, 
     param  = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
     nsim   = 1e3)
```  
We can also look at slices of this function wrt each parameter:
```{r ricker_slice}
slice(object = ricker_sl, 
      ranges = list("logR" = seq(3.5, 3.9, by = 0.01),
                    "logPhi" = seq(2, 2.6, by = 0.01),
                    "logSigma" = seq(-2, -0.5, by = 0.02)), 
      param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)), 
      nsim = 1000)
```
Finally we can have 2D slices with respect to pairs of parameters:
```{r ricker_slice_2D}
slice(object = ricker_sl, 
      ranges = list("logR" = seq(3.5, 3.9, by = 0.02),
                    "logPhi" = seq(2, 2.6, by = 0.02)), 
      pairs = TRUE,
      param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)), 
      nsim = 1000, 
      multicore = TRUE,
      ncores = 2)
```
Notice that here we have used the `multicore` option, which distributes the computation among different cores or cluster nodes.
Also `slik` provides this option, but for such a simple model the time needed to set up the cluster is longer than the simulation time.
   
   
Estimating the parameters by MCMC
----------------------------- 
  
The unknown model parameters can be estimated by MCMC, using the `smcmc` function. 
Here is an example (you might want to increase `niter`):
```{r ricker_smcmc}
ricker_sl <- smcmc(ricker_sl, 
                   initPar = c(3.2, -1, 2.6),
                   niter = 10, 
                   burn = 3,
                   priorFun = function(input, ...) sum(input), 
                   propCov = diag(c(0.1, 0.1, 0.1))^2, 
                   nsim = 500)
```
Notice that `priorFun` returns the log-density of the prior. If we have not reached convergence we can do some more MCMC iterations by using the `continue` generic:
```{r ricker_continue}
ricker_sl <- continue(ricker_sl, niter = 10)
``` 
Finally we can plot the MCMC output (here we plot a pre-computed object):
```{r ricker_plot_smcmc}
data(ricker_smcmc)
addline1 <- function(parNam, ...) abline(h = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3) 
addline2 <- function(parNam, ...) abline(v = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3)

plot(ricker_smcmc, addPlot1 = "addline1", addPlot2 = "addline2")
``` 
were we have added the green dotted lines, indicating the position of the true parameters.
 
 
Blowfly example
----------------------------

As a more challenging example, let us consider the Blowfly model proposed by Wood (2010):
$$latex
N_{t} = R_t + S_t,
$$
where
$$latex
R_t \sim \text{Pois}(PN_{t-\tau}e^{-\frac{N_{t-\tau}}{N_0}}e_t),
$$
represents delayed recruitment process, while:
$$
S_t \sim \text{binom}(e^{-\delta\epsilon_t}, N_{t-1}),
$$
denotes the adult survival process. Finally $e_t$ and $\epsilon_t$ are independent gamma distributed random variables, with unit means and variances equal to $\sigma_p^2$ and $\sigma_d^2$ respectively. We start by creating a `synlik` object:
```{r blow_constr}
blow_sl <- synlik(simulator = blowSimul,
                  summaries = blowStats,
                  param = log( c( "delta" = 0.16, "P" = 6.5, "N0" = 400, 
                                  "var.p" = 0.1, "tau" = 14, "var.d" = 0.1)  ),
                  extraArgs = list("nObs" = 200, "nBurn" = 200, "steps" = 1),
                  plotFun = function(input, ...){ 
                              plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
                           }
)
```
for more details see `?blow_sl`. As before we simulate data and we store it back into the object:
```{r blow_simul}
blow_sl@data <- simulate(blow_sl, seed = 84)
blow_sl@extraArgs$obsData <- blow_sl@data
```
As for the Ricker example, `blowStats` needs the observed data to calculate the statistics, hence we store it 
into `extraArgs$obsData`. Notice that this is just a peculiarity of the chosen `summaries` function. 
Then, we can estimate the parameters by adaptive MCMC (you might want to increase `niter`):
```{r blow_smcmc}
blow_sl <- smcmc(blow_sl, 
                 initPar = log( c( "delta" = 0.1, "P" = 8, "N0" = 600, 
                                   "sig.p" = 0.2, "tau" = 17, "sig.d" = 0.3)  ),
                 niter = 2, 
                 burn = 0,
                 propCov = diag(rep(0.001, 6)),
                 nsim = 500, 
                 prior = function(input, ...){
                           sum(input) +
                           dunif(input[4], log(0.01), log(1), log = TRUE) +
                           dunif(input[6], log(0.01), log(1), log = TRUE)
                 },
                 targetRate = 0.15,
                 multicore = FALSE
)
```
We plot the results on the original scale (here we plot a pre-computed object):
```{r blow_plot}
data(blow_smcmc)
tmpTrans <- rep("exp", 6)
names(tmpTrans) <- names(blow_smcmc@param)
plot(blow_smcmc, trans = tmpTrans)
```
The package includes some of Nicholson (1954) experimental datasets, which can be loaded into the object:
```{r bf1}
data(bf1)
blow_sl@data <- bf1$pop
blow_sl@extraArgs$obsData <- blow_sl@data
```
Then it is possible to fit the model using the experimental data by MCMC.

Alpha-stable distribution
----------------------------
Let us consider a model that does not produce time series data: the alpha-stable distribution, as
described in Nolan (2012). For a quick reference do `library("stableDist")` and `?rstable`. 
As a first step we need to wrap the function `rstable`, which simulates random variables from this distribution,
into a function that fits the `synlik` framework:
```{r stableSimul}
stableSimul <- function(param, nsim, extraArgs, ...)
{
  if( !is.loaded("stabledist") ) library(stabledist) 
  
  # Some sanity check
  if( !( c("nObs") %in% names(extraArgs) ) ) stop("extraArgs should contain nObs")
  nObs <- extraArgs$nObs 
  stopifnot( length(param) == 4 )
  param[c(1, 3)] <- exp(param[c(1, 3)])
  if(abs(param[1] - 1) < 0.01) stop("alpha == 1 is not allowed")
    
  # Actual simulation
  output <- rstable(nObs * nsim, alpha = param[1], beta = param[2], gamma = param[3], delta = param[4])
  
  return( matrix(output, nsim, nObs) )
}
```
We need also to set up a function that calculates the summary statistics, which in our case are 10 quantiles:
```{r stableStats}
stableStats <- function(x, extraArgs, ...){  
  
  if (!is.matrix(x)) x <- matrix(x, 1, length(x))
  
  X0 <- t( apply(x, 1, quantile, probs = seq(0.1, 0.9, length.out = 10)) )
  
  unname(X0)
}
```
We then create a `synlik` object:
```{r stable_constr}
stable_sl <- synlik( simulator = stableSimul,
                     summaries = stableStats,
                     param = c(alpha = log(1.5), beta = 0.1, gamma = log(1), delta = 2),
                     extraArgs = list("nObs" = 1000),
                     plotFun = function(input, ...) hist(input, xlab = "x", main = "The data", ...)
                     )
stable_sl@data <- simulate(stable_sl, seed = 67)
plot(stable_sl)
```
As we see from the histogram, the distribution is quite fat-tailed.
As before, can then estimate the parameters by MCMC (you might want to increase `niter`):
```{r stable_smcmc}
stable_sl <- smcmc(stable_sl, 
                   initPar = c(alpha = log(1.7), beta = -0.1, gamma = log(1.3), delta = 1.5),
                   niter = 2,
                   burn = 0,
                   priorFun = function(input, ...) { 
                                 dunif(input[1], log(1), log(2), log = TRUE) + 
                                 dunif(input[2], -1, 1, log = TRUE) 
                                 }, 
                   propCov = diag(c(0.1, 0.1, 0.1, 0.1))^2, 
                   targetRate = 0.25,
                   nsim = 200)
# plot(stable_sl, trans = c("alpha" = "exp", "gamma" = "exp"))
```
By slicing the likelihood we check whether the model is well identified:
```{r}
slice(object = stable_sl, 
      ranges = list("alpha" = log(seq(1.2, 1.9, by = 0.05)), 
                    "beta"  = seq(-0.5, 0.5, by = 0.05),
                    "gamma" = log(seq(0.5, 1.9, by = 0.05)),
                    "alpha" = seq(1.1, 2.2, by = 0.05)), 
      param = stable_sl@param, 
      trans = list("alpha" = "exp", "gamma" = "exp"),
      nsim = 1000, 
      multicore = TRUE,
      ncores = 2)
```
We probably could do better by putting some more effort in choosing the summary statistics.


References
----------------------------

* Simon N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102--1104, 2010.

* Alexander J Nicholson. An outline of the dynamics of animal populations. Australian Journal of Zoology, 2(1):9--65, 1954.

* Diethelm Wuertz, Martin Maechler and Rmetrics core team members. (2013). stabledist: Stable Distribution Functions. R package version 0.6--6.