An Introduction to **synlik** ======================================= ```{r setup, include=FALSE} library(knitr) opts_chunk$set(out.extra='style="display:block; margin: auto"', fig.align="center", tidy=FALSE) ``` Introduction ------------ The `synlik` package provides Synthetic Likelihood methods for intractable likelihoods. The package is meant to be as general purpose as possible: as long as you are able to simulate data from your model you should be able to fit it. Creating a **synlik** object ---------------------------- A `synlik` object is mainly composed of the `simulator`, which is the function that simulates data from the model of interest. In addition, it is possible to specify a function `summaries`, which transforms the data generated by `simulator` into summary statistics. The `simulator` can generate any kind of output, as long as `summaries` is able to transform it into a matrix where each row is a simulated vector of statistics. If `summaries` is not specified, then `simulator` has to output such a matrix. Here we set-up a `synlik` object representing the Ricker map. The observations are given by $Y_t \sim Pois(\phi N_t)$, where the hidden state has the following dynamics: $N_t = r N_{t-1} exp( -N_{t-1} + e_t )$. This is how we create the object: ```{r ricker_co0nstr, results='hide'} library(synlik) ricker_sl <- synlik(simulator = rickerSimul, summaries = rickerStats, param = c( logR = 3.8, logSigma = log(0.3), logPhi = log(10) ), extraArgs = list("nObs" = 50, "nBurn" = 50) ) ``` Here: * `rickerSimul` and `rickerStats` are functions provided by `synlik`. * `param` is a named vector that contains the log-parameters that will be used by `rickerSimul(param, nsim, extraArgs, ...)`. * `extraArgs` contains additional arguments required by `rickerSimul`, see `?rickerSimul` for details. Now we are ready to simulate data from the object: ```{r ricker_simul} ricker_sl@data <- simulate(ricker_sl, nsim = 1, seed = 54) ``` Here `ricker_sl@data` is just a vector, but **synlik** allows the simulator to simulate any kind of object, so it is often necessary encapsulate an adequate plotting function into the object: ```{r ricker_plot} ricker_sl@plotFun <- function(input, ...) plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...) plot(ricker_sl) ``` If we want to simulate several datasets we simply do: ```{r ricker_simul_stats} tmp <- simulate(ricker_sl, nsim = 10) dim(tmp) ``` So far we have just simulated data, not summary statistics. In this particular example `rickerStats` needs to be passed the reference data in `ricker_sl@data` in order to be able to calculate the statistics. We can do that by using the slot `extraArgs`: ```{r} ricker_sl@extraArgs$obsData <- ricker_sl@data ``` Now we are ready to simulate summary statistics: ```{r} tmp <- simulate(ricker_sl, nsim = 2, stats = TRUE) tmp ``` and to check their approximate normality: ```{r, results='hide'} checkNorm(ricker_sl) ``` Looking at the synthetic likelihood ----------------------------- If we want to estimate the value of the synthetic likelihood at a certain location in the parameter space, we can do it by using the function `slik` as follows: ```{r ricker_slik} slik(ricker_sl, param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)), nsim = 1e3) ``` We can also look at slices of this function wrt each parameter: ```{r ricker_slice} slice(object = ricker_sl, ranges = list("logR" = seq(3.5, 3.9, by = 0.01), "logPhi" = seq(2, 2.6, by = 0.01), "logSigma" = seq(-2, -0.5, by = 0.02)), param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)), nsim = 1000) ``` Finally we can have 2D slices with respect to pairs of parameters: ```{r ricker_slice_2D} slice(object = ricker_sl, ranges = list("logR" = seq(3.5, 3.9, by = 0.02), "logPhi" = seq(2, 2.6, by = 0.02)), pairs = TRUE, param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)), nsim = 1000, multicore = TRUE, ncores = 2) ``` Notice that here we have used the `multicore` option, which distributes the computation among different cores or cluster nodes. Also `slik` provides this option, but for such a simple model the time needed to set up the cluster is longer than the simulation time. Estimating the parameters by MCMC ----------------------------- The unknown model parameters can be estimated by MCMC, using the `smcmc` function. Here is an example (you might want to increase `niter`): ```{r ricker_smcmc} ricker_sl <- smcmc(ricker_sl, initPar = c(3.2, -1, 2.6), niter = 10, burn = 3, priorFun = function(input, ...) sum(input), propCov = diag(c(0.1, 0.1, 0.1))^2, nsim = 500) ``` Notice that `priorFun` returns the log-density of the prior. If we have not reached convergence we can do some more MCMC iterations by using the `continue` generic: ```{r ricker_continue} ricker_sl <- continue(ricker_sl, niter = 10) ``` Finally we can plot the MCMC output (here we plot a pre-computed object): ```{r ricker_plot_smcmc} data(ricker_smcmc) addline1 <- function(parNam, ...) abline(h = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3) addline2 <- function(parNam, ...) abline(v = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3) plot(ricker_smcmc, addPlot1 = "addline1", addPlot2 = "addline2") ``` were we have added the green dotted lines, indicating the position of the true parameters. Blowfly example ---------------------------- As a more challenging example, let us consider the Blowfly model proposed by Wood (2010): $$latex N_{t} = R_t + S_t, $$ where $$latex R_t \sim \text{Pois}(PN_{t-\tau}e^{-\frac{N_{t-\tau}}{N_0}}e_t), $$ represents delayed recruitment process, while: $$ S_t \sim \text{binom}(e^{-\delta\epsilon_t}, N_{t-1}), $$ denotes the adult survival process. Finally $e_t$ and $\epsilon_t$ are independent gamma distributed random variables, with unit means and variances equal to $\sigma_p^2$ and $\sigma_d^2$ respectively. We start by creating a `synlik` object: ```{r blow_constr} blow_sl <- synlik(simulator = blowSimul, summaries = blowStats, param = log( c( "delta" = 0.16, "P" = 6.5, "N0" = 400, "var.p" = 0.1, "tau" = 14, "var.d" = 0.1) ), extraArgs = list("nObs" = 200, "nBurn" = 200, "steps" = 1), plotFun = function(input, ...){ plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...) } ) ``` for more details see `?blow_sl`. As before we simulate data and we store it back into the object: ```{r blow_simul} blow_sl@data <- simulate(blow_sl, seed = 84) blow_sl@extraArgs$obsData <- blow_sl@data ``` As for the Ricker example, `blowStats` needs the observed data to calculate the statistics, hence we store it into `extraArgs$obsData`. Notice that this is just a peculiarity of the chosen `summaries` function. Then, we can estimate the parameters by adaptive MCMC (you might want to increase `niter`): ```{r blow_smcmc} blow_sl <- smcmc(blow_sl, initPar = log( c( "delta" = 0.1, "P" = 8, "N0" = 600, "sig.p" = 0.2, "tau" = 17, "sig.d" = 0.3) ), niter = 2, burn = 0, propCov = diag(rep(0.001, 6)), nsim = 500, prior = function(input, ...){ sum(input) + dunif(input[4], log(0.01), log(1), log = TRUE) + dunif(input[6], log(0.01), log(1), log = TRUE) }, targetRate = 0.15, multicore = FALSE ) ``` We plot the results on the original scale (here we plot a pre-computed object): ```{r blow_plot} data(blow_smcmc) tmpTrans <- rep("exp", 6) names(tmpTrans) <- names(blow_smcmc@param) plot(blow_smcmc, trans = tmpTrans) ``` The package includes some of Nicholson (1954) experimental datasets, which can be loaded into the object: ```{r bf1} data(bf1) blow_sl@data <- bf1$pop blow_sl@extraArgs$obsData <- blow_sl@data ``` Then it is possible to fit the model using the experimental data by MCMC. Alpha-stable distribution ---------------------------- Let us consider a model that does not produce time series data: the alpha-stable distribution, as described in Nolan (2012). For a quick reference do `library("stableDist")` and `?rstable`. As a first step we need to wrap the function `rstable`, which simulates random variables from this distribution, into a function that fits the `synlik` framework: ```{r stableSimul} stableSimul <- function(param, nsim, extraArgs, ...) { if( !is.loaded("stabledist") ) library(stabledist) # Some sanity check if( !( c("nObs") %in% names(extraArgs) ) ) stop("extraArgs should contain nObs") nObs <- extraArgs$nObs stopifnot( length(param) == 4 ) param[c(1, 3)] <- exp(param[c(1, 3)]) if(abs(param[1] - 1) < 0.01) stop("alpha == 1 is not allowed") # Actual simulation output <- rstable(nObs * nsim, alpha = param[1], beta = param[2], gamma = param[3], delta = param[4]) return( matrix(output, nsim, nObs) ) } ``` We need also to set up a function that calculates the summary statistics, which in our case are 10 quantiles: ```{r stableStats} stableStats <- function(x, extraArgs, ...){ if (!is.matrix(x)) x <- matrix(x, 1, length(x)) X0 <- t( apply(x, 1, quantile, probs = seq(0.1, 0.9, length.out = 10)) ) unname(X0) } ``` We then create a `synlik` object: ```{r stable_constr} stable_sl <- synlik( simulator = stableSimul, summaries = stableStats, param = c(alpha = log(1.5), beta = 0.1, gamma = log(1), delta = 2), extraArgs = list("nObs" = 1000), plotFun = function(input, ...) hist(input, xlab = "x", main = "The data", ...) ) stable_sl@data <- simulate(stable_sl, seed = 67) plot(stable_sl) ``` As we see from the histogram, the distribution is quite fat-tailed. As before, can then estimate the parameters by MCMC (you might want to increase `niter`): ```{r stable_smcmc} stable_sl <- smcmc(stable_sl, initPar = c(alpha = log(1.7), beta = -0.1, gamma = log(1.3), delta = 1.5), niter = 2, burn = 0, priorFun = function(input, ...) { dunif(input[1], log(1), log(2), log = TRUE) + dunif(input[2], -1, 1, log = TRUE) }, propCov = diag(c(0.1, 0.1, 0.1, 0.1))^2, targetRate = 0.25, nsim = 200) # plot(stable_sl, trans = c("alpha" = "exp", "gamma" = "exp")) ``` By slicing the likelihood we check whether the model is well identified: ```{r} slice(object = stable_sl, ranges = list("alpha" = log(seq(1.2, 1.9, by = 0.05)), "beta" = seq(-0.5, 0.5, by = 0.05), "gamma" = log(seq(0.5, 1.9, by = 0.05)), "alpha" = seq(1.1, 2.2, by = 0.05)), param = stable_sl@param, trans = list("alpha" = "exp", "gamma" = "exp"), nsim = 1000, multicore = TRUE, ncores = 2) ``` We probably could do better by putting some more effort in choosing the summary statistics. References ---------------------------- * Simon N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102--1104, 2010. * Alexander J Nicholson. An outline of the dynamics of animal populations. Australian Journal of Zoology, 2(1):9--65, 1954. * Diethelm Wuertz, Martin Maechler and Rmetrics core team members. (2013). stabledist: Stable Distribution Functions. R package version 0.6--6.