--- title: "Ellipse Overlap" author: "Andrew L Jackson" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Ellipse Overlap} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{viridis} \usepackage[utf8]{inputenc} --- ```{r, echo = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 5) ``` ## Calculating the Area of Overlap Between Two Ellipses ```{r} # remove previously loaded items from the current environment and remove previous graphics. rm(list=ls()) graphics.off() # Here, I set the seed each time so that the results are comparable. # This is useful as it means that anyone that runs your code, *should* # get the same results as you, although random number generators change # from time to time. set.seed(1) # load SIBER library(SIBER) library(viridis) # set a new three-colour palette from the viridis package palette(viridis::viridis(3)) # load in the included demonstration dataset data("demo.siber.data") # # create the siber object siber.example <- createSiberObject(demo.siber.data) # Or if working with your own data read in from a *.csv file, you would use # This *.csv file is included with this package. To find its location # type # fname <- system.file("extdata", "demo.siber.data.csv", package = "SIBER") # in your command window. You could load it directly by using the # returned path, or perhaps better, you could navigate to this folder # and copy this file to a folder of your own choice, and create a # script from this vingette to analyse it. This *.csv file provides # a template for how your own files should be formatted. # mydata <- read.csv(fname, header=T) # siber.example <- createSiberObject(mydata) # Create lists of plotting arguments to be passed onwards to the # plotting functions. With p.interval = NULL, these are SEA. NB not SEAc though # which is what we will base our overlap calculations on. This implementation # needs to be added in a future update. For now, the best way to plot SEAc is to # add the ellipses manually following the vignette on this topic. group.ellipses.args <- list(n = 100, p.interval = NULL, lty = 1, lwd = 2) par(mfrow=c(1,1)) plotSiberObject(siber.example, ax.pad = 2, hulls = F, community.hulls.args, ellipses = T, group.ellipses.args, group.hulls = F, group.hull.args, bty = "L", iso.order = c(1,2), xlab = expression({delta}^13*C~'permille'), ylab = expression({delta}^15*N~'permille') ) ``` In order now to calculate the overlap between two (or more) ellipses, we need to know the coordinates of each ellipse. This is done by calling `addEllipse(..., do.plot = FALSE)`. See the associated help file and the vignette [Customising-Plots-Manually](Customising-Plots-Manually.html) for more information on optional inputs to addEllipse for different types of ellipse. Also, bear in mind that the option `n` controls how many points are used to draw the ellipse, and hence low `n` means clunky, edgier ellipses, compared with rounder, smoother ellipses for higher `n`. A higher `n` is more suitable when ellipses are more eccentric as their curvature is greater at the tips. The new functions `maxLikOverlap` and `bayesianOverlap` are wrapper functions that take care of the calls to `addEllipse` and the actual polygon overlap function in the package `spatstat.utils`. The functions `maxLikOverlap` and `bayesianOverlap` return three values each: the computationally calculated area of the first ellipse, second ellipse, and the overlap between them. It is not entirely obvious to me that there is a single choice if you wish to express your overlap as a proportion, since there are several options for the choice of denominator. One can imagine that expressing the overlap as a proportion of the sum of the non-overlapping areas of the ellipses seems suitable in a general sense, since this will range from 0 when the ellipses are completely distinct, to 1 when the ellipses are completely coincidental. ```{r, MLoverlap} # In this example, I will calculate the overlap between ellipses for groups 2 # and 3 in community 1 (i.e. the green and yellow open circles of data). # The first ellipse is referenced using a character string representation where # in "x.y", "x" is the community, and "y" is the group within that community. # So in this example: community 1, group 2 ellipse1 <- "1.2" # Ellipse two is similarly defined: community 1, group3 ellipse2 <- "1.3" # The overlap of the maximum likelihood fitted standard ellipses are # estimated using sea.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, p.interval = NULL, n = 100) # the overlap betweeen the corresponding 95% prediction ellipses is given by: ellipse95.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, p.interval = 0.95, n = 100) # so in this case, the overlap as a proportion of the non-overlapping area of # the two ellipses, would be prop.95.over <- ellipse95.overlap[3] / (ellipse95.overlap[2] + ellipse95.overlap[1] - ellipse95.overlap[3]) ``` The function `bayesianOverlap` returns multiple rows of these three numbers, each representing the values for a particular draw from the posterior estimates so that you can build up a picture of the distribution of the estimated overlap. Calculating this overlap is computationally time consuming, and there are going to be thousands of posterior samples collected in a typical analysis. For this example, I will calculate the posterior overlap on the first 100 samples, but in reality you would probably want to do this on at least a few hundred, if not all your posterior samples in a longer (perhaps over-lunch or over-night) run. ```{r, bayesOverlap} # options for running jags parms <- list() parms$n.iter <- 2 * 10^4 # number of iterations to run the model for parms$n.burnin <- 1 * 10^3 # discard the first set of values parms$n.thin <- 10 # thin the posterior by this many parms$n.chains <- 2 # run this many chains # define the priors priors <- list() priors$R <- 1 * diag(2) priors$k <- 2 priors$tau.mu <- 1.0E-3 # fit the ellipses which uses an Inverse Wishart prior # on the covariance matrix Sigma, and a vague normal prior on the # means. Fitting is via the JAGS method. ellipses.posterior <- siberMVN(siber.example, parms, priors) # and teh corresponding Bayesian estimates for the overlap between the # 95% ellipses is given by: bayes95.overlap <- bayesianOverlap(ellipse1, ellipse2, ellipses.posterior, draws = 100, p.interval = 0.95, n = 100) # a histogram of the overlap hist(bayes95.overlap[,3], 10) # and as above, you can express this a proportion of the non-overlapping area of # the two ellipses, would be bayes.prop.95.over <- (bayes95.overlap[,3] / (bayes95.overlap[,2] + bayes95.overlap[,1] - bayes95.overlap[,3]) ) hist(bayes.prop.95.over, 10) ```