Ex 6: Cladocera

Here we step through the Cladocera Example using the script version of MixSIAR. For a demonstration using the GUI version, see the MixSIAR Manual. For a thorough walkthrough of how to use MixSIAR in a script, see the Wolves Example, which provides more commentary and explanation.

For a clean, runnable .R script, look at mixsiar_script_cladocera.R in the example_scripts folder of the MixSIAR package install:

library(MixSIAR)
mixsiar.dir <- find.package("MixSIAR")
paste0(mixsiar.dir,"/example_scripts")

You can run the Cladocera Example script directly with:

source(paste0(mixsiar.dir,"/example_scripts/mixsiar_script_cladocera.R"))

Cladocera Example

The Cladocera Example is from Galloway et al. 2014 and demonstrates MixSIAR applied to a 22-dimensional fatty acid dataset. Here the 14 mixture datapoints are Cladocera (water flea) fatty acid profiles from 6 lakes in Finland over 2 seasons. Besides the high dimensionality, the other difference with this analysis is that we fit each mixture datapoint individually, because there is no clear covariate structure (some sites have 2 seasons, some have 1, some sites in the same lake). We do this by creating an “id” column and treating “id” as a fixed effect.

  • 22 biotracers (carbon 14.0, 16.0, 16.1w9, 16.1w7, 16.2w4, 16.3w3, 16.4w3, 17.0, 18.0, 18.1w9, 18.1w7, 18.2w6, 18.3w6, 18.3w3, 18.4w3, 18.5w3, 20.0, 22.0, 20.4w6, 20.5w3, 22.6w3, BrFA)
  • Mix datapoints fit independently (“process error” only, MixSIR)
  • Source data as means and SDs

Here we fit the “process error” model of MixSIR, which we MUST do when we only have one mix datapoint (or here, one mix datapoint per fixed effect). With only one datapoint, there is no information to estimate an additional mixture variance term, so we have to assume a fixed variance based on the variance of the sources (see Moore and Semmens 2008).

Here we treat “id” as a fixed effect—this will estimate the diet of each mixture data point separately (sample size of 1). This makes sense to do when you think there will be clear differences between sites/seasons/etc., but only have 1 or 2 points from each site/season (i.e, you don’t have enough data to estimate a site/season effect). If you are interested in the site/season effect, you need replicates within each site/season, and then it is best to fit site/season as a fixed or random effect.

Fatty acid data greatly increase the number of biotracers beyond the typical 2 stable isotopes, d13C and d15N, which gives the mixing model power to resolve more sources. We caution, however, that using fatty acid data is not a panacea for the “underdetermined” problem (# sources > # biotracers + 1). As the number of sources increases, the “uninformative” prior (α = 1) has greater influence, even if there are many more biotracers than sources. See the Cladocera Example prior with 7 sources and 22 biotracers.

Load MixSIAR package

library(MixSIAR)

Load mixture data

See ?load_mix_data for details.

The cladocera consumer data has 1 covariate (factors="id"), which we fit as a fixed effect (fac_random=FALSE). “Site” is not nested within another factor (fac_nested=FALSE). There are no continuous effects (cont_effects=NULL).

# Replace the system.file call with the path to your file
mix.filename <- system.file("extdata", "cladocera_consumer.csv", package = "MixSIAR")

mix <- load_mix_data(filename=mix.filename,
                     iso_names=c("c14.0","c16.0","c16.1w9","c16.1w7","c16.2w4",
                                 "c16.3w3","c16.4w3","c17.0","c18.0","c18.1w9",
                                 "c18.1w7","c18.2w6","c18.3w6","c18.3w3","c18.4w3",
                                 "c18.5w3","c20.0","c22.0","c20.4w6","c20.5w3",
                                 "c22.6w3","BrFA"),
                     factors="id",
                     fac_random=FALSE,
                     fac_nested=FALSE,
                     cont_effects=NULL)

Load source data

See ?load_source_data for details.

We do not have any fixed/random/continuous effects or concentration dependence in the source data (source_factors=NULL, conc_dep=FALSE). We only have source means, SD, and sample size—not the original “raw” data (data_type="means").

# Replace the system.file call with the path to your file
source.filename <- system.file("extdata", "cladocera_sources.csv", package = "MixSIAR")

source <- load_source_data(filename=source.filename,
                           source_factors=NULL,
                           conc_dep=FALSE,
                           data_type="means",
                           mix)

Load discrimination data

See ?load_discr_data for details.

Note that Galloway et al. 2014 conducted feeding trials to create a “resource library”. In the mixing model, the sources are actually consumers fed exclusively each of the sources. This allowed them to set the discrimination = 0 (see isopod_discrimination.csv).

# Replace the system.file call with the path to your file
discr.filename <- system.file("extdata", "cladocera_discrimination.csv", package = "MixSIAR")

discr <- load_discr_data(filename=discr.filename, mix)

Plot data

DO NOT use the plot_data function! When there are more than 2 biotracers, MixSIAR currently plots every pairwise combination. Here, that means ${22 \choose 2} = 231$ plots are produced. Yikes! In the future, MixSIAR will offer non-metric multidimensional scaling (NMDS) plots for these cases.

Plot prior

Define your prior, and then plot using “plot_prior”

  • RED = your prior
  • DARK GREY = “uninformative”/generalist (alpha = 1)
  • LIGHT GREY = “uninformative” Jeffrey’s prior (alpha = 1/n.sources)
# default "UNINFORMATIVE" / GENERALIST prior (alpha = 1)
plot_prior(alpha.prior=1,source)

Note that the “uninformative” prior with 7 sources is starting to look more informative… imagine what it would look like with 15 sources.

Write JAGS model file

Here we fit the “Process only” error model of MixSIR, which we MUST do when we only have one mix datapoint (or here, one mix datapoint per fixed effect). With only one datapoint, there is no information to estimate an additional mixture variance term, so we have to assume a fixed variance based on the variance of the sources. The differences between “Residual * Process”, “Residual only”, and “Process only” are explained in Stock and Semmens (2016).

# Write the JAGS model file
model_filename <- "MixSIAR_model.txt"
resid_err <- FALSE
process_err <- TRUE
write_JAGS_model(model_filename, resid_err, process_err, mix, source)

Run model

First use run = "test" to check if 1) the data are loaded correctly and 2) the model is specified correctly:

jags.1 <- run_model(run="test", mix, source, discr, model_filename)

After a test run works, increase the MCMC run to a value that may converge:

jags.1 <- run_model(run="normal", mix, source, discr, model_filename)

Analyze diagnostics and output

See ?output_JAGS for details.

output_JAGS(jags.1, mix, source, output_options)

Since we fit “id” as a fixed effect, there is no inference on diet at the overall population level (no p.global). You should see posterior plots for all 14 mixture samples individually (ID 7, ID 14).