Here we step through the Cladocera Example using the script version of MixSIAR. For a demonstration using the GUI version, see the MixSIAR Manual. For a thorough walkthrough of how to use MixSIAR in a script, see the Wolves Example, which provides more commentary and explanation.
For a clean, runnable .R
script, look at
mixsiar_script_cladocera.R
in the
example_scripts
folder of the MixSIAR package install:
You can run the Cladocera Example script directly with:
The Cladocera Example is from Galloway et al. 2014 and demonstrates MixSIAR applied to a 22-dimensional fatty acid dataset. Here the 14 mixture datapoints are Cladocera (water flea) fatty acid profiles from 6 lakes in Finland over 2 seasons. Besides the high dimensionality, the other difference with this analysis is that we fit each mixture datapoint individually, because there is no clear covariate structure (some sites have 2 seasons, some have 1, some sites in the same lake). We do this by creating an “id” column and treating “id” as a fixed effect.
Here we fit the “process error” model of MixSIR, which we MUST do when we only have one mix datapoint (or here, one mix datapoint per fixed effect). With only one datapoint, there is no information to estimate an additional mixture variance term, so we have to assume a fixed variance based on the variance of the sources (see Moore and Semmens 2008).
Here we treat “id” as a fixed effect—this will estimate the diet of each mixture data point separately (sample size of 1). This makes sense to do when you think there will be clear differences between sites/seasons/etc., but only have 1 or 2 points from each site/season (i.e, you don’t have enough data to estimate a site/season effect). If you are interested in the site/season effect, you need replicates within each site/season, and then it is best to fit site/season as a fixed or random effect.
Fatty acid data greatly increase the number of biotracers beyond the typical 2 stable isotopes, d13C and d15N, which gives the mixing model power to resolve more sources. We caution, however, that using fatty acid data is not a panacea for the “underdetermined” problem (# sources > # biotracers + 1). As the number of sources increases, the “uninformative” prior (α = 1) has greater influence, even if there are many more biotracers than sources. See the Cladocera Example prior with 7 sources and 22 biotracers.
See ?load_mix_data for details.
The cladocera consumer data has 1 covariate
(factors="id"
), which we fit as a fixed effect
(fac_random=FALSE
). “Site” is not nested within another
factor (fac_nested=FALSE
). There are no continuous effects
(cont_effects=NULL
).
# Replace the system.file call with the path to your file
mix.filename <- system.file("extdata", "cladocera_consumer.csv", package = "MixSIAR")
mix <- load_mix_data(filename=mix.filename,
iso_names=c("c14.0","c16.0","c16.1w9","c16.1w7","c16.2w4",
"c16.3w3","c16.4w3","c17.0","c18.0","c18.1w9",
"c18.1w7","c18.2w6","c18.3w6","c18.3w3","c18.4w3",
"c18.5w3","c20.0","c22.0","c20.4w6","c20.5w3",
"c22.6w3","BrFA"),
factors="id",
fac_random=FALSE,
fac_nested=FALSE,
cont_effects=NULL)
See ?load_source_data for details.
We do not have any fixed/random/continuous effects or concentration
dependence in the source data (source_factors=NULL
,
conc_dep=FALSE
). We only have source means, SD, and sample
size—not the original “raw” data (data_type="means"
).
See ?load_discr_data for details.
Note that Galloway et
al. 2014 conducted feeding trials to create a “resource library”. In
the mixing model, the sources are actually consumers fed exclusively
each of the sources. This allowed them to set the discrimination = 0
(see isopod_discrimination.csv
).
DO NOT use the plot_data
function! When there are more
than 2 biotracers, MixSIAR currently plots every pairwise combination.
Here, that means ${22 \choose 2} = 231$
plots are produced. Yikes! In the future, MixSIAR will offer non-metric
multidimensional scaling (NMDS) plots for these cases.
Define your prior, and then plot using “plot_prior”
Note that the “uninformative” prior with 7 sources is starting to look more informative… imagine what it would look like with 15 sources.
Here we fit the “Process only” error model of MixSIR, which we MUST do when we only have one mix datapoint (or here, one mix datapoint per fixed effect). With only one datapoint, there is no information to estimate an additional mixture variance term, so we have to assume a fixed variance based on the variance of the sources. The differences between “Residual * Process”, “Residual only”, and “Process only” are explained in Stock and Semmens (2016).
First use run = "test"
to check if 1) the data are
loaded correctly and 2) the model is specified correctly:
After a test run works, increase the MCMC run to a value that may converge: