Here we show how to do a mixing
model analysis using the script version of MixSIAR. For
a demonstration of the GUI version, see the MixSIAR
Manual. For a clean, runnable .R
script, look at
mixsiar_script_wolves.R
in the example_scripts
folder of the MixSIAR package install:
You can run the wolves example script with:
While the GUI may be convenient for users less familiar with R, we advise using the script version of MixSIAR for several reasons:
Repeatability: You can run different models and have a record of the commands that created each one. There are many reasons you’d want to do this. For example, you may want to compare model results with an uninformative prior vs. an informative prior, one error term option vs. another, grouping sources a priori vs. a posteriori, different MCMC run lengths, etc.
Speed: Clicking through the GUI buttons can get onerous after a while.
Installation ease: Some users aren’t able to install the GTK+ software that the GUI depends on (more issues on Mac). It may be worth figuring out the script version (R skills!) instead of figuring out how to get GTK+ installed.
The basic MixSIAR workflow is the same using a script or the GUI:
load_mix_data()
)load_source_data()
)load_discr_data()
)plot_data()
)calc_area()
)plot_prior()
)write_JAGS_model()
)run_model()
)output_JAGS()
)output_JAGS()
)The “Wolves Example” uses data reconstructed from (not identical to) Semmens et al. 2009. Here, we investigate the diet of 66 wolves in British Columbia with:
Load the mixture data, i.e. your:
Consumer isotope values (trophic ecology / diet)
Mixed sediment/water tracer values (sediment/hydrology fingerprinting)
filename
: name of the CSV file with mix/consumer
data
iso_names
: column headings of the tracers/isotopes you’d
like to use
factors
: vector of random/fixed effect column headings
in ‘filename’. NULL if no factors.
fac_random
: vector of TRUE/FALSE, TRUE if factor is
random effect, FALSE if fixed effect. NULL if no factors.
fac_nested
: vector of TRUE/FALSE, TRUE if factor is
nested within the other. Only applies if 2 factors. NULL otherwise.
cont_effects
: column headings of any continuous
effects
The wolves consumer data has 2 covariates: Region and Pack, where
Pack is nested within Region (fac_nested=c(FALSE,TRUE)
). By
“nested”, we mean that all wolves in a given pack are in the same region
- each pack is entirely within one region. This is an excellent example
of hierarchical
structure, fit with 2 random effects
(fac_random=c(TRUE,TRUE)
).
# Replace the system.file call with the path to your file
mix.filename <- system.file("extdata", "wolves_consumer.csv", package = "MixSIAR")
# Load the mixture/consumer data
mix <- load_mix_data(filename=mix.filename,
iso_names=c("d13C","d15N"),
factors=c("Region","Pack"),
fac_random=c(TRUE,TRUE),
fac_nested=c(FALSE,TRUE),
cont_effects=NULL)
Load the source data, i.e. your:
Source isotope values (trophic ecology / diet)
Sediment/water source tracer values (sediment/hydrology fingerprinting)
filename
: name of the CSV file with source data
source_factors
: column headings of random/fixed effects
you have source data by
conc_dep
: TRUE or FALSE, do you have concentration
dependence data in the file?
data_type
: “means” or “raw”, is your source data as
means+SD, or do you have raw data?
If you look at wolves_sources.csv
, you will see that
each Region has different isotope values - this is specified with
source_factors="Region"
. We do not have concentration
dependence data here, so conc_dep=FALSE
. We only have
source summary statistics (Mean, SD, and sample size), not the original
“raw”” data, so data_type="means"
. Note that
wolves_sources.csv
has a column titled “n”” with the sample
size of each source estimate. This must be in your source data file when
you run your data!
Load the discrimination data, i.e. your:
Trophic Enrichment Factor (TEF) / fractionation values (trophic ecology/diet)
xxxxxxxx (sediment/hydrology fingerprinting)
filename
: name of the CSV file with discrimination
data
This is your chance to check:
Are the data loaded correctly?
Is your mixture data in the source polygon?
Are one or more of your sources confounded/hidden?
filename
: name you’d like MixSIAR to save the isospace
plot as (extension will be added automatically)
plot_save_pdf
: TRUE or FALSE, should MixSIAR save the
plot as a .pdf?
plot_save_png
: TRUE or FALSE, should MixSIAR save the
plot as a .png?
You should always look at the isospace plot—this is a good check that the data is loaded correctly, and that the isospace geometry makes sense. If the mixture data are well outside the source polygon, you have a serious violation of mixing model assumptions, and it must be true that either 1) You’re missing a source, or 2) You’re using an incorrect discrimination factor. MixSIAR, like SIAR, can fit a residual error term and thus always find a solution even if it is nonsensical.
Also note that the MixSIAR isospace plot adds the discrimination means AND SDs to the raw source values. This is because model uses the source + discrimination values to fit the mixture data, calculated as $\sqrt{\sigma^2_{source} + \sigma^2_{discr}}$, under the assumption of independence. Error bars indicate ± 1 SD.
If 2 isotopes/tracers, calculate normalized surface area of the convex hull polygon(s) as in Brett (2014).
Note 1: discrimination SD is added to the source SD (see ?calc_area for details)
Note 2: If source data are by factor (as in wolf ex), computes area for each polygon (one for each of 3 regions in wolf ex)
# Calculate the convex hull area, standardized by source variance
calc_area(source=source,mix=mix,discr=discr)
## [1] 0.8750158 13.9726701 13.6440547
Define your prior, and then plot using “plot_prior”
Bayesian analyses require priors, and MixSIAR includes a
plot_prior
function to plot the prior on the mixture (diet)
proportions (at the highest hierarchical level, p.global). The prior
represents our knowledge about the proportions before we consider the
biotracer data. A natural tendency is to want a flat/“uninformative”
prior, where all values between 0 and 1 are equally likely. However,
because proportions are not independent, there is no truly uninformative
prior (e.g. the histograms are not flat). The best we can do with the
Dirichlet distribution is set α = c(1,1,1), which is uninformative
on the simplex. In other words, all combinations of the proportions are
equally likely. See the section titled “Constructing informative
Bayesian priors” in the forthcoming MixSIAR paper.
Because the mean of the “uninformative” prior, α = c(1,1,1), is $\frac{1}{n.sources}$, we also call it the generalist prior. This reflects two facts: 1) it is not really uninformative, and 2) for weakly informative data it shifts the posterior towards a generalist diet, $p_1 = p_2 = p_3 = \frac{1}{3}$. The amount of shift depends on the informativeness of the data (quality, quantity, and polygon geometry).
Write the JAGS model file (define model structure). The model will be
saved as model_filename
(“MixSIAR_model.txt” is default,
but you may want to change if you create many different models).
There are 3 error term options available:
resid_err = TRUE
,
process_err = TRUE
)resid_err = TRUE
,
process_err = FALSE
)resid_err = FALSE
,
process_err = TRUE
)In the Wolves Example we want the “Residual * Process” error option. The differences between “Residual * Process”, “Residual only”, and “Process only” are explained in Stock and Semmens (2016).
Note: If you have only 1 mix datapoint (or 1 mix
datapoint per factor level, i.e. region or pack in this example), you
have no information about the mixture/consumer variability. In this
case, we use the original MixSIR error model (which does not fit a
residual error term). This is the same behavior as siarsolo
in SIAR.
Choose one of the MCMC run options:
run == | Chain Length | Burn-in | Thin | # Chains |
---|---|---|---|---|
“test” | 1,000 | 500 | 1 | 3 |
“very short” | 10,000 | 5,000 | 5 | 3 |
“short” | 50,000 | 25,000 | 25 | 3 |
“normal” | 100,000 | 50,000 | 50 | 3 |
“long” | 300,000 | 200,000 | 100 | 3 |
“very long” | 1,000,000 | 500,000 | 500 | 3 |
“extreme” | 3,000,000 | 1,500,000 | 500 | 3 |
You can also set custom MCMC parameters, e.g:
Good idea to use run = "test"
first to check if 1) the
data are loaded correctly and 2) the model is specified correctly:
After a test run works, increase the MCMC run to a value that may converge
jags.1
will be an rjags
object where you
can access the MCMC chains for plotting, aggregating sources a
posteriori, etc.
First you can set output options like file names, plot file types, etc. (see ?output_JAGS for details).
output_options <- list(summary_save = TRUE,
summary_name = "summary_statistics",
sup_post = FALSE,
plot_post_save_pdf = TRUE,
plot_post_name = "posterior_density",
sup_pairs = FALSE,
plot_pairs_save_pdf = TRUE,
plot_pairs_name = "pairs_plot",
sup_xy = TRUE,
plot_xy_save_pdf = FALSE,
plot_xy_name = "xy_plot",
gelman = TRUE,
heidel = FALSE,
geweke = TRUE,
diag_save = TRUE,
diag_name = "diagnostics",
indiv_effect = FALSE,
plot_post_save_png = FALSE,
plot_pairs_save_png = FALSE,
plot_xy_save_png = FALSE,
diag_save_ggmcmc = FALSE)
Then you can call output_JAGS
to process diagnostics,
summary statistics, and create posterior density plots:
For a thorough explanation of the output from
output_JAGS
, see the Wolves Example section of the MixSIAR
Manual. You will also find examples of accessing the MCMC chains for
post hoc plotting and analysis there.