Import the data as before. Note
that we get some warnings from both that “objects are masked from”
various packages. This is because the packages we have just loaded have
functions of the same name as those that have already been loaded
(usually ones from the base R packages). This warning is telling us
that, in the case of filter
, when we simply call
filter()
, we will be using the last loaded one, i.e. from
dplyr
, rather than the one from stats
. If you
want to force R to use a particular function from a particular package
you can write the long form, dplyr::filter()
or
stats::filter()
. In many ways, it is good practice to
always use this format, but in reality we are all lazy.
library(SIBER)
library(dplyr)
library(ggplot2)
# import the data. Replace this line with a read.csv() or similar call
# to you own local file.
data("demo.siber.data")
# make a copy of our data for use here in this example, and
# set the columns group and community to be factor type using dplyr.
# Additionally rename the isotope data columns and drop the iso1 and iso2
# columns using the .keep option to keep only those that were not used to
# create the new variables, i.e. keep only ones on the left of the "=".
demo_data <- demo.siber.data %>% mutate(group = factor(group),
community = factor(community),
d13C = iso1,
d15N = iso2,
.keep = "unused")
The graphics that are part of “base R” are not very pretty (or at
least some people think so - I quite like them to be honest). Another
one of Hadley Wickham’s very popular packages is ggplot2
which by default makes some very blog-friendly graphics. And of course
you can change the theme (template) on them to get something more
suitable for publication (and fashion seems to be changing here
too).
In ggplot2, figures are created in layer, by first creating a basic
layer of axes according to an “aesthetic” which is then used as a
framework to add points and lines and other embellishments. If you push
the layers into an object using the <-
assignment as I
have done here, then you will need to print()
your figure
to get it to render.
# when plotting colors and shapes, we need to tell ggplot that these are to
# be treated as categorical factor type data, and not numeric.
first.plot <- ggplot(data = demo_data,
aes(x = d13C,
y = d15N)) +
geom_point(aes(color = group, shape = community), size = 5) +
ylab(expression(paste(delta^{15}, "N (permille)"))) +
xlab(expression(paste(delta^{13}, "C (permille)"))) +
theme(text = element_text(size=16)) +
scale_color_viridis_d()
# And print our plot to screen
print(first.plot)
If you want to use the more normal plotting format for journals
without gridlines, and without the light-grey background, you add
theme_classic()
to your plot. This is a shortcut to some
default settings of the myriad options available in theme()
which we used above to increase the text size in our plot.
classic.first.plot <- first.plot + theme_classic() +
theme(text = element_text(size=18))
# and print to screen
print(classic.first.plot)
Adding the means and error bars in both directions to create the
classic isotope scatterplot requires adding these as additional layers.
We could re-write all the lines of code for the original plot above, or
we can simply create a new ggplot object based on the original, and then
add the layers we want, and print. We need some summary data for each of
the groups to plot the intervals: we use dplyr::summarise
as in the previous script today except this time we store the output
into a new object which I call sbg
. When we add the layers
for the means and the errorbars, we need to tell the layers to use a new
aesthetic mapping using the new x and y data:
mapping = aes(x = mC, y = mN, ...)
.
# Summarise By Group (sbg)
sbg <- demo_data %>%
group_by(group, community) %>%
summarise(count = n(),
mC = mean(d13C),
sdC = sd(d13C),
mN = mean(d15N),
sdN = sd(d15N))
## `summarise()` has grouped output by 'group'. You can override using the
## `.groups` argument.
# make a copy of the first.plot object
# second.plot <- first.plot
# add the layers using the summary data in sbg
second.plot <- first.plot +
geom_errorbar(data = sbg,
mapping = aes(x = mC, y = mN,
ymin = mN - 1.96*sdN,
ymax = mN + 1.96*sdN),
width = 0) +
geom_errorbarh(data = sbg,
mapping = aes(x = mC, y = mN,
xmin = mC - 1.96*sdC,
xmax = mC + 1.96*sdC),
height = 0) +
geom_point(data = sbg, aes(x = mC,
y = mN,
fill = group),
color = "black", shape = 22, size = 5,
alpha = 0.7, show.legend = FALSE) +
scale_fill_viridis_d()
## Warning in geom_errorbarh(data = sbg, mapping = aes(x = mC, y = mN, xmin = mC -
## : Ignoring unknown aesthetics: x
The ggplot2 function stat_ellipse
allows us to easily
add ellipses, of varying level which corresponds to the
prediction interval. This function defaults to using the t-distribution
so we will override this and specify the normal distribution as is
consistent with the SIBER approach. Note that it is not possible to draw
the small sample size corrected ellipses using this method: I need to
create a new geom_
for SIBER and it is on my to-do
list.
# use our ellipse function to generate the ellipses for plotting
# decide how big an ellipse you want to draw
p.ell <- 0.95
# create our plot based on first.plot above adding the stat_ellipse() geometry.
# We specify thee ellipse to be plotted using the polygon geom, with fill and
# edge colour defined by our column "group", using the normal distribution and
# with a quite high level of transparency on the fill so we can see the points
# underneath. In order to get different ellipses plotted by both columns "group"
# and "community" we have to use the interaction() function to ensure both are
# considered in the aes(group = XYZ) specification. Note also the need to
# specify the scale_fill_viridis_d() function as the mapping of colors for
# points and lines is separate to filled objects and we want them to match.
ellipse.plot <- first.plot +
stat_ellipse(aes(group = interaction(group, community),
fill = group,
color = group),
alpha = 0.25,
level = p.ell,
type = "norm",
geom = "polygon") +
scale_fill_viridis_d()
print(ellipse.plot)
If the permil symbol permille is not showing correctly (and instead
is printing as permille in your plot it
is likely because your computer is not set up to use UTF-8 format
character encoding. This is not a problem with your setup of R or
Rstudio, but is deeper in your computer. It is fixed by changing the
region or locale settings on your computer. You can access these in the
system preferences area of your computer’s operating system. To check if
your computer is set up to identify and interpret UTF-8 encoding, you
can type sessionInfo()
in the R console. You should see
something like this:
en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8
under the heading “locale”. On my machine, this indicates that it is use
english (en), for Ireland (IE) and the UTF-8 encoding format. If the
UTF-8 format is missing from your sessionInfo()
then you
could try changing your operating system to a locale similar to your own
region’s location and see if the UTF-8 format is supported there.