isoWater includes two sets of tools. The first supports queries to the Waterisotopes Database (wiDB, https://waterisotopesDB.org) to identify and download data. The second uses Bayesian estimation to infer information about the origin of a water sample based on its stable isotope ratios.
Three functions are available to streamline queries to the wiDB.
wiDB_values()
is designed to expose the list of terms used
in the database’s categorical metadata fields, helping the user design
other queries:
## $types
## [1] "Beer" "Bottled" "Canal" "Cave_drip"
## [5] "Cloud_or_fog" "Firn_core" "Ground" "Ice_core"
## [9] "Lake" "Lake_or_pond" "Leaf" "Milk"
## [13] "Mine" "Ocean" "Precipitation" "Rime"
## [17] "River_or_stream" "Snow_pit" "Soil" "Spring"
## [21] "Sprinkler" "Stem" "Swimming_pool" "Tap"
## [25] "Vapor" "Wetland" "Wind blown snow"
##
## $countries
## [1] "AD" "AE" "AF" "AL" "AM" "AO" "AQ" "AR" "AT" "AU" "BA" "BB" "BD" "BE" "BF"
## [16] "BG" "BH" "BI" "BJ" "BM" "BO" "BR" "BW" "BY" "BZ" "CA" "CD" "CF" "CH" "CL"
## [31] "CM" "CN" "CO" "CR" "CU" "CV" "CY" "CZ" "DE" "DJ" "DK" "DO" "DZ" "EC" "EE"
## [46] "EG" "ES" "ET" "FI" "FK" "FM" "FO" "FR" "GA" "GB" "GE" "GF" "GH" "GI" "GL"
## [61] "GR" "GT" "GU" "GY" "HK" "HN" "HR" "HU" "ID" "IE" "IL" "IN" "IO" "IQ" "IR"
## [76] "IS" "IT" "JO" "JP" "KE" "KG" "KH" "KI" "KR" "KW" "KY" "KZ" "LA" "LB" "LK"
## [91] "LS" "LT" "LU" "LV" "LY" "MA" "MC" "MD" "MG" "MK" "ML" "MM" "MN" "MT" "MU"
## [106] "MX" "MY" "MZ" "NA" "NE" "NG" "NI" "NL" "NO" "NP" "NZ" "OM" "PA" "PE" "PF"
## [121] "PG" "PH" "PK" "PL" "PR" "PS" "PT" "PW" "PY" "QA" "RE" "RO" "RS" "RU" "SA"
## [136] "SC" "SD" "SE" "SG" "SH" "SI" "SJ" "SK" "SN" "ST" "SV" "SY" "SZ" "TD" "TH"
## [151] "TJ" "TN" "TR" "TW" "TZ" "UA" "UG" "UK" "UM" "US" "UY" "UZ" "VE" "VN" "WS"
## [166] "ZA" "ZM" "ZW"
wiDB_sites()
queries the database and returns a
data.frame containing the latitude and longitude coordinates for all
sites matching the provided query criteria. This supports quick
screening of data availability, for example:
ls = wiDB_sites(countries = "US", types = "Lake_or_pond")
ps = wiDB_sites(countries = "US", types = "Precipitation")
omar = par("mar")
par(mar = c(4, 4, 1, 1))
plot(ls[, 2:1], xlim = c(-125, -68), ylim = c(25, 50))
points(ps[, 2:1], col = "red")
Now we can refine our query and request isotope data using the
function wiDB_data()
. We will focus in on the area around
Ames, Iowa, where the sites query showed both lake and precipitation
data.
ld = wiDB_data(minLat = 41, maxLat = 42, minLong = -94, maxLong = -93, types = "Lake_or_pond")
pd = wiDB_data(minLat = 41, maxLat = 42, minLong = -94, maxLong = -93, types = "Precipitation")
## Site_ID Site_Name Latitude Longitude Elevation Sample_ID
## 1 NLA06608-2036 South Banner Lake 41.43839 -93.55097 238.24 710256
## 2 NLA06608-0008 Lake Ahquabi 41.29211 -93.59040 261.56 710288
## 3 NLA06608-0008 Lake Ahquabi 41.29211 -93.59040 261.56 710971
## 4 NLA06608-2056 Badger Creek Lake 41.46950 -93.92037 278.75 710972
## Type Start_Date Start_Time_Zone Collection_Date
## 1 Lake_or_pond NA NA 2007-07-11 00:00:00
## 2 Lake_or_pond NA NA 2007-07-12 00:00:00
## 3 Lake_or_pond NA NA 2007-08-22 00:00:00
## 4 Lake_or_pond NA NA 2007-08-22 00:00:00
## Collection_Time_Zone Phase Depth_meters Sample_Comments d2H d18O
## 1 NA Liquid 0 NA -15.56189 -0.5534326
## 2 NA Liquid 0 NA -25.35876 -2.0486228
## 3 NA Liquid 0 NA -20.81503 -1.9292234
## 4 NA Liquid 0 NA -27.77034 -3.3816347
## D17O d2H_Analytical_SD d18O_Analytical_SD D17O_Analytical_SD
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## WI_Analysis_Source Project_ID
## 1 EPA_Corvallis 10
## 2 EPA_Corvallis 10
## 3 EPA_Corvallis 10
## 4 EPA_Corvallis 10
## Project_ID Contact_Name Contact_Email
## 1 31 NA NA
## Citation
## 1 W.W. Simpkins, Isotopic composition of precipitation in central Iowa, Journal of Hydrology, Volume 172, Issue 1, 1995, Pages 185-207
## URL Project_Name
## 1 https://doi.org/10.1016/0022-1694(95)02694-K Iqbal, 2008; Simpkins, 1995
Notice that the object returned by the data function is a named list that includes the requested data and a summary of the projects (data sources) from which those data are derived. See the documentation for this function for details, and always remember to cite any data you use!
Let’s look at the data a little more closely:
omar = par("mar")
par(mar = c(5, 5, 1, 1))
plot(pd$data$d18O, pd$data$d2H, xlab = expression(delta^{18}*"O"),
ylab = expression(delta^{2}*"H"))
abline(10, 8)
points(ld$data$d18O, ld$data$d2H, col = "red")
It appears that the precipitation values vary widely and cluster around the Global Meteoric Water Line, and the lake values are relatively high and somewhat below the GMWL, suggesting that they may be somewhat evaporatively enriched.
What if we wanted to ‘correct’ for the isotopic effects of
evaporation, giving sample values that could be compared directly with
one or more unevaporated sources. In other words, based on isotopic data
for a water sample and parameters describing a local meteoric water line
(MWL), estimate the isotope ratios of the sample’s source prior to
evaporation. If you have data you’d like to use to constrain the MWL,
like we do here, the mwl
function will produce a data
object describing the line:
#extract the precipitation H and O isotope values
HO = data.frame(pd$data$d2H, pd$data$d18O)
MWL = mwl(HO)
## Warning in mwl(HO): Missing values removed from HO
If you don’t have local data for a MWL, the function
mwlSource()
(see below) has parameters for the Global
Meteoric Water Line embedded as a default.
Now bundle the data for your water sample as an “iso” data object
using the function iso()
. Note that isoWater uses a
bivariate normal distribution to describe all water sample values; thus
the parameters in an iso object include the H and O isotope values,
their 1 standard deviation uncertainties, and the error covariance for H
and O. For measured sample values the covariance might be expected to be
zero (the default value in iso()
), but for many values
(e.g., potential sample or source values estimated from multiple
measurements) the H and O error covariance will be substantial.
#we will analyze one of our lake water samples, and assume realistic
#values of analytical uncertainty and no error covariance
obs = iso(ld$data$d2H[1], ld$data$d18O[1], 0.5, 0.1)
Lastly, we need to provide a prior distribution for the evaporation line slope, which is represented as a univariate normal distribution with mean and standard deviation parameters. These can be tricky to estimate, but there are number of papers out there that provide guidance, as well as some tools and data products that might help.
#assumed value that is reasonable for lake water evaporation in this
#location/climate
slope = c(5.2, 0.3)
With our inputs prepared, we can run the analysis using
mwlSource()
.
## module glm loaded
## mean sd 2.5% 25% 50%
## E 8.5802940 1.8785581 4.316071 7.272429 8.9745340
## S 5.2671265 0.2878505 4.664935 5.083122 5.2715836
## deviance -0.3400286 1.9888415 -2.264542 -1.761493 -0.9453346
## source_d18O -9.1304145 1.8756991 -11.732744 -10.574370 -9.5336387
## source_d2H -60.8332583 10.4712460 -76.163846 -69.161539 -62.6899928
## 75% 97.5% Rhat n.eff
## E 10.0243458 11.194465 2.734956 4
## S 5.4586976 5.825085 1.057706 44
## deviance 0.4437946 4.918822 1.000803 7500
## source_d18O -7.8232214 -4.899413 2.845208 4
## source_d2H -53.1196889 -38.087184 2.834500 4
The summary object returned by the function provides two statistics that can be used to assess convergence (Rhat and effective sample size), in addition to summary statistics for the posterior distribution. For the relatively small chain lengths run in these examples we don’t expect strong convergence. The function’s output also includes an object containing all posterior samples (isoWater functions thin the posterior to 2500 samples per chain by default), which we can plot as a trace plot to visually assess burn-in and convergence:
plot(mwls1$results$source_d2H[1:2500], type = "l", ylim = range(mwls1$results$source_d2H))
lines(mwls1$results$source_d2H[2501:5000], col=2)
lines(mwls1$results$source_d2H[5001:7500], col=3)
The MWL model uses one of two line statistics to constrain the source
water prior distribution. The default, shown above, uses the MWL
prediction interval as the prior, appropriate if the source is best
represented as a single sample of the type used to define the MWL.
Alternatively, the argument stype = 2
can be provided to
use the confidence interval as the prior constraint, appropriate for
samples where the source is best represented as a mixture of many
samples of the type used to define the MWL. You can see that this makes
a substantial difference in the resulting posterior distribution, so
make sure you’re using the right statistics for your problem:
omar = par("mar")
par(mar = c(5, 5, 1, 1))
plot(HO[, 2:1], xlim = c(-20, 0), ylim = c(-140, 0),
xlab = expression(delta^{18}*"O"),
ylab = expression(delta^{2}*"H"))
abline(MWL[2], MWL[1])
points(obs[2:1], col = "red")
points(mwls1$results$source_d18O, mwls1$results$source_d2H, col = "blue")
points(mwls2$results$source_d18O, mwls2$results$source_d2H, col = "green")
What if instead of just back-correcting our sample values for
evaporation, we wanted to explicitly assess the relative contribution of
different water sources to the sample? In other words, based on isotope
values for two or more potential water sources, what is the mixture of
sources that contributed to the sample? The mixSource()
function works similarly to mwlSource()
, but requires
different arguments to define the mixing model source priors. Values for
all sources are provided in a single iso object:
#prep our data - we'll average the precipitation values by quarter
q = quarters(as.POSIXct(pd$data$Collection_Date))
qu = sort(unique(q))
ql = length(qu)
pd_q = data.frame("H" = numeric(ql), "O" = numeric(ql),
"Hsd" = numeric(ql), "Osd" = numeric(ql),
"HOc" = numeric(ql))
for(i in seq_along(qu)){
pd_q$H[i] = mean(pd$data$d2H[q == qu[i]], na.rm = TRUE)
pd_q$O[i] = mean(pd$data$d18O[q == qu[i]], na.rm = TRUE)
pd_q$Hsd[i] = sd(pd$data$d2H[q == qu[i]], na.rm = TRUE)
pd_q$Osd[i] = sd(pd$data$d18O[q == qu[i]], na.rm = TRUE)
pd_q$HOc[i] = cov(pd$data$d18O[q == qu[i]], pd$data$d2H[q == qu[i]],
use = "pairwise.complete.obs")
}
pd_q
## H O Hsd Osd HOc
## 1 -111.36000 -15.634667 59.49320 7.434212 439.68889
## 2 -32.91821 -4.784643 24.80235 3.257371 77.32559
## 3 -26.05879 -4.285000 19.06960 2.374869 44.77920
## 4 -54.53600 -9.051429 36.48143 4.546960 163.92950
#make the iso object, providing the stats calculated above
sources = iso(pd_q$H, pd_q$O, pd_q$Hsd, pd_q$Osd, pd_q$HOc)
Now we can run the analysis; let’s use the parallel option (also
available in mwlSource()
) to speed this up and run slightly
longer chains:
## 276092 5.3433953
## deviance -0.3005642 1.9961507 -2.268549308 -1.74044708 -0.9225450
## fracs[1] 0.3175808 0.1682634 0.073913250 0.16894341 0.2936494
## fracs[2] 0.1637118 0.1335522 0.010445299 0.05941200 0.1216820
## fracs[3] 0.3307662 0.1595332 0.091131033 0.19911202 0.3238308
## fracs[4] 0.1879411 0.1868916 0.001251286 0.03060732 0.1154059
## mixture_d18O -9.5884393 2.1181653 -13.342611118 -11.12190538 -9.6390290
## mixture_d2H -64.3590371 13.7738266 -90.508361268 -73.85469429 -63.9228695
## 75% 97.5% Rhat n.eff
## E 10.6038830 12.8039120 1.136975 25
## S 5.5781422 5.9707000 1.268443 10
## deviance 0.4678783 5.0776563 1.000908 5000
## fracs[1] 0.4558310 0.6618663 1.284617 10
## fracs[2] 0.2299499 0.4872385 1.349563 8
## fracs[3] 0.4329325 0.6684819 1.004538 400
## fracs[4] 0.3147480 0.6447386 2.838958 3
## mixture_d18O -8.0478003 -5.3581235 1.203351 15
## mixture_d2H -54.0905774 -38.7072694 1.264836 11
Again, we’re a long way from having a strongly converged simulation in which the posterior is likely to be a representative sample of the population parameters, but the initial ‘hint’ is that the lake sample values are consistent with a sub-equal source contribution from precipitation in each calendar quarter. Notice, too, that the evaporation-corrected sample values (‘mixture_d18O’ and ‘mixture_d2H’) are broadly similar to the equivalent values we obtained with the MWL model, above.
The default options use a prior that weights each source evenly, but
if we have other information we can use the arguments
mixprior
and shp
to prescribe different
distributions. Values in mixprior
are relative, so for
example if we think Q1 is likely to have contributed twice as much as
the other seasons…this may or may not have a strong impact on the
results, depending on the strength of the constraints provided by the
isotopic data: