isoWater

Introduction

isoWater includes two sets of tools. The first supports queries to the Waterisotopes Database (wiDB, https://waterisotopesDB.org) to identify and download data. The second uses Bayesian estimation to infer information about the origin of a water sample based on its stable isotope ratios.

wiDB Queries

Three functions are available to streamline queries to the wiDB. wiDB_values() is designed to expose the list of terms used in the database’s categorical metadata fields, helping the user design other queries:

wiDB_values(c("types", "countries"))

## $types
##  [1] "Beer"            "Bottled"         "Canal"           "Cave_drip"      
##  [5] "Cloud_or_fog"    "Firn_core"       "Ground"          "Ice_core"       
##  [9] "Lake"            "Lake_or_pond"    "Leaf"            "Milk"           
## [13] "Mine"            "Ocean"           "Precipitation"   "Rime"           
## [17] "River_or_stream" "Snow_pit"        "Soil"            "Spring"         
## [21] "Sprinkler"       "Stem"            "Swimming_pool"   "Tap"            
## [25] "Vapor"           "Wetland"         "Wind blown snow"
## 
## $countries
##   [1] "AD" "AE" "AF" "AL" "AM" "AO" "AQ" "AR" "AT" "AU" "BA" "BB" "BD" "BE" "BF"
##  [16] "BG" "BH" "BI" "BJ" "BM" "BO" "BR" "BW" "BY" "BZ" "CA" "CD" "CF" "CH" "CL"
##  [31] "CM" "CN" "CO" "CR" "CU" "CV" "CY" "CZ" "DE" "DJ" "DK" "DO" "DZ" "EC" "EE"
##  [46] "EG" "ES" "ET" "FI" "FK" "FM" "FO" "FR" "GA" "GB" "GE" "GF" "GH" "GI" "GL"
##  [61] "GR" "GT" "GU" "GY" "HK" "HN" "HR" "HU" "ID" "IE" "IL" "IN" "IO" "IQ" "IR"
##  [76] "IS" "IT" "JO" "JP" "KE" "KG" "KH" "KI" "KR" "KW" "KY" "KZ" "LA" "LB" "LK"
##  [91] "LS" "LT" "LU" "LV" "LY" "MA" "MC" "MD" "MG" "MK" "ML" "MM" "MN" "MT" "MU"
## [106] "MX" "MY" "MZ" "NA" "NE" "NG" "NI" "NL" "NO" "NP" "NZ" "OM" "PA" "PE" "PF"
## [121] "PG" "PH" "PK" "PL" "PR" "PS" "PT" "PW" "PY" "QA" "RE" "RO" "RS" "RU" "SA"
## [136] "SC" "SD" "SE" "SG" "SH" "SI" "SJ" "SK" "SN" "ST" "SV" "SY" "SZ" "TD" "TH"
## [151] "TJ" "TN" "TR" "TW" "TZ" "UA" "UG" "UK" "UM" "US" "UY" "UZ" "VE" "VN" "WS"
## [166] "ZA" "ZM" "ZW"

wiDB_sites() queries the database and returns a data.frame containing the latitude and longitude coordinates for all sites matching the provided query criteria. This supports quick screening of data availability, for example:

ls = wiDB_sites(countries = "US", types = "Lake_or_pond")
ps = wiDB_sites(countries = "US", types = "Precipitation")

omar = par("mar")
par(mar = c(4, 4, 1, 1))
plot(ls[, 2:1], xlim = c(-125, -68), ylim = c(25, 50))
points(ps[, 2:1], col = "red")

par(mar = omar)

Now we can refine our query and request isotope data using the function wiDB_data(). We will focus in on the area around Ames, Iowa, where the sites query showed both lake and precipitation data.

ld = wiDB_data(minLat = 41, maxLat = 42, minLong = -94, maxLong = -93, types = "Lake_or_pond")
pd = wiDB_data(minLat = 41, maxLat = 42, minLong = -94, maxLong = -93, types = "Precipitation")

ld$data

##         Site_ID         Site_Name Latitude Longitude Elevation Sample_ID
## 1 NLA06608-2036 South Banner Lake 41.43839 -93.55097    238.24    710256
## 2 NLA06608-0008      Lake Ahquabi 41.29211 -93.59040    261.56    710288
## 3 NLA06608-0008      Lake Ahquabi 41.29211 -93.59040    261.56    710971
## 4 NLA06608-2056 Badger Creek Lake 41.46950 -93.92037    278.75    710972
##           Type Start_Date Start_Time_Zone     Collection_Date
## 1 Lake_or_pond         NA              NA 2007-07-11 00:00:00
## 2 Lake_or_pond         NA              NA 2007-07-12 00:00:00
## 3 Lake_or_pond         NA              NA 2007-08-22 00:00:00
## 4 Lake_or_pond         NA              NA 2007-08-22 00:00:00
##   Collection_Time_Zone  Phase Depth_meters Sample_Comments       d2H       d18O
## 1                   NA Liquid            0              NA -15.56189 -0.5534326
## 2                   NA Liquid            0              NA -25.35876 -2.0486228
## 3                   NA Liquid            0              NA -20.81503 -1.9292234
## 4                   NA Liquid            0              NA -27.77034 -3.3816347
##   D17O d2H_Analytical_SD d18O_Analytical_SD D17O_Analytical_SD
## 1   NA                NA                 NA                 NA
## 2   NA                NA                 NA                 NA
## 3   NA                NA                 NA                 NA
## 4   NA                NA                 NA                 NA
##   WI_Analysis_Source Project_ID
## 1      EPA_Corvallis         10
## 2      EPA_Corvallis         10
## 3      EPA_Corvallis         10
## 4      EPA_Corvallis         10

pd$projects

##   Project_ID Contact_Name Contact_Email
## 1         31           NA            NA
##                                                                                                                               Citation
## 1 W.W. Simpkins, Isotopic composition of precipitation in central Iowa, Journal of Hydrology, Volume 172, Issue 1, 1995, Pages 185-207
##                                            URL                Project_Name
## 1 https://doi.org/10.1016/0022-1694(95)02694-K Iqbal, 2008; Simpkins, 1995

Notice that the object returned by the data function is a named list that includes the requested data and a summary of the projects (data sources) from which those data are derived. See the documentation for this function for details, and always remember to cite any data you use!

Let’s look at the data a little more closely:

omar = par("mar")
par(mar = c(5, 5, 1, 1))
plot(pd$data$d18O, pd$data$d2H, xlab = expression(delta^{18}*"O"),
     ylab = expression(delta^{2}*"H"))
abline(10, 8)
points(ld$data$d18O, ld$data$d2H, col = "red")

par(mar = omar)

It appears that the precipitation values vary widely and cluster around the Global Meteoric Water Line, and the lake values are relatively high and somewhat below the GMWL, suggesting that they may be somewhat evaporatively enriched.

Meteoric Water Line source

What if we wanted to ‘correct’ for the isotopic effects of evaporation, giving sample values that could be compared directly with one or more unevaporated sources. In other words, based on isotopic data for a water sample and parameters describing a local meteoric water line (MWL), estimate the isotope ratios of the sample’s source prior to evaporation. If you have data you’d like to use to constrain the MWL, like we do here, the mwl function will produce a data object describing the line:

#extract the precipitation H and O isotope values
HO = data.frame(pd$data$d2H, pd$data$d18O)
MWL = mwl(HO)

## Warning in mwl(HO): Missing values removed from HO

If you don’t have local data for a MWL, the function mwlSource() (see below) has parameters for the Global Meteoric Water Line embedded as a default.

Now bundle the data for your water sample as an “iso” data object using the function iso(). Note that isoWater uses a bivariate normal distribution to describe all water sample values; thus the parameters in an iso object include the H and O isotope values, their 1 standard deviation uncertainties, and the error covariance for H and O. For measured sample values the covariance might be expected to be zero (the default value in iso()), but for many values (e.g., potential sample or source values estimated from multiple measurements) the H and O error covariance will be substantial.

#we will analyze one of our lake water samples, and assume realistic
#values of analytical uncertainty and no error covariance
obs = iso(ld$data$d2H[1], ld$data$d18O[1], 0.5, 0.1)

Lastly, we need to provide a prior distribution for the evaporation line slope, which is represented as a univariate normal distribution with mean and standard deviation parameters. These can be tricky to estimate, but there are number of papers out there that provide guidance, as well as some tools and data products that might help.

#assumed value that is reasonable for lake water evaporation in this
#location/climate
slope = c(5.2, 0.3)

With our inputs prepared, we can run the analysis using mwlSource().

mwls1 = mwlSource(obs, MWL, slope, ngens = 5e3)

## module glm loaded

mwls1$summary

##                    mean         sd       2.5%        25%         50%
## E             7.2731805  2.1164156   2.629912   6.161644   7.3526043
## S             5.2023591  0.2799366   4.652896   5.005931   5.2117960
## deviance     -0.3246354  2.0113106  -2.252578  -1.736462  -0.9426918
## source_d18O  -7.8254391  2.1123860 -12.122166  -8.975384  -7.9073259
## source_d2H  -53.4694262 11.5350332 -77.424685 -59.584996 -53.8116810
##                     75%      97.5%     Rhat n.eff
## E             8.4300563  11.560652 1.524213     8
## S             5.3925788   5.729926 1.018421   120
## deviance      0.3856548   5.146643 1.000851  7500
## source_d18O  -6.7106876  -3.198861 1.414373     9
## source_d2H  -47.3397317 -29.082812 1.386262     9

The summary object returned by the function provides two statistics that can be used to assess convergence (Rhat and effective sample size), in addition to summary statistics for the posterior distribution. For the relatively small chain lengths run in these examples we don’t expect strong convergence. The function’s output also includes an object containing all posterior samples (isoWater functions thin the posterior to 2500 samples per chain by default), which we can plot as a trace plot to visually assess burn-in and convergence:

plot(mwls1$results$source_d2H[1:2500], type = "l", ylim = range(mwls1$results$source_d2H))
lines(mwls1$results$source_d2H[2501:5000], col=2)
lines(mwls1$results$source_d2H[5001:7500], col=3)

The MWL model uses one of two line statistics to constrain the source water prior distribution. The default, shown above, uses the MWL prediction interval as the prior, appropriate if the source is best represented as a single sample of the type used to define the MWL. Alternatively, the argument stype = 2 can be provided to use the confidence interval as the prior constraint, appropriate for samples where the source is best represented as a mixture of many samples of the type used to define the MWL. You can see that this makes a substantial difference in the resulting posterior distribution, so make sure you’re using the right statistics for your problem:

mwls2 = mwlSource(obs, MWL, slope, stype = 2, ngens = 5e3)

omar = par("mar")
par(mar = c(5, 5, 1, 1))
plot(HO[, 2:1], xlim = c(-20, 0), ylim = c(-140, 0),
     xlab = expression(delta^{18}*"O"),
     ylab = expression(delta^{2}*"H"))
abline(MWL[2], MWL[1])
points(obs[2:1], col = "red")
points(mwls1$results$source_d18O, mwls1$results$source_d2H, col = "blue")
points(mwls2$results$source_d18O, mwls2$results$source_d2H, col = "green")

par(mar = omar)

Mixture source

What if instead of just back-correcting our sample values for evaporation, we wanted to explicitly assess the relative contribution of different water sources to the sample? In other words, based on isotope values for two or more potential water sources, what is the mixture of sources that contributed to the sample? The mixSource() function works similarly to mwlSource(), but requires different arguments to define the mixing model source priors. Values for all sources are provided in a single iso object:

#prep our data - we'll average the precipitation values by quarter
q = quarters(as.POSIXct(pd$data$Collection_Date))
qu = sort(unique(q))
ql = length(qu)

pd_q = data.frame("H" = numeric(ql), "O" = numeric(ql),
                  "Hsd" = numeric(ql), "Osd" = numeric(ql),
                  "HOc" = numeric(ql))
for(i in seq_along(qu)){
  pd_q$H[i] = mean(pd$data$d2H[q == qu[i]], na.rm = TRUE)
  pd_q$O[i] = mean(pd$data$d18O[q == qu[i]], na.rm = TRUE)
  pd_q$Hsd[i] = sd(pd$data$d2H[q == qu[i]], na.rm = TRUE)
  pd_q$Osd[i] = sd(pd$data$d18O[q == qu[i]], na.rm = TRUE)
  pd_q$HOc[i] = cov(pd$data$d18O[q == qu[i]], pd$data$d2H[q == qu[i]],
                    use = "pairwise.complete.obs")
}
pd_q

##            H          O      Hsd      Osd       HOc
## 1 -111.36000 -15.634667 59.49320 7.434212 439.68889
## 2  -32.91821  -4.784643 24.80235 3.257371  77.32559
## 3  -26.05879  -4.285000 19.06960 2.374869  44.77920
## 4  -54.53600  -9.051429 36.48143 4.546960 163.92950

#make the iso object, providing the stats calculated above
sources = iso(pd_q$H, pd_q$O, pd_q$Hsd, pd_q$Osd, pd_q$HOc)

Now we can run the analysis; let’s use the parallel option (also available in mwlSource()) to speed this up and run slightly longer chains:

mixs1 = mixSource(obs, sources, slope, ngens = 2e4, ncores = 2)
mixs1$summary

## 405284   5.2508292
## deviance      -0.3333928  1.9539108  -2.259573697  -1.73351892  -0.9366184
## fracs[1]       0.2837213  0.1758833   0.026394852   0.14987218   0.2474793
## fracs[2]       0.1800985  0.1520803   0.001129753   0.05125594   0.1555840
## fracs[3]       0.2580554  0.1722586   0.027358555   0.11987938   0.2332106
## fracs[4]       0.2781248  0.2124371   0.021405929   0.10456479   0.2176328
## mixture_d18O  -9.2217226  1.6745178 -13.930378272  -9.94939600  -9.0002620
## mixture_d2H  -61.2722824 10.4124205 -90.697595212 -66.00835295 -59.6810570
##                      75%       97.5%     Rhat n.eff
## E              9.3999917  13.4017556 1.203728    15
## S              5.4323322   5.7835334 1.048780    40
## deviance       0.4333253   4.9098993 1.001209  3700
## fracs[1]       0.4149136   0.6126021 1.316659     9
## fracs[2]       0.2546101   0.5448602 1.166084    19
## fracs[3]       0.3458363   0.6873533 1.061216  2200
## fracs[4]       0.4131015   0.7556372 1.066266    37
## mixture_d18O  -8.0925087  -6.6845043 1.263138    13
## mixture_d2H  -54.3180141 -45.8966495 1.273361    13

Again, we’re a long way from having a strongly converged simulation in which the posterior is likely to be a representative sample of the population parameters, but the initial ‘hint’ is that the lake sample values are consistent with a sub-equal source contribution from precipitation in each calendar quarter. Notice, too, that the evaporation-corrected sample values (‘mixture_d18O’ and ‘mixture_d2H’) are broadly similar to the equivalent values we obtained with the MWL model, above.

The default options use a prior that weights each source evenly, but if we have other information we can use the arguments mixprior and shp to prescribe different distributions. Values in mixprior are relative, so for example if we think Q1 is likely to have contributed twice as much as the other seasons…this may or may not have a strong impact on the results, depending on the strength of the constraints provided by the isotopic data:

mixs2 = mixSource(obs, sources, slope, prior = c(2, 1, 1, 1), ngens = 2e4, ncores = 2)
boxplot(mixs1$results[, 3:6], outline = FALSE)
boxplot(mixs2$results[, 3:6], add = TRUE, col = rgb(1, 0, 0, 0.5),
        outline = FALSE)