--- title: "RFPM" author: "Brian Church and Claire Detering" output: html_vignette vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{RFPM} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} require(dplyr) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` `RFPM` provides an implementation of the Floating Percentile Model (FPM) in the R statistical environment. The FPM was originally developed by others on behalf of the Washington State Department of Ecology (Avocet 2003) and has since been recommended for use in Washington, Oregon, and Idaho (Ecology 2011). The FPM currently exists as a Microsoft Excel macro (written with Visual Basic for Applications). `RFPM` represents an independent effort to port the FPM to R with the purpose of: - correcting a data handling error - streamlining the chemical selection and sediment quality benchmark calculation procedures - providing new tools to rapidly evaluate and refine benchmarks - improving several aspects of the FPM algorithm - improving the availability, accessibility, and transparency of the FPM tool The purpose of this vignette is to lead the reader through an example analysis of real-world data using the core functions of `RFPM` rather than go into great detail on any particular function. For more function details, see the help files by using `?` or `help()` (or by navigating in RStudio to the Packages tab, selecting RFPM from the list of available packages, and then selecting the function of interest). ## Functionality The main user interface within `RFPM` is via the `FPM` function. To use this function, the user must first prepare a sediment chemistry and toxicity data.frame that contains one or more chemical columns and a single toxicity column called `Hit`. Several example datasets are provided with `RFPM` with this structure, for example `h.northport`, which we will evaluate repeatedly in this vignette. Other datasets include the empirical `c.northport` and `h.tristate` as well as simulated FPM data `perfect`, `lowNoise`, and `highNoise`. Other key functions that the user is directed to include `optimFPM`, `cvFPM`, and `chemVI` which can help the user to explore and refine FPM inputs and optimize FPM outputs. ## Case study - Amphipod toxicity in Northport, WA sediments In this vignette, we focus on evaluating a case study dataset called `h.northport`, which includes a data.frame of bulk sediment chemistry metals concentrations (mg/kg) and `Hit` data. Hits are logical results indicating whether a particular sample was deemed toxic (`TRUE`) or not (`FALSE`) based on the *Hyalella azteca* biomass endpoint in 28-day laboratory exposures. The definition of a toxic Hit is up to the practitioner and not a part of the model. Data must be input to `FPM` in a cross-tab (wide) format, meaning each sediment chemical has a unique column and each sample has just one row. ```{r, include = F} library(RFPM) ``` ```{r} head(h.northport) ``` To run `FPM`, you will need to specify, at a minimum, the data.frame object and the column names associated with sediment chemistry data. As noted above, `FPM` also requires that the data.frame include a column called `Hit`. Columns other than `Hit` and those specified using the `paramList` argument are ignored. ```{r} p.northport <- names(h.northport)[1:10] ## all chemical column names FPM(data = h.northport, paramList = p.northport) ## minimum input - dataset and chemical column names ``` The output of `FPM` is a list with at least 1 item called `FPM`, which displays three types of data: - The sediment quality benchmark for each of the chemicals selected by `FPM` as significant (meaning that concentrations when `Hit == TRUE` are significantly greater than when `Hit == FALSE`). - Selected `FN_crit` value, the user-defined limit on benchmark conservatism (default = `0.2`) - Classification metrics: e.g., false negatives rate (`pFN`), false positive rate (`pFP`), sensitivity (`sens`), specificity (`spec`), overall reliability (`OR`), Fowlkes-Mallows Index (`FM`), and Matthew's Correlation Coefficient (`MCC`). These provide an indication of the relative fit of the FPM benchmarks to the underlying toxicity dataset. This is technically all that was required to run the FPM on this example dataset. There is of course a lot going on behind the scenes. Functions that were used within FPM include: - `chemSig`: chemical selection algorithm, which evaluates distribution assumptions of normality and equal variance, then applies appropriate hypothesis test methods to identify significant differences between concentrations when `Hit == TRUE` and `Hit == FALSE`. The output of `chemSig` is a logical vector indicating which chemicals are significant (therefore selected by `FPM` for generating benchmarks). - `chemSigSelect`: a function that uses `chemSig` but instead returns the chemistry data for significant chemicals and has options for plotting chemistry data distributions to compare `Hit == TRUE` and `Hit == FALSE` subsets. Setting `plot = TRUE` in `FPM` turns on plot outputs (by passing that argument to `chemSigSelect`). As an example, see the following figures generated by `plot.chemSigSelect` (identical to those generated by `FPM` when `plot = TRUE`) for a significant chemical (`Cr`), shown with blue points, and a non-significant chemical (`Al`), shown with green points: ```{r} plot(chemSigSelect(h.northport[, c("Al", "Cr", "Hit")], paramList = c("Al", "Cr")), type = "boxplot") ``` There are also many arguments that can be adjusted in `FPM` that affect how `chemSig` and `chemSigSelect` function or how the FPM algorithm itself functions. Users can adjust the test methods/assumptions for selecting chemicals, force the FPM to accept the assumptions of the original Excel-based function (i.e., `ExcelMode == TRUE`), etc. As noted above, the output of `FPM` is a list; additional items can be appended to the list if the user sets the `densInfo`, `lockInfo`, or `hitInfo` arguments equal to `TRUE`. Respectively, these provide information about 1) how much each of the benchmarks varied within the FPM algorithm, 2) what caused each benchmark to lock into place and in what order, and 3) what the predicted `Hit` values would be based on the calculated benchmarks. We have been interested in each of these components at various times of model development, but expect the typical user to not need this information. See `?FPM` and `?chemSig` for more information. ## Optimizing FPM Inputs In `RFPM`, there are currently two optimization functions available: - `optimFPM` is intended to find an optimal input (to `FPM`) of the `FN_crit` and/or `alpha.test` parameter. These optimal levels are based on the full dataset (input as `data` argument) and, therefore, may be over-specific to that data. The `FN_crit` and `alpha.test` arguments should be input either as ranges or single values depending on how you want the optimization to run. Inputting a single value for `FN_crit` and a range for `alpha.test` would optimize the `alpha.test` value only and vice-versa. Inputting ranges for both values results in slightly more complex output owing to the 2-dimensional optimization. - `cvFPM` is similarly parameterized to optimize `FN_crit` and/or `alpha.test` inputs using a cross-validation (CV) approach. The results are "best" values that account for uncertainty in future data. The user can adjust the `k` parameter to change the number of folds in the CV algorithm (up to k = n, where n is the number of the samples). This affects the smoothness of the mean/median optimization curves by generating more hypothetical datapoints; it also is likely to expand the min/max range of optimization values by providing more chances for extreme outcomes. By using larger training datasets (i.e., with larger `k` inputs), the CV-based optimum will converge toward the empirical optimization curve generated by `optimFPM` (and shown in the output of `cvFPM`). While we generally recommend using `cvFPM` over `optimFPM` when possible, there are cases when `cvFPM` cannot be used. These include the analysis of small datasets and/or when `k` is set to a small number; in either of these cases, `FPM` may fail because of a lack of significant chemicals being identified `chemSig` (dependent on N). This is an example of the empirical optimization approach with `optimFPM`: ```{r, fig.width = 7, fig.height = 6} ## one-way optimization of the FN Limit - vertical lines show best values based on two metrics optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05) ## two-way optimization of both FN Limit and alpha - black squares show best values based on two metrics optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = seq(0.01, 0.2, 0.01)) ``` The resulting plots show a mix of potential outcomes. When only one parameter is being optimized (i.e., `length(alpha.test) == 1 | length(FN_crit) == 1`), then the first plot is produced, showing fit statistics over the evaluated range. Vertical lines identify optimal values in `OR`, sensitivity/specificity ratio (equal to `min(c(sens, spec))/max(c(sens, spec))`, `FM`, or `MCC`. In the case that two parameters are being optimized together (i.e., `length(alpha.test) > 1 & length(FN_crit) > 1`), then a matrix of results is provided representing with heat colors (by default) to indicate the optimization outcome (yellow = suboptimal, red = optimal). Squares indicate the selected values, which are also output into the console. This is an example of the CV-based optimization approach with `cvFPM` and 10-folds: ```{r, fig.width = 7, fig.height = 6} cvFPM(h.northport, p.northport, k = 10, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05, which = 2) ``` Optimization statistics from multiple CV runs are shown (as well as the empirical "all data" result from `optimFPM`) as lines. When run with multiple `alpha.test` and `FN_crit` inputs, `cvFPM` will generate a heat map simliar to `optimFPM` with values representing on the mean result from CV runs. Based on the output from either `optimFPM` or `cvFPM`, the practitioner may decide to adjust and rerun the `FPM` using optimized inputs. The decision of whether this is appropriate will depend on the specific management scenario. For example, the optimal FN Limit could end up being 80%, but that lack of conservatism may be unacceptable to a regulator. Similarly, the optimal alpha.test could be 0.5, meaning that the regulator would need to accept a 50% probability of error when selecting significant chemicals; this is a very high uncertainty, 10-fold higher than typical. Regardless, optimization provides useful context to the `RFPM` user. ## Variable Importance The `chemVI` function outputs several potentially useful metrics for evaluating individual metals in the sediment quality benchmark set. These metrics are: - `chemDensity`: how little the benchmark "floated" within the FPM algorithm. High density values correspond to benchmarks that remained at a relatively low concentrations (i.e., did not float up), suggesting relative importance - `MADP`: average change in other chemical's benchmarks resulting from removal of a chemical from `data` - `dOR`: change in the overall reliability resulting from removal of the chemical from `data` - `dFM`: change in the Folkes-Mallows Index resulting from removal of the chemical from `data` - `dMCC`: change in the Matthew's COrrelation Coefficient resulting from removal of the chemical from `data` The `MADP` and `dOR` metrics are the more useful for identifying important chemicals; for example: ```{r} chemVI(h.northport, p.northport) ``` Here we see that copper (`Cu`) has an `MADP`, `dOR`, `dFM` and `dMCC` equal to `0`, meaning that there was no change in the benchmarks or their ability to predict toxicity after removing copper. Clearly copper is, therefore, not important in this case and could be excluded from further consideration. **However, that isn't to say that copper isn't related to toxicity; it's very important to keep in mind that the FPM is a classification tool that attempts to accurately predict** `Hit` **values**. The significant chemicals in this list are mostly well correlated, so toxicity could be related to one, all, or perhaps none of the chemicals (i.e., an unmeasured but related chemical or non-chemical stressor). The inclusion of new chemicals in (or exclusion of important chemicals from) `data` can result in markedly different benchmarks and `chemVI` metrics (particularly for the `chemDensity` metric). Therefore, we recommend pre-screening `data` to remove chemicals that are unlikely to cause toxicity (e.g., due to poor bioavailability, low toxicity, or uniformly low concentrations). These chemicals, while potentially "significant" with respect to having higher concentrations when `Hit == TRUE`, may lead to problems down the road when predicting toxicity more generally. In other words, try to the extent possible to establish a weight of evidence pointing toward causation rather than letting pure correlation lead your analysis. In the current example, we perhaps should have pre-screened chemicals like iron (`Fe`) and aluminum (`Al`), which we expected to have low toxicity at natural levels. Aluminum was ultimately screened out by `FPM` as non-significant, but iron was significant and retained. `chemVI` can be helpful with respect to developing a parsimonious benchark set by identifying benchmarks that do not affect prediction at the site. In our `h.northport` example, `Cu` and `Fe` will be removed because they don't affect toxicity predictions (either positively or negatively). Whenever any chemicals would be removed from consideration through an iterative FPM benchmark development process, `FPM` must be rerun to generate new benchmarks. *We caution the user to drop only one chemical at a time, rerunning* `FPM` *each time*. It is possible that dropping one chemical can result in unexpected results, such as other chemicals becoming more important. Compare the two outputs: ```{r} FPM(data = h.northport, paramList = p.northport) FPM(data = h.northport, paramList = c("Cr", "Zn")) ``` For the FPM benchmarks generated for `h.northport`, the `MADP` for `Fe` was >0, and the benchmark for `Zn` shifted after removing `Fe` from consideration. Although the benchmark changed, the classification errors did not, as reflected by the unchanged `OR`. The final FPM benchmark set including only `Cr` and `Zn` is, therefore, the most parsimonious, by which we mean that it classifies `Hit` values just as accurately as the larger set of benchmarks but does so with fewer benchmarks. We recommend using a parsimonious set of benchmarks on a site-specific basis, whenever possible, because this should avoid problems of "overfitting" that might cause confounding variables to trigger erroneous conclusions. You may have noticed that we didn't use optimized inputs in the example above for `chemVI`. We omitted that to provide an example where relatively unimportant chemicals could be screened out using the metrics generated by `chemVI`. If we instead change the inputs of `chemVI` to the optimized `FN_crit` and `alpha.test` (using the `OR` optimization approach), the `chemVI` results look like this instead: ```{r} FPM(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15)$FPM chemVI(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15) ``` In this case, adjusting `alpha.test` has allowed for 2 new chemicals to be added, aluminum and lead (`Pb`), which were not selected when `alpha.test = 0.05`. After including these chemicals (and slightly adjusting the `FN_crit`), the final `OR` increased to 67%, a 10% improvement in accuracy over the default values (not shown). `MCC` increased by a similar margin. Based on the results of `chemVI` shown above, there is a marked shift in the relative importance of metals for toxicity predictions. Removing any of the chemicals would result in a roughly 10% reduction in accuracy (based on the `dOR` metric), and removing `Cu` now would have an effect on the other benchmarks values (as shown by the `MADP` metric >0). The `chemDensity` value for `Cr` has dropped to <1%, meaning that the benchmark is now close to the maximum possible value, whereas it had one of the highest `chemDensity` values before. As you can see, variable importance is dependent on the *set* of benchmarks, not individual benchmarks in isolation. The addition of new chemicals impacted not only on the relative importance of each chemical but the overall reliability of the set of chemicals and on the FPM benchmark values themselves. ## Notes on Benchmark Applications and Interpretation We want to emphasize several findings from our analyses of the characteristics, behavior, and sensitivity of the `FPM` (the subject of two anticipated manuscripts). 1. FPM benchmarks should be treated as a *set* of values, not values to be used singularly or mixed and matched between runs of the `FPM` using different species, toxicity test endpoints, sites, etc. There may be a desire or an inclination to select the most sensitive species and test endpoints among chemicals. Doing so, however, would invalidate the various assumptions made when generating FPM benchmarks, including the projected FN Limit (i.e., intended `FN_crit` and actual `pFN`) and the accuracy of benchmarks (i.e., `OR`, `FM`, and `MCC`). If there are multiple species that need to be accounted for by the model, consider different approaches to defining `Hit` values that capture multiple species and/or endpoints prior to running `FPM`. For example, assign `Hit == TRUE` if any of several endpoints indicates toxicity in a given sample. This approach will result in a single benchmark set for all endpoints rather than generating multiple sets of benchmarks (as was done by Ecology 2011). The resulting benchmarks may be conservative, but they would not deviate from the intended statistical underpinnings of FPM outputs in the same way that mixing-and-matching multiple benchmark sets would. 2. FPM benchmarks should be considered as predictive of toxicity *classifications* (i.e., `Hit` values) rather than continuous toxicity test results (e.g., 10% mortality, 27% biomass, etc.). For this reason, we would generally argue against the application of FPM benchmarks within a Natural Resource Damage Assessment (NRDA) context. We recognize that future approaches might be proposed for applying FPM benchmarks in a NRDA context; while theoretically possible, such attempts should be considered carefully, particularly with regard to how Hits are defined (and whether the definition reasonably corresponds with definitions of injury). 3. FPM benchmarks are most powerful when they are site-specific. By using site toxicity and sediment chemistry data, and by implicitly considering factors like bioavailability, FPM benchmarks should improve upon existing default benchmarks (Ecology 2011). Therefore, we recommend using `RFPM` and site-specific data to the extent practicable to derive new benchmarks. 4. While it is most common to generate benchmarks based on bulk sediment concentrations, it is not necessary that this be the case. For example, benchmarks could reflect organic carbon-normalized sediment concentrations, pore water concentrations, etc. These measurements might improve the relevance of toxicity predictions by taking bioavailability more explicitly into account. Attempts to generate benchmarks based on these types of data simply complicate the application of the benchmarks by requiring all future samples be analyzed and concentrations reported in the same way. Moreover, some approaches will not be possible such as the simultaneously extracted metal - acid volatile sulfide method, which can generate non-positive data (that cannot be handled by `FPM`). 5. While there may be a concern about mixing chemicals of different types and mechanisms/modes of action (e.g., metals and polycyclic aromatic hydrocarbons) when generating FPM benchmarks, it does not seem to us that such a distinction should matter. From a conceptual standpoint, the FPM can be used to predict any variable that is distilled to a `TRUE`/`FALSE` classification from a set of one or more independent variables (chemical or otherwise). The key limitations that arise when mixing chemicals (and/or non-chemical stressors) are: - All analytes added to the `FPM` via `data` need to have a common directionality with respect to `Hits`. In other words, toxicity is expected to increase as, for example, copper increases. Therefore, you could not also include pore water acidity, which increases toxicity at both high and low pH. This limitation is related to the chemical selection process, which (by default) assumes that concentrations will be higher among toxic samples. It also relates to the iteratively increasing nature of the `FPM` algorithm; concentrations start at a low level and then shift in only one direction (up/increasing). - The inclusion of many correlated variables may increase the number of irrelevant benchmarks. For example, in the analysis of `h.northport` dataset presented above (using default `FPM` inputs), iron was correlated with other metals (not shown) and, as a result, was selected for inclusion among the benchmarks despite being relatively non-toxic and contributing not at all to the predictive accuracy of the benchmark set. As described above, `chemVI` can help to identify potential simplifications to the benchmark set when correlated variables lead to unnecessary complexity. The user is also free to pre-screen their data to remove redundant variables. 6. `FPM` has been found to work poorly or not at all with missing data. Consider removing analytes with incomplete datasets, omit samples with partial analysis, and/or impute values to fill data gaps. 7. `FPM` does not currently address non-detection in a meaningful way. As the model is largely based on percentiles and ranges rather than averages, this shouldn't generally lead to major problems as long as there isn't a high degree of non-detection (e.g., >20%). We leave it to the user to address how non-detects will be handled and to use FPM benchmarks based on non-detected data with caution. High degrees of non-detection of a chemical is a good reason for excluding it from `data` (or `paramList`), as there is not a reasonable way to use such a chemical to predict Hits. The use of more advanced censored data imputation methods (e.g., Kaplan-Meier or Regression on Order Statistics [ROS]) may provide useful, but should also be applied with care; using these methods (in addition to treating non-detects as half of detection limits or as zero) have the potential to result in benchmarks that fall below detection limits. We would argue that it is unreasonable to apply benchmarks that fall at or below detection limits. 8. As with other toxicity-based benchmarks, it may be important to consider background chemical concentrations and reference area toxicity levels when developing and applying `FPM` benchmarks. Reference area toxicity can be incorporated explicitly into `Hits` definitions by normalizing toxicity data to the reference condition prior to classifying `Hits`, and background can be used to qualify chemical exceedances of FPM benchmarks that fall below background levels. Examples of applying a reference condition would include dividing site toxicity by mean reference area toxicity (then applying a default effect threshold like 80%) or using a percentile of reference toxicity as the defining threshold for Hits (e.g., site toxicity >80th percentile of reference effect). ## References cited: Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA. Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication no. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.