RFPM
provides an
implementation of the Floating Percentile Model (FPM) in the R
statistical environment. The FPM was originally developed by others on
behalf of the Washington State Department of Ecology (Avocet 2003) and
has since been recommended for use in Washington, Oregon, and Idaho
(Ecology 2011). The FPM currently exists as a Microsoft Excel macro
(written with Visual Basic for Applications). RFPM
represents an independent effort to port the FPM to R with the purpose
of:
- correcting a data handling error
- streamlining the chemical selection and sediment quality benchmark
calculation procedures
- providing new tools to rapidly evaluate and refine benchmarks
- improving several aspects of the FPM algorithm
- improving the availability, accessibility, and transparency of the FPM
tool
The purpose of this vignette is to lead the reader through an example
analysis of real-world data using the core functions of
RFPM
rather than go into great detail on any particular
function. For more function details, see the help files by using
?
or help()
(or by navigating in RStudio to
the Packages tab, selecting RFPM from the list of available packages,
and then selecting the function of interest).
The main user interface within RFPM
is via the
FPM
function. To use this function, the user must first
prepare a sediment chemistry and toxicity data.frame that contains one
or more chemical columns and a single toxicity column called
Hit
. Several example datasets with this structure are provided with RFPM, for example h.northport, which we will evaluate repeatedly in this
vignette. Other datasets include the empirical c.northport
and h.tristate
as well as simulated FPM data
perfect
, lowNoise
, and
highNoise
.
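As a quick check of the required structure, you can confirm that the Hit column is logical and see how many samples were classified as toxic; a minimal sketch using base R:
library(RFPM)             ## load the package and its example datasets
class(h.northport$Hit)    ## should be "logical"
table(h.northport$Hit)    ## counts of toxic (TRUE) and non-toxic (FALSE) samples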
Other key functions to which the user is directed include optimFPM, cvFPM, and chemVI, which can help the user explore and refine FPM inputs and optimize FPM outputs.
In this vignette, we focus on evaluating a case study dataset called
h.northport
, which includes a data.frame of bulk sediment
chemistry metals concentrations (mg/kg) and Hit
data. Hits
are logical results indicating whether a particular sample was deemed
toxic (TRUE
) or not (FALSE
) based on the
Hyalella azteca biomass endpoint in 28-day laboratory
exposures. The definition of a toxic Hit is up to the practitioner and
not a part of the model. Data must be input to FPM
in a
cross-tab (wide) format, meaning each sediment chemical has a unique
column and each sample has just one row.
head(h.northport)
#> Al As Cu Cd Cr Fe Pb Hg Ni Zn Organism
#> 442 8640 8.38 396.0 3.740 34.6 56900 226.0 0.159 13.20 4300 HA
#> 443 5890 7.69 167.0 4.860 18.7 28500 219.0 0.247 14.00 1780 HA
#> 444 10200 11.10 913.0 0.727 50.8 83300 151.0 0.051 9.91 6490 HA
#> 445 15800 14.60 1630.0 0.688 100.0 161000 211.0 0.016 13.10 13600 HA
#> 446 5950 2.55 30.4 0.721 11.5 14600 63.8 0.020 11.20 484 HA
#> 447 8370 8.74 644.0 0.774 39.3 62400 99.2 0.027 9.68 4740 HA
#> Meas_Day Endpoint Hit
#> 442 28 Biomass FALSE
#> 443 28 Biomass FALSE
#> 444 28 Biomass TRUE
#> 445 28 Biomass TRUE
#> 446 28 Biomass FALSE
#> 447 28 Biomass FALSE
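If your chemistry results are instead stored in long format (one row per sample-chemical pair), they must be pivoted to wide format before use. Below is a minimal sketch using base R's reshape() on a small, hypothetical long-format data.frame; the column names SampleID, Chemical, and Conc are illustrative and not part of RFPM:
long <- data.frame(SampleID = rep(c("S1", "S2"), each = 3),
                   Chemical = rep(c("Cu", "Pb", "Zn"), times = 2),
                   Conc     = c(396, 226, 4300, 167, 219, 1780),
                   Hit      = rep(c(FALSE, TRUE), each = 3))
wide <- reshape(long, idvar = c("SampleID", "Hit"), timevar = "Chemical",
                v.names = "Conc", direction = "wide")
names(wide) <- sub("^Conc\\.", "", names(wide))  ## strip the "Conc." prefix from the chemical columns
wide  ## one row per sample, one column per chemical, plus Hit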
To run FPM
, you will need to specify, at a minimum, the
data.frame object and the column names associated with sediment
chemistry data. As noted above, FPM
also requires that the
data.frame include a column called Hit
. Columns other than
Hit
and those specified using the paramList
argument are ignored.
p.northport <- names(h.northport)[1:10] ## all chemical column names
FPM(data = h.northport, paramList = p.northport) ## minimum input - dataset and chemical column names
#> $FPM
#> Cr Cu Fe Zn FN_crit TP FN TN FP pFN pFP sens spec OR FM
#> 1 26.7 2740 27400 738 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.571 0.627
#> MCC
#> 1 0.222
The output of FPM
is a list with at least 1 item called
FPM
, which displays three types of data:
- The sediment quality benchmark for each of the chemicals selected by
FPM
as significant (meaning that concentrations when
Hit == TRUE
are significantly greater than when
Hit == FALSE
).
- Selected FN_crit
value, the user-defined limit on
benchmark conservatism (default = 0.2
)
- Classification metrics: e.g., false negative rate (pFN
),
false positive rate (pFP
), sensitivity (sens
),
specificity (spec
), overall reliability (OR
),
Fowlkes-Mallows Index (FM
), and Matthews Correlation
Coefficient (MCC
). These provide an indication of the
relative fit of the FPM benchmarks to the underlying toxicity
dataset.
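Because the output is a list, individual pieces can be pulled out directly. A minimal sketch, assuming (as printed above) that the FPM element is a one-row table with a column per metric:
out <- FPM(data = h.northport, paramList = p.northport)
out$FPM       ## the benchmark and classification-metric table shown above
out$FPM$MCC   ## e.g., extract just the Matthews Correlation Coefficient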
This is technically all that was required to run the FPM on this
example dataset. There is of course a lot going on behind the scenes.
Functions that were used within FPM include:
- chemSig
: chemical selection algorithm, which evaluates
distribution assumptions of normality and equal variance, then applies
appropriate hypothesis test methods to identify significant differences
between concentrations when Hit == TRUE
and
Hit == FALSE
. The output of chemSig
is a
logical vector indicating which chemicals are significant (therefore
selected by FPM
for generating benchmarks).
- chemSigSelect
: a function that uses chemSig
but instead returns the chemistry data for significant chemicals and has
options for plotting chemistry data distributions to compare
Hit == TRUE
and Hit == FALSE
subsets. Setting
plot = TRUE
in FPM
turns on plot outputs (by
passing that argument to chemSigSelect
).
As an example, see the following figures generated by
plot.chemSigSelect
(identical to those generated by
FPM
when plot = TRUE
) for a significant
chemical (Cr
), shown with blue points, and a
non-significant chemical (Al
), shown with green points:
plot(chemSigSelect(h.northport[, c("Al", "Cr", "Hit")], paramList = c("Al", "Cr")), type = "boxplot")
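chemSig itself can also be called directly to see which chemicals would be selected; a minimal sketch, assuming chemSig accepts the same data and paramList arguments used by FPM and chemSigSelect:
chemSig(h.northport, paramList = p.northport)  ## logical vector; TRUE = significantly elevated when Hit == TRUE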
There are also many arguments that can be adjusted in
FPM
that affect how chemSig
and
chemSigSelect
function or how the FPM algorithm itself
functions. Users can adjust the test methods/assumptions for selecting
chemicals, force the FPM to accept the assumptions of the original
Excel-based function (i.e., setting ExcelMode = TRUE
), etc. As
noted above, the output of FPM
is a list; additional items
can be appended to the list if the user sets the densInfo
,
lockInfo
, or hitInfo
arguments equal to
TRUE
. Respectively, these provide information about 1) how
much each of the benchmarks varied within the FPM algorithm, 2) what
caused each benchmark to lock into place and in what order, and 3) what
the predicted Hit
values would be based on the calculated
benchmarks. We have been interested in each of these components at various points during model development but do not expect the typical user to need this information.
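For instance, the optional diagnostic outputs can be requested in a single call; a minimal sketch:
out <- FPM(h.northport, paramList = p.northport,
           densInfo = TRUE, lockInfo = TRUE, hitInfo = TRUE)
names(out)  ## the returned list now contains more elements than just "FPM"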
See ?FPM
and ?chemSig
for more
information.
In RFPM
, there are currently two optimization functions
available:
- optimFPM
is intended to find an optimal input (to
FPM
) of the FN_crit
and/or
alpha.test
parameter. These optimal levels are based on the
full dataset (input as data
argument) and, therefore, may
be over-specific to that data. The FN_crit
and
alpha.test
arguments should be input either as ranges or
single values depending on how you want the optimization to run.
Inputting a single value for FN_crit
and a range for
alpha.test
would optimize the alpha.test
value
only and vice-versa. Inputting ranges for both values results in
slightly more complex output owing to the 2-dimensional
optimization.
- cvFPM
is similarly parameterized to optimize
FN_crit
and/or alpha.test
inputs using a
cross-validation (CV) approach. The results are “best” values that
account for uncertainty in future data. The user can adjust the
k
parameter to change the number of folds in the CV
algorithm (up to k = n, where n is the number of samples). This
affects the smoothness of the mean/median optimization curves by
generating more hypothetical datapoints; it also is likely to expand the
min/max range of optimization values by providing more chances for
extreme outcomes. By using larger training datasets (i.e., with larger
k
inputs), the CV-based optimum will converge toward the
empirical optimization curve generated by optimFPM
(and
shown in the output of cvFPM
). While we generally recommend
using cvFPM
over optimFPM
when possible, there
are cases when cvFPM
cannot be used. These include the
analysis of small datasets and/or when k
is set to a small
number; in either of these cases, FPM
may fail because of a
lack of significant chemicals being identified by chemSig
(dependent on N).
This is an example of the empirical optimization approach with
optimFPM
:
## one-way optimization of the FN Limit - vertical lines show best values based on two metrics
optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05)
#> sensSpec OR FM MCC
#> 0.4 0.6 0.1 0.1
## two-way optimization of both FN Limit and alpha - black squares show best values based on two metrics
optimFPM(h.northport, p.northport,
FN_crit = seq(0.1, 0.9, 0.05),
alpha.test = seq(0.01, 0.2, 0.01))
#> sensSpec OR FM MCC
#> FN_crit 0.40 0.25 0.10 0.25
#> alpha.test 0.01 0.11 0.07 0.11
The resulting plots show a mix of potential outcomes. When only one
parameter is being optimized (i.e.,
length(alpha.test) == 1 | length(FN_crit) == 1
), then the
first plot is produced, showing fit statistics over the evaluated range.
Vertical lines identify optimal values in OR
,
sensitivity/specificity ratio (equal to
min(c(sens, spec))/max(c(sens, spec)))
, FM
, or
MCC
.
In the case that two parameters are being optimized together (i.e.,
length(alpha.test) > 1 & length(FN_crit) > 1
),
then a matrix of results is provided, using heat colors (by default) to indicate the optimization outcome (yellow = suboptimal, red = optimal). Squares indicate the selected values, which are also output
into the console.
This is an example of the CV-based optimization approach with
cvFPM
and 10-folds:
cvFPM(h.northport, p.northport, k = 10, FN_crit = seq(0.1, 0.9, 0.05), alpha.test = 0.05, which = 2)
#> FN_crit alpha.test
#> sensSpecRatio 0.4 0.05
#> OR 0.8 0.05
#> FM 0.1 0.05
#> MCC 0.6 0.05
Optimization statistics from multiple CV runs are shown (as well as
the empirical “all data” result from optimFPM
) as lines.
When run with multiple alpha.test
and FN_crit
inputs, cvFPM
will generate a heat map similar to optimFPM with values representing the mean result from
CV runs.
Based on the output from either optimFPM
or
cvFPM
, the practitioner may decide to adjust and rerun the
FPM
using optimized inputs. The decision of whether this is
appropriate will depend on the specific management scenario. For
example, the optimal FN Limit could end up being 80%, but that lack of
conservatism may be unacceptable to a regulator. Similarly, the optimal
alpha.test could be 0.5, meaning that the regulator would need to accept
a 50% probability of error when selecting significant chemicals; this is
a very high uncertainty, 10-fold higher than typical. Regardless,
optimization provides useful context to the RFPM
user.
The chemVI
function outputs several potentially useful
metrics for evaluating individual metals in the sediment quality
benchmark set. These metrics are:
- chemDensity
: how little the benchmark “floated” within
the FPM algorithm. High density values correspond to benchmarks that
remained at relatively low concentrations (i.e., did not float up),
suggesting relative importance
- MADP
: average change in other chemicals' benchmarks
resulting from removal of a chemical from data
- dOR
: change in the overall reliability resulting from
removal of the chemical from data
- dFM
: change in the Fowlkes-Mallows Index resulting from
removal of the chemical from data
- dMCC
: change in the Matthews Correlation Coefficient
resulting from removal of the chemical from data
The MADP
and dOR
metrics are the most
useful for identifying important chemicals; for example:
chemVI(h.northport, p.northport)
#> chemDensity MADP dOR dFM dMCC
#> Cr 98.800 924.472 -3.503 -1.435 -14.865
#> Cu 0.001 0.000 0.000 0.000 0.000
#> Fe 100.000 7.020 0.000 0.000 0.000
#> Zn 99.100 -32.485 -1.051 -0.478 -4.955
Here we see that copper (Cu
) has an MADP
,
dOR
, dFM
and dMCC
equal to
0
, meaning that there was no change in the benchmarks or
their ability to predict toxicity after removing copper. Clearly copper
is, therefore, not important in this case and could be excluded from
further consideration. However, that isn’t to say that copper
isn’t related to toxicity; it’s very important to keep in mind that the
FPM is a classification tool that attempts to accurately
predict Hit
values. The
significant chemicals in this list are mostly well correlated, so
toxicity could be related to one, all, or perhaps none of the chemicals
(i.e., an unmeasured but related chemical or non-chemical stressor). The
inclusion of new chemicals in (or exclusion of important chemicals from)
data
can result in markedly different benchmarks and
chemVI
metrics (particularly for the
chemDensity
metric). Therefore, we recommend pre-screening
data
to remove chemicals that are unlikely to cause
toxicity (e.g., due to poor bioavailability, low toxicity, or uniformly
low concentrations). These chemicals, while potentially “significant”
with respect to having higher concentrations when
Hit == TRUE
, may lead to problems down the road when
predicting toxicity more generally. In other words, try to the extent
possible to establish a weight of evidence pointing toward causation
rather than letting pure correlation lead your analysis. In the current
example, we perhaps should have pre-screened chemicals like iron
(Fe
) and aluminum (Al
), which we expected to
have low toxicity at natural levels. Aluminum was ultimately screened
out by FPM
as non-significant, but iron was significant and
retained.
chemVI
can be helpful with respect to developing a
parsimonious benchmark set by identifying benchmarks that do not affect
prediction at the site. In our h.northport
example,
Cu
and Fe
will be removed because they don’t
affect toxicity predictions (either positively or negatively). Whenever
any chemicals would be removed from consideration through an iterative
FPM benchmark development process, FPM
must be rerun to
generate new benchmarks. We caution the user to drop only one
chemical at a time, rerunning FPM
each time.
It is possible that dropping one chemical will lead to unexpected results, such as other chemicals becoming more important.
Compare the two outputs:
FPM(data = h.northport, paramList = p.northport)
#> $FPM
#> Cr Cu Fe Zn FN_crit TP FN TN FP pFN pFP sens spec OR FM
#> 1 26.7 2740 27400 738 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.571 0.627
#> MCC
#> 1 0.222
FPM(data = h.northport, paramList = c("Cr", "Zn"))
#> $FPM
#> Cr Zn FN_crit TP FN TN FP pFN pFP sens spec OR FM MCC
#> 1 26.1 910 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.571 0.627 0.222
For the FPM benchmarks generated for h.northport
, the
MADP
for Fe
was >0, and the benchmark for
Zn
shifted after removing Fe
from
consideration. Although the benchmark changed, the classification errors
did not, as reflected by the unchanged OR
. The final FPM
benchmark set including only Cr
and Zn
is,
therefore, the most parsimonious, by which we mean that it classifies
Hit
values just as accurately as the larger set of
benchmarks but does so with fewer benchmarks. We recommend using a
parsimonious set of benchmarks on a site-specific basis, whenever
possible, because this should avoid problems of “overfitting” that might
cause confounding variables to trigger erroneous conclusions.
You may have noticed that we didn’t use optimized inputs in the
example above for chemVI
. We omitted that to provide an
example where relatively unimportant chemicals could be screened out
using the metrics generated by chemVI
. If we instead change
the inputs of chemVI
to the optimized FN_crit
and alpha.test
(using the OR
optimization
approach), the chemVI
results look like this instead:
FPM(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15)$FPM
#> Al Cr Cu Fe Pb Zn FN_crit TP FN TN FP pFN pFP sens spec
#> 1 24400 131 131 247250 904 1320 0.25 46 15 53 33 0.246 0.384 0.754 0.616
#> OR FM MCC
#> 1 0.673 0.663 0.366
chemVI(h.northport, p.northport, FN_crit = 0.25, alpha.test = 0.15)
#> chemDensity MADP dOR dFM dMCC
#> Al 60.200 361.802 -18.128 -9.804 -54.098
#> Cr 0.512 0.000 0.000 0.000 0.000
#> Cu 99.500 -13.404 -5.052 -3.017 -15.027
#> Fe 1.060 0.000 0.000 0.000 0.000
#> Pb 0.090 0.000 0.000 0.000 0.000
#> Zn 100.000 -22.146 -20.208 -10.709 -60.383
In this case, adjusting alpha.test
has allowed for 2 new
chemicals to be added, aluminum and lead (Pb
), which were
not selected when alpha.test = 0.05
. After including these
chemicals (and slightly adjusting the FN_crit
), the final
OR
increased to 67%, a 10% improvement in accuracy over the
default values (not shown). MCC
increased by a similar
margin. Based on the results of chemVI
shown above, there
is a marked shift in the relative importance of metals for toxicity
predictions. Removing any of the chemicals would result in a roughly 10%
reduction in accuracy (based on the dOR
metric), and
removing Cu
now would have an effect on the other
benchmark values (as shown by the MADP
metric >0). The
chemDensity
value for Cr
has dropped to
<1%, meaning that the benchmark is now close to the maximum possible
value, whereas it had one of the highest chemDensity
values
before.
As you can see, variable importance is dependent on the set of benchmarks, not on individual benchmarks in isolation. The addition of new chemicals affected not only the relative importance of each chemical but also the overall reliability of the chemical set and the FPM benchmark values themselves.
We want to emphasize several findings from our analyses of the
characteristics, behavior, and sensitivity of the FPM
(the
subject of two anticipated manuscripts).
FPM benchmarks should be treated as a set of values, not
values to be used singularly or mixed and matched between runs of the
FPM
using different species, toxicity test endpoints,
sites, etc. There may be a desire or an inclination to select, for each chemical, the benchmark from the most sensitive species or test endpoint. Doing so, however,
would invalidate the various assumptions made when generating FPM
benchmarks, including the projected FN Limit (i.e., intended
FN_crit
and actual pFN
) and the accuracy of
benchmarks (i.e., OR
, FM
, and
MCC
). If there are multiple species that need to be
accounted for by the model, consider different approaches to defining
Hit
values that capture multiple species and/or endpoints
prior to running FPM
. For example, assign
Hit == TRUE
if any of several endpoints indicates toxicity
in a given sample. This approach will result in a single benchmark set
for all endpoints rather than generating multiple sets of benchmarks (as
was done by Ecology 2011). The resulting benchmarks may be conservative,
but they would not deviate from the intended statistical underpinnings
of FPM outputs in the same way that mixing-and-matching multiple
benchmark sets would.
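As a minimal sketch of that approach, assume a hypothetical long-format toxicity table tox with SampleID, Endpoint, and logical Hit columns; a sample is then treated as toxic if any endpoint produced a Hit:
tox <- data.frame(SampleID = rep(c("S1", "S2"), each = 2),         ## hypothetical example data
                  Endpoint = rep(c("Survival", "Biomass"), 2),
                  Hit      = c(FALSE, TRUE, FALSE, FALSE))
hit.any <- aggregate(Hit ~ SampleID, data = tox, FUN = any)        ## TRUE if any endpoint was toxic
hit.any  ## merge onto the wide-format chemistry table by sample before calling FPM()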
FPM benchmarks should be considered as predictive of toxicity
classifications (i.e., Hit
values) rather than
continuous toxicity test results (e.g., 10% mortality, 27% biomass,
etc.). For this reason, we would generally argue against the application
of FPM benchmarks within a Natural Resource Damage Assessment (NRDA)
context. We recognize that future approaches might be proposed for
applying FPM benchmarks in a NRDA context; while theoretically possible,
such attempts should be considered carefully, particularly with regard
to how Hits are defined (and whether the definition reasonably
corresponds with definitions of injury).
FPM benchmarks are most powerful when they are site-specific. By
using site toxicity and sediment chemistry data, and by implicitly
considering factors like bioavailability, FPM benchmarks should improve
upon existing default benchmarks (Ecology 2011). Therefore, we recommend
using RFPM
and site-specific data to the extent practicable
to derive new benchmarks.
While it is most common to generate benchmarks based on bulk
sediment concentrations, it is not necessary that this be the case. For
example, benchmarks could reflect organic carbon-normalized sediment
concentrations, pore water concentrations, etc. These measurements might
improve the relevance of toxicity predictions by taking bioavailability
more explicitly into account. However, generating benchmarks from these types of data does complicate their application by requiring that all future samples be analyzed and concentrations reported in the same way. Moreover, some approaches will not be possible, such as the simultaneously extracted metals minus acid volatile sulfide (SEM-AVS) method, which can generate non-positive data (which cannot be handled by
FPM
).
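As a simple illustration, organic carbon normalization could be applied before calling FPM; a minimal sketch assuming a hypothetical TOC value (fraction organic carbon) for each sample:
oc.norm <- h.northport
oc.norm$TOC <- 0.02  ## hypothetical: 2% organic carbon in every sample; real data would vary by sample
oc.norm[p.northport] <- oc.norm[p.northport] / oc.norm$TOC
## FPM(oc.norm, paramList = p.northport) would then yield OC-normalized benchmarks (e.g., mg/kg OC)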
While there may be a concern about mixing chemicals of different
types and mechanisms/modes of action (e.g., metals and polycyclic
aromatic hydrocarbons) when generating FPM benchmarks, it does not seem
to us that such a distinction should matter. From a conceptual
standpoint, the FPM can be used to predict any variable that is
distilled to a TRUE
/FALSE
classification from
a set of one or more independent variables (chemical or otherwise). The
key limitations that arise when mixing chemicals (and/or non-chemical
stressors) are:
- Variables input to FPM via data need to have a common directionality with respect to Hits. In other words, toxicity is expected to increase as, for example, copper increases. Therefore, you could not also include pore water pH, which increases toxicity at both high and low values. This limitation is related to the chemical selection process, which (by default) assumes that concentrations will be higher among toxic samples. It also relates to the iteratively increasing nature of the FPM algorithm; concentrations start at a low level and then shift in only one direction (up/increasing).
- Correlated variables can introduce unnecessary benchmarks. In the h.northport dataset presented above (using default FPM inputs), iron was correlated with other metals (not shown) and, as a result, was selected for inclusion among the benchmarks despite being relatively non-toxic and contributing not at all to the predictive accuracy of the benchmark set. As described above, chemVI can help to identify potential simplifications to the benchmark set when correlated variables lead to unnecessary complexity. The user is also free to pre-screen their data to remove redundant variables.
- FPM has been found to work poorly or not at all with missing data. Consider removing analytes with incomplete datasets, omitting samples with partial analysis, and/or imputing values to fill data gaps (a minimal pre-screening sketch follows this list).
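For the missing-data limitation, a minimal pre-screening sketch using base R:
keep.chems <- p.northport[colSums(is.na(h.northport[p.northport])) == 0]  ## drop analytes with any missing values
complete.rows <- complete.cases(h.northport[c(keep.chems, "Hit")])        ## drop samples with remaining gaps
FPM(h.northport[complete.rows, ], paramList = keep.chems)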
FPM
does not currently address non-detection in a
meaningful way. As the model is largely based on percentiles and ranges
rather than averages, this shouldn’t generally lead to major problems as
long as there isn’t a high degree of non-detection (e.g., >20%). We
leave it to the user to address how non-detects will be handled and to
use FPM benchmarks based on non-detected data with caution. High degrees
of non-detection of a chemical is a good reason for excluding it from
data
(or paramList
), as there is not a
reasonable way to use such a chemical to predict Hits. The use of more
advanced censored data imputation methods (e.g., Kaplan-Meier or
Regression on Order Statistics [ROS]) may prove useful, but should also be applied with care; these methods (as with treating non-detects as half of detection limits or as zero) have the potential
to result in benchmarks that fall below detection limits. We would argue
that it is unreasonable to apply benchmarks that fall at or below
detection limits.
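As one illustration of screening out heavily censored analytes, assume a hypothetical logical matrix detected (TRUE = detected) with one column per chemical:
detected <- matrix(TRUE, nrow = nrow(h.northport), ncol = length(p.northport),
                   dimnames = list(NULL, p.northport))  ## hypothetical; real data would contain FALSE entries
prop.nd <- 1 - colMeans(detected)            ## proportion of non-detects per chemical
keep.chems <- p.northport[prop.nd <= 0.2]    ## e.g., exclude chemicals with >20% non-detects
FPM(h.northport, paramList = keep.chems)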
As with other toxicity-based benchmarks, it may be important to
consider background chemical concentrations and reference area toxicity
levels when developing and applying FPM
benchmarks.
Reference area toxicity can be incorporated explicitly into
Hits
definitions by normalizing toxicity data to the
reference condition prior to classifying Hits
, and
background can be used to qualify chemical exceedances of FPM benchmarks
that fall below background levels. Examples of applying a reference
condition would include dividing site toxicity by mean reference area
toxicity (then applying a default effect threshold like 80%) or using a
percentile of reference toxicity as the defining threshold for Hits
(e.g., site toxicity >80th percentile of reference effect).
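A minimal sketch of the first approach, assuming hypothetical vectors of continuous toxicity results (site.biomass for site samples, ref.biomass for reference-area samples) in which lower biomass indicates greater toxicity:
site.biomass <- c(0.95, 0.60, 1.10, 0.40)         ## hypothetical relative-growth results for site samples
ref.biomass  <- c(1.00, 0.90, 1.05)               ## hypothetical reference-area results
rel.biomass  <- site.biomass / mean(ref.biomass)  ## normalize each site sample to the mean reference condition
Hit <- rel.biomass < 0.8                          ## e.g., less than 80% of reference biomass is classified as toxic
Hit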
Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.
Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication no. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.