Title: | Nondetects and Data Analysis for Environmental Data |
---|---|
Description: | Contains methods described by Dennis Helsel in his book "Nondetects And Data Analysis: Statistics for Censored Environmental Data". |
Authors: | Lopaka Lee |
Maintainer: | Lopaka Lee |
License: | GPL (>= 2) |
Version: | 1.6-1.1 |
Built: | 2024-10-29 06:35:52 UTC |
Source: | CRAN |
This dataset was used by Helsel and Cohn (1988) to verify their software. It is provided for code validation purposes.
data(Silver)
A list containing 56 observations with items 'obs' and 'censored'. 'obs' is a numeric vector of all observations (both censored and uncensored). 'censored' is a logical vector indicating where an element of 'obs' is censored (a less-than value).
Helsel and Cohn (1988)
Dennis R. Helsel and Timothy A. Cohn (1988), Estimation of descriptive statistics for multiply censored water quality data, Water Resources Research vol. 24, no. 12, pp. 1997-2004.
This dataset is a random selection of dissolved arsenic analyses taken during the U.S. Geological Survey's National Water Quality Assessment program (NAWQA).
data(Arsenic)
A list containing 50 observations with items ‘As’, ‘AsCen’, ‘Aquifer’. ‘As’ is a numeric vector of all arsenic observations (both censored and uncensored). ‘AsCen’ is a logical vector indicating where an element of ‘As’ is censored (a less-than value). ‘Aquifer’ is a grouping factor of hypothetical hydrologic sources for the data.
U.S. Geological Survey National Water Quality Assessment Data Warehouse
The USGS NAWQA site at http://water.usgs.gov/nawqa
This dataset is a random selection of dissolved arsenic analyses taken during the U.S. Geological Survey's National Water Quality Assessment program (NAWQA).
data(NADA.As)
A list containing 50 observations with items ‘obs’ and ‘censored’. ‘obs’ is a numeric vector of all arsenic observations (both censored and uncensored). ‘censored’ is a logical vector indicating where an element of ‘obs’ is censored (a less-than value).
U.S. Geological Survey National Water Quality Assessment Data Warehouse
The USGS NAWQA site at http://water.usgs.gov/nawqa
Artificial numbers representing arsenic concentrations in a drinking water supply.
The objective is to determine what can be done with data where all values are below the reporting limit. There is a detection limit at 1 ug/L and a reporting limit at 3 ug/L. Used in Chapter 8 of the NADA book.
data(AsExample)
None. Generated.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for function asSurv in package NADA.
asSurv converts a Cen object to a Surv object.
## S4 method for signature 'Cen'
asSurv(x)
## S4 method for signature 'formula'
asSurv(x)
x |
A |
Atrazine concentrations in a series of Nebraska wells before (June) and after (September) the growing season.
Objective is to determine if concentrations increase from June to September. There is one detection limit, at 0.01 ug/L. Used in Chapters 4, 5, and 9 of the NADA book.
data(Atra)
Junk et al., 1980, Journal of Environmental Quality 9, pp. 479-483.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Alternative Atrazine concentrations altered from the Atra data set so that there are more nondetects, adding a second detection limit at 0.05.
Objective is to determine if concentrations increase from June to September. There are two detection limits, at 0.01 and 0.05 ug/L. Used in Chapters 5 and 9 of the NADA book.
data(AtraAlt)
Altered from the data of Junk et al., 1980, Journal of Environmental Quality 9, pp. 479-483.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
The same atrazine concentrations as in Atra, stacked into one column (column 1). Column 2 indicates the month of collection. Column 3 indicates which data are below the detection limit (those with a value of 1).
Objective is to determine if concentrations increase from June to September. There is one detection limit, at 0.01 ug/L. Used in Chapter 9 of the NADA book.
data(Atrazine)
Junk et al., 1980, Journal of Environmental Quality 9, pp. 479-483.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Blood-lead concentrations in herons of Virginia. Objective is to compute interval estimates for lead concentrations. There is one detection limit, at 0.02 ug/g. Used in Chapter 7 of the NADA book.
data(Bloodlead)
Golden et al., 2003, Environmental Toxicology and Chemistry 22, 1517-1524.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for box plotting objects in package NADA
## S4 method for signature 'ros'
boxplot(x, ...)
x |
An output object from a NADA function such as |
... |
Additional arguments passed to the generic |
Cadmium concentrations in fish for two regions of the Rocky Mountains.
Objective is to determine if concentrations are the same or different in fish livers of the two regions. There are four detection limits, at 0.2, 0.3, 0.4, and 0.6 ug/L. Used in Chapter 9 of the NADA book.
data(Cadmium)
None. Data modeled after several reports.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Create a censored object, usually used as a response variable in a model formula.
Cen(obs, censored, type = "left")
obs |
A numeric vector of observations. This includes both censored and uncensored observations. |
censored |
A logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
type |
character string specifying the type of censoring. Possible values are
|
An object of class Cen.
This, and related routines, are front ends to routines in the survival package. Since the survival routines cannot handle left-censored data, these routines transparently handle "flipping" input data and resultant calculations. The Cen function provides part of the necessary framework for flipping.
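The flipping idea can be sketched in base R. This is only an illustration under stated assumptions: the name flip_const and the choice max(obs) + 1 are hypothetical, and the constant NADA actually uses inside flip() may differ.

```r
# Left-censored data from the Cen example: TRUE marks a less-than value.
obs      <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Hypothetical flipping constant: any value larger than every observation.
flip_const <- max(obs) + 1

# Subtracting from the constant reverses the ordering, so a left-censored
# "<0.5" becomes a right-censored ">100.5" on the flipped scale.
flipped <- flip_const - obs

# Results computed on the flipped scale are mapped back the same way.
restored <- flip_const - flipped
```

Because the transformation is its own inverse, estimates such as quantiles obtained on the flipped scale can be returned to the original scale by subtracting them from the same constant.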
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
Cen(obs, censored)
flip(Cen(obs, censored))
Draws a boxplot with the highest censoring threshold shown as a horizontal line. Any statistics below this line are invalid and must be estimated using methods for censored data.
cenboxplot(obs, cen, group, log=TRUE, range=0, ...)
obs |
A numeric vector of observations. |
cen |
A logical vector indicating TRUE where an observation in ‘obs’ is censored (a less-than value) and FALSE otherwise. |
group |
A factor vector used for grouping ‘obs’ into subsets (each group will be a separate box). |
log |
A TRUE/FALSE indicating if the y axis should be in log units. Default is TRUE. |
range |
This determines how far the plot whiskers extend out from the box. If 'range' is positive, the whiskers extend to the most extreme data point which is no more than 'range' times the interquartile range from the box. The default is zero which causes the whiskers to extend to the min and max data values. |
... |
Additional items that get passed to |
Returns the output of the default boxplot method.
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
data(Golden)
with(Golden, cenboxplot(Blood, BloodCen, DosageGroup))
Tests if there is a difference between two or more empirical cumulative distribution functions (ECDF) using the G-rho family of tests, or for a single curve against a known alternative.
cendiff(obs, censored, groups, ...)
obs |
Either a numeric vector of observations or a formula. See examples below. |
censored |
A logical vector indicating TRUE where an observation in ‘obs’ is censored (a less-than value) and FALSE otherwise. |
groups |
A factor vector used for grouping ‘obs’ into subsets. |
... |
Additional items that are common to this function and the |
This, and related routines, are front ends to routines in the survival package. Since the survival routines cannot handle left-censored data, these routines transparently handle "flipping" input data and resultant calculations.
This function shares the same arguments as survdiff, the most important of which is rho, which controls the type of test. With rho = 0 this is the log-rank or Mantel-Haenszel test, and with rho = 1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test. The default is rho = 1, or the Peto & Peto test. This is the most appropriate for left-censored log-normal data.
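The effect of rho can be sketched by calling survdiff from the survival package on hand-flipped data. This is a rough illustration of what a front end of this kind is described as doing, not NADA's own code; the data values, the group labels, and the flipping constant are all made up for the example.

```r
library(survival)

# Made-up left-censored concentrations in two hypothetical groups.
obs      <- c(0.2, 0.5, 0.5, 1.2, 2.0, 3.1, 0.2, 0.2, 0.4, 0.6, 0.9, 1.0)
censored <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE,
              TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
groups   <- factor(rep(c("A", "B"), each = 6))

# Flip left-censored values into right-censored ones (hypothetical constant).
flip_const <- max(obs) + 1
flipped    <- flip_const - obs
detected   <- !censored   # TRUE = an observed "event" in survival terms

# rho = 0: log-rank (Mantel-Haenszel) test.
logrank <- survdiff(Surv(flipped, detected) ~ groups, rho = 0)

# rho = 1: Peto & Peto modification of the Gehan-Wilcoxon test.
peto <- survdiff(Surv(flipped, detected) ~ groups, rho = 1)

logrank$chisq
peto$chisq
```

The two statistics generally differ because rho = 1 weights each event by the estimated survival at that time, while rho = 0 weights all events equally.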
For the formula interface: if the right hand side of the formula consists only of an offset term, then a one sample test is done. To cause missing values in the predictors to be treated as a separate group, rather than being omitted, use the factor function with its exclude argument.
Returns a list with the following components:
n |
the number of subjects in each group. |
obs |
the weighted observed number of events in each group. If there are strata, this will be a matrix with one column per stratum. |
exp |
the weighted expected number of events in each group. If there are strata, this will be a matrix with one column per stratum. |
chisq |
the chisquare statistic for a test of equality. |
var |
the variance matrix of the test. |
strata |
optionally, the number of subjects contained in each stratum. |
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored survival data. Biometrika 69, 553-566.
data(Cadmium)
obs = Cadmium$Cd
censored = Cadmium$CdCen
groups = Cadmium$Region
# Cd differences between regions?
cendiff(obs, censored, groups)
# Same as above using formula interface
cendiff(Cen(obs, censored)~groups)
Computes an estimate of an empirical cumulative distribution function (ECDF) for censored data using the Kaplan-Meier method.
cenfit(obs, censored, groups, ...)
obs |
Either a numeric vector of observations or a formula. See examples below. |
censored |
A logical vector indicating TRUE where an observation in ‘obs’ is censored (a less-than value) and FALSE otherwise. |
groups |
A factor vector used for grouping ‘obs’ into subsets. |
... |
Additional items that are common to this function and the |
This, and related routines, are front ends to routines in the survival package. Since the survival routines cannot handle left-censored data, these routines transparently handle "flipping" input data and resultant calculations. Additionally provided are query and prediction methods for cenfit objects.
There are many additional options that are supported and documented in survfit. Only a few have application to the geosciences. However, the most important is ‘conf.int’. This is the level for a two-sided confidence interval on the ECDF. The default is 0.95.
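A Kaplan-Meier ECDF with a non-default confidence level can be sketched using survfit directly on flipped data. This is an illustration only, assuming a hypothetical flipping constant; the column names in ecdf_est are invented for the example and are not part of any NADA object.

```r
library(survival)

# Left-censored data: TRUE marks a less-than value.
obs      <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Flip to right-censored form (hypothetical constant).
flip_const <- max(obs) + 1
flipped    <- flip_const - obs

# Kaplan-Meier fit with a 90% two-sided confidence interval.
km <- survfit(Surv(flipped, !censored) ~ 1, conf.int = 0.90)

# On the flipped scale S(t) = P(flipped > t), which equals
# P(obs < flip_const - t) on the original scale.
ecdf_est <- data.frame(
  value      = flip_const - km$time,
  prob_below = km$surv,
  lower      = km$lower,
  upper      = km$upper
)
ecdf_est
```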
If you are using the formula interface: The censored and groups parameters are not specified – all information is provided via a formula as the obs parameter. The formula must have a Cen object as the response on the left of the ~ operator and, if desired, terms separated by + operators on the right.
A cenfit object.
Methods defined for cenfit objects are provided for print, plot, lines, predict, mean, median, sd, quantile.
If the input formula contained factoring groups (i.e., cenfit(obs, censored, groups)), individual ECDFs can be obtained by indexing (e.g., model[1], etc.).
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence intervals for survival probabilities. Statistics in Medicine 6, 679-87.
Fleming, T. H. and Harrington, D.P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
survfit, Cen, plot-methods, mean-methods, sd-methods, median-methods, quantile-methods, predict-methods, lines-methods, summary-methods, cendiff
# Create a Kaplan-Meier ECDF, plot and summarize it.
data(Cadmium)
obs = Cadmium$Cd
censored = Cadmium$CdCen
mycenfit = cenfit(obs, censored)
plot(mycenfit)
summary(mycenfit)
quantile(mycenfit, conf.int=TRUE)
median(mycenfit)
mean(mycenfit)
sd(mycenfit)
predict(mycenfit, c(10, 20, 100), conf.int=TRUE)
# With groups
groups = Cadmium$Region
cenfit(obs, censored, groups)
# Formula interface -- no groups
cenfit(Cen(obs, censored))
# Formula interface -- with groups
cenfit(Cen(obs, censored)~groups)
A cenfit object is returned from the NADA cenfit function.
survfit: Object of class survfit returned from the survfit function.
signature(x = "cenfit", i = "numeric", j = "missing")
: ...
signature(x = "cenfit")
: ...
signature(x = "cenfit")
: ...
signature(x = "cenfit", y = "ANY")
: ...
signature(object = "cenfit")
: ...
signature(x = "cenfit")
: ...
signature(x = "cenfit")
: ...
signature(x = "cenfit")
: ...
signature(object = "cenfit")
: ...
R. Lopaka Lee
Dennis Helsel
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
class(cenfit(Cen(obs, censored)))
See cenfit for all the details.
data(Atrazine)
cenfit(Atrazine$Atra, Atrazine$AtraCen)
cenfit(Atrazine$Atra, Atrazine$AtraCen, Atrazine$Month)
cenfit(Cen(Atrazine$Atra, Atrazine$AtraCen))
cenfit(Cen(Atrazine$Atra, Atrazine$AtraCen)~Atrazine$Month)
Computes Kendall's tau for singly (y only) or doubly (x and y) censored data. Computes the Akritas-Theil-Sen nonparametric line, with the Turnbull estimate of intercept.
cenken(y, ycen, x, xcen)
y |
A numeric vector of observations or a formula. |
ycen |
A logical vector indicating TRUE where an observation in y is censored (a less-than value) and FALSE otherwise. |
x |
A numeric vector of observations. |
xcen |
A logical vector indicating TRUE where an observation in x is censored (a less-than value) and FALSE otherwise. Can be missing/omitted for the case where x is not censored. |
If you are using the formula interface: The ycen, x and xcen parameters are not specified – all information is provided via a formula as the y parameter. The formula must have a Cen object as the response on the left of the ~ operator and, if desired, terms separated by + operators on the right. See example below.
Kendall's tau is a nonparametric correlation coefficient measuring the monotonic association between y and x. For left-censored data, concordant and discordant directions between x and y are measured whenever possible. So with increasing x values, a change in y from <1 to 10 is an increase (concordant). A change from <1 to a detected 0.5 is considered a tie, as is a change from <1 to <5, because neither can definitively be called an increase or decrease. Tie corrections are employed for the variance of the test statistic in order to account for the many ties when computing p-values.
The ATS line is the slope that results in a Kendall's tau of 0 for the correlation between the residuals, y - slope*x, and x. The cenken routine performs an iterative bisection search to find that slope. The intercept is the median residual, where the median for censored data is computed using the Turnbull estimate for interval-censored data, as implemented in the Icens contributed package for R.
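The pairwise rule for left-censored y values can be sketched in base R. This is a simplified illustration, not the cenken implementation: the helper censored_sign is hypothetical, x is assumed uncensored, tau is normalized by the total number of pairs, and no tie-corrected variance or p-value is computed. The data are made up.

```r
# Direction of the change from (y1, c1) to (y2, c2), where c1/c2 are TRUE
# for less-than values. Returns +1 (increase), -1 (decrease), 0 (tie).
censored_sign <- function(y1, c1, y2, c2) {
  if (!c1 && !c2) return(sign(y2 - y1))          # both detected
  if (c1 && c2) return(0)                        # <a vs <b: indeterminate
  if (c1 && !c2) return(if (y2 >= y1) 1 else 0)  # <y1 to detected y2
  if (y1 >= y2) -1 else 0                        # detected y1 to <y2
}

# Made-up data: y is left-censored where ycen is TRUE, x is increasing.
y    <- c(1, 0.5, 10, 5, 20)
ycen <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
x    <- c(1, 2, 3, 4, 5)

# S = (number of concordant pairs) - (number of discordant pairs).
n <- length(y)
S <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    S <- S + sign(x[j] - x[i]) * censored_sign(y[i], ycen[i], y[j], ycen[j])
  }
}

# Simple normalization by the number of pairs.
tau <- S / (n * (n - 1) / 2)
```

For these five points, three of the ten pairs are indeterminate ties, six are concordant, and one is discordant, giving S = 5 and tau = 0.5.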
Returns tau (Kendall's tau), slope, and p-value for the regression.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Akritas, M.G., S. A. Murphy, and M. P. LaValley (1995). The Theil-Sen Estimator With Doubly Censored Data and Applications to Astronomy. Journ. Amer. Statistical Assoc. 90, p. 170-177.
# Both y and x are censored
# (exercise 11-1 on pg 198 of the NADA book)
data(Golden)
with(Golden, cenken(Blood, BloodCen, Kidney, KidneyCen))
## Not run:
# x is not censored
# (example on pg 213 of the NADA book)
data(TCEReg)
with(TCEReg, cenken(log(TCEConc), TCECen, PopDensity))
# formula interface
with(TCEReg, cenken(Cen(log(TCEConc), TCECen)~PopDensity))
# Plotting data and the regression line
data(DFe)
# Recall x and y parameter positions are swapped in plot vs regression calls
with(DFe, cenxyplot(Year, YearCen, Summer, SummerCen)) # x vs. y
reg = with(DFe, cenken(Summer, SummerCen, Year, YearCen)) # y~x
lines(reg)
## End(Not run)
A "cenken" object is returned from cenken. It extends the ‘list’ class.
Objects can be created by calls of the form cenken(y, ycen, x, xcen).
.Data: Object of class "list"
Class "list", from data part.
signature(x = "cenken")
: ...
R. Lopaka Lee
Dennis Helsel
Regression by Maximum Likelihood (ML) Estimation for left-censored ("nondetect" or "less-than") data. This routine computes regression estimates of slope(s) and intercept by maximum likelihood when data are left-censored. It will compute ML estimates of descriptive statistics when explanatory variables following the ~ are left blank. It will compute ML tests similar in function and assumptions to two-sample t-tests and analysis of variance when groups are specified following the ~. It will compute regression equations, including multiple regression, when continuous explanatory variables are included following the ~. It will compute the ML equivalent of analysis of covariance when both group and continuous explanatory variables are specified following the ~. To avoid an appreciable loss of power with regression and group hypothesis tests, a probability plot of residuals should be checked to ensure that residuals from the regression model are approximately gaussian.
cenmle(obs, censored, groups, ...)
obs |
Either a numeric vector of observations or a formula. See examples below. |
censored |
A logical vector indicating TRUE where an observation in ‘obs’ is censored (a less-than value) and FALSE otherwise. |
groups |
A factor vector used for grouping ‘obs’ into subsets. |
... |
Additional items that are common to this function and the |
This routine is a front end to the survreg routine in the survival package.
There are many additional options that are supported and documented in survreg. Only a few have relevance to the environmental sciences.
A very important option is ‘dist’ which specifies the distributional model to use in the regression. The default is ‘lognormal’.
Another important option is ‘conf.int’. This is NOT an option to survreg but is an added feature (due to some arcane details of R it can't be documented above). The ‘conf.int’ option specifies the level for a two-sided confidence interval on the regression. The default is 0.95. This interval will be used when the output object is passed to other generic functions such as mean and quantile. See Examples below.
Also supported is a ‘gaussian’ or a normal distribution. The use of a gaussian distribution requires an interval censoring context for left-censored data. Luckily, this routine automatically does this for you – simply specify ‘gaussian’ and the correct manipulations are done.
If any other distribution is specified besides lognormal or gaussian, the return object is a raw survreg object – it is up to the user to ‘do the right thing’ with the output (and input for that matter).
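A lognormal fit to left-censored data can be sketched by calling survreg directly. This is an illustration, not cenmle's internal code: it assumes the survival package's own left-censoring support via Surv(..., type = "left"), and the back-transformations below are the standard lognormal formulas rather than anything taken from NADA.

```r
library(survival)

# Left-censored data: TRUE marks a less-than value.
obs      <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Intercept-only lognormal model; status TRUE = detected, FALSE = left-censored.
fit <- survreg(Surv(obs, !censored, type = "left") ~ 1, dist = "lognormal")

# Parameters on the log scale.
mu    <- unname(coef(fit))
sigma <- fit$scale

# Standard lognormal summaries on the original scale.
median_est <- exp(mu)
mean_est   <- exp(mu + sigma^2 / 2)
```

With a lognormal model the estimated mean always exceeds the estimated median, which is one reason the choice of ‘dist’ matters for the reported summary statistics.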
If you are using the formula interface: The censored and groups parameters are not specified – all information is provided via a formula as the obs parameter. The formula must have a Cen object as the response on the left of the ~ operator and, if desired, terms separated by + operators on the right. See Examples below.
A cenmle object.
Methods defined for cenmle objects are provided for mean, median, sd.
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Cen, cenmle-methods, mean-methods, sd-methods, median-methods, quantile-methods, summary-methods
# Create a MLE regression object
data(TCEReg)
tcemle = with(TCEReg, cenmle(TCEConc, TCECen))
summary(tcemle)
median(tcemle)
mean(tcemle)
sd(tcemle)
quantile(tcemle)
# This time specify a different confidence interval
tcemle = with(TCEReg, cenmle(TCEConc, TCECen, conf.int=0.80))
# Use the model's confidence interval with the quantile function
quantile(tcemle, conf.int=TRUE)
# With groupings
with(TCEReg, cenmle(TCEConc, TCECen, PopDensity))
A "cenmle" object is returned from cenmle. It extends the ‘cenreg’ class returned from survreg.
Objects can be created by calls of the form cenmle(obs, censored).
survreg: Object of class "survreg"
Class "list", from data part.
Class "vector", by class "list".
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(object = "cenmle")
: ...
R. Lopaka Lee
Dennis Helsel
x = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
xcen = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
class(cenmle(x, xcen))
A "cenmle-gaussian" object is returned from cenmle when a gaussian distribution is chosen with the ‘dist’ option.
Objects can be created by calls of the form cenmle(obs, censored, dist="gaussian").
n: Total number of observations associated with the model
n.cen: Number of censored observations
y: Vector of observations
ycen: Censoring indicator
conf.int: Confidence interval associated with the model
survreg: Object of class "survreg"
Class "cenmle"
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(object = "cenmle")
: ...
R. Lopaka Lee
Dennis Helsel
A "cenmle-lognormal" object is returned from cenmle when a lognormal distribution is chosen with the ‘dist’ option.
Objects can be created by calls of the form cenmle(obs, censored, dist="lognormal").
n: Total number of observations associated with the model
n.cen: Number of censored observations
y: Vector of observations
ycen: Censoring indicator
conf.int: Confidence interval associated with the model
survreg: Object of class "survreg"
Class "cenmle"
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(x = "cenmle")
: ...
signature(object = "cenmle")
: ...
R. Lopaka Lee
Dennis Helsel
Computes regression equations for singly censored data using maximum likelihood estimation. Estimates of slopes and intercept, tests for significance of parameters, and predicted quantiles (Median = points on the line) with confidence intervals can be computed.
cenreg(obs, censored, groups, ...)
obs |
Either a numeric vector of observations or a formula. See examples below. |
censored |
If a formula is not specified, this should be a logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
groups |
If a formula is not specified, this should be a numeric or factor vector that represents the explanatory variable. |
... |
Additional items that are common to this function and the |
This routine is a front end to the survreg routine in the survival package.
There are many additional options that are supported and documented in survreg. Only a few have relevance to the environmental sciences.
A very important option is ‘dist’ which specifies the distributional model to use in the regression. The default is ‘lognormal’.
Another important option is ‘conf.int’. This is NOT an option to survreg but is an added feature (due to some arcane details of R it can't be documented above). The ‘conf.int’ option specifies the level for a two-sided confidence interval on the regression. The default is 0.95. This interval will be used when the output object is passed to other generic functions such as mean and quantile. See Examples below.
Also supported is a ‘gaussian’ or a normal distribution. The use of a gaussian distribution requires an interval censoring context for left-censored data. Luckily, this routine automatically does this for you – simply specify ‘gaussian’ and the correct manipulations are done.
If any other distribution is specified besides lognormal or gaussian, the return object is a raw survreg object – it is up to the user to ‘do the right thing’ with the output (and input for that matter).
If you are using the formula interface: The censored and groups parameters are not specified – all information is provided via a formula as the obs parameter. The formula must have a Cen object as the response on the left of the ~ operator and, if desired, terms separated by + operators on the right. See examples below.
The reported likelihood r correlation coefficient measures the linear association between y (obs) and x (groups), based on the difference in log likelihoods between the fitted model and the null model. Slopes and intercepts are fit by maximum likelihood. A lognormal distribution is fit by default, with a normal distribution being an option. Estimates of predicted values on the line can be obtained by specifying the values for all x variables at which y is to be predicted. Requesting the median (p=0.5) will provide estimates on the line for a lognormal distribution. Estimates of the mean are also possible, as are estimates of other percentiles. Equations for confidence intervals follow those of Meeker and Escobar (1998).
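How a likelihood-based r can be derived from the two log likelihoods is sketched below with survreg. The formula used here, r^2 = 1 - exp(-G2/n) with G2 equal to twice the log-likelihood difference, is an assumption about the definition (it is not stated on this page), and the data, detection limit, and variable names are made up for illustration.

```r
library(survival)

# Made-up data: y increases with x; values below a hypothetical
# detection limit (dl) are recorded as left-censored at dl.
x     <- 1:12
noise <- c(0.3, -0.2, 0.1, -0.4, 0.25, -0.1, 0.35, -0.3, 0.05, 0.2, -0.15, 0.1)
y     <- exp(0.2 * x + noise)
dl    <- 1.6
censored    <- y < dl
y[censored] <- dl

# Lognormal regression of left-censored y on x.
fit <- survreg(Surv(y, !censored, type = "left") ~ x, dist = "lognormal")

# survreg stores the log likelihoods of the null (intercept-only)
# and fitted models, in that order.
G2 <- 2 * (fit$loglik[2] - fit$loglik[1])

# Assumed definition of the likelihood r, signed by the slope.
n <- length(y)
r <- sign(coef(fit)[["x"]]) * sqrt(1 - exp(-G2 / n))
```

Under this definition r approaches 0 when the explanatory variable adds nothing to the null model and approaches 1 in magnitude as the likelihood improvement grows.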
Returns a summary.cenreg
object.
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Meeker, W.Q. and L. A. Escobar (1998). Statistical Methods for Reliability Data. John Wiley and Sons, USA, NJ.
# (examples in Chap 12 of the NADA book)
data(TCEReg)
# Using the formula interface
with(TCEReg, cenreg(Cen(TCEConc, TCECen)~PopDensity))
# Two or more explanatory variables requires the formula interface
tcemle2 = with(TCEReg, cenreg(Cen(TCEConc, TCECen)~PopDensity+Depth))
# Prediction of quantiles at PopDensity=5 and Depth=110
predict(tcemle2, c(5, 110))
A "cenreg" object is returned from cenreg. It extends the ‘cenreg’ class returned from survreg.
Objects can be created by calls of the form cenreg(obs, censored, groups).
conf.int: Numeric value of confidence level (0.95)
n: Total number of samples
n.cen: Total censored samples
survreg: Object of class "survreg"
y: Total y samples
ycen: Total censored y samples
Class "list", from data part.
Class "vector", by class "list".
signature(object = "cenreg")
: ...
signature(x = "cenreg")
: ...
signature(object = "cenreg")
: ...
R. Lopaka Lee
Dennis Helsel
A "cenreg-gaussian" object is returned from cenreg when a gaussian distribution is chosen with the ‘dist’ option.
Objects can be created by calls of the form cenreg(obs, censored, dist="gaussian").
n: Total number of observations associated with the model
n.cen: Number of censored observations
y: Vector of observations
ycen: Censoring indicator
conf.int: Confidence interval associated with the model
survreg: Object of class "survreg"
Class "cenreg"
signature(object = "cenreg")
: ...
signature(x = "cenreg")
: ...
signature(object = "cenreg")
: ...
R. Lopaka Lee
Dennis Helsel
A "cenreg-lognormal" object is returned from cenreg when a lognormal distribution is chosen with the ‘dist’ option.
Objects can be created by calls of the form cenreg(obs, censored, dist="lognormal").
n: Total number of observations associated with the model
n.cen: Number of censored observations
y: Vector of observations
ycen: Censoring indicator
conf.int: Confidence interval associated with the model
survreg: Object of class "survreg"
Class "cenreg"
signature(object = "cenreg")
: ...
signature(x = "cenreg")
: ...
signature(object = "cenreg")
: ...
R. Lopaka Lee <[email protected]>
Dennis Helsel <[email protected]>
A convenience function that produces a comparative table of summary statistics obtained using the cenros, cenmle and cenfit routines. These routines implement, respectively, Regression on Order Statistics (ROS), Maximum Likelihood Estimation (MLE), and Kaplan-Meier (K-M).
censtats(obs, censored)
obs | A numeric vector of observations. |
censored | A logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
If the data do not fulfill the criteria for the application of any method, no summary statistics will be produced.
A data frame with the summary statistics.
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
data(DFe)
with(DFe, censtats(Summer, SummerCen))
Produces basic, and hopefully useful, summary statistics on censored data.
censummary(obs, censored, groups)
obs | A numeric vector of observations. |
censored | A logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
groups | A factor vector used for grouping ‘obs’ into subsets. |
A censummary object.
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
data(DFe)
with(DFe, censummary(Summer, SummerCen))
Draws an x-y scatter plot with censored values represented by dashed lines spanning from the censoring threshold to zero.
cenxyplot(x, xcen, y, ycen, log="", lty="dashed", ...)
x | A numeric vector of observations. |
xcen | A logical vector indicating TRUE where an observation in x is censored (a less-than value) and FALSE otherwise. |
y | A numeric vector of observations. |
ycen | A logical vector indicating TRUE where an observation in y is censored (a less-than value) and FALSE otherwise. |
log | A character string which contains '"x"' if the x axis is to be logarithmic, '"y"' if the y axis is to be logarithmic and '"xy"' or '"yx"' if both axes are to be logarithmic. Default is '""', which leaves both axes linear. |
lty | The line type of the lines representing the censored-data ranges. |
... | Additional items that get passed to |
R. Lopaka Lee
Dennis Helsel
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
data(DFe)
with(DFe, cenxyplot(Year, YearCen, Summer, SummerCen))
Chloroform concentrations in groundwaters of California.
Objective is to determine if concentrations differ between urban and rural areas. There are three detection limits, at 0.05, 0.1, and 0.2 ug/L. Used in Chapter 9 of the NADA book.
data(ChlfmCA)
Squillace et al., 1999, Environmental Science and Technology 33, 4176-4187.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for extracting coefficients from MLE regression models in package NADA
## S4 method for signature 'cenreg'
coef(object, ...)
object |
An output object from a NADA function such as |
... |
Additional parameters to subclasses – currently none |
Methods for function cor in package NADA
Extracts the r-likelihood correlation coefficient from a cenreg object.
Copper and zinc concentrations in ground waters from two zones in the San Joaquin Valley of California. The zinc concentrations were used.
Objective is to determine if zinc concentrations differ between the two zones. Zinc has two detection limits, at 3 and 10 ug/L. Used in Chapters 4, 5 and 9 of the NADA book.
data(CuZn)
Millard and Deverel, 1988, Water Resources Research 24, pp. 2087-2098.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Zinc concentrations of the CuZn data set; concentrations in the Alluvial Fan zone have been altered so that there are more nondetects. This produces a greater signal, even with more nondetects.
Objective is to determine if zinc concentrations differ between the two zones. Zinc has two detection limits, at 3 and 10 ug/L. Used in Chapter 9 of the NADA book.
data(CuZnAlt)
Altered from the data of Millard and Deverel, 1988, Water Resources Research 24, pp. 2087-2098.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Dissolved iron concentrations over several years in the Brazos River, Texas. Summer concentrations were used.
Objective is to determine if there is a trend over time. Iron has two detection limits, at 3 and 10 ug/L. Used in Chapters 5, 11 and 12 of the NADA book.
data(DFe)
Hughes and Millard, 1988, Water Resources Bulletin 24, pp. 521-531.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Dissolved Organic Carbon (DOC) concentrations in ground waters of irrigated and non-irrigated areas.
Objective is to determine if concentrations differ between irrigated and non-irrigated areas. There is one detection limit at 0.2 ug/L. Used in Chapter 9 of the NADA book.
data(DOC)
Junk et al., 1980, Journal of Environmental Quality 9, pp. 479-483.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for function flip in package NADA.
When used in concert with Cen, flip rescales left-censored data into right-censored data for use in the survival package routines (which can only handle right-censored data sets).
## S4 method for signature 'Cen'
flip(x)
## S4 method for signature 'formula'
flip(x)
x |
A |
Flips, or rescales, a Cen object or a formula object.
By default, flip rescales the input data by subtracting each observation from a constant that is larger than the maximum input value. It then marks the data as right-censored so that routines from the survival package can be used.
IMPORTANT: All NADA routines transparently handle flipping and re-transforming data. Thus, flip should almost never be used, except perhaps in the development of an extension function.
Also, flipping a Cen object results in a Surv object, which presently cannot be flipped back to a Cen object!
Flipping a formula just symbolically updates the response (which should be a Cen object). The result is like: flip(Cen(obs, cen))~groups
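The idea behind flipping can be pictured with a few lines of base R. This is only a minimal sketch, not NADA's code: it assumes the flip is computed as M - obs for a constant M larger than max(obs), and the helper name flip_sketch and the choice M = max(obs) + 1 are hypothetical.

```r
# Hypothetical helper (not part of NADA): turn left-censored data into
# right-censored data by subtracting each value from a constant M.
flip_sketch <- function(obs, censored, M = max(obs) + 1) {
  stopifnot(length(obs) == length(censored), M > max(obs))
  data.frame(
    time  = M - obs,    # flipped values; the ordering is reversed
    event = !censored   # TRUE = detected value, FALSE = right-censored
  )
}

obs <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

flipped <- flip_sketch(obs, censored)
flipped

# Subtracting from the same constant again recovers the original scale.
back <- (max(obs) + 1) - flipped$time
```

A less-than value such as <0.5 becomes a greater-than value on the flipped scale, which is the form right-censoring routines expect.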
Lead concentrations in the blood and several organs of herons in Virginia.
Objective is to determine the relationships between lead concentrations in the blood and various organs. Do concentrations reflect environmental lead concentrations, as represented by dosing groups? There is one detection limit, at 0.02 ug/g. Used in Chapters 10 and 11 of the NADA book.
data(Golden)
Golden et al., 2003, Environmental Toxicology and Chemistry 22, pp. 1517-1524.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Proportions of detectable concentrations of antibiotics (ug/L) in drainage from fish hatcheries across the United States.
Objective is to compute confidence intervals and tests on proportions.
There is one detection limit for each compound, all at 0.05 ug/L. Used in Chapters 8 and 9 of the NADA book.
data(Hatchery)
Thurman et al., 2002, Occurrence of antibiotics in water from fish hatcheries. USGS Fact Sheet FS 120-02.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Helsel-Cohn style plotting positions for multiply-censored data.
hc.ppoints(obs, censored, na.action)
hc.ppoints.uncen(obs, censored, cn, na.action)
hc.ppoints.cen(obs, censored, cn, na.action)
obs | A numeric vector of observations. This includes both censored and uncensored observations. |
censored | A logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
cn | An optional argument for internal-code use only. cn = a Cohn Numbers list (quantities described by Helsel and Cohn (1988) in their formulation of the problem). |
na.action | A function which indicates what should happen when the data contain |
The function computes Weibull-type plotting positions for data containing a mix of uncensored and censored values. The formula was first described by Hirsch and Stedinger (1987) and later reformulated by Helsel and Cohn (1988). It assumes that censoring is left-censoring (less-thans). A detailed discussion of the formulation is in Lee and Helsel (2005).
Note that if the input vector ‘censored’ is of zero length, then the plotting positions are calculated using ppoints. Otherwise, hc.ppoints.uncen and hc.ppoints.cen are used.
hc.ppoints.uncen calculates plotting positions for uncensored data only.
hc.ppoints.cen calculates plotting positions for censored data only.
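For the uncensored fallback mentioned above, base R's ppoints gives a feel for what a plotting position is. A minimal sketch, assuming Weibull-type positions i/(n+1), which correspond to ppoints with a = 0; which value of a NADA itself passes to ppoints is not stated here.

```r
# Uncensored observations only (illustrative values).
obs <- c(0.5, 1.0, 5.0, 10, 100)
n <- length(obs)

pp_weibull <- ppoints(n, a = 0)      # (i - a) / (n + 1 - 2a) with a = 0
pp_manual  <- seq_len(n) / (n + 1)   # the same positions written out directly

# The positions refer to the sorted observations.
data.frame(obs = sort(obs), pp = pp_weibull)
```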
hc.ppoints returns a numeric vector of plotting positions which correspond to the observations in the input vector 'obs'.
hc.ppoints.uncen returns a numeric vector of plotting positions which correspond to the uncensored observations in the input vector 'obs'.
hc.ppoints.cen returns a numeric vector of plotting positions which correspond to the censored observations in the input vector 'obs'.
R. Lopaka Lee
Dennis Helsel
Lee and Helsel (2005), Statistical analysis of environmental data containing multiple detection limits: S-language software for regression on order statistics, Computers in Geoscience vol. 31, pp. 1241-1248
Dennis R. Helsel and Timothy A. Cohn (1988), Estimation of descriptive statistics for multiply censored water quality data, Water Resources Research vol. 24, no. 12, pp.1997-2004
Robert M. Hirsch and Jery R. Stedinger (1987), Plotting positions for historical floods and their precision. Water Resources Research, vol. 23, no. 4, pp. 715-727.
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
hc.ppoints(obs, censored)
Mercury concentrations in fish across the United States.
Objective is to determine if mercury concentrations differ by watershed land use. Can concentrations be related to water and sediment characteristics of the streams?
There are three detection limits, at 0.03, 0.05, and 0.10 ug/g wet weight. Used in Chapters 10, 11 and 12 of the NADA book.
data(HgFish)
Brumbaugh et al., 2001, USGS Biological Science Report BSR-2001-0009.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for adding lines to plots in package NADA
## S4 method for signature 'ros'
lines(x, ...)
## S4 method for signature 'cenfit'
lines(x, ...)
## S4 method for signature 'cenken'
lines(x, ...)
x |
An output object from a NADA function such as |
... |
Additional arguments passed to the generic method. |
Copper concentrations in ground water from the Alluvial Fan zone in the San Joaquin Valley of California. One observation was altered to become a <21, larger than all of the detected observations (the largest detected observation is a 20).
Objective is to calculate summary statistics when the largest observation is censored.
There are five detection limits, at 1, 2, 5, 10 and 20 ug/L. An additional artificial detection limit of 21 was added to illustrate a point. Used in Chapter 6 of the NADA book.
data(MDCu)
Millard and Deverel, 1988, Water Resources Research 24, pp. 2087-2098.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for computing the mean using model objects in package NADA
## S4 method for signature 'ros'
mean(x, ...)
## S4 method for signature 'cenfit'
mean(x, ...)
## S4 method for signature 'cenmle'
mean(x, ...)
x |
An output object from a NADA function such as |
... |
Additional arguments passed to the generic method. |
Methods for computing the median using model objects in package NADA
## S4 method for signature 'ros'
median(x, na.rm=FALSE)
## S4 method for signature 'cenfit'
median(x, na.rm=FALSE)
## S4 method for signature 'cenmle'
median(x, na.rm=FALSE)
x |
An output object from a NADA function such as |
na.rm |
Should NAs be removed prior to computation? |
A "NADAList" simply extends the ‘list’ class.
NADAList objects are created by calls like cenken(y, ycen, x, xcen) and other functions.
.Data: Object of class "list"
Class "list", from data part.
signature(object = "NADAList"): ...
R. Lopaka Lee
Dennis Helsel
Arsenic concentrations (ug/L) in an urban stream, Manoa Stream at Kanewai Field, on Oahu, Hawaii.
Objective is to characterize conditions by computing summary statistics.
There are three detection limits, at 0.9, 1, and 2 ug/L. Uncensored values reported below the lowest detection limit indicate that informative censoring may have been used, and so the results are likely biased high. Used in Chapter 6 of the NADA book.
data(Oahu)
Tomlinson, 2003, Effects of Ground-Water/Surface-Water Interactions and Land Use on Water Quality. Written communication (draft USGS report).
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
pctCen is a simple, but convenient, function that calculates the percentage of censored values.
pctCen(obs, censored, na.action)
obs | A numeric vector of observations. This includes both censored and uncensored observations. |
censored | A logical vector indicating TRUE where an observation in obs is censored (a less-than value) and FALSE otherwise. |
na.action | A function which indicates what should happen when the data contain |
100*(length(obs[censored])/length(obs))
pctCen returns a single numeric value representing the percentage of values censored in the "obs" vector.
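The formula above can be checked by hand in base R. The following is a minimal sketch that re-implements only the stated formula; the helper name pct_cen_sketch is hypothetical, and it ignores the na.action handling of the real function.

```r
# Hypothetical base-R version of the documented formula (not NADA's code).
pct_cen_sketch <- function(obs, censored) {
  stopifnot(length(obs) == length(censored))
  100 * (length(obs[censored]) / length(obs))
}

obs <- c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

pct <- pct_cen_sketch(obs, censored)   # 2 of the 7 values are censored
pct

# Equivalent shortcut: the mean of a logical vector is the proportion of TRUEs.
pct_alt <- 100 * mean(censored)
```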
R. Lopaka Lee
Dennis Helsel
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
pctCen(obs, censored)
Methods for plotting objects in package NADA
## S4 method for signature 'ros'
plot(x, plot.censored=FALSE, lm.line=TRUE, grid=TRUE, ...)
## S4 method for signature 'cenfit'
plot(x, conf.int=FALSE, ...)
## S4 method for signature 'cenmle'
plot(x, ...)
## S4 method for signature 'cenreg'
plot(x, ...)
x |
An output object from a NADA function such as |
conf.int |
A logical indicating if confidence intervals should be computed. For
|
plot.censored |
|
lm.line |
|
grid |
|
... |
Additional arguments passed to the generic method. |
Functions that perform predictions using NADA model objects.
For ros models, predict the normal quantile of a value.
For cenfit objects, predict the probabilities of new observations.
## S4 method for signature 'ros'
predict(object, newdata, ...)
## S4 method for signature 'cenfit'
predict(object, newdata, conf.int=FALSE, ...)
## S4 method for signature 'cenreg'
predict(object, newdata, conf.int=FALSE, ...)
## S4 method for signature 'cenfit'
pexceed(object, newdata, conf.int=FALSE, ...)
## S4 method for signature 'ros'
pexceed(object, newdata, conf.int=FALSE, conf.level=0.95, ...)
object |
An output object from a NADA function such as |
newdata |
Numeric vector of data for which to predict model values.
For |
conf.int |
A logical indicating if confidence intervals should be computed. For
|
conf.level |
The actual confidence level to which to bracket the prediction. Default is 0.95 |
... |
Additional arguments passed to the generic method. |
Methods for function print in package NADA
Default print method
Displays a Cen object.
Displays a cenfit object.
Displays a cenmle object.
Displays a cenreg object.
Displays a summary.cenreg object.
Displays a ros object.
Displays a censummary object.
Displays a NADAList object. This is an internal method and should rarely be used from the command line.
Methods for the function quantile in package NADA
Compute the modeled values of quantiles or probabilities using a model object.
## S4 method for signature 'ros'
quantile(x, probs=NADAprobs, ...)
## S4 method for signature 'cenfit'
quantile(x, probs=NADAprobs, conf.int=FALSE, ...)
## S4 method for signature 'cenmle'
quantile(x, probs=NADAprobs, conf.int=FALSE, ...)
x |
An output object from a NADA function such as |
probs |
Numeric vector of probabilities for which to calculate model values. The default is the global variable NADAprobs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95). |
conf.int |
A logical indicating if confidence intervals should be computed.
For |
... |
Additional arguments passed to the generic method. |
data(Cadmium)
mymodel = cenfit(Cadmium$Cd, Cadmium$CdCen, Cadmium$Region)
quantile(mymodel, conf.int=TRUE)
Atrazine concentrations in streams throughout the Midwestern United States.
Objective is to develop a regression model for atrazine concentrations using explanatory variables.
There is one detection limit, at 0.05 ug/L. Used in Chapter 12 of the NADA book.
data(Recon)
Mueller et al., 1997, Journal of Environmental Quality 26, pp. 1223-1230.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for extracting residuals from MLE regression models in package NADA
## S4 method for signature 'cenreg'
residuals(object, ...)
object |
An output object from a NADA function such as |
... |
Additional parameters to subclasses – currently none |
Lindane concentrations in fish from tributaries of the Thames River, England.
Objective is to determine whether lindane concentrations are the same at all sites.
There is one detection limit at 0.08 ug/kg. Used in Chapter 9 of the NADA book.
data(Roach)
Yamaguchi et al., 2003, Chemosphere 50, 265-273.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
ros is an implementation of Regression on Order Statistics (ROS) designed for multiply censored analytical chemistry data.
The method assumes the data contain zero or more left-censored (less-than) values.
ros(obs, censored, forwardT="log", reverseT="exp", na.action)
obs |
A numeric vector of observations. This includes both censored and uncensored observations. |
censored |
A logical vector indicating TRUE where an observation in
|
forwardT |
A name of a function to use for transformation prior to performing
the ROS fit. Defaults to |
reverseT |
A name of a function to use for reversing the transformation
after performing the ROS fit. Defaults to |
na.action |
A function which indicates what should happen
when the data contain |
By default, ros log-transforms the data before fitting and back-transforms the results with exp afterwards. This can be changed by specifying a forward and a reverse transformation function using the forwardT and reverseT parameters. No transformation will be performed if either forwardT or reverseT is set to NULL.
The procedure first computes the Weibull-type plotting positions of the combined uncensored and censored observations using a formula designed for multiply-censored data (see hc.ppoints). A linear regression is then fit of the (transformed) uncensored observations on the normal quantiles of their plotting positions. This model is used to estimate the concentrations of the censored observations as a function of their normal quantiles. Finally, the observed uncensored values are combined with the modeled censored values to collectively estimate summary statistics of the entire population. By combining the uncensored values with modeled censored values, this method is more resistant to non-normality of errors and reduces any transformation errors that may be incurred.
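The steps of the procedure can be sketched in base R. This is a simplified illustration, not NADA's implementation: it assumes a single detection limit that lies below every detected value, so that plain Weibull positions i/(n+1) can stand in for hc.ppoints, and it hard-codes the default log/exp transformation. The helper name ros_sketch and the data are hypothetical.

```r
# Simplified ROS sketch (hypothetical helper, not NADA's ros()).
ros_sketch <- function(obs, censored) {
  n <- length(obs)
  # Censored values take the lowest ranks; this is valid only under the
  # single-detection-limit assumption stated above.
  ord <- order(!censored, obs)
  d <- data.frame(
    obs = obs[ord],
    censored = censored[ord],
    pp = seq_len(n) / (n + 1)          # Weibull-type plotting positions
  )
  d$q <- qnorm(d$pp)                   # normal quantiles of the positions
  d$y <- log(d$obs)                    # forward transformation
  # Regress the transformed uncensored values on their normal quantiles.
  fit <- lm(y ~ q, data = d[!d$censored, ])
  # Model the censored values from their quantiles and back-transform.
  d$modeled <- d$obs
  d$modeled[d$censored] <- exp(predict(fit, newdata = d[d$censored, ]))
  d[, c("obs", "censored", "pp", "modeled")]
}

obs <- c(2, 2, 2, 3, 4, 6, 9, 15)      # the first three are "<2"
censored <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

est <- ros_sketch(obs, censored)
est
# Summary statistics use detected values as observed and censored values as modeled.
ros_mean <- mean(est$modeled)
ros_sd <- sd(est$modeled)
```

With several detection limits the ranks of censored and uncensored values interleave, which is the case the Helsel-Cohn plotting positions are designed for.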
ros returns an object of class c("ros", "lm").
print displays a simple summary of the ROS model.
as.data.frame converts the modeled data in a ROS model to a data frame. Note that this discards all linear-model information from the object.
R. Lopaka Lee
Dennis Helsel
Lee and Helsel (2005) Statistical analysis of environmental data containing multiple detection limits: S-language software for regression on order statistics, Computers in Geoscience vol. 31, pp. 1241-1248.
Lee and Helsel (2005) Baseline models of trace elements in major aquifers of the United States. Applied Geochemistry vol. 20, pp. 1560-1570.
Dennis R. Helsel (2005), Nondetects And Data Analysis: John Wiley and Sons, New York.
Dennis R. Helsel (1990), Less Than Obvious: Statistical Methods for, Environmental Science and Technology, vol.24, no. 12, pp. 1767-1774
Dennis R. Helsel and Timothy A. Cohn (1988), Estimation of descriptive statistics for multiply censored water quality data, Water Resources Research vol. 24, no. 12, pp.1997-2004
splitQual, predict, plot, ros-class, ros-methods, plot-methods, mean-methods, sd-methods, quantile-methods, median-methods, predict-methods, summary-methods
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
myros = ros(obs, censored)
plot(myros)
summary(myros)
mean(myros); sd(myros)
quantile(myros); median(myros)
as.data.frame(myros)
A "ros" object is returned from ros. It extends the "lm" class returned from lm.
Objects can be created by calls of the form ros(obs, censored).
.Data: Object of class "list"
Class "list", from data part.
Class "vector", by class "list".
signature(x = "ros"): ...
signature(x = "ros"): ...
signature(x = "ros"): ...
signature(x = "ros", y = "missing"): ...
signature(object = "ros"): ...
signature(x = "ros"): ...
signature(x = "ros"): ...
signature(x = "ros"): ...
signature(object = "ros"): ...
R. Lopaka Lee
Dennis Helsel
obs = c(0.5, 0.5, 1.0, 1.5, 5.0, 10, 100)
censored = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
class(ros(obs, censored))
Methods for constructing ROS models in package NADA
Compute and return a ROS model given a numeric vector of observations and a logical vector indicating TRUE where an observation is censored and FALSE otherwise.
Methods for computing standard deviations in package NADA
## S4 method for signature 'ros'
sd(x, na.rm=FALSE)
## S4 method for signature 'cenfit'
sd(x, na.rm=FALSE)
## S4 method for signature 'cenmle'
sd(x, na.rm=FALSE)
x |
An output object from a NADA function such as |
na.rm |
Should NAs be removed prior to computation? |
Lead concentrations in stream sediments before and after wildfires.
Objective is to determine whether lead concentrations are the same pre- and post-fire.
There is one detection limit at 4 ug/L. Used in Chapter 9 of the NADA book.
data(SedPb)
Eppinger et al., 2003, USGS Open-File Report 03-152.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Pyrene concentrations in milligrams per liter from 20 water-quality monitoring stations in the Puget Sound of Washington State, USA.
Used for characterizing priority pollutant concentrations in sediments of Puget Sound by computing summary statistics. Contains eight detection limits with 11 nondetects out of 56 total measurements.
data(ShePyrene)
She, N., 1997, Analyzing censored water quality data using a nonparametric approach. Journal of the American Water Resources Association, 33, pp. 615-624.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Methods for showing objects in package NADA
## S4 method for signature 'ros'
show(object)
## S4 method for signature 'cenfit'
show(object)
## S4 method for signature 'cenmle'
show(object)
## S4 method for signature 'cenreg'
show(object)
## S4 method for signature 'summary.cenreg'
show(object)
## S4 method for signature 'cenken'
show(object)
## S4 method for signature 'censummary'
show(object)
## S4 method for signature 'NADAList'
show(object)
object |
An output object from a NADA function such as |
Silver concentrations in a standard solution sent to 56 laboratories as part of a quality assurance program.
Objective is to estimate summary statistics for the standard solution. The median or mean might be considered the most likely estimate of the concentration.
Contains twelve detection limits, the largest at 25 ug/L. Used in Chapter 6 of the NADA book.
data(Silver)
Helsel and Cohn, 1988, Water Resources Research 24, pp. 1997-2004.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
splitQual extracts qualified and unqualified vectors from a character vector containing concatenated numeric and qualifying characters.
Typically used to split "less-thans" in qualifier-numeric concatenations like "<0.5".
splitQual(v, qual.symbol= "<")
v | A character vector. |
qual.symbol | The qualifier symbol to split from the characters in v. Defaults to "<". |
splitQual returns a list of four vectors.
qual | A numeric vector of values associated with qualified input. |
unqual | A numeric vector of values associated with unqualified input. |
qual.index | Indexes of qualified values (i.e., where qual.symbol was matched). |
unqual.index | Indexes of unqualified values (i.e., where qual.symbol was not matched). |
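The splitting idea, and how its result maps onto the obs/censored pair used elsewhere in the package, can be sketched in base R. This is a minimal illustration under the assumption that the qualifier is a literal prefix such as "<"; the helper name split_qual_sketch is hypothetical and the code is not NADA's own.

```r
# Hypothetical base-R sketch of the splitting idea (not NADA's code).
split_qual_sketch <- function(v, qual.symbol = "<") {
  v <- as.character(v)
  qual.index <- grep(qual.symbol, v, fixed = TRUE)
  unqual.index <- setdiff(seq_along(v), qual.index)
  # Drop the qualifier and convert what is left to numeric.
  strip <- function(s) as.numeric(sub(qual.symbol, "", s, fixed = TRUE))
  list(
    qual = strip(v[qual.index]),
    unqual = as.numeric(v[unqual.index]),
    qual.index = qual.index,
    unqual.index = unqual.index
  )
}

v <- c("<1", 1, "<1", 1, 2)
parts <- split_qual_sketch(v)

# Rebuild a numeric vector and a logical censoring indicator.
obs <- numeric(length(v))
obs[parts$qual.index] <- parts$qual
obs[parts$unqual.index] <- parts$unqual
censored <- seq_along(v) %in% parts$qual.index
```

The resulting obs and censored vectors have the shape expected by functions that take an observation vector plus a logical censoring indicator.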
R. Lopaka Lee
Dennis Helsel
Lee and Helsel (2005), Statistical analysis of environmental data containing multiple detection limits: S-language software for regression on order statistics, Computers in Geoscience vol. 31, pp. 1241-1248
v = c('<1', 1, '<1', 1, 2)
splitQual(v)
Methods for summarizing objects in package NADA
## S4 method for signature 'ros'
summary(object, plot=FALSE, ...)
## S4 method for signature 'cenfit'
summary(object, ...)
## S4 method for signature 'cenreg'
summary(object, ...)
object |
An output object from a NADA function such as |
plot |
Logical indicating whether summary graphs should be generated. |
... |
Additional arguments passed to the generic method. |
A "summary.cenreg" object is returned from summary.
Objects can be created by calls of the form summary(cenreg(obs, censored, groups)).
.Data: Object of class "list"
Class "list".
Class "vector", by class "list".
signature(object = "cenreg"): ...
R. Lopaka Lee
Dennis Helsel
Contaminant concentrations in a test group and a control group.
Objective is to determine whether a test group has higher concentrations than a control group.
There are three detection limits, at 1, 2, and 5 ug/L. Used in Chapter 1, Table 1.1 of the NADA book.
data(Tbl1one)
None. Generated data.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
TCE concentrations (ug/L) in ground waters of Long Island, New York. Categorized by the dominant land use type (low, medium, or high density residential) surrounding the wells.
Objective is to determine if concentrations are the same for the three land use types. There are four detection limits, at 1, 2, 4 and 5 ug/L. Used in Chapter 10 of the NADA book.
data(TCE)
Eckhardt et al., 1989, USGS Water Resources Investigations Report 86-4142.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
TCE concentrations (ug/L) in ground waters of Long Island, New York, along with several possible explanatory variables.
Objective is to determine if concentrations are related to one or more explanatory variables.
There are four detection limits, at 1, 2, 4 and 5 ug/L. One column indicates whether concentrations are above or below 5. Used in Chapter 12 of the NADA book.
data(TCEReg)
Eckhardt et al., 1989, USGS Water Resources Investigations Report 86-4142.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.
Dieldrin, lindane and PCB concentrations in fish of the Thames River and tributaries, England.
Objective is to determine if concentrations differ among sampling sites. Are dieldrin and lindane concentrations correlated? There is one detection limit per compound. Used in Chapters 11 and 12 of the NADA book.
data(Thames)
Yamaguchi et al., 2003, Chemosphere 50, 265-273.
Helsel, Dennis R. (2005). Nondetects and Data Analysis: Statistics for Censored Environmental Data. John Wiley and Sons, USA, NJ.