Title: | Statistical Analysis of Non-Detects |
---|---|
Description: | Provides functions for the analysis of occupational and environmental data with non-detects. Maximum likelihood (ML) methods for censored log-normal data and non-parametric methods based on the product limit estimate (PLE) for left censored data are used to calculate all of the statistics recommended by the American Industrial Hygiene Association (AIHA) for the complete data case. Functions for the analysis of complete samples using exact methods are also provided for the lognormal model. Revised from 2007-11-05 'survfit~1'. |
Authors: | E. L. Frome <[email protected]> and D. P. Frome |
Maintainer: | E. P. Adams <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0 |
Built: | 2024-12-11 07:22:54 UTC |
Source: | CRAN |
Environmental exposure measurements are, in general, positive and may be subject to left censoring; i.e., the measured value is less than a "detection limit", and is referred to as a non-detect or "less than" value. This package calculates the censored data equivalent of a number of statistics that are used in the analysis of environmental data that do not contain non-detects, i.e. the usual complete data case.
Package: | stand |
Type: | Package |
Version: | 2.0 |
Date: | 2015-09-10 |
License: | GPL version 2 or newer |
In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. Parametric methods used to determine acceptable levels of exposure are based on a two-parameter lognormal distribution. The mean exposure level, an upper percentile, the exceedance fraction, and confidence limits for each of these statistics are calculated. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations; see, e.g., Lyles and Kupper (1996). In this package the maximum likelihood method for randomly left censored lognormal data is used to calculate these statistics, and graphical methods are provided to evaluate the lognormal assumption. Nonparametric methods based on the product limit estimate for left censored data are used to estimate the mean exposure level, and the upper confidence limit on an upper percentile (i.e., the upper tolerance limit) is obtained using a nonparametric approach.
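For the complete-data case the lognormal point estimates named above reduce to simple functions of the log-scale mean and standard deviation. The following base-R sketch is illustrative only (simulated data, a hypothetical limit L); it is not the package's ML routines, which also handle non-detects and confidence limits:

```r
# Point estimates for a complete lognormal sample (no non-detects).
# Illustrative sketch; lnorm.ml() and IH.summary() in this package
# handle censored data and add confidence limits.
set.seed(1)
x  <- rlnorm(30, meanlog = -1, sdlog = 0.8)  # simulated exposure data
L  <- 0.5                                    # hypothetical exposure limit
yb <- mean(log(x)); s <- sd(log(x))          # log-scale mean and SD
EX   <- exp(yb + s^2 / 2)                # mean exposure level E(X)
X95  <- exp(yb + qnorm(0.95) * s)        # 95th percentile
fhat <- 1 - pnorm((log(L) - yb) / s)     # exceedance fraction Pr(X > L)
c(EX = EX, X95 = X95, exceedance = fhat)
```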
The American Industrial Hygiene Association (AIHA) has published a consensus standard with two basic strategies for evaluating an exposure profile; see Mulhausen and Damiano (1998) and Ignacio and Bullock (2006). Most of the AIHA methods are based on the assumptions that the exposure data do not contain non-detects, and that a lognormal distribution can be used to describe the data. Exposure monitoring results from a compliant workplace tend to contain a high percentage of non-detected results when the detection limit is close to the exposure limit, and in some situations the lognormal assumption may not be reasonable. The function IH.summary calculates most of the statistics proposed by AIHA for exposure data with non-detects. All of the methods are described in the report Frome and Wambach (2005).
This work was supported in part by the Office of Health, Safety, and Security of the U. S. Department of Energy and was performed in the Computer Science and Mathematics Division (CSMD) at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725. Additional funding and oversight have been provided through the Occupational Exposure and Worker Studies Group, Oak Ridge Institute for Science and Education, which is managed by Oak Ridge Associated Universities under Contract No. DE-AC05-06OR23100.
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
The work has been authored by a contractor of the U.S. Government. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this work, or to allow others to do so for U. S. Government purposes.
Throughout this document and the online help files the Greek letter gamma is used to represent the confidence level for a one-sided confidence limit (default value 0.95). This is represented by gam or gamma in the argument list and value of functions that compute confidence limits.
Aitchison, J. and J. A. C. Brown (1969), The Lognormal Distribution, Cambridge, U.K., Cambridge University Press.
Akritas, M. G., T. F. Ruscitti, and G. S. Patil (1994), "Statistical Analysis of Censored Environmental Data," Handbook of Statistics, Vol. 12, G. P. Patil and C. R. Rao (eds), 221-242, Elsevier Science, New York.
American Conference of Governmental Industrial Hygienists (ACGIH) (2004), "Notice of Intended Change In: 2004 TLVs and BEIs," ACGIH, p. 60, Cincinnati, OH.
Burrows, G. L. (1963), "Statistical Tolerance Limits - What are They," Applied Statistics, 12, 133-144.
Armstrong, B. G. (1992), "Confidence Intervals for Arithmetic Means of Lognormally Distributed Exposures," American Industrial Hygiene Association Journal, 53(8), 481-485.
Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey (1983), Graphical Methods for Data Analysis, Duxbury Press, Boston.
Clopper, C. J. and Pearson, E. S. (1934), "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26, 404-413.
Cohen, A. C. (1991), Truncated and Censored Samples, Marcel Dekker, Inc., New York.
Crow, E. L. and K. Shimizu (1988), Lognormal Distributions: Theory and Applications, Marcel Dekker, New York.
Cox, D. R. and D. V. Hinkley (1979), Theoretical Statistics, Chapman and Hall, New York.
Cox, D. R. and D. Oakes (1984), Analysis of Survival Data, Chapman and Hall, New York.
Department of Energy (December, 1999), "Chronic Beryllium Disease Prevention Program, Federal Register," 10 CFR Part 850, volume 64, number 235, 68854-68914.
Fowlkes, E. B. (1979), "Some Methods for Studying the Mixture of Two Normal (Lognormal) Distributions," Journal of the American Statistical Association, 74, 561-575.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Hahn, G. J. and W. Q. Meeker (1991), Statistical Intervals, John Wiley and Sons, New York.
Hewett, P. and G. H. Ganser, (1997), "Simple Procedures for Calculating Confidence Intervals Around the Sample Mean and Exceedance Fraction Derived from Lognormally Distributed Data," Applied Occupational and Environmental Hygiene, 12(2), 132-147.
Helsel, D. (1990), "Less Than Obvious: Statistical Treatment of Data Below the Detection Limit," Environmental Science and Technology, 24(12), 1767-1774.
Helsel, D. R. and T. A. Cohn (1988), "Estimation of Descriptive Statistics for Multiply Censored Water Quality Data," Water Resources Research, 24, 1997-2004.
Johnson, N. L. and B. L. Welch (1940), "Application of the Non-Central t-Distribution," Biometrika, 31(3/4), 362-389.
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Kalbfleisch, J. D. and R. L. Prentice (1980), The Statistical Analysis of Failure Time Data, John Wiley and Sons, New York.
Kaplan, E. L. and Meier, P. (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 53, 457-481.
Land, C. E. (1972), "An Evaluation of Approximate Confidence Interval Estimation Methods for Lognormal Means," Technometrics, 14(1), 145-158.
Lyles R. H. and L. L. Kupper (1996), "On Strategies for Comparing Occupational Exposure Data to Limits," American Industrial Hygiene Association Journal, 57:6-15.
Meeker, W. Q. and L. A. Escobar (1998), Statistical Methods for Reliability Data, John Wiley and Sons, New York.
Moulton, L. H. and N. A. Halsey (1995), "A Mixture Model with Detection Limits for Regression Analysis of Antibody Response on Vaccine," Biometrics, 51, 1570-1578.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
Newman, M. C., P. M. Dixon, B. B. Looney, and J. E. Pinder (1989), "Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations," Water Resources Bulletin, 25, 905-916.
Ng, M. P. (2002), "A Modification of Peto's Nonparametric Estimation of Survival Curves for Interval-Censored Data," Biometrics, volume 58, number 2, pp. 439-442.
Odeh, R. E. and D. B. Owen (1980), Tables for Normal Tolerance Limits, Sampling Plans, and Screening, Marcel Dekker, New York.
Peto, R. (1973), "Experimental Survival Curves for Interval-censored Data," Applied Statistics, volume 22, number 1, pp. 86-91.
Schmee, J., D. Gladstein, and W. Nelson (1985), "Confidence Limits for Parameters of a Normal Distribution From Singly Censored Samples, Using Maximum Likelihood," Technometrics, 27, 119-128.
Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr. (1996), "Difficulties with the Lognormal Model in Mean Estimation and Testing," Environmental and Ecological Statistics, 3, 81-97.
Sommerville, P. N. (1958), "Tables for Obtaining Non-Parametric Confidence Limits," Annals of Mathematical Statistics, 29, 599-601.
Taylor, D. J., L. L. Kupper, S. M. Rappaport, and R. H. Lyles (2001), "A Mixture Model for Occupational Exposure Mean Testing with a Limit of Detection," Biometrics, 57, 681-688.
Tuggle, R. M. (1982), "Assessment of Occupational Exposure Using One-Sided Tolerance Limits," American Industrial Hygiene Association Journal, 43, 338-346.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B (Methodological), 38(3), 290-295.
Venables, W. N. and B. D. Ripley (2002), Modern Applied Statistics with S, 4th edition. Springer-Verlag, New York.
Verrill, S. and R. A. Johnson (1988), "Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality," Journal of the American Statistical Association, 83(404), 1192-1197.
Waller, L. A., and B. W. Turnbull, (1992), "Probability Plotting with Censored Data," The American Statistician, 46(1), 5-12.
# Example 1 from Frome and Wambach (2005) ORNL/TM-2005/52
# NOTE THAT FUNCTION NAMES AND DETAILS HAVE BEEN REVISED IN THIS PACKAGE
# but the results are the same. For example lnorm.ml() replaces mlndln().
data(SESdata)
mle <- lnorm.ml(SESdata)
unlist(mle[1:4])   # ML estimates mu sigma E(X) and sigma^2
unlist(mle[5:8])   # ML estimates of standard errors
unlist(mle[9:13])  # additional output from ORNL/TM-2005/52
IH.summary(SESdata, L = 0.2)  # all summary statistics for SESdata
# lognormal q-q plot for SESdata (Figure in ORNL/TM-2005/52)
qq.lnorm(plend(SESdata), mle$mu, mle$sigma)
title("SESdata: Smelter-Elevated Surfaces")
Data from Mulhausen and Damiano Appendix V is used to illustrate data with "less-than" values (non-detects).
data(aihand)
A data frame with 15 observations on the following 3 variables:
- air monitoring data with 3 non-detects
- 0 if non-detect, 1 if detect
- air monitoring data (complete)
The data in column 1 was obtained from the data in column 3 by assuming a limit of detection of 1.9. The original data in column 3 is used as an example in Appendices V, VI, and VII ( see Tables V.1, V.5, V.6, VI.2 ) to illustrate methods of analysis when there are no non-detects.
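The derivation of the non-detect columns from the complete data can be sketched in a few lines of base R. The measurement values below are made up; only the limit of detection of 1.9 comes from the description above:

```r
# Turn a complete sample into left-censored data with a given LOD.
# Toy sketch of how aihand's non-detect columns were derived;
# the measurement values here are hypothetical.
lod <- 1.9
x   <- c(0.6, 2.4, 3.1, 1.1, 5.0)   # hypothetical complete measurements
det <- as.numeric(x >= lod)         # 0 = non-detect, 1 = detect
xnd <- ifelse(det == 1, x, lod)     # non-detects reported at the LOD
data.frame(xnd, det, x)
```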
Table V.2 page 244 in Mulhausen and Damiano (1998) and Table IV.3 page 349 in Ignacio and Bullock (2006)
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
data(aihand)  # Table V.2 Mulhausen and Damiano (1998)
# calculate summary statistics for non-detect data
ndt <- IH.summary(aihand, L = 5)
x <- aihand$x
det <- rep(1, length(x))
aiha <- data.frame(x, det)  # complete data
# calculate summary statistics for complete data
cdt <- IH.summary(aiha, L = 5)
# output results in a table
round(cbind(cdt, ndt), 3)
# rm(aiha, aihand, cdt, det, ndt)
# add results for exact method for complete data case
# for Xp and the exceedance fraction
As part of a chronic disease prevention program the DOE adopted an 8-hour time-weighted average (TWA) occupational exposure limit (OEL) value of 0.2 micrograms per cubic meter proposed by the American Conference of Governmental Industrial Hygienists (DOE 10 CFR Part 850 and ACGIH 2004).
data(beTWA)
A data frame with 280 observations on the following 2 variables:
- 8-hour TWA beryllium exposure
- 0 if non-detect; 1 if detect
The beTWA data set contains the results of 280 personal 8-hour TWA beryllium exposure readings at a DOE facility. The data contain 175 non-detects that range in value from 0.005 to 0.1. A detailed description and analysis of these data are given as Example 2 in Section 4 of Frome and Wambach (2005).
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52, Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
data(beTWA)
## calculate all of the summary statistics described in
## Example 2 in Section 4 of Frome and Wambach (2005)
round(IH.summary(beTWA, L = 0.2), 3)
Surface wipe samples obtained from containers that are used to ship beryllium components.
data(cansdata)
A data frame with 120 observations on the following 4 variables:
- surface wipe sample
- 1 if detect, 0 if non-detect
- a factor with levels A and B (stratum)
- a factor with levels 1 and 2 (sample)
In a scoping survey, the investigator decides to divide the survey unit into two strata: A, used recently, and B, not used for several years. The specified limit that is used to determine if the survey unit is contaminated is L = 0.2 (the value used in the example below). An initial sample of n = 30 was obtained from each stratum (sample = 1). The initial survey produced discrepant results that were hard to interpret. A second sample of n = 30 surface wipe samples was obtained from strata A and B (sample = 2). Results below the limit of quantification are reported as non-detects.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52, Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
data(cansdata)
# subset container data into stratum A and stratum B
sa60 <- cansdata[cansdata$strata == "A", ]; Ia <- "Be Samples From Stratum A"
sb60 <- cansdata[cansdata$strata == "B", ]; Ib <- "Be Samples From Stratum B"
mle.sa60 <- unlist(lnorm.ml(sa60))  # MLEs for stratum A
mle.sb60 <- unlist(lnorm.ml(sb60))  # MLEs for stratum B
# print MLEs for stratum A and B
round(data.frame(mle.sa60, mle.sb60), 3)
# Q-Q plot for each stratum
par(mfcol = c(1, 2))
qq.lnorm(plend(sa60), mle.sa60[1], mle.sa60[2], ylim = c(0.01, 1.2), xlim = c(-0.5, 2.5), main = Ia)
qq.lnorm(plend(sb60), mle.sb60[1], mle.sb60[2], ylim = c(0.01, 1.2), xlim = c(-0.5, 2.5), main = Ib)
# list all summary statistics by strata
round(IH.summary(cansdata, L = 0.2, bcol = 3), 4)
When the distribution function for the X's is not specified, a nonparametric approach can be used to estimate the exceedance fraction, i.e., the proportion of measurements that exceed the limit L.
efclnp(dd,gam = 0.95,L)
dd | An n by 2 matrix or data frame with the exposure variable x in column 1 and the detect indicator det in column 2 (det = 0 for a non-detect, 1 for a detect) |
gam | one-sided confidence level (default 0.95) |
L | L is the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
Given a random sample of size n, the number y of nonconforming observations (i.e., y = the number of X's that exceed the limit L) is described using the binomial distribution. The point estimate of FL is y/n, and confidence limits are obtained using the method of Clopper and Pearson (1934) (Hahn and Meeker, 1991); see also the R documentation for the base R function binom.test.
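The Clopper-Pearson interval can be reproduced directly with base R's binom.test. The counts below are made up for illustration; note that a two-sided 100*(2*gam - 1)% interval supplies the two one-sided 100*gam% limits (here conf.level = 0.90 for gam = 0.95):

```r
# Clopper-Pearson confidence limits for the exceedance fraction
# via base R's binom.test (hypothetical counts).
y <- 5; n <- 30                            # 5 of 30 measurements exceed L
bt  <- binom.test(y, n, conf.level = 0.90) # two 95% one-sided limits
fnp <- 100 * y / n                         # point estimate, percent
cl  <- 100 * bt$conf.int                   # (fnp.LCL, fnp.UCL), percent
round(c(fnp = fnp, LCL = cl[1], UCL = cl[2]), 2)
```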
A LIST with components:
fnp | nonparametric estimate of the exceedance fraction (as a percent) |
fnp.LCL | the 100*gam% lower confidence limit for the exceedance fraction |
fnp.UCL | the 100*gam% upper confidence limit for the exceedance fraction |
L | the specified limit for the exceedance fraction (e.g., the OEL) |
gam | one-sided confidence level |
All non-detects must be less than L. The estimates of the exceedance fraction and CLs are in percentage units.
E. L. Frome
Clopper, C. J. and E. S. Pearson (1934), "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26, 404-413.
Hahn, G. J. and W. Q. Meeker (1991), Statistical Intervals, John Wiley and Sons, New York.
# calculate nonparametric estimate
# for Example 2 in ORNL/TM-2005/52
data(beTWA)
unlist(efclnp(beTWA, L = 0.2))
# calculate ML estimates of exceedance fraction and CLs
unlist(efraction.ml(beTWA, L = 0.2))
Calculate the estimate of the exceedance fraction and exact confidence limits for a random sample from a normal/lognormal distribution. This function should only be used for complete samples.
efraction.exact(x, gam = 0.95, L=NA ,logx=TRUE,wpnt=FALSE)
x | vector of data values |
gam | one-sided confidence level (default 0.95) |
L | L is the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
logx | If TRUE (the default), the sample is from a lognormal distribution; if FALSE, the sample is from a normal distribution |
wpnt | if TRUE, print warning messages (default FALSE) |
The exceedance fraction represents the proportion of the X's that exceed a given limit Lp. The null hypothesis of interest is Ho: FL >= Fo; i.e., Fo is the maximum proportion of the population that can exceed the limit Lp. The null hypothesis is rejected if the 100*gam% UCL for FL is less than Fo, indicating that the exposure profile is acceptable. The type I error rate for this test is less than or equal to 1 - gam.
A LIST with components:
f | estimate of the exceedance fraction for the lognormal distribution, as a percent |
fe.LCL | the 100*gam% lower confidence limit for the exceedance fraction |
fe.UCL | the 100*gam% upper confidence limit for the exceedance fraction |
L | the specified limit for the exceedance fraction, e.g., the occupational exposure limit |
gam | one-sided confidence level |
logx | If TRUE, the lognormal distribution was assumed; if FALSE, the normal distribution |
(fe.LCL, fe.UCL) is an approximate 100*(2*gam - 1) percent confidence interval for F. The R function uniroot is used to find the noncentrality parameter of the noncentral t distribution to calculate CLs for U, where F = pnorm(U). In some versions of R this may cause a warning message; see R bug report PR#9171, "full precision was not achieved in 'pnt'". This warning may occur in uniroot calls to pt and does not affect the precision of the final result.
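For the complete-data case, the role of the noncentral t distribution can be illustrated with base R's qt, which accepts a noncentrality parameter directly. This sketch computes a one-sided upper confidence limit for the 95th percentile of a simulated lognormal sample; it is an illustration of the standard tolerance-limit construction, not this package's code:

```r
# One-sided upper confidence limit for the 95th percentile of a
# lognormal sample (complete data) via the noncentral t distribution.
# Illustrative sketch with simulated data.
set.seed(2)
y   <- log(rlnorm(20, meanlog = 0, sdlog = 1))  # log-scale data
n   <- length(y); gam <- 0.95; p <- 0.95
# tolerance factor: K = t(gam; n-1, ncp = z_p * sqrt(n)) / sqrt(n)
K <- qt(gam, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
Xp.UCL <- exp(mean(y) + K * sd(y))  # gam-level UCL for the pth percentile
Xp.UCL
```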
E. L. Frome
Johnson, N. L. and B. L. Welch (1940), "Application of the Non-Central t-Distribution," Biometrika, 31(3/4), 362-389.
Lyles, R. H. and L. L. Kupper (1996), "On Strategies for Comparing Occupational Exposure Data to Limits," American Industrial Hygiene Association Journal, 57, 6-15.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
Help files for efraction.ml, efclnp, percentile.exact, efraction.exact, aihand
# calculate exceedance fraction and exact CLs for example data
# Appendix Mulhausen and Damiano (1998), Ignacio and Bullock (2006)
data(aihand)
x <- aihand$x
det <- rep(1, length(x))
aiha <- data.frame(x, det)  # complete data
unlist(efraction.exact(x, gam = 0.95, L = 5))  # exact CLs
unlist(efraction.ml(aiha, gam = 0.95, L = 5))  # ML CLs
unlist(efclnp(aiha, L = 5))  # nonparametric CLs
Calculate the ML estimate of the exceedance fraction
and "large sample" confidence limits for lognormal data with non-detects.
efraction.ml(dd, gam = 0.95, L = 5, dat = TRUE)
dd | if dat is TRUE, an n by 2 matrix or data frame with the exposure variable x in column 1 and the detect indicator det in column 2 (det = 0 for a non-detect, 1 for a detect); if dat is FALSE, a list containing the output from lnorm.ml |
gam | one-sided confidence level (default 0.95) |
L | L is the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
dat | if TRUE (the default), dd is a matrix or data frame of data; if FALSE, dd is the output from lnorm.ml |
The exceedance fraction FL represents the proportion of the X's that exceed a given limit Lp. The null hypothesis of interest is Ho: FL >= Fo; i.e., Fo is the maximum proportion of the population that can exceed the limit Lp. The ML point estimate of FL is f = 1 - N(v), where v = [log(L) - mu]/sigma, and N(v) is the standard normal distribution function. The large sample 100*gam% LCL for v is LCLv = v - t(gam, m - 1)*vse, where vse = [p1^2*var(mu) + p2^2*var(sigma) + 2*p1*p2*cov(mu, sigma)]^(1/2), and p1 and p2 are the partial derivatives of v with respect to mu and sigma. The 100*gam% UCL for FL is 1 - N(LCLv). The 100*gam% LCL for FL is 1 - N(UCLv), where UCLv = v + t(gam, m - 1)*vse. The null hypothesis is rejected if the 100*gam% UCL for FL is less than Fo, indicating that the exposure profile is acceptable. The large sample ML estimates of the exceedance fraction and 100*gam% confidence limits for lognormal data are calculated using the output from lnorm.ml.
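The delta-method step can be sketched in base R. The parameter estimates and (co)variances below are hypothetical stand-ins for the output of lnorm.ml(), chosen only to make the arithmetic concrete:

```r
# Large-sample confidence limits for the exceedance fraction by the
# delta method. mu, sigma and their (co)variances are hypothetical
# stand-ins for lnorm.ml() output.
L <- 0.2; gam <- 0.95; m <- 50
mu <- -2.3; sigma <- 1.1
var.mu <- 0.03; var.sigma <- 0.02; cov.ms <- 0.001
v   <- (log(L) - mu) / sigma               # ML estimate of v
p1  <- -1 / sigma                          # dv/dmu
p2  <- -(log(L) - mu) / sigma^2            # dv/dsigma
vse <- sqrt(p1^2 * var.mu + p2^2 * var.sigma + 2 * p1 * p2 * cov.ms)
tq  <- qt(gam, m - 1)
f     <- 100 * (1 - pnorm(v))              # point estimate, percent
f.UCL <- 100 * (1 - pnorm(v - tq * vse))   # UCL for FL, percent
f.LCL <- 100 * (1 - pnorm(v + tq * vse))   # LCL for FL, percent
round(c(f = f, f.LCL = f.LCL, f.UCL = f.UCL), 3)
```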
A LIST with components:
f | the ML estimate of the exceedance fraction for the lognormal distribution, as a percent |
f.LCL | the 100*gam% lower confidence limit for the exceedance fraction |
f.UCL | the 100*gam% upper confidence limit for the exceedance fraction |
L | the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
gam | one-sided confidence level |
(f.LCL, f.UCL) is an approximate 100*(2*gam - 1) percent confidence interval for F.
E. L. Frome
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
# calculate ML estimate of exceedance fraction and CLs
# for Example 2 in ORNL/TM-2005/52
data(beTWA)
unlist(efraction.ml(beTWA, L = 0.2))
# calculate nonparametric CLs
unlist(efclnp(beTWA, L = 0.2))
Example of quarterly film badge data with non-detects
data(filmbadge)
A data frame with 28 observations on the following 6 variables:
- dlow: lower end of annual dose
- dhigh: upper end of annual dose
- dose for quarter 1
- dose for quarter 2
- dose for quarter 3
- dose for quarter 4
The product limit estimate (PLE) of the distribution function F(x) was first proposed by Kaplan and Meier (1958) for right-censored data, and Schmoyer et al. (1996) defined the PLE for situations in which left-censored data occur. Both left censoring and right censoring are special cases of the general PLE (Peto, 1973; Turnbull, 1976). A non-detect or left-censored dose occurs when the dose is less than a detection limit; for a non-detect it is only known that the dose does not exceed the limit of detection (LOD). To obtain an estimate of the annual dose distribution F(x) from quarterly doses, the general PLE is required, since the annual doses will be "interval censored" if at least two of the quarterly doses are non-detects. Consider, for example, a worker with quarterly doses of 0, 50, 0, and 100 mrem. The quarterly interval doses are (0,30), (50,50), (0,30), and (100,100), assuming an LOD of 30 mrem. The annual dose is obtained by adding the lower and upper bounds of the quarterly doses and is equal to (150,210) for this example; i.e., it is only known that the annual dose is between 150 and 210 mrem.
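The worked example in the paragraph above can be checked in a couple of lines of base R (LOD = 30 mrem; a quarterly reading of 0 denotes a non-detect):

```r
# Annual interval dose from quarterly film badge readings with an
# LOD of 30 mrem: each non-detect quarter contributes the interval (0, 30).
lod      <- 30
quarters <- c(0, 50, 0, 100)                 # 0 denotes a non-detect
low  <- ifelse(quarters == 0, 0,   quarters) # lower bounds per quarter
high <- ifelse(quarters == 0, lod, quarters) # upper bounds per quarter
c(annual.low = sum(low), annual.high = sum(high))  # (150, 210)
```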
The dose unit is mrem (100 mrem = 1 mSv). The LOD is 30 mrem. This is a representative sample of quarterly external dose records for 28 workers at the Y-12 facility in Oak Ridge (see ORAUT-OTIB-0044, Rev. 01-A and ORAUT-OTIB-0064) and is used here to illustrate the calculation of the PLE for interval censored data. The R function survreg can be used to obtain ML estimates of the parameters for the lognormal model and the covariance matrix that is needed for CLs for the exceedance fraction and 95th percentile.
Kaplan, E. L. and P. Meier (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 53, 457-481.
Ng, M. P. (2002), "A Modification of Peto's Nonparametric Estimation of Survival Curves for Interval-Censored Data," Biometrics, volume 58, number 2, pp. 439-442.
ORAUT (Oak Ridge Associated Universities Team), 2005c, "Historical Evaluation of the Film Badge Dosimetry Program at the Y-12 Facility in Oak Ridge, Tennessee: Part 1 - Gamma Radiation", ORAUT-OTIB-0044, Rev. 01-A (was ORAUT-RPRT-0032, Rev. 00), Oak Ridge, Tennessee.
ORAUT (Oak Ridge Associated Universities Team), 2007, "External Coworker Dosimetry Data for the Y-12 National Security Complex". ORAUT-OTIB-0064 (Under Revision).
Peto, R. (1973), "Experimental Survival Curves for Interval-censored Data," Applied Statistics, volume 22, number 1, pp. 86-91.
Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr. (1996), "Difficulties with the Lognormal Model in Mean Estimation and Testing," Environmental and Ecological Statistics, 3, 81-97.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B (Methodological), 38(3), 290-295.
data(filmbadge)
head(filmbadge)  # LOOK AT FIRST SIX RECORDS
# USE icfit() TO CALCULATE PLE FOR INTERVAL CENSORED DATA
par(mfrow = c(1, 2))
out <- icfit(filmbadge$dlow, filmbadge$dhigh)
# PLOT EXCEEDANCE S(x) vs x USING icplot()
tp <- "PLE of Exceedance for Filmbadge Data"
icplot(out$surv, out$time, XLAB = "Dose", YLAB = "Exceedance Probability", main = tp, cex.main = .8)
# USE pleicf() TO CALCULATE PLE FOR filmbadge DATA
ple.fb <- pleicf(filmbadge[, 1:2], FALSE)
# USE qq.lnorm() FOR LOGNORMAL Q-Q PLOT FOR INTERVAL CENSORED DATA
tmp <- qq.lnorm(ple.fb)
GM <- round(exp(tmp$par[1])); GSD <- round(exp(tmp$par[2]), 2)
tp <- paste("Lognormal Q-Q plot for Filmbadge\n Data GM= ", GM, "GSD= ", GSD)
title(tp, cex.main = 0.8)  # title with graphical parameter estimates
# RESULTS FROM pleicf()
round(ple.fb, 3)
This function calculates the self-consistent estimate of survival for interval censored data (i.e., the nonparametric maximum likelihood estimate that generalizes the Kaplan-Meier estimate to interval censored data). The censoring is such that if the ith observation fails at x_i, we only observe that L_i < x_i <= R_i. Data may be entered with "exact" values, i.e., L_i = x_i = R_i. In that case the L_i is changed internally to L*_i, which is the next lower of any of the observed endpoints (unless L_i = 0, in which case an error results).
icfit(L, R, initp = NA, minerror = 1e-06, maxcount = 1000)
L | a vector of the left endpoints of the intervals |
R | a vector of the right endpoints of the intervals |
initp | a vector with an initial estimate of the density function. This vector should sum to 1 and have a length equal to the number of unique values of c(L, R) |
minerror | the minimum error for convergence purposes. The EM algorithm stops when error < minerror |
maxcount | the maximum number of iterations. Default is 1000 |
The algorithm is basically an EM-algorithm applied to interval censored data (see Turnbull, 1976); however, first there is a primary reduction (see Aragon and Eberly, 1992). Convergence is defined when the maximum reduced gradient is less than minerror and the Kuhn-Tucker conditions are approximately met; otherwise a warning will result (see Gentleman and Geyer, 1994). There may be other faster algorithms, but they are more complicated (see Aragon and Eberly, 1992). The output for time is sort(unique(c(0, L, R, Inf))) without the Inf. The output for p keeps the value related to Inf so that p may be inserted into initp for another run. The outputs p and surv act as if the jumps in the survival curve happen at the largest of the possible times (see Gentleman and Geyer, 1994, Table 2, for a more accurate way to present p).
Returns a list with the following elements:
u | a vector of Lagrange multipliers. If there are any negative values of u, the Kuhn-Tucker conditions for a maximum are not met and a warning is given |
error | the maximum of the reduced gradients. If convergence is correct then error < minerror |
count | number of iterations of the self-consistent algorithm (i.e., EM-algorithm) |
time | a vector of times (see details) |
p | a vector of probabilities; all except the last value are associated with the time vector above, i.e., p[i] is the probability mass at time[i] (the last value is associated with Inf) |
surv | a vector of survival values associated with the time vector above, i.e., surv[i] is the estimated probability of survival beyond time[i] |
The functions icfit, icplot, and ictest and the
documentation for these functions are from Michael P. Fay.
You are free to distribute these functions to whomever is
interested. They come with no warranty, however.
Michael P. Fay
Aragon, J. and Eberly, D. (1992), "On Convergence of Convex Minorant Algorithms for Distribution Estimation with Interval-Censored Data," Journal of Computational and Graphical Statistics. 1: 129-140.
Gentleman, R. and Geyer, C. J. (1994), "Maximum Likelihood for Interval Censored Data: Consistency and Computation," Biometrika, 81, 618-623.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B,(Methodological), 38(3), 290-295.
# Calculate and plot a Kaplan-Meier type curve for interval censored data.
# This is S(x) = 1 - F(x), the sample estimate of the probability
# of exceeding x. The filmbadge data is used as an example.
data(filmbadge)
out <- icfit(filmbadge$dlow, filmbadge$dhigh)
icplot(out$surv, out$time, XLAB = "Dose", YLAB = "Exceedance Probability")
This function takes a vector of survival values and a vector of time values and either plots a survival function or adds the survival lines to an existing plot.
icplot(surv, time = as.numeric(names(surv)), xrange = NA, lines.only = FALSE, XLAB = "Time", YLAB = "Probability", LTY = 1, ...)
surv |
a vector of survival values. |
time |
a vector of times. These are related to the
vector of survival values; the default is taken from the
names of the surv vector, i.e., time = as.numeric(names(surv)) |
xrange |
the range of the x values. The default is
the range of the time values (ignored when lines.only = TRUE) |
lines.only |
a logical value; default = FALSE. If TRUE, lines are
added to an existing plot; if FALSE, a new plot is created |
XLAB |
a character string denoting the x label. Default = "Time". |
YLAB |
a character string denoting the y label. Default = "Probability". |
LTY |
an integer denoting the line type (lty value). |
... |
additional plotting parameters (except xlab and ylab). |
See the icfit details. This may not be the most accurate
way to present the data. See Betensky, Lindsey, Ryan,
and Wand (1999, p. 238) for an alternative method.
Returns a plot or adds a line to an existing plot.
The functions icfit, icplot, and ictest and the
documentation for these functions are from Michael P. Fay.
You are free to distribute these functions to whomever is
interested. They come with no warranty, however.
Michael P. Fay
Betensky, R. A., Lindsey, J. C., Ryan, L. M., and Wand, M. P. (1999), "Local EM Estimation of the Hazard Function for Interval-Censored Data," Biometrics, 55: 238-245.
# Plot two survival curves on one plot.
# (A data set is needed for this example.)
# s1 <- icfit(left[treatment == 1], right[treatment == 1])
# s2 <- icfit(left[treatment == 2], right[treatment == 2])
# icplot(s1$surv, s1$time, xrange = range(c(s1$time, s2$time)))
# icplot(s2$surv, s2$time, lines.only = TRUE, LTY = 2)
This function performs several different tests for interval censored data. It has 3 different models that generalize either the Wilcoxon rank sum test (model = "PO") or the logrank test (model = "GPH" or model = "Sun"). Each model may be one of 2 types, either an asymptotic permutation test or a score test.
The censoring is such that if the ith observation fails at x_i,
we only observe that L_i < x_i <= R_i. Data may be entered with
"exact" values, i.e., L_i = x_i = R_i. In that case the L_i is
changed internally to L*_i, which is the next lower of any of the
observed endpoints (unless L_i = 0, in which case an error results).
The function requires a previously calculated survival
curve (see icfit).
ictest(L, R, S, group, model, type = "permutation", fuzz , output.scores)
L |
a vector of left endpoints of the interval.
We assume each L[i] <= R[i] |
R |
a vector of right endpoints of the interval. Exact values may be entered as L[i] == R[i]; they are changed internally (see details) |
S |
a vector of survival values calculated from all the data (i.e., ignoring group membership) (see time below), typically output from icfit. |
group |
a vector denoting the group for which the test is desired. If group is a factor or character then a k-sample test is performed, where k is the number of unique values of group. If group is numeric then a "correlation" type test is performed. If there are only two groups, both methods give the same results. |
model |
a character vector with three possible values describing the model: model = "GPH" (default) gives the grouped proportional hazards model. This generalizes a logrank test. model = "Sun" gives a Logistic model. This generalizes another form of the logrank test. model = "PO" gives a proportional odds model. This generalizes the Wilcoxon rank sum test. (see details). |
type |
a character vector with two possible values, "permutation" or "score" (see details) |
fuzz |
a small numeric value. Because we need to determine places in the survival curve where there are no changes, and the machine may have rounding error, we use this. (Default = 1e-12) |
output.scores |
a logical value; if TRUE, the scores used in the permutation test are returned (see scores in the Value section) |
The 3 models are compared in depth in Fay (1999). For censored data two common likelihoods are the marginal likelihood of the ranks and the likelihood with nuisance parameters for the baseline survival. Here we use the latter likelihood (as in Finkelstein, 1986, Fay, 1996, and Sun, 1996). It is difficult to create proper score tests for this likelihood because the number of nuisance parameters typically grows with the sample size and often many of the parameters are equal at the nonparametric MLE, i.e., they are on the boundary of the parameter space. One way around this is to perform a permutation test on the scores (Fay, 1996). A second way around it (the boundary problem, at least) is to redefine the interval points so that no boundary problems exist (see Fay, 1996). These are the two methods used here.
We present two generalizations of the logrank
test. The method of Sun (1996) is more difficult to
calculate and has no theoretical advantages
of which I am aware. The grouped proportional
hazards model of Finkelstein (1986) is recommended.
Note that when icfit and ictest are used on right-censored
data, because the method of estimating
variance is different, even Sun's method does not
produce exactly the standard logrank test results.
There are some typos in Appendix II of Fay (1999). See the S code for the corrections.
Returns a list with the following elements:
scores |
only returned if output.scores = T. This is a vector the same length as L and R, containing the scores used in the permutation test. |
U |
The efficient score vector. When group is a factor or character vector then each element of U has the interpretation as the weighted sum of "observed" minus "expected" deaths for the group element defined by the label of U. Thus negative values indicate better than average survival (see Fay, 1999). |
chisq.value |
Chi-square value of the test |
df |
Degrees of freedom for the chi-square test. |
pvalue |
p-value from the chi-square value. |
test |
a character vector of length one, either "2-sample", "correlation" or "k-sample", where k is the number of unique group values |
model |
same as model input, either "GPH","Sun" or "PO" |
type |
same as type input, either "permutation" or "score" |
The functions icfit, icplot, and ictest and the
documentation for these functions are from Michael P. Fay.
You are free to distribute these functions to whomever is
interested. They come with no warranty, however.
Michael P. Fay
Fay, M. P. (1996), "Rank Invariant Tests for Interval Censored Data Under the Grouped Continuous Model," Biometrics, 52: 811-822.
Fay, M. P. (1999), "Comparing Several Score Tests for Interval Censored Data," Statistics in Medicine, 18: 273-285.
Finkelstein, D. M. (1986), "A Proportional Hazards Model for Interval Censored Failure Time Data," Biometrics, 42: 845-854.
Sun, J. (1996), "A Non-parametric Test for Interval Censored Failure Time Data With Applications to AIDS Studies," Statistics in Medicine, 15: 1387-1395.
## Perform a logrank-type test using the observed information variance.
## (A data set is needed for this example.)
# out <- icfit(left, right)
# ictest(left, right, out$surv, group, out$time, model = "GPH", type = "score")
#
## Perform a Wilcoxon rank sum-type test using the asymptotic permutation variance.
# ictest(left, right, out$surv, group, out$time, model = "PO", type = "permutation")
Summary statistics described by the American Industrial Hygiene Association (AIHA) for occupational exposure data are calculated for samples with non-detects (also known as left-censored data). Parametric estimates are based on a lognormal model using maximum likelihood (ML). Nonparametric methods are based on the product limit estimate (PLE) for left censored data.
IH.summary(dd,L, p = 0.95, gam = 0.95,bcol=NA)
dd |
An n by 2 matrix or data frame with the exposure variable x in column 1 and det = 0 for a non-detect or 1 for a detect in column 2 |
L |
the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
p |
p is probability for Xp the 100pth percentile. Default is 0.95 |
gam |
one-sided confidence level |
bcol |
Column number that contains a BY variable. This column must contain a factor and the value of each of the summary statistics is calculated for each level of the factor. Default NA |
Regulatory and advisory criteria for evaluating the adequacy of occupational exposure controls are generally expressed as limits that are not to be exceeded in a work shift or shorter time period if the agent is acutely hazardous. Exposure monitoring results above the limit require minimal interpretation and should trigger immediate corrective action. Demonstrating compliance with a limit is more difficult. AIHA has published a consensus standard with two basic strategies for evaluating an exposure profile; see Mulhausen and Damiano (1998), Ignacio and Bullock (2006). The first approach is based on the mean of the exposure distribution, and the second approach considers the "upper tail" of the exposure profile. Statistical methods for estimating the mean, an upper percentile of the distribution, the exceedance fraction, and the uncertainty in each of these parameters are provided by this package. Most of the AIHA methods are based on the assumptions that the exposure data do not contain non-detects and that a lognormal distribution can be used to describe the data. Exposure monitoring results from a compliant workplace tend to contain a high percentage of non-detected results when the detection limit is close to the exposure limit, and in some situations the lognormal assumption may not be reasonable. All of these methods are described in a companion report by Frome and Wambach (2005).
A data.frame with column names based on levels of the BY variable and row names:
mu |
ML estimate of mean of y=log(x) |
se.mu |
Estimate of standard error of mu |
sigma |
ML estimate of sigma |
se.sigma |
Estimate of standard error of sigma |
GM |
MLE of geometric mean |
GSD |
MLE of geometric standard deviation |
EX |
MLE of E(X) the (arithmetic) mean |
EX-LCL |
Lower Confidence Limit for E(X) |
EX-UCL |
Upper Confidence Limit for E(X) |
KM-mean |
Kaplan-Meier(KM) Estimate of E(X) |
KM-LCL |
KM Lower Confidence Limit for E(X) |
KM-UCL |
KM Upper Confidence Limit for E(X) |
KM-se |
Standard Error of KM-mean |
obs.Xp |
Estimate of Xp from PLE |
Xp |
ML estimate of Xp the pth percentile |
Xp.LCL |
MLE of LX(p,gam) the LCL for Xp |
Xp.UCL |
MLE of UX(p,gam) the UCL for Xp |
zL |
MLE of the Z value for limit L |
NpUTL |
Nonparametric estimate of the UTL |
Maximum |
Largest value in the data set |
NonDet |
percent of X's that are left censored, i.e., non-detects |
n |
number of observations in the data set |
Rsq |
Square of correlation for the quantile-quantile (q-q) plot |
m |
number of X's greater than the LOD |
f |
MLE of exceedance fraction F for limit L |
f.LCL |
LCf(L,gam) MLE of LCL for F |
F.UCL |
UCf(L,gam) MLE of UCL for F |
fnp |
Nonparametric estimate of F for limit L |
fnp.LCL |
Nonparametric estimate of LCL for F |
fnp.UCL |
Nonparametric estimate of UCL for F |
m2log(L) |
-2 times the log-likelihood function |
L |
the specified limit for the exceedance fraction; e.g., the occupational exposure limit |
p |
percentile for UTL |
gam |
one-sided confidence level |
E. L. Frome
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
See complete list of references at About-STAND
See Also lnorm.ml, efraction.ml, percentile.ml, kmms
# Analysis for cansdata, Example 1 from ORNLTM2005-52
data(cansdata)
Allcans <- round(IH.summary(cansdata, L = 0.2, bcol = NA), 3)
# Example using cansdata with a BY variable
cansout <- round(IH.summary(cansdata, L = 0.2, bcol = 3), 3)
# combine output from both analyses
cbind(Allcans, cansout)
Kaplan-Meier Estimate of Mean and Standard Error of the Mean for Left Censored Data
kmms(dd, gam = 0.95)
dd |
An n by 2 matrix or data frame with the exposure variable x in column 1 and det = 0 for a non-detect or 1 for a detect in column 2 |
gam |
one-sided confidence level |
The product limit estimate (PLE) of the cumulative distribution function
was first proposed by Kaplan and Meier (1958) for right censored data.
Turnbull (1976) provides a more general treatment of nonparametric
estimation of the distribution function for arbitrary censoring. For
randomly left censored data, the PLE is defined by Schmoyer et al.
(1996); see plend.
The mean of the PLE is a censoring-adjusted
point estimate of E(X), the mean of X. An approximate standard error
of the PLE mean can be obtained using the method of Kaplan and Meier
(1958), and the UCL is KM.mean + t(gam, m-1)*sp, where
sp is the Kaplan-Meier standard error of the mean
adjusted by the factor sqrt(m/(m-1)), where m
is the number of detects in the
sample. When there is no censoring this reduces to the second
approximate method described by Land (1972).
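As an illustrative sketch (not the packaged kmms), the confidence limits can be formed from the PLE mean, its unadjusted standard error, and the number of detects; km_mean, km_se, and m here are hypothetical inputs rather than values computed from a data set.

```r
# Hedged sketch: KM-mean confidence limits from an unadjusted standard error.
# km_mean: PLE mean; km_se: its unadjusted standard error; m: number of detects.
km_limits <- function(km_mean, km_se, m, gam = 0.95) {
  sp <- km_se * sqrt(m / (m - 1))   # small-sample adjustment (see note)
  tq <- qt(gam, df = m - 1)         # one-sided t quantile
  c(KM.mean = km_mean,
    KM.LCL  = km_mean - tq * sp,
    KM.UCL  = km_mean + tq * sp,
    KM.se   = sp)
}
```

The packaged kmms computes km_mean and km_se from the PLE itself; this sketch only shows how the limits follow from those quantities.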
A LIST with components:
KM.mean |
Kaplan-Meier (KM) estimate of the mean E(X) |
KM.LCL |
KM estimate of lower confidence limit |
KM.UCL |
KM estimate of upper confidence limit |
KM.se |
estimate of standard error of KM-mean |
gamma |
one-sided confidence level |
Error in KM.se corrected on 12 June 2007. The KM standard error is adjusted by multiplying by sqrt(m/(m-1)), where m is the number of detected values. The error occurred when there were ties in the detected values, because the number of unique detected values was used in place of m. For example, for beTWA sqrt(m/(m-1)) is 1.004796; due to the error, 1.008032 was used. The sqrt(m/(m-1)) factor will always be smaller after the correction, depending on the value of m and the number of ties. See the example.
E. L. Frome
Kaplan, E. L. and Meier, P. (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 457-481.
Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr. (1996), "Difficulties with the Lognormal Model in Mean Estimation and Testing," Environmental and Ecological Statistics, 3, 81-97.
# Results for the beTWA data using kmms in stand Ver 1.1 (with the error):
# KM.mean KM.LCL KM.UCL KM.se gamma
# 0.018626709 0.014085780 0.023167637 0.002720092 0.950000000
#
data(beTWA) # use data from Example 2 in ORNLTM2005-52
unlist(kmms(beTWA))
When an exposure measurement may be less than a detection limit, closed form and exact methods have not been developed for the lognormal model. The maximum likelihood (ML) principle is used to develop an algorithm for parameter estimation and to obtain large sample equivalents of confidence limits for the mean exposure level, the 100pth percentile, and the exceedance fraction. For a detailed discussion of assumptions, properties, and computational issues related to ML estimation see Cox and Hinkley (1979) and Cohen (1991).
lnorm.ml(dd)
dd |
An n by 2 matrix or data frame with the exposure variable x in column 1 and det = 0 for a non-detect or 1 for a detect in column 2 |
For notational convenience the m detected values are listed first,
followed by the non-detects, so that the data are
x_1, ..., x_m, x*_(m+1), ..., x*_n, where x*_i indicates a non-detect.
If x*_i = x* is the same for each
non-detect, this is referred to as a left singly censored sample (Type I
censoring) and x* is the limit of detection (LOD). If the x*_i are different,
this is known as randomly (or progressively) left-censored data [see
Cohen (1991) and Schmoyer et al. (1996)]. In some situations a value of 0 is
recorded when the exposure measurement is less than the LOD. In this
situation, the value of x*_i is the LOD, indicating that x_i
is in the interval (0, x*_i). The probability density function for the lognormal distribution is

f(x; mu, sigma) = exp[-(log(x) - mu)^2 / (2 sigma^2)] / (x sigma sqrt(2 pi)),

where y = log(x) is normally distributed with mean mu and standard
deviation sigma [Atkinson and Brown (1969)]. The geometric mean of X is
GM = exp(mu) and the geometric standard deviation is GSD = exp(sigma).
Strom and Stansberry (2000) provide a summary of these and other
relationships for lognormal parameters. Assuming the data are a random
sample from a lognormal distribution, the log of the likelihood function for
the unknown parameters mu and sigma given the data is

log L(mu, sigma) = sum log f(x_i; mu, sigma) + sum log F(x*_i; mu, sigma),

where F is the lognormal distribution function, i.e., F(x*)
is the probability that x <= x*.
The first summation is over the m detected values, and the second is over the
n - m non-detects.
To test that the mean of X does not exceed a specified limit L, i.e.,
Ho: E(X) = exp(mu + sigma^2/2) >= L, at the alpha = 1 - gam
significance level, a one-sided upper 100*gam%
confidence limit can be used. One method for calculating this UCL is to use the
censored data equivalent of Cox's direct method; i.e., calculate the ML
estimate of phi = log[E(X)] = mu + 0.5*sigma^2, and

var(phi-hat) = var(mu-hat) + 0.25*var(sigma^2-hat) + cov(mu-hat, sigma^2-hat),

where sigma^2-hat is the ML estimate of sigma^2.
The ML estimator of E(X) is exp(phi-hat), the
100*gam% LCL for E(X)
is exp[phi-hat - t*se(phi-hat)], and the
100*gam% UCL for
E(X) is exp[phi-hat + t*se(phi-hat)], where t = t(gam, m-1). The
resulting confidence interval (LCL, UCL) has confidence level
100*(2*gam - 1)%. An equivalent procedure is to estimate phi = log[E(X)]
and its standard error directly, i.e., by maximizing the log-likelihood with
parameters mu and sigma^2. ML estimates of mu, sigma, logEX, and Sigmasq,
estimates of their standard errors, and covariance terms are calculated.
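The censored-data log-likelihood above can be maximized with a few lines of R. This is an illustrative sketch, not the packaged lnorm.ml (which uses the local functions ndln and ndln2 via optim and also returns standard errors); the function name and arguments here are assumptions.

```r
# Hedged sketch: ML fit of a lognormal to left-censored data.
# x: observed values (detects, or detection limits for non-detects)
# det: 1 for a detect, 0 for a non-detect
cens_lnorm_ml <- function(x, det) {
  negll <- function(par) {
    mu <- par[1]; sigma <- exp(par[2])   # log-scale keeps sigma > 0
    d <- det == 1
    -(sum(dlnorm(x[d], mu, sigma, log = TRUE)) +        # detects: density
      sum(plnorm(x[!d], mu, sigma, log.p = TRUE)))      # non-detects: P(X <= LOD)
  }
  start <- c(mean(log(x)), log(sd(log(x)) + 1e-6))      # crude starting values
  fit <- optim(start, negll, method = "BFGS")
  c(mu = fit$par[1], sigma = exp(fit$par[2]), m2logL = 2 * fit$value)
}
```

With no censoring this reproduces the usual lognormal ML estimates (mu-hat is the mean of the logs; sigma-hat uses the n denominator).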
A list with components:
mu |
ML estimate of |
sigma |
ML estimate of |
logEX |
ML estimate of log of E(X) |
SigmaSq |
ML estimate of |
se.mu |
ML estimate of standard error of |
se.sigma |
ML estimate of standard error of |
se.logEX |
ML estimate of standard error of log of E(X) |
se.Sigmasq |
ML estimate of standard error of |
cov.musig |
ML estimate of cov( |
m |
number of detects |
n |
number of observations in the data set |
m2log(L) |
-2 times the log-likelihood function |
convergence |
convergence indicator from |
Local function ndln is called by optim for mu and sigma,
and local function ndln2 is called by optim for logEX and Sigmasq.
E. L. Frome
Cohen, A. C. (1991), Truncated and Censored Samples, Marcel Dekker, New York.
Cox, D. R. and D. V. Hinkley (1979), Theoretical Statistics, Chapman and Hall, New York.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
optim, efraction.ml, percentile.ml
# Calculate MLE for Example 2 in ORNLTM2005-52
data(beTWA)
mle.TWA <- unlist(lnorm.ml(beTWA)) # ML for Be monitoring data
mle.TWA[1:4]  # ML estimates of parameters
mle.TWA[5:8]  # standard errors of ML estimates
mle.TWA[9:13] # additional results from lnorm.ml
Find either the sample size or the power for a complete sample from the lognormal distribution
npower.lnorm(n=NA,power=NA,fstar=1,p=0.95,gamma=0.95)
n |
sample size |
power |
power of the test = 1 - beta, where beta is the probability of a Type II error |
fstar |
true percent of X's that exceed the specified limit L |
p |
probability for Xp the 100pth percentile. Default is 0.95 |
gamma |
one-sided confidence level |
Find either the sample size n or the power
of the test for specified
values of fstar, p, and gamma. Either n is missing
or power is missing.
The null hypothesis of interest is
Ho: F >= Fo = 1 - p; i.e., Fo is the maximum proportion of the
population that can exceed the limit Lp. The null hypothesis is
rejected if the 100*gamma% UCL for F is less than Fo,
indicating that the exposure profile is acceptable. For the complete
data case this is equivalent to testing the null hypothesis
Ho: Xp >= Lp at the alpha = 1 - gamma
significance level.
See efraction.exact, percentile.exact, and
Section 2.3 of Frome and Wambach (2005) for further details.
A vector with components:
n |
sample size |
power |
power of the test = 1 - beta, where beta is the probability of a Type II error |
fstar |
true percent of X's that exceed the specified limit L |
p |
probability for Xp the 100pth percentile. Default is 0.95 |
gamma |
one-sided confidence level |
The R function uniroot is used to find a parameter of the
non-central t distribution. In some versions of R this
may cause a warning message; see R bug report RP 9171, "full precision
was not achieved in 'pnt'". This warning message may occur in uniroot
calls to pt and does not affect the precision of the final result.
E.L. Frome
Johnson, N. L. and B. L. Welch (1940), "Application of the Non-Central t-Distribution," Biometrika, 31(3/4), 362-389.
Lyles R. H. and L. L. Kupper (1996), "On strategies for comparing occupational exposure data to limits," American Industrial Hygiene Association Journal, 57:6-15.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
Help files for efraction.ml, percentile.ml, efclnp, aihand
# EXAMPLE 1
# Table VII.1 Mulhausen and Damiano (1998), adapted from
# Table II in Lyles and Kupper (1996) JAIHA vol 57, 6-15
# Sample Size Needed When Using UTL(95,95) to Show 95% Confidence
# that the 95th Percentile is below the OEL (Power = 0.8)
rx <- c(1.5, 2, 2.5, 3)
sdx <- sqrt(c(0.5, 1, 1.5, 2, 2.5, 3))
tabn <- matrix(0, 4, 6)
for (i in 1:4) {
  for (j in 1:6) {
    fstar <- 100 * (1 - pnorm(log(rx[i]) / sdx[j] + qnorm(0.95)))
    tabn[i, j] <- npower.lnorm(NA, 0.8, fstar, p = 0.95, gamma = 0.95)[1]
  }
}
cn <- paste("GSD = ", round(exp(sdx), 2), sep = "")
dimnames(tabn) <- list(round(1 / rx, 2), cn)
rm(cn, rx, sdx)
tabn
# EXAMPLE 2
top <- "Power For Sample Size n = 20 for p=0.95 gamma=0.95"
fstar <- seq(0.2, 4.8, 0.1)
pow <- rep(1, length(fstar))
for (i in 1:length(fstar)) {
  pow[i] <- npower.lnorm(20, NA, fstar[i], p = 0.95, gamma = 0.95)[2]
}
plot(fstar, pow, xlim = c(0, 5), ylim = c(0, 1), main = top,
     xlab = "fstar = True Percent of Xs > L (Specified Limit)", ylab = "Power")
Given a random sample of size n from a continuous
distribution, with a confidence level of at least gam,
at least 100p percent of the population will be below the kth largest value in the
sample.
nptl(n , p = 0.95, gam = 0.95)
n |
the sample size |
p |
probability for Xp the 100pth percentile. Default is 0.95 |
gam |
one-sided confidence level |
k |
index of the order statistic |
The maximum non-detect must be less than the kth largest value.
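The order statistic index can be sketched with the binomial distribution: the kth largest value in a sample of n exceeds the 100pth percentile with probability 1 - pbinom(k - 1, n, 1 - p), so k can be taken as the largest index for which this probability is at least gam. A minimal illustrative version (not the packaged nptl, whose implementation may differ):

```r
# Hedged sketch: index k of the order statistic for a nonparametric
# upper tolerance limit. The kth largest value exceeds the 100p-th
# percentile with probability 1 - pbinom(k - 1, n, 1 - p).
nptl_sketch <- function(n, p = 0.95, gam = 0.95) {
  k <- 0
  # advance while the (k+1)th largest value still meets the confidence level
  while (1 - pbinom(k, n, 1 - p) >= gam) k <- k + 1
  if (k == 0) NA else k   # NA: no order statistic achieves level gam
}
```

For p = gam = 0.95 this reproduces the familiar result that n = 59 is the smallest sample size for which the maximum (k = 1) is a 95%/95% upper tolerance limit.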
E. L. Frome
Sommerville, P. N. (1958), "Tables for Obtaining Non-Parametric Confidence Limits," Annals of Mathematical Statistics, 29, 599-601.
data(beTWA)
k <- nptl(length(beTWA[, 1]))
rev(sort(beTWA[, 1]))[k]
Calculate an estimate of Xp, the 100pth percentile of the
normal/lognormal distribution, and the lower and upper 100*gam% exact
confidence limits. The resulting interval (Xp.LCL, Xp.UCL) is an
approximate 100*(2*gam - 1)% confidence interval for
Xp, the 100pth percentile. This function should only be used for complete samples.
percentile.exact(x, p = 0.95, gam = 0.95,logx=TRUE,wpnt=FALSE)
x |
vector of positive data values |
p |
probability for Xp the 100pth percentile. Default is 0.95 |
gam |
one-sided confidence level |
logx |
If TRUE (the default), the data are assumed to be from a lognormal distribution and the log transformation is applied; if FALSE, a normal distribution is assumed |
wpnt |
if TRUE, print warning messages from the non-central t quantile function; default is FALSE |
A point estimate of Xp, the 100pth percentile of a normal/lognormal
distribution, is calculated. Exact confidence limits for Xp are
calculated using the quantile function of the non-central t
distribution. The exact UCL is xbar + K*s, where
xbar is the sample mean,
s is the sample standard deviation, and the factor K
depends on n, p, and gam. The exact LCL is
xbar + K'*s. The local function
kf calculates K and K' using the quantile
function of the non-central t distribution, qt.
The null hypothesis Ho: Xp >= Lp is rejected at the
alpha = 1 - gam significance level if the
100*gam% UCL for Xp
is less than the specified limit Lp (indicating the exposure profile is acceptable).
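The tolerance factors can be sketched directly with R's non-central t quantile function. This is an illustrative version of the computation performed by the local function kf, under the assumption that the standard one-sided tolerance-factor formula is used; the function name is hypothetical.

```r
# Hedged sketch: exact one-sided tolerance factors via the non-central t.
# Ku (UCL factor) and Kl (LCL factor) for sample size n, percentile p,
# and one-sided confidence level gam.
k_factors <- function(n, p = 0.95, gam = 0.95) {
  ncp <- qnorm(p) * sqrt(n)   # non-centrality parameter
  c(Ku = qt(gam, df = n - 1, ncp = ncp) / sqrt(n),
    Kl = qt(1 - gam, df = n - 1, ncp = ncp) / sqrt(n))
}
```

For n = 10, p = 0.95, gam = 0.95 this gives Ku near the tabled one-sided 95%/95% tolerance factor of about 2.911; for lognormal data the limits are exp(xbar + K*s) with xbar and s computed from the logs.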
A LIST with components:
Xp |
estimate of the pth percentile of the distribution |
Xpe.LCL |
the exact 100*gam% lower confidence limit for Xp |
Xpe.UCL |
the exact 100*gam% upper confidence limit for Xp |
p |
probability for Xp the 100pth percentile. Default 0.95 |
gam |
one-sided confidence level |
Logx |
the value of logx: if TRUE the lognormal model was used; if FALSE the normal model |
n |
sample size |
Ku |
the K factor used to calculate the exact UCL |
Kl |
the K' factor used to calculate the exact LCL |
The UCL is also referred to as an upper tolerance limit (UTL),
i.e., if p = 0.95 and gam = 0.99 then Xpe.UCL is the exact UTL 95%-99%.
E. L. Frome
Burrows, G. L. (1963), "Statistical Tolerance Limits - What are They," Applied Statistics, 12, 133-144.
Johnson, N. L. and B. L. Welch (1940), "Application of the Non-Central t-Distribution," Biometrika, 31(3/4), 362-389.
Lyles R. H. and L. L. Kupper (1996), "On Strategies for Comparing Occupational Exposure Data to Limits," American Industrial Hygiene Association Journal, 57:6-15.
Tuggle, R. M. (1982), "Assessment of Occupational Exposure Using One-Sided Tolerance Limits," American Industrial Hygiene Association Journal, 43, 338-346.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Ignacio, J. S. and W. H. Bullock (2006), A Strategy for Assessing and Managing Occupational Exposures, Third Edition, AIHA Press, Fairfax, VA.
Mulhausen, J. R. and J. Damiano (1998), A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.
Help files for percentile.ml, efraction.exact, aihand
# EXAMPLE 1
# Calculate 95th percentile and exact CLs for the example data in the
# Appendix of Mulhausen and Damiano (1998)
data(aihand)
x <- aihand$x; det <- rep(1, length(x))
aiha <- data.frame(x, det) # complete data
unlist(percentile.exact(x, gam = 0.95, p = 0.95))[1:5] # exact CLs
unlist(percentile.ml(aiha, gam = 0.95, p = 0.95))      # ML CLs
# EXAMPLE 2
# Ignacio and Bullock (2006), Mulhausen and Damiano (1998)
# Calculate TABLE VII.3 (page 272) Factors for One-Sided Tolerance
# Limits for the Normal Distribution (abridged version)
# Same as Table III, Burrows (1963), Panel 3, Page 138
nn <- c(seq(3, 25), seq(30, 50, 5))
pv <- c(0.75, 0.9, 0.95, 0.99, 0.999)
tab <- matrix(0, length(nn), length(pv))
for (k in (1:length(nn))) {
  xx <- seq(1, nn[k])
  for (j in (1:length(pv))) {
    tab[k, j] <- percentile.exact(xx, pv[j], gam = 0.95, FALSE)$Ku
  }
}
dimnames(tab) <- (list(nn, pv)); rm(nn, pv, xx)
round(tab, 3)
#
# EXAMPLE 3
# Calculate TABLE I One-Sided Tolerance Factor K'
# Tuggle (1982), Page 339 (abridged version)
nn <- c(seq(3, 20), 50, 50000000)
pv <- c(0.9, 0.95, 0.99)
tab <- matrix(0, length(nn), length(pv))
for (k in (1:length(nn))) {
  xx <- seq(1, nn[k])
  for (j in (1:length(pv))) {
    tab[k, j] <- percentile.exact(xx, pv[j], gam = 0.95, FALSE)$Kl
  }
}
dimnames(tab) <- (list(nn, pv)); rm(nn, pv, xx)
round(tab, 3)
Calculate the ML estimate of Xp, the 100pth percentile
of the lognormal distribution, and the lower and upper 100*gam% confidence limits
LX(p, gam) and UX(p, gam). The upper confidence limit is used to
test the null hypothesis that the exposure profile is "unacceptable".
If UX(p, gam) < Lp, the null hypothesis is rejected and the workplace
is considered "safe" or the object/area is not contaminated. The
Type I error is at most alpha = 1 - gam. The resulting interval (LX, UX)
is an approximate 100*(2*gam - 1)% confidence interval for Xp.
percentile.ml(dd, p = 0.95, gam = 0.95, dat = TRUE)
dd | An n by 2 matrix or data frame with the measurement x in column 1 and the detect indicator in column 2 (1 = detect, 0 = non-detect) |
p | probability for Xp, the 100pth percentile. Default is 0.95 |
gam | one-sided confidence level. Default is 0.95 |
dat | if TRUE, dd contains the data; if FALSE, dd is a list of ML estimates (see lnorm.ml). Default is TRUE |
The point estimate of yp = log(Xp) is mu + z*sigma, where mu and sigma
are the ML estimates of the log-scale mean and standard deviation, and
z is qnorm(p). The variance of the estimate is
var(mu) + z^2 * var(sigma) + 2*z*cov(mu, sigma).
The LCL and UCL for Xp are obtained by exponentiating approximate
confidence limits for yp based on this variance. The ML estimates of mu,
sigma, and their variances and covariance are obtained from the ML
variance-covariance matrix using lnorm.ml. The null hypothesis
Ho: Xp >= Lp is rejected at the alpha = 1 - gam significance level if the
UCL for Xp < Lp (indicating the exposure profile is acceptable).
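As a minimal numerical sketch of the point estimate and variance described in the details (the values of mu, sigma, and the variance-covariance terms below are hypothetical; in practice they come from lnorm.ml):

```r
# Hypothetical log-scale ML estimates and variance-covariance terms
mu <- 0.5 ; sigma <- 1.2
var.mu <- 0.04 ; var.sig <- 0.02 ; cov.musig <- 0.001
p <- 0.95
z <- qnorm(p)                        # z is qnorm(p)
yp <- mu + z * sigma                 # point estimate of yp = log(Xp)
Xp <- exp(yp)                        # ML estimate of Xp
var.yp <- var.mu + z^2 * var.sig + 2 * z * cov.musig   # variance of yp
```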
A LIST with components:
Xp | ML estimate of the 100pth percentile of the lognormal distribution |
Xp.LCL | 100*gam% lower confidence limit LX(p, gam) for Xp |
Xp.UCL | 100*gam% upper confidence limit UX(p, gam) for Xp |
p | probability for Xp, the 100pth percentile. Default 0.95 |
gam | one-sided confidence level. Default 0.95 |
The UCL is also referred to as an upper tolerance limit (UTL); i.e., if p = 0.95 and gam = 0.99, then Xp.UCL is the UTL-95%-99%.
E. L. Frome
Cohen, A. C. (1991), Truncated and Censored Samples, Marcel Decker, New York
Cox, D. R. and D. V. Hinkley (1979), Theoretical Statistics, Chapman and Hall, New York.
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
See the help files for lnorm.ml and efraction.ml.
data(beTWA)
# calculate ML estimate of 95th percentile and CLs for Example 2 in ORNLTM2005-52
unlist(percentile.ml(beTWA,0.95,0.95))
Find Xp, the 100pth percentile, and the 100*gam% nonparametric
confidence limits from the PLE of F(x).
percentile.ple(dd, p = 0.95, gam = 0.95, interp = TRUE)
dd | An n by 2 matrix or data frame with the measurement x in column 1 and the detect indicator in column 2 (1 = detect, 0 = non-detect) |
p | Find x such that the PLE of F(x) = p. Default is 0.95 |
gam | one-sided confidence level. Default is 0.95 |
interp | if TRUE, use linear interpolation. Default is TRUE |
Find Xp, the 100pth percentile, and confidence limits from the
PLE of F(x) – see plekm for additional details.
If interp is TRUE, linear interpolation is used. Otherwise, the upper
confidence limit (UCL) for Xp, UX(p, gam), is the smallest value of x
such that the LCL for F(x) is greater than or equal to p, and the lower
confidence limit (LCL), LX(p, gam), is the largest value of x such that
the UCL for F(x) is less than or equal to p.
A list with components:
Xp | PLE of the 100pth percentile |
LXp | PLE of LX(p, gam), the lower confidence limit for Xp |
UXp | PLE of UX(p, gam), the upper confidence limit for Xp |
p | probability for Xp, the 100pth percentile |
gam | one-sided confidence level |
E. L. Frome
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
# use data from Example 2 in ORNL/TM-2005/52 to calculate
# 95 percent UCL for the 95th percentile
data(beTWA)
unlist(percentile.ple(beTWA))
unlist(percentile.ml(beTWA))   # compare ML estimates
Plot the product limit estimate (PLE) of F(x) and
two-sided confidence limits (CLs) for left censored data. A horizontal line
corresponding to Xp, the 100pth percentile, is added to the plot, and
the nonparametric confidence limits for Xp are displayed in the title.
ple.plot(dd, gam = 0.95, p = 0.95, xlow = 0, xh = NA, ylow = 0, yh = 1,...)
dd | An n by 2 matrix or data frame with the measurement x in column 1 and the detect indicator in column 2 (1 = detect, 0 = non-detect) |
gam | one-sided confidence level. Default is 0.95 |
p | probability for Xp, the 100pth percentile. Default is 0.95 |
xlow | minimum value on the x axis. Default = 0 |
xh | maximum value on the x axis. Default = maximum value of x |
ylow | minimum value on the y axis. Default = 0 |
yh | maximum value on the y axis. Default = 1 |
... | additional parameters passed to plot |
Data frame with columns
a | value of the jth detect (ordered) |
ple | PLE of F(x) at a(j) |
stder | standard error of F(x) at a(j) |
lower | lower CL for the PLE at a(j) |
upper | upper CL for the PLE at a(j) |
n | number of detects or non-detects |
r | number of detects equal to a(j) |
If the solid horizontal line does not intersect the lower
CL for the PLE, then the upper CL for Xp, UX(p, gam), is not defined.
E. L. Frome
See also plekm.
data(beTWA)
par( mfrow=c(1,2) )
ple.plot(beTWA)            # plot the PLE of F(x) for the beTWA data
ple.plot(beTWA,ylow=0.8)   # plot the upper right tail
# Lognormal ML estimates of 95th percentile and CLs
unlist(percentile.ml(beTWA))
# PLE estimates of 95th percentile and CLs
unlist(percentile.ple(beTWA))
Compute Product Limit Estimate (PLE) of F(x) for interval censored data (i.e., the nonparametric maximum likelihood estimate that generalizes the Kaplan-Meier estimate to interval censored data).
pleicf(dd,nondet=TRUE,mine=1e-06,maxc=10000,eps=1e-14)
dd | n by 2 matrix or data frame (see note below) |
nondet | if TRUE, the data are left censored with a detect indicator in column 2 (see note below). Default is TRUE |
mine | minimum error for convergence in icfit. Default = 1e-06 |
maxc | maximum number of iterations in icfit. Default is 10000 |
eps | adjustment factor described by Ng (2002). Default is 1e-14 |
This function is a driver for icfit, which applies an EM algorithm
to interval censored data (see Turnbull, 1976).
Data frame with columns
a | value of the jth uncensored value (ordered) |
ple | PLE of F(x) at a(j) |
surv | 1 - ple(j), the PLE of S(x) |
prob | prob[X = x] from icfit |
n | sample size |
If nondet is TRUE, column 1 of dd is the data value
and column 2 is 1 for a detect and 0 otherwise. If nondet is
FALSE, dd contains the left and right endpoints required
by icfit.
E. L. Frome
Fay, M. P. (1999), "Comparing Several Score Tests for Interval Censored Data," Statistics in Medicine,18:273-85. (Corr: 1999, Vol 19, p.2681).
Ng, M. P. (2002), "A Modification of Peto's Nonparametric Estimation of Survival Curves for Interval-Censored Data," Biometrics, 58, 439-442.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B (Methodological), 38(3), 290-295.
# PLE for interval censored filmbadge data
data(filmbadge)
ple.fb <- pleicf(filmbadge[,1:2],FALSE)   # PLE for input to qq.lnorm
tmp <- qq.lnorm(ple.fb)
GM <- round(exp(tmp$par[1])) ; GSD <- round(exp(tmp$par[2]),2)
tp <- paste("Lognormal Q-Q plot for Filmbadge Data GM=",GM,"GSD=",GSD)
title(tp)   # title for q-q plot with graphical parameter estimates
Compute Product Limit Estimate (PLE) of F(x) and Confidence Limits for data with non-detects (left censored data).
plekm(dd,gam)
dd | An n by 2 matrix or data frame with the measurement x in column 1 and the detect indicator in column 2 (1 = detect, 0 = non-detect) |
gam | one-sided confidence level |
The R function survfit is used to calculate the Kaplan-Meier estimate
of S(z), where z = k - x and k is greater than the largest x value.
This technique of "reversing the data" to convert left censored data
to right censored data was first suggested by Nelson (1972). conf.type
= "plain" is required in survfit for correct CLs. The value of S(z)
is then used to calculate F(x). Note that if gam = 0.95, the 90%
two-sided CLs are calculated.
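A minimal sketch of this "reverse the data" device, assuming the survival package is available (the data and the constant k below are hypothetical):

```r
library(survival)
x   <- c(0.2, 0.5, 0.5, 1.1, 2.3)   # hypothetical measurements
det <- c(0, 1, 1, 1, 1)             # 0 = non-detect (left censored)
k <- max(x) + 1                     # any constant larger than the largest x
z <- k - x                          # left censored x becomes right censored z
fit <- survfit(Surv(z, det) ~ 1, conf.type = "plain")
# the Kaplan-Meier estimate of S(z) from fit then gives F(x) = S(k - x)
```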
Data frame with columns
a | value of the jth detect (ordered) |
ple | PLE of F(x) at a(j) |
stder | standard error of F(x) at a(j) |
lower | lower CL for the PLE at a(j) |
upper | upper CL for the PLE at a(j) |
n | number of detects or non-detects |
d | number of detects equal to a(j) |
In survival analysis, S(x) = 1 - F(x) is the survival function, i.e., S(x) = P[X > x]. In environmental and occupational situations, S(x) is the "exceedance" function, i.e., S(x) is the proportion of X values that exceed x. The PLE is the sample estimate of F(x), i.e., the proportion of values in the sample that are less than x.
E. L. Frome
Nelson, W. (1972), "Theory and Application of Hazard Plotting for Censored Failure Data," Technometrics, 14, 945-966.
Frome, E. L. and Wambach, P.F. (2005) "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values", ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr. (1996), "Difficulties with the Lognormal Model in Mean Estimation and Testing," Environmental and Ecological Statistics, 3, 81-97.
data(SESdata)   # use SESdata data set, Example 1 from ORNL/TM-2005/52
pkm <- plekm(SESdata)
qq.lnorm(pkm)   # lognormal q-q plot based on PLE
round(pkm,3)
Compute Product Limit Estimate(PLE) of F(x) for positive data with non-detects (left censored data)
plend(dd)
dd | An n by 2 matrix or data frame with the measurement x in column 1 and the detect indicator in column 2 (1 = detect, 0 = non-detect) |
The product limit estimate (PLE) of the cumulative distribution function
was first proposed by Kaplan and Meier (1958) for right censored data.
Turnbull (1976) provides a more general treatment of nonparametric
estimation of the distribution function for arbitrary censoring. For
randomly left censored data, the PLE is defined as follows [Schmoyer et al.
(1996)]. Let a[1] < ... < a[m] be the m distinct values at
which detects occur, r[j] the number of detects at a[j], and n[j] the
sum of non-detects and detects that are less than or equal to a[j]. Then the
PLE is defined to be 0 for x < a[0], where a[0] is a[1] or the
value of the detection limit for the smallest non-detect if it is less than
a[1]. For a[0] <= x < a[m], the PLE is
F[j] = prod{ (n[j] - r[j]) / n[j] }, where the product is over all
a[j] > x, and the PLE is 1 for x >= a[m].
When there are only detects, this reduces to the usual
definition of the empirical cumulative distribution function.
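The definition above can be computed directly; a small sketch with hypothetical data (detects have det = 1):

```r
x   <- c(0.1, 0.2, 0.2, 0.4, 0.7, 1.5)   # hypothetical measurements
det <- c(0,   1,   1,   0,   1,   1)     # 1 = detect, 0 = non-detect
a <- sort(unique(x[det == 1]))           # distinct detected values a[1] < ... < a[m]
r <- sapply(a, function(aj) sum(x == aj & det == 1))  # detects at a[j]
n <- sapply(a, function(aj) sum(x <= aj))             # detects + non-detects <= a[j]
frac <- (n - r) / n
# PLE at a[j] is the product of frac over all a[k] > a[j] (empty product = 1)
ple <- sapply(seq_along(a), function(j) prod(frac[-seq_len(j)]))
```

With only detects and no ties this reproduces the usual empirical CDF, as noted in the details.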
Data frame with columns
a(j) |
value of jth detect (ordered) |
ple(j) |
PLE of F(x) at a(j) |
n(j) |
number of detects or non-detects |
r(j) |
number of detects equal to a(j) |
surv(j) |
1 - ple(j) is PLE of S(x) |
In survival analysis, S(x) = 1 - F(x) is the survival function,
i.e., S(x) = P[X > x]. In environmental and occupational situations,
S(x) is the "exceedance" function, i.e., S(x) is the proportion
of X values that exceed x.
E. L. Frome
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Kaplan, E. L. and Meier, P. (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 457-481.
Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr. (1996), "Difficulties with the Lognormal Model in Mean Estimation and Testing," Environmental and Ecological Statistics, 3, 81-97.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B (Methodological), 38(3), 290-295.
data(SESdata)   # use SESdata data set, Example 1 from ORNL/TM-2005/52
pnd <- plend(SESdata)
Ia <- "Q-Q plot For SESdata"
qq.lnorm(pnd,main=Ia)   # lognormal q-q plot based on PLE
pnd
qq.lnorm produces a lognormal quantile-quantile (q-q) plot based on the product limit estimate (PLE) of the cumulative distribution function (CDF) F(x) for censored data. A line is added to the plot.
qq.lnorm(pl, mu, sigma, aveple = TRUE,...)
pl | A data frame with the data (x) in column 1 and the PLE in column 2 |
mu | estimate of the log-scale mean |
sigma | estimate of the log-scale standard deviation |
aveple | if TRUE, plotting positions are computed by averaging adjacent PLE values; otherwise each PLE value is multiplied by n/(n + 1). Default is TRUE |
... | additional parameters passed to plot |
The PLE is used to determine the plotting positions on the horizontal
axis for the censored data version of a theoretical q-q plot for the
lognormal distribution. Waller and Turnbull (1992)
provide a good overview of q-q plots and other graphical methods for
censored data. The lognormal q-q plot is obtained by plotting the
detected values (on a log scale) versus H[p[j]], where H is the
inverse of the distribution function of the standard normal
distribution and p[j] is the plotting position of the jth detect.
If the largest data value is not censored, then the PLE
is 1 and H(1) is off scale. The "plotting positions" are
determined from the PLE of F(x) by multiplying each estimate by
n/(n + 1), or by averaging adjacent values – see Meeker and Escobar
(1998, Chap. 6). In the complete data case without ties, the first approach
is equivalent to replacing the sample CDF j/n with j/(n + 1), and for
the second approach the plotting positions are equal to (j - 0.5)/n. If
the lognormal distribution is a close approximation to the empirical
distribution, the points on the plot will fall near a straight line.
An objective evaluation of this is obtained by calculating Rsq, the
square of the correlation coefficient associated with the plot.
A line is added to the plot based on the values of mu and sigma.
If either of these is missing, mu and sigma are estimated by
linear regression of the log of the detected values on H[p[j]].
A list with components
x | the x coordinates of the points that were plotted |
y | the y coordinates of the points that were plotted |
pp | the adjusted probabilities used to determine the plotting positions |
par | the values of mu and sigma used to draw the line |
Helsel and Cohn (1988) consider alternative procedures that can be used for calculating plotting positions for left censored data. Waller and Turnbull (1992) describe a modification of the Kaplan-Meier estimator that can be used for right censored data and note that, for the purpose of assessing goodness of fit, the choice of plotting positions makes little qualitative difference in the appearance of any particular plot. The two options in this function can be used for any type of censoring.
E. L. Frome
Fay, M. P. (1999), "Comparing Several Score Tests for Interval Censored Data," Statistics in Medicine, 1999; 18:273-85. (Corr: 1999, Vol 19, p.2681).
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
Helsel, D. R. and T. A. Cohn (1988), "Estimation of Descriptive Statistics for Multiply Censored Water Quality Data," Water Resources Research, 24, 1997-2004.
Meeker, W. Q. and L. A. Escobar (1998), Statistical Methods for Reliability Data, John Wiley and Sons, New York.
Ng, M. P. (2002), "A Modification of Peto's Nonparametric Estimation of Survival Curves for Interval-Censored Data," Biometrics, 58, 439-442.
Waller, L. A. and B. W. Turnbull (1992), "Probability Plotting with Censored Data," The American Statistician, 46(1), 5-12.
data(SESdata)   # use SESdata data set, Example 1 from ORNL/TM-2005/52
pnd <- plend(SESdata)
qq.lnorm(pnd)   # lognormal q-q plot based on PLE
Ia <- "Q-Q plot For SESdata"
qqout <- qq.lnorm(pnd,main=Ia)   # lognormal q-q plot with title
qqout
Read data from fn.txt (space delimited text file) or fn.csv
(comma delimited text file), calculate all summary statistics using
IH.summary, and output the results to an ASCII text file fnout.csv
in CSV format.
readss(fn,L,bcol=NA,rto=5,pstat=NA,reverse=FALSE,p=0.95,gam=0.95,comma=FALSE)
fn | name of the input data file, in double quotes, without the .txt or .csv extension |
L | the specified limit for the exceedance fraction, e.g., the occupational exposure limit |
bcol | column that contains the BY variable – see details. Default NA |
rto | round values to rto decimal places. Default = 5 |
pstat | select a subset of the statistics calculated by IH.summary. Default NA (all statistics) |
reverse | if TRUE, reverse the rows and columns of the output. Default FALSE |
p | probability for Xp, the 100pth percentile. Default is 0.95 |
gam | one-sided confidence level. Default is 0.95 |
comma | if TRUE, the input file is comma delimited (fn.csv). Default FALSE |
Read data from a tab or comma delimited text file in the
current folder/directory. The first column must contain the measurements
(observed X values). The second column is an indicator variable, with 1 for a
detected value and 0 for a non-detect. Additional columns can contain
factors that can be used to define a BY variable. The first record
in the file must contain valid R names. Valid names may contain
letters (case sensitive), numbers, periods, and underscores, and should
start with a letter (no spaces). This file would most likely be
obtained from an Excel spreadsheet using the "Save As" option
with file type Text (Tab delimited) (*.txt) or CSV (Comma
delimited) (*.csv).
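A sketch of writing a valid input file of this form from within R (the file name and data below are hypothetical):

```r
x   <- c(0.05, 0.12, 0.30)   # measurements
det <- c(0, 1, 1)            # 1 = detect, 0 = non-detect
grp <- c("A", "A", "B")      # optional factor for a BY variable
# first record holds valid R names; values are space delimited
write.table(data.frame(x, det, grp), "demoss.txt",
            row.names = FALSE, quote = FALSE)
# readss("demoss", L = 0.2, bcol = 3) would then read demoss.txt
```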
Returns an invisible data frame read from file fn.txt (or fn.csv):
Column 1 | value of the measurement |
Column 2 | indicator variable; 1 for detect, 0 for non-detect |
Column 3 | additional columns (if any) contain factors, e.g., for a BY variable |
Summary statistics calculated by IH.summary are computed for each
subset of data as defined by the levels of the BY variable. A data frame
with row names from IH.summary (or a subset based on the value of pstat)
and column names defined by the values of the BY variable is output as an
ASCII text file fnout.csv, in CSV format, in the working folder. If
reverse is TRUE, the rows and columns are reversed.
For information about factors, see the R help file for factor.
Each level of the BY variable must have at least two non-detects for this function.
If this is not the case, an error message is printed to the R console along with
the levels of the BY variable that have fewer than two non-detects.
E. L. Frome
See the help files for lnorm.ml, efclnp, efraction.ml, percentile.ml,
kmms, and About-STAND for more details and a complete reference list.
# to demonstrate the use of readss, add a new factor grp to the cansdata;
# this factor with four levels (A_1 A_2 B_1 B_2) combines strata and sample
data(cansdata)
grp <- paste(cansdata$strata,cansdata$sample,sep="_")
temp <- data.frame(cansdata,grp)   # add four-level factor grp to cansdata
# the next line is NOT executable; use CUT AND PASTE
# sink("demoread.txt") ; print(temp) ; sink()
# The preceding line writes temp to a text file demoread.txt in the current folder.
# This file would normally be created by another program, e.g. Excel.
# Now use readss() to read this space delimited text file, calculate
# all of the summary statistics for each level of grp, and output
# the results to a new text file demoreadout.csv in the current folder:
# rdemo <- readss("demoread",L=0.2,bcol=5)
# rdemo is the R data frame that was used to calculate results in demoreadout.csv.
# To see the same results rounded to three places in the R console use
# round( IH.summary(rdemo,L=0.2,bcol=5), 3)
# To select a subset of statistics from IH.summary first define the subset:
# psel <- c("Xp.obs","Xp","Xp.UCL","f","f.UCL","Rsq","m","n")
# Entering the following command will overwrite demoreadout.csv
# with rows and columns reversed, the subset of statistics as columns,
# and the results rounded to 4 places:
# rdemo <- readss("demoread",L=0.2,bcol=5,rto=4,pstat=psel,rev=TRUE)
# To see the same results rounded to three places in the R console use
# t(round( IH.summary(rdemo,L=0.2,bcol=5)[psel,], 3))
The Department of Energy (DOE) Chronic Beryllium Disease Prevention Program is concerned with monitoring objects (e.g., equipment, buildings) for beryllium contamination and workers for exposure to beryllium in the workplace.
The SESdata is the result of a survey to evaluate possible beryllium
contamination based on 31 surface wipe samples from elevated surfaces (SES)
of a smelter at a DOE facility. For equipment that is being evaluated for release to
the public, or for non-beryllium use, the DOE has established a release
limit for removable beryllium contamination of 0.2 micrograms per 100 cm^2.
data(SESdata)
A data frame with 31 observations on the following 2 variables:
x | beryllium wipe sample result |
det | 0 if non-detect; 1 if detected |
Statistics of interest are the exceedance fraction and
the 95th percentile. The exceedance fraction is an estimate of the
percentage of surface area that is expected to exceed the release
limit Lp = 0.2. Both the point estimate and
the UCL for F exceed Fo = 100(1 - p) = 5% (with p = 0.95), indicating
that the equipment is not acceptable. In fact, at the 95% confidence
level, at least 19.5% of the surface area exceeds the release limit.
A more detailed description and analysis of this data is given as Example 1 in Section 4 of Frome and Wambach (2005).
Frome, E. L. and Wambach, P. F. (2005), "Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-Detectable Values," ORNL/TM-2005/52,Oak Ridge National Laboratory, Oak Ridge, TN 37830. Available at: http://www.csm.ornl.gov/esh/aoed/ORNLTM2005-52.pdf
data(SESdata)
mle.ses <- unlist(lnorm.ml(SESdata))   # ML for SESdata
print(mle.ses[1:4])   # ML estimates of parameters
print(mle.ses[5:8])   # standard errors of ML estimates
# The next line produces a lognormal q-q plot with the ML line
qq.lnorm(plend(SESdata),mle.ses[1],mle.ses[2])
title("Lognormal Q-Q plot For SESdata Example 1 in ORNLTM2005-52")
unlist(efraction.ml(SESdata,gam=0.95,L=0.2))     # MLE of exceedance fraction and CLs
unlist(percentile.ml(SESdata,p=0.95,gam=0.95))   # MLE of 95th percentile and CLs