Title: | Dixon's Ratio Test for Outlier Detection |
---|---|
Description: | For outlier detection in small and normally distributed samples the ratio test of Dixon (Q-test) can be used. Density, distribution function, quantile function and random generation for Dixon's ratio statistics are provided as wrapper functions. The core applies McBane's Fortran functions <doi:10.18637/jss.v016.i03> that use Gaussian quadrature for a numerical solution. |
Authors: | Thorsten Pohlert [aut, cre] , George C. McBane [ctb] |
Maintainer: | Thorsten Pohlert <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2024-12-18 06:27:25 UTC |
Source: | CRAN |
Density, distribution function, quantile function
and random generation for Dixon's ratio statistics
for outlier detection.
qdixon(p, n, i = 1, j = 1, log.p = FALSE, lower.tail = TRUE)
pdixon(q, n, i = 1, j = 1, lower.tail = TRUE, log.p = FALSE)
ddixon(x, n, i = 1, j = 1, log = FALSE)
rdixon(n, i = 1, j = 1)
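p | vector of probabilities. |
n | number of observations. |
i | number of observations <= x_i |
j | number of observations >= x_j |
log.p | logical; if TRUE, probabilities p are given as log(p). Defaults to FALSE. |
lower.tail | logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
q | vector of quantiles. |
x | vector of quantiles. |
log | logical; if TRUE, the density is returned on the log scale. Defaults to FALSE. |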
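According to McBane (2006), the density of Dixon's ratio statistic $r_{j,i-1}$
is obtained by integrating over $x$ and $v$ on the range
$-\infty < x < \infty$, $0 \le v < \infty$:

$$
f(r) = \frac{n!}{(i-1)!\,(n-j-i-1)!\,(j-1)!}
\int_{-\infty}^{\infty} \int_{0}^{\infty}
\left[\int_{-\infty}^{x-v} \phi(t)\,dt\right]^{i-1}
\left[\int_{x-v}^{x-rv} \phi(t)\,dt\right]^{n-j-i-1}
\left[\int_{x-rv}^{x} \phi(t)\,dt\right]^{j-1}
\phi(x-v)\,\phi(x-rv)\,\phi(x)\,v \; dv \, dx
$$

where $v$ is the Jacobian and $\phi(\cdot)$
is the density of the standard normal distribution.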
McBane (2006) has proposed a numerical solution using Gaussian quadratures
(Gauss-Hermite quadrature and half-range Hermite quadrature) and coded
a library in Fortran. These R functions are wrapper functions to
use the respective Fortran code.
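As a rough illustration of what the density integral involves, the following base R sketch evaluates it by a brute-force Riemann sum on a rectangular grid. This is not McBane's Gauss-Hermite quadrature and not the package's code: the function name `ddixon_grid`, the step size `h` and the grid limits are assumptions made for this example only. For n = 3 and i = j = 1 the double integral can be worked out by hand, giving 3*sqrt(3) / (2*pi*(r^2 - r + 1)), which serves as a check.

```r
## Hypothetical helper (not part of the package): density of r_{j,i-1}
## at a single value r, by a plain Riemann sum over x and v.
ddixon_grid <- function(r, n, i = 1, j = 1, h = 0.02) {
  stopifnot(length(r) == 1, r > 0, r < 1, n >= i + j + 1)
  ## grid limits are assumptions; the integrand is negligible outside them
  g <- expand.grid(x = seq(-10, 10, by = h), v = seq(0, 16, by = h))
  low  <- g$x - g$v        # corresponds to x_i
  midp <- g$x - r * g$v    # corresponds to x_{n-j}
  const <- factorial(n) /
    (factorial(i - 1) * factorial(n - j - i - 1) * factorial(j - 1))
  integrand <- pnorm(low)^(i - 1) *
    (pnorm(midp) - pnorm(low))^(n - j - i - 1) *
    (pnorm(g$x) - pnorm(midp))^(j - 1) *
    dnorm(low) * dnorm(midp) * dnorm(g$x) * g$v   # v is the Jacobian
  const * sum(integrand) * h^2
}

## check against the closed form for n = 3, i = j = 1
r0 <- 0.5
approx_n3 <- ddixon_grid(r0, n = 3)
exact_n3  <- 3 * sqrt(3) / (2 * pi * (r0^2 - r0 + 1))
c(grid = approx_n3, closed.form = exact_n3)
```

A grid sum like this is slow and only moderately accurate, which is one reason a dedicated quadrature scheme is attractive for production use.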
ddixon gives the density function, pdixon gives the distribution function,
qdixon gives the quantile function and rdixon generates random deviates.
The R code is a wrapper to the Fortran code released under GPL >=2 in the electronic supplement of McBane (2006). The original files are ‘rfuncs.f’, ‘utility.f’ and ‘dixonr.fi’. They were slightly modified to comply with current CRAN policy and the R manual ‘Writing R Extensions’.
The file ‘slowTest/d-p-q-r-tests.R.out.save’ that is included in this package contains some results for the assessment of the numerical accuracy.
The slight numerical differences between McBane's original Fortran output
(see files ‘slowTests/test[1,2,4].ref.output.txt’) and
this implementation are related to the different floating-point rounding
algorithms of R (see ‘round to even’ in round)
and Fortran's write(*,'F6.3') statement.
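The ‘round to even’ behaviour on the R side can be seen with a minimal base R example. It only illustrates R's documented handling of exact ties and makes no claim about what a particular Fortran compiler prints.

```r
## exact halves are rounded to the nearest even integer in R
ties <- c(0.5, 1.5, 2.5)
rounded <- round(ties)
rounded   # 0 2 2
```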
Dixon, W. J. (1950) Analysis of extreme values. Ann. Math. Stat. 21, 488–506. doi:10.1214/aoms/1177729747.
Dean, R. B., Dixon, W. J. (1951) Simplified statistics for small numbers of observation. Anal. Chem. 23, 636–638. doi:10.1021/ac60052a025.
McBane, G. C. (2006) Programs to compute distribution functions and critical values for extreme value ratios for outlier detection. J. Stat. Soft. 16. doi:10.18637/jss.v016.i03.
set.seed(123)
n <- 20
## random deviates of Dixon's ratio statistic for i = 3, j = 2
Rdixon <- rdixon(n, i = 3, j = 2)
Rdixon
## distribution function and density evaluated at the deviates
pdixon(Rdixon, n = n, i = 3, j = 2)
ddixon(Rdixon, n = n, i = 3, j = 2)
Performs Dixon's single outlier test.
dixonTest(x, alternative = c("two.sided", "greater", "less"), refined = FALSE)
x | a numeric vector of data. |
alternative | the alternative hypothesis. Defaults to "two.sided". |
refined | logical indicator, whether the refined version or the Q-test shall be performed. Defaults to FALSE. |
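Let $X$ denote an independent and identically distributed
normal variate. Further, let the increasingly ordered realizations
be denoted by $x_1 \le x_2 \le \ldots \le x_n$.
Dixon (1950) proposed the following ratio statistic to detect
an outlier (two-sided):

$$
r_{j,i-1} = \max\left\{ \frac{x_n - x_{n-j}}{x_n - x_i},\;
\frac{x_{1+j} - x_1}{x_{n+1-i} - x_1} \right\}
$$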
The null hypothesis, no outlier, is tested against the alternative that
at least one observation is an outlier (two-sided). The subscript $j$
on the $r$ symbol indicates the number of
outliers that are suspected at the upper end of the data set,
and the subscript $i-1$ indicates the number of outliers suspected
at the lower end. For $r_{10}$ it is also common to use the
statistic $Q$.
The statistic for a single maximum outlier is:

$$
r_{j,i-1} = \frac{x_n - x_{n-j}}{x_n - x_i}
$$

The null hypothesis is tested against the alternative that the maximum observation is an outlier.
For testing a single minimum outlier, the test statistic is:

$$
r_{j,i-1} = \frac{x_{1+j} - x_1}{x_{n+1-i} - x_1}
$$

The null hypothesis is tested against the alternative that the minimum observation is an outlier.
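The ratio statistics can be computed by hand in a few lines of base R. The sketch below is an illustration only: the helper name `dixon_ratio` is hypothetical, and the indexing assumes that the upper-end ratio is (x_n - x_{n-j}) / (x_n - x_i) and that the lower-end ratio is its mirror image, (x_{1+j} - x_1) / (x_{n+1-i} - x_1). It does not reproduce the internals of dixonTest.

```r
## Hypothetical helper: Dixon's ratio for the upper end, the lower end,
## and the two-sided maximum, with i = j = 1 giving the Q statistic (r10).
dixon_ratio <- function(x, i = 1, j = 1) {
  x <- sort(x)
  n <- length(x)
  stopifnot(n >= i + j + 1)
  upper <- (x[n] - x[n - j]) / (x[n] - x[i])          # maximum suspected
  lower <- (x[1 + j] - x[1]) / (x[n + 1 - i] - x[1])  # minimum suspected
  c(upper = upper, lower = lower, two.sided = max(upper, lower))
}

## data from Dean and Dixon (1951)
x <- c(40.02, 40.12, 40.16, 40.18, 40.18, 40.20)
res <- dixon_ratio(x)
res
```

For these data the lowest value, 40.02, is the suspicious one: the lower-end ratio is 0.10 / 0.18, while the upper-end ratio is only 0.02 / 0.18.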
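Apart from the earlier Dixon's Q-test (i.e. $r_{10}$),
a refined version that was later proposed by Dixon can be performed
with this function, where the statistic $r_{j,i-1}$
depends on
the sample size as follows:
$r_{10}$: | $3 \le n \le 7$ |
$r_{11}$: | $8 \le n \le 10$ |
$r_{21}$: | $11 \le n \le 13$ |
$r_{22}$: | $14 \le n \le 30$ |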
The p-value is computed with the function pdixon.
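To make the refined procedure concrete, here is a base R sketch that picks the statistic from the sample size and approximates an upper-tail p-value by simulation under the normal null hypothesis. Several things are assumptions for illustration: the helper names `refined_indices` and `upper_ratio`, the conventional sample-size ranges (r10 for 3 to 7, r11 for 8 to 10, r21 for 11 to 13, r22 for 14 to 30), and the use of Monte Carlo simulation in place of pdixon, which the package uses for the actual p-value.

```r
## Hypothetical helper: map sample size n to the (i, j) of r_{j,i-1},
## using the assumed ranges stated in the text above this block.
refined_indices <- function(n) {
  stopifnot(n >= 3, n <= 30)
  if (n <= 7) {
    c(i = 1, j = 1)   # r10
  } else if (n <= 10) {
    c(i = 2, j = 1)   # r11
  } else if (n <= 13) {
    c(i = 2, j = 2)   # r21
  } else {
    c(i = 3, j = 2)   # r22
  }
}

## Hypothetical helper: ratio for a suspected maximum outlier
upper_ratio <- function(x, i, j) {
  x <- sort(x)
  n <- length(x)
  (x[n] - x[n - j]) / (x[n] - x[i])
}

## NIST dataplot example data, n = 8, so r11 is selected
x <- c(568, 570, 570, 570, 572, 578, 584, 596)
ij <- refined_indices(length(x))
r_obs <- upper_ratio(x, ij[["i"]], ij[["j"]])   # (596 - 584) / (596 - 570)

## Monte Carlo approximation of P(R >= r_obs) under normality
set.seed(1)
r_sim <- replicate(20000, upper_ratio(rnorm(length(x)), ij[["i"]], ij[["j"]]))
p_mc <- mean(r_sim >= r_obs)
c(statistic = r_obs, p.value = p_mc)
```

The simulated p-value carries Monte Carlo error and should only be read as a plausibility check against the value returned by dixonTest(x, alternative = "greater", refined = TRUE).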
Dixon, W. J. (1950) Analysis of extreme values. Ann. Math. Stat. 21, 488–506. doi:10.1214/aoms/1177729747.
Dean, R. B., Dixon, W. J. (1951) Simplified statistics for small numbers of observation. Anal. Chem. 23, 636–638. doi:10.1021/ac60052a025.
McBane, G. C. (2006) Programs to compute distribution functions and critical values for extreme value ratios for outlier detection. J. Stat. Soft. 16. doi:10.18637/jss.v016.i03.
## example from Dean and Dixon 1951, Anal. Chem., 23, 636-639.
x <- c(40.02, 40.12, 40.16, 40.18, 40.18, 40.20)
dixonTest(x, alternative = "two.sided")

## example from the dataplot manual of NIST
x <- c(568, 570, 570, 570, 572, 578, 584, 596)
dixonTest(x, alternative = "greater", refined = TRUE)