Package 'dixonTest'

Title: Dixon's Ratio Test for Outlier Detection
Description: For outlier detection in small and normally distributed samples the ratio test of Dixon (Q-test) can be used. Density, distribution function, quantile function and random generation for Dixon's ratio statistics are provided as wrapper functions. The core applies McBane's Fortran functions <doi:10.18637/jss.v016.i03> that use Gaussian quadrature for a numerical solution.
Authors: Thorsten Pohlert [aut, cre], George C. McBane [ctb]
Maintainer: Thorsten Pohlert <[email protected]>
License: GPL-3
Version: 1.0.4
Built: 2024-12-18 06:27:25 UTC
Source: CRAN

Dixon distribution

Description

Density, distribution function, quantile function and random generation for Dixon's ratio statistics $r_{j,i-1}$ for outlier detection.

Usage

qdixon(p, n, i = 1, j = 1, log.p = FALSE, lower.tail = TRUE)

pdixon(q, n, i = 1, j = 1, lower.tail = TRUE, log.p = FALSE)

ddixon(x, n, i = 1, j = 1, log = FALSE)

rdixon(n, i = 1, j = 1)

Arguments

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

i

number of observations <= x_i

j

number of observations >= x_j

log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X <= x]; otherwise, P[X > x].

q

vector of quantiles.

x

vector of quantiles.

log

logical; if TRUE, the density is given as its logarithm. Defaults to FALSE.

Details

According to McBane (2006), the density of Dixon's statistic $r_{j,i-1}$ is obtained by integrating over $x$ and $v$ on the range $(-\infty < x < \infty, 0 \le v < \infty)$:

$$
\begin{array}{lcl}
f(r) & = & \frac{n!}{\left(i-1\right)! \left(n-j-i-1\right)! \left(j-1\right)!} \\
& & \times \int_{-\infty}^{\infty} \int_{0}^{\infty} \left[\int_{-\infty}^{x-v} \phi(t)\,dt\right]^{i-1} \left[\int_{x-v}^{x-rv} \phi(t)\,dt\right]^{n-j-i-1} \\
& & \times \left[\int_{x-rv}^{x} \phi(t)\,dt\right]^{j-1} \phi(x-v)\,\phi(x-rv)\,\phi(x)\,v ~ dv ~ dx
\end{array}
$$

where $v$ is the Jacobian and $\phi(\cdot)$ is the density of the standard normal distribution. McBane (2006) proposed a numerical solution using Gaussian quadratures (Gauss-Hermite quadrature and half-range Hermite quadrature) and coded a library in Fortran. The R functions documented here are wrappers around that Fortran code.
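The null distribution described by this density can also be approximated by brute-force simulation, which may help to see what the distribution functions return. The following is a minimal base-R sketch and not the quadrature method used by the package. The helper `dixon_ratio()`, the seed and the number of replicates are assumptions made for illustration, and only the upper-tail form of the ratio is simulated.

```r
# Monte Carlo sketch in base R; the package itself uses Gaussian quadrature.
# Hypothetical helper: upper-tail ratio r_{j,i-1} of one sample.
dixon_ratio <- function(x, i = 1, j = 1) {
  x <- sort(x)
  n <- length(x)
  (x[n] - x[n - j]) / (x[n] - x[i])
}

set.seed(1)
n <- 10
# Simulate the ratio r10 (i = 1, j = 1) under the null hypothesis of normality
sims <- replicate(20000, dixon_ratio(rnorm(n)))

# Empirical counterparts of the distribution and quantile functions
emp_cdf <- ecdf(sims)
emp_cdf(0.4)                  # approximates P[r10 <= 0.4] for n = 10
quantile(sims, probs = 0.95)  # approximates the 95% quantile of r10
```

With enough replicates, these empirical values should approach the results of pdixon and qdixon for the same n, i and j, up to simulation error.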

Value

ddixon gives the density function, pdixon gives the distribution function, qdixon gives the quantile function and rdixon generates random deviates.
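
The relation between the quantile and distribution functions can be checked with a round trip. This is a hedged sketch based only on the signatures listed under Usage: the probabilities and sample size are arbitrary choices, and the calls are guarded so that the code does nothing when the package is not installed.

```r
# Sketch only: the checks run when the dixonTest package is installed.
p <- c(0.90, 0.95, 0.99)
n_obs <- 10

if (requireNamespace("dixonTest", quietly = TRUE)) {
  # Quantiles of r10 (i = 1, j = 1) for the given probabilities
  q <- dixonTest::qdixon(p, n = n_obs, i = 1, j = 1)

  # Feeding the quantiles back into the distribution function
  # should return values close to p, up to numerical accuracy
  p_back <- dixonTest::pdixon(q, n = n_obs, i = 1, j = 1)
  print(cbind(p = p, q = q, p_back = p_back))

  # Upper-tail probabilities are expected to be the complement 1 - p
  print(dixonTest::pdixon(q, n = n_obs, i = 1, j = 1, lower.tail = FALSE))
}
```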

Source

The R code is a wrapper to the Fortran code released under GPL >=2 in the electronic supplement of McBane (2006). The original files are ‘rfuncs.f’, ‘utility.f’ and ‘dixonr.fi’. They were slightly modified to comply with current CRAN policy and the R manual ‘Writing R Extensions’.

Note

The file ‘slowTest/d-p-q-r-tests.R.out.save’ that is included in this package contains some results for the assessment of the numerical accuracy.

The slight numerical differences between McBane's original Fortran output (see files ‘slowTests/test[1,2,4].ref.output.txt’) and this implementation are related to different floating point rounding algorithms between R (see ‘round to even’ in round) and Fortran's write(*,'F6.3') statement.

References

Dixon, W. J. (1950) Analysis of extreme values. Ann. Math. Stat. 21, 488–506. doi:10.1214/aoms/1177729747.

Dean, R. B., Dixon, W. J. (1951) Simplified statistics for small numbers of observations. Anal. Chem. 23, 636–638. doi:10.1021/ac60052a025.

McBane, G. C. (2006) Programs to compute distribution functions and critical values for extreme value ratios for outlier detection. J. Stat. Soft. 16. doi:10.18637/jss.v016.i03.

Examples

set.seed(123)
n <- 20
Rdixon <- rdixon(n, i = 3, j = 2)
Rdixon
pdixon(Rdixon, n = n, i = 3, j = 2)
ddixon(Rdixon, n = n, i = 3, j = 2)

Dixon's Outlier Test (Q-Test)

Description

Performs Dixon's single outlier test.

Usage

dixonTest(x, alternative = c("two.sided", "greater", "less"), refined = FALSE)

Arguments

x

a numeric vector of data

alternative

the alternative hypothesis. Defaults to "two.sided".

refined

logical; whether the refined version (TRUE) or the Q-test (FALSE) is performed. Defaults to FALSE.

Details

Let $X$ denote an independently and identically distributed normal variate. Further, let the increasingly ordered realizations be denoted by $x_1 \le x_2 \le \ldots \le x_n$. Dixon (1950) proposed the following ratio statistic to detect an outlier (two sided):

$$r_{j,i-1} = \max\left\{\frac{x_n - x_{n-j}}{x_n - x_i}, \frac{x_{1+j} - x_1}{x_{n+1-i} - x_1}\right\}$$

The null hypothesis, no outlier, is tested against the alternative, at least one observation is an outlier (two sided). The subscript $j$ on the $r$ symbol indicates the number of outliers that are suspected at the upper end of the data set, and the subscript $i$ indicates the number of outliers suspected at the lower end. For $r_{10}$ it is also common to use the statistic $Q$.

The statistic for a single maximum outlier is:

$$r_{j,i-1} = \left(x_n - x_{n-j}\right) / \left(x_n - x_i\right)$$

The null hypothesis is tested against the alternative, the maximum observation is an outlier.

For testing a single minimum outlier, the test statistic is:

$$r_{j,i-1} = \left(x_{1+j} - x_1\right) / \left(x_{n+1-i} - x_1\right)$$

The null hypothesis is tested against the alternative, the minimum observation is an outlier.
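
The three statistics can be reproduced by hand in base R. The sketch below is not the package's implementation: the helper names are hypothetical, and the lower-tail ratio is obtained by applying the upper-tail ratio to the negated data, which swaps the roles of minimum and maximum (an assumption of this sketch). The data are those of the Dean and Dixon (1951) example.

```r
# Hypothetical helper: upper-tail ratio r_{j,i-1} for the largest value.
dixon_upper <- function(x, i = 1, j = 1) {
  x <- sort(x)
  n <- length(x)
  (x[n] - x[n - j]) / (x[n] - x[i])
}

# Lower-tail ratio obtained by mirroring the data (assumption of this sketch).
dixon_lower <- function(x, i = 1, j = 1) dixon_upper(-x, i = i, j = j)

# Two-sided statistic: the larger of the two one-sided ratios.
dixon_two_sided <- function(x, i = 1, j = 1) {
  max(dixon_upper(x, i, j), dixon_lower(x, i, j))
}

# Data from the Dean and Dixon (1951) example
x <- c(40.02, 40.12, 40.16, 40.18, 40.18, 40.20)
r_up  <- dixon_upper(x)   # (40.20 - 40.18) / (40.20 - 40.02)
r_low <- dixon_lower(x)   # (40.12 - 40.02) / (40.20 - 40.02)
r_two <- dixon_two_sided(x)
c(upper = r_up, lower = r_low, two.sided = r_two)
```

For these data the suspicious value is the minimum, 40.02, so the two-sided statistic coincides with the lower-tail ratio.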

Apart from the earlier Dixon's Q-test (i.e. $r_{10}$), a refined version that was later proposed by Dixon can be performed with this function, where the statistic $r_{j,i-1}$ depends on the sample size as follows:

$r_{10}$: $3 \le n \le 7$
$r_{11}$: $8 \le n \le 10$
$r_{21}$: $11 \le n \le 13$
$r_{22}$: $14 \le n \le 30$

The p-value is computed with the function pdixon.
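
The sample-size rule of the refined version can be written as a small lookup. This is a base-R sketch with a hypothetical helper `refined_indices()`, not code taken from the package. It returns the values of i and j that correspond to the subscripts of $r_{j,i-1}$, and then evaluates the upper-tail ratio for the NIST data used in the examples.

```r
# Hypothetical helper: map the sample size to the indices (i, j) of r_{j,i-1}.
refined_indices <- function(n) {
  stopifnot(n >= 3, n <= 30)
  if (n <= 7) {
    c(i = 1, j = 1)   # r10
  } else if (n <= 10) {
    c(i = 2, j = 1)   # r11
  } else if (n <= 13) {
    c(i = 2, j = 2)   # r21
  } else {
    c(i = 3, j = 2)   # r22
  }
}

# NIST dataplot example: n = 8, so the refined test uses r11
x <- sort(c(568, 570, 570, 570, 572, 578, 584, 596))
n <- length(x)
idx <- refined_indices(n)

# Upper-tail ratio (x_n - x_{n-j}) / (x_n - x_i) = (596 - 584) / (596 - 570)
r <- (x[n] - x[n - idx[["j"]]]) / (x[n] - x[idx[["i"]]])
r
```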

References

Dixon, W. J. (1950) Analysis of extreme values. Ann. Math. Stat. 21, 488–506. doi:10.1214/aoms/1177729747.

Dean, R. B., Dixon, W. J. (1951) Simplified statistics for small numbers of observations. Anal. Chem. 23, 636–638. doi:10.1021/ac60052a025.

McBane, G. C. (2006) Programs to compute distribution functions and critical values for extreme value ratios for outlier detection. J. Stat. Soft. 16. doi:10.18637/jss.v016.i03.

Examples

## example from Dean and Dixon 1951, Anal. Chem., 23, 636-638.
x <- c(40.02, 40.12, 40.16, 40.18, 40.18, 40.20)
dixonTest(x, alternative = "two.sided")

## example from the dataplot manual of NIST
x <- c(568, 570, 570, 570, 572, 578, 584, 596)
dixonTest(x, alternative = "greater", refined = TRUE)