Title: | Obtain Alpha-Outlier Regions for Well-Known Probability Distributions |
---|---|
Description: | Given the parameters of a distribution, the package uses the concept of alpha-outliers by Davies and Gather (1993) to flag outliers in a data set. See Davies, L.; Gather, U. (1993): The identification of multiple outliers, JASA, 88 423, 782-792, <doi:10.1080/01621459.1993.10476339> for details. |
Authors: | Andre Rehage, Sonja Kuhnt |
Maintainer: | Andre Rehage <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2024-11-15 06:37:42 UTC |
Source: | CRAN |
α-outlier regions for well-known probability distributions
Given the parameters of a distribution, the package uses the concept of α-outliers by Davies and Gather (1993) to flag outliers in a data set.
The structure of the package is as follows: aout.[Distribution] is the name of the function which returns the α-outlier region of a random variable following [Distribution]. The names of the distributions are abbreviated as in the d, p, q, r functions. Use pre-specified or robustly estimated parameters from your data to obtain reasonable results. The sample size should be taken into account when choosing alpha; Gather et al. (2003), for example, propose a choice that depends on the sample size.
A. Rehage, S. Kuhnt
Davies, L.; Gather, U. (1993) The identification of multiple outliers, Journal of the American Statistical Association, 88 423, 782-792.
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
iris.setosa <- iris[1:51, 4]
aout.norm(data = iris.setosa, param = c(mean(iris.setosa), sd(iris.setosa)), alpha = 0.01)
aout.pois(data = warpbreaks[,1], param = mean(warpbreaks[,1]), alpha = 0.01, hide.outliers = TRUE)
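The advice to take the sample size into account when choosing alpha can be sketched in a few lines. The rule used here, alpha_N = 1 - (1 - alpha)^(1/N), is an assumption: it is a common adjustment in the α-outlier literature, but it is not confirmed by this page as the exact proposal of Gather et al. (2003). The helper adjust_alpha is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package.
# Assumed rule: alpha_N = 1 - (1 - alpha)^(1 / N),
# chosen so that (1 - alpha_N)^N equals 1 - alpha.
adjust_alpha <- function(alpha, n) {
  stopifnot(alpha > 0, alpha < 1, n >= 1)
  1 - (1 - alpha)^(1 / n)
}

# Per-observation level for a sample of 51 values at overall level 0.05
alpha_n <- adjust_alpha(alpha = 0.05, n = 51)
alpha_n
```

The adjusted value could then be passed as the alpha argument of any aout.[Distribution] function.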
α-outliers in Binomial data
Given the parameters of a Binomial distribution, aout.binom identifies α-outliers in a given data set.
aout.binom(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the Binomial distribution: the number of trials and the success probability, in that order (as in the example below). |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
data(uis)
medbeck <- median(uis$BECK)
aout.binom(data = uis$BECK, param = c(54, medbeck/54), alpha = 0.001)
α-outliers in conditional Gaussian data
Given the parameters of a conditional Gaussian distribution, aout.cg identifies α-outliers in a given data set.
aout.cg(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a matrix. First column: Class of the value, coded with an integer between 1 and d, where d is the number of classes. Second column: The value as a realization of a univariate normal with parameters |
param |
a list with three elements; the example below names them p, mu, and sigma. |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a data frame of the outlier-free data.
A. Rehage
Edwards, D. (2000) Introduction to Graphical Modelling. 2nd edition, Springer, New York.
Kuhnt, S.; Rehage, A. (2013) The concept of α-outliers in structured data situations. In C. Becker, R. Fried, S. Kuhnt (Eds.): Robustness and Complex Data Structures. Festschrift in Honour of Ursula Gather. Berlin: Springer, 91-108.
# Rats' weights data example taken from Edwards (2000)
ratweight <- cbind(Drug = c(1, 1, 2, 3, 1, 1, 2, 3, 1, 2, 3, 3, 1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 3, 3),
                   Week1 = c(5, 7, 9, 14, 7, 8, 7, 14, 9, 7, 21, 12, 5, 7, 6, 17, 6, 10, 6, 14, 9, 8, 16, 10))
aout.cg(ratweight, list(p = c(1/3, 1/3, 1/3), mu = c(7, 7, 14), sigma = c(1.6, 1.4, 3.3)))
α-outliers in χ² data
Given the parameters of a χ² distribution, aout.chisq identifies α-outliers in a given data set.
aout.chisq(data, param, alpha = 0.1, hide.outliers = FALSE, ncp = 0, lower = auto.l, upper = auto.u, method.in = "Newton", global.in = "gline", control.in = list(sigma = 0.1, maxit = 1000, xtol = 1e-12, ftol = 1e-12, btol = 1e-04))
data |
a vector. The data set to be examined. |
param |
an atomic vector. Contains the degrees of freedom of the χ² distribution. |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
ncp |
an atomic vector. Determines the non-centrality parameter of the χ² distribution. Defaults to 0. |
lower |
an atomic vector. First element of |
upper |
an atomic vector. Second element of |
method.in |
See |
global.in |
See |
control.in |
See |
The α-outlier region of a χ² distribution is generally not available in closed form or via the tails, such that a non-linear equation system has to be solved.
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
aout.chisq(chisq.test(occupationalStatus)$statistic, 49)
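To illustrate what solving a non-linear equation system means here, the following sketch finds the bounds l < u of the inlier region of a central χ² distribution from two conditions: equal density height at both bounds, and total tail mass alpha outside them. It is a minimal sketch, not the package's implementation: it assumes more than two degrees of freedom (so the density is unimodal), uses nested calls to base R's uniroot instead of the solver options exposed through method.in, global.in and control.in, and chisq_inlier_bounds is a hypothetical name.

```r
# Hypothetical helper, not part of the package; assumes df > 2 so that the
# chi-squared density is unimodal with mode df - 2.
chisq_inlier_bounds <- function(df, alpha = 0.1) {
  stopifnot(df > 2, alpha > 0, alpha < 1)
  mode <- df - 2
  far  <- qchisq(1e-14, df, lower.tail = FALSE)  # practical upper end of the support

  # For a lower bound l < mode, find u > mode with the same density height.
  upper_for <- function(l) {
    uniroot(function(u) dchisq(u, df) - dchisq(l, df),
            lower = mode, upper = far, tol = 1e-10)$root
  }

  # Choose l so that both tails together carry probability mass alpha.
  tail_mass_gap <- function(l) {
    pchisq(l, df) + pchisq(upper_for(l), df, lower.tail = FALSE) - alpha
  }
  l <- uniroot(tail_mass_gap, lower = qchisq(1e-8, df), upper = 0.999 * mode,
               tol = 1e-10)$root
  c(lower = l, upper = upper_for(l))
}

# Inlier bounds for 49 degrees of freedom; values outside would be flagged
b <- chisq_inlier_bounds(df = 49, alpha = 0.1)
b
```

Because the χ² density is skewed, the two tails cut off by these bounds do not each carry alpha/2, which is why plain quantiles are not enough.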
α-outliers in two-way contingency tables
This is a wrapper function for aout.pois. We assume that each entry of a contingency table can be seen as a realization of a Poisson random variable. The parameter of each cell can either be set by the user or estimated. Given the parameters, aout.conttab identifies α-outliers in a given contingency table.
aout.conttab(data, param, alpha = 0.1, hide.outliers = FALSE, show.estimates = FALSE)
data |
a matrix or data.frame. The contingency table to be examined. |
param |
a character string from |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
show.estimates |
boolean. Returns |
Data frame of the vectorized input data and, if desired, an index named is.outlier that flags the outliers with TRUE and a vector named param containing the estimated lambdas.
A. Rehage
Kuhnt, S. (2000) Ausreisseridentifikation im Loglinearen Poissonmodell fuer Kontingenztafeln unter Einbeziehung robuster Schaetzer. Ph.D. Thesis. Universitaet Dortmund, Dortmund. Fachbereich Statistik.
Kuhnt, S.; Rapallo, F.; Rehage, A. (2014) Outlier detection in contingency tables based on minimal patterns. Statistics and Computing 24 (3), 481-491.
aout.conttab(data = HairEyeColor[,,1], param = "L1", alpha = 0.01, show.estimates = TRUE)
aout.conttab(data = HairEyeColor[,,1], param = "ML", alpha = 0.01, show.estimates = TRUE)
α-outliers in exponentially distributed data
Given the parameters of an exponential distribution, aout.exp identifies α-outliers in a given data set.
aout.exp(data, param, alpha = 0.1, hide.outliers = FALSE, theta = 0)
data |
a vector. The data set to be examined. |
param |
an atomic vector. Contains the parameter of the exponential distribution. |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
theta |
an atomic vector. Determines the lower bound of the support of the exponential distribution. Defaults to 0. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
aout.exp(attenu[,5], median(attenu[,5]), alpha = 0.05)
α-outliers in data from the family of g-and-h distributions
Given the parameters of a g-and-h distribution, aout.gandh identifies α-outliers in a given data set.
aout.gandh(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the g-and-h distribution. |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
The concept of α-outliers is based on the p.d.f. of the random variable. Since for g-and-h distributions this does not exist in closed form, the computation of the outlier region is based on an optimization of the quantile function with side conditions.
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
Makes use of solnp.
A. Rehage
Xu, Y.; Iglewicz, B.; Chervoneva, I. (2014) Robust estimation of the parameters of g-and-h distributions, with applications to outlier detection. Computational Statistics and Data Analysis 75, 66-80.
durations <- faithful$eruptions
aout.gandh(durations, c(4.25, 1.14, 0.05, 0.05), alpha = 0.1)
α-outliers in hypergeometric data
Given the parameters of a hypergeometric distribution, aout.hyper identifies α-outliers in a given data set.
aout.hyper(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the hypergeometric distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
set.seed(1)
lotto6aus49 <- rhyper(100, 6, 43, 6)
aout.hyper(lotto6aus49, c(6, 43, 6), 0.1)
α-outliers in arbitrary univariate data using kernel density estimation
Given the arguments of the density function, aout.kernel identifies α-outliers in a given data set.
aout.kernel(data, alpha, plot = TRUE, plottitle = "", kernel = "gaussian", nkernel = 1024, kern.bw = "SJ", kern.adj = 1, xlim = NA, ylim = NA, outints = FALSE, w = NA, ...)
data |
a vector. The data set to be examined. |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. |
plot |
boolean. If |
plottitle |
character string. Title of the plot. |
kernel |
See |
nkernel |
See |
kern.bw |
See |
kern.adj |
See |
xlim |
a vector. Specify if you want to change the x-limits of the plot. |
ylim |
a vector. Specify if you want to change the y-limits of the plot. |
outints |
boolean. If |
w |
a vector. See |
... |
Further arguments for |
If outints = TRUE, a list of
Results |
A data frame containing one row for each observation. The observations are labelled whether they are outlying, the value of the estimated density at the observation is shown and the bound of the outlier identifier. |
Bounds.of.Inlier.Regions |
The bounds of the inlier region(s). |
KDE.Chosen.Bandwidth |
The bandwidth that was chosen by |
A. Rehage
set.seed(23)
tempx <- rnorm(1000, 0, 1)
tempx[1] <- -2.5
aout.kernel(tempx[1:10], alpha = 0.1, kern.adj = 1, xlim = c(-3,3), outints = TRUE)
# not run:
# aout.kernel(tempx[1:200], alpha = 0.1, kern.adj = 1, xlim = c(-3,3))
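The idea of a kernel-based outlier region can be sketched with base R alone. The code below is an illustration under stated assumptions, not a description of how aout.kernel works internally: it estimates the density on a grid with density(), collects the lowest-density grid points until their approximate probability mass would exceed alpha, and flags observations whose estimated density does not exceed the resulting bound. The name kde_outlier_flags and its column names are hypothetical.

```r
# Hypothetical helper, not part of the package.
kde_outlier_flags <- function(x, alpha = 0.1, bw = "SJ", n = 1024) {
  d    <- density(x, bw = bw, n = n)      # kernel density estimate on a grid
  dx   <- d$x[2] - d$x[1]                 # grid spacing
  ord  <- order(d$y)                      # grid points from lowest to highest density
  mass <- cumsum(d$y[ord]) * dx           # approximate mass of the low-density set
  in_region <- mass <= alpha
  # Density bound of the outlier region (0 if no grid point qualifies)
  bound <- if (any(in_region)) max(d$y[ord][in_region]) else 0
  fhat  <- approx(d$x, d$y, xout = x)$y   # estimated density at each observation
  data.frame(data = x, density = fhat, is.outlier = fhat <= bound)
}

set.seed(1)
x   <- c(rnorm(200), 8)                   # 200 standard normal values plus one far point
res <- kde_outlier_flags(x, alpha = 0.1)
res[201, ]
```

Because the region is defined through density height rather than distance from a centre, it can consist of several disjoint intervals when the data are multimodal.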
α-outliers in Laplace / double exponential data
Given the parameters of a Laplace distribution, aout.laplace identifies α-outliers in a given data set.
aout.laplace(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the Laplace distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Dumonceaux, R.; Antle, C. E. (1973) Discrimination between the log-normal and the Weibull distributions. Technometrics, 15 (4), 923-926.
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
# Using the flood data from Dumonceaux and Antle (1973):
temp <- c(0.265, 0.269, 0.297, 0.315, 0.3225, 0.338, 0.379, 0.380, 0.392, 0.402, 0.412, 0.416, 0.418, 0.423, 0.449, 0.484, 0.494, 0.613, 0.654, 0.74)
aout.laplace(temp, c(median(temp), median(abs(temp - median(temp)))), 0.05)
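For a symmetric, unimodal density such as the Laplace density, the outlier region reduces to the two tails. The sketch below assumes the parameterization f(x) = exp(-|x - mu| / sigma) / (2 * sigma) with location mu and scale sigma; whether aout.laplace uses exactly this parameterization is not stated on this page, and laplace_outlier_flags is a hypothetical helper.

```r
# Hypothetical helper, not part of the package.
# Assumes the density f(x) = exp(-abs(x - mu) / sigma) / (2 * sigma).
laplace_outlier_flags <- function(x, mu, sigma, alpha = 0.1) {
  stopifnot(sigma > 0, alpha > 0, alpha < 1)
  cutoff <- -sigma * log(alpha)   # P(|X - mu| > cutoff) = alpha under this density
  data.frame(data = x, is.outlier = abs(x - mu) > cutoff)
}

res <- laplace_outlier_flags(c(0.1, 0.4, 0.74), mu = 0.4, sigma = 0.05, alpha = 0.05)
res
```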
α-outliers in logistic data
Given the parameters of a logistic distribution, aout.logis identifies α-outliers in a given data set.
aout.logis(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the logistic distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Balakrishnan, N. (1992) Maximum likelihood estimation based on complete and type II censored samples. In N. Balakrishnan (Ed.): Handbook of the Logistic Distribution. Dekker, New York, 49-78.
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
# Data example from Balakrishnan (1992)
lifetime <- c(785, 855, 905, 918, 919, 920, 929, 936, 948, 950)
aout.logis(lifetime, c(949.9, 63.44))
α-outliers in multivariate normal data
Given the parameters of a multivariate normal distribution, aout.mvnorm identifies α-outliers in a given data set.
aout.mvnorm(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a data.frame or matrix. The data set to be examined. |
param |
a list. Contains the parameters of the normal distribution: the mean vector and the covariance matrix, in that order (as in the example below). |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a data frame of the outlier-free data.
A. Rehage
Kuhnt, S.; Rehage, A. (2013) The concept of α-outliers in structured data situations. In C. Becker, R. Fried, S. Kuhnt (Eds.): Robustness and Complex Data Structures. Festschrift in Honour of Ursula Gather. Berlin: Springer, 91-108.
temp <- iris[1:51,-5]
temp.xq <- apply(FUN = median, MARGIN = 2, temp)
aout.mvnorm(as.matrix(temp), param = list(temp.xq, cov(temp)), alpha = 0.001)
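For a multivariate normal distribution, the density is a decreasing function of the squared Mahalanobis distance, which follows a χ² distribution with p degrees of freedom. A point therefore lies in the outlier region when that distance exceeds the (1 - alpha) quantile. The following base R sketch shows this rule; mvnorm_outlier_flags is a hypothetical helper and is not claimed to reproduce the internals of aout.mvnorm.

```r
# Hypothetical helper, not part of the package.
mvnorm_outlier_flags <- function(x, mu, Sigma, alpha = 0.1) {
  x      <- as.matrix(x)
  d2     <- mahalanobis(x, center = mu, cov = Sigma)  # squared Mahalanobis distances
  cutoff <- qchisq(1 - alpha, df = ncol(x))           # chi-squared quantile, p = ncol(x)
  data.frame(x, is.outlier = d2 > cutoff)
}

pts   <- rbind(c(0, 0), c(1, -1), c(5, 5))
flags <- mvnorm_outlier_flags(pts, mu = c(0, 0), Sigma = diag(2), alpha = 0.01)
flags
```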
α-outliers in negative Binomial data
Given the parameters of a negative Binomial distribution, aout.nbinom identifies α-outliers in a given data set.
aout.nbinom(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the negative Binomial distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
data(daysabs)
aout.nbinom(daysabs, c(8, 0.6), 0.05)
α-outliers in normal data
Given the parameters of a normal distribution, aout.norm identifies α-outliers in a given data set.
aout.norm(data, param = c(0, 1), alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the normal distribution: the mean and the standard deviation, in that order. Defaults to c(0, 1). |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
iris.setosa <- iris[1:51, 4]
# implosion breakdown point:
aout.norm(data = iris.setosa, param = c(median(iris.setosa), mad(iris.setosa)), alpha = 0.01)
# better:
aout.norm(data = iris.setosa, param = c(median(iris.setosa), sd(iris.setosa)), alpha = 0.01)
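For the normal distribution the α-outlier region has a closed form: all points further than sigma times the (1 - alpha/2) standard normal quantile from the mean. The base R sketch below spells this out. It is an illustration of the concept; norm_outlier_flags is a hypothetical helper and its output is only assumed, not verified, to agree with aout.norm.

```r
# Hypothetical helper, not part of the package.
norm_outlier_flags <- function(x, mu = 0, sigma = 1, alpha = 0.1) {
  stopifnot(sigma > 0, alpha > 0, alpha < 1)
  cutoff <- sigma * qnorm(1 - alpha / 2)   # half-width of the inlier interval
  data.frame(data = x, is.outlier = abs(x - mu) > cutoff)
}

# Same data as in the example above: 50 setosa petal widths plus one versicolor value
iris.setosa <- iris[1:51, 4]
res <- norm_outlier_flags(iris.setosa, mu = median(iris.setosa),
                          sigma = sd(iris.setosa), alpha = 0.01)
res[res$is.outlier, ]
```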
α-outliers in Pareto data
Given the parameters of a Pareto distribution, aout.pareto identifies α-outliers in a given data set.
aout.pareto(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the Pareto distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
We use the Pareto distribution with Lebesgue-density .
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Gather, U.; Kuhnt, S.; Pawlitschko, J. (2003) Concepts of outlyingness for various data structures. In J. C. Misra (Ed.): Industrial Mathematics and Statistics. New Delhi: Narosa Publishing House, 545-585.
data(citiesData)
aout.pareto(citiesData[[1]], c(1.31, 14815), alpha = 0.01)
α-outliers in Poisson count data
Given the parameters of a Poisson distribution, aout.pois identifies α-outliers in a given data set.
aout.pois(data, param, alpha = 0.1, hide.outliers = FALSE)
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameter of the Poisson distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
aout.pois(data = c(discoveries), param = median(discoveries), alpha = 0.01)
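For a discrete distribution the outlier region collects the least probable support points as long as their total probability does not exceed alpha. The sketch below applies this definition to the Poisson distribution with base R only. It is a minimal illustration: pois_outlier_flags is a hypothetical helper, the infinite support is truncated at a far upper quantile, and agreement with aout.pois is assumed rather than verified.

```r
# Hypothetical helper, not part of the package.
pois_outlier_flags <- function(x, lambda, alpha = 0.1) {
  stopifnot(lambda > 0, alpha > 0, alpha < 1)
  upper   <- qpois(1e-12, lambda, lower.tail = FALSE)  # truncate the infinite support
  support <- 0:upper
  prob    <- dpois(support, lambda)
  # For each support point: total mass of all points that are at most as likely
  mass_at_or_below <- vapply(prob, function(p) sum(prob[prob <= p]), numeric(1))
  region <- support[mass_at_or_below <= alpha]
  # Values beyond the truncation point are treated as outlying as well
  data.frame(data = x, is.outlier = x %in% region | x > upper)
}

res <- pois_outlier_flags(c(0, 2, 5, 6, 10), lambda = 2, alpha = 0.05)
res
```

With lambda = 2 and alpha = 0.05, the counts 6 and above carry about 0.017 of the probability mass, while adding the count 5 would push the total above 0.05, so only values of 6 or more are flagged.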
α-outliers in Weibull data
Given the parameters of a Weibull distribution, aout.weibull identifies α-outliers in a given data set.
aout.weibull(data, param, alpha = 0.1, hide.outliers = FALSE, lower = auto.l, upper = auto.u, method.in = "Broyden", global.in = "qline", control.in = list(sigma = 0.1, maxit = 1000, xtol = 1e-12, ftol = 1e-12, btol = 1e-04))
data |
a vector. The data set to be examined. |
param |
a vector. Contains the parameters of the Weibull distribution: |
alpha |
an atomic vector. Determines the maximum amount of probability mass the outlier region may contain. Defaults to 0.1. |
hide.outliers |
boolean. Returns the outlier-free data if set to TRUE. Defaults to FALSE. |
lower |
an atomic vector. First element of |
upper |
an atomic vector. Second element of |
method.in |
See |
global.in |
See |
control.in |
See |
The α-outlier region of a Weibull distribution is generally not available in closed form or via the tails, such that a non-linear equation system has to be solved.
Data frame of the input data and an index named is.outlier that flags the outliers with TRUE. If hide.outliers is set to TRUE, a simple vector of the outlier-free data.
A. Rehage
Dodson, B. (2006) The Weibull Analysis Handbook. American Society for Quality, 2nd edition.
# lifetime data example taken from Table 2.2, Dodson (2006)
temp <- c(12.5, 24.4, 58.2, 68.0, 69.1, 95.5, 96.6, 97.0, 114.2, 123.2, 125.6, 152.7)
aout.weibull(temp, c(2.25, 97), 0.1)
Population of the 999 largest German cities as a real life example for Pareto distributed data
data(citiesData)
List with one element
http://bevoelkerungsstatistik.de
This function creates a design matrix for contingency tables and is particularly useful for log-linear Poisson models. It uses effect coding of the variables: First the rows of the contingency table from top to bottom, then the columns from left to right.
createDesMat(n, p)
n |
Number of rows of the corresponding contingency table. |
p |
Number of columns of the corresponding contingency table. |
A (n+p-1) times (n*p) design matrix.
A. Rehage
Kuhnt, S.; Rapallo, F.; Rehage, A. (2014) Outlier detection in contingency tables based on minimal patterns. Statistics and Computing 24 (3), 481-491.
createDesMat(3, 5)
Number of absence days of students
data(daysabs)
Vector with 314 elements
http://www.ats.ucla.edu/stat/r/dae/nbreg.htm