Title: | M-Estimation of Shape for Data with Missing Values |
---|---|
Description: | M-estimators of location and shape following the power family (Frahm, Nordhausen, Oja (2020) <doi:10.1016/j.jmva.2019.104569>) are provided for complete data and for data with missing values, together with functions aiding their visualization. |
Authors: | Katharina Riemer [cre, aut], Gabriel Frahm [aut] , Klaus Nordhausen [aut] , Una Radojicic [aut] |
Maintainer: | Katharina Riemer <[email protected]> |
License: | GPL-3 |
Version: | 0.0.2 |
Built: | 2024-11-14 06:45:26 UTC |
Source: | CRAN |
Visualize the proportion of missingness per variable in a barplot.
## S3 method for class 'shapeNA'
barplot(height, sortNA = FALSE, ...)
height |
A |
sortNA |
A logical. If |
... |
Additional graphical arguments passed to
|
Invisibly returns a named vector holding the proportion of missingness per variable.
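The returned quantity can be illustrated with base R alone. The following is a minimal sketch, not the package's code: `propMissing` is a hypothetical helper, and sorting in increasing order for `sortNA = TRUE` is an assumption, since the argument description above is cut off.

```r
## Hypothetical helper (not part of the package): proportion of missing
## values per variable, optionally sorted in increasing order (assumed).
propMissing <- function(y, sortNA = FALSE) {
  p <- colMeans(is.na(y))
  if (is.null(names(p))) names(p) <- paste0("V", seq_along(p))
  if (sortNA) p <- sort(p)
  p
}

y <- cbind(a = c(1, NA, 3, 4), b = c(NA, NA, 1, 2), c = c(1, 2, 3, 4))
props <- propMissing(y)
props
```

Passing such a named vector to the base `barplot()` function draws one bar per variable.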
S <- toeplitz(seq(1, 0.1, length.out = 3))
x <- mvtnorm::rmvt(100, S, df = 5)
y <- mice::ampute(x, mech = 'MCAR')$amp
res <- classicShapeNA(y)
barplot(res)
Reorder a data set with NA entries to form blocks of missing values. The
resulting data will have increasing missingness along the rows and along the
columns. The rows are ordered such that the first block consists of complete
observations, and the following blocks are ordered from most frequent
missingness pattern to least frequent missingness pattern.
naBlocks(x, cleanup = TRUE, plot = FALSE)
x |
A matrix with missing values. |
cleanup |
A logical flag. If |
plot |
A logical flag. If |
In case of ties, that is, if two patterns occur with the same frequency, the block whose pattern occurs first is ordered in front of the other block.
This method may fail if the missingness is too strong or if the number of observations is too low (the number of observations has to exceed the number of variables), as it has been designed as a preprocessing step for shape estimations.
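The row ordering described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `orderByPattern` is a hypothetical helper, it reorders rows only (the column reordering is omitted), and it performs none of the checks or cleanup that `naBlocks` may apply.

```r
## Hypothetical helper: complete rows first, then missingness patterns from
## most to least frequent; ties are broken by first occurrence in the data.
orderByPattern <- function(x) {
  obs <- !is.na(x)
  ## encode each row's pattern as a string, 1 = observed, 0 = missing
  key <- apply(obs, 1, function(r) paste(as.integer(r), collapse = ""))
  complete <- strrep("1", ncol(x))
  freq <- table(key)
  firstSeen <- match(names(freq), key)
  ## rank the patterns: complete pattern, decreasing frequency, first seen
  rnk <- order(names(freq) != complete, -as.integer(freq), firstSeen)
  patterns <- names(freq)[rnk]
  ## order() is stable, so rows keep their original order within a block
  rowPerm <- order(match(key, patterns))
  list(x = x[rowPerm, , drop = FALSE],
       rowPermutation = rowPerm,
       patterns = patterns)
}

x <- rbind(c(1, NA, 3), c(1, 2, 3), c(NA, 2, 3), c(1, NA, 3), c(4, 5, 6))
res <- orderByPattern(x)
res$rowPermutation
res$patterns
```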
A list of class naBlocks with components:
x |
The reordered data matrix. |
permutation |
The permutation of the columns that was applied to reorder the columns according to the number of |
rowPermutation |
The permutation of the rows that generates the blocks. |
N |
A vector of all row indices. Each row number points to the beginning of a new missingness pattern. |
D |
A vector specifying the missingness pattern for each block. |
P |
A vector specifying the number of observed variables per block. |
kn |
A vector specifying the percentage of observed responses per variable. |
Function to visualize the missingness patterns for objects of class naBlocks.
## S3 method for class 'naBlocks'
plot(x, ...)
x |
A |
... |
Additional parameters passed on to |
No return value.
x <- mvtnorm::rmvt(100, toeplitz(seq(1, 0.1, length.out = 3)), df = 5)
y <- mice::ampute(x, mech = 'MCAR')$amp
res <- classicShapeNA(y)
plot(res$naBlocks)
Function to visualize the shape matrix from objects of class shapeNA by
plotting a heatmap where light colored cells indicate small values and dark
colored cells indicate high values.
## S3 method for class 'shapeNA'
plot(x, message = TRUE, ...)
x |
A |
message |
A logical. If |
... |
Additional parameters passed to |
A matrix with the proportion of observed values for each variable.
x <- mvtnorm::rmvt(100, toeplitz(seq(1, 0.1, length.out = 3)), df = 5)
y <- mice::ampute(x, mech = 'MCAR')$amp
res <- tylerShapeNA(y)
## default plot
plot(res)
## plot result in gray scale - reverse order to get a palette starting
## with the lightest instead of the darkest color
plot(res, col = gray.colors(9, rev = TRUE))
Power M-estimators of shape and location were recently suggested in
Frahm et al. (2020). They have a tuning parameter alpha taking values in
[0,1]. The extreme case alpha = 1 corresponds to Tyler's shape matrix and
alpha = 0 to the classical covariance matrix. These special cases have their
own, more efficient functions tylerShape and classicShape, respectively.
If the true location is known, it should be supplied as center; otherwise
it is estimated simultaneously with the shape.
powerShape(x, alpha, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
tylerShape(x, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
classicShape(x, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
x |
A numeric data matrix or data.frame without missing data. |
alpha |
Tail index, a numeric value in the interval |
center |
An optional vector of the data's center. If |
normalization |
A string determining how the shape matrix is standardized. The possible values are
|
maxiter |
A positive integer, restricting the maximum number of iterations. |
eps |
A numeric value specifying the tolerance level at which the iteration stops. |
These functions assume that the data were generated from an elliptical distribution; for Tyler's estimate this can be relaxed to generalized elliptical distributions.
For multivariate normally distributed data, classicShape is the maximum
likelihood estimator of location and scale. It is a special case of the
power M-estimator with tail index alpha = 0, which returns the empirical
covariance matrix and the empirical mean vector.
The function tylerShape maximizes the likelihood function after projecting
the observed data of each individual onto the unit hypersphere, in which case
we obtain an angular central Gaussian distribution. It is a special case of
the power M-estimator with tail index alpha = 1, which returns Tyler's
M-estimator of scatter and an affine equivariant multivariate median
according to Hettmansperger and Randles (2002).
The function powerShape requires an additional parameter, the so-called
tail index alpha. For heavy tailed data, the index should be chosen closer
to 1, whereas for light tailed data the index should be chosen closer to 0.
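For intuition about the alpha = 1 case, the textbook fixed-point iteration for Tyler's shape matrix can be sketched in base R. This is a sketch under stated assumptions, not the package's implementation: `tylerSketch` is a hypothetical helper, the center is treated as known (zero by default), the result is normalized to determinant 1, and no observation may coincide with the center.

```r
## Hypothetical helper: fixed-point iteration for Tyler's shape matrix
## with a known center, normalized to determinant 1.
tylerSketch <- function(x, center = rep(0, ncol(x)), maxiter = 1e4, eps = 1e-6) {
  z <- sweep(as.matrix(x), 2, center)   # subtract the center from each row
  n <- nrow(z)
  p <- ncol(z)
  S <- diag(p)
  for (iter in seq_len(maxiter)) {
    ## quadratic forms z_i' S^{-1} z_i for every observation
    d <- rowSums((z %*% solve(S)) * z)
    ## weighted sum of outer products: (p / n) * sum_i z_i z_i' / d_i
    Snew <- (p / n) * crossprod(z / sqrt(d))
    Snew <- Snew / det(Snew)^(1 / p)    # fix the scale: determinant 1
    converged <- norm(Snew - S, "F") < eps
    S <- Snew
    if (converged) break
  }
  list(S = S, iterations = iter)
}

set.seed(1)
x <- matrix(rnorm(400), ncol = 2) %*% chol(toeplitz(c(1, 0.5)))
res <- tylerSketch(x)
res$S
```

Because each observation enters only through its direction from the center, rescaling the observations leaves the estimate unchanged, which is why no natural scale is attached to Tyler's shape matrix.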
A list with class 'shapeNA' containing the following components:
S |
The estimated shape matrix. |
scale |
The scale with which the shape matrix may be scaled to obtain
a scatter estimate. If |
mu |
The location parameter, either provided by the user or estimated. |
alpha |
The tail index with which the Power M-estimator has been called. |
naBlocks |
|
iterations |
Number of computed iterations before convergence. |
call |
The matched call. |
Tyler, D. E. (1987). A Distribution-Free M-Estimator of Multivariate Scatter. The Annals of Statistics, 15, 234-251. doi:10.1214/aos/1176350263.
Frahm, G., Nordhausen, K., & Oja, H. (2020). M-estimation with incomplete and dependent multivariate data. Journal of Multivariate Analysis, 176, 104569. doi:10.1016/j.jmva.2019.104569.
Hettmansperger, T. P., & Randles, R. H. (2002). A practical affine equivariant multivariate median. Biometrika, 89(4), 851-860. doi:10.1093/biomet/89.4.851.
powerShapeNA, tylerShapeNA and classicShapeNA for the corresponding functions for data with missing values.
## Generate example data
S <- toeplitz(c(1, 0.1))
x <- mvtnorm::rmvt(100, S)
## Compute some M-estimators
res0 <- classicShape(x, center = c(0, 0))
res1 <- powerShape(x, alpha = 0.67, normalization = 'one')
res2 <- tylerShape(x, normalization = 'trace')
## Get location estimates
res1$mu
res2$mu
## Get shape estimates
res0$S
res1$S
res2$S
## Print summary
summary(res0)
Power M-estimators of shape and location were recently suggested in
Frahm et al. (2020). They have a tuning parameter alpha taking values in
[0,1]. The extreme case alpha = 1 corresponds to Tyler's shape matrix and
alpha = 0 to the classical covariance matrix. These special cases have their
own, more efficient functions tylerShapeNA and classicShapeNA, respectively.
If the true location is known, it should be supplied as center; otherwise
it is estimated simultaneously with the shape.
powerShapeNA(x, alpha, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
tylerShapeNA(x, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
classicShapeNA(x, center = NULL, normalization = c("det", "trace", "one"), maxiter = 1e4, eps = 1e-6)
x |
A data matrix or data.frame with missing data and |
alpha |
Tail index, a numeric value in the interval |
center |
An optional vector of the data's center, if |
normalization |
A string determining how the shape matrix is standardized. The possible values are
|
maxiter |
A positive integer, restricting the maximum number of iterations. |
eps |
A numeric value specifying the tolerance level at which the iteration stops. |
These functions assume that the data were generated from an elliptical distribution; for Tyler's estimate this can be relaxed to generalized elliptical distributions. The missingness mechanism should be MCAR or, under stricter distributional assumptions, MAR. See the references for details.
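To make the MCAR assumption concrete, the sketch below deletes entries completely at random using base R only. `makeMCAR` is a hypothetical helper, and this scheme (each entry removed independently with the same probability) is just one simple way to produce MCAR data; it is not necessarily how `mice::ampute` generates the missingness used in the examples.

```r
## Hypothetical helper: remove each entry independently with probability
## prob, regardless of any observed or unobserved value (MCAR).
makeMCAR <- function(x, prob = 0.1) {
  x <- as.matrix(x)
  x[matrix(runif(length(x)) < prob, nrow = nrow(x))] <- NA
  x
}

set.seed(42)
x <- matrix(rnorm(300), ncol = 3)
y <- makeMCAR(x, prob = 0.2)
mean(is.na(y))       # overall proportion of missing entries
colMeans(is.na(y))   # proportion of missing entries per variable
```

Note that this scheme can produce rows with no observed values at all, which may need to be removed before estimation.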
For multivariate normally distributed data, classicShapeNA is the maximum
likelihood estimator of the location and scale. It is a special case of the
power M-estimator with tail index alpha = 0, which returns the
empirical covariance matrix and the empirical mean vector.
The function tylerShapeNA maximizes the likelihood function after projecting
the observed data of each individual onto the unit hypersphere, in which case
we obtain an angular central Gaussian distribution. It is a special case of
the power M-estimator with tail index alpha = 1, which returns Tyler's
M-estimator of scatter and an affine equivariant multivariate median
according to Hettmansperger and Randles (2002).
The function powerShapeNA requires an additional parameter, the so-called
tail index alpha. For heavy tailed data, the index should be chosen closer
to 1, whereas for light tailed data the index should be chosen closer to 0.
A list with class 'shapeNA' containing the following components:
S |
The estimated shape matrix. |
scale |
The scale with which the shape matrix may be scaled to obtain a scatter estimate. If alpha = 1, then this value will be NA, as Tyler's shape matrix has no natural scale. |
mu |
The location parameter, either provided by the user or estimated. |
alpha |
The tail index with which the Power M-estimator has been called. |
naBlocks |
An naBlocks object, with information about the missingness of the data. |
iterations |
Number of computed iterations before convergence. |
call |
The matched call. |
Frahm, G., & Jaekel, U. (2010). A generalization of Tyler's M-estimators to the case of incomplete data. Computational Statistics & Data Analysis, 54, 374-393. doi:10.1016/j.csda.2009.08.019.
Frahm, G., Nordhausen, K., & Oja, H. (2020). M-estimation with incomplete and dependent multivariate data. Journal of Multivariate Analysis, 176, 104569. doi:10.1016/j.jmva.2019.104569.
Hettmansperger, T. P., & Randles, R. H. (2002). A practical affine equivariant multivariate median. Biometrika, 89(4), 851-860. doi:10.1093/biomet/89.4.851.
powerShape, tylerShape and classicShape for the corresponding functions for data without missing values.
## Generate a data set with missing values
x <- mvtnorm::rmvt(100, toeplitz(seq(1, 0.1, length.out = 3)), df = 5)
y <- mice::ampute(x, mech = 'MCAR')$amp
## Compute some M-estimators
res0 <- classicShapeNA(y, center = c(0, 0, 0))
res1 <- powerShapeNA(y, alpha = 0.67, normalization = 'one')
res2 <- tylerShapeNA(y, normalization = 'trace')
## Get location estimates
res1$mu
res2$mu
## Get shape estimates
res0$S
res1$S
res2$S
## Print summary
summary(res0)
## Inspect missingness pattern
plot(res0$naBlocks)
barplot(res0)
Print the pattern of missingness in the supplied data, as a block matrix. Observed data are represented by 1, missing values by 0.
## S3 method for class 'naBlocks'
print(x, ...)
x |
An |
... |
Additional parameters passed to |
The first row shows the column names. The leftmost column, without column
name, shows the number of rows per block and the rightmost column with name
# shows the number of observed variables in the block.
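A table with this layout can be built by hand in base R. The sketch below is an illustration, not the package's print method: `patternTable` is a hypothetical helper, and it lists the patterns in order of first appearance rather than in the block order produced by `naBlocks`.

```r
## Hypothetical helper: one row per distinct missingness pattern
## (1 = observed, 0 = missing). Row names hold the number of rows per
## pattern, and the column "#" holds the number of observed variables.
patternTable <- function(x) {
  obs <- !is.na(x)
  storage.mode(obs) <- "integer"   # TRUE/FALSE to 1/0, keeping dimnames
  if (is.null(colnames(obs))) {
    colnames(obs) <- paste0("V", seq_len(ncol(obs)))
  }
  key <- apply(obs, 1, paste, collapse = "")
  first <- !duplicated(key)
  pat <- obs[first, , drop = FALSE]
  counts <- as.integer(table(key)[key[first]])
  out <- cbind(pat, "#" = rowSums(pat))
  rownames(out) <- counts
  out
}

x <- cbind(a = c(1, NA, 3, NA), b = c(1, 2, NA, 2), c = c(1, 2, 3, 4))
tab <- patternTable(x)
tab
```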
A named matrix representing the missingness pattern of the data.
x <- mvtnorm::rmvt(100, toeplitz(seq(1, 0.1, length.out = 3)), df = 5)
y <- mice::ampute(x, mech = 'MCAR')$amp
res <- classicShapeNA(y)
print(res$naBlocks)
Prints the chosen value of alpha as well as the estimated shape and
location for objects of class shapeNA.
## S3 method for class 'shapeNA'
print(x, ...)
x |
A |
... |
Additional parameters passed to lower level |
No return value.
x <- mvtnorm::rmvt(100, toeplitz(seq(1, 0.1, length.out = 3)), df = 5)
res <- tylerShape(x)
res ## equivalent to call print(res)
Print Method for Class summary.shapeNA
## S3 method for class 'summary.shapeNA'
print(x, ...)
x |
Object returned from |
... |
Further arguments to be passed to or from methods. |
No return value.
obj <- tylerShape(mvtnorm::rmvt(100, diag(3)))
print(summary(obj))
For Power M-estimates with tail index alpha < 1, the resulting estimate
has a scale. For these shape estimates, scatter matrices can be computed.
Results from tylerShape and tylerShapeNA give no scatter estimates. In these
cases the function returns NA.
shape2scatter(obj)
obj |
|
Scatter matrix estimate, or only NA if alpha = 1.
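The relation between a shape matrix, its scale, and the scatter matrix can be illustrated with the empirical covariance matrix in base R. This is a sketch under the assumption of determinant normalization (shape matrix with determinant 1); how the package defines the scale for the other normalizations is not shown here.

```r
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

Sigma <- cov(x)            # a scatter matrix
p <- ncol(Sigma)
sc <- det(Sigma)^(1 / p)   # scale under determinant normalization (assumed)
S <- Sigma / sc            # shape matrix with determinant 1

det(S)                     # equal to 1 up to rounding
all.equal(sc * S, Sigma)   # rescaling the shape recovers the scatter
```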
S <- toeplitz(c(1, 0.3, 0.7))
set.seed(123)
x <- mvtnorm::rmvt(100, S, df = 3)
obj_det <- powerShape(x, alpha = 0.85, normalization = 'det')
shape2scatter(obj_det)
obj_tr <- powerShape(x, alpha = 0.85, normalization = 'trace')
shape2scatter(obj_tr)
obj_one <- powerShape(x, alpha = 0.85, normalization = 'one')
shape2scatter(obj_one)
Summary method for objects of class shapeNA.
## S3 method for class 'shapeNA'
summary(object, ...)
object |
An object of class |
... |
Further arguments to be passed to or from methods. |
A summary.shapeNA object. For objects of this class, the print
method tries to format the location and shape estimate in a readable format
and also shows the number of iterations before the algorithm converged.
obj <- tylerShape(mvtnorm::rmvt(100, diag(3)))
summary(obj)