Title: | Fast Computation of Multivariate M-Estimators |
---|---|
Description: | Implements the new algorithm for fast computation of M-scatter matrices using a partial Newton-Raphson procedure for several estimators. The algorithm is described in Duembgen, Nordhausen and Schuhmacher (2016) <doi:10.1016/j.jmva.2015.11.009>. |
Authors: | Lutz Duembgen, Klaus Nordhausen, Heike Schuhmacher |
Maintainer: | Klaus Nordhausen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0-4 |
Built: | 2024-11-12 06:31:39 UTC |
Source: | CRAN |
Implements the new algorithm for fast computation of M-scatter matrices using a partial Newton-Raphson procedure for several estimators. The algorithm is described in Duembgen, Nordhausen and Schuhmacher (2016) <doi:10.1016/j.jmva.2015.11.009>.
Multivariate M-estimators are usually computed using a fixed-point algorithm. As shown in Duembgen et al. (2016) a partial Newton-Raphson procedure applied to the second order Taylor expansion of the target function can make the computation considerably faster. We implement this new algorithm for the multivariate M-estimator of location and scatter using weights coming from the multivariate t-distribution (Kent et al., 1994), its symmetrized version, Tyler's shape matrix (Tyler, 1987) and Duembgen's shape matrix (Duembgen, 1998). For the symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially.
Lutz Duembgen, Klaus Nordhausen, Heike Schuhmacher
Maintainer: Klaus Nordhausen <[email protected]>
Duembgen, L. (1998), On Tyler's M-functional of scatter in high dimension, Annals of Institute of Statistical Mathematics, 50, 471–491.
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
Kent, J.T., Tyler, D.E. and Vardi, Y. (1994), A curious likelihood identity for the multivariate t-distribution, Communications in Statistics, Theory and Methods, 23, 441–453.
Tyler, D.E. (1987), A distribution-free M-estimator of scatter, Annals of Statistics, 15, 234–251.
Iterative algorithm to estimate Duembgen's shape matrix using a partial Newton-Raphson approach.
DUEMBGENshape(X, nmax = 500, eps = 1e-06, maxiter = 100, perm = FALSE)
DUEMBGENshape(X, nmax = 500, eps = 1e-06, maxiter = 100, perm = FALSE)
X |
numeric data matrix or dataframe. Missing values are not allowed. |
nmax |
integer, if the sample size n (number of rows of |
eps |
convergence tolerance, which means that the algorithm stops when the Frobenius norm of the gradient is smaller than eps. |
maxiter |
maximum number of iterations. |
perm |
logical. If TRUE the rows of |
The estimate is based on the new fast algorithm described in Duembgen et al. (2016). Note that Duembgen's shape matrix is standardized such that it has determinant 1.
The function does not check if there are several identical observations. In that case the function will fail.
To get a good initial value for the algorithm, the estimator is first computed based on the pairwise differences of
successive observations. Therefore the order of the rows of X
is supposed to be random. If this is not the case, the data
should be first permuted using the argument perm
.
In case maxiter
is reached before convergence, the estimate at that iteration is returned and a warning is given.
A list containing:
Sigma |
Estimated shape matrix. |
iter |
Number of iterations of the algorithm. |
Lutz Duembgen and Klaus Nordhausen
Duembgen, L. (1998), On Tyler's M-functional of scatter in high dimension, Annals of Institute of Statistical Mathematics, 50, 471–491.
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
DUEMBGENshape(longley) DUEMBGENshape(longley, nmax=10) # compare to # library(ICSNP) # duembgen.shape(longley)
DUEMBGENshape(longley) DUEMBGENshape(longley, nmax=10) # compare to # library(ICSNP) # duembgen.shape(longley)
The algorithm of this function is based on a partial Newton approach and should be faster than the traditional fixed-point algorithm. If the data follows a multivariate t-distribution with the correctly specified degrees of freedom this function gives the maximum likelihood estimate of location and scatter.
MVTMLE(X, nu = 1, location = TRUE, eps = 1e-06, maxiter = 100)
MVTMLE(X, nu = 1, location = TRUE, eps = 1e-06, maxiter = 100)
X |
numeric data matrix or dataframe. Missing values are not allowed. |
nu |
assumed degrees of freedom of the t-distribution. Default is '1' which corresponds to the Cauchy distribution. |
location |
logical or numeric. If FALSE, it is assumed that the scatter should be computed wrt to the origin. If TRUE the location will be estimated and if it is a numeric vector it will be computed wrt to this vector. |
eps |
convergence tolerance, which means that the algorithm stops when the Frobenius norm of the gradient is smaller than eps. |
maxiter |
maximum number of iterations. |
The assumed degree of freedom nu must be at least 1 when the location and scatter should be estimated. If only the scatter is to be estimated, then it needs to be larger than zero only.
In case maxiter
is reached before convergence, the estimate at that iteration is returned and a warning is given.
A list containing:
mu |
Estimated location if |
Sigma |
Estimated scatter matrix. |
iter |
Number of iterations of the algorithm. |
Lutz Duembgen and Klaus Nordhausen
Kent, J.T., Tyler, D.E. and Vardi, Y. (1994), A curious likelihood identity for the multivariate t-distribution, Communications in Statistics, Theory and Methods, 23, 441–453.
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
MVTMLE(longley) # compare to # library(ICS) # tM(longley) # library(MASS) # cov.trob(longley, nu=1, tol = 1e-06, maxit = 100)
MVTMLE(longley) # compare to # library(ICS) # tM(longley) # library(MASS) # cov.trob(longley, nu=1, tol = 1e-06, maxit = 100)
The functions below are only for comparison purposes and are all written in R. Each function corresponds to a different algorithm for the scatter only problem for M-estimation using weights coming from the multivariate t-distribution.
MVTMLE0r(X, nu = 0, delta = 1e-06, prewhitened = FALSE, steps = FALSE) MVTMLE0r_FP(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_FP0(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_G(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_CG(X, nu = 0, delta = 1e-06, steps = FALSE)
MVTMLE0r(X, nu = 0, delta = 1e-06, prewhitened = FALSE, steps = FALSE) MVTMLE0r_FP(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_FP0(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_G(X, nu = 0, delta = 1e-06, steps = FALSE) MVTMLE0r_CG(X, nu = 0, delta = 1e-06, steps = FALSE)
X |
numeric data matrix or dataframe. Missing values are not allowed. |
nu |
assumed degrees of freedom of the t-distribution. Must be 0 or larger. Default is '0' which corresponds to Tyler's shape matrix. |
delta |
convergence tolerance, which means that the algorithms stop when the Frobenius norm of the gradient is smaller than delta. |
prewhitened |
logical. Is the data prewhitened or not. |
steps |
logial. If TRUE intermediate results are printed on the console. |
All functions are implemented in R and their purpose is only for demonstration of the differences of the different algorithms.
The function MVTMLE0r
uses the recommended partial Newton approach as implemented also in (MVTMLE
and TYLERshape
).
MVTMLE0r_FP
and MVTMLE0r_FP0
are fixed-point algorithms where MVTMLE0r_FP
iterates the fixed point equation with
'iterative whitening' of the data. The function MVTMLE0r_G
uses a gradient approach and MVTMLE0r_CG
a conjugate gradient approach.
Note that MVTMLE0r_CG
does not check if the 'next' step is really an improvement and that all functions compute the scatter wrt to the origin.
All functions have a hard coded maximum number of iterations of 1000. If that is reached the functions returns the final estimate, however without a warning.
For general purposes we recommend the functions MVTMLE
and TYLERshape
.
A list containing at least:
S |
Estimated scatter matrix (or shape matrix if |
iter |
Number of iterations of the algorithm. |
Lutz Duembgen and Klaus Nordhausen
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
MVTMLE0r(longley,nu=1) MVTMLE0r_FP(longley,nu=1) MVTMLE0r_FP0(longley,nu=1) MVTMLE0r_G(longley,nu=1) MVTMLE0r_CG(longley,nu=1)
MVTMLE0r(longley,nu=1) MVTMLE0r_FP(longley,nu=1) MVTMLE0r_FP0(longley,nu=1) MVTMLE0r_G(longley,nu=1) MVTMLE0r_CG(longley,nu=1)
Based on a partial Newton-Raphson approach offers this function two ways to compute the symmetrized M-estimator of scatter. The user can choose if all pairwise differences are choosen and stored in the memory or if the computation and storage of this large matrix is to be avoided.
MVTMLEsymm(X, nu = 1, nmax = 500, eps = 1e-06, maxiter = 100, perm = FALSE)
MVTMLEsymm(X, nu = 1, nmax = 500, eps = 1e-06, maxiter = 100, perm = FALSE)
X |
numeric data matrix or dataframe with more rows than columns. Missing values are not allowed. |
nu |
assumed degrees of freedom of the t-distribution, must be larger than 0. Default is '1'. |
nmax |
integer, if the sample size n (number of rows of |
eps |
convergence tolerance, which means that the algorithm stops when the Frobenius norm of the gradient is smaller than eps. |
maxiter |
maximum number of iterations. |
perm |
logical. If TRUE the rows of |
To get a good initial value for the algorithm, the estimator is first computed based on the pairwise differences of
successive observations. Therefore the order of the rows of X
is supposed to be random. If this is not the case, the data
should be first permuted using the argument perm
.
In case maxiter
is reached before convergence, the estimate at that iteration is returned and a warning is given.
A list containing:
Sigma |
Estimated scatter matrix. |
iter |
Number of iterations of the algorithm. |
Lutz Duembgen and Klaus Nordhausen
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
MVTMLEsymm(longley) MVTMLEsymm(longley, nmax=10)
MVTMLEsymm(longley) MVTMLEsymm(longley, nmax=10)
Iterative algorithm to estimate Tyler's shape matrix using a partial Newton-Raphson approach.
TYLERshape(X, location = TRUE, eps = 1e-06, maxiter = 100)
TYLERshape(X, location = TRUE, eps = 1e-06, maxiter = 100)
X |
numeric data matrix or dataframe. Missing values are not allowed. |
location |
logical or numeric. If FALSE, it is assumed that the scatter should be computed wrt to the origin. If TRUE the location will be estimated as the mean vector and if it is a numeric vector it will be computed wrt to the given vector. |
eps |
convergence tolerance, which means that the algorithm stops when the Frobenius norm of the gradient is smaller than eps. |
maxiter |
maximum number of iterations. |
The estimate is based on the new fast algorithm described in Duembgen et al. (2016). Note that Tyler's shape matrix is standardized such that it has determinant 1.
The function does not check if there are observations equal to the mean (if location=TRUE
), to the provided location vector or to the origin (if location=FALSE
).
In these cases the function will fail.
In case maxiter
is reached before convergence, the estimate at that iteration is returned and a warning is given.
A list containing:
mu |
Estimated location if |
Sigma |
Estimated shape matrix. |
iter |
Number of iterations of the algorithm. |
Lutz Duembgen and Klaus Nordhausen
Tyler, D.E. (1987), A distribution-free M-estimator of scatter, Annals of Statistics, 15, 234–251.
Duembgen, L., Nordhausen, K. and Schuhmacher, H. (2016), New algorithms for M-estimation of multivariate location and scatter, Journal of Multivariate Analysis, 144, 200–217. doi:10.1016/j.jmva.2015.11.009
TYLERshape(longley) # compare to # library(ICSNP) # tyler.shape(longley) TYLERshape(longley, location=FALSE) # compare to # library(ICSNP) # tyler.shape(longley, location=0)
TYLERshape(longley) # compare to # library(ICSNP) # tyler.shape(longley) TYLERshape(longley, location=FALSE) # compare to # library(ICSNP) # tyler.shape(longley, location=0)