Title: | Tools for Multivariate Nonparametrics |
---|---|
Description: | Tools for multivariate nonparametrics, as location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling's T-test, estimates of shape are implemented. |
Authors: | Klaus Nordhausen [aut, cre] , Seija Sirkia [aut], Hannu Oja [aut] , David E. Tyler [aut] |
Maintainer: | Klaus Nordhausen <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1-2 |
Built: | 2024-12-12 06:57:49 UTC |
Source: | CRAN |
Tools for multivariate nonparametrics, as location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling's T-test, estimates of shape are implemented.
Package: | ICSNP |
Type: | Package |
Title: | Tools for Multivariate Nonparametrics |
Version: | 1.1-2 |
Date: | 2023-09-18 |
Authors@R: | c(person("Klaus", "Nordhausen", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-3758-8501")), person("Seija", "Sirkia", role = c("aut")), person("Hannu", "Oja", role = c("aut"), comment = c(ORCID = "0000-0002-4945-5976")), person("David E.", "Tyler", role = c("aut"))) |
Author: | Klaus Nordhausen [aut, cre] (<https://orcid.org/0000-0002-3758-8501>), Seija Sirkia [aut], Hannu Oja [aut] (<https://orcid.org/0000-0002-4945-5976>), David E. Tyler [aut] |
Maintainer: | Klaus Nordhausen <[email protected]> |
Depends: | mvtnorm, ICS |
Description: | Tools for multivariate nonparametrics, as location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling's T-test, estimates of shape are implemented. |
License: | GPL (>= 2) |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Packaged: | 2023-09-18 11:39:25 UTC; admin |
Repository: | CRAN |
Date/Publication: | 2023-09-18 12:50:05 UTC |
Config/pak/sysreqs: | make |
This package contains tools for nonparametric multivariate analysis, including the estimation of location and shape as well as some tests for location and independece.
Shape matrices from this package can be used as one of the scatter matrices needed in the package ICS
whereas the tests of this package
can be used for testing in the framework of invariant coordinates or independent components obtained from the package ICS
.
The parametric Hotelling's T test serves as a reference for the nonparametric location tests.
Index of help topics:
HP.loc.test Hallin and Paindaveine Signed-Rank Tests HP1.shape One Step Rank Scatter Estimator HR.Mest Simultaneous Affine Equivariant Estimation of Multivariate Median and Tyler's Shape Matrix HotellingsT2 Hotelling's T2 Test ICSNP-package Tools for Multivariate Nonparametrics LASERI Cardiovascular Responses to Head-up Tilt duembgen.shape Duembgen's Shape Matrix duembgen.shape.wt Weighted Duembgen's Shape Matrix hl.loc Hodges - Lehmann Estimator of Location ind.ctest Test of Independece based on Marginal Ranks ind.ictest Test of Independence based on Marginal Ranks in a Symmetric IC Model pair.diff Pairwise Differences pair.prod Pairwise Products pair.sum Pairwise Sums pulmonary Change in Pulmonary Response after Exposure to Cotton Dust rank.ctest One, Two and C Sample Rank Tests for Location based on Marginal Ranks rank.ictest One Sample Location Test based on Marginal Ranks in the Independent Component Model spatial.median Spatial Median spatial.sign Spatial Signs symm.huber Symmetrized Huber Scatter Matrix symm.huber.wt Weighted Symmetrized Huber Scatter Matrix tyler.shape Tyler's Shape Matrix vdw.loc Van der Waerden Estimator of Location
Klaus Nordhausen [aut, cre] (<https://orcid.org/0000-0002-3758-8501>), Seija Sirkia [aut], Hannu Oja [aut] (<https://orcid.org/0000-0002-4945-5976>), David E. Tyler [aut]
Maintainer: Klaus Nordhausen <[email protected]>
Iterative algorithm to estimate Duembgen's shape matrix.
duembgen.shape(X, init = NULL, steps = Inf, eps = 1e-06, maxiter = 100, in.R = FALSE, na.action = na.fail, ...)
duembgen.shape(X, init = NULL, steps = Inf, eps = 1e-06, maxiter = 100, in.R = FALSE, na.action = na.fail, ...)
X |
numeric data matrix or dataframe. |
init |
an optional matrix giving the starting value for the iteration. Otherwise the regular covariance is used after transforming it to a shape matrix wit determinant 1. |
steps |
a fixed number of iteration steps to take. See details. |
eps |
convergence tolerance. |
maxiter |
maximum number of iterations. |
in.R |
logical. If TRUE R-code (and not C) is used in the iteration |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
other arguments passed on to |
Duembgen's shape matrix can be seen as tyler.shape
's matrix wrt to the origin for the pairwise differences of the observations.
Therefore this shape matrix needs no location parameter.
The function is, however, slow if the dataset is large.
The algorithm also allows for a k-step version where the iteration is run for a fixed number of steps instead of until convergence. If steps
is finite that number of steps is taken and maxiter
is ignored.
A better implementation is available in the package fastM as the function DUEMBGENshape
.
A matrix.
Klaus Nordhausen, Seija Sirkia, and some of the C++ is based on work by Jari Miettinen
Duembgen, L. (1998), On Tyler's M-functional of scatter in high dimension, Annals of Institute of Statistical Mathematics, 50, 471–491.
tyler.shape
, duembgen.shape.wt
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) cov.matrix/det(cov.matrix)^(1/3) duembgen.shape(X) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) cov.matrix/det(cov.matrix)^(1/3) duembgen.shape(X) rm(.Random.seed)
Iterative algorithm to estimate the weighted version of Duembgen's shape matrix.
duembgen.shape.wt(X, wt = rep(1, nrow(X)), init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
duembgen.shape.wt(X, wt = rep(1, nrow(X)), init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
X |
numeric data frame or matrix. |
wt |
vector of weights. Should be nonnegative and at least one larger than zero. |
init |
an optional matrix giving the starting value for the iteration. |
eps |
convergence tolerance. |
maxiter |
maximum number of iterations. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The weighted Duembgen shape matrix can be seen as tyler.shape
's matrix wrt to the origin for the weighted pairwise differences of the observations.
Therefore this shape matrix needs no location parameter.
Note that this function is memory comsuming and slow for large data sets since the matrix is based on all pairwise difference of the observations.
a matrix.
Klaus Nordhausen
Sirkia, S., Taskinen, S. and Oja, H. (2007), Symmetrised M-estimators of scatter. Journal of Multivariate Analysis, 98, 1611–1629.
set.seed(1) cov.matrix.1 <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol = 3) X.1 <- rmvnorm(100, c(0,0,0), cov.matrix.1) cov.matrix.2 <- diag(1,3) X.2 <- rmvnorm(50, c(1,1,1), cov.matrix.2) X <- rbind(X.1, X.2) D1 <- duembgen.shape.wt(X, rep(c(0,1), c(100,50))) D2 <- duembgen.shape.wt(X, rep(c(1,0), c(100,50))) D1 D2 rm(.Random.seed)
set.seed(1) cov.matrix.1 <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol = 3) X.1 <- rmvnorm(100, c(0,0,0), cov.matrix.1) cov.matrix.2 <- diag(1,3) X.2 <- rmvnorm(50, c(1,1,1), cov.matrix.2) X <- rbind(X.1, X.2) D1 <- duembgen.shape.wt(X, rep(c(0,1), c(100,50))) D2 <- duembgen.shape.wt(X, rep(c(1,0), c(100,50))) D1 D2 rm(.Random.seed)
Function to compute the Hodges - Lehmann estimator of location in the one sample case.
hl.loc(x, na.action = na.fail)
hl.loc(x, na.action = na.fail)
x |
a numeric vector. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The Hodges - Lehmann estimator is the median of the combined data points and Walsh averages.
It is the same as the Pseudo Median returned as a by-product of the function wilcox.test
.
the Hodges - Lehmann estimator of location.
Klaus Nordhausen
Hettmansperger, T.P. and McKean, J.W. (1998), Robust Nonparametric Statistical Methods, London, Arnold.
Hodges, J.L., and Lehmann, E.L. (1963), Estimates of location based on rank tests. The Annals of Mathematical Statistics, 34, 598–611.
set.seed(1) x <- rt(100, df = 3) hl.loc(x) # same as wilcox.test(x, conf.int = TRUE)$estimate rm(.Random.seed)
set.seed(1) x <- rt(100, df = 3) hl.loc(x) # same as wilcox.test(x, conf.int = TRUE)$estimate rm(.Random.seed)
Hotelling's T2 test for the one and two sample case.
HotellingsT2(X, ...) ## Default S3 method: HotellingsT2(X, Y = NULL, mu = NULL, test = "f", na.action = na.fail, ...) ## S3 method for class 'formula' HotellingsT2(formula, na.action = na.fail, ...)
HotellingsT2(X, ...) ## Default S3 method: HotellingsT2(X, Y = NULL, mu = NULL, test = "f", na.action = na.fail, ...) ## S3 method for class 'formula' HotellingsT2(formula, na.action = na.fail, ...)
X |
a numeric data frame or matrix. |
Y |
an optional numeric data frame or matrix for the two sample test. If NULL a one sample test is performed. |
mu |
a vector indicating the hypothesized value of the mean (or difference in means if a two sample test is performed). NULL represents origin or no difference between the groups. |
test |
if 'f', the decision is based on the F-distribution, if 'chi' a chi-squared approximation is used. |
formula |
a formula of the form |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
further arguments to be passed to or from methods. |
The classical test for testing the location of a multivariate population or for testing the mean
difference for two multivariate populations. When test = "f"
the F-distribution is used for
the test statistic and it is assumed that the data are normally distributed. If the chisquare
approximation is used, the normal assumption can be relaxed to existence of second moments.
In the two sample case both populations are assumed to have the same covariance matrix.
The formula interface is only applicable for the 2-sample tests.
A list with class 'htest' containing the following components:
statistic |
the value of the T2-statistic. (That is the scaled value of the statistic that has an
F distribution or a chisquare distribution depending on the value of |
parameter |
the degrees of freedom for the T2-statistic. |
p.value |
the p-value for the test. |
null.value |
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test. |
alternative |
a character string with the value 'two.sided'. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data (and grouping vector). |
Klaus Nordhausen
Anderson, T.W. (2003), An introduction to multivariate analysis, New Jersey: Wiley.
# one sample test: data(pulmonary) HotellingsT2(pulmonary) HotellingsT2(pulmonary, mu = c(0,0,2), test = "chi") # two sample test: set.seed(123456) X <- rmvnorm(20, c(0, 0, 0, 0), diag(1:4)) Y <- rmvnorm(30, c(0.5, 0.5, 0.5, 0.5), diag(1:4)) Z <- rbind(X, Y) g <- factor(rep(c(1,2),c(20,30))) HotellingsT2(X, Y) HotellingsT2(Z ~ g, mu = rep(-0.5,4)) rm(.Random.seed)
# one sample test: data(pulmonary) HotellingsT2(pulmonary) HotellingsT2(pulmonary, mu = c(0,0,2), test = "chi") # two sample test: set.seed(123456) X <- rmvnorm(20, c(0, 0, 0, 0), diag(1:4)) Y <- rmvnorm(30, c(0.5, 0.5, 0.5, 0.5), diag(1:4)) Z <- rbind(X, Y) g <- factor(rep(c(1,2),c(20,30))) HotellingsT2(X, Y) HotellingsT2(Z ~ g, mu = rep(-0.5,4)) rm(.Random.seed)
This function implements the signed-rank location tests as suggested by Hallin and Paindaveine (2002a, 2002b).
HP.loc.test(X, mu = NULL, score = "rank", angles = "tyler", method = "approximation", n.perm = 1000, na.action = na.fail)
HP.loc.test(X, mu = NULL, score = "rank", angles = "tyler", method = "approximation", n.perm = 1000, na.action = na.fail)
X |
a numeric data frame or matrix. |
mu |
a vector indicating the hypothesized value of the location. NULL represents the origin. |
score |
score for the pseudo mahalanobis distance. Options are 'rank', 'sign' and 'normal' scores. |
angles |
which angle to use. Possible are 'tyler' for spatial sign type anlges or 'interdirections'. Note however that currently only 'tyler' is implemented. |
method |
defines the method used for the computation of the p-value. The possibilites are 'approximation' or 'permutation'. |
n.perm |
if |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The test based on interdirections is described in Hallin and Paindaveine (2002a) and the test based on Tyler's angles is described in Hallin and Paindaveine (2002b). The two different tests are asymptotically equivalent and in both cases is assumed that the data comes from an elliptic distribution.
A list with class 'htest' containing the following components:
statistic |
the value of the Q-statistic. |
parameter |
the degrees of freedom for the Q-statistic. |
p.value |
the p-value for the test. |
null.value |
the specified hypothesized value of the location. |
alternative |
a character string with the value 'two.sided'. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data. |
Klaus Nordhausen
Hallin, M. and Paindaveine, D. (2002a), Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks, Annals of Statistics, 30, 1103–1133.
Hallin, M. and Paindaveine, D. (2002b), Randles' interdirections or Tyler's angles?, In Y. Dodge, Ed. Statistical data analysis based on the L1-norm and related methods, 271–282.
X <- rmvnorm(100, c(0,0,0.1)) HP.loc.test(X) HP.loc.test(X, score="s") HP.loc.test(X, score="n")
X <- rmvnorm(100, c(0,0,0.1)) HP.loc.test(X) HP.loc.test(X, score="s") HP.loc.test(X, score="n")
one step M-estimator of the scatter matrix based on ranks.
HP1.shape(X, location = "Estimate", na.action = na.fail, ...)
HP1.shape(X, location = "Estimate", na.action = na.fail, ...)
X |
a numeric data frame or matrix. |
location |
if 'Estimate' the location and scatter matrix used for computing the spatial signs are estimated simultaneously using |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
arguments that can be passed on to |
This is a one step M-estimator of shape which is standardized in such a way that the determinant is 1.
The exact formula is:
where is Tyler's shape matrix,
is the spatial sign of
and
gives the rank of
among
. The van der Warden score function
is the inverse of the cdf of a chi-squared distribution with p degrees of freedom.
This scatter matrix is based on the test for shape developed in the paper by Hallin and Paindaveine (2006), its usage with respect to the origin is demonstrated in Nordhausen et al. (2006).
Klaus Nordhausen
Hallin, M. and Paindaveine, D. (2006), Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity, Annals of Statistics, 34, 2707–2756.
Nordhausen, K., Oja, H. and Paindaveine, D. (2009), Signed-rank tests for location in the symmetric independent component model, Journal of Multivariate Analysis, 100, 821–834.
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) HP1.shape(X) HP1.shape(X, location="Origin") cov.matrix/det(cov.matrix)^(1/3) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) HP1.shape(X) HP1.shape(X, location="Origin") cov.matrix/det(cov.matrix)^(1/3) rm(.Random.seed)
iterative algorithm that finds the affine equivariant multivariate median by estimating tyler.shape
simultaneously.
HR.Mest(X, maxiter = 100, eps.scale = 1e-06, eps.center = 1e-06, na.action = na.fail)
HR.Mest(X, maxiter = 100, eps.scale = 1e-06, eps.center = 1e-06, na.action = na.fail)
X |
a numeric data frame or matrix. |
maxiter |
maximum number of iterations. |
eps.scale |
convergence tolerance for the Tyler's shape matrix subroutine. |
eps.center |
convergence tolerance for the location estimate. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The algorithm follows the idea of Hettmansperger and Randles (2002). There are, however, some differences. This algorithm has the vector of marginal medians as starting point for the location and the starting shape matrix is Tyler's shape matrix based on the vector of marginal medians and has then a location step and a shape step which are:
transforming the data as and
computing the spatial median
of y using the function
spatial.median
. Then
retransforming to the original scale
.
computing Tyler's shape matrix with respect to
by using the function
tyler.shape
.
The algorithm stops when the difference between two subsequent location estimates is smaller than eps.center
.
There is no proof that the algorithm converges.
A list containing:
center |
vector with the estimated loaction. |
scatter |
matrix of the estimated scatter. |
Klaus Nordhausen and Seija Sirkia
Hettmansperger, T.P. and Randles, R.H. (2002), A practical affine equivariant multivariate median, Biometrika, 89, 851–860.
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) res <- HR.Mest(X) colMeans(X) res$center cov.matrix/det(cov.matrix)^(1/3) res$scatter rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) res <- HR.Mest(X) colMeans(X) res$center cov.matrix/det(cov.matrix)^(1/3) res$scatter rm(.Random.seed)
Performs the test that a group of variables is independent of an other based on marginal ranks. Three different score functions are available.
ind.ctest(X, index1, index2 = NULL, scores = "rank", na.action = na.fail)
ind.ctest(X, index1, index2 = NULL, scores = "rank", na.action = na.fail)
X |
a data frame or matrix. |
index1 |
integer vector that selects the columns of |
index2 |
integer vector that selects the columns of |
scores |
if 'sign', a sign test is performed, if 'rank' a rank test is performed or if 'normal' a normal score test is performed. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The test tests if X[ , index1] is independent of X[ , index2] and is described in great detail in Puri and Sen (1971).
A list with class 'htest' containing the following components:
statistic |
the value of the W-statistic. |
parameter |
the degrees of freedom for the W-statistic. |
p.value |
the p-value for the test. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data. |
Klaus Nordhausen
Puri , M.L. and Sen, P.K. (1971), Nonparametric Methods in Multivariate Analysis, New York: Wiley.
A1 <- matrix(c(4, 4, 5, 4, 6, 6, 5, 6, 7), ncol = 3) A2 <- matrix(c(0.5, -0.3, -0.3, 0.7), ncol = 2) X <- cbind(rmvnorm(100, c(-1, 0, 1), A1), rmvnorm(100, c(0, 0), A2)) ind.ctest(X,1:3) ind.ctest(X, c(1, 5), c(2, 3), scores = "normal")
A1 <- matrix(c(4, 4, 5, 4, 6, 6, 5, 6, 7), ncol = 3) A2 <- matrix(c(0.5, -0.3, -0.3, 0.7), ncol = 2) X <- cbind(rmvnorm(100, c(-1, 0, 1), A1), rmvnorm(100, c(0, 0), A2)) ind.ctest(X,1:3) ind.ctest(X, c(1, 5), c(2, 3), scores = "normal")
Performs the test that a group of variables is independent of an other based on marginal ranks. It is assumed that the data follows a symmetric IC model. Three different score functions are available.
ind.ictest(X, index1, index2 = NULL, scores = "rank", method = "approximation", n.simu = 1000, ..., na.action = na.fail)
ind.ictest(X, index1, index2 = NULL, scores = "rank", method = "approximation", n.simu = 1000, ..., na.action = na.fail)
X |
a data frame or matrix. |
index1 |
integer vector that selects the columns of |
index2 |
integer vector that selects the columns of |
scores |
if 'sign', a sign test is performed, if 'rank' a signed rank test is performed or if 'normal' a normal score test is performed. |
method |
defines the method used for the computation of the p-value. The possobilites are "approximation" (default), "simulation" or "permutation". Details below. |
n.simu |
if ' |
... |
further arguments to be passed to the function |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
Assumed is here that X[ , index1]
comes from a symmetric independent component model which in turn is independent from X[ , index2]
which has also
an underlying symmetric independent component model. This function recovers the independent components using the function ics
, centers them by a marginal
loaction estimate based on the same scores that will be used in the actual test. The test is described in Oja, Paindaveine and Taskinen (2009).
The asymptotic chi-square distibution is however even for large sample sizes inadequat and therefore p-values can be simulated by resampling the test statistic under the null
hypothesis or by permuting the rows of the independent components of X[ , index2]
. Both alternatives are also described in Oja, Paindaveine and Taskinen (2009).
A list with class 'htest' containing the following components:
statistic |
the value of the Q-statistic. |
parameter |
the degrees of freedom for the Q-statistic or the number of replications depending on the chosen method. |
p.value |
the p-value for the test. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data. |
Klaus Nordhausen
Oja, H. and Paindaveine, D. and Taskinen, S. (2016), Affine-invariant rank tests for multivariate independence in independent component models, Electronic Journal of Statistics, 10, 2372–2419.
Z1<-cbind(rt(500,5),rnorm(500),runif(500)) Z2<-cbind(rt(500,8),rbeta(500,2,2)) A1 <- matrix(c(4, 4, 5, 4, 6, 6, 5, 6, 7), ncol = 3) A2 <- matrix(c(0.5, -0.3, -0.3, 0.7), ncol = 2) X <- cbind(Z1 %*% t(A1), Z2 %*% t(A2)) ind.ictest(X,1:3) ind.ictest(X,1:3,method="simu") ind.ictest(X,1:2,3:5,method="perm", S1=tyler.shape,S2=cov)
Z1<-cbind(rt(500,5),rnorm(500),runif(500)) Z2<-cbind(rt(500,8),rbeta(500,2,2)) A1 <- matrix(c(4, 4, 5, 4, 6, 6, 5, 6, 7), ncol = 3) A2 <- matrix(c(0.5, -0.3, -0.3, 0.7), ncol = 2) X <- cbind(Z1 %*% t(A1), Z2 %*% t(A2)) ind.ictest(X,1:3) ind.ictest(X,1:3,method="simu") ind.ictest(X,1:2,3:5,method="perm", S1=tyler.shape,S2=cov)
This data set contains the cardiovascular responses to a passive head-up tilt for 223 subjects.
data(LASERI)
data(LASERI)
A data frame with 223 observations on the following 32 variables.
Sex
a factor with levels Female
and Male
.
Age
Age in years.
Height
Height in cm.
Weight
Weight in kg.
Waist
Waist circumference in cm.
Hip
Hip circumference in cm.
BMI
Body mass index.
WHR
Waist hip ratio.
HRT1
Average heart rate in the tenth minute of rest.
HRT2
Average heart rate in the second minute during the tilt.
HRT3
Average heart rate in the fifth minute during the tilt.
HRT4
Average heart rate in the fifth minute after the tilt.
COT1
Average cardiac output in the tenth minute of rest.
COT2
Average cardiac output in the second minute during the tilt.
COT3
Average cardiac output in the fifth minute during the tilt.
COT4
Average cardiac output in the fifth minute after the tilt.
SVRIT1
Average systemic vascular resistance index in the tenth minute of rest.
SVRIT2
Average systemic vascular resistance index in the second minute during the tilt.
SVRIT3
Average systemic vascular resistance index in the fifth minute during the tilt.
SVRIT4
Average systemic vascular resistance index in the fifth minute after the tilt.
PWVT1
Average pulse wave velocity in the tenth minute of rest.
PWVT2
Average pulse wave velocity in the second minute during the tilt.
PWVT3
Average pulse wave velocity in the fifth minute during the tilt.
PWVT4
Average pulse wave velocity in the fifth minute after the tilt.
HRT1T2
Difference HRT1
- HRT2
.
COT1T2
Difference COT1
- COT2
.
SVRIT1T2
Difference SVRIT1
- SVRIT2
.
PWVT1T2
Difference PWVT1
- PWVT2
.
HRT1T4
Difference HRT1
- HRT4
.
COT1T4
Difference COT1
- COT4
.
SVRIT1T4
Difference SVRIT1
- SVRIT4
.
PWVT1T4
Difference PWVT1
- PWVT4
.
This data is a subset of hemodynamic data collected as a part of the LASERI study (English title: “Cardivascular risk in young Finns study”) using whole-body impedance cardiography and plethysmographic blood pressure recordings from fingers. The data given here comes from 223 healthy subjects between 26 and 42 years of age, who participated in the recording of the hemodynamic variables both in a supine position and during a passive head-up tilt on a motorized table. During that experiment the subject spent the first ten minutes in a supine position, then the motorized table was tilted to a head-up position (60 degrees) for five minutes, and for the last five minutes the table was again returned to the supine position.
Of interest in this data is for example if the values 5 minutes after the tilt are already returned to their pre-tilt levels.
Data courtesy of the LASERI study
(https://youngfinnsstudy.utu.fi/).
# for example testing if the location before the tilt is the same as # 5 minutes after the tilt: data(LASERI) DIFFS.T1T4 <- subset(LASERI,select=c(HRT1T4,COT1T4,SVRIT1T4)) rank.ctest(DIFFS.T1T4) rank.ctest(DIFFS.T1T4, score="s")
# for example testing if the location before the tilt is the same as # 5 minutes after the tilt: data(LASERI) DIFFS.T1T4 <- subset(LASERI,select=c(HRT1T4,COT1T4,SVRIT1T4)) rank.ctest(DIFFS.T1T4) rank.ctest(DIFFS.T1T4, score="s")
Computes pairwise differences.
pair.diff(X)
pair.diff(X)
X |
a numeric matrix. |
The function computes all differences of row i and row j with i < j. The function is a wrapper to a C function to do the computation quickly and does no checks concerning the input.
Matrix containing the differences.
Seija Sirkia
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.diff(X)
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.diff(X)
Computes pairwise elementwise products.
pair.prod(X)
pair.prod(X)
X |
a numeric matrix. |
The function computes all elementwise products of row i and row j with i < j. The function is a wrapper to a C function to do the computation quickly and does no checks concerning the input.
Matrix containing the products.
Klaus Nordhausen
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.prod(X)
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.prod(X)
Computes pairwise sums.
pair.sum(X)
pair.sum(X)
X |
a numeric matrix. |
The function computes all sums of row i and row j with i < j. The function is a wrapper to a C function to do the computation quickly and does no checks concerning the input.
Matrix containing the sums.
Seija Sirkia
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.sum(X)
X <- matrix(1:10, ncol = 2, byrow = FALSE) pair.sum(X)
Changes in pulmonary function of 12 workers after 6 hours of exposure to cotton dust.
data(pulmonary)
data(pulmonary)
A data frame with 12 observations on the following 3 variables.
FVC
change in FVC (forced vital capacity) after 6 hours.
FEV
change in FEV_3 (forced expiratory volume) after 6 hours.
CC
change in CC (closing capacity) after 6 hours.
There is also a different version of this data set around. In the different version the FVC value of subject 11 is -0.01 instead of -0.10.
Merchant, J. A., Halprin, G. M., Hudson, A. R. Kilburn, K. H., McKenzie, W. N., Hurst, D. J. and Bermazohnm P. (1975), Responses to cotton dust, Archives of Environmental Health, 30, 222–229, Table 5.
Reprinted with permission of the Helen Dwight Reid Educational Foundation. Published by Heldref Publications, 1319 Eighteenth St., NW, Washington, DC 20036-1802.
Hettmansperger, T. P. and McKean, J. W. (1998), Robust Nonparametric Statistical Methods, London: Arnold.
data(pulmonary) plot(pulmonary)
data(pulmonary) plot(pulmonary)
Performs the one, two or c sample location test based on marginal ranks. Three different score functions are available.
rank.ctest(X, ...) ## Default S3 method: rank.ctest(X, Y = NULL, mu = NULL, scores = "rank", na.action = na.fail, ...) ## S3 method for class 'formula' rank.ctest(formula, na.action = na.fail, ...) ## S3 method for class 'ics' rank.ctest(X, g = NULL, index = NULL, na.action = na.fail, ...)
rank.ctest(X, ...) ## Default S3 method: rank.ctest(X, Y = NULL, mu = NULL, scores = "rank", na.action = na.fail, ...) ## S3 method for class 'formula' rank.ctest(formula, na.action = na.fail, ...) ## S3 method for class 'ics' rank.ctest(X, g = NULL, index = NULL, na.action = na.fail, ...)
X |
a numeric data frame or matrix or an ics object. |
Y |
an optional numeric data frame or matrix for the two sample test. If NULL a one sample test is performed. |
mu |
a vector indicating the hypothesized value of the mean (or difference
in means if you are performing a two sample test). NULL represents origin or no difference between the groups.
For more than two groups |
scores |
if 'sign', a sign test is performed, if 'rank' a signed rank test is performed or if 'normal' a normal score test is performed. |
formula |
a formula of the form |
g |
a grouping factor with at least two levels. |
index |
an integer vector that gives the columns to choose the invariant coordinates form the 'ics' object. The default uses all columns. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
further arguments to be passed to or from methods. |
These tests are well described in Puri and Sen (1971). The tests are based on the marginal ranks for which three score functions are available. The scores are also used to estimate the covariance matrices. In the multisample case it is assumed that the distribution of the different populations differs only in their location.
The ics interface provides an invariant test based on the invariant coordinate selection. The assymptotic distribution is however still an open question when more than one component is used, though the chi-square approximation works well also for several components as shown in Nordhausen, Oja and Tyler (2006).
A list with class 'htest' containing the following components:
statistic |
the value of the T-statistic. |
parameter |
the degrees of freedom for the T-statistic. |
p.value |
the p-value for the test. |
null.value |
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test. |
alternative |
a character string with the value 'two.sided'. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data (and grouping vector). |
Klaus Nordhausen
Puri , M.L. and Sen, P.K. (1971), Nonparametric Methods in Multivariate Analysis, New York: Wiley.
Nordhausen, K., Oja, H. and Tyler, D.E. (2006), On the Efficiency of Invariant Multivariate Sign and Rank Tests, in Festschrift of Tarmo Pukkila on his 60th Birthday, 217–231.
# one sample tests: data(pulmonary) rank.ctest(pulmonary, scores = "sign") rank.ctest(pulmonary, mu = c(0,0,2)) # two sample tests: set.seed(123456) X <- rmvnorm(20, c(0,0,0,0), diag(1:4)) Y <- rmvnorm(30, c(0.5,0.5,0.5,0.5), diag(1:4)) Z <- rbind(X,Y) g <- factor(rep(c(1,2), c(20,30))) rank.ctest(X, Y, scores = "normal") rank.ctest(Z~g, scores = "sign", mu = rep(-0.5,4)) # c sample test: W <- rmvnorm(30, c(0,0,0,0), diag(1:4)) Z2 <- rbind(X,Y,W) g2 <- factor(rep(1:3, c(20,30,30))) rank.ctest(Z2~g2, scores = "normal") # in an invariant coordinate system rank.ctest(ics(Z2,covOrigin, cov4, S2args=list(location = "Origin")), index = c(1,4), scores = "sign") rank.ctest(ics(Z), g, index = 4) rank.ctest(ics(Z2), g2, scores = "normal",index = 4) rm(.Random.seed)
# one sample tests: data(pulmonary) rank.ctest(pulmonary, scores = "sign") rank.ctest(pulmonary, mu = c(0,0,2)) # two sample tests: set.seed(123456) X <- rmvnorm(20, c(0,0,0,0), diag(1:4)) Y <- rmvnorm(30, c(0.5,0.5,0.5,0.5), diag(1:4)) Z <- rbind(X,Y) g <- factor(rep(c(1,2), c(20,30))) rank.ctest(X, Y, scores = "normal") rank.ctest(Z~g, scores = "sign", mu = rep(-0.5,4)) # c sample test: W <- rmvnorm(30, c(0,0,0,0), diag(1:4)) Z2 <- rbind(X,Y,W) g2 <- factor(rep(1:3, c(20,30,30))) rank.ctest(Z2~g2, scores = "normal") # in an invariant coordinate system rank.ctest(ics(Z2,covOrigin, cov4, S2args=list(location = "Origin")), index = c(1,4), scores = "sign") rank.ctest(ics(Z), g, index = 4) rank.ctest(ics(Z2), g2, scores = "normal",index = 4) rm(.Random.seed)
marginal rank test for the location problem in the one sample case when the margins are assumed independent.
rank.ictest(X, ...) ## Default S3 method: rank.ictest(X, mu = NULL, scores = "rank", method = "approximation", n.simu = 1000, na.action = na.fail, ...) ## S3 method for class 'ics' rank.ictest(X, index = NULL, na.action = na.fail, ...)
rank.ictest(X, ...) ## Default S3 method: rank.ictest(X, mu = NULL, scores = "rank", method = "approximation", n.simu = 1000, na.action = na.fail, ...) ## S3 method for class 'ics' rank.ictest(X, index = NULL, na.action = na.fail, ...)
X |
a numeric data frame or matrix or an ics object. |
mu |
a vector indicating the hypothesized value of the location. NULL represents the origin. |
scores |
options are 'rank' for the signed rank test, 'sign' for the sign test and 'normal' for the normal score test. |
method |
defines the method used for the computation of the p-value. The possibilites are "approximation" (default), "simulation" or "permutation". Details below. |
n.simu |
if ' |
index |
an integer vector that gives the columns to choose from invariant coordinates form the 'ics' object. The default uses all columns. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
further arguments to be passed to or from methods. |
The test is normally used to test for location in the symmetric independent component model.
By default the limiting distribution is used to compute the p-values. However for moderate sample sizes (N=50) was observed in
Nordhausen et al. (2009) that the normal score test can be sometimes slightly biased. Therefore the argument method
can be used to get p-values based on simulations from a multivariate normal under the null or by permuting the signs of the centered
observations.
A list with class 'htest' containing the following components:
statistic |
the value of the Q-statistic. |
parameter |
the degrees of freedom for the Q-statistic. |
p.value |
the p-value for the test. |
null.value |
the specified hypothesized value of the location. |
alternative |
a character string with the value 'two.sided'. |
method |
a character string indicating what type of test was performed. |
data.name |
a character string giving the name of the data. |
Klaus Nordhausen
Nordhausen, K., Oja, H. and Paindaveine, D. (2009), Signed-rank tests for location in the symmetric independent component model, Journal of Multivariate Analysis, 100, 821–834.
set.seed(555) X <- cbind(rt(30,8), rnorm(30,0.5), runif(30,-3,3)) mix.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X.mixed <- X %*% t(mix.matrix) ica.X <- ics(X, covOrigin, cov4, S2args = list(location = "Origin")) rank.ictest(ica.X) rank.ictest(ica.X, scores = "normal", method = "simu") rank.ictest(ics.components(ica.X), scores = "normal", method = "perm") rm(.Random.seed)
set.seed(555) X <- cbind(rt(30,8), rnorm(30,0.5), runif(30,-3,3)) mix.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X.mixed <- X %*% t(mix.matrix) ica.X <- ics(X, covOrigin, cov4, S2args = list(location = "Origin")) rank.ictest(ica.X) rank.ictest(ica.X, scores = "normal", method = "simu") rank.ictest(ics.components(ica.X), scores = "normal", method = "perm") rm(.Random.seed)
iterative algorithm to compute the spatial median.
spatial.median(X, init = NULL, maxiter = 500, eps = 1e-06, print.it = FALSE, na.action = na.fail)
spatial.median(X, init = NULL, maxiter = 500, eps = 1e-06, print.it = FALSE, na.action = na.fail)
X |
a numeric data frame or data matrix. |
init |
Starting value for the alogrihtm, if 'NULL', the vector of marginal medians is used. |
maxiter |
maximum number of iterations. |
eps |
convergence tolerance. |
print.it |
logical. If TRUE prints the number of iterations, otherwise not. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
Follows the algorithm of Vardi and Zhang.
vector of the spatial median.
Klaus Nordhausen and Seija Sirkia
Vardi, Y. and Zhang, C.-H. (1999), The multivariate L1-median and associated data depth, PNAS, 97, 1423–1426.
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) spatial.median(X) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) spatial.median(X) rm(.Random.seed)
Function to obtain the spatial signs of a multivariate dataset. The function can compute the spatial signs also with respect to a given or estimated loacation and scale.
If both location and scale have to be estimated the HR.Mest
function is used, if only one has to be estimated the, estimation is done using
spatial.median
or tyler.shape
.
spatial.sign(X, center = TRUE, shape = TRUE, na.action = na.fail, ...)
spatial.sign(X, center = TRUE, shape = TRUE, na.action = na.fail, ...)
X |
a numeric data frame or matrix. |
center |
either a logical value or a numeric vector of length equal to the number of columns of 'X'. See below for more information. |
shape |
either a logical value or a square numeric matrix with number of columns equal to the number of columns of 'X'. See below for more information. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
... |
arguments that can be passed on to functions used for the estimation of location and shape. |
The spatial signs U of X with location and shape V are given by
If a numeric value is given as 'center' and/or 'shape' these are used as and/or V in the above formula.
If 'center' and/or 'shape' are 'TRUE' the values for
and/or V are estimated, if 'FALSE' the origin is used as the
value of
and/or the identity matrix as the value of V.
In the special case of univariate data the univariate signs of the data (centered if requested) are returned and the shape parameter is redundant.
a matrix with the spatial signs of the data as rows or the univariate signs as a px1 matrix. The centering vector and scaling matrix used are returned as attributes 'center' and 'shape'.
Klaus Nordhausen and Seija Sirkia
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(15, c(1,0,-1), cov.matrix) spatial.sign(X) spatial.sign(X, center=FALSE, shape=FALSE) spatial.sign(X, center=colMeans(X), shape=cov(X)) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(15, c(1,0,-1), cov.matrix) spatial.sign(X) spatial.sign(X, center=FALSE, shape=FALSE) spatial.sign(X, center=colMeans(X), shape=cov(X)) rm(.Random.seed)
Iterative algorithm to estimate the symmetrized Huber scatter matrix.
symm.huber(X, qg = 0.9, init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
symm.huber(X, qg = 0.9, init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
X |
numeric data frame or matrix. |
qg |
tuning parameter. Should be between 0 and 1. The default is 0.9. |
init |
an optional matrix giving the starting value for the iteration. |
eps |
convergence tolerance. |
maxiter |
maximum number of iterations. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The symmetrized Huber scatter matrix is the regular Huber scatter matrix for the pairwise differences of the observations taken wrt to the origin.
Note that this function might be memory comsuming and slow for large data sets since the matrix is based on all pairwise difference of the observations.
The function symmhuber
in the package SpatialNP offers also a k-step option. The SpatialNP package contains also the function mvhuberM
for the regular multivariate Huber location
and scatter estimatior.
a matrix.
Klaus Nordhausen and Jari Miettinen
Sirkia, S., Taskinen, S. and Oja, H. (2007), Symmetrised M-estimators of scatter. Journal of Multivariate Analysis, 98, 1611–1629.
symm.huber.wt
, symmhuber
, mvhuberM
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) symm.huber(X) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) symm.huber(X) rm(.Random.seed)
Iterative algorithm to estimate the weighted symmetrized Huber scatter matrix.
symm.huber.wt(X, wt = rep(1, nrow(X)), qg = 0.9, init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
symm.huber.wt(X, wt = rep(1, nrow(X)), qg = 0.9, init = NULL, eps = 1e-06, maxiter = 100, na.action = na.fail)
X |
numeric data frame or matrix. |
wt |
vector of weights. Should be nonnegative and at least one larger than zero. |
qg |
tuning parameter. Should be between 0 and 1. The default is 0.9. |
init |
an optional matrix giving the starting value for the iteration. |
eps |
convergence tolerance. |
maxiter |
maximum number of iterations. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The weighted symmetrized Huber scatter matrix is the regular Huber scatter matrix for the weighted pairwise differences of the observations taken wrt to the origin.
Note that this function is memory comsuming and slow for large data sets since the matrix is based on all pairwise difference of the observations.
a matrix.
Klaus Nordhausen
Sirkia, S., Taskinen, S. and Oja, H. (2007), Symmetrised M-estimators of scatter. Journal of Multivariate Analysis, 98, 1611–1629.
set.seed(1) cov.matrix.1 <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol = 3) X.1 <- rmvnorm(100, c(0,0,0), cov.matrix.1) cov.matrix.2 <- diag(1,3) X.2 <- rmvnorm(50, c(1,1,1), cov.matrix.2) X <- rbind(X.1, X.2) D1 <- symm.huber.wt(X, rep(c(0,1), c(100,50))) D2 <- symm.huber.wt(X, rep(c(1,0), c(100,50))) D1 D2 rm(.Random.seed)
set.seed(1) cov.matrix.1 <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol = 3) X.1 <- rmvnorm(100, c(0,0,0), cov.matrix.1) cov.matrix.2 <- diag(1,3) X.2 <- rmvnorm(50, c(1,1,1), cov.matrix.2) X <- rbind(X.1, X.2) D1 <- symm.huber.wt(X, rep(c(0,1), c(100,50))) D2 <- symm.huber.wt(X, rep(c(1,0), c(100,50))) D1 D2 rm(.Random.seed)
Iterative algorithm to estimate Tyler's shape matrix.
tyler.shape(X, location = NULL, init = NULL, steps = Inf, eps = 1e-06, maxiter = 100, in.R = FALSE, print.it = FALSE, na.action = na.fail)
tyler.shape(X, location = NULL, init = NULL, steps = Inf, eps = 1e-06, maxiter = 100, in.R = FALSE, print.it = FALSE, na.action = na.fail)
X |
numeric data matrix or dataframe. |
location |
if NULL the sample mean is used, otherwise a vector with the location can be specified. |
init |
an optional matrix giving the starting value for the iteration |
steps |
a fixed number of iteration steps to take. See details. |
eps |
convergence tolerance. |
maxiter |
maximum number of iterations. |
in.R |
logical. If TRUE R-code (and not C) is used in the iteration |
print.it |
logical. If TRUE prints the number of iterations, otherwise not. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The most robust M-estimator of shape. It is proportional to the regular covariance matrix for elliptical contoured distributions. The estimate is in such a way standardized, that its determinate is 1.
The algorithm requires an estimate of location, if none is provided, the sample mean is used. Observations which are equal to the location estimate are removed form the data.
The algorithm also allows for a k-step version where the iteration is run for a fixed number of steps instead of until convergence. If steps
is finite that number of steps is taken and maxiter
is ignored.
A different implementation is available in the package fastM as the function TYLERshape
.
A matrix.
Klaus Nordhausen, and Seija Sirkia
Tyler, D.E. (1987), A distribution-free M-estimator of scatter, Annals of Statistics, 15, 234–251.
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) tyler.shape(X) tyler.shape(X, location=0) cov.matrix/det(cov.matrix)^(1/3) rm(.Random.seed)
set.seed(654321) cov.matrix <- matrix(c(3,2,1,2,4,-0.5,1,-0.5,2), ncol=3) X <- rmvnorm(100, c(0,0,0), cov.matrix) tyler.shape(X) tyler.shape(X, location=0) cov.matrix/det(cov.matrix)^(1/3) rm(.Random.seed)
Iterative algorithm to compute the location estimator based on van der Waerden scores (sometimes also referred to as normal scores).
vdw.loc(x, int.diff = 10, maxiter = 1000, na.action = na.fail)
vdw.loc(x, int.diff = 10, maxiter = 1000, na.action = na.fail)
x |
a numeric vector. |
int.diff |
number of observations in internal interval when the estimate is searched. |
maxiter |
maximum number of iterations. |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The algorithm searches among the observations and all Walsh averages for the two points nearest around the root of the van der Waerden score criterion. Since the criterion function
is monotone first the int.diff
of the sorted data points are searched that contain the root. After then determining there the two points of question a linear interpolation is used as an estimate.
the van der Waerden score estimator of location.
Klaus Nordhausen
Hettmansperger, T.P. and McKean, J.W. (1998), Robust Nonparametric Statistical Methods, London, Arnold.
set.seed(1) x <- rt(100, df = 3) vdw.loc(x) rm(.Random.seed)
set.seed(1) x <- rt(100, df = 3) vdw.loc(x) rm(.Random.seed)