Title: | Multivariate Hypothesis Tests |
---|---|
Description: | Hypothesis tests for multivariate data. Tests for one and two mean vectors, multivariate analysis of variance, tests for one, two or more covariance matrices. References include: Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. ISBN: 978-0124712522. London: Academic Press. |
Authors: | Michail Tsagris [aut, cre] |
Maintainer: | Michail Tsagris <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2024-10-14 06:27:00 UTC |
Source: | CRAN |
Multivariate Hypothesis Tests.
Package: | mvhtests |
Type: | Package |
Version: | 1.0 |
Date: | 2023-10-19 |
License: | GPL-2 |
Michail Tsagris [email protected].
Aitchison J. (1986). The statistical analysis of compositional data. Chapman & Hall.
Amaral G.J.A., Dryden I.L. and Wood A.T.A. (2007). Pivotal bootstrap methods for k-sample problems in directional statistics and shape analysis. Journal of the American Statistical Association, 102(478): 695–707.
Efron B. (1981). Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics, 9(2): 139–158.
Emerson S. (2009). Small sample performance and calibration of the Empirical Likelihood method. PhD thesis, Stanford University.
Everitt B. (2005). An R and S-Plus Companion to Multivariate Analysis. Springer.
James G.S. (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19–43.
Jing B.Y. and Wood A.T.A. (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365–369.
Jing B.Y. and Robinson J. (1997). Two-sample nonparametric tilting method. Australian Journal of Statistics, 39(1): 25–34.
Johnson R.A. and Wichern D.W. (2007, 6th Edition). Applied Multivariate Statistical Analysis.
Krishnamoorthy K. and Yu J. (2004). Modified Nel and Van der Merwe test for the multivariate Behrens-Fisher problem. Statistics & Probability Letters, 66(2): 161–169.
Krishnamoorthy K. and Xia Y. (2006). On Selecting Tests for Equality of Two Normal Mean Vectors. Multivariate Behavioral Research, 41(4): 533–548.
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
Owen A.B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2): 237–249.
Owen A. (1990). Empirical likelihood ratio confidence regions. Annals of Statistics, 18(1): 90–120.
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
Preston S.P. and Wood A.T.A. (2010). Two-Sample Bootstrap Hypothesis Tests for Three-Dimensional Labelled Landmark Data. Scandinavian Journal of Statistics, 37(4): 568–587.
Todorov V. and Filzmoser P. (2010). Robust Statistic for the One-way MANOVA. Computational Statistics & Data Analysis, 54(1): 37–48.
Box's M test for equality of two or more covariance matrices.
Mtest.cov(x, ina, a = 0.05)
x |
A matrix containing Euclidean data. |
ina |
A vector denoting the groups of the data. |
a |
The significance level, set to 0.05 by default. |
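According to Mardia, Kent and Bibby (1979, pg. 140), it may be argued that if $n_i$ is small, then the log-likelihood ratio test (function likel.cov) gives too much weight to the contribution of $\mathbf{S}$. This consideration led Box (1949) to propose another test statistic in place of that seen in likel.cov. Box's $M$ is given by
$$M = \gamma\sum_{i=1}^k (n_i - 1)\log\left|\mathbf{S}_{iu}^{-1}\mathbf{S}_p\right|,$$
where
$$\gamma = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(k-1)}\left(\sum_{i=1}^k\frac{1}{n_i - 1} - \frac{1}{n - k}\right),$$
$n = \sum_{i=1}^k n_i$, and $\mathbf{S}_{iu}$ and $\mathbf{S}_p$ are the $i$-th unbiased covariance estimator and the pooled covariance matrix, respectively, with $\mathbf{S}_p = \sum_{i=1}^k (n_i - 1)\mathbf{S}_{iu}/(n - k)$. Box's $M$ also has an asymptotic $\chi^2$ distribution with $\frac{1}{2}p(p+1)(k-1)$ degrees of freedom. Box's approximation seems to be good if each $n_i$ exceeds 20 and if $k$ and $p$ do not exceed 5 (Mardia, Kent and Bibby, 1979, pg. 140).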
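A minimal base-R sketch of Box's $M$ may help connect the pieces: unbiased group covariances, the pooled covariance, the correction factor and the chi-square calibration. `box_m_sketch` is a hypothetical helper written for illustration, not the code of `Mtest.cov`; it uses `determinant()` on the log scale to avoid overflow in the determinants.

```r
# Hypothetical helper: Box's M test for equality of k covariance matrices.
box_m_sketch <- function(x, ina, a = 0.05) {
  x <- as.matrix(x)
  ina <- as.factor(ina)
  p <- ncol(x)
  k <- nlevels(ina)
  ni <- as.vector(table(ina))                 # group sample sizes n_i
  n <- sum(ni)
  # unbiased covariance matrix S_iu of every group
  covs <- lapply(levels(ina), function(g) cov(x[ina == g, , drop = FALSE]))
  # pooled covariance S_p = sum (n_i - 1) S_iu / (n - k)
  sp <- Reduce(`+`, Map(function(s, m) (m - 1) * s, covs, ni)) / (n - k)
  logdet <- function(m) as.numeric(determinant(m, logarithm = TRUE)$modulus)
  # sum (n_i - 1) log|S_iu^{-1} S_p| = sum (n_i - 1) (log|S_p| - log|S_iu|)
  stat0 <- sum((ni - 1) * (logdet(sp) - sapply(covs, logdet)))
  gam <- 1 - (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) *
    (sum(1 / (ni - 1)) - 1 / (n - k))
  stat <- gam * stat0
  dof <- p * (p + 1) * (k - 1) / 2
  c(stat = stat, pvalue = pchisq(stat, dof, lower.tail = FALSE),
    dof = dof, critical = qchisq(1 - a, dof))
}

res <- box_m_sketch(iris[, 1:4], iris[, 5])
res
```

Working on log-determinants rather than on `det()` keeps the computation stable when the group covariance matrices are nearly singular.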
A vector with the test statistic, the p-value, the degrees of freedom and the critical value of the test.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
x <- as.matrix( iris[, 1:4] )
ina <- iris[, 5]
Mtest.cov(x, ina)
Empirical likelihood for a one sample mean vector hypothesis testing.
el.test1(x, mu, R = 1, ncores = 1, graph = FALSE)
x |
A matrix containing Euclidean data. |
mu |
The hypothesized mean vector. |
R |
If R is 1 no bootstrap calibration is performed and the classical p-value via the chi-square distribution is returned. If R is greater than 1, the bootstrap p-value is returned. |
ncores |
The number of cores to use, set to 1 by default. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
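The null hypothesis is $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ and the constraint imposed by EL is
$$\frac{1}{n}\sum_{i=1}^n\frac{\mathbf{x}_i - \boldsymbol{\mu}_0}{1 + \boldsymbol{\lambda}^T\left(\mathbf{x}_i - \boldsymbol{\mu}_0\right)} = \mathbf{0},$$
where $\boldsymbol{\lambda}$ is the Lagrangian parameter introduced to maximize the empirical likelihood subject to the mean constraint. Note that the equation above is solved with respect to $\boldsymbol{\lambda}$. The probabilities have the following form
$$p_i = \frac{1}{n}\,\frac{1}{1 + \boldsymbol{\lambda}^T\left(\mathbf{x}_i - \boldsymbol{\mu}_0\right)}.$$
The log-likelihood ratio test statistic can be written as
$$\Lambda = -2\sum_{i=1}^n\log\left(np_i\right) = 2\sum_{i=1}^n\log\left[1 + \boldsymbol{\lambda}^T\left(\mathbf{x}_i - \boldsymbol{\mu}_0\right)\right].$$
Under $H_0$, $\Lambda \sim \chi^2_d$ asymptotically, where $d$ denotes the number of variables. Alternatively, the bootstrap p-value may be computed.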
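The one-sample EL test reduces to solving the constraint for $\boldsymbol{\lambda}$, which is the maximiser of the concave function $\sum_i \log[1 + \boldsymbol{\lambda}^T(\mathbf{x}_i - \boldsymbol{\mu}_0)]$. The sketch below does this with a damped Newton-Raphson iteration in base R. `el_one_sketch` is a hypothetical helper, not the code behind `el.test1`, and it assumes $\boldsymbol{\mu}_0$ lies inside the convex hull of the data (otherwise the EL statistic is infinite).

```r
# Hypothetical helper: one-sample empirical likelihood test for a mean vector.
el_one_sketch <- function(x, mu0, tol = 1e-10, maxit = 100) {
  z <- sweep(as.matrix(x), 2, mu0)            # z_i = x_i - mu0
  n <- nrow(z)
  lam <- numeric(ncol(z))
  obj <- function(l) {                        # concave; -Inf when infeasible
    w <- as.vector(1 + z %*% l)
    if (any(w <= 0)) -Inf else sum(log(w))
  }
  for (it in seq_len(maxit)) {
    w <- as.vector(1 + z %*% lam)
    grad <- colSums(z / w)                    # n times the EL constraint
    if (sqrt(sum(grad^2)) < tol) break
    step <- solve(crossprod(z / w), grad)     # Newton step for the concave objective
    s <- 1                                    # halve until feasible and not worse
    while (s > 1e-12 && obj(lam + s * step) < obj(lam) - 1e-12) s <- s / 2
    if (s <= 1e-12) break
    lam <- lam + s * step
  }
  p <- 1 / (n * as.vector(1 + z %*% lam))     # EL probabilities p_i
  stat <- -2 * sum(log(n * p))
  list(stat = stat, pvalue = pchisq(stat, ncol(z), lower.tail = FALSE),
       lambda = lam, p = p, iter = it)
}

x <- as.matrix(iris[, 1:4])
res <- el_one_sketch(x, colMeans(x) + c(0.05, 0, 0, 0))
res$stat
```

At $\boldsymbol{\mu}_0 = \bar{\mathbf{x}}$ the constraint already holds with $\boldsymbol{\lambda} = \mathbf{0}$, so every $p_i = 1/n$ and the statistic is zero.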
A list with the outcome of the function el.test, which includes the $-2$ log-likelihood ratio, the observed p-value by chi-square approximation, the final value of the Lagrange multiplier $\boldsymbol{\lambda}$, the gradient at the maximum, the Hessian matrix, the weights on the observations (probabilities multiplied by the sample size) and the number of iterations performed.
In addition, the runtime of the procedure is reported. In the case of bootstrap, the bootstrap p-value is also returned.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Jing B.Y. and Wood A.T.A. (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365–369.
Owen A. (1990). Empirical likelihood ratio confidence regions. Annals of Statistics, 18(1): 90–120.
Owen A.B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
eel.test1, hotel1T2, james, hotel2T2, maov, el.test2
x <- as.matrix(iris[, 1:4])
el.test1(x, mu = numeric(4) )
eel.test1(x, mu = numeric(4) )
Empirical likelihood hypothesis testing for two mean vectors.
el.test2(y1, y2, R = 0, ncores = 1, graph = FALSE)
y1 |
A matrix containing the Euclidean data of the first group. |
y2 |
A matrix containing the Euclidean data of the second group. |
R |
If R is 0, the classical chi-square distribution is used; if R = 1, the corrected chi-square distribution (James, 1954) is used; and if R = 2, the modified F distribution (Krishnamoorthy and Xia, 2006) is used. If R is greater than 3, bootstrap calibration is performed. |
ncores |
How many cores to use. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
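The null hypothesis is $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$ and the two constraints imposed by EL are
$$\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{\mathbf{x}_{ji} - \boldsymbol{\mu}}{1 + \boldsymbol{\lambda}_j^T\left(\mathbf{x}_{ji} - \boldsymbol{\mu}\right)} = \mathbf{0}, \qquad j = 1, 2,$$
where $\boldsymbol{\mu}$ is the common mean vector under $H_0$ and the $\boldsymbol{\lambda}_j$ are Lagrangian parameters introduced to maximize the empirical likelihood. Note that the constraints are solved with respect to the $\boldsymbol{\lambda}_j$. The probabilities of the $j$-th sample have the following form
$$p_{ji} = \frac{1}{n_j}\,\frac{1}{1 + \boldsymbol{\lambda}_j^T\left(\mathbf{x}_{ji} - \boldsymbol{\mu}\right)}.$$
The log-likelihood ratio test statistic can be written as
$$\Lambda = -2\sum_{j=1}^2\sum_{i=1}^{n_j}\log\left(n_jp_{ji}\right).$$
The test is implemented by searching for the mean vector $\boldsymbol{\mu}$ that minimizes the sum of the two one-sample EL test statistics. See el.test1 for the test statistic in the one-sample case.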
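The search over the common mean can be sketched with `optim()`: evaluate the one-sample EL statistic of each group at a candidate $\boldsymbol{\mu}$ and minimise their sum. `el_stat` and `el_two_sketch` are hypothetical helpers (base R only), not the mvhtests code; a $\boldsymbol{\mu}$ for which the inner Newton-Raphson fails to converge is treated as lying outside a convex hull and scored as `Inf`, which the default Nelder-Mead method tolerates.

```r
# Hypothetical helper: one-sample EL statistic -2 sum log(n p_i) at mu.
el_stat <- function(x, mu, tol = 1e-9, maxit = 100) {
  z <- sweep(as.matrix(x), 2, mu)             # z_i = x_i - mu
  lam <- numeric(ncol(z))
  obj <- function(l) {                        # -Inf if any 1 + l'z_i <= 0
    w <- as.vector(1 + z %*% l)
    if (any(w <= 0)) -Inf else sum(log(w))
  }
  for (it in seq_len(maxit)) {
    w <- as.vector(1 + z %*% lam)
    grad <- colSums(z / w)
    if (sqrt(sum(grad^2)) < tol) return(2 * sum(log(w)))  # = -2 sum log(n p_i)
    step <- solve(crossprod(z / w), grad)
    s <- 1                                    # damped Newton step
    while (s > 1e-12 && obj(lam + s * step) < obj(lam) - 1e-12) s <- s / 2
    if (s <= 1e-12) break
    lam <- lam + s * step
  }
  Inf  # no convergence: mu treated as outside the convex hull of x
}

# Hypothetical helper: minimise the sum of the two one-sample statistics.
el_two_sketch <- function(y1, y2) {
  y1 <- as.matrix(y1); y2 <- as.matrix(y2)
  start <- colMeans(rbind(y1, y2))            # pooled mean as starting value
  fit <- optim(start, function(mu) el_stat(y1, mu) + el_stat(y2, mu),
               control = list(maxit = 5000, reltol = 1e-12))
  list(test = fit$value, mu = fit$par,
       pvalue = pchisq(fit$value, ncol(y1), lower.tail = FALSE))
}

y1 <- as.matrix(iris[1:25, 1:4]); y2 <- as.matrix(iris[26:50, 1:4])
res <- el_two_sketch(y1, y2)
res$test
```

The returned `mu` plays the role of the estimated common mean vector; bootstrap or F calibration would be layered on top of this statistic.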
A list including:
test |
The empirical likelihood test statistic value. |
modif.test |
The modified test statistic, either via the chi-square or the F distribution. |
dof |
The degrees of freedom of the chi-square or the F distribution. |
pvalue |
The asymptotic or the bootstrap p-value. |
mu |
The estimated common mean vector. |
runtime |
The runtime of the bootstrap calibration. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Amaral G.J.A., Dryden I.L. and Wood A.T.A. (2007). Pivotal bootstrap methods for k-sample problems in directional statistics and shape analysis. Journal of the American Statistical Association, 102(478): 695–707.
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
Owen A.B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2): 237–249.
Preston S.P. and Wood A.T.A. (2010). Two-Sample Bootstrap Hypothesis Tests for Three-Dimensional Labelled Landmark Data. Scandinavian Journal of Statistics, 37(4): 568–587.
eel.test2, maovjames, maov, hotel2T2, james
el.test2( y1 = as.matrix(iris[1:25, 1:4]), y2 = as.matrix(iris[26:50, 1:4]), R = 0 )
el.test2( y1 = as.matrix(iris[1:25, 1:4]), y2 = as.matrix(iris[26:50, 1:4]), R = 1 )
el.test2( y1 = as.matrix(iris[1:25, 1:4]), y2 = as.matrix(iris[26:50, 1:4]), R = 2 )
Exponential empirical likelihood for a one sample mean vector hypothesis testing.
eel.test1(x, mu, tol = 1e-06, R = 1)
x |
A matrix containing Euclidean data. |
mu |
The hypothesized mean vector. |
tol |
The tolerance value used to stop the Newton-Raphson algorithm. |
R |
The number of bootstrap samples used to calculate the p-value. If R = 1 (default value), no bootstrap calibration is performed. |
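Exponential empirical likelihood or exponential tilting was first introduced by Efron (1981) as a way to perform a "tilted" version of the bootstrap for the one-sample mean hypothesis testing. Similarly to the empirical likelihood, positive weights $p_i$, which sum to one, are allocated to the observations, such that the weighted sample mean $\sum_{i=1}^n p_i\mathbf{x}_i$ is equal to some population mean $\boldsymbol{\mu}$ under $H_0$. Under $H_1$ the weights are equal to $1/n$, where $n$ is the sample size. Following Efron (1981), the choice of $p_i$ minimizes the Kullback-Leibler distance from $H_0$ to $H_1$,
$$D(L_0, L_1) = \sum_{i=1}^n p_i\log\left(np_i\right),$$
subject to the constraint $\sum_{i=1}^n p_i\mathbf{x}_i = \boldsymbol{\mu}$. The probabilities take the form
$$p_i = \frac{e^{\boldsymbol{\lambda}^T\mathbf{x}_i}}{\sum_{j=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_j}}$$
and the constraint becomes
$$\frac{\sum_{i=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_i}\mathbf{x}_i}{\sum_{j=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_j}} - \boldsymbol{\mu} = \mathbf{0}.$$
A numerical search over $\boldsymbol{\lambda}$ is required. Under $H_0$, the log-likelihood ratio test statistic asymptotically follows a $\chi^2_d$ distribution, where $d$ denotes the number of variables. Alternatively, the bootstrap p-value may be computed.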
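The numerical search over $\boldsymbol{\lambda}$ can be sketched as Newton-Raphson on the convex function $\sum_i e^{\boldsymbol{\lambda}^T(\mathbf{x}_i - \boldsymbol{\mu})}$, whose gradient is (proportional to) the tilted constraint. `eel_one_sketch` is a hypothetical helper, not the code of `eel.test1`; the statistic $-2\sum_i\log(np_i)$ used here is an assumed form of the log-likelihood ratio, since the text does not spell it out, and $\boldsymbol{\mu}$ must lie inside the convex hull of the data.

```r
# Hypothetical helper: exponential empirical likelihood (tilting) for a mean.
eel_one_sketch <- function(x, mu, tol = 1e-10, maxit = 100) {
  z <- sweep(as.matrix(x), 2, mu)            # z_i = x_i - mu
  n <- nrow(z)
  lam <- numeric(ncol(z))
  f <- function(l) sum(exp(z %*% l))         # convex; its gradient is the constraint
  for (it in seq_len(maxit)) {
    e <- as.vector(exp(z %*% lam))
    grad <- colSums(z * e)
    if (sqrt(sum(grad^2)) < tol) break
    step <- solve(crossprod(z * sqrt(e)), grad)   # Hessian: sum e_i z_i z_i'
    s <- 1                                        # damped Newton-Raphson step
    while (s > 1e-12 && f(lam - s * step) > f(lam) + 1e-12) s <- s / 2
    lam <- lam - s * step
  }
  # p_i proportional to exp(lambda' x_i); the factor exp(-lambda' mu) cancels
  p <- as.vector(exp(z %*% lam)); p <- p / sum(p)
  stat <- -2 * sum(log(n * p))               # assumed form of the statistic
  list(p = p, lambda = lam, iter = it, stat = stat,
       pvalue = pchisq(stat, ncol(z), lower.tail = FALSE))
}

x <- as.matrix(iris[, 1:4])
res <- eel_one_sketch(x, colMeans(x) + c(0.05, 0, 0, 0))
res$stat
```

Because the objective is convex in $\boldsymbol{\lambda}$, plain Newton steps usually suffice; the step halving only guards against overshooting when $\boldsymbol{\mu}$ is far from $\bar{\mathbf{x}}$.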
A list including:
p |
The estimated probabilities. |
lambda |
The value of the Lagrangian parameter $\boldsymbol{\lambda}$. |
iter |
The number of iterations required by the Newton-Raphson algorithm. |
info |
The value of the log-likelihood ratio test statistic along with its corresponding p-value. |
runtime |
The runtime of the process. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Efron B. (1981). Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics, 9(2): 139–158.
Jing B.Y. and Wood A.T.A. (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365–369.
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
el.test1, hotel1T2, james, hotel2T2, maov, el.test2
x <- as.matrix( iris[, 1:4] )
eel.test1(x, numeric(4) )
el.test1(x, numeric(4) )
Exponential empirical likelihood hypothesis testing for two mean vectors.
eel.test2(y1, y2, tol = 1e-07, R = 0, graph = FALSE)
y1 |
A matrix containing the Euclidean data of the first group. |
y2 |
A matrix containing the Euclidean data of the second group. |
tol |
The tolerance level used to terminate the Newton-Raphson algorithm. |
R |
If R is 0, the classical chi-square distribution is used; if R = 1, the corrected chi-square distribution (James, 1954) is used; and if R = 2, the modified F distribution (Krishnamoorthy and Xia, 2006) is used. If R is greater than 3, bootstrap calibration is performed. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
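Exponential empirical likelihood or exponential tilting was first introduced by Efron (1981) as a way to perform a "tilted" version of the bootstrap for the one-sample mean hypothesis testing. Similarly to the empirical likelihood, positive weights $p_i$, which sum to one, are allocated to the observations, such that the weighted sample mean $\sum_{i=1}^n p_i\mathbf{x}_i$ is equal to some population mean $\boldsymbol{\mu}$ under $H_0$. Under $H_1$ the weights are equal to $1/n$, where $n$ is the sample size. Following Efron (1981), the choice of $p_i$ minimizes the Kullback-Leibler distance from $H_0$ to $H_1$,
$$D(L_0, L_1) = \sum_{i=1}^n p_i\log\left(np_i\right),$$
subject to the constraint $\sum_{i=1}^n p_i\mathbf{x}_i = \boldsymbol{\mu}$. The probabilities take the form
$$p_i = \frac{e^{\boldsymbol{\lambda}^T\mathbf{x}_i}}{\sum_{j=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_j}}$$
and the constraint becomes
$$\frac{\sum_{i=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_i}\mathbf{x}_i}{\sum_{j=1}^n e^{\boldsymbol{\lambda}^T\mathbf{x}_j}} - \boldsymbol{\mu} = \mathbf{0}.$$
Similarly to empirical likelihood, a numerical search over $\boldsymbol{\lambda}$ is required.
We can derive the asymptotic form of the test statistic in the two-sample means case, but in a simpler form, generalizing the approach of Jing and Robinson (1997) to the multivariate case as follows. With samples $\mathbf{x}_{1i}$, $i = 1, \ldots, n_1$, and $\mathbf{x}_{2i}$, $i = 1, \ldots, n_2$, the three constraints are
$$\frac{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}_1^T\mathbf{x}_{1i}}\mathbf{x}_{1i}}{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}_1^T\mathbf{x}_{1i}}} - \boldsymbol{\mu} = \mathbf{0}, \qquad \frac{\sum_{i=1}^{n_2}e^{\boldsymbol{\lambda}_2^T\mathbf{x}_{2i}}\mathbf{x}_{2i}}{\sum_{i=1}^{n_2}e^{\boldsymbol{\lambda}_2^T\mathbf{x}_{2i}}} - \boldsymbol{\mu} = \mathbf{0}, \qquad n_1\boldsymbol{\lambda}_1 + n_2\boldsymbol{\lambda}_2 = \mathbf{0}.$$
Similarly to EL, the sum of a linear combination of the $\boldsymbol{\lambda}_j$ is set to zero. We can equate the first two constraints:
$$\frac{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}_1^T\mathbf{x}_{1i}}\mathbf{x}_{1i}}{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}_1^T\mathbf{x}_{1i}}} = \frac{\sum_{i=1}^{n_2}e^{\boldsymbol{\lambda}_2^T\mathbf{x}_{2i}}\mathbf{x}_{2i}}{\sum_{i=1}^{n_2}e^{\boldsymbol{\lambda}_2^T\mathbf{x}_{2i}}}.$$
Also, we can write the third constraint as $\boldsymbol{\lambda}_2 = -\frac{n_1}{n_2}\boldsymbol{\lambda}_1$ and thus, writing $\boldsymbol{\lambda} = \boldsymbol{\lambda}_1$, rewrite the first two constraints as
$$\frac{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}^T\mathbf{x}_{1i}}\mathbf{x}_{1i}}{\sum_{i=1}^{n_1}e^{\boldsymbol{\lambda}^T\mathbf{x}_{1i}}} = \frac{\sum_{i=1}^{n_2}e^{-\frac{n_1}{n_2}\boldsymbol{\lambda}^T\mathbf{x}_{2i}}\mathbf{x}_{2i}}{\sum_{i=1}^{n_2}e^{-\frac{n_1}{n_2}\boldsymbol{\lambda}^T\mathbf{x}_{2i}}}.$$
This trick allows us to avoid the estimation of the common mean; it is not possible, though, to do this in the empirical likelihood method. Instead of minimising the sum of the one-sample test statistics from the common mean, we can define the probabilities by searching for the $\boldsymbol{\lambda}$ which makes the last equation hold true. The third constraint is a convenient one, but Jing and Robinson (1997) mention that, even though it is simple, it does not lead to second-order accurate confidence intervals unless the two sample sizes are equal. Asymptotically, the test statistic follows a $\chi^2_d$ distribution under the null hypothesis, where $d$ denotes the number of variables.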
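Searching for the single $\boldsymbol{\lambda}$ that equates the two tilted means is a root-finding problem. Its Jacobian is the sum of the two tilted covariance matrices, the second scaled by $n_1/n_2$, so Newton-Raphson is a natural choice. `eel_two_sketch` is a hypothetical helper, not the code of `eel.test2`; the statistic $-2\sum_j\sum_i\log(n_jp_{ji})$ is an assumed form, and the iteration assumes the two convex hulls overlap so that a solution exists.

```r
# Hypothetical helper: two-sample exponential tilting with n1 l1 + n2 l2 = 0.
eel_two_sketch <- function(y1, y2, tol = 1e-10, maxit = 100) {
  y1 <- as.matrix(y1); y2 <- as.matrix(y2)
  n1 <- nrow(y1); n2 <- nrow(y2); cc <- n1 / n2
  tilt <- function(y, l) {             # tilted weights, mean and covariance
    s <- as.vector(y %*% l)
    p <- exp(s - max(s)); p <- p / sum(p)    # shift exponent for stability
    m <- colSums(y * p)
    list(p = p, m = m, v = crossprod(sweep(y, 2, m) * sqrt(p)))
  }
  lam <- numeric(ncol(y1))
  for (it in seq_len(maxit)) {
    t1 <- tilt(y1, lam)
    t2 <- tilt(y2, -cc * lam)          # third constraint: lambda_2 = -(n1/n2) lambda_1
    f <- t1$m - t2$m                   # the two tilted means must coincide
    if (sqrt(sum(f^2)) < tol) break
    lam <- lam - solve(t1$v + cc * t2$v, f)   # Newton-Raphson step
  }
  stat <- -2 * (sum(log(n1 * t1$p)) + sum(log(n2 * t2$p)))  # assumed form
  list(test = stat, lambda = lam, mu = t1$m, mu2 = t2$m,
       pvalue = pchisq(stat, ncol(y1), lower.tail = FALSE))
}

y1 <- as.matrix(iris[1:25, 1:4]); y2 <- as.matrix(iris[26:50, 1:4])
res <- eel_two_sketch(y1, y2)
res$test
```

No common mean is estimated explicitly: once the iteration converges, the shared tilted mean (`mu`) comes out as a by-product.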
A list including:
test |
The exponential empirical likelihood test statistic value. |
modif.test |
The modified test statistic, either via the chi-square or the F distribution. |
dof |
The degrees of freedom of the chi-square or the F distribution. |
pvalue |
The asymptotic or the bootstrap p-value. |
mu |
The estimated common mean vector. |
runtime |
The runtime of the bootstrap calibration. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Efron B. (1981) Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics, 9(2): 139–158.
Jing B.Y. and Wood A.T.A. (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365–369.
Jing B.Y. and Robinson J. (1997). Two-sample nonparametric tilting method. Australian Journal of Statistics, 39(1): 25–34.
Owen A.B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
Preston S.P. and Wood A.T.A. (2010). Two-Sample Bootstrap Hypothesis Tests for Three-Dimensional Labelled Landmark Data. Scandinavian Journal of Statistics, 37(4): 568–587.
Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406–422.
el.test2, maovjames, maov, hotel2T2, james
y1 <- as.matrix(iris[1:25, 1:4])
y2 <- as.matrix(iris[26:50, 1:4])
eel.test2(y1, y2, R = 0)
eel.test2(y1, y2, R = 1)
eel.test2(y1, y2, R = 2)
Hotelling's $T^2$ test for testing one Euclidean population mean vector.
hotel1T2(x, M, a = 0.05, R = 999, graph = FALSE)
x |
A matrix containing Euclidean data. |
a |
The significance level, set to 0.05 by default. |
M |
The hypothesized mean vector. |
R |
If R is 1 no bootstrap calibration is performed and the classical p-value via the F distribution is returned. If R is greater than 1, the bootstrap p-value is returned. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
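The hypothesis test is that a mean vector $\boldsymbol{\mu}$ is equal to some specified vector $\boldsymbol{\mu}_0$. We assume that the covariance matrix $\boldsymbol{\Sigma}$ is unknown. The first approach to this hypothesis test is parametric, using Hotelling's $T^2$ test (Mardia, Kent and Bibby, 1979, pg. 125-126). The test statistic is given by
$$T^2 = \frac{(n - p)\,n}{(n - 1)\,p}\left(\bar{\mathbf{x}} - \boldsymbol{\mu}_0\right)^T\mathbf{S}^{-1}\left(\bar{\mathbf{x}} - \boldsymbol{\mu}_0\right),$$
where $n$ is the sample size, $p$ is the number of variables, and $\bar{\mathbf{x}}$ and $\mathbf{S}$ are the sample mean vector and the unbiased sample covariance matrix. Under the null hypothesis, the above test statistic follows the $F_{p,\,n-p}$ distribution. The bootstrap version of the one-sample multivariate generalization of the simple t-test is also included in the function. An extra argument (R) indicates whether bootstrap calibration should be used or not. If R = 1, the p-value is computed from the F distribution; if R > 1, the bootstrap p-value is computed from R resamples.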
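The F-scaled statistic above takes only a few lines of base R. `hotel1_sketch` is a hypothetical helper, not the code of `hotel1T2`, and it covers only the non-bootstrap (R = 1) branch. With a single variable it reduces to the square of the one-sample t statistic, which gives a convenient sanity check.

```r
# Hypothetical helper: one-sample Hotelling's T^2 test via the F distribution.
hotel1_sketch <- function(x, M, a = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  d <- colMeans(x) - M
  # (n - p) n / ((n - 1) p) * d' S^{-1} d, with S the unbiased covariance
  stat <- (n - p) * n / ((n - 1) * p) * drop(t(d) %*% solve(cov(x), d))
  c(stat = stat, pvalue = pf(stat, p, n - p, lower.tail = FALSE),
    critical = qf(1 - a, p, n - p), df1 = p, df2 = n - p)
}

set.seed(1)
x <- matrix(rnorm(100 * 4), ncol = 4)
hotel1_sketch(x, numeric(4))
```

Using `solve(cov(x), d)` instead of explicitly inverting $\mathbf{S}$ is the usual numerically safer way to form the quadratic form.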
A list including:
m |
The sample mean vector. |
info |
The test statistic, the p-value, the critical value and the degrees of freedom of the F distribution (numerator and denominator). This is given if no bootstrap calibration is employed. |
pvalue |
The bootstrap p-value if bootstrap is employed. |
runtime |
The runtime of the bootstrap calibration. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
eel.test1, el.test1, james, hotel2T2, maov, el.test2
x <- matrix( rnorm( 100 * 4), ncol = 4)
hotel1T2(x, numeric(4), R = 1)
hotel1T2(x, numeric(4), R = 999, graph = TRUE)
Hotelling's $T^2$ test for testing the equality of two Euclidean population mean vectors.
hotel2T2(x1, x2, a = 0.05, R = 999, graph = FALSE)
x1 |
A matrix containing the Euclidean data of the first group. |
x2 |
A matrix containing the Euclidean data of the second group. |
a |
The significance level, set to 0.05 by default. |
R |
If R is 1 no bootstrap calibration is performed and the classical p-value via the F distribution is returned. If R is greater than 1, the bootstrap p-value is returned. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
The first case scenario is when we assume equality of the two covariance matrices. This is called the two-sample Hotelling's $T^2$ test (Mardia, Kent and Bibby, 1979, pg. 131-140; Everitt, 2005, pg. 139). The test statistic is defined as
$$T^2 = \frac{n_1n_2}{n_1 + n_2}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)^T\mathbf{S}^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right),$$
where $\mathbf{S}$ is the pooled covariance matrix calculated under the assumption of equal covariance matrices,
$$\mathbf{S} = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2}.$$
Under $H_0$ the statistic $F$ given by
$$F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}T^2$$
follows the $F$ distribution with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom. Similar to the one-sample test, an extra argument (R) indicates whether bootstrap calibration should be used or not. If R = 1, the p-value is computed from the F distribution; if R > 1, the bootstrap p-value is computed from R resamples. The estimate of the common mean, used in the bootstrap to transform the data under the null hypothesis, is the mean vector of the combined sample, that is, of all the observations.
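A base-R sketch of the non-bootstrap branch follows the formulas directly: pooled covariance, $T^2$, then the F rescaling. `hotel2_sketch` is a hypothetical helper, not the code of `hotel2T2`. With one variable, $F = T^2$ equals the squared pooled-variance two-sample t statistic.

```r
# Hypothetical helper: two-sample Hotelling's T^2 test (equal covariances).
hotel2_sketch <- function(x1, x2, a = 0.05) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)
  # pooled covariance matrix under the equal-covariance assumption
  S <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  T2 <- n1 * n2 / (n1 + n2) * drop(t(d) %*% solve(S, d))
  df2 <- n1 + n2 - p - 1
  Fstat <- df2 / ((n1 + n2 - 2) * p) * T2
  c(T2 = T2, F = Fstat, pvalue = pf(Fstat, p, df2, lower.tail = FALSE),
    critical = qf(1 - a, p, df2), df1 = p, df2 = df2)
}

hotel2_sketch(iris[1:25, 1:4], iris[26:50, 1:4])
```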
The built-in R function manova performs exactly the same test; compare its output with the non-bootstrap (R = 1) result of hotel2T2. In addition, manova handles mean vector hypothesis testing for more than two groups. I noticed this function after I had written mine but, as I mention in the introduction, this document also has an educational character.
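One way to run this comparison is shown below. With two groups, every MANOVA criterion is an exact function of Hotelling's $T^2$, so the "approx F" that `summary.manova` reports for the Hotelling-Lawley trace should coincide with the two-sample $F$ computed by hand. The group labels `a`/`b` and the iris split are illustrative choices.

```r
# Compare manova() with a hand-computed two-sample Hotelling F statistic.
x1 <- as.matrix(iris[1:25, 1:4]); x2 <- as.matrix(iris[26:50, 1:4])
y <- rbind(x1, x2)
g <- factor(rep(c("a", "b"), each = 25))       # illustrative group labels
fit <- summary(manova(y ~ g), test = "Hotelling-Lawley")
fit$stats["g", c("approx F", "num Df", "den Df", "Pr(>F)")]

# the same F from the pooled-covariance formula
n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
d <- colMeans(x1) - colMeans(x2)
S <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
T2 <- n1 * n2 / (n1 + n2) * drop(t(d) %*% solve(S, d))
Fhand <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
Fhand
```

For more than two groups the four MANOVA criteria no longer agree, and the reported F values become approximations.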
A list including:
mesoi |
The two mean vectors. |
info |
The test statistic, the p-value, the critical value and the degrees of freedom of the F distribution (numerator and denominator). This is given if no bootstrap calibration is employed. |
pvalue |
The bootstrap p-value if bootstrap is employed. |
note |
A message informing the user that bootstrap calibration has been employed. |
runtime |
The runtime of the bootstrap calibration. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Everitt B. (2005). An R and S-Plus Companion to Multivariate Analysis. Springer.
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406–422.
james, maov, el.test2, eel.test2
hotel2T2( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]) )
hotel2T2( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 1 )
Hypothesis test for two high-dimensional mean vectors.
sarabai(x1, x2)
x1 |
A matrix containing the Euclidean data of the first group. |
x2 |
A matrix containing the Euclidean data of the second group. |
High-dimensional data are multivariate data which have many variables ($p$) and usually a small number of observations ($n$). It also happens that $p > n$, and this is the case considered here. We will see a simple test for the case of $p > n$. In this case, the covariance matrix is not invertible and, in addition, it can have many zero eigenvalues.
The test we will see was proposed by Bai and Saranadasa (1996). Several other tests have been suggested since then, but I chose this one for its simplicity. There are two datasets, $\mathbf{X}$ and $\mathbf{Y}$, of sample sizes $n_1$ and $n_2$, respectively. Their corresponding sample mean vectors and covariance matrices are $\bar{\mathbf{X}}$, $\bar{\mathbf{Y}}$ and $\mathbf{S}_1$, $\mathbf{S}_2$, respectively. The assumption here is the same as that of the Hotelling's $T^2$ test seen before, namely equal covariance matrices.
Let us first define the pooled covariance matrix, calculated under the assumption of equal covariance matrices,
$$\mathbf{S}_n = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n},$$
where $n = n_1 + n_2 - 2$. Then define
$$B_n^2 = \frac{n^2}{(n + 2)(n - 1)}\left\{\text{tr}\left(\mathbf{S}_n^2\right) - \frac{1}{n}\left[\text{tr}\left(\mathbf{S}_n\right)\right]^2\right\}.$$
The test statistic is
$$Z = \frac{\frac{n_1n_2}{n_1 + n_2}\left(\bar{\mathbf{X}} - \bar{\mathbf{Y}}\right)^T\left(\bar{\mathbf{X}} - \bar{\mathbf{Y}}\right) - \text{tr}\left(\mathbf{S}_n\right)}{\sqrt{\frac{2(n + 1)}{n}B_n^2}}.$$
Under the null hypothesis (equality of the two mean vectors), the test statistic asymptotically follows the standard normal distribution. Bai and Saranadasa (1996) established the asymptotic normality of the test statistic and showed that it has attractive power properties when $p/n \to c < \infty$ and under some restriction on the maximum eigenvalue of the common population covariance matrix. However, the requirement of $p$ and $n$ being of the same order is too restrictive to be used in the "large $p$, small $n$" situation.
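Because the statistic replaces $\mathbf{S}_n^{-1}$ with a plain Euclidean distance, it never inverts a matrix and works when $p > n$. The sketch below follows the formulas for $\mathbf{S}_n$, $B_n^2$ and $Z$. `sarabai_sketch` is a hypothetical helper, not the code of `sarabai`, and the one-sided upper-tail p-value (large $Z$ signals different means) is an assumption about how the test is calibrated.

```r
# Hypothetical helper: Bai and Saranadasa (1996) two-sample test for p > n.
sarabai_sketch <- function(x1, x2) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2)
  n <- n1 + n2 - 2
  Sn <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / n   # pooled covariance
  trS <- sum(diag(Sn))
  trS2 <- sum(Sn * Sn)                  # tr(Sn^2) for a symmetric Sn
  Bn2 <- n^2 / ((n + 2) * (n - 1)) * (trS2 - trS^2 / n)
  d <- colMeans(x1) - colMeans(x2)
  Mn <- n1 * n2 / (n1 + n2) * sum(d^2) - trS
  z <- Mn / sqrt(2 * (n + 1) / n * Bn2)
  c(stat = z, pvalue = pnorm(z, lower.tail = FALSE))  # upper tail (assumed)
}

set.seed(1)
a1 <- matrix(rnorm(40 * 100), ncol = 100)
a2 <- matrix(rnorm(50 * 100), ncol = 100)
sarabai_sketch(a1, a2)
```

The identity $\text{tr}(\mathbf{S}_n^2) = \sum_{jk} s_{jk}^2$, valid for symmetric matrices, avoids forming the $p \times p$ product $\mathbf{S}_n\mathbf{S}_n$.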
A vector with the test statistic and the p-value.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Bai Z. D. and Saranadasa H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6(2): 311–329.
hotel2T2, maov, el.test2, eel.test2
x1 <- matrix( rnorm(40 * 100), ncol = 100 )
x2 <- matrix( rnorm(50 * 100), ncol = 100 )
sarabai(x1, x2)
James test for testing the equality of two population mean vectors without assuming equality of the covariance matrices.
james(y1, y2, a = 0.05, R = 999, graph = FALSE)
y1 |
A matrix containing the Euclidean data of the first group. |
y2 |
A matrix containing the Euclidean data of the second group. |
a |
The significance level, set to 0.05 by default. |
R |
If R is 1 no bootstrap calibration is performed and the classical p-value via the F distribution is returned. If R is greater than 1, the bootstrap p-value is returned. |
graph |
A boolean variable which is taken into consideration only when bootstrap calibration is performed. If TRUE the histogram of the bootstrap test statistic values is plotted. |
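Here we show the modified version of the two-sample $T^2$ test (function hotel2T2) for the case where the two covariance matrices cannot be assumed to be equal.
James (1954) proposed a test for linear hypotheses of the population means when the variances (or the covariance matrices) are not known. Its form for two $p$-dimensional samples is
$$T^2_u = \left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)^T\tilde{\mathbf{S}}^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right),$$
where $\tilde{\mathbf{S}} = \tilde{\mathbf{S}}_1 + \tilde{\mathbf{S}}_2 = \mathbf{S}_1/n_1 + \mathbf{S}_2/n_2$.
James (1954) suggested that the test statistic be compared with $2h(\alpha)$, a corrected $\chi^2$ distribution whose form is
$$2h(\alpha) = \chi^2_{p,1-\alpha}\left(A + B\chi^2_{p,1-\alpha}\right),$$
where
$$A = 1 + \frac{1}{2p}\sum_{i=1}^2\frac{\left[\text{tr}\left(\tilde{\mathbf{S}}^{-1}\tilde{\mathbf{S}}_i\right)\right]^2}{n_i - 1}$$
and
$$B = \frac{1}{p(p+2)}\sum_{i=1}^2\frac{\text{tr}\left[\left(\tilde{\mathbf{S}}^{-1}\tilde{\mathbf{S}}_i\right)^2\right] + \frac{1}{2}\left[\text{tr}\left(\tilde{\mathbf{S}}^{-1}\tilde{\mathbf{S}}_i\right)\right]^2}{n_i - 1}.$$
If you want to use the bootstrap to get the p-value, then you must transform the data under the null hypothesis. The estimate of the common mean is given by Aitchison (1986)
$$\hat{\boldsymbol{\mu}}_c = \left(n_1\mathbf{S}_1^{-1} + n_2\mathbf{S}_2^{-1}\right)^{-1}\left(n_1\mathbf{S}_1^{-1}\bar{\mathbf{x}}_1 + n_2\mathbf{S}_2^{-1}\bar{\mathbf{x}}_2\right).$$
The modified Nel and van der Merwe (1986) test is based on the same quadratic form as that of James (1954), but the distribution used to compare the value of the test statistic is different.
It is shown in Krishnamoorthy and Xia (2006) that, approximately, $T^2_u \sim \frac{\nu p}{\nu - p + 1}F_{p,\,\nu - p + 1}$, where
$$\nu = \frac{p + p^2}{\sum_{i=1}^2\frac{1}{n_i - 1}\left\{\text{tr}\left[\left(\tilde{\mathbf{S}}_i\tilde{\mathbf{S}}^{-1}\right)^2\right] + \left[\text{tr}\left(\tilde{\mathbf{S}}_i\tilde{\mathbf{S}}^{-1}\right)\right]^2\right\}}.$$
The algorithm is taken from Krishnamoorthy and Yu (2004).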
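The pieces of the James and modified Nel and van der Merwe procedures can be sketched together in base R: the quadratic form $T^2_u$, James' corrected critical value, the approximate degrees of freedom $\nu$, and the Aitchison common mean used to centre the data before a bootstrap. `james_sketch` is a hypothetical helper, not the code of `james`; it uses the $n_i - 1$ convention in $A$, $B$ and $\nu$. With $p = 1$, $\nu$ reduces to Welch's degrees of freedom and the modified test becomes Welch's two-sample t-test, which gives an easy check.

```r
# Hypothetical helper: James (1954) and modified Nel-van der Merwe tests.
james_sketch <- function(y1, y2, a = 0.05) {
  y1 <- as.matrix(y1); y2 <- as.matrix(y2)
  n1 <- nrow(y1); n2 <- nrow(y2); p <- ncol(y1)
  m1 <- colMeans(y1); m2 <- colMeans(y2)
  V1 <- cov(y1) / n1; V2 <- cov(y2) / n2        # S_i / n_i
  Vinv <- solve(V1 + V2)
  d <- m1 - m2
  stat <- drop(t(d) %*% Vinv %*% d)             # T^2_u
  tr <- function(m) sum(diag(m))
  A1 <- Vinv %*% V1; A2 <- Vinv %*% V2
  # James' second-order correction of the chi-square critical value
  A <- 1 + (tr(A1)^2 / (n1 - 1) + tr(A2)^2 / (n2 - 1)) / (2 * p)
  B <- (tr(A1 %*% A1) / (n1 - 1) + tr(A2 %*% A2) / (n2 - 1) +
        0.5 * (tr(A1)^2 / (n1 - 1) + tr(A2)^2 / (n2 - 1))) / (p * (p + 2))
  q <- qchisq(1 - a, p)
  # modified Nel and van der Merwe: T^2_u ~ nu p / (nu - p + 1) F
  nu <- (p + p^2) / ((tr(A1 %*% A1) + tr(A1)^2) / (n1 - 1) +
                     (tr(A2 %*% A2) + tr(A2)^2) / (n2 - 1))
  Fstat <- stat * (nu - p + 1) / (nu * p)
  # Aitchison (1986) common mean, for centring data before a bootstrap
  W1 <- solve(V1); W2 <- solve(V2)
  mc <- solve(W1 + W2, W1 %*% m1 + W2 %*% m2)
  list(stat = stat, james.critical = q * (A + B * q),
       mnv.pvalue = pf(Fstat, p, nu - p + 1, lower.tail = FALSE),
       nu = nu, common.mean = as.vector(mc))
}

james_sketch(iris[1:25, 1:4], iris[26:50, 1:4])
```

Since $A \geq 1$ and $B \geq 0$, James' corrected critical value is never smaller than the plain $\chi^2_{p,1-\alpha}$ quantile, which is how the correction compensates for estimating the two covariance matrices.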
A list including:
note |
A message informing the user about the test used. |
mesoi |
The two mean vectors. |
info |
The test statistic, the p-value, the correction factor and the corrected critical value of the chi-square distribution if the James test has been used or, the test statistic, the p-value, the critical value and the degrees of freedom (numerator and denominator) of the F distribution if the modified James test has been used. |
pvalue |
The bootstrap p-value if bootstrap is employed. |
runtime |
The runtime of the bootstrap calibration. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Aitchison J. (1986). The statistical analysis of compositional data. Chapman & Hall.
James G.S. (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19–43.
Krishnamoorthy K. and Yu J. (2004). Modified Nel and Van der Merwe test for the multivariate Behrens-Fisher problem. Statistics & Probability Letters, 66(2): 161–169.
Krishnamoorthy K. and Xia Y. (2006). On Selecting Tests for Equality of Two Normal Mean Vectors. Multivariate Behavioral Research, 41(4): 533–548.
Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406–422.
hotel2T2, maovjames, el.test2, eel.test2
james( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 1 )
james( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 2 )
james( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]) )
Log-likelihood ratio test for equality of one covariance matrix.
equal.cov(x, Sigma, a = 0.05)
x |
A matrix containing Euclidean data. |
Sigma |
The hypothesized covariance matrix. |
a |
The significance level, set to 0.05 by default. |
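The hypothesis test is that the population covariance matrix is equal to some specified covariance matrix, $H_0: \boldsymbol{\Sigma} = \boldsymbol{\Sigma}_0$, with $\boldsymbol{\mu}$ unknown. The algorithm for this test is taken from Mardia, Kent and Bibby (1979, pg. 126-127).
The test is based upon the log-likelihood ratio test. The form of the test is
$$-2\log\lambda = n\,\text{tr}\left(\boldsymbol{\Sigma}_0^{-1}\mathbf{S}\right) - n\log\left|\boldsymbol{\Sigma}_0^{-1}\mathbf{S}\right| - np,$$
where $n$ is the sample size, $\boldsymbol{\Sigma}_0$ is the specified covariance matrix under the null hypothesis, $\mathbf{S}$ is the sample covariance matrix and $p$ is the dimensionality of the data (or the number of variables). Let $a$ and $g$ denote the arithmetic mean and the geometric mean, respectively, of the eigenvalues of $\boldsymbol{\Sigma}_0^{-1}\mathbf{S}$, so that $\text{tr}\left(\boldsymbol{\Sigma}_0^{-1}\mathbf{S}\right) = pa$ and $\left|\boldsymbol{\Sigma}_0^{-1}\mathbf{S}\right| = g^p$; then the test statistic becomes
$$-2\log\lambda = np\left(a - \log g - 1\right).$$
The degrees of freedom of the $\chi^2$ distribution are $\frac{1}{2}p(p+1)$.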
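The eigenvalue form makes the computation short: take the eigenvalues of $\boldsymbol{\Sigma}_0^{-1}\mathbf{S}$, form their arithmetic and geometric means, and plug them into $np(a - \log g - 1)$. `equal_cov_sketch` is a hypothetical helper, not the code of `equal.cov`; taking $\mathbf{S}$ as the maximum likelihood covariance (divisor $n$, as in Mardia, Kent and Bibby) is an assumption about the package's convention.

```r
# Hypothetical helper: LR test of H0: Sigma = Sigma0 via eigenvalue means.
equal_cov_sketch <- function(x, Sigma, a = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  S <- cov(x) * (n - 1) / n                  # maximum likelihood covariance (assumed)
  # eigenvalues of Sigma0^{-1} S are real and positive; Re() drops
  # any spurious imaginary parts from the non-symmetric solver
  ev <- Re(eigen(solve(Sigma, S), only.values = TRUE)$values)
  am <- mean(ev)                             # arithmetic mean a
  gm <- exp(mean(log(ev)))                   # geometric mean g
  stat <- n * p * (am - log(gm) - 1)
  dof <- p * (p + 1) / 2
  c(stat = stat, pvalue = pchisq(stat, dof, lower.tail = FALSE),
    dof = dof, critical = qchisq(1 - a, dof))
}

x <- as.matrix(iris[, 1:4])
equal_cov_sketch(x, cov(x) * 1.5)
```

Since $a \geq g$ and $g - \log g - 1 \geq 0$, the statistic is non-negative, and it is zero exactly when $\boldsymbol{\Sigma}_0 = \mathbf{S}$.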
A vector with the test statistic, the p-value, the degrees of freedom and the critical value of the test.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
x <- as.matrix( iris[, 1:4] )
s <- cov(x) * 1.5
equal.cov(x, s)
Log-likelihood ratio test for equality of two or more covariance matrices.
likel.cov(x, ina, a = 0.05)
x |
A matrix containing Euclidean data. |
ina |
A vector denoting the groups of the data. |
a |
The significance level, set to 0.05 by default. |
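The hypothesis test is that of the equality of at least two covariance matrices: $H_0: \boldsymbol{\Sigma}_1 = \cdots = \boldsymbol{\Sigma}_k$. The algorithm is taken from Mardia, Kent and Bibby (1979, pg. 140). The log-likelihood ratio test is the multivariate generalization of Bartlett's test of homogeneity of variances. The test statistic takes the following form
$$-2\log\lambda = n\log\left|\mathbf{S}\right| - \sum_{i=1}^k n_i\log\left|\mathbf{S}_i\right| = \sum_{i=1}^k n_i\log\left|\mathbf{S}_i^{-1}\mathbf{S}\right|,$$
where $\mathbf{S}_i$ is the $i$-th sample biased covariance matrix and $\mathbf{S} = n^{-1}\sum_{i=1}^k n_i\mathbf{S}_i$ is the maximum likelihood estimate of the common covariance matrix (under the null hypothesis), with $n = \sum_{i=1}^k n_i$. The degrees of freedom of the asymptotic chi-square distribution are $\frac{1}{2}(k-1)p(p+1)$.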
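The statistic needs only the biased group covariances, their size-weighted average and log-determinants. `likel_cov_sketch` is a hypothetical helper, not the code of `likel.cov`, and it follows the displayed formula directly.

```r
# Hypothetical helper: LR test for equality of k covariance matrices.
likel_cov_sketch <- function(x, ina, a = 0.05) {
  x <- as.matrix(x); ina <- as.factor(ina)
  p <- ncol(x); k <- nlevels(ina)
  ni <- as.vector(table(ina)); n <- sum(ni)
  # biased (divisor n_i) covariance matrix of every group
  Si <- lapply(levels(ina), function(g) {
    xi <- x[ina == g, , drop = FALSE]
    cov(xi) * (nrow(xi) - 1) / nrow(xi)
  })
  S <- Reduce(`+`, Map(`*`, ni, Si)) / n     # ML estimate of the common covariance
  logdet <- function(m) as.numeric(determinant(m, logarithm = TRUE)$modulus)
  stat <- n * logdet(S) - sum(ni * sapply(Si, logdet))
  dof <- (k - 1) * p * (p + 1) / 2
  c(stat = stat, pvalue = pchisq(stat, dof, lower.tail = FALSE),
    dof = dof, critical = qchisq(1 - a, dof))
}

likel_cov_sketch(iris[, 1:4], iris[, 5])
```

Because $\log|\cdot|$ is concave on positive definite matrices and $\mathbf{S}$ is a weighted average of the $\mathbf{S}_i$, the statistic is non-negative and vanishes when all group covariances coincide.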
A vector with the test statistic, the p-value, the degrees of freedom and the critical value of the test.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Mardia K.V., Kent J.T. and Bibby J.M. (1979). Multivariate Analysis. London: Academic Press.
x <- as.matrix( iris[, 1:4] )
ina <- iris[, 5]
likel.cov(x, ina)
Multivariate analysis of variance without assuming equality of the covariance matrices.
maovjames(x, ina, a = 0.05)
x |
A matrix containing Euclidean data. |
ina |
A numerical or factor variable indicating the groups of the data. |
a |
The significance level, set to 0.05 by default. |
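James (1954) also proposed an alternative to MANOVA when the covariance matrices are not assumed equal. The test statistic for $k$ samples is
$$J = \sum_{i=1}^k\left(\bar{\mathbf{x}}_i - \bar{\mathbf{X}}\right)^T\mathbf{W}_i\left(\bar{\mathbf{x}}_i - \bar{\mathbf{X}}\right),$$
where $\bar{\mathbf{x}}_i$ and $n_i$ are the sample mean vector and sample size of the $i$-th sample, respectively, $\mathbf{W}_i = \left(\mathbf{S}_i/n_i\right)^{-1}$, where $\mathbf{S}_i/n_i$ is the covariance matrix of the $i$-th sample mean vector, and $\bar{\mathbf{X}}$ is the estimate of the common mean
$$\bar{\mathbf{X}} = \left(\sum_{i=1}^k\mathbf{W}_i\right)^{-1}\sum_{i=1}^k\mathbf{W}_i\bar{\mathbf{x}}_i.$$
Normally one would compare the test statistic with a $\chi^2_{r,1-\alpha}$, where $r = p(k-1)$ are the degrees of freedom, with $k$ denoting the number of groups and $p$ the dimensionality of the data. There are $r$ constraints (how many univariate means must be equal so that the null hypothesis, that all the mean vectors are equal, holds true), and that is where these degrees of freedom come from. James (1954) compared the test statistic with a corrected $\chi^2$ distribution instead. Let $\mathbf{W} = \sum_{i=1}^k\mathbf{W}_i$ and let $A$ and $B$ be
$$A = 1 + \frac{1}{2r}\sum_{i=1}^k\frac{\left[\text{tr}\left(\mathbf{I}_p - \mathbf{W}^{-1}\mathbf{W}_i\right)\right]^2}{n_i - 1}$$
and
$$B = \frac{1}{r(r+2)}\sum_{i=1}^k\frac{\text{tr}\left[\left(\mathbf{I}_p - \mathbf{W}^{-1}\mathbf{W}_i\right)^2\right] + \frac{1}{2}\left[\text{tr}\left(\mathbf{I}_p - \mathbf{W}^{-1}\mathbf{W}_i\right)\right]^2}{n_i - 1}.$$
The corrected quantile of the $\chi^2$ distribution is given, as in the two-sample case, by $2h(\alpha) = \chi^2_{r,1-\alpha}\left(A + B\chi^2_{r,1-\alpha}\right)$.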
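A base-R sketch of the $k$-sample James statistic and its corrected critical value follows. `maovjames_sketch` is a hypothetical helper, not the code of `maovjames`. It returns a reject/accept decision against the corrected critical value instead of the package's p-value, because the text does not say how that p-value is formed. For $k = 2$ the statistic reduces to the two-sample James quadratic form $(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T(\mathbf{S}_1/n_1 + \mathbf{S}_2/n_2)^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$.

```r
# Hypothetical helper: James (1954) MANOVA without equal covariances.
maovjames_sketch <- function(x, ina, a = 0.05) {
  x <- as.matrix(x); ina <- as.factor(ina)
  p <- ncol(x); k <- nlevels(ina)
  grp <- lapply(levels(ina), function(g) x[ina == g, , drop = FALSE])
  ni <- sapply(grp, nrow)
  mi <- lapply(grp, colMeans)
  Wi <- lapply(grp, function(y) solve(cov(y) / nrow(y)))  # W_i = (S_i / n_i)^{-1}
  W <- Reduce(`+`, Wi)
  mc <- solve(W, Reduce(`+`, Map(`%*%`, Wi, mi)))          # common mean
  stat <- sum(mapply(function(w, m) {
    d <- m - mc
    drop(t(d) %*% w %*% d)
  }, Wi, mi))
  r <- p * (k - 1)
  tr <- function(m) sum(diag(m))
  Ci <- lapply(Wi, function(w) diag(p) - solve(W, w))     # I_p - W^{-1} W_i
  A <- 1 + sum(sapply(Ci, tr)^2 / (ni - 1)) / (2 * r)
  B <- sum((sapply(Ci, function(m) tr(m %*% m)) + 0.5 * sapply(Ci, tr)^2) /
           (ni - 1)) / (r * (r + 2))
  q <- qchisq(1 - a, r)
  list(test = stat, correction = A + B * q, corr.critical = q * (A + B * q),
       reject = stat > q * (A + B * q))
}

maovjames_sketch(iris[, 1:4], iris[, 5])
```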
A vector with the following four elements:
test |
The test statistic. |
correction |
The value of the correction factor. |
corr.critical |
The corrected critical value of the chi-square distribution. |
p-value |
The p-value of the corrected test statistic. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
James G.S. (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19–43.
maov( as.matrix(iris[,1:4]), iris[,5] )
maovjames( as.matrix(iris[,1:4]), iris[,5] )
Multivariate analysis of variance assuming equality of the covariance matrices.
maov(x, ina)
x |
A matrix containing Euclidean data. |
ina |
A numerical or factor variable indicating the groups of the data. |
Multivariate analysis of variance assuming equality of the covariance matrices.
A list including:
note |
A message stating whether the |
result |
The test statistic and the p-value. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Johnson R.A. and Wichern D.W. (2007, 6th Edition). Applied Multivariate Statistical Analysis, pg. 302–303.
Todorov V. and Filzmoser P. (2010). Robust Statistic for the One-way MANOVA. Computational Statistics & Data Analysis, 54(1): 37–48.
maov( as.matrix(iris[,1:4]), iris[,5] )
maovjames( as.matrix(iris[,1:4]), iris[,5] )
Relationship between Hotelling's $T^2$ test and James' MANOVA.
maovjames.hotel(x, ina)
x |
A matrix containing Euclidean data. |
ina |
A numerical or factor variable indicating the groups of the data. |
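The relationship for the James two-sample test (see the function james.hotel) also holds in the case of the MANOVA. The estimate of the common mean $\hat{\mu}_c$ (see the function james for the expression of $\hat{\mu}_c$) is, in general, for $k$ groups, each of sample size $n_i$, written as
$$\hat{\mu}_c = \left(\sum_{i=1}^k n_iS_i^{-1}\right)^{-1}\sum_{i=1}^k n_iS_i^{-1}\bar{x}_i,$$
where $\bar{x}_i$ and $S_i$ are the sample mean vector and sample covariance matrix of the $i$-th group.
The function is simply a numerical demonstration of the mathematics you will find in Emerson (2009, pg. 76–81) and, again, is intended for educational purposes.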
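The relationship can be checked numerically with a base-R sketch. The hypothetical helper `common_mean_sketch` (not the mvhtests code) computes $\hat{\mu}_c$ from the closed form above, evaluates the sum of the $k$ one-sample $T^2$ statistics $n_i(\bar{x}_i - \mu)^T S_i^{-1}(\bar{x}_i - \mu)$ at $\hat{\mu}_c$, and compares it with the minimum found by `optim()`; the choice of the BFGS method is ours.

```r
# Hypothetical helper illustrating the k-group relationship: the sum of the
# one-sample Hotelling's T^2 statistics about a common mean, minimised over it.
common_mean_sketch <- function(x, ina) {
  ina <- as.integer(factor(ina))                       # recode groups as 1, ..., k
  k <- max(ina)
  ni <- tabulate(ina)
  mi <- rowsum(x, ina) / ni                            # group mean vectors
  Sinv <- lapply(seq_len(k), function(i) solve(cov(x[ina == i, , drop = FALSE])))
  # closed form: (sum_i n_i S_i^{-1})^{-1} sum_i n_i S_i^{-1} xbar_i
  lhs <- Reduce(`+`, lapply(seq_len(k), function(i) ni[i] * Sinv[[i]]))
  rhs <- Reduce(`+`, lapply(seq_len(k), function(i) ni[i] * Sinv[[i]] %*% mi[i, ]))
  mc <- drop(solve(lhs, rhs))
  # sum_i n_i (xbar_i - mu)^T S_i^{-1} (xbar_i - mu), i.e. the sum of the T^2 values
  f <- function(mu) sum(vapply(seq_len(k), function(i) {
    d <- mi[i, ] - mu
    ni[i] * drop(t(d) %*% Sinv[[i]] %*% d)
  }, numeric(1)))
  opt <- optim(colMeans(x), f, method = "BFGS")       # numerical minimisation
  list(test = f(mc), mc = mc, optimised.mean = opt$par, optimised.test = opt$value)
}
```

With two groups, `common_mean_sketch(x, ina)$test` should coincide with the two-sample James statistic $(\bar{x}_1 - \bar{x}_2)^T(S_1/n_1 + S_2/n_2)^{-1}(\bar{x}_1 - \bar{x}_2)$.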
A list including:
test |
The value of the test statistic, i.e. the sum of the one-sample Hotelling's $T^2$ test statistics of the groups, each computed using the common mean. |
mc |
The common mean. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris.
Emerson S. (2009). Small sample performance and calibration of the Empirical Likelihood method. PhD thesis, Stanford University.
James G.S. (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19–43.
hotel2T2, maovjames, el.test2, eel.test2
maovjames.hotel( as.matrix(iris[, 1:4]), iris[, 5] )
maovjames( as.matrix(iris[, 1:4]), iris[, 5] )
Relationship between Hotelling's $T^2$ and James' test
Relationship between Hotelling's $T^2$ and James' test.
james.hotel(x1, x2)
x1 |
A matrix containing the Euclidean data of the first group. |
x2 |
A matrix containing the Euclidean data of the second group. |
Emerson (2009, pg. 76–81) pointed out a neat relationship between Hotelling's one-sample $T^2$ test and James' test for two mean vectors:
$$J\left(\mu_c\right) = T_1^2\left(\mu_c\right) + T_2^2\left(\mu_c\right),$$
where $J\left(\mu_c\right)$ is the James test statistic (James, 1954), and $T_1^2\left(\mu_c\right)$ and $T_2^2\left(\mu_c\right)$ are the two one-sample Hotelling's $T^2$ test statistic values (see function hotel1T2) of each sample from their common mean vector $\mu_c$ (see the help file of james). In fact, the James test statistic is obtained by minimizing the right-hand side of the above expression with respect to $\mu_c$. The sum is minimized when $\mu_c$ takes the form of the common mean vector $\hat{\mu}_c$. The same is true for Welch's $t$-test in the univariate case.
This function illustrates the result above and is intended for educational purposes. It calculates the James test statistic, the sum of the two $T^2$ test statistics, the common mean vector from the closed-form expression, and the common mean vector found via numerical optimization. In the univariate case, the common mean is a weighted average of the two sample means, with weights $n_i/s_i^2$, so it lies somewhere on the segment connecting the two means.
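The univariate case can be illustrated with a short base-R sketch (the helper `welch_common_mean_sketch` is hypothetical, not part of mvhtests). It computes the common mean as the weighted average of the two sample means with weights $n_i/s_i^2$, minimises $T_1^2(\mu) + T_2^2(\mu)$ numerically with `optimize()` over the segment between the two means, and compares the minimum with the squared Welch statistic returned by `t.test()`.

```r
# Univariate illustration with a hypothetical helper: the common mean as a
# weighted average of the two sample means, and the minimised sum of the two
# squared one-sample t statistics, which equals the squared Welch t statistic.
welch_common_mean_sketch <- function(y1, y2) {
  n <- c(length(y1), length(y2))
  m <- c(mean(y1), mean(y2))
  v <- c(var(y1), var(y2))
  w <- n / v                                   # weights n_i / s_i^2
  mc <- sum(w * m) / sum(w)                    # closed-form common mean
  f <- function(mu) sum(n * (m - mu)^2 / v)    # T_1^2(mu) + T_2^2(mu)
  opt <- optimize(f, range(m))                 # search on the segment between the means
  list(common.mean = mc, optimised.mean = opt$minimum, sum.T2 = f(mc),
       welch.t2 = unname(t.test(y1, y2)$statistic)^2)   # Welch is t.test's default
}
```

For instance, `welch_common_mean_sketch(iris$Sepal.Length[1:50], iris$Sepal.Length[51:100])`.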
A list including:
tests |
A vector with two values: the James test statistic value and the sum of the two Hotelling's $T^2$ test statistics computed using the common mean. |
mathematics.mean |
The common mean vector computed from the closed-form expression given in the help file of james. |
optimised.mean |
The common mean vector obtained from the minimisation process. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris.
Emerson S. (2009). Small sample performance and calibration of the Empirical Likelihood method. PhD thesis, Stanford University.
James G.S. (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19–43.
hotel2T2, maovjames, el.test2, eel.test2
james.hotel( as.matrix(iris[1:50, 1:4]), as.matrix(iris[51:100, 1:4]) )
james( as.matrix(iris[1:50, 1:4]), as.matrix(iris[51:100, 1:4]), R = 1 )
Repeated measures ANOVA (univariate data) using Hotelling's $T^2$ test
Repeated measures ANOVA (univariate data) using Hotelling's $T^2$ test.
rm.hotel(x, a = 0.05)
x |
A numerical matrix with the repeated measurements. Each column contains the values of the repeated measurements. |
a |
The level of significance, default value is equal to 0.05. |
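We now show how one can use Hotelling's $T^2$ test to analyse univariate repeated measures. Univariate analysis of variance for repeated measures is the classical way, but this multivariate test can be used as well. In the repeated measures ANOVA case, we have many repeated observations from the same $n$ subjects, usually at different time points, and the interest is to see whether the means of the samples are equal or not,
$$H_0: \mu_1 = \mu_2 = \ldots = \mu_k \quad \text{vs} \quad H_1: \text{at least one } \mu_i \text{ is different},$$
assuming $k$ repeated measurements. We can of course change this null hypothesis and test other combinations of means. The idea in any case is to construct a matrix of contrasts. I will focus here on the first case only; in particular, the null hypothesis and the matrix of contrasts $C$ are
$$H_0: \left(\begin{array}{c} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_k \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right) \quad \text{and} \quad C = \left(\begin{array}{ccccc} 1 & -1 & 0 & \ldots & 0 \\ 1 & 0 & -1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \ldots & -1 \end{array}\right).$$
The contrast matrix $C$ has $k-1$ independent rows and, if there is no treatment effect, $C\mu = \mathbf{0}$, where $\mu = \left(\mu_1, \ldots, \mu_k\right)^T$.
The test statistic is
$$T_r^2 = \frac{n-k+1}{(n-1)(k-1)}\, n\left(C\bar{x}\right)^T\left(CSC^T\right)^{-1}\left(C\bar{x}\right),$$
where $\bar{x}$ and $S$ are the sample mean vector and sample covariance matrix of the $k$ repeated measurements. Under $H_0$ (and multivariate normality), $T_r^2$ follows an $F_{k-1,\,n-k+1}$ distribution.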
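The statistic can be sketched in base R as follows. This is an illustration under assumptions, not the mvhtests code: `rm_hotel_sketch` is a hypothetical helper, its default contrast matrix has rows $(1, -1, 0, \ldots, 0), \ldots, (1, 0, \ldots, 0, -1)$, and its output mimics the result vector described below (statistic, p-value, degrees of freedom and critical value).

```r
# Minimal base-R sketch of the repeated-measures Hotelling's T^2 test.
# Hypothetical helper, not the mvhtests code. Default contrasts: mu_1 - mu_j.
rm_hotel_sketch <- function(x, a = 0.05, C = cbind(1, -diag(ncol(x) - 1))) {
  n <- nrow(x)                               # number of subjects
  k <- ncol(x)                               # number of repeated measurements
  cm <- C %*% colMeans(x)                    # contrasts of the sample means
  T2 <- n * drop(t(cm) %*% solve(C %*% cov(x) %*% t(C)) %*% cm)
  df1 <- k - 1
  df2 <- n - k + 1
  stat <- df2 / ((n - 1) * df1) * T2         # F-scaled statistic
  c(stat = stat, `p-value` = pf(stat, df1, df2, lower.tail = FALSE),
    df1 = df1, df2 = df2, critical = qf(1 - a, df1, df2))
}
```

Any other contrast matrix with $k-1$ linearly independent rows that each sum to zero, e.g. successive differences, gives the same statistic, because $T^2$ is invariant to nonsingular transformations of the contrasts.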
A list including:
m |
The mean vector. |
result |
A vector with the test statistic value, its associated p-value, the numerator and denominator degrees of freedom, and the critical value. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris.
x <- as.matrix(iris[, 1:4])  ## assume they are repeated measurements
rm.hotel(x)