Title: | Correlation Coefficient Related Functions
Description: | Many correlation coefficient related functions are offered, such as correlations, partial correlations and hypothesis testing using asymptotic tests and computer intensive methods (bootstrap and permutation). References include Mardia K.V., Kent J.T. and Bibby J.M. (1979). "Multivariate Analysis". ISBN: 978-0124712522. London: Academic Press.
Authors: | Michail Tsagris [aut, cre]
Maintainer: | Michail Tsagris <[email protected]>
License: | GPL (>= 2)
Version: | 1.0
Built: | 2024-11-20 06:49:07 UTC
Source: | CRAN
Description: Many correlation coefficient related functions, such as partial correlations and hypothesis testing. The package also serves an educational purpose, since some functions are written using both a for loop and a vectorised version.
Package: | corrfuns
Type: | Package |
Version: | 1.0 |
Date: | 2023-10-26 |
License: | GPL-2 |
Michail Tsagris <[email protected]>.
Asymptotic p-value for a correlation coefficient.
correl(y, x, type = "pearson", rho = 0, alpha = 0.05)
y | A numerical vector.
x | A numerical vector.
type | The type of correlation coefficient to compute, "pearson" or "spearman".
rho | The hypothesized value of the true correlation coefficient.
alpha | The significance level.
Fisher's transformation for the correlation coefficient is defined as
$$\hat{z} = \frac{1}{2}\log\frac{1 + r}{1 - r}$$
and its inverse is equal to
$$r = \frac{\exp(2\hat{z}) - 1}{\exp(2\hat{z}) + 1}.$$
The estimated standard error of Fisher's transform is $\frac{1}{\sqrt{n - 3}}$ (Efron and Tibshirani, 1993, pg. 54). If, on the other hand, you choose to calculate Spearman's correlation coefficient, the estimated standard error is slightly different, $\sqrt{\frac{1.06}{n - 3}}$ (Fieller, Hartley and Pearson, 1957; Fieller and Pearson, 1961). R calculates confidence intervals in a different way and performs hypothesis testing for zero values only. This function calculates asymptotic confidence intervals based upon Fisher's transform, assuming asymptotic normality of the transform, and performs hypothesis testing for the true (any, not only zero) value of the correlation. The sampling distribution, though, is a $t$ distribution with $n - 3$ degrees of freedom.
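The following is a minimal sketch of the asymptotic test described above, not the package's actual implementation; the helper name correl_sketch is made up for illustration and assumes a Pearson coefficient.

## Fisher's transform, its standard error, the asymptotic confidence interval
## and the p-value, as described in the details above.
correl_sketch <- function(y, x, rho = 0, alpha = 0.05) {
  n <- length(y)
  r <- cor(y, x)                               # sample correlation
  zh <- 0.5 * log( (1 + r) / (1 - r) )         # Fisher's transform of r
  z0 <- 0.5 * log( (1 + rho) / (1 - rho) )     # transform of the hypothesised value
  se <- 1 / sqrt(n - 3)                        # asymptotic standard error
  stat <- (zh - z0) / se
  pval <- 2 * pt( -abs(stat), df = n - 3 )     # t_{n-3} sampling distribution
  zl <- zh + c(-1, 1) * qnorm(1 - alpha / 2) * se
  ci <- ( exp(2 * zl) - 1 ) / ( exp(2 * zl) + 1 )   # back-transform the interval
  list( result = c(correlation = r, p.value = pval), ci = ci )
}
correl_sketch( iris[, 1], iris[, 2] )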
A list including:
result | The correlation coefficient and the p-value for the test of zero correlation.
ci | The asymptotic $(1 - \alpha)$ confidence interval for the true correlation coefficient.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
Efron B. and Tibshirani R.J. (1993). An introduction to the bootstrap. Chapman & Hall/CRC.
Fieller E.C., Hartley H.O. and Pearson E.S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4): 470–481.
Fieller E.C. and Pearson E.S. (1961). Tests for rank correlation coefficients: II. Biometrika, 48(1/2): 29–40.
a <- correl( iris[, 1], iris[, 2] )
Asymptotic p-value for many correlation coefficients.
correls(y, x, type = "pearson", rho = 0, alpha = 0.05)
y | A numerical vector.
x | A numerical matrix with many columns.
type | The type of correlation coefficient to compute, "pearson" or "spearman".
rho | The hypothesized value of the true correlation coefficient.
alpha | The significance level.
Suppose you have a (dependent) variable $y$ and a matrix of $p$ variables $\mathbf{x}$, and you want to get all the correlations between $y$ and $x_i$ for $i = 1, \ldots, p$. If you type cor(y, x) in R you will get a vector of the correlations. What I offer here is a confidence interval for each of the correlations, the test statistic and the p-values for the hypothesis that each of them is equal to some value $\rho$. The p-values and test statistics are useful, for example, for meta-analysis, for combining the p-values into one, or for estimating the false discovery rate (see the package fdrtool by Korbinian Strimmer).
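A hedged sketch of the idea follows: the correlations of y with every column of x, together with Fisher-transform based test statistics and p-values for the hypothesis of zero correlation. This mirrors the description above, not the package code.

y <- rnorm(40)
x <- matrix( rnorm(40 * 1000), ncol = 1000 )
n <- length(y)
r <- as.vector( cor(y, x) )                  # one correlation per column of x
zh <- 0.5 * log( (1 + r) / (1 - r) )         # Fisher's transforms
stat <- zh * sqrt(n - 3)                     # test statistics for rho = 0
pval <- 2 * pt( -abs(stat), df = n - 3 )     # p-values from the t_{n-3} distribution
## these p-values can then be fed to meta-analysis or false discovery rate tools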
A matrix with 5 columns: the correlations, the test statistics, their associated p-values and the lower and upper limits of the relevant confidence intervals.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
y <- rnorm(40)
x <- matrix( rnorm(40 * 1000), ncol = 1000 )
a <- correls(y, x)
Bootstrap p-value for the correlation coefficient.
bootcor(x, B = 999)
bootcor2(x, B = 999)
x | A numerical matrix with two columns.
B | The number of bootstrap samples to generate.
The functions perform non-parametric bootstrap hypothesis testing that the correlation coefficient is zero. A good pivotal statistic is Fisher's transformation (see correl). The data then have to be transformed under the null hypothesis ($\rho = 0$). This is doable via the eigen-analysis of the covariance matrix: we transform the bivariate data such that the covariance (and thus the correlation) matrix equals the identity matrix. Remember that the correlation matrix is independent of the units of measurement and is location free. The next step is easy: we draw bootstrap samples (from the transformed data) and every time we calculate Fisher's transformation. The bootstrap p-value is calculated in the usual way (Davison and Hinkley, 1997).
If you want to perform a non-parametric bootstrap hypothesis test for a value of the correlation other than zero, the procedure is similar. The data have already been transformed such that their correlation is zero. Now, instead of the zeroes in the off-diagonal entries of the identity matrix, you put the value of the correlation you want to test. Eigen-analysis of this matrix is performed and its square root is used to multiply the transformed data. I could write a more general function to include all cases, but I will leave this task to you. If you do write it, please send it to me and I will include it with your name, of course. A sketch of the zero-correlation case is given below.
The function "bootcor()" is a vectorised version of "bootcor2()". Instead of using a for loop you can do things vectorised. This idea came when I found the vectorised bootstrap correlation by Neto (2015). I cannot say I fully understood what he did, so I decided to write my own code based on the direction he pointed.
Pearson's correlation coefficient of $x$ and $y$ for a sample size $n$ is given by
$$r = \frac{\sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^n y_i^2 - n \bar{y}^2 \right)}}.$$
So, we can see that we need 5 terms to calculate it: $\sum_{i=1}^n x_i y_i$, $\bar{x}$, $\bar{y}$, $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n y_i^2$. After transforming the data under the null hypothesis using the spectral decomposition, we proceed as follows, with $B$ being the number of resamples.
Algorithm for vectorised bootstrap
1. Set a seed number in R. This is to make sure that the pairs $(x_i, y_i)$ stay the same.
2. Sample with replacement $B \times n$ values of $x$ and put them in a matrix with $n$ rows and $B$ columns, named $\mathbf{X}^*$.
3. Sample with replacement $B \times n$ values of $y$ and put them in a matrix with $n$ rows and $B$ columns, named $\mathbf{Y}^*$.
4. Calculate the column-wise mean vectors of $\mathbf{X}^*$ and $\mathbf{Y}^*$.
5. Calculate the column-wise sum vectors of $\mathbf{X}^{*2}$ and $\mathbf{Y}^{*2}$, the squares taken element-wise.
6. Finally, calculate the column-wise sum vector of the element-wise product $\mathbf{X}^* \odot \mathbf{Y}^*$. This is the term $\sum_{i=1}^n x_i y_i$ for all resamples.
So we now have 5 vectors containing the 5 terms we want for every resample. We calculate the correlation coefficients and then their Fisher's transformations (see correl), and so we have $B$ bootstrap test statistics. In order to see the time gain I tested both of these functions with $B$ resamples and 1000 repetitions. The gain is not super wow, I would have liked it to be 1/10, but even so, it is still good. Parallelised versions reduce the time to 1/3, so from this perspective, I did better. If we now put parallel computing inside this vectorised version, computations will be even faster. I leave this with you.
But I noticed one thing, the same thing Neto (2015) mentions: for big sample sizes, for example 1000 pairs, the time difference is not that big and perhaps a for loop is faster. The big difference is in the small to moderate sample sizes, at least for this example. What I mean is that you should not say "then why bother? If I have a big sample, I do not need vectorisation." Maybe so, but even then I still recommend it. Maybe someone else will find a vectorisation alternative which is better even for big samples, for the correlation of course. In the contour plots though, the vectorised versions are always faster no matter what. A sketch of this vectorised scheme is given below.
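Below is a hedged sketch of the vectorised scheme, my own reading of the algorithm above rather than the package's bootcor() code. Here x and y stand for the two columns of the data after the transformation under the null hypothesis; raw iris columns are used only to keep the sketch self-contained.

x <- iris[, 1];  y <- iris[, 2]
n <- length(x);  B <- 999
set.seed(1)                                   # same seed before sampling x ...
xb <- matrix( sample(x, n * B, replace = TRUE), nrow = n )   # X*, n rows and B columns
set.seed(1)                                   # ... and before sampling y keeps the pairs intact
yb <- matrix( sample(y, n * B, replace = TRUE), nrow = n )   # Y*
mx <- colMeans(xb);  my <- colMeans(yb)       # step 4: mean vectors
sx2 <- colSums(xb^2);  sy2 <- colSums(yb^2)   # step 5: sums of squares
sxy <- colSums(xb * yb)                       # step 6: sums of cross-products
rb <- ( sxy - n * mx * my ) /
  sqrt( (sx2 - n * mx^2) * (sy2 - n * my^2) ) # all B bootstrap correlations at once
zb <- 0.5 * log( (1 + rb) / (1 - rb) )        # B Fisher transforms in one go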
The correlation coefficient and the bootstrap based p-value for the test of zero correlation.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
Davison A.C. and Hinkley D.V. (1997). Bootstrap methods and their application. Cambridge university Press.
Neto E.C. (2015). Speeding up non-parametric bootstrap computations for statistics based on sample moments in small/moderate sample size applications. PloS ONE, 10(6): e0131333.
bootcor( iris[, 1:2] )
Hypothesis test for equality of two correlation coefficients.
correls2.test(r1, r2, n1, n2, type = "pearson")
r1 | The value of the first correlation coefficient.
r2 | The value of the second correlation coefficient.
n1 | The sample size of the first sample from which the first correlation coefficient was computed.
n2 | The sample size of the second sample from which the second correlation coefficient was computed.
type | The type of the correlation coefficients, "pearson" or "spearman".
The test statistic for the hypothesis of equality of two correlation coefficients is the following:
$$Z = \frac{\hat{z}_1 - \hat{z}_2}{\sqrt{\hat{\sigma}^2_1 + \hat{\sigma}^2_2}},$$
where $\hat{z}_1$ and $\hat{z}_2$ denote Fisher's transformation (see correl) applied to the two correlation coefficients, and $\hat{\sigma}^2_1$ and $\hat{\sigma}^2_2$ are the estimated variances of the two transformed coefficients, which depend on the sample sizes $n_1$ and $n_2$ of the two samples. The denominator is the sum of the variances of the two coefficients and, as you can see, we used a different variance estimator than the one we used before. This function performs hypothesis testing for the equality of two correlation coefficients. The result is the calculated p-value from the standard normal distribution.
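A hedged sketch of such a test follows, using the usual $1/(n - 3)$ variance of the Fisher transform for each sample; this is not necessarily the variance estimator used by correls2.test, and the name two_cor_sketch is made up for illustration.

two_cor_sketch <- function(r1, r2, n1, n2) {
  z1 <- 0.5 * log( (1 + r1) / (1 - r1) )             # Fisher transform of r1
  z2 <- 0.5 * log( (1 + r2) / (1 - r2) )             # Fisher transform of r2
  stat <- (z1 - z2) / sqrt( 1 / (n1 - 3) + 1 / (n2 - 3) )
  pval <- 2 * pnorm( -abs(stat) )                    # p-value from the standard normal
  c(statistic = stat, p.value = pval)
}
two_cor_sketch(0.6, 0.4, 50, 60)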
The test statistic and its associated p-value for the test of equal correlations.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
correls2.test(0.6, 0.4, 50, 60)
Partial correlation between two variables.
partialcor2(y, x, z, type = "pearson", rho = 0, alpha = 0.05)
y | A numerical vector.
x | A numerical vector.
z | A numerical vector or a numerical matrix.
type | The type of partial correlation coefficient to compute, "pearson" or "spearman".
rho | The hypothesized value of the true partial correlation.
alpha | The significance level.
Suppose you want to calculate the correlation coefficient between two variables controlling for the effect of (or conditioning on) one or more other variables. So you want to calculate $\rho(y, x | \mathbf{z})$, where $\mathbf{z}$ is a matrix, since it does not have to be just one variable. This idea was captured by Ronald Fisher some years ago. To calculate it, one can use linear regression as follows.
1. Calculate the residuals $\hat{e}_y$ from the linear regression of $y$ on $\mathbf{z}$.
2. Calculate the residuals $\hat{e}_x$ from the linear regression of $x$ on $\mathbf{z}$.
3. Calculate the correlation between $\hat{e}_y$ and $\hat{e}_x$. This is the partial correlation coefficient between $y$ and $x$ controlling for $\mathbf{z}$. A sketch of these three steps is given below.
The standard error of Fisher's transformation of the sample partial correlation is (Anderson, 2003)
$$\sqrt{\frac{1}{n - d - 3}},$$
where $n$ is the sample size and $d$ is the number of variables upon which we control. This standard error is very similar to the one of the classical correlation coefficient; in fact, the latter is a special case of the former when $d = 0$ and thus there is no variable whose effect is to be controlled.
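The following is a minimal sketch of the residual-based recipe above, not the package's partialcor2() code; the name partialcor_sketch is made up for illustration and assumes a Pearson coefficient and a test of zero partial correlation.

partialcor_sketch <- function(y, x, z) {
  z <- as.matrix(z)
  ey <- resid( lm(y ~ z) )                    # step 1: residuals of y on z
  ex <- resid( lm(x ~ z) )                    # step 2: residuals of x on z
  r <- cor(ey, ex)                            # step 3: partial correlation of y and x given z
  n <- length(y);  d <- dim(z)[2]
  zh <- 0.5 * log( (1 + r) / (1 - r) )        # Fisher's transform
  se <- 1 / sqrt(n - d - 3)                   # its standard error (Anderson, 2003)
  pval <- 2 * pnorm( -abs(zh) / se )          # test of zero partial correlation
  c(partial.correlation = r, p.value = pval)
}
x <- iris[, 1:4]
partialcor_sketch( x[, 1], x[, 2], x[, 3:4] )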
A list including:
result | The partial correlation coefficient and the p-value for the test of zero partial correlation.
ci | The asymptotic $(1 - \alpha)$ confidence interval for the true partial correlation coefficient.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
x <- iris[, 1:4]
partialcor2(x[, 1], x[, 2], x[, 3:4])
Partial correlation between two variables when a correlation matrix is given.
partialcor(R, indx, indy, indz, n)
R | A correlation matrix.
indx | The index of the first variable whose conditional correlation is to be estimated.
indy | The index of the second variable whose conditional correlation is to be estimated.
indz | The index (or indices) of the conditioning variables.
n | The sample size of the data from which the correlation matrix was computed.
Suppose you want to calculate the correlation coefficient between two variables controlling for the effect of (or conditioning on) one or more other variables. So you want to calculate $\rho(X, Y | \mathbf{Z})$, where $\mathbf{Z}$ is a matrix, since it does not have to be just one variable. Using the correlation matrix $R$ we can do the following:
$$r_{XY|\mathbf{Z}} = \frac{r_{XY} - \mathbf{r}_{X\mathbf{Z}} R_{\mathbf{Z}}^{-1} \mathbf{r}_{\mathbf{Z}Y}}{\sqrt{\left(1 - \mathbf{r}_{X\mathbf{Z}} R_{\mathbf{Z}}^{-1} \mathbf{r}_{\mathbf{Z}X}\right)\left(1 - \mathbf{r}_{Y\mathbf{Z}} R_{\mathbf{Z}}^{-1} \mathbf{r}_{\mathbf{Z}Y}\right)}}.$$
Here $r_{XY}$ is the correlation between variables $X$ and $Y$, $\mathbf{r}_{X\mathbf{Z}}$ and $\mathbf{r}_{Y\mathbf{Z}}$ denote the vectors of correlations between $X$ & $\mathbf{Z}$ and $Y$ & $\mathbf{Z}$, and $R_{\mathbf{Z}}$ denotes the correlation sub-matrix of the variables in $\mathbf{Z}$, whose element in the $i$-th row and $j$-th column is the correlation between the $i$-th and $j$-th conditioning variables. $|\mathbf{Z}|$ denotes the cardinality of $\mathbf{Z}$, i.e. the number of conditioning variables.
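A hedged sketch of the matrix formula above follows; it is not the package's partialcor() code, and the name partialcor_matrix_sketch is made up for illustration.

partialcor_matrix_sketch <- function(R, indx, indy, indz) {
  rxz <- R[indx, indz, drop = FALSE]               # correlations of X with Z
  ryz <- R[indy, indz, drop = FALSE]               # correlations of Y with Z
  Rz_inv <- solve( R[indz, indz, drop = FALSE] )   # inverse of the Z sub-matrix
  num <- R[indx, indy] - rxz %*% Rz_inv %*% t(ryz)
  den <- sqrt( (1 - rxz %*% Rz_inv %*% t(rxz)) * (1 - ryz %*% Rz_inv %*% t(ryz)) )
  as.numeric(num / den)                            # the partial correlation
}
r <- cor(iris[, 1:4])
partialcor_matrix_sketch(r, 1, 2, 3:4)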
The partial correlation coefficient and the p-value for the test of zero partial correlation.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
r <- cor(iris[, 1:4])
partialcor(r, 1, 2, 3:4, 150)
Partial correlation matrix.
pcormat(x, type = "pearson")
x | A numerical matrix with several columns.
type | The type of partial correlation matrix to compute, "pearson" or "spearman".
The function computes the partial correlation matrix. Given the correlation matrix of the data, it returns the matrix of partial correlations. Each entry in the final matrix is the partial correlation between a pair of variables conditional on all the remaining variables.
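Below is a hedged sketch of one standard way to obtain such a matrix, not necessarily how pcormat() is coded: invert the correlation matrix and rescale, since $-\omega_{ij} / \sqrt{\omega_{ii}\omega_{jj}}$, with $\omega_{ij}$ the entries of the inverse, is the partial correlation of variables $i$ and $j$ given all the others. The name pcormat_sketch is made up for illustration.

pcormat_sketch <- function(x) {
  omega <- solve( cor(x) )                          # inverse of the correlation matrix
  pcor <- - omega / sqrt( diag(omega) %o% diag(omega) )
  diag(pcor) <- 1                                   # set the diagonal to 1
  pcor
}
pcormat_sketch( iris[, 1:4] )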
The partial correlation matrix.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
pcormat( iris[, 1:4] )
Permutation p-value for many correlation coefficients.
permcorrels(y, x, B = 999)
y | A numerical vector.
x | A numerical matrix with many columns.
B | The number of permutations to perform.
This is the same function as correls, only this time the p-values are produced via permutations and no confidence intervals are returned.
A matrix with 2 columns, the correlations and their permutation based p-values.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
y <- rnorm(40)
x <- matrix( rnorm(40 * 1000), ncol = 1000 )
a <- permcorrels(y, x)
Permutation p-value for the correlation coefficient.
permcor(x, B = 999, fast = TRUE)
permcor2(x, B = 999)
permcor3(x, B = 999)
x | A numerical matrix with two columns.
B | The number of permutations to perform.
fast | If you want the C++ implementation leave this TRUE. It is about 2 times faster.
These are permutation based p-values for the test that the true correlation coefficient is zero. The function "permcor2()" is a vectorised version of "permcor3()", whereas the function "permcor()" does the following.
Instead of transforming the data under the null hypothesis and re-sampling with replacement, we can permute the observations. The basic difference is that the data are assumed to be under the null hypothesis already. Secondly, what we have to do is destroy the pairs. For example, the pairs (a, b), (c, d) and (e, f) in one permutation can become (c, b), (a, f) and (e, d). This is repeated many times, say $B$, and then we have $B$ pseudo-samples again. Everything else is the same as in the bootstrap case. A trick is that we need not change the order of both variables; permuting just one of them is enough, which speeds up the process. And guess what, it is faster than the bootstrap: it does not require the data to be transformed under the null hypothesis and you only need to permute one variable, in contrast to the bootstrap case, where you must resample from both variables. See Chatzipantsiou et al. (2019) for more details. A sketch of this permutation scheme is given below.
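The following is a minimal sketch of the permutation test described above, not the package's permcor() code; the name permcor_sketch is made up for illustration.

permcor_sketch <- function(x, B = 999) {
  robs <- abs( cor(x[, 1], x[, 2]) )                 # observed correlation
  rb <- numeric(B)
  for (i in 1:B) {
    rb[i] <- abs( cor( x[, 1], sample(x[, 2]) ) )    # permute the second column only
  }
  c( correlation = cor(x[, 1], x[, 2]),
     p.value = ( sum(rb >= robs) + 1 ) / (B + 1) )   # permutation p-value
}
permcor_sketch( iris[, 1:2] )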
The correlation coefficient and the permutation based p-value for the test of zero correlation.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2019). Extremely efficient permutation and bootstrap hypothesis tests using R. Journal of Modern Applied Statistical Methods, 18(2): eP2898.
permcor( iris[, 1:2] )
Squared multivariate correlation between two sets of variables.
sq.correl(y, x)
y |
A numerical matrix. |
x |
A numerical matrix. |
Mardia, Kent and Bibby (1979, pg. 171) defined two squared multiple correlation coefficients between the dependent variable $\mathbf{Y}$ and the independent variable $\mathbf{X}$. They mention that these are analogues of the coefficient of determination in univariate regression. Assume that the multivariate regression model is written as
$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{U},$$
where $\mathbf{U}$ is the matrix of residuals. Then they write
$$\mathbf{D} = \left(\mathbf{Y}^T\mathbf{Y}\right)^{-1}\hat{\mathbf{U}}^T\hat{\mathbf{U}},$$
with $\hat{\mathbf{U}} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}$ and $\hat{\mathbf{B}} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{Y}$. The matrix $\mathbf{D}$ is a generalisation of $1 - R^2$ in the univariate case. Mardia, Kent and Bibby (1979, pg. 171) mention that the dependent variable $\mathbf{Y}$ has to be centred.
The squared multivariate correlation should lie between 0 and 1, and this property is satisfied by the trace correlation $r_T$ and the determinant correlation $r_D$, defined as
$$r_T^2 = d^{-1}\,\text{tr}\left(\mathbf{I} - \mathbf{D}\right) \quad \text{and} \quad r_D^2 = \det\left(\mathbf{I} - \mathbf{D}\right)$$
respectively, where $d$ denotes the dimensionality of $\mathbf{Y}$. So, high values indicate that a high proportion of the variance of the dependent variables is explained. Alternatively, one can calculate the trace and the determinant of the matrix $\mathbf{I} - \mathbf{D}$. Try something else also: use the function "sq.correl()" in a univariate regression example and then calculate the $R^2$ for the same dataset. Try this example again, but without centring the dependent variable. In addition, take two variables, calculate their correlation coefficient, square it and compare it with the output of "sq.correl()". A sketch of the computation is given below.
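Below is a hedged sketch of the quantities above, not necessarily sq.correl()'s own code; the name sq_correl_sketch is made up for illustration, and the regression is fitted with lm() including an intercept.

sq_correl_sketch <- function(y, x) {
  y <- scale( as.matrix(y), center = TRUE, scale = FALSE )  # centre the dependent variables
  x <- as.matrix(x)
  u <- resid( lm(y ~ x) )                    # residual matrix of the multivariate regression
  d <- solve( crossprod(y), crossprod(u) )   # D = (Y'Y)^{-1} U'U
  imd <- diag( ncol(y) ) - d                 # the matrix I - D
  c( trace = sum( diag(imd) ) / ncol(y),     # squared trace correlation
     determinant = det(imd) )                # squared determinant correlation
}
sq_correl_sketch( iris[, 1:2], iris[, 3:4] )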
A vector with two values, the trace correlation and the determinant correlation.
Michail Tsagris
R implementation and documentation: Michail Tsagris [email protected].
sq.correl( iris[, 1:2], iris[, 3:4] )