Introduction to moderncor

The moderncor package provides a single unified interface for computing a wide variety of classical and modern correlation coefficients. This guide introduces the core features of the package.

Installation and Setup

Once installed, you can load the package as follows:

library(moderncor)

Basic Usage with Vectors

Let’s generate some synthetic data with a non-linear parabolic relationship where \(y = x^2 + \epsilon\):

set.seed(123)
x <- runif(100, -1, 1)
y <- x^2 + rnorm(100, sd = 0.1)

Because the relationship is non-linear and symmetric, classical Pearson correlation will fail to capture the dependence:

moderncor(x, y, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.0168
#>   Statistic: 0.1667
#>   P-value:   0.868
#>   Sample size (n): 100

With moderncor, you can compute distance correlation (dcor) or Chatterjee’s Xi correlation (xi) using the same interface to capture the non-linear relationship:

# Distance Correlation (captures non-linear dependencies)
moderncor(x, y, method = "dcor")
#> 
#>    Distance Correlation 
#> 
#>   Estimate:  0.4673
#>   Statistic: 0.4673
#>   P-value:   0.005
#>   Sample size (n): 100
# Chatterjee's Xi (captures functional dependence)
moderncor(x, y, method = "xi")
#> 
#>    Chatterjee's Xi Correlation 
#> 
#>   Estimate:  0.6829
#>   P-value:   < 2.2e-16
#>   Sample size (n): 100

Classical Methods

moderncor supports Pearson, Spearman, and Kendall correlations via the same interface as base R cor():

moderncor(x, y, method = "spearman")
#> 
#>    Spearman Rank Correlation 
#> 
#>   Estimate:  -0.0105
#>   Statistic: 168404
#>   P-value:   0.9171
#>   Sample size (n): 100
moderncor(x, y, method = "kendall")
#> 
#>    Kendall Rank Correlation 
#> 
#>   Estimate:  -0.0129
#>   Statistic: -0.1906
#>   P-value:   0.8488
#>   Sample size (n): 100

Matrix and Data Frame Input

If you pass a matrix or a data.frame to moderncor(), it will compute the pairwise correlation matrix of the columns:

# Compute Spearman correlation matrix for iris dataset
res_mat <- moderncor(iris[, 1:4], method = "spearman")
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
res_mat
#> 
#>    Spearman Rank Correlation 
#> 
#>   Correlation Matrix (n = 150):
#> 
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       1.0000     -0.1668       0.8819      0.8343
#> Sepal.Width       -0.1668      1.0000      -0.3096     -0.2890
#> Petal.Length       0.8819     -0.3096       1.0000      0.9377
#> Petal.Width        0.8343     -0.2890       0.9377      1.0000
#> 
#>   P-value Matrix:
#> 
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       0.0000      0.0414        0e+00       0e+00
#> Sepal.Width        0.0414      0.0000        1e-04       3e-04
#> Petal.Length       0.0000      0.0001        0e+00       0e+00
#> Petal.Width        0.0000      0.0003        0e+00       0e+00

Tidy Output using as.data.frame

You can convert the output of moderncor() to a tidy data frame using as.data.frame(). This is particularly useful for correlation matrices:

# Convert correlation matrix to tidy data frame
df <- as.data.frame(res_mat)
head(df)
#>           var1         var2          r      p.value
#> 1  Sepal.Width Sepal.Length -0.1667777 4.136799e-02
#> 2 Petal.Length Sepal.Length  0.8818981 3.443087e-50
#> 3  Petal.Width Sepal.Length  0.8342888 4.189447e-40
#> 4 Sepal.Length  Sepal.Width -0.1667777 4.136799e-02
#> 5 Petal.Length  Sepal.Width -0.3096351 1.153938e-04
#> 6  Petal.Width  Sepal.Width -0.2890317 3.342981e-04

This returns a data frame containing the variables being compared (var1 and var2), the correlation coefficient (r), and p-values (p.value) if they were calculated.

Controlling P-value Computation

For large datasets, calculating p-values for modern methods (such as MIC, HSIC, or Mutual Information) can be slow because they rely on permutation tests. You can disable p-value calculations by setting p_value = FALSE for a significant speedup:

# Compute only the estimate, without p-values
moderncor(x, y, method = "dcor", p_value = FALSE)
#> 
#>    Distance Correlation 
#> 
#>   Estimate:  0.4673
#>   Sample size (n): 100

Robust Correlations

Robust correlations are less sensitive to outliers than classical methods. moderncor provides three robust correlation methods.

Biweight Midcorrelation

Biweight midcorrelation down-weights observations far from the median using a biweight function. It requires no additional packages:

set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10))  # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))

moderncor(x_out, y_out, method = "biweight")
#> 
#>    Biweight Midcorrelation 
#> 
#>   Estimate:  0.1045
#>   Statistic: 1.0405
#>   P-value:   0.3007
#>   Sample size (n): 100

Compare with Pearson, which is strongly influenced by outliers:

moderncor(x_out, y_out, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.8485
#>   Statistic: 15.8745
#>   P-value:   < 2.2e-16
#>   Sample size (n): 100

Percentage Bend Correlation

Percentage bend correlation trims a specified proportion of the most extreme values (requires the WRS2 package):

moderncor(x_out, y_out, method = "percentage_bend")
#> 
#>    Percentage Bend Correlation 
#> 
#>   Estimate:  0.1893
#>   Statistic: 1.9082
#>   P-value:   0.0593
#>   Sample size (n): 100

Winsorized Correlation

Winsorized correlation replaces extreme values with the nearest non-extreme values (requires WRS2):

moderncor(x_out, y_out, method = "winsorized")
#> 
#>    Winsorized Correlation 
#> 
#>   Estimate:  0.12
#>   Statistic: 1.1964
#>   P-value:   0.2364
#>   Sample size (n): 100

Ordinal Correlations

Ordinal correlations are designed for ordered categorical (Likert-scale) data. They model the data as discretized versions of underlying continuous normal distributions.

Polychoric Correlation

Polychoric correlation is appropriate when both variables are ordinal with more than two categories (requires psych):

# Simulate ordinal data (e.g., Likert scale responses)
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_ord <- cut(z1, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
y_ord <- cut(z2, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)

moderncor(x_ord, y_ord, method = "polychoric")
#> 
#>    Polychoric Correlation 
#> 
#>   Estimate:  0.6411
#>   Sample size (n): 200

Tetrachoric Correlation

Tetrachoric correlation is the special case of polychoric for binary (0/1) data (requires psych):

x_bin <- as.integer(z1 > 0)
y_bin <- as.integer(z2 > 0)

moderncor(x_bin, y_bin, method = "tetrachoric")
#> 
#>    Tetrachoric Correlation 
#> 
#>   Estimate:  0.5538
#>   Sample size (n): 200

Partial and Semi-Partial Correlations

Partial and semi-partial correlations measure the relationship between two variables while controlling for one or more confounding variables (requires ppcor).

Partial Correlation

Partial correlation removes the influence of z from both x and y:

set.seed(7)
z <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8)  # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8)  # y correlates with z

# Raw correlation (inflated by shared z)
moderncor(x_p, y_p, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.4122
#>   Statistic: 4.4794
#>   P-value:   2.028e-05
#>   Sample size (n): 100
# Partial correlation controlling for z
moderncor(x_p, y_p, method = "partial", z = z)
#> 
#>    Partial Correlation (Pearson) 
#> 
#>   Estimate:  0.1383
#>   Statistic: 1.375
#>   P-value:   0.1691
#>   Sample size (n): 100

Semi-Partial Correlation

Semi-partial correlation removes the influence of z from y only (also requires ppcor):

moderncor(x_p, y_p, method = "semi_partial", z = z)
#> 
#>    Semi-partial Correlation (Pearson) 
#> 
#>   Estimate:  0.1097
#>   Statistic: 1.0867
#>   P-value:   0.2772
#>   Sample size (n): 100

The method_partial argument selects which base correlation to use ("pearson", "spearman", or "kendall"):

moderncor(x_p, y_p, method = "partial", z = z, method_partial = "spearman")
#> 
#>    Partial Correlation (Spearman) 
#> 
#>   Estimate:  0.1169
#>   Statistic: 1.1588
#>   P-value:   0.2465
#>   Sample size (n): 100

Nonparametric Dependence Measures

Ball Correlation

Ball correlation is a nonparametric measure of dependence based on ball covariance (requires Ball):

moderncor(x, y, method = "ball")
#> 
#>    Ball Correlation 
#> 
#>   Estimate:  0.1317
#>   Statistic: 0.0044
#>   P-value:   0.01
#>   Sample size (n): 100

Bergsma-Dassios Tau*

Bergsma-Dassios \(\tau^*\) is a nonparametric measure of association that equals zero if and only if x and y are independent (requires TauStar):

moderncor(x, y, method = "tau_star")
#> 
#>    Bergsma-Dassios Tau* 
#> 
#>   Estimate:  0.0994
#>   Statistic: 0.0994
#>   P-value:   1.395e-07
#>   Sample size (n): 100

Querying Available Methods

To see all supported correlation methods and their required packages:

available_methods()
#>             method                                         label  package
#> 1          pearson            Pearson Product-Moment Correlation    stats
#> 2         spearman                     Spearman Rank Correlation    stats
#> 3          kendall                      Kendall Rank Correlation    stats
#> 4             dcor                          Distance Correlation   energy
#> 5              mic         Maximal Information Coefficient (MIC)  minerva
#> 6             hsic Hilbert-Schmidt Independence Criterion (HSIC)    dHSIC
#> 7               xi                   Chatterjee's Xi Correlation    XICOR
#> 8        hoeffding                                 Hoeffding's D    Hmisc
#> 9      mutual_info                            Mutual Information infotheo
#> 10        biweight                       Biweight Midcorrelation built-in
#> 11 percentage_bend                   Percentage Bend Correlation     WRS2
#> 12      winsorized                        Winsorized Correlation     WRS2
#> 13      polychoric                        Polychoric Correlation    psych
#> 14     tetrachoric                       Tetrachoric Correlation    psych
#> 15         partial                           Partial Correlation    ppcor
#> 16    semi_partial                      Semi-partial Correlation    ppcor
#> 17            ball                              Ball Correlation     Ball
#> 18        tau_star                          Bergsma-Dassios Tau*  TauStar
#>           type
#> 1      classic
#> 2      classic
#> 3      classic
#> 4       modern
#> 5       modern
#> 6       modern
#> 7       modern
#> 8       modern
#> 9  information
#> 10      robust
#> 11      robust
#> 12      robust
#> 13     ordinal
#> 14     ordinal
#> 15     partial
#> 16     partial
#> 17       other
#> 18       other

To get details on a specific method:

method_info("dcor")
#> $method
#> [1] "dcor"
#> 
#> $label
#> [1] "Distance Correlation"
#> 
#> $package
#> [1] "energy"
#> 
#> $description
#> [1] "Measures both linear and nonlinear dependence. Zero if and only if independent."
#> 
#> $range
#> [1] "[0, 1]"
#> 
#> $assumptions
#> [1] "Continuous variables."

Categorical Association Measures

For categorical variables (factors or contingency tables), use moderncor_cat(). See vignette("categorical") for a full introduction to categorical association measures.

# Quick preview: Cramér's V for two factor variables
moderncor_cat(factor(mtcars$cyl), factor(mtcars$gear), method = "cramers_v")
#> 
#>    Cramer's V 
#> 
#>   Estimate:  0.5309
#>   Statistic: 18.0364
#>   P-value:   0.001214
#>   Sample size (n): 32