---
title: "Introduction to moderncor"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to moderncor}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `moderncor` package provides a single unified interface for computing a wide variety of classical and modern correlation coefficients. This guide introduces the core features of the package.

## Installation and Setup

Once installed, you can load the package as follows:

```{r setup}
library(moderncor)
```

## Basic Usage with Vectors

Let's generate some synthetic data with a non-linear parabolic relationship where $y = x^2 + \epsilon$:

```{r setup-data}
set.seed(123)
x <- runif(100, -1, 1)
y <- x^2 + rnorm(100, sd = 0.1)
```

Because the relationship is non-linear and symmetric, classical Pearson correlation will fail to capture the dependence:

```{r pearson}
moderncor(x, y, method = "pearson")
```

With `moderncor`, you can compute distance correlation (`dcor`) or Chatterjee's Xi correlation (`xi`) using the same interface to capture the non-linear relationship:

```{r modern, eval = requireNamespace("energy", quietly = TRUE)}
# Distance Correlation (captures non-linear dependencies)
moderncor(x, y, method = "dcor")
```

```{r xi, eval = requireNamespace("XICOR", quietly = TRUE)}
# Chatterjee's Xi (captures functional dependence)
moderncor(x, y, method = "xi")
```

## Classical Methods

`moderncor` supports Pearson, Spearman, and Kendall correlations via the same interface as base R `cor()`:

```{r classical}
moderncor(x, y, method = "spearman")
moderncor(x, y, method = "kendall")
```

## Matrix and Data Frame Input

If you pass a matrix or a `data.frame` to `moderncor()`, it will compute the pairwise correlation matrix of the columns:

```{r matrix-input}
# Compute Spearman correlation matrix for iris dataset
res_mat <- moderncor(iris[, 1:4], method = "spearman")
res_mat
```

## Tidy Output using `as.data.frame`

You can convert the output of `moderncor()` to a tidy data frame using `as.data.frame()`. This is particularly useful for correlation matrices:

```{r as-data-frame}
# Convert correlation matrix to tidy data frame
df <- as.data.frame(res_mat)
head(df)
```

This returns a data frame containing the variables being compared (`var1` and `var2`), the correlation coefficient (`r`), and p-values (`p.value`) if they were calculated.

## Controlling P-value Computation

For large datasets, calculating p-values for modern methods (such as MIC, HSIC, or Mutual Information) can be slow because they rely on permutation tests. You can disable p-value calculations by setting `p_value = FALSE` for a significant speedup:

```{r no-pvalue, eval = requireNamespace("energy", quietly = TRUE)}
# Compute only the estimate, without p-values
moderncor(x, y, method = "dcor", p_value = FALSE)
```

## Robust Correlations

Robust correlations are less sensitive to outliers than classical methods. `moderncor` provides three robust correlation methods.

### Biweight Midcorrelation

Biweight midcorrelation down-weights observations far from the median using a biweight function.
It requires no additional packages:

```{r biweight}
set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10)) # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))
moderncor(x_out, y_out, method = "biweight")
```

Compare with Pearson, which is strongly influenced by outliers:

```{r biweight-vs-pearson}
moderncor(x_out, y_out, method = "pearson")
```

### Percentage Bend Correlation

Percentage bend correlation down-weights a specified proportion of the observations that deviate most from the median (requires the `WRS2` package):

```{r percentage-bend, eval = requireNamespace("WRS2", quietly = TRUE)}
moderncor(x_out, y_out, method = "percentage_bend")
```

### Winsorized Correlation

Winsorized correlation replaces extreme values with the nearest non-extreme values (requires `WRS2`):

```{r winsorized, eval = requireNamespace("WRS2", quietly = TRUE)}
moderncor(x_out, y_out, method = "winsorized")
```

## Ordinal Correlations

Ordinal correlations are designed for ordered categorical (Likert-scale) data. They model the data as discretized versions of underlying continuous normal distributions.

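
To make the latent-variable idea concrete, the following is a minimal base R sketch, not part of `moderncor`. It simulates two correlated normal variables, cuts them into four ordered categories, and compares the Pearson correlation of the latent values with that of the integer category codes. All names (`lat1`, `lat2`, `ord1`, `ord2`, `thresholds`) and the chosen thresholds are illustrative assumptions.

```{r latent-normal-sketch}
# Sketch only: hypothetical names, base R, no moderncor calls.
set.seed(2024)
n <- 5000
rho <- 0.7

# Latent bivariate normal variables with population correlation rho
lat1 <- rnorm(n)
lat2 <- rho * lat1 + rnorm(n, sd = sqrt(1 - rho^2))

# Observed ordinal responses: latent values cut at fixed thresholds
thresholds <- c(-Inf, -1, 0, 1, Inf)
ord1 <- cut(lat1, breaks = thresholds, labels = FALSE)
ord2 <- cut(lat2, breaks = thresholds, labels = FALSE)

r_latent  <- cor(lat1, lat2)  # correlation of the unobserved continuous variables
r_ordinal <- cor(ord1, ord2)  # Pearson correlation of the integer category codes

c(latent = r_latent, ordinal = r_ordinal)
```

Pearson correlation on the category codes is typically attenuated relative to the latent correlation. Ordinal methods aim to estimate the latent correlation instead.
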

### Polychoric Correlation

Polychoric correlation is appropriate when both variables are ordinal with more than two categories (requires `psych`):

```{r polychoric, eval = requireNamespace("psych", quietly = TRUE)}
# Simulate ordinal data (e.g., Likert scale responses)
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_ord <- cut(z1, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
y_ord <- cut(z2, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
moderncor(x_ord, y_ord, method = "polychoric")
```

### Tetrachoric Correlation

Tetrachoric correlation is the special case of polychoric for binary (0/1) data (requires `psych`):

```{r tetrachoric, eval = requireNamespace("psych", quietly = TRUE)}
x_bin <- as.integer(z1 > 0)
y_bin <- as.integer(z2 > 0)
moderncor(x_bin, y_bin, method = "tetrachoric")
```

## Partial and Semi-Partial Correlations

Partial and semi-partial correlations measure the relationship between two variables while controlling for one or more confounding variables (requires `ppcor`).

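
As a rough illustration of what "controlling for" means, the sketch below uses the textbook residual definition for the Pearson case in base R. It is an assumption-laden illustration and not necessarily how `moderncor` or `ppcor` compute these quantities internally. The names `x_demo`, `y_demo`, `z_demo`, `res_x`, and `res_y` are hypothetical.

```{r partial-residual-sketch}
# Sketch only: hypothetical names; residual-based definition in base R.
set.seed(99)
n <- 1000
z_demo <- rnorm(n)                            # confounder
x_demo <- 0.6 * z_demo + rnorm(n, sd = 0.8)   # depends on z_demo only
y_demo <- 0.6 * z_demo + rnorm(n, sd = 0.8)   # depends on z_demo only

# Residuals after regressing each variable on the confounder
res_x <- resid(lm(x_demo ~ z_demo))
res_y <- resid(lm(y_demo ~ z_demo))

r_raw     <- cor(x_demo, y_demo)  # inflated by the shared confounder
r_partial <- cor(res_x, res_y)    # confounder removed from both variables
r_semi    <- cor(x_demo, res_y)   # confounder removed from y_demo only

c(raw = r_raw, partial = r_partial, semi_partial = r_semi)
```

In this simulation the two variables are related only through the confounder, so the partial and semi-partial values should sit near zero while the raw correlation does not.
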

### Partial Correlation

Partial correlation removes the influence of `z` from *both* `x` and `y`:

```{r partial, eval = requireNamespace("ppcor", quietly = TRUE)}
set.seed(7)
z <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8) # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8) # y correlates with z

# Raw correlation (inflated by shared z)
moderncor(x_p, y_p, method = "pearson")
```

```{r partial-controlled, eval = requireNamespace("ppcor", quietly = TRUE)}
# Partial correlation controlling for z
moderncor(x_p, y_p, method = "partial", z = z)
```

### Semi-Partial Correlation

Semi-partial correlation removes the influence of `z` from `y` only (also requires `ppcor`):

```{r semi-partial, eval = requireNamespace("ppcor", quietly = TRUE)}
moderncor(x_p, y_p, method = "semi_partial", z = z)
```

The `method_partial` argument selects which base correlation to use (`"pearson"`, `"spearman"`, or `"kendall"`):

```{r partial-spearman, eval = requireNamespace("ppcor", quietly = TRUE)}
moderncor(x_p, y_p, method = "partial", z = z, method_partial = "spearman")
```

## Nonparametric Dependence Measures

### Ball Correlation

Ball correlation is a nonparametric measure of dependence based on ball covariance (requires `Ball`):

```{r ball, eval = requireNamespace("Ball", quietly = TRUE)}
moderncor(x, y, method = "ball")
```

### Bergsma-Dassios Tau*

Bergsma-Dassios $\tau^*$ is a nonparametric measure of association that, under mild conditions on the joint distribution, equals zero if and only if `x` and `y` are independent (requires `TauStar`):

```{r tau-star, eval = requireNamespace("TauStar", quietly = TRUE)}
moderncor(x, y, method = "tau_star")
```

## Querying Available Methods

To see all supported correlation methods and their required packages:

```{r available-methods}
available_methods()
```

To get details on a specific method:

```{r method-info}
method_info("dcor")
```

## Categorical Association Measures

For categorical variables (factors or contingency tables), use `moderncor_cat()`.
See `vignette("categorical")` for a full introduction to categorical association measures.

```{r categorical-preview, eval = requireNamespace("DescTools", quietly = TRUE)}
# Quick preview: Cramér's V for two factor variables
moderncor_cat(factor(mtcars$cyl), factor(mtcars$gear), method = "cramers_v")
```
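
For orientation, Cramér's V can also be sketched by hand in base R from the Pearson chi-squared statistic, using $V = \sqrt{\chi^2 / (n \,(\min(r, c) - 1))}$ for an $r \times c$ table with $n$ observations. This is the plain, uncorrected textbook formula. Whether `moderncor_cat()` applies a bias correction or other adjustments is not stated here, so its result may differ. The names `tab`, `chi2`, `n_obs`, `k`, and `v` are illustrative.

```{r cramers-v-sketch}
# Sketch only: uncorrected Cramér's V from base R; may differ from moderncor_cat().
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Pearson chi-squared statistic (the small-expected-count warning is silenced)
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)

n_obs <- sum(tab)           # total number of observations
k <- min(dim(tab)) - 1      # smaller table dimension minus one

v <- sqrt(chi2 / (n_obs * k))
v
```
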