---
title: "Correlation Types"
output:
  rmarkdown::html_vignette:
    toc: true
    fig_width: 10.08
    fig_height: 6
tags: [r, correlation, types]
vignette: >
  %\VignetteIndexEntry{Correlation Types}
  \usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
bibliography: bibliography.bib
---

```{r, include=FALSE}
library(knitr)
options(knitr.kable.NA = "")
knitr::opts_chunk$set(
  comment = ">",
  out.width = "100%",
  message = FALSE,
  warning = FALSE,
  dpi = 450
)
options(digits = 2)

set.seed(333)

# Only evaluate chunks when all suggested packages are available
if (!requireNamespace("see", quietly = TRUE) ||
  !requireNamespace("datawizard", quietly = TRUE) ||
  !requireNamespace("poorman", quietly = TRUE) ||
  !requireNamespace("ggplot2", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  library(see)
  library(datawizard)
  library(poorman)
  library(ggplot2)
}
```

---

This vignette can be cited as:

```{r cite}
citation("correlation")
```

---

## Different Methods for Correlations

Correlation tests are arguably among the most commonly used statistical procedures and serve as a basis for many applications, such as exploratory data analysis, structural modeling, and data engineering. In this context, we present **correlation**, a toolbox for the R language [@Rteam] and part of the [**easystats**](https://github.com/easystats/easystats) collection, focused on correlation analysis. Its goal is to be lightweight and easy to use, and to allow the computation of many different kinds of correlations, such as:

- **Pearson's correlation**: This is the most common correlation method. It corresponds to the covariance of the two variables normalized (i.e., divided) by the product of their standard deviations.

$$r_{xy} = \frac{cov(x,y)}{SD_x \times SD_y}$$

- **Spearman's rank correlation**: A non-parametric measure of correlation. The Spearman correlation between two variables is equal to the Pearson correlation between the rank scores of those two variables. While Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). Confidence Intervals (CI) for Spearman's correlations are computed using the @fieller1957tests correction [see @bishara2017confidence].

$$r_{s_{xy}} = \frac{cov(rank_x, rank_y)}{SD(rank_x) \times SD(rank_y)}$$

- **Kendall's rank correlation**: In the normal case, Kendall's correlation is preferred to Spearman's correlation because of its smaller gross error sensitivity (GES) and smaller asymptotic variance (AV), which make it more robust and more efficient. However, Kendall's tau is less direct to interpret than Spearman's rho: it quantifies the difference between the percentages of concordant and discordant pairs among all possible pairs of observations. Confidence Intervals (CI) for Kendall's correlations are computed using the @fieller1957tests correction [see @bishara2017confidence].
For each pair of observations (i, j) of two variables (x, y), it is defined as follows:

$$\tau_{xy} = \frac{2}{n(n-1)}\sum_{i<j} sign(x_i - x_j) \times sign(y_i - y_j)$$

[...]

```{r, eval=FALSE}
# [...] The start of this chunk is truncated. The surviving tail of the
# simulation loop is kept as a comment so that the rest of the chunk parses:
#   0, 1, 0)*(" )) {
#     data <- rbind(data, generate_results(r, n, transformation = transformation))
#   } } }

# Plot each estimate against the true correlation r, one panel per transformation
data %>%
  datawizard::reshape_longer(
    select = -c("n", "r", "transformation"),
    names_to = "Type",
    values_to = "Estimation"
  ) %>%
  # relevel() only uses its first reference level, so "Pearson" is moved first
  mutate(Type = relevel(as.factor(Type), ref = "Pearson")) %>%
  ggplot(aes(x = r, y = Estimation, fill = Type)) +
  geom_smooth(aes(color = Type), method = "loess", alpha = 0, na.rm = TRUE) +
  geom_vline(aes(xintercept = 0.5), linetype = "dashed") +
  geom_hline(aes(yintercept = 0.5), linetype = "dashed") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  see::theme_modern() +
  scale_color_flat_d(palette = "rainbow") +
  scale_fill_flat_d(palette = "rainbow") +
  facet_wrap(~transformation)

# Regress the true r on each estimate, with one slope per correlation type
model <- data %>%
  datawizard::reshape_longer(
    select = -c("n", "r", "transformation"),
    names_to = "Type",
    values_to = "Estimation"
  ) %>%
  lm(r ~ Type / Estimation, data = .) %>%
  parameters::parameters()

# Rows 6 to 10 hold the per-type slopes; sort them from largest to smallest
arrange(model[6:10, ], desc(Coefficient))
```

As we can see, **distance** correlation captures the strength of the association even for severely non-linear relationships.

# References