Title: | Efficient Implementation of Kendall's Correlation Coefficient Computation |
---|---|
Description: | The computational complexity of the implemented algorithm for Kendall's correlation is O(n log(n)), which is faster than the base R implementation with a computational complexity of O(n^2). For small vectors (i.e., less than 100 observations), the time difference is negligible. However, for larger vectors, the speed difference can be substantial and the numerical difference is minimal. The references are Knight (1966) <doi:10.2307/2282833>, Abrevaya (1999) <doi:10.1016/S0165-1765(98)00255-9>, Christensen (2005) <doi:10.1007/BF02736122> and Emara (2024) <https://learningcpp.org/>. This implementation is described in Vargas Sepulveda (2024) <doi:10.48550/arXiv.2408.09618>. |
Authors: | Mauricio Vargas Sepulveda [aut, cre] , Loader Catherine [ctb] (original stirlerr implementations in C (2000)), Ross Ihaka [ctb] (original chebyshev_eval, gammafn and lgammacor implementations in C (1998)) |
Maintainer: | Mauricio Vargas Sepulveda <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.4.0 |
Built: | 2024-12-25 07:08:10 UTC |
Source: | CRAN |
Maintainer: Mauricio Vargas Sepulveda [email protected] (ORCID)
Other contributors:
Loader Catherine (original stirlerr implementations in C (2000)) [contributor]
Ross Ihaka (original chebyshev_eval, gammafn and lgammacor implementations in C (1998)) [contributor]
Useful links:
Report bugs at https://github.com/pachadotdev/capybara/issues
A dataset containing life expectancy and cigarettes per day.
cigarettes
A data frame with 15 rows and 2 variables:
Life expectancy in years.
Cigarettes smoked per day.
Real Statistics Using Excel (https://real-statistics.com/correlation/kendalls-tau-correlation/kendalls-correlation-testing-with-ties/).
cigarettes
kendall_cor() calculates the Kendall correlation coefficient between two
numeric vectors. It uses the algorithm described in Knight (1966), which is
based on the number of concordant and discordant pairs. The computational
complexity of the algorithm is O(n log(n)), which is faster than the base R
implementation in stats::cor(..., method = "kendall"), whose computational
complexity is O(n^2). For small vectors (i.e., fewer than 100 observations),
the time difference is negligible. However, for larger vectors, the
difference can be substantial.
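To make the two complexity classes concrete, the following base R sketch contrasts a naive O(n^2) count of concordant and discordant pairs with an O(n log(n)) count of inversions via merge sort, which is the idea behind Knight (1966). It is an illustration only, not the package's compiled code: the helper names `tau_naive()`, `count_inversions()` and `tau_mergesort()` are hypothetical, and the sketch assumes no ties and no missing values.

```r
# Naive O(n^2) version: classify every pair as concordant or discordant.
tau_naive <- function(x, y) {
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
    }
  }
  (concordant - discordant) / (n * (n - 1) / 2)
}

# Merge sort that also counts inversions (pairs that are out of order).
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2) return(list(sorted = v, inversions = 0))
  mid <- n %/% 2
  left <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1):n])
  l <- left$sorted
  r <- right$sorted
  merged <- numeric(n)
  inv <- left$inversions + right$inversions
  i <- 1
  j <- 1
  k <- 1
  while (i <= length(l) && j <= length(r)) {
    if (l[i] <= r[j]) {
      merged[k] <- l[i]
      i <- i + 1
    } else {
      # r[j] precedes every remaining element of l: each is one inversion.
      merged[k] <- r[j]
      j <- j + 1
      inv <- inv + (length(l) - i + 1)
    }
    k <- k + 1
  }
  # Copy whichever half still has elements left.
  if (i <= length(l)) merged[k:n] <- l[i:length(l)]
  if (j <= length(r)) merged[k:n] <- r[j:length(r)]
  list(sorted = merged, inversions = inv)
}

# O(n log(n)) version: after ordering by x, each inversion in y is a
# discordant pair, so tau = 1 - 2 * D / (n * (n - 1) / 2) without ties.
tau_mergesort <- function(x, y) {
  n <- length(x)
  discordant <- count_inversions(y[order(x)])$inversions
  1 - 4 * discordant / (n * (n - 1))
}

x <- c(1, 0, 2)
y <- c(5, 3, 4)
tau_naive(x, y)      # 2 concordant and 1 discordant pair: 1/3
tau_mergesort(x, y)  # 1 inversion among 3 pairs: 1/3
```

Both helpers should agree with stats::cor(x, y, method = "kendall") on tie-free data; handling ties requires the additional corrections described in Knight (1966), which this sketch leaves out.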
By construction, the implementation drops missing values on a pairwise
basis. This is the same as using
stats::cor(..., use = "pairwise.complete.obs").
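As a rough illustration of what pairwise deletion means for two vectors, the base R sketch below keeps only the positions where both values are observed and checks that this matches `use = "pairwise.complete.obs"`. It uses only stats::cor(); the example data are made up for the illustration.

```r
# Hypothetical data with one missing value in each vector.
x <- c(1, 0, 2, NA, 4)
y <- c(5, 3, 4, 1, NA)

# Keep only the pairs where both x and y are observed.
keep <- complete.cases(x, y)

tau_manual <- cor(x[keep], y[keep], method = "kendall")
tau_pairwise <- cor(x, y, method = "kendall", use = "pairwise.complete.obs")

# The two results agree because the same three pairs are used.
all.equal(tau_manual, tau_pairwise)
```

For a matrix input, the same filtering is applied separately to each pair of columns, so different entries of the resulting matrix can be based on different numbers of observations.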
kendall_cor(x, y = NULL)
x | a numeric vector or matrix. |
y | an optional numeric vector. |
A numeric value between -1 and 1 when x and y are vectors, or a matrix of such values when x is a matrix.
Knight, W. R. (1966). "A Computer Method for Calculating Kendall's Tau with Ungrouped Data". Journal of the American Statistical Association, 61(314), 436–439.
Abrevaya, J. (1999). "Computation of the Maximum Rank Correlation Estimator". Economics Letters, 62, 279–285.
Christensen, D. (2005). "Fast Algorithms for the Calculation of Kendall's Tau". Computational Statistics, 20, 51–62.
Emara (2024). Khufu: Object-Oriented Programming using C++. https://learningcpp.org/
# input vectors -> scalar output
x <- c(1, 0, 2)
y <- c(5, 3, 4)
kendall_cor(x, y)
# input matrix -> matrix output
x <- mtcars[, c("mpg", "cyl")]
kendall_cor(x)
kendall_cor_test() calculates the p-value for the Kendall correlation,
using exact values when the number of observations is fewer than 50. For
larger samples, it uses an approximation, as in base R.
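The distinction between exact and approximate p-values can be seen in base R itself, which the description uses as its point of comparison. The sketch below calls stats::cor.test() with `exact = TRUE` and `exact = FALSE` on a small simulated sample; it shows base R's behaviour only and makes no claim about how this package computes either value internally.

```r
# Small simulated sample without ties (continuous data).
set.seed(42)
x <- rnorm(10)
y <- x + rnorm(10)

# Exact p-value, feasible for small samples without ties.
exact_test <- cor.test(x, y, method = "kendall", exact = TRUE)

# Normal approximation, as used for larger samples.
approx_test <- cor.test(x, y, method = "kendall", exact = FALSE)

# Same tau estimate, slightly different p-values.
exact_test$estimate
exact_test$p.value
approx_test$p.value
```

With fewer than 50 tie-free observations, base R's default (`exact = NULL`) picks the exact computation, so `cor.test(x, y, method = "kendall")` returns the same p-value as `exact_test` here.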
kendall_cor_test(x, y, alternative = c("two.sided", "greater", "less"))
x | a numeric vector. |
y | a numeric vector. |
alternative | a character string specifying the alternative hypothesis. The possible values are "two.sided", "greater" and "less". |
A list with the following components:
statistic | The Kendall correlation coefficient. |
p_value | The p-value of the test. |
alternative | A character string describing the alternative hypothesis. |
Knight, W. R. (1966). "A Computer Method for Calculating Kendall's Tau with Ungrouped Data". Journal of the American Statistical Association, 61(314), 436–439.
Abrevaya, J. (1999). "Computation of the Maximum Rank Correlation Estimator". Economics Letters, 62, 279–285.
Christensen, D. (2005). "Fast Algorithms for the Calculation of Kendall's Tau". Computational Statistics, 20, 51–62.
Emara (2024). Khufu: Object-Oriented Programming using C++. https://learningcpp.org/
x <- c(1, 0, 2)
y <- c(5, 3, 4)
kendall_cor_test(x, y)