library(DFBA)

Theoretical Background

An important statistical question deals with the measurement and testing of an association between two paired continuous measures. The product-moment correlation r is a widely used statistic to examine bivariate associations. However, the product-moment correlation depends on Gaussian assumptions for statistical tests. It is also not a robust measure because it is strongly influenced by any extreme outlier score for either of the two variates. A rank-based statistic can avoid both the problem of outlier sensitivity and the problem of being dependent upon the Gaussian model. The dfba_bivariate_concordance() function provides a Bayesian distribution-free concordance metric for characterizing the association between the two measures.

To illustrate the nonparametric concepts of concordance and discordance, consider a specific example where there are five paired scores:

x	y
3.8	5.9
4.7	-4.1
4.7	7.3
4.7	7.3
11.8	38.9

The ranks for the x variate are 1, 3, 3, 3, and 5 and the corresponding ranks for y are 2, 1, 3.5, 3.5, and 5, so, the five pairs in terms of their ranks and represented as points are P₁ = (1, 2), P₂ = (3, 1), P₃ = (3, 3.5), P₄ = (3, 3.5) and P₅ = (5, 5). Let R_xi and R_yi be the respective rank values for the x and y variates for point i. The relationship between any two of these points Pi and Pj, is either (1) concordant if the sign of R_xi − R_xj is the same as the sign of R_yi − R_yj, (2) discordant if signs are different between R_xi − R_xj and R_yi − R_yj, or (3) null if either R_xi = R_xj or if R_yi = R_yj. For this example, there are ten possible comparisons among the five points; six are concordant, one is discordant, and there are three comparisons lost due to ties. In general, given n bivariate scores, there are n(n − 1)/2 total possible comparisons. When there are ties in the x variate, there is a loss of T_x comparisons, and when there are ties in the y variate, there are a separate T_y lost comparisons. Ties in both x and y are denoted as T_xy. The total number of possible comparisons, accounting for ties, is therefore: n(n − 1)/2 − T_x − T_y + T_xy, where T_xy is added to avoid double-counting of lost comparisons. For the example above, there are three lost comparisons due to ties in x (i.e., T_x = 3), one lost comparison due to a tie in y (i.e., T_y = 1), and one comparison lost to a tie in both the x and y variates (i.e., T_xy = 1). Thus, there are [(5 * 4)/2] − 3 − 1 + 1 = 7 valid comparisons. The τ_A correlation is defined as (n_c − n_d)/(n_c + n_d), which is a value on the [−1, 1] interval. This coefficient is also called the Kendall tau-A (τ_A) correlation. It is important to note that Kendall also used a different coefficient that has come to be called tau-B (τ_B). The tau-B correlation is defined as:

Unfortunately, the τ_B formula does not properly correct for tied scores, which is regrettable because τ_B is the value returned using the cor() and cor.test() functions from the stats package using the method = "kendall" argument (i.e., cor(x, y, method = "kendall"); cor.test(x, y, method = "kendall")). If there are no ties, then T_x = T_y = T_xy = 0, and τ_A = τ_B, but if there are ties, then the coefficient that properly corrects for ties is τ_A. The dfba_bivariate_concordance() function provides the proper correction for tied scores and outputs a sample estimate for the frequentist τ_A rather than τ_B.

The focus for the Bayesian analysis is on the population proportion of concordance, which is the limit of the ratio n_c/(n_c + n_d). This proportion is a value on the [0, 1] interval, and it is called ϕ. The ϕ parameter is also connected to the population τ_A because τ_A = 2ϕ − 1. Moreover, Chechile (2020) showed that the likelihood function for observing n_c concordant changes and n_d discordant changes is a censored Bernoulli process because order is a property that must satisfy a transitivity requirement. Therefore, the likelihood given a value for ϕ is K_cϕ^n_c(1 − ϕ)^n_d where K_c is the number of transitive arrangements with n_c concordant comparisons and n_d discordant comparisons. In Bayesian statistics, the likelihood function is only specified as a proportional function because the number of possible arrangements for observing n_c concordance changes and n_d discordance changes cancel out in Bayes theorem (i.e., the number K_c is in both the numerator and the denominator of Bayes theorem, so it cancels). If the prior for ϕ is a beta distribution, then it follows that the posterior is also a beta distribution (i.e., the beta is a natural Bayesian conjugate function for Bernoulli processes). The default prior for the dfba_bivariate_concordance() function is the flat prior: a beta distribution with shape parameters a₀ = 1 and b₀ = 1.

Using the `dfba_bivariate_concordance()` Function

The dfba_bivariate_concordance() function has two required arguments – x and y – that are two paired vectors. Because these vectors are paired, each score for x[i] is linked with the corresponding score for y[i].

The dfba_bivariate_concordance() function also has four optional arguments; listed with their respective default values, they are: a0 = 1, b0 = 1, prob_interval = .95, and fitting.parameters = NULL. The arguments a0 and b0 represent the shape parameters (a₀ and b₀) for the prior beta distribution to be assumed for the Bayesian analysis of the population ϕ concordance proportion; the default value of 1 for both of these parameters corresponds to a uniform prior distribution (as noted above). Another optional argument is prob_interval(). This input allows the user to set the proportion used for the interval estimate for the ϕ parameter; the default value is .95. The last optional argument is fitting_parameters(), which has a default of NULL. This argument is only used when the user is attempting to fit a mathematical model to the continuous univariate problem.

Examples

An example for this type of problem will be examined later, but first let us see the results from a more typical bivariate association analysis.

x <- c(47, 39, 47, 42, 44, 46, 39, 37, 29, 42, 54, 33, 44, 31, 28, 49, 32, 37, 46, 55, 31)

y <- c(36, 40, 49, 45, 30, 38, 39, 44, 27, 48, 49, 51, 27, 36, 30, 44, 42, 41, 35, 49, 33)

A <- dfba_bivariate_concordance(x,
                                y)

A
#> Descriptive Statistics 
#> ========================
#>   Concordant Pairs    Discordant Pairs 
#>   128                 68 
#>   Proportion of Concordant Pairs 
#>   0.6530612 
#> 
#> Frequentist Analyses
#> ========================
#>    Tau_A
#>    0.3061224 
#> 
#> Bayesian Analyses
#> ========================
#>   Posterior Beta Shape Parameters for the Phi Concordance Measure
#>   a_post          b_post
#>   129             69 
#>   Posterior Median
#>   0.6520263 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.583946        0.7161852

plot(A)

In the special case where the user has a model for predicting a variate in terms of known quantities and where there are free-fitting parameters, the dfba_bivariate_concordance() function can provide a distribution-free measure of the goodness-of-fit of the scientific model. For this type of application, the bivariate pair are the observed values of a variate along with the corresponding predicted values from the scientific model. The concordance proportion must be adjusted in these goodness-of-fit applications to take into account the number of free parameters that were used in the prediction model. Chechile and Barch (2022) argued that the fitting parameters increase the number of concordant changes. Consequently, the value for n_c is downward-adjusted as a function of the number of free parameters. The Chechile-Barch adjusted n_c value for a case where there are m free fitting parameters is n_c − (n * m) + [m * (m + 1)/2]. As an example, suppose that there are n = 20 scores, and the prediction equation has m = 2 free parameters that result in creating a prediction for each observed score (i.e., there are 20 paired values of observed score x and predicted score y), and further suppose that this model results in n_c = 170 and n_d = 20. The value of n_d is kept at 20, but the number of concordant changes is reduced to 170 − (20 * 2) + (2 * 3/2) = 133.

# predicted values from model

p = seq(.05, .95, .05)

ypred= 17.332 - (50.261*p) + (48.308*p^2)

# Note: the coefficients in the ypred equation were found first via a polynomial regression

# observed values

yobs <- c(19.805, 10.105, 9.396, 8.219, 6.110, 4.543, 5.864, 4.861, 6.136, 5.789,
          5.443, 5.548, 4.746, 6.484, 6.185, 6.202, 9.804, 9.332, 14.408)

B <- dfba_bivariate_concordance(x = yobs, 
                                y = ypred, 
                                fitting.parameters = 3)

B
#> Descriptive Statistics 
#> ========================
#>   Concordant Pairs    Discordant Pairs 
#>   142                 29 
#>   Proportion of Concordant Pairs 
#>   0.8304094 
#> 
#> Frequentist Analyses
#> ========================
#>    Tau_A point estimate
#>    0.6608187 
#> 
#> Bayesian Analyses
#> ========================
#>   Posterior Beta Shape Parameters for the Phi Concordance Measure
#>   a_post          b_post
#>   143             30 
#>   Posterior Median
#>   0.8278497 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.766916        0.8791183 
#> 
#> Adjusted for number of model-fitting parameters
#> ------------------------
#>   Beta Shape Parameters
#>   a_post          b_post
#>   92              30 
#>   Posterior Median
#>   0.7554904 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.674262        0.8260471

plot(B)

References

Chechile, R.A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.

Chechile, R. A., & Barch Jr., D.H. (2022). A distribution-free, Bayesian goodness-of-fit method for assessing similar scientific prediction equations. Journal of Mathematical Psychology, https://doi.org/10.1016/j.jmp.2021.102638

dfba_bivariate_concordance

Theoretical Background

Using the dfba_bivariate_concordance() Function

Examples

References

Using the `dfba_bivariate_concordance()` Function