Many studies have two variates where each variate is a score on an
ordinal scale (e.g., an integer on a 1, …, M scale). Such data are
typically organized into a rank-ordered matrix of frequency values where
the element in the [I, J] cell is the
frequency of occasions where one variate has a rank value of I while the corresponding rank for
the other variate is J. For
such matrices, Goodman and Kruskal (1954) provided a frequentist
distribution-free concordance correlation statistic that has come to be
called Goodman and Kruskal’s gamma or the G statistic (Siegel & Castellan,
1988). The dfba_gamma()
function provides a corresponding
Bayesian distribution-free analysis given the input of a rank-ordered
matrix.
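As a minimal sketch of how paired ordinal scores can be arranged into such a rank-ordered frequency matrix, the base R code below cross-tabulates two vectors. The vectors x and y, the scale size M, and the object name freq are hypothetical and serve only as an illustration.

```r
# Hypothetical paired ordinal scores on a 1..3 scale (illustrative data only).
x <- c(1, 1, 2, 2, 3, 3, 3, 1)
y <- c(1, 2, 2, 3, 3, 3, 2, 1)

# Fixing the factor levels keeps every rank in the table, in order,
# even if a rank happens not to occur in the data.
M <- 3
freq <- table(factor(x, levels = 1:M), factor(y, levels = 1:M))

# freq[I, J] counts the cases where x has rank I and y has rank J.
freq
```

Fixing the levels explicitly is a design choice: it guarantees an M by M layout whose row and column order matches the rank order.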
Chechile (2020) showed that the Goodman-Kruskal gamma is equivalent to the more general Kendall $\tau_A$ nonparametric correlation coefficient. Historically, gamma was considered a different metric from $\tau$ because the version of $\tau$ in standard use was typically $\tau_B$, which is a flawed metric because it does not properly correct for ties. It is important to point out that the commands cor(x, y, method = "kendall") and cor.test(x, y, method = "kendall") (from the stats package) return the $\tau_B$ correlation, which is incorrect when there are ties. The correct $\tau_A$ is computed by the dfba_bivariate_concordance() function (see the vignette for the dfba_bivariate_concordance() function for more details and examples about the difference between $\tau_A$ and $\tau_B$). The dfba_gamma() function is similar to the dfba_bivariate_concordance() function; the main difference is that the dfba_gamma() function deals with data that are organized in advance into a rank-ordered table or matrix, whereas the inputs for the dfba_bivariate_concordance() function are two paired vectors, x and y, of continuous values.
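To make the effect of ties concrete, the following base R sketch counts concordant and discordant pairs by hand for a small, hypothetical pair of vectors and compares the resulting concordance coefficient with the value returned by cor(). The data and the variable names (idx, s, nc, nd, G, tau_b) are assumptions made for illustration; the sketch does not call any DFBA function.

```r
# Hypothetical paired scores containing ties in both variates.
x <- c(1, 1, 2, 2, 3)
y <- c(1, 2, 2, 3, 3)

# Enumerate every pair of cases; a pair is concordant when x and y
# change in the same direction and discordant when they change in
# opposite directions. Pairs tied on either variate count as neither.
idx <- combn(length(x), 2)
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
nc <- sum(s > 0)
nd <- sum(s < 0)

# Concordance coefficient based only on the untied pairs.
G <- (nc - nd) / (nc + nd)

# Base R's Kendall coefficient, which applies the tau-b tie adjustment.
tau_b <- cor(x, y, method = "kendall")

c(nc = nc, nd = nd, G = G, tau_b = tau_b)
```

In this small data set every untied pair is concordant, yet the coefficient returned by cor() falls below 1 because of the tie adjustment.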
The gamma statistic is equal to:

$$G = \frac{n_c - n_d}{n_c + n_d},$$

where $n_c$ is the number of occasions when the variates change in a concordant way, and $n_d$ is the number of occasions when the variates change in a discordant fashion. The value of $n_c$ for an order matrix is the sum, over all cells $[I, J]$, of the terms $n_{ij} N_{ij}^{+}$, where $n_{ij}$ is the frequency for cell $[I, J]$ and $N_{ij}^{+}$ is the sum of the frequencies in the matrix where the row value is greater than $I$ and the column value is greater than $J$. The value of $n_d$ is the sum, over all cells $[I, J]$, of the terms $n_{ij} N_{ij}^{-}$, where $N_{ij}^{-}$ is the sum of the frequencies in the matrix where the row value is greater than $I$ and the column value is less than $J$. The $n_c$ and $n_d$ values computed in this fashion are equal, respectively, to the $n_c$ and $n_d$ values found when the bivariate measures are entered as paired vectors into the dfba_bivariate_concordance() function.
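The counting rule described above can be sketched in base R as a double loop over the cells of the matrix. The frequencies are those of the example matrix used in this vignette's dfba_gamma() demonstration; the loop and its variable names (below, right, left, N_plus, N_minus) are an illustrative assumption and not necessarily how the package computes these counts internally.

```r
# Frequency matrix from this vignette's dfba_gamma() example.
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4,
            byrow = TRUE)

nr <- nrow(N)
nk <- ncol(N)
nc <- 0
nd <- 0
for (i in seq_len(nr)) {
  for (j in seq_len(nk)) {
    below <- seq_len(nr) > i   # rows with a greater rank than i
    right <- seq_len(nk) > j   # columns with a greater rank than j
    left  <- seq_len(nk) < j   # columns with a lesser rank than j
    N_plus  <- sum(N[below, right])  # frequencies concordant with cell [i, j]
    N_minus <- sum(N[below, left])   # frequencies discordant with cell [i, j]
    nc <- nc + N[i, j] * N_plus
    nd <- nd + N[i, j] * N_minus
  }
}

# Gamma from the concordant and discordant counts.
G <- (nc - nd) / (nc + nd)
c(nc = nc, nd = nd, G = G)
```

For this matrix the loop gives 6588 concordant and 566 discordant pairs, which matches the descriptive statistics printed by dfba_gamma() in the example.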
As with the dfba_bivariate_concordance() function, the Bayesian analysis focuses on the population concordance proportion parameter $\phi$, which is linked to the $G$ statistic because $G = 2\phi - 1$. The likelihood function is proportional to $\phi^{n_c}(1 - \phi)^{n_d}$. Similar to the Bayesian analysis for the concordance parameter in the dfba_bivariate_concordance() function, the prior distribution is a beta distribution with shape parameters $a_0$ and $b_0$, and the posterior distribution is the conjugate beta distribution with shape parameters $a = a_0 + n_c$ and $b = b_0 + n_d$.
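As a rough check on the conjugate update, the posterior quantities can be reproduced with base R's qbeta(). The counts below are the concordant and discordant totals reported in this vignette's example; the use of qbeta() and the variable names are assumptions of this sketch rather than a description of the package internals.

```r
# Concordant and discordant counts from the vignette's example output.
nc <- 6588
nd <- 566

# Uniform prior on the concordance parameter phi.
a0 <- 1
b0 <- 1

# Conjugate beta update of the shape parameters.
a <- a0 + nc
b <- b0 + nd

# Posterior median and 95% equal-tail limits for phi.
post_median <- qbeta(0.5, a, b)
limits <- qbeta(c(0.025, 0.975), a, b)

# Because G = 2 * phi - 1 is increasing in phi, quantiles of phi
# map directly onto quantiles of G.
G_limits <- 2 * limits - 1

list(a = a, b = b, median = post_median, limits = limits, G_limits = G_limits)
```

The resulting shape parameters, median, and limits should agree, up to rounding, with the values printed in the Bayesian Analyses portion of the example output.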
dfba_gamma() Function

The dfba_gamma() function has one required argument, x, which must be an object in the form of a matrix or a table.
The following example demonstrates how to create a matrix of data and
to analyze it using the dfba_gamma()
function.
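library(DFBA)

# Build a 3 x 4 rank-ordered frequency matrix, filled row by row
N <- matrix(c(38, 4, 5, 0, 6, 40, 1, 2, 4, 8, 20, 30),
            ncol = 4,
            byrow = TRUE)
colnames(N) <- c('C1', 'C2', 'C3', 'C4')
rownames(N) <- c('R1', 'R2', 'R3')

# Run the Bayesian gamma analysis with the default prior and print it
A <- dfba_gamma(N)
A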
#> Descriptive Statistics
#> ========================
#> Concordant Pairs Discordant Pairs
#> 6588 566
#> Proportion of Concordant Pairs
#> 0.9208834
#> Goodman-Kruskal Gamma
#> 0.8417668
#>
#> Bayesian Analyses
#> ========================
#> Posterior Beta Shape Parameters for the Concordance Phi
#> a b
#> 6589 567
#> Posterior Median
#> 0.920805
#> 95% Equal-tail interval limits:
#> Lower Limit Upper Limit
#> 0.914398 0.9269112
The dfba_gamma() function also has three optional arguments; listed with their default values, they are: a0 = 1, b0 = 1, and prob_interval = .95. The a0 and b0 arguments are the shape parameters for the prior beta distribution; the default value of 1 for each corresponds to a uniform prior. The prob_interval argument specifies the probability value for the interval estimate of the $\phi$ concordance parameter.
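To illustrate how these three arguments shape the result, the sketch below defines a small base R helper that mimics their roles. The function beta_summary() is hypothetical and is not part of the DFBA package; it assumes an equal-tail interval, as labeled in the example output, and the non-default values passed to it are chosen only for illustration.

```r
# Hypothetical helper (not part of DFBA) that mirrors the roles of
# a0, b0, and prob_interval in a beta posterior summary.
beta_summary <- function(nc, nd, a0 = 1, b0 = 1, prob_interval = 0.95) {
  a <- a0 + nc                      # posterior shape parameters
  b <- b0 + nd
  tail <- (1 - prob_interval) / 2   # probability left in each tail
  list(a = a,
       b = b,
       median = qbeta(0.5, a, b),
       limits = qbeta(c(tail, 1 - tail), a, b))
}

# Defaults: uniform prior and a 95% equal-tail interval.
default_fit <- beta_summary(6588, 566)

# A non-uniform prior and a wider 99% interval.
custom_fit <- beta_summary(6588, 566, a0 = 0.5, b0 = 0.5, prob_interval = 0.99)

default_fit$limits
custom_fit$limits
```

With counts this large the choice of prior barely moves the posterior, whereas raising prob_interval visibly widens the interval.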
Chechile, R. A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge: MIT Press.
Goodman, L. A., and Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732-764.
Siegel, S., and Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.