dfba_mann_whitney

library(DFBA)

Introduction and Overview

The two-sample t-test is the standard frequentist parametric statistical method for data analysis when there are two independent groups and where the dependent variable is a continuous measure. This parametric procedure is based on the assumptions that the data are independent observations from two normal distributions that have the same population variance. It is not likely that these assumptions are strictly valid, so it is advantageous to have some nonparametric alternatives for data analysis. The median test and the Mann-Whitney U test are two frequentist nonparametric methods that are the conventional alternatives to the two-sample t-test. The DFBA package has functions for doing a Bayesian form of each of those two procedures; the Mann-Whitney U test is the more powerful method of the two (Siegel & Castellan, 1988).

This vignette is organized into four sections. Theoretical Framework for the Bayesian Mann-Whitney provides the conceptual basis for understanding both frequentist and Bayesian statistical inference based on the Mann-Whitney U statistics. Mathematical Basis for the Large-n Model covers the mathematical basis for the large-n approximation procedure.1 Using the dfba_mann_whitney() Function is focused on the use of the dfba_mann_whitney() function. For additional information about Monte Carlo sampling, approximating the posterior distribution with a beta distribution, the likelihood principle, and the Bayes factor, please see the vignettes for the dfba_beta_bayes_factor() function, the dfba_binomial() function, and the dfba_beta_contrast() function.

Theoretical Framework for the Bayesian Mann-Whitney

Mann and Whitney (1947) developed the idea of U statistics. Earlier, Wilcoxon (1945) employed a ranking metric that could be used when there are two independent groups, but the U statistics are preferable for testing whether one of two independent, continuous random variables is stochastically larger in the population than the other. Let us denote one of the conditions as E for an experimental group and denote the other condition as C for a control group. Given nE scores in the E condition and nC scores in the C condition, there are two U statistics. In the sample, the UE statistic is the number of times that an E labeled score is larger than a C labeled score and the UC statistic is the number of times a C variate is larger than an E variate. Consider two examples to see how these metrics are computed. For the first example, suppose there are three scores in each condition, and these values are all distinctly different with the rank ordering (from low to high) of 1. C  2. C  3. E  4. C  5. E  6. E. The UE statistic can be found by counting, for each E score, the number of C scores that are less than that E. That is, UE = 2 + 3 + 3 = 8. The corresponding UC calculation is UC = 0 + 0 + 1 = 1. The second example has some tied scores. Suppose there are four scores in each condition, and there is a cluster of three scores with the same value. The ranking in this second example is: 1. C  2. C  3. E  4. (CEC)  7. E  8. E, where the measures within the parenthesis are tied. Now the U statistics are UE = 2 + 2 + 4 + 4 = 12 and UC = 0 + 0 + 1 + 1 = 2. If there are no ties, UE + UC = nE ⋅ nC, but if there are clusters of tied scores UE + UC ≤ nE ⋅ nC. Yet, in all cases, the U statistics can be computed by the following method.

$$U_E = \sum_{i=1}^{n_E} \sum_{j=1}^{n_C} I(x_{E_i} > x_{C_j})$$

where $I(\cdot)$ is an indicator function that equals 1 when its argument is true and 0 otherwise, xEi is a continuous measure for the ith observation for the E variate, and xCj is the jth measure for the C variate. The corresponding UC formula is

$$U_C = \sum_{i=1}^{n_E} \sum_{j=1}^{n_C} I(x_{C_j} > x_{E_i})$$
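
In R, this pairwise counting can be carried out directly with the outer() function. The following brief sketch recomputes the U statistics for the first example, using the ranks themselves as stand-in scores (only the ordering matters for the U statistics):

E_scores <- c(3, 5, 6)   # the E observations occupy ranks 3, 5, and 6
C_scores <- c(1, 2, 4)   # the C observations occupy ranks 1, 2, and 4

sum(outer(E_scores, C_scores, ">"))   # U_E = 8
sum(outer(C_scores, E_scores, ">"))   # U_C = 1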

Following the notation used in Chechile (2020b), let us define a population parameter ΩE:

$$\Omega_E=\lim_{n\to \infty} \frac{U_E}{U_E+U_C}$$

If E is stochastically dominant, then ΩE > .5. If ΩE < .5, then C is stochastically dominant. We can also define the population parameter $\Omega_C=\lim_{n\to \infty} \frac{U_C}{U_E+U_C}$, but since ΩC = 1 − ΩE, we need just one of these two population characteristics. The DFBA package uses the ΩE parameter as a fundamental measure of the relative stochastic dominance of the two treatment conditions.

The frequentist Mann-Whitney test is based on likelihoods exclusively. In general, a likelihood is the conditional probability of an outcome given an assumed value for the relevant population parameter. Effectively, the frequentist analysis assumes the population value of ΩE = .5 and computes the likelihood for the observed UE plus more extreme (unobserved) outcomes. When ΩE = .5, the two conditions are not different, so any pattern for the rank ordering is equally probable. For the first example considered above where nE = nC = 3, there are twenty equally probable rank orderings, so each possible ordering has a likelihood value of $\frac{1}{20}$. Because there is only one possible rank ordering in this example that is more extreme than the observed one (i.e., the order of 1. C  2. C  3. C  4. E  5. E  6. E), the summed likelihood for the one-tailed test is $\frac{2}{20}=.1$, and doubling that value for the two-tailed test gives a summed likelihood of .2. The summed likelihood is the frequentist p-value for hypothesis testing. If p is less than α, then the null hypothesis is rejected in favor of the alternative hypothesis that ΩE ≠ .5.2
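
As a quick check, this frequentist result can be reproduced with the wilcox.test() function from the stats package (see footnote 2), again using the ranks of the first example as stand-in scores:

wilcox.test(c(3, 5, 6), c(1, 2, 4))   # W = 8 (the U_E value above); exact two-sided p-value = .2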

Chechile (2020a; 2020b) provided a Bayesian version of the Mann-Whitney test. There are a number of important differences between the frequentist analysis and the Bayesian analysis. Unlike the frequentist approach, the Bayesian analysis is not predicated on the assumption that ΩE = .5. Instead, there is an entire prior distribution for ΩE. Furthermore, unlike in the frequentist analysis, in the Bayesian analysis of the U statistics, the only likelihood that is computed after the data are collected is the likelihood of the observed U statistics given a value for the population ΩE parameter; the likelihoods of non-observed outcomes are irrelevant.3 The problem is that the likelihood P(UE, UC|ΩE) is not known. However, in a similar fashion to the Bayesian analysis for the Wilcoxon signed-rank statistic,4 the unknown likelihood can be approximated via a Monte Carlo sampling process. For small sample sizes, the dfba_mann_whitney() function uses a discrete approximation method. Rather than a continuous prior distribution for the ΩE parameter, the small-n method instead considers 200 values for ΩE that range from .0025 to .9975 in steps of .005. Let us denote one of these possible 200 values as ΩEi. The Monte Carlo process samples nE and nC random scores from two distributions for which the population parameter equals ΩEi. Any one of the random Monte Carlo data sets with nE and nC scores can either (1) have the same UE and UC values as the observed U statistics or (2) have U statistics that differ from the observed values. This Monte Carlo sampling process is repeated many thousands of times. The proportion of the Monte Carlo sets of scores that have the same UE and UC values as the observed U statistics is an approximation of the likelihood P(UE, UC|ΩEi).

At this point, the reader might wonder how to sample from distributions that have any desired value for ΩE. Chechile (2020a) showed how to sample from two different exponential distributions that have the desired value for ΩEi. The exponential distribution has a probability density $f(x) = ke^{-kx}$, where k is the rate parameter. Chechile (2020a) showed that if the C scores are sampled from an exponential distribution with a rate of k = 1 while the E scores are sampled from an exponential distribution with a rate of $k=\frac{1-\Omega_{E_i}}{\Omega_{E_i}}$, then the two exponential variates have the desired ΩE = ΩEi. Consequently, for the 200 candidate values for ΩEi evaluated in the dfba_mann_whitney() function, the small-n algorithm approximates the likelihoods via a Monte Carlo sampling process. Given a prior distribution for the set of ΩEi values along with the likelihood estimates, there is a corresponding posterior distribution via Bayes' theorem. The rest of the statistical inference consists of point and interval estimation of the posterior distribution for ΩE. Probabilities for interval hypotheses can also be computed along with Bayes factors.
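
To make the sampling scheme concrete, the following is a minimal sketch, not the package's internal implementation, of how the likelihood for a single candidate value ΩEi could be approximated; the function name approx_likelihood and its argument names are hypothetical. The dfba_mann_whitney() function repeats this type of computation for each of the 200 candidate values.

approx_likelihood <- function(omega_i, n_E, n_C, U_E_obs, U_C_obs,
                              samples = 30000) {
  matches <- 0
  for (s in seq_len(samples)) {
    # E scores are drawn with rate (1 - omega_i)/omega_i and C scores with
    # rate 1, which makes P(E > C) = omega_i for the two exponential variates
    E_sim <- rexp(n_E, rate = (1 - omega_i) / omega_i)
    C_sim <- rexp(n_C, rate = 1)
    U_E_sim <- sum(outer(E_sim, C_sim, ">"))
    U_C_sim <- sum(outer(C_sim, E_sim, ">"))
    if (U_E_sim == U_E_obs && U_C_sim == U_C_obs) matches <- matches + 1
  }
  matches / samples   # proportion of simulated data sets matching the observed U statistics
}

# Example: approximate likelihood of U_E = 8 and U_C = 1 (the first example
# above) for the candidate value omega_i = .7
approx_likelihood(omega_i = .7, n_E = 3, n_C = 3, U_E_obs = 8, U_C_obs = 1)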

Chechile (2020a) showed that the time-consuming small-n algorithm described above can be avoided when nE and nC are greater than 15. For sufficiently large sample sizes, the results of the discrete Monte Carlo analysis can be closely approximated without any sampling. In the large-n approximation, the prior and posterior distributions are beta distributions with adjusted shape parameters. The details about the large-n solution are technical. These details are described in Mathematical Basis for the Large-n Model for the mathematically curious reader. Others are encouraged to skip to Using the dfba_mann_whitney() Function.

Mathematical Basis for the Large-n Model

This section contains the mathematical details and the rationale underlying the large-n approximation for the posterior distribution of the ΩE parameter. Readers who prefer to bypass these details may feel free to skip this section and go to the section Using the dfba_mann_whitney() Function.

Chechile (2020a) investigated the properties of the discrete Monte Carlo-based posterior for ΩE as a function of sample size in an effort to see if there could be a large-n approximation. He discovered that, for larger sample sizes, a combination of the distribution mean E and the distribution variance V could be accurately predicted with a formula that depends only on nE, nC, UE, and UC. The large-n prediction formula is

where nH is the harmonic mean of nE and nC, which is $\frac{2 n_E\, n_C}{n_E+n_C}$. Chechile (2020a) reported that the approximation formula was fairly accurate when nE and nC are 15 or more. In the dfba_mann_whitney() function, the large-n approach is used by default whenever the harmonic mean nH of the sample sizes of the two groups is equal to or greater than 20. Furthermore, for any beta distribution with shape parameters a* and b*, the quantity $\frac{E(1-E)}{V}=a^{*}+b^{*}+1$ (Johnson, Kotz, & Balakrishnan, 1995). Thus it follows that for large nH:

The mean of the large-n beta distribution approximation for ΩE, which is denoted here as Ω̂, is $\frac{a^{*}}{a^{*}+b^{*}}$, so $1-\widehat{\Omega}=\frac{b^{*}}{a^{*}+b^{*}}$. Thus, for the large-n approximation we need a beta distribution where

The above formula enables identification of an approximating beta distribution provided that there is an estimate for Ω̂ that can be made with only the values for nE, nC, UE and UC. Based on symmetry considerations, it is clear that Ω̂ should be .5 if UE = UC. If UE > UC, then 1 ≥ Ω̂ > .5, and if UE < UC, then .5 > Ω̂ ≥ 0. Chechile (2020a) used a Lagrange interpolation method, which is more accurate than a linear interpolation (see Abramowitz & Stegun, 1972). The Lagrange method is based on a polynomial interpolation formula in which the interpolation is constrained to fit perfectly at a set of evenly spaced fixed points. Let x be the known variate and let y be the predicted variate by means of Lagrange interpolation. Following Chechile (2020a; 2020b), the known variate x is equal to $\frac{U_E}{U_E+U_C}$ when UE ≥ UC; otherwise $x=\frac{U_C}{U_E+U_C}$. The predicted value via the interpolation is ŷ = Ω̂ when UE ≥ UC; otherwise ŷ = 1 − Ω̂. Based on these definitions, x and y must be greater than or equal to .5. Chechile (2020a; 2020b) used the six points for x = .5, .6, … , 1 to constrain the interpolation to match six predetermined y values. Clearly, y = .5 if x = .5. The other five y values at the corresponding five x points were modeled separately. The following y values were determined from this empirical modeling:

and where the four yi values for i = 1, …, 4 are:

where

Given a vector Y = [.5 y1 y2 y3 y4 y5] and a vector X = [1 x x² x³ x⁴ x⁵], the Lagrange polynomial prediction reduces to a single matrix formula:

$$\widehat{y} = \mathbf{Y}\,\mathbf{L}\,\mathbf{X}^{\mathsf{T}}$$

where L is the following matrix: $$ \mathbf{L}=\left[ \begin{array}{cccccc} 252 & -1627 & \frac{12500}{3} & -\frac{15875}{3} & \frac{10000}{3} & -\frac{2500}{3}\\ &&&&&\\ -1050 & \frac{42775}{6} & -\frac{38075}{2} & \frac{75125}{3} & -\frac{48750}{3} & \frac{12500}{3}\\ &&&&&\\ 1800 & -12650 & \frac{104800}{3} & -\frac{142250}{3} & \frac{95000}{3} & -\frac{25000}{3}\\ &&&&&\\ -1575 & 11350 & -\frac{96575}{3} & \frac{134750}{3} & -\frac{92500}{3} & \frac{25000}{3}\\ &&&&&\\ 700 & -\frac{15425}{3} & 14900 & -\frac{63875}{3} & 15000 & -\frac{12500}{3}\\ &&&&&\\ -126 & \frac{1879}{2} & -\frac{16625}{6} & \frac{12125}{3} & -\frac{8750}{3} & \frac{2500}{3}\\ \end{array} \right] $$

The elements of matrix L are constants, which are the coefficients of the six Lagrange polynomial functions for the six match points for x (i.e., x ∈ {.5, .6, …, 1}). The predicted ŷ from the matrix formula above is equal to either Ω̂ (if UE ≥ UC) or 1 − Ω̂ (if UE < UC). The beta-distribution relations given earlier provide the shape parameters a* and b* for the approximating beta distribution.
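
As a sketch of where these constants come from, the following code (with hypothetical helper names poly_mult and lagrange_row) expands each Lagrange basis polynomial for the six match points and recovers the rows of L; the first row, for instance, reproduces 252, -1627, 12500/3, -15875/3, 10000/3, and -2500/3.

nodes <- seq(0.5, 1, by = 0.1)   # the six match points for x

poly_mult <- function(p, q) {
  # multiply two polynomials stored as coefficient vectors (increasing powers of x)
  r <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    r[i:(i + length(q) - 1)] <- r[i:(i + length(q) - 1)] + p[i] * q
  }
  r
}

lagrange_row <- function(k, nodes) {
  # coefficients of the k-th Lagrange basis polynomial:
  # the product over j != k of (x - nodes[j]) / (nodes[k] - nodes[j])
  coefs <- 1
  for (j in seq_along(nodes)[-k]) {
    coefs <- poly_mult(coefs, c(-nodes[j], 1)) / (nodes[k] - nodes[j])
  }
  coefs
}

L_check <- t(sapply(seq_along(nodes), lagrange_row, nodes = nodes))
round(L_check[1, ], 2)   # 252, -1627, 4166.67, -5291.67, 3333.33, -833.33

# The interpolated prediction for a given x and constraint vector Y is then
# Y %*% L_check %*% c(1, x, x^2, x^3, x^4, x^5)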

Similar to the binomial model, the a* and b* shape parameters can be used to define na and nb parameters, such that

$$n_a = a^{*} - 1, \qquad n_b = b^{*} - 1.$$

The general large-n beta approximation model is a posterior beta distribution with shape parameters of

$$a_{post} = a_0 + n_a, \qquad b_{post} = b_0 + n_b,$$

where a0 and b0 are the shape parameters for the prior beta distribution.

To illustrate the large-n approximation, consider the case where nE = 20 and nC = 24 scores were randomly sampled from non-normal shifted Weibull distributions. The resulting UE = 305 and UC = 175, so $x=\frac{U_E}{U_E+U_C}=.63542$. Assuming a uniform prior for Ω, the large-n algorithm resulted in finding a predicted mean Ω̂ = .62507 and found an equal-tail 95 percent interval of [.4609, .7757]. The posterior probability that Ω > .5 is .9335. The large-n approximating beta distribution has shape parameters of 21.7691 and 13.05771. These results can be compared to results from the Monte Carlo algorithm described above with 100,000 random values for each of 200 candidate values for Ω. The discrete approach found a mean of .62607, which is off by only .001 from the large-n value. The Monte Carlo-based 95 percent equal-tail interval is [.4603, .7775], which is again very close to the large-n interval. The posterior probability that Ω > .5 for the Monte Carlo approach is .9341, which is within .0006 of the large-n probability. For results of a more extensive set of tests of the large-n approximation, see Chechile (2020a).
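
Because the large-n posterior is simply a beta distribution, the summary values reported for this example can be checked directly with functions from the stats package:

a_post <- 21.7691
b_post <- 13.05771

a_post / (a_post + b_post)              # posterior mean, approximately .62507
qbeta(c(.025, .975), a_post, b_post)    # 95% equal-tail limits, approximately .4609 and .7757
1 - pbeta(.5, a_post, b_post)           # posterior probability that Omega exceeds .5, approximately .9335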

Using the dfba_mann_whitney() Function

The dfba_mann_whitney() function has two required arguments and five optional arguments. The required arguments are E and C, which are vectors of continuous scores for the respective groups (the abbreviations E and C come from, respectively, the terms experimental and control; this naming convention reflects the shared background and interests of the package authors and is not meant to imply that the function is any less useful for nonexperimental data). The five optional arguments are: a0, b0, prob_interval, samples, and method. The a0 and b0 inputs are the beta distribution shape parameters for the prior distribution. These two arguments have a default value of 1, which makes the uniform distribution the default prior, and may be set to any desired values provided that both shape parameters are positive, finite values. The prob_interval argument is the value used for interval estimation of the ΩE parameter; the default for this argument is .95. The method input can be either the string "small" or the string "large", which directs the function either to use the small-n Monte Carlo sampling algorithm described in Theoretical Framework for the Bayesian Mann-Whitney, or to use the large-n approximation algorithm described in Mathematical Basis for the Large-n Model. The default for the method input is NULL: in that default case, the small-n algorithm or the large-n algorithm is deployed based on a simple rule. If the quantity $\frac{2n_En_C}{n_E+n_C}> 19$, then the program employs the large-n algorithm; otherwise the function will use the small-n algorithm.5 The optional argument samples is the number of Monte Carlo samples that are drawn for each of the 200 discrete cases for ΩE that are examined in the small-n algorithm. The default for the samples argument is 30,000.
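
For instance, a call that sets every optional argument explicitly might look like the following sketch; the data vectors and the Jeffreys-type prior shape values of .5 are arbitrary illustrations:

dfba_mann_whitney(E = c(1.7, 2.4, 3.1, 4.0),
                  C = c(0.9, 1.2, 2.0, 2.6),
                  a0 = 0.5,
                  b0 = 0.5,
                  prob_interval = 0.90,
                  samples = 10000,
                  method = "small")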

Example

To illustrate the use of the dfba_mann_whitney() function, let us examine the following:

G1 <- c(96.49, 96.78, 97.26, 98.85, 99.75, 100.14, 101.15, 101.39, 102.58, 
        107.22, 107.70, 113.26)
 
G2 <- c(101.16, 102.09, 103.14, 104.70, 105.27, 108.22, 108.32, 108.51, 109.88,
        110.32, 110.55, 113.42)

set.seed(1)

dfba_mann_whitney(E = G1, 
                  C = G2, 
                  hide_progress = TRUE)

The above analysis is based on the default uniform prior and it obtains a posterior distribution for ΩE, which is called omega_E in the output. Because the sample sizes are relatively small, the analysis used the discrete approximation method, which is based on Monte Carlo sampling. The posterior probability for ΩE > .5 is low, as is the corresponding Bayes factor BF10 for that hypothesis. Hence, the probability is high that ΩE < .5. The Bayes factor for the hypothesis that ΩE < .5 is $BF_{01}=\frac{1}{BF_{10}}$, and that value is high. Because these results are based on Monte Carlo sampling, it is reasonable to expect some variability in observed dfba_mann_whitney() function results. For example, changing the seed from 1 (as in the above code) to 2 results in the following:

set.seed(2)
mw_ex2 <- dfba_mann_whitney(E = G1, 
                            C = G2,
                            hide_progress = TRUE)
mw_ex2
#> Descriptive Statistics 
#> ========================
#>   n_E             n_C 
#>   12              12 
#>   E mean          C mean 
#>   101.881         107.1317 
#>   Mann-Whitney Statistics 
#>   U_E             U_C 
#>   24              120 
#> 
#>   Monte Carlo Sampling with Discrete Probability Values
#> ========================
#>   Number of MC Samples
#>   30000 
#>   
#>   Mean of omega_E:
#>   0.2030363 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.0630856       0.3978107 
#>   probability that omega_E exceeds 0.5:
#>   prior           posterior
#>   0.5             0.002902218 
#>   Bayes factor BF10 for omega_E > 0.5:
#>   0.002910665

Although there are some small differences between the two runs, the main conclusions are robust (i.e., it is highly probable that ΩE < .5, which implies that C is stochastically dominant relative to E). The sample U statistics for these data are UE = 24 and UC = 120 for both analyses regardless of the seed.

The plot() method produces visualizations of the posterior distribution and, optionally, the prior distribution: plot.prior = TRUE (the default) displays both the prior and the posterior distribution, whereas plot.prior = FALSE produces only a representation of the posterior distribution.

plot(dfba_mann_whitney(E = G1, 
                       C = G2,
                       hide_progress = TRUE))

Although the above analyses used the Monte Carlo sampling approach, let us compare those results with the analysis based on the large-n algorithm by specifying the method argument as method = "large":

set.seed(1)
dfba_mann_whitney(E = G1, 
                  C = G2, 
                  method = "large")
#> Descriptive Statistics 
#> ========================
#>   n_E             n_C 
#>   12              12 
#>   E mean          C mean 
#>   101.881         107.1317 
#>   Mann-Whitney Statistics 
#>   U_E             U_C 
#>   24              120 
#> 
#>  Beta Approximation Model for Omega_E
#>  for 2*nE*nC/(nE+nC) > 19
#> ========================
#>   Posterior beta shape parameters:
#>   posterior a         posterior b
#>   4.46078             17.37522 
#>   posterior mean      posterior median
#>   0.204286            0.195158 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.0673851       0.3921002 
#>   95% Highest-density interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.0539048       0.370435 
#> 
#>  
#>   probability that omega_E > 0.5:
#>   prior           posterior
#>   0.5             0.00174078 
#>   Bayes factor BF10 for omega_E > 0.5:
#>   0.001743816

Note that a beta distribution with shape parameters of a = 4.460781 and b = 17.37522 yields values for the mean for ΩE and the 95-percent equal-tail interval limits that are close to the results found from the small-n approach.

The plot(dfba_mann_whitney()) method provides a display of the probability densities as a function of ΩE. Note that the small-n displays are in terms of discrete probabilities for the 200 intervals of width .005. However, the large-n plot has a vertical axis that is probability density (rather than probability). Vectors of x and y data can be extracted from the dfba_mann_whitney() output values to create customized data visualizations of the prior and posterior probability values (e.g., using the ggplot2 or plotly packages):

  1. When method = "small", the x values for the prior and posterior distribution plots are given by the omega_E output value. The y values for the prior and posterior distributions are given by, respectively, priorvector and omegapost.

  2. When method = "large", the prior distribution is given by a beta distribution with shape parameters [a0, b0]; the posterior distribution is given by a beta distribution with shape parameters [a_post, b_post]. The dbeta() function from the stats package produces y values for any defined sequence of x values (a minimal sketch follows this list).
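
For the large-n case, for example, a customized ggplot2 display can be built with dbeta(). The following minimal sketch uses the posterior shape parameters reported in the method = "large" output shown above; in practice these values would be taken from the a_post and b_post output values.

library(ggplot2)

omega_x <- seq(0, 1, by = 0.005)
plot_data <- data.frame(
  omega_E   = omega_x,
  prior     = dbeta(omega_x, 1, 1),                 # a0 = b0 = 1 (the default uniform prior)
  posterior = dbeta(omega_x, 4.46078, 17.37522)     # a_post and b_post from the output above
)

ggplot(plot_data, aes(x = omega_E)) +
  geom_line(aes(y = prior), linetype = "dashed") +
  geom_line(aes(y = posterior)) +
  labs(x = expression(omega[E]), y = "Probability density")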

The Monte Carlo algorithm is much slower to complete, especially as the sample sizes increase. In our opinion, the default of method = NULL is a reasonable rule of thumb that has been pretested over a wide range of sample sizes and data sets. Nonetheless, the user has the option of employing either the small-n Monte Carlo algorithm or the large-n algorithm by entering their choice with an input for method.

Finally, let us consider a case where there are more than two groups. For example, suppose that there were six independent groups tested in a 3 × 2 factorial study, and the following scores were observed.

Let us code the data for each group as a vector.

A1B1 <- c(11.541, 11.854, 11.313, 14.201, 11.333, 11.583, 11.223)

A2B1 <- c(11.210, 11.117, 12.967, 12.514, 11.232, 13.585, 11.023)

A3B1 <- c(4.762, 2.323, 5.890, 2.722, 2.499, 2.534, 2.016)

A1B2 <- c(1.500, 1.562, 1.444, 1.822, 1.802, 1.075, 1.464)

A2B2 <- c(2.663, 1.503, 1.086, 1.459, 1.296, 1.009, 3.316)

A3B2 <- c(11.067, 11.117, 10.180, 10.060, 10.664, 10.074, 10.355)

Before doing statistical analyses, let us first see a display of the means in the six conditions. Note that the resulting display is a connected graph rather than a more conventional bar chart. This plotting choice was done to better illustrate potential interactions of the two factors in the experiment.
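
The following is a minimal sketch, not necessarily the original plotting code, of one way to produce such a connected-line display of the condition means with base R graphics:

group_means <- matrix(c(mean(A1B1), mean(A2B1), mean(A3B1),
                        mean(A1B2), mean(A2B2), mean(A3B2)),
                      nrow = 3,
                      dimnames = list(c("A1", "A2", "A3"), c("B1", "B2")))

matplot(group_means,
        type = "b",             # points connected by lines for each level of B
        pch = 1:2,
        lty = 1:2,
        col = 1,
        xaxt = "n",
        xlab = "Level of Factor A",
        ylab = "Condition mean")
axis(1, at = 1:3, labels = rownames(group_means))
legend("top", legend = colnames(group_means), pch = 1:2, lty = 1:2, col = 1)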

To see if there is an effect of the levels for the B factor, we can use the dfba_mann_whitney() function:

dfba_mann_whitney(E = c(A1B1, 
                        A2B1, 
                        A3B1), 
                  C = c(A1B2, 
                        A2B2, 
                        A3B2))
#> Descriptive Statistics 
#> ========================
#>   n_E             n_C 
#>   21              21 
#>   E mean          C mean 
#>   9.02105         4.596095 
#>   Mann-Whitney Statistics 
#>   U_E             U_C 
#>   380             60 
#> 
#>  Beta Approximation Model for Omega_E
#>  for 2*nE*nC/(nE+nC) > 19
#> ========================
#>   Posterior beta shape parameters:
#>   posterior a         posterior b
#>   31.2713             5.918987 
#>   posterior mean      posterior median
#>   0.840846            0.8469826 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.709151        0.9380491 
#>   95% Highest-density interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.723621        0.9473272 
#> 
#>  
#>   probability that omega_E > 0.5:
#>   prior           posterior
#>   0.5             0.999995 
#>   Bayes factor BF10 for omega_E > 0.5:
#>   198990.2

The results show clear evidence that there is a B effect (i.e., the pooled B1 group is stochastically dominant relative to the pooled B2 group because the Bayes factor BF10 for the hypothesis that ΩE > .5 is 198,990). However, similar analyses to assess comparisons between any of the levels for the A factor do not yield a sizable Bayes factor. Besides a main effect for the B factor, the above plot of the means suggests that there may be a statistically reliable interaction. This interaction can be examined via a contrast Δ where

If Δ were substantially different from 0, then there would be a reliable interaction effect. So, by the proper definition of the E and C groups, we can statistically assess the interaction with a Bayesian Mann-Whitney test:

dfba_mann_whitney(E = c(A1B1, 
                        A2B1, 
                        A3B2), 
                  C = c(A1B2, 
                        A2B2, 
                        A3B1))
#> Descriptive Statistics 
#> ========================
#>   n_E             n_C 
#>   21              21 
#>   E mean          C mean 
#>   11.4387         2.178429 
#>   Mann-Whitney Statistics 
#>   U_E             U_C 
#>   441             0 
#> 
#>  Beta Approximation Model for Omega_E
#>  for 2*nE*nC/(nE+nC) > 19
#> ========================
#>   Posterior beta shape parameters:
#>   posterior a         posterior b
#>   38.7549             0.5831224 
#>   posterior mean      posterior median
#>   0.985177            0.9922342 
#>   95% Equal-tail interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.931586        0.9999618 
#>   95% Highest-density interval limits: 
#>   Lower Limit     Upper Limit 
#>   0.94649         1 
#> 
#>  
#>   probability that omega_E > 0.5:
#>   prior           posterior
#>   0.5             1 
#>   Bayes factor BF10 for omega_E > 0.5:
#>   2.473146e+12

This instruction results in finding an astronomically large Bayes factor of BF10 > 2.47 × 10¹² for the hypothesis that ΩE > .5 with this pooling of the groups, which is strong evidence for the interaction.

This last example underscores how the Bayesian Mann-Whitney procedure can be used when there are more than two independent groups. With the appropriate pooling of the groups, specific contrast effects can be tested.

References

Abramowitz, M., and Stegun, C. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover.

Berger J. O., and Wolpert, R. L. (1988). Likelihood Principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.

Chechile, R. A. (2020a). A Bayesian analysis for the Mann-Whitney statistic. Communications in Statistics: Theory and Methods, 49, 670-696. https://doi.org/10.1080/03610926.2018.1549247

Chechile, R. A. (2020b). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge, MA: MIT Press.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, vol. 2, New York: Wiley.

Lindley, D. V., and Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). The American Statistician, 30, 112-119.

Mann, H. B., and Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60.

Siegel, S., and Castellan N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.


  1. This material is technical and perhaps not necessary for many of the less mathematically oriented readers. Such readers, if they are willing to accept the large-n algorithm as accurate, can skip this section.↩︎

  2. The wilcox.test(..., paired = FALSE) function in the stats package performs the frequentist Mann-Whitney test; the value for UE defined above is called W in the wilcox.test() output↩︎

  3. The frequentist inclusion of likelihoods for non-observed events violates an essential principle in Bayesian statistics that is called the likelihood principle (Berger & Wolpert, 1988). In essence, this principle is the idea that upon the completion of data collection, the only likelihood that should be computed is the likelihood of the data. Other possible non-observed events are not part of Bayes theorem, and therefore any non-observed outcomes are irrelevant. Several researchers have demonstrated inferential paradoxes when the likelihood principle is violated (Lindley & Phillips, 1976; Chechile, 2020b). For example, Chechile (2020b) described two studies with Bernoulli data where there are the same number of success outcomes and the same number of failures. The likelihood for the observed outcome is thus the same for the two experiments. However, because the two experiments used different stopping rules, the non-observed outcomes are different in the two studies. With the inclusion of the likelihoods for the non-observed events, the frequentist analyses arrive at different conclusions for the two experiments. In one study the inclusion of the likelihood from non-observed more extreme events resulted in p > .05; whereas in the other experiment the inclusion of non-observed extreme likelihoods resulted in p < .05. So, the two studies with the same frequencies for the Bernoulli processes have different conclusions when frequentist tests are used.↩︎

  4. See the vignette for the dfba_wilcoxon() function↩︎

  5. The quantity $\frac{2n_En_C}{n_E+n_C}$ is the harmonic mean of the sample sizes in the two conditions. The harmonic mean is less than or equal to the arithmetic mean. Chechile (2020a) found that the harmonic mean was a more precise predictor (than either the arithmetic mean or the geometric mean) as to when the sample sizes were sufficiently large to justify the large-n method. However, when nE = nC, the three types of means of the two sample size values are equivalent.↩︎