The beta distribution is an important probability model in both
theoretical and applied statistics. It is especially important in
Bayesian models of categorical data, which are associated with a number
of the nonparametric procedures in the DFBA package. The beta is a
univariate continuous probability distribution on the [0, 1] interval.
The probability density is f(x), and it is a function of two positive,
finite shape parameters, which we will denote as a and b. These shape
parameters can be integers or non-integer real values, provided that
they are greater than zero and finite. The probability density function
for a beta distribution is

$$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \qquad 0 \le x \le 1.$$

For a given beta distribution, the a and b parameters are fixed values, so the term $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ is a normalization constant that ensures that the total probability over all values of x is 1; that is, the cumulative probability $F(x) = \int_0^x f(t)\,dt$ equals 1 at x = 1.¹ The mean of the beta distribution is $\frac{a}{a+b}$. The mode of the distribution is $\frac{a-1}{a+b-2}$, so long as a > 1 and b > 1 (Johnson, Kotz, & Balakrishnan, 1995). When (1) a = b = 1, (2) a < 1, or (3) b < 1, the mode is undefined. The variance of the distribution is $\frac{ab}{(a+b)^2(a+b+1)}$.
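
As a quick numerical check of these formulas, the following sketch uses only base R (`dbeta()` from the stats package and `integrate()`). The helper name `beta_moments` is hypothetical and is not part of the DFBA package. It computes the mean, mode, and variance from the closed-form expressions and compares them with values obtained by numerically integrating the density.

```r
# Hypothetical helper (not part of DFBA): closed-form summaries of a
# beta(a, b) distribution; the mode formula assumes a > 1 and b > 1.
beta_moments <- function(a, b) {
  c(mean     = a / (a + b),
    mode     = (a - 1) / (a + b - 2),
    variance = (a * b) / ((a + b)^2 * (a + b + 1)))
}

a <- 17
b <- 3
closed_form <- beta_moments(a, b)

# Numerical counterparts obtained by integrating against the density
total_prob <- integrate(function(x) dbeta(x, a, b), 0, 1)$value
num_mean   <- integrate(function(x) x * dbeta(x, a, b), 0, 1)$value
num_var    <- integrate(function(x) (x - num_mean)^2 * dbeta(x, a, b), 0, 1)$value

print(closed_form)
print(c(total_prob = total_prob, mean = num_mean, variance = num_var))
```

The integrated total probability should be very close to 1, which is the role of the normalization constant, and the integrated mean and variance should agree with the closed-form values up to numerical error.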
The purpose of the dfba_beta_descriptive()
function is
to provide centrality and interval estimates as well as to provide an
easy way to see displays of both the probability density function and
the cumulative probability function. The function provides information
on properties of the beta distribution that are important for doing
Bayesian inference and supplements the dbeta(), pbeta(), qbeta(),
and rbeta() functions included in the stats package. The
dfba_beta_descriptive()
function is also called by several
of the other functions in the DFBA
package.
The dfba_beta_descriptive()
function provides the
mean, median, mode, and variance
estimates for a beta variate in terms of the two shape parameters for
the beta distribution. The mean and median of the beta distribution are
always provided, but, as noted above, there are conditions under which
the mode is not defined. For example, when a = b = 1, the beta
distribution is a flat density function on the [0, 1] interval, so
there is no mode. Another case without a proper mode arises when either
shape parameter (or both) is less than 1, which makes the density
function diverge at one or both end points. The
dfba_beta_descriptive()
function reports the modal value as
NA
whenever the mode is not properly defined.
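
The rule for the mode described above can be sketched as a small function. This is a minimal illustration under the stated conditions, not the source code of the DFBA package; the name `beta_mode` is hypothetical.

```r
# Hypothetical helper (not the DFBA source code): mode of a beta(a, b)
# distribution, returning NA in the cases described above.
beta_mode <- function(a, b) {
  stopifnot(is.finite(a), is.finite(b), a > 0, b > 0)
  if ((a == 1 && b == 1) || a < 1 || b < 1) {
    return(NA_real_)   # flat density, or density diverging at an end point
  }
  (a - 1) / (a + b - 2)
}

beta_mode(17, 3)    # defined: (17 - 1) / (17 + 3 - 2)
beta_mode(1, 1)     # NA: uniform density
beta_mode(0.5, 2)   # NA: density diverges at x = 0
```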
In addition to centrality and variance estimates, the
dfba_beta_descriptive()
function provides two interval
estimates for the beta variate. Each interval estimate contains a
stipulated probability; that is, the beta variate lies within the
interval limits with that probability. For both estimates, the default
probability is .95 (95%). One interval estimate has
equal-tail probabilities (i.e., the probability below
the lower limit is equal to the probability above the upper limit). The
other interval estimate is the most compact interval that contains the
stipulated probability; this interval estimate is called the
highest-density interval.
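
The two kinds of interval can be sketched with base R alone. The code below is an illustration under assumptions, not the algorithm used inside the DFBA package: the helper name `beta_intervals` is hypothetical, and the highest-density search assumes a unimodal density (a > 1 and b > 1). It uses `qbeta()` for the equal-tail limits and `optimize()` to find the lower-tail probability that gives the narrowest interval with the stipulated probability.

```r
# Hypothetical helper (not the DFBA source code): equal-tail and
# highest-density intervals for a unimodal beta(a, b), a > 1 and b > 1.
beta_intervals <- function(a, b, prob_interval = 0.95) {
  tail_total <- 1 - prob_interval

  # Equal-tail interval: half of the excluded probability in each tail
  equal_tail <- qbeta(c(tail_total / 2, 1 - tail_total / 2), a, b)

  # Highest-density interval: choose the lower-tail probability that
  # minimizes the interval width while keeping the stipulated probability
  width <- function(lower_tail) {
    qbeta(lower_tail + prob_interval, a, b) - qbeta(lower_tail, a, b)
  }
  best <- optimize(width, interval = c(0, tail_total), tol = 1e-10)$minimum
  hdi <- qbeta(c(best, best + prob_interval), a, b)

  list(equal_tail = equal_tail, hdi = hdi)
}

res <- beta_intervals(17, 3)
res$equal_tail
res$hdi
diff(res$hdi) <= diff(res$equal_tail)   # the HDI is never the wider interval
```

For a = 17 and b = 3, the limits from this sketch should be close to those reported by dfba_beta_descriptive() in the example later in this document, although small differences are possible because the search method here is an assumption.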
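The dfba_beta_descriptive() function has three arguments:

- a: the first shape parameter of the beta distribution
- b: the second shape parameter of the beta distribution
- prob_interval: the probability contained within each interval estimate (default .95)
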
The first example employs the default value of .95 for the prob_interval
argument and examines the case where the first and second beta shape
parameters are, respectively, 17 and 3. The code for this example and
its output are:
```r
library(DFBA)   # provides dfba_beta_descriptive()

dfba_beta_descriptive(a = 17,
                      b = 3)
#> Centrality Estimates
#> ========================
#> Mean Median Mode
#> 0.85 0.861729 0.8888889
#>
#> Spread Estimate
#> ========================
#> Variance
#> 0.00607143
#>
#> Interval Estimates
#> ========================
#> 95% Equal-tail interval limits:
#> Lower Limit Upper Limit
#> 0.668623 0.9661738
#> 95% Highest-density interval limits:
#> Lower Limit Upper Limit
#> 0.697388 0.9801174
#>
```
Note that because a > b, the distribution has central point estimates greater than .5. Also note that the two 95-percent interval estimates are different. The highest-density interval is a more compressed interval because it is not constrained to have equal probabilities of .025 outside each limit.
The plot() method generates plots of the probability density function
and the cumulative probability function.
The list returned by dfba_beta_descriptive() also contains a data
frame, outputdf, of x, f(x), and F(x) values, should the user wish
to create alternative displays:
```r
x <- dfba_beta_descriptive(a = 17,
                           b = 3)$outputdf
head(x)
#> x density cumulative_prob
#> 1 0.000 0.000000e+00 0.000000e+00
#> 2 0.005 4.391484e-34 1.292334e-37
#> 3 0.010 2.849151e-29 1.677853e-32
#> 4 0.015 1.852583e-26 1.637400e-29
#> 5 0.020 1.829688e-24 2.157461e-27
#> 6 0.025 6.434198e-23 9.489049e-26
```
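
For readers who want a comparable table without calling the package, a data frame with the same three columns can be sketched with base R. This is an illustration, not the package's own code: the name `beta_df` is hypothetical, and the grid step of 0.005 is an assumption read off the printed rows above.

```r
# Sketch with base R only: a grid of x values with the density f(x) and
# the cumulative probability F(x), mirroring the columns shown above.
# The 0.005 grid step is an assumption read off the printed rows.
a <- 17
b <- 3
grid <- seq(0, 1, by = 0.005)
beta_df <- data.frame(x = grid,
                      density = dbeta(grid, a, b),
                      cumulative_prob = pbeta(grid, a, b))
head(beta_df)

# Example of an alternative summary: the grid point with the highest density
beta_df$x[which.max(beta_df$density)]
```

Because the grid is discrete, the grid point with the highest density only approximates the mode to within the grid spacing.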
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 1. New York: Wiley.
¹ The gamma function Γ(x) generalizes the factorial to positive real values. If x is a positive integer, then Γ(x) = (x − 1)!.