---
title: "dfba_beta_descriptive"
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dfba_beta_descriptive}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(DFBA)
```

# Overview

An important probability model in both theoretical and applied statistics is the beta distribution. It is especially important in Bayesian models of categorical data, which are associated with a number of the nonparametric procedures in the `DFBA` package. The beta is a univariate continuous probability distribution on the $[0,\,1]$ interval. The probability density is $f(x)$, and it is a function of two positive, finite shape parameters, which we will denote as $a$ and $b$. These shape parameters can be integers or non-integer real values provided that they are greater than zero and finite. The probability density function for a beta distribution is

\begin{equation}
f(x) = \begin{cases}
\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}, & 0 \le x \le 1,\ a>0,\ b>0 \\
0, & \text{elsewhere}
\end{cases}
(\#eq:betadensity)
\end{equation}

For a given beta distribution, the $a$ and $b$ parameters are fixed values, so the term $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ is a normalization constant that assures that the cumulative probability, $F(x)=\int_{0}^{x}f(t)\,dt$, reaches $1$ at $x=1$.^[The gamma function $\Gamma(x)$ is the generalization of the factorial to positive real values. If $x$ is a positive integer, then $\Gamma(x)=(x-1)!$.]

The mean of the beta distribution is equal to $\frac{a}{a+b}$. The mode of the distribution is $\frac{a-1}{a+b-2}$, so long as $a>1$ and $b>1$ (Johnson, Kotz, & Balakrishnan, 1995). When either (1) $a = b = 1$, (2) $a < 1$, or (3) $b < 1$, the mode is undefined. The variance of the distribution is $\frac{ab}{(a+b)^2(a+b+1)}$.
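The formulas for the mean, mode, and variance can be checked numerically. The chunk below is a minimal sketch that uses only base R (`dbeta()`, `integrate()`, and `optimize()` from the `stats` package). The values $a = 17$ and $b = 3$ are simply the shape parameters used later in this vignette, and the variable names are illustrative rather than part of the `DFBA` package.

```{r}
# Illustrative shape parameters (the same values appear later in this vignette)
a <- 17
b <- 3

# Closed-form values from the formulas above
mean_formula <- a / (a + b)
mode_formula <- (a - 1) / (a + b - 2)  # valid here because a > 1 and b > 1
var_formula  <- (a * b) / ((a + b)^2 * (a + b + 1))

# Numerical counterparts obtained directly from the density
total_prob <- integrate(function(x) dbeta(x, a, b), 0, 1)$value
mean_num   <- integrate(function(x) x * dbeta(x, a, b), 0, 1)$value
var_num    <- integrate(function(x) (x - mean_num)^2 * dbeta(x, a, b), 0, 1)$value
mode_num   <- optimize(function(x) dbeta(x, a, b), c(0, 1), maximum = TRUE)$maximum

# The two rows of estimates should agree to several decimal places
round(c(mean = mean_formula, mode = mode_formula, variance = var_formula), 4)
round(c(mean = mean_num, mode = mode_num, variance = var_num), 4)

# The density should integrate to (approximately) 1
total_prob
```

Because the mode is located by a numerical search, it matches the closed-form value only up to the tolerance of `optimize()`.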

The purpose of the `dfba_beta_descriptive()` function is to provide centrality and interval estimates and an easy way to display both the probability density function and the cumulative probability function. The function provides information on properties of the beta distribution that are important for doing Bayesian inference, and it supplements the `dbeta()`, `pbeta()`, `qbeta()`, and `rbeta()` functions included in the `stats` package. The `dfba_beta_descriptive()` function is also called by several of the other functions in the `DFBA` package.

The `dfba_beta_descriptive()` function provides the *mean*, *median*, *mode*, and *variance* estimates for a beta variate in terms of the two shape parameters for the beta distribution. The mean and median of the beta distribution are always provided, but, as noted above, there are conditions under which the mode is not defined. For example, when $a=b=1$, the beta distribution is a flat density function on the $[0,\,1]$ interval, so there is no mode. Another case when there is not a proper mode is when either $0 < a < 1$ or $0 < b < 1$.

When $a > b$, the distribution has central point estimates greater than $.5$. Also note that the two $95$-percent interval estimates are different: the highest-density interval is the more compressed of the two because it is not constrained to have equal probabilities of $.025$ outside each limit.
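
To make the difference between the two kinds of interval concrete, the following chunk is a minimal sketch that computes both for a beta distribution with $a = 17$ and $b = 3$ using only `qbeta()`, `pbeta()`, and `optimize()` from the `stats` package. It illustrates the definitions and is not necessarily how `dfba_beta_descriptive()` computes its limits internally. The helper `interval_width()` and the other variable names are hypothetical.

```{r}
# Illustrative shape parameters and coverage
a <- 17
b <- 3
prob_interval <- 0.95
tail_prob <- 1 - prob_interval

# Equal-tail interval: probability tail_prob / 2 lies outside each limit
eti <- qbeta(c(tail_prob / 2, 1 - tail_prob / 2), a, b)

# Highest-density interval for a unimodal density: among all intervals with
# the requested coverage, take the narrowest one. p_low is the probability
# that falls below the lower limit.
interval_width <- function(p_low) {
  qbeta(p_low + prob_interval, a, b) - qbeta(p_low, a, b)
}
p_low <- optimize(interval_width, c(0, tail_prob))$minimum
hdi <- qbeta(c(p_low, p_low + prob_interval), a, b)

# Limits of both intervals, followed by their widths
rbind(equal_tail = eti, highest_density = hdi)
c(equal_tail = diff(eti), highest_density = diff(hdi))

# Both intervals contain the same probability
c(equal_tail = diff(pbeta(eti, a, b)), highest_density = diff(pbeta(hdi, a, b)))
```

Both intervals cover the same probability, but the narrowest-interval search is free to place unequal probabilities in the two tails, which is why the highest-density interval is never wider than the equal-tail interval.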

The `plot()` method generates plots of the probability density function and the cumulative probability function:

```{r fig.retina=12, fig.width = 7}
plot(dfba_beta_descriptive(a = 17, b = 3))
```

The list returned by `dfba_beta_descriptive()` also contains a data frame, `outputdf`, of $x$, $f(x)$, and $F(x)$ values should the user wish to create alternative displays:

```{r}
x <- dfba_beta_descriptive(a = 17, b = 3)$outputdf

head(x)
```

## Example 2

Consider the case of a user who is interested in finding the $90\%$ highest-density interval for a beta distribution where the shape parameters are $31$ and $20$:

```{r}
x <- dfba_beta_descriptive(a = 31, b = 20, prob_interval = .90)

hdi <- c(x$hdi_lower, x$hdi_upper)

hdi
```

# References

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). *Continuous Univariate Distributions, Vol. 1*, New York: Wiley.