---
title: "multinomineq: Multinomial Models with Inequality Constraints"
author: "Daniel W. Heck"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{multinomineq: Multinomial Models with Inequality Constraints}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
header-includes:
- \usepackage{bm}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The R package `multinomineq` facilitates the Bayesian analysis of discrete/categorical data using multinomial models with inequality constraints.
Essentially, these types of models allow to analyze discrete choice frequencies while assuming that the corresponding choice probabilities $\theta$ are constrained by linear inequality constraints (e.g., by the simple order $\theta_{1} \leq \theta_{2}\leq \theta_{3}\leq \theta_{4}$).
In the following, this vignette shows how to define and test such models in R.
The second section below provides a concise formal definition of the model class.
## Example: Testing the Description-Experience Gap
The following example shows how inequality constraints can be used to test the description-experience (DE) gap in risky decision making (Hertwig et al., 2004).
In the experiment, participants had to choose one of two risky gambles in each of 6 different paired comparisons (e.g., a lottery in which you win 10€ with $p=.20$ versus a lottery in which you can win 20€ with $p=.45$).
The description-experience gap states that people underweigh small probabilities $p$ when making decisions based on experience.
We will conduct a reanalysis of Hertwig's data set using the inequality-constrained multinomial model in Regenwetter & Robinson (2017).
First, it is necessary to install the R package `multinomineq` from Github.
Note that it is necessary to install the developer R tools first.
Afterwards, `multinomineq` can be installed and loaded via:
```{r}
### uncomment to install packages:
# install.packages("devtools", "coda", "RcppArmadillo", "RcppProgress", "Rglpk", "quadprog")
# devtools::install_github("danheck/multinomineq")
library("multinomineq")
library("coda")
set.seed(1234)
```
### Data structure
We reanalyze the data by Hertwig et al. (2004) who collected choice frequencies of $n=25$ participants for 6 paired comparisons of risky gambles.
The observed choice frequencies can simply be summarized by the frequencies of choosing "Option H" (a specific lottery, cf. original article) in each of the 6 paired comparisons:
```{r}
# total number of observations for each paired comparison:
n <- 25
# Frequencies of choosing "Option H" in Description condition:
HER_desc <- c(9, 16, 16, 7, 12, 16)
# Frequencies of choosing "Option H" in Experience condition:
HER_exp <- c(22, 11, 7, 14, 5, 3)
```
### Vertex ($V$-)representation
First, it is shown how to use the vertex ($V$-)representation for inequality-constrained models.
Essentially, the rows of the matrix $V$ list all patterns that are predicted by a substantive theory of interest.
Each prediction either states that "Option H" is chosen (value `1`) or not (value `0`).
For example, if the theory makes the prediction that "Option H" is chosen only in the 4th paired comparison but otherwise not, the matrix $V$ needs to include the following pattern in one row: `c(0,0,0,1,0,0)`.
Similarly, all possible predictions need to be combined to form the matrix $V$.
Inequality-constrained models assume a mixture distribution over all these possible predicted patterns
(e.g., due to heterogeneity across participants or trials).
Geometrically, the corresponding parameter space thus represents the convex hull of the predicted patterns/vertices (i.e., all possible linear combinations of the rows of $V$ when using nonnegative weights).
For the description-experience gap, Regenwetter & Robinson (2017) used a specific decision making model (Cumulative Prospect Theory) to generate all possible predictions.
The matrix $V$ lists all of these predictions when assuming that participants underweigh small probabilities $p$ of winning money.
The following matrix contains all predicted patterns for the 6 risky gambles:
```{r}
# Underweighting of small probabilities CPT:
# (1 = choose "Option H"; 0 = do not choose "Option H")
V <- matrix(
c(
0, 0, 0, 0, 0, 0, # first predicted pattern
0, 0, 0, 1, 0, 0, # second predicted pattern
0, 0, 1, 1, 0, 0, # ...
1, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0,
1, 0, 1, 1, 0, 0,
1, 1, 0, 0, 0, 0,
1, 1, 0, 0, 1, 0,
1, 1, 0, 1, 1, 0,
1, 1, 0, 0, 1, 1,
1, 1, 0, 1, 0, 0,
1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1
),
ncol = 6, byrow = TRUE
)
```
Given this representation of the theoretical predictions, it is not easy to check whether a given vector of choice probabilities for the 6 gambles is inside the convex hull of the predicted patterns.
As a remedy, one can use the function `inside()` to test this, or the function `find_inside()` to find a probability vector that is inside the convex hull of the predictions:
```{r}
# define a specific vector of choice probabilities:
p_observed <- c(.25, .48, .93, .10, .32, .50)
# check whether p is inside the convex hull defined by V:
inside(p_observed, V = V)
# to find a (random) probability that is inside the convex hull:
find_inside(V = V, random = TRUE)
```
### Inequality ($Ab$-)representation
Instead of listing the vertices (or edges) of the inequality-constrained parameter space, we may instead define a set of linear inequality constraints on the parameters explicitly.
For this purpose, we specify a matrix $A$ and a vector $b$.
The model only allows parameter vectors $\theta$ that satisfy the inequalities defined by the matrix equation: $A \cdot \theta \leq b$.
For the description-experience gap, one can show that instead of listing the predictions by the matrix $V$ as shown above, one can equivalently describe the linear inequalities as follows:
```{r}
A <- matrix(
c(
0, 0, -1, 0, 0, 0, # first inequality
0, 0, 0, 0, 0, -1, # second inequality
-1, 1, 0, 0, 0, 0, # ...
0, -1, 0, 0, 1, 0,
0, 0, 0, 0, -1, 1,
0, 0, 1, -1, 0, 0,
0, 0, 0, 1, 0, 0,
1, 0, 0, 0, 0, 0
),
ncol = 6, byrow = TRUE
)
b <- c(0, 0, 0, 0, 0, 0, 1, 1)
```
For example, the third row of $A$ and the third value of $b$ specify that
$-1 \cdot \theta_1 + 1 \cdot \theta_2 + 0 \cdot \theta_3+ 0 \cdot \theta_4+ 0 \cdot \theta_5+ 0 \cdot \theta_6 \leq 0$ (which is equivalent to $\theta_2 \leq \theta_1$).
Using the matrix $A$ and the vector $b$, one can use standard matrix multiplication in R (`%*%`) to check whether a given vector of choice probabilities satisfies all of the inequality constraints:
```{r}
p_observed <- c(.25, .48, .93, .10, .32, .50)
all(A %*% p_observed <= b)
# corresponding function in multinomineq:
inside(p_observed, A, b)
# find a point that satisfies the constraints:
find_inside(A, b, random = TRUE)
```
Note that it is sometimes possible to convert the vertex ($V$-)representation (convex hull) to the inequality ($Ab$-)representation using specific software.
For older R versions (before 4.0.0), this could be done with the package `rPorta` from Github:
```{r, eval = FALSE}
### (not functional for R >= 4.0.0):
devtools::install_github("TasCL/rPorta")
Ab <- V_to_Ab(V)
Ab
```
### Posterior sampling
In a Bayesian framework, inequality-constrained multinomial models can be fitted by drawing posterior samples for the parameter vector of choice probabilities $\theta$.
If the model is defined in terms of a matrix $V$ with predicted patterns, we obtain $M=2,000$ posterior samples as follows:
```{r, eval = FALSE}
# fit data from Description and Experience condition:
p_D <- sampling_binom(k = HER_desc, n = n, V = V, M = 2000, progress = FALSE)
p_E <- sampling_binom(k = HER_exp, n = n, V = V, M = 2000, progress = FALSE)
```
If the model is defined in terms of the inequality $Ab$-representation, we obtain posterior samples in a similar way by replacing the argument `V=V` by the two arguments `A=A, b=b`:
```{r}
# fit data from Description and Experience condition:
p_D <- sampling_binom(
k = HER_desc, n = n, A = A, b = b,
M = 2000, progress = FALSE
)
p_E <- sampling_binom(
k = HER_exp, n = n, A = A, b = b,
M = 2000, progress = FALSE
)
```
If possible, the $Ab$-representation should be used because the sampling is much more efficient than for the $V$-representation.
Irrespective of how the posterior samples have been obtained, we can use standard methods in Bayesian analysis to check the convergence of the MCMC sampler and obtain summary statistics of the posterior distribution:
```{r, fig.width=7, fig.height=5.5}
# check convergence of MCMC sampler (should look like hairy caterpillars):
plot(p_D)
# summarize posterior samples:
summary(p_D)
```
Similarly, we can use posterior-predictive checks to test whether an inequality-constrained model fits the data adequately (i.e., whether the constraints hold empirically).
Essentially, this method computes Pearson's $X^2$-statistic to measure the discrepancy both for the observed and the posterior-predicted frequencies.
If the model does *not* fit the data, the posterior-predicted $p$-value `ppp` will become very small (and it will be close to 0.50 if the model fits).
Hence, we usually apply a criterion such as `ppp < .05` to reject models that do not fit the data, as in the following example:
```{r}
# posterior-predictive p-value for Description condition:
ppp_binom(prob = p_D, k = HER_desc, n = n)
ppp_binom(prob = p_D, k = HER_exp, n = n)
```
### Bayes factor
Often, one is interested in quantifying the evidence in favor of the inequality constraints.
We can use Bayesian model selection to compare the inequality-constrained model $M_0$ against the unconstrained model $M_u$ that does not restrict the choice probabilities $\theta$ (also called encompassing model).
The Bayes factor `bf_0u` quantifies the evidence in favor of the inequality-constrained model $M_0$ versus the unconstrained model $M_u$.
Similarly, `bf_u0` is the inverse of `bf_0u` and provides the evidence for the unconstrained model.
Moreover, the Bayes factor `bf_00'` compares the inequality-constrained model $M_0$ against the model $M_{0'}$ which has a parameter space defined as the *complement* of the parameter space of $M_0$.
Bayes factors are computed as follows:
```{r,eval = FALSE}
# compute Bayes factor using the V-representation
bf_D <- bf_binom(k = HER_desc, n = n, V = V, M = 10000, progress = FALSE)
bf_E <- bf_binom(k = HER_exp, n = n, V = V, M = 10000, progress = FALSE)
```
```{r}
# compute Bayes factor using the Ab-representation
bf_D <- bf_binom(HER_desc, n = n, A = A, b = b, M = 10000, progress = FALSE)
bf_E <- bf_binom(HER_exp, n = n, A = A, b = b, M = 10000, progress = FALSE)
bf_D
bf_E
```
Note that `multinomineq` also provides an error of approximation for the Bayes factor to account for random sampling error.
The matrix that is provided as output shows the standard error `se` for the Bayes factor and the 90% credibility interval `ci.5%` and `ci.95%`.
##### Increased efficiency for Bayes factor computations
Within the function `bf_binom`, the encompassing Bayes factor is computed in two steps based on the *unconstrained* multinomial model $M_u$:
1. Draw *prior* parameter samples from $M_u$ and count the proportion of samples $c$ inside the inequality-constrained parameter space.
2. Draw *posterior* parameter samples from $M_u$ and count the proportion of samples $f$ inside the inequality-constrained parameter space.
Based on these two proportions, the Bayes factor in favor of the inequality constraints is approximated as $B_{0u}=f/c$.
It might be convenient to perform these computations in separate steps in R for some models and datasets.
For example, the prior constant $c$ is sometimes known analytically.
Moreover, when analyzing multiple datasets or participants, it is more efficient to estimate the prior constant $c$ only once because $c$ is identical for all analyses:
```{r}
# proportion of *prior* samples satisfying inequality constraints
cnt <- count_binom(k = 0, n = 0, A = A, b = b, M = 50000, progress = FALSE)
cnt
# proportion of *prior* samples satisfying inequality constraints
# (dataset: Experience)
cnt_E <- count_binom(k = HER_exp, n = n, A = A, b = b, M = 10000, progress = FALSE)
cnt_E
# obtain Bayes factor (including error of approximation)
count_to_bf(posterior = cnt_E, prior = cnt)
```
Since Bayes factors can be hard to compute for some models and data sets, `multinomineq` provides a stepwise procedure that aims at increasing the efficiency.
The user can specify a minimum number of samples `cmin` that must satisy a subset of the inequality constraints.
The algorithm keeps sampling until this threshold is reached.
The argument `steps` determines at which rows of the matrix $A$ the inequalities are split into subsets of increasing size.
For example, `steps = c(3, 5, 7)` means that the first subset contains only the first 3 inequalities, the second subset the first 5 inequalities etc.:
```{r}
# use a stepwise counting approach for the "Description" condition:
# ("cmin" = minimum of samples that satisfy constraints before sampling stops)
cnt_D <- count_binom(
k = HER_desc, n = n, A = A, b = b,
M = 5000, cmin = 1, steps = c(3, 5, 7), progress = FALSE
)
cnt_D
# obtain Bayes factor (including error of approximation)
count_to_bf(posterior = cnt_D, prior = cnt)
```
### Data format for multinomial frequencies
In the present example, we only observed *binomial* data, which can easily be described by the frequency of choosing one of two options.
If more than two choice options are provided for a comparison, a different data format has to be used for *multinomial* data.
In the binomial case, we ommitted the frequencies of *not* choosing a specific option because we knew the total number $n$.
For multinomial data, one needs to specify *all* choice frequencies $k_{ij}$.
Moreover, one needs to specify a vector `options` that defines how many choice options are available for each item type (e.g., `options = c(3,3,)` means that there are two scenarios, each with three possible options).
Using this notation, the 6 binomial choice frequencies above can equivalently be specified by the multinomial data format as follows:
```{r}
# binomial data format
n <- 25
HER_desc <- c(9, 16, 16, 7, 12, 16)
# multinomial data format ("k" must be a vector!)
k_multinom <- c(
9, 16, # first binary gamble
16, 9, # second binary gamble
16, 9, # ...
7, 17,
12, 13,
16, 9
)
options <- c(2, 2, 2, 2, 2, 2) # 2 options for each type
```
The binomial format can also be translated to the multinomial format automatically:
```{r}
mn <- binom_to_multinom(HER_desc, n)
mn
```
The multinomial data format is used as input for the analysis as follows:
```{r}
posterior <- sampling_multinom(k = mn$k, options = mn$options, A = A, b = b)
bayesfactor <- bf_multinom(k = k_multinom, options = options, A = A, b = b)
bayesfactor
```
## Formal definition of multinomial models with inequality constraints
In the following, we define inequality-constrained multinomial models for discrete data.
We use the term "item type" to refer to a category system $i$ that is modeled by a multinomial distribution with a fixed total number of observations $n_i$.
For an item type $i$, we label the observed frequencies as $k_{ij}$.
#### Parameters and Likelihood
The parameter vector of interest refers to the choice probabilities:
$$\theta = (\theta_{11},\dots,\theta_{1 (J_1-1)},\theta_{21},\dots, \theta_{I (J_I-1)})$$
Note that the last probability within each item type is ommited because probabilities have to sum to one.
Hence, for $n_i=3$ choice options there are only 2 free parameters.
Using this notation, we can define a product-multinomial likelihood function:
$$p(k \mid \theta) = \prod_{i=1}^I
{n_i \choose k_{i1} ,\dots ,k_{i J_i} }
\prod_{j=1}^{J_i - 1} \theta_{ij}^{k_{ij}}
\left(1- \sum_{r=1}^{J_i-1} \theta_{ir}\right)^{k_{J_i}}$$
#### Linear Inequality Constraints
Inequality constraints can be defined in two different, equivalent ways.
First, one may define a list of inequality constraints that have to hold for the choice probabilities, for instance:
\begin{align*}
\theta_{11} \leq \theta_{21} \, \text{ and } \, \theta_{21} \leq \theta_{31}
\end{align*}
These inequalities can be expressed in vector notation as $A \,\theta \leq b$ by defining a matrix $ A$ and a vector $ b$ as follows:
$$ A = \pmatrix{1 & -1 & 0 \\ 0 & 1 & -1} \,\, \text{ and }\,\, b = \pmatrix{0 \\ 0}$$
To formalize this notation, we define the constrained parameter space $\Omega_0$ via a matrix $A$ and a vector $b$ that specify a list of inequalities that have to hold:
$$\Omega_0 = \left\{\theta \in \Omega \, \middle | \, A \, \theta \leq b \right\}$$
Alternatively, one may list all the vertices $v^{(s)}$ defining the constrained parameter space in the rows of a matrix $V$.
In the example above, we have
$$ v^{(1)} = \pmatrix{0\\0\\0} \,,\,
v^{(2)} = \pmatrix{0\\0\\1} \,,\,
v^{(3)} = \pmatrix{0\\1\\1} \,,\,
v^{(1)} = \pmatrix{1\\1\\1}$$
These vectors can be convenietly summarizes in a matrix $ V$ with one vertex per row:
$$ V = \pmatrix{0 & 0 & 0\\ 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1}$$
In general, the restricted parameter space $\Omega_0$ is defined as the convex hull of the set of vertices $v^{(s)}$:
$$\Omega_0 = \left\{ \theta = \sum_{s=1}^S \alpha_s v^{(s)} \,\middle |\, \alpha_s \geq 0 \text{ for all } s=1,\dots, S \text{ and }\sum_{s=1}^S \alpha_s=1 \right\}$$
#### Prior & Posterior
As a prior distribution for the parameters $\theta$, we assume independent Dirichlet distributions for the different item types $i$ with shape parameters $\alpha_{ij}$.
Note that we only allow parameters that are inside the constrained parameter space $\Omega_0$:
$$p(\theta) = \frac {1}{c} \, \mathbb I_{\Omega_0}(\theta) \,
\prod_{i=1}^I \prod_{j=1}^{J_i} \theta_{ij}^{\alpha_{ij}-1}$$
Here, $\mathbb I_{\Omega_0}(\theta)$ is the indicator function which is one if $\theta\in\Omega_0$ and zero otherwise.
Moreover, the normalizing constant $c$ ensures that the prior density function integrates to one.
Finally, the analytical solution of the posterior distribution is also a product-Dirichlet distribution, truncated to the constrained parameter space:
$$p(\theta \mid k) = \frac {1}{f} \, \mathbb I_{\Omega_0}(\theta) \,
\prod_{i=1}^I \prod_{j=1}^{J_i} \theta_{ij}^{k_{ij} + \alpha_{ij}-1}$$
Similarly as before, $f$ is the normalizing constant of the posterior density function.
#### Reference
For a more detailed introduction with technical details, see:
* Heck, D. W., & Davis-Stober, C. P. (2019). Multinomial models with linear
inequality constraints: Overview and improvements of computational methods for
Bayesian inference. Manuscript under revision.