Title: | Generate Postestimation Quantities for Bayesian MCMC Estimation |
---|---|
Description: | An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'. |
Authors: | Johannes Karreth [aut] , Shana Scogin [aut, cre] , Rob Williams [aut] , Andreas Beger [aut] , Myunghee Lee [ctb], Neil Williams [ctb] |
Maintainer: | Shana Scogin <[email protected]> |
License: | GPL-3 |
Version: | 0.3.2 |
Built: | 2024-11-18 06:33:59 UTC |
Source: | CRAN |
This package currently has nine main functions that can be used to generate and plot postestimation quantities after estimating Bayesian regression models using MCMC. The package combines functions written originally for Johannes Karreth's workshop on Bayesian modeling at the ICPSR Summer program. Currently BayesPostEst focuses mostly on generalized linear regression models for binary outcomes (logistic and probit regression). The vignette for this package has a walk-through of each function in action. Please refer to that to get an overview of all the functions, or visit the documentation for a specific function of your choice. Johannes Karreth's website (http://www.jkarreth.net) also has resources for getting started with Bayesian analysis, fitting models, and presenting results.
mcmcAveProb()
mcmcObsProb()
mcmcFD()
mcmcMargEff()
mcmcRocPrc()
mcmcRocPrcGen()
mcmcTab()
mcmcReg()
plot.mcmcFD()
A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.
jags_interactive
jags_interactive
A class "rjags" object created by [R2jags::jags()]
if (interactive()) { data("sim_data_interactive") ## formatting the data for jags datjags <- as.list(sim_data_interactive) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] + b[4] * X1[i] * X2[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_interactive <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
if (interactive()) { data("sim_data_interactive") ## formatting the data for jags datjags <- as.list(sim_data_interactive) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] + b[4] * X1[i] * X2[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_interactive <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.
jags_interactive_cat
jags_interactive_cat
A class "rjags" object created by [R2jags::jags()]
if (interactive()) { data("sim_data_interactive_cat") ## formatting the data for jags datjags <- as.list(sim_data_interactive_cat) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X3[i] + b[4] * X1[i] * X3[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_interactive_cat <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
if (interactive()) { data("sim_data_interactive_cat") ## formatting the data for jags datjags <- as.list(sim_data_interactive_cat) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X3[i] + b[4] * X1[i] * X3[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_interactive_cat <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
A fitted JAGS logit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.
jags_logit
jags_logit
A class "rjags" object created by [R2jags::jags()]
if (interactive()) { data("sim_data") ## formatting the data for jags datjags <- as.list(sim_data) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_logit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
if (interactive()) { data("sim_data") ## formatting the data for jags datjags <- as.list(sim_data) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_logit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
A fitted JAGS probit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.
jags_probit
jags_probit
A class "rjags" object created by [R2jags::jags()]
if (interactive()) { data("sim_data") ## formatting the data for jags datjags <- as.list(sim_data) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i probit(p[i]) <- mu[i] ## Update with probit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_probit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
if (interactive()) { data("sim_data") ## formatting the data for jags datjags <- as.list(sim_data) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i probit(p[i]) <- mu[i] ## Update with probit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) jags_probit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) }
This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. As "average" cases, this function calculates the median value of each predictor. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361).
mcmcAveProb( modelmatrix, mcmcout, xcol, xrange, xinterest, link = "logit", ci = c(0.025, 0.975), fullsims = FALSE )
mcmcAveProb( modelmatrix, mcmcout, xcol, xrange, xinterest, link = "logit", ci = c(0.025, 0.975), fullsims = FALSE )
modelmatrix |
model matrix, including intercept (if the intercept is among the
parameters estimated in the model). Create with model.matrix(formula, data).
Note: the order of columns in the model matrix must correspond to the order of columns
in the matrix of posterior draws in the |
mcmcout |
posterior distributions of all logit coefficients,
in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed
into a matrix using the function as.mcmc() from the coda package for |
xcol |
column number of the posterior draws ( |
xrange |
name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1). |
xinterest |
semi-optional argument. Name of the explanatory variable for which
to calculate associated Pr(y = 1). If |
link |
type of generalized linear model; a character vector set to |
ci |
the bounds of the credible interval. Default is |
fullsims |
logical indicator of whether full object (based on all MCMC draws
rather than their average) will be returned. Default is |
This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361)
if fullsims = FALSE
(default), a tibble with 4 columns:
x: value of variable of interest, drawn from xrange
median_pp: median predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values
lower_pp: lower bound of credible interval of predicted probability at given x
upper_pp: upper bound of credible interval of predicted probability at given x
if fullsims = TRUE
, a tibble with 3 columns:
Iteration: number of the posterior draw
x: value of variable of interest, drawn from xrange
pp: average predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316
if (interactive()) { ## simulating data set.seed(123) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags library(R2jags) set.seed(123) fit <- jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ### average value approach library(coda) xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] X1_sim <- seq(from = min(datjags$X1), to = max(datjags$X1), length.out = 10) ave_prob <- mcmcAveProb(modelmatrix = xmat, mcmcout = mcmc_mat, xrange = X1_sim, xcol = 2) }
if (interactive()) { ## simulating data set.seed(123) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags library(R2jags) set.seed(123) fit <- jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ### average value approach library(coda) xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] X1_sim <- seq(from = min(datjags$X1), to = max(datjags$X1), length.out = 10) ave_prob <- mcmcAveProb(modelmatrix = xmat, mcmcout = mcmc_mat, xrange = X1_sim, xcol = 2) }
Coefficient plots for MCMC output using ggplot2
mcmcCoefPlot( mod, pars = NULL, pointest = "mean", ci = 0.95, hpdi = FALSE, sort = FALSE, plot = TRUE, regex = FALSE )
mcmcCoefPlot( mod, pars = NULL, pointest = "mean", ci = 0.95, hpdi = FALSE, sort = FALSE, plot = TRUE, regex = FALSE )
mod |
Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms. |
pars |
a scalar or vector of the parameters you wish to include in the table.
By default, |
pointest |
a character indicating whether to use the mean or median for point estimates in the table. |
ci |
a scalar indicating the confidence level of the uncertainty intervals. |
hpdi |
a logical indicating whether to use highest posterior density intervals
or equal tailed credible intervals to capture uncertainty; default |
sort |
logical indicating whether to sort the point estimates to produce
a caterpillar or dot plot; default |
plot |
logical indicating whether to return a |
regex |
use regular expression matching with |
a ggplot
object or a tidy DataFrame.
Rob Williams, [email protected]
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## generating coefficient plot with all non-auxiliary parameters mcmcCoefPlot(fit) }
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## generating coefficient plot with all non-auxiliary parameters mcmcCoefPlot(fit) }
R function to calculate first differences after a Bayesian logit or probit model. First differences are a method to summarize effects across covariates. This quantity represents the difference in predicted probabilities for each covariate for cases with low and high values of the respective covariate. For each of these differences, all other variables are held constant at their median. For more, see Long (1997, Sage Publications) and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361).
mcmcFD( modelmatrix, mcmcout, link = "logit", ci = c(0.025, 0.975), percentiles = c(0.25, 0.75), fullsims = FALSE )
mcmcFD( modelmatrix, mcmcout, link = "logit", ci = c(0.025, 0.975), percentiles = c(0.25, 0.75), fullsims = FALSE )
modelmatrix |
model matrix, including intercept (if the intercept is among the
parameters estimated in the model). Create with model.matrix(formula, data).
Note: the order of columns in the model matrix must correspond to the order of columns
in the matrix of posterior draws in the |
mcmcout |
posterior distributions of all logit coefficients,
in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed
into a matrix using the function as.mcmc() from the coda package for |
link |
type of generalized linear model; a character vector set to |
ci |
the bounds of the credible interval. Default is |
percentiles |
values of each predictor for which the difference in Pr(y = 1)
is to be calculated. Default is |
fullsims |
logical indicator of whether full object (based on all MCMC draws
rather than their average) will be returned. Default is |
An object of class mcmcFD
. If fullsims = FALSE
(default),
a data frame with five columns:
median_fd: median first difference
lower_fd: lower bound of credible interval of the first difference
upper_fd: upper bound of credible interval of the first difference
VarName: name of the variable as found in modelmatrix
VarID: identifier of the variable, based on the order of columns in
modelmatrix
and mcmcout
. Can be adjusted for plotting
If fullsims = TRUE
, a matrix with as many columns as predictors in the model. Each row
is the first difference for that variable based on one set of posterior draws. Column names are taken
from the column names of modelmatrix
.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications
if (interactive()) { ## simulating data set.seed(1234) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## running function with logit xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] object <- mcmcFD(modelmatrix = xmat, mcmcout = mcmc_mat) object }
if (interactive()) { ## simulating data set.seed(1234) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## running function with logit xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] object <- mcmcFD(modelmatrix = xmat, mcmcout = mcmc_mat) object }
Marginal effects plots for MCMC output using ggplot2
mcmcMargEff( mod, main, int, moderator, pointest = "mean", seq = 100, ci = 0.95, hpdi = FALSE, plot = TRUE, xlab = "Moderator", ylab = "Marginal Effect" )
mcmcMargEff( mod, main, int, moderator, pointest = "mean", seq = 100, ci = 0.95, hpdi = FALSE, plot = TRUE, xlab = "Moderator", ylab = "Marginal Effect" )
mod |
Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms. |
main |
a character with the name of the parameter of interest in the interaction term. |
int |
a character with the name of the moderating parameter in the interaction term. |
moderator |
a vector of values that the moderating parameter takes on in the data. |
pointest |
a character indicating whether to use the mean or median for point estimates in the plot. |
seq |
a numeric giving the number of moderator values used to generate the marginal effects plot. |
ci |
a scalar indicating the confidence level of the uncertainty intervals. |
hpdi |
a logical indicating whether to use highest posterior density intervals or equal tailed credible intervals to capture uncertainty. |
plot |
logical indicating whether to return a |
xlab |
character giving x axis label if |
ylab |
character giving y axis label if |
a ggplot
object or a tidy DataFrame.
Rob Williams, [email protected]
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 ## linear model data Y_linear <- rnorm(n, Z, 1) df <- data.frame(cbind(X1, X2, Y = Y_linear)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] + b[4] * X1[i] * X2[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) mcmcMargEff(mod = fit, main = 'b[2]', int = 'b[4]', moderator = sim_data_interactive$X2, plot = TRUE) }
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 ## linear model data Y_linear <- rnorm(n, Z, 1) df <- data.frame(cbind(X1, X2, Y = Y_linear)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dnorm(mu[i], sigma) ## Bernoulli distribution of y_i mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] + b[4] * X1[i] * X2[i] } for(j in 1:4){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } sigma ~ dexp(1) } params <- c("b") inits1 <- list("b" = rep(0, 4)) inits2 <- list("b" = rep(0, 4)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) mcmcMargEff(mod = fit, main = 'b[2]', int = 'b[4]', moderator = sim_data_interactive$X2, plot = TRUE) }
Implements R function to calculate the predicted probabilities for "observed" cases after a Bayesian logit or probit model, following Hanmer & Kalkan (2013) (2013, American Journal of Political Science 57(1): 263-277).
mcmcObsProb( modelmatrix, mcmcout, xcol, xrange, xinterest, link = "logit", ci = c(0.025, 0.975), fullsims = FALSE )
mcmcObsProb( modelmatrix, mcmcout, xcol, xrange, xinterest, link = "logit", ci = c(0.025, 0.975), fullsims = FALSE )
modelmatrix |
model matrix, including intercept (if the intercept is among the
parameters estimated in the model). Create with model.matrix(formula, data).
Note: the order of columns in the model matrix must correspond to the order of columns
in the matrix of posterior draws in the |
mcmcout |
posterior distributions of all logit coefficients,
in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed
into a matrix using the function as.mcmc() from the coda package for |
xcol |
column number of the posterior draws ( |
xrange |
name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1). |
xinterest |
semi-optional argument. Name of the explanatory variable for which
to calculate associated Pr(y = 1). If |
link |
type of generalized linear model; a character vector set to |
ci |
the bounds of the credible interval. Default is |
fullsims |
logical indicator of whether full object (based on all MCMC draws
rather than their average) will be returned. Default is |
This function calculates predicted probabilities for "observed" cases after a Bayesian logit or probit model following Hanmer and Kalkan (2013, American Journal of Political Science 57(1): 263-277)
if fullsims = FALSE
(default), a tibble with 4 columns:
x: value of variable of interest, drawn from xrange
median_pp: median predicted Pr(y = 1) when variable of interest is set to x
lower_pp: lower bound of credible interval of predicted probability at given x
upper_pp: upper bound of credible interval of predicted probability at given x
if fullsims = TRUE
, a tibble with 3 columns:
Iteration: number of the posterior draw
x: value of variable of interest, drawn from xrange
pp: average predicted Pr(y = 1) of all observed cases when variable of interest is set to x
Hanmer, Michael J., & Ozan Kalkan, K. (2013). Behind the curve: Clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. American Journal of Political Science, 57(1), 263-277. https://doi.org/10.1111/j.1540-5907.2012.00602.x
if (interactive()) { ## simulating data set.seed(12345) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags library(R2jags) set.seed(123) fit <- jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ### observed value approach library(coda) xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] X1_sim <- seq(from = min(datjags$X1), to = max(datjags$X1), length.out = 10) obs_prob <- mcmcObsProb(modelmatrix = xmat, mcmcout = mcmc_mat, xrange = X1_sim, xcol = 2) }
if (interactive()) { ## simulating data set.seed(12345) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags library(R2jags) set.seed(123) fit <- jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ### observed value approach library(coda) xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] X1_sim <- seq(from = min(datjags$X1), to = max(datjags$X1), length.out = 10) obs_prob <- mcmcObsProb(modelmatrix = xmat, mcmcout = mcmc_mat, xrange = X1_sim, xcol = 2) }
This function creates LaTeX or HTML regression tables for MCMC Output using
the texreg
function from the texreg
R package.
mcmcReg( mod, pars = NULL, pointest = "mean", ci = 0.95, hpdi = FALSE, sd = FALSE, pr = FALSE, coefnames = NULL, gof = numeric(0), gofnames = character(0), format = "latex", file, regex = FALSE, ... )
mcmcReg( mod, pars = NULL, pointest = "mean", ci = 0.95, hpdi = FALSE, sd = FALSE, pr = FALSE, coefnames = NULL, gof = numeric(0), gofnames = character(0), format = "latex", file, regex = FALSE, ... )
mod |
Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms, or a list of model objects of the same class. |
pars |
a scalar or vector of the parameters you wish to include in the table.
By default, |
pointest |
a character indicating whether to use the mean or median for point estimates in the table. |
ci |
a scalar indicating the confidence level of the uncertainty intervals. |
hpdi |
a logical indicating whether to use highest posterior density
intervals instead of equal tailed credible intervals to capture uncertainty
(default |
sd |
a logical indicating whether to report the standard deviation of
posterior distributions instead of an uncertainty interval
(default |
pr |
a logical indicating whether to report the probability that a
coefficient is in the same direction as the point estimate for that
coefficient (default |
coefnames |
an optional vector or list of vectors containing parameter
names for each model. If there are multiple models, the list must have the same
number of elements as there are models, and the vector of names in each list
element must match the number of parameters. If not supplied, the function
will use the parameter names in the model object(s). Note that this replaces
the standard |
gof |
a named list of goodness of fit statistics, or a list of such lists. |
gofnames |
an optional vector or list of vectors containing
goodness of fit statistic names for each model. Like |
format |
a character indicating |
file |
optional file name to write table to file instead of printing to console. |
regex |
use regular expression matching with |
... |
optional arguments to |
If using custom.coef.map
with more than one model, you should rename
the parameters in the model objects to ensure that different parameters with the
same subscript are not conflated by texreg
e.g. beta[1]
could represent age
in one model and income in another, and texreg
would combine the two if you
do not rename beta[1]
to more informative names in the model objects.
If mod
is a brmsfit
object or list of brmsfit
objects, note that the
default brms
names for coefficients are b_Intercept
and b
, so both of
these should be included in par
if you wish to include the intercept in the
table.
A formatted regression table in LaTeX or HTML format.
Rob Williams, [email protected]
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## generating regression table with all parameters mcmcReg(fit) ## generating regression table with only betas and custom coefficent names mcmcReg(fit, pars = c('b'), coefnames = c('Variable 1', 'Variable 2', 'Variable 3'), regex = TRUE) ## generating regression tables with all betas and custom names mcmcReg(fit, coefnames = c('Variable 1', 'Variable 2', 'Variable 3', 'deviance')) }
if (interactive()) { ## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## generating regression table with all parameters mcmcReg(fit) ## generating regression table with only betas and custom coefficent names mcmcReg(fit, pars = c('b'), coefnames = c('Variable 1', 'Variable 2', 'Variable 3'), regex = TRUE) ## generating regression tables with all betas and custom names mcmcReg(fit, coefnames = c('Variable 1', 'Variable 2', 'Variable 3', 'deviance')) }
This function generates ROC and Precision-Recall curves
after fitting a Bayesian logit or probit regression. For fast calculation for
from an "rjags" object use mcmcRocPrc
mcmcRocPrcGen( modelmatrix, mcmcout, modelframe, curves = FALSE, link = "logit", fullsims = FALSE )
mcmcRocPrcGen( modelmatrix, mcmcout, modelframe, curves = FALSE, link = "logit", fullsims = FALSE )
modelmatrix |
model matrix, including intercept (if the intercept is among the
parameters estimated in the model). Create with model.matrix(formula, data).
Note: the order of columns in the model matrix must correspond to the order of columns
in the matrix of posterior draws in the |
mcmcout |
posterior distributions of all logit coefficients,
in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed
into a matrix using the function as.mcmc() from the coda package for |
modelframe |
model frame in matrix form. Can be created using as.matrix(model.frame(formula, data)) |
curves |
logical indicator of whether or not to return values to plot the ROC or Precision-Recall
curves. If set to |
link |
type of generalized linear model; a character vector set to |
fullsims |
logical indicator of whether full object (based on all MCMC draws
rather than their average) will be returned. Default is |
This function generates ROC and precision-recall curves after fitting a Bayesian logit or probit model.
This function returns a list with 4 elements:
area_under_roc: area under ROC curve (scalar)
area_under_prc: area under precision-recall curve (scalar)
prc_dat: data to plot precision-recall curve (data frame)
roc_dat: data to plot ROC curve (data frame)
Beger, Andreas. 2016. “Precision-Recall Curves.” Available at SSRN: https://ssrn.com/Abstract=2765419. http://dx.doi.org/10.2139/ssrn.2765419.
if (interactive()) { # simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) # formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) # creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) # processing the data mm <- model.matrix(Y ~ X1 + X2, data = df) xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df)) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)] # using mcmcRocPrcGen fit_sum <- mcmcRocPrcGen(modelmatrix = mm, modelframe = xframe, mcmcout = mcmc_mat, curves = TRUE, fullsims = FALSE) }
if (interactive()) { # simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) # formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) # creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) # processing the data mm <- model.matrix(Y ~ X1 + X2, data = df) xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df)) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)] # using mcmcRocPrcGen fit_sum <- mcmcRocPrcGen(modelmatrix = mm, modelframe = xframe, mcmcout = mcmc_mat, curves = TRUE, fullsims = FALSE) }
Summarize Bayesian MCMC Output
R function for summarizing MCMC output in a regression-style table.
mcmcTab( sims, ci = c(0.025, 0.975), pars = NULL, Pr = FALSE, ROPE = NULL, regex = FALSE )
mcmcTab( sims, ci = c(0.025, 0.975), pars = NULL, Pr = FALSE, ROPE = NULL, regex = FALSE )
sims |
Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, and rstanarm. |
ci |
desired level for credible intervals; defaults to c(0.025, 0.975). |
pars |
character vector of parameters to be printed; defaults to |
Pr |
print percent of posterior draws with same sign as median; defaults to |
ROPE |
defaults to |
regex |
use regular expression matching with |
a data frame containing MCMC summary statistics.
Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.
if (interactive()) { data("jags_logit") ## printing out table object <- mcmcTab(jags_logit, ci = c(0.025, 0.975), pars = NULL, Pr = FALSE, ROPE = NULL) object }
if (interactive()) { data("jags_logit") ## printing out table object <- mcmcTab(jags_logit, ci = c(0.025, 0.975), pars = NULL, Pr = FALSE, ROPE = NULL) object }
The plot
method for first differences generated from MCMC
output by mcmcFD
. For more on this method, see Long
(1997, Sage Publications), and King, Tomz, and Wittenberg (2000, American
Journal of Political Science 44(2): 347-361). For a description of this type
of plot, see Figure 1 in Karreth (2018, International Interactions 44(3): 463-90).
## S3 method for class 'mcmcFD' plot(x, ROPE = NULL, ...)
## S3 method for class 'mcmcFD' plot(x, ROPE = NULL, ...)
x |
Output generated from |
ROPE |
defaults to NULL. If not NULL, a numeric vector of length two, defining the Region of Practical Equivalence around 0. See Kruschke (2013, Journal of Experimental Psychology 143(2): 573-603) for more on the ROPE. |
... |
a density plot of the differences in probabilities. The plot is made with ggplot2 and can be
passed on as an object to customize. Annotated numbers show the percent of posterior draws with
the same sign as the median estimate (if ROPE = NULL
) or on the same side of the
ROPE as the median estimate (if ROPE
is specified).
Karreth, Johannes. 2018. “The Economic Leverage of International Organizations in Interstate Disputes.” International Interactions 44 (3): 463-90. https://doi.org/10.1080/03050629.2018.1389728.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316.
Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.
if (interactive()) { ## simulating data set.seed(1234) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## preparing data for mcmcFD() xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] ## plotting with mcmcFDplot() full <- mcmcFD(modelmatrix = xmat, mcmcout = mcmc_mat, fullsims = TRUE) plot(full) }
if (interactive()) { ## simulating data set.seed(1234) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) df <- data.frame(cbind(X1, X2, Y)) ## formatting the data for jags datjags <- as.list(df) datjags$N <- length(datjags$Y) ## creating jags model model <- function() { for(i in 1:N){ Y[i] ~ dbern(p[i]) ## Bernoulli distribution of y_i logit(p[i]) <- mu[i] ## Logit link function mu[i] <- b[1] + b[2] * X1[i] + b[3] * X2[i] } for(j in 1:3){ b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity } } params <- c("b") inits1 <- list("b" = rep(0, 3)) inits2 <- list("b" = rep(0, 3)) inits <- list(inits1, inits2) ## fitting the model with R2jags set.seed(123) fit <- R2jags::jags(data = datjags, inits = inits, parameters.to.save = params, n.chains = 2, n.iter = 2000, n.burnin = 1000, model.file = model) ## preparing data for mcmcFD() xmat <- model.matrix(Y ~ X1 + X2, data = df) mcmc <- coda::as.mcmc(fit) mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)] ## plotting with mcmcFDplot() full <- mcmcFD(modelmatrix = xmat, mcmcout = mcmc_mat, fullsims = TRUE) plot(full) }
Generate ROC and Precision-Recall curves after fitting a Bayesian logit or
probit regression using rstan::stan()
, rstanarm::stan_glm()
,
R2jags::jags()
, R2WinBUGS::bugs()
, MCMCpack::MCMClogit()
, or other
functions that provide samples from a posterior density.
## S3 method for class 'mcmcRocPrc' print(x, ...) ## S3 method for class 'mcmcRocPrc' plot(x, n = 40, alpha = 0.5, ...) ## S3 method for class 'mcmcRocPrc' as.data.frame( x, row.names = NULL, optional = FALSE, what = c("auc", "roc", "prc"), ... ) mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## Default S3 method: mcmcRocPrc(object, curves, fullsims, yvec, ...) ## S3 method for class 'jags' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, yname, xnames, posterior_samples, ... ) ## S3 method for class 'rjags' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...) ## S3 method for class 'runjags' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...) ## S3 method for class 'stanfit' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, data, xnames, yname, ...) ## S3 method for class 'stanreg' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## S3 method for class 'brmsfit' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## S3 method for class 'bugs' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, data, xnames, yname, type = c("logit", "probit"), ... ) ## S3 method for class 'mcmc' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, data, xnames, yname, type = c("logit", "probit"), force = FALSE, ... )
## S3 method for class 'mcmcRocPrc' print(x, ...) ## S3 method for class 'mcmcRocPrc' plot(x, n = 40, alpha = 0.5, ...) ## S3 method for class 'mcmcRocPrc' as.data.frame( x, row.names = NULL, optional = FALSE, what = c("auc", "roc", "prc"), ... ) mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## Default S3 method: mcmcRocPrc(object, curves, fullsims, yvec, ...) ## S3 method for class 'jags' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, yname, xnames, posterior_samples, ... ) ## S3 method for class 'rjags' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...) ## S3 method for class 'runjags' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...) ## S3 method for class 'stanfit' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, data, xnames, yname, ...) ## S3 method for class 'stanreg' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## S3 method for class 'brmsfit' mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...) ## S3 method for class 'bugs' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, data, xnames, yname, type = c("logit", "probit"), ... ) ## S3 method for class 'mcmc' mcmcRocPrc( object, curves = FALSE, fullsims = FALSE, data, xnames, yname, type = c("logit", "probit"), force = FALSE, ... )
x |
a |
... |
Used by methods |
n |
plot method: if 'fullsims = TRUE', how many sample curves to draw? |
alpha |
plot method: alpha value for plotting sampled curves; between 0 and 1 |
row.names |
see [base::as.data.frame()] |
optional |
see [base::as.data.frame()] |
what |
which information to extract and convert to a data frame? |
object |
A fitted binary choice model, e.g. "rjags" object
(see |
curves |
logical indicator of whether or not to return values to plot
the ROC or Precision-Recall curves. If set to |
fullsims |
logical indicator of whether full object (based on all MCMC
draws rather than their average) will be returned. Default is |
yvec |
A |
yname |
( |
xnames |
( |
posterior_samples |
a "mcmc" object with the posterior samples |
data |
the data that was used in the 'stan(data = ?, ...)' call |
type |
"logit" or "probit" |
force |
for MCMCpack models, suppress warning if the model does not appear to be a binary choice model? |
If only the average AUC-ROC and PR are of interest, setting
curves = FALSE
and fullsims = FALSE
can greatly speed up calculation
time. The curve data (curves = TRUE
) is needed for plotting. The plot
method will always plot both the ROC and PR curves, but the underlying
data can easily be extracted from the output for your own plotting;
see the documentation of the value returned below.
The default method works with a matrix of predicted probabilities and the vector of observed incomes as input. Other methods accommodate some of the common Bayesian modeling packages like rstan (which returns class "stanfit"), rstanarm ("stanreg"), R2jags ("jags"), R2WinBUGS ("bugs"), and MCMCpack ("mcmc"). Even if a package-specific method is not implemented, the default method can always be used as a fallback by manually calculating the matrix of predicted probabilities for each posterior sample.
Note that MCMCpack returns generic "mcmc" output that is annotated with
some additional information as attributes, including the original function
call. There is no inherent way to distinguish any other kind of "mcmc"
object from one generated by a proper MCMCpack modeling function, but as a
basic precaution, mcmcRocPrc()
will check the saved call and return an
error if the function called was not MCMClogit()
or MCMCprobit()
.
This behavior can be suppressed by setting force = TRUE
.
Returns a list with length 2 or 4, depending on the on the "curves" and "fullsims" argument values:
"area_under_roc": numeric()
; either length 1 if fullsims = FALSE
, or
one value for each posterior sample otherwise
"area_under_prc": numeric()
; either length 1 if fullsims = FALSE
, or
one value for each posterior sample otherwise
"prc_dat": only if curves = TRUE
; a list with length 1 if
fullsims = FALSE
, longer otherwise
"roc_dat": only if curves = TRUE
; a list with length 1 if
fullsims = FALSE
, longer otherwise
Beger, Andreas. 2016. “Precision-Recall Curves.” Available at doi:10.2139/ssrn.2765419
if (interactive()) { # load simulated data and fitted model (see ?sim_data and ?jags_logit) data("jags_logit") # using mcmcRocPrc fit_sum <- mcmcRocPrc(jags_logit, yname = "Y", xnames = c("X1", "X2"), curves = TRUE, fullsims = FALSE) fit_sum plot(fit_sum) # Equivalently, we can calculate the matrix of predicted probabilities # ourselves; using the example from ?jags_logit: library(R2jags) data("sim_data") yvec <- sim_data$Y xmat <- sim_data[, c("X1", "X2")] # add intercept to the X data xmat <- as.matrix(cbind(Intercept = 1L, xmat)) beta <- as.matrix(as.mcmc(jags_logit))[, c("b[1]", "b[2]", "b[3]")] pred_mat <- plogis(xmat %*% t(beta)) # the matrix of predictions has rows matching the number of rows in the data; # the column are the predictions for each of the 2,000 posterior samples nrow(sim_data) dim(pred_mat) # now we can call mcmcRocPrc; the default method works with the matrix # of predictions and vector of outcomes as input mcmcRocPrc(object = pred_mat, curves = TRUE, fullsims = FALSE, yvec = yvec) }
if (interactive()) { # load simulated data and fitted model (see ?sim_data and ?jags_logit) data("jags_logit") # using mcmcRocPrc fit_sum <- mcmcRocPrc(jags_logit, yname = "Y", xnames = c("X1", "X2"), curves = TRUE, fullsims = FALSE) fit_sum plot(fit_sum) # Equivalently, we can calculate the matrix of predicted probabilities # ourselves; using the example from ?jags_logit: library(R2jags) data("sim_data") yvec <- sim_data$Y xmat <- sim_data[, c("X1", "X2")] # add intercept to the X data xmat <- as.matrix(cbind(Intercept = 1L, xmat)) beta <- as.matrix(as.mcmc(jags_logit))[, c("b[1]", "b[2]", "b[3]")] pred_mat <- plogis(xmat %*% t(beta)) # the matrix of predictions has rows matching the number of rows in the data; # the column are the predictions for each of the 2,000 posterior samples nrow(sim_data) dim(pred_mat) # now we can call mcmcRocPrc; the default method works with the matrix # of predictions and vector of outcomes as input mcmcRocPrc(object = pred_mat, curves = TRUE, fullsims = FALSE, yvec = yvec) }
Simulated data to fit example models against
sim_data
sim_data
a data.frame
## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) sim_data <- data.frame(cbind(X1, X2, Y))
## simulating data set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z <- b0 + b1 * X1 + b2 * X2 pr <- 1 / (1 + exp(-Z)) # inv logit function Y <- rbinom(n, 1, pr) sim_data <- data.frame(cbind(X1, X2, Y))
Simulated data to fit example models against
sim_data_interactive
sim_data_interactive
a data.frame
set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta b3 <- -0.3 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z_interactive <- b0 + b1 * X1 + b2 * X2 + b3 * (X1 * X2) Y_interactive <- rnorm(n, Z_interactive, 1) sim_data_interactive <- data.frame(cbind(X1, X2, Y = Y_interactive))
set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta b3 <- -0.3 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X2 <- runif(n, -1, 1) Z_interactive <- b0 + b1 * X1 + b2 * X2 + b3 * (X1 * X2) Y_interactive <- rnorm(n, Z_interactive, 1) sim_data_interactive <- data.frame(cbind(X1, X2, Y = Y_interactive))
Simulated data to fit example models against
sim_data_interactive_cat
sim_data_interactive_cat
a data.frame
set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta b3 <- -0.3 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X3 <- rbinom(n, 5, .23) Z_interactive_cat <- b0 + b1 * X1 + b2 * X3 + b3 * (X1 * X3) Y_interactive_cat <- rnorm(n, Z_interactive_cat, 1) sim_data_interactive_cat <- data.frame(cbind(X1, X3, Y = Y_interactive_cat))
set.seed(123456) b0 <- 0.2 # true value for the intercept b1 <- 0.5 # true value for first beta b2 <- 0.7 # true value for second beta b3 <- -0.3 # true value for second beta n <- 500 # sample size X1 <- runif(n, -1, 1) X3 <- rbinom(n, 5, .23) Z_interactive_cat <- b0 + b1 * X1 + b2 * X3 + b3 * (X1 * X3) Y_interactive_cat <- rnorm(n, Z_interactive_cat, 1) sim_data_interactive_cat <- data.frame(cbind(X1, X3, Y = Y_interactive_cat))