Package 'BayesPostEst'

Title: Generate Postestimation Quantities for Bayesian MCMC Estimation
Description: An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'.
Authors: Johannes Karreth [aut] , Shana Scogin [aut, cre] , Rob Williams [aut] , Andreas Beger [aut] , Myunghee Lee [ctb], Neil Williams [ctb]
Maintainer: Shana Scogin <[email protected]>
License: GPL-3
Version: 0.3.2
Built: 2024-11-18 06:33:59 UTC
Source: CRAN

Help Index


BayesPostEst Overview

Description

This package currently has nine main functions that can be used to generate and plot postestimation quantities after estimating Bayesian regression models using MCMC. The package combines functions written originally for Johannes Karreth's workshop on Bayesian modeling at the ICPSR Summer program. Currently BayesPostEst focuses mostly on generalized linear regression models for binary outcomes (logistic and probit regression). The vignette for this package has a walk-through of each function in action. Please refer to that to get an overview of all the functions, or visit the documentation for a specific function of your choice. Johannes Karreth's website (http://www.jkarreth.net) also has resources for getting started with Bayesian analysis, fitting models, and presenting results.

Main Functions

  • mcmcAveProb()

  • mcmcObsProb()

  • mcmcFD()

  • mcmcMargEff()

  • mcmcRocPrc()

  • mcmcRocPrcGen()

  • mcmcTab()

  • mcmcReg()

  • plot.mcmcFD()


Fitted JAGS interactive linear model

Description

A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_interactive

Format

A class "rjags" object created by [R2jags::jags()]

Examples

if (interactive()) {
data("sim_data_interactive")

## formatting the data for jags
datjags <- as.list(sim_data_interactive)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i] +
      b[4] * X1[i] * X2[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_interactive <- R2jags::jags(data = datjags, inits = inits, 
                                 parameters.to.save = params, n.chains = 2,
                                 n.iter = 2000, n.burnin = 1000,
                                 model.file = model)
                                 
}

Fitted JAGS interactive linear model with categorical moderator

Description

A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_interactive_cat

Format

A class "rjags" object created by [R2jags::jags()]

Examples

if (interactive()) {
data("sim_data_interactive_cat")

## formatting the data for jags
datjags <- as.list(sim_data_interactive_cat)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X3[i] +
      b[4] * X1[i] * X3[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_interactive_cat <- R2jags::jags(data = datjags, inits = inits,
                                     parameters.to.save = params, n.chains = 2,
                                     n.iter = 2000, n.burnin = 1000,
                                     model.file = model)
                                 
}

Fitted JAGS logit model

Description

A fitted JAGS logit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_logit

Format

A class "rjags" object created by [R2jags::jags()]

Examples

if (interactive()) {
data("sim_data")
  
## formatting the data for jags
datjags <- as.list(sim_data)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
      logit(p[i]) <- mu[i]    ## Logit link function
      mu[i] <- b[1] +
        b[2] * X1[i] +
        b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_logit <- R2jags::jags(data = datjags, inits = inits,
                         parameters.to.save = params, n.chains = 2, 
                         n.iter = 2000, n.burnin = 1000, model.file = model)

}

Fitted JAGS probit model

Description

A fitted JAGS probit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_probit

Format

A class "rjags" object created by [R2jags::jags()]

Examples

if (interactive()) {
data("sim_data")
  
## formatting the data for jags
datjags <- as.list(sim_data)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
      probit(p[i]) <- mu[i]    ## Update with probit link function
      mu[i] <- b[1] +
        b[2] * X1[i] +
        b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_probit <- R2jags::jags(data = datjags, inits = inits,
                         parameters.to.save = params, n.chains = 2, 
                         n.iter = 2000, n.burnin = 1000, model.file = model)

}

Predicted Probabilities using Bayesian MCMC estimates for the "Average" Case

Description

This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. As "average" cases, this function calculates the median value of each predictor. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361).

Usage

mcmcAveProb(
  modelmatrix,
  mcmcout,
  xcol,
  xrange,
  xinterest,
  link = "logit",
  ci = c(0.025, 0.975),
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

xcol

column number of the posterior draws (mcmcout) and model matrices that corresponds to the explanatory variable for which to calculate associated Pr(y = 1). Note that the columns in these matrices must match.

xrange

name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1).

xinterest

semi-optional argument. Name of the explanatory variable for which to calculate associated Pr(y = 1). If xcol is supplied, this is not needed. If both are supplied, the function defaults to xcol and this argument is ignored.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: The longer xrange is, the larger the full output will be if TRUE is selected.

Details

This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361)

Value

if fullsims = FALSE (default), a tibble with 4 columns:

  • x: value of variable of interest, drawn from xrange

  • median_pp: median predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values

  • lower_pp: lower bound of credible interval of predicted probability at given x

  • upper_pp: upper bound of credible interval of predicted probability at given x

if fullsims = TRUE, a tibble with 3 columns:

  • Iteration: number of the posterior draw

  • x: value of variable of interest, drawn from xrange

  • pp: average predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316

Examples

if (interactive()) {
  ## simulating data
  set.seed(123)
  b0 <- 0.2 # true value for the intercept
  b1 <- 0.5 # true value for first beta
  b2 <- 0.7 # true value for second beta
  n <- 500 # sample size
  X1 <- runif(n, -1, 1)
  X2 <- runif(n, -1, 1)
  Z <- b0 + b1 * X1 + b2 * X2
  pr <- 1 / (1 + exp(-Z)) # inv logit function
  Y <- rbinom(n, 1, pr) 
  df <- data.frame(cbind(X1, X2, Y))
  
  ## formatting the data for jags
  datjags <- as.list(df)
  datjags$N <- length(datjags$Y)
  
  ## creating jags model
  model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
library(R2jags)
set.seed(123)
fit <- jags(data = datjags, inits = inits, 
         parameters.to.save = params, n.chains = 2, n.iter = 2000, 
         n.burnin = 1000, model.file = model)

### average value approach
library(coda)
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
X1_sim <- seq(from = min(datjags$X1),
              to = max(datjags$X1), 
              length.out = 10)
ave_prob <- mcmcAveProb(modelmatrix = xmat,
                        mcmcout = mcmc_mat,
                        xrange = X1_sim, 
                        xcol = 2)
}

Coefficient Plots for MCMC Output

Description

Coefficient plots for MCMC output using ggplot2

Usage

mcmcCoefPlot(
  mod,
  pars = NULL,
  pointest = "mean",
  ci = 0.95,
  hpdi = FALSE,
  sort = FALSE,
  plot = TRUE,
  regex = FALSE
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms.

pars

a scalar or vector of the parameters you wish to include in the table. By default, mcmcCoefPlot includes all parameters saved in a model object. If a model has lots of samples and lots of saved parameters, not explicitly specifying a limited number of parameters to include via pars may take a long time or produce an unreadable plot. pars can either be a vector with the specific parameters to be included in the table e.g. pars = c("beta[1]", "beta[2]", "beta[3]"), or they can be partial names that will be matched using regular expressions e.g. pars = "beta" if regex = TRUE. Both of these will include beta[1], beta[2], and beta[3] in the plot. If pars is left blank, mcmcCoefPlot will exclude auxiliary parameters such as deviance from JAGS or lp__ from Stan.

pointest

a character indicating whether to use the mean or median for point estimates in the table.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals or equal tailed credible intervals to capture uncertainty; default FALSE.

sort

logical indicating whether to sort the point estimates to produce a caterpillar or dot plot; default FALSE.

plot

logical indicating whether to return a ggplot object or the underlying tidy DataFrame; default TRUE.

regex

use regular expression matching with pars?

Value

a ggplot object or a tidy DataFrame.

Author(s)

Rob Williams, [email protected]

Examples

if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr)
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] +
      b[2] * X1[i] +
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits,
                    parameters.to.save = params, n.chains = 2, n.iter = 2000,
                    n.burnin = 1000, model.file = model)

## generating coefficient plot with all non-auxiliary parameters
mcmcCoefPlot(fit)
}

First Differences of a Bayesian Logit or Probit model

Description

R function to calculate first differences after a Bayesian logit or probit model. First differences are a method to summarize effects across covariates. This quantity represents the difference in predicted probabilities for each covariate for cases with low and high values of the respective covariate. For each of these differences, all other variables are held constant at their median. For more, see Long (1997, Sage Publications) and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361).

Usage

mcmcFD(
  modelmatrix,
  mcmcout,
  link = "logit",
  ci = c(0.025, 0.975),
  percentiles = c(0.25, 0.75),
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

percentiles

values of each predictor for which the difference in Pr(y = 1) is to be calculated. Default is c(0.25, 0.75), which will calculate the difference between Pr(y = 1) for the 25th percentile and 75th percentile of the predictor. For binary predictors, the function automatically calculates the difference between Pr(y = 1) for x = 0 and x = 1.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE.

Value

An object of class mcmcFD. If fullsims = FALSE (default), a data frame with five columns:

  • median_fd: median first difference

  • lower_fd: lower bound of credible interval of the first difference

  • upper_fd: upper bound of credible interval of the first difference

  • VarName: name of the variable as found in modelmatrix

  • VarID: identifier of the variable, based on the order of columns in modelmatrix and mcmcout. Can be adjusted for plotting

If fullsims = TRUE, a matrix with as many columns as predictors in the model. Each row is the first difference for that variable based on one set of posterior draws. Column names are taken from the column names of modelmatrix.

References

  • King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316

  • Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications

Examples

if (interactive()) {
## simulating data
set.seed(1234)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

## running function with logit
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
object <- mcmcFD(modelmatrix = xmat,
                 mcmcout = mcmc_mat)
object
}

Marginal Effects Plots for MCMC Output

Description

Marginal effects plots for MCMC output using ggplot2

Usage

mcmcMargEff(
  mod,
  main,
  int,
  moderator,
  pointest = "mean",
  seq = 100,
  ci = 0.95,
  hpdi = FALSE,
  plot = TRUE,
  xlab = "Moderator",
  ylab = "Marginal Effect"
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms.

main

a character with the name of the parameter of interest in the interaction term.

int

a character with the name of the moderating parameter in the interaction term.

moderator

a vector of values that the moderating parameter takes on in the data.

pointest

a character indicating whether to use the mean or median for point estimates in the plot.

seq

a numeric giving the number of moderator values used to generate the marginal effects plot.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals or equal tailed credible intervals to capture uncertainty.

plot

logical indicating whether to return a ggplot object or the underlying tidy DataFrame. By default, mcmcMargEff returns a line and ribbon plot for continuous variables, and a dot and line plot for factor variables and discrete variables with fewer than 25 unique values.

xlab

character giving x axis label if plot = TRUE, default "Moderator"

ylab

character giving y axis label if plot = TRUE, default "Marginal Effect"

Value

a ggplot object or a tidy DataFrame.

Author(s)

Rob Williams, [email protected]

Examples

if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2

## linear model data
Y_linear <- rnorm(n, Z, 1)
df <- data.frame(cbind(X1, X2, Y = Y_linear))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i] +
      b[4] * X1[i] * X2[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

mcmcMargEff(mod = fit,
            main = 'b[2]',
            int = 'b[4]',
            moderator = sim_data_interactive$X2,
            plot = TRUE)
}

Predicted Probabilities using Bayesian MCMC estimates for the Average of Observed Cases

Description

Implements R function to calculate the predicted probabilities for "observed" cases after a Bayesian logit or probit model, following Hanmer & Kalkan (2013) (2013, American Journal of Political Science 57(1): 263-277).

Usage

mcmcObsProb(
  modelmatrix,
  mcmcout,
  xcol,
  xrange,
  xinterest,
  link = "logit",
  ci = c(0.025, 0.975),
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

xcol

column number of the posterior draws (mcmcout) and model matrices that corresponds to the explanatory variable for which to calculate associated Pr(y = 1). Note that the columns in these matrices must match.

xrange

name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1).

xinterest

semi-optional argument. Name of the explanatory variable for which to calculate associated Pr(y = 1). If xcol is supplied, this is not needed. If both are supplied, the function defaults to xcol and this argument is ignored.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: The longer xrange is, the larger the full output will be if TRUE is selected.

Details

This function calculates predicted probabilities for "observed" cases after a Bayesian logit or probit model following Hanmer and Kalkan (2013, American Journal of Political Science 57(1): 263-277)

Value

if fullsims = FALSE (default), a tibble with 4 columns:

  • x: value of variable of interest, drawn from xrange

  • median_pp: median predicted Pr(y = 1) when variable of interest is set to x

  • lower_pp: lower bound of credible interval of predicted probability at given x

  • upper_pp: upper bound of credible interval of predicted probability at given x

if fullsims = TRUE, a tibble with 3 columns:

  • Iteration: number of the posterior draw

  • x: value of variable of interest, drawn from xrange

  • pp: average predicted Pr(y = 1) of all observed cases when variable of interest is set to x

References

Hanmer, Michael J., & Ozan Kalkan, K. (2013). Behind the curve: Clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. American Journal of Political Science, 57(1), 263-277. https://doi.org/10.1111/j.1540-5907.2012.00602.x

Examples

if (interactive()) {
  ## simulating data
  set.seed(12345)
  b0 <- 0.2 # true value for the intercept
  b1 <- 0.5 # true value for first beta
  b2 <- 0.7 # true value for second beta
  n <- 500 # sample size
  X1 <- runif(n, -1, 1)
  X2 <- runif(n, -1, 1)
  Z <- b0 + b1 * X1 + b2 * X2
  pr <- 1 / (1 + exp(-Z)) # inv logit function
  Y <- rbinom(n, 1, pr) 
  df <- data.frame(cbind(X1, X2, Y))
  
  ## formatting the data for jags
  datjags <- as.list(df)
  datjags$N <- length(datjags$Y)
  
  ## creating jags model
  model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
library(R2jags)
set.seed(123)
fit <- jags(data = datjags, inits = inits, 
          parameters.to.save = params, n.chains = 2, n.iter = 2000, 
          n.burnin = 1000, model.file = model)

### observed value approach
library(coda)
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
X1_sim <- seq(from = min(datjags$X1),
              to = max(datjags$X1), 
              length.out = 10)
obs_prob <- mcmcObsProb(modelmatrix = xmat,
                        mcmcout = mcmc_mat,
                        xrange = X1_sim,
                        xcol = 2)
}

LaTeX or HTML regression tables for MCMC Output

Description

This function creates LaTeX or HTML regression tables for MCMC Output using the texreg function from the texreg R package.

Usage

mcmcReg(
  mod,
  pars = NULL,
  pointest = "mean",
  ci = 0.95,
  hpdi = FALSE,
  sd = FALSE,
  pr = FALSE,
  coefnames = NULL,
  gof = numeric(0),
  gofnames = character(0),
  format = "latex",
  file,
  regex = FALSE,
  ...
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms, or a list of model objects of the same class.

pars

a scalar or vector of the parameters you wish to include in the table. By default, mcmcReg includes all parameters saved in a model object. If a model has lots of samples and lots of saved parameters, not explicitly specifying a limited number of parameters to include via pars may take a long time. pars can either be a vector with the specific parameters to be included in the table e.g. pars = c("beta[1]", "beta[2]", "beta[3]"), or they can be partial names that will be matched using regular expressions e.g. pars = "beta" if regex = TRUE. Both of these will include beta[1], beta[2], and beta[3] in the table. When combining models with different parameters in one table, this argument also accepts a list the length of the number of models.

pointest

a character indicating whether to use the mean or median for point estimates in the table.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals instead of equal tailed credible intervals to capture uncertainty (default FALSE).

sd

a logical indicating whether to report the standard deviation of posterior distributions instead of an uncertainty interval (default FALSE). If TRUE, overrides ci, hpdi, and pr.

pr

a logical indicating whether to report the probability that a coefficient is in the same direction as the point estimate for that coefficient (default FALSE). If TRUE, overrides ci and hpdi.

coefnames

an optional vector or list of vectors containing parameter names for each model. If there are multiple models, the list must have the same number of elements as there are models, and the vector of names in each list element must match the number of parameters. If not supplied, the function will use the parameter names in the model object(s). Note that this replaces the standard custom.coef.names argument in texreg because there is no extract method for MCMC model objects, and many MCMC model objects do not have unique parameter names.

gof

a named list of goodness of fit statistics, or a list of such lists.

gofnames

an optional vector or list of vectors containing goodness of fit statistic names for each model. Like coefnames in this function (which replaces the custom.coef.names argument in texreg), gofnames replaces the standard custom.gof.names argument in texreg. If there are multiple models, the list must have the same number of elements as there are models, and the vector of names in each list element must match the number of goodness of fit statistics.

format

a character indicating latex or html output.

file

optional file name to write table to file instead of printing to console.

regex

use regular expression matching with pars?

...

optional arguments to texreg.

Details

If using custom.coef.map with more than one model, you should rename the parameters in the model objects to ensure that different parameters with the same subscript are not conflated by texreg e.g. beta[1] could represent age in one model and income in another, and texreg would combine the two if you do not rename beta[1] to more informative names in the model objects.

If mod is a brmsfit object or list of brmsfit objects, note that the default brms names for coefficients are b_Intercept and b, so both of these should be included in par if you wish to include the intercept in the table.

Value

A formatted regression table in LaTeX or HTML format.

Author(s)

Rob Williams, [email protected]

Examples

if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr)
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] +
      b[2] * X1[i] +
      b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits,
                    parameters.to.save = params,
                    n.chains = 2,
                    n.iter = 2000, n.burnin = 1000,
                    model.file = model)

## generating regression table with all parameters
mcmcReg(fit)

## generating regression table with only betas and custom coefficent names
mcmcReg(fit, pars = c('b'), coefnames = c('Variable 1',
                                          'Variable 2',
                                          'Variable 3'),
        regex = TRUE)
## generating regression tables with all betas and custom names
mcmcReg(fit, coefnames = c('Variable 1', 'Variable 2',
                           'Variable 3', 'deviance'))
}

ROC and Precision-Recall Curves using Bayesian MCMC estimates generalized

Description

This function generates ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression. For fast calculation for from an "rjags" object use mcmcRocPrc

Usage

mcmcRocPrcGen(
  modelmatrix,
  mcmcout,
  modelframe,
  curves = FALSE,
  link = "logit",
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more and Beger (2016) for background.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

modelframe

model frame in matrix form. Can be created using as.matrix(model.frame(formula, data))

curves

logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to FALSE (default), results are returned as a list without the extra values.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: If TRUE is chosen, the function takes notably longer to execute.

Details

This function generates ROC and precision-recall curves after fitting a Bayesian logit or probit model.

Value

This function returns a list with 4 elements:

  • area_under_roc: area under ROC curve (scalar)

  • area_under_prc: area under precision-recall curve (scalar)

  • prc_dat: data to plot precision-recall curve (data frame)

  • roc_dat: data to plot ROC curve (data frame)

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at SSRN: https://ssrn.com/Abstract=2765419. http://dx.doi.org/10.2139/ssrn.2765419.

Examples

if (interactive()) {
# simulating data

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

# formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

# creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

# processing the data
mm <- model.matrix(Y ~ X1 + X2, data = df)
xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df))
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)]

# using mcmcRocPrcGen
fit_sum <- mcmcRocPrcGen(modelmatrix = mm,
                      modelframe = xframe,
                      mcmcout = mcmc_mat,
                      curves = TRUE,
                      fullsims = FALSE)
}

Summarize Bayesian MCMC Output R function for summarizing MCMC output in a regression-style table.

Description

Summarize Bayesian MCMC Output

R function for summarizing MCMC output in a regression-style table.

Usage

mcmcTab(
  sims,
  ci = c(0.025, 0.975),
  pars = NULL,
  Pr = FALSE,
  ROPE = NULL,
  regex = FALSE
)

Arguments

sims

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, and rstanarm.

ci

desired level for credible intervals; defaults to c(0.025, 0.975).

pars

character vector of parameters to be printed; defaults to NULL (all parameters are printed). If not NULL, the user can either specify the exact names of parameters to be printed (e.g. c("alpha", "beta1", "beta2")) or part of a name so that all parameters containing that name will be printed (e.g. "beta" will print beta1, beta2, etc.).

Pr

print percent of posterior draws with same sign as median; defaults to FALSE.

ROPE

defaults to NULL. If not NULL, a vector of two values defining the region of practical equivalence ("ROPE"); returns % of posterior draws to the left/right of ROPE. For this quantity to be meaningful, all parameters must be on the same scale (e.g. standardized coefficients or first differences). See Kruschke (2013, Journal of Experimental Psychology 143(2): 573-603) for more on the ROPE.

regex

use regular expression matching with pars?

Value

a data frame containing MCMC summary statistics.

References

Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.

Examples

if (interactive()) {
data("jags_logit")

## printing out table
object <- mcmcTab(jags_logit, 
          ci = c(0.025, 0.975), 
          pars = NULL, 
          Pr = FALSE,
          ROPE = NULL)
object
}

Plot Method for First Differences from MCMC output

Description

The plot method for first differences generated from MCMC output by mcmcFD. For more on this method, see Long (1997, Sage Publications), and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361). For a description of this type of plot, see Figure 1 in Karreth (2018, International Interactions 44(3): 463-90).

Usage

## S3 method for class 'mcmcFD'
plot(x, ROPE = NULL, ...)

Arguments

x

Output generated from mcmcFD(..., full_sims = TRUE).

ROPE

defaults to NULL. If not NULL, a numeric vector of length two, defining the Region of Practical Equivalence around 0. See Kruschke (2013, Journal of Experimental Psychology 143(2): 573-603) for more on the ROPE.

...

optional arguments to theme from ggplot2.

Value

a density plot of the differences in probabilities. The plot is made with ggplot2 and can be passed on as an object to customize. Annotated numbers show the percent of posterior draws with the same sign as the median estimate (if ROPE = NULL) or on the same side of the ROPE as the median estimate (if ROPE is specified).

References

  • Karreth, Johannes. 2018. “The Economic Leverage of International Organizations in Interstate Disputes.” International Interactions 44 (3): 463-90. https://doi.org/10.1080/03050629.2018.1389728.

  • King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316.

  • Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.

  • Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.

See Also

mcmcFD

Examples

if (interactive()) {
## simulating data
set.seed(1234)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

## preparing data for mcmcFD()
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]

## plotting with mcmcFDplot()
full <- mcmcFD(modelmatrix = xmat,
               mcmcout = mcmc_mat,
               fullsims = TRUE)
plot(full)

}

ROC and Precision-Recall Curves using Bayesian MCMC estimates

Description

Generate ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression using rstan::stan(), rstanarm::stan_glm(), R2jags::jags(), R2WinBUGS::bugs(), MCMCpack::MCMClogit(), or other functions that provide samples from a posterior density.

Usage

## S3 method for class 'mcmcRocPrc'
print(x, ...)

## S3 method for class 'mcmcRocPrc'
plot(x, n = 40, alpha = 0.5, ...)

## S3 method for class 'mcmcRocPrc'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  what = c("auc", "roc", "prc"),
  ...
)

mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## Default S3 method:
mcmcRocPrc(object, curves, fullsims, yvec, ...)

## S3 method for class 'jags'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  yname,
  xnames,
  posterior_samples,
  ...
)

## S3 method for class 'rjags'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...)

## S3 method for class 'runjags'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...)

## S3 method for class 'stanfit'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, data, xnames, yname, ...)

## S3 method for class 'stanreg'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## S3 method for class 'brmsfit'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## S3 method for class 'bugs'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  data,
  xnames,
  yname,
  type = c("logit", "probit"),
  ...
)

## S3 method for class 'mcmc'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  data,
  xnames,
  yname,
  type = c("logit", "probit"),
  force = FALSE,
  ...
)

Arguments

x

a mcmcRocPrc() object

...

Used by methods

n

plot method: if 'fullsims = TRUE', how many sample curves to draw?

alpha

plot method: alpha value for plotting sampled curves; between 0 and 1

row.names

see [base::as.data.frame()]

optional

see [base::as.data.frame()]

what

which information to extract and convert to a data frame?

object

A fitted binary choice model, e.g. "rjags" object (see R2jags::jags()), or a ⁠[N, iter]⁠ matrix of predicted probabilites.

curves

logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to FALSE (default), results are returned as a list without the extra values.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: If TRUE is chosen, the function takes notably longer to execute.

yvec

A numeric(N) vector of observed outcomes.

yname

(character(1))
The name of the dependent variable, should match the variable name in the JAGS data object.

xnames

(base::character())
A character vector of the independent variable names, should match the corresponding names in the JAGS data object.

posterior_samples

a "mcmc" object with the posterior samples

data

the data that was used in the 'stan(data = ?, ...)' call

type

"logit" or "probit"

force

for MCMCpack models, suppress warning if the model does not appear to be a binary choice model?

Details

If only the average AUC-ROC and PR are of interest, setting curves = FALSE and fullsims = FALSE can greatly speed up calculation time. The curve data (curves = TRUE) is needed for plotting. The plot method will always plot both the ROC and PR curves, but the underlying data can easily be extracted from the output for your own plotting; see the documentation of the value returned below.

The default method works with a matrix of predicted probabilities and the vector of observed incomes as input. Other methods accommodate some of the common Bayesian modeling packages like rstan (which returns class "stanfit"), rstanarm ("stanreg"), R2jags ("jags"), R2WinBUGS ("bugs"), and MCMCpack ("mcmc"). Even if a package-specific method is not implemented, the default method can always be used as a fallback by manually calculating the matrix of predicted probabilities for each posterior sample.

Note that MCMCpack returns generic "mcmc" output that is annotated with some additional information as attributes, including the original function call. There is no inherent way to distinguish any other kind of "mcmc" object from one generated by a proper MCMCpack modeling function, but as a basic precaution, mcmcRocPrc() will check the saved call and return an error if the function called was not MCMClogit() or MCMCprobit(). This behavior can be suppressed by setting force = TRUE.

Value

Returns a list with length 2 or 4, depending on the on the "curves" and "fullsims" argument values:

  • "area_under_roc": numeric(); either length 1 if fullsims = FALSE, or one value for each posterior sample otherwise

  • "area_under_prc": numeric(); either length 1 if fullsims = FALSE, or one value for each posterior sample otherwise

  • "prc_dat": only if curves = TRUE; a list with length 1 if fullsims = FALSE, longer otherwise

  • "roc_dat": only if curves = TRUE; a list with length 1 if fullsims = FALSE, longer otherwise

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at doi:10.2139/ssrn.2765419

Examples

if (interactive()) {
# load simulated data and fitted model (see ?sim_data and ?jags_logit)
data("jags_logit")

# using mcmcRocPrc
fit_sum <- mcmcRocPrc(jags_logit,
                      yname = "Y",
                      xnames = c("X1", "X2"),
                      curves = TRUE,
                      fullsims = FALSE)
fit_sum                     
plot(fit_sum)

# Equivalently, we can calculate the matrix of predicted probabilities 
# ourselves; using the example from ?jags_logit:
library(R2jags)

data("sim_data")
yvec <- sim_data$Y
xmat <- sim_data[, c("X1", "X2")]

# add intercept to the X data
xmat <- as.matrix(cbind(Intercept = 1L, xmat))

beta <- as.matrix(as.mcmc(jags_logit))[, c("b[1]", "b[2]", "b[3]")]
pred_mat <- plogis(xmat %*% t(beta)) 

# the matrix of predictions has rows matching the number of rows in the data;
# the column are the predictions for each of the 2,000 posterior samples
nrow(sim_data)
dim(pred_mat)

# now we can call mcmcRocPrc; the default method works with the matrix
# of predictions and vector of outcomes as input
mcmcRocPrc(object = pred_mat, curves = TRUE, fullsims = FALSE, yvec = yvec)
}

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data

Format

a data.frame

Examples

## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
sim_data <- data.frame(cbind(X1, X2, Y))

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data_interactive

Format

a data.frame

Examples

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
b3 <- -0.3 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z_interactive <- b0 + b1 * X1 + b2 * X2 + b3 * (X1 * X2)
Y_interactive <- rnorm(n, Z_interactive, 1)
sim_data_interactive <- data.frame(cbind(X1, X2, Y = Y_interactive))

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data_interactive_cat

Format

a data.frame

Examples

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
b3 <- -0.3 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X3 <- rbinom(n, 5, .23)
Z_interactive_cat <- b0 + b1 * X1 + b2 * X3 + b3 * (X1 * X3)
Y_interactive_cat <- rnorm(n, Z_interactive_cat, 1)
sim_data_interactive_cat <- data.frame(cbind(X1, X3, Y = Y_interactive_cat))