Title: | Implements Pseudo-R2D2 Prior for Ordinal Regression |
---|---|
Description: | Implements the pseudo-R2D2 prior for ordinal regression from the paper "Psuedo-R2D2 prior for high-dimensional ordinal regression" by Yanchenko (2025) <doi:10.48550/arXiv.2502.17491>. In particular, it provides code to evaluate the probability distribution function for the cut-points, compute the log-likelihood, calculate the hyper-parameters for the global variance parameter, find the distribution of McFadden's coefficient-of-determination, and fit the model in 'rstan'. Please cite the paper if you use these codes. |
Authors: | Eric Yanchenko [aut, cre] |
Maintainer: | Eric Yanchenko <eyanchenko@aiu.ac.jp> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2025-03-18 13:31:53 UTC |
Source: | CRAN |
This function computes the value of the probability density function for the cut-points. The distribution is induced by a Dirichlet distribution on the prior probabilities of the response.
dcut(tau, W, alpha, log = FALSE)
dcut(tau, W, alpha, log = FALSE)
tau |
cut-points |
W |
global variance |
alpha |
concentration parameters for prior probabilities of Y |
log |
logical; if TRUE, returns log pdf |
value of pdf at tau
tau = c(-1,1) # set cut points W = 1 # set value of global variance alpha = c(1,1,1) #concentration parameters dcut(tau, W, alpha, log=FALSE)
tau = c(-1,1) # set cut points W = 1 # set value of global variance alpha = c(1,1,1) #concentration parameters dcut(tau, W, alpha, log=FALSE)
This function finds the optimal GIG parameters for the prior on W which induces a beta prior distribution on McFadden's R2.
find_param( a, b, n, K, alpha = rep(1, K), nsims = 1000, nreps = 5, no_cores = 10 )
find_param( a, b, n, K, alpha = rep(1, K), nsims = 1000, nreps = 5, no_cores = 10 )
a |
hyper-parameter of prior for R2 ~ Beta(a,b) |
b |
hyper-parameter of prior for R2 ~ Beta(a,b) |
n |
number of observations |
K |
number of response categories |
alpha |
prior hyper-parameters for prior Dirichlet distribution on response probabilities |
nsims |
number of times to simulate data |
nreps |
number of times to run the algorithm (default = 5) |
no_cores |
number of cores to parallelize data-generation process |
Optimal GIG parameters
a = 1 b = 5 n = 100 K = 3 find_param(a, b, n, K, no_cores=1)
a = 1 b = 5 n = 100 K = 3 find_param(a, b, n, K, no_cores=1)
This function evaluates the log-likelihood of the response for a given value of the parameters.
llike(Y, W, tau)
llike(Y, W, tau)
Y |
ordinal response |
W |
global variance |
tau |
cut-points |
value of log-likelihood at Y, W and tau
set.seed(1234) K = 3 # number of response categories Y = sample(1:K, 10, replace=TRUE) # generate responses W = 1 tau = c(-1, 1) # set parameter values llike(Y, W, tau)
set.seed(1234) K = 3 # number of response categories Y = sample(1:K, 10, replace=TRUE) # generate responses W = 1 tau = c(-1, 1) # set parameter values llike(Y, W, tau)
This function carries out a Bayesian ordinal regression model in Stan using the proposed psuedo-R2D2 prior
ord_r2d2( x, y, K, a = 1, b = 10, hyper = NULL, alpha = rep(1, K), nsims = 1000, nreps = 5, no_cores = 10, progress = FALSE, ... )
ord_r2d2( x, y, K, a = 1, b = 10, hyper = NULL, alpha = rep(1, K), nsims = 1000, nreps = 5, no_cores = 10, progress = FALSE, ... )
x |
covariate matrix |
y |
response variables |
K |
number of response categories |
a |
hyper-parameter of prior for R2 ~ Beta(a,b) |
b |
hyper-parameter of prior for R2 ~ Beta(a,b) |
hyper |
hyper-parameters for W prior |
alpha |
prior hyper-parameters for prior Dirichlet distribution on response probabilities |
nsims |
number of times to simulate data |
nreps |
number of times to run the algorithm (default = 5) |
no_cores |
number of cores to parallelize data-generation process |
progress |
logical. if TRUE, shows the progress bars from the posterior sampling. |
... |
optional hyper-parameters for Stan fitting |
Stan model fit
# X are covariates, Y are responses, K is number of response categories # This example will yield low R2 values as the response are independent of the covariates. set.seed(1234) n = 100 p = 5 X = matrix(rnorm(n*p), nrow = n, ncol=p) K = 3 Y = sample(1:K, 100, replace=TRUE) a = 1 b = 5 # Pre-computed hyperparameters fit <- ord_r2d2(X, Y, K, hyper=c(0.002, 0.989, 1.013), no_cores=1) out <- rstan::extract(fit) # Plot histogram of posterior W hist(out$W, xlab="W")
# X are covariates, Y are responses, K is number of response categories # This example will yield low R2 values as the response are independent of the covariates. set.seed(1234) n = 100 p = 5 X = matrix(rnorm(n*p), nrow = n, ncol=p) K = 3 Y = sample(1:K, 100, replace=TRUE) a = 1 b = 5 # Pre-computed hyperparameters fit <- ord_r2d2(X, Y, K, hyper=c(0.002, 0.989, 1.013), no_cores=1) out <- rstan::extract(fit) # Plot histogram of posterior W hist(out$W, xlab="W")
This function finds the posterior distribution of McFadden's R2 given the posterior samples from a Stan model fit
r2_mc(Y, out)
r2_mc(Y, out)
Y |
ordinal response |
out |
posterior samples from R2D2 model fit in Stan |
Posterior samples from McFadden's R2
# Obtain output from ord_r2d2() model fit set.seed(1234) # X are covariates, Y are responses, K is number of response categories # This example will yield low R2 values as the response are independent of the covariates. n = 100 p = 5 X = matrix(rnorm(n*p), nrow = n, ncol=p) K = 3 Y = sample(1:K, 100, replace=TRUE) a = 1 b = 5 # Pre-computed hyperparameters fit <- ord_r2d2(X, Y, K, hyper=c(0.002, 0.989, 1.013), no_cores=1) out <- rstan::extract(fit) # Plot histogram of posterior R2 hist(r2_mc(Y, out), xlab="R2")
# Obtain output from ord_r2d2() model fit set.seed(1234) # X are covariates, Y are responses, K is number of response categories # This example will yield low R2 values as the response are independent of the covariates. n = 100 p = 5 X = matrix(rnorm(n*p), nrow = n, ncol=p) K = 3 Y = sample(1:K, 100, replace=TRUE) a = 1 b = 5 # Pre-computed hyperparameters fit <- ord_r2d2(X, Y, K, hyper=c(0.002, 0.989, 1.013), no_cores=1) out <- rstan::extract(fit) # Plot histogram of posterior R2 hist(r2_mc(Y, out), xlab="R2")