---
title: "Demo of the bayeslm package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Demo of the bayeslm package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In this vignette, we show how to use `bayeslm` to sample coefficients for a Gaussian linear regression under a number of different prior distributions.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(bayeslm)
```

# Generate data

```{r data_generate}
set.seed(200)
p = 20
n = 100
kappa = 1.25
beta_true = c(c(1, 2, 3), rnorm(p - 3, 0, 0.01))
sig_true = kappa * sqrt(sum(beta_true^2))
x = matrix(rnorm(p * n), n, p)
y = x %*% beta_true + sig_true * rnorm(n)
x = as.matrix(x)
y = as.matrix(y)
data = data.frame(x = x, y = y)
```

## Modeling $Y \mid X = x$ using linear regression

### OLS

First, we run OLS and inspect the estimated coefficients:

```{r ols}
fitOLS = lm(y ~ x - 1)
coef(fitOLS)
```

### `bayeslm`

#### Updating coefficients individually

The `bayeslm` sampler can group coefficients into blocks and update several coefficients at once. Below, we simply place every coefficient into its own block:

```{r model_setup}
block_vec = rep(1, p)
```

Now we run `bayeslm` with six different priors and store the posterior means of the coefficients:

```{r prior_comparison, results = 'hide', warning = FALSE, error = FALSE}
# Horseshoe prior
fit1 = bayeslm(y, x, prior = 'horseshoe', icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est1 = colMeans(fit1$beta)

# Laplace prior
fit2 = bayeslm(y, x, prior = 'laplace', icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est2 = colMeans(fit2$beta)

# Ridge prior
fit3 = bayeslm(y, x, prior = 'ridge', icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est3 = colMeans(fit3$beta)

# "Sharkfin" prior
fit4 = bayeslm(y, x, prior = 'sharkfin', icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est4 = colMeans(fit4$beta)

# "Non-local" prior
fit5 = bayeslm(y, x, prior = 'nonlocal', icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est5 = colMeans(fit5$beta)

# Inverse Laplace prior
fit6 = bayeslm(y, x, prior = 'inverselaplace', lambda = 0.01, icept = FALSE,
               block_vec = block_vec, N = 10000, burnin = 2000)
beta_est6 = colMeans(fit6$beta)
```
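Beyond point estimates, the posterior draws let us quantify uncertainty directly. As a quick sketch using only base R and the `$beta` draw matrix already used above, we can compute marginal 95% credible intervals under the horseshoe prior and check what fraction of the true coefficients they cover:

```{r horseshoe_intervals}
# Marginal 95% posterior credible intervals for each coefficient (horseshoe fit)
ci = apply(fit1$beta, 2, quantile, probs = c(0.025, 0.975))
# Proportion of true coefficients covered by their marginal interval
mean(beta_true >= ci[1, ] & beta_true <= ci[2, ])
```

Next, we plot the posterior means of the regression coefficients, along with the OLS estimates, against the true simulated coefficients.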
```{r comparison_plot, fig.height=5, fig.width=7}
plot(NULL, xlim = range(beta_true), ylim = range(beta_true),
     xlab = "beta true", ylab = "estimation")
points(beta_true, beta_est1, pch = 20)
points(beta_true, fitOLS$coef, col = 'red')
points(beta_true, beta_est2, pch = 20, col = 'cyan')
points(beta_true, beta_est3, pch = 20, col = 'orange')
points(beta_true, beta_est4, pch = 20, col = 'pink')
points(beta_true, beta_est5, pch = 20, col = 'lightgreen')
points(beta_true, beta_est6, pch = 20, col = 'grey')
legend("topleft",
       c("OLS", "horseshoe", "laplace", "ridge", "sharkfin", "nonlocal", "inverselaplace"),
       col = c("red", "black", "cyan", "orange", "pink", "lightgreen", "grey"),
       pch = c(1, rep(20, 6)))
abline(0, 1, col = 'red')
```

We can also compare the coefficient estimation error for each prior, measured as the square root of the summed squared errors:

```{r rmse_comparison}
rmseOLS = sqrt(sum((fitOLS$coef - beta_true)^2))
rmse1 = sqrt(sum((beta_est1 - beta_true)^2))
rmse2 = sqrt(sum((beta_est2 - beta_true)^2))
rmse3 = sqrt(sum((beta_est3 - beta_true)^2))
rmse4 = sqrt(sum((beta_est4 - beta_true)^2))
rmse5 = sqrt(sum((beta_est5 - beta_true)^2))
rmse6 = sqrt(sum((beta_est6 - beta_true)^2))
print(cbind(ols = rmseOLS, hs = rmse1, laplace = rmse2, ridge = rmse3,
            sharkfin = rmse4, nonlocal = rmse5, inverselaplace = rmse6))
```

#### Updating coefficients in blocks

Here, we demonstrate:

1. How to place several coefficients in the same block for the slice-within-Gibbs sampler
2. How to use formula notation (`y ~ x`) with `bayeslm`

```{r block_sampling}
# Put the first two coefficients in one elliptical slice sampling block
block_vec2 = c(2, rep(1, p - 2))
fitb = bayeslm(y ~ x - 1, data = data, prior = 'horseshoe',
               block_vec = block_vec2, N = 10000, burnin = 2000)
summary(fitb)
```
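Posterior draws are returned as a matrix with one row per retained iteration, so standard MCMC diagnostics apply. The sketch below is a minimal example, assuming (as with the fits above) that the coefficient draws are available in `fitb$beta`; it traces the draws for the first coefficient and marks its true value:

```{r trace_plot, fig.height=4, fig.width=7}
# Trace of the retained posterior draws for the first coefficient;
# a stable trace fluctuating around the red line suggests good mixing
plot(fitb$beta[, 1], type = 'l', xlab = "iteration", ylab = "beta[1] draw")
abline(h = beta_true[1], col = 'red')
```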