--- title: "Nonparametric Regression with Bayesian Additive Regression Kernels" output: rmarkdown::html_vignette author: "Merlise A Clyde" date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Nonparametric Regression with Bayesian Additive Regression Kernels} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Bayesian Additive Regression Kernel (BARK) models are a flexible Bayesian nonparametric model for regression and classification problems where the unknown mean function is represented as a weighted sum of multivariate Gaussian kernel functions, \begin{equation} f(\mathbf{x}) = \sum_j \beta_j \prod_d \exp( \lambda_d (x_d - \chi_{jd})^2) \end{equation} that allows nonlinearities, interactions and feature selection using Levy random fields to construct a prior on the unknown function. Each kernel is centered at location parameters $\chi_{jd}$ with precision parameters $\lambda_d$ - these precision parameters capture the importance of each of the $d$ dimensional predictor variables and by setting a $\lambda_d$ to zero may remove important variables that are not important. ## Installation To get the latest version of {r bark}, install from github (needs compilation)) ```{r, eval=FALSE} devtools::install_github("merliseclyde/bark") ``` ```{r} library(bark) ``` ## Example We will illustrate feature selection in a simple simulated example from Friedman ```{r Friedman2} set.seed(42) traindata <- data.frame(sim_Friedman2(200, sd=125)) testdata <- data.frame(sim_Friedman2(1000, sd=0)) ``` ```{r example-all} set.seed(42) fit.bark.d <- bark(y ~ ., data = traindata, testdata= testdata, classification=FALSE, selection = FALSE, common_lambdas = FALSE, # fixed = list(eps = .25, gam = 2.5), nburn = 100, nkeep = 250, printevery = 10^10) mean((fit.bark.d$yhat.test.mean-testdata$y)^2) ``` ```{r example-selection} set.seed(42) fit.bark.sd <- bark(y ~ ., data=traindata, testdata = testdata, classification=FALSE, selection = TRUE, common_lambdas = FALSE, fixed = list(eps = .5, gam = 5), nburn = 100, nkeep = 250, printevery = 10^10) mean((fit.bark.sd$yhat.test.mean-testdata$y)^2) ``` bark is similar to SVM, however it allows different kernel smoothing parameters for every dimension of the inputs $x$ using the option `common_lambdas = FALSE` as well as selection of inputs by allowing the kernel smoothing parameters to be zero using the option `selection = TRUE`. The plot below shows posterior draws of the $\lambda$ for the simulated data under the two scenarios allowing different $\lambda_d$ by dimension with and without selection. ```{r} boxplot(as.data.frame(fit.bark.d$theta.lambda)) ``` ```{r} boxplot(as.data.frame(fit.bark.sd$theta.lambda)) ``` While the plots of the $\lambda_j$ without selection (top) and with selection (bottom) are similar, the additional shrinkage of values towards zero has lead to an improvement in RMSE. ## Comparison We will compare {r bark} to two other popular methods, {r BART} and {r SVM} that provide flexible models that are also non-linear in the input variables. 
## Comparison

We will compare `bark` to two other popular methods, BART (from the `BART` package) and SVM (from the `e1071` package), which also provide flexible models that are non-linear in the input variables.

```{r}
bart.available = suppressMessages(require(BART))
svm.available = suppressMessages(require(e1071))
io.available = suppressMessages(require(fdm2id))
```

### SVM

```{r svm-reg}
if (svm.available) {
  friedman2.svm = svm(y ~ ., data = traindata, type = "eps-regression")
  pred.svm = predict(friedman2.svm, testdata)
  mean((pred.svm - testdata$y)^2)
}
```

### BART

```{r bart-reg}
if (bart.available) {
  y.loc = match("y", colnames(traindata))
  friedman2.bart = wbart(x.train = as.matrix(traindata[, -y.loc]),
                         y.train = traindata$y)
  pred.bart = predict(friedman2.bart, as.matrix(testdata[, -y.loc]))
  yhat.bart = apply(pred.bart, 2, mean)
  mean((yhat.bart - testdata$y)^2)
}
```

### Classification Example

The data are generated so that only the first two dimensions matter:

```{r}
set.seed(42)
n = 500
circle2 = data.frame(sim_circle(n, dim = 5))
train = sample(1:n, size = floor(n/2), rep = FALSE)
```

```{r, fig.width=4, fig.height=4}
plot(x.1 ~ x.2, data = circle2, col = y + 1)
```

```{r bark}
set.seed(42)
circle2.bark = bark(y ~ ., data = circle2, subset = train,
                    testdata = circle2[-train, ],
                    classification = TRUE,
                    selection = TRUE,
                    common_lambdas = FALSE,
                    fixed = list(eps = .5, gam = 5),
                    nburn = 100,
                    nkeep = 250,
                    printevery = 10^10)
```

```{r}
# Classify test observations as 1 when the posterior mean of the latent
# function is positive, and report the test misclassification rate
mean((circle2.bark$yhat.test.mean > 0) != circle2[-train, "y"])
```

The plot below shows posterior draws of the $\lambda_d$ for the simulated data.

```{r}
boxplot(as.data.frame(circle2.bark$theta.lambda))
```

### SVM

```{r svm}
if (svm.available) {
  circle2.svm = svm(y ~ ., data = circle2[train, ], type = "C")
  pred.svm = predict(circle2.svm, circle2[-train, ])
  mean(pred.svm != circle2[-train, "y"])
}
```

### BART

```{r bart}
if (bart.available) {
  y.loc = match("y", colnames(circle2))
  circle.bart = pbart(x.train = as.matrix(circle2[train, -y.loc]),
                      y.train = circle2[train, y.loc])
  pred.bart = predict(circle.bart, as.matrix(circle2[-train, -y.loc]))
  mean((pred.bart$prob.test.mean > .5) != circle2[-train, y.loc])
}
```
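As a rough side-by-side summary, the test misclassification rates from the three classification fits above can be collected into a single table. This is a minimal sketch that simply reuses the expressions from the previous chunks, assuming all three fits (`circle2.bark`, `circle2.svm`, and `circle.bart`) are available.

```{r compare-class, eval=FALSE}
# Misclassification rates on the held-out observations, reusing the fits above
data.frame(
  method = c("bark", "svm", "bart"),
  error = c(mean((circle2.bark$yhat.test.mean > 0) != circle2[-train, "y"]),
            mean(pred.svm != circle2[-train, "y"]),
            mean((pred.bart$prob.test.mean > .5) != circle2[-train, "y"])))
```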