%\VignetteIndexEntry{BGLR-extdoc}
\documentclass[article,shortnames,nojss]{jss}

%Extra packages
\usepackage{amsmath}
\usepackage{tikz}
\usepackage{subfig}
\usepackage{appendix}
\usepackage{rotating}

\newlength{\RoundedBoxWidth}
\newsavebox{\GrayRoundedBox}
\newenvironment{GrayBox}[1][\dimexpr\textwidth-4.5ex]%
   {\setlength{\RoundedBoxWidth}{\dimexpr#1}
    \begin{lrbox}{\GrayRoundedBox}
       \begin{minipage}{\RoundedBoxWidth}}%
   {   \end{minipage}
    \end{lrbox}
    \begin{center}
    \begin{tikzpicture}%
       \draw node[draw=black,fill=black!10,rounded corners,%
             inner sep=2ex,text width=\RoundedBoxWidth]%
             {\usebox{\GrayRoundedBox}};
    \end{tikzpicture}
    \end{center}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% almost as usual
\author{Paulino P\'erez \\Colegio de Postgraduados, M\'exico \And
        Gustavo de los Campos \\University of Alabama at Birmingham}
\title{BGLR: A Statistical Package for Whole Genome Regression and Prediction}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Paulino P\'erez, Gustavo de los Campos} %% comma-separated
\Plaintitle{BGLR: A Statistical Package for Whole-Genome Regression} %% without formatting
\Shorttitle{\pkg{BGLR}: An R-package for Whole-Genome Regression} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{

Many modern genomic data analysis problems require implementing 
regressions where the number of unknowns ($p$, e.g., the number of marker effects) 
vastly exceeds sample size ($n$). Implementing these \textit{large-$p$-with-small-$n$} 
regressions poses several statistical and computational challenges. Some of these challenges
can be confronted using Bayesian methods, and the Bayesian approach
allows integrating various parametric and 
non-parametric shrinkage and variable selection procedures 
in a unified and consistent manner. The \pkg{BGLR} R-package implements a 
large collection Bayesian regression models, including various parametric 
regressions where regression coefficients are allowed to have different 
types of prior densities (flat, normal, scaled-t, double-exponential 
and various finite mixtures of the spike-slab family) and 
semi-parametric methods (Bayesian reproducing kernel Hilbert spaces nregressions, RKHS). 
The software was originally developed as an extension of the \pkg{BLR} 
package and with a focus on genomic applications; however, the methods 
implemented are useful for many non-genomic applications as well. The 
response can be continuous (censored or not) or categorical (either binary, 
or ordinal). The algorithm is based on a Gibbs Sampler with scalar 
updates and the implementation takes advantage of efficient compiled 
C and Fortran routines. In this article we describe the methods 
implemented in \pkg{BGLR}, present examples of the use of the package
 and discuss practical issues emerging in real-data analysis.
  
}
\Keywords{Bayesian Methods, Regression, Whole Genome Regression, Whole Genome Prediction, 
Genome Wide Regression, Variable Selection, Shrinkage, semi-parametric regression, RKHS, \proglang{R}}
\Plainkeywords{Bayesian Regression, Whole Genome Regression, Whole Genome Prediction, 
Genome Wide Regression, Variable Selection, Shrinkage, semi-parametric regression, R} %% without formatting

%% publication information
%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
%% \Volume{50}
%% \Issue{9}
%% \Month{June}
%% \Year{2012}
%% \Submitdate{2012-06-04}
%% \Acceptdate{2012-06-04}

%% The address of (at least) one author should be given
%% in the following format:
\Address{

  Paulino P\'erez\\
  Socio Econom\'ia Estad\'istica e Inform\'atica \\
  Colegio de Postgraduados, M\'exico\\
  E-mail: \email{perpdgo@colpos.mx}\\
  
  Gustavo de los Campos\\
  Department of Biostatistics \\
  Section on Statistical Genetics \\ 
  University of Alabama at Birmingam \\
  Telephone: +1/205/975-9248 \\
  Fax: +1 /205/975-2540 \\
  E-mail: \email{gcampos@uab.edu}\\
  \url{http://www.soph.uab.edu/ssg/people/campos}
  
}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

%% include your article here, just as usual
%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
\label{sec:Intro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Many modern statistical learning problems involve the analysis of highly dimensional data; 
this is particularly common in genetic studies where, for instance, phenotypes 
are regressed on large numbers of predictor variables (e.g., SNPs) concurrently. 
Implementing these \textit{large-p-with-small-n} regressions posses several 
statistical and computational challenges; including how to confront the so-called 
`curse of dimensionality' \citep{Bellman:1961} as well 
as the complexity of a genetic mechanism that can involve various types of 
interactions between alleles and with environmental factors. Recent developments 
in the areas of shrinkage estimation, both in the penalized and Bayesian 
regression frameworks, as well as in computational methods have 
made the implementation of these \textit{large-p-with-small-n} regressions feasible. 
Consequently, whole-genome-regression approaches \citep{Meuwissen:2001} are 
becoming increasingly popular for the analysis and prediction of complex 
traits in plants  \citep[e.g.][]{Crossa:2010}, animals 
\citep[e.g.][]{VanRaden:2009, Hayes:2009} 
and humans \citep[e.g.][]{Yang:2010,Makowsky:2011,Vazquez:2012, delosCampos:2013c}.

In the last decade a large collection of parametric and non-parametric 
methods have been proposed and empirical evidence has demonstrated that 
there is no single approach that performs best across data sets and 
traits. Indeed, the choice of the model depends on multiple factors 
such as the genetic architecture of the trait, marker density, sample 
size, the span of linkage disequilibrium
\citep[e.g.,][]{delosCampos:2013b}. Although various 
software (\textbf{BLR}, \citealt{Perez:2010}; \textbf{rrBLUP}, 
\citealt{Endelman:2011};
\textbf{synbreed}, \citealt{Wimmer:2012}; \textbf{GEMMA}, \citealt{Zhou:2012}) 
exists, most statistical packages implement a few 
types of methods and there is need of integrating these methods in a 
unified statistical and computational framework. Motivated by this need we 
have developed the R \citep{Rsoft} package 
\pkg{BGLR} \citep{delosCampos:2013a}. The package is available 
at CRAN and at the R-forge website 
\url{https://r-forge.r-project.org/projects/bglr/}. 

\textbf{Models.} \pkg{BGLR} can be used with \textbf{continuous} (censored or not) 
and \textbf{categorical} traits (binary and ordinal). The user has 
control in choosing the prior assigned to effects and this can be used to control the extent 
and type of shrinkage of estimates. For \textbf{parametric linear regressions on covariates} 
(e.g., genetic markers, non-genetic co-variates) the 
user can choose a variety of prior densities, from flat 
priors (the so-called \textbf{\emph{`fixed effects'}}, a method that 
does not induce shrinkage of estimates) to priors that induce 
different types of shrinkage, including: 
Gaussian (\textbf{Bayesian Ridge Regression}, \textbf{BRR}), 
scaled-t \citep[\textbf{BayesA}][]{Meuwissen:2001}, 
Double-Exponential \citep[\textbf{Bayesian LASSO}, \textbf{BL}][]{Park:2008}, and 
\textbf{\emph{two component mixtures}} with a point of mass at zero and a with a slab 
that can be either Gaussian \citep[\textbf{BayesC},][]{Habier:2011} or scaled-t 
\citep[\textbf{BayesB},][]{Habier:2011}. 
The \pkg{BGLR} package also implements \textbf{\emph{Bayesian Reproducing Kernel Hilbert Spaces Regressions}} 
\citep[\textbf{RKHS,}][]{Wahba:1990} using Gaussian processes with arbitrarily 
user-defined co-variance structures. This class of models allows 
implementing semi-parametric regressions for various types of 
problems, including, scatter-plot smoothing
\citep[e.g., \textbf{\emph{smoothing splines}}][]{Wahba:1990}, 
\textbf{spatial smoothing} \citep{Cressie:1988}, 
\textbf{\emph{Genomic-BLUP}} \citep{VanRaden:2008}, 
non-parametric \textbf{\emph{RKHS}} genomic regressions 
\citep{Gianola:2006, Gianola:2008,delosCampos:2010b} and \textbf{\emph{pedigree-BLUP}} 
\citep{Henderson:1975}.

All the above-mentioned prior densities (e.g., Gaussian, Double Exponential, Scaled-t, finite mixtures) 
are index by  \textbf{regularization parameters} that control the extent of shrinkage 
of estimates; rather than fixing them to some user-specified 
values we treat them as random. Consequently, in a deeper level of the 
hierarchal model these regularization parameters are assigned 
prior densities.

\textbf{Algorithms}. In \pkg{BGLR} samples from the posterior 
density are drawn using a Gibbs sampler \citep{Geman:1984, Casella:1992}; with 
scalar updating.  This approach is 
very flexible but computationally demanding. To confront 
the computational challenges emerging in Markov Chain 
Monte Carlo (MCMC) implementations we have adopted a strategy that combines: 
(a) the use of built-in R functions for operations that can be vectorized with 
(b) customized compiled code (C and Fortran) developed to perform operations that cannot be vectorized. Thus, 
the kernel of our software is written in R, but the computationally 
demanding steps are carried out using customized routines written in C and 
Fortran code. The implementation makes use of BLAS routines \texttt{daxpy} and \texttt{ddot}. 
The computational performance of the algorithm can be 
greatly improved if R is linked against a tuned BLAS 
implementation with multithread support, for example 
OpenBLAS, ATLAS, Intel mkl, etc. 

\textbf{Ancillary functions and data sets}. In addition to the main 
function (BGLR) the package comes with: (a) functions to read 
and write from the R-console \texttt{*.ped} and \texttt{*.bed} files 
\citep{Purcell:2007}, (b) two publicly available data sets 
(see Section \ref{sec:datasets} for further details)
and (c) various examples (type \texttt{demo(package=`BGLR')} in the R-console). 

In what remains of the article we discuss the 
\textbf{methods implemented} (\textbf{Section \ref{sec:methods}}), the 
\textbf{user interface}  (\textbf{Section \ref{sec:interface}}), and the \textbf{data sets} 
included (\textbf{Section \ref{sec:datasets}}) in the \pkg{BGLR} package in detail. 
\textbf{Application examples} are given in \textbf{Section \ref{sec:examples}}. 
A small benchmark is given in \textbf{Section \ref{sec:benchmark}}. Finally, the article 
is closed in \textbf{Section \ref{sec:concluding}} with a few \textbf{concluding remarks}.  


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Statistical Models and Algorithms}
\label{sec:methods}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The \pkg{BGLR} supports models for continuous (censored or not) and categorical 
(binary or ordinal multinomial) traits. We begin by 
considering the case of a continuous response 
without censoring; categorical and 
censored data are considered later on.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Conditional distribution of the data}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For a continuous response ($y_i;\,\, i=1,....,n$) the data equation 
is represented as  $y_i=\eta_i+\varepsilon_i$, 
where $\eta_i$ is a linear predictor 
(the expected value of  $y_i$ given predictors) 
and $\varepsilon_i$ are independent normal model 
residuals with mean zero and variance $w_i^2 \sigma_\varepsilon^2$. Here, the $w_i's$ 
are user-defined weights (by default \pkg{BGLR} sets $w_i=1$ 
for all data-points) and $\sigma_\varepsilon^2$ is a residual 
variance parameter. In matrix notation we have

\begin{equation*}
  \boldsymbol{y}=\boldsymbol{\eta} + \boldsymbol{\varepsilon},
\end{equation*}

where $\boldsymbol y=\{y_1,...,y_n\}$,  $\boldsymbol \eta=\{\eta_1,...,\eta_n\}$ and 
$\boldsymbol \varepsilon=\{\varepsilon_1,...,\varepsilon_n\}$.

The linear predictor represents the conditional 
expectation function, and it is structured as follows:

\begin{equation}
  \label{eq:eta}
  \boldsymbol{\eta}=\boldsymbol 1 \mu + %
  \sum_{j}^J \boldsymbol{X}_j \boldsymbol{\beta}_j + %
  \sum_{l}^{L} \boldsymbol u_l,
\end{equation}

where $\mu$ is an intercept, $\boldsymbol X_j$ are design matrices for predictors, 
$\boldsymbol X_j=\{x_{ijk}\}$, $\boldsymbol \beta_{jk}$ are vectors of 
effects associated to the columns of $\boldsymbol X_j$ and 
$\boldsymbol u_l=\{u_{l1},...,u_{ln}\}$ are vectors of random effects. 
The only element of the linear predictor included by default is the intercept. The 
other elements are user-specified. Collecting the above assumptions, 
we have the following likelihood:

\begin{equation*}
  p(\boldsymbol y | \boldsymbol \theta)= %
  \prod_{i=1}^n N(y_i | \mu + \sum_j^J \sum_{k=1}^{K_j} x_{ijk} \beta_{jk} + \sum_l^L u_{li}, %
                  \sigma_\varepsilon^2 w_i^2),
\end{equation*}

where $\boldsymbol \theta$ represents the collection of 
unknowns, including the intercept, regression 
coefficients, random effects and the residual variance.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Prior densities}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The \textbf{residual variance} is assigned a Scaled-inverse Chi-square 
density $p(\sigma_\varepsilon^2)=\chi^{-2}(\sigma_\varepsilon^2 | S_\varepsilon, df_\varepsilon)$
with degree of freedom $df_\varepsilon$ (>0) 
and scale parameters $S_\varepsilon$ (>0) and the intercept $(\mu)$ is assigned a flat prior. 
In the parameterization used in \pkg{BGLR}, the prior expectation of the Scaled-inverse Chi-square 
density $\chi^{-2}(\cdot| S_{\cdot}, df_{\cdot})$ is given by  
$\frac{S_\cdot}{df_\cdot-2}$.

\textbf{Regression coefficients} $\{\beta_{jk}\}$ can be assigned either 
un-informative (i.e., flat) or informative priors. Those coefficients 
assigned flat priors, the so-called `fixed' effects, are estimated based on 
information contained in the likelihood solely. For the coefficient 
assigned informative priors, the choice of the prior will play an important 
role in determining the type of shrinkage of estimates of effects induced. 
Figure \ref{fig:prior_densities} provides a graphical representation of the prior 
densities available in \pkg{BGLR}. 
The \textbf{\emph{Gaussian prior}} induce shrinkage of estimate similar to that of Ridge Regression 
\citep[\textbf{RR},][]{Hoerl:1970} where all effects are shrunk to a 
similar extent; we refer to this model as the Bayesian 
Ridge Regression (BRR). The \textbf{scaled-t} 
and \textbf{double exponential} (DE) densities have higher mass at zero 
and thicker tails than the normal density, and they induce a type of 
shrinkage of estimates that is size-of-effect dependent \citep{Gianola:2013}. The 
scaled-t density is the prior used in model 
BayesA \citep{Meuwissen:2001}, and the DE or 
Laplace prior is the one used in the BL \citep{Park:2008}. 
Finally, \pkg{BGLR} implements two \textbf{\emph{finite mixture priors}}: a mixture of 
a point of mass at zero and a Gaussian slab, a model usually 
refereed in the literature on GS as to \textbf{BayesC} \citep{Habier:2011} 
and a mixture of a point of mass at zero and a scaled-t slab, a 
model known as \textbf{BayesB} \citep{Meuwissen:2001}. By assigning a 
non-null prior probability for the marker effect to be equal to zero, 
the priors used in BayesB and BayesC have potential 
for inducing variable selection.

\begin{figure}[!htb]
  \centering
  \includegraphics[width=0.75\textwidth]{fig_1}
  \caption{Prior Densities of Regression Coefficients Implemented in \pkg{BGLR}. All the densities displayed correspond to random variables with null mean and unit variance.}
  \label{fig:prior_densities}
\end{figure}


\textbf{Hyper-parameters}. Each of the prior distributions 
above-described  are indexed by one or more parameters that control 
the type and extent of shrinkage induced. We treat these 
regularization parameters as unknown; consequently a prior 
is assigned to these unknowns. Table \ref{tab:priors_marker_effects} 
lists, for each of the prior densities implemented the set of hyper-parameters. 
Further details about how regularization parameters are inferred 
from the data are given in the Appendix. 


\begin{table}[!htb]
  \caption{Prior densities available for regression coefficients in the \pkg{BGLR} package.}
  \begin{center}
  \begin{tabular}{lll}
  \hline
  \textbf{Model}                        & \textbf{Hyper-parameters}  & \textbf{Treatmend in BGLR}$^1$ \\
  \textbf{(prior density)}              &                            & \\
  \hline
  Flat                                 & Mean ($\mu_\beta$)         & $\mu_\beta=0$ \\
  (FIXED)                                & Variance ($\sigma_\beta^2$)& $\sigma_\beta^2=1 \times 1^{10}$\\
  \hline
  Gaussian                                & Mean ($\mu_\beta$)         & $\mu_\beta=0$ \\
  (BRR)                            & Variance ($\sigma_\beta^2$)& $\sigma_\beta^2 \sim \chi^{-2}$\\
  \hline
  Scaled-t                                & Degrees of freedom ($df_\beta$) & User-specified (default value, 5) \\
  (BayesA)                            & Scale ($S_\beta$)  & $S_\beta \sim Gamma$\\
  \hline
   Double-Exponential                               &                            & $\lambda$ Fixed, user specified, or \\
    (BL)                                & $\lambda^2$  & $\lambda^2 \sim Gamma$, or      \\
                                        &                            & $\frac{\lambda}{\max} \sim Beta ^2$ \\                                        
  \hline
  Gaussian Mixture                                    &$\pi$ (prop. of non-null effects) & $\pi \sim Beta$ \\
   (BayesC)                               &$df_\beta$                       & User-specified (default value, 5) \\
                                        &$S_\beta$                        & $S_\beta \sim Gamma$ \\                                            
  \hline
   Scaled-t Mixture                                     &$\pi$ (prop. of non-null effects) & $\pi \sim Beta$ \\
   (BayesB)                               &$df_\beta$                       & User-specified (default value, 5) \\
                                        &$S_\beta$                        & $S_\beta \sim Gamma$ \\     
  \hline
  \end{tabular}
  \end{center}
  1: Further details are given in the Appendix. 2: This approach is further discussed in 
  \cite{delosCampos:2009a}.
  \label{tab:priors_marker_effects}
\end{table}

\textbf{Combining priors}. Different priors can be specified for each of the elements of the linear predictor, 
$\{\boldsymbol{X}_1,\boldsymbol{X}_2,...,\boldsymbol{X}_J, \boldsymbol{u}_1,\boldsymbol{u}_2,...,\boldsymbol{u}_L\}$, 
giving the user great flexibility in building models for data analysis;
an example illustrating how to combine different priors in a model is given in Box 3a of Section \ref{sec:examples}.

\textbf{Gaussian Processes}. The vectors of random effects 
$\boldsymbol u_l$ are assigned multivariate-normal 
priors with a mean equal to zero and co-variance matrix 
$Cov(\boldsymbol u_l,\boldsymbol u_l')=\boldsymbol K_l \sigma_{ul}^2$ 
where $\boldsymbol K_l$ is an 
$n \times n$ symmetric positive semi-definite matrix 
and $\sigma_{ul}^2$ is a variance parameter with prior 
density $\sigma_{ul}^2 \sim \chi^{-2} (df_l,S_l)$.  Special classes 
of models that can be implemented using these random effects include 
standard pedigree-regression models \citep{Henderson:1975} in which 
case $\boldsymbol K_l$ is a pedigree-derived co-variance matrix, 
Genomic BLUP \citep{VanRaden:2008}, which case $\boldsymbol K_l$ may 
be a marker-derived relationship matrix, or models for 
spatial regressions \citep{Cressie:1988} in which case  
$\boldsymbol K_l$ may be a co-variance matrix 
derived from spatial information. Illustration about the 
inclusion of these Gaussian processes into models for 
data analysis are given in examples of Section \ref{sec:examples}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Algorithms}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The R-package \pkg{BGLR} draws samples from the posterior density using a 
Gibbs sampler  \citep{Geman:1984,Casella:1992} with scalar updating. 
For computational convenience the scaled-t and DE densities 
are represented as infinite mixtures of scaled normal 
densities \citep{Andrews:1974}, and the finite-mixture
priors are implemented using latent random Bernoulli variables linking effects
to components of the mixtures. 

\textbf{\emph{Categorical traits}}.  The argument \texttt{response\_type} 
is used to indicate \pkg{BGLR} whether the response should be 
regarded as \texttt{`continuous'}, the default value, or \texttt{`ordinal'}. For continuos traits the
response vector shoud be coercible to numeric; for ordinal traits the response can take onto $K$ possible (ordered) values 
$y_i \in \{1,...,K\}$ (the case where $K=2$ corresponds to the 
binary outcome), and the response vector should be coercible to a factor. For categorical traits
we use the probit link; here, the probability of 
each of the categories is linked to the linear 
predictor according to the following link function:

\begin{equation*}
  P(y_i=k)=\Phi(\eta_i-\gamma_k)-\Phi(\eta_i-\gamma_{k-1})
\end{equation*}

where $\Phi(\cdot)$ is the standard normal cumulative distribution function, 
$\eta_i$ is the linear predictor, specified as above-described, and 
$\gamma_k$ are threshold parameters, with 
$\gamma_0=-\infty$, $\gamma_k \geq \gamma_{k-1}$, $\gamma_K=\infty$.  
The probit link is implemented using data augmentation \citep{Tanner:1987}, 
this is done by introducing a latent variable (so-called liability) 
$l_i=\eta_i+\varepsilon_i$ and a measurement model $y_i=k$ if $\gamma_{k-1} \leq l_i \leq \gamma_k$.
For identification purpouses, the residual variance is set equal to one. 
At each iteration of the Gibbs sampler the un-observed liability 
scores are sampled from truncate normal densities; once 
the un-observed liability has been sampled the Gibbs sampler proceed as 
if  $l_i$ were observed \citep[see][for further details]{Albert:1993}.

\textbf{\emph{Missing data}}.  The response vector 
can contain missing values. Internally, at each iteration of the Gibbs 
sampler missing values are sampled from the corresponding 
fully-conditional density. Missing values in predictors are not allowed.

\textbf{\emph{Censored data}}. Censored data in \pkg{BGLR} is described a 
triplet $\{a_i, y_i , b_i\}$; the elements of this 
triplet must satisfy: $a_i <y_i<b_i$. Here, $y_i$  is 
the observed response (e.g., a time-to event variable, 
observable only in un-censored data points, 
otherwise missing, \texttt{NA}) and $a_i$  and $b_i$  
define lower and upper-bounds for the response, 
respectively. Table \ref{tab:triplet} gives the configuration of 
the triplet for the different types of data-points. 
The triplets are provided to \pkg{BGLR} in the form of three vectors 
$(\boldsymbol y, \boldsymbol a, \boldsymbol b)$. The vectors 
$\boldsymbol a$ and $\boldsymbol b$ have \texttt{NULL} as default value; 
therefore, if only $\boldsymbol y$ is provided this 
is interpreted as a continuous trait without censoring. 
If $\boldsymbol a$ and $\boldsymbol b$ are provided 
together with $\boldsymbol y$ data is treated as censored. 
We treat censoring as a missing data problem; the missing values of $y_i$ present due 
to censoring are sampled from truncated normal densities 
that satisfy $a_i <y_i<b_i$. Further details about models 
for censored data are given in examples of section \ref{sec:examples}.

\begin{table}[!htb]
   \caption{Configuration of the triplet used to described censored data-points in BGLR.}
   \centering
   \begin{tabular}{lcccc}
   \hline
     Type of point   & $a_i$           & $y_i$        & $b_i$  \\
   \hline
   Un-censored       & \texttt{NULL}   & $y_i$        & \texttt{NULL} \\
   Right censored    & $a_i$           & \texttt{NA}  & \texttt{Inf} \\
   Left censored     & -\texttt{Inf}   & \texttt{NA}  & $b_i$     \\
   Interval censored & $a_i$           & \texttt{NA}  & $b_i$ \\
   \hline
  \end{tabular}
  \label{tab:triplet}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Software interface}
\label{sec:interface}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The R-package \pkg{BGLR} \citep{delosCampos:2013a} inherits part of its user interface 
from \pkg{BLR} \citep{delosCampos:2010a}. A detailed description of this package 
can be found in \citep{Perez:2010}; however we have modified key 
elements of the user-interface, and the internal implementation,
 to provide the user more flexibility in building models for data analysis. 
All the arguments of the BGLR function have default values, except the vector of phenotypes. 
Therefore, the simplest call to the BGLR program is as follows:

\begin{GrayBox}
\small
\textbf{Box 1a: Fitting an intercept model}
\begin{verbatim}
library(BGLR)
y<-50+rnorm(100)
fm<-BGLR(y=y)
\end{verbatim}
\end{GrayBox}


When the call \texttt{fm<-BGLR(y=y)} is made, BGLR fits an intercept model, 
a total of 1500 cycles of a Gibbs sampler are run, and the 1st 500 samples 
are discarded. As the Gibbs sampler collects samples some are saved to the hard 
drive (only the most recent samples are retained in memory) in files 
with extension \texttt{*.dat} and the running means required for computing
estimates of the posterior means and of the posterior standard 
deviations are updated; by default a thinning of 5 is used but this can be modified 
by the user using the \texttt{thin} argument of \pkg{BGLR}. Once the 
iteration process finishes BGLR returns a list with estimated 
posterior means and several arguments used in the call. 


\textbf{Inputs}

Box 1b displays a list of the main arguments of the \pkg{BGLR} function, a short description follows:

\begin{itemize}
  \item \texttt{y,a,b} (\texttt{y}, coercible to either numeric or factor, 
         \texttt{a} and \texttt{b} of type numeric) and \texttt{response\_type} (character) 
         are used to define the response. 
  \item \texttt{ETA} (of type \texttt{list}) is used to 
        specify the linear predictor. By default is set to \texttt{NULL}, in which 
        case only the intercept is included. Further details about the specification 
        of this argument are given below.
  \item \texttt{nIter}, \texttt{burnIn} and \texttt{thin} (all of type \texttt{integer}) 
        control the number of iterations of the sampler, the number of 
        samples discarded and the thinning used to compute posterior means.
  \item \texttt{saveAt} (\texttt{character})  can be used to indicate 
        BGLR where to store the samples, and to provide a pre-fix to be appended 
        to the names of the file where samples are stored. By defaults 
        samples are saved in the current working directory and no pre-fix is added to the file names.
  \item \texttt{S0}, \texttt{df0}, \texttt{R2} (\texttt{numeric}) define 
        the prior assigned to the residual variance, \texttt{df0} defines the 
        degree of freedom and \texttt{S0} the scale. If the scale is \texttt{NULL}, 
        its value is chosen so that the prior mode of the 
        residual variance matches the variance of phenotypes times \texttt{1-R2} (see the Appendix for further details).
\end{itemize}


\begin{GrayBox}
\small
\textbf{Box 1b: Partial list of arguments of the BGLR function}

\begin{verbatim}
BGLR( y, a = NULL, b = NULL, response_type = "gaussian", 
      ETA = NULL, 
      nIter = 1500, burnIn = 500, thin = 5, 
      saveAt = "", 
      S0 = NULL, df0 = 5, R2 = 0.5,...
     )
\end{verbatim}
\end{GrayBox}

\textbf{Return}

The function BGLR returns a list with estimated posterior means and 
estimated posterior standard deviations. The parameters used to fit the 
model are also returned within the list. Box 1c shows the structure of 
the object returned after fitting the intercept model of Box 1a. The 
first element of the list \texttt{(y)} is the response vector used 
in the call to \pkg{BGLR}, \texttt{\$whichNa} gives the index of the entries in \texttt{y} 
that were missing, these two elements are then followed by 
several entries describing the call (omitted in Box 1c), 
this is followed by estimated posterior means and estimated 
posterior standard deviations of the linear predictor (\texttt{\$yHat} 
and \texttt{\$SD.yHat}), the intercept (\texttt{\$mu} and \texttt{\$SD.mu}) 
and the residual variance (\texttt{\$varE} and \texttt{\$SD.varE}). 
Finally \texttt{\$fit} gives a list with $DIC$ and $DIC$-related statistics 
\citep{Spiegelhalter:2002}.


\begin{GrayBox}
\small
\textbf{Box 1c: Structure of the object returned by BGLR (after running the code in Box 1a)}

\begin{verbatim}
str(fm)
List of 20
 $ y            : num [1:100] 50.4 48.2 48.5 50.5 50.2 ...
 $ whichNa      : int(0) 
  .
  .
  .
 $ yHat         : num [1:100] 49.7 49.7 49.7 49.7 49.7 ...
 $ SD.yHat      : num [1:100] 0.112 0.112 0.112 0.112 0.112 ...
 $ mu           : num 49.7
 $ SD.mu        : num 0.112
 $ varE         : num 1.11
 $ SD.varE      : num 0.152
 $ fit          :List of 4
  ..$ logLikAtPostMean: num -147
  ..$ postMeanLogLik  : num -148
  ..$ pD              : num 2.02
  ..$ DIC             : num 298
-	attr(*, "class")= chr "BGLR"
\end{verbatim}
\end{GrayBox}


\textbf{Output files}

Box 1d shows an example 
of the files generated after executing the commands given in Box 1a. 
In this case samples of the intercept (\texttt{mu.dat}) and of 
the residual variance (\texttt{varE.dat}) were stored. These 
samples can be used to assess convergence and to estimate Monte Carlo error.  
The R-package \pkg{coda} \citep{Plummer:2006} provide several useful 
functions for the analysis of samples used in Monte Carlo algorithms.

\begin{GrayBox}
\small
\textbf{Box 1d: Files generated by BGLR
(after running the code in Box 1a)}
\begin{verbatim}
list.files()
[1] "mu.dat"   "varE.dat"
plot(scan("varE.dat,type='o'"))
\end{verbatim}
\end{GrayBox}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Datasets}
\label{sec:datasets}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The \pkg{BGLR} package comes with two genomic datasets 
involving phenotypes, markers, pedigree and other covariates.

\textbf{Mice data set}. This data set is from the Wellcome Trust 
(\url{http://gscan.well.ox.ac.uk}) and has been used for detection of 
Quantitative Trait Loci (QTL) by \cite{Valdar:2006a, Valdar:2006b} and for 
whole-genome regression by \cite{Legarra:2008}, \cite{delosCampos:2009a} 
and \cite{Okut:2011}. The data set consists of genotypes and phenotypes of 
1,814 mice. Several phenotypes are available in the data frame 
\texttt{mice.phenos}. Each mouse was genotyped at 10,346 SNPs. We removed SNPs 
with minor allele frequency (MAF) smaller than 0.05, and missing 
marker genotypes imputed with the corresponding average genotype 
calculated with estimates of allele frequencies derived from 
the same data. In addition to this, an additive relationship 
matrix (\texttt{mice.A}) is provided; this was computed 
using the R-package \texttt{pedigreemm}  \citep{Bates:2009, Vazquez:2010}. 


\textbf{Wheat data set}. This data set is from CIMMYT global 
Wheat breeding program and comprises phenotypic, genotypic and 
pedigree information of 599 wheat lines. The data set was made 
publicly available by \cite{Crossa:2010}. Lines were evaluated 
for grain yield (average of two replicates) at four different 
environments; phenotypes (\texttt{wheat.Y}) were centered and standardized 
to a unit variance within environment. Each of the lines 
were genotyped for 1,279 Diversity Array Technology (DArT) markers. 
At each marker two possibly homocygous were possible and these were 
coded as 0/1. Marker genotypes are given in the object \texttt{wheat.X}. 
Finally a matrix \texttt{wheat.A} provides the pedigree-relationships 
between lines computed from the pedigree (see 
\citealt{Crossa:2010} for further details). Box 2 illustrates how to load 
the wheat and mice data sets.

\begin{GrayBox}
\small
\textbf{Box 2: Loading the mice data set included in BGLR}

\begin{verbatim}
library(BGLR)
data(mice)
data(wheat)
ls()
\end{verbatim}
\end{GrayBox}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Application Examples}
\label{sec:examples}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In this section we illustrate the use of \pkg{BGLR} with examples.
 
\textbf{Fitting Models for Fixed and Random Effects for a Continuous Response}  
 
We illustrate how to fit models with various sets of predictors using the mice data set. 
\citet{Valdar:2006b} pointed out that the cage where mice were housed had an important effect in 
the physiological covariates and \citet{Legarra:2008} and \citet{delosCampos:2009a} used models 
that accounted for sex, litter size, cage, familial relationships and 
markers. One possible linear model that we can fit to some of the 
continuous traits available in the mice data set is as follows: 


\begin{equation*}
 \boldsymbol y= \boldsymbol 1 \mu + \boldsymbol X_1 \boldsymbol \beta_1 + %
                \boldsymbol X_2  \boldsymbol \beta_2 + %
                \boldsymbol X_3 \boldsymbol \beta_3 + % 
                \boldsymbol \varepsilon, 
\end{equation*}

where $\mu$ is an intercept, $\boldsymbol X_1$ is a design matrix for the effects 
of sex and litter size, and $\boldsymbol \beta_1$ is the corresponding vector of effects, 
which will be treated as `fixed'; $\boldsymbol X_2$ is the design matrix for the 
effects of cage and $\boldsymbol \beta_2$ is the vector of cage effects 
which will treat as random (in the example fo Box 3a we assign a Gaussian prior 
to these effects); $\boldsymbol X_3$ is the matrix with marker 
genotypes and $\boldsymbol \beta_3$ the corresponding vector of 
marker effects to which, in the example below, we assign IID double-exponential priors.


\textbf{Fitting the model}. The code provided in Box 3a illustrates 
how to fit the model above-described using \pkg{BGLR}. The first block of code, \texttt{\#1\#}, 
loads the data. In the second block of code we set the linear predictor. 
This is specified using a two-level list. Each of the elements of the 
inner list is used to specify the element of the linear predictor. We can specify the predictors to 
be included in each of the inner lists either by providing the design matrix or by using a formula. When the 
formula is used, the design matrix is created internally using the 
\texttt{model.matrix()} function of R. Finally in the 
3rd block of code we fit the model by calling the \texttt{BGLR()} function.

\begin{GrayBox}
\small
\textbf{Box 3a: Fitting a model to markers and non-genetic effects in \pkg{BGLR}} 

\begin{verbatim}
#1# Loading and preparing the input data
 library(BGLR); data(mice); 
 Y<-mice.pheno; X<-mice.X; A=mice.A; 
 y<-Y$Obesity.BMI; y<-(y-mean(y))/sd(y)

#2# Setting the linear predictor
 ETA<-list(  list(~factor(GENDER)+factor(Litter),                   
                   data=Y,model='FIXED'),
             list(~factor(cage),data=Y, model='BRR'),
             list(X=X,  model='BL')
       )

#3# Fitting the model
  fm<-BGLR(y=y,ETA=ETA, nIter=12000, burnIn=2000)
  save(fm,file='fm.rda')
\end{verbatim}         
\end{GrayBox}

When BGLR begins to run, a message warns the user that 
hyper-parameters were not provided and that consequently they 
were set  using built-in rules; further details about these rules are given in the Appendix.

\textbf{Extracting results}. Once the model was fitted one 
can extract from the list returned by \pkg{BGLR} the estimated posterior means 
and the estimated posterior standard deviations as well as measures of model 
goodness of fit and of model complexity. Also, as \pkg{BGLR} run, it saves 
samples of some of the parameters; these samples can be brought into 
the R-environment for posterior analysis. Box 3b illustrates 
how to extract from the returned object estimates of the posterior means and of the posterior 
deviations and how to create trace and density plots. 

The first block of code (\texttt{\#1\#}) in Box 3b shows how to 
extract estimated posterior means and posterior standard deviations of effects. 
In this case we extract those corresponding to the third element of the linear 
predictor (\texttt{fm\$ETA[[3]]}) which correspond to the markers, 
but the same could be done for any of the elements of the linear predictors. 
For models involving linear regressions \texttt{\$b} and \texttt{\$SD.b} give 
the estimated posterior means and posterior standard deviations of effects.  
The second block (\texttt{\#2\#}) of code of Box 3b shows how to 
extract the estimated posterior mean of the linear predictor, 
and also how to compute the estimated posterior mean 
of particular elements of the linear predictor, in this case we illustrate 
with genomic values (gHat). The third block of code (\texttt{\#3\#}) illustrates 
how to extract $DIC$ \citep{Spiegelhalter:2002} and related statistics; 
finally, the fourth block of code (\texttt{\#4\#}) shows how to 
retrieve samples from the posterior distribution and produce 
trace plots. The plots produced by the code in Box 3b are given in 
Figure \ref{fig:four_plots}. 

\clearpage
\begin{GrayBox}
\small
\textbf{Box 3b: Extracting results from a model fitted using BGLR 
(continues from Box 3a)}

\begin{verbatim}
#1# Estimated Marker Effects & posterior SDs
  bHat<- fm$ETA[[3]]$b
  SD.bHat<- fm$ETA[[3]]$SD.b
  plot(bHat^2, ylab='Estimated Squared-Marker Effect', 
        type='o',cex=.5,col=4,main='Marker Effects')

#2# Predictions
  # Total prediction
    yHat<-fm$yHat
    tmp<-range(c(y,yHat))
    plot(yHat~y,xlab='Observed',ylab='Predicted',col=2,
            xlim=tmp,ylim=tmp); abline(a=0,b=1,col=4,lwd=2)

  # Just the genomic part
    gHat<-X%*%fm$ETA[[3]]$b
    plot(gHat~y,xlab='Phenotype',
         ylab='Predicted Genomic Value',col=2,
         xlim=tmp,ylim=tmp); abline(a=0,b=1,col=4,lwd=2)

#3# Godness of fit and related statistics
   fm$fit
   fm$varE # compare to var(y)

#4# Trace plots
  list.files()

  # Residual variance
   varE<-scan('varE.dat')
   plot(varE,type='o',col=2,cex=.5,ylab=expression(var[e]));
   abline(h=fm$varE,col=4,lwd=2);
   abline(v=fm$burnIn/fm$thin,col=4)

  # lambda (regularization parameter of the Bayesian Lasso)
   lambda<-scan('ETA_3_lambda.dat')
   plot(lambda,type='o',col=2,cex=.5,ylab=expression(lambda));
   abline(h=fm$ETA[[3]]$lambda,col=4,lwd=2);
   abline(v=fm$burnIn/fm$thin,col=4)
\end{verbatim}  
\end{GrayBox}

\begin{figure}[!htb]
  \centering
  \includegraphics[width=1.0\textwidth]{fig_2.png}
  \caption{Squared-Estimated Marker Effects (top-left), % 
           phenotype versus predicted genomic values (top-right), %
           trace plot of residual variance (lower-left) %
           and trace plot of regularization parameter of %
           the Bayesian Lasso (lower-right).}
  \label{fig:four_plots}
\end{figure}

\newpage
\textbf{Fitting a Pedigree+Markers `BLUP' model using BGLR}

In the following example we illustrate how to incorporate 
in the model Gaussian random effects with user-defined 
covariance structures. These types of random effects appear 
both in pedigree and genomic models. The example presented 
here uses the wheat data set included with the package. In 
the example of Box 4a we include two random effects, one 
representing a regression on pedigree, 
$\boldsymbol a \sim N(\boldsymbol 0, \boldsymbol A \sigma_a^2)$, 
where $\boldsymbol A$ is a pedigree-derived numerator 
relationship matrix, and one representing a linear 
regression on markers, 
$\boldsymbol g \sim N(\boldsymbol 0, \boldsymbol G \sigma_{gu}^2)$ 
where, $\boldsymbol G$ is a marker-derived genomic 
relationship matrix. The implementation of 
Gaussian processes in BGLR exploits the equivalence 
between these processes and random regressions on 
principal components \citep{delosCampos:2010b,Janss:2012}.  
Te user can implement a RKHS regression either by providing 
co-variance matrix ($\boldsymbol K$) or its eigen-value decomposition (see the example in Box 4a). When the co-variance matrix is provided,
the eigen-value decomposition is computed internally.


\begin{GrayBox}
\small
\textbf{Box 4a: Fitting a Pedigree + Markers regression using Gaussian Processes}

\begin{verbatim}
#1# Loading and preparing the input data
 library(BGLR); data(wheat); 
 Y<-wheat.Y; X<-wheat.X; A<-wheat.A; 
 y<-Y[,1]
\end{verbatim}
\end{GrayBox}

\begin{GrayBox}
\small
\begin{verbatim}
#2# Computing the genomic relationship matrix
 X<-scale(X,center=TRUE,scale=TRUE)
 G<-tcrossprod(X)/ncol(X)

#3# Computing the eigen-value decomposition of G
 EVD <-eigen(G)

#3# Setting the linear predictor
 ETA<-list(list(K=A, model='RKHS'),
           list(V=EVD$vectors,d=EVD$values, model='RKHS')
       )
 
#4# Fitting the model
  fm<-BGLR(y=y,ETA=ETA, nIter=12000, burnIn=2000,saveAt='PGBLUP_')
  save(fm,file='fmPG_BLUP.rda')
\end{verbatim}   
\end{GrayBox}

Box 4b shows how to extract estimates, predictions, and samples from the fitted model.
The first block of code (\texttt{\#1})  shows how to obtain the predictions. 
The second block of code shows how to extract some goodness of fit related statistics. The 
third block of code shows how to extract the posterior 
mean of the variances components $\sigma_a^2$ and $\sigma_{gu}^2$. Note 
that in order to obtain the estimate it is necessary to specify 
the component number (1 or 2), this can be done by writing 
\texttt{fm\$ETA[[1]]\$varU} and \texttt{fm\$ETA[[2]]\$varU} 
respectively. Finally, the fourth block of code shows 
how to produce the trace plots for $\sigma_a^2$, $\sigma_{g}^2$ 
and $\sigma_\varepsilon^2$ (graphs not shown). 

\begin{GrayBox}
\small
\textbf{Box 4b: Extracting estimates, predictions, and samples 
from Reproducing Kernel Hilbert Spaces Regressions 
(continues from Box 4a first)}

\begin{verbatim}
#1# Predictions
  # Total prediction
    yHat<-fm$yHat
    tmp<-range(c(y,yHat))
    plot(yHat~y,xlab='Observed',ylab='Predicted',col=2,
            xlim=tmp,ylim=tmp); abline(a=0,b=1,col=4,lwd=2)

#2# Godness of fit and related statistics
   fm$fit
   fm$varE # compare to var(y)
   
#3# Variance components associated with the genomic and pedigree 
  # matrices
  fm$ETA[[1]]$varU
  fm$ETA[[2]]$varU
  
#4# Trace plots
  list.files()
  # Residual variance
  varE<-scan('PGBLUP_varE.dat')
  plot(varE,type='o',col=2,cex=.5);
\end{verbatim}
\end{GrayBox}

\begin{GrayBox}
\small
\begin{verbatim}
  #varA and varU
  varA<-scan('PGBLUP_ETA_1_varU.dat')
  plot(varA,type='o',col=2,cex=.5);

  varU<-scan('PGBLUP_ETA_2_varU.dat')
  plot(varU,type='o',col=2,cex=.5)
\end{verbatim}
\end{GrayBox}


\textbf{Reproducing Kernel Hilbert Spaces Regressions}

Reproducing Kernel Hilbert Spaces Regressions (RKHS) 
have been used for regression \citep[e.g., Smoothing Spline][]{Wahba:1990}, 
spatial smoothing \citep[e.g., Kriging][]{Cressie:1988} and 
classification problems \citep[e.g., Support Vector Machine, ][]{Vapnik:1998}. 
\citet{Gianola:2006}, proposed to use 
this approach for genomic prediction and since then several methodological and applied articles have 
been published \citep{Gonzalez:2008, Gianola:2008b, delosCampos:2009b, delosCampos:2010b}.

\textbf{Single-Kernel Models}. In RKHS the regression function
is a linear combination of the basis function provided by the reproducing kernel (RK);
therefore, the choice of the RK constitutes one of the central elements of model specification.
The RK is a function that maps from pairs of points in 
input space into the real line and must be positive semi-definite. 
For instance, if the information set is given 
by vectors of marker genotypes the RK, $K(\boldsymbol x_i, \boldsymbol x_{i'})$ 
maps from pairs of vectors of genotypes, $\{\boldsymbol x_i, \boldsymbol x_{i'}\}$, 
onto the real line and must satisfy, 
$\sum_i \sum_{i'} \alpha_i \alpha_{i'} K(\boldsymbol x_i, \boldsymbol x_{i'}) \geq 0$, 
for any non-null sequence of coefficients $\alpha_i$. Following  
\citet{delosCampos:2009b} the Bayesian RKHS regression can be 
represented as follows: 

\begin{equation}
\label{eq:single_kernel}
  \left\{
    \begin{array}{l}
      \boldsymbol y = \boldsymbol 1 \mu + \boldsymbol u + \boldsymbol \varepsilon \,\,\,\, \text{with} \\
      p(\mu,\boldsymbol u, \boldsymbol \varepsilon) \propto %
      N(\boldsymbol u| \boldsymbol 0, \boldsymbol K \sigma_u^2) %
      N(\boldsymbol \varepsilon | \boldsymbol 0, \boldsymbol I \sigma_\varepsilon^2) 
    \end{array}
  \right.
\end{equation}

where $\boldsymbol K=\{K(\boldsymbol x_i, \boldsymbol x_{i'})\}$ 
is an ($n \times n$) matrix whose entries are the evaluations 
of the RK at pairs of points in input space. The structure 
of the model described by \eqref{eq:single_kernel} 
is that of the standard Animal Model \citep{Quaas:1980} with 
the pedigree-derived numerator relationship matrix ($\boldsymbol A$) 
replaced by the kernel matrix ($\boldsymbol K$).  
Box 5 features an example using a Gaussian Kernel
evaluated in the (average) squared-Euclidean distance between 
genotypes, that is: 
$K(\boldsymbol x_i, \boldsymbol x_{i'})=\exp \left\{-h \times \frac{\sum_{k=1}^p (x_{ik}-x_{i'k})^2}{p}  \right\}$.
In the example genotypes were centered and standardized, 
but this is not strictly needed. The bandwidth parameter 
controls how fast the co-variance function drops as the 
distance between pairs of vector genotypes increases. This 
parameter plays an important role. In this example 
we have chosen the bandwidth parameter to be equal to 
0.5, further discussion about this parameter is given in 
next example.

\begin{GrayBox}
\small
\textbf{Box 5: Fitting a Single Kernel Model in BGLR}
\begin{verbatim}
#1# Loading and preparing the input data
 library(BGLR); data(wheat); 
 Y<-wheat.Y; X<-wheat.X; n<-nrow(X); p<-ncol(X)
 y<-Y[,1]
\end{verbatim}
\end{GrayBox}

\begin{GrayBox}
\small
\begin{verbatim}
#2#  Computing the distance matrix and then the krenel.
  X<-scale(X,center=TRUE,scale=TRUE)
  D<-(as.matrix(dist(X,method='euclidean'))^2)/p
  h<-0.5
  K<-exp(-h*D)

#3#  Single Kernel Regression using BGLR
 
 ETA<-list(list(K=K,model='RKHS'))
 fm<-BGLR(y=y,ETA=ETA,nIter=12000, burnIn=2000,saveAt='RKHS_h=0.5_')
\end{verbatim}
\end{GrayBox}

\textbf{Multi-Kernel Models}. The bandwidth parameter of the Gaussian kernel 
can be chosen either using cross-validation 
(CV) or with Bayesian methods. The CV approach requires 
fitting models over a grid of values of $h$. The Bayesian 
approach estimates $h$, and all the model uknowns, form the data concurrently. The fully 
Bayesian treatment, which consist of treating $h$ as unknown, 
is computationally demanding because, any time $h$ is updated, 
the RK needs to be re-computed. To overcome this problem 
\citet{delosCampos:2010b} proposed to use a 
multi-kernel approach (named Kernel Averaging, KA) 
consisting on: (a) defining a sequence of kernels based 
on a set of reasonable values of $h$, and (b) fitting a multi-kernel 
model with as many random effects as kernels in the sequence. 
The model has the following form:

\begin{equation}
\label{eq:several_kernels}
\left\{
    \begin{array}{l}
      \boldsymbol y = \boldsymbol 1 \mu + \sum_{l=1}^L \boldsymbol u_l %
      + \boldsymbol \varepsilon \,\,\,\, \text{with} \\
      p(\mu,\boldsymbol u_1,...,\boldsymbol u_L, \boldsymbol \varepsilon) \propto %
      \prod_{l=1}^L N(\boldsymbol u| \boldsymbol 0, \boldsymbol K_l \sigma_{u_l}^2) %
      N(\boldsymbol \varepsilon | \boldsymbol 0, \boldsymbol I \sigma_\varepsilon^2) 
    \end{array}
  \right.
\end{equation}

where $\boldsymbol K_l$ is the RK evaluated at the 
lth value of the bandwidth parameter in the sequence 
$\{h_1,...,h_L\}$. It can be shown 
\citep[e.g.,][]{delosCampos:2010b} that if variance 
components are known, the model of expression \eqref{eq:several_kernels} 
is equivalent to a model with a single random effect whose 
distribution is $N(\boldsymbol u|\boldsymbol 0,\bar{\boldsymbol K} \sigma_u^2)$ 
where $\bar{\boldsymbol K}$ is a weighted average of all the RK 
used in \eqref{eq:several_kernels} with weights proportional 
to the corresponding variance components (hence the name, Kernel Averaging).

Performing a grid search or implementing a multi-kernel model 
requires defining a reasonable range for $h$.  One possibility 
is to choose as a focal point for that range a value of $h$ that 
gives a RK similar to the one given by the $\boldsymbol G$-matrix 
(this one represents the kernel for an additive model). The entries of 
the distance matrix 
$\boldsymbol D=\left\{D_{ii'}=\frac{\sum_{k=1}^p (x_{ik}-x_{i'k})^2}{p}%
\right\}$ can be calculated from the entries of the $\boldsymbol G$-matrix 
$\boldsymbol G=\left\{G_{ii'}=\frac{\sum_{k=1}^p x_{ik} x_{i'k}}{p}%
\right\}$; indeed, $D_{ii'}=G_{ii} + G_{i'i'} -2G_{ii'}$. 
What value of $h$ makes $\exp\{-h \times D_{ii'}\} \approx G_{ii'}$?
Consider for instance a pair of full sibs in an outbreed population, in this case 
$E[G_{ii}]=E[G_{i'i'}]=1$ and $E[G_{ii'}]=0.5$ 
therefore, $E[D_{ii'}]=1$, and using $h=0.8$ we get 
$K(\boldsymbol x_{i},\boldsymbol x_{i'})=\exp\{-0.8\}\approx 0.4$. 
Similarly, for a pair of half-sibs se have: 
$E[G_{ii'}]=0.5$, therefore, $E[D_{ii'}]=1.5$ and 
$K(\boldsymbol x_i, \boldsymbol x_{i'})=\exp\{-0.8 \times 1.5\} \approx 0.3$, 
which gives values for the RK for that type of relatives close to 
the ones given by the $\boldsymbol G$-matrix. In inbreed populations smaller 
values of $h$ will be needed because $E[G_{ii'}]>1$. Box 6 illustrates how to fit 
a multi-kernel model using $h=0.5 \times \{1/5,1,5\}$. With 
this choice of values for the bandwidth parameter the model includes kernels that give correlations much smaller 
($h=2.5$) similar ($h=0.5$), and much higher ($h=0.1$) than 
the ones given by the $\boldsymbol G$ matrix. This is illustrated 
in Figure \ref{fig:entries_kernel} that displays the entries of the 1st row of 
the kernel matrix evaluated at each of the values of 
the bandwidth parameter in the grid.


\begin{GrayBox}
\small
\textbf{Box 6: Fitting a RKHS Using a Multi-Kernel Methods (Kernel Averaging)}

\begin{verbatim}
 #1# Loading and preparing the input data
  library(BGLR); data(wheat); 
  Y<-wheat.Y; X<-wheat.X; n<-nrow(X); p<-ncol(X)
  y<-Y[,1]
  
 #2#  Computing D and then K
  X<-scale(X,center=TRUE,scale=TRUE)
  D<-(as.matrix(dist(X,method='euclidean'))^2)/p  
  h<-0.5*c(1/5,1,5)
  
 #3#  Kernel Averaging using BGLR
 ETA<-list(list(K=exp(-h[1]*D),model='RKHS'), 
           list(K=exp(-h[2]*D),model='RKHS'), 
           list(K=exp(-h[3]*D),model='RKHS'))
 fm<-BGLR(y=y,ETA=ETA,nIter=5000, burnIn=1000,saveAt='RKHS_KA_')

#1# Variance Components
  fm$ETA[[1]]$varU ; fm$ETA[[2]]$varU; fm$ETA[[3]]$varU
\end{verbatim}
\end{GrayBox}

\begin{figure}[!htb]
  \centering
  \includegraphics[width=0.60\textwidth]{fig_3}
  \caption{Entries of the 1st row of the (Gaussian) kernel matrix evaluated at %
           three different values of the bandwidth parameter, $h=0.5 \times \{1/5,1,5\}$.}
  \label{fig:entries_kernel}
\end{figure}  

\textbf{Assessment of Prediction Accuracy}

The simple way of assessing prediction accuracy consists of partitioning  
the data set into two disjoint sets: one used for model training (TRN) 
and one used for testing (TST). Box 7 shows code that fits a G-BLUP model in a TRN-TST setting using the wheat data set. 
The code randomly assigns 100 individuals to the TST set. 
The variable \texttt{tst} is a vector that indicates 
which data-points belong to the TST data set; for these entries 
we put missing values in the phenotypic vector (see Box 7). Once the model 
is fitted predictions for individuals in TST set 
can be obtained typing \texttt{fit\$yHat[tst]} in the 
R command line. Figure \ref{fig:obs_predicted} plots observed vs predicted 
phenotypes for individuals in training and TST sets.

\begin{GrayBox}
\small
\textbf{Box 7: Assessment of Prediction Accuracy: Continuous Response}

\begin{verbatim}
#1# Loading and preparing the input data
  library(BGLR); data(wheat); 
  Y<-wheat.Y; X<-wheat.X; n<-nrow(X); p<-ncol(X)
  y<-Y[,1]

#2# Creating a Testing set
  yNA<-y
  set.seed(123)
  tst<-sample(1:n,size=100,replace=FALSE)
  yNA[tst]<-NA

#3#  Computing G
  X<-scale(X,center=TRUE,scale=TRUE)
  G<-tcrossprod(X)/p
  
#4#  Fits the G-BLUP model
  ETA<-list(list(K=G,model='RKHS'))
  fm<-BGLR(y=yNA,ETA=ETA,nIter=5000, burnIn=1000,saveAt='RKHS_')
 
  plot(fm$yHat,y,xlab="Phenotype",
     ylab="Pred. Gen. Value" ,cex=.8,bty="L")
  points(x=y[tst],y=fm$yHat[tst],col=2,cex=.8,pch=19)
  legend("topleft", legend=c("training","testing"),bty="n", 
        pch=c(1,19), col=c("black","red"))

#5# Assesment of correlation in TRN and TST data sets
  cor(fm$yHat[tst],y[tst])    #TST
  cor(fm$yHat[-tst],y[-tst])  #TRN
\end{verbatim}
\end{GrayBox}

\begin{figure}[!htb]
  \centering
  \includegraphics[width=0.70\textwidth]{fig_4}
  \caption{Estimated genetic values for training and %
           testing sets. Predictions were derived %
           using  G-BLUP model (see Box7).}
  \label{fig:obs_predicted}
\end{figure}

A cross-validation is simply a generalization of the TRN-TST evaluation 
presented in Box 7. For a K-fold cross-validation there are K TRN-TST 
partitions; in each fold, the individuals assigned to 
that particular fold are used for TST and the 
remaining individuals are used for TRN.  

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Regression with Ordinal and Binary Traits}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
For categorical traits \pkg{BGLR} uses the probit link and the 
phenotype vector should be coercible to a factor. 
The type of response is defined by setting the argument 
\texttt{`response\_type'}. By default this argument 
is set equal to \texttt{`Gaussian'}. For binary and 
ordinal outcomes we should set \texttt{response\_type=`ordinal'}. 
Box 8 provides a simple example  that uses the wheat 
data set with a discretized phenotype. 
The second block of code, \texttt{\#2\#}, presents the analysis of a binary outcome, 
and the third one, \texttt{\#3\#}, that of an ordinal trait. 
Figure \ref{fig:boxplot} shows, for the binary outcome, a plot of 
predicted probability versus realized value in the 
TRN and TST datasets. The estimated posterior means and posterior standard 
deviations of marker effects and posterior means 
of the linear predictor are retrieved as described 
before (e.g., \texttt{fm\$ETA[[1]]\$b}, \texttt{fm\$yHat}). 
For continuous outcomes the posterior mean of 
the linear predictor is also the conditional expectation function. 
For binary outcomes, the conditional expectation is simply the success probability; therefore,
in this case \pkg{BGLR} also returns
the estimated probabilities of each of the categories \texttt{fm\$probs}). 


\begin{GrayBox}
\small
\textbf{Box 8: Fitting models with binary and ordinal responses}

\begin{verbatim}
#1# Loading and preparing the input data
 library(BGLR); data(wheat); 
 Y<-wheat.Y; X<-wheat.X; A<-wheat.A; 
 y<-Y[,1] 
 tst<-sample(1:nrow(X),size=150)
\end{verbatim}
\end{GrayBox}

\begin{GrayBox}
\small
\begin{verbatim}
#2# Binary outcome
 yBin<-ifelse(y>0,1,0)
 yBinNA<-yBin ; yBinNA[tst]<-NA
 ETA<-list(list(X=X,model='BL'))
 
 fmBin<-BGLR(y=yBinNA,response_type='ordinal', ETA=ETA, 
          nIter=1200,burnIn=200)
 
 head(fmBin$probs)
 par(mfrow=c(1,2))
 boxplot(fmBin$probs[-tst,2]~yBin[-tst],main='Training',ylab='Estimated prob.')
 boxplot(fmBin$probs[tst,2]~yBin[tst],main='Testing', ylab='Estimated prob.')

#2# Ordinal outcome
 yOrd<-ifelse(y<quantile(y,1/4),1,ifelse(y<quantile(y,3/4),2,3))
 yOrdNA<-yOrd ; yOrdNA[tst]<-NA

 ETA<-list(list(X=X,model='BL'))

 fmOrd<-BGLR(y=yOrdNA,response_type='ordinal', ETA=ETA, 
          nIter=1200,burnIn=200)
 head(fmOrd$probs)
 
\end{verbatim}
\end{GrayBox}


\begin{figure}[!htb]
  \centering
  \includegraphics[width=1.0\textwidth]{fig_5}
  \caption{Estimated probability by category, versus observed category (binary response).}
  \label{fig:boxplot}
\end{figure} 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Regression with Censored Outcomes}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Box 9 illustrates how to fit a model to a censored trait. 
Note that in the case of censored trait the response is specified using a 
triplet $(a_i,y_i,b_i)$ (see Table \ref{tab:triplet} for further details). 
For assessment of prediction accuracy (not done in Box 9), one can set 
$a_i=-\infty$, $y_i=NA$, $b_i=\infty$ for individuals 
in testing data sets, this way there is no 
information about the ith phenotype available for the model fit.

\begin{GrayBox}
\small
\textbf{Box 9: Fitting censored traits}

\begin{verbatim}
#1# Loading and preparing the input data
 library(BGLR); data(wheat); 
 Y<-wheat.Y; X<-wheat.X; A<-wheat.A; 
 y<-Y[,1]

#censored
 n<-length(y)
 cen<-sample(1:n,size=200)
 yCen<-y
 yCen[cen]<-NA  
 a<-rep(NA,n) 
 b<-rep(NA,n)
 a[cen]<-y[cen]-runif(min=0,max=1,n=200)
 b[cen]<-Inf
 
#models
ETA<-list(list(X=X,model='BL'))

fm<-BGLR(y=yCen,a=a,b=b,ETA=ETA,nIter=12000,burnIn=2000)

cor(y[cen],fm$yHat[cen])
\end{verbatim}
\end{GrayBox}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Benchmark of parametric models}
\label{sec:benchmark}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We carried out a benchmark evaluation by fitting a BRR to data sets involving 
three different sample size ($n$=1K, 2K and 5K, K=1,000) and 
four different marker densities ($p$=5K, 10K, 50K and 100K). The 
evaluation was carried out in an Intel(R) Xeon(R) processor @ 2 GHz. 
Computing time, expressed in seconds per thousand 
iterations of the Gibbs sampler are given in Figure \ref{fig:benchmark}. 
R was executed in a single thread and was linked against 
OpenBLAS. Computing scales approximately proportional to the product of the number 
of records and the number of effects. For the 
most demanding scenario ($n$=5K, $p$=100K) it took 
approximately 11 min to complete 1,000 iterations 
of the Gibbs sampler.

\begin{figure}[!htb]
  \centering
  \includegraphics[width=1.0\textwidth]{fig_6}
  \caption{Seconds per 1000 iterations of the Gibbs sampler by number of %
           markers and sample size. The Benchmark was carried out %
           by fitting a Gaussian regression (BRR) using an Intel(R) Xeon(R) %
           processor @ 2 GHz. Computations were carried out using a single thread.}
  \label{fig:benchmark}
\end{figure} 

In general, the computational time of models BayesA and BL  
are slightly longer than that of BRR ($\sim 10\%$ longer). The computational 
time of models using finite-mixture priors (e.g., models BayesB or BayesC) tend to 
be higher than those of BayesA, BL and BRR, unless the proportion of markers 
entering in the model is low.

\vspace{2cm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Concluding Remarks}
\label{sec:concluding}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In \pkg{BGLR} we implemented, in a unified Bayesian framework, several methods 
commonly used in genome-enabled prediction, including various 
parametric models as well as Gaussian processes that can be 
used for parametric or semi-parametric regression/prediction.  
The package supports continuous (censored or not) as well as 
binary and ordinal traits. The user interface gives the user great latitude in 
combining different modeling approaches for data analysis. 
Operations that can be vectorized are performed using built-in 
R-functions, but most of the computing intensive tasks 
are performed using compiled routines written in C and 
Fortran languages. The package is also able to take advantage 
of multi-thread BLAS implementations in both Windows and UNIX-like 
systems.  Finally, together with the package we have included 
two data sets and ancillary functions that can be used to 
read into the R-environment genotype files written in ped and bed formats.

The Gibbs sampler implemented is 
computationally very intensive and our current implementation 
stores genotypes in memory; therefore, despite of the effort 
made in in the development of BGLR to make the algorithm computationally 
efficient, performing regressions with hundreds of thousands of 
markers requires access to large amounts of RAM and the computational 
time can be considerable. Certainly, faster 
algorithms could be conceived, but these are in general not as 
flexible, in terms of the class of models that 
can be implemented, as the ones implemented in \pkg{BGLR}.

\textbf{Future developments}. Although some of the computationally 
intensive algorithms implemented in \pkg{BGLR} can benefit from 
multi-thread computing; there is large room to further 
improve the computational performance of the software 
by making more intensive use of parallel computing. In future 
releases we plan to exploit parallel computing to a much 
greater extent. Also, we are currently working on modifying 
the software so that genotypes do not need to be stored 
in memory. Future releases including these and other 
features will be made at the R-Forge website 
(\url{https://r-forge.r-project.org/R/?group_id=1525}) 
first and after considerable testing at CRAN.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Acknowledgements} 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In the development of \pkg{BGLR} Paulino P\'erez and 
Gustavo de los Campos had financial support 
provided by NIH Grants: R01GM099992 and R01GM101219.

%At the end, the bibliography
\bibliography{references}

\appendix
\appendixpage

\textbf{\large{1. Prior Densities Used in the BGLR R-Package}}
\vspace{0.5cm}

In this appendix we describe the prior distributions assigned 
to the location parameters, $(\boldsymbol \beta_j, \boldsymbol u_l)$,
entering in the linear predictor of eq. \eqref{eq:eta}. For each of the unknown effects included in the linear predictor, 
$\{\boldsymbol \beta_1,..,\boldsymbol \beta_J,\boldsymbol u_1,...,\boldsymbol u_L\}$, 
the prior density assigned is specified via 
the argument \texttt{model} in the corresponding entry of the list 
(see Box 3a for an example).  Table A1 describes, for each of 
the options implemented, the prior density used. 
A brief description is given below. 

\textbf{FIXED}. In this case regression coefficients 
are assigned flat priors, specifically we use a Gaussian 
prior with mean zero and variance equal to $1 \times 10^{10}$.

\textbf{BRR}. When this option is used regression coefficients 
are assigned normal IID normal distributions, with mean zero and 
variance $\sigma_\beta^2$. In a 2nd level of the hierarchy, 
the variance parameter is assigned a scaled-inverse Chi-squared density, 
with parameters $df_\beta$ and $S_\beta$. This density is 
parameterized in a way that the prior expected value 
and mode are $E(\sigma^2_\beta)=\frac{S_\beta}{df_\beta-2}$ and 
$Mode(\sigma^2_\beta)=\frac{S_\beta}{df_\beta+2}$, respectively. 
By default, if $df_\beta$ and $S_\beta$ are not provided, 
\pkg{BGLR} sets  $df_\beta=5$ and solves for the scale 
parameter to match the R-squared of the model 
(see default rules to set hyper-parameters below). An analysis 
with fixed variance parameter can be obtained by 
choosing the degree of freedom parameter to a very large 
value (e.g., $1 \times 10^{10}$) and solving for 
the scale using $S_\beta=\sigma^2_\beta\times (df_\beta+2)$; this gives 
a prior that collapses to a point of mass at $\sigma^2_\beta$.

\textbf{BayesA}. In this model the marginal distribution 
of marker effects is a scaled-t density, with parameters 
$df_\beta$ and $S_\beta$. For computational convenience 
this density is implemented as an infinite mixture 
of scaled-normal densities. In a first level of the hierarchy 
marker effects are assigned normal densities with 
zero mean and marker-specific variance parameters, 
$\sigma_{\beta_{jk}}^2$. In a 2nd level of the hierarchy 
these variance parameters are assigned IID scaled-inverse 
Chi-squared densities with degree of freedom and 
scale parameters $df_\beta$ and $S_\beta$, respectively. The 
degree of freedom parameter is regarded as known; 
if the user does not provide a value for this parameter \pkg{BGLR} sets $df_\beta=5$. The scale parameter 
is treated as unknown, and BGLR assigns to this parameter 
a gamma density with rate and shape parameters $r$ and $s$, 
respectively. The mode and coefficient of variation 
(CV) of the gamma density are $Mode(S_\beta)=(s-1)/r$ (for $s>1$) 
and $CV(S_0)=1/\sqrt{s}$. If the user does not provide shape and 
rate parameters \pkg{BGLR} sets $s=1.1$, this gives a relatively 
un-informative prior with a CV of approximately $95\%$, and 
then solves for the rate so that the total contribution of 
the linear predictor matches the R-squared of the model 
(see default rules to set hyper-parameters, below). If one 
wants to run the analysis with fixed scale one can choose 
a very large value for the shape parameter (e.g., $1 \times10^{10}$)  
and then solve for the rate so that the prior mode matches 
the desired value of the scale parameter using $r=(s-1)/S_\beta$.

\textbf{Bayesian LASSO (BL)}. In this model the marginal 
distribution of marker effects is double-exponential. 
Following \citet{Park:2008} we implement the double-exponential 
density as a mixture of scaled normal densities. In the first 
level of the hierarchy, marker effects are assigned 
independent normal densities with null mean and 
maker-specific variance parameter $\tau_{jk}^2 \times \sigma_\varepsilon^2$. 
The residual variance is assigned a scaled-inverse 
Chi-square density, and the marker-specific scale parameters, $\tau_{jk}^2$, 
are assigned IID exponential densities with rate parameter $\lambda^2/2$. 
Finally, in the last level of the hierarchy $\lambda^2$ is either 
regarded as fixed (this is obtained by setting in the 
linear predictor the option \texttt{type=`FIXED'}), or 
assigned either a Gamma $(\lambda^2 \sim Gamma(r,s)$ 
if \texttt{type=`gamma'}) or a $\lambda/\max$ is assigned a Beta prior, 
if \texttt{type=`beta'}, here $\max$ is a user-defined parameter 
representing the maximum value that $\lambda$ can take). If 
nothing is specified, \pkg{BGLR} sets  \texttt{type=`gamma'} and $s=1.1$, and 
solves for the scale parameter to match the 
expected R-squared of the model (see section 2 of this appendix).

\textbf{BayesB-C}. In these models marker effects are 
assigned IID priors that are mixtures of a point of 
mass at zero and a slab that is either normal 
(BayesC) or a scaled-t density (BayesB). The slab is 
structured as either in the BRR (this is the case of BayesC) 
or as in BayesA (this is the case of BayesB). Therefore, 
BayesB and BayesC extend BayesA and BRR, respectively, 
by introducing an additional parameter $\pi$ which in the case 
of BGLR represents the prior proportion of 
non-zero effects. This parameter is treated as unknown 
and it is assigned a Beta prior $\pi\sim Beta(p_0, \pi_0)$, with 
$p_0>0$ and $\pi_0 \in [0,1]$. The beta prior is 
parameterized in a way that the expected value by $E(\pi)=\pi_0$;
on the other hand $p_0$ can be interpreted as the number 
of prior counts (priors ``successes'' plus prior ``failures''); the 
variance of the Beta distribution is then given by 
$Var(\pi)=\frac{\pi_0(1-\pi_0)}{(p_0+1)}$, which is inversely 
proportional to $p_0$. Choosing $p_0=2$ and $\pi_0=0.5$ gives 
a uniform prior in the interval $[0,1]$. Choosing a very 
large value for $p_0$ gives a prior that 
collapses to a point of mass at $\pi_0$.

\begin{sidewaystable}

Table A1. Prior densities implemented in BGLR.

\vspace{0.5cm}

\begin{tabular}{llp{8cm}}
\hline
\texttt{model=} & \textbf{Join distribution of effects and hyper-parameters} %
                & \textbf{Specification of elements in the} \\
                & & \textbf{linear predictor} \\
\hline
\texttt{FIXED}  & $p(\boldsymbol \beta_j) \propto 1$ & \texttt{list(X=, model="FIXED")} \\
\hline
\texttt{BRR}    & $p(\boldsymbol \beta_j, \sigma_\beta^2)= % 
                   \left\{ \prod_{k} N(\beta_{jk}|0,\sigma_\beta^2) \right\} \chi^{-2} (\sigma_\beta^2|df_\beta,S_\beta)$ %
                & \texttt{list(X=, model="BRR",df0=,S0=,R2=)} \\
\hline
\texttt{BayesA} & $p(\boldsymbol \beta_j, \sigma_{\beta_j}^2, S_\beta)= % 
                   \left\{ \prod_{k} N(\beta_{jk}|0,\sigma_{\beta_{jk}}^2) %
                   \chi^{-2} (\sigma_{\beta_{jk}}^2|df_\beta,S_\beta) \right\}  % 
                   G(S_\beta|r,s)$ 
                & \texttt{list(X=, model="BayesA",df0=,rate0=,} \\  
                & & \texttt{shape0=,R2=)}\\ 
\hline
                & $p(\boldsymbol \beta_j, \boldsymbol \tau_j^2, \lambda^2|\sigma_\varepsilon^2)= %
                  \left\{ \prod_k N(\beta_{jk} | 0, \tau_{jk}^2 \times \sigma_\varepsilon^2) 
                  Exp\left\{ \tau_{jk}^2 | \frac{\lambda^2}{2} \right\}%
                   \right\}\times G(\lambda^2 | r, s)$ , or
                & \texttt{list(X=,model="BL",lambda=,type="gamma",}\\
                & %
                & \texttt{rate=,shape=,R2=)}$^1$ \\
\texttt{BL}     & $p(\boldsymbol \beta_j, \boldsymbol \tau_j^2, \lambda|\sigma_\varepsilon^2,\texttt{max})= %
                  \left\{ \prod_k N(\beta_{jk} | 0, \tau_{jk}^2 \times \sigma_\varepsilon^2) 
                  Exp\left\{ \tau_{jk}^2 | \frac{\lambda^2}{2} \right\}%
                   \right\}\times B(\lambda/\max |p_0, \pi_0)$, or
                & \texttt{list(X=,model="BL",lambda=,type="beta",}\\
                & %
                & \texttt{probIn=,counts=,max=,R2=)}$^1$ \\
& $p(\boldsymbol \beta_j, \boldsymbol \tau_j^2|\sigma_\varepsilon^2,\lambda)= %
                  \left\{ \prod_k N(\beta_{jk} | 0, \tau_{jk}^2 \times \sigma_\varepsilon^2) 
                  Exp\left\{ \tau_{jk}^2 | \frac{\lambda^2}{2} \right\}%
                   \right\}$
                & \texttt{list(X=,model="BL",lambda=,type="FIXED")}$^1$ \\
                & %
                & \\
\hline
\texttt{BayesC} & $\begin{array}{lll}%
                   p(\boldsymbol \beta_j , \sigma_\beta^2,\pi)&=& %
                   \left\{ \prod_{k} \left[ \pi N(\beta_{jk}|0,\sigma_\beta^2) %
                   + (1-\pi) 1(\beta_{jk}=0) \right] \right\} \\
                    & & \times \chi^{-2} (\sigma_\beta^2 | df_\beta, S_\beta) B(\pi | p_0, \pi_0)  
                   \end{array}$ 
                & \texttt{list(X=,model="BayesC",df0,S0, probIn=,counts=,R2=)}$^2$\\
\hline
\texttt{BayesB} & $\begin{array}{lll}%
                   p(\boldsymbol \beta_j , \sigma_\beta^2,\pi)&=& %
                   \left\{ \prod_{k} \left[ \pi N(\beta_{jk}|0,\sigma_\beta^2) %
                   + (1-\pi) 1(\beta_{jk}=0) \right] \chi^{-2} (\sigma_{\beta_{jk}}^2 | df_\beta, S_\beta)\right\}\\
                    & &  B(\pi | p_0, \pi_0)\times G(S_\beta | r, s)  
                   \end{array}$ 
                & \texttt{list(X=,model="BayesB",df0,rate0,shape0, probIn=,counts=,R2=)}$^2$\\ 
\hline
\texttt{RKHS}   & $p(\boldsymbol u_l , \sigma_{u_l}^2) = %
                   N(\boldsymbol u_l | \boldsymbol 0, \boldsymbol K_l \times \sigma_{u_l}^2)%
                   \chi^{-2} (\sigma_{u_l}^2 | df_l, S_l)$ 
                & Either \texttt{list(K=,model="RKHS",df0,S0,R2=)} \\
                & & or \texttt{list(V=,d=,model="RKHS",df0,S0,R2=)}$^3$ \\
\hline
\end{tabular}

$N(\cdot|\cdot,\cdot)$, $\chi^{-2}(\cdot|\cdot,\cdot)$, $G(\cdot|\cdot,\cdot)$, $Exp(\cdot| \cdot)$, $B(\cdot| \cdot ,\cdot)$
denote normal, scaled inverse Chi-squared, gamma, exponential and beta densities, respectively. 
(1) \texttt{type} can take values \texttt{"FIXED"}, \texttt{"gamma"}, or \texttt{"beta"};
(2) \texttt{probIn} represents the prior probability of a marker having a non-null effect ($\pi_0$), 
counts (the number of `prior counts') can be used to control how informative the prior is; 
(3) $\texttt{V}$ and $\texttt{d}$ represent the eigen-vectors and eigen-values of $\boldsymbol K$, 
respectively. 

\end{sidewaystable}

\vspace{0.5cm}
\textbf{2. Default rules for choosing hyper-parameters}
\vspace{0.5cm}

\pkg{BGLR} has built-in rules to set values of hyper-parameters. The 
default rules assign proper, but weakly informative, 
priors with prior modes chosen in a way that, 
a priori, they obey a variance partition of 
the phenotype into components attributable 
to the error terms and to each of the 
elements of the linear predictor.  The user 
can control this variance partition by setting the 
argument R2 (representing the \textbf{model R-squared}) of the 
BGLR function to the desired value. By default the model 
R2 is set equal to 0.5, in which case hyper-parameters 
are chosen to match a variance partition where 50\% of the 
variance of the response is attributable to the linear 
predictor and 50\% to model residuals. Each of the 
\textbf{elements of the linear predictor} has its own R2 parameter 
(see last column of Table A1). If these are not provided, 
the \textbf{R2} attributable to each element of the linear 
predictor equals the R-squared of the model divided 
the number of elements in the linear predictor. Once 
the R2 parameters are set, BGLR checks whether each of the hyper-parameters have been 
specified and if not, the built in-rules are used to set 
values for these hyper-parameters. Next we briefly describe 
the built-in rules implemented in \pkg{BGLR}; these are based
on formulas similar to those described by \citet{delosCampos:2013b} implemented using the prior mode
instead of the prior mean.

\textbf{Variance parameters.} The residual variance ($\sigma_{\varepsilon}^2$, $\sigma_{u_{l}}^2$), 
of the RKHS model, and $\sigma_{\beta}^2$, of the BRR, are assigned scaled-inverse 
Chi-square densities, which are indexed by a \textbf{scale} 
and a \textbf{degree of freedom} parameter. By default, if 
degree of freedom parameter is not specified, these are 
set equal to 5 (this gives a relatively un-informative 
scaled-inverse Chi-square and guarantees a finite 
prior variance) and the scale parameter is solved for
to match the desired variance partition.  For instance, 
in case of the residual variance the scale is calculated using 
$S_\varepsilon=var(y)\times (1-R2) \times (df_\varepsilon+2)$, this gives a prior 
mode for the residual variance equal to $var(y) \times (1-R2)$.
Similar rules are used in case of other variance parameters. 
For instance, if one element of the linear predictor 
involves a linear regression of the form 
$\boldsymbol X \boldsymbol \beta$ with \texttt{model=`BRR'} then
$S_\beta=var(y)\times R2 \times (df_\beta+2)/MSx$ where $MSx$
is the sum of the sample variances of the 
columns of $\boldsymbol X$ and R2 is the proportion of 
phenotypic variance a-priori assigned to that particular element 
of the linear predictor. The selection of the scale parameter 
when the model is the RKHS regression is modified relative 
to the above rule to account for the fact that the average 
diagonal value of $\boldsymbol K$ may be different than 1, 
specifically we choose the scale parameter according 
to the following formula 
$S_l=var(y) \times R2 \times (df_l+2)/mean(diag(\boldsymbol K))$. 


In models \textbf{BayesA} and \textbf{BayesB} the scale-parameter indexing the
t-prior assigned to marker effects is assigned a Gamma density with rate and
shape parameters $r$ and $s$, respectively. By default \pkg{BGLR} sets $s= 1.1$ and solves
for the rate parameter using $r=(s-1)/S_\beta$ with 
$S_\beta=var(y)\times R2 \times (df_\beta+2)/MSx$, here, as before,
$MSx$ represents the sum of the variances of the columns of $X$.

For the \textbf{BL}, the default is to set: 
\texttt{type=`gamma'}, fix the shape parameter of 
the gamma density to 1.1 and solve for the rate 
parameter to match the expected proportion of 
variance accounted for by the corresponding element 
of the linear predictor, as specified by the 
argument R2. Specifically, we set the rate to be 
$(s-1)/(2 \times (1-R2)/R2 \times MSx)$.

For models \textbf{BayesB} and \textbf{BayesC}, the default rule is 
to set $\pi_0=0.5$ and $p_0=10$. This gives a 
weakly informative beta prior for $\pi$ with a prior mode 
at 0.5. The scale and degree-of freedom parameters entering 
in the priors of these two models are treated as in the case 
of models BayesA (in the case of BayesB) and BRR (in the case of BayesC), but the rules are modified 
by considering that only a fraction of the markers 
($\pi$) nave non-null effects; therefore, in BayesC we 
use $S_\beta=var(y)\times R2 \times(df_\beta+2)/MSx/\pi$ and in BayesB
we set $r=(s-1)/S_\beta$ with $S_\beta=var(y)\times R2 \times(df_\beta+2)/MSx/\pi$. 


\end{document}