Package 'BLR'

Title: Bayesian Linear Regression
Description: Bayesian Linear Regression.
Authors: Gustavo de los Campos, Paulino Perez Rodriguez
Maintainer: Paulino Perez Rodriguez <[email protected]>
License: GPL-2
Version: 1.6
Built: 2024-10-31 22:25:27 UTC
Source: CRAN

Help Index


Pedigree info for the wheat dataset

Description

A numerator relationship matrix (599 x 599) computed from a pedigree tracing back many generations. This relationship matrix was derived using the Browse application of the International Crop Information System (ICIS), as described in http://repository.cimmyt.org/xmlui/bitstream/handle/10883/3488/72673.pdf (McLaren et al. 2005).

Source

International Maize and Wheat Improvement Center (CIMMYT), Mexico.

References

McLaren, C. G., R. Bruskiewich, A.M. Portugal, and A.B. Cosico. 2005. The International Rice Information System. A platform for meta-analysis of rice crop data. Plant Physiology 139: 637-642.


Bayesian Linear Regression

Description

The BLR (‘Bayesian Linear Regression’) function was designed to fit parametric regression models using different types of shrinkage methods. An earlier version of this program was presented in de los Campos et al. (2009).

Usage

BLR(y, XF, XR, XL, GF, prior, nIter, burnIn, thin, thin2, saveAt,
    minAbsBeta, weights)

Arguments

y

(numeric, $n$) the data-vector (NAs allowed).

XF

(numeric, $n \times p_F$) incidence matrix for $\boldsymbol \beta_F$, may be NULL.

XR

(numeric, $n \times p_R$) incidence matrix for $\boldsymbol \beta_R$, may be NULL.

XL

(numeric, $n \times p_L$) incidence matrix for $\boldsymbol \beta_L$, may be NULL.

GF

(list) providing an $ID (integer, $n$) linking observations to groups (e.g., lines or sires) and a (co)variance structure ($A, numeric, $p_U \times p_U$) between effects of the grouping factor (e.g., line or sire effects). Note: ID must be an integer taking values from 1 to $p_U$; ID[i]=$q$ indicates that the ith observation in $\boldsymbol y$ belongs to cluster $q$, whose (co)variance function is in the qth row (column) of $\boldsymbol A$. GF may be NULL.

weights

(numeric, $n$) a vector of weights, may be NULL.

nIter, burnIn, thin

(integer) the number of iterations, the burn-in period and the thinning interval.

saveAt

(string) this may include a path and a prefix that will be added to the names of the files saved as the program runs.

prior

(list) containing the following elements:

  • prior$varE, prior$varBR, prior$varU: (list) each providing degrees of freedom ($df) and scale ($S). These are the parameters of the scaled inverse-$\chi^2$ distributions assigned to variance components, see Eq. (2) below. In the parameterization used by BLR() the prior expectation of variance parameters is $S/(df-2)$.

  • prior$lambda: (list) providing $value (initial value for $\lambda$); $type ('random' or 'fixed'), which specifies whether $\lambda$ should be kept fixed at the value provided by $value or updated with samples from the posterior distribution; and either $shape and $rate (when a Gamma prior is desired on $\lambda^2$) or $shape1, $shape2 and $max, in which case $p(\lambda | \max, \alpha_1, \alpha_2) \propto Beta\left(\frac{\lambda}{\max} | \alpha_1, \alpha_2 \right)$. For a detailed description of these priors see de los Campos et al. (2009).

thin2

This value controls whether the running means are saved to disk or not. If thin2 is greater than nIter the running means are not saved (default, thin2=$1 \times 10^{10}$).

minAbsBeta

The minimum absolute value of the components of $\boldsymbol \beta_L$, used to avoid numerical problems when sampling from $\boldsymbol \tau^2$; default $1 \times 10^{-9}$.
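
For reference, a prior list containing all of the components described under the prior argument could be assembled as follows; this is only a sketch, and the hyper-parameter values are illustrative (varE, varU and lambda follow Example 1 below, while the varBR values are placeholders):

prior <- list(
   varE  = list(df=3, S=0.25),            # residual variance
   varBR = list(df=3, S=0.01),            # variance of the 'Ridge' coefficients (placeholder values)
   varU  = list(df=3, S=0.63),            # variance of the infinitesimal effects
   lambda= list(shape=0.52, rate=1e-4,    # Gamma prior on lambda^2
                type='random', value=30)  # lambda updated in the sampler, starting at 30
)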

Details

The program runs a Gibbs sampler for the Bayesian regression model described below.

Likelihood. The equation for the data is:

\begin{array}{lr} \boldsymbol y= \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu} + \boldsymbol \varepsilon & (1) \end{array}

where $\boldsymbol y$, the response, is an $n \times 1$ vector (NAs allowed); $\mu$ is an intercept; $\boldsymbol X_F, \boldsymbol X_R, \boldsymbol X_L$ and $\boldsymbol Z$ are incidence matrices used to accommodate different types of effects (see below); and $\boldsymbol \varepsilon$ is a vector of model residuals assumed to be distributed as $\boldsymbol \varepsilon \sim N(\boldsymbol 0, Diag(\sigma_{\boldsymbol \varepsilon}^2/w_i^2))$, where $\sigma_{\boldsymbol \varepsilon}^2$ is an (unknown) variance parameter and the $w_i$ are (known) weights that allow for heterogeneous residual variances.

Any of the elements in the right-hand side of the linear predictor, except $\mu$ and $\boldsymbol \varepsilon$, can be omitted; by default the program runs an intercept model.
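
For instance, a call that omits all incidence matrices fits the intercept-only model; a minimal sketch, assuming y is a numeric response vector and using illustrative prior and chain settings:

fm0 <- BLR(y=y, prior=list(varE=list(df=3, S=0.25)),
           nIter=2000, burnIn=500, thin=10)
fm0$mu      # posterior mean of the intercept
fm0$varE    # posterior mean of the residual variance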

Prior. The residual variance is assigned a scaled inverse-$\chi^2$ prior with degrees of freedom and scale parameter provided by the user, that is, $\sigma_{\boldsymbol \varepsilon}^2 \sim \chi^{-2}(\sigma_{\boldsymbol \varepsilon}^2 | df_{\boldsymbol \varepsilon}, S_{\boldsymbol \varepsilon})$. The regression coefficients $\left\{\mu, \boldsymbol \beta_F, \boldsymbol \beta_R, \boldsymbol \beta_L, \boldsymbol u \right\}$ are assigned priors that yield different types of shrinkage. The intercept and the vector of regression coefficients $\boldsymbol \beta_F$ are assigned flat priors (i.e., estimates are not shrunk). The vector of regression coefficients $\boldsymbol \beta_R$ is assigned a Gaussian prior with variance common to all effects, that is, $\beta_{R,j} \mathop \sim \limits^{iid} N(0, \sigma_{\boldsymbol \beta_R}^2)$. This prior is the Bayesian counterpart of Ridge Regression. The variance parameter $\sigma_{\boldsymbol \beta_R}^2$ is treated as unknown and is assigned a scaled inverse-$\chi^2$ prior, that is, $\sigma_{\boldsymbol \beta_R}^2 \sim \chi^{-2}(\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R})$, with degrees of freedom $df_{\boldsymbol \beta_R}$ and scale $S_{\boldsymbol \beta_R}$ provided by the user.
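
The parameterization of these scaled inverse-$\chi^2$ priors can be checked by simulation: with prior expectation $S/(df-2)$, draws can be generated as $S$ divided by a $\chi^2_{df}$ variate, so the Monte Carlo mean should approach $S/(df-2)$. A small sketch with illustrative values:

set.seed(1)
df0 <- 5; S0 <- 1.2
sigma2 <- S0/rchisq(1e5, df=df0)   # draws from the scaled inverse-chi^2 prior
mean(sigma2)                       # approx. S0/(df0-2) = 0.4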

The vector of regression coefficients $\boldsymbol \beta_L$ is treated as in the Bayesian LASSO of Park and Casella (2008). Specifically,

p(\boldsymbol \beta_L, \boldsymbol \tau^2, \lambda | \sigma_{\boldsymbol \varepsilon}^2) = \left\{\prod_k N(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2) Exp\left(\tau_k^2 | \lambda^2\right) \right\} p(\lambda),

where $Exp(\cdot|\cdot)$ is an exponential prior and $p(\lambda)$ can either be: (a) a mass-point at some value (i.e., fixed $\lambda$); (b) $p(\lambda^2) \sim Gamma(r,\delta)$, the prior suggested by Park and Casella (2008); or (c) $p(\lambda | \max, \alpha_1, \alpha_2) \propto Beta\left(\frac{\lambda}{\max} | \alpha_1,\alpha_2 \right)$; see de los Campos et al. (2009) for details. It can be shown that the marginal prior of the regression coefficients $\beta_{L,k}$, $\int N(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2) Exp\left(\tau_k^2 | \lambda^2\right) \partial \tau_k^2$, is Double-Exponential. This prior has thicker tails and a higher peak of mass at zero than the Gaussian prior used for $\boldsymbol \beta_R$, inducing a different type of shrinkage.
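
The difference in shrinkage can be visualized by comparing a Double-Exponential density with a Gaussian density of the same variance; the Double-Exponential puts more mass near zero and in the tails. A small illustrative sketch (not part of the package):

ddexp <- function(x, rate) 0.5*rate*exp(-rate*abs(x))   # Double-Exponential density
x <- seq(-5, 5, length.out=200)
plot(x, ddexp(x, rate=1), type='l', ylab='density')     # variance = 2/rate^2 = 2
lines(x, dnorm(x, sd=sqrt(2)), lty=2)                   # Gaussian with the same variance
legend('topright', legend=c('Double-Exponential','Gaussian'), lty=1:2)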

The vector $\boldsymbol u$ is used to model the so-called 'infinitesimal effects' and is assigned a prior $\boldsymbol u \sim N(\boldsymbol 0, \boldsymbol A\sigma_{\boldsymbol u}^2)$, where $\boldsymbol A$ is a positive-definite matrix (usually a relationship matrix computed from a pedigree) and $\sigma_{\boldsymbol u}^2$ is an unknown variance, whose prior is $\sigma_{\boldsymbol u}^2 \sim \chi^{-2}(\sigma_{\boldsymbol u}^2 | df_{\boldsymbol u}, S_{\boldsymbol u})$.

Collecting the above assumptions, the posterior distribution of the model unknowns, $\boldsymbol \theta = \left\{\mu, \boldsymbol \beta_F, \boldsymbol \beta_R, \sigma_{\boldsymbol \beta_R}^2, \boldsymbol \beta_L, \boldsymbol \tau^2, \lambda, \boldsymbol u, \sigma_{\boldsymbol u}^2, \sigma_{\boldsymbol \varepsilon}^2\right\}$, is

\begin{array}{lclr} p(\boldsymbol \theta | \boldsymbol y) & \propto & N\left( \boldsymbol y | \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu}; Diag\left\{ \frac{\sigma_{\boldsymbol \varepsilon}^2}{w_i^2}\right\}\right) & \\ & & \times \left\{ \prod\limits_j N\left(\beta_{R,j} | 0, \sigma_{\boldsymbol \beta_R}^2 \right) \right\} \chi^{-2} \left(\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R}\right) & \\ & & \times \left\{ \prod\limits_k N \left( \beta_{L,k} |0,\sigma_{\boldsymbol \varepsilon}^2 \tau_k^2 \right) Exp \left(\tau_k^2 | \lambda^2\right)\right\} p(\lambda) & (2)\\ & & \times N(\boldsymbol u | \boldsymbol 0,\boldsymbol A\sigma_{\boldsymbol u}^2)\, \chi^{-2} (\sigma_{\boldsymbol u}^2 | df_{\boldsymbol u}, S_{\boldsymbol u})\, \chi^{-2} (\sigma_{\boldsymbol \varepsilon}^2 | df_{\boldsymbol \varepsilon}, S_{\boldsymbol \varepsilon}) & \end{array}

Value

A list with posterior means, posterior standard deviations, and the parameters used to fit the model:

$yHat

the posterior mean of $\boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu} + \boldsymbol\varepsilon$.

$SD.yHat

the corresponding posterior standard deviation.

$mu

the posterior mean of the intercept.

$varE

the posterior mean of $\sigma_{\boldsymbol \varepsilon}^2$.

$bR

the posterior mean of $\boldsymbol \beta_R$.

$SD.bR

the corresponding posterior standard deviation.

$varBr

the posterior mean of $\sigma_{\boldsymbol \beta_R}^2$.

$bL

the posterior mean of $\boldsymbol \beta_L$.

$SD.bL

the corresponding posterior standard deviation.

$tau2

the posterior mean of $\boldsymbol \tau^2$.

$lambda

the posterior mean of $\lambda$.

$u

the posterior mean of $\boldsymbol u$.

$SD.u

the corresponding posterior standard deviation.

$varU

the posterior mean of $\sigma_{\boldsymbol u}^2$.

$fit

a list with evaluations of effective number of parameters and DIC (Spiegelhalter et al., 2002).

$whichNa

a vector indicating which entries in $\boldsymbol y$ were missing.

$prior

a list containing the priors used during the analysis.

$weights

vector of weights.

$fit

list containing the following elements,

  • $logLikAtPostMean: log-likelihood evaluated at posterior mean.

  • $postMeanLogLik: the posterior mean of the Log-Likelihood.

  • $pD: estimated effective number of parameters, Spiegelhalter et al. (2002).

  • $DIC: the deviance information criterion, Spiegelhalter et al. (2002).

$nIter

the number of iterations made in the Gibbs sampler.

$burnIn

the number of iterations used as burn-in.

$thin

the thinning interval used.

$y

original data-vector.

The posterior means returned by BLR are calculated after the burn-in period has passed, using the thinning interval specified by the user.

Save. The routine will save samples of $\mu$, the variance components and $\lambda$, as well as running means (rm*.dat). Running means are computed using the thinning interval specified by the user (see argument thin above); however, these running means are saved at the thinning interval specified by argument thin2 (by default, thin2=$1 \times 10^{10}$, so that running means are computed as the sampler runs but not saved to disk).
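
After a run such as Example 1 below (which uses saveAt="example_"), the saved samples can be inspected directly; a minimal sketch, assuming the run has been completed and the working directory contains the output files:

varE.samples <- scan('example_varE.dat')   # samples of the residual variance saved by the run
plot(varE.samples, type='o')               # trace plot
mean(varE.samples)                         # compare with fm$varE
fm$fit$DIC                                 # deviance information criterion of the fitted model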

Author(s)

Gustavo de los Campos, Paulino Perez Rodriguez

References

de los Campos G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting Quantitative Traits with Regression Models for Dense Molecular Markers and Pedigree. Genetics 182: 375-385.

Park T. and G. Casella. 2008. The Bayesian LASSO. Journal of the American Statistical Association 103: 681-686.

Spiegelhalter, D.J., N.G. Best, B.P. Carlin and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64 (4): 583-639.

Examples

########################################################################
## Example 1
########################################################################
if(FALSE){
rm(list=ls())
library(BLR)
data(wheat)     #Loads the wheat dataset

y=Y[,1]
### Creates a testing set with 100 observations
whichNa<-sample(1:length(y),size=100,replace=FALSE)
yNa<-y
yNa[whichNa]<-NA

### Runs the Gibbs sampler
fm<-BLR(y=yNa,XL=X,GF=list(ID=1:nrow(A),A=A),
        prior=list(varE=list(df=3,S=0.25),
                   varU=list(df=3,S=0.63),
                   lambda=list(shape=0.52,rate=1e-4,
                               type='random',value=30)),
        nIter=5500,burnIn=500,thin=1,
        saveAt="example_")

MSE.tst<-mean((fm$yHat[whichNa]-y[whichNa])^2)
MSE.tst
MSE.trn<-mean((fm$yHat[-whichNa]-y[-whichNa])^2)
MSE.trn
COR.tst<-cor(fm$yHat[whichNa],y[whichNa])
COR.tst
COR.trn<-cor(fm$yHat[-whichNa],y[-whichNa])
COR.trn

plot(fm$yHat~y,xlab="Phenotype",
     ylab="Pred. Gen. Value" ,cex=.8)
points(x=y[whichNa],y=fm$yHat[whichNa],col=2,cex=.8,pch=19)

x11()
plot(scan('example_varE.dat'),type="o",
        ylab=expression(paste(sigma[epsilon]^2)))
}
########################################################################
## Example 2: Ten-fold cross-validation, environment 1
########################################################################
if(FALSE){
rm(list=ls())
library(BLR)
data(wheat)     #Loads the wheat dataset
nIter<-1500     #For real data sets more samples are needed
burnIn<-500     
thin<-10
folds<-10
y<-Y[,1]
priorBL<-list(
               varE=list(df=3,S=2.5),
               varU=list(df=3,S=0.63),
               lambda = list(shape=0.52,rate=1e-5,value=20,type='random')
             )
             
set.seed(123)  #Set seed for the random number generator
sets<-rep(1:10,60)[-1]               #599 fold labels (10 folds)
sets<-sets[order(runif(nrow(A)))]    #randomly permute the fold assignment
COR.CV<-rep(NA,times=(folds+1))
names(COR.CV)<-c(paste('fold=',1:folds,sep=''),'Pooled')
w<-rep(1/nrow(A),folds) ## weights for pooled correlations and MSE
yHatCV<-numeric()

for(fold in 1:folds)
{
   yNa<-y
   whichNa<-which(sets==fold)
   yNa[whichNa]<-NA
   prefix<-paste('PM_BL','_fold_',fold,'_',sep='')
   fm<-BLR(y=yNa,XL=X,GF=list(ID=(1:nrow(A)),A=A),prior=priorBL,
               nIter=nIter,burnIn=burnIn,thin=thin)
   yHatCV[whichNa]<-fm$yHat[fm$whichNa]
   w[fold]<-w[fold]*length(fm$whichNa)
   COR.CV[fold]<-cor(fm$yHat[fm$whichNa],y[whichNa])
}

COR.CV[11]<-mean(COR.CV[1:10])
COR.CV
}
########################################################################

Sets for cross validation (CV)

Description

A vector (599 x 1) that assigns observations to 10 disjoint sets; the assignment was generated at random. This is used later to conduct a 10-fold CV.

Source

International Maize and Wheat Improvement Center (CIMMYT), Mexico.


wheat dataset

Description

Information from a collection of 599 historical CIMMYT wheat lines. The wheat data set is from CIMMYT's Global Wheat Program. Historically, this program has conducted numerous international trials across a wide variety of wheat-producing environments. The environments represented in these trials were grouped into four basic target sets of environments comprising four main agroclimatic regions previously defined and widely used by CIMMYT's Global Wheat Breeding Program. The phenotypic trait considered here was the average grain yield (GY) of the 599 wheat lines evaluated in each of these four mega-environments.

A pedigree tracing back many generations was available, and the Browse application of the International Crop Information System (ICIS), as described in http://repository.cimmyt.org/xmlui/bitstream/handle/10883/3488/72673.pdf (McLaren et al. 2005), was used for deriving the relationship matrix A among the 599 lines; it accounts for selection and inbreeding.

Wheat lines were recently genotyped using 1447 Diversity Array Technology (DArT) markers generated by Triticarte Pty. Ltd. (Canberra, Australia). The DArT markers may take on two values, denoted by their presence or absence. Markers with a minor allele frequency lower than 0.05 were removed, and missing genotypes were imputed with samples from the marginal distribution of marker genotypes, that is, $x_{ij}=Bernoulli(\hat p_j)$, where $\hat p_j$ is the estimated allele frequency computed from the non-missing genotypes. The number of DArT markers after editing was 1279.

Usage

data(wheat)

Format

Matrix Y contains the average grain yield; column 1 is grain yield in environment 1, and so on. Matrix A contains the additive relationships computed from the pedigree, and matrix X contains the marker information.
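
A quick check of what data(wheat) loads is to inspect the dimensions of these objects; a short sketch:

library(BLR)
data(wheat)
dim(Y)   # 599 x 4     average grain yield in four environments
dim(A)   # 599 x 599   additive (numerator) relationship matrix
dim(X)   # 599 x 1279  DArT marker matrix (0/1)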

Source

International Maize and Wheat Improvement Center (CIMMYT), Mexico.

References

McLaren, C. G., R. Bruskiewich, A.M. Portugal, and A.B. Cosico. 2005. The International Rice Information System. A platform for meta-analysis of rice crop data. Plant Physiology 139: 637-642.


Molecular markers

Description

A matrix (599 x 1279) with DArT genotypes; data are from pure lines and genotypes were coded as 0/1, denoting the absence/presence of the DArT marker. Markers with a minor allele frequency lower than 0.05 were removed, and missing genotypes were imputed with samples from the marginal distribution of marker genotypes, that is, $x_{ij}=Bernoulli(\hat p_j)$, where $\hat p_j$ is the estimated allele frequency computed from the non-missing genotypes. The number of DArT markers after editing was 1279.
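
The imputation rule described above can be sketched in a few lines of R; this illustrates the procedure and is not the code used to produce the distributed data (W stands for a hypothetical 0/1 marker matrix containing NAs):

impute.markers <- function(W){
   for(j in 1:ncol(W)){
      miss <- is.na(W[,j])
      if(any(miss)){
         p.hat <- mean(W[!miss,j])                            # allele frequency from observed genotypes
         W[miss,j] <- rbinom(sum(miss), size=1, prob=p.hat)   # draw from Bernoulli(p.hat)
      }
   }
   W
}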

Source

International Maize and Wheat Improvement Center (CIMMYT), Mexico.


Grain yield

Description

A matrix (599 x 4) containing the 2-yr average grain yield of each of these lines in each of the four environments (phenotypes were standardized to a unit variance within each environment).
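
Because phenotypes were standardized to unit variance within each environment, the column variances of Y should be close to one; a quick check:

data(wheat)
apply(Y, 2, var)   # approximately 1 for each of the four environments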

Source

International Maize and Wheat Improvement Center (CIMMYT), Mexico.