Title: Estimation for MVN and Student-t Data with Monotone Missingness
Description: Estimation of multivariate normal (MVN) and Student-t data of arbitrary dimension where the pattern of missing data is monotone. See Pantaleo and Gramacy (2010) <doi:10.48550/arXiv.0907.2135>. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.
Authors: Robert B. Gramacy [aut, cre] (with Fortran contributions from Cleve Moler (dpotri/LINPACK) as updated by Berwin A. Turlach (qpgen2/quadprog))
Maintainer: Robert B. Gramacy <[email protected]>
License: LGPL
Version: 1.9-21
Built: 2024-12-23 06:43:53 UTC
Source: CRAN
Estimation of multivariate normal and Student-t data of arbitrary dimension where the pattern of missing data is monotone. Through the use of parsimonious/shrinkage regressions (plsr, pcr, lasso, ridge, etc.), where standard regressions fail, the package can handle a nearly arbitrary amount of missing data. The current version supports maximum likelihood inference and a full Bayesian approach employing scale-mixtures for Gibbs sampling. Monotone data augmentation extends this Bayesian approach to arbitrary missingness patterns. A fully functional standalone interface to the Bayesian lasso (from Park & Casella), the Normal-Gamma (from Griffin & Brown), Horseshoe (from Carvalho, Polson, & Scott), and ridge regression with model selection via Reversible Jump, and Student-t errors (from Geweke) is also provided.
For a fuller overview including a complete list of functions, demos and vignettes, please use help(package="monomvn").
Robert B. Gramacy [email protected]
Maintainer: Robert B. Gramacy [email protected]
Robert B. Gramacy, Joo Hee Lee and Ricardo Silva (2008).
On estimating covariances between many assets with histories
of highly variable length.
Preprint available on arXiv:0710.5837:
https://arxiv.org/abs/0710.5837
https://bobby.gramacy.com/r_packages/monomvn/
monomvn, the now defunct norm package, and mvnmle
Inference for ordinary least squares, lasso/NG, horseshoe and ridge regression models by (Gibbs) sampling from the Bayesian posterior distribution, augmented with Reversible Jump for model selection
bhs(X, y, T=1000, thin=NULL, RJ=TRUE, M=NULL, beta=NULL, lambda2=1,
    s2=var(y-mean(y)), mprior=0, ab=NULL, theta=0, rao.s2=TRUE,
    icept=TRUE, normalize=TRUE, verb=1)

bridge(X, y, T=1000, thin=NULL, RJ=TRUE, M=NULL, beta=NULL, lambda2=1,
    s2=var(y-mean(y)), mprior=0, rd=NULL, ab=NULL, theta=0, rao.s2=TRUE,
    icept=TRUE, normalize=TRUE, verb=1)

blasso(X, y, T=1000, thin=NULL, RJ=TRUE, M=NULL, beta=NULL, lambda2=1,
    s2=var(y-mean(y)), case=c("default", "ridge", "hs", "ng"), mprior=0,
    rd=NULL, ab=NULL, theta=0, rao.s2=TRUE, icept=TRUE, normalize=TRUE,
    verb=1)
X |
|
y |
vector of output responses |
T |
total number of MCMC samples to be collected |
thin |
number of MCMC samples to skip before a sample is
collected (via thinning). If |
RJ |
if |
M |
the maximal number of allowed covariates (columns of
|
beta |
initial setting of the regression coefficients. Any
zero-components will imply that the corresponding covariate (column
of |
lambda2 |
square of the initial lasso penalty parameter. If zero, then least squares regressions are used |
s2 |
initial variance parameter |
case |
specifies if ridge regression, the
Normal-Gamma, or the horseshoe prior should be done instead
of the lasso; only meaningful when |
mprior |
prior on the number of non-zero regression coefficients
(and therefore covariates) |
rd |
|
ab |
|
theta |
the rate parameter ( |
rao.s2 |
indicates whether Rao-Blackwellized samples for
|
icept |
if |
normalize |
if |
verb |
verbosity level; currently only |
The Bayesian lasso model and Gibbs Sampling algorithm are described in detail in Park & Casella (2008). The algorithm implemented by this function is identical to that described therein, with the exception of an added "option" to use a Rao-Blackwellized sample of s2 (with beta integrated out) for improved mixing, and the model selection by RJ described below. When the input argument lambda2 = 0 is supplied, the model is a simple hierarchical linear model where the regression coefficients beta are given a Jeffreys prior.
Specifying RJ = TRUE
causes Bayesian model selection and
averaging to commence for choosing which of the columns of the
design matrix X
(and thus parameters beta
) should be
included in the model. The zero-components of the beta
input
specify which columns are in the initial model, and
M
specifies the maximal number of columns.
The RJ mechanism implemented here for the Bayesian lasso model selection differs from the one described by Hans (2009), which is based on an idea from Geweke (1996). Those methods require departing from the Park & Casella (2008) latent-variable model and require sampling from each conditional beta_i | beta_(-i), ... for all i, since a mixture prior with a point-mass at zero is placed on each beta_i. Our implementation here requires no such special prior and retains the joint sampling from the full vector of non-zero beta entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase/decrease the number of non-zero entries proceed component-wise, but the acceptance rates are high due to marginalized between-model moves (Troughton & Godsill, 1997).
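For instance, a minimal sketch of RJ-based variable selection with blasso follows; the simulated data and the particular settings (T, M, burnin) are illustrative only and are not taken from the package documentation.

## a minimal sketch (simulated data): RJ model selection with at most
## M=5 active covariates, starting from the null model (all-zero beta)
X <- matrix(rnorm(100*10), ncol=10)
y <- drop(X %*% c(2, -2, rep(0, 8))) + rnorm(100)
fit <- blasso(X, y, T=500, RJ=TRUE, M=5, beta=rep(0, 10))
summary(fit, burnin=100)$bn0   ## posterior probability that each beta != 0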
When the lasso prior or RJ is used, the automatic thinning level (unless thin != NULL) is determined by the number of columns of X, since this many latent variables are introduced.
Bayesian ridge regression is implemented as a special case via the bridge function. This essentially calls blasso with case = "ridge". A default setting of rd = c(0,0) is implied by rd = NULL, giving the Jeffreys prior for the penalty parameter, unless ncol(X) >= length(y), in which case the proper specification of rd = c(5,10) is used instead.
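As a rough sketch on simulated data (not from the package docs), the following two calls specify the same ridge model, the first via the bridge wrapper and the second via blasso directly:

## bridge() is the ridge special case of blasso()
X <- matrix(rnorm(50*4), ncol=4)
y <- drop(X %*% c(1, 1, 0, 0)) + rnorm(50)
fit1 <- bridge(X, y, T=300)
fit2 <- blasso(X, y, T=300, case="ridge")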
The Normal–Gamma prior (Griffin & Brown, 2009) is implemented as
an extension to the Bayesian lasso with case = "ng"
. Many
thanks to James Scott for providing the code needed to extend the
method(s) to use the horseshoe prior (Carvalho, Polson, Scott, 2010).
When theta > 0 then Student-t errors via scale mixtures (and thereby extra latent variables omega2) of Geweke (1993) are applied as an extension to the Bayesian lasso/ridge model. If Student-t errors are used, the automatic thinning level is augmented (unless thin != NULL) by the number of rows in X, since this many latent variables are introduced.
blasso returns an object of class "blasso", which is a list containing a copy of all of the input arguments as well as the components listed below.
call |
a copy of the function call as used |
mu |
a vector of |
beta |
a |
m |
the number of non-zero entries in each vector of |
s2 |
a vector of |
lambda2 |
a vector of |
gamma |
a vector of |
tau2i |
a |
omega2 |
a |
nu |
a vector of |
pi |
a vector of |
lpost |
the log posterior probability of each (saved) sample of the joint parameters |
llik |
the log likelihood of each (saved) sample of the parameters |
llik.norm |
the log likelihood of each (saved) sample of the
parameters under the Normal errors model when sampling under the
Student-t model; i.e., it is not present
unless |
Whenever ncol(X) >= nrow(X) it must be that either RJ = TRUE with M <= nrow(X)-1 (the default) or that the lasso is turned on with lambda2 > 0. Otherwise the regression problem is ill-posed. Since the starting values are considered to be the first sample (of T), the total number of (new) samples obtained by Gibbs Sampling will be T-1.
Robert B. Gramacy [email protected]
Park, T., Casella, G. (2008). The Bayesian Lasso.
Journal of the American Statistical Association, 103(482),
June 2008, pp. 681-686
doi:10.1198/016214508000000337
Griffin, J.E. and Brown, P.J. (2009).
Inference with Normal-Gamma prior distributions in
regression problems. Bayesian Analysis, 5, pp. 171-188.
doi:10.1214/10-BA507
Hans, C. (2009). Bayesian Lasso regression.
Biometrika 96, pp. 835-845.
doi:10.1093/biomet/asp047
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The
horseshoe estimator for sparse signals. Biometrika 97(2):
pp. 465-480.
https://faculty.chicagobooth.edu/nicholas.polson/research/papers/Horse.pdf
Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.
Paul T. Troughton and Simon J. Godsill (1997). A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves. Technical Report CUED/F-INFENG/TR.304, Cambridge University Engineering Department.
Geweke, J. (1993) Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, Vol. 8, S19-S40
https://bobby.gramacy.com/r_packages/monomvn/
lm
,
lars
in the lars package,
regress
,
lm.ridge
in the MASS package
## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.las <- regress(x, y, method="lasso")

## Bayesian Lasso regression
reg.blas <- blasso(x, y)

## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("blasso-map", "lasso", "lsr"),
       col=c(2,2,3), pch=c(21,20,18))

## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")

## get the summary
s <- summary(reg.blas, burnin=200)

## calculate the probability that each beta coef != zero
s$bn0

## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2

## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2

## Not run:
## fit with Student-t errors
## (~400-times slower due to automatic thinning level)
regt.blas <- blasso(x, y, theta=0.1)

## plotting some information about nu, and quantiles
plot(regt.blas, "nu", burnin=200)
quantile(regt.blas$nu[-(1:200)], c(0.05, 0.95))

## Bayes Factor shows strong evidence for Student-t model
mean(exp(regt.blas$llik[-(1:200)] - regt.blas$llik.norm[-(1:200)]))
## End(Not run)

## clean up
detach(diabetes)
Summarizing, printing, and plotting the contents of a
"blasso"
-class object containing samples from
the posterior distribution of a Bayesian lasso model
## S3 method for class 'blasso'
print(x, ...)

## S3 method for class 'blasso'
summary(object, burnin = 0, ...)

## S3 method for class 'blasso'
plot(x, which=c("coef", "s2", "lambda2", "gamma", "tau2i", "omega2",
     "nu", "m", "pi"), subset = NULL, burnin = 0, ...)

## S3 method for class 'summary.blasso'
print(x, ...)
object |
a |
x |
a |
subset |
a vector of indices that can be used to specify
a subset of the columns of |
burnin |
number of burn-in rounds to discard before
reporting summaries and making plots. Must be non-negative
and less than |
which |
indicates the parameter whose characteristics
should be plotted; does not apply to the |
... |
passed to |
print.blasso
prints the call
followed by a
brief summary of the MCMC run and a suggestion to try
the summary and plot commands.
plot.blasso
uses an appropriate
plot
command on the list
entries of the
"blasso"
-class object thus
visually summarizing the samples from the posterior distribution of
each parameter in the model depending on the which
argument supplied.
summary.blasso
uses the summary
command
on the list entries of the "blasso"
-class object thus
summarizing the samples from the posterior distribution of each
parameter in the model.
print.summary.blasso calls print.blasso on the object and then prints the result of summary.blasso
summary.blasso
returns a "summary.blasso"
-class
object, which is a list
containing (a subset of) the items below.
The other functions do not return values.
B |
a copy of the input argument |
T |
total number of MCMC samples to be collected from |
thin |
number of MCMC samples to skip before a sample is
collected (via thinning) from |
coef |
a joint |
s2 |
a |
lambda2 |
a |
lambda2 |
a |
tau2i |
a |
omega2 |
a |
nu |
a |
bn0 |
the estimated posterior probability that the individual
components of the regression coefficients |
m |
a |
pi |
the estimated Binomial proportion in the prior for
the model order when 2-vector input is provided for
|
Robert B. Gramacy [email protected]
https://bobby.gramacy.com/r_packages/monomvn/
Bayesian estimation via sampling from the posterior distribution of the mean and covariance matrix of multivariate normal (MVN) distributed data with a monotone missingness pattern, via Gibbs Sampling. Through the use of parsimonious/shrinkage regressions (lasso/NG & ridge), where standard regressions fail, this function can handle an (almost) arbitrary amount of missing data.
bmonomvn(y, pre = TRUE, p = 0.9, B = 100, T = 200, thin = 1,
         economy = FALSE, method = c("lasso", "ridge", "lsr", "factor",
         "hs", "ng"), RJ = c("p", "bpsn", "none"), capm = TRUE,
         start = NULL, mprior = 0, rd = NULL, theta = 0, rao.s2 = TRUE,
         QP = NULL, verb = 1, trace = FALSE)
y |
data |
pre |
logical indicating whether pre-processing of the
|
p |
when performing regressions, |
B |
number of Burn-In MCMC sampling rounds, during which samples are discarded |
T |
total number of MCMC sampling rounds to take place after burn-in, during which samples are saved |
thin |
multiplicative thinning in the MCMC. Each Bayesian
(lasso) regression will discard |
economy |
indicates whether memory should be economized at
the expense of speed. When |
method |
indicates the Bayesian parsimonious regression
specification to be used, choosing between the lasso (default)
of Park & Casella, the NG extension, the horseshoe,
a ridge regression special case, and least-squares.
The |
RJ |
indicates the Reversible Jump strategy to be employed.
The default argument of |
capm |
when |
start |
a list depicting starting values for the parameters
that are use to initialize the Markov chain. Usually this will be
a |
mprior |
prior on the number of non-zero regression coefficients
(and therefore covariates) |
rd |
|
theta |
the rate parameter ( |
rao.s2 |
indicates whether to use Rao-Blackwellized samples for
|
QP |
if non- |
verb |
verbosity level; currently only |
trace |
if |
If pre = TRUE
then bmonomvn
first re-arranges the columns
of y
into nondecreasing order with respect to the number of
missing (NA
) entries. Then (at least) the first column should
be completely observed.
Samples from the posterior distribution of the MVN mean vector and covariance matrix are obtained by sampling from the posterior distribution of Bayesian regression models. The methodology for converting these to samples from the mean vector and covariance matrix is outlined in the monomvn documentation, detailing a similarly structured maximum likelihood approach. Also see the references below.
Whenever the regression model is ill–posed (i.e., when there are
more covariates than responses, or a
“big p
small n
” problem) then
Bayesian lasso or ridge regressions – possibly augmented with Reversible
Jump (RJ) for model selection – are used instead.
See the Park & Casella reference below, and the blasso
documentation. To guarantee each regression is well posed the
combination setting of method="lsr"
and RJ="none"
is not allowed.
As in monomvn
the p
argument can be used to
turn on lasso or ridge regressions (possibly with RJ) at other times.
The exception is the "factor"
method which always involves
an OLS regression on (a subset of) the first p
columns of y
.
Samples from a function of samples of mu
and Sigma
can be obtained by specifying a Quadratic program via the
argument QP
. The idea is to allow for the calculation of
the distribution of minimum variance and mean–variance portfolios,
although the interface is quite general. See default.QP
for more details, as default.QP(ncol(y))
is used
when the argument QP = TRUE
is given. When a positive integer
is given, then the first QP
columns of y
are treated
as factors by using
default.QP(ncol(y) - QP)
instead. The result is that the corresponding components of (samples of)
mu
and rows/cols of S
are not factored into the
specification of the resulting Quadratic Program.
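A minimal sketch of this interface, using the package's own data-generating utilities; the settings below are illustrative, not defaults taken from the documentation.

## sample minimum-variance weights alongside mu and S
y <- rmono(randmvn(100, 6)$x)
fit <- bmonomvn(y, QP=TRUE)           ## uses default.QP(ncol(y)) internally
plot(fit, which="QP", xaxis="index")  ## distribution of the sampled weights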
bmonomvn
returns an object of class "monomvn"
,
which is a list
containing the inputs above and a
subset of the components below.
call |
a copy of the function call as used |
mu |
estimated mean vector with columns corresponding to the
columns of |
S |
estimated covariance matrix with rows and columns
corresponding to the columns of |
mu.var |
estimated variance of the mean vector with columns
corresponding to the columns of |
mu.cov |
estimated covariance matrix of the mean vector
with columns corresponding to the columns of |
S.var |
estimated variance of the individual components of the
covariance matrix with columns and rows corresponding to the columns
of |
mu.map |
estimated maximum a posteriori (MAP) of the
mean vector with columns corresponding to the columns of |
S.map |
estimated MAP of the individual
components of the covariance matrix with columns and rows
corresponding to the columns of |
S.nz |
posterior probability that the individual entries of the covariance matrix are non–zero |
Si.nz |
posterior probability that the individual entries of the inverse of the covariance matrix are non–zero |
nu |
when |
lpost.map |
log posterior probability of the MAP estimate |
which.map |
gives the time index of the sample corresponding to the MAP estimate |
llik |
a trace of the log likelihood of the data |
llik.norm |
a trace of the log likelihood
under the Normal errors model when sampling under the
Student-t model; i.e., it is not present unless |
na |
when |
o |
when |
method |
method of regression used on each column, or
|
thin |
the (actual) number of thinning rounds used for the
regression ( |
lambda2 |
records the mean |
ncomp |
records the mean number of components
(columns of the design matrix) used in the regression model for
each column of |
trace |
if input |
R |
gives a |
B |
from inputs: number of Burn-In MCMC sampling rounds, during which samples are discarded |
T |
from inputs: total number of MCMC sampling rounds to take place after burn-in, during which samples are saved |
r |
from inputs: alpha (shape) parameter to the gamma distribution prior for the lasso parameter lambda |
delta |
from inputs: beta (rate) parameter to the gamma distribution prior for the lasso parameter lambda |
QP |
if a valid (non– |
W |
when input |
Whenever the bmonomvn
algorithm requires a regression
where p >= n
, i.e., if any of the columns in the y
matrix have fewer non–NA
elements than the number of
columns with more non–NA
elements, then it is helpful
to employ both lasso/ridge and the RJ method.
It is important that any starting values provided in start be compatible with the regression model specified by inputs RJ and method. Any incompatibilities will result in a warning that (alternative) default action was taken, and may result in an undesired (possibly inferior) model being fit.
Robert B. Gramacy [email protected]
R.B. Gramacy and E. Pantaleo (2010). Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing. Bayesian Analysis 5(1), 237-262. doi:10.1214/10-BA602. Preprint available on arXiv:0907.2135: https://arxiv.org/abs/0907.2135
Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wiley.
https://bobby.gramacy.com/r_packages/monomvn/
blasso
, monomvn
,
default.QP
, em.norm
in the now defunct
norm
and mvnmle
packages, and returns
## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3
data(cement.miss)
out <- bmonomvn(cement.miss)
out
out$mu
out$S

##
## A bigger example, comparing the various
## parsimonious methods
##

## generate N=100 samples from a 20-d random MVN
xmuS <- randmvn(100, 20)

## randomly impose monotone missingness
xmiss <- rmono(xmuS$x)

## using least squares only when necessary,
obl <- bmonomvn(xmiss)
obl

## look at the posterior variability
par(mfrow=c(1,2))
plot(obl)
plot(obl, "S")

## compare to maximum likelihood
Ellik.norm(obl$mu, obl$S, xmuS$mu, xmuS$S)
oml <- monomvn(xmiss, method="lasso")
Ellik.norm(oml$mu, oml$S, xmuS$mu, xmuS$S)

##
## a min-variance portfolio allocation example
##

## get the returns data, and use 20 random cols
data(returns)
train <- returns[,sample(1:ncol(returns), 20)]

## missingness pattern requires DA; also gather
## samples from the solution to a QP
obl.da <- bmonomvn(train, p=0, QP=TRUE)

## plot the QP weights distribution
plot(obl.da, "QP", xaxis="index")

## get ML solution: will warn about monotone violations
suppressWarnings(oml.da <- monomvn(train, method="lasso"))

## add mean and MLE comparison, requires the
## quadprog library for the solve.QP function
add.pe.QP(obl.da, oml.da)

## now consider adding in the market as a factor
data(market)
mtrain <- cbind(market, train)

## fit the model using only factor regressions
obl.daf <- bmonomvn(mtrain, method="factor", p=1, QP=1)
plot(obl.daf, "QP", xaxis="index", main="using only factors")
suppressWarnings(oml.daf <- monomvn(mtrain, method="factor"))
add.pe.QP(obl.daf, oml.daf)

##
## a Bayes/MLE comparison using least squares sparingly
##

## fit Bayesian and classical lasso
p <- 0.25
obls <- bmonomvn(xmiss, p=p)
Ellik.norm(obls$mu, obls$S, xmuS$mu, xmuS$S)
omls <- monomvn(xmiss, p=p, method="lasso")
Ellik.norm(omls$mu, omls$S, xmuS$mu, xmuS$S)

## compare to ridge regression
obrs <- bmonomvn(xmiss, p=p, method="ridge")
Ellik.norm(obrs$mu, obrs$S, xmuS$mu, xmuS$S)
omrs <- monomvn(xmiss, p=p, method="ridge")
Ellik.norm(omrs$mu, omrs$S, xmuS$mu, xmuS$S)
Heat evolved in setting of cement, as a function of its chemical composition.
data(cement)
data(cement.miss)
A data.frame
with 13 observations on the following 5 variables.
percentage weight in clinkers of 3CaO.Al2O3
percentage weight in clinkers of 3CaO.SiO2
percentage weight in clinkers of 4CaO.Al2O3.Fe2O3
percentage weight in clinkers of 2CaO.SiO2
heat evolved (calories/gram)
cement.miss is taken from an example in Little & Rubin's book on Statistical Analysis with Missing Data (2002), pp. 154, for demonstrating estimation of multivariate means and variances when the missing data pattern is monotone. These are indicated by NA in cement.miss. See the examples section of monomvn for a re-working of the example from the textbook.
Woods, H., Steinour, H. H. and Starke, H. R. (1932) Effect of composition of Portland cement on heat evolved during hardening. Industrial Engineering and Chemistry, 24, 1207–1214.
Davison, A. C. (2003) Statistical Models. Cambridge University Press. Page 355.
Draper, N.R. and Smith, H. (1998) Applied Regression Analysis. Wiley. Page 630.
Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wiley. Page 154.
https://bobby.gramacy.com/r_packages/monomvn/
monomvn
– Several other R packages also include
this data set
data(cement)
lm(y~x1+x2+x3+x4, data=cement)
This function generates a default “minimum variance” Quadratic Program in order to obtain samples of the solution under the posterior for the parameters mu and S obtained via bmonomvn. The list generated as output has entries similar to the inputs of solve.QP from the quadprog package.
default.QP(m, dmu = FALSE, mu.constr = NULL)
m |
the dimension of the solution space; usually
|
dmu |
a logical indicating whether |
mu.constr |
a vector indicating linear constraints on the
samples of |
When bmonomvn(y, QP=TRUE) is called, this function is used to generate a default Quadratic Program that samples solutions w minimizing w' S w, subject to the constraints that all w_i >= 0, for i = 1,...,m, and sum_i w_i = 1, where S is sampled from its posterior distribution conditional on the data y. Alternatively, this function can be used as a skeleton for adaptation to more general Quadratic Programs by adjusting the list that is returned, as described in the “value” section below.
Non-default settings of the arguments dmu and mu.constr augment the default Quadratic Program, described above, in two standard ways. Specifying dmu = TRUE causes the program objective to change to the mean-variance form, minimizing w' S w - w' mu, with the same constraints as above. Setting mu.constr = 1, say, would augment the constraints to include w' mu >= 1 for samples of mu from the posterior. Setting mu.constr = c(1,2) would augment the constraints still further with w' mu <= 2, i.e., with alternating sign on the linear part, so that each sample of w' mu must lie in the interval [1,2]. So whereas dmu = TRUE allows the mu samples to enter the objective in a standard way, mu.constr (!= NULL) allows them to enter the constraints.
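A short sketch of how these augmentations might be requested and inspected; the dimension and constraint values here are purely illustrative.

## a mean-variance program whose sampled w'mu must lie in [1,2]
qp <- default.QP(10, dmu=TRUE, mu.constr=c(1, 2))
str(qp)   ## inspect the dvec, dmu, Amat, b0, meq and mu.constr entries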
The accompanying function monomvn.solve.QP can act as an interface between the constructed (default) QP object and estimates of the covariance matrix S and mean vector mu that is identical to the one used on the posterior-sample version implemented in bmonomvn. The example below, and those in the documentation for bmonomvn, illustrate how this feature may be used to extract mean and MLE solutions to the constructed Quadratic Program.
This function returns a list that can be interpreted as specifying the following arguments to the solve.QP function in the quadprog package. See solve.QP for more information on the general specification of these arguments. In what follows we simply document the defaults provided by default.QP. Note that the Dmat argument is not specified, since bmonomvn will use samples from S (from the posterior) instead.
m |
|
dvec |
a zero-vector |
dmu |
a copy of the |
Amat |
a |
b0 |
a vector containing the (RHS of) in/equalities described by the these constraints |
meq |
an integer scalar indicating that the first |
mu.constr |
a vector whose length is one greater than the
input argument of the same name, providing
and location |
The $QP
object that is returned from bmonomvn
will have the following additional field
o |
an integer vector of length |
Robert B. Gramacy [email protected]
bmonomvn
and solve.QP
in the quadprog package, monomvn.solve.QP
## generate N=100 samples from a 20-d random MVN
## and randomly impose monotone missingness
xmuS <- randmvn(100, 20)
xmiss <- rmono(xmuS$x)

## set up the minimum-variance (default) Quadratic Program
## and sample from the posterior of the solution space
qp1 <- default.QP(ncol(xmiss))
obl1 <- bmonomvn(xmiss, QP=qp1)
bm1 <- monomvn.solve.QP(obl1$S, qp1)                  ## calculate mean
bm1er <- monomvn.solve.QP(obl1$S + obl1$mu.cov, qp1)  ## use estimation risk
oml1 <- monomvn(xmiss)
mm1 <- monomvn.solve.QP(oml1$S, qp1)                  ## calculate MLE

## now obtain samples from the solution space of the
## mean-variance QP
qp2 <- default.QP(ncol(xmiss), dmu=TRUE)
obl2 <- bmonomvn(xmiss, QP=qp2)
bm2 <- monomvn.solve.QP(obl2$S, qp2, obl2$mu)                  ## calculate mean
bm2er <- monomvn.solve.QP(obl2$S + obl2$mu.cov, qp2, obl2$mu)  ## use estimation risk
oml2 <- monomvn(xmiss)
mm2 <- monomvn.solve.QP(oml2$S, qp2, oml2$mu)                  ## calculate MLE

## now obtain samples from minimum variance solutions
## where the mean weighted (samples) are constrained to be
## greater than one
qp3 <- default.QP(ncol(xmiss), mu.constr=1)
obl3 <- bmonomvn(xmiss, QP=qp3)
bm3 <- monomvn.solve.QP(obl3$S, qp3, obl3$mu)                  ## calculate mean
bm3er <- monomvn.solve.QP(obl3$S + obl3$mu.cov, qp3, obl3$mu)  ## use estimation risk
oml3 <- monomvn(xmiss)
mm3 <- monomvn.solve.QP(oml3$S, qp3, oml3$mu)                  ## calculate MLE

## plot a comparison
par(mfrow=c(3,1))
plot(obl1, which="QP", xaxis="index", main="Minimum Variance")
points(bm1er, col=4, pch=17, cex=1.5)  ## add estimation risk
points(bm1, col=3, pch=18, cex=1.5)    ## add mean
points(mm1, col=5, pch=16, cex=1.5)    ## add MLE
legend("topleft", c("MAP", "posterior mean", "ER", "MLE"),
       col=2:5, pch=c(21,18,17,16), cex=1.5)
plot(obl2, which="QP", xaxis="index", main="Mean Variance")
points(bm2er, col=4, pch=17, cex=1.5)  ## add estimation risk
points(bm2, col=3, pch=18, cex=1.5)    ## add mean
points(mm2, col=5, pch=16, cex=1.5)    ## add MLE
plot(obl3, which="QP", xaxis="index", main="Minimum Variance, mean >= 1")
points(bm3er, col=4, pch=17, cex=1.5)  ## add estimation risk
points(bm3, col=3, pch=18, cex=1.5)    ## add mean
points(mm3, col=5, pch=16, cex=1.5)    ## add MLE

## for a further comparison of samples of the QP solution
## w under Bayesian and non-Bayesian monomvn, see the
## examples in the bmonomvn help file
These functions calculate the root-mean-squared-error, the expected log likelihood, and Kullback-Leibler (KL) divergence (a.k.a. distance), between two multivariate normal (MVN) distributions described by their mean vector and covariance matrix
rmse.muS(mu1, S1, mu2, S2)
Ellik.norm(mu1, S1, mu2, S2, quiet=FALSE)
kl.norm(mu1, S1, mu2, S2, quiet=FALSE, symm=FALSE)
mu1 |
mean vector of first (estimated) MVN |
S1 |
covariance matrix of first (estimated) MVN |
mu2 |
mean vector of second (true, baseline, or comparator) MVN |
S2 |
covariance matrix of second (true, baseline, or comparator) MVN |
quiet |
when |
symm |
when |
The root-mean-squared-error is calculated between the entries of the mean vectors, and the upper-triangular part of the covariance matrices (including the diagonal).
The KL divergence is given by the formula:

  0.5 * ( log( det(S1) / det(S2) ) + tr( S1^(-1) S2 ) + (mu1 - mu2)' S1^(-1) (mu1 - mu2) - N )

where N is length(mu1), and must agree with the dimensions of the other parameters. Note that the parameterization used involves swapped arguments compared to some other references, e.g., as provided by Wikipedia. See the note below.
The expected log likelihood can be formulated in terms of the KL divergence. That is, the expected log likelihood of data simulated from the normal distribution with parameters mu2 and S2 under the estimated normal with parameters mu1 and S1 is given by

  -kl.norm(mu1, S1, mu2, S2) - 0.5 * ( N*log(2*pi) + log(det(S2)) + N ),

i.e., the negative KL divergence minus the entropy of the second (true) normal distribution.
In the case of the expected log likelihood the result is a real number. The RMSE is a positive real number. The KL divergence method returns a positive real number depicting the distance between the two normal distributions.
The KL-divergence is not symmetric. Therefore
kl.norm(mu1,S1,mu2,S2) != kl.norm(mu2,S2,mu1,S1).
But a symmetric metric can be constructed from
0.5 * (kl.norm(mu1,S1,mu2,S2) + kl.norm(mu2,S2,mu1,S1))
or by using symm = TRUE. The arguments are reversed compared to some other references, like Wikipedia. To match those versions use kl.norm(mu2, S2, mu1, S1).
Robert B. Gramacy [email protected]
https://bobby.gramacy.com/r_packages/monomvn/
mu1 <- rnorm(5)
s1 <- matrix(rnorm(100), ncol=5)
S1 <- t(s1) %*% s1
mu2 <- rnorm(5)
s2 <- matrix(rnorm(100), ncol=5)
S2 <- t(s2) %*% s2

## RMSE
rmse.muS(mu1, S1, mu2, S2)

## expected log likelihood
Ellik.norm(mu1, S1, mu2, S2)

## KL is not symmetric
kl.norm(mu1, S1, mu2, S2)
kl.norm(mu2, S2, mu1, S1)

## symmetric version
kl.norm(mu2, S2, mu1, S1, symm=TRUE)
Maximum likelihood estimation of the mean and covariance matrix of multivariate normal (MVN) distributed data with a monotone missingness pattern. Through the use of parsimonious/shrinkage regressions (e.g., plsr, pcr, ridge, lasso, etc.), where standard regressions fail, this function can handle an (almost) arbitrary amount of missing data
monomvn(y, pre = TRUE, method = c("plsr", "pcr", "lasso", "lar",
        "forward.stagewise", "stepwise", "ridge", "factor"), p = 0.9,
        ncomp.max = Inf, batch = TRUE, validation = c("CV", "LOO", "Cp"),
        obs = FALSE, verb = 0, quiet = TRUE)
y |
data |
pre |
logical indicating whether pre-processing of the
|
method |
describes the type of parsimonious
(or shrinkage) regression to
be performed when standard least squares regression fails.
From the pls package we have |
p |
when performing regressions, |
ncomp.max |
maximal number of (principal) components to include
in a |
batch |
indicates whether the columns with equal missingness should be processed together using a multi-response regression. This is more efficient if many OLS regressions are used, but can lead to slightly poorer, even unstable, fits when parsimonious regressions are used |
validation |
method for cross validation when applying
a parsimonious regression method. The default setting
of |
obs |
logical indicating whether or not to (additionally)
compute a mean vector and covariance matrix based only on the observed
data, without regressions. I.e., means are calculated as averages
of each non- |
verb |
whether or not to print progress indicators. The default
( |
quiet |
causes |
If pre = TRUE
then monomvn
first re-arranges the columns
of y
into nondecreasing order with respect to the number of
missing (NA
) entries. Then (at least) the first column should
be completely observed. The mean components and covariances between
the first set of complete columns are obtained through the standard
mean
and cov
routines.
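The reordering that pre = TRUE performs can be sketched directly; this is an illustration of the idea, not the package's internal code.

## sort columns by their number of NA entries, as pre=TRUE does
xmiss <- rmono(randmvn(50, 5)$x)
nas <- colSums(is.na(xmiss))
xord <- xmiss[, order(nas)]
all(!is.na(xord[, 1]))   ## the first column is then completely observed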
Next each successive group of columns with the same missingness pattern
is processed in sequence (assuming batch = TRUE
).
Suppose a total of j
columns have
been processed this way already. Let y2
represent the non-missing
contingent of the next group of k
columns of y
with an identical missingness pattern, and let y1
be the
previously processed j-1
columns of y
containing only the rows
corresponding to each non-NA
entry in y2
. I.e.,
nrow(y1) = nrow(y2)
. Note that y1
contains no
NA
entries since the missing data pattern is monotone.
The k
next entries (indices j:(j+k)
) of the mean vector,
and the j:(j+k)
rows and columns of the covariance matrix are
obtained by multivariate regression of y2
on y1
.
The regression method used (except in the case of method = "factor") depends on the number of rows and columns in y1 and on the p parameter. Whenever ncol(y1) < p*nrow(y1) least-squares regression is used, otherwise method = c("pcr", "plsr"). If ever a least-squares regression fails due to co-linearity then one of the other methods is tried. The "factor" method always involves an OLS regression on (a subset of) the first p columns of y.
All method
s require a scheme for estimating the amount of
variability explained by increasing the numbers of coefficients
(or principal components) in the model.
Towards this end, the pls and lars packages support
10-fold cross validation (CV) or leave-one-out (LOO) CV estimates of
root mean squared error. See pls and lars for
more details. monomvn
uses
CV in all cases except when nrow(y1) <= 10
, in which case CV fails and
LOO is used. Whenever nrow(y1) <= 3
pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
For pls methods, RMSEs are calculated for a number of components in 1:ncomp.max, where a NULL value for ncomp.max is replaced with

ncomp.max <- min(ncomp.max, ncol(y2), nrow(y1)-1)

which is the max allowed by the pls package.
Simple heuristics are used to select a small number of components
(ncomp
for pls), or number of coefficients (for
lars), which explains a large amount of the variability (RMSE).
The lars methods use a “one-standard error rule” outlined
in Section 7.10, page 216 of HTF below. The
pls package does not currently support the calculation of
standard errors for CV estimates of RMSE, so a simple linear penalty
for increasing ncomp
is used instead. The ridge constant
(lambda) for lm.ridge
is set using the
optimize
function on the GCV
output.
Based on the ML ncol(y1)+1
regression coefficients (including
intercept) obtained for each of the
columns of y2
, and on the corresponding matrix
of
residual sum of squares, and on the previous j-1
means
and rows/cols of the covariance matrix, the j:(j+k)
entries and
rows/cols can be filled in as described by Little and Rubin, section 7.4.3.
Once every column has been processed, the entries of the mean vector, and rows/cols of the covariance matrix are re-arranged into their original order.
monomvn
returns an object of class "monomvn"
, which is a
list
containing a subset of the components below.
call |
a copy of the function call as used |
mu |
estimated mean vector with columns corresponding to the
columns of |
S |
estimated covariance matrix with rows and columns
corresponding to the columns of |
na |
when |
o |
when |
method |
method of regression used on each column, or
|
ncomp |
number of components in a |
lambda |
if |
mu.obs |
when |
S.obs |
when |
The CV in plsr and lars are random in nature, and so can be dependent on the random seed. Use validation="LOO" for a deterministic (but slower) result.
When using method = "factor" in the current version of the package, the factors in the first p columns of y must also obey the monotone pattern, and have no more NA entries than the other columns of y.
Be warned that the lars implementation of
"forward.stagewise"
can sometimes get stuck in
(what seems like) an infinite loop.
This is not a bug in the monomvn
package;
the bug has been reported to the authors of lars.
Robert B. Gramacy [email protected]
Robert B. Gramacy, Joo Hee Lee, and Ricardo Silva (2007).
On estimating covariances between many assets with histories
of highly variable length.
Preprint available on arXiv:0710.5837:
https://arxiv.org/abs/0710.5837
Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wiley.
Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18(2)
Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani
(2003).
Least Angle Regression (with discussion).
Annals of Statistics 32(2); see also
https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf
Trevor Hastie, Robert Tibshirani and Jerome Friedman (2002). Elements of Statistical Learning. Springer, NY. [HTF]
Some of the code for monomvn
, and its subroutines, was inspired
by code found on the world wide web, written by Daniel Heitjan.
Search for “fcn.q”
https://bobby.gramacy.com/r_packages/monomvn/
bmonomvn
, em.norm
in the now defunct norm
and mvnmle
packages
## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3 -- try adding
## verb=3 argument for a step-by-step breakdown
data(cement.miss)
out <- monomvn(cement.miss)
out
out$mu
out$S

##
## A bigger example, comparing the various methods
##

## generate N=100 samples from a 20-d random MVN
xmuS <- randmvn(100, 20)

## randomly impose monotone missingness
xmiss <- rmono(xmuS$x)

## plsr
oplsr <- monomvn(xmiss, obs=TRUE)
oplsr
Ellik.norm(oplsr$mu, oplsr$S, xmuS$mu, xmuS$S)

## calculate the complete and observed RMSEs
n <- nrow(xmiss) - max(oplsr$na)
x.c <- xmiss[1:n,]
mu.c <- apply(x.c, 2, mean)
S.c <- cov(x.c)*(n-1)/n
Ellik.norm(mu.c, S.c, xmuS$mu, xmuS$S)
Ellik.norm(oplsr$mu.obs, oplsr$S.obs, xmuS$mu, xmuS$S)

## pcr
opcr <- monomvn(xmiss, method="pcr")
Ellik.norm(opcr$mu, opcr$S, xmuS$mu, xmuS$S)

## ridge regression
oridge <- monomvn(xmiss, method="ridge")
Ellik.norm(oridge$mu, oridge$S, xmuS$mu, xmuS$S)

## lasso
olasso <- monomvn(xmiss, method="lasso")
Ellik.norm(olasso$mu, olasso$S, xmuS$mu, xmuS$S)

## lar
olar <- monomvn(xmiss, method="lar")
Ellik.norm(olar$mu, olar$S, xmuS$mu, xmuS$S)

## forward.stagewise
ofs <- monomvn(xmiss, method="forward.stagewise")
Ellik.norm(ofs$mu, ofs$S, xmuS$mu, xmuS$S)

## stepwise
ostep <- monomvn(xmiss, method="stepwise")
Ellik.norm(ostep$mu, ostep$S, xmuS$mu, xmuS$S)
Summarizing, printing, and plotting the contents of a
"monomvn"
-class object
## S3 method for class 'monomvn'
summary(object, Si = FALSE, ...)

## S3 method for class 'summary.monomvn'
print(x, ...)

## S3 method for class 'summary.monomvn'
plot(x, gt0 = FALSE, main = NULL, xlab = "number of zeros", ...)
object |
a |
x |
a |
Si |
boolean indicating whether |
gt0 |
boolean indicating whether the histograms in
|
main |
optional text to be added to the main title of the histograms
produced by the generic |
xlab |
label for the x-axes of the histograms produced by
|
... |
passed to |
These functions work on the output from both monomvn
and bmonomvn
.
print.monomvn
prints the call
followed by a
summary of the regression method used at each iteration of the
algorithm. It also indicates how many completely observed features
(columns) there were in the data.
For non-least-squares regressions (i.e., the plsr, lars and lm.ridge methods) an indication of the method used for selecting the number of components (i.e., CV, LOO, etc., or none) is provided.
summary.monomvn summarizes information about the number of zeros in the estimated covariance matrix object$S and its inverse.
print.summary.monomvn calls print.monomvn on the object and then prints the result of summary.monomvn.
plot.summary.monomvn makes histograms of the number of zeros in the columns of object$S and its inverse.
summary.monomvn
returns a
"summary.monomvn"
-class object, which is a list
containing (a subset of) the items below. The other
functions do not return values.
obj |
the |
marg |
the proportion of zeros in |
S0 |
a vector containing the number of zeros in each column
of |
cond |
if input |
Si0 |
if input |
There is one further S3 function for "monomvn"
-class
objects that has its own help file: plot.monomvn
Robert B. Gramacy [email protected]
https://bobby.gramacy.com/r_packages/monomvn/
bmonomvn
, monomvn
,
plot.monomvn
Solve a Quadratic Program specified by a QP
object
using the covariance matrix and mean vector specified
monomvn.solve.QP(S, QP, mu = NULL)
S |
a positive-definite covariance |
QP |
a Quadratic Programming object like one that can
be generated automatically by |
mu |
a mean vector with
|
The protocol executed by this function is identical to the one used on the samples of mu and S obtained in bmonomvn when a Quadratic Program is specified through the QP argument. For more details on the specification of the Quadratic Program implied by a QP object, please see default.QP and the examples therein.
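A brief sketch of this usage, with illustrative settings (the dimensions and data generation below are arbitrary):

## solve the default minimum-variance QP at the posterior mean of S
y <- rmono(randmvn(100, 5)$x)
qp <- default.QP(ncol(y))
fit <- bmonomvn(y, QP=qp)
monomvn.solve.QP(fit$S, qp)   ## weights at the posterior mean estimate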
The output is a vector whose length agrees with the dimension of S, describing the solution to the Quadratic Program given.
Robert B. Gramacy [email protected]
default.QP
, bmonomvn
,
and solve.QP
in the quadprog package
Functions for visualizing the output from bmonomvn
,
particularly the posterior standard deviation estimates of the mean
vector and covariance matrix, and samples from the solution to a
Quadratic Program
## S3 method for class 'monomvn'
plot(x, which=c("mu", "S", "Snz", "Sinz", "QP"),
     xaxis=c("numna", "index"), main=NULL, uselog=FALSE, ...)
x |
a |
which |
determines the parameter whose standard deviation
to be visualized: the mean vector ( |
xaxis |
indicates how x-axis (or x- and y-axis in the case
of |
main |
optional text to be added to the main title of the plots;
the default of |
uselog |
a logical which, when |
... |
passed to |
Currently, this function only provides a visualization of the
posterior standard deviation estimates of the parameters, and
the distributions of samples from the posterior of the solution
to a specified Quadratic Program. Therefore
it only works on the output from bmonomvn
All types of visualization (specified by which
) are presented
in the order of the number of missing entries in the columns of the
data passed as input to bmonomvn
.
In the case of which = "mu"
this means that y-values are presented in the order x$o
, where
the x-axis is either 1:length(x$o)
in the case of
xaxis = "index"
, or x$na[x$o]
in the case of xaxis
= "numna"
. When which = "S"
is given the resulting
image
plot is likewise ordered by x$o
where the
x- and y-axis are as above, except that in the case
where xaxis = "numna"
the repeated counts of NA
s are adjusted by small increments so that x and y arguments to
image
are distinct. Since a boxplot
is
used when which = "QP"
it may be that xaxis = "index"
is preferred
The only output of this function is beautiful plots
Robert B. Gramacy [email protected]
https://bobby.gramacy.com/r_packages/monomvn/
bmonomvn
, print.monomvn
,
summary.monomvn
Randomly generate a mean vector and covariance matrix describing a multivariate normal (MVN) distribution, and then sample from it
randmvn(N, d, method = c("normwish", "parsimonious"),
        mup=list(mu = 0, s2 = 1), s2p=list(a = 0.5, b = 1),
        pnz=0.1, nu=Inf)
N |
number of samples to draw |
d |
dimension of the MVN, i.e., the length of the mean vector and the number of rows/cols of the covariance matrix |
method |
the default generation method is |
mup |
a |
s2p |
a |
pnz |
a scalar |
nu |
a scalar |
In the direct method ("normwish"
) the components of the
mean vector mu
are iid from a standard normal distribution,
and the covariance matrix S
is
drawn from an inverse-Wishart distribution with d + 2 degrees of freedom and mean (centering matrix) diag(d).
In the "parsimonious"
method mu
and S
are
built up sequentially by randomly sampling intercepts, regression
coefficients (of length i-1
for i in 1:d
) and variances
by applying the monomvn
equations. A unique prior results
when a random number of the regression coefficients are set to zero.
When none are set to zero the direct method results.
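For example, a quick sketch of the parsimonious generator; the settings below are arbitrary illustrative choices.

## sparse generation with roughly 30% non-zero regression coefficients
out <- randmvn(100, 10, method="parsimonious", pnz=0.3)
dim(out$x)      ## 100 x 10
length(out$mu)  ## 10
dim(out$S)      ## 10 x 10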
The return value is a list
with the following components:
mu |
randomly generated mean vector of length |
S |
randomly generated covariance |
x |
if |
requires the rmvnorm
function of the
mvtnorm package
Robert B. Gramacy [email protected]
randmvn(5, 3)
This function fits the specified ordinary least squares or
parsimonious regression (plsr, pcr, ridge, and lars methods)
depending on the arguments provided, and returns estimates of
coefficients and (co-)variances in a monomvn
friendly
format
regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
        "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
        ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
        verb = 0, quiet = TRUE)
X |
|
y |
matrix of responses |
method |
describes the type of parsimonious
(or shrinkage) regression, or ordinary least squares.
From the pls package we have |
p |
when performing regressions, |
ncomp.max |
maximal number of (principal) components to consider
in a |
validation |
method for cross validation when applying
a parsimonious regression method. The default setting
of |
verb |
whether or not to print progress indicators. The default
( |
quiet |
causes |
All method
s (except "lsr"
) require a scheme for
estimating the amount of variability explained by increasing numbers
of non-zero coefficients (or principal components) in the model.
Towards this end, the pls and lars packages support
10-fold cross validation (CV) or leave-one-out (LOO) CV estimates of
root mean squared error. See pls and lars for
more details. The regress
function uses CV in all cases
except when nrow(X) <= 10
, in which case CV fails and
LOO is used. Whenever nrow(X) <= 3
pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
For pls methods, RMSEs are calculated for a number of components in 1:ncomp.max, where a NULL value for ncomp.max is replaced with

ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the max allowed by the pls package.
Simple heuristics are used to select a small number of components
(ncomp
for pls), or number of coefficients (for
lars) which explains a large amount of the variability (RMSE).
The lars methods use a “one-standard error rule” outlined
in Section 7.10, page 216 of HTF below. The
pls package does not currently support the calculation of
standard errors for CV estimates of RMSE, so a simple linear penalty
for increasing ncomp
is used instead. The ridge constant
(lambda) for lm.ridge
is set using the optimize
function on the GCV
output.
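A small sketch (on simulated data, not from the package docs) contrasting the random CV default with the deterministic LOO alternative:

## component selection under 10-fold CV (default) versus LOO
X <- matrix(rnorm(30*8), ncol=8)
y <- drop(X[,1] - X[,2]) + rnorm(30)
fit.cv  <- regress(X, y, method="plsr")
fit.loo <- regress(X, y, method="plsr", validation="LOO")
c(cv=fit.cv$ncomp, loo=fit.loo$ncomp)   ## number of components chosen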
regress
returns a list
containing
the components listed below.
call |
a copy of the function call as used |
method |
a copy of the |
ncomp |
depends on the |
lambda |
if |
b |
matrix containing the estimated regression coefficients,
with |
S |
(biased corrected) maximum likelihood estimate of residual covariance matrix |
The CV in plsr and lars are random in nature, and so can be dependent on the random seed. Use validation="LOO" for a deterministic (but slower) result.
Be warned that the lars implementation of
"forward.stagewise"
can sometimes get stuck in
(what seems like) an infinite loop.
This is not a bug in the regress
function;
the bug has been reported to the authors of lars.
Robert B. Gramacy [email protected]
Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18(2)
Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani
(2003).
Least Angle Regression (with discussion).
Annals of Statistics 32(2); see also
https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf
https://bobby.gramacy.com/r_packages/monomvn/
monomvn
, blasso
,
lars
in the lars library,
lm.ridge
in the MASS library,
plsr
and pcr
in the
pls library
## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.lasso <- regress(x, y, method="lasso")

## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")

## ridge regression
reg.ridge <- regress(x, y, method="ridge")

## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
           plsr=reg.plsr$b, ridge=reg.ridge$b)

## clean up
detach(diabetes)
Monthly returns of common domestic stocks traded on the NYSE and the AMEX from April 1968 until 1998; also contains the return to the market
data(returns)
data(returns.test)
data(market)
data(market.test)
The returns provided are collected in a data.frame
with
1168 columns, and 360 rows in the case of returns
and 12
rows for returns.test
. The columns are uniquely coded to
identify the stock traded on NYSE or AMEX. The market return
is in two vectors market
and market.test
of length 360 and 12, respectively
The columns contain monthly returns of common domestic stocks traded
on the NYSE and the AMEX from April 1968 until 1998. returns
contains returns up until 1997, whereas returns.test
has the
returns for 1997. Both data sets have been cleaned in the following
way. All stocks have a share price greater than $5 and a market
capitalization greater than 20% based on the size distribution of
NYSE firms. Stocks without completely observed return
series in 1997 were also discarded.
The market returns provided are essentially the monthly return on the S&P500 during the same period, which is highly correlated with the raw monthly returns weighted by their market capitalization
This data is a subset of that originally used by Chan, Karceski, and Lakonishok (1999), and subsequently by several others; see the references below. We use it as part of the monomvn package as an example of a real world data set following a nearly monotone missingness pattern
Louis K. Chan, Jason Karceski, and Josef Lakonishok (1999). On Portfolio Optimization: Forecasting Covariances and Choosing the Risk Model. The Review of Financial Studies. 12(5), 937-974
Ravi Jagannathan and Tongshu Ma (2003). Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps. Journal of Finance, American Finance Association. 58(4), 1641-1684
Robert B. Gramacy, Joo Hee Lee, and Ricardo Silva (2008).
On estimating covariances between many assets with histories
of highly variable length.
Preprint available on arXiv:0710.5837:
https://arxiv.org/abs/0710.5837
https://bobby.gramacy.com/r_packages/monomvn/
data(returns)

## investigate the monotone missingness pattern
returns.na <- is.na(returns)
image(1:ncol(returns), 1:nrow(returns), t(returns.na))

## for a portfolio balancing exercise, see
## the example in the bmonomvn help file
Randomly impose a monotone missingness pattern by replacing the ends of each column of the input matrix by a random number of NAs.
rmono(x, m = 7, ab = NULL)
x |
data |
m |
minimum number of non- |
ab |
a two-vector of |
The returned x always has one (randomly selected) complete column, and no column has fewer than m non-missing entries. Otherwise, the proportion of missing entries in each column can be uniform, or it can have a beta distribution with parameters alpha (ab[1]) and beta (ab[2]).
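For instance (an illustrative setting, not a package default):

## Beta(1,3)-distributed proportions of missing entries per column,
## with at least m=5 observed values in every column
x <- randmvn(50, 4)$x
xm <- rmono(x, m=5, ab=c(1, 3))
colSums(is.na(xm))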
returns a matrix
with the same dimensions as the input x
Robert B. Gramacy [email protected]
https://bobby.gramacy.com/r_packages/monomvn/
randmvn
out <- randmvn(10, 3)
rmono(out$x)
Random generation from the Wishart distribution.
rwish(v, S)
v |
degrees of freedom (scalar) |
S |
inverse scale matrix |
The mean of a Wishart random variable with v degrees of freedom and inverse scale matrix S is vS.
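A quick Monte Carlo sketch of this mean (illustrative only):

## the empirical mean of many draws should be close to v*S
S <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
draws <- replicate(2000, rwish(5, S))
apply(draws, 1:2, mean)   ## compare to 5*S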
Returns one random draw from the distribution, which is a matrix with the same dimensions as S.
This was copied from the MCMCpack package
draw <- rwish(3, matrix(c(1,.3,.3,1),2,2))