Title: | The Multivariate Normal and t Distributions, and Their Truncated Versions |
---|---|
Description: | Functions are provided for computing the density and the distribution function of d-dimensional normal and "t" random variables, possibly truncated (on one side or two sides), and for generating random vectors sampled from these distributions, except sampling from the truncated "t". Moments of arbitrary order of a multivariate truncated normal are computed, and converted to cumulants up to order 4. Probabilities are computed via non-Monte Carlo methods; different routines are used in the case d=1, d=2, d=3, d>3, if d denotes the dimensionality. |
Authors: | Adelchi Azzalini [aut, cre], Alan Genz [aut] (most Fortran code), Alan Miller [ctb] (Fortran routine PHI), Michael J. Wichura [ctb] (Fortran routine PHINV), G. W. Hill [ctb] (Fortran routine STDINV), Yihong Ge [ctb] (Fortran routines BNVU and MVBVU). |
Maintainer: | Adelchi Azzalini <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 2.1.1 |
Built: | 2024-12-23 06:41:11 UTC |
Source: | CRAN |
Functions are provided for computing the density and the distribution function of d-dimensional normal and t random variables, possibly truncated (on one side or two sides, with componentwise choice), and for generating random vectors sampled from these distributions, except sampling from the truncated t. Moments of arbitrary order of a truncated normal are computed, and converted to cumulants up to order 4.
Probabilities are computed via non-Monte Carlo methods; different routines
are used in the case d=1, d=2, d=3, d>2
, if d
denotes the
dimensionality.
This package and its documentation are usable under the terms of the “GNU General Public License” version 3 or version 2, as you prefer; a copy of them is available from https://www.R-project.org/Licenses/.
Adelchi Azzalini (R code and package creation) and Alan Genz (Fortran code, see the references below; this code incorporates routines of other authors).
Genz, A. (1992). Numerical Computation of Multivariate Normal Probabilities. J. Computational and Graphical Statist. 1, 141-149.
Genz, A. (1993). Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics, 25, 400-405.
Genz, A.: Fortran code downloaded in 2006 from the author web page, located at https://www.math.wsu.edu/faculty/genz/software/software.html, as of 2020-06-01.
The probability density function, the distribution function and random
number generation for a d
-dimensional multivariate normal (Gaussian)
random variable.
dmnorm(x, mean = rep(0, d), varcov, log = FALSE) pmnorm(x, mean = rep(0, d), varcov, ...) rmnorm(n = 1, mean = rep(0, d), varcov, sqrt=NULL) sadmvn(lower, upper, mean, varcov, maxpts = 2000*d, abseps = 1e-06, releps = 0)
dmnorm(x, mean = rep(0, d), varcov, log = FALSE) pmnorm(x, mean = rep(0, d), varcov, ...) rmnorm(n = 1, mean = rep(0, d), varcov, sqrt=NULL) sadmvn(lower, upper, mean, varcov, maxpts = 2000*d, abseps = 1e-06, releps = 0)
x |
either a vector of length |
mean |
either a vector of length |
varcov |
a symmetric positive-definite matrix representing the
variance-covariance matrix of the distribution;
a vector of length 1 is also allowed (in this case, |
sqrt |
if not |
log |
a logical value (default value is |
... |
arguments passed to |
n |
the number of (pseudo) random vectors to be generated. |
lower |
a numeric vector of lower integration limits of
the density function; must be of maximal length |
upper |
a numeric vector of upper integration limits
of the density function; must be of maximal length |
maxpts |
the maximum number of function evaluations
(default value: |
abseps |
absolute error tolerance (default value: |
releps |
relative error tolerance (default value: |
The dimension d
cannot exceed 20
for pmnorm
and
sadmvn
. If this threshold is exceeded, NA
is returned.
The function pmnorm
works by making a suitable call to
sadmvn
if d>3
, or to ptriv.nt
if d=3
,
or to biv.nt.prob
if d=2
, or to pnorm
if d=1
.
The R functions sadmvn
, ptriv.nt
and biv.nt.prob
are,
in essence, interfaces to underlying Fortran 77 routines by Alan
Genz; see the references below.
These routines use adaptive numerical quadrature and other non-random
type techniques.
If sqrt=NULL
(default value), the working of rmnorm
involves
computation of a square root of varcov
via the Cholesky decomposition.
If a non-NULL
value of sqrt
is supplied, it is assumed
that it represents a matrix, say, such that
represents the required variance-covariance matrix of the distribution;
in this case, the argument
varcov
is ignored.
This mechanism is intended primarily for use in a sequence of calls to
rmnorm
, all sampling from a distribution with fixed variance matrix;
a suitable matrix sqrt
can then be computed only once beforehand,
avoiding that the same operation is repeated multiple times along the
sequence of calls; see the examples below.
Another use of sqrt
is to supply a different form of square root
of the variance-covariance matrix, in place of the Cholesky factor.
For efficiency reasons, rmnorm
does not perform checks on the supplied
arguments.
If, after setting the same seed value to set.seed
,
two calls to rmnorm
are made with the same arguments except that one
generates n1
vectors and the other n2
vectors, with
n1<n2
, then the n1
vectors of the first call coincide with the
initial n2
vectors of the second call.
dmnorm
returns a vector of density values (possibly log-transformed);
pmnorm
returns a vector of probabilities, possibly with attributes
on the accuracy in case x
is a vector;
sadmvn
return a single probability with
attributes giving details on the achieved accuracy;
rmnorm
returns a matrix of n
rows of random vectors,
or a vector in case n=1
or d=1
.
The attributes error
and status
of the probability
returned by pmnorm
and sadmvn
indicate whether the function
had a normal termination, achieving the required accuracy.
If this is not the case, re-run the function with a higher value of
maxpts
FORTRAN 77 code of SADMVN
, package mvtdstpack.f
,
package tvpack
and most auxiliary functions by Alan Genz;
some additional auxiliary functions by people referred to within his programs;
interface to R and additional R code (for dmnormt
,
rmnormt
, etc.) by Adelchi Azzalini.
Genz, A. (1992). Numerical Computation of multivariate normal probabilities. J. Computational and Graphical Statist., 1, 141-149.
Genz, A. (1993). Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics, 25, 400-405.
Genz, A.: Fortran 77 code downloaded in 2005 and again in 2007 from his web-page, whose URL as of 2020-04-28 was https://www.math.wsu.edu/faculty/genz/software/software.html
dnorm
, dmt
,
biv.nt.prob
, ptriv.nt
,
plot_fxy
for plotting examples
x <- seq(-2, 4, length=21) y <- cos(2*x) + 10 z <- x + sin(3*y) mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) f <- dmnorm(cbind(x,y,z), mu, Sigma) f0 <- dmnorm(mu, mu, Sigma) p1 <- pmnorm(c(2,11,3), mu, Sigma) p2 <- pmnorm(c(2,11,3), mu, Sigma, maxpts=10000, abseps=1e-10) p <- pmnorm(cbind(x,y,z), mu, Sigma) # set.seed(123) x1 <- rmnorm(5, mu, Sigma) set.seed(123) x2 <- rmnorm(5, mu, sqrt=chol(Sigma)) # x1=x2 eig <- eigen(Sigma, symmetric = TRUE) R <- t(eig$vectors %*% diag(sqrt(eig$values))) for(i in 1:50) x <- rmnorm(5, mu, sqrt=R) # p <- sadmvn(lower=c(2,11,3), upper=rep(Inf,3), mu, Sigma) # upper tail # p0 <- pmnorm(c(2,11), mu[1:2], Sigma[1:2,1:2]) p1 <- biv.nt.prob(0, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) p2 <- sadmvn(lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) c(p0, p1, p2, p0-p1, p0-p2) # p1 <- pnorm(0, 1, 3) p2 <- pmnorm(0, 1, 3^2)
x <- seq(-2, 4, length=21) y <- cos(2*x) + 10 z <- x + sin(3*y) mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) f <- dmnorm(cbind(x,y,z), mu, Sigma) f0 <- dmnorm(mu, mu, Sigma) p1 <- pmnorm(c(2,11,3), mu, Sigma) p2 <- pmnorm(c(2,11,3), mu, Sigma, maxpts=10000, abseps=1e-10) p <- pmnorm(cbind(x,y,z), mu, Sigma) # set.seed(123) x1 <- rmnorm(5, mu, Sigma) set.seed(123) x2 <- rmnorm(5, mu, sqrt=chol(Sigma)) # x1=x2 eig <- eigen(Sigma, symmetric = TRUE) R <- t(eig$vectors %*% diag(sqrt(eig$values))) for(i in 1:50) x <- rmnorm(5, mu, sqrt=R) # p <- sadmvn(lower=c(2,11,3), upper=rep(Inf,3), mu, Sigma) # upper tail # p0 <- pmnorm(c(2,11), mu[1:2], Sigma[1:2,1:2]) p1 <- biv.nt.prob(0, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) p2 <- sadmvn(lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) c(p0, p1, p2, p0-p1, p0-p2) # p1 <- pnorm(0, 1, 3) p2 <- pmnorm(0, 1, 3^2)
Moments up to the specified orders of a possibly truncated d
-dimensional
normal distribution; the distribution must be non-degenerate.
Each component variable can be truncated on one side (to the left or
to the right) or on two sides or not truncated.
After the initial stage, cumulants up to the fourth order and other
quantities are computed, provided all moments of the required order
had been computed in the first stage.
mom.mtruncnorm(powers=4, mean, varcov, lower, upper, cum = TRUE, ...)
mom.mtruncnorm(powers=4, mean, varcov, lower, upper, cum = TRUE, ...)
powers |
a vector of non-negative integer values representing the required order of moments for each component variable, or a single such value, in which case this value is repeated for all component variables. |
mean |
a |
varcov |
a symmetric positive definite matrix with dimensions |
lower |
a |
upper |
a |
cum |
a logical value; if codeTRUE (default value), cumulants are other quantities are computed up to the minimum between the fourth order and the maximum possible order, given the available moments. |
... |
additional arguments passed to |
The maximal value of d is 20
. If this threshold is exceeded, NA
s
are returned. The constraint originates from the underlying function
sadmvn
.
This function makes use of two workhorses, recintab
and mom2cum
,
providing a user-friendly interface to these more basic tools.
The first function computes an array of raw moments of the truncated normal;
the second function translates them into cumulants and other quantities
such as the Mardia's measures of skewness and kurtosis, unless cum=FALSE
.
See the documentation of these two underlying functions for additional
information about the arguments and the returned quantities.
The argument ...
is passed, via recintab
, to sadmvn
for regulation of its working.
Not all d
component variables need to be truncated.
In fact, the function works also with no truncated components
(just omit lower
and upper
),
although for this case there exist known formulae to do the job.
A list with the following components:
mom |
an array with raw moments as produced by |
cum1 |
the vector of first-order cumulants,
AKA the expected value or the mean value,
which will be there provided |
order2 , ...
|
additional lists with higher order terms up to order 4;
these lists only exist when the available moments are of sufficiently
high order. See |
Adelchi Azzalini
mu <- c(1, -0.5, 0) Sigma <- toeplitz(1/(1:3)) lower <- c(-Inf, -3, -4) upper <- c(1.5, Inf, 2) m <- mom.mtruncnorm(1, mu, Sigma, lower, upper) print(m$cum1) # m <- mom.mtruncnorm(3, mu, Sigma, lower, upper) print(m$order3$gamma1.marginal) print(m$order3$gamma1.Mardia) # #-- # Example 2 of Leppard & Tallis (1989, Appl.Stat. vol.38, p.547) truncp <- c(0, 1, 2) U <- c(0, 0, 0) V <- 0.5*(diag(3) + matrix(1, 3, 3)) m <- mom.mtruncnorm(2, U, V, truncp) print(m$cum1, digits=9) print(m$order2$cum2, digits=9)
mu <- c(1, -0.5, 0) Sigma <- toeplitz(1/(1:3)) lower <- c(-Inf, -3, -4) upper <- c(1.5, Inf, 2) m <- mom.mtruncnorm(1, mu, Sigma, lower, upper) print(m$cum1) # m <- mom.mtruncnorm(3, mu, Sigma, lower, upper) print(m$order3$gamma1.marginal) print(m$order3$gamma1.Mardia) # #-- # Example 2 of Leppard & Tallis (1989, Appl.Stat. vol.38, p.547) truncp <- c(0, 1, 2) U <- c(0, 0, 0) V <- 0.5*(diag(3) + matrix(1, 3, 3)) m <- mom.mtruncnorm(2, U, V, truncp) print(m$cum1, digits=9) print(m$order2$cum2, digits=9)
Given an array of moments of a multivariate distribution, the corresponding cumulants up to the 4th order and other connected quantities are computed, notably the Mardia's measures of multivariate skewness and kurtosis
mom2cum(mom)
mom2cum(mom)
mom |
an array whose entries are assumed to represent moments of a multivariate distribution; see ‘Details’ for an extended description. |
The structure of the input array mom
is of type M/M[1]
where M
represents the output from function recintab
.
For a d
-dimensional random variable, mom
is a k
-fold
d
-dimensional array, where k
is the highest order of moments
being considered;
see the documentation of recintab
for a more detailed description.
However, it is not necessary that mom
originates from recintab
;
the moments can refer to any distribution, as long as mom
has the
appropriate structure and content.
Also, it is not necessary that all entries of mom
are there;
values not required for the processing can be left as NA
.
For computing cumulants of order k
, say, we only need cross moments
whose exponents add up to k
or less.
Conversion from moments to cumulants is performed by using formulae (2.7)
of McCullagh (1987). See also in his (2.15) and
in (2.16) for computing the Mardia's (1970, 1974) measures of
multivariate skewness and kurtosis.
In some cases,
the function may report inconsistencies detected in the argument mom
.
A typical origin of this situation is in numerical inaccuracies of the
returned value of recintab
,
as explained in more detail in its documentation.
When detected, cases of these sort are flagged in the returned $message
string, and a warning message is issued.
The absence of such string does not represent a guarantee of perfect input.
In the multivariate case, a list with the following elements, provided moments of the required order are available, up to the maximal order 4.
cum1 |
the |
order2 |
a list with the following components:
|
order3 |
a list with the following components:
|
order4 |
a list with the following components:
|
message |
possibly, a character string indicating that some inconsistency
has been detected in the argument |
In the univariate case a list with elements:
cum |
a vector of cumulants, |
centr.mom |
a vector of central moments, |
std.cum |
a vector with the third and the fourth standardized cumulants (when enough moments are available), representing common measures of skewness and kurtosis. |
message |
possibly, a character string indicating that some inconsistency
has been detected in the argument |
In the case of a multivariate truncated normal distribution,
a user does not need to call this function; mom.mtruncnorm
provides a more convenient interface for the same computations.
The present function needs to be called only if the array mom
represents the moments of some other distribution.
Adelchi Azzalini
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications Biometrika, 57, 519-530.
Mardia, K. V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya ser.B, 36, 115-128.
McCullagh, Peter (1987). Tensor Methods in Statistics. Chapman & Hall, London.
mu <- c(1, -0.5) Sigma <- toeplitz(1/(1:2)) low <- c(-Inf, -3) hi <- c(1.5, Inf) mom <- recintab(c(3,3), low, hi, mu, Sigma) cum <- mom2cum(mom) print(cum$order3$gamma1.marginal) print(cum$order3$gamma1.Mardia)
mu <- c(1, -0.5) Sigma <- toeplitz(1/(1:2)) low <- c(-Inf, -3) hi <- c(1.5, Inf) mom <- recintab(c(3,3), low, hi, mu, Sigma) cum <- mom2cum(mom) print(cum$order3$gamma1.marginal) print(cum$order3$gamma1.Mardia)
The probability density function, the distribution function and random number
generation for a d
-dimensional Student's t random variable.
dmt(x, mean = rep(0, d), S, df=Inf, log = FALSE) pmt(x, mean = rep(0, d), S, df=Inf, ...) rmt(n = 1, mean = rep(0, d), S, df=Inf, sqrt=NULL) sadmvt(df, lower, upper, mean, S, maxpts = 2000*d, abseps = 1e-06, releps = 0) biv.nt.prob(df, lower, upper, mean, S) ptriv.nt(df, x, mean, S)
dmt(x, mean = rep(0, d), S, df=Inf, log = FALSE) pmt(x, mean = rep(0, d), S, df=Inf, ...) rmt(n = 1, mean = rep(0, d), S, df=Inf, sqrt=NULL) sadmvt(df, lower, upper, mean, S, maxpts = 2000*d, abseps = 1e-06, releps = 0) biv.nt.prob(df, lower, upper, mean, S) ptriv.nt(df, x, mean, S)
x |
either a vector of length |
mean |
either a vector of length |
S |
a symmetric positive definite matrix with dimensions |
df |
the degrees of freedom.
For |
log |
a logical value(default value is |
sqrt |
if not |
... |
arguments passed to |
n |
the number of random vectors to be generated |
lower |
a numeric vector of lower integration limits of
the density function; must be of maximal length |
upper |
a numeric vector of upper integration limits
of the density function; must be of maximal length |
maxpts |
the maximum number of function evaluations
(default value: |
abseps |
absolute error tolerance (default value: |
releps |
relative error tolerance (default value: |
The dimension d
cannot exceed 20
for pmt
and
sadmvt
. If this threshold is exceeded, NA
is returned.
The functions sadmvt
, ptriv.mt
and biv.nt.prob
are
interfaces to Fortran 77 routines by Alan Genz, available from his web page;
they makes use of some auxiliary functions whose authors are indicated
in the Fortran code itself.
The routine sadmvt
uses an adaptive integration method.
If df=3
, a call to pmt
activates a call to ptriv.nt
which is specific for the trivariate case, and uses Genz's Fortran
code tvpack.f
; see Genz (2004) for the background methodology.
A similar fact takes place when df=2
with function biv.nt.prob
;
note however that the underlying Fortran code is taken from
mvtdstpack.f
, not from tvpack.f
.
If pmt
is called with d>3
, this is converted into
a suitable call to sadmvt
.
If sqrt=NULL
(default value), the working of rmt
involves
computation of a square root of S
via the Cholesky decomposition.
If a non-NULL
value of sqrt
is supplied, it is assumed that
it represents a square root of the scale matrix,
otherwise represented by S
, whose value is ignored in this case.
This mechanism is intended primarily for use in a sequence of calls to
rmt
, all sampling from a distribution with fixed scale matrix;
a suitable matrix sqrt
can then be computed only once beforehand,
avoiding that the same operation is repeated multiple times along the
sequence of calls. For examples of use of this argument, see those in the
documentation of rmnorm
.
Another use of sqrt
is to supply a different form of square root
of the scale matrix, in place of the Cholesky factor.
For efficiency reasons, rmt
does not perform checks on the supplied
arguments.
dmt
returns a vector of density values (possibly log-transformed);
pmt
and sadmvt
return a single probability with
attributes giving details on the achieved accuracy, provided x
of pmnorm
is a vector;
rmt
returns a matrix of n
rows of random vectors,
or a vector in case n=1
or d=1
.
The attributes error
and status
of the probability returned
by sadmvt
and by pmt
(the latter only if x
is a vector
and d>2
) indicate whether the function
had a normal termination, achieving the required accuracy.
If this is not the case, re-run the function with a higher value of
maxpts
.
FORTRAN 77 code of SADMVT
, MVTDSTPACK
, TVPACK
and many auxiliary functions by Alan Genz;
some additional auxiliary functions by people referred to within his
programs; interface to R and additional R code (for dmt
, rmt
etc.) by Adelchi Azzalini.
Genz, A.: Fortran 77 code in files mvt.f
, mvtdstpack.f
and codetvpack, downloaded in 2005 and again in 2007 from his webpage,
whose URL as of 2020-06-01 is
https://www.math.wsu.edu/faculty/genz/software/software.html
Genz, A. (2004). Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Statistics and Computing 14, 251-260.
Dunnett, C.W. and Sobel, M. (1954). A bivariate generalization of Student's t-distribution with tables for certain special cases. Biometrika 41, 153–169.
dt
,
rmnorm
for use of argument sqrt
,
plot_fxy
for plotting examples
x <- seq(-2,4,length=21) y <- 2*x+10 z <- x+cos(y) mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 f <- dmt(cbind(x,y,z), mu, Sigma,df) p1 <- pmt(c(2,11,3), mu, Sigma, df) p2 <- pmt(c(2,11,3), mu, Sigma, df, maxpts=10000, abseps=1e-8) x <- rmt(10, mu, Sigma, df) p <- sadmvt(df, lower=c(2,11,3), upper=rep(Inf,3), mu, Sigma) # upper tail # p0 <- pmt(c(2,11), mu[1:2], Sigma[1:2,1:2], df=5) p1 <- biv.nt.prob(5, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) p2 <- sadmvt(5, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) c(p0, p1, p2, p0-p1, p0-p2)
x <- seq(-2,4,length=21) y <- 2*x+10 z <- x+cos(y) mu <- c(1,12,2) Sigma <- matrix(c(1,2,0,2,5,0.5,0,0.5,3), 3, 3) df <- 4 f <- dmt(cbind(x,y,z), mu, Sigma,df) p1 <- pmt(c(2,11,3), mu, Sigma, df) p2 <- pmt(c(2,11,3), mu, Sigma, df, maxpts=10000, abseps=1e-8) x <- rmt(10, mu, Sigma, df) p <- sadmvt(df, lower=c(2,11,3), upper=rep(Inf,3), mu, Sigma) # upper tail # p0 <- pmt(c(2,11), mu[1:2], Sigma[1:2,1:2], df=5) p1 <- biv.nt.prob(5, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) p2 <- sadmvt(5, lower=rep(-Inf,2), upper=c(2, 11), mu[1:2], Sigma[1:2,1:2]) c(p0, p1, p2, p0-p1, p0-p2)
The probability density function, the distribution function and random
number generation for the d
-dimensional truncated normal (Gaussian)
random variable.
dmtruncnorm(x, mean, varcov, lower, upper, log = FALSE, ...) pmtruncnorm(x, mean, varcov, lower, upper, ...) rmtruncnorm(n, mean, varcov, lower, upper, start, burnin=5, thinning=1)
dmtruncnorm(x, mean, varcov, lower, upper, log = FALSE, ...) pmtruncnorm(x, mean, varcov, lower, upper, ...) rmtruncnorm(n, mean, varcov, lower, upper, start, burnin=5, thinning=1)
x |
either a vector of length |
mean |
a |
varcov |
a symmetric positive definite matrix with dimensions |
lower |
a |
upper |
a |
log |
a logical value (default value is |
... |
arguments passed to |
n |
the number of (pseudo) random vectors to be generated. |
start |
an optional vector of initial values; see ‘Details’. |
burnin |
the number of burnin iterations of the Gibbs sampler
(default: |
thinning |
a positive integer representing the thinning factor of the
internally generated Gibbs sequence (default: |
For dmtruncnorm
and pmtruncnorm
,
the dimension d
cannot exceed 20
.
If this threshold is exceeded, NA
s are returned.
The constraint originates from the underlying function sadmvn
.
If d>1
, rmtruncnorm
uses a Gibbs sampling scheme as described by
Breslaw (1994) and by Kotecha & Djurić (1999),
Detailed algebraic expressions are provided by Wilhelm (2022).
After some initial settings in R, the core iteration is performed by
a compiled FORTRAN 77 subroutine, for numerical efficiency.
If the start
vector is not supplied, the mean value of the truncated
distribution is used. This choice should provide a good starting point for the
Gibbs iteration, which explains why the default value for the burnin
stage is so small. Since successive random vectors generated by a Gibbs
sampler are not independent, which can be a problem in certain applications.
This dependence is typically ameliorated by generating a larger-than-required
number of random vectors, followed by a ‘thinning’ stage; this can
be obtained by setting the thinning
argument larger that 1
.
The overall number of generated points is burnin+n*thinning
, and the
returned object is formed by those with index in burnin+(1:n)*thinning
.
If d=1
, the values are sampled using a non-iterative procedure,
essentially as in equation (4) of Breslaw (1994), except that in this case
the mean and the variance do not refer to a conditional distribution,
but are the arguments supplied in the calling statement.
dmtruncnorm
and pmtruncnorm
return a numeric vector;
rmtruncnorm
returns a matrix, unless either n=1
or d=1
,
in which case it returns a vector.
Adelchi Azzalini
Breslaw, J.A. (1994) Random sampling from a truncated multivariate normal distribution. Appl. Math. Lett. vol.7, pp.1-6.
Kotecha, J.H. and Djurić, P.M. (1999). Gibbs sampling approach for generation of truncated multivariate Gaussian random variables. In ICASSP'99: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.3, pp.1757-1760. doi:10.1109/ICASSP.1999.756335.
Wilhelm, S. (2022). Gibbs sampler for the truncated multivariate normal distribution. Vignette of R package https://cran.r-project.org/package=tmvtnorm, version 1.5.
plot_fxy
for additional plotting examples,
sadmvn
for regulating accuracy via ...
# example with d=2 m2 <- c(0.5, -1) V2 <- matrix(c(3, 3, 3, 6), 2, 2) low <- c(-1, -2.8) up <- c(1.5, 1.5) # plotting truncated normal density using 'dmtruncnorm' and 'contour' functions plot_fxy(dmtruncnorm, xlim=c(-2, 2), ylim=c(-3, 2), mean=m2, varcov=V2, lower=low, upper=up, npt=101) set.seed(1) x <- rmtruncnorm(n=500, mean=m2, varcov=V2, lower=low, upper=up) points(x, cex=0.2, col="red") #------ # example with d=1 set.seed(1) low <- -4 hi <- 3 x <- rmtruncnorm(1e5, mean=2, varcov=5, lower=low, upper=hi) hist(x, prob=TRUE, xlim=c(-8, 12), main="Truncated univariate N(2, sqrt(5))") rug(c(low, hi), col=2) x0 <- seq(-8, 12, length=251) pdf <- dnorm(x0, 2, sqrt(5)) p <- pnorm(c(low, hi), 2, sqrt(5)) lines(x0, pdf/diff(p), col=4, lty=2) lines(x0, dmtruncnorm(x0, 2, 5, low, hi), col=2, lwd=2)
# example with d=2 m2 <- c(0.5, -1) V2 <- matrix(c(3, 3, 3, 6), 2, 2) low <- c(-1, -2.8) up <- c(1.5, 1.5) # plotting truncated normal density using 'dmtruncnorm' and 'contour' functions plot_fxy(dmtruncnorm, xlim=c(-2, 2), ylim=c(-3, 2), mean=m2, varcov=V2, lower=low, upper=up, npt=101) set.seed(1) x <- rmtruncnorm(n=500, mean=m2, varcov=V2, lower=low, upper=up) points(x, cex=0.2, col="red") #------ # example with d=1 set.seed(1) low <- -4 hi <- 3 x <- rmtruncnorm(1e5, mean=2, varcov=5, lower=low, upper=hi) hist(x, prob=TRUE, xlim=c(-8, 12), main="Truncated univariate N(2, sqrt(5))") rug(c(low, hi), col=2) x0 <- seq(-8, 12, length=251) pdf <- dnorm(x0, 2, sqrt(5)) p <- pnorm(c(low, hi), 2, sqrt(5)) lines(x0, pdf/diff(p), col=4, lty=2) lines(x0, dmtruncnorm(x0, 2, 5, low, hi), col=2, lwd=2)
The probability density function and the distribution function of the multivariate truncated Student's t distribution
dmtrunct(x, mean, S, df, lower, upper, log = FALSE, ...) pmtrunct(x, mean, S, df, lower, upper, ...)
dmtrunct(x, mean, S, df, lower, upper, log = FALSE, ...) pmtrunct(x, mean, S, df, lower, upper, ...)
x |
either a vector of length |
mean |
either a vector of length |
S |
a symmetric positive-definite matrix representing the
scale matrix, such that |
df |
degrees of freedom; it must be a positive integer |
lower |
a vector representing the lower truncation values of the
component variables; |
upper |
a vector representing the upper truncation values of the
component variables; |
log |
a logical value (default value is |
... |
arguments passed to |
The dimension d
cannot exceed 20
.
a numeric vector
Adelchi Azzalini
sadmvt
for regulating accuracy
m2 <- c(0.5, -1) V2 <- matrix(c(1.5, -1.75, -1.75, 3), 2, 2) lower <- a <- c(-1, -2.5) upper <- b <- c(2, 1) set.seed(1) points <- matrix(runif(10, -3, 3), nrow=5, ncol=2) pdf <- dmtrunct(points, mean=m2, S=V2, df=4, lower, upper) cdf <- pmtrunct(points, mean=m2, S=V2, df=4, lower, upper)
m2 <- c(0.5, -1) V2 <- matrix(c(1.5, -1.75, -1.75, 3), 2, 2) lower <- a <- c(-1, -2.5) upper <- b <- c(2, 1) set.seed(1) points <- matrix(runif(10, -3, 3), nrow=5, ncol=2) pdf <- dmtrunct(points, mean=m2, S=V2, df=4, lower, upper) cdf <- pmtrunct(points, mean=m2, S=V2, df=4, lower, upper)
The inverse of a symmetric positive-definite matrix and its log-determinant
pd.solve(x, silent = FALSE, log.det=FALSE)
pd.solve(x, silent = FALSE, log.det=FALSE)
x |
a symmetric positive-definite matrix. |
silent |
a logical value which indicates the action to take in case of
an error. If |
log.det |
a logical value to indicate whether the log-determinant
of |
The function checks that x
is a symmetric positive-definite
matrix. If an error is detected, an action is taken which depends on the
value of the argument silent
.
the inverse matrix of x
; if log.det=TRUE
, this inverse has an
attribute which contains the logarithm of the determinant of x
.
Adelchi Azzalini
x <- toeplitz(rev(1:4)) x.inv <- pd.solve(x) print(x.inv %*% x) x.inv <- pd.solve(x, log.det=TRUE) logDet <- attr(x.inv, "log.det") print(abs(logDet - determinant(x, logarithm=TRUE)$modulus))
x <- toeplitz(rev(1:4)) x.inv <- pd.solve(x) print(x.inv %*% x) x.inv <- pd.solve(x, log.det=TRUE) logDet <- attr(x.inv, "log.det") print(abs(logDet - determinant(x, logarithm=TRUE)$modulus))
Plot a real-valued function f
evaluated on a grid of points
of the Cartesian plane, possibly with parameters specified by ...
.
The type of graphical display can be regulated by selecting the plotting
function among a set of available options.
plot_fxy(f, xlim, ylim, ..., npt=51, grf, grpar)
plot_fxy(f, xlim, ylim, ..., npt=51, grf, grpar)
f |
either a function or a character string with the name of a
real-valued function whose first argument represents the
coordinates of points where |
xlim |
either a vector of abscissae where the |
ylim |
either a vector of ordinates where the |
... |
additional parameters to be supplied to |
npt |
either an integer value or a two-element integer vector with
the number of equally-spaced points, within the endpoints of |
grf |
an optional character string with the name of the function
which produces the graphical display, selectable among
|
grpar |
an optional character string with arguments supplied to the
selected |
Function f
will be called with the first argument represented by a
two-column matrix, where each row represents a point of the grid on the
Cartesian plane identified by xlim
and ylim
;
this set of coordinates is stored in matrix pts
of the returned list.
If present, arguments supplied as ...
are also passed to f
.
It is assumed that f
accepts this type of call.
The original motivation of plot_fxy
was to plot instances of bivariate
probability density functions specified by package mnormt
,
but it can be used for plotting any function fulfilling the above requirements,
as illustrated by some of the examples below.
an invisible list with the following components:
x
|
a vector of coordinates on the axis |
y
|
a vector of coordinates on the axis |
pts
|
a matrix of dimension (npt[1]*npt[2],2)
with the coordinates of the evaluation points |
f.values |
the vector of f values at the pts points.
|
contour
, filled.contour
,
persp
, image
Sigma <- matrix(c(1,1,1,2), 2, 2) mean <- c(0, -1) xlim <- c(-3, 5) ylim <- c(-5, 3) # # multivariate normal density, contour-level plot gp <- 'col="blue", nlevels=6, main="bivariate normal density"' u <- plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma, grpar=gp) cat(str(u)) #--- # multivariate normal density, filled-contour plot plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma,grf="filled.contour") #--- # multivariate normal density, perspective plot gp <- "theta = 10, phi = 25, r = 2.5" plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma, grf="persp", grpar=gp) #--- # multivariate Student's "t" density; # the xlim argument passed to function 'grf' overrides the earlier xlim; # xlim and ylim can be placed after the arguments of 'f', if one prefers so grp <- 'xlim=c(-1, 3)' plot_fxy(dmt, mean=mean, S=Sigma, df=8, xlim, ylim, npt=101, grf="filled.contour", grpar=grp) #--- # multivariate truncated normal density, 'image' plot low <- c(-3, -5) hi <- c(1, 0) plot_fxy(dmtruncnorm, mean=mean, varcov=Sigma, lower=low, upper=hi, xlim, ylim, npt=81, grf="image") #--- # multivariate truncated normal distribution function, 'image' plot; # hence not a density function low <- c(-3, -5) hi <- c(1, 0) v <- plot_fxy(pmtruncnorm, mean=mean, varcov=Sigma, lower=low, upper=hi, xlim, ylim, npt=c(61, 81), grf="image") #--- # a different sort of 'f' function (lbeta), not a component of this package funct <- function(z) lbeta(a=z[,1], b=z[,2]) plot_fxy(funct, xlim=c(0.1, 2), ylim=c(0.1, 2), npt=41, grpar='main="function log-beta(a,b)", xlab="a", ylab="b"')
Sigma <- matrix(c(1,1,1,2), 2, 2) mean <- c(0, -1) xlim <- c(-3, 5) ylim <- c(-5, 3) # # multivariate normal density, contour-level plot gp <- 'col="blue", nlevels=6, main="bivariate normal density"' u <- plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma, grpar=gp) cat(str(u)) #--- # multivariate normal density, filled-contour plot plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma,grf="filled.contour") #--- # multivariate normal density, perspective plot gp <- "theta = 10, phi = 25, r = 2.5" plot_fxy(dmnorm, xlim, ylim, mean=mean, varcov=Sigma, grf="persp", grpar=gp) #--- # multivariate Student's "t" density; # the xlim argument passed to function 'grf' overrides the earlier xlim; # xlim and ylim can be placed after the arguments of 'f', if one prefers so grp <- 'xlim=c(-1, 3)' plot_fxy(dmt, mean=mean, S=Sigma, df=8, xlim, ylim, npt=101, grf="filled.contour", grpar=grp) #--- # multivariate truncated normal density, 'image' plot low <- c(-3, -5) hi <- c(1, 0) plot_fxy(dmtruncnorm, mean=mean, varcov=Sigma, lower=low, upper=hi, xlim, ylim, npt=81, grf="image") #--- # multivariate truncated normal distribution function, 'image' plot; # hence not a density function low <- c(-3, -5) hi <- c(1, 0) v <- plot_fxy(pmtruncnorm, mean=mean, varcov=Sigma, lower=low, upper=hi, xlim, ylim, npt=c(61, 81), grf="image") #--- # a different sort of 'f' function (lbeta), not a component of this package funct <- function(z) lbeta(a=z[,1], b=z[,2]) plot_fxy(funct, xlim=c(0.1, 2), ylim=c(0.1, 2), npt=41, grpar='main="function log-beta(a,b)", xlab="a", ylab="b"')
Produces an array with the moments up to specified orders of a (possibly) truncated multivariate normal distribution. Each component variable can be truncated on one side (to the left or to the right) or on two sides or not truncated.
recintab(kappa, a, b, mu, S, ...)
recintab(kappa, a, b, mu, S, ...)
kappa |
a vector of non-negative integer values representing the required order of moments for each component variable. |
a |
a vector representing the lower truncation values of the component
variables; |
b |
a vector representing the upper truncation values of the component
variables; |
mu |
a vector representing the mean value of the pre-truncation normal random variable. |
S |
a symmetric positive-definite matrix representing the variance matrix of the pre-truncation normal random variable. |
... |
parameters passed to |
The maximal dimension of the multivariate normal variable is 20.
If this threshold is exceeded NA
s are returned.
This function is the R translation of the Matlab function with the same name
belonging to the package ftnorm
,
which is associated to the paper of Kan and Robotti (2017).
The Matlab package ftnorm
has been downloaded from
http://www-2.rotman.utoronto.ca/~kan/research.htm,
on 2020-04-23.
The function returns an array, M
say, whose entries represent integrals
of type , where
denotes the
-dimensional normal density function.
Typically, interest is in the scaled array
M/M[1]
whose entries
represent the moments of the truncated distribution.
The algorithm is based on a recursion starting from the integral of the normal
distribution over the specified hyper-rectangle. This integral is evaluated
by sadmvn
, whose tuning parameters maxpts, abseps, releps
can be regulated via the ...
argument.
In the multivariate case, for an input vector kappa=c(k1,..., kd)
,
the functions returns an array of dimension c((k1+1),...,(kd+1))
whose entries represent integrals described in section ‘Details’.
In other words, the array element M[i+1, j+1, k+1,...]
contains the
unnormalized cross moment of order (i, j, k,...)
;
this must be divided by M[1]
to obtain the regular cross moment.
In the univariate case, a vector is returned with similar meaning.
Although the underlying algorithm is exact in principle, the actual computation hinges crucially on the initial integration of the multivariate normal density over the truncation hyper-cube. This integration may result in numerical inaccuracies, whose amount depends on the supplied arguments. Moreover, the recursion employed by the algorithm propagates the initial error to other terms.
When problematic cases have been processed by the original Matlab function, the same issues have occurred, up to minor variations.
Instances of such errors may be detected when the array M/M[1]
is
passed to mom2cum
, but there is no guarantee that all such
problems are detected.
This function is not intended for direct call by a user, at least in
commonly encountered situations.
Function mom.mtruncnorm
represents a more user-friendly tool.
Original Matlab code by Raymond Kan and Cesare Robotti, porting to R by Adelchi Azzalini.
Kan, Raymond and Robotti, Cesare (2017). On moments of folded and truncated multivariate normal distributions. Journal of Computational and Graphical Statistics, 26, 930-934, DOI: 10.1080/10618600.2017.1322092
Leppard, P. and Tallis, G. M. (1989). Algorithm AS249: Evaluation of the mean and covariance of the truncated multinormal distribution Applied Statistics 38, 543-553)
mom.mtruncnorm
for a more user-friendly function,
mom2cum
for transformation to cumulants,
sadmvn
for regulating accuracy if d>2
mu <- c(1, -0.5, 0) Sigma <- toeplitz(1/(1:3)) low <- c(-Inf, -3, -4) hi <- c(1.5, Inf, 2) M <- recintab(c(2,3,1), low, hi, mu, Sigma) M/M[1] # cross-moments up to order 2 for X1, up to the 3 for X2, up to 1 for X3, # if the components of the trivariate variable are denoted (X1,X2,X3) #-- # Example 2 of Leppard & Tallis (1989, Appl.Stat. vol.38, p.547) truncp <- c(0, 1, 2) U <- c(0, 0, 0) V <- 0.5*(diag(3) + matrix(1, 3, 3)) M <- recintab(c(2,2,2), truncp, rep(Inf,3), U, V) mom <- M/M[1] EX <- c(mom[2,1,1], mom[1,2,1], mom[1,1,2]) print(EX, digits=9) EX2 <- matrix(c( mom[3,1,1], mom[2,2,1], mom[2,1,2], mom[2,2,1], mom[1,3,1], mom[1,2,2], mom[2,1,2], mom[1,2,2], mom[1,1,3]), 3, 3, byrow=TRUE) varX <- EX2 - outer(EX ,EX) print(varX, digits=9)
mu <- c(1, -0.5, 0) Sigma <- toeplitz(1/(1:3)) low <- c(-Inf, -3, -4) hi <- c(1.5, Inf, 2) M <- recintab(c(2,3,1), low, hi, mu, Sigma) M/M[1] # cross-moments up to order 2 for X1, up to the 3 for X2, up to 1 for X3, # if the components of the trivariate variable are denoted (X1,X2,X3) #-- # Example 2 of Leppard & Tallis (1989, Appl.Stat. vol.38, p.547) truncp <- c(0, 1, 2) U <- c(0, 0, 0) V <- 0.5*(diag(3) + matrix(1, 3, 3)) M <- recintab(c(2,2,2), truncp, rep(Inf,3), U, V) mom <- M/M[1] EX <- c(mom[2,1,1], mom[1,2,1], mom[1,1,2]) print(EX, digits=9) EX2 <- matrix(c( mom[3,1,1], mom[2,2,1], mom[2,1,2], mom[2,2,1], mom[1,3,1], mom[1,2,2], mom[2,1,2], mom[1,2,2], mom[1,1,3]), 3, 3, byrow=TRUE) varX <- EX2 - outer(EX ,EX) print(varX, digits=9)
Given a multivariate sample, the Mardia measures of skewness and kurtosis are computed, along with their p-values for testing normality
sample_Mardia_measures(data, correct = FALSE)
sample_Mardia_measures(data, correct = FALSE)
data |
a data matrix |
correct |
(logical) if |
For a given a data matrix, the multivariate measures of skewness and kurtosis introduced by Mardia (1970, 1974) are computed, along with some associated quantities. We follow the notation of the 1974 paper.
If n
denotes the number of complete cases, the condition n>3
is required for numerical computation. Clearly, a much larger n
is
required for meaningful statistical work.
The sample variance matrix appearing in (2.2) and (2.4)
is computed here (in the dafault setting) with the
denominator,
at variance from the commonly employed
n-1
denominator.
With this definition of , one obtains the same numerical outcome
of the example on p.127 of Mardia (1974).
The approximate observed significance levels for testing normality,
p.b1
and p.b2
, are computed using expressions (5.5) and
(5.6) in Section 5 of Mardia (1974).
For p.b2
, the condition (n-d-1)>0
is required, where
d
denotes the number of variables.
A named vector with the following components:
b1 |
the measure of asymmetry as given in (2.2) |
b2 |
the measure of kurtosis as given in (2.4) |
g1 |
the measure of asymmetry as given in (2.10) |
g2 |
the measure of kurtosis as given in (2.11) |
p.b1 |
observed significance level of |
p.b2 |
observed significance level of |
n |
The number of complete cases in the input data matrix |
where the quoted formulae are those of Mardia (1974).
Adelchi Azzalini
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications Biometrika, 57, 519-530.
Mardia, K. V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya ser.B, 36, 115-128.
set.seed(1) x <- rmnorm(100, mean=1:3, varcov=toeplitz(1/(1:3))) sample_Mardia_measures(x)
set.seed(1) x <- rmnorm(100, mean=1:3, varcov=toeplitz(1/(1:3))) sample_Mardia_measures(x)