Title: Fast Multivariate Normal and Student's t Methods
Description: Provides computationally efficient tools related to the multivariate normal and Student's t distributions. The main functionalities are: simulating multivariate random vectors, evaluating multivariate normal or Student's t densities, and computing Mahalanobis distances. These tools are very efficient thanks to the use of C++ code and the OpenMP API.
Authors: Matteo Fasiolo [aut, cre], Thijs van den Berg [ctb]
Maintainer: Matteo Fasiolo <[email protected]>
License: GPL (>= 2.0)
Version: 0.2.8
Built: 2024-10-26 06:40:17 UTC
Source: CRAN
Fast density computation for a mixture of multivariate normal distributions.
dmixn(X, mu, sigma, w, log = FALSE, ncores = 1, isChol = FALSE, A = NULL)
X: an n by d matrix, where each row is a d-dimensional random vector. Alternatively X can be a d-dimensional vector.
mu: an (m x d) matrix, where m is the number of mixture components.
sigma: a list of m covariance matrices (d x d), one for each mixture component. Alternatively it can be a list of the m Cholesky decompositions of the covariances; in that case isChol should be set to TRUE.
w: a vector of length m, containing the weights of the mixture components.
log: boolean; if TRUE the logarithm of the pdf is returned.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma contains Cholesky decompositions rather than covariance matrices.
A: an (optional) numeric matrix of dimension (n x m), which will be used to store the evaluations of each mixture component's density at each row of X. It is useful when m and n are large and one wants to call dmixn() several times without reallocating memory.
NB: at the moment the parallelization does not work properly on Solaris OS when ncores > 1. Hence, dmixn() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
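For reference, the evaluated density is the standard mixture

f(x) = \sum_{j=1}^{m} w_j \, N(x; \mu_j, \Sigma_j),

where \mu_j is the j-th row of mu, \Sigma_j is the j-th element of sigma, and N(x; \mu_j, \Sigma_j) is the multivariate normal density.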
A vector of length n where the i-th entry contains the pdf of the i-th random vector (i.e. the i-th row of X).
Matteo Fasiolo <[email protected]>.
#### 1) Example use
# Set up mixture density
mu <- matrix(c(1, 2, 10, 20), 2, 2, byrow = TRUE)
sigma <- list(diag(c(1, 10)), matrix(c(1, -0.9, -0.9, 1), 2, 2))
w <- c(0.1, 0.9)

# Simulate
X <- rmixn(1e4, mu, sigma, w)

# Evaluate density
ds <- dmixn(X, mu, sigma, w = w)
head(ds)

#### 2) More complicated example
# Define mixture
set.seed(5135)
N <- 10000
d <- 2
w <- rep(1, 2) / 2
mu <- matrix(c(0, 0, 2, 3), 2, 2, byrow = TRUE)
sigma <- list(matrix(c(1, 0, 0, 2), 2, 2), matrix(c(1, -0.9, -0.9, 1), 2, 2))

# Simulate random variables
X <- rmixn(N, mu, sigma, w = w, retInd = TRUE)

# Plot mixture density
np <- 100
xvals <- seq(min(X[ , 1]), max(X[ , 1]), length.out = np)
yvals <- seq(min(X[ , 2]), max(X[ , 2]), length.out = np)
theGrid <- expand.grid(xvals, yvals)
theGrid <- as.matrix(theGrid)
dens <- dmixn(theGrid, mu, sigma, w = w)
plot(X, pch = '.', col = attr(X, "index") + 1)
contour(x = xvals, y = yvals, z = matrix(dens, np, np),
        levels = c(0.002, 0.01, 0.02, 0.04, 0.08, 0.15), add = TRUE, lwd = 2)
Fast density computation for a mixture of multivariate Student's t distributions.
dmixt(X, mu, sigma, df, w, log = FALSE, ncores = 1, isChol = FALSE, A = NULL)
X: an n by d matrix, where each row is a d-dimensional random vector. Alternatively X can be a d-dimensional vector.
mu: an (m x d) matrix, where m is the number of mixture components.
sigma: a list of m covariance matrices (d x d), one for each mixture component. Alternatively it can be a list of the m Cholesky decompositions of the covariances; in that case isChol should be set to TRUE.
df: a positive scalar representing the degrees of freedom. All the densities in the mixture have the same df.
w: a vector of length m, containing the weights of the mixture components.
log: boolean; if TRUE the logarithm of the pdf is returned.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma contains Cholesky decompositions rather than covariance matrices.
A: an (optional) numeric matrix of dimension (n x m), which will be used to store the evaluations of each mixture component's density at each row of X. It is useful when m and n are large and one wants to call dmixt() several times without reallocating memory.
There are many candidates for the multivariate generalization of Student's t-distribution; here we use the parametrization described at https://en.wikipedia.org/wiki/Multivariate_t-distribution. NB: at the moment the parallelization does not work properly on Solaris OS when ncores > 1. Hence, dmixt() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
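As for dmixn(), the evaluated density is the standard mixture (stated here for reference)

f(x) = \sum_{j=1}^{m} w_j \, t_{\nu}(x; \mu_j, \Sigma_j),

where t_{\nu} denotes the multivariate Student's t density with \nu = df degrees of freedom, in the parametrization linked above.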
A vector of length n where the i-th entry contains the pdf of the i-th random vector (i.e. the i-th row of X).
Matteo Fasiolo <[email protected]>.
#### 1) Example use
# Set up mixture density
df <- 6
mu <- matrix(c(1, 2, 10, 20), 2, 2, byrow = TRUE)
sigma <- list(diag(c(1, 10)), matrix(c(1, -0.9, -0.9, 1), 2, 2))
w <- c(0.1, 0.9)

# Simulate
X <- rmixt(1e4, mu, sigma, df, w)

# Evaluate density
ds <- dmixt(X, mu, sigma, w = w, df = df)
head(ds)

#### 2) More complicated example
# Define mixture
set.seed(5135)
N <- 10000
d <- 2
df <- 10
w <- rep(1, 2) / 2
mu <- matrix(c(0, 0, 2, 3), 2, 2, byrow = TRUE)
sigma <- list(matrix(c(1, 0, 0, 2), 2, 2), matrix(c(1, -0.9, -0.9, 1), 2, 2))

# Simulate random variables
X <- rmixt(N, mu, sigma, w = w, df = df, retInd = TRUE)

# Plot mixture density
np <- 100
xvals <- seq(min(X[ , 1]), max(X[ , 1]), length.out = np)
yvals <- seq(min(X[ , 2]), max(X[ , 2]), length.out = np)
theGrid <- expand.grid(xvals, yvals)
theGrid <- as.matrix(theGrid)
dens <- dmixt(theGrid, mu, sigma, w = w, df = df)
plot(X, pch = '.', col = attr(X, "index") + 1)
contour(x = xvals, y = yvals, z = matrix(dens, np, np),
        levels = c(0.002, 0.01, 0.02, 0.04, 0.08, 0.15), add = TRUE, lwd = 2)
Fast computation of the multivariate normal density.
dmvn(X, mu, sigma, log = FALSE, ncores = 1, isChol = FALSE)
X: an n by d matrix, where each row is a d-dimensional random vector. Alternatively X can be a d-dimensional vector.
mu: a vector of length d, representing the mean of the distribution.
sigma: a covariance matrix (d x d). Alternatively it can be the Cholesky decomposition of the covariance; in that case isChol should be set to TRUE.
log: boolean; if TRUE the logarithm of the pdf is returned.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma is the Cholesky decomposition of the covariance matrix.
A vector of length n where the i-th entry contains the pdf of the i-th random vector.
Matteo Fasiolo <[email protected]>
N <- 100
d <- 5
mu <- 1:d
X <- t(t(matrix(rnorm(N*d), N, d)) + mu)
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp) + diag(0.5, d)
myChol <- chol(mcov)

head(dmvn(X, mu, mcov), 10)
head(dmvn(X, mu, myChol, isChol = TRUE), 10)

## Not run:
# Performance comparison: microbenchmark does not work on all
# platforms, hence we need to check whether it is installed
if( "microbenchmark" %in% rownames(installed.packages()) ){
  library(mvtnorm)
  library(microbenchmark)
  a <- cbind(dmvn(X, mu, mcov),
             dmvn(X, mu, myChol, isChol = TRUE),
             dmvnorm(X, mu, mcov))
  # Check if we get the same output as dmvnorm()
  a[ , 1] / a[ , 3]
  a[ , 2] / a[ , 3]
  microbenchmark(dmvn(X, mu, myChol, isChol = TRUE),
                 dmvn(X, mu, mcov),
                 dmvnorm(X, mu, mcov))
  detach("package:mvtnorm", unload = TRUE)
}
## End(Not run)
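For intuition about what dmvn() computes, and why supplying a pre-computed Cholesky factor with isChol = TRUE saves work, here is a pure-R sketch of the Cholesky-based log-density calculation (the package does this in C++; this is not its actual code):

# Pure-R sketch of the Cholesky-based log-density (for intuition only).
# myChol is the upper-triangular factor returned by chol(mcov).
dmvn_sketch <- function(X, mu, myChol) {
  d <- length(mu)
  # Solve t(R) %*% z = (x - mu) for every row of X at once
  Z <- backsolve(myChol, t(X) - mu, transpose = TRUE)
  maha <- colSums(Z^2)                     # squared Mahalanobis distances
  logDet <- 2 * sum(log(diag(myChol)))     # log-determinant of the covariance
  -0.5 * (d * log(2 * pi) + logDet + maha) # log-density of each row
}

# Should match dmvn(X, mu, myChol, isChol = TRUE, log = TRUE) up to numerical error
head(dmvn_sketch(X, mu, myChol))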
Fast computation of the multivariate Student's t density.
dmvt(X, mu, sigma, df, log = FALSE, ncores = 1, isChol = FALSE)
X: an n by d matrix, where each row is a d-dimensional random vector. Alternatively X can be a d-dimensional vector.
mu: a vector of length d, representing the mean of the distribution.
sigma: a scale matrix (d x d). Alternatively it can be the Cholesky decomposition of the scale matrix; in that case isChol should be set to TRUE. Notice that sigma is the scale matrix, not the covariance: if the degrees of freedom (the argument df) are greater than 2, the covariance of the distribution is sigma * df / (df - 2).
df: a positive scalar representing the degrees of freedom.
log: boolean; if TRUE the logarithm of the pdf is returned.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma is the Cholesky decomposition of the scale matrix.
There are many candidates for the multivariate generalization of Student's t-distribution; here we use the parametrization described at https://en.wikipedia.org/wiki/Multivariate_t-distribution. NB: at the moment the parallelization does not work properly on Solaris OS when ncores > 1. Hence, dmvt() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
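For reference, the density in that parametrization, with \nu = df, location \mu = mu and scale matrix \Sigma = sigma, is

f(x) = \frac{\Gamma((\nu + d)/2)}{\Gamma(\nu/2) (\nu \pi)^{d/2} |\Sigma|^{1/2}} \left[ 1 + \frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{\nu} \right]^{-(\nu + d)/2}.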
A vector of length n where the i-th entry contains the pdf of the i-th random vector.
Matteo Fasiolo <[email protected]>
N <- 100
d <- 5
mu <- 1:d
df <- 4
X <- t(t(matrix(rnorm(N*d), N, d)) + mu)
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp) + diag(0.5, d)
myChol <- chol(mcov)

head(dmvt(X, mu, mcov, df = df), 10)
head(dmvt(X, mu, myChol, df = df, isChol = TRUE), 10)
Fast computation of the squared Mahalanobis distances between all rows of X and the vector mu, with respect to sigma.
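That is, for the i-th row x_i of X the function computes the standard squared Mahalanobis distance

D^2(x_i) = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu),

with \mu = mu and \Sigma = sigma.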
maha(X, mu, sigma, ncores = 1, isChol = FALSE)
X: an n by d matrix, where each row is a d-dimensional random vector. Alternatively X can be a d-dimensional vector.
mu: a vector of length d, representing the central position.
sigma: a covariance matrix (d x d). Alternatively it can be the Cholesky decomposition of the covariance; in that case isChol should be set to TRUE.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma is the Cholesky decomposition of the covariance matrix.
A vector of length n where the i-th entry contains the squared Mahalanobis distance of the i-th random vector (i.e. the i-th row of X).
Matteo Fasiolo <[email protected]>
N <- 100
d <- 5
mu <- 1:d
X <- t(t(matrix(rnorm(N*d), N, d)) + mu)
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)
myChol <- chol(mcov)

rbind(head(maha(X, mu, mcov), 10),
      head(maha(X, mu, myChol, isChol = TRUE), 10),
      head(mahalanobis(X, mu, mcov), 10))

## Not run:
# Performance comparison: microbenchmark does not work on all
# platforms, hence we need to check whether it is installed
if( "microbenchmark" %in% rownames(installed.packages()) ){
  library(microbenchmark)
  a <- cbind(maha(X, mu, mcov),
             maha(X, mu, myChol, isChol = TRUE),
             mahalanobis(X, mu, mcov))
  # Same output as mahalanobis()
  a[ , 1] / a[ , 3]
  a[ , 2] / a[ , 3]
  microbenchmark(maha(X, mu, mcov),
                 maha(X, mu, myChol, isChol = TRUE),
                 mahalanobis(X, mu, mcov))
}
## End(Not run)
Given a sample from a d-dimensional distribution, an initialization point and a bandwidth, the algorithm finds the nearest mode of the corresponding Gaussian kernel density.
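For intuition, the standard Gaussian mean-shift iteration takes the form below (a sketch of the general technique, not necessarily the package's exact implementation): starting from x = init, repeatedly move to the weighted average

x \leftarrow \sum_{i=1}^{n} w_i(x) X_i, \quad w_i(x) \propto \exp\{-\tfrac{1}{2} (x - X_i)^T H^{-1} (x - X_i)\}, \quad \sum_i w_i(x) = 1,

stopping when the increments along all dimensions are smaller than tol.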
ms(X, init, H, tol = 1e-06, ncores = 1, store = FALSE)
X: an n by d matrix containing the data.
init: a d-dimensional vector containing the initial point for the optimization.
H: a positive definite bandwidth matrix, representing the covariance of each component of the Gaussian kernel density.
tol: tolerance used to assess the convergence of the algorithm, which is stopped if the absolute values of the increments along all dimensions are smaller than tol at any iteration. Default value is 1e-6.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
store: if TRUE the whole trajectory of the algorithm is stored and returned in the traj element of the output; otherwise traj = NULL (see Value).
A list where estim is a d-dimensional vector containing the final position of the algorithm, while traj is a matrix with d columns representing the trajectory of the algorithm along each dimension. If store == FALSE the whole trajectory is not stored and traj = NULL.
Matteo Fasiolo <[email protected]>.
set.seed(434)

# Simulating multivariate normal data
N <- 1000
mu <- c(1, 2)
sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
X <- rmvn(N, mu = mu, sigma = sigma)

# Plotting the true density function
steps <- 100
range1 <- seq(min(X[ , 1]), max(X[ , 1]), length.out = steps)
range2 <- seq(min(X[ , 2]), max(X[ , 2]), length.out = steps)
grid <- expand.grid(range1, range2)
vals <- dmvn(as.matrix(grid), mu, sigma)
contour(z = matrix(vals, steps, steps), x = range1, y = range2,
        xlab = "X1", ylab = "X2")
points(X[ , 1], X[ , 2], pch = '.')

# Estimating the mode from "nrep" starting points
nrep <- 10
index <- sample(1:N, nrep)
for(ii in 1:nrep) {
  start <- X[index[ii], ]
  out <- ms(X, init = start, H = 0.1 * sigma, store = TRUE)
  lines(out$traj[ , 1], out$traj[ , 2], col = 2, lwd = 2)
  points(out$final[1], out$final[2], col = 4, pch = 3, lwd = 3) # Estimated mode (blue)
  points(start[1], start[2], col = 2, pch = 3, lwd = 3)         # ii-th starting value
}
Fast simulation of r.v.s from a mixture of multivariate normal densities.
rmixn( n, mu, sigma, w, ncores = 1, isChol = FALSE, retInd = FALSE, A = NULL, kpnames = FALSE )
n: number of random vectors to be simulated.
mu: an (m x d) matrix, where m is the number of mixture components.
sigma: a list of m covariance matrices (d x d), one for each mixture component. Alternatively it can be a list of the m Cholesky decompositions of the covariances; in that case isChol should be set to TRUE.
w: a vector of length m, containing the weights of the mixture components.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma contains Cholesky decompositions rather than covariance matrices.
retInd: when set to TRUE an attribute called "index" is added to the output matrix. This is a vector specifying to which mixture component each random vector belongs (see Value).
A: an (optional) numeric matrix of dimension (n x d), which will be used to store the output random variables. It is useful when n and d are large and one wants to call rmixn() several times without reallocating memory.
kpnames: if TRUE the dimensions' names are preserved, that is, the columns of the output are named after the corresponding dimensions of mu or sigma. FALSE by default.
Notice that this function does not use one of the Random Number Generators (RNGs) provided by R, but one of the parallel cryptographic RNGs described in Salmon et al. (2011). It is important to point out that this RNG can safely be used in parallel, without risk of collisions between parallel sequences of random numbers. The initialization of the RNG depends on R's seed, hence the set.seed() function can be used to obtain reproducible results. Notice though that changing ncores causes most of the generated numbers to be different even if R's seed is the same (see example below). NB: at the moment the RNG does not work properly on Solaris OS when ncores > 1. Hence, rmixn() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
If A == NULL (default) the output is an (n x d) matrix where the i-th row is the i-th simulated vector. If A != NULL then the random vectors are stored in A, which is provided by the user, and the function returns NULL. Notice that if retInd == TRUE an attribute called "index" will be added to the output (or to A, if provided). This is a vector specifying to which mixture component each random vector belongs.
Matteo Fasiolo <[email protected]>, C++ RNG engine by Thijs van den Berg <http://sitmo.com/>.
John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw (2011). Parallel Random Numbers: As Easy as 1, 2, 3. D. E. Shaw Research, New York, NY 10036, USA.
# Create mixture of two components
mu <- matrix(c(1, 2, 10, 20), 2, 2, byrow = TRUE)
sigma <- list(diag(c(1, 10)), matrix(c(1, -0.9, -0.9, 1), 2, 2))
w <- c(0.1, 0.9)

# Simulate
X <- rmixn(1e4, mu, sigma, w, retInd = TRUE)
plot(X, pch = '.', col = attr(X, "index"))

# Simulate with fixed seed
set.seed(414)
rmixn(4, mu, sigma, w)
set.seed(414)
rmixn(4, mu, sigma, w)
set.seed(414)
rmixn(4, mu, sigma, w, ncores = 2) # r.v. generated on the second core are different

###### Here we create the matrix that will hold the simulated random variables upfront.
A <- matrix(NA, 4, 2)
class(A) <- "numeric" # This is important. We need the elements of A to be of class "numeric".
set.seed(414)
rmixn(4, mu, sigma, w, ncores = 2, A = A) # This returns NULL ...
A                                         # ... but the result is here
Fast simulation of r.v.s from a mixture of multivariate Student's t densities.
rmixt( n, mu, sigma, df, w, ncores = 1, isChol = FALSE, retInd = FALSE, A = NULL, kpnames = FALSE )
n: number of random vectors to be simulated.
mu: an (m x d) matrix, where m is the number of mixture components.
sigma: a list of m covariance matrices (d x d), one for each mixture component. Alternatively it can be a list of the m Cholesky decompositions of the covariances; in that case isChol should be set to TRUE.
df: a positive scalar representing the degrees of freedom. All the densities in the mixture have the same df.
w: a vector of length m, containing the weights of the mixture components.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma contains Cholesky decompositions rather than covariance matrices.
retInd: when set to TRUE an attribute called "index" is added to the output matrix. This is a vector specifying to which mixture component each random vector belongs (see Value).
A: an (optional) numeric matrix of dimension (n x d), which will be used to store the output random variables. It is useful when n and d are large and one wants to call rmixt() several times without reallocating memory.
kpnames: if TRUE the dimensions' names are preserved, that is, the columns of the output are named after the corresponding dimensions of mu or sigma. FALSE by default.
There are many candidates for the multivariate generalization of Student's t-distribution; here we use the parametrization described at https://en.wikipedia.org/wiki/Multivariate_t-distribution. Notice that this function does not use one of the Random Number Generators (RNGs) provided by R, but one of the parallel cryptographic RNGs described in Salmon et al. (2011). It is important to point out that this RNG can safely be used in parallel, without risk of collisions between parallel sequences of random numbers. The initialization of the RNG depends on R's seed, hence the set.seed() function can be used to obtain reproducible results. Notice though that changing ncores causes most of the generated numbers to be different even if R's seed is the same (see example below). NB: at the moment the parallelization does not work properly on Solaris OS when ncores > 1. Hence, rmixt() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
If A == NULL (default) the output is an (n x d) matrix where the i-th row is the i-th simulated vector. If A != NULL then the random vectors are stored in A, which is provided by the user, and the function returns NULL. Notice that if retInd == TRUE an attribute called "index" will be added to the output (or to A, if provided). This is a vector specifying to which mixture component each random vector belongs.
Matteo Fasiolo <[email protected]>, C++ RNG engine by Thijs van den Berg <http://sitmo.com/>.
John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw (2011). Parallel Random Numbers: As Easy as 1, 2, 3. D. E. Shaw Research, New York, NY 10036, USA.
# Create mixture of two components
df <- 6
mu <- matrix(c(1, 2, 10, 20), 2, 2, byrow = TRUE)
sigma <- list(diag(c(1, 10)), matrix(c(1, -0.9, -0.9, 1), 2, 2))
w <- c(0.1, 0.9)

# Simulate
X <- rmixt(1e4, mu, sigma, df, w, retInd = TRUE)
plot(X, pch = '.', col = attr(X, "index"))

# Simulate with fixed seed
set.seed(414)
rmixt(4, mu, sigma, df, w)
set.seed(414)
rmixt(4, mu, sigma, df, w)
set.seed(414)
rmixt(4, mu, sigma, df, w, ncores = 2) # r.v. generated on the second core are different

###### Here we create the matrix that will hold the simulated random variables upfront.
A <- matrix(NA, 4, 2)
class(A) <- "numeric" # This is important. We need the elements of A to be of class "numeric".
set.seed(414)
rmixt(4, mu, sigma, df, w, ncores = 2, A = A) # This returns NULL ...
A                                             # ... but the result is here
Fast simulation of multivariate normal random variables.
rmvn(n, mu, sigma, ncores = 1, isChol = FALSE, A = NULL, kpnames = FALSE)
n: number of random vectors to be simulated.
mu: a vector of length d, representing the mean.
sigma: a covariance matrix (d x d). Alternatively it can be the Cholesky decomposition of the covariance; in that case isChol should be set to TRUE.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma is the Cholesky decomposition of the covariance matrix.
A: an (optional) numeric matrix of dimension (n x d), which will be used to store the output random variables. It is useful when n and d are large and one wants to call rmvn() several times without reallocating memory.
kpnames: if TRUE the dimensions' names are preserved, that is, the columns of the output are named after the corresponding dimensions of mu or sigma. FALSE by default.
Notice that this function does not use one of the Random Number Generators (RNGs) provided by R, but one of the parallel cryptographic RNGs described in Salmon et al. (2011). It is important to point out that this RNG can safely be used in parallel, without risk of collisions between parallel sequences of random numbers. The initialization of the RNG depends on R's seed, hence the set.seed() function can be used to obtain reproducible results. Notice though that changing ncores causes most of the generated numbers to be different even if R's seed is the same (see example below). NB: at the moment the RNG does not work properly on Solaris OS when ncores > 1. Hence, rmvn() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
If A == NULL (default) the output is an (n x d) matrix where the i-th row is the i-th simulated vector. If A != NULL then the random vectors are stored in A, which is provided by the user, and the function returns NULL.
Matteo Fasiolo <[email protected]>, C++ RNG engine by Thijs van den Berg <http://sitmo.com/>.
John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw (2011). Parallel Random Numbers: As Easy as 1, 2, 3. D. E. Shaw Research, New York, NY 10036, USA.
d <- 5
mu <- 1:d

# Creating covariance matrix
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)

set.seed(414)
rmvn(4, 1:d, mcov)
set.seed(414)
rmvn(4, 1:d, mcov)
set.seed(414)
rmvn(4, 1:d, mcov, ncores = 2) # r.v. generated on the second core are different

###### Here we create the matrix that will hold the simulated random variables upfront.
A <- matrix(NA, 4, d)
class(A) <- "numeric" # This is important. We need the elements of A to be of class "numeric".
set.seed(414)
rmvn(4, 1:d, mcov, ncores = 2, A = A) # This returns NULL ...
A                                     # ... but the result is here
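For intuition, here is a pure-R sketch of the Cholesky-based simulation that rmvn() performs in C++ (for illustration only: it uses R's own RNG, so it will not reproduce rmvn()'s output stream):

# Pure-R sketch of multivariate normal simulation via the Cholesky factor
rmvn_sketch <- function(n, mu, sigma) {
  d <- length(mu)
  R <- chol(sigma)                # upper triangular, so sigma = t(R) %*% R
  Z <- matrix(rnorm(n * d), n, d) # n rows of i.i.d. standard normals
  sweep(Z %*% R, 2, mu, "+")      # each row is now ~ N(mu, sigma)
}

round(cov(rmvn_sketch(1e5, mu, mcov)) / mcov, 2) # entries should be close to 1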
Fast simulation of multivariate Student's t random variables.
rmvt(n, mu, sigma, df, ncores = 1, isChol = FALSE, A = NULL, kpnames = FALSE)
n: number of random vectors to be simulated.
mu: a vector of length d, representing the mean of the distribution.
sigma: a scale matrix (d x d). Alternatively it can be the Cholesky decomposition of the scale matrix; in that case isChol should be set to TRUE. Notice that sigma is the scale matrix, not the covariance: if the degrees of freedom (the argument df) are greater than 2, the covariance of the distribution is sigma * df / (df - 2).
df: a positive scalar representing the degrees of freedom.
ncores: number of cores used. The parallelization will take place only if OpenMP is supported.
isChol: boolean; set to TRUE if sigma is the Cholesky decomposition of the scale matrix.
A: an (optional) numeric matrix of dimension (n x d), which will be used to store the output random variables. It is useful when n and d are large and one wants to call rmvt() several times without reallocating memory.
kpnames: if TRUE the dimensions' names are preserved, that is, the columns of the output are named after the corresponding dimensions of mu or sigma. FALSE by default.
There are in fact many candidates for the multivariate generalization of Student's t-distribution; here we use the parametrization described at https://en.wikipedia.org/wiki/Multivariate_t-distribution. Notice that rmvt() does not use one of the Random Number Generators (RNGs) provided by R, but one of the parallel cryptographic RNGs described in Salmon et al. (2011). It is important to point out that this RNG can safely be used in parallel, without risk of collisions between parallel sequences of random numbers. The initialization of the RNG depends on R's seed, hence the set.seed() function can be used to obtain reproducible results. Notice though that changing ncores causes most of the generated numbers to be different even if R's seed is the same (see example below). NB: at the moment the RNG does not work properly on Solaris OS when ncores > 1. Hence, rmvt() checks if the OS is Solaris and, if this is the case, it imposes ncores == 1.
If A == NULL (default) the output is an (n x d) matrix where the i-th row is the i-th simulated vector. If A != NULL then the random vectors are stored in A, which is provided by the user, and the function returns NULL.
Matteo Fasiolo <[email protected]>, C++ RNG engine by Thijs van den Berg <http://sitmo.com/>.
John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw (2011). Parallel Random Numbers: As Easy as 1, 2, 3. D. E. Shaw Research, New York, NY 10036, USA.
d <- 5
mu <- 1:d
df <- 4

# Creating covariance matrix
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp) + diag(0.5, d)

set.seed(414)
rmvt(4, 1:d, mcov, df = df)
set.seed(414)
rmvt(4, 1:d, mcov, df = df)
set.seed(414)
rmvt(4, 1:d, mcov, df = df, ncores = 2) # These will not match the r.v. generated on a single core.

###### Here we create the matrix that will hold the simulated random variables upfront.
A <- matrix(NA, 4, d)
class(A) <- "numeric" # This is important. We need the elements of A to be of class "numeric".
set.seed(414)
rmvt(4, 1:d, mcov, df = df, ncores = 2, A = A) # This returns NULL ...
A                                              # ... but the result is here
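Since sigma is the scale matrix rather than the covariance, a quick empirical sanity check is possible, assuming the parametrization referenced above: for df > 2 the covariance of the simulated vectors should be close to sigma * df / (df - 2).

# Empirical check: cov of draws ~ mcov * df / (df - 2) for df > 2
Y <- rmvt(1e5, mu, mcov, df = df)
round(cov(Y) / (mcov * df / (df - 2)), 2) # entries should all be close to 1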