--- title: "Latin Hypercube Samples - Questions" author: "Rob Carnell" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Latin Hypercube Samples - Questions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) require(lhs) set.seed(2893) ``` ## Question 1 I am looking for a package which gives me Latin hypercube samples from a grid of values: ```{r q1} a <- (1:10) b <- (20:30) dataGrid <- expand.grid(a, b) ``` ### Answer The `lhs` package returns a uniformly distributed stratified sample from the unit hypercube. The marginal distributions can then be transformed to your distribution of choice. If you wanted a uniform Latin hypercube on [1,10] and [20,30] with 22 samples, you could do: ```{r a1} X <- randomLHS(22, 2) X[,1] <- 1 + 9*X[,1] X[,2] <- 20 + 10*X[,2] # OR Y <- randomLHS(22, 2) Y[,1] <- qunif(Y[,1], 1, 10) Y[,2] <- qunif(Y[,2], 20, 30) head(X) head(Y) ``` If you want integers only in the sample, then you can use `qinteger`. ```{r a12} X <- randomLHS(3, 2) X[,1] <- qinteger(X[,1], 1, 10) X[,2] <- qinteger(X[,2], 20, 30) head(X) ``` ## Question 2 I am trying to do a Latin hypercube Sampling (LHS) to a 5-parameter design matrix. I want the combination of the first three parameters to sum up to 1 (which obviously do not) If I divide each of these parameters with the sum, the uniform distribution is lost. Is there a way to maintain the random LHS (with uniformly distributed parameters) so that the referred condition is fulfilled? ### Answer In my experience with Latin hypercube samples, most people draw the sample on a uniform hypercube and then transform the uniform cube to have new distributions on the margins. The transformed distributions are not necessarily uniform. It is possible to draw a Latin hypercube with correlated margins and have them sum to one using `qdirichlet`, but they will not be uniform. I'll make a quick example argument that explains the difficulty... In two dimensions, you could draw this which is uniform and correlated. ```{r a21} x <- seq(0.05, 0.95, length = 10) y <- 1 - x all.equal(x + y, rep(1, length(x))) hist(x, main = "") hist(y, main = "") ``` But in three dimensions, it is hard to maintain uniformity because large samples on the first uniform margin overweight the small samples on the other margins. ```{r a22} x <- seq(0.05, 0.95, length = 10) y <- runif(length(x), 0, 1 - x) z <- 1 - x - y hist(x, main = "") hist(y, main = "") hist(z, main = "") ``` The transformed distributions maintain their "Latin" properties, but are in the form of new distributions. The Dirichlet distribution is built for this purpose. ```{r a24, fig.width=5, fig.height=5} N <- 1000 x <- randomLHS(N, 5) y <- x y[,1:3] <- qdirichlet(x[,1:3], c(1, 1, 1)) y[,4] <- x[,4] y[,5] <- x[,5] par(mfrow = c(2,3)) dummy <- apply(x, 2, hist, main = "") par(mfrow = c(2,3)) dummy <- apply(y, 2, hist, main = "") all.equal(rowSums(y[,1:3]), rep(1, nrow(y))) ``` The uniform properties are gone as you can see here... ```{r a25} par(mfrow = c(1,1)) pairs(x) pairs(y, col = "red") ``` ## Question 3 How do I create a Latin hypercube that ranges between between 0 and 1 and sums to 1? ### Answer I have a solution to this problem using a Dirichlet distribution. The result is not uniformly distributed on (0,1) anymore, but instead is Dirichlet distributed with the parameters alpha. The Latin properties are maintained. ```{r qdirichlet} X <- randomLHS(1000, 7) Y <- qdirichlet(X, rep(1,7)) stopifnot(all(abs(rowSums(Y) - 1) < 1E-12)) range(Y) ws <- randomLHS(1000, 7) wsSums <- rowSums(ws) wss <- ws / wsSums stopifnot(all(abs(rowSums(wss) - 1) < 1E-12)) range(wss) ``` ## Question 5 I need to use Latin hypercube sampling for my own custom functions. ### Answer ```{r custom, fig.width=5, fig.height=5} require(lhs) # functions you described T1 <- function(t) t*t WL1 <- function(T1, t) T1*t BE1 <- function(WL1, T1, t) WL1*T1*t # t is distributed according to some pdf (e.g. normal) # draw a lhs with 512 rows and 3 columns (one for each function) y <- randomLHS(512, 3) # transform the three columns to a normal distribution (these could be any # distribution) t <- apply(y, 2, function(columny) qnorm(columny, 2, 1)) # transform t using the functions provided result <- cbind( T1(t[,1]), WL1(T1(t[,2]), t[,2]), BE1(WL1(T1(t[,3]), t[,3]), T1(t[,3]), t[,3]) ) # check the results # these should be approximately uniform par(mfrow = c(2,2)) dummy <- apply(y, 2, hist, breaks = 50, main = "") # these should be approximately normal par(mfrow = c(2,2)) dummy <- apply(t, 2, hist, breaks = 50, main = "") # these should be the results of the functions par(mfrow = c(2,2)) dummy <- apply(result, 2, hist, breaks = 50, main = "") ``` ## Question 6 I need a Latin hypercube sample on an integer set or a set of colors. ### Answer ```{r q6, fig.height=5, fig.width=5} N <- 1000 x <- randomLHS(N, 4) y <- as.data.frame(x) # uniform integers on 1-10 y[,1] <- qinteger(x[,1], 1, 10) # three colors 1,2,3 y[,2] <- qfactor(x[,2], factor(c("R", "G", "B"))) # other distributions y[,3] <- qunif(x[,3], 5, 10) y[,4] <- qnorm(x[,4], 0, 2) table(y[,1]) table(y[,2]) hist(y[,3], main="") hist(y[,4], main="") ```