Title: | Density Estimation and Random Number Generation with Distribution Element Trees |
---|---|
Description: | Density estimation for possibly large data sets and conditional/unconditional random number generation or bootstrapping with distribution element trees. The function 'det.construct' translates a dataset into a distribution element tree. To evaluate the probability density based on a previously computed tree at arbitrary query points, the function 'det.query' is available. The functions 'det1' and 'det2' provide density estimation and plotting for one- and two-dimensional datasets. Conditional/unconditional smooth bootstrapping from an available distribution element tree can be performed by 'det.rnd'. For more details on distribution element trees, see: Meyer, D.W. (2016) <arXiv:1610.00345> or Meyer, D.W., Statistics and Computing (2017) <doi:10.1007/s11222-017-9751-9> and Meyer, D.W. (2017) <arXiv:1711.04632> or Meyer, D.W., Journal of Computational and Graphical Statistics (2018) <doi:10.1080/10618600.2018.1482768>. |
Authors: | Daniel Meyer |
Maintainer: | Daniel Meyer <[email protected]> |
License: | GPL-2 |
Version: | 1.1.3 |
Built: | 2024-10-31 21:21:55 UTC |
Source: | CRAN |
Check if all column vectors in a matrix are equal.
allequal(x)
allequal(x)
x |
matrix with column vectors. |
TRUE if matrix x
has zero or one column, or if all column vectors in matrix x
are equal. FALSE if x
contains at least two different columns.
Pearson's Chi-square test of pairwise mutual independence.
chi2indeptest(x, alpha)
chi2indeptest(x, alpha)
x |
matrix with |
alpha |
significance level. |
Object with test outcomes h[i,j] = h[j,i] = TRUE/FALSE
for 1 <= i,j <= n
meaning rejection/acceptance of independence null hypothesis involving rows i
and j
of matrix x
, and p-value or confidence level of acceptance.
Composite Pearson's Chi-square test for goodness-of-fit of linear distributions.
chi2testlinear(x, alpha)
chi2testlinear(x, alpha)
x |
vector with normalized data between 0 and 1. |
alpha |
significance level. |
Object with test outcome h = TRUE/FALSE
meaning rejection/acceptance of linear null hypothesis, and p-value or confidence level of acceptance.
(Contingency) tables for Pearson's Chi-square goodness-of-fit and independence tests.
chi2testtable(x, alpha, cf = FALSE)
chi2testtable(x, alpha, cf = FALSE)
x |
vector with normalized data between 0 and 1. |
alpha |
significance level. |
cf |
flag, if |
Object with observed counts co
of data x
inside bins defined by edges be
.
Cochran, W.G., The Chi Square Test of Goodness of Fit. The Annals of Mathematical Statistics, 1952. 23(3): p. 315-345.
Mann, H.B. and A. Wald, On the Choice of the Number of Class Intervals in the Application of the Chi Square Test. The Annals of Mathematical Statistics, 1942. 13(3): p. 306-317.
Bagnato, L., A. Punzo, and O. Nicolis, The autodependogram: a graphical device to investigate serial dependences. Journal of Time Series Analysis, 2012. 33(2): p. 233-254.
Pearson's Chi-square test for goodness-of-fit of a uniform distribution.
chi2testuniform(x, alpha)
chi2testuniform(x, alpha)
x |
vector with normalized data between 0 and 1. |
alpha |
significance level. |
Object with test outcome h = TRUE/FALSE
meaning rejection/acceptance of uniform null hypothesis, and p-value or confidence level of acceptance.
The function contourRect
draws the z contour levels of a rectangular domain in x-y-space with z-values given at the corners of the rectangle.
contourRect(xy, z, n = 20, zlb = 0, zub = 1, color = grDevices::colorRamp(c("white", "black")))
contourRect(xy, z, n = 20, zlb = 0, zub = 1, color = grDevices::colorRamp(c("white", "black")))
xy |
matrix with two rows and four columns containing x- and y-coordinates of the four corner points of the rectangle. The corner points are ordered in clockwise or counter-clockwise direction. |
z |
vector with four z-values at the four corner points. |
n |
|
zlb , zub
|
determines the global range of z-values used to determine the contour colors. All values in |
color |
function to assign plot colors that is generated, e.g., by |
Splits a parent distribution element characterized by x
, sze
, and id
along dimension dimens
based on mode
into two child elements.
de.split(dimens, x, sze, id, mode)
de.split(dimens, x, sze, id, mode)
dimens |
split dimension. |
x |
data in element given as matrix with |
sze |
vector representing the size of the element. |
id |
element index or identifier. |
mode |
for splitting: |
Object containing the properties of the resulting two child distribution elements and the split position within the parent element.
The function det.construct
generates a distribution element tree DET from available data. The DET can be used firstly in connection with det.query
for density estimation. Secondly, with det.rnd
, DETs can be used for smooth bootstrapping or more specifically conditional or unconditional random number generation.
det.construct(dta, mode = 2, lb = NA, ub = NA, alphag = 0.001, alphad = 0.001, progress = TRUE, dtalim = Inf, cores = 1)
det.construct(dta, mode = 2, lb = NA, ub = NA, alphag = 0.001, alphad = 0.001, progress = TRUE, dtalim = Inf, cores = 1)
dta |
matrix with |
mode |
order of distribution elements applied, default is |
lb , ub
|
vectors of length |
alphag , alphad
|
significance levels for goodness-of-fit and independence tests, respectively, in element refinement or splitting process. Default is |
progress |
optional logical, if set to |
dtalim |
for large datasets, |
cores |
|
A DET object, which reflects the tree and pre-white transform, is returned.
Meyer, D.W. (2016) http://arxiv.org/abs/1610.00345 or Meyer, D.W., Statistics and Computing (2017) https://doi.org/10.1007/s11222-017-9751-9 and Meyer, D.W. (2017) http://arxiv.org/abs/1711.04632
## Gaussian mixture data require(stats) det <- det.construct(t(c(rnorm(1e5),rnorm(1e4)/100+2))) # default linear det (mode = 2) x <- t(seq(-4,6,0.01)); p <- det.query(det, x); plot(x, p, type = "l") ## piecewise uniform data with peaks x <- matrix(c(rep(0,1e3),rep(1,1e3), 2*runif(1e4), rep(0,5e2),rep(1,25e2),2*runif(9e3)), nrow = 2, byrow = TRUE) det <- det.construct(x, mode = 1, lb = 0, ub = 0) # constant elements, no pre-whitening
## Gaussian mixture data require(stats) det <- det.construct(t(c(rnorm(1e5),rnorm(1e4)/100+2))) # default linear det (mode = 2) x <- t(seq(-4,6,0.01)); p <- det.query(det, x); plot(x, p, type = "l") ## piecewise uniform data with peaks x <- matrix(c(rep(0,1e3),rep(1,1e3), 2*runif(1e4), rep(0,5e2),rep(1,25e2),2*runif(9e3)), nrow = 2, byrow = TRUE) det <- det.construct(x, mode = 1, lb = 0, ub = 0) # constant elements, no pre-whitening
Identify distribution element tree (DET) leafs that are cut by conditions. The latter are defined in terms of positions xc
along probability-space components with indices dc
.
det.cut(det, xc, dc)
det.cut(det, xc, dc)
det |
distribution element tree object resulting from |
xc |
vector with conditioning values of probability-space components listed in |
dc |
integer vector with indices of conditioning components corresponding to |
A vector containing the leaf indices that are cut by conditions xc
of components dc
is returned. If no leafs are found, the return vector has length 0.
# DET based on Gaussian data require(stats); require(graphics) n <- 8e4; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.2)), ncol = 2) det <- det.construct(t(x), lb = 0, ub = 0) # no pre-whitening plot(x, type = "p", pch = ".", asp = 1) # leaf elements that are cut by x1 = 2 leafs <- det.cut(det, xc = 2, dc = 1) # condition x1 = 2 # draw probability space (black) with cut leaf elements (red) rect(det$lb[1], det$lb[2], det$ub[1], det$ub[2], border = "black") for (k in 1:length(leafs)) { p <- det.de(det, leafs[k])$lb; w <- det.de(det, leafs[k])$size rect(p[1],p[2],p[1]+w[1],p[2]+w[2], border = "red") } # leafs cut by two conditions x1 = -3, x2 = -2 (blue) leafs <- det.cut(det, xc = c(-2,-3), dc = c(2,1)) p <- det.de(det, leafs[1])$lb; w <- det.de(det, leafs[1])$size rect(p[1],p[2],p[1]+w[1],p[2]+w[2], border = "blue")
# DET based on Gaussian data require(stats); require(graphics) n <- 8e4; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.2)), ncol = 2) det <- det.construct(t(x), lb = 0, ub = 0) # no pre-whitening plot(x, type = "p", pch = ".", asp = 1) # leaf elements that are cut by x1 = 2 leafs <- det.cut(det, xc = 2, dc = 1) # condition x1 = 2 # draw probability space (black) with cut leaf elements (red) rect(det$lb[1], det$lb[2], det$ub[1], det$ub[2], border = "black") for (k in 1:length(leafs)) { p <- det.de(det, leafs[k])$lb; w <- det.de(det, leafs[k])$size rect(p[1],p[2],p[1]+w[1],p[2]+w[2], border = "red") } # leafs cut by two conditions x1 = -3, x2 = -2 (blue) leafs <- det.cut(det, xc = c(-2,-3), dc = c(2,1)) p <- det.de(det, leafs[1])$lb; w <- det.de(det, leafs[1])$size rect(p[1],p[2],p[1]+w[1],p[2]+w[2], border = "blue")
The function det.de
extracts the distribution element with index ind
from a distribution element tree (DET) generated by the function det.construct
.
det.de(det, ind)
det.de(det, ind)
det |
distribution element tree object resulting from |
ind |
index of element to extract from |
A list with the element characteristics is returned: p
probability density, theta
element parameters, lb
lower bound, size
of element, div
divisions or splits along dimensions leading to final element.
The function det.leafs
extracts the distribution elements at the branch ends of a DET generated by the function det.construct
.
det.leafs(det)
det.leafs(det)
det |
distribution element tree object resulting from |
A list of vectors containing the leaf element data is returned: p
probability density, theta
element parameters, lb
lower bound, size
of element, div
divisions or splits along dimensions leading to final element.
require(stats); require(graphics) # generate DET based on bi-variate Gaussian data n <- 1e4; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.2)), nrow = 2, byrow = TRUE) det <- det.construct(x) # plot data and element pattern leafs <- det.leafs(det) plot(t(x), type = "p", pch = ".", asp = 1) for (k in 1:length(leafs$p)) { p <- leafs$lb[,k] # element corner point w <- leafs$size[,k] # element size elem <- rbind(c(p[1],p[1]+w[1],p[1]+w[1],p[1],p[1]), c(p[2],p[2],p[2]+w[2],p[2]+w[2],p[2])) # element rectangle elem <- t(det$A) %*% elem + det$mu %*% t(rep(1,5)) # pre-white transform lines(elem[1,],elem[2,]) # draw element }
require(stats); require(graphics) # generate DET based on bi-variate Gaussian data n <- 1e4; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.2)), nrow = 2, byrow = TRUE) det <- det.construct(x) # plot data and element pattern leafs <- det.leafs(det) plot(t(x), type = "p", pch = ".", asp = 1) for (k in 1:length(leafs$p)) { p <- leafs$lb[,k] # element corner point w <- leafs$size[,k] # element size elem <- rbind(c(p[1],p[1]+w[1],p[1]+w[1],p[1],p[1]), c(p[2],p[2],p[2]+w[2],p[2]+w[2],p[2])) # element rectangle elem <- t(det$A) %*% elem + det$mu %*% t(rep(1,5)) # pre-white transform lines(elem[1,],elem[2,]) # draw element }
The function det.query
evaluates probability densities at the query points x
based on a distribution element tree (DET). The latter is calculable with det.construct
based on available data.
det.query(det, x, cores = 1)
det.query(det, x, cores = 1)
det |
distribution element tree object resulting from |
x |
matrix containing |
cores |
for large query-point sets, |
A vector containing the probability density at the query points x
is returned.
## 1d example require(stats); require(graphics) # DET generation based on Gaussian/uniform data det <- det.construct(t(c(rnorm(1e5,2,3),runif(1e5)-3))) # density evaluation based on DET at equidistant query points x <- t(seq(-10,14,0.01)); p <- det.query(det, x) # compare DET estimate (black) against Gaussian/uniform reference (red) plot(x, p, type = "l", col = "black") lines(x, (dnorm(x,2,3)+dunif(x+3))/2, col = "red") ## 2d example require(stats); require(graphics) # mean and covariance of Gaussian, data generation mu <- c(3,5); C <- matrix(c(4.0,-2.28,-2.28,1.44), nrow = 2) A <- eigen(C); B <- diag(A$values); A <- A$vectors x <- matrix(rnorm(2e4), nrow = 2) x <- t(A %*% (sqrt(B) %*% x) + mu %*% t(rep(1,ncol(x)))) # bounds and resolution of x1-x2 query grid lb <- c(-5,0); ub <- c(11,10); np <- c(320,200) x1 <- lb[1] + (ub[1]-lb[1])*((1:np[1])-0.5)/np[1] x2 <- lb[2] + (ub[2]-lb[2])*((1:np[2])-0.5)/np[2] xp <- rbind(rep(x1,np[2]), rep(x2,each = np[1])) # grid points # plotting split.screen(c(2, 2)); screen(1) plot(x, type = "p", pch = ".", asp = 1, main = "data") # DET estimator det <- det.construct(t(x)) yd <- matrix(det.query(det, xp), nrow = np[1]) screen(2) image(list(x = x1, y = x2, z = yd), asp = 1, col = grDevices::gray((100:0)/100), main = "det") # Gaussian density for comparison yr <- yr <- exp(-1/2 * colSums( (t(solve(C)) %*% (xp - mu%*%t(rep(1,ncol(xp))))) * (xp - mu%*%t(rep(1,ncol(xp))))) ) / sqrt((2*pi)^2*det(C)) yr <- matrix(yr, nrow = np[1]) screen(3) image(list(x = x1, y = x2, z = yr), asp = 1, col = grDevices::gray((100:0)/100), main = "reference")
## 1d example require(stats); require(graphics) # DET generation based on Gaussian/uniform data det <- det.construct(t(c(rnorm(1e5,2,3),runif(1e5)-3))) # density evaluation based on DET at equidistant query points x <- t(seq(-10,14,0.01)); p <- det.query(det, x) # compare DET estimate (black) against Gaussian/uniform reference (red) plot(x, p, type = "l", col = "black") lines(x, (dnorm(x,2,3)+dunif(x+3))/2, col = "red") ## 2d example require(stats); require(graphics) # mean and covariance of Gaussian, data generation mu <- c(3,5); C <- matrix(c(4.0,-2.28,-2.28,1.44), nrow = 2) A <- eigen(C); B <- diag(A$values); A <- A$vectors x <- matrix(rnorm(2e4), nrow = 2) x <- t(A %*% (sqrt(B) %*% x) + mu %*% t(rep(1,ncol(x)))) # bounds and resolution of x1-x2 query grid lb <- c(-5,0); ub <- c(11,10); np <- c(320,200) x1 <- lb[1] + (ub[1]-lb[1])*((1:np[1])-0.5)/np[1] x2 <- lb[2] + (ub[2]-lb[2])*((1:np[2])-0.5)/np[2] xp <- rbind(rep(x1,np[2]), rep(x2,each = np[1])) # grid points # plotting split.screen(c(2, 2)); screen(1) plot(x, type = "p", pch = ".", asp = 1, main = "data") # DET estimator det <- det.construct(t(x)) yd <- matrix(det.query(det, xp), nrow = np[1]) screen(2) image(list(x = x1, y = x2, z = yd), asp = 1, col = grDevices::gray((100:0)/100), main = "det") # Gaussian density for comparison yr <- yr <- exp(-1/2 * colSums( (t(solve(C)) %*% (xp - mu%*%t(rep(1,ncol(xp))))) * (xp - mu%*%t(rep(1,ncol(xp))))) ) / sqrt((2*pi)^2*det(C)) yr <- matrix(yr, nrow = np[1]) screen(3) image(list(x = x1, y = x2, z = yr), asp = 1, col = grDevices::gray((100:0)/100), main = "reference")
Smooth bootstrapping or generation of (un)conditional random vectors based on an existing distribution element tree (DET).
det.rnd(n, det, xc = vector("numeric", length = 0), dc = vector("numeric", length = 0), cores = Inf)
det.rnd(n, det, xc = vector("numeric", length = 0), dc = vector("numeric", length = 0), cores = Inf)
n |
number of samples to generate. |
det |
distribution element tree object resulting from |
xc |
vector with conditioning values of probability-space components listed in |
dc |
integer vector with indices of conditioning components corresponding to |
cores |
for large |
A matrix containing n
random vectors (columns) with d
components or dimensions (rows) is returned. d
is equal to the dimensionality of the underlying det
object.
## 2d example require(stats); require(graphics) # data from uniform distribution on a wedge x <- matrix(runif(2e4), ncol = 2); x <- x[x[,2]<x[,1],] x2c <- 0.75 # conditioning component # data and conditioning line split.screen(c(2, 1)); screen(1) plot(x, type = "p", pch = ".", asp = 1) lines(c(0,1), x2c*c(1,1), col = "red") # DET construction and bootstrapping det <- det.construct(t(x), mode = 1, lb = 0, ub = 0) # const. de's, no pre-white y <- det.rnd(1e3, det, xc = x2c, dc = 2, cores = 2) # conditional bootstrap'g # compare generated data (black) with exact cond. distribution (red) screen(2); det1(y[1,], mode = 1) lines(c(0,x2c,x2c,1,1),c(0,0,1/(1-x2c),1/(1-x2c),0), col = "red") ## example 2d unconditional require(stats); require(graphics) x <- matrix(runif(2e4), ncol = 2); x <- x[x[,2]<x[,1],] # uniform wedge det <- det.construct(t(x), mode = 1, lb = 0, ub = 0) # no pre-white y <- t(det.rnd(nrow(x), det, cores = 2)) # smooth bootstrapping split.screen(c(2, 1)) screen(1); plot(x, type = "p", pch = ".", asp = 1, main = "original") screen(2); plot(y, type = "p", pch = ".", asp = 1, main = "bootstrapped") ## example 3d require(stats); require(graphics) # mean and covariance of Gaussian, data generation mu <- c(1,3,2); C <- matrix(c(25,7.5,1.75,7.5,7,1.35,1.75,1.35,0.43), nrow = 3) A <- eigen(C); B <- diag(A$values); A <- A$vectors x <- matrix(rnorm(3e4), nrow = 3) x <- A %*% (sqrt(B) %*% x) + mu %*% t(rep(1,ncol(x))) lbl <- "x1 | x2 = 7 & x3 = 2.5" pairs(t(x), labels = c("x1","x2","x3"), pch = ".", main = lbl) # bootstrapping conditional on x2 and x3 det <- det.construct(x, lb = 0, ub = 0) xc <- c(2.5,7); d <- c(3,2) # conditional on x2 = 7 & x3 = 2.5 y <- det.rnd(1e4, det, xc, d, cores = 1) det1(y[1,], mode = 1, main = lbl) # compare with exact conditional density Cm1 <- solve(C); var1 <- det(C)/det(C[2:3,2:3]) # conditional variance mu1 <- mu[1] + var1*((mu[2]-xc[d==2])*Cm1[1,2]+(mu[3]-xc[d==3])*Cm1[1,3]) # cond. mean x1 <- mu1 + seq(-50,50)/50 * 5*sqrt(var1) # x1-axis grid points lines(x1, dnorm(x1,mu1,sqrt(var1)), col = "red")
## 2d example require(stats); require(graphics) # data from uniform distribution on a wedge x <- matrix(runif(2e4), ncol = 2); x <- x[x[,2]<x[,1],] x2c <- 0.75 # conditioning component # data and conditioning line split.screen(c(2, 1)); screen(1) plot(x, type = "p", pch = ".", asp = 1) lines(c(0,1), x2c*c(1,1), col = "red") # DET construction and bootstrapping det <- det.construct(t(x), mode = 1, lb = 0, ub = 0) # const. de's, no pre-white y <- det.rnd(1e3, det, xc = x2c, dc = 2, cores = 2) # conditional bootstrap'g # compare generated data (black) with exact cond. distribution (red) screen(2); det1(y[1,], mode = 1) lines(c(0,x2c,x2c,1,1),c(0,0,1/(1-x2c),1/(1-x2c),0), col = "red") ## example 2d unconditional require(stats); require(graphics) x <- matrix(runif(2e4), ncol = 2); x <- x[x[,2]<x[,1],] # uniform wedge det <- det.construct(t(x), mode = 1, lb = 0, ub = 0) # no pre-white y <- t(det.rnd(nrow(x), det, cores = 2)) # smooth bootstrapping split.screen(c(2, 1)) screen(1); plot(x, type = "p", pch = ".", asp = 1, main = "original") screen(2); plot(y, type = "p", pch = ".", asp = 1, main = "bootstrapped") ## example 3d require(stats); require(graphics) # mean and covariance of Gaussian, data generation mu <- c(1,3,2); C <- matrix(c(25,7.5,1.75,7.5,7,1.35,1.75,1.35,0.43), nrow = 3) A <- eigen(C); B <- diag(A$values); A <- A$vectors x <- matrix(rnorm(3e4), nrow = 3) x <- A %*% (sqrt(B) %*% x) + mu %*% t(rep(1,ncol(x))) lbl <- "x1 | x2 = 7 & x3 = 2.5" pairs(t(x), labels = c("x1","x2","x3"), pch = ".", main = lbl) # bootstrapping conditional on x2 and x3 det <- det.construct(x, lb = 0, ub = 0) xc <- c(2.5,7); d <- c(3,2) # conditional on x2 = 7 & x3 = 2.5 y <- det.rnd(1e4, det, xc, d, cores = 1) det1(y[1,], mode = 1, main = lbl) # compare with exact conditional density Cm1 <- solve(C); var1 <- det(C)/det(C[2:3,2:3]) # conditional variance mu1 <- mu[1] + var1*((mu[2]-xc[d==2])*Cm1[1,2]+(mu[3]-xc[d==3])*Cm1[1,3]) # cond. mean x1 <- mu1 + seq(-50,50)/50 * 5*sqrt(var1) # x1-axis grid points lines(x1, dnorm(x1,mu1,sqrt(var1)), col = "red")
One-dimensional piecewise linear or constant probability density estimator based on distribution element trees (DETs).
det1(dta, mode = 2, bounds = c(0, 0), alpha = 0.001, main = NULL, dtalim = Inf, cores = 1)
det1(dta, mode = 2, bounds = c(0, 0), alpha = 0.001, main = NULL, dtalim = Inf, cores = 1)
dta |
vector with data |
mode |
order of distribution elements applied, default is |
bounds |
|
alpha |
significance level for goodness-of-fit testing in element refinement or splitting process. Default is |
main |
an overall plot title, see |
dtalim |
allows to limit the number of samples used in tests guiding the element splitting process. Default is |
cores |
number of cores for parallel tree construction. Default is |
require(stats) det1(rbeta(5e5, shape1 = 1.05, shape2 = 0.8), mode = -1, bounds = c(-0.1,1.1), main = "beta, const. elements, equal-scores splits") x <- seq(-0.1,1.1,0.005); lines(x, dbeta(x,shape1 = 1.05,shape2 = 0.8), col = "red") det1(rbeta(5e5, shape1 = 1.05, shape2 = 0.8), mode = -2, bounds = c(-0.1,1.1), main = "beta, linear elements, equal-scores splits") x <- seq(-0.1,1.1,0.005); lines(x, dbeta(x,shape1 = 1.05,shape2 = 0.8), col = "red") det1(rnorm(5e5), mode = 2, cores = 1, main = "Gaussian, linear elements, equal-size splits") x <- seq(-5,5,0.05); lines(x, dnorm(x), col = "red") det1(runif(5e5), mode = 1, bounds = c(-0.1,1.1), main = "uniform, const. elements, equal-size splits") x <- seq(-0.1,1.1,0.005); lines(x, dunif(x), col = "red")
require(stats) det1(rbeta(5e5, shape1 = 1.05, shape2 = 0.8), mode = -1, bounds = c(-0.1,1.1), main = "beta, const. elements, equal-scores splits") x <- seq(-0.1,1.1,0.005); lines(x, dbeta(x,shape1 = 1.05,shape2 = 0.8), col = "red") det1(rbeta(5e5, shape1 = 1.05, shape2 = 0.8), mode = -2, bounds = c(-0.1,1.1), main = "beta, linear elements, equal-scores splits") x <- seq(-0.1,1.1,0.005); lines(x, dbeta(x,shape1 = 1.05,shape2 = 0.8), col = "red") det1(rnorm(5e5), mode = 2, cores = 1, main = "Gaussian, linear elements, equal-size splits") x <- seq(-5,5,0.05); lines(x, dnorm(x), col = "red") det1(runif(5e5), mode = 1, bounds = c(-0.1,1.1), main = "uniform, const. elements, equal-size splits") x <- seq(-0.1,1.1,0.005); lines(x, dunif(x), col = "red")
Two-dimensional piecewise linear or constant probability density estimator based on distribution element trees (DETs).
det2(dta, mode = 2, bounds = list(NA, NA), alphag = 0.001, alphad = 0.001, main = NULL, nc = 20, dtalim = Inf, cores = 1, color = grDevices::colorRamp(c("white", "black")))
det2(dta, mode = 2, bounds = list(NA, NA), alphag = 0.001, alphad = 0.001, main = NULL, nc = 20, dtalim = Inf, cores = 1, color = grDevices::colorRamp(c("white", "black")))
dta |
matrix with two rows containing data (samples in columns). |
mode |
order of distribution elements applied, default is |
bounds |
|
alphag , alphad
|
significance levels for goodness-of-fit and independence tests, respectively, in element refinement or splitting process. Default is |
main |
an overall plot title, see |
nc |
number of contour levels (default is 20). |
dtalim |
allows to limit the number of samples used in tests guiding the element splitting process. Default is |
cores |
number of cores for parallel tree construction. Default is |
color |
function to assign plot colors that is generated, e.g., by |
## uniform require(stats) det2(rbind(runif(5e3),1+2*runif(5e3)), mode = 1, bounds = list(c(-0.1,0),c(1.1,4))) det2(rbind(1:100,101:200+runif(100)), mode = 2) # data on a line ## Gaussian require(stats); require(graphics); require(grDevices) n <- 5e3; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.5)), ncol = 2) split.screen(c(2,2)) color = colorRamp(c("#FFFFFF","#E6E680","#E6BF1A", "#E68000","#FF4026","#993380", "#4D26BF","#262680","#000000")) screen(3); plot(x, type = "p", pch = ".", main = "data") screen(1); det2(t(x), mode = 1, main = "constant det estimator", color = color) screen(2); det2(t(x), main = "linear det estimator", color = color) screen(4) det2(t(x), mode = 1, bounds = list(0,0), main = "const. det, no pre-white", color = color)
## uniform require(stats) det2(rbind(runif(5e3),1+2*runif(5e3)), mode = 1, bounds = list(c(-0.1,0),c(1.1,4))) det2(rbind(1:100,101:200+runif(100)), mode = 2) # data on a line ## Gaussian require(stats); require(graphics); require(grDevices) n <- 5e3; x <- rnorm(n) x <- matrix(c(x, x+rnorm(n,0,0.5)), ncol = 2) split.screen(c(2,2)) color = colorRamp(c("#FFFFFF","#E6E680","#E6BF1A", "#E68000","#FF4026","#993380", "#4D26BF","#262680","#000000")) screen(3); plot(x, type = "p", pch = ".", main = "data") screen(1); det2(t(x), mode = 1, main = "constant det estimator", color = color) screen(2); det2(t(x), main = "linear det estimator", color = color) screen(4) det2(t(x), mode = 1, bounds = list(0,0), main = "const. det, no pre-white", color = color)
Distribution element trees (DETs) enable the estimation of probability densities based on (possibly large) datasets. Moreover, DETs can be used for random number generation or smooth bootstrapping both in unconditional and conditional modes. In the latter mode, information about certain probability-space components is taken into account when sampling the remaining probability-space components.
The function det.construct
translates a dataset into a DET.
To evaluate the probability density based on a precomputed DET at arbitrary query points, det.query
is used.
The functions det1
and det2
provide density estimation and plotting for one- and two-dimensional datasets.
(Un)conditional smooth bootstrapping from an available DET, can be performed by det.rnd
.
To inspect the structure of a DET, the functions det.de
and det.leafs
are useful. While det.de
enables the extraction of an individual distribution element from the tree, det.leafs
extracts all leaf elements at branch ends.
Daniel Meyer, [email protected]
Distribution element tree basics and density estimation, see Meyer, D.W. (2016) http://arxiv.org/abs/1610.00345 or Meyer, D.W., Statistics and Computing (2017) https://doi.org/10.1007/s11222-017-9751-9.
DETs for smooth bootstrapping, see Meyer, D.W. (2017) http://arxiv.org/abs/1711.04632 or Meyer, D.W., Journal of Computational and Graphical Statistics (2018) https://doi.org/10.1080/10618600.2018.1482768.
Determine the split dimensions of an existing distribution element in the DET refinement-process based on statistical tests.
dimstosplit(x, sze, mode, alphag, alphad)
dimstosplit(x, sze, mode, alphag, alphad)
x |
data in element given as matrix with |
sze |
vector representing the element size. |
mode |
element order and split mode as detailed in |
alphag , alphad
|
significance levels for goodness-of-fit and independence tests, respectively, in element refinement or splitting process. |
An object comprised of the split dimension(s) or NA
for no split, and the resulting child element parameters is returned.