Title: | Estimation of Overlapping in Empirical Distributions |
---|---|
Description: | Functions for estimating the overlapping area of two or more kernel density estimations from empirical data. |
Authors: | Massimiliano Pastore [aut,cre], Pierfrancesco Alaimo Di Loro [ctb], Marco Mingione [ctb], Antonio Calcagnì [ctb] |
Maintainer: | Massimiliano Pastore |
License: | GPL-2 |
Version: | 2.1 |
Built: | 2024-11-06 06:30:33 UTC |
Source: | CRAN |
Resampling via non-parametric bootstrap to estimate the overlapping area between two or more kernel density estimations from empirical data.
boot.overlap( x, B = 1000, pairsOverlap = FALSE, ... )
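Argument | Description |
---|---|
x | a list of numerical vectors to be compared (each vector is an element of the list). |
B | integer, number of bootstrap draws. |
pairsOverlap | logical, if `TRUE`, the overlap between each pair of distributions is also computed (see `overlap`). |
... | options passed to the function `overlap`. |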
If the list `x` contains more than two elements (i.e., more than two distributions), it computes the bootstrap overlapping measure between all the q paired distributions. For example, if `x` contains three elements then q = 3; if `x` contains four elements then q = 6.
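
For k distributions the number of pairs is the binomial coefficient k(k - 1)/2, which gives the counts above. A short base-R check (illustrative only, not part of the package):

```r
# Number of pairwise comparisons for k distributions: choose(k, 2) = k(k - 1)/2
k <- c(2, 3, 4, 5)
q <- choose(k, 2)
print(q)  # 1 3 6 10

# The pairs themselves, one per column, for three named samples
pairs <- combn(c("X1", "X2", "X3"), 2)
print(pairs)
```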
It returns a list containing the following components:
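Component | Description |
---|---|
OVboot_stats | a data frame with summary statistics of the bootstrap overlap estimates. |
OVboot_dist | a matrix of bootstrap overlap estimates, with one row per bootstrap draw and one column per overlap index. |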
This function calls `overlap`.
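
To illustrate the resampling idea, here is a minimal base-R sketch of a non-parametric bootstrap for the type "1" overlap of two samples. It is not the package's code: the helper names `ov1` and `boot_ov`, the padded evaluation grid and the trapezoidal integration are assumptions made for this example. Each sample is resampled with replacement independently and the overlap is re-estimated on every draw.

```r
# Hypothetical helper: type "1" overlap (integral of the pointwise minimum)
# of two kernel density estimates evaluated on one shared, padded grid.
ov1 <- function(a, b, nbins = 512) {
  pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))  # include the kernel tails
  from <- min(a, b) - pad
  to <- max(a, b) + pad
  fa <- density(a, n = nbins, from = from, to = to)
  fb <- density(b, n = nbins, from = from, to = to)
  lo <- pmin(fa$y, fb$y)
  # trapezoidal rule on the shared grid
  sum(diff(fa$x) * (head(lo, -1) + tail(lo, -1)) / 2)
}

# Hypothetical helper: non-parametric bootstrap of ov1()
boot_ov <- function(a, b, B = 200) {
  draws <- replicate(B, {
    # resample each sample with replacement, independently
    ra <- a[sample.int(length(a), replace = TRUE)]
    rb <- b[sample.int(length(b), replace = TRUE)]
    ov1(ra, rb)
  })
  list(estimate = ov1(a, b), draws = draws)
}

set.seed(20150605)
res <- boot_ov(rnorm(100), rt(50, 8))
res$estimate
sd(res$draws)                             # bootstrap standard error
quantile(res$draws, probs = c(.05, .95))  # 90% percentile interval
```

Indexing with `sample.int()` rather than calling `sample(a, ...)` avoids the surprise that `sample()` treats a single number as the range `1:n`.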
Thanks to Jeremy Vollen for suggestions.
Massimiliano Pastore
Pastore, M. (2018). Overlapping: a R package for Estimating Overlapping in Empirical Distributions. The Journal of Open Source Software, 3 (32), 1023. doi:10.21105/joss.01023
Pastore, M., Calcagnì, A. (2019). Measuring Distribution Similarities Between Samples: A Distribution-Free Overlapping Index. Frontiers in Psychology, 10:1089. doi:10.3389/fpsyg.2019.01089
    library(overlapping)
    library(ggplot2)
    set.seed(20150605)
    x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))

    ## bootstrapping
    out <- boot.overlap(x, B = 10)
    out$OVboot_stats

    # 90% bootstrap percentile intervals (5% and 95% quantiles)
    apply(out$OVboot_dist, 2, quantile, probs = c(.05, .95))

    # plot of bootstrap distributions
    Y <- stack(data.frame(out$OVboot_dist))
    ggplot(Y, aes(values)) + facet_wrap(~ind) + geom_density()
Graphical representation of the estimated densities along with the overlapping area.
final.plot( x, pairs = FALSE, boundaries = NULL )
Argument | Description |
---|---|
x | a list of numerical vectors to be compared; each vector is an element of the list, see `overlap`. |
pairs | logical, if `TRUE`, the overlap is also plotted for each pair of distributions. |
boundaries | an optional vector indicating the minimum and the maximum over a predefined subset of the support of the empirical densities. |
It requires the package ggplot2. The output plot can be customized with standard ggplot2 layers such as scales and themes; see the examples below.
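
`final.plot` itself is built on ggplot2. To show what is being drawn, the following base-graphics sketch (not the package's code; `plot_overlap` is a hypothetical helper) plots two kernel density estimates and shades the region under their pointwise minimum, which is the area measured by the type "1" index.

```r
# Hypothetical helper: draw two KDEs and shade their overlap (pointwise minimum).
plot_overlap <- function(a, b, nbins = 512) {
  pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))
  from <- min(a, b) - pad
  to <- max(a, b) + pad
  fa <- density(a, n = nbins, from = from, to = to)
  fb <- density(b, n = nbins, from = from, to = to)
  lo <- pmin(fa$y, fb$y)
  # empty frame first, so the shading does not hide the curves
  plot(fa$x, fa$y, type = "n", ylim = c(0, max(fa$y, fb$y)),
       xlab = "x", ylab = "density")
  polygon(c(fa$x, rev(fa$x)), c(lo, rep(0, nbins)), col = "grey80", border = NA)
  lines(fa$x, fa$y)
  lines(fb$x, fb$y, lty = 2)
  invisible(lo)
}

set.seed(20150605)
pdf(tempfile(fileext = ".pdf"))  # file device, so the sketch also runs non-interactively
lo <- plot_overlap(rnorm(100), rt(50, 8))
invisible(dev.off())
```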
Massimiliano Pastore
    library(overlapping)
    library(ggplot2)
    set.seed(20150605)
    x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))
    final.plot(x)
    final.plot(x, pairs = TRUE)

    # customizing plot
    final.plot(x) + scale_fill_brewer() + scale_color_brewer()
    final.plot(x) + theme(text = element_text(size = 15))
It returns the estimated overlapping area between two or more kernel density estimates from empirical data. The overlapping measure can be computed either as the integral of the minimum between two densities (`type = "1"`) or as the proportion of overlapping area between two densities (`type = "2"`). In the latter case, the integral of the minimum between the two densities is divided by the integral of their maximum.
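
In symbols, writing f_A and f_B for the two estimated densities (notation introduced here for illustration), the two indices are as follows. Because max{f_A, f_B} + min{f_A, f_B} = f_A + f_B and each density integrates to 1, the second index is a simple function of the first (exactly for proper densities, approximately for estimates evaluated on a finite grid).

```latex
\mathrm{OV}_1 = \int \min\{f_A(x), f_B(x)\}\,dx,
\qquad
\mathrm{OV}_2 = \frac{\int \min\{f_A(x), f_B(x)\}\,dx}{\int \max\{f_A(x), f_B(x)\}\,dx}
             = \frac{\mathrm{OV}_1}{2 - \mathrm{OV}_1}.
```

For example, for the populations behind the "normalized overlap" example below, U(0, 1) and U(0.5, 1), OV_1 = 0.5 and OV_2 = 0.5/1.5 = 1/3; the kernel estimates will differ somewhat because smoothing blurs the sharp edges of the uniform densities.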
overlap( x, nbins = 1024, type = c( "1", "2" ), pairsOverlap = TRUE, plot = FALSE, boundaries = NULL, get_xpoints = FALSE, ... )
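Argument | Description |
---|---|
x | a list of numerical vectors to be compared (each vector is an element of the list). |
nbins | number of equally spaced points through which the density estimates are compared; see `density`. |
type | character, type of index. If `type = "1"` (default), the integral of the minimum between the densities is computed; if `type = "2"`, the proportion of overlapped area, i.e. the integral of the minimum divided by the integral of the maximum. |
pairsOverlap | logical, if `TRUE` (default), the overlap between each pair of densities is also returned (component `OVpairs`). |
plot | logical, if `TRUE`, all the overlapped areas are plotted (requires ggplot2). |
boundaries | an optional vector indicating the minimum and the maximum over a predefined subset of the support of the empirical densities, see below. |
get_xpoints | logical, if `TRUE`, the intersection points among the densities are returned (component `xpoints`). |
... | optional arguments to be passed to the function `density`. |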
When dealing with two densities, `type = "1"` corresponds to the integral of the minimum between the two densities, and `type = "2"` corresponds to the proportion of the overlapped area over the total area.
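
Both indices can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the densities are evaluated with `density()` on one shared grid padded by three bandwidths, the integrals are approximated with the trapezoidal rule, and `overlap_two` is a hypothetical name.

```r
# Hypothetical helper: type "1" and type "2" overlap of two samples.
overlap_two <- function(a, b, nbins = 1024) {
  # shared grid, padded by 3 bandwidths so the kernel tails are included
  pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))
  from <- min(a, b) - pad
  to <- max(a, b) + pad
  fa <- density(a, n = nbins, from = from, to = to)
  fb <- density(b, n = nbins, from = from, to = to)
  grid <- fa$x
  # trapezoidal rule on the shared grid
  trapz <- function(y) sum(diff(grid) * (head(y, -1) + tail(y, -1)) / 2)
  lo <- pmin(fa$y, fb$y)  # pointwise minimum
  hi <- pmax(fa$y, fb$y)  # pointwise maximum
  c(type1 = trapz(lo), type2 = trapz(lo) / trapz(hi))
}

set.seed(20150605)
ov <- overlap_two(rnorm(100), rt(50, 8))
print(ov)
```

Using one grid for both densities matters: `pmin()` and `pmax()` compare the two curves point by point, which only makes sense when their values refer to the same abscissae.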
If the list `x` contains more than two elements (i.e., more than two distributions), it computes both the multiple and the pairwise overlap among all distributions.
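
The pairwise part can be sketched by evaluating every density on one common grid and then integrating the pointwise minimum for each pair returned by `combn()`. This is a base-R illustration, not the package's internal code; the helper `pairwise_overlap`, the padded grid and the trapezoidal rule are assumptions.

```r
# Hypothetical helper: pairwise type "1" overlaps for a named list of samples.
pairwise_overlap <- function(x, nbins = 1024) {
  pad <- 3 * max(sapply(x, bw.nrd0))
  from <- min(unlist(x)) - pad
  to <- max(unlist(x)) + pad
  # nbins x k matrix: one column of density values per sample, same grid
  dens <- sapply(x, function(v) density(v, n = nbins, from = from, to = to)$y)
  grid <- seq(from, to, length.out = nbins)
  trapz <- function(y) sum(diff(grid) * (head(y, -1) + tail(y, -1)) / 2)
  pairs <- combn(names(x), 2)  # one pair per column
  ov <- apply(pairs, 2, function(p) trapz(pmin(dens[, p[1]], dens[, p[2]])))
  names(ov) <- apply(pairs, 2, paste, collapse = "-")
  ov
}

set.seed(20150605)
x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))
ovp <- pairwise_overlap(x)
print(ovp)
```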
If `plot = TRUE`, all the overlapped areas are plotted; this requires ggplot2.
The optional vector `boundaries` must contain two numbers: the empirical minimum and maximum of the overlapped area. See the examples below.
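
One plausible reading of `boundaries`, assumed here rather than taken from the package source, is that the densities are evaluated and compared only between the given minimum and maximum. The sketch below (hypothetical helper `overlap_within`) passes the boundaries as the `from` and `to` arguments of `density()`, so the integral of the minimum covers only that interval.

```r
# Hypothetical helper: type "1" overlap restricted to [boundaries[1], boundaries[2]].
overlap_within <- function(a, b, boundaries, nbins = 1024) {
  from <- boundaries[1]
  to <- boundaries[2]
  # densities are only evaluated (and compared) inside [from, to]
  fa <- density(a, n = nbins, from = from, to = to)
  fb <- density(b, n = nbins, from = from, to = to)
  lo <- pmin(fa$y, fb$y)
  sum(diff(fa$x) * (head(lo, -1) + tail(lo, -1)) / 2)
}

set.seed(20150605)
a <- runif(100)
b <- runif(100, .5, 1)
ov_full <- overlap_within(a, b, boundaries = range(c(a, b)))
ov_sub <- overlap_within(a, b, boundaries = c(.5, 1))
c(full = ov_full, restricted = ov_sub)
```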
It returns a list containing the following components:
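Component | Description |
---|---|
OV | estimate of the overlapped area (the multiple overlap when `x` contains more than two distributions). |
xpoints | a list of intersection points (in abscissa) among the densities (if `get_xpoints = TRUE`). |
OVpairs | the estimates of the overlapped areas for each pair of densities (only if `pairsOverlap = TRUE`). |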
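
Intersection points are the abscissae where two density curves cross. A minimal sketch of how such points can be located on a grid (an assumed approach, with the hypothetical helper `crossings`): find the grid intervals where the difference between the two curves changes sign, then interpolate linearly inside each interval.

```r
# Hypothetical helper: approximate crossing points of two KDEs.
crossings <- function(a, b, nbins = 1024) {
  pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))
  from <- min(a, b) - pad
  to <- max(a, b) + pad
  fa <- density(a, n = nbins, from = from, to = to)
  fb <- density(b, n = nbins, from = from, to = to)
  d <- fa$y - fb$y
  # grid intervals [i, i + 1] over which the difference changes sign
  i <- which(head(d, -1) * tail(d, -1) < 0)
  x0 <- fa$x[i]
  x1 <- fa$x[i + 1]
  # linear interpolation of the zero of d inside each interval
  x0 - d[i] * (x1 - x0) / (d[i + 1] - d[i])
}

set.seed(20150605)
a <- rnorm(200)
b <- rnorm(200, mean = 2)
xp <- crossings(a, b)
print(xp)
```

For these two samples the population densities N(0, 1) and N(2, 1) cross at x = 1, so the estimated crossings should lie near that value.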
This function calls `ovmult`.
Massimiliano Pastore, Pierfrancesco Alaimo Di Loro, Marco Mingione
Pastore, M. (2018). Overlapping: a R package for Estimating Overlapping in Empirical Distributions. The Journal of Open Source Software, 3 (32), 1023. doi:10.21105/joss.01023
Pastore, M., Calcagnì, A. (2019). Measuring Distribution Similarities Between Samples: A Distribution-Free Overlapping Index. Frontiers in Psychology, 10:1089. doi:10.3389/fpsyg.2019.01089
    library(overlapping)
    set.seed(20150605)
    x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))
    overlap(x, plot = TRUE)

    # including boundaries
    x <- list(X1 = runif(100), X2 = runif(100, .5, 1))
    overlap(x, plot = TRUE, boundaries = c(.5, 1))
    x <- list(X1 = runif(100), X2 = runif(50), X3 = runif(30))
    overlap(x, plot = TRUE, boundaries = c(.1, .9))

    # changing kernel
    overlap(x, plot = TRUE, kernel = "rectangular")

    # normalized overlap
    N <- 1e5
    x <- list(X1 = runif(N), X2 = runif(N, .5))
    overlap(x)
    overlap(x, type = "2")
It gives the overlap area between two or more kernel density estimations from empirical data.
ovmult( x, nbins = 1024, type = c( "1", "2" ), boundaries = NULL, get_xpoints = FALSE, ... )
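Argument | Description |
---|---|
x | a list of numerical vectors to be compared (each vector is an element of the list). |
nbins | number of equally spaced points through which the density estimates are compared; see `density`. |
type | character, type of index. If `type = "1"` (default), the integral of the minimum among the densities is computed; if `type = "2"`, the proportion of overlapped area (the integral of the minimum divided by the integral of the maximum); see `overlap`. |
boundaries | an optional vector indicating the minimum and the maximum over a predefined subset of the support of the empirical densities. |
get_xpoints | logical, if `TRUE`, the intersection points among the densities are also returned. |
... | optional arguments to be passed to the function `density`. |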
If the list `x` contains more than two elements (i.e., more than two distributions), it computes multiple overlap measures.
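
For more than two samples, one natural multiple-overlap measure is the integral of the pointwise minimum of all the densities. Whether `ovmult` uses exactly this definition is an assumption here; the sketch below (hypothetical helper `ov_multi`, base R only) computes it on one shared grid and shows that it cannot exceed the overlap of any single pair, since the minimum over more curves is pointwise smaller.

```r
# Hypothetical helper: integral of the pointwise minimum over the densities
# named in `cols`, all evaluated on the same grid built from every sample in x.
ov_multi <- function(x, cols = names(x), nbins = 1024) {
  pad <- 3 * max(sapply(x, bw.nrd0))
  from <- min(unlist(x)) - pad
  to <- max(unlist(x)) + pad
  # nbins x k matrix: one column of density values per sample
  dens <- sapply(x, function(v) density(v, n = nbins, from = from, to = to)$y)
  grid <- seq(from, to, length.out = nbins)
  lo <- apply(dens[, cols, drop = FALSE], 1, min)  # pointwise minimum
  sum(diff(grid) * (head(lo, -1) + tail(lo, -1)) / 2)
}

set.seed(20150605)
x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))
ov_all <- ov_multi(x)
ov_12 <- ov_multi(x, cols = c("X1", "X2"))
c(all = ov_all, X1_X2 = ov_12)
```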
The optional vector `boundaries` must contain two numbers: the empirical minimum and maximum of the overlapped area. See the examples below.
It returns the value of the overlapped area.
It is called from the function `overlap`.
Pierfrancesco Alaimo Di Loro, Marco Mingione, Massimiliano Pastore
    library(overlapping)
    set.seed(20150605)
    x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))
    ovmult(x)
    ovmult(x, type = "2")

    # including boundaries
    x <- list(X1 = runif(100), X2 = runif(100, .5, 1))
    ovmult(x, boundaries = c(0, .8))
    x <- list(X1 = runif(100), X2 = runif(50), X3 = runif(30))
    ovmult(x, boundaries = c(.2, .8))

    # changing kernel
    ovmult(x, kernel = "rectangular")