Title: | High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls |
---|---|
Description: | Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling. |
Authors: | Benjamin Blonder, with contributions from Cecina Babich Morrow, Stuart Brown, Gregoire Butruille, Daniel Chen, Alex Laini, and David J. Harris |
Maintainer: | Benjamin Blonder <[email protected]> |
License: | GPL-3 |
Version: | 3.1.4 |
Built: | 2024-12-28 06:41:20 UTC |
Source: | CRAN |
Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.
A frequently asked questions document (FAQ) can be found at http://www.benjaminblonder.org/hypervolume_faq.html. More details are also available in a user guide within our 2018 paper (see reference below).
Benjamin Blonder, with contributions from Cecina Babich Morrow, Stuart Brown, Gregoire Butruille, Daniel Chen, Alex Laini, and David J. Harris
Maintainer: Benjamin Blonder <[email protected]>
Blonder, B., Lamanna, C., Violle, C. and Enquist, B. J. (2014), The n-dimensional hypervolume. Global Ecology and Biogeography, 23: 595-609. doi: 10.1111/geb.12146
Blonder, B. Do Hypervolumes Have Holes?, The American Naturalist, 187(4) E93-E105. doi: 10.1086/685444
Blonder, B., Morrow, C.B., Maitner, B., et al. New approaches for delineating n-dimensional hypervolumes. Methods Ecol Evol. 2018;9:305-319. doi: 10.1111/2041-210X.12865
Data for occurrences of Acacia and Pinus species based on geographic observations.
data(acacia_pinus)
data(acacia_pinus)
A data.frame
with 37845 observations on the following 3 variables.
Species
a character
containing 139 unique values
Latitude
a numeric vector
Longitude
a numeric vector
Occurrence data come from the BIEN database (https://biendata.org/
).
Blonder, B., Lamanna, C., Violle, C., Enquist, B. The n-dimensional hypervolume. Global Ecology and Biogeography (2014).
Data generated by picking 100 points randomly within a circle of radius 1. See the vignette on occupancy for information about its usage.
data(circles)
data(circles)
A list
with 20 objects of class matrix
. Each matrix
contains 100 rows and 2 columns.
[,1]
x coordinate
[,2]
y coordinate
copy_param_hypervolume
takes in a hypervolume and data. After detecting the method used to generate the input hypervolume, the function returns a new hypervolume generated from the data using the same method and parameters as the input hypervolume.
copy_param_hypervolume(hv, data, name = NULL)
copy_param_hypervolume(hv, data, name = NULL)
hv |
hypervolume object |
data |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
name |
String name of hypervolume |
copy_param_hypervolume
only works if the input hypervolume was generated using method = "box"
, method = "gaussian"
, or method = "svm"
. Calling this function on hypervolumes generated from hypervolume_set
will result in an error.
Note that kde.bandwidth
is affected by size of the data and will be re-estimated using whichever method was used to generate the original bandwidth if method = "gaussian"
or method = "box"
.
Use hv@Parameters
to see what parameters are copied from the input hypervolume.
hypervolume object
## Not run: library(palmerpenguins) data("penguins") bill_data = na.omit(penguins[,3:4]) hv = hypervolume(data = bill_data, method = "gaussian", quantile.requested = .9, quantile.requested.type = "volume") # Generates a new hypervolume using the same hypervolume and data hv_copy = copy_param_hypervolume(hv, hv@Data) # Check to see that the information of the two hypervolumes is the same print(hv) print(hv_copy) ## End(Not run)
## Not run: library(palmerpenguins) data("penguins") bill_data = na.omit(penguins[,3:4]) hv = hypervolume(data = bill_data, method = "gaussian", quantile.requested = .9, quantile.requested.type = "volume") # Generates a new hypervolume using the same hypervolume and data hv_copy = copy_param_hypervolume(hv, hv@Data) # Check to see that the information of the two hypervolumes is the same print(hv) print(hv_copy) ## End(Not run)
Estimates bandwidth vector from data using multiple approaches.
estimate_bandwidth(data,method="silverman",value=NULL)
estimate_bandwidth(data,method="silverman",value=NULL)
data |
m x n matrix or data frame, where m is the number of observations and n the number of dimensions. |
method |
One of |
value |
If |
The fixed ("fixed"
) is a constant value (scalar or vector of length equal to the dimensionality of the data). The value can be set via the value
argument. If the input has length 1, the value will be repeated for all dimensions.
The Silverman ("silverman"
) estimator is defined as (4/(n+2))^(1/(n+4)) * m^(-1/(n+4))*sd(X) where m is the number of observations, n is the dimensionality, and X is the data vector in each dimension. This corresponds to the Silverman rule of thumb for multivariate data and is chosen as the default for computational speed, though other more advanced algorithms may perform better.
The Silverman ("silverman-1d"
) estimator is defined as 1.06 * sd(X) * m^(-1/5) where m is the number of observations and X is the data vector in each dimension. Minimizes mean integrated square error under the assumption the data are univariate normal. This was the default behavior in versions 1.x and 2.x of the package.
The plug-in ("plug-in"
) estimator is defined using a diagonal plug-in estimator with a 2-stage pilot estimation and a pre-scaling transformation (in ks::Hpi.diag
). The resulting diagonal variances are then transformed to standard deviations and multiplied by two to be consistent for the box kernels used here. Available only in n<7 dimensions. Minimizes sum of asymptotic mean squared error.
The cross-validation ("cross-validation"
) estimator is defined using a diagonal smoothed cross validation estimator with a 2-stage pilot estimation and a pre-scaling transformation (in ks::Hscv.diag
). The resulting diagonal variances are then transformed to standard deviations and multiplied by two to be consistent for the box kernels used here. Available only in n<7 dimensions. Minimizes sum of asymptotic mean squared error.
Note that all estimators are optimal only for normal kernels, whereas the hypervolume algorithms use box kernels - as the number of data points increases, this difference will become increasingly less important.
Computational run-times for the plug-in and cross-validation estimators may become infeasibly large in n>=4 dimensions.
Vector of length n with each entry corresponding to the estimated bandwidth along each axis. An attribute method
is also set indicating the algorithm used.
Duong, T. (2007) ks: Kernel Density Estimation and Kernel Discriminant Analysis for Multivariate Data in R. Journal of Statistical Software 21, (7)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] estimate_bandwidth(penguins_adelie,method="fixed",value=c(2,1,2)) estimate_bandwidth(penguins_adelie,method="silverman") estimate_bandwidth(penguins_adelie,method="plug-in") # may be quite slow to run estimate_bandwidth(penguins_adelie,method="cross-validation") # may be quite slow to run ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] estimate_bandwidth(penguins_adelie,method="fixed",value=c(2,1,2)) estimate_bandwidth(penguins_adelie,method="silverman") estimate_bandwidth(penguins_adelie,method="plug-in") # may be quite slow to run estimate_bandwidth(penguins_adelie,method="cross-validation") # may be quite slow to run ## End(Not run)
Generates expectation hypervolume corresponding to a hypersphere that minimally encloses the data.
expectation_ball(input, point.density = NULL, num.samples = NULL, use.random = FALSE)
expectation_ball(input, point.density = NULL, num.samples = NULL, use.random = FALSE)
input |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
point.density |
The point density of the output expectation. If |
num.samples |
The number of points in the output expectation. If |
use.random |
If |
A Hypervolume-class
object corresponding to the expectation.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_ball <- expectation_ball(penguins_adelie)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_ball <- expectation_ball(penguins_adelie)
Generates expectation hypervolume corresponding to an axis-aligned hyperbox that minimally encloses the data.
expectation_box(input, point.density = NULL, num.samples = NULL, use.random = FALSE)
expectation_box(input, point.density = NULL, num.samples = NULL, use.random = FALSE)
input |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
point.density |
The point density of the output expectation. If |
num.samples |
The number of points in the output expectation. If |
use.random |
If |
A Hypervolume-class
object corresponding to the expectation.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_box <- expectation_box(penguins_adelie)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_box <- expectation_box(penguins_adelie)
Generates expectation hypervolume corresponding to a convex hull (polytope) that minimally encloses the data.
expectation_convex(input, point.density = NULL, num.samples = NULL, num.points.on.hull = NULL, check.memory = TRUE, verbose = TRUE, use.random = FALSE, method = "hitandrun", chunksize = 1000)
expectation_convex(input, point.density = NULL, num.samples = NULL, num.points.on.hull = NULL, check.memory = TRUE, verbose = TRUE, use.random = FALSE, method = "hitandrun", chunksize = 1000)
input |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
point.density |
The point density of the output expectation. If |
num.samples |
The number of points in the output expectation. If |
num.points.on.hull |
Number of points of the input used to calculate the convex hull. Larger values are more accurate but may lead to slower runtimes. If |
check.memory |
If |
verbose |
If |
use.random |
If |
method |
One of |
chunksize |
Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function; only how this output is internally assembled. |
The rejection sampling algorithm generates random points within a hyperbox enclosing the points, then sequentially tests whether each is in or out of the convex polytope based on a dot product test. It becomes exponentially inefficient in high dimensionalities. The hit-and-run sampling algorithm generates a Markov chain of samples that eventually converges to the true distribution of points within the convex polytope. It performs better in high dimensionalities but may not converge quickly. It will also be slow if the number of simplices on the convex polytope is large.
Both algorithms may become impracticably slow in >= 6 or 7 dimensions.
A Hypervolume-class
object corresponding to the expectation hypervolume.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_convex <- expectation_convex(penguins_adelie, check.memory=FALSE) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] e_convex <- expectation_convex(penguins_adelie, check.memory=FALSE) ## End(Not run)
Creates a hypervolume from a set of points reflecting the maximal expectation.
expectation_maximal(input, ...)
expectation_maximal(input, ...)
input |
A dataset to be used as input to the |
... |
Arguments to the |
This function is effectively an alias for the hypervolume
function. You must decide what the maximal expectation is yourself!
A Hypervolume object.
The find_optimal_occupancy_thin()
function is used to find the optimal parameters for
hypervolume_n_occupancy()
.
find_optimal_occupancy_thin(..., verbose = TRUE, sequence = seq(0, 1, 0.1), n = 10, res_type = "raw")
find_optimal_occupancy_thin(..., verbose = TRUE, sequence = seq(0, 1, 0.1), n = 10, res_type = "raw")
... |
Parameters to be used to run |
verbose |
Logical value; print diagnostic output if |
sequence |
Quantiles to be tested. |
n |
Number of seeds to be tested. |
res_type |
If |
The find_optimal_occupancy_thin()
function searches for the optimal parameters for running hypervolume_n_occupancy()
. It works by testing different quantiles and n
seeds for random number generation (the same set of n seeds is tested for each quantile). RMSE is returned as the measure of the goodness of fit and results are ordered by increasing RMSE when res_type = "raw"
. Quantile equal to 0 correspond to no thin. The obtained parameters can be used to feed arguments quant.thin
and seed
within the function hypervolume_n_occupancy()
.
A data.frame
.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # find optimal parameters opt_par = find_optimal_occupancy_thin(hv_list, classification = rep(c("female", "male"), 3), n = 20) head(opt_par) unoptimized_hv_occ = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) optimized_hv_occ = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3), quant.thin = opt_par[1, 2], seed = opt_par[1, 1]) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # find optimal parameters opt_par = find_optimal_occupancy_thin(hv_list, classification = rep(c("female", "male"), 3), n = 20) head(opt_par) unoptimized_hv_occ = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) optimized_hv_occ = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3), quant.thin = opt_par[1, 2], seed = opt_par[1, 1]) ## End(Not run)
Returns the column mean of the random points in each hypervolume.
get_centroid(hv)
get_centroid(hv)
hv |
A |
Either a vector or a matrix of column of centroid values along each axis.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie) get_centroid(hv) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie) get_centroid(hv) ## End(Not run)
Returns the column weighted mean of the random points in each hypervolume. Useful for hypervolumes generated with hypervolume_n_occupancy()
or hypervolume_n_occupancy_test()
.
get_centroid_weighted(hv)
get_centroid_weighted(hv)
hv |
A |
The function get_centroid_weighted()
differs from get_centroid()
because it uses occupancy values to weight random points for evaluating centroids position.
Either a vector or a matrix of column of centroid values along each axis.
hypervolume_n_occupancy
, hypervolume_n_occupancy_test
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) hv_list = lapply(penguins_no_na_split, function(x) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100)) hv_list = hypervolume_join(hv_list) hv_occupancy = hypervolume_n_occupancy(hv_list) # unweighted centroids get_centroid(hv_occupancy) # weighted centroids get_centroid_weighted(hv_occupancy) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) hv_list = lapply(penguins_no_na_split, function(x) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100)) hv_list = hypervolume_join(hv_list) hv_occupancy = hypervolume_n_occupancy(hv_list) # unweighted centroids get_centroid(hv_occupancy) # weighted centroids get_centroid_weighted(hv_occupancy) ## End(Not run)
The get_occupancy_intersection_bootstrap()
function is used to get the volume of the intersection of objects generated with hypervolume_n_occupancy_bootstrap()
. It provides raw values or summary statistics for all the hypervolumes or their n_wise combinations.
get_occupancy_intersection_bootstrap(path, method = "n_wise", res_type = "summary", m = 2, relative = FALSE, tol = 1e-10)
get_occupancy_intersection_bootstrap(path, method = "n_wise", res_type = "summary", m = 2, relative = FALSE, tol = 1e-10)
path |
A path to a directory of bootstrapped occupancy objects obtained with |
method |
If |
res_type |
It can be |
m |
Number of elements to choose. Default to 2 (pairwise comparisons). This argument is ignored when |
relative |
If |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The function get_occupancy_intersection_bootstrap()
returns the volume of the intersection for each bootstrapped occupancy object if res_type = "raw"
and method = "all"
. When res_type = "summary"
and method = "all"
this function returns the mean volume as well as the standard deviation, median, minimum, maximum, 2.5% and 97.5% quantiles, skewness and kurtosis of the intersection. The same summary statistics are calculated for each n_wise combination of hypervolumes when res_type = "summary"
and method = "n_wise"
. The number of elements of n_wise combinations is set with the argument m
. The intersection is calculated by finding the set of random points shared by all or n_wise combinations of hypervolumes in each of the bootstrapped occupancy objects. More details on how the intersection is computed in occupancy_to_intersection()
.
The get_occupancy_intersection_bootstrap()
function attempts to reconstruct the volume of the intersection from each bootstrapped occupancy object. At first, the volume of the union of hypervolumes is calculated for each hypervolume of the jth bootstrapped occupancy object as the ratio between the total number of random points and the number of random points of the ith hypervolume of the jth bootstrapped occupancy object, multiplied by the volume of the ith hypervolume of the jth bootstrapped occupancy object. This step results in a number of reconstructed volumes equal to the number of hypervolumes in the jth bootstrapped occupancy object. Reconstructed volumes are then compared among each other to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated using the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned. The volume of the intersection is then calculated as the ratio between the number of random points of the intersection and the total number of random points, multiplied by the volume of the union of hypervolumes.
When relative = TRUE
relative instead of absolute volumes are returned. The relative volume is calculated as the ratio between the volume of the intersection and the volume of the union of all the hypervolumes (or combination of hypervolumes when method = "n_wise"
). The same approach described above is used to reconstruct the volume of the union of hypervolumes.
A data.frame
with bootstrapped volumes or summary statistics of the intersection.
hypervolume_n_occupancy
, hypervolume_n_occupancy_bootstrap
, occupancy_to_intersection
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # get the intersection get_occupancy_intersection_bootstrap(hv_occupancy_boot_sex) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # get the intersection get_occupancy_intersection_bootstrap(hv_occupancy_boot_sex) ## End(Not run)
Functions get_occupancy_stats()
and get_occupancy_stats_bootstrap()
return the results of a function applied to hypervolumes generated with hypervolume_n_occupancy()
,
hypervolume_n_occupancy_bootstrap()
, hypervolume_n_occupancy_permute()
or hypervolume_n_occupancy_test()
.
get_occupancy_stats(hv, FUN, remove_zeroes = TRUE) get_occupancy_stats_bootstrap(path, FUN, remove_zeroes = TRUE, method = "pairwise", res_type = "summary", verbose = TRUE, cores = 1)
get_occupancy_stats(hv, FUN, remove_zeroes = TRUE) get_occupancy_stats_bootstrap(path, FUN, remove_zeroes = TRUE, method = "pairwise", res_type = "summary", verbose = TRUE, cores = 1)
hv |
A |
FUN |
The function to be applied. |
remove_zeroes |
Remove zeroes before the calculation. See Details. |
path |
A path to a directory of bootstrapped hypervolumes obtained with
|
method |
If |
res_type |
It can be |
verbose |
Logical value; print diagnostic output if |
cores |
Number of logical cores to use while generating permuted hypervolumes. If parallel backend already registered to |
The get_occupancy_stats()
and get_occupancy_stats_bootstrap()
functions take ValueAtRandomPoints
of each hypervolume as input to FUN
(e.g. mean, median).
The get_occupancy_stats_bootstrap()
function applies the function to bootstrapped occupancy objects generated with hypervolume_n_occupancy_bootstrap()
. If res_type = "raw"
raw values of the applied functions are returned for each occupancy object in path
, only when method = "all"
. If res_type = "summary"
the mean value as well as the standard deviation, median, minimum, maximum, 2.5% and 97.5% quantiles, skewness and kurtosis are returned either for individual hypervolumes (method = "all"
) or pairwise comparisons (method = "pairwise"
).
The get_occupancy_stats()
and get_occupancy_stats_bootstrap()
functions remove occupancy values equal to 0 by default. These values are generated during the occupancy routine when a random point is included in some groups of hypervolumes but not in others. A tipical usage of
get_occupancy_stats()
or get_occupancy_stats_bootstrap()
should remove zeroes before applying a function (the default).
Either a vector
, a matrix
or a data.frame
with the results of the applied function.
hypervolume_n_occupancy
, hypervolume_n_occupancy_bootstrap
,
hypervolume_n_occupancy_permute
, hypervolume_n_occupancy_test
## Not run: ##### single occupancy object ##### data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # calculate the mean occupancy value get_occupancy_stats(hv_occupancy_list_sex, mean) ##### bootstrapped occupancy objects ##### # bootstrap input hypervolumes hv_boot = hypervolume_n_resample(name = "example", hv_list = hv_list, n = 9) # calculate occupancy on bootstrapped hypervolumes hv_boot_occ = hypervolume_n_occupancy_bootstrap(hv_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # calculate summary statistics for pairwise comparisons get_occupancy_stats_bootstrap(hv_boot_occ, FUN = mean) ## End(Not run)
## Not run: ##### single occupancy object ##### data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # calculate the mean occupancy value get_occupancy_stats(hv_occupancy_list_sex, mean) ##### bootstrapped occupancy objects ##### # bootstrap input hypervolumes hv_boot = hypervolume_n_resample(name = "example", hv_list = hv_list, n = 9) # calculate occupancy on bootstrapped hypervolumes hv_boot_occ = hypervolume_n_occupancy_bootstrap(hv_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # calculate summary statistics for pairwise comparisons get_occupancy_stats_bootstrap(hv_boot_occ, FUN = mean) ## End(Not run)
The function get_occupancy_volume_bootstrap()
extract the volume from objects generated with hypervolume_n_occupancy_bootstrap()
. It provides raw values or summary statistics for both single hypervolumes or their pairwise comparisons.
get_occupancy_volume_bootstrap(path, method = "all", res_type = "raw", relative = FALSE, tol = 1e-10)
get_occupancy_volume_bootstrap(path, method = "all", res_type = "raw", relative = FALSE, tol = 1e-10)
path |
A path to a directory containing bootstrapped occupancy objects generated with |
method |
If |
res_type |
It can be |
relative |
If |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The function get_occupancy_volume_bootstrap()
returns the volume for each bootstrapped hypervolume if res_type = "raw"
and method = "all"
. When res_type = "summary"
and method = "all"
this function returns the mean volume as well as the standard deviation, median, minimum, maximum, 2.5% and 97.5% quantiles, skewness and kurtosis for each of hypervolume. The same summary statistics are calculated for the difference of volume for each pairwise combination of hypervolumes when res_type = "summary"
and method = "pairwise"
.
When relative = TRUE
relative instead of absolute volumes are returned. The relative volume is calculated as the ratio between the volume of an hypervolume and the volume of the union of all the hypervolumes. The get_occupancy_volume_bootstrap()
function attempts to reconstruct the volume of the union of all the hypervolumes from each bootstrapped hypervolume. At first, the volume of the union of hypervolumes is calculated for each hypervolume of the jth bootstrapped occupancy_object as the the ratio between the total number of random points and the number of random points of the ith hypervolume of the jth bootstrapped occupancy_object, multiplied by the volume of the ith hypervolume of the jth bootstrapped occupancy_object. This step results in a number of reconstructed volumes equal to the number of hypervolumes in the jth bootstrapped occupancy_object. Reconstructed volumes are then compared among each other to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated with the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned.
A data.frame
with bootstrapped volumes or summary statistics for single hypervolumes or their pairwise comparisons.
hypervolume_n_occupancy
, hypervolume_n_occupancy_bootstrap
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # get the volume of the bootstrapped hypervolumes get_occupancy_volume_bootstrap(hv_occupancy_boot_sex) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) # get the volume of the bootstrapped hypervolumes get_occupancy_volume_bootstrap(hv_occupancy_boot_sex) ## End(Not run)
The function get_relative_volume()
computes the relative volume from objects generated with the occupancy routine.
get_relative_volume(hv_list, tol = 1e-10)
get_relative_volume(hv_list, tol = 1e-10)
hv_list |
A |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The relative volume is calculated as the ratio between hypervolumes of an HypervolumeList
and the volume resulting from the union of hypervolumes in the same HypervolumeList
. Relative volumes can be calculated only for HypervolumeList
generated with functions hypervolume_n_occupancy()
, hypervolume_n_occupancy_test()
, hypervolume_n_occupancy_permute()
, occupancy_to_union()
, occupancy_to_ushared()
, occupancy_to_intersection()
or occupancy_filter()
.
The get_relative_volume()
function attempts to reconstruct the volume of the union of hypervolumes from hv_list
. At first, the volume of the union of hypervolumes is calculated for each hypervolume of hv_list
as the the ratio between the total number of random points and the number of random points of the ith hypervolume of hv_list
, multiplied by the volume of the ith hypervolume hv_list
. This step results in a number of reconstructed volumes equal to the number of hypervolumes in hv_list
. Reconstructed volumes are then compared to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated with the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned.
A named numeric vector with the relative volume of each input hypervolume
hypervolume_n_occupancy
, hypervolume_n_occupancy_permute
, hypervolume_n_occupancy_test
, occupancy_to_union
,
occupancy_to_unshared
, occupancy_to_intersection
,
occupancy_filter
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # get the relative volume get_relative_volume(hv_occupancy_list_sex) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # get the relative volume get_relative_volume(hv_occupancy_list_sex) ## End(Not run)
Extract volume from Hypervolume or HypervolumeList object
## S3 method for class 'Hypervolume' get_volume(object) ## S3 method for class 'HypervolumeList' get_volume(object)
## S3 method for class 'Hypervolume' get_volume(object) ## S3 method for class 'HypervolumeList' get_volume(object)
object |
A |
A named numeric vector with the volume of each input hypervolume
Constructs hypervolumes using one of several possible methods after error-checking input data.
hypervolume(data, method = "gaussian", ...)
hypervolume(data, method = "gaussian", ...)
data |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
method |
One of |
... |
Further arguments passed to |
Checks for collinearity, missingness of input data, and appropriate random point coverage. Generates warning/errors as appropriate.
A Hypervolume-class
object corresponding to the inferred hypervolume.
weight_data
, estimate_bandwidth
, expectation_convex
, expectation_ball
, expectation_box
, hypervolume_threshold
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume(penguins_adelie,method='box')
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume(penguins_adelie,method='box')
Constructs a hypervolume from a set of observations via thresholding a kernel density estimate of the observations. Assumes an axis-aligned hyperbox kernel.
hypervolume_box(data, name = NULL, verbose = TRUE, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), kde.bandwidth = 2*estimate_bandwidth(data), tree.chunksize = 10000)
hypervolume_box(data, name = NULL, verbose = TRUE, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), kde.bandwidth = 2*estimate_bandwidth(data), tree.chunksize = 10000)
data |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
name |
A string to assign to the hypervolume for later output and plotting. Defaults to the name of the variable if NULL. |
verbose |
Logical value; print diagnostic output if |
samples.per.point |
Number of random points to be evaluated per data point in |
kde.bandwidth |
A scalar or a n x 1 vector corresponding to the half-width of the box kernel in each dimension. If a scalar input, the single value is used for all dimensions. Several esimation methods are available in |
tree.chunksize |
Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function; only how this output is internally assembled. |
Constructs a kernel density estimate by overlaying hyperbox kernels on each datapoint, then sampling uniformly random points from each kernel. Kernel density at each point is then determined by a range query on a recursive partitioning tree and used to resample these random points to a uniform density and fixed number, from which a volume can be inferred.
Note that when comparing among hypervolumes constructed with fixed bandwidth, volume will be approximately a an approximately linear function of the number of input data points.
Note that this function returns an unthresholded hypervolume. To assign a quantile threshold, use hypervolume_threshold
.
A Hypervolume-class
object corresponding to the inferred hypervolume.
hypervolume_threshold
, estimate_bandwidth
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') summary(hv)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') summary(hv)
Calculates the distance between two hypervolumes either defined as the Euclidean distance between centroids or as the minimum Euclidean distance between the random points comprising either hypervolume.
hypervolume_distance(hv1, hv2, type = "centroid", num.points.max = 1000, check.memory = TRUE)
hypervolume_distance(hv1, hv2, type = "centroid", num.points.max = 1000, check.memory = TRUE)
hv1 |
A |
hv2 |
A |
type |
If 'centroid', the centroid distance; if 'minimum', the minimum distance. |
num.points.max |
The number of random points to subsample from each input hypervolume. Ignored if |
check.memory |
If |
Minimum distance calculations scale quadratically with npmax
and may be computationally costly.
The distance between the two hypervolumes.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_gaussian(penguins_adelie) hv2 = hypervolume_gaussian(penguins_chinstrap) # note that minimum distance is smaller than centroid distance as expected hypervolume_distance(hv1, hv2, type='centroid') hypervolume_distance(hv1, hv2, type='minimum', num.points.max=500, check.memory=FALSE) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_gaussian(penguins_adelie) hv2 = hypervolume_gaussian(penguins_chinstrap) # note that minimum distance is smaller than centroid distance as expected hypervolume_distance(hv1, hv2, type='centroid') hypervolume_distance(hv1, hv2, type='minimum', num.points.max=500, check.memory=FALSE) ## End(Not run)
Estimates probability density at one or more of points within or outside a hypervolume. The estimation is carried out as the weighted sum of the probability density of all subsampled random points in the input hypervolume, where the weights are proportional to the distance from the test point raised to a certain power. The default power, -1, corresponds to inverse distance weighting.
hypervolume_estimate_probability(hv, points, reduction.factor = 1, weight.exponent = -1, set.edges.zero = TRUE, edges.zero.distance.factor = 1, parallel = FALSE, n.cores = 1, verbose = TRUE, ...)
hypervolume_estimate_probability(hv, points, reduction.factor = 1, weight.exponent = -1, set.edges.zero = TRUE, edges.zero.distance.factor = 1, parallel = FALSE, n.cores = 1, verbose = TRUE, ...)
hv |
An input hypervolume |
points |
A m x n matrix of m points of dimensionality n (same as the input hypervolume). These are the points at which the probability is to be estimated. |
reduction.factor |
A value between 0 and 1 corresponding to a thinning factor applied to random points of the input hypervolume. Smaller values result in faster runtimes but lower accuracy. |
weight.exponent |
The exponent of the distance weights. Should be negative and probably does not need to be changed. |
set.edges.zero |
If |
edges.zero.distance.factor |
Positive number used to multiply the critical distance for |
parallel |
If |
n.cores |
Number of cores to use in parallel operation. |
verbose |
If |
... |
Other arguments to be passed to |
Identifies the uniformly random points enclosed within a hypersphere centered on the point of interest, then averages the probability density at each of these points.
A vector of probability densities of length corresponding to m, the number of input points.
hypervolume_inclusion_test
, hypervolume_redundancy
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') new_points = data.frame(bill_length_mm=c(0,38), bill_depth_mm=c(0,18),flipper_length_mm=c(0,190)) probs <- hypervolume_estimate_probability(hv, points=new_points) probs # should give a zero value and a non-zero value # example for parallel operation # probs_new <- hypervolume_estimate_probability(hv, points=new_points, parallel=TRUE, n.cores=2)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') new_points = data.frame(bill_length_mm=c(0,38), bill_depth_mm=c(0,18),flipper_length_mm=c(0,190)) probs <- hypervolume_estimate_probability(hv, points=new_points) probs # should give a zero value and a non-zero value # example for parallel operation # probs_new <- hypervolume_estimate_probability(hv, points=new_points, parallel=TRUE, n.cores=2)
This function takes in hypervolumes bootstrapped at different sample sizes applies a function to each hypervolume. The output of the function can either be a plot of nonparametric confidence intervals or a table of the mean and quantiles.
hypervolume_funnel(input_path, title = NULL, func = get_volume, CI = .95, as_table = FALSE)
hypervolume_funnel(input_path, title = NULL, func = get_volume, CI = .95, as_table = FALSE)
input_path |
output of |
title |
title of output plot, ignore if outputting as table |
func |
a function that takes a single parameter which is a hypervolume and returns a numerical value. |
CI |
Confidence interval is taken by using the the (1-CI)/2 and (1+CI)/2 quantile |
as_table |
If TRUE, returns a table with columns upper quantile, mean, lower quantile |
This function is used to evaluate the behavior of hypervolumes at different sample sizes and determine bias. Statistics such as volume are affected by sample size especially when the hypervolumes are constructed with method = "gaussian" since the bandwidth estimate is dependent on sample size.
ggplot object, or dataframe object
## Not run: # 3000 data point hypervolume data(quercus) hv_quercus = hypervolume(quercus[,c(2,3)]) # the seq argument is equivalent to a length 30 vector {10, 139, ... , 3649, 3779} # 6hr sequential runtime quercus_bootstrap_seq <- resample('quercus_bootstrap_seq', hv_quercus, method = 'bootstrap seq', points_per_resample = "sample_size", seq = floor(seq(10, 3779, length.out = 30)), cores = 20) # Compatible with ggplot syntax when used with as_table = FALSE hypervolume_funnel(quercus_bootstrap_seq, title = 'Resampled volumes of Quercus', func = get_volume) + geom_line(aes(y = get_volume(hv_quercus))) + ylab("Volume") ## End(Not run)
## Not run: # 3000 data point hypervolume data(quercus) hv_quercus = hypervolume(quercus[,c(2,3)]) # the seq argument is equivalent to a length 30 vector {10, 139, ... , 3649, 3779} # 6hr sequential runtime quercus_bootstrap_seq <- resample('quercus_bootstrap_seq', hv_quercus, method = 'bootstrap seq', points_per_resample = "sample_size", seq = floor(seq(10, 3779, length.out = 30)), cores = 20) # Compatible with ggplot syntax when used with as_table = FALSE hypervolume_funnel(quercus_bootstrap_seq, title = 'Resampled volumes of Quercus', func = get_volume) + geom_line(aes(y = get_volume(hv_quercus))) + ylab("Volume") ## End(Not run)
Constructs a hypervolume by building a Gaussian kernel density estimate on an adaptive grid of random points wrapping around the original data points. The bandwidth vector reflects the axis-aligned standard deviations of a hyperelliptical kernel.
Because Gaussian kernel density estimates do not decay to zero in a finite distance, the algorithm evaluates the kernel density in hyperelliptical regions out to a distance set by sd.count
.
After delineating the probability density, the function calls hypervolume_threshold
to determine a boundary. The defaullt behavior ensures that 95 percent of the stimated probability density is enclosed by the chosen boundary. However note that theaccuracy of the total probability density depends on having set a large value of sd.count
.
Most use cases should not require modification of any parameters except kde.bandwidth
.
Optionally, weighting of the data (e.g. for abundance-weighting) is possible. By default, the function estimates the probability density of the observations via Gaussian kernel functions, assuming each data point contributes equally. By setting a weight
parameter, the algorithm can instead take a weighted average the kernel functions centered on each observation. Code for weighting data written by Yuanzhi Li ([email protected]).
hypervolume_gaussian(data, name = NULL, weight = NULL, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), kde.bandwidth = estimate_bandwidth(data), sd.count = 3, quantile.requested = 0.95, quantile.requested.type = "probability", chunk.size = 1000, verbose = TRUE, ...)
hypervolume_gaussian(data, name = NULL, weight = NULL, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), kde.bandwidth = estimate_bandwidth(data), sd.count = 3, quantile.requested = 0.95, quantile.requested.type = "probability", chunk.size = 1000, verbose = TRUE, ...)
data |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
name |
A string to assign to the hypervolume for later output and plotting. Defaults to the name of the variable if NULL. |
weight |
An optional vector of weights for the kernel density estimation. Defaults to even weighting ( |
samples.per.point |
Number of random points to be evaluated per data point in |
kde.bandwidth |
A bandwidth vector obtained by running |
sd.count |
The number of standard deviations (converted to actual units by multiplying by |
quantile.requested |
The quantile value used to delineate the boundary of the kernel density estimate. See |
quantile.requested.type |
The type of quantile (volume or probability) used for the boundary delineation. See |
chunk.size |
Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function; only how this output is internally assembled. |
verbose |
Logical value; print diagnostic output if |
... |
Other arguments to pass to |
A Hypervolume-class
object corresponding to the inferred hypervolume.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] # low samples per point for CRAN demo hv = hypervolume_gaussian(penguins_adelie,name='Adelie',samples.per.point=100) summary(hv)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] # low samples per point for CRAN demo hv = hypervolume_gaussian(penguins_adelie,name='Adelie',samples.per.point=100) summary(hv)
Uses rejection sampling to obtain predicted values of a model object at uniformly random points within a range box, then converts output to a hypervolume.
hypervolume_general_model(model, name = NULL, verbose = TRUE, data = NULL, range.box = NULL, num.samples = ceiling(10^(3 + sqrt(ncol(range.box)))), chunk.size = 10000, min.value = 0, ...)
hypervolume_general_model(model, name = NULL, verbose = TRUE, data = NULL, range.box = NULL, num.samples = ceiling(10^(3 + sqrt(ncol(range.box)))), chunk.size = 10000, min.value = 0, ...)
model |
Any model object which can be used within a |
name |
Name of the output hypervolume |
verbose |
If |
data |
If not |
range.box |
A 2 x n matrix, where n is the number of dimensions of the data, and the first row corresponds to a lower limit and the second row to an upper limit. Each column is thus the low and high limits of the range box along each axis. Can be generated via |
num.samples |
Number of samples to draw from the range box. |
chunk.size |
Number of samples to process in each |
min.value |
If |
... |
Other arguments to be passed to |
A Hypervolume
-class object corresponding to retained values within the hyperbox of interest.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_no_na$is_adelie = penguins_no_na$species=="Adelie" penguins_no_na = penguins_no_na[,c("is_adelie","bill_length_mm","bill_depth_mm")] m_glm = glm(is_adelie~.,data=penguins_no_na) hv_general_glm = hypervolume_general_model(m_glm, range.box=padded_range(penguins_no_na[,2:3]),type='response') plot(hv_general_glm)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_no_na$is_adelie = penguins_no_na$species=="Adelie" penguins_no_na = penguins_no_na[,c("is_adelie","bill_length_mm","bill_depth_mm")] m_glm = glm(is_adelie~.,data=penguins_no_na) hv_general_glm = hypervolume_general_model(m_glm, range.box=padded_range(penguins_no_na[,2:3]),type='response') plot(hv_general_glm)
Detects the holes in an observed hypervolume relative to an expectation
hypervolume_holes(hv.obs, hv.exp, set.num.points.max = NULL, set.check.memory = TRUE)
hypervolume_holes(hv.obs, hv.exp, set.num.points.max = NULL, set.check.memory = TRUE)
hv.obs |
The observed hypervolume whose holes are to be detected |
hv.exp |
The expected hypervolume that provides a baseline expectation geometry |
set.num.points.max |
Maximum number of points to be used for set operations comparing |
set.check.memory |
If |
This algorithm has a good Type I error rate (rarely detects holes that do not actually exist). However it can have a high Type II error rate (failure to find holes when they do exist). To reduce this error rate, make sure to re-run the algorithm with input hypervolumes with higher values of @PointDensity
, or increase set.num.points.max
.
The algorithm performs the set difference between the observed and expected hypervolumes, then removes stray points in this hypervolume by deleting any random point whose distance from any other random point is greater than expected.
A 'rule of thumb' is that algorithm has acceptable statistical performance when log_e(m) > n, where m is the number of data points and n is the dimensionality.
A Hypervolume
object containing a uniformly random set of points describing the holes in hv_obs
. Note that the point density of this object is likely to be much lower than that of the input hypervolumes due to the stochastic geometry algorithms used.
## Not run: # generate annulus data data_annulus <- data.frame(matrix(data=runif(4000),ncol=2)) names(data_annulus) <- c("x","y") data_annulus <- subset(data_annulus, sqrt((x-0.5)^2+(y-0.5)^2) > 0.4 & sqrt((x-0.5)^2+(y-0.5)^2) < 0.5) # MAKE HYPERVOLUME (low reps for fast execution) hv_annulus <- hypervolume_gaussian(data_annulus, kde.bandwidth=0.05,name='annulus',samples.per.point=1) # GET CONVEX EXPECTATION hv_convex <- expectation_convex(hypervolume_thin(hv_annulus,num.samples=500), check.memory=FALSE,use.random=TRUE) # DETECT HOLES (low npoints for fast execution) features_annulus <- hypervolume_holes( hv.obs=hv_annulus, hv.exp=hv_convex, set.check.memory=FALSE) # CLEAN UP RESULTS features_segmented <- hypervolume_segment(features_annulus, check.memory=FALSE,distance.factor=2) features_segmented_pruned <- hypervolume_prune(features_segmented, volume.min=0.02) # PLOT RETAINED HOLE(S) plot(hypervolume_join(hv_annulus, features_segmented_pruned)) ## End(Not run)
## Not run: # generate annulus data data_annulus <- data.frame(matrix(data=runif(4000),ncol=2)) names(data_annulus) <- c("x","y") data_annulus <- subset(data_annulus, sqrt((x-0.5)^2+(y-0.5)^2) > 0.4 & sqrt((x-0.5)^2+(y-0.5)^2) < 0.5) # MAKE HYPERVOLUME (low reps for fast execution) hv_annulus <- hypervolume_gaussian(data_annulus, kde.bandwidth=0.05,name='annulus',samples.per.point=1) # GET CONVEX EXPECTATION hv_convex <- expectation_convex(hypervolume_thin(hv_annulus,num.samples=500), check.memory=FALSE,use.random=TRUE) # DETECT HOLES (low npoints for fast execution) features_annulus <- hypervolume_holes( hv.obs=hv_annulus, hv.exp=hv_convex, set.check.memory=FALSE) # CLEAN UP RESULTS features_segmented <- hypervolume_segment(features_annulus, check.memory=FALSE,distance.factor=2) features_segmented_pruned <- hypervolume_prune(features_segmented, volume.min=0.02) # PLOT RETAINED HOLE(S) plot(hypervolume_join(hv_annulus, features_segmented_pruned)) ## End(Not run)
Determines if a set of points are within a hypervolume. Can operate using a 'fast' algorithm which determines whether at least one random point of the hypervolume is within a critical distance of the test point. This algorithm is very efficient but leads to noisy and error-prone results when the point density slow. A warning is generated if this algorithm is used.
The function can also operate using an 'accurate' algorithm which estimates the probability density at the test point, and rejects it if it is below the requested threshold value. This is very slow but guarantees good results.
hypervolume_inclusion_test(hv, points, reduction.factor = 1, fast.or.accurate = "fast", fast.method.distance.factor = 1, accurate.method.threshold = quantile(hv@ValueAtRandomPoints, 0.5), verbose = TRUE, ...)
hypervolume_inclusion_test(hv, points, reduction.factor = 1, fast.or.accurate = "fast", fast.method.distance.factor = 1, accurate.method.threshold = quantile(hv@ValueAtRandomPoints, 0.5), verbose = TRUE, ...)
hv |
n-dimensional hypervolume to compare against |
points |
Candidate points. A m x n matrix or dataframe, where m is the number of candidate points and n is the number of dimensions. |
reduction.factor |
A number in (0,1] that represents the fraction of random points sampled from the hypervolume for the stochastic inclusion test. Larger values are more accurate but computationally slower. |
fast.or.accurate |
If |
fast.method.distance.factor |
Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Used only when |
accurate.method.threshold |
Numeric value; threshold probability value below which the point is determined to be out of the hypervolume. Used only when |
verbose |
Logical value; print diagnostic output if true. |
... |
Additional arguments to be passed to either |
A m x 1 logical vector indicating whether each candidate point is in the hypervolume.
## Not run: # construct a hypervolume of points in the unit square [0,1] x [0,1] data = data.frame(x=runif(100,min=0,max=1), y=runif(100,min=0,max=1)) hv = hypervolume_gaussian(data) # test if (0.5,0.5) and (-1,1) are in - should return TRUE FALSE hypervolume_inclusion_test(hv, points=data.frame(x=c(0.5,-1),y=c(0.5,-1))) ## End(Not run)
## Not run: # construct a hypervolume of points in the unit square [0,1] x [0,1] data = data.frame(x=runif(100,min=0,max=1), y=runif(100,min=0,max=1)) hv = hypervolume_gaussian(data) # test if (0.5,0.5) and (-1,1) are in - should return TRUE FALSE hypervolume_inclusion_test(hv, points=data.frame(x=c(0.5,-1),y=c(0.5,-1))) ## End(Not run)
Combines multiple hypervolumes or hypervolume lists into a single HypervolumeList suitable for analysis or plotting.
hypervolume_join(...)
hypervolume_join(...)
... |
One or more objects of class |
A HypervolumeList
containing all hypervolumes in all arguments.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hvs_joined = hypervolume_join(hv1, hv2)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hvs_joined = hypervolume_join(hv1, hv2)
Computes the occupancy of hyperspace by one or more groups of hypervolumes.
hypervolume_n_occupancy(hv_list, classification = NULL, method = "subsample", FUN = mean, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE, box_density = 5000, thin = FALSE, quant.thin = 0.5, seed = NULL, print_log = FALSE) hypervolume_n_occupancy_bootstrap(path, name = NULL, classification = NULL, method = "subsample", FUN = mean, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE, box_density = 5000, thin = FALSE, quant.thin = 0.5, seed = NULL)
hypervolume_n_occupancy(hv_list, classification = NULL, method = "subsample", FUN = mean, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE, box_density = 5000, thin = FALSE, quant.thin = 0.5, seed = NULL, print_log = FALSE) hypervolume_n_occupancy_bootstrap(path, name = NULL, classification = NULL, method = "subsample", FUN = mean, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE, box_density = 5000, thin = FALSE, quant.thin = 0.5, seed = NULL)
hv_list |
An |
classification |
A vector assigning each hypervolume in the |
method |
Can be |
FUN |
A function to aggregate points within each group. Default to |
num.points.max |
Maximum number of random points to use for set operations. If |
verbose |
Logical value; print diagnostic output if |
distance.factor |
Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Recommended to not change this parameter. |
check.hyperplane |
Check if data is hyperplanar. |
box_density |
Density of random points to fill the hyperbox when method is equal to |
thin |
Take a subsample of random points to get a more uniform distribution of random points. Intended to be used with |
quant.thin |
Set quantile for using when |
seed |
Set seed for random number generation. Useful for having reproducible results and with the use of |
print_log |
Save a log file with the volume of each input hypervolume, recomputed volume and the ratio between the original and recomputed hypervolumes. It works for |
path |
A path to a directory of bootstrapped hypervolumes obtained with |
name |
File name; The function writes hypervolumes to file in "./Objects/<name>". |
Uses the inclusion test approach to count how many hypervolumes include each random point. Counts range from 0 (no hypervolumes contain a given random point), to the number of hypervolumes in a group (all the hypervolumes contain a given random point). A function FUN
, usually mean
or sum
, is then applied. A hypervolume is then returned for each group and the occupancy stored in ValueAtRandomPoints
. IMPORTANT: random points with
ValueAtRandomPoints
equal to 0 are not removed to ease downstream calculation.
When method = "subsample"
the computation is performed on a random sample from input hypervolumes, constraining each to have the same point density given by the minimum of the point density of each input hypervolume and the point density calculated using the volumes of each input hypervolume divided by num.points.max
.
Because this algorithm is based on distances calculated between the distributions of random points, the critical distance (point density ^ (-1/n)) can be scaled by a user-specified factor to provide more or less liberal estimates (distance_factor
greater than or less than 1).
Two methods can be used for calculating the occupancy. The method subsample
is based on a random sample of points from input hypervolumes. Each point is selected with a probability set to the inverse of the number of neighbour points calculated according to the critical distance. This method performs accurately when input hypervolumes have a low degree of overlap. The method box
create a bounding box around the union of input hypervolumes. The bounding box is filled with points following a uniform distribution and with a density set with the argument box_density
. A greater density usually provides more accurate results. The method box
performs better than the method subsample
in low dimensions, while in higher dimensions the method box
become computationally inefficient as nearly all of the hyperbox sampling space will end up being empty and most of the points will be rejected.
When verbose = TRUE
the volume of each input hypervolume will be printed to screen togheter with the recomputed volume and the ratio between the original and recomputed hypervolumes. Mean absolute error (MAE) and root mean square error (RMSE) are also provided as overall measures of the goodness of fit. A log file will be saved in the working directory with the information about the volume of input hypervolumes, the recomputed volume and the ratio between the original and recomputed hypervolumes.
When thin = TRUE
an algorithm is applied to try to make the distribution of random points more uniform. Moderate departures from uniform distribution can in fact result from applying hypervolume_n_occupancy()
on hypervolumes with a high overlap degree. At first, the algorithm in thin
calculates the minimum distance from the neighboor points within the critical distance for each random point. A quantile (set with quant.thin
) of these distances is taken and set as the threshold distance. Random points are then subset so that the distance of a point to another is greater than the threshold distance.
The function hypervolume_n_occupancy_bootstrap()
takes a path of bootstrapped hypervolumes generated with hypervolume_n_resample()
as input. It creates a directory called Objects in the current working directory if a directory of that name doesn't already exist where storing occupancy objects. The function hypervolume_n_occupancy_bootstrap()
returns the absolute path to the directory with bootstrapped hypervolumes. It automatically saves a log file with the volume of each input hypervolume, the recomputed volume and the ratio between the original and recomputed hypervolumes. The log file is used with occupancy_bootstrap_gof()
.
hypervolume_n_occupancy()
returns a Hypervolume
or HypervolumeList
whose number of hypervolumes equals the number of groups in classification
. hypervolume_n_occupancy_bootstrap()
returns a string containing an absolute path equivalent to ./Objects/<name>.
find_optimal_occupancy_thin
, occupancy_bootstrap_gof
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) hv_list <- hypervolume_join(hv_list) # calculate occupancy without groups hv_occupancy <- hypervolume_n_occupancy(hv_list) plot(hv_occupancy, cex.random = 1) # calculate occupancy with groups hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) plot(hv_occupancy_list_sex, cex.random = 1, show.density = FALSE) ### hypervolume_n_occupancy_bootstrap ### # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) hv_list <- hypervolume_join(hv_list) # calculate occupancy without groups hv_occupancy <- hypervolume_n_occupancy(hv_list) plot(hv_occupancy, cex.random = 1) # calculate occupancy with groups hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) plot(hv_occupancy_list_sex, cex.random = 1, show.density = FALSE) ### hypervolume_n_occupancy_bootstrap ### # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list) # calculate occupancy on bootstrapped hypervolumes hv_occupancy_boot_sex = hypervolume_n_occupancy_bootstrap(path = hv_list_boot, name = "example_occ", classification = rep(c("female", "male"), 3)) ## End(Not run)
Permute labels of an hypervolume_n_occupancy()
object and calculate hypervolume_n_occupancy()
for the permuted objects. This function is meant for taking a sample of all permutations and does not guarantee that permutations are not repeated. Newly generated hypervolume objects are written to file. This function is to be used within the occupancy routine.
hypervolume_n_occupancy_permute(name, hv_list1, hv_list2, verbose = TRUE, n = 9, cores = 1)
hypervolume_n_occupancy_permute(name, hv_list1, hv_list2, verbose = TRUE, n = 9, cores = 1)
name |
File name; The function writes hypervolumes to file in ./Objects/<name> |
hv_list1 |
An hypervolume list generated with |
hv_list2 |
The hypervolume list used to generate |
verbose |
Logical value; outputs progress bar in console. |
n |
Number of permutations to take. |
cores |
Number of logical cores to use while generating permuted hypervolumes. If parallel backend already registered to |
hypervolume_n_occupancy_permute()
creates a directory called Objects in the current working directory if a directory of that name doesn't already exist. Within this directory, it creates a directory for each pairwise combinations of elements within hv_list1
. Group labels are permuted and a new HypervolumeList
is saved as a rds file for each pairwise combination. IMPORTANT: only group labels are permuted, random points are kept fixed and will be the same across all the permuted hypervolumes.
hypervolume_n_occupancy_test()
takes in a hypervolume_n_occupuancy_permute()
filepath output.
It is also possible to access the hypervolumes by using readRDS to read the hypervolume objects in one by one.
Returns a string containing an absolute path equivalent to ./Objects/<name>
hypervolume_n_occupancy_permute()
requires a lot of disk space especially when building occupancy hypervolumes with method = "box"
. Try with a small number of replications and check the folder Objects for memory usage before to proceed.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) names(hv_list) <- names(penguins_no_na_split) hv_list <- hypervolume_join(hv_list) hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) # takes 9 permutations on 1 core hypervolume_n_occupancy_permute("permute", hv_occupancy_list_sex, hv_list , n = 9, cores = 1) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) names(hv_list) <- names(penguins_no_na_split) hv_list <- hypervolume_join(hv_list) hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) # takes 9 permutations on 1 core hypervolume_n_occupancy_permute("permute", hv_occupancy_list_sex, hv_list , n = 9, cores = 1) ## End(Not run)
The function hypervolume_n_occupancy_test()
calculates the significance of the difference between occupancy values for each random point and for all the pairwise combinations of groups in objects generated with hypervolume_n_occupancy()
and hypervolume_n_occupancy_permute()
.
hypervolume_n_occupancy_test(observed, path, alternative = "two_sided", significance = 0.05, cores = 1, p_adjust = "none", multi_comp_type = "pairwise")
hypervolume_n_occupancy_test(observed, path, alternative = "two_sided", significance = 0.05, cores = 1, p_adjust = "none", multi_comp_type = "pairwise")
observed |
An |
path |
A path to a directory of permuted hypervolumes generated with |
alternative |
Alternative hypothesis, can be one of |
significance |
Significance values lower than this threshold will be retained. |
cores |
Number of logical cores to use while generating permuted hypervolumes. If parallel backend already registered to |
p_adjust |
Method of correction for multiple comparisons, set to |
multi_comp_type |
Type of multiple comparison. Can be |
The observed difference between ValueAtRandomPoints
of two groups is compared against null expectations generated with hypervolume_n_occupancy_permute()
.
An HypervolumeList
of length equal to the number of pairwise combinations of the observed HypervolumeList
elements. ValueAtRandomPoints
are calculated as the difference between the ValueAtRandomPoints
of the first and the second group for each pairwise combination. Only significant values are retained according to significance
.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) names(hv_list) <- names(penguins_no_na_split) hv_list <- hypervolume_join(hv_list) hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) # takes 9 permutations on 1 core hyper_permuted <- hypervolume_n_occupancy_permute("permute", hv_occupancy_list_sex, hv_list , n = 99, cores = 1) hypervolume_test <- hypervolume_n_occupancy_test(hv_occupancy_list_sex, hyper_permuted, alternative = "more", significance = 0.1) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) names(hv_list) <- names(penguins_no_na_split) hv_list <- hypervolume_join(hv_list) hv_occupancy_list_sex <- hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), each = 3)) # takes 9 permutations on 1 core hyper_permuted <- hypervolume_n_occupancy_permute("permute", hv_occupancy_list_sex, hv_list , n = 99, cores = 1) hypervolume_test <- hypervolume_n_occupancy_test(hv_occupancy_list_sex, hyper_permuted, alternative = "more", significance = 0.1) ## End(Not run)
The function hypervolume_n_resample()
generates n hypervolumes using data bootstrapped from original data of the input hypervolumes.
hypervolume_n_resample(name, hv_list, n = 10, points_per_resample = 'sample_size', cores = 1, verbose = TRUE, seed = NULL)
hypervolume_n_resample(name, hv_list, n = 10, points_per_resample = 'sample_size', cores = 1, verbose = TRUE, seed = NULL)
name |
File name; The function writes hypervolumes to file in "./Objects/<name>"" |
hv_list |
A |
n |
Number of resamples to take. Used for every method. |
points_per_resample |
Number of points in each resample. If the input is |
cores |
Number of logical cores to use while generating bootstraped hypervolumes. If parallel backend already registered to |
verbose |
Logical value; If function is being run sequentially, outputs progress bar in console. |
seed |
Set seed for random number generation. |
hypervolume_n_resample()
creates a directory called Objects in the current working directory if a directory of that name doesn't already exist. A directory is then created for each hypervolume in hv_list
. Returns an absolute path to directory with resampled hypervolumes.
It is possible to access the hypervolumes by using readRDS to read the hypervolume objects one by one.
The resampled hypervolumes are generated using the same parameters used to generate the input hypervolume. The only exception is that the bandwidth is re-estimated if method = "gaussian"
or method = "box"
. See copy_param_hypervolume
for more details.
Returns a string containing an absolute path equivalent to ./Objects/<name>
hypervolume_n_occupancy_bootstrap
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Example 1: get 50 resampled hypervolumes for each input hypervolume # Use detectCores to see how many cores are availible in current environment # Set cores = 1 to run sequentially (default) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list, n = 50) ## End(Not run)
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Example 1: get 50 resampled hypervolumes for each input hypervolume # Use detectCores to see how many cores are availible in current environment # Set cores = 1 to run sequentially (default) # bootstrap the hypervolumes hv_list_boot = hypervolume_n_resample(name = "example", hv_list, n = 50) ## End(Not run)
Generates confidence intervals of four different overlap statistics. In order to find the confidence interval for the overlap statistics of two hypervolumes, use hypervolume_resample
twice to generate bootstraps. The function takes in paths to two sets of bootstrapped hypervolumes and gets overlap statistics for each possible pair. Confidence interval is calculated by taking a quantile of generated overlap statistics.
hypervolume_overlap_confidence(path1, path2, CI = .95, cores = 1)
hypervolume_overlap_confidence(path1, path2, CI = .95, cores = 1)
path1 |
A path to a directory of bootstrapped hypervolumes |
path2 |
A path to a directory of bootstrapped hypervolumes |
CI |
Desired confidence interval proportion |
cores |
Number of logical cores to use while generating overlap statistics. If parallel backend already registered to |
The four overlap statistics are Sorensen, Jaccard, frac_unique_1, frac_unique_2. See hypervolume_overlap_statistics
Each hypervolume from path1 is overlapped with each hypervolume from path2 using hypervolume_set
. The four overlap statistics are calculated for each overlap.
jaccard |
Confidence interval for jaccard similarity score |
sorensen |
Confidence interval for sorensen similarity score |
frac_unique_1 |
Confidence interval for fraction of first hypervolume that is unique |
frac_unique_2 |
Confidence interval for fraction of second hypervolume that is unique |
distribution |
a matrix of overlap statistics used to generate the confidence intervals |
## Not run: # Let us overlap two hypervolumes generated from multivariate nomral # distributions with different means and same covariance matrices. sample1 = rmvnorm(150, mean = c(0, 0)) sample2 = rmvnorm(150, mean = c(0.5, 0.5)) hv1 = hypervolume(sample1) hv2 = hypervolume(sample2) # generates confidence intervals from quantiles of 20*20 overlaps path1 = hypervolume_resample("mean_0_0", hv1, n = 20) path2 = hypervolume_resample("mean_0.5_0.5", hv2, n = 20) result = hypervolume_overlap_confidence(path1, path2) # confidence index of Sorensen coefficient print(result["sorensen"]) ## End(Not run)
## Not run: # Let us overlap two hypervolumes generated from multivariate nomral # distributions with different means and same covariance matrices. sample1 = rmvnorm(150, mean = c(0, 0)) sample2 = rmvnorm(150, mean = c(0.5, 0.5)) hv1 = hypervolume(sample1) hv2 = hypervolume(sample2) # generates confidence intervals from quantiles of 20*20 overlaps path1 = hypervolume_resample("mean_0_0", hv1, n = 20) path2 = hypervolume_resample("mean_0.5_0.5", hv2, n = 20) result = hypervolume_overlap_confidence(path1, path2) # confidence index of Sorensen coefficient print(result["sorensen"]) ## End(Not run)
Calculates overlap metrics for two hypervolumes
hypervolume_overlap_statistics(hvlist)
hypervolume_overlap_statistics(hvlist)
hvlist |
A set of hypervolumes calculated from |
A set of multiple metrics
jaccard |
Jaccard similarity (volume of intersection of 1 and 2 divided by volume of union of 1 and 2) |
sorensen |
Sorensen similarity (twice the volume of intersection of 1 and 2 divided by volume of 1 plus volume of 2) |
frac_unique_1 |
Unique fraction 1 (volume of unique component of 1 divided by volume of 1)) |
frac_unique_2 |
Unique fraction 2 (volume of unique component of 2 divided by volume of 2)) |
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hv_set <- hypervolume_set(hv1, hv2, check.memory=FALSE) hypervolume_overlap_statistics(hv_set) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hv_set <- hypervolume_set(hv1, hv2, check.memory=FALSE) hypervolume_overlap_statistics(hv_set) ## End(Not run)
Generates null distribution of four different overlap statistics under the null hypothesis that two samples are drawn from the same population. Observed value of overlap statistic is calculated from inputed hypervolumes. calculates p value for observed value of each statistic with respect to the generated null distributions.
hypervolume_overlap_test(hv1, hv2, path, alternative = "one-sided", bins = 100, cores = 1)
hypervolume_overlap_test(hv1, hv2, path, alternative = "one-sided", bins = 100, cores = 1)
hv1 |
A hypervolume object |
hv2 |
A hypervolume object |
path |
a path to a directory containing permuted hypervolumes, bootstrapped hypervolumes, or a vector of two paths to bootstrapped hypervolumes |
alternative |
"one-sided" or "two-sided" |
bins |
plotting parameter for histogram of overlap statistics |
cores |
Number of logical cores to use while generating overlap statistics. If parallel backend already registered to |
Generating overlap statistics can be parallelized using the cores
argument.
hypervolume_overlap_test can generate a null distribution from the output of hypervolume_permute, hypervolume_resample with method = "bootstrap", or a vector of two bootstrap outputs. See examples for how to use each case.
path should point to hypervolumes generated from the two input hypervolumes. There are three valid choices:
path is generated from hypervolume_permute(<name>, hv1, hv2, ...)
. In this case the null distribution is generated by taking the overlap statistics of every single pair of permutations and turning them into a histogram.
OR
path is generated by resampling the hypervolume generated by combining the data of hv1 and hv2
If the number of data points used to generate hv1 is the same as hv2 then the path is hypervolume_resample(<name>, hv_combined, "bootstrap", points_per_resample = nrow(hv1@Data))
. In this case, the list bootstrapped hypervolumes is split in half and overlap statistics are taken for every possible pair of hypervolumes from the two halves. A histogram of these overlap statitics represent the null distribution.
If the number of data points is different between hv1 and hv2 path is a list of two paths generated from hypervolume_resample(<name>, hv_combined, "bootstrap", points_per_resample = nrow(hv1@Data), ...)
and hypervolume_resample(<name>, hv_combined, "bootstrap", points_per_resample = nrow(hv2@Data), ...)
. Overlap statistics are taken for every possible pair of hypervolumes from each bootstrap. A histogram of these overlap statitics represent the null distribution.
See example for appropriate path inputs.
The four overlap statistics are Sorensen, Jaccard, frac_unique_1, frac_unique_2. See hypervolume_overlap_statistics
for description of the statistics.
p_values |
a list of p_values indexed by the name of the relevant statistic |
plots |
a list of ggplot objects indexed by the name of the relevant statistic. The observed value of each statistic is represented as a vertical line on the x axis. |
distribution |
a matrix of overlap statistics used to generate the null distribution |
hypervolume_resample
, hypervolume_permute
## Not run: # We will use the data in "quercus" as our population in this example data("quercus") # Consider taking two samples of size 150 from the population and you want to figure out whether # the samples are similar by seeing if they occupy the same area in feature space. qsample1 = quercus[sample(1:nrow(quercus), 150),] qsample2 = quercus[sample(1:nrow(quercus), 150),] # Construct two hypervolumes from the samples hv1 = hypervolume(qsample1[,2:3]) hv2 = hypervolume(qsample2[,2:3]) # Approach 1 # Take 200 permutations of the 300 data points. Using more cores is faster. perm_path = hypervolume_permute("Quercus_perm_150", hv1, hv2, n = 200, cores = 20) # hypervolume_overlap_test takes perm_path as an input. # Results include p values for the overlap statistics of hv1 and hv2 as well as # the corresponding null distributions generated from perm_path. results1 = hypervolume_overlap_test(hv1, hv2, perm_path, cores = 20) # Approach 2 # Under our null hypothesis the samples come from the same population. # Approximate the original population by combining the data # then simulate drawing 150 data points 50 times. hv_combine = hypervolume(rbind(qsample1[,2:3],qsample2[,2:3])) bootstrap_path = hypervolume_resample("Quercus_boot_150", hv_combine, method = "bootstrap", n = 50, points_per_resample = 150, cores = 20) # hypervolume_overlap_test splits the 50 resampled hypervolumes in half and gets # overlap statistic for each of the 25*25 pairs to generate the null # distribution. This method allows us to approximate the null distribution using # 625 data points while only generating 50 hypervolumes as opposed to # hypervolume_permute which uses 400 hypervolumes to generate 200 data points. results2 = hypervolume_overlap_test(hv1, hv2, bootstrap_path) # Approach 3 # Suppose we have a size 300 sample and a size 150 sample and we want to know # whether they come from the same distribution. qsample3 = quercus[sample(1:nrow(quercus), 300),] hv3 = hypervolume(qsample3[,2:3]) # Permutation still works in this case, however we can also use bootstrap by # combining the data and drawing size 150 then size 300 samples. hv_combine = hypervolume(rbind(qsample1[,2:3],qsample3[,2:3])) b150_path = resample("Quercus_150", hv_combine, method = "bootstrap", n = 25, points_per_resample = 150, cores = 20) b300_path = resample("Quercus_300", hv_combine, method = "bootstrap", n = 25, points_per_resample = 300, cores = 20) # hypervolume_overlap_test generates overlap statistics for each of the 25*25 # possible pairs of size 150 and size 300 hypervolumes. results3 = hypervolume_overlap_test(hv1, hv2, c(b150_path, b300_path), cores = 1) ## End(Not run)
## Not run: # We will use the data in "quercus" as our population in this example data("quercus") # Consider taking two samples of size 150 from the population and you want to figure out whether # the samples are similar by seeing if they occupy the same area in feature space. qsample1 = quercus[sample(1:nrow(quercus), 150),] qsample2 = quercus[sample(1:nrow(quercus), 150),] # Construct two hypervolumes from the samples hv1 = hypervolume(qsample1[,2:3]) hv2 = hypervolume(qsample2[,2:3]) # Approach 1 # Take 200 permutations of the 300 data points. Using more cores is faster. perm_path = hypervolume_permute("Quercus_perm_150", hv1, hv2, n = 200, cores = 20) # hypervolume_overlap_test takes perm_path as an input. # Results include p values for the overlap statistics of hv1 and hv2 as well as # the corresponding null distributions generated from perm_path. results1 = hypervolume_overlap_test(hv1, hv2, perm_path, cores = 20) # Approach 2 # Under our null hypothesis the samples come from the same population. # Approximate the original population by combining the data # then simulate drawing 150 data points 50 times. hv_combine = hypervolume(rbind(qsample1[,2:3],qsample2[,2:3])) bootstrap_path = hypervolume_resample("Quercus_boot_150", hv_combine, method = "bootstrap", n = 50, points_per_resample = 150, cores = 20) # hypervolume_overlap_test splits the 50 resampled hypervolumes in half and gets # overlap statistic for each of the 25*25 pairs to generate the null # distribution. This method allows us to approximate the null distribution using # 625 data points while only generating 50 hypervolumes as opposed to # hypervolume_permute which uses 400 hypervolumes to generate 200 data points. results2 = hypervolume_overlap_test(hv1, hv2, bootstrap_path) # Approach 3 # Suppose we have a size 300 sample and a size 150 sample and we want to know # whether they come from the same distribution. qsample3 = quercus[sample(1:nrow(quercus), 300),] hv3 = hypervolume(qsample3[,2:3]) # Permutation still works in this case, however we can also use bootstrap by # combining the data and drawing size 150 then size 300 samples. hv_combine = hypervolume(rbind(qsample1[,2:3],qsample3[,2:3])) b150_path = resample("Quercus_150", hv_combine, method = "bootstrap", n = 25, points_per_resample = 150, cores = 20) b300_path = resample("Quercus_300", hv_combine, method = "bootstrap", n = 25, points_per_resample = 300, cores = 20) # hypervolume_overlap_test generates overlap statistics for each of the 25*25 # possible pairs of size 150 and size 300 hypervolumes. results3 = hypervolume_overlap_test(hv1, hv2, c(b150_path, b300_path), cores = 1) ## End(Not run)
Takes two data of two hypervolume objects (with the same column labels) and generates pairs of hypervolumes with the original sizes of the input hypervolumes but with permuted data (the rows of the original hypervolumes' data are combined and redistributed to the two new hypervolumes). This function is meant for taking a sample of all permutations and does not guarantee that permutations are not repeated. Newly generated hypervolume objects are written to file.
hypervolume_permute(name, hv1, hv2, n = 50, cores = 1, verbose = TRUE)
hypervolume_permute(name, hv1, hv2, n = 50, cores = 1, verbose = TRUE)
name |
File name; The function writes hypervolumes to file in ./Objects/<name> |
hv1 |
A hypervolume object |
hv2 |
A hypervolume object |
n |
number of permutations to take |
cores |
Number of logical cores to use while generating permuted hypervolumes. If parallel backend already registered to |
verbose |
Logical value; If function is being run sequentially, outputs progress bar in console. |
hypervolume_permute
creates a directory called Objects in the current working directory if a directory of that name doesn't already exist. Returns an absolute path to directory with permuted hypervolumes. rds files are stored in separate subdirectories for each permutation. Use hypervolume_permute
when generating null distribution of overlap statistics. hypervolume_overlap_test
takes in a hypervolume_permute
filepath output.
It is also possible to access the hypervolumes by using readRDS to read the hypervolume objects in one by one.
returns a string containing an absolute path equivalent to ./Objects/<name>
hypervolume_overlap_test
## Not run: data("quercus") # For this example consider taking two samples of size 150 from the data. qsample1 = quercus[sample(1:nrow(quercus), 150),] qsample2 = quercus[sample(1:nrow(quercus), 150),] # Construct two hypervolumes from the samples hv1 = hypervolume(qsample1[,2:3]) hv2 = hypervolume(qsample2[,2:3]) # Take 200 permutations of the 300 data points. Using more cores is faster. perm_path = hypervolume_permute("Quercus_perm_150", hv1, hv2, n = 200, cores = 20) # hypervolume_overlap_test takes perm_path as an input. # Results include p value for the overlap statistics of hv1 and hv2 as well as # null distribution generated from perm_path. The null distribution assumes data # for hv1 and hv2 are drawn from the same distribution and permuting data will # not change the overlap statitsics. results = hypervolume_overlap_test(hv1, hv2, perm_path) ## End(Not run)
## Not run: data("quercus") # For this example consider taking two samples of size 150 from the data. qsample1 = quercus[sample(1:nrow(quercus), 150),] qsample2 = quercus[sample(1:nrow(quercus), 150),] # Construct two hypervolumes from the samples hv1 = hypervolume(qsample1[,2:3]) hv2 = hypervolume(qsample2[,2:3]) # Take 200 permutations of the 300 data points. Using more cores is faster. perm_path = hypervolume_permute("Quercus_perm_150", hv1, hv2, n = 200, cores = 20) # hypervolume_overlap_test takes perm_path as an input. # Results include p value for the overlap statistics of hv1 and hv2 as well as # null distribution generated from perm_path. The null distribution assumes data # for hv1 and hv2 are drawn from the same distribution and permuting data will # not change the overlap statitsics. results = hypervolume_overlap_test(hv1, hv2, perm_path) ## End(Not run)
Determines a suitability score by calculating the hypervolume value at each of a set of points in an input raster stack based on either a probability density estimation or inclusion test.
Note that projected values are not normalized and are not necessarily constrained to fall between 0 and 1.
Note also that additional arguments can be passed to this function to enable parallel operation (see ... below).
hypervolume_project(hv, rasters, type = "probability", verbose = TRUE, ...)
hypervolume_project(hv, rasters, type = "probability", verbose = TRUE, ...)
hv |
An input hypervolume |
rasters |
A |
type |
If |
... |
Additional arguments to be passed to either |
verbose |
If |
A raster
object of same resolution and extent as the input layers corresponding to suitability values.
hypervolume_estimate_probability
, hypervolume_inclusion_test
## Not run: # load in lat/lon data data('quercus') data_alba = subset(quercus, Species=="Quercus alba")[,c("Longitude","Latitude")] data_alba = data_alba[sample(1:nrow(data_alba),500),] # get worldclim data from internet require(maps) require(raster) climatelayers = getData('worldclim', var='bio', res=10, path=tempdir()) # z-transform climate layers to make axes comparable climatelayers_ss = climatelayers[[c(1,12)]] for (i in 1:nlayers(climatelayers_ss)) { climatelayers_ss[[i]] <- (climatelayers_ss[[i]] - cellStats(climatelayers_ss[[i]], 'mean')) / cellStats(climatelayers_ss[[i]], 'sd') } climatelayers_ss = crop(climatelayers_ss, extent(-150,-50,15,60)) # extract transformed climate values climate_alba = extract(climatelayers_ss, data_alba[1:300,]) # compute hypervolume hv_alba <- hypervolume_gaussian(climate_alba) # do geographical projection raster_alba_projected_accurate <- hypervolume_project(hv_alba, rasters=climatelayers_ss) raster_alba_projected_fast = hypervolume_project(hv_alba, rasters=climatelayers_ss, type='inclusion', fast.or.accurate='fast') # draw map of suitability scores plot(raster_alba_projected_accurate,xlim=c(-100,-60),ylim=c(25,55)) map('usa',add=TRUE) plot(raster_alba_projected_fast,xlim=c(-100,-60),ylim=c(25,55)) map('usa',add=TRUE) ## End(Not run)
## Not run: # load in lat/lon data data('quercus') data_alba = subset(quercus, Species=="Quercus alba")[,c("Longitude","Latitude")] data_alba = data_alba[sample(1:nrow(data_alba),500),] # get worldclim data from internet require(maps) require(raster) climatelayers = getData('worldclim', var='bio', res=10, path=tempdir()) # z-transform climate layers to make axes comparable climatelayers_ss = climatelayers[[c(1,12)]] for (i in 1:nlayers(climatelayers_ss)) { climatelayers_ss[[i]] <- (climatelayers_ss[[i]] - cellStats(climatelayers_ss[[i]], 'mean')) / cellStats(climatelayers_ss[[i]], 'sd') } climatelayers_ss = crop(climatelayers_ss, extent(-150,-50,15,60)) # extract transformed climate values climate_alba = extract(climatelayers_ss, data_alba[1:300,]) # compute hypervolume hv_alba <- hypervolume_gaussian(climate_alba) # do geographical projection raster_alba_projected_accurate <- hypervolume_project(hv_alba, rasters=climatelayers_ss) raster_alba_projected_fast = hypervolume_project(hv_alba, rasters=climatelayers_ss, type='inclusion', fast.or.accurate='fast') # draw map of suitability scores plot(raster_alba_projected_accurate,xlim=c(-100,-60),ylim=c(25,55)) map('usa',add=TRUE) plot(raster_alba_projected_fast,xlim=c(-100,-60),ylim=c(25,55)) map('usa',add=TRUE) ## End(Not run)
Identifies hypervolumes characterized either by a number of uniformly random points or a volume below a user-specified value and removes them from a HypervolumeList
.
This function is useful for removing small features that can occur stochastically during segmentation after set operations or hole detection.
hypervolume_prune(hvlist, num.points.min = NULL, volume.min = NULL, return.ids=FALSE)
hypervolume_prune(hvlist, num.points.min = NULL, volume.min = NULL, return.ids=FALSE)
hvlist |
A |
num.points.min |
The minimum number of points in each input hypervolume. |
volume.min |
The minimum volume in each input hypervolume |
return.ids |
If |
Either minnp
or minvol
(but not both) must be specified.
A HypervolumeList
pruned to only those hypervolumes of sizes above the desired value. If returnids=TRUE
, instead returns a list structure with first item being the HypervolumeList
and the second item being the indices of the retained hypervolumes.
hypervolume_holes
, hypervolume_segment
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') hv_segmented <- hypervolume_segment(hv, num.points.max=200, distance.factor=1, check.memory=FALSE) # intentionally under-segment hv_segmented_pruned <- hypervolume_prune(hv_segmented, num.points.min=20) plot(hv_segmented_pruned) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') hv_segmented <- hypervolume_segment(hv, num.points.max=200, distance.factor=1, check.memory=FALSE) # intentionally under-segment hv_segmented_pruned <- hypervolume_prune(hv_segmented, num.points.min=20) plot(hv_segmented_pruned) ## End(Not run)
Estimates squared probability density at a given point. This metric is proportional to the number of data points multiplied by the probability density at a point.
hypervolume_redundancy(...)
hypervolume_redundancy(...)
... |
Arguments to be passed to |
hypervolume_estimate_probability
hypervolume_resample
generates new hyperolumes based on the method input. Outputs written to file.
- "bootstrap"
: Generates n hypervolumes using data bootstrapped from original data
- "bootstrap seq"
: Generates n hypervolumes for each sample size in sequence specified by user
- "weighted bootstrap"
: Same procedure as bootstrap but with sampling weights applied to data (default weights are normal distributed)
hypervolume_resample(name, hv, method, n = 10, points_per_resample = "sample_size", seq = 3:nrow(hv@Data), cores = 1, verbose = TRUE, to_file = TRUE, mu = NULL, sigma = NULL, cols_to_weigh = 1:ncol(hv@Data), weights = NULL)
hypervolume_resample(name, hv, method, n = 10, points_per_resample = "sample_size", seq = 3:nrow(hv@Data), cores = 1, verbose = TRUE, to_file = TRUE, mu = NULL, sigma = NULL, cols_to_weigh = 1:ncol(hv@Data), weights = NULL)
name |
File name; The function writes hypervolumes to file in ./Objects/<name> |
hv |
A hypervolume object |
method |
String input; options are |
n |
Number of resamples to take. Used for every method. |
points_per_resample |
Number of points in each resample. If the input is "sample_size", then the same number of points as the original sample is used.
Used for |
seq |
Sequence of sample sizes. If |
cores |
Number of logical cores to use while generating bootstraped hypervolumes. If parallel backend already registered to |
verbose |
Logical value; If function is being run sequentially, outputs progress bar in console. |
to_file |
Logical value; If TRUE, writes resampled hypervolumes to file, otherwise returns an object from |
mu |
Array of values specifying the mean of multivariate normal weights.
Used for |
sigma |
Array of values specifying the variance in each dimension. (higher variance corresponds to more even weights)
Used for |
cols_to_weigh |
Array of column indices; must be same length as mu and sigma.
Used for |
weights |
Custom weight assigned to each row of data when resampling.
Used for |
hypervolume_resample
creates a directory called Objects in the current working directory if a directory of that name doesn't already exist. Returns an absolute path to directory with resampled hypervolumes. rds files are stored in different file structures depending on which method is called.
Use to_hv_list
to extract every hypervolume object in a directory into a HypervolumeList object.
It is also possible to access the hypervolumes by using readRDS to read the hypervolume objects in one by one.
The resampled hypervolumes are generated using the same parameters used to generate the input hypervolume. The only exception is that the bandwidth is re-estimated if method = "gaussian"
or method = "box"
. See copy_param_hypervolume
for more details.
returns a string containing an absolute path equivalent to ./Objects/<name>
to_hv_list
, hypervolume_overlap_test
, hypervolume_funnel
, hypervolume_overlap_confidence
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Example 1: Get 50 resampled hypervolumes # Use detectCores to see how many cores are availible in current environment # Set cores = 1 to run sequentially (default) path = hypervolume_resample("example_bootstrap", hv, method = "bootstrap", n = 50, cores = 12) hvs = to_hv_list(path) # Example 2: Get resample with applied weights # Get maximum bill length max_bill = max(bill_data$bill_length_mm) # Make data with larger bill length slightly more likley to be resampled weighted_path = hypervolume_resample("weighted test", hv, method = "weighted bootstrap", n = 50, cores = 12, mu = max_bill, sigma = 90, cols_to_weigh = "bill_length_mm") hvs_weighted = to_hv_list(weighted_path) ## End(Not run)
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Example 1: Get 50 resampled hypervolumes # Use detectCores to see how many cores are availible in current environment # Set cores = 1 to run sequentially (default) path = hypervolume_resample("example_bootstrap", hv, method = "bootstrap", n = 50, cores = 12) hvs = to_hv_list(path) # Example 2: Get resample with applied weights # Get maximum bill length max_bill = max(bill_data$bill_length_mm) # Make data with larger bill length slightly more likley to be resampled weighted_path = hypervolume_resample("weighted test", hv, method = "weighted bootstrap", n = 50, cores = 12, mu = max_bill, sigma = 90, cols_to_weigh = "bill_length_mm") hvs_weighted = to_hv_list(weighted_path) ## End(Not run)
Rotates the plot around an axis at a given speed and saves results as a series of GIFs. Requires that the rgl
library is installed. Assumes there is an open RGL plot (e.g. from calling plot(hv, show.3d=TRUE)
). If the magick
package is available, combines these GIFs into a single animation.
hypervolume_save_animated_gif(image.size = 400, axis = c(0, 0, 1), rpm = 4, duration = 15, fps = 10, file.name = "movie", directory.output = ".", ...)
hypervolume_save_animated_gif(image.size = 400, axis = c(0, 0, 1), rpm = 4, duration = 15, fps = 10, file.name = "movie", directory.output = ".", ...)
image.size |
Number of pixels on each side of the animated image. |
axis |
A three-element vector describing the rotation axis. |
rpm |
Animation speed in rotations per minute. |
duration |
Animation duration in seconds. |
fps |
Animation speed in frames per second. |
file.name |
A base name (no extension) for the GIFs. |
directory.output |
The folder in which output should be located. |
... |
Other arguments to be passed to |
None; used for the side-effect of producing files.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') if(interactive()) { plot(hv, show.3d=TRUE) hypervolume_save_animated_gif() rgl.close() } ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') if(interactive()) { plot(hv, show.3d=TRUE) hypervolume_save_animated_gif() rgl.close() } ## End(Not run)
Performs hierarchical clustering (using the 'single' method described in fastcluster::hclust
) on the input hypervolume to determine which sets of points are closest to others, then cuts the resulting tree at a height equal to the characteristic distance between points multiplied by a distance factor. Random points in the input hypervolume corresponding to each distinct cluster are assigned to distinct output hypervolumes.
Because clustering algorithms scale quadratically with the number of input points, this algorithm can run slowly. Therefore by default, the function can thin the input hypervolume to a reduced number of random points before analysis. This causes some loss of resolution but improves runtimes.
hypervolume_segment(hv, distance.factor = 1, num.points.max = NULL, verbose = TRUE, check.memory = TRUE)
hypervolume_segment(hv, distance.factor = 1, num.points.max = NULL, verbose = TRUE, check.memory = TRUE)
hv |
An input |
distance.factor |
A numeric value characterizing the distance multiplication factor. Larger values result in fewer distinct output hypervolumes; smaller values result in more. |
num.points.max |
A numeric value describing the maximum number of random points to be retained in the input; passed to |
verbose |
Logical value; print diagnostic output if |
check.memory |
Logical value; returns information about expected memory usage if true. |
A HypervolumeList
object.
# low sample sizes to meet CRAN time requirements data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] # intentionally make a holey shape for segmentation example hv = hypervolume_gaussian(penguins_adelie,name='Adelie', kde.bandwidth=estimate_bandwidth(penguins_adelie)/3) hv_segmented <- hypervolume_segment(hv, num.points.max=200, distance.factor=1.25, check.memory=FALSE) # intentionally under-segment plot(hv_segmented,show.contour=FALSE)
# low sample sizes to meet CRAN time requirements data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] # intentionally make a holey shape for segmentation example hv = hypervolume_gaussian(penguins_adelie,name='Adelie', kde.bandwidth=estimate_bandwidth(penguins_adelie)/3) hv_segmented <- hypervolume_segment(hv, num.points.max=200, distance.factor=1.25, check.memory=FALSE) # intentionally under-segment plot(hv_segmented,show.contour=FALSE)
Computes the intersection, union, and unique components of two hypervolumes.
hypervolume_set(hv1, hv2, num.points.max = NULL, verbose = TRUE, check.memory = TRUE, distance.factor = 1)
hypervolume_set(hv1, hv2, num.points.max = NULL, verbose = TRUE, check.memory = TRUE, distance.factor = 1)
hv1 |
A n-dimensional hypervolume |
hv2 |
A n-dimensional hypervolume |
num.points.max |
Maximum number of random points to use for set operations. If |
verbose |
Logical value; print diagnostic output if true. |
check.memory |
Logical value; returns information about expected memory usage if true. |
distance.factor |
Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Recommended to not change this parameter. |
Uses the inclusion test approach to identify points in the first hypervolume that are or are not within the second hypervolume and vice-versa, based on determining whether each random point in each hypervolume is within a critical distance of at least one random point in the other hypervolume.
The intersection is the points in both hypervolumes, the union those in either hypervolume, and the unique components the points in one hypervolume but not the other.
If you have more than two hypervolumes and wish to calculate only an intersection, consider instead using hypervolume_set_n_intersection
rather than iteratively applying this function.
By default, the function uses check.memory=TRUE
which will provide an estimate of the computational cost of the set operations. The function should then be re-run with check_memory=FALSE
if the cost is acceptable. This algorithm's memory and time cost scale quadratically with the number of input points, so large datasets can have disproportionately high costs. This error-checking is intended to prevent the user from large accidental memory allocation.
The computation is actually performed on a random sample from both input hypervolumes, constraining each to have the same point density given by the minimum of the point density of each input hypervolume, and the point density calculated using the volumes of each input hypervolume divided by num.points.max
.
Because this algorithm is based on distances calculated between the distributions of random points, the critical distance (point density ^ (-1/n)) can be scaled by a user-specified factor to provide more or less liberal estimates (distance_factor
greater than or less than 1).
If check_memory
is false, returns a HypervolumeList object, with six items in its HVList slot:
HV1 |
The input hypervolume hv1 |
HV2 |
The input hypervolume hv2 |
Intersection |
The intersection of hv1 and hv2 |
Union |
The union of hv1 and hv2 |
Unique_1 |
The unique component of hv1 relative to hv2 |
Unique_2 |
The unique component of hv2 relative to hv1 |
Note that the output hypervolumes will have lower random point densities than the input hypervolumes.
You may find it useful to define a Jaccard-type fractional overlap between hv1 and hv2 as hv_set@HVList$Intersection@Volume / hv_set@HVList$Union@Volume
.
If check_memory
is true, instead returns a scalar with the expected number of pairwise comparisons.
If one of the input hypervolumes has no random points, returns NA
with a warning.
hypervolume_set_n_intersection
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hv_set <- hypervolume_set(hv1, hv2, check.memory=FALSE) hypervolume_overlap_statistics(hv_set) # examine volumes of each set component get_volume(hv_set)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] penguins_chinstrap = penguins_no_na[penguins_no_na$species=="Chinstrap", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv1 = hypervolume_box(penguins_adelie,name='Adelie') hv2 = hypervolume_box(penguins_chinstrap,name='Chinstrap') hv_set <- hypervolume_set(hv1, hv2, check.memory=FALSE) hypervolume_overlap_statistics(hv_set) # examine volumes of each set component get_volume(hv_set)
Intersection of n hypervolumes.
hypervolume_set_n_intersection(hv_list, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE)
hypervolume_set_n_intersection(hv_list, num.points.max = NULL, verbose = TRUE, distance.factor = 1, check.hyperplane = FALSE)
hv_list |
A list of hypervolumes (HypervolumeList) |
num.points.max |
Maximum number of random points to use for the calculation of the intersection. If |
verbose |
Logical value; print diagnostic output if true. |
distance.factor |
Numeric value; multiplicative factor applied to the critical distance for all inclusion tests (see below). Recommended to not change this parameter. |
check.hyperplane |
Checks whether data in the input hypervolumes forms a hyperplane (if so, the algorithm is not able to accurately calculate an intersection) |
Finds the intersection of multiple hypervolumes. Using this function is likely faster and more accurate than iteratively applying hypervolume_set
to hypervolume pairs, as this function does not iteratively perform downsampling.
Stores all the points from the input hypervolumes in a single set. Then uses the inclusion test approach to identify and store points from this set that are within each individual resampled hypervolume, successively. All the points that are common to all the tests are grouped, resampled and used to generate the hypervolume corresponding to the intersection.
The computation is actually performed on a random sample from input hypervolumes, constraining each to have the same point density given by the minimum of the point density of each input hypervolume, and the point density calculated using the volumes of each input hypervolume divided by num.points.max
.
Because this algorithm is based on distances calculated between the distributions of random points, the critical distance (point density ^ (-1/n)) can be scaled by a user-specified factor to provide more or less liberal estimates (distance_factor
greater than or less than 1).
result |
The intersection of the input hypervolumes, as a unique hypervolume |
.
Note that the output hypervolumes will have lower random point densities than the input hypervolumes.
If one of the input hypervolumes has no random points, returns NA
with a warning.
## Not run: data(iris) hv1 = hypervolume_gaussian(subset(iris, Species=="setosa")[,1:3], name='setosa') hv2 = hypervolume_gaussian(subset(iris, Species=="virginica")[,1:3], name='virginica') hv3 = hypervolume_gaussian(subset(iris, Species=="versicolor")[,1:3], name='versicolor') hv_list = hypervolume_join(hv1,hv2,hv3) intersection = hv_set_n_intersection(hv_list) ## End(Not run)
## Not run: data(iris) hv1 = hypervolume_gaussian(subset(iris, Species=="setosa")[,1:3], name='setosa') hv2 = hypervolume_gaussian(subset(iris, Species=="virginica")[,1:3], name='virginica') hv3 = hypervolume_gaussian(subset(iris, Species=="versicolor")[,1:3], name='versicolor') hv_list = hypervolume_join(hv1,hv2,hv3) intersection = hv_set_n_intersection(hv_list) ## End(Not run)
Constructs a hypervolume by building a one-class support vector machine that classifies data points as 'in' and other locations as 'out'. This is accomplished by 1) transforming the input data into a high-dimensional nonlinear space in which the data points can be optimally separated from background by a single hyperplane, 2) back-transforming the hyperplane into the original space, 3) delineating an adaptive grid of random points near the original data points, and 4) using the SVM to predict if each of these points is in or out.
hypervolume_svm(data, name = NULL, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), svm.nu = 0.01, svm.gamma = 0.5, scale.factor = 1, chunk.size = 1000, verbose = TRUE)
hypervolume_svm(data, name = NULL, samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)), svm.nu = 0.01, svm.gamma = 0.5, scale.factor = 1, chunk.size = 1000, verbose = TRUE)
data |
A m x n matrix or data frame, where m is the number of observations and n is the dimensionality. |
name |
A string to assign to the hypervolume for later output and plotting. Defaults to the name of the variable if NULL. |
samples.per.point |
Number of random points to be evaluated per data point in |
svm.nu |
A SVM parameter determining an upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Lower values result in tighter wrapping of the shape to the data (see section 2.2. of https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf). |
svm.gamma |
A SVM parameter defining the inverse radius of influence of a single point. Low values yield large influences (smooth less complex wraps around the data) and high values yield small influences (tighter but potentially noiser wraps around the data) (see http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html). |
scale.factor |
A multiplicative factor used to determine the boundaries of the hyperelliptical sampling region. Larger values yield larger boundaries and can prevent clipping. Should not need to be changed in almost any situation. |
chunk.size |
Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function; only how this output is internally assembled. |
verbose |
Logical value; print diagnostic output if |
A Hypervolume-class
object corresponding to the inferred hypervolume.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_svm(penguins_adelie,name='Adelie') summary(hv)
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_svm(penguins_adelie,name='Adelie') summary(hv)
Many hypervolume algorithms have computational complexities that scale with the number of random points used to characterize a hypervolume (@RandomPoints
). This value can be reduced to improve runtimes at the cost of lower resolution.
hypervolume_thin(hv, factor = NULL, num.points = NULL)
hypervolume_thin(hv, factor = NULL, num.points = NULL)
hv |
An object of class |
factor |
A number in (0,1) describing the fraction of random points to keep. |
num.points |
A number describing the number random points to keep. |
Either factor
or npoints
(but not both) must be specified.
A Hypervolume
object
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') # downsample to 1000 random points hv_thinned = hypervolume_thin(hv, num.points=1000) hv_thinned
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') # downsample to 1000 random points hv_thinned = hypervolume_thin(hv, num.points=1000) hv_thinned
Thresholds a hypervolume at a given value that can correspond to a quantile of the hypervolume. All random points below the threshold value are removed and the volume is adjusted accordingly. Provides threshold-quantile plots if multiple thresholds are specified (as by default).
Quantiles can be specified to be either of the total volume enclosed by the hypervolume p(proportional to nrow(hv@RandomPoints)
), or of the total probability density (proportional to sum(hv@ValueAtRandomPoints)
).
hypervolume_threshold(hv, thresholds = NULL, num.thresholds = 20, quantile.requested = NULL, quantile.requested.type = "volume", uniform.density = TRUE, plot = TRUE, verbose = TRUE)
hypervolume_threshold(hv, thresholds = NULL, num.thresholds = 20, quantile.requested = NULL, quantile.requested.type = "volume", uniform.density = TRUE, plot = TRUE, verbose = TRUE)
hv |
An input hypervolume |
thresholds |
A sequence of probability threshold values. If |
num.thresholds |
The number of threshold values to use if |
quantile.requested |
If not |
quantile.requested.type |
Determines the quantile type: either |
uniform.density |
Logical value. If |
plot |
Plots a threshold-quantile plot if |
verbose |
If |
Hypervolumes constructed using the hypervolume_box
method may not always yield quantiles close to the requested value because of the flat shape of the kernel.
A list containing two elements: a HypervolumeList
or Hypervolume
object corresponding to the hypervolumes at each threshold value, and a dataframe Statistics
corresponding to the relevant quantiles and thresholds.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') # get hypervolumes at multiple thresholds hvlist = hypervolume_threshold(hv, plot=TRUE) head(hvlist$Statistics) plot(hvlist$HypervolumesThresholded[[c(1,5,10,15,20)]], show.random=TRUE, show.data=FALSE,show.centroid=FALSE) # get hypervolume for a single low quantile value plot(hypervolume_threshold(hv, plot=FALSE, verbose=FALSE, quantile.requested=0.1,quantile.requested.type="volume")[[1]]) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') # get hypervolumes at multiple thresholds hvlist = hypervolume_threshold(hv, plot=TRUE) head(hvlist$Statistics) plot(hvlist$HypervolumesThresholded[[c(1,5,10,15,20)]], show.random=TRUE, show.data=FALSE,show.centroid=FALSE) # get hypervolume for a single low quantile value plot(hypervolume_threshold(hv, plot=FALSE, verbose=FALSE, quantile.requested=0.1,quantile.requested.type="volume")[[1]]) ## End(Not run)
data.frame
Convert objects of class Hypervolume
or HypervolumeList
to a data.frame
.
hypervolume_to_data_frame(hv, remove_zeroes = TRUE)
hypervolume_to_data_frame(hv, remove_zeroes = TRUE)
hv |
A |
remove_zeroes |
Remove zeroes from |
Zero values are generated during the occupancy routine when a random point is included in some groups of hypervolumes but not in others. A tipical usage of hypervolume_to_data_frame()
with objects generated with the occupancy routine should remove zeroes.
A data.frame
.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) hv_list <- hypervolume_join(hv_list) # get the data.frame hypervolume_to_data_frame(hv_list) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) hv_list <- hypervolume_join(hv_list) # get the data.frame hypervolume_to_data_frame(hv_list) ## End(Not run)
Assesses the contribution of each variable to the total hypervolume as a rough metric of variable importance.
hypervolume_variable_importance(hv, verbose = TRUE)
hypervolume_variable_importance(hv, verbose = TRUE)
hv |
A hypervolume for which the importance of each variable should be calculated. |
verbose |
If |
The algorithm proceeds by comparing the n-dimensional input hypervolume's volume to all possible n-1 dimensional hypervolumes where each variable of interest has been deleted. The importance score reported is the ratio of the n-dimensional hypervolume relative to each of the n-1 dimensional hypervolumes. Larger values indicate that a variable makes a proportionally higher contribution to the overall volume.
The algorithm can only be used on Hypervolumes that have a Data
and Method
value, because the variable deletion process is not well defined for objects that are not associated with a particular set of observations and construction method.
A named vector with importance scores for each axis. Note that these scores are not dimensionless but rather have units corresponding to the original units of each variable.
# low parameter values for speed data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') varimp = hypervolume_variable_importance(hv,verbose=FALSE) barplot(varimp,ylab='Importance',xlab='Variable')
# low parameter values for speed data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_box(penguins_adelie,name='Adelie') varimp = hypervolume_variable_importance(hv,verbose=FALSE) barplot(varimp,ylab='Importance',xlab='Variable')
"Hypervolume"
Primary storage class for stochastic descriptions of hypervolumes
Objects can be created by calls of the form new("Hypervolume", ...)
.
Name
:Object of class "character"
~~ the name of the hypervolume
Method
:Object of class "character"
~~ the method used to construct this hypervolume
Data
:Object of class "matrix"
~~ May be empty if the hypervolume is not associated with data (e.g. convex expectation, set operations)
Dimensionality
:Object of class "numeric"
~~ Dimensionality of the hypervolume
Volume
:Object of class "numeric"
~~ Volume of the hypervolume
PointDensity
:Object of class "numeric"
~~ Number of random points per unit volume
Parameters
:Object of class "list"
~~ List of parameters that will depend on the method used to construct the hypervolume
RandomPoints
:Object of class "matrix"
~~ A matrix of uniformly random points distributed within the hypervolume
ValueAtRandomPoints
:Object of class "numeric"
~~ A vector of positive numbers representing the probabilty density at each random point in @RandomPoints
"HypervolumeList"
A class used for storing more than one hypervolume.
Objects can be created by calls of the form new("HypervolumeList", ...)
.
HVList
:Object of class "list"
containing multiple hypervolumes
Data for nine morphological traits for species of Darwin's finches occurring on the Galapagos Islands.
Note that the underlying morphological dataset has been augmented and improved since version 1.3.1 to include more species and islands. Results are not comparable to version 1.3.0 and below. To duplicate results in the Blonder et al. (2014) paper please install an older version of the package.
data("morphSnodgrassHeller")
data("morphSnodgrassHeller")
A data frame with 549 observations on the following 20 variables.
Source
a factor with levels Snodgrass & Heller (1904)
IslandID
a factor with levels Balt_SS
Drwn_Clp
Esp_Hd
Flor_Chrl
Frn_Nrb
Gnov_Twr
Isa_Alb
Mrch_Bndl
Pnt_Abng
Pnz_Dnc
SCris_Chat
SCru_Inde
SFe_Brngt
Snti_Jams
Wlf_Wnm
TaxonOrig
a factor with levels Certhidea cinerascens bifasciata
Certhidea cinerascens cinerascens
Certhidea olivacea becki
Certhidea olivacea fusca
Certhidea olivacea luteola
Certhidea olivacea mentalis
Certhidea olivacea olivacea
Geospiza affinis
Geospiza conirostris conirostris
Geospiza conirostris propinqua
Geospiza crassirostris
Geospiza fortis dubia
Geospiza fortis fortis
Geospiza fortis fratercula
Geospiza fortis platyrhyncha
Geospiza fuliginosa acutirostris
Geospiza fuliginosa difficilis
Geospiza fuliginosa fuliginosa
Geospiza fuliginosa minor
Geospiza fuliginosa parvula
Geospiza habeli
Geospiza heliobates
Geospiza paupera
Geospiza prosthemelas prosthemelas
Geospiza prosthemelas salvini
Geospiza psittacula psittacula
Geospiza scandens abingdoni
Geospiza scandens fatigata
Geospiza scandens rothschildi
Geospiza scandens scandens
Geospiza septentrionalis
Geospiza strenua
GenusL69
a factor with levels Camarhynchus
Certhidea
Geospiza
Platyspiza
SpeciesL69
a factor with levels conirostris
crassirostris
difficilis
fortis
fuliginosa
heliobates
magnirostris
olivacea
parvulus
pauper
psittacula
scandens
SubspL69
a factor with levels abingdoni
affinis
becki
bifasciatus
cinerascens
conirostris
darwini
fusca
habeli
intermedia
luteola
mentalis
olivacea
parvulus
propinqua
psittacula
rothschildi
salvini
scandens
septentrionalis
strenua
SpeciesID
a factor with levels Cam.hel
Cam.par
Cam.pau
Cam.psi
Cer.oli
Geo.con
Geo.dif
Geo.for
Geo.ful
Geo.mag
Geo.sca
Pla.cra
SubspID
a factor with levels Cam.hel
Cam.par.par
Cam.par.sal
Cam.pau
Cam.psi.aff
Cam.psi.hab
Cam.psi.psi
Cer.oli.bec
Cer.oli.bif
Cer.oli.cin
Cer.oli.fus
Cer.oli.lut
Cer.oli.men
Cer.oli.oli
Geo.con.con
Geo.con.dar
Geo.con.pro
Geo.dif.sep
Geo.for
Geo.ful
Geo.mag.str
Geo.sca.abi
Geo.sca.int
Geo.sca.rot
Geo.sca.sca
Pla.cra
Sex
a factor with levels F
M
Plumage
a logical vector
BodyL
a numeric vector
WingL
a numeric vector
TailL
a numeric vector
BeakW
a numeric vector
BeakH
a numeric vector
LBeakL
a numeric vector
UBeakL
a numeric vector
N.UBkL
a factor with levels 10
10.3
10.5
10.7
11
11.3
11.5
11.7
12
12.3
12.5
12.7
13
13.3
13.5
13.7
14
14.3
14.5
14.7
15
15.3
15.5
15.7
16
16.3
16.5
16.7
17
17.5
6.5
6.7
7
7.3
7.5
7.7
8
8..3
8.3
8.5
8.7
9
9.3
9.5
9.7
TarsusL
a numeric vector
MToeL
a logical vector
Snodgrass RE and Heller E (1904) Papers from the Hopkins-Stanford Galapagos Expedition, 1898-99. XVI. Birds. Proceedings of the Washington Academy of Sciences 5: 231-372.
Downloaded from http://datadryad.org/resource/doi:10.5061/dryad.152
data(morphSnodgrassHeller) finch_isabela <- morphSnodgrassHeller[morphSnodgrassHeller$IslandID=="Isa_Alb",]
data(morphSnodgrassHeller) finch_isabela <- morphSnodgrassHeller[morphSnodgrassHeller$IslandID=="Isa_Alb",]
The occupancy_bootstrap_gof()
function calculates goodness of fit metrics for objects generated with hypervolume_n_occupancy_bootstrap()
.
occupancy_bootstrap_gof(path, FUN)
occupancy_bootstrap_gof(path, FUN)
path |
A path to a directory of bootstrapped hypervolumes generated with |
FUN |
Function to calculate the goodness of fit. It can be |
Goodness of fit metrics are calculated on the difference between input and recomputed volumes for each bootstrapped element (set with n
in hypervolume_n_resample()
). See hypervolume_n_occupancy()
for details on the meaning of input and recomputed hypervolumes.
A one row data.frame
reporting mean, standard deviation, minimum, maximum, median, 2.5%, 25%, 75% ans 97.5% quantiles.
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap hypervolumes based on sex hv_resample = hypervolume_n_resample(hv_list, name = "boot_example") # calculate occupancy for each bootstrap hv_occupancy_bootstrap = hypervolume_n_occupancy_bootstrap(hv_resample, name = "occupancy_example", classification = rep(c("female", "male"), 3)) occupancy_bootstrap_gof(hv_occupancy_bootstrap, FUN = "rmse") ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # bootstrap hypervolumes based on sex hv_resample = hypervolume_n_resample(hv_list, name = "boot_example") # calculate occupancy for each bootstrap hv_occupancy_bootstrap = hypervolume_n_occupancy_bootstrap(hv_resample, name = "occupancy_example", classification = rep(c("female", "male"), 3)) occupancy_bootstrap_gof(hv_occupancy_bootstrap, FUN = "rmse") ## End(Not run)
The occupancy_filter()
function is used to subset an hypervolume generated with hypervolume_n_occupancy()
or hypervolume_n_occupancy_test()
.
occupancy_filter(hv, operator = NULL, filter = NULL, tol = 1e-10)
occupancy_filter(hv, operator = NULL, filter = NULL, tol = 1e-10)
hv |
A |
operator |
Binary operator which allow the comparison. |
filter |
Threshold value to perform the operation. |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The occupancy_filter()
function set the occupancy values to 0 based on the user-provided operation. Volume of the hypervolumes are changed accordingly.
When hv
is an HypervolumeList
, the occupancy_filter()
function attempts to reconstruct the volume of the union of hypervolumes from hv_list
. At first, the volume of the union of hypervolumes is calculated for each element of hv
as the the ratio between the total number of random points and the number of random points of the ith element of hv
, multiplied by the volume of the ith element hv
. This step results in a number of reconstructed volumes equal to the number of hypervolumes in the jth bootstrapped occupancy_object. Reconstructed volumes are then compared among each other to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated with the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned.
A Hypervolume-class
or HypervolumeList-class
object.
hypervolume_n_occupancy
, hypervolume_n_occupancy_test
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # set to 0 values lower than 0.35 occupancy_filter(hv_occupancy_list_sex, operator = "<", filter = "0.35") ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm","bill_depth_mm","flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # set to 0 values lower than 0.35 occupancy_filter(hv_occupancy_list_sex, operator = "<", filter = "0.35") ## End(Not run)
The occupancy_to_intersection()
function is used to get the intersection of hypervolumes of an object generated with the occupancy routine.
occupancy_to_intersection(hv_list, method = "all", m = 2, tol = 1e-10)
occupancy_to_intersection(hv_list, method = "all", m = 2, tol = 1e-10)
hv_list |
A |
method |
If |
m |
Number of elements to choose. Default to 2 (pairwise comparisons). This argument is ignored when |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The occupancy_to_intersection()
function takes as input a HypervolumeList
generated with an occupancy function (check See Also
) and returns a Hypervolume
or HypervolumeList
depending on method
. When method = "all"
the occupancy_to_intersection()
function returns a Hypervolume
representing the intersection of all the hypervolumes in hv_list
. When method = "n_wise"
a HypervolumeList
in which each hypervolume represent the intersection of a combination of the hypervolumes in hv_list
is returned. The number of hypervolumes for each combination is set with the argument m
. Argument m
can not be higher than the number of hypervolumes in hv_list
and lower than 2.
The occupancy_to_intersection()
function attempts to reconstruct the volume of the intersection from the hv_list
provided by the user. At first, the volume of the union of hypervolumes is calculated for each hypervolume in hv_list
as the the ratio between the total number of random points and the number of random points of the ith hypervolume of hv_list
, multiplied by the volume of the ith hypervolume of hv_list
. This step results in a number of reconstructed volumes equal to the number of hypervolumes in the jth bootstrapped occupancy_object. Reconstructed volumes are then compared among each other to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated with the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned. The volume of the intersection is then calculated as the ratio between the number of random points of the intersection and the total number of random points, multiplied by the volume of the union of hypervolumes.
A Hypervolume-class
or HypervolumeList-class
object.
hypervolume_n_occupancy
, hypervolume_n_occupancy_test
, occupancy_to_union
,
occupancy_to_unshared
, occupancy_filter
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("Adelie", "Chinstrap", "Gentoo"), 2)) # get the hypervolume of intersection hv_occupancy_sex_intersection <- occupancy_to_intersection(hv_occupancy_list_sex) plot(hv_occupancy_sex_intersection) # get hypervolumes with the intersection among 3 or 2 combinations of hypervolumes hv_occ_sex_intersection_3 <- occupancy_to_intersection(hv_occupancy_list_sex, method = "n_wise", m = 3) hv_occ_intersection_2 <- occupancy_to_intersection(hv_occupancy_list_sex, method = "n_wise", m = 2) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("Adelie", "Chinstrap", "Gentoo"), 2)) # get the hypervolume of intersection hv_occupancy_sex_intersection <- occupancy_to_intersection(hv_occupancy_list_sex) plot(hv_occupancy_sex_intersection) # get hypervolumes with the intersection among 3 or 2 combinations of hypervolumes hv_occ_sex_intersection_3 <- occupancy_to_intersection(hv_occupancy_list_sex, method = "n_wise", m = 3) hv_occ_intersection_2 <- occupancy_to_intersection(hv_occupancy_list_sex, method = "n_wise", m = 2) ## End(Not run)
The occupancy_to_union()
function is used to get the union of hypervolumes of an object generated with hypervolume_n_occupancy()
.
occupancy_to_union(hv_list, method = "all", m = 2, tol = 1e-10)
occupancy_to_union(hv_list, method = "all", m = 2, tol = 1e-10)
hv_list |
A |
method |
If |
m |
Number of elements to choose. Default to 2 (pairwise comparisons). This argument is ignored when |
tol |
Set the tolerance for reconstructing whole volume. See details. |
The occupancy_to_union()
function takes as input a HypervolumeList
generated with an occupancy function (check See Also
) and returns a Hypervolume
or HypervolumeList
depending on method
. When method = "all"
the occupancy_to_union()
function returns a Hypervolume
representing the union of all the hypervolumes in hv_list
. When method = "n_wise"
a HypervolumeList
in which each hypervolume represent the union of a combination of the hypervolumes in hv_list
is returned. The number of hypervolumes for each combination is set with the argument m
. Argument m
can not be higher than the number of hypervolumes in hv_list
and lower than 2.
The occupancy_to_union()
function attempts to reconstruct the volume of the union from the hv_list
provided by the user. For each hypervolume in hv_list
, it calculates the volume of the union as the ratio between the total number of random points and the number of random points of the ith hypervolume of hv_list
, multiplied by the volume of the ith hypervolume of hv_list
. This step results in a number of reconstructed volumes equal to the number of hypervolumes in the jth bootstrapped occupancy_object. Reconstructed volumes are then compared among each other to ensure the consistency of the reconstruction. To do this, the distance among reconstructed volumes is calculated with the dist()
function of the stats
package. If at least one of the distances is greater than tol
the computation is stopped and some suggestions are returned.
A Hypervolume-class
or HypervolumeList-class
object.
hypervolume_n_occupancy
, hypervolume_n_occupancy_test
, occupancy_to_intersection
, occupancy_to_unshared
, occupancy_filter
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # get the union of all the hypervolumes hv_occupancy_sex_union <- occupancy_to_union(hv_occupancy_list_sex) plot(hv_occupancy_sex_union) ## End(Not run)
## Not run: data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) # split the dataset on species and sex penguins_no_na_split = split(penguins_no_na, paste(penguins_no_na$species, penguins_no_na$sex, sep = "_")) # calculate the hypervolume for each element of the splitted dataset hv_list = mapply(function(x, y) hypervolume_gaussian(x[, c("bill_length_mm", "flipper_length_mm")], samples.per.point=100, name = y), x = penguins_no_na_split, y = names(penguins_no_na_split)) # transform the list into an HypervolumeList hv_list = hypervolume_join(hv_list) # calculate occupancy based on sex hv_occupancy_list_sex = hypervolume_n_occupancy(hv_list, classification = rep(c("female", "male"), 3)) # get the union of all the hypervolumes hv_occupancy_sex_union <- occupancy_to_union(hv_occupancy_list_sex) plot(hv_occupancy_sex_union) ## End(Not run)
For each data axis, finds the minimum and maximum values. Then pads this range by a multiplicative factor of the range interval, and pads again by an additive amount.
padded_range(data, multiply.interval.amount = 0, add.amount = 0)
padded_range(data, multiply.interval.amount = 0, add.amount = 0)
data |
A m x n matrix whose range limits should be found. |
multiply.interval.amount |
A non-negative factor used to multiply the range interval. Can have either dimensionality 1 or n. |
add.amount |
A non-negative factor used to add to the range limits. Can have either dimensionality 1 or n. |
A 2 x n matrix, whose first row is the low value along each axis and whose second row is the high value along each axis.
data(morphSnodgrassHeller) finch_isabela <- na.omit(morphSnodgrassHeller[morphSnodgrassHeller$IslandID=="Isa_Alb", c("WingL","TailL","BeakW","BeakH")]) finch_isabela_rangebox_nopadding = padded_range(finch_isabela) finch_isabela_rangebox_nopadding finch_isabela_rangebox_padding = padded_range(finch_isabela, multiply.interval.amount=0.5, add.amount=0.1) finch_isabela_rangebox_padding
data(morphSnodgrassHeller) finch_isabela <- na.omit(morphSnodgrassHeller[morphSnodgrassHeller$IslandID=="Isa_Alb", c("WingL","TailL","BeakW","BeakH")]) finch_isabela_rangebox_nopadding = padded_range(finch_isabela) finch_isabela_rangebox_nopadding finch_isabela_rangebox_padding = padded_range(finch_isabela, multiply.interval.amount=0.5, add.amount=0.1) finch_isabela_rangebox_padding
Plots a single hypervolume or multiple hypervolumes as either a pairs plot (all axes) or a 3D plot (a subset of axes). The hypervolume is drawn as a uniformly random set of points guaranteed to be in the hypervolume.
## S3 method for class 'HypervolumeList' plot(x, show.3d=FALSE,plot.3d.axes.id=NULL, show.axes=TRUE, show.frame=TRUE, show.random=TRUE, show.density=TRUE,show.data=TRUE, names=NULL, show.legend=TRUE, limits=NULL, show.contour=TRUE, contour.lwd=1.5, contour.type='kde', contour.alphahull.alpha=0.25, contour.ball.radius.factor=1, contour.kde.level=1e-04, contour.raster.resolution=20, show.centroid=TRUE, cex.centroid=2, colors=rainbow(floor(length(x@HVList)*1.5),alpha=0.8), point.alpha.min=0.2, point.dark.factor=0.5, cex.random=0.5,cex.data=0.75,cex.axis=0.75,cex.names=1.0,cex.legend=0.75, num.points.max.data = 1000, num.points.max.random = 2000, reshuffle=TRUE, plot.function.additional=NULL, verbose=FALSE, ...)
## S3 method for class 'HypervolumeList' plot(x, show.3d=FALSE,plot.3d.axes.id=NULL, show.axes=TRUE, show.frame=TRUE, show.random=TRUE, show.density=TRUE,show.data=TRUE, names=NULL, show.legend=TRUE, limits=NULL, show.contour=TRUE, contour.lwd=1.5, contour.type='kde', contour.alphahull.alpha=0.25, contour.ball.radius.factor=1, contour.kde.level=1e-04, contour.raster.resolution=20, show.centroid=TRUE, cex.centroid=2, colors=rainbow(floor(length(x@HVList)*1.5),alpha=0.8), point.alpha.min=0.2, point.dark.factor=0.5, cex.random=0.5,cex.data=0.75,cex.axis=0.75,cex.names=1.0,cex.legend=0.75, num.points.max.data = 1000, num.points.max.random = 2000, reshuffle=TRUE, plot.function.additional=NULL, verbose=FALSE, ...)
x |
A Hypervolume or HypervolumeList object. The objects to be plotted. |
show.3d |
If |
plot.3d.axes.id |
Numeric identities of axes to plot in three dimensions. Defaults to 1:3 if set to |
show.axes |
If |
show.frame |
If |
show.random |
If |
show.density |
If |
show.data |
If |
names |
A vector of strings in the same order as the input hypervolumes. Used to draw the axes labels. |
show.legend |
If |
limits |
A list of two-element vectors corresponding to the axes limits for each dimension. If a single two-element vector is provided it is re-used for all axes. |
show.contour |
If |
contour.lwd |
Line width used for contour lines. Ignored if |
contour.type |
Type of contour boundary: any of |
contour.alphahull.alpha |
Value of the alpha parameter for a |
contour.ball.radius.factor |
Factor used to multiply radius of ball surrounding each random point for a |
contour.kde.level |
Probability level used to delineate edges for a |
contour.raster.resolution |
Grid resolution for a |
show.centroid |
If |
cex.centroid |
Expansion factor for the centroid symbol. |
colors |
A vector of colors to be used to plot each hypervolume, in the same order as the input hypervolumes. |
point.alpha.min |
Fractional value corresponding to the most transparent value for plotting random points. 0 corresponds to full transparency. |
point.dark.factor |
Fractional value corresponding to the darkening factor for plotting data points. 0 corresponds to fully black. |
cex.random |
cex value for uniformly random points. |
cex.data |
cex value for data points. |
cex.axis |
cex value for axes, if pair=T. |
cex.names |
cex value for variable names printed on the diagonal, if pair=T. |
cex.legend |
cex value for the legend text |
num.points.max.data |
An integer indicating the maximum number of data points to be sampled from each hypervolume. Lower values result in faster plotting and smaller file sizes but less accuracy. |
num.points.max.random |
An integer indicating the maximum number of random points to be sampled from each hypervolume. Lower values result in faster plotting and smaller file sizes but less accuracy. |
reshuffle |
A logical value relevant when pair=TRUE. If false, each hypervolume is drawn on top of the previous hypervolume; if true, all points of all hypervolumes are randomly shuffled so no hypervolume is given visual preference during plotting. |
plot.function.additional |
Any |
verbose |
If |
... |
Additional arguments to be passed to |
None; used for the side-effect of producing a plot.
Contour line plotting with alphahull
requires the non-FOSS alphahull
package to be installed. Please do so in order to use this functionality!
## Not run: # low parameter values for speed data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') # 2d plot plot(hv, show.3d=FALSE) # 3d plot if(interactive()) { plot(hv, show.3d=TRUE) } ## End(Not run)
## Not run: # low parameter values for speed data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] hv = hypervolume_gaussian(penguins_adelie,name='Adelie') # 2d plot plot(hv, show.3d=FALSE) # 3d plot if(interactive()) { plot(hv, show.3d=TRUE) } ## End(Not run)
Summarizes all slots of Hypervolume-class
object.
## S3 method for class 'Hypervolume' print(x, ...) ## S3 method for class 'HypervolumeList' print(x, ...)
## S3 method for class 'Hypervolume' print(x, ...) ## S3 method for class 'HypervolumeList' print(x, ...)
x |
The hypervolume to summarize |
... |
Other arguments to be passed to base print function |
None; used for the side-effect of printing.
Data for occurrences of Quercus alba and Quercus rubra based on geographic observations. Demonstration analysis of how to use hypervolumes for species distribution modeling using WorldClim data.
data(quercus)
data(quercus)
A data frame with 3779 observations on the following 3 variables.
Species
a factor with levels Quercus alba
Quercus rubra
Latitude
a numeric vector
Longitude
a numeric vector
Occurrence data come from the BIEN database (https://biendata.org/
). Climate data are from WorldClim.
Blonder, B., Lamanna, C., Violle, C., Enquist, B. The n-dimensional hypervolume. Global Ecology and Biogeography (2014).
demo('quercus', package='hypervolume')
demo('quercus', package='hypervolume')
Prints basic information about Hypervolume or HypervolumeList structure.
## S3 method for class 'Hypervolume' summary(object, ...) ## S3 method for class 'HypervolumeList' summary(object, ...)
## S3 method for class 'Hypervolume' summary(object, ...) ## S3 method for class 'HypervolumeList' summary(object, ...)
object |
The hypervolume to summarize |
... |
Other arguments to be passed to base summary function |
None; used for the side-effect of printing.
Takes a path to a directory containing only rds files and reads them into a HypervolumeList object.
to_hv_list(path)
to_hv_list(path)
path |
absolute or relative path to directory containing rds files |
Use to_hv_list on the output from hypervolume_resample
when method = "bootstrap" to read bootstrapped hypervolumes into memory.
HypervolumeList
object
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Use detectCores to see how many cores are availible in current environment path = hypervolume_resample("example_bootstrap", hv, method = "bootstrap", n = 50, cores = 12) hvs = to_hv_list(path) ## End(Not run)
## Not run: library(palmerpenguins) data(penguins) bill_data = na.omit(penguins[,3:4]) hv = hypervolume(bill_data) # Use detectCores to see how many cores are availible in current environment path = hypervolume_resample("example_bootstrap", hv, method = "bootstrap", n = 50, cores = 12) hvs = to_hv_list(path) ## End(Not run)
Resamples input data for hypervolume construction, so that some data points can be weighted more strongly than others in kernel density estimation. Also allows a multidimensional normal prior distribution to be placed on each data point to enable simulation of uncertainty or variation within each observed data point.
Note that this algorithm will change the number of data points and may thus lead to changes in the inferred hypervolume if the selected algorithm (e.g. for bandwidth selection) depends on sample size.
A direct weighting approach (which does not artificially change the sample size, and thus the kernel bandwidth estimate) is available for Gaussian hypervolumes within hypervolume_gaussian
.
weight_data(data, weights, jitter.sd = matrix(0, nrow = nrow(data), ncol = ncol(data)))
weight_data(data, weights, jitter.sd = matrix(0, nrow = nrow(data), ncol = ncol(data)))
data |
A data frame or matrix of unweighted data. Must only contain numeric values. |
weights |
A vector of weights with the same length as the number of rows in |
jitter.sd |
A matrix of the same size as |
Each data point is jittered a single time. To sample many points from a distribution around each observed data point, multiply all weights by a large number.
A data frame with the rows of data
repeated by weights
, potentially with noise added. The output has the same columns as the input but sum(weights)
total rows.
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] weighted_data <- weight_data(penguins_adelie, weights=1+rpois(n=nrow(penguins_adelie),lambda=3)) # color points by alpha to show overlaps pairs(weighted_data,col=rgb(1,0,0,alpha=0.15)) weighted_noisy_data <- weight_data(penguins_adelie, weights=1+rpois(n=nrow(penguins_adelie),lambda=3),jitter.sd=0.5) # color points by alpha to show overlaps pairs(weighted_noisy_data,col=rgb(1,0,0,alpha=0.15))
data(penguins,package='palmerpenguins') penguins_no_na = as.data.frame(na.omit(penguins)) penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie", c("bill_length_mm","bill_depth_mm","flipper_length_mm")] weighted_data <- weight_data(penguins_adelie, weights=1+rpois(n=nrow(penguins_adelie),lambda=3)) # color points by alpha to show overlaps pairs(weighted_data,col=rgb(1,0,0,alpha=0.15)) weighted_noisy_data <- weight_data(penguins_adelie, weights=1+rpois(n=nrow(penguins_adelie),lambda=3),jitter.sd=0.5) # color points by alpha to show overlaps pairs(weighted_noisy_data,col=rgb(1,0,0,alpha=0.15))