Title: | Nonparametric Methods for Directional Data |
---|---|
Description: | Nonparametric kernel density estimation, bandwidth selection, and other utilities for analyzing directional data. Implements the estimator in Bai, Rao and Zhao (1987) <doi:10.1016/0047-259X(88)90113-3>, the cross-validation bandwidth selectors in Hall, Watson and Cabrera (1987) <doi:10.1093/biomet/74.4.751> and the plug-in bandwidth selectors in García-Portugués (2013) <doi:10.1214/13-ejs821>. |
Authors: | Eduardo García-Portugués [aut, cre] |
Maintainer: | Eduardo García-Portugués <[email protected]> |
License: | GPL-3 |
Version: | 0.1.10 |
Built: | 2024-12-11 07:13:11 UTC |
Source: | CRAN |
DirStats
– Nonparametric Methods for Directional DataNonparametric kernel density estimation, bandwidth selection, and other utilities for analyzing directional data. Implements the estimator in Bai, Rao and Zhao (1987) <doi:10.1016/0047-259X(88)90113-3>, the cross-validation bandwidth selectors in Hall, Watson and Cabrera (1987) <doi:10.1093/biomet/74.4.751> and the plug-in bandwidth selectors in García-Portugués (2013) <doi:10.1214/13-ejs821>.
Eduardo García-Portugués.
Fitting mixtures of von Mises–Fisher distributions by the Expectation-Maximization algorithm, with determination of the optimal number of mixture components.
bic_vmf_mix(data, M_bound = ceiling(log(nrow(data))), M_neig = 3, crit = "BIC", iterative = TRUE, plot_it = FALSE, verbose = FALSE, kappa_max = 250)
bic_vmf_mix(data, M_bound = ceiling(log(nrow(data))), M_neig = 3, crit = "BIC", iterative = TRUE, plot_it = FALSE, verbose = FALSE, kappa_max = 250)
data |
directional data, a matrix of size |
M_bound |
bound for the number of components in the mixtures. If it is
not enough, the search for the mixture with minimum |
M_neig |
number of neighbors explored around the optimal number
of mixture components. Defaults to |
crit |
information criterion employed, either |
iterative |
keep exploring higher number of components if the optimum
is attained at |
plot_it |
display an informative plot on the optimization's grid search?
Defaults to |
verbose |
display fitting progress? Defaults to |
kappa_max |
maximum value of allowed concentrations, to avoid numerical
instabilities. Defaults to |
See Algorithm 3 in García-Portugués (2013). The Expectation-Maximization
fit is performed with movMF
.
A list with entries:
best_fit
: a list with estimated mixture parameters
mu_hat
, kappa_hat
, and p_hat
of the best-fitting
mixture according to crit
.
fit_mixs
: a list with of the fitted mixtures.
BICs
: a vector with the BICs (or other information criterion)
of the fitted mixtures.
García-Portugués, E. (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electronic Journal of Statistics, 7:1655–1685. doi:10.1214/13-ejs821
Hornik, K. and Grün, B. (2014). movMF: An R Package for Fitting Mixtures of von Mises–Fisher Distributions. Journal of Statistical Software, 58(10):1–31. doi:10.18637/jss.v058.i10
# Sample q <- 2 n <- 300 set.seed(42) samp <- rbind(rotasym::r_vMF(n = n / 3, mu = c(rep(0, q), 1), kappa = 5), rotasym::r_vMF(n = n / 3, mu = c(rep(0, q), -1), kappa = 5), rotasym::r_vMF(n = n / 3, mu = c(1, rep(0, q)), kappa = 5)) # Mixture fit bic_vmf_mix(data = samp, plot_it = TRUE, verbose = TRUE)
# Sample q <- 2 n <- 300 set.seed(42) samp <- rbind(rotasym::r_vMF(n = n / 3, mu = c(rep(0, q), 1), kappa = 5), rotasym::r_vMF(n = n / 3, mu = c(rep(0, q), -1), kappa = 5), rotasym::r_vMF(n = n / 3, mu = c(1, rep(0, q)), kappa = 5)) # Mixture fit bic_vmf_mix(data = samp, plot_it = TRUE, verbose = TRUE)
Likelihood and least squares cross-validation bandwidth selectors for kernel density estimation with directional data.
bw_dir_lcv(data, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), L = NULL, plot_it = FALSE, optim = TRUE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10) bw_dir_lscv(data, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), L = NULL, plot_it = FALSE, optim = TRUE, R_code = FALSE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10)
bw_dir_lcv(data, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), L = NULL, plot_it = FALSE, optim = TRUE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10) bw_dir_lscv(data, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), L = NULL, plot_it = FALSE, optim = TRUE, R_code = FALSE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10)
data |
directional data, a matrix of size |
h_grid |
vector of bandwidths for performing a grid search. Defaults
to |
L |
kernel function. Set internally to |
plot_it |
display an informative plot on the optimization's grid search?
Defaults to |
optim |
run an optimization? Defaults to |
optim_par , optim_lower , optim_upper
|
parameters passed to |
R_code |
use slower R code when |
data
is not checked to have unit norm, so the user must be careful.
When L = NULL
, faster FORTRAN code is employed.
bw_dir_lscv
employs Monte Carlo integration for , which
results in a random output. Use
set.seed
before to avoid it.
A list with entries:
h_opt
: selected bandwidth.
h_grid
: h_grid
, if used (otherwise NULL
).
CV_opt
: minimum of the CV loss.
CV_grid
: value of the CV function at h_grid
, if used
(otherwise NULL
).
The function bw_dir_lscv
employs Netlib's subroutine
ribesl
for evaluating
the modified Bessel function of the first kind. The subroutine is based
on a program by Sookne (1973) and was modified by W. J. Cody and L. Stoltz.
An earlier version was published in Cody (1983).
Cody, W. J. (1983). Algorithm 597: Sequence of modified Bessel functions of the first kind. ACM Transactions on Mathematical Software, 9(2):242–245. doi:10.1145/357456.357462
Hall, P., Watson, G. S., and Cabrera, J. (1987). Kernel density estimation with spherical data. Biometrika, 74(4):751–762. doi:10.1093/biomet/74.4.751
Sookne, D. J. (1973). Bessel functions of real argument and integer order. Journal of Research of the National Bureau of Standards, 77B:125–132.
# Sample n <- 25 q <- 2 set.seed(42) samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # bw_dir_lcv bw_dir_lcv(data = samp, optim = TRUE)$h_opt bw_dir_lcv(data = samp, optim = FALSE, plot_it = TRUE)$h_opt bw_dir_lcv(data = samp, L = function(x) exp(-x))$h_opt # bw_dir_lscv set.seed(42) bw_dir_lscv(data = samp, optim = TRUE)$h_opt bw_dir_lscv(data = samp, optim = FALSE, plot_it = TRUE)$h_opt bw_dir_lscv(data = samp, optim = FALSE, R_code = TRUE)$h_opt bw_dir_lscv(data = samp, L = function(x) exp(-x))$h_opt
# Sample n <- 25 q <- 2 set.seed(42) samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # bw_dir_lcv bw_dir_lcv(data = samp, optim = TRUE)$h_opt bw_dir_lcv(data = samp, optim = FALSE, plot_it = TRUE)$h_opt bw_dir_lcv(data = samp, L = function(x) exp(-x))$h_opt # bw_dir_lscv set.seed(42) bw_dir_lscv(data = samp, optim = TRUE)$h_opt bw_dir_lscv(data = samp, optim = FALSE, plot_it = TRUE)$h_opt bw_dir_lscv(data = samp, optim = FALSE, R_code = TRUE)$h_opt bw_dir_lscv(data = samp, L = function(x) exp(-x))$h_opt
Plug-in bandwidth selectors for kernel density estimation with directional data, including Rule-Of-Thumb (ROT), Asymptotic MIxtures (AMI), and Exact MIxtures (EMI).
bw_dir_rot(data) bw_dir_ami(data, fit_mix = NULL, L = NULL) R_Psi_mixvmf(q, mu, kappa, p) bw_dir_emi(data, fit_mix = NULL, optim = TRUE, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), plot_it = TRUE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10)
bw_dir_rot(data) bw_dir_ami(data, fit_mix = NULL, L = NULL) R_Psi_mixvmf(q, mu, kappa, p) bw_dir_emi(data, fit_mix = NULL, optim = TRUE, h_grid = exp(seq(log(0.05), log(1.5), l = 100)), plot_it = TRUE, optim_par = 0.25, optim_lower = 0.06, optim_upper = 10)
data |
directional data, a matrix of size |
fit_mix |
output from |
L |
kernel function. Set internally to |
q |
dimension of |
mu , kappa , p
|
mixture parameters. |
optim |
run an optimization? Defaults to |
h_grid |
vector of bandwidths for performing a grid search. Defaults
to |
plot_it |
display an informative plot on the optimization's grid search?
Defaults to |
optim_par , optim_lower , optim_upper
|
parameters passed to |
See Algorithms 1 (AMI) and 2 (EMI) in García-Portugués (2013). The ROT
selector is implemented according to Proposition 2, but without
the paper's typo in equation (6), case , where an incorrect
extra
appears premultiplying
in the denominator.
bw_dir_ami
uses R_Psi_mixvmf
for computing the curvature
term of a mixture of von Mises–Fisher densities.
bw_dir_emi
employs Monte Carlo integration for , which
results in a random output. Use
set.seed
before to avoid it.
Selected bandwidth for bw_dir_rot
and bw_dir_ami
.
bw_dir_emi
returns a list with entries:
h_opt
: selected bandwidth.
h_grid
: h_grid
, if used (otherwise NULL
).
MISE_opt
: minimum of the MISE loss.
MISE_grid
: value of the MISE function at h_grid
, if
used (otherwise NULL
).
García-Portugués, E. (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electronic Journal of Statistics, 7:1655–1685. doi:10.1214/13-ejs821
# Sample n <- 25 q <- 2 set.seed(42) samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Mixture fit fit_mix <- bic_vmf_mix(data = samp, plot_it = TRUE) # ROT bw_dir_rot(samp) # AMI bw_dir_ami(samp) bw_dir_ami(samp, fit_mix = fit_mix) bw_dir_ami(samp, fit_mix = fit_mix, L = function(x) exp(-x)) # EMI bw_dir_emi(samp) bw_dir_emi(samp, fit_mix = fit_mix, optim = FALSE, plot_it = TRUE)
# Sample n <- 25 q <- 2 set.seed(42) samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Mixture fit fit_mix <- bic_vmf_mix(data = samp, plot_it = TRUE) # ROT bw_dir_rot(samp) # AMI bw_dir_ami(samp) bw_dir_ami(samp, fit_mix = fit_mix) bw_dir_ami(samp, fit_mix = fit_mix, L = function(x) exp(-x)) # EMI bw_dir_emi(samp) bw_dir_emi(samp, fit_mix = fit_mix, optim = FALSE, plot_it = TRUE)
Normalization of data in to
.
Transformations between
and
, and between
and
.
norm2(x) normalize(x) to_cir(th) to_rad(x) to_sph(th, ph)
norm2(x) normalize(x) to_cir(th) to_rad(x) to_sph(th, ph)
x |
matrix or vector, in |
th |
vector of angles in |
ph |
vector of angles in |
Euclidean norm (norm
) and normalized data (normalize
).
Position in (
to_cir
) or in (
to_rad
).
Position in (
to_sph
) or in
(
to_rad
).
# Normalization x <- 1:3 norm2(x) normalize(x) x <- rbind(1:3, 3:1) norm2(x) normalize(x) # Circular transformations th <- 1 x <- c(0, 1) to_rad(to_cir(th)) to_rad(to_cir(c(th, th + 1))) to_cir(to_rad(x)) to_cir(to_rad(rbind(x, -x))) # Spherical transformations th <- 2 ph <- 1 x <- c(0, 1, 0) to_rad(to_sph(th, ph)) to_rad(to_sph(c(th, th + 1), c(ph, ph + 1))) to_sph(to_rad(x)[, 1], to_rad(x)[, 2]) to_sph(to_rad(rbind(x, -x))[, 1], to_rad(rbind(x, -x))[, 2])
# Normalization x <- 1:3 norm2(x) normalize(x) x <- rbind(1:3, 3:1) norm2(x) normalize(x) # Circular transformations th <- 1 x <- c(0, 1) to_rad(to_cir(th)) to_rad(to_cir(c(th, th + 1))) to_cir(to_rad(x)) to_cir(to_rad(rbind(x, -x))) # Spherical transformations th <- 2 ph <- 1 x <- c(0, 1, 0) to_rad(to_sph(th, ph)) to_rad(to_sph(c(th, th + 1), c(ph, ph + 1))) to_sph(to_rad(x)[, 1], to_rad(x)[, 2]) to_sph(to_rad(rbind(x, -x))[, 1], to_rad(rbind(x, -x))[, 2])
Several quadrature rules for integration of functions on
,
, and
,
.
int_cir(f, N = 500, na.rm = TRUE, f_vect = TRUE, ...) int_sph(f, na.rm = TRUE, f_vect = TRUE, ...) int_hypsph(f, q, M = 1e+05, na.rm = TRUE, f_vect = TRUE, ...)
int_cir(f, N = 500, na.rm = TRUE, f_vect = TRUE, ...) int_sph(f, na.rm = TRUE, f_vect = TRUE, ...) int_hypsph(f, q, M = 1e+05, na.rm = TRUE, f_vect = TRUE, ...)
f |
function to be integrated on |
N |
Defaults to |
na.rm |
ignore possible |
f_vect |
can |
... |
further arguments passed to |
q |
dimension of |
M |
number of Monte Carlo replicates. Defaults to |
int_cir
is an extension of equation (4.1.11) in Press et al. (1997),
a periodic trapezoidal rule. int_sph
employs the
Lebedev quadrature on .
int_hypsph
implements a Monte Carlo integration on .
A scalar approximating the integral.
Lebedev, V. I. and Laikov, D. N. (1999). A quadrature formula for the sphere of the 131st algebraic order of accuracy. Doklady Mathematics, 59(3):477–481.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery B. P. (1997). Numerical Recipes in Fortran 77: The Art of Scientific Computing. Volume 1. Cambridge University Press, Cambridge. Second edition.
# S^1, trapezoidal rule f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 1), kappa = 2) int_cir(f = f) # S^2, Lebedev rule f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 0, 1), kappa = 2) int_sph(f = f) # S^2, Monte Carlo f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 0, 1), kappa = 2) int_hypsph(f = f, q = 2)
# S^1, trapezoidal rule f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 1), kappa = 2) int_cir(f = f) # S^2, Lebedev rule f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 0, 1), kappa = 2) int_sph(f = f) # S^2, Monte Carlo f <- function(x) rotasym::d_vMF(x = x, mu = c(0, 0, 1), kappa = 2) int_hypsph(f = f, q = 2)
Kernel density estimation with directional data as in the estimator of Bai et al. (1988).
kde_dir(x, data, h, L = NULL) c_h(h, q, L = NULL) lambda_L(L = NULL, q) b_L(L = NULL, q) d_L(L = NULL, q)
kde_dir(x, data, h, L = NULL) c_h(h, q, L = NULL) lambda_L(L = NULL, q) b_L(L = NULL, q) d_L(L = NULL, q)
x |
evaluation points, a matrix of size |
data |
directional data, a matrix of size |
h |
bandwidth, a scalar for |
L |
kernel function. Set internally to |
q |
dimension of |
data
is not checked to have unit norm, so the user must be careful.
When L = NULL
, faster FORTRAN code is employed.
kde_dir
returns a vector of size nx
with the
evaluated kernel density estimator. c_h
returns the normalizing
constant for the kernel, a vector of length length(h)
.
lambda_L
, b_L
, and d_L
return moments of L
.
Bai, Z. D., Rao, C. R., and Zhao, L. C. (1988). Kernel estimators of density function of directional data. Journal of Multivariate Analysis, 27(1):24–39. doi:10.1016/0047-259X(88)90113-3
# Sample n <- 50 q <- 3 samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Evaluation points x <- rbind(diag(1, nrow = q + 1), diag(-1, nrow = q + 1)) # kde_dir kde_dir(x = x, data = samp, h = 0.5, L = NULL) kde_dir(x = x, data = samp, h = 0.5, L = function(x) exp(-x)) # c_h c_h(h = 0.5, q = q, L = NULL) c_h(h = 0.5, q = q, L = function(x) exp(-x)) # b_L b_L(L = NULL, q = q) b_L(L = function(x) exp(-x), q = q) # d_L d_L(L = NULL, q = q) d_L(L = function(x) exp(-x), q = q) # lambda_L lambda_L(L = NULL, q = q) lambda_L(L = function(x) exp(-x), q = q)
# Sample n <- 50 q <- 3 samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Evaluation points x <- rbind(diag(1, nrow = q + 1), diag(-1, nrow = q + 1)) # kde_dir kde_dir(x = x, data = samp, h = 0.5, L = NULL) kde_dir(x = x, data = samp, h = 0.5, L = function(x) exp(-x)) # c_h c_h(h = 0.5, q = q, L = NULL) c_h(h = 0.5, q = q, L = function(x) exp(-x)) # b_L b_L(L = NULL, q = q) b_L(L = function(x) exp(-x), q = q) # d_L d_L(L = NULL, q = q) d_L(L = function(x) exp(-x), q = q) # lambda_L lambda_L(L = NULL, q = q) lambda_L(L = function(x) exp(-x), q = q)
Nodes and weights for Lebedev quadrature on the sphere
. The rule has 5810 points and is exact up to polynomials
of order 131.
lebedev
lebedev
A data frame with 5810 rows and two variables:
nodes for quadrature, a matrix with three columns.
weights for quadrature, a vector.
The approximation to the integral of has the form
where . The nodes (in spherical coordinates) and weights
are processed from
lebedev_131.txt.
https://people.sc.fsu.edu/~jburkardt/datasets/sphere_lebedev_rule/sphere_lebedev_rule.html
Lebedev, V. I. and Laikov, D. N. (1999). A quadrature formula for the sphere of the 131st algebraic order of accuracy. Doklady Mathematics, 59(3):477–481.
# Load data data("lebedev") # Integrate x_1 * x_2^2 (zero integral) f_1 <- function(x) x[, 1] * x[, 2]^2 4 * pi * sum(lebedev$w * f_1(lebedev$xyz))
# Load data data("lebedev") # Integrate x_1 * x_2^2 (zero integral) f_1 <- function(x) x[, 1] * x[, 2]^2 4 * pi * sum(lebedev$w * f_1(lebedev$xyz))
Maximum likelihood estimation for the von Mises–Fisher distribution and evaluation of density mixtures.
kappa_ml(data, min_kappa = 1e-04, max_kappa = 100, ...) mu_ml(data) d_mixvmf(x, mu, kappa, p, norm = FALSE)
kappa_ml(data, min_kappa = 1e-04, max_kappa = 100, ...) mu_ml(data) d_mixvmf(x, mu, kappa, p, norm = FALSE)
data |
directional data, a matrix of size |
min_kappa , max_kappa
|
minimum and maximum kappas to look for the maximum likelihood estimate. |
... |
further parameters passed to |
x |
evaluation points, a matrix of size |
mu , kappa , p
|
mixture parameters. |
norm |
enforce normalization of |
Estimated vector mean (mu_ml
) or concentration parameter
(kappa_ml
). A vector of length nx
for d_mixvmf
.
# Sample n <- 50 q <- 2 samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Estimates mu_ml(samp) kappa_ml(samp) # Mixture x <- to_cir(seq(0, 2 * pi, l = 200)) dens <- d_mixvmf(x = x, mu = rbind(c(-1, 0), c(0, 1), c(1, 0)), kappa = 1:3, p = c(0.5, 0.2, 0.3)) plot(to_rad(x), dens, type = "l")
# Sample n <- 50 q <- 2 samp <- rotasym::r_vMF(n = n, mu = c(1, rep(0, q)), kappa = 2) # Estimates mu_ml(samp) kappa_ml(samp) # Mixture x <- to_cir(seq(0, 2 * pi, l = 200)) dens <- d_mixvmf(x = x, mu = rbind(c(-1, 0), c(0, 1), c(1, 0)), kappa = 1:3, p = c(0.5, 0.2, 0.3)) plot(to_rad(x), dens, type = "l")