Title: | PCA on the Torus via Density Ridges |
---|---|
Description: | Implementation of a Principal Component Analysis (PCA) in the torus via density ridge estimation. The main function, ridge_pca(), obtains the relevant density ridge for bivariate sine von Mises and bivariate wrapped Cauchy distribution models and provides the associated scores and variance decomposition. Auxiliary functions for evaluating, fitting, and sampling these models are also provided. The package provides replicability to García-Portugués and Prieto-Tirado (2023) <doi:10.1007/s11222-023-10273-9>. |
Authors: | Eduardo García-Portugués [aut, cre] , Arturo Prieto-Tirado [aut] |
Maintainer: | Eduardo García-Portugués <[email protected]> |
License: | GPL-3 |
Version: | 1.0.2 |
Built: | 2024-11-07 06:47:15 UTC |
Source: | CRAN |
ridgetorus
: PCA on the Torus via Density RidgesImplementation of a Principal Component Analysis (PCA) in the torus via density ridge estimation. The main function, ridge_pca(), obtains the relevant density ridge for bivariate sine von Mises and bivariate wrapped Cauchy distribution models and provides the associated scores and variance decomposition. Auxiliary functions for evaluating, fitting, and sampling these models are also provided. The package provides replicability to García-Portugués and Prieto-Tirado (2023) <doi:10.1007/s11222-023-10273-9>.
Eduardo García-Portugués and Arturo Prieto-Tirado.
García-Portugués, E. and Prieto-Tirado, A. (2023). Toroidal PCA via density ridges. Statistics and Computing, 33(5):107. doi:10.1007/s11222-023-10273-9
Performs the following likelihood ratio tests for the
concentrations in bivariate sine von Mises and wrapped Cauchy distributions:
(1) homogeneity: vs.
, and
vs.
, respectively;
(2) independence:
vs.
, and
vs.
.
The tests (1) and (2) can be performed simultaneously.
biv_lrt(x, hom = FALSE, indep = FALSE, fit_mle = NULL, type, ...)
biv_lrt(x, hom = FALSE, indep = FALSE, fit_mle = NULL, type, ...)
x |
matrix of dimension |
hom |
test the homogeneity hypothesis? Defaults to |
indep |
test the independence hypothesis? Defaults to |
fit_mle |
output of |
type |
either |
... |
optional parameters passed to |
A list with class htest
:
statistic |
the value of the likelihood ratio test statistic. |
p.value |
the |
alternative |
a character string describing the alternative hypothesis. |
method |
description of the type of test performed. |
df |
degrees of freedom. |
data.name |
a character string giving the name of |
fit_mle |
maximum likelihood fit. |
fit_null |
maximum likelihood fit under the null hypothesis. |
Kato, S. and Pewsey, A. (2015). A Möbius transformation-induced distribution on the torus. Biometrika, 102(2):359–370. doi:10.1093/biomet/asv003
Singh, H., Hnizdo, V., and Demchuk, E. (2002). Probabilistic model for two dependent circular variables. Biometrika, 89(3):719–723. doi:10.1093/biomet/89.3.719
## Bivariate sine von Mises # Homogeneity n <- 200 mu <- c(0, 0) kappa_0 <- c(1, 1, 0.5) kappa_1 <- c(0.7, 0.1, 0.25) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, hom = TRUE, type = "bvm") biv_lrt(x = samp_1, hom = TRUE, type = "bvm") # Independence kappa_0 <- c(0, 1, 0) kappa_1 <- c(1, 0, 1) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, indep = TRUE, type = "bvm") biv_lrt(x = samp_1, indep = TRUE, type = "bvm") # Independence and homogeneity kappa_0 <- c(3, 3, 0) kappa_1 <- c(3, 1, 0) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, indep = TRUE, hom = TRUE, type = "bvm") biv_lrt(x = samp_1, indep = TRUE, hom = TRUE, type = "bvm") ## Bivariate wrapped Cauchy # Homogeneity xi_0 <- c(0.5, 0.5, 0.25) xi_1 <- c(0.7, 0.1, 0.5) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, hom = TRUE, type = "bwc") biv_lrt(x = samp_1, hom = TRUE, type = "bwc") # Independence xi_0 <- c(0.1, 0.5, 0) xi_1 <- c(0.3, 0.5, 0.2) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, indep = TRUE, type = "bwc") biv_lrt(x = samp_1, indep = TRUE, type = "bwc") # Independence and homogeneity xi_0 <- c(0.2, 0.2, 0) xi_1 <- c(0.1, 0.2, 0.1) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, indep = TRUE, hom = TRUE, type = "bwc") biv_lrt(x = samp_1, indep = TRUE, hom = TRUE, type = "bwc")
## Bivariate sine von Mises # Homogeneity n <- 200 mu <- c(0, 0) kappa_0 <- c(1, 1, 0.5) kappa_1 <- c(0.7, 0.1, 0.25) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, hom = TRUE, type = "bvm") biv_lrt(x = samp_1, hom = TRUE, type = "bvm") # Independence kappa_0 <- c(0, 1, 0) kappa_1 <- c(1, 0, 1) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, indep = TRUE, type = "bvm") biv_lrt(x = samp_1, indep = TRUE, type = "bvm") # Independence and homogeneity kappa_0 <- c(3, 3, 0) kappa_1 <- c(3, 1, 0) samp_0 <- r_bvm(n = n, mu = mu, kappa = kappa_0) samp_1 <- r_bvm(n = n, mu = mu, kappa = kappa_1) biv_lrt(x = samp_0, indep = TRUE, hom = TRUE, type = "bvm") biv_lrt(x = samp_1, indep = TRUE, hom = TRUE, type = "bvm") ## Bivariate wrapped Cauchy # Homogeneity xi_0 <- c(0.5, 0.5, 0.25) xi_1 <- c(0.7, 0.1, 0.5) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, hom = TRUE, type = "bwc") biv_lrt(x = samp_1, hom = TRUE, type = "bwc") # Independence xi_0 <- c(0.1, 0.5, 0) xi_1 <- c(0.3, 0.5, 0.2) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, indep = TRUE, type = "bwc") biv_lrt(x = samp_1, indep = TRUE, type = "bwc") # Independence and homogeneity xi_0 <- c(0.2, 0.2, 0) xi_1 <- c(0.1, 0.2, 0.1) samp_0 <- r_bwc(n = n, mu = mu, xi = xi_0) samp_1 <- r_bwc(n = n, mu = mu, xi = xi_1) biv_lrt(x = samp_0, indep = TRUE, hom = TRUE, type = "bwc") biv_lrt(x = samp_1, indep = TRUE, hom = TRUE, type = "bwc")
Computation of the density and normalizing constant
of the bivariate sine von Mises
Simulation of samples from a bivariate sine von Mises.
Maximum likelihood and method of moments estimation of the
parameters .
d_bvm(x, mu, kappa, log_const = NULL) const_bvm(kappa, M = 25, MC = 10000) r_bvm(n, mu, kappa) fit_bvm_mm( x, lower = c(0, 0, -30), upper = c(30, 30, 30), start = NULL, M = 25, hom = FALSE, indep = FALSE, ... ) fit_bvm_mle( x, start = NULL, M = 25, lower = c(-pi, -pi, 0, 0, -30), upper = c(pi, pi, 30, 30, 30), hom = FALSE, indep = FALSE, ... )
d_bvm(x, mu, kappa, log_const = NULL) const_bvm(kappa, M = 25, MC = 10000) r_bvm(n, mu, kappa) fit_bvm_mm( x, lower = c(0, 0, -30), upper = c(30, 30, 30), start = NULL, M = 25, hom = FALSE, indep = FALSE, ... ) fit_bvm_mle( x, start = NULL, M = 25, lower = c(-pi, -pi, 0, 0, -30), upper = c(pi, pi, 30, 30, 30), hom = FALSE, indep = FALSE, ... )
x |
matrix of size |
mu |
circular means of the density, a vector of length |
kappa |
vector of length |
log_const |
logarithm of the normalizing constant. Computed internally
if |
M |
truncation of the series expansion for computing the normalizing
constant. Defaults to |
MC |
Monte Carlo replicates for computing the normalizing
constant when there is no series expansion. Defaults to |
n |
sample size. |
lower , upper
|
vectors of length |
start |
a vector of length |
hom |
assume a homogeneous distribution with equal marginal
concentrations? Defaults to |
indep |
set the dependence parameter to zero? Defaults to |
... |
further parameters passed to
|
d_bvm
: a vector of length nx
with the density evaluated
at x
.
const_bvm
: the value of the normalizing constant
.
r_bvm
: a matrix of size c(n, 2)
with the random sample.
fit_mme_bvm, fit_mle_bvm
: a list with the estimated parameters
and the object
opt
containing the optimization summary.
Mardia, K. V., Hughes, G., Taylor, C. C., and Singh, H. (2008). A multivariate von Mises with applications to bioinformatics. Canadian Journal of Statistics, 36(1):99–109. doi:10.1002/cjs.5550360110
Singh, H., Hnizdo, V., and Demchuk, E. (2002). Probabilistic model for two dependent circular variables. Biometrika, 89(3):719–723. doi:10.1093/biomet/89.3.719
## Density evaluation mu <- c(0, 0) kappa <- 3:1 nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) const <- const_bvm(kappa = kappa) d <- d_bvm(x = x, mu = mu, kappa = kappa, log_const = log(const)) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(31), levels = seq(0, max(d), l = 30)) ## Sampling and estimation n <- 100 samp <- r_bvm(n = n, mu = mu, kappa = kappa) (param_mm <- fit_bvm_mm(samp)$par) (param_mle <- fit_bvm_mle(samp)$par)
## Density evaluation mu <- c(0, 0) kappa <- 3:1 nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) const <- const_bvm(kappa = kappa) d <- d_bvm(x = x, mu = mu, kappa = kappa, log_const = log(const)) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(31), levels = seq(0, max(d), l = 30)) ## Sampling and estimation n <- 100 samp <- r_bvm(n = n, mu = mu, kappa = kappa) (param_mm <- fit_bvm_mm(samp)$par) (param_mle <- fit_bvm_mle(samp)$par)
Computation of the density of a bivariate wrapped Cauchy:
Simulation of samples from a bivariate wrapped Cauchy.
Maximum likelihood and method of moments estimation of the
parameters .
d_bwc(x, mu, xi) r_bwc(n, mu, xi) fit_bwc_mm(x, hom = FALSE, indep = FALSE) fit_bwc_mle( x, start = NULL, lower = c(-pi, -pi, 0, 0, -1 + 0.001), upper = c(pi, pi, 1 - 0.001, 1 - 0.001, 1 - 0.001), hom = FALSE, indep = FALSE, ... )
d_bwc(x, mu, xi) r_bwc(n, mu, xi) fit_bwc_mm(x, hom = FALSE, indep = FALSE) fit_bwc_mle( x, start = NULL, lower = c(-pi, -pi, 0, 0, -1 + 0.001), upper = c(pi, pi, 1 - 0.001, 1 - 0.001, 1 - 0.001), hom = FALSE, indep = FALSE, ... )
x |
matrix of size |
mu |
circular means of the density, a vector of length |
xi |
a vector of length |
n |
sample size. |
hom |
assume a homogeneous distribution with equal marginal
concentrations? Defaults to |
indep |
set the dependence parameter to zero? Defaults to |
start |
a vector of length |
lower , upper
|
vectors of length |
... |
further parameters passed to
|
d_bwc
: a vector of length nx
with the density evaluated
at x
.
r_bwc
: a matrix of size c(n, 2)
with the random sample.
fit_mme_bwc, fit_mle_bwc
: a list with the parameters
and the object
opt
containing the optimization summary.
The original code for r_bwc
was supplied by
Arthur Pewsey.
Kato, S. and Pewsey, A. (2015). A Möbius transformation-induced distribution on the torus. Biometrika, 102(2):359–370. doi:10.1093/biomet/asv003
## Density evaluation mu <- c(0, 0) xi <- c(0.3, 0.5, 0.4) nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwc(x = x, mu = mu, xi = xi) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20), levels = seq(0, max(d), l = 20)) ## Sampling and estimation n <- 100 samp <- r_bwc(n = n, mu = mu, xi = xi) (param_mm <- fit_bwc_mm(samp)$par) (param_mle <- fit_bwc_mle(samp)$par)
## Density evaluation mu <- c(0, 0) xi <- c(0.3, 0.5, 0.4) nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwc(x = x, mu = mu, xi = xi) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20), levels = seq(0, max(d), l = 20)) ## Sampling and estimation n <- 100 samp <- r_bwc(n = n, mu = mu, xi = xi) (param_mm <- fit_bwc_mm(samp)$par) (param_mle <- fit_bwc_mle(samp)$par)
Computation of the density of the bivariate wrapped normal.
Simulation of pairs of samples from a bivariate wrapped normal.
Maximum likelihood estimation of the parameters
.
d_bwn(x, mu, Sigma, kmax = 2) r_bwn(n, mu, Sigma) fit_bwn_mle( x, kmax = 2, lower = c(-pi, -pi, 0.001, 0.001, -1), upper = c(pi, pi, 20, 20, 1), ... )
d_bwn(x, mu, Sigma, kmax = 2) r_bwn(n, mu, Sigma) fit_bwn_mle( x, kmax = 2, lower = c(-pi, -pi, 0.001, 0.001, -1), upper = c(pi, pi, 20, 20, 1), ... )
x |
matrix of size |
mu |
circular means of the density, a vector of length |
Sigma |
covariance matrix of size |
kmax |
integer number up to truncate the wrapped normal series in
|
n |
sample size. |
lower , upper
|
vector of length |
... |
further parameters passed to
|
d_bwn
: a vector of length nx
with the density evaluated
at x
.
r_bwn
: a matrix of size c(n, 2)
with the random sample.
fit_bwn_mle
: a list with the parameters
and the object
opt
containing the optimization summary.
## Density evaluation mu <- c(0, 0) Sigma <- 3 * matrix(c(1.5, 0.75, 0.75, 1), nrow = 2, ncol = 2) nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwn(x = x, mu = mu, Sigma = Sigma) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(31), levels = seq(0, max(d), l = 30)) ## Sampling and estimation n <- 100 samp <- r_bwn(n = 100, mu = mu, Sigma = Sigma) (param_mle <- fit_bwn_mle(samp)$par)
## Density evaluation mu <- c(0, 0) Sigma <- 3 * matrix(c(1.5, 0.75, 0.75, 1), nrow = 2, ncol = 2) nth <- 50 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwn(x = x, mu = mu, Sigma = Sigma) filled.contour(th, th, matrix(d, nth, nth), col = viridisLite::viridis(31), levels = seq(0, max(d), l = 30)) ## Sampling and estimation n <- 100 samp <- r_bwn(n = 100, mu = mu, Sigma = Sigma) (param_mle <- fit_bwn_mle(samp)$par)
Pre-earthquake direction of steepest descent and the direction of lateral ground movement before and after, respectively, an earthquake in Noshiro (Japan) in 1983.
earthquakes
earthquakes
A data frame with 678 rows and 2 variables:
Direction of steepest descent.
Direction of lateral ground movement.
The direction is measured in radians in with
/
/
/
/
representing the East / North / West / South / East directions.
Hamada, M. and O'Rourke, T. (1992). Case Studies of Liquefaction & Lifeline Performance During Past Earthquake. Volume 1: Japanese Case Studies. Technical Report NCEER-92-0001. National Center for Earthquake Engineering Research, University at Buffalo. doi:10.1016/0886-7798(93)90146-m
Jones, M. C., Pewsey, A., and Kato, S. (2015). On a class of circulas: copulas for circular distributions. Annals of the Institute of Statistical Mathematics, 67(5):843–862. doi:10.1007/s10463-014-0493-6
Rivest, L.-P. (1997). A decentred predictor for circular-circular regression. Biometrika, 84(3):717–726. doi:10.1093/biomet/84.3.717
# Load data data("earthquakes") # Transform the data into [-pi, pi) earthquakes <- sdetorus::toPiInt(earthquakes) plot(earthquakes, xlab = expression(theta[1]), ylab = expression(theta[2]), xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE) sdetorus::torusAxis() # Perform TR-PCA fit <- ridge_pca(x = earthquakes) show_ridge_pca(fit)
# Load data data("earthquakes") # Transform the data into [-pi, pi) earthquakes <- sdetorus::toPiInt(earthquakes) plot(earthquakes, xlab = expression(theta[1]), ylab = expression(theta[2]), xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE) sdetorus::torusAxis() # Perform TR-PCA fit <- ridge_pca(x = earthquakes) show_ridge_pca(fit)
Computes the Fréchet mean, variance, and standard deviation of
a sample on the -torus
,
, with
identified.
frechet_mean(x, l = pi, N = 500, draw_plot = FALSE) frechet_ss(x, l = pi, N = 500, draw_plot = FALSE)
frechet_mean(x, l = pi, N = 500, draw_plot = FALSE) frechet_ss(x, l = pi, N = 500, draw_plot = FALSE)
x |
sample of angles on |
l |
half-period of the circular data. Can be a vector of length
|
N |
size of the grid in |
draw_plot |
draw a diagnostic plot showing the Fréchet loss function?
Defaults to |
frechet_mean
: a list with the marginal Fréchet means
(mu
), variances (var
), and standard deviations (sd
).
frechet_ss
: a list with the Fréchet variances (var
)
and the cumulative proportion of total variance explained (var_exp
).
## Circular data # Sample from a wrapped normal x <- sdetorus::toPiInt(rnorm(n = 5e2, mean = 2, sd = 1)) frechet_mean(x = x) # Sample from a bimodal distribution x <- sdetorus::toPiInt(rnorm(n = 5e2, mean = c(1, -2), sd = c(0.5, 0.75))) frechet_mean(x = x) # Periodic data in [-2, 2) x <- sdetorus::toInt(rnorm(n = 5e2, mean = c(-2, 1), sd = 2:1), a = -2, b = 2) frechet_mean(x = x, l = 2) ## Toroidal data # Sample from a multivariate wrapped normal n <- 50 S <- rbind(c(2.5, -0.2, 0.5), c(-0.2, 1.5, -0.5), c(0.5, -0.5, 0.75)) x <- sdetorus::toPiInt(mvtnorm::rmvnorm(n, mean = c(0, 1.5, -2), sigma = S)) (f <- frechet_mean(x = x)) # Total Fréchet variance is sum of marginal variances sum(torus_dist(x, y = f$mu, squared = TRUE)) / n sum(f$var) # Cumulative proportion of variances frechet_ss(x)
## Circular data # Sample from a wrapped normal x <- sdetorus::toPiInt(rnorm(n = 5e2, mean = 2, sd = 1)) frechet_mean(x = x) # Sample from a bimodal distribution x <- sdetorus::toPiInt(rnorm(n = 5e2, mean = c(1, -2), sd = c(0.5, 0.75))) frechet_mean(x = x) # Periodic data in [-2, 2) x <- sdetorus::toInt(rnorm(n = 5e2, mean = c(-2, 1), sd = 2:1), a = -2, b = 2) frechet_mean(x = x, l = 2) ## Toroidal data # Sample from a multivariate wrapped normal n <- 50 S <- rbind(c(2.5, -0.2, 0.5), c(-0.2, 1.5, -0.5), c(0.5, -0.5, 0.75)) x <- sdetorus::toPiInt(mvtnorm::rmvnorm(n, mean = c(0, 1.5, -2), sigma = S)) (f <- frechet_mean(x = x)) # Total Fréchet variance is sum of marginal variances sum(torus_dist(x, y = f$mu, squared = TRUE)) / n sum(f$var) # Cumulative proportion of variances frechet_ss(x)
Given the angles theta
in ,
ridge_curve
computes the Fourier-fitted ridge curve
or
, where
with and
for
.
der_ridge_curve
and
dist_ridge_curve
compute the derivatives of and the distances along
these curves, respectively. alpha_ridge_curve
provides a uniform
grid of the ridge curve using the arc-length parametrization.
proj_ridge_curve
gives the ridge's for which the curve
is closer to any point on
.
ridge_curve( theta, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, at2 = TRUE ) der_ridge_curve( theta, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, norm = NULL, at2 = TRUE ) dist_ridge_curve( alpha, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, der = TRUE, shortest = TRUE, at2 = TRUE ) arclength_ridge_curve( mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, L = 500, at2 = TRUE ) proj_ridge_curve( x, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, ridge_curve_grid = NULL, arclength = FALSE, at2 = TRUE )
ridge_curve( theta, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, at2 = TRUE ) der_ridge_curve( theta, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, norm = NULL, at2 = TRUE ) dist_ridge_curve( alpha, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, der = TRUE, shortest = TRUE, at2 = TRUE ) arclength_ridge_curve( mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, L = 500, at2 = TRUE ) proj_ridge_curve( x, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, ridge_curve_grid = NULL, arclength = FALSE, at2 = TRUE )
theta |
vector |
mu |
a vector of size |
coefs |
list of coefficients |
ind_var |
index |
at2 |
do the |
norm |
normalize tangent vectors? If different from |
alpha |
a vector of size |
N |
number of discretization points for approximating curve lengths.
Defaults to |
der |
use derivatives to approximate curve lengths? Defaults to
|
shortest |
return the shortest possible distance? Defaults to
|
L |
number of discretization points for computing the arc-length
parametrization curve lengths. Defaults to |
x |
a matrix of size |
ridge_curve_grid |
if provided, the |
arclength |
use the arc-length parametrization to compute the
projections? This yields a more uniform grid for searching the projections.
Defaults to |
ridge_curve
: a matrix of size c(nth, 2)
with the ridge
curve evaluated at theta
.
der_ridge_curve
: a matrix of size c(nth, 2)
with the
derivatives of the ridge curve evaluated at theta
.
dist_ridge_curve
: the distance between two points along
the ridge curve, a non-negative scalar.
proj_ridge_curve
: a list with (1) the theta
's that
give the points in the ridge curve that are the closest (in the flat torus
distance) to x
(a matrix of size c(nx, 2)
); (2) the indexes
of ridge_curve_grid
in which those theta
's were obtained.
arclength_ridge_curve
: a vector of size N
giving the
theta
angles that yield a uniform-length grid of the ridge curve.
mu <- c(-0.5, 1.65) th <- seq(-pi, pi, l = 200) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) rid1 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1) rid2 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2) plot(mu[1], mu[2], xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), pch = "*", col = 5, cex = 3) sdetorus::linesTorus(rid1[, 1], rid1[, 2], col = 1) sdetorus::linesTorus(rid2[, 1], rid2[, 2], col = 2) abline(v = mu[1], lty = 3, col = 5) abline(h = mu[2], lty = 3, col = 5) points(ridge_curve(theta = mu[1], mu = mu, coefs = coefs, ind_var = 1), col = 1) points(ridge_curve(theta = mu[2], mu = mu, coefs = coefs, ind_var = 2), col = 2) sdetorus::torusAxis() ## der_ridge_curve th <- seq(-pi, pi, l = 10) mu <- c(0.5, 1.5) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) rid1 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1) rid2 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2) v1 <- der_ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1, norm = 0.5) v2 <- der_ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2, norm = 0.5) points(rid1, pch = 16, col = 1) points(rid2, pch = 16, col = 2) arrows(x0 = rid1[, 1], y0 = rid1[, 2], x1 = (rid1 + v1)[, 1], y1 = (rid1 + v1)[, 2], col = 3, angle = 5, length = 0.1) arrows(x0 = rid2[, 1], y0 = rid2[, 2], x1 = (rid2 + v2)[, 1], y1 = (rid2 + v2)[, 2], col = 4, angle = 5, length = 0.1) ## dist_ridge_curve # Distances accuracy a <- c(-pi / 2, pi) mu <- c(-pi / 2, pi / 2) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = TRUE, N = 1e6) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = FALSE, N = 1e6) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = TRUE, N = 1e2) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = FALSE, N = 1e2) ## arclength_ridge_curve mu <- c(-pi / 2, pi / 2) alpha <- arclength_ridge_curve(mu = mu, coefs = coefs, ind_var = 1, N = 25) alpha <- sdetorus::toPiInt(c(alpha, alpha[1])) rid <- ridge_curve(theta = alpha, mu = mu, coefs = coefs, ind_var = 1) plot(mu[1], mu[2], pch = "*", col = 5, cex = 3, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2])) sdetorus::linesTorus(rid[, 1], rid[, 2], col = 1, pch = 16) points(rid[, 1], rid[, 2], pch = 16, col = 1) abline(v = mu[1], lty = 3, col = 5) abline(h = mu[2], lty = 3, col = 5) sdetorus::torusAxis() ## proj_ridge_curve mu <- c(0, 0) n <- 25 x <- matrix(runif(2 * n, -pi, pi), nrow = n, ncol = 2) col <- rainbow(n) th <- seq(-pi, pi, l = 100) old_par <- par(no.readonly = TRUE) par(mfrow = c(1, 2)) for (j in 1:2) { plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), col = col) rid <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = j) sdetorus::linesTorus(x = rid[, 1], y = rid[, 2], lwd = 2) abline(v = mu[1], lty = 3) abline(h = mu[2], lty = 3) points(mu[1], mu[2], pch = "*", cex = 3) sdetorus::torusAxis() theta_projs <- proj_ridge_curve(x = x, mu = mu, coefs = coefs, ind_var = j, ridge_curve_grid = rid)$theta_proj projs <- ridge_curve(theta = theta_projs, mu = mu, coefs = coefs, ind_var = j) points(projs, col = col, pch = 3) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], projs[i, 1]), y = c(x[i, 2], projs[i, 2]), col = col[i], lty = 3) } } par(old_par)
mu <- c(-0.5, 1.65) th <- seq(-pi, pi, l = 200) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) rid1 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1) rid2 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2) plot(mu[1], mu[2], xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), pch = "*", col = 5, cex = 3) sdetorus::linesTorus(rid1[, 1], rid1[, 2], col = 1) sdetorus::linesTorus(rid2[, 1], rid2[, 2], col = 2) abline(v = mu[1], lty = 3, col = 5) abline(h = mu[2], lty = 3, col = 5) points(ridge_curve(theta = mu[1], mu = mu, coefs = coefs, ind_var = 1), col = 1) points(ridge_curve(theta = mu[2], mu = mu, coefs = coefs, ind_var = 2), col = 2) sdetorus::torusAxis() ## der_ridge_curve th <- seq(-pi, pi, l = 10) mu <- c(0.5, 1.5) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) rid1 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1) rid2 <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2) v1 <- der_ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 1, norm = 0.5) v2 <- der_ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = 2, norm = 0.5) points(rid1, pch = 16, col = 1) points(rid2, pch = 16, col = 2) arrows(x0 = rid1[, 1], y0 = rid1[, 2], x1 = (rid1 + v1)[, 1], y1 = (rid1 + v1)[, 2], col = 3, angle = 5, length = 0.1) arrows(x0 = rid2[, 1], y0 = rid2[, 2], x1 = (rid2 + v2)[, 1], y1 = (rid2 + v2)[, 2], col = 4, angle = 5, length = 0.1) ## dist_ridge_curve # Distances accuracy a <- c(-pi / 2, pi) mu <- c(-pi / 2, pi / 2) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = TRUE, N = 1e6) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = FALSE, N = 1e6) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = TRUE, N = 1e2) dist_ridge_curve(alpha = a, mu = mu, coefs = coefs, der = FALSE, N = 1e2) ## arclength_ridge_curve mu <- c(-pi / 2, pi / 2) alpha <- arclength_ridge_curve(mu = mu, coefs = coefs, ind_var = 1, N = 25) alpha <- sdetorus::toPiInt(c(alpha, alpha[1])) rid <- ridge_curve(theta = alpha, mu = mu, coefs = coefs, ind_var = 1) plot(mu[1], mu[2], pch = "*", col = 5, cex = 3, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2])) sdetorus::linesTorus(rid[, 1], rid[, 2], col = 1, pch = 16) points(rid[, 1], rid[, 2], pch = 16, col = 1) abline(v = mu[1], lty = 3, col = 5) abline(h = mu[2], lty = 3, col = 5) sdetorus::torusAxis() ## proj_ridge_curve mu <- c(0, 0) n <- 25 x <- matrix(runif(2 * n, -pi, pi), nrow = n, ncol = 2) col <- rainbow(n) th <- seq(-pi, pi, l = 100) old_par <- par(no.readonly = TRUE) par(mfrow = c(1, 2)) for (j in 1:2) { plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), col = col) rid <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = j) sdetorus::linesTorus(x = rid[, 1], y = rid[, 2], lwd = 2) abline(v = mu[1], lty = 3) abline(h = mu[2], lty = 3) points(mu[1], mu[2], pch = "*", cex = 3) sdetorus::torusAxis() theta_projs <- proj_ridge_curve(x = x, mu = mu, coefs = coefs, ind_var = j, ridge_curve_grid = rid)$theta_proj projs <- ridge_curve(theta = theta_projs, mu = mu, coefs = coefs, ind_var = j) points(projs, col = col, pch = 3) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], projs[i, 1]), y = c(x[i, 2], projs[i, 2]), col = col[i], lty = 3) } } par(old_par)
Computation of the connected component of the density ridge of
in a given set of points or, if not specified, in a regular grid on
.
ridge_bvm(mu, kappa, eval_points, subint_1, subint_2) ridge_bwc(mu, xi, eval_points, subint_1, subint_2) ridge_bwn(mu, Sigma, kmax = 2, eval_points, subint_1, subint_2)
ridge_bvm(mu, kappa, eval_points, subint_1, subint_2) ridge_bwc(mu, xi, eval_points, subint_1, subint_2) ridge_bwn(mu, Sigma, kmax = 2, eval_points, subint_1, subint_2)
mu |
circular means of the density, a vector of length |
kappa |
vector of length |
eval_points |
evaluation points for the ridge. |
subint_1 |
number of points for |
subint_2 |
number of points for |
xi |
a vector of length |
Sigma |
covariance matrix of size |
kmax |
integer number up to truncate the wrapped normal series in
|
A matrix of size c(subint_1, 2)
containing the points of the
connected component of the ridge.
Ozertem, U. and Erdogmus, D. (2011). Locally defined principal curves and surfaces. Journal of Machine Learning Research, 12(34):1249–1286. doi:10.6083/M4ZG6Q60
# Bivariate von Mises mu <- c(0, 0) kappa <- c(0.3, 0.5, 0.4) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bvm(x = x, mu = mu, kappa = kappa) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bvm(mu = mu, kappa = kappa, subint_1 = 5e2, subint_2 = 5e2) points(ridge) # Bivariate wrapped Cauchy mu <- c(0, 0) xi <- c(0.3, 0.6, 0.25) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwc(x = x, mu = mu, xi = xi) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bwc(mu = mu, xi = xi, subint_1 = 5e2, subint_2 = 5e2) points(ridge) # Bivariate wrapped normal mu <- c(0, 0) Sigma <- matrix(c(10, 3, 3, 5), nrow = 2) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwn(x = x, mu = mu, Sigma = Sigma) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bwn(mu = mu, Sigma = Sigma, subint_1 = 5e2, subint_2 = 5e2) points(ridge)
# Bivariate von Mises mu <- c(0, 0) kappa <- c(0.3, 0.5, 0.4) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bvm(x = x, mu = mu, kappa = kappa) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bvm(mu = mu, kappa = kappa, subint_1 = 5e2, subint_2 = 5e2) points(ridge) # Bivariate wrapped Cauchy mu <- c(0, 0) xi <- c(0.3, 0.6, 0.25) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwc(x = x, mu = mu, xi = xi) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bwc(mu = mu, xi = xi, subint_1 = 5e2, subint_2 = 5e2) points(ridge) # Bivariate wrapped normal mu <- c(0, 0) Sigma <- matrix(c(10, 3, 3, 5), nrow = 2) nth <- 100 th <- seq(-pi, pi, l = nth) x <- as.matrix(expand.grid(th, th)) d <- d_bwn(x = x, mu = mu, Sigma = Sigma) image(th, th, matrix(d, nth, nth), col = viridisLite::viridis(20)) ridge <- ridge_bwn(mu = mu, Sigma = Sigma, subint_1 = 5e2, subint_2 = 5e2) points(ridge)
Computation of the Fourier expansion coefficients of a given curve.
ridge_fourier_fit(curve, K = 15, norm_prop = 1, N = 1280, at2 = TRUE)
ridge_fourier_fit(curve, K = 15, norm_prop = 1, N = 1280, at2 = TRUE)
curve |
points of the curve. |
K |
number of terms in the Fourier expansion. Defaults to |
norm_prop |
percentage of explained norm. Defaults to |
N |
number of Gaussian quadrature points, passed to
Gauss_Legen_nodes. Defaults to |
at2 |
do the |
The coefficients of the fit (see ridge_curve
). A list
with entries:
cos_a |
contains |
sin_b |
contains |
# Zero mean ridge0 <- ridge_bvm(mu = c(0, 0), kappa = c(1, 2, -5), subint_1 = 5e2, subint_2 = 5e2) coefs <- ridge_fourier_fit(ridge0) th <- seq(-pi, pi, l = 500) plot(ridge0, xlim = c(-pi, pi), ylim = c(-pi, pi)) points(ridge_curve(th, mu = c(0, 0), coefs = coefs, at2 = TRUE), col = 3, cex = 0.5) # Non-zero mean from a zero-mean ridge mu <- c(1.4, 2) ridge1 <- ridge_bvm(mu = mu, kappa = c(1, 2, -5), subint_1 = 5e2, subint_2 = 5e2) # Just for plot plot(ridge1, xlim = c(-pi, pi), ylim = c(-pi, pi)) points(mu[1], mu[2], col = 4, pch = "*", cex = 5) points(ridge_curve(th, mu = mu, coefs = coefs), col = 3, cex = 0.5) # Other zero-mean example mu <- c(0, 0) ridge <- ridge_bwc(mu = mu, xi = c(0.3, 0.5, 0.7), subint_1 = 5e2, subint_2 = 5e2) plot(ridge, xlim = c(-pi, pi), ylim = c(-pi, pi)) coefs <- ridge_fourier_fit(ridge) points(ridge_curve(th, mu = mu, coefs = coefs), col = 4, cex = 0.5) # Another zero-mean example mu <- c(0, 0) ridge <- ridge_bwc(mu = mu, xi = c(0.8, 0.1, 0.75), subint_1 = 5e2, subint_2 = 5e2) plot(ridge, xlim = c(-pi, pi), ylim = c(-pi, pi)) coefs <- ridge_fourier_fit(ridge) points(ridge_curve(th, mu = mu, coefs = coefs), col = 4, cex = 0.5)
# Zero mean ridge0 <- ridge_bvm(mu = c(0, 0), kappa = c(1, 2, -5), subint_1 = 5e2, subint_2 = 5e2) coefs <- ridge_fourier_fit(ridge0) th <- seq(-pi, pi, l = 500) plot(ridge0, xlim = c(-pi, pi), ylim = c(-pi, pi)) points(ridge_curve(th, mu = c(0, 0), coefs = coefs, at2 = TRUE), col = 3, cex = 0.5) # Non-zero mean from a zero-mean ridge mu <- c(1.4, 2) ridge1 <- ridge_bvm(mu = mu, kappa = c(1, 2, -5), subint_1 = 5e2, subint_2 = 5e2) # Just for plot plot(ridge1, xlim = c(-pi, pi), ylim = c(-pi, pi)) points(mu[1], mu[2], col = 4, pch = "*", cex = 5) points(ridge_curve(th, mu = mu, coefs = coefs), col = 3, cex = 0.5) # Other zero-mean example mu <- c(0, 0) ridge <- ridge_bwc(mu = mu, xi = c(0.3, 0.5, 0.7), subint_1 = 5e2, subint_2 = 5e2) plot(ridge, xlim = c(-pi, pi), ylim = c(-pi, pi)) coefs <- ridge_fourier_fit(ridge) points(ridge_curve(th, mu = mu, coefs = coefs), col = 4, cex = 0.5) # Another zero-mean example mu <- c(0, 0) ridge <- ridge_bwc(mu = mu, xi = c(0.8, 0.1, 0.75), subint_1 = 5e2, subint_2 = 5e2) plot(ridge, xlim = c(-pi, pi), ylim = c(-pi, pi)) coefs <- ridge_fourier_fit(ridge) points(ridge_curve(th, mu = mu, coefs = coefs), col = 4, cex = 0.5)
This function computes the whole process of toroidal PCA via density ridges on a given sample: parameter estimation of the underlying distribution, estimation of the connected component of the ridge, and determination of its Fourier expansion from which to obtain the first and second scores.
ridge_pca( x, type = c("auto", "bvm", "bwc")[1], N = 500, K = 15, scale = TRUE, lrts = TRUE, alpha = 0.05, at2 = TRUE, ... )
ridge_pca( x, type = c("auto", "bvm", "bwc")[1], N = 500, K = 15, scale = TRUE, lrts = TRUE, alpha = 0.05, at2 = TRUE, ... )
x |
matrix of dimension |
type |
either |
N |
number of discretization points for approximating curve lengths.
Defaults to |
K |
number of terms in the Fourier expansion. Defaults to |
scale |
scale the resulting scores to |
lrts |
run |
alpha |
significance level for the homogeneity test. |
at2 |
do the |
... |
optional parameters passed to |
A list with:
mu_hat |
estimated circular means of the sample. |
coefs_hat |
estimated Fourier coefficients. |
ind_var |
indexing variable. |
scores |
scores for each of the sample points. |
var_exp |
percentage of explained variance. |
fit_mle |
maximum likelihood fit. |
bic_fit |
BIC of the fit. |
data |
original sample. |
scales |
vector of length 2 with the scale limits for the axes. |
type |
type of fit performed. |
p_hom |
|
p_indep |
|
## Bivariate von Mises n <- 100 x <- r_bvm(n = n, mu = c(1, 2), kappa = c(0.4, 0.4, 0.5)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") x <- r_bvm(n = n, mu = c(2, 1), kappa = c(1, 2, 0)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") x <- r_bvm(n = n, mu = c(2, 1), kappa = c(3, 2, 0)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") ## Bivariate wrapped Cauchy x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.2, 0.2, 0.5)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red") x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.2, 0.8, 0)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red") x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.5, 0.2, 0)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red")
## Bivariate von Mises n <- 100 x <- r_bvm(n = n, mu = c(1, 2), kappa = c(0.4, 0.4, 0.5)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") x <- r_bvm(n = n, mu = c(2, 1), kappa = c(1, 2, 0)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") x <- r_bvm(n = n, mu = c(2, 1), kappa = c(3, 2, 0)) fit <- ridge_pca(x = x, type = "bvm") show_ridge_pca(fit = fit, col_data = "red") ## Bivariate wrapped Cauchy x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.2, 0.2, 0.5)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red") x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.2, 0.8, 0)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red") x <- r_bwc(n = n, mu = c(1, 2), xi = c(0.5, 0.2, 0)) fit <- ridge_pca(x = x, type = "bwc") show_ridge_pca(fit = fit, col_data = "red")
Computation of PCA scores for Fourier-fitted ridge curves. The scores are defined as follows:
First scores: signed distances along the ridge curve of the data
projections to .
Second scores: signed toroidal distances from the data points to their ridge projections.
The scores can be scaled to or remain as
, where
is the length of the curve and
is the maximal absolute second score.
ridge_scores( x, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, scale = TRUE, at2 = TRUE ) max_score_2( mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, L = 25, f = 2, at2 = TRUE )
ridge_scores( x, mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, N = 500, scale = TRUE, at2 = TRUE ) max_score_2( mu = c(0, 0), coefs = list(cos_a = c(0, 0), sin_b = 0), ind_var = 1, L = 25, f = 2, at2 = TRUE )
x |
a matrix of size |
mu |
a vector of size |
coefs |
list of coefficients |
ind_var |
index |
N |
number of discretization points for approximating curve lengths.
Defaults to |
scale |
scale the resulting scores to |
at2 |
do the |
L |
grid along he variable |
f |
factor for shrinking the grid on the variable that is different to
|
The mean corresponds to the first score being null.
ridge_scores
returns a list with:
scores |
a matrix of size |
scales |
a vector of length 2 with the scale limits for the axes. |
max_score_2
computes the maximum allowed second score to rescale if
scale = TRUE
.
mu <- c(-0.5, 1.65) th <- seq(-pi, pi, l = 200) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) n <- 10 col <- rainbow(n) set.seed(13213) old_par <- par(no.readonly = TRUE) par(mfrow = c(2, 2)) for (j in 1:2) { # Simulate synthetic data close to the ridge curve rid <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = j) ind <- sort(sample(length(th), size = n)) eps <- 0.25 * matrix(runif(2 * n, -1, 1), nrow = n, ncol = 2) x <- sdetorus::toPiInt(rid[ind, ] + eps) # Plot ridge and synthetic data, with signs from the second scores s <- ridge_scores(x, mu = mu, coefs = coefs, ind_var = j)$scores plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), col = col, pch = ifelse(sign(s[, 2]) == 1, "+", "-"), cex = 1.25) sdetorus::linesTorus(rid[, 1], rid[, 2], lwd = 2) abline(v = mu[1], lty = 3) abline(h = mu[2], lty = 3) points(mu[1], mu[2], pch = "*", cex = 3) sdetorus::torusAxis() # Projections theta_projs <- proj_ridge_curve(x = x, mu = mu, coefs = coefs, ind_var = j, ridge_curve_grid = rid, )$theta_proj projs <- ridge_curve(theta = theta_projs, mu = mu, coefs = coefs, ind_var = j) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], projs[i, 1]), y = c(x[i, 2], projs[i, 2]), col = col[i], lty = 3) } # Scores plot plot(s, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = "Score 1", ylab = "Score 2", col = col, pch = ifelse(sign(s[, 2]) == 1, "+", "-")) sdetorus::torusAxis() abline(v = 0, lty = 3) abline(h = 0, lty = 3) points(0, 0, pch = "*", cex = 3) } par(old_par)
mu <- c(-0.5, 1.65) th <- seq(-pi, pi, l = 200) K <- 5 coefs <- list(cos_a = 1 / (1:(K + 1))^3, sin_b = 1 / (1:K)^3) n <- 10 col <- rainbow(n) set.seed(13213) old_par <- par(no.readonly = TRUE) par(mfrow = c(2, 2)) for (j in 1:2) { # Simulate synthetic data close to the ridge curve rid <- ridge_curve(theta = th, mu = mu, coefs = coefs, ind_var = j) ind <- sort(sample(length(th), size = n)) eps <- 0.25 * matrix(runif(2 * n, -1, 1), nrow = n, ncol = 2) x <- sdetorus::toPiInt(rid[ind, ] + eps) # Plot ridge and synthetic data, with signs from the second scores s <- ridge_scores(x, mu = mu, coefs = coefs, ind_var = j)$scores plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = expression(theta[1]), ylab = expression(theta[2]), col = col, pch = ifelse(sign(s[, 2]) == 1, "+", "-"), cex = 1.25) sdetorus::linesTorus(rid[, 1], rid[, 2], lwd = 2) abline(v = mu[1], lty = 3) abline(h = mu[2], lty = 3) points(mu[1], mu[2], pch = "*", cex = 3) sdetorus::torusAxis() # Projections theta_projs <- proj_ridge_curve(x = x, mu = mu, coefs = coefs, ind_var = j, ridge_curve_grid = rid, )$theta_proj projs <- ridge_curve(theta = theta_projs, mu = mu, coefs = coefs, ind_var = j) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], projs[i, 1]), y = c(x[i, 2], projs[i, 2]), col = col[i], lty = 3) } # Scores plot plot(s, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, xlab = "Score 1", ylab = "Score 2", col = col, pch = ifelse(sign(s[, 2]) == 1, "+", "-")) sdetorus::torusAxis() abline(v = 0, lty = 3) abline(h = 0, lty = 3) points(0, 0, pch = "*", cex = 3) } par(old_par)
The Santa Barbara Channel is a coastal area in California. This dataset contains the sea currents in the four areas present in the data application in García-Portugués and Prieto-Tirado (2022). Precisely, it contains the 24-hour speed-weighted mean of the currents' direction in each of the four areas downloaded from the NOAA High Frequency Radar National Server.
santabarbara
santabarbara
A data frame with 1092 rows and 4 variables:
Sea current direction at zone A.
Sea current direction at zone B.
Sea current direction at zone C.
Sea current direction at zone D.
The selection of these four areas is motivated by previous studies on the
Santa Barbara currents, like Auad et al. (1998). The direction is measured
in radians in with
/
/
/
/
representing the
East / South / West / North / East directions. The script performing the data
preprocessing is available at
data-acquisition.R
. The data was retrieved on 2022-10-21.
Auad, G., Hendershott, M. C., and Winant, C. D. (1998). Wind-induced currents and bottom-trapped waves in the Santa Barbara Channel. Journal of Physical Oceanography, 28(1):85–102. doi:10.1175/1520-0485(1998)028<0085:WICABT>2.0.CO;2
García-Portugués, E. and Prieto-Tirado, A. (2023). Toroidal PCA via density ridges. Statistics and Computing, 33(5):107. doi:10.1007/s11222-023-10273-9
# Load data data("santabarbara") AB_zone <- santabarbara[c("A","B")] # Perform TR-PCA fit <- ridge_pca(x = AB_zone) show_ridge_pca(fit)
# Load data data("santabarbara") AB_zone <- santabarbara[c("A","B")] # Perform TR-PCA fit <- ridge_pca(x = AB_zone) show_ridge_pca(fit)
Shows the scores computation for PCA via density ridges on
.
show_ridge_pca( fit, n_max = 500, projs = TRUE, projs_lines = TRUE, signs = TRUE, col_data = 1, col_projs = c(3, 4), main = "", N = 500, at2 = TRUE )
show_ridge_pca( fit, n_max = 500, projs = TRUE, projs_lines = TRUE, signs = TRUE, col_data = 1, col_projs = c(3, 4), main = "", N = 500, at2 = TRUE )
fit |
the output of |
n_max |
maximum number of data points to draw. These are sampled from
the data provided. Defaults to |
projs |
draw projections? Defaults to |
projs_lines |
draw projection lines? Defaults to |
signs |
plot the original data points with |
col_data |
color(s) for the data points. Defaults to |
col_projs |
a vector of size |
main |
caption of the plot. Empty by default. |
N |
number of discretization points for approximating curve lengths.
Defaults to |
at2 |
do the |
Nothing, the functions are called to produce plots.
# Generate data set.seed(987654321) n <- 50 S1 <- rbind(c(1, -0.7), c(-0.7, 1)) S2 <- rbind(c(1, 0.5), c(0.5, 1)) x <- rbind(mvtnorm::rmvnorm(n, mean = c(0, pi / 2), sigma = S1), mvtnorm::rmvnorm(n, mean = c(pi, -pi / 2), sigma = S2)) x <- sdetorus::toPiInt(x) col <- rainbow(2)[rep(1:2, each = n)] # ridge_pca and its visualization fit <- ridge_pca(x = x, at2 = FALSE) show_ridge_pca(fit = fit, col_data = col, at2 = FALSE) fit2 <- ridge_pca(x = x, at2 = TRUE) show_ridge_pca(fit = fit2, col_data = col, at2 = TRUE)
# Generate data set.seed(987654321) n <- 50 S1 <- rbind(c(1, -0.7), c(-0.7, 1)) S2 <- rbind(c(1, 0.5), c(0.5, 1)) x <- rbind(mvtnorm::rmvnorm(n, mean = c(0, pi / 2), sigma = S1), mvtnorm::rmvnorm(n, mean = c(pi, -pi / 2), sigma = S2)) x <- sdetorus::toPiInt(x) col <- rainbow(2)[rep(1:2, each = n)] # ridge_pca and its visualization fit <- ridge_pca(x = x, at2 = FALSE) show_ridge_pca(fit = fit, col_data = col, at2 = FALSE) fit2 <- ridge_pca(x = x, at2 = TRUE) show_ridge_pca(fit = fit2, col_data = col, at2 = TRUE)
Computation of distances on ,
,
between two sets of observations.
torus_dist(x, y, squared = FALSE)
torus_dist(x, y, squared = FALSE)
x |
a matrix of size |
y |
either a matrix with the same size as |
squared |
return the squared distance? Defaults to |
The maximal distance on is
.
A vector of size nx
with the distances between the
observations of x
and y
.
# Illustration of torus distances n <- 10 x <- matrix(runif(2 * n, -pi, pi), nrow = n, ncol = 2) y <- c(pi / 2, pi / 3) col <- rainbow(n) plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, col = col, xlab = expression(theta[1]), ylab = expression(theta[2]), pch = 16) sdetorus::torusAxis() points(y[1], y[2], col = 1, pch = 17) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], y[1]), y = c(x[i, 2], y[2]), lty = 2, col = col[i]) } text(x = x, labels = sprintf("%.2f", torus_dist(x, y)), col = col, pos = 1)
# Illustration of torus distances n <- 10 x <- matrix(runif(2 * n, -pi, pi), nrow = n, ncol = 2) y <- c(pi / 2, pi / 3) col <- rainbow(n) plot(x, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE, col = col, xlab = expression(theta[1]), ylab = expression(theta[2]), pch = 16) sdetorus::torusAxis() points(y[1], y[2], col = 1, pch = 17) for (i in 1:n) { sdetorus::linesTorus(x = c(x[i, 1], y[1]), y = c(x[i, 2], y[2]), lty = 2, col = col[i]) } text(x = x, labels = sprintf("%.2f", torus_dist(x, y)), col = col, pos = 1)
Pairs plots for data on ,
.
The diagonal panels contain kernel density estimates tailored to
circular data.
torus_pairs( x, max_dim = 10, columns = NULL, col_data = 1, ylim_dens = c(0, 1.5), bwd = "ROT", scales = rep(pi, ncol(x)) )
torus_pairs( x, max_dim = 10, columns = NULL, col_data = 1, ylim_dens = c(0, 1.5), bwd = "ROT", scales = rep(pi, ncol(x)) )
x |
a matrix of size |
max_dim |
the maximum number of scores to produce the scores plot.
Defaults to |
columns |
if specified, the variables to be plotted. If |
col_data |
color(s) for the data points. Defaults to |
ylim_dens |
common |
bwd |
type of bandwidth selector used in the kernel density plots.
Either |
scales |
scales of the torus. Defaults to |
The default bandwidth selector is the Rule-Of-Thumb (ROT) selector in García-Portugués (2013). It is fast, yet it may oversmooth non-unimodal densities. The EMI selector gives more flexible fits.
A ggplot
. The density plots show the
Fréchet means (red bars) and the Fréchet standard
deviations (gray text).
García-Portugués, E. (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electronic Journal of Statistics, 7:1655–1685. doi:10.1214/13-ejs821
# Generate data n <- 50 set.seed(123456) x <- sdetorus::toPiInt(rbind( mvtnorm::rmvnorm(n = n, mean = c(-pi, -pi) / 2, sigma = diag(0.1, nrow = 2)), mvtnorm::rmvnorm(n = n, mean = c(-3 * pi / 2, 0) / 2, sigma = diag(0.1, nrow = 2)), mvtnorm::rmvnorm(n = n, mean = c(0, pi / 2), sigma = diag(0.1, nrow = 2)) )) col <- rainbow(3)[rep(1:3, each = n)] # Torus pairs torus_pairs(x, col_data = col) fit <- ridge_pca(x = x) torus_pairs(fit$scores, col_data = col)
# Generate data n <- 50 set.seed(123456) x <- sdetorus::toPiInt(rbind( mvtnorm::rmvnorm(n = n, mean = c(-pi, -pi) / 2, sigma = diag(0.1, nrow = 2)), mvtnorm::rmvnorm(n = n, mean = c(-3 * pi / 2, 0) / 2, sigma = diag(0.1, nrow = 2)), mvtnorm::rmvnorm(n = n, mean = c(0, pi / 2), sigma = diag(0.1, nrow = 2)) )) col <- rainbow(3)[rep(1:3, each = n)] # Torus pairs torus_pairs(x, col_data = col) fit <- ridge_pca(x = x) torus_pairs(fit$scores, col_data = col)
Wind direction at 6:00 and 7:00 from June 1, 2003 to June 30, 2003, in radians, measured at a weather station in Texas coded as C28-1.
wind
wind
A data frame with 30 rows and 2 variables:
Direction at 6:00 am.
Direction at 12:00 noon.
The direction is measured in radians in with
/
/
/
/
representing the East/South/West/North/East directions.
Johnson, R. A. and Wehrly, T. (1977). Measures and models for angular correlation and angular-linear correlation. Journal of the Royal Statistical Society. Series B (Methodological), 39(2):222–229. https://www.jstor.org/stable/2984799
# Load data data("wind") plot(wind, xlab = expression(theta[1]), ylab = expression(theta[2]), xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE) sdetorus::torusAxis() # Perform TR-PCA fit <- ridge_pca(x = wind) show_ridge_pca(fit)
# Load data data("wind") plot(wind, xlab = expression(theta[1]), ylab = expression(theta[2]), xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE) sdetorus::torusAxis() # Perform TR-PCA fit <- ridge_pca(x = wind) show_ridge_pca(fit)