Title: | Conditional Manifold Learning |
---|---|
Description: | Finds a low-dimensional embedding of high-dimensional data, conditioning on available manifold information. The current version supports conditional MDS (based on either conditional SMACOF in Bui (2021) <arXiv:2111.13646> or closed-form solution in Bui (2022) <doi:10.1016/j.patrec.2022.11.007>) and conditional ISOMAP in Bui (2021) <arXiv:2111.13646>. |
Authors: | Anh Tuan Bui [aut, cre] |
Maintainer: | Anh Tuan Bui <[email protected]> |
License: | GPL-2 |
Version: | 0.2.2 |
Built: | 2024-11-09 06:18:22 UTC |
Source: | CRAN |
Finds a low-dimensional embedding of high-dimensional data, conditioning on available manifold information. The current version supports conditional MDS (based on either conditional SMACOF or closed-form solution) and conditional ISOMAP.
Please cite this package as follows:
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
Brief descriptions of the main functions of the package are provided below:
condMDS()
: is the conditional MDS method, which uses conditional SMACOF to optimize its conditional stress objective function.
condMDSeigen()
: is the conditional MDS method, which uses a closed-form solution based on multiple linear regression and eigendecomposition.
condIsomap()
: is the conditional ISOMAP method, which is conditional MDS applied to graph distances (i.e., estimated geodesic distances) computed from the given distances/dissimilarities.
Anh Tuan Bui
Maintainer: Anh Tuan Bui <[email protected]>
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
## Generate car-brand perception data
factor.weights <- c(90, 88, 83, 82, 81, 70, 68)/562
N <- 100
set.seed(1)
data <- matrix(runif(N*7), N, 7)
colnames(data) <- c('Quality', 'Safety', 'Value', 'Performance', 'Eco', 'Design', 'Tech')
rownames(data) <- paste('Brand', 1:N)
data.hat <- data + matrix(rnorm(N*7), N, 7)*data*.05
data.weighted <- t(apply(data, 1, function(x) x*factor.weights))
d <- dist(data.weighted)
d.hat <- d + rnorm(length(d))*d*.05

## The following examples use the first 4 factors as known features

# Conditional MDS based on conditional SMACOF
u.cmds = condMDS(d.hat, data.hat[,1:4], 3, init='none')
u.cmds$B # compare with diag(factor.weights[1:4])
ccor(data.hat[,5:7], u.cmds$U)$cancor # canonical correlations
vegan::procrustes(data.hat[,5:7], u.cmds$U, symmetric = TRUE)$ss # Procrustes statistic

# Conditional MDS based on the closed-form solution
u.cmds = condMDSeigen(d.hat, data.hat[,1:4], 3)
u.cmds$B # compare with diag(factor.weights[1:4])
ccor(data.hat[,5:7], u.cmds$U)$cancor # canonical correlations
vegan::procrustes(data.hat[,5:7], u.cmds$U, symmetric = TRUE)$ss # Procrustes statistic

# Conditional MDS based on conditional SMACOF,
# initialized by the closed-form solution
u.cmds = condMDS(d.hat, data.hat[,1:4], 3, init='eigen')
u.cmds$B # compare with diag(factor.weights[1:4])
ccor(data.hat[,5:7], u.cmds$U)$cancor # canonical correlations
vegan::procrustes(data.hat[,5:7], u.cmds$U, symmetric = TRUE)$ss # Procrustes statistic

# Conditional ISOMAP
u.cisomap = condIsomap(d.hat, data.hat[,1:4], 3, k = 20, init='eigen')
u.cisomap$B # compare with diag(factor.weights[1:4])
ccor(data.hat[,5:7], u.cisomap$U)$cancor # canonical correlations
vegan::procrustes(data.hat[,5:7], u.cisomap$U, symmetric = TRUE)$ss # Procrustes statistic
Computes canonical correlations for two sets of multivariate data x and y.
ccor(x, y)
x |
the first multivariate dataset. |
y |
the second multivariate dataset. |
a list of the following components:
cancor |
a vector of canonical correlations. |
xcoef |
a matrix, each column of which is the vector of coefficients of x to produce the corresponding canonical covariate. |
ycoef |
a matrix, each column of which is the vector of coefficients of y to produce the corresponding canonical covariate. |
Anh Tuan Bui
ccor(iris[,1:2], iris[,3:4])
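To make the meaning of the returned coefficients concrete, the sketch below uses base R's stats::cancor() on the same iris columns. It is not ccor() itself, and the source does not say how ccor() is implemented; the point is only that multiplying the centered data by a coefficient column gives a canonical covariate, and that the correlation of the first pair of covariates is the first canonical correlation.

```r
# Illustration with base R's stats::cancor(), NOT the package's ccor().
x <- as.matrix(iris[, 1:2])
y <- as.matrix(iris[, 3:4])
cc <- stats::cancor(x, y)

# First pair of canonical covariates: centered data times the first
# column of each coefficient matrix.
u1 <- scale(x, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1]
v1 <- scale(y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1]

# Their correlation matches the first canonical correlation (up to sign).
first_cor <- abs(cor(u1, v1)[1, 1])
all.equal(first_cor, cc$cor[1])
```

Note that cancor() names its correlation vector cor, whereas ccor() documents the component as cancor.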
Internal functions.
condDist(U, V.tilda, one_n_t=t(rep(1,nrow(U))))
condDist2(U, V.tilda2, one_n_t=t(rep(1,nrow(U))))
U |
the embedding |
V.tilda |
|
V.tilda2 |
|
one_n_t |
|
a dist object.
Anh Tuan Bui
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Finds a low-dimensional manifold embedding of a given distance/dissimilarity matrix, conditioning on available manifold information. The method applies conditional MDS (see condMDS) to a graph distance matrix computed for the given distances/dissimilarities, using the isomap{vegan} function.
condIsomap(d, V, u.dim, epsilon = NULL, k, W, method = c('matrix', 'vector'), exact = TRUE, it.max = 1000, gamma = 1e-05, init = c('none', 'eigen', 'user'), U.start, B.start, ...)
d |
a distance/dissimilarity matrix of N entities (or a |
V |
an Nxq matrix of q manifold auxiliary parameter values of the N entities. |
u.dim |
the embedding dimension. |
epsilon |
shortest dissimilarity retained. |
k |
Number of shortest dissimilarities retained for a point. If both |
W |
an NxN symmetric weight matrix. If not given, a matrix of ones will be used. |
method |
if |
exact |
only relevant if |
it.max |
the max number of conditional SMACOF iterations. |
gamma |
conditional SMACOF stops early if the reduction of normalized conditional stress is less than |
init |
initialization method. |
U.start |
user-defined starting values for the embedding (when |
B.start |
starting |
... |
other arguments for the |
U |
the embedding result. |
B |
the estimated |
stress |
Normalized conditional stress value. |
sigma |
the conditional stress value at each iteration. |
init |
the value of the |
U.start |
the starting values for the embedding. |
B.start |
starting values for the |
method |
the value of the |
exact |
the value of the |
Anh Tuan Bui
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
# see help(cml)
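The graph distances that condIsomap() passes to conditional MDS can be illustrated in base R. The sketch below is not the package's or vegan's code: graph_dist_sketch is a hypothetical helper that keeps each point's k shortest dissimilarities and then takes shortest paths through the resulting graph with the Floyd-Warshall algorithm. It assumes distinct points (no zero off-diagonal distances) and a connected neighbourhood graph.

```r
# Hypothetical helper (not part of cml or vegan): k-nearest-neighbour
# graph distances computed by Floyd-Warshall shortest paths.
graph_dist_sketch <- function(d, k) {
  D <- as.matrix(d)
  n <- nrow(D)
  G <- matrix(Inf, n, n)
  diag(G) <- 0
  for (i in seq_len(n)) {
    nn <- order(D[i, ])[2:(k + 1)]  # k nearest neighbours, skipping i itself
    G[i, nn] <- D[i, nn]
    G[nn, i] <- D[nn, i]            # keep the graph symmetric
  }
  for (m in seq_len(n)) {           # allow paths that pass through point m
    G <- pmin(G, outer(G[, m], G[m, ], `+`))
  }
  as.dist(G)
}

# Points on a half circle of radius 1: the two ends are a chord of
# length 2 apart, but about pi apart along the curve.
theta <- seq(0, pi, length.out = 50)
pts <- cbind(cos(theta), sin(theta))
g <- as.matrix(graph_dist_sketch(dist(pts), k = 2))
c(euclidean = as.matrix(dist(pts))[1, 50], graph = g[1, 50])
```

The graph distance between the end points follows the curve and so approximates the geodesic distance, which is what makes the subsequent MDS step recover manifold structure rather than straight-line structure.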
Wrapper of condSmacof, which finds a low-dimensional embedding of a given distance/dissimilarity matrix, conditioning on available manifold information.
condMDS(d, V, u.dim, W, method = c('matrix', 'vector'), exact = TRUE, it.max = 1000, gamma = 1e-05, init = c('none', 'eigen', 'user'), U.start, B.start)
d |
a distance/dissimilarity matrix of N entities (or a |
V |
an Nxq matrix of q manifold auxiliary parameter values of the N entities. |
u.dim |
the embedding dimension. |
W |
an NxN symmetric weight matrix. If not given, a matrix of ones will be used. |
method |
if |
exact |
only relevant if |
it.max |
the max number of conditional SMACOF iterations. |
gamma |
conditional SMACOF stops early if the reduction of normalized conditional stress is less than |
init |
initialization method. |
U.start |
user-defined starting values for the embedding (when |
B.start |
starting |
U |
the embedding result. |
B |
the estimated |
stress |
Normalized conditional stress value. |
sigma |
the conditional stress value at each iteration. |
init |
the value of the |
U.start |
the starting values for the embedding. |
B.start |
starting values for the |
method |
the value of the |
exact |
the value of the |
Anh Tuan Bui
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
condSmacof, condMDSeigen, condIsomap
# see help(cml)
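The roles of U and B can be sketched without calling the package. The code below assumes that the conditional distance is the Euclidean distance between rows of the combined matrix [U, VB]; this reading is an assumption inferred from the package example, which compares the estimated B with diag(factor.weights[1:4]), and is not taken from the package source. The helpers cond_dist_sketch and normalized_stress_sketch are hypothetical names, and the unweighted stress normalization shown is likewise an assumption.

```r
# Noise-free version of the car-brand data from help(cml).
factor.weights <- c(90, 88, 83, 82, 81, 70, 68)/562
set.seed(1)
data <- matrix(runif(100*7), 100, 7)
d <- dist(sweep(data, 2, factor.weights, `*`))

V <- data[, 1:4]                                      # known features
B <- diag(factor.weights[1:4])                        # their true weights
U <- sweep(data[, 5:7], 2, factor.weights[5:7], `*`)  # true unknown part

# Assumed form of the conditional distance: Euclidean distance on [U, VB].
cond_dist_sketch <- function(U, V, B) dist(cbind(U, V %*% B))

# Assumed normalization (all weights equal to one).
normalized_stress_sketch <- function(d, d_fit) sum((d - d_fit)^2) / sum(d^2)

# With the true U and B the fitted distances reproduce d, so stress is ~0.
normalized_stress_sketch(d, cond_dist_sketch(U, V, B))
```

Under these assumptions, condMDS() can be read as searching for the U and B that make this stress small when only d and V are observed.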
Provides a closed-form solution for conditional multidimensional scaling, based on multiple linear regression and eigendecomposition.
condMDSeigen(d, V, u.dim, method = c('matrix', 'vector'))
d |
a |
V |
an Nxq matrix of q manifold auxiliary parameter values of the N entities. |
u.dim |
the embedding dimension. |
method |
if |
U |
the embedding result. |
B |
the estimated |
eig |
the computed eigenvalues. |
stress |
the corresponding normalized conditional stress value of the solution. |
Anh Tuan Bui
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
# see help(cml)
Conditional SMACOF algorithms. Intended for internal usage.
condSmacof(d, V, u.dim, W, method = c('matrix', 'vector'), exact = TRUE, it.max = 1000, gamma = 1e-05, init = c('none', 'eigen', 'user'), U.start, B.start)
d |
a |
V |
an Nxq matrix of q manifold auxiliary parameter values of the N entities. |
u.dim |
the embedding dimension. |
W |
an NxN symmetric weight matrix. If not given, a matrix of ones will be used. |
method |
if |
exact |
only relevant if |
it.max |
the max number of conditional SMACOF iterations. |
gamma |
conditional SMACOF stops early if the reduction of normalized conditional stress is less than |
init |
initialization method. |
U.start |
user-defined starting values for the embedding (when |
B.start |
starting |
U |
the embedding result. |
B |
the estimated |
stress |
Normalized conditional stress value. |
sigma |
the conditional stress value at each iteration. |
init |
the value of the |
U.start |
the starting values for the embedding. |
B.start |
starting values for the |
method |
the value of the |
exact |
the value of the |
Anh Tuan Bui
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Bui, A. T. (2022). A Closed-Form Solution for Conditional Multidimensional Scaling. Pattern Recognition Letters 164, 148-152. https://doi.org/10.1016/j.patrec.2022.11.007
Internal function.
cz(w, d, dz)
w |
the |
d |
the |
dz |
the |
the matrix C(Z)
Anh Tuan Bui
Bui, A.T. (2021). Dimension Reduction with Prior Information for Knowledge Discovery. arXiv:2111.13646. https://arxiv.org/abs/2111.13646.
Computes the Moore-Penrose inverse (a.k.a., generalized inverse or pseudoinverse) of a matrix based on singular-value decomposition (SVD).
mpinv(A, eps = sqrt(.Machine$double.eps))
A |
a matrix of real numbers. |
eps |
a threshold (to be multiplied with the largest singular value) for dropping SVD parts that correspond to small singular values. |
the Moore-Penrose inverse.
Anh Tuan Bui
mpinv(2*diag(4))
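The SVD-based construction described above can be sketched in base R as follows. This is an illustration of the general technique, not the package's code: mpinv_sketch is a hypothetical name, and the exact comparison used for dropping small singular values (strictly greater than eps times the largest one) is an assumption.

```r
# Hypothetical re-implementation for illustration; use mpinv() in practice.
mpinv_sketch <- function(A, eps = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  # Keep only singular values above eps times the largest one.
  keep <- s$d > eps * max(s$d)
  if (!any(keep)) return(matrix(0, ncol(A), nrow(A)))
  # V_k diag(1/d_k) U_k^T: each row of t(U_k) is divided by its singular value.
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# A rank-deficient matrix (column 2 is twice column 1), so solve() would fail.
A <- cbind(1:4, 2 * (1:4), c(1, 0, 1, 0))
P <- mpinv_sketch(A)

# Two of the Penrose conditions that characterize the pseudoinverse.
all.equal(A %*% P %*% A, A)
all.equal(P %*% A %*% P, P)
```

Dropping the small singular values is what keeps the result stable for rank-deficient input, where inverting near-zero singular values would otherwise amplify rounding error.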