Title: | Multidimensional Projection Techniques |
---|---|
Description: | Multidimensional projection techniques are used to create two-dimensional representations of multidimensional data sets. |
Authors: | Francisco M. Fatore, Samuel G. Fadel |
Maintainer: | Francisco M. Fatore <[email protected]> |
License: | GPL |
Version: | 0.4.1 |
Built: | 2024-11-25 06:37:02 UTC |
Source: | CRAN |
Creates a 2D representation of the data based on a dissimilarity matrix. A few modifications have been made relative to the method described in the literature: shuffled indices are used to reduce order dependency, only a fraction of delta is used for better stability, and a tolerance factor was introduced as a second stopping criterion.
forceScheme(D, Y = NULL, max.iter = 50, tol = 0, fraction = 8, eps = 1e-05)
D | A dissimilarity structure, such as that returned by dist, or a full symmetric matrix containing the dissimilarities. |
---|---|
Y | Initial 2D configuration. A random configuration is used when omitted. |
max.iter | Maximum number of iterations that the algorithm will run. |
tol | The tolerance for the accumulated error between iterations. If set to 0, the algorithm runs for all max.iter iterations. |
fraction | Controls point movement. Larger values mean less freedom to move. |
eps | Minimum distance between two points. |
The 2D representation of the data.
Eduardo Tejada, Rosane Minghim, Luis Gustavo Nonato: On improved projection techniques to support visual exploration of multi-dimensional data sets. Information Visualization 2(4): 218-231 (2003)
dist (stats) and dist (proxy) for computing D.
# Eurodist example
emb <- forceScheme(eurodist)
plot(emb, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE, main = "")
text(emb, labels(eurodist), cex = 0.6)

# Iris example
emb <- forceScheme(dist(iris[, 1:4]))
plot(emb, col = iris$Species)
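The update rule behind forceScheme can be illustrated with a minimal base-R sketch. `force_scheme_sketch()` is a hypothetical helper, not the package's implementation: it assumes the classic Force Scheme step from Tejada et al. (each point j is moved along the line from a pivot i by a fraction of the residual between the target and current distances), shuffles only the pivot order, and assumes `tol` is compared against the change in summed absolute residuals between iterations.

```r
# Hypothetical Force Scheme sketch (base R only); not the package's implementation.
force_scheme_sketch <- function(D, Y = NULL, max.iter = 50, tol = 0,
                                fraction = 8, eps = 1e-5) {
  D <- as.matrix(D)                        # accept a dist object or a full matrix
  n <- nrow(D)
  if (is.null(Y)) Y <- matrix(runif(n * 2), ncol = 2)  # random start when omitted
  prev.error <- Inf
  for (iter in seq_len(max.iter)) {
    error <- 0
    for (i in sample(n)) {                 # shuffled pivot order (order dependency)
      for (j in seq_len(n)) {
        if (i == j) next
        v <- Y[j, ] - Y[i, ]
        d2 <- max(sqrt(sum(v^2)), eps)     # current 2D distance, floored at eps
        delta <- (D[i, j] - d2) / fraction # move by only a fraction of the residual
        error <- error + abs(delta)
        Y[j, ] <- Y[j, ] + delta * v / d2  # push j away from (or towards) i
      }
    }
    # Assumed stop rule: accumulated error barely changed (never true when tol = 0)
    if (abs(prev.error - error) < tol) break
    prev.error <- error
  }
  Y
}

# Usage: emb <- force_scheme_sketch(eurodist); plot(emb, asp = 1)
```

Dividing by `fraction` damps each move, which is the stability trade-off the description mentions: larger values converge more slowly but oscillate less.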
Tests whether the given matrix is symmetric.
is.symmetric(mat)
mat | Matrix to be tested for symmetry. |
---|---|
Whether the matrix is symmetric.
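A symmetry test of this kind takes only a few lines of base R. `is_symmetric_sketch()` is a hypothetical stand-in; it assumes a matrix counts as symmetric when it is square and equal to its transpose up to a small numerical tolerance, which may differ from the exact comparison the package performs. Base R's `isSymmetric()` offers a similar check.

```r
# Hypothetical symmetry check; the package's own comparison rule may differ.
is_symmetric_sketch <- function(mat, tol = 100 * .Machine$double.eps) {
  mat <- as.matrix(mat)
  # Must be square, then equal to its transpose (dimnames are ignored)
  nrow(mat) == ncol(mat) &&
    isTRUE(all.equal(mat, t(mat), tolerance = tol, check.attributes = FALSE))
}

m <- as.matrix(dist(matrix(1:6, ncol = 2)))  # distance matrices are symmetric
is_symmetric_sketch(m)
isSymmetric(m)                               # base R equivalent
```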
Creates a 2D representation of the data. Uses a subsample (sample.indices) and its 2D representation (Ys); when these are not given, they are generated automatically.
lamp(X, sample.indices = NULL, Ys = NULL, cp = 1)
X | A data frame or matrix. |
---|---|
sample.indices | The indices of data points in X used as subsamples. If not given, some points from X are selected at random and Ys is generated by calling forceScheme on them. |
Ys | Initial 2D configuration of the data subsamples (ignored if sample.indices is NULL). Scaling the columns to [-0.5, 0.5] is recommended to avoid scaling problems. |
cp | Proportion of nearest control points to be used. |
The 2D representation of the data.
Joia, P.; Paulovich, F.V.; Coimbra, D.; Cuminato, J.A.; Nonato, L.G., "Local Affine Multidimensional Projection," IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 2563-2571, Dec. 2011.
# Iris example
emb <- lamp(iris[, 1:4])
plot(emb, col = iris$Species)
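The per-point computation in LAMP can be sketched in base R. `lamp_point_sketch()` and `lamp_sketch()` are hypothetical helpers, not the package's functions; they assume the orthogonal-mapping formulation of Joia et al.: weights 1/||x - x_i||^2 on the nearest control points, weighted centroids in both spaces, and an orthogonal map M = U V^T taken from the SVD of A^T B, where A and B are the weighted, centred control points.

```r
# Hypothetical LAMP sketch for one point x (base R only).
lamp_point_sketch <- function(x, Xs, Ys, cp = 1) {
  d2 <- colSums((t(Xs) - x)^2)                 # squared distances to control points
  if (any(d2 < 1e-12)) return(Ys[which.min(d2), ])  # x coincides with a control point
  keep <- order(d2)[seq_len(max(1, ceiling(cp * nrow(Xs))))]  # nearest proportion cp
  alpha <- 1 / d2[keep]
  Xk <- Xs[keep, , drop = FALSE]
  Yk <- Ys[keep, , drop = FALSE]
  x.tilde <- colSums(alpha * Xk) / sum(alpha)  # weighted centroid, original space
  y.tilde <- colSums(alpha * Yk) / sum(alpha)  # weighted centroid, 2D space
  A <- sqrt(alpha) * sweep(Xk, 2, x.tilde)
  B <- sqrt(alpha) * sweep(Yk, 2, y.tilde)
  s <- svd(crossprod(A, B))                    # A^T B = U D V^T
  M <- s$u %*% t(s$v)                          # orthogonal map (m x 2)
  drop((x - x.tilde) %*% M + y.tilde)
}

# Apply the per-point map to every row of X (Xs, Ys: control points and their 2D layout)
lamp_sketch <- function(X, Xs, Ys, cp = 1) {
  t(apply(as.matrix(X), 1, lamp_point_sketch,
          Xs = as.matrix(Xs), Ys = as.matrix(Ys), cp = cp))
}
```

Because M has orthonormal columns, each local map preserves distances within the neighbourhood it is fitted on, which is the "local affine" property the method is named after.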
Creates a q-dimensional representation of multidimensional data. Uses a subsample (sample.indices) and its qD representation (Ys); when these are not given, they are generated automatically.
lsp(X, sample.indices = NULL, Ys = NULL, k = 15, q = 2)
X | A data frame or matrix. |
---|---|
sample.indices | The indices of data points in X used as subsamples. If not given, some rows from X are selected at random and Ys is generated by calling forceScheme on them. |
Ys | Initial qD configuration of the data subsamples (ignored if sample.indices is NULL). |
k | Number of neighbors used to build the neighborhood graph. |
q | The target dimensionality. |
The qD representation of the data.
Paulovich, F.V.; Nonato, L.; Minghim, R.; Levkowitz, H., "Least-Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping," vol. 14, no. 3, pp. 564-575.
# Iris example
emb <- lsp(iris[, 1:4])
plot(emb, col = iris$Species)
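The least-squares system behind LSP can be sketched in base R. `lsp_sketch()` is hypothetical and makes simplifying assumptions: each point is asked to sit at the uniform average of its k nearest neighbours (Laplacian rows with weights 1/k), the control points are pinned to Ys, and the stacked system is solved with `qr.solve()`. The package's neighbour weighting and solver may differ.

```r
# Hypothetical Least Square Projection sketch; uniform 1/k neighbour weights assumed.
lsp_sketch <- function(X, sample.indices, Ys, k = 15) {
  X <- as.matrix(X)
  Ys <- as.matrix(Ys)
  n <- nrow(X)
  ns <- length(sample.indices)
  D <- as.matrix(dist(X))
  A <- matrix(0, n + ns, n)
  for (i in seq_len(n)) {
    nn <- setdiff(order(D[i, ]), i)[seq_len(k)]  # k nearest neighbours of point i
    A[i, i] <- 1
    A[i, nn] <- -1 / k                           # row i: y_i - mean(y_nn) = 0
  }
  A[cbind(n + seq_len(ns), sample.indices)] <- 1  # control rows: y_idx = Ys[s, ]
  b <- rbind(matrix(0, n, ncol(Ys)), Ys)
  qr.solve(A, b)                                  # least-squares solution, n x q
}
```

`qr.solve()` stops with an error if the stacked system is rank-deficient, for example when a group of points is not linked to any control point through the neighbourhood graph.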
Implementation of multidimensional projection techniques
Creates a k-dimensional representation of the data. As input, a subsample and its k-dimensional mapping are required. The method approximates the subsample mapping with a linear mapping based on the subsample's distance matrix and then applies the same mapping to all instances.
pekalska(D, sample.indices = NULL, Ys = NULL)
D | A dist object or a distance matrix. |
---|---|
sample.indices | The indices of the subsample. |
Ys | The k-dimensional mapping of the subsample. |
The low-dimensional representation of the data.
Pekalska, E., de Ridder, D., Duin, R. P., & Kraaijveld, M. A. (1999). A new method of generalizing Sammon mapping with application to algorithm speed-up (pp. 221-228).
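The linear mapping described above can be sketched in base R. `pekalska_sketch()` is a hypothetical helper, not the package's code; it assumes the map is a matrix P fitted so that the subsample's distance rows reproduce Ys (D[s, s] P approximately equal to Ys, solved with `qr.solve()`), and then applies P to every point's distances to the subsample.

```r
# Hypothetical sketch of the distance-based linear mapping (base R only).
pekalska_sketch <- function(D, sample.indices, Ys) {
  D <- as.matrix(D)                                  # dist object or full matrix
  Dss <- D[sample.indices, sample.indices, drop = FALSE]
  P <- qr.solve(Dss, as.matrix(Ys))                  # fit Dss %*% P ~ Ys
  D[, sample.indices, drop = FALSE] %*% P            # same map for every point
}

# Usage sketch: Ys from classical MDS on a random subsample of iris
# d <- dist(iris[, 1:4]); s <- sample(150, 30)
# emb <- pekalska_sketch(d, s, cmdscale(as.matrix(d)[s, s], k = 2))
```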
Creates a k-dimensional representation of the data. As input, a subsample and its k-dimensional mapping (control points) are required. The method approximates the subsample mapping with a linear mapping and then applies the same mapping to all instances.
plmp(X, sample.indices = NULL, Ys = NULL, k = 2)
X | A data frame or matrix representing the data. |
---|---|
sample.indices | The indices of the subsample used as control points. |
Ys | The control points, i.e. the k-dimensional mapping of the subsample. |
k | The target dimensionality. |
The low-dimensional representation of the data.
Paulovich, F.V.; Silva, C.T.; Nonato, L.G., "Two-Phase Mapping for Projecting Massive Data Sets," IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, pp. 1281-1290, Nov.-Dec. 2010.
# Iris example
emb <- plmp(iris[, 1:4])
plot(emb, col = iris$Species)
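The two-step idea (fit a linear map on the control points, then apply it to every instance) can be sketched in base R. `plmp_sketch()` is hypothetical; it assumes the map is the ordinary least-squares solution of Xs P approximately equal to Ys, computed with `qr.solve()`, and omits any normalization the package may apply.

```r
# Hypothetical PLMP-style sketch: least-squares linear map fitted on control points.
plmp_sketch <- function(X, sample.indices, Ys) {
  X <- as.matrix(X)
  Xs <- X[sample.indices, , drop = FALSE]   # control points in the original space
  P <- qr.solve(Xs, as.matrix(Ys))          # m x k matrix minimising ||Xs P - Ys||
  X %*% P                                   # apply the same map to every row
}

# Usage sketch on iris (control-point layout from classical MDS):
# idx <- sample(nrow(iris), 30)
# Ys  <- cmdscale(dist(iris[idx, 1:4]), k = 2)
# emb <- plmp_sketch(iris[, 1:4], idx, Ys)
```

Once P is known, projecting all n points costs a single matrix product, which is why the approach scales to large data sets.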
Creates a k-dimensional representation of the data by modeling the probability of picking neighbors using a Gaussian distribution for the high-dimensional data and a Student's t-distribution for the low-dimensional map, and then minimizing the Kullback-Leibler (KL) divergence between them. This implementation uses the same default parameters as those proposed by the authors.
tSNE(X, Y = NULL, k = 2, perplexity = 30, n.iter = 1000, eta = 500, initial.momentum = 0.5, final.momentum = 0.8, early.exaggeration = 4, gain.fraction = 0.2, momentum.threshold.iter = 20, exaggeration.threshold.iter = 100, max.binsearch.tries = 50)
X | A data frame, data matrix, dissimilarity (distance) matrix or dist object. |
---|---|
Y | Initial k-dimensional configuration. If NULL, the method uses a random initial configuration. |
k | Target dimensionality. Values other than 2 or 3 are not recommended. |
perplexity | A rough upper bound on the neighborhood size. |
n.iter | Number of iterations to perform. |
eta | The "learning rate" for the cost function minimization. |
initial.momentum | The momentum used during the first momentum.threshold.iter iterations. |
final.momentum | The momentum used in the remaining iterations. |
early.exaggeration | The early exaggeration factor applied during the initial iterations. |
gain.fraction | Undocumented. |
momentum.threshold.iter | Number of iterations before switching to final.momentum. |
exaggeration.threshold.iter | Number of iterations before early exaggeration is switched off and the real probabilities are used. |
max.binsearch.tries | Maximum number of tries in the binary search for the parameters that achieve the target perplexity. |
The k-dimensional representation of the data.
L.J.P. van der Maaten and G.E. Hinton. _Visualizing High-Dimensional Data Using t-SNE._ Journal of Machine Learning Research 9(Nov): 2579-2605, 2008.
# Iris example
emb <- tSNE(iris[, 1:4])
plot(emb, col = iris$Species)
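The perplexity calibration that max.binsearch.tries bounds can be sketched in base R. `perplexity_search_sketch()` is hypothetical, not the package's code; it assumes the standard formulation from van der Maaten and Hinton: a Gaussian with precision beta over one point's squared distances, entropy measured in nats (equivalent to the paper's 2^H in bits), and a binary search on beta until exp(entropy) matches the target perplexity. Its `max.tries` argument plays the role of max.binsearch.tries.

```r
# Hypothetical perplexity calibration for one point; d2 excludes the point itself.
perplexity_search_sketch <- function(d2, perplexity = 30, max.tries = 50, tol = 1e-5) {
  target <- log(perplexity)          # target entropy in nats
  beta <- 1                          # Gaussian precision, 1 / (2 * sigma^2)
  lo <- 0
  hi <- Inf
  for (attempt in seq_len(max.tries)) {
    p <- exp(-(d2 - min(d2)) * beta) # shifting by min(d2) avoids underflow
    p <- p / sum(p)
    H <- -sum(p[p > 0] * log(p[p > 0]))
    if (abs(H - target) < tol) break
    if (H > target) {                # too many effective neighbours: sharpen
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {                         # too few effective neighbours: widen
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  list(p = p, entropy = H, perplexity = exp(H))
}

# Usage sketch: conditional probabilities P(j | 1) for the first iris flower
# X <- as.matrix(iris[, 1:4]); d2 <- colSums((t(X[-1, ]) - X[1, ])^2)
# res <- perplexity_search_sketch(d2, perplexity = 30)
```

Entropy decreases monotonically as beta grows, so the bracket [lo, hi] always contains the solution and each step halves it once an upper bound has been found.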