Package 'mp'

Title: Multidimensional Projection Techniques
Description: Multidimensional projection techniques are used to create two dimensional representations of multidimensional data sets.
Authors: Francisco M. Fatore, Samuel G. Fadel
Maintainer: Francisco M. Fatore <[email protected]>
License: GPL
Version: 0.4.1
Built: 2024-12-25 06:35:08 UTC
Source: CRAN


Force Scheme Projection

Description

Creates a 2D representation of the data based on a dissimilarity matrix. The implementation departs from the method described in the literature in three ways: indices are shuffled to reduce the dependence on point order, only a fraction of delta is applied at each step for better stability, and a tolerance was introduced as a second stopping criterion.
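The three modifications listed above can be illustrated with a minimal sketch in plain R. This is not the package's code: the function name force_scheme_sketch is hypothetical, and the way the accumulated error is computed and compared against tol is an assumption made for illustration only.

```r
# Illustrative sketch only; NOT the implementation shipped in 'mp'.
force_scheme_sketch <- function(D, Y = NULL, max.iter = 50, tol = 0,
                                fraction = 8, eps = 1e-5) {
  D <- as.matrix(D)                      # accepts a dist object or a full matrix
  n <- nrow(D)
  if (is.null(Y)) Y <- matrix(runif(2 * n), ncol = 2)  # random start
  prev.err <- Inf
  for (iter in seq_len(max.iter)) {
    err <- 0
    for (i in sample(n)) {               # shuffled indices: less order dependency
      for (j in sample(n)) {
        if (i == j) next
        dir <- Y[j, ] - Y[i, ]
        d2 <- max(sqrt(sum(dir^2)), eps) # eps keeps points from collapsing
        delta <- (D[i, j] - d2) / fraction  # apply only a fraction of delta
        err <- err + abs(delta)
        Y[j, ] <- Y[j, ] + delta * dir / d2 # move j along the line through i
      }
    }
    err <- err / (n * (n - 1))
    # Assumed form of the second stop criterion; with tol = 0 it never fires.
    if (abs(prev.err - err) < tol) break
    prev.err <- err
  }
  Y
}

emb <- force_scheme_sketch(eurodist, max.iter = 20)
```

With tol = 0 the comparison never succeeds, so the loop runs max.iter times, which matches the documented behavior of the tol argument.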

Usage

forceScheme(D, Y = NULL, max.iter = 50, tol = 0, fraction = 8,
  eps = 1e-05)

Arguments

D

A dissimilarity structure such as that returned by dist or a full symmetric matrix containing the dissimilarities.

Y

Initial 2D configuration. A random configuration will be used when omitted.

max.iter

Maximum number of iterations that the algorithm will run.

tol

The tolerance for the accumulated error between iterations. If set to 0, the algorithm will run max.iter times.

fraction

Controls the point movement. Larger values mean less freedom to move.

eps

Minimum distance between two points.

Value

The 2D representation of the data.

References

Eduardo Tejada, Rosane Minghim, Luis Gustavo Nonato: On improved projection techniques to support visual exploration of multi-dimensional data sets. Information Visualization 2(4): 218-231 (2003)

See Also

dist (stats) and dist (proxy) for computing D

Examples

# Eurodist example
emb <- forceScheme(eurodist)
plot(emb, type = "n", xlab ="", ylab ="", asp=1, axes=FALSE, main="")
text(emb, labels(eurodist), cex = 0.6)

# Iris example
emb <- forceScheme(dist(iris[,1:4]))
plot(emb, col=iris$Species)
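
The following sketch shows the optional arguments in use. It assumes that supplying Y removes the random start as a source of run-to-run variation; the index shuffling inside forceScheme may still differ between runs. The specific values for max.iter and tol are arbitrary choices for illustration, not recommendations.

```r
library(mp)

D <- dist(iris[, 1:4])

# Fixed initial 2D configuration (one row per data point).
set.seed(42)
Y0 <- matrix(runif(2 * nrow(iris)), ncol = 2)

# Allow more iterations, but stop early once the change in error drops below tol.
emb <- forceScheme(D, Y = Y0, max.iter = 200, tol = 1e-4, fraction = 8)
plot(emb, col = iris$Species, asp = 1)
```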

Tests whether the given matrix is symmetric.

Description

Tests whether the given matrix is symmetric.

Usage

is.symmetric(mat)

Arguments

mat

Matrix to be tested for symmetry.

Value

Whether the matrix is symmetric.
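
Examples

A short sketch of a possible use: checking a full dissimilarity matrix before handing it to forceScheme, which expects a symmetric matrix. The perturbed matrix A is made up for illustration.

```r
library(mp)

# A full distance matrix built from a dist object is symmetric by construction.
D <- as.matrix(dist(iris[1:10, 1:4]))
is.symmetric(D)

# Changing one off-diagonal entry breaks the symmetry.
A <- D
A[1, 2] <- A[1, 2] + 1
is.symmetric(A)
```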


Local Affine Multidimensional Projection

Description

Creates a 2D representation of the data. Requires a subsample (sample.indices) and its 2D representation (Ys).

Usage

lamp(X, sample.indices = NULL, Ys = NULL, cp = 1)

Arguments

X

A data frame or matrix.

sample.indices

The indices of data points in X used as subsamples. If not given, some points from X will be randomly selected and Ys will be generated by calling forceScheme on them.

Ys

Initial 2D configuration of the data subsamples (will be ignored if sample.indices is NULL). Scaling the columns to [-0.5, 0.5] is recommended to avoid scaling problems.

cp

Proportion of nearest control points to be used.

Value

The 2D representation of the data.

References

Joia, P.; Paulovich, F.V.; Coimbra, D.; Cuminato, J.A.; Nonato, L.G., "Local Affine Multidimensional Projection," IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 2563-2571, Dec. 2011.

Examples

# Iris example
emb <- lamp(iris[, 1:4])
plot(emb, col=iris$Species)
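
When the control points are supplied explicitly, the recommendation to scale the columns of Ys to [-0.5, 0.5] can be followed as in this sketch. The subsample size of 30 and the helper rescale are hypothetical choices made for illustration.

```r
library(mp)

X <- as.matrix(iris[, 1:4])

# Pick control points and project them with forceScheme.
set.seed(1)
sample.indices <- sample(nrow(X), 30)
Ys <- forceScheme(dist(X[sample.indices, ]))

# Rescale each column of Ys to [-0.5, 0.5].
rescale <- function(v) (v - min(v)) / (max(v) - min(v)) - 0.5
Ys <- apply(Ys, 2, rescale)

emb <- lamp(X, sample.indices = sample.indices, Ys = Ys)
plot(emb, col = iris$Species)
```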

Least-Square Projection

Description

Creates a q-dimensional representation of multidimensional data. Requires a subsample (sample.indices) and its qD representation (Ys).

Usage

lsp(X, sample.indices = NULL, Ys = NULL, k = 15, q = 2)

Arguments

X

A data frame or matrix.

sample.indices

The indices of data points in X used as subsamples. If not given, some rows from X will be randomly selected and Ys will be generated by calling forceScheme on them.

Ys

Initial qD configuration of the data subsamples (will be ignored if sample.indices is NULL).

k

Number of neighbors used to build the neighborhood graph.

q

The target dimensionality.

Value

The qD representation of the data.

References

F. V. Paulovich, L. G. Nonato, R. Minghim, and H. Levkowitz, "Least-Square Projection: A fast high-precision multidimensional projection technique and its application to document mapping," IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 3, pp. 564-575, 2008.

Examples

# Iris example
emb <- lsp(iris[, 1:4])
plot(emb, col=iris$Species)
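
To get a feel for the k argument, the projection can be drawn for two neighborhood sizes side by side. The values 5 and 30 are arbitrary choices for illustration; control points are left to the default random selection, so the plots will vary between runs.

```r
library(mp)

X <- iris[, 1:4]

op <- par(mfrow = c(1, 2))          # two panels, restored afterwards
for (k in c(5, 30)) {
  emb <- lsp(X, k = k, q = 2)       # k neighbors in the neighborhood graph
  plot(emb, col = iris$Species, main = paste("k =", k))
}
par(op)
```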

Multidimensional Projection Techniques

Description

Implementation of multidimensional projection techniques.


Pekalska's approach to speeding up Sammon's mapping.

Description

Creates a k-dimensional representation of the data. As input, a subsample and its k-dimensional mapping are required. The method approximates the subsample mapping with a linear mapping based on the distance matrix of the subsample and then applies the same mapping to all instances.

Usage

pekalska(D, sample.indices = NULL, Ys = NULL)

Arguments

D

A dist object or a distance matrix.

sample.indices

The indices of subsamples.

Ys

The subsample mapping (k-dimensional).

Value

The low-dimensional representation of the data.

References

Pekalska, E., de Ridder, D., Duin, R. P., & Kraaijveld, M. A. (1999). A new method of generalizing Sammon mapping with application to algorithm speed-up (pp. 221-228).
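
Examples

A minimal usage sketch, assuming the subsample mapping is produced with forceScheme on the subsample's distance matrix. Duplicate rows are dropped first as a precaution, since identical points give identical rows in the subsample distance matrix; whether the package needs this is not stated in the documentation. The subsample size of 30 is an arbitrary choice.

```r
library(mp)

# Drop duplicated observations (precaution, see note above).
keep <- !duplicated(iris[, 1:4])
X <- iris[keep, 1:4]
species <- iris$Species[keep]

D <- dist(X)

# Choose a subsample and map it to 2D from its own distance matrix.
set.seed(1)
sample.indices <- sample(nrow(X), 30)
Ds <- as.matrix(D)[sample.indices, sample.indices]
Ys <- forceScheme(Ds)

emb <- pekalska(D, sample.indices = sample.indices, Ys = Ys)
plot(emb, col = species)
```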


Part-Linear Multidimensional Projection

Description

Creates a k-dimensional representation of the data. As input, a subsample and its k-dimensional mapping (control points) are required. The method approximates the subsample mapping with a linear mapping and then applies the same mapping to all instances.

Usage

plmp(X, sample.indices = NULL, Ys = NULL, k = 2)

Arguments

X

A data frame or matrix representing the data.

sample.indices

The indices of subsamples used as control points.

Ys

The control points.

k

The target dimensionality.

Value

The low-dimensional representation of the data.

References

Paulovich, F.V.; Silva, C.T.; Nonato, L.G., "Two-Phase Mapping for Projecting Massive Data Sets," IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, pp. 1281-1290, Nov.-Dec. 2010.

Examples

# Iris example
emb <- plmp(iris[,1:4])
plot(emb, col=iris$Species)
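
The idea of fitting a linear mapping on the control points and applying it to every instance can be sketched in base R with an ordinary least-squares fit. This is an illustration of the principle, not the package's code: the centering step, the use of cmdscale as a stand-in for the control-point layout, and the variable names are all assumptions.

```r
# Illustrative sketch only; NOT the implementation shipped in 'mp'.
X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)  # center the columns

# Control points: a subsample and some 2D layout of it.
# Classical MDS is used here as a stand-in; any 2D layout would do.
set.seed(1)
idx <- sample(nrow(X), 30)
Ys <- cmdscale(dist(X[idx, ]), k = 2)

# Least-squares linear map P (4 x 2) such that X[idx, ] %*% P approximates Ys.
P <- qr.solve(X[idx, ], Ys)

# Apply the same linear map to all instances.
emb <- X %*% P
plot(emb, col = iris$Species)
```

Because the map is a single matrix product, projecting the remaining instances is cheap once the control points have been laid out.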

t-Distributed Stochastic Neighbor Embedding

Description

Creates a k-dimensional representation of the data by modeling the probability of picking neighbors with a Gaussian for the high-dimensional data and a Student's t-distribution for the low-dimensional map, and then minimizing the KL divergence between the two. This implementation uses the same default parameters as defined by the authors.

Usage

tSNE(X, Y = NULL, k = 2, perplexity = 30, n.iter = 1000, eta = 500,
  initial.momentum = 0.5, final.momentum = 0.8, early.exaggeration = 4,
  gain.fraction = 0.2, momentum.threshold.iter = 20,
  exaggeration.threshold.iter = 100, max.binsearch.tries = 50)

Arguments

X

A data frame, data matrix, dissimilarity (distance) matrix or dist object.

Y

Initial k-dimensional configuration. If NULL, the method uses a random initial configuration.

k

Target dimensionality. Avoid anything other than 2 or 3.

perplexity

A rough upper bound on the neighborhood size.

n.iter

Number of iterations to perform.

eta

The "learning rate" for the cost function minimization.

initial.momentum

The momentum used during the initial iterations, before switching to final.momentum.

final.momentum

The momentum used for the remaining iterations.

early.exaggeration

The early exaggeration factor applied during the initial iterations.

gain.fraction

Undocumented

momentum.threshold.iter

Number of iterations before switching to the final momentum.

exaggeration.threshold.iter

Number of iterations before using the real (unexaggerated) probabilities.

max.binsearch.tries

Maximum number of binary search tries when fitting parameters to reach the target perplexity.

Value

The k-dimensional representation of the data.

References

L.J.P. van der Maaten and G.E. Hinton. _Visualizing High-Dimensional Data Using t-SNE._ Journal of Machine Learning Research 9(Nov): 2579-2605, 2008.

Examples

# Iris example
emb <- tSNE(iris[, 1:4])
plot(emb, col=iris$Species)
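
A sketch of a call with non-default settings. Supplying Y fixes the starting configuration; the small standard deviation used to generate it, the perplexity of 20 and the 500 iterations are arbitrary choices for illustration, not values taken from the package.

```r
library(mp)

X <- iris[, 1:4]

# Fixed initial 2D configuration: small random values around the origin.
set.seed(1)
Y0 <- matrix(rnorm(2 * nrow(X), sd = 1e-2), ncol = 2)

# Smaller neighborhoods (perplexity) and fewer iterations than the defaults.
emb <- tSNE(X, Y = Y0, k = 2, perplexity = 20, n.iter = 500)
plot(emb, col = iris$Species)
```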