| Title: | Self-Organizing Maps for Mixed-Attribute Data Using Gower Distance |
|---|---|
| Description: | Implements a variant of the Self-Organizing Map (SOM) algorithm designed for mixed-attribute datasets. Similarity between observations is computed using the Gower distance, and categorical prototypes are updated via heuristic strategies (weighted mode and multinomial sampling). Provides functions for model fitting, mapping, visualization (U-Matrix and component planes), and evaluation, making SOM applicable to heterogeneous real-world data. For methodological details see Sáez and Salas (2026) <doi:10.1007/s41060-025-00941-6>. |
| Authors: | Patricio Salas [aut, cre] (ORCID: <https://orcid.org/0000-0002-2201-4038>), Patricio Sáez [aut] (ORCID: <https://orcid.org/0000-0002-0113-3644>) |
| Maintainer: | Patricio Salas <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.0 |
| Built: | 2026-05-28 07:32:54 UTC |
| Source: | https://github.com/cran/GowerSom |
Computes, for each observation, the index of the best-matching neuron (BMU) in a trained Gower-SOM codebook and the corresponding Gower distance. Also converts BMU indices into grid coordinates (row, col).
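For reference, here is the standard unweighted Gower distance between two observations with $p$ variables. This is what cluster::daisy(metric = "gower") computes for numeric and unordered-factor columns without missing values. The package may add weights or missing-value handling that this page does not describe.

$$
d(\mathbf{x}, \mathbf{y}) = \frac{1}{p} \sum_{k=1}^{p} d_k(x_k, y_k),
\qquad
d_k(x_k, y_k) =
\begin{cases}
\dfrac{|x_k - y_k|}{R_k}, & \text{variable } k \text{ numeric},\\[4pt]
\mathbf{1}[x_k \neq y_k], & \text{variable } k \text{ categorical},
\end{cases}
$$

Here $R_k$ is the range of variable $k$. The BMU of $\mathbf{x}$ is the neuron $j$ whose prototype $\mathbf{w}_j$ minimizes $d(\mathbf{x}, \mathbf{w}_j)$.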
```r
get_bmu_gower(data, codebook, n_rows, n_cols)
```
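| Argument | Description |
|---|---|
| data | A data.frame of observations to map, with the same columns (numeric and factor) as the training data. |
| codebook | A data.frame of trained neuron prototypes, one row per neuron (e.g., fit$weights). |
| n_rows, n_cols | Integers, the SOM grid dimensions. |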
A data.frame with the following columns:
Integer BMU index (1 .. n_rows * n_cols).
Numeric, the Gower distance to the BMU.
Integer, BMU grid row coordinate.
Integer, BMU grid column coordinate.
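The following minimal base-R sketch of a Gower BMU search shows how these columns relate to one another. The helpers gower_to_codebook() and bmu_sketch() are hypothetical, as are the column names bmu, dist, row and col. The sketch also assumes two things this page does not confirm: numeric ranges are taken over data and codebook together, and neurons are numbered row by row.

```r
# Hypothetical sketch of a Gower-based BMU search; not the package's internals.
# Assumptions: numeric ranges span data and codebook combined, neurons are
# numbered row-major, and the output column names are illustrative only.

gower_to_codebook <- function(x, codebook, ranges) {
  d <- numeric(nrow(codebook))
  for (v in names(codebook)) {
    if (is.numeric(codebook[[v]])) {
      # Numeric variable: absolute difference scaled by the variable's range
      r <- ranges[[v]]
      if (r > 0) d <- d + abs(codebook[[v]] - x[[v]]) / r
    } else {
      # Categorical variable: 0 on a match, 1 on a mismatch
      d <- d + (as.character(codebook[[v]]) != as.character(x[[v]]))
    }
  }
  d / ncol(codebook)  # average over variables
}

bmu_sketch <- function(data, codebook, n_rows, n_cols) {
  n_cols <- as.integer(n_cols)
  stopifnot(nrow(codebook) == n_rows * n_cols)
  num_vars <- names(codebook)[vapply(codebook, is.numeric, logical(1))]
  ranges <- vapply(num_vars,
                   function(v) diff(range(c(data[[v]], codebook[[v]]))),
                   numeric(1))
  bmu  <- integer(nrow(data))
  dist <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    d <- gower_to_codebook(data[i, , drop = FALSE], codebook, ranges)
    bmu[i]  <- which.min(d)
    dist[i] <- d[bmu[i]]
  }
  # Convert linear indices to (row, col), assuming row-major numbering
  data.frame(bmu = bmu, dist = dist,
             row = (bmu - 1L) %/% n_cols + 1L,
             col = (bmu - 1L) %% n_cols + 1L)
}

# Usage on a toy 2 x 2 codebook
cb  <- data.frame(x = c(0, 1, 0, 1), g = factor(c("a", "a", "b", "b")))
obs <- data.frame(x = c(0, 1), g = factor(c("b", "a")))
res <- bmu_sketch(obs, cb, n_rows = 2, n_cols = 2)
res
```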
Patricio Sáez <[email protected]>; Patricio Salas <[email protected]>
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
```r
library(GowerSom)

set.seed(1)
df <- data.frame(
  x1 = rnorm(10),
  x2 = rnorm(10),
  g  = factor(sample(letters[1:3], 10, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 5)
res <- get_bmu_gower(df, codebook = fit$weights, n_rows = 3, n_cols = 3)
head(res)
```
Maps new observations to their Best Matching Units (BMUs) using the
codebook and grid stored in a fitted gowersom object.
```r
gsom_predict(object, newdata, ...)
```
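| Argument | Description |
|---|---|
| object | A fitted gowersom object, as returned by gsom_Training. |
| newdata | A data.frame of new observations, with the same columns (numeric and factor) as the training data. |
| ... | Additional arguments passed to internal functions (currently unused). |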
This function is a convenience wrapper around get_bmu_gower.
It automatically extracts the grid dimensions from object$coords
and applies BMU mapping to each observation in newdata.
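Below is a minimal sketch of what such a wrapper might look like. It assumes a fitted object is a list whose coords element has columns named row and col; those names are hypothetical. predict_sketch() is illustrative only. It computes Gower distances with cluster::daisy() on the prototypes stacked with newdata, so numeric ranges span both sets, which may differ from the package's own scaling.

```r
library(cluster)

# Hypothetical wrapper in the spirit of gsom_predict(); not the package code.
# Assumption: object is a list with $weights (one prototype per row) and
# $coords (a data.frame with columns "row" and "col", one row per neuron).
predict_sketch <- function(object, newdata) {
  # Grid dimensions recovered from the stored coordinates
  n_rows <- length(unique(object$coords$row))
  n_cols <- length(unique(object$coords$col))
  k <- nrow(object$weights)
  stopifnot(k == n_rows * n_cols)

  # Gower distances via cluster::daisy on prototypes stacked with new data
  D <- as.matrix(daisy(rbind(object$weights, newdata), metric = "gower"))
  # Rows: new observations; columns: neurons
  D_new <- D[k + seq_len(nrow(newdata)), seq_len(k), drop = FALSE]

  bmu <- unname(apply(D_new, 1, which.min))
  data.frame(
    bmu  = bmu,
    dist = D_new[cbind(seq_along(bmu), bmu)],
    row  = object$coords$row[bmu],   # grid position read from coords
    col  = object$coords$col[bmu]
  )
}

cb  <- data.frame(x = c(0, 1, 0, 1), g = factor(c("a", "a", "b", "b")))
obj <- list(weights = cb,
            coords = data.frame(row = c(1, 1, 2, 2), col = c(1, 2, 1, 2)))
p <- predict_sketch(obj, data.frame(x = c(0, 1), g = factor(c("b", "a"))))
p
```

Reading row and col directly from coords means the sketch does not have to assume any ordering of neurons on the grid.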
A data.frame with the following columns:
Integer BMU index (1 .. n_rows * n_cols).
Numeric, the Gower distance to the BMU.
Integer, BMU grid row coordinate.
Integer, BMU grid column coordinate.
Patricio Sáez <[email protected]>; Patricio Salas <[email protected]>
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
```r
library(GowerSom)

set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g  = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)

# Map observations to BMUs
pred <- gsom_predict(fit, df)
head(pred)
```
Train a Self-Organizing Map (SOM) on datasets with mixed attributes (numeric and categorical) using Gower distance to find the BMU and heuristics to update categorical prototypes.
```r
gsom_Training(data, grid_rows = 5, grid_cols = 5, learning_rate = 0.1,
              num_iterations = 100, radius = NULL, batch_size = 10,
              sampling = TRUE, set_seed = 123)
```
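| Argument | Description |
|---|---|
| data | A data.frame with mixed attributes (numeric and factor columns). |
| grid_rows, grid_cols | SOM grid dimensions (rows x cols). |
| learning_rate | Initial learning rate (decays exponentially). |
| num_iterations | Number of training iterations. |
| radius | Initial neighborhood radius; default NULL, which uses max(grid_rows, grid_cols)/2. |
| batch_size | Mini-batch size per iteration. |
| sampling | Logical; if TRUE, categorical prototypes are updated by a weighted (multinomial) random draw; if FALSE, by the weighted mode. |
| set_seed | Integer random seed for reproducibility. |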
Learning rate and neighborhood radius decay exponentially per iteration:

$$\alpha_t = \alpha_0 \exp(-t/T), \qquad \sigma_t = \sigma_0 \exp(-t/T),$$

where $T$ is num_iterations, $\alpha_0$ is learning_rate and $\sigma_0$ is radius
(default max(grid_rows, grid_cols)/2). For categorical variables,
the prototype combines the current and input values, weighted by $\alpha_t$
and the neighborhood kernel. If sampling = TRUE, a weighted random draw
is used; otherwise the weighted mode is applied.
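The schedules and the two categorical heuristics can be sketched as follows. The sketch rests on three assumptions that this page does not confirm: the decay has the form x0 * exp(-t / n_iter), the neighborhood kernel is Gaussian, and the current level is weighted 1 - alpha * h against alpha * h for the input level. decay(), gaussian_kernel() and update_categorical() are hypothetical names, not package functions.

```r
# Hypothetical sketch of the schedules and categorical update described above.
# Assumptions: decay x0 * exp(-t / n_iter), a Gaussian neighborhood kernel,
# and weights (1 - alpha * h) for the current level vs alpha * h for the input.
# Requires alpha * h <= 1 so that both weights are non-negative.

decay <- function(x0, t, n_iter) x0 * exp(-t / n_iter)

gaussian_kernel <- function(grid_dist, sigma) exp(-grid_dist^2 / (2 * sigma^2))

update_categorical <- function(current, input, alpha, h, sampling = TRUE) {
  values  <- c(as.character(current), as.character(input))
  weights <- c(1 - alpha * h, alpha * h)
  # Sum weights per level (current and input may be the same level)
  w <- tapply(weights, values, sum)
  if (sampling) {
    # Weighted multinomial draw of a single level
    sample(names(w), size = 1, prob = as.numeric(w))
  } else {
    # Weighted mode: the level with the largest total weight
    names(w)[which.max(w)]
  }
}

# Example: one step on a 5 x 5 grid at iteration t = 10 of 100
grid_rows <- 5; grid_cols <- 5; n_iter <- 100
sigma0  <- max(grid_rows, grid_cols) / 2   # documented default radius
alpha_t <- decay(0.1, t = 10, n_iter = n_iter)
sigma_t <- decay(sigma0, t = 10, n_iter = n_iter)
h <- gaussian_kernel(grid_dist = 1, sigma = sigma_t)
update_categorical("a", "b", alpha_t, h, sampling = FALSE)
```

In this sketch, a small learning rate means the weighted mode rarely changes a prototype level. The sampled variant instead switches to the input level with probability alpha * h. That is one way to read the difference between sampling = FALSE and sampling = TRUE.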
An object of class gowersom with:
weights: data.frame of trained neuron prototypes.
coords: data.frame of grid coordinates per neuron.
Patricio Sáez <[email protected]>; Patricio Salas <[email protected]>
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
```r
library(GowerSom)

set.seed(1)
df <- data.frame(
  x1 = rnorm(50),
  x2 = rnorm(50),
  g  = factor(sample(letters[1:3], 50, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     learning_rate = 0.1, num_iterations = 10,
                     batch_size = 8, sampling = TRUE, set_seed = 123)
str(fit)
```
Calculates the U-Matrix (unified distance matrix) to visualize the topology and cluster structure of a Self-Organizing Map trained on mixed-attribute data. Each entry contains the average Gower distance between a neuron and its immediate neighbors in the rectangular grid.
```r
gsom_Umatrix(codebook, n_rows, n_cols)
```
| Argument | Description |
|---|---|
| codebook | A data.frame or matrix containing the SOM prototypes (weights), with one row per neuron. |
| n_rows | Integer, number of rows in the SOM grid. |
| n_cols | Integer, number of columns in the SOM grid. |
The function assumes a rectangular topology where each neuron has up to
four direct neighbors (up, down, left, right). For each neuron, the mean
Gower distance to its valid neighbors is computed using
cluster::daisy() with metric = "gower".
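Below is a compact sketch of this computation using cluster::daisy(). The name umatrix_sketch() is hypothetical. The sketch assumes neurons are stored row by row (neuron k = (r - 1) * n_cols + j sits at row r, column j), an ordering this page does not state.

```r
library(cluster)

# Hypothetical re-implementation of the U-Matrix computation described above.
# Assumption: row-major neuron order, k = (r - 1) * n_cols + j.
umatrix_sketch <- function(codebook, n_rows, n_cols) {
  stopifnot(nrow(codebook) == n_rows * n_cols)
  D <- as.matrix(daisy(codebook, metric = "gower"))  # neuron-by-neuron distances
  idx <- function(r, j) (r - 1) * n_cols + j
  U <- matrix(NA_real_, n_rows, n_cols)
  for (r in seq_len(n_rows)) {
    for (j in seq_len(n_cols)) {
      # Up, down, left and right neighbors that fall inside the grid
      nb <- rbind(c(r - 1, j), c(r + 1, j), c(r, j - 1), c(r, j + 1))
      ok <- nb[, 1] >= 1 & nb[, 1] <= n_rows & nb[, 2] >= 1 & nb[, 2] <= n_cols
      nb <- nb[ok, , drop = FALSE]
      # Mean Gower distance to the valid neighbors
      U[r, j] <- mean(D[idx(r, j), idx(nb[, 1], nb[, 2])])
    }
  }
  U
}

# A 2 x 2 grid with a single numeric prototype variable
U <- umatrix_sketch(data.frame(x = c(0, 0.2, 1, 0.6)), n_rows = 2, n_cols = 2)
U
```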
A numeric matrix of size n_rows x n_cols, where each cell contains
the average Gower distance between the corresponding neuron and its neighbors.
Patricio Sáez <[email protected]>; Patricio Salas <[email protected]>
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
```r
library(GowerSom)

set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g  = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)
U <- gsom_Umatrix(fit$weights, n_rows = 3, n_cols = 3)
plot_Umatrix(U)
```
Visualizes the U-Matrix of a trained Gower-SOM using ggplot2. The U-Matrix reveals cluster boundaries and topological structures in the map.
```r
plot_Umatrix(u_matrix, fill_palette = "C")
```
| Argument | Description |
|---|---|
| u_matrix | Numeric matrix as returned by gsom_Umatrix. |
| fill_palette | Character string, the viridis option for the fill scale (default "C"). |
The function reshapes the U-Matrix into long format and draws a raster heatmap
with geom_raster. By default, it uses perceptually uniform viridis
palettes for improved interpretability, but the palette can be changed through
fill_palette.
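The reshape-and-raster approach can be sketched with base R and ggplot2. umatrix_long() and plot_umatrix_sketch() are hypothetical helpers, not the package's plot_Umatrix(). Applying the viridis option through ggplot2::scale_fill_viridis_c(option = ...) is an assumption of this sketch. Drawing row 1 at the top with scale_y_reverse() is a presentation choice made here.

```r
# Hypothetical sketch: reshape a U-Matrix to long format and draw it with
# ggplot2::geom_raster(); not the package's plot_Umatrix() code.
umatrix_long <- function(U) {
  data.frame(
    row   = as.vector(row(U)),   # grid row of each cell
    col   = as.vector(col(U)),   # grid column of each cell
    value = as.vector(U)         # average Gower distance in that cell
  )
}

plot_umatrix_sketch <- function(U, fill_palette = "C") {
  long <- umatrix_long(U)
  ggplot2::ggplot(long, ggplot2::aes(x = col, y = row, fill = value)) +
    ggplot2::geom_raster() +
    ggplot2::scale_fill_viridis_c(option = fill_palette) +
    ggplot2::scale_y_reverse() +   # design choice: row 1 drawn at the top
    ggplot2::coord_equal() +
    ggplot2::labs(x = "Column", y = "Row", fill = "U")
}

U <- matrix(c(0.6, 0.7, 0.3, 0.4), nrow = 2)
head(umatrix_long(U))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- plot_umatrix_sketch(U)   # print(p) to draw it
}
```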
A ggplot object displaying the U-Matrix as a heatmap.
```r
library(GowerSom)

set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g  = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)
U <- gsom_Umatrix(fit$weights, n_rows = 3, n_cols = 3)
plot_Umatrix(U)
```