| Title: | A Collection of Fast, Exact and Eco-Friendly k-Means Clustering Algorithms |
|---|---|
| Description: | A collection of fast k-means clustering algorithms under a single, uniform interface. The core method is Geometric-k-means, a bound-free algorithm of Sharma et al. (2026) <doi:10.1007/s10994-025-06891-1> that uses geometry to restrict computation to the data points able to change clusters, substantially reducing distance computations and runtime while returning the same result as standard k-means. Also included are Lloyd's algorithm, Elkan, Hamerly, Annulus, Exponion, and Ball k-means. All algorithms are implemented in 'C++' via 'Rcpp' and 'RcppEigen' and return the final centroids, optional per-point cluster assignments, and computational statistics. |
| Authors: | Parichit Sharma [aut, cre, cph], Hasan Kurban [aut] |
| Maintainer: | Parichit Sharma <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-22 19:34:02 UTC |
| Source: | https://github.com/cran/geokmeans |
Run one of the bundled k-means variants on a numeric data matrix. All
functions share the same interface and return value; they differ only in the
acceleration strategy used internally. geo_kmeans() runs the bound-free
Geometric-k-means method.
geo_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

lloyd_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

elkan_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

hamerly_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

annulus_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

exponion_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)

ball_kmeans(data, centers, iter_max = 100L, threshold = 0.001,
  init = c("random", "sequential"), seed = NULL, with_labels = TRUE,
  verbose = FALSE, drop_empty = TRUE)
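| Argument | Description |
|---|---|
| data | A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed. |
| centers | Either a single positive integer giving the number of clusters, or a matrix of starting centroids with one row per cluster (as in centers = X[c(1, 51), ]). |
| iter_max | Maximum number of iterations. |
| threshold | Convergence threshold on centroid movement. |
| init | Initialisation strategy when centers is a number of clusters: "random" or "sequential". |
| seed | Optional integer seed for the random initialisation, or NULL. |
| with_labels | Logical; if TRUE, the per-point cluster ids are returned. |
| verbose | Logical; if … |
| drop_empty | Logical; if … |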
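To make the roles of iter_max, threshold and seed concrete, here is a minimal base-R sketch of plain Lloyd iterations. It is not the package's 'C++' implementation. The function name lloyd_sketch, the component names iterations and distance_computations, the choice to measure centroid movement as the largest Euclidean shift of any centroid, and the handling of empty clusters (the old centroid is kept) are all assumptions made for illustration.

```r
# Illustrative only: a plain Lloyd loop in base R. Names and the exact
# convergence rule are assumptions, not the geokmeans internals.
lloyd_sketch <- function(data, centers, iter_max = 100L,
                         threshold = 0.001, seed = NULL) {
  X <- as.matrix(data)
  stopifnot(is.numeric(X), !anyNA(X))

  if (is.matrix(centers) || is.data.frame(centers)) {
    # Explicit starting centroids, one row per cluster.
    C <- as.matrix(centers)
  } else {
    # A number of clusters: pick that many distinct rows at random.
    if (!is.null(seed)) set.seed(seed)
    C <- X[sample.int(nrow(X), centers), , drop = FALSE]
  }

  n <- nrow(X)
  k <- nrow(C)
  cluster <- integer(n)
  iterations <- 0L
  n_dist <- 0

  for (it in seq_len(iter_max)) {
    iterations <- it

    # Squared distance from every point to every centroid (n x k).
    D <- matrix(
      vapply(seq_len(k),
             function(j) rowSums(sweep(X, 2, C[j, ])^2),
             numeric(n)),
      nrow = n
    )
    n_dist <- n_dist + n * k

    # Assign each point to its nearest centroid (1-based ids).
    cluster <- max.col(-D, ties.method = "first")

    # Recompute centroids; an empty cluster keeps its old centroid.
    C_new <- C
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) C_new[j, ] <- colMeans(X[members, , drop = FALSE])
    }

    # Assumed convergence rule: largest centroid shift below threshold.
    shift <- max(sqrt(rowSums((C_new - C)^2)))
    C <- C_new
    if (shift < threshold) break
  }

  list(centroids = C, cluster = cluster, iterations = iterations,
       distance_computations = n_dist)
}

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- lloyd_sketch(X, centers = 2, seed = 42)
str(fit)
```

In this sketch every iteration costs nrow(data) times k distance computations, which is the baseline the accelerated variants aim to reduce.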
An object of class "geokmeans": a list with components
centroids: A k x ncol(data) matrix of final cluster centres.
cluster: Integer vector of cluster ids (1-based), if with_labels = TRUE.
Number of iterations performed.
Total number of point-to-centroid distance computations.
The algorithm used.
The number of clusters.
Sharma, P., Stanislaw, M., Kurban, H., Kulekci, O., and Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. doi:10.1007/s10994-025-06891-1
set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- geo_kmeans(X, centers = 2)
fit$centroids
table(fit$cluster)

# Supplying explicit starting centroids:
geo_kmeans(X, centers = X[c(1, 51), ])
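Because the variants are described as returning the same result as standard k-means, one way to check that on your own data is to start two of them from the same explicit centroids and compare the final centres. The sketch below is a suggestion rather than part of the package: same_centroids is a hypothetical helper, the tolerance is arbitrary, and the package call only runs if 'geokmeans' is installed.

```r
# Hypothetical helper: TRUE if two centroid matrices agree up to row
# order and a numeric tolerance. Rows are sorted lexicographically, so
# near-tied coordinates could in principle be ordered differently.
same_centroids <- function(a, b, tol = 1e-6) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  if (!identical(dim(a), dim(b))) return(FALSE)
  sort_rows <- function(m) {
    keys <- lapply(seq_len(ncol(m)), function(j) m[, j])
    m[do.call(order, keys), , drop = FALSE]
  }
  isTRUE(all.equal(unname(sort_rows(a)), unname(sort_rows(b)),
                   tolerance = tol))
}

# Only attempted when the package is available.
if (requireNamespace("geokmeans", quietly = TRUE)) {
  set.seed(1)
  X <- rbind(matrix(rnorm(100, 0), ncol = 2),
             matrix(rnorm(100, 5), ncol = 2))
  start <- X[c(1, 51), ]
  fit_geo   <- geokmeans::geo_kmeans(X, centers = start)
  fit_lloyd <- geokmeans::lloyd_kmeans(X, centers = start)
  print(same_centroids(fit_geo$centroids, fit_lloyd$centroids))
}
```

Passing explicit starting centroids removes the random initialisation from the comparison, so any difference would come from the algorithms themselves.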
A thin dispatcher over the individual algorithm functions.
kmeans_dc(data, centers,
  method = c("geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", "ball"),
  ...)
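| Argument | Description |
|---|---|
| data | A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed. |
| centers | Either a single positive integer giving the number of clusters, or a matrix of starting centroids with one row per cluster. |
| method | The algorithm to use. One of "geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", or "ball". |
| ... | Further arguments passed to the chosen algorithm. |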
An object of class "geokmeans"; see geo_kmeans().
set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
kmeans_dc(X, centers = 2, method = "elkan")
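For readers curious what a thin dispatcher of this kind typically looks like in R, the sketch below validates method with match.arg() and forwards everything else through the dots. The names resolve_method and dispatch_sketch are hypothetical, and the mapping from method strings to function names is an assumption inferred from the function names documented here; the actual kmeans_dc() source may differ.

```r
# Hypothetical: map a method string to the name of the matching function.
# match.arg() rejects anything outside the allowed set and falls back to
# the first choice when method is not supplied.
resolve_method <- function(method = c("geokmeans", "lloyd", "elkan",
                                      "hamerly", "annulus", "exponion",
                                      "ball")) {
  method <- match.arg(method)
  switch(method,
         geokmeans = "geo_kmeans",
         paste0(method, "_kmeans"))
}

# Hypothetical dispatcher: look the function up among the package's
# exports and pass data, centers and any further arguments along.
dispatch_sketch <- function(data, centers, method = "geokmeans", ...) {
  fun <- getExportedValue("geokmeans", resolve_method(method))
  fun(data, centers, ...)
}

resolve_method("elkan")

# Only attempted when the package is available.
if (requireNamespace("geokmeans", quietly = TRUE)) {
  set.seed(1)
  X <- rbind(matrix(rnorm(100, 0), ncol = 2),
             matrix(rnorm(100, 5), ncol = 2))
  fit <- dispatch_sketch(X, centers = 2, method = "hamerly", iter_max = 50L)
}
```

Keeping the lookup separate from the call makes the allowed method names easy to test without running any clustering.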