Title: | Almost Linear-Time k-Medoids Clustering |
---|---|
Description: | Interface to a high-performance implementation of k-medoids clustering described in Tiwari, Zhang, Mayclin, Thrun, Piech and Shomorony (2020) "BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits" <https://proceedings.neurips.cc/paper/2020/file/73b817090081cef1bca77232f4532c5d-Paper.pdf>. |
Authors: | Balasubramanian Narasimhan [aut, cre], Mo Tiwari [aut] (https://motiwari.com) |
Maintainer: | Balasubramanian Narasimhan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0-1 |
Built: | 2024-11-28 06:53:49 UTC |
Source: | CRAN |
banditpam is a high-performance package for almost linear-time k-medoids clustering. The methods are described in Tiwari et al. (2020), Advances in Neural Information Processing Systems 33.
Balasubramanian Narasimhan and Mo Tiwari
Return the number of threads banditpam is using
bpam_num_threads()
the number of threads banditpam is using
This class wraps around the C++ KMedoids class and exposes methods and fields of the C++ object.
k (integer(1))
The number of medoids/clusters to create
max_iter (integer(1))
The maximum number of SWAP steps the algorithm runs
build_conf (integer(1))
Parameter that affects the width of BUILD confidence intervals, default 1000
swap_conf (integer(1))
Parameter that affects the width of SWAP confidence intervals, default 10000
loss_fn (character(1))
The loss function, either "lp" (with p replaced by a positive integer, e.g. "l2") or one of "manhattan", "cosine", "inf" or "euclidean"
new()
Create a new KMedoids object
KMedoids$new( k = 5L, algorithm = c("BanditPAM", "PAM", "FastPAM1"), max_iter = 1000L, build_conf = 1000, swap_conf = 10000L )
k
number of medoids/clusters to create, default 5
algorithm
the algorithm to use, one of "BanditPAM", "PAM", "FastPAM1"
max_iter
the maximum number of SWAP steps the algorithm runs, default 1000
build_conf
parameter that affects the width of BUILD confidence intervals, default 1000
swap_conf
parameter that affects the width of SWAP confidence intervals, default 10000
a KMedoids object which can be used to fit the banditpam algorithm to data
get_algorithm()
Return the algorithm used
KMedoids$get_algorithm()
a string indicating the algorithm
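As a minimal sketch of the constructor, the snippet below builds an object with a non-default algorithm and reads the setting back. It assumes the banditpam package is installed and that the fields listed above can be read with `$`; the argument values are illustrative only.

```r
library(banditpam)

# Request 4 medoids and the PAM algorithm instead of the default BanditPAM.
obj <- KMedoids$new(k = 4L, algorithm = "PAM", max_iter = 100L)

# The algorithm name stored in the object.
alg <- obj$get_algorithm()
print(alg)

# Assumption: the documented fields are readable via `$`.
print(obj$k)
```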
fit()
Fit the KMedoids algorithm given the data and loss. For reproducible results, set the seed before calling this method.
KMedoids$fit(data, loss, dist_mat = NULL)
data
the data matrix
loss
the loss function, either "lp" (with p replaced by a positive integer giving the L_p loss, e.g. "l2") or one of "manhattan", "cosine", "inf" or "euclidean"
dist_mat
an optional distance matrix
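The loss argument might be varied as in the following sketch, which fits the same simulated data under two of the documented loss names and resets the seed before each fit. The data, the seed and the choice of k are assumptions made for illustration; the optional dist_mat argument is left at its default.

```r
library(banditpam)

# Two well-separated groups of 30 points each in the plane.
set.seed(1)
X <- rbind(matrix(rnorm(60), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))

# Fit once per loss, setting the seed before each call to fit().
losses <- c("l2", "manhattan")
medoids <- lapply(losses, function(loss) {
  set.seed(1)
  obj <- KMedoids$new(k = 2L)
  obj$fit(data = X, loss = loss)
  obj$get_medoids_final()
})
names(medoids) <- losses
print(medoids)
```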
get_medoids_final()
Return the final medoid indices after clustering
KMedoids$get_medoids_final()
a vector of indices of the final medoids
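The medoid indices do not by themselves give cluster labels. One way to derive them, sketched here in base R, is to assign every row to its nearest medoid. The helper `assign_to_medoids` is hypothetical and not part of banditpam; it assumes the indices refer to rows of the data matrix and that Euclidean distance is wanted. The medoid indices in the demonstration are hand-picked so that the snippet runs without the package; in practice they would come from get_medoids_final().

```r
# Hypothetical helper (not part of banditpam): label each row of X with the
# position of its nearest medoid under Euclidean distance.
assign_to_medoids <- function(X, meds) {
  # Column j holds the distance from every row of X to medoid meds[j].
  d <- sapply(meds, function(m) sqrt(rowSums(sweep(X, 2, X[m, ])^2)))
  # For each row, pick the column (medoid) with the smallest distance.
  apply(d, 1, which.min)
}

# Tiny demonstration with hand-picked medoid rows 1 and 3.
X <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.2, 5))
meds <- c(1L, 3L)
labels <- assign_to_medoids(X, meds)
print(labels)
```

The labels index into `meds`, so `meds[labels]` recovers the medoid row assigned to each observation.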
get_statistic()
Get the specified statistic after clustering
KMedoids$get_statistic(what)
what
a string, which should be one of "dist_computations", "dist_computations_and_misc", "misc_dist", "build_dist", "swap_dist", "cache_writes", "cache_hits", or "cache_misses"
return
the statistic
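A sketch of how several statistics might be collected after a fit follows. It assumes the banditpam package is installed and that each statistic is returned as a single number; the simulated data and the selection of statistic names are illustrative.

```r
library(banditpam)

# Simulated data: 200 points in the plane.
set.seed(2)
X <- matrix(rnorm(400), ncol = 2)

obj <- KMedoids$new(k = 3L)
obj$fit(data = X, loss = "l2")

# Query a few of the documented statistics by name.
stats <- c("dist_computations", "build_dist", "swap_dist", "cache_hits")
counts <- sapply(stats, function(s) obj$get_statistic(s))
print(counts)
```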
print()
Printer.
KMedoids$print(...)
...
(ignored).
clone()
The objects of this class are cloneable with this method.
KMedoids$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Generate data from a Gaussian Mixture Model with the given means:
set.seed(10)
n_per_cluster <- 40
means <- list(c(0, 0), c(-5, 5), c(5, 5))
X <- do.call(rbind, lapply(means, MASS::mvrnorm, n = n_per_cluster, Sigma = diag(2)))
obj <- KMedoids$new(k = 3)
obj$fit(data = X, loss = "l2")
meds <- obj$get_medoids_final()
plot(X[, 1], X[, 2])
points(X[meds, 1], X[meds, 2], col = "red", pch = 19)