| Title: | Gower's Distance |
|---|---|
| Description: | Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP. |
| Authors: | Mark van der Loo [aut, cre], David Turner [ctb] |
| Maintainer: | Mark van der Loo <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.2 |
| Built: | 2026-05-13 05:25:41 UTC |
| Source: | https://github.com/cran/gower |
A C-based implementation of Gower's distance.
Compute Gower's distance pairwise between records in two data sets x
and y. Records from the smaller data set are recycled.
gower_dist(
  x,
  y,
  pair_x = NULL,
  pair_y = NULL,
  eps = 1e-08,
  weights = NULL,
  ignore_case = FALSE,
  nthread = getOption("gd_num_thread")
)
x |
|
y |
|
pair_x |
|
pair_y |
|
eps |
|
weights |
|
ignore_case |
|
nthread |
Number of threads to use for parallelization. By default,
for a dual-core machine, 2 threads are used. For any other machine
n-1 cores are used so your machine doesn't freeze during a big computation.
The maximum nr of threads are determined using |
A numeric vector of length max(nrow(x),nrow(y)).
When there are no columns to compare, a message is printed and
numeric(0) is returned invisibly.
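As a minimal sketch of the call and the shape of its result, the following compares three records against ten using the built-in iris data. It assumes the gower package is installed and attached; the variable names are illustrative only.

```r
library(gower)

# Three records compared against ten; the shorter set is recycled.
x <- iris[1:3, ]
y <- iris[1:10, ]

d <- gower_dist(x, y)

length(d)  # max(nrow(x), nrow(y)), as described above
range(d)   # Gower distances lie between 0 and 1
```

Because the first three rows of x are identical to the first three rows of y, the first three distances are expected to be zero.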
There are three ways to specify which columns of x should be compared
with which columns of y. The first option is to give no specification.
In that case columns with matching names will be used. The second option
is to use only the pair_y argument, specifying for each column in x,
in order, which column in y it must be paired with (use 0
to skip a column in x). The third option is to explicitly specify the
columns to be matched using pair_x and pair_y.
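The three options might look as follows in practice. This is a sketch under the assumption that the gower package is attached and that numeric column positions are accepted for pair_x and pair_y; only the four numeric iris columns are used to keep the pairing unambiguous.

```r
library(gower)

x <- iris[1:5, 1:4]
y <- iris[6:10, 1:4]

# 1. No specification: columns are paired by matching names.
d_names <- gower_dist(x, y)

# 2. pair_y only: one entry per column of x, giving the column of y to
#    compare it with; 0 skips that column of x.
d_all  <- gower_dist(x, y, pair_y = c(1, 2, 3, 4))
d_skip <- gower_dist(x, y, pair_y = c(1, 2, 0, 0))

# 3. pair_x and pair_y: explicit pairs, here column 1 with column 1
#    and column 2 with column 2.
d_pairs <- gower_dist(x, y, pair_x = c(1, 2), pair_y = c(1, 2))
```

Since x and y share column names and order here, d_names and d_all should agree, as should d_skip and d_pairs.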
Gower (1971) originally defined a similarity measure (s, say)
with values ranging from 0 (completely dissimilar) to 1 (completely similar).
The distance returned here equals 1 - s.
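To make the relation between distance and similarity concrete, here is a simplified base R sketch of the per-record calculation. It is not the package's C implementation: the helper gower_pair and its ranges argument are hypothetical, and the sketch ignores missing values, weights, logical variables and the eps threshold. Numeric variables contribute a range-scaled absolute difference, categorical ones contribute 0 when equal and 1 otherwise.

```r
# Simplified Gower distance between two records (lists of equal length).
# `ranges` holds the range of each numeric variable; the entry is unused
# for categorical variables.
gower_pair <- function(a, b, ranges) {
  per_var <- mapply(function(u, v, r) {
    if (is.numeric(u)) abs(u - v) / r  # scaled absolute difference
    else as.numeric(u != v)            # 0 if equal, 1 otherwise
  }, a, b, ranges)
  mean(per_var)
}

a <- list(height = 2, colour = "red")
b <- list(height = 4, colour = "red")
ranges <- list(height = 8, colour = NA)

d <- gower_pair(a, b, ranges)  # (|2 - 4| / 8 + 0) / 2 = 0.125
s <- 1 - d                     # similarity in Gower's original sense: 0.875
```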
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
Find the top-n matches in y for each record in x.
gower_topn(
  x,
  y,
  pair_x = NULL,
  pair_y = NULL,
  n = 5,
  eps = 1e-08,
  weights = NULL,
  ignore_case = FALSE,
  nthread = getOption("gd_num_thread")
)
x |
|
y |
|
pair_x |
|
pair_y |
|
n |
The top-n indices and distances to return. |
eps |
|
weights |
|
ignore_case |
|
nthread |
Number of threads to use for parallelization. By default,
for a dual-core machine, 2 threads are used. For any other machine
n-1 cores are used so your machine doesn't freeze during a big computation.
The maximum nr of threads are determined using |
A list with two array elements: index
and distance. Both have size n x nrow(x). The ith column
lists the top-n best matches among the rows of y for the ith row of x.
When there are no columns to compare, a message is printed and both
distance and index will be empty matrices; the list is
then returned invisibly.
# find the top 4 best matches in the iris data set with itself.
x <- iris[1:3,]
lookup <- iris[1:10,]
gower_topn(x=x,y=lookup,n=4)
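
A short follow-up sketch shows one way to use the returned list, assuming the gower package is attached. The names res, matches_1 and dist_1 are illustrative; the code relies only on the documented layout of one column per record in x.

```r
library(gower)

x <- iris[1:3, ]
lookup <- iris[1:10, ]
res <- gower_topn(x = x, y = lookup, n = 4)

dim(res$index)     # n rows, one column per record in x
dim(res$distance)

# Column i describes record i of x: row numbers in `lookup` and the
# corresponding distances.
matches_1 <- lookup[res$index[, 1], ]
dist_1 <- res$distance[, 1]
```

Since the first record of x also appears in lookup, one of its matches should have distance zero.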