Package 'bfcluster'

Title: Buttler-Fickel Distance and R2 for Mixed-Scale Cluster Analysis
Description: Implements the distance measure for mixed-scale variables proposed by Buttler and Fickel (1995), based on normalized mean pairwise distances (Gini mean difference), and an R2 statistic to assess clustering quality.
Authors: Moritz Schäfer [aut, cre]
Maintainer: Moritz Schäfer <[email protected]>
License: MIT + file LICENSE
Version: 1.0.0
Built: 2026-05-23 07:53:31 UTC
Source: https://github.com/cran/bfcluster

R² for Cluster Solutions after Buttler & Fickel (1995)

Description

Computes the proportion of explained distance variation (R²) for a given clustering solution using a distance matrix derived from the Buttler-Fickel distance. The statistic reflects how well the clustering partitions the total pairwise distance structure.

Usage

bf_R2(D, cluster)

Arguments

D

A distance object of class dist, usually computed via buttler_fickel_dist().

cluster

An integer or factor vector of cluster assignments, typically obtained from cutree() or another clustering method.

Details

The R² is defined as:

R^2 = 1 - \frac{D_{\text{within}}}{D_{\text{total}}}

where D_{\text{total}} is the sum of all pairwise distances and D_{\text{within}} is the sum of pairwise distances between cases in the same cluster.
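The definition can be reproduced in a few lines of base R. The following is a minimal sketch, not the package's own implementation: the helper name r2_sketch is hypothetical, and it assumes that each unordered pair of cases is counted once in both sums.

```r
# Hypothetical helper illustrating the formula; not the package's bf_R2().
r2_sketch <- function(D, cluster) {
  m <- as.matrix(D)
  cluster <- as.integer(as.factor(cluster))   # accept integer or factor labels
  stopifnot(length(cluster) == nrow(m))

  pairs <- upper.tri(m)                       # each unordered pair exactly once
  same  <- outer(cluster, cluster, "==")      # TRUE where both cases share a cluster

  d_total  <- sum(m[pairs])                   # sum of all pairwise distances
  d_within <- sum(m[pairs & same])            # sum of within-cluster distances
  1 - d_within / d_total
}

# Toy data: two tight groups on a line
x  <- c(1, 2, 10, 11)
D  <- dist(x)
cl <- c(1, 1, 2, 2)

r2_sketch(D, cl)   # 1 - 2/38, about 0.947
```

In this toy case the six pairwise distances sum to 38 and the two within-cluster distances sum to 2, so nearly all of the distance structure lies between the clusters.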

Value

A single numeric value between 0 and 1 giving the proportion of explained distance variation. Higher values indicate that the clustering accounts for more of the total pairwise distance, that is, a better cluster fit.

Examples

df <- data.frame(
  sex    = factor(c("m","f","m","f")),
  height = c(180, 165, 170, 159),
  age    = c(25, 32, 29, 28)
)

types <- c("nominal", "metric", "metric")

D <- buttler_fickel_dist(df, types)
hc <- hclust(D)
cl <- cutree(hc, k = 2)

bf_R2(D, cl)

Buttler-Fickel Distance Matrix

Description

Computes a distance matrix following Buttler & Fickel (1995) for mixed-scale variables. Each variable-specific distance matrix is normalized by its mean pairwise distance (Gini mean difference), ensuring equal contribution of all variables to the overall distance.
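The normalization step can be illustrated in base R. This is a rough sketch under stated assumptions, not the package's internals: the helper name normalized_dist is hypothetical, metric variables are assumed to use absolute differences, nominal variables a 0/1 mismatch indicator, and the normalized matrices are assumed to be combined by simple addition. Ordinal variables are left out because the source does not say how they are handled.

```r
# Hypothetical helper; the per-type distances are assumptions for illustration.
normalized_dist <- function(x, type = c("metric", "nominal")) {
  type <- match.arg(type)
  if (type == "metric") {
    d <- dist(x, method = "manhattan")              # |x_i - x_j|
  } else {
    lab <- as.character(x)
    d <- as.dist(outer(lab, lab, "!=") * 1)         # 1 if categories differ, else 0
  }
  d / mean(as.vector(d))                            # mean pairwise distance becomes 1
}

height <- c(180, 165, 170, 159)
sex    <- factor(c("m", "f", "m", "f"))

d_height <- normalized_dist(height, "metric")
d_sex    <- normalized_dist(sex, "nominal")

# Both variables now have a mean pairwise distance of 1, so neither
# dominates the (assumed) additive combination.
D_sketch <- d_height + d_sex
```

Note that a constant variable has a mean pairwise distance of zero, so this sketch would divide by zero in that case.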

Usage

buttler_fickel_dist(df, types)

Arguments

df

A data.frame where rows are cases and columns are variables.

types

A character vector of the same length as ncol(df), indicating the scale level of each variable. Allowed values are "metric", "ordinal", or "nominal".
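Because types is matched to the columns of df by position, it can help to check the vector before computing distances. The helper below is a hypothetical sketch in base R; whether the package performs equivalent checks itself is not stated here.

```r
# Hypothetical pre-flight check; not part of the package.
check_types <- function(df, types) {
  allowed <- c("metric", "ordinal", "nominal")
  stopifnot(
    is.data.frame(df),
    is.character(types),
    length(types) == ncol(df),      # one entry per column
    all(types %in% allowed)         # only the documented scale levels
  )
  setNames(types, names(df))        # label each type with its column name
}

df <- data.frame(
  sex    = factor(c("m", "f", "m", "f")),
  height = c(180, 165, 170, 159),
  age    = c(25, 32, 29, 28)
)
types <- c("nominal", "metric", "metric")

check_types(df, types)
```

The named result makes it easy to confirm at a glance that each column received the intended scale level.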

Value

An object of class dist.