Title: | Generate Heatmaps Based on Partitioning Around Medoids (PAM) |
---|---|
Description: | Data are partitioned (clustered) into k clusters "around medoids", which is a more robust version of K-means implemented in the function pam() in the 'cluster' package. The PAM algorithm is described in Kaufman and Rousseeuw (1990) <doi:10.1002/9780470316801>. Please refer to the pam() function documentation for more references. Clustered data is plotted as a split heatmap allowing visualisation of representative "group-clusters" (medoids) in the data as separated fractions of the graph while those "sub-clusters" are visualised as a traditional heatmap based on hierarchical clustering. |
Authors: | Vidal Fey [aut, cre], Henri Sara [aut] |
Maintainer: | Vidal Fey <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-12-02 06:52:17 UTC |
Source: | CRAN |
This is the main wrapper function to be called by end users. It accepts a numeric matrix (or an object that can be coerced to a numeric matrix) or a number of data file formats and produces one or more PDFs with the plots.
PAM.hm(
  x,
  project.folder = ".",
  nsheets = 1,
  dec = ".",
  header = TRUE,
  symbolcol = 1,
  sample.names = NULL,
  cluster.number = 4,
  trim = NULL,
  winsorize.mat = TRUE,
  cols = "BlueWhiteRed",
  dendrograms = "Both",
  autoadj = TRUE,
  pdf.height = 10,
  pdf.width = 10,
  labelheight = 0.25,
  labelwidth = 0.2,
  r.cex = 0.5,
  c.cex = 1,
  medianCenter = NULL,
  log = FALSE,
  do.log = FALSE,
  log.base = 2,
  metric = "manhattan",
  na.strings = "NA",
  makeFolder = TRUE,
  do.pdf = FALSE,
  do.png = FALSE,
  save.objects = FALSE
)
x | ( |
project.folder | ( |
nsheets | ( |
dec | ( |
header | ( |
symbolcol | ( |
sample.names | ( |
cluster.number | ( |
trim | ( |
winsorize.mat | ( |
cols | ( |
dendrograms | ( |
autoadj | ( |
pdf.height | ( |
pdf.width | ( |
labelheight | ( |
labelwidth | ( |
r.cex | ( |
c.cex | ( |
medianCenter | ( |
log | ( |
do.log | ( |
log.base | ( |
metric | ( |
na.strings | ( |
makeFolder | ( |
do.pdf | ( |
do.png | ( |
save.objects | ( |
Argument x can be a data.frame or numeric matrix to be used directly for plotting the heatmap. If it is a data.frame, argument symbolcol sets the respective columns for symbols to be used as labels and the column where the numeric data starts. Matrices will be coerced to data frames. The read function accepts txt, tsv, csv and xls files.
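As a rough illustration of how a label column and the numeric data might be separated, here is a minimal base-R sketch. It is not the package's internal code: the example data frame, the object names and the reading of symbolcol as "position of the label column" are assumptions made for this example.

```r
# Hypothetical example data: one label column plus numeric sample columns.
df <- data.frame(
  symbol = c("geneA", "geneB", "geneC"),
  s1 = c(1.2, -0.4, 0.3),
  s2 = c(0.8, -1.1, 0.0),
  stringsAsFactors = FALSE
)

symbolcol <- 1  # position of the label column (assumed meaning)

# Take the labels from the symbol column and keep only the numeric columns.
labels <- df[[symbolcol]]
num_cols <- vapply(df, is.numeric, logical(1))
mat <- as.matrix(df[, num_cols, drop = FALSE])
rownames(mat) <- labels

mat
```

The resulting matrix carries the symbols as row names, which is the form a heatmap function typically needs for row labels.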
If PDF, PNG or R object files are to be saved, i.e., if the corresponding arguments are TRUE, a results folder will be created using time and date to create a unique name. The folder will be created in the directory set by argument project.folder. The reasoning behind that behaviour is that during development the heatmap was used as a data analysis tool, testing various cluster.number values with numerous files and comparing the results.
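The idea of a uniquely named, time-stamped results folder can be sketched in base R as follows. The "results_" prefix and the timestamp format are assumptions for this example; the exact naming scheme used by the package is not stated here.

```r
project.folder <- tempdir()  # stand-in for the user's project directory

# Build a folder name from the current date and time.
# Prefix and timestamp format are assumptions made for this sketch.
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
results.dir <- file.path(project.folder, paste0("results_", stamp))

# Create the folder only if it does not exist yet.
if (!dir.exists(results.dir)) {
  dir.create(results.dir, recursive = TRUE)
}

results.dir
```

Because the name changes with every run, repeated runs with different settings do not overwrite each other's output.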
The cluster.number argument defines the numbers of clusters when doing PAM. After processing, the values are passed one by one to argument k in pam. The numbers can be defined in the form c("2","4-7", "9"), for example, depending on the experimental setup. An integer vector is coerced to character.
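One way such a specification could be expanded into individual k values is sketched below. The helper expand_cluster_spec() is hypothetical and written only for illustration; how the package processes the argument internally is not shown in this documentation.

```r
# Hypothetical helper: expand entries such as "4-7" into 4, 5, 6, 7.
expand_cluster_spec <- function(spec) {
  spec <- as.character(spec)  # integer input is coerced to character
  ks <- lapply(spec, function(s) {
    bounds <- as.integer(strsplit(s, "-", fixed = TRUE)[[1]])
    # A range has two bounds; a single number is returned unchanged.
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  })
  sort(unique(unlist(ks)))
}

expand_cluster_spec(c("2", "4-7", "9"))
expand_cluster_spec(2:4)
```

Each value in the expanded vector would then correspond to one clustering run, i.e. one value of k.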
If autoadj is TRUE, character expansion (cex) for rows and columns, PDF width and height, and label width and height are adjusted automatically based on the dimensions of the data matrix and the length (number of characters) of the labels.
The default behavior regarding outliers is to winsorize the matrix before plotting, i.e., shrink outliers to the unscattered part of the data by replacing extreme values at both ends of the distribution with less extreme values. This is done for the same reason as trimming but the data will not be symmetrical around 0.
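The difference between winsorizing and trimming can be illustrated with a small base-R sketch. Both helper functions are hypothetical; the 5%/95% quantile cut-offs and the reading of trimming as clamping to a symmetric range around 0 are assumptions for this example, not the package's actual implementation.

```r
set.seed(1234)
mat <- matrix(rnorm(120), nrow = 20)
mat[1] <- mat[1] * 10  # introduce an outlier

# Winsorizing: clamp values to lower and upper quantiles of the data.
# The cut-offs are assumptions; the limits need not be symmetric around 0.
winsorize_sketch <- function(m, probs = c(0.05, 0.95)) {
  lim <- quantile(m, probs = probs, na.rm = TRUE)
  pmin(pmax(m, lim[[1]]), lim[[2]])
}

# Trimming: clamp values to the symmetric range [-trim, trim].
trim_sketch <- function(m, trim) {
  pmin(pmax(m, -trim), trim)
}

range(mat)                    # the outlier dominates the range
range(winsorize_sketch(mat))  # limits taken from the data
range(trim_sketch(mat, 2))    # limits fixed at -2 and 2
```

In both cases the outlier no longer stretches the colour scale, but only trimming guarantees limits that are symmetric around 0.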
A list: Invisibly returns the results object from the PAM clustering.
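To give an idea of what a PAM results object contains, the following sketch calls pam() from the 'cluster' package directly on a random matrix. It does not call PAM.hm, and the exact structure of the list returned by PAM.hm is not spelled out here; the component names shown are those of a standard pam object.

```r
library(cluster)  # provides pam()

set.seed(1234)
mat <- matrix(rnorm(120), nrow = 20)

# Partition the 20 rows into 3 clusters using Manhattan distances.
fit <- pam(mat, k = 3, metric = "manhattan")

fit$id.med             # row indices of the medoids
fit$clustering         # cluster assignment for each row
table(fit$clustering)  # cluster sizes
```

The clustering vector is what would determine how rows are split into the separate blocks of a heatmap.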
Kaufman, L., & Rousseeuw, P. J. (Eds.). (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Inc. doi:10.1002/9780470316801
# Generate a random 20x6 matrix and plot it using default values
set.seed(1234) # for reproducibility
mat <- matrix(rnorm(120), nrow = 20) # standard normal
PAM.hm(mat, cluster.number = 3)

## Plot with more than one cluster number
PAM.hm(mat, cluster.number = 2:4) # integer vector
PAM.hm(mat, cluster.number = c("2", "4-5")) # character vector

# Using the 'trim' argument
## Introduce outlier to the matrix and plot w/o trimming or winsorization
mat[1] <- mat[1] * 10
PAM.hm(mat, cluster.number = 3, trim = NULL, winsorize.mat = FALSE)

## Calculate a trim value by getting the largest possible absolute integer and
## plot again
tr <- min(abs(ceiling(c(min(mat, na.rm = TRUE), max(mat, na.rm = TRUE)))),
          na.rm = TRUE)
PAM.hm(mat, cluster.number = 3, trim = tr, winsorize.mat = FALSE)
## Note that the outlier is still visible but since it is less extreme
## it does not distort the colour scheme.

# An example reading data from an Excel file
# The function readxl::read_excel is used internally to read Excel files.
# The example uses their example data.
readxl_datasets <- readxl::readxl_example("datasets.xlsx")
PAM.hm(readxl_datasets, cluster.number = 4, symbolcol = 5)
Package: | PAMhm |
Type: | Package |
Initial version: | 0.1-0 |
Created: | 2011-01-07 |
License: | GPL-3 |
LazyLoad: | yes |
Vidal Fey <[email protected]>, Henri Sara <[email protected]>
Maintainer: Vidal Fey <[email protected]>