Package 'PHclust' reference manual

Title:	Poisson Hurdle Clustering for Sparse Microbiome Data
Description:	Clustering analysis for sparse microbiome data, based on a Poisson hurdle model.
Authors:	Zhili Qiao
Maintainer:	Zhili Qiao <zlqiao@iastate.edu>
License:	GPL-3
Version:	0.1.0
Built:	2025-03-14 07:34:18 UTC
Source:	CRAN

Calculate optimal number of clusters.

Description

This function estimates the optimal number of clusters for a given dataset.

Usage

Hybrid(data, absolute = FALSE, Kstart = NULL, Treatment)
Hybrid(data, absolute = FALSE, Kstart = NULL, Treatment)

Arguments

`data`	Data matrix with dimension N*P indicating N features and P samples.
`absolute`	Logical. Whether we should use absolute (TRUE) or relative (FALSE) abundance of features to determine clusters.
`Kstart`	Positive integer. The number of clusters for starting the hybrid merging algorithm. Should be relatively large to ensure that Kstart > optimal number of clusters. Uses max(50, sqrt(N)) by default.
`Treatment`	Vector of length p, indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.

Value

A positive integer indicating the optimal number of clusters

Examples

######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))
######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

Poisson hurdle clustering

Description

This function gives the clustering result based on a Poisson hurdle model.

Usage

PHcluster(
  data,
  Treatment,
  nK,
  method = c("EM", "SA"),
  absolute = FALSE,
  cool = 0.9,
  nstart = 1
)
PHcluster(
  data,
  Treatment,
  nK,
  method = c("EM", "SA"),
  absolute = FALSE,
  cool = 0.9,
  nstart = 1
)

Arguments

`data`	Data matrix with dimension N*P indicating N features and P samples. The cluster analysis is done feature-wised.
`Treatment`	Vector of length P. Indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.
`nK`	Positive integer. Number of clusters.
`method`	Method for the algorithm. Can choose between "EM" as Expectation Maximization or "SA" as Simulated Annealing.
`absolute`	Logical. Whether we should use absolute (TRUE) or relative (False) abundance of features to determine clusters.
`cool`	Real number between (0, 1). Cooling rate for the "SA" algorithm. Uses 0.9 by default.
`nstart`	Positive integer. Number of starts for the entire algorithm. Note that as nstart increases the computational time also grows linearly. Uses 1 by default.

Value

cluster: Vector of length N consisting of integers from 1 to nK. Indicating final clustering result. For evaluating the clustering result please check NMI for Normalized Mutual Information.
prob: N*nK matrix. The (i, j)th element representing the probability that observation i belongs to cluster j.
log_l: Scaler. The Poisson hurdle log-likelihood of the final clustering result.
alpha: Vector of length N. The geometric mean abundance level for each feature, across all treatment groups.
Normalizer: vector of length P. The normalizing constant of sequencing depth for each sample.

Examples

######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))
######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

Plot of feature abundance level

Description

This function plots the feature abundance level for each cluster, after extracting the effect of sample-wise normalization factors and feature-wise geometric mean.

Usage

plot_abundance(result, data, Treatment)
plot_abundance(result, data, Treatment)

Arguments

`result`	Clustering result from function PHclust().
`data`	Data matrix with dimension N*P indicating N features and P samples.
`Treatment`	Vector of length P. Indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.

Value

A plot for feature abundance level will be shown. No value is returned.

Examples

######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))
######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

Sample of sparse microbiome count data

Description

A sample data matrix with 100 features in 2 true clusters, 4 treatment groups with 4 replicates in each group.

Usage

sample_data
sample_data

Format

The dataset contains 16 columns, indexed as A1 ~ A4, B1 ~ B4, C1 ~ C4, D1 ~ D4 to represent 4 treatment groups.

Examples

head(sample_data)
head(sample_data)

Package 'PHclust'

Help Index

Calculate optimal number of clusters.

Description

Usage

Arguments

Value

Examples

Poisson hurdle clustering

Description

Usage

Arguments

Value

Examples

Plot of feature abundance level

Description

Usage

Arguments

Value

Examples

Sample of sparse microbiome count data

Description

Usage

Format

Examples