Package 'ClussCluster' reference manual

Title:	Simultaneous Detection of Clusters and Cluster-Specific Genes in High-Throughput Transcriptome Data
Description:	Implements a new method 'ClussCluster' described in Ge Jiang and Jun Li, "Simultaneous Detection of Clusters and Cluster-Specific Genes in High-throughput Transcriptome Data" (Unpublished). It simultaneously performs clustering analysis and signature gene selection on high-dimensional transcriptome data sets. To do so, 'ClussCluster' incorporates a Lasso-type regularization penalty term into the objective function of K-means so that cell-type-specific signature genes can be identified while clustering the cells.
Authors:	Li Jun [cre], Jiang Ge [aut], Wang Chuanqi [ctb]
Maintainer:	Li Jun <[email protected]>
License:	GPL-3
Version:	0.1.0
Built:	2024-11-08 06:27:44 UTC
Source:	CRAN

Performs simultaneous detection of cell types and cell-type-specific signature genes

Description

ClussCluster takes single-cell transcriptome data and returns an object containing cell types and type-specific signature gene sets.

ClussCluster_Gap selects the tuning parameter using a permutation approach. The tuning parameter controls the L1 bound on w, the feature weights.

Usage

ClussCluster(x, nclust = NULL, centers = NULL, ws = NULL,
  nepoch.max = 10, theta = NULL, seed = 1, nstart = 20,
  iter.max = 50, verbose = FALSE)

ClussCluster_Gap(x, nclust = NULL, B = 20, centers = NULL,
  ws = NULL, nepoch.max = 10, theta = NULL, seed = 1,
  nstart = 20, iter.max = 50, verbose = FALSE)

Arguments

`x`	An n x p data matrix with n cells and p genes.
`nclust`	Number of clusters desired if the cluster centers are not provided. If both are provided, nclust must equal the number of cluster `centers`.
`centers`	A set of initial (distinct) cluster centres if the number of clusters (`nclust`) is null. If both are provided, the number of cluster centres must equal `nclust`.
`ws`	One or multiple candidate tuning parameters to be evaluated and compared. Determines the sparsity of the selected genes. Should be greater than 1.
`nepoch.max`	The maximum number of epochs. In one epoch, each cell will be evaluated to determine if its label needs to be updated.
`theta`	Optional argument. If provided, `theta` are used as the initial cluster labels of the ClussCluster algorithm; if not, K-means is performed to produce starting cluster labels.
`seed`	This seed is used wherever K-means is used.
`nstart`	Argument passed to `kmeans`. It is the number of random sets used in `kmeans`.
`iter.max`	Argument passed to `kmeans`. The maximum number of iterations allowed.
`verbose`	Print the updates inside every epoch? If TRUE, the updates of cluster label and the value of objective function will be printed out.
`B`	Number of permutation samples.
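
As a rough illustration of why a larger `ws` tends to select more genes, the sketch below applies the soft-thresholding step familiar from sparse K-means to a vector of per-gene scores. It is not taken from the package: the function name `l1_bounded_weights`, the score vector `a`, and the unit L2-norm constraint on the weights are all assumptions made for this example.

```r
# Soft-thresholding operator: shrink every entry towards zero by d.
soft <- function(a, d) sign(a) * pmax(abs(a) - d, 0)

# Hypothetical helper: weights with unit L2 norm and L1 norm at most s (s > 1).
l1_bounded_weights <- function(a, s, tol = 1e-8) {
  a <- pmax(a, 0)                          # only non-negative scores contribute
  unit <- function(v) v / sqrt(sum(v^2))   # rescale to unit L2 norm
  if (sum(unit(a)) <= s) return(unit(a))   # bound already satisfied, no shrinkage
  lo <- 0
  hi <- max(a)
  while (hi - lo > tol) {                  # bisection on the threshold d
    mid <- (lo + hi) / 2
    if (sum(unit(soft(a, mid))) > s) lo <- mid else hi <- mid
  }
  unit(soft(a, (lo + hi) / 2))
}

a <- c(5, 3, 1, 0.5, 0.2)                  # hypothetical per-gene scores
w <- l1_bounded_weights(a, s = 1.5)
print(round(w, 3))                         # the weakest gene gets weight exactly 0
```

Raising `s` in this sketch lowers the threshold, so fewer weights are shrunk to zero.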

Details

The input should be the normalized and log-transformed number of reads mapped to genes, e.g., log(RPKM+1) or log(TPM+1), where RPKM stands for Reads Per Kilobase of transcript per Million mapped reads and TPM stands for Transcripts Per Million. The data should NOT be centered.
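
A minimal base-R sketch of preparing such an input from raw counts is given below. The count matrix, the gene lengths, and the object names are hypothetical; cells are placed in rows and genes in columns to match the `x` argument.

```r
# Hypothetical raw counts: 3 cells (rows) x 4 genes (columns).
counts <- matrix(c(10,  0,  5, 85,
                    0, 20, 30, 50,
                    4,  4,  4,  8), nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4)))
gene_length_kb <- c(1, 2, 0.5, 4)   # hypothetical transcript lengths in kilobases

# TPM: length-normalize each gene, then scale every cell to sum to one million.
rate <- sweep(counts, 2, gene_length_kb, "/")
tpm  <- rate / rowSums(rate) * 1e6

# Log-transform with a pseudocount; the columns are deliberately not centered.
x <- log(tpm + 1)
print(round(x, 2))
```

Zero counts stay at exactly zero after log(TPM+1), which is what a filter on the proportion of zeros relies on.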

Value

ClussCluster returns a list containing the optimal tuning parameter, s; the group labels of the clustering, theta; and the type-specific weights of genes, w.

ClussCluster_Gap returns a list containing: a vector of candidate tuning parameters, ws; the corresponding values of the objective function, O; a matrix of objective-function values for each permuted data set and tuning parameter, O_b; the gap statistics and their standard deviations, Gap and sd.Gap; the result given by ClussCluster, run; and the tuning parameters with the largest Gap statistic and within one standard deviation of the largest Gap statistic, bestw and onesd.bestw.
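
The sketch below shows one way the gap-related quantities could be related to each other. All numbers are made up, and the definitions are assumptions borrowed from the sparse K-means convention (gap on the log scale, and the smallest candidate within one standard deviation of the maximum); the package may compute them differently.

```r
# Hypothetical inputs: three candidate tuning parameters and B = 4 permutations.
ws  <- c(2, 4, 6)
O   <- c(10, 40, 85)                        # objective on the observed data
O_b <- matrix(c( 8, 20, 40,
                 9, 22, 42,
                10, 18, 44,
                 9, 20, 38), nrow = 4, byrow = TRUE)  # rows: permutations, cols: ws

# Assumed definition: observed minus mean permuted objective, on the log scale.
Gap    <- log(O) - colMeans(log(O_b))
sd.Gap <- apply(log(O_b), 2, sd)

best  <- which.max(Gap)
bestw <- ws[best]
# Assumed rule: smallest candidate whose Gap is within one sd of the largest Gap.
onesd.bestw <- ws[min(which(Gap >= Gap[best] - sd.Gap[best]))]

print(c(bestw = bestw, onesd.bestw = onesd.bestw))
```

With these made-up numbers the two choices differ, and the one-standard-deviation choice is the sparser one.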

Examples

data(Hou_sim)
hou.dat <-Hou_sim$x
run.ft <- filter_gene(hou.dat)
hou.test <- ClussCluster(run.ft$dat.ft, nclust=3, ws=4, verbose = FALSE)

Gene Filter

Description

Filters out genes that are not suitable for differential expression analysis.

Usage

filter_gene(dfname, minmean = 2, n0prop = 0.2, minsd = 1)

Arguments

`dfname`	name of the expression data frame
`minmean`	minimum mean expression for each gene
`n0prop`	minimum proportion of zero expression (count) for each gene
`minsd`	minimum standard deviation of expression for each gene

Details

Takes an expression data frame that has been properly normalized but NOT centered. It returns a list whose slot dat.ft is the data set restricted to the genes that satisfy the pre-set thresholds on minimum mean (minmean), standard deviation (minsd), and proportion of zeros (n0prop).

If the data has already been centered, one can still apply the filters of mean and sd but not n0prop.
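
The following base-R sketch illustrates why the zero-proportion filter needs uncentered data. The toy matrix is hypothetical, genes are assumed to be in columns, and column-centering is used only as one example of centering; the sketch does not reproduce the internals of filter_gene.

```r
# Hypothetical uncentered data: 6 cells (rows) x 3 genes (columns).
dat <- matrix(c(0, 0, 3, 5, 0, 4,
                2, 2, 4, 0, 6, 4,
                0, 0, 0, 0, 0, 6), nrow = 6)

# Per-gene summaries that the three thresholds refer to.
gene_mean <- colMeans(dat)
gene_sd   <- apply(dat, 2, sd)
prop_zero <- colMeans(dat == 0)

# After centering each gene, former zeros are no longer zero,
# so the proportion of zeros can no longer be read off the data.
centered       <- scale(dat, center = TRUE, scale = FALSE)
prop_zero_cent <- colMeans(centered == 0)
sd_cent        <- apply(centered, 2, sd)   # the spread is unaffected by centering

print(rbind(prop_zero, prop_zero_cent))
```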

Value

a list containing the data set with genes satisfying the thresholds, dat.ft, the name of dat.ft, and the indices of those kept genes, index.

Examples

dat <- matrix(rnbinom(300*60, mu = 2, size = 1), 300, 60)
dat_filtered <- filter_gene(dat, minmean=2, n0prop=0.2, minsd=1)

A truncated subset of the scRNA-seq expression data set from Hou et al. (2016)

Description

This data contains expression levels (normalized and log-transformed) for 33 cells and 100 genes.

Usage

data(Hou_sim)

Format

An object containing the following variables:

x: An expression data frame of 33 HCC cells on 100 genes.
y: Numerical group indicator of all cells.
gnames: Gene names of all genes.
snames: Cell names of all cells.
groups: Cell group names.
note: A simple note of the data set.

Details

This data set contains raw expression levels (log-transformed but not centered) for 33 HCC cells and 100 genes. The 33 cells belong to three different subpopulations and exhibit different biological characteristics. For a description of how this data set was generated, please refer to the paper.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65364

References

Hou, Yu, et al. "Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas." Cell research 26.3 (2016): 304-319.

Examples

data(Hou_sim)
data <- Hou_sim$x

Plots the results of `ClussCluster`

Description

Plots the number of signature genes against the tuning parameters if multiple tuning parameters are evaluated in the object. If only one is included, plot_ClussCluster returns a Venn diagram and a heatmap at that tuning parameter.

Usage

plot_ClussCluster(object, m = 10, snames = NULL, gnames = NULL, ...)

top.m.hm(object, m, snames = NULL, gnames = NULL, ...)

Arguments

`object`	An object that is obtained by applying the ClussCluster function to the data set.
`m`	The number of top signature genes selected to produce the heatmap.
`snames`	The names of the cells.
`gnames`	The names of the genes.
`...`	Additional parameters, sent to the method.

Details

If multiple tuning parameters are evaluated in the object, the number of signature genes is computed for each cluster and is plotted against the tuning parameters. Each color and line type corresponds to a cell type.

If only one tuning parameter is evaluated, two plots are produced. One is a Venn diagram of the cell-type-specific genes; the other is a heatmap of the data restricted to the cells and the top m signature genes. See the paper for more details.
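
To make the plotted quantities concrete, the sketch below counts signature genes per cluster and picks the top m genes from a made-up weight matrix. The layout of `w` (genes in rows, clusters in columns) and the reading of "signature gene" as "gene with nonzero weight" are assumptions for illustration, not the documented structure of the ClussCluster output.

```r
# Hypothetical weights: 6 genes (rows) x 3 clusters (columns); 0 means "not selected".
w <- matrix(c(0.9, 0.0, 0.0,
              0.4, 0.0, 0.1,
              0.0, 0.8, 0.0,
              0.0, 0.5, 0.0,
              0.0, 0.0, 0.7,
              0.2, 0.0, 0.0), ncol = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:6), paste0("cluster", 1:3)))

# Number of signature genes per cluster (the quantity plotted against the tuning parameter).
n_signature <- colSums(w != 0)

# Top m genes per cluster by weight, ignoring genes with zero weight.
m <- 2
top_m <- lapply(seq_len(ncol(w)), function(k) {
  ord <- order(w[, k], decreasing = TRUE)
  ord <- ord[w[ord, k] != 0]
  rownames(w)[head(ord, m)]
})
names(top_m) <- colnames(w)

print(n_signature)
print(top_m)
```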

Value

a ggplot2 object of the heatmap with top signature genes selected by ClussCluster

Examples

data(Hou_sim)
run.cc <- ClussCluster(Hou_sim$x, nclust = 3, ws = c(2.4, 5, 8.8))
plot_ClussCluster(run.cc, m = 5, snames=Hou_sim$snames, gnames=Hou_sim$gnames)

Plots the results of `ClussCluster_Gap`

Description

Plots the gap statistics and number of genes selected as the tuning parameter varies.

Usage

plot_ClussCluster_Gap(object)

Arguments

`object`	An object obtained from ClussCluster_Gap().

Prints out the results of `ClussCluster`

Description

Prints out the results of ClussCluster

Usage

print_ClussCluster(object)

Arguments

`object`	An object that is obtained by applying the ClussCluster function to the data set.

Prints out the results of `ClussCluster_Gap`

Description

Prints out the results of ClussCluster_Gap: the gap statistics and the number of genes selected for each candidate tuning parameter.

Usage

print_ClussCluster_Gap(object)

Arguments

`object`	An object that is obtained by applying the ClussCluster_Gap function to the data set.

A simulated expression data set.

Description

An example data set containing expression levels for 60 cells and 200 genes. The 60 cells belong to 4 cell types with 15 cells each. Each cell type is uniquely associated with 30 signature genes: the first cell type is associated with the first 30 genes, the second cell type with the next 30 genes, and so on. The remaining 80 genes show no distinct expression pattern among the four cell types and are considered noise genes.
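
A toy data set with the same layout can be sketched in base R as follows. The Gaussian noise, the shift of 2 for signature genes, and the object names are assumptions chosen for illustration; they do not describe how sim_dat itself was generated.

```r
set.seed(1)
n_types        <- 4
cells_per_type <- 15
genes_per_type <- 30
n_noise        <- 80

n <- n_types * cells_per_type              # 60 cells
p <- n_types * genes_per_type + n_noise    # 200 genes
type <- rep(seq_len(n_types), each = cells_per_type)

# Start from pure noise, then raise each type's own block of 30 signature genes.
toy <- matrix(rnorm(n * p), nrow = n, ncol = p)
for (k in seq_len(n_types)) {
  sig <- ((k - 1) * genes_per_type + 1):(k * genes_per_type)
  toy[type == k, sig] <- toy[type == k, sig] + 2   # assumed effect size
}

print(dim(toy))
```

The last 80 columns are left untouched, so they play the role of the noise genes.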

Usage

data(sim_dat)

Format

A data frame with 60 cells on 200 genes.

Value

A simulated dataset used to demonstrate the application of ClussCluster.

Examples

data(sim_dat)
head(sim_dat)
