Title: The Iterative Signature Algorithm
Description: The ISA is a biclustering algorithm that finds modules in an input matrix. A module or bicluster is a block of the reordered input matrix.
Authors: Gabor Csardi <[email protected]>
Maintainer: Gabor Csardi <[email protected]>
License: CC BY-NC-SA 4.0
Version: 0.3.6
Built: 2024-10-28 06:50:48 UTC
Source: CRAN
The Iterative Signature Algorithm
The Iterative Signature Algorithm (ISA) is a biclustering algorithm. Biclustering algorithms classify simultaneously the rows and columns of an input matrix into biclusters, or as we will call them here, modules.
The easiest way to run ISA is to call the isa function with your input matrix as the single argument. This performs all steps of a typical ISA work flow, with the default parameters.
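For example, a minimal run looks like this (a small sketch; isa.in.silico is only used here to create illustrative test data):

data <- isa.in.silico()        # artificial test data; the first element is the matrix
modules <- isa(data[[1]])      # run the whole default ISA work flow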
An ISA module is a pair: a subset of the rows of the input matrix and a subset of its columns. In other words, a bicluster is a block of the reordered input matrix, where reordering means a permutation of both the rows and the columns. (Another bicluster might be a block of the same permuted input matrix, or of one obtained after a different permutation.)
A good bicluster satisfies two criteria: 1) its rows are significantly different from the other rows when we consider only the positions defined by the columns of the same bicluster, and (symmetrically) 2) its columns are significantly different from the other columns when we consider only the positions defined by the rows of the same bicluster.
In other words, the rows of the bicluster are correlated, but only on the columns defined by the same bicluster; and conversely, the columns of the bicluster are correlated, but only on the rows defined by the same bicluster.
ISA biclusters are soft: two biclusters may overlap in their rows, their columns, or both. It is also possible that some rows and/or columns of the input matrix are not found to be part of any ISA bicluster. Depending on the stringency parameters, it might even happen that ISA does not find any biclusters.
ISA biclusters are not only soft, every row and column in a given bicluster also has a score, a number between minus one and one. The further this number is from zero, the stronger the association of the given row or column with the bicluster.
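For example, the members of a module can be read off the score matrices by taking the non-zero entries (a sketch, assuming ‘modules’ is the result of an isa() run as above):

## rows and columns belonging to the first module
mod1.rows <- which(modules$rows[, 1] != 0)
mod1.cols <- which(modules$columns[, 1] != 0)
## the sign of the score shows whether two rows behave the same way
## or the opposite way within the module
sign(modules$rows[mod1.rows, 1])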
ISA works in an iterative way. For an input matrix E (with m rows and n columns) it starts from a seed vector r0, which is typically a sparse 0/1 vector of length m. This defines a set of rows in E. Then the transpose of E is multiplied by r0 and the result is thresholded. (Please see also ‘Normalization’ below.)
The thresholding is an important step of the ISA; without thresholding ISA would be equivalent to a (not too effective) numerical singular value decomposition (SVD). Currently thresholding is done by calculating the mean and standard deviation of the vector and keeping only the elements that are further than a given number of standard deviations from the mean. Based on the direction parameter, this means 1) keeping values that are significantly higher than the mean (direction="up"), 2) significantly lower (direction="down"), or 3) both (direction="updown").
The thresholded vector c0 is the (column) ‘signature’ of r0. Then the (row) signature of c0 is calculated: E is multiplied by c0 and the result is thresholded to get r1. This iteration is performed until it converges, i.e. the consecutive row vectors r(i-1) and r(i) are “close”, and the consecutive column vectors c(i-1) and c(i) are also close. The convergence criterion, i.e. what “close” means, is by default defined by high Pearson correlation.
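The following sketch illustrates a single ISA step conceptually; it is only an illustration, the real iteration is implemented in isa.iterate and also handles NA values, oscillating seeds and the convergence checks described above:

## simple thresholding: keep entries further than 'thr' standard
## deviations from the mean, zero out the rest
threshold <- function(x, thr, direction = "updown") {
  m <- mean(x); s <- sd(x)
  keep <- switch(direction,
    up     = x > m + thr * s,          # significantly higher than the mean
    down   = x < m - thr * s,          # significantly lower
    updown = abs(x - m) > thr * s)     # either direction
  ifelse(keep, x, 0)
}
E  <- isa.in.silico()[[1]]
nm <- isa.normalize(E)                           # nm[[1]] is transposed (Er), nm[[2]] is Ec
r0 <- drop(generate.seeds(length = nrow(E), count = 1))
c0 <- threshold(drop(nm[[1]] %*% r0), thr = 1)   # column signature of r0
r1 <- threshold(drop(nm[[2]] %*% c0), thr = 1)   # row signature, the next row vector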
It is quite possible that the ISA finds the same module more than once; two or more seeds might converge to the same module. The function isa.unique eliminates from the result of isa.iterate every module that is very similar (in terms of Pearson correlation) to one that was already found before it.
The two main parameters of ISA are the two thresholds (one for the rows and one for the columns). They basically define the stringency of the modules. If the row threshold is high, then the modules will have very similar rows. If it is mild, then modules will be bigger, with less similar rows than in the first case.
By default (i.e. if the isa function is used) the ISA is performed from random sparse starting seeds, generated by generate.seeds. This way the algorithm is completely unsupervised, but also stochastic: it might give different results for different runs.
It is possible to use non-random seeds as well. If you have some knowledge about the data, or you are interested in a particular subset of rows/columns, then you can feed your seeds into the isa.iterate function directly. In this case the algorithm is deterministic: for the same seed you will always get the same results.
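For example, a hand-made seed can be used like this (a sketch; the row indices marking the seed are arbitrary):

data <- isa.in.silico()[[1]]
nm   <- isa.normalize(data)
seed <- matrix(0, nrow = nrow(data), ncol = 1)   # one 0/1 column vector
seed[c(1, 5, 10), 1] <- 1                        # the rows we are interested in
res  <- isa.iterate(nm, row.seeds = seed, thr.row = 2, thr.col = 1)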
On in silico data we observed that ISA has the best performance if the input matrix is normalized (see isa.normalize). The normalization produces two matrices, Er and Ec. Er is calculated by transposing the input matrix E and centering and scaling its rows (see scale). Ec is calculated by centering and scaling the rows of E. Er is used to calculate the column signature of the rows and Ec is used to calculate the signature of the columns.
It is possible to use another normalization; then the user is requested to supply the normalized input data in a named list, including the two matrices of appropriate dimensions. ‘Er’ will be used for calculating the signature of the rows, ‘Ec’ for the signature of the columns. If you want to use the same matrix in both steps, then supply it twice, the first one transposed.
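Conceptually, the default normalization corresponds to the following sketch (isa.normalize itself also handles NA values and the prenormalize option, so this is only an approximation of what it does):

E  <- isa.in.silico()[[1]]
Er <- t(scale(E))        # t(E) with its rows centered and scaled
Ec <- t(scale(t(E)))     # E with its rows centered and scaled
nm <- isa.normalize(E)   # returns the two matrices in this order: Er, then Ec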
As ISA is an unsupervised algorithm, it may very well find some modules even if you feed in noise as the input matrix. To avoid such spurious modules we defined a robustness measure, a single number for each module that quantifies how well its rows and columns are correlated.
It is recommended to use isa.filter.robust to run ISA on the scrambled input matrix with the same threshold parameters, and then drop every module whose robustness score is lower than the highest robustness score among the modules found in the scrambled data.
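In practice this amounts to a work flow like the following sketch (it mirrors the examples given later in this manual):

data  <- isa.in.silico()[[1]]
nm    <- isa.normalize(data)
seeds <- generate.seeds(length = nrow(data), count = 100)
mods  <- isa.iterate(nm, row.seeds = seeds, thr.row = 2, thr.col = 1)
mods  <- isa.unique(nm, mods)
## isa.filter.robust() reruns ISA on a scrambled copy of 'data' and
## drops the modules that are not more robust than anything found there
mods  <- isa.filter.robust(data, nm, mods)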
Please see the manual page and the source code of isa for a typical ISA work flow. (You can obtain the source code by typing ‘isa’ (without the apostrophes) into your R prompt and pressing ENTER.)
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
The vignette in the package and isa
for running ISA.
Generate random input seeds for the ISA.
generate.seeds (length, count = 100, method = c("uni"), sparsity=2)
length |
The length of the seeds, should be the number of rows in your input data for row seeds and the number of columns for column seeds. |
count |
The number of seeds to generate. |
method |
The method for generating the seeds. Currently only ‘uni’ is implemented; it places the non-zero entries of each seed uniformly at random. |
sparsity |
A numeric scalar, an integer number giving the number of non-zero values in each seed vector. It will be recycled to have the same length as the number of seeds. |
This function can generate a 0/1 matrix whose columns are the seeds of the ISA. The result can be used as the row.seeds (or col.seeds) argument of the isa.iterate function.
A numeric matrix with 0/1 values.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## Just to get always the same result
set.seed(24)
## Create some random seeds with different sparseness
data <- isa.in.silico()
sparsity <- rep(c(1, 5, 25, 125), length=100)
row.seeds <- generate.seeds(length=nrow(data[[1]]), count=100,
                            sparsity=sparsity)
## Do ISA with the seeds
normed.data <- isa.normalize(data[[1]])
isaresult <- isa.iterate(normed.data, thr.row=1, thr.col=1,
                         row.seeds=row.seeds)
## Add the sparsity to the seed data
isaresult$seeddata$sparsity <- sparsity
## Check which ones lead to higher robustness scores
rob <- robustness(normed.data, isaresult$rows, isaresult$columns)
tapply(rob, sparsity, mean)   ## About the same
## How many unique modules did we find for each sparsity value?
isaresult.unique <- isa.unique(normed.data, isaresult)
tapply(seq_len(ncol(isaresult.unique$rows)),
       isaresult.unique$seeddata$sparsity, length)
## We usually find more modules with sparser seeds
Run ISA with the default parameters
## S4 method for signature 'matrix' isa(data, ...)
data |
The input. It must be a numeric matrix. It may contain NA and/or NaN values; see details below. |
... |
Additional arguments, see details below. |
Please read the isa2-package manual page for an introduction to ISA.
This function can be called as
isa(data, thr.row=seq(1,3,by=0.5), thr.col=seq(1,3,by=0.5), no.seeds=100, direction=c("updown", "updown"))
where the arguments are:
The input. It must be a numeric matrix. It may contain
NA
and/or NaN
values, but then the algorithm might be a
bit slower, as R matrix multiplication might be slower for these
matrices, depending on your platform.
Numeric vector.
The row threshold parameters for which the ISA will be
run. We use all possible combinations of thr.row
and
thr.col
.
Numeric vector.
The column threshold parameters for which the ISA will be run. We
use all possible combinations of thr.row
and thr.col
.
Integer scalar, the number of seeds to use.
Character vector of length two, one for the rows, one
for the columns. It specifies whether we are interested in
rows/columns that are higher (‘up
’) than average,
lower than average (‘down
’), or both
(‘updown
’).
The isa
function provides an easy to use interface to the
ISA. It runs all steps of a typical ISA work flow with their default
parameters.
This involves:
Normalizing the data by calling isa.normalize
.
Generating random input seeds via
generate.seeds
.
Running ISA with all combinations of given row and column
thresholds, (by default 1, 1.5, 2, 2.5, 3); by calling
isa.iterate
.
Merging similar modules, separately for each threshold
combination, by calling isa.unique
.
Filtering the modules separately for each threshold combination,
by calling isa.filter.robust
.
Putting all modules from the runs with different thresholds into a single object.
Merging similar modules across all threshold combinations; if two modules are similar, then the larger one, the one found with the milder thresholds, is kept.
Please see the manual pages of these functions for the details or if you want to change their default parameters.
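For example, to run the work flow with a single, stricter threshold combination and only up-regulated rows and columns, one could call (a sketch with arbitrary parameter values):

data    <- isa.in.silico()[[1]]
modules <- isa(data, thr.row = 2.5, thr.col = 2.5,
               no.seeds = 200, direction = c("up", "up"))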
A named list is returned with the following elements:
rows |
The row components in the biclusters, a numeric matrix. Every column in it corresponds to a bicluster, if an element (the score of the row) is non-zero, that means that the row is included in the bicluster, otherwise it is not. Scores are between -1 and 1. If the scores of two rows have the same (nonzero) sign, that means that the two corresponding rows “behave” the same way. If they have opposite sign, that means that they behave the opposite way. If the corresponding seed has not converged during the allowed
number of iterations, then that column of |
columns |
The column components of the biclusters, in the same format as the rows. If the corresponding seed has not converged during the allowed
number of iterations, then that column of |
seeddata |
A data frame containing information about the biclusters. There is one row for each bicluster. The data frame has the following columns:
|
rundata |
A named list with information about the ISA runs. It has the following entries:
|
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative Signature Algorithm. See the functions mentioned above if you want to change the default ISA parameters.
## Not run: ## We generate some noisy in-silico data with modules and try to find ## them with the ISA. This might take one or two minutes. data <- isa.in.silico(noise=0.1) isa.result <- isa(data[[1]]) ## Find the best bicluster for each block in the input best <- apply(cor(isa.result$rows, data[[2]]), 2, which.max) ## Check correlation sapply(seq_along(best), function(x) cor(isa.result$rows[,best[x]], data[[2]][,x])) ## The same for the columns sapply(seq_along(best), function(x) cor(isa.result$columns[,best[x]], data[[3]][,x])) ## Plot the data and the modules found if (interactive()) { layout(rbind(1:2,3:4)) image(data[[1]], main="In-silico data") sapply(best, function(b) image(outer(isa.result$rows[,b], isa.result$columns[,b]), main=paste("Module", b))) } ## End(Not run)
This function converts the object with ISA modules to a Biclust
object, so all the functions in the biclust
package can be used
on it.
isa.biclust(modules)
modules |
The ISA modules, as returned by the isa function or other such functions. |
biclust is an R package that implements many biclustering algorithms in a unified framework. This function converts a set of ISA biclusters to a Biclust object, the class used by the biclust package to store biclustering results.
The Biclust
class only supports binary biclusters, so the ISA
modules are binarized during the conversion.
A Biclust
object.
Gabor Csardi [email protected]
## You need the biclust package for this ## Not run: if (require(biclust)) { set.seed(1) data <- isa.in.silico() modules <- isa(data[[1]]) bc <- isa.biclust(modules) ## A heatmap drawHeatmap(data[[1]], bc, 1) ## A "bubble" plot bubbleplot(data[[1]], bc) ## Compare values inside and outside the bicluster plotclust(bc, data[[1]]) ## Plot profiles of bicluster elements parallelCoordinates(data[[1]], bc, number=1) ## Coherence measures vs. ISA robustness cV <- sapply(seq(bc@Number), function(x) constantVariance(data[[1]], bc, x, dimension="both")) aV <- sapply(seq(bc@Number), function(x) additiveVariance(data[[1]], bc, x, dimension="both")) mV <- sapply(seq(bc@Number), function(x) multiplicativeVariance(data[[1]], bc, x, dimension="both")) sV <- sapply(seq(bc@Number), function(x) signVariance(data[[1]], bc, x, dimension="both")) rob <- robustness(isa.normalize(data[[1]]), modules$rows, modules$columns) cor( cbind(cV, aV, mV, sV, rob) ) } ## End(Not run)
This function generates a test data set for ISA, containing modules of prescribed number, size, signal level, internal noise and background noise.
isa.in.silico (num.rows = 300, num.cols = 50, num.fact = 3, mod.row.size = round(0.5 * num.rows/num.fact), mod.col.size = round(0.5 * num.cols/num.fact), noise = 0.1, mod.signal = rep(1, num.fact), mod.noise = rep(0, num.fact), overlap.row = 0, overlap.col = overlap.row)
num.rows |
The number of rows in the data matrix. |
num.cols |
The number of columns in the data matrix. |
num.fact |
The number of modules to put into the data. |
mod.row.size |
The size of the modules, the number of rows per module. It can be a scalar or a vector and it is recycled. |
mod.col.size |
The size of the modules, the number of columns per module. It can be a scalar or a vector and it is recycled. |
noise |
The level of the background noise to be added to the data matrix. It gives the standard deviation of the normal distribution from which the noise is generated. |
mod.signal |
The signal level of the modules. |
mod.noise |
The noise levels of the different modules. Normally distributed noise with standard deviation mod.noise is added to the elements of each module. |
overlap.row |
The overlap of the modules, for the rows. Zero means no overlap, one means one overlapping row, etc. |
overlap.col |
The overlap of the modules, for the columns. Zero means no overlap, one means one overlapping column, etc. |
isa.in.silico creates an artificial data set to test the ISA or any other biclustering algorithm. It creates a data matrix with a checkerboard structure; in other words, potentially overlapping blocks are planted into a noisy background matrix.
These blocks may have different signal and noise levels and they might also overlap. See the parameters above.
A list with three matrices. The first matrix is the in silico data, the second contains the rows of the correct modules, the third the columns.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## Define a function for plotting if we are interactive if (interactive()) { layout( rbind(1:2,3:4) ) } myimage <- function(mat) { if (interactive()) { par(mar=c(1,2,2,1)); image(mat[[1]]) } } ## Create a simple checkerboard without overlap and noise silico1 <- isa.in.silico(100, 100, 10, mod.row.size=10, mod.col.size=10, noise=0) myimage(silico1) ## The same, but with some overlap and noise silico2 <- isa.in.silico(100, 100, 10, mod.row.size=10, mod.col.size=10, noise=0.1, overlap.row=3) myimage(silico2) ## Modules with different noise levels silico3 <- isa.in.silico(100, 100, 5, mod.row.size=10, mod.col.size=10, noise=0.01, mod.noise=seq(0.1,by=0.1,length=5)) myimage(silico3) ## Modules with different signal levels silico4 <- isa.in.silico(100, 100, 5, mod.row.size=10, mod.col.size=10, noise=0.01, mod.signal=seq(1,5,length=5)) myimage(silico4)
Perform ISA on the (normalized) input matrix.
## S4 method for signature 'list' isa.iterate(normed.data, ...)
normed.data |
The normalized data. A list of two matrices, usually coming from isa.normalize. |
... |
Additional arguments, see details below. |
isa.iterate
performs the ISA iteration on the specified input
seeds. It can be called as
isa.iterate(normed.data, row.seeds, col.seeds, thr.row, thr.col = thr.row, direction = c("updown", "updown"), convergence = c("corx", "cor", "eps"), cor.limit = 0.99, eps = 1e-04, corx=3, oscillation = FALSE, maxiter = 100)
where the arguments are:
The normalized data. A list of two matrices,
usually coming from isa.normalize
.
The row seed vectors to start the ISA runs
from. Every column is a seed vector. (If this argument and
col.seeds
are both present, then both of them are used.)
The column seed vectors to start the ISA runs from,
every column is a seed vector. (If this argument and
row.seeds
are both present, then both of them are used.)
Numeric scalar or vector giving the threshold parameter for the rows. Higher values indicate a more stringent threshold and the result biclusters will contain less rows on average. The threshold is measured by the number of standard deviations from the mean, over the values of the row vector. If it is a vector then it must contain an entry for each seed.
Numeric scalar or vector giving the threshold parameter
for the columns. The analogue of thr.row
.
Character vector of length two, one for the rows, one
for the columns. It specifies whether we are interested in
rows/columns that are higher (‘up
’) than average,
lower than average (‘down
’), or both
(‘updown
’).
Character scalar, the convergence criteria for the
ISA iteration. If it is ‘cor
’, then convergence is
measured based on high Pearson correlation (see the cor.limit
argument below) of the subsequent row and
column vectors. If it is ‘eps
’, then all entries of
the subsequent row and column vectors must be close to each other,
see the eps
argument below.
‘corx
’ is similar to ‘cor
’, but the
current row/column vectors are compared to the ones corx
steps ago, instead of the ones in the previous step. See the
corx
argument below, that gives the size of the shift.
The correlation limit for convergence if the
‘cor
’ method is used.
Limit for convergence if the ‘eps
’ method is
used.
The number of iterations to use as a shift, for checking
convergence with the ‘corx
’ method.
Logical scalar, whether to look for oscillating
seeds. Usually there are not too many oscillating seeds, so it is
safe to leave this on FALSE
.
The maximum number of iterations allowed.
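For instance, a call using the ‘eps’ convergence criterion and a larger iteration limit might look like this (a sketch; the parameter values are arbitrary):

data  <- isa.in.silico()[[1]]
nm    <- isa.normalize(data)
seeds <- generate.seeds(length = nrow(data), count = 50)
res   <- isa.iterate(nm, row.seeds = seeds, thr.row = 1.5, thr.col = 1.5,
                     convergence = "eps", eps = 1e-4, maxiter = 200)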
A named list with many components. Please see the manual page of
isa
for a complete description.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## A basic ISA work flow for a single threshold combination
## In-silico data
set.seed(1)
insili <- isa.in.silico()
## Random seeds
seeds <- generate.seeds(length=nrow(insili[[1]]), count=100)
## Normalize input matrix
nm <- isa.normalize(insili[[1]])
## Do ISA
isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1)
## Eliminate duplicates
isares <- isa.unique(nm, isares)
## Filter out the non-robust ones
isares <- isa.filter.robust(insili[[1]], nm, isares)
## Print the sizes of the modules
cbind(colSums(isares$rows != 0), colSums(isares$columns != 0))
## Plot the original data and the modules found
if (interactive()) {
  layout(rbind(1:2))
  image(insili[[1]], main="In silico data")
  image(outer(isares$rows[,1], isares$columns[,1]) +
        outer(isares$rows[,2], isares$columns[,2]) +
        outer(isares$rows[,3], isares$columns[,3]),
        main="ISA modules")
}
Normalize a matrix and create a form that can be effectively used for ISA runs.
## S4 method for signature 'matrix' isa.normalize(data, ...)
data |
A numeric matrix, the input data. It may contain NA and/or NaN values. |
... |
Additional arguments, see details below. |
This function can be called as
isa.normalize(data, prenormalize = FALSE)
where the arguments are:
A numeric matrix, the input data. It may contain NA and/or NaN values.
Logical scalar, see details below.
It was observed that the ISA works better if the input matrix is scaled and its rows have mean zero and standard deviation one.
An ISA step consists of two sub-steps, and this implies two different normalizations, in the first the rows, in the second the columns of the input matrix will be scaled.
If the prenormalize
argument is set to TRUE
, then
row-wise scaling is calculated on the column-wise scaled matrix and
not on the input matrix directly.
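A short sketch comparing the two settings:

data <- isa.in.silico()[[1]]
nm1  <- isa.normalize(data)                       # default normalization
nm2  <- isa.normalize(data, prenormalize = TRUE)  # row scaling on the column-scaled matrix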
A list of two normalized matrices, the first one is transposed.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## In-silico data set.seed(1) insili <- isa.in.silico() nm <- isa.normalize(insili[[1]]) ## Check it max(abs(rowSums(nm[[1]]))) max(abs(rowSums(nm[[2]]))) max(abs(apply(t(nm[[1]]), 2, sd) - 1)) max(abs(apply(t(nm[[2]]), 2, sd) - 1)) ## Plot them if (interactive()) { layout(rbind(1:2,3:4)) image(insili[[1]], main="Original data") image(t(nm[[1]]), main="Row normalized") image(nm[[2]], main="Column normalized") }
This function can be used to set various options that affect many functions in the isa package.
isa.option(...)
... |
A single option query, or option assignments, these must be named, too. See details below. |
The isa.option
function can be used in three forms. First,
calling it without any arguments returns a named list of the current
values of all isa options.
Second, calling it with a character scalar as the single argument, it returns the value of the specified option.
Third, calling it with a named argument (or more named arguments) set the specified options to the given values.
Here is a list of all the currently supported options:
verbose
Logical scalar. Whether to report what the isa
functions are currently doing. Defaults to FALSE
.
status.function
A function object, it serves as a callback for printing status messages.
In the first form, isa.option
returns a named list with the
current values of all options.
In the second form, it returns the value of the specified option.
In the third form, it returns a named list with the current values of all options, invisibly.
Gabor Csardi [email protected]
## Make isa functions verbose isa.option(verbose=TRUE) ## Query the value of 'verbose' isa.option("verbose") ## Query all options isa.option()
Relate the biclusters found in many ISA runs on the same input data.
## S4 method for signature 'matrix' isa.sweep(data, ...) ## S4 method for signature 'list' sweep.graph(sweep.result, ...)
data |
The input matrix. |
... |
Additional arguments, see details
below. |
sweep.result |
An ISA result with hierarchy information in the seed data, typically calculated by the isa.sweep function. |
isa.sweep
can be called as
isa.sweep(data, normed.data, isaresult, method = c("cor"), neg.cor = TRUE, cor.limit = 0.9)
where the arguments are:
The input matrix.
The normalized input matrix, usually the output of
the isa.normalize
function.
An object containing the biclusters, the result of
isa
or isa.iterate
.
Character scalar giving the method to determine which seed converged to which bicluster. Right now only ‘cor’ is supported; it is based on Pearson correlation.
Logical scalar, whether to consider negative correlation as convergence.
Numeric scalar giving the minimum correlation for convergence.
Many ISA runs with different thresholds typically create a bunch of biclusters and it is useful to visualize how these are related.
From a set of biclusters for which one of the thr.row and thr.col parameters was the same, but the other was not, isa.sweep creates a hierarchy of modules.
The hierarchy is a directed graph of modules in which every node has an out-degree of at most one. An edge pointing from module m1 to module m2 means that module m1 is “part of” module m2, in the sense that an ISA iteration started from module m1 converges to module m2 at the (milder) thresholds of module m2.
The information about the module relationships is stored in a column of the seed data.
sweep.graph takes the output of isa.sweep and creates a graph object from it. For this the ‘igraph’ package needs to be installed on the system.
isa.sweep returns a named list with the same components as the input (isaresult), but the ‘father’ and ‘level’ columns are added to the ‘seeddata’ member. father contains the edges of the sweep graph: if bicluster m1 is the father of bicluster m2, that means that bicluster m2 converges to bicluster m1 at the threshold parameters that were used to find bicluster m1.
level
is a simple numbering of the different thresholds for
which the sweep tree was built. I.e. the most strict threshold is
level one, the second most is level two, etc.
sweep.graph returns an igraph graph with a number of attributes:
1 |
The |
2 |
The |
3 |
The |
4 |
The |
5 |
The |
6 |
The |
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## In-silico data set.seed(1) insili <- isa.in.silico() ## Do ISA with a bunch of row thresholds while keeping the column ## threshold fixed. This is quite an artificial example... isares <- isa(insili[[1]], thr.row=c(0.5,1,2), thr.col=0) ## Create a nice tree from the modules, we need the normed data for this nm <- isa.normalize(insili[[1]]) isa.tree <- isa.sweep(insili[[1]], nm, isares) network <- sweep.graph(isa.tree) ## Plot the network of modules, only if the igraph package is ## installed if (interactive() && require(igraph) && compareVersion(packageDescription("igraph")$Version, "0.6")>=0) { lab <- paste(sep="", seq_len(ncol(isa.tree$rows)), ": ", colSums(isa.tree$rows!=0), ",", colSums(isa.tree$columns!=0)) par(mar=c(1,1,1,1)) roots <- tapply(topological.sort(network, mode="out"), clusters(network)$membership, function(x) x[1]) rootlevel <- isa.tree$seeddata$level-1 coords <- layout.reingold.tilford(network, root=roots, rootlevel=rootlevel[roots+1]) plot(network, layout=coords, vertex.shape="rectangle", vertex.color="green", vertex.label=lab, vertex.size=30, vertex.size2=10) } ## Plot the modules themselves as well if (interactive()) { plotModules(isa.tree) } ## Yet another plot, the scores for the rows within the modules if (interactive()) { layout(matrix( 1:15, ncol=3 )) for (i in seq(ncol(isa.tree$rows))) { par(mar=c(2,2,1,1)) plot(isa.tree$rows[,i], axes=FALSE, ylim=c(-1,1)) axis(1); axis(2) text(nrow(isa.tree$rows), 1, adj=c(1,1), paste(sep="", "#", i), cex=2) } }
From a potentially non-unique set of ISA biclusters, create a unique set by removing all biclusters that are similar to others.
## S4 method for signature 'list,list' isa.unique(normed.data, isaresult, ...)
normed.data |
The normalized input data, a list of two matrices, usually the output of isa.normalize. |
isaresult |
The result of an ISA run, a set of biclusters. |
... |
Additional arguments, see details below. |
This function can be called as
isa.unique(normed.data, isaresult, method = c("cor"), ignore.div = TRUE, cor.limit = 0.9, neg.cor = TRUE, drop.zero = TRUE)
where the arguments are:
The normalized input data, a list of two matrices,
usually the output of isa.normalize
.
The result of an ISA run, a set of biclusters.
Character scalar giving the method to be used to
determine if two biclusters are similar. Right now only
‘cor
’ is implemented, this keeps both biclusters if
their Pearson correlation is less than cor.limit
, both for
their row and column scores. See also the neg.cor
argument.
Logical scalar, if TRUE
, then the divergent
biclusters will be removed.
Numeric scalar, giving the correlation limit for the
‘cor
’ method.
Logical scalar, if TRUE
, then the
‘cor
’ method considers the absolute value of the
correlation.
Logical scalar, whether to drop biclusters that have all zero scores.
Because of the nature of the ISA algorithm, the set of biclusters
created by isa.iterate
is not unique; many input seeds
may converge to the same biclusters, even if the input seeds are not
random.
isa.unique filters a set of biclusters and removes the ones that are very similar to biclusters that were already found for another seed.
A named list, the filtered isaresult
. See the return value of
isa.iterate
for the details.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
isa2-package for a short introduction on the Iterative
Signature Algorithm. See isa
for an easy way of running
ISA.
## Create an ISA module set set.seed(1) insili <- isa.in.silico(noise=0.01) ## Random seeds seeds <- generate.seeds(length=nrow(insili[[1]]), count=20) ## Normalize input matrix nm <- isa.normalize(insili[[1]]) ## Do ISA isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1) ## Check correlation among modules cc <- cor(isares$rows) if (interactive()) { hist(cc[lower.tri(cc)],10) } ## Some of them are quite high, how many? undiag <- function(x) { diag(x) <- 0; x } sum(undiag(cc) > 0.99, na.rm=TRUE) ## Eliminate duplicated modules isares.unique <- isa.unique(nm, isares) ## How many modules left? ncol(isares.unique$rows) ## Double check cc2 <- cor(isares.unique$rows) if (interactive()) { hist(cc2[lower.tri(cc2)],10) } ## High correlation? sum(undiag(cc2) > 0.99, na.rm=TRUE)
Make several image plots, one for each bicluster, and optionally one for the original data as well.
images(matrices, names=NULL, ...) ## S4 method for signature 'list' plotModules(modules, ...)
matrices |
A list of matrices to plot. Please note that this argument is always interpreted as a list; even if you want to plot a single matrix, put it into a list. |
names |
Character vector, the labels to show above the image
plots. If you give the |
... |
Additional arguments, for |
modules |
The object with the ISA modules, as returned by the isa function or other such functions. |
images creates image plots for a series of matrices, using levelplot from the lattice package.
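A small usage sketch for images (the matrices are arbitrary; the lattice package must be installed, and the returned trellis object must be printed to produce the plot, see ‘Value’ below):

m1 <- matrix(rnorm(100), nrow = 10)
m2 <- outer(1:10, 1:10)
print(images(list(m1, m2), names = c("random", "outer product")))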
plotModules calls images to create image plots for a set of modules. It can be called as
plotModules(modules, to.plot=seq_len(ncol(modules$rows)), data, binary=TRUE, names=NULL, xlab="", ylab="", \dots)
where the arguments are:
The object with the ISA modules, as returned by the
isa
function or other such functions.
Numeric vector, the modules to plot, the numbers
correspond to the columns in modules$rows
and
modules$columns
. By default all modules will be plot.
An optional data matrix to plot. Most often this is the original data. If given, its dimensions must match the dimensions in the modules object. If given, then this matrix is plotted first, before the modules.
Logical scalar, whether to binarize the biclusters before plotting or use the actual ISA scores. By default the biclusters are binarized.
Character vector, the labels to show above the image
plots. If you give the data
argument to plotModules
,
then the first label corresponds to that.
Character scalar, the label to put on the horizontal axis.
Character scalar, the label to put on the vertical axis.
Further arguments are passed to
levelplot
.
Note that if you want to export these plots to a file, a bitmap-based format might be more appropriate. For larger matrices, vector formats tend to generate huge files because of the many dots.
Since these functions use the lattice package, they return an object of class trellis. You will need to print this object to create the actual plots.
Gabor Csardi [email protected]
image from base R graphics, and the image method in the Matrix package, for alternative ways to create image plots.
## The following should plot the input matrix and the four modules ## found by ISA set.seed(1) # to get same plot every time data <- isa.in.silico(100, 100, num.fact=4) modules <- isa(data[[1]], thr.row=2, thr.col=2) plotModules(modules, data=data[[1]], binary=FALSE, names=c("Input matrix", paste("Module", seq_len(ncol(modules$rows)))))
Run the PPA with the default parameters
## S4 method for signature 'list' ppa(data, ...)
data |
The input, a list of two numeric matrices with the same number of columns. They may contain NA and/or NaN values; see details below. |
... |
Additional arguments, see details below. |
Please read the isa2-package manual page for an introduction to ISA and the PPA.
This function can be called as
ppa(data, thr.row1 = seq(1, 3, by = 0.5), thr.row2 = seq(1, 3, by = 0.5), thr.col = seq(1, 3, by = 0.5), no.seeds = 100, direction = "updown")
where the arguments are:
The input, a list of two numeric matrices, with the same number of columns. They may contain NA and/or NaN values, but then the algorithm might get slower, as R matrix multiplication can sometimes be slower for these matrices, depending on your platform.
Numeric scalar or vector giving the threshold parameter for the rows of the first matrix. Higher values indicate a more stringent threshold and the result comodules will contain less rows for the first matrix on average. The threshold is measured by the number of standard deviations from the mean, over the values of the first row vector. If it is a vector then it must contain an entry for each seed.
Numeric scalar or vector, the threshold parameter(s)
for the rows of the second matrix. See thr.row1
for
details.
Numeric scalar or vector giving the threshold
parameter for the columns of both matrices. The analogue of
thr.row1
.
Integer scalar, the number of random seeds to use.
Character vector of length four, one for each matrix
multiplication performed during a PPA iteration.
It specifies whether we are interested in
rows/columns that are higher (‘up
’) than average,
lower than average (‘down
’), or both
(‘updown
’). The first and the second entry both
corresponds to the common column dimension of the two matrices, so
they should be equal, otherwise a warning is given.
The ppa function provides an easy interface to the PPA. It runs all steps of a typical PPA work flow, with their default parameters.
This involves:
Normalizing the input matrices by calling
ppa.normalize
.
Generating random input seeds via
generate.seeds
.
Running the PPA with all combinations of the given row1, row2
and column thresholds (by default 1, 1.5, 2, 2.5, 3); by calling
ppa.iterate
.
Merging similar co-modules, separately for each threshold
combination, by calling ppa.unique
.
Filtering the co-modules separately for each threshold
combination, by calling ppa.filter.robust
.
Putting all co-modules from the run with different thresholds, into a single object.
Merging similar co-modules again, but now across all threshold combinations. If two co-modules are similar, then the larger one, the one found with the milder thresholds, is kept.
Please see the manual pages of these functions for the details.
A named list is returned with the following elements:
rows1 |
The first components of the co-modules, corresponding to the rows of the first input matrix. Every column corresponds to a co-module; if an element (the score of the row) is non-zero, that means that that component is included in the co-module, otherwise it is not. Scores are between -1 and 1. If two scores have the same non-zero sign, then the corresponding first matrix rows are correlated. If they have an opposite sign, then they are anti-correlated. If an input seed did not converge within the allowed number of
iterations, then that column of |
rows2 |
This is the same as rows1, but for the rows of the second input matrix. |
columns |
The same as rows1, but for the (common) columns of the two input matrices. |
seeddata |
A data frame containing information about the co-modules. There is one row for each co-module. The data frame has the following columns:
|
rundata |
A named list with information about the PPA run. It has the following entries:
|
Gabor Csardi [email protected]
Kutalik Z, Bergmann S, Beckmann, J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data Nat Biotechnol 2008 May; 26(5) 531-9.
isa2-package for a short introduction to the ISA and the Ping-Pong algorithms. See the functions mentioned above if you want to change the default ISA parameters.
## We do not run this, it takes relatively long
## Not run:
data <- ppa.in.silico(noise=0.1)
ppa.result <- ppa(data[1:2], direction="up")
## Find the best bicluster for each block in the input
## (based on the rows of the first input matrix)
best <- apply(cor(ppa.result$rows1, data[[3]]), 2, which.max)
## Check correlation
sapply(seq_along(best), function(x)
  cor(ppa.result$rows1[,best[x]], data[[3]][,x]))
## The same for the rows of the second matrix
sapply(seq_along(best), function(x)
  cor(ppa.result$rows2[,best[x]], data[[4]][,x]))
## The same for the columns
sapply(seq_along(best), function(x)
  cor(ppa.result$columns[,best[x]], data[[5]][,x]))
## Plot the data and the modules found
if (interactive()) {
  layout(rbind(1:2, c(3,6), c(4,7), c(5,8)))
  image(data[[1]], main="In-silico data, first matrix")
  image(data[[2]], main="In-silico data, second matrix")
  sapply(best[1:3], function(b)
    image(outer(ppa.result$rows1[,b], ppa.result$columns[,b]),
          main=paste("Module", b)))
  sapply(best[1:3], function(b)
    image(outer(ppa.result$rows2[,b], ppa.result$columns[,b]),
          main=paste("Module", b)))
}
## End(Not run)
This function generates an artificial test data set for the PPA algorithm: two matrices, with common column dimension, containing co-modules of prescribed number, size, signal level, noise level and background noise.
ppa.in.silico (num.rows1 = 300, num.rows2 = 200, num.cols = 50, num.fact = 3, mod.row1.size = round(0.5 * num.rows1/num.fact), mod.row2.size = round(0.5 * num.rows2/num.fact), mod.col.size = round(0.5 * num.cols/num.fact), noise = 0.1, mod.signal = rep(1, num.fact), mod.noise = rep(0, num.fact), overlap.row1 = 0, overlap.row2 = overlap.row1, overlap.col = overlap.row1)
num.rows1 |
The number of rows in the first data matrix. |
num.rows2 |
The number of rows in the second data matrix. |
num.cols |
The number of columns in both matrices. |
num.fact |
The number of co-modules to put in the data. |
mod.row1.size |
The size of the co-modules, the number of rows from the first data matrix. It can be a scalar or a vector, and it is recycled. |
mod.row2.size |
The size of the co-modules, the number of rows from the second data matrix. It can be a scalar or a vector, and it is recycled. |
mod.col.size |
The size of the co-modules, the number of columns (from both matrices). It can be a scalar or a vector, and it is recycled. |
noise |
The level of the background noise to be added to the data matrices. It gives the standard deviation of the normal distribution from which the noise is generated. |
mod.signal |
The signal level of the co-modules. |
mod.noise |
The noise levels of the different co-modules. Normally distributed noise with standard deviation mod.noise is added to the elements of each co-module. |
overlap.row1 |
The overlap of the modules, for the rows of the first matrix. Zero means no overlap, one means one overlapping row, etc. |
overlap.row2 |
The same as overlap.row1, but for the rows of the second matrix. |
overlap.col |
The same as overlap.row1, but for the columns. |
ppa.in.silico
creates an artificial data set to test the PPA
(or potentially other) algorithm. It creates two data matrices with an
overlapping dimension and a checkerboard-like structure. The fields of
the checkerboard are the co-modules, and they may have different
signal and noise levels, and they may also overlap.
A list with five matrices. The first two are the two data matrices, they have the same number of columns. The last three matrices contain the correct modules, for the rows of the first matrix, the rows of the second matrix, and finally for the common column dimension.
Gabor Csardi [email protected]
Kutalik Z, Bergmann S, Beckmann, J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data Nat Biotechnol 2008 May; 26(5) 531-9.
See ppa
for an easy way of running the PPA
## Define a function for plotting if we are interactive
if (interactive()) {
  layout(cbind(1:2, 3:4, 5:6, 7:8))
}
myimage <- function(mat) {
  if (interactive()) {
    par(mar=c(3,2,1,1)); image(mat[[1]])
    par(mar=c(3,2,1,1)); image(mat[[2]])
  }
}
## Co-modules without overlap and noise
silico1 <- ppa.in.silico(100, 100, 100, 10, mod.row1.size=10,
                         mod.row2.size=10, mod.col.size=10, noise=0)
myimage(silico1)
## The same, but with overlap and noise
silico2 <- ppa.in.silico(100, 100, 100, 10, mod.row1.size=10,
                         mod.row2.size=10, mod.col.size=10, noise=0.1,
                         overlap.row1=3)
myimage(silico2)
## Co-modules with different noise levels
silico3 <- ppa.in.silico(100, 100, 100, 5, mod.row1.size=10,
                         mod.row2.size=10, mod.col.size=10, noise=0.01,
                         mod.noise=seq(0.1, by=0.1, length=5))
myimage(silico3)
## Co-modules with different signal levels
silico4 <- ppa.in.silico(100, 100, 100, 5, mod.row1.size=10,
                         mod.row2.size=10, mod.col.size=10, noise=0.01,
                         mod.signal=seq(1, 5, length=5))
myimage(silico4)
Perform PPA on two (normalized) input matrices.
## S4 method for signature 'list' ppa.iterate(normed.data, ...)
normed.data |
The normalized input matrices. A list of four matrices, usually coming from the ppa.normalize function. |
... |
Additional arguments, see details below. |
ppa.iterate performs the PPA iteration on the specified input matrices, using the given input seeds. It can be called as
ppa.iterate(normed.data, row1.seeds, col1.seeds, row2.seeds, col2.seeds,
            thr.row1, thr.col=thr.row1, thr.row2=thr.row1,
            direction="updown", convergence=c("cor"), cor.limit=0.9,
            oscillation=FALSE, maxiter=100)
where the arguments are:
The normalized data, a list of four matrices of the appropriate size, usually coming from the output of the ppa.normalize function.
The row seed vectors for the first matrix. At least one of the four possible seed vectors must be present and they will be concatenated, after doing the suitable half-iterations.
The column seed vectors for the first matrix. At least one of the four possible seed vectors must be present and they will be concatenated, after doing the suitable half-iterations.
The row seed vectors for the second matrix. At least one of the four possible seed vectors must be present and they will be concatenated, after doing the suitable half-iterations.
The column seed vectors for the second matrix. At least one of the four possible seed vectors must be present and they will be concatenated, after doing the suitable half-iterations.
Numeric scalar or vector giving the threshold parameter for the rows of the first matrix. Higher values indicate a more stringent threshold, and the resulting co-modules will contain fewer rows from the first matrix on average. The threshold is measured in the number of standard deviations from the mean, over the values of the first row vector. If it is a vector, then it must contain one entry for each seed; see the sketch after this argument list.
Numeric scalar or vector giving the threshold parameter for the columns of both matrices. The analogue of thr.row1.
Numeric scalar or vector, the threshold parameter(s) for the rows of the second matrix. See thr.row1 for details.
Character vector of length four, one for each matrix multiplication performed during a PPA iteration. It specifies whether we are interested in rows/columns that are higher than average (‘up’), lower than average (‘down’), or both (‘updown’). The first and the second entries both correspond to the common column dimension of the two matrices, so they should be equal; otherwise a warning is given.
Character scalar, the convergence criterion for the PPA iteration. If it is ‘cor’, then convergence is measured based on high Pearson correlation (see the cor.limit argument below) of the subsequent row and column vectors. Currently this is the only option implemented.
The correlation limit for convergence if the ‘cor’ method is used.
Logical scalar, whether to look for oscillating seeds. Usually there are not too many oscillating seeds, so it is safe to leave this at FALSE.
The maximum number of iterations allowed.
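As mentioned for thr.row1 above, the threshold arguments may also be vectors, with one entry per seed. The following is a hedged sketch of such a call; the threshold values and seed count are chosen purely for illustration and are not taken from the package manual:

## Hedged sketch: scan two row thresholds in one ppa.iterate() call by
## giving a threshold value for each seed (values are illustrative only).
set.seed(2)
insili <- ppa.in.silico()
nm <- ppa.normalize(insili[1:2])
seeds <- generate.seeds(length=nrow(insili[[1]]), count=100)
thr <- rep(c(1, 2), each=50)      ## first 50 seeds use 1, the rest use 2
ppares <- ppa.iterate(nm, row1.seeds=seeds, thr.row1=thr, direction="up")
## number of first-matrix rows in each co-module, one entry per seed
colSums(ppares$rows1 != 0)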
A named list with many components. Please see the manual page of isa for a complete description.
Gabor Csardi [email protected]
Kutalik Z, Bergmann S, Beckmann J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 2008 May;26(5):531-9.
See ppa for an easy way of running the PPA.
## A basic PPA workflow with a single threshold combination
## In-silico data
set.seed(1)
insili <- ppa.in.silico()

## Random seeds
seeds <- generate.seeds(length=nrow(insili[[1]]), count=100)

## Normalize input matrices
nm <- ppa.normalize(insili[1:2])

## Do PPA
ppares <- ppa.iterate(nm, row1.seeds=seeds, thr.row1=1, direction="up")

## Eliminate duplicates
ppares <- ppa.unique(nm, ppares)

## Filter out the non-robust ones
ppares <- ppa.filter.robust(insili[1:2], nm, ppares)

## Print the sizes of the co-modules
cbind(colSums(ppares$rows1 != 0),
      colSums(ppares$rows2 != 0),
      colSums(ppares$columns != 0))

## Plot the original data and the first three modules found
if (interactive()) {
  layout(rbind(1:2,3:4))
  image(insili[[1]], main="In silico data 1")
  image(outer(ppares$rows1[,1], ppares$columns[,1]) +
        outer(ppares$rows1[,2], ppares$columns[,2]) +
        outer(ppares$rows1[,3], ppares$columns[,3]),
        main="PPA co-modules 1")
  image(insili[[2]], main="In silico data 2")
  image(outer(ppares$rows2[,1], ppares$columns[,1]) +
        outer(ppares$rows2[,2], ppares$columns[,2]) +
        outer(ppares$rows2[,3], ppares$columns[,3]),
        main="PPA co-modules 2")
}
Normalize the two input matrices and store them in a form that can be used effectively to perform the Ping-Pong Algorithm.
## S4 method for signature 'list'
ppa.normalize(data, ...)
data |
A list of two numeric matrices, with matching number of columns. They might contain NA and/or NaN values. |
... |
Additional arguments, see details below. |
This function can be called as
ppa.normalize(data, prenormalize = FALSE)
where the arguments are:
A list of two numeric matrices, with matching number of columns. They might contain NA and/or NaN values.
Logical scalar, see details below.
It was observed that the PPA works best if the input matrices are scaled to have mean zero and standard deviation one.
A PPA step consists of four matrix multiplications, and this requires four different matrices: each of the two input matrices scaled row-wise and column-wise.
If the prenormalize argument is set to TRUE, then row-wise scaling is calculated on the column-wise scaled matrices and not on the raw input.
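As a rough illustration of what this scaling means, the sketch below standardizes a matrix column-wise and row-wise with scale(); it is a sketch only, not the package's internal code, and which slot of the returned list holds which scaled matrix is best checked with str() rather than assumed:

## Sketch: column-wise and row-wise standardization to mean 0 / sd 1.
## ppa.normalize() does this for both input matrices; the order and
## orientation of the four returned matrices is an assumption here.
set.seed(3)
insili <- ppa.in.silico(noise=0.1)
col.scaled <- scale(insili[[1]])          ## column-wise scaling
row.scaled <- t(scale(t(insili[[1]])))    ## row-wise scaling
nm <- ppa.normalize(insili[1:2])
str(nm, max.level=1)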
A list of four matrices; the first two correspond to the first input matrix, the second two to the second matrix.
Gabor Csardi [email protected]
Kutalik Z, Bergmann S, Beckmann J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 2008 May;26(5):531-9.
See ppa for an easy way of running the PPA.
## In-silico data
set.seed(1)
insili <- ppa.in.silico()
nm <- ppa.normalize(insili[1:2])

## Check it
max(abs(rowSums(nm[[1]])))
max(abs(rowSums(nm[[2]])))
max(abs(rowSums(nm[[3]])))
max(abs(rowSums(nm[[4]])))

## Plot them
if (interactive()) {
  layout(rbind(1:3,4:6))
  image(insili[[1]], main="Original data 1")
  image(t(nm[[1]]), main="Row normalized 1")
  image(nm[[2]], main="Column normalized 1")
  image(insili[[2]], main="Original data 2")
  image(t(nm[[3]]), main="Row normalized 2")
  image(nm[[4]], main="Column normalized 2")
}
From a potentially non-unique set of PPA co-modules, create a unique set by removing all co-modules that are similar to others.
## S4 method for signature 'list,list'
ppa.unique(normed.data, pparesult, ...)
normed.data |
The normalized input data, a list of four matrices, usually the output of the ppa.normalize function. |
pparesult |
The result of a PPA run, a set of co-modules. |
... |
Additional arguments, see details below. |
This function can be called as
ppa.unique(normed.data, pparesult, method = c("cor"), ignore.div = TRUE,
           cor.limit = 0.9, neg.cor = TRUE, drop.zero = TRUE)
where the arguments are:
The normalized input data, a list of four matrices, usually the output of ppa.normalize.
The results of a PPA run, a set of co-modules.
Character scalar giving the method to be used to determine whether two co-modules are similar. Currently only ‘cor’ is implemented: it keeps both co-modules if the Pearson correlation of their row1, row2 and column scores is less than cor.limit. See also the neg.cor argument.
Logical scalar; if TRUE, then the divergent co-modules will be removed.
Numeric scalar, giving the correlation limit for the ‘cor’ method.
Logical scalar; if TRUE, then the ‘cor’ method considers the absolute value of the correlation.
Logical scalar, whether to drop co-modules that have all zero scores.
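A hedged usage sketch of a stricter merge follows, raising cor.limit and counting only positive correlations as duplicates; the parameter values are chosen purely for illustration:

## Hedged sketch: stricter duplicate removal (illustrative values only)
set.seed(4)
insili <- ppa.in.silico(noise=0.01)
nm <- ppa.normalize(insili[1:2])
seeds <- generate.seeds(length=nrow(insili[[1]]), count=20)
ppares <- ppa.iterate(nm, row1.seeds=seeds, thr.row1=2, thr.row2=2, thr.col=1)
## only merge co-modules whose scores are positively correlated above 0.95
ppares.strict <- ppa.unique(nm, ppares, cor.limit=0.95, neg.cor=FALSE)
ncol(ppares.strict$rows1)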
A named list, the filtered pparesult. See the return value of ppa.iterate for the details.
Gabor Csardi [email protected]
Kutalik Z, Bergmann S, Beckmann J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 2008 May;26(5):531-9.
See ppa for an easy way of running the PPA.
## Create a PPA module set
set.seed(1)
insili <- ppa.in.silico(noise=0.01)

## Random seeds
seeds <- generate.seeds(length=nrow(insili[[1]]), count=20)

## Normalize input matrices
nm <- ppa.normalize(insili[1:2])

## Do PPA
ppares <- ppa.iterate(nm, row1.seeds=seeds, thr.row1=2, thr.row2=2, thr.col=1)

## Check correlation among modules
cc <- cor(ppares$rows1)
if (interactive()) { hist(cc[lower.tri(cc)],10) }

## Some of them are quite high, how many?
undiag <- function(x) { diag(x) <- 0; x }
sum(undiag(cc) > 0.99, na.rm=TRUE)

## Eliminate duplicated modules
ppares.unique <- ppa.unique(nm, ppares)

## How many modules left?
ncol(ppares.unique$rows1)

## Double check
cc2 <- cor(ppares.unique$rows1)
if (interactive()) { hist(cc2[lower.tri(cc2)],10) }

## High correlation?
sum(undiag(cc2) > 0.99, na.rm=TRUE)
Robustness of ISA biclusters and PPA co-modules. The more robust biclusters/co-modules are more significant in the sense that it is less likely to see them in random data.
## S4 method for signature 'list'
robustness(normed.data, ...)

## S4 method for signature 'matrix'
isa.filter.robust(data, ...)

## S4 method for signature 'list'
ppa.filter.robust(data, ...)
normed.data |
The normalized input data, usually calculated with isa.normalize (for ISA) or ppa.normalize (for PPA). |
data |
The original, not normalized input data; a matrix for isa.filter.robust, or a list of two matrices for ppa.filter.robust. |
... |
Additional arguments, see details below. |
robustness
can be called as
robustness(normed.data, row.scores, col.scores)
isa.filter.robust
can be called as
isa.filter.robust(data, normed.data, isares, perms = 1, row.seeds, col.seeds)
and ppa.filter.robust
can be called as
ppa.filter.robust(data, normed.data, ppares, perms = 1, row1.seeds, col1.seeds, row2.seeds, col2.seeds)
These arguments are:
The normalized input data, usually calculated with isa.normalize (for ISA) or ppa.normalize (for PPA).
The scores of the row components of the biclusters. Usually the rows member of the result list, as returned by isa, isa.iterate, or some other ISA function.
The scores of the columns of the biclusters, usually the columns member of the result list from isa.
The original, not normalized input data.
The result of ISA, coming from isa, isa.iterate, or any other function that returns the same format.
The number of permutations to perform on the input data.
Optionally, the row seeds for the ISA run on the scrambled data. If this and col.seeds are both omitted, the same number of random seeds are used as for isares.
Optionally, the column seeds to use for the ISA on the scrambled input matrix. If both this and row.seeds are omitted, then the same number of random (row) seeds will be used as for isares.
The result of a PPA run, the output of the ppa.iterate or the ppa.unique function (or any other function that returns the same format).
Optionally, the seeds based on the rows of the first input matrix can be given here. Otherwise random seeds are used, the same number as was used to find the original co-modules.
Optionally, the seeds based on the columns of the first input matrix can be given here. Otherwise random seeds are used, the same number as was used to find the original co-modules.
Optionally, the seeds based on the rows of the second input matrix can be given here. Otherwise random seeds are used, the same number as was used to find the original co-modules.
Optionally, the seeds based on the columns of the second input matrix can be given here. Otherwise random seeds are used, the same number as was used to find the original co-modules.
Even if you generate a matrix filled with uniform random noise, calculating ISA on it will give you some biclusters, except perhaps with very strict threshold parameters. These biclusters contain rows and columns that are correlated just by chance. The same is true for the PPA.
To circumvent this, you can use the so-called robustness measure of the biclusters/co-modules. The robustness of a bicluster is a function of its rows, columns and the input data, and it is a real number, usually positive. It is roughly equivalent to the principal singular value of the submatrix (of the reordered input matrix) defined by the bicluster.
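The singular value analogy can be illustrated directly. The sketch below only demonstrates the intuition and is not the exact formula used by robustness(), so the two printed numbers are not expected to agree:

## Hedged sketch: compare the robustness score of one module with the
## leading singular value of the corresponding block of the raw input
## matrix (illustration only, not the internal computation).
set.seed(5)
insili <- isa.in.silico()
nm <- isa.normalize(insili[[1]])
seeds <- generate.seeds(length=nrow(insili[[1]]), count=50)
isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1)
rob <- robustness(nm, isares$rows, isares$columns)
## pick a module with at least one row and one column
mod <- which(colSums(isares$rows != 0) > 0 & colSums(isares$columns != 0) > 0)[1]
rows <- isares$rows[, mod] != 0
cols <- isares$columns[, mod] != 0
svd(insili[[1]][rows, cols, drop=FALSE])$d[1]  ## leading singular value of the block
rob[mod]                                       ## robustness score of the same module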
robustness
calculates the robustness score of a set of
biclusters/co-modules, usually coming from one or more ISA/PPA
iterations.
isa.filter.robust
provides filtering based on the robustness
measure. It reshuffles the input matrix and calculates ISA on it, with
the parameters that were used to find the biclusters under
evaluation. It then calculates the robustness for the modules that
were found in the scrambled matrix (if there is any) and removes any
modules from the data set that have a lower robustness score than at
least one module in the scrambled data.
You can think of isa.filter.robust
as a permutation test, but
the input data is shuffled only once (at least by default), because of
the relatively high computational demands of the ISA.
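If a single reshuffling is not strict enough for your purposes, the perms argument allows more permutations; a hedged usage sketch follows (perms=5 is chosen for illustration only and makes the run several times slower):

## Hedged sketch: robustness filtering with five permutations of the input
set.seed(6)
insili <- isa.in.silico()
nm <- isa.normalize(insili[[1]])
seeds <- generate.seeds(length=nrow(insili[[1]]), count=100)
isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1)
isares <- isa.unique(nm, isares)
isares.strict <- isa.filter.robust(insili[[1]], nm, isares, perms=5)
ncol(isares.strict$rows)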
ppa.filter.robust
does essentially the same, but for PPA
co-modules.
robustness returns a numeric vector, the robustness score of each bicluster.
isa.filter.robust returns a named list, the filtered isares; see the return value of isa.iterate for the structure of the list.
ppa.filter.robust returns a named list, the filtered ppares; see the return value of ppa.iterate for the structure of the list.
Gabor Csardi [email protected]
Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network. Nat Genet 2002 Aug;31(4):370-7. Epub 2002 Jul 22.
Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data. Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.
Kutalik Z, Bergmann S, Beckmann J: A modular approach for integrative analysis of large-scale gene-expression and drug-response data. Nat Biotechnol 2008 May;26(5):531-9.
isa2-package for a short introduction on the Iterative Signature Algorithm. See isa for an easy way of running ISA, and ppa for an easy way of running the PPA.
## A basic ISA work flow for a single threshold combination
## In-silico data
set.seed(1)
insili <- isa.in.silico()

## Random seeds
seeds <- generate.seeds(length=nrow(insili[[1]]), count=100)

## Normalize input matrix
nm <- isa.normalize(insili[[1]])

## Do ISA
isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1)

## Eliminate duplicates
isares <- isa.unique(nm, isares)

## Calculate robustness
rob <- robustness(nm, isares$rows, isares$columns)
rob

## There are three robust ones and a lot of less robust ones
## Plot the three robust ones and three others
if (interactive()) {
  toplot1 <- rev(order(rob))[1:3]
  toplot2 <- sample(seq_along(rob)[-toplot1], 3)
  layout( rbind(1:3,4:6) )
  for (i in c(toplot1, toplot2)) {
    image(outer(isares$rows[,i], isares$columns[,i]), main=round(rob[i],2))
  }
}

## Filter out the non-robust ones
isares2 <- isa.filter.robust(insili[[1]], nm, isares)

## Probably there are only three of them left
ncol(isares2$rows)