| Title: | Common Dimensions (ComDim) Multi-Block Analysis |
|---|---|
| Description: | Common Dimensions (ComDim) is a multi-block method that simultaneously considers multiple data tables to find latent components that are common to all the tables as well as those specific to each data table, along with the contribution of each table to each component. See Jouan-Rimbaud Bouveresse and Rutledge (2024) <doi:10.1002/cem.3454>, Boccard and Rutledge (2013) <doi:10.1016/j.aca.2013.01.022>, and Puig-Castellví et al. (2021) <doi:10.1016/j.chemolab.2021.104422>. |
| Authors: | Francesc Puig-Castellví [aut, cre] |
| Maintainer: | Francesc Puig-Castellví <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-13 12:09:01 UTC |
| Source: | https://github.com/cran/R.ComDim |
Adds or overwrites batch and/or metadata information for one or more blocks of an existing MultiBlock object.
AddMetadata(MB, block = NULL, batches = NULL, metadata = NULL)
MB |
A MultiBlock object. |
block |
The name of the block to modify (character). If |
batches |
A numeric vector of batch labels, one per sample in the target block. Replaces any existing Batch entry for that block. |
metadata |
A data.frame with one row per sample in the target block. Replaces any existing Metadata entry for that block. |
The updated MultiBlock.
MultiBlock, FilterSamplesMultiBlock
b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
batch_b1 <- rep(1, 10)
meta_b1 <- data.frame(condition = rep(c("A", "B"), 5))
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# Add batch information to block 'b1':
mb <- AddMetadata(mb, block = "b1", batches = batch_b1)
# Add (or overwrite) metadata for block 'b1':
mb <- AddMetadata(mb, block = "b1", metadata = meta_b1)
Return the block names of a MultiBlock object.
blockNames(x, ...)

## S4 method for signature 'MultiBlock'
blockNames(x, slot = "Data")
x |
A MultiBlock object. |
... |
Not used. Present for S4 generic dispatch compatibility. |
slot |
A string: |
A character vector with the block names of the requested slot.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
blockNames(mb) # c("b1", "b2")
blockNames(mb, "Data") # same
Set the block names of a MultiBlock object. Renames the Data and Variables slots, and updates Batch and Metadata names to stay consistent.
blockNames(x) <- value

## S4 replacement method for signature 'MultiBlock'
blockNames(x) <- value
x |
A MultiBlock object. |
value |
A character vector of new block names (same length as the number of blocks). |
The updated MultiBlock object.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
blockNames(mb) <- c("block1", "block2")
blockNames(mb) # c("block1", "block2")
Extends any matrix decomposition method used for exploratory purposes to the multi-block
field. The user provides a function (FUN) to compute the scores from the
salience-weighted concatenated blocks; these scores are then used to derive the global
scores, local scores, and loadings following the traditional ComDim-PCA framework.
ComDim_Exploratory(
  MB = MB,
  ndim = NULL,
  FUN = FUN,
  normalise = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  method = "FUN",
  ...
)
MB |
A MultiBlock object. |
ndim |
Number of Common Dimensions. |
FUN |
The function used as the core of the ComDim analysis. It must accept a matrix
|
normalise |
To apply block normalisation. FALSE == no (default), TRUE == yes. |
threshold |
The threshold limit to stop the iterations. Iterations stop when the change in the global score vector is below this value (1e-10 as default). |
loquace |
To display the calculation times. TRUE == yes, FALSE == no (default). |
method |
A string label identifying the decomposition method used (default: 'FUN'). |
... |
Additional arguments passed to |
A ComDim object. Slots for supervised analysis
(R2Y, Q2, DQ2, VIP, VIP.block,
PLS.model, cv, Prediction) are empty. The
populated slots are:
Method: The label supplied via the method argument.
ndim: Number of Common Dimensions extracted.
Q.scores: Global scores matrix ().
Column names are CC1, CC2, etc.; row names are sample
names. Each column is a unit-norm consensus score
derived from the dominant left direction of FUN applied to the
salience-weighted concatenated blocks
.
T.scores: Named list of block-specific local scores matrices
( each). For block and component :
local loading and
local score
.
P.loadings: Global loadings matrix (). Column is
,
where is the mean-centred (and optionally normalised)
concatenated blocks.
Saliences: Block salience (weight) matrix (, row names = block names). Entry is
,
the variance of block captured by global score .
R2X: Proportion of multi-block variance captured by each
component (named vector, length ). Let
be the score vector returned by FUN for component (before
unit-normalisation to obtain ), so that
; then
Singular: Squared L2 norms of the FUN score vectors:
, used to derive
R2X.
Mean: List with MeanMB: named list of column-mean
vectors per block, used for mean-centring.
Norm: List with NormMB: Frobenius norms used for
block normalisation (all ones when normalise = FALSE).
variable.block: Character vector (length )
indicating the block name of each row in P.loadings.
runtime: Total computation time in seconds.
Jouan-Rimbaud Bouveresse D, Rutledge DN (2024). A synthetic review of some recent extensions of ComDim. Journal of Chemometrics, 38(5), e3454. doi:10.1002/cem.3454
b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
if (requireNamespace("ica", quietly = TRUE)) {
  fun.ICA <- function(W, ndim, ...) {
    # W is the concatenated MB.
    # ndim is the number of components.
    result <- ica::ica(W, ndim)
    # The function must return the source estimates (analogous to PCA scores).
    return(result$S)
  }
  resultsICA <- ComDim_Exploratory(mb,
    ndim = 2,
    FUN = fun.ICA,
    method = "ICA"
  )
}
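A FUN argument can also be written with base R only. The following is a minimal sketch, not part of the package: fun.SVD is a hypothetical name, and the sketch assumes that FUN receives the concatenated matrix W together with ndim and returns a score matrix with one row per sample, as fun.ICA does in the example above.

```r
# Hypothetical PCA-like core for ComDim_Exploratory, using base::svd().
# Assumption: FUN(W, ndim, ...) must return an n x ndim score matrix.
fun.SVD <- function(W, ndim, ...) {
  s <- svd(W, nu = ndim, nv = 0)
  # PCA scores = left singular vectors scaled by their singular values
  s$u %*% diag(s$d[seq_len(ndim)], nrow = ndim)
}

# Stand-alone check on a mean-centred random matrix (10 samples, 130 variables)
set.seed(1)
W <- scale(matrix(rnorm(10 * 130), 10, 130), center = TRUE, scale = FALSE)
scores <- fun.SVD(W, ndim = 2)
dim(scores)
```

Under the same assumption, the function could then be supplied as FUN = fun.SVD with a label such as method = "SVD".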
Finding common dimensions in multi-block datasets using OPLS. Also known as ConsensusOPLS (ComDim-OPLS) for multiblock structures: orthogonal components uncorrelated with Y are extracted from all blocks simultaneously before the predictive components are computed.
ComDim_OPLS(
  MB = MB,
  y = y,
  ndim = 1,
  nort = 1,
  method = c("OPLS-DA", "OPLS-R"),
  decisionRule = c("fixed", "max")[2],
  normalise = FALSE,
  loquace = FALSE,
  cv.k = 7
)
MB |
A MultiBlock object. |
y |
The Y-block. A class vector or dummy matrix for OPLS-DA, or a numeric matrix/vector for OPLS-R. |
ndim |
Number of predictive Common Dimensions. Default is 1. |
nort |
Maximum number of orthogonal Common Dimensions. Default is 1. The actual number used is determined by ConsensusOPLS cross-validation and may be less than this value. |
method |
'OPLS-DA' for discriminant analysis or 'OPLS-R' for regression. |
decisionRule |
Only used if method is 'OPLS-DA'. If 'fixed', samples are assigned to the class with Y-hat above 1/nclasses. If 'max', samples are assigned to the class with the highest Y-hat. |
normalise |
To apply block normalisation. FALSE == no (default), TRUE == yes. |
loquace |
To display the calculation times. TRUE == yes, FALSE == no (default). |
cv.k |
Number of folds for k-fold cross-validation (default 7). Set
to 0 to skip CV output. ConsensusOPLS always performs internal CV to select
the optimal number of orthogonal components; when |
This function is a wrapper around ConsensusOPLS.
The core kernel-OPLS extraction is delegated to that package; all ComDim
output slots (local scores, loadings, VIP, sensitivity, confusion matrix,
etc.) are computed from the returned model objects.
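The two decisionRule options described in the arguments can be illustrated on a small hand-made Y-hat matrix. This is only a sketch of the rules as worded above; the Yhat values are made up and the code does not reproduce the package internals.

```r
# Made-up predicted class memberships (Y-hat) for 3 samples and 2 classes
Yhat <- matrix(c(0.70, 0.30,
                 0.40, 0.60,
                 0.55, 0.52),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("A", "B")))

# 'max': each sample goes to the class with the highest Y-hat
pred_max <- colnames(Yhat)[max.col(Yhat, ties.method = "first")]

# 'fixed': a sample belongs to every class whose Y-hat exceeds 1/nclasses,
# so it may be assigned to several classes or to none
pred_fixed <- Yhat > 1 / ncol(Yhat)
```

In this toy case the third sample is assigned to both classes under 'fixed' but only to class A under 'max'.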
A ComDim object. All slots are populated. Key slots:
Method: "OPLS-DA" or "OPLS-R".
ndim: Number of predictive Common Dimensions.
Q.scores: Predictive global scores matrix ().
T.scores: Named list of block-specific predictive local scores.
P.loadings: Global predictive loadings.
Saliences: Predictive block salience matrix ().
Orthogonal: List with orthogonal component outputs: nort,
Q.scores, T.scores, P.loadings.ort, Saliences.ort.
R2X: Named vector (length ) of X-variance fractions.
R2Y: Named vector (length ) of Y-variance fractions.
Q2: Cross-validated Q2 per class/response (when cv.k >= 2;
otherwise training-set fit).
DQ2: (OPLS-DA only) Cross-validated discriminant Q2 per class.
VIP: Global total VIP (named vector, length ).
VIP.block: Named list (one data.frame per block) with columns
p, o, tot.
PLS.model: KOPLS regression objects: W, B, B0, Y.
cv: Cross-validation results when cv.k >= 2: k,
Ypred, Q2, DQ2.
Prediction: Training-set predictions: Y.pred; for OPLS-DA
also decisionRule, trueClass, predClass,
Sensitivity, Specificity, confusionMatrix.
Mean: List with MeanMB and MeanY.
Norm: List with NormMB, FrobNorms, RVweights.
variable.block: Block membership of each variable.
runtime: Total computation time in seconds.
Boccard J, Rutledge DN (2013). A consensus OPLS-DA strategy for multiblock Omics data fusion. Analytica Chimica Acta, 769, 30–39. doi:10.1016/j.aca.2013.01.022
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
y <- rep(c("A", "B"), 5)
results <- ComDim_OPLS(mb, y, ndim = 1, nort = 1, method = "OPLS-DA")
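The y argument accepts either a class vector or a dummy matrix for OPLS-DA. As an illustration only, a 0/1 dummy matrix could be built from a class vector in base R as sketched below; the exact layout the package expects (column order, column names) is an assumption here.

```r
# Class vector, as in the example above
groups <- rep(c("A", "B"), 5)

# One column per class, 1 where the sample belongs to that class
f <- factor(groups)
Y_dummy <- 1 * outer(as.character(f), levels(f), "==")
dimnames(Y_dummy) <- list(NULL, levels(f))

head(Y_dummy, 3)
```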
Finding common dimensions in multi-block datasets.
ComDim_PCA(
  MB = MB,
  ndim = NULL,
  normalise = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  CompMethod = "Normal",
  Partitions = 1
)
MB |
A MultiBlock object. |
ndim |
Number of Common Dimensions. |
normalise |
To apply normalisation. FALSE == no (default), TRUE == yes. |
threshold |
Convergence threshold: iterations stop when the "difference of fit" falls below this value (default 1e-10). |
loquace |
To display the calculation times. TRUE == yes, FALSE == no (default). |
CompMethod |
Computation method, used to speed up the analysis of very large MultiBlocks: 'Normal' (default), 'Kernel', 'PCT', 'Tall' or 'Wide'. |
Partitions |
Used to speed up the analysis of very large MultiBlocks. Only applies if CompMethod is 'Tall' or 'Wide'. |
A ComDim object. Slots for supervised analysis
(R2Y, Q2, DQ2, VIP, VIP.block,
PLS.model, cv, Prediction) are empty. The
populated slots are:
Method: "PCA".
ndim: Number of Common Dimensions extracted.
Q.scores: Global scores matrix ().
Column names are CC1, CC2, etc.; row names are sample
names. Each column is a unit-norm consensus score,
the dominant left singular vector of the salience-weighted concatenated
blocks
.
T.scores: Named list of block-specific local scores matrices
( each). For block and component :
local loading and
local score
.
P.loadings: Global loadings matrix (). Column is
,
where is the mean-centred (and optionally normalised)
concatenated blocks.
Saliences: Block salience (weight) matrix (, row names = block names). Entry is
,
the variance of block captured by global score .
R2X: Proportion of multi-block inertia captured by each
component (named vector, length ). Let
be the leading singular value of for
component (stored as ); then
Singular: Squared leading singular values of
, one per component:
.
Mean: List with MeanMB: named list of column-mean
vectors per block, used for mean-centring.
Norm: List with NormMB: Frobenius norms used for
block normalisation (all ones when normalise = FALSE).
variable.block: Character vector (length )
indicating the block name of each row in P.loadings.
runtime: Total computation time in seconds.
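The Mean and Norm slots store the column means used for mean-centring and the Frobenius norms used for block normalisation. The following base-R sketch shows what that preprocessing amounts to for a single block; normalise_block is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper: mean-centre one block, then divide by its Frobenius norm
normalise_block <- function(X) {
  mu <- colMeans(X)
  Xc <- sweep(X, 2, mu)        # mean-centring
  fro <- sqrt(sum(Xc^2))       # Frobenius norm of the centred block
  list(X = Xc / fro, mean = mu, norm = fro)
}

set.seed(1)
b1 <- matrix(rnorm(500), 10, 50)
nb <- normalise_block(b1)
sum(nb$X^2)  # total inertia of the normalised block
```

After this step every block has unit total inertia, so no block dominates the analysis merely because it has more variables or a larger scale.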
Jouan-Rimbaud Bouveresse D, Rutledge DN (2024). A synthetic review of some recent extensions of ComDim. Journal of Chemometrics, 38(5), e3454. doi:10.1002/cem.3454
Original MATLAB implementation: https://github.com/DNRutledge/ComDim/
# Example 1: two data blocks.
b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
results <- ComDim_PCA(mb, 2)

# Example 2: two data blocks, each with different replicate number
b1 <- matrix(rnorm(500), 10, 50)
batch_b1 <- rep(1, 10)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b2 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb <- MultiBlock(
  Samples = list(
    b1 = paste0("samples_", 1:10),
    b2 = rep(paste0("samples_", 1:10), 3)
  ),
  Data = list(b1 = b1, b2 = b2),
  Batch = list(b1 = batch_b1, b2 = batch_b2),
  ignore.size = TRUE
)
rw <- SplitRW(mb)
results <- ComDim_PCA(rw, 2)
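To give an idea of what happens inside the iterative loop, the sketch below extracts a single common dimension in base R. It follows the usual textbook description of ComDim (weight each block by the square root of its salience, take the dominant left singular vector, update the saliences) and is an assumption about the general scheme, not a copy of the package code; deflation for further components is omitted.

```r
set.seed(42)
blocks <- list(b1 = matrix(rnorm(500), 10, 50),
               b2 = matrix(rnorm(800), 10, 80))
# Mean-centre each block
blocks <- lapply(blocks, function(X) sweep(X, 2, colMeans(X)))

lambda <- rep(1, length(blocks))   # initial saliences
q_old <- rep(0, nrow(blocks[[1]]))
for (it in seq_len(500)) {
  # Salience-weighted concatenated blocks
  W <- do.call(cbind, unname(Map(function(X, l) sqrt(l) * X, blocks, lambda)))
  q <- svd(W, nu = 1, nv = 0)$u[, 1]      # unit-norm global score
  if (sum(q * q_old) < 0) q <- -q         # keep the sign stable across iterations
  # Salience of block b: q' X_b X_b' q
  lambda <- vapply(blocks, function(X) sum(crossprod(X, q)^2), numeric(1))
  if (sum((q - q_old)^2) < 1e-10) break   # change in the global score vector
  q_old <- q
}
lambda
```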
Finding common dimensions in multi-block datasets using PLS.
ComDim_PLS(
  MB = MB,
  y = y,
  ndim = NULL,
  method = c("PLS-DA", "PLS-R"),
  decisionRule = c("fixed", "max")[2],
  normalise = FALSE,
  scale.y = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  CompMethod = "Normal",
  Partitions = 1,
  cv.k = 7
)
MB |
A MultiBlock object. |
y |
The Y-block used as the dependent data in the PLS model. A class vector or a dummy matrix for PLS-DA, or a numeric vector or matrix for PLS-R. |
ndim |
Number of Common Dimensions. |
method |
'PLS-DA' for discriminant analysis or 'PLS-R' for regression. |
decisionRule |
Only used if method is set to PLS-DA. If 'fixed', samples are assigned to the class with Y-hat above 1/nclasses. If 'max', samples are assigned to the class with the highest Y-hat. |
normalise |
To apply block normalisation. FALSE == no (default), TRUE == yes.
When TRUE each block is mean-centred and then divided by its Frobenius norm so
that all blocks have unit total inertia entering the ComDim loop.
Has no effect on the Y-block; use |
scale.y |
Logical (default FALSE). When TRUE and |
threshold |
Convergence threshold: iterations stop when the "difference of fit" falls below this value (default 1e-10). |
loquace |
To display the calculation times. TRUE == yes, FALSE == no (default). |
CompMethod |
Computation method, used to speed up the analysis of very large MultiBlocks: 'Normal' (default), 'Kernel', 'PCT', 'Tall' or 'Wide'. |
Partitions |
Used to speed up the analysis of very large MultiBlocks. Only applies if CompMethod is 'Tall' or 'Wide'. |
cv.k |
Number of folds for k-fold cross-validation (default 7). Set to 0 to skip CV. When cv.k >= 2, Q2 and DQ2 in the output reflect cross-validated predictive ability; otherwise they reflect training-set fit (R2). |
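The effect of cv.k can be pictured with a plain k-fold loop. The sketch below is illustrative only: it uses an ordinary linear model from base R in place of the ComDim-PLS model, random fold assignment, and the common definition Q2 = 1 - PRESS/TSS, all of which are assumptions rather than the package's exact procedure.

```r
set.seed(1)
n <- 21
k <- 7                                   # number of folds, as in cv.k = 7
x <- seq_len(n)
y <- 2 * x + rnorm(n, sd = 0.5)

# Random, balanced sample-to-fold assignment
fold <- sample(rep(seq_len(k), length.out = n))

# Out-of-sample prediction for every sample
y_cv <- numeric(n)
for (f in seq_len(k)) {
  test <- fold == f
  fit <- lm(y ~ x, data = data.frame(x = x[!test], y = y[!test]))
  y_cv[test] <- predict(fit, newdata = data.frame(x = x[test]))
}

press <- sum((y - y_cv)^2)               # prediction error sum of squares
tss <- sum((y - mean(y))^2)              # total sum of squares
Q2 <- 1 - press / tss
```

Replacing y_cv with the fitted values of a model trained on all samples gives the training-set fit (R2) that is reported when CV is skipped.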
A ComDim object. All slots are populated. Key slots:
Method: "PLS-DA" or "PLS-R".
ndim: Number of Common Dimensions extracted.
Q.scores: Global consensus PLS scores ().
Each column (unit-norm) is the dominant left
singular vector from the NIPALS PLS applied to the salience-weighted
concatenated blocks.
T.scores: Named list of block-specific local scores
( each). Local loading
; local score
.
P.loadings: Global loadings ():
, where is
the mean-centred (and optionally normalised) concatenated blocks.
Saliences: Block salience matrix ():
.
R2X: Proportion of X variance captured by each component
(named vector, length ). Let be the
NIPALS PLS X-score vector for component on the salience-
weighted blocks; then
R2Y: Cumulative Y-variance explained (named vector, length
). is the from an OLS regression of
on the first global scores with an intercept:
where is the residual SS when predicting
from .
Note: is cumulative; it reflects the total
Y-variance explained by the first components together,
not the marginal contribution of component alone.
Q2: Predictive Q2 per response column (PLS-R) or per class (PLS-DA), named accordingly:
where and
.
When cv.k >= 2, are out-of-sample
cross-validated predictions; otherwise training-set predictions are
used (i.e. Q2 = R2Y for the full model).
DQ2: (PLS-DA only) Discriminant Q2 per class. Only penalising residuals contribute to the sum:
where sums for class-0 samples with
, and for class-1 samples
with . Same cross-validation logic as Q2.
Singular: Squared L2 norm of the NIPALS PLS score vector
per component (), used to derive
R2X.
VIP: Global VIP scores (named numeric vector, length
) using the Wold formula:
where ,
is the
L2-normalised -th element of the -th NIPALS weight
vector, and is the total number of variables.
VIP.block: Named list (one data.frame per block)
with columns p (per-block predictive VIP computed with block
size instead of ) and tot (= p
for PLS; included for consistency with OPLS output). Row names are
variable names.
PLS.model: List with: W (NIPALS X weight matrix,
); B (regression coefficients,
,
,
in original Y units); B0 (intercept vector, length
,
);
Y (original response matrix as supplied).
Training-set Y predictions:
.
cv: Cross-validation results when cv.k >= 2 (empty
list otherwise): k (number of folds), fold
(sample-to-fold vector), Ypred ( matrix
of out-of-sample predictions), Q2 (CV Q2 per class/response),
DQ2 (mean CV DQ2 across classes, PLS-DA only),
DQ2.perclass (CV DQ2 per class, PLS-DA only).
Prediction: Training-set predictions: Y.pred
(); for PLS-DA also decisionRule,
trueClass (character vector), predClass (data.frame),
Sensitivity and Specificity (named per class),
confusionMatrix (named list of 2x2 matrices, one per class).
Mean: List with MeanMB (column means per block),
MeanY (column means of Y before any scaling), and
ScaleY (column SDs of Y; all ones when scale.y = FALSE).
Norm: List with NormMB: Frobenius norms for block
normalisation.
variable.block: Character vector (length )
mapping each row of P.loadings and each element of VIP
to its block.
runtime: Total computation time in seconds.
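The VIP slot is described as following the Wold formula. As a rough illustration, the sketch below computes VIP scores in the standard textbook form, VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), from a made-up weight matrix and made-up per-component explained sums of squares. The symbols, the SS values and the way SS is obtained are assumptions for the example, not the package's internal definitions.

```r
set.seed(1)
p <- 6                                   # total number of variables
A <- 2                                   # number of components
W <- matrix(rnorm(p * A), p, A)
W <- sweep(W, 2, sqrt(colSums(W^2)), "/")  # L2-normalise each weight vector

SS <- c(3.0, 1.0)                        # made-up Y sum of squares explained per component

# Wold-type VIP: weights squared, weighted by explained SS, rescaled by p
VIP <- sqrt(p * as.vector(W^2 %*% SS) / sum(SS))
names(VIP) <- paste0("var", seq_len(p))

mean(VIP^2)  # equals 1 by construction when the weight vectors have unit norm
```

Because the squared VIPs average to 1, values above 1 are commonly read as above-average importance.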
Jouan-Rimbaud Bouveresse D, Rutledge DN (2024). A synthetic review of some recent extensions of ComDim. Journal of Chemometrics, 38(5), e3454. doi:10.1002/cem.3454
b1 <- matrix(rnorm(500), 10, 50)
batch_b1 <- rep(1, 10)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b2 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb <- MultiBlock(
  Samples = list(
    b1 = paste0("samples_", 1:10),
    b2 = rep(paste0("samples_", 1:10), 3)
  ),
  Data = list(b1 = b1, b2 = b2),
  Batch = list(b1 = batch_b1, b2 = batch_b2),
  ignore.size = TRUE
)
rw <- SplitRW(mb)
y <- scale(1:10, center = TRUE)
results.plsr <- ComDim_PLS(rw, y, 2, method = "PLS-R")
groups <- c(rep("A", 5), rep("B", 5))
results.plsda <- ComDim_PLS(rw, y = groups, 2, method = "PLS-DA")
Extends any PLS-like method used for regression or discriminant purposes to
the multi-block field. The user provides a function (FUN) that
computes one predictive component from the salience-weighted concatenated
blocks; global scores, local scores, and loadings are then derived following
the traditional ComDim-PLS framework. Optionally, orthogonal components
returned by FUN (e.g. from an O-PLS wrapper) are captured. VIP scores and
k-fold cross-validation are also supported.
ComDim_y(
  MB = MB,
  y = y,
  ndim = NULL,
  FUN = FUN,
  nort = 0L,
  type = c("regression", "discriminant")[1],
  decisionRule = c("fixed", "max")[2],
  normalise = FALSE,
  scale.y = FALSE,
  threshold = 1e-10,
  loquace = FALSE,
  method = "FUN",
  cv.k = 7,
  ...
)
MB |
A MultiBlock object. |
y |
The response: a numeric vector or matrix for regression
( |
ndim |
Number of predictive Common Dimensions. If |
FUN |
The function used as the core of the ComDim analysis. It must
accept
Optional return fields:
|
nort |
Number of orthogonal Common Dimensions to extract before the
predictive loop. Default |
type |
|
decisionRule |
Only used when |
normalise |
To apply block normalisation. |
scale.y |
Logical (default |
threshold |
Convergence threshold: iterations stop when the change
in the global score vector falls below this value (default |
loquace |
Display computation time at each step. |
method |
A string label identifying the method (default: |
cv.k |
Number of folds for k-fold cross-validation (default 7). Set
to 0 to skip CV. When |
... |
Additional arguments passed to |
A ComDim object with the following slots:
Method: The label supplied via the method argument.
ndim: Number of predictive Common Dimensions extracted.
Q.scores: Global consensus scores matrix (). Each column (unit-norm) is derived from
the dominant left direction of FUN applied to the salience-weighted
concatenated blocks.
T.scores: Named list of block-specific local scores
( each). Local loading
(computed
on the ort-deflated block when nort > 0); local score
.
P.loadings: Global loadings ():
, where
is the (optionally ort-deflated) mean-centred
concatenated blocks.
Saliences: Block salience matrix ():
.
R2X: Proportion of X variance captured by each predictive
component (named vector, length ). Let
be the X-score vector returned by FUN for component :
When nort > 0, the denominator also includes the orthogonal
terms, and the orthogonal R2X fractions
are stored separately in Orthogonal$R2X.
R2Y: Cumulative Y-variance explained (named vector, length
):
where is the residual SS from an OLS regression of
on .
Note: is cumulative; it is the total Y-variance
explained by the first components together, not the marginal
contribution of component alone.
Q2: Predictive Q2 per response column (regression) or per class (discriminant), named accordingly:
where .
When cv.k >= 2 and nort = 0: cross-validated (out-of-sample)
predictions are used; otherwise training-set predictions.
CV is automatically skipped when nort > 0.
DQ2: (Discriminant mode only) Discriminant Q2 per class, using only penalising residuals:
where sums for class-0 samples with
, and for class-1 samples
with . Same cross-validation logic as Q2.
Singular: Squared L2 norm of the FUN X-score vector per
component (), used to derive R2X.
VIP: Global total VIP (named vector, length ):
concatenation of VIP.block[[b]]$tot across blocks. When
nort = 0, uses the Wold formula; when nort = 1, tot
combines predictive and orthogonal VIPs (see VIP.block).
VIP.block: Named list (one data.frame per block).
When nort = 0: columns p and tot (= p),
using the Wold formula:
where and
is the L2-normalised
-th element of the -th weight vector.
When nort = 1: columns p (Wold, same as above),
o (orthogonal VIP, loadings-based:
,
where and
is the column-L2-normalised block-slice of
the ort loadings), and tot
().
Row names are variable names.
PLS.model: List with: W (X weight matrix collected
from FUN, ); B (regression
coefficients,
,
in original Y units); B0 (intercept,
); Y (original
response matrix as supplied).
Training-set predictions:
.
cv: Cross-validation results when cv.k >= 2 and
nort = 0 (empty list otherwise): k, fold
(sample-to-fold vector), Ypred (
out-of-sample predictions), Q2 (CV Q2 per class/response),
DQ2 (mean CV DQ2, discriminant only),
DQ2.perclass (CV DQ2 per class, discriminant only).
Orthogonal: When nort > 0: list with nort,
Q.scores (global ort scores, , unit-norm),
T.scores (block ort local scores, each),
P.loadings.ort (ort loadings, ),
Saliences.ort (), and R2X
(orthogonal X-variance fractions,
).
Empty list when nort = 0.
Prediction: Training-set predictions: Y.pred
(); for discriminant analysis also
decisionRule, trueClass, predClass (data.frame),
Sensitivity and Specificity (per class),
confusionMatrix (named list of 2x2 matrices).
Mean: List with MeanMB (column means per block),
MeanY (column means of Y), and ScaleY (column SDs of Y;
all ones when scale.y = FALSE).
Norm: List with NormMB: Frobenius norms for block
normalisation.
variable.block: Character vector (length )
mapping each row of P.loadings and each element of VIP
to its block.
runtime: Total computation time in seconds.
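The difference between Q2 and DQ2 can be seen on a toy binary problem. The sketch below assumes the usual discriminant-Q2 convention, in which a residual is ignored when the prediction overshoots its class label in the correct direction (below 0 for class 0, above 1 for class 1). The numbers are made up and the code is an illustration of that convention, not the package implementation.

```r
# Made-up 0/1 class labels and predicted values for one class column
y    <- c(0, 0, 0, 1, 1, 1)
yhat <- c(-0.2, 0.1, 0.3, 0.8, 1.3, 0.6)

res <- y - yhat
tss <- sum((y - mean(y))^2)

# Q2: every residual counts
Q2 <- 1 - sum(res^2) / tss

# DQ2: only penalising residuals count, i.e. class-0 samples predicted
# above 0 and class-1 samples predicted below 1
penalising <- (y == 0 & yhat > 0) | (y == 1 & yhat < 1)
DQ2 <- 1 - sum(res[penalising]^2) / tss

c(Q2 = Q2, DQ2 = DQ2)
```

Since DQ2 drops some residuals from the sum, it can never be lower than Q2 computed from the same predictions.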
b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))

## Example 1: ComDim-PLS (regression) ---------------------------------------
# Single-step NIPALS PLS wrapper (one predictive component per call).
# Note: 'tx' is used instead of 't' to avoid shadowing base::t().
fun.PLS <- function(W, y, ndim, ...) {
  output <- list()
  w <- t(W) %*% y / as.numeric(t(y) %*% y) # X weight (u = y, 1 step)
  w <- w / sqrt(sum(w^2)) # L2 normalise
  tx <- W %*% w # X score
  p <- t(W) %*% tx / as.numeric(t(tx) %*% tx) # X loading
  q <- t(y) %*% tx / as.numeric(t(tx) %*% tx) # Y loading
  u <- y %*% q / as.numeric(t(q) %*% q) # Y score
  output$scores <- as.vector(tx)
  output$P <- as.vector(p)
  output$W <- as.vector(w)
  output$Q <- as.vector(q)
  output$U <- as.vector(u)
  return(output)
}
y <- c(1, 1, 1, 1, 1, 5, 5, 5, 10, 10)
resultsPLS <- ComDim_y(mb,
  y = y, ndim = 2,
  type = "regression",
  FUN = fun.PLS,
  method = "PLS",
  cv.k = 0
)

## Example 2: ComDim-OPLS-DA (discriminant, nort = 1) ----------------------
# Thin wrapper around OPLS_NIPALS_DNR(), the package's NIPALS OPLS engine.
# All inputs (W, y, and any extra args such as 'threshold') are forwarded
# directly via '...'. Use this pattern when nort > 0; for nort = 0 the
# simpler PLS wrapper in Example 1 is sufficient (no orthoscores needed).
fun.OPLS <- function(W, y, ndim, ...) {
  res <- OPLS_NIPALS_DNR(W = W, y = y, ...)
  list(
    scores = as.vector(res$t_pred),
    P = as.vector(res$p),
    W = as.vector(res$w_pred),
    Q = as.vector(res$q),
    U = as.vector(res$u),
    orthoscores = matrix(res$t_ort, ncol = 1)
  )
}
groups <- c(rep("A", 5), rep("B", 5))
resultsOPLS <- ComDim_y(mb,
  y = groups, ndim = 1, nort = 1,
  type = "discriminant",
  FUN = fun.OPLS,
  method = "OPLS-DA",
  cv.k = 0
)

## Example 3 (not run): ComDim-OPLS-DA via ropls ---------------------------
# Wrapping ropls::opls is also possible. Key points:
# - Use orthoI = 1 (fixed) instead of NA so the output is predictable.
# - Always return output$orthoscores; ComDim_y ignores it in phases
#   where ort has already been removed.
# - Expand the single ropls Q loading to match the ncol(y_dummy) width.
if (requireNamespace("ropls", quietly = TRUE)) {
  fun.OPLSDA.ropls <- function(W, y, ndim, ...) {
    output <- list()
    # Convert dummy matrix to ropls-compatible -1/+1 vector
    Y <- c(-1, 1)[apply(y, 1, function(x) match(1, x))]
    result <- tryCatch(
      ropls::opls(
        x = W, y = Y, predI = 1, orthoI = 1,
        fig.pdfC = "none", info.txtC = "none"
      ),
      error = function(e) {
        ropls::opls(
          x = W, y = Y, predI = 1, orthoI = 0,
          fig.pdfC = "none", info.txtC = "none"
        )
      }
    )
    output$scores <- result@scoreMN[, 1]
    output$P <- result@loadingMN[, 1]
    output$W <- result@weightMN[, 1]
    output$U <- result@uMN[, 1]
    # Expand the single ropls Q loading to match the 2-column dummy matrix:
    # loadings for class1 and class2 are antisymmetric in binary PLS-DA.
    output$Q <- c(-result@cMN[, 1], result@cMN[, 1])
    output$y <- result@suppLs$yModelMN # internal y (for scaling detection)
    # Orthogonal scores (used during the ort pre-loop when nort > 0)
    if (!is.null(result@orthoScoreMN) && ncol(result@orthoScoreMN) > 0) {
      output$orthoscores <- result@orthoScoreMN # n x k matrix; col jj used for jj-th ort
    } else {
      output$orthoscores <- matrix(0, nrow = nrow(W), ncol = 1)
    }
    return(output)
  }
  b1_r <- matrix(rnorm(8 * 30), 8, 30)
  b2_r <- matrix(rnorm(8 * 20), 8, 20)
  mb_r <- MultiBlock(Data = list(b1 = b1_r, b2 = b2_r))
  resultsOPLSDA <- ComDim_y(mb_r,
    y = c(rep("NI", 4), rep("OFF", 4)),
    ndim = 1, nort = 1,
    type = "discriminant",
    FUN = fun.OPLSDA.ropls,
    method = "OPLS-DA(ropls)",
    cv.k = 0
  )
}
Object of the type MultiBlock, to use as input for ComDim
analyses.
The output of a ComDim analysis.
Samples: A vector with the sample names. If this information is not available, the slot is filled with integers.
Data: A list with the data blocks.
Variables: A character vector with the variable names. If this information is not available, the slot is filled with integers.
Batch: A list with the vectors of batch information for each data block. Optional.
Metadata: A list with the sample metadata.
Method: The algorithm used at the core of the ComDim analysis (e.g. PCA, PLS).
ndim: The number of components.
Q.scores: The global scores.
T.scores: The local scores.
P.loadings: The loadings.
Saliences: The saliences.
R2X: The explained variance of the MultiBlock.
R2Y: For regression or discriminant models, the explained variance of the Y-block.
Q2: For regression or discriminant models, the predicted variance of the Y-block.
DQ2: For discriminant models, the predicted discriminant variance of the Y-block.
Singular: The singular values.
Mean: The mean values for each variable in the MultiBlock.
Norm: The norm values for each variable in the MultiBlock.
PLS.model: For ComDim analyses using PLS as the core algorithm, contains the W, B, B0 and Y matrices.
cv: For ComDim_KOPLS, contains the indices of the samples used during cross-validation.
Prediction: A list with the predicted Y, the decision rule used for sample classification, the sensitivity, the specificity, and a confusion matrix.
Metadata: A list with the per-sample metadata for each block.
variable.block: A vector with the same length as P.loadings, indicating the block each variable belongs to.
runtime: The elapsed running time.
Splits data into several blocks, allowing variables to appear in more than one block
simultaneously. Each variable is duplicated into every block to which it is assigned according
to the metadata mapping table.
ExpandMultiBlock(data = NULL, metadata = NULL, minblock = 0, loquace = TRUE)
data |
A data.frame or matrix with samples in rows and variables in columns. |
metadata |
A 2-column data.frame describing how variables are assigned to blocks. The
first column gives the block name; the second column gives the variable name, and must match
the column names of |
minblock |
Integer. Blocks with fewer than |
loquace |
Logical. If |
For each row in metadata that matches a variable in data, the variable's
values are copied into the corresponding block column. Column names in the resulting expanded
matrix are formed as <block>.<variable>. Variables with all-NA values after
expansion are removed. If no matches exist between data column names and
metadata, NULL is returned with a warning. If minblock filtering
removes all blocks, NULL is returned.
A MultiBlock object whose blocks are defined by the first column of
metadata, or NULL if no valid blocks could be constructed.
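The expansion described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the toy `data` matrix, the mapping table, and the helper objects `map`, `expanded`, and `blocks` are all hypothetical, and the `minblock` filter is left out.

```r
# Hypothetical toy inputs (not taken from the package).
data <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("s", 1:3), c("v1", "v2", "v3", "v4")))
metadata <- data.frame(
  block    = c("pathA", "pathA", "pathB", "pathB", "pathB"),
  variable = c("v1", "v2", "v2", "v3", "v9"),  # v9 is absent from data
  stringsAsFactors = FALSE
)

# Keep only the mapping rows whose variable exists in the data.
map <- metadata[metadata$variable %in% colnames(data), ]

# Copy each mapped variable into its block; a variable mapped to two
# blocks (v2 here) is duplicated.
expanded <- data[, map$variable, drop = FALSE]
colnames(expanded) <- paste(map$block, map$variable, sep = ".")

# Split the expanded matrix into one matrix per block.
blocks <- lapply(split(seq_len(nrow(map)), map$block),
                 function(idx) expanded[, idx, drop = FALSE])
```

In this sketch `v2` ends up in both `pathA` and `pathB`, while `v4` (never mapped) and `v9` (not in the data) are dropped.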
data(mouse_ds)
lipidsMB <- ExpandMultiBlock(data = lipids, metadata = metadata_lipids,
  minblock = 0, loquace = FALSE)
Extracellular metabolites in growth medium
data(mouse_ds)
An object of class matrix (inherits from array) with 12 rows and 298 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# MB <- MultiBlock(Data = list(RNAseq3 = RNAseq3,
#   lipids = lipids, intra = intra, extra = extra))
Retain a subset of samples in a MultiBlock object.
FilterSamplesMultiBlock(MB, samples = sampleNames(MB))
MB |
A MultiBlock object. |
samples |
A vector of sample names to keep. Names not found in the MultiBlock are silently ignored; the order of the retained samples follows the order given here. |
The MultiBlock object restricted to the requested samples. The Batch and
Metadata slots are also subsetted and reordered to match.
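The name-matching behaviour described for samples (unknown names ignored, order taken from the request) can be illustrated on a plain matrix. This is a base R sketch under that reading of the documentation; `x`, `requested`, and `keep` are hypothetical names and the package's own code may differ.

```r
x <- matrix(seq_len(20), nrow = 5,
            dimnames = list(paste0("sample_", 1:5), paste0("v", 1:4)))
requested <- c("sample_4", "sample_9", "sample_2")  # sample_9 does not exist

# Drop names that are not present, keeping the requested order.
keep <- requested[requested %in% rownames(x)]
x_sub <- x[keep, , drop = FALSE]
rownames(x_sub)
```

Any per-sample vector (such as a batch vector) would need to be indexed with the same `keep` positions to stay aligned with the data.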
MultiBlock, AddMetadata, ProcessMultiBlock
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
rownames(b1) <- rownames(b2) <- paste0("sample_", 1:10)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
mb <- FilterSamplesMultiBlock(mb, samples = paste0("sample_", 1:5))
GCMS data on cell differentiation
data(dataset3)
An object of class matrix (inherits from array) with 36 rows and 15 columns.
Cabrero et al. (2019) Scientific data 6:256 (doi:10.1038/s41597-019-0202-7)
data(dataset3)
mb_d3 <- MultiBlock(Data = list(gcms = gcms))
Intracellular metabolites in growth medium
data(mouse_ds)
An object of class matrix (inherits from array) with 12 rows and 230 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# MB <- MultiBlock(Data = list(RNAseq3 = RNAseq3,
#   lipids = lipids, intra = intra, extra = extra))
Metadata for the metabolites in growth medium
data(mouse_ds)
An object of class data.frame with 687 rows and 2 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# intraMB <- ExpandMultiBlock(data = intra, metadata = KEGG_table_metabolites,
#   minblock = 10, loquace = FALSE)
LCMS data on cell differentiation
data(dataset3)
An object of class matrix (inherits from array) with 36 rows and 44 columns.
Cabrero et al. (2019) Scientific data 6:256 (doi:10.1038/s41597-019-0202-7)
data(dataset3)
# mb_d3 <- MultiBlock(Data = list(lcms = lcms))
Lipid profiles of cell extracts
data(mouse_ds)
An object of class matrix (inherits from array) with 12 rows and 437 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# MB <- MultiBlock(Data = list(RNAseq3 = RNAseq3,
#   lipids = lipids, intra = intra, extra = extra))
Creates a long (tidy) data frame with the local P-loadings from a ComDim model, suitable for use with ggplot2.
MakeComDimLoadingsTable(model, blocks = NULL, dim = NULL, dim.ort = NULL)
model |
The output from a ComDim analysis (a |
blocks |
The blocks from which loadings will be extracted. A vector of integers (block indices) or block names. When omitted, all blocks are included. |
dim |
Integer vector of predictive component indices to include. When omitted, all predictive components in the model are included. |
dim.ort |
Integer vector of orthogonal component indices to include. When |
A long data frame with one row per variable–component combination, containing the following columns:
variable.id: Variable name (factor).
variable.id.number: Position of the variable across all blocks (factor).
block.id: Integer index of the block.
block.name: Name of the block (factor).
dim: Component number.
value: Loading value (from P.loadings or Orthogonal$P.loadings.ort).
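To make the long layout concrete, the following base R sketch reshapes a small loadings matrix into one row per variable and component. The matrix `P`, the vector `block_of`, and the resulting `long` data frame are hypothetical stand-ins; only a subset of the columns listed above is reproduced, and the package may build its table differently.

```r
set.seed(1)
# Hypothetical loadings: 5 variables (3 in block b1, 2 in b2), 2 components.
P <- matrix(rnorm(10), nrow = 5,
            dimnames = list(c("a1", "a2", "a3", "c1", "c2"), c("CC1", "CC2")))
block_of <- c("b1", "b1", "b1", "b2", "b2")

# as.vector() unrolls P column by column, so variable names repeat once per
# component and the component index repeats once per variable.
long <- data.frame(
  variable.id = factor(rep(rownames(P), times = ncol(P)), levels = rownames(P)),
  block.name  = factor(rep(block_of, times = ncol(P))),
  dim         = rep(seq_len(ncol(P)), each = nrow(P)),
  value       = as.vector(P)
)
```

A table in this shape can be passed directly to plotting functions that expect one observation per row, for example faceting by `block.name` and `dim`.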
MakeComDimScoresTable, ComDim_PCA
b1 <- matrix(rnorm(500), 10, 50) # 10 rows and 50 columns
b2 <- matrix(rnorm(800), 10, 80) # 10 rows and 80 columns
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
model <- ComDim_PCA(mb, ndim = 2)
tbl <- MakeComDimLoadingsTable(model)
Creates a long (tidy) data frame with the global and/or local scores from a ComDim model, suitable for use with ggplot2.
MakeComDimScoresTable(
  model,
  blocks = NULL,
  dim = NULL,
  dim.ort = NULL,
  include = c("Q.scores", "T.scores", "Q.scores.ort", "T.scores.ort")[1:2]
)
model |
The output from a ComDim analysis (a |
blocks |
The blocks from which local scores will be extracted. A vector of integers (block
indices) or block names. When omitted, all blocks are included. Only relevant when
|
dim |
Integer vector of predictive component indices to include. When omitted, all predictive components in the model are included. |
dim.ort |
Integer vector of orthogonal component indices to include. When |
include |
Character vector selecting which score types to include. Accepted values
(case-insensitive) are |
A long data frame with one row per sample–component–score-type combination, containing the following columns:
sample.id: Sample name (factor).
sample.id.number: Integer position of the sample (factor).
block.id: Block index, or "Global" / "Global.ort" for Q scores.
block.name: Block name, or "Global" / "Global.ort" for Q scores (factor).
dim: Component number.
scores.type: One of "Global", "Global.ort", "Local", or "Local.ort" (factor).
scores.type.dim: Concatenation of scores type and component number, e.g. "Q.scores1" (factor).
value: Score value.
MakeComDimLoadingsTable, ComDim_PCA
b1 <- matrix(rnorm(500), 10, 50) # 10 rows and 50 columns
b2 <- matrix(rnorm(800), 10, 80) # 10 rows and 80 columns
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
model <- ComDim_PCA(mb, ndim = 2)
tbl <- MakeComDimScoresTable(model)
Metadata for the lipid profiles of cell extracts
data(mouse_ds)
An object of class data.frame with 437 rows and 2 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# lipidsMB <- ExpandMultiBlock(data = lipids, metadata = metadata_lipids,
#   minblock = 0, loquace = FALSE)
Metadata for the RNAseq data of cell extracts
data(mouse_ds)
An object of class data.frame with 16890 rows and 2 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# RNAseqMB <- ExpandMultiBlock(data = RNAseq3, metadata = metadata_RNAseq3,
#   minblock = 500, loquace = FALSE)
miRNAseq data on cell differentiation
data(dataset3)
An object of class matrix (inherits from array) with 36 rows and 469 columns.
Cabrero et al. (2019) Scientific data 6:256 (doi:10.1038/s41597-019-0202-7)
data(dataset3)
# mb_d3 <- MultiBlock(Data = list(mirnaseq = mirnaseq))
Converts a MultiAssayExperiment into a MultiBlock. Samples are first intersected across all experiments so that only common samples are retained.
MultiAssayExperiment2MultiBlock(se, colData_samplenames = NULL, Batch = NULL)
se |
A |
colData_samplenames |
Character string giving the name of the column in |
Batch |
Character string giving the name of the column in |
Columns (samples) are intersected across all experiments via intersectColumns before
conversion. Each experiment in the MultiAssayExperiment becomes one block in the
MultiBlock (rows = samples, columns = features). Sample names are taken from the
colData row names of se. Metadata from colData is stored only for the
first block; subsequent blocks share the same sample order but carry no additional metadata.
If Batch is specified, the corresponding column is extracted from colData,
removed from the metadata, and stored as the Batch slot of the MultiBlock.
A MultiBlock object with one block per experiment in se.
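The intersect-then-transpose step can be pictured with plain matrices. The sketch below uses only base R and does not call MultiAssayExperiment; `e1`, `e2`, `experiments`, `common`, and `blocks` are hypothetical names, and the real conversion also handles colData, which is omitted here.

```r
# Two hypothetical experiments stored features x samples; they share only
# samples s2 and s3.
e1 <- matrix(1:12, nrow = 4,
             dimnames = list(paste0("g", 1:4), c("s1", "s2", "s3")))
e2 <- matrix(1:6, nrow = 2,
             dimnames = list(paste0("m", 1:2), c("s3", "s2", "s4")))
experiments <- list(block1 = e1, block2 = e2)

# Samples present in every experiment.
common <- Reduce(intersect, lapply(experiments, colnames))

# Subset to the common samples and transpose to samples x features.
blocks <- lapply(experiments, function(e) t(e[, common, drop = FALSE]))
```

After this step every block has the same rows in the same order, which is the layout a MultiBlock expects.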
MultiBlock, MultiBlock2MultiAssayExperiment
if (requireNamespace("MultiAssayExperiment", quietly = TRUE)) {
  library(MultiAssayExperiment)
  mae <- MultiAssayExperiment(
    experiments = ExperimentList(
      block1 = matrix(rnorm(50), nrow = 5,
        dimnames = list(paste0("s", 1:5), paste0("v", 1:10))),
      block2 = matrix(rnorm(30), nrow = 5,
        dimnames = list(paste0("s", 1:5), paste0("w", 1:6)))
    )
  )
  mb <- MultiAssayExperiment2MultiBlock(mae)
}
Creates a MultiBlock object from a named list of data blocks.
MultiBlock(
  Samples = NULL,
  Data,
  Variables = NULL,
  Batch = NULL,
  Metadata = NULL,
  ignore.names = FALSE,
  ignore.size = FALSE
)
Samples |
A vector of sample names shared across all blocks (optional).
When omitted, sample names are taken from the row names of each block.
If no row names exist and all blocks have the same number of rows,
samples are numbered as integers. Use |
Data |
A named list of matrices or data.frames (one entry per block). |
Variables |
A named list of variable-name vectors, one per block (optional). When omitted, column names are taken from each block; if absent, variables are numbered as integers. |
Batch |
A named list of batch vectors, one per block (optional). |
Metadata |
A named list of metadata data.frames, one per block (optional). |
ignore.names |
If TRUE, sample names are not checked across blocks.
All blocks must have the same number of rows unless |
ignore.size |
If TRUE (only meaningful when |
A MultiBlock object.
Puig-Castellví F, Jouan-Rimbaud Bouveresse D, Mazéas L, Chapleur O, Rutledge DN (2021). Rearrangement of incomplete multi-omics datasets combined with ComDim for evaluating replicate cross-platform variability and batch influence. Chemometrics and Intelligent Laboratory Systems, 218, 104422. doi:10.1016/j.chemolab.2021.104422
b1 <- matrix(rnorm(500), 10, 50) # 10 samples, 50 variables
b2 <- matrix(rnorm(800), 10, 80) # 10 samples, 80 variables
# Minimal call: Samples and Variables are filled in automatically.
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# With explicit sample names (enables cross-block alignment):
rownames(b1) <- paste0("s", 1:10)
rownames(b2) <- paste0("s", 1:10)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# Blocks with different row counts (replicate design):
b3 <- matrix(rnorm(30 * 80), 30, 80) # 30 samples, 80 variables
batch_b3 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb3 <- MultiBlock(
  Data = list(b3 = b3),
  Batch = list(b3 = batch_b3),
  ignore.names = TRUE,
  ignore.size = TRUE
)
Combines the blocks of a MultiBlock into a single matrix by column-binding the selected blocks. Useful for computing summary statistics across the whole MultiBlock (e.g. maximum value).
MultiBlock2Matrix(MB = MB, blocks = NULL, vars = NULL)
MB |
A |
blocks |
The blocks to combine. A vector of integers (block indices) or a vector of block names. When omitted, all blocks are included. |
vars |
The variables to keep. A list of the same length as |
Blocks are column-bound in the order given by blocks. Row names of the output matrix
are the sample names of MB. Column names are the variable names; if the same variable
name appears in more than one block, duplicates are disambiguated by appending
.<block_name> to all occurrences of the repeated name, and a warning is issued.
A numeric matrix with rows corresponding to samples and columns corresponding to the variables of the selected blocks concatenated in order.
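The column-binding and name disambiguation described above can be sketched as follows. This is a base R illustration under the stated naming rule; the helper `bind_blocks()` and the toy blocks are hypothetical and are not part of the package.

```r
bind_blocks <- function(blocks) {
  mat <- do.call(cbind, unname(blocks))
  var_names   <- unlist(lapply(blocks, colnames), use.names = FALSE)
  block_names <- rep(names(blocks), times = vapply(blocks, ncol, integer(1)))

  # Flag every occurrence of a name that appears more than once.
  dup <- var_names %in% var_names[duplicated(var_names)]
  if (any(dup)) {
    warning("Duplicated variable names were suffixed with the block name.")
    var_names[dup] <- paste(var_names[dup], block_names[dup], sep = ".")
  }
  colnames(mat) <- var_names
  mat
}

b1 <- matrix(1:6, nrow = 3, dimnames = list(paste0("s", 1:3), c("x", "y")))
b2 <- matrix(7:12, nrow = 3, dimnames = list(paste0("s", 1:3), c("y", "z")))
mat <- suppressWarnings(bind_blocks(list(b1 = b1, b2 = b2)))
colnames(mat)
```

Here `y` occurs in both blocks, so both copies receive a block suffix while `x` and `z` keep their original names.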
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# Combine all blocks into a matrix and compute a summary statistic:
mat <- MultiBlock2Matrix(mb)
max(mat)
# Combine only the first block, keeping variables 1-10:
mat <- MultiBlock2Matrix(mb, blocks = 1, vars = list(1:10))
Converts a MultiBlock into a MultiAssayExperiment. Each block becomes one experiment; sample
names, batch information, and metadata are carried over into the colData of the result.
MultiBlock2MultiAssayExperiment(MB, MSEmetadata = NULL)
MB |
A |
MSEmetadata |
An optional list of unstructured metadata describing the overall content of
the MultiAssayExperiment (stored in its |
Each block in MB is transposed (features x samples) and stored as a named matrix in
the ExperimentList. Row names are set to the variable names and column names to the
sample names of the MultiBlock. The colData is constructed from MB@Samples;
any Metadata and Batch information present in the MultiBlock is appended as
additional columns. A sampleMap is generated mapping every sample to every experiment
using the same primary and column names.
A MultiAssayExperiment object with one experiment per block in MB.
MultiBlock, MultiAssayExperiment2MultiBlock
if (requireNamespace("MultiAssayExperiment", quietly = TRUE)) {
  library(MultiAssayExperiment)
  b1 <- matrix(rnorm(50), 5, 10, dimnames = list(paste0("s", 1:5), paste0("v", 1:10)))
  b2 <- matrix(rnorm(30), 5, 6, dimnames = list(paste0("s", 1:5), paste0("w", 1:6)))
  mb <- MultiBlock(Data = list(block1 = b1, block2 = b2))
  mae <- MultiBlock2MultiAssayExperiment(mb)
  mae <- MultiBlock2MultiAssayExperiment(mb, MSEmetadata = list(study = "example"))
}
Remove NA and Infinite values from a MultiBlock object in a single pass.
Variables whose combined count of NA and Infinite values meets or exceeds
minfrac * nrow are discarded first. Remaining Infinite values are
then replaced according to inf.method; remaining NA values are
imputed according to na.method.
NAInfRemoveMultiBlock(
  MB,
  blocks = NULL,
  minfrac = 0.5,
  na.method = c("none", "zero", "median", "discard", "fixed.value", "fixed.value.all",
    "fixed.noise", "random.noise", "QRILC")[8],
  inf.method = c("none", "fixed.noise", "random.noise")[3],
  constant = 0,
  factor.NA = 0.5,
  sd.noise = 0.3,
  tune.sigma = 1,
  showWarning = TRUE
)
MB |
The MultiBlock object. |
blocks |
Blocks to process: a vector of integers or block names (optional; all blocks are processed when omitted). |
minfrac |
Minimum fraction of valid (non-NA, finite) values required to retain a variable. Variables at or below this threshold are discarded. Default: 0.5. |
na.method |
Imputation method for remaining NA values after the
minfrac filter. One of: |
inf.method |
Replacement method for remaining Infinite values after
the minfrac filter. One of: |
constant |
For |
factor.NA |
Noise factor used by |
sd.noise |
Standard-deviation factor for |
tune.sigma |
Tuning scalar for |
showWarning |
If |
The processed MultiBlock object.
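As a rough illustration of the two-stage logic (filter by invalid count, then impute), the base R sketch below works on a single matrix. It assumes the rule stated in the description, namely that a variable is dropped when its count of NA and Infinite values is at least minfrac * nrow. It also simplifies the Infinite handling by treating Inf as missing, which is not one of the package's inf.method options, and then imputes zeros in the spirit of na.method = "zero".

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
x[1:6, 1] <- NA        # 6 of 10 invalid: discarded with minfrac = 0.5
x[c(2, 3), 2] <- NA    # 2 invalid: kept, NA imputed below
x[4, 3] <- Inf         # 1 invalid: kept, Inf handled below

minfrac <- 0.5
invalid <- !is.finite(x)                        # TRUE for NA, NaN, Inf, -Inf
keep    <- colSums(invalid) < minfrac * nrow(x) # stage 1: variable filter
x <- x[, keep, drop = FALSE]

x[is.infinite(x)] <- NA  # simplification (assumption): treat Inf as missing
x[is.na(x)] <- 0         # stage 2: zero imputation
```

Filtering before imputing matters: imputing first would hide how sparse a variable was and let mostly-empty columns through.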
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
b2[c(2, 3, 5), c(1, 2, 3)] <- NA
b2[c(1, 4), c(4, 5)] <- Inf
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
mb <- NAInfRemoveMultiBlock(mb, na.method = "zero", inf.method = "fixed.noise")
Return the number of columns (variables) in each block of a MultiBlock.
## S4 method for signature 'MultiBlock'
ncol(x)
x |
A MultiBlock object. |
A named integer vector with the number of columns per block.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
ncol(mb) # c(b1 = 50, b2 = 80)
Normalize all blocks from a MultiBlock object. The ranknorm transform is based on that from the RNOmni package (doi:10.1111/biom.13214).
NormalizeMultiBlock(
  MB,
  blocks = NULL,
  method = c("none", "auto", "mean", "pareto", "norm", "geometric", "ranknorm")[5],
  infinite.as.NA = FALSE,
  constant = 0,
  offset = 3/8,
  showWarning = TRUE
)
MB |
The MultiBlock object. |
blocks |
Blocks to normalize. A vector of integers or block names (optional; all blocks are processed when omitted). |
method |
Normalization method. One of: |
infinite.as.NA |
If |
constant |
For |
offset |
For |
showWarning |
If |
The normalized MultiBlock object.
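For readers unfamiliar with the rank-based inverse normal transform behind the ranknorm option, a minimal base R sketch follows. It uses the common formulation with the Blom offset of 3/8 for a single variable; the helper name `rank_inverse_normal()` is hypothetical, and how the package applies the transform across blocks is not shown here.

```r
rank_inverse_normal <- function(u, offset = 3 / 8) {
  n <- sum(!is.na(u))              # number of observed values
  r <- rank(u, na.last = "keep")   # ranks; NA stays NA
  # Map ranks to probabilities strictly inside (0, 1), then to normal quantiles.
  qnorm((r - offset) / (n - 2 * offset + 1))
}

u <- c(2.5, 10, 0.1, 7, 3)
z <- rank_inverse_normal(u)
```

The transform depends only on the ordering of the values, so extreme observations are pulled towards a normal scale while the rank order is preserved.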
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
b2[c(2, 3, 5), c(1, 2, 3)] <- NA
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
mb <- NormalizeMultiBlock(mb, method = "auto")
Return the number of rows (samples) in each block of a MultiBlock. For a valid MultiBlock all blocks share the same number of rows, so the result is a named integer vector with one (identical) value per block.
## S4 method for signature 'MultiBlock'
nrow(x)
x |
A MultiBlock object. |
A named integer vector with the number of rows per block.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
nrow(mb) # c(b1 = 10, b2 = 10)
One NIPALS OPLS step on a (lambda-weighted) concatenated block matrix.
Computes one predictive component and one orthogonal component in a single
pass. This function is the recommended building block for constructing an
OPLS-based FUN argument for ComDim_y() when nort > 0.
For nort = 0 (plain PLS) a simpler PLS wrapper is sufficient.
OPLS_NIPALS_DNR(W = W, y = y, threshold = 1e-10)
W |
Numeric matrix (n x p): the concatenated, lambda-weighted blocks
as passed by |
y |
Numeric matrix (n x q): the response block (dummy matrix for
discriminant analysis, numeric matrix for regression). Only the first
column drives the NIPALS u-score iteration; all columns are used to
compute the Y-loading |
threshold |
Convergence threshold for the u-score update (default
|
A named list:
t_pred: Predictive X-score (length n).
w_pred: Predictive X-weight (length p), L2-normalised.
p: X-loading (length p).
q: Y-loading (length q = ncol(y)).
u: Y-score (length n).
t_ort: Orthogonal X-score (length n).
Orthogonal X-weight (length p), L2-normalised.
Orthogonal X-loading (length p).
ComDim_y for the multi-block OPLS wrapper that uses
this function.
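The general shape of a single OPLS step can be sketched in base R for the simplest case of a one-column response, where no u-score iteration is needed. This is a textbook-style sketch and not the code of OPLS_NIPALS_DNR(): the function name `opls_step()` and the output names `w_ort` and `p_ort` are hypothetical, and the multi-column y handling and convergence threshold are left out.

```r
opls_step <- function(X, y) {
  y <- as.numeric(y)
  w <- crossprod(X, y) / sum(y^2)             # predictive X-weight
  w <- w / sqrt(sum(w^2))                     # L2-normalise
  t_pred <- X %*% w                           # predictive X-score
  p <- crossprod(X, t_pred) / sum(t_pred^2)   # X-loading
  w_ort <- p - sum(w * p) * w                 # part of p orthogonal to w
  w_ort <- w_ort / sqrt(sum(w_ort^2))         # L2-normalise
  t_ort <- X %*% w_ort                        # orthogonal X-score
  p_ort <- crossprod(X, t_ort) / sum(t_ort^2) # orthogonal X-loading
  list(t_pred = as.vector(t_pred), w_pred = as.vector(w), p = as.vector(p),
       t_ort = as.vector(t_ort), w_ort = as.vector(w_ort),
       p_ort = as.vector(p_ort))
}

set.seed(42)
X <- scale(matrix(rnorm(10 * 6), 10, 6), scale = FALSE)  # column-centred
y <- rnorm(10)
res <- opls_step(X, y)
```

Because the orthogonal weight is built to be orthogonal to the predictive weight, the orthogonal score carries X variation that is uncorrelated with y, which is what gets removed before the predictive component is computed.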
Projects a new MultiBlock dataset into an existing ComDim model. Works with models produced by ComDim_PCA, ComDim_PLS, ComDim_OPLS, ComDim_y, and ComDim_Exploratory. The projection type (PCA-like, PLS-like, OPLS-like) is determined from the model structure rather than the method string, so custom method labels from ComDim_y are handled automatically.
PredictMultiBlock(MB = MB, y, model = model, normalise = FALSE, loquace = TRUE)
MB |
A MultiBlock object containing the new samples to project. |
y |
Response vector or dummy matrix (optional). When supplied for a supervised model, Q2, DQ2, and classification statistics are computed for the new samples. |
model |
A ComDim object (the calibration model). |
normalise |
If TRUE, each block is mean-centred using the training
column means and divided by the training Frobenius norm. Must match the
|
loquace |
If TRUE, print a message for each set of model elements that were projected. Default TRUE. |
The model ComDim object with updated slots:
Q.scores — projected global scores (new samples x ndim).
T.scores — projected local scores (per block).
Orthogonal$Q.scores — projected global ort scores (if
model has orthogonal components).
Orthogonal$T.scores — projected local ort scores.
Prediction$Y.pred — predicted Y (supervised models).
Q2, DQ2, classification slots — when y
is supplied for a supervised model.
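The normalise option scales new samples with parameters learned from the training data. A base R sketch of that idea for one block is given below. It assumes the Frobenius norm is taken on the mean-centred training block, which is an assumption rather than something stated above, and all object names are hypothetical.

```r
set.seed(7)
train <- matrix(rnorm(50, mean = 5), 10, 5)  # calibration block
new   <- matrix(rnorm(15, mean = 5), 3, 5)   # new samples, same variables

# Parameters are estimated on the training block only.
train_mean     <- colMeans(train)
train_centered <- sweep(train, 2, train_mean)
train_norm     <- sqrt(sum(train_centered^2))  # Frobenius norm (assumed centred)

# New samples are centred and scaled with the training parameters.
new_scaled <- sweep(new, 2, train_mean) / train_norm
```

Re-estimating the mean and norm on the new samples would place them on a different scale from the calibration model, so the training parameters are reused.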
Apply a custom function to transform a MultiBlock and/or select variables or blocks.
When multiple operations are supplied, the order of execution is:
vars subsetting, then FUN, then FUN.SelectVars, then FUN.SelectBlocks.
ProcessMultiBlock(
  MB = MB,
  blocks = NULL,
  vars = NULL,
  FUN = NULL,
  FUN.SelectVars = NULL,
  FUN.SelectBlocks = NULL
)
MB |
A MultiBlock object. |
blocks |
The blocks to process. A vector of integers or block names. When omitted, all blocks are processed. |
vars |
The variables to keep. A list with the same length as |
FUN |
A function applied to each selected block's data matrix (samples x variables). It receives the matrix as its sole argument and must return a matrix of the same dimensions. |
FUN.SelectVars |
A function applied to each selected block's data matrix to determine which
variables to keep. It receives the matrix as its sole argument and must return a logical vector
of length equal to |
FUN.SelectBlocks |
A function applied to each selected block's data matrix to determine
whether the block should be retained. It receives the matrix as its sole argument and must
return a single |
The processed MultiBlock object, with data matrices transformed and/or blocks/variables
removed according to the supplied arguments. Blocks not listed in blocks are left
unchanged.
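The documented order of operations (vars subsetting, then FUN, then FUN.SelectVars) can be mimicked on a single matrix. The helper `process_block()` below is a hypothetical base R sketch of that ordering, not the package's code, and it omits the block-level selection step.

```r
process_block <- function(x, vars = NULL, FUN = NULL, FUN.SelectVars = NULL) {
  if (!is.null(vars)) x <- x[, vars, drop = FALSE]   # 1. vars subsetting
  if (!is.null(FUN)) x <- FUN(x)                     # 2. transformation
  if (!is.null(FUN.SelectVars)) {                    # 3. variable selection
    x <- x[, FUN.SelectVars(x), drop = FALSE]
  }
  x
}

x <- cbind(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0, 10, 20))
out <- process_block(
  x,
  vars = c("a", "b", "c"),
  FUN = function(m) m * 2,
  FUN.SelectVars = function(m) apply(m, 2, var) > 0
)
```

Since the selector runs after the transformation, it sees the transformed values; here the constant column `b` is dropped and the surviving columns are doubled.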
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# Normalize each block to 0-100 range
BY100 <- function(x) 100 * (x - min(x)) / (max(x) - min(x))
mb <- ProcessMultiBlock(mb, FUN = BY100)
# Keep only variables with non-zero variance
mb <- ProcessMultiBlock(mb, FUN.SelectVars = function(x) apply(x, 2, var) > 0)
# Remove blocks where all values are below 0.5
mb <- ProcessMultiBlock(mb, FUN.SelectBlocks = function(x) max(x) >= 0.5)
RNAseq data on cell differentiation
data(dataset3)
An object of class matrix (inherits from array) with 36 rows and 12762 columns.
Cabrero et al. (2019) Scientific data 6:256 (doi:10.1038/s41597-019-0202-7)
data(dataset3)
# mb_d3 <- MultiBlock(Data = list(rnaseq = rnaseq))
RNAseq data of cell extracts
data(mouse_ds)
An object of class matrix (inherits from array) with 12 rows and 9254 columns.
Radic Shechter et al. (2021) Molecular Systems Biology 17:e10141 (doi:10.15252/msb.202010141)
data(mouse_ds)
# MB <- MultiBlock(Data = list(RNAseq3 = RNAseq3,
#   lipids = lipids, intra = intra, extra = extra))
Return the sample names of a MultiBlock object.
sampleNames(x)

## S4 method for signature 'MultiBlock'
sampleNames(x)
x |
A MultiBlock object. |
A vector with the sample names.
b1 <- matrix(rnorm(500), 10, 50)
rownames(b1) <- paste0("s", 1:10)
mb <- MultiBlock(Data = list(b1 = b1))
sampleNames(mb) # "s1" ... "s10"
Set the sample names of a MultiBlock object.
sampleNames(x) <- value

## S4 replacement method for signature 'MultiBlock'
sampleNames(x) <- value
| Argument | Description |
|---|---|
| x | A MultiBlock object. |
| value | A vector of new sample names. Must have the same length as the current number of samples. |
The updated MultiBlock object.
b1 <- matrix(rnorm(500), 10, 50)
mb <- MultiBlock(Data = list(b1 = b1))
sampleNames(mb) <- paste0("patient_", 1:10)
sampleNames(mb)
Finds the important variables presenting a coordinated response across all specified replicate-blocks for a given ComDim component.
SelectFeaturesRW(
  RW = RW,
  results = results,
  ndim = NULL,
  blocks = NULL,
  threshold_cor = 1,
  threshold_cov = 1,
  mean.RW = TRUE,
  plots = "NO"
)
| Argument | Description |
|---|---|
| RW | The object used as input in the ComDim analysis. |
| results | The output object obtained in the ComDim analysis. |
| ndim | The number of the component for which the important variables are to be identified. |
| blocks | A vector with the indices or the names of the replicate blocks of the same data type. |
| threshold_cor | The "times" parameter used to calculate the threshold in the following formula: cor(variable) > times * sd(cor(variables)). The minimum value that can be assigned to threshold_cor is 1. |
| threshold_cov | The "times" parameter used to calculate the threshold in the following formula: cov(variable) > times * sd(cov(variables)). The minimum value that can be assigned to threshold_cov is 1. |
| mean.RW | Logical value indicating whether the RW data must be mean-centered (TRUE) or not (FALSE). |
| plots | Parameter to indicate whether S-plots (covariance vs. correlation with the Q scores) must be produced. Possible values are |
The function applies an S-plot approach to identify variables that are both strongly covarying
and strongly correlated with the Q scores of the chosen component. For each block in
blocks, covariance (s1) and correlation (s2) of every variable with the
(pseudo-inverse-scaled) Q scores are computed. A variable is considered important in a block if
its absolute covariance exceeds threshold_cov * sd(s1) and its absolute correlation
exceeds threshold_cor * sd(s2). Only variables that satisfy both criteria in all
specified blocks simultaneously are returned. The sign of the local P-loadings is used to
separate variables into positive and negative groups.
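The selection rule described above can be sketched in base R for a single block. This is a minimal illustration, not the package's implementation: `splot_select` and the simulated score vector `q` are hypothetical names, the pseudo-inverse scaling of the Q scores is omitted, and the sign of the covariance is used in place of the sign of the local P-loadings to split the variables.

```r
set.seed(1)
n <- 20; p <- 30
q <- rnorm(n)                         # hypothetical stand-in for the Q scores
X <- matrix(rnorm(n * p), n, p)
X[, 1:3] <- X[, 1:3] + 3 * q          # three variables that follow the scores
X[, 4]   <- X[, 4] - 3 * q            # one variable that opposes the scores
colnames(X) <- paste0("v", seq_len(p))

# Hypothetical helper: S-plot style selection for one block
splot_select <- function(X, q, threshold_cov = 1, threshold_cor = 1) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-centre the block
  s1 <- drop(cov(Xc, q))                         # covariance with the scores
  s2 <- drop(cor(Xc, q))                         # correlation with the scores
  keep <- abs(s1) > threshold_cov * sd(s1) &
          abs(s2) > threshold_cor * sd(s2)
  # Assumption: split by the sign of the covariance (the package uses
  # the sign of the local P-loadings instead)
  list(positive = which(keep & s1 > 0),
       negative = which(keep & s1 < 0))
}

sel <- splot_select(X, q)
sel
```

For several replicate blocks, the same rule would be applied to each block and only the variables selected in every block would be kept, for example with `Reduce(intersect, ...)` on the per-block index vectors.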
A named list with two elements:
$positive: Integer indices (named by variable name) of the important variables presenting a positive relationship with the Q scores (positive covariance, positive correlation, and positive local P-loading) across all specified blocks.
$negative: Integer indices (named by variable name) of the important variables presenting a negative relationship with the Q scores (negative covariance, negative correlation, and negative local P-loading) across all specified blocks.
When plots is not "NO", S-plots are also displayed as a side effect: each plot
shows covariance on the x-axis and correlation on the y-axis, with selected variables
highlighted in red.
b1 <- matrix(rnorm(500), 10, 50)
batch_b1 <- rep(1, 10)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b2 <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mb <- MultiBlock(
  Data = list(b1 = b1, b2 = b2),
  Batch = list(b1 = batch_b1, b2 = batch_b2),
  ignore.names = TRUE, ignore.size = TRUE
)
rw <- SplitRW(mb)
results <- ComDim_PCA(rw, 2)
# Identify important variables for component 1 across replicate blocks 2, 3, and 4
features <- SelectFeaturesRW(RW = rw, results = results, ndim = 1, blocks = c(2, 3, 4))
# Use stricter thresholds and display S-plots side by side
features <- SelectFeaturesRW(RW = rw, results = results, ndim = 1, blocks = c(2, 3, 4),
                             threshold_cor = 2, threshold_cov = 2, plots = "together")
Generate a synthetic MultiBlock dataset built from a known number of
orthogonal latent sources plus Gaussian noise. Useful for benchmarking and
testing ComDim functions.
SimulateMultiBlock(
  n = 500L,
  p = 2000L,
  n_sources = 4L,
  noise = 0.05,
  n_blocks = 2L
)
n |
Number of samples. Default: |
p |
Total number of variables (split evenly across blocks). Must be
divisible by |
n_sources |
Number of orthogonal latent sources. Default: |
noise |
Fraction of total variance attributed to noise, in (0, 1).
Default: |
n_blocks |
Number of blocks to split the variables into. Default:
|
The dataset is constructed as follows:
1. n_sources score vectors (an n × n_sources matrix) are drawn from a standard normal distribution and orthonormalised by QR decomposition.
2. Loading vectors (a p × n_sources matrix) are built so that each source loads primarily (SD = 1) on one equal-sized variable segment, with small cross-loadings (SD = 0.10) on the remaining variables.
3. The true signal is computed as the product of the scores and the transposed loadings.
4. Gaussian noise is added such that it accounts for a fraction noise of the total variance.
5. The variables are split into n_blocks equal-width blocks, each assembled as a named element of the returned MultiBlock.
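The construction steps above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the function's actual code: the noise scaling formula (noise variance set to `noise / (1 - noise)` times the overall signal variance) is an assumed way of making noise a fraction `noise` of the total variance, and the result is kept as a plain named list of matrices instead of a MultiBlock.

```r
set.seed(42)
n <- 50; p <- 40; n_sources <- 4; noise <- 0.05; n_blocks <- 2

# 1. Scores: standard normal draws, orthonormalised by QR decomposition
scores <- qr.Q(qr(matrix(rnorm(n * n_sources), n, n_sources)))

# 2. Loadings: SD = 1 on the source's own segment, SD = 0.10 elsewhere
seg <- rep(seq_len(n_sources), each = p / n_sources)
loadings <- sapply(seq_len(n_sources), function(k) {
  rnorm(p, sd = ifelse(seg == k, 1, 0.10))
})

# 3. True signal (n x p)
signal <- scores %*% t(loadings)

# 4. Gaussian noise; assumed scaling so that noise is a fraction
#    `noise` of the total (signal + noise) variance
noise_sd <- sqrt(noise / (1 - noise) * var(as.vector(signal)))
X <- signal + matrix(rnorm(n * p, sd = noise_sd), n, p)

# 5. Split the variables into equal-width blocks
block_id <- rep(seq_len(n_blocks), each = p / n_blocks)
blocks <- lapply(seq_len(n_blocks), function(b) X[, block_id == b, drop = FALSE])
names(blocks) <- paste0("Block", seq_len(n_blocks))

sapply(blocks, dim)
```

Because the scores are orthonormal, `crossprod(scores)` is numerically the identity matrix, which is a quick way to check the first step.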
A MultiBlock object with n_blocks blocks, each
of size n × (p / n_blocks), named "Block1",
"Block2", etc.
mb <- SimulateMultiBlock(n = 100, p = 200, n_sources = 4, noise = 0.05, n_blocks = 2)
mb <- NormalizeMultiBlock(mb, method = 'norm')
res <- ComDim_PCA(mb, ndim = 4)
Splits a multi-block into a replicate-wise (RW) structure by expanding each block along its batch dimension. Each batch within each original block becomes a separate block in the output, enabling replicate-wise ComDim analysis.
SplitRW(
  MB = MB,
  checkSampleCorrespondence = FALSE,
  batchNormalisation = TRUE,
  showSampleCorrespondence = TRUE
)
MB |
A |
checkSampleCorrespondence |
Logical. If |
batchNormalisation |
Logical. If |
showSampleCorrespondence |
Logical. If |
Output block names follow the convention <original_block> when the original block has
only one batch, or <original_block>_<batch_label> when it has multiple batches.
The Metadata slot of each source block is also split and carried over to the
corresponding replicate blocks. If the MultiBlock has no Batch information at all,
the original object is returned unchanged with a warning.
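The block naming convention can be illustrated with a small base R sketch that operates on plain lists of matrices. `split_by_batch` is a hypothetical helper written for this illustration; it only reproduces the splitting and naming described above and does not perform batch normalisation, sample-correspondence checks, or Metadata handling.

```r
# Hypothetical helper: split each block by batch label and name the pieces
split_by_batch <- function(blocks, batches) {
  out <- list()
  for (nm in names(blocks)) {
    labels <- unique(batches[[nm]])
    for (lab in labels) {
      rows <- batches[[nm]] == lab
      # One batch: keep the block name; several batches: append the label
      out_name <- if (length(labels) == 1) nm else paste(nm, lab, sep = "_")
      out[[out_name]] <- blocks[[nm]][rows, , drop = FALSE]
    }
  }
  out
}

b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(2400), 30, 80)
rw <- split_by_batch(
  blocks  = list(b1 = b1, b2 = b2),
  batches = list(b1 = rep(1, 10), b2 = rep(1:3, each = 10))
)
names(rw)  # "b1" "b2_1" "b2_2" "b2_3"
```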
A MultiBlock object in which each block corresponds to one batch of one
original data block (a replicate-wise structure ready for ComDim_PCA or
similar).
MultiBlock, ComDim_PCA, SelectFeaturesRW
b1 <- matrix(rnorm(1500), 30, 50)
b2 <- matrix(rnorm(2400), 30, 80)
batch_b <- c(rep(1, 10), rep(2, 10), rep(3, 10))
# Generate the multi-block (mb) with 3 batches of 10 samples each
mb <- MultiBlock(
  Data = list(b1 = b1, b2 = b2),
  Batch = list(b1 = batch_b, b2 = batch_b),
  ignore.names = TRUE
)
rw <- SplitRW(mb)
Converts a SummarizedExperiment into a MultiBlock. Each assay in the
SummarizedExperiment becomes one block (rows = samples, columns = features).
SummarizedExperiment2MultiBlock(se, colData_samplenames = NULL, Batch = NULL)
se |
A |
colData_samplenames |
Character string giving the name of the column in |
Batch |
Character string giving the name of the column in |
Each assay is transposed so that samples are in rows and features in columns. Sample order
is aligned to colData(se) row names; samples not present in colData are
removed. If an assay has no name, the block is labelled "X". Metadata from
colData (excluding the Batch column if specified) is stored only for the first block;
subsequent assays are appended as additional blocks with no extra metadata.
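The transposition and sample alignment described above can be mimicked in base R without Bioconductor. In this sketch, `assay_counts` and `col_data` are hypothetical stand-ins for one assay and for colData(se); it shows only the reordering logic and is not the converter's code.

```r
# Assay stored as features x samples, as in a SummarizedExperiment
assay_counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("feature", 1:3), paste0("sample", 1:4))
)

# Stand-in for colData(se): note the different order and the missing sample4
col_data <- data.frame(
  group = c("A", "B", "A"),
  row.names = c("sample3", "sample1", "sample2")
)

# Transpose so that samples are in rows and features in columns
block <- t(assay_counts)

# Align to the colData row order; samples absent from colData are dropped
block <- block[intersect(rownames(col_data), rownames(block)), , drop = FALSE]

rownames(block)  # "sample3" "sample1" "sample2"
```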
A MultiBlock object with one block per assay in se.
MultiBlock, MultiAssayExperiment2MultiBlock,
MultiBlock2MultiAssayExperiment
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  library(SummarizedExperiment)
  nrows <- 20; ncols <- 6
  counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
  rownames(counts) <- paste0("feature", seq_len(nrows))
  colnames(counts) <- paste0("sample", seq_len(ncols))
  se <- SummarizedExperiment(assays = list(counts = counts))
  mb <- SummarizedExperiment2MultiBlock(se)
}
Return the variable names of a MultiBlock object.
variableNames(x, ...)

## S4 method for signature 'MultiBlock'
variableNames(x, block)
| Argument | Description |
|---|---|
| x | A MultiBlock object. |
| ... | Not used. Present for S4 generic dispatch compatibility. |
| block | Optional. A vector of block names or indices to retrieve. When omitted, variable names for all blocks are returned as a named list. When a single block is specified, a plain vector is returned. |
A named list of variable-name vectors (all blocks), or a single vector when exactly one block is requested.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
variableNames(mb) # named list: b1 = 1:50, b2 = 1:80
variableNames(mb, "b1") # 1:50
variableNames(mb, 1:2) # same as variableNames(mb)
Set the variable names of a MultiBlock object.
value must be a named list with one entry per block, where each
entry is a vector of variable names whose length matches the number of
columns in that block.
variableNames(x) <- value

## S4 replacement method for signature 'MultiBlock'
variableNames(x) <- value
| Argument | Description |
|---|---|
| x | A MultiBlock object. |
| value | A named list of variable-name vectors, one per block. |
To update a single block, use standard list-replacement chaining:
variableNames(mb)[["b1"]] <- newNames
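Chaining works because R rewrites a nested replacement call into a getter call, a list modification, and a call to the replacement function. The base R sketch below shows the mechanism on a toy list of matrices; `var_names` and `var_names<-` are hypothetical stand-ins written for this illustration and are not part of the package.

```r
# Toy "multi-block": a named list of matrices
toy <- list(b1 = matrix(0, 2, 3), b2 = matrix(0, 2, 2))

# Hypothetical getter: one vector of column names per block
var_names <- function(x) lapply(x, colnames)

# Hypothetical replacement function: expects a named list, one entry per block
`var_names<-` <- function(x, value) {
  for (nm in names(value)) colnames(x[[nm]]) <- value[[nm]]
  x
}

# Replace all blocks at once
var_names(toy) <- list(b1 = paste0("v", 1:3), b2 = paste0("v", 1:2))

# Chained form: R evaluates this roughly as
#   tmp <- var_names(toy); tmp[["b1"]] <- ...; var_names(toy) <- tmp
var_names(toy)[["b1"]] <- paste0("feat_", 1:3)

var_names(toy)
```

Only the "b1" entry changes; the names of "b2" are read by the getter and written back unchanged.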
The updated MultiBlock object.
b1 <- matrix(rnorm(500), 10, 50)
b2 <- matrix(rnorm(800), 10, 80)
mb <- MultiBlock(Data = list(b1 = b1, b2 = b2))
# Replace all at once:
variableNames(mb) <- list(
  b1 = paste0("v", 1:50),
  b2 = paste0("v", 1:80)
)
# Replace a single block using chaining:
variableNames(mb)[["b1"]] <- paste0("feat_", 1:50)