| Title: | Multi-Block Partial Least Squares Discriminant Analysis |
|---|---|
| Description: | Functions to implement a multi-block partial least squares discriminant analysis (MBPLSDA): component search, search for the optimal number of model components, validity test of the optimal model by permutation tests, evaluation of the observed values of the optimal model parameters and predicted categories, and evaluation of the bootstrap values of the optimal model parameters and cross-validated predicted categories. The use of this package is described in Brandolini-Bunlon et al. (2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134. |
| Authors: | Marion Brandolini-Bunlon, Stephanie Bougeard, Melanie Petera, Estelle Pujos-Guillot |
| Maintainer: | Marion Brandolini-Bunlon <[email protected]> |
| License: | GPL (>= 2.0) |
| Version: | 0.9.0 |
| Built: | 2026-05-11 06:42:29 UTC |
| Source: | https://github.com/cran/packMBPLSDA |
Index of help topics:
boot_mbplsda            Bootstrapped simulations for multi-block partial least squares discriminant analysis
cvpred_mbplsda          Cross-validated predicted categories from a multi-block partial least squares discriminant model
disjunctive             Disjunctive table
ginv                    Generalized inverse of a matrix X
inertie                 Inertia of a matrix
mbplsda                 Multi-block partial least squares discriminant analysis
medical                 Medical dataset
nutrition               Nutritional dataset
omics                   Metabolomic dataset
packMBPLSDA-package     Multi-Block Partial Least Squares Discriminant Analysis
permut_mbplsda          Permutation testing of a multi-block partial least squares discriminant model
plot_boot_mbplsda       Plot the results of the function boot_mbplsda in a pdf file
plot_cvpred_mbplsda     Plot the results of the function cvpred_mbplsda in a pdf file
plot_permut_mbplsda     Plot the results of the function permut_mbplsda in a pdf file
plot_pred_mbplsda       Plot the results of the function pred_mbplsda in a pdf file
plot_testdim_mbplsda    Plot the results of the function testdim_mbplsda in a pdf file
pred_mbplsda            Observed parameters and predicted categories from a multi-block partial least squares discriminant model
status                  Physiopathological status data
testdim_mbplsda         Test of the number of components by two-fold cross-validation for a multi-block partial least squares discriminant model
Marion Brandolini-Bunlon, Stephanie Bougeard, Melanie Petera, Estelle Pujos-Guillot
Maintainer: Marion Brandolini-Bunlon <[email protected]>
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA (05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL (01-27-2020 - 01-29-2020).
mbplsda
testdim_mbplsda
plot_testdim_mbplsda
permut_mbplsda
plot_permut_mbplsda
pred_mbplsda
plot_pred_mbplsda
cvpred_mbplsda
plot_cvpred_mbplsda
boot_mbplsda
plot_boot_mbplsda
    # ktab.list.df() and dudi.pca() come from the ade4 package
    library(ade4)
    library(packMBPLSDA)
    data(status)
    data(medical)
    data(omics)
    data(nutrition)
    # Gather the three explanatory blocks into a ktab object
    ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
    # Code the status as a disjunctive table and wrap it in a dudi object
    disjonctif <- disjunctive(status)
    dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
    # Fit the MBPLSDA model with two components
    modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
Function to perform bootstrapped simulations for multi-block partial least squares discriminant analysis, in order to get confidence intervals for the regression coefficients, the variable loadings, and the variable and block importances.
boot_mbplsda(object, nrepet = 199, optdim, cpus = 1, ...)
object |
an object created by mbplsda |
nrepet |
integer indicating the number of repetitions |
optdim |
integer indicating the optimal number of global components to be introduced in the model |
cpus |
integer indicating the number of cpus to use when running the code in parallel |
... |
other arguments to be passed to methods |
no details are needed
XYcoef |
mean, standard deviation, quantiles (0.025;0.975), 95% confidence interval, median for regression coefficients |
faX |
mean, standard deviation, quantiles (0.025;0.975), 95% confidence interval, median for variable loadings |
vipc |
mean, standard deviation, quantiles (0.025;0.975), 95% confidence interval, median for cumulated variable importances |
bipc |
mean, standard deviation, quantiles (0.025;0.975), 95% confidence interval, median for cumulated block importances |
At least 30 bootstrap repetitions are recommended; more than 100 is preferable.
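The kind of summary listed above (mean, standard deviation, 0.025 and 0.975 quantiles, median) can be illustrated with a minimal base R sketch. It does not call boot_mbplsda: the toy data, the linear model and the names `toy`, `boot_slope` and `boot_summary` are assumptions made for illustration only, and the package may resample and summarise differently.

```r
# Sketch only: bootstrap summary of one regression coefficient on toy data.
set.seed(1)
n <- 40
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n)
nrepet <- 199

# Refit the model on rows drawn with replacement and keep the slope each time
boot_slope <- replicate(nrepet, {
  idx <- sample.int(n, n, replace = TRUE)
  coef(lm(y ~ x, data = toy[idx, ]))[["x"]]
})

# Summary statistics of the bootstrap distribution
boot_summary <- c(
  mean   = mean(boot_slope),
  sd     = sd(boot_slope),
  q025   = unname(quantile(boot_slope, 0.025)),
  q975   = unname(quantile(boot_slope, 0.975)),
  median = median(boot_slope)
)
print(round(boot_summary, 3))
```

With few repetitions the 0.025 and 0.975 quantiles rest on only a handful of extreme draws, which is one reason to prefer more than 100 repetitions.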
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Efron, B., Tibshirani, R.J. (1994). An Introduction to the Bootstrap. Chapman and Hall-CRC Monographs on Statistics and Applied Probability, Norwell, Massachusetts, United States.
mbplsda
plot_boot_mbplsda
packMBPLSDA-package
    # ktab.list.df() and dudi.pca() come from the ade4 package
    library(ade4)
    library(packMBPLSDA)
    data(status)
    data(medical)
    data(omics)
    data(nutrition)
    ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
    disjonctif <- disjunctive(status)
    dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
    # Number of components retained for the bootstrap
    ncpopt <- 1
    modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
    resboot <- boot_mbplsda(modelembplsQ, optdim = ncpopt, nrepet = 30, cpus = 1)
Function to perform 2-fold cross-validation for multi-block partial least squares discriminant analysis, in order to get, for each observation, the cross-validated predicted categories and a statistical description of the predictions (mean, sd, 95% confidence interval, as detailed in the returned values).
cvpred_mbplsda(object, nrepet = 100, threshold = 0.5, bloY, optdim, cpus = 1, algo = c("max", "gravity", "threshold"))
object |
an object created by mbplsda |
nrepet |
integer indicating the number of repetitions |
threshold |
numeric indicating the threshold, between 0 and 1, to consider the categories are predicted with the threshold prediction method. |
bloY |
integer vector indicating the number of categories per variable of the Y-block. |
optdim |
integer indicating the (optimal) number of components of the multi-block partial least squares discriminant model |
cpus |
integer indicating the number of cpus to use when running the code in parallel |
algo |
character vector indicating the method(s) of prediction to use (see details) |
Three different algorithms are available to predict the categories of the observations. In the max and threshold algorithms, numeric values are calculated from the matrix of explanatory variables and the regression coefficients. With max, the predicted category for each variable of the Y-block is the one with the highest predicted value; with threshold, every category whose predicted value is higher than the indicated threshold is predicted. In the gravity algorithm, the predicted scores of the observations on the components are calculated, and each observation is assigned to the observed category whose barycentre is closest in the component space.
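The three decision rules can be sketched in base R on toy numbers. This is an illustration under assumptions, not the package code: the matrices `pred` and `scores`, the labels `catA` and `catB`, and the use of squared Euclidean distance for the gravity rule are all hypothetical.

```r
# Toy predicted values: 4 observations, one Y variable with 2 categories
pred <- matrix(c(0.80, 0.20,
                 0.40, 0.60,
                 0.55, 0.45,
                 0.30, 0.70),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("obs", 1:4), c("catA", "catB")))

# "max" rule: keep the category with the highest predicted value
pred_max <- colnames(pred)[max.col(pred, ties.method = "first")]

# "threshold" rule: keep every category whose predicted value exceeds the threshold
threshold <- 0.5
pred_threshold <- pred > threshold

# "gravity" rule: assign to the category whose barycentre is closest in score space
scores <- matrix(c(-1.0,  0.1,
                    0.9, -0.2,
                   -0.8,  0.3,
                    1.1,  0.0),
                 ncol = 2, byrow = TRUE)
observed <- factor(c("catA", "catB", "catA", "catB"))
bary <- rowsum(scores, observed) / as.vector(table(observed))  # one row per category
new_score <- c(0.7, 0.1)                                       # scores of an observation to classify
d2 <- rowSums(sweep(bary, 2, new_score)^2)                     # squared distances to each barycentre
pred_gravity <- names(which.min(d2))

print(pred_max)
print(pred_threshold)
print(pred_gravity)
```

Unlike max, the threshold rule can select zero or several categories for one observation, which is why its result is kept here as a logical matrix rather than a single label.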
TRUEnrepet |
number of repetitions |
matPredYc.max |
with the max algorithm, boolean matrix indicating the cross-validated predicted categories on the calibration datasets, the prediction accuracy for each category, each Y-block variable, and overall |
matPredYv.max |
with the max algorithm, boolean matrix indicating the cross-validated predicted categories on the validation datasets, the prediction accuracy for each category, each Y-block variable, and overall |
matPredYc.gravity |
with the gravity algorithm, boolean matrix indicating the cross-validated predicted categories on the calibration datasets, the prediction accuracy for each category, each Y-block variable, and overall |
matPredYv.gravity |
with the gravity algorithm, boolean matrix indicating the cross-validated predicted categories on the validation datasets, the prediction accuracy for each category, each Y-block variable, and overall |
matPredYc.threshold |
with the threshold algorithm, boolean matrix indicating the cross-validated predicted categories on the calibration datasets, the prediction accuracy for each category, each Y-block variable, and overall |
matPredYv.threshold |
with the threshold algorithm, boolean matrix indicating the cross-validated predicted categories on the validation datasets, the prediction accuracy for each category, each Y-block variable, and overall |
statPredYc.max |
with the max algorithm, matrix indicating the statistical description of prediction categories for each observation on the calibration datasets: number of predictions as an observation of the calibration dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
statPredYv.max |
with the max algorithm, matrix indicating the statistical description of prediction categories for each observation on the validation datasets: number of predictions as an observation of the validation dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
statPredYc.gravity |
with the gravity algorithm, matrix indicating the statistical description of prediction categories for each observation on the calibration datasets: number of predictions as an observation of the calibration dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
statPredYv.gravity |
with the gravity algorithm, matrix indicating the statistical description of prediction categories for each observation on the validation datasets: number of predictions as an observation of the validation dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
statPredYc.threshold |
with the threshold algorithm, matrix indicating the statistical description of prediction categories for each observation on the calibration datasets: number of predictions as an observation of the calibration dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
statPredYv.threshold |
with the threshold algorithm, matrix indicating the statistical description of prediction categories for each observation on the validation datasets: number of predictions as an observation of the validation dataset, modal value, probability to be predicted with its standard deviation, 95% confidence interval, quantiles 0.025 and 0.975, median value |
at least 90 cross-validation repetitions may be recommended
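Repeated 2-fold cross-validation can be pictured with a short base R sketch that only builds the random splits. The simple unstratified split and the names `splits` and `n_valid` are assumptions for illustration; the package may split the observations differently.

```r
set.seed(42)
n <- 40        # number of observations
nrepet <- 90   # number of cross-validation repetitions

# For each repetition, split the observations at random into two halves:
# one used for calibration (model fitting), the other for validation (prediction)
splits <- lapply(seq_len(nrepet), function(i) {
  calib <- sort(sample.int(n, size = n %/% 2))
  list(calibration = calib, validation = setdiff(seq_len(n), calib))
})

# Number of times each observation falls in a validation set
n_valid <- tabulate(unlist(lapply(splits, `[[`, "validation")), nbins = n)
print(summary(n_valid))
```

Each observation is predicted as a validation observation in only about half of the repetitions, so the number of repetitions directly limits how many predictions feed the per-observation statistics.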
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(2), 111-147.
mbplsda
plot_cvpred_mbplsda
packMBPLSDA-package
    # ktab.list.df() and dudi.pca() come from the ade4 package
    library(ade4)
    library(packMBPLSDA)
    data(status)
    data(medical)
    data(omics)
    data(nutrition)
    disjonctif <- disjunctive(status)
    dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
    bloYobs <- 2
    ncpopt <- 1

    # Quick run: subset of the variables, 30 repetitions
    ktabX <- ktab.list.df(list(medical = medical[,1:10], nutrition = nutrition[,1:10], omics = omics[,1:20]))
    modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
    CVpred <- cvpred_mbplsda(modelembplsQ, nrepet = 30, threshold = 0.5, bloY = bloYobs, optdim = ncpopt, cpus = 1, algo = c("max"))

    # Full run: all variables, 90 repetitions (dudiY, bloYobs and ncpopt are reused)
    ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
    modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
    CVpred <- cvpred_mbplsda(modelembplsQ, nrepet = 90, threshold = 0.5, bloY = bloYobs, optdim = ncpopt, cpus = 1, algo = c("max"))
Function to transform a boolean matrix into a disjunctive table
disjunctive(y)
y |
boolean matrix indicating the categories of the observations |
no details are needed
ydisj |
disjunctive table |
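A disjunctive table holds one indicator column per category, with a single 1 per observation for each coded variable. The base R sketch below shows the idea on a hypothetical factor; `status_toy` and `ydisj_toy` are illustrative names, and the exact input handling and column naming of disjunctive() may differ.

```r
# Hypothetical status for four observations
status_toy <- factor(c("case", "control", "control", "case"))

# One 0/1 indicator column per level of the factor
ydisj_toy <- sapply(levels(status_toy), function(l) as.integer(status_toy == l))
rownames(ydisj_toy) <- paste0("obs", seq_along(status_toy))
print(ydisj_toy)
```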
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
    data(status)
    disjonctif <- disjunctive(status)
function to calculate the generalized inverse of a matrix X
ginv(X, tol = sqrt(.Machine$double.eps))
X |
Matrix for which the generalized inverse is required |
tol |
A relative tolerance to detect zero singular values |
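For readers unfamiliar with generalized inverses, the following base R sketch computes a Moore-Penrose pseudoinverse from a singular value decomposition, treating singular values below `tol` times the largest one as zero. It is the usual textbook construction and is offered as an assumption about what ginv does, not as the package's source; `ginv_sketch` is a hypothetical name.

```r
ginv_sketch <- function(X, tol = sqrt(.Machine$double.eps)) {
  X <- as.matrix(X)
  s <- svd(X)
  # Keep only the singular values that are non-zero relative to the largest one
  keep <- s$d > max(tol * s$d[1], 0)
  if (!any(keep)) {
    return(matrix(0, nrow = ncol(X), ncol = nrow(X)))
  }
  # V * diag(1/d) * t(U), restricted to the retained singular values
  s$v[, keep, drop = FALSE] %*% ((1 / s$d[keep]) * t(s$u[, keep, drop = FALSE]))
}

# Rank-deficient example: the second column is twice the first
X <- matrix(c(1, 2,
              2, 4,
              3, 6), nrow = 3, byrow = TRUE)
G <- ginv_sketch(X)
print(all.equal(X %*% G %*% X, X))
```

The last line checks the defining property X G X = X, which holds even though X has no ordinary inverse.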
function to calculate the inertia of a matrix
inertie(tab)
tab |
a matrix |
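As an illustration, the sketch below takes inertia to mean the total sum of squares of the column-centred table. That definition is an assumption, since the text above does not spell it out, and `inertia_sketch` is a hypothetical name rather than the package function.

```r
# Assumed definition: sum of squared entries after centring each column
inertia_sketch <- function(tab) {
  tab <- scale(as.matrix(tab), center = TRUE, scale = FALSE)
  sum(tab^2)
}

tab <- matrix(c(1, 2, 3,
                2, 4, 6), ncol = 2)  # columns (1, 2, 3) and (2, 4, 6)
print(inertia_sketch(tab))           # 2 + 8 = 10
```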
Function to perform a multi-block partial least squares discriminant analysis (MBPLSDA) of several explanatory blocks defined as an object of class ktab, to explain a dependent dataset (Y-block) defined as an object of class dudi, in order to get model parameters for the indicated number of components.
mbplsda(dudiY, ktabX, scale = TRUE, option = c("uniform", "none"), scannf = TRUE, nf = 2)
dudiY |
an object of class dudi containing the dependent variables |
ktabX |
an object of class ktab containing the blocks of explanatory variables |
scale |
logical value indicating whether the explanatory variables should be standardized |
option |
option for the block weighting. If uniform, the weight of each explanatory block is equal to 1/number of explanatory blocks, and the weight of the Y-block is equal to 1. If none, the block weight is equal to the block inertia. |
scannf |
logical value indicating whether the eigenvalues bar plot should be displayed |
nf |
integer indicating the number of components to be calculated |
no details are needed
call |
the matching call |
tabX |
data frame of explanatory variables centered, optionally scaled (if scale=TRUE) and weighted (if option="uniform") |
tabY |
data frame of dependent variables centered, optionally scaled (if scale=TRUE) and weighted (if option="uniform") |
nf |
integer indicating the number of kept dimensions |
lw |
numeric vector of row weights |
X.cw |
numeric vector of column weights for the explanatory dataset |
blo |
vector of the numbers of variables in each explanatory dataset |
rank |
rank of the analysis |
eig |
numeric vector containing the eigenvalues |
TL |
dataframe useful to manage graphical outputs |
TC |
dataframe useful to manage graphical outputs |
faX |
matrix containing the global variable loadings associated with the global explanatory dataset |
Tc1 |
matrix containing the partial variable loadings associated with each explanatory dataset (unit norm) |
Yc1 |
matrix of the variable loadings associated with the dependent dataset |
lX |
matrix of the global components associated with the whole explanatory dataset (scores of the individuals) |
TlX |
matrix containing the partial components associated with each explanatory dataset |
lY |
matrix of the components associated with the dependent dataset |
cov2 |
squared covariance between lY and TlX |
XYcoef |
list of matrices of the regression coefficients of the whole explanatory dataset onto the dependent dataset |
intercept |
intercept of the regression of the whole explanatory dataset onto the dependent dataset |
XYcoef.raw |
list of matrices of the regression coefficients of the whole raw explanatory dataset onto the raw dependent dataset |
intercept.raw |
intercept of the regression of the whole raw explanatory dataset onto the raw dependent dataset |
bip |
block importances for a given dimension |
bipc |
cumulated block importances for a given number of dimensions |
vip |
variable importances for a given dimension |
vipc |
cumulated variable importances for a given number of dimensions |
This function is derived from the mbpls function of the R package ade4, adapted to explain a disjunctive table and to limit the number of calculated components.
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Bougeard, S. and Dray, S. (2018). Supervised Multiblock Analysis in R with the ade4 Package. Journal of Statistical Software, 86(1), 1-17.
    # ktab.list.df() and dudi.pca() come from the ade4 package
    library(ade4)
    library(packMBPLSDA)
    data(status)
    data(medical)
    data(omics)
    data(nutrition)
    ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
    disjonctif <- disjunctive(status)
    dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
    modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
extract of modified medical data obtained from physical examination and questionnaires in a human cohort study
data("medical")
A data frame with 40 observations on the following 18 variables.
medic1: a numeric vector
medic2: a numeric vector
medic3: a numeric vector
medic4: a numeric vector
medic5: a numeric vector
medic6: a numeric vector
medic7: a numeric vector
medic8: a numeric vector
medic9: a numeric vector
medic10: a numeric vector
medic11: a numeric vector
medic12: a numeric vector
medic13: a numeric vector
medic14: a numeric vector
medic15: a numeric vector
medic16: a numeric vector
medic17: a numeric vector
medic18: a numeric vector
no details are needed
non-real data
data(medical)
extract of modified nutritional data obtained by analysis of food questionnaires in a human cohort study
data("nutrition")
A data frame with 40 observations on the following 33 variables.
nutri1: a numeric vector
nutri2: a numeric vector
nutri3: a numeric vector
nutri4: a numeric vector
nutri5: a numeric vector
nutri6: a numeric vector
nutri7: a numeric vector
nutri8: a numeric vector
nutri9: a numeric vector
nutri10: a numeric vector
nutri11: a numeric vector
nutri12: a numeric vector
nutri13: a numeric vector
nutri14: a numeric vector
nutri15: a numeric vector
nutri16: a numeric vector
nutri17: a numeric vector
nutri18: a numeric vector
nutri19: a numeric vector
nutri20: a numeric vector
nutri21: a numeric vector
nutri22: a numeric vector
nutri23: a numeric vector
nutri24: a numeric vector
nutri25: a numeric vector
nutri26: a numeric vector
nutri27: a numeric vector
nutri28: a numeric vector
nutri29: a numeric vector
nutri30: a numeric vector
nutri31: a numeric vector
nutri32: a numeric vector
nutri33: a numeric vector
no details are needed
non-real data
data(nutrition)
extract of modified metabolomic data obtained by LC-MS analysis of human plasma samples in a cohort study
data("omics")
A data frame with 40 observations on the following 46 variables.
omic1: a numeric vector of relative intensities
omic2: a numeric vector of relative intensities
omic3: a numeric vector of relative intensities
omic4: a numeric vector of relative intensities
omic5: a numeric vector of relative intensities
omic6: a numeric vector of relative intensities
omic7: a numeric vector of relative intensities
omic8: a numeric vector of relative intensities
omic9: a numeric vector of relative intensities
omic10: a numeric vector of relative intensities
omic11: a numeric vector of relative intensities
omic12: a numeric vector of relative intensities
omic13: a numeric vector of relative intensities
omic14: a numeric vector of relative intensities
omic15: a numeric vector of relative intensities
omic16: a numeric vector of relative intensities
omic17: a numeric vector of relative intensities
omic18: a numeric vector of relative intensities
omic19: a numeric vector of relative intensities
omic20: a numeric vector of relative intensities
omic21: a numeric vector of relative intensities
omic22: a numeric vector of relative intensities
omic23: a numeric vector of relative intensities
omic24: a numeric vector of relative intensities
omic25: a numeric vector of relative intensities
omic26: a numeric vector of relative intensities
omic27: a numeric vector of relative intensities
omic28: a numeric vector of relative intensities
omic29: a numeric vector of relative intensities
omic30: a numeric vector of relative intensities
omic31: a numeric vector of relative intensities
omic32: a numeric vector of relative intensities
omic33: a numeric vector of relative intensities
omic34: a numeric vector of relative intensities
omic35: a numeric vector of relative intensities
omic36: a numeric vector of relative intensities
omic37: a numeric vector of relative intensities
omic38: a numeric vector of relative intensities
omic39: a numeric vector of relative intensities
omic40: a numeric vector of relative intensities
omic41: a numeric vector of relative intensities
omic42: a numeric vector of relative intensities
omic43: a numeric vector of relative intensities
omic44: a numeric vector of relative intensities
omic45: a numeric vector of relative intensities
omic46: a numeric vector of relative intensities
no details are needed
non-real data
data(omics)
Function to perform permutation testing with 2-fold cross-validation for multi-block partial least squares discriminant analysis, in order to evaluate model validity and predictivity
permut_mbplsda(object, optdim, bloY, algo = c("max", "gravity", "threshold"), threshold = 0.5, nrepet = 100, npermut = 100, nbObsPermut = NULL, outputs = c("ER", "ConfMat", "AUC"), cpus = 1)
object |
an object created by mbplsda |
optdim |
integer indicating the (optimal) number of components of the multi-block partial least squares discriminant model |
bloY |
integer vector indicating the number of categories per variable of the Y-block. |
algo |
character vector indicating the method(s) of prediction to use (see details) |
threshold |
numeric indicating the threshold, between 0 and 1, to consider the categories are predicted with the threshold prediction method. |
nrepet |
integer indicating the number of repetitions |
npermut |
integer indicating the number of Y-blocks with switched observations |
nbObsPermut |
integer indicating the number of switched observations in all the modified Y-blocks |
outputs |
character vector indicating the wanted outputs (see details) |
cpus |
integer indicating the number of cpus to use when running the code in parallel |
Three different algorithms are available to predict the categories of the observations. In the max and threshold algorithms, numeric values are calculated from the matrix of explanatory variables and the regression coefficients. With max, the predicted category for each variable of the Y-block is the one with the highest predicted value; with threshold, every category whose predicted value is higher than the indicated threshold is predicted. In the gravity algorithm, the predicted scores of the observations on the components are calculated, and each observation is assigned to the observed category whose barycentre is closest in the component space.
If nbObsPermut is not NULL, t-tests are performed to compare the mean cross-validated overall prediction error rates (or areas under the ROC curve) evaluated on the permuted Y-blocks with the cross-validated overall prediction error rate (or area under the ROC curve) evaluated on the original Y-block.
Available outputs are Error Rates (ER), Confusion Matrix (ConfMat), and Area Under Curve (AUC).
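The permutation idea can be sketched in base R: switch the labels of a fixed number of observations, measure how much of the Y-block changed, then compare error rates obtained on permuted labels with the one obtained on the original labels. Everything here is hypothetical: the helper `switch_labels`, the error rates in `err_permuted` and `err_original`, and the choice of a one-sample t-test are assumptions, not the package's implementation.

```r
set.seed(7)
y <- factor(rep(c("case", "control"), each = 20))
nbObsPermut <- 10

# Shuffle the labels among nbObsPermut randomly chosen observations
switch_labels <- function(y, k) {
  idx <- sample.int(length(y), k)
  y_perm <- y
  y_perm[idx] <- sample(y[idx])
  y_perm
}
y_perm <- switch_labels(y, nbObsPermut)

# Overall percentage of modified values in the permuted Y-block
prct_change <- 100 * mean(y_perm != y)
print(prct_change)

# Hypothetical cross-validated error rates (made-up numbers, for illustration only)
err_permuted <- c(0.48, 0.52, 0.45, 0.55, 0.50, 0.47)  # on permuted Y-blocks
err_original <- 0.20                                   # on the original Y-block
tt <- t.test(err_permuted, mu = err_original)
print(tt$p.value)
```

A model that captures real structure should show a clearly lower error rate on the original labels than on the permuted ones.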
RV.YYpermut.values |
RV coefficient between Y-block and each Y-block with permuted values |
cor.YYpermut.values |
correlation coefficient between categories in the Y-block and each Y-block with permuted values |
prctGlob.Ychange.values |
overall percentage of modified values in each Y-block with permuted values |
prct.Ychange.values |
percentage per category of modified values in each Y-block with permuted values |
descrYperm |
statistical description of RV.YYpermut, cor.YYpermut, prctGlob.Ychange, prct.Ychange |
TruePosC.max, TruePosC.gravity, TruePosC.threshold |
statistical description of cross-validated percentages of true positive observations per category, evaluated on calibration datasets, with the different algorithms (TruePosC.max for "max", TruePosC.gravity for "gravity", TruePosC.threshold for "threshold"), for each Y-block with permuted values |
TruePosV.max, TruePosV.gravity, TruePosV.threshold |
statistical description of cross-validated percentages of true positive observations per category, evaluated on validation datasets, with the different algorithms (TruePosV.max for "max", TruePosV.gravity for "gravity", TruePosV.threshold for "threshold"), for each Y-block with permuted values |
TrueNegC.max, TrueNegC.gravity, TrueNegC.threshold |
statistical description of cross-validated percentages of true negative observations per category, evaluated on calibration datasets, with the different algorithms (TrueNegC.max for "max", TrueNegC.gravity for "gravity", TrueNegC.threshold for "threshold"), for each Y-block with permuted values |
TrueNegV.max, TrueNegV.gravity, TrueNegV.threshold |
statistical description of cross-validated percentages of true negative observations per category, evaluated on validation datasets, with the different algorithms (TrueNegV.max for "max", TrueNegV.gravity for "gravity", TrueNegV.threshold for "threshold"), for each Y-block with permuted values |
FalsePosC.max, FalsePosC.gravity, FalsePosC.threshold |
statistical description of cross-validated percentages of false positive observations per category, evaluated on calibration datasets, with the different algorithms (FalsePosC.max for "max", FalsePosC.gravity for "gravity", FalsePosC.threshold for "threshold"), for each Y-block with permuted values |
FalsePosV.max, FalsePosV.gravity, FalsePosV.threshold |
statistical description of cross-validated percentages of false positive observations per category, evaluated on validation datasets, with the different algorithms (FalsePosV.max for "max", FalsePosV.gravity for "gravity", FalsePosV.threshold for "threshold"), for each Y-block with permuted values |
FalseNegC.max, FalseNegC.gravity, FalseNegC.threshold
|
statistical description of cross-validated percentages of false negative observations per category, evaluated on calibration datasets, with the different algorithms (FalseNegC.max for "max", FalseNegC.gravity for "gravity", FalseNegC.threshold for "threshold"), for each Y-block with permuted values |
FalseNegV.max, FalseNegV.gravity, FalseNegV.threshold
|
statistical description of cross-validated percentages of false negative observations per category, evaluated on validation datasets, with the different algorithms (FalseNegV.max for "max", FalseNegV.gravity for "gravity", FalseNegV.threshold for "threshold"), for each Y-block with permuted values |
ErrorRateC.max, ErrorRateC.gravity, ErrorRateC.threshold
|
statistical description of cross-validated prediction error rates per category, evaluated on calibration datasets, with the different algorithms (ErrorRateC.max for "max", ErrorRateC.gravity for "gravity", ErrorRateC.threshold for "threshold"), for each Y-block with permuted values |
ErrorRateV.max, ErrorRateV.gravity, ErrorRateV.threshold
|
statistical description of cross-validated prediction error rates per category, evaluated on validation datasets, with the different algorithms (ErrorRateV.max for "max", ErrorRateV.gravity for "gravity", ErrorRateV.threshold for "threshold"), for each Y-block with permuted values |
ErrorRateCglobal.max, ErrorRateCglobal.gravity, ErrorRateCglobal.threshold
|
statistical description of cross-validated overall prediction error rates, evaluated on calibration datasets, with the different algorithms (ErrorRateCglobal.max for "max", ErrorRateCglobal.gravity for "gravity", ErrorRateCglobal.threshold for "threshold"), for each Y-block with permuted values |
ErrorRateVglobal.max, ErrorRateVglobal.gravity, ErrorRateVglobal.threshold
|
statistical description of cross-validated overall prediction error rates, evaluated on validation datasets, with the different algorithms (ErrorRateVglobal.max for "max", ErrorRateVglobal.gravity for "gravity", ErrorRateVglobal.threshold for "threshold"), for each Y-block with permuted values |
AUCc |
if all Y-block variables are binary, statistical description of cross-validated area under ROC curve values per category, evaluated on the calibration datasets, for each Y-block with permuted values |
AUCv |
if all Y-block variables are binary, statistical description of cross-validated area under ROC curve values per category, evaluated on the validation datasets, for each Y-block with permuted values |
AUCc.global |
if all Y-block variables are binary, statistical description of cross-validated overall area under ROC curve values, evaluated on the calibration datasets, for each Y-block with permuted values |
AUCv.global |
if all Y-block variables are binary, statistical description of cross-validated overall area under ROC curve values, evaluated on the validation datasets, for each Y-block with permuted values |
reg.GlobalRes_prctYchange |
results of the linear regression of overall prediction error rates, and of overall area under ROC curve, onto the percentages of modified values in the Y-block |
ttestMeanERv |
if nbObsPermut is not NULL, results of the t-test comparing the mean cross-validated overall prediction error rate (and, where available, area under ROC curve) evaluated on permuted Y-blocks with the cross-validated overall prediction error rate (and, where available, area under ROC curve) evaluated on the original Y-block |
At least 30 cross-validation repetitions and 100 Y-blocks with switched observations are recommended.
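The two summaries listed above (the regression of error rates onto the percentage of modified Y-block values, and the t-test against the error rate of the original Y-block) can be illustrated with base R. This is a minimal sketch on simulated numbers, not the package's internal code: the names `pct_changed`, `er_permuted` and `er_observed` are hypothetical, and the one-sample form of the t-test is an assumption made for illustration.

```r
# Simulated stand-ins for permutation results (hypothetical values, not package output)
set.seed(1)
npermut <- 100
pct_changed <- runif(npermut, min = 10, max = 50)   # % of Y-block values modified
er_permuted <- 0.20 + 0.006 * pct_changed + rnorm(npermut, sd = 0.03)
er_observed <- 0.20                                 # error rate on the original Y-block

# Regression of the overall error rate onto the percentage of modified values
fit <- lm(er_permuted ~ pct_changed)
slope <- unname(coef(fit)["pct_changed"])

# One-sample t-test (assumed form): is the mean permuted error rate
# larger than the error rate obtained with the original Y-block?
tt <- t.test(er_permuted, mu = er_observed, alternative = "greater")

print(round(coef(fit), 4))
print(tt$p.value)
```

A clearly positive slope and a small p-value both suggest that the model fitted on the original Y-block predicts better than models fitted on permuted labels.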
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Westerhuis, J.A., Hoefsloot, H.C.J., Smit, S., Vis, D.J., Smilde, A.K., van Velzen, E.J.J., van Duijnhoven, J.P.M., van Dorsten, F.A. (2008). Assessment of PLSDA cross validation. Metabolomics, 4, 81-89.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
plot_permut_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
# X-blocks restricted to the first 20 observations
ktabX <- ktab.list.df(list(medical = medical[1:20,], omics = omics[1:20,]))
# disjunctive (indicator) coding of the Y-block
disjonctif <- disjunctive(data.frame(status = status[1:20,], row.names = rownames(status)[1:20]))
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 1)
rtsPermut <- permut_mbplsda(modelembplsQ, nrepet = 30, npermut = 100, optdim = ncpopt, outputs = c("ER"), bloY = bloYobs, nbObsPermut = 10, cpus = 1, algo = c("max"))
Function to draw the results of the function boot_mbplsda (2-fold cross-validated parameter values) in a pdf file
plot_boot_mbplsda(obj, filename = "PlotBootstrapMbplsda", propbestvar = 0.5)
obj |
object of type list containing the results of the function boot_mbplsda |
filename |
a character string giving the name of the pdf file |
propbestvar |
numeric value between 0 and 1, indicating the proportion of variables with the best VIPc values to plot |
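To make the role of propbestvar concrete, the selection of the variables with the best VIPc values can be sketched in base R. The VIPc values below are invented, and rounding the number of kept variables up with `ceiling()` is an assumption of this sketch; the plotting function may round differently.

```r
# Hypothetical VIPc values for eight variables (illustrative only)
vipc <- c(v1 = 0.4, v2 = 1.8, v3 = 1.1, v4 = 0.2,
          v5 = 2.3, v6 = 0.9, v7 = 1.5, v8 = 0.6)
propbestvar <- 0.5

# Number of variables to keep; rounding up is an assumption of this sketch
n_keep <- ceiling(propbestvar * length(vipc))

# Variables ordered by decreasing VIPc, truncated to the requested proportion
best <- sort(vipc, decreasing = TRUE)[seq_len(n_keep)]
print(names(best))
```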
no details are needed
no numeric result
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Efron, B., Tibshirani, R.J. (1994). An Introduction to the Bootstrap. Chapman and Hall-CRC Monographs on Statistics and Applied Probability, Norwell, Massachusetts, United States.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
boot_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
# disjunctive (indicator) coding of the Y-block
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
resboot <- boot_mbplsda(modelembplsQ, optdim = ncpopt, nrepet = 30, cpus = 1)
# plot the 20% of variables with the best VIPc values
plot_boot_mbplsda(resboot, "plotBoot_nf1_30rep", propbestvar = 0.20)
Function to draw the results of the function cvpred_mbplsda (2-fold cross-validated predictions) in a pdf file
plot_cvpred_mbplsda(obj, filename = "PlotCVpredMbplsda")
obj |
object of type list containing the results of the function cvpred_mbplsda |
filename |
a character string giving the name of the pdf file |
no details are needed
no numeric result
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(2), 111-147.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
cvpred_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)

# Quick run on a subset of variables, 30 repetitions
ktabX <- ktab.list.df(list(medical = medical[,1:10], nutrition = nutrition[,1:10], omics = omics[,1:20]))
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
CVpred <- cvpred_mbplsda(modelembplsQ, nrepet = 30, threshold = 0.5, bloY = bloYobs, optdim = ncpopt, cpus = 1, algo = c("max"))
plot_cvpred_mbplsda(CVpred, "plotCVPred_nf1_30rep")

# Full run on all variables, 90 repetitions
ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
CVpred <- cvpred_mbplsda(modelembplsQ, nrepet = 90, threshold = 0.5, bloY = bloYobs, optdim = ncpopt, cpus = 1, algo = c("max"))
plot_cvpred_mbplsda(CVpred, "plotCVPred_nf1_90rep")
Function to draw the results of the function permut_mbplsda (plot and regression line of cross-validated prediction error rates, evaluated on the validation datasets, as a function of the percentage of modified Y-block values) in a pdf file
plot_permut_mbplsda(obj, filename = "PlotPermutationTest", MainPlot = "Permutation test results \n (subset of validation)")
obj |
object of type list containing the results of the function permut_mbplsda |
filename |
a character string giving the name of the pdf file |
MainPlot |
a character string giving the main title of the plot |
no details are needed
no numeric result
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Westerhuis, J.A., Hoefsloot, H.C.J., Smit, S., Vis, D.J., Smilde, A.K., van Velzen, E.J.J., van Duijnhoven, J.P.M., van Dorsten, F.A. (2008). Assessment of PLSDA cross validation. Metabolomics, 4, 81-89.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
permut_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
# X-blocks restricted to the first 20 observations
ktabX <- ktab.list.df(list(medical = medical[1:20,], omics = omics[1:20,]))
disjonctif <- disjunctive(data.frame(status = status[1:20,], row.names = rownames(status)[1:20]))
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 1)
ncpopt <- 1
rtsPermut <- permut_mbplsda(modelembplsQ, nrepet = 30, npermut = 100, optdim = ncpopt, outputs = c("ER"), bloY = bloYobs, nbObsPermut = 10, cpus = 1, algo = c("max"))
plot_permut_mbplsda(rtsPermut, "plotPermut_nf1_30rep_100perm")
Function to draw the results of the function pred_mbplsda (observed parameter values and predictions) in a pdf file
plot_pred_mbplsda(obj, filename = "PlotPredMbplsda", propbestvar = 0.5)
obj |
object of type list containing the results of the function pred_mbplsda |
filename |
a character string giving the name of the pdf file |
propbestvar |
numeric value between 0 and 1, indicating the proportion of variables with the best VIPc values to plot |
no details are needed
no numeric result
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
pred_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
# predictions with the three available algorithms
predictions <- pred_mbplsda(modelembplsQ, optdim = ncpopt, threshold = 0.5, bloY = bloYobs, algo = c("max", "gravity", "threshold"))
plot_pred_mbplsda(predictions, "plotPred_nf1", propbestvar = 0.20)
Function to draw the results of the function testdim_mbplsda (cross-validated prediction error rates, or area under ROC curve, as a function of the number of components in the model) in a pdf file
plot_testdim_mbplsda(obj, filename = "PlotTestdimMbplsda")
obj |
object of type list containing the results of the function testdim_mbplsda |
filename |
a character string giving the name of the pdf file |
no details are needed
no numeric result
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(2), 111-147.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
testdim_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
# X-blocks restricted to a subset of variables
ktabX <- ktab.list.df(list(medical = medical[,1:10], nutrition = nutrition[,1:10], omics = omics[,1:20]))
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 3)
resdim <- testdim_mbplsda(object = modelembplsQ, nrepet = 30, threshold = 0.5, bloY = bloYobs, cpus = 1, algo = c("max"), outputs = c("ER"))
plot_testdim_mbplsda(resdim, "plotTDim")
Function to predict categories from a multi-block partial least squares discriminant model.
pred_mbplsda(object, optdim, threshold = 0.5, bloY, algo = c("max", "gravity", "threshold"))
object |
an object created by mbplsda |
optdim |
integer indicating the (optimal) number of components of the multi-block partial least squares discriminant model |
threshold |
numeric value between 0 and 1 indicating the threshold above which a category is considered predicted by the threshold prediction method |
bloY |
integer vector indicating the number of categories per variable of the Y-block. |
algo |
character vector indicating the method(s) of prediction to use (see details) |
Three algorithms are available to predict the categories of observations. In the max and threshold algorithms, numeric values are calculated from the matrix of explanatory variables and the regression coefficients. With max, the predicted category for each variable of the Y-block is the one with the highest predicted value; with threshold, the predicted categories are those whose predicted value is higher than the indicated threshold. In the gravity algorithm, predicted scores of the observations on the components are calculated, and each observation is assigned to the observed category whose barycentre is closest in the component space.
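The three prediction rules described above can be sketched in base R on toy numbers. This is an illustration under stated assumptions, not the code of pred_mbplsda: the predicted values, component scores and object names (`pred`, `scores`, `bary`, `dist2`) are hypothetical, and the gravity rule here uses squared Euclidean distances.

```r
# Toy predicted values for 4 observations and one Y-block variable with 2 categories
# (hypothetical numbers; in practice they come from X and the regression coefficients)
pred <- matrix(c(0.80, 0.20,
                 0.40, 0.60,
                 0.55, 0.45,
                 0.30, 0.70),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("obs", 1:4), c("cas", "temoin")))

# "max": the category with the highest predicted value
pred_max <- colnames(pred)[max.col(pred, ties.method = "first")]

# "threshold": every category whose predicted value exceeds the threshold
threshold <- 0.5
pred_thr <- pred > threshold

# "gravity": nearest barycentre of the observed categories in the component space
scores <- matrix(c(-1.2,  0.1,
                    0.9, -0.3,
                   -0.2,  0.4,
                    1.1,  0.2),
                 ncol = 2, byrow = TRUE)   # hypothetical scores on 2 components
observed <- factor(c("cas", "temoin", "cas", "temoin"))
# one row per category, one column per component
bary <- apply(scores, 2, function(s) tapply(s, observed, mean))
# squared distance of each observation to each barycentre
dist2 <- sapply(rownames(bary), function(k) {
  centre <- matrix(bary[k, ], nrow(scores), ncol(scores), byrow = TRUE)
  rowSums((scores - centre)^2)
})
pred_grav <- colnames(dist2)[max.col(-dist2, ties.method = "first")]

print(pred_max)
print(pred_thr)
print(pred_grav)
```

Note that the threshold rule can select several categories, or none, for the same observation, whereas max and gravity always return exactly one.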
XYcoef |
list of matrices of the regression coefficients of the whole explanatory dataset onto the dependent dataset |
VIPc |
cumulated variable importances for a given number of dimensions |
BIPc |
cumulated block importances for a given number of dimensions |
faX |
matrix containing the global variable loadings associated with the global explanatory dataset |
lX |
matrix of the global components associated with the whole explanatory dataset (scores of the individuals) |
ConfMat.ErrorRate |
confusion matrix and prediction error rate per category |
ErrorRate.global |
confusion matrix and prediction error rate, per Y-block variable and overall |
PredY.max |
predictions and accuracy of predictions with the "max" algorithm |
PredY.gravity |
predictions and accuracy of predictions with the "gravity" algorithm |
PredY.threshold |
predictions and accuracy of predictions with the "threshold" algorithm |
AUC |
area under ROC curve value and 95% confidence interval, per category, per Y-block variable and overall |
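For readers unfamiliar with the AUC output, the area under the ROC curve for one category can be computed from predicted values with the rank (Mann-Whitney) formula. The sketch below uses invented scores and hypothetical names (`score`, `is_cas`); it does not reproduce the package's computation and leaves out the confidence interval.

```r
# Hypothetical predicted values for the category "cas" and observed membership
score  <- c(0.90, 0.80, 0.35, 0.60, 0.30, 0.20)
is_cas <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# AUC as the share of (positive, negative) pairs ranked in the right order
r     <- rank(score)
n_pos <- sum(is_cas)
n_neg <- sum(!is_cas)
auc   <- (sum(r[is_cas]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
```

Here 8 of the 9 possible pairs are ordered correctly, so the AUC is 8/9.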
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA(05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E.(2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL(01-27-2020 - 01-29-2020).
mbplsda
plot_pred_mbplsda
packMBPLSDA-package
library(packMBPLSDA)
library(ade4)  # provides ktab.list.df() and dudi.pca()
data(status)
data(medical)
data(omics)
data(nutrition)
ktabX <- ktab.list.df(list(medical = medical, nutrition = nutrition, omics = omics))
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
ncpopt <- 1
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 2)
# predictions with the three available algorithms
predictions <- pred_mbplsda(modelembplsQ, optdim = ncpopt, threshold = 0.5, bloY = bloYobs, algo = c("max", "gravity", "threshold"))
physiopathological status of men in a human cohort study
data("status")
A data frame with 40 observations on the following variable.
status: a factor with levels cas (case) and temoin (control)
no details are needed
extract of data not yet published
data(status)
Function to perform a two-fold cross-validation in order to select the optimal number of dimensions of a multi-block partial least squares discriminant model, according to the classification error rate or to the area under ROC curve
testdim_mbplsda(object, nrepet = 100, algo = c("max", "gravity", "threshold"), threshold = 0.5, bloY, outputs = c("ER", "ConfMat", "AUC"), cpus = 1)
object |
an object created by mbplsda_nfX |
nrepet |
integer indicating the number of repetitions |
algo |
character vector indicating the method(s) of prediction to use (see details) |
threshold |
numeric value between 0 and 1 indicating the threshold above which a category is considered predicted by the threshold prediction method |
bloY |
integer vector indicating the number of categories per variable of the Y-block. |
outputs |
character vector indicating the wanted outputs (see details) |
cpus |
integer indicating the number of cpus to use when running the code in parallel |
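The bloY argument can be derived from the Y-block itself. The sketch below, on an invented two-variable Y-block, counts the categories per variable and builds an indicator (disjunctive) table in base R; the column naming scheme is an assumption, and the package's own disjunctive function should be preferred in practice.

```r
# Hypothetical Y-block with two categorical variables (illustrative data)
Y <- data.frame(status = factor(c("cas", "temoin", "cas", "temoin")),
                grade  = factor(c("low", "mid", "high", "mid")))

# Number of categories per Y-block variable: the kind of vector bloY expects
bloY <- vapply(Y, nlevels, integer(1))

# Base-R indicator coding, one 0/1 column per category of each variable
indicator <- do.call(cbind, lapply(names(Y), function(v) {
  m <- sapply(levels(Y[[v]]), function(l) as.integer(Y[[v]] == l))
  colnames(m) <- paste(v, levels(Y[[v]]), sep = ".")
  m
}))

print(bloY)
print(indicator)
```

The indicator table has as many columns as the sum of bloY, and each row contains exactly one 1 per Y-block variable.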
Three algorithms are available to predict the categories of observations. In the max and threshold algorithms, numeric values are calculated from the matrix of explanatory variables and the regression coefficients. With max, the predicted category for each variable of the Y-block is the one with the highest predicted value; with threshold, the predicted categories are those whose predicted value is higher than the indicated threshold. In the gravity algorithm, predicted scores of the observations on the components are calculated, and each observation is assigned to the observed category whose barycentre is closest in the component space.
Available outputs are error rates (ER), confusion matrices (ConfMat) and area under the ROC curve (AUC).
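As a reminder of what the ER and ConfMat outputs summarise, the sketch below computes a confusion matrix, the overall error rate and the per-category true/false positive and negative counts from observed and predicted categories. The data and object names are hypothetical; the function itself reports statistical descriptions of such quantities over the cross-validation repetitions.

```r
# Hypothetical observed and predicted categories on one validation subset
lev <- c("cas", "temoin")
observed  <- factor(c("cas", "cas", "cas", "temoin",
                      "temoin", "temoin", "temoin", "cas"), levels = lev)
predicted <- factor(c("cas", "temoin", "cas", "temoin",
                      "temoin", "cas", "temoin", "cas"), levels = lev)

# Confusion matrix: rows are observed categories, columns are predicted ones
conf_mat <- table(observed = observed, predicted = predicted)

# Overall error rate: share of observations off the diagonal
error_rate <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

# Per-category counts, treating each category in turn as the "positive" one
tp <- diag(conf_mat)
fn <- rowSums(conf_mat) - tp
fp <- colSums(conf_mat) - tp
tn <- sum(conf_mat) - tp - fn - fp

print(conf_mat)
print(error_rate)
```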
TRUEnrepet |
number of repetitions |
TruePosC.max, .gravity, .threshold
|
statistical description of percentages of true positive observations per category, evaluated on the calibration dataset, with the different algorithms (TPcM for "max", TPcG for "gravity", TPcT for "threshold"), for a number of components ranging from 1 to its maximum value |
TruePosV.max, .gravity, .threshold
|
statistical description of percentages of true positive observations per category, evaluated on the validation dataset, with the different algorithms (TPvM for "max", TPvG for "gravity", TPvT for "threshold"), for a number of components ranging from 1 to its maximum value |
TrueNegC.max, .gravity, .threshold
|
statistical description of percentages of true negative observations per category, evaluated on the calibration dataset, with the different algorithms (TNcM for "max", TNcG for "gravity", TNcT for "threshold"), for a number of components ranging from 1 to its maximum value |
TrueNegV.max, .gravity, .threshold
|
statistical description of percentages of true negative observations per category, evaluated on the validation dataset, with the different algorithms (TNvM for "max", TNvG for "gravity", TNvT for "threshold"), for a number of components ranging from 1 to its maximum value |
FalsePosC.max, .gravity, .threshold
|
statistical description of percentages of false positive observations per category, evaluated on the calibration dataset, with the different algorithms (FPcM for "max", FPcG for "gravity", FPcT for "threshold"), for a number of components ranging from 1 to its maximum value |
FalsePosV.max, .gravity, .threshold
|
statistical description of percentages of false positive observations per category, evaluated on the validation dataset, with the different algorithms (FPvM for "max", FPvG for "gravity", FPvT for "threshold"), for a number of components ranging from 1 to its maximum value |
FalseNegC.max, .gravity, .threshold
|
statistical description of percentages of false negative observations per category, evaluated on the calibration dataset, with the different algorithms (FNcM for "max", FNcG for "gravity", FNcT for "threshold"), for a number of components ranging from 1 to its maximum value |
FalseNegV.max, .gravity, .threshold
|
statistical description of percentages of false negative observations per category, evaluated on the validation dataset, with the different algorithms (FNvM for "max", FNvG for "gravity", FNvT for "threshold"), for a number of components ranging from 1 to its maximum value |
ErrorRateC.max, .gravity, .threshold
|
statistical description of prediction error rates per category, evaluated on the calibration dataset, with the different algorithms (ERcM for "max", ERcG for "gravity", ERcT for "threshold"), for a number of components ranging from 1 to its maximum value |
ErrorRateV.max, .gravity, .threshold
|
statistical description of prediction error rates per category, evaluated on the validation dataset, with the different algorithms (ERvM for "max", ERvG for "gravity", ERvT for "threshold"), for a number of components ranging from 1 to its maximum value |
ErrorRateCglobal.max, .gravity, .threshold
|
statistical description of global prediction error rates, evaluated on the calibration dataset, with the different algorithms (ERcM.global for "max", ERcG.global for "gravity", ERcT.global for "threshold"), for a number of components ranging from 1 to its maximum value |
ErrorRateVglobal.max, .gravity, .threshold
|
statistical description of global prediction error rates, evaluated on the validation dataset, with the different algorithms (ERvM.global for "max", ERvG.global for "gravity", ERvT.global for "threshold"), for a number of components ranging from 1 to its maximum value |
AUCc |
statistical description of area under ROC curve values per category, evaluated on the calibration dataset, if all Y-block variables are binary, for a number of components ranging from 1 to its maximum value |
AUCv |
statistical description of area under ROC curve values per category, evaluated on the validation dataset, if all Y-block variables are binary, for a number of components ranging from 1 to its maximum value |
AUCc.global |
statistical description of global area under ROC curve values, evaluated on the calibration dataset, if all Y-block variables are binary, for a number of components ranging from 1 to its maximum value |
AUCv.global |
statistical description of global area under ROC curve values, evaluated on the validation dataset, if all Y-block variables are binary, for a number of components ranging from 1 to its maximum value |
At least 30 cross-validation repetitions are recommended.
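To see why the number of repetitions matters, the sketch below summarises simulated validation error rates over 30 repetitions and picks the number of components with the lowest mean. The data are random numbers, not package output, and the summary statistics chosen here (mean, standard deviation, normal-approximation 95% interval) are an assumption for illustration; the exact statistics returned by the function are not assumed.

```r
# Illustrative only: simulated validation error rates, not package output.
set.seed(1)
nrepet <- 30  # number of cross-validation repetitions
ncomp  <- 3   # maximum number of components tested

# One row per repetition, one column per number of components
er_valid <- matrix(runif(nrepet * ncomp, min = 0.1, max = 0.4),
                   nrow = nrepet, ncol = ncomp,
                   dimnames = list(NULL, paste0("ncomp", seq_len(ncomp))))

# Mean, standard deviation and normal-approximation 95% interval of the mean
summarise_er <- function(x) {
  m  <- mean(x)
  s  <- sd(x)
  se <- s / sqrt(length(x))
  c(mean = m, sd = s, lower95 = m - 1.96 * se, upper95 = m + 1.96 * se)
}

# Rows: number of components; columns: summary statistics
er_summary <- t(apply(er_valid, 2, summarise_er))
print(round(er_summary, 3))

# Number of components with the lowest mean validation error rate
best_ncomp <- which.min(er_summary[, "mean"])
print(best_ncomp)
```

The width of the interval shrinks with the square root of `nrepet`, so too few repetitions can make two candidate numbers of components indistinguishable.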
Marion Brandolini-Bunlon (<[email protected]>) and Stephanie Bougeard (<[email protected]>)
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(2), 111-147.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2019). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at 12emes Journees Scientifiques RFMF, Clermont-Ferrand, FRA (05-21-2019 - 05-23-2019).
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2019). Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics, 15(10):134.
Brandolini-Bunlon, M., Petera, M., Gaudreau, P., Comte, B., Bougeard, S., Pujos-Guillot, E. (2020). A new tool for multi-block PLS discriminant analysis of metabolomic data: application to systems epidemiology. Presented at Chimiometrie 2020, Liege, BEL (01-27-2020 - 01-29-2020).
mbplsda
plot_testdim_mbplsda
packMBPLSDA-package
library(ade4)         # provides ktab.list.df() and dudi.pca()
library(packMBPLSDA)
# Example datasets
data(status)
data(medical)
data(omics)
data(nutrition)
# Explanatory blocks, using a subset of variables from each dataset
ktabX <- ktab.list.df(list(medical = medical[, 1:10], nutrition = nutrition[, 1:10], omics = omics[, 1:20]))
# Response block: disjunctive coding of status, neither centred nor scaled
disjonctif <- disjunctive(status)
dudiY <- dudi.pca(disjonctif, center = FALSE, scale = FALSE, scannf = FALSE)
bloYobs <- 2
# Fit the model, then test the number of components with 30 repetitions
modelembplsQ <- mbplsda(dudiY, ktabX, scale = TRUE, option = "uniform", scannf = FALSE, nf = 3)
resdim <- testdim_mbplsda(object = modelembplsQ, nrepet = 30, threshold = 0.5, bloY = bloYobs, cpus = 1, algo = c("max"), outputs = c("ER"))