Title: | A Fast Implementation of Partial Least Squares |
---|---|
Description: | An implementation in 'Rcpp' / 'RcppArmadillo' of Partial Least Squares algorithms. The package also includes functions to perform double cross-validation and fast correlation. |
Authors: | Stefano Cacciatore [aut, trl, cre], Dupe Ojo [aut], Leonardo Tenori [aut], Alessia Vignoli [aut] |
Maintainer: | Stefano Cacciatore <[email protected]> |
License: | GPL-3 |
Version: | 0.2 |
Built: | 2024-12-12 06:49:07 UTC |
Source: | CRAN |
This function performs a fast calculation of Spearman's correlation coefficient.
fastcor(a, b=NULL, byrow=TRUE, diag=TRUE)
a |
a matrix of training set cases. |
b |
an optional second matrix to correlate against a. If NULL, correlations are computed within a. |
byrow |
if byrow = TRUE, rows are correlated (much faster); otherwise, columns are correlated. |
diag |
if diag = TRUE, only the diagonal of the correlation matrix is returned (much faster). |
A matrix of correlation coefficients.
Stefano Cacciatore, Leonardo Tenori, Dupe Ojo, Alessia Vignoli
data(iris)
data=as.matrix(iris[,-5])
fastcor(data)
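As a further illustration, the sketch below correlates paired columns of two matrices; the comparison with base R's cor(..., method = "spearman") is only an assumed sanity check, based on the Spearman coefficient stated in the description above.

data(iris)
a=as.matrix(iris[,-5])
b=a[sample(nrow(a)),]                      # a second matrix with the same dimensions
# column-wise correlations of paired columns (byrow=FALSE), diagonal only
fastcor(a, b, byrow=FALSE, diag=TRUE)
# assumed base R equivalent of the diagonal above
diag(cor(a, b, method="spearman"))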
This function performs a 10-fold cross-validation on a given data set using a Partial Least Squares (PLS) model. To assess the prediction ability of the model, the 10-fold cross-validation is conducted by generating splits of the data set with a 1:9 ratio, that is, by removing 10% of the samples prior to any step of the statistical analysis, including PLS component selection and scaling. The best number of PLS components is selected by means of 10-fold cross-validation on the remaining 90%, choosing the number with the best Q2y value. Permutation testing can be undertaken to estimate the classification/regression performance of the predictors.
optim.pls.cv(Xdata, Ydata, ncomp, constrain=NULL, scaling = c("centering", "autoscaling","none"), method = c("plssvd", "simpls"), svd.method = c("irlba", "dc"), kfold=10)
Xdata |
a matrix of independent variables or predictors. |
Ydata |
the responses. If Ydata is a numeric vector, a regression analysis will be performed. If Ydata is factor, a classification analysis will be performed. |
ncomp |
the number of latent components to be used for classification. |
constrain |
a vector of nrow(Xdata) elements. Samples with the same value of constrain are assigned together to either the training set or the test set during cross-validation. |
scaling |
the scaling method to be used. Choices are "centering", "autoscaling", and "none". |
method |
the algorithm to be used to perform the PLS. Choices are "plssvd" and "simpls". |
svd.method |
the SVD method to be used to perform the PLS. Choices are "irlba" and "dc". |
kfold |
the number of folds of the cross-validation. |
The output is a list with the following components:
B |
the (p x m x length(ncomp)) array containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the matrix B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix. |
Ypred |
the vector containing the predicted values of the response variables obtained by cross-validation. |
Yfit |
the vector containing the fitted values of the response variables. |
P |
the (p x max(ncomp)) matrix containing the X-loadings. |
Q |
the (m x max(ncomp)) matrix containing the Y-loadings. |
T |
the (ntrain x max(ncomp)) matrix containing the X-scores (latent components). |
R |
the (p x max(ncomp)) matrix containing the weights used to construct the latent components. |
Q2Y |
predictive power of the model. |
R2Y |
proportion of variance in Y explained by the model. |
R2X |
a vector containing the variance of X explained by each PLS component. |
txtQ2Y |
a summary of the Q2y values. |
txtR2Y |
a summary of the R2y values. |
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
data(iris)
data=iris[,-5]
labels=iris[,5]
pp=optim.pls.cv(data,labels,2:4)
pp$optim_comp
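A possible follow-up, sketched below, is to refit a single PLS model with the selected number of components; optim_comp is taken from the example above, and only arguments and components documented on this page are used.

data(iris)
data=iris[,-5]
labels=iris[,5]
pp=optim.pls.cv(data,labels,2:4)
# refit on the full data with the optimal number of components
final=pls(data, labels, ncomp=pp$optim_comp)
final$R2Y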
Partial Least Squares (PLS) classification and regression for a test set based on a training set.
pls(Xtrain, Ytrain, Xtest = NULL, Ytest = NULL, ncomp=min(5,c(ncol(Xtrain),nrow(Xtrain))), scaling = c("centering", "autoscaling","none"), method = c("plssvd", "simpls"), svd.method = c("irlba", "dc"), fit = FALSE, proj = FALSE, perm.test = FALSE, times = 100)
Xtrain |
a matrix of training set cases. |
Ytrain |
the responses for the training set. If Ytrain is a numeric vector, a regression analysis will be performed; if Ytrain is a factor, a classification analysis will be performed. |
Xtest |
a matrix of test set cases. |
Ytest |
the responses for the test set (optional). |
ncomp |
the number of components to consider. |
scaling |
the scaling method to be used. Choices are "centering", "autoscaling", and "none". |
method |
the algorithm to be used to perform the PLS. Choices are "plssvd" and "simpls". |
svd.method |
the SVD method to be used to perform the PLS. Choices are "irlba" and "dc". |
fit |
a boolean value to perform the fit. |
proj |
a boolean value to perform the projection of the test set. |
perm.test |
a boolean value indicating whether to perform a permutation test. |
times |
the number of permutations for the permutation test. |
A list with the following components:
B |
the (p x m x length(ncomp)) matrix containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the matrix B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix. |
Q |
the (m x max(ncomp)) matrix containing the Y-loadings. |
Ttrain |
the (ntrain x max(ncomp)) matrix containing the X-scores (latent components). |
R |
the (p x max(ncomp)) matrix containing the weights used to construct the latent components. |
mX |
mean X. |
vX |
variance X. |
mY |
mean Y. |
p |
matrix for the independent variable X. This indicates how the original data relates to the latent components. |
m |
the number of predictor variables. |
ncomp |
the number of components used in the PLS model. |
Yfit |
the fitted values of the response variables based on the PLS model. |
R2Y |
proportion of variance in Y explained by the model. |
classification |
a boolean value indicating whether the response variable corresponds to a classification analysis. |
lev |
the levels of the response variable Y. |
Ypred |
the (ntest x m x length(ncomp)) array containing the predicted values of the response variables for the observations from Xtest. The third dimension of the matrix Ypred corresponds to the number of PLS components used to compute the regression coefficients. |
P |
the (p x max(ncomp)) matrix containing the X-loadings. |
Ttest |
... |
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
data(iris)
data=iris[,-5]
labels=iris[,5]
ss=sample(150,15)
ncomponent=3
z=pls(data[-ss,], labels[-ss], data[ss,], ncomp=ncomponent)
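For a regression use case, a minimal sketch under the assumptions stated here: Sepal.Length is treated as a numeric response (so a regression analysis is performed, as described for Ytrain above), and only arguments and components documented on this page are used.

data(iris)
X=as.matrix(iris[,2:4])
y=iris[,1]                                 # numeric response -> regression
ss=sample(150,15)
# train on 90% of the samples, predict the held-out 10%
zr=pls(X[-ss,], y[-ss], X[ss,], Ytest=y[ss], ncomp=3)
zr$Ypred                                   # predicted values for the test set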
This function performs a 10-fold cross-validation on a given data set using a Partial Least Squares (PLS) model. To assess the prediction ability of the model, the 10-fold cross-validation is conducted by generating splits of the data set with a 1:9 ratio, that is, by removing 10% of the samples prior to any step of the statistical analysis, including PLS component selection and scaling. The best number of PLS components is selected by means of 10-fold cross-validation on the remaining 90%, choosing the number with the best Q2y value. Permutation testing can be undertaken to estimate the classification/regression performance of the predictors.
pls.double.cv(Xdata, Ydata, ncomp=min(5,c(ncol(Xdata),nrow(Xdata))), constrain=1:nrow(Xdata), scaling = c("centering", "autoscaling","none"), method = c("plssvd", "simpls"), svd.method = c("irlba", "dc"), perm.test=FALSE, times=100, runn=10, kfold_inner=10, kfold_outer=10)
Xdata |
a matrix of independent variables or predictors. |
Ydata |
the responses. If Ydata is a numeric vector, a regression analysis will be performed. If Ydata is factor, a classification analysis will be performed. |
ncomp |
the number of latent components to be used for classification. |
constrain |
a vector of nrow(Xdata) elements. Samples with the same value of constrain are assigned together to either the training set or the test set during cross-validation. |
scaling |
the scaling method to be used. Choices are "centering", "autoscaling", and "none". |
method |
the algorithm to be used to perform the PLS. Choices are "plssvd" and "simpls". |
svd.method |
the SVD method to be used to perform the PLS. Choices are "irlba" and "dc". |
perm.test |
a boolean value indicating whether to perform a permutation test. |
times |
the number of cross-validations with permuted samples. |
runn |
the number of runs of the cross-validation. |
kfold_inner |
the number of folds of the inner cross-validation loop, used for the optimization of the number of components. |
kfold_outer |
the number of folds of the outer cross-validation loop. |
A list with the following components:
B |
the (p x m x length(ncomp)) array containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the matrix B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix. |
Ypred |
the vector containing the predicted values of the response variables obtained by cross-validation. |
Yfit |
the vector containing the fitted values of the response variables. |
P |
the (p x max(ncomp)) matrix containing the X-loadings. |
Q |
the (m x max(ncomp)) matrix containing the Y-loadings. |
T |
the (ntrain x max(ncomp)) matrix containing the X-scores (latent components). |
R |
the (p x max(ncomp)) matrix containing the weights used to construct the latent components. |
Q2Y |
predictive power of the model. |
R2Y |
proportion of variance in Y explained by the model. |
R2X |
a vector containing the variance of X explained by each PLS component. |
txtQ2Y |
a summary of the Q2y values. |
txtR2Y |
a summary of the R2y values. |
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
data(iris)
data=iris[,-5]
labels=iris[,5]
pp=pls.double.cv(data,labels,2:4)
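A sketch of the same call with the permutation test enabled; perm.test and times are documented above, and Q2Y and txtQ2Y are components listed in the Value section (how the permutation result itself is reported is not shown here).

data(iris)
data=iris[,-5]
labels=iris[,5]
# enable the permutation test with 100 permutations
pp=pls.double.cv(data, labels, 2:4, perm.test=TRUE, times=100)
pp$Q2Y       # predictive power of the model
pp$txtQ2Y    # summary of the Q2y values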
Partial Least Squares (PLS) prediction of a test set from a model fitted on a training set.
## S3 method for class 'fastPLS'
predict(object, newdata, Ytest=NULL, proj=FALSE, ...)
object |
an object of class fastPLS, as returned by the pls function. |
newdata |
a matrix of predictor variables X for the test set. |
Ytest |
a vector of the response variable Y from Xtest. |
proj |
a boolean value indicating whether to return the projection (X-scores) of the test set. |
... |
further arguments. Currently not used. |
A list with the following components:
Ypred |
the (ntest x m x length(ncomp)) array containing the predicted values of the response variables for the observations from Xtest. The third dimension of the matrix Ypred corresponds to the number of PLS components used to compute the regression coefficients. |
Q2Y |
predictive power of the model. |
Ttest |
the (ntest x max(ncomp)) matrix containing the X-scores (latent components) of the test set. |
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
data(iris)
data=iris[,-5]
labels=iris[,5]
ss=sample(150,15)
ncomponent=3
z=pls(data[-ss,], labels[-ss], ncomp=ncomponent)
predict(z, data[ss,], proj=FALSE)
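A sketch of two further uses, relying only on the arguments and components documented above: supplying Ytest so the returned Q2Y can be inspected, and requesting the test-set scores with proj = TRUE.

data(iris)
data=iris[,-5]
labels=iris[,5]
ss=sample(150,15)
z=pls(data[-ss,], labels[-ss], ncomp=3)
# supply the true responses of the test set to obtain Q2Y
pr=predict(z, data[ss,], Ytest=labels[ss])
pr$Q2Y
# request the projection of the test set (X-scores)
pr2=predict(z, data[ss,], proj=TRUE)
pr2$Ttest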
This function converts a classification vector into a classification matrix.
transformy(y)
y |
a vector or factor. |
This function converts a classification vector into a classification matrix, allowing the different groups to be compared amongst each other.
A matrix.
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
y=c(1,1,1,1,2,2,2,3,3)
print(y)
z=transformy(y)
print(z)
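A sketch with a factor input, assuming factors are handled the same way as numeric class labels (as suggested by the "vector or factor" argument description above); the exact column layout of the returned matrix depends on the encoding used by transformy.

data(iris)
labels=iris[,5]          # a factor with three levels
z=transformy(labels)
dim(z)                   # assumed: one row per sample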
Variable Importance in the Projection (VIP) is a score that measures how important a variable is in a Partial Least Squares (PLS) model. VIP scores are used to identify which variables are most important in a model and are often used for variable selection.
ViP(model)
model |
an object returned by the pls function. |
A list with the following components:
B |
the (p x m x length(ncomp)) matrix containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the matrix B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix. |
Ypred |
the (ntest x m x length(ncomp)) array containing the predicted values of the response variables for the observations from Xtest. The third dimension of the matrix Ypred corresponds to the number of PLS components used to compute the regression coefficients. |
P |
the (p x max(ncomp)) matrix containing the X-loadings. |
Q |
the (m x max(ncomp)) matrix containing the Y-loadings. |
T |
the (ntrain x max(ncomp)) matrix containing the X-scores (latent components). |
R |
the (p x max(ncomp)) matrix containing the weights used to construct the latent components. |
Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori
data(iris)
data=as.matrix(iris[,-5])
labels=iris[,5]
pp=pls(data,labels,ncomp = 2)
ViP(pp)
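A sketch of how VIP scores might be used for variable selection; the exact structure of the value returned by ViP is not detailed above, so the per-variable score extraction below is an assumption and is left commented out.

data(iris)
data=as.matrix(iris[,-5])
labels=iris[,5]
pp=pls(data, labels, ncomp=2)
vip=ViP(pp)
# assumption: vip contains one importance score per predictor variable;
# a common rule of thumb retains variables with VIP > 1, e.g.:
# keep=colnames(data)[vip > 1]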