Title: | Multivariate Data Analysis Laboratory |
---|---|
Description: | An open-source implementation of latent variable methods and multivariate modeling tools. The focus is on exploratory analyses using dimensionality reduction methods including low dimensional embedding, classical multivariate statistical tools, and tools for enhanced interpretation of machine learning methods (i.e. intelligible models to provide important information for end-users). Target domains include extension to dedicated applications e.g. for manufacturing process modeling, spectroscopic analyses, and data mining. |
Authors: | Nelson Lee Afanador, Thanh Tran, Lionel Blanchet, and Richard Baumgartner |
Maintainer: | Nelson Lee Afanador <[email protected]> |
License: | GPL-3 |
Version: | 1.7 |
Built: | 2024-12-10 06:38:34 UTC |
Source: | CRAN |
Implementation of latent variable methods. The focus is on exploratory analysis using dimensionality reduction methods, such as Principal Component Analysis (PCA), and on multivariate regression based on Partial Least Squares regression (PLS). PLS analyses are supported by embedded bootstrapping and variable selection procedures.
Package: | mvdalab |
Type: | Package |
Version: | 1.0 |
Date: | 2015-08-10 |
License: | GPL-3 |
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected]), Lionel Blanchet ([email protected]), Richard Baumgartner ([email protected])
Maintainer: Nelson Lee Afanador ([email protected])
This function computes the autocorrelation function estimates for a selected parameter.
acfplot(object, parm = NULL)
object |
an object of class |
parm |
a chosen predictor variable; if |
This function computes the autocorrelation function estimates for a selected parameter, via acf, and generates a graph that allows the analyst to assess the need for an autocorrelation adjustment in the smc.
Nelson Lee Afanador ([email protected])
This function is built using the acf function in the stats R package.
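As a rough illustration of the underlying stats::acf call, the sketch below computes autocorrelation estimates for a simulated series and compares the lag-1 value with an approximate 95% white-noise bound. The simulated AR(1) series and the names x, bound, and needs_adjustment are assumptions made for illustration; this is not how acfplot extracts its input from a fitted model.

```r
set.seed(42)

# Simulated AR(1) series standing in for a sequence of parameter values
# (an assumption for illustration only)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# Autocorrelation estimates up to lag 10, without drawing the plot
ac <- acf(x, lag.max = 10, plot = FALSE)

# Approximate 95% bounds under the white-noise null hypothesis
bound <- qnorm(0.975) / sqrt(length(x))

# Element 1 of ac$acf is lag 0, so element 2 is lag 1
lag1 <- ac$acf[2]
needs_adjustment <- abs(lag1) > bound
```

A lag-1 estimate well outside the bound suggests that an autocorrelation adjustment is worth considering.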
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer-Verlag.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
acfplot(mod1, parm = NULL)
This function provides the actual versus predicted and residuals versus predicted plots as part of a model assessment.
ap.plot(object, ncomp = object$ncomp, verbose = FALSE)
object |
an object of class |
ncomp |
number of components used in the model assessment |
verbose |
output results as a data frame |
This function provides the actual versus predicted and residuals versus predicted plots as part of a model assessment across the desired number of latent variables. A smooth fit (dashed line) is added in order to detect curvature in the fit.
The output of ap.plot is a two-facet graph for actual versus predicted and residuals versus predicted plots.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
ap.plot(mod1, ncomp = 2)
Computes bootstrap BCa confidence intervals for chosen parameters for PLS models fitted with validation = "oob".
bca.cis(object, conf = .95, type = c("coefficients", "loadings", "weights"))
object |
an object of class |
conf |
desired confidence level |
type |
input parameter vector |
The function computes the bootstrap BCa confidence intervals for any mvdareg model fitted with validation = "oob".
Should be used in instances in which there is reason to suspect the percentile intervals. Results are provided across all latent variables (LVs). As such, it may be slow for models with a large number of LVs.
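To make the BCa idea concrete, here is a minimal base R sketch for a single scalar statistic. It follows the textbook construction (bias correction from the bootstrap distribution, acceleration from jackknife values). The function name bca_interval and its arguments are hypothetical, and the sketch is not the code used by bca.cis, which works on PLS coefficients, loadings, and weights.

```r
# Hypothetical helper: BCa interval for a scalar statistic of a numeric vector
bca_interval <- function(x, stat, B = 1000, conf = 0.95) {
  n <- length(x)
  theta_hat <- stat(x)

  # Ordinary nonparametric bootstrap replicates of the statistic
  boots <- replicate(B, stat(sample(x, n, replace = TRUE)))

  # Bias-correction constant: how far the bootstrap median sits from theta_hat
  z0 <- qnorm(mean(boots < theta_hat))

  # Acceleration constant from leave-one-out (jackknife) estimates
  jack <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  d <- mean(jack) - jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)

  # Adjust the nominal percentile levels, then read off bootstrap quantiles
  z <- qnorm(c((1 - conf) / 2, 1 - (1 - conf) / 2))
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  quantile(boots, probs = adj, names = FALSE)
}

set.seed(1)
x <- rexp(50)                      # skewed data, where BCa and percentile limits differ
ci <- bca_interval(x, mean)
```

When z0 and a are both zero the adjusted levels reduce to the nominal ones, so the BCa interval coincides with the percentile interval.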
A bca.cis object contains component results for the following:
ncomp |
number of components in the model |
variables |
variable names |
boot.mean |
mean of the bootstrap |
BCa percentiles |
confidence intervals |
proportional bias |
calculated bias |
skewness |
skewness of the bootstrap distribution |
a |
acceleration constant |
Nelson Lee Afanador ([email protected])
There are many references explaining the bootstrap and its implementation for confidence interval estimation. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.
Hinkley, D.V. (1988) Bootstrap methods (with Discussion). Journal of the Royal Statistical Society, B, 50, 312:337, 355:370.
data(Penta)
## Number of bootstraps set to 250 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 250)
bca.cis(mod1, conf = .95, type = "coefficients")
## Not run:
bca.cis(mod1, conf = .95, type = "loadings")
bca.cis(mod1, conf = .95, type = "weights")
## End(Not run)
Bidiagonalization algorithm for PLS1
bidiagpls.fit(X, Y, ncomp, ...)
X |
a matrix of observations. |
Y |
a vector. |
ncomp |
the number of components to include in the model (see below). |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through plsFit with the argument method="bidiagpls". It implements the Bidiag2 scores algorithm.
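For readers unfamiliar with bidiagonalization, the following base R sketch shows a Golub-Kahan (Bidiag2-style) recursion for a single response. It is an illustration under stated assumptions, not the package's implementation: the function name bidiag2_pls1 and its return fields are hypothetical, the data are only centred (not scaled), and no reorthogonalization is performed.

```r
# Hypothetical sketch of PLS1 via Golub-Kahan bidiagonalization
bidiag2_pls1 <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre predictors only
  y <- y - mean(y)                              # centre response
  n <- nrow(X); p <- ncol(X)
  W  <- matrix(0, p, ncomp)       # orthonormal weights
  Tm <- matrix(0, n, ncomp)       # orthonormal scores
  B  <- matrix(0, ncomp, ncomp)   # upper bidiagonal matrix, X W = Tm B

  w <- crossprod(X, y)
  w <- w / sqrt(sum(w^2))
  sc <- X %*% w
  alpha <- sqrt(sum(sc^2)); sc <- sc / alpha
  W[, 1] <- w; Tm[, 1] <- sc; B[1, 1] <- alpha

  if (ncomp > 1) for (a in 2:ncomp) {
    w <- crossprod(X, Tm[, a - 1]) - B[a - 1, a - 1] * W[, a - 1]
    beta <- sqrt(sum(w^2)); w <- w / beta
    sc <- X %*% w - beta * Tm[, a - 1]
    alpha <- sqrt(sum(sc^2)); sc <- sc / alpha
    W[, a] <- w; Tm[, a] <- sc
    B[a - 1, a] <- beta; B[a, a] <- alpha
  }

  # Least squares in the reduced space: solve B c = Tm'y, then map back with W
  coefs <- W %*% solve(B, crossprod(Tm, y))
  list(coefficients = coefs, weights = W, scores = Tm, D2 = B)
}

set.seed(1)
X <- matrix(rnorm(40 * 3), 40, 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(40, sd = 0.1)
fit <- bidiag2_pls1(X, y, ncomp = 3)
```

With as many components as predictors, the coefficients should agree with ordinary least squares on the centred data, which is a convenient sanity check for the recursion.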
An object of class mvdareg is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:
loadings |
X loadings |
weights |
weights |
D2 |
bidiag2 matrix |
iD2 |
inverse of bidiag2 matrix |
Ymean |
mean of response variable |
Xmeans |
mean of predictor variables |
coefficients |
regression coefficients |
y.loadings |
y-loadings |
scores |
X scores |
R |
orthogonal weights |
Y |
scaled response values |
Yactual |
actual response values |
fitted |
fitted values |
residuals |
residuals |
Xdata |
X matrix |
iPreds |
predicted values |
y.loadings2 |
scaled y-loadings |
fit.time |
model fitting time |
val.method |
validation method |
ncomp |
number of latent variables |
contrasts |
contrast matrix used |
method |
PLS algorithm used |
scale |
scaling used |
validation |
validation method |
call |
model call |
terms |
model terms |
model |
fitted model |
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
Indahl, Ulf G., (2014) The geometry of PLS1 explained properly: 10 key notes on mathematical properties of and some alternative algorithmic approaches to PLS1 modeling. Journal of Chemometrics, 28, 168:180.
Manne R., Analysis of two partial-least-squares algorithms for multi-variate calibration. Chemom. Intell. Lab. Syst. 1987; 2: 187:197.
Generates a 2D Graph of both the scores and loadings for both "mvdareg" and "mvdapca" objects.
BiPlot(object, diag.adj = c(0, 0), axis.scaling = 2, cov.scale = FALSE, comps = c(1, 2), col = "red", verbose = FALSE)
object |
an object of class |
diag.adj |
adjustment to singular values. See details. |
axis.scaling |
a graphing parameter for extending the axis. |
cov.scale |
implement covariance scaling |
comps |
the components to illustrate on the graph |
col |
the color applied to the scores |
verbose |
output results as a data frame |
"BiPlot" is used to extract a 2D graphical summary of the scores and loadings of PLS and PCA models.
The singular values are scaled so that the approximation becomes X = GH':
X = ULV' = (UL^alpha1)(L^alpha2 V') = GH', where alpha2 = 1 - alpha1.
The rows of the G matrix are plotted as points, corresponding to observations. The rows of the H matrix are plotted as vectors, corresponding to variables. The choice of alpha determines the following:
c(0, 0): Variables are scaled to unit length, and observations and variables are treated symmetrically.
c(0, 1): This biplot attempts to preserve relationships between variables, wherein the distance between any two rows of G is proportional to the Mahalanobis distance between the same observations in the data set.
c(1, 0): This biplot attempts to preserve the distance between observations, wherein the positions of the points in the biplot are identical to the score plot of the first two principal components, but the distance between any two rows of G is equal to the Euclidean distance between the corresponding observations in the data set.
cov.scale = FALSE sets diag.adj to c(0, 0), multiplies G by sqrt(n - 1), and divides H by sqrt(n - 1). In this biplot the rows of H approximate the variance of the corresponding variable, and the distance between any two points of G approximates the Mahalanobis distance between any two rows.
Additional scalings may be implemented.
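The factorization X = GH' can be sketched with base R's svd. The helper biplot_factors below is hypothetical and only illustrates how the exponents on the singular values split X into observation coordinates G and variable coordinates H; how BiPlot maps diag.adj onto these exponents internally is not shown here and should not be inferred from this sketch.

```r
# Centred and scaled numeric data (iris is used purely as an example)
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
s <- svd(X)

# Hypothetical helper: split the singular values between G and H
biplot_factors <- function(s, alpha1, alpha2 = 1 - alpha1) {
  G <- s$u %*% diag(s$d^alpha1)   # rows correspond to observations
  H <- s$v %*% diag(s$d^alpha2)   # rows correspond to variables
  list(G = G, H = H)
}

f <- biplot_factors(s, alpha1 = 1)

# With alpha1 + alpha2 = 1 the product G H' reproduces X
recon_ok <- isTRUE(all.equal(f$G %*% t(f$H), X, check.attributes = FALSE))

# With alpha1 = 1, distances between rows of G equal Euclidean distances in X
dist_ok <- isTRUE(all.equal(c(dist(f$G)), c(dist(X))))

# A 2D biplot would use the first two columns of G (points) and H (vectors)
G2 <- f$G[, 1:2]
H2 <- f$H[, 1:2]
```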
Nelson Lee Afanador ([email protected])
SAS Stat Studio 3.11 (2009), User's Guide.
Additional information pertaining to biplots can be obtained from the following:
Friendly, M. (1991), SAS System for Statistical Graphics , SAS Series in Statistical Applications, Cary, NC: SAS Institute
Gabriel, K. R. (1971), "The Biplot Graphical Display of Matrices with Applications to Principal Component Analysis," Biometrika , 58(3), 453–467.
Golub, G. H. and Van Loan, C. F. (1989), Matrix Computations , Second Edition, Baltimore: Johns Hopkins University Press.
Gower, J. C. and Hand, D. J. (1996), Biplots , London: Chapman & Hall.
Jackson, J. E. (1991), A User's Guide to Principal Components , New York: John Wiley & Sons.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
BiPlot(mod1, diag.adj = c(0, 0), axis.scaling = 2, cov.scale = FALSE)
## Not run:
data(Penta)
mod2 <- pcaFit(Penta[, -1], ncomp = 4)
BiPlot(mod2, diag.adj = c(0, 0), axis.scaling = 2.25, cov.scale = FALSE)
## End(Not run)
This takes an mvdareg object fitted with validation = "oob" and produces a graph of the bootstrap distribution and its corresponding normal quantile plot for a variable of interest.
boot.plots(object, comp = object$ncomp, parm = NULL, type = c("coefs", "weights", "loadings"))
object |
an object of class |
comp |
latent variable from which to generate the bootstrap distribution for a specific parameter |
parm |
a parameter for which to generate the bootstrap distribution |
type |
input parameter vector |
The function generates the bootstrap distribution and normal quantile plot for a bootstrapped mvdareg model given validation = "oob" for type = c("coefs", "weights", "loadings"). If parm = NULL a parameter is chosen at random.
The output of boot.plots is a histogram of the bootstrap distribution and the corresponding normal quantile plot.
Nelson Lee Afanador ([email protected])
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
boot.plots(mod1, type = "coefs", parm = NULL)
Functions to extract information from mvdalab objects.
## S3 method for class 'mvdareg'
coef(object, ncomp = object$ncomp, type = c("coefficients", "loadings", "weights", "y.loadings"), conf = .95, ...)
object |
an mvdareg object, i.e. a |
ncomp |
the number of components to include in the model (see below). |
type |
specify model parameters to return. |
conf |
for a bootstrapped model, the confidence level to use. |
... |
additional arguments. Currently ignored. |
These are usually called through their generic functions coef and residuals, respectively.
coef.mvdareg is used to extract the regression coefficients, loadings, or weights of a PLS model.
If ncomp is missing (or is NULL), all parameter estimates are returned.
coefficients |
a named vector, or matrix, of coefficients. |
loadings |
a named vector, or matrix, of loadings. |
weights |
a named vector, or matrix, of weights. |
y.loadings |
a named vector, or matrix, of y.loadings. |
Nelson Lee Afanador ([email protected])
coef, coefficients.boots, coefficients, loadings, loadings.boots, weights, weight.boots
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
coef(mod1, type = "coefficients")
Computes bootstrap BCa confidence intervals for regression coefficients, along with expanded bootstrap summaries.
coefficients.boots(object, ncomp = object$ncomp, conf = 0.95)
object |
an object of class |
ncomp |
number of components in the model |
conf |
desired confidence level |
The function computes the bootstrap BCa confidence intervals for fitted mvdareg models where validation = "oob".
Should be used in instances in which there is reason to suspect the percentile intervals. Results are provided across all latent variables or for specific latent variables via ncomp.
A coefficients.boots object contains component results for the following:
variable |
variable names |
actual |
Actual loading estimate using all the data |
BCa percentiles |
confidence intervals |
boot.mean |
mean of the bootstrap |
skewness |
skewness of the bootstrap distribution |
bias |
estimate of bias w.r.t. the loading estimate |
Bootstrap Error |
estimate of bootstrap standard error |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
There are many references explaining the bootstrap. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.
Hinkley, D.V. (1988) Bootstrap methods (with Discussion). Journal of the Royal Statistical Society, B, 50, 312:337, 355:370.
coef, coefficients, coefsplot
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
coefficients.boots(mod1, ncomp = 2, conf = .95)
Functions to extract regression coefficient bootstrap information from mvdalab objects.
## S3 method for class 'mvdareg'
coefficients(object, ncomp = object$ncomp, conf = .95, ...)
object |
an mvdareg object. A fitted model. |
ncomp |
the number of components to include in the model (see below). |
conf |
for a bootstrapped model, the confidence level to use. |
... |
additional arguments. Currently ignored. |
coefficients is used to extract a bootstrap summary of the regression coefficients of a PLS model.
If ncomp is missing (or is NULL), summaries for all regression estimates are returned. Otherwise, if ncomp is given, parameters for a model with only the requested components are returned.
Bootstrap summaries provided are for actual regression coefficients, bootstrap percentiles, bootstrap mean, skewness, and bias. These summaries can also be extracted using coefficients.boots.
A coefficients object contains a data frame with columns:
variable |
variable names |
Actual |
Actual loading estimate using all the data |
BCa percentiles |
confidence intervals |
boot.mean |
mean of the bootstrap |
skewness |
skewness of the bootstrap distribution |
bias |
estimate of bias w.r.t. the loading estimate |
Bootstrap Error |
estimate of bootstrap standard error |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
coef, coefficients.boots, coefficients
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
coefficients(mod1)
Functions to extract 2D graphical coefficients information from mvdalab objects.
coefficientsplot2D(object, comps = c(1, 2), verbose = FALSE)
object |
an |
comps |
a vector of length 2 corresponding to the number of components to include. |
verbose |
output results as a data frame |
coefficientsplot2D is used to extract a graphical summary of the coefficients of a PLS model.
If comps is missing (or is NULL), a graphical summary for the 1st and 2nd components is returned.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
coefficientsplot2D(mod1, comps = c(1, 2))
Functions to extract regression coefficient bootstrap information from mvdalab objects.
coefsplot(object, ncomp = object$ncomp, conf = 0.95, verbose = FALSE)
object |
an mvdareg object. A fitted model. |
ncomp |
the number of components to include. |
conf |
for a bootstrapped model, the confidence level to use. |
verbose |
output results as a data frame |
coefsplot is used to extract a graphical summary of the regression coefficients of a PLS model.
If ncomp is missing (or is NULL), a graphical summary of the regression estimates for the nth component is returned. Otherwise, if ncomp is given, a summary for a model with only the requested components is returned.
Bootstrap graphical summaries are provided when validation = "oob".
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
coefsplot(mod1, ncomp = 1:2)
Scores obtained from 87 college students on the College Level Examination Program and the College Qualification Test.
College
A data frame with 87 observations and the following 3 variables.
Science
Science (CQT) - numerical vector
Social
Social science and history (CLEP) - numerical vector
Verbal
Verbal (CQT) - numerical vector
Johnson, R.A., Wichern, D.W. (2002) Applied Multivariate Statistical Analysis. Prentice Hall.
This function generates a cell means contrast matrix to support PLS models.
contr.niets(n, contrasts)
n |
A vector of levels for a factor, or the number of levels. |
contrasts |
a logical indicating whether contrasts should be computed; set to |
This function uses contr.treatment to generate a cell means contrast matrix in support of PLS models.
For datasets with categorical variables it produces the needed design matrix.
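As a point of comparison, base R can produce a cell means (one indicator column per level) coding directly. The sketch below uses only stats functions; it illustrates the idea and makes no claim about how contr.niets is implemented internally. The factor f is an invented example.

```r
# A small example factor (invented for illustration)
f <- factor(c("A", "B", "C", "A"))

# With contrasts = FALSE, contr.treatment returns one indicator column per level
cm <- contr.treatment(levels(f), contrasts = FALSE)

# The matching design matrix: no intercept, one column per factor level
mm <- model.matrix(~ f - 1)
```

Each row of mm contains exactly one 1, so every level keeps its own column rather than being absorbed into a reference level.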
Nelson Lee Afanador
# Three levels
levels <- LETTERS[1:3]
contr.niets(levels)
# Two levels
levels <- LETTERS[1:2]
contr.niets(levels)
This function draws confidence ellipses for covariance and correlation matrices derived from either a matrix or dataframe.
ellipse.mvdalab(data, center = c(0, 0), radius = "chi", scale = TRUE, segments = 51, level = c(0.95, 0.99), plot.points = FALSE, pch = 1, size = 1, alpha = 0.5, verbose = FALSE, ...)
data |
A dataframe |
center |
2-element vector with coordinates of center of ellipse. |
radius |
Use of the Chi or F Distributions for setting the radius of the confidence ellipse |
scale |
use correlation or covariance matrix |
segments |
number of line-segments used to draw ellipse. |
level |
draw elliptical contours at these (normal) probability or confidence levels. |
pch |
symbols to use for scores |
size |
size to use for scores |
alpha |
transparency of scores |
plot.points |
Should the points be added to the graph. |
verbose |
output results as a data frame |
... |
additional arguments. Currently ignored. |
ellipse uses the singular value decomposition in order to generate the desired confidence regions. The default confidence ellipse is based on the chi-square statistic.
Returns a graph with the ellipses at the stated levels, as well as the ellipse coordinates.
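The construction can be sketched in base R: decompose the covariance matrix with svd, stretch a unit circle by the square roots of the singular values, and set the radius from a chi-square quantile with 2 degrees of freedom. The helper conf_ellipse is hypothetical and simplified (single level, covariance scaling, centre at the column means); it is not the code behind ellipse.mvdalab.

```r
# Hypothetical helper: coordinates of a chi-square confidence ellipse
conf_ellipse <- function(data, level = 0.95, segments = 51) {
  data <- as.matrix(data)
  center <- colMeans(data)
  sv <- svd(cov(data))                      # cov = U diag(d) U'
  radius <- sqrt(qchisq(level, df = 2))     # chi-square radius for 2 dimensions
  angles <- seq(0, 2 * pi, length.out = segments)
  unit_circle <- rbind(cos(angles), sin(angles))
  # Row j of the circle is stretched by sqrt(d[j]), then rotated by U
  pts <- radius * (sv$u %*% (sqrt(sv$d) * unit_circle))
  t(pts + center)                           # segments x 2 matrix of coordinates
}

ell <- conf_ellipse(iris[, 1:2])
```

Every returned point lies at the same Mahalanobis distance from the centre, which is what makes the curve a constant-probability contour under normality.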
Nelson Lee Afanador ([email protected])
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage.
data(iris)
ellipse.mvdalab(iris[, 1:2], plot.points = FALSE)
ellipse.mvdalab(iris[, 1:2], center = colMeans(iris[, 1:2]), plot.points = TRUE)
Imputes the mean or median for continuous variables; highest frequency for categorical variables.
imputeBasic(data, Init = "mean")
data |
a dataset with missing values |
Init |
For continuous variables impute either the mean or median |
A completed data frame is returned. For numeric variables, NAs are replaced with column means or medians. For categorical variables, NAs are replaced with the most frequent levels. If object contains no NAs, it is returned unaltered.
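The rule described above (mean or median for numeric columns, most frequent level otherwise) can be sketched in a few lines of base R. The function impute_basic and its init argument are hypothetical stand-ins for illustration and return only the completed data frame, not the list that imputeBasic returns.

```r
# Hypothetical sketch of column-wise mean/median/mode imputation
impute_basic <- function(data, init = c("mean", "median")) {
  init <- match.arg(init)
  center <- if (init == "mean") mean else median
  data[] <- lapply(data, function(col) {
    if (!anyNA(col)) return(col)              # complete columns are left as they are
    if (is.numeric(col)) {
      col[is.na(col)] <- center(col, na.rm = TRUE)
    } else {
      counts <- table(col)                    # table() ignores NA by default
      col[is.na(col)] <- names(counts)[which.max(counts)]
    }
    col
  })
  data
}

df <- data.frame(x = c(1, NA, 3), g = factor(c("a", NA, "a")))
out <- impute_basic(df)
```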
imputeBasic returns a list containing the following components:
Imputed.DataFrame |
Final imputed data frame |
Imputed.Missing.Continous |
Imputed continuous values |
Imputed.Missing.Factors |
Imputed categorical values |
Nelson Lee Afanador ([email protected])
dat <- introNAs(iris, percent = 25)
imputeBasic(dat)
Missing values are iteratively updated via an EM algorithm.
imputeEM(data, impute.ncomps = 2, pca.ncomps = 2, CV = TRUE, Init = "mean", scale = TRUE, iters = 25, tol = .Machine$double.eps^0.25)
data |
a dataset with missing values. |
impute.ncomps |
integer corresponding to the minimum number of components to test. |
pca.ncomps |
minimum number of components to use in the imputation. |
CV |
Use cross-validation in determining the optimal number of components to retain for the final imputation. |
Init |
For continuous variables impute either the mean or median. |
scale |
Scale variables to unit variance. |
iters |
number of iterations to use in the EM algorithm. |
tol |
the threshold for assessing convergence. |
A completed data frame is returned that mirrors a model.matrix. NAs are replaced with convergence values as obtained via EM. If object contains no NAs, it is returned unaltered.
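An EM-style imputation of this kind can be sketched with a low-rank PCA reconstruction: fill the missing cells with column means, fit a rank-k model, replace the missing cells with their reconstructed values, and repeat until the updates stop changing. The function impute_em_pca below is a hypothetical base R illustration for numeric matrices only. It omits scaling, cross-validation, and categorical handling, so it is not equivalent to imputeEM.

```r
# Hypothetical sketch of iterative PCA (EM-style) imputation for numeric data
impute_em_pca <- function(X, ncomp = 2, iters = 25,
                          tol = .Machine$double.eps^0.25) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Starting values: column means of the observed entries
  col_means <- colMeans(X, na.rm = TRUE)
  X[miss] <- col_means[col(X)[miss]]
  for (i in seq_len(iters)) {
    mu <- colMeans(X)
    s <- svd(sweep(X, 2, mu), nu = ncomp, nv = ncomp)
    # Rank-ncomp reconstruction, shifted back to the original location
    recon <- s$u %*% (s$d[seq_len(ncomp)] * t(s$v))
    recon <- sweep(recon, 2, mu, "+")
    change <- sum((X[miss] - recon[miss])^2)
    X[miss] <- recon[miss]                 # only missing cells are ever updated
    if (change < tol) break
  }
  X
}

set.seed(1)
X_full <- as.matrix(iris[, 1:4])
idx <- sample(length(X_full), 60)          # positions to blank out
X_na <- X_full
X_na[idx] <- NA
X_imp <- impute_em_pca(X_na)
```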
imputeEM returns a list containing the following components:
Imputed.DataFrames |
A list of imputed data frames across |
Imputed.Continous |
A list of imputed values, at each EM iteration, across |
CV.Results |
Cross-validation results across |
ncomps |
|
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
B. Walczak, D.L. Massart. Dealing with missing data, Part I. Chemom. Intell. Lab. Syst. 58 (2001); 15:27
dat <- introNAs(iris, percent = 25)
imputeEM(dat)
Missing values are imputed as 'Missing'.
imputeQs(data)
data |
a dataset with missing values |
A completed data frame is returned. For continuous variables with missing values, missing values are replaced with 'Missing', while the non-missing values are replaced with their corresponding quartile assignment. For categorical variables with missing values, missing values are replaced with 'Missing'. This procedure can greatly increase the dimensionality of the data.
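The quartile recoding can be sketched with quantile and cut from base R. The function impute_qs is a hypothetical illustration: interval labels come from cut's defaults, and a numeric column whose observed values are all identical is not handled. It is not the code behind imputeQs.

```r
# Hypothetical sketch: recode columns with NAs into quartile bins plus 'Missing'
impute_qs <- function(data) {
  data[] <- lapply(data, function(col) {
    if (!anyNA(col)) return(col)               # complete columns are left as they are
    if (is.numeric(col)) {
      breaks <- quantile(col, probs = seq(0, 1, 0.25), na.rm = TRUE)
      col <- as.character(cut(col, breaks = unique(breaks),
                              include.lowest = TRUE))
    } else {
      col <- as.character(col)
    }
    col[is.na(col)] <- "Missing"
    factor(col)
  })
  data
}

df <- data.frame(x = c(1:8, NA),
                 g = c("a", NA, "b", rep("a", 6)))
out <- impute_qs(df)
```

Because each affected numeric column becomes a factor with up to five levels, dummy coding the result adds several columns per variable, which is the dimensionality increase noted above.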
Nelson Lee Afanador ([email protected])
dat <- introNAs(iris, percent = 25)
imputeQs(dat)
After generating a cell means model matrix, impute expected values (mean or median for continuous; highest frequency for categorical).
imputeRough(data, Init = "mean")
data |
a dataset with missing values |
Init |
For continuous variables impute either the mean or median |
A completed data frame is returned that mirrors a model.matrix. NAs are replaced with column means or medians. If object contains no NAs, it is returned unaltered. This is the starting point for imputeEM.
imputeRough returns a list containing the following components:
Initials |
Imputed values |
Pre.Imputed |
Pre-imputed data frame |
Imputed.Dataframe |
Imputed data frame |
Nelson Lee Afanador ([email protected])
dat <- introNAs(iris, percent = 25)
imputeRough(dat)
Function for testing missing value imputation algorithms
introNAs(data, percent = 25)
data |
a dataset without missing values. |
percent |
the percentage of data that should be randomly assigned as missing |
A completed data frame is returned with the desired percentage of missing data. NAs are assigned at random.
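Assigning NAs at random over the cells of a data frame can be sketched as follows. The function intro_nas is a hypothetical base R illustration that samples cell positions uniformly without replacement; the exact sampling scheme used by introNAs may differ.

```r
# Hypothetical sketch: blank out a given percentage of cells at random
intro_nas <- function(data, percent = 25) {
  n_cells <- nrow(data) * ncol(data)
  n_miss <- round(n_cells * percent / 100)
  # Logical mask with the same shape as the data; TRUE marks cells to blank out
  mask <- matrix(FALSE, nrow(data), ncol(data))
  mask[sample(n_cells, n_miss)] <- TRUE
  data[mask] <- NA
  data
}

set.seed(1)
out <- intro_nas(iris, percent = 25)
```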
Nelson Lee Afanador ([email protected])
dat <- introNAs(iris)
dat
This function calculates the jackknife influence values from a bootstrap output mvdareg object and plots the corresponding jackknife-after-bootstrap plot.
jk.after.boot(object, ncomp = object$ncomp, type = c("coefficients", "loadings", "weights"), parm = NULL)
object |
an mvdareg object. A fitted model. |
ncomp |
the component number to include in the jackknife-after-bootstrap plot assessment. |
type |
input parameter vector. |
parm |
predictor variable for which to perform the assessment. if |
The centred jackknife quantiles for each observation are estimated from those bootstrap samples in which a particular observation did not appear. These are then plotted against the influence values.
The resulting plots are useful diagnostic tools for looking at the way individual observations affect the bootstrap output.
The plot will consist of a number of horizontal dotted lines which correspond to the quantiles of the centred bootstrap distribution. For each data point the quantiles of the bootstrap distribution calculated by omitting that point are plotted against the jackknife values. The observation number is printed below the plots. To make it easier to see the effect of omitting points on quantiles, the plotted quantiles are joined by line segments. These plots provide a useful diagnostic tool in establishing the effect of individual observations on the bootstrap distribution. See the references below for some guidelines on the interpretation of the plots.
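The quantities behind such a plot can be sketched for a simple statistic (the sample mean) in base R. All names below are illustrative assumptions, and centring each subset of replicates by its own mean is one reasonable choice rather than a statement of what jk.after.boot does; the real function operates on the bootstrap output stored in an mvdareg object.

```r
set.seed(1)
x <- rnorm(30)
n <- length(x)
B <- 500

# Keep the resampled indices so we know which observations each sample omitted
idx <- replicate(B, sample(n, n, replace = TRUE))      # n x B matrix of indices
boot_stat <- apply(idx, 2, function(i) mean(x[i]))

# For each observation j: centred quantiles of the replicates that never drew j
probs <- c(0.05, 0.5, 0.95)
jack_q <- sapply(seq_len(n), function(j) {
  omit <- colSums(idx == j) == 0
  quantile(boot_stat[omit] - mean(boot_stat[omit]), probs, names = FALSE)
})

# Jackknife influence values; for the mean these reduce to x_j - mean(x)
infl <- (n - 1) * (mean(x) - sapply(seq_len(n), function(j) mean(x[-j])))

# The plot would show the rows of jack_q against infl, ordered by influence
ord <- order(infl)
```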
There is no returned value but a graph is generated on the current graphics display.
Nelson Lee Afanador ([email protected])
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1992) Jackknife-after-bootstrap standard errors and influence functions (with Discussion). Journal of the Royal Statistical Society, B, 54, 83:127.
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
jk.after.boot(mod1, type = "coefficients")
## Not run:
jk.after.boot(mod1, type = "loadings")
jk.after.boot(mod1, type = "weights")
## End(Not run)
Functions to extract loadings bootstrap information from mvdalab objects.
## S3 method for class 'mvdareg'
loadings(object, ncomp = object$ncomp, conf = .95, ...)
object |
an mvdareg or mvdapca object. A fitted model. |
ncomp |
the number of components to include in the model (see below). |
conf |
for a bootstrapped model, the confidence level to use. |
... |
additional arguments. Currently ignored. |
loadings is used to extract a summary of the loadings of a PLS or PCA model.
If ncomp is missing (or is NULL), summaries for all loadings estimates are returned. Otherwise, if ncomp is given, parameters for a model with only the requested components are returned.
Bootstrap summaries are provided for mvdareg objects where validation = "oob". These summaries can also be extracted using loadings.boots.
A loadings object contains a data frame with columns:
variable |
variable names |
Actual |
Actual loading estimate using all the data |
BCa percentiles |
confidence intervals |
boot.mean |
mean of the bootstrap |
skewness |
skewness of the bootstrap distribution |
bias |
estimate of bias w.r.t. the loading estimate |
Bootstrap Error |
estimate of bootstrap standard error |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
There are many references explaining the bootstrap. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1992) Jackknife-after-bootstrap standard errors and influence functions (with Discussion). Journal of the Royal Statistical Society, B, 54, 83:127.
loadingsplot, loadings.boots, loadingsplot2D
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
loadings(mod1, ncomp = 2, conf = .95)
data(iris)
pc1 <- pcaFit(iris)
loadings(pc1)
Computes bootstrap BCa confidence intervals for the loadings, along with expanded bootstrap summaries.
loadings.boots(object, ncomp = object$ncomp, conf = .95)
object |
an object of class |
ncomp |
number of components in the model. |
conf |
desired confidence level. |
The function computes the bootstrap BCa confidence intervals for fitted mvdareg models where validation = "oob".
It should be used in instances in which there is reason to suspect the percentile intervals. Results are provided across all latent variables or for specific latent variables via ncomp.
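As a rough illustration of what a BCa (bias-corrected and accelerated) interval involves, the base R sketch below computes one for the mean of a simulated sample. It is not the mvdalab implementation: the helper name bca_interval, its arguments, and the simulated data are assumptions made for this example only.

```r
# Hypothetical helper (not part of mvdalab): BCa interval for a scalar statistic.
bca_interval <- function(x, stat, B = 2000, conf = 0.95) {
  n <- length(x)
  theta_hat <- stat(x)
  # Bootstrap distribution of the statistic
  boot <- replicate(B, stat(sample(x, n, replace = TRUE)))
  # Bias correction: how far the bootstrap median sits from the estimate
  z0 <- qnorm(mean(boot < theta_hat))
  # Acceleration, estimated from leave-one-out (jackknife) values
  jack <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  d <- mean(jack) - jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)
  # Adjusted percentile levels
  z <- qnorm(c((1 - conf) / 2, 1 - (1 - conf) / 2))
  probs <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  quantile(boot, probs = probs, names = FALSE)
}

set.seed(1)
x <- rexp(50)              # skewed data, where BCa and percentile limits differ
ci <- bca_interval(x, mean)
ci
```

When z0 and a are both zero the adjusted levels reduce to the ordinary percentile levels, which is why the two interval types agree for symmetric, unbiased bootstrap distributions.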
A loadings.boots object contains component results for the following:
variable |
variable names |
actual |
Actual loading estimate using all the data |
BCa percentiles |
confidence intervals |
boot.mean |
mean of the bootstrap |
skewness |
skewness of the bootstrap distribution |
bias |
estimate of bias w.r.t. the loading estimate |
Bootstrap Error |
estimate of bootstrap standard error |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
There are many references explaining the bootstrap. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1992) Jackknife-after-bootstrap standard errors and influence functions (with Discussion). Journal of the Royal Statistical Society, B, 54, 83-127.
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
loadings.boots(mod1, ncomp = 2, conf = .95)
Functions to extract graphical loadings information from mvdareg and mvdapca objects.
loadingsplot(object, ncomp = object$ncomp, conf = 0.95, verbose = FALSE)
object |
an |
ncomp |
the number of components to include. |
conf |
for a bootstrapped model, the confidence level to use. |
verbose |
output results as a data frame |
"loadingsplot"
is used to extract a graphical summary of the loadings of a PLS model.
If ncomp is missing (or is NULL), a graphical summary for the nth component estimates is returned. Otherwise, if ncomp is given, the summary is returned only for the requested component(s).
Bootstrap graphical summaries are provided when validation = "oob".
Nelson Lee Afanador ([email protected])
loadings, loadings.boots, loadingsplot2D
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
loadingsplot(mod1, ncomp = 1:2)
Functions to extract 2D graphical loadings information from mvdalab objects.
loadingsplot2D(object, comps = c(1, 2), verbose = FALSE)
object |
an |
comps |
a vector of length 2 corresponding to the components to include. |
verbose |
output results as a data frame |
loadingsplot2D is used to extract a graphical summary of the loadings of a PLS model.
If comps is missing (or is NULL), a graphical summary for the 1st and 2nd components is returned.
Nelson Lee Afanador ([email protected])
coefficientsplot2D, weightsplot2D
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
loadingsplot2D(mod1, comps = c(1, 2))
## Not run:
data(Penta)
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
loadingsplot2D(mod2, comps = c(1, 2))
## End(Not run)
data(iris)
pc1 <- pcaFit(iris)
loadingsplot2D(pc1, comps = c(1, 2))
Generates a Hotelling's T2 graph for mewma objects.
mewma(X, phase = 1, lambda = 0.2, conf = c(0.95, 0.99), asymptotic.form = FALSE)
X |
a dataframe. |
phase |
designates whether the confidence limits should reflect the current data frame, |
lambda |
EWMA smoothing parameter |
conf |
the confidence level(s) to use for upper control limit. |
asymptotic.form |
use asymptotic convergence parameter for scaling the covariance matrix. |
mewma is used to generate a Hotelling's T2 graph for the multivariate EWMA.
The output of mewma is a graph of Hotelling's T2 for the multivariate EWMA, and a list containing a data frame of univariate EWMAs and the multivariate EWMA T2 values.
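The recursion behind a multivariate EWMA chart can be sketched in a few lines of base R, following the formulation in Lowry et al. (1992). This is an illustrative sketch rather than the mewma source: the function name mewma_t2 and its arguments are assumptions, and no control limits are computed.

```r
# Hypothetical helper (not part of mvdalab): MEWMA vectors and their T2 values.
mewma_t2 <- function(X, lambda = 0.2, asymptotic = FALSE) {
  X <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X))      # centre on the sample mean vector
  S <- cov(X)
  n <- nrow(X)
  Z <- matrix(0, n, ncol(X))
  t2 <- numeric(n)
  z_prev <- rep(0, ncol(X))
  for (i in seq_len(n)) {
    # Z_i = lambda * x_i + (1 - lambda) * Z_{i-1}
    z_prev <- lambda * Xc[i, ] + (1 - lambda) * z_prev
    Z[i, ] <- z_prev
    # Covariance of Z_i is a scalar multiple of S; exact or asymptotic form
    k <- lambda / (2 - lambda)
    if (!asymptotic) k <- k * (1 - (1 - lambda)^(2 * i))
    t2[i] <- sum(z_prev * solve(k * S, z_prev))
  }
  list(Z = Z, T2 = t2)
}

out <- mewma_t2(iris[, -5], lambda = 0.2)
head(out$T2)
```

Smaller values of lambda give more weight to past observations, which makes the chart more sensitive to small sustained shifts.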
Nelson Lee Afanador ([email protected])
Lowry, Cynthia A., et al. "A multivariate exponentially weighted moving average control chart." Technometrics 34.1 (1992): 46-53.
mewma(iris[, -5], phase = 1, lambda = 0.2, conf = c(0.95, 0.99), asymptotic.form = FALSE)
model.matrix creates a design (or model) matrix. This function returns the model.matrix of an mvdareg object.
## S3 method for class 'mvdareg'
model.matrix(object, ...)
object |
an |
... |
additional arguments. Currently ignored. |
"model.matrix.mvdareg"
is used to return the model.matrix of an mvdareg object.
The design matrix for a PLS model with the specified formula and data.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
model.matrix(mod1)
Provides three multivariate capability indices for correlated multivariate processes based on Principal Component Analysis.
MultCapability(data, lsls, usls, targets, ncomps = NULL, Target = FALSE)
data |
a multivariable dataset |
lsls |
is the vector of the lower specification limits |
usls |
is the vector of the upper specification limits |
targets |
is the vector of the target of the process |
ncomps |
is the number of principal components to use |
Target |
Use |
ncomps has to be set prior to running the analysis. The user is strongly encouraged to use pcaFit in order to determine the optimal number of principal components using cross-validation.
When targets is not specified, it is set to the midpoint of the specification limits, targets = lsls + (usls - lsls)/2.
Ppk values are provided to allow the user to compare the multivariate results to the univariate results.
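For orientation, the default targets and the univariate Ppk values can be reproduced by hand. The sketch below uses simulated data and the usual textbook definition of Ppk (distance from the mean to the nearer specification limit, in units of three overall standard deviations); the data and variable names are assumptions, and MultCapability may differ in detail.

```r
set.seed(10)
# Simulated two-variable process (illustrative data, not a package dataset)
dat <- data.frame(a = rnorm(100, mean = 2.2, sd = 0.02),
                  b = rnorm(100, mean = 304.8, sd = 0.08))
lsls <- c(2.1, 304.5)
usls <- c(2.3, 305.1)

# Default targets: midpoint of each specification interval
targets <- lsls + (usls - lsls) / 2

# Univariate Ppk per variable
m <- colMeans(dat)
s <- apply(dat, 2, sd)
ppk <- pmin((usls - m) / (3 * s), (m - lsls) / (3 * s))
ppk
```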
A list with the following elements:
For mpca_wang, the following is returned:
ncomps |
number of components used |
mcp_wang |
index greater than 1, the process is capable |
mcpk_wang |
index greater than 1, the process is capable |
mcpm_wang |
index greater than 1, the process is capable |
mcpmk_wang |
index greater than 1, the process is capable |
For mcp_xe, the following is returned:
ncomps |
number of components used |
mcp_wang_2 |
index greater than 1, the process is capable |
mcpk_wang_2 |
index greater than 1, the process is capable |
mcpm_wang_2 |
index greater than 1, the process is capable |
mcpmk_wang_2 |
index greater than 1, the process is capable |
For mpca_wang_2, the following is returned:
ncomps |
number of components used |
mcp_xe |
index greater than 1, the process is capable |
mcpk_xe |
index greater than 1, the process is capable |
mcpm_xe |
index greater than 1, the process is capable |
mcpmk_xe |
index greater than 1, the process is capable |
For Ppk, the following is returned:
Individual.Ppks |
univariate Ppks; index greater than 1, the process is capable |
Nelson Lee Afanador ([email protected])
Wang F, Chen J (1998). Capability index using principal components analysis. Quality Engineering, 11, 21-27.
Xekalaki E, Perakis M (2002). The Use of principal component analysis in the assessment of process capability indices. Proceedings of the Joint Statistical Meetings of the American Statistical Association, The Institute of Mathematical Statistics, The Canadian Statistical Society. New York.
Wang, C (2005). Constructing multivariate process capability indices for short-run production. The International Journal of Advanced Manufacturing Technology, 26, 1306-1311.
Scagliarini, M (2011). Multivariate process capability using principal component analysis in the presence of measurement errors. AStA Adv Stat Anal, 95, 113-128.
Santos-Fernandez E, Scagliarini M (2012). "MPCI: An R Package for Computing Multivariate Process Capability Indices". Journal of Statistical Software, 47(7), 1-15, URL http://www.jstatsoft.org/v47/i07/.
data(Wang_Chen_Sim)
lsls1 <- c(2.1, 304.5, 304.5)
usLs1 <- c(2.3, 305.1, 305.1)
targets1 <- c(2.2, 304.8, 304.8)
MultCapability(Wang_Chen_Sim, lsls = lsls1, usls = usLs1,
               targets = targets1, ncomps = 2)
data(Wang_Chen)
targets2 <- c(177, 53)
lsls2 <- c(112.7, 32.7)
usLs2 <- c(241.3, 73.3)
MultCapability(Wang_Chen, lsls = lsls2, usls = usLs2,
               targets = targets2, ncomps = 1)
Calculate joint confidence intervals (Hotelling's T2 Intervals).
MVcis(data, segments = 51, level = .95, Vars2Plot = c(1, 2), include.zero = F)
data |
a multivariable dataset to compare to means |
segments |
number of line-segments used to draw ellipse. |
level |
draw elliptical contours at these (normal) probability or confidence levels. |
Vars2Plot |
variables to plot |
include.zero |
add the zero axis to the graph output |
This function calculates the Hotelling's T2 intervals for a mean vector.
Assumption:
The data are a random sample from a multivariate population.
If the confidence ellipse does not cover c(0, 0), we reject the null hypothesis that the mean vector is equal to zero (at the stated alpha level).
This function returns the Hotelling's T2 confidence intervals for the p-variates and its corresponding confidence ellipse at the stated confidence level.
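The simultaneous T2 intervals described in Johnson and Wichern can be written out directly in base R. The sketch below is an illustration under the usual multivariate normal assumption, not the MVcis source; the helper name t2_intervals is an assumption, and the confidence ellipse is not drawn.

```r
# Hypothetical helper (not part of mvdalab): simultaneous T2 intervals for means.
t2_intervals <- function(X, level = 0.95) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  xbar <- colMeans(X)
  S <- cov(X)
  # Critical constant: sqrt( p (n - 1) / (n - p) * F_{p, n - p}(level) )
  crit <- sqrt(p * (n - 1) / (n - p) * qf(level, p, n - p))
  half <- crit * sqrt(diag(S) / n)
  cbind(lower = xbar - half, mean = xbar, upper = xbar + half)
}

ci <- t2_intervals(iris[, 1:4], level = 0.95)
ci
```

Because the intervals hold jointly for all p variables, each one is wider than the corresponding one-at-a-time t interval.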
Nelson Lee Afanador ([email protected])
Johnson, R.A., Wichern, D.W. (2002) Applied Multivariate Statistical Analysis. Prentice Hall.
data(College)
MVcis(College, Vars2Plot = c(1, 2), include.zero = TRUE)
Performs a traditional multivariate comparison of mean vectors drawn from two populations.
MVComp(data1, data2, level = .95)
data1 |
a multivariable dataset to compare to. |
data2 |
a multivariable dataset to compare. |
level |
draw elliptical contours at these (normal) probability or confidence levels. |
This function provides a T2-statistic for testing the equality of two mean vectors. This test is appropriate for testing two populations, assuming independence.
Assumptions:
-The sample for each population is a random sample from a multivariate population
-Both populations are independent
-Both populations are multivariate normal
-Covariance matrices are approximately equal
This function returns the simultaneous confidence intervals for the p-variates and its corresponding confidence ellipse at the stated confidence level.
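The two-sample T2 statistic behind this kind of comparison can be sketched as follows, using the pooled covariance matrix and the standard F transformation. The helper name two_sample_t2 and the choice of iris subsets are assumptions for illustration; this is not the MVComp source and it does not compute the confidence intervals.

```r
# Hypothetical helper (not part of mvdalab): two-sample Hotelling T2 test.
two_sample_t2 <- function(X1, X2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  p <- ncol(X1)
  d <- colMeans(X1) - colMeans(X2)
  # Pooled covariance (assumes approximately equal covariance matrices)
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  t2 <- (n1 * n2 / (n1 + n2)) * sum(d * solve(Sp, d))
  # Under the null, this scaled T2 follows F(p, n1 + n2 - p - 1)
  f <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
  list(T2 = t2, F = f,
       p.value = pf(f, p, n1 + n2 - p - 1, lower.tail = FALSE))
}

setosa <- iris[iris$Species == "setosa", 1:4]
versicolor <- iris[iris$Species == "versicolor", 1:4]
res <- two_sample_t2(setosa, versicolor)
res
```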
Nelson Lee Afanador ([email protected])
Johnson, R.A., Wichern, D.W. (2002) Applied Multivariate Statistical Analysis. Prentice Hall.
data(College)
dat1 <- College
#Generate a 'fake' difference of 15 units
dat2 <- College + matrix(rnorm(nrow(dat1) * ncol(dat1), mean = 15),
                         nrow = nrow(dat1), ncol = ncol(dat1))
Comparison <- MVComp(dat1, dat2, level = .95)
Comparison
plot(Comparison, Diff2Plot = c(1, 2), include.zero = FALSE)
plot(Comparison, Diff2Plot = c(1, 2), include.zero = TRUE)
plot(Comparison, Diff2Plot = c(2, 3), include.zero = FALSE)
plot(Comparison, Diff2Plot = c(2, 3), include.zero = TRUE)
data(iris)
dat1b <- iris[, -5]
#Generate a 'fake' difference of .5 units
dat2b <- dat1b + matrix(rnorm(nrow(dat1b) * ncol(dat1b), mean = .5),
                        nrow = nrow(dat1b), ncol = ncol(dat1b))
Comparison2 <- MVComp(dat1b, dat2b, level = .90)
plot(Comparison2, Diff2Plot = c(1, 2), include.zero = FALSE)
plot(Comparison2, Diff2Plot = c(1, 2), include.zero = TRUE)
plot(Comparison2, Diff2Plot = c(3, 4), include.zero = FALSE)
plot(Comparison2, Diff2Plot = c(3, 4), include.zero = TRUE)
When validation = 'oob', this routine performs the bootstrap procedure for mvdareg objects.
mvdaboot(X, Y, ncomp, method = "bidiagpls", scale = FALSE, n_cores, parallel, boots, ...)
X |
a matrix of observations. NAs and Infs are not allowed. |
Y |
a vector. NAs and Infs are not allowed. |
ncomp |
the number of components to include in the model (see below). |
method |
PLS algorithm used. |
scale |
scaling used. |
n_cores |
No. of cores to run for parallel processing. Currently set to 2 (4 max). |
parallel |
should parallelization be used. |
boots |
No. of bootstrap samples when |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through the generic function plsFit with the argument validation = 'oob'.
Provides the following bootstrapped results as a list for mvdareg objects:
coefficients |
fitted values |
weights |
weights |
loadings |
loadings |
ncomp |
number of latent variables |
bootstraps |
No. of bootstraps |
scores |
scores |
cvR2 |
bootstrap estimate of cvR2 |
PRESS |
bootstrap estimate of prediction error sums of squares |
MSPRESS |
bootstrap estimate of mean squared error prediction sums of squares |
boot.means |
bootstrap mean of bootstrapped parameters |
RMSPRESS |
bootstrap estimate of root mean squared error prediction sums of squares |
D2 |
bidiag2 matrix |
iD2 |
Inverse of bidiag2 matrix |
y.loadings |
normalized y-loadings |
y.loadings2 |
non-normalized y-loadings |
MSPRESS.632 |
.632 corrected estimate of MSPRESS |
oob.fitted |
out-of-bag PLS fitted values |
RMSPRESS.632 |
.632 corrected estimate of RMSPRESS |
in.bag |
bootstrap samples used for model building at each bootstrap |
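To make the in-bag, out-of-bag, and .632 quantities above more concrete, the sketch below runs an out-of-bag bootstrap for an ordinary linear model in base R. It uses lm rather than PLS and simulated data, both assumptions chosen to keep the example self-contained; the .632 combination shown (0.368 times the apparent error plus 0.632 times the out-of-bag error) is the standard estimator and may not match the package's internals exactly.

```r
set.seed(42)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Apparent (resubstitution) mean squared error from the full-data fit
fit_all <- lm(y ~ x, data = dat)
apparent <- mean(residuals(fit_all)^2)

B <- 200
oob_sq <- vector("list", B)
for (b in seq_len(B)) {
  in_bag <- sample(n, n, replace = TRUE)       # rows used to build the model
  out_bag <- setdiff(seq_len(n), in_bag)       # rows never drawn in this resample
  fit_b <- lm(y ~ x, data = dat[in_bag, ])
  pred <- predict(fit_b, newdata = dat[out_bag, ])
  oob_sq[[b]] <- (dat$y[out_bag] - pred)^2
}
oob_err <- mean(unlist(oob_sq))

# .632 estimate: weighted blend of optimistic and pessimistic errors
err_632 <- 0.368 * apparent + 0.632 * oob_err
c(apparent = apparent, oob = oob_err, err_632 = err_632)
```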
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
There are many references explaining the bootstrap and its implementation for confidence interval estimation. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.
Hinkley, D.V. (1988) Bootstrap methods (with Discussion). Journal of the Royal Statistical Society, B, 50, 312-337, 355-370.
NOTE: This function is adapted from mvr in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
## Run line below to see bootstrap results
## mod1$validation
When validation = 'loo', this routine performs the leave-one-out cross-validation procedure for mvdareg objects.
mvdaloo(X, Y, ncomp, weights = NULL, method = "bidiagpls", scale = FALSE, boots = NULL, ...)
X |
a matrix of observations. |
Y |
a vector. |
ncomp |
the number of components to include in the model (see below). |
weights |
currently not in use |
method |
PLS algorithm used |
scale |
scaling used |
boots |
not applicable for |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through the generic function plsFit with the argument validation = 'loo'.
Provides the following cross-validated results as a list for mvdareg objects:
cvR2 |
leave-one-out estimate of cvR2. |
PRESS |
leave-one-out estimate of prediction error sums of squares. |
MSPRESS |
leave-one-out estimate of mean squared error prediction sums of squares. |
RMSPRESS |
leave-one-out estimate of root mean squared error prediction sums of squares. |
in.bag |
leave-one-out samples used for model building. |
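The leave-one-out quantities listed above can be illustrated with a plain linear model. The sketch below uses lm on mtcars instead of PLS so that it runs in base R; the divisor used for MSPRESS (the number of observations) and the cvR2 definition (1 - PRESS / total sum of squares) are assumptions and may differ from what mvdaloo returns.

```r
n <- nrow(mtcars)
sq_err <- numeric(n)
for (i in seq_len(n)) {
  # Refit without observation i, then predict it
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  sq_err[i] <- (mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ]))^2
}

PRESS <- sum(sq_err)                 # prediction error sum of squares
MSPRESS <- PRESS / n                 # assumed divisor: number of observations
RMSPRESS <- sqrt(MSPRESS)
cvR2 <- 1 - PRESS / sum((mtcars$mpg - mean(mtcars$mpg))^2)
c(PRESS = PRESS, MSPRESS = MSPRESS, RMSPRESS = RMSPRESS, cvR2 = cvR2)
```

For a linear model the same PRESS can be obtained without refitting, from the ordinary residuals divided by one minus the leverages, which is a useful check on the loop.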
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
NOTE: This function is adapted from mvr in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, method = "bidiagpls", validation = "loo")
mod1$validation$cvR2
mod1$validation$PRESS
mod1$validation$MSPRESS
mod1$validation$RMSPRESS
mod1$validation$in.bag
Produces one or more samples from the specified multivariate distribution.
mvrnorm.svd(n = 1, mu = NULL, Sigma = NULL, tol = 1e-06, empirical = FALSE, Dist = "normal", skew = 5, skew.mean = 0, skew.sd = 1, poisson.mean = 5)
n |
the number of samples required. |
mu |
a vector giving the means of the variables. |
Sigma |
a positive-definite symmetric matrix specifying the covariance matrix of the variables. |
tol |
tolerance (relative to largest variance) for numerical lack of positive-definiteness in Sigma. |
empirical |
logical. If true, |
Dist |
desired distribution. |
skew |
amount of skew for skewed distributions. |
skew.mean |
mean for skewed distribution. |
skew.sd |
standard deviation for skewed distribution. |
poisson.mean |
mean for poisson distribution. |
"mvrnorm.svd"
produces the samples; the matrix decomposition of Sigma is done via svd.
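A minimal sketch of drawing multivariate normal samples through a singular value decomposition is shown below. It covers only the normal case and is not the mvrnorm.svd source; the helper name rmvnorm_svd is an assumption for illustration.

```r
# Hypothetical helper (not part of mvdalab): normal samples via SVD of Sigma.
rmvnorm_svd <- function(n, mu, Sigma) {
  s <- svd(Sigma)
  # For a symmetric positive-definite Sigma, Sigma = U D U', so A = U sqrt(D)
  # satisfies A A' = Sigma
  A <- s$u %*% diag(sqrt(s$d), nrow = length(s$d))
  Z <- matrix(rnorm(n * length(mu)), nrow = n)   # independent standard normals
  sweep(Z %*% t(A), 2, mu, "+")                  # rows have covariance Sigma
}

set.seed(123)
Sigma <- matrix(c(1, .5, .5, .5, 1, .5, .5, .5, 1), 3, 3)
X <- rmvnorm_svd(5000, mu = rep(0, 3), Sigma = Sigma)
round(cov(X), 2)
```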
Nelson Lee Afanador ([email protected])
Sigma <- matrix(c(1, .5, .5, .5, 1, .5, .5, .5, 1), 3, 3)
Means <- rep(0, 3)
Sim.dat.norm <- mvrnorm.svd(n = 1000, Means, Sigma, Dist = "normal")
plot(as.data.frame(Sim.dat.norm))
Sim.dat.pois <- mvrnorm.svd(n = 1000, Means, Sigma, Dist = "poisson")
plot(as.data.frame(Sim.dat.pois))
Sim.dat.exp <- mvrnorm.svd(n = 1000, Means, Sigma, Dist = "exp")
plot(as.data.frame(Sim.dat.exp))
Sim.dat.skew <- mvrnorm.svd(n = 1000, Means, Sigma, Dist = "skewnorm")
plot(as.data.frame(Sim.dat.skew))
This function generates a dummy variable data frame in support of various functions.
my.dummy.df(data, contr = "contr.niets")
data |
a data frame |
contr |
an optional list. See the contrasts.arg of model.matrix.default. |
my.dummy.df takes a data.frame with categorical variables and returns a data.frame in which all the categorical variable columns are expanded as dummy variables.
The argument contr defaults to contr.niets; contr.helmert, contr.poly, contr.sum, and contr.treatment are also supported.
For datasets with categorical variables it produces the specified design matrix.
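A comparable expansion can be obtained in base R with model.matrix by supplying full indicator contrasts for each factor. The sketch below is an approximation of the idea, not the my.dummy.df source; the helper name full_dummies is an assumption.

```r
# Hypothetical helper (not part of mvdalab): one indicator column per factor level.
full_dummies <- function(data) {
  is_cat <- vapply(data, is.factor, logical(1))
  # contrasts = FALSE yields an identity matrix, i.e. no level is dropped
  contr <- lapply(data[is_cat], contrasts, contrasts = FALSE)
  mm <- model.matrix(~ ., data = data, contrasts.arg = contr)
  # Drop the intercept column so only the variables remain
  as.data.frame(mm[, colnames(mm) != "(Intercept)", drop = FALSE])
}

out <- full_dummies(iris)
head(out)
```

For iris this gives the four numeric columns plus three Species indicator columns, seven columns in total.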
Nelson Lee Afanador ([email protected])
data(iris)
my.dummy.df(iris)
Deletes the intercept from a model matrix.
no.intercept(mm)
mm |
Model Matrix |
A model matrix without intercept column.
Nelson Lee Afanador
Implements the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm for computing PCA scores and loadings and intermediate steps to convergence.
pca.nipals(data, ncomps = 1, Iters = 500, start.vec = NULL, tol = 1e-08)
data |
A dataframe |
ncomps |
the number of components to include in the analysis. |
Iters |
Number of iterations |
start.vec |
option for choosing your own starting vector |
tol |
tolerance for convergence |
The NIPALS algorithm is a popular algorithm in multivariate data analysis for computing PCA scores and loadings. This function is specifically designed to help explore the subspace prior to convergence. Currently only mean-centering is employed.
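The core NIPALS iteration is short enough to sketch in base R. The version below is illustrative only: the function name nipals_pca, the starting vector (the column with the largest variance), and the convergence rule are assumptions, and unlike pca.nipals it does not store the intermediate steps.

```r
# Hypothetical helper (not part of mvdalab): PCA scores and loadings via NIPALS.
nipals_pca <- function(X, ncomp = 1, iters = 500, tol = 1e-8) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)   # mean-centre only
  scores <- matrix(0, nrow(X), ncomp)
  loads <- matrix(0, ncol(X), ncomp)
  for (a in seq_len(ncomp)) {
    t_vec <- X[, which.max(apply(X, 2, var))]              # starting score vector
    for (k in seq_len(iters)) {
      p <- crossprod(X, t_vec) / sum(t_vec^2)              # regress columns on scores
      p <- p / sqrt(sum(p^2))                              # normalise the loading
      t_new <- X %*% p                                     # update the scores
      done <- sum((t_new - t_vec)^2) < tol * sum(t_new^2)
      t_vec <- t_new
      if (done) break
    }
    scores[, a] <- t_vec
    loads[, a] <- p
    X <- X - t_vec %*% t(p)                                # deflate before next component
  }
  list(Scores = scores, Loadings = loads)
}

fit <- nipals_pca(iris[, 1:4], ncomp = 2)
fit$Loadings
```

Up to sign, the loadings should agree with the right singular vectors of the centred data matrix.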
Loadings |
Loadings obtained via NIPALS |
Scores |
Scores obtained via NIPALS |
Loading.Space |
A list containing the intermediate step to convergence for the loadings |
Score.Space |
A list containing the intermediate step to convergence for the scores |
Nelson Lee Afanador ([email protected])
There are many good references for the NIPALS algorithm:
Risvik, Henning. "Principal component analysis (PCA) & NIPALS algorithm." (2007).
Wold, Svante, Kim Esbensen, and Paul Geladi. "Principal component analysis." Chemometrics and intelligent laboratory systems 2.1-3 (1987): 37-52.
my.nipals <- pca.nipals(iris[, 1:4], ncomps = 4, tol = 1e-08)
names(my.nipals)
#Check results
my.nipals$Loadings
svd(scale(iris[, 1:4], scale = FALSE))$v
nipals.scores <- data.frame(my.nipals$Scores)
names(nipals.scores) <- paste("np", 1:4)
svd.scores <- data.frame(svd(scale(iris[, 1:4], scale = FALSE))$u)
names(svd.scores) <- paste("svd", 1:4)
Scores. <- cbind(nipals.scores, svd.scores)
plot(Scores.)
my.nipals$Loading.Space
my.nipals$Score.Space
Function to perform principal component analysis.
pcaFit(data, scale = TRUE, ncomp = NULL)
data |
a data frame containing the variables in the model. |
scale |
should scaling to unit variance be used. |
ncomp |
the number of components to include in the model (see below). |
The calculation is done via singular value decomposition of the data matrix. Dummy variables are automatically created for categorical variables.
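The SVD route to PCA, including the percent of variation explained per component, can be sketched in base R as follows. It uses only the numeric iris columns and is not the pcaFit source; the object names are assumptions for this example.

```r
# Centre and scale to unit variance, then decompose
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)
sv <- svd(X)

scores <- sv$u %*% diag(sv$d)        # X scores
loadings <- sv$v                     # X loadings
eig <- sv$d^2 / (nrow(X) - 1)        # eigenvalues of the correlation matrix

# Individual and cumulative percent explained
pct <- 100 * eig / sum(eig)
data.frame(component = seq_along(pct),
           individual = round(pct, 2),
           cumulative = round(cumsum(pct), 2))
```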
pcaFit returns a list containing the following components:
loadings |
X loadings |
scores |
X scores |
D |
eigenvalues |
Xdata |
X matrix |
Percent.Explained |
Explained variation in X |
PRESS |
Prediction Error Sum-of-Squares |
ncomp |
number of latent variables |
method |
PLS algorithm used |
Nelson Lee Afanador ([email protected])
Everitt, Brian S. (2005). An R and S-Plus Companion to Multivariate Analysis. Springer-Verlag.
Edoardo Saccenti, José Camacho (2015) On the use of the observation-wise k-fold operation in PCA cross-validation, J. Chemometrics 2015; 29: 467-478.
loadingsplot2D, T2, Xresids, ScoreContrib
data(iris)
pc1 <- pcaFit(iris, scale = TRUE, ncomp = NULL)
pc1
print(pc1) #Model summary
plot(pc1) #MSEP
PE(pc1) #X-explained variance
T2(pc1, ncomp = 2) #T2 plot
Xresids(pc1, ncomp = 2) #X-residuals plot
scoresplot(pc1) #scoresplot variable importance
(SC <- ScoreContrib(pc1, obs1 = 1:9, obs2 = 10:11)) #score contribution
plot(SC) #score contribution plot
loadingsplot(pc1, ncomp = 1) #loadings plot
loadingsplot(pc1, ncomp = 1:2) #loadings plot
loadingsplot(pc1, ncomp = 1:3) #loadings plot
loadingsplot(pc1, ncomp = 1:7) #loadings plot
loadingsplot2D(pc1, comps = c(1, 2)) #2-D loadings plot
loadingsplot2D(pc1, comps = c(2, 3)) #2-D loadings plot
This function provides both the cumulative and individual percent explained for the X-block for mvdareg and mvdapca objects.
PE(object, verbose = FALSE)
object |
an object of class |
verbose |
output results as a data frame |
This function provides both the cumulative and individual percent explained for the X-block for mvdareg or mvdapca objects.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "none")
PE(mod1)
## Not run:
data(Penta)
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
PE(mod2)
## End(Not run)
This data is obtained from drug discovery and includes measurements pertaining to size, lipophilicity, and polarity at various sites on a molecule.
Penta
A data frame with 30 observations and the following 17 variables.
Obs.Name
Categorical ID Variable
S1
numeric predictor vector
L1
numeric predictor vector
P1
numeric predictor vector
S2
numeric predictor vector
L2
numeric predictor vector
P2
numeric predictor vector
S3
numeric predictor vector
L3
numeric predictor vector
P3
numeric predictor vector
S4
numeric predictor vector
L4
numeric predictor vector
P4
numeric predictor vector
S5
numeric predictor vector
L5
numeric predictor vector
P5
numeric predictor vector
log.RAI
numeric response vector
Umetrics, Inc. (1995), Multivariate Analysis (3-day course), Winchester, MA.
SAS/STAT(R) 9.22 User's Guide, "The PLS Procedure".
Computes percentile bootstrap confidence intervals for chosen parameters for plsFit models fitted with validation = "oob".
perc.cis(object, ncomp = object$ncomp, conf = 0.95, type = c("coefficients", "loadings", "weights"))
object |
an object of class |
ncomp |
number of components to extract percentile intervals. |
conf |
confidence level. |
type |
input parameter vector. |
The function computes the bootstrap percentile confidence intervals for any mvdareg model fitted with validation = "oob".
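The percentile method itself is simple: refit the model on each bootstrap resample and take empirical quantiles of the resulting estimates. The sketch below does this for linear regression coefficients on mtcars using base R only; the model, data, and object names are assumptions for illustration and do not reflect how perc.cis is implemented.

```r
set.seed(7)
B <- 500
conf <- 0.95

# One row of coefficients per bootstrap resample
boot_coef <- t(replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))
}))

# Percentile limits: empirical quantiles of the bootstrap estimates
alpha <- (1 - conf) / 2
ci <- t(apply(boot_coef, 2, quantile, probs = c(alpha, 1 - alpha)))

summary_tab <- data.frame(variable = colnames(boot_coef),
                          boot.mean = colMeans(boot_coef),
                          lower = ci[, 1],
                          upper = ci[, 2],
                          row.names = NULL)
summary_tab
```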
A perc.cis object contains component results for the following:
ncomp |
number of components in the model |
variables |
variable names |
boot.mean |
mean of the bootstrap |
percentiles |
confidence intervals |
Nelson Lee Afanador ([email protected])
There are many references explaining the bootstrap and its implementation for confidence interval estimation. Among them are:
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.
Hinkley, D.V. (1988) Bootstrap methods (with Discussion). Journal of the Royal Statistical Society, B, 50, 312-337, 355-370.
data(Penta)
## Number of bootstraps set to 250 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 250)
perc.cis(mod1, ncomp = 1:2, conf = .95, type = "coefficients")
This function generates a plot for an object of class score.contribution.
## S3 method for class 'cp'
plot(x, ncomp = "Overall", ...)
x |
|
ncomp |
the number of components to include in the graph output. |
... |
additional arguments. Currently ignored. |
A graph of the score contributions for ScoreContrib objects.
The output of plot is a graph of score contributions for the specified observation(s).
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, method = "bidiagpls", validation = "loo")
Score.Contributions1 <- ScoreContrib(mod1, obs1 = 1, obs2 = 3)
plot(Score.Contributions1, ncomp = 1)
## Not run:
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
Score.Contributions2 <- ScoreContrib(mod2, obs1 = 1, obs2 = 3)
plot(Score.Contributions2, ncomp = 1)
## End(Not run)
#PCA Model
pc1 <- pcaFit(Penta[, -1], ncomp = 3)
Score.Contributions1 <- ScoreContrib(pc1, obs1 = 1, obs2 = 3)
plot(Score.Contributions1, ncomp = 1)
Plot a comparison of mean vectors drawn from two populations.
## S3 method for class 'mvcomp'
plot(x, Diff2Plot = c(3, 4), segments = 51, include.zero = FALSE, ...)
x |
an mvcomp object, as returned by MVComp. |
segments |
number of line-segments used to draw ellipse. |
Diff2Plot |
variable differences to plot. |
include.zero |
add the zero axis to the graph output. |
... |
additional arguments. Currently ignored. |
This function provides a plot of the T2-statistic for testing the equality of two mean vectors. This test is appropriate for testing two populations, assuming independence.
Assumptions:
-The sample from each population is a random sample from a multivariate population
-Both populations are independent
-Both populations are multivariate normal
-Covariance matrices are approximately equal
If the confidence ellipse does not cover c(0, 0), we reject the null hypothesis that the difference between mean vectors is equal to zero (at the stated alpha level).
This function returns a plot of the simultaneous confidence intervals for the p-variates and its corresponding confidence ellipse at the stated confidence level.
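The test underlying this plot can be sketched in base R. The block below is an illustration of the standard two-sample Hotelling T2 test with a pooled covariance matrix (as in Johnson and Wichern), under the assumptions listed above; the function name `hotelling_two_sample` is hypothetical and this is not the code used by MVComp.

```r
# Two-sample Hotelling T2 test with pooled covariance (base R only).
hotelling_two_sample <- function(x1, x2, level = 0.95) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d  <- colMeans(x1) - colMeans(x2)          # difference of mean vectors
  # Pooled covariance; relies on the equal-covariance assumption
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  T2 <- drop(t(d) %*% solve(Sp * (1 / n1 + 1 / n2), d))
  # Rescale T2 so that it follows F(p, n1 + n2 - p - 1) under the null
  df2   <- n1 + n2 - p - 1
  Fstat <- T2 * df2 / (p * (n1 + n2 - 2))
  crit  <- qf(level, p, df2)
  list(T2 = T2, F = Fstat, critical.F = crit,
       p.value = pf(Fstat, p, df2, lower.tail = FALSE),
       reject = Fstat > crit)
}

x1  <- iris[iris$Species == "setosa", 1:4]
x2  <- iris[iris$Species == "versicolor", 1:4]
res <- hotelling_two_sample(x1, x2)
res$reject
```

Rejecting the null hypothesis here plays the same role as a confidence region that excludes the zero vector.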
Nelson Lee Afanador ([email protected])
Johnson, R.A., Wichern, D.W. (2002) Applied Multivariate Statistical Analysis. Prentice Hall.
data(College)
dat1 <- College
#Generate a 'fake' difference of 15 units
dat2 <- College + matrix(rnorm(nrow(dat1) * ncol(dat1), mean = 15),
                         nrow = nrow(dat1), ncol = ncol(dat1))
Comparison <- MVComp(dat1, dat2, level = .95)
Comparison
plot(Comparison, Diff2Plot = c(1, 2), include.zero = FALSE)
plot(Comparison, Diff2Plot = c(1, 2), include.zero = TRUE)
plot(Comparison, Diff2Plot = c(2, 3), include.zero = FALSE)
plot(Comparison, Diff2Plot = c(2, 3), include.zero = TRUE)

data(iris)
dat1b <- iris[, -5]
#Generate a 'fake' difference of .5 units
dat2b <- dat1b + matrix(rnorm(nrow(dat1b) * ncol(dat1b), mean = .5),
                        nrow = nrow(dat1b), ncol = ncol(dat1b))
Comparison2 <- MVComp(dat1b, dat2b, level = .90)
plot(Comparison2, Diff2Plot = c(1, 2), include.zero = FALSE)
plot(Comparison2, Diff2Plot = c(1, 2), include.zero = TRUE)
plot(Comparison2, Diff2Plot = c(3, 4), include.zero = FALSE)
plot(Comparison2, Diff2Plot = c(3, 4), include.zero = TRUE)
A general plotting function for mvdareg and mvdapca objects.
## S3 method for class 'mvdareg'
plot(x, plottype = c("PE", "scoresplot", "loadingsplot", "loadingsplot2D",
                     "T2", "Xresids", "coefsplot", "ap.plot", "weightsplot",
                     "weightsplot2D", "acfplot"), ...)
x |
an object of class |
plottype |
the desired plot from an object of class |
... |
additional arguments. Currently ignored. |
The following plotting functions are supported:
PE, scoresplot, loadingsplot, loadingsplot2D, T2, Xresids, coefsplot, ap.plot, weightsplot, weightsplot2D, acfplot
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
plot(mod1, plottype = "scoresplot")
## Not run: 
plot(mod1, plottype = "loadingsplot2D")
plot(mod1, plottype = "T2", ncomp = 2, phase = 1, conf = c(.95, .99))
## End(Not run)
Generates a 2-dimensional graph of the scores for plusminus objects.
## S3 method for class 'plusminus'
plot(x, ncomp = 2, comps = c(1, 2), ...)
x |
an object of class |
ncomp |
the number of components to include in the model (see below). |
comps |
a vector of length 2 giving the components to plot. |
... |
additional arguments. Currently ignored. |
plot.plusminus
is used to extract a 2D graphical summary of the PCA scores associated with a plusminus
object.
Nelson Lee Afanador ([email protected])
### Plus-Minus CLASSIFIER WITH validation = 'none', i.e. no CV ###
data(plusMinusDat)
mod1 <- plusminusFit(Y ~., data = plusMinusDat, validation = "none", n_cores = 2)
plot(mod1, ncomp = 2, comps = c(1, 2))

### Plus-Minus CLASSIFIER WITH validation = 'loo', i.e. leave-one-out CV ###
## Not run: 
data(plusMinusDat)
mod2 <- plusminusFit(Y ~., data = plusMinusDat, validation = "loo", n_cores = 2)
plot(mod2, ncomp = 2, comps = c(1, 2))
## End(Not run)
Plots for the cross-validated R2 (CVR2), explained variance in the predictor variables (R2X), and the response (R2Y).
## S3 method for class 'R2s'
plot(x, ...)
x |
An |
... |
additional arguments. Currently ignored. |
plot.R2s is used to generate the graph of the cross-validated R2 (CVR2), explained variance in the predictor variables (R2X), and the response (R2Y) for PLS models.
The output of plot.R2s
is a graph of the stated explained variance summary.
Thanh Tran ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
plot(R2s(mod1))
This function generates a plot of an object of class smc.
## S3 method for class 'smc'
plot(x, variables = "all", ...)
x |
|
variables |
the number of variables to include in the graph output. |
... |
additional arguments. Currently ignored. |
plot.smc is used to generate the graph of the significant multivariate correlation from smc objects.
The output of plot.smc
is a graph of the significant multivariate correlation for the specified observation(s).
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
smc(mod1)
plot(smc(mod1))
This function plots an object of class sr.
## S3 method for class 'sr'
plot(x, variables = "all", ...)
x |
|
variables |
the number of variables to include in the graph output. |
... |
additional arguments. Currently ignored. |
plot.sr is used to generate the graph of the selectivity ratio from sr objects.
The output of plot.sr
is a graph of the selectivity ratio for the specified observation(s).
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
sr(mod1)
plot(sr(mod1))
mvdareg Object with method = "wrtpls"
This takes an mvdareg object fitted with method = "wrtpls" and produces a graph of the permutation distribution and its corresponding normal quantile plot for a variable of interest.
## S3 method for class 'wrtpls'
plot(x, comp = 1:object$ncomp, distribution = "log", ...)
x |
an object of class |
comp |
number of latent variables to generate the permutation distribution |
distribution |
plot the |
... |
additional arguments. Currently ignored. |
The function generates the permutation distribution and normal quantile plot for a mvdareg model when method = "wrtpls" is specified.
The output of plot.wrtpls
is a histogram of the permutation distribution with the following vertical line indicators.
Solid line = Actual Value; Dashed line = Critical Value from t-distribution at the model specified alpha; Dotted line = Quantile at the model specified alpha
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               method = "wrtpls", validation = "none")
## Not run
## plot.wrtpls(mod1, distribution = "log")
Functions to perform partial least squares regression with a formula interface. Bootstrapping can be used. Prediction, residuals, model extraction, plot, print and summary methods are also implemented.
plsFit(formula, data, subset, ncomp = NULL, na.action,
       method = c("bidiagpls", "wrtpls"), scale = TRUE, n_cores = 2,
       alpha = 0.05, perms = 2000, validation = c("none", "oob", "loo"),
       boots = 1000, model = TRUE, parallel = FALSE,
       x = FALSE, y = FALSE, ...)
## S3 method for class 'mvdareg'
summary(object, ncomp = object$ncomp, digits = 3, ...)
formula |
a model formula (see below). |
data |
an optional data frame containing the variables in the model. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
ncomp |
the number of components to include in the model (see below). |
na.action |
a function which indicates what should happen when the data contain |
method |
the multivariate regression algorithm to be used. |
scale |
should scaling to unit variance be used. |
n_cores |
Number of cores to run for parallel processing. Currently set to 2 with the max being 4. |
alpha |
the significance level for |
perms |
the number of permutations to run for |
validation |
character. What kind of (internal) validation to use. See below. |
boots |
Number of bootstrap samples when |
model |
a logical. If TRUE, the model frame is returned. |
parallel |
should parallelization be used. |
x |
a logical. If TRUE, the model matrix is returned. |
y |
a logical. If TRUE, the response is returned. |
object |
an object of class |
digits |
the number of decimal places to output with |
... |
additional arguments, passed to the underlying fit functions, and |
The function fits a partial least squares (PLS) model with 1, ..., ncomp
number of latent variables. Multi-response models are not supported.
The type of model to fit is specified with the method argument. Currently two PLS methods are available, both using the bidiag2 algorithm: "bidiagpls" and "wrtpls".
The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector and terms is the name of one or more predictor matrices, usually separated by +, e.g., y ~ X + Z. See lm
for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. The chapter Statistical models in R of the manual An Introduction to R distributed with R is a good reference on formulas in R.
The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used.
Note that if the number of samples is <= 15, oob validation may fail. In that case it is recommended that you fit the PLS model with validation = "loo".
If method = "bidiagpls"
and validation = "oob"
, bootstrap cross-validation is performed. Bootstrap confidence intervals are provided for coefficients
, weights
, loadings
, and y.loadings
. The number of bootstrap samples is specified with the argument boots
. See mvdaboot
for details.
If method = "bidiagpls"
and validation = "loo"
, leave-one-out cross-validation is performed.
If method = "bidiagpls" and validation = "none", no cross-validation is performed. Note that the number of components, ncomp, is set to min(nobj - 1, npred).
If method = "wrtpls" and validation = "none", the Weight Randomization Test for the selection of the number of components is performed. Note that the number of components, ncomp, is set to min(nobj - 1, npred).
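To make the fitting and leave-one-out steps concrete, here is a minimal base R sketch of single-response PLS. It uses the classic NIPALS PLS1 recursion rather than the bidiag2 algorithm used by this package, and `mtcars` in place of `Penta`; the helper names `pls1_fit` and `pls1_predict` are hypothetical.

```r
# NIPALS PLS1 on centered and scaled predictors (illustrative, not bidiag2).
pls1_fit <- function(X, y, ncomp) {
  xc <- colMeans(X); xs <- apply(X, 2, sd); ym <- mean(y)
  E <- sweep(sweep(X, 2, xc), 2, xs, "/")   # scaled X, deflated in the loop
  f <- y - ym                                # centered y, deflated in the loop
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w  <- crossprod(E, f); w <- w / sqrt(sum(w^2))  # weight vector
    s  <- E %*% w                                    # score vector
    ss <- sum(s^2)
    p  <- crossprod(E, s) / ss                       # X loading
    q[a] <- sum(f * s) / ss                          # y loading
    E <- E - s %*% t(p)                              # deflate X
    f <- f - s * q[a]                                # deflate y
    W[, a] <- w; P[, a] <- p
  }
  # Regression coefficients on the scaled-X scale
  B <- W %*% solve(crossprod(P, W), q)
  list(B = drop(B), xc = xc, xs = xs, ym = ym)
}

pls1_predict <- function(fit, Xnew) {
  Z <- sweep(sweep(Xnew, 2, fit$xc), 2, fit$xs, "/")
  drop(Z %*% fit$B) + fit$ym
}

X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
y <- mtcars$mpg
max_comp <- min(nrow(X) - 1, ncol(X))   # same upper bound as stated above

# Leave-one-out PRESS for 1, ..., max_comp latent variables
press <- sapply(seq_len(max_comp), function(a) {
  sum(sapply(seq_len(nrow(X)), function(i) {
    fit <- pls1_fit(X[-i, , drop = FALSE], y[-i], a)
    (y[i] - pls1_predict(fit, X[i, , drop = FALSE]))^2
  }))
})
cvr2 <- 1 - press / sum((y - mean(y))^2)
cvr2
```

Scaling is recomputed inside each fold so that the held-out observation does not influence the model it is predicted from. With all components retained, the coefficients of this sketch coincide with ordinary least squares on the scaled predictors.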
An object of class mvdareg
is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:
loadings |
X loadings |
weights |
weights |
D2.values |
bidiag2 matrix |
iD2 |
inverse of bidiag2 matrix |
Ymean |
mean of response variable |
Xmeans |
mean of predictor variables |
coefficients |
PLS regression coefficients |
y.loadings |
y-loadings |
scores |
X scores |
R |
orthogonal weights |
Y.values |
scaled response values |
Yactual |
actual response values |
fitted |
fitted values |
residuals |
residuals |
Xdata |
X matrix |
iPreds |
predicted values |
y.loadings2 |
scaled y-loadings |
ncomp |
number of latent variables |
method |
PLS algorithm used |
scale |
scaling used |
validation |
validation method |
call |
model call |
terms |
model terms |
model |
fitted model |
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
NOTE: This function is adapted from mvr
in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.
bidiagpls.fit, mvdaboot, boot.plots, R2s, PE, ap.plot, T2, Xresids, smc, scoresplot, ScoreContrib, sr, loadingsplot, weightsplot, coefsplot, coefficientsplot2D, loadingsplot2D, weightsplot2D, bca.cis, coefficients.boots, loadings.boots, weight.boots, coefficients, loadings, weights, BiPlot, jk.after.boot
### PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'oob', i.e. bootstrapping ###
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "oob", boots = 300)
summary(mod1) #Model summary

### PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'loo', i.e. leave-one-out CV ###
## Not run: 
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "loo")
summary(mod2) #Model summary
## End(Not run)

### PLS MODEL FIT WITH method = 'bidiagpls' and validation = 'none', i.e. no CV is performed ###
## Not run: 
mod3 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], method = "bidiagpls",
               ncomp = 2, validation = "none")
summary(mod3) #Model summary
## End(Not run)

### PLS MODEL FIT WITH method = 'wrtpls' and validation = 'none', i.e. WRT-PLS is performed ###
## Not run: 
mod4 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               method = "wrtpls", validation = "none")
summary(mod4) #Model summary
plot.wrtpls(mod4)
## End(Not run)
Plus-Minus classifier
plusminus.fit(XX, YY, ...)
XX |
a matrix of observations. |
YY |
a vector. |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through plusminusFit
with the argument method="plusminus"
. It implements the Plus-Minus algorithm.
An object of class plusminus
is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:
coefficients |
regression coefficients |
Y |
response values |
X |
scaled predictors |
Richard Baumgartner ([email protected]), Nelson Lee Afanador ([email protected])
Zhao et al. (2014) Mas-o-menos: a simple sign averaging method for discrimination in genomic data analysis. Bioinformatics, 30(21):3062-3069.
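The sign-averaging idea can be sketched in a few lines of base R. This is an illustration only, assuming a response coded -1 / +1 and a cutoff halfway between the two class mean scores (both are assumptions of this sketch, not documented behaviour of plusminus.fit); the function name `plusminus_sketch` is hypothetical.

```r
# Sign-averaging ("plus-minus") classifier sketch, base R only.
plusminus_sketch <- function(X, y) {
  # Assumption: y is coded -1 / +1 and X is a numeric matrix
  Z <- scale(X)
  # Each coefficient is the sign of the predictor's association with y
  v <- sign(drop(crossprod(Z, y)))
  score <- drop(Z %*% v) / ncol(Z)      # average of sign-flipped predictors
  # Assumption: cut halfway between the two class mean scores
  cutoff <- mean(tapply(score, y, mean))
  list(coefficients = v, score = score, cutoff = cutoff,
       predicted = ifelse(score > cutoff, 1, -1))
}

set.seed(42)
n <- 60; p <- 20
y <- rep(c(-1, 1), each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[, 1:10] <- X[, 1:10] + 1.5 * outer(y, rep(1, 10))  # ten informative predictors
fit <- plusminus_sketch(X, y)
mean(fit$predicted == y)                              # training accuracy
```

Because every coefficient is either +1 or -1, the classifier has no magnitudes to estimate, which is what makes it resistant to overfitting when the number of predictors is large.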
When validation = 'loo' this routine performs the leave-one-out cross-validation procedure for plusminus objects.
plusminus.loo(X, Y, method = "plusminus", n_cores, ...)
X |
a matrix of observations. |
Y |
a vector. |
method |
PlusMinus algorithm used |
n_cores |
number of cores |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through the generic function plusminusFit
with the argument validation = 'loo'
.
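Leave-one-out cross-validation for a classifier follows the same pattern regardless of the model: refit without observation i, predict observation i, and average the misclassifications. The base R sketch below applies that loop to a simple sign-averaging classifier; the helpers `fit_pm`, `predict_pm`, and `loo_error` are hypothetical and are not the internals of plusminus.loo.

```r
# Leave-one-out error for a sign-averaging classifier (base R only).
# Assumption: y is coded -1 / +1.
fit_pm <- function(X, y) {
  xc <- colMeans(X); xs <- apply(X, 2, sd)
  Z <- sweep(sweep(X, 2, xc), 2, xs, "/")
  v <- sign(drop(crossprod(Z, y)))
  score <- drop(Z %*% v) / ncol(Z)
  list(v = v, xc = xc, xs = xs, cut = mean(tapply(score, y, mean)))
}

predict_pm <- function(fit, Xnew) {
  # Scale new data with the training means and standard deviations
  Z <- sweep(sweep(Xnew, 2, fit$xc), 2, fit$xs, "/")
  ifelse(drop(Z %*% fit$v) / length(fit$v) > fit$cut, 1, -1)
}

loo_error <- function(X, y) {
  wrong <- vapply(seq_len(nrow(X)), function(i) {
    fit <- fit_pm(X[-i, , drop = FALSE], y[-i])     # model without obs i
    predict_pm(fit, X[i, , drop = FALSE]) != y[i]   # misclassified?
  }, logical(1))
  mean(wrong)
}

set.seed(7)
n <- 40; p <- 20
y <- rep(c(-1, 1), each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[, 1:10] <- X[, 1:10] + 1.5 * outer(y, rep(1, 10))
cvError <- loo_error(X, y)
cvError
```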
Provides the following cross-validated results as a list for plusminus objects:
cvError |
leave-one-out estimate of cv error. |
in.bag |
leave-one-out samples used for model building. |
Richard Baumgartner ([email protected]), Nelson Lee Afanador ([email protected])
NOTE: This function is adapted from mvr
in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.
data(plusMinusDat)
mod1 <- plusminusFit(Y ~., data = plusMinusDat, validation = "loo", n_cores = 2)
## Not run: 
summary(mod1)
mod1$validation$cvError
mod1$validation$in.bag
## End(Not run)
A simulated dataset for demonstrating the performance of a plusminusFit
analysis.
plusMinusDat
A data frame with 201 observations, 200 input variables (X) and one response variable (Y).
Richard Baumgartner ([email protected])
Functions to fit the plus-minus classifier with a formula interface. Leave-one-out cross-validation is also implemented, along with model extraction, plot, print and summary methods.
plusminusFit(formula, data, subset, na.action, method = "plusminus", n_cores = 2,
             validation = c("loo", "none"), model = TRUE,
             x = FALSE, y = FALSE, ...)
## S3 method for class 'plusminus'
summary(object,...)
formula |
a model formula (see below). |
data |
an optional data frame containing the variables in the model. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain |
method |
the classification algorithm to be used. |
n_cores |
Number of cores to run for parallel processing. Currently set to 2 with the max being 4. |
validation |
character. What kind of (internal) validation to use. See below. |
model |
a logical. If TRUE, the model frame is returned. |
x |
a logical. If TRUE, the model matrix is returned. |
y |
a logical. If TRUE, the response is returned. |
object |
an object of class |
... |
additional arguments, passed to the underlying fit functions, and |
The function fits a Plus-Minus classifier.
The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector and terms is the name of one or more predictor matrices, usually separated by +, e.g., y ~ X + Z. See lm
for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. The chapter Statistical models in R of the manual An Introduction to R distributed with R is a good reference on formulas in R.
If validation = "loo"
, leave-one-out cross-validation is performed. If validation = "none"
, no cross-validation is performed.
An object of class plusminus
is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:
coefficients |
Plus-Minus regression coefficients |
X |
X matrix |
Y |
actual response values (class labels) |
val.method |
validation method |
call |
model call |
terms |
model terms |
mm |
model matrix |
model |
fitted model |
Richard Baumgartner ([email protected]), Nelson Lee Afanador ([email protected])
Zhao et al. (2014) Mas-o-menos: a simple sign averaging method for discrimination in genomic data analysis. Bioinformatics, 30(21):3062-3069.
### Plus-Minus CLASSIFIER WITH validation = 'none', i.e. no CV ###
data(plusMinusDat)
mod1 <- plusminusFit(Y ~., data = plusMinusDat, validation = "none", n_cores = 2)
summary(mod1)

### Plus-Minus CLASSIFIER WITH validation = 'loo', i.e. leave-one-out CV ###
## Not run: 
data(plusMinusDat)
mod2 <- plusminusFit(Y ~., data = plusMinusDat, validation = "loo", n_cores = 2)
summary(mod2)
## End(Not run)
predict provides predictions from the results of a PLS model.
## S3 method for class 'mvdareg'
predict(object, newdata, ncomp = object$ncomp, na.action = na.pass, ...)
object |
A |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
ncomp |
the number of components to include in the model (see below). |
na.action |
function determining what should be done with missing values in newdata. The default is to predict |
... |
additional arguments. Currently ignored. |
predict.mvdareg produces predicted values, obtained by evaluating the regression function in the frame newdata (which defaults to model.frame(object)). If newdata is omitted the predictions are based on the data used for the fit.
If ncomp is missing (or is NULL), predictions for the number of latent variables in the fitted model are provided. Otherwise, predictions for a model with only the requested components are returned. The generic function residuals returns the model residuals for all the components specified for the model. If the model was fitted with na.action = na.exclude (or after setting the default na.action to na.exclude with options), the residuals corresponding to excluded observations are returned as NA; otherwise, they are omitted.
predict.mvdareg produces a vector of predictions or a matrix of predictions.
Nelson Lee Afanador ([email protected])
NOTE: This function is adapted from mvr
in package pls with extensive modifications by Nelson Lee Afanador.
coef
, coefficients.boots
, coefficients
,
loadings
, loadings.boots
, weights
,
weight.boots
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
predict.mvdareg(mod1)
## Not run: 
residuals(mod1)
## End(Not run)
Summary and print methods for mvdalab objects.
## S3 method for class 'mvdareg'
print(x, ...)
x |
an mvdalab object |
... |
additional arguments. Currently ignored. |
print.mvdalab is a generic function used to print mvdalab objects, such as print.empca for imputeEM, print.mvdapca for mvdapca objects, and summary.mvdareg for mvdareg objects.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
print(mod1, ncomp = 2)
summary(mod1, ncomp = 2)
Summary and print methods for plusminus objects.
## S3 method for class 'plusminus'
print(x, ...)
x |
a plusminus object |
... |
additional arguments. Currently ignored. |
print.plusminus is a generic function used to print plusminus objects, such as print.plusminus for plusminus objects.
Richard Baumgartner ([email protected]), Nelson Lee Afanador ([email protected])
## Not run: 
data(plusMinusDat)
mod1 <- plusminusFit(Y ~., data = plusMinusDat, validation = "loo", n_cores = 2)
print(mod1)
## End(Not run)
Implementation of Procrustes Analysis in the spirit of multidimensional scaling.
proCrustes(X, Y, scaling = TRUE, standardize = FALSE, scale.unit = F, ...)
X |
Target configuration |
Y |
Matching configuration |
scaling |
Scale Y-axis |
standardize |
Standardize configurations |
scale.unit |
Scale to unit variance |
... |
additional arguments. Currently ignored. |
This function implements Procrustes Analysis as described in the reference below. That is to say:
Translation: Fixed displacement of points through a constant distance in a common direction
Rotation: Fixed displacement of all points through a constant angle
Dilation: Stretching or shrinking by a constant amount
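The three steps above can be sketched with a singular value decomposition in base R. This is the textbook least-squares Procrustes solution, shown for illustration; it is not the source of proCrustes, and the function name `procrustes_sketch` and its return names are hypothetical.

```r
# Least-squares Procrustes fit of Y onto the target X (base R only).
procrustes_sketch <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Xc <- sweep(X, 2, colMeans(X))     # translation: remove column means
  Yc <- sweep(Y, 2, colMeans(Y))
  s  <- svd(crossprod(Yc, Xc))       # SVD of t(Yc) %*% Xc
  Q  <- s$u %*% t(s$v)               # rotation taking Yc towards Xc
  cc <- sum(s$d) / sum(Yc^2)         # dilation factor
  Yproj <- cc * Yc %*% Q             # fitted configuration (centered)
  list(Q = Q, c = cc, Yproj = Yproj,
       M2 = sum((Xc - Yproj)^2))     # residual sum of squares
}

fit <- procrustes_sketch(iris[, 1:2], iris[, 3:4])
fit$Q
fit$M2
```

If Y is an exact rotated, rescaled, and shifted copy of X, the residual sum of squares is zero up to rounding and the dilation factor is the inverse of the scale that was applied.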
Rotation.Matrix |
The matrix, Q, that rotates Y towards X; obtained via |
Residuals |
residuals after fitting |
M2_min |
Residual Sums of Squares |
Xmeans |
Column Means of X |
Ymeans |
Column Means of Y |
PRMSE |
Procrustes Root Mean Square Error |
Yproj |
Projected Y-values |
scale |
logical. Should Y be scaled. |
Translation |
Scaling through a common distance based on rotation of Y and scaling parameter, c |
residuals. |
residual sum-of-squares |
Anova.MSS |
Explained Variance w.r.t. Y |
Anova.ESS |
Unexplained Variance w.r.t. Y |
Anova.TSS |
Total Sums of Squares w.r.t. X |
Nelson Lee Afanador ([email protected])
Krzanowski, Wojtek. Principles of multivariate analysis. OUP Oxford, 2000.
X <- iris[, 1:2]
Y <- iris[, 3:4]
proc <- proCrustes(X, Y)
proc
names(proc)
Functions to report the cross-validated R2 (CVR2), explained variance in the predictor variables (R2X), and the response (R2Y) for PLS models.
R2s(object)
object |
an mvdareg object, i.e., |
R2s is used to extract a summary of the cross-validated R2 (CVR2), explained variance in the predictor variables (R2X), and the response (R2Y) for PLS models.
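The three quantities have simple generic definitions, sketched below in base R. The example assumes the usual forms R2 = 1 - RSS/TSS and CVR2 = 1 - PRESS/TSS, and uses least squares and PCA as stand-ins for a PLS model; whether R2s computes them in exactly this way is an assumption.

```r
# Generic R2Y, CVR2 and R2X computations (base R only; lm and prcomp stand in for PLS).
X <- scale(as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")]))
y <- mtcars$mpg
tss <- sum((y - mean(y))^2)

fit <- lm(y ~ X)
r2y <- 1 - sum(resid(fit)^2) / tss            # explained variance in the response

# For least squares, the leave-one-out residual is e_i / (1 - h_ii)
press <- sum((resid(fit) / (1 - hatvalues(fit)))^2)
cvr2  <- 1 - press / tss                       # cross-validated R2

# R2X: share of X reproduced by a 2-component reconstruction
pc   <- prcomp(X, center = FALSE)
Xhat <- pc$x[, 1:2] %*% t(pc$rotation[, 1:2])
r2x  <- 1 - sum((X - Xhat)^2) / sum(X^2)

c(R2Y = r2y, CVR2 = cvr2, R2X = r2x)
```

CVR2 is always below R2Y because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual; a large gap between the two is a sign of overfitting.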
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
R2s(mod1)
## Not run: 
plot(R2s(mod1))
## End(Not run)
Generates the score contribution graph for both mvdareg and mvdapca objects.
ScoreContrib(object, ncomp = 1:object$ncomp, obs1 = 1, obs2 = NULL)
object |
an object of class |
ncomp |
the number of components to include in the model (see below). |
obs1 |
the first observation(s) in the score(s) comparison. |
obs2 |
the second observation(s) in the score(s) comparison. |
ScoreContrib is used to generate the score contributions for both PLS and PCA models. Up to two groups of score(s) can be selected. If only one group is selected, the contribution is measured relative to the model average. For PLS models the PCA loadings are replaced with the PLS weights.
The output of ScoreContrib
is a matrix of score contributions for the specified observation(s).
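A score on one component is a weighted sum over variables, so the difference between two scores splits into one term per variable. The base R sketch below shows that plain decomposition with prcomp on the iris data; the published method may apply additional weighting, and this is not the code of ScoreContrib.

```r
# Per-variable contributions to a score difference (base R only, PCA via prcomp).
X  <- scale(as.matrix(iris[, 1:4]))
pc <- prcomp(X, center = FALSE)     # X is already centered and scaled
a  <- 1                             # component of interest
obs1 <- 1; obs2 <- 3

# Contribution of each variable to the difference in scores on component a
contrib <- (X[obs1, ] - X[obs2, ]) * pc$rotation[, a]

# With a single observation, compare against the model average,
# which is the zero vector after centering
contrib_avg <- X[obs1, ] * pc$rotation[, a]

contrib
sum(contrib)   # equals the score difference between obs1 and obs2
```

Variables with the largest absolute contributions are the ones that move the two observations apart along the chosen component.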
Nelson Lee Afanador ([email protected])
MacGregor, Process Monitoring and Diagnosis by Multiblock PLS Methods, May 1994 Vol. 40, No. 5 AIChE Journal.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "none")
Score.Contributions1 <- ScoreContrib(mod1, ncomp = 1:2, obs1 = 1, obs2 = 3)
plot(Score.Contributions1, ncomp = 2)

## Not run: 
data(Penta)
mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "none")
Score.Contributions2 <- ScoreContrib(mod2, obs1 = 1, obs2 = 3)
plot(Score.Contributions2)
Score.Contributions3 <- ScoreContrib(mod1, obs1 = c(1, 3), obs2 = c(5:10))
plot(Score.Contributions3)
## End(Not run)

### PLS MODEL FIT WITH method = 'wrtpls' and validation = 'none', i.e. WRT-PLS is performed ###
## Not run: 
mod3 <- plsFit(Sepal.Length ~., scale = TRUE, data = iris,
               method = "wrtpls", validation = "none") #ncomp is ignored
Score.Contributions4 <- ScoreContrib(mod3, ncomp = 1:5, obs1 = 1, obs2 = 3)
plot(Score.Contributions4, ncomp = 5)
## End(Not run)

#PCA Model
pc1 <- pcaFit(Penta[, -1], ncomp = 2)
Score.Contributions1 <- ScoreContrib(pc1, obs1 = 1, obs2 = 3)
plot(Score.Contributions1)
Generates a 2-dimensional graph of the scores for both mvdareg and mvdapca objects.
scoresplot(object, comps = c(1, 2), alphas = c(.95, .99), segments = 51, verbose = FALSE)
object |
an object of class |
comps |
a vector of length 2 corresponding to the components to include. |
alphas |
draw elliptical contours at these confidence levels. |
segments |
number of line-segments used to draw ellipse. |
verbose |
output results as a data frame |
scoresplot is used to extract a 2D graphical summary of the scores of PLS and PCA models.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
scoresplot(mod1, comps = c(1, 2))
Missing values are sequentially updated via an EM algorithm.
SeqimputeEM(data, max.ncomps = 5, max.ssq = 0.99, Init = "mean", adjmean = FALSE, max.iters = 200, tol = .Machine$double.eps^0.25)
data |
a dataset with missing values. |
max.ncomps |
integer corresponding to the maximum number of components to test |
max.ssq |
maximal SSQ for final number of components. This will be improved by automation. |
Init |
For continuous variables, impute either the mean or the median. |
adjmean |
Adjust (recalculate) mean after each iteration. |
max.iters |
maximum number of iterations for the algorithm. |
tol |
the threshold for assessing convergence. |
A completed data frame is returned that mirrors the model matrix. NAs are replaced with convergence values as obtained via the sequential EM algorithm. If object contains no NAs, it is returned unaltered.
Imputed.DataFrames |
A list of imputed data frames across |
ncomps |
number of components to test |
Thanh Tran ([email protected]), Nelson Lee Afanador ([email protected])
NOTE: Publication Pending
dat <- introNAs(iris, percent = 25)
SeqimputeEM(dat)
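The general idea of EM-style imputation with a latent variable model can be sketched in base R. This is only an illustration under stated assumptions, not the algorithm used by SeqimputeEM: the helper name em_pca_impute is hypothetical, a fixed number of components is used instead of testing up to max.ncomps, only numeric columns are handled, and the column means are recalculated at every iteration.

```r
# Hypothetical helper, not part of mvdalab: EM-style imputation of a numeric
# matrix using a rank-k PCA (SVD) reconstruction.
em_pca_impute <- function(X, ncomp = 2, max.iters = 200,
                          tol = .Machine$double.eps^0.25) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Initialise missing cells with the column means of the observed values
  col_means <- colMeans(X, na.rm = TRUE)
  X[miss] <- col_means[col(X)[miss]]
  for (iter in seq_len(max.iters)) {
    mu <- colMeans(X)
    Xc <- sweep(X, 2, mu)
    s <- svd(Xc, nu = ncomp, nv = ncomp)
    # Rank-k reconstruction, shifted back to the original scale
    Xhat <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    Xhat <- sweep(Xhat, 2, mu, "+")
    change <- sum((X[miss] - Xhat[miss])^2)
    X[miss] <- Xhat[miss]   # only the missing cells are updated
    if (change < tol) break
  }
  X
}

set.seed(1)
dat <- as.matrix(iris[, 1:4])
dat_na <- dat
dat_na[sample(length(dat_na), 30)] <- NA   # knock out 30 cells at random
completed <- em_pca_impute(dat_na, ncomp = 2)
```

Observed cells are never modified; only the cells that were NA are replaced by the converged reconstruction.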
This function calculates the significant multivariate correlation (smc) metric for an mvdareg object.
smc(object, ncomps = object$ncomp, corrected = F)
object |
an mvdareg or mvdapca object, i.e. |
ncomps |
the number of components to include in the model (see below). |
corrected |
whether there should be a correction of 1st order auto-correlation in the residuals. |
Note that hidden objects include the smc modeled matrix and error matrices.
smc is used to extract a summary of the significant multivariate correlation of a PLS model.
If ncomps is missing (or is NULL), summaries for all smc estimates are returned. Otherwise, if ncomps is given, parameters for a model with only the requested components are returned.
The output of smc is an smc summary detailing the following:
smc |
significant multivariate correlation statistic ( |
p.value |
p-value of the smc statistic. |
f.value |
f-value of the smc statistic. |
Significant |
Assessment of statistical significance. |
Nelson Lee Afanador ([email protected])
Thanh N. Tran, Nelson Lee Afanador, Lutgarde M.C. Buydens, Lionel Blanchet, Interpretation of variable importance in Partial Least Squares with Significance Multivariate Correlation (sMC). Chemom. Intell. Lab. Syst. 2014; 138: 153-160.
Nelson Lee Afanador, Thanh N. Tran, Lionel Blanchet, Lutgarde M.C. Buydens, Variable importance in PLS in the presence of autocorrelated data - Case studies in manufacturing processes. Chemom. Intell. Lab. Syst. 2014; 139: 139-145.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
smc(mod1)
plot(smc(mod1))

### PLS MODEL FIT WITH method = 'wrtpls' and validation = 'none', i.e. WRT-PLS is performed ###
## Not run:
mod2 <- plsFit(Sepal.Length ~., scale = TRUE, data = iris,
               method = "wrtpls", validation = "none") #ncomp is ignored
plot(smc(mod2, ncomps = 2))
## End(Not run)
This function performs a 1st order test of the Residual Significant Multivariate Correlation Matrix in order to help determine if the smc should be performed correcting for 1st order autocorrelation.
smc.acfTest(object, ncomp = object$ncomp)
object |
an object of class |
ncomp |
the number of components to include in the acf assessment |
This function computes a test for 1st order autocorrelation in the smc residual matrix.
The output of smc.acfTest is a list detailing the following:
variable |
variable for which the test is being performed |
ACF |
value of the 1st lag of the ACF |
Significant |
Assessment of the statistical significance of the 1st order lag |
Nelson Lee Afanador ([email protected])
Thanh N. Tran, Nelson Lee Afanador, Lutgarde M.C. Buydens, Lionel Blanchet, Interpretation of variable importance in Partial Least Squares with Significance Multivariate Correlation (sMC). Chemom. Intell. Lab. Syst. 2014; 138: 153-160.
Nelson Lee Afanador, Thanh N. Tran, Lionel Blanchet, Lutgarde M.C. Buydens, Variable importance in PLS in the presence of autocorrelated data - Case studies in manufacturing processes. Chemom. Intell. Lab. Syst. 2014; 139: 139-145.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
smc.acfTest(mod1, ncomp = 2)
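A 1st order autocorrelation check of this kind can be illustrated with the acf function from the stats package. The sketch below is not the code behind smc.acfTest: the helper name lag1_acf_test is hypothetical, it works on any numeric residual matrix, and the significance rule (an approximate large-sample bound of qnorm(1 - alpha/2) / sqrt(n)) is an assumption.

```r
# Hypothetical helper (not the mvdalab implementation): lag-1 autocorrelation
# of each column of a residual matrix, flagged against an approximate
# large-sample bound of qnorm(1 - alpha / 2) / sqrt(n).
lag1_acf_test <- function(E, alpha = 0.05) {
  E <- as.matrix(E)
  n <- nrow(E)
  bound <- qnorm(1 - alpha / 2) / sqrt(n)
  # $acf holds lag 0 in position 1 and lag 1 in position 2
  r1 <- apply(E, 2, function(e) acf(e, lag.max = 1, plot = FALSE)$acf[2])
  data.frame(variable = colnames(E), ACF = unname(r1),
             Significant = abs(r1) > bound, row.names = NULL)
}

set.seed(42)
n <- 200
white <- rnorm(n)                                      # no autocorrelation
ar1 <- as.numeric(arima.sim(list(ar = 0.8), n = n))    # strong AR(1) series
res <- lag1_acf_test(cbind(white = white, ar1 = ar1))
res
```

A variable flagged as significant here would suggest that an autocorrelation correction is worth considering for that variable.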
This function calculates the Selectivity Ratio (sr) metric for an mvdareg object.
sr(object, ncomps = object$ncomp)
object |
an mvdareg or mvdapca object, i.e. |
ncomps |
the number of components to include in the model (see below). |
sr is used to extract a summary of the selectivity ratio of a PLS model.
If ncomps is missing (or is NULL), summaries for all sr estimates are returned. Otherwise, if ncomps is given, parameters for a model with only the requested components are returned.
The output of sr is an sr summary detailing the following:
sr |
selectivity ratio statistic ( |
p.value |
p-value of the sr statistic. |
f.value |
f-value of the sr statistic. |
Significant |
Assessment of statistical significance. |
Note that hidden objects include the SR modeled matrix and error matrices.
Nelson Lee Afanador ([email protected])
O.M. Kvalheim, T.V. Karstang, Interpretation of latent-variable regression models. Chemom. Intell. Lab. Syst., 7 (1989), pp. 39-51
O.M. Kvalheim, Interpretation of partial least squares regression models by means of target projection and selectivity ratio plots. J. Chemom., 24 (2010), pp. 496-504
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
sr(mod1)
plot(sr(mod1))

## Not run:
mod2 <- plsFit(Sepal.Length ~., scale = TRUE, data = iris,
               method = "wrtpls", validation = "none") #ncomp is ignored
plot(sr(mod2, ncomps = 2))
## End(Not run)
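The selectivity ratio described by Kvalheim is the ratio of explained to residual variance of each predictor on the target-projected component. The following base R sketch shows that calculation under stated assumptions: the helper name selectivity_ratio is hypothetical, the coefficient vector comes from ordinary least squares as a stand-in for PLS coefficients, and none of this is taken from the internals of sr.

```r
# Hypothetical helper: selectivity ratio from a predictor matrix X and a
# regression coefficient vector b (assumed to come from any fitted linear
# latent-variable model; ordinary least squares is used below for brevity).
selectivity_ratio <- function(X, b) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  w_tp <- b / sqrt(sum(b^2))                 # normalised target direction
  t_tp <- X %*% w_tp                         # target-projected scores
  p_tp <- crossprod(X, t_tp) / sum(t_tp^2)   # target-projected loadings
  X_tp <- t_tp %*% t(p_tp)                   # part of X explained by the target component
  E <- X - X_tp                              # residual part of X
  colSums(X_tp^2) / colSums(E^2)             # explained / residual variance per variable
}

X <- as.matrix(iris[, 2:4])
y <- iris$Sepal.Length
b <- coef(lm(y ~ X))[-1]   # OLS stand-in for PLS coefficients (assumption)
sr_values <- selectivity_ratio(X, b)
sr_values
```

Larger values indicate variables whose variation is mostly captured by the predictive (target) direction.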
Generates a Hotelling's T2 graph for both mvdareg and mvdapca objects.
T2(object, ncomp = object$ncomp, phase = 1, conf = c(.95, .99), verbose = FALSE)
object |
an object of class |
ncomp |
the number of components to include in the calculation of Hotelling's T2. |
phase |
designates whether the confidence limits should reflect the current data frame, |
conf |
the confidence level(s) to use for upper control limit. |
verbose |
output results as a data frame |
T2 is used to generate a Hotelling's T2 graph for both PLS and PCA models.
The output of T2 is a graph of Hotelling's T2 and a data frame listing the T2 values.
Nelson Lee Afanador ([email protected])
Hotelling, H. (1931). "The generalization of Student's ratio". Annals of Mathematical Statistics 2 (3): 360-378.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
T2(mod1, ncomp = 2)
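For readers who want to see what a Hotelling's T2 value is, the sketch below computes it from PCA scores using stats::prcomp. It is an illustration only: prcomp stands in for an mvdapca fit, and the F-based upper control limit is one commonly used form, assumed here rather than taken from the T2 source code.

```r
# Hypothetical sketch using stats::prcomp; not the mvdalab implementation.
X <- scale(as.matrix(iris[, 1:4]))
A <- 2                                   # number of retained components
n <- nrow(X)
pc <- prcomp(X, center = FALSE, scale. = FALSE)  # X is already centred and scaled
scores <- pc$x[, 1:A, drop = FALSE]
score_var <- pc$sdev[1:A]^2              # variance of each score vector
# T2 for each observation: squared scores divided by their variances, summed
T2_values <- rowSums(sweep(scores^2, 2, score_var, "/"))
# One commonly used F-based upper control limit (an assumption here)
conf <- c(0.95, 0.99)
ucl <- A * (n - 1) * (n + 1) / (n * (n - A)) * qf(conf, A, n - A)
flagged <- which(T2_values > ucl[1])     # observations above the 95% limit
```

Because each score vector is standardised by its own variance, the T2 values sum to A * (n - 1).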
Twenty-five observations where 'H' represents Brinell hardness and 'S' represents tensile strength.
Wang_Chen
A data frame with 25 observations and the following 2 variables.
H
Brinell hardness
S
tensile strength
Wang F, Chen J (1998). "Capability index using principal components analysis." Quality Engineering, 11, 21-27.
Fifty observations where 'D' represents depth, 'L' represents length, and 'W' represents width.
Wang_Chen_Sim
A simulated data frame with 50 observations and the following 3 variables.
D
depth
L
length
W
width
Data simulated by Nelson Lee Afanador from average and covariance estimates provided in Wang F, Chen J (1998). "Capability index using principal components analysis." Quality Engineering, 11, 21-27.
Computes bootstrap BCa confidence intervals for the weights, along with expanded bootstrap summaries.
weight.boots(object, ncomp = object$ncomp, conf = .95)
object |
an object of class |
ncomp |
number of components in the model. |
conf |
desired confidence level. |
The function computes the bootstrap BCa confidence intervals for fitted mvdareg models where validation = "oob".
It should be used in instances in which there is reason to suspect the percentile intervals. Results are provided across all latent variables or for specific latent variables via ncomp.
A weight.boots object contains component results for the following:
variable |
variable names. |
actual |
Actual loading estimate using all the data. |
BCa percentiles |
confidence intervals. |
boot.mean |
mean of the bootstrap. |
skewness |
skewness of the bootstrap distribution. |
bias |
estimate of bias w.r.t. the loading estimate. |
Bootstrap Error |
estimate of bootstrap standard error. |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1992) Jackknife-after-bootstrap standard errors and influence functions (with Discussion). Journal of the Royal Statistical Society, B, 54, 83-127.
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
weight.boots(mod1, ncomp = 2, conf = .95)
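The BCa (bias-corrected and accelerated) adjustment can be sketched for a single scalar statistic in base R. This is a generic textbook-style illustration, not the code used by weight.boots: the helper name bca_interval is hypothetical, the acceleration is estimated from leave-one-out (jackknife) values, and the statistic is a sample mean rather than a PLS weight.

```r
# Hypothetical helper: BCa interval for a scalar statistic of a numeric vector.
bca_interval <- function(x, stat, boots = 1000, conf = 0.95) {
  n <- length(x)
  est <- stat(x)
  boot_est <- replicate(boots, stat(sample(x, n, replace = TRUE)))
  # Bias correction: share of bootstrap estimates below the actual estimate
  z0 <- qnorm(mean(boot_est < est))
  # Acceleration from jackknife (leave-one-out) estimates
  jack <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  d <- mean(jack) - jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)
  # Adjusted percentile levels replace the nominal (1 - conf) / 2 tails
  z <- qnorm(c((1 - conf) / 2, 1 - (1 - conf) / 2))
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  ci <- quantile(boot_est, adj, names = FALSE)
  c(actual = est, lower = ci[1], upper = ci[2],
    boot.mean = mean(boot_est), bias = mean(boot_est) - est)
}

set.seed(123)
x <- rexp(60)                      # skewed data, where BCa and percentile limits can differ
ci_mean <- bca_interval(x, mean)
ci_mean
```

When z0 and a are both zero the adjusted levels reduce to the ordinary percentile levels, which is why BCa is mainly of interest for skewed or biased bootstrap distributions.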
Functions to extract weights bootstrap information from mvdalab objects.
## S3 method for class 'mvdareg'
weights(object, ncomp = object$ncomp, conf = .95, ...)
object |
an mvdareg or mvdapca object, i.e. |
ncomp |
the number of components to include in the model (see below). |
conf |
for a bootstrapped model, the confidence level to use. |
... |
additional arguments. Currently ignored. |
weights is used to extract a summary of the weights of a PLS model.
If ncomp is missing (or is NULL), summaries for all regression estimates are returned. Otherwise, if ncomp is given, parameters for a model with only the requested components are returned.
For mvdareg objects only, bootstrap summaries provided are for actual regression weights, bootstrap percentiles, bootstrap mean, skewness, and bias. These summaries can also be extracted using weight.boots.
A weights object contains a data frame with columns:
variable |
variable names. |
Actual |
Actual loading estimate using all the data. |
BCa percentiles |
confidence intervals. |
boot.mean |
mean of the bootstrap. |
skewness |
skewness of the bootstrap distribution. |
bias |
estimate of bias w.r.t. the loading estimate. |
Bootstrap Error |
estimate of bootstrap standard error. |
t value |
approximate 't-value' based on the |
bias t value |
approximate 'bias t-value' based on the |
Nelson Lee Afanador ([email protected])
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1992) Jackknife-after-bootstrap standard errors and influence functions (with Discussion). Journal of the Royal Statistical Society, B, 54, 83-127.
weightsplot, weight.boots, weightsplot2D
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
weights(mod1, ncomp = 2, conf = .95)
Functions to extract regression coefficient bootstrap information from mvdalab objects.
weightsplot(object, ncomp = object$ncomp, conf = .95, verbose = FALSE)
object |
an mvdareg object, i.e. |
ncomp |
the number of components to include. |
conf |
for a bootstrapped model, the confidence level to use. |
verbose |
output results as a data frame |
weightsplot is used to extract a graphical summary of the weights of a PLS model.
If ncomp is missing (or is NULL), a graphical summary for the nth component regression estimates is returned. Otherwise, if ncomp is given, a graphical summary for only the requested components is returned.
Bootstrap graphical summaries are provided when validation = "oob".
Nelson Lee Afanador ([email protected])
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
weightsplot(mod1, ncomp = 1:2)
Functions to extract 2D graphical weights information from mvdalab objects.
weightsplot2D(object, comps = c(1, 2), verbose = FALSE)
object |
an mvdareg object, i.e. |
comps |
a vector of length 2 corresponding to the components to include. |
verbose |
output results as a data frame |
weightsplot2D is used to extract a graphical summary of the weights of a PLS model.
If comps is missing (or is NULL), a graphical summary for the 1st and 2nd components is returned.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
weightsplot2D(mod1, comps = c(1, 2))
Weight Randomization Test algorithm for PLS1
wrtpls.fit(X, Y, ncomp, perms, alpha, ...)
X |
a matrix of observations. |
Y |
a vector. |
ncomp |
the number of components to include in the model (see below). |
alpha |
the significance level for |
perms |
the number of permutations to run for |
... |
additional arguments. Currently ignored. |
This function should not be called directly, but through plsFit with the argument method="wrtpls". It implements the Bidiag2 scores algorithm with a permutation test for selecting the statistically significant components.
An object of class mvdareg is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:
loadings |
X loadings |
weights |
weights |
D2 |
bidiag2 matrix |
iD2 |
inverse of bidiag2 matrix |
Ymean |
mean of response variable |
Xmeans |
mean of predictor variables |
coefficients |
regression coefficients |
y.loadings |
y-loadings |
scores |
X scores |
R |
orthogonal weights |
Y |
scaled response values |
Yactual |
actual response values |
fitted |
fitted values |
residuals |
residuals |
Xdata |
X matrix |
iPreds |
predicted values |
y.loadings2 |
scaled y-loadings |
wrtpls |
permutations effected |
wrtpls.out.Sig |
Significant LVs |
wrtpls.crit |
weight critical values |
actual.normwobs |
normed weights |
fit.time |
model fitting time |
val.method |
validation method |
ncomp |
number of latent variables |
perms |
number of permutations performed |
alpha |
permutation alpha value |
method |
PLS algorithm |
scale |
scaling used |
scaled |
was scaling performed |
call |
model call |
terms |
model terms |
mm |
model matrix |
model |
fitted model |
Nelson Lee Afanador ([email protected]), Thanh Tran ([email protected])
Indahl, Ulf G., (2014) The geometry of PLS1 explained properly: 10 key notes on mathematical properties of and some alternative algorithmic approaches to PLS1 modeling. Journal of Chemometrics, 28, 168-180.
Manne R., Analysis of two partial-least-squares algorithms for multi-variate calibration. Chemom. Intell. Lab. Syst. 1987; 2: 187-197.
Thanh Tran, Ewa Szymanska, Jan Gerretzen, Lutgarde Buydens, Nelson Lee Afanador, Lionel Blanchet, Weight Randomization Test for the Selection of the Number of Components in PLS Models. Chemom. Intell. Lab. Syst., accepted for publication - Jan 2017.
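The idea of testing a PLS weight vector by permutation can be illustrated with a deliberately simplified, single-component sketch in base R. It is not the WRT-PLS algorithm of the reference above and not the code in wrtpls.fit: the helper name perm_weight_test and the choice of test statistic (the norm of the unnormalised first weight vector X'y) are assumptions made for illustration.

```r
# Hypothetical helper: permutation test on the norm of the first PLS1
# weight vector, w1 = X'y (before normalisation).
perm_weight_test <- function(X, y, perms = 500, alpha = 0.05) {
  X <- scale(as.matrix(X))
  y <- as.numeric(scale(y))
  w_norm <- function(yy) sqrt(sum(crossprod(X, yy)^2))
  actual <- w_norm(y)
  # Null distribution: permuting y breaks the X-y relationship
  null <- replicate(perms, w_norm(sample(y)))
  crit <- quantile(null, 1 - alpha, names = FALSE)
  list(actual = actual, crit = crit, significant = actual > crit)
}

set.seed(7)
out <- perm_weight_test(iris[, 2:4], iris$Sepal.Length)
out$significant
```

A component is retained in this sketch only if its observed weight norm exceeds the critical value obtained from the permuted responses.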
Generates a graph of the X-residuals for both mvdareg and mvdapca objects.
Xresids(object, ncomp = object$ncomp, conf = c(.95, .99), normalized = TRUE, verbose = FALSE)
object |
an object of class |
ncomp |
the number of components to include in the calculation of the X-residuals. |
conf |
the confidence level(s) to use for upper control limit. |
normalized |
should residuals be normalized |
verbose |
output results as a data frame |
Xresids is used to generate a graph of the X-residuals for both PLS and PCA models.
The output of Xresids is a graph of X-residuals and a data frame listing the X-residuals values.
Nelson Lee Afanador ([email protected])
MacGregor, Process Monitoring and Diagnosis by Multiblock PLS Methods, May 1994 Vol. 40, No. 5 AIChE Journal.
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
Xresids(mod1, ncomp = 2)
Generates the squared prediction error (SPE) contributions and graph for both mvdareg and mvdapca objects.
XresidualContrib(object, ncomp = object$ncomp, obs1 = 1)
object |
an object of class |
ncomp |
the number of components to include in the SPE calculation. |
obs1 |
the observation to use in the SPE assessment. |
XresidualContrib is used to generate the squared prediction error (SPE) contributions and graph for both PLS and PCA models. Only one observation at a time is supported.
The output of XresidualContrib is a matrix of score contributions for a specified observation and the corresponding graph.
Nelson Lee Afanador ([email protected])
MacGregor, Process Monitoring and Diagnosis by Multiblock PLS Methods, May 1994 Vol. 40, No. 5 AIChE Journal
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
XresidualContrib(mod1, ncomp = 2, obs1 = 3)

## Not run:
#PCA Model
pc1 <- pcaFit(Penta[, -1], ncomp = 4)
XresidualContrib(pc1, ncomp = 3, obs1 = 3)
## End(Not run)
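The quantities behind X-residual and SPE contribution plots can be reproduced for a PCA model with stats::prcomp. The sketch below is illustrative only and assumes prcomp as a stand-in for an mvdapca fit; it does not reproduce the normalisation or control limits used by Xresids and XresidualContrib.

```r
# Hypothetical sketch with stats::prcomp; not the mvdalab implementation.
X <- scale(as.matrix(iris[, 1:4]))
A <- 2                                     # number of retained components
pc <- prcomp(X, center = FALSE, scale. = FALSE)
P <- pc$rotation[, 1:A, drop = FALSE]      # loadings of the retained components
Xhat <- X %*% P %*% t(P)                   # reconstruction from A components
E <- X - Xhat                              # X-residuals
SPE <- rowSums(E^2)                        # squared prediction error per observation
obs1 <- 3
contrib <- E[obs1, ]                       # signed residual of each variable for obs1
contrib_sq <- contrib^2                    # squared contributions, summing to SPE[obs1]
```

Variables with the largest squared contributions are the ones the model reconstructs worst for the chosen observation.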
Functions to extract the y-loadings from mvdareg and mvdapca objects.
y.loadings(object, conf = .95)
object |
an |
conf |
for a bootstrapped model, the confidence level to use. |
y.loadings is used to extract a summary of the y-loadings from a PLS or PCA model.
If comps is missing (or is NULL), summaries for all regression estimates are returned. Otherwise, if comps is provided, the requested components are returned.
For mvdareg objects only, bootstrap summaries provided are for actual regression y.loadings, bootstrap percentiles, bootstrap mean, skewness, and bias. These summaries can also be extracted using y.loadings.boots.
Nelson Lee Afanador ([email protected])
data(Penta)
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "loo")
y.loadings(mod1)
Functions to extract the y-loadings from mvdareg and mvdapca objects.
y.loadings.boots(object, ncomp = object$ncomp, conf = 0.95)
object |
an |
ncomp |
the number of components to include in the model (see below). |
conf |
for a bootstrapped model, the confidence level to use. |
y.loadings.boots is used to extract a summary of the y-loadings from a PLS or PCA model.
If ncomp is missing (or is NULL), summaries for all regression estimates are returned. Otherwise, if ncomp is provided, the requested components are returned.
For mvdareg objects only, bootstrap summaries provided are for actual regression y.loadings, bootstrap percentiles, bootstrap mean, skewness, and bias. These summaries can also be extracted using y.loadings.boots.
Nelson Lee Afanador ([email protected])
data(Penta)
## Number of bootstraps set to 300 to demonstrate flexibility
## Use a minimum of 1000 (default) for results that support bootstrapping
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1],
               ncomp = 2, validation = "oob", boots = 300)
y.loadings(mod1)
y.loadings.boots(mod1)