Title: | Multivariate Analysis |
---|---|
Description: | Multivariate analysis, with functions that perform simple correspondence analysis (CA) and multiple correspondence analysis (MCA), principal component analysis (PCA), canonical correlation analysis (CCA), factor analysis (FA), multidimensional scaling (MDS), linear (LDA) and quadratic discriminant analysis (QDA), hierarchical and non-hierarchical cluster analysis, simple and multiple linear regression, multiple factor analysis (MFA) for quantitative, qualitative, frequency (MFACT) and mixed data, biplots, scatter plots, projection pursuit (PP), the grand tour method, and other useful functions for multivariate analysis. |
Authors: | Paulo Cesar Ossani [aut, cre], Marcelo Angelo Cirillo [aut] |
Maintainer: | Paulo Cesar Ossani |
License: | GPL-3 |
Version: | 2.2.5 |
Built: | 2024-12-23 06:48:50 UTC |
Source: | CRAN |
Package: | MVar |
Type: | Package |
Version: | 2.2.5 |
Date: | 2024-11-22 |
License: | GPL(>=2) |
LazyLoad: | yes |
Paulo Cesar Ossani and Marcelo Angelo Cirillo.
Maintainer: Paulo Cesar Ossani
Abdessemed, L.; Escofier, B. Analyse factorielle multiple de tableaux de frequences: comparaison avec l'analyse canonique des correspondances. Journal de la Societe de Statistique de Paris, Paris, v. 137, n. 2, p. 3-18, 1996.
Abdi, H. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In: SALKIND, N. J. (Ed.). Encyclopedia of measurement and statistics. Thousand Oaks: Sage, 2007. p. 907-912.
Abdi, H.; Valentin, D. Multiple factor analysis (MFA). In: SALKIND, N. J. (Ed.). Encyclopedia of measurement and statistics. Thousand Oaks: Sage, 2007. p. 657-663.
Abdi, H.; Williams, L. Principal component analysis. WIREs Computational Statistics, New York, v. 2, n. 4, p. 433-459, July/Aug. 2010.
Abdi, H.; Williams, L.; Valentin, D. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Computational Statistics, New York, v. 5, n. 2, p. 149-179, Feb. 2013.
Asimov, D. The Grand Tour: A Tool for Viewing Multidimensional Data. SIAM Journal of Scientific and Statistical Computing, 6(1), 128-143, 1985.
Asimov, D.; Buja, A. The grand tour via geodesic interpolation of 2-frames. in Visual Data Exploration and Analysis. Symposium on Electronic Imaging Science and Technology, IS&T/SPIE. 1994.
Becue-Bertaut, M.; Pages, J. A principal axes method for comparing contingency tables: MFACT. Computational Statistics & Data Analysis, New York, v. 45, n. 3, p. 481-503, Feb. 2004.
Becue-Bertaut, M.; Pages, J. Multiple factor analysis and clustering of a mixture of quantitative, categorical and frequency data. Computational Statistics & Data Analysis, New York, v. 52, n. 6, p. 3255-3268, Feb. 2008.
Benzecri, J. Analyse de l'inertie intraclasse par l'analyse d'un tableau de contingence: intra-class inertia analysis through the analysis of a contingency table. Les Cahiers de l'Analyse des Donnees, Paris, v. 8, n. 3, p. 351-358, 1983.
Buja, A.; Asimov, D. Grand tour methods: An outline. Computer Science and Statistics, 17:63-67. 1986.
Buja, A.; Cook, D.; Asimov, D.; Hurley, C. Computational Methods for High-Dimensional Rotations in Data Visualization, in C. R. Rao, E. J. Wegman & J. L. Solka, eds, "Handbook of Statistics: Data Mining and Visualization", Elsevier/North Holland, http://www.elsevier.com, pp. 391-413. 2005.
Charnet, R. et al. Analise de modelos de regressao linear. 2a ed. Campinas: Editora da Unicamp, 2008. 357 p.
Cook, D.; Lee, E. K.; Buja, A.; Wickham, H. Grand tours, projection pursuit guided tours and manual controls. In: Chen, C.; Hardle, W.; Unwin, A. (Eds.), Handbook of Data Visualization, Springer Handbooks of Computational Statistics, chapter III.2, p. 295-314. Springer, 2008.
Cook, D.; Buja, A.; Cabrera, J. Projection pursuit indexes based on orthonormal function expansions. Journal of Computational and Graphical Statistics, 2(3):225-250, 1993.
Cook, D.; Buja, A.; Cabrera, J.; Hurley, C. Grand tour and projection pursuit, Journal of Computational and Graphical Statistics, 4(3), 155-172, 1995.
Cook, D.; Swayne, D. F. Interactive and Dynamic Graphics for Data Analysis: With R and GGobi. Springer. 2007.
Escofier, B. Analyse factorielle en reference a un modele: application a l'analyse d'un tableau d'echanges. Revue de Statistique Appliquee, Paris, v. 32, n. 4, p. 25-36, 1984.
Escofier, B.; Drouet, D. Analyse des differences entre plusieurs tableaux de frequence. Les Cahiers de l'Analyse des Donnees, Paris, v. 8, n. 4, p. 491-499, 1983.
Escofier, B.; Pages, J. Analyses factorielles simples et multiples. Paris: Dunod, 1990. 267 p.
Escofier, B.; Pages, J. Analyses factorielles simples et multiples: objectifs, methodes et interpretation. 4th ed. Paris: Dunod, 2008. 318 p.
Escofier, B.; Pages, J. Comparaison de groupes de variables definies sur le meme ensemble d'individus: un exemple d'applications. Le Chesnay: Institut National de Recherche en Informatique et en Automatique, 1982. 121 p.
Escofier, B.; Pages, J. Multiple factor analysis (AFMULT package). Computational Statistics & Data Analysis, New York, v. 18, n. 1, p. 121-140, Aug. 1994.
Espezua, S.; Villanueva, E.; Maciel, C. D.; Carvalho, A. A projection pursuit framework for supervised dimension reduction of high dimensional small sample datasets. Neurocomputing, 149, 767-776, 2015.
Ferreira, D. F. Estatistica multivariada. 2. ed. rev. e ampl. Lavras: UFLA, 2011. 675 p.
Friedman, J. H.; Tukey, J. W. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23(9):881-890, 1974.
Greenacre, M.; Blasius, J. Multiple correspondence analysis and related methods. New York: Taylor and Francis, 2006. 607 p.
Hastie, T.; Buja, A.; Tibshirani, R. Penalized discriminant analysis. The Annals of Statistics, 23(1), 73-102, 1995.
Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, Arlington, v. 24, p. 417-441, Sept. 1933.
Huber, P. J. Projection pursuit. Annals of Statistics, 13(2):435-475, 1985.
Hurley, C.; Buja, A. Analyzing high-dimensional data with motion graphics, SIAM Journal of Scientific and Statistical Computing, 11 (6), 1193-1211. 1990.
Johnson, R. A.; Wichern, D. W. Applied multivariate statistical analysis. 6th ed. New Jersey: Prentice Hall, 2007. 794 p.
Jones, M. C.; Sibson, R. What is projection pursuit, (with discussion), Journal of the Royal Statistical Society, Series A 150, 1-36, 1987.
Lee, E.; Cook, D.; Klinke, S.; Lumley, T. Projection pursuit for exploratory supervised classification. Journal of Computational and Graphical Statistics, 14(4):831-846, 2005.
Lee, E. K., Cook, D. A projection pursuit index for large p small n data. Statistics and Computing, 20(3):381-392, 2010.
Martinez, W. L.; Martinez, A. R. Computational Statistics Handbook with MATLAB, 2nd ed. New York: Chapman & Hall/CRC, 2007. 794 p.
Martinez, W. L.; Martinez, A. R.; Solka, J. Exploratory Data Analysis with MATLAB, 2nd ed. New York: Chapman & Hall/CRC, 2010. 499 p.
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Ossani, P. C.; Cirillo, M. A.; Borem, F. M.; Ribeiro, D. E.; Cortez, R. M. Quality of specialty coffees: a sensory evaluation by consumers using the MFACT technique. Revista Ciencia Agronomica (UFC. Online), v. 48, p. 92-100, 2017.
Ossani, P. C. Qualidade de cafes especiais e nao especiais por meio da analise de multiplos fatores para tabelas de contingencias. 2015. 107 p. Dissertacao (Mestrado em Estatistica e Experimentacao Agropecuaria) - Universidade Federal de Lavras, Lavras, 2015.
Pages, J. Analyse factorielle multiple appliquee aux variables qualitatives et aux donnees mixtes. Revue de Statistique Appliquee, Paris, v. 50, n. 4, p. 5-37, 2002.
Pages, J. Multiple factor analysis: main features and application to sensory data. Revista Colombiana de Estadistica, Bogota, v. 27, n. 1, p. 1-26, 2004.
Pena, D.; Prieto, F. Cluster identification using projections. Journal of the American Statistical Association, 96(456):1433-1445, 2001.
Posse, C. Projection pursuit exploratory data analysis, Computational Statistics and Data Analysis, 29:669-687, 1995a.
Posse, C. Tools for two-dimensional exploratory projection pursuit, Journal of Computational and Graphical Statistics, 4:83-100, 1995b.
Rencher, A. C. Methods of Multivariate Analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Young, F. W.; Rheingans P. Visualizing structure in high-dimensional multivariate data, IBM Journal of Research and Development, 35:97-107, 1991.
Young, F. W.; Faldowski R. A.; McFarlane M. M. Multivariate statistical visualization, in Handbook of Statistics, Vol 9, C. R. Rao (ed.), The Netherlands: Elsevier Science Publishers, 959-998, 1993.
Plots a biplot.
Biplot(data, alpha = 0.5, title = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, var = TRUE, obs = TRUE, linlab = NA, class = NA, classcolor = NA, posleg = 2, boxleg = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300)
data |
Data for plotting. |
alpha |
Weight given to the representation of the individuals (alpha); the variables receive the complementary weight 1 - alpha (default = 0.5). |
title |
Title of the graph; if not set, the default text is used. |
xlabel |
Name of the X axis; if not set, the default text is used. |
ylabel |
Name of the Y axis; if not set, the default text is used. |
size |
Size of the points in the graphs. |
grid |
Adds a grid to the graph (default = TRUE). |
color |
Colored graph (default = TRUE). |
var |
Adds the variable projections to the graph (default = TRUE). |
obs |
Adds the observations to the graph (default = TRUE). |
linlab |
Vector with the labels for the observations. |
class |
Vector with names of data classes. |
classcolor |
Vector with the colors of the classes. |
posleg |
0 for no legend,
boxleg |
Draws a frame around the legend (default = TRUE). |
axes |
Plots the X and Y axes (default = TRUE). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
Biplot |
Biplot graph. |
Md |
Matrix of eigenvalues. |
Mu |
Matrix U (eigenvectors). |
Mv |
Matrix V (eigenvectors). |
coorI |
Coordinates of the individuals. |
coorV |
Coordinates of the variables. |
pvar |
Proportion of variance explained by each principal component. |
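The alpha argument controls how the singular values of the data matrix are split between the two sets of markers. As an illustration, here is a minimal base-R sketch of the standard SVD biplot factorization, assuming column-centered data. The helper biplot_coords is hypothetical, and MVar's own centering, scaling and sign conventions may differ; the returned names only mirror coorI, coorV and pvar above.

```r
# Hypothetical helper: SVD-based biplot coordinates (sketch, not MVar's code).
biplot_coords <- function(X, alpha = 0.5) {
  Xc <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column-center (assumption)
  s <- svd(Xc)                                             # Xc = U D V'
  list(
    coorI = s$u %*% diag(s$d^alpha),        # individuals: U D^alpha
    coorV = s$v %*% diag(s$d^(1 - alpha)),  # variables:   V D^(1 - alpha)
    pvar  = s$d^2 / sum(s$d^2)              # share of total variance per axis
  )
}

b <- biplot_coords(iris[, 1:4], alpha = 0.6)
round(b$pvar, 3)
# For any alpha, coorI %*% t(coorV) reproduces the centered data matrix.
```

With alpha = 1 the individuals are in principal coordinates, so their Euclidean distances are preserved; with alpha = 0 the inner products of the variable markers equal X'X of the centered data, that is, (n - 1) times the covariance matrix.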
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar)

data(iris) # data set
data <- iris[,1:4]

Biplot(data)

cls <- iris[,5]
res <- Biplot(data, alpha = 0.6, title = "Biplot emphasizing the individuals",
              class = cls, classcolor = c("goldenrod3","gray56","red"),
              posleg = 2, boxleg = FALSE, axes = TRUE, savptc = FALSE,
              width = 3236, height = 2000, res = 300)
print(res$pvar)

res <- Biplot(data, alpha = 0.4, title = "Biplot emphasizing the variables",
              xlabel = "", ylabel = "", color = FALSE, obs = FALSE,
              savptc = FALSE, width = 3236, height = 2000, res = 300)
print(res$pvar)
Performs simple (CA) or multiple (MCA) correspondence analysis on a data set.
CA(data, typdata = "f", typmatrix = "I")
data |
Data to be analyzed (contingency table). |
typdata |
"f" for frequency data (default), "c" for qualitative data. |
typmatrix |
Matrix used for the calculations when typdata = "c" (default = "I"). |
depdata |
Result of the chi-square test of independence between rows and columns, at the 5% significance level. |
typdata |
Data type: "F" for frequency or "C" for qualitative data. |
numcood |
Number of principal components. |
mtxP |
Matrix of relative frequencies. |
vtrR |
Vector with the row sums. |
vtrC |
Vector with the column sums. |
mtxPR |
Matrix of row profiles. |
mtxPC |
Matrix of column profiles. |
mtxZ |
Matrix Z. |
mtxU |
Matrix with the eigenvectors U. |
mtxV |
Matrix with the eigenvectors V. |
mtxL |
Matrix with eigenvalues. |
mtxX |
Matrix with the principal coordinates of the rows. |
mtxY |
Matrix with the principal coordinates of the columns. |
mtxAutvlr |
Matrix of the inertias (variances), with their proportions and cumulative proportions. |
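Simple CA can be sketched in base R as a singular value decomposition of the matrix of standardized residuals, which ties together several of the values above (mtxP, mtxZ, mtxX, mtxY). This is a minimal sketch, not MVar's code: the helper ca_sketch is hypothetical and the built-in HairEyeColor table stands in for DataFreq. The total inertia equals the chi-square statistic divided by the grand total, which is why a chi-square test of independence (depdata) sits naturally alongside CA.

```r
# Hypothetical helper: simple correspondence analysis via SVD (sketch).
ca_sketch <- function(N) {
  N <- as.matrix(N)
  P <- N / sum(N)                  # relative frequencies (cf. mtxP)
  r <- rowSums(P)                  # row masses
  cm <- colSums(P)                 # column masses
  # standardized residuals (cf. mtxZ)
  Z <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
  s <- svd(Z)
  k <- min(dim(N)) - 1             # number of non-trivial axes
  d <- s$d[seq_len(k)]
  list(
    X = diag(1 / sqrt(r)) %*% s$u[, seq_len(k), drop = FALSE] %*% diag(d, k),   # row principal coordinates
    Y = diag(1 / sqrt(cm)) %*% s$v[, seq_len(k), drop = FALSE] %*% diag(d, k),  # column principal coordinates
    inertia = d^2                  # principal inertias
  )
}

N <- margin.table(HairEyeColor, c(1, 2))  # 4 x 4 hair-by-eye contingency table
res <- ca_sketch(N)
round(res$inertia / sum(res$inertia), 3)  # proportion of inertia per axis
chisq.test(N)$p.value < 0.05              # are rows and columns dependent at 5%?
```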
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar)

data(DataFreq) # frequency data set
data <- DataFreq[,2:ncol(DataFreq)]
rownames(data) <- as.character(t(DataFreq[1:nrow(DataFreq),1]))

res <- CA(data = data, typdata = "f") # performs CA

print("Is there dependency between rows and columns?"); res$depdata
print("Number of principal coordinates:"); res$numcood
print("Principal coordinates of the rows:"); round(res$mtxX,2)
print("Principal coordinates of the columns:"); round(res$mtxY,2)
print("Inertia of the principal components:"); round(res$mtxAutvlr,2)
Performs canonical correlation analysis (CCA) on a data set.
CCA(X = NULL, Y = NULL, type = 1, test = "Bartlett", sign = 0.05)
X |
First group of variables of a data set. |
Y |
Second group of variables of a data set. |
type |
1 for analysis using the covariance matrix (default), 2 for analysis using the correlation matrix. |
test |
Test of significance of the relationship between groups X and Y: "Bartlett" (default) or "Rao". |
sign |
Significance level of the test (default = 0.05). |
Cxx |
Covariance or correlation matrix Cxx. |
Cyy |
Covariance or correlation matrix Cyy. |
Cxy |
Covariance or correlation matrix Cxy. |
Cyx |
Covariance or correlation matrix Cyx. |
var.UV |
Matrix with eigenvalues (variances) of the canonical pairs U and V. |
corr.UV |
Matrix of the correlation of the canonical pairs U and V. |
coef.X |
Matrix of the canonical coefficients of the group X. |
coef.Y |
Matrix of the canonical coefficients of the group Y. |
corr.X |
Matrix of the correlations between the canonical variables and the original variables of the group X. |
corr.Y |
Matrix of the correlations between the canonical variables and the original variables of the group Y. |
score.X |
Matrix with the scores of the group X. |
score.Y |
Matrix with the scores of the group Y. |
sigtest |
Result of the significance test of the relationship between groups X and Y: "Bartlett" (default) or "Rao". |
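The canonical correlations are the square roots of the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, and Bartlett's test uses the statistic -(n - 1 - (p + q + 1)/2) * sum(log(1 - r_i^2)), approximately chi-square with p*q degrees of freedom under the hypothesis that all canonical correlations are zero. A minimal base-R sketch follows. The helpers canon_cor and bartlett_cca are hypothetical, the built-in LifeCycleSavings data stand in for DataMix, and MVar may test the canonical pairs sequentially rather than jointly. Because CCA is scale invariant, the correlations are the same for type = 1 and type = 2.

```r
# Hypothetical helpers: canonical correlations and Bartlett's test (sketch).
canon_cor <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Sxx <- cov(X); Syy <- cov(Y); Sxy <- cov(X, Y)
  M <- solve(Sxx, Sxy) %*% solve(Syy, t(Sxy))    # Sxx^-1 Sxy Syy^-1 Syx
  ev <- Re(eigen(M, only.values = TRUE)$values)  # eigenvalues are real and >= 0
  k <- min(ncol(X), ncol(Y))
  sqrt(pmax(ev[seq_len(k)], 0))                  # canonical correlations, decreasing
}

bartlett_cca <- function(r, n, p, q) {
  # H0: all canonical correlations are zero
  stat <- -(n - 1 - (p + q + 1) / 2) * sum(log(1 - r^2))
  c(statistic = stat, df = p * q,
    p.value = pchisq(stat, p * q, lower.tail = FALSE))
}

pop <- LifeCycleSavings[, 2:3]     # group X: pop15, pop75
oec <- LifeCycleSavings[, -(2:3)]  # group Y: sr, dpi, ddpi
r <- canon_cor(pop, oec)
bartlett_cca(r, n = nrow(pop), p = ncol(pop), q = ncol(oec))
```

The same correlations are returned by stats::cancor, which uses a QR-based algorithm instead of the eigen decomposition shown here.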
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Ferreira, D. F. Estatistica Multivariada. 2a ed. revisada e ampliada. Lavras: Editora UFLA, 2011. 676 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Lattin, J.; Carroll, J. D.; Green, P. E. Analise de dados multivariados. 1st ed. Sao Paulo: Cengage Learning, 2011. 455 p.
library(MVar)

data(DataMix) # data set
data <- DataMix[,2:ncol(DataMix)]
rownames(data) <- DataMix[,1]

X <- data[,1:2]
Y <- data[,5:6]

res <- CCA(X, Y, type = 2, test = "Bartlett", sign = 0.05)

print("Matrix with eigenvalues (variances) of the canonical pairs U and V:"); round(res$var.UV,3)
print("Matrix of the correlation of the canonical pairs U and V:"); round(res$corr.UV,3)
print("Matrix of the canonical coefficients of the group X:"); round(res$coef.X,3)
print("Matrix of the canonical coefficients of the group Y:"); round(res$coef.Y,3)
print("Matrix of the correlations between the canonical variables and the original variables of the group X:"); round(res$corr.X,3)
print("Matrix of the correlations between the canonical variables and the original variables of the group Y:"); round(res$corr.Y,3)
print("Matrix with the scores of the group X:"); round(res$score.X,3)
print("Matrix with the scores of the group Y:"); round(res$score.Y,3)
print("Test of significance of the canonical pairs:"); res$sigtest
Performs hierarchical and non-hierarchical cluster analysis on a data set.
Cluster(data, titles = NA, hierarquic = TRUE, analysis = "Obs", cor.abs = FALSE, normalize = FALSE, distance = "euclidean", method = "complete", horizontal = FALSE, num.groups = 0, lambda = 2, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
data |
Data to be analyzed. |
titles |
Titles of the graphs; if not set, the default text is used. |
hierarquic |
TRUE for hierarchical clustering (default); FALSE for non-hierarchical clustering (k-means), available only when analysis = "Obs". |
analysis |
"Obs" for analysis on observations (default), "Var" for analysis on variables. |
cor.abs |
Uses the matrix of absolute correlations when analysis = "Var" (default = FALSE). |
normalize |
Normalizes the data; applies only when analysis = "Obs" (default = FALSE). |
distance |
Distance metric for hierarchical clustering: "euclidean" (default), "maximum", "manhattan", "canberra", "binary" or "minkowski". When analysis = "Var", the metric is based on the correlation matrix, according to cor.abs. |
method |
Agglomeration method for hierarchical clustering: "complete" (default), "ward.D", "ward.D2", "single", "average", "mcquitty", "median" or "centroid". |
horizontal |
Horizontal dendrogram (default = FALSE). |
num.groups |
Number of groups to be formed. |
lambda |
Exponent used in the Minkowski distance (default = 2). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
Several graphics.
tab.res |
Table with similarities and distances of the groups formed. |
groups |
Original data with groups formed. |
res.groups |
Results of the groups formed. |
R.sqt |
R-squared of the groups formed. |
sum.sqt |
Total sum of squares. |
mtx.dist |
Matrix of the distances. |
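The R-squared of a partition is usually defined as 1 - SSwithin / SStotal, the share of the total sum of squares explained by the groups. The sketch below computes it for a Ward dendrogram cut into two groups with base R's hclust and cutree. The helper cluster_r2 is hypothetical, the built-in USArrests data stand in for DataQuan, and standardizing with scale() is only an assumed analogue of normalize = TRUE in MVar.

```r
# Hypothetical helper: R-squared of a partition, R2 = 1 - SSwithin / SStotal (sketch).
cluster_r2 <- function(X, groups) {
  X <- as.matrix(X)
  tss <- sum(sweep(X, 2, colMeans(X))^2)            # total sum of squares
  wss <- 0
  for (g in unique(groups)) {
    Xg <- X[groups == g, , drop = FALSE]
    wss <- wss + sum(sweep(Xg, 2, colMeans(Xg))^2)  # within-group sum of squares
  }
  c(R2 = 1 - wss / tss, tss = tss, wss = wss)
}

X <- scale(USArrests)                                # standardized columns (assumption)
hc <- hclust(dist(X, method = "euclidean"), method = "ward.D")
groups <- cutree(hc, k = 2)                          # cut the dendrogram into 2 groups
cluster_r2(X, groups)
```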
Paulo Cesar Ossani
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Ferreira, D. F. Estatistica Multivariada. 2a ed. revisada e ampliada. Lavras: Editora UFLA, 2011. 676 p.
library(MVar)

data(DataQuan) # set of quantitative data
data <- DataQuan[,2:8]
rownames(data) <- DataQuan[1:nrow(DataQuan),1]

res <- Cluster(data, titles = NA, hierarquic = TRUE, analysis = "Obs",
               cor.abs = FALSE, normalize = FALSE, distance = "euclidean",
               method = "ward.D", horizontal = FALSE, num.groups = 2,
               savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)

print("R squared:"); res$R.sqt
# print("Total sum of squares:"); res$sum.sqt
print("Groups formed:"); res$groups
# print("Table with similarities and distances:"); res$tab.res
# print("Table with the results of the groups:"); res$res.groups
# print("Distance Matrix:"); res$mtx.dist

write.table(file=file.path(tempdir(),"SimilarityTable.csv"), res$tab.res, sep=";", dec=",", row.names = FALSE)
write.table(file=file.path(tempdir(),"GroupData.csv"), res$groups, sep=";", dec=",", row.names = TRUE)
write.table(file=file.path(tempdir(),"GroupResults.csv"), res$res.groups, sep=";", dec=",", row.names = TRUE)
Computes the coefficient of variation of the data, either overall or per column.
CoefVar(data, type = 1)
data |
Data to be analyzed. |
type |
1 for the overall coefficient of variation (default), 2 for the coefficient of variation per column. |
Coefficient of variation, either overall or per column.
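For column j, the coefficient of variation is CV_j = 100 * s_j / mean_j, the standard deviation expressed as a percentage of the mean. The base-R sketch below covers only the per-column case (type = 2). The helper cv_per_column is hypothetical; whether CoefVar scales by 100, and how it defines the overall coefficient (type = 1), is not stated here.

```r
# Hypothetical helper: per-column coefficient of variation, in percent (sketch).
cv_per_column <- function(data) {
  data <- as.matrix(data)
  100 * apply(data, 2, sd) / colMeans(data)  # CV_j = 100 * s_j / mean_j
}

cv <- cv_per_column(iris[, 1:4])
round(cv, 2)
```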
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Ferreira, D. F. Estatistica Basica. 2. ed. rev. Lavras: UFLA, 2009. 664 p.
library(MVar)

data(DataQuan) # data set
data <- DataQuan[,2:8]

res <- CoefVar(data, type = 1) # overall coefficient of variation
round(res,2)

res <- CoefVar(data, type = 2) # coefficient of variation per column
round(res,2)
Performs linear (LDA) and quadratic (QDA) discriminant analysis.
DA(data, class = NA, type = "lda", validation = "learning", method = "moment", prior = NA, testing = NA)
data |
Data to be classified. |
class |
Vector with the class names of the data. |
type |
"lda": linear discriminant analysis (default), or "qda": quadratic discriminant analysis. |
validation |
Type of validation: "learning" (default) or "testing". |
method |
Classification method, e.g. "moment" (default) or "mle". |
prior |
Prior probabilities of the classes. If not specified, the class proportions in the data are used. If specified, the probabilities must follow the order of the factor levels. |
testing |
Vector with the indices of the rows of data to be used as the test set. For validation = "learning", use testing = NA. |
confusion |
Confusion table. |
error.rate |
Overall error rate. |
prior |
Prior probabilities of the classes. |
type |
Type of discriminant analysis. |
validation |
Type of validation. |
num.class |
Number of classes. |
class.names |
Class names. |
method |
Classification method. |
num.correct |
Number of correctly classified observations. |
results |
Matrix with comparative classification results. |
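The confusion table and overall error rate returned by DA can be reproduced conceptually with MASS::lda, which ships with R as a recommended package. This sketch fits an LDA on iris with equal priors and computes the resubstitution ("learning") confusion table and error rate. It is not MVar's implementation, and mapping DA's method values onto the method argument of MASS::lda is an assumption.

```r
# Sketch with MASS::lda (not MVar's DA); resubstitution confusion table and error rate.
library(MASS)

X <- iris[, 1:4]
cls <- iris[, 5]
fit <- lda(X, grouping = cls, prior = rep(1, 3) / 3, method = "mle")  # equal priors
pred <- predict(fit, X)$class                              # predictions on the training data
confusion <- table(observed = cls, predicted = pred)       # rows: true class
error.rate <- 1 - sum(diag(confusion)) / sum(confusion)    # overall error rate
confusion
error.rate
```

Replacing lda with qda gives the quadratic counterpart; for a hold-out evaluation, fit on the training rows and call predict on the test rows only.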
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Ferreira, D. F. Estatistica Multivariada. 2a ed. revisada e ampliada. Lavras: Editora UFLA, 2011. 676 p.
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Ripley, B. D. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
Venables, W. N.; Ripley, B. D. Modern Applied Statistics with S. Fourth edition. Springer, 2002.
library(MVar)

data(iris) # data set
data = iris[,1:4] # data to be classified
class = iris[,5] # data classes
prior = c(1,1,1)/3 # a priori probabilities of the classes

res <- DA(data, class, type = "lda", validation = "learning", method = "mle", prior = prior, testing = NA)

print("Confusion table:"); res$confusion
print("Overall hit ratio:"); 1 - res$error.rate
print("Probability of classes:"); res$prior
print("Classification method:"); res$method
print("Type of discriminant analysis:"); res$type
print("Class names:"); res$class.names
print("Number of classes:"); res$num.class
print("Type of validation:"); res$validation
print("Number of correct observations:"); res$num.correct
print("Matrix with comparative classification results:"); res$results

### hold-out validation (random split, about 70% training / 30% test) ###
amostra = sample(2, nrow(data), replace = TRUE, prob = c(0.7,0.3))
datatrain = data[amostra == 1,] # training data
datatest = data[amostra == 2,] # test data

dim(datatrain) # training data dimension
dim(datatest) # test data dimension

testing = as.integer(rownames(datatest)) # test data index

res <- DA(data, class, type = "qda", validation = "testing", method = "moment", prior = NA, testing = testing)

print("Confusion table:"); res$confusion
print("Overall hit ratio:"); 1 - res$error.rate
print("Number of correct observations:"); res$num.correct
print("Matrix with comparative classification results:"); res$results
Data set, categorized by coffee, on the sensory abilities of consumers of specialty coffees.
data(DataCoffee)
Data from a study that evaluated the agreement between the responses of groups of consumers with different sensory abilities. The experiment covers the sensory analysis of specialty coffees: (A) Yellow Bourbon, grown at altitudes above 1,200 m; (B) Acaia, grown at altitudes below 1,100 m; (C) the same as (B), but with a different sample preparation; (D) the same as (A), but with a different sample preparation. Here the data are categorized by coffee. The example reproduces the results reported by Ossani et al. (2017).
Ossani, P. C.; Cirillo, M. A.; Borem, F. M.; Ribeiro, D. E.; Cortez, R. M. Quality of specialty coffees: a sensory evaluation by consumers using the MFACT technique. Revista Ciencia Agronomica (UFC. Online), v. 48, p. 92-100, 2017.
Ossani, P. C. Qualidade de cafes especiais e nao especiais por meio da analise de multiplos fatores para tabelas de contingencias. 2015. 107 p. Dissertacao (Mestrado em Estatistica e Experimentacao Agropecuaria) - Universidade Federal de Lavras, Lavras, 2015.
library(MVar)

data(DataCoffee) # categorized data set
data <- DataCoffee[,2:ncol(DataCoffee)]
rownames(data) <- as.character(t(DataCoffee[1:nrow(DataCoffee),1]))

group.names = c("Coffee A", "Coffee B", "Coffee C", "Coffee D")

mf <- MFA(data, c(16,16,16,16), c(rep("f",4)), group.names)

print("Variances of the principal components:"); round(mf$mtxA,2)
print("Matrix of the Partial Inertia / Score of the Variables:"); round(mf$mtxEV,2)

tit <- c("Scree-plot","Individuals","Individuals / Coffee types","Group inertias")

Plot.MFA(mf, titles = tit, xlabel = NA, ylabel = NA, posleg = 2, boxleg = FALSE,
         color = TRUE, namarr = FALSE, linlab = NA, casc = FALSE) # plotting several graphs on the screen
Simulated data set with the weekly number of cups of coffee consumed in several world capitals.
data(DataFreq)
Data set with 6 rows and 9 columns, that is, 6 observations described by 9 variables: Group by sex and age, Sao Paulo - Cafe Bourbon, London - Cafe Bourbon, Athens - Cafe Bourbon, London - Cafe Acaia, Athens - Cafe Catuai, Sao Paulo - Cafe Catuai, Athens - Cafe Catuai.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)

data(DataFreq)
DataFreq
Data set, categorized by coffee, on the sensory abilities of consumers of specialty coffees.
data(DataInd)
Data set from a study whose purpose was to evaluate the agreement between the responses of groups of consumers with different sensory abilities. The experiment concerns the sensory analysis of specialty coffees defined as: (A) Yellow Bourbon, cultivated at altitudes above 1,200 m; (D) the same as (A), differing only in the preparation of the samples; (B) Acaia, cultivated at altitudes below 1,100 m; (C) the same as (B), differing only in the preparation of the samples. Here the data are categorized by coffees. The example illustrates the results reported in Ossani et al. (2017).
Ossani, P. C.; Cirillo, M. A.; Borem, F. M.; Ribeiro, D. E.; Cortez, R. M. Quality of specialty coffees: a sensory evaluation by consumers using the MFACT technique. Revista Ciencia Agronomica (UFC. Online), v. 48, p. 92-100, 2017.
Ossani, P. C. Qualidade de cafes especiais e nao especiais por meio da analise de multiplos fatores para tabelas de contingencias. 2015. 107 p. Master's dissertation (Statistics and Agricultural Experimentation), Universidade Federal de Lavras, Lavras, 2015.
library(MVar)
data(DataInd) # categorized data set
data <- DataInd[,2:ncol(DataInd)]
rownames(data) <- as.character(t(DataInd[1:nrow(DataInd),1]))
group.names <- c("Group 1", "Group 2", "Group 3", "Group 4")
mf <- MFA(data, c(16,16,16,16), c(rep("f",4)), group.names)
print("Principal components variances:"); round(mf$mtxA,2)
print("Matrix of the Partial Inertia / Score of the Variables:"); round(mf$mtxEV,2)
tit <- c("Scree-plot","Individuals","Individuals / Types of coffee","Inertia of the Groups")
Plot.MFA(mf, titles = tit, xlabel = NA, ylabel = NA, posleg = 2, boxleg = FALSE,
         color = TRUE, namarr = FALSE, linlab = NA, casc = FALSE) # plotting several graphs on the screen
Simulated set of mixed data on coffee consumption.
data(DataMix)
Data set with 10 rows and 7 columns: 10 observations described by 7 variables: Cooperatives/Tasters, average grades given to the analyzed coffees, years of work as a taster, taster with technical training, taster exclusively dedicated, average frequency of coffees classified as special, and average frequency of coffees classified as commercial.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataMix)
DataMix
Simulated set of qualitative data on coffee consumption.
data(DataQuali)
Simulated data set with 12 rows and 6 columns: 12 observations described by 6 variables: Sex, Age, Smoker, Marital status, Sportsman, Study.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataQuali)
DataQuali
Simulated set of quantitative data on the grades given to several sensory characteristics of coffees.
data(DataQuan)
Data set with 6 rows and 11 columns: 6 observations described by 11 variables: Coffee, Chocolate, Caramelised, Ripe, Sweet, Delicate, Nutty, Caramelised, Chocolate, Spicy, Caramelised.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataQuan)
DataQuan
Performs factor analysis (FA) on a data set.
FA(data, method = "PC", type = 2, nfactor = 1, rotation = "None", scoresobs = "Bartlett", converg = 1e-5, iteracao = 1000, testfit = TRUE)
data | Data to be analyzed. |
method | Method of analysis: "PC" (default) for principal components, or maximum likelihood. |
type | 1 for analysis using the covariance matrix, 2 for analysis using the correlation matrix (default). |
rotation | Type of rotation: "None" (default), "Varimax" or "Promax". |
nfactor | Number of factors (default = 1). |
scoresobs | Type of scores for the observations: "Bartlett" (default) or "Regression". |
converg | Convergence limit for the sum of squares of the residuals in the maximum likelihood method (default = 1e-5). |
iteracao | Maximum number of iterations for the maximum likelihood method (default = 1000). |
testfit | Tests the fit of the model for the maximum likelihood method (default = TRUE). |
mtxMC | Correlation or covariance matrix. |
mtxAutvlr | Matrix of eigenvalues. |
mtxAutvec | Matrix of eigenvectors. |
mtxvar | Matrix of variances and proportions. |
mtxcarga | Matrix of factor loadings. |
mtxvaresp | Matrix of specific variances. |
mtxcomuna | Matrix of communalities. |
mtxresidue | Matrix of residuals. |
vlrsqrs | Upper limit for the sum of squares of the residuals. |
vlrsqr | Sum of squares of the residuals. |
mtxresult | Matrix with all associated results. |
mtxscores | Matrix with the scores of the observations. |
coefscores | Matrix with the factor score coefficients. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187-200, 1958.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Ferreira, D. F. Estatistica Multivariada. 2nd ed., revised and expanded. Lavras: Editora UFLA, 2011. 676 p.
library(MVar)
data(DataQuan) # data set
data <- DataQuan[,2:ncol(DataQuan)]
rownames(data) <- DataQuan[,1]
res <- FA(data, method = "PC", type = 2, nfactor = 3, rotation = "None", scoresobs = "Bartlett",
          converg = 1e-5, iteracao = 1000, testfit = TRUE)
print("Matrix with all associated results:"); round(res$mtxresult,3)
print("Sum of squares of the residues:"); round(res$vlrsqr,3)
print("Matrix of the factor loadings:"); round(res$mtxcarga,3)
print("Matrix with scores of the observations:"); round(res$mtxscores,3)
print("Matrix with the scores of the coefficients of the factors:"); round(res$coefscores,3)
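To make the principal-component ("PC") extraction concrete, the sketch below computes loadings, communalities, specific variances and the residual matrix from a correlation matrix using only base R. The helper name `pc_factor_sketch` and the use of the built-in `iris` data are illustrative assumptions. This is the textbook principal-component method, not necessarily the exact computation inside `FA`, and it leaves out rotation and factor scores. The returned `bound` is the standard upper limit for the sum of squared residuals (the sum of the squared discarded eigenvalues), which is presumably what `vlrsqrs` reports.

```r
# Sketch of principal-component factor extraction (hypothetical helper,
# not MVar's FA code). Works on the correlation matrix, as with type = 2.
pc_factor_sketch <- function(data, k = 1) {
  R <- cor(data)                                   # p x p correlation matrix
  e <- eigen(R, symmetric = TRUE)                  # eigenvalues in decreasing order
  L <- e$vectors[, 1:k, drop = FALSE] %*%
    diag(sqrt(e$values[1:k]), nrow = k)            # loadings l_ij = e_ij * sqrt(lambda_j)
  rownames(L) <- colnames(R)
  communality <- rowSums(L^2)                      # h_i^2 = sum_j l_ij^2
  specific <- 1 - communality                      # psi_i = 1 - h_i^2
  residual <- R - (L %*% t(L) + diag(specific, nrow = length(specific)))
  list(loadings = L,
       communality = communality,
       specific = specific,
       residual = residual,
       ssr = sum(residual^2),                      # sum of squared residuals
       bound = sum(e$values[-(1:k)]^2))            # upper limit for ssr
}

fit <- pc_factor_sketch(iris[, 1:4], k = 2)
round(fit$loadings, 3)
c(ssr = fit$ssr, bound = fit$bound)
```

Eigenvector signs are arbitrary, so individual loading columns may come out with the opposite sign from other software.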
Explores the data through the Grand Tour animation technique.
GrandTour(data, method = "Interpolation", title = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, class = NA, classcolor = NA, posleg = 2, boxleg = TRUE, axesvar = TRUE, axes = TRUE, numrot = 200, choicerot = NA, savptc = FALSE, width = 3236, height = 2000, res = 300)
data | Numerical data set. |
method | Method used for the rotations, e.g. "Interpolation" (default) or "Torus". |
title | Title of the graphics; if not set, the default text is used. |
xlabel | Name of the X axis; if not set, the default text is used. |
ylabel | Name of the Y axis; if not set, the default text is used. |
size | Size of the points in the graphs (default = 1.1). |
grid | Puts a grid on the graphs (default = TRUE). |
color | Colored graphics (default = TRUE). |
linlab | Vector with the labels for the observations. |
class | Vector with the names of the data classes. |
classcolor | Vector with the colors of the classes. |
posleg | 0 with no legend, |
boxleg | Puts a frame around the legend (default = TRUE). |
axesvar | Plots the rotation axes of the variables (default = TRUE). |
axes | Plots the X and Y axes (default = TRUE). |
numrot | Number of rotations (default = 200). If method = "Interpolation", numrot represents the angle of rotation. |
choicerot | Chooses a specific rotation and displays it on the screen, or saves the image if savptc = TRUE. |
savptc | Saves the graphics images to files (default = FALSE). |
width | Width of the graphics images when savptc = TRUE (default = 3236). |
height | Height of the graphics images when savptc = TRUE (default = 2000). |
res | Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
Graphs with rotations.
proj.data | Projected data. |
vector.opt | Projection vectors. |
method | Method used in the Grand Tour. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Asimov, D. The Grand Tour: A Tool for Viewing Multidimensional Data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128-143, 1985.
Asimov, D.; Buja, A. The grand tour via geodesic interpolation of 2-frames. In: Visual Data Exploration and Analysis, Symposium on Electronic Imaging Science and Technology, IS&T/SPIE, 1994.
Buja, A.; Asimov, D. Grand tour methods: An outline. Computer Science and Statistics, 17:63-67. 1986.
Buja, A.; Cook, D.; Asimov, D.; Hurley, C. Computational methods for High-Dimensional Rotations in Data Visualization, in C. R. Rao, E. J. Wegman & J. L. Solka, eds, "Handbook of Statistics: Data Mining and Visualization", Elsevier/North Holland, http://www.elsevier.com, pp. 391-413. 2005.
Hurley, C.; Buja, A. Analyzing high-dimensional data with motion graphics, SIAM Journal on Scientific and Statistical Computing, 11 (6), 1193-1211. 1990.
Martinez, W. L.; Martinez, A. R.; Solka, J. Exploratory Data Analysis with MATLAB. 2nd ed. New York: Chapman & Hall/CRC, 2010. 499 p.
Young, F. W.; Rheingans P. Visualizing structure in high-dimensional multivariate data, IBM Journal of Research and Development, 35:97-107, 1991.
Young, F. W.; Faldowski R. A.; McFarlane M. M. Multivariate statistical visualization, in Handbook of Statistics, Vol 9, C. R. Rao (ed.), The Netherlands: Elsevier Science Publishers, 959-998, 1993.
library(MVar)
data(iris) # data set
res <- GrandTour(iris[,1:4], method = "Torus", title = NA, xlabel = NA, ylabel = NA,
                 color = TRUE, linlab = NA, class = NA, posleg = 2, boxleg = TRUE,
                 axesvar = TRUE, axes = FALSE, numrot = 10, choicerot = NA,
                 savptc = FALSE, width = 3236, height = 2000, res = 300)
print("Projected data:"); res$proj.data
print("Projection vectors:"); res$vector.opt
print("Grand Tour projection method:"); res$method
res <- GrandTour(iris[,1:4], method = "Interpolation", title = NA, xlabel = NA, ylabel = NA,
                 color = TRUE, linlab = NA, posleg = 2, boxleg = FALSE, axesvar = FALSE,
                 axes = FALSE, numrot = 10, choicerot = NA, class = iris[,5],
                 classcolor = c("goldenrod3","gray53","red"),
                 savptc = FALSE, width = 3236, height = 2000, res = 300)
print("Projected data:"); res$proj.data
print("Projection vectors:"); res$vector.opt
print("Grand Tour projection method:"); res$method
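Every grand tour has the same basic step. The data are projected onto a sequence of 2-D orthonormal frames, and each projection is drawn in turn. The sketch below shows that step in base R. The helpers `orthonormalize`, `random_frame` and `tour_sketch` are hypothetical names. The sketch blends random frames linearly and re-orthonormalizes each blend with a QR decomposition. That is a simplification: it is neither the geodesic interpolation of Asimov and Buja (1994) nor the "Torus" method, and it does not describe how `GrandTour` works internally.

```r
# Simplified grand-tour sketch (hypothetical helpers, not MVar's GrandTour).
# Frames are blended linearly and re-orthonormalised; this approximates,
# but is not, geodesic interpolation between 2-frames.
orthonormalize <- function(B) {
  q <- qr(B)
  s <- sign(diag(qr.R(q)))
  s[s == 0] <- 1
  qr.Q(q) %*% diag(s, nrow = length(s))   # Gram-Schmidt basis of span(B)
}

random_frame <- function(p) orthonormalize(matrix(rnorm(2 * p), p, 2))

tour_sketch <- function(data, nsteps = 10, nframes = 3) {
  X <- scale(as.matrix(data))             # standardise the variables
  frames <- replicate(nframes, random_frame(ncol(X)), simplify = FALSE)
  views <- list()
  for (f in seq_len(nframes - 1)) {
    for (t in seq(0, 1, length.out = nsteps)) {
      B <- orthonormalize((1 - t) * frames[[f]] + t * frames[[f + 1]])
      views[[length(views) + 1]] <- list(basis = B, proj = X %*% B)
    }
  }
  views
}

set.seed(1)
views <- tour_sketch(iris[, 1:4])
# Animate: for (v in views) { plot(v$proj, col = iris$Species); Sys.sleep(0.1) }
```

Each element of `views` holds a 4 x 2 projection basis and the corresponding 150 x 2 projected data. These play roughly the roles that `vector.opt` and `proj.data` play in the package output.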
Given a matrix $\mathbf{A}$ of order $n \times m$, the generalized singular value decomposition (GSVD) involves two positive definite square matrices, of order $n \times n$ and $m \times m$ respectively. These two matrices express constraints imposed on the rows and on the columns of $\mathbf{A}$, respectively.
GSVD(data, plin = NULL, pcol = NULL)
data | Matrix to be decomposed. |
plin | Weights for the rows. |
pcol | Weights for the columns. |
If plin or pcol is not supplied, the decomposition is computed as the usual singular value decomposition.
d | Vector with the singular values of the decomposition. |
u | Left singular vectors (associated with the rows). |
v | Right singular vectors (associated with the columns). |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Abdi, H. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In: SALKIND, N. J. (Ed.). Encyclopedia of measurement and statistics. Thousand Oaks: Sage, 2007. p. 907-912.
library(MVar)
data <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12), nrow = 4, ncol = 3)
svd(data) # usual singular value decomposition
GSVD(data) # GSVD, giving the same results as svd()
# GSVD with weights for rows and columns
GSVD(data, plin = c(0.1,0.5,2,1.5), pcol = c(1.3,2,0.8))
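Let $\mathbf{A}$ be a matrix with row weights $\mathbf{M} = \mathrm{diag}(\texttt{plin})$ and column weights $\mathbf{W} = \mathrm{diag}(\texttt{pcol})$. The GSVD described by Abdi (2007) writes $\mathbf{A} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^\top$ subject to $\mathbf{P}^\top\mathbf{M}\mathbf{P} = \mathbf{Q}^\top\mathbf{W}\mathbf{Q} = \mathbf{I}$. It can be obtained from an ordinary SVD of $\mathbf{M}^{1/2}\mathbf{A}\mathbf{W}^{1/2}$. The base-R sketch below assumes the weight vectors are used as diagonal matrices in this way. `gsvd_sketch` is a hypothetical helper, not the package's `GSVD`.

```r
# Minimal GSVD sketch with diagonal weights (hypothetical helper, not MVar's
# GSVD). M = diag(plin) weights the rows, W = diag(pcol) weights the columns.
gsvd_sketch <- function(A, plin = rep(1, nrow(A)), pcol = rep(1, ncol(A))) {
  sr <- sqrt(plin)
  sc <- sqrt(pcol)
  At <- diag(sr, nrow = length(sr)) %*% A %*% diag(sc, nrow = length(sc))  # M^(1/2) A W^(1/2)
  s <- svd(At)                                         # ordinary SVD: At = U D V'
  list(d = s$d,                                        # generalized singular values
       u = diag(1 / sr, nrow = length(sr)) %*% s$u,    # P = M^(-1/2) U, so P'MP = I
       v = diag(1 / sc, nrow = length(sc)) %*% s$v)    # Q = W^(-1/2) V, so Q'WQ = I
}

A <- matrix(1:12, nrow = 4, ncol = 3)
plin <- c(0.1, 0.5, 2, 1.5)
pcol <- c(1.3, 2, 0.8)
g <- gsvd_sketch(A, plin, pcol)
g$d
```

With unit weights the result coincides with `svd(A)`. This matches what the package states for `GSVD` when `plin` and `pcol` are omitted.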
Converts categorical data into an indicator matrix. Its elements are dummy variables: 1 for the category observed for a variable and 0 for the other categories of the same variable.
IM(data, names = TRUE)
data | Categorical data. |
names | Includes the names of the variables in the levels of the indicator matrix (default = TRUE). |
mtxIndc | Data converted into an indicator matrix. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar)
data <- matrix(c("S","S","N","N",1,2,3,4,"N","S","T","N"), nrow = 4, ncol = 3)
IM(data, names = FALSE)
data(DataQuali) # qualitative data set
IM(DataQuali, names = TRUE)
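An indicator matrix can be built in base R by expanding each categorical column into one 0/1 column per observed category. The helper `indicator_sketch` is hypothetical, as are its `variable:category` column naming and the alphabetical ordering of the categories. `IM` may label and order its columns differently.

```r
# Base-R sketch of an indicator (dummy) matrix; hypothetical helper, not MVar's IM.
indicator_sketch <- function(data, names = TRUE) {
  data <- as.data.frame(data, stringsAsFactors = FALSE)
  blocks <- lapply(seq_along(data), function(j) {
    x <- as.character(data[[j]])
    lev <- sort(unique(x))                         # categories of variable j
    m <- outer(x, lev, "==") * 1                   # 1 if row i has category k, else 0
    colnames(m) <- if (names) paste(names(data)[j], lev, sep = ":") else lev
    m
  })
  do.call(cbind, blocks)
}

data <- matrix(c("S","S","N","N",1,2,3,4,"N","S","T","N"), nrow = 4, ncol = 3)
Z <- indicator_sketch(data)
Z
```

Exactly one category per variable is marked with 1, so every row of the result sums to the number of original variables (3 here).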
Function for better positioning of the labels in graphs.
LocLab(x, y = NULL, labels = seq(along = x), cex = 1, method = c("SANN", "GA"), allowSmallOverlap = FALSE, trace = FALSE, shadotext = FALSE, doPlot = TRUE, ...)
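x | x coordinates of the labels. |
y | y coordinates of the labels (default = NULL). |
labels | Labels to be placed (default = seq(along = x)). |
cex | Character expansion factor of the labels (default = 1). |
method | Not used. |
allowSmallOverlap | Logical (default = FALSE). |
trace | Logical (default = FALSE). |
shadotext | Logical (default = FALSE). |
doPlot | Logical (default = TRUE). |
... | Other arguments passed to or from other methods. |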
See the source code of the function.
Performs Multidimensional Scaling (MDS) on a data set.
MDS(data, distance = "euclidean", title = NA, xlabel = NA, ylabel = NA, posleg = 2, boxleg = TRUE, axes = TRUE, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, class = NA, classcolor = NA, savptc = FALSE, width = 3236, height = 2000, res = 300)
data | Data to be analyzed. |
distance | Distance metric: "euclidean" (default), "maximum", "manhattan", "canberra", "binary" or "minkowski". |
title | Title of the graphics; if not set, the default text is used. |
xlabel | Name of the X axis; if not set, the default text is used. |
ylabel | Name of the Y axis; if not set, the default text is used. |
posleg | 0 with no legend, |
boxleg | Puts a frame around the legend (default = TRUE). |
axes | Plots the X and Y axes (default = TRUE). |
size | Size of the points in the graphs (default = 1.1). |
grid | Puts a grid on the graphs (default = TRUE). |
color | Colored graphics (default = TRUE). |
linlab | Vector with the labels for the observations. |
class | Vector with the names of the data classes. |
classcolor | Vector with the colors of the classes. |
savptc | Saves the graphics images to files (default = FALSE). |
width | Width of the graphics images when savptc = TRUE (default = 3236). |
height | Height of the graphics images when savptc = TRUE (default = 2000). |
res | Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
Multidimensional Scaling.
mtxD | Matrix of the distances. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar)
data(iris) # data set
data <- iris[,1:4]
cls <- iris[,5] # data class
md <- MDS(data = data, distance = "euclidean", title = NA, xlabel = NA, ylabel = NA,
          posleg = 2, boxleg = TRUE, axes = TRUE, color = TRUE, linlab = NA,
          class = cls, classcolor = c("goldenrod3","gray53","red"),
          savptc = FALSE, width = 3236, height = 2000, res = 300)
print("Matrix of the distances:"); md$mtxD
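Classical (Torgerson) multidimensional scaling can be reproduced with base R's `dist` and `cmdscale`. The `distance` options listed above are the method names accepted by `stats::dist`. The sketch assumes a two-dimensional classical solution. `mds_sketch` is a hypothetical helper, and `MDS` may use a different algorithm or scaling.

```r
# Classical (Torgerson) MDS with base R; a sketch, not MVar's MDS code.
mds_sketch <- function(data, distance = "euclidean", k = 2) {
  D <- dist(data, method = distance)     # same metric names as the 'distance' argument
  list(distances = as.matrix(D),         # full n x n distance matrix
       coords = cmdscale(D, k = k))      # n x k configuration
}

res <- mds_sketch(iris[, 1:4])
# plot(res$coords, col = iris$Species, pch = 19)
```

With Euclidean distances, the classical MDS coordinates equal the principal component scores of the centered data up to the sign of each axis. This gives a convenient check.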
Performs Multiple Factor Analysis (MFA) on groups of variables. The groups can contain quantitative, qualitative, frequency (MFACT) or mixed data.
MFA(data, groups, typegroups = rep("n",length(groups)), namegroups = NULL)
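data | Data to be analyzed. |
groups | Number of columns in each group, following the order of the columns in 'data'. |
typegroups | Type of each group: "n" for numerical (quantitative) data (default), "c" for categorical (qualitative) data, "f" for frequency data. |
namegroups | Names for each group. |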
vtrG | Vector with the sizes of each group. |
vtrNG | Vector with the names of each group. |
vtrplin | Vector with the weights used to balance the rows of the Z matrix. |
vtrpcol | Vector with the weights used to balance the columns of the Z matrix. |
mtxZ | Concatenated and balanced matrix. |
mtxA | Matrix of the eigenvalues (variances) with the proportions and accumulated proportions. |
mtxU | Matrix U of the singular value decomposition of the matrix Z. |
mtxV | Matrix V of the singular value decomposition of the matrix Z. |
mtxF | Matrix of global factor scores, where the rows are the observations and the columns the components. |
mtxEFG | Matrix of the factor scores by group. |
mtxCCP | Matrix of the correlations of the principal components with the original variables. |
mtxEV | Matrix of the partial inertias / scores of the variables. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Abdessemed, L.; Escofier, B. Analyse factorielle multiple de tableaux de frequences: comparaison avec l'analyse canonique des correspondances. Journal de la Societe de Statistique de Paris, Paris, v. 137, n. 2, p. 3-18, 1996.
Abdi, H. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In: SALKIND, N. J. (Ed.). Encyclopedia of measurement and statistics. Thousand Oaks: Sage, 2007. p. 907-912.
Abdi, H.; Valentin, D. Multiple factor analysis (MFA). In: SALKIND, N. J. (Ed.). Encyclopedia of measurement and statistics. Thousand Oaks: Sage, 2007. p. 657-663.
Abdi, H.; Williams, L. Principal component analysis. WIREs Computational Statistics, New York, v. 2, n. 4, p. 433-459, July/Aug. 2010.
Abdi, H.; Williams, L.; Valentin, D. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Computational Statistics, New York, v. 5, n. 2, p. 149-179, Feb. 2013.
Becue-Bertaut, M.; Pages, J. A principal axes method for comparing contingency tables: MFACT. Computational Statistics & Data Analysis, New York, v. 45, n. 3, p. 481-503, Feb. 2004.
Becue-Bertaut, M.; Pages, J. Multiple factor analysis and clustering of a mixture of quantitative, categorical and frequency data. Computational Statistics & Data Analysis, New York, v. 52, n. 6, p. 3255-3268, Feb. 2008.
Benzecri, J. Analyse de l'inertie intraclasse par l'analyse d'un tableau de contingence: intra-class inertia analysis through the analysis of a contingency table. Les Cahiers de l'Analyse des Donnees, Paris, v. 8, n. 3, p. 351-358, 1983.
Escofier, B. Analyse factorielle en reference a un modele: application a l'analyse d'un tableau d'echanges. Revue de Statistique Appliquee, Paris, v. 32, n. 4, p. 25-36, 1984.
Escofier, B.; Drouet, D. Analyse des differences entre plusieurs tableaux de frequence. Les Cahiers de l'Analyse des Donnees, Paris, v. 8, n. 4, p. 491-499, 1983.
Escofier, B.; Pages, J. Analyses factorielles simples et multiples. Paris: Dunod, 1990. 267 p.
Escofier, B.; Pages, J. Analyses factorielles simples et multiples: objectifs, methodes et interpretation. 4th ed. Paris: Dunod, 2008. 318 p.
Escofier, B.; Pages, J. Comparaison de groupes de variables definies sur le meme ensemble d'individus: un exemple d'applications. Le Chesnay: Institut National de Recherche en Informatique et en Automatique, 1982. 121 p.
Escofier, B.; Pages, J. Multiple factor analysis (AFMULT package). Computational Statistics & Data Analysis, New York, v. 18, n. 1, p. 121-140, Aug. 1994.
Greenacre, M.; Blasius, J. Multiple correspondence analysis and related methods. New York: Taylor and Francis, 2006. 607 p.
Ossani, P. C.; Cirillo, M. A.; Borem, F. M.; Ribeiro, D. E.; Cortez, R. M. Quality of specialty coffees: a sensory evaluation by consumers using the MFACT technique. Revista Ciencia Agronomica (UFC. Online), v. 48, p. 92-100, 2017.
Pages, J. Analyse factorielle multiple appliquee aux variables qualitatives et aux donnees mixtes. Revue de Statistique Appliquee, Paris, v. 50, n. 4, p. 5-37, 2002.
Pages, J. Multiple factor analysis: main features and application to sensory data. Revista Colombiana de Estadistica, Bogota, v. 27, n. 1, p. 1-26, 2004.
library(MVar)
data(DataMix) # mixed data set
data <- DataMix[,2:ncol(DataMix)]
rownames(data) <- DataMix[1:nrow(DataMix),1]
group.names <- c("Grade Cafes/Work", "Formation/Dedication", "Coffees")
mf <- MFA(data = data, c(2,2,2), typegroups = c("n","c","f"), group.names) # performs MFA
print("Principal Component Variances:"); round(mf$mtxA,2)
print("Matrix of the Partial Inertia / Score of the Variables:"); round(mf$mtxEV,2)
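The central idea of MFA is a two-step weighting. Each group of variables is first divided by the square root of the first eigenvalue of its own PCA, so that no group has an axis with inertia above 1. A global PCA is then run on the concatenated, weighted table, a role presumably played by `mtxZ` in the package output. The base-R sketch below covers only quantitative groups with standardized variables. `mfa_sketch` is a hypothetical helper. It does not reproduce how `MFA` handles `"c"` and `"f"` groups, including MFACT.

```r
# Sketch of MFA for quantitative groups only (hypothetical helper, not MVar's
# MFA). Each group is divided by the square root of its first eigenvalue so
# that no single group dominates the global PCA.
mfa_sketch <- function(data, groups) {
  X <- scale(as.matrix(data))                      # standardise every variable
  n <- nrow(X)
  idx <- rep(seq_along(groups), groups)            # group label of each column
  Z <- NULL
  for (g in seq_along(groups)) {
    Xg <- X[, idx == g, drop = FALSE]
    lambda1 <- svd(Xg / sqrt(n))$d[1]^2            # first eigenvalue of the group PCA
    Z <- cbind(Z, Xg / sqrt(lambda1))              # weighted group: first eigenvalue 1
  }
  s <- svd(Z / sqrt(n))                            # global PCA of the weighted table
  list(eigenvalues = s$d^2,
       scores = sqrt(n) * s$u %*% diag(s$d, nrow = length(s$d)),  # global factor scores
       Z = Z)
}

res <- mfa_sketch(iris[, 1:4], groups = c(2, 2))
round(res$eigenvalues, 3)
```

As a check, the first global eigenvalue always lies between 1 and the number of groups.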
Normalizes the data globally or by column.
NormData(data, type = 1)
data | Data to be analyzed. |
type | 1 normalizes globally (default), 2 normalizes by column. |
dataNorm | Normalized data. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataQuan) # set of quantitative data
data <- DataQuan[,2:8]
res <- NormData(data, type = 1) # normalizes the data globally
res # globally standardized data
sd(res) # overall standard deviation
mean(res) # overall mean
res <- NormData(data, type = 2) # normalizes the data per column
res # standardized data per column
apply(res, 2, sd) # standard deviation per column
colMeans(res) # column averages
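The package example checks its output with `sd()` and `mean()`. That suggests "normalize" here means standardizing to mean 0 and standard deviation 1. Under that assumption, the two modes can be sketched in base R as follows. `norm_sketch` is a hypothetical helper, not the package code.

```r
# Hypothetical helper mirroring the two NormData modes (not MVar's code).
norm_sketch <- function(data, type = 1) {
  X <- as.matrix(data)
  if (type == 1) {
    (X - mean(X)) / sd(as.vector(X))   # one mean and one sd for the whole table
  } else {
    scale(X)                           # each column: mean 0, sd 1
  }
}

g <- norm_sketch(iris[, 1:4], type = 1)
cc <- norm_sketch(iris[, 1:4], type = 2)
c(mean = mean(g), sd = sd(as.vector(g)))
```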
Checks the normality of the data using a test based on the skewness coefficient.
NormTest(data, sign = 0.05)
data | Data to be analyzed. |
sign | Significance level of the test (default = 0.05). |
statistic | Observed chi-square value, i.e., the test statistic. |
chisquare | Calculated chi-square value. |
gl | Degrees of freedom. |
p.value | p-value. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Ferreira, D. F. Estatistica Multivariada. 2nd ed., revised and expanded. Lavras: Editora UFLA, 2011. 676 p.
library(MVar)
data <- cbind(rnorm(100,2,3), rnorm(100,1,2)) # bivariate normal sample
NormTest(data)
plot(density(data))
data <- cbind(rexp(200,3), rexp(200,3)) # strongly skewed (exponential) sample
NormTest(data, sign = 0.01)
plot(density(data))
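The section does not give the test statistic. A widely used skewness-based test with a chi-square reference distribution is Mardia's multivariate skewness test. Let $g_{ij} = (\mathbf{x}_i - \bar{\mathbf{x}})^\top \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$, where $\mathbf{S}$ is the covariance matrix with divisor $n$, and let $b_{1,p} = n^{-2}\sum_i\sum_j g_{ij}^3$. The statistic $n\,b_{1,p}/6$ is compared with a $\chi^2$ distribution with $p(p+1)(p+2)/6$ degrees of freedom. The sketch below assumes `NormTest` follows this idea. `mardia_skew_sketch` and its output names are hypothetical.

```r
# Mardia's multivariate skewness test (hypothetical helper; NormTest may differ).
mardia_skew_sketch <- function(data, sign = 0.05) {
  X <- as.matrix(data)
  n <- nrow(X); p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))            # centred data
  S <- crossprod(Xc) / n                    # covariance with divisor n
  G <- Xc %*% solve(S, t(Xc))               # G[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  b1p <- sum(G^3) / n^2                     # multivariate skewness
  stat <- n * b1p / 6
  df <- p * (p + 1) * (p + 2) / 6
  list(statistic = stat,
       critical = qchisq(1 - sign, df),     # chi-square critical value at level 'sign'
       df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(42)
normal <- mardia_skew_sketch(cbind(rnorm(100, 2, 3), rnorm(100, 1, 2)))
skewed <- mardia_skew_sketch(cbind(rexp(200, 3), rexp(200, 3)), sign = 0.01)
skewed$p.value   # expected to be very small for exponential data
```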
Performs principal component analysis (PCA) on a data set.
PCA(data, type = 1)
data | Data to be analyzed. |
type | 1 for analysis using the covariance matrix (default), 2 for analysis using the correlation matrix. |
mtxC | Covariance or correlation matrix, according to "type". |
mtxAutvlr | Matrix of eigenvalues (variances) with the proportions and accumulated proportions. |
mtxAutvec | Matrix of eigenvectors (principal components). |
mtxVCP | Matrix of covariances of the principal components with the original variables. |
mtxCCP | Matrix of correlations of the principal components with the original variables. |
mtxscores | Matrix with the scores of the principal components. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, Arlington, v. 24, p. 417-441, Sept. 1933.
Mingoti, S. A. Analise de dados atraves de metodos de estatistica multivariada: uma abordagem aplicada. Belo Horizonte: UFMG, 2005. 297 p.
Ferreira, D. F. Estatistica Multivariada. 2nd ed., revised and expanded. Lavras: Editora UFLA, 2011. 676 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar)
data(DataQuan) # set of quantitative data
data <- DataQuan[,2:8]
rownames(data) <- DataQuan[1:nrow(DataQuan),1]
pc <- PCA(data, 2) # performs the PCA
print("Covariance / correlation matrix:"); round(pc$mtxC,2)
print("Principal Components:"); round(pc$mtxAutvec,2)
print("Principal Component Variances:"); round(pc$mtxAutvlr,2)
print("Covariance of the Principal Components:"); round(pc$mtxVCP,2)
print("Correlation of the Principal Components:"); round(pc$mtxCCP,2)
print("Scores of the Principal Components:"); round(pc$mtxscores,2)
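Both settings of `type` reduce to an eigendecomposition of the covariance matrix (type = 1) or the correlation matrix (type = 2). The base-R sketch below shows that computation. `pca_sketch` is a hypothetical helper. Its outputs correspond only loosely to `mtxC`, `mtxAutvlr`, `mtxAutvec` and `mtxscores`.

```r
# PCA by eigendecomposition (hypothetical helper, not MVar's PCA).
pca_sketch <- function(data, type = 1) {
  X <- as.matrix(data)
  C <- if (type == 1) cov(X) else cor(X)            # type 1: covariance, type 2: correlation
  e <- eigen(C, symmetric = TRUE)                   # eigenvalues = component variances
  Xc <- if (type == 1) scale(X, scale = FALSE) else scale(X)
  list(C = C,
       variances = e$values,
       proportion = e$values / sum(e$values),
       cumulative = cumsum(e$values) / sum(e$values),
       vectors = e$vectors,                         # principal component coefficients
       scores = Xc %*% e$vectors)                   # component scores
}

p1 <- pca_sketch(iris[, 1:4], type = 1)
p2 <- pca_sketch(iris[, 1:4], type = 2)
round(p2$proportion, 3)
```

The variances agree with `prcomp`. The sample variance of each score column equals the corresponding eigenvalue.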
Graphs of simple (CA) and multiple (MCA) correspondence analysis.
Plot.CA(CA, titles = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
CA | Output of the CA function. |
titles | Titles of the graphics; if not set, the default text is used. |
xlabel | Name of the X axis; if not set, the default text is used. |
ylabel | Name of the Y axis; if not set, the default text is used. |
size | Size of the points in the graphs (default = 1.1). |
grid | Puts a grid on the graphs (default = TRUE). |
color | Colored graphics (default = TRUE). |
linlab | Vector with the labels for the observations. |
savptc | Saves the graphics images to files (default = FALSE). |
width | Width of the graphics images when savptc = TRUE (default = 3236). |
height | Height of the graphics images when savptc = TRUE (default = 2000). |
res | Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc | Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataFreq) # frequency data set
data <- DataFreq[,2:ncol(DataFreq)]
rownames(data) <- DataFreq[1:nrow(DataFreq),1]
res <- CA(data, "f") # performs CA
tit <- c("Scree-plot","Observations", "Variables", "Observations / Variables")
Plot.CA(res, titles = tit, xlabel = NA, ylabel = NA, color = TRUE, linlab = rownames(data),
        savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
data(DataQuali) # qualitative data set
data <- DataQuali[,2:ncol(DataQuali)]
res <- CA(data, "c", "b") # performs CA
tit <- c("","","Graph of the variables")
Plot.CA(res, titles = tit, xlabel = NA, ylabel = NA, color = TRUE, linlab = NA,
        savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
Graphs of the Canonical Correlation Analysis (CCA).
Plot.CCA(CCA, titles = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
CCA | Output of the CCA function. |
titles | Titles of the graphics; if not set, the default text is used. |
xlabel | Name of the X axis; if not set, the default text is used. |
ylabel | Name of the Y axis; if not set, the default text is used. |
size | Size of the points in the graphs (default = 1.1). |
grid | Puts a grid on the graphs (default = TRUE). |
color | Colored graphics (default = TRUE). |
savptc | Saves the graphics images to files (default = FALSE). |
width | Width of the graphics images when savptc = TRUE (default = 3236). |
height | Height of the graphics images when savptc = TRUE (default = 2000). |
res | Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc | Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
library(MVar)
data(DataMix) # data set
data <- DataMix[,2:ncol(DataMix)]
rownames(data) <- DataMix[,1]
X <- data[,1:2]
Y <- data[,5:6]
res <- CCA(X, Y, type = 2, test = "Bartlett", sign = 0.05) # performs CCA
tit <- c("Scree-plot","Correlations","Scores of the group X","Scores of the group Y")
Plot.CCA(res, titles = tit, xlabel = NA, ylabel = NA, color = TRUE,
         savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
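Plot.CCA draws the canonical correlations and the canonical scores of the two groups. As a rough illustration of those quantities (a base-R sketch, not the MVar implementation), `stats::cancor()` computes the canonical correlations and coefficients; splitting iris into a sepal group and a petal group is an arbitrary choice made only for this example.

```r
# Canonical correlations with base R's stats::cancor() (not MVar's CCA()).
X <- as.matrix(iris[, 1:2])   # group X: sepal length and width
Y <- as.matrix(iris[, 3:4])   # group Y: petal length and width

cc <- cancor(X, Y)            # both groups are centered by default

# Canonical correlations, largest first
print(round(cc$cor, 4))

# Canonical scores: centered data times the canonical coefficients
U <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
V <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef

# The first pair of scores is correlated at exactly the first canonical correlation
print(round(cor(U[, 1], V[, 1]), 4))
```

The scores U and V play the role of the "Scores of the group X" and "Scores of the group Y" graphs in the example above.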
Computes the correlations between the variables of a data set and presents them in graph form.
Plot.Cor(data, title = NA, grid = TRUE, leg = TRUE, boxleg = FALSE, text = FALSE, arrow = TRUE, color = TRUE, namesvar = NA, savptc = FALSE, width = 3236, height = 2000, res = 300)
data |
Numeric data set. |
title |
Title for the plot, if not defined it assumes standard text. |
grid |
Puts grid on plot (default = TRUE). |
leg |
Puts the legend on the plot (default = TRUE). |
boxleg |
Put frame in the legend (default = FALSE). |
text |
Puts correlation values in circles (default = FALSE). |
arrow |
Positive (up) and negative (down) correlation arrows (default = TRUE). |
color |
Colorful plot (default = TRUE). |
namesvar |
Vector with the variable names; if omitted, the names in 'data' are used. |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
Plot with the correlations between the variables in 'data'.
Paulo Cesar Ossani
data(iris) # data set
Plot.Cor(data = iris[,1:4], title = NA, grid = TRUE, leg = TRUE, boxleg = FALSE, text = FALSE, arrow = TRUE, color = TRUE, namesvar = NA, savptc = FALSE, width = 3236, height = 2000, res = 300)
Plot.Cor(data = iris[,1:4], title = NA, grid = TRUE, leg = TRUE, boxleg = FALSE, text = TRUE, arrow = TRUE, color = TRUE, namesvar = c("A1","B2","C3","D4"), savptc = FALSE, width = 3236, height = 2000, res = 300)
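Plot.Cor displays pairwise correlations, with arrows pointing up for positive and down for negative values. The base-R sketch below computes the same kind of quantities with `cor()` and tabulates them instead of drawing the plot; it assumes Pearson correlations, which the description above does not state.

```r
# Pairwise Pearson correlations of the numeric iris variables, one row per
# pair, with the direction an up/down arrow would indicate.
data <- iris[, 1:4]
r <- cor(data)

idx <- which(upper.tri(r), arr.ind = TRUE)   # each unordered pair once
pairs <- data.frame(
  var1 = colnames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = round(r[idx], 3)
)
pairs$direction <- ifelse(pairs$r >= 0, "positive (up)", "negative (down)")
print(pairs)
```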
Graphs of the Factorial Analysis (FA).
Plot.FA(FA, titles = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, axes = TRUE, class = NA, classcolor = NA, posleg = 2, boxleg = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
FA |
Data of the FA function. |
titles |
Titles of the graphics, if not set, assumes the default text. |
xlabel |
Names the X axis, if not set, assumes the default text. |
ylabel |
Names the Y axis, if not set, assumes the default text. |
size |
Size of the points in the graphs. |
grid |
Put grid on graphs (default = TRUE). |
color |
Colored graphics (default = TRUE). |
linlab |
Vector with the labels for the observations. |
axes |
Plots the X and Y axes (default = TRUE). |
class |
Vector with names of data classes. |
classcolor |
Vector with the colors of the classes. |
posleg |
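Legend position: 0 with no legend, 1 in the upper left corner, 2 in the upper right corner (default), 3 in the lower right corner, 4 in the lower left corner. |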
boxleg |
Puts the frame in the caption (default = TRUE). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
data(iris) # data set
data <- iris[,1:4]
cls <- as.character(iris[,5]) # data class
res <- FA(data, method = "PC", type = 2, nfactor = 3) # performs FA
tit <- c("Scree-plot","Scores of the Observations","Factorial Loadings","Biplot")
Plot.FA(FA = res, titles = tit, xlabel = NA, ylabel = NA, color = TRUE, linlab = NA, savptc = FALSE, size = 1.1, posleg = 1, boxleg = FALSE, class = cls, axes = TRUE, classcolor = c("blue3","red","goldenrod3"), width = 3236, height = 2000, res = 300, casc = FALSE)
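The FA example calls the function with method = "PC". Assuming this denotes the usual principal-component method of factor extraction, the loadings are the first m eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. The base-R sketch below shows that computation for m = 2; it illustrates the method and is not the code of MVar's FA function.

```r
# Factor loadings by the principal-component method, from the correlation
# matrix of the iris measurements (base R only).
data <- iris[, 1:4]
R <- cor(data)
m <- 2                                   # number of factors retained

e <- eigen(R, symmetric = TRUE)          # eigenvalues in decreasing order
L <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]))   # p x m loadings
dimnames(L) <- list(colnames(R), paste0("F", 1:m))

communality <- rowSums(L^2)              # variance of each variable explained by the factors
uniqueness  <- 1 - communality           # specific variances

print(round(L, 3))
print(round(cbind(communality, uniqueness), 3))
```

Each column of squared loadings sums to the corresponding eigenvalue, which is what a scree plot displays.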
Graphics of the Multiple Factor Analysis (MFA).
Plot.MFA(MFA, titles = NA, xlabel = NA, ylabel = NA, posleg = 2, boxleg = TRUE, size = 1.1, grid = TRUE, color = TRUE, groupscolor = NA, namarr = FALSE, linlab = NA, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
MFA |
Data of the MFA function. |
titles |
Titles of the graphics, if not set, assumes the default text. |
xlabel |
Names the X axis, if not set, assumes the default text. |
ylabel |
Names the Y axis, if not set, assumes the default text. |
posleg |
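Legend position: 1 in the upper left corner, 2 in the upper right corner (default), 3 in the lower right corner, 4 in the lower left corner. |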
boxleg |
Puts frame in legend (default = TRUE). |
size |
Size of the points in the graphs. |
grid |
Put grid on graphs (default = TRUE). |
color |
Colored graphics (default = TRUE). |
groupscolor |
Vector with the colors of the groups. |
namarr |
Puts the names of the points in the cloud around the centroid, in the graph of the global analysis of the individuals and variables (default = FALSE). |
linlab |
Vector with the labels for the observations, if not set, assumes the default text. |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
data(DataMix) # set of mixed data
data <- DataMix[,2:ncol(DataMix)]
rownames(data) <- DataMix[1:nrow(DataMix),1]
group.names <- c("Grade Cafes/Work", "Formation/Dedication", "Coffees")
mf <- MFA(data, c(2,2,2), typegroups = c("n","c","f"), group.names) # performs MFA
tit <- c("Scree-Plot","Observations","Observations/Variables", "Correlation Circle","Inertia of the Variable Groups")
Plot.MFA(MFA = mf, titles = tit, xlabel = NA, ylabel = NA, posleg = 2, boxleg = FALSE, color = TRUE, groupscolor = c("blue3","red","goldenrod3"), namarr = FALSE, linlab = NA, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE) # plotting several graphs on the screen
Plot.MFA(MFA = mf, titles = tit, xlabel = NA, ylabel = NA, posleg = 2, boxleg = FALSE, color = TRUE, namarr = FALSE, linlab = rep("A?",10), savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE) # plotting several graphs on the screen
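Plot.MFA includes a graph of the inertia of the variable groups. In MFA, groups are balanced by dividing each group's table by the square root of its first eigenvalue before the global analysis, so that no single group dominates. The base-R sketch below shows only this weighting step for quantitative groups; the split of iris into a sepal and a petal group is hypothetical, and MVar's treatment of qualitative and frequency groups is not reproduced.

```r
# MFA weighting step for quantitative groups (illustrative sketch).
data <- scale(iris[, 1:4])                  # standardized variables
groups <- list(Sepal = 1:2, Petal = 3:4)    # hypothetical grouping

first_eig <- function(Z) {
  # largest eigenvalue of the covariance matrix of the centered columns of Z
  S <- crossprod(Z) / (nrow(Z) - 1)
  eigen(S, symmetric = TRUE, only.values = TRUE)$values[1]
}

# Divide each group by the square root of its first eigenvalue
weighted <- lapply(groups, function(cols) {
  Z <- data[, cols, drop = FALSE]
  Z / sqrt(first_eig(Z))
})

global <- do.call(cbind, weighted)          # weighted global table
gpca <- prcomp(global, center = FALSE, scale. = FALSE)
print(round(gpca$sdev^2, 4))                # eigenvalues of the global analysis
```

After weighting, every group has a first eigenvalue of 1, so each group contributes comparably to the first global axis.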
Graphs of the Principal Components Analysis (PCA).
Plot.PCA(PC, titles = NA, xlabel = NA, ylabel = NA, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, axes = TRUE, class = NA, classcolor = NA, posleg = 2, boxleg = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
PC |
Data of the PCA function. |
titles |
Titles of the graphics, if not set, assumes the default text. |
xlabel |
Names the X axis, if not set, assumes the default text. |
ylabel |
Names the Y axis, if not set, assumes the default text. |
size |
Size of the points in the graphs. |
grid |
Put grid on graphs (default = TRUE). |
color |
Colored graphics (default = TRUE). |
linlab |
Vector with the labels for the observations. |
axes |
Plots the X and Y axes (default = TRUE). |
class |
Vector with names of data classes. |
classcolor |
Vector with the colors of the classes. |
posleg |
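Legend position: 0 with no legend, 1 in the upper left corner, 2 in the upper right corner (default), 3 in the lower right corner, 4 in the lower left corner. |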
boxleg |
Puts the frame in the caption (default = TRUE). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
data(iris) # data set
data <- iris[,1:4]
cls <- as.character(iris[,5]) # data class
pc <- PCA(data, 2) # performs PCA
tit <- c("Scree-plot","Observations","Correlations")
Plot.PCA(PC = pc, titles = tit, xlabel = NA, ylabel = NA, color = TRUE, linlab = NA, savptc = FALSE, size = 1.1, posleg = 2, boxleg = FALSE, class = cls, axes = TRUE, classcolor = c("blue3","red","goldenrod3"), width = 3236, height = 2000, res = 300, casc = FALSE)
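The scree plot drawn by Plot.PCA is based on the eigenvalues of the analysis. A minimal base-R sketch of those values, assuming a PCA on standardized variables (correlation matrix) and using `prcomp()` rather than MVar's PCA function, is:

```r
# Eigenvalues and explained-variance proportions behind a scree plot.
data <- iris[, 1:4]
pc <- prcomp(data, center = TRUE, scale. = TRUE)

eigenvalues <- pc$sdev^2
prop <- eigenvalues / sum(eigenvalues)
scree <- data.frame(
  component  = paste0("PC", seq_along(eigenvalues)),
  eigenvalue = round(eigenvalues, 4),
  proportion = round(prop, 4),
  cumulative = round(cumsum(prop), 4)
)
print(scree)

scores <- pc$x[, 1:2]   # coordinates of the observations on PC1 and PC2
```

With standardized variables the eigenvalues sum to the number of variables, here 4.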
Graphics of the Projection Pursuit (PP).
Plot.PP(PP, titles = NA, xlabel = NA, ylabel = NA, posleg = 2, boxleg = TRUE, size = 1.1, grid = TRUE, color = TRUE, classcolor = NA, linlab = NA, axesvar = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
PP |
Data of the PP_Optimizer function. |
titles |
Titles of the graphics, if not set, assumes the default text. |
xlabel |
Names the X axis, if not set, assumes the default text. |
ylabel |
Names the Y axis, if not set, assumes the default text. |
posleg |
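Legend position: 0 with no legend, 1 in the upper left corner, 2 in the upper right corner (default), 3 in the lower right corner, 4 in the lower left corner. |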
boxleg |
Puts the frame in the caption (default = TRUE). |
size |
Size of the points in the graphs. |
grid |
Put grid on graphs (default = TRUE). |
color |
Colored graphics (default = TRUE). |
classcolor |
Vector with the colors of the classes. |
linlab |
Vector with the labels for the observations. |
axesvar |
Puts axes of rotation of the variables, only when dimproj > 1 (default = TRUE). |
axes |
Plots the X and Y axes (default = TRUE). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
A graph of the evolution of the projection indices, and graphs of the data reduced to the projection dimensions.
Paulo Cesar Ossani
Marcelo Angelo Cirillo
See also PP_Optimizer and PP_Index.
data(iris) # dataset
# Example 1 - Without the classes in the data
data <- iris[,1:4]
findex <- "kurtosismax" # index function
dim <- 1 # dimension of data projection
sphere <- TRUE # spherical data
res <- PP_Optimizer(data = data, class = NA, findex = findex, optmethod = "GTSA", dimproj = dim, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 500, half = 30)
Plot.PP(res, titles = NA, posleg = 1, boxleg = FALSE, color = TRUE, linlab = NA, axesvar = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
# Example 2 - With the classes in the data
class <- iris[,5] # data class
res <- PP_Optimizer(data = data, class = class, findex = findex, optmethod = "GTSA", dimproj = dim, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 500, half = 30)
tit <- c(NA,"Graph example") # titles for the graphics
Plot.PP(res, titles = tit, posleg = 1, boxleg = FALSE, color = TRUE, classcolor = c("blue3","red","goldenrod3"), linlab = NA, axesvar = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
# Example 3 - Without the classes in the data, but informing
# the classes in the plot function
res <- PP_Optimizer(data = data, class = NA, findex = "Moment", optmethod = "GTSA", dimproj = 2, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 500, half = 30)
lin <- c(rep("a",50),rep("b",50),rep("c",50)) # data class
Plot.PP(res, titles = NA, posleg = 1, boxleg = FALSE, color = TRUE, linlab = lin, axesvar = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
# Example 4 - With the classes in the data, but not informed in the plot function
class <- iris[,5] # data class
dim <- 2 # dimension of data projection
findex <- "lda" # index function
res <- PP_Optimizer(data = data, class = class, findex = findex, optmethod = "GTSA", dimproj = dim, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 500, half = 30)
tit <- c("",NA) # titles for the graphics
Plot.PP(res, titles = tit, posleg = 1, boxleg = FALSE, color = TRUE, linlab = NA, axesvar = TRUE, axes = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = FALSE)
Graphs of the linear regression results.
Plot.Regr(Reg, typegraf = "Scatterplot", title = NA, xlabel = NA, ylabel = NA, namevary = NA, namevarx = NA, size = 1.1, grid = TRUE, color = TRUE, intconf = TRUE, intprev = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300, casc = TRUE)
Reg |
Data of the Regr function. |
typegraf |
Type of graphic: "Scatterplot", "Regression", "QQPlot", "Histogram", "Fits" or "Order" (default = "Scatterplot"). |
title |
Titles of the graphics, if not set, assumes the default text. |
xlabel |
Names the X axis, if not set, assumes the default text. |
ylabel |
Names the Y axis, if not set, assumes the default text. |
namevary |
Name of the variable Y, if not set, assumes the default text. |
namevarx |
Name of the X variable (or variables), if not set, assumes the default text. |
size |
Size of the points in the graphs. |
grid |
Put grid on graphs (default = TRUE). |
color |
Colored graphics (default = TRUE). |
intconf |
When typegraf = "Regression", plots the confidence interval (default = TRUE). |
intprev |
When typegraf = "Regression", plots the prediction interval (default = TRUE). |
savptc |
Saves graphics images to files (default = FALSE). |
width |
Graphics images width when savptc = TRUE (default = 3236). |
height |
Graphics images height when savptc = TRUE (default = 2000). |
res |
Nominal resolution in ppi of the graphics images when savptc = TRUE (default = 300). |
casc |
Cascade effect in the presentation of the graphics (default = TRUE). |
Returns several graphs.
Paulo Cesar Ossani
data(DataMix)
Y <- DataMix[,2]
X <- DataMix[,7]
name.y <- "Medium grade"
name.x <- "Commercial coffees"
res <- Regr(Y, X, namevarx = name.x , intercept = TRUE, sigf = 0.05)
tit <- c("Scatterplot")
Plot.Regr(res, typegraf = "Scatterplot", title = tit, namevary = name.y, namevarx = name.x, color = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300)
tit <- c("Scatterplot with the fitted line")
Plot.Regr(res, typegraf = "Regression", title = tit, xlabel = name.x, ylabel = name.y, color = TRUE, intconf = TRUE, intprev = TRUE, savptc = FALSE, width = 3236, height = 2000, res = 300)
dev.new() # opens a new graphics device so the following graphs do not overwrite the previous one
par(mfrow = c(2,2))
Plot.Regr(res, typegraf = "QQPlot", casc = FALSE)
Plot.Regr(res, typegraf = "Histogram", casc = FALSE)
Plot.Regr(res, typegraf = "Fits", casc = FALSE)
Plot.Regr(res, typegraf = "Order", casc = FALSE)
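The intconf and intprev arguments add confidence and prediction intervals to the "Regression" graph. The base-R sketch below uses `lm()` and `predict()` on the built-in cars data, not Regr, to show the difference between the two: the prediction band is always wider, because it also accounts for the variance of a new observation around the mean response.

```r
# Confidence band (mean response) versus prediction band (new observation)
# along a fitted simple regression line.
fit <- lm(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 50))

cint <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
pint <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist", pch = 16)
lines(grid$speed, cint[, "fit"], lwd = 2)                             # fitted line
matlines(grid$speed, cint[, c("lwr", "upr")], lty = 2, col = "blue")  # confidence band
matlines(grid$speed, pint[, c("lwr", "upr")], lty = 3, col = "red")   # prediction band
```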
Computes Projection Pursuit (PP) indices.
PP_Index(data, class = NA, vector.proj = NA, findex = "HOLES", dimproj = 2, weight = TRUE, lambda = 0.1, r = 1, ck = NA)
data |
Numeric dataset without class information. |
class |
Vector with names of data classes. |
vector.proj |
Projection vector. |
findex |
Projection index function to be used, for example "HOLES" (default), "LDA", "PDA", "Lr", "CHI", "kurtosismax" or "Moment". |
dimproj |
Dimension of data projection (default = 2). |
weight |
Used in the LDA, PDA and Lr indices to weight the calculations by the number of elements in each class (default = TRUE). |
lambda |
Used in the PDA index (default = 0.1). |
r |
Used in the Lr index (default = 1). |
ck |
Internal use of the CHI index function. |
num.class |
Number of classes. |
class.names |
Class names. |
findex |
Projection index function used. |
vector.proj |
Projection vectors found. |
index |
Projection index found in the process. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Ossani, P. C.; Figueira, M. R.; Cirillo, M. A. Proposition of a new index for projection pursuit in the multiple factor analysis. Computational and Mathematical Methods, v. 1, p. 1-18, 2020.
Cook, D.; Buja, A.; Cabrera, J. Projection pursuit indexes based on orthonormal function expansions. Journal of Computational and Graphical Statistics, 2(3):225-250, 1993.
Cook, D.; Buja, A.; Cabrera, J.; Hurley, C. Grand tour and projection pursuit. Journal of Computational and Graphical Statistics, 4(3), 155-172, 1995.
Cook, D.; Swayne, D. F. Interactive and Dynamic Graphics for Data Analysis: With R and GGobi. Springer. 2007.
Espezua, S.; Villanueva, E.; Maciel, C. D.; Carvalho, A. A projection pursuit framework for supervised dimension reduction of high dimensional small sample datasets. Neurocomputing, 149, 767-776, 2015.
Friedman, J. H.; Tukey, J. W. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23(9):881-890, 1974.
Hastie, T.; Buja, A.; Tibshirani, R. Penalized discriminant analysis. The Annals of Statistics, 23(1), 73-102, 1995.
Huber, P. J. Projection pursuit. Annals of Statistics, 13(2):435-475, 1985.
Jones, M. C.; Sibson, R. What is projection pursuit? (with discussion). Journal of the Royal Statistical Society, Series A 150, 1-36, 1987.
Lee, E. K.; Cook, D. A projection pursuit index for large p small n data. Statistics and Computing, 20(3):381-392, 2010.
Lee, E.; Cook, D.; Klinke, S.; Lumley, T. Projection pursuit for exploratory supervised classification. Journal of Computational and Graphical Statistics, 14(4):831-846, 2005.
Martinez, W. L.; Martinez, A. R. Computational Statistics Handbook with MATLAB, 2nd ed. New York: Chapman & Hall/CRC, 2007. 794 p.
Martinez, W. L.; Martinez, A. R.; Solka, J. Exploratory Data Analysis with MATLAB, 2nd ed. New York: Chapman & Hall/CRC, 2010. 499 p.
Pena, D.; Prieto, F. Cluster identification using projections. Journal of the American Statistical Association, 96(456):1433-1445, 2001.
Posse, C. Projection pursuit exploratory data analysis. Computational Statistics and Data Analysis, 29:669-687, 1995a.
Posse, C. Tools for two-dimensional exploratory projection pursuit. Journal of Computational and Graphical Statistics, 4:83-100, 1995b.
See also PP_Optimizer and Plot.PP.
data(iris) # data set
data <- iris[,1:4]
# Example 1 - Without the classes in the data
ind <- PP_Index(data = data, class = NA, vector.proj = NA, findex = "moment", dimproj = 2, weight = TRUE, lambda = 0.1, r = 1)
print("Number of classes:"); ind$num.class
print("Class names:"); ind$class.names
print("Projection index function:"); ind$findex
print("Projection vectors:"); ind$vector.proj
print("Projection index:"); ind$index
# Example 2 - With the classes in the data
class <- iris[,5] # data class
findex <- "pda" # index function
sphere <- TRUE # spherical data
dimp <- 2 # dimension of data projection
res <- PP_Optimizer(data = data, class = class, findex = findex, optmethod = "SA", dimproj = dimp, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 1000, half = 30)
# Comparing the result obtained
if (match(toupper(findex), c("LDA", "PDA", "LR"), nomatch = 0) > 0) {
  if (sphere) {
    data <- apply(predict(prcomp(data)), 2, scale) # spherical data
  }
} else data <- as.matrix(res$proj.data[,1:dimp])
ind <- PP_Index(data = data, class = class, vector.proj = res$vector.opt, findex = findex, dimproj = dimp, weight = TRUE, lambda = 0.1, r = 1)
print("Number of classes:"); ind$num.class
print("Class names:"); ind$class.names
print("Projection index function:"); ind$findex
print("Projection vectors:"); ind$vector.proj
print("Projection index:"); ind$index
print("Optimized projection index:"); res$index[length(res$index)]
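To illustrate what a projection index measures, the sketch below computes a kurtosis-based index for a one-dimensional projection of sphered data, using the same sphering step as the example above (`apply(predict(prcomp(data)), 2, scale)`). The function names `sphere_data` and `kurtosis_index` are hypothetical, and the exact formula behind MVar's "kurtosismax" index may differ (for example, it may not subtract 3).

```r
# Hypothetical helper: sphere the data via standardized principal component scores
sphere_data <- function(X) {
  apply(predict(prcomp(X)), 2, scale)
}

# Hypothetical helper: excess kurtosis of the projection of Z on direction a
kurtosis_index <- function(Z, a) {
  a <- a / sqrt(sum(a^2))          # normalize the projection vector
  y <- as.vector(Z %*% a)          # one-dimensional projected data
  y <- (y - mean(y)) / sd(y)       # standardize the projection
  mean(y^4) - 3                    # excess kurtosis (0 for a normal distribution)
}

Z <- sphere_data(as.matrix(iris[, 1:4]))
set.seed(1)
a <- rnorm(ncol(Z))                # an arbitrary projection direction
print(kurtosis_index(Z, a))
```

Because the vector is normalized inside the index, only its direction matters, which is why an optimizer can search over unit vectors.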
Optimization function of the Projection Pursuit index (PP).
PP_Optimizer(data, class = NA, findex = "HOLES", dimproj = 2, sphere = TRUE, optmethod = "GTSA", weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 3000, half = 30)
data |
Numeric dataset without class information. |
class |
Vector with names of data classes. |
findex |
Projection index function to be used, for example "HOLES" (default), "LDA", "PDA", "Lr", "CHI", "kurtosismax" or "Moment". |
dimproj |
Dimension of the data projection (default = 2). |
sphere |
Spherical data (default = TRUE). |
optmethod |
Optimization method: "GTSA" (Grand Tour Simulated Annealing) or "SA" (Simulated Annealing) (default = "GTSA"). |
weight |
Used in the LDA, PDA and Lr indices to weight the calculations by the number of elements in each class (default = TRUE). |
lambda |
Used in the PDA index (default = 0.1). |
r |
Used in the Lr index (default = 1). |
cooling |
Cooling rate (default = 0.9). |
eps |
Approximation accuracy for cooling (default = 1e-3). |
maxiter |
Maximum number of iterations of the algorithm (default = 3000). |
half |
Number of steps without an increase in the index before the cooling value is decreased (default = 30). |
num.class |
Number of classes. |
class.names |
Class names. |
proj.data |
Projected data. |
vector.opt |
Projection vectors found. |
index |
Vector with the projection indices found in the process, converging to the maximum, or the minimum. |
findex |
Projection index function used. |
Paulo Cesar Ossani
Marcelo Angelo Cirillo
Cook, D.; Lee, E. K.; Buja, A.; Wickham, H. Grand tours, projection pursuit guided tours and manual controls. In: Chen, C.; Hardle, W.; Unwin, A. (Eds.), Handbook of Data Visualization, Springer Handbooks of Computational Statistics, chapter III.2, p. 295-314. Springer, 2008.
Lee, E.; Cook, D.; Klinke, S.; Lumley, T. Projection pursuit for exploratory supervised classification. Journal of Computational and Graphical Statistics, 14(4):831-846, 2005.
data(iris) # data set
# Example 1 - Without the classes in the data
data <- iris[,1:4]
class <- NA # data class
findex <- "kurtosismax" # index function
dim <- 1 # dimension of data projection
sphere <- TRUE # spherical data
res <- PP_Optimizer(data = data, class = class, findex = findex, optmethod = "GTSA", dimproj = dim, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 1000, half = 30)
print("Number of classes:"); res$num.class
print("Class names:"); res$class.names
print("Projection index function:"); res$findex
print("Projected data:"); res$proj.data
print("Projection vectors:"); res$vector.opt
print("Projection index:"); res$index
# Example 2 - With the classes in the data
class <- iris[,5] # data class
res <- PP_Optimizer(data = data, class = class, findex = findex, optmethod = "GTSA", dimproj = dim, sphere = sphere, weight = TRUE, lambda = 0.1, r = 1, cooling = 0.9, eps = 1e-3, maxiter = 1000, half = 30)
print("Number of classes:"); res$num.class
print("Class names:"); res$class.names
print("Projection index function:"); res$findex
print("Projected data:"); res$proj.data
print("Projection vectors:"); res$vector.opt
print("Projection index:"); res$index
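PP_Optimizer searches for the projection vector that optimizes the index by simulated annealing, controlled by cooling, eps, maxiter and half. The base-R sketch below is a simplified, generic simulated annealing that maximizes a one-dimensional kurtosis index; it is not MVar's GTSA or SA code. The initial temperature of 1, the Gaussian perturbation, and the use of eps as a stopping temperature are assumptions, and the names `kurtosis_index` and `sa_optimize` are hypothetical.

```r
# Hypothetical helper: excess kurtosis of the 1-D projection Z %*% a
kurtosis_index <- function(Z, a) {
  a <- a / sqrt(sum(a^2))
  y <- as.vector(Z %*% a)
  y <- (y - mean(y)) / sd(y)
  mean(y^4) - 3
}

# Simplified simulated annealing over unit vectors (maximization)
sa_optimize <- function(Z, index_fun, cooling = 0.9, eps = 1e-3,
                        maxiter = 1000, half = 30) {
  p <- ncol(Z)
  a <- rnorm(p)
  a <- a / sqrt(sum(a^2))             # random unit starting vector
  cur <- index_fun(Z, a)              # index at the current vector
  best_a <- a
  best <- cur
  history <- best                     # best index found so far, per iteration
  temp <- 1                           # initial temperature (assumed)
  stall <- 0                          # iterations since the last new best
  for (i in seq_len(maxiter)) {
    cand <- a + temp * rnorm(p)       # perturbation shrinks as temp falls
    cand <- cand / sqrt(sum(cand^2))
    val <- index_fun(Z, cand)
    # Metropolis rule: accept improvements; accept worse moves with prob exp(diff / temp)
    if (val > cur || runif(1) < exp((val - cur) / temp)) {
      a <- cand
      cur <- val
    }
    if (cur > best) {
      best_a <- a
      best <- cur
      stall <- 0
    } else {
      stall <- stall + 1
      if (stall >= half) {            # `half` iterations without a new best
        temp <- temp * cooling        # lower the temperature
        stall <- 0
      }
    }
    history <- c(history, best)
    if (temp < eps) break             # stop once the temperature falls below eps
  }
  list(vector.opt = best_a, index = history)
}

# Usage: iris measurements sphered as in the PP_Index example
Z <- apply(predict(prcomp(iris[, 1:4])), 2, scale)
set.seed(123)
res <- sa_optimize(Z, kurtosis_index, cooling = 0.9, eps = 1e-3,
                   maxiter = 1000, half = 30)
print(round(res$vector.opt, 3))
print(tail(res$index, 1))
```

Like the index component described above, `res$index` records the sequence of values found during the search; in this sketch it tracks the best value so far, so it never decreases.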
Performs linear regression on a data set.
Regr(Y, X, namevarx = NA, intercept = TRUE, sigf = 0.05)
Argument | Description |
---|---|
Y | Response variable. |
X | Regressor (explanatory) variables. |
namevarx | Name(s) of the X variable(s); if not set, default names are used. |
intercept | Include the intercept in the regression (default = TRUE). |
sigf | Significance level of the residual tests (default = 0.05). |
Value | Description |
---|---|
Betas | Regression coefficients. |
CovBetas | Covariance matrix of the regression coefficients. |
ICc | Confidence intervals of the regression coefficients. |
hip.test | Hypothesis tests of the regression coefficients. |
ANOVA | Analysis of variance of the regression. |
R | Coefficient of determination. |
Rc | Corrected coefficient of determination. |
Ra | Adjusted coefficient of determination. |
QME | Residual variance (mean squared error). |
ICQME | Confidence interval of the residual variance. |
prev | Predicted values of the regression fit. |
IPp | Prediction intervals. |
ICp | Confidence intervals of the predictions. |
error | Residuals of the regression fit. |
error.test | Tests of independence, normality, and homogeneity of variance of the residuals, at the significance level given by sigf (default = 5%). |
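Several of these values correspond to textbook ordinary least squares quantities (see Rencher and Schaalje): $\hat\beta = (X'X)^{-1}X'Y$, a residual mean square $QME = SSE/(n-p)$, $\mathrm{Cov}(\hat\beta) = QME\,(X'X)^{-1}$, and $t$-based confidence intervals. The base-R sketch below computes them by hand and cross-checks against lm(). It assumes Regr uses these standard formulas, which the text above does not state explicitly, and it uses the built-in mtcars data as a stand-in because DataMix requires MVar; all object names are illustrative.

```r
# Sketch of textbook OLS quantities in base R (assumed, not MVar's code).
# mtcars is a stand-in data set; object names are illustrative.
Y <- mtcars$mpg
X <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
n <- nrow(X); p <- ncol(X)
sigf <- 0.05

XtX.inv   <- solve(crossprod(X))               # (X'X)^{-1}
betas     <- drop(XtX.inv %*% crossprod(X, Y)) # (X'X)^{-1} X'Y
pred      <- drop(X %*% betas)                 # fitted values
err       <- Y - pred                          # residuals
qme       <- sum(err^2) / (n - p)              # residual mean square
cov.betas <- qme * XtX.inv                     # covariance of the coefficients

# (1 - sigf) confidence intervals for the coefficients
se <- sqrt(diag(cov.betas))
t.crit <- qt(1 - sigf / 2, df = n - p)
ic.betas <- cbind(lower = betas - t.crit * se, upper = betas + t.crit * se)

# Coefficient of determination for a model with intercept
r2 <- 1 - sum(err^2) / sum((Y - mean(Y))^2)

# Reference fit with R's own linear model function
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(round(cbind(betas, ic.betas), 4))
cat("R^2:", round(r2, 4), "\n")
```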
Paulo Cesar Ossani
Charnet, R.; et al. Análise de modelos de regressão linear. 2nd ed. Campinas: Editora da Unicamp, 2008. 357 p.
Rencher, A. C.; Schaalje, G. B. Linear models in statistics. 2nd ed. New Jersey: John Wiley & Sons, 2008. 672 p.
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
library(MVar) # provides Regr and the DataMix data set
data(DataMix)
Y <- DataMix[,2]
X <- DataMix[,6:7]
name.x <- c("Special Coffees", "Commercial Coffees")
res <- Regr(Y, X, namevarx = name.x, intercept = TRUE, sigf = 0.05)
print("Regression coefficients:"); round(res$Betas,4)
print("Analysis of variance:"); res$ANOVA
print("Hypothesis tests of the regression coefficients:"); round(res$hip.test,4)
print("Coefficient of determination:"); round(res$R,4)
print("Corrected coefficient of determination:"); round(res$Rc,4)
print("Adjusted coefficient of determination:"); round(res$Ra,4)
print("Tests of the residuals:"); res$error.test
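The documentation says error.test checks independence, normality, and homogeneity of variance of the residuals, but it does not name the tests. The sketch below shows one common base-R choice for each property: Shapiro-Wilk for normality, a lag-1 Ljung-Box test for independence (meaningful only if the row order is), and Koenker's studentized Breusch-Pagan statistic, $n R^2$ from regressing the squared residuals on the regressors, for homogeneity. These are assumptions for illustration, not necessarily what Regr runs, and mtcars again stands in for DataMix.

```r
# Hypothetical residual checks with the stats package; Regr may use
# different tests. mtcars is a stand-in data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- residuals(fit)
sigf <- 0.05

# Normality: Shapiro-Wilk test
p.norm <- shapiro.test(e)$p.value

# Independence: Ljung-Box test at lag 1 (assumes row order is meaningful)
p.indep <- Box.test(e, lag = 1, type = "Ljung-Box")$p.value

# Homogeneity of variance: Koenker's studentized Breusch-Pagan statistic,
# n * R^2 of the auxiliary regression, chi-square with 2 df (2 regressors)
aux.data <- cbind(mtcars, e2 = e^2)
aux <- lm(e2 ~ wt + hp, data = aux.data)
bp.stat <- nrow(mtcars) * summary(aux)$r.squared
p.homog <- pchisq(bp.stat, df = 2, lower.tail = FALSE)

p.values <- c(normality = p.norm, independence = p.indep,
              homogeneity = p.homog)
print(data.frame(p.value = round(p.values, 4),
                 reject.H0 = p.values < sigf))
```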
Draws a scatter plot.
Scatter(data, ellipse = TRUE, ellipse.level = 0.95, rectangle = FALSE, title = NA, xlabel = NA, ylabel = NA, posleg = 2, boxleg = TRUE, axes = TRUE, size = 1.1, grid = TRUE, color = TRUE, linlab = NA, class = NA, classcolor = NA, savptc = FALSE, width = 3236, height = 2000, res = 300)
Argument | Description |
---|---|
data | Data with the x and y coordinates. |
ellipse | Draw an ellipse around each class (default = TRUE). |
ellipse.level | Confidence level of the ellipse (default = 0.95). |
rectangle | Draw rectangles to differentiate the classes (default = FALSE). |
title | Title of the graph; if not set, the default text is used. |
xlabel | Label of the X axis; if not set, the default text is used. |
ylabel | Label of the Y axis; if not set, the default text is used. |
posleg | Position of the legend (default = 2): 0 for no legend, |
boxleg | Draw a frame around the legend (default = TRUE). |
axes | Plot the X and Y axes (default = TRUE). |
size | Size of the points in the graph (default = 1.1). |
grid | Draw a grid on the graph (default = TRUE). |
color | Draw the graph in color (default = TRUE). |
linlab | Vector with the labels of the observations. |
class | Vector with the class of each observation. |
classcolor | Vector with the colors of the classes. |
savptc | Save the graph to an image file (default = FALSE). |
width | Width of the saved image when savptc = TRUE (default = 3236). |
height | Height of the saved image when savptc = TRUE (default = 2000). |
res | Nominal resolution, in ppi, of the saved image when savptc = TRUE (default = 300). |
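The ellipse and ellipse.level arguments draw an ellipse around each class. The text does not say how it is built; a common construction, assumed here, is the bivariate-normal ellipse centred at the class mean, shaped by the class covariance $S$, with squared Mahalanobis radius $\chi^2_{2}(\text{level})$. The base-R sketch below draws such ellipses for the iris petal data used in the example; conf.ellipse is a hypothetical helper, not part of MVar.

```r
# Hypothetical normal-theory confidence ellipse; MVar's own construction
# may differ. conf.ellipse is an illustrative helper, not an MVar function.
conf.ellipse <- function(xy, level = 0.95, npoints = 100) {
  xy <- as.matrix(xy)
  centre <- colMeans(xy)
  S <- cov(xy)
  radius <- sqrt(qchisq(level, df = 2))   # Mahalanobis radius for `level`
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(theta), sin(theta)) # points on the unit circle
  # Map the unit circle through the Cholesky factor R (t(R) %*% R = S)
  pts <- radius * circle %*% chol(S)
  sweep(pts, 2, centre, "+")              # shift to the class mean
}

data <- iris[, 3:4]
cls <- iris[, 5]
cols <- c("goldenrod3", "blue", "red")
plot(data, col = cols[as.integer(cls)], pch = 19)
for (k in seq_along(levels(cls))) {
  ell <- conf.ellipse(data[cls == levels(cls)[k], ], level = 0.95)
  lines(ell, col = cols[k])
}
```

Every point on each ellipse lies at the same Mahalanobis distance from its class mean, which is what makes ellipse.level interpretable as a coverage probability under normality.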
Scatter plot.
Paulo Cesar Ossani
Rencher, A. C. Methods of multivariate analysis. 2nd ed. New York: J. Wiley, 2002. 708 p.
Anton, H.; Rorres, C. Elementary linear algebra: applications version. 10th ed. New Jersey: John Wiley & Sons, 2010. 768 p.
library(MVar) # provides Scatter
data(iris) # data set
data <- iris[,3:4]
cls <- iris[,5] # data class

# Ellipses around the classes, no frame around the legend
Scatter(data, ellipse = TRUE, ellipse.level = 0.95, rectangle = FALSE,
        title = NA, xlabel = NA, ylabel = NA, posleg = 1, boxleg = FALSE,
        axes = FALSE, size = 1.1, grid = TRUE, color = TRUE, linlab = NA,
        class = cls, classcolor = c("goldenrod3","blue","red"),
        savptc = FALSE, width = 3236, height = 2000, res = 300)

# Rectangles around the classes, framed legend
Scatter(data, ellipse = FALSE, ellipse.level = 0.95, rectangle = TRUE,
        title = NA, xlabel = NA, ylabel = NA, posleg = 1, boxleg = TRUE,
        axes = FALSE, size = 1.1, grid = TRUE, color = TRUE, linlab = NA,
        class = cls, classcolor = c("goldenrod3","blue","red"),
        savptc = FALSE, width = 3236, height = 2000, res = 300)