| Title: | High Dimensional Discriminant Analysis with Compositional Data |
|---|---|
| Description: | High dimensional discriminant analysis with compositional data is performed. The compositional data are first transformed using the alpha-transformation of Tsagris M., Preston S. and Wood A.T.A. (2011) <doi:10.48550/arXiv.1106.1451>, and then the High Dimensional Discriminant Analysis (HDDA) algorithm of Bouveyron C. Girard S. and Schmid C. (2007) <doi:10.1080/03610920701271095> is applied. |
| Authors: | Michail Tsagris [aut, cre] |
| Maintainer: | Michail Tsagris <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0 |
| Built: | 2026-05-10 06:51:02 UTC |
| Source: | https://github.com/cran/CompositionalHDDA |
Performs high dimensional discriminant analysis (HDDA) for compositional data using the alpha-transformation.
| Package: | CompositionalHDDA |
|---|---|
| Type: | Package |
| Version: | 1.0 |
| Date: | 2025-07-08 |
| License: | GPL-2 |
Maintainer: Michail Tsagris <[email protected]>
Author: Michail Tsagris <[email protected]>
Bouveyron C., Girard S. and Schmid C. (2007). High Dimensional Discriminant Analysis. Communications in Statistics: Theory and Methods, 36(14): 2607–2623.
Bouveyron C., Celeux G. and Girard S. (2010). Intrinsic dimension estimation by maximum likelihood in probabilistic PCA. Technical Report 440372, Universite Paris 1 Pantheon-Sorbonne.
Berge L., Bouveyron C. and Girard S. (2012). HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. Journal of Statistical Software, 46(6).
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
HDDA for compositional data using the alpha-transformation.
    alfa.hdda(xnew, ina, x, a = seq(-1, 1, by = 0.1), d_select = "Cattell", threshold = 0.2)
| Argument | Description |
|---|---|
| xnew | A matrix with the new compositional data whose class is to be predicted. |
| ina | A group indicator variable for the compositional data. |
| x | The compositional data. Zero values are allowed. |
| a | Either a single value or a vector of α values. |
| d_select | Either "Cattell", "BIC" or "both". "Cattell": Cattell's scree test is used to determine the intrinsic dimension of each class. If the model is of common dimension (models 7 to 14), the scree test is performed on the covariance matrix of the whole dataset. "BIC": the intrinsic dimensions are selected with the BIC criterion; see Bouveyron et al. (2010) for a discussion of this topic. For common dimension models, the procedure is applied to the covariance matrix of the whole dataset. |
| threshold | A number strictly between 0 and 1. It is the threshold used in Cattell's scree test. |
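
To make the role of `threshold` concrete, here is a minimal sketch of one common formulation of Cattell's scree test: successive eigenvalue drops are scaled by the largest drop, and the dimension is the last position whose scaled drop exceeds the threshold. The function name `cattell_dim` is hypothetical, and the rule used internally by HDclassif may differ in its details (for example, in how near-zero eigenvalues are handled).

```r
# Hypothetical helper, not part of CompositionalHDDA or HDclassif.
# ev: eigenvalues of a covariance matrix; threshold: value strictly in (0, 1).
cattell_dim <- function(ev, threshold = 0.2) {
  stopifnot(length(ev) >= 2, threshold > 0, threshold < 1)
  ev <- sort(ev, decreasing = TRUE)
  drops <- abs(diff(ev))        # gaps between successive eigenvalues
  rel <- drops / max(drops)     # scale so that the largest gap equals 1
  max(which(rel > threshold))   # last position with a "large" gap
}

ev <- c(10, 6, 1, 0.9, 0.8)
cattell_dim(ev, threshold = 0.2)
```

A smaller threshold tends to retain more dimensions, because smaller gaps in the eigenvalue sequence still count as large.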
The compositional data are first transformed using the α-transformation and then the HDDA algorithm is called. The function then computes all the models, reports their BIC and keeps the model with the highest BIC value.
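
The transformation step can be sketched in base R as follows. This is a sketch of the α-transformation as defined in Tsagris et al. (2011), including multiplication by the Helmert sub-matrix. The names `helmert_sub` and `alpha_transform` are hypothetical, and the package may rely on its own implementation rather than on anything resembling this code.

```r
# Hypothetical helpers, written for illustration only.

# (D - 1) x D Helmert sub-matrix: orthonormal rows, each summing to zero.
helmert_sub <- function(D) {
  H <- matrix(0, nrow = D - 1, ncol = D)
  for (i in seq_len(D - 1)) {
    H[i, seq_len(i)] <- 1 / sqrt(i * (i + 1))
    H[i, i + 1] <- -i / sqrt(i * (i + 1))
  }
  H
}

# x: matrix of compositions (rows sum to 1); a: the value of alpha.
alpha_transform <- function(x, a) {
  D <- ncol(x)
  if (abs(a) < 1e-9) {
    # limiting case alpha -> 0: centred log-ratio (requires strictly positive x)
    lx <- log(x)
    z <- lx - rowMeans(lx)
  } else {
    u <- x^a
    u <- u / rowSums(u)       # power-transformed composition
    z <- (D * u - 1) / a
  }
  z %*% t(helmert_sub(D))     # map to D - 1 real coordinates
}

x <- matrix(rgamma(5 * 4, 3, 1), ncol = 4)
x <- x / rowSums(x)
dim(alpha_transform(x, 0.5))  # 5 rows, 3 columns
```

With positive α the power `x^a` is defined at zero, which is consistent with zero values being allowed in `x`; the limiting case α = 0 in this sketch requires strictly positive data.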
A list with sub-lists, one for each value of α, where each sub-list includes:
| Component | Description |
|---|---|
| mod | A list containing the output as returned by the function hdda from the package HDclassif. |
| class | The predicted class of each observation. |
| posterior | The posterior probabilities of each new observation. |
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
    x <- matrix( rgamma(60 * 100, runif(100, 4, 10), 1), ncol = 100, byrow = TRUE )
    x <- x / rowSums(x)  ## Dirichlet simulated values
    xnew <- matrix( rgamma(20 * 100, runif(100, 4, 10), 1), ncol = 100, byrow = TRUE )
    xnew <- xnew / rowSums(xnew)  ## Dirichlet simulated values
    ina <- rbinom(60, 1, 0.5)
    alfa.hdda(xnew, ina, x, a = 0.5)
Cross-Validation of the HDDA for compositional data using the alpha-transformation.
    cv.alfahdda(ina, x, a = seq(-1, 1, by = 0.1), d_select = "both", threshold = c(0.001, 0.005, 0.05, 1:9 * 0.1), folds = NULL, stratified = TRUE, nfolds = 10, seed = NULL)
| Argument | Description |
|---|---|
| ina | A group indicator variable for the compositional data. |
| x | The compositional data. Zero values are allowed. |
| a | A vector of α values. |
| d_select | Either "Cattell", "BIC" or "both". |
| threshold | A vector of numbers strictly between 0 and 1. Each value corresponds to a threshold used in Cattell's scree test. |
| folds | A list with the folds, if already available. If left NULL, the folds are created internally. |
| stratified | Whether the folds should be created in a stratified way. The default value is TRUE. |
| nfolds | The number of folds in the cross-validation. |
| seed | An optional seed number; it may be left NULL. |
K-fold cross-validation for the high dimensional discriminant analysis with compositional data using the α-transformation is performed.
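
As an illustration of what stratified folds look like, the sketch below splits the observation indices so that each fold keeps roughly the class proportions of `ina`. The helper `make_stratified_folds` is hypothetical, and the list-of-index-vectors layout is an assumption: the exact format that the `folds` argument expects is not documented here.

```r
# Hypothetical helper, not part of CompositionalHDDA.
# Returns a list of nfolds integer vectors with observation indices.
make_stratified_folds <- function(ina, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  folds <- vector("list", nfolds)
  for (g in unique(ina)) {
    idx <- which(ina == g)
    if (length(idx) > 1) idx <- sample(idx)   # shuffle within the class
    # deal the shuffled indices round-robin across the folds
    assign_to <- rep_len(seq_len(nfolds), length(idx))
    for (k in seq_len(nfolds)) {
      folds[[k]] <- c(folds[[k]], idx[assign_to == k])
    }
  }
  folds
}

ina <- rep(0:1, c(30, 20))
folds <- make_stratified_folds(ina, nfolds = 5, seed = 1)
sapply(folds, function(f) table(ina[f]))  # class counts per fold
```

Stratification matters most when the classes are unbalanced, since a purely random split could leave a fold with very few observations from the smaller class.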
A list including:
| Component | Description |
|---|---|
| kl | A matrix with the configurations of hyper-parameters tested and the estimated Kullback-Leibler divergence, for each configuration. |
| js | A matrix with the configurations of hyper-parameters tested and the estimated Jensen-Shannon divergence, for each configuration. |
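
For readers unfamiliar with the two criteria, the sketch below computes the Kullback-Leibler and Jensen-Shannon divergences between two discrete probability vectors, here an observed class indicator and a vector of posterior probabilities. The functions `kl_div` and `js_div` are hypothetical; how the package aggregates these quantities over observations and folds, and the exact scaling it uses, are not stated in this documentation.

```r
# Hypothetical helpers, written for illustration only.

# Kullback-Leibler divergence of q from p; terms with p = 0 contribute 0.
kl_div <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Jensen-Shannon divergence: symmetric, bounded above by log(2).
js_div <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
}

y <- c(0, 1, 0)            # observed class as an indicator vector
post <- c(0.2, 0.7, 0.1)   # posterior probabilities for one observation
kl_div(y, post)            # equals -log(0.7)
js_div(y, post)
```

For an indicator vector the Kullback-Leibler divergence reduces to minus the log of the posterior probability assigned to the true class, so smaller values indicate better predictions under both criteria.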
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
    x <- matrix( rgamma(100 * 200, runif(200, 4, 10), 1), ncol = 200, byrow = TRUE )
    x <- x / rowSums(x)  ## Dirichlet simulated values
    ina <- rbinom(100, 1, 0.5)
    mod <- cv.alfahdda(ina, x, a = c(0.1, 0.5, 1), d_select = "both", threshold = seq(0.1, 0.5, by = 0.1), nfolds = 5)