--- title: "Working with log-ratio coordinates in `coda.base`" author: "Marc Comas-CufĂ­" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: fig_caption: yes vignette: > %\VignetteIndexEntry{Working with log-ratio coordinates in `coda.base`} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` In this vignette we show how to define log-ratio coordinates using `coda.base` package and its function `coordinates` with parameters `X`, a composition, and `basis`, defining the independent log-contrasts for building the coordinates. In this vignette we work with a subcomposition of the results obtained in different regions of Catalonia in 2017's parliament elections: ```{r, message=FALSE} library(coda.base) data('parliament2017') X = parliament2017[,c('erc','jxcat','psc','cs')] ``` # Log-ratio coordinates with `coda.base` ## The additive logratio (alr) coordinates The alr coordinates are accessible by setting the parameter `basis='alr'` or by using the building function `alr_basis()`. If you don't want the last part in the denominator, the easiest way to define an alr-coordinates is to set `basis='alr'`: ```{r} H1.alr = coordinates(X, basis = 'alr') head(H1.alr) ``` It defines an alr-coordinates were the last part is used in the denominator. We can obtain the basis used to build the coordinates with function `basis()`: ```{r} basis(H1.alr) ``` The basis can be reproduced using the function `alr_basis`: ```{r} alr_basis(dim = 4) ``` In fact, function `alr_basis` allows to define any type of alr-like coordinate by defining the numerator and the denominator: ```{r} B.alr = alr_basis(dim = 4, numerator = c(4,2,3), denominator = 1) B.alr ``` The log-contrast matrix can be used as `basis` parameter in `coordinates()` function: ```{r} H2.alr = coordinates(X, basis = B.alr) basis(H2.alr) ``` ## The centered logratio (clr) coordinates Building centered log-ratio coordinates can be accomplished by setting parameter `basis='clr'` or ```{r} H.clr = coordinates(X, basis = 'clr') head(H.clr) ``` ## The isometric logratio (ilr) coordinates `coda.base` allows to define a wide variety of ilr-coordinates: principal components (pc) coordinates, specific user balances coordinates, principal balances (pb) coordinates, balanced coordinates (default's [CoDaPack](https://imae.udg.edu/codapack/)'s coordinates). The default ilr coordinate used by `coda.base` are accessible by simply calling function `coordinates` without parameters: ```{r} H1.ilr = coordinates(X) head(H1.ilr) ``` Parameter `basis` is set to `ilr` by default: ```{r} all.equal( coordinates(X, basis = 'ilr'), H1.ilr ) ``` ## Other ilr-coordinates: Principal Components and Principal balances Other easily accessible coordinates are the Principal Component (PC) coordinates. PC coordinates define the first coordinate as the log-contrast with highest variance, the second the one independent from the first and with highest variance and so on: ```{r, fig.width=5.5, fig.height=4, fig.align='center', caption='Variance of principal components coordinates'} H2.ilr = coordinates(X, basis = 'pc') head(H2.ilr) barplot(apply(H2.ilr, 2, var)) ``` Note that the PC coordinates are independent: ```{r} cov(H2.ilr) ``` The Principal Balance coordinates are similar to PC coordinates but with the restriction that the log contrast are balances ```{r, fig.width=5.5, fig.height=4, fig.align='center', caption='Variance of principal balances coordinates'} H3.ilr = coordinates(X, basis = 'pb') head(H3.ilr) barplot(apply(H3.ilr, 2, var)) ``` Moreover, they are not independent: ```{r} cor(H3.ilr) ``` Principal Balances are hard to compute when the number of components is very high. `coda.base` allows to build PB approximations using different algorithms. ```{r} X100 = exp(matrix(rnorm(1000*100), ncol = 100)) ``` * _Hierarchical clustering based algorithm_. ```{r} PB1.ward = pb_basis(X100, method = 'cluster') ``` * _Constrained search algorithm_ ```{r} PB1.constrained = pb_basis(X100, method = 'constrained') ``` We can compare they performance (variance explained by the first balance) with respect to the principal components. ```{r} PC_approx = coordinates(X100, cbind(pc_basis(X100)[,1], PB1.ward[,1], PB1.constrained[,1])) names(PC_approx) = c('PC', 'Ward', 'Constrained') apply(PC_approx, 2, var) ``` Finally, `coda.base` allows to define the default CoDaPack basis which consists in defining well balanced balances, i.e. equal number of branches in each balance. ```{r} H4.ilr = coordinates(X, basis = 'cdp') head(H4.ilr) ``` # Defining coordinates manually ## Defining coordinates with an specific basis We can define the coordinates directly by providing the log-contrast matrix. ```{r} B = matrix(c(-1,-1,2,0, 1,0,-0.5,-0.5, -0.5,0.5,0,0), ncol = 3) H1.man = coordinates(X, basis = B) head(H1.man) ``` ## Defining coordinates using balances We can also define balances using formula `numerator~denominator`: ```{r} B.man = sbp_basis(list(b1 = erc~jxcat, b2 = psc~cs, b3 = erc+jxcat~psc+cs), data=X) H2.man = coordinates(X, basis = B.man) head(H2.man) ``` With `sbp_basis` we do not need to define neither a basis nor a system generator ```{r} B = sbp_basis(list(b1 = erc+jxcat~psc+cs), data=X) H3.man = coordinates(X, basis = B) head(H3.man) ``` or ```{r} B = sbp_basis(list(b1 = erc~jxcat+psc~cs, b2 = jxcat~erc+psc+cs, b3 = psc~erc+jxcat+cs, b4 = cs~erc+jxcat+psc), data=X) H4.man = coordinates(X, basis = B) head(H4.man) ``` If interested, we can complete a sequential binary partition giving only some partitions ```{r} B = sbp_basis(list(b1 = erc+jxcat~psc), data=X, fill = TRUE) sign(B) ``` We can also define sequential binary partition using a matrix. By using a matrix we don't need to include a dataset. The number of components is obtained with the number of rows and component names from row names (if available). ```{r} P = matrix(c(1, 1,-1,-1, 1,-1, 0, 0, 0, 0, 1,-1), ncol= 3) B = sbp_basis(P) H5.man = coordinates(X, basis = B) head(H5.man) ```