---
title: "Working with log-ratio coordinates in `coda.base`"
author: "Marc Comas-Cufí"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{Working with log-ratio coordinates in `coda.base`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette we show how to define log-ratio coordinates using the `coda.base` package and its function `coordinates()`. Its two main parameters are `X`, a composition, and `basis`, which defines the independent log-contrasts used to build the coordinates.
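As a reminder of the usual notation (the symbols $x$, $a_i$, $B$ and $h$ are introduced here for illustration only), a log-contrast of a composition $x = (x_1, \dots, x_D)$ is a linear combination of the log-parts whose coefficients sum to zero:

$$
\sum_{i=1}^{D} a_i \log x_i, \qquad \sum_{i=1}^{D} a_i = 0.
$$

Assuming the coefficients of each log-contrast are stored as a column of a matrix $B$, as the matrices shown later in this vignette suggest, the coordinates of $x$ can be written as

$$
h = \log(x)\, B .
$$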

We work with a subcomposition of the results obtained in different regions of Catalonia in the 2017 parliamentary election:

```{r, message=FALSE}
library(coda.base)
data('parliament2017')
X = parliament2017[,c('erc','jxcat','psc','cs')]
```

# Log-ratio coordinates with `coda.base`

## The additive logratio (alr) coordinates

The alr coordinates are accessible by setting the parameter `basis='alr'` or by using the building function `alr_basis()`.

If you want the last part in the denominator, the easiest way to define alr coordinates is to set `basis='alr'`:

```{r}
H1.alr = coordinates(X, basis = 'alr')
head(H1.alr)
```

This defines alr coordinates where the last part is used as the denominator. We can obtain the basis used to build the coordinates with the function `basis()`:

```{r}
basis(H1.alr)
```
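To make the link between the basis and the coordinates concrete, here is a minimal base-R sketch on a small made-up composition (the object names `toy`, `alr_manual` and `B_manual` are illustrative and not part of `coda.base`). It computes the alr coordinates by hand and checks that they agree with applying a log-contrast matrix to the log-parts:

```{r}
# Toy composition with four parts (hypothetical values, not the election data)
toy = matrix(c(0.10, 0.20, 0.30, 0.40,
               0.25, 0.25, 0.25, 0.25), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c('a', 'b', 'c', 'd')))

# alr by hand: log-ratio of each of the first three parts to the last part
alr_manual = log(toy[, 1:3] / toy[, 4])

# The same coordinates written as log(toy) %*% B, where each column of B
# has a 1 for the numerator part and a -1 for the denominator part
B_manual = rbind(diag(3), -1)
all.equal(unname(alr_manual), unname(log(toy) %*% B_manual))
```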

The basis can be reproduced using the function `alr_basis`:

```{r}
alr_basis(dim = 4)
```

In fact, `alr_basis` allows you to define any alr-like coordinates by specifying the numerator and the denominator parts:

```{r}
B.alr = alr_basis(dim = 4, numerator = c(4,2,3), denominator = 1)
B.alr
```

The resulting log-contrast matrix can be passed as the `basis` parameter of the `coordinates()` function:

```{r}
H2.alr = coordinates(X, basis = B.alr)
basis(H2.alr)
```

## The centered logratio (clr) coordinates

Centered log-ratio coordinates can be built by setting the parameter `basis='clr'`:

```{r}
H.clr = coordinates(X, basis = 'clr')
head(H.clr)
```
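For reference, the clr transformation subtracts from each log-part the mean of the log-parts of that observation. The following base-R sketch, on made-up data with illustrative names (`toy`, `clr_manual`), shows the computation and the resulting sum-to-zero constraint:

```{r}
# Toy composition with four parts (hypothetical values, not the election data)
toy = matrix(c(0.10, 0.20, 0.30, 0.40,
               0.25, 0.25, 0.25, 0.25), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c('a', 'b', 'c', 'd')))

# clr by hand: log-parts centered by their row mean
# (equivalently, the log of each part over the row geometric mean)
clr_manual = log(toy) - rowMeans(log(toy))
clr_manual

# Each row sums to zero, up to floating-point error
rowSums(clr_manual)
```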


## The isometric logratio (ilr) coordinates

`coda.base` allows you to define a wide variety of ilr coordinates: principal component (PC) coordinates, user-defined balance coordinates, principal balance (PB) coordinates, and balanced coordinates (the default coordinates in [CoDaPack](https://imae.udg.edu/codapack/)).

The default ilr coordinates used by `coda.base` are obtained by simply calling the function `coordinates()` without specifying `basis`:

```{r}
H1.ilr = coordinates(X)
head(H1.ilr)
```

Parameter `basis` is set to `ilr` by default:

```{r}
all.equal( coordinates(X, basis = 'ilr'),
           H1.ilr )
```
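What makes a basis an ilr basis is that its columns are log-contrasts (they sum to zero) and are orthonormal. As a sketch of these two properties, the following base-R code builds one such basis from Helmert contrasts. This is only an illustration: it is not claimed to be the basis that `coda.base` uses by default, and the names `helmert` and `B_helmert` are hypothetical.

```{r}
# Helmert contrasts for four parts: columns sum to zero and are orthogonal
helmert = contr.helmert(4)

# Scale every column to unit length to obtain an orthonormal basis
B_helmert = sweep(helmert, 2, sqrt(colSums(helmert^2)), '/')

# Columns sum to zero, so each one is a log-contrast
colSums(B_helmert)

# The cross-product is the identity matrix, so the basis is orthonormal
round(crossprod(B_helmert), 10)
```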

## Other ilr-coordinates: Principal Components and Principal Balances

Other easily accessible coordinates are the Principal Component (PC) coordinates. The first PC coordinate is the log-contrast with the highest variance, the second is the log-contrast with the highest variance among those uncorrelated with the first, and so on:

```{r, fig.width=5.5, fig.height=4, fig.align='center', fig.cap='Variance of principal components coordinates'}
H2.ilr = coordinates(X, basis = 'pc')
head(H2.ilr)
barplot(apply(H2.ilr, 2, var))
```

Note that the PC coordinates are uncorrelated:

```{r}
cov(H2.ilr)
```
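The idea behind PC coordinates can be sketched in base R on simulated data, using a singular value decomposition of the centered clr coordinates. This is one standard way to obtain such log-contrasts and is not claimed to be how `coda.base` computes them internally; the names `sim`, `clr_sim`, `V` and `pc_manual` are illustrative.

```{r}
set.seed(1)
# Hypothetical positive data with four parts
sim = exp(matrix(rnorm(50 * 4), ncol = 4))

# clr coordinates, then column-centering before the decomposition
clr_sim = log(sim) - rowMeans(log(sim))
V = svd(scale(clr_sim, scale = FALSE))$v[, 1:3]  # first D - 1 right singular vectors

# The columns of V sum to zero, so they define log-contrasts
round(colSums(V), 10)

# Coordinates with respect to V: the off-diagonal covariances are zero
pc_manual = log(sim) %*% V
round(cov(pc_manual), 10)
```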


The Principal Balance coordinates are similar to PC coordinates, but with the restriction that the log-contrasts are balances:

```{r, fig.width=5.5, fig.height=4, fig.align='center', fig.cap='Variance of principal balances coordinates'}
H3.ilr = coordinates(X, basis = 'pb')
head(H3.ilr)
barplot(apply(H3.ilr, 2, var))
```

Moreover, unlike the PC coordinates, they are in general correlated:

```{r}
cor(H3.ilr)
```

Principal Balances are hard to compute when the number of components is very high. `coda.base` allows you to build PB approximations using different algorithms.

```{r}
X100 = exp(matrix(rnorm(1000*100), ncol = 100))
```

* _Hierarchical clustering based algorithm_

```{r}
PB1.ward = pb_basis(X100, method = 'cluster')
```

* _Constrained search algorithm_

```{r}
PB1.constrained = pb_basis(X100, method = 'constrained')
```

We can compare their performance (variance explained by the first balance) with that of the first principal component.

```{r}
PC_approx = coordinates(X100, cbind(pc_basis(X100)[,1], PB1.ward[,1], PB1.constrained[,1]))
# colnames() labels the columns whether coordinates() returns a matrix or a data frame
colnames(PC_approx) = c('PC', 'Ward', 'Constrained')
apply(PC_approx, 2, var)
```

Finally, `coda.base` allows you to define the default CoDaPack basis, which is built from well-balanced balances, i.e. balances with, as far as possible, the same number of parts on each side.

```{r}
H4.ilr = coordinates(X, basis = 'cdp')
head(H4.ilr)
```

# Defining coordinates manually

## Defining coordinates with a specific basis

We can define the coordinates directly by providing the log-contrast matrix.

```{r}
B = matrix(c(-1,-1,2,0,
             1,0,-0.5,-0.5,
             -0.5,0.5,0,0), ncol = 3)
H1.man = coordinates(X, basis = B)
head(H1.man)
```
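Before using a hand-written matrix as a basis, it can be worth checking that its columns really are independent log-contrasts. A small base-R sketch of such a check, using a copy of the matrix above under the illustrative name `B_check`:

```{r}
# Copy of the log-contrast matrix defined above, one log-contrast per column
B_check = matrix(c(-1,-1,2,0,
                   1,0,-0.5,-0.5,
                   -0.5,0.5,0,0), ncol = 3)

# Every column sums to zero, so each one is a log-contrast
colSums(B_check)

# The rank equals the number of columns, so they are linearly independent
qr(B_check)$rank
```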

## Defining coordinates using balances

We can also define balances using formulas of the form `numerator~denominator`:

```{r}
B.man = sbp_basis(list(b1 = erc~jxcat,
                       b2 = psc~cs,
                       b3 = erc+jxcat~psc+cs), 
                  data=X)
H2.man = coordinates(X, basis = B.man)
head(H2.man)
```
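For reference, and assuming `sbp_basis` follows the usual normalisation for balances, a balance between a group $R$ of $r$ parts (numerator) and a group $S$ of $s$ parts (denominator) is the normalised log-ratio of their geometric means (the symbols $R$, $S$, $r$, $s$ and $b$ are introduced here for illustration):

$$
b = \sqrt{\frac{r\,s}{r+s}}\;
    \log \frac{\left(\prod_{i \in R} x_i\right)^{1/r}}
              {\left(\prod_{j \in S} x_j\right)^{1/s}} .
$$

Under that assumption, `b3 = erc+jxcat~psc+cs` has $r = s = 2$ and compares the geometric mean of `erc` and `jxcat` with that of `psc` and `cs`.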

With `sbp_basis` the balances we provide need not form a basis, nor even a generating system. For example, we can define a single balance:

```{r}
B = sbp_basis(list(b1 = erc+jxcat~psc+cs), 
              data=X)
H3.man = coordinates(X, basis = B)
head(H3.man)
```

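or more balances than the dimension of the coordinate space, which gives a generating system rather than a basis: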

```{r}
B = sbp_basis(list(b1 = erc~jxcat+psc+cs, 
                   b2 = jxcat~erc+psc+cs,
                   b3 = psc~erc+jxcat+cs,
                   b4 = cs~erc+jxcat+psc),
              data=X)
H4.man = coordinates(X, basis = B)
head(H4.man)
```

We can also provide only some of the partitions and let `sbp_basis` complete the sequential binary partition by setting `fill = TRUE`:


```{r}
B = sbp_basis(list(b1 = erc+jxcat~psc), 
              data=X, fill = TRUE)
sign(B)
```


We can also define a sequential binary partition using a matrix. In that case we do not need to pass a dataset: the number of components is taken from the number of rows, and the component names from the row names (if available).

```{r}
P =  matrix(c(1, 1,-1,-1,
              1,-1, 0, 0,
              0, 0, 1,-1), ncol= 3)
B = sbp_basis(P)
H5.man = coordinates(X, basis = B)
head(H5.man)
```
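To illustrate how a sign matrix relates to a basis of balances, the following base-R sketch converts each column of signs into balance coefficients using the usual normalisation. It is an illustration under that assumption, not a description of the internals of `sbp_basis`; the names `S`, `balance_column` and `B_sbp` are hypothetical.

```{r}
# Sign matrix of a sequential binary partition (same layout as P above)
S = matrix(c(1, 1,-1,-1,
             1,-1, 0, 0,
             0, 0, 1,-1), ncol = 3)

# Turn one column of signs into balance coefficients:
# r parts in the numerator, q parts in the denominator
balance_column = function(s) {
  r = sum(s > 0)
  q = sum(s < 0)
  ifelse(s > 0,  sqrt(q / (r * (r + q))),
  ifelse(s < 0, -sqrt(r / (q * (r + q))), 0))
}
B_sbp = apply(S, 2, balance_column)
B_sbp

# The cross-product is the identity matrix, so the balances are orthonormal
round(crossprod(B_sbp), 10)
```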