--- title: "Magnitude table suppression" author: "" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Magnitude table suppression} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} options(rmarkdown.html_vignette.check_title = FALSE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r include = FALSE} htmltables <- TRUE if (htmltables) { source("GaussKable.R") source("KableMagnitudeTable.R") P <- function(...) G(timevar = "geo", ...) M <- function(...) KableMagnitudeTable(..., numVar = "value", timevar = "geo", singletonMethod = "none") } else { P <- function(...) cat("Formatted table not avalable") M <- P } ``` ## Introduction The `GaussSuppression` package contains several easy-to-use wrapper functions and in this vignette we will look at the `SuppressFewContributors` and `SuppressDominantCells` functions. In these functions, primary suppression is based on the number of contributors or by dominance rules. Then, as always in this package, secondary suppression is performed using the Gauss method. We begin by loading a dataset to be used below. ```{r} library(GaussSuppression) dataset <- SSBtoolsData("magnitude1") dataset ``` We can imagine the figures in the variable `"value"` represent sales value to different sectors from different companies. In the first examples, we will not use the `"company"` variable, but instead assume that each row represents the contribution of a unique company. Our input data can then be reformatted and illustrated like this: \ ```{r echo=FALSE} M(caption = '**Table 1**: Input data with the 20 contributions.', dataset, formula = ~sector4:geo-1) ``` \ ## Initial basic examples ### Using few contributors primary suppression In the first example, we use `SuppressFewContributors` with `maxN = 1`. This means that cells based on a single contributor are primary suppressed. ```{r} SuppressFewContributors(data=dataset, numVar = "value", dimVar= c("sector4", "geo"), maxN=1) ``` In the output, the number of contributors is in columns `nRule` and `nAll`. The two columns are equal under normal usage. A formatted version of this output is given in Table 2 below. Primary suppressed cells are underlined and labeled in red, while the secondary suppressed cells are labeled in purple. \ ```{r echo=FALSE} P(caption = '**Table 2**: Output from `SuppressFewContributors` with `maxN = 1` (number of contributors in parenthesis)', data=dataset, numVar = "value", dimVar= c("sector4", "geo"), maxN = 1, fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")') ``` \ ### Using dominant cell primary suppression In the second example, we use `SuppressDominantCells` with `n = 1` and `k = 80`. This means that aggregates are primary suppressed whenever the largest contribution exceeds 80% of the cell total. ```{r} SuppressDominantCells(data=dataset, numVar = "value", dimVar= c("sector4", "geo"), n = 1, k = 80, allDominance = TRUE) ``` To incorporate the percentage of the two largest contributions in the output, the parameter `allDominance = TRUE` was utilized. A formatted version of this output is given in Table 3 below. \ ```{r echo=FALSE} P(caption = '**Table 3**: Output from `SuppressDominantCells`
with `n = 1` and `k = 80`
(percentage from largest contribution in parenthesis)', data=dataset, numVar = "value", dimVar= c("sector4", "geo"), n=1, k=80, allDominance = TRUE, fun = SuppressDominantCells, print_expr = 'paste0(value, " (",round(100*`primary.1:80`) ,"%) ")') ``` Note that this table, as well as Table 2, is discussed below in the section on the singleton problem. \ \ ## An hierarchical table and two dominance rules Here we use `SuppressDominantCells` with `n = 1:2` and `k = c(80, 99)`. This means that aggregates are primary suppressed whenever the largest contribution exceeds 80% of the cell total or when the two largest contributions exceed 99% of the cell total. In addition, the example below is made even more advanced by including the variables "sector2" and "eu". ```{r} output <- SuppressDominantCells(data=dataset, numVar = "value", dimVar= c("sector4", "sector2", "geo", "eu"), n = 1:2, k = c(80, 99)) head(output) ``` \ ```{r echo=FALSE} P(caption = '**Table 4**: Output from `SuppressDominantCells`
with `n = 1:2` and `k = c(80, 99)`', data=dataset, numVar = "value", dimVar= c("sector4", "sector2", "geo", "eu"), n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value') ``` \ As described in the define-tables vignette hierarchies are here detected automatically. The same output is obtained if we first generate hierarchies by: ```{r} dimlists <- SSBtools::FindDimLists(dataset[c("sector4", "sector2", "geo", "eu")]) dimlists ``` And thereafter run SuppressDominantCells with these hierarchies as input: ```{r} output <- SuppressDominantCells(data=dataset, numVar = "value", hierarchies = dimlists, n = 1:2, k = c(80, 99)) ``` \ \ ## Using the formula interface Using the formula interface is one way to achieve fewer cells in the output. Below we use `SuppressFewContributors` with `maxN = 2`. This means that table cells based on one or two contributors are primary suppressed. ```{r} output <- SuppressFewContributors(data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, maxN=2, removeEmpty = FALSE) head(output) tail(output) ``` In the formatted version of this output, blank cells indicate that they are not included in the output. \ ```{r echo=FALSE} P(caption = '**Table 5**: Output from `SuppressFewContributors` with `maxN = 2`
(number of contributors in parenthesis)', data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, maxN=2, removeEmpty = FALSE, fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")') ``` \ Please note that in order to include the three empty cells with no contributors, the `removeEmpty` parameter was set to `FALSE`. By default, this parameter is set to `TRUE` when using the formula interface. \ \ ## Using the contributorVar parameter (`contributorVar = "company"`) According to the `"company"` variable in the data set, there are only four contribution companies (A, B, C and D). We specify this using the `contributorVar` parameter, which corresponds to a variable within the dataset, in this case, `"company"`. In general, this variable refers to the holding information to be used by the suppression method. When this is taken into account, the primary suppression rules will be applied to data that, within each cell, is aggregated within each contributor. Our example data aggregated in this way is shown below. \ ```{r echo=FALSE} M(caption = '**Table 6**: The "value" data aggregated according to hierarchy and contributor', dataset, dimVar = c("sector4", "sector2", "geo", "eu"), contributorVar = "company") ``` \ Below we take into account contributor IDs when using few contributors primary suppression. ```{r} output <- SuppressFewContributors(data=dataset, numVar = "value", dimVar = c("sector4", "sector2", "geo", "eu"), maxN=2, contributorVar = "company") head(output) ``` \ ```{r echo=FALSE} P(caption = '**Table 7**: Output from `SuppressFewContributors` with `maxN = 2` and with `contributorVar = "company"` (number contributors in parenthesis)', data=dataset, numVar = "value", dimVar = c("sector4", "sector2", "geo", "eu"), maxN=2, contributorVar = "company", fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")') ``` \ Below we take into account contributor IDs when using dominant cell primary suppression. ```{r} output <- SuppressDominantCells(data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, contributorVar = "company", n = 1:2, k = c(80, 99)) head(output) ``` Here we have also made use of the formula interface. \ ```{r echo=FALSE} P(caption = '**Table 8**: Output from `SuppressDominantCells` with `n = 1:2` and
`k = c(80, 99)` and with `contributorVar = "company"`', data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, contributorVar = "company", n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value') ``` \ \ ## The singleton problem Below, the data is suppressed in the same way as in Table 7, but with a different formula. \ ```{r} output <- SuppressDominantCells(data=dataset, numVar = "value", formula = ~sector4*geo + sector2*eu, contributorVar = "company", n = 1:2, k = c(80, 99)) head(output) ``` \ ```{r echo=FALSE} P(caption = '**Table 9**: Output from `SuppressDominantCells` with `n = 1:2` and `k = c(80, 99)` and with `contributorVar = "company"`', data=dataset, numVar = "value", formula = ~sector4*geo + sector2*eu, contributorVar = "company", n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value') ``` \ By using `singletonMethod = "none"` in this case, *Entertainment:Spain* will not be suppressed. This cell is suppressed due to the default handling of the singleton problem. The reason is that *Entertainment:Iceland* has a single contributor. This contributor can reveal *Entertainment:Portugal* if *Entertainment:Spain* is not suppressed. Here it might appear that the table contains another issue, *Entertainment:Iceland* can reveal *Industry:Iceland*. However, this can be considered ok in this case. *Industry:Iceland* is secondary suppressed and the only reason for this is to protect *Entertainment:Iceland*. Nevertheless, in most cases, secondary suppressed cells introduce further complexity to the handling of singletons. A part of the singleton handling of magnitude tables is to add virtual primary suppressed cells prior to the secondary suppression algorithm. Secondary suppressed cells cannot, therefore, be treated in this way. However, another part of the singleton handling solves many of the remaining problems. This is done within the suppression algorithm. Here we can observe the effect of this in Tables 2 and 3. By using `singletonMethod = "none"` in Table 2, *Industry:Portugal* will not be suppressed. In that case, *Industry:Spain* can reveal *Industry:Iceland* and consequently *Entertainment:Iceland*. In Table 3, *Governmental:Total* and *Industry:Total* are suppressed due to advanced singleton handling. In fact, by using `singletonMethod = "none"`, all tables above will be suppressed differently.