--- title: "ggBubbles" author: "Thomas Schwarzl" date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true vignette: > %\VignetteIndexEntry{ggMiniBubbles} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction Welcome to this vignette! ## Loading the package ```{r message=FALSE, warning=FALSE, include=FALSE, paged.print=FALSE} require(knitr) ``` You can load the package with this command ```{r message=FALSE, warning=FALSE, paged.print=FALSE} require(ggplot2) require(ggBubbles) ``` ## In 15 seconds The package introduces `position_surround()` for `r CRANpkg("ggplot2")`. Parameter is `offset` which controls the offsets for position corrections (default is 0.1). `position_surround()` can be used in many `r CRANpkg("ggplot2")` functions like `geom_point` or `geom_text`: ```{r, echo = F, fig.wide = TRUE, fig.height=5, fig.width=7.1, fig.cap = "Example of a MiniBubble plot"} suppressPackageStartupMessages({ require(dplyr) require(tibble) }) data(MusicianInterestsSmall) ggplot(data = MusicianInterestsSmall, aes(x = Instrument, y = Genre, col = Level)) + geom_point(position = position_surround(), size = 4) + scale_colour_manual(values = c("#00e5ff", "#4694ff", "#465aff", "#2c00c9")) + theme_bw(base_size = 18) ``` # Bubbleplot vs Minibubble plot Here we demonstrate the advantage of MiniBubble plots compared to traditional Bubbleplot in certain usecases with *discrete* data. Please not that in this vignette we will use `r CRANpkg("dplyr")` and `r CRANpkg("tibble")` from `r CRANpkg("tidyverse")`. ```{r message=FALSE, warning=FALSE, paged.print=FALSE} require(dplyr) require(tibble) ``` ## Example data First, we load a small example data ```{r} data(MusicianInterestsSmall) ``` which contains data from musicians about their experience in differente music genres they have with their music instruments. ```{r eval=FALSE} head(MusicianInterestsSmall) ``` ```{r echo=FALSE} kable(head(MusicianInterestsSmall), caption = "First rows of MusicianInterestsSmall") ``` ## Traditional Bubble plot The traditional bubble plot is able to portrait the amount of guitarrists or pianists able to play jazz or classical music by size and display the average experience level by colour coding. ```{r, fig.wide = TRUE, fig.height=5, fig.width=7.1, fig.cap = "Traditional Bubble Plot."} ggplot(data = MusicianInterestsSmall %>% group_by(Instrument, Genre) %>% summarize(Count = n(), AvgLevel = mean(as.integer(Level))), aes(x = Instrument, y = Genre, size = Count, col = AvgLevel)) + geom_point() + theme_bw(base_size = 18) + scale_colour_gradientn( colours = rev(topo.colors(2)), na.value = "transparent", breaks = as.integer(MusicianInterestsSmall$Level) %>% unique %>% sort, labels = levels(MusicianInterestsSmall$Level), limits = c(as.integer(MusicianInterestsSmall$Level) %>% min, as.integer(MusicianInterestsSmall$Level) %>% max)) + scale_size_continuous(range = c(3, 11)) ``` From a data visualisation point of view, it is debateable how good point sizes are to display counts. However, in general we can agree that averages often hide a lot of useful information. ## MiniBubble Plot The MiniBubble plot allows to show each musician and their corresponding skill level individually: ```{r, fig.wide = TRUE, fig.height=5, fig.width=7.1, fig.cap = "Example of a MiniBubbleplot with position_surround."} ggplot(data = MusicianInterestsSmall, aes(x = Instrument, y = Genre, col = Level)) + geom_point(position = position_surround(), size = 4) + scale_colour_manual(values = c("#00e5ff", "#4694ff", "#465aff", "#2c00c9")) + theme_bw(base_size = 18) ``` This is done by the `position_surround()` function passed to the `position` argument of `geom_point`. Note, that only exact overlaps will be dodged. The points will surround the center in layers which will be filled clockwise. Since each individual data point is shown seperately, you can also use `shape` and `fill` to show further features, as long the plot will not be overloaded with information. Also, you can use `geom_text(position = position_surround())` to overlay the points with text, or make the text appear in shiny when hovering. MiniBubbleplot allows to show more features in a bubble plot. # Offset parameter The offset of the dodged points can be handed as parameters to `position_surround()`. ```{r, fig.wide = TRUE, fig.height=5, fig.width=7.1, fig.cap = "Example of a MiniBubbleplot with position_surround."} ggplot(data = MusicianInterestsSmall, aes(x = Instrument, y = Genre, col = Level)) + geom_point(position = position_surround(offset = .2), size = 4) + scale_colour_manual(values = c("#00e5ff", "#4694ff", "#465aff", "#2c00c9")) + theme_bw(base_size = 18) ``` # Another example We load a bigger test data set: ```{r} data(MusicianInterests) ``` This dataset also contains information about the musicians themselves from the multiple - choice survey. ```{r eval=FALSE} head(MusicianInterests) ``` ```{r echo=FALSE} kable(head(MusicianInterests), caption = "First rows of MusicianInterests") ``` The basis of the plot is simply: ```{r Musician Genre Interest plot} p <- ggplot(data = MusicianInterests, aes(x = Genre, y = Instrument, col = Level)) + geom_point(size = 1.8, position = position_surround(offset = .2)) ``` Here we add some graphical parameters to make it pretty: ```{r, fig.cap = "Use case of position_surround. The levels of experiences can be displayed for each individual musician, rather than displaying an average over the group", fig.wide = TRUE, fig.height=7} p <- p + theme_bw(base_size = 17) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_colour_gradientn( colours = rev(topo.colors(2)), na.value = "transparent", breaks = 1:6, labels = c("Interested", "Beginner", "Intermediate", "Experienced", "Very experienced", "Pro")) + xlab("") + ylab("") p ``` # Algorithmic details The `position_surround()` algorithm determines how many points are overlaying and then displays the points in clockwise ration around the center in quadratic layers. Please see the graphical illustration here: ```{r, echo=FALSE, fig.height=7, fig.width=7.1, fig.wide = TRUE, fig.cap = "Demonstration of position_surround algorithm. Points will be added clockwise in layers."} n <- 25 X <- data.frame(x = rep(1, sum(1:n)), y = rep(1, sum(1:n)), group = unlist(lapply(1:n, function(x) { rep(x, x) })), label = unlist(lapply(1:n, function(x) { 1:x })) ) ggplot(data = X, aes(x = x, y = y, label = label)) + facet_wrap(~group) + geom_text(position = position_surround(offset=.4)) + theme_minimal() + theme( strip.background = element_blank(), strip.text.x = element_blank() ) + xlim(c(0,2)) + ylim(c(0,2)) ``` # Support or Feedback For feedback or suggestions please contact the maintainer: Thomas Schwarzl thomas@schwarzl.net or schwarzl@embl.de.