Title: | High Dimensional Categorical Data Visualization |
---|---|
Description: | Easy visualization for datasets with more than two categorical variables and additional continuous variables. 'diceplot' is particularly useful for exploring complex categorical data in the context of pathway analysis across multiple conditions. For detailed documentation, please visit <https://dice-and-domino-plot.readthedocs.io/en/latest/>. |
Authors: | Matthias Flotho [aut, cre] |
Maintainer: | Matthias Flotho <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.4 |
Built: | 2024-11-25 14:56:55 UTC |
Source: | CRAN |
Calculates the dot size based on the number of variables.
calculate_dot_size(num_vars, max_size, min_size)
num_vars |
Number of variables. |
max_size |
Maximal dot size for the plot to scale the dot sizes. |
min_size |
Minimal dot size for the plot to scale the dot sizes. |
A numeric value representing the dot size.
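The manual does not state the scaling rule, so the following is only a minimal sketch of one plausible rule: a linear interpolation from max_size (one variable) down to min_size (six variables). The helper name sketch_dot_size, the max_vars argument, and the linear rule itself are assumptions made for illustration and are not taken from the package.

```r
# Hypothetical helper: NOT the package's calculate_dot_size(). The linear
# rule and the 1..max_vars range are assumptions made for illustration.
sketch_dot_size <- function(num_vars, max_size, min_size, max_vars = 6) {
  stopifnot(max_size >= min_size, num_vars >= 1)
  # Clamp to the assumed range so the result stays in [min_size, max_size].
  n <- min(num_vars, max_vars)
  # Fraction of the way from 1 variable (largest dots) to max_vars (smallest).
  frac <- (n - 1) / (max_vars - 1)
  max_size - frac * (max_size - min_size)
}

# Dot sizes for 1 to 6 variables with the dice_plot defaults (5 and 2).
sizes <- sapply(1:6, sketch_dot_size, max_size = 5, min_size = 2)
print(sizes)
```

Under this assumed rule, more variables per cell give smaller dots, so that all pips still fit inside one die face.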
Creates custom legend plots for cat_c and group.
create_custom_legends(
  data,
  cat_c,
  group,
  cat_c_colors,
  group_colors,
  var_positions,
  num_vars,
  dot_size
)
data |
The original data frame. |
cat_c |
The name of the |
group |
The name of the group variable. |
cat_c_colors |
A named vector of colors for |
group_colors |
A named vector of colors for the group variable. |
var_positions |
Data frame with variable positions. |
num_vars |
Number of variables in |
dot_size |
The size of the dots used in the plot. |
A combined ggplot object of the custom legends.
Generates a data frame containing variable names from cat_c_colors and corresponding x and y offsets based on the number of variables.
create_var_positions(cat_c_colors, num_vars)
cat_c_colors |
A named vector of colors for variables in category C. The names correspond to variable names. |
num_vars |
The number of variables. Supported values are "3", "4", "5", or "6". |
A data frame with three columns: a factor of variable names from cat_c_colors, a numeric x-axis offset for plotting, and a numeric y-axis offset for plotting.
library(dplyr)
cat_c_colors <- c("Var1" = "red", "Var2" = "blue", "Var3" = "green")
create_var_positions(cat_c_colors, 3)
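To make the shape of the result concrete, here is a minimal base-R sketch that builds a comparable data frame by spreading the variables evenly on a small circle. This is a swapped-in layout, not the dice-face offsets the package uses. The helper name sketch_var_positions and the radius argument are hypothetical, and the column names var, x_offset and y_offset are borrowed from the prepare_plot_data example in this manual.

```r
# Hypothetical layout: NOT the offsets produced by create_var_positions().
# Variables are placed evenly on a circle purely for illustration.
sketch_var_positions <- function(cat_c_colors, radius = 0.2) {
  vars <- names(cat_c_colors)
  n <- length(vars)
  # One angle per variable, starting at 0 and going counter-clockwise.
  angles <- 2 * pi * (seq_len(n) - 1) / n
  data.frame(
    var = factor(vars, levels = vars),  # keep the order given by the colors
    x_offset = radius * cos(angles),
    y_offset = radius * sin(angles)
  )
}

cat_c_colors <- c("Var1" = "red", "Var2" = "blue", "Var3" = "green")
positions <- sketch_var_positions(cat_c_colors)
print(positions)
```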
This function generates a custom plot based on three categorical variables and a group variable. It adapts to the number of unique categories in cat_c and allows customization of various plot aesthetics.
dice_plot(
  data,
  cat_a,
  cat_b,
  cat_c,
  group = NULL,
  group_alpha = 0.5,
  title = NULL,
  cat_c_colors = NULL,
  group_colors = NULL,
  custom_theme = theme_minimal(),
  max_dot_size = 5,
  min_dot_size = 2,
  legend_width = 0.25,
  legend_height = 0.5,
  base_width_per_cat_a = 0.5,
  base_height_per_cat_b = 0.3,
  reverse_ordering = FALSE,
  cat_b_order = NULL,
  cluster_by_row = TRUE,
  cluster_by_column = TRUE,
  show_legend = TRUE
)
data |
A data frame containing the categorical and group variables for plotting. |
cat_a |
A string representing the column name in |
cat_b |
A string representing the column name in |
cat_c |
A string representing the column name in |
group |
A string representing the column name in |
group_alpha |
A numeric value for the transparency level of the group rectangles. Default is |
title |
An optional string for the plot title. Defaults to |
cat_c_colors |
A named vector of colors for |
group_colors |
A named vector of colors for the group variable, or a string to choose a ColorBrewer palette. Defaults to |
custom_theme |
A ggplot2 theme for customizing the plot's appearance. Defaults to |
max_dot_size |
Maximal dot size for the plot to scale the dot sizes. |
min_dot_size |
Minimal dot size for the plot to scale the dot sizes. |
legend_width |
Relative width of your legend. Default is 0.25. |
legend_height |
Relative height of your legend. Default is 0.5. |
base_width_per_cat_a |
Used for dynamically scaling the width. Default is 0.5. |
base_height_per_cat_b |
Used for dynamically scaling the height. Default is 0.3. |
reverse_ordering |
Should the cluster ordering be reversed? Default is FALSE. |
cat_b_order |
An explicit order for cat_b, if you want to pass one. Default is NULL. |
cluster_by_row |
Cluster rows, defaults to TRUE |
cluster_by_column |
Cluster columns, defaults to TRUE |
show_legend |
Do you want to show the legend? Default is TRUE |
A ggplot object representing the dice plot.
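As a usage sketch, the block below builds a small long-format data frame and wraps the documented call in a helper. The toy data, the column names, the colors, and the helper name make_dice_plot are all hypothetical. The check that each cat_b value belongs to exactly one group is an assumption about what the plot expects, and the 3 to 6 range for cat_c levels is taken from the create_var_positions entry in this manual. The helper is only defined here, not called, so the block runs without the package installed.

```r
# Hypothetical toy data: one row per observed (cat_a, cat_b, cat_c) combination.
toy <- data.frame(
  CellType  = c("Neuron", "Neuron", "Astrocyte", "Astrocyte", "Microglia", "Microglia"),
  Pathway   = c("Apoptosis", "Inflammation", "Apoptosis", "Metabolism",
                "Inflammation", "Metabolism"),
  Pathology = c("Stroke", "Cancer", "Flu", "Stroke", "Cancer", "Flu"),
  Group     = c("Death", "Immune", "Death", "Energy", "Immune", "Energy"),
  stringsAsFactors = FALSE
)

# Named color vectors; names must match the values found in the data.
cat_c_colors <- c(Stroke = "#d5cccd", Cancer = "#cb9992", Flu = "#ad310f")
group_colors <- c(Death = "#333333", Immune = "#1f77b4", Energy = "#2ca02c")

# Sanity checks before plotting.
n_levels <- length(unique(toy$Pathology))
# Assumption: every cat_b value is assigned to a single group.
groups_per_pathway <- tapply(toy$Group, toy$Pathway, function(g) length(unique(g)))
stopifnot(
  n_levels >= 3, n_levels <= 6,            # range documented for create_var_positions
  all(groups_per_pathway == 1),
  all(unique(toy$Pathology) %in% names(cat_c_colors)),
  all(unique(toy$Group) %in% names(group_colors))
)

# Wrapper around the documented signature; requires diceplot when called.
make_dice_plot <- function(df) {
  diceplot::dice_plot(
    data = df,
    cat_a = "CellType",
    cat_b = "Pathway",
    cat_c = "Pathology",
    group = "Group",
    cat_c_colors = cat_c_colors,
    group_colors = group_colors,
    title = "Toy dice plot"
  )
}
```

With diceplot 0.1.4 installed, `p <- make_dice_plot(toy)` should return the ggplot object described above.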
This function generates a plot to visualize gene expression levels for a given list of genes. The size of the dots can be customized, and the plot can be saved to an output file if specified.
domino_plot(
  data,
  gene_list,
  switch_axis = FALSE,
  min_dot_size = 1,
  max_dot_size = 5,
  spacing_factor = 3,
  var_id = "var",
  feature_col = "gene",
  celltype_col = "Celltype",
  contrast_col = "Contrast",
  contrast_levels = c("Clinical", "Pathological"),
  contrast_labels = c("Clinical", "Pathological"),
  logfc_col = "avg_log2FC",
  pval_col = "p_val_adj",
  logfc_limits = c(-1.5, 1.5),
  logfc_colors = c(low = "blue", mid = "white", high = "red"),
  color_scale_name = "Log2 Fold Change",
  size_scale_name = "-log10(adj. p-value)",
  axis_text_size = 8,
  aspect_ratio = NULL,
  base_width = 5,
  base_height = 4,
  output_file = NULL
)
data |
A data frame containing gene expression data. |
gene_list |
A character vector of gene names to include in the plot. |
switch_axis |
A logical value indicating whether to switch the x and y axes. Default is |
min_dot_size |
A numeric value indicating the minimum dot size in the plot. Default is |
max_dot_size |
A numeric value indicating the maximum dot size in the plot. Default is |
spacing_factor |
A numeric value indicating the spacing between gene pairs. Default is |
var_id |
A string representing the column name in |
feature_col |
A string representing the column name in |
celltype_col |
A string representing the column name in |
contrast_col |
A string representing the column name in |
contrast_levels |
A character vector specifying the levels of the contrast variable. Default is |
contrast_labels |
A character vector specifying the labels for the contrasts in the plot. Default is |
logfc_col |
A string representing the column name in |
pval_col |
A string representing the column name in |
logfc_limits |
A numeric vector of length 2 specifying the limits for the log fold change color scale. Default is |
logfc_colors |
A named vector specifying the colors for the low, mid, and high values in the color scale. Default is |
color_scale_name |
A string specifying the name of the color scale in the legend. Default is |
size_scale_name |
A string specifying the name of the size scale in the legend. Default is |
axis_text_size |
A numeric value specifying the size of the axis text. Default is |
aspect_ratio |
A numeric value specifying the aspect ratio of the plot. If |
base_width |
A numeric value specifying the base width for saving the plot. Default is |
base_height |
A numeric value specifying the base height for saving the plot. Default is |
output_file |
An optional string specifying the path to save the plot. If |
A ggplot object representing the domino plot.
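The default legend names suggest that color encodes the log2 fold change within logfc_limits and that size encodes -log10 of the adjusted p-value. The base-R sketch below derives those two quantities from a hypothetical table that uses the default column names. It does not call domino_plot. Clamping out-of-range fold changes and rescaling sizes linearly are assumptions, and the var column is left out because its description is cut off in this manual.

```r
# Hypothetical differential-expression table with the default column names.
de <- data.frame(
  gene       = rep(c("GeneA", "GeneB"), each = 4),
  Celltype   = rep(rep(c("T cell", "B cell"), each = 2), times = 2),
  Contrast   = rep(c("Clinical", "Pathological"), times = 4),
  avg_log2FC = c(2.0, -0.5, 0.3, -2.2, 1.1, 0.0, -0.8, 0.6),
  p_val_adj  = c(1e-4, 0.03, 0.5, 1e-6, 0.01, 1, 0.2, 0.001),
  stringsAsFactors = FALSE
)

# Color aesthetic. Assumption: values outside the limits are clamped to them.
logfc_limits <- c(-1.5, 1.5)
de$logfc_clamped <- pmin(pmax(de$avg_log2FC, logfc_limits[1]), logfc_limits[2])

# Size aesthetic: smaller adjusted p-values give larger values.
de$neg_log10_p <- -log10(de$p_val_adj)

# Assumption: sizes are rescaled linearly onto [min_dot_size, max_dot_size].
min_dot_size <- 1
max_dot_size <- 5
rng <- range(de$neg_log10_p)
de$dot_size <- min_dot_size +
  (de$neg_log10_p - rng[1]) / diff(rng) * (max_dot_size - min_dot_size)

print(de[, c("gene", "Celltype", "Contrast", "logfc_clamped", "dot_size")])
```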
Determines the ordering of category B based on the counts within each group, ordered by group and count.
order_cat_b(data, group, cat_b, group_colors, reverse_order = FALSE)
data |
A data frame containing the variables. |
group |
The name of the column representing the grouping variable. |
cat_b |
The name of the column representing category B. |
group_colors |
A named vector of colors for each group. The names correspond to group names. |
reverse_order |
Reverse the ordering? Default is FALSE. |
A vector of category B labels ordered according to group and count.
library(dplyr)
data <- data.frame(
  group = rep(c("G1", "G2"), each = 5),
  cat_b = sample(LETTERS[1:3], 10, replace = TRUE)
)
group_colors <- c("G1" = "red", "G2" = "blue")
order_cat_b(data, "group", "cat_b", group_colors)
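The ordering "by group and count" can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: groups are taken in the order of names(group_colors), counts are sorted in descending order within each group, and the helper name sketch_order_cat_b is hypothetical. The data are fixed rather than sampled so the result is reproducible.

```r
# Deterministic toy data: B is more frequent than A in G1, D more than C in G2.
data <- data.frame(
  group = c("G1", "G1", "G1", "G2", "G2", "G2", "G2"),
  cat_b = c("A",  "B",  "B",  "C",  "D",  "D",  "D"),
  stringsAsFactors = FALSE
)
group_colors <- c("G1" = "red", "G2" = "blue")

# Hypothetical re-implementation for illustration only.
sketch_order_cat_b <- function(data, group, cat_b, group_colors,
                               reverse_order = FALSE) {
  # Count rows per (group, cat_b) pair and drop pairs that never occur.
  counts <- as.data.frame(
    table(group = data[[group]], cat_b = data[[cat_b]]),
    stringsAsFactors = FALSE
  )
  counts <- counts[counts$Freq > 0, ]
  # Assumption: groups follow names(group_colors); larger counts come first.
  group_rank <- match(counts$group, names(group_colors))
  counts <- counts[order(group_rank, -counts$Freq), ]
  labels_in_order <- unique(counts$cat_b)
  if (reverse_order) rev(labels_in_order) else labels_in_order
}

cat_b_order <- sketch_order_cat_b(data, "group", "cat_b", group_colors)
print(cat_b_order)
```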
Performs hierarchical clustering on category A based on the binary presence of combinations of categories B and C.
perform_clustering(data, cat_a, cat_b, cat_c)
data |
A data frame containing the variables. |
cat_a |
The name of the column representing category A. |
cat_b |
The name of the column representing category B. |
cat_c |
The name of the column representing category C. |
A vector of category A labels ordered according to the hierarchical clustering.
library(dplyr)
library(tidyr)
library(tibble)
data <- data.frame(
  cat_a = rep(letters[1:5], each = 4),
  cat_b = rep(LETTERS[1:2], times = 10),
  cat_c = sample(c("Var1", "Var2", "Var3"), 20, replace = TRUE)
)
perform_clustering(data, "cat_a", "cat_b", "cat_c")
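The idea of clustering category A on the binary presence of B and C combinations can be sketched with base R alone. The distance measure (binary, i.e. Jaccard) and the linkage (the hclust default, complete) are assumptions, since the manual does not name them, and sketch_clustering is a hypothetical helper rather than the package function.

```r
# Deterministic toy data: "a" and "b" share all combinations, "c" and "d" share one.
data <- data.frame(
  cat_a = c("a", "a", "b", "b", "c", "c", "d", "d"),
  cat_b = c("A", "B", "A", "B", "A", "B", "A", "B"),
  cat_c = c("Var1", "Var2", "Var1", "Var2", "Var3", "Var3", "Var3", "Var1"),
  stringsAsFactors = FALSE
)

# Hypothetical re-implementation for illustration only.
sketch_clustering <- function(data, cat_a, cat_b, cat_c) {
  # One column per observed (cat_b, cat_c) combination.
  combo <- paste(data[[cat_b]], data[[cat_c]], sep = "_")
  # Rows are cat_a levels; entries are 1 if the combination occurs, else 0.
  presence <- (unclass(table(data[[cat_a]], combo)) > 0) * 1
  # Assumptions: binary (Jaccard) distance and complete linkage.
  hc <- hclust(dist(presence, method = "binary"))
  hc$labels[hc$order]
}

cat_a_order <- sketch_clustering(data, "cat_a", "cat_b", "cat_c")
print(cat_a_order)
```

Labels with identical or similar presence patterns end up next to each other in the returned order, which is what makes the resulting plot easier to read.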
Prepares data for plotting boxes by calculating box boundaries based on category positions.
prepare_box_data(data, cat_a, cat_b, group, cat_a_order, cat_b_order)
data |
A data frame containing the variables. |
cat_a |
The name of the column representing category A. |
cat_b |
The name of the column representing category B. |
group |
The name of the column representing the grouping variable. |
cat_a_order |
A vector specifying the order of category A. |
cat_b_order |
A vector specifying the order of category B. |
A data frame with box boundaries for plotting.
library(dplyr)
data <- data.frame(
  cat_a = rep(letters[1:3], each = 2),
  cat_b = rep(LETTERS[1:2], times = 3),
  group = rep(c("G1", "G2"), times = 3)
)
cat_a_order <- c("a", "b", "c")
cat_b_order <- c("A", "B")
prepare_box_data(data, "cat_a", "cat_b", "group", cat_a_order, cat_b_order)
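A rough sketch of what "box boundaries based on category positions" could mean: each category is mapped to its index in the supplied order, and a box extends a fixed half-width around that index. The helper name sketch_box_data, the half argument, and the output column names x_min, x_max, y_min and y_max are all hypothetical; the package may use different names and sizes.

```r
data <- data.frame(
  cat_a = rep(letters[1:3], each = 2),
  cat_b = rep(LETTERS[1:2], times = 3),
  group = rep(c("G1", "G2"), times = 3),
  stringsAsFactors = FALSE
)
cat_a_order <- c("a", "b", "c")
cat_b_order <- c("A", "B")

# Hypothetical re-implementation for illustration only.
sketch_box_data <- function(data, cat_a, cat_b, group,
                            cat_a_order, cat_b_order, half = 0.4) {
  # One box per distinct (cat_a, cat_b, group) combination.
  boxes <- unique(data[, c(cat_a, cat_b, group)])
  # Integer grid positions given by the requested category orders.
  x <- match(boxes[[cat_a]], cat_a_order)
  y <- match(boxes[[cat_b]], cat_b_order)
  # Assumption: boxes are centered on the grid position with a fixed half-width.
  boxes$x_min <- x - half
  boxes$x_max <- x + half
  boxes$y_min <- y - half
  boxes$y_max <- y + half
  rownames(boxes) <- NULL
  boxes
}

boxes <- sketch_box_data(data, "cat_a", "cat_b", "group", cat_a_order, cat_b_order)
print(boxes)
```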
Prepares data for plotting by calculating positions based on provided variable positions and orders.
prepare_plot_data(
  data,
  cat_a,
  cat_b,
  cat_c,
  group,
  var_positions,
  cat_a_order,
  cat_b_order
)
data |
A data frame containing the variables. |
cat_a |
The name of the column representing category A. |
cat_b |
The name of the column representing category B. |
cat_c |
The name of the column representing category C. |
group |
The name of the column representing the grouping variable. |
var_positions |
A data frame with variable positions, typically output from |
cat_a_order |
A vector specifying the order of category A. |
cat_b_order |
A vector specifying the order of category B. |
A data frame ready for plotting with added x_pos and y_pos columns.
library(dplyr)
data <- data.frame(
  cat_a = rep(letters[1:3], each = 4),
  cat_b = rep(LETTERS[1:2], times = 6),
  cat_c = rep(c("Var1", "Var2"), times = 6),
  group = rep(c("G1", "G2"), times = 6)
)
var_positions <- data.frame(
  var = c("Var1", "Var2"),
  x_offset = c(0.1, -0.1),
  y_offset = c(0.1, -0.1)
)
cat_a_order <- c("a", "b", "c")
cat_b_order <- c("A", "B")
prepare_plot_data(data, "cat_a", "cat_b", "cat_c", "group", var_positions, cat_a_order, cat_b_order)
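One way to read the x_pos and y_pos columns is as a grid position plus a per-variable offset. The base-R sketch below follows that reading: the grid position is the index of the category in the supplied order, and the offset is looked up in var_positions by cat_c. The additive rule and the helper name sketch_plot_data are assumptions, not the package's confirmed behavior.

```r
data <- data.frame(
  cat_a = c("a", "a", "b"),
  cat_b = c("A", "B", "A"),
  cat_c = c("Var1", "Var2", "Var2"),
  stringsAsFactors = FALSE
)
var_positions <- data.frame(
  var = c("Var1", "Var2"),
  x_offset = c(0.1, -0.1),
  y_offset = c(0.1, -0.1)
)
cat_a_order <- c("a", "b", "c")
cat_b_order <- c("A", "B")

# Hypothetical re-implementation for illustration only.
sketch_plot_data <- function(data, cat_a, cat_b, cat_c, var_positions,
                             cat_a_order, cat_b_order) {
  # Row of var_positions that belongs to each observation's cat_c value.
  idx <- match(data[[cat_c]], var_positions$var)
  # Assumption: final position = grid index + variable-specific offset.
  data$x_pos <- match(data[[cat_a]], cat_a_order) + var_positions$x_offset[idx]
  data$y_pos <- match(data[[cat_b]], cat_b_order) + var_positions$y_offset[idx]
  data
}

plot_data <- sketch_plot_data(data, "cat_a", "cat_b", "cat_c",
                              var_positions, cat_a_order, cat_b_order)
print(plot_data)
```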