Title: | Exploratory Data Analysis using Tiled One-Dimensional Graphics |
---|---|
Description: | Streamlines exploratory data analysis by providing a turnkey approach to visualising n-dimensional data which graphically reveals correlative or associative relationships between 2 or more features. Represents all dataset features as distinct, vertically aligned bar or tile plots, with plot types auto-selected based on whether variables are categorical or numeric. |
Authors: | Sam El-Kamand [aut, cre] , Children's Cancer Institute Australia [cph] |
Maintainer: | Sam El-Kamand <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-12-10 01:36:16 UTC |
Source: | CRAN |
Takes an input string and 'beautifies' it by converting underscores to spaces and capitalising each word.
beautify(string, autodetect_units = TRUE)
string |
input string |
autodetect_units |
automatically detect units (e.g. mm, kg, etc) and wrap in brackets. |
string
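A rough base R sketch of this kind of text clean-up is shown below. The helper name `beautify_sketch` is hypothetical and this is not gg1d's implementation: it only covers the underscore and capitalisation steps (the latter as described for `beautify_text`), and it makes no attempt at unit detection because the exact rules are not given here.

```r
# Hypothetical helper, NOT the gg1d implementation.
# Assumption: "beautifying" means underscores become spaces and
# the first letter of each word is capitalised.
beautify_sketch <- function(string) {
  spaced <- gsub("_", " ", string, fixed = TRUE)
  # \\U upper-cases the captured first letter of each word (needs perl = TRUE)
  gsub("\\b([a-z])", "\\U\\1", spaced, perl = TRUE)
}

label <- beautify_sketch("tumour_size")
print(label)  # "Tumour Size"
```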
Parse a tibble and ensure it meets standards
column_info_table(
  data,
  maxlevels = 6,
  col_id = NULL,
  cols_to_plot,
  tooltip_column_suffix = "_tooltip",
  ignore_column_regex = "_ignore$",
  palettes,
  colours_default,
  colours_default_logical,
  verbose
)
data |
data.frame to autoplot (data.frame) |
maxlevels |
for categorical variables, what is the maximum number of distinct values to allow (too many will make it hard to find a palette that suits). (number) |
col_id |
name of the column to use as an identifier. If NULL, artificial IDs will be created based on row number. |
cols_to_plot |
names of columns in data that should be plotted. By default plots all valid columns (character) |
tooltip_column_suffix |
the suffix added to a column name that indicates column should be used as a tooltip (string) |
ignore_column_regex |
a regex string that, if it matches a column name, will cause that column to be excluded from plotting (string) (default: "_ignore$") |
palettes |
A list of named vectors. List names correspond to data column names (categorical only). Vector names correspond to the levels of each column and vector values are colours, so the vector names map values in data to a colour. |
colours_default |
Default colors for categorical variables without a custom palette. |
colours_default_logical |
Colors for binary variables: a vector of three colors representing |
verbose |
Numeric value indicating the verbosity level:
|
tibble with the following columns:
colnames
coltype (categorical/numeric/tooltip/invalid)
ndistinct (number of distinct values)
plottable (should this column be plotted)
tooltip_col (the name of the column to use as the tooltip) or NA if no obvious tooltip column found
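To make the shape of this table concrete, here is a minimal base R sketch that builds a similar summary. The function `column_info_sketch` and its classification rules are assumptions for illustration only (tooltip columns detected by suffix, numeric columns by type, categorical columns limited by `maxlevels`); the real `column_info_table()` may apply different or additional checks, and the `tooltip_col` column is left out.

```r
# Hypothetical sketch, NOT the gg1d implementation.
column_info_sketch <- function(data, maxlevels = 6,
                               tooltip_column_suffix = "_tooltip",
                               ignore_column_regex = "_ignore$") {
  cols <- names(data)
  is_tooltip <- endsWith(cols, tooltip_column_suffix)
  is_numeric <- vapply(data, is.numeric, logical(1))
  # Count distinct non-missing values per column
  ndistinct <- vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1))

  # Assumed rules: suffix wins, then numeric, then a level cap for categoricals
  coltype <- ifelse(is_tooltip, "tooltip",
    ifelse(is_numeric, "numeric",
      ifelse(ndistinct <= maxlevels, "categorical", "invalid")))

  data.frame(
    colnames = cols,
    coltype = coltype,
    ndistinct = ndistinct,
    plottable = coltype %in% c("categorical", "numeric") &
      !grepl(ignore_column_regex, cols),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

sex <- rep(c("F", "M"), 4)
df <- data.frame(
  ID = paste0("s", 1:8),
  Age = c(34, 51, 28, 45, 39, 62, 30, 57),
  Sex = sex,
  Sex_tooltip = paste("Sex:", sex),
  Notes_ignore = rep(c("a", "b"), 4),
  stringsAsFactors = FALSE
)
info <- column_info_sketch(df)
print(info)
```

In this toy data `ID` has eight distinct values, so with `maxlevels = 6` the sketch marks it as invalid, while `Notes_ignore` is classified as categorical but flagged as not plottable because its name matches the ignore regex.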
Visualize all columns in a data frame with gg1d's vertically aligned plots and automatic plot selection based on variable type. Plots are fully interactive, and custom tooltips can be added.
gg1d(
  data,
  col_id = NULL,
  col_sort = NULL,
  order_matches_sort = TRUE,
  maxlevels = 6,
  verbose = 2,
  drop_unused_id_levels = FALSE,
  interactive = TRUE,
  return = c("plot", "column_info", "data"),
  palettes = NULL,
  sort_type = c("frequency", "alphabetical"),
  desc = TRUE,
  limit_plots = TRUE,
  max_plottable_cols = 15,
  cols_to_plot = NULL,
  tooltip_column_suffix = "_tooltip",
  ignore_column_regex = "_ignore$",
  convert_binary_numeric_to_factor = TRUE,
  options = gg1d_options(show_legend = !interactive)
)
data |
data.frame to autoplot (data.frame) |
col_id |
name of the column to use as an identifier. If NULL, artificial IDs will be created based on row number. |
col_sort |
names of the columns to sort on. To do a hierarchical sort, supply a vector of column names in the order they should be sorted (character). |
order_matches_sort |
should the column plots be stacked top-to-bottom in the order they appear in col_sort (flag) |
maxlevels |
for categorical variables, what is the maximum number of distinct values to allow (too many will make it hard to find a palette that suits). (number) |
verbose |
Numeric value indicating the verbosity level:
|
drop_unused_id_levels |
if col_id is a factor with unused levels, should these be dropped or included in visualisation |
interactive |
produce interactive ggiraph visualisation (flag) |
return |
a string describing what this function should return. Options are "plot", "column_info" and "data". |
palettes |
A list of named vectors. List names correspond to data column names (categorical only). Vector names correspond to the levels of each column and vector values are colours, so the vector names map values in data to a colour. |
sort_type |
controls how categorical variables are sorted. Numerical variables are always sorted in numerical order irrespective of the value given here. Options are "frequency" or "alphabetical". |
desc |
sort in descending order (flag) |
limit_plots |
throw an error when there are more than max_plottable_cols plottable columns (flag) |
max_plottable_cols |
maximum number of columns that can be plotted (default: 15) (number) |
cols_to_plot |
names of columns in data that should be plotted. By default plots all valid columns (character) |
tooltip_column_suffix |
the suffix added to a column name that indicates column should be used as a tooltip (string) |
ignore_column_regex |
a regex string that, if it matches a column name, will cause that column to be excluded from plotting (string) (default: "_ignore$") |
convert_binary_numeric_to_factor |
If a numeric column contains only the values 0, 1, and NA, then automatically convert it to a factor. |
options |
a list of additional visual parameters created by calling gg1d_options() |
ggiraph interactive visualisation
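The check described for `convert_binary_numeric_to_factor` can be sketched in base R as follows. The helper names `is_binary_numeric` and `as_factor_if_binary` are hypothetical and only illustrate the rule stated above (a numeric column holding nothing but 0, 1, and NA); how gg1d performs the conversion internally is not shown here.

```r
# Hypothetical helpers, NOT the gg1d implementation.
# TRUE when a numeric vector contains only 0, 1 and NA
is_binary_numeric <- function(x) {
  is.numeric(x) && all(x %in% c(0, 1, NA))
}

# Convert to a factor only when the vector passes the binary check
as_factor_if_binary <- function(x) {
  if (is_binary_numeric(x)) factor(x, levels = c(0, 1)) else x
}

wears_glasses <- c(0, 1, NA, 1)
age <- c(34, 51, NA, 28)

converted <- as_factor_if_binary(wears_glasses)  # becomes a factor
unchanged <- as_factor_if_binary(age)            # stays numeric
```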
library(gg1d)

path_gg1d <- system.file("example.csv", package = "gg1d")
df <- read.csv(path_gg1d, header = TRUE, na.strings = "")

# Create Basic Plot
gg1d(df, col_id = "ID", col_sort = "Glasses")

# Configure plot gg1d_options()
gg1d(
  lazy_birdwatcher,
  col_sort = "Magpies",
  palettes = list(
    Birdwatcher = c(Robert = "#E69F00", Catherine = "#999999"),
    Day = c(Weekday = "#999999", Weekend = "#009E73")
  ),
  options = gg1d_options(
    show_legend = TRUE,
    fontsize_barplot_y_numbers = 12,
    legend_text_size = 16,
    legend_key_size = 1,
    legend_nrow = 1
  )
)
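To illustrate what a hierarchical sort with `col_sort = c("Day", "Magpies")`, `sort_type = "frequency"` and `desc = TRUE` might do to the row order, here is a base R sketch on a toy data frame. It rests on an assumed interpretation (categorical values ranked by how often they occur, numeric values by magnitude, ties broken by the next column) and is not taken from gg1d's source.

```r
# Toy data; column names borrowed from the lazy_birdwatcher example
obs <- data.frame(
  Day = c("Weekend", "Weekday", "Weekday", "Weekend", "Weekday"),
  Magpies = c(3, 1, 4, 2, 5),
  stringsAsFactors = FALSE
)

# Frequency of each category, looked up per row (Weekday = 3, Weekend = 2)
day_counts <- table(obs$Day)
day_freq <- as.integer(day_counts[obs$Day])

# Descending sort: most frequent Day first, then most Magpies first
sorted <- obs[order(-day_freq, -obs$Magpies), ]
print(sorted)
```

The three weekday rows come first because "Weekday" is the more frequent level, and within each level rows run from the highest to the lowest magpie count.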
Configures aesthetic and layout settings for plots generated by gg1d.
gg1d_options(
  colours_default = c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F", "#E5C494"),
  colours_default_logical = c(`TRUE` = "#648fff", `FALSE` = "#dc267f"),
  colours_missing = "grey90",
  show_legend_titles = FALSE,
  legend_title_position = c("top", "bottom", "left", "right"),
  legend_nrow = 4,
  legend_ncol = NULL,
  legend_title_size = NULL,
  legend_text_size = NULL,
  legend_key_size = 0.3,
  legend_orientation_heatmap = c("horizontal", "vertical"),
  show_legend = TRUE,
  legend_position = c("right", "left", "bottom", "top"),
  na_marker = "!",
  na_marker_size = 8,
  na_marker_colour = "black",
  show_na_marker_categorical = FALSE,
  show_na_marker_heatmap = FALSE,
  colours_heatmap_low = "purple",
  colours_heatmap_high = "seagreen",
  transform_heatmap = c("identity", "log10", "log2"),
  fontsize_values_heatmap = 3,
  show_values_heatmap = FALSE,
  colours_values_heatmap = "white",
  vertical_spacing = 0,
  numeric_plot_type = c("bar", "heatmap"),
  y_axis_position = c("left", "right"),
  width = 0.9,
  relative_height_numeric = 4,
  cli_header = "Running gg1d",
  interactive_svg_width = NULL,
  interactive_svg_height = NULL,
  fontsize_barplot_y_numbers = 8,
  max_digits_barplot_y_numbers = 3,
  fontsize_y_title = 12,
  beautify_text = TRUE
)
colours_default |
Default colors for categorical variables without a custom palette. |
colours_default_logical |
Colors for binary variables: a vector of three colors representing |
colours_missing |
Color for missing (NA) values. |
show_legend_titles |
Display titles for legends (flag). |
legend_title_position |
Position of the legend title ("top", "bottom", "left", "right"). |
legend_nrow |
Number of rows in the legend (number). |
legend_ncol |
Number of columns in the legend. If set, |
legend_title_size |
Size of the legend title text (number). |
legend_text_size |
Size of the text within the legend (number). |
legend_key_size |
Size of the legend key symbols (number). |
legend_orientation_heatmap |
should legend orientation be "horizontal" or "vertical". |
show_legend |
Display the legend on the plot (flag). |
legend_position |
Position of the legend ("right", "left", "bottom", "top"). |
na_marker |
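Text used to mark NA values (string). |
na_marker_size |
Size of the text marker for NA values (number). |
na_marker_colour |
Color of the NA text marker (string). |
show_na_marker_categorical |
Show a marker for NA values on categorical tiles (flag). |
show_na_marker_heatmap |
Show a marker for NA values on heatmap tiles (flag). |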
colours_heatmap_low |
Color for the lowest value in heatmaps (string). |
colours_heatmap_high |
Color for the highest value in heatmaps (string). |
transform_heatmap |
Transformation to apply before visualizing heatmap values ("identity", "log10", "log2"). |
fontsize_values_heatmap |
Font size for heatmap values (number). |
show_values_heatmap |
Display numerical values on heatmap tiles (flag). |
colours_values_heatmap |
Color for heatmap values (string). |
vertical_spacing |
Space between each data row in points (number). |
numeric_plot_type |
Type of visualization for numeric data: "bar" or "heatmap". |
y_axis_position |
Position of the y-axis ("left" or "right"). |
width |
controls how much space is present between bars and tiles within each plot. Can be 0-1, where a value of 1 makes bars/tiles take up 100% of the available space (no gaps between bars). |
relative_height_numeric |
how many times taller should numeric plots be relative to categorical tile plots. Only taken into account if numeric_plot_type == "bar" (number) |
cli_header |
Text used for h1 header. Included so it can be tweaked by packages that use gg1d, so they can customise how the info messages appear. |
interactive_svg_width, interactive_svg_height |
width and height of the interactive graphic region (in inches). Only used when |
fontsize_barplot_y_numbers |
fontsize of the text describing numeric barplot max & min values (number). |
max_digits_barplot_y_numbers |
Number of digits to round the numeric barplot max and min values to (number). |
fontsize_y_title |
fontsize of the y axis titles (a.k.a the data.frame column names) (number). |
beautify_text |
Beautify y-axis text and legend titles by capitalizing words and adding spaces (flag). |
A list of visualization parameters for gg1d.
library(gg1d)

path_gg1d <- system.file("example.csv", package = "gg1d")
df <- read.csv(path_gg1d, header = TRUE, na.strings = "")

# Create Basic Plot
gg1d(df, col_id = "ID", col_sort = "Glasses")

# Configure plot gg1d_options()
gg1d(
  lazy_birdwatcher,
  col_sort = "Magpies",
  palettes = list(
    Birdwatcher = c(Robert = "#E69F00", Catherine = "#999999"),
    Day = c(Weekday = "#999999", Weekend = "#009E73")
  ),
  options = gg1d_options(
    show_legend = TRUE,
    fontsize_barplot_y_numbers = 12,
    legend_text_size = 16,
    legend_key_size = 1,
    legend_nrow = 1
  )
)
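As a further illustration, the heatmap-related options listed above could be combined along the following lines. This is a sketch rather than a tested recipe: the argument names come from the `gg1d_options()` signature, the specific values are arbitrary, and the call is guarded so the code is skipped when gg1d is not installed. It assumes that `interactive = FALSE` yields a static plot.

```r
if (requireNamespace("gg1d", quietly = TRUE)) {
  library(gg1d)

  # Draw numeric columns as heatmap tiles instead of bars
  heatmap_options <- gg1d_options(
    numeric_plot_type = "heatmap",
    show_values_heatmap = TRUE,      # print the number on each tile
    colours_values_heatmap = "white",
    show_na_marker_heatmap = TRUE,   # mark missing values with na_marker
    show_legend = TRUE
  )

  heatmap_plot <- gg1d(
    lazy_birdwatcher,
    col_sort = "Magpies",
    interactive = FALSE,
    options = heatmap_options
  )
  print(heatmap_plot)
}
```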
A simulated dataset describing the number of magpies observed by two birdwatchers.
lazy_birdwatcher
A data frame with 45 rows and 3 columns:
Magpies |
Number of magpies observed |
Day |
Was the day of observation a weekday or a weekend? |
Birdwatcher |
Name of the birdwatcher |
Find sensible values to add 2 breaks at for a ggplot2 axis
sensible_2_breaks(vector)
vector |
vector fed into ggplot axis you want to define sensible breaks for |
vector of length 2. The first element describes the upper break position, the second describes the lower break position.
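The shape of the return value can be illustrated with a small base R sketch. The function `two_breaks_sketch` is hypothetical and simply uses the observed maximum and minimum as the two breaks; this is an assumption, and the rule `sensible_2_breaks()` actually applies to choose its break positions is not documented here.

```r
# Hypothetical stand-in, NOT the gg1d implementation.
# Assumption: the upper break is the maximum, the lower break the minimum.
two_breaks_sketch <- function(vector) {
  c(max(vector, na.rm = TRUE), min(vector, na.rm = TRUE))
}

magpies <- c(3, NA, 10, 7)
breaks <- two_breaks_sketch(magpies)
print(breaks)  # 10 3 (upper break first, then lower break)
```

A vector like this could be passed to the `breaks` argument of a ggplot2 continuous axis scale.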