Title: | Quantile Binned Plots |
---|---|
Description: | Create quantile binned and conditional plots for Exploratory Data Analysis. The package provides several plotting functions that are all based on quantile binning. The plots are created with 'ggplot2' and 'patchwork' and can be further adjusted. |
Authors: | Edwin de Jonge [aut, cre] |
Maintainer: | Edwin de Jonge <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.3 |
Built: | 2025-02-24 19:21:07 UTC |
Source: | CRAN |
This package creates plots using quantile binning.
Quantile binning is an exploratory data analysis tool that helps to see the distribution of the variables in a dataset as a function of the variable that is binned.
A data.frame is quantile binned on a variable x using qbin() and then plotted with one of the available plot functions.
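To make the idea of quantile binning concrete, here is a minimal base R sketch. It is not the qbinplots implementation: quantile_bin() is a hypothetical helper, and the choice of 5 bins and of the iris columns is only for illustration. It assigns each row to one of n equally sized bins by rank and then summarises another variable per bin.

```r
# Minimal sketch of quantile binning in base R (not the qbinplots internals).
# quantile_bin() is a hypothetical helper: it ranks x and cuts the ranks into
# n groups of (nearly) equal size.
quantile_bin <- function(x, n) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  ceiling(r * n / sum(!is.na(x)))
}

bin <- quantile_bin(iris$Sepal.Length, n = 5)

# Every bin holds the same number of rows (150 rows / 5 bins).
print(table(bin))

# Median of another variable within each quantile bin of Sepal.Length.
print(tapply(iris$Petal.Length, bin, median))
```

Because the bins are defined by rank rather than by value, each one contains the same number of observations, which is what makes the per-bin statistics comparable.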
qbinplots offers various types of plots:
qbin_*: quantile binned plots that show the distribution of the variables in the quantile bins.
cond_*: conditional quantile plots that show the distribution of the variables conditional on the x variable.
qbin_lineplot() highlights the change in median between qbins and shows the distribution within qbins.
qbin_barplot() shows the size of the medians or expected values of the qbins.
qbin_boxplot() shows the distribution within qbins.
qbin_heatmap() shows the distribution within the qbins.
cond_boxplot() shows the distribution of the variables conditional on the x variable.
cond_barplot() shows the expected median/mean of the variables conditional on the x variable.
funq_plot() shows a functional view of the data, plotting the median and interquartile range of numerical variables and the level frequency of the other variables as a function of the x variable using quantile bins.
Maintainer: Edwin de Jonge [email protected] (ORCID)
Other contributors:
Martijn Tennekes [email protected] [contributor]
cond_barplot() conditions all variables on x by quantile binning and shows the median or mean of the other variables for each x.
cond_barplot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  overlap = NULL,
  ncols = NULL,
  fill = "#2f4f4f",
  auto_fill = FALSE,
  show_bins = FALSE,
  type = c("median", "mean"),
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
ncols |
The number of columns to be used in the layout. |
fill |
The color to use for the bars. |
auto_fill |
If |
show_bins |
If |
type |
The type of statistic to use for the bars. |
... |
Additional arguments to pass to the plot functions |
A list of ggplot objects.
Other conditional quantile plotting functions: cond_boxplot(), cond_heatmap(), funq_plot()
# plots the expected median conditional on Sepal.Width
cond_barplot(iris, "Sepal.Width", n = 12)

# plots the expected median, with show_bins = TRUE
cond_barplot(iris, "Sepal.Width", n = 12, show_bins = TRUE)

data("diamonds", package = "ggplot2")
cond_barplot(diamonds[c(1:4, 7)], "carat", auto_fill = TRUE)

if (require(palmerpenguins)) {
  p <- cond_barplot(penguins[1:7], "body_mass_g", auto_fill = TRUE)
  print(p)

  # compare with cond_boxplot
  p <- cond_boxplot(penguins[1:7], "body_mass_g", auto_fill = TRUE)
  print(p)
}
cond_boxplot() conditions all variables on x by quantile binning and shows the boxplots for the other variables for each value of qbinned x.
cond_boxplot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  color = "#002f2f",
  fill = "#2f4f4f",
  auto_fill = FALSE,
  ncols = NULL,
  xmarker = NULL,
  qmarker = NULL,
  show_bins = FALSE,
  xlim = NULL,
  connect = FALSE,
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
color |
The color to use for the line charts |
fill |
The fill color to use for the areas |
auto_fill |
If |
ncols |
The number of columns to be used in the layout |
xmarker |
|
qmarker |
|
show_bins |
if |
xlim |
|
connect |
if |
... |
Additional arguments to pass to the plot functions |
cond_boxplot is the same function as funq_plot() but with different defaults, namely connect = FALSE and auto_fill = FALSE.
funq_plot highlights the functional relationship between x and the y-variables by connecting the medians of the quantile bins.
qbin_boxplot() shows the boxplots of the quantile bins on a quantile scale.
A list of ggplot objects.
Other conditional quantile plotting functions: cond_barplot(), cond_heatmap(), funq_plot()
cond_boxplot(iris, x = "Petal.Length")
cond_heatmap shows the conditional distribution of the y variables for each quantile bin of x. It is an alternative to cond_boxplot() that gives a finer-grained view of the distribution per qbin().
cond_barplot() highlights the median/mean of the quantile bins, while funq_plot() highlights the functional dependency of the median.
cond_heatmap(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  overlap = NULL,
  bins = c(n, 25),
  ncols = NULL,
  auto_fill = FALSE,
  show_bins = FALSE,
  fill = "#2f4f4f",
  low = "#eeeeee",
  high = "#2f4f4f",
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
bins |
|
ncols |
The number of columns to be used in the layout. |
auto_fill |
If |
show_bins |
If |
fill |
The color used for categorical variables. |
low |
The color used for low values in the heatmap. |
high |
The color used for high values in the heatmap. |
... |
Additional arguments to pass to the plot functions |
A list of ggplot objects.
Other conditional quantile plotting functions: cond_barplot(), cond_boxplot(), funq_plot()
cond_heatmap(iris, x = "Petal.Length", n = 12)

data("diamonds", package = "ggplot2")
# keep only plots 6 to 8 of the returned list
cond_heatmap(diamonds, x = "carat", bins = c(100, 100))[6:8]
funq_plot() conditions on variable x with quantile binning and plots the median and interquartile range of numerical variables and the level frequency of the other variables as a function of the x variable.
funq_plot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  overlap = NULL,
  color = "#002f2f",
  fill = "#2f4f4f",
  auto_fill = TRUE,
  ncols = NULL,
  xmarker = NULL,
  qmarker = NULL,
  show_bins = FALSE,
  xlim = NULL,
  connect = TRUE,
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
color |
The color to use for the line charts |
fill |
The fill color to use for the areas |
auto_fill |
If |
ncols |
The number of columns to be used in the layout |
xmarker |
|
qmarker |
|
show_bins |
if |
xlim |
|
connect |
if |
... |
Additional arguments to pass to the plot functions |
By highlighting and connecting the median values, funq_plot creates a functional view of the data: what is the (expected) median given a certain value of x?
It qbins the x variable, calculates the statistics for each bin, and plots the medians of the qbins against the other variables, thereby creating a functional view from x to the rest of the data, hence the name funq_plot.
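The functional view described above can be approximated in base R. This is a sketch under stated assumptions, not the package's internals: the bin count n = 10, the iris columns, and the data frame name funq are chosen only for illustration. Per quantile bin of x it computes the median of x and the quartiles of y, which are the quantities a median line with an interquartile band would be drawn from.

```r
# Sketch of the "functional view": per quantile bin of x, summarise x by its
# median and y by its quartiles. Base R only; not the qbinplots internals.
x <- iris$Sepal.Length
y <- iris$Petal.Length
n <- 10  # number of quantile bins (assumed for illustration)

# Rank-based bins of equal size.
bin <- ceiling(rank(x, ties.method = "first") * n / length(x))

funq <- data.frame(
  x_median = as.vector(tapply(x, bin, median)),
  y_q1     = as.vector(tapply(y, bin, quantile, probs = 0.25)),
  y_median = as.vector(tapply(y, bin, median)),
  y_q3     = as.vector(tapply(y, bin, quantile, probs = 0.75))
)

# One row per bin: plotting y_median against x_median and connecting the
# points gives the median curve; y_q1 and y_q3 bound the interquartile band.
print(funq)
```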
A ggplot object with the plots.
Other conditional quantile plotting functions: cond_barplot(), cond_boxplot(), cond_heatmap()
funq_plot(iris, "Sepal.Length", xmarker = 5.5)

funq_plot(iris, x = "Sepal.Length", xmarker = 5.5, overlap = TRUE)

data("diamonds", package = "ggplot2")
funq_plot(diamonds[1:7], "carat", xlim = c(0, 2))

if (require(palmerpenguins)) {
  funq_plot(penguins[1:7], x = "body_mass_g", xmarker = 4650, ncols = 3)
}
Bins a data.frame into quantile bins for variable x in data.
qbin(data, x = NULL, n = 100, min_bin_size = NULL, overlap = NULL, ...)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
... |
reserved for future use |
Each numeric variable in the data.frame is binned into n quantile bins, for which the fivenum() and mean() are calculated.
When nrow(data)/n is less than min_bin_size, qbin gives a warning and n is adjusted to nrow(data)/min_bin_size.
Each categorical variable is binned into n quantile bins, for which the level frequency is calculated.
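The per-bin statistics described above can be sketched in base R as follows. This is an illustration, not the qbin() implementation: the value min_bin_size = 5, the use of floor() when shrinking n, the column names of num_stats, and the choice of iris columns are all assumptions.

```r
# Sketch only: a base R stand-in for the per-bin statistics described above.
data <- iris
n <- 100
min_bin_size <- 5  # assumed value for illustration

# Shrink n when the bins would hold fewer than min_bin_size rows.
if (nrow(data) / n < min_bin_size) {
  n <- floor(nrow(data) / min_bin_size)
  message("n adjusted to ", n)
}

# Rank-based quantile bins on the binning variable.
bin <- ceiling(rank(data$Sepal.Length, ties.method = "first") * n / nrow(data))

# Numeric variable: five-number summary plus mean for each bin.
num_stats <- t(sapply(
  split(data$Petal.Length, bin),
  function(v) c(fivenum(v), mean(v))
))
colnames(num_stats) <- c("min", "lower", "median", "upper", "max", "mean")

# Categorical variable: level frequency for each bin.
cat_freq <- table(bin, data$Species)

print(head(num_stats))
print(head(cat_freq))
```

With 150 rows and a minimum bin size of 5, the requested 100 bins are reduced to 30, so every bin holds exactly 5 rows.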
A qbin object with:
$x the variable name used for binning
$bin a vector of bin numbers
$n the number of bins
$num_cols a vector of numeric column names
$cat_cols a vector of categorical column names
$data a list of data.tables with the collected information
qbin_barplot() shows the median or mean for each quantile bin, thereby focusing on the expected value per qbin().
For a conditional plot, see cond_barplot().
qbin_barplot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  overlap = NULL,
  ncols = NULL,
  fill = "#2f4f4f",
  type = c("median", "mean"),
  ...
)

table_plot(data, x = NULL, n = 100, ncols = ncol(data), fill = "#555555", ...)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
ncols |
The number of columns to be used in the layout. |
fill |
The color to use for the bars. |
type |
The type of statistic to use for the bars. |
... |
Additional arguments to pass to the plot functions |
table_plot is a specific form of qbin_barplot with ncols set to ncol(data).
A list of ggplot objects.
Other qbin plotting functions: qbin_boxplot(), qbin_heatmap(), qbin_lineplot()
data("diamonds", package = "ggplot2")
table_plot(diamonds[c(1:4, 7)], "carat")

qbin_barplot(iris, "Sepal.Length", n = 12)

table_plot(iris, "Sepal.Length", n = 12)

table_plot(iris, x = "Sepal.Length", min_bin_size = 20, overlap = TRUE)

if (require(palmerpenguins)) {
  table_plot(penguins[1:7], "body_mass_g", 19)
}
qbin_boxplot creates quantile binned boxplots from data using x as the binning variable. It focuses on the change of median between qbins. It is a complement to qbin_heatmap(), which focuses on the distribution within the qbins.
qbin_boxplot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  ncols = NULL,
  overlap = NULL,
  connect = FALSE,
  color = "#002f2f",
  fill = "#2f4f4f",
  auto_fill = FALSE,
  qmarker = NULL,
  xmarker = NULL,
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
ncols |
The number of columns to be used in the layout |
overlap |
|
connect |
if |
color |
The color to use for the lines |
fill |
The color to use for the bars |
auto_fill |
If |
qmarker |
|
xmarker |
|
... |
Additional arguments to pass to the plot functions |
The data is binned by x and a boxplot is created for each bin.
With connect = TRUE, the medians of subsequent boxplots are connected to highlight jumps in the data. This hints at the dependency of the variable on the binning variable.
A list of ggplot objects.
Other qbin plotting functions: qbin_barplot(), qbin_heatmap(), qbin_lineplot()
qbin_boxplot(iris, x = "Sepal.Length")

qbin_boxplot(iris, x = "Sepal.Length", connect = TRUE, overlap = TRUE)

qbin_boxplot(
  iris,
  x = "Sepal.Length",
  connect = TRUE,
  xmarker = 5.5,
  auto_fill = TRUE
)

data("diamonds", package = "ggplot2")
qbin_boxplot(diamonds[1:7], "carat", auto_fill = TRUE)

qbin_boxplot(diamonds[1:7], "price", auto_fill = TRUE)
qbin_heatmap shows the distribution of the y variables for each quantile bin of x. It is an alternative to qbin_boxplot() that gives a finer-grained view of the distribution per qbin().
qbin_barplot() highlights the median/mean of the quantile bins.
qbin_heatmap(
  data,
  x = NULL,
  n = 25,
  min_bin_size = NULL,
  overlap = NULL,
  bins = c(n),
  type = c("gradient", "size"),
  ncols = NULL,
  auto_fill = FALSE,
  fill = "#2f4f4f",
  low = "#eeeeee",
  high = "#2f4f4f",
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
overlap |
|
bins |
|
type |
The type of heatmap to use. Either "gradient" or "size". |
ncols |
The number of columns to be used in the layout. |
auto_fill |
If |
fill |
The color used for categorical variables. |
low |
The color used for low values in the heatmap. |
high |
The color used for high values in the heatmap. |
... |
Additional arguments to pass to the plot functions |
A list of ggplot objects.
Other qbin plotting functions: qbin_barplot(), qbin_boxplot(), qbin_lineplot()
qbin_heatmap(iris, "Sepal.Length", auto_fill = TRUE)

qbin_heatmap(iris, "Sepal.Length", auto_fill = TRUE, type = "size")

qbin_heatmap(iris, "Sepal.Length", overlap = TRUE, auto_fill = TRUE)

data("diamonds", package = "ggplot2")
qbin_heatmap(diamonds[c(1, 7:9)], x = "price", n = 150)
qbin_lineplot creates quantile binned boxplots from data using x as the binning variable and connects the medians: it focuses on the change of median between qbins.
qbin_lineplot(
  data,
  x = NULL,
  n = 100,
  min_bin_size = NULL,
  ncols = NULL,
  connect = TRUE,
  color = "#002f2f",
  fill = "#2f4f4f",
  auto_fill = FALSE,
  qmarker = NULL,
  xmarker = NULL,
  ...
)
data |
a |
x |
|
n |
|
min_bin_size |
|
ncols |
The number of columns to be used in the layout |
connect |
if |
color |
The color to use for the lines |
fill |
The color to use for the bars |
auto_fill |
If |
qmarker |
|
xmarker |
|
... |
Additional arguments to pass to the plot functions |
The data is binned by x and a boxplot is created for each bin.
The medians of subsequent boxplots are connected to highlight jumps in the data. This hints at the dependency of the variable on the binning variable.
A list of ggplot objects.
Other qbin plotting functions: qbin_barplot(), qbin_boxplot(), qbin_heatmap()
qbin_lineplot(iris, x = "Sepal.Length")

qbin_lineplot(iris, x = "Sepal.Length", xmarker = 5.5, auto_fill = TRUE)

qbin_lineplot(
  iris,
  x = "Sepal.Length",
  overlap = TRUE,
  xmarker = 5.5,
  auto_fill = TRUE
)

data("diamonds", package = "ggplot2")
qbin_lineplot(diamonds[1:7], "carat", auto_fill = TRUE)

qbin_lineplot(diamonds[1:7], "price", auto_fill = TRUE)