Title: | Density Strips and Other Methods for Compactly Illustrating Distributions |
---|---|
Description: | Graphical methods for compactly illustrating probability distributions, including density strips, density regions, sectioned density plots and varying width strips. |
Authors: | Christopher Jackson |
Maintainer: | Christopher Jackson <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5.4 |
Built: | 2024-11-05 06:15:10 UTC |
Source: | CRAN |
Graphical methods for compactly illustrating and comparing distributions, particularly distributions arising from parameter estimation or prediction.
denstrip
implements the density strip for
illustrating a single univariate distribution. The darkness of the
density strip at a point is proportional to the density at that point.
A shortcut function denstrip.normal
draws the strip for the
given normal distribution.
densregion
implements the density region, which
illustrates the uncertainty surrounding a continuously-varying quantity
as a two-dimensional shaded region with darkness proportional to the
density. There are shortcut functions densregion.normal
and densregion.survfit
for computing and drawing the
region for normally-distributed predictions and survival curves,
respectively.
sectioned.density
implements the sectioned density plots
of Cohen and Cohen (2006). These illustrate distributions using
occlusion and varying shading. They were developed for
summarising data, but can also be used for illustrating known
distributions.
vwstrip
can be used to draw varying-width strips to
illustrate distributions, in a similar manner to the violin plot for
summarising data. The width of the strip is proportional to the density.
A shortcut function vwstrip.normal
draws the strip for the
given normal distribution.
bpstrip
adapts the box-percentile plot to illustrate a
distribution instead of observed data. This strip has width
proportional to the probability of a more extreme point.
cistrip
implements the popular point and line figure for
illustrating point and interval estimates, for example from multiple
regression.
These methods are discussed in more detail by Jackson (2008).
Each function is designed to add a graphic to an existing set of plot axes. The plots can be added to either base graphics or lattice panels.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
Box-percentile strips give a compact illustration of a distribution. The width of the strip is proportional to the probability of a more extreme point. This function adds a box-percentile strip to an existing plot.
bpstrip(x, prob, at, width, horiz=TRUE, scale=1, limits=c(-Inf, Inf), col="gray", border=NULL, lwd, lty, ticks=NULL, tlen=1, twd, tty, lattice=FALSE) panel.bpstrip(...)
bpstrip(x, prob, at, width, horiz=TRUE, scale=1, limits=c(-Inf, Inf), col="gray", border=NULL, lwd, lty, ticks=NULL, tlen=1, twd, tty, lattice=FALSE) panel.bpstrip(...)
x |
Either the vector of points at which the probability is
evaluated (if |
prob |
Probability, or cumulative density, of the distribution
at |
at |
Position of the centre of the strip on the y-axis (if
|
width |
Thickness of the strip at its thickest point, which will be at the median. Defaults to 1/20 of the axis range. |
horiz |
Draw the strip horizontally ( |
scale |
Alternative way of specifying the thickness of the
strip, as a proportion of |
limits |
Vector of minimum and maximum values, respectively, at which to terminate the strip. |
col |
Colour to shade the strip, either as a built-in R
colour name (one of |
border |
Colour of the border, see |
lwd |
Line width of the border (defaults to
|
lty |
Line type of the border (defaults to
|
ticks |
Vector of |
tlen |
Length of the ticks, relative to the thickness of the strip. |
twd |
Line width of these marks (defaults to
|
tty |
Line type of these marks (defaults to
|
lattice |
Set this to |
... |
Other arguments passed to |
The box-percentile strip looks the same as the box-percentile plot
(Esty and Banfield, 2003) which is a generalisation of the boxplot for
summarising data. However, bpstrip
is intended for illustrating
distributions arising from parameter
estimation or prediction. Either the distribution is known
analytically, or an arbitrarily large sample from the distribution is
assumed to be available via a method such as MCMC or bootstrapping.
The function bpplot
in the Hmisc
package can be used to draw vertical box-percentile plots of observed
data.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
Esty, W. W. and Banfield, J. D. (2003) The box-percentile plot. Journal of Statistical Software 8(17).
x <- seq(-4, 4, length=1000) prob <- pnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") bpstrip(x, prob, at=1, ticks=qnorm(c(0.25, 0.5, 0.75))) ## Terminate the strip at specific outer quantiles bpstrip(x, prob, at=2, limits=qnorm(c(0.025, 0.975))) bpstrip(x, prob, at=3, limits=qnorm(c(0.005, 0.995))) ## Compare with density strip denstrip(x, dnorm(x), at=0) ## Estimate the density from a large sample x <- rnorm(10000) bpstrip(x, at=4)
x <- seq(-4, 4, length=1000) prob <- pnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") bpstrip(x, prob, at=1, ticks=qnorm(c(0.25, 0.5, 0.75))) ## Terminate the strip at specific outer quantiles bpstrip(x, prob, at=2, limits=qnorm(c(0.025, 0.975))) bpstrip(x, prob, at=3, limits=qnorm(c(0.005, 0.995))) ## Compare with density strip denstrip(x, dnorm(x), at=0) ## Estimate the density from a large sample x <- rnorm(10000) bpstrip(x, at=4)
Adds one or more points and lines to a plot, representing point and interval estimates.
cistrip(x, at, d, horiz=TRUE, pch = 16, cex = 1, lattice=FALSE, ...) panel.cistrip(...)
cistrip(x, at, d, horiz=TRUE, pch = 16, cex = 1, lattice=FALSE, ...) panel.cistrip(...)
x |
Either a vector of three elements corresponding to point estimate, lower limit and upper limit of the interval estimate, respectively, or a numeric matrix or data frame with three columns representing point estimates, lower and upper limits. |
at |
Position of the line on the y-axis (if
|
d |
Length of the serifs at each end of the line. Defaults to 1/60 of the axis range. |
horiz |
Draw the line horizontally ( |
pch |
Character to draw at the point estimate, see
|
cex |
Expansion factor for the character at the point estimate,
for. A vector can be supplied here, one for each estimate, as in |
lattice |
Set this to |
... |
Further arguments passed to the |
Christopher Jackson <[email protected]>
## One estimate x <- c(0.1, -2, 2) plot(0, type="n", xlim=c(-5, 5), ylim=c(-5, 5), xlab="", ylab="") abline(h=0, lty=2, col="lightgray") abline(v=0, lty=2, col="lightgray") cistrip(x, at=-0.1) cistrip(x, at=0.2, lwd=3, d=0.1) cistrip(x, at=-4, horiz=FALSE, lwd=3, d=0.2) ## Double / triple the area of the central point, as in forest plots cistrip(x, at=2, d=0.2, pch=22, bg="black") cistrip(x, at=2.5, d=0.2, pch=22, bg="black", cex=sqrt(2)) cistrip(x, at=3, d=0.2, pch=22, bg="black", cex=sqrt(3)) ## Several estimates x <- rbind(c(0.1, -2, 2), c(1, -1, 2.3), c(-0.2, -0.8, 0.4), c(-0.3, -1.2, 1.5)) plot(0, type="n", xlim=c(-5, 5), ylim=c(-5, 5), xlab="", ylab="") cistrip(x, at=1:4) abline(v=0, lty=2, col="lightgray") cistrip(x, at=1:4, horiz=FALSE, lwd=3, d=0.2) abline(h=0, lty=2, col="lightgray")
## One estimate x <- c(0.1, -2, 2) plot(0, type="n", xlim=c(-5, 5), ylim=c(-5, 5), xlab="", ylab="") abline(h=0, lty=2, col="lightgray") abline(v=0, lty=2, col="lightgray") cistrip(x, at=-0.1) cistrip(x, at=0.2, lwd=3, d=0.1) cistrip(x, at=-4, horiz=FALSE, lwd=3, d=0.2) ## Double / triple the area of the central point, as in forest plots cistrip(x, at=2, d=0.2, pch=22, bg="black") cistrip(x, at=2.5, d=0.2, pch=22, bg="black", cex=sqrt(2)) cistrip(x, at=3, d=0.2, pch=22, bg="black", cex=sqrt(3)) ## Several estimates x <- rbind(c(0.1, -2, 2), c(1, -1, 2.3), c(-0.2, -0.8, 0.4), c(-0.3, -1.2, 1.5)) plot(0, type="n", xlim=c(-5, 5), ylim=c(-5, 5), xlab="", ylab="") cistrip(x, at=1:4) abline(v=0, lty=2, col="lightgray") cistrip(x, at=1:4, horiz=FALSE, lwd=3, d=0.2) abline(h=0, lty=2, col="lightgray")
A density region uses shading to represent the uncertainty surrounding a continuously-varying quantity, such as a survival curve or a forecast from a time series. The darkness of the shading is proportional to the (posterior, predictive or fiducial) density. This function adds a density region to an existing plot.
densregion(x, ...) ## Default S3 method: densregion(x, y, z, pointwise=FALSE, nlevels=100, colmax=par("fg"), colmin="white", scale=1, gamma=1, contour=FALSE, ...)
densregion(x, ...) ## Default S3 method: densregion(x, y, z, pointwise=FALSE, nlevels=100, colmax=par("fg"), colmin="white", scale=1, gamma=1, contour=FALSE, ...)
x |
Suppose the continuously-varying quantity varies over a
space S. |
y |
Vector of ordinates at which the density of the distribution
will be evaluated for every point in |
z |
Matrix of densities on the grid defined by |
pointwise |
If If |
nlevels |
Number of distinct shades to use to illustrate the varying densities. The default of 100 should result in a plot with smoothly-varying shading. |
colmax |
Colour to shade the maximum density, either as a built-in R
colour name (one of |
colmin |
Colour to shade the minimum density, likewise.
Defaults to "white". If this is set to |
scale |
Proportion of |
gamma |
Gamma correction to apply to the colour palette, see |
contour |
If |
... |
Further arguments passed to or from other methods, such
as the |
The plot is shaded by interpolating the value of the density
between grid points, using the algorithm described by Cleveland (1993)
as implemented in the filled.contour
function.
With lattice graphics, similar plots can be implemented using
the contourplot
or levelplot
functions.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
Cleveland, W. S. (1993) Visualizing Data. Hobart Press, Summit, New Jersey.
densregion.survfit
, densregion.normal
, denstrip
## Predictive uncertainty around a hypothetical regression line x <- 1:10 nx <- length(x) est <- seq(0, 1, length=nx) lcl <- seq(-1, 0, length=nx) ucl <- seq(1, 2, length=nx) se <- (est - lcl)/qnorm(0.975) y <- seq(-3, 3, length=100) z <- matrix(nrow=nx, ncol=length(y)) for(i in 1:nx) z[i,] <- dnorm(y, est[i], se[i]) plot(x, type="n", ylim=c(-5.5, 2.5)) densregion(x, y, z, colmax="darkgreen") lines(x, est) lines(x, lcl, lty=2) lines(x, ucl, lty=2) box() ## On graphics devices that support transparency, specify ## colmin="transparent" to allow adjacent regions to overlap smoothly densregion(x, y-1, z, colmax="magenta", colmin="transparent") ## or automatically choose the y points to evaluate the density plot(x, type="n", ylim=c(-1.5, 2.5)) densregion.normal(x, est, se, ny=50, colmax="darkgreen") lines(x, est) lines(x, lcl, lty=2) lines(x, ucl, lty=2)
## Predictive uncertainty around a hypothetical regression line x <- 1:10 nx <- length(x) est <- seq(0, 1, length=nx) lcl <- seq(-1, 0, length=nx) ucl <- seq(1, 2, length=nx) se <- (est - lcl)/qnorm(0.975) y <- seq(-3, 3, length=100) z <- matrix(nrow=nx, ncol=length(y)) for(i in 1:nx) z[i,] <- dnorm(y, est[i], se[i]) plot(x, type="n", ylim=c(-5.5, 2.5)) densregion(x, y, z, colmax="darkgreen") lines(x, est) lines(x, lcl, lty=2) lines(x, ucl, lty=2) box() ## On graphics devices that support transparency, specify ## colmin="transparent" to allow adjacent regions to overlap smoothly densregion(x, y-1, z, colmax="magenta", colmin="transparent") ## or automatically choose the y points to evaluate the density plot(x, type="n", ylim=c(-1.5, 2.5)) densregion.normal(x, est, se, ny=50, colmax="darkgreen") lines(x, est) lines(x, lcl, lty=2) lines(x, ucl, lty=2)
Adds a density region to an existing plot of a normally-distributed quantity with continuously-varying mean and standard deviation, such as a time series forecast. Automatically computes a reasonable set of ordinates to evaluate the density at, which span the whole forecast space.
## S3 method for class 'normal' densregion(x, mean, sd, ny=20, ...)
## S3 method for class 'normal' densregion(x, mean, sd, ny=20, ...)
x |
Suppose the continuously-varying quantity varies over a
space S. |
mean |
Vector of normal means at each point in |
sd |
Vector of standard deviations at each point in |
ny |
Minimum number of points to calculate the density at for
each |
... |
Further arguments passed to |
The plot is shaded by interpolating the value of the density
between grid points, using the algorithm described by Cleveland (1993)
as implemented in the filled.contour
function.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
Cleveland, W. S. (1993) Visualizing Data. Hobart Press, Summit, New Jersey.
densregion
, densregion.survfit
, denstrip
## Time series forecasting (fit <- arima(USAccDeaths, order = c(0,1,1), seasonal = list(order=c(0,1,1)))) pred <- predict(fit, n.ahead = 36) plot(USAccDeaths, xlim=c(1973, 1982), ylim=c(5000, 15000)) ## Compute normal forecast densities automatically (slow) ## Not run: densregion.normal(time(pred$pred), pred$pred, pred$se, pointwise=TRUE, colmax="darkgreen") lines(pred$pred, lty=2) lines(pred$pred + qnorm(0.975)*pred$se, lty=3) lines(pred$pred - qnorm(0.975)*pred$se, lty=3) ## End(Not run) ## Compute forecast densities by hand (more efficient) nx <- length(pred$pred) y <- seq(5000, 15000, by=100) z <- matrix(nrow=nx, ncol=length(y)) for(i in 1:nx) z[i,] <- dnorm(y, pred$pred[i], pred$se[i]) plot(USAccDeaths, xlim=c(1973, 1982), ylim=c(5000, 15000)) densregion(time(pred$pred), y, z, colmax="darkgreen", pointwise=TRUE) lines(pred$pred, lty=2) lines(pred$pred + qnorm(0.975)*pred$se, lty=3) lines(pred$pred - qnorm(0.975)*pred$se, lty=3) densregion(time(pred$pred), y+2000, z, colmax="darkblue", pointwise=TRUE)
## Time series forecasting (fit <- arima(USAccDeaths, order = c(0,1,1), seasonal = list(order=c(0,1,1)))) pred <- predict(fit, n.ahead = 36) plot(USAccDeaths, xlim=c(1973, 1982), ylim=c(5000, 15000)) ## Compute normal forecast densities automatically (slow) ## Not run: densregion.normal(time(pred$pred), pred$pred, pred$se, pointwise=TRUE, colmax="darkgreen") lines(pred$pred, lty=2) lines(pred$pred + qnorm(0.975)*pred$se, lty=3) lines(pred$pred - qnorm(0.975)*pred$se, lty=3) ## End(Not run) ## Compute forecast densities by hand (more efficient) nx <- length(pred$pred) y <- seq(5000, 15000, by=100) z <- matrix(nrow=nx, ncol=length(y)) for(i in 1:nx) z[i,] <- dnorm(y, pred$pred[i], pred$se[i]) plot(USAccDeaths, xlim=c(1973, 1982), ylim=c(5000, 15000)) densregion(time(pred$pred), y, z, colmax="darkgreen", pointwise=TRUE) lines(pred$pred, lty=2) lines(pred$pred + qnorm(0.975)*pred$se, lty=3) lines(pred$pred - qnorm(0.975)*pred$se, lty=3) densregion(time(pred$pred), y+2000, z, colmax="darkblue", pointwise=TRUE)
Adds a density region to a survival plot. The shading of the region has darkness proportional to the fiducial density of the point. This distribution is assumed to be normal with standard deviation calculated using the lower confidence limit stored in the survival curve object.
## S3 method for class 'survfit' densregion(x, ny=20, ...)
## S3 method for class 'survfit' densregion(x, ny=20, ...)
x |
Survival curve object, returned by
|
ny |
Minimum number of points to calculate the density at for each event time. The default of 20 should be sufficient to obtain smooth-looking plots. |
... |
Further arguments passed to |
The density is calculated at a grid of points, and interpolated using
the method referred to in densregion
.
In general, this approach can only illustrate one survival curve per plot. Though if the graphics device supports transparency (e.g. PDF) multiple curves can be made to overlap smoothly - see the example below.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
densregion
, densregion.normal
, denstrip
if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) densregion(fit) lines(fit, lwd=3, conf.int=FALSE, lty=1) lines(fit, lwd=3, conf.int=TRUE, lty=2) ## Wider CIs based on log survival fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log") plot(fit, col=0) densregion(fit) # Big variation in maximum density plot(fit, col=0) densregion(fit, pointwise=TRUE, colmax="maroon4") par(new=TRUE) plot(fit) ## Narrower CIs based on untransformed survival. ## Normal assumption probably unrealistic fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="plain") plot(fit, col=0) densregion(fit, pointwise=TRUE, colmax="darkmagenta") par(new=TRUE) plot(fit) ## Multiple survival curves on same axes ## Should overlap smoothly on devices that allow transparency fit2 <- survfit(Surv(time, status) ~ x, data=aml, conf.type="log-log") fit2x1 <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log", subset=(x=="Maintained")) fit2x0 <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log", subset=(x=="Nonmaintained")) plot(fit2, lwd=3, xlab="Weeks", ylab="Survival", xlim=c(0, 60), lty=1:2, col=c("red", "blue"), conf.int=TRUE, mark.time=TRUE) densregion(fit2x1, colmax="red", gamma=2) densregion(fit2x0, colmax="blue", gamma=2) }
if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) densregion(fit) lines(fit, lwd=3, conf.int=FALSE, lty=1) lines(fit, lwd=3, conf.int=TRUE, lty=2) ## Wider CIs based on log survival fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log") plot(fit, col=0) densregion(fit) # Big variation in maximum density plot(fit, col=0) densregion(fit, pointwise=TRUE, colmax="maroon4") par(new=TRUE) plot(fit) ## Narrower CIs based on untransformed survival. ## Normal assumption probably unrealistic fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="plain") plot(fit, col=0) densregion(fit, pointwise=TRUE, colmax="darkmagenta") par(new=TRUE) plot(fit) ## Multiple survival curves on same axes ## Should overlap smoothly on devices that allow transparency fit2 <- survfit(Surv(time, status) ~ x, data=aml, conf.type="log-log") fit2x1 <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log", subset=(x=="Maintained")) fit2x0 <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log", subset=(x=="Nonmaintained")) plot(fit2, lwd=3, xlab="Weeks", ylab="Survival", xlim=c(0, 60), lty=1:2, col=c("red", "blue"), conf.int=TRUE, mark.time=TRUE) densregion(fit2x1, colmax="red", gamma=2) densregion(fit2x0, colmax="blue", gamma=2) }
The density strip illustrates a univariate distribution as a shaded rectangular strip, whose darkness at a point is proportional to the probability density. The strip is darkest at the maximum density and fades into the background at the minimum density. It may be used to generalise the common point-and-line drawing of a point and interval estimate, by representing the entire posterior or predictive distribution of the estimate. This function adds a density strip to an existing plot.
denstrip(x, dens, at, width, horiz=TRUE, colmax, colmin="white", scale=1, gamma=1, ticks=NULL, tlen=1.5, twd, tcol, mticks=NULL, mlen=1.5, mwd, mcol, lattice=FALSE, ...) panel.denstrip(...)
denstrip(x, dens, at, width, horiz=TRUE, colmax, colmin="white", scale=1, gamma=1, ticks=NULL, tlen=1.5, twd, tcol, mticks=NULL, mlen=1.5, mwd, mcol, lattice=FALSE, ...) panel.denstrip(...)
x |
Either the vector of points at which the density is
evaluated (if |
dens |
Density at |
at |
Position of the centre of the strip on the y-axis (if
|
width |
Thickness of the strip, that is, the length of its shorter dimension. Defaults to 1/30 of the axis range. |
horiz |
Draw the strip horizontally ( |
colmax |
Colour at the maximum density, either as a built-in R
colour name (one of |
colmin |
Colour to shade the minimum density, likewise.
Defaults to "white". If this is set to |
scale |
Proportion of |
gamma |
Gamma correction to apply to the colour palette.
The default of 1 should give an approximate perception of
darkness proportional to density, but this may need to be adjusted
for different displays. Values of |
ticks |
Vector of |
tlen |
Length of these tick marks relative to the strip width. |
twd |
Line thickness of these marks (defaults to
|
tcol |
Colour of the tick marks. Defaults to |
mticks |
|
mlen |
Length of this mark relative to the strip width. |
mwd |
Line thickness of this mark (defaults to
|
mcol |
Colour of this mark. Defaults to |
lattice |
Set this to |
... |
Additional arguments supplied to |
In OpenBUGS (http://www.openbugs.net) density strips are available via the Inference/Compare menu.
See this blog post: http://blogs.sas.com/content/graphicallyspeaking/2012/11/03/density-strip-plot/, for density strips in SAS.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
## Illustrate a known standard normal distribution ## Various settings to change the look of the plot x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") denstrip(x, dens, at=0) # default width denstrip(x, dens, width=0.5, at=0) denstrip(x, dens, at=-4, ticks=c(-2, 0, 2)) denstrip(x, dens, at=-3, ticks=c(-2, 2), mticks=0) denstrip(x, dens, at=-2, ticks=c(-2, 2), mticks=0, mlen=3, mwd=4, colmax="#55AABB") denstrip(x, dens, at=1, ticks=c(-2, 2), tlen=3, twd=3) denstrip(x, dens, at=-4, ticks=c(-2, 2), mticks=0, colmax="darkgreen", horiz=FALSE) x <- rnorm(1000) # Estimate the density denstrip(x, width=0.2, at=-3, ticks=c(-2, 2), mticks=0, colmax="darkgreen", horiz=FALSE) denstrip(x, at=2, width=0.5, gamma=2.2) denstrip(x, at=3, width=0.5, gamma=1/2.2) ### Specifying colour of minimum density par(bg="lightyellow") plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") x <- seq(-4, 4, length=10000) dens <- dnorm(x) ## Equivalent ways of drawing same distribution denstrip(x, dens, at=-1, ticks=c(-2, 2), mticks=0, colmax="darkmagenta") denstrip(x, dens, at=-2, ticks=c(-2, 2), mticks=0, colmax="darkmagenta", colmin="lightyellow") ## ...though the next only works if graphics device supports transparency denstrip(x, dens, at=-3, ticks=c(-2, 2), mticks=0, colmax="darkmagenta", colmin="transparent") denstrip(x, dens, at=-4, ticks=c(-2, 2), mticks=0, colmax="#8B008B", colmin="white") ## Alternative to density regions (\link{densregion.survfit}) for ## survival curves - a series of vertical density strips with no ## interpolation if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) lse <- (log(-log(fit$surv)) - log(-log(fit$upper)))/qnorm(0.975) n <- length(fit$time) lstrip <- fit$time - (fit$time-c(0,fit$time[1:(n-1)])) / 2 rstrip <- fit$time + (c(fit$time[2:n], fit$time[n])-fit$time) / 2 for (i in 1:n) { y <- exp(-exp(qnorm(seq(0,1,length=1000)[-c(1,1000)], log(-log(fit$surv))[i], lse[i]))) z <- dnorm(log(-log(y)), log(-log(fit$surv))[i], lse[i]) denstrip(y, z, at=(lstrip[i]+rstrip[i])/2, width=rstrip[i]-lstrip[i], horiz=FALSE, colmax="darkred") } par(new=TRUE) plot(fit, lwd=2) } ## Use for lattice graphics (first example from help(xyplot)) library(lattice) Depth <- equal.count(quakes$depth, number=8, overlap=.1) xyplot(lat ~ long | Depth, data = quakes, panel = function(x, y) { panel.xyplot(x, y) panel.denstrip(x, horiz=TRUE, at=-10, ticks=mean(x)) panel.denstrip(y, horiz=FALSE, at=165, ticks=mean(y)) } ) ## Lattice example data: heights of singing voice types bwplot(voice.part ~ height, data=singer, xlab="Height (inches)", panel=panel.violin, xlim=c(50,80)) bwplot(voice.part ~ height, data=singer, xlab="Height (inches)", panel = function(x, y) { xlist <- split(x, factor(y)) for (i in seq(along=xlist)) panel.denstrip(x=xlist[[i]], at=i) }, xlim=c(50,80) )
## Illustrate a known standard normal distribution ## Various settings to change the look of the plot x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") denstrip(x, dens, at=0) # default width denstrip(x, dens, width=0.5, at=0) denstrip(x, dens, at=-4, ticks=c(-2, 0, 2)) denstrip(x, dens, at=-3, ticks=c(-2, 2), mticks=0) denstrip(x, dens, at=-2, ticks=c(-2, 2), mticks=0, mlen=3, mwd=4, colmax="#55AABB") denstrip(x, dens, at=1, ticks=c(-2, 2), tlen=3, twd=3) denstrip(x, dens, at=-4, ticks=c(-2, 2), mticks=0, colmax="darkgreen", horiz=FALSE) x <- rnorm(1000) # Estimate the density denstrip(x, width=0.2, at=-3, ticks=c(-2, 2), mticks=0, colmax="darkgreen", horiz=FALSE) denstrip(x, at=2, width=0.5, gamma=2.2) denstrip(x, at=3, width=0.5, gamma=1/2.2) ### Specifying colour of minimum density par(bg="lightyellow") plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") x <- seq(-4, 4, length=10000) dens <- dnorm(x) ## Equivalent ways of drawing same distribution denstrip(x, dens, at=-1, ticks=c(-2, 2), mticks=0, colmax="darkmagenta") denstrip(x, dens, at=-2, ticks=c(-2, 2), mticks=0, colmax="darkmagenta", colmin="lightyellow") ## ...though the next only works if graphics device supports transparency denstrip(x, dens, at=-3, ticks=c(-2, 2), mticks=0, colmax="darkmagenta", colmin="transparent") denstrip(x, dens, at=-4, ticks=c(-2, 2), mticks=0, colmax="#8B008B", colmin="white") ## Alternative to density regions (\link{densregion.survfit}) for ## survival curves - a series of vertical density strips with no ## interpolation if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) lse <- (log(-log(fit$surv)) - log(-log(fit$upper)))/qnorm(0.975) n <- length(fit$time) lstrip <- fit$time - (fit$time-c(0,fit$time[1:(n-1)])) / 2 rstrip <- fit$time + (c(fit$time[2:n], fit$time[n])-fit$time) / 2 for (i in 1:n) { y <- exp(-exp(qnorm(seq(0,1,length=1000)[-c(1,1000)], log(-log(fit$surv))[i], lse[i]))) z <- dnorm(log(-log(y)), log(-log(fit$surv))[i], lse[i]) denstrip(y, z, at=(lstrip[i]+rstrip[i])/2, width=rstrip[i]-lstrip[i], horiz=FALSE, colmax="darkred") } par(new=TRUE) plot(fit, lwd=2) } ## Use for lattice graphics (first example from help(xyplot)) library(lattice) Depth <- equal.count(quakes$depth, number=8, overlap=.1) xyplot(lat ~ long | Depth, data = quakes, panel = function(x, y) { panel.xyplot(x, y) panel.denstrip(x, horiz=TRUE, at=-10, ticks=mean(x)) panel.denstrip(y, horiz=FALSE, at=165, ticks=mean(y)) } ) ## Lattice example data: heights of singing voice types bwplot(voice.part ~ height, data=singer, xlab="Height (inches)", panel=panel.violin, xlim=c(50,80)) bwplot(voice.part ~ height, data=singer, xlab="Height (inches)", panel = function(x, y) { xlist <- split(x, factor(y)) for (i in seq(along=xlist)) panel.denstrip(x=xlist[[i]], at=i) }, xlim=c(50,80) )
Add a legend to an existing plot with a density strip or shaded region, indicating the mapping of colours to densities.
denstrip.legend(x, y, width, len, colmax, colmin="white", gamma=1, horiz=FALSE, max=1, nticks = 5, ticks, value.adj = 0, cex, main = "Density", lattice=FALSE) panel.denstrip.legend(...)
denstrip.legend(x, y, width, len, colmax, colmin="white", gamma=1, horiz=FALSE, max=1, nticks = 5, ticks, value.adj = 0, cex, main = "Density", lattice=FALSE) panel.denstrip.legend(...)
x |
Central x position of the legend. |
y |
Central y position of the legend. |
width |
Width of the legend strip, that is, the length of its shorter dimension. Defaults to 1/30 of the axis range. |
len |
Length of the legend strip, that is, the length of its longer dimension. Defaults to 1/4 of the axis range. |
colmax |
Colour at the maximum density, either as a built-in R
colour name (one of |
colmin |
Colour to shade the minimum density, likewise.
Defaults to "white". If this is set to |
gamma |
Gamma correction to apply to the colour palette, see |
horiz |
Legend strip drawn vertically ( |
max |
Maximum density on the legend, which is represented by
|
nticks |
Number of tick marks on the axis adjacent to the
legend, if |
ticks |
Positions of numbered ticks on the axis adjacent to the
legend. Defaults to |
value.adj |
Extra adjustment for the axis labels to the right
(if |
cex |
Text expansion. Defaults to |
main |
Text to place above the legend. |
lattice |
Set this to |
... |
Other arguments passed to |
Christopher Jackson <[email protected]>
if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) densregion(fit) denstrip.legend(100, 0.8) ### TODO if max not supplied - ticks are not meaningful. ### In help example, find actual max dens used for densregion denstrip.legend(120, 0.8, width=3, len=0.4, value.adj=5) denstrip.legend(40, 0.9, horiz=TRUE) denstrip.legend(60, 0.7, horiz=TRUE, width=0.02, len=50, value.adj=0.04) }
if (requireNamespace("survival", quietly=TRUE)){ library(survival) fit <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log") plot(fit, col=0) densregion(fit) denstrip.legend(100, 0.8) ### TODO if max not supplied - ticks are not meaningful. ### In help example, find actual max dens used for densregion denstrip.legend(120, 0.8, width=3, len=0.4, value.adj=5) denstrip.legend(40, 0.9, horiz=TRUE) denstrip.legend(60, 0.7, horiz=TRUE, width=0.02, len=50, value.adj=0.04) }
Draws a density strip for a normal or log-normal distribution with the given mean and standard deviation, based on computing the density at a large set of equally-spaced quantiles.
denstrip.normal(mean, sd, log=FALSE, nx=1000, ...) panel.denstrip.normal(...)
denstrip.normal(mean, sd, log=FALSE, nx=1000, ...) panel.denstrip.normal(...)
mean |
Mean of the normal distribution. |
sd |
Standard deviation of the normal distribution. |
log |
If |
nx |
Number of points to evaluate the density at. |
... |
Further arguments passed to |
Christopher Jackson <[email protected]>
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-1, 2), xlab="x", ylab="", type="n", axes=FALSE) axis(1) denstrip(x, dens, at=0, width=0.3) denstrip.normal(0, 1, at=1, width=0.3) ### log-normal distribution sdlog <- 0.5 x <- rlnorm(10000, 0, sdlog) plot(x, xlim=c(0, 5), ylim=c(-2, 4), xlab="x", , ylab="", type="n", axes=FALSE) axis(1) abline(v=1, lty=2, col="lightgray") denstrip(x, at=0, ticks=exp(-sdlog^2), width=0.4) # tick at theoretical maximum density denstrip(x, at=1, bw=0.1, ticks=exp(-sdlog^2), width=0.4) denstrip.normal(0, sdlog, log=TRUE, at=3, nx=1000, ticks=exp(-sdlog^2), width=0.4)
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-1, 2), xlab="x", ylab="", type="n", axes=FALSE) axis(1) denstrip(x, dens, at=0, width=0.3) denstrip.normal(0, 1, at=1, width=0.3) ### log-normal distribution sdlog <- 0.5 x <- rlnorm(10000, 0, sdlog) plot(x, xlim=c(0, 5), ylim=c(-2, 4), xlab="x", , ylab="", type="n", axes=FALSE) axis(1) abline(v=1, lty=2, col="lightgray") denstrip(x, at=0, ticks=exp(-sdlog^2), width=0.4) # tick at theoretical maximum density denstrip(x, at=1, bw=0.1, ticks=exp(-sdlog^2), width=0.4) denstrip.normal(0, sdlog, log=TRUE, at=3, nx=1000, ticks=exp(-sdlog^2), width=0.4)
Sectioned density plots (Cohen and Cohen, 2006) use shading and occlusion to give a compact illustration of a distribution, such as the empirical distribution of data.
sectioned.density(x, dens, at, width, offset, ny, method=c("kernel","frequency"), nx, horiz=TRUE, up.left = TRUE, colmax, colmin="white", gamma=1, lattice=FALSE, ...) panel.sectioned.density(...)
sectioned.density(x, dens, at, width, offset, ny, method=c("kernel","frequency"), nx, horiz=TRUE, up.left = TRUE, colmax, colmin="white", gamma=1, lattice=FALSE, ...) panel.sectioned.density(...)
x |
Either the vector of points at which the density is
evaluated (if |
dens |
Density at |
at |
Position of the bottom of the plot on the y-axis (if
|
ny |
Number of fixed-width intervals for categorising the density. |
width |
Width of individual rectangles in the plot. Defaults to the range of the axis divided by 20. |
offset |
Offset for adjacent rectangles. Defaults to
|
method |
Method of estimating the density of If If |
nx |
Number of data bins for the |
horiz |
If |
up.left |
If changed to |
colmax |
Darkest colour, either as a built-in R colour name (one
of |
colmin |
Lightest colour, either as a built-in R colour name (one
of |
gamma |
Gamma correction to apply to the colour palette, see |
lattice |
Set this to |
... |
Additional arguments supplied to |
Christopher Jackson <[email protected]> (R implementation)
Cohen, D. J. and Cohen, J. The sectioned density plot. The American Statistician (2006) 60(2):167–174
## Fisher's iris data ## Various settings to change the look of the plot hist(iris$Sepal.Length, nclass=20, col="lightgray") sectioned.density(iris$Sepal.Length, at=0.2) sectioned.density(iris$Sepal.Length, at=5) sectioned.density(iris$Sepal.Length, at=10, width=0.5) hist(iris$Sepal.Length, nclass=20, col="lightgray") sectioned.density(iris$Sepal.Length, at=7, width=0.5, offset=0.1, colmax="darkmagenta") sectioned.density(iris$Sepal.Length, at=9, width=0.5, offset=0.1, ny=15, colmin="lemonchiffon") ## frequency method less smooth than kernel density sectioned.density(iris$Sepal.Length, at=12, width=0.5, offset=0.1, method="frequency") sectioned.density(iris$Sepal.Length, at=13.5, width=0.5, offset=0.1, method="frequency", nx=20) ## Illustrate a known distribution x <- seq(-4, 4, length=1000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") sectioned.density(x, dens, ny=8, at=0, width=0.3) sectioned.density(x, dens, ny=16, at=2, width=0.1) sectioned.density(x, dens, at=-3, horiz=FALSE) sectioned.density(x, dens, at=4, width=0.3, horiz=FALSE)
## Fisher's iris data ## Various settings to change the look of the plot hist(iris$Sepal.Length, nclass=20, col="lightgray") sectioned.density(iris$Sepal.Length, at=0.2) sectioned.density(iris$Sepal.Length, at=5) sectioned.density(iris$Sepal.Length, at=10, width=0.5) hist(iris$Sepal.Length, nclass=20, col="lightgray") sectioned.density(iris$Sepal.Length, at=7, width=0.5, offset=0.1, colmax="darkmagenta") sectioned.density(iris$Sepal.Length, at=9, width=0.5, offset=0.1, ny=15, colmin="lemonchiffon") ## frequency method less smooth than kernel density sectioned.density(iris$Sepal.Length, at=12, width=0.5, offset=0.1, method="frequency") sectioned.density(iris$Sepal.Length, at=13.5, width=0.5, offset=0.1, method="frequency", nx=20) ## Illustrate a known distribution x <- seq(-4, 4, length=1000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") sectioned.density(x, dens, ny=8, at=0, width=0.3) sectioned.density(x, dens, ny=16, at=2, width=0.1) sectioned.density(x, dens, at=-3, horiz=FALSE) sectioned.density(x, dens, at=4, width=0.3, horiz=FALSE)
Get all sequences of contiguous values in a vector of integers.
seqToIntervals(x)
seqToIntervals(x)
x |
A vector of integers, for example, representing
indices. |
A matrix with one row for each sequence, and two columns containing the start and the end of the sequence, respectively.
Chris Jackson <[email protected]>. Thanks to Tobias Verbeke for the suggestion.
seqToIntervals(1:10) # [1 10] seqToIntervals(c(1:10, 15:18, 20)) # [1 10; 15 18; 20 20] # vectorised, so efficient for large vectors x seqToIntervals(sample(1:1000000, size=999990))
seqToIntervals(1:10) # [1 10] seqToIntervals(c(1:10, 15:18, 20)) # [1 10; 15 18; 20 20] # vectorised, so efficient for large vectors x seqToIntervals(sample(1:1000000, size=999990))
Varying-width strips give a compact illustration of a distribution. The width of the strip is proportional to the density. This function adds a varying-width strip to an exising plot.
vwstrip(x, dens, at, width, horiz=TRUE, scale=1, limits=c(-Inf, Inf), col="gray", border=NULL, lwd, lty, ticks=NULL, tlen=1, twd, tty, lattice=FALSE,...) panel.vwstrip(...)
vwstrip(x, dens, at, width, horiz=TRUE, scale=1, limits=c(-Inf, Inf), col="gray", border=NULL, lwd, lty, ticks=NULL, tlen=1, twd, tty, lattice=FALSE,...) panel.vwstrip(...)
x |
Either the vector of points at which the density is
evaluated (if |
dens |
Density at |
at |
Position of the centre of the strip on the y-axis (if
|
width |
Thickness of the strip at the maximum density, that is, the length of its shorter dimension. Defaults to 1/20 of the axis range. |
horiz |
Draw the strip horizontally ( |
scale |
Alternative way of specifying the thickness of the
strip, as a proportion of |
limits |
Vector of minimum and maximum values, respectively, at which to terminate the strip. |
col |
Colour to shade the strip, either as a built-in R
colour name (one of |
border |
Colour of the border, see |
lwd |
Line width of the border (defaults to
|
lty |
Line type of the border (defaults to
|
ticks |
Vector of |
tlen |
Length of the ticks, relative to the thickness of the strip. |
twd |
Line width of these marks (defaults to
|
tty |
Line type of these marks (defaults to
|
lattice |
Set this to |
... |
Additional arguments supplied to |
Varying-width strips look like violin plots. The difference is that
violin plots are intended to summarise data, while
vwstrip
is
intended to illustrate a distribution arising from parameter
estimation or prediction. Either the distribution is known
analytically, or an arbitrarily large sample from the distribution is
assumed to be available via a method such as MCMC or bootstrapping.
Illustrating outliers is important for summarising data, therefore
violin plots terminate at the sample minimum and maximum and superimpose
a box plot (which appears like the bridge of a violin, hence the name).
Varying-width strips, however, are used to illustrate known
distributions which may have unbounded support. Therefore it is
important to think about where the strips should terminate (the
limits
argument). For example, the end points may illustrate
a particular pair of extreme quantiles of the distribution.
The function vioplot
in the vioplot
package and panel.violin
in the lattice
package can be used to draw violin plots of observed data.
Christopher Jackson <[email protected]>
Jackson, C. H. (2008) Displaying uncertainty with shading. The American Statistician, 62(4):340-347.
Hintze, J.L. and Nelson, R.D. (1998) Violin plots: a box plot - density trace synergism. The American Statistician 52(2),181–184.
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") vwstrip(x, dens, at=1, ticks=qnorm(c(0.025, 0.25,0.5, 0.75, 0.975))) ## Terminate the strip at specific outer quantiles vwstrip(x, dens, at=2, limits=qnorm(c(0.025, 0.975))) vwstrip(x, dens, at=3, limits=qnorm(c(0.005, 0.995))) ## Compare with density strip denstrip(x, dens, at=0) ## Estimate the density from a large sample x <- rnorm(10000) vwstrip(x, at=4)
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-5, 5), xlab="x", ylab="x", type="n") vwstrip(x, dens, at=1, ticks=qnorm(c(0.025, 0.25,0.5, 0.75, 0.975))) ## Terminate the strip at specific outer quantiles vwstrip(x, dens, at=2, limits=qnorm(c(0.025, 0.975))) vwstrip(x, dens, at=3, limits=qnorm(c(0.005, 0.995))) ## Compare with density strip denstrip(x, dens, at=0) ## Estimate the density from a large sample x <- rnorm(10000) vwstrip(x, at=4)
Draws a varying width strip for a normal or log-normal distribution with the given mean and standard deviation, based on computing the density at a large set of equally-spaced quantiles.
vwstrip.normal(mean, sd, log=FALSE, nx=1000, ...) panel.vwstrip.normal(...)
vwstrip.normal(mean, sd, log=FALSE, nx=1000, ...) panel.vwstrip.normal(...)
mean |
Mean of the normal distribution. |
sd |
Standard deviation of the normal distribution. |
log |
If |
nx |
Number of points to evaluate the density at. |
... |
Further arguments passed to |
Christopher Jackson <[email protected]>
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-1, 2), xlab="x", ylab="", type="n", axes=FALSE) axis(1) vwstrip(x, dens, at=0, width=0.4, limits=qnorm(c(0.005, 0.995))) vwstrip.normal(0, 1, at=1, width=0.4, limits=qnorm(c(0.005, 0.995))) ### log-normal distribution sdlog <- 0.5 x <- rlnorm(10000, 0, sdlog) plot(x, xlim=c(0, 5), ylim=c(-1, 3), xlab="x", ylab="", type="n", axes=FALSE) axis(1) abline(v=1, lty=2, col="lightgray") vwstrip(x, at=0, width=0.4, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog)) # tick at theoretical maximum density vwstrip(x, at=1, width=0.4, bw=0.1, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog)) vwstrip.normal(0, sdlog, log=TRUE, at=2.5, width=0.4, nx=1000, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog))
x <- seq(-4, 4, length=10000) dens <- dnorm(x) plot(x, xlim=c(-5, 5), ylim=c(-1, 2), xlab="x", ylab="", type="n", axes=FALSE) axis(1) vwstrip(x, dens, at=0, width=0.4, limits=qnorm(c(0.005, 0.995))) vwstrip.normal(0, 1, at=1, width=0.4, limits=qnorm(c(0.005, 0.995))) ### log-normal distribution sdlog <- 0.5 x <- rlnorm(10000, 0, sdlog) plot(x, xlim=c(0, 5), ylim=c(-1, 3), xlab="x", ylab="", type="n", axes=FALSE) axis(1) abline(v=1, lty=2, col="lightgray") vwstrip(x, at=0, width=0.4, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog)) # tick at theoretical maximum density vwstrip(x, at=1, width=0.4, bw=0.1, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog)) vwstrip.normal(0, sdlog, log=TRUE, at=2.5, width=0.4, nx=1000, ticks=exp(-sdlog^2), limits=qlnorm(c(0.005,0.975),0,sdlog))