Title: | Plot Categorical Data Using Quasirandom Noise and Density Estimates |
---|---|
Description: | Generate a violin point plot, a combination of a violin/histogram plot and a scatter plot by offsetting points within a category based on their density using quasirandom noise. |
Authors: | Scott Sherrill-Mix, Erik Clarke |
Maintainer: | Scott Sherrill-Mix <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4.7 |
Built: | 2024-11-13 06:34:02 UTC |
Source: | CRAN |
Applies a function to subsets of x, where each subset consists of the observations sharing the same groupings in y, passing any additional arguments on to the function.
aveWithArgs(x, y, FUN = mean, ...)
x |
a vector to apply FUN to |
y |
a vector or list of vectors of grouping variables, all of the same length as x |
FUN |
function to apply for each factor level combination. |
... |
additional arguments to FUN |
A numeric vector of the same length as x, where each element contains the output of FUN applied to the subgroup containing that element (repeated if necessary within a subgroup).
aveWithArgs(1:10,rep(1:5,2))
aveWithArgs(c(1:9,NA),rep(1:5,2),max,na.rm=TRUE)
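For readers curious how the grouping behaves, here is a minimal sketch of the same idea in base R. It assumes the behaviour can be reproduced by wrapping FUN in a closure that forwards the extra arguments to stats::ave(); aveSketch is a hypothetical name, not part of vipor, and vipor's own implementation may differ.

```r
# aveSketch is a hypothetical helper for illustration; it is not vipor's aveWithArgs
aveSketch <- function(x, y, FUN = mean, ...) {
  # stats::ave() calls FUN with only the subgroup values, so wrap FUN in a
  # closure that forwards any extra arguments (e.g. na.rm = TRUE)
  stats::ave(x, y, FUN = function(group) FUN(group, ...))
}

aveSketch(1:10, rep(1:5, 2))
aveSketch(c(1:9, NA), rep(1:5, 2), max, na.rm = TRUE)
```

The second call returns 6 7 8 9 5 6 7 8 9 5: each subgroup's maximum is computed with the NA removed and then repeated for every element of that subgroup.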
A dataset containing data from the US Census Bureau
counties
A data frame with 3143 rows and 8 variables:
GEO.id from original data
state in which the county is located
name of the county
population of the county
housing units in the county
Area in square miles - Total area
Area in square miles - Water area
Area in square miles - Land area
http://factfinder.census.gov/bkmk/table/1.0/en/DEC/10_SF1/GCTPH1.US05PR (link now dead), system.file("data-raw", "makeCounties.R", package = "vipor")
https://web.archive.org/web/20150326040847/https://www.census.gov/prod/cen2010/cph-2-1.pdf
Takes a vector of integers representing digits in an arbitrary base (e.g. binary or octal) and converts it into an integer (or, if fractional is TRUE, into that integer divided by base^length(digits)). Note that the first digit in the input is the least significant.
digits2number(digits, base = 2, fractional = FALSE)
digits |
a vector of integers representing digits in an arbitrary base |
base |
the base for the numeral system (e.g. 2 for binary or 8 for octal) |
fractional |
divide the output by the max for this number of digits and base. Note that this is |
an integer
https://en.wikipedia.org/wiki/Radix
digits2number(c(4,4,1),8)
digits2number(number2digits(100))
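Because the first digit is the least significant, digit i carries weight base^(i-1). The sketch below is a hypothetical re-implementation (digits2numberSketch is not a vipor function) that makes this weighting explicit, assuming fractional simply divides by base^length(digits) as described above.

```r
# Hypothetical re-implementation for illustration only
digits2numberSketch <- function(digits, base = 2, fractional = FALSE) {
  # First digit is least significant, so digit i is weighted by base^(i - 1)
  out <- sum(digits * base^(seq_along(digits) - 1))
  # Optionally scale into [0, 1) by the number of representable values
  if (fractional) out <- out / base^length(digits)
  out
}

digits2numberSketch(c(4, 4, 1), 8)          # 4 + 4*8 + 1*64
digits2numberSketch(c(0, 0, 1, 0, 0, 1, 1)) # binary 1100100 written LSB first
```

Both calls give 100, matching the octal and binary examples above.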
Find a random string of concatenated permutations of 1:n fulfilling Tukey's criteria that there are no runs of 3 or more increases or decreases in a row. Tukey just uses the default n=5.
generatePermuteString(nReps = 20, n = 5)
nReps |
number of permutations to concatenate |
n |
permutations from 1 to n |
a vector of nReps*n integers giving concatenated permutations
generatePermuteString()
generatePermuteString(nReps = 10)
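A simple way to build such a string is rejection sampling: append a random permutation and keep it only if the whole string still has no run of increases or decreases longer than limit. This is a sketch under stated assumptions (that the criterion also applies across the joins between permutations, and that a zero step breaks a run); permuteStringSketch and its limit and maxTries arguments are hypothetical, not vipor's interface.

```r
# Hypothetical sketch; not vipor's generatePermuteString
permuteStringSketch <- function(nReps = 20, n = 5, limit = 2, maxTries = 1000) {
  # TRUE if no run of same-direction steps is longer than limit
  runsOk <- function(v) {
    r <- rle(sign(diff(v)))
    all(r$lengths[r$values != 0] <= limit)
  }
  out <- integer(0)
  for (i in seq_len(nReps)) {
    # Try random permutations of 1:n until one extends the string validly
    for (attempt in seq_len(maxTries)) {
      candidate <- c(out, sample(n))
      if (runsOk(candidate)) break
    }
    if (!runsOk(candidate)) stop("no valid permutation found")
    out <- candidate
  }
  out
}

set.seed(1)
permuteStringSketch(nReps = 4)
```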
A dataset containing data from a meta-analysis looking for differences between active and inactive HIV integrations. Each row represents a provirus integrated somewhere in a human chromosome, recording whether viral expression was detected, the distance to the nearest gene and the number of reads from H4K12ac ChIP-Seq mapped to within 50,000 bases of the integration.
integrations
A data frame with 12436 rows and 4 variables:
the cell population infected by HIV
whether the provirus was active (expressed) or inactive (latent)
distance to nearest gene (transcription unit) (0 if in a gene)
number of reads aligned within ±50,000 bases in an H4K12ac ChIP-Seq
https://retrovirology.biomedcentral.com/articles/10.1186/1742-4690-10-90, system.file("data-raw", "makeIntegrations.R", package = "vipor")
https://retrovirology.biomedcentral.com/articles/10.1186/1742-4690-10-90
Takes an integer and converts it into an arbitrary base e.g. binary or octal. Note that the first digit in the output is the least significant.
number2digits(n, base = 2)
n |
the integer to be converted |
base |
the base for the numeral system (e.g. 2 for binary or 8 for octal) |
a vector of length ceiling(log(n+1,base)) representing each digit for that numeral system
https://en.wikipedia.org/wiki/Radix
number2digits(100)
number2digits(100,8)
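The conversion can be sketched with integer division: digit i (least significant first) is floor(n / base^(i-1)) modulo base. number2digitsSketch is a hypothetical name; the sketch assumes n is a non-negative whole number and uses the ceiling(log(n+1, base)) length stated above.

```r
# Hypothetical re-implementation for illustration only
number2digitsSketch <- function(n, base = 2) {
  nDigits <- ceiling(log(n + 1, base))
  # Digit i (least significant first) is floor(n / base^(i - 1)) mod base
  (n %/% base^(seq_len(nDigits) - 1)) %% base
}

number2digitsSketch(100)    # 0 0 1 0 0 1 1
number2digitsSketch(100, 8) # 4 4 1
```

Reading either result from right to left gives the usual most-significant-first notation (1100100 in binary, 144 in octal).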
Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise or, alternatively, by positioning extreme values within a band to the left and right, to form beeswarm/one-dimensional scatter/strip chart style plots. That is, a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). This function returns a vector of the offsets to be used in plotting.
offsetX(y, x = rep(1, length(y)), width = 0.4, varwidth = FALSE, ...)
offsetSingleGroup(y, maxLength = NULL, method = c("quasirandom", "pseudorandom", "smiley", "maxout", "frowney", "minout", "tukey", "tukeyDense"), nbins = NULL, adjust = 1)
y |
vector of data points |
x |
a grouping factor for y (optional) |
width |
the maximum spacing away from center for each group of points. Since points are spaced to left and right, the maximum width of the cluster will be approximately width*2 (0 = no offset, default = 0.4) |
varwidth |
adjust the width of each group based on the number of points in the group |
... |
additional arguments to offsetSingleGroup |
maxLength |
multiply the offset by sqrt(length(y)/maxLength) if not NULL. The square root matches boxplot's scaling, allowing comparison of groups whose sizes differ by orders of magnitude (scaling with the standard error) |
method |
method used to distribute the points:
|
nbins |
the number of points used to calculate density (defaults to 1000 for quasirandom and pseudorandom and 100 for others) |
adjust |
adjust the bandwidth used to calculate the kernel density (smaller values mean tighter fit, larger values looser fit, default is 1) |
a vector of x-offsets of the same length as y
## Generate fake data
dat <- list(rnorm(50), rnorm(500), c(rnorm(100), rnorm(100,5)), rcauchy(100))
names(dat) <- c("Normal", "Dense Normal", "Bimodal", "Extremes")
## Plot each distribution with a variety of parameters
par(mfrow=c(4,1), mar=c(2,4, 0.5, 0.5))
sapply(names(dat),function(label) {
  y<-dat[[label]]
  offsets <- list(
    'Default'=offsetX(y),
    'Smoother'=offsetX(y, adjust=2),
    'Tighter'=offsetX(y, adjust=0.1),
    'Thinner'=offsetX(y, width=0.1)
  )
  ids <- rep(1:length(offsets), sapply(offsets,length))
  plot(unlist(offsets) + ids, rep(y, length(offsets)), ylab=label, xlab='', xaxt='n', pch=21, las=1)
  axis(1, 1:4, c("Default", "Adjust=2", "Adjust=0.1", "Width=10%"))
})
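The idea behind the quasirandom method can be approximated in a few lines. This sketch assumes the offset for each point is a van der Corput value (assigned in order of y), rescaled to [-1, 1], multiplied by the kernel density at that point relative to the densest point, and then by width. It illustrates the idea and is not vipor's exact code; quasirandomOffsetSketch and its inline vdc helper are hypothetical names.

```r
# Hypothetical sketch of density-scaled quasirandom offsets for one group
quasirandomOffsetSketch <- function(y, width = 0.4, adjust = 1, nbins = 1000) {
  # Base-2 van der Corput values for positions 1..n
  vdc <- function(n) sapply(seq_len(n), function(i) {
    out <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * 2
      out <- out + (i %% 2) / denom
      i <- i %/% 2
    }
    out
  })
  dens <- stats::density(y, adjust = adjust, n = nbins)
  # Density at each observed y, scaled so the densest point gets 1
  pointDens <- stats::approx(dens$x, dens$y, y)$y
  pointDens <- pointDens / max(pointDens)
  # Assign quasirandom values in order of y so neighbours spread left/right
  noise <- vdc(length(y))[rank(y, ties.method = "first")]
  (noise * 2 - 1) * pointDens * width
}

set.seed(1)
y <- rnorm(100)
plot(1 + quasirandomOffsetSketch(y), y, xlim = c(0.5, 1.5))
```

Using a low-discrepancy sequence rather than runif() keeps neighbouring points from clumping on one side, which is what gives the violin-like outline.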
Recursively generates all permutations of a vector. The result will contain factorial(length(vals)) permutations, so be careful with longer vectors (e.g. longer than 10).
permute(vals)
vals |
a vector of elements to be permuted |
A list of vectors containing all permutations of the values
permute(letters[1:3])
permute(1:5)
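A recursive approach fixes each element in turn as the first entry and prepends it to every permutation of the remaining elements, which is why the output grows as factorial(length(vals)). permuteSketch below is a hypothetical illustration, not necessarily how vipor's permute is written.

```r
# Hypothetical recursive permutation generator
permuteSketch <- function(vals) {
  # Base case: zero or one element has exactly one permutation
  if (length(vals) <= 1) return(list(vals))
  out <- list()
  for (i in seq_along(vals)) {
    # Fix vals[i] first, then prepend it to each permutation of the rest
    for (rest in permuteSketch(vals[-i])) {
      out[[length(out) + 1]] <- c(vals[i], rest)
    }
  }
  out
}

length(permuteSketch(1:4)) # factorial(4) = 24
```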
Produces offsets to generate smile-like or frown-like distributions of points, that is, sorting the points so that the most extreme values alternate between the left and right, e.g. (max, 3rd max, ..., 4th max, 2nd max). The function returns either a proportion between 0 and 1 (useful for plotting) or an ordering.
topBottomDistribute(x, frowney = FALSE, prop = TRUE)
x |
the elements to be sorted |
frowney |
if TRUE then sort minimums to the outside, otherwise sort maximums to the outside |
prop |
if FALSE then return an ordering of the data with extremes on the outside. If TRUE then return a sequence between 0 and 1 sorted by the ordering |
a vector of the same length as x with values ranging between 0 and 1 if prop is TRUE or an ordering of 1 to length(x)
topBottomDistribute(1:10)
topBottomDistribute(1:10,TRUE)
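One way to produce this outside-in arrangement is to rank the values from most to least extreme and fill slots alternately from the two ends (1, n, 2, n-1, ...). The sketch assumes that prop = TRUE rescales slot positions linearly to [0, 1]; vipor's exact scaling may differ, and topBottomSketch is a hypothetical name.

```r
# Hypothetical sketch of an outside-in (smiley/frowney) arrangement
topBottomSketch <- function(x, frowney = FALSE, prop = TRUE) {
  n <- length(x)
  # Indices of x from most to least extreme (largest first unless frowney)
  extremeOrder <- order(x, decreasing = !frowney)
  # Slots filled from the outside in: 1, n, 2, n - 1, ...
  slots <- as.vector(rbind(seq_len(n), rev(seq_len(n))))[seq_len(n)]
  if (!prop) {
    # Ordering: the element placed in each slot, so x[ord] has extremes outside
    ord <- integer(n)
    ord[slots] <- extremeOrder
    return(ord)
  }
  # Slot of each element, rescaled linearly to [0, 1] (assumed scaling)
  position <- integer(n)
  position[extremeOrder] <- slots
  if (n > 1) (position - 1) / (n - 1) else rep(0.5, n)
}

x <- 1:5
x[topBottomSketch(x, prop = FALSE)] # 5 3 1 2 4
```

The ordered values read max, 3rd max, ..., 4th max, 2nd max, following the pattern given in the description.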
Find all permutations of 1:n fulfilling Tukey's criteria that there are no runs of 3 or more increases or decreases in a row. Tukey just uses the default n=5 and limit=2.
tukeyPermutes(n = 5, limit = 2)
n |
permutations from 1 to n |
limit |
the maximum number of increases or decreases in a row |
a list of vectors containing valid permutations
tukeyPermutes()
tukeyPermutes(6,3)
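The criterion can be checked directly: take the signs of successive differences and require every run of equal signs (found with rle()) to be at most limit long. The sketch below enumerates candidate permutations with expand.grid(), which is only practical for small n; tukeyPermutesSketch is a hypothetical illustration rather than vipor's implementation.

```r
# Hypothetical brute-force version of the Tukey permutation filter
tukeyPermutesSketch <- function(n = 5, limit = 2) {
  # All n-tuples of 1:n, keeping only those that use each value once
  grid <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  perms <- grid[apply(grid, 1, function(r) length(unique(r)) == n), , drop = FALSE]
  # Keep permutations whose longest run of increases or decreases is <= limit
  ok <- apply(perms, 1, function(p) max(rle(sign(diff(p)))$lengths) <= limit)
  lapply(which(ok), function(i) unname(perms[i, ]))
}

length(tukeyPermutesSketch(3, 1)) # 4 alternating permutations of 1:3
```

With n = 3 and limit = 1 only the zig-zag permutations 1 3 2, 2 1 3, 2 3 1 and 3 1 2 survive.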
Combine base+1 permutation strings to generate offsets
tukeyT(nReps = 10, base = 5)
nReps |
number of permutations to paste together |
base |
generate permutations of integers 1:base |
A nReps*base length vector giving offset positions based on Tukey's algorithm
tukeyT()
tukeyT()
tukeyT(5,4)
Generates partly random, partly constrained lateral displacements based on the Tukey texture algorithm from Tukey and Tukey (1990).
tukeyTexture(x, jitter = TRUE, thin = FALSE, hollow = FALSE, delta = diff(stats::quantile(x, c(0.25, 0.75))) * 0.03)
x |
the points to be jittered (really only used to calculate the length) |
jitter |
if TRUE add random jitter to each point |
thin |
if TRUE then push points to the center in thin regions |
hollow |
if TRUE then expand points outward to avoid “hollowness” |
delta |
a “reasonably small value” used in edge straightening and thinning |
a vector of length length(x) giving displacements for each corresponding point in x
x<-rnorm(200)
plot(tukeyTexture(x),x)
x<-1:100
plot(tukeyTexture(x),x)
plot(tukeyTexture(log10(counties$landArea),TRUE,TRUE),log10(counties$landArea),cex=.25)
Generates the first n elements (or n elements starting from an arbitrary position) of the van der Corput low-discrepancy sequence for a given base
vanDerCorput(n, base = 2, start = 1)
n |
the first n elements of the van der Corput sequence |
base |
the base to use for calculating the van der Corput sequence |
start |
start at this position in the sequence |
a vector of length n with values ranging between 0 and 1
https://en.wikipedia.org/wiki/Van_der_Corput_sequence
vanDerCorput(100)
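The sequence is produced by writing each position i in the given base and mirroring its digits about the radix point, so 1, 2, 3 in base 2 (1, 10, 11) become 0.1, 0.01, 0.11 in binary, i.e. 0.5, 0.25, 0.75. The sketch assumes start = 1 corresponds to i = 1; vanDerCorputSketch is a hypothetical name, not vipor's implementation.

```r
# Hypothetical sketch of the van der Corput sequence
vanDerCorputSketch <- function(n, base = 2, start = 1) {
  sapply(seq(start, length.out = n), function(i) {
    out <- 0
    denom <- 1
    # Peel off digits of i (least significant first) and place them
    # after the radix point in reverse order
    while (i > 0) {
      denom <- denom * base
      out <- out + (i %% base) / denom
      i <- i %/% base
    }
    out
  })
}

vanDerCorputSketch(7) # 0.5 0.25 0.75 0.125 0.625 0.375 0.875
```

Each new value falls into the largest remaining gap, which is why the sequence fills [0, 1] more evenly than uniform random draws.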
Arranges data points using quasirandom noise (van der Corput sequence) to create a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). The development version of this package is available at https://github.com/sherrillmix/vipor
The main functions are:
offsetX: calculate offsets in X position for plotting (groups of) one-dimensional data
vpPlot: a simple wrapper around plot and offsetX to generate plots of grouped data
Scott Sherrill-Mix, [email protected]
https://github.com/sherrillmix/vipor
dat<-list(rnorm(100),rnorm(50,1,2))
ids<-rep(1:length(dat),sapply(dat,length))
offset<-offsetX(unlist(dat),ids)
plot(unlist(dat),ids+offset)
Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise or, alternatively, by positioning extreme values within a band to the left and right, to form beeswarm/one-dimensional scatter/strip chart style plots. That is, a plot resembling a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points), which we call here a violin point plot.
vpPlot(x = rep("Data", length(y)), y, xaxt = "y", offsetXArgs = NULL, ...)
x |
a grouping factor for y (optional) |
y |
vector of data points |
xaxt |
if 'n' then no x axis is plotted |
offsetXArgs |
a list with arguments for offsetX |
... |
additional arguments to plot |
invisibly returns the adjusted x positions of the points
dat<-list(
  'Mean=0'=rnorm(200),
  'Mean=1'=rnorm(50,1),
  'Bimodal'=c(rnorm(40,-2),rnorm(60,2)),
  'Gamma'=rgamma(50,1)
)
labs<-factor(rep(names(dat),sapply(dat,length)),levels=names(dat))
vpPlot(labs,unlist(dat))
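At its core the wrapper turns each group label into an integer position and adds a density-scaled offset. The sketch below assumes a pseudorandom variant (uniform noise scaled by relative density), one of the methods listed for offsetSingleGroup, and only computes positions. vpPositionsSketch is a hypothetical name, and each group needs at least two points for stats::density().

```r
# Hypothetical sketch of grouped violin-point positions (pseudorandom variant)
vpPositionsSketch <- function(x, y, width = 0.4) {
  x <- factor(x)
  offsets <- numeric(length(y))
  for (g in levels(x)) {
    idx <- which(x == g)
    dens <- stats::density(y[idx])
    # Kernel density at each point, relative to the densest point in the group
    relDens <- stats::approx(dens$x, dens$y, y[idx])$y
    relDens <- relDens / max(relDens)
    # Uniform noise in [-1, 1] scaled by relative density and width
    offsets[idx] <- stats::runif(length(idx), -1, 1) * relDens * width
  }
  # Each group is centred on its integer factor code
  as.numeric(x) + offsets
}

set.seed(2)
labs <- rep(c("a", "b"), c(30, 50))
vals <- c(rnorm(30), rnorm(50, 3))
plot(vpPositionsSketch(labs, vals), vals, xaxt = "n", xlab = "")
axis(1, 1:2, c("a", "b"))
```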