| Title: | Convenience Functions for Routine Data Exploration |
|---|---|
| Description: | A series of shortcuts for routine tasks originally developed by Rafael A. Irizarry to facilitate data exploration. |
| Authors: | Rafael A. Irizarry [aut, cre], Michael I. Love [aut] |
| Maintainer: | Rafael A. Irizarry <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.0.4 |
| Built: | 2026-05-29 11:15:41 UTC |
| Source: | https://github.com/cran/rafalib |
Converts a character vector into a factor and then converts the factor into numeric codes.
as.fumeric(x, levels = unique(x))
| Argument | Description |
|---|---|
| x | a character vector |
| levels | the levels to be used in the call to factor |
Rafael A. Irizarry
group <- c("a","a","b","b")
plot(seq_along(group), col = as.fumeric(group))
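The conversion above can be sketched in base R, assuming as.fumeric amounts to factor() followed by as.numeric(). The name as_fumeric_sketch is hypothetical and this is not the package's source. It shows why the default levels = unique(x) matters: codes follow the order of first appearance instead of factor()'s default alphabetical ordering.

```r
# Hypothetical re-implementation for illustration; not rafalib's source.
as_fumeric_sketch <- function(x, levels = unique(x)) {
  # factor() assigns each value the position of its level in `levels`;
  # as.numeric() returns those integer codes as numbers.
  as.numeric(factor(x, levels = levels))
}

as_fumeric_sketch(c("a", "a", "b", "b"))                   # 1 1 2 2
as_fumeric_sketch(c("b", "a", "b"))                        # 1 2 1 (order of appearance)
as_fumeric_sketch(c("b", "a", "b"), levels = c("a", "b"))  # 2 1 2
```

Because the result is numeric, it can be passed directly as col = to base plotting functions, as the example above does.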
Plot the overlap of three groups with a barplot
bartab(x, y, z, names, skipNone = FALSE, ...)
| Argument | Description |
|---|---|
| x | logical |
| y | logical |
| z | logical |
| names | a character vector of length 3 |
| skipNone | remove the "none" group |
| ... | further arguments passed on to |
Michael I. Love
set.seed(1)
x <- sample(c(FALSE,TRUE), 10, replace=TRUE)
y <- sample(c(FALSE,TRUE), 10, replace=TRUE)
z <- sample(c(FALSE,TRUE), 10, replace=TRUE)
bartab(x,y,z,c("X","Y","Z"))
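The overlap that bartab draws comes down to counting how many elements fall into each combination of the three logical groups. The helper below is a hypothetical sketch of that tally. bartab's own bar labels and ordering are not documented here, so the "X&Z"-style labels are an assumption.

```r
# Hypothetical helper: tally which combination of the three logical
# groups each element belongs to. Not rafalib's source.
overlap_counts <- function(x, y, z, names = c("X", "Y", "Z")) {
  stopifnot(is.logical(x), is.logical(y), is.logical(z),
            length(x) == length(y), length(y) == length(z))
  m <- cbind(x, y, z)
  stopifnot(!anyNA(m))
  # For each row, join the names of the groups that are TRUE, e.g. "X&Z"
  labs <- apply(m, 1, function(r) {
    if (any(r)) paste(names[r], collapse = "&") else "none"
  })
  table(labs)
}

set.seed(1)
x <- sample(c(FALSE, TRUE), 10, replace = TRUE)
y <- sample(c(FALSE, TRUE), 10, replace = TRUE)
z <- sample(c(FALSE, TRUE), 10, replace = TRUE)
counts <- overlap_counts(x, y, z)
if (interactive()) barplot(counts)
```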
Produces an image of a matrix in its natural orientation, that is, the way the matrix prints: row 1 at the top and column 1 at the left.
imagemat( x, col = colorRampPalette(c("white", "grey50"))(9), las = 1, xlab = "", ylab = "", ... )
| Argument | Description |
|---|---|
| x | the matrix |
| col | the colors |
| las | as in par |
| xlab | x-axis title |
| ylab | y-axis title |
| ... | arguments passed to image |
Michael I. Love
x <- matrix(c(1,0,0,0,1, 1,1,0,1,1, 1,0,1,0,1, 1,0,0,0,1, 1,0,0,0,1), ncol=5, byrow=TRUE)
imagemat(x)
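Base image(z) puts z[i, j] at horizontal position i and vertical position j, counting from the bottom-left corner. A printed matrix therefore shows up rotated. The sketch below is one way to get the natural orientation: reverse the rows, then transpose. The helper names are hypothetical, and rafalib's own axis labelling (las) is not reproduced.

```r
# Hypothetical helpers; not rafalib's source.
orient_for_image <- function(x) {
  # Reverse the rows so row 1 ends up at the top, then transpose so
  # columns run along the horizontal axis.
  t(x[nrow(x):1, , drop = FALSE])
}

imagemat_sketch <- function(x, col = colorRampPalette(c("white", "grey50"))(9), ...) {
  image(seq_len(ncol(x)), seq_len(nrow(x)), orient_for_image(x),
        col = col, xaxt = "n", yaxt = "n", xlab = "", ylab = "", ...)
}

m <- matrix(1:6, nrow = 2, byrow = TRUE)
orient_for_image(m)
if (interactive()) imagemat_sketch(m)
```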
Sorts the rows of a matrix of 0s and 1s so that the first column has 2 blocks, the second column has 4 blocks, and so on. See example("imagesort").
imagesort(x, col = c("white", "black"), ...)
| Argument | Description |
|---|---|
| x | a matrix of 0s and 1s |
| col | the colors of 0 and 1 |
| ... | arguments to heatmap |
Michael I. Love
x <- replicate(4,sample(0:1,40,TRUE))
imagesort(x)
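The block pattern described above is what a lexicographic row sort produces. Ordering by column 1, then breaking ties with column 2, and so on, leaves at most 2 runs in column 1, at most 4 in column 2, and at most 2^j in column j. This is a hypothetical sketch. The exact ordering and display used by imagesort may differ.

```r
# Hypothetical sketch; not rafalib's source.
sort_rows_by_columns <- function(x) {
  # order(col1, col2, ...) breaks ties in earlier columns using later ones
  o <- do.call(order, lapply(seq_len(ncol(x)), function(j) x[, j]))
  x[o, , drop = FALSE]
}

set.seed(1)
x <- replicate(4, sample(0:1, 40, TRUE))
sorted <- sort_rows_by_columns(x)
# Number of runs (blocks) in each column: at most 2, 4, 8, 16
sapply(seq_len(ncol(sorted)), function(j) length(rle(sorted[, j])$lengths))
```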
This function is simply a wrapper for BiocManager::install. If BiocManager is not installed, it is installed automatically.
install_bioc(...)
| Argument | Description |
|---|---|
| ... | arguments passed on to |
If BiocManager is installed you can simply call BiocManager::install instead.
Rafael A. Irizarry
install_bioc("affy")
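The wrapper pattern described above can be sketched as follows. install_bioc_sketch is a hypothetical name, and the body reflects how such a wrapper is commonly written, not the package's source. Defining the function installs nothing. Packages are only installed when it is called, for example install_bioc_sketch("affy").

```r
# Hypothetical sketch of the wrapper pattern; not rafalib's source.
install_bioc_sketch <- function(...) {
  # requireNamespace() quietly returns FALSE when the package is missing
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Forward all arguments unchanged to BiocManager::install
  BiocManager::install(...)
}
```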
This function lists the objects in the global environment and returns the n largest.
largeobj(n = 5, units = "Mb")
| Argument | Description |
|---|---|
| n | the number of objects to return |
| units | units to display, see |
a named character vector giving the sizes of the n largest objects
Michael I. Love
x <- rnorm(10^5)
y <- rnorm(10^6)
z <- rnorm(2*10^6)
w <- rnorm(3*10^6)
largeobj(n=3)
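A minimal sketch of the same idea with base utils::object.size, assuming sizes are formatted via format(..., units =). The function name and the envir argument are hypothetical additions (envir makes it testable outside the global environment). This is not rafalib's source.

```r
# Hypothetical sketch; not rafalib's source.
largest_objects <- function(n = 5, units = "Mb", envir = globalenv()) {
  nms <- ls(envir = envir)
  # object.size() estimates the memory used by each object
  sizes <- lapply(nms, function(nm) object.size(get(nm, envir = envir)))
  names(sizes) <- nms
  bytes <- vapply(sizes, as.numeric, numeric(1))
  top <- names(head(sort(bytes, decreasing = TRUE), n))
  # format.object_size() renders e.g. "7.6 Mb"; the result is a named character vector
  vapply(sizes[top], format, character(1), units = units)
}

x <- rnorm(10^5)
y <- rnorm(10^6)
largest_objects(n = 2)
```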
Takes two vectors x and y and plots M = y - x versus A = (x + y)/2. If the vectors are longer than n, a random sample of size n is plotted. A smooth curve is added to show trends.
maplot( x, y, n = 10000, subset = NULL, xlab = NULL, ylab = NULL, curve.add = TRUE, curve.col = 2, curve.span = 1/2, curve.lwd = 2, curve.n = 2000, ... )
| Argument | Description |
|---|---|
| x | a numeric vector |
| y | a numeric vector |
| n | a numeric value. If |
| subset | index of the points to be plotted |
| xlab | a title for the x axis |
| ylab | a title for the y axis |
| curve.add | if |
| curve.col | a numeric value that determines the color of the smooth curve |
| curve.span | is passed on to |
| curve.lwd | the line width for the smooth curve |
| curve.n | a numeric value that determines the sample size used to fit the curve. This makes fitting the curve faster with large datasets |
| ... | further arguments passed to |
Rafael A. Irizarry
n <- 10000
signal <- runif(n,4,15)
bias <- (signal/5 - 2)^2
x <- signal + rnorm(n)
y <- signal + bias + rnorm(n)
maplot(x,y)
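The M/A transformation and the down-sampling to n points can be sketched as below. ma_values is a hypothetical helper. The smoother in the plotting step is base lowess(), used as a stand-in with f matching the default curve.span of 1/2. Which smoother maplot actually calls is not stated here.

```r
# Hypothetical sketch; not rafalib's source.
ma_values <- function(x, y, n = 10000) {
  stopifnot(length(x) == length(y))
  idx <- seq_along(x)
  # Keep at most n points, chosen at random
  if (length(idx) > n) idx <- idx[sample.int(length(idx), n)]
  data.frame(A = (x[idx] + y[idx]) / 2, M = y[idx] - x[idx])
}

set.seed(1)
signal <- runif(10000, 4, 15)
x <- signal + rnorm(10000)
y <- signal + (signal / 5 - 2)^2 + rnorm(10000)
ma <- ma_values(x, y, n = 2000)
if (interactive()) {
  plot(ma$A, ma$M, xlab = "A", ylab = "M")
  # lowess() is a stand-in smoother here; maplot's own choice may differ
  lines(lowess(ma$A, ma$M, f = 1/2), col = 2, lwd = 2)
}
```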
Called without arguments, mypar sets graphical parameters suited to the RStudio plot window. bigpar uses big fonts, which are good for presentations.
mypar( a = 1, b = 1, brewer.n = 8, brewer.name = "Dark2", cex.lab = 1, cex.main = 1.2, cex.axis = 1, mar = c(2.5, 2.5, 1.6, 1.1), mgp = c(1.5, 0.5, 0), ... )
| Argument | Description |
|---|---|
| a | the first entry of the vector passed to |
| b | the second entry of the vector passed to |
| brewer.n | parameter |
| brewer.name | parameters |
| cex.lab | passed on to |
| cex.main | passed on to |
| cex.axis | passed on to |
| mar | passed on to |
| mgp | passed on to |
| ... | other parameters passed on to |
Rafael A. Irizarry
mypar()
plot(cars)
bigpar()
plot(cars)
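Most of these arguments map directly onto base par() settings. The sketch below assumes a and b form the mfrow grid. It leaves out the colour-palette step controlled by brewer.n and brewer.name, and mypar_sketch is a hypothetical name, not the package's source. Because par() returns the previous values when setting, they can be restored afterwards.

```r
# Hypothetical sketch; not rafalib's source. The palette step driven by
# brewer.n and brewer.name is omitted.
mypar_sketch <- function(a = 1, b = 1, cex.lab = 1, cex.main = 1.2, cex.axis = 1,
                         mar = c(2.5, 2.5, 1.6, 1.1), mgp = c(1.5, 0.5, 0), ...) {
  # mfrow = c(a, b) gives an a-by-b grid of panels filled row by row;
  # par() returns the previous values invisibly so they can be restored
  par(mfrow = c(a, b), cex.lab = cex.lab, cex.main = cex.main,
      cex.axis = cex.axis, mar = mar, mgp = mgp, ...)
}

if (interactive()) {
  old <- mypar_sketch(1, 2)
  plot(cars)
  hist(cars$speed)
  par(old)
}
```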
A modification of plclust for plotting hclust objects in colour.
myplclust( hclust, labels = hclust$labels, lab.col = rep(1, length(hclust$labels)), hang = 0.1, xlab = "", sub = "", ... )
| Argument | Description |
|---|---|
| hclust | hclust object |
| labels | a character vector of labels of the leaves of the tree |
| lab.col | colour for the labels; NA=default device foreground colour |
| hang | |
| xlab | title for x-axis (defaults to no title) |
| sub | subtitle (defaults to no subtitle) |
| ... | further arguments passed to |
Eva KF Chan
data(iris)
hc <- hclust( dist(iris[,1:4]) )
myplclust(hc, labels=iris$Species, lab.col=as.numeric(iris$Species))
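The main step in colouring leaf labels is reordering labels and colours by hc$order, which gives the left-to-right position of each observation in the dendrogram. The sketch below assumes that approach and places rotated labels under a plot drawn with labels = FALSE. Both helper names are hypothetical, and this is not the source of myplclust.

```r
# Hypothetical sketch; not the source of myplclust.
leaf_label_order <- function(hc, labels = hc$labels, lab.col = 1) {
  if (is.null(labels)) labels <- seq_along(hc$order)
  lab.col <- rep_len(lab.col, length(hc$order))
  # hc$order[k] is the observation drawn at the k-th leaf from the left
  data.frame(label = as.character(labels)[hc$order],
             col = lab.col[hc$order],
             stringsAsFactors = FALSE)
}

plot_colored_hclust <- function(hc, labels = hc$labels, lab.col = 1,
                                hang = 0.1, cex = 0.6) {
  leaves <- leaf_label_order(hc, labels, lab.col)
  plot(hc, labels = FALSE, hang = hang, xlab = "", sub = "")
  # Leaves sit at x = 1, 2, ...; write the labels just below the plot region
  text(seq_len(nrow(leaves)), par("usr")[3], leaves$label, col = leaves$col,
       srt = 90, adj = 1, xpd = NA, cex = cex)
}

hc <- hclust(dist(iris[, 1:4]))
if (interactive()) {
  plot_colored_hclust(hc, labels = iris$Species, lab.col = as.numeric(iris$Species))
}
```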
Make a plot with nothing in it.
nullplot(x1 = 0, x2 = 1, y1 = 0, y2 = 1, xlab = "", ylab = "", ...)
| Argument | Description |
|---|---|
| x1 | lowest x-axis value |
| x2 | largest x-axis value |
| y1 | lowest y-axis value |
| y2 | largest y-axis value |
| xlab | x-axis title, defaults to no title |
| ylab | y-axis title, defaults to no title |
| ... | further arguments passed on to plot |
nullplot()
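In base R, an empty plot usually comes from plot(..., type = "n"). This sets up the axes and coordinate system without drawing any points, so points, lines or text can be added afterwards. A minimal sketch under that assumption, with a hypothetical function name:

```r
# Hypothetical sketch; not rafalib's source.
nullplot_sketch <- function(x1 = 0, x2 = 1, y1 = 0, y2 = 1, xlab = "", ylab = "", ...) {
  # type = "n" sets up axes and coordinates covering [x1, x2] x [y1, y2]
  # but draws nothing
  plot(c(x1, x2), c(y1, y2), type = "n", xlab = xlab, ylab = ylab, ...)
}

if (interactive()) {
  nullplot_sketch(0, 10, -1, 1)
  text(5, 0, "added later")
}
```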
This returns a character vector containing the first n lines of a file.
Note: I realized after the fact that this is essentially a duplicate
of the base R function readLines.
peek(x, n = 2)
| Argument | Description |
|---|---|
| x | a filename |
| n | the number of lines to return |
Michael I. Love
filename <- tempfile()
x <- matrix(round(rnorm(10^4),2),1000,10)
colnames(x) <- letters[1:10]
write.csv(x,file=filename,row.names=FALSE)
peek(filename)
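As the note says, base readLines already covers this. Its n argument caps the number of lines read, so the example above corresponds roughly to the following. Whether peek adds any formatting on top is not stated here.

```r
filename <- tempfile(fileext = ".csv")
x <- matrix(round(rnorm(10^4), 2), 1000, 10)
colnames(x) <- letters[1:10]
write.csv(x, file = filename, row.names = FALSE)
# Base R counterpart of peek(filename, n = 2): the header plus the first data row
first_lines <- readLines(filename, n = 2)
first_lines
```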
Returns the population standard deviation. Note that sd returns
the square root of the unbiased sample estimate of the population variance.
popsd simply multiplies the result of var by (n-1)/n, where n is
the number of values, and takes the square root.
popsd(x, na.rm = FALSE)
| Argument | Description |
|---|---|
| x | a numeric vector or an R object which is coercible to one by |
| na.rm | logical. Should missing values be removed? |
Returns the population variance. Note that var returns
the unbiased sample estimate of the population variance.
popvar simply multiplies the result of var by (n-1)/n, where n is
the number of values.
popvar(x, ...)
| Argument | Description |
|---|---|
| x | a numeric vector, matrix or data frame. |
| ... | further arguments passed along to |
x <- c(0,1) ## population variance should be 0.5^2 = 0.25
var(x)
popvar(x)
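Written out, the rescaling turns the n - 1 denominator used by var into n. Let s^2 be the value var returns and \bar{x} the mean of x:

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\qquad
\sigma^2 = \frac{n-1}{n}\,s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
$$

For x = (0, 1) we have \bar{x} = 0.5, and the squared deviations sum to 0.25 + 0.25 = 0.5. So var returns s^2 = 0.5/1 = 0.5, while popvar returns \sigma^2 = 0.5/2 = 0.25, matching the comment in the example.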
Draws points or boxes depending on sample size.
sboxplot(x, ...)
| Argument | Description |
|---|---|
| x | a named list of numeric vectors |
| ... | further arguments passed on to |
sboxplot(list(a=rnorm(15),b=rnorm(75),c=rnorm(10000)))
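One way to implement "points or boxes depending on sample size" is to compare each group's length with a cutoff. Large groups get a boxplot, and small groups get jittered points at the same positions. The cutoff of 50 and both function names are hypothetical. The documentation above does not state the threshold sboxplot uses.

```r
# Hypothetical sketch; the threshold of 50 is an assumption, not rafalib's value.
display_type <- function(x, threshold = 50) {
  ifelse(lengths(x) < threshold, "points", "box")
}

sboxplot_sketch <- function(x, threshold = 50, ...) {
  small <- lengths(x) < threshold
  # Draw boxes for large groups; boxes (and outliers) of small groups are transparent
  boxplot(x, border = ifelse(small, "transparent", "black"), ...)
  # Overlay jittered points for the small groups at their box positions
  if (any(small)) {
    stripchart(x[small], at = which(small), vertical = TRUE,
               method = "jitter", pch = 1, add = TRUE)
  }
}

groups <- list(a = rnorm(15), b = rnorm(75), c = rnorm(10000))
display_type(groups)
if (interactive()) sboxplot_sketch(groups)
```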
A smooth histogram with a unit indicator
(the kernel density estimate is simply rescaled). The advantage of this plot
is its interpretability: the height of the curve represents the
frequency in an interval of size unit around the point in question.
Another advantage is that if z is a matrix, the curves are plotted
together.
shist( z, unit, bw = "nrd0", n, from, to, plotHist = FALSE, add = FALSE, xlab, ylab = "Frequency", xlim, ylim, main, ... )
| Argument | Description |
|---|---|
| z | the data |
| unit | the unit which determines the y-axis scaling and is drawn |
| bw | arguments to density |
| n | arguments to density |
| from | arguments to density |
| to | arguments to density |
| plotHist | a logical: should an actual histogram be drawn under the curve? |
| add | a logical: should the curve be added to an existing plot? |
| xlab | x-axis title, defaults to no title |
| ylab | y-axis title, defaults to "Frequency" |
| xlim | range of the x-axis |
| ylim | range of the y-axis |
| main | an overall title for the plot: see |
| ... | arguments to lines |
set.seed(1)
x <- rnorm(50)
par(mfrow=c(2,1))
hist(x, breaks=-5:5)
shist(x, unit=1, xlim=c(-5,5))
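The rescaling works because a density estimate f(x) integrates to 1. For n observations, n * unit * f(x) approximates the number of points in an interval of width unit around x, which puts the curve on the same scale as a histogram with bins of width unit. A minimal sketch of that scaling with base density(); scaled_density is a hypothetical name, not the package's source.

```r
# Hypothetical sketch of the scaling; not rafalib's source.
scaled_density <- function(z, unit, bw = "nrd0") {
  d <- density(z, bw = bw)
  # n * unit * f(x) approximates the count in an interval of width `unit` around x
  d$y <- d$y * length(z) * unit
  d
}

set.seed(1)
x <- rnorm(50)
s <- scaled_density(x, unit = 1)
if (interactive()) {
  hist(x, breaks = -5:5)    # bins of width 1, matching unit = 1
  lines(s$x, s$y, col = 2)
}
```

As a check, the area under the scaled curve is close to n * unit (here 50).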
Creates a list of indices for each unique entry of x.
splitit(x)
| Argument | Description |
|---|---|
| x | a vector |
x <- c("a","a","b","a","b","c","b","b")
splitit(x)
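In base R, the same grouping of positions can be obtained with split() on seq_along(x). This is a sketch of the idea, not necessarily how splitit is implemented. split() orders the groups by sorted factor levels, which may differ from splitit's ordering.

```r
x <- c("a", "a", "b", "a", "b", "c", "b", "b")
# split() groups the positions 1..length(x) by the value of x at each position
idx <- split(seq_along(x), x)
idx
# idx$a is 1 2 4, idx$b is 3 5 7 8, idx$c is 6
```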
If there are more than n points (10,000 by default), a random subset of n points is plotted. You can also specify
a particular subset to plot. If the length of subset is larger
than n, a random sample is still used to reduce the data size.
splot(x, y, n = 10000, subset = NULL, xlab = NULL, ylab = NULL, ...)
| Argument | Description |
|---|---|
| x | the x data |
| y | the y data |
| n | the maximum number of points to plot |
| subset | explicit subset index (optional). |
| xlab | title for the x-axis |
| ylab | title for the y-axis |
| ... | further parameters passed on to |
x <- rnorm(1e5)
y <- rnorm(1e5)
splot(x,y,pch=16,col=rgb(0,0,0,.25))
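The index selection described above can be sketched as a small helper. Start from the given subset, or from all indices, and draw a random sample of size n only if that set is still too large. choose_plot_indices is hypothetical. It uses sample.int on positions to avoid the sample() pitfall where a length-one numeric vector is treated as 1:x.

```r
# Hypothetical sketch of the subsetting rule; not rafalib's source.
choose_plot_indices <- function(len, n = 10000, subset = NULL) {
  idx <- if (is.null(subset)) seq_len(len) else subset
  # Down-sample only when there are more than n candidate points
  if (length(idx) > n) idx <- idx[sample.int(length(idx), n)]
  idx
}

set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)
i <- choose_plot_indices(length(x))
if (interactive()) plot(x[i], y[i], pch = 16, col = rgb(0, 0, 0, .25))
```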
This simply calls stripchart but specifies
a vertical plot with jitter, using pch = 1.
stripplot(...)
| Argument | Description |
|---|---|
| ... | passed to |
a plot
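The wrapper described above amounts to fixing three stripchart arguments. A minimal sketch, assuming everything else is forwarded unchanged; stripplot_sketch is a hypothetical name, not the package's source.

```r
# Hypothetical sketch of the wrapper; not rafalib's source.
stripplot_sketch <- function(..., pch = 1) {
  # vertical = TRUE puts groups along the x-axis; method = "jitter" spreads tied points
  stripchart(..., vertical = TRUE, method = "jitter", pch = pch)
}

if (interactive()) stripplot_sketch(list(a = rnorm(20), b = rnorm(20, 1)))
```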