Title: | R Commander Miscellaneous Functions |
---|---|
Description: | Various statistical, graphics, and data-management functions used by the Rcmdr package in the R Commander GUI for R. |
Authors: | John Fox [aut, cre], Manuel Marquez [aut], Robert Muenchen [ctb], Dan Putler [ctb] |
Maintainer: | John Fox <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.9-1 |
Built: | 2024-11-03 06:44:44 UTC |
Source: | CRAN |
Correctly creates a cluster membership variable that can be attached to a dataframe when only a subset of the observations in that dataframe were used to create the clustering solution. NAs are assigned to the observations of the original dataframe not used in creating the clustering solution.
assignCluster(clusterData, origData, clusterVec)
assignCluster(clusterData, origData, clusterVec)
clusterData |
The data matrix used in the clustering solution. The data matrix may have have only a subset of the observations contained in the original dataframe. |
origData |
The original dataframe from which the data used in the clustering solution were taken. |
clusterVec |
An integer variable containing the cluster membership
assignments for the observations used in creating the clustering solution.
This vector can be created using |
A factor (with integer labels) that indicate the cluster assignment for each observation, with an NA value given to observations not used in the clustering solution.
Dan Putler
hclust
, cutree
, kmeans
,
KMeans
data(USArrests) USArrkm3 <- KMeans(USArrests[USArrests$UrbanPop<66, ], centers=3) assignCluster(USArrests[USArrests$UrbanPop<66, ], USArrests, USArrkm3$cluster)
data(USArrests) USArrkm3 <- KMeans(USArrests[USArrests$UrbanPop<66, ], centers=3) assignCluster(USArrests[USArrests$UrbanPop<66, ], USArrests, USArrkm3$cluster)
Create bar plots for one or two factors scaled by frequency or precentages.
In the case of two factors, the bars can be divided (stacked) or plotted in
parallel (side-by-side). This function is a front end to barplot
in the graphics package.
Barplot(x, by, scale = c("frequency", "percent"), conditional=TRUE, style = c("divided", "parallel"), col=if (missing(by)) "gray" else rainbow_hcl(length(levels(by))), xlab = deparse(substitute(x)), legend.title = deparse(substitute(by)), ylab = scale, main=NULL, legend.pos = "above", label.bars=FALSE, ...)
Barplot(x, by, scale = c("frequency", "percent"), conditional=TRUE, style = c("divided", "parallel"), col=if (missing(by)) "gray" else rainbow_hcl(length(levels(by))), xlab = deparse(substitute(x)), legend.title = deparse(substitute(by)), ylab = scale, main=NULL, legend.pos = "above", label.bars=FALSE, ...)
x |
a factor (or character or logical variable). |
by |
optionally, a second factor (or character or logical variable). |
scale |
either |
conditional |
if |
style |
for two-factor plots, either |
col |
if |
xlab |
an optional character string providing a label for the horizontal axis. |
legend.title |
an optional character string providing a title for the legend. |
ylab |
an optional character string providing a label for the vertical axis. |
main |
an optional main title for the plot. |
legend.pos |
position of the legend, in a form acceptable to the |
label.bars |
if |
... |
arguments to be passed to the |
Invisibly returns the horizontal coordinates of the centers of the bars.
John Fox [email protected]
with(Mroz, { Barplot(wc) Barplot(wc, col="lightblue", label.bars=TRUE) Barplot(wc, by=hc) Barplot(wc, by=hc, scale="percent", label.bars=TRUE) Barplot(wc, by=hc, style="parallel", scale="percent", legend.pos="center") })
with(Mroz, { Barplot(wc) Barplot(wc, col="lightblue", label.bars=TRUE) Barplot(wc, by=hc) Barplot(wc, by=hc, scale="percent", label.bars=TRUE) Barplot(wc, by=hc, style="parallel", scale="percent", legend.pos="center") })
Bins a numeric variable, as for a histogram, and reports the count and percentage in each bin. The computations are done by the hist
function, but no histogram is drawn. If supplied a numeric matrix or data frame, the distribution of each column is printed.
binnedCounts(x, breaks="Sturges", round.percents=2, name=deparse(substitute(x)))
binnedCounts(x, breaks="Sturges", round.percents=2, name=deparse(substitute(x)))
x |
a numeric vector, matrix, or data frame. |
breaks |
specification of the breaks between bins, to be passed to the |
round.percents |
number of decimal places to round percentages; default is |
name |
name for the variable; only used for vector argument |
For a numeric vector, invisibly returns the vector of counts, named with the end-points of the corresponding bins. For a matrix or data frame, invisibly returns NULL
John Fox [email protected]
with(Prestige, binnedCounts(income)) binnedCounts(Prestige[, 1:4])
with(Prestige, binnedCounts(income)) binnedCounts(Prestige[, 1:4])
Create a factor dissecting the range of a numeric variable into
bins of equal width, (roughly) equal frequency, or at "natural"
cut points. The cut
function is used to create the factor.
bin.var
is a synomym for binVariable
, retained for backwards compatibility.
binVariable(x, bins = 4, method = c("intervals", "proportions", "natural"), labels = FALSE) bin.var(...)
binVariable(x, bins = 4, method = c("intervals", "proportions", "natural"), labels = FALSE) bin.var(...)
x |
numeric variable to be binned. |
bins |
number of bins. |
method |
one of |
labels |
if |
... |
arguments to be passed to |
A factor.
Dan Putler, slightly modified by John Fox [email protected] with the original author's permission.
summary(binVariable(rnorm(100), method="prop", labels=letters[1:4]))
summary(binVariable(rnorm(100), method="prop", labels=letters[1:4]))
Percentage a matrix or higher-dimensional array of frequency counts by rows, columns, or total frequency.
colPercents(tab, digits=1) rowPercents(tab, digits=1) totPercents(tab, digits=1)
colPercents(tab, digits=1) rowPercents(tab, digits=1) totPercents(tab, digits=1)
tab |
a matrix or higher-dimensional array of frequency counts. |
digits |
number of places to the right of the decimal place for percentages. |
Returns an array of the same size and shape as tab
percentaged by
rows or columns, plus rows or columns of totals and counts, or by the
table total.
John Fox [email protected]
if (require(car)){ data(Mroz) # from car package cat("\n\n column percents:\n") print(colPercents(xtabs(~ lfp + wc, data=Mroz))) cat("\n\n row percents:\n") print(rowPercents(xtabs(~ hc + lfp, data=Mroz))) cat("\n\n total percents:\n") print(totPercents(xtabs(~ hc + wc, data=Mroz))) cat("\n\n three-way table, column percents:\n") print(colPercents(xtabs(~ lfp + wc + hc, data=Mroz))) }
if (require(car)){ data(Mroz) # from car package cat("\n\n column percents:\n") print(colPercents(xtabs(~ lfp + wc, data=Mroz))) cat("\n\n row percents:\n") print(rowPercents(xtabs(~ hc + lfp, data=Mroz))) cat("\n\n total percents:\n") print(totPercents(xtabs(~ hc + wc, data=Mroz))) cat("\n\n three-way table, column percents:\n") print(colPercents(xtabs(~ lfp + wc + hc, data=Mroz))) }
DeltaMethod
is a wrapper for the deltaMethod
function in the car package. It computes the asymptotic standard error of an arbitrary, usually nonlinear, function of model coefficients, which are named b0
(if there is an intercept in the model), b1
, b2
, etc., and based on the standard error, a confidence interval based on the normal distribution.
DeltaMethod(model, g, level = 0.95) ## S3 method for class 'DeltaMethod' print(x, ...)
DeltaMethod(model, g, level = 0.95) ## S3 method for class 'DeltaMethod' print(x, ...)
model |
a regression model; see the |
g |
the expression — that is, function of the coefficients — to evaluate, as a character string. |
level |
the confidence level, defaults to |
x |
an object of class |
... |
optional arguments to pass to |
DeltaMethod
returns an objects of class "DeltaMethod"
, for which a print
method is provided.
John Fox [email protected]
deltaMethod
function in the car package
if (require(car)){ DeltaMethod(lm(prestige ~ income + education, data=Duncan), "b1/b2") }
if (require(car)){ DeltaMethod(lm(prestige ~ income + education, data=Duncan), "b1/b2") }
Computes the frequency and percentage distribution of a descrete numeric variable or the distributions of the variables in a numeric matrix or data frame.
discreteCounts(x, round.percents=2, name=deparse(substitute(x)), max.values=min(round(2*sqrt(length(x))), round(10*log10(length(x))), 100))
discreteCounts(x, round.percents=2, name=deparse(substitute(x)), max.values=min(round(2*sqrt(length(x))), round(10*log10(length(x))), 100))
x |
a discrete numeric vector, matrix, or data frame. |
round.percents |
number of decimal places to round percentages; default is |
name |
name for the variable; only used for vector argument |
max.values |
maximum number of unique values (default is the smallest of
twice the square root of the number of elements in |
For a numeric vector, invisibly returns the table of counts.
For a matrix or data frame, invisibly returns NULL
John Fox [email protected]
set.seed(12345) # for reproducibility discreteCounts(data.frame(x=rpois(51, 2), y=rpois(51, 10)))
set.seed(12345) # for reproducibility discreteCounts(data.frame(x=rpois(51, 2), y=rpois(51, 10)))
Plot the distribution of a discrete numeric variable, optionally classified by a factor.
discretePlot(x, by, scale = c("frequency", "percent"), xlab = deparse(substitute(x)), ylab = scale, main = "", xlim=NULL, ylim=NULL, ...)
discretePlot(x, by, scale = c("frequency", "percent"), xlab = deparse(substitute(x)), ylab = scale, main = "", xlim=NULL, ylim=NULL, ...)
x |
a numeric variable. |
by |
optionally a factor (or character or logical variable) by which to classify |
scale |
either |
xlab |
optional character string to label the horizontal axis. |
ylab |
optional character string to label the vertical axis. |
main |
optonal main label for the plot (ignored if the |
xlim , ylim
|
two-element numeric vectors specifying the ranges of the x and y axes; if not specified, will be determined from the data; the lower limit of the y-axis should normally be 0 and a warning will be printed if it isn't. |
... |
other arguments to be passed to |
.
If the by
argument is specified, then one plot is produced for each
level of by
; these are arranged vertically and all use the same scale
for the horizontal and vertical axes.
Returns NULL
invisibly.
John Fox [email protected]
Hist
, link{Dotplot}
.
if (require(datasets)){ data(mtcars) mtcars$cyl <- factor(mtcars$cyl) with(mtcars, { discretePlot(carb) discretePlot(carb, scale="percent") discretePlot(carb, by=cyl) }) }
if (require(datasets)){ data(mtcars) mtcars$cyl <- factor(mtcars$cyl) with(mtcars, { discretePlot(carb) discretePlot(carb, scale="percent") discretePlot(carb, by=cyl) }) }
Dot plot of numeric variable, either using raw values or binned, optionally classified by a factor. Dot plots are useful for visualizing the distribution of a numeric variable in a small data set.
Dotplot(x, by, bin = FALSE, breaks, xlim, xlab = deparse(substitute(x)))
Dotplot(x, by, bin = FALSE, breaks, xlim, xlab = deparse(substitute(x)))
x |
a numeric variable. |
by |
optionally a factor (or character or logical variable) by which to classify |
bin |
if |
breaks |
breaks for the bins, in a form acceptable to the |
xlim |
optional 2-element numeric vector giving limits of the horizontal axis. |
xlab |
optional character string to label horizontal axis. |
If the by
argument is specified, then one dot plot is produced for each
level of by
; these are arranged vertically and all use the same scale
for x
. An attempt is made to adjust the size of the dots to the space
available without making them too big.
Returns NULL
invisibly.
John Fox [email protected]
if (require(car)){ data(Duncan) with(Duncan, { Dotplot(education) Dotplot(education, bin=TRUE) Dotplot(education, by=type) Dotplot(education, by=type, bin=TRUE) }) }
if (require(car)){ data(Duncan) with(Duncan, { Dotplot(education) Dotplot(education, bin=TRUE) Dotplot(education, by=type) Dotplot(education, by=type, bin=TRUE) }) }
Density, distribution function, quantile function and random generation for the Gumbel distribution with
specified location
and scale
parameters.
dgumbel(x, location = 0, scale = 1) pgumbel(q, location=0, scale=1, lower.tail=TRUE) qgumbel(p, location=0, scale=1, lower.tail=TRUE) rgumbel(n, location=0, scale=1)
dgumbel(x, location = 0, scale = 1) pgumbel(q, location=0, scale=1, lower.tail=TRUE) qgumbel(p, location=0, scale=1, lower.tail=TRUE) rgumbel(n, location=0, scale=1)
x , q
|
vector of quantiles (values of the variable). |
p |
vector of probabilities. |
n |
number of observations. If |
location |
location parameter (default |
scale |
scale parameter (default |
lower.tail |
logical; if |
John Fox [email protected]
See https://en.wikipedia.org/wiki/Gumbel_distribution for details of the Gumbel distribution.
x <- 100 + 5*c(-Inf, -1, 0, 1, 2, 3, Inf, NA) dgumbel(x, 100, 5) pgumbel(x, 100, 5) p <- c(0, .25, .5, .75, 1, NA) qgumbel(p, 100, 5) summary(rgumbel(1e5, 100, 5))
x <- 100 + 5*c(-Inf, -1, 0, 1, 2, 3, Inf, NA) dgumbel(x, 100, 5) pgumbel(x, 100, 5) p <- c(0, .25, .5, .75, 1, NA) qgumbel(p, 100, 5) summary(rgumbel(1e5, 100, 5))
This function is a wrapper for the hist
function in
the base
package, permitting percentage scaling of the
vertical axis in addition to frequency and density scaling.
Hist(x, groups, scale=c("frequency", "percent", "density"), xlab=deparse(substitute(x)), ylab=scale, main="", breaks = "Sturges", ...)
Hist(x, groups, scale=c("frequency", "percent", "density"), xlab=deparse(substitute(x)), ylab=scale, main="", breaks = "Sturges", ...)
x |
a vector of values for which a histogram is to be plotted. |
groups |
a factor (or character or logical variable) to create histograms by group with common horizontal and vertical scales. |
scale |
the scaling of the vertical axis: |
xlab |
x-axis label, defaults to name of variable. |
ylab |
y-axis label, defaults to value of |
main |
main title for graph, defaults to empty. |
breaks |
see the |
... |
arguments to be passed to |
This function is primarily called for its side effect —
plotting a histogram or histograms — but it also invisibly
returns an object of class hist
or a list of hist
objects.
John Fox [email protected]
data(Prestige, package="car") Hist(Prestige$income, scale="percent") with(Prestige, Hist(income, groups=type))
data(Prestige, package="car") Hist(Prestige$income, scale="percent") with(Prestige, Hist(income, groups=type))
Index plots with point identification.
indexplot(x, groups, labels = seq_along(x), id.method = "y", type = "h", id.n = 0, ylab, legend="topright", title, col=palette(), ...)
indexplot(x, groups, labels = seq_along(x), id.method = "y", type = "h", id.n = 0, ylab, legend="topright", title, col=palette(), ...)
x |
a numeric variable, a matrix whose columns are numeric variables, or a numeric data frame;
if |
labels |
point labels; if |
groups |
an optional grouping variable, typically a factor (or character or logical variable). |
id.method |
method for identifying points; see |
type |
to be passed to |
id.n |
number of points to identify; see |
ylab |
label for vertical axis; if missing, will be constructed from |
legend |
keyword (see |
title |
title for the legend; may normally be omitted. |
col |
vector of colors for the |
... |
to be passed to |
Returns labelled indices of identified points or (invisibly) NULL
if no points
are identified or if there are multiple variables with some missing data.
John Fox [email protected]
if (require("car")){ with(Prestige, indexplot(income, id.n=2, labels=rownames(Prestige))) indexplot(Prestige[, c("income", "education", "prestige")], groups = Prestige$type, id.n=2) }
if (require("car")){ with(Prestige, indexplot(income, id.n=2, labels=rownames(Prestige))) indexplot(Prestige[, c("income", "education", "prestige")], groups = Prestige$type, id.n=2) }
Finds a number of k-means clusting solutions using R's kmeans
function,
and selects as the final solution the one that has the minimum total
within-cluster sum of squared distances.
KMeans(x, centers, iter.max=10, num.seeds=10)
KMeans(x, centers, iter.max=10, num.seeds=10)
x |
A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a dataframe with all numeric columns). |
centers |
The number of clusters in the solution. |
iter.max |
The maximum number of iterations allowed. |
num.seeds |
The number of different starting random seeds to use. Each random seed results in a different k-means solution. |
A list with components:
cluster |
A vector of integers indicating the cluster to which each point is allocated. |
centers |
A matrix of cluster centres (centroids). |
withinss |
The within-cluster sum of squares for each cluster. |
tot.withinss |
The within-cluster sum of squares summed across clusters. |
betweenss |
The between-cluster sum of squared distances. |
size |
The number of points in each cluster. |
Dan Putler
data(USArrests) KMeans(USArrests, centers=3, iter.max=5, num.seeds=5)
data(USArrests) KMeans(USArrests, centers=3, iter.max=5, num.seeds=5)
This function plots lines for one or more variables against another variable — typically time series against time.
lineplot(x, ..., legend)
lineplot(x, ..., legend)
x |
variable giving horizontal coordinates. |
... |
one or more variables giving vertical coordinates. |
legend |
plot legend? Default is |
Produces a plot; returns NULL
invisibly.
John Fox [email protected]
if (require("car")){ data(Bfox) Bfox$time <- as.numeric(rownames(Bfox)) with(Bfox, lineplot(time, menwage, womwage)) }
if (require("car")){ data(Bfox) Bfox$time <- as.numeric(rownames(Bfox)) with(Bfox, lineplot(time, menwage, womwage)) }
This function merges two data frames by combining their rows.
mergeRows(X, Y, common.only = FALSE, ...) ## S3 method for class 'data.frame' mergeRows(X, Y, common.only = FALSE, ...)
mergeRows(X, Y, common.only = FALSE, ...) ## S3 method for class 'data.frame' mergeRows(X, Y, common.only = FALSE, ...)
X |
First data frame. |
Y |
Second data frame. |
common.only |
If |
... |
Not used. |
A data frame containing the rows from both input data frames.
John Fox
For column merges and more complex merges, see merge
.
if (require(car)){ data(Duncan) D1 <- Duncan[1:20,] D2 <- Duncan[21:45,] D <- mergeRows(D1, D2) print(D) dim(D) }
if (require(car)){ data(Duncan) D1 <- Duncan[1:20,] D2 <- Duncan[21:45,] D <- mergeRows(D1, D2) print(D) dim(D) }
Perform one of several tests of normality, either for a variable or for a variable by groups. The normalityTest
function uses the shapiro.test
function or one of several functions in the nortest package.
If tests are done by groups, then adjusted p-values, computed by the Holm method, are also reported (see p.adjust
).
normalityTest(x, ...) ## S3 method for class 'formula' normalityTest(formula, test, data, ...) ## Default S3 method: normalityTest(x, test=c("shapiro.test", "ad.test", "cvm.test", "lillie.test", "pearson.test", "sf.test"), groups, vname, gname, ...)
normalityTest(x, ...) ## S3 method for class 'formula' normalityTest(formula, test, data, ...) ## Default S3 method: normalityTest(x, test=c("shapiro.test", "ad.test", "cvm.test", "lillie.test", "pearson.test", "sf.test"), groups, vname, gname, ...)
x |
numeric vector or formula. |
formula |
one-sided formula of the form |
data |
a data frame containing the data for the test. |
test |
quoted name of the function to perform the test. |
groups |
optional factor to divide the data into groups. |
vname |
optional name for the variable; if absent, taken from |
gname |
optional name for the grouping factor; if absent, taken from |
... |
any arguments to be passed down; the only useful such arguments are for the
|
If testing by groups, the function invisibly returns NULL
; otherwise it returns an object of class
"htest"
, which normally would be printed.
John Fox [email protected]
shapiro.test
, ad.test
, cvm.test
, lillie.test
,
pearson.test
, sf.test
.
data(Prestige, package="car") with(Prestige, normalityTest(income)) normalityTest(income ~ type, data=Prestige, test="ad.test") normalityTest(~income, data=Prestige, test="pearson.test", n.classes=5)
data(Prestige, package="car") with(Prestige, normalityTest(income)) normalityTest(income ~ type, data=Prestige, test="ad.test") normalityTest(~income, data=Prestige, test="pearson.test", n.classes=5)
numSummary
creates neatly formatted tables of means, standard deviations, coefficients of variation,
skewness, kurtosis, and quantiles of numeric variables. CV
computes the coefficient of variation.
numSummary(data, statistics=c("mean", "sd", "se(mean)", "var", "CV", "IQR", "quantiles", "skewness", "kurtosis"), type=c("2", "1", "3"), quantiles=c(0, .25, .5, .75, 1), groups) CV(x, na.rm=TRUE) ## S3 method for class 'numSummary' print(x, ...)
numSummary(data, statistics=c("mean", "sd", "se(mean)", "var", "CV", "IQR", "quantiles", "skewness", "kurtosis"), type=c("2", "1", "3"), quantiles=c(0, .25, .5, .75, 1), groups) CV(x, na.rm=TRUE) ## S3 method for class 'numSummary' print(x, ...)
data |
a numeric vector, matrix, or data frame. |
statistics |
any of |
type |
definition to use in computing skewness and kurtosis; see the
|
quantiles |
quantiles to report; default is |
groups |
optional variable, typically a factor, to be used to partition the data. |
x |
object of class |
na.rm |
if |
... |
arguments to pass down from the print method. |
numSummary
returns an object of class "numSummary"
containing the table of
statistics to be reported along with information on missing data, if there are any. CV
returns the coefficient(s) of variation.
John Fox [email protected]
mean
, sd
, quantile
,
skewness
, kurtosis
.
if (require("carData")){ data(Prestige) Prestige[1, "income"] <- NA print(numSummary(Prestige[,c("income", "education")], statistics=c("mean", "sd", "quantiles", "cv", "skewness", "kurtosis"))) print(numSummary(Prestige[,c("income", "education")], groups=Prestige$type)) remove(Prestige) }
if (require("carData")){ data(Prestige) Prestige[1, "income"] <- NA print(numSummary(Prestige[,c("income", "education")], statistics=c("mean", "sd", "quantiles", "cv", "skewness", "kurtosis"))) print(numSummary(Prestige[,c("income", "education")], groups=Prestige$type)) remove(Prestige) }
Computes a matrix of partial correlations between each pair of variables controlling for the others.
partial.cor(X, tests=FALSE, use=c("complete.obs", "pairwise.complete.obs"))
partial.cor(X, tests=FALSE, use=c("complete.obs", "pairwise.complete.obs"))
X |
data matrix. |
tests |
show two-sided p-value and p-value adjusted for multiple testing by Holm's method for each partial correlation? |
use |
observations to use to compute partial correlations, default is |
Returns the matrix of partial correlations, optionally with adjusted and unadjusted p-values.
John Fox [email protected]
data(DavisThin, package="car") partial.cor(DavisThin) partial.cor(DavisThin, tests=TRUE)
data(DavisThin, package="car") partial.cor(DavisThin) partial.cor(DavisThin, tests=TRUE)
piechart
is a front-end to the standard R pie
function, with the
capability of adding percents or counts to the pie-segment labels.
piechart(x, scale = c("percent", "frequency", "none"), col = rainbow_hcl(nlevels(x)), ...)
piechart(x, scale = c("percent", "frequency", "none"), col = rainbow_hcl(nlevels(x)), ...)
x |
a factor or other discrete variable; the segments of the pie correspond to the
unique values (levels) of |
scale |
parenthetical numbers to add to the pie-segment labels; the default is |
col |
colors for the segments; the default is provided by the |
... |
further arguments to be passed to |
John Fox [email protected]
with(Duncan, piechart(type))
with(Duncan, piechart(type))
The function takes an object of class "boot"
and creates an array of density estimates for the bootstrap distributions of the parameters.
plotBoot(object, confint=NULL, ...) ## S3 method for class 'boot' plotBoot(object, confint=NULL, ...)
plotBoot(object, confint=NULL, ...) ## S3 method for class 'boot' plotBoot(object, confint=NULL, ...)
object |
an object of class |
confint |
an object of class |
... |
not used |
Creates an array of adaptive kernal density plots, using densityPlot
in the car package, showing the bootstrap distribution, point estimate ,and (optionally) confidence limits for each parameter.
Invisibly returns the object produced by densityPlot
.
John Fox [email protected]
## Not run: plotBoot(Boot(lm(prestige ~ income + education + type, data=Duncan))) ## End(Not run)
## Not run: plotBoot(Boot(lm(prestige ~ income + education + type, data=Duncan))) ## End(Not run)
This function plots a probability density, mass, or distribution function, adapting the form of the plot as appropriate.
plotDistr(x, p, discrete=FALSE, cdf=FALSE, regions=NULL, col="gray", legend=TRUE, legend.pos="topright", ...)
plotDistr(x, p, discrete=FALSE, cdf=FALSE, regions=NULL, col="gray", legend=TRUE, legend.pos="topright", ...)
x |
horizontal coordinates |
p |
vertical coordinates |
discrete |
is the random variable discrete? |
cdf |
is this a cumulative distribution (as opposed to mass) function? |
regions , col
|
for continuous distributions only,
if non- |
legend |
plot a legend of the regions (default |
legend.pos |
position for the legend (see |
... |
arguments to be passed to |
Produces a plot; returns NULL
invisibly.
John Fox [email protected]
x <- seq(-4, 4, length=100) plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density") plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density", region=list(c(1.96, Inf), c(-Inf, -1.96)), col=c("red", "blue")) plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density", region=list(c(qnorm(0), qnorm(.025)), c(qnorm(.975), qnorm(1)))) # same x <- 0:10 plotDistr(x, pbinom(x, 10, 0.5), xlab="successes", discrete=TRUE, cdf=TRUE, main="Binomial Distribution Function, p=0.5, n=10")
x <- seq(-4, 4, length=100) plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density") plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density", region=list(c(1.96, Inf), c(-Inf, -1.96)), col=c("red", "blue")) plotDistr(x, dnorm(x), xlab="Z", ylab="p(z)", main="Standard Normal Density", region=list(c(qnorm(0), qnorm(.025)), c(qnorm(.975), qnorm(1)))) # same x <- 0:10 plotDistr(x, pbinom(x, 10, 0.5), xlab="successes", discrete=TRUE, cdf=TRUE, main="Binomial Distribution Function, p=0.5, n=10")
Plots cell means for a numeric variable in each category of a factor or in each combination of categories of two factors, optionally along with error bars based on cell standard errors or standard deviations.
plotMeans(response, factor1, factor2, error.bars = c("se", "sd", "conf.int", "none"), level=0.95, xlab=deparse(substitute(factor1)), ylab=paste("mean of", deparse(substitute(response))), legend.lab=deparse(substitute(factor2)), legend.pos=c("farright", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center"), main="Plot of Means", pch=1:n.levs.2, lty=1:n.levs.2, col=palette(), connect=TRUE, ...)
plotMeans(response, factor1, factor2, error.bars = c("se", "sd", "conf.int", "none"), level=0.95, xlab=deparse(substitute(factor1)), ylab=paste("mean of", deparse(substitute(response))), legend.lab=deparse(substitute(factor2)), legend.pos=c("farright", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center"), main="Plot of Means", pch=1:n.levs.2, lty=1:n.levs.2, col=palette(), connect=TRUE, ...)
response |
Numeric variable for which means are to be computed. |
factor1 |
Factor defining horizontal axis of the plot. |
factor2 |
If present, factor defining profiles of means |
error.bars |
If |
level |
level of confidence for confidence intervals; default is .95 |
xlab |
Label for horizontal axis. |
ylab |
Label for vertical axis. |
legend.lab |
Label for legend. |
legend.pos |
Position of legend; if |
main |
Label for the graph. |
pch |
Plotting characters for profiles of means. |
lty |
Line types for profiles of means. |
col |
Colours for profiles of means |
connect |
connect profiles of means, default |
... |
arguments to be passed to |
The function invisibly returns NULL
.
John Fox [email protected]
if (require(car)){ data(Moore) with(Moore, plotMeans(conformity, fcategory, partner.status, ylim=c(0, 25))) }
if (require(car)){ data(Moore) with(Moore, plotMeans(conformity, fcategory, partner.status, ylim=c(0, 25))) }
This function uses the rcorr
function in the Hmisc package
to compute matrices of Pearson or Spearman correlations along with
the pairwise p-values among the correlations. The p-values are corrected
for multiple inference using Holm's method (see p.adjust
).
Observations are filtered for missing data, and only complete observations are used.
rcorr.adjust(x, type = c("pearson", "spearman"), use=c("complete.obs", "pairwise.complete.obs")) ## S3 method for class 'rcorr.adjust' print(x, ...)
rcorr.adjust(x, type = c("pearson", "spearman"), use=c("complete.obs", "pairwise.complete.obs")) ## S3 method for class 'rcorr.adjust' print(x, ...)
x |
a numeric matrix or data frame, or an object of class |
type |
|
use |
how to handle missing data: |
... |
not used. |
Returns an object of class "rcorr.adjust"
, which is normally just printed.
John Fox, adapting code from Robert A. Muenchen.
if (require(car)){ data(Mroz) print(rcorr.adjust(Mroz[,c("k5", "k618", "age", "lwg", "inc")])) print(rcorr.adjust(Mroz[,c("k5", "k618", "age", "lwg", "inc")], type="spearman")) }
if (require(car)){ data(Mroz) print(rcorr.adjust(Mroz[,c("k5", "k618", "age", "lwg", "inc")])) print(rcorr.adjust(Mroz[,c("k5", "k618", "age", "lwg", "inc")], type="spearman")) }
readSAS
reads a SAS “b7dat” data set, stored in a file of type .sas7bdat
, into an R data frame; it provides
a front end to the read_sas
function in the haven package.
readSAS(file, rownames=FALSE, stringsAsFactors=FALSE)
readSAS(file, rownames=FALSE, stringsAsFactors=FALSE)
file |
path to a SAS b7dat file. |
rownames |
if |
stringsAsFactors |
if |
a data frame
John Fox [email protected]
readSPSS
reads an SPSS data set, stored in a file of type .sav
or .por
, into an R data frame; it provides
a front end to the read_spss
function in the haven package and the read.spss
function in the foreign package.
readSPSS(file, rownames=FALSE, stringsAsFactors=FALSE, tolower=TRUE, use.value.labels=TRUE, use.haven=!por)
readSPSS(file, rownames=FALSE, stringsAsFactors=FALSE, tolower=TRUE, use.value.labels=TRUE, use.haven=!por)
file |
path to an SPSS |
rownames |
if |
stringsAsFactors |
if |
tolower |
change variable names to lowercase, default |
use.value.labels |
if |
use.haven |
use |
a data frame
John Fox [email protected]
readStata
reads a Stata data set, stored in a file of type .dta
, into an R data frame; it provides
a front end to the read.dta13
function in the readstata13 package.
readStata(file, rownames=FALSE, stringsAsFactors=FALSE, convert.dates=TRUE)
readStata(file, rownames=FALSE, stringsAsFactors=FALSE, convert.dates=TRUE)
file |
path to a Stata |
rownames |
if |
stringsAsFactors |
if |
convert.dates |
if |
a data frame
John Fox [email protected]
readXL
reads an Excel file, either of type .xls
or .xlsx
into an R data frame; it provides
a front end to the read_excel
function in the readxl package.
excel_sheets
is re-exported from the readxl package and reports the
names of spreadsheets in an Excel file.
readXL(file, rownames = FALSE, header = TRUE, na = "", sheet = 1, stringsAsFactors = FALSE) excel_sheets(path)
readXL(file, rownames = FALSE, header = TRUE, na = "", sheet = 1, stringsAsFactors = FALSE) excel_sheets(path)
file , path
|
path to an Excel file. |
rownames |
if |
header |
if |
na |
character string denoting missing data; the default is the empty string, |
sheet |
number of the spreadsheet in the file containing the data to be read; the
default is |
stringsAsFactors |
if |
a data frame
John Fox [email protected]
Calculates Cronbach's alpha and standardized alpha (lower bounds on reliability) for a composite (summated-rating) scale. Standardized alpha is for the sum of the standardized items. In addition, the function calculates alpha and standardized alpha for the scale with each item deleted in turn, and computes the correlation between each item and the sum of the other items.
reliability(S) ## S3 method for class 'reliability' print(x, digits=4, ...)
reliability(S) ## S3 method for class 'reliability' print(x, digits=4, ...)
S |
the covariance matrix of the items; normally, there should be at least 3 items and certainly no fewer than 2. |
x |
reliability object to be printed. |
digits |
number of decimal places. |
... |
not used: for compatibility with the print generic." |
an object of class reliability, which normally would be printed.
John Fox [email protected]
N. Cliff (1986) Psychological testing theory. Pp. 343–349 in S. Kotz and N. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 7. Wiley.
if (require(car)){ data(DavisThin) reliability(cov(DavisThin)) }
if (require(car)){ data(DavisThin) reliability(cov(DavisThin)) }
Creates a means plot for a repeated-measures ANOVA design with one or two within-subjects factor and zero or more between-subjects factors, for data in "wide" format.
repeatedMeasuresPlot(data, within, within.names, within.levels, between.names = NULL, response.name = "score", trace, xvar, pch=15:25, lty=1:6, col=palette()[-1], plot.means=TRUE, print.tables = FALSE)
repeatedMeasuresPlot(data, within, within.names, within.levels, between.names = NULL, response.name = "score", trace, xvar, pch=15:25, lty=1:6, col=palette()[-1], plot.means=TRUE, print.tables = FALSE)
data |
a data frame in wide format. |
within |
a character vector with the names of the data columns containing the repeated measures. |
within.names |
a character vector with one or two elements, of names of the within-subjects factor(s). |
within.levels |
a named list whose elements are character vectors of level names for the within-subjects factors, with names corresponding to the names of the within-subjects factors; the product of the numbers of levels should be equal to the number of repeated-measures columns in |
between.names |
a column vector of names of the between-subjects factors (if any). |
response.name |
optional quoted name for the response variable, defaults to |
trace |
optional quoted name of the (either within- or between-subjects) factor to define profiles of means in each panel of the graph; the default is the within-subjects factor with the smaller number of levels, if there are two, or not used if there is one. |
xvar |
optional quoted name of the factor to define the horizontal axis of each panel; the default is the within-subjects factor with the larger number of levels. |
pch , lty
|
vectors of symbol and line-type numbers to use for the profiles of means (i.e., levels of the |
col |
vector of colors for the profiles of means; the default is given by |
plot.means |
if |
print.tables |
if |
A "trellis"
object, which normally is just "printed" (i.e., plotted).
John Fox [email protected]
if (require(carData)){ repeatedMeasuresPlot( data=OBrienKaiser, within=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names=c("phase", "hour"), within.levels=list(phase=c("pre", "post", "fup"), hour = c("1", "2", "3", "4", "5")), between.names=c("gender", "treatment"), response.name="improvement", print.tables=TRUE) } if (require(carData)){ repeatedMeasuresPlot(data=OBrienKaiser, within=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names=c("phase", "hour"), within.levels=list(phase=c("pre", "post", "fup"), hour = c("1", "2", "3", "4", "5")), between.names=c("gender", "treatment"), trace="gender") # note that gender is between subjects } if (require(carData)){ repeatedMeasuresPlot( data=OBrienKaiser, within=c("fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names="hour", within.levels=list(hour = c("1", "2", "3", "4", "5")), between.names=c("treatment", "gender"), response.name="improvement") }
if (require(carData)){ repeatedMeasuresPlot( data=OBrienKaiser, within=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names=c("phase", "hour"), within.levels=list(phase=c("pre", "post", "fup"), hour = c("1", "2", "3", "4", "5")), between.names=c("gender", "treatment"), response.name="improvement", print.tables=TRUE) } if (require(carData)){ repeatedMeasuresPlot(data=OBrienKaiser, within=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names=c("phase", "hour"), within.levels=list(phase=c("pre", "post", "fup"), hour = c("1", "2", "3", "4", "5")), between.names=c("gender", "treatment"), trace="gender") # note that gender is between subjects } if (require(carData)){ repeatedMeasuresPlot( data=OBrienKaiser, within=c("fup.1", "fup.2", "fup.3", "fup.4", "fup.5"), within.names="hour", within.levels=list(hour = c("1", "2", "3", "4", "5")), between.names=c("treatment", "gender"), response.name="improvement") }
A simple front-end to the standard R reshape
function. The data are assumed to be in "long" format, with several rows for each subject.
reshapeL2W(data, within, id, varying, ignore)
reshapeL2W(data, within, id, varying, ignore)
data |
a data frame in long format. |
within |
a character vector of names of the within-subjects factors in the long form of the data; there must be at least one within-subjects factor. |
id |
the (character) name of the variable representing the subject identifier in the long form of the data set; that is, rows with the same |
varying |
a character vector of names of the occasion-varying variables in the long form of the data; there must be at least one such variable, and typically there will be just one, an occasion-varying response variable. |
ignore |
an optional character vector of names of variables in the long form of the data to exclude from the wide data set. |
Between-subjects variables don't vary by occasions for each subject. Variables that aren't listed explicitly in the arguments to the function are assumed to be between-subjects variables, and a warning is printed if their values aren't invariant for each subject (see the ignore
argument).
Within-subjects factors vary by occasions for each subject, and it is assumed that the within-subjects design is regular, completely crossed, and balanced, so that the same combinations of within-subjects factors are observed for each subject.
Occasion-varying variables, as their name implies, (potentially) vary by occasions for each subject, and include one or more "response" variables, possibly along with occasion-varying covariates; these variables can be factors as well as numeric variables.
The data are reshaped so that there is one row per subject, with columns for the between-subjects variables, and each occasion-varying variable as multiple columns representing the combinations of levels of the within-subjects factors. The names of the columns for the occasion-varying variables are composed from the combinations of levels of the within-subjects factors and from the names of the occasion-varying variables. If a subject in the long form of the data set lacks any combination of levels of within-subjects factors, he or she is excluded (with a warning) from the wide form of the data.
a data frame in "wide" format, with one row for each subject, columns representing the between subjects factors, and columns for the occasion-varying variable(s) for each combination of within-subjects factors.
John Fox [email protected]
reshape
, OBrienKaiser
, OBrienKaiserLong
.
if (require(carData)){ OBW <- reshapeL2W(OBrienKaiserLong, within=c("phase", "hour"), id="id", varying="score") brief(OBW) # should be the same as OBrienKaiser in the carData package: all.equal(OBrienKaiser, OBW, check.attributes=FALSE) }
if (require(carData)){ OBW <- reshapeL2W(OBrienKaiserLong, within=c("phase", "hour"), id="id", varying="score") brief(OBW) # should be the same as OBrienKaiser in the carData package: all.equal(OBrienKaiser, OBW, check.attributes=FALSE) }
The data are assumed to be in "wide" format, with a single row for each subject, and different columns for values of one or more repeated-measures variables classified by one or more within-subjects factors.
reshapeW2L(data, within, levels, varying, ignore, id = "id")
reshapeW2L(data, within, levels, varying, ignore, id = "id")
data |
wide version of data set. |
within |
a character vector of names for the crossed within-subjects factors to be created in the long form of the data. |
levels |
a named list of character vectors, each element giving the names of the levels for a within-subjects factor; the names of the list elements are the names of the within-subjects factor, given in the |
varying |
a named list of the names of variables in the wide data set specifying the occasion-varying variables to be created in the long data set; each element in the list is named for an occasion-varying variable and is a character vector of column names in the wide data for that occasion-varying variable. |
ignore |
a character vector of names of variables in the wide data to be dropped in the long form of the data. |
id |
the (character) name of the subject ID variable to be created in the long form of the data, default |
Between-subjects variables don't vary by occasions for each subject. Variables that aren't listed explicitly in the arguments to the function are assumed to be between-subjects variables. The values of these variables are duplicated in each row pertaining to a given subject.
Within-subjects factors vary by occasions for each subject, and it is assumed that the within-subjects design is regular, completely crossed, and balanced, so that the same combinations of within-subjects factors are observed for each subject. There are typically one or two within-subject factors.
Occasion-varying variables, as their name implies, (potentially) vary by occasions for each subject, and include one or more "response" variables, possibly along with occasion-varying covariates; these variables can be factors as well as numeric variables. Each occasion-varying variable is encoded in multiple columns of the wide form of the data and in a single column in the long form. There is typically one occasion-varying variable, a response variable.
There is one value of each occasion-varying variable for each combination of levels of the within-subjects factors. Thus, the number of variables in the wide data for each occasion-varying variable must be equal to the product of levels of the within-subjects factors, with the levels of the within-subjects factors varying most quickly from right to left in the
within
argument.
a data frame in "long" format, with multiple rows for each subject (equal to the number of combinations of levels of the within-subject factors) and one column for each between-subjects and occasion-varying variable.
John Fox [email protected]
reshapeL2W
, reshape
, OBrienKaiser
, OBrienKaiserLong
.
if (require(carData)){ OBrienKaiserL <- reshapeW2L(OBrienKaiser, within=c("phase", "hour"), levels=list(phase=c("pre", "post", "fup"), hour=1:5), varying=list(score=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"))) brief(OBrienKaiserL, c(15, 15)) m1 <- Tapply(score ~ phase + hour + treatment + gender, mean, data=OBrienKaiserL) m2 <- Tapply(score ~ phase + hour + treatment + gender, mean, data=OBrienKaiserLong) all.equal(m1, m2) # should be equal } if (require(carData)){ OBrienKaiserL2 <- reshapeW2L(OBrienKaiser, within="phase", levels=list(phase=c("pre", "post", "fup")), ignore=c("pre.2", "pre.3", "pre.4", "pre.5", "post.2", "post.3", "post.4", "post.5", "fup.2", "fup.3", "fup.4", "fup.5"), varying=list(score=c("pre.1", "post.1", "fup.1"))) brief(OBrienKaiserL2, c(6, 6)) m1 <- Tapply(score ~ phase + treatment + gender, mean, data=OBrienKaiserL2) m2 <- Tapply(score ~ phase + treatment + gender, mean, data=subset(OBrienKaiserLong, hour==1)) all.equal(m1, m2) # should be equal }
if (require(carData)){ OBrienKaiserL <- reshapeW2L(OBrienKaiser, within=c("phase", "hour"), levels=list(phase=c("pre", "post", "fup"), hour=1:5), varying=list(score=c("pre.1", "pre.2", "pre.3", "pre.4", "pre.5", "post.1", "post.2", "post.3", "post.4", "post.5", "fup.1", "fup.2", "fup.3", "fup.4", "fup.5"))) brief(OBrienKaiserL, c(15, 15)) m1 <- Tapply(score ~ phase + hour + treatment + gender, mean, data=OBrienKaiserL) m2 <- Tapply(score ~ phase + hour + treatment + gender, mean, data=OBrienKaiserLong) all.equal(m1, m2) # should be equal } if (require(carData)){ OBrienKaiserL2 <- reshapeW2L(OBrienKaiser, within="phase", levels=list(phase=c("pre", "post", "fup")), ignore=c("pre.2", "pre.3", "pre.4", "pre.5", "post.2", "post.3", "post.4", "post.5", "fup.2", "fup.3", "fup.4", "fup.5"), varying=list(score=c("pre.1", "post.1", "fup.1"))) brief(OBrienKaiserL2, c(6, 6)) m1 <- Tapply(score ~ phase + treatment + gender, mean, data=OBrienKaiserL2) m2 <- Tapply(score ~ phase + treatment + gender, mean, data=subset(OBrienKaiserLong, hour==1)) all.equal(m1, m2) # should be equal }
This function is a front end to the stepAIC
function in the
MASS package.
stepwise(mod, direction = c("backward/forward", "forward/backward", "backward", "forward"), criterion = c("BIC", "AIC"), ...)
stepwise(mod, direction = c("backward/forward", "forward/backward", "backward", "forward"), criterion = c("BIC", "AIC"), ...)
mod |
a model object of a class that can be handled by |
direction |
if |
criterion |
for selection. Either |
... |
arguments to be passed to |
The model selected by stepAIC
.
John Fox [email protected]
W. N. Venables and B. D. Ripley Modern Applied Statistics Statistics with S, Fourth Edition Springer, 2002.
# adapted from ?stepAIC in MASS if (require(MASS)){ data(birthwt) bwt <- with(birthwt, { race <- factor(race, labels = c("white", "black", "other")) ptd <- factor(ptl > 0) ftv <- factor(ftv) levels(ftv)[-(1:2)] <- "2+" data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0), ptd, ht = (ht > 0), ui = (ui > 0), ftv) }) birthwt.glm <- glm(low ~ ., family = binomial, data = bwt) print(stepwise(birthwt.glm, trace = FALSE)) print(stepwise(birthwt.glm, direction="forward/backward")) }
# adapted from ?stepAIC in MASS if (require(MASS)){ data(birthwt) bwt <- with(birthwt, { race <- factor(race, labels = c("white", "black", "other")) ptd <- factor(ptl > 0) ftv <- factor(ftv) levels(ftv)[-(1:2)] <- "2+" data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0), ptd, ht = (ht > 0), ui = (ui > 0), ftv) }) birthwt.glm <- glm(low ~ ., family = binomial, data = bwt) print(stepwise(birthwt.glm, trace = FALSE)) print(stepwise(birthwt.glm, direction="forward/backward")) }
summarySandwich
creates a summary of a "lm"
object similar to the standard one,
with sandwich estimates of the coefficient standard errors in the place of the usual OLS standard errors,
also modifying as a consequence the reported t-tests and p-values for the coefficients.
Standard errors may be computed from a heteroscedasticity-consistent ("HC") covariance matrix for the
coefficients (of several varieties), or from a heteroscedasticity-and-autocorrelation-consistent ("HAC") covariance matrix.
summarySandwich(model, ...) ## S3 method for class 'lm' summarySandwich(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4", "hac"), ...)
summarySandwich(model, ...) ## S3 method for class 'lm' summarySandwich(model, type=c("hc3", "hc0", "hc1", "hc2", "hc4", "hac"), ...)
model |
a linear-model object. |
type |
type of sandwich standard errors to be computed; see |
... |
arguments to be passed to |
an object of class "summary.lm"
, with sandwich standard errors substituted for the
usual OLS standard errors; the omnibus F-test is similarly adjusted.
John Fox [email protected]
mod <- lm(prestige ~ income + education + type, data=Prestige) summary(mod) summarySandwich(mod)
mod <- lm(prestige ~ income + education + type, data=Prestige) summary(mod) summarySandwich(mod)