Package 'RcmdrMisc'

Title:	R Commander Miscellaneous Functions
Description:	Various statistical, graphics, and data-management functions used by the Rcmdr package in the R Commander GUI for R.
Authors:	John Fox [aut, cre], Manuel Marquez [aut], Robert Muenchen [ctb], Dan Putler [ctb]
Maintainer:	John Fox <[email protected]>
License:	GPL (>= 2)
Version:	2.9-1
Built:	2025-02-01 06:51:45 UTC
Source:	CRAN

Help Index

Append a Cluster Membership Variable to a Dataframe
Bar Plots
Binned Frequency Distributions of Numeric Variables
Bin a Numeric Variable
Row, Column, and Total Percentage Tables
Confidence Intervals by the Delta Method
Frequency Distributions of Numeric Variables
Plot Distribution of Discrete Numeric Variable
Dot Plots
The Gumbel Distribution
Plot a Histogram
Index Plots
K-Means Clustering Using Multiple Random Seeds
Plot one or more lines.
Function to Merge Rows of Two Data Frames.
Normality Tests
Summary Statistics for Numeric Variables
Partial Correlations
Draw a Piechart With Percents or Counts in the Labels
Plot Bootstrap Distributions
Plot a probability density, mass, or distribution function.
Plot Means for One or Two-Way Layout
Compute Pearson or Spearman Correlations with p-Values
Read a SAS b7dat Data Set
Read an SPSS Data Set
Read a Stata Data Set
Read an Excel File
Reliability of a Composite Scale
Plot Means for Repeated-Measures ANOVA Designs
Reshape Repeated-Measures Data from Long to Wide Format
Reshape Repeated-Measures Data from Wide to Long Format
Stepwise Model Selection
Linear Model Summary with Sandwich Standard Errors

Append a Cluster Membership Variable to a Dataframe

Description

Correctly creates a cluster membership variable that can be attached to a dataframe when only a subset of the observations in that dataframe were used to create the clustering solution. NAs are assigned to the observations of the original dataframe not used in creating the clustering solution.

Usage

assignCluster(clusterData, origData, clusterVec)

Arguments

`clusterData`	The data matrix used in the clustering solution. The data matrix may have only a subset of the observations contained in the original dataframe.
`origData`	The original dataframe from which the data used in the clustering solution were taken.
`clusterVec`	An integer variable containing the cluster membership assignments for the observations used in creating the clustering solution. This vector can be created using `cutree` for clustering solutions generated by `hclust` or the `cluster` component of a list object created by `kmeans` or `KMeans`.

Value

A factor (with integer labels) that indicates the cluster assignment for each observation, with an NA value given to observations not used in the clustering solution.

Author(s)

Dan Putler

Examples

  data(USArrests)
  USArrkm3 <- KMeans(USArrests[USArrests$UrbanPop<66, ], centers=3)
  assignCluster(USArrests[USArrests$UrbanPop<66, ], USArrests, USArrkm3$cluster)
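
The alignment described in this section, where cluster labels from a subset are placed back on the full data frame and unused rows get NA, can be sketched in base R. This is only an illustration of the idea and is not taken from the package source; it assumes the clustered subset keeps the row names of the original data frame, and the names `in_subset` and `membership` are invented for the example.

```r
# Base-R sketch (illustrative only): align cluster labels from a subset
# of rows back to the full data frame, leaving NA for unused rows.
data(USArrests)
in_subset <- USArrests$UrbanPop < 66          # rows used for clustering
set.seed(123)                                 # reproducible k-means start
km <- kmeans(USArrests[in_subset, ], centers = 3)

# Start with NA for every row of the original data, then fill by row name.
membership <- rep(NA_integer_, nrow(USArrests))
names(membership) <- rownames(USArrests)
membership[names(km$cluster)] <- km$cluster
membership <- factor(membership)

table(membership, useNA = "ifany")
```

The resulting factor has one element per row of `USArrests`, so it can be added to the data frame as a new column.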

Bar Plots

Description

Create bar plots for one or two factors scaled by frequency or percentages. In the case of two factors, the bars can be divided (stacked) or plotted in parallel (side-by-side). This function is a front end to `barplot` in the graphics package.

Usage

Barplot(x, by, scale = c("frequency", "percent"), conditional=TRUE,
  style = c("divided", "parallel"), 
  col=if (missing(by)) "gray" else rainbow_hcl(length(levels(by))),
  xlab = deparse(substitute(x)), legend.title = deparse(substitute(by)), 
  ylab = scale, main=NULL, legend.pos = "above", label.bars=FALSE, ...)

Arguments

`x`	a factor (or character or logical variable).
`by`	optionally, a second factor (or character or logical variable).
`scale`	either `"frequency"` (the default) or `"percent"`.
`conditional`	if `TRUE` then percentages are computed separately for each value of `x` (i.e., conditional percentages of `by` within levels of `x`); if `FALSE` then total percentages are graphed; ignored if `scale="frequency"`.
`style`	for two-factor plots, either `"divided"` (the default) or `"parallel"`.
`col`	if `by` is missing, the color for the bars, defaulting to `"gray"`; otherwise colors for the levels of the `by` factor in two-factor plots, defaulting to colors provided by `rainbow_hcl` in the colorspace package.
`xlab`	an optional character string providing a label for the horizontal axis.
`legend.title`	an optional character string providing a title for the legend.
`ylab`	an optional character string providing a label for the vertical axis.
`main`	an optional main title for the plot.
`legend.pos`	position of the legend, in a form acceptable to the `legend` function; the default, `"above"`, puts the legend above the plot.
`label.bars`	if `TRUE` (the default is `FALSE`) show values of frequencies or percents in the bars.
`...`	arguments to be passed to the `barplot` function.

Value

Invisibly returns the horizontal coordinates of the centers of the bars.
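
The distinction drawn by the `conditional` argument, percentages of `by` within each level of `x` versus percentages of the grand total, can be illustrated with base R's `prop.table`. This is a sketch of the two kinds of percentage only; it does not draw a plot, and the choice of the `mtcars` variables is arbitrary.

```r
# Rows play the role of `by`, columns the role of `x`.
counts <- table(am = mtcars$am, cyl = mtcars$cyl)

# Conditional percentages: each column (level of x) sums to 100.
conditional_pct <- 100 * prop.table(counts, margin = 2)

# Total percentages: the whole table sums to 100.
total_pct <- 100 * prop.table(counts)

round(conditional_pct, 1)
round(total_pct, 1)
```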

Binned Frequency Distributions of Numeric Variables

Arguments

`x`	a numeric vector, matrix, or data frame.
`breaks`	specification of the breaks between bins, to be passed to the `hist` function.
`round.percents`	number of decimal places to round percentages; default is `2`.
`name`	name for the variable; only used for vector argument `x`.

Bin a Numeric Variable

Arguments

`x`	numeric variable to be binned.
`bins`	number of bins.
`method`	one of `"intervals"` for equal-width bins; `"proportions"` for equal-count bins; `"natural"` for cut points between bins to be determined by a k-means clustering.
`labels`	if `FALSE`, numeric labels will be used for the factor levels; if `NULL`, the cut points are used to define labels; otherwise a character vector of level names.
`...`	arguments to be passed to `binVariable`.
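
The difference between equal-width and equal-count bins can be illustrated with base R's `cut` and `quantile`. This is a sketch of the two ideas, not a reproduction of how the package computes its cut points; the data and variable names are invented for the example.

```r
# One extreme value makes the two binning strategies differ sharply.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
bins <- 3

# Equal-width bins: the range of x is split into intervals of equal length.
equal_width <- cut(x, breaks = bins)

# Equal-count bins: cut points are placed at quantiles of x.
cut_points <- quantile(x, probs = seq(0, 1, length.out = bins + 1))
equal_count <- cut(x, breaks = cut_points, include.lowest = TRUE)

table(equal_width)   # counts 8, 0, 1
table(equal_count)   # counts 3, 3, 3
```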

Row, Column, and Total Percentage Tables

Arguments

`tab`	a matrix or higher-dimensional array of frequency counts.
`digits`	number of places to the right of the decimal place for percentages.

Confidence Intervals by the Delta Method

Arguments

`model`	a regression model; see the `deltaMethod` documentation.
`g`	the expression — that is, function of the coefficients — to evaluate, as a character string.
`level`	the confidence level, defaults to `0.95`.
`x`	an object of class `"DeltaMethod"`.
`...`	optional arguments to pass to `print` to show the results.
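
As a rough illustration of what a delta-method interval involves, the following base-R sketch works through the textbook first-order calculation for a ratio of two coefficients. It is written under stated assumptions (a normal-based interval, a hand-coded gradient) and is not claimed to match the package's own code; the model and all variable names are chosen only for the example.

```r
# Textbook delta method for g(b) = b_wt / b_hp in a linear model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)      # order: (Intercept), wt, hp
V <- vcov(fit)      # estimated covariance matrix of the coefficients

estimate <- unname(b["wt"] / b["hp"])

# Gradient of g with respect to (Intercept, wt, hp).
gradient <- unname(c(0, 1 / b["hp"], -b["wt"] / b["hp"]^2))

# First-order approximation to the standard error of g(b).
se <- sqrt(drop(t(gradient) %*% V %*% gradient))

level <- 0.95
z <- qnorm(1 - (1 - level) / 2)
ci <- estimate + c(-1, 1) * z * se
c(estimate = estimate, se = se, lower = ci[1], upper = ci[2])
```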

Frequency Distributions of Numeric Variables

Arguments

`x`	a discrete numeric vector, matrix, or data frame.
`round.percents`	number of decimal places to round percentages; default is `2`.
`name`	name for the variable; only used for vector argument `x`.
`max.values`	maximum number of unique values (default is the smallest of twice the square root of the number of elements in `x`, 10 times the log10 of the number of elements, and `100`); if exceeded, an error is reported.
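
The default for `max.values` described above can be written out as a small helper. This is a sketch based only on the wording of the argument description; the function name `default_max_values` is hypothetical, and any rounding the package may apply is not shown.

```r
# Smallest of 2 * sqrt(n), 10 * log10(n), and 100, where n is the
# number of elements in x (hypothetical helper, for illustration).
default_max_values <- function(n) {
  min(2 * sqrt(n), 10 * log10(n), 100)
}

default_max_values(100)     # 20: both 2 * sqrt(100) and 10 * log10(100)
default_max_values(10000)   # 40: 10 * log10(10000) is the smallest
default_max_values(1e12)    # 100: the fixed cap applies
```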

Plot Distribution of Discrete Numeric Variable

Arguments

`x`	a numeric variable.
`by`	optionally a factor (or character or logical variable) by which to classify `x`.
`scale`	either `"frequency"` (the default) or `"percent"`.
`xlab`	optional character string to label the horizontal axis.
`ylab`	optional character string to label the vertical axis.
`main`	optional main label for the plot (ignored if the `by` argument is specified).
`xlim`, `ylim`	two-element numeric vectors specifying the ranges of the x and y axes; if not specified, will be determined from the data; the lower limit of the y-axis should normally be 0 and a warning will be printed if it isn't.
`...`	other arguments to be passed to `plot`

The Gumbel Distribution

Arguments

`x`, `q`	vector of quantiles (values of the variable).
`p`	vector of probabilities.
`n`	number of observations. If `length(n)` > 1, the length is taken to be the number required.
`location`	location parameter (default `0`); potentially a vector.
`scale`	scale parameter (default `1`); potentially a vector.
`lower.tail`	logical; if `TRUE` (the default) probabilities and quantiles correspond to $P(X \le x)$, if `FALSE` to $P(X > x)$.
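
To show how the `location`, `scale`, and `lower.tail` arguments enter the calculations, here is a base-R sketch of the distribution and quantile functions. It assumes the usual maximum-type Gumbel distribution with CDF $\exp(-\exp(-(x - \text{location})/\text{scale}))$; the function names `gumbel_cdf` and `gumbel_quantile` are hypothetical and are not the package's functions.

```r
# Sketch of the Gumbel CDF under the assumption stated above.
gumbel_cdf <- function(q, location = 0, scale = 1, lower.tail = TRUE) {
  p <- exp(-exp(-(q - location) / scale))
  if (lower.tail) p else 1 - p
}

# Inverse of the lower-tail CDF.
gumbel_quantile <- function(p, location = 0, scale = 1) {
  location - scale * log(-log(p))
}

gumbel_cdf(0)                       # exp(-1) at the location parameter
gumbel_cdf(0, lower.tail = FALSE)   # upper-tail probability, 1 - exp(-1)

# Round trip: the quantile function undoes the CDF.
gumbel_quantile(gumbel_cdf(1.5, location = 2, scale = 3),
                location = 2, scale = 3)
```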

Plot a Histogram

Arguments

`x`	a vector of values for which a histogram is to be plotted.
`groups`	a factor (or character or logical variable) to create histograms by group with common horizontal and vertical scales.
`scale`	the scaling of the vertical axis: `"frequency"` (the default), `"percent"`, or `"density"`.
`xlab`	x-axis label, defaults to name of variable.
`ylab`	y-axis label, defaults to value of `scale`.
`main`	main title for graph, defaults to empty.
`breaks`	see the `breaks` argument for `hist`.
`...`	arguments to be passed to `hist`.

Index Plots

Arguments

`x`	a numeric variable, a matrix whose columns are numeric variables, or a numeric data frame; if `x` is a matrix or data frame, plots vertically aligned index plots for the columns.
`labels`	point labels; if `x` is a data frame, defaults to the row names of `x`, otherwise to the case index.
`groups`	an optional grouping variable, typically a factor (or character or logical variable).
`id.method`	method for identifying points; see `showLabels`.
`type`	to be passed to `plot`.
`id.n`	number of points to identify; see `showLabels`.
`ylab`	label for vertical axis; if missing, will be constructed from `x`; for a data frame, defaults to the column names.
`legend`	keyword (see `legend` in the graphics package) giving location of the legend if `groups` are specified; if `legend=FALSE`, the legend is suppressed.
`title`	title for the legend; may normally be omitted.
`col`	vector of colors for the `groups`.
`...`	to be passed to `plot`.

K-Means Clustering Using Multiple Random Seeds

Arguments

`x`	A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a dataframe with all numeric columns).
`centers`	The number of clusters in the solution.
`iter.max`	The maximum number of iterations allowed.
`num.seeds`	The number of different starting random seeds to use. Each random seed results in a different k-means solution.

Value

`cluster`	A vector of integers indicating the cluster to which each point is allocated.
`centers`	A matrix of cluster centres (centroids).
`withinss`	The within-cluster sum of squares for each cluster.
`tot.withinss`	The within-cluster sum of squares summed across clusters.
`betweenss`	The between-cluster sum of squared distances.
`size`	The number of points in each cluster.
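
The idea of trying several random starts can be sketched with base R's `kmeans`. The sketch assumes that the preferred solution is the one with the smallest total within-cluster sum of squares; it is an illustration of the multi-start idea rather than the package's code, and the variable names are invented.

```r
# Multi-start k-means sketch: run kmeans from several random starts
# and keep the solution with the smallest total within-cluster SS.
set.seed(42)
x <- scale(USArrests)        # standardize so no variable dominates
num_starts <- 10

fits <- lapply(seq_len(num_starts), function(i) {
  kmeans(x, centers = 3, iter.max = 10)
})
tot <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best <- fits[[which.min(tot)]]

best$size            # number of points in each cluster
best$tot.withinss    # smallest total within-cluster SS among the starts
```

Base `kmeans` also has an `nstart` argument that performs a similar multi-start search internally.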

Function to Merge Rows of Two Data Frames.

Arguments

`X`	First data frame.
`Y`	Second data frame.
`common.only`	If `TRUE`, only variables (columns) common to the two data frames are included in the merged data set; the default is `FALSE`.
`...`	Not used.
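
The effect of `common.only` can be sketched in base R with `intersect`, `union`, and `rbind`. The sketch assumes that, when all variables are kept, columns missing from one data frame are filled with `NA`; the helper `fill_missing` and the toy data are hypothetical.

```r
X <- data.frame(a = 1:2, b = c("x", "y"))
Y <- data.frame(a = 3:4, c = c(TRUE, FALSE))

# Keep only the variables common to both data frames.
common <- intersect(names(X), names(Y))
merged_common <- rbind(X[, common, drop = FALSE], Y[, common, drop = FALSE])

# Keep all variables, padding the ones a data frame lacks with NA
# (assumed behavior, see the note above).
all_vars <- union(names(X), names(Y))
fill_missing <- function(d) {
  d[setdiff(all_vars, names(d))] <- NA
  d[, all_vars, drop = FALSE]
}
merged_all <- rbind(fill_missing(X), fill_missing(Y))

merged_common
merged_all
```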

Summary Statistics for Numeric Variables

Arguments

`data`	a numeric vector, matrix, or data frame.
`statistics`	any of `"mean"`, `"sd"`, `"se(mean)"`, `"var"`, `"CV"`, `"IQR"`, `"quantiles"`, `"skewness"`, or `"kurtosis"`, defaulting to `c("mean", "sd", "quantiles", "IQR")`.
`type`	definition to use in computing skewness and kurtosis; see the `skewness` and `kurtosis` functions in the e1071 package. The default is `"2"`.
`quantiles`	quantiles to report; default is `c(0, 0.25, 0.5, 0.75, 1)`.
`groups`	optional variable, typically a factor, to be used to partition the data.
`x`	object of class `"numSummary"` to print, or for `CV`, a numeric vector or matrix.
`na.rm`	if `TRUE` (the default) remove `NA`s before computing the coefficient of variation.
`...`	arguments to pass down from the print method.

Partial Correlations

Arguments

`X`	data matrix.
`tests`	show two-sided p-value and p-value adjusted for multiple testing by Holm's method for each partial correlation?
`use`	observations to use to compute partial correlations, default is `"complete.obs"`.
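
One standard way to obtain partial correlations, each pair of variables controlling for all the others, is from the inverse of the correlation matrix. The base-R sketch below uses that textbook identity and checks one entry against a residual-based calculation; it is not claimed to be how the package computes its result, and the choice of variables is arbitrary.

```r
# Partial correlations from the inverse correlation matrix.
X <- mtcars[, c("mpg", "wt", "hp")]
R <- cor(X, use = "complete.obs")
R_inv <- solve(R)

# Scale the inverse to unit diagonal and flip the sign off the diagonal.
P <- -cov2cor(R_inv)
diag(P) <- 1
round(P, 3)

# Check: correlate the residuals of mpg and wt after regressing each on hp.
r_mpg <- resid(lm(mpg ~ hp, data = X))
r_wt  <- resid(lm(wt ~ hp, data = X))
cor(r_mpg, r_wt)     # agrees with P["mpg", "wt"]
```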

Plot one or more lines.

Arguments

`x`	variable giving horizontal coordinates.
`...`	one or more variables giving vertical coordinates.
`legend`	plot legend? Default is `TRUE` if there is more than one variable to plot and `FALSE` if there is just one.

Normality Tests

Arguments

`x`	numeric vector or formula.
`formula`	one-sided formula of the form `~x` or two-sided formula of the form `x ~ groups`, where `x` is a numeric variable and `groups` is a factor.
`data`	a data frame containing the data for the test.
`test`	quoted name of the function to perform the test.
`groups`	optional factor to divide the data into groups.
`vname`	optional name for the variable; if absent, taken from `x`.
`gname`	optional name for the grouping factor; if absent, taken from `groups`.
`...`	any arguments to be passed down; the only useful such arguments are for the `pearson.test` function in the nortest package.

Draw a Piechart With Percents or Counts in the Labels

Arguments

`x`	a factor or other discrete variable; the segments of the pie correspond to the unique values (levels) of `x` and are proportional to the frequency counts in the various levels.
`scale`	parenthetical numbers to add to the pie-segment labels; the default is `"percent"`.
`col`	colors for the segments; the default is provided by the `rainbow_hcl` function in the colorspace package.
`...`	further arguments to be passed to `pie`.

Plot Bootstrap Distributions

Arguments

`object`	an object of class `"boot"`.
`confint`	an object of class `"confint.boot"` (or an ordinary 2-column matrix) containing confidence limits for the parameters in `object`; if `NULL` (the default), these are computed from the first argument, using the defaults for `"boot"` objects.
`...`	not used

Plot a probability density, mass, or distribution function.

Arguments

`x`	horizontal coordinates
`p`	vertical coordinates
`discrete`	is the random variable discrete?
`cdf`	is this a cumulative distribution (as opposed to mass) function?
`regions`, `col`	for continuous distributions only, if non-`NULL`, a list of regions to fill with color `col`; each element of the list is a pair of `x` values with the minimum and maximum horizontal coordinates of the corresponding region; `col` may be a single value or a vector.
`legend`	plot a legend of the regions (default `TRUE`).
`legend.pos`	position for the legend (see `legend`, default `"topright"`).
`...`	arguments to be passed to `plot`.

Plot Means for One or Two-Way Layout

Arguments

`response`	Numeric variable for which means are to be computed.
`factor1`	Factor defining horizontal axis of the plot.
`factor2`	If present, factor defining profiles of means
`error.bars`	If `"se"`, the default, error bars around means give plus or minus one standard error of the mean; if `"sd"`, error bars give plus or minus one standard deviation; if `"conf.int"`, error bars give a confidence interval around each mean; if `"none"`, error bars are suppressed.
`level`	level of confidence for confidence intervals; default is .95
`xlab`	Label for horizontal axis.
`ylab`	Label for vertical axis.
`legend.lab`	Label for legend.
`legend.pos`	Position of legend; if `"farright"` (the default), extra space is left at the right of the plot.
`main`	Label for the graph.
`pch`	Plotting characters for profiles of means.
`lty`	Line types for profiles of means.
`col`	Colours for profiles of means
`connect`	connect profiles of means, default `TRUE`.
`...`	arguments to be passed to `plot`.
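
The three kinds of error bar named by `error.bars` correspond to simple per-group quantities, which can be computed in base R as follows. The sketch assumes a t-based confidence interval with n - 1 degrees of freedom in each group; that choice, the data, and the variable names are assumptions made for illustration.

```r
# Per-group half-widths for "sd", "se", and "conf.int" style error bars.
level <- 0.95
by_group <- split(mtcars$mpg, mtcars$cyl)

n  <- sapply(by_group, length)
m  <- sapply(by_group, mean)
s  <- sapply(by_group, sd)          # half-width for "sd"
se <- s / sqrt(n)                   # half-width for "se"

# Half-width for "conf.int" (t-based interval, an assumption here).
half_ci <- qt(1 - (1 - level) / 2, df = n - 1) * se

data.frame(mean = m, sd = s, se = se,
           ci_lower = m - half_ci, ci_upper = m + half_ci)
```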

Compute Pearson or Spearman Correlations with p-Values

Arguments

`x`	a numeric matrix or data frame, or an object of class `"rcorr.adjust"` to be printed.
`type`	`"pearson"` or `"spearman"`, depending upon the type of correlations desired; the default is `"pearson"`.
`use`	how to handle missing data: `"complete.obs"`, the default, use only complete cases; `"pairwise.complete.obs"`, use all cases with valid data for each pair.
`...`	not used.

Read a SAS b7dat Data Set

Arguments

`file`	path to a SAS b7dat file.
`rownames`	if `TRUE` (the default is `FALSE`), the first column in the data set contains row names (which must be unique—i.e., no duplicates).
`stringsAsFactors`	if `TRUE` (the default is `FALSE`) then columns containing character data are converted to factors.

Read an SPSS Data Set

Arguments

`file`	path to an SPSS `.sav` or `.por` file.
`rownames`	if `TRUE` (the default is `FALSE`), the first column in the data set contains row names, which should be unique.
`stringsAsFactors`	if `TRUE` (the default is `FALSE`) then columns containing character data are converted to factors and factors are created from SPSS value labels.
`tolower`	change variable names to lowercase, default `TRUE`.
`use.value.labels`	if `TRUE`, the default, variables with value labels in the SPSS data set will become either factors or character variables (depending on the `stringsAsFactors` argument) with the value labels as their levels or values. As for `read.spss`, this is only done if there are at least as many labels as values of the variable (and values without a matching label are returned as `NA`).
`use.haven`	use `read_spss` from the haven package to read the file, in preference to `read.spss` from the foreign package; the default is `TRUE` for a `.sav` file and `FALSE` for a `.por` file.

Read a Stata Data Set

Arguments

`file`	path to a Stata `.dta` file.
`rownames`	if `TRUE` (the default is `FALSE`), the first column in the data set contains row names, which should be unique.
`stringsAsFactors`	if `TRUE` (the default is `FALSE`) then columns containing character data are converted to factors and factors are created from Stata value labels.
`convert.dates`	if `TRUE` (the default) then Stata dates are converted to R dates.

Read an Excel File

Arguments

`file`, `path`	path to an Excel file.
`rownames`	if `TRUE` (the default is `FALSE`), the first column in the spreadsheet contains row names (which must be unique—i.e., no duplicates).
`header`	if `TRUE` (the default), the first row in the spreadsheet contains column (variable) names.
`na`	character string denoting missing data; the default is the empty string, `""`.
`sheet`	number of the spreadsheet in the file containing the data to be read; the default is `1`.
`stringsAsFactors`	if `TRUE` (the default is `FALSE`) then columns containing character data are converted to factors.

Reliability of a Composite Scale

Arguments

`S`	the covariance matrix of the items; normally, there should be at least 3 items and certainly no fewer than 2.
`x`	reliability object to be printed.
`digits`	number of decimal places.
`...`	not used: for compatibility with the print generic.
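
To illustrate how a reliability coefficient can be computed from a covariance matrix of items, the sketch below evaluates Cronbach's alpha in base R. It assumes that Cronbach's alpha is the coefficient of interest; the function name `cronbach_alpha` is hypothetical and the covariance matrix is made up.

```r
# Cronbach's alpha from an item covariance matrix S with k items:
# alpha = k / (k - 1) * (1 - trace(S) / sum(S))
cronbach_alpha <- function(S) {
  k <- ncol(S)
  (k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
}

# Three items with unit variances and all covariances equal to 0.5.
S <- matrix(0.5, nrow = 3, ncol = 3)
diag(S) <- 1
cronbach_alpha(S)    # 0.75
```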

Plot Means for Repeated-Measures ANOVA Designs

Arguments

`data`	a data frame in wide format.
`within`	a character vector with the names of the data columns containing the repeated measures.
`within.names`	a character vector with one or two elements, of names of the within-subjects factor(s).
`within.levels`	a named list whose elements are character vectors of level names for the within-subjects factors, with names corresponding to the names of the within-subjects factors; the product of the numbers of levels should be equal to the number of repeated-measures columns in `within`.
`between.names`	a character vector of names of the between-subjects factors (if any).
`response.name`	optional quoted name for the response variable, defaults to `"score"`.
`trace`	optional quoted name of the (either within- or between-subjects) factor to define profiles of means in each panel of the graph; the default is the within-subjects factor with the smaller number of levels, if there are two, or not used if there is one.
`xvar`	optional quoted name of the factor to define the horizontal axis of each panel; the default is the within-subjects factor with the larger number of levels.
`pch`, `lty`	vectors of symbol and line-type numbers to use for the profiles of means (i.e., levels of the `trace` factor); for the meaning of the defaults, see `points` and `par`.
`col`	vector of colors for the profiles of means; the default is given by `palette()`, starting at the second color.
`plot.means`	if `TRUE` (the default), draw a plot of means by the factors.
`print.tables`	if `TRUE` (the default is `FALSE`), print tables of means and standard deviations of the response by the factors.

Reshape Repeated-Measures Data from Long to Wide Format

Arguments

`data`	a data frame in long format.
`within`	a character vector of names of the within-subjects factors in the long form of the data; there must be at least one within-subjects factor.
`id`	the (character) name of the variable representing the subject identifier in the long form of the data set; that is, rows with the same `id` belong to the same subject.
`varying`	a character vector of names of the occasion-varying variables in the long form of the data; there must be at least one such variable, and typically there will be just one, an occasion-varying response variable.
`ignore`	an optional character vector of names of variables in the long form of the data to exclude from the wide data set.

Reshape Repeated-Measures Data from Wide to Long Format

Arguments

`data`	wide version of data set.
`within`	a character vector of names for the crossed within-subjects factors to be created in the long form of the data.
`levels`	a named list of character vectors, each element giving the names of the levels for a within-subjects factor; the names of the list elements are the names of the within-subjects factor, given in the `within` argument.
`varying`	a named list of the names of variables in the wide data set specifying the occasion-varying variables to be created in the long data set; each element in the list is named for an occasion-varying variable and is a character vector of column names in the wide data for that occasion-varying variable.
`ignore`	a character vector of names of variables in the wide data to be dropped in the long form of the data.
`id`	the (character) name of the subject ID variable to be created in the long form of the data, default `"id"`.
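
What moving from wide to long format means for the simplest case, one within-subjects factor and one occasion-varying response, can be shown by hand in base R. This sketch is only meant to make the two layouts concrete; it does not use the package's function, and the data frame, the factor name `occasion`, and the response name `score` are invented.

```r
# Wide format: one row per subject, one column per occasion.
wide <- data.frame(id = 1:2, group = c("a", "b"),
                   t1 = c(5, 6), t2 = c(7, 8))
occasions <- c("t1", "t2")

# Long format: one row per subject-occasion combination.
long <- data.frame(
  id       = rep(wide$id, times = length(occasions)),
  group    = rep(wide$group, times = length(occasions)),
  occasion = factor(rep(occasions, each = nrow(wide)), levels = occasions),
  score    = unlist(wide[occasions], use.names = FALSE)
)
long
```

Each subject contributes one row per level of the within-subjects factor, while between-subjects variables such as `group` are repeated on every row.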

Stepwise Model Selection

Arguments

`mod`	a model object of a class that can be handled by `stepAIC`.
`direction`	if `"backward/forward"` (the default), selection starts with the full model and eliminates predictors one at a time, at each step considering whether the criterion will be improved by adding back in a variable removed at a previous step; if `"forward/backward"`, selection starts with a model including only a constant, and adds predictors one at a time, at each step considering whether the criterion will be improved by removing a previously added variable; `"backward"` and `"forward"` are similar without the reconsideration at each step.
`criterion`	the criterion used for selection, either `"BIC"` (the default) or `"AIC"`. Note that `stepAIC` labels the criterion in the output as `"AIC"` regardless of which criterion is employed.
`...`	arguments to be passed to `stepAIC`.
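
A backward/forward search with a BIC penalty can be sketched directly with `stepAIC` from the MASS package, using its `k` argument to set the penalty per parameter to `log(n)`. This is an illustration of the idea under that assumption, not a reproduction of the package's wrapper; the model and data are arbitrary.

```r
library(MASS)   # provides stepAIC

full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# Start from the full model; k = log(n) gives a BIC-type penalty
# (k = 2 would give AIC). The output is still labeled "AIC".
selected <- stepAIC(full,
                    scope = list(lower = ~ 1, upper = formula(full)),
                    direction = "both", k = log(n), trace = FALSE)
formula(selected)
```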

Linear Model Summary with Sandwich Standard Errors

Arguments

`model`	a linear-model object.
`type`	type of sandwich standard errors to be computed; see `hccm` in the car package, and `vcovHAC` in the sandwich package, for details.
`...`	arguments to be passed to `hccm` or `vcovHAC`
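
To give a sense of what a sandwich standard error is, the following base-R sketch computes a heteroscedasticity-consistent covariance matrix of the HC3 type by hand. The HC3 formula used here is the standard textbook one; choosing HC3 for the illustration is an assumption, and the code does not call `hccm` or `vcovHAC`.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
e <- resid(fit)
h <- hatvalues(fit)

# "Bread": (X'X)^{-1}. "Meat": X' diag(e^2 / (1 - h)^2) X.
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / (1 - h)))   # row i of X scaled by e_i / (1 - h_i)
V_hc3 <- bread %*% meat %*% bread

# Compare robust and conventional standard errors.
cbind(sandwich = sqrt(diag(V_hc3)),
      ordinary = sqrt(diag(vcov(fit))))
```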