Title: | Bootstrap Confidence Regions for Simple and Multiple Correspondence Analysis |
---|---|
Description: | Performs simple correspondence analysis on a two-way contingency table, or multiple correspondence analysis (homogeneity analysis) on data with p categorical variables, and produces bootstrap-based elliptical confidence regions around the projected coordinates for the category points. Includes routines to plot the results in a variety of styles. Also reports the standard numerical output for correspondence analysis. |
Authors: | Trevor Ringrose |
Maintainer: | Trevor Ringrose <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0 |
Built: | 2024-12-09 06:33:23 UTC |
Source: | CRAN |
Performs simple (classical) correspondence analysis on a two-way contingency table and produces bootstrap-based confidence regions around the projected coordinates for the category points. Includes additional routines for summarising the output and for plotting the results in a variety of ways, including both French and biplot styles.
Performs multiple correspondence analysis (homogeneity analysis) of a Burt matrix (a matrix of two-way contingency tables for p variables) and produces bootstrap-based confidence regions around the projected coordinates for the variable category points. This includes a new method to correct the confidence regions for the well-known distortion of the results caused by the diagonal of the Burt matrix. Also contains a highly experimental method to produce confidence regions when analysing an indicator matrix.
Package: | cabootcrs |
Type: | Package |
Version: | 2.1.0 |
Date: | 2022-02-03 |
License: | GPL-3 |
Depends: | lpSolve, colorspace |
Simple Correspondence Analysis:
Correspondence Analysis plots usually only show the coordinates for each of the row and column category points projected onto the new axes, with no indication of the degree of sampling variation. This package produces bootstrap-based confidence ellipses for each of the row and column points with respect to the axes shown.
These confidence regions are based on the sampling variation of the difference between sample and population points when both are projected onto the sample axes, allowing for variation in both the points and the axes and the correlation between them. Hence the coverage percentage is the chance of drawing a sample such that the confidence ellipse contains the population point when it is projected onto the sample axes as a supplementary point. See the reference below for further details.
There are options for different ways of generating the bootstrap resamples, notably based on either the Poisson or the multinomial distribution, with the latter allowing the option of fixed row or column sums.
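The three resampling schemes can be sketched in base R as follows. This is an illustrative sketch, not the package's internal code: the function names (poisson_resample, multinomial_resample, rowfixed_resample, colfixed_resample) and the 3 by 4 table of counts are hypothetical.

```r
# Hypothetical 3 x 4 table of counts (illustrative values only)
X <- matrix(c(20, 15,  9, 30,
              12,  7, 25, 18,
              11, 14, 22, 16), nrow = 3, byrow = TRUE)

# Poisson: each cell is redrawn independently, so the grand total varies
poisson_resample <- function(X) {
  matrix(rpois(length(X), lambda = X), nrow = nrow(X), ncol = ncol(X))
}

# Multinomial over the whole table: the grand total is fixed
multinomial_resample <- function(X) {
  matrix(rmultinom(1, size = sum(X), prob = as.vector(X) / sum(X)),
         nrow = nrow(X), ncol = ncol(X))
}

# Multinomial with fixed row sums: each row is resampled separately
rowfixed_resample <- function(X) {
  t(apply(X, 1, function(x) rmultinom(1, size = sum(x), prob = x / sum(x))))
}

# Multinomial with fixed column sums: each column is resampled separately
colfixed_resample <- function(X) {
  apply(X, 2, function(x) rmultinom(1, size = sum(x), prob = x / sum(x)))
}

set.seed(1)
Xp <- poisson_resample(X)
Xm <- multinomial_resample(X)
Xr <- rowfixed_resample(X)
Xc <- colfixed_resample(X)
```

Only the multinomial variants preserve a total (the grand total, the row sums or the column sums respectively); the Poisson variant lets every margin vary.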
Correspondence analysis results can be plotted in two main ways. The default option here is to produce a biplot where the row category points are plotted in principal coordinates (i.e. coordinates which allow for the different inertias of the axes) and the confidence ellipses are shown for these row category points. The column category points are shown in standard coordinates on this plot and drawn as directions in the common biplot style. A second biplot is also produced where the roles of the rows and columns are reversed.
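A minimal base R sketch of simple CA via the singular value decomposition shows how the two kinds of coordinates used in these biplots are related: principal coordinates are standard coordinates scaled by the singular value of each axis. The helper ca_coords and the table are hypothetical, and the package itself may organise the computation differently.

```r
# Hypothetical helper: simple CA of a two-way table via the SVD
ca_coords <- function(X) {
  P  <- X / sum(X)
  r  <- rowSums(P)                                  # row masses
  cm <- colSums(P)                                  # column masses
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))     # standardised residuals
  dec <- svd(S)
  k  <- min(dim(X)) - 1                             # number of non-trivial axes
  d  <- dec$d[seq_len(k)]
  rowstd <- dec$u[, seq_len(k), drop = FALSE] / sqrt(r)
  colstd <- dec$v[, seq_len(k), drop = FALSE] / sqrt(cm)
  list(sv = d,
       rowstd = rowstd, colstd = colstd,
       rowpc = sweep(rowstd, 2, d, "*"),            # principal = standard x singular value
       colpc = sweep(colstd, 2, d, "*"))
}

X <- matrix(c(20, 15,  9, 30,
              12,  7, 25, 18,
              11, 14, 22, 16), nrow = 3, byrow = TRUE)
res <- ca_coords(X)
```

In this sketch the default row biplot would use res$rowpc for the points and res$colstd for the column directions; the second biplot swaps the roles.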
The other main plotting option is to produce a "French-style" plot where both row categories and column categories are plotted as points in principal coordinates. However, again two plots are produced, one with confidence ellipses for the row category points and one with confidence ellipses for the column category points. This is a deliberate restriction, partly to reduce plot clutter but mostly to emphasize that the row and column points are in different spaces and that their relative positions should not be over-interpreted.
There are several options for different ways of plotting the data, with simple choices to vary the colour schemes, or to suppress point labels, or to show only a few of the ellipses, intended particularly to reduce the clutter in pictures with large data matrices. There is also an option for fuller control over the graphics, by supplying either files or data frames to define groups of points which can be plotted in common colours and symbols, or to suppress their point labels and ellipses.
The package can also be used just to perform Correspondence Analysis as usual, but with the above plotting options available.
The package can also be used just to produce the covariance matrices for each of the category points, which can then be used, for example, with ellipse() to add confidence ellipses to results from the ca() and mjca() routines in the ca package.
Multiple Correspondence Analysis:
The same principles as above carry over to the Multiple Correspondence Analysis case with p categorical variables.
The recommended approach is to bootstrap the indicator matrix and then apply MCA to the derived Burt matrix. A new method is provided in the variance calculations to correct for the well-known problems induced by the diagonal of the Burt matrix.
Standard options for correcting inertias and coordinates are also given. In particular, options exist such that MCA with p=2 gives the same results as SCA, so that MCA can be regarded as a proper generalisation of SCA (not all would agree with this, however).
The default plotting of the results is to produce p plots, with each plot showing confidence ellipses for all of the categories of just one variable. This is again done to reduce plot clutter, albeit at the expense of more plots. Again, numerous options for controlling the plots are provided.
Bootstrapping can also be applied to the analysis of an indicator matrix or a doubled matrix, but the procedure is highly experimental and very slow.
The data set can be input in numerous formats, and routines are provided to convert between them. However, note that the data cannot be input as a Burt matrix because this loses some of the information in the data.
Changes from Version 1.0:
Multiple correspondence analysis routines added.
Hungarian algorithm added for axis rearranging, to replace the embarrassingly poor method used previously.
Bootstrap critical values added, and are now the default.
Numerous fairly minor changes made to existing simple correspondence analysis routines.
New routines to convert data between different formats.
There may be a little backwards incompatibility as some options to the routines have changed a bit, but standard use with few options will still work. Any changes should produce easily fixable failures to run.
Changes from Version 2.0:
Bug fixes, including making plotca call dev.new() when running in R but not RStudio, and an option to call it from RStudio.
New routines to reflect axes and reorder categories, e.g. for when using results with other packages.
New routine to add supplementary points to plots.
More options and examples for plotting.
Final Notes:
The package does not use any routines from any of the other Correspondence Analysis packages, only base R routines. This was deliberate, in order to maintain control over the precise details. The only external routines used are lp.assign, from the lpSolve package, for the Hungarian algorithm, and some hcl functions from the colo(u)rspace package for plotting.
The results can be used as input to other packages, for example using ellipse() to draw the ellipses on a plot from ca(); see covmat and plotca for this.
T.J. Ringrose <[email protected]>
Ringrose, T.J. (2012). Bootstrap confidence regions for correspondence analysis. Journal of Statistical Computation and Simulation, Vol 83, No. 10, October 2012, 1397-1413.
A paper on the application to MCA is in preparation.
See also: cabootcrs, plotca, summaryca, printca, convert, covmat, allvarscovs, reflectaxes, reordercategories, addsupplementary, sca, rearrange, cabootcrsresults, cabasicresults
# Data frame of a contingency table, with row and column labels
data(DreamData)
# Perform (simple) correspondence analysis, calculate variances and show confidence ellipses.
# Use all defaults: 999 bootstrap replicates, Poisson resampling, calculate variances
# only for first two axes, but give usual output for up to the first 4 axes.
# Show one biplot with confidence ellipses for row points in principal coordinates,
# another biplot with confidence ellipses for column points in principal coordinates.
# In each case the other set of points are in standard coordinates, but note that the
# lines are cropped to fit the plot by default, as it is the directions that matter most.
bd <- cabootcrs(DreamData)
# Plot in "french" style where both rows and columns are in principal coordinates,
# not as a biplot, but still produce two plots, with row ellipses in one plot
# and column ellipses in the other.
plotca(bd, plottype="french")
## Not run:
# See the stored results, an object of type cabootcrsresults
bd
# Prettier printed output, no plots.
printca(bd)
# Brief summary output, similar style to ca package, no plots
summaryca(bd,datasetname="Dreams")
# Extract the covariance matrix of:
# row 4 for axes 1 and 2;
# column 1 for axes 1 and 2.
vmr4 <- covmat(bd,4,"row",1,2)
vmc1 <- covmat(bd,1,"column",1,2)
# Display all variances and covariances for each row and column, axes 1-2
allvarscovs(bd, "rows")
allvarscovs(bd, "columns")
# Convert the data set into a 223 individuals by 2 variables
# matrix of category membership values
ddnbyp <- convert(DreamData,input="CT",output="nbyp")$result
# Perform multiple correspondence analysis with all defaults:
# non-parametric resampling, analyse Burt matrix, correct for
# the Burt diagonal in the inertias, coordinates and bootstrapping.
# Note that the coordinates, inertias etc are identical to those
# from simple CA above, while the standard deviations and hence
# the ellipses are very similar, but not quite the same.
bdmca <- cabootcrs(ddnbyp, catype="mca")
## End(Not run)
addsupplementary calculates principal coordinate values for supplementary rows or columns in SCA, or supplementary variables in MCA, then plots them on the currently selected graph, which it assumes to be appropriate.
addsupplementary( x, supp, thing = "columns", suppsymbol = "*", suppcolour = "blue", plotsupp = TRUE, varandcat = TRUE )
x |
An object of class |
supp |
A data frame in the format: |
thing |
Whether to calculate the supplementary principal coordinates for
Note that "rows" is only needed for supplementary rows in SCA. |
suppsymbol |
The plot symbol used for the supplementary points |
suppcolour |
The colour of the supplementary points and their labels |
plotsupp |
TRUE if you want the points plotted on the currently active graph, FALSE otherwise |
varandcat |
Flag for how to construct column names for supplementary variables in MCA:
|
To add supplementary rows in SCA, define the parameter supp as a data frame with one named row for each supplementary row
and with the columns the same as in the original data.
If plotting the points, ensure that the plot for rows is selected.
To add supplementary columns in SCA, define the parameter supp as a data frame with one named row for each supplementary column
and with the columns the same as the rows in the original data.
If plotting the points, ensure that the plot for columns is selected.
To add supplementary variables in MCA, define the parameter supp as a data frame with one column for each supplementary variable in
individuals by variables format.
Each row represents the same individual as the same row in the original data,
and each entry is the category value for that supplementary variable.
Hence the new columns should just look exactly like new columns in the 'nbyp' format.
If plotting the points, any of the standard plots is suitable.
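For SCA, the principal coordinates of a supplementary column can be obtained from the transition formula: the column's profile over the rows, multiplied by the row standard coordinates. The sketch below assumes that formula and the same supp layout as above (one row per supplementary column); supp_column_pc and the data are hypothetical, and this is not the code of addsupplementary itself.

```r
# Hypothetical helper: principal coordinates of supplementary columns in simple CA
supp_column_pc <- function(X, supp) {
  P  <- X / sum(X)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))         # standardised residuals
  dec <- svd(S)
  k  <- min(dim(X)) - 1                                  # non-trivial axes
  rowstd <- dec$u[, seq_len(k), drop = FALSE] / sqrt(r)  # row standard coordinates
  supp <- as.matrix(supp)
  (supp / rowSums(supp)) %*% rowstd                      # profile x standard coordinates
}

X <- matrix(c(20, 15,  9, 30,
              12,  7, 25, 18,
              11, 14, 22, 16), nrow = 3, byrow = TRUE)
# two hypothetical supplementary columns, each with one count per row of X
supp <- rbind(c(5, 3, 6), c(1, 7, 3))
supp_column_pc(X, supp)
```

Passing t(X) as supp, i.e. treating the active columns as if they were supplementary, reproduces their ordinary principal coordinates, which is a convenient sanity check on the formula.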
A matrix containing the principal coordinates of the supplementary points
See also: cabootcrs-package, cabootcrs, plotca, cabootcrsresults
results <- cabootcrs(DreamData)
# SCA case two supplementary columns, make sure that the Columns plot is active
suppcols <- data.frame(rbind(c(5,3,6,8,12),c(1,7,3,1,5)))
suppcolpc <- addsupplementary(results, suppcols)
suppcolpc
## Not run:
# SCA case one supplementary row, make sure that the Rows plot is active
supprow <- data.frame(cbind(12,4,8,3),row.names="supprow")
supprowpc <- addsupplementary(results, supprow, thing="rows")
supprowpc
# MCA case, one or two supplementary variables, plots the same on any of the usual plots
results3 <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE, datasetname="Dream data with extra random column")
newsupcol <- c(rep(c(rep("s1",10),rep("s2",10),rep("s3",10)),8))[1:223]
newsupcol2 <- c(rep(c(rep("t1",5),rep("t2",15),rep("t3",25),rep("t4",35)),5))[1:223]
newsupcols <- cbind(newsupcol,newsupcol2)
suppvarpc <- addsupplementary(results3, newsupcol, varandcat=FALSE)
supp2varpc <- addsupplementary(results3, newsupcols, varandcat=FALSE)
## End(Not run)
allvarscovs extracts all variances and covariances for either rows or columns and puts them in a data frame.
allvarscovs(x, thing = "columns")
x |
An object of class |
thing |
Whether to extract the variances for
Note that the default is "columns" as this is more convenient for MCA |
A data frame with one row for each row or column category
See also: cabootcrs-package, cabootcrs, covmat, cabootcrsresults
results <- cabootcrs(DreamData, showresults=FALSE)
rowvars <- allvarscovs(results,"rows")
colvars <- allvarscovs(results,"columns")
## Not run:
resultsmca <- cabootcrs(DreamData223by3, catype="mca", showresults=FALSE)
allvars <- allvarscovs(resultsmca)
## End(Not run)
Cases and severity of asbestosis, classified by years working with asbestos
AsbestosData
A contingency table with 5 rows and 4 columns
Years working with asbestos: 0-9, 10-19, 20-29, 30-39, 40+
Has asbestosis or not: No (N), severity graded as 1-3 (G1-G3)
Still looking for it
Classification of mother's attachment to her child and child's reaction
AttachmentData
A contingency table with 4 rows and 4 columns
Infant response: Avoidant, Secure, Resistant, Disorganised
Mother's Classification: Dismissing, Autonomous, Preoccupied, Unresolved
E.J. Beh, Elliptical confidence regions for simple correspondence analysis, Journal of Statistical Planning and Inference 140 (2010), pp. 2582–2588.
This is intended for internal use within cabootcrs and only contains the data structures required for each bootstrap replicate.
Rprofile
Row profile matrix, class "matrix"
Cprofile
Column profile matrix, class "matrix"
Rweights
Matrix of weights for row points: square roots of inverse column sums, class "matrix"
Cweights
Matrix of weights for column points: square roots of inverse row sums, class "matrix"
Raxes
Matrix of axes for row points: right singular vectors of weighted, centred data matrix, class "matrix"
Caxes
Matrix of axes for column points: left singular vectors of weighted, centred data matrix, class "matrix"
r
Rank of weighted, centred data matrix, class "numeric"
realr
In multiple CA, the number of singular values (Burt matrix) or squared singular values (indicator matrix) exceeding 1/p, where p is the number of variables, class "numeric"
mu
Singular values of weighted, centred data matrix, class "numeric"
cabootcrs performs simple or multiple correspondence analysis and uses bootstrap resampling to construct confidence ellipses for each appropriate category point, printing and plotting the results; for help on the package see cabootcrs-package.
cabootcrs( xobject = NULL, datafile = NULL, datasetname = NULL, nboots = 999, resampledistn = "Poisson", multinomialtype = "whole", printdims = 4, lastaxis = 4, maxrearrange = 6, rearrangemethod = "lpassign", usebootcrits = TRUE, groupings = NULL, grouplabels = NULL, varnames = NULL, plotsymbolscolours = c(19, "inferno", 18, "inferno"), othersmonochrome = "grey", crpercent = 95, catype = "sca", scainput = "CT", mcainput = "nbyp", mcatype = "Burt", mcavariant = "mca", mcasupplementary = "offdiag", mcaadjustinertias = TRUE, mcauseadjustinertiasum = FALSE, mcaadjustcoords = TRUE, mcaadjustmassctr = FALSE, mcaoneploteach = TRUE, mcashowindividuals = FALSE, mcavariablecolours = FALSE, mcacategorycolours = FALSE, Jk = NULL, varandcat = TRUE, likertarrows = FALSE, mcastoreindicator = TRUE, mcaindividualboot = FALSE, mcalikertnoise = 0.1, poissonzeronewmean = 0, newzeroreset = 0, bootstdcoords = FALSE, reflectonly = FALSE, showresults = TRUE, eps = 1e-15 )
xobject |
Name of data object (data frame or similar class that can be coerced to data frame). |
datafile |
Name of a text file (in " ") containing the data, same defaults as xobject, ignored if xobject is non-null |
datasetname |
A string to use as the name of the data set in the plots, defaults to name of xobject or datafile |
nboots |
Number of bootstrap replicate matrices used. The default and recommended minimum is 999,
but 9999 is recommended if the machine and data set size allow;
the calculated variances will sometimes differ
around the third decimal place, but the pictures should look the same. |
resampledistn |
Poisson resampling is the default for SCA,
|
multinomialtype |
Only relevant for multinomial sampling in SCA, otherwise ignored:
|
printdims |
Print full correspondence analysis coordinates, contributions, correlations etc for all output dimensions up to and including this one |
lastaxis |
Calculate variances and covariances for all output axes (dimensions) up to this one
(or the number of dimensions in the solution if smaller). |
maxrearrange |
The maximum number of axes to consider when rearranging |
rearrangemethod |
The method used to rearrange the axes:
Option is only included in case something weird goes wrong with lpSolve. |
usebootcrits |
To be passed to the plot routine, see plotca |
groupings |
To be passed to the plot routine, see plotca |
grouplabels |
To be passed to the plot routine, see plotca |
varnames |
Character p-vector naming the variables, defaults to c("Rows","Columns") in sca |
plotsymbolscolours |
To be passed to the plot routine, see plotca |
othersmonochrome |
To be passed to the plot routine, see plotca |
crpercent |
To be passed to the plot routine, see plotca |
catype |
Type of correspondence analysis:
|
scainput |
Format of input data, only applies for SCA:
|
mcainput |
Format of input data, only applies for MCA:
|
mcatype |
Format of data matrix analysed, only applies for MCA:
|
mcavariant |
Currently must be "mca", placeholder for future updates |
mcasupplementary |
How the sample points are projected as supplementary points onto the bootstrap axes when calculating the variances in MCA of a Burt matrix, see Details section for full explanation
If "offdiag" then when p=2 the variances will be very similar to those from SCA. |
mcaadjustinertias |
Whether to adjust inertias to allow for the meaningless inertia terms induced by the diagonal of the Burt matrix in MCA:
If TRUE then when p=2 the inertias will agree with those from SCA. |
mcauseadjustinertiasum |
How to define the total inertia in MCA, whether to just use the sum of the adjusted inertias:
If TRUE then when p=2 the inertias will agree with those from SCA. |
mcaadjustcoords |
Whether to adjust the principal coordinates in MCA using the adjusted inertias above, as in Greenacre and Blasius, p68:
If TRUE then when p=2 the coordinates will agree with those from SCA. |
mcaadjustmassctr |
Whether to adjust the point masses and column contributions in MCA so that the masses and contributions are with respect to each variable (as in SCA) rather than with respect to all variables together:
If TRUE then when p=2 the CTR will agree with those from SCA, though when p>2 the contributions can be >1. |
mcaoneploteach |
Parameter passed to plotca |
mcashowindividuals |
Parameter passed to plotca |
mcavariablecolours |
Parameter passed to plotca |
mcacategorycolours |
Parameter passed to plotca |
Jk |
The number of classes for each variable in MCA, as a list or vector, which only needs specifying when inputting an indicator matrix, as in other cases it can be derived from the input matrix |
varandcat |
Flag for how to construct variable category names:
|
likertarrows |
Parameter passed to plotca |
mcastoreindicator |
If TRUE then store the indicator matrix created for MCA |
mcaindividualboot |
If TRUE then use the experimental method to bootstrap an indicator or doubled matrix, see Details section part (2) for full explanation |
mcalikertnoise |
The "noise" value to use in the experimental method (above) to bootstrap an indicator or doubled matrix, see Details section part (2) for full explanation |
poissonzeronewmean |
Experimental method for SCA to deal with contingency tables where zero cells
could have been non-zero, i.e. they are not structural zeros. |
newzeroreset |
Experimental method for SCA to deal with sparse contingency tables. |
bootstdcoords |
If TRUE then produce bootstrap variances for points in standard coordinates
instead of principal coordinates |
reflectonly |
If TRUE then just allow for axis reflections and not axis reorderings |
showresults |
If TRUE then output the results using |
eps |
Any value less than this is treated as zero in some calculations and comparisons |
This routine performs all of the usual Correspondence Analysis calculations while also using bootstrapping to estimate the variance of the difference between the sample and population point when both are projected onto the sample axes in principal coordinates. This is done for each row and column category on each dimension of the solution, allowing for sampling variation in both the points and the axes.
It hence constructs confidence ellipses for each category point, plots the results by a call to plotca, and prints the usual Correspondence Analysis summary output and the calculated standard deviations through a call to summaryca.
Use printca for more detailed numerical results.
For further examples and help on the package as a whole see cabootcrs-package.
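A heavily simplified sketch of the variance calculation for row points, under stated assumptions: Poisson resampling, bootstrap axes matched to the sample axes by reflection only (no reordering, roughly in the spirit of reflectonly=TRUE), and the bootstrap column masses used as the metric when projecting. The function names are hypothetical; the package additionally rearranges axes, supports other resampling schemes and uses bootstrap critical values, none of which is shown here.

```r
# Hypothetical helper: row profiles, column masses and axes of a table
ca_axes <- function(X) {
  P  <- X / sum(X)
  r  <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  list(rowprof = P / r, colmass = cm, V = svd(S)$v)
}

# Covariance of (sample - bootstrap) row points projected onto the bootstrap axes
boot_row_cov <- function(X, nboots = 199, axes = 1:2) {
  s <- ca_axes(X)
  diffs <- array(NA_real_, dim = c(nboots, nrow(X), length(axes)))
  for (b in seq_len(nboots)) {
    Xb <- matrix(rpois(length(X), X), nrow(X), ncol(X))   # Poisson resample
    bt <- ca_axes(Xb)
    V  <- bt$V[, axes, drop = FALSE]
    # reflect each bootstrap axis so that it points the same way as the sample axis
    flip <- ifelse(colSums(V * s$V[, axes, drop = FALSE]) < 0, -1, 1)
    V  <- sweep(V, 2, flip, "*")
    # centring cancels in the difference, so only the metric and the axes are needed
    W  <- diag(1 / sqrt(bt$colmass))
    diffs[b, , ] <- (s$rowprof - bt$rowprof) %*% W %*% V
  }
  # one covariance matrix (axes x axes) per row category
  lapply(seq_len(nrow(X)), function(i) cov(matrix(diffs[, i, ], nboots)))
}

set.seed(123)
X <- matrix(c(20, 15,  9, 30,
              12,  7, 25, 18,
              11, 14, 22, 16), nrow = 3, byrow = TRUE)
vc <- boot_row_cov(X, nboots = 99)
vc[[1]]   # 2 x 2 covariance matrix for row 1 on axes 1 and 2
```

Each covariance matrix could then define an ellipse around the corresponding sample point; the package's own construction of the ellipses and critical values is not reproduced here.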
(1) Corrections for Burt diagonal
It is well-known that in multiple CA (MCA) the results are distorted by the diagonal elements of the Burt matrix. As well as the standard methods to correct for this, here we propose and implement a new method to correct for this when bootstrapping. If bootstrapping is applied in a naive way then, even when the standard corrections are used, the estimated variances will be much too small because diagonal elements of the standardised Burt matrix are the same in every bootstrap replicate, thus underestimating the true variation in the data.
All bootstrapping is performed on the indicator matrix (or equivalently the n by p matrix) and the resampled Burt matrix is then constructed from the resampled indicator matrix in the usual way.
Included here are the usual corrections to the inertias (mcaadjustinertias=TRUE, the default) and the coordinates (mcaadjustcoords=TRUE, the default). In addition, you can choose to use as the total inertia either the sum of these adjusted inertias (mcauseadjustinertiasum=TRUE), as proposed by Benzecri, or the average of the off-diagonal inertias (mcauseadjustinertiasum=FALSE, the default), as proposed by Greenacre. You can also adjust (multiply by p) the contributions in MCA so that they sum to p over all variables, i.e. an average of 1 for each variable as in SCA (mcaadjustmassctr=TRUE), rather than a total of 1 over all variables as is usual in MCA (mcaadjustmassctr=FALSE, the default). Note that when p=2 this gives the same results as SCA, but when p>2 you can get contributions greater than 1, so use with caution. This option also adjusts (multiplies by p) the point masses so that they sum to 1 for each variable rather than over all variables, and the same caveat applies.
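The inertia adjustments can be sketched from the standard formulas: for each Burt singular value (SV) above 1/p the adjusted inertia is (p/(p-1))^2 (SV - 1/p)^2, and the average off-diagonal inertia equals p/(p-1) times (Burt inertia - (J - p)/p^2), where J is the total number of categories. The helpers and simulated data are hypothetical; with p=2 the adjusted inertias should reproduce the SCA principal inertias of the two-way table, matching the p=2 behaviour described for mcaadjustinertias.

```r
# Hypothetical helper: indicator matrix, variable q having categories 1..Jk[q]
indicator <- function(dat, Jk) {
  do.call(cbind, lapply(seq_len(ncol(dat)), function(q)
    outer(dat[, q], seq_len(Jk[q]), "==") * 1))
}

ca_inertia <- function(X) {            # total inertia (chi-squared / n) of a table
  E <- outer(rowSums(X), colSums(X)) / sum(X)
  sum((X - E)^2 / E) / sum(X)
}

mca_inertias <- function(dat, Jk, eps = 1e-10) {
  p <- ncol(dat)
  B <- crossprod(indicator(dat, Jk))   # Burt matrix
  P <- B / sum(B)
  r <- rowSums(P)
  S <- (P - outer(r, r)) / sqrt(outer(r, r))
  sv <- eigen(S, symmetric = TRUE, only.values = TRUE)$values  # Burt singular values
  keep <- sv[sv > 1 / p + eps]
  adjusted <- (p / (p - 1))^2 * (keep - 1 / p)^2
  list(adjusted = adjusted,
       sum_adjusted = sum(adjusted),                                         # Benzecri-style total
       offdiag_average = p / (p - 1) * (ca_inertia(B) - (sum(Jk) - p) / p^2)) # Greenacre-style total
}

set.seed(3)
n <- 60
dat2 <- cbind(sample(rep(1:3, length.out = n)), sample(rep(1:3, length.out = n)))
m2 <- mca_inertias(dat2, Jk = c(3, 3))
```

The off-diagonal average formula follows because the Burt inertia is the mean of the p^2 block inertias, and each diagonal block (a variable crossed with itself) contributes J_q - 1.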
The fundamental problem with MCA is that in a Burt matrix each diagonal element is the value of a variable category cross-classified with itself, so it is always equal to the number of times that category appears. Hence if a category appears k times then the row (or column) in the Burt matrix consists of p blocks each of which sum to k, so its row (and column) sum is kp.
Therefore, when the row (or column) profile matrix is calculated, its diagonal elements are always 1/p, while the elements for the categories in each off-diagonal block sum to 1/p. Hence when the projected difference between the sample and resample row (or column) profile matrices is calculated, it is artificially small because the diagonals of the two matrices are always the same, no matter how different the off-diagonal elements are.
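This structure can be checked numerically. The sketch builds a Burt matrix from a simulated indicator matrix, and again after resampling individuals, and confirms that the diagonal of the row profile matrix is 1/p in both cases; indicator, burt_profiles and the data are hypothetical.

```r
# Hypothetical helper: indicator matrix, variable q having categories 1..Jk[q]
indicator <- function(dat, Jk) {
  do.call(cbind, lapply(seq_len(ncol(dat)), function(q)
    outer(dat[, q], seq_len(Jk[q]), "==") * 1))
}

burt_profiles <- function(dat, Jk) {
  B <- crossprod(indicator(dat, Jk))   # Burt matrix built from the indicator matrix
  B / rowSums(B)                       # row profiles (row sum = p x category count)
}

set.seed(2)
n  <- 40
Jk <- c(3, 2, 4)
dat <- sapply(Jk, function(J) sample(rep(seq_len(J), length.out = n)))
p  <- ncol(dat)
R_sample <- burt_profiles(dat, Jk)
# bootstrap the individuals (rows of the n-by-p data), then rebuild the Burt matrix
R_boot <- burt_profiles(dat[sample(n, replace = TRUE), , drop = FALSE], Jk)
diag(R_sample)   # every entry equals 1/p
diag(R_boot)     # and so does every entry of the resample's diagonal
```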
The new method to correct for the diagonal elements of the Burt matrix when calculating variances therefore works by re-expressing the coordinates purely in terms of the interesting and variable off-diagonal elements, excluding the uninteresting and constant diagonal elements.
First calculate the Burt principal coordinates (PC), but with the diagonal elements of the profile matrix ignored (or set to zero). It is easy to verify that the usual Burt principal coordinates can be re-expressed, using the singular values (SV), as
Burt PC = ( Burt SV / (Burt SV - (1/p)) ) x Burt PC without diagonal element
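The "easy to verify" step can be written out as follows, assuming the usual transition formula, in which a category's principal coordinate on axis s is its profile-weighted sum of standard coordinates. Because the Burt matrix is symmetric, row and column standard coordinates coincide, so the standard coordinate of category j is its principal coordinate divided by the singular value. Here a_{ij} is an element of the Burt profile matrix (so a_{ii} = 1/p), \sigma_s is the s-th Burt singular value and \tilde f_{is} is the coordinate computed with the diagonal element ignored.

```latex
\begin{aligned}
f_{is} &= \sum_{j} a_{ij}\,\gamma_{js}
        = \underbrace{\sum_{j \neq i} a_{ij}\,\gamma_{js}}_{\tilde f_{is}} + a_{ii}\,\gamma_{is}
        = \tilde f_{is} + \frac{1}{p}\,\frac{f_{is}}{\sigma_s},
        \qquad \gamma_{js} = \frac{f_{js}}{\sigma_s} \\
\Longrightarrow\quad f_{is}\left(1 - \frac{1}{p\,\sigma_s}\right) &= \tilde f_{is}
\quad\Longrightarrow\quad
f_{is} = \frac{\sigma_s}{\sigma_s - 1/p}\,\tilde f_{is}.
\end{aligned}
```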
Hence in the bootstrapping the sample and resample points are re-expressed in this way and their differences when projected onto the bootstrap axes are calculated as usual.
Similarly the adjusted principal coordinates are calculated as
adj Burt PC = (p/(p-1)) x ( (Burt SV - (1/p)) / Burt SV ) x Burt PC
So, when using the usual adjusted coordinates (and adjusted inertias), both of the above will be used, hence ending up with just a correction of (p/(p-1)). The unadjusted coordinates can be used, but this is not recommended.
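Both coordinate identities can be confirmed numerically under these assumptions: CA of the Burt matrix via the eigendecomposition of its symmetric standardised residual matrix, and principal coordinates obtained from the Burt profile matrix by the transition formula. The helpers, the simulated data and the small margin above 1/p (used only to avoid dividing by a value very close to zero) are illustrative choices, not package settings.

```r
# Hypothetical helper: indicator matrix, variable q having categories 1..Jk[q]
indicator <- function(dat, Jk) {
  do.call(cbind, lapply(seq_len(ncol(dat)), function(q)
    outer(dat[, q], seq_len(Jk[q]), "==") * 1))
}

burt_coords <- function(dat, Jk, margin = 1e-6) {
  p <- ncol(dat)
  B <- crossprod(indicator(dat, Jk))             # Burt matrix
  P <- B / sum(B)
  r <- rowSums(P)
  S <- (P - outer(r, r)) / sqrt(outer(r, r))     # symmetric standardised residuals
  e <- eigen(S, symmetric = TRUE)
  keep <- which(e$values > 1 / p + margin)       # axes with SV clearly above 1/p
  sv  <- e$values[keep]
  std <- e$vectors[, keep, drop = FALSE] / sqrt(r)   # standard coordinates
  A <- B / rowSums(B)                            # Burt profile matrix
  diag(A) <- 0                                   # ignore the constant diagonal (1/p)
  list(p = p, sv = sv,
       pc = sweep(std, 2, sv, "*"),              # usual Burt principal coordinates
       pc_nodiag = A %*% std)                    # principal coordinates without diagonal
}

set.seed(4)
n  <- 80
Jk <- c(3, 3, 4)
dat <- sapply(Jk, function(J) sample(rep(seq_len(J), length.out = n)))
bc <- burt_coords(dat, Jk)
p  <- bc$p
# Burt PC = (SV / (SV - 1/p)) x Burt PC without diagonal element
rebuilt <- sweep(bc$pc_nodiag, 2, bc$sv / (bc$sv - 1 / p), "*")
# adj Burt PC = (p/(p-1)) x ((SV - 1/p) / SV) x Burt PC
adj <- sweep(bc$pc, 2, (p / (p - 1)) * (bc$sv - 1 / p) / bc$sv, "*")
```

In this sketch adj coincides with (p/(p-1)) times bc$pc_nodiag, which is the single p/(p-1) correction described above.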
One consequence of this correction is that when mcaadjustinertias=TRUE, mcaadjustcoords=TRUE, mcauseadjustinertiasum=TRUE, mcaadjustmassctr=TRUE and mcasupplementary="offdiag", then when p=2 the bootstrap variances for MCA are almost the same as those for SCA, while all other results for MCA are the same as those for SCA. The package author regards this as a good thing, making MCA more of a proper generalisation of SCA, but recognises that some people regard MCA as a fundamentally different method to SCA, linked only by the common algebra.
Note that if this adjustment is not made then in the p=2 case it is easy to see that the projected differences are half those in the SCA case, so the standard deviations are half the size and the variances a quarter the size.
The new method will be written up for publication once this update to the package is finished.
(2) Experimental method for bootstrapping indicator matrices
A highly experimental method is included for bootstrapping with an indicator or doubled matrix
in the case of ordered categorical (e.g. Likert scale) data.
This has not been studied or optimised extensively, and is currently very slow, so use is very
much at the user's discretion and at user's risk. To try it, choose:
mcatype="indicator" or mcatype="doubled" with nboots>0
If CRs are required for the individual points then also choose:
mcaindividualboot=TRUE
The bootstrap methodology used here relies on the comparison of bootstrap to sample points when both are projected onto bootstrap axes. In SCA this is fine because when looking at column points the axes are given by the rows and vice versa, and row i and column j each represent the same category in all bootstrap replicates. Similarly in MCA with a Burt matrix when looking at column points the axes are given by the rows, which again each represent the same category in all bootstrap replicates.
However, with an indicator or doubled matrix, row i of the matrix does not represent the same individual in each replicate, it just represents the i-th individual drawn in the resampling for that particular replicate. In order for this type of bootstrapping to work with an indicator or doubled matrix we would need a resampling method whereby the i-th row represents the i-th individual in all bootstrap replicates. The resampled row would need to represent the answers of the same individual "on another day". This might make sense with questionnaire data where an individual's answers have uncertainty attached, in that if asked the same questions on multiple occasions they would give different answers due to random variation rather than temporal change. If a believable model for the sampling, and hence the resampling, could be derived (CUB models perhaps) then this could lead to CRs for an individual point, representing the uncertainty in what that person actually thinks.
The method here uses the same idea of treating the sample row as representing an individual, and bootstrap replicates of that row as representing the variability in how that individual might have answered the questionnaire (or similar). However, instead of an explicit model to represent this variability, it is generated by the data themselves. The bootstrap replicate indicator (or doubled) matrix is generated as usual, by either non-parametric (multinomial) or balanced resampling, and then its rows are "matched" to the rows of the sample indicator (or doubled) matrix. The rows of the resampled matrix are reordered so that, overall, they are as similar as possible to the corresponding rows of the sample matrix. Hence the resampled rows can reasonably be viewed as representing the same individual in each bootstrap replicate, and hence variances for the column points (categories) and row points (individuals) can be produced.
The matching again uses the Hungarian algorithm from lpSolve. This only makes sense if all variables are ordered categorical. Applying this to (ordered) categorical data results in a very large number of ties, so the mcalikertnoise parameter defines the standard deviation of white noise added to each of the sample and resample category numbers for the purpose of the matching (only).
Note that this is very slow for all but small data sets, and if all of the CRs for individuals are shown then the plot is impossibly busy. Hence it is recommended that this experimental method only be used for fairly small data sets where CRs are wanted for only a few example individuals. A better approach might be to average the CRs of all of the individuals who give the same results in the sample, but this is not implemented yet.
Note that supplementary row points in indicator matrix MCA are usually regarded as different people answering the same questions, whereas in this case for CRs to make any sense we need to regard them as the same people answering the same questions on different days.
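The row-matching step can be pictured with a small base-R sketch. This is not the package's code: it assumes the distance between two rows is the squared Euclidean distance between their noisy category numbers, uses a hypothetical `noise_sd` in place of the mcalikertnoise setting, and solves the assignment problem by brute force over all permutations, which is only feasible for a handful of rows (the package uses lpSolve for this step).

```r
# Sketch only: reorder a bootstrap resample of rows so that resampled row i
# is, overall, as close as possible to sample row i.
set.seed(123)
n <- 6; p <- 3; maxcat <- 5
X  <- matrix(sample(1:maxcat, n * p, replace = TRUE), n, p)  # sample: n individuals, p ordered items
Xb <- X[sample(n, replace = TRUE), , drop = FALSE]           # ordinary non-parametric resample of rows

# Small white noise breaks the many ties between integer category numbers;
# noise_sd is a hypothetical stand-in for mcalikertnoise.
noise_sd <- 0.1
Xn  <- X  + matrix(rnorm(n * p, sd = noise_sd), n, p)
Xbn <- Xb + matrix(rnorm(n * p, sd = noise_sd), n, p)

# cost[i, j]: squared distance between sample row i and resampled row j
cost <- as.matrix(dist(rbind(Xn, Xbn)))[1:n, n + (1:n)]^2

# All permutations of 1..n (brute force, tiny n only)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (rest in all_perms(v[-k])) out[[length(out) + 1]] <- c(v[k], rest)
  }
  out
}
perms <- all_perms(seq_len(n))
total_cost <- vapply(perms, function(pm) sum(cost[cbind(seq_len(n), pm)]), numeric(1))
best <- perms[[which.min(total_cost)]]

Xmatched <- Xb[best, , drop = FALSE]  # row i of Xmatched now stands for individual i
```

Because the identity ordering is one of the candidates, the matched ordering is never worse than leaving the resampled rows as drawn.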
(3) Critical values
Bootstrap critical values are calculated by re-using the bootstrap replicates used to calculate
the variances, with a critical value calculated for each ellipse.
The projected differences between bootstrap and sample points are ordered and the appropriate
percentile value picked. These are usually slightly larger than the corresponding chi-squared critical values.
Only 90%, 95% (default) and 99% critical values are calculated.
Alternatively (usebootcrits=FALSE), chi-squared critical values are used, usually with df=2, but with df=1 if a row or column has only 2 non-zero cells.
The experimental method to construct ellipses for individuals in MCA always uses bootstrap critical values.
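A minimal sketch of the comparison between the two kinds of critical value for a single point. It assumes the "projected differences" can be summarised as squared Mahalanobis distances of the bootstrap points from the sample point, using their bootstrap covariance matrix; the bootstrap points are simulated here rather than produced by cabootcrs(), and the package's exact projection and percentile rule may differ.

```r
set.seed(2024)
nboots <- 999
samplepoint <- c(0.30, -0.10)                  # hypothetical sample coordinates on axes 1 and 2
bootpoints  <- cbind(rnorm(nboots, 0.30, 0.05), # hypothetical projected bootstrap points
                     rnorm(nboots, -0.10, 0.03))
S <- cov(bootpoints)                           # bootstrap covariance matrix (2 by 2)

# Squared Mahalanobis distance of each bootstrap point from the sample point
d2 <- mahalanobis(bootpoints, center = samplepoint, cov = S)

# Order the distances and pick the 95th percentile value
bootcrit <- sort(d2)[ceiling(0.95 * nboots)]

# Chi-squared alternatives
chisqcrit2 <- qchisq(0.95, df = 2)  # about 5.99, the usual case
chisqcrit1 <- qchisq(0.95, df = 1)  # about 3.84, only 2 non-zero cells
```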
An object of class cabootcrsresults
cabootcrs-package, cabootcrsresults, plotca, printca, summaryca, covmat, allvarscovs
# Simple CA (SCA) of a 5 by 4 contingency table, using all SCA defaults:
# 999 bootstraps, Poisson resampling, variances for up to first four axes,
# usual output for up to the first 4 axes,
# one biplot with CRs for rows in principal coordinates and another with
# CRs for columns in principal coordinates
bd <- cabootcrs(DreamData)
## Not run:
# Same data set with a completely random three-category third variable added,
# analysed with MCA but with standardisations which mimic SCA as much as possible
bd3 <- cabootcrs(DreamData223by3, catype="mca")
# Explicitly stating what the rows and columns represent, often needed for a contingency table
bd <- cabootcrs(DreamData, datasetname="Maxwell's dream data",
                varnames=c("What the rows are","What the columns are"))
# Multiple CA (MCA) of 3 categorical variables with all defaults:
# non-parametric resampling, Burt matrix analysed,
# each variable has one plot with it in colour with CRs shown, other variables in monochrome.
# Same data set but now as 223 by 3 matrix, with random 3rd column (with 3 categories) added.
bd3 <- cabootcrs(DreamData223by3, catype="mca")
# Comparison of SCA to MCA with p=2, by converting contingency table to 223 by 2 matrix.
# Note that the coordinates and inertias etc are the same while the standard deviations
# and hence the ellipses are very similar but not identical.
bd <- cabootcrs(DreamData)
DreamData223by2 <- convert(DreamData,input="CT",output="nbyp")$result
bdmca <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE)
# Not adjusting inertias, which means that coordinates will also not be adjusted and
# the bootstrapping will use the Burt diagonal.
# Note how the coordinates are larger but the inertias and ellipses are smaller.
bdmcaunadj <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE,
                        mcaadjustinertias=FALSE)
# Applying the standard adjustments to inertias and coordinates, but with
# the bootstrapping still using the Burt diagonal.
# Note how inertias and coordinates are now the same as SCA, but ellipses are smaller.
bdmcaadjbutall <- cabootcrs(DreamData223by2, catype="mca", varandcat=FALSE,
                            mcasupplementary="all")
# Effect of sample size in SCA:
bdx4 <- cabootcrs(4*DreamData)
bdx9 <- cabootcrs(9*DreamData)
ba <- cabootcrs(AttachmentData)
bs <- cabootcrs(SuicideData)
bas <- cabootcrs(AsbestosData)
# Options for SCA:
# SCA with multinomial resampling, with the matrix treated as a single multinomial distribution
bdm <- cabootcrs(DreamData, resampledistn="multinomial")
# Fix the row sums, i.e. keep sum of age group constant
bdmrf <- cabootcrs(DreamData, resampledistn="multinomial", multinomialtype="rowsfixed")
# Use chi-squared critical values for the CRs
bdchisq <- cabootcrs(DreamData, usebootcrits=FALSE)
# Just perform correspondence analysis, without bootstrapping
bdnb0 <- cabootcrs(DreamData, nboots=0)
# Effect of sample size in MCA:
bn <- cabootcrs(NishData, catype="mca")
# Options for MCA
# Using default settings the SCA and MCA standard results are the same when p=2,
# bootstrap standard deviations (multinomial/nonparametric) are similar but not identical
bdsca <- cabootcrs(DreamData,resampledistn="multinomial")
bdmca <- cabootcrs(convert(DreamData,input="CT",output="nbyp")$result, catype="mca")
# Row A can be labelled A rather than R:A
# because the three variables have all different category names
bd3l <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE)
# Balanced resampling, each of the 223 rows occurs 999 times in the 999 resamples
bd3b <- cabootcrs(DreamData223by3, catype="mca", resampledistn="balanced")
# Do not adjust inertias, coordinates or contributions
# (if inertias are not adjusted then coordinates are also not adjusted)
bd3unadj <- cabootcrs(DreamData223by3, catype="mca",mcaadjustinertias=FALSE)
## Comparisons to ellipses from FactoMineR
# Generate some completely random uniform categorical data, construct ellipses.
# The cabootcrs ellipses are very large and overlap extensively, as you would expect
# from completely random data.
# The FactoMineR ellipses are much smaller, often with minimal overlaps,
# giving a completely false impression of genuine differences between categories.
library(FactoMineR)
p <- 4
maxcat <- 5
n <- 100
Xnpr <- apply( as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p)), 2, factor )
fr <- MCA(Xnpr, method="Burt")
plotellipses(fr)
br <- cabootcrs(Xnpr, catype="mca", showresults=FALSE)
plotca(br, mcacategorycolours = TRUE, showcolumnlabels=FALSE)
## Comparisons to results in ca and FactoMineR
# Summary: If using unadjusted inertias and coordinates, the packages produce identical
# results, apart from differences in presentation (rounding off, the naming of rep/cor/cos2).
# Summary: When using adjusted inertias and coordinates (not an option in FactoMineR::MCA)
# the correlations in ca::mjca no longer sum to 1 over all dimensions; in cabootcrs they do.
# Ratios are the same for each dimension, but not each point, as they are standardised differently.
# Example comparisons with random data
library(FactoMineR)
library(ca)
p <- 4
maxcat <- 5
n <- 100
Xnpdf <- as.data.frame( matrix( round(runif(n*p,0.5,maxcat+0.5)), n, p))
Xnpr <- apply( Xnpdf, 2, factor )
# Note that ca::mjca only accepts the data as numerical,
# FactoMineR::MCA only accepts the data as characters
rbun <- cabootcrs(Xnpr, catype="mca", nboots=0, mcaadjustinertias = FALSE)
rcun <- mjca(Xnpdf,lambda="Burt")
summary(rcun)
rfm <- MCA(Xnpr,method="Burt", graph=FALSE)
summary(rfm)
rb <- cabootcrs(Xnpr, catype="mca", nboots=0)
rc <- mjca(Xnpdf)
summary(rc)
realr <- rb@br@realr
rb@ColREP[,1:realr]
rc$colcor[,1:realr]
apply(rb@ColREP[,1:realr],1,"sum")
apply(rc$colcor[,1:realr],1,"sum")
## End(Not run)
This contains all of the usual output from simple or multiple CA, plus the results of the bootstrap analysis and the various settings used for this.
The meanings and possible values for the settings are described in cabootcrs
br
The basic results from CA, class cabasicresults
datasetname
Name of the data set for printing, class "character"
DataMatrix
The sample data matrix, class "matrix"
rows
Number of rows, class "numeric"
columns
Number of columns, class "numeric"
rowlabels
Row category labels, class "character"
collabels
Column category labels, class "character"
varnames
Names of the variables, class "character"
Rowprinccoord
Principal coordinates for row points, class "matrix"
Colprinccoord
Principal coordinates for column points, class "matrix"
Rowstdcoord
Standard coordinates for row points, class "matrix"
Colstdcoord
Standard coordinates for column points, class "matrix"
RowCTR
Contributions for row points, class "matrix"
RowREP
Representations for row points, class "matrix"
ColCTR
Contributions for column points, class "matrix"
ColREP
Representations for column points, class "matrix"
RowVar
Variances for row points, class "matrix"
RowCov
Covariances for row points, class "array"
ColVar
Variances for column points, class "matrix"
ColCov
Covariances for column points, class "array"
inertiasum
Total inertia, class "numeric"
inertias
Axis inertias, class "matrix"
rowmasses
Masses of row points, class "numeric"
colmasses
Masses of column points, class "numeric"
nboots
Number of bootstrap replicates used to calculate the (co)variances, class "numeric". If nboots=0 then standard CA or MCA is performed with no confidence regions produced.
resampledistn
Distribution used for resampling, class "character"
multinomialtype
Form of multinomial resampling used, class "character"
sameaxisorder
Number of resamples with no reordering in first six bootstrap axes, class "numeric"
poissonzeronewmean
Mean used for resampling zero cells, class "numeric"
newzeroreset
Option to reset resample zero cells, class "numeric"
printdims
Number of dimensions to print, though note that all are stored, class "numeric"
axisvariances
Number of axes for which variances were calculated and are stored, class "numeric"
bootcritR
Bootstrap critical values for row points, class "array"
bootcritC
Bootstrap critical values for column points, class "array"
usebootcrits
Whether to use bootstrap critical values for confidence ellipses, class "logical"
catype
Type of correspondence analysis performed, class "character"
mcatype
Type of multiple correspondence analysis performed, class "character"
mcaindividualboot
Whether the experimental method to bootstrap an indicator or doubled matrix was used, class "logical"
IndicatorMatrix
The indicator matrix derived from the data matrix, class "matrix"
Jk
The number of categories for each variable, class "numeric"
p
The number of variables, class "numeric"
mcalikertnoise
The noise value used in the experimental method to bootstrap an indicator or doubled matrix, class "numeric"
mcaadjustinertias
Whether MCA inertias were adjusted, class "logical"
mcauseadjustinertiasum
Whether the adjusted MCA inertia sum was used, class "logical"
mcaadjustcoords
Whether the MCA coordinates were adjusted, class "logical"
mcaadjustmassctr
Whether the MCA masses and contributions were adjusted, class "logical"
mcasupplementary
How supplementary points were calculated when bootstrapping a Burt matrix, class "character"
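For orientation, the standard simple CA quantities stored in these slots (masses, inertias, standard and principal coordinates, contributions and representations) can be sketched in base R with the usual SVD formulae. This is a textbook sketch on a hypothetical table, not the package's code, and the package's storage layout may differ. The signs of the axes returned by svd() are arbitrary, which is one reason axes may need reflecting when results from different software are compared.

```r
# Hypothetical 4 by 3 contingency table
N <- matrix(c(10,  4,  6,
               3, 12,  5,
               7,  6, 15,
               9,  2,  8), nrow = 4, byrow = TRUE)
n  <- sum(N)
P  <- N / n
r  <- rowSums(P)          # row masses (cf. rowmasses)
cm <- colSums(P)          # column masses (cf. colmasses)

# Standardised residuals D_r^(-1/2) (P - r c') D_c^(-1/2)
S   <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec <- svd(S)
K   <- min(dim(N)) - 1    # number of non-trivial axes
sv  <- dec$d[1:K]

inertias   <- sv^2        # axis inertias
inertiasum <- sum(S^2)    # total inertia = chi-squared statistic / n

Rowstd   <- diag(1 / sqrt(r))  %*% dec$u[, 1:K]  # cf. Rowstdcoord
Colstd   <- diag(1 / sqrt(cm)) %*% dec$v[, 1:K]  # cf. Colstdcoord
Rowprinc <- Rowstd %*% diag(sv)                   # cf. Rowprinccoord
Colprinc <- Colstd %*% diag(sv)                   # cf. Colprinccoord

# Contributions of each row to each axis (each column sums to 1)
RowCTR <- r * Rowstd^2

# Representations (squared correlations): share of each row's squared
# chi-squared distance to the centroid that is captured by each axis
dev    <- sweep(P / r, 2, cm)               # row profile minus centroid
d2     <- rowSums(sweep(dev^2, 2, cm, "/")) # squared chi-squared distances
RowREP <- Rowprinc^2 / d2
```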
convert
recodes a data matrix from one of the formats used by the various versions of correspondence analysis
into another (n objects by p variables, counts for distinct combinations of p variables, indicator matrix, contingency table).
convert( Xinput, input = "nbyp", output = "indicator", Jk = NULL, maxcat = NULL, varandcat = TRUE )
Xinput |
A data matrix, in the form of a data frame or similar |
input |
The format of the input matrix:
|
output |
The format of the output matrix:
|
Jk |
A list containing the number of distinct categories for each variable. |
maxcat |
The maximum category value, for use when all variables are Likert on a scale of 1 to maxcat. |
varandcat |
Flag for how to construct column names in an indicator matrix:
|
A list containing:
the output data matrix formatted according to the output argument
a list of length p containing the names of each variable
a list/array (of length p) containing the lists (of length Jk[i]) of category names for each variable
a list of length p containing the number of distinct categories for each variable
the number of variables
getBurt, to obtain a Burt matrix or a subset of an existing one
getCT, to obtain a contingency table (only if p=2)
getindicator, to obtain an indicator matrix
getdoubled, to obtain a doubled matrix if all variables are ordered categorical with numbered categories
Other conversion functions: getBurt(), getCT(), getdoubled(), getindicator()
dreamdataCT <- DreamData
dreamdatanbyplist <- convert(dreamdataCT,input="CT",output="nbyp")
dreamdatanbyp <- dreamdatanbyplist$result
## Not run:
dreamdataCTb <- table(dreamdatanbyp)
dreamdatanbypcounts <- convert(dreamdatanbyp,input="nbyp",output="nbypcounts")$result
dreamdataindicatorlist <- convert(dreamdatanbypcounts,input="nbypcounts",output="indicator")
dreamdatanbypb <- convert(dreamdataindicatorlist$result,input="indicator",
                          output="nbyp",Jk=dreamdataindicatorlist$Jk)$result
nishdatanbyp <- NishData
nishdataindicator <- convert(nishdatanbyp)$result
nishdataBurt <- t(nishdataindicator)%*%nishdataindicator
## End(Not run)
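What these formats mean can be illustrated in base R, independently of convert(). The sketch below expands a hypothetical contingency table into an n by p data frame (one row per individual), builds the corresponding indicator matrix, and recovers the table with table(); the names `CT`, `nbyp` and `Z` are illustrative only.

```r
# Hypothetical 3 by 2 contingency table (rows A-C, columns a-b)
CT <- matrix(c(5, 2,
               1, 4,
               3, 3), nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("a", "b")))

# Contingency table -> n by p: one row per individual, one column per variable.
# expand.grid varies its first factor fastest, matching the column-major order
# of as.vector(CT), so each cell is repeated by its count.
cells <- expand.grid(rowvar = rownames(CT), colvar = colnames(CT),
                     stringsAsFactors = FALSE)
nbyp <- cells[rep(seq_len(nrow(cells)), times = as.vector(CT)), ]
rownames(nbyp) <- NULL

# n by p -> indicator matrix: one 0/1 column per category of each variable
Z <- cbind(outer(nbyp$rowvar, rownames(CT), "==") * 1,
           outer(nbyp$colvar, colnames(CT), "==") * 1)
colnames(Z) <- c(rownames(CT), colnames(CT))

# n by p -> contingency table again
CTback <- table(factor(nbyp$rowvar, levels = rownames(CT)),
                factor(nbyp$colvar, levels = colnames(CT)))
```

Each row of the indicator matrix sums to p (here 2), and its column sums are the category counts.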
covmat
extracts a 2 by 2 covariance matrix for one data point on two dimensions,
allowing the confidence ellipse to be plotted
covmat(x, i, thing = "column", axis1 = 1, axis2 = 2, show = TRUE)
x |
An object of class |
i |
The number of the row or column. Note that in MCA this will be the number of the variable category (e.g. for p=3 variables with 5 categories each, column 8 is the 3rd category of the 2nd variable) |
thing |
Whether to extract the covariance matrix for the i-th row ("row") or the i-th column ("column").
Note that the default is "column" as this is more convenient for MCA |
axis1 |
First axis for which (co)variances are required |
axis2 |
Second axis for which (co)variances are required |
show |
If TRUE then print the extracted covariance matrix |
This can be used with the ellipse() function from the ellipse package to add the confidence ellipse to a picture from another package.
Example: if the cabootcrs() output is stored in Results, the confidence ellipse for row or column i on axes 1 and 2 can be added with:
lines( ellipse(x=covmat(Results,i,"row",1,2,FALSE),
centre=Results@Rowprinccoord[i,cbind(1,2)], npoints=1000),
cex=1, pch=".", col="blue")
lines( ellipse(x=covmat(Results,i,"column",1,2,FALSE),
centre=Results@Colprinccoord[i,cbind(1,2)], npoints=1000),
cex=1, pch=".", col="blue")
Note that reflectaxes
will be needed if cabootcrs() and ca() axes
are reflected with respect to each other
An object of class "matrix"
(square symmetric, 2 by 2)
cabootcrs-package, cabootcrs, allvarscovs, cabootcrsresults
results <- cabootcrs(DreamData, showresults=FALSE)
row2covmataxes12 <- covmat(results,2,"row")
col3covmataxes23 <- covmat(results,3,"column",2,3)
## Not run:
# There are now 3 variables with 5,4,3 categories, hence 12 columns
resultsmca <- cabootcrs(DreamData223by3, catype="mca", showresults=FALSE)
row2covmataxes12mca <- covmat(resultsmca,2,"column")
col3covmataxes23mca <- covmat(resultsmca,8,"column",2,3)
newvarcat2covmataxes12mca <- covmat(resultsmca,11,"column")
# Use ellipse() to put confidence regions around row points on a plot produced by ca().
# Note that reflectaxes() will be needed if cabootcrs() and ca() axes
# are reflected with respect to each other
library(ca)
library(ellipse)
TheData <- DreamData
Results <- cabootcrs(TheData, showresults=FALSE)
caResults <- ca(TheData)
plot(caResults)
for (i in 1:dim(TheData)[1]) {
  lines( ellipse(x=covmat(Results,i,"row",1,2,FALSE),
                 centre=Results@Rowprinccoord[i,cbind(1,2)], npoints=1000),
         cex=1, pch=".", col="blue")
}
## End(Not run)
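To show how a 2 by 2 matrix of the kind returned by covmat() becomes an elliptical region, here is a base-R sketch that needs no extra package. It assumes the region is {x : (x - centre)' S^{-1} (x - centre) = crit}; `ellipse_points` is a hypothetical helper, and the covariance matrix and centre are made-up values. A bootstrap critical value could be passed as `crit` in place of the chi-squared default; the ellipse package's ellipse() does a similar job.

```r
# Boundary of the ellipse {x : (x - centre)' S^{-1} (x - centre) = crit}
ellipse_points <- function(S, centre = c(0, 0), crit = qchisq(0.95, df = 2),
                           npoints = 200) {
  theta  <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(theta), sin(theta))  # unit circle
  L <- t(chol(S))                          # lower triangular, S = L %*% t(L)
  pts <- sqrt(crit) * circle %*% t(L)      # stretch and rotate the circle
  sweep(pts, 2, centre, "+")               # move it to the point's coordinates
}

S <- matrix(c(0.010, 0.004,
              0.004, 0.020), 2, 2)  # hypothetical 2 by 2 output of covmat()
centre <- c(0.30, -0.10)            # hypothetical principal coordinates of the point
pts <- ellipse_points(S, centre)
# lines(pts, col = "blue")  # would add the region to an existing plot
```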
Reported disturbance of dreams among boys, using labels A, a etc
DreamData
A contingency table with 5 rows and 4 columns
Age group: 5-7 (A), 8-9 (B), 10-11 (C), 12-13 (D), 13-14 (E)
Severity of disturbance of dream: lowest (a) to highest (d)
G. Iliopoulos, M. Kateri and I. Ntzoufras, Bayesian estimation of unrestricted and order-restricted association models for a two-way contingency table, Computational Statistics and Data Analysis 51 (2007), pp. 4643–4655.
Reported disturbance of dreams among boys, plus a random column
DreamData223by3
A matrix of 223 individuals by 3 variables
Age group: 5-7 (A), 8-9 (B), 10-11 (C), 12-13 (D), 13-14 (E)
Severity of disturbance of dream: lowest (a) to highest (d)
Random variable: 1 to 3
Adapted from G. Iliopoulos, M. Kateri and I. Ntzoufras, Bayesian estimation of unrestricted and order-restricted association models for a two-way contingency table, Computational Statistics and Data Analysis 51 (2007), pp. 4643–4655.
Reported disturbance of dreams among boys, using ages and original code for severities
DreamDataNames
A contingency table with 5 rows and 4 columns
Age group: 5-7 (A), 8-9 (B), 10-11 (C), 12-13 (D), 13-14 (E)
Severity of disturbance of dream: lowest (1) to highest (4)
G. Iliopoulos, M. Kateri and I. Ntzoufras, Bayesian estimation of unrestricted and order-restricted association models for a two-way contingency table, Computational Statistics and Data Analysis 51 (2007), pp. 4643–4655.
getBurt
recodes a data matrix from one format (n objects by p variables,
counts for distinct combinations of p variables, contingency table) into a Burt matrix,
or extracts a subset of a Burt matrix for selected variables
getBurt( Xinput, input = "nbyp", Jk = NULL, maxcat = NULL, varandcat = TRUE, vars = NULL )
Xinput |
A data matrix, in the form of a data frame or similar |
input |
The format of the input matrix:
|
Jk |
See convert |
maxcat |
See convert |
varandcat |
See convert |
vars |
A list of the variable numbers to be used in the Burt matrix, if only a subset is wanted. |
A Burt matrix: a symmetric block matrix in which each off-diagonal block is the contingency table for a pair of variables and each diagonal block is a diagonal matrix of the category counts for one variable
Other conversion functions: convert(), getCT(), getdoubled(), getindicator()
nishburt <- getBurt(NishData)
nishburtvars1to3 <- getBurt(NishData,vars=1:3)
nishburtvars2and4 <- getBurt(nishburt,input="Burt",Jk=rep(3,4),vars=c(2,4))
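The block structure of a Burt matrix follows from writing it as t(Z) %*% Z for an indicator matrix Z. This base-R sketch uses hypothetical data rather than getBurt(): it builds Z one variable at a time and checks that an off-diagonal block is the contingency table of two variables, while a diagonal block holds that variable's category counts on its diagonal (the "Burt diagonal" referred to elsewhere in this manual).

```r
# Hypothetical n-by-p data: 8 individuals, p = 3 variables with 2, 3, 2 categories
X <- data.frame(v1 = c(1, 2, 1, 1, 2, 2, 1, 2),
                v2 = c(1, 3, 2, 1, 3, 2, 2, 1),
                v3 = c(2, 1, 1, 2, 2, 1, 1, 2))
Jk <- c(2, 3, 2)

# n by J 0/1 matrix for one variable with categories 1..J
indicator <- function(x, J) outer(x, seq_len(J), "==") * 1

Z    <- do.call(cbind, Map(indicator, X, Jk))  # full indicator matrix
Burt <- crossprod(Z)                           # same as t(Z) %*% Z
```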
getCT
recodes a data matrix from one format (n objects by p variables,
counts for distinct combinations of p variables, indicator matrix or Burt matrix) into a contingency table,
for cases where table() doesn't work
getCT( Xinput, input = "nbyp", Jk = NULL, maxcat = NULL, varandcat = TRUE, vars = NULL )
Xinput |
A data matrix, in the form of a data frame or similar |
input |
See convert |
Jk |
See convert |
maxcat |
See convert |
varandcat |
See convert |
vars |
A list of the variable numbers to be used in the contingency table when there are more than 2. |
A contingency table, giving counts for the two cross-classified variables
Other conversion functions: convert(), getBurt(), getdoubled(), getindicator()
nishCTvars23 <- getCT(NishData,Jk=rep(3,4),vars=2:3)
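When the input is already a Burt matrix, the contingency table for two variables can be read off as a block once the numbers of categories Jk are known. A base-R sketch of that bookkeeping, with hypothetical helpers `block_index` and `ct_from_burt` and made-up data (not the package's implementation):

```r
# Positions of variable k's categories among the rows/columns of a Burt matrix
block_index <- function(Jk, k) {
  ends <- cumsum(Jk)
  (ends[k] - Jk[k] + 1):ends[k]
}

# Hypothetical helper: contingency table of variables i and j from a Burt matrix
ct_from_burt <- function(Burt, Jk, i, j) {
  Burt[block_index(Jk, i), block_index(Jk, j), drop = FALSE]
}

# Hypothetical data: 6 individuals, 3 variables with 3, 2 and 3 categories
X  <- data.frame(v1 = c(1, 2, 1, 3, 2, 3),
                 v2 = c(2, 2, 1, 1, 2, 1),
                 v3 = c(1, 3, 2, 3, 1, 2))
Jk <- c(3, 2, 3)
Z  <- do.call(cbind, Map(function(x, J) outer(x, seq_len(J), "==") * 1, X, Jk))
Burt <- crossprod(Z)

ct23 <- ct_from_burt(Burt, Jk, 2, 3)  # same counts as cross-tabulating v2 and v3
```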
getdoubled
recodes a data matrix from one format (n objects by p variables,
counts for distinct combinations of p variables, contingency table) into a doubled matrix
getdoubled(Xinput, input = "nbyp", Jk = NULL, maxcat = NULL)
Xinput |
A data matrix, in the form of a data frame or similar, all variables must be ordered categorical with numerical categories |
input |
See convert |
Jk |
See convert |
maxcat |
See convert |
A doubled matrix, where each variable is represented by 2 columns, and a data point taking the i-th category for a variable with J_k categories is given the values J_k-i in the first (low) column and i-1 in the second (high) column
Other conversion functions: convert(), getBurt(), getCT(), getindicator()
nishdoubled <- getdoubled(NishData)
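The doubled coding described above can be written directly in base R. In this sketch, with hypothetical helper `double_var` and made-up Likert data, category i of a variable with J_k categories becomes J_k - i in the "low" column and i - 1 in the "high" column, so each pair of columns always sums to J_k - 1.

```r
# Doubled coding of one ordered variable with J categories (values 1..J)
double_var <- function(x, J) cbind(low = J - x, high = x - 1)

# Hypothetical data: 5 individuals, 2 Likert items with 3 and 5 categories
X  <- data.frame(q1 = c(1, 3, 2, 2, 1), q2 = c(5, 1, 3, 4, 2))
Jk <- c(3, 5)

D <- do.call(cbind, Map(double_var, X, Jk))
colnames(D) <- paste(rep(names(X), each = 2), c("low", "high"), sep = "-")
```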
getindicator
recodes a data matrix from one format (n objects by p variables,
counts for distinct combinations of p variables, contingency table) into an indicator matrix
getindicator( Xinput, input = "nbyp", Jk = NULL, maxcat = NULL, varandcat = TRUE )
Xinput |
A data matrix, in the form of a data frame or similar |
input |
See convert |
Jk |
See convert |
maxcat |
See convert |
varandcat |
See convert |
An indicator matrix, where a variable with J_k categories is represented by J_k columns and a data point taking the i-th category has 1 in the i-th of these columns and a zero in the others
Other conversion functions: convert(), getBurt(), getCT(), getdoubled()
nishindicator <- getindicator(NishData)
myresamplefn
in this case assumes that each pair of cells represents 50 people
answering yes or no to a question, with undecideds not recorded
myresamplefn(X)
X |
a data matrix, to be resampled from |
This is only intended as an example of a user-generated resampling routine; users should replace it with their own function of this name
In this example we assume that the rows represent groups of 50 people, each of whom has been asked columns/2 questions, with possible answers yes/no/undecided. It uses binomial bootstrapping for pairs of columns, assuming that successive columns are "yes" and "no" answers, with the others undecided, so that the sum of each pair of columns has a maximum, in this case 50.
a resampled version of the input data matrix
# Five groups of people answer two yes/no/undecided questions
# Note: this is just an example, and does not intend to claim that this
# is the correct analysis for such a data set
x <- as.matrix( rbind( c(22,25,18,22), c(12,23,21,27), c(31,12,28,22),
                       c(29,14,35,11), c(7,31,12,21)))
xresampled <- myresamplefn(x)
bmr <- cabootcrs(x,"myresample",nboots=199)
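A sketch of what such a user-written routine might look like. This is not the package's myresamplefn: it assumes that, for each group and question, the yes, no and (implied) undecided counts are redrawn together as one multinomial sample of 50 people, which is one possible reading of "binomial bootstrapping for pairs of columns". The function name `resample_yes_no` is hypothetical; according to the text above, a real replacement would have to be called myresamplefn.

```r
# Hypothetical resampler: each row is a group of `groupsize` people; columns
# (1,2), (3,4), ... hold the yes and no counts for one question, and the
# remainder of the group is taken to be undecided.
resample_yes_no <- function(X, groupsize = 50) {
  Xr <- X
  for (i in seq_len(nrow(X))) {
    for (q in seq(1, ncol(X), by = 2)) {
      yes <- X[i, q]
      no  <- X[i, q + 1]
      probs <- c(yes, no, groupsize - yes - no) / groupsize
      draw  <- rmultinom(1, size = groupsize, prob = probs)  # yes, no, undecided
      Xr[i, q:(q + 1)] <- draw[1:2]                          # keep yes and no only
    }
  }
  Xr
}

x <- as.matrix(rbind(c(22, 25, 18, 22), c(12, 23, 21, 27), c(31, 12, 28, 22),
                     c(29, 14, 35, 11), c(7, 31, 12, 21)))
set.seed(1)
xr <- resample_yes_no(x)
```

Redrawing the undecided count alongside yes and no keeps every resampled pair of columns within the group size of 50.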
Questionnaire data collected at a Dual Scaling workshop in Singapore
NishData
A matrix of 13 individuals by 4 variables each with 3 possible answers
20-29 (1), 30-39 (2), 40+ (3)
Agree (1), Disagree (2), Can't tell (3)
Agree (1), Disagree (2), Can't tell (3)
Agree (1), Disagree (2), Indifferent (3)
Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Lawrence Erlbaum Associates, New Jersey. (p153)
Animal bones classified by 11 different variables
OsteoData
A matrix of 6027 bones by 11 variables, each with 2-12 categories
1-10
1-12
P/A
P/A
P/A
1-4 or A
1-5 or A
P/A
P/A
P/A
1-4
Macheridis, S. The Use of Multiple Correspondence Analysis (MCA) in Taphonomy: The Case of Middle Helladic Asine, Greece. International Journal of Osteoarchaeology 27 (2017), pp. 477–487.
Animal bones classified by 11 different variables
OsteoDataNames
A matrix of 6027 bones by 11 variables, each with 2-12 categories
Sheep/goat,Pig,Cattle,Deer,Dog,Equid,Large,Medium,Small,Size:indet
Horn/antler,Head,Neck,Axial,Upper_front,Lower_front,Upper_hind,Lower_hind,Feet,Metapodials,Long_bone,Element:indet
P/A
P/A
P/A
stage 1-4 or A
stage 1-5 or A
P/A
P/A
P/A
Secondary,Room_fills,Primary,Floors
Macheridis, S. The Use of Multiple Correspondence Analysis (MCA) in Taphonomy: The Case of Middle Helladic Asine, Greece. International Journal of Osteoarchaeology 27 (2017), pp. 477–487.
plotca
Produces one or more scatterplots of the results of simple or multiple correspondence analysis, with elliptical confidence regions around chosen points.
plotca(
  x,
  datasetname = NULL,
  mytitles = NULL,
  showrowlabels = TRUE,
  showcolumnlabels = TRUE,
  plotsymbolscolours = c(19, "inferno", 18, "inferno"),
  othersmonochrome = "grey",
  crpercent = 95,
  usebootcrits = NULL,
  plottype = "biplot",
  showrowcrs = TRUE,
  showcolumncrs = TRUE,
  likertarrows = FALSE,
  firstaxis = 1,
  lastaxis = 2,
  plotallpairs = "successive",
  picsize = NULL,
  mcaoneploteach = TRUE,
  mcashowindividuals = FALSE,
  mcavariablecolours = FALSE,
  mcacategorycolours = FALSE,
  groupings = NULL,
  grouplabels = NULL,
  eps = 1e-15,
  plotwithdevnew = FALSE
)
x |
An object of class |
datasetname |
A string to use as the name of the data set in the plots, defaults to that in cabootcrs object |
mytitles |
A list of text strings, to be used instead of the default titles of the plots, where the list must be at least as long as the number of plots to be produced |
showrowlabels |
If TRUE then label row points as usual, otherwise suppress labels of row points. Note: when analysing a Burt matrix the column points are plotted |
showcolumnlabels |
If TRUE then label column points as usual, otherwise suppress labels of column points. Note: when analysing a Burt matrix the column points are plotted |
plotsymbolscolours |
A vector/list of length 1, 2, 4 or equal to the number of category points to be plotted. (a) If longer than length 4 then it contains the colours for all the points, their ellipses and labels. It is taken to be a vector or list of valid R colours, of length at least equal to the number of category points to be plotted, in the order rows followed by columns for simple CA, or variable categories in order for MCA. The colours can be named (e.g. "azure2") or RGB hexadecimal (e.g. "#1173B3"), and can be subsetted from colours() or from the palettes in the colorspace package. (b) If length 4 then it takes the form: c(row symbol, "row colour", column symbol, "column colour") giving plot symbols and colours for row and column points and ellipses when they are the primary points.
(c1) If length 2 and the first element is a valid colour choice as above then it takes the form: c("row colour", "column colour") giving colours for row and column points and ellipses when they are the primary points.
(c2) If length 2 and the first element is *not* a valid colour choice as above then it takes the form: c(column symbol, "column colour") giving plot symbols and colours for column points and ellipses when they are the primary points.
(d) If length 1 then it takes the form: "column colour" giving colours for column points and ellipses when they are the primary points, with the same choices as for the second element above. Note: colour can also be specified as colours()[i], which picks out the i-th colour. Note: when analysing a Burt matrix the column points are plotted. The idea behind "alldifferent" etc. is that the colours change gradually, so that if the order of the rows/columns is meaningful then the colour change tracks this. Note: groupings and grouplabels below override all this. |
othersmonochrome |
Either:
|
crpercent |
The nominal coverage percentage of the confidence ellipses (90, 95 or 99 only if using bootstrap critical values) |
usebootcrits |
Whether to use bootstrap critical values for the ellipses:
Note: only 90%, 95% and 99% bootstrap critical values are available |
plottype |
This is only relevant for simple CA; in MCA the choice is automatic.
|
showrowcrs |
Whether to plot confidence ellipses for row points:
Note: when analysing a Burt matrix the column points are plotted |
showcolumncrs |
Whether to plot confidence ellipses for column points:
Note: when analysing a Burt or indicator matrix the column points are plotted, the columns being all of the variable categories, ordered by variable number; e.g. for p=3 variables, each with 5 categories, columns 6:10 are variable 2. Note: in MCA this is overridden by mcaoneploteach=TRUE, in which case there is one plot per variable, automatically giving ellipses for each of its categories. |
likertarrows |
If TRUE then, for MCA on likert-type ordered categorical data,
draw arrows connecting the category points for each variable,
with the arrows drawn from a category point to the next higher category point. |
firstaxis |
Number of the first (i.e. highest inertia) axis to be plotted |
lastaxis |
Number of the last (i.e. lowest inertia) axis to be plotted, which must be <= axisvariances value for x. |
plotallpairs |
Whether to plot all pairs of axes against each other:
|
picsize |
A 2-vector/list or 4-vector/list specifying the plot size:
All plots have an aspect ratio of 1. The same scales are used for all plots, so in the biplot case it might occasionally be preferable to run plotca twice with different picsize values, one being better for rows in principal coordinates and the other better for columns in principal coordinates. If picsize is used to focus in on a particular area of the plot then biplot labels might not appear properly. If using RStudio then it may override this somewhat, especially if you resize the plot window after plotting. If RStudio has too much of a mind of its own when plotting then try plotwithdevnew=TRUE to put each plot in a new window, as in standard R. |
mcaoneploteach |
For MCA only, a flag or list of column numbers saying whether to produce one plot for each variable, where confidence ellipses are shown for that variable but not others:
|
mcashowindividuals |
For MCA on an indicator matrix only, a flag saying whether to plot the individuals on the plot(s):
|
mcavariablecolours |
If TRUE then each variable has its own colour. In this case the only valid colour options for plotsymbolscolours are "viridis", "inferno", "alldifferent", "differentreds" or "differentblues" as above. Note: if plotsymbolscolours has length > 4, and so is specifying a colour for each category, then it overrides this. |
mcacategorycolours |
If TRUE then each category number has its own colour. In this case the only valid colour options for plotsymbolscolours are "viridis", "inferno", "alldifferent", "differentreds" or "differentblues" as above. Note: if plotsymbolscolours has length > 4, and so is specifying a colour for each category, then it overrides this. |
groupings |
The name of a file (in " ") or data frame containing the group structure of the row and column points. The n rows are divided into m groups and the p columns into k groups, and the file or data frame is n+p by 2, where the first column is 1..n followed by 1..p (to make the file easier to read) and the second column is the group number. Hence the file or data frame is: 1 <the number of the group-of-rows to which row 1 belongs> |
grouplabels |
The name of the file (in " ") or data frame containing the colours and labels to be used, in association with the groupings option above, in an m+k by 5 array: 1 <legend> <plot symbol> <plot colour> <draw ellipse?> The first column contains the number of the group-of-rows or group-of-columns, the others are:
See Details section and examples below to make more sense of this. This can also be used for multiple CA, remembering that only column points are shown (usually) and that columns are ordered by variable and then by category, so that three variables each with 5 categories will be columns 1:5, 6:10 and 11:15 respectively. It should not be used when mcaoneploteach=TRUE, however, as that already takes care of this sort of grouping. These options are particularly intended for large data sets, to allow attention to be drawn to some points above others, to emphasize any group structure within the data, or to show only the most important ellipses in order to make the picture less cluttered. |
eps |
Any value less than this is treated as zero in some calculations and comparisons |
plotwithdevnew |
When using RStudio, a flag saying whether to put each plot in a new device (plot window) or just use the default RStudio plot pane:
|
In the following, the categories for which confidence regions are being shown are referred to as the primary points, the others as the secondary points. The primary points are always plotted in principal coordinates while the secondary points can be in standard (biplot style in simple CA) or principal (french style in simple CA, always in multiple CA) coordinates.
The default colour scheme is for the primary points and their confidence ellipses to be plotted each in a different colour, as this makes it easier to see which ellipse goes with which point, while the secondary points are all plotted in monochrome to make it easier to distinguish between the two sets of points. This can all be controlled by the user. Note that a point will still be treated as a primary point and plotted with its own colour even when the plotting of its ellipse is suppressed with the showrowcrs or showcolumncrs options.
Note that the plots will look better if saved as .eps or .pdf rather than as .jpg or .png.
(1) Simple CA
Two plots are produced, in each plot one set of points (rows or columns) is regarded as
the primary set and is plotted in principal coordinates with confidence regions shown:
- one plot shows confidence regions for rows in principal coordinates
- one plot shows confidence regions for columns in principal coordinates
The other set of points (columns or rows) is regarded as the secondary set and the plotting
depends on the choice of biplot or french-style plot:
biplot - secondary points shown as directions in standard coordinates
french - secondary points shown in principal coordinates
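The difference between the two coordinate systems can be illustrated with the textbook SVD formulation of simple CA. The base R sketch below is not the package's own code (ca_coords() is a hypothetical helper); it uses the 5 by 4 table from the myresample example and shows that principal coordinates are standard coordinates scaled by the singular values, so that the mass-weighted sum of squared principal coordinates on each axis equals that axis's principal inertia.

```r
# Minimal sketch of simple CA via the SVD (not cabootcrs's internal code).
# ca_coords() is a hypothetical helper for illustration only.
ca_coords <- function(N) {
  P <- N / sum(N)                      # correspondence matrix
  r <- rowSums(P)                      # row masses
  cm <- colSums(P)                     # column masses
  # standardised residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
  S <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
  dec <- svd(S)
  k <- min(dim(N)) - 1                 # number of non-trivial axes
  d <- dec$d[1:k]                      # singular values
  rowstd <- diag(1 / sqrt(r)) %*% dec$u[, 1:k, drop = FALSE]
  colstd <- diag(1 / sqrt(cm)) %*% dec$v[, 1:k, drop = FALSE]
  list(
    sv = d,
    rowstd = rowstd,                   # standard coordinates (biplot directions)
    colstd = colstd,
    rowprinc = rowstd %*% diag(d, k),  # principal = standard * singular value
    colprinc = colstd %*% diag(d, k)
  )
}

x <- rbind(c(22,25,18,22), c(12,23,21,27), c(31,12,28,22),
           c(29,14,35,11), c(7,31,12,21))
res <- ca_coords(x)
r <- rowSums(x) / sum(x)
# mass-weighted sums of squared row principal coordinates ...
colSums(r * res$rowprinc^2)
# ... equal the principal inertias (squared singular values)
res$sv^2
```

The total inertia, sum(res$sv^2), is the Pearson chi-squared statistic of the table divided by its grand total.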
(2) Multiple CA
All points are plotted in principal coordinates ("french")
Burt matrix (mcatype="Burt"):
a) only plot the columns of the Burt matrix (the rows are the same)
b) plot all variable categories, i.e. columns
c) if mcaoneploteach=TRUE then produce p plots, each with CRs for all categories of one of the variables,
otherwise produce one plot showing CRs for all variables (busy)
d) Columns are ordered by variable then category
(e.g. for p=3 with 5 categories each, columns 6:10 are variable 2)
Indicator matrix (mcatype="indicator"):
a) if mcashowindividuals=TRUE then plot individual (row) points, without CRs
b) as Burt for variables
Indicator matrix (mcatype="indicator") with experimental likert resampling (cabootcrs had mcaindividualboot=TRUE):
a) if mcashowindividuals=TRUE then plot individual (row) points, with CRs (busy)
b) as Burt for variables
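The relationship between the two MCA input types can be sketched in base R: each variable is expanded into 0/1 indicator columns, ordered by variable then category, and the Burt matrix is t(Z) %*% Z. Its diagonal blocks are diagonal matrices of category counts (the diagonal that distorts Burt-matrix results) and its off-diagonal blocks are the two-way contingency tables. The data frame and the indicator() helper are made up for illustration; this is not necessarily how cabootcrs builds these matrices internally.

```r
# Sketch: indicator matrix Z and Burt matrix B = Z'Z for categorical data.
# The data and the indicator() helper are hypothetical examples.
dat <- data.frame(v1 = c(1, 2, 2, 3, 1), v2 = c(2, 1, 2, 2, 1))

indicator <- function(v) {
  levs <- sort(unique(v))
  Z <- outer(v, levs, "==") * 1       # one 0/1 column per category
  colnames(Z) <- levs
  Z
}

Z <- do.call(cbind, lapply(dat, indicator))  # 5 individuals x (3 + 2) categories
B <- crossprod(Z)                            # Burt matrix, 5 x 5
B
```

Each individual contributes p^2 ones to the Burt matrix, so sum(B) equals the number of individuals times p^2 (here 5 * 4 = 20).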
(3) Critical values
Critical values for the ellipses default to those specified in cabootcrs, which default to bootstrap critical values.
(4) Choosing colours and which ellipses to show
The showrowcrs, showcolumncrs, showrowlabels, showcolumnlabels and othersmonochrome options are available as ways of reducing plot clutter in large data sets, for example by showing the column points unlabelled and monochrome as a way of drawing the eye to the multicoloured row points and ellipses.
The default is for each primary point to be in a different colour, with secondary points in the colour defined by othersmonochrome (default grey). If othersmonochrome=NULL then secondary points are also plotted with different colours.
The plotsymbolscolours option can be used to specify quickly the set of symbols and colours used, with the options described in (5) below giving far more control at the cost of extra work.
The default colour scheme is the grDevices "inferno" palette, but with k+2 colours picked and then colours 2 to k+1 used, because the end colour is a bit too yellow and hard to see.
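The trimming can be sketched in R as below. This assumes the palette comes from grDevices::hcl.colors() with palette = "Inferno", which may not be exactly how cabootcrs obtains it, and trimmed_inferno() is a hypothetical name. Note the parentheses: in R, 2:k+1 parses as (2:k)+1, so 2:(k+1) is needed to keep k colours.

```r
# Sketch of the trimmed default palette (assumption: "Inferno" from grDevices).
# trimmed_inferno() is a hypothetical helper, not a cabootcrs function.
trimmed_inferno <- function(k) {
  full <- grDevices::hcl.colors(k + 2, palette = "Inferno")
  full[2:(k + 1)]   # drop the first and last colours; 2:k+1 would give (2:k)+1
}

trimmed_inferno(5)  # five hex colour strings
```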
Note that french-style plots in simple CA are often less cluttered because they omit the biplot lines, while they also show the two sets of points on similar scales so that it is easier to fit all the points on one picture without cropping or excessive empty space.
(5) Specifying colours for (groups of) points and ellipses
For large matrices the plots from exploratory multivariate methods are often so busy that the whole point of the method, to clarify the structure of the data, is nullified. This is even more of a problem when confidence regions are shown on the plots.
Hence points can be divided into groups in one or more ways, e.g. rows 1-3 in red and rows 4-8 in blue, or rows 1-5 in green and rows 6-8 in orange etc.
The groupings and grouplabels options take separate text files or data frames that define the groups of points. If groupings is left NULL then plotsymbolscolours is used instead.
There are two ways of defining groupings and group labels. The first of these is by defining a pair of data frames within R and supplying them as parameters either to cabootcrs initially or to plotca. This method works in R CMD check and hence is the one used in the examples, but as you can see is rather hard to follow.
To plot with colours defined using groups-of-points:
bd <- cabootcrs(DreamData)
Then define the groups using data frames in R or text files:
(5a) Using data frames
These data frames define the same groupings and colours as the files below, see the files for a clearer explanation:
bd <- cabootcrs(DreamData)
groupingsframe <- cbind(c(1:5,1:4),c(1,1,2,2,3,1,1,2,2))
grouplabframe <- cbind( c(1,2,3,1,2), c("AB","CD","E","ab","cd"), c(19,20,21,"+","*"), c("green","blue","yellow","red","orange"), "T" )
plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe)
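The groupings data frame above is hard to read because the group numbers for rows and columns are run together. A hypothetical helper, make_groupings() (not part of cabootcrs), can build the same n+p by 2 matrix from one membership vector for the rows and one for the columns:

```r
# Hypothetical helper (not part of cabootcrs): build the n+p by 2 groupings
# matrix from a group-membership vector for rows and one for columns.
make_groupings <- function(rowgroups, colgroups) {
  cbind(c(seq_along(rowgroups), seq_along(colgroups)),  # 1..n then 1..p
        c(rowgroups, colgroups))                        # group numbers
}

# Rows 1-2 -> group 1, rows 3-4 -> group 2, row 5 -> group 3;
# columns 1-2 -> group 1, columns 3-4 -> group 2 (as in the example above)
groupingsframe <- make_groupings(c(1, 1, 2, 2, 3), c(1, 1, 2, 2))
groupingsframe
```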
(5b) Using text files
The second way produces identical results but does not work in R CMD check; it is usually much easier for the user, as the files can be edited outside R. The groupings and group labels are defined in files in the directory specified by setwd(). To obtain identical results to the above, create two text files as below:
DreamGroupings.txt contains
1 1
2 1
3 2
4 2
5 3
1 1
2 1
3 2
4 2
e.g. the first two lines show that rows 1,2 belong to group-of-rows 1, while the last two lines show that columns 3,4 belong to group-of-columns 2.
DreamGroupLabels.txt contains
1 AB 19 "green" T
2 CD 20 "blue" T
3 E 21 "yellow" T
1 ab + "red" T
2 cd * "orange" T
e.g. group-of-rows 1 will be shown in green and plotted with symbol 19, with the legend AB.
bd <- cabootcrs(DreamData, showresults=FALSE)
plotca(bd, groupings="DreamGroupings.txt", grouplabels="DreamGroupLabels.txt")
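If you would rather not type the two files by hand, they can be written from R. This sketch assumes that plotca reads whitespace-separated columns as in the listings above (an assumption about the file reader); it writes to tempdir() so the working directory is untouched. To use the files as in the example, write them to the working directory instead. With quote = FALSE the colours are written without the quotation marks shown in the listing, which read.table() treats the same way.

```r
# Sketch: write the groupings and group-label files from R.
# Assumption: cabootcrs accepts plain whitespace-separated columns.
groupingsframe <- cbind(c(1:5, 1:4), c(1, 1, 2, 2, 3, 1, 1, 2, 2))
grouplabframe <- cbind(c(1, 2, 3, 1, 2), c("AB", "CD", "E", "ab", "cd"),
                       c(19, 20, 21, "+", "*"),
                       c("green", "blue", "yellow", "red", "orange"), "T")

gfile <- file.path(tempdir(), "DreamGroupings.txt")
lfile <- file.path(tempdir(), "DreamGroupLabels.txt")
write.table(groupingsframe, gfile, quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(grouplabframe, lfile, quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(gfile)[1:2]   # "1 1" "2 1"
readLines(lfile)[4]     # "1 ab + red T"
```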
(5c) General use
Even without groupings this can be used to specify all colours, simply by making each point its own group. In this case rows 1-5 and columns 1-4 define row groups 1-5 and column groups 1-4; no legend is required, so repeat "", and choose 9 plot symbols and 9 colours.
Hence to plot each point with its own specified colour and symbol:
bd <- cabootcrs(DreamData, showresults=FALSE)
groupingsframe <- cbind(c(1:5,1:4),c(1:5,1:4))
grouplabframe <- cbind( c(1:5,1:4), rep("",9), 11:19, c("green","blue","yellow","red","orange","grey1","grey22","grey44","grey66"), "T" )
plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe)
Note: plotsymbolscolours can be used to plot with a different colour for each category point, but with default symbols.
(5d) MCA use
As before, but groupings must be specified for both row and column categories even though only column categories will be plotted, so just duplicate the data frames (yes I know it's a bodge).
Hence to plot each point with your own choice of colour and symbol:
bd3 <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE)
groupingsframe <- cbind(1:12,1:12)
groupingsframe <- rbind(groupingsframe,groupingsframe)
grouplabframe <- cbind( 1:12, rep("",12), 11:22, c("green","blue","yellow","red","orange","grey1","grey22","grey44","grey66","coral1","coral2","coral3"), "T" )
grouplabframe <- rbind(grouplabframe,grouplabframe)
plotca(bd3, groupings=groupingsframe, grouplabels=grouplabframe)
This can also be used to specify colours and symbols for each variable in MCA, but note that you are still specifying them for each column point, so you need to know how many categories each variable has; a simpler way to do this will be added in a later update.
ost <- cabootcrs(OsteoDataNames, catype="mca", varandcat=FALSE)
totcolumns <- ost@columns
totvars <- ost@p
numcats <- ost@Jk
cats <- NULL
for (i in 1:totvars) {
  cats <- c(cats,rep(i,numcats[i]))
}
groupingsframe <- cbind(1:totcolumns,cats)
groupingsframe <- rbind(groupingsframe,groupingsframe)
grouplabframe <- cbind( 1:totvars, rep("",totvars), rep(19,totvars), c("blue","red","green","darkgreen","green3","black","grey1","coral1","coral2","coral3","navy"), "T" )
grouplabframe <- rbind(grouplabframe,grouplabframe)
plotca(ost, groupings=groupingsframe, grouplabels=grouplabframe)
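The for loop above builds cats, the variable number of each column. A shorter base R equivalent uses rep() with a times vector, and which() then gives the columns belonging to one variable (for example, to pass to showcolumncrs). Jk here is a made-up set of category counts; in the example above the counts come from ost@Jk.

```r
# Sketch: map each MCA column (category) to its variable.
# Jk is a hypothetical example of the number of categories per variable.
Jk <- c(3, 4, 2)
cats <- rep(seq_along(Jk), times = Jk)   # variable number of each column
cats                                    # 1 1 1 2 2 2 2 3 3
which(cats == 2)                        # 4 5 6 7: the columns of variable 2

# Groupings must be given for rows and columns, so duplicate the frame
groupingsframe <- cbind(seq_len(sum(Jk)), cats)
groupingsframe <- rbind(groupingsframe, groupingsframe)
```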
This can also be used to plot the individual points with colours to denote groups, for example in the below the first 100 individual points are plotted with one colour, the rest with another, while all columns get their own colour.
bd3indnoboot <- cabootcrs(DreamData223by3,catype="mca",mcatype="indicator",varandcat=FALSE,nboots=0)
rowgroups <- cbind( 1:223, c(rep(1,100),rep(2,123)) )
colgroups <- cbind(1:12,1:12)
groupingsframe <- rbind( rowgroups, colgroups )
rowlabs <- cbind( 1:2, c("1-100","101-223"), c("+","+"), c("black","grey"), "T")
collabs <- cbind( 1:12, rep("",12), "*", c("green","cyan","yellow","red","orange","blue","blue1","blue2","blue3","coral1","coral2","coral3"), "T" )
grouplabframe <- rbind(rowlabs,collabs)
plotca(bd3indnoboot, groupings=groupingsframe, grouplabels=grouplabframe,
  mcashowindividuals=TRUE, mcaoneploteach=FALSE, showrowlabels=FALSE,
  othersmonochrome=NULL, picsize=c(-.36,.36))
(6) Plotting results from cabootcrs() using ellipse() or ca()
This can be used with the ellipse package to add the confidence ellipses to a picture from another package.
Example: the confidence ellipse for row i or column i on axes 1 and 2, from the cabootcrs output Results, is added by:
# ellipse for row i
lines(ellipse(x=covmat(Results,i,"row",1,2,FALSE), centre=Results@Rowprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="blue")
# ellipse for column i
lines(ellipse(x=covmat(Results,i,"column",1,2,FALSE), centre=Results@Colprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="blue")
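For readers without the ellipse package, the same kind of curve can be sketched in base R: the set of points at squared Mahalanobis distance t^2 from the centre is the unit circle mapped through the Cholesky factor of the 2x2 covariance matrix. This assumes the matrix would come from covmat(Results, i, "row", 1, 2, FALSE) as above, and uses t = sqrt(qchisq(0.95, 2)), the default of ellipse::ellipse(); the critical values used by cabootcrs itself may differ. ellipse_points() is a hypothetical helper and S is a made-up matrix.

```r
# Base R sketch of {z : (z - centre)' S^{-1} (z - centre) = t^2}.
# ellipse_points() is hypothetical; S below is a made-up covariance matrix.
ellipse_points <- function(S, centre, t = sqrt(qchisq(0.95, 2)), npoints = 100) {
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(theta), sin(theta))   # points on the unit circle
  pts <- t * circle %*% chol(S)             # stretch by the Cholesky factor of S
  sweep(pts, 2, centre, "+")                # shift to the centre
}

S <- matrix(c(0.010, 0.003, 0.003, 0.005), 2, 2)
pts <- ellipse_points(S, centre = c(0.2, -0.1))
# every point is at squared Mahalanobis distance qchisq(0.95, 2), about 5.99
range(mahalanobis(pts, c(0.2, -0.1), S))
```

On an existing plot, lines(ellipse_points(S, centre)) would then draw the curve.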
Example: to add row CRs to a plot from the ca package for data set TheData:
library(cabootcrs)
library(ca)
library(ellipse)
Results <- cabootcrs(TheData, showresults=FALSE)
caResults <- ca(TheData)
plot(caResults)
# add the confidence ellipse for each row point
for (i in 1:dim(TheData)[1]) {
  lines(ellipse(x=covmat(Results,i,"row",1,2,FALSE), centre=Results@Rowprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="blue")
}
Note, however, that reflectaxes() may also be needed if the axes from cabootcrs() and ca() are reflected with respect to each other.
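Whether an axis is reflected can be checked by comparing the signs of the coordinates of the same points from the two packages. In this sketch reflected_axes() is a hypothetical helper, not part of cabootcrs or ca, and it assumes both matrices hold the same points in the same order; the axes it reports could then be passed to reflectaxes(), as in reflectaxes(bd3,1) in the examples.

```r
# Hypothetical helper: which axes of coordB point the opposite way to coordA?
# Assumes rows are the same points, in the same order, in both matrices.
reflected_axes <- function(coordA, coordB) {
  which(colSums(coordA * coordB) < 0)   # negative inner product => reflected
}

A <- cbind(c(0.5, -0.2, 0.1), c(0.3, 0.1, -0.4))
B <- cbind(-A[, 1], A[, 2])   # axis 1 flipped, axis 2 unchanged
reflected_axes(A, B)          # 1
```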
(7) Note
Note that plotca, summaryca and printca are all defined as new functions, rather than as overloaded versions of plot, summary and print, simply in order to avoid complication and unintended consequences within R.
One or more plots are produced but no output object is created
cabootcrs-package, cabootcrs, printca, summaryca, cabootcrsresults
# the main function also calls plotca with the default options
bd <- cabootcrs(DreamDataNames, datasetname="Maxwell's dream data",
  varnames=c("Age groups","Severity of disturbance"),showresults=FALSE)
plotca(bd)
## Not run:
### Plot options for SCA:
# Note that RStudio changes plots depending on the size of your plot window,
# so the picsize parameter (used for xlim, ylim in the plot command) is partially
# overridden, so warnings that a point is outside the plot limits may not be correct
# Plot with specified size to fit the whole of the arrows in without cropping
plotca(bd, picsize=c(-2.5,2.5))
# or smaller, note the warning
plotca(bd, picsize=c(-0.5,0.5))
# Replacing the plot titles with your own
plotca(bd, mytitles=c("Plot 1 Title line 1\nline 2\nline 3","Plot 2 Title line 1\nline 2\nline 3" ))
# All points in colour
plotca(bd,othersmonochrome=NULL)
# 90% regions in reds and blue
plotca(bd, plotsymbolscolours=c(3,"differentreds","*","blue"), crpercent=90)
# Many different colour schemes and ways of specifying colours and symbols
# Specify colour/colour scheme and symbols
plotca(bd, plotsymbolscolours=c(3,"differentreds","*","blue") )
plotca(bd, plotsymbolscolours=c(3,"viridis") )
plotca(bd, plotsymbolscolours="inferno" )
plotca(bd, plotsymbolscolours=colours()[641] )
# Just give a list of colours, one for each category point
plotca(bd, plotsymbolscolours=c("green","blue","yellow","red","orange","red","blue","tan1","orchid4") )
plotca(bd, plotsymbolscolours=colours()[161:170] )
plotca(bd, plotsymbolscolours=colours()[c(111:115,561:564)] )
# This time using colo(u)rspace package colour palettes
library(colorspace)
plotca(bd, plotsymbolscolours=hcl.colors(9,palette="Peach") )
plotca(bd, plotsymbolscolours=hcl.colors(50,palette="Mint")[c(11:15,31:34)] )
plotca(bd, plotsymbolscolours=c(sequential_hcl(bd@rows,"Blues 3"),sequential_hcl(bd@columns,"Reds 3")) )
# suppress labels for column points, to de-clutter row points picture,
# this is mostly useful for larger data sets than this one
plotca(bd, showcolumnlabels=FALSE)
# only show ellipses for rows 1, 1-2 and 1-3 respectively
plotca(bd, showrowcrs=1)
plotca(bd, showrowcrs=c(1,2))
plotca(bd, showrowcrs=1:3)
# plot axes 1 v 2, 1 v 3 and 2 v 3
plotca(bd, firstaxis=1, lastaxis=3, plotallpairs="all")
# If the cell values were all 10 times larger
bdx10 <- cabootcrs(10*DreamData)
plotca(bdx10,plottype = "french",picsize=c(-0.4,0.4))
# Various plots for a larger data set, note that the default colour scheme picks out
# males, females and ages because of the ordering of the rows
bs <- cabootcrs(SuicideData)
plotca(bs, picsize=c(-0.7,0.8))
plotca(bs, plottype="french", picsize=c(-0.7,0.8))
plotca(bs, plottype="french", picsize=c(-0.7,0.8), plotsymbolscolours=c(".","inferno","+","black"))
# Note that the ellipses follow the horseshoe
bas <- cabootcrs(AsbestosData)
# more complicated plotting, define group structure in data frames
groupingsframe <- cbind( c(1:5,1:4), c(1,1,2,2,3,1,1,2,2) )
grouplabframe <- cbind( c(1,2,3,1,2), c("AB","CD","E","ab","cd"), c(19,20,21,"+","*"), c("green","blue","yellow","red","orange"), "T" )
plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe)
plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe, plottype="french")
# This can also be used for custom colour schemes other than "differentreds" etc as
# defined in the plotsymbolscolours option, though note that R colours are not ordered in the
# way you might expect, so the colour scheme below is purely illustrative and not very sensible
customframe <- cbind( c(1:5,1:4), c(1:5,1:4) )
customlabframe <- cbind( c(1:5,1:4), rep("",9), c(rep(18,5),rep(19,4)), colours()[c(seq(10,130,30),seq(440,590,50))], "T" )
plotca(bd, groupings=customframe, grouplabels=customlabframe)
### Plot options for MCA:
# Use one of the below, labelling row A as R:A or just A (etc) as preferred
bd3 <- cabootcrs(DreamData223by3, catype="mca", datasetname="Dream data with extra random column")
bd3 <- cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE, datasetname="Dream data with extra random column")
# just variable 2
plotca(bd3,mcaoneploteach=2)
# just variables 1 and 3
plotca(bd3,mcaoneploteach=c(1,3))
# one plot showing CRs for all variable categories (busy)
plotca(bd3,mcaoneploteach=FALSE)
# each variable has its own colour
plotca(bd3,mcavariablecolours=TRUE)
# each category number has its own colour
plotca(bd3,mcacategorycolours=TRUE)
# draw arrows between successive ordered categories
plotca(bd3,likertarrows=TRUE)
# secondary points black rather than grey
plotca(bd3,othersmonochrome="black")
# 99% CRs
plotca(bd3,crpercent=99)
# Plot together CRs for the first category of each variable
plotca(bd3,showcolumncrs=c(1,6,10),mcaoneploteach=FALSE)
# Plot together CRs for the second category of each variable
plotca(bd3,showcolumncrs=c(2,7,11),mcaoneploteach=FALSE)
# One plot with CRs only for variable 3
plotca(bd3,showcolumncrs=10:12,mcaoneploteach=FALSE)
# Three plots, various colour schemes
plotca(bd3,othersmonochrome="black")
plotca(bd3,othersmonochrome="black",mcacategorycolours=TRUE)
plotca(bd3,mcavariablecolours=TRUE,likertarrows=TRUE)
# All on one plot, various colour schemes, very busy
plotca(bd3,mcaoneploteach=FALSE,showcolumncrs=1:5,othersmonochrome="black")
plotca(bd3,mcaoneploteach=FALSE,showcolumncrs=1:5,likertarrows=TRUE)
plotca(bd3,mcaoneploteach=FALSE,likertarrows=TRUE,mcacategorycolours=TRUE)
plotca(bd3,mcaoneploteach=FALSE,likertarrows=TRUE,mcavariablecolours=TRUE)
# Plots with more complicated colour and grouping structure, as above but now in MCA case.
# Note the need to duplicate both data frames as groupings must be specified for both rows
# and columns, though only columns are used.
# Note also that symbol types need to be defined either all as numbers or all as symbols
groupingsframe <- cbind(1:12,c(1,1,2,2,3,4,4,5,5,6,7,7))
groupingsframe <- rbind(groupingsframe,groupingsframe)
grouplabframe <- cbind( 1:7, c("AB","CD","E","ab","cd","v1","v23"), 19:25, c("cyan","deepskyblue","blue","red","tomato","chartreuse","green"), "T" )
grouplabframe <- rbind(grouplabframe,grouplabframe)
plotca(bd3, groupings=groupingsframe, grouplabels=grouplabframe, mcaoneploteach=FALSE)
### Adding confidence ellipses to plots from ca package ca() and mjca() functions
## Simple CA
# Adding confidence ellipses for row points to plots from ca() using ellipse()
# Note: reflectaxes() is needed if cabootcrs() and ca() axes are reflected wrt each other
library(ca)
library(ellipse)
cad <- ca(DreamData)
plot(cad)
for (i in 1:dim(DreamData)[1]) {
  lines( ellipse(x=covmat(bd,i,"row",1,2,FALSE), centre=bd@Rowprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="blue")
}
# These plots can also be produced almost identically here
bd <- cabootcrs(DreamData, showresults=FALSE)
# both plots almost the same as the default plot from ca()
plotca(bd, plottype="french", showrowcrs=FALSE, showcolumncrs=FALSE, othersmonochrome=NULL, plotsymbolscolours=c(19,"blue",17,"red"), picsize=c(-0.5,0.6) )
# plot almost the same as the ca() plot, but with ellipses added
plotca(bd, plottype="french", othersmonochrome=NULL, plotsymbolscolours=c(19,"blue",17,"red"), picsize=c(-0.5,0.6))
## Multiple CA
# Adding confidence ellipses for category points to plots from mjca() using ellipse()
# Note that ca also uses standardised inertias and coordinates by default
library(ca)
library(ellipse)
cad3 <- mjca(DreamData223by3)
# Obtain covariance matrices, using same scalings and standardisations
bd3 <- cabootcrs(DreamData223by3, catype="mca", showresults=FALSE)
# Reflect axis 1 for consistency (may differ on other machines)
bd3 <- reflectaxes(bd3,1)
# Plot and then add ellipses for categories of variable 1 only
plot(cad3)
for (i in 1:bd3@Jk[1]) {
  lines( ellipse(x=covmat(bd3,i,"column",1,2,FALSE), centre=bd3@Colprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="red")
}
# These plots can also be produced almost identically here
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, mcavariablecolours=TRUE, showcolumncrs=FALSE)
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, plotsymbolscolours=c(17,"red"), showcolumncrs=FALSE)
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, mcacategorycolours=TRUE, showcolumncrs=1:5 )
# Three separate plots with ellipses for one variable on each
for (j in 1:length(bd3@Jk)) {
  plot(cad3)
  if (j==1) { firstcol <- 1 } else { firstcol <- cumsum(bd3@Jk)[j-1]+1 }
  for (i in firstcol:cumsum(bd3@Jk)[j]) {
    lines( ellipse(x=covmat(bd3,i,"column",1,2,FALSE), centre=bd3@Colprinccoord[i,cbind(1,2)], npoints=1000), cex=1, pch=".", col="red")
  }
}
# For comparison, default plot with one plot each showing the ellipses for each variable
plotca(bd3,picsize=c(-0.35,0.35), mcacategorycolours=TRUE )
## End(Not run)
# the main function also calls plotca with the default options bd <- cabootcrs(DreamDataNames, datasetname="Maxwell's dream data", varnames=c("Age groups","Severity of disturbance"),showresults=FALSE) plotca(bd) ## Not run: ### Plot options for SCA: # Note that Rstudio changes plots depending on the size of your plot window, # so the picsize parameter (used for xlim, ylim in the plot command) is partially # overridden, so warnings that a point is outside the plot limits may not be correct # Plot with specified size to fit the whole of the arrows in without cropping plotca(bd, picsize=c(-2.5,2.5)) # or smaller, note the warning plotca(bd, picsize=c(-0.5,0.5)) # Replacing the plot titles with your own plotca(bd, mytitles=c("Plot 1 Title line 1\nline 2\nline 3","Plot 2 Title line 1\nline 2\nline 3" )) # All points in colour plotca(bd,othersmonochrome=NULL) # 90% regions in reds and blue plotca(bd, plotsymbolscolours=c(3,"differentreds","*","blue"), crpercent=90) # Many different colour schemes and ways of specifying colours and symbols # Specify colour/colour scheme and symbols plotca(bd, plotsymbolscolours=c(3,"differentreds","*","blue") ) plotca(bd, plotsymbolscolours=c(3,"viridis") ) plotca(bd, plotsymbolscolours="inferno" ) plotca(bd, plotsymbolscolours=colours()[641] ) # Just give a list of colours, one for each category point plotca(bd, plotsymbolscolours=c("green","blue","yellow","red","orange","red","blue","tan1","orchid4") ) plotca(bd, plotsymbolscolours=colours()[161:170] ) plotca(bd, plotsymbolscolours=colours()[c(111:115,561:564)] ) # This time using colo(u)rspace package colour palettes library(colorspace) plotca(bd, plotsymbolscolours=hcl.colors(9,palette="Peach") ) plotca(bd, plotsymbolscolours=hcl.colors(50,palette="Mint")[c(11:15,31:34)] ) plotca(bd, plotsymbolscolours=c(sequential_hcl(bd@rows,"Blues 3"),sequential_hcl(bd@columns,"Reds 3")) ) # suppress labels for column points, to de-clutter row points picture, # this is mostly useful for larger data 
sets than this one plotca(bd, showcolumnlabels=FALSE) # only show ellipses for rows 1, 1-2 and 1-3 respectively plotca(bd, showrowcrs=1) plotca(bd, showrowcrs=c(1,2)) plotca(bd, showrowcrs=1:3) # plot axes 1 v 2, 1 v 3 and 2 v 3 plotca(bd, firstaxis=1, lastaxis=3, plotallpairs="all") # If the cell values were all 10 times larger bdx10 <- cabootcrs(10*DreamData) plotca(bdx10,plottype = "french",picsize=c(-0.4,0.4)) # Various plots for a larger data set, note that the default colour scheme picks out # males, females and ages because of the ordering of the rows bs <- cabootcrs(SuicideData) plotca(bs, picsize=c(-0.7,0.8)) plotca(bs, plottype="french", picsize=c(-0.7,0.8)) plotca(bs, plottype="french", picsize=c(-0.7,0.8), plotsymbolscolours=c(".","inferno","+","black")) # Note that the ellipses follow the horseshoe bas <- cabootcrs(AsbestosData) # more complicated plotting, define group structure in data frames groupingsframe <- cbind( c(1:5,1:4), c(1,1,2,2,3,1,1,2,2) ) grouplabframe <- cbind( c(1,2,3,1,2), c("AB","CD","E","ab","cd"), c(19,20,21,"+","*"), c("green","blue","yellow","red","orange"), "T" ) plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe) plotca(bd, groupings=groupingsframe, grouplabels=grouplabframe, plottype="french") # This can also be used for custom colour schemes other than "differentreds" etc as # defined in the plotsymbolscolours option, though note that R colours are not ordered in the # way you might expect, so the colour scheme below is purely illustrative and not very sensible customframe <- cbind( c(1:5,1:4), c(1:5,1:4) ) customlabframe <- cbind( c(1:5,1:4), rep("",9), c(rep(18,5),rep(19,4)), colours()[c(seq(10,130,30),seq(440,590,50))], "T" ) plotca(bd, groupings=customframe, grouplabels=customlabframe) ### Plot options for MCA: # Use one of the below, labelling row A as R:A or just A (etc) as preferred bd3 <- cabootcrs(DreamData223by3, catype="mca", datasetname="Dream data with extra random column") bd3 <- 
cabootcrs(DreamData223by3, catype="mca", varandcat=FALSE, datasetname="Dream data with extra random column") # just variable 2 plotca(bd3,mcaoneploteach=2) # just variables 1 and 3 plotca(bd3,mcaoneploteach=c(1,3)) # one plot showing CRs for all variable categories (busy) plotca(bd3,mcaoneploteach=FALSE) # each variable has its own colour plotca(bd3,mcavariablecolours=TRUE) # each category number has its own colour plotca(bd3,mcacategorycolours=TRUE) # draw arrows between successive ordered categories plotca(bd3,likertarrows=TRUE) # secondary points black rather than grey plotca(bd3,othersmonochrome="black") # 99% CRs plotca(bd3,crpercent=99) # Plot together CRs for the first category of each variable plotca(bd3,showcolumncrs=c(1,6,10),mcaoneploteach=FALSE) # Plot together CRs for the second category of each variable plotca(bd3,showcolumncrs=c(2,7,11),mcaoneploteach=FALSE) # One plot with CRs only for variable 3 plotca(bd3,showcolumncrs=10:12,mcaoneploteach=FALSE) # Three plots, various colour schemes plotca(bd3,othersmonochrome="black") plotca(bd3,othersmonochrome="black",mcacategorycolours=TRUE) plotca(bd3,mcavariablecolours=TRUE,likertarrows=TRUE) # All on one plot, various colour schemes, very busy plotca(bd3,mcaoneploteach=FALSE,showcolumncrs=1:5,othersmonochrome="black") plotca(bd3,mcaoneploteach=FALSE,showcolumncrs=1:5,likertarrows=TRUE) plotca(bd3,mcaoneploteach=FALSE,likertarrows=TRUE,mcacategorycolours=TRUE) plotca(bd3,mcaoneploteach=FALSE,likertarrows=TRUE,mcavariablecolours=TRUE) # Plots with more complicated colour and grouping structure, as above but now in MCA case. # Note the need to duplicate both data frames as groupings must be specified for both rows # and columns, though only columns are used. 
# Note also that symbol types need to be defined either all as numbers or all as symbols
groupingsframe <- cbind(1:12,c(1,1,2,2,3,4,4,5,5,6,7,7))
groupingsframe <- rbind(groupingsframe,groupingsframe)
grouplabframe <- cbind( 1:7, c("AB","CD","E","ab","cd","v1","v23"), 19:25,
                        c("cyan","deepskyblue","blue","red","tomato","chartreuse","green"), "T" )
grouplabframe <- rbind(grouplabframe,grouplabframe)
plotca(bd3, groupings=groupingsframe, grouplabels=grouplabframe, mcaoneploteach=FALSE)
### Adding confidence ellipses to plots from ca package ca() and mjca() functions
## Simple CA
# Adding confidence ellipses for row points to plots from ca() using ellipse()
# Note: reflectaxes() is needed if cabootcrs() and ca() axes are reflected wrt each other
library(ca)
library(ellipse)
cad <- ca(DreamData)
plot(cad)
for (i in 1:dim(DreamData)[1]) {
  lines( ellipse(x=covmat(bd,i,"row",1,2,FALSE), centre=bd@Rowprinccoord[i,cbind(1,2)], npoints=1000),
         cex=1, pch=".", col="blue")
}
# These plots can also be produced almost identically here
bd <- cabootcrs(DreamData, showresults=FALSE)
# both plots almost the same as the default plot from ca()
plotca(bd, plottype="french", showrowcrs=FALSE, showcolumncrs=FALSE, othersmonochrome=NULL,
       plotsymbolscolours=c(19,"blue",17,"red"), picsize=c(-0.5,0.6) )
# plot almost the same as the ca() plot, but with ellipses added
plotca(bd, plottype="french", othersmonochrome=NULL,
       plotsymbolscolours=c(19,"blue",17,"red"), picsize=c(-0.5,0.6))
## Multiple CA
# Adding confidence ellipses for category points to plots from mjca() using ellipse()
# Note that ca also uses standardised inertias and coordinates by default
library(ca)
library(ellipse)
cad3 <- mjca(DreamData223by3)
# Obtain covariance matrices, using same scalings and standardisations
bd3 <- cabootcrs(DreamData223by3, catype="mca", showresults=FALSE)
# Reflect axis 1 for consistency (may differ on other machines)
bd3 <- reflectaxes(bd3,1)
# Plot and then add ellipses for categories of variable 1 only
plot(cad3)
for (i in 1:bd3@Jk[1]) {
  lines( ellipse(x=covmat(bd3,i,"column",1,2,FALSE), centre=bd3@Colprinccoord[i,cbind(1,2)], npoints=1000),
         cex=1, pch=".", col="red")
}
# These plots can also be produced almost identically here
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, mcavariablecolours=TRUE, showcolumncrs=FALSE)
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, plotsymbolscolours=c(17,"red"), showcolumncrs=FALSE)
plotca(bd3,picsize=c(-0.35,0.35), mcaoneploteach=FALSE, mcacategorycolours=TRUE, showcolumncrs=1:5 )
# Three separate plots with ellipses for one variable on each
for (j in 1:length(bd3@Jk)) {
  plot(cad3)
  if (j==1) { firstcol <- 1 } else { firstcol <- cumsum(bd3@Jk)[j-1]+1 }
  for (i in firstcol:cumsum(bd3@Jk)[j]) {
    lines( ellipse(x=covmat(bd3,i,"column",1,2,FALSE), centre=bd3@Colprinccoord[i,cbind(1,2)], npoints=1000),
           cex=1, pch=".", col="red")
  }
}
# For comparison, default plot with one plot each showing the ellipses for each variable
plotca(bd3,picsize=c(-0.35,0.35), mcacategorycolours=TRUE )
## End(Not run)
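The ellipses above are drawn by ellipse() from the ellipse package, which by default treats a 2 x 2 covariance matrix as that of a bivariate normal distribution and traces the contour at level 0.95, i.e. at radius sqrt(qchisq(0.95, 2)). The base-R sketch below traces the same kind of boundary from an eigendecomposition of the covariance matrix. It is only an illustration, not the ellipse package's own code: confidence_ellipse() is a hypothetical helper and the covariance matrix is made up.

```r
# Hypothetical helper: boundary of the region
#   (x - centre)' V^{-1} (x - centre) = qchisq(level, 2)
# for a 2 x 2 covariance matrix V, returned as an npoints x 2 matrix
confidence_ellipse <- function(V, centre = c(0, 0), level = 0.95, npoints = 100) {
  radius <- sqrt(qchisq(level, df = 2))
  e <- eigen(V, symmetric = TRUE)
  # A %*% t(A) = V, so A maps the unit circle onto the one-standard-deviation ellipse
  A <- e$vectors %*% diag(sqrt(e$values), nrow = 2)
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- rbind(cos(theta), sin(theta))
  pts <- t(radius * A %*% circle)
  sweep(pts, 2, centre, "+")           # shift to the point's coordinates
}

# Illustrative covariance matrix and centre (not taken from any data set)
V <- matrix(c(0.004, 0.001, 0.001, 0.002), 2, 2)
pts <- confidence_ellipse(V, centre = c(0.3, -0.1), level = 0.95)
# lines(pts, col = "blue") would add it to an existing plot
```

Setting level = 0.99 gives the larger 99% region under the same normal approximation.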
printca
prints full correspondence analysis results, including inertias, coordinates,
representations, contributions, variances, covariances and critical values
printca(x, datasetname = NULL)
x |
An object of class |
datasetname |
The name of the data set (as a character string), used in the output; defaults to the name stored in the cabootcrs object |
Printed results, no plots or objects produced
cabootcrs-package, cabootcrs, summaryca, plotca, cabootcrsresults
results <- cabootcrs(DreamData, showresults=FALSE)
printca(results, datasetname="Dreams")
rearrange
compares one set of axes for row points and column points (from the bootstrap
data matrix) to another (from the sample data matrix) by looking at all possible
reorderings and reflections (only) of the bootstrap axes and picking the one which
best matches the sample axes.
rearrange( RS, RB, CS, CB, r, reflectonly = FALSE, catype = "sca", mcatype = "Burt", mcaindividualboot = FALSE, maxrearrange = 6 )
RS |
Sample axes for row points (as columns) |
RB |
Bootstrap axes for row points (as columns) |
CS |
Sample axes for column points (as columns) |
CB |
Bootstrap axes for column points (as columns) |
r |
Rank of the bootstrap matrix |
reflectonly |
TRUE to reflect the axes only, no reordering |
catype |
Can be "sca" for simple or "mca" for multiple CA. |
mcatype |
"Burt" if using the Burt matrix, whose rows and columns are the same, so that only the column axes are used |
mcaindividualboot |
TRUE to use the highly experimental method |
maxrearrange |
Maximum number of axes to rearrange |
This is only intended for internal use by the cabootcrs function.
Finds the rearrangement of columns of RB and CB to maximise match = tr( abs(RS'*RB + CS'*CB) )
Uses the Hungarian algorithm via lp.assign in lpSolve up to a maximum of maxrearrange vectors.
The algorithm assigns columns (B) to rows (S), hence the matrix is transposed so that postmultiplication moves B to coincide with S.
In effect this is a Procrustes rotation of the bootstrap axes to best match the sample axes, except that there is no rotation, only reflection and reordering of axes (i.e. rearranging).
Note that this seeks the best fit to all axes, not just to the ones whose variances are being calculated, and does not weight the reordering by eigenvalues or restrict how far a vector can be moved. Hence a fairly low maxrearrange may be preferable.
With maxrearrange=6 this is faster than a full comparison of all orderings when rank >= 4, but it can take much longer if all axes are rearranged.
Rearranging more axes gives a higher chance of finding a matching axis, so the standard deviations can decrease by 1-2% on average if all axes are rearranged.
Limited testing suggests that, because eigenvalues are ignored, rearranging all axes tends to over-reorder and hence underestimate variances, so it seems best to rearrange 6 axes as before, unless there are very many close eigenvalues.
When an MCA bootstrap replicate has fewer "real" singular values (i.e. > 1/p) than the sample matrix, only the first B@realr axes are compared, so the last sample axis gets nothing from this replicate and the "real" bootstrap axes are matched only with the same first few sample ones.
r is the rank of the bootstrap matrix, so if it is less than the sample rank the last sample axis is ignored.
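The matching step can be sketched with lp.assign() from lpSolve, which solves the linear assignment problem. The sketch below is an illustration under stated assumptions rather than the package's own rearrange() code. It uses a hypothetical helper, match_axes(), and formulates the assignment with bootstrap axes as rows and sample axes as columns. It rearranges all supplied axes (so there is no maxrearrange limit), and it returns T such that xS = xB %*% T.

```r
library(lpSolve)

# Hypothetical helper (not the package's rearrange()): reorder and reflect the
# bootstrap axes RB/CB to best match the sample axes RS/CS, maximising the sum of
# |M[i, j]| over matched pairs, where M = RS' RB + CS' CB
match_axes <- function(RS, RB, CS, CB) {
  M <- t(RS) %*% RB + t(CS) %*% CB          # sample axes x bootstrap axes
  # Linear assignment: rows = bootstrap axes, columns = sample axes
  a <- lp.assign(abs(t(M)), direction = "max")
  P <- round(a$solution)                    # 0/1 permutation matrix
  # Carry the sign of each matched entry, so xB %*% Tmat also reflects axes
  Tmat <- P * sign(t(M))
  list(T = Tmat, match = a$objval, same = all(diag(P) == 1))
}

# Usage: bootstrap axes that are a reordered and partly reflected copy of the sample axes
set.seed(1)
RS <- qr.Q(qr(matrix(rnorm(20), 5, 4)))
CS <- qr.Q(qr(matrix(rnorm(16), 4, 4)))
perm <- c(2, 1, 4, 3); flips <- c(1, -1, 1, -1)
RB <- RS[, perm] %*% diag(flips)
CB <- CS[, perm] %*% diag(flips)
res <- match_axes(RS, RB, CS, CB)
```

Because the criterion uses absolute values, the reflections do not affect which ordering is chosen. Each matched axis's sign is then read off M afterwards.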
list containing:
T = matrix to rearrange xB so it is equivalent to xS, i.e. xS <- xB * T
numre = number of axes checked for rearranging = min(r,maxrearrange)
match = assign$objval from the Hungarian algorithm
same = flag for whether there was no reordering of axes (but may have been reflection)
# Not intended for direct call by users
rearrange_old
compares one set of axes for row points and column points (from the bootstrap
data matrix) to another (from the sample data matrix) by looking at all possible
reorderings and reflections (only) of the bootstrap axes and picking the one which
best matches the sample axes.
rearrange_old(RS, RB, CS, CB, r)
RS |
Sample axes for row points (as columns) |
RB |
Bootstrap axes for row points (as columns) |
CS |
Sample axes for column points (as columns) |
CB |
Bootstrap axes for column points (as columns) |
r |
Rank of the bootstrap matrix |
This is only intended for internal use by the cabootcrs function, and only for simple CA if for some reason the lpSolve package is unavailable.
It has not been used with MCA, and so will almost certainly not work properly in that case.
Finds the rearrangement of columns of RB and CB to maximise match = tr( abs(RS'*RB + CS'*CB) )
Goes through all possible orderings and so is painfully slow.
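The exhaustive search can be sketched in base R as follows. perms() and brute_force_match() are hypothetical helpers, not the package's code. Because the criterion uses absolute values, the reflection for each axis can be read off from the sign of the matched entry once an ordering is fixed. Only the r! orderings therefore need to be enumerated, and r! grows very quickly with r.

```r
# Hypothetical helper: all permutations of 1..n, one per row (n! rows)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  out <- lapply(seq_len(n), function(k) cbind(k, p + (p >= k)))
  unname(do.call(rbind, out))
}

# Hypothetical helper: exhaustive version of the axis matching
brute_force_match <- function(RS, RB, CS, CB) {
  M <- t(RS) %*% RB + t(CS) %*% CB           # sample axes x bootstrap axes
  r <- ncol(M)
  P <- perms(r)
  # For ordering p, sample axis i is matched with bootstrap axis p[i]
  scores <- apply(P, 1, function(p) sum(abs(M[cbind(seq_len(r), p)])))
  best <- P[which.max(scores), ]
  Tmat <- matrix(0, r, r)
  Tmat[cbind(best, seq_len(r))] <- sign(M[cbind(seq_len(r), best)])
  list(T = Tmat, match = max(scores), same = all(best == seq_len(r)))
}

# Usage: bootstrap axes that are a reordered and partly reflected copy of the sample axes
set.seed(2)
RS <- qr.Q(qr(matrix(rnorm(20), 5, 4)))
CS <- qr.Q(qr(matrix(rnorm(16), 4, 4)))
perm <- c(3, 1, 4, 2); flips <- c(-1, 1, 1, -1)
RB <- RS[, perm] %*% diag(flips)
CB <- CS[, perm] %*% diag(flips)
res <- brute_force_match(RS, RB, CS, CB)
```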
list containing:
T = matrix to rearrange xB so it is equivalent to xS, i.e. xS <- xB * T
numre = number of axes checked for rearranging = min(r,maxrearrange)
match = assign$objval from the Hungarian algorithm
same = flag for whether there was no reordering of axes (but may have been reflection)
cabootcrs-package, cabootcrs, rearrange
# Not intended for direct call by users
reflectaxes
reflects the principal and standard coordinates
of the axes chosen, and the appropriate covariances where needed
reflectaxes(x, axes = c(1, 2))
x |
An object of class |
axes |
A list or vector containing the numbers of the axes to be reflected |
This may be useful when comparing results between different data sets, or from different packages
An object of class cabootcrsresults
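Reflecting axis k multiplies the k-th coordinate column by -1. Variances are unchanged, and a covariance between two axes changes sign only when exactly one of them is reflected. The sketch below shows that bookkeeping for one coordinate matrix and one axis-by-axis covariance matrix. reflect_coords() is a hypothetical helper, not part of cabootcrs. reflectaxes() itself handles both principal and standard coordinates and the stored covariances.

```r
# Hypothetical helper: reflect the chosen axes of a coordinate matrix
# (points as rows, axes as columns) and of a covariance matrix between axes
reflect_coords <- function(coords, V, axes) {
  s <- rep(1, ncol(coords))
  s[axes] <- -1
  D <- diag(s, nrow = length(s))
  list(coords = coords %*% D,       # flip the sign of the reflected columns
       V = D %*% V %*% D)           # covariances pick up the product of signs
}

coords <- matrix(c(0.5, -0.2, 0.1, 0.3), 2, 2)
V <- matrix(c(0.010, 0.003, 0.003, 0.020), 2, 2)
out <- reflect_coords(coords, V, axes = 1)
```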
cabootcrs-package, cabootcrs, reordercategories, cabootcrsresults
results <- cabootcrs(DreamData)
resultsreflectfirstaxis <- reflectaxes(results, 1)
summaryca(resultsreflectfirstaxis)
plotca(resultsreflectfirstaxis)
## Not run:
# Often needed when comparing results between different packages,
# or same package on different machines,
# or to allow ellipses from this package to be added to plots from other packages
library(ca)
cad3 <- mjca(DreamData223by3)
bd3 <- cabootcrs(DreamData223by3, catype="mca")
summary(cad3)
bd3reflect1 <- reflectaxes(bd3,1)
summaryca(bd3reflect1)
## End(Not run)
reordercategories
reorders the principal and standard coordinates, CTR, REP,
variances and covariances of the categories for a single MCA variable
reordercategories(x, varno, newcats)
x |
An object of class |
varno |
The number of the variable to be reordered |
newcats |
A vector of length equal to the number of categories for this variable, giving the new order for the categories (e.g. c(4,1,2,3) means that the original 4th category is moved to first) |
This may be useful when comparing results between different data sets or from different packages
Note: does not reorder anything in the cabasicresults part of the cabootcrsresults object.
An object of class cabootcrsresults
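In MCA the categories of each variable occupy a contiguous block of rows, starting after the cumulative number of categories of the earlier variables. That is the same cumsum() indexing used in the plotting examples with bd3@Jk. The base-R sketch below shows only that index arithmetic for one coordinate matrix. reorder_rows_for_variable() is a hypothetical helper, and Jk = c(5, 4, 3) is an illustrative assumption. reordercategories() itself also reorders CTR, REP, variances and covariances.

```r
# Hypothetical helper: reorder the rows belonging to variable varno,
# where Jk gives the number of categories of each variable
reorder_rows_for_variable <- function(coords, Jk, varno, newcats) {
  stopifnot(length(newcats) == Jk[varno], setequal(newcats, seq_len(Jk[varno])))
  first <- if (varno == 1) 1 else cumsum(Jk)[varno - 1] + 1
  block <- first:cumsum(Jk)[varno]          # rows of this variable's categories
  idx <- seq_len(nrow(coords))
  idx[block] <- block[newcats]              # e.g. newcats[1] becomes the first category
  coords[idx, , drop = FALSE]
}

# Illustrative: 3 variables with 5, 4 and 3 categories; reorder variable 2
coords <- matrix(1:24, 12, 2)
out <- reorder_rows_for_variable(coords, Jk = c(5, 4, 3), varno = 2, newcats = c(3, 2, 4, 1))
```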
cabootcrs-package, cabootcrs, reflectaxes, cabootcrsresults
bd3 <- cabootcrs(DreamData223by3, catype="mca", nboots=0, showresults=FALSE)
bd3reorderedvar2 <- reordercategories(bd3, 2, c(3,2,4,1))
summaryca(bd3)
summaryca(bd3reorderedvar2)
## Not run:
# Can be used when comparing results in different packages,
# or when adding ellipses from this package to output from others
library(FactoMineR)
library(ca)
data(tea)
# remove duplicated age variable
teamod <- tea[,c(1:18,20:36)]
# ca package uses standardised coordinates and inertias by default
catea <- mjca(teamod)
btea <- cabootcrs(teamod, catype="mca", showresults=FALSE, nboots=0, varandcat=FALSE)
# FactoMineR package uses unstandardised coordinates and inertias by default
fmtea <- MCA(teamod, method="Burt", graph=FALSE)
bteaunstd <- cabootcrs(teamod, catype="mca", showresults=FALSE, nboots=0,
                       mcaadjustinertias = FALSE, mcaadjustcoords = FALSE, varandcat=FALSE)
summary(fmtea)
summaryca(bteaunstd)
summary(catea)
summaryca(btea)
# slight difference due to different orderings of categories for these two
fmtea$var$coord / bteaunstd@Colprinccoord[,1:5]
catea$colpcoord[,1:5] / btea@Colprinccoord[,1:5]
fmtea$var$coord / catea$colpcoord[,1:5]
# Variables 22 and 23, in columns 57-65, are the problem
# The coordinates agree (apart from reflection) but the categories are in a different order
fmtea$var$coord[57:65,1:3]
bteaunstd@Colprinccoord[57:65,1:3]
catea$colpcoord[57:65,1:3]
btea@Colprinccoord[57:65,1:3]
# Coordinates agree when categories reordered and axes reflected
bteaunstdreord <- reordercategories(bteaunstd,22,c(2:5,1))
bteaunstdreord <- reordercategories(bteaunstdreord,23,c(3,2,1,4))
bteaunstdreordreflect <- reflectaxes(bteaunstdreord,c(1,4))
fmtea$var$coord / bteaunstdreordreflect@Colprinccoord[,1:5]
bteareord <- reordercategories(btea,22,c(2:5,1))
bteareord <- reordercategories(bteareord,23,c(3,2,1,4))
bteareordreflect <- reflectaxes(bteareord,c(2,5))
catea$colpcoord[,1:5] / bteareordreflect@Colprinccoord[,1:5]
## End(Not run)
sca
returns all the basic results from a CA of a matrix with rows >= cols,
in an object of class cabasicresults
sca(X, catype = "sca", mcatype = NULL, p = 2, needtrans = FALSE)
X |
A data matrix with rows >= cols |
catype |
Can be "sca" for simple CA or "mca" for multiple CA |
mcatype |
If catype="mca" then this can be "Burt", "Indicator"
or "doubled" depending on the analysis required. |
p |
Number of variables, only needed if catype="mca" |
needtrans |
TRUE if rows < columns, so that the matrix needs to be transposed within the routine |
This is only intended for internal use by the cabootcrs function.
An object of class cabasicresults
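For reference, the basic results of a simple CA follow from the standard textbook computation: the SVD of the matrix of standardised residuals (p_ij - r_i c_j) / sqrt(r_i c_j), where r_i and c_j are the row and column masses. The base-R sketch below follows that formulation. It is not the package's sca() code. simple_ca() is a hypothetical name, and the sketch handles neither MCA nor the needtrans transposition.

```r
# Hypothetical helper: simple CA of a two-way table via the SVD
simple_ca <- function(X) {
  X <- as.matrix(X)
  P <- X / sum(X)                      # correspondence matrix
  rmass <- rowSums(P)                  # row masses r_i
  cmass <- colSums(P)                  # column masses c_j
  E <- outer(rmass, cmass)
  S <- (P - E) / sqrt(E)               # standardised residuals
  dec <- svd(S)
  k <- min(dim(X)) - 1                 # number of non-trivial axes
  d <- dec$d[seq_len(k)]
  U <- dec$u[, seq_len(k), drop = FALSE]
  W <- dec$v[, seq_len(k), drop = FALSE]
  list(
    inertias = d^2,                                # principal inertias
    rowprinc = sweep(U, 2, d, "*") / sqrt(rmass),  # row principal coordinates
    colprinc = sweep(W, 2, d, "*") / sqrt(cmass),  # column principal coordinates
    rowstd   = U / sqrt(rmass),                    # row standard coordinates
    colstd   = W / sqrt(cmass)                     # column standard coordinates
  )
}

X <- matrix(c(10, 20, 30, 5, 15, 25, 20, 10, 5, 8, 12, 6), 4, 3)
res <- simple_ca(X)
```

The total inertia equals the Pearson chi-squared statistic divided by the grand total. The mass-weighted sum of squared row principal coordinates on each axis equals that axis's principal inertia.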
cabootcrs-package, cabootcrs, cabasicresults
results <- sca(as.matrix(DreamData))
settingsinertias
prints the settings and the inertias
settingsinertias(x)
x |
An object of class |
printed output only
# Purely internal, not intended for use by users
Methods of suicide in Germany, 1974-1977
SuicideData
A contingency table with 34 rows and 9 columns
Gender and age of individual: Females aged 10-15 (F10) to males aged 90+ (M90)
Method: drugs/poison (Mat), gas at home (Gas.h), gas-others (Gas.o), hanging (Hang), drowning (Drown), gunshot (Gun), stabbing (Stab), jumping (Jump), Other
Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Lawrence Erlbaum Associates, New Jersey. (p12)
summaryca
prints correspondence analysis results for the first two dimensions,
giving inertias, coordinates,
representations, contributions and standard deviations
summaryca(x, datasetname = NULL, mcaprintindividuals = FALSE)
x |
An object of class |
datasetname |
The name of the data set (as a character string), used in the output; defaults to the name stored in the cabootcrs object |
mcaprintindividuals |
If TRUE, print the individual (row) point results in multiple correspondence analysis when using the indicator or doubled matrix |
Printed results, no plots or objects produced
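Contributions and representations are usually defined from principal coordinates as follows. For row i on axis k, CTR = r_i f_ik^2 / lambda_k, and REP = f_ik^2 divided by the sum of f_ik^2 over all axes (the squared cosine). The base-R sketch below assumes those standard definitions and makes no assumption about the exact scaling the package prints. ctr_rep() is a hypothetical helper and the table is arbitrary.

```r
# Hypothetical helper: contributions (CTR) and representations (REPR) of the
# row points on the first naxes axes, using the standard CA definitions
ctr_rep <- function(X, naxes = 2) {
  X <- as.matrix(X)
  P <- X / sum(X)
  rmass <- rowSums(P)
  E <- outer(rmass, colSums(P))
  dec <- svd((P - E) / sqrt(E))
  k <- min(dim(X)) - 1                               # non-trivial axes
  d <- dec$d[seq_len(k)]
  lambda <- d^2                                      # principal inertias
  Fp <- sweep(dec$u[, seq_len(k), drop = FALSE], 2, d, "*") / sqrt(rmass)
  ctr <- sweep(rmass * Fp^2, 2, lambda, "/")         # each column sums to 1 over rows
  repr <- Fp^2 / rowSums(Fp^2)                       # each row sums to 1 over all k axes
  keep <- seq_len(min(naxes, k))
  list(ctr = ctr[, keep, drop = FALSE], repr = repr[, keep, drop = FALSE])
}

X <- matrix(c(10, 20, 30, 5, 15, 25, 20, 10, 5, 8, 12, 6), 4, 3)
out <- ctr_rep(X)
```

When fewer axes are kept than exist, the REPR values for a point sum to less than 1. The shortfall is the part of that point's inertia left on the axes not shown.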
cabootcrs-package, cabootcrs, printca, plotca, cabootcrsresults
results <- cabootcrs(DreamData, showresults=FALSE)
summaryca(results, datasetname="Dreams")