Title: | Mapping, Pruning, and Graphing Tree Models |
---|---|
Description: | Functions with example data for graphing, pruning, and mapping models from hierarchical clustering, and classification and regression trees. |
Authors: | Denis White, Robert B. Gramacy <[email protected]> |
Maintainer: | Robert B. Gramacy <[email protected]> |
License: | Unlimited |
Version: | 1.4-8 |
Built: | 2024-11-14 06:25:23 UTC |
Source: | CRAN |
Reduces a hierarchical cluster tree to a smaller tree either by pruning until a given number of observation groups remain, or by pruning tree splits below a given height.
clip.clust (cluster, data=NULL, k=NULL, h=NULL)
cluster | object of class hclust or twins. |
data | clustered dataset for hclust application. |
k | desired number of groups. |
h | height at which to prune for grouping. |
At least one of k or h must be specified; k takes precedence if both are given.
Used with draw.clust. See example.
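The rule that k takes precedence over h can be illustrated with base R's cutree, which is listed among the related functions. This is only a sketch, not the package's implementation: prune_groups is a hypothetical helper, and it returns group memberships rather than a pruned hclust object.

```r
# Hypothetical helper (not part of the package): apply the k-before-h rule
# described above, returning cluster memberships via stats::cutree.
prune_groups <- function(hc, k = NULL, h = NULL) {
  if (is.null(k) && is.null(h)) {
    stop("at least one of k or h must be specified")
  }
  if (!is.null(k)) {
    cutree(hc, k = k)          # k takes precedence when both are given
  } else {
    cutree(hc, h = h)          # otherwise cut the tree at height h
  }
}

hc <- hclust(dist(USArrests))  # USArrests ships with base R (datasets)
g <- prune_groups(hc, k = 4, h = 1e6)
table(g)                       # four groups, although h alone would give one
```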
Pruned cluster object of class hclust.
Denis White
See also: hclust, twins.object, cutree, draw.clust
```r
library (cluster)
data (oregon.bird.dist)
draw.clust (clip.clust (agnes (oregon.bird.dist), k=6))
```
Reduces a prediction tree produced by rpart to a smaller tree by specifying either a cost-complexity parameter or a number of nodes to which to prune.
clip.rpart (tree, cp=NULL, best=NULL)
tree | object of class rpart. |
cp | cost-complexity parameter. |
best | number of nodes to which to prune. |
If neither cp nor best is NULL, cp is used.
This is a minor enhancement of the existing prune.rpart that incorporates the parameter best as it is used in the (now defunct) prune.tree function of the old tree package. See example.
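One way to translate best into a cost-complexity value is to read it off the cp table that rpart stores with a fitted tree. The sketch below assumes that best counts terminal nodes (as the best argument of prune.tree does) and uses a hand-made, hypothetical table with the CP and nsplit columns of an rpart cptable; cp_for_best is a hypothetical helper, not the code of clip.rpart.

```r
# Hypothetical cp table with the CP and nsplit columns that rpart's cptable has.
cptab <- cbind(CP     = c(0.30, 0.12, 0.05, 0.02, 0.01),
               nsplit = c(0,    1,    3,    5,    8))

# Hypothetical helper: return the CP value of the largest subtree in the
# sequence with at most `best` terminal nodes (a binary tree with s splits
# has s + 1 leaves).
cp_for_best <- function(cptab, best) {
  leaves <- cptab[, "nsplit"] + 1
  ok <- which(leaves <= best)
  if (length(ok) == 0) stop("no subtree has that few terminal nodes")
  unname(cptab[max(ok), "CP"])
}

cp_for_best(cptab, best = 7)  # 0.02: the 6-leaf subtree, since no 7-leaf one exists
```

As the example shows, a requested size that is not in the pruning sequence falls back to the next smaller subtree. The resulting value could then be passed to rpart's prune() as its cp argument.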
Pruned tree object of class rpart.
Denis White
```r
library (rpart)
data (oregon.env.vars, oregon.border, oregon.grid)
draw.tree (clip.rpart (rpart (oregon.env.vars), best=7), nodeinfo=TRUE,
  units="species", cases="cells", digits=0)
group <- group.tree (clip.rpart (rpart (oregon.env.vars), best=7))
names(group) <- row.names(oregon.env.vars)
map.groups (oregon.grid, group)
lines (oregon.border)
map.key (0.05, 0.65, labels=as.character(seq(6)), size=1, new=FALSE,
  sep=0.5, pch=19, head="node")
```
Graph a hierarchical cluster tree of class twins or hclust using colored symbols at observations.
draw.clust (cluster, data=NULL, cex=par("cex"), pch=par("pch"), size=2.5*cex, col=NULL, nodeinfo=FALSE, cases="obs", new=TRUE)
cluster | object of class hclust or twins. |
data | clustered dataset for hclust application. |
cex | size of text, par parameter. |
pch | shape of symbol at leaves, par parameter. |
size | size in cex units of symbol at leaves. |
col | vector of colors from |
nodeinfo | if |
cases | label for type of observations. |
new | if TRUE, start a new plot; otherwise draw on the current plot. |
An alternative to pltree and plot.hclust.
The vector of colors supplied or generated.
Denis White
See also: agnes, diana, hclust, draw.tree, map.groups
```r
library (cluster)
data (oregon.bird.dist)
draw.clust (clip.clust (agnes (oregon.bird.dist), k=6))
```
Graph a classification or regression tree with a hierarchical tree diagram, optionally including colored symbols at leaves and additional info at intermediate nodes.
draw.tree (tree, cex=par("cex"), pch=par("pch"), size=2.5*cex, col=NULL, nodeinfo=FALSE, units="", cases="obs", digits=getOption("digits"), print.levels=TRUE, new=TRUE)
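tree | object of class rpart. |
cex | size of text, par parameter. |
pch | shape of symbol at leaves, par parameter. |
size | if |
col | vector of colors from |
nodeinfo | if TRUE, show the deviance explained or the classification rate at each node. |
units | label for units of mean value of response, if regression tree. |
cases | label for type of observations. |
digits | number of digits to round mean value of response, if regression tree. |
print.levels | if TRUE, show the factor levels at splits on factor variables. |
new | if TRUE, start a new plot; otherwise draw on the current plot. |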
As in plot.rpart(tree, uniform=TRUE), each level has constant depth.
Specifying nodeinfo=TRUE shows the deviance explained or the classification rate at each node.
A split on a numerical variable is shown as variable <> value when the cases with lower values go left, or as variable >< value when the cases with lower values go right.
When the splitting variable is a factor and print.levels=TRUE, the split is shown as levels = factor = levels, with the cases on the left having factor levels equal to those on the left of the factor name, and correspondingly for the right.
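The split notation above can be mimicked with a pair of string helpers. Both functions are hypothetical and only reproduce the label layout described here; the variable soil is made up, and separating several factor levels with commas is an assumption, not necessarily what draw.tree prints.

```r
# Hypothetical helpers reproducing the split-label layout described above.
numeric_split_label <- function(var, value, lower_goes_left = TRUE) {
  # "<>" when lower values go left, "><" when they go right
  paste(var, if (lower_goes_left) "<>" else "><", value)
}

factor_split_label <- function(var, left_levels, right_levels) {
  # left levels, factor name, right levels (comma separator is assumed)
  paste(paste(left_levels, collapse = ","), "=", var, "=",
        paste(right_levels, collapse = ","))
}

numeric_split_label("jan.temp", 2.5)         # "jan.temp <> 2.5"
numeric_split_label("ann.ppt", 800, FALSE)   # "ann.ppt >< 800"
factor_split_label("soil", c("a", "b"), "c") # "a,b = soil = c"
```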
The vector of colors supplied or generated.
Denis White
```r
library (rpart)
data (oregon.env.vars)
draw.tree (clip.rpart (rpart (oregon.env.vars), best=7), nodeinfo=TRUE,
  units="species", cases="cells", digits=0)
```
Alternative to cutree that orders pruned groups from left to right in draw order.
group.clust (cluster, k=NULL, h=NULL)
cluster | object of class hclust or twins. |
k | desired number of groups. |
h | height at which to prune for grouping. |
At least one of k or h must be specified; k takes precedence if both are given.
Normally used with map.groups. See example.
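Ordering groups from left to right in draw order can be sketched with the leaf order that every hclust object stores in its order component: relabel the cutree groups in the order in which they are first met along that leaf sequence. ordered_cutree is a hypothetical helper and may differ in detail from group.clust.

```r
# Hypothetical helper: cutree memberships renumbered left to right
# in the dendrogram's drawing order (hc$order lists leaves left to right).
ordered_cutree <- function(hc, k = NULL, h = NULL) {
  g <- if (!is.null(k)) cutree(hc, k = k) else cutree(hc, h = h)
  first_seen <- unique(g[hc$order])   # group ids as met from the left
  setNames(match(g, first_seen), names(g))
}

hc <- hclust(dist(USArrests), method = "average")
g <- ordered_cutree(hc, k = 5)
g[hc$order]   # non-decreasing from left to right: 1, ..., 2, ..., 5
```

Because each group cut from the tree is a subtree, its leaves are contiguous in hc$order, so the relabeled memberships never decrease along the drawing order.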
Vector of pruned cluster memberships.
Denis White
See also: hclust, twins.object, cutree, map.groups
```r
data (oregon.bird.dist, oregon.grid)
group <- group.clust (hclust (dist (oregon.bird.dist)), k=6)
names(group) <- row.names(oregon.bird.dist)
map.groups (oregon.grid, group)
```
Alternative to tree[["where"]] that orders groups from left to right in draw order.
group.tree (tree)
tree | object of class rpart. |
Normally used with map.groups. See example.
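For an rpart tree, tree[["where"]] gives, for each observation, the row of the tree's frame that holds its leaf. Assuming that the frame rows list the leaves from left to right (rpart stores nodes in depth-first order, left branch first), the rearrangement amounts to replacing each row number by its rank among the occupied leaves. renumber_leaves is a hypothetical helper and the where vector below is made up.

```r
# Hypothetical helper: turn frame row numbers (as in tree[["where"]]) into
# leaf numbers 1, 2, ... counted from left to right, assuming frame rows
# list leaves in left-to-right order.
renumber_leaves <- function(where) {
  match(where, sort(unique(where)))
}

where <- c(5, 3, 9, 3, 8, 5)   # made-up frame rows for six observations
renumber_leaves(where)          # 2 1 4 1 3 2
```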
Vector of rearranged tree[["where"]].
Denis White
```r
library (rpart)
data (oregon.env.vars, oregon.grid)
group <- group.tree (clip.rpart (rpart (oregon.env.vars), best=7))
names(group) <- row.names(oregon.env.vars)
map.groups (oregon.grid, group=group)
```
Computes the Kelley-Gardner-Sutcliffe penalty function for a hierarchical cluster tree.
kgs (cluster, diss, alpha=1, maxclust=NULL)
cluster | object of class hclust or twins. |
diss | object of class dissimilarity or dist. |
alpha | weight for number of clusters. |
maxclust | maximum number of clusters for which to compute measure. |
Kelley et al. (see reference) proposed a method that can help decide where to prune a hierarchical cluster tree. At each level of the tree, the mean dissimilarity within each cluster is computed, and these means are averaged across all clusters. After this average is normalized, alpha times the number of clusters is added. The minimum of the resulting function corresponds to the suggested pruning size.
The current implementation has complexity O(n*n*maxclust) and is therefore very slow for large n. A possible improvement would be to compute the spread only for the clusters that are split at each level, rather than recomputing it for all clusters.
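The penalty described above can be sketched in base R with cutree and a full dissimilarity matrix. Two details are assumptions, not taken from the package source: singleton clusters, which have no within-cluster pairs, are skipped when averaging, and the average spread is rescaled linearly onto the range 1 to n - 1 before alpha times the number of clusters is added. kgs_sketch is a hypothetical name; like the implementation described above, it recomputes every cluster's spread at every level.

```r
# Sketch of a Kelley-Gardner-Sutcliffe style penalty for an hclust tree.
# Assumptions: singletons are skipped; spread is rescaled onto [1, n - 1].
kgs_sketch <- function(hc, d, alpha = 1, maxclust = 10) {
  dm <- as.matrix(d)                  # full symmetric dissimilarity matrix
  n <- nrow(dm)
  ks <- 2:maxclust
  spread <- sapply(ks, function(k) {
    g <- cutree(hc, k = k)
    # mean within-cluster dissimilarity for each cluster with >= 2 members
    per_cluster <- sapply(split(seq_len(n), g), function(idx) {
      if (length(idx) < 2) return(NA_real_)
      sub <- dm[idx, idx]
      mean(sub[upper.tri(sub)])
    })
    mean(per_cluster, na.rm = TRUE)   # average across clusters
  })
  rng <- range(spread)
  nspread <- (n - 2) * (spread - rng[1]) / (rng[2] - rng[1]) + 1
  penalty <- nspread + alpha * ks
  names(penalty) <- ks
  penalty
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 2)      # 30 random points in the plane
d <- dist(x)
hc <- hclust(d, method = "average")
p <- kgs_sketch(hc, d, maxclust = 10)
names(p)[which.min(p)]                # suggested number of clusters
```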
Vector of the penalty function for trees of size 2:maxclust. The names of vector elements are the respective numbers of clusters.
Denis White
Kelley, L.A., Gardner, S.P., Sutcliffe, M.J. (1996) An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally-related subfamilies, Protein Engineering, 9, 1063-1065.
See also: twins.object, dissimilarity.object, hclust, dist, clip.clust
```r
library (cluster)
data (votes.repub)
a <- agnes (votes.repub, method="ward")
b <- kgs (a, a$diss, maxclust=20)
plot (names (b), b, xlab="# clusters", ylab="penalty")
```
Draws maps of groups of observations created by clustering, classification or regression trees, or some other type of classification.
map.groups (pts, group, pch=par("pch"), size=2, col=NULL, border=NULL, new=TRUE)
pts | matrix or data frame with components "x" and "y". |
group | vector of integer class numbers corresponding to pts. |
pch | symbol number from |
size | size in cex units of point symbol. |
col | vector of fill colors from |
border | vector of border colors from |
new | if TRUE, start a new plot; otherwise draw on the current plot. |
If the number of rows of pts is not equal to the length of group, then (1) pts is assumed to represent polygons and polygon is used, (2) the identifiers in group are matched to the polygons in pts through names(group) and pts$x[is.na(pts$y)], and (3) these identifiers are mapped to dense integers to reference colors.
Otherwise, group is assumed to parallel pts: points is used if pch < 100, and otherwise ngon is used to draw shaded polygon symbols for each observation in pts.
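The rules described above can be written out as a small dispatch function. draw_mode and dense_colors are hypothetical helpers that only classify and index; they draw nothing, and the toy inputs are made up. Reading step (3) as a mapping of the class values to consecutive integers is an interpretation.

```r
# Hypothetical helper: which drawing routine the rules above would select.
draw_mode <- function(pts, group, pch = 19) {
  if (nrow(pts) != length(group)) {
    "polygon"                  # pts holds polygons; match by names(group)
  } else if (pch < 100) {
    "points"                   # one point symbol per observation
  } else {
    "ngon"                     # one regular-polygon symbol per observation
  }
}

# Hypothetical helper: map class values to dense integers 1..m,
# suitable for indexing a vector of m colors.
dense_colors <- function(group) match(group, sort(unique(group)))

pts_points <- data.frame(x = c(0, 1, 2), y = c(0, 1, 0))
draw_mode(pts_points, c(3, 7, 3))              # "points"
draw_mode(pts_points, c(3, 7, 3), pch = 106)   # "ngon"
dense_colors(c(3, 7, 3, 10))                   # 1 2 1 3

# Matching step (2): cell ids sit in x on rows where y is NA (toy triangles).
pts_poly <- data.frame(x = c(102, 0, 1, 0, 101, 2, 3, 2),
                       y = c(NA, 0, 0, 1, NA, 0, 0, 1))
group <- c("101" = 1, "102" = 2)
draw_mode(pts_poly, group)                     # "polygon"
ids <- pts_poly$x[is.na(pts_poly$y)]           # 102, 101 in drawing order
group[as.character(ids)]                       # classes in polygon order: 2, 1
```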
The vector of fill colors supplied or generated.
Denis White
See also: ngon, polygon, group.clust, group.tree, map.key
```r
data (oregon.bird.names, oregon.env.vars, oregon.bird.dist)
data (oregon.border, oregon.grid)
# range map for American Avocet
spp <- match ("American avocet", oregon.bird.names[["common.name"]])
group <- oregon.bird.dist[,spp] + 1
names(group) <- row.names(oregon.bird.dist)
kol <- gray (seq(0.8,0.2,length.out=length (table (group))))
map.groups (oregon.grid, group=group, col=kol)
lines (oregon.border)
# distribution of January temperatures
cuts <- quantile (oregon.env.vars[["jan.temp"]], probs=seq(0,1,1/5))
group <- cut (oregon.env.vars[["jan.temp"]], cuts, labels=FALSE,
  include.lowest=TRUE)
names(group) <- row.names(oregon.env.vars)
kol <- gray (seq(0.8,0.2,length.out=length (table (group))))
map.groups (oregon.grid, group=group, col=kol)
lines (oregon.border)
# January temperatures using point symbols rather than polygons
map.groups (oregon.env.vars, group, col=kol, pch=19)
lines (oregon.border)
```
Draws legends for maps of groups of observations.
map.key (x, y, labels=NULL, cex=par("cex"), pch=par("pch"), size=2.5*cex, col=NULL, head="", sep=0.25*cex, new=FALSE)
x, y | coordinates of lower left position of key in proportional units (0-1) of plot. |
labels | vector of labels for classes, or if |
size | size in cex units of shaded key symbol. |
pch | symbol number for |
cex | pointsize of text, |
head | text heading for key. |
sep | separation in cex units between adjacent symbols in key. If |
col | vector of colors from |
new | if TRUE, start a new plot; otherwise draw on the current plot. |
Uses points or ngon, depending on the value of pch, to draw shaded polygon symbols for the key.
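Proportional key coordinates (0 to 1 across the plot) can be converted to user coordinates with the plot limits that par("usr") reports. prop_to_user is a hypothetical helper illustrating that conversion; whether map.key uses exactly this mapping internally is an assumption.

```r
# Hypothetical helper: convert proportional plot coordinates (0-1) to user
# coordinates, given usr = c(x1, x2, y1, y2) as returned by par("usr").
prop_to_user <- function(x, y, usr = par("usr")) {
  c(x = usr[1] + x * (usr[2] - usr[1]),
    y = usr[3] + y * (usr[4] - usr[3]))
}

# the key position used in the clip.rpart example, on a 0-10 by 0-20 plot
prop_to_user(0.05, 0.65, usr = c(0, 10, 0, 20))   # x = 0.5, y = 13
```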
The vector of colors supplied or generated.
Denis White
```r
data (oregon.env.vars)
# key for examples in help(map.groups)
# range map for American Avocet
kol <- gray (seq(0.8,0.2,length.out=2))
map.key (0.2, 0.2, labels=c("absent","present"), pch=106, col=kol,
  head="key", new=TRUE)
# distribution of January temperatures
cuts <- quantile (oregon.env.vars[["jan.temp"]], probs=seq(0,1,1/5))
kol <- gray (seq(0.8,0.2,length.out=5))
map.key (0.2, 0.2, labels=as.character(round(cuts,0)), col=kol, sep=0,
  head="key", new=TRUE)
# key for example in help file for group.tree
map.key (0.2, 0.2, labels=as.character(seq(6)), pch=19, head="node",
  new=TRUE)
```
Draws a regular polygon at specified coordinates as an outline or shaded.
ngon (xydc, n=4, angle=0, type=1)
xydc | four element vector with the x and y coordinates of the center, the diameter d, and the color c. |
n | number of sides for polygon (> 8 gives a circle). |
angle | rotation angle of figure, in degrees. |
type | |
Uses polygon to draw shaded polygons and lines for outlines. If n is odd, there is a vertex at (0, d/2); otherwise the midpoint of a side is at (0, d/2).
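The geometry stated above can be made concrete by computing vertex offsets from the center. ngon_vertices is a hypothetical helper that follows the text literally: for odd n a vertex lies at (0, d/2); for even n the apothem equals d/2, so the circumradius is (d/2)/cos(pi/n). The counterclockwise direction of angle and the absence of any unit conversion for d are assumptions.

```r
# Hypothetical helper: vertex offsets (relative to the center) of a regular
# n-gon of "diameter" d, following the geometry stated above.
ngon_vertices <- function(n, d, angle = 0) {
  r <- d / 2
  if (n %% 2 == 1) {
    R <- r                     # odd n: a vertex sits at (0, d/2)
    theta0 <- pi / 2
  } else {
    R <- r / cos(pi / n)       # even n: a side midpoint sits at (0, d/2)
    theta0 <- pi / 2 + pi / n
  }
  theta <- theta0 + 2 * pi * (0:(n - 1)) / n + angle * pi / 180
  cbind(x = R * cos(theta), y = R * sin(theta))
}

tri <- ngon_vertices(3, 10)    # first row is (0, 5) up to rounding
sq  <- ngon_vertices(4, 10)    # rows 1 and 4 span the top side
colMeans(sq[c(1, 4), ])        # midpoint of the top side, about (0, 5)
```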
Invisible.
Denis White
See also: polygon, lines, map.key, map.groups
```r
plot (c(0,1), c(0,1), type="n")
ngon (c(.5, .5, 10, "blue"), angle=30, n=3)
apply (cbind (runif(8), runif(8), 6, 2), 1, ngon)
```
Binary matrix (1 = present) for distributions of 248 native breeding bird species for 389 grid cells in Oregon, USA.
data (oregon.bird.dist)
A data frame with 389 rows and 248 columns.
Row names are hexagon identifiers from White et al. (1992). Column names are species element codes developed by The Nature Conservancy (TNC), the Oregon Natural Heritage Program (ONHP), and NatureServe.
Denis White
Master, L. (1996) Predicting distributions for vertebrate species: some observations, Gap Analysis: A Landscape Approach to Biodiversity Planning, Scott, J.M., Tear, T.H., and Davis, F.W., editors, American Society for Photogrammetry and Remote Sensing, Bethesda, MD, pp. 171-176.
White, D., Preston, E.M., Freemark, K.E., Kiester, A.R. (1999) A hierarchical framework for conserving biodiversity, Landscape ecological analysis: issues and applications, Klopatek, J.M., Gardner, R.H., editors, Springer-Verlag, pp. 127-153.
White, D., Kimerling, A.J., Overton, W.S. (1992) Cartographic and geometric components of a global sampling design for environmental monitoring, Cartography and Geographic Information Systems, 19(1), 5-22.
TNC, https://www.nature.org/en-us/
ONHP, https://inr.oregonstate.edu/orbic/
NatureServe, https://www.natureserve.org/
See also: oregon.env.vars, oregon.bird.names, oregon.grid, oregon.border
Scientific and common names for 248 native breeding bird species in Oregon, USA.
data (oregon.bird.names)
A data frame with 248 rows and 2 columns.
Row names are species element codes. Columns are "scientific.name" and "common.name".
Data are provided by The Nature Conservancy (TNC), the Oregon Natural Heritage Program (ONHP), and NatureServe.
Denis White
Master, L. (1996) Predicting distributions for vertebrate species: some observations, Gap Analysis: A Landscape Approach to Biodiversity Planning, Scott, J.M., Tear, T.H., and Davis, F.W., editors, American Society for Photogrammetry and Remote Sensing, Bethesda, MD, pp. 171-176.
TNC, https://www.nature.org/en-us/
ONHP, https://inr.oregonstate.edu/orbic/
NatureServe, https://www.natureserve.org/
The boundary of the state of Oregon, USA, in lines format.
data (oregon.border)
A data frame with 485 rows and 2 columns (the components "x" and "y").
The map projection for this boundary, as well as for the point coordinates in oregon.env.vars, is the Lambert Conformal Conic with standard parallels at 33 and 45 degrees North latitude, the central meridian at 120 degrees, 30 minutes West longitude, and the projection origin at 41 degrees, 45 minutes North latitude.
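The projection parameters above can be written as a PROJ-style string for use with spatial software. This sketch only converts the degree-minute values to decimal degrees; the datum, ellipsoid, units, and false easting/northing are not stated in the source and are therefore left out, so the string is incomplete for precise work.

```r
# Convert degrees and minutes to decimal degrees.
dm_to_deg <- function(deg, min = 0) deg + min / 60

lcc <- sprintf("+proj=lcc +lat_1=%g +lat_2=%g +lat_0=%g +lon_0=%g",
               33, 45,
               dm_to_deg(41, 45),    # origin latitude 41 deg 45 min N
               -dm_to_deg(120, 30))  # central meridian 120 deg 30 min W
lcc  # "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=41.75 +lon_0=-120.5"
```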
Denis White
Distributions of 10 environmental variables for 389 grid cells in Oregon, USA.
data (oregon.env.vars)
A data frame with 389 rows and 10 columns.
Row names are hexagon identifiers from White et al. (1992). Variables (columns) are
bird.spp | number of native breeding bird species |
x | x coordinate of center of grid cell |
y | y coordinate of center of grid cell |
jan.temp | mean minimum January temperature (C) |
jul.temp | mean maximum July temperature (C) |
rng.temp | mean difference between July and January temperatures (C) |
ann.ppt | mean annual precipitation (mm) |
min.elev | minimum elevation (m) |
rng.elev | range of elevation (m) |
max.slope | maximum slope (percent) |
Denis White
White, D., Preston, E.M., Freemark, K.E., Kiester, A.R. (1999) A hierarchical framework for conserving biodiversity, Landscape ecological analysis: issues and applications, Klopatek, J.M., Gardner, R.H., editors, Springer-Verlag, pp. 127-153.
White, D., Kimerling, A.J., Overton, W.S. (1992) Cartographic and geometric components of a global sampling design for environmental monitoring, Cartography and Geographic Information Systems, 19(1), 5-22.
See also: oregon.bird.dist, oregon.grid, oregon.border
Polygon borders for 389 hexagonal grid cells covering Oregon, USA, in polygon format.
data (oregon.grid)
A data frame with 3112 rows and 2 columns (the components "x" and "y").
The polygon format used for these grid cell boundaries is a slight variation on the standard R/S format. Each cell polygon is described by seven coordinate pairs, the last repeating the first. Before the first coordinate pair of each cell is a row containing NA in the "y" column and an identifier for the cell in the "x" column. The identifiers are the same as the row names in oregon.bird.dist and oregon.env.vars. See map.groups for how the linkage is made in mapping.
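Because each cell occupies one header row plus seven coordinate rows, the 3112 rows correspond to 3112 / 8 = 389 cells, matching the 389 rows of oregon.bird.dist and oregon.env.vars. The sketch below checks that layout on a tiny made-up data frame; check_cells is a hypothetical helper, and running it on oregon.grid itself would require the dataset to be loaded.

```r
# Hypothetical helper: verify the header-row + 7-vertex layout described above
# and return the cell identifiers.
check_cells <- function(pts, nvert = 7) {
  hdr <- which(is.na(pts$y))                  # header rows carry ids in x
  stopifnot(length(hdr) * (nvert + 1) == nrow(pts),
            all(diff(c(hdr, nrow(pts) + 1)) == nvert + 1))
  for (s in hdr) {
    v <- pts[(s + 1):(s + nvert), c("x", "y")]
    # the last coordinate pair must repeat the first (closed ring)
    stopifnot(isTRUE(all.equal(unlist(v[1, ]), unlist(v[nvert, ]))))
  }
  pts$x[hdr]
}

# Made-up hexagonal cell in the same format: id row, then 7 vertices.
a <- seq(0, 2 * pi, length.out = 7)
cell <- rbind(data.frame(x = 101, y = NA),
              data.frame(x = cos(a), y = sin(a)))
check_cells(cell)       # 101
389 * (1 + 7)           # 3112 rows, as in oregon.grid
```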
These grid cells are extracted from a larger set covering the conterminous United States and adjacent parts of Canada and Mexico, as described in White et al. (1992). Only cells with at least 50 percent of their area contained within the state of Oregon are included.
The map projection for the coordinates, as well as for the point coordinates in oregon.env.vars, is the Lambert Conformal Conic with standard parallels at 33 and 45 degrees North latitude, the central meridian at 120 degrees, 30 minutes West longitude, and the projection origin at 41 degrees, 45 minutes North latitude.
Denis White
White, D., Kimerling, A.J., Overton, W.S. (1992) Cartographic and geometric components of a global sampling design for environmental monitoring, Cartography and Geographic Information Systems, 19(1), 5-22.
Alternative to as.hclust that retains cluster data.
twins.to.hclust (cluster)
cluster | object of class twins. |
Used internally by clip.clust and draw.clust.
hclust object
Denis White