Fit a Classification or Regression Tree
Description
A tree is grown by binary recursive partitioning using the response in
the specified formula and choosing splits from the terms of the
right-hand-side.
Usage
tree(formula, data, weights, subset,
na.action = na.pass, control = tree.control(nobs, ...),
method = "recursive.partition",
split = c("deviance", "gini"),
model = FALSE, x = FALSE, y = TRUE, wts = TRUE, ...)
Arguments
formula |
A formula expression. The left-hand-side (response)
should be either a numerical vector, when a regression tree will be
fitted, or a factor, when a classification tree is produced. The
right-hand-side should be a series of numeric or factor
variables separated by +; there should be no interaction
terms. Both . and - are allowed: regression trees can
have offset terms.
|
data |
A data frame in which to preferentially interpret
formula, weights and subset.
|
weights |
Vector of non-negative observational weights; fractional
weights are allowed.
|
subset |
An expression specifying the subset of cases to be used.
|
na.action |
A function to filter missing data from the model
frame. The default is na.pass (to do nothing) as tree
handles missing values (by dropping them down the tree as far
as possible).
|
control |
A list as returned by tree.control.
|
method |
Character string giving the method to use. The only other
useful value is "model.frame".
|
split |
Splitting criterion to use.
|
model |
If this argument is itself a model frame, then the
formula and data arguments are ignored, and
model is used to define the model. If the argument is
logical and true, the model frame is stored as component
model in the result.
|
x |
logical. If true, the matrix of variables for each case
is returned.
|
y |
logical. If true, the response variable is returned.
|
wts |
logical. If true, the weights are returned.
|
... |
Additional arguments that are passed to
tree.control. Normally used for mincut, minsize
or mindev.
|
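As an illustration of passing tree.control settings through ..., the sketch below fits the iris data twice: once with the defaults and once with looser stopping rules. It assumes the tree package is installed. The object names ir.default and ir.deep are made up for this example, and the particular values of mincut, minsize and mindev are arbitrary choices, not recommendations.

```r
# Sketch only: assumes the 'tree' package is available.
if (requireNamespace("tree", quietly = TRUE)) {
  # Default control settings
  ir.default <- tree::tree(Species ~ ., iris)

  # Looser stopping rules, forwarded to tree.control() via '...'
  ir.deep <- tree::tree(Species ~ ., iris,
                        mincut = 1, minsize = 2, mindev = 0)

  # Each row of 'frame' is one node, so this compares tree sizes
  print(c(default = nrow(ir.default$frame), deep = nrow(ir.deep$frame)))
}
```

Relaxing the stopping rules can only allow further splits, so the second tree should have at least as many nodes as the first.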
Details
A tree is grown by binary recursive partitioning using the response in
the specified formula and choosing splits from the terms of the
right-hand-side. Numeric variables are divided into
X < a and X > a; the levels of an unordered factor are divided into
two non-empty groups. The split which maximizes the reduction in
impurity is chosen, the data set split and the process
repeated. Splitting continues until the terminal nodes are too small or
too few to be split.
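The split search described above can be sketched for a single numeric predictor in a regression setting. This is an illustration in base R, not the package's implementation: the helpers node_deviance and best_numeric_split are hypothetical names, candidate cut points are taken as midpoints between distinct values (an assumption), and weights and minimum node sizes are ignored.

```r
# Deviance of a regression node: sum of squared deviations from the node mean.
node_deviance <- function(y) sum((y - mean(y))^2)

# Hypothetical helper: try every candidate cut 'a' for one numeric predictor
# and keep the one with the largest reduction in deviance.
best_numeric_split <- function(x, y) {
  ux <- sort(unique(x))
  if (length(ux) < 2) return(NULL)               # nothing to split on
  cuts <- (ux[-length(ux)] + ux[-1]) / 2         # midpoints (assumption)
  parent <- node_deviance(y)
  reduction <- vapply(cuts, function(a) {
    left <- x < a                                # cases sent to the X < a branch
    parent - node_deviance(y[left]) - node_deviance(y[!left])
  }, numeric(1))
  best <- which.max(reduction)
  list(cut = cuts[best], reduction = reduction[best])
}

# Toy data with an obvious break between x = 3 and x = 10
x <- c(1, 2, 3, 10, 11, 12)
y <- c(1.0, 1.2, 0.8, 5.0, 5.2, 4.8)
s <- best_numeric_split(x, y)
print(s)
```

Applying the same search recursively to each of the two resulting subsets gives the binary recursive partitioning described in this section.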
Tree growth is limited to a depth of 31 by the use of integers to
label nodes.
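To see why integer labels bound the depth, the sketch below assumes the common numbering scheme in which the root is node 1 and node i has children 2i and 2i + 1. That scheme is an assumption made for illustration and is not stated on this page; children and level_of are hypothetical helpers.

```r
# ASSUMPTION: root is node 1; node i has children 2i (left) and 2i + 1 (right).
children <- function(i) c(left = 2 * i, right = 2 * i + 1)

# Level of a node under that scheme, counting the root as level 1.
level_of <- function(i) floor(log2(i)) + 1

print(children(1))
print(level_of(c(1, 2, 3, 7)))

# The largest value an R integer can hold is 2^31 - 1 ...
print(.Machine$integer.max == 2^31 - 1)
# ... and under the assumed scheme that label sits on this level:
print(level_of(.Machine$integer.max))
```

Under this assumption, labels double with each level, so any deeper level would need node numbers beyond the integer range.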
Factor predictor variables can have up to 32 levels. This limit is
imposed for ease of labelling, but since their use in a classification
tree with three or more levels in a response involves a search over
2^(k-1) - 1 groupings for k levels, the practical limit is much less.
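The growth of the search is easy to tabulate. The snippet below evaluates 2^(k-1) - 1 for a few values of k; n_groupings is a hypothetical helper name used only here.

```r
# Number of ways to divide k factor levels into two non-empty groups.
n_groupings <- function(k) 2^(k - 1) - 1

k <- c(2, 3, 5, 10, 20, 32)
print(data.frame(levels = k, groupings = n_groupings(k)))
```

The count roughly doubles with every extra level, which is why factors with far fewer than 32 levels can already be slow in this setting.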
Value
The value is an object of class "tree"
which has components
frame |
A data frame with a row for each node, and
row.names giving the node numbers. The columns include
var, the variable used at the split (or "<leaf>" for a
terminal node), n, the (weighted) number of cases reaching
that node, dev, the deviance of the node, yval, the
fitted value at the node (the mean for regression trees, a majority
class for classification trees) and split, a two-column
matrix of the labels for the left and right splits at the
node. Classification trees also have yprob, a matrix of
fitted probabilities for each response level.
|
where |
An integer vector giving the row number of the frame
detailing the node to which each case is assigned.
|
terms |
The terms of the formula.
|
call |
The matched call to tree.
|
model |
If model = TRUE, the model frame.
|
x |
If x = TRUE, the model matrix.
|
y |
If y = TRUE, the response.
|
wts |
If wts = TRUE, the weights.
|
and attributes xlevels and, for classification trees, ylevels.
A tree with no splits is of class "singlenode", which inherits
from class "tree".
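The components listed above can be inspected directly on a fitted object. The sketch below assumes the tree package is installed and uses the iris data; leaf_rows is a name introduced only for this example.

```r
# Sketch only: assumes the 'tree' package is available.
if (requireNamespace("tree", quietly = TRUE)) {
  ir.tr <- tree::tree(Species ~ ., iris)

  # Class of the fitted object
  print(class(ir.tr))

  # One row per node; row names are the node numbers
  print(head(ir.tr$frame[, c("var", "n", "dev", "yval")]))

  # 'where' gives, for each case, the row of 'frame' it ends up in,
  # so every row it refers to should be a terminal node
  leaf_rows <- sort(unique(ir.tr$where))
  print(all(ir.tr$frame$var[leaf_rows] == "<leaf>"))

  # Response levels stored as an attribute for classification trees
  print(attr(ir.tr, "ylevels"))
}
```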
Author(s)
B. D. Ripley
References
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984)
Classification and Regression Trees. Wadsworth.
Ripley, B. D. (1996)
Pattern Recognition and Neural Networks.
Cambridge University Press, Cambridge. Chapter 7.
See Also
tree.control, prune.tree, predict.tree, snip.tree
Examples
data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)
cpus.ltr
summary(cpus.ltr)
plot(cpus.ltr); text(cpus.ltr)
ir.tr <- tree(Species ~., iris)
ir.tr
summary(ir.tr)