Title: | Partial Dependence Plots |
---|---|
Description: | A general framework for constructing partial dependence (i.e., marginal effect) plots from various types machine learning models in R. |
Authors: | Brandon M. Greenwell [aut, cre] |
Maintainer: | Brandon M. Greenwell <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.8.2 |
Built: | 2024-10-29 06:56:47 UTC |
Source: | CRAN |
Plots partial dependence functions (i.e., marginal effects) using
ggplot2
graphics.
## S3 method for class 'partial' autoplot( object, center = FALSE, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, smooth = FALSE, smooth.method = "auto", smooth.formula = y ~ x, smooth.span = 0.75, smooth.method.args = list(), contour = FALSE, contour.color = "white", train = NULL, xlab = NULL, ylab = NULL, main = NULL, legend.title = "yhat", ... ) ## S3 method for class 'ice' autoplot( object, center = FALSE, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, train = NULL, xlab = NULL, ylab = NULL, main = NULL, ... ) ## S3 method for class 'cice' autoplot( object, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, train = NULL, xlab = NULL, ylab = NULL, main = NULL, ... )
## S3 method for class 'partial' autoplot( object, center = FALSE, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, smooth = FALSE, smooth.method = "auto", smooth.formula = y ~ x, smooth.span = 0.75, smooth.method.args = list(), contour = FALSE, contour.color = "white", train = NULL, xlab = NULL, ylab = NULL, main = NULL, legend.title = "yhat", ... ) ## S3 method for class 'ice' autoplot( object, center = FALSE, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, train = NULL, xlab = NULL, ylab = NULL, main = NULL, ... ) ## S3 method for class 'cice' autoplot( object, plot.pdp = TRUE, pdp.color = "red", pdp.size = 1, pdp.linetype = 1, rug = FALSE, train = NULL, xlab = NULL, ylab = NULL, main = NULL, ... )
object |
An object that inherits from the |
center |
Logical indicating whether or not to produce centered ICE
curves (c-ICE curves). Only useful when |
plot.pdp |
Logical indicating whether or not to plot the partial
dependence function on top of the ICE curves. Default is |
pdp.color |
Character string specifying the color to use for the partial
dependence function when |
pdp.size |
Positive number specifying the line width to use for the
partial dependence function when |
pdp.linetype |
Positive number specifying the line type to use for the
partial dependence function when |
rug |
Logical indicating whether or not to include rug marks on the
predictor axes. Default is |
smooth |
Logical indicating whether or not to overlay a LOESS smooth.
Default is |
smooth.method |
Character string specifying the smoothing method
(function) to use (e.g., |
smooth.formula |
Formula to use in smoothing function (e.g.,
|
smooth.span |
Controls the amount of smoothing for the default loess
smoother. Smaller numbers produce wigglier lines, larger numbers produce
smoother lines. Default is |
smooth.method.args |
List containing additional arguments to be passed
on to the modeling function defined by |
contour |
Logical indicating whether or not to add contour lines to the level plot. |
contour.color |
Character string specifying the color to use for the
contour lines when |
train |
Data frame containing the original training data. Only required
if |
xlab |
Character string specifying the text for the x-axis label. |
ylab |
Character string specifying the text for the y-axis label. |
main |
Character string specifying the text for the main title of the plot. |
legend.title |
Character string specifying the text for the legend title.
Default is |
... |
Additional (optional) arguments to be passed onto
|
A "ggplot"
object.
## Not run: # # Regression example (requires randomForest package to run) # # Load required packages library(ggplot2) # for autoplot() generic library(gridExtra) # for `grid.arrange()` library(magrittr) # for forward pipe operator `%>%` library(randomForest) # Fit a random forest to the Boston housing data data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Partial dependence of cmedv on lstat boston.rf %>% partial(pred.var = "lstat") %>% autoplot(rug = TRUE, train = boston) + theme_bw() # Partial dependence of cmedv on lstat and rm boston.rf %>% partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>% autoplot(contour = TRUE, legend.title = "cmedv", option = "B", direction = -1) + theme_bw() # ICE curves and c-ICE curves age.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE) grid.arrange( autoplot(age.ice, alpha = 0.1), # ICE curves autoplot(age.ice, center = TRUE, alpha = 0.1), # c-ICE curves ncol = 2 ) ## End(Not run)
## Not run: # # Regression example (requires randomForest package to run) # # Load required packages library(ggplot2) # for autoplot() generic library(gridExtra) # for `grid.arrange()` library(magrittr) # for forward pipe operator `%>%` library(randomForest) # Fit a random forest to the Boston housing data data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Partial dependence of cmedv on lstat boston.rf %>% partial(pred.var = "lstat") %>% autoplot(rug = TRUE, train = boston) + theme_bw() # Partial dependence of cmedv on lstat and rm boston.rf %>% partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>% autoplot(contour = TRUE, legend.title = "cmedv", option = "B", direction = -1) + theme_bw() # ICE curves and c-ICE curves age.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE) grid.arrange( autoplot(age.ice, alpha = 0.1), # ICE curves autoplot(age.ice, center = TRUE, alpha = 0.1), # c-ICE curves ncol = 2 ) ## End(Not run)
Data on median housing values from 506 census tracts in the suburbs of Boston
from the 1970 census. This data frame is a corrected version of the original
data by Harrison and Rubinfeld (1978) with additional spatial information.
The data were taken directly from BostonHousing2
and
unneeded columns (i.e., name of town, census tract, and the uncorrected
median home value) were removed.
data(boston)
data(boston)
A data frame with 506 rows and 16 variables.
lon
Longitude of census tract.
lat
Latitude of census tract.
cmedv
Corrected median value of owner-occupied homes in USD 1000's
crim
Per capita crime rate by town.
zn
Proportion of residential land zoned for lots over 25,000 sq.ft.
indus
Proportion of non-retail business acres per town.
chas
Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox
Nitric oxides concentration (parts per 10 million).
rm
Average number of rooms per dwelling.
age
Proportion of owner-occupied units built prior to 1940.
dis
Weighted distances to five Boston employment centers.
rad
Index of accessibility to radial highways.
tax
Full-value property-tax rate per USD 10,000.
ptratio
Pupil-teacher ratio by town.
b
$1000(B - 0.63)^2$ where B is the proportion of blacks by town.
lstat
Percentage of lower status of the population.
Harrison, D. and Rubinfeld, D.L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81-102.
Gilley, O.W., and R. Kelley Pace (1996). On the Harrison and Rubinfeld Data. Journal of Environmental Economics and Management, 31, 403-405.
Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html] Irvine, CA: University of California, Department of Information and Computer Science.
Pace, R. Kelley, and O.W. Gilley (1997). Using the Spatial Configuration of the Data to Improve Estimation. Journal of the Real Estate Finance and Economics, 14, 333-340.
Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.
head(boston)
head(boston)
Construct a single "exemplar" record from a data frame. For now, all numeric
columns (including "Date"
objects) are replaced with their
corresponding median value and non-numeric columns are replaced with their
most frequent value.
exemplar(object) ## S3 method for class 'data.frame' exemplar(object) ## S3 method for class 'matrix' exemplar(object) ## S3 method for class 'dgCMatrix' exemplar(object)
exemplar(object) ## S3 method for class 'data.frame' exemplar(object) ## S3 method for class 'matrix' exemplar(object) ## S3 method for class 'dgCMatrix' exemplar(object)
object |
A data frame, matrix, or
|
A data frame with the same number of columns as object
and a
single row.
set.seed(1554) # for reproducibility train <- data.frame( x = rnorm(100), y = sample(letters[1L:3L], size = 100, replace = TRUE, prob = c(0.1, 0.1, 0.8)) ) exemplar(train)
set.seed(1554) # for reproducibility train <- data.frame( x = rnorm(100), y = sample(letters[1L:3L], size = 100, replace = TRUE, prob = c(0.1, 0.1, 0.8)) ) exemplar(train)
Compute partial dependence functions (i.e., marginal effects) for various model fitting objects.
partial(object, ...) ## Default S3 method: partial( object, pred.var, pred.grid, pred.fun = NULL, grid.resolution = NULL, ice = FALSE, center = FALSE, approx = FALSE, quantiles = FALSE, probs = 1:9/10, trim.outliers = FALSE, type = c("auto", "regression", "classification"), inv.link = NULL, which.class = 1L, prob = FALSE, recursive = TRUE, plot = FALSE, plot.engine = c("lattice", "ggplot2"), smooth = FALSE, rug = FALSE, chull = FALSE, levelplot = TRUE, contour = FALSE, contour.color = "white", alpha = 1, train, cats = NULL, check.class = TRUE, progress = FALSE, parallel = FALSE, paropts = NULL, ... ) ## S3 method for class 'model_fit' partial(object, ...)
partial(object, ...) ## Default S3 method: partial( object, pred.var, pred.grid, pred.fun = NULL, grid.resolution = NULL, ice = FALSE, center = FALSE, approx = FALSE, quantiles = FALSE, probs = 1:9/10, trim.outliers = FALSE, type = c("auto", "regression", "classification"), inv.link = NULL, which.class = 1L, prob = FALSE, recursive = TRUE, plot = FALSE, plot.engine = c("lattice", "ggplot2"), smooth = FALSE, rug = FALSE, chull = FALSE, levelplot = TRUE, contour = FALSE, contour.color = "white", alpha = 1, train, cats = NULL, check.class = TRUE, progress = FALSE, parallel = FALSE, paropts = NULL, ... ) ## S3 method for class 'model_fit' partial(object, ...)
object |
A fitted model object of appropriate class (e.g., |
... |
Additional optional arguments to be passed onto
|
pred.var |
Character string giving the names of the predictor variables of interest. For reasons of computation/interpretation, this should include no more than three variables. |
pred.grid |
Data frame containing the joint values of interest for the
variables listed in |
pred.fun |
Optional prediction function that requires two arguments:
|
grid.resolution |
Integer giving the number of equally spaced points to
use for the continuous variables listed in |
ice |
Logical indicating whether or not to compute individual
conditional expectation (ICE) curves. Default is |
center |
Logical indicating whether or not to produce centered ICE
curves (c-ICE curves). Only used when |
approx |
Logical indicating whether or not to compute a faster, but
approximate, marginal effect plot (similar in spirit to the
plotmo package). If |
quantiles |
Logical indicating whether or not to use the sample
quantiles of the continuous predictors listed in |
probs |
Numeric vector of probabilities with values in [0,1]. (Values up
to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
Default is |
trim.outliers |
Logical indicating whether or not to trim off outliers
from the continuous predictors listed in |
type |
Character string specifying the type of supervised learning.
Current options are |
inv.link |
Function specifying the transformation to be applied to the
predictions before the partial dependence function is computed
(experimental). Default is |
which.class |
Integer specifying which column of the matrix of predicted
probabilities to use as the "focus" class. Default is to use the first class.
Only used for classification problems (i.e., when
|
prob |
Logical indicating whether or not partial dependence for
classification problems should be returned on the probability scale, rather
than the centered logit. If |
recursive |
Logical indicating whether or not to use the weighted tree
traversal method described in Friedman (2001). This only applies to objects
that inherit from class |
plot |
Logical indicating whether to return a data frame containing the
partial dependence values ( |
plot.engine |
Character string specifying which plotting engine to use
whenever |
smooth |
Logical indicating whether or not to overlay a LOESS smooth.
Default is |
rug |
Logical indicating whether or not to include a rug display on the
predictor axes. The tick marks indicate the min/max and deciles of the
predictor distributions. This helps reduce the risk of interpreting the
partial dependence plot outside the region of the data (i.e., extrapolating).
Only used when |
chull |
Logical indicating whether or not to restrict the values of the
first two variables in |
levelplot |
Logical indicating whether or not to use a false color level
plot ( |
contour |
Logical indicating whether or not to add contour lines to the
level plot. Only used when |
contour.color |
Character string specifying the color to use for the
contour lines when |
alpha |
Numeric value in |
train |
An optional data frame, matrix, or sparse matrix containing the
original training data. This may be required depending on the class of
|
cats |
Character string indicating which columns of |
check.class |
Logical indicating whether or not to make sure each column
in |
progress |
Logical indicating whether or not to display a text-based
progress bar. Default is |
parallel |
Logical indicating whether or not to run |
paropts |
List containing additional options to be passed onto
|
By default, partial
returns an object of class
c("data.frame", "partial")
. If ice = TRUE
and
center = FALSE
then an object of class c("data.frame", "ice")
is returned. If ice = TRUE
and center = TRUE
then an object of
class c("data.frame", "cice")
is returned. These three classes
determine the behavior of the plotPartial
function which is
automatically called whenever plot = TRUE
. Specifically, when
plot = TRUE
, a "trellis"
object is returned (see
lattice
for details); the "trellis"
object will
also include an additional attribute, "partial.data"
, containing the
data displayed in the plot.
In some cases it is difficult for partial
to extract the original
training data from object
. In these cases an error message is
displayed requesting the user to supply the training data via the
train
argument in the call to partial
. In most cases where
partial
can extract the required training data from object
,
it is taken from the same environment in which partial
is called.
Therefore, it is important to not change the training data used to construct
object
before calling partial
. This problem is completely
avoided when the training data are passed to the train
argument in the
call to partial
.
It is recommended to call partial
with plot = FALSE
and store
the results. This allows for more flexible plotting, and the user will not
have to waste time calling partial
again if the default plot is not
sufficient.
It is possible to retrieve the last printed "trellis"
object, such as
those produced by plotPartial
, using trellis.last.object()
.
If ice = TRUE
or the prediction function given to pred.fun
returns a prediction for each observation in newdata
, then the result
will be a curve for each observation. These are called individual conditional
expectation (ICE) curves; see Goldstein et al. (2015) and
ice
for details.
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29: 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, 24(1): 44-65, 2015.
## Not run: # # Regression example (requires randomForest package to run) # # Fit a random forest to the boston housing data library(randomForest) data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Using randomForest's partialPlot function partialPlot(boston.rf, pred.data = boston, x.var = "lstat") # Using pdp's partial function head(partial(boston.rf, pred.var = "lstat")) # returns a data frame partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE) # The partial function allows for multiple predictors partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40, plot = TRUE, chull = TRUE, progress = TRUE) # The plotPartial function offers more flexible plotting pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40) plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE, colorkey = FALSE, screen = list(z = -20, x = -60)) # The autplot function can be used to produce graphics based on ggplot2 library(ggplot2) autoplot(pd, contour = TRUE, legend.title = "Partial\ndependence") # # Individual conditional expectation (ICE) curves # # Use partial to obtain ICE/c-ICE curves rm.ice <- partial(boston.rf, pred.var = "rm", ice = TRUE) plotPartial(rm.ice, rug = TRUE, train = boston, alpha = 0.2) autoplot(rm.ice, center = TRUE, alpha = 0.2, rug = TRUE, train = boston) # # Classification example (requires randomForest package to run) # # Fit a random forest to the Pima Indians diabetes data data (pima) # load the boston housing data set.seed(102) # for reproducibility pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit) # Partial dependence of positive test result on glucose (default logit scale) partial(pima.rf, pred.var = "glucose", plot = TRUE, chull = TRUE, progress = TRUE) # Partial dependence of positive test result on glucose (probability scale) partial(pima.rf, pred.var = "glucose", prob = TRUE, plot = TRUE, chull = TRUE, progress = TRUE) ## End(Not run)
## Not run: # # Regression example (requires randomForest package to run) # # Fit a random forest to the boston housing data library(randomForest) data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Using randomForest's partialPlot function partialPlot(boston.rf, pred.data = boston, x.var = "lstat") # Using pdp's partial function head(partial(boston.rf, pred.var = "lstat")) # returns a data frame partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE) # The partial function allows for multiple predictors partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40, plot = TRUE, chull = TRUE, progress = TRUE) # The plotPartial function offers more flexible plotting pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40) plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE, colorkey = FALSE, screen = list(z = -20, x = -60)) # The autplot function can be used to produce graphics based on ggplot2 library(ggplot2) autoplot(pd, contour = TRUE, legend.title = "Partial\ndependence") # # Individual conditional expectation (ICE) curves # # Use partial to obtain ICE/c-ICE curves rm.ice <- partial(boston.rf, pred.var = "rm", ice = TRUE) plotPartial(rm.ice, rug = TRUE, train = boston, alpha = 0.2) autoplot(rm.ice, center = TRUE, alpha = 0.2, rug = TRUE, train = boston) # # Classification example (requires randomForest package to run) # # Fit a random forest to the Pima Indians diabetes data data (pima) # load the boston housing data set.seed(102) # for reproducibility pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit) # Partial dependence of positive test result on glucose (default logit scale) partial(pima.rf, pred.var = "glucose", plot = TRUE, chull = TRUE, progress = TRUE) # Partial dependence of positive test result on glucose (probability scale) partial(pima.rf, pred.var = "glucose", prob = TRUE, plot = TRUE, chull = TRUE, progress = TRUE) ## End(Not run)
Diabetes test results collected by the the US National Institute of Diabetes
and Digestive and Kidney Diseases from a population of women who were at
least 21 years old, of Pima Indian heritage, and living near Phoenix,
Arizona. The data were taken directly from
PimaIndiansDiabetes2
.
data(pima)
data(pima)
A data frame with 768 observations on 9 variables.
pregnant
Number of times pregnant.
glucose
Plasma glucose concentration (glucose tolerance test).
pressure
Diastolic blood pressure (mm Hg).
triceps
Triceps skin fold thickness (mm).
insulin
2-Hour serum insulin (mu U/ml).
mass
Body mass index (weight in kg/(height in m)^2).
pedigree
Diabetes pedigree function.
age
Age (years).
diabetes
Factor indicating the diabetes test result (neg
/pos
).
Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
Brian D. Ripley (1996), Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge.
Grace Whaba, Chong Gu, Yuedong Wang, and Richard Chappell (1995), Soft Classification a.k.a. Risk Estimation via Penalized Log Likelihood and Smoothing Spline Analysis of Variance, in D. H. Wolpert (1995), The Mathematics of Generalization, 331-359, Addison-Wesley, Reading, MA.
Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.
head(pima)
head(pima)
Plots partial dependence functions (i.e., marginal effects) using
lattice
graphics.
plotPartial(object, ...) ## S3 method for class 'ice' plotPartial( object, center = FALSE, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, rug = FALSE, train = NULL, ... ) ## S3 method for class 'cice' plotPartial( object, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, rug = FALSE, train = NULL, ... ) ## S3 method for class 'partial' plotPartial( object, center = FALSE, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, smooth = FALSE, rug = FALSE, chull = FALSE, levelplot = TRUE, contour = FALSE, contour.color = "white", col.regions = NULL, number = 4, overlap = 0.1, train = NULL, ... )
plotPartial(object, ...) ## S3 method for class 'ice' plotPartial( object, center = FALSE, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, rug = FALSE, train = NULL, ... ) ## S3 method for class 'cice' plotPartial( object, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, rug = FALSE, train = NULL, ... ) ## S3 method for class 'partial' plotPartial( object, center = FALSE, plot.pdp = TRUE, pdp.col = "red2", pdp.lwd = 2, pdp.lty = 1, smooth = FALSE, rug = FALSE, chull = FALSE, levelplot = TRUE, contour = FALSE, contour.color = "white", col.regions = NULL, number = 4, overlap = 0.1, train = NULL, ... )
object |
An object that inherits from the |
... |
Additional optional arguments to be passed onto |
center |
Logical indicating whether or not to produce centered ICE
curves (c-ICE curves). Only useful when |
plot.pdp |
Logical indicating whether or not to plot the partial
dependence function on top of the ICE curves. Default is |
pdp.col |
Character string specifying the color to use for the partial
dependence function when |
pdp.lwd |
Integer specifying the line width to use for the partial
dependence function when |
pdp.lty |
Integer or character string specifying the line type to use
for the partial dependence function when |
rug |
Logical indicating whether or not to include rug marks on the
predictor axes. Default is |
train |
Data frame containing the original training data. Only required
if |
smooth |
Logical indicating whether or not to overlay a LOESS smooth.
Default is |
chull |
Logical indicating whether or not to restrict the first two
variables in |
levelplot |
Logical indicating whether or not to use a false color level
plot ( |
contour |
Logical indicating whether or not to add contour lines to the
level plot. Only used when |
contour.color |
Character string specifying the color to use for the
contour lines when |
col.regions |
Vector of colors to be passed on to
|
number |
Integer specifying the number of conditional intervals to use
for the continuous panel variables. See |
overlap |
The fraction of overlap of the conditioning variables. See
|
## Not run: # # Regression example (requires randomForest package to run) # # Load required packages library(gridExtra) # for `grid.arrange()` library(magrittr) # for forward pipe operator `%>%` library(randomForest) # Fit a random forest to the Boston housing data data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Partial dependence of cmedv on lstat boston.rf %>% partial(pred.var = "lstat") %>% plotPartial(rug = TRUE, train = boston) # Partial dependence of cmedv on lstat and rm boston.rf %>% partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>% plotPartial(contour = TRUE, legend.title = "rm") # ICE curves and c-ICE curves age.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE) p1 <- plotPartial(age.ice, alpha = 0.1) p2 <- plotPartial(age.ice, center = TRUE, alpha = 0.1) grid.arrange(p1, p2, ncol = 2) ## End(Not run)
## Not run: # # Regression example (requires randomForest package to run) # # Load required packages library(gridExtra) # for `grid.arrange()` library(magrittr) # for forward pipe operator `%>%` library(randomForest) # Fit a random forest to the Boston housing data data (boston) # load the boston housing data set.seed(101) # for reproducibility boston.rf <- randomForest(cmedv ~ ., data = boston) # Partial dependence of cmedv on lstat boston.rf %>% partial(pred.var = "lstat") %>% plotPartial(rug = TRUE, train = boston) # Partial dependence of cmedv on lstat and rm boston.rf %>% partial(pred.var = c("lstat", "rm"), chull = TRUE, progress = TRUE) %>% plotPartial(contour = TRUE, legend.title = "rm") # ICE curves and c-ICE curves age.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE) p1 <- plotPartial(age.ice, alpha = 0.1) p2 <- plotPartial(age.ice, center = TRUE, alpha = 0.1) grid.arrange(p1, p2, ncol = 2) ## End(Not run)
Extract the most "important" predictors for regression and classification models.
topPredictors(object, n = 1L, ...) ## Default S3 method: topPredictors(object, n = 1L, ...) ## S3 method for class 'train' topPredictors(object, n = 1L, ...)
topPredictors(object, n = 1L, ...) ## Default S3 method: topPredictors(object, n = 1L, ...) ## S3 method for class 'train' topPredictors(object, n = 1L, ...)
object |
A fitted model object of appropriate class (e.g., |
n |
Integer specifying the number of predictors to return. Default is
|
... |
Additional optional arguments to be passed onto
|
This function uses the generic function varImp
to
calculate variable importance scores for each predictor. After that, they are
sorted at the names of the n
highest scoring predictors are returned.
## Not run: # # Regression example (requires randomForest package to run) # Load required packages library(ggplot2) library(randomForest) # Fit a random forest to the mtcars dataset data(mtcars, package = "datasets") set.seed(101) mtcars.rf <- randomForest(mpg ~ ., data = mtcars, mtry = 5, importance = TRUE) # Topfour predictors top4 <- topPredictors(mtcars.rf, n = 4) # Construct partial dependence functions for top four predictors pd <- NULL for (i in top4) { tmp <- partial(mtcars.rf, pred.var = i) names(tmp) <- c("x", "y") pd <- rbind(pd, cbind(tmp, predictor = i)) } # Display partial dependence functions ggplot(pd, aes(x, y)) + geom_line() + facet_wrap(~ predictor, scales = "free") + theme_bw() + ylab("mpg") ## End(Not run)
## Not run: # # Regression example (requires randomForest package to run) # Load required packages library(ggplot2) library(randomForest) # Fit a random forest to the mtcars dataset data(mtcars, package = "datasets") set.seed(101) mtcars.rf <- randomForest(mpg ~ ., data = mtcars, mtry = 5, importance = TRUE) # Topfour predictors top4 <- topPredictors(mtcars.rf, n = 4) # Construct partial dependence functions for top four predictors pd <- NULL for (i in top4) { tmp <- partial(mtcars.rf, pred.var = i) names(tmp) <- c("x", "y") pd <- rbind(pd, cbind(tmp, predictor = i)) } # Display partial dependence functions ggplot(pd, aes(x, y)) + geom_line() + facet_wrap(~ predictor, scales = "free") + theme_bw() + ylab("mpg") ## End(Not run)