Package 'mda' reference manual

Title:	Mixture and Flexible Discriminant Analysis
Description:	Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines. Hastie, Tibshirani and Friedman (2009) "Elements of Statistical Learning (second edition, chap 12)" Springer, New York.
Authors:	Trevor Hastie [aut, cre] (Original co-author of the S package `mda`), Robert Tibshirani [aut] (Original co-author of the S package `mda`), Balasubramanian Narasimhan [ctb] (Contributed to the upgrading of code), Friedrich Leisch [ctb] (Original R port from the S package), Kurt Hornik [ctb] (Original R port from the S package), Brian Ripley [ctb] (Original R port from the S package)
Maintainer:	Trevor Hastie
License:	GPL-2
Version:	0.5-5
Built:	2025-03-08 06:26:06 UTC
Source:	CRAN

Fit an Additive Spline Model by Adaptive Backfitting

Description

Fit an additive spline model by adaptive backfitting.

Usage

bruto(x, y, w, wp, dfmax, cost, maxit.select, maxit.backfit, 
      thresh = 0.0001, trace.bruto = FALSE, start.linear = TRUE,
      fit.object, ...)

Arguments

`x`	a matrix of numeric predictors (do not include a column of 1s).
`y`	a vector or matrix of responses.
`w`	optional observation weight vector.
`wp`	optional weight vector for each column of `y`; the RSS and GCV criteria use a weighted sum of squared residuals.
`dfmax`	a vector of maximum df (degrees of freedom) for each term.
`cost`	cost per degree of freedom; default is 2.
`maxit.select`	maximum number of iterations during the selection stage.
`maxit.backfit`	maximum number of iterations for the final backfit stage (with fixed lambda).
`thresh`	convergence threshold (default is 0.0001); iterations cease when the relative change in GCV is below this threshold.
`trace.bruto`	logical flag. If `TRUE`, a progress report is printed during fitting; the default is `FALSE`.
`start.linear`	logical flag. If `TRUE` (default), the model starts with the linear fit.
`fit.object`	the object returned by a previous call to `bruto()`; if supplied, the same model is refit to the (presumably new) `y`.
`...`	further arguments to be passed to or from methods.

Value

A multiresponse additive model fit object of class "bruto" is returned. The model is fit by adaptive backfitting using smoothing splines. If there are np columns in y, then np additive models are fit, but a given term uses the same amount of smoothing (df) in all np models. For each term the procedure chooses between df = 0 (term omitted), df = 1 (term linear) or df > 1 (term fitted by a smoothing spline). Model selection is based on an approximation to the GCV criterion, which is used at each step of the backfitting procedure. Once the selection process stops, the model is backfit using the chosen amount of smoothing.
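The backfitting loop itself can be sketched in base R with `smooth.spline()` from the stats package. This is a simplified illustration under stated assumptions, not the `bruto` implementation: every term gets the same fixed `df` (there is no adaptive choice between excluded, linear and smooth terms and no GCV-based selection), a single response is assumed, convergence is judged on the change in the fitted terms rather than on GCV, and the name `backfit_sketch` is hypothetical.

```r
## Hypothetical sketch: additive model with fixed df per term, fit by backfitting.
## Not the bruto algorithm: no GCV-based selection of df, single response only.
backfit_sketch <- function(X, y, df = 4, maxit = 20, thresh = 1e-4) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  alpha <- mean(y)                      # intercept
  f <- matrix(0, n, p)                  # fitted term values, one column per predictor
  for (it in seq_len(maxit)) {
    f_old <- f
    for (j in seq_len(p)) {
      ## partial residual: remove the intercept and all other terms
      r <- y - alpha - rowSums(f[, -j, drop = FALSE])
      sfit <- smooth.spline(X[, j], r, df = df)
      fj <- predict(sfit, X[, j])$y
      f[, j] <- fj - mean(fj)           # center each term for identifiability
    }
    ## stop when the relative change in the fitted terms is small
    change <- sum((f - f_old)^2) / max(sum(f_old^2), .Machine$double.eps)
    if (change < thresh) break
  }
  list(alpha = alpha, f = f, fitted = alpha + rowSums(f), nit = it)
}

## Same data as the Examples: Volume from Girth and Height
res <- backfit_sketch(trees[, 1:2], trees$Volume)
res$nit
```

Fixing `df` keeps the sketch short; `bruto` instead searches over df for each term during the selection stage and only then runs a final backfit with the chosen smoothing.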

A bruto object has the following components of interest:

`lambda`	a vector of chosen smoothing parameters, one for each column of `x`.
`df`	the df chosen for each column of `x`.
`type`	a factor with levels `"excluded"`, `"linear"` or `"smooth"`, indicating the status of each column of `x`.
`gcv.select`, `gcv.backfit`, `df.select`	the sequences of GCV values and df selected during the execution of the function.
`nit`	the number of iterations used.
`fitted.values`	a matrix of fitted values.
`residuals`	a matrix of residuals.
`call`	the call that produced this object.

References

Trevor Hastie and Rob Tibshirani, Generalized Additive Models, Chapman and Hall, 1990 (page 262).

Trevor Hastie, Rob Tibshirani and Andreas Buja “Flexible Discriminant Analysis by Optimal Scoring” JASA 1994, 89, 1255-1270.

Examples

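data(trees)
## predict Volume (column 3) from Girth and Height
fit1 <- bruto(trees[, -3], trees[3])
fit1$type   # status of each predictor: excluded, linear or smooth
fit1$df     # df chosen for each predictor
## examine the fitted functions: vary one predictor over its range
## while holding the other at its mean
par(mfrow = c(1, 2), pty = "s")
Xp <- matrix(sapply(trees[1:2], mean), nrow(trees), 2, byrow = TRUE)
xr <- sapply(trees, range)   # column ranges, computed once outside the loop
for (i in 1:2) {
  Xp1 <- Xp
  Xp1[, i] <- seq(xr[1, i], xr[2, i], length.out = nrow(trees))
  Xf <- predict(fit1, Xp1)
  plot(Xp1[, i], Xf, xlab = names(trees)[i], ylab = "", type = "l")
}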

Produce coefficients for an fda or mda object

Description

A method for `coef` that extracts the canonical coefficients from an `fda` or `mda` object.

Usage

## S3 method for class 'fda'
coef(object, ...)

Arguments

`object`	an `fda` or `mda` object.
`...`	not used.

Details

See the references for details.

Value

A coefficient matrix

Author(s)

Trevor Hastie and Robert Tibshirani

References

“Flexible Discriminant Analysis by Optimal Scoring” by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

“Penalized Discriminant Analysis” by Hastie, Buja and Tibshirani, 1995, Annals of Statistics, 73-102.

“Elements of Statistical Learning - Data Mining, Inference and Prediction” (2nd edition, Chapter 12) by Hastie, Tibshirani and Friedman, 2009, Springer

Examples

data(iris)
irisfit <- fda(Species ~ ., data = iris)
coef(irisfit)
mfit <- mda(Species ~ ., data = iris, subclass = 2)
coef(mfit)

Confusion Matrices

Description

Compute the confusion matrix between two factors, or for an fda or mda object.

Usage

## Default S3 method:
confusion(object, true, ...)
## S3 method for class 'fda'
confusion(object, data, ...)

Arguments

`object`	the predicted factor, or an fda or mda model object.
`true`	the true factor.
`data`	a data frame (list) containing the test data.
`...`	further arguments to be passed to or from methods.

Details

This is a generic function.

Value

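For the default method, essentially `table(object, true)`, with additional attributes; in particular, the `"error"` attribute holds the misclassification rate (the proportion of cases off the diagonal), as in the example below.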
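The core of the default method can be sketched in base R as a cross-tabulation plus an error rate. This is a hypothetical re-implementation (the name `confusion_sketch` is invented here), and the real `confusion()` may attach other attributes. As a check against the iris example below, 1 + 2 = 3 of the 150 cases lie off the diagonal, giving 3/150 = 0.02.

```r
## Hypothetical sketch of the default confusion() behaviour:
## rows are predicted classes, columns are true classes.
confusion_sketch <- function(predicted, true) {
  ## use a common set of levels so the table is square
  lev <- union(levels(factor(true)), levels(factor(predicted)))
  tab <- table(factor(predicted, levels = lev), factor(true, levels = lev))
  ## misclassification rate: share of counts off the diagonal
  attr(tab, "error") <- 1 - sum(diag(tab)) / sum(tab)
  tab
}

pred  <- factor(c("a", "a", "b", "b"))
truth <- factor(c("a", "b", "b", "b"))
cm <- confusion_sketch(pred, truth)
attr(cm, "error")   # 1 of 4 cases misclassified: 0.25
```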

Examples

data(iris)
irisfit <- fda(Species ~ ., data = iris)
confusion(predict(irisfit, iris), iris$Species)
##            Setosa Versicolor Virginica 
##     Setosa     50          0         0
## Versicolor      0         48         1
##  Virginica      0          2        49
## attr(, "error"):
## [1] 0.02

Mixture example from "Elements of Statistical Learning"

Description

A list with training data and other details for the mixture example.

Usage

data(ESL.mixture)

Format

This list contains the following elements:

x: a 200x2 matrix of predictors.
y: a vector of 200 class labels, taking values 0 or 1.
xnew: a 6831x2 matrix of prediction points, on a 69x99 grid.
prob: a vector of 6831 true probabilities of a 1, one for each point in xnew.
marginal: the marginal distribution of the predictors at each point in xnew.
px1: grid values for first coordinate in xnew.
px2: grid values for second coordinate in xnew.
means: a 20 x 2 matrix of means used in the generation of these data.
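Because xnew lies on a regular 69 x 99 grid (69 * 99 = 6831), prob can be reshaped into a matrix for plotting with `contour()`. The sketch below assumes, without confirmation from this page, that the rows of xnew are ordered like `expand.grid(px1, px2)`, with the first coordinate varying fastest; it checks the reshaping on a small synthetic grid rather than on the real data.

```r
## Small synthetic grid standing in for px1/px2 (assumed layout: expand.grid order)
px1 <- seq(0, 1, length.out = 3)
px2 <- seq(0, 2, length.out = 4)
xnew <- as.matrix(expand.grid(px1, px2))   # first column varies fastest
z <- xnew[, 1] + 10 * xnew[, 2]            # a value attached to each grid point
## column-major fill: rows index px1, columns index px2
zmat <- matrix(z, nrow = length(px1), ncol = length(px2))
zmat[2, 3]                                  # value at (px1[2], px2[3])
```

If the assumed ordering holds for ESL.mixture, the same reshaping of prob into a length(px1) x length(px2) matrix, passed to `contour()` with `levels = 0.5`, would trace the Bayes decision boundary.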

Source

"Elements of Statistical Learning (second edition)", Hastie, T., Tibshirani, R. and Friedman, J. (2009), Springer, New York. https://hastie.su.domains/ElemStatLearn/

Flexible Discriminant Analysis

Description

Flexible discriminant analysis.

Usage

fda(formula, data, weights, theta, dimension, eps, method,
    keep.fitted, ...)

Arguments

`formula`	of the form `y ~ x`, describing the response and the predictors. The formula can be more complicated, such as `y ~ log(x) + z` (see `formula` for more details). The response should be a factor representing the class, or any vector that can be coerced to one (such as a logical variable).
`data`	data frame containing the variables in the formula (optional).
`weights`	an optional vector of observation weights.
`theta`	an optional matrix of class scores, typically with fewer than `J-1` columns.
`dimension`	the dimension of the solution, no greater than `J-1`, where `J` is the number of classes. Default is `J-1`.
`eps`	a threshold for small singular values for excluding discriminant variables; default is `.Machine$double.eps`.
`method`	regression method used in optimal scaling. Default is linear regression via the function `polyreg`, resulting in linear discriminant analysis. Other possibilities are `mars` and `bruto`. For penalized discriminant analysis, `gen.ridge` is appropriate.
`keep.fitted`	a logical variable, which determines whether the (sometimes large) component `"fitted.values"` of the `fit` component of the returned fda object should be kept. The default is `TRUE` if `n * dimension < 5000`.
`...`	additional arguments to `method`.

Value

An object of class "fda". Use predict to extract discriminant variables, posterior probabilities or predicted class memberships. Other extractor functions are coef, confusion and plot.

The object has the following components:

`percent.explained`	the percent of between-group variance explained, relative to the total explained, accumulated over dimensions (so the last entry is 100, as in the printed output in the Examples).
`values`	optimal scaling regression sum-of-squares for each dimension (see reference). The usual discriminant analysis eigenvalues are given by `values / (1-values)`, which are used to define `percent.explained`.
`means`	class means in the discriminant space. These are also scaled versions of the final theta's, or class scores, and can be used in a subsequent call to `fda` (this only makes sense if some columns of theta are omitted; see the references).
`theta.mod`	(internal) a class scoring matrix which allows `predict` to work properly.
`dimension`	dimension of discriminant space.
`prior`	class proportions for the training data.
`fit`	fit object returned by `method`.
`call`	the call that created this object (allowing it to be `update`-able)
`confusion`	confusion matrix when classifying the training data.
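The relationship between `values` and the eigenvalues can be illustrated with a small calculation. The `values` below are made-up numbers, not output from `fda`; the sketch converts them with `values / (1 - values)` and then expresses each eigenvalue as a share of the total, per dimension and cumulatively (the printed `fda` summary in the Examples ends at 100.00, which is consistent with the cumulative form).

```r
## Hypothetical optimal-scoring values for a 2-dimensional solution
values <- c(0.9, 0.3)
eig <- values / (1 - values)   # usual discriminant eigenvalues
pct <- 100 * eig / sum(eig)    # share of the total, per dimension
cum_pct <- cumsum(pct)         # cumulative share; the last entry is 100
round(cum_pct, 2)
```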

The method functions are required to take arguments x and y, where both can be matrices, and should produce a matrix of fitted.values the same size as y. They can take an additional weights argument, and should all have a ... argument for safety's sake. Any arguments to method can be passed on via the ... argument of fda. The default method polyreg has a degree argument, which allows polynomial regression of the required total degree. See the documentation for predict.fda for further requirements of method. The package earth is also suggested by this package; earth is a more detailed implementation of the MARS model, and works as a method argument.
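A custom method can be sketched along the lines of the requirements above: it accepts x, y, an optional weight vector and ..., and returns an object whose fitted.values matrix is the same size as y, together with a predict method. The names `wls_method` and `predict.wls_method` are hypothetical, and two details are assumptions not confirmed on this page: that fda passes the weights as the third positional argument, and that prediction calls `predict(fit, newx)`. Check the predict.fda documentation before relying on either.

```r
## Hypothetical fda() method: weighted linear least squares via stats::lm.wfit.
wls_method <- function(x, y, w, ...) {
  x <- cbind(1, as.matrix(x))                  # add an intercept column
  y <- as.matrix(y)
  if (missing(w) || is.null(w)) w <- rep(1, nrow(x))
  fit <- lm.wfit(x, y, w)
  beta <- as.matrix(fit$coefficients)          # (p + 1) x ncol(y); assumes full rank
  structure(list(coefficients = beta,
                 fitted.values = as.matrix(fit$fitted.values)),  # same size as y
            class = "wls_method")
}

## Matching predict method, assumed to be called as predict(fit, newx)
predict.wls_method <- function(object, x, ...) {
  cbind(1, as.matrix(x)) %*% object$coefficients
}

set.seed(1)
X <- matrix(rnorm(40), 20, 2)
Y <- cbind(X %*% c(1, -1) + 0.5, X[, 1])       # two response columns
fit <- wls_method(X, Y, rep(1, 20))
dim(fit$fitted.values)
```

If the assumptions hold, such a function would be supplied as `fda(Species ~ ., data = iris, method = wls_method)`, which should behave like the default linear method.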

Author(s)

Trevor Hastie and Robert Tibshirani

References

“Flexible Discriminant Analysis by Optimal Scoring” by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

“Penalized Discriminant Analysis” by Hastie, Buja and Tibshirani, 1995, Annals of Statistics, 73-102.

“Elements of Statistical Learning - Data Mining, Inference and Prediction” (2nd edition, Chapter 12) by Hastie, Tibshirani and Friedman, 2009, Springer

Examples

data(iris)
irisfit <- fda(Species ~ ., data = iris)
irisfit
## fda(formula = Species ~ ., data = iris)
##
## Dimension: 2 
##
## Percent Between-Group Variance Explained:
##     v1     v2 
##  99.12 100.00 
##
## Degrees of Freedom (per dimension): 5 
##
## Training Misclassification Error: 0.02 ( N = 150 )

confusion(irisfit, iris)
##            Setosa Versicolor Virginica 
##     Setosa     50          0         0
## Versicolor      0         48         1
##  Virginica      0          2        49
## attr(, "error"):
## [1] 0.02

plot(irisfit)

coef(irisfit)
##           [,1]        [,2]
## [1,] -2.126479 -6.72910343
## [2,] -0.837798  0.02434685
## [3,] -1.550052  2.18649663
## [4,]  2.223560 -0.94138258
## [5,]  2.838994  2.86801283

marsfit <- fda(Species ~ ., data = iris, method = mars)
marsfit2 <- update(marsfit, degree = 2)
marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2]) 
## this refits the model, using the fitted means (scaled theta's)
## from marsfit to start the iterations

Penalized Regression

Description

Perform a penalized regression, as used in penalized discriminant analysis.

Usage

gen.ridge(x, y, weights, lambda=1, omega, df, ...)

Arguments

`x`, `y`, `weights`	the x and y matrix and possibly a weight vector.
`lambda`	the shrinkage penalty coefficient.
`omega`	a penalty object; omega is the eigendecomposition of the penalty matrix, and need not have full rank. By default, standard ridge is used.
`df`	an alternative way to prescribe lambda, using the notion of equivalent degrees of freedom.
`...`	currently not used.

Value

A generalized ridge regression fit, in which the coefficients are penalized according to omega. See the function definition for further details. No functions are provided for producing one-dimensional penalty objects (omega); the function laplacian() creates a two-dimensional penalty object, suitable for (small) images.
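For the default case (no omega, so standard ridge), the link between lambda and the equivalent degrees of freedom can be sketched with the singular value decomposition of x: each singular direction is shrunk by d^2 / (d^2 + lambda), and df is the sum of those factors, so a target df can be turned into a lambda by root finding. This is a textbook illustration, not gen.ridge's code; it ignores observation weights and does not center x or fit an intercept, and the names `ridge_sketch` and `lambda_for_df` are hypothetical.

```r
## Hypothetical standard ridge regression via the SVD of x.
## No intercept, no centering, no observation weights (simplifying assumptions).
ridge_sketch <- function(x, y, lambda = 1) {
  x <- as.matrix(x); y <- as.matrix(y)
  s <- svd(x)                                    # x = U diag(d) V'
  shrink <- s$d / (s$d^2 + lambda)               # d / (d^2 + lambda)
  beta <- s$v %*% (shrink * crossprod(s$u, y))   # V diag(shrink) U'y
  df <- sum(s$d^2 / (s$d^2 + lambda))            # equivalent degrees of freedom
  list(coefficients = beta, fitted.values = x %*% beta, df = df)
}

## Find the lambda giving a target df (0 < target_df < ncol(x)) by root finding
lambda_for_df <- function(x, target_df) {
  d2 <- svd(as.matrix(x))$d^2
  f <- function(lambda) sum(d2 / (d2 + lambda)) - target_df
  uniroot(f, c(0, 1e8 * max(d2)), tol = 1e-10)$root
}

set.seed(42)
x <- matrix(rnorm(50 * 3), 50, 3)
y <- x %*% c(2, 0, -1) + rnorm(50)
ridge_sketch(x, y, lambda = 0)$df    # no shrinkage: df equals ncol(x)
ridge_sketch(x, y, lambda = 10)$df   # shrinkage lowers df below ncol(x)
ridge_sketch(x, y, lambda_for_df(x, 2))$df
```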

Glass Identification Database

Description

The glass data frame has 214 observations and 10 variables, representing glass fragments.

Usage

data(glass)

Format

This data frame contains the following columns:

RI

refractive index

Na