Package 'dawai' reference manual

Title:	Discriminant Analysis with Additional Information
Description:	In applications, some additional information is often available. The dawai package (an acronym for Discriminant Analysis With Additional Information) performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimates of the true error rate.
Authors:	David Conde [aut, cre], Miguel A. Fernandez [aut], Bonifacio Salvador [aut]
Maintainer:	David Conde
License:	GPL (>= 2)
Version:	1.2.7
Built:	2025-02-13 06:51:01 UTC
Source:	CRAN

Discriminant analysis with additional information

Description

This package performs linear and quadratic discriminant analysis with additional information expressed as inequality constraints among the population means, and computes several estimates of the true error rate.

Details

Package: dawai

Type: Package

Version: 1.2.7

Date: 2024-10-15

License: GPL-2 | GPL-3

For a complete list of functions with individual help pages, use library(help = "dawai").

Author(s)

David Conde, Miguel A. Fernandez, Bonifacio Salvador

Maintainer: David Conde

References

Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.

Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.

Conde, D., Salvador, B., Rueda, C., and Fernandez, M. A. (2013). Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial. Statistical Applications in Genetics and Molecular Biology, 12(5), 583-602.

Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.

Restricted Discriminant Analysis. True Error Rate estimation

Description

err.est is a generic function for true error rate estimations of classification rules built with additional information. The function invokes particular methods which depend on the class of the first argument.

Usage

err.est(x, ...)

Arguments

`x`	An object for which true error rate estimations are desired.
`...`	Additional arguments affecting the true error rate estimations produced.

Value

See the documentation of the particular methods for details of what is produced by each method.

Author(s)

David Conde

Restricted Linear Discriminant Analysis. True Error Rate estimation

Description

Estimate the true error rate of linear classification rules built with additional information (in conjunction with rlda).

Usage

## S3 method for class 'rlda'
err.est(x, nboot = 50, gamma = x$gamma, prior = x$prior, ...)

Arguments

`x`	An object of class `'rlda'`.
`nboot`	Number of bootstrap samples used to estimate the true error rate of the classification rules.
`gamma`	A vector of values specifying which rules to take among the ones in `x`. If unspecified, all rules built with `x$gamma` will be used. If present, `gamma` must be contained in `x$gamma`.
`prior`	The prior probabilities of class membership. If unspecified, `x$prior` probabilities are used. If present, the probabilities must be specified in the order of the factor levels.
`...`	Arguments passed to or from other methods.

Details

This function is a method for the generic function err.est() for class 'rlda'.

Value

A list with components

`call`	The (matched) function call.
`restrictions`	Character vector with the restrictions on the means vector detailed.
`prior`	The prior probabilities of the classes used.
`counts`	The number of observations of the classes used.
`N`	The total number of observations used.
`estimationError`	Matrix with BT2, BT3, BT2CV and BT3CV true error rate estimates of the rules.

Note

To overcome singularity of the covariance matrices after bootstrapping, the number of observations in each class must be greater than the number of explanatory variables divided by 0.632.
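
A quick pre-check of this condition can be sketched in base R before calling err.est. The helper name `min_class_size_ok` is hypothetical and not part of dawai; it only compares the class counts with the bound stated in this note.

```r
# Hypothetical helper (not part of dawai): check that every class has more
# observations than (number of explanatory variables) / 0.632.
min_class_size_ok <- function(grouping, n_vars) {
  counts <- table(grouping)      # observations per class
  threshold <- n_vars / 0.632    # bound stated in the Note
  all(counts > threshold)
}

# Toy grouping vector: classes 1 and 2 have 4 cases, class 3 has 2
grouping <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3))

min_class_size_ok(grouping, n_vars = 2)  # bound is 2 / 0.632, about 3.16
min_class_size_ok(grouping, n_vars = 1)  # bound is 1 / 0.632, about 1.58
```

With two explanatory variables the smallest class (2 cases) falls below the bound, so the check fails; with one variable it passes.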

Author(s)

David Conde

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data = Vehicle2[, c("Holl.Ra", "Sc.Var.maxis")]
grouping = Vehicle2$Class
levels(grouping) <- c(3, 1, 1, 2)  
## now we can consider the following restrictions:
## mu11 >= mu21 >= mu31
## 
## we can specify these restrictions by restext = "s>1"

set.seed(-1007)
values <- runif(length(rownames(data)))
trainsubset <- values < 0.05
testsubset <- values >= 0.05
obj <- rlda(data, grouping, subset = trainsubset, restext = "s>1")
pred <- predict(obj, data[testsubset,], grouping = grouping[testsubset],
                prior = c(1/3, 1/3,1/3))
pred$error.rate
err.est(obj, 30, prior = c(1/3, 1/3, 1/3))

Restricted Quadratic Discriminant Analysis. True Error Rate Estimation

Description

Estimate the true error rate of quadratic classification rules built with additional information (in conjunction with rqda).

Usage

## S3 method for class 'rqda'
err.est(x, nboot = 50, gamma = x$gamma, prior = x$prior, ...)

Arguments

`x`	An object of class `'rqda'`.
`nboot`	Number of bootstrap samples used to estimate the true error rate of the classification rules.
`gamma`	A vector of values specifying which rules to take among the ones in `x`. If unspecified, all rules built with `x$gamma` will be used. If present, `gamma` must be contained in `x$gamma`.
`prior`	The prior probabilities of class membership. If unspecified, `x$prior` probabilities are used. If present, the probabilities must be specified in the order of the factor levels.
`...`	Arguments passed to or from other methods.

Details

This function is a method for the generic function err.est() for class 'rqda'.

Value

A list with components

`call`	The (matched) function call.
`restrictions`	Character vector with the restrictions on the means vector detailed.
`prior`	The prior probabilities of the classes used.
`counts`	The number of observations of the classes used.
`N`	The total number of observations used.
`estimationError`	Matrix with BT2, BT3, BT2CV and BT3CV true error rate estimates of the rules.

Note

To overcome singularity of the covariance matrices after bootstrapping, the number of observations in each class must be greater than the number of explanatory variables divided by 0.632.
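
The constant 0.632 in this bound is presumably the usual bootstrap figure: a resample of size n drawn with replacement contains, on average, a fraction 1 - (1 - 1/n)^n of the distinct original observations, which tends to 1 - 1/e, about 0.632. The source does not state this derivation, so treat the reading as an assumption. A small base-R simulation, using the hypothetical helper `distinct_fraction`, illustrates the figure:

```r
# Fraction of distinct original observations in one bootstrap resample
# of size n (hypothetical helper, not part of dawai).
distinct_fraction <- function(n) {
  idx <- sample.int(n, size = n, replace = TRUE)
  length(unique(idx)) / n
}

set.seed(1)
n <- 50
simulated <- mean(replicate(2000, distinct_fraction(n)))
exact <- 1 - (1 - 1 / n)^n   # expected fraction for this n
limit <- 1 - exp(-1)         # large-n limit, about 0.632

c(simulated = simulated, exact = exact, limit = limit)
```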

Author(s)

David Conde

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data = Vehicle2[, c("Kurt.Maxis", "Holl.Ra", "Sc.Var.maxis")]
grouping = Vehicle2$Class
levels(grouping) <- c(3, 1, 1, 2)  
## now we can consider the following restrictions:
## mu11 >= mu21 >= mu31
## mu12 >= mu22 >= mu32
## 
## we can specify these restrictions by restext = "s>1,2"

set.seed(5561)
values <- runif(length(rownames(data)))
trainsubset <- values < 0.05
testsubset <- values >= 0.05
obj <- rqda(data, grouping, subset = trainsubset, restext = "s>1,2")
pred <- predict(obj, data[testsubset,], grouping = grouping[testsubset],
                prior = c(1/3, 1/3,1/3))
pred$error.rate
err.est(obj, 30, prior = c(1/3, 1/3, 1/3))

Minimize Inequality Constrained Mahalanobis Distance

Description

Find the vector z that solves:

min{ (x - z)'inv(S)(x - z); Az <= b },

where x is an input vector, S its covariance matrix, A is a matrix of known contrasts, and b is a vector of known constraint constants.
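
For intuition, the problem has a closed form when there is a single inequality constraint a'z <= b: if a'x <= b then z = x, otherwise z = x - S a (a'x - b) / (a'S a), and the minimum distance is (a'x - b)^2 / (a'S a). The base-R sketch below covers only this special case. The function name `project_one_constraint` is hypothetical, and this is not the iterative algorithm that lsConstrain.fit uses for several constraints.

```r
# Hypothetical helper: minimize (x - z)' inv(S) (x - z) subject to a single
# constraint sum(a * z) <= b, using the closed-form solution.
project_one_constraint <- function(x, S, a, b) {
  slack <- sum(a * x) - b
  if (slack <= 0) return(x)       # x is already feasible
  Sa <- as.vector(S %*% a)
  x - Sa * slack / sum(a * Sa)    # move x onto the boundary a'z = b
}

x <- c(1.5, 2)
S <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
a <- c(0, 1)   # constraint: z[2] <= 0
b <- 0

z <- project_one_constraint(x, S, a, b)
z          # c(0.5, 0)

min_dist <- as.numeric(t(x - z) %*% solve(S) %*% (x - z))
min_dist   # 4
```

Because the two components of x are correlated through S, enforcing the constraint on the second component also shifts the first one.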

Usage

lsConstrain.fit(x, b, s, a, iflag, itmax=4000, eps=1e-06, eps2=1e-06)

Arguments

`x`	vector of length n
`b`	vector of length k, containing constraint constants
`s`	matrix of dim n x n, the covariance matrix for vector x
`a`	matrix of dim k x n, for the constraints
`iflag`	vector of length k; an item = 0 if inequality constraint, 1 if equality constraint
`itmax`	scalar giving the maximum number of iterations
`eps`	scalar giving the accuracy required for convergence
`eps2`	scalar tolerance below which a value is treated as zero

Value

List with the following components:

itmax: (defined above)

eps: (defined above)

eps2: (defined above)

iflag: (defined above)

xkt: vector of length k, for the Kuhn-Tucker coefficients.

iter: number of completed iterations.

supdif: greatest difference between estimates across a full cycle

ifault: error indicator: 0 = no error; 1 = itmax exceeded; 3 = invalid constraint function for some row (ASA' = 0).

a: (defined above)

call: function call

x.init: input vector x.

x.final: the vector "z" that solves the equation (see z in description).

s: (defined above)

min.dist: the minimum value of the function (see description).

References

Wollan PC, Dykstra RL. Minimizing inequality constrained Mahalanobis distances. Applied Statistics Algorithm AS 225 (1987).

Examples

# A simulation example of linear regression with 3 betas,
# where we have the constraints:
#
# b[1] > 0
# b[2] - b[1] < 0
# b[3] < 0


set.seed(111)

n <- 100
x <- rep(1:3,rep(n,3))
x <- 1*outer(x,1:3,"==")

beta <- c(2,1,1)

y <- x%*%beta + rnorm(nrow(x))

fit <- lm(y ~-1 + x)

s <- solve( t(x) %*% x )

bhat <- fit$coef


a <-  rbind(c(-1, 0, 0),
            c(-1, 1, 0),
            c( 0, 0, 1))

# View expected constraints (3rd not met):

a %*% bhat
#            [,1] 
# [1,] -1.8506811
# [2,] -0.9543320
# [3,]  0.8590827

b <- rep(0, nrow(a))
iflag <- rep(0,length(b))

save <- lsConstrain.fit(x=bhat,b=b, s=s, a=a, iflag=iflag, itmax=500, 
                        eps=1e-6, eps2=1e-6)

save

Restricted Linear Discriminant Analysis. Multivariate Observations Classification

Description

Classify multivariate observations with linear classification rules built with additional information in conjunction with rlda.

Usage

## S3 method for class 'rlda'
predict(object, newdata, prior = object$prior,
        gamma = object$gamma, grouping = NULL, ...)

Arguments

`object`	An object of class `'rlda'`.
`newdata`	A data frame of cases to be classified, containing the variables used on creating `object`. A vector will be interpreted as a row vector.
`prior`	The prior probabilities of class membership. If unspecified, `object$prior` probabilities are used. If present, the probabilities must be specified in the order of the factor levels.
`gamma`	A vector of values specifying which rules to take among the ones in `object`. If unspecified, all rules built with `object$gamma` will be used. If present, `gamma` must be contained in `object$gamma`.
`grouping`	A numeric vector or factor with numeric levels specifying the class for each observation. If present, true error rate will be estimated from `newdata`.
`...`	Arguments passed to or from other methods.

Details

This function is a method for the generic function predict() for class 'rlda'.

Value

A list with components

`call`	The (matched) function call.
`class`	Matrix with the classification for each rule (in columns).
`prior`	The prior probabilities of the classes used.
`posterior`	Array with the posterior probabilities of the classes for each rule.
`error.rate`	True error rate estimation (when `grouping` specified) for each rule, based on `newdata`.

Note

If there are missing values in newdata, corresponding observations will not be classified.

If there are missing values in grouping, corresponding observations will not be considered on calculating the true error rate.
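
The handling of missing values described in these notes can be illustrated with a small base-R sketch. The helper `test_error_rate` is hypothetical and only mirrors the stated behaviour (unclassified or unlabelled cases are left out, and the rate is reported as a percentage, as in the example output); it is not the code used inside predict.

```r
# Hypothetical helper (not part of dawai): percentage of misclassified cases,
# ignoring cases whose predicted or true class is missing.
test_error_rate <- function(predicted, truth) {
  keep <- !is.na(predicted) & !is.na(truth)
  100 * mean(predicted[keep] != truth[keep])
}

predicted <- c(1, 2, 2, NA, 3, 1)   # 4th case could not be classified
truth     <- c(1, 2, 3, 2, NA, 2)   # 5th case has no known class

test_error_rate(predicted, truth)   # 2 errors among the 4 usable cases: 50
```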

Author(s)

David Conde

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data <- Vehicle2
levels(data$Class) <- c(4, 2, 1, 3)  
## classes ordered by increasing size
## 
## according to variable definitions, we can 
## consider the following restrictions on the means vectors:
## mu11, mu21 >= mu31 >= mu41
## mu12, mu22 >= mu32 >= mu42
## 
## we have 6 restrictions, 3 predictors and 4 classes, so
## resmatrix must be a 6 x 12 matrix:

A <- matrix(0, ncol = 12, nrow = 6)
A[t(matrix(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 7, 6, 8), nrow = 2))] <- -1
A[t(matrix(c(1, 7, 2, 8, 3, 7, 4, 8, 5, 10, 6, 11), nrow = 2))] <- 1

set.seed(983)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
testsubset <- values >= 0.2
obj <- rlda(Class ~ Kurt.Maxis + Holl.Ra + Sc.Var.maxis,
            data, subset = trainsubset, gamma = c(0, 0.5, 1),
            resmatrix = A)
pred <- predict(obj, newdata = data[testsubset,], 
                grouping = data[testsubset, "Class"],
                prior = rep(1/4, 4))
pred$error.rate
## we can see that the test error rate of the restricted
## rules decreases with gamma:
##                       gamma=0 gamma=0.5  gamma=1
## True error rate (%): 40.86957  39.71014 39.71014

Restricted Quadratic Discriminant Analysis. Multivariate Observations Classification

Description

Classify multivariate observations with quadratic classification rules built with additional information in conjunction with rqda.

Usage

## S3 method for class 'rqda'
predict(object, newdata, prior = object$prior,
        gamma = object$gamma, grouping = NULL, ...)

Arguments

`object`	An object of class `'rqda'`.
`newdata`	A data frame of cases to be classified, containing the variables used on creating `object`. A vector will be interpreted as a row vector.
`prior`	The prior probabilities of class membership. If unspecified, `object$prior` probabilities are used. If present, the probabilities must be specified in the order of the factor levels.
`gamma`	A vector of values specifying which rules to take among the ones in `object`. If unspecified, all rules built with `object$gamma` will be used. If present, `gamma` must be contained in `object$gamma`.
`grouping`	A numeric vector or factor with numeric levels specifying the class for each observation. If present, true error rate will be estimated from `newdata`.
`...`	Arguments passed to or from other methods.

Details

This function is a method for the generic function predict() for class 'rqda'.

Value

A list with components

`call`	The (matched) function call.
`class`	Matrix with the classification for each rule (in columns).
`prior`	The prior probabilities of the classes used.
`posterior`	Array with the posterior probabilities of the classes for each rule.
`error.rate`	True error rate estimation (when `grouping` specified) for each rule, based on `newdata`.

Note

If there are missing values in newdata, corresponding observations will not be classified.

If there are missing values in grouping, corresponding observations will not be considered on calculating the true error rate.

Author(s)

David Conde

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data <- Vehicle2[, 1:4]
grouping = Vehicle2$Class
levels(grouping) <- c(4, 2, 1, 3)
## classes ordered by increasing size
## 
## according to variable definitions, we can consider 
## the following restrictions on the means vectors:
## mu11 >= mu21 >= mu31 >= mu41
## mu12 >= mu22 >= mu32 >= mu42
## mu13 >= mu23 >= mu33 >= mu43
## 
## we can specify these restrictions by restext = "s>1,2,3"

set.seed(7964)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
testsubset <- values >= 0.2
obj <- rqda(data, grouping, subset = trainsubset,
            gamma = (1:5)/5, restext = "s>1,2,3")
pred <- predict(obj, newdata = data[testsubset,], 
                grouping = grouping[testsubset])
pred$error.rate
## we can see that the test error rate of the restricted
## rules decreases with gamma:
##                      gamma=0.2 gamma=0.4 gamma=0.6 gamma=0.8  gamma=1
## True error rate (%):  40.14815  39.85185  39.85185  39.11111 39.11111

Restricted Linear Discriminant Analysis

Description

Build linear classification rules with additional information expressed as inequality restrictions among the population means.

Usage

rlda(x, ...)

## S3 method for class 'matrix'
rlda(x, ...)

## S3 method for class 'data.frame'
rlda(x, grouping, ...)

## S3 method for class 'formula'
rlda(formula, data, ...)

## Default S3 method:
rlda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL,
     gamma = c(0, 1), prior = NULL, ...)

Arguments

`formula`	A formula of the form `groups ~ x1 + x2 + ...`. That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.
`data`	Data frame from which variables specified in `formula` are to be taken.
`x`	(Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables.
`grouping`	(Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation.
`subset`	An index vector specifying the cases to be used in the training sample.
`resmatrix`	A matrix specifying the linear restrictions on the mean vectors: `resmatrix %*% mu <= 0`, where `mu = c(mu1, mu2, ...)` and `mui` is the mean vector of class `i`. If unspecified, `restext` will be required (and `resmatrix` established accordingly).
`restext`	(Required if no `resmatrix` argument is given.) A character string from which `resmatrix` will be calculated. The first element must be either `"s"` (simple order) or `"t"` (tree order: `mu1 >= mu2, mu1 >= mu3 ...`). The second element must be either `"<"` (increasing componentwise order) or `">"` (decreasing componentwise order). The rest of the elements must be numbers from 1 to the number of explanatory variables, separated by commas, specifying among which variables the restrictions hold. For example, `"s<1,3"` will stand for `mu11 <= mu21 <= mu31 <= ...`, `mu13 <= mu23 <= mu33 <= ...`
`gamma`	A vector of values in the unit interval that determine the classification rules with additional information (see references).
`prior`	The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels.
`...`	Arguments passed to or from other methods.
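
To make the link between `restext` and `resmatrix` concrete, the sketch below builds a restriction matrix for a simple order by hand. The helper `simple_order_resmatrix` and its arguments are hypothetical, not part of dawai. It assumes that `mu` stacks the class mean vectors as `c(mu1, mu2, ...)`, so that variable `j` of class `i` sits at position `(i - 1) * p + j`. The matrix that rlda derives internally from `restext` may order its rows differently.

```r
# Hypothetical helper (not part of dawai): build A such that A %*% mu <= 0
# encodes a simple order among k class means on the chosen variables,
# with mu = c(mu1, ..., muk) and p explanatory variables per class.
simple_order_resmatrix <- function(k, p, vars, decreasing = FALSE) {
  col_of <- function(class, var) (class - 1) * p + var
  A <- matrix(0, nrow = length(vars) * (k - 1), ncol = k * p)
  sgn <- if (decreasing) -1 else 1
  r <- 0
  for (v in vars) {
    for (i in seq_len(k - 1)) {
      r <- r + 1
      # increasing order: mu[i, v] - mu[i + 1, v] <= 0; decreasing flips signs
      A[r, col_of(i, v)] <- sgn
      A[r, col_of(i + 1, v)] <- -sgn
    }
  }
  A
}

# The restrictions described by "s<1,3" for 3 classes and 3 variables
A <- simple_order_resmatrix(k = 3, p = 3, vars = c(1, 3))
dim(A)   # 4 restrictions, 9 mean components

# A mean vector that satisfies both increasing chains
mu <- c(1, 0, 1,   2, 0, 2,   3, 0, 3)
all(A %*% mu <= 0)
```

Each row carries one +1 and one -1, so every restriction compares the same variable in two consecutive classes.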

Details

Specifying the prior will affect the classification and error unless overridden in predict.rlda and err.est.rlda, respectively.

Value

An object of class 'rlda' containing the following components:

`call`	The (matched) function call.
`trainset`	Matrix with the training set used (first columns) and the class for each observation (last column).
`restrictions`	Edited character string with the linear restrictions on the mean vectors detailed.
`resmatrix`	The matrix with the restrictions on the mean vectors used.
`prior`	Prior probabilities of class membership used.
`counts`	The number of observations of the classes used.
`N`	The total number of observations used.
`samplemeans`	Matrix with the sample means in rows.
`samplevariances`	Array with the sample covariance matrices of the classes.
`gamma`	Gamma values used.
`spooled`	Pooled covariance matrix.
`estimatedmeans`	Array with the estimated means for each classification rule.
`apparent`	Apparent error rate for each classification rule.

Note

This function may be called giving either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.

Classes must be identified, either in a column of data or in the grouping vector, by natural numbers varying from 1 to the number of classes. The number of classes must be greater than 1.
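
Recoding a labelled factor into numeric classes can be done with `levels<-`, as the examples in this manual do. The base-R sketch below uses a small made-up factor rather than the Vehicle2 data; assigning the same number to two levels merges them into one class.

```r
# Toy factor with the same four labels as Vehicle2$Class
classes <- factor(c("bus", "opel", "saab", "van", "bus", "saab"))
levels(classes)   # alphabetical order: "bus" "opel" "saab" "van"

grouping <- classes
levels(grouping) <- c(3, 1, 1, 2)   # bus -> 3, opel and saab -> 1, van -> 2
levels(grouping)                    # "3" "1" "2"

# Cross-tabulate to confirm the mapping
table(original = classes, recoded = grouping)

# Numeric class labels, if a plain numeric vector is preferred
as.numeric(as.character(grouping))
```

Note that the recoded levels appear in the order "3", "1", "2", which matters wherever arguments must follow the order of the factor levels.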

If there are missing values in either data, x or grouping, corresponding observations will be deleted.

To overcome singularity of the covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.

Author(s)

David Conde

References

Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data <- Vehicle2
levels(data$Class) <- c(4, 2, 1, 3)  
## classes ordered by increasing size
## 
## according to variable definitions, we can 
## consider the following restrictions on the means vectors:
## mu11, mu21 >= mu31 >= mu41
## mu12, mu22 >= mu32 >= mu42
## 
## we have 6 restrictions, 3 predictors and 4 classes, so
## resmatrix must be a 6 x 12 matrix:

A <- matrix(0, ncol = 12, nrow = 6)
A[t(matrix(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 7, 6, 8), nrow = 2))] <- -1
A[t(matrix(c(1, 7, 2, 8, 3, 7, 4, 8, 5, 10, 6, 11), nrow = 2))] <- 1

set.seed(983)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rlda(Class ~ Kurt.Maxis + Holl.Ra + Sc.Var.maxis,
            data, subset = trainsubset, gamma = c(0, 0.5, 1),
            resmatrix = A)
obj
## we can see that the apparent error rate of the restricted
## rules decreases with gamma:
##  gamma=0 gamma=0.5   gamma=1
## 42.30769  41.66667  41.02564

Restricted Quadratic Discriminant Analysis

Description

Build quadratic classification rules with additional information expressed as inequality restrictions among the population means.

Usage

rqda(x, ...)

## S3 method for class 'matrix'
rqda(x, ...)

## S3 method for class 'data.frame'
rqda(x, grouping, ...)

## S3 method for class 'formula'
rqda(formula, data, ...)

## Default S3 method:
rqda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL, 
     gamma = c(0, 1), prior = NULL, ...)

Arguments

`formula`	A formula of the form `groups ~ x1 + x2 + ...`. That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.
`data`	Data frame from which variables specified in `formula` are to be taken.
`x`	(Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables.
`grouping`	(Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation.
`subset`	An index vector specifying the cases to be used in the training sample.
`resmatrix`	A matrix specifying the linear restrictions on the mean vectors: `resmatrix %*% mu <= 0`, where `mu = c(mu1, mu2, ...)` and `mui` is the mean vector of class `i`. If unspecified, `restext` will be required (and `resmatrix` established accordingly).
`restext`	(Required if no `resmatrix` argument is given.) A character string from which `resmatrix` will be calculated. The first element must be either `"s"` (simple order) or `"t"` (tree order: `mu1 >= mu2, mu1 >= mu3 ...`). The second element must be either `"<"` (increasing componentwise order) or `">"` (decreasing componentwise order). The rest of the elements must be numbers from 1 to the number of explanatory variables, separated by commas, specifying among which variables the restrictions hold. For example, `"s<1,3"` will stand for `mu11 <= mu21 <= mu31 <= ...`, `mu13 <= mu23 <= mu33 <= ...`
`gamma`	A vector of values in the unit interval that determine the classification rules with additional information (see references).
`prior`	The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels.
`...`	Arguments passed to or from other methods.
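
The tree order mentioned under `restext` (`mu1 >= mu2, mu1 >= mu3 ...`) can also be written out by hand as a `resmatrix`. The sketch below is a hypothetical helper, not part of dawai, and covers only the direction spelled out above, where the first class dominates all others. It assumes `mu = c(mu1, mu2, ...)`, so that variable `j` of class `i` sits at position `(i - 1) * p + j`.

```r
# Hypothetical helper (not part of dawai): build A such that A %*% mu <= 0
# encodes mu1 >= mui (i = 2, ..., k) on the chosen variables,
# with mu = c(mu1, ..., muk) and p explanatory variables per class.
tree_order_resmatrix <- function(k, p, vars) {
  col_of <- function(class, var) (class - 1) * p + var
  A <- matrix(0, nrow = length(vars) * (k - 1), ncol = k * p)
  r <- 0
  for (v in vars) {
    for (i in seq_len(k - 1) + 1) {   # classes 2, ..., k
      r <- r + 1
      A[r, col_of(i, v)] <- 1         # + mu[i, v]
      A[r, col_of(1, v)] <- -1        # - mu[1, v]
    }
  }
  A
}

# 3 classes, 2 variables, restriction on variable 1 only
A <- tree_order_resmatrix(k = 3, p = 2, vars = 1)
A

# Class 1 dominates classes 2 and 3, but classes 2 and 3 are not ordered
mu <- c(5, 0,   3, 0,   4, 0)
all(A %*% mu <= 0)
```

Unlike a simple order, the tree order places no restriction between classes 2 and 3, which is why this `mu` is admissible.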

Details

Specifying the prior will affect the classification and error unless overridden in predict.rqda and err.est.rqda, respectively.

Value

An object of class 'rqda' containing the following components:

`call`	The (matched) function call.
`trainset`	Matrix with the training set used (first columns) and the class for each observation (last column).
`restrictions`	Edited character string with the linear restrictions on the mean vectors detailed.
`resmatrix`	The matrix with the restrictions on the mean vectors used.
`prior`	Prior probabilities of class membership used.
`counts`	The number of observations of the classes used.
`N`	The total number of observations used.
`samplemeans`	Matrix with the sample means in rows.
`samplevariances`	Array with the sample covariance matrices of the classes.
`gamma`	Gamma values used.
`estimatedmeans`	Array with the estimated means for each classification rule.
`apparent`	Apparent error rate for each classification rule.

Note

This function may be called using either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.

Classes must be identified, either in a column of data or in the grouping vector, by natural numbers varying from 1 to the number of classes. The number of classes must be greater than 1.

If there are missing values in either data, x or grouping, corresponding observations will be deleted.

To overcome singularity of the covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.

Author(s)

David Conde

References

Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.

Examples

data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"

data <- Vehicle2[, 1:4]
grouping = Vehicle2$Class
levels(grouping) <- c(4, 2, 1, 3)
## classes ordered by increasing size
## 
## according to variable definitions, we can consider
## the following restrictions on the means vectors:
## mu11 >= mu21 >= mu31 >= mu41
## mu12 >= mu22 >= mu32 >= mu42
## mu13 >= mu23 >= mu33 >= mu43
## 
## we can specify these restrictions by restext = "s>1,2,3"

set.seed(7964)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rqda(data, grouping, subset = trainsubset,
            gamma = (1:5)/5, restext = "s>1,2,3")
obj
## we can see that the apparent error rate of the restricted
## rules increases with gamma:
## gamma=0.2 gamma=0.4 gamma=0.6 gamma=0.8   gamma=1
##  30.40936  30.99415  30.99415  30.99415  31.57895

Vehicle Silhouettes 2

Description

The purpose is to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many different angles. The features were extracted from the silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS, which extracts a combination of scale independent features utilising both classical moments based measures such as scaled variance, skewness and kurtosis about the major/minor axes and heuristic measures such as hollows, circularity, rectangularity and compactness.

Four "Corgie" model vehicles were used for the experiment: a double decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This particular combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.

Usage

data(Vehicle2)

Format

A data frame with 846 observations on 5 variables: four numerical and one nominal defining the class of the objects.

[,1]	Skew.maxis	Skewness about minor axis
[,2]	Kurt.Maxis	Kurtosis about major axis
[,3]	Holl.Ra	Hollows ratio: (area of hollows)/(area of bounding polygon)
[,4]	Sc.Var.maxis	Scaled variance along minor axis: (2nd order moment about minor axis)/area
[,5]	Class	Type

Source

Creator: Drs. Pete Mowforth and Barry Shepherd, Turing Institute, Glasgow, Scotland.

These data have been taken from the UCI Repository Of Machine Learning Databases at

http://archive.ics.uci.edu/ml/index.php

and were converted to R format by Evgenia Dimitriadou.

References

Turing Institute Research Memorandum TIRM-87-018 "Vehicle Recognition Using Rule Based Methods" by Siebert, JP (March 1987).

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Examples

data(Vehicle2)
summary(Vehicle2)
