Title: Discriminant Analysis with Additional Information
Description: In applications it is usual that some additional information is available. This package, dawai (an acronym for Discriminant Analysis With Additional Information), performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimations of the true error rate.
Authors: David Conde [aut, cre], Miguel A. Fernandez [aut], Bonifacio Salvador [aut]
Maintainer: David Conde <[email protected]>
License: GPL (>= 2)
Version: 1.2.7
Built: 2024-10-16 12:33:41 UTC
Source: CRAN
This package performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means, and computes several estimations of the true error rate.
Package: dawai
Type: Package
Version: 1.2.7
Date: 2024-10-15
License: GPL-2 | GPL-3
For a complete list of functions with individual help pages, use library(help = "dawai").
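The function-level examples later in this manual can be condensed into the following minimal workflow sketch. It mirrors the err.est.rlda example below (same data, seed and restriction); object names such as fit and pred are illustrative, and the optional prior argument is left at its default.

library(dawai)
data(Vehicle2)
x <- Vehicle2[, c("Holl.Ra", "Sc.Var.maxis")]
grouping <- Vehicle2$Class
levels(grouping) <- c(3, 1, 1, 2)       ## classes must be coded 1, 2, ..., k
set.seed(-1007)
train <- runif(nrow(x)) < 0.05          ## small training sample, as in the examples
## restricted linear rules with mu11 >= mu21 >= mu31 on the first variable
fit <- rlda(x, grouping, subset = train, restext = "s>1")
pred <- predict(fit, x[!train, ], grouping = grouping[!train])
pred$error.rate                         ## test-set error rate estimate
err.est(fit, nboot = 30)                ## bootstrap estimates of the true error rate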
David Conde, Miguel A. Fernandez, Bonifacio Salvador
Maintainer: David Conde <[email protected]>
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Conde, D., Salvador, B., Rueda, C., and Fernandez, M. A. (2013). Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial. Statistical Applications in Genetics and Molecular Biology, 12(5), 583-602.
Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.
err.est is a generic function for true error rate estimations of classification rules built with additional information. The function invokes particular methods which depend on the class of the first argument.
err.est(x, ...)
x: An object for which true error rate estimations are desired.
...: Additional arguments affecting the true error rate estimations produced.
See the documentation of the particular methods for details of what is produced by each method.
David Conde
Estimate the true error rate of linear classification rules built with additional information (in conjunction with rlda).
## S3 method for class 'rlda'
err.est(x, nboot = 50, gamma = x$gamma, prior = x$prior, ...)
x: An object of class 'rlda'.
nboot: Number of bootstrap samples used to estimate the true error rate of the classification rules.
gamma: A vector of values specifying which rules to take among the ones in x$gamma.
prior: The prior probabilities of class membership. If unspecified, x$prior is used.
...: Arguments passed to or from other methods.
This function is a method for the generic function err.est() for class 'rlda'.
A list with components:
call: The (matched) function call.
restrictions: Character vector detailing the restrictions on the mean vectors.
prior: The prior probabilities of the classes used.
counts: The number of observations of the classes used.
N: The total number of observations used.
estimationError: Matrix with the BT2, BT3, BT2CV and BT3CV true error rate estimates of the rules.
To overcome singularity of the covariance matrices after bootstrapping, the number of observations in each class must be greater than the number of explanatory variables divided by 0.632.
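As a rough aid, the smallest class size meeting this requirement can be computed from the number of explanatory variables; min_class_size below is a hypothetical helper, not part of the package.

## Sketch: smallest integer strictly greater than p / 0.632,
## where p is the number of explanatory variables.
min_class_size <- function(p) floor(p / 0.632) + 1
min_class_size(2)   ## e.g. 4 observations per class with 2 predictors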
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Conde, D., Salvador, B., Rueda, C., and Fernandez, M. A. (2013). Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial. Statistical Applications in Genetics and Molecular Biology, 12(5), 583-602.
err.est, rlda, predict.rlda, rqda, predict.rqda, err.est.rqda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2[, c("Holl.Ra", "Sc.Var.maxis")]
grouping <- Vehicle2$Class
levels(grouping) <- c(3, 1, 1, 2)
## now we can consider the following restrictions:
## mu11 >= mu21 >= mu31
##
## we can specify these restrictions by restext = "s>1"
set.seed(-1007)
values <- runif(length(rownames(data)))
trainsubset <- values < 0.05
testsubset <- values >= 0.05
obj <- rlda(data, grouping, subset = trainsubset, restext = "s>1")
pred <- predict(obj, data[testsubset, ], grouping = grouping[testsubset],
                prior = c(1/3, 1/3, 1/3))
pred$error.rate
err.est(obj, 30, prior = c(1/3, 1/3, 1/3))
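A possible follow-up (a sketch that assumes the objects created in the example above are still in the workspace): store the err.est() result and inspect the estimationError component described in the Value section.

## Continuing the example above (sketch): keep the err.est() result and
## look at the matrix of BT2, BT3, BT2CV and BT3CV estimates.
est <- err.est(obj, 30, prior = c(1/3, 1/3, 1/3))
round(est$estimationError, 2)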
Estimate the true error rate of quadratic classification rules built with additional information (in conjunction with rqda).
## S3 method for class 'rqda'
err.est(x, nboot = 50, gamma = x$gamma, prior = x$prior, ...)
x: An object of class 'rqda'.
nboot: Number of bootstrap samples used to estimate the true error rate of the classification rules.
gamma: A vector of values specifying which rules to take among the ones in x$gamma.
prior: The prior probabilities of class membership. If unspecified, x$prior is used.
...: Arguments passed to or from other methods.
This function is a method for the generic function err.est() for class 'rqda'.
A list with components:
call: The (matched) function call.
restrictions: Character vector detailing the restrictions on the mean vectors.
prior: The prior probabilities of the classes used.
counts: The number of observations of the classes used.
N: The total number of observations used.
estimationError: Matrix with the BT2, BT3, BT2CV and BT3CV true error rate estimates of the rules.
To overcome singularity of the covariance matrices after bootstrapping, the number of observations in each class must be greater than the number of explanatory variables divided by 0.632.
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Conde, D., Salvador, B., Rueda, C., and Fernandez, M. A. (2013). Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial. Statistical Applications in Genetics and Molecular Biology, 12(5), 583-602.
err.est, rqda, predict.rqda, rlda, predict.rlda, err.est.rlda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2[, c("Kurt.Maxis", "Holl.Ra", "Sc.Var.maxis")]
grouping <- Vehicle2$Class
levels(grouping) <- c(3, 1, 1, 2)
## now we can consider the following restrictions:
## mu11 >= mu21 >= mu31
## mu12 >= mu22 >= mu32
##
## we can specify these restrictions by restext = "s>1,2"
set.seed(5561)
values <- runif(length(rownames(data)))
trainsubset <- values < 0.05
testsubset <- values >= 0.05
obj <- rqda(data, grouping, subset = trainsubset, restext = "s>1,2")
pred <- predict(obj, data[testsubset, ], grouping = grouping[testsubset],
                prior = c(1/3, 1/3, 1/3))
pred$error.rate
err.est(obj, 30, prior = c(1/3, 1/3, 1/3))
Find the vector z that solves
min{ (x - z)' inv(S) (x - z); A z <= b },
where x is an input vector, S is its covariance matrix, A is a matrix of known contrasts, and b is a vector of known constraint constants.
lsConstrain.fit(x, b, s, a, iflag, itmax=4000, eps=1e-06, eps2=1e-06)
x: Vector of length n.
b: Vector of length k, containing the constraint constants.
s: Matrix of dimension n x n, the covariance matrix of the vector x.
a: Matrix of dimension k x n, defining the constraints.
iflag: Vector of length k; an element is 0 for an inequality constraint and 1 for an equality constraint.
itmax: Scalar, the maximum number of iterations.
eps: Scalar, the accuracy tolerance for convergence.
eps2: Scalar, the tolerance used to determine closeness to zero.
List with the following components:
itmax: (defined above)
eps: (defined above)
eps2: (defined above)
iflag: (defined above)
xkt: Vector of length k with the Kuhn-Tucker coefficients.
iter: Number of completed iterations.
supdif: Greatest difference between estimates across a full cycle.
ifault: Error indicator: 0 = no error; 1 = itmax exceeded; 3 = invalid constraint function (ASA' = 0 for some row).
a: (defined above)
call: The function call.
x.init: The input vector x.
x.final: The vector z that solves the problem (see Description).
s: (defined above)
min.dist: The minimum value of the objective function (see Description).
Wollan, P. C., and Dykstra, R. L. (1987). Minimizing inequality constrained Mahalanobis distances. Applied Statistics Algorithm AS 225.
# A simulation example: linear regression with 3 betas,
# subject to the constraints:
#
#   b[1] > 0
#   b[2] - b[1] < 0
#   b[3] < 0
set.seed(111)
n <- 100
x <- rep(1:3, rep(n, 3))
x <- 1 * outer(x, 1:3, "==")
beta <- c(2, 1, 1)
y <- x %*% beta + rnorm(nrow(x))
fit <- lm(y ~ -1 + x)
s <- solve(t(x) %*% x)
bhat <- fit$coef
a <- rbind(c(-1, 0, 0),
           c(-1, 1, 0),
           c( 0, 0, 1))
# View expected constraints (3rd not met):
a %*% bhat
#            [,1]
# [1,] -1.8506811
# [2,] -0.9543320
# [3,]  0.8590827
b <- rep(0, nrow(a))
iflag <- rep(0, length(b))
save <- lsConstrain.fit(x = bhat, b = b, s = s, a = a, iflag = iflag,
                        itmax = 500, eps = 1e-6, eps2 = 1e-6)
save
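A quick check (a sketch that assumes the objects from the example above are still in the workspace): the returned components documented in the Value section can be used to confirm that the constrained solution, unlike the unconstrained estimate, satisfies all constraints.

## Continuing the example above (sketch):
a %*% bhat                 ## unconstrained estimate: third constraint violated
save$a %*% save$x.final    ## constrained solution: all entries <= b (up to tolerance)
save$min.dist              ## minimised quadratic objective (see Description)
save$ifault                ## 0 indicates convergence without error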
Classify multivariate observations with linear classification rules built with additional information in conjunction with rlda.
## S3 method for class 'rlda'
predict(object, newdata, prior = object$prior, gamma = object$gamma,
        grouping = NULL, ...)
object: An object of class 'rlda'.
newdata: A data frame of cases to be classified, containing the variables used when creating object.
prior: The prior probabilities of class membership. If unspecified, object$prior is used.
gamma: A vector of values specifying which rules to take among the ones in object$gamma.
grouping: A numeric vector or factor with numeric levels specifying the class for each observation. If present, the true error rate of the rules will be estimated from newdata and these true classes.
...: Arguments passed to or from other methods.
This function is a method for the generic function predict() for class 'rlda'.
A list with components:
call: The (matched) function call.
class: Matrix with the classification for each rule (in columns).
prior: The prior probabilities of the classes used.
posterior: Array with the posterior probabilities of the classes for each rule.
error.rate: True error rate estimation for each rule (when grouping is specified).
If there are missing values in newdata, the corresponding observations will not be classified. If there are missing values in grouping, the corresponding observations will not be considered when calculating the true error rate.
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
rlda, err.est.rlda, rqda, predict.rqda, err.est.rqda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2
levels(data$Class) <- c(4, 2, 1, 3)   ## classes ordered by increasing size
##
## according to variable definitions, we can
## consider the following restrictions on the means vectors:
## mu11, mu21 >= mu31 >= mu41
## mu12, mu22 >= mu32 >= mu42
##
## we have 6 restrictions, 3 predictors and 4 classes, so
## resmatrix must be a 6 x 12 matrix:
A <- matrix(0, ncol = 12, nrow = 6)
A[t(matrix(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 7, 6, 8), nrow = 2))] <- -1
A[t(matrix(c(1, 7, 2, 8, 3, 7, 4, 8, 5, 10, 6, 11), nrow = 2))] <- 1
set.seed(983)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
testsubset <- values >= 0.2
obj <- rlda(Class ~ Kurt.Maxis + Holl.Ra + Sc.Var.maxis, data,
            subset = trainsubset, gamma = c(0, 0.5, 1), resmatrix = A)
pred <- predict(obj, newdata = data[testsubset, ],
                grouping = data[testsubset, "Class"], prior = rep(1/4, 4))
pred$error.rate
## we can see that the test error rate of the restricted
## rules decreases with gamma:
##                      gamma=0   gamma=0.5  gamma=1
## True error rate (%): 40.86957  39.71014   39.71014
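A possible follow-up (a sketch continuing the example above; it assumes the columns of pred$class follow the order of the gamma values used, which is an assumption here): cross-tabulate one rule's classifications against the true classes.

## Continuing the example above (sketch): confusion table for the rule
## corresponding to the last gamma value (assumed to be the last column).
table(predicted = pred$class[, ncol(pred$class)],
      true      = data[testsubset, "Class"])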
Classify multivariate observations with quadratic classification rules built with additional information in conjunction with rqda.
## S3 method for class 'rqda'
predict(object, newdata, prior = object$prior, gamma = object$gamma,
        grouping = NULL, ...)
object: An object of class 'rqda'.
newdata: A data frame of cases to be classified, containing the variables used when creating object.
prior: The prior probabilities of class membership. If unspecified, object$prior is used.
gamma: A vector of values specifying which rules to take among the ones in object$gamma.
grouping: A numeric vector or factor with numeric levels specifying the class for each observation. If present, the true error rate of the rules will be estimated from newdata and these true classes.
...: Arguments passed to or from other methods.
This function is a method for the generic function predict() for class 'rqda'.
A list with components:
call: The (matched) function call.
class: Matrix with the classification for each rule (in columns).
prior: The prior probabilities of the classes used.
posterior: Array with the posterior probabilities of the classes for each rule.
error.rate: True error rate estimation for each rule (when grouping is specified).
If there are missing values in newdata, the corresponding observations will not be classified. If there are missing values in grouping, the corresponding observations will not be considered when calculating the true error rate.
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
rqda, err.est.rqda, rlda, predict.rlda, err.est.rlda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2[, 1:4]
grouping <- Vehicle2$Class
levels(grouping) <- c(4, 2, 1, 3)   ## classes ordered by increasing size
##
## according to variable definitions, we can consider
## the following restrictions on the means vectors:
## mu11 >= mu21 >= mu31 >= mu41
## mu12 >= mu22 >= mu32 >= mu42
## mu13 >= mu23 >= mu33 >= mu43
##
## we can specify these restrictions by restext = "s>1,2,3"
set.seed(7964)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
testsubset <- values >= 0.2
obj <- rqda(data, grouping, subset = trainsubset, gamma = (1:5)/5,
            restext = "s>1,2,3")
pred <- predict(obj, newdata = data[testsubset, ], grouping = grouping[testsubset])
pred$error.rate
## we can see that the test error rate of the restricted
## rules decreases with gamma:
##                      gamma=0.2  gamma=0.4  gamma=0.6  gamma=0.8  gamma=1
## True error rate (%): 40.14815   39.85185   39.85185   39.11111   39.11111
Build linear classification rules with additional information expressed as inequality restrictions among the population means.
rlda(x, ...)

## S3 method for class 'matrix'
rlda(x, ...)

## S3 method for class 'data.frame'
rlda(x, grouping, ...)

## S3 method for class 'formula'
rlda(formula, data, ...)

## Default S3 method:
rlda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL,
     gamma = c(0, 1), prior = NULL, ...)
formula: A formula of the form groups ~ x1 + x2 + ..., where the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
data: Data frame from which the variables specified in formula are preferentially to be taken.
x: (Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables.
grouping: (Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation.
subset: An index vector specifying the cases to be used in the training sample.
resmatrix: A matrix specifying the linear restrictions on the mean vectors, of the form resmatrix %*% mu <= 0, where mu is the vector obtained by stacking the class mean vectors (a small construction sketch follows this argument list; see also the examples).
restext: (Required if no resmatrix argument is given.) A character string specifying the restrictions on the mean vectors, such as restext = "s>1" for the simple order mu11 >= mu21 >= ... on the first variable (see the examples and references).
gamma: A vector of values in the unit interval that determine the classification rules with additional information (see references).
prior: The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels.
...: Arguments passed to or from other methods.
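To make the resmatrix encoding concrete, here is a small hand-built sketch (an illustration only, using the same encoding as the package example below): three classes, two explanatory variables, and the simple order mu11 >= mu21 >= mu31 on the first variable, i.e. the restriction that restext = "s>1" expresses.

## Sketch: resmatrix for mu11 >= mu21 >= mu31 with 3 classes and 2 variables.
## The stacked mean vector is mu = (mu11, mu12, mu21, mu22, mu31, mu32)',
## and each row encodes one inequality of the form row %*% mu <= 0.
A <- rbind(c(-1, 0,  1, 0, 0, 0),   ## -mu11 + mu21 <= 0  <=>  mu11 >= mu21
           c( 0, 0, -1, 0, 1, 0))   ## -mu21 + mu31 <= 0  <=>  mu21 >= mu31
## A can then be passed to rlda() (or rqda()) via the resmatrix argument.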
Specifying the prior will affect the classification and the error rate estimation unless overridden in predict.rlda and err.est.rlda, respectively.
An object of class 'rlda' containing the following components:
call: The (matched) function call.
trainset: Matrix with the training set used (first columns) and the class for each observation (last column).
restrictions: Edited character string detailing the linear restrictions on the mean vectors.
resmatrix: The matrix with the restrictions on the mean vectors used.
prior: Prior probabilities of class membership used.
counts: The number of observations of the classes used.
N: The total number of observations used.
samplemeans: Matrix with the sample means in rows.
samplevariances: Array with the sample covariance matrices of the classes.
gamma: Gamma values used.
spooled: Pooled covariance matrix.
estimatedmeans: Array with the estimated means for each classification rule.
apparent: Apparent error rate for each classification rule.
This function may be called giving either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.
Classes must be identified, either in a column of data or in the grouping vector, by natural numbers varying from 1 to the number of classes. The number of classes must be greater than 1.
If there are missing values in either data, x or grouping, the corresponding observations will be deleted.
To overcome singularity of the covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.
predict.rlda, err.est.rlda, rqda, predict.rqda, err.est.rqda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2
levels(data$Class) <- c(4, 2, 1, 3)   ## classes ordered by increasing size
##
## according to variable definitions, we can
## consider the following restrictions on the means vectors:
## mu11, mu21 >= mu31 >= mu41
## mu12, mu22 >= mu32 >= mu42
##
## we have 6 restrictions, 3 predictors and 4 classes, so
## resmatrix must be a 6 x 12 matrix:
A <- matrix(0, ncol = 12, nrow = 6)
A[t(matrix(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 7, 6, 8), nrow = 2))] <- -1
A[t(matrix(c(1, 7, 2, 8, 3, 7, 4, 8, 5, 10, 6, 11), nrow = 2))] <- 1
set.seed(983)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rlda(Class ~ Kurt.Maxis + Holl.Ra + Sc.Var.maxis, data,
            subset = trainsubset, gamma = c(0, 0.5, 1), resmatrix = A)
obj
## we can see that the apparent error rate of the restricted
## rules decreases with gamma:
##  gamma=0   gamma=0.5  gamma=1
## 42.30769   41.66667   41.02564
Build quadratic classification rules with additional information expressed as inequality restrictions among the population means.
rqda(x, ...)

## S3 method for class 'matrix'
rqda(x, ...)

## S3 method for class 'data.frame'
rqda(x, grouping, ...)

## S3 method for class 'formula'
rqda(formula, data, ...)

## Default S3 method:
rqda(x, grouping, subset = NULL, resmatrix = NULL, restext = NULL,
     gamma = c(0, 1), prior = NULL, ...)
formula: A formula of the form groups ~ x1 + x2 + ..., where the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
data: Data frame from which the variables specified in formula are preferentially to be taken.
x: (Required if no formula is given as the principal argument.) A data frame or matrix containing the explanatory variables.
grouping: (Required if no formula is given as the principal argument.) A numeric vector or factor with numeric levels specifying the class for each observation.
subset: An index vector specifying the cases to be used in the training sample.
resmatrix: A matrix specifying the linear restrictions on the mean vectors, of the form resmatrix %*% mu <= 0, where mu is the vector obtained by stacking the class mean vectors (see rlda for a small construction sketch, and the examples below).
restext: (Required if no resmatrix argument is given.) A character string specifying the restrictions on the mean vectors, such as restext = "s>1,2,3" for simple orders mu1j >= mu2j >= ... on variables 1, 2 and 3 (see the examples and references).
gamma: A vector of values in the unit interval that determine the classification rules with additional information (see references).
prior: The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities must be specified in the order of the factor levels.
...: Arguments passed to or from other methods.
Specifying the prior will affect the classification and the error rate estimation unless overridden in predict.rqda and err.est.rqda, respectively.
An object of class 'rqda' containing the following components:
call: The (matched) function call.
trainset: Matrix with the training set used (first columns) and the class for each observation (last column).
restrictions: Edited character string detailing the linear restrictions on the mean vectors.
resmatrix: The matrix with the restrictions on the mean vectors used.
prior: Prior probabilities of class membership used.
counts: The number of observations of the classes used.
N: The total number of observations used.
samplemeans: Matrix with the sample means in rows.
samplevariances: Array with the sample covariance matrices of the classes.
gamma: Gamma values used.
estimatedmeans: Array with the estimated means for each classification rule.
apparent: Apparent error rate for each classification rule.
This function may be called using either a formula and data frame, or a data frame and grouping factor, or a matrix and grouping factor as the first two arguments. All other arguments are optional.
Classes must be identified, either in a column of data or in the grouping vector, by natural numbers varying from 1 to the number of classes. The number of classes must be greater than 1.
If there are missing values in either data, x or grouping, the corresponding observations will be deleted.
To overcome singularity of the covariance matrices, the number of observations in each class must be greater than or equal to the number of explanatory variables.
David Conde
Conde, D., Fernandez, M. A., Rueda, C., and Salvador, B. (2012). Classification of samples into two or more ordered populations with application to a cancer trial. Statistics in Medicine, 31, 3773-3786.
Conde, D., Fernandez, M. A., Salvador, B., and Rueda, C. (2015). dawai: An R Package for Discriminant Analysis with Additional Information. Journal of Statistical Software, 66(10), 1-19. URL http://www.jstatsoft.org/v66/i10/.
Fernandez, M. A., Rueda, C., and Salvador, B. (2006). Incorporating additional information to normal linear discriminant rules. Journal of the American Statistical Association, 101, 569-577.
predict.rqda, err.est.rqda, rlda, predict.rlda, err.est.rlda
data(Vehicle2)
levels(Vehicle2$Class)
## "bus" "opel" "saab" "van"
data <- Vehicle2[, 1:4]
grouping <- Vehicle2$Class
levels(grouping) <- c(4, 2, 1, 3)   ## classes ordered by increasing size
##
## according to variable definitions, we can consider
## the following restrictions on the means vectors:
## mu11 >= mu21 >= mu31 >= mu41
## mu12 >= mu22 >= mu32 >= mu42
## mu13 >= mu23 >= mu33 >= mu43
##
## we can specify these restrictions by restext = "s>1,2,3"
set.seed(7964)
values <- runif(dim(data)[1])
trainsubset <- values < 0.2
obj <- rqda(data, grouping, subset = trainsubset, gamma = (1:5)/5,
            restext = "s>1,2,3")
obj
## we can see that the apparent error rate of the restricted
## rules increases with gamma:
## gamma=0.2  gamma=0.4  gamma=0.6  gamma=0.8  gamma=1
##  30.40936   30.99415   30.99415   30.99415  31.57895
The purpose is to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many different angles. The features were extracted from the silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS, which extracts a combination of scale independent features utilising both classical moments based measures such as scaled variance, skewness and kurtosis about the major/minor axes and heuristic measures such as hollows, circularity, rectangularity and compactness.
Four "Corgie" model vehicles were used for the experiment: a double decker bus, Cheverolet van, Saab 9000 and an Opel Manta 400. This particular combination of vehicles was chosen with the expectation that the bus, van and either one of the cars would be readily distinguishable, but it would be more difficult to distinguish between the cars.
data(Vehicle2)
A data frame with 846 observations on 4 numerical variables and one nominal variable defining the class of the objects.
[,1] | Skew.maxis | Skewness about minor axis |
[,2] | Kurt.Maxis | Kurtosis about major axis |
[,3] | Holl.Ra | Hollows ratio: (area of hollows)/(area of bounding polygon) |
[,4] | Sc.Var.maxis | Scaled variance along minor axis: (2nd order moment about minor axis)/area |
[,5] | Class | Type |
Creator: Drs. Pete Mowforth and Barry Shepherd, Turing Institute, Glasgow, Scotland.
These data have been taken from the UCI Repository of Machine Learning Databases at http://www.ics.uci.edu/~mlearn/MLRepository.html and were converted to R format by Evgenia Dimitriadou.
Siebert, J. P. (1987). Vehicle Recognition Using Rule Based Methods. Turing Institute Research Memorandum TIRM-87-018.
Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
data(Vehicle2)
summary(Vehicle2)
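Because the classifiers in this package expect classes coded as natural numbers 1, 2, ..., the package examples recode the four vehicle types before fitting; a condensed sketch of that recoding:

library(dawai)
data(Vehicle2)
grouping <- Vehicle2$Class
levels(grouping)                    ## "bus" "opel" "saab" "van"
levels(grouping) <- c(4, 2, 1, 3)   ## reorder classes by increasing vehicle size
table(grouping)                     ## class counts under the new coding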