Introducing biClassify

biClassify is a package for adapting Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Kernel Discriminant Analysis to a variety of situations where the conventional methods may not work. In particular, the package provides methodology for the following problems:

- fast classification of large data sets, via compression or sub-sampling (LDA and QDA);
- classification when the decision boundary between classes is non-linear (kernel discriminant analysis);
- simultaneous classification and sparse feature selection when some features do not contribute to class separation.

The purpose of this section is to give the user a quick overview of the package and the types of problems it can be used to solve. Accordingly, we implement only the basic versions of the available methods, and more detailed presentations are given in later sections.

We first load the package

library(biClassify)

Our first example illustrates the compressed LDA function on data well-suited for LDA. The first two features of the training data in LDA_Data are plotted below:

data(LDA_Data)

This data set has n = 10,000 training samples with p = 10 features. It is normally distributed, and the two classes have equal covariance matrices. The test data was independently generated from the same distribution, but it has only n = 1,000 samples.
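The dimensions can be checked, and the first two features plotted, with a couple of lines of base R (a quick sketch; the vignette's original figure may have been produced differently):

dim(LDA_Data$TrainData)
#> [1] 10000    10
plot(LDA_Data$TrainData[, 1:2], col = LDA_Data$TrainCat,
     xlab = "Feature 1", ylab = "Feature 2")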

Let us use compressed LDA to predict the test data labels.

> test_pred <- LDA(TrainData = LDA_Data$TrainData,
                   TrainCat = LDA_Data$TrainCat,
                   TestData = LDA_Data$TestData,
                   Method = "Compressed")$Predictions

> mean(test_pred != LDA_Data$TestCat)
[1] 0

The automatic implementation of compressed LDA predicted the test labels perfectly! However, this is due, in part, to the classes being well-separated and having the same covariance structure. Let us now consider an example where LDA will not perform well.

Our next example illustrates the compressed QDA function on data well-suited for QDA. The first two features of the training data in QDA_Data are plotted below:

data(QDA_Data)
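As before, a quick scatter plot of the first two features can be made with base R:

plot(QDA_Data$TrainData[, 1:2], col = QDA_Data$TrainCat,
     xlab = "Feature 1", ylab = "Feature 2")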

A modification of Quadratic Discriminant Analysis is well-suited to such data, and the package's QDA function serves this purpose.

> test_pred <- QDA(TrainData = QDA_Data$TrainData,
                   TrainCat = QDA_Data$TrainCat,
                   TestData = QDA_Data$TestData,
                   Method = "Compressed")

> mean(test_pred != QDA_Data$TestCat)
[1] 0

Compressed QDA gives perfect class prediction.

What happens if the data is not well-suited to either Linear or Quadratic Discriminant Analysis? Moreover, what happens if, in addition to a non-linear decision boundary between classes, there also appear to be variables which do not contribute to group separation?

For example, consider the data set KOS_Data, loaded below.

data(KOS_Data)
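A pairs plot is one quick way to look at all four features at once (base R):

pairs(KOS_Data$TrainData, col = KOS_Data$TrainCat)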

For this data set, neither LDA nor QDA suffices. The KOS function implements the sparse kernel optimal scoring algorithm, which is particularly well-suited to such problems, as can be seen from the following.

> output <- KOS(TrainData = KOS_Data$TrainData, 
                TrainCat = KOS_Data$TrainCat,
                TestData = KOS_Data$TestData)
> output$Weight
[1] 1 1 0 0

> mean(output$Predictions != KOS_Data$TestCat)
[1] 0

> summary(output$Dvec)
       V1          
 Min.   :-0.03002  
 1st Qu.:-0.01953  
 Median :-0.01445  
 Mean   : 0.00000  
 3rd Qu.: 0.03788  
 Max.   : 0.05799  

Weight in the output is how much weight the kernel classifier gives to each feature. The weight values lie in [−1, 1], and zero weight means that the feature does not contribute to computing the discriminant function. The KOS function correctly identifies that the first two features are important for class separation, and gives them full weight. It also correctly identifies Features 3 and 4 as being "noise", and it gives them zero weight.

Predictions contains the predicted class labels for the test data. As we can see, KOS has perfect classification.

Dvec contains the coefficients of the kernel classifier.

This section provides a more in-depth treatment of the Linear Discriminant methods available in biClassify.

There are five separate linear discriminant methods available through the LDA wrapper function:

- Full LDA (Method = "Full");
- Compressed LDA (Method = "Compressed");
- Sub-sampled LDA (Method = "Subsampled");
- Projected LDA (Method = "Projected");
- Fast Random Fisher Discriminant Analysis (Method = "fastRandomFisher").

The individual methods are invoked by setting the Method argument. Let us first load the data for notational convenience.

TrainData <- LDA_Data$TrainData
TrainCat <- LDA_Data$TrainCat
TestData <- LDA_Data$TestData
TestCat <- LDA_Data$TestCat

This method is the result of setting Method equal to "Full", which is the default. It is traditional Linear Discriminant Analysis. No additional parameters need to be supplied, and the code will run as stated.

test_pred <- LDA(TrainData, TrainCat, TestData)$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

which produces a list containing a vector of predicted class labels for TestData and the discriminant vector used in LDA.

Compressed LDA seeks to solve the LDA problem on reduced-size data. It first compresses each group of centered data $(X^{g} - \overline{X}_{g})$ via a compression matrix $Q^{g}$. The entries $Q^{g}_{i,j}$ are i.i.d. sparse Rademacher random variables with distribution $$ \mathbb{P}(Q^{g}_{i,j}=1) = \mathbb{P}(Q^{g}_{i,j}=-1)= \frac{s}{2}\text{ and } \mathbb{P}(Q^{g}_{i,j}=0) = 1-s, $$ where $s \in (0, 1]$ is the sparsity level.
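For intuition, such a compression matrix is easy to generate directly. The following sketch (not the package's internal code; m, n, and s are illustrative values) builds a sparse Rademacher matrix Q and compresses a centered data matrix X:

m <- 700; n <- 7000; s <- 0.01
Q <- matrix(sample(c(-1, 0, 1), m * n, replace = TRUE,
                   prob = c(s / 2, 1 - s, s / 2)), nrow = m)
X <- matrix(rnorm(n * 10), n, 10)                  # stand-in for one group's data
Z <- Q %*% scale(X, center = TRUE, scale = FALSE)  # m compressed, centered samples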

This method is the result of setting Method equal to "Compressed". It is compressed LDA, which reduces the group sample amounts from n1 and n2 to m1 and m2, respectively.

Compressed LDA requires the parameters m1, m2, and s.

The easiest way to run Compressed LDA is to set Mode to "Automatic" and not worry about supplying additional parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Compressed", Mode = "Automatic")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

"Automatic" is the default value for Mode, and so one could simply run

test_pred <- LDA(TrainData, TrainCat, TestData, Method = "Compressed")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the compression amounts m1, m2, and the sparsity level s to be used in compression. The user will type in the amounts:

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "Compressed", Mode = "Interactive")$Predictions
"Please enter the number m1 of group 1 compression samples: "700
"Please enter the number m2 of group 2 compression samples: "300
"Please enter sparsity level s used in compression: "0.01

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Compressed", Mode = "Research", 
                 m1 = 700, m2 = 300, s = 0.01)$Predictions

table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

WARNING: The Mode argument will override any supplied parameters if its value is "Automatic" or "Interactive".

Sub-sampled LDA is just LDA trained on data sub-sampled uniformly from both classes.
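For intuition, sub-sampling can also be done by hand and passed to the full LDA function; the following sketch is equivalent in spirit to Method = "Subsampled" (it assumes each group contains at least the requested number of samples):

set.seed(123)
idx1 <- sample(which(TrainCat == 1), 700)   # m1 sub-samples from group 1
idx2 <- sample(which(TrainCat == 2), 300)   # m2 sub-samples from group 2
sub_pred <- LDA(TrainData[c(idx1, idx2), ],
                TrainCat[c(idx1, idx2)], TestData)$Predictions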

To run sub-sampled LDA, set Method equal to "Subsampled". It requires the additional parameters m1 and m2.

The easiest way to run sub-sampled LDA is to set Mode to "Automatic" and not worry about supplying additional parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Subsampled", Mode = "Automatic")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300

"Automatic" is the default value for Mode, and so one could simply run

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Subsampled")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the sub-sample amounts m1 and m2 for each group. The user will type in the amounts:

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Subsampled", Mode = "Interactive")$Predictions
"Please enter the number m1 of group 1 sub-samples: "700
"Please enter the number m2 of group 2 sub-samples: "300

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "Subsampled", Mode = "Research", 
              m1 = 700, m2 = 300)$Predictions

table(output)
#> output
#>   1   2 
#> 700 300
mean(output != TestCat)
#> [1] 0

WARNING: The Mode argument will override any supplied parameters if its value is "Automatic" or "Interactive".

This method is the result of setting Method equal to "Projected". It is Projected LDA, which creates the discriminant vector on compressed data and then projects the full training data onto that vector.
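For intuition, the projection step can be sketched on toy data as follows (an illustrative sketch only, with sub-sampling standing in for compression; none of this is the package's internal code):

set.seed(1)
X <- rbind(matrix(rnorm(700 * 2), 700, 2),
           matrix(rnorm(300 * 2, mean = 2), 300, 2))
grp <- rep(c(1, 2), c(700, 300))
idx <- c(sample(which(grp == 1), 70), sample(which(grp == 2), 30))
# discriminant direction learned from the reduced sample only
dvec <- solve(cov(X[idx, ]),
              colMeans(X[idx[71:100], ]) - colMeans(X[idx[1:70], ]))
scores <- X %*% dvec   # the *full* training data is projected onto dvec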

Projected LDA requires the parameters m1, m2, and s.

The easiest way to run Projected LDA is to set Mode to "Automatic" and not worry about supplying additional parameters.

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "Projected", Mode = "Automatic")$Predictions
table(output)
#> output
#>   1   2 
#> 700 300
mean(output != TestCat)
#> [1] 0

"Automatic" is the default value for Mode, and so one could simply run

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "Projected")$Predictions
table(output)
#> output
#>   1   2 
#> 700 300
mean(output != TestCat)
#> [1] 0

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the compression amounts m1, m2, and the sparsity level s to be used in compression. The user will type in the amounts:

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "Projected", Mode = "Interactive")$Predictions
"Please enter the number m1 of group 1 compression samples: "700
"Please enter the number m2 of group 2 compression samples: "300
"Please enter sparsity level s used in compression: "0.01

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "Projected", Mode = "Research", 
                 m1 = 700, m2 = 300, s = 0.01)$Predictions

table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

WARNING: The Mode argument will override any supplied parameters if its value is "Automatic" or "Interactive".

This method is the result of setting Method equal to "fastRandomFisher". It is the Fast Random Fisher Discriminant Analysis algorithm. Fast Random Fisher creates the discriminant vector on a reduced sample of size m, and then projects the full training data onto the learned discriminant vector. The difference between Fast Random Fisher Discriminant Analysis and Projected LDA is that Fast Random Fisher mixes the groups together when forming the discriminant vector, while Projected LDA does not.
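The "mixed groups" compression can be sketched in one step (illustrative only, not the package's internal code; a single Q acts on all n training samples at once):

m <- 1000; s <- 0.01; n <- nrow(TrainData)
Q <- matrix(sample(c(-1, 0, 1), m * n, replace = TRUE,
                   prob = c(s / 2, 1 - s, s / 2)), nrow = m)
Z <- Q %*% as.matrix(TrainData)   # m compressed samples drawn from both groups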

Fast Random Fisher requires the parameters m and s.

The easiest way to run Fast Random Fisher is to set Mode to "Automatic" and not worry about supplying additional parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "fastRandomFisher", Mode = "Automatic")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

"Automatic" is the default value for Mode, and so one could simply run

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "fastRandomFisher")$Predictions
table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the total number m of compressed samples and the sparsity level s used in compression. The user will type in the amounts:

output <- LDA(TrainData, TrainCat, TestData, 
              Method = "fastRandomFisher", Mode = "Interactive")$Predictions
"Please enter the number m of total compressed samples: "1000
"Please enter sparsity level s used in compression: "0.01

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

test_pred <- LDA(TrainData, TrainCat, TestData, 
                 Method = "fastRandomFisher", Mode = "Research", 
                 m = 1000, s = 0.01)$Predictions

table(test_pred)
#> test_pred
#>   1   2 
#> 700 300
mean(test_pred != TestCat)
#> [1] 0

WARNING: The Mode argument will override any supplied parameters if its value is "Automatic" or "Interactive".

This section provides a more in-depth treatment of the Quadratic Discriminant methods available in biClassify.

There are three separate quadratic discriminant methods available through the QDA wrapper function:

- Full QDA (Method = "Full");
- Compressed QDA (Method = "Compressed");
- Sub-sampled QDA (Method = "Subsampled").

The individual methods are invoked by setting the Method argument. Let us first load the data for notational convenience.

TrainData <- QDA_Data$TrainData
TrainCat <- QDA_Data$TrainCat
TestData <- QDA_Data$TestData
TestCat <- QDA_Data$TestCat

This method is the result of setting Method equal to "Full". It is traditional Quadratic Discriminant Analysis. No additional parameters need to be supplied, and the code will run as stated. Unlike the LDA function, only the class predictions are produced:

Predictions <- QDA(TrainData, TrainCat, TestData, Method = "Full")
table(Predictions)
#> Predictions
#>   1   2 
#> 700 300

This method is the result of setting Method equal to "Compressed". It is compressed QDA, which reduces the group sample amounts from n1 and n2 to m1 and m2, respectively, via compression and trains QDA on the reduced samples.

Compressed QDA requires the parameters m1, m2, and s.

The easiest way to run Compressed QDA is to set Mode to "Automatic" and not worry about supplying additional parameters.

output <- QDA(TrainData, TrainCat, TestData, Method = "Compressed", Mode = "Automatic")
table(output)
#> output
#>   1   2 
#> 700 300

"Automatic" is the default value for Mode, and so one could simply run

output <- QDA(TrainData, TrainCat, TestData, Method = "Compressed")
table(output)
#> output
#>   1   2 
#> 700 300

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the compression amounts m1, m2, and the sparsity level s to be used in compression. The user will type in the amounts:

output <- QDA(TrainData, TrainCat, TestData, Method = "Compressed", Mode = "Interactive")
"Please enter the number m1 of group 1 compression samples: "700
"Please enter the number m2 of group 2 compression samples: "300
"Please enter sparsity level s used in compression: "0.01

table(output)

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

output <- QDA(TrainData, TrainCat, TestData, Method = "Compressed", 
              Mode = "Research", m1 = 700, m2 = 300, s = 0.01)

summary(output)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     1.0     1.0     1.0     1.3     2.0     2.0

Sub-sampled QDA is just QDA trained on data sub-sampled uniformly from both classes. To run sub-sampled QDA, set Method equal to "Subsampled".

It requires the additional parameters m1 and m2.

The easiest way to run sub-sampled QDA is to set Mode to "Automatic" and not worry about supplying additional parameters.

output <- QDA(TrainData, TrainCat, TestData, Method = "Subsampled", Mode = "Automatic")
table(output)
#> output
#>   1   2 
#> 700 300

"Automatic" is the default value for Mode, and so one could simply run

output <- QDA(TrainData, TrainCat, TestData, Method = "Subsampled")
summary(output)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     1.0     1.0     1.0     1.3     2.0     2.0

and obtain the same output.

When Mode is set to "Interactive", prompts will appear asking for the sub-sample amounts m1 and m2 for each group. The user will type in the amounts:

output <- QDA(TrainData, TrainCat, TestData, Method = "Subsampled", Mode = "Interactive")
"Please enter the number m1 of group 1 sub-samples: "700
"Please enter the number m2 of group 2 sub-samples: "300

summary(output)

and the output is produced.

If the user is interested in running simulation studies or has mastery over the functionality, they may wish to give the function all parameters.

output <- QDA(TrainData, TrainCat, TestData, Method = "Subsampled", 
              Mode = "Research", m1 = 700, m2 = 300)

summary(output)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     1.0     1.0     1.0     1.3     2.0     2.0

WARNING: The Mode argument will override any supplied parameters if its value is "Automatic" or "Interactive".

This section presents the kernel optimal scoring (KOS) method available in the package.

Kernel optimal scoring finds the kernel discriminant coefficients α ∈ ℝn by solving a kernelized form of the optimal scoring problem.
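In the notation of the sparse objective displayed below (with the feature weights held fixed), the unweighted problem can be written as $$ \min_{\alpha \in \mathbb{R}^{n}} \bigg\{\frac{1}{n}\|Y \widehat{\theta} - C\mathbf{K}C\alpha\|_{2}^{2}+\gamma \alpha^\top \mathbf{K}\alpha\bigg\}, $$ where $Y$ is the class indicator matrix, $\widehat{\theta}$ the optimal score vector, $C$ the centering matrix, and $\mathbf{K}$ the kernel matrix.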

It is equivalent to kernel discriminant analysis.

We include simultaneous sparse feature selection by weighting the features using $w \in [-1,1]^{p}$, so that the weighted samples are $w \odot x = (w_{1}x_{1}, \ldots, w_{p}x_{p})$. The weighted kernel matrix $\mathbf{K}_{w}$ is defined by $(\mathbf{K}_{w})_{i,j} := k(w \odot x_{i}, w \odot x_{j})$. To perform sparse feature selection, we add a sparsity penalty $\lambda \|w\|_{1}$ on the weight vector and minimize $$ \min_{\alpha \in \mathbb{R}^{n}\,,\, w\in [-1,1]^{p}} \bigg\{\frac{1}{n}\|Y \widehat{\theta} - C\mathbf{K}_{w}C\alpha\|_{2}^{2}+\lambda \|w\|_{1}+\gamma \alpha^\top \mathbf{K}_{w}\alpha\bigg\}. $$
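For intuition, the weighted kernel matrix is straightforward to compute directly. The sketch below assumes a Gaussian kernel of the form k(x, y) = exp(−‖x − y‖²/σ²); the package's internal scaling may differ:

weighted_kernel <- function(X, w, sigma2) {
  Xw <- sweep(as.matrix(X), 2, w, `*`)   # weighted samples (w_1 x_1, ..., w_p x_p)
  D2 <- as.matrix(dist(Xw))^2            # squared distances between weighted samples
  exp(-D2 / sigma2)                      # (K_w)_{i,j} = k(w * x_i, w * x_j)
}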

Let us load the data set used in kernel optimal scoring

TrainData <- KOS_Data$TrainData
TrainCat <- KOS_Data$TrainCat
TestData <- KOS_Data$TestData
TestCat <- KOS_Data$TestCat

This subsection details how SelectParams selects the parameters σ2, γ, and λ.

The Gaussian kernel parameter σ2 is selected from the {.05, .1, .2, .3, .5} quantiles of the set of squared distances between the classes, $\{\|x_{i_1} - x_{i_2}\|_{2}^{2} : x_{i_1} \in C_1, x_{i_2} \in C_2\}$. The ridge parameter γ is selected by adapting the kernel matrix shrinkage technique of Lancewicki (2018) to the setting of ridge regression.

The sparsity parameter λ is selected using 5-fold cross-validation to minimize the error rate over a grid of 20 equally-spaced values.
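For illustration, the candidate quantiles for σ2 can be reproduced along the following lines (a sketch of the quantile step only; the SelectParams internals may differ):

X1 <- as.matrix(TrainData[TrainCat == 1, ])
X2 <- as.matrix(TrainData[TrainCat == 2, ])
# squared Euclidean distances between every pair of points in opposite classes
D2 <- outer(rowSums(X1^2), rowSums(X2^2), `+`) - 2 * X1 %*% t(X2)
quantile(D2, c(0.05, 0.10, 0.20, 0.30, 0.50))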

The SelectParams function implements these selections automatically:

> SelectParams(TrainData, TrainCat)

$Sigma
[1] 0.7390306

$Gamma
[1] 0.137591

$Lambda
[1] 0.0401767

If parameters are not supplied to KOS, the function first invokes SelectParams to generate any missing parameters.

Sparse kernel optimal scoring has three parameters: a Gaussian kernel parameter Sigma, a ridge parameter Gamma, and a sparsity parameter Lambda. They have a hierarchical dependency, in that Sigma influences Gamma, and both influence Lambda. The ordering is

Top: Sigma
Middle: Gamma
Bottom: Lambda

When using either the SelectParams or KOS functions, the user is only allowed to specify parameter combinations which adhere to the hierarchical ordering above. That is, they can only input parameters going from Top to Bottom. For example, they could specify both Sigma and Gamma, but leave Lambda at its default NULL value. On the other hand, the user is not allowed to specify only Lambda while leaving Sigma and Gamma at their default NULL values.

> SelectParams(TrainData, TrainCat, Sigma = 1, Gamma = 0.1)

$Sigma
[1] 1

$Gamma
[1] 0.1

$Lambda
[1] 0.06186337

If the user supplies parameter values which violate the hierarchical ordering, the error message Hierarchical order of parameters violated. will be returned.

SelectParams(TrainData, TrainCat, Gamma = 0.1)

Error in SelectParams(TrainData, TrainCat, Gamma = 0.1) : 
Hierarchical order of parameters violated. 
Please specify Sigma before Gamma, and both Sigma and Gamma before Lambda.

This package comes with an all-purpose function, KOS, for running kernel optimal scoring.

Sigma <- 1.325386  
Gamma <- 0.07531579 
Lambda <- 0.002855275

> output <- KOS(TestData, TrainData, TrainCat, Sigma = Sigma, 
                Gamma = Gamma, Lambda = Lambda)

> output$Weight
[1] 1 1 0 0

> table(output$Predictions)
 1  2 
26 68

> summary(output$Dvec)
       V1          
 Min.   :-0.05860  
 1st Qu.:-0.03711  
 Median :-0.02539  
 Mean   : 0.00000  
 3rd Qu.: 0.06983  
 Max.   : 0.10192 