Image classification of the MNIST and CIFAR-10 data using KernelKnn and HOG (histogram of oriented gradients)

In this vignette I’ll illustrate how to increase the accuracy on the MNIST (to approx. 98.4%) and CIFAR-10 data (to approx. 58.3%) using the KernelKnn package and HOG (histogram of oriented gradients).

dependencies

The MNIST and Cifar-10 data sets can be downloaded from Github using system(“wget https://raw.githubusercontent.com/mlampros/DataSets/master/mnist.zip”) and system(“wget https://raw.githubusercontent.com/mlampros/DataSets/master/cifar_10.zip”)
the irlba package, which is needed for comparison purposes, can be installed from CRAN directly
the HOG_apply function is part of the OpenImageR package, which can be installed from CRAN as well.

MNIST data set

The MNIST data set of handwritten digits has a training set of 70,000 examples and each row of the matrix corresponds to a 28 x 28 image. The unique values of the response variable y range from 0 to 9. More information about the data can be found in the DataSets repository (the folder includes also an Rmarkdown file).

# using system('wget..') on a linux OS 

system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/mnist.zip")             

mnist <- read.table(unz("mnist.zip", "mnist.csv"), nrows = 70000, header = T, 
                    
                    quote = "\"", sep = ",")

X = mnist[, -ncol(mnist)]
dim(X)

## [1] 70000   784

# the KernelKnn function requires that the labels are numeric and start from 1 : Inf

y = mnist[, ncol(mnist)] + 1          
table(y)

## y
##    1    2    3    4    5    6    7    8    9   10 
## 6903 7877 6990 7141 6824 6313 6876 7293 6825 6958

K nearest neighbors do not perform well in high dimensions due to the curse of dimensionality (k observations that are nearest to a given test observation x1 may be very far away from x1 in p-dimensional space when p is large [ An introduction to statistical learning, James/Witten/Hastie/Tibshirani, pages 108-109 ]), leading to a very poor k-nearest-neighbors fit. One option to overcome this problem would be to use truncated svd (irlba package) to reduce the dimensions of the data. A second option, which is appropriate in case of images, would be to use image descriptors. In this vignette, I’ll compare those two approaches.

KernelKnnCV using truncated svd

I experimented with different settings and the following parameters gave the best results,

irlba_singlular_vectors	k	method	kernel
40	8	braycurtis	biweight_tricube_MULT

library(irlba)

svd_irlb = irlba(as.matrix(X), nv = 40, nu = 40, verbose = F)            # irlba truncated svd

new_x = as.matrix(X) %*% svd_irlb$v               # new_x using the 40 right singular vectors

library(KernelKnn)

fit = KernelKnnCV(as.matrix(new_x), y, k = 8, folds = 4, method = 'braycurtis',
                  
                  weights_function = 'biweight_tricube_MULT', regression = F, 
                  
                  threads = 6, Levels = sort(unique(y)))


# str(fit)


# evaluation metric

acc = function (y_true, preds) {
  
  out = table(y_true, max.col(preds, ties.method = "random"))
  
  acc = sum(diag(out))/sum(out)
  
  acc
}

acc_fit = unlist(lapply(1:length(fit$preds), 
                        
                        function(x) acc(y[fit$folds[[x]]], 
                                        
                                        fit$preds[[x]])))
acc_fit

## [1] 0.9742857 0.9749143 0.9761143 0.9741143

cat('mean accuracy using cross-validation :', mean(acc_fit), '\n')

## mean accuracy using cross-validation : 0.9748571

Utilizing truncated svd a 4-fold cross-validation KernelKnn model gives a 97.48% accuracy.

KernelKnnCV and HOG (histogram of oriented gradients)

In this chunk of code, besides KernelKnnCV I’ll also use HOG. The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy (Wikipedia).

library(OpenImageR)

hog = HOG_apply(X, cells = 6, orientations = 9, rows = 28, columns = 28, threads = 6)

## 
## time to complete : 1.802997 secs

dim(hog)

## [1] 70000   324

fit_hog = KernelKnnCV(hog, y, k = 20, folds = 4, method = 'braycurtis',
                  
                  weights_function = 'biweight_tricube_MULT', regression = F, 
                  
                  threads = 6, Levels = sort(unique(y)))


#str(fit_hog)

acc_fit_hog = unlist(lapply(1:length(fit_hog$preds), 
                            
                            function(x) acc(y[fit_hog$folds[[x]]], 
                                            
                                            fit_hog$preds[[x]])))
acc_fit_hog

## [1] 0.9833714 0.9840571 0.9846857 0.9838857

cat('mean accuracy for hog-features using cross-validation :', mean(acc_fit_hog), '\n')

## mean accuracy for hog-features using cross-validation : 0.984

By changing from the simple svd-features to HOG-features the accuracy of a 4-fold cross-validation model increased from 97.48% to 98.4% (approx. 1% difference)

CIFAR-10 data set

CIFAR-10 is an established computer-vision dataset used for object recognition. The data I’ll use in this example is a subset of an 80 million tiny images dataset and consists of 60,000 32x32 color images containing one of 10 object classes ( 6000 images per class ). Furthermore, the data were converted from RGB to gray, normalized and rounded to 2 decimal places (to reduce the storage size). More information about the data can be found in my DataSets repository (I included an Rmarkdown file).

I’ll build the kernel k-nearest-neighbors models in the same way I’ve done for the mnist data set and then I’ll compare the results.

# using system('wget..') on a linux OS 

system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/cifar_10.zip")      

cifar_10 <- read.table(unz("cifar_10.zip", "cifar_10.csv"), nrows = 60000, header = T, 
                       
                       quote = "\"", sep = ",")

KernelKnnCV using truncated svd

X = cifar_10[, -ncol(cifar_10)]
dim(X)

## [1] 60000  1024

# the KernelKnn function requires that the labels are numeric and start from 1 : Inf

y = cifar_10[, ncol(cifar_10)]         
table(y)

## y
##    1    2    3    4    5    6    7    8    9   10 
## 6000 6000 6000 6000 6000 6000 6000 6000 6000 6000

The parameter settings are similar to those for the mnist data,

irlba_singlular_vectors	k	method	kernel
40	8	braycurtis	biweight_tricube_MULT

svd_irlb = irlba(as.matrix(X), nv = 40, nu = 40, verbose = F)            # irlba truncated svd

new_x = as.matrix(X) %*% svd_irlb$v               # new_x using the 40 right singular vectors

fit = KernelKnnCV(as.matrix(new_x), y, k = 8, folds = 4, method = 'braycurtis',
                  
                  weights_function = 'biweight_tricube_MULT', regression = F,
                  
                  threads = 6, Levels = sort(unique(y)))


# str(fit)

acc_fit = unlist(lapply(1:length(fit$preds),
                        
                        function(x) acc(y[fit$folds[[x]]], 
                                        
                                        fit$preds[[x]])))
acc_fit

## [1] 0.4080667 0.4097333 0.4040000 0.4102667

cat('mean accuracy using cross-validation :', mean(acc_fit), '\n')

## mean accuracy using cross-validation : 0.4080167

The accuracy of a 4-fold cross-validation model using truncated svd is 40.8%.

KernelKnnCV using HOG (histogram of oriented gradients)

Next, I’ll run the KernelKnnCV using the HOG-descriptors,

hog = HOG_apply(X, cells = 6, orientations = 9, rows = 32,
                
                columns = 32, threads = 6)

## 
## time to complete : 3.394621 secs

dim(hog)

## [1] 60000   324

fit_hog = KernelKnnCV(hog, y, k = 20, folds = 4, method = 'braycurtis',
                  
                  weights_function = 'biweight_tricube_MULT', regression = F,
                  
                  threads = 6, Levels = sort(unique(y)))


# str(fit_hog)

acc_fit_hog = unlist(lapply(1:length(fit_hog$preds), 
                            
                            function(x) acc(y[fit_hog$folds[[x]]], 
                                            
                                            fit_hog$preds[[x]])))
acc_fit_hog

## [1] 0.5807333 0.5884000 0.5777333 0.5861333

cat('mean accuracy for hog-features using cross-validation :', mean(acc_fit_hog), '\n')

## mean accuracy for hog-features using cross-validation : 0.58325

By using hog-descriptors in a 4-fold cross-validation model the accuracy in the cifar-10 data increases from 40.8% to 58.3% (approx. 17.5% difference).