--- title: "Image classification of the MNIST and CIFAR-10 data using KernelKnn and HOG (histogram of oriented gradients)" author: "Lampros Mouselimis" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Image classification of the MNIST and CIFAR-10 data using KernelKnn and HOG (histogram of oriented gradients)} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- In this vignette I'll illustrate how to increase the accuracy on the MNIST (to approx. 98.4%) and CIFAR-10 data (to approx. 58.3%) using the KernelKnn package and HOG (histogram of oriented gradients).
### dependencies * The MNIST and Cifar-10 data sets can be downloaded from Github using **system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/mnist.zip")** and **system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/cifar_10.zip")** * the **irlba** package, which is needed for comparison purposes, can be installed from CRAN directly * the *HOG_apply* function is part of the **OpenImageR** package, which can be installed from CRAN as well. ### MNIST data set The MNIST data set of handwritten digits has a training set of 70,000 examples and each row of the matrix corresponds to a 28 x 28 image. The unique values of the response variable *y* range from 0 to 9. More information about the data can be found in the *DataSets* repository (the folder includes also an Rmarkdown file). ```{r, eval = F, echo = T, warning = F, message = F, cache = T} # using system('wget..') on a linux OS system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/mnist.zip") mnist <- read.table(unz("mnist.zip", "mnist.csv"), nrows = 70000, header = T, quote = "\"", sep = ",") ``` ```{r, eval = F, cache = T} X = mnist[, -ncol(mnist)] dim(X) ## [1] 70000 784 # the KernelKnn function requires that the labels are numeric and start from 1 : Inf y = mnist[, ncol(mnist)] + 1 table(y) ## y ## 1 2 3 4 5 6 7 8 9 10 ## 6903 7877 6990 7141 6824 6313 6876 7293 6825 6958 ```
K nearest neighbors do not perform well in high dimensions due to the *curse of dimensionality* (k observations that are nearest to a given test observation x1 may be very far away from x1 in p-dimensional space when p is large [ An introduction to statistical learning, James/Witten/Hastie/Tibshirani, pages 108-109 ]), leading to a very poor k-nearest-neighbors fit. One option to overcome this problem would be to use truncated svd (irlba package) to reduce the dimensions of the data. A second option, which is appropriate in case of images, would be to use image descriptors. In this vignette, I'll compare those two approaches.

##### KernelKnnCV using truncated svd I experimented with different settings and the following parameters gave the best results,

```{r, eval = T, echo = F} knitr::kable(data.frame(irlba_singlular_vectors = 40, k = 8, method = 'braycurtis', kernel = 'biweight_tricube_MULT'), align = 'l') ```
```{r, eval = F, cache = T} library(irlba) svd_irlb = irlba(as.matrix(X), nv = 40, nu = 40, verbose = F) # irlba truncated svd new_x = as.matrix(X) %*% svd_irlb$v # new_x using the 40 right singular vectors ```
```{r, eval = F, cache = T, warning = FALSE, message = FALSE, results = 'hide'} library(KernelKnn) fit = KernelKnnCV(as.matrix(new_x), y, k = 8, folds = 4, method = 'braycurtis', weights_function = 'biweight_tricube_MULT', regression = F, threads = 6, Levels = sort(unique(y))) # str(fit) # evaluation metric acc = function (y_true, preds) { out = table(y_true, max.col(preds, ties.method = "random")) acc = sum(diag(out))/sum(out) acc } ```
```{r, eval = F, cache = F} acc_fit = unlist(lapply(1:length(fit$preds), function(x) acc(y[fit$folds[[x]]], fit$preds[[x]]))) acc_fit ## [1] 0.9742857 0.9749143 0.9761143 0.9741143 cat('mean accuracy using cross-validation :', mean(acc_fit), '\n') ## mean accuracy using cross-validation : 0.9748571 ```
Utilizing truncated svd a 4-fold cross-validation KernelKnn model gives a 97.48% accuracy.

##### KernelKnnCV and HOG (histogram of oriented gradients) In this chunk of code, besides KernelKnnCV I'll also use HOG. The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy (Wikipedia).
```{r, eval = F, cache = T} library(OpenImageR) hog = HOG_apply(X, cells = 6, orientations = 9, rows = 28, columns = 28, threads = 6) ## ## time to complete : 1.802997 secs dim(hog) ## [1] 70000 324 ```
```{r, eval = F, cache = T, warning = FALSE, message = FALSE, results = 'hide'} fit_hog = KernelKnnCV(hog, y, k = 20, folds = 4, method = 'braycurtis', weights_function = 'biweight_tricube_MULT', regression = F, threads = 6, Levels = sort(unique(y))) #str(fit_hog) ```
```{r, eval = F, cache = F} acc_fit_hog = unlist(lapply(1:length(fit_hog$preds), function(x) acc(y[fit_hog$folds[[x]]], fit_hog$preds[[x]]))) acc_fit_hog ## [1] 0.9833714 0.9840571 0.9846857 0.9838857 cat('mean accuracy for hog-features using cross-validation :', mean(acc_fit_hog), '\n') ## mean accuracy for hog-features using cross-validation : 0.984 ```
By changing from the simple svd-features to HOG-features the accuracy of a 4-fold cross-validation model increased from 97.48% to 98.4% (approx. 1% difference) ### CIFAR-10 data set CIFAR-10 is an established computer-vision dataset used for object recognition. The data I'll use in this example is a subset of an 80 million tiny images dataset and consists of 60,000 32x32 color images containing one of 10 object classes ( 6000 images per class ). Furthermore, the data were converted from RGB to gray, normalized and rounded to 2 decimal places (to reduce the storage size). More information about the data can be found in my *DataSets* repository (I included an Rmarkdown file).

I'll build the kernel k-nearest-neighbors models in the same way I've done for the mnist data set and then I'll compare the results. ```{r, eval = F, echo = T, warning = F, message = F, cache = T} # using system('wget..') on a linux OS system("wget https://raw.githubusercontent.com/mlampros/DataSets/master/cifar_10.zip") cifar_10 <- read.table(unz("cifar_10.zip", "cifar_10.csv"), nrows = 60000, header = T, quote = "\"", sep = ",") ``` ##### KernelKnnCV using truncated svd ```{r, eval = F, cache = T} X = cifar_10[, -ncol(cifar_10)] dim(X) ## [1] 60000 1024 # the KernelKnn function requires that the labels are numeric and start from 1 : Inf y = cifar_10[, ncol(cifar_10)] table(y) ## y ## 1 2 3 4 5 6 7 8 9 10 ## 6000 6000 6000 6000 6000 6000 6000 6000 6000 6000 ```
The parameter settings are similar to those for the mnist data, ```{r, eval = T, echo = F} knitr::kable(data.frame(irlba_singlular_vectors = 40, k = 8, method = 'braycurtis', kernel = 'biweight_tricube_MULT'), align = 'l') ```
```{r, eval = F, cache = T} svd_irlb = irlba(as.matrix(X), nv = 40, nu = 40, verbose = F) # irlba truncated svd new_x = as.matrix(X) %*% svd_irlb$v # new_x using the 40 right singular vectors ```
```{r, eval = F, cache = T, warning = FALSE, message = FALSE, results = 'hide'} fit = KernelKnnCV(as.matrix(new_x), y, k = 8, folds = 4, method = 'braycurtis', weights_function = 'biweight_tricube_MULT', regression = F, threads = 6, Levels = sort(unique(y))) # str(fit) ```
```{r, eval = F, cache = F} acc_fit = unlist(lapply(1:length(fit$preds), function(x) acc(y[fit$folds[[x]]], fit$preds[[x]]))) acc_fit ## [1] 0.4080667 0.4097333 0.4040000 0.4102667 cat('mean accuracy using cross-validation :', mean(acc_fit), '\n') ## mean accuracy using cross-validation : 0.4080167 ```
The accuracy of a 4-fold cross-validation model using truncated svd is 40.8%. ##### KernelKnnCV using HOG (histogram of oriented gradients)
Next, I'll run the KernelKnnCV using the HOG-descriptors,

```{r, eval = F, cache = T} hog = HOG_apply(X, cells = 6, orientations = 9, rows = 32, columns = 32, threads = 6) ## ## time to complete : 3.394621 secs dim(hog) ## [1] 60000 324 ```
```{r, eval = F, cache = T, warning = FALSE, message = FALSE, results = 'hide'} fit_hog = KernelKnnCV(hog, y, k = 20, folds = 4, method = 'braycurtis', weights_function = 'biweight_tricube_MULT', regression = F, threads = 6, Levels = sort(unique(y))) # str(fit_hog) ```
```{r, eval = F, cache = F} acc_fit_hog = unlist(lapply(1:length(fit_hog$preds), function(x) acc(y[fit_hog$folds[[x]]], fit_hog$preds[[x]]))) acc_fit_hog ## [1] 0.5807333 0.5884000 0.5777333 0.5861333 cat('mean accuracy for hog-features using cross-validation :', mean(acc_fit_hog), '\n') ## mean accuracy for hog-features using cross-validation : 0.58325 ```
By using hog-descriptors in a 4-fold cross-validation model the accuracy in the cifar-10 data increases from 40.8% to 58.3% (approx. 17.5% difference). ```{r, eval = F, echo = F} # remove cache and mnist.zip once vignettes are built # unlink("image_classification_using_MNIST_CIFAR_data_cache", recursive = TRUE) # USE this chunk in case of 'eval = TRUE' # unlink("mnist.zip", recursive = TRUE) # unlink("cifar_10.zip", recursive = TRUE) ```