Multi-Object Classification using Local Features and Support Vector Domain Description (SVDD)

For a complete understanding, please have a look at my thesis.


Goal of this research

The goal of this research is to reduce the computation time required by the learning and testing phases of an SVM-based classification system that uses local features. We address this problem by combining local features with multi-class SVDD and incremental learning. Using local features with kernel methods is not straightforward: the number of features that a feature extractor produces differs from image to image, so the feature sets cannot be used directly as input to SVDD. We therefore employ a vocabulary-guided pyramid match kernel, which is suited to local features and computes a similarity value between the sets of local features of two images.
The goal of this thesis is to perform classification rather than recognition. By classification we mean that the test images presented to a trained classifier are different from the training images, whereas in a recognition system the query image is matched against the original training images stored in a database. The figure below shows object recognition using SIFT feature matching between two images. The second image is a rotated and scaled version of the first. The lines connect the same local features detected and matched in both images. The two images were matched using the original SIFT match binaries available on David Lowe’s website.


Object Recognition using SIFT


Support Vector Domain Description

SVDD finds the domain density of a class of data by mapping the data into feature space. The goal is to find a sphere, or domain, of minimum volume that encloses all, or most, of the data. SVDD is also called a one-class classifier or a density estimation technique. To find the domain of minimum volume, the radius of the sphere is minimized under the constraint that the distance between the center and each data point is no larger than the radius. The approach is similar to SVMs in that support vectors define the class density description. The following figure shows how support vectors are used to find the domain density of a class of data, with both a linear kernel and a non-linear kernel. A non-linear kernel such as the Gaussian can give a better description of the data.


Domain Description using linear and Gaussian Kernels
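
The minimum-volume sphere described in this section is usually written as the optimization problem below. This is the standard soft-margin SVDD formulation, given here as a reference sketch; the notation (center a, radius R, slack variables \xi_i, trade-off constant C, feature map \phi, kernel K, dual coefficients \alpha_i) is assumed for illustration and may differ from the notation used in the thesis. The slack variables are what allow the sphere to enclose "most" rather than all of the data, and the second equation gives the squared distance of a test point z from the center using only kernel evaluations.

```latex
\[
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{N} \xi_i \\
\text{subject to}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, N
\end{aligned}
\]

\[
\lVert \phi(z) - a \rVert^2
  = K(z, z) - 2 \sum_{i=1}^{N} \alpha_i K(x_i, z)
  + \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j K(x_i, x_j),
\qquad a = \sum_{i=1}^{N} \alpha_i \phi(x_i)
\]
```

Only the points with non-zero \alpha_i (the support vectors) contribute to the center, which is why the description stays compact.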

Multi-Class Support Vector Domain Description

The basic idea of this method is to find the distribution of each class's data; when new data arrives, its distance from each class center is used to decide which class it belongs to. The distance in feature space measures how close a test point actually is to a given class. It is therefore important that the distribution of the data in feature space is discriminative. The vocabulary-guided pyramid match kernel helps to obtain a good distribution of the data because it uses k-means clustering to group the data. Combining the vocabulary-guided pyramid match kernel with multi-class SVDD gives the distribution of each domain, or class, in feature space. As mentioned above, a test sample is then classified by comparing it against all domains in feature space.
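
The nearest-domain rule described above can be sketched in Python as follows. This is a minimal illustration and not the thesis implementation: the function names are hypothetical, a Gaussian kernel on fixed-length vectors stands in for the vocabulary-guided pyramid match kernel, and uniform coefficients stand in for trained SVDD coefficients.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def squared_distance_to_center(z, points, alphas, kernel):
    """Squared feature-space distance from z to the center sum_i alpha_i * phi(x_i)."""
    cross = sum(a * kernel(x, z) for a, x in zip(alphas, points))
    center_norm = sum(
        ai * aj * kernel(xi, xj)
        for ai, xi in zip(alphas, points)
        for aj, xj in zip(alphas, points)
    )
    return kernel(z, z) - 2.0 * cross + center_norm


def classify(z, class_models, kernel):
    """Return the label whose domain center is closest to z in feature space."""
    return min(
        class_models,
        key=lambda label: squared_distance_to_center(z, *class_models[label], kernel),
    )


# Assumption: uniform weights are placeholders for trained SVDD coefficients.
models = {
    "a": ([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)], [1 / 3] * 3),
    "b": ([(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)], [1 / 3] * 3),
}
print(classify((0.2, 0.3), models, gaussian_kernel))
```

Because each class keeps its own points and coefficients, a new class can be added to `models` without touching the existing entries.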


Incremental Learning
Incremental learning is an important feature of multi-class classification systems. A large number of classes, each with a large number of training patterns, requires long training times. This is a major hurdle for systems that need real-time performance. Unlike batch learning, incremental SVDD [18] does not retrain on all the previously available data; it trains only on the newly added data. This gives very fast training times for a large number of training patterns in a large number of classes. We also used incremental learning in order to compare the training times of the batch method and the incremental method.
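
One way to picture the difference from batch learning is the sketch below. It is not the algorithm of [18]: it uses a common support-set heuristic (keep only the points that define the current sphere, then retrain on those plus the new data), and it fits a hard-margin sphere with the simple farthest-point iteration of Badoiu and Clarkson instead of a QP solver. All function names and the `keep_above` threshold are hypothetical, and a linear kernel is used only so the result is easy to check by hand.

```python
def linear_kernel(x, y):
    """Dot product; chosen only to keep the example easy to verify."""
    return sum(a * b for a, b in zip(x, y))


def fit_ball(points, kernel, iterations=200):
    """Approximate the minimum enclosing ball of `points` in feature space.

    Returns (alphas, radius_squared); the center is sum_i alphas[i] * phi(points[i]).
    """
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]
    alphas = [1.0] + [0.0] * (n - 1)  # start with the center on the first point

    def squared_distances(weights):
        center_norm = sum(
            weights[i] * weights[j] * gram[i][j] for i in range(n) for j in range(n)
        )
        return [
            gram[k][k] - 2.0 * sum(weights[i] * gram[i][k] for i in range(n)) + center_norm
            for k in range(n)
        ]

    for t in range(1, iterations + 1):
        dists = squared_distances(alphas)
        farthest = max(range(n), key=lambda k: dists[k])
        step = 1.0 / (t + 1)
        # Move the center a fraction `step` of the way toward the farthest point.
        alphas = [(1.0 - step) * a for a in alphas]
        alphas[farthest] += step
    return alphas, max(squared_distances(alphas))


def incremental_update(support_points, new_points, kernel, iterations=200, keep_above=0.05):
    """Retrain on the retained support points plus the new points only."""
    candidates = list(support_points) + list(new_points)
    alphas, radius_sq = fit_ball(candidates, kernel, iterations)
    kept = [p for p, a in zip(candidates, alphas) if a > keep_above]
    return kept, radius_sq


# First batch: the interior point (1.0, 0.1) is not retained.
support, radius_sq = incremental_update([], [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)], linear_kernel)
# Second batch: only the retained points and the new point are used for training.
support, radius_sq = incremental_update(support, [(1.0, 3.0)], linear_kernel)
print(support, radius_sq)
```

The cost of each update depends on the size of the retained set plus the new batch, not on everything seen so far, which is the property that makes incremental training attractive for large numbers of classes and patterns.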


Pyramid Match Kernel with Support Vector Domain Description
The objective of this thesis has been to classify large categories of data using the local feature approach. We want our classification system to perform similarly to, or at least close to, humans. For this purpose we combined local features for image representation with statistical learning techniques used for pattern recognition. To combine the two approaches we needed a kernel that could exploit the advantages of local features while using the learning characteristics of SVDD. We achieved this by using the vocabulary-guided pyramid match kernel to compute similarities between the sets of local features extracted from images, and plugging this kernel into the multi-class SVDD [6] method for efficient classification. Our method is useful in two ways. First, training is very fast, because both the kernel computation for high-dimensional local features with the VG pyramid match kernel and learning with multi-class SVDD are computationally efficient.
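
To show how a pyramid match turns two feature sets of different sizes into a single similarity value, here is a minimal sketch. It implements the simpler uniform-bin pyramid match on one-dimensional features, not the vocabulary-guided variant used in this work (which builds its bins with hierarchical k-means over high-dimensional descriptors). The function names and the toy feature values are hypothetical.

```python
from collections import Counter


def histogram(features, bin_width):
    """Count how many features fall into each bin of the given width."""
    return Counter(int(f // bin_width) for f in features)


def intersection(h1, h2):
    """Histogram intersection: number of features matched within shared bins."""
    return sum(min(count, h2[b]) for b, count in h1.items())


def pyramid_match(set_x, set_y, levels=4):
    """Weighted count of new matches found at each, progressively coarser, level."""
    score = 0.0
    previous = 0
    for level in range(levels):
        width = 2 ** level  # bins double in width at every level
        matches = intersection(histogram(set_x, width), histogram(set_y, width))
        # Matches first formed in wider bins are less precise, so they weigh less.
        score += (matches - previous) / width
        previous = matches
    return score


# Two "images" with different numbers of features still yield one similarity value.
print(pyramid_match([1, 5, 9], [1, 6]))
```

In practice the value is often normalized by the self-similarities of the two sets so that larger sets do not dominate; that step is omitted here for brevity.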



Training & Testing Criteria

  • Number of training images: 5, 10, 15, 20, 25, and 30 per class.
  • Number of test images: fixed at 20 per class.

Because the number of images differs from class to class, it has been suggested to use a fixed number of test images while increasing the number of training images 5 at a time and measuring the classification accuracy at each step. In each experiment the training and test images are varied, drawn from the images available for each class. The overall accuracy is the average of 10 runs. All tests were done on the CALTECH-101 dataset.
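
The protocol above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `train_and_score` is a placeholder for whatever trains a classifier and returns its accuracy, and the sketch simply raises an error for a class with too few images, since how such classes are handled is not described here.

```python
import random


def split_class(images, n_train, n_test=20, rng=random):
    """Randomly pick disjoint training and test images for one class."""
    if len(images) < n_train + n_test:
        raise ValueError("not enough images for the requested split")
    shuffled = rng.sample(images, len(images))
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]


def average_accuracy(dataset, n_train, train_and_score, runs=10, seed=0):
    """Average the accuracy over several random train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = {}, {}
        for label, images in dataset.items():
            train[label], test[label] = split_class(images, n_train, rng=rng)
        scores.append(train_and_score(train, test))
    return sum(scores) / runs


# Toy data: image identifiers only; the scoring function is a stand-in.
toy_dataset = {"class_a": list(range(50)), "class_b": list(range(100, 160))}
for n_train in (5, 10, 15, 20, 25, 30):
    print(n_train, average_accuracy(toy_dataset, n_train, lambda tr, te: 0.5))
```

Fixing the seed makes the ten splits reproducible, which helps when comparing SVDD and SVM on identical partitions.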


CALTECH-101 Dataset


Accuracy Comparison between SVDD & SVM




Timing Comparison between SVDD & SVM


Timing SIFT

Timing SURF


Incremental learning with local features and SVDD

Training Time Comparisons

Incremental SIFT

Incremental SURF


Accuracy Comparison

SVM Incremental SVDD SURF