Semantic Image Embedding using Convolution Neural Networks

(2019)

A computer vision research project to design and train a convolutional neural network that converts an image into an n-dimensional feature vector, such that similar images are encoded close together while images belonging to different classes are encoded far apart in the n-dimensional embedding space.
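The project description does not name the training objective. One common way to express "similar images close, different classes far apart" is a triplet margin loss, sketched below purely as an assumption. The function names, the margin value, and the 2-D toy vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed objective, not confirmed by the project description.

    The loss is zero once the negative sample is at least `margin`
    farther from the anchor than the positive sample is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings standing in for n-dimensional feature vectors.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # same class as the anchor, distance 1.0

# A different-class image far from the anchor (distance 5.0) incurs no loss.
well_separated = triplet_margin_loss(anchor, positive, [3.0, 4.0])

# A different-class image too close to the anchor (distance 1.5) is penalized.
too_close = triplet_margin_loss(anchor, positive, [0.0, 1.5])
```

Minimizing a loss of this shape over many triplets pushes same-class embeddings together and different-class embeddings apart, which is the property the downstream distance-based classifier relies on.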

The motivation is that data is the essential fuel of any AI project, yet most of the time there is not enough data available to train a neural network efficiently. Additionally, when a new category has to be added to a model, the entire neural network is retrained, which is a costly process in terms of time, computing power, and accuracy. Most neural networks are trained on a predefined number of categories and are specific to a particular problem domain, which makes later amendments extremely difficult.


By developing a neural network that captures the distinction between similar and dissimilar images by encoding each image into a feature vector, one can use statistical methods such as nearest-neighbor search or a simple Euclidean distance to build a classification model that is dynamic and incremental in nature. Such a network avoids the need for retraining when training samples change or a new category is added. The main advantage of this technique is that a classification model can be built with a bare minimum of samples (around 20 per class) and still provide significant initial accuracy, which can gradually increase as new samples are added.
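A minimal sketch of such a dynamic classifier, assuming the embedding network has already turned each image into a feature vector. The `EmbeddingIndex` class and the toy 2-D vectors are hypothetical; a real index would hold n-dimensional embeddings.

```python
import math


class EmbeddingIndex:
    """Nearest-neighbor classifier over precomputed feature vectors."""

    def __init__(self):
        self._entries = []  # list of (vector, label) pairs

    def add(self, vector, label):
        """Index one labeled sample; no network training is involved."""
        self._entries.append((list(vector), label))

    def classify(self, vector):
        """Return the label of the indexed vector closest in Euclidean distance."""
        if not self._entries:
            raise ValueError("index is empty")
        _, label = min(self._entries, key=lambda entry: math.dist(entry[0], vector))
        return label


index = EmbeddingIndex()
index.add([0.0, 0.1], "cat")
index.add([0.2, 0.0], "cat")
index.add([5.0, 5.1], "dog")

query = [9.0, 9.2]
before = index.classify(query)  # closest indexed sample is a "dog"

# Adding a new category is just an append to the index.
index.add([9.1, 9.0], "bird")
after = index.classify(query)   # the same query now matches "bird"
```

Because the network only produces vectors, adding a category or more samples per class changes the index, not the model weights.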

Below are the top-1 match results for a few of the datasets used for analysis. Each dataset was divided into an Index Set (for indexing the classifier) and a Test Set (for evaluation). With no additional training on these datasets, we used a generic pre-trained image embedding network to encode the images into feature vectors, and then used nearest-neighbor search along with detailed statistical calculations to refine and determine the exact match from the Index Set.
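The Index Set / Test Set evaluation described above can be sketched as follows. This is an illustration under assumptions: it uses plain nearest-neighbor matching only and omits the additional statistical refinement, whose details are not given. The function name and the toy vectors are hypothetical and do not reproduce any reported result.

```python
import math


def top1_accuracy(index_set, test_set):
    """Fraction of test vectors whose nearest indexed vector has the same label.

    Both arguments are lists of (vector, label) pairs that were already
    encoded by the embedding network.
    """
    if not index_set or not test_set:
        raise ValueError("index_set and test_set must be non-empty")
    correct = 0
    for vector, true_label in test_set:
        _, predicted = min(index_set, key=lambda entry: math.dist(entry[0], vector))
        if predicted == true_label:
            correct += 1
    return correct / len(test_set)


# Hypothetical toy data, one indexed sample per class.
index_set = [([0.0, 0.0], "a"), ([10.0, 10.0], "b")]
test_set = [
    ([1.0, 0.0], "a"),
    ([9.0, 9.0], "b"),
    ([4.0, 4.0], "b"),  # closer to the "a" sample, so this one is misclassified
    ([6.0, 6.0], "b"),
]
accuracy = top1_accuracy(index_set, test_set)
```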
