One-Shot Learning with Siamese Network [For Facial Recognition]

The following article talks about the need for using One-shot learning along with its variations and drawbacks.

To begin with, in order to train any deep learning model, we need a large amount of data so that our model performs the desired prediction or classification task efficiently. For instance, detecting a dog from images will require you to train a neural network model on hundreds and thousands of dog and non-dog images for it to accurately distinguish one from the other. However, this neural network model will fail to work if it is trained on one or very few training data. 

With the lack of data, extracting relevant features at different layers becomes difficult. The model will not be able to generalize well between different classes thereby affecting its overall performance.

For illustration, consider the example of facial recognition at an airport. In this, we do not have the liberty to train our model of hundreds and thousands of images of each person containing different expressions, background lighting et al. With more than thousands of passengers arriving daily it is an impossible task! Besides, storing such a huge chunk of data adds up to the cost. 

To tackle the above problem, we use a technique in which classification or categorization tasks can be achieved with one or a few examples to classify many new examples. This technique is called One-shot learning. 

In recent years One-shot learning technology is being used extensively in facial recognition and passport checks. The concept being used is- The model takes input 2 images; one being the image from the passport and the other being the image of the person looking at the camera. The model then outputs a value which is the similarity between the 2 images. If the value of the output is low then the two images are similar else they are different.

Siamese Network

The architecture used for One-shot learning is called the Siamese Network. This architecture comprises two parallel neural networks with each taking different input. The output of the model is a value or a similarity index which indicates whether the two input images are alike or not. A value below a pre-defined threshold corresponds to the high similarity between the two images and visa versa. 

When the images are passed a series of Convolutional layers, max-pooling layers, and fully connected layers what we achieve is a vector that encodes the features of the images. Here because we input two images, two vectors encompassing the features of the input images will be generated. The value which we were talking about is the distance between the two feature vectors which can be calculated by finding the norm of the difference between the two vectors. 

Advantages and Disadvantages of Siamese Networks

As one of the matching networks for one shot learning, when working with SNN, you should remember these pros and cons.

Advantages of SNNs

  • Siamese networks demonstrate much higher speed and accuracy when identifying faces, images, and more such similarities than other neural networks.
  • You do not have to retrain Siamese networks to detect new classes after initially training them to work with large datasets. That is not possible with other neural networks, which have to be completely retrained.
  • Models can display improved generalization performance when both outputs are based on the same parameters, especially when the model is dealing with objects that are similar but not identical.

Trending Machine Learning Skills

Drawbacks of SNNs

  • The main challenge you will face with Siamese networks is that it needs higher computational power to work on twice as many operations required to train two models compared to other CNNs.
  • Siamese networks have a huge memory requirement.
  • SNNs also take much longer to train since they learn by comparing pairs of items.

Triplet loss function

As the name suggests, to train the model we require three images- one anchor (A) image, one positive (P), and one negative (N) image. Since two inputs can be provided to the model, an anchor image with either a positive or negative image is given. The model learns the parameter in such a fashion that the distance between the anchor image and the positive image is low while the distance between the anchor image and the negative image is high. 

The constructive loss function penalizes the model if the distance between A and N is low or A and P is high, while it encourages the model or learns features when the distance between A and N is high and A and P is low.

To understand more about the anchor, positive and negative images let’s consider the previous example of that at an airport. In such a case, the anchor image will be your image when you look at the camera, the positive image will be the one on your passport photo and the negative image will be a random image of a passenger present at the airport. 

Whenever we train a Siaseme network we provide it with the APN trios (Anchor, positive and negative) images. Creating this dataset is much easier and would require fewer images to train. 

Learn ML Course from the World’s top Universities. Earn Masters, Executive PGP, or Advanced Certificate Programs to fast-track your career

Limitations of One-shot learning

One-shot learning is still a mature machine learning algorithm and does possess some limitations. For instance, the model will not work well if the input image has some modifications- a person wearing a hat, sunglasses et al. Further, a model that is trained for one application cannot be generalized for another application. 

Moving on let’s see a few variations of One-shot learning which entails Zero-shot learning and Few-shot learning.

Zero-shot learning

Zero-shot learning is the ability of the model to identify new or unseen labeled data while being trained on seen data and knowing the semantic features of new or unseen data. For instance, a child who has seen a cat can identify it by its distinct features. Moreover, if the child is aware that the dog’s bark and possesses more solid characteristics than a cat, then the child would have no problem in recognizing the dog.

To conclude, we can say that ZSL recognition functions in a manner that takes into account the labeled training set of seen classes coupled with the knowledge about how each unseen class is semantically related to the seen classes.

Few Shot Learning

In Few shot learning, models require a very short amount of data to make predictions, compared to the large amounts that other models require of learning. It is a meta-learning form involving training on multiple related tasks during the meta-training phase. It enables the model to effectively generalize when faced with new data and only a few examples.

Few-shot learning is used in computer vision, natural language processing, robotics, and audio processing.

How is Few Shot Learning Helpful?

There are several reasons why Few shot learning is helpful:

  • It can be used when you want to reduce the data collection as it does not need much data to train the model. It also helps reduce the cost of data collection and computation.
  • In case of insufficient data, you can use Few-shot learning to make accurate predictions. Other machine learning tools, whether supervised or unsupervised, find it difficult to do without sufficient data.
  • Judging by a few examples, humans can categorize various handwritten characters, which is difficult for machines to do since they need large amounts of data to train. Few-shot learning can achieve the same feat as humans, owing to the small data it can work with.
  • Through the use of few-shot learning, machines can learn about rare diseases. These machines can classify anomalies with minimal data by employing computer vision models.

N-shot learning

As the name suggests, in N shot learning we will have n labeled data of each class available for training. The model is trained on K classes each containing n labeled data. After extracting relevant features and patterns the model has to categorize a new unlabelled image into one of the K classes. They use Matching networks that work on the nearest neighbors based approach trained fully end to end. 

Main Difference Between One-Shot, Few-Shot and Zero-Shot Learning 

One shot learning requires one labeled example for each new class. Few-shot learning requires a small number of examples for each new class, and zero-shot learning requires no labeled example for a new class.

Few-shot learning is a variation of one-shot learning since it requires more than one training image.

Zero-shot learning aims to classify unknown classes without any training data. The way it learns here is by using the image’s metadata or important information. This method mimics how humans learn. For example, if you read a detailed description of an elephant in a book, you will easily recognize it in real life or a photo.

Popular AI and ML Blogs & Free Courses


In conclusion, the field of One-shot learning and its counterparts have immense potential to solve some of the challenging problems. Though, being a relatively new area of research, it is making fast progress, and researchers are working trying to bridge the gap between machines and humans. 

With this, we have come to an end of this post, I hope you enjoyed reading it. 

If you’re interested to learn more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Want to share this article?

Lead the AI Driven Technological Revolution

Leave a comment

Your email address will not be published. Required fields are marked *

Our Popular Machine Learning Course

Get Free Consultation

Leave a comment

Your email address will not be published. Required fields are marked *

Get Free career counselling from upGrad experts!
Book a session with an industry professional today!
No Thanks
Let's do it
Get Free career counselling from upGrad experts!
Book a Session with an industry professional today!
Let's do it
No Thanks