Transformers have taken the world of NLP by storm in the last few years. Now they are being used with success in applications beyond NLP as well.
The Transformer gets its power from the Attention module, which captures the relationships between each word in a sequence and every other word.
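To make that "every word with every other word" idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The names Q, K, V and the toy shapes are purely illustrative, not taken from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each row of the score matrix compares one word (a row of Q)
    # against every other word (the rows of K)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: a "sequence" of 3 words, embedding size 4
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = attention(x, x, x)  # self-attention: Q = K = V = x
print(w.shape)               # (3, 3): one weight per word pair
```

The weight matrix is 3x3 because every one of the 3 words attends to all 3 words, including itself.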
But the all-important question is: how exactly does it do that?
In this article, we will attempt to answer that question, and understand why it performs the calculations that it does.
I have a few more articles in my series on Transformers. In those articles…
The Batch Norm layer is frequently used in deep learning models in association with a Convolutional or Linear layer. Many state-of-the-art Computer Vision architectures such as Inception and ResNet rely on it to create deeper networks that can be trained faster.
In this article, we will explore why Batch Norm works and why it lets models train in fewer epochs.
You might also enjoy reading my other article on Batch Norm which explains, in simple language, what Batch Norm is and walks through, step by step, how it operates under the hood.
And if you’re interested in Neural…
Optimizers are a critical component of neural network architecture. And Schedulers are a vital part of your deep learning toolkit. During training, they play a key role in helping the network learn to make better predictions.
But what ‘knobs’ do they have to control their behavior? And how can you best use them to tune hyperparameters and improve the performance of your model?
When defining your model there are a few important choices to be made — how to prepare the data, the model architecture, and the loss function. …
Batch Norm is an essential part of the toolkit of the modern deep learning practitioner. Soon after it was introduced in the Batch Normalization paper, it was recognized as being transformational in creating deeper neural networks that could be trained faster.
Batch Norm is a neural network layer that is now commonly used in many architectures. It often gets added as part of a Linear or Convolutional block and helps to stabilize the network during training.
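As a quick preview of that stabilizing effect, here is a minimal NumPy sketch of the training-time forward pass of a Batch Norm layer (it ignores the running statistics used at inference; the toy batch and the gamma/beta values are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature across the batch dimension,
    # then scale and shift with the learnable gamma and beta
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy batch: 8 samples, 4 features (e.g. the output of a Linear layer)
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # ~0 for every feature
print(y.std(axis=0))   # ~1 for every feature
```

With gamma = 1 and beta = 0 the layer simply standardizes each feature; during training the network learns gamma and beta, so it can undo the normalization wherever that helps.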
In this article, we will explore what Batch Norm is, why we need it and how it works.
You might also enjoy reading my…
Most NLP applications such as machine translation, chatbots, text summarization, and language models generate some text as their output. In addition, applications like image captioning or automatic speech recognition (i.e., Speech-to-Text) output text, even though they may not be considered pure NLP applications.
The common problem when training these applications is how do we decide how ‘good’ that output is?
With applications like, say, image classification, the predicted class can be compared unambiguously with the target class to decide whether the output is correct or not. However, the problem is much trickier with applications where the output is a sentence.
Generating Image Captions using deep learning has produced remarkable results in recent years. One of the most widely used architectures was presented in the Show, Attend and Tell paper.
The innovation that it introduced was to apply Attention, which has seen much success in the world of NLP, to the Image Caption problem. Attention helped the model focus on the most relevant portion of the image as it generated each word of the caption.
In this article, we will walk through a simple demo application to understand how this architecture works in detail.
I have another article that provides an overview…
Image Captioning is a fascinating application of deep learning that has made tremendous progress in recent years. What makes it even more interesting is that it brings together both Computer Vision and NLP.
It takes an image as input and produces a short textual summary describing the content of the photo.
Location data is an important category of data that you frequently have to deal with in many machine learning applications. It typically provides a lot of extra context to your application’s data.
For instance, you might want to forecast e-commerce sales based on your customer data. The machine learning model might identify customer buying patterns more accurately by also accounting for customer location information. This would become all the more important if this were for a physical site (rather than online) such as retail stores, restaurants, hotels, or hospitals.
Optimizers are a critical component of a Neural Network architecture. During training, they play a key role in helping the network learn to make better and better predictions.
They do this by finding the optimal set of model parameters, like weights and biases, so that the model can produce the best outputs for the problem it is solving.
The optimization technique used by most neural networks is gradient descent.
Most popular deep learning libraries, such as PyTorch and Keras, have a plethora of built-in optimizers based on gradient descent, e.g. …
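To show the core idea those optimizers share, here is a plain-Python sketch of gradient descent on a one-parameter model y = w * x. The toy data and learning rate are illustrative; library optimizers apply the same basic update, w ← w − lr * gradient:

```python
# Hypothetical toy data with the true relationship y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # initial weight
lr = 0.02  # learning rate

for step in range(200):
    # Mean squared error loss: L = mean((w*x - y)^2)
    # Its gradient w.r.t. w:   dL/dw = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # the gradient-descent update

print(round(w, 3))  # converges close to 2.0
```

Each step nudges w in the direction that reduces the loss, which is exactly the role an optimizer plays for the millions of weights in a real network.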
Many NLP applications such as machine translation, chatbots, text summarization, and language models generate some text as their output. In addition, applications like image captioning or automatic speech recognition (i.e., Speech-to-Text) output text, even though they may not be considered pure NLP applications.
There are a couple of algorithms commonly used by all of these applications as the last step of producing their final output.