HANDS-ON TUTORIALS, INTUITIVE TRANSFORMERS SERIES NLP

A Gentle Guide to how the Attention Score calculations capture relationships between words in a sequence, in Plain English.

Photo by Olav Ahrens Røtne on Unsplash

Transformers have taken the world of NLP by storm in the last few years. Now they are being used with success in applications beyond NLP as well.

The Transformer gets its power from the Attention module, which captures the relationships between each word in a sequence and every other word.

But the all-important question is: how exactly does it do that?

In this article, we will attempt to answer that question, and understand why it performs the calculations that it does.
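To give a flavor of the core calculation up front (this is a generic illustration with made-up toy data, not the article's exact walkthrough), here is a minimal NumPy sketch of scaled dot-product attention, the computation that relates every word in a sequence to every other word:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention.
    Q, K, V: (seq_len, d) arrays of query/key/value vectors, one row per word."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # (seq_len, seq_len): every word scored against every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V, weights                # weighted sum of value vectors

# 3 words with 4-dimensional embeddings (random values, purely for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(x, x, x)
print(attn)   # each row sums to 1: how strongly word i attends to every word j
```

Each row of the resulting weight matrix shows how much one word "pays attention" to all the others, which is the relationship the article unpacks.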

I have a few more articles in my series on Transformers. In those articles…


HANDS-ON TUTORIALS, INTUITIVE DEEP LEARNING SERIES

A Gentle Guide to the reasons for the Batch Norm layer’s success in making training converge faster, in Plain English

Photo by AbsolutVision on Unsplash

The Batch Norm layer is frequently used in deep learning models in association with a Convolutional or Linear layer. Many state-of-the-art Computer Vision architectures such as Inception and Resnet rely on it to create deeper networks that can be trained faster.

In this article, we will explore why Batch Norm works and why a model that uses it requires fewer training epochs to converge.
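As a rough sketch of what the layer actually computes (a simplified, training-mode-only version with made-up activations, not the article's full explanation), Batch Norm normalizes each feature over the batch and then applies a learned scale and shift:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Simplified Batch Norm (training mode only, no running statistics).
    x: (batch_size, num_features); gamma, beta: learned scale and shift."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero-mean, unit-variance activations
    return gamma * x_hat + beta               # learned scale and shift restore flexibility

x = np.random.randn(32, 8) * 5 + 10           # badly scaled activations
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # roughly 0 and 1 per feature
```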

You might also enjoy reading my other article on Batch Norm which explains, in simple language, what Batch Norm is and walks through, step by step, how it operates under the hood.

And if you’re interested in Neural…


Hands-on Tutorials, INTUITIVE DEEP LEARNING SERIES

A Gentle Guide to boosting model training and hyperparameter tuning with Optimizers and Schedulers, in Plain English

Photo by Tim Mossholder on Unsplash

Optimizers are a critical component of neural network architecture. And Schedulers are a vital part of your deep learning toolkit. During training, they play a key role in helping the network learn to make better predictions.

But what ‘knobs’ do they have to control their behavior? And how can you make the best use of them to tune hyperparameters to improve the performance of your model?
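To make those knobs concrete, here is a minimal PyTorch sketch (the model, the dummy data, and the hyperparameter values are arbitrary placeholders chosen for illustration): the optimizer exposes hyperparameters such as the learning rate and momentum, and the scheduler adjusts the learning rate as training progresses.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                       # stand-in model
loss_fn = nn.MSELoss()

# The optimizer's "knobs": learning rate, momentum, weight decay, ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# The scheduler adjusts the learning rate over the course of training
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    x, y = torch.randn(64, 10), torch.randn(64, 1)   # dummy batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()        # update the weights using the optimizer's rules
    scheduler.step()        # halve the learning rate every 10 epochs
```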

When defining your model, there are a few important choices to be made — how to prepare the data, which model architecture to use, and which loss function to pick. …


Hands-on Tutorials, INTUITIVE DEEP LEARNING SERIES

A Gentle Guide to an all-important Deep Learning layer, in Plain English

Photo by Reuben Teo on Unsplash

Batch Norm is an essential part of the toolkit of the modern deep learning practitioner. Soon after it was introduced in the Batch Normalization paper, it was recognized as being transformational in creating deeper neural networks that could be trained faster.

Batch Norm is a neural network layer that is now commonly used in many architectures. It often gets added as part of a Linear or Convolutional block and helps to stabilize the network during training.
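For instance, a typical Convolutional or Linear block with Batch Norm might look like this minimal PyTorch sketch (the layer sizes are arbitrary and purely illustrative):

```python
import torch.nn as nn

# A common pattern: Convolution -> Batch Norm -> activation
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(num_features=16),   # normalizes each of the 16 channels over the batch
    nn.ReLU(),
)

# The same idea applied to a Linear layer
linear_block = nn.Sequential(
    nn.Linear(in_features=128, out_features=64, bias=False),
    nn.BatchNorm1d(num_features=64),
    nn.ReLU(),
)
```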

In this article, we will explore what Batch Norm is, why we need it and how it works.

You might also enjoy reading my…


INTUITIVE NLP SERIES

A Gentle Guide to two essential metrics (Bleu Score and Word Error Rate) for NLP models, in Plain English

Photo by engin akyurt on Unsplash

Most NLP applications, such as machine translation, chatbots, text summarization, and language models, generate some text as their output. In addition, applications like image captioning or automatic speech recognition (i.e. Speech-to-Text) also output text, even though they may not be considered pure NLP applications.

How good is the predicted output?

A common problem when training these applications is deciding how ‘good’ that output is.

With applications like, say, image classification, the predicted class can be compared unambiguously with the target class to decide whether the output is correct or not. However, the problem is much trickier with applications where the output is a sentence.
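To give a taste of how a metric can score sentence outputs, here is a minimal sketch of Word Error Rate computed with a standard word-level edit distance (a simplified version that assumes plain whitespace tokenization):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a standard edit-distance dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```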


INTUITIVE IMAGE CAPTIONS SERIES

An end-to-end example using Encoder-Decoder with Attention in Keras and Tensorflow 2.0, in Plain English

Photo by Max Kleinen on Unsplash

Generating Image Captions using deep learning has produced remarkable results in recent years. One of the most widely-used architectures was presented in the Show, Attend and Tell paper.

The innovation that it introduced was to apply Attention, which has seen much success in the world of NLP, to the Image Caption problem. Attention helped the model focus on the most relevant portion of the image as it generated each word of the caption.

In this article, we will walk through a simple demo application to understand how this architecture works in detail.
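As a flavor of what that looks like in code, here is a minimal Bahdanau-style attention layer in Keras (a simplified sketch in the spirit of the architecture, not the demo application's actual code; the layer sizes are illustrative):

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Scores each image feature location against the decoder's current state."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects the image features
        self.W2 = tf.keras.layers.Dense(units)   # projects the decoder hidden state
        self.V = tf.keras.layers.Dense(1)        # collapses to one score per image location

    def call(self, features, hidden):
        # features: (batch, num_locations, feature_dim) from the CNN encoder
        # hidden:   (batch, hidden_dim) from the RNN decoder
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)               # where to look in the image
        context = tf.reduce_sum(weights * features, axis=1)   # weighted sum of image features
        return context, weights
```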

I have another article that provides an overview…


Hands-on Tutorials, INTUITIVE IMAGE CAPTIONS SERIES

A Gentle Guide to Image Feature Encoders, Sequence Decoders, Attention, and Multi-modal Architectures, in plain English

Photo by Brett Jordan on Unsplash

Image Captioning is a fascinating application of deep learning that has made tremendous progress in recent years. What makes it even more interesting is that it brings together both Computer Vision and NLP.

What is Image Captioning?

It takes an image as input and produces a short textual summary describing the content of the photo.


Hands-on Tutorials, INTUITIVE GEO-LOCATION SERIES

A Gentle Guide to Feature Engineering and Visualization with Geospatial data, in Plain English

Photo by Daniel Olah on Unsplash

Location data is an important category of data that you frequently have to deal with in many machine learning applications. It typically provides a lot of extra context to your application’s data.

For instance, you might want to forecast e-commerce sales based on your customer data. The machine learning model might be able to identify more accurate customer buying patterns by also accounting for the customer location information. This becomes all the more important for physical sites (rather than online ones), such as retail stores, restaurants, hotels, or hospitals.
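As a simple illustration of the kind of feature this enables (the column names and the store coordinates below are hypothetical), you could engineer a ‘distance to the store’ style feature from raw latitude and longitude using the haversine formula:

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Hypothetical customer data with raw coordinates
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "lat": [40.71, 40.73, 40.65],
    "lon": [-74.00, -73.99, -73.95],
})
store_lat, store_lon = 40.72, -74.01   # a hypothetical store location
customers["dist_to_store_km"] = haversine_km(customers["lat"], customers["lon"], store_lat, store_lon)
print(customers)
```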


Hands-on Tutorials, INTUITIVE DEEP LEARNING

A Gentle Guide to fundamental techniques used by gradient descent optimizers like SGD, Momentum, RMSProp, Adam, and others, in plain English

Photo by George Stackpole on Unsplash

Optimizers are a critical component of a Neural Network architecture. During training, they play a key role in helping the network learn to make better and better predictions.

They do this by finding the optimal set of model parameters, like weights and biases, so that the model can produce the best outputs for the problem it is solving.

The most common optimization technique used by most neural networks is gradient descent.
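At its core, gradient descent repeatedly nudges each parameter in the direction opposite to its gradient. Here is a minimal sketch of the basic update, together with the momentum variant, on a toy one-parameter problem (the function and the hyperparameter values are made up for illustration):

```python
def sgd_step(w, grad, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Momentum: accumulate an exponentially-decaying sum of past gradients."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
for _ in range(50):
    w = sgd_step(w, grad=2 * (w - 3))
print(round(w, 4))     # ~3.0, the minimum of f

w_m, v = 0.0, 0.0
for _ in range(200):
    w_m, v = momentum_step(w_m, grad=2 * (w_m - 3), velocity=v)
print(round(w_m, 4))   # also converges to ~3.0
```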

Most popular deep learning libraries, such as Pytorch and Keras, have a plethora of built-in optimizers based on gradient descent, e.g. …


Hands-on Tutorials, INTUITIVE NLP SERIES

A gentle guide to how Beam Search enhances predictions, in plain English

Photo by Casey Horner on Unsplash

Many NLP applications, such as machine translation, chatbots, text summarization, and language models, generate some text as their output. In addition, applications like image captioning or automatic speech recognition (i.e. Speech-to-Text) also output text, even though they may not be considered pure NLP applications.

There are a couple of algorithms commonly used by all of these applications as part of the last step that produces their final output.

  • Greedy Search is one such algorithm. It is often used because it is simple and quick (a minimal sketch of it follows this list).
  • The alternative is to use Beam Search. It is very popular because, although it requires more…
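To make the first option concrete, here is a minimal sketch of Greedy Search over a made-up table of per-step output probabilities (the vocabulary and numbers are invented for illustration; Beam Search would instead keep the top-k partial sequences at every step):

```python
import numpy as np

def greedy_decode(step_probs, vocab):
    """Greedy Search: at every step, pick the single most probable token.
    step_probs: (num_steps, vocab_size) array of model output probabilities."""
    return [vocab[int(np.argmax(p))] for p in step_probs]

# Toy vocabulary and made-up model probabilities for a 3-step output
vocab = ["<end>", "the", "cat", "sat"]
probs = np.array([
    [0.1, 0.6, 0.2, 0.1],   # step 1: "the" is most likely
    [0.1, 0.1, 0.5, 0.3],   # step 2: "cat"
    [0.7, 0.1, 0.1, 0.1],   # step 3: "<end>"
])
print(greedy_decode(probs, vocab))   # ['the', 'cat', '<end>']
```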

Ketan Doshi

Machine Learning and Big Data
