
15 Language models and the Transformer
This chapter covers
- How to generate text with a deep learning model
- Training a model to translate from English to Spanish
- The Transformer, a powerful architecture for text modeling problems
With the basics of text preprocessing and modeling covered in the previous chapter, this chapter will tackle some more involved language problems such as machine translation. We will build up a solid intuition for the Transformer model that powers products like ChatGPT and has helped trigger a wave of investment in NLP.
15.1 The language model
In the previous chapter, we learned how to convert text data to numeric inputs, and we used this numeric representation to classify movie reviews. However, text classification is, in many ways, a uniquely simple problem: we only need to output a single floating-point number for binary classification and, at worst, N numbers for N-way classification.
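To make the contrast concrete, here is a minimal Keras sketch of those two kinds of output heads. The layer sizes are illustrative assumptions, not tied to any particular dataset:

```python
from keras import layers

# Binary classification: a single floating-point output,
# squashed into [0, 1] by a sigmoid.
binary_head = layers.Dense(1, activation="sigmoid")

# N-way classification: N outputs normalized into a probability
# distribution by a softmax (N=10 is an arbitrary example).
multiclass_head = layers.Dense(10, activation="softmax")
```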
What about other text-based tasks like question answering or translation? For many real-world problems, we are interested in a model that can generate a text output for a given input. Just as we needed tokenizers and embeddings to handle text on the way into a model, we must build up some new techniques before we can produce text on the way out.
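As a preview of where this section is headed, text generation typically works by repeatedly predicting a probability distribution over the next token and sampling from it. The sketch below assumes a hypothetical `model.predict_next()` method and a `tokenizer` with `encode`/`decode` methods; these are placeholder names for illustration, not a real library API:

```python
import numpy as np

def generate(model, tokenizer, prompt, max_new_tokens=20):
    # Convert the prompt to a list of token IDs
    # (tokenizer.encode is a placeholder helper).
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # Hypothetical call returning next-token probabilities,
        # an array of shape (vocab_size,).
        probs = model.predict_next(tokens)
        # Sample the next token from the predicted distribution.
        next_token = np.random.choice(len(probs), p=probs)
        tokens.append(int(next_token))
    # Turn the token IDs back into a string.
    return tokenizer.decode(tokens)
```

We will build up each piece of this loop, including the model that produces those next-token probabilities, over the course of the chapter.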