16 Text generation

This chapter covers

  • A brief history of generative modeling
  • Training a miniature GPT model from scratch
  • Using a pretrained Transformer model to build a chatbot
  • Building a multimodal model that can describe images in natural language

When I first claimed that in a not-so-distant future, most of the cultural content we consume would be created with substantial help from AIs, I was met with utter disbelief, even from longtime machine learning practitioners. That was in 2014. Fast-forward a decade, and that disbelief has receded at incredible speed. Generative AI tools are now common additions to word processors, image editors, and development environments. Prestigious awards are going to literature and art created with generative models – to considerable controversy and debate.[1] It no longer feels like science fiction to consider a world where AI and artistic endeavors are often intertwined.

In any practical sense, AI is nowhere close to rivaling human screenwriters, painters, or composers. But replacing humans need not, and should not, be the point. In many fields, but especially in creative ones, people will use AI to augment their capabilities – more augmented intelligence than artificial intelligence.

16.1 A brief history of sequence generation

16.2 Training a mini-GPT

16.3 Building the model

16.4 Pretraining the model

16.5 Generative decoding

16.6 Sampling strategies

16.7 Using a pretrained LLM

16.7.1 Text generation with the Gemma model

16.8 Instruction fine-tuning

16.9 Low-Rank Adaptation (LoRA)

16.10 Going further with LLMs

16.10.1 Reinforcement Learning from Human Feedback (RLHF)

16.11 Multimodal LLMs

16.11.1 Foundation models

16.12 Retrieval-Augmented Generation (RAG)

16.13 “Reasoning” models

16.14 Where are LLMs heading next?

16.15 Chapter summary