1 How AI works

This chapter covers

  • The way LLMs process inputs and generate outputs
  • The transformer architecture that powers LLMs
  • Different types of machine learning
  • How LLMs and other AI models learn from data
  • How convolutional neural networks are used to process different types of media
  • Combining different types of data (e.g., producing images from text)

This chapter clarifies how AI works, discussing many foundational AI topics. Since the latest AI boom, many of these topics (e.g., “embeddings” and “temperature”) are now widely discussed, not just by AI practitioners but also by businesspeople and the general public. This chapter demystifies them.

Instead of just piling up definitions and textbook explanations, this chapter is a bit more opinionated. It points out common AI problems, misconceptions, and limitations based on my experience working in the field, and it shares some insights you might not be aware of. For example, we’ll discuss why language generation is more expensive in French than in English and how OpenAI hires armies of human workers to manually help train ChatGPT. So even if you are already familiar with all the topics covered here, reading the chapter might still give you a different perspective.
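
As a small preview of that French-versus-English point, the sketch below compares token counts for an English sentence and its French translation. It assumes OpenAI’s open source tiktoken library and the cl100k_base encoding; the sentences are my own illustrative examples, not from the book.

# Minimal sketch: the same sentence usually needs more tokens in French
# than in English, and most providers bill per token.
# Assumes the open source tiktoken library (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

english = "The cat sat on the mat."
french = "Le chat s'est assis sur le tapis."  # illustrative translation

print(len(enc.encode(english)), "tokens in English")  # e.g., 7 tokens
print(len(enc.encode(french)), "tokens in French")    # typically noticeably more

Because pricing is per token, the longer French token sequence translates directly into higher cost for the same meaning; the chapter returns to why this happens in the sections on tokens.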

The first part of this chapter is a high-level explanation of how large language models (LLMs) such as ChatGPT work. Its sections are ordered to roughly mimic how LLMs themselves turn inputs into outputs one step at a time.
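
To make the one-step-at-a-time idea concrete before diving in, here is a minimal sketch of autoregressive generation. It assumes the Hugging Face transformers library, the small open source GPT-2 model, and greedy decoding; production chat models work the same way at this level, just at far larger scale.

# Minimal sketch of how an LLM generates text one token at a time.
# Assumes: pip install torch transformers; GPT-2 and greedy decoding
# are my illustrative choices, not the book's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):                    # generate five tokens, one per step
        logits = model(input_ids).logits  # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()  # greedy: pick the single most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))     # the prompt plus the generated tokens

Each pass through the loop feeds the entire sequence so far back into the model, which then predicts just the next token; this is exactly the pattern the following sections unpack.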

How LLMs work

Text generation

End of text

Chat

The system prompt

Calling external software functions

Retrieval-augmented generation

The concept of tokens

One token at a time

Billed by the token

What about languages other than English?

Why do LLMs need tokens anyway?

Embeddings: A way to represent meaning

Machine learning and embeddings

Visualizing embeddings

Why embeddings are useful

Why LLMs struggle to analyze individual letters

The transformer architecture

Step 1: Initial embeddings

Step 2: Contextualization