5 Exploring and evaluating language models

This chapter covers

Understanding the capabilities of language models
Selecting suitable language models
Customizing language models for specific tasks
Considering language models in the wider application context
Evaluating language models

In this chapter, we’ll dive into the world of language models (LMs), which can be used for a wide variety of tasks, from content creation to text summarization, translation, and more complex problem solving. The chapter will give you a solid understanding of LMs so that you can make informed decisions about model selection, deployment, customization, and risk management. It will also prepare you to support your engineers in making design decisions about the integration, adaptation, and evaluation of LMs within the larger AI system you’re building.

Terminology: Although giant language models were the main “culprit” behind the generative AI boom, there’s also a trend toward downscaling and using smaller, more efficient models. In what follows, I use language model (LM) as a general term encompassing both large language models (LLMs) with more than 2 billion (2 B) parameters and small language models (SLMs) with fewer than 2 B parameters.

5.1 How language models work

5.1.1 Understanding the training data of a language model

5.1.2 The task of language modeling

5.1.3 Expanding the capabilities of a language model

5.2 Usage scenarios for language models

5.2.1 Direct interaction between user and model

5.2.2 Programmatic use

5.2.3 Using the language model for predefined tasks

5.3 Mapping the language model landscape

5.3.1 Mainstream commercial LLMs

5.3.2 Open source models

5.3.3 Reasoning language models

5.3.4 Small language models

5.3.5 Multimodal models

5.4 Managing the language model lifecycle

5.4.1 Model selection