
12 Object detection
This chapter covers
- Understanding the object detection problem
- Two-stage and single-stage object detectors
- Training a simple single-stage detector from scratch
- Using a pretrained object detector
Object detection is all about drawing boxes (called “bounding boxes”) around objects of interest in a picture (see Figure 12.1). This enables you to know not just which objects are in a picture, but also where they are. Some of its most common applications are:
- Counting: Find out how many instances of an object are in an image.
- Tracking: Track how objects move in a scene over time by performing object detection on every frame of a video.
- Cropping: Identify the area of an image that contains an object of interest, so you can crop it and send a higher-resolution version of the image patch to a classifier or an optical character recognition (OCR) model.
Figure 12.1 Object detectors draw boxes around objects in an image and label them.

You might be wondering: if you have a segmentation mask for an object instance, you can already compute the coordinates of the smallest box that contains the mask. So couldn’t you just use image segmentation all the time? Do we need object detection models at all?
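To make the premise of that question concrete, here is a minimal sketch of how a box could be derived from a mask. It assumes the mask is a 2D boolean NumPy array and that boxes are written as inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`; the helper name `mask_to_box` and the toy mask are hypothetical, made up for illustration only.

```python
import numpy as np

def mask_to_box(mask):
    """Return the tightest (x_min, y_min, x_max, y_max) box around a binary mask.

    Coordinates are inclusive pixel indices (an assumption of this sketch).
    Returns None if the mask contains no foreground pixels.
    """
    # Row (y) and column (x) indices of every foreground pixel
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # Empty mask: no object, so no box
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 6x8 mask with a rectangular blob covering rows 2-4 and columns 3-6
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True

box = mask_to_box(mask)
print(box)  # (3, 2, 6, 4)
```

Going from a mask to a box is this cheap, which is exactly why the question is worth asking.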