12 Object detection

This chapter covers

Understanding the object detection problem
Two-stage and single-stage object detectors
Training a simple single-stage detector from scratch
Using a pretrained object detector

Object detection is all about drawing boxes (called “bounding boxes”) around objects of interest in a picture (see Figure 12.1). This enables you to know not just which objects are in a picture, but also where they are. Some of its most common applications are:

Counting: Find out how many instances of an object are in an image.
Tracking: Track how objects move in a scene over time by performing object detection on every frame of a movie.
Cropping: Identify the area of an image that contains an object of interest, in order to crop it and send a higher-resolution version of the image patch to a classifier or an optical character recognition (OCR) model.

Figure 12.1 Object detectors draw boxes around objects in an image and label them.

You might be wondering: if you have a segmentation mask for an object instance, you can already compute the coordinates of the smallest box that contains the mask. So couldn’t you just use image segmentation all the time? Do we need object detection models at all?
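The box computation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the mask for a single object instance is a 2D grid of 0/1 values; the function name `mask_to_box` and the `(x_min, y_min, x_max, y_max)` convention with inclusive pixel indices are illustrative choices, not something this chapter prescribes.

```python
def mask_to_box(mask):
    """Return the smallest axis-aligned box containing every nonzero pixel.

    `mask` is assumed to be a list of equal-length rows of 0/1 values.
    The result is (x_min, y_min, x_max, y_max) in inclusive pixel indices,
    or None when the mask contains no object pixels.
    """
    # Rows (y coordinates) that contain at least one object pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Columns (x coordinates) that contain at least one object pixel.
    width = len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


# A small hypothetical mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

box = mask_to_box(mask)
print(box)  # (1, 1, 4, 3)
```

With an array library, the same idea is usually written as a reduction of the mask along each axis, followed by taking the first and last nonzero index.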

12.1 Single-stage vs two-stage object detectors

12.1.1 Two-stage R-CNN detectors

12.1.2 Single-stage detectors

12.2 Training a YOLO model from scratch

12.2.1 Downloading the COCO dataset

12.2.2 Creating a YOLO model

12.2.3 Readying the COCO data for the YOLO model

12.3 Training the YOLO model

12.4 Using a pretrained RetinaNet detector

12.5 Chapter summary