9 Convnet architecture patterns

This chapter covers

The Modularity-Hierarchy-Reuse formula for model architecture
An overview of standard best practices for building convnets: residual connections, batch normalization, and depthwise separable convolutions
Ongoing design trends for computer vision models

A model’s “architecture” is the sum of the choices that went into creating it: which layers to use, how to configure them, and in what arrangement to connect them. These choices define the hypothesis space of your model: the space of possible functions that gradient descent can search over, parameterized by the model’s weights. As with feature engineering, a good hypothesis space encodes prior knowledge that you have about the problem at hand and its solution. For instance, using convolution layers means that you know in advance that the relevant patterns present in your input images are translation-invariant. To learn effectively from data, you need to make assumptions about what you’re looking for.
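To make the convolution prior concrete, here is a minimal sketch in plain Python rather than Keras. It assumes a toy 1D signal with wrap-around edges, and the names `circular_conv1d` and `shift` and the filter values are hypothetical, chosen only for illustration. It shows that one fixed filter responds to a pattern in the same way wherever the pattern appears: moving the input simply moves the response by the same offset.

```python
def circular_conv1d(signal, kernel):
    """Slide `kernel` over `signal` (cross-correlation), wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Circularly shift `signal` to the right by `k` positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


signal = [0, 0, 1, 3, 1, 0, 0, 0]  # a small "bump" pattern
kernel = [1, 2, 1]                 # hypothetical fixed filter (shared weights)

response = circular_conv1d(signal, kernel)
shifted_response = circular_conv1d(shift(signal, 3), kernel)

# The same weights detect the pattern at its new location:
# shifting the input by 3 shifts the response by 3 and changes nothing else.
assert shifted_response == shift(response, 3)
```

Because the filter weights are shared across positions, the model does not have to relearn the pattern for each location. A densely connected layer makes no such assumption and would need separate weights for every position.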

9.1 Modularity, hierarchy, and reuse

9.2 Residual connections

9.3 Batch normalization

9.4 Depthwise separable convolutions

9.5 Putting it together: a mini Xception-like model

9.6 Beyond convolution: Vision Transformers

9.7 Chapter summary