The Quick Python Book, Fourth Edition

16 Regular expressions

 

This chapter covers

  • Understanding regular expressions
  • Creating regular expressions with special characters
  • Using raw strings in regular expressions
  • Extracting matched text from strings
  • Substituting text with regular expressions

Some might wonder why I’m discussing regular expressions in this book at all. Regular expressions are implemented by a single Python module and are advanced enough that a language like C doesn’t even include them in its standard library. But if you’re using Python, you’re probably doing text parsing; if you’re doing that, regular expressions are too useful to be ignored. If you’ve used Perl, Tcl, or Linux/UNIX, you may be familiar with regular expressions; if not, this chapter goes into them in some detail.

16.1 What is a regular expression?

A regular expression (regex) is a way of recognizing and often extracting data from certain patterns of text. A regex that recognizes a piece of text or a string is said to match that text or string. A regex is defined by a string in which certain characters (the so-called metacharacters) can have a special meaning, which enables a single regex to match many different specific strings.

It’s easier to understand this through example than through explanation. The following is a program with a regular expression that counts how many lines in a text file contain the word hello. A line that contains hello more than once is counted only once:
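A minimal sketch of such a program might look like the following. It is an illustration rather than the book's own listing: the function name `count_lines_containing` and the sample text are assumptions, and `io.StringIO` stands in for an open text file so that the example runs without any file on disk.

```python
import io
import re

# Compile the pattern once. "hello" contains no metacharacters,
# so it matches exactly those five characters (case-sensitively).
HELLO = re.compile("hello")


def count_lines_containing(lines, regexp=HELLO):
    """Return how many lines contain at least one match of regexp."""
    count = 0
    for line in lines:
        # search() scans the whole line and stops at the first match,
        # so a line with several matches still adds only 1.
        if regexp.search(line):
            count += 1
    return count


# Hypothetical sample data; io.StringIO behaves like an open text file.
sample = io.StringIO(
    "hello world\n"
    "nothing here\n"
    "hello, hello again\n"
    "Hello there\n"
)
print(count_lines_containing(sample))  # 2
```

Because the function accepts any iterable of lines, a real file can be passed in the same way, for example `with open(path) as f: count_lines_containing(f)`, where `path` is whatever text file you want to scan. The last sample line is not counted because the pattern is case-sensitive.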

16.2 Regular expressions with special characters

16.3 Regular expressions and raw strings

16.3.1 Raw strings to the rescue

16.4 Extracting matched text from strings

16.5 Substituting text with regular expressions

16.5.1 Using a function with sub

16.6 Phone number normalizer

16.6.1 Solving the problem with AI-generated code

16.6.2 Solutions and discussion

Summary