The Quick Python Book, Fourth Edition
Welcome to this free extract from an online version of the Manning book.

24 Exploring data


This chapter covers

  • Python’s advantages for handling data
  • Using pandas
  • Data aggregation
  • Plots with Matplotlib

Over the past few chapters, I’ve dealt with some aspects of using Python to get and clean data. Now it’s time to look at a few of the ways Python can help you manipulate and explore data.

24.1 Python tools for data exploration

In this chapter, we’ll look at two common Python tools for data exploration in Jupyter: pandas and Matplotlib. I can only touch briefly on a few features of these tools, but the aim is to give you an idea of what is possible and a starting point for exploring data with Python.
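To give a flavor of how pandas and Matplotlib work together, here is a minimal sketch, assuming both packages are installed and that you run it in a Jupyter cell. The data frame, its column names (`region`, `sales`), and its values are hypothetical, invented purely for illustration; the calls themselves (`pd.DataFrame`, `describe`, `groupby`, `sort_values`, `Series.plot`) are standard pandas functionality.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "east"],
        "sales": [120, 95, 140, 80, 105, 60],
    }
)

# Explore: summary statistics (count, mean, std, min, quartiles, max)
print(df["sales"].describe())

# Aggregate: total sales per region, largest total first
summary = df.groupby("region")["sales"].sum().sort_values(ascending=False)
print(summary)

# Plot: in Jupyter the bar chart is displayed inline below the cell;
# when running as a plain script, call plt.show() to open a window
ax = summary.plot(kind="bar", title="Total sales by region")
ax.set_ylabel("sales")
```

A few lines are enough to go from raw rows to a summary table and a chart, which is the kind of quick, iterative exploration that Jupyter encourages.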

24.1.1 Python’s advantages for exploring data

Python has become one of the leading languages for data science and continues to grow in that area. As I’ve mentioned, however, Python isn’t always the fastest language in terms of raw performance. On the other hand, some data-crunching libraries, such as NumPy, are largely written in C and so heavily optimized that speed is rarely a problem. In addition, considerations such as readability and accessibility often outweigh pure speed; minimizing the amount of developer time needed is often more important. Python is readable and accessible, and both on its own and in combination with tools developed by the Python community, it’s an enormously powerful tool for manipulating and exploring data.
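To make the point about NumPy concrete, the following sketch computes the same sum of squares twice: once with a pure-Python loop and once with a vectorized NumPy expression. It assumes NumPy is installed; the function names are hypothetical, and the timings it prints depend on your machine and NumPy build, so treat them as illustrative rather than as a benchmark.

```python
import timeit

import numpy as np

N = 1_000_000
values = list(range(N))                 # plain Python list of ints
array = np.arange(N, dtype=np.int64)    # NumPy array holding the same values


def py_sum_of_squares(xs):
    # Pure-Python loop: the interpreter executes every iteration
    total = 0
    for x in xs:
        total += x * x
    return total


def np_sum_of_squares(arr):
    # Vectorized: squaring and summing run inside NumPy's compiled C code;
    # int() converts the NumPy scalar result to a plain Python int
    return int(np.sum(arr * arr))


# Both approaches must agree before their speed is worth comparing
assert py_sum_of_squares(values) == np_sum_of_squares(array)

py_time = timeit.timeit(lambda: py_sum_of_squares(values), number=3)
np_time = timeit.timeit(lambda: np_sum_of_squares(array), number=3)
print(f"pure Python: {py_time:.3f} s, NumPy: {np_time:.3f} s")
```

The vectorized version moves the per-element loop out of the interpreter and into optimized compiled code, which is why it is typically much faster on large inputs, while the Python-level code stays short and readable.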

24.1.2 Python can be better than a spreadsheet

24.2 Python and pandas

24.2.1 Why you might want to use pandas

24.2.2 Installing pandas

24.2.3 Data frames

24.3 Data cleaning

24.3.1 Loading and saving data with pandas

24.3.2 Data cleaning with a data frame

24.4 Data aggregation and manipulation

24.4.1 Merging data frames

24.4.2 Selecting data

24.4.3 Grouping and aggregation

24.5 Plotting data

24.6 Why you might not want to use pandas

Summary