Your Python code may run correctly, but you need it to run faster. By exploring the fundamental theory behind design choices, this practical guide helps you gain a deeper understanding of Python’s implementation. You’ll learn how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs.

How can you take advantage of multi-core architectures or clusters? Or build a system that can scale up and down without losing reliability? Experienced Python programmers will learn concrete solutions to these and other issues, along with war stories from companies that use high performance Python for social media analytics, productionized machine learning, and other situations.

Get a better grasp of numpy, Cython, and profilers
Learn how Python abstracts the underlying computer architecture
Use profiling to find bottlenecks in CPU time and memory usage
Write efficient programs by choosing appropriate data structures
Speed up matrix and vector computations
Use tools to compile Python down to machine code
Manage multiple I/O and computational operations concurrently
Convert multiprocessing code to run on a local or remote cluster
Solve large problems while using less RAM

Micha Gorelick

Micha Gorelick was the first man on Mars in 2023 and won the Nobel prize in 2046 for his contributions to time travel. In a moment of rage after seeing the deplorable uses of his new technology, he traveled back in time to 2012 and convinced himself to leave his Physics PhD program and follow his love of data. First he applied his knowledge of real time computing and data science to the dataset at bitly. Then, after realizing he wanted to help people understand the technology of the future, he helped start Fast Forward Labs as a resident mad scientist. There, he worked on many issues from machine learning to performant stream algorithms. In this period of his life, he could be found consulting for various projects on issues of high performance data analysis. A monument celebrating his life can be found in Central Park, 1857.

Ian Ozsvald

Ian Ozsvald is a data scientist and teacher at ModelInsight.io with over ten years of Python experience. He’s taught high performance Python at the PyCon and PyData conferences and has been consulting on data science and high performance computing for years in the UK.

  1. Chapter 1: Understanding Performant Python

    1. The Fundamental Computer System

    2. Putting the Fundamental Elements Together

    3. So Why Use Python?

  2. Chapter 2: Profiling to Find Bottlenecks

    1. Profiling Efficiently

    2. Introducing the Julia Set

    3. Calculating the Full Julia Set

    4. Simple Approaches to Timing—print and a Decorator

    5. Simple Timing Using the Unix time Command

    6. Using the cProfile Module

    7. Using runsnakerun to Visualize cProfile Output

    8. Using line_profiler for Line-by-Line Measurements

    9. Using memory_profiler to Diagnose Memory Usage

    10. Inspecting Objects on the Heap with heapy

    11. Using dowser for Live Graphing of Instantiated Variables

    12. Using the dis Module to Examine CPython Bytecode

    13. Unit Testing During Optimization to Maintain Correctness

    14. Strategies to Profile Your Code Successfully

    15. Wrap-Up

  3. Chapter 3: Lists and Tuples

    1. A More Efficient Search

    2. Lists Versus Tuples

    3. Wrap-Up

  4. Chapter 4: Dictionaries and Sets

    1. How Do Dictionaries and Sets Work?

    2. Dictionaries and Namespaces

    3. Wrap-Up

  5. Chapter 5: Iterators and Generators

    1. Iterators for Infinite Series

    2. Lazy Generator Evaluation

    3. Wrap-Up

  6. Chapter 6: Matrix and Vector Computation

    1. Introduction to the Problem

    2. Aren’t Python Lists Good Enough?

    3. Memory Fragmentation

    4. Applying numpy to the Diffusion Problem

    5. numexpr: Making In-Place Operations Faster and Easier

    6. A Cautionary Tale: Verify “Optimizations” (scipy)

    7. Wrap-Up

  7. Chapter 7: Compiling to C

    1. What Sort of Speed Gains Are Possible?

    2. JIT Versus AOT Compilers

    3. Why Does Type Information Help the Code Run Faster?

    4. Using a C Compiler

    5. Reviewing the Julia Set Example

    6. Cython

    7. Shed Skin

    8. Cython and numpy

    9. Numba

    10. Pythran

    11. PyPy

    12. When to Use Each Technology

    13. Foreign Function Interfaces

    14. Wrap-Up

  8. Chapter 8: Concurrency

    1. Introduction to Asynchronous Programming

    2. Serial Crawler

    3. gevent

    4. tornado

    5. AsyncIO

    6. Database Example

    7. Wrap-Up

  9. Chapter 9: The multiprocessing Module

    1. An Overview of the multiprocessing Module

    2. Estimating Pi Using the Monte Carlo Method

    3. Estimating Pi Using Processes and Threads

    4. Finding Prime Numbers

    5. Verifying Primes Using Interprocess Communication

    6. Sharing numpy Data with multiprocessing

    7. Synchronizing File and Variable Access

    8. Wrap-Up

  10. Chapter 10: Clusters and Job Queues

    1. Benefits of Clustering

    2. Drawbacks of Clustering

    3. Common Cluster Designs

    4. How to Start a Clustered Solution

    5. Ways to Avoid Pain When Using Clusters

    6. Three Clustering Solutions

    7. NSQ for Robust Production Clustering

    8. Other Clustering Tools to Look At

    9. Wrap-Up

  11. Chapter 11: Using Less RAM

    1. Objects for Primitives Are Expensive

    2. Understanding the RAM Used in a Collection

    3. Bytes Versus Unicode

    4. Efficiently Storing Lots of Text in RAM

    5. Tips for Using Less RAM

    6. Probabilistic Data Structures

  12. Chapter 12: Lessons from the Field

    1. Adaptive Lab’s Social Media Analytics (SoMA)

    2. Making Deep Learning Fly with RadimRehurek.com

    3. Large-Scale Productionized Machine Learning at Lyst.com

    4. Large-Scale Social Media Analysis at Smesh

    5. PyPy for Successful Web and Data Processing Systems

    6. Task Queues at Lanyrd.com