Python debugging, profiling and testing

It only looks simple when it’s finished

2019-04-27 — 2025-05-13

Wherein the reader is presented with methods for locating faults, from post‑mortem pdb and ipdb usage to sampling profilers like py‑spy that may be attached to running processes without restart, and memory tracers

computers are awful

premature optimization

python

My python script is unsatisfactory! Why did it break? How is it slow?

1 Understanding Python’s execution model

To understand how Python code execution can go slow, or fail, it helps to understand the execution model. Philip Guo’s pythontutor.com deserves a shout-out here for the app demonstrating what is going on with basic Python execution. However, Philip is the kind of person who gruffly deletes his articles from the internet with extreme prejudice, which is behaviour indistinguishable from that of a crank, so take what he says with a grain of salt.

2 Reloading edited code

Changing code? Sometimes it’s complicated to work out how to load some big dependency tree of stuff. There is an autoreload extension which in principle reloads everything that has changed recently:

%load_ext autoreload
%autoreload 2

It usually works, but "usually" is not enough for a debugging loop. If I don't trust the reload, I can force a deep reload manually using deepreload. I can even monkey-patch the traditional reload to be deep, or so I read somewhere:

import builtins
from IPython.lib import deepreload
builtins.reload = deepreload.reload

That didn't work reliably for me. If I use autoreload and deepreload in the same session, stuff gets weird. Don't do that.

Also, this is incompatible with snakeviz profiling. Errors ensue.

It’s better to spin up a new kernel and run the code from scratch, if you can.
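Outside IPython, the standard-library building block is `importlib.reload`, which re-executes a single module's code in place. Below is a minimal sketch that writes a throwaway module to a temporary directory. The module name `reload_demo_mod` and the helper `demo_reload` are hypothetical. It shows the usual gotcha: references captured before the reload, such as names pulled in with `from module import name`, keep pointing at the old objects. Modules that the reloaded module imports are not reloaded at all. That is roughly the gap autoreload and deepreload try to paper over.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "reload_demo_mod"  # hypothetical throwaway module name


def demo_reload() -> tuple[str, str, bool]:
    """Write a module, import it, edit the file, reload it, and compare old vs new."""
    old_dont_write = sys.dont_write_bytecode
    # Skip .pyc files so a same-second edit cannot be masked by a stale cache.
    sys.dont_write_bytecode = True
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{MODULE_NAME}.py"
        path.write_text("def greet():\n    return 'v1'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module(MODULE_NAME)
            old_greet = mod.greet  # like `from reload_demo_mod import greet`

            path.write_text("def greet():\n    return 'v2, edited'\n")
            importlib.reload(mod)  # re-runs the module body inside the same module object

            return old_greet(), mod.greet(), old_greet is mod.greet
        finally:
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_dont_write


print(demo_reload())  # ('v1', 'v2, edited', False): the captured reference is stale
```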

Also worth trying: marimo, a notebook that tracks dependencies very explicitly. So far I have not been able to break its reload functionality.

3 Debugging

3.1 Built-in debugger

Let’s say there is a line in my code that fails:

1/0

In vanilla Python if I want to debug the last exception (the post-mortem debugger) I do:

import pdb; pdb.pm()

and if I want to drop into a debugger from some bit of code, I write:

import pdb; pdb.set_trace()

or in Python 3.7+:

breakpoint()

This is a pretty good solution that works well and is available AFAICT everywhere.
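Part of what makes `breakpoint()` pleasant is that it does not hard-code pdb. It calls `sys.breakpointhook()`, and the default hook consults the `PYTHONBREAKPOINT` environment variable on every call (PEP 553). So `PYTHONBREAKPOINT=ipdb.set_trace python script.py` swaps in ipdb, and `PYTHONBREAKPOINT=0` turns every `breakpoint()` into a no-op. The small sketch below replaces the hook with a recorder, so the dispatch is visible without an interactive session. The names `buggy_average`, `recording_hook` and `run_with_hook` are hypothetical.

```python
import sys

hook_calls: list[str] = []


def recording_hook(*args, **kwargs) -> None:
    """Stand-in for a debugger: record the header instead of stopping."""
    hook_calls.append(kwargs.get("header", ""))


def buggy_average(values: list[float]) -> float:
    if not values:
        # With the default hook this opens pdb, or whatever PYTHONBREAKPOINT names;
        # breakpoint() passes its arguments straight through to the hook.
        breakpoint(header="empty input to buggy_average")
        return 0.0
    return sum(values) / len(values)


def run_with_hook() -> tuple[float, float]:
    previous = sys.breakpointhook
    sys.breakpointhook = recording_hook
    try:
        return buggy_average([]), buggy_average([1.0, 2.0, 3.0])
    finally:
        sys.breakpointhook = previous  # sys.__breakpointhook__ also keeps the original
```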

The main problem is that, before Python 3.7 or in exotic execution environments, the recommended way of invoking the debugger keeps changing. Get ready for a LONG LIST OF ALTERNATIVES.

If I want a debugger with rich autocomplete, there is a nice one in IPython. Here’s a manual way to drop into the IPython debugger from code, according to Christoph Martin and David Hamann:

from IPython.core.debugger import Tracer; Tracer()()  # IPython < 5.1
from IPython.core.debugger import set_trace; set_trace()  # IPython >= 5.1

However, that’s not how we are supposed to do it in polite society. In particular, it’s terribly complicated in Jupyter because everything is complicated in Jupyter. Jupyter-users of quality are rumoured to invoke their debuggers via so-called magics, e.g. the %debug magic to set a breakpoint at a certain line number:

%debug --breakpoint filename:line_number_for_breakpoint statement_to_run

Pish posh, who thinks in line numbers? set_trace wastes less human time by default.

An actual use I would make of %debug is to drop into post-mortem debugging; without an argument, %debug activates post-mortem mode. And if I want to drop into the post-mortem debugger automatically for every error:

%pdb on

Props to Josh Devlin for explaining this and some other handy tips, and also Gaël Varoquaux.

If that seems abstruse or verbose, ipdb exposes the enhanced debugger from IPython simply and explicitly:

pip install ipdb

import ipdb; ipdb.set_trace()

or:

import ipdb; ipdb.pm()

From the CLI, we can invoke the debugger on a script with

python -m ipdb -c continue your_script.py

This will drop into the post-mortem debugger when something breaks.
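Post-mortem sessions can also be driven non-interactively, which is handy for poking at a failure from a script or a CI log. Here is a rough sketch using plain pdb rather than ipdb. It assumes that `pdb.post_mortem` reads its commands from whatever `sys.stdin` is when it is called (here an in-memory buffer) and prints to the current `sys.stdout` (here redirected). `divide` and `scripted_post_mortem` are hypothetical names. Post-mortem mode starts in the traceback's innermost frame, where the exception was raised, so `p` sees that frame's locals.

```python
import contextlib
import io
import pdb
import sys


def divide(x: float, y: float) -> float:
    return x / y


def scripted_post_mortem(commands: list[str]) -> str:
    """Trigger an exception, then feed canned pdb commands to a post-mortem session."""
    try:
        divide(1, 0)
    except ZeroDivisionError as exc:
        tb = exc.__traceback__
    else:
        raise AssertionError("expected ZeroDivisionError")

    fake_stdin = io.StringIO("\n".join(commands) + "\n")
    captured = io.StringIO()
    saved_stdin = sys.stdin
    sys.stdin = fake_stdin  # the (Pdb) prompt reads lines via input(), i.e. sys.stdin
    try:
        with contextlib.redirect_stdout(captured):
            pdb.post_mortem(tb)  # like pdb.pm(), but for an explicit traceback
    finally:
        sys.stdin = saved_stdin
    return captured.getvalue()


print(scripted_post_mortem(["p (x, y)", "w", "q"]))
```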

ipdb doesn't work in Jupyter, whose interaction loop is incompatible. %debug does work, but it is fairly horrible, because Jupyter frontends are a mess and various things break. For example, if I try to execute non-debugger code while in the debugger, the entire notebook sometimes freezes unrecoverably. This is very easy to do by accident, because the debug console is small and easy to miss when clicking on it. Any time I find myself needing to debug debugging in Jupyter, I am briefly filled with despair. Then I remember that there is no overwhelming moral imperative for me to use Jupyter for anything, and I can switch to IPython, VS Code or marimo, unless I am trapped in Google Colab or something.

4 Alternative debugging systems

Of course, this is Python, so the core built-in stuff is wreathed in a fizzing haze of short-lived re-implementations that exist probabilistically for an instant then annihilate, like virtual particles in the void. Trillions of debuggers were potentially invented then abandoned on GitHub in the time it took you to read this sentence; some radiate outwards like Hawking radiation, only to recede away from you in the expanding space of version dependency.

4.1 VS Code debugger

See VS Code Python debugger.

5 Python hunter

Pascal Hirsch mentions

ionelmc/python-hunter: Hunter is a flexible code tracing toolkit.

To quote:

…on your Python debugging page, I didn't see https://github.com/ionelmc/python-hunter mentioned, which I really like. It uses .pth files to leverage Python's path manipulation magic for configuration/triggering (and even loading in the first place), letting you selectively examine things simply through setting environment variables (downsides: behind-the-scenes manipulation of module loading/paths, considerable execution slowdown, even in non-triggered code parts).

Try it like so:

PYTHONHUNTER="stdlib=False,action=CallPrinter(force_colors=True)" uv run --with hunter python -c 123456789
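The sketch below is not hunter's API. It is a minimal stdlib illustration of the mechanism that call tracers of this kind build on: `sys.settrace` installs a function that the interpreter calls with a `'call'` event for every new Python frame. The names `trace_calls`, `helper` and `outer` are hypothetical. C functions such as builtins do not produce `'call'` events. This per-event Python callback is also where the execution slowdown mentioned in the quote comes from.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional


def trace_calls(func: Callable[..., Any], *args: Any) -> tuple[Any, list[str]]:
    """Run func(*args), recording the name of every Python function entered."""
    entered: list[str] = []

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        if event == "call":
            entered.append(frame.f_code.co_name)
        return None  # no per-line tracing inside each frame, which limits overhead

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer (e.g. coverage) was active
    return result, entered


def helper(n: int) -> int:
    return n * 2


def outer(n: int) -> int:
    return helper(n) + helper(n + 1)


print(trace_calls(outer, 3))  # (14, ['outer', 'helper', 'helper'])
```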

5.1 pudb

PuDB seems to be very close to the native debugger, but with console enhancements:

Syntax-highlighted source, the stack, breakpoints and variables are all visible at once and continuously updated. This helps you be more aware of what’s going on in your program. Variable displays can be expanded, collapsed and have various customization options.

Simple, keyboard-based navigation using single keystrokes makes debugging quick and easy. PuDB understands cursor-keys and Vi shortcuts for navigation. Other keys are inspired by the corresponding pdb commands.

Drop to a Python shell in the current environment by pressing “!”. Or open a command prompt alongside the source-code via “Ctrl-X”.

Ability to control the debugger from a separate terminal.

5.2 PyCharm

My brother Andy likes the PyCharm/IntelliJ IDE’s built-in Python debugger. I have not used it.

5.3 Viztracer

VizTracer

… is a low-overhead logging/debugging/profiling tool that can trace and visualise your Python code to help you intuitively understand your code and figure out the time-consuming part of your code.

VizTracer can display every function executed and the corresponding entry/exit time from the beginning of the program to the end, which is helpful for programmers to catch sporadic performance issues.

Sure, sounds fine.

5.4 pysnooper

PySnooper claims:

instead of carefully crafting the right print lines, you just add one decorator line to the function you’re interested in. You’ll get a play-by-play log of your function, including which lines ran and when, and exactly when local variables were changed.

I always think I’d like to use this, but in practice I don’t.
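Again, this is not the library's API. It is a rough stdlib sketch of the idea PySnooper describes. A per-frame `sys.settrace` callback compares the decorated function's locals at each line event and logs the ones whose `repr` changed. `snoop_locals` and `accumulate` are hypothetical names. The real tool also reports timestamps, source lines and more.

```python
import functools
import sys
from typing import Any, Callable


def snoop_locals(log: list[str]) -> Callable:
    """Decorator factory: append 'name = value' to log whenever a local changes."""
    def decorator(func: Callable) -> Callable:
        target = func.__code__

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            seen: dict[str, str] = {}

            def local_tracer(frame, event, arg):
                # 'line' fires before each line runs; 'return' when the frame exits.
                if event in ("line", "return"):
                    for name, value in frame.f_locals.items():
                        text = repr(value)
                        if seen.get(name) != text:
                            seen[name] = text
                            log.append(f"{name} = {text}")
                return local_tracer  # keep tracing this frame

            def global_tracer(frame, event, arg):
                # Trace only the decorated function's own frame, not its callees.
                return local_tracer if frame.f_code is target else None

            previous = sys.gettrace()
            sys.settrace(global_tracer)
            try:
                return func(*args, **kwargs)
            finally:
                sys.settrace(previous)
        return wrapper
    return decorator


events: list[str] = []


@snoop_locals(events)
def accumulate(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


accumulate(3)
print("\n".join(events))
```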

5.5 Pyrasite

pyrasite injects code into running Python processes, which enables more exotic debuggery, and real-time object mutation and stuff and of course, memory and performance profiling.

5.6 Yet more

Gaël recommended some extra debuggers:

aiomonitor is REPL-injection for async Python
pudb, a curses-style debugger, is popular.
The trepan family of debuggers, trepan3k (Python 3), trepan (Python 2), ipython-trepan (theoretically IPython but looks unmaintained). Docs live here.

Jeez, OK. But wait there are more.

There are many other debuggers.
That’s too many debuggers
Realistically I won’t use any of them, because the inbuilt one is OK, and already hard enough to keep in my head without putting more points of failure in the mix
Stop making debuggers

6 Memory leaks

Python 3 has tracemalloc built in. This is a powerful Python memory analyser, although bare-bones. Mike Lin walks us through it. Benoit Bernard explains various options that run on older Pythons, including, most usefully IMO, objgraph which draws us an actual diagram of where the leaking things are. More full-featured, Pympler provides GUI-backed memory profiling, including the magically handy thing of tracking referrers using its refbrowser.
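Here is a minimal `tracemalloc` sketch of the usual leak hunt. Take a snapshot, run the suspect code, take another snapshot, then rank source lines by how much memory they gained. The leak is hypothetical: `leaky_step` keeps every buffer alive in a module-level list. `compare_to(..., "lineno")` groups allocations by the source line that made them and sorts the biggest changes first. The filter drops allocations made by tracemalloc's own Python code.

```python
import tracemalloc

_retained: list[bytes] = []  # hypothetical leak: nothing ever clears this list


def leaky_step(n_bytes: int = 100_000) -> None:
    _retained.append(bytes(n_bytes))


def top_growth(steps: int = 20, limit: int = 3) -> list[tracemalloc.StatisticDiff]:
    """Return the source lines whose allocations grew most across `steps` calls."""
    ignore_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot().filter_traces(ignore_self)
        for _ in range(steps):
            leaky_step()
        after = tracemalloc.take_snapshot().filter_traces(ignore_self)
    finally:
        tracemalloc.stop()
    return after.compare_to(before, "lineno")[:limit]


for diff in top_growth():
    print(diff)  # one line per source location, with size and count deltas
```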

6.1 Memray

Memory specialist. bloomberg/memray: Memray is a memory profiler for Python

Memray is a memory profiler for Python. It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data. While commonly used as a CLI tool, it can also be used as a library to perform more fine-grained profiling tasks.

Notable features:

🕵️‍♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers.

ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results.

🏎 Blazing fast! Profiling slows the application only slightly. Tracking native code is somewhat slower, but this can be enabled or disabled on demand.

📈 It can generate various reports about the collected memory usage data, like flame graphs.

🧵 Works with Python threads.

👽🧵 Works with native-threads (e.g. C++ threads in C extensions).

Memray can help with the following problems:

Analyze allocations in applications to help discover the cause of high memory usage.

Find memory leaks.

Find hotspots in code that cause a lot of allocations.

Note that Memray only works on Linux and macOS and cannot be installed on other platforms.

6.2 Scalene

See below.

7 Profiling

Maybe it's not crashing, but simply taking too long? Then I want a profiler. There are, of course, lots of profilers, and they each dwell in a city built upon the remains of a previous city, inhabited by other profilers lost to time. Searching for a good profiler is not so simple, for we encounter profilers from various archaeological strata as we excavate the internet, and each was acclaimed in its day.

First, we pause to note that debugging tools pysnooper and viztracer both have profiling features. Also we might want to profile various things, such as code speed, code memory use and the trade-off between speed and memory. All the below options have different micro-specialties across this area. Next, profiling-specific alternatives:

7.1 Built-in profiler

Profile functions using cProfile:

import cProfile as profile
profile.runctx('print(predded.shape)', globals(), locals())

cProfile is not so hip any longer; there are other, more fashionable options.
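Still, here is a sketch of a slightly more reusable stdlib pattern than `runctx`. Since Python 3.8, `cProfile.Profile` works as a context manager, and `pstats` sorts and prints the collected stats. `slow_sum` and `profile_call` are hypothetical names.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    return sum(i * i for i in range(n))


def profile_call(func, *args, top: int = 5):
    """Profile func(*args); return its result and a text report of the top entries."""
    profiler = cProfile.Profile()
    with profiler:  # enable() on entry, disable() on exit
        result = func(*args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, report.getvalue()


result, text = profile_call(slow_sum, 100_000)
print(text)
```

Calling `profiler.dump_stats("out.prof")` instead writes the raw stats to a file, which snakeviz can open with `snakeviz out.prof`.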

7.2 Scalene

plasma-umass/scalene

… is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than many other profilers while delivering far more detailed information. It is also the first profiler ever to incorporate AI-powered proposed optimisations.

It includes a web GUI and VS Code integration.

Maybe the freshest thing here? Colleagues of mine love it but I have not used it.

7.3 py-spy

py-spy

[…] lets you visualise what your Python program is spending time on without restarting the program or modifying the code in any way. Py-Spy is extremely low overhead: it is written in Rust for speed and doesn’t run in the same process as the profiled Python program, nor does it interrupt the running program in any way. This means Py-Spy is safe to use against production Python code. […]

This project aims to let you profile and debug any running Python program, even if the program is serving production traffic. […]

Py-spy works by directly reading the memory of the Python program using the process_vm_readv system call on Linux, the vm_read call on macOS or the ReadProcessMemory call on Windows.

Figuring out the call stack of the Python program is done by looking at the global PyInterpreterState variable to get all the Python threads running in the interpreter, and then iterating over each PyFrameObject in each thread to get the call stack.

Native IPython can run the profiler magically:

%%prun -D somefile.prof
import glob
import hashlib
files = glob.glob('*.txt')
for file in files:
    with open(file) as f:
        print(hashlib.md5(f.read().encode('utf-8')).hexdigest())


Great worked example — [Making Python 100x faster with less than 100 lines of Rust](https://ohadravid.github.io/posts/2023-03-rusty-python/):

>Python has a built-in Profiler (`cProfile`), but in this case it’s not really the right tool for the job:
>
>1. It’ll introduce a lot of overhead to all the Python code, and none for native code, so our results might be biased.
>2. We won’t be able to see into native frames, meaning we aren’t going to be able to see into our Rust code.
>
>We are going to use `py-spy` ([GitHub](https://github.com/benfred/py-spy)).
>
>`py-spy` is a [sampling profiler](https://en.wikipedia.org/wiki/Profiling_(computer_programming)#Statistical_profilers) which can see into native frames.
>
>They also mercifully publish pre-built wheels to pypi, so we can just `pip install py-spy` and get to work.

7.4 Score-P

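HPC-friendly profiling can be provided by `scorep`, the Python bindings for Score-P, a joint performance measurement run-time infrastructure for HPC tools such as Periscope, Scalasca, TAU and Vampir (Knüpfer et al. 2012).
From Gocht, Schöne, and Frenzel (2021):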

>In this paper, we present the Python bindings for Score-P, which make it easy for users to trace and profile their Python applications, including the usage of (multi-threaded) libraries, MPI parallelism and accelerator usage.

* [Tool Time: Profiling and Tracing of Python Code with Score-P | Performance Optimisation and Productivity](https://pop-coe.eu/blog/tool-time-profiling-and-tracing-of-python-code-with-score-p)
* [VI-HPS :: Projects :: Score-P](https://www.vi-hps.org/projects/score-p)
* [score-p/scorep\_binding\_python: Allows tracing of python code using Score-P](https://github.com/score-p/scorep_binding_python#user-regions)
* [Profile and Trace an Application · score-p/scorep\_binding\_python Wiki](https://github.com/score-p/scorep_binding_python/wiki/Profile-and-Trace-an-Application)

7.5 Austin

I do not know much about this.

* [P403n1x87/austin-python: Python wrapper for Austin, the CPython frame stack sampler.](https://github.com/P403n1x87/austin-python)
* [P403n1x87/austin: Python frame stack sampler for CPython](https://github.com/p403n1x87/austin)

7.6 Visualising profiles

* [flamegraph](https://github.com/brendangregg/FlameGraph)
* [speedscope](https://github.com/jlfwong/speedscope)
* [snakeviz](https://jiffyclub.github.io/snakeviz/) is a
  browser-based system that might be OK for the output of cProfile profiles
* ftrace profiles

  * [Chrome’s catapult system](https://chromium.googlesource.com/catapult/+/refs/heads/main/README.md) can view traces — `chrome://tracing/` or `brave://tracing/` in the browser
  * They have a new UI called [perfetto](https://perfetto.dev/)

* [convert the output](http://thirld.com/blog/2014/11/30/visualising-the-results-of-profiling-python-code/)
  to [cachegrind](http://valgrind.org/docs/manual/cg-manual.html) format
  for visualisation in the many `cachegrind`
  tools.
* [py-spy](https://github.com/benfred/py-spy) includes built-in flame graphs
* ~~[runsnakerun](http://www.vrplumber.com/programming/runsnakerun/) —
  the original Python profiling visualizer~~, now expired.

[SnakeViz](https://jiffyclub.github.io/snakeviz/) includes a handy magic to automatically save stats and launch the
profiler.
(Gotcha: I have to have the snakeviz CLI already on the path when I launch IPython.)

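```ipython
%load_ext snakeviz
# %%snakeviz is a cell magic, so it must be the first line of its own, new cell:
%%snakeviz
import glob
import hashlib
files = glob.glob('*.txt')
for file in files:
    with open(file) as f:
        print(hashlib.md5(f.read().encode('utf-8')).hexdigest())
```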

This is incompatible with autoreload and gives weird errors if I run them both in the same session.

8 Testing

You may not be amazed to learn that there are many frameworks. The most common seem to be unittest (in the standard library), pytest and nose, although nose has been unmaintained for years.

More robust tests.
Jacob Kaplan-Moss likes pytest, and he's good, so let's copy him.

FWIW I’m no fan of nose; my experience of it was that I spent a lot of time debugging weird failures getting lost in its attempts to automagically help me. This might be because I didn’t deeply understand what I was doing, but the other frameworks didn’t require me to understand so deeply the complexities of their attempts to simplify my life.
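For calibration, here is a minimal pytest-style test file built around a hypothetical `slugify` helper. pytest collects files named `test_*.py` and functions named `test_*`. It also rewrites plain `assert` statements so that a failure shows both sides of the comparison, so no special assertion methods are needed.

```python
# test_slugify.py (hypothetical file name; run with `pytest -q test_slugify.py`)
import re


def slugify(title: str) -> str:
    """Lower-case, collapse runs of non-alphanumerics into '-', trim edge dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_slugify_spaces():
    assert slugify("Hello World") == "hello-world"


def test_slugify_punctuation():
    assert slugify("  Python: debugging, profiling!  ") == "python-debugging-profiling"
```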

9 Typing

microsoft/pyright: Static type checker for Python
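The example below illustrates the kind of thing pyright catches, assuming its default settings. pyright narrows `Optional[int]` to `int` after an explicit `is None` check. As far as I know, it reports `idx + 1` as an error when that check is missing, since `idx` might be `None`. The functions are hypothetical, and at runtime the annotations do nothing. Running `pyright yourfile.py` performs the check.

```python
from typing import Optional


def find_index(items: list[str], target: str) -> Optional[int]:
    """Position of target in items, or None when it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


def next_index(items: list[str], target: str) -> int:
    idx = find_index(items, target)
    # Without this check, `idx + 1` is the kind of line pyright flags:
    # idx is Optional[int], and None + 1 would raise TypeError at runtime.
    if idx is None:
        return -1
    return idx + 1  # here idx has been narrowed to int
```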

10 Reference: Useful step debugger commands

For the in-built step debugger the following commands are especially useful:

! statement: Execute the (one-line) statement in the context of the current stack frame, even if it mirrors the name of a debugger command. This is the most useful command because the debugger parser is horrible and will always interpret anything it conceivably can as a debugger command instead of a Python command, which is confusing and misleading. So preface everything with ! to be safe.
h(elp) [command]: Guess
w(here): Print your location in the current stack
d(own) [count] / u(p) [count]: Move the current frame count (default one) levels down (to a newer frame) or up (to an older frame) in the stack trace.
b(reak) [([filename:]lineno | function) [, condition]]: The one that is tedious to do manually. Without argument, list all breaks and their metadata.
tbreak [([filename:]lineno | function) [, condition]]: Temporary breakpoint, which is removed automatically when it is first hit.
cl(ear) [filename:lineno | bpnumber [bpnumber …]]: Clear specific or all breakpoints
disable [bpnumber [bpnumber …]]/enable [bpnumber [bpnumber …]]: disable is mostly the same as clear, but you can re-enable
ignore bpnumber [count]: Ignore a breakpoint a specified number of times
condition bpnumber [condition]: Set a new condition for the breakpoint
commands [bpnumber]: Specify a list of commands for breakpoint number bpnumber. The commands themselves appear on the following lines. Type end to terminate the command list.
s(tep): Execute the next line, even if that is inside an invoked function.
n(ext): Execute the next line in this function.
unt(il) [lineno]: Continue to line lineno, or the next line with a higher number than the current one
r(eturn): Continue execution until the current function returns.
c(ont(inue)): Continue execution, only stop when a breakpoint is encountered.
j(ump) lineno: Set the next line that will be executed. Only available in the bottom-most frame. It is not possible to jump into weird places like the middle of a for loop.
l(ist) [first[, last]]: List source code for the current file.
ll | longlist: List all source code for the current function or frame.
a(rgs): Print the argument list of the current function.
p expression: Evaluate the expression in the current context and print its value.
pp expression: Like the p command, except the value of the expression is pretty-printed using the pprint module.
whatis expression: Print the type of the expression.
source expression: Try to get source code for the given object and display it.
display [expression]/undisplay [expression]: Display the value of the expression if it changed, each time execution stops in the current frame.
interact: Start an interactive interpreter (using the code module) whose global namespace contains all the (global and local) names found in the current scope.
alias [name [command]]/unalias name: Create an alias called name that executes command.
q(uit): Pack up and go home

The alias one needs another look, right? How even does it…

As an example, here are two useful aliases from the manual, for the .pdbrc file:

# Print instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print("%1.",k,"=",%1.__dict__[k])
# Print instance variables in self
alias ps pi self

11 References

Gocht, Schöne, and Frenzel. 2021. “Advanced Python Performance Monitoring with Score-P.” In Tools for High Performance Computing 2018 / 2019.

Knüpfer, Rössel, Mey, et al. 2012. "Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir." In Tools for High Performance Computing 2011.