Tolerating Jupyter’s file format
In order to make your Python code accessible, we wrapped it into encoded JavaScript strings. No need to thank us.
2017-02-09 — 2025-09-01
Various notes on dealing with the jupyter file format, which, in the name of convenience, gives us new and different problems to learn to manage. Because jupyter notebooks (the file format) are a weird mash of binary multimedia content and program input and output data, all wrapped up in a JSON encoding, many things that would be simple and seamless with normal text simply do not work for the .ipynb file format. This is a huge barrier to integrating text-based and notebook-based development practices, and thus impairs the jupyter mission of providing an easy on-ramp to data science. I think this is one of the larger annoyances of the many in the jupyter system, and it would have been completely avoidable if they had settled on a less awkward textual format than JSON for the backend storage, as the R and julia ecosystems did. Too late now, I suppose. There are various workarounds and ameliorations, but no one agrees which to use, so I switch between them constantly.
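To make the complaint concrete, here is a minimal sketch of what a one-cell notebook looks like on disk, built with only the standard json module. The field names follow the nbformat version 4 layout as I understand it; the cell contents and the id value are made up for illustration.

```python
import json

# A minimal notebook in the nbformat 4 layout: one code cell, one output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "id": "a1b2c3d4",  # made-up cell id
            "metadata": {},
            # Each line of code is stored as a separate JSON string.
            "source": ["x = 1 + 1\n", "print(x)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        }
    ],
}

serialized = json.dumps(notebook, indent=1)
print(serialized)

# Compare the size of the code with the size of its JSON wrapping.
python_lines = len(notebook["cells"][0]["source"])
json_lines = len(serialized.splitlines())
print(f"{python_lines} lines of Python, {json_lines} lines of JSON")
```

Everything a text tool sees is the JSON wrapping, not the two lines of Python inside it.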
1 Generating Jupyter code via AI
Can Claude et al generate jupyter notebooks for me? Not natively — it’s horrible.
I have seen some MCP servers that might make this feasible, as noted under jupyter front ends.
- jjsantos01/jupyter-notebook-mcp: A Model Context Protocol (MCP) for Jupyter Notebook
- ClaudeJupy: Persistent Python & Jupyter for Claude AI
However, I have not tested them.
2 Version control
One thing that breaks is diffing and merging: things get messy when I try to put notebooks into version control. In particular, my repository gets very large, and my git client may or may not show diffs. Oh, and merging with the usual merge tools is likely to break things, because merge tools do not know about the idiosyncratic JSON-based storage. How do we fix that? Here are some options.
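A small sketch of why line-based diffs get noisy, assuming the nbformat 4 layout and using only the standard library. The helper make_notebook and the output values are hypothetical. The two notebooks below contain identical code; only the execution counter and the printed output differ, as they would after re-running a cell.

```python
import difflib
import json

def make_notebook(execution_count, output_text):
    """Build a one-cell notebook dict in the nbformat 4 layout."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "source": ["import random\n", "print(random.random())"],
                "outputs": [
                    {"output_type": "stream", "name": "stdout",
                     "text": [output_text]}
                ],
            }
        ],
    }

def as_lines(notebook):
    """Serialize the notebook the way it would appear to a line-based diff."""
    return json.dumps(notebook, indent=1, sort_keys=True).splitlines()

before = make_notebook(1, "0.1374\n")
after = make_notebook(7, "0.9021\n")  # same code, re-run later in the session

diff = list(difflib.unified_diff(as_lines(before), as_lines(after),
                                 "before.ipynb", "after.ipynb", lineterm=""))
print("\n".join(diff))

# Lines that git would report as changed, excluding the file headers.
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
print(f"{len(changed)} changed lines, none of them code")
```

Two people re-running the same notebook will produce exactly this kind of change on both sides of a merge, which is where the conflicts come from.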
2.1 No jupyter
I want an interactive workbook that can dynamically execute code and include documentation?
No problem. There are many solutions that are superior to jupyter for this, albeit less hyped. One obvious example is knitr, which can support Python. Lesser known projects like pweave also work fine. marimo is my current favourite. VS Code Interactive Python mode is also pretty good.
There is nothing requiring me to stick to jupyter except that it is mysteriously popular and ubiquitous.
However, popularity is a strong reason. jupyter is the qwerty of Python development, so we are usually stuck with it. Here are some methods to version those globs of horrible file cruft more gracefully.
2.2 nbdev
For Python projects, there is nbdev, which aims to solve a number of problems at once, including versioning notebooks. Main value proposition:
- A robust, two-way sync between notebooks and source code, which allows you to use your IDE for code navigation or quick edits if desired.
- Tools for merge/conflict resolution with notebooks in a human readable format.
- Active maintenance and improvement by a transparent and user-engaged crew. See nbdev v2 review: Git-friendly Jupyter Notebooks
There are other useful things, too.
- Automatically generate docs from Jupyter notebooks. These docs are searchable and automatically hyperlinked to appropriate documentation pages by introspecting keywords you surround in backticks.
- Utilities to automate the publishing of pypi and conda packages including version number management.
- Ability to write tests directly in notebooks without having to learn special APIs. These tests get executed in parallel with a single CLI command. You can even define certain groups of tests such that you don’t have to always run long-running tests.
- Continuous integration (CI) comes setup for you with GitHub Actions out of the box, that will run tests automatically for you. Even if you are not familiar with CI or GitHub Actions, this starts working right away for you without any manual intervention.
- Integration With GitHub Pages for docs hosting: nbdev allows you to easily host your documentation for free, using GitHub pages.
- Create Python modules, following best practices such as automatically defining __all__ (more details) with your exported functions, classes, and variables.
- Math equation support with LaTeX.
So I guess that’s nice? I am faintly offended that the solution to work around Jupyter’s attempt to “fix” plain text storage by replacing it is to reinvent it.
nbdev does not, sadly, make the Jupyter browser client itself an easier place to type code. I still prefer to use a normal code editor or IDE for editing code, even Jupyter notebooks.
2.3 Strip notebooks
We can automatically strip images and other big things from our notebooks to keep them smaller and tidier if we are using git as our version control. They still work, but if we restore a notebook from git it no longer has all the graphics and it’s many megabytes smaller. Usually that’s fine, since we already have the code to generate them again right there, so we don’t necessarily want them around anyway.
Doing it manually is tedious. See how fastai does this automatically: it uses automated git hooks. It’s not well explained, but it works. If we aren’t working for fastai, the quickest way is nbstripout, which AFAICT is what the fastai hack is based on. nbstripout includes its own installation script, which usually works except in git submodules, but nothing works in submodules, so no change there. We can set up attributes so that these filters and others run automatically. It’s surprisingly under-documented for some reason. Excluding certain files from nbstripout filtering can be done several ways, including in the notebook itself. See the GitHub issue on that theme.
tl;dr In the repository, do this: run pip install nbstripout, then nbstripout --install --attributes .gitattributes.
I do this for all my notebooks now. This doesn’t entirely solve the diffing and merging hurdles, but it usually removes just enough pointless cruft that merging works.
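For intuition about what such a filter does to the file, here is a minimal sketch using only the standard library. It is not nbstripout's implementation; the function strip_outputs, the notebook contents and the fake image payload are all hypothetical, and the layout assumed is nbformat 4.

```python
import json

def strip_outputs(notebook):
    """Return a copy of a notebook dict without outputs or execution counts."""
    stripped = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped

# Hypothetical notebook; the long string stands in for a base64-encoded plot.
fake_png = "A" * 200_000
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 12,
            "metadata": {},
            "source": ["plot_results(data)"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {"image/png": fake_png},
                }
            ],
        }
    ],
}

stripped = strip_outputs(notebook)
size_before = len(json.dumps(notebook))
size_after = len(json.dumps(stripped))
print(f"{size_before} characters before, {size_after} after stripping")
```

The code cell survives untouched, so re-running the notebook regenerates whatever was removed.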
2.4 Minimalist images
If we want images in the notebook to be small, tell matplotlib to use low-quality images:
I had to dig deep to find this. The answers were in the source of IPython.core.pylabtools.select_figure_formats and matplotlib_inline/backend_inline.py. It was a nice couple of months while I had that trick, but it no longer works in matplotlib 3.6.
A tip from Erin Kenna for plotly is to keep versioned images small, but occasionally allow larger ones.
#%%
import plotly.io as pio

# Pick a renderer
# https://plotly.com/python/renderers/
renderer = "plotly_mimetype"
# renderer = "jpeg"
if renderer == "jpeg":
    # Configure the static JPEG renderer
    jpeg_renderer = pio.renderers["jpeg"]
    jpeg_renderer.width = None
    jpeg_renderer.height = None
    jpeg_renderer.scale = 1.8  # modifying the scale since we won’t have zoom controls
Then explicitly pass it to all plots, e.g. fig.show(renderer=renderer) for a figure named fig.
This is obviously a little more manual and error-prone, but being able to explicitly include some images is useful.
2.5 jupytext
Another way to make my notebooks closer to plain text is jupytext. It claims to do that and more:
Wish you could edit [jupyter notebooks] in your favourite IDE? And get clear and meaningful diffs when doing version control? Then… Jupytext may well be the tool you’re looking for!
Jupytext can save Jupyter notebooks as Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, C++ and q/kdb+ scripts.
There are multiple ways to use jupytext:
- Directly from Jupyter Notebook or JupyterLab. Jupytext provides a contents manager that allows Jupyter to save your notebook to your favourite format (.py, .R, .jl, .md, .Rmd …) in addition to (or in place of) the traditional .ipynb file. The text representation can be edited in your favourite editor. When you’re done, refresh the notebook in Jupyter: inputs cells are loaded from the text file, while output cells are reloaded from the .ipynb file if present. Refreshing preserves kernel variables, so you can resume your work in the notebook and run the modified cells without having to rerun the notebook in full.
- On the command line. jupytext converts Jupyter notebooks to their text representation, and back. The command line tool can act on notebooks in many ways. It can synchronise multiple representations of a notebook, pipe a notebook into a reformatting tool like black, etc… It can also work as a pre-commit hook if you wish to automatically update the text representation when you commit the .ipynb file.
A plus is that search-and-replace would then work seamlessly across normal code and wack notebook-encoded code, which, I assure you, is a constant irritation.
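To illustrate the idea of a text representation, here is a rough sketch that renders a notebook dict as a script with "# %%" cell markers, which is the convention I understand jupytext's percent format to use. This is not jupytext's code; the function notebook_to_percent_script and the sample notebook are hypothetical, and real jupytext also handles metadata, other languages and round-tripping.

```python
def notebook_to_percent_script(notebook):
    """Render notebook cells as a '# %%' delimited Python script."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            # Markdown cells become comment lines under a tagged marker.
            commented = "\n".join(
                ("# " + line).rstrip() for line in text.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"

# Hypothetical two-cell notebook.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Some notes."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "outputs": [],
         "source": ["total = sum(range(10))\n", "print(total)"]},
    ]
}

script = notebook_to_percent_script(notebook)
print(script)

# Once the notebook is ordinary text, ordinary text tools apply.
renamed = script.replace("total", "grand_total")
print(renamed)
```

The resulting script is plain Python, so grep, sed and an IDE's rename refactoring all see the same code the notebook runs.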
One downside is that if I develop a workflow for transforming my notebook back into proper code to run the code, I might wonder whether the notebook has gained me anything over ordinary literate coding besides circuitous workarounds. Am I then sure I don’t secretly want to use knitr? Pro tip: although it advertises itself for R, it already supports Python. That’s worth mentioning again. We don’t need to be arsing about with this thing. We can leave.
Anyway, jupytext sounds promising, right? There’s a downside for me: I just (2020-11-02) spent 90 minutes trying to get jupytext to work on my Jupyter notebook, and it continues to sullenly fail. I don’t have any more time for debugging this nonsense. I might check back in a year or two, but for now it’s dead to me.
2.6 Diffing/merging notebooks (sort-of) natively
Okay, I surrender. We’re stuck with the nasty Jupyter notebook format. Fine. nbdime provides diffing and merging for notebooks. It has git integration, which its documentation enables with nbdime config-git --enable --global.
I do not use this one because it seemed too slow on the large notebooks I was using and did not play well with my git GUI. In any case, it does not seem to support three-way merging, which means most merges fail and need manual intervention anyway.
Development on this is slow, and the latest release is broken in the git installation phase. A fixed development release can be installed instead.
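The core idea of a notebook-aware diff can be sketched in a few lines: compare what the cells say and ignore outputs and execution counters. This is only an illustration of the idea with the standard library, not nbdime's algorithm; source_lines, source_diff, code_cell and the sample notebooks are hypothetical.

```python
import difflib

def source_lines(notebook):
    """Flatten a notebook dict into source lines, ignoring outputs and counters."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        text = source if isinstance(source, str) else "".join(source)
        lines.append(f"## cell {index} ({cell.get('cell_type')})")
        lines.extend(text.splitlines())
    return lines

def source_diff(old, new):
    """Unified diff of two notebooks that only looks at cell sources."""
    return list(difflib.unified_diff(source_lines(old), source_lines(new),
                                     "old", "new", lineterm=""))

def code_cell(source, execution_count, output_text):
    """Build a code cell dict in the nbformat 4 layout."""
    return {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "source": source,
        "outputs": [{"output_type": "stream", "name": "stdout",
                     "text": [output_text]}],
    }

old = {"cells": [code_cell(["x = 1\n", "y = x + 1\n", "print(y)"], 3, "2\n")]}
new = {"cells": [code_cell(["x = 1\n", "y = x + 2\n", "print(y)"], 12, "3\n")]}

diff = source_diff(old, new)
print("\n".join(diff))

# Only the edited line of code shows up; the changed counter and output do not.
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
```

Merging is harder than diffing, since outputs and metadata still have to be reconciled somehow once the sources agree.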
3 Exporting notebooks
I can host static versions easily using nbviewer (and GitHub will do this automatically). For fancier variations, I need to read how the document templates work. Here’s a base LaTeX template for, e.g., academic use.
For special occasions I could write my own exporter or customize an existing one. Julius Schulz has some great tips. For example, he shows how to use cell metadata to format figures.
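As a rough illustration of the general mechanism, and not Julius Schulz's actual code or schema, here is how an exporter might turn cell metadata into a LaTeX figure environment. The metadata keys caption and label, the function figure_latex and the file name are all hypothetical, and no LaTeX escaping is attempted.

```python
def figure_latex(cell, image_path):
    """Wrap an image in a LaTeX figure using caption/label from cell metadata."""
    metadata = cell.get("metadata", {})
    caption = metadata.get("caption", "")  # hypothetical metadata key
    label = metadata.get("label", "")      # hypothetical metadata key
    lines = [
        r"\begin{figure}",
        r"\centering",
        r"\includegraphics[width=\linewidth]{" + image_path + "}",
    ]
    if caption:
        lines.append(r"\caption{" + caption + "}")
    if label:
        lines.append(r"\label{" + label + "}")
    lines.append(r"\end{figure}")
    return "\n".join(lines)

# Hypothetical code cell whose metadata was edited in the notebook UI.
cell = {
    "cell_type": "code",
    "metadata": {"caption": "Training loss per epoch", "label": "fig:loss"},
    "source": ["plot_loss(history)"],
    "outputs": [],
}

latex = figure_latex(cell, "output_3_0.png")
print(latex)
```

In a real exporter this logic would live in a template or preprocessor rather than a standalone function.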
fast.ai’s nb2md trick converts Jupyter notebooks to Markdown for blogging with my blogging platform of choice. See also jupytext above.