Jupyter

The least excruciating compromise between 1) irreproducible science, and 2) spooking your luddite colleagues


The python-derived entrant in the scientific workbook field is called jupyter.

Interactive “notebook” computing for various languages; python/julia/R/whatever plugs into the open “kernel” interface. Jupyter allows easy(ish) online-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. So handy that it’s sometimes worth the many rough spots, and so I conquer my discomfort with this style of work.

Jupyter considered mostly harmless

jupyter notebook in action

tl;dr Not to besmirch the efforts of the jupyter developers who are doing a difficult thing, but I will complain about jupyter notebook because it is often touted as a wonderful solution to data science but seems to me to merely offer a different selection of pain points to traditional methods. Worse, it introduces some new pain points when you try to combine the old and the new to make something better.

I’m an equivocal advocate of the jupyter notebook interface, which some days seems to counteract every plus with a minus. This is partly due to the particulars of jupyter’s design decisions, and partly because of the problems of notebook interfaces generally (Chattopadhyay et al. 2020). As with so many things in computer interfaces, this luke-warm endorsement makes me, in relative terms, a fan because most other things are worse.

As for jupyter: It’s friendly to use, but hard to install. It’s easy to graphically explore your data, but hard to keep that exploration in version control. It makes it easy to explore your code output, but clashes with the fancy debugger that would make it easy to explore your code bugs. It is open source, and written in an easy scripting language, python, so it seems it should be easy to tweak to taste. In practice it’s an ill-explained spaghetti of javascript and various external packages that relate to one another in obscure ways that few people with a day job have time to contribute to. The sum total is IMO no more easy to tweak than most of the other UI development messes that we tolerate in academic software.

These pain points are perhaps not so intrusive for projects of intermediate complexity, and jupyter seems good at making such projects look smooth, shiny, and inviting. That is, at the crucial moment when you need to make your data science project look sophisticated-yet-friendly, it lures colleagues into your web(-based IDE). Then it is too late mwhahahahah you have fallen into my trap now you are committed. This might be a feature not a bug, as far as the realities of team dynamics and their relation to software development go.

Some, such as Guillaume Chevallier and Jeremy Howard, argue that the weird / irritating constraints of jupyter can even lead to good architecture. This sounds like an interactive twist on the old test-driven-development rhetoric. Or the famous “The fastest code is code that doesn’t need to run, and the best code is code you don’t need to write” adage, whose corollary is presumably “so let’s make writing code horrible so that you don’t”. I could be persuaded, if I found time in between all the debugging.

Here is some verbiage by Will Chrichton: The Future of Notebooks: Lessons from JupyterCon

Terminology

Terminology tarpit alert.

The notebook is, on one hand, a style of interface, to which jupyter conforms. Other applications with a notebook style of interface are Mathematica and MATLAB.

These interfaces communicate with a computational backend, which is called a kernel, because in mathematics and computer science, if you don’t know what to call something, you call it a kernel. This confusing explosion of definitions is very much on-message for notebook development.

These are software packages in which the unit of development is a type of notebook file on your disk, containing both code and the output of that code. (In the case of jupyter this file format is marked by the file extension .ipynb, which is short for “ipython notebook” for fraught historical reasons.) There are multiple ‘frontends’ which speak the jupyter notebook protocol to a kernel. One implementation is the jupyter notebook, launched by the jupyter notebook command, which opens a javascript-backed notebook interface in a web browser; this is the one that is usually assumed. Another common notebook-style interface is jupyter lab, which reuses much of the same jupyter notebook infrastructure but is distinct and only sometimes interoperable, in ways which I do not pretend to know in depth. And there are many more besides.

Which sense of notebook is intended you have to work out from context, e.g. the following sentence is not at all tautological:

Yo dawg, I heard you like notebooks, so I started up your jupyter notebook in jupyter notebook.

Hearing this, I became enlightened.

Version control

jupyter diffing and merging is painful. Because jupyter notebooks (the file format) are a weird mash of binary multimedia content and program input and output, all wrapped up in a JSON encoding, things get messy when you try to put them into version control. In particular, your repository gets very large, and your git client may or may not show diffs. Oh, and merging is likely to break things.
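To see why, it helps to look at what the format stores. Here is a hand-trimmed sketch of a single code cell on disk (field names follow the nbformat spec; the base64 blob is illustrative):

```json
{
  "cell_type": "code",
  "execution_count": 3,
  "metadata": {},
  "source": ["plt.plot(x, y)"],
  "outputs": [
    {
      "output_type": "display_data",
      "data": {
        "image/png": "iVBORw0KGgoAAAANSUhEUgAA...many kilobytes of base64..."
      }
    }
  ]
}
```

The execution_count churns on every run and the embedded image churns on every re-rendered pixel, so even a behaviourally identical re-run of the notebook produces a large, meaningless diff.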

This is a huge barrier to seamlessly integrating text and notebook-based development practices, and thus impairs the mission goal of providing an easy on-ramp to data science for users. IMO this is one of the most annoying features of the notebook system, and it would have been completely avoidable if they had settled on a less restrictive textual data format than JSON for the backend storage. Too late now, I suppose.

Strip notebooks

You can automatically strip images and other big things from your notebook to keep them tidy if you are using git as your version control. This means you lose the graphs and such that you just generated as outputs in your notebook. On the other hand you already have the code to generate them again right there, so you don’t necessarily want them around anyway. @mgeier documents the details of manually doing this for jupyter. This is still not perfect since the code will be wrapped up in mangled JSON formatting, but at least it is readable.

Manually doing it is tedious. See how fastai does this automatically with automated git hooks. Not well explained, but it works. The quickest way, if we are not working for fastai, is nbstripout, upon which the fastai hack is AFAICT based, and which includes its own installation script. You can set up attributes so that these filters and others are invoked automatically. It’s a surprisingly under-documented thing for some reason.

tldr In your repository do this

pip install nbstripout
nbstripout --install

After you check a notebook out from git you will notice that there are no output cell contents any more. You can recreate the outputs of all your code input cells by running them again if desired, but they won’t go into git, which is usually what you want. I do this for all my notebooks now. This doesn’t entirely solve the diffing and merging hurdles, but usually removes just enough pointless cruft that merging kind-of works.
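For the curious, nbstripout --install works by registering a git clean filter. The effect is roughly the following (exact details vary by nbstripout version, so treat this as a sketch, not gospel):

```
# .gitattributes (or .git/info/attributes)
*.ipynb filter=nbstripout

# .git/config gains something like
[filter "nbstripout"]
    clean = nbstripout    # strip outputs on the way into the object store
    smudge = cat          # leave files untouched on checkout
```

The clean filter runs when git stages the file, so your working copy keeps its outputs while the repository never sees them.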

jupytext

One way you can make your notebooks manageable is to turn them into text. I haven’t tried this myself but it looks like it could be made to behave well automatically.

jupytext can do that and more:

Wish you could edit [jupyter notebooks] in your favourite IDE? And get clear and meaningful diffs when doing version control? Then… Jupytext may well be the tool you’re looking for!

Jupytext can save Jupyter notebooks as Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, C++ and q/kdb+ scripts.

There are multiple ways to use jupytext:

Directly from Jupyter Notebook or JupyterLab. Jupytext provides a contents manager that allows Jupyter to save your notebook to your favorite format (.py, .R, .jl, .md, .Rmd …) in addition to (or in place of) the traditional .ipynb file. The text representation can be edited in your favorite editor. When you’re done, refresh the notebook in Jupyter: inputs cells are loaded from the text file, while output cells are reloaded from the .ipynb file if present. Refreshing preserves kernel variables, so you can resume your work in the notebook and run the modified cells without having to rerun the notebook in full.

On the command line. jupytext converts Jupyter notebooks to their text representation, and back. The command line tool can act on notebooks in many ways. It can synchronize multiple representations of a notebook, pipe a notebook into a reformatting tool like black, etc… It can also work as a pre-commit hook if you wish to automatically update the text representation when you commit the .ipynb file.
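For instance, the pairing and syncing workflow from the command line looks like this (flag names from the jupytext docs; check against your installed version):

```shell
# pair the notebook with a percent-format python script; both stay in sync
jupytext --set-formats ipynb,py:percent notebook.ipynb

# after editing either file, propagate changes to the other
jupytext --sync notebook.ipynb

# one-off conversions also work
jupytext --to markdown notebook.ipynb
```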

A plus here is that search-and-replace then works seamlessly across normal code and wacky notebook-encoded code, which is, I assure you, a constant irritation.

One downside here is that if you develop a workflow around transforming your notebook back into proper code in order to run it, you might wonder whether the notebook has gained you anything over ordinary literate coding, except circuitous workarounds so you can have visualisation plots embedded in your code.

Are you sure you do not secretly want to be running knitr?

Diffing/merging notebooks natively

One workaround: nbdime provides diffing and merging for notebooks. It has git integration:

nbdime config-git --enable --global

I do not use this one because it seemed too slow on the large notebooks I was using and did not play well with my git GUI.

Exporting notebooks

You can host static versions easily using nbviewer (and github will do this automatically.) For fancy variations you need to read how the document templates work. Here is a base latex template for e.g. academic use.

For special occasions you can write your own or customize an existing exporter. Julius Schulz has virtuosic tips, e.g. using cell metadata to format figures like this:

{
  "caption": "somecaption",
  "label": "fig:somelabel",
  "widefigure": true
}
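If you prefer to set such metadata programmatically rather than by hand-editing JSON, the nbformat library (the reference implementation of the file format) can do it. A minimal sketch; the filename and cell source are made up for illustration:

```python
import nbformat

# Build a one-cell notebook and attach figure metadata to the cell.
nb = nbformat.v4.new_notebook()
cell = nbformat.v4.new_code_cell("plt.plot(x, y)")
cell.metadata["caption"] = "somecaption"
cell.metadata["label"] = "fig:somelabel"
cell.metadata["widefigure"] = True
nb.cells.append(cell)

# Write it out, ready for nbconvert templating.
nbformat.write(nb, "figured.ipynb")
```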

fast.ai’s nb2md trick renders jupyter for blogging with your blogging platform of choice. See also jupytext above.

Presentations using Jupyter

Basic: export to reveal.js

You can use my favourite dorky presentation hack!

The easiest is Classic reveal.js mode. tl;dr:

$ jupyter nbconvert --to slides some_notebook.ipynb  --post serve

You might want to make various improvements, such as tweaking the reveal.js settings in jupyter slideshows

If you aren’t running a coding class, you will want to hide the input cells from your IPython slides by customising the output templates, or you can suppress all code by using output format hide_code_slides.

These kinds of custom tweaks are not too crazy, but you need a copy of the reveal.js source code for some of them. A more comprehensive invocation, using a custom theme and the reveal.js source, looks like:

jupyter nbconvert Presentation.ipynb  \
    --to slides --reveal-prefix ../../reveal.js \
    --post serve --SlidesExporter.reveal_theme=league

Fancier: integrated slideshows using RISE

Fancier again: interactive slideshows using RISE.

To meet your house style requirements it is usually sufficient to customise some decorations and alter some css.

Major plus: you can execute code while running the slide!

Major minus: there is no facility that I can see to style your cover slides differently, which is incompatible with, e.g., my university’s style guide.

If you don’t wish to display inline input code, you can avoid it with hide_code

Install using pip:

pip install hide_code
jupyter nbextension install --py hide_code
jupyter nbextension enable --py hide_code
jupyter serverextension enable --py hide_code

Install using conda:

conda install -c conda-forge hide_code

Front ends

Jupyter is, as presaged, a whole ecology of different language back end kernels talking to various front end interfaces and it is terribly confusing.

  • Classic jupyter notebook, the archetypal browser-based coding environment. The command jupyter notebook starts this mode.
  • jupyterlab, the new thing, extends and redesigns the classic notebook into an IDE with text editors, notebooks, REPL terminals etc. The command jupyter lab starts this mode.
  • base ipython shell can execute notebooks. Is that the same as jupyter console?
  • hydrogen, a plugin for the atom text editor, provides a more unified coding experience with a normal code editor. See intro blog post. IMO, this kind of thing is a generally better way of doing it. Jupyter shouldn’t have to reinvent text editors. Although my opinions will not dissuade some Quixote from taking it on.
  • vscodeJupyter is hydrogen for VS Code.
  • nteract, a desktop app for running jupyter notebooks as apps, integrating with OS indexing services and looking pretty etc. Not totally sold on this idea because it looks so bloaty, but I would like to be persuaded.
  • pweave, below, also executes jupyter kernels as part of a reproducible document.
  • qtconsole is a traditional client.

Notebook classic

Configuring

The location of theming, widgets, CSS etc. has moved of late; check your version number. The current location is ~/.jupyter/custom/custom.css, not the former location ~/.ipython/profile_default/static/custom/custom.css.

Julius Schulz’s ultimate setup guide is also the ultimate pro tip compilation.

Auto-closing parentheses

Kill parenthesis molestation (a.k.a. bracket autoclose) with fire. Unless you like having to fight with your IDE’s misplaced faith in its ability to read your mind. The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N”, but instead cm_config.autoCloseBrackets and is not in the preference menus. According to a support ticket this should work.

# Run this in Python once, it should take effect permanently

from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {
  "cm_config": {"autoCloseBrackets": False}}})

or add the following to your custom.js:

define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})

or maybe create ~/.jupyter/nbconfig/notebook.json with the content

{
  "CodeCell": {
    "cm_config": {
      "autoCloseBrackets": false
    }
  }
}

That doesn’t work with jupyterlab, which is even more righteously sure that it knows better than you about what you truly wish to do with parentheses. Perhaps the following does work? Go to Settings --> Advanced Settings Editor and add the following to the User Overrides section:

{
  "codeCellConfig": {
    "autoClosingBrackets": false
  }
}

Ahhhhhhhh.

Notebook extensions

Jupyter classic is more usable if you install the notebook extensions, which include, e.g., drag-and-drop image support.

$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user

For example, if you drag an image into a notebook and then run nbconvert to generate an HTML file, the image will remain outside the HTML file. You can embed all images by calling nbconvert with the EmbedPostProcessor.

$ jupyter nbconvert --post=embed.EmbedPostProcessor

Update — broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user

Jupyter lab

jupyter lab (sometimes styled jupyterlab) is the current cutting edge, and reputedly is much nicer to develop plugins for than the notebook interface. From the user perspective it’s more or less the same thing, but the annoyances are different. It does not strictly dominate notebook in terms of user experience, although I understand it does in terms of experience for plugin developers.

A pitch targeted at us users explains some practical implications of this and how it is the one true way and the righteous future etc. Since we are not in the future, though, we must deal with certain friction points in the present, actually-existing jupyterlab.

Jupyterlab UI

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already worked, but as a jankier javascript version. The jupyter lab creators have succumbed as far as reimplementing copy, paste, search/replace, browser tabs and the command line. In the tradition of jupyter, I think of this as

Yo dawg I heard you like notebook tabs so I put notebook tabs in your browser notebook tab.

The replacement versions run in parallel to the existing versions, with clashing keyboard shortcuts and confusingly similar but distinct function. One wonders if the creators of the UI are actually using the UI, or if they are all working away writing code in text editors and blogging about the UI without touching it, or maybe this is just Tesler’s law in action.

Because I am used to how all these functions work in the browser, each of them would have to be a huge improvement to be worth my time learning the new jupyterlab system, which, after all, I am not using for its quirky alternate take on tabs, cut-and-paste etc., but because I want a quick interface to run some shareable code with embedded graphics. Large UX improvements are not immediately forthcoming, as far as I can tell; rather, we get some unintuitive trade-offs, like a search function which AFAICT non-deterministically sometimes does regexp matching but then doesn’t search the whole page. Surely that cannot be the goal? Some jupyterlab enthusiasts want to re-implement text editors too. I’m not a fan of this DIY handicraft for my own use, but if that is your thing, please do weigh in with some code contributions, and I will be grateful.

Whether you like the overwrought jupyter lab UX or not, we should all live with whatever NIH it exemplifies, if the developer API is truly cleaner and easier to work with. That would be a solid win in terms of delivering the interactive coding features I would actually regard as improvements, including, maybe, ultimately, a better UI. In the meantime, I ignore the things that vex me. Mostly.

Personal peeve: As presaged, jupyter lab molests brackets, compulsorily, as a test of your faith, per default.

Multiple clients connecting to a single kernel

Supported natively in jupyterlab, with a good UI for identifying running notebooks. IMO this is the killer feature of jupyter lab. Which is to say, it is usable, but sometimes calculation output gets lost in the system somewhere.

There is an underdocumented project to introduce real-time collaboration to jupyterlab, coordinating notebook content, output and backend state, but that does not work yet.

If you have a snakepit of different jupyter sessions running on some machine you have just logged in to, and wish to open up a browser UI for them, you first need to work out which are running so that you can attach to them. The command (for either jupyter notebook or jupyter lab sessions) is:

jupyter notebook list

Lab extensions

Related to, inspired by, and maybe conflicting or intersecting with the nbextensions are the labextensions, which add bits of extra functionality to the lab interface rather than the notebook interface (where the lab interface is built upon the notebook infrastructure and runs notebooks just like it, but has some different moving parts under the hood).

I try to keep the use of these to a minimum, as I have a possibly irrational foreboding that some complicated death spiral of version clashes is beginning between all the different jupyter kernel and lab and notebook installations I have cluttering up my hard disk, and it can’t improve things to put various versions of lab extensions in the mix, can it? And I really don’t want to have to understand how it works to work out whether that is true, so please don’t explain it to me. I moreover do not wish to obsessively update lab extensions everywhere.

Anyway there are some useful ones, so I live with it by running install and update commands obsessively in every combination of kernel/lab/whatever environment in the hope that something sticks.

Life is easier with jupyterlab-toc, which allows you to navigate your lab notebook by markdown section headings.

jupyter labextension install @jupyterlab/toc

The upgrade command is

jupyter labextension update @jupyterlab/toc

Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

jupyter labextension install jupyterlab-drawio

LaTeX editor? As flagged, I think this is a terrible idea. Even worse than the diagram editor. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, e.g. as a plot point in a contrived techno-thriller script written by cloistered nerds. If you find yourself in such dramaturgical straits:

jupyter labextension install @jupyterlab/latex

One nerdy extension is jupyter-matplotlib, a.k.a., confusingly, ipympl, which better integrates interactive plotting into the notebook.

pip install ipympl

# If using JupyterLab:
# install nodejs first: https://nodejs.org/en/download/
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib

qtconsole

A classic, i.e. non-web-browser-based, client for jupyter. No longer fashionable? It seems to work fine, but is sometimes difficult to compile and doesn’t support all the fancy client-side extensions.

Multiple clients connecting to a single kernel

jupyter qtconsole can connect two frontends to the same kernel. This will be loopy since they update the same variables (presumably) but AFAICT not the same notebook content, so some care is required to make sure you are doing what you intend. From inside a running session, the %qtconsole magic opens a qtconsole attached to the current kernel:

%qtconsole
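You can also attach from outside the notebook: the --existing flag (standard jupyter CLI) connects a new console frontend to an already-running kernel. (The connection file name below is illustrative.)

```shell
# attach to the most recently started kernel
jupyter qtconsole --existing

# or name a specific connection file
jupyter console --existing kernel-12345.json
```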

Google colab

A proprietary google fork/extension, Colaboratory is a jupyter thingo integrated with some fancy hosting and storage infrastructure, and you get free GPUs. Looks like a neat way of sharing things, and all that they demand is your soul. TBD.

Rich display

Various helper classes support rich display of python objects, e.g. IPython.display.Image:

from IPython.display import Image
Image(filename='img/test.png')

or you can use markdown for local image display

![title](img/test.png)

If you want to make your own objects display, uh, richly, you can implement the appropriate magical methods:

class Shout(object):
    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        return "<h1>" + self.text + "</h1>"

I leverage this to make a latex renderer called latex_fragment which you should totally check out for rendering inline algorithms, or for emitting SVG equations.

Custom kernels

jupyter looks for kernel specs in a kernel spec directory, depending on your platform.

Say your kernel is dan; then the definition can be found in the following location:

  • Unixey: ~/.local/share/jupyter/kernels/dan/kernel.json
  • macOS: ~/Library/Jupyter/kernels/dan/kernel.json
  • Win: %APPDATA%\jupyter\kernels\dan\kernel.json

See the manual for details.

How to set up jupyter to use a virtualenv (or other) kernel? tl;dr Do this from inside the virtualenv to bootstrap it:

pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name
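To confirm the kernel registered (and see which spec directory it landed in), jupyter ships a kernelspec subcommand:

```shell
jupyter kernelspec list                          # show registered kernels and their paths
jupyter kernelspec uninstall my-virtualenv-name  # remove one later
```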

Addendum: for Anaconda, you can auto-install all conda envs, which worked for me, unlike the ipykernel method.

conda install nb_conda_kernels

Custom kernel lite — e.g. if you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example for GPU-enabled kernels:

For computers on Linux with optimus, you have to make a kernel that will be called with optirun to be able to use GPU acceleration.

I made a kernel in ~/.local/share/jupyter/kernels/dan/kernel.json and modified it thus:

{
    "display_name": "dan-gpu",
    "language": "python",
    "argv": [
        "/usr/bin/optirun",
        "--no-xorg",
        "/home/me/.virtualenvs/dan/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
    ]
}

Any command can be set up to use CUDA but not the actual GPU, by blanking an environment variable in a wrapper script, which is handy for kernels. So this could be in a script called noprimusrun:

#!/bin/sh
# run the given command with no GPUs visible to CUDA
CUDA_VISIBLE_DEVICES= exec "$@"

Graphs

Set up inline plots:

%matplotlib inline

inline svg:

%config InlineBackend.figure_format = 'svg'

Graph sizes are controlled by matplotlib. Here’s how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)

Interesting-looking other graphing options:

Jupyter lab includes such nifty features as a diagram editor which you can install using jupyter labextension install jupyterlab-drawio

Citations and other academic writing in Jupyter

I did this for

  1. my blog — using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and easy and robust.

  2. my papers — abandoning jupyter in favour of Pweave+pandoc, which works amazingly for everything if you use pandoc tricks for your citations.

I couldn’t find a unified approach for these two different use cases which didn’t sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I’m just being a fancy pants if I try to tweak it further.

More recently there is jupyterbook, which enables notebook-based blog rendering, including citations. This is built using the ruby site generator jekyll-scholar, so it is heavy in dependencies, but it seems to work.

Chris Sewell has produced a script called ipypublish that eases some pain points in producing articles. It’s an impressive piece of work.

My own latex_fragment allows you to insert one-off latex fragments into jupyter and pweave (e.g. algorithmic environments or some weird tikz thing.)

Jean-François Bercher’s jupyter_latex_envs reimplements various latex markup as native jupyter, including \cite.

Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

nbconvert has built-in citation support but only for LaTeX output. Citations look like this in markup:

<cite data-cite="granger2013">(Granger, 2013)</cite>

or even

<strong data-cite="granger2013">(Granger, 2013)</strong>

The template defines the bibliography source and looks like:

((*- extends 'article.tplx' -*))

((* block bibliography *))
((( super () )))
\bibliographystyle{unsrt}
\bibliography{refs}
((* endblock bibliography *))

And building looks like:

jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb

As above, it helps to know how the document templates work.

Note that even in the best case you don’t have access to natbib-style citation, so auto-year citation styles will look funky.

Speaking of custom templates, the nbconvert setup is customisable for more than latex.

{% extends 'full.tpl'%}
{% block any_cell %}
    <div style="border:thin solid red">
        {{ super() }}
    </div>
{% endblock any_cell %}

But how about for online? cite2c seems to do this by live inserting citations from zotero, including author-year stuff. (Requires Jupyter notebook 4.2 or better which might require a pip install --upgrade notebook)

Julius Schulz gives a comprehensive config for this and everything else.

This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article; and the citation keys it uses are zotero citation keys which are nothing like your BibTeX keys so can’t really be manually edited.

If you are customising the output of jupyter’s nbconvert, you should be aware that the {% block output_prompt %} override doesn’t actually do anything in the templates I use (slides, HTML, LaTeX). Instead you need to use a config option:

$ jupyter nbconvert --to slides some_notebook.ipynb \
   --TemplateExporter.exclude_output_prompt=True \
    --post serve

I had to use the source to discover this.

ipyBibtex.ipynb? Looks like this:

%%cite
Lorem ipsum dolor sit amet
__\citep{hansen1982,crealkoopmanlucas2013}__,
consectetuer adipiscing elit,
sed diam nonummy nibh euismod tincidunt
ut laoreet dolore magna aliquam erat volutpat.

So it supports natbib-style author-year citations! But it’s a small, unmaintained package so is risky.

🏗 Work out how Mark Masden got citations working?

Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers - if you are brave enough to let people execute weird code on your machine. I’m not going to go into the security implications here.

Commercial notebook hosts

NB: This section is outdated. 🏗; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

Base level, you can run one using a standard cloud option, like buying compute time as a virtual machine or container, and using a jupyter notebook for your choice of data science workflow.

Special mention to two early movers:

  • sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.

  • Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). Requires you to use their anaconda python distribution tools, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine is, you tend not to want another one confusing things and wasting disk space.

  • Microsoft’s Azure notebooks

    Azure Notebooks is a free hosted service to develop and run Jupyter notebooks in the cloud with no installation. Jupyter (formerly IPython) is an open source project that lets you easily combine markdown text, executable code (Python, R, and F#), persistent data, graphics, and visualizations onto a single, sharable canvas called a notebook.

  • Google’s Colaboratory is hip now

    Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

    With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

    Here is an intro and here is another

Miscellaneous tips and gotchas

Debugging

This is all built on ipython, so you invoke the debugger ipython-style, specifically:

from IPython.core.debugger import Tracer; Tracer()()      # < 5.1
from IPython.core.debugger import set_trace; set_trace()  # >= v5.1

I can’t see part of the cell!

Sometimes you can’t see the whole code cell, which is annoying. This is a known issue. The workaround is simple enough:

Zoom out to 90% and back in to 100% (Ctrl + - / Ctrl + +).

IOPub data rate exceeded.

You got this error and you weren’t doing anything that bandwidth-intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0:

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py

update the c.NotebookApp.iopub_data_rate_limit to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.

This is fixed after 5.0.

Offline MathJax in jupyter

e.g. for latex free mathematics.

python -m IPython.external.MathJax /path/to/source/MathJax.zip

Interactive code

Jupyter allows interactions! This is the easiest python UI system I have seen, for all that it is basic.

Updating progress graphs

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output

x = np.linspace(0, 10, 100)
for i in range(10):
    clear_output(wait=True)       # clear the cell output before redrawing
    plt.plot(x, np.sin(x + i))
    plt.show()

Widgets

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: If you want a list of widgets

from ipywidgets import widget
widget.Widget.widget_types

Then you use them.

from ipywidgets import interact, IntSlider
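A minimal use of interact, assuming ipywidgets is installed; the slider re-runs the function on every change (first_squares is a made-up example function):

```python
from ipywidgets import interact, IntSlider

def first_squares(n=5):
    """Print and return the first n square numbers."""
    vals = [i * i for i in range(1, n + 1)]
    print(vals)
    return vals

# In a notebook this renders a slider; dragging it re-invokes first_squares.
interact(first_squares, n=IntSlider(min=1, max=20, value=5))
```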

External event loops

External event loops are now easy and documented. What they don’t say outright is that if you want to use the tornado event loop, relax because both the jupyter server and the ipython kernel already use the pyzmq event loop which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()

Now, your asynchronous python should just work using tornado coroutines.

NB with the release of the latest asyncio and tornado, and various major version incompatibilities, I’m curious how smoothly this all still works.
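For what it’s worth, on current versions the same pattern mostly reduces to plain asyncio, since modern tornado runs on the asyncio loop. A stdlib-only sketch of cooperative background work (function name is made up for illustration):

```python
import asyncio

async def background_tick(n):
    # Simulate periodic background work cooperating with an event loop,
    # as a notebook cell might schedule alongside the kernel's own loop.
    results = []
    for i in range(n):
        await asyncio.sleep(0)   # yield control back to the loop
        results.append(i)
    return results

# Outside a notebook we must start a loop ourselves; inside a running
# kernel you would instead `await background_tick(3)` directly in a cell.
print(asyncio.run(background_tick(3)))  # → [0, 1, 2]
```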

Javascript from python with jupyter

As seen in art python.

Here’s how you invoke javascript from jupyter. Here is the jupyter JS source. And here is the full jupyter browser JS manual, and the Jupyter JS extension guide.

Chattopadhyay, Souti, Ishita Prasad, Austin Z Henley, Anita Sarma, and Titus Barik. 2020. “What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities,” 12.

Himmelstein, Daniel S., Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, and Anthony Gitter. 2019. “Open Collaborative Writing with Manubot.” Edited by Dina Schneidman-Duhovny. PLOS Computational Biology 15 (6): e1007128. https://doi.org/10.1371/journal.pcbi.1007128.

Otasek, David, John H. Morris, Jorge Bouças, Alexander R. Pico, and Barry Demchak. 2019. “Cytoscape Automation: Empowering Workflow-Based Network Analysis.” Genome Biology 20 (1): 185. https://doi.org/10.1186/s13059-019-1758-4.