Plotting in python

Jack of all trades, old master of none



I’m visualising data in python because it is the lingua franca of my team. I’d like to use it both for real-time, interactive graphics and for publication-quality output, but I won’t be inconsolable if I cannot achieve both simultaneously.

Visualisation is not an especially strong suit of python; the strong suit is hodgepodge, decoupage, bricolage, and, uh, potpourri. Therefore my solution is to cobble something together, or better, to use someone else’s cobbling.

Matplotlib

The default option. Well documented, ubiquitous. Reliable. Built upon a foundation of design choices that have aged poorly. Basic plots are easy; nuanced plots are a maze of margin tweaking, overlapping and incompatible helper libraries, confusing naming and API collisions. A classic example of a tool which is easier to use by copy-pasta from Stack Overflow than by working it out yourself. Complicated enough that I made a new notebook. See matplotlib.
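
For the record, here is a minimal sketch of the easy case, assuming numpy and matplotlib are installed. The data, the labels and the output file name are made up for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: one period of a sine wave.
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Hypothetical output location; PDF gives vector output for print.
out_path = Path(tempfile.gettempdir()) / "basic_plot.pdf"
# bbox_inches="tight" trims surplus margins, one of the more common tweaks.
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```

Anything much fancier than this is where the margin tweaking begins.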

Send it to R

R plotting is good, so it can be worth the overhead to export data to R and plot it there.
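
The lowest-tech export route is a CSV file. A minimal sketch using only the standard library follows; the rows, the column names and the helper `export_for_r` are all hypothetical, and in practice a pandas DataFrame's own CSV writer would do the same job.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical example data: one dict per observation.
rows = [
    {"x": 0.0, "y": 1.2, "group": "a"},
    {"x": 1.0, "y": 2.3, "group": "a"},
    {"x": 2.0, "y": 1.9, "group": "b"},
]


def export_for_r(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


out_path = Path(tempfile.gettempdir()) / "plot_data.csv"
export_for_r(rows, out_path)
# On the R side, read.csv() on this file gives a data frame ready for plotting.
```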

Holoviz

A suite of visualisation tools and guides called HoloViz includes a lot of plotting infrastructure. Fresh, enthusiastic.

HoloViz tools build on the many excellent visualization tools available in the scientific python ecosystem, allowing you to access their power conveniently and efficiently. The core tools make use of Bokeh’s interactive plotting, Matplotlib’s publication-quality output, and Plotly’s interactive 3D visualizations. Panel lets you combine any of these visualizations with output from nearly any other Python plotting library, including specific support for seaborn, altair, vega, plotnine, graphviz, ggplot2, plus anything that can generate HTML, PNG, or SVG.

HoloViz tools and examples generally work with any Python standard data types (lists, dictionaries, etc.), plus Pandas or Dask DataFrames and NumPy, Xarray, or Dask arrays, including remote data from the Intake data catalog library. They also use Dask and Numba to speed up computations along with algorithms and functions from SciPy.

HoloViz tools are designed for general-purpose use, but also support some domain-specific datatypes like graphs from NetworkX and geographic data from GeoPandas and Cartopy and Iris.
Panel can be used with yt for volumetric and physics data and SymPy or LaTeX for visualizing equations.
HoloViz tools provide extensive support for Jupyter notebooks, as well as for standalone web servers and exporting as static files.

Is it good? Some think so, notably Sophia Yang, who wrote some intros.

She explains:

HoloViz allows users to build Python visualization and interactive dashboard with super easy and flexible Python code. It provides the flexibility to choose among several API backends, including bokeh, matplotlib, and plotly, so you can choose different backends based on your preferences. Plus, it’s 100% open source!

Unlike the other python viz and dashboarding options, HoloViz is very serious about supporting every reasonable context in which you might want to use a Python viz or app tool:

  • a Jupyter notebook,
  • a Python file,
  • a batch job generating PDFs or SVGs or PNGs or GIFs,
  • as part of an automated report,
  • as a standalone server,
  • as a standalone .html file on a website.…

Each of the alternative technologies supports a few of those cases well but lets all the rest slide. HoloViz minimizes the friction and cost of switching between all of these contexts, because that’s the reality of any scientist or analyst—as soon as you publish it, people want changes! Once you have your Dash app; that’s all you have, but once you have a Panel app, you can go back to Jupyter the next day and start right where you left off.

  • Panel builds interactive dashboards and apps. It’s like R Shiny, but more powerful. I can’t say enough how much I love Panel.
  • hvPlot is easier than any other plotting libraries in my experience, especially if you like to plot Pandas DataFrames. With one line of code, hvPlot will provide you an interactive plot with all the nice built-in functionalities you want.
  • HoloViews is a great tool for data exploration and data mining through visualization.
  • GeoViews plots geographic data.
  • Datashader handles big data visualization. Using Numba (Python compiler) and Dask (distributed computing), Datashader creates meaningful visualizations of large datasets very quickly. I absolutely love Datashader and love the beautiful plots it generates.
  • Param creates declarative user-configurable objects.
  • Colorcet creates colormaps.

Plotly

Originally a mostly browser-based visualisation tool, Plotly now has native python support (source) which is supposed to be quite good and quite general these days. It supports high-resolution print-quality graphics, vector rendering and so on. Certainly the Plotly library is hipper than matplotlib and seems to incorporate the input of some graphic designers from the internet, which matplotlib seems to do rarely because it is old and/or confusing and/or unlikely to pop up as a highlight in your web portfolio since the main target is scientific journals.

Credit: I am indebted to Andy MacKinlay for reminding me that this is a viable concern.

Bokeh

bokeh does “big-data” and streaming-based browser graphing for python. And its website probably looks the nicest out of everything I’ve mentioned, which says something important about priorities. However, its print-output is bad; this is a web-oriented tool.

Bokeh is a Python library for creating interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself.
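
To make the “without writing any JavaScript” part concrete, here is a small sketch assuming bokeh is installed. The data and the file name are illustrative.

```python
import math
import tempfile
from pathlib import Path

from bokeh.plotting import figure, save
from bokeh.resources import CDN

# Illustrative data: roughly one period of a sine wave.
xs = [i * 0.1 for i in range(63)]
ys = [math.sin(x) for x in xs]

p = figure(title="Sine wave", x_axis_label="x", y_axis_label="sin(x)")
p.line(xs, ys, line_width=2)

# Write a standalone HTML page that loads BokehJS from the CDN.
out_path = Path(tempfile.gettempdir()) / "bokeh_example.html"
save(p, filename=str(out_path), resources=CDN, title="Bokeh example")
```

Opening the resulting file in a browser gives a plot with pan, zoom and save tools.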

Vega/Altair

Browser visualiser Vega is available for python, via the library Altair. USP: easy interactives.

Misc browser options

jupyter notebooks have a rich enough API to integrate various more exotic pure-browser graphics options. In fact, since you are now using the web browser, you can peruse the menu at browser datavis.

Here are some hacks:

  • superset is Airbnb’s python+browser interactive data exploration tool; filed under dashboards.

  • mpld3 plugs browser d3.js into jupyter to emulate matplotlib.

    The mpld3 package is extremely easy to use: you can simply take any script generating a matplotlib plot, run it through one of mpld3’s convenience routines, and embed the result in a web page.

    2d only, AFAICT.

GR

GR.py wraps GR, a cross-platform visualisation framework:

GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes. […]

GR is essentially based on an implementation of a Graphical Kernel System (GKS) and OpenGL. […] GR is characterized by its high interoperability and can be used with modern web technologies and mobile devices. The GR framework is especially suitable for real-time environments.

It will also function as a matplotlib backend. GR is ugly brutalist in its graph presentation, but it works fine.

Visdom

Visdom is

A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

It pumps graphs to a visualisation server enabling some kind of shared visualisation of a thing of interest. You want more of a pitch?

  • Visdom aims to facilitate visualization of (remote) data with an emphasis on supporting scientific experimentation.
  • Broadcast visualizations of plots, images, and text for yourself and your collaborators.
  • Organize your visualization space programmatically or through the UI to create dashboards for live data, inspect results of experiments, or debug experimental code.

It sounds like one could e.g. build an SGD diagnostic convergence diagram using this as an alternative to Tensorboard.

Plotting networks in particular

In browser datavis I found Sigma.js; there are surely more JS graph visualisations.

VisPy

VisPy is OpenGL-backed data visualisation, focussing on science (ooh!). It also offers a matplotlib compatibility layer. Here are some howtos:

It seems to require more writing of OpenGL shaders than one would like to draw a line graph.

However, there are less messy-looking tools in the ecosystem: napari is a multidimensional image viewer leveraging VisPy.

Mayavi

Mayavi is an opinionated open-source commercially-backed interactive 3D visualiser. The source code repository is worryingly hard to find. For future reference, it’s here.

On a similar tip, although looking more basic and more bitrotten, is VTK. If I understand correctly, VTK is the engine used by Mayavi? Better maintained and possibly still VTK-based is ParaView, which supports pluggable backends.

Not exactly graphing libraries

  • Disney (!) has a game library, Panda3D, that seems to do all the fun things.

  • Even more bareback is more-or-less-directly calling into OpenGL, but seriously, I’m a statistician, not a coder. I could also hand-pulp hemp to make my own graph paper to draw my visualisations, drawn in home-made iron gall ink, but I would find it equally hard to argue that it was an efficient prioritisation.

  • I haven’t used PREdator (although I understand it’s been around longer than I. Heh.) (Wiedemann, Bellstedt, and Görlach 2014)

General image reading and writing

  • Imageio is a good workhorse python image system.

Animations

GIFs

Of course, what we all truly want is animated GIFs. Here is a classic using Python and Pillow. See also the specialized array2gif.
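
In case the classic goes missing, the Pillow approach amounts to saving the first frame and appending the rest. A minimal sketch, assuming Pillow is installed; the frames are a made-up dot moving across a white square.

```python
import tempfile
from pathlib import Path

from PIL import Image, ImageDraw

# Build a few illustrative frames: a red dot sliding to the right.
frames = []
for i in range(8):
    img = Image.new("RGB", (100, 100), "white")
    draw = ImageDraw.Draw(img)
    x = 10 * i
    draw.ellipse((x, 40, x + 20, 60), fill="red")
    frames.append(img)

out_path = Path(tempfile.gettempdir()) / "moving_dot.gif"
frames[0].save(
    out_path,
    save_all=True,             # write every frame, not just the first
    append_images=frames[1:],  # the remaining frames
    duration=100,              # milliseconds per frame
    loop=0,                    # 0 means loop forever
)
```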

manim

3b1b’s manim is a curious passion project to create animations through code. It is famous on e.g. youtube. Here is a powerful example.

See manim for more documentation; I am currently using this tool and might document some stuff there.

References

Heusser, Andrew C., Kirsten Ziman, Lucy L. W. Owen, and Jeremy R. Manning. 2017. “HyperTools: A Python Toolbox for Visualizing and Manipulating High-Dimensional Data.” arXiv:1701.08290 [Stat], January.
Otasek, David, John H. Morris, Jorge Bouças, Alexander R. Pico, and Barry Demchak. 2019. “Cytoscape Automation: Empowering Workflow-Based Network Analysis.” Genome Biology 20 (1): 185.
Wiedemann, Christoph, Peter Bellstedt, and Matthias Görlach. 2014. “PREdator: A Python Based GUI for Data Analysis, Evaluation and Fitting.” Source Code for Biology and Medicine 9 (1): 21.
