I’m visualising data in python because it is the lingua franca of my team. I’d like it to be real-time and interactive or publication-quality, but I won’t be inconsolable if I cannot achieve both simultaneously.
Visualisation is not an especially strong suit of python; the strong suit is hodgepodge, decoupage, bricolage, and, uh, potpourri. Therefore our solution will be to cobble something together, or better, to use someone else’s cobbling.
The classic option is matplotlib. It can’t do all those modern hipster graphs without hard labour, it is awful at animations and interactions, and it is fugly by default. Still, it works OK out of the box. There are libraries which use matplotlib as a backend and aim for something smoother.
It is an acceptable default with lots of weird edge cases when you try to be clever.
Note some confusing terminology: an Axes object is made up of Axis objects, but is much more than a list of such objects, being the basic “graph” object.
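To make the Axes/Axis distinction concrete, here is a minimal sketch. The `Agg` backend and the toy data are my assumptions, chosen so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# The Axes is the whole "graph": data, labels, limits, legend and so on.
is_axes = isinstance(ax, Axes)

# Each Axes owns two Axis objects, which only handle ticks and scales.
has_x_axis = isinstance(ax.xaxis, XAxis)
has_y_axis = isinstance(ax.yaxis, YAxis)

# Tick placement is an Axis-level job ...
ax.xaxis.set_ticks([0, 1, 2])
# ... while titles and labels are set on the Axes.
ax.set_title("One Axes, two Axis objects")
tick_locations = list(ax.xaxis.get_ticklocs())
```

So `ax.xaxis` and `ax.yaxis` are the Axis objects, and almost everything else you will want to call lives on the Axes itself.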
Read Jakevdp’s manual.
Here are some miscellaneous tips:
- If you are using jupyter, the nerdy extension is jupyter-matplotlib which integrates interactive plotting into the notebook better.
- Improving log y-axis plots, esp histograms
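For the log y-axis histogram case, one plausible starting point is below. This is my own sketch rather than the method from the linked tip, and the log-normal toy data is an assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt

# Toy heavy-tailed data, purely for illustration.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(samples, bins=50)

# Put the counts on a log scale so the sparse tail bins remain visible.
ax.set_yscale("log")
ax.set_xlabel("value")
ax.set_ylabel("count (log scale)")
```

Empty bins simply vanish on a log axis, which is usually what you want, but worth knowing when the tail looks suspiciously gappy.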
Grammar of graphics
The default matplotlib stylesheet aspires to look like 80s spreadsheet defaults, but if you are not a retrofuturist, you want to change the stylesheet. Some of the built-in stylesheets are OK.
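A minimal sketch of swapping stylesheets follows. I picked `ggplot` only as an example of a built-in style, and the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt

# List the stylesheets that ship with this matplotlib install.
available = sorted(plt.style.available)

# Apply a style temporarily; the defaults come back when the block exits.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [1, 3, 2, 4])
    ggplot_face = ax.get_facecolor()

# An Axes created outside the block uses the default style again.
fig_default, ax_default = plt.subplots()
default_face = ax_default.get_facecolor()
```

Use `plt.style.use("ggplot")` instead if you want the change to stick for the whole session.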
Seaborn is another vaunted extension, which I would describe as an “Edward Tufterizer”. It extends matplotlib with a modern appearance and some missing plot types.
A cute hack to justify matplotlib’s existence: xkcd graphs.
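The xkcd mode is a context manager, so the silliness can be contained. A minimal sketch with invented data; if the comic fonts are not installed, matplotlib falls back to a default font with a warning.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt

sketch_before = plt.rcParams["path.sketch"]

# Inside the block, lines are drawn wobbly and fonts go hand-drawn.
with plt.xkcd():
    sketch_inside = plt.rcParams["path.sketch"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3, 4], [0, 1, 4, 2, 0])
    ax.set_xlabel("time spent tweaking plots")
    ax.set_ylabel("enthusiasm")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # render while the xkcd settings are active

sketch_after = plt.rcParams["path.sketch"]
```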
from matplotlib import rc
rc('font', family='serif', serif=['Palatino'])
rc('mathtext', fontset='cm')
Supported math fonts are reputedly
- dejavusans (horrible default)
- dejavuserif (beware of odd greek letters)
- cm (“Computer Modern”. Classic, dated.)
- stix (not sure)
- stixsans (sounds like sans serif to me)
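To compare these for yourself, something like the following renders the same formula once per font set. The formula and the figure size are arbitrary choices of mine.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt

fontsets = ["dejavusans", "dejavuserif", "cm", "stix", "stixsans"]
formula = r"$\alpha^2 + \int_0^1 \beta\,dx$"

rendered = {}
for name in fontsets:
    # rc_context changes the math font only inside this block.
    with plt.rc_context({"mathtext.fontset": name}):
        fig, ax = plt.subplots(figsize=(3, 1))
        ax.axis("off")
        ax.text(0.5, 0.5, formula, ha="center", va="center")
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        plt.close(fig)
    rendered[name] = buf.getvalue()  # PNG bytes, ready to write to disk
```

Write each entry of `rendered` to a file and eyeball them side by side before committing to one.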
Alternatively, you can render your graph labels with TeX, which leads to some weird spacing but allows you to match fonts better.
Yellowbrick is a matplotlib specialisation for hyperparameter optimisation.
Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it’s using Matplotlib.
Here are some promising hacks:
superset is Airbnb’s python+browser interactive data exploration tool.
The mpld3 package is extremely easy to use: you can simply take any script generating a matplotlib plot, run it through one of mpld3’s convenience routines, and embed the result in a web page.
2d only, AFAICT.
Same tool (web browser), different approach: bokeh does “big-data” and streaming-based browser graphing for python. Its website probably looks the nicest out of everything I’ve mentioned, which counts for a lot. However, its print output is bad; this is a web-oriented tool.
GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes. […]
GR is essentially based on an implementation of a Graphical Kernel System (GKS) and OpenGL. […] GR is characterized by its high interoperability and can be used with modern web technologies and mobile devices. The GR framework is especially suitable for real-time environments.
It will also function as a matplotlib backend. GR is somewhat brutalist in its graph presentation, but it works fine.
Visdom pumps graphs to a visualisation server.
- Visdom aims to facilitate visualization of (remote) data with an emphasis on supporting scientific experimentation.
- Broadcast visualizations of plots, images, and text for yourself and your collaborators.
- Organize your visualization space programmatically or through the UI to create dashboards for live data, inspect results of experiments, or debug experimental code.
Holoviews has been crafted by some neurologists to serve science. Fresh, enthusiastic. Is it good?
HoloViews focuses on bundling your data together with the appropriate metadata to support both analysis and visualization, making your raw data and its visualization equally accessible at all times. This process can be unfamiliar to those used to traditional data-processing and plotting tools, and this getting-started guide is meant to demonstrate how it all works at a high level. More detailed information about each topic is then provided in the User Guide.
With HoloViews, instead of building a plot using direct calls to a plotting library, you first describe your data with a small amount of crucial semantic information required to make it visualizable, then you specify additional metadata as needed to determine more detailed aspects of your visualization. This approach provides immediate, automatic visualization that can be effortlessly requested at any time as your data evolves, rendered automatically by one of the supported plotting libraries (such as Bokeh or Matplotlib).
HoloViews is part of a suite of visualisation tools and guides called PyViz.
- INRIA’s Tulip has fans
- Visualizing a NetworkX graph in the IPython notebook with d3.js
VisPy is OpenGL-backed data visualisation, focussing on science (ooh!). It also offers a matplotlib compatibility layer. Here are some howtos:
There seems to be a lot more writing of OpenGL shaders than one would like to draw a line graph.
Mayavi is an opinionated open-source commercially-backed interactive 3D visualiser. Its aesthetic I find grating. The source code repository is worryingly hard to find. For future reference, it’s here.
On a similar tip, although looking more basic and more bitrotten, is vtk - if I understand correctly, VTK is the engine used by Mayavi? Better maintained and possibly still vtk-based is Paraview, which supports pluggable backends.
Not exactly graphing libraries
Disney (!) has a game library, Panda3D, that seems to do all the fun things.
One could go even more bareback and call more or less directly into OpenGL, but seriously, I’m a statistician, not a coder. I could also hand-pulp hemp to make my own graph paper and draw my visualisations in home-made iron gall ink, but I would find it equally hard to argue that it was an efficient prioritisation.
I haven’t used PREdator (although I understand it’s been around longer than I have. Heh.) Wiedemann, C., Bellstedt, P., & Görlach, M. (2014). PREdator: a python based GUI for data analysis, evaluation and fitting. Source Code for Biology and Medicine, 9(1), 21.