Plotting stuff in julia

May 31, 2019 — July 25, 2020

computers are awful
dataviz
julia
photon choreography
statistics

Julia has an assortment of adequate, if unexciting, plotting options. The julia wiki includes worked examples using various engines.

Figure 1

1 General tips

A feature of many julia toolkits is that they produce SVG graphics by default, which require huge amounts of storage for even medium-large data sets and much CPU/RAM for rendering. This is OK for printing, or for small data. For exploratory data analysis I disable SVG or “interactive” JS stuff in favour of some rasterised format like PNG.

1.1 SVG+latex

We know that SVG+LaTeX is handy for mathematical markup in graphs, as per Martin H’s tip. This can be scripted using julia external commands.

run(`inkscape -D -z --file=image.svg --export-pdf=image.pdf --export-latex`)

Then invoke in the latex document as

\begin{figure}
    \centering
    % set width of next svg image:
    \def\svgwidth{\columnwidth}
    \input{image.pdf_tex}
\end{figure}
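A minimal sketch of scripting that export for arbitrary files follows. The helper name `svg_to_pdf_tex_cmd` is hypothetical, and the Inkscape 1.x flags (`--export-filename` in place of `--file`/`--export-pdf`, with `-z` dropped) are my understanding of the newer command line rather than something tested here. Check `inkscape --help` on your machine before relying on it.

```julia
# Build (but do not run) the Inkscape export command for one SVG file.
# `svg_to_pdf_tex_cmd` is a hypothetical helper, not part of any package.
function svg_to_pdf_tex_cmd(svg::AbstractString; inkscape1::Bool=true)
    pdf = splitext(svg)[1] * ".pdf"
    if inkscape1
        # Assumed Inkscape >= 1.0 syntax
        return `inkscape -D --export-latex --export-filename=$pdf $svg`
    else
        # Inkscape 0.9x syntax, as in the command above
        return `inkscape -D -z --file=$svg --export-pdf=$pdf --export-latex`
    end
end

cmd = svg_to_pdf_tex_cmd("image.svg")
println(cmd)
# run(cmd)  # uncomment where inkscape is on the PATH
```

Building the `Cmd` separately from running it makes it easy to inspect the arguments, or to map the helper over a directory of SVG files.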

2 Plots.jl

The default option/family-of-options. Plots.jl wraps many other plotting packages as backends, although notably not Gadfly. The author explains this is because Gadfly does not support 3D or interactivity, although I want neither of those things in general, least of all for publication, so maybe this means I am not quite priority-aligned with this project?

I have had enough grief trying to persuade various academics that this whole “internet” thing is anything other than a PDF download engine; for this target audience I don’t need to compromise my print output for animation, of all the useless fripperies. We tell you, this “multimedia” thing is a fad, and the punters will be going back to quill and vellum soon enough. Anyway, one can absolutely shoehorn Plots.jl into print-quality CMYK plots as well as web stuff, so please disregard my grumping.

The Plots documentation is barely usable, although improving, since it is notionally simply wrapping some other plot libraries and wishes to defer to them. Except of course those other libraries all have their own terminology and APIs and what-have-you, and only some of them are aware that they have a secondary audience of confused Plots.jl users. The whole system is a confusing Kafkaesque maze of buck-passing. You can more-or-less work it out by perusing the attributes documentation, then cross-checking specific backend examples, e.g. GR. GR, though, is a wrapper library for some C code whose documentation is in turn poor, so you might end up finding the best option is guessing.

The whole interface is usable, although inflexible and inelegant in comparison with the state-of-the-art. They have adopted the MATLAB/Pyplot style of procedural plotting, which was great for the 80s but IMO has been surpassed by modern Grammar-of-Graphics approaches (as seen in ggplot, which I still think is the killer app for R). Most of us learn to do without that luxury, so this is not a showstopper. Moreover, if you wanted modern grammar-of-graphics type affordances, you could export your data to R for the final plots, or use some of the ports such as Gadfly or the rudimentary implementation in StatsMakie.

Plots has a rich extensions ecosystem. PlotRecipes and StatsPlots (formerly StatPlots) use the “Recipes” system (defined in RecipesBase) to provide macro(?)-based, data-specific plotting tools. For example, StatsPlots causes Plots to have sensible DataFrame support.

Table-like data structures, […] are supported thanks to the macro @df which allows passing columns as symbols.

using StatsPlots  # formerly StatPlots; re-exports Plots and provides @df
using DataFrames
gr(size=(400,300))
df = DataFrame(
    a=1:10,
    b=10 .* rand(10),
    c=10 .* rand(10))
@df df plot(:a, [:b :c], colour = [:red :blue])

This is not quite as useful as I want, since I am often plotting functions of the column in question.
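One workaround, sketched below, is to skip the macro and pass transformed column vectors directly. It assumes only Plots and DataFrames are installed; the column names and the particular transforms are illustrative, not from any real data set.

```julia
using DataFrames, Plots
gr(size=(400, 300))

# Illustrative data; the offset keeps log() away from zero
df = DataFrame(
    a=1:10,
    b=1 .+ 10 .* rand(10),
    c=1 .+ 10 .* rand(10))

# Each column of the matrix becomes one series,
# so functions of columns are just broadcast calls.
p = plot(
    df.a,
    [log.(df.b) sqrt.(df.c)],
    label=["log(b)" "sqrt(c)"])
```

This loses the brevity of `@df`, but any Julia expression that returns a vector can be plotted this way.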

For my future reference for offline work I mention the marker shapes, which I always forget.

markershape  [
    :none, :auto,
    :circle, :rect, :diamond,
    :cross, :xcross,
    :utriangle, :dtriangle, :rtriangle, :ltriangle,
    :hexagon, :pentagon, :heptagon, :octagon,
    :star4, :star5, :star6, :star7, :star8,
    :vline, :hline, :+, :x]

Markers have some oddities; for example, some of them (e.g. the default circles) have a black line around them, which is often not what I want. The naive might think that setting markerstrokewidth=0 and/or markerstrokealpha=0 would make the black line go away, since a zero-width-transparent line is surely invisible. Indeed, that seems to work in the PNG it renders to the screen. With PDF output for the GR.jl backend however, the black lines are still there. Apparently the magical incantation to fix this is markerstrokecolor=:auto, e.g.:

scatter(randn(10), markerstrokecolor=:auto)

A related trick: to match the error bar colour to the plot itself, I apparently must specify markerstrokecolor=:auto for that too.
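A small sketch of the error bar case, writing straight to PDF since that is where the stray black strokes showed up for me. The data and the file name `errorbars.pdf` are made up for illustration, and the exact behaviour may differ between Plots/GR versions.

```julia
using Plots
gr()

x = 1:10
y = randn(10)
err = fill(0.2, 10)  # constant, made-up error magnitude

# markerstrokecolor=:auto makes marker outlines and error bars
# follow the series colour rather than defaulting to black.
p = scatter(x, y, yerror=err, markerstrokecolor=:auto, label="data")
savefig(p, "errorbars.pdf")
```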

Colours! Colour schemes are important but not immediately obvious. Colours in typical use are a key into a colour scheme. You can access the default scheme by, e.g.

cur_colors = get_color_palette(:auto, plot_color(:white), 17)

Colour gradients are accessed by the clibraries command.
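For comparison, here is a sketch of how I understand palettes and gradients to be selected by name in more recent Plots releases, via the `palette` attribute and `cgrad`. I am assuming Plots 1.x and the ColorSchemes names `:tab10` and `:viridis`; older versions may need the commands mentioned above instead.

```julia
using Plots
gr()

# Discrete palette: successive series take successive palette entries
p1 = plot(rand(10, 3), palette=:tab10)

# Continuous gradient, e.g. for a heatmap
g = cgrad(:viridis)
p2 = heatmap(rand(5, 5), c=g)
```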

Now, some backends.

2.1 GR

My default Plots.jl backend is GR. This wraps GR.jl which in turn wraps GR, a cross-platform visualisation framework:

GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes. […]

GR is essentially based on an implementation of a Graphical Kernel System (GKS) and OpenGL. […] The GR framework is especially suitable for real-time environments.

Important tip: if you aren’t rendering graphs for publication output, but for say exploratory data analysis, switch to PNG from SVG because SVG is very large for images with lots of details.

ENV["GKS_ENCODING"] = "UTF-8"
using Plots: gr, default
gr()
default(fmt=:png)

Note that while I was there I set the character encoding because otherwise GR has weak character support. There are other workarounds such as inbuilt greek characters via textext and inbuilt LaTeX via grtex:

using LaTeXStrings, Plots
plot(
    rand(20),
    lab=L"a = \beta = c",
    title=L"before\;f(\theta)\;and\;after")

If you want to suppress latex rendering, ensure that your label string does not both start and end with $. I do this by padding with a trailing space. Corollary: strings that are only partly LaTeX are not supported; e.g. title=raw"The squared greek letter $\alpha^2$" will break by displaying dollar signs. Documentation for this does not exist; one must read the C source to work out what is happening.
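The rule as I have described it can be written down as a tiny predicate. This is only my reading of the behaviour, not GR’s actual implementation, and `looks_like_latex` is a hypothetical name.

```julia
# Hypothetical predicate mirroring the rule described above:
# a string is treated as LaTeX only if it both starts and ends with '$'.
looks_like_latex(s::AbstractString) =
    length(s) >= 2 && startswith(s, '$') && endswith(s, '$')

for s in (raw"$\alpha^2$",
          raw"$\alpha^2$ ",   # trailing space suppresses LaTeX rendering
          raw"The squared greek letter $\alpha^2$")
    println(repr(s), " => ", looks_like_latex(s))
end
```

Raw strings are used so the dollar signs are not taken as Julia string interpolation.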

Generally, formatting using mathematical markup is a mess. It works some of the time but then you reach a weird edge-case and everything breaks, and then you are on your own. If I really wanted LaTeX to work reliably I would probably use PGFPlots.

If anything is flaky on first execution, you might need to check the GR installation instructions, which include such steps (on Ubuntu) as:

apt install libxt6 libxrender1 libxext6 \
    libgl1-mesa-glx libqt5widgets5

The observant will notice that this requires root access on the machine. I’m sure there must be a workaround but I can’t be arsed discovering it right now.

Also, every time the version of GR/GR.jl increments there is a loooong recompilation process, which seems to be single-threaded and takes many minutes on my fancy 8-core machine. So be aware that it is not fast in every sense.

For all its smoothness and speed when it is up and running, GR Plots are not IMO all that beautiful and it is not clear how to make them beautiful, since beauty is hidden down at the toolkit layer.

2.2 PyPlot

PyPlot.jl uses Python’s matplotlib to do plotting in julia. Pyplot was in turn inspired by MATLAB plots. The result is slow and rickety but also battle-tested, and ugly in a well-understood, reliable way. LaTeX formatting is broken differently than in GR. For deep and confusing reasons, this is what one must do to get LaTeX rendering in PyPlot:

pyplot()
rc("text", usetex=true)
rc("text.latex", preamble=raw"\newcommand{\mathregular}[1][]{}")

Once again, if I really wanted LaTeX to work reliably I would probably use PGFPlots.

2.3 PGFPlots

PGFPlotsX is more-or-less a TeX-native option, which is a little feature-lite AFAICT but supports LaTeX natively, as opposed to via a confusing undocumented chaos. (There is a deprecated version called PGFPlots also.)

Alternatively, one might use any backend to produce an SVG generically, but then use LaTeX markup in the SVG and allow LaTeX to finish the job. I am unsure of the tradeoffs here.

2.4 Plotly

Plots.jl also targets Javascript online back ends via Plotly, which is neat although I have no use for it at the moment. As mentioned previously, in my department this “online” nonsense is about as popular as communicating data through modulated flatulence, on the basis that the two are approximately equivalent in terms of their contributions toward our performance metrics. See plotly.

2.5 InspectDR

InspectDR functions as a Plots.jl backend but I think is more designed to be run separately. It produces interaction-oriented plots with inclination toward signal processing.

3 Gadfly

The aspirational ggplot clone is Gadfly. It’s elegant, pretty, well integrated into the statistics DataFrame ecosystem, but missing some stuff, and has weird gotchas.

I had to switch Gadfly from SVG to PNG or a similar rasterised format, as presaged, to avoid bloated jupyter notebooks. First I needed to install the Fontconfig and Cairo bonus libraries:

using Pkg
Pkg.add("Gadfly")
Pkg.add("Cairo")
Pkg.add("Fontconfig")

Now to force PNG output:

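import Cairo, Fontconfig  # loading these enables Gadfly's PNG backend
using Gadfly

# `data` is assumed to be a DataFrame with columns gamma and p
draw(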
    PNG(150, 150),
    plot(data,
        x=:gamma,
        y=:p,
        Geom.point, Geom.line)
)
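A self-contained variant that writes to a file with an explicit physical size might look like the following. The file name `gadfly_example.png` and the data are made up for illustration, and it assumes Gadfly, Cairo and Fontconfig are all installed.

```julia
import Cairo, Fontconfig  # needed for Gadfly's PNG backend
using Gadfly

# Illustrative data passed as plain vectors instead of a DataFrame
xs = collect(1:10)
ys = rand(10)

p = plot(x=xs, y=ys, Geom.point, Geom.line)

# Width and height carry explicit units when writing to a file
draw(PNG("gadfly_example.png", 6inch, 4inch), p)
```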

Even though it feels most natural to me, I am not currently using Gadfly. The reason: Gadfly seemed unfortunately slow on initial startup. This is survivable, and there are workarounds. But even after startup it seemed slow on many basic tasks. When I histogrammed a million data points, each plot of that histogram took 30 seconds (i.e. slower than python’s pyplot). It saps the utility of an elegant graph grammar if it’s not responsive to your adjustments. I wonder if this can be improved?

Gadfly is based on a declarative vector graphics system called Compose.jl, which might be independently useful.

4 Makie

Makie reinvents a plotting ecosystem that is somewhat less confusing and overgrown than the Plots.jl one. For reasons of simplicity it seems it may be a better bet for new users, unless the older, cruftier Plots.jl system includes some tasty morsel you crave. It supports OpenGL-backed, GPU-accelerated plotting via GLVisualize and also traditional vector graphics.

Haven’t tried it, since learning the Plots.jl and Gadfly.jl APIs has filled up my brain, but the gallery is pretty.

Edited highlights of its justification for existing:

Makie is a high level plotting interface […] with a focus on interactivity and speed.

It can also be seen as a prototype for a new design of Plots.jl, since it will implement a very similar interface and incorporate a lot of the ideas.

A fresh start instead of the already available GLVisualize backend for Plots.jl was needed for the following reasons:

  1. Plots.jl was written to create static plots without any interaction. This is deeply reflected in the internal design and makes it hard to integrate the high performance interaction possibilities from GLVisualize.

  2. Plots.jl has many high level plotting packages as a backend [and] there is no straight interface a backend needs to implement, […] which lead to a lot of duplicated work for the lower level backends and a lot of inconsistent behavior since the code isn’t shared between backends.

  3. […]

  4. …There should be a finite set of “atomic” drawing operations (which can’t be decomposed further) which a backend needs to implement and the rest should be implemented via recipes using those atomic operations.

  5. Backend loading is done in Plots.jl via evaling the backend code. This has […] negative consequences:

    1. Backend code can’t be precompiled leading to longer load times
    2. Backend dependencies are not in the Plots.jl REQUIRE file…

That sounds promising, right? Later, perhaps.

Some more modern data handling comfort is available via StatsMakie which allows grouping and styling.

5 Vega

Vega.jl binds to a javascript ggplot-alike, Vega. There is a “higher level” binding also called VegaLite.jl.

It’s unclear to me how either of these work with print graphics, and they are both lagging behind the latest version of the underlying Vega library, so I’m wary of them.

6 Animations

See Reel.

7 Others

Winston.jl has a brutalist simple approach but seems to go.
