LaTeX

…and ΤeΧ, and ConTeXt and XeTeX and TeXleMeElmo

September 8, 2014 — November 25, 2024

computers are awful
faster pussycat
LaTeX
plain text
typography
UI
workflow
Figure 1: fun.fnord.eu, Beautiful typesetting with LaTeX.

The least-worst mathematical typesetting system of the last 30 years, and still so for now. One of the better-scoured of the rusting pipes that make up academic plumbing. The de facto standard for mathematicians, especially those who are not so impertinent as to insist on writing in non-English languages, or so shallow as to gainsay the simple delights of the painstaking handicraft of manually setting line breaks, or who have grad students who will do that for free. That is to say, LaTeX is a tool that provides comfort for that endangered animal, the Tenured Academic, and tolerable usefulness for the rest of us.

Other alternatives include

  1. using MS Word, or
  2. stabbing your eyeballs with a pencil

… each of which I regard as about as undesirable as the other, and, to be clear, both somewhat less desirable than LaTeX itself.

1 History

Standard disclaimer, before diving into the TeXonomy:

I am aware that I do violence to the rich and storied ecosystem by failing to mention that almost everything here is but a macro system built upon Knuth's OG TeX. Even that is a crude simplification of the complicated truth that this original system has evolved, been reimplemented and mutated in complex and subtle ways. However, this document is not a philological or phylogenetic exploration; it is a pragmatic guide to getting documents typeset before key deadlines pass. Nonetheless, some context is occasionally helpful.

Eddie Smith, From boiling lead and black art: An essay on the history of mathematical typography; the only thing on this page I might conceivably read for pleasure.

Wasn't that nice? Now let Robert Kosara rant about where we are today:

The tools of the trade for academics and others who write research papers are among the worst software has to offer. Whether it’s writing or citation management, there are countless issues and annoyances. How is it possible that this fairly straightforward category of software is so outdated and awful?

Grad students, Robert, and the low-marginal-cost, low-quality labour they provide: the same labour undervaluation that keeps slave economies from developing the steam engine. As long as it is cheaper to solve typesetting problems with grad student labour, the system can shamble forward without anyone being incentivised to fix it for everyone else. Side effect: grad students also have underdeveloped software development skills and will never return to remedy the engineering mistakes of their younger selves, which keeps the average quality of the system's components mediocre.

There are also standards lock-in problems: even if someone develops a better system, will conferences and journals find it worthwhile to switch? Or will they let the old system shamble on, knowing it will cost their editorial boards nothing to waste everyone else's time? After all, academic publishing is built on free professorial labour. For the authors: will a newer, better system be good enough to justify the cost of learning it? Will it last long enough to repay that cost? Cf. Moloch.

2 Tutorials

LaTeX has a long and storied history, which means it has lots of nasty historical design decisions preserved in it, like smallpox victims in a glacier. And yet! It is not especially backwards compatible. Many old packages interact disastrously with new ones, and the community's conservative, cargo-cult way of doing things means that people carry on deprecated practices forever, so any given document you edit may be a ticking time bomb of confusing font failures and incomprehensible error messages.

There is a guide to avoiding old things, Das LaTeX2e-Sündenregister oder Veraltete Befehle, Pakete und andere Fehler (roughly, "The LaTeX2e register of sins, or obsolete commands, packages and other mistakes"), although those who do not speak German might prefer, e.g., the slightly outdated English L2 Taboos document. German is satisfying when it comes to being pedantic.
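For a flavour of what those guides flag, here is a minimal sketch of three classic substitutions (my own selection, not an exhaustive list): the two-letter font switches, plain TeX `$$` display maths, and `eqnarray`, each with its modern replacement. It assumes only the amsmath package.

```latex
\documentclass{article}
\usepackage{amsmath} % provides align
\begin{document}
% Old: {\bf bold} and {\it italic}, the two-letter font switches.
% New: these compose (bold italic works) and add italic correction.
\textbf{bold}, \textit{italic}, \textbf{\textit{both}}.

% Old: $$ x = y $$, plain TeX display maths, which misbehaves in LaTeX.
\[ x = y \]

% Old: eqnarray, which puts too much space around the relation.
\begin{align}
  f(x) &= x^2 + 3x + 2 \\
       &= (x + 1)(x + 2)
\end{align}
\end{document}
```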

3 Variants

Figure 2

The various engines, formats and macro systems differ, giving us ConTeXt and LaTeX and TeX and pdfTeX and XeTeX and LuaTeX, all refreshingly unique in their choice of pain points, whether in formatting, interoperation, character set handling, compatibility, preferred name capitalisation, or community support. Here is a more pious take by Graham Douglas, What's in a name: a guide to the many flavours of TeX. For now I will construe LaTeX broadly, which is to say "in principle any of these LaTeX-like engines", which is to say "in practice any LaTeX engine which is sufficiently similar to the baseline horror to survive the submission process to arxiv.org".

I am indeed cognisant of the diversity and richness of the buffet of failure cases I could choose from, if only in outline. However, standards lock-in being what it is, I will for now avoid arranging the deckchairs of incremental improvement on this sinking ship of legacy mess. If I must innovate, it will be to discreetly shuffle over to the lifeboats, where I will wait for some disruptive scholarly version of Markdown to come and rescue me from LaTeX entirely.

In the interim, some people advocate typst as a more modern alternative to LaTeX. I have not tried it.

3.1 Vanilla LaTeX

The default, in the sense that it is the one which I get if I run the command latex. I essentially never use this.

3.2 pdfTeX/pdfLaTeX

The de facto standard. It uses a slightly different toolchain from the primordial DVI/PS rendering one, writing PDF directly, but is pretty compatible, with the bonus that it can include PDF images.

3.3 XeTeX/XeLaTeX

Roughly pdfTeX with better font support and native Unicode support, so it handles multilingual text better. The price we pay is that many conferences and journals do not support it.
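A minimal sketch of what the font support buys us, assuming a TeX Live install that ships the TeX Gyre fonts (the font choice is mine, for illustration): fontspec loads an OpenType font by file name, and accented input can be typed directly. Compile with xelatex (or lualatex); under pdflatex, fontspec stops with an error.

```latex
\documentclass{article}
\usepackage{fontspec} % OpenType font selection; XeTeX/LuaTeX only
% Load TeX Gyre Pagella by file name, e.g. texgyrepagella-regular.otf
\setmainfont{texgyrepagella}[
  Extension      = .otf,
  UprightFont    = *-regular,
  BoldFont       = *-bold,
  ItalicFont     = *-italic,
  BoldItalicFont = *-bolditalic]
\begin{document}
Unicode input, no escapes needed: Größe, naïve café, Łódź.
\end{document}
```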

3.3.1 Gotcha

Sometimes I get errors when I include .pdf images in my documents, and sometimes funny font business is involved, as LaTeX fonts are a mess. Some errors can be fixed by downgrading the PDFs to an older PDF version. There is probably an underlying issue in there, but I do not have time to diagnose it.

4 Installing TeX

See LaTeX installation.

5 No TeX at all

See LaTeX-free mathematical typesetting.

6 Reverse LaTeX

Figure 3

Getting LaTeX code back from screen captures or photos of (formatted) equations. Rebekah Yates reports:

  1. You can look things up in the Comprehensive LaTeX symbols list. It can usually be easily accessed with texdoc symbols or texdoc symbols-a4 (in MiKTeX the latter only).
  2. Another good option is to try the web-based software Detexify, which allows you to draw the symbol and tries to recognise what you’ve drawn.[…]
  3. If you are using the package unicode-math, then besides using any Unicode character list, the list of all supported symbols (texdoc unimath-symbols) is very useful as it also lists which symbols are available in the various fonts.
  4. Using unicode-math, you can also search for characters by drawing (just like with detexify) using ShapeCatcher.

That sounds boring compared to ML-based automation. I used Mathpix, which also offers a mathematical notebook, snip. LaTeX OCR seems to be an open-source PyTorch implementation of the same idea.

Or! Leave the machines behind! Train yourself in speed LaTeX transcription via the gamified mathematical typesetting training system TeXnique.

7 Invocation

Figure 4

7.1 Run like a normal unix program

By default TeX runs in an "interactive mode", which makes mostly pointless efforts to solicit my advice about badly explained syntax errors, and offers me a chance to… fix them? I guess? I have never tried; why would I want to fix them halfway through formatting instead of in my text editor, where my fixes will actually persist? This probably dates to some time in the 80s when users were billed per command-line invocation or something, and is utterly contrary to modern expectations. Anyway, here is how we get normal-unix-halt-on-failure-with-helpful-message:

pdflatex -interaction=nonstopmode -halt-on-error document.tex

Or, alternatively, bloody-minded-compile-my-document-at-all-costs-I-don’t-care-how-it-is-broken:

pdflatex -interaction=batchmode document.tex
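The same interaction modes can, as far as I know, also be chosen from inside the document via TeX primitives of matching names, which helps when some build system I do not control calls plain pdflatex. A sketch; I am not aware of an in-document equivalent of -halt-on-error.

```latex
\nonstopmode % very first line: never stop to ask, like -interaction=nonstopmode
% Same idea, other modes: \batchmode (also silences the terminal),
% \scrollmode, and \errorstopmode (the interactive default).
\documentclass{article}
\begin{document}
Hello.
\end{document}
```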

7.2 latexmk

A popular tool that makes a best-effort attempt to couple together the clanking chain of components that turn those text files into documents. The manual lists its many command-line options, but I find examples more explanatory.

See latexmk options and nomenclature. See also the latexmkrc files that it comes with for examples of advanced configuration.

Oooh! I can set up latexmk as an automatically updating live preview, even with SyncTeX, as a poor man's interactive editor:

latexmk -pvc

7.2.1 Direct PDF

-pdfxe
Generate pdf version of document using xelatex. Note that to optimise processing time, latexmk uses xelatex to generate an .xdv file rather than a pdf file directly. Only after possibly multiple runs to generate a fully up-to-date .xdv file does latexmk then call xdvipdfmx to generate the final .pdf file.

(Note: The reason why latexmk arranges for xelatex to make an .xdv file instead of the xelatex’s default of a .pdf file is as follows: When the document includes large graphics files, especially .png files, the production of a .pdf file can be quite time consuming, even when the creation of the .xdv file by xelatex is fast. So the use of the intermediate .xdv file can result in substantial gains in processing time, since the .pdf file is produced once rather than on every run of xelatex.)

This is all fine, I guess, but the .xdv build path sometimes gives me errors that I do not get when I use -pdf for the final conversion instead, so I go the slow way around via -pdf. I had thought the compilation path would be the same, so I do not know why -pdf would suppress the error unless it switches to a non-XeLaTeX tool under the hood. (The latexmk manual describes -pdf as generating the PDF using pdflatex, so that may well be what happens.) tl;dr I am not sure what the combination latexmk -xelatex -pdf does exactly, but like most people in the LaTeX world, I just keep adding and subtracting command-line flags and packages until it works.

7.3 Tectonic

Tectonic addresses several of my complaints at once, at least on e-paper. I wonder if it is as seamless as I might hope in practice.

Tectonic seems to be a light wrapping/forking of mainline LaTeX that modernises the toolchain slightly: not the (La)TeX language itself, but the way it is built and executed.

Tectonic is a modernised, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.

… TeX is quite archaic in some ways, but it’s still the tool of choice for documents that require precision typography or ones that involve lots of mathematical equations, which makes it especially important in the sciences. Tectonic converts TeX files into PDF files.

Tectonic is beta software but has been demonstrated to work well in a variety of real-world situations. Contributions in any form — documentation, bug reports, test cases, new features — are most welcome. The user forum is the place to start.

Advertised features:

  • Tectonic automatically downloads support files so you don’t have to install a full LaTeX system to start using it. If you start using a new LaTeX package, Tectonic just pulls down the files it needs and continues processing. The underlying “bundle” technology allows for completely reproducible document compiles. Thanks to the Dataverse Project for hosting the large LaTeX resource files!
  • Tectonic has sophisticated logic and automatically loops TeX and BibTeX as needed, and only as much as needed. In its default mode it doesn’t write TeX’s intermediate files and always produces a fully-processed document.
  • The tectonic command-line program is quiet and never stops to ask for input.
  • Thanks to the power of XeTeX, Tectonic can use modern OpenType fonts and is fully Unicode-enabled.
  • The Tectonic engine has been extracted into a completely self-contained library so that it can be embedded in other applications.
  • Tectonic has been forked from the old-fashioned WEB2C implementation of TeX and is developed in the open on GitHub using modern tools like the Rust language.
  • Tectonic can be used from GitHub Actions to typeset your documents whenever a change to them is made.

It is hard to imagine it getting much uptake, though, since it does not indulge in enough cute 1980s typesetting stunts, like canonically styling its name TECTONIC.

brew install tectonic

The manual took some work to find. See here.

It can be used in VS Code as a LaTeX Workshop build script, or via the new disruptive TeXlab plugin, so we can be !!!double disruptor!!!

Pro-tip: SyncTeX is available using the --synctex option.

Pro-tip: There is a v1 and v2 command-line interface. I am going to ignore this distinction until I have a reason not to.

Pro-tip: tectonic demands a price for its modernity, which is that its errors are even worse than vanilla LaTeX's. WTF is error: CFF: Inconsistent DICT argument number? I don't know, but I get it a lot.

7.4 Include in python

Generating arbitrary LaTeX in python scripts, jupyter notebooks, Pweave literate documents? For that I use an ingenious python script called latex_fragment to ease my burden and render my latex fragments inline. It was written by that paragon of coding cleanliness, that tireless crusader for not-dicking-around, me.

from IPython.display import display
import latex_fragment  # the author's latex_fragment script

# Render a LaTeX fragment and display it inline in the notebook
l = latex_fragment.LatexFragment(r'$x=y$')
display(l)

You should totally check it out for rendering inline algorithms, or for emitting SVG equations.

Note also that pandoc markdown already supports raw LaTeX, passing it through when the output format is LaTeX.

Other options include inverting this setup and injecting python into LaTeX via an executable notebook such as knitr.

In jupyter, the inbuilt jupyter LaTeX renderer will display maths ok via HTML+JS, so why would we use this?

#%%
from IPython.display import Latex
Latex(r'''$x=y$''')

For one, once this thing has rendered there are no external dependencies, so a notebook which displays mathematics this way also works offline. For another, it can display things other than mathematics, for example specialised LaTeX-only things like pseudocode, weird font samples, and exotic diagram types like Feynman diagrams and parse trees.
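As an example of the kind of LaTeX-only thing meant, here is a sketch of a small parse tree drawn with the forest package (my choice for illustration; latex_fragment is not involved here). It is a standalone document, so it compiles to a tightly cropped PDF that can then be converted or embedded.

```latex
\documentclass[border=2pt]{standalone}
\usepackage{forest} % tree drawing, built on TikZ
\begin{document}
% Bracket notation: [label child child ...]
\begin{forest}
  [S
    [NP [Dan]]
    [VP [V [typesets]] [NP [equations]]]
  ]
\end{forest}
\end{document}
```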

8 Submitting to Arxiv

Here are some automations.

brew install arxiv_latex_cleaner
arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'

Here is a decently documented template: kourgeorge/arxiv-style.
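One more arXiv habit, as I understand arXiv's own submission help: a document meant for pdfLaTeX should declare `\pdfoutput=1` near the top (within the first five lines) so that arXiv's engine autodetection does not fall back to the DVI route. A minimal sketch:

```latex
\pdfoutput=1 % keep within the first five lines so arXiv picks pdfLaTeX
\documentclass{article}
\begin{document}
Hello, arXiv.
\end{document}
```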

9 Strikethrough

a.k.a. strikeout. Jan Söhlke says this works great in text mode.

\usepackage[normalem]{ulem} % preamble; normalem keeps \emph as italics instead of underlining

\sout{text to be struck through}

Strikethrough is weird in math mode; the cancel package is recommended. In mathjax, this needs an extension.
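A minimal sketch of the cancel package's main commands in math mode:

```latex
\documentclass{article}
\usepackage{cancel}
\begin{document}
\[
  \cancel{a} + \bcancel{b} + \xcancel{c} = \cancelto{0}{d}
  % \cancel: / stroke; \bcancel: \ stroke; \xcancel: X cross;
  % \cancelto{0}{d}: arrow through d pointing to the value 0
\]
\end{document}
```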

10 Putting dates in drafts

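Certain document classes (all the standard ones, at least) have a draft mode, which marks overfull lines with black bars but does not date the document: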

\documentclass[draft]{article}

A universal (not document-class-dependent) option was suggested by the Malaysian LaTeX User Group, Putting Dates in Watermarks:

\usepackage{draftwatermark}
\usepackage{datetime2}
\SetWatermarkLightness{.9}
\SetWatermarkText{Draft\string@\DTMnow}
\SetWatermarkScale{.3}

On a minimalist TeX system, this may necessitate

tlmgr install draftwatermark everypage \
  datetime2 etoolbox tracklang
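The two approaches can be combined so that the dated watermark appears only while the class option draft is set. A sketch, assuming the ifdraft package (which may also need installing on a minimal system) detects the global draft option as I believe it does. Note that the global draft option is also passed to graphicx, which then draws boxes instead of images.

```latex
\documentclass[draft]{article} % remove "draft" for the final version
\usepackage{ifdraft}   % \ifdraft{<draft branch>}{<final branch>}
\usepackage{datetime2} % \DTMnow
\ifdraft{%
  \usepackage{draftwatermark}
  \SetWatermarkLightness{.9}
  \SetWatermarkText{Draft\string@\DTMnow}
  \SetWatermarkScale{.3}
}{}
\begin{document}
Some text.
\end{document}
```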

11 No BoundingBox error

e.g. LaTeX Error: Cannot determine size of my_cool_image.pdf (no BoundingBox).

By default, PDF inclusion only works seamlessly from pdflatex.

One workaround uses the bmpsize package, which should supply PNGs etc. with the required size information.

\usepackage[dvipdfmx]{graphicx}
\usepackage{bmpsize}

For some reason this also worked for me in importing PDFs, but I have no idea why.

Or if you have lots of time you can manually annotate images with “natural sizes” but this is tedious.
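For a single image, annotating the natural size looks roughly like this. A sketch using the graphicx keys natwidth and natheight (in bp), which stand in for the missing bounding box; the 640 × 480 size is hypothetical, so substitute the real one.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Natural size given by hand (hypothetical 640 x 480), then scaled to 80% of
% the line width, so graphicx never needs to read a bounding box.
\includegraphics[natwidth=640,natheight=480,width=0.8\linewidth]{my_cool_image.png}
\end{document}
```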

We could alternatively convert PDFs to EPSs, but that is also tedious and wastes space, plus has not been reliable for me, mangling fonts and causing problems when converting back to PDF.

12 Spacing

Managing spacing between symbols is the main reason LaTeX exists. It is a hard problem: optimising the legibility of symbols on a sheet of paper for all the various sorts who might read it. The goal is to minimise the number of criticisms from grumpy, aesthetically-challenged, pedantic folk-typographers, each of whom has in mind a different, incompatible list of what constitutes an unspeakable crime against legibility. Thus there are many compromises, tricks, edge cases and other potholes to get your foot stuck in, especially in math mode.

As far as individual mathematical characters go, here is a comprehensive guide to LaTeX mathematical spacing by Werner. tl;dr if things look weird I can convert a mathematical character to an "ordinary" atom by wrapping it in braces, e.g. {=}, add my own manual spacing back in, and it will work nicely. When that is not sufficient, Overleaf's Spacing in math mode is what I most often need:
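A minimal sketch of the brace trick and its converse, `\mathrel`, which makes a symbol space like a relation:

```latex
\documentclass{article}
\begin{document}
\[ a = b \]            % = is a relation: thick space on both sides
\[ a {=} b \]          % braced, it becomes an ordinary atom: no automatic space
\[ a \, {=} \, b \]    % ...so spacing is now manual (thin spaces here)
\[ a \mathrel{\#} b \] % conversely, # now gets relation spacing
\end{document}
```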

\begin{align*}
f(x) &= x^2\! +3x\! +2 \\
f(x) &= x^2+3x+2 \\
f(x) &= x^2\, +3x\, +2 \\
f(x) &= x^2\: +3x\: +2 \\
f(x) &= x^2\; +3x\; +2 \\
f(x) &= x^2\ +3x\ +2 \\
f(x) &= x^2\quad +3x\quad +2 \\
f(x) &= x^2\qquad +3x\qquad +2
\end{align*}


K. Cooper’s even more comprehensive LaTeX Spacing Tricks is the guide for almost every type of spacing civilians will need.

To manage justification (a.k.a. text alignment) generally, one needs the \raggedright/\centering etc. commands, or even the ragged2e package. (Why is everything fully justified by default? It makes the spacing so ugly, at least to this aesthetically-challenged folk-typographer.) See the Overleaf documentation and wikibooks.

PRO-TIP: \RaggedRight and friends destroy paragraph indentation. The fix is to restore the indent:

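\usepackage{ragged2e}                   % provides \RaggedRight
\newlength{\saveparindent}              % somewhere to stash the indent
\setlength{\saveparindent}{\parindent}  % remember the class's indent
\RaggedRight                            % this zeroes \parindent...
\setlength{\parindent}{\saveparindent}  % ...so put it back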
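An alternative that I believe ragged2e supports directly: its \RaggedRight sets \parindent from a dedicated length, \RaggedRightParindent (0pt by default), so copying the class's indent into that length should have the same effect without the save/restore dance. A sketch; check the ragged2e documentation for your version.

```latex
\documentclass{article}
\usepackage{ragged2e}
% \RaggedRight uses \RaggedRightParindent as the paragraph indent;
% give it the class's normal indent instead of 0pt.
\setlength{\RaggedRightParindent}{\parindent}
\begin{document}
\RaggedRight
First paragraph, ragged right and still indented.

Second paragraph, likewise.
\end{document}
```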

Or you could do what a real typographer would do and put space between paragraphs, which might require some style updates.
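The parskip package is one low-effort way to get space between paragraphs instead of indentation. A minimal sketch:

```latex
\documentclass{article}
\usepackage{parskip} % no paragraph indent; vertical space between paragraphs
\begin{document}
First paragraph.

Second paragraph, separated by space rather than indented.
\end{document}
```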

I use RaggedRight spacing to make it easier to proofread, but revert to fully justified after the proofreading process. This has advantages:

  1. Editors will not complain about everything not being fully justified
  2. Reviewers will find it harder to read, reducing the chance they will detect any inconvenient errors I made.

Gotcha: Commands gobble following space.
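That is, TeX discards the spaces after a control word such as `\LaTeX` while reading it. A minimal sketch of the problem and two standard fixes:

```latex
\documentclass{article}
\begin{document}
\LaTeX is great.    % space eaten: typesets as "LaTeXis great."

\LaTeX{} is great.  % empty group ends the command name; the space survives

\LaTeX\ is great.   % explicit control space works too
\end{document}
```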

12.1 Floats

Via chatGPT I found the following mantra will cause floats to be aggressively crammed in to meet conference page limits:

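\renewcommand{\floatpagefraction}{.8}% a float-only page must be at least 80% floats
\renewcommand{\topfraction}{.85}%      top floats may fill up to 85% of a text page
\renewcommand{\bottomfraction}{.85}%   bottom floats likewise
\renewcommand{\textfraction}{.15}%     a text page need only be 15% text
\setcounter{totalnumber}{5}%           at most 5 floats per text page
\setcounter{topnumber}{3}%             at most 3 of them at the top
\setcounter{bottomnumber}{3}%          at most 3 at the bottom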

13 Commenting out

The obvious way to comment stuff out is with the % comment marker. For long blocks, Martin Scharrer suggests

You can use \iffalse ... \fi to make (La)TeX not compile everything between it. However, this might not work properly if you have unmatched \ifxxx ... \fi pairs inside them or do something else special with if-switches. It should be fine for normal user text.

There is also the comment package which gives you the comment environment which ignores everything in it verbatim. It allows you to define own environments and to switch them on and off. You would use it by placing \usepackage{comment} in the preamble.
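A minimal sketch of all three mechanisms. The environment name draftnote is hypothetical, defined here with the comment package's \excludecomment (swap in \includecomment to show its contents again). As far as I know, the comment package wants its \begin and \end lines on lines of their own.

```latex
\documentclass{article}
\usepackage{comment}
\excludecomment{draftnote}    % hypothetical environment whose contents are dropped
% \includecomment{draftnote}  % ...use this line instead to typeset them again
\begin{document}
Kept.
\iffalse
Skipped by TeX's conditional; fine for ordinary text.
\fi
\begin{comment}
Skipped verbatim by the comment package.
\end{comment}
\begin{draftnote}
Only shown when draftnote is switched on.
\end{draftnote}
\end{document}
```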

14 Long documents

subfiles is a handy package for multi-file LaTeX projects.

\documentclass[../main.tex]{subfiles}
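That line heads each subfile. A sketch of both sides, assuming a hypothetical layout with main.tex at the top and chapters in a chapters/ subdirectory; the two files are shown in one block.

```latex
% --- main.tex ---
\documentclass{article}
\usepackage{subfiles}
\begin{document}
\subfile{chapters/intro.tex} % pulls in the subfile's body
\end{document}

% --- chapters/intro.tex: also compiles on its own, using main.tex's preamble ---
\documentclass[../main.tex]{subfiles}
\begin{document}
Introduction text.
\end{document}
```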

15 Death-or-define macro

Death-or-define is how I think of the trick to force a macro (re)definition even if there is nothing to redefine: handy if I am rendering LaTeX from some tricky source such as jupyter, or where I don't have control over the overall document outside my section but don't care about wreaking havoc on my collaborators; some other poor sap can deal with the macro mutations. Mwahahaha.

\providecommand{\foo}{}          % defines \foo only if it does not already exist
\renewcommand{\foo}[1]{bar: #1}  % \foo now certainly exists, so this cannot fail

16 Symbols, fonts

See Latex symbols, fonts and character encodings.

17 Algorithms

See LaTeX algorithms.

18 IDs (ORCID, DOI etc)

Why is this not documented at orcid.org? I do not know. So now I document it here. AFAICT, at the basic level I should simply create a hyperlink, e.g.

\href{https://orcid.org/0000-0001-6077-2684}{Dan MacKinlay}

But what if I want the fancy logo so that everyone knows I cleverly did the ORCID thing? If I am using some hidebound conference stylesheet from the 90s this is unlikely to work. But for a more modern setup (e.g. IEEE is usually current) I might be able to get an attractive green logo.

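I made this work with the academicons package, which renders the logo using a custom font (as far as I know this means compiling with XeLaTeX or LuaLaTeX). Then, for example, ORCID is set up in the preamble:

\usepackage{hyperref}     % \href
\usepackage{xcolor}       % \definecolor with the HTML colour model
\usepackage{academicons}  % \aiOrcid
\definecolor{orcidlogocol}{HTML}{A6CE39}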

and in the body

\href{https://orcid.org/0000-0001-6077-2684}{Dan MacKinlay\hspace{2mm}\textcolor{orcidlogocol}{\aiOrcid}}

Or use the orcid.pdf (which I converted from orcid.svg, feel free to use it):

\href{https://orcid.org/0000-0001-6077-2684}{\includegraphics[scale=0.06]{orcid.pdf}\hspace{2mm}Dan MacKinlay}

19 Mathematical hacks

See LaTeX mathematics hacks.

21 Tables

booktabs tables are less ugly.
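A minimal booktabs sketch: three weighted horizontal rules and no vertical ones. The contents are the nominal widths of the math spacing commands from the Spacing section (\, \: and \; also stretch or shrink a little in practice).

```latex
\documentclass{article}
\usepackage{booktabs} % \toprule, \midrule, \bottomrule
\begin{document}
\begin{tabular}{ll}
  \toprule
  Command       & Nominal width (math mode) \\
  \midrule
  \verb|\!|     & $-3/18$ em \\
  \verb|\,|     & $3/18$ em \\
  \verb|\:|     & $4/18$ em \\
  \verb|\;|     & $5/18$ em \\
  \verb|\quad|  & 1 em \\
  \verb|\qquad| & 2 em \\
  \bottomrule
\end{tabular}
\end{document}
```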

22 Version control

Tools such as git-latexdiff provide custom diffing for, in this case, LaTeX code in git. This is invaluable during a review process because it produces a PDF in which changes are highlighted in colour.

I install it manually via the usual instructions and invoke it as git-latexdiff.

More involved install instructions to invoke it as a git subcommand (i.e. git latexdiff, losing the hyphen) are here; note that the URLs there are wrong because of the move to GitLab.

23 Diagrams

Getting a diagram or a plot into a document? See also general diagrams for tools which create generic types of diagram. Consider scientific workbooks, which often include automatic conversion of inline plots.

23.1 SVG

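Martin H says, on including SVG in TeX, that the smoothest route is to convert the SVG into PDF+TeX, as per Johan Engelen's manual. With Inkscape 0.92 that is:

inkscape -D -z --file=image.svg --export-pdf=image.pdf --export-latex

Inkscape 1.x dropped -z, --file and --export-pdf; there the equivalent should be:

inkscape -D --export-latex --export-filename=image.pdf image.svg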

Then invoke using

\begin{figure}
    \centering
    % set width of next svg image:
    \def\svgwidth{\columnwidth}
    \input{image.pdf_tex}
\end{figure}

This SVG->PDF->LaTeX workflow can be automated using the svg LaTeX package.
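A minimal sketch with the svg package, assuming Inkscape is on the PATH and the document is compiled with shell escape enabled (e.g. pdflatex -shell-escape document.tex), since the package calls Inkscape to do the conversion.

```latex
\documentclass{article}
\usepackage{svg} % \includesvg; runs Inkscape via shell escape
\begin{document}
\begin{figure}
  \centering
  \includesvg[width=\columnwidth]{image} % looks for image.svg
  \caption{An SVG figure.}
\end{figure}
\end{document}
```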

23.2 PGFPlots

PGFPlots is a native plotting package, built on PGF/TikZ, which produces TikZ-style plots directly in the PDF.
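A minimal sketch plotting the f(x) from the Spacing section. The compat level 1.18 is an assumption about a reasonably recent installation; set it to the version you have.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18} % assumed version; match your installation
\begin{document}
\begin{tikzpicture}
  \begin{axis}[xlabel={$x$}, ylabel={$f(x)$}, domain=-3:1, samples=50]
    \addplot {x^2 + 3*x + 2}; % roots at x = -2 and x = -1
  \end{axis}
\end{tikzpicture}
\end{document}
```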

24 Editors

See LaTeX editors.

25 Citations and bibliographies

From within LaTeX? See BibTeX etc.

26 Posters

Posters HOWTO.

a0poster is popular, as expounded by Morales de Luna, but I secretly feel that it sounds like a nightmare of legacy postscript nonsense and doesn’t even look good. sciposter is a popular a0poster variant.

tikzposter and beamerposter are both highlighted on sharelatex but I cannot find a way of making them seem anything but fugly to me and I cannot condone their use. It is hard enough to bring beauty into this world without making it worse.

27 Comments, TODOs

I quite like todonotes. As per Dustin Hauer’s suggestion I find it useful to define attributed TODOs in collaborative docs:

\usepackage{todonotes}
\newcommand{\dan}[1]{\todo[inline,color=green!20!white]{\textbf{Dan:} #1}}
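In use it looks something like this sketch; \listoftodos collects every outstanding note in one place, and plain \todo still gives a margin note.

```latex
\documentclass{article}
\usepackage{todonotes}
\newcommand{\dan}[1]{\todo[inline,color=green!20!white]{\textbf{Dan:} #1}}
\begin{document}
\listoftodos % list of all outstanding notes

The bound holds.
\dan{Check the constant in this bound.}

Margin notes work too.\todo{Cite something here.}
\end{document}
```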

28 Incoming

29 IEEE style specialties

IEEEtran stylesheets have some special equation formatting noûs.

% rCl: right-aligned, centred (relation-spaced), left-aligned columns
\begin{IEEEeqnarray}{rCl}
Z & = & x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \IEEEnonumber\\
  &   & +\: a + b
\end{IEEEeqnarray}

30 Conversion to other markup formats

Maybe you want to throw the original LaTeX out? In that case, see LaTeX-free mathematics.

  • pandoc converts to everything

  • ConTeXt can handle XHTML natively

  • plasTeX

    is a LaTeX document processing framework written entirely in Python. It currently comes bundled with an XHTML renderer (including multiple themes), as well as a way to simply dump the document to a generic form of XML. Other renderers can be added as well and are planned for future releases…

    plasTeX differs from other tools like LaTeX2HTML, TeX4ht, TtH, etc in that the parsing and rendering of the document are completely separated. This separation makes it possible to render the document in multiple output formats.

    It is active, and is used by high-profile online projects such as the collaborative textbooks Stacks and Kerodon, which have a system called Gerby that builds using plasTeX.

  • HEVEA — an older HTML converter with mediocre maths support (so why would you even bother?) Manual lives here.

  • LaTeX2HTML, TeX4ht, TtH,…

31 Fun

Via Louise Ord, virtuosic algorithmic line art in MetaPost:

Figure 5
