LaTeX symbols, fonts and character sets



LaTeX has been around for a few decades now, and a lot of stuff has happened in that time, e.g. they invented the € symbol. Here is how one keeps up with the typographical times.

First, be aware that there are many systems for encoding characters, and information about them, that can be used by TeX. There are font encodings with names like T1, OT1, T2C, R2D2 and what-have-you. There is also a distinct set of, I think, character encodings. There is also the modern solution of using Unicode systems (which in practice means UTF-8 encoding), supported by modern desktop fonts. Encodings have to map to fonts, which means that fonts must support the desired encodings. The systems that make this work are nowadays routine for European text, but less so for non-European scripts or mathematical text. In particular, when LaTeX’s recondite quirks meet popular modern cosmopolitan text handling, friction arises. The main point is: it would be nice if all this were easy and transparent, but it is not, so let us see if we can learn the minimum possible about this entire dire situation so as to survive it.

Unicode

For 5 points, can you render this in LaTeX using the OT1 encoding?

As far as my shaky understanding goes, there are two encodings you need to care about in LaTeX: the character (input) encoding, which controls how the engine interprets the bytes of the source document, and the font encoding, which determines which glyph is pulled out of the font to represent the text when you write to PDF or print the thing out.

Classic LaTeX/pdfLaTeX historically assumed ASCII, or some other basic 8-bit encoding such as Latin-1, for the character encoding, but it can be browbeaten into accepting UTF-8 for Unicode input; indeed, since the 2018 LaTeX release, UTF-8 is the default input encoding.

Modern engines such as XeLaTeX and LuaLaTeX use Unicode throughout: they read UTF-8 source natively and load OpenType/TrueType fonts directly.

For a long, useful, unexciting history of what on earth is going on here, see Frank Mittelbach et al, LaTeX font encodings.

There is a whole other complicated side story about the mathematical parts of the document, to do with the fact that the font encodings for mathematical stuff are obscure or broken or difficult or something. We set it aside for the moment.

Oh, and the other complexity is citations, because they have a partly parallel infrastructure. The 💩 hits the fän if I try to use non-ASCII ©haracters in BibTeX. I use BibLaTeX/biber instead, which works fine without any further effort because it uses more of the same infrastructure as LaTeX itself. Sometimes a journal will advise against BibLaTeX, but they don’t seem to notice if I ignore them and use it anyway. So far it saves me much time. I have not yet received the complaint “Oh no! There weren’t enough glitches in your bibliography! I was offended that umlauts rendered correctly!” (although that would not be the least productive reviewer comment I had ever received).
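For reference, a minimal BibLaTeX/biber setup might look like the sketch below. The file name `refs.bib`, the citation key `dummy` and the bibliography entry itself are placeholders I made up for illustration, not real references; the entry is written out with `filecontents*` (its `[overwrite]` option needs a LaTeX kernel from 2019 or later) so the file compiles on its own. It needs a `biber` run between LaTeX runs, which latexmk should handle automatically.

```latex
% Dummy bibliography entry with raw UTF-8 names (placeholder, not a real reference)
\begin{filecontents*}[overwrite]{refs.bib}
@misc{dummy,
  author = {Müller, Jürgen and Ødegård, Åse},
  title  = {A placeholder title with ümlauts and accents},
  year   = {2000},
}
\end{filecontents*}
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[backend=biber,style=authoryear]{biblatex}
\addbibresource{refs.bib}
\begin{document}
As argued by \textcite{dummy}, diacritics should just work.
\printbibliography
\end{document}
```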

Anyway, caveats and qualms aside, I prefer to use Unicode, since it means I can copy and paste text from the internet without worrying about diacritics making a mess of everything.

pdfLaTeX

pdfLaTeX can handle modern input encodings but not modern fonts, AFAICS. It is also the de facto standard and more-or-less works just well enough to stagger on, so in general I need to know how to make it work. This is more complicated than, e.g., just using straight-up XeTeX, which has modern font mappings and understands Unicode. However, modern solutions are not supported on arxiv.org.

My cargo-cult solution: I put this at the start of every file I touch.

% !TEX encoding = UTF-8 Unicode

then, after \documentclass:

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
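Put together, a complete pdfLaTeX document under these assumptions might look like the following sketch. The `lmodern` and `eurosym` lines are my additions, not part of the recipe above: with T1 and the default Computer Modern, LaTeX falls back to bitmap fonts unless cm-super is installed, and `lmodern` avoids that; `eurosym` supplies a `\euro` command.

```latex
% !TEX encoding = UTF-8 Unicode
\documentclass{article}
\usepackage[utf8]{inputenc} % the default since LaTeX 2018-04-01; harmless to repeat
\usepackage[T1]{fontenc}    % 8-bit font encoding: real accented glyphs, better hyphenation
\usepackage{lmodern}        % scalable Type 1 fonts with T1 coverage
\usepackage{eurosym}        % provides \euro
\begin{document}
Ça marche: naïve café, Übergrößenträger, 5\,\euro.
\end{document}
```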

To really make sure it works with my IDE, I probably want to add an appropriate magic comment:

% !TEX program = latexmk
% !TEX options = -synctex=1 -file-line-error -halt-on-error -pdf -outdir="%OUTDIR%" "%DOC%"

XeTeX

XeTeX is my exemplar of “modern font LaTeX”. I have the vague idea that it is interchangeable with ConTeXt and LuaTeX et al.

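\usepackage{amsmath}  % must be loaded before mathspec
\usepackage{mathspec} % loads fontspec itself; do not load both
\usepackage{xunicode} % legacy; probably unnecessary with recent fontspec
% Fix incommensurability of font sizes, which by default is awful.
% This line must come after mathspec or fontspec, and before any \setmainfont.
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\newcommand{\euro}{€}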

For the base document text, any modern font/font encoding just works AFAICT. When doing fancy Font Stuff, it can get tricky. Because some legacy packages are unmaintained hacks, weird fragile problems can arise where some package tries to mess with the fonts in a manner inappropriate by modern standards. XeTeX is incompatible with a couple of primordial packages which do fake Unicode hacks (like the ucs package), but I don’t think those are actually needed anyway.

The magic comment for this setup goes:

% !TEX program = latexmk
% !TEX options = -synctex=1 -file-line-error -halt-on-error -xelatex  -outdir="%OUTDIR%" "%DOC%"

Now, one may luxuriously change fonts using fontspec/mathspec:

\setmainfont{Times New Roman}

As a bonus complication, amsmath must be loaded before mathspec, because mathspec is flaky. More on maths fonts in a moment; this gets nasty.

Now we can do show-off tricks such as turning italic green.
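For example, a sketch of green italics via fontspec’s `Color` feature applied only to the italic shape. It assumes, as above, that Times New Roman is installed as a system font; the hex value `008000` is my arbitrary choice of green.

```latex
% !TEX program = xelatex
\documentclass{article}
\usepackage{fontspec}
% Color takes a hex RRGGBB value; ItalicFeatures applies it only to the italic shape.
\setmainfont{Times New Roman}[ItalicFeatures={Color=008000}]
\begin{document}
Upright text, then \emph{italic text in green}, then upright again.
\end{document}
```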

But which fonts do I have? I find out manually:

fc-list : family

or, more precisely:

fc-list :outline -f "%{family}\n"

PRO TIP: load only one of mathspec and fontspec, because mathspec loads fontspec itself and loading both causes errors.

A tonne of fonts may already be installed, but here is something useful to start with:

tlmgr install \
    baskervillef \
    mathdesign

An example of all this in action is tvwerkhoven’s XeTeX cheat sheet.

unicode-math

OK, the complicated mathematical story begins here.

If I decided to go hardcore unicode in XeTeX et al, I could use unicode even for mathematics, via unicode-math.

With this package, changing maths fonts is as easy as changing text fonts — and there are more and more maths fonts appearing now. Maths input can also be simplified with Unicode since literal glyphs may be entered instead of control sequences in your document source.

Also, if I were to copy-paste equations from a PDF generated by such means back into LaTeX, they would be somewhat less mangled. The price is that it has some quirks, e.g. missing some curly letters. On the other hand, it has symbols that are in Unicode but not in elderly TeX math fonts, such as ⫫, which I need essentially every day and which is a recurring pain point for me. Also, I typically do not have a choice of maths fonts, because they are stipulated in the style guide for the journal/conference/thesis I am writing. Therefore, while this is nifty, the benefits do not outweigh the burdens. The logic of collective action dictates I ignore it for now.

Anyway, it goes like this:

\usepackage{unicode-math}
\setmathfont{texgyrepagella-math.otf}
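As a fuller sketch, here is a complete document pairing Pagella text with Pagella maths, both loaded by file name from the TeX tree (the text-font file names follow the usual TeX Gyre naming; treat them as an assumption if your installation differs). It also shows the literal-glyph input mentioned above: under unicode-math, `∑`, `α`, `≤` and `∞` in the source are read as the corresponding maths symbols.

```latex
% !TEX program = xelatex
\documentclass{article}
\usepackage{unicode-math} % loads fontspec itself
\setmainfont{texgyrepagella}[
  Extension      = .otf,
  UprightFont    = *-regular,
  BoldFont       = *-bold,
  ItalicFont     = *-italic,
  BoldItalicFont = *-bolditalic
]
\setmathfont{texgyrepagella-math.otf}
\begin{document}
Literal Unicode input in maths:
\[ ∑_{i=1}^{n} α_i x_i ≤ ∞ \]
which should render the same as \( \sum_{i=1}^{n} \alpha_i x_i \leq \infty \).
\end{document}
```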

Text fonts

XITS is a scientific Times-like font that seems to soothe, for example, academics, who have a great fetish for cargo-cult graphic design. It is not obvious how to set these fonts, and weird things happen with maths sometimes, but it mostly seems to work in the end.

Mathematical fonts

I am not the person to ask about the intimate details of LaTeX font hell, but see the TUG font catalogue for traditional LaTeX math font support and mathspec for XeLaTeX font support. To understand the intricacies of how maths typesetting works, check out the Free Math Font Survey. This is outdated now but is a good conservative starting point AFAICT.

tl;dr There is some complicated interaction between

  1. main font and
  2. math font (and possibly a special greek font) and
  3. typesetting engine, and
  4. which format my fonts happen to be in, which BTW changes how one must install them, which is in turn dependent upon
  5. the OS I am using and
  6. which features I want and by the way
  7. what my text encoding is.

If I get any one of those wrong there are weird, unhelpful errors. This is stunningly boring.

Classic mathematical fonts

Or nearly-classic, using Type 1 PostScript fonts and whatever dark magic LaTeX uses to make those go. Here is a large selection of fonts; surely one of them works?

tlmgr install \
    ebgaramond \
    ebgaramond-maths \
    newtx \
    gfsneohellenicmath \
    mathdesign \
    stix2-otf \
    baskervillef \
    fira \
    firamath-otf \
    eulervm \
    mathpazo \
    sfmath \
    psnfss \
    rsfs

The above complicated interactions mean that configuring my document to use these is simultaneously terrifying and boring. Fortunately, if I cargo-cult some example configurations, things seem to work out OK. Here are some incantations that work for me (one per document):

% Garamond
\usepackage[garamond]{mathdesign}
% Utopia text with Fourier-GUTenberg maths
\usepackage{fourier}
% Utopia
\usepackage[utopia]{mathdesign}
% Times
\usepackage{mathptmx}
% Times
\usepackage{mbtimes}
% Palatino
\usepackage{mathpazo}
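newtx appears in the install list above but not among the incantations, so here is a sketch of its usual two-package pattern in a complete document. As with the others, it should replace, not accompany, the alternatives above.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{newtxtext} % Times-like text font
\usepackage{newtxmath} % matching maths; load after amsmath and the text font
\begin{document}
Some text with maths: \( \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \).
\end{document}
```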

unicode-math fonts

The experimentally modern world of unicode-math (i.e. typesetting using modern Unicode encodings and OpenType fonts) has a unicode math font list. On the plus side, there are fewer moving parts for Unicode maths, so fewer ways to break it. On the minus side, those moving parts are still somewhat new and experimental, and thus their failure modes are less well documented. If one is running a minimal TeX installation, one sets these parts up thusly:

tlmgr install \
    unicode-math \
    fontspec \
    lualatex-math \
    l3kernel \
    l3packages \
    l3experimental

Unicode maths is currently supported by the following freely available open source fonts:

  • Latin Modern Math (Bogusław Jackowski, Janusz M. Nowacki; a minimal one, which attempts to look like the 80s font Computer Modern): tlmgr install lm-math
  • TeX Gyre family (looks kinda like the fonts that every word processor uses): tlmgr install tex-gyre tex-gyre-math
  • Asana Math (Apostolos Syropoulos): tlmgr install asana-math
  • STIX2 (STI Pub): tlmgr install stix2-otf
  • XITS Math (Khaled Hosny): tlmgr install xits
  • Libertinus Math (Philipp H. Poll and Khaled Hosny): tlmgr install libertinus
  • Fira Math (Xiangdong Zeng): tlmgr install firamath
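As a sketch of how little changes between these, here is a LuaLaTeX document using Libertinus, with the alternatives as commented-out one-line swaps. I use LuaLaTeX here because it finds fonts in the TeX tree by family name; the family names are the ones I believe these fonts register, so check them with fc-list or luaotfload-tool if one fails to load.

```latex
% !TEX program = lualatex
\documentclass{article}
\usepackage{unicode-math}
\setmainfont{Libertinus Serif}
\setmathfont{Libertinus Math}
% Swapping the maths font is one line, e.g.
% \setmathfont{STIX Two Math}
% \setmathfont{Fira Math}  % sans serif; pair it with a sans text font
\begin{document}
\[ \mathcal{L}(θ) = ∏_{i=1}^{n} p(x_i \mid θ) \]
\end{document}
```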

The following fonts are proprietary with OpenType maths support:

Special symbols

Particular characters/dingbats/emoji/etc?

90% of questions on this theme can be answered by the LaTeX Math Symbols Cheat Sheet, or the full-length Not So Short Introduction to LaTeX. One of the great things about Unicode maths is that some of the special-symbol problems go away: if you can find a special symbol in the Unicode symbol list, or whatever they call it, you can use it in your document. The process in classic LaTeX is much more onerous; it would probably be better to import a single glyph via pifont or yagusylo.
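For the pifont route, glyphs are addressed by their slot number in the Zapf Dingbats font; a minimal sketch with three common ones:

```latex
\documentclass{article}
\usepackage{pifont} % Zapf Dingbats access via \ding{<slot>}
\begin{document}
Done \ding{51},   % check mark
not done \ding{55}, % ballot X
see here \ding{43}. % pointing hand
\end{document}
```

The slot numbers are listed in the pifont/PSNFSS documentation, which is where I would look before guessing.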

Emoji

There are two dominant ways to insert emoji into LaTeX.

One is the dirty-yet-shiny hack of including color emoji as images. This needs a Mac lying around to raid for the images, as a one-off.

\documentclass{article}
\usepackage{coloremoji}
\begin{document}
Hello, 🌎.
\end{document}

The other, elegant but less colourful: XeTeX can render monochrome emoji natively via the DejaVu fonts.

\documentclass{article}
\usepackage{fontspec}
\usepackage{xcolor} % needed for \color

% these lines must come after fontspec
\newfontfamily\DejaSans{DejaVu Sans}
\newcommand\todo{{\color{red}\DejaSans 🚧}}

\begin{document}
  \todo mention {\DejaSans 😁😂😃😇😉😈😋😍😱}
\end{document}

Google’s Noto font family famously supports emoji. I wonder if that would be a good alternative? One could use the system-installed Noto font via XeTeX in the usual manner. There is also a noto package for TeX Live, but I think that is not Unicode? 🤷‍♂ One could probably also try the JoyPixels emoji, although their license prohibits free commercial use, so be careful where they are deployed.
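If colour matters, one option I would try is LuaLaTeX with its HarfBuzz renderer, which, as far as I know, can display colour emoji fonts such as Noto Color Emoji, whereas XeTeX cannot. This is a sketch under assumptions: a reasonably recent TeX Live (2020 or later), Noto Color Emoji installed where luaotfload can find it, and `\emojitext` is a hypothetical helper name of my own.

```latex
% !TEX program = lualatex
\documentclass{article}
\usepackage{fontspec}
% Renderer=HarfBuzz is needed for the colour glyph tables in emoji fonts.
\newfontfamily\emojifont{Noto Color Emoji}[Renderer=HarfBuzz]
\newcommand{\emojitext}[1]{{\emojifont #1}} % hypothetical helper
\begin{document}
Hello, \emojitext{🌎}.
\end{document}
```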

Intercal

What is the \intercal symbol? No one knows, but possibly it is a reference to the Roman god Terminus which got into character sets in 1986 and has remained by inertia.

Stochastic independence symbol

A case study in doing typography right. The probabilistic independence symbol ⫫, unicode U+2AEB, (“double up tack”) does not ship in normal LaTeX maths systems for some reason. So how do you fake it? In one of many slightly unsatisfactory ways!

Jason Blevins suggests the following hack:

\newcommand{\indep}{\perp \! \! \! \perp}

It has some shortcomings, such as not setting the symbol up as a proper operator, which probably means something bad in the complicated world of LaTeX spacing. Perhaps this would be better:

\newcommand{\indep}{\mathop{\perp \! \! \! \perp}}

Alternatively, the following does more specific space management.

\newcommand\indep{\protect\mathpalette{\protect\independenT}{\perp}}
\def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}

Fibo Kowalsky adds the alternative:

\newcommand{\indep}{\raisebox{0.05em}{\rotatebox[origin=c]{90}{$\models$}}} % needs \usepackage{graphicx}

Ashwin Khadke notes that for classic LaTeX one can import the symbol from one of the massive math symbol fonts, e.g. mdsymbol:

\usepackage{mdsymbol}
\newcommand{\indep}{\upvDash}

Most of these would probably benefit from declaring the created symbol to be a relation via \mathrel (the \independenT version above already does).
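A sketch of what that looks like for the double-\perp hack: wrapping it in \mathrel gives it the same spacing as = or \mid. This is my variant, not Blevins’ original.

```latex
\documentclass{article}
\usepackage{amsmath}
% Two \perp glyphs pulled together by three negative thin spaces,
% then declared as a single relation so it gets relation spacing.
\newcommand{\indep}{\mathrel{\perp\!\!\!\perp}}
\begin{document}
\[ X \indep Y \mid Z \]
\end{document}
```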

Note that mdsymbol is incompatible with amssymb and amsfonts although notionally it renders them unneeded. Also it is a sans serif math font, so may not fit with your aesthetic. And it redefines various useful characters and is generally a mess.

The generic glyph import also presumably works.

If I am using unicode-math, it is very simple and should work via:

\usepackage{unicode-math}
\newcommand{\indep}{⫫}
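In context, a sketch of a complete document. It assumes STIX Two Math, which I believe covers U+2AEB, and uses LuaLaTeX so the font can be found by name; unicode-math should already treat the literal ⫫ as a relation.

```latex
% !TEX program = lualatex
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{STIX Two Math}
\newcommand{\indep}{⫫}
\begin{document}
\[ X \indep Y \mid Z \]
\end{document}
```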

Any of the above should result in the independence symbol being available for use as

$X \indep Y $

AFAIK only the Jason Blevins double \perp trick works for JS mathematics (MathJax and friends), although in that case I believe you can just type ⫫, since JS mathematics is happy with Unicode.

Conditional |

The stochastic conditional symbol is also fiddly to type. Jason Blevins observes that spacing is differently allocated in normal and big sizes.

Normal size:

\Pr( A \mid B )

will get us

\[\Pr(A \mid B)\]

Big size needs a \; spacer set around a \middle\vert, e.g.:

\Pr\left( A \;\middle\vert\; \sum_{i=1}^N B_i \right)

for

\[\Pr\left( A \;\middle\vert\; \sum_{i=1}^N B_i \right)\]
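To avoid choosing between the two forms by hand, one option is a paired-delimiter macro from mathtools, where \delimsize becomes \middle in the auto-sizing starred form. The name `\condp` is my own hypothetical choice.

```latex
\documentclass{article}
\usepackage{mathtools}
% \condp{A}{B}: parentheses with a conditioning bar that scales with them.
% Unstarred: normal size; starred: \left...\middle...\right; [\big] etc. also work.
\DeclarePairedDelimiterX{\condp}[2]{(}{)}{#1 \;\delimsize\vert\; #2}
\begin{document}
\[ \Pr\condp{A}{B} \qquad \Pr\condp*{A}{\sum_{i=1}^N B_i} \]
\end{document}
```

The unstarred form spaces the bar with \; on each side, which should come out close to the thick space that \mid gets as a relation.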

Every moment spent thinking about this nonsense is a moment spent not saving the world or eating profiteroles or having sex or such.

Blackboard bold

a.k.a. doublestroke. Notoriously annoying for numerals. Historically, for legacy fonts one used

\documentclass{article}
\usepackage{bbm}
\begin{document}
\[
  \mathbbm{1}
\]
\end{document}

Davislor’s recent advice suggests that the problem does not arise in Unicode mathematics, but that a modern solution for legacy pdfTeX is

\documentclass{article}
\usepackage{amsmath}
\usepackage[bb=dsserif]{mathalpha}
\begin{document}
\[ \mathbb{1} \Bbbbone
\]
\end{document}
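For the Unicode-maths side of that advice, a sketch: unicode-math’s \symbb maps digits to the double-struck alphabet, so \symbb{1} and a literal 𝟙 (U+1D7D9) should give the same glyph. STIX Two Math is an assumed font choice, picked for its broad coverage.

```latex
% !TEX program = lualatex
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{STIX Two Math}
\begin{document}
% Double-struck digit one, written two ways
\[ \symbb{1}_{A}(x) = 𝟙_{A}(x) \]
\end{document}
```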

In MathJax you need to cross your fingers and hope because it depends upon something complicated about users’ browsers.

Tilde ~ and backslash

Two answers:

Classic:

\textasciitilde{}
\textbackslash{}

XeTeX/LuaTeX:

\char`~
\char`\\

Bold in mathematics

tl;dr: In mainline LaTeX, \usepackage{bm} then \bm everywhere. In MathJax, \boldsymbol everywhere.

I want to bold certain symbols in equations. Sometimes these symbols are Greek letters, and sometimes Roman letters. Possibly other things, I do not know. In unreconstructed LaTeX this is hard for pointless historical reasons about the sensibilities of character sets in the 1980s, which I do not know because I cannot write documents in the 80s. I can only write documents in the 2020s or later. Even if I acquire the ability to write documents in the 80s I will not benefit from this historical allowance, because the documents I write in the 80s will not be full of mathematics; they will be full of phrases like “Sell Atari stock! Buy Microsoft stock!”

Back to the more immediate problem. Roman letters are bolded by \mathbf{x} and Greek letters by \boldsymbol{\xi}.

If I write a macro which attempts to bold whatever, this leads to sadness: I need to know ahead of time what it is that I need to bold. How can I bold any old thing? The canonical answer from the AMS guide seems to be \boldsymbol, or \bm from the bm package. This is fine, except that the bm package does not exist for MathJax, which is not fatal as such: \boldsymbol in MathJax is more powerful than in vanilla LaTeX and will in fact bold anything. So there is a solution for each, just no seamless translation between them. I could try to work around this by writing macros to redefine \boldsymbol, but this feels like it is asking for trouble, and macro support in MathJax is not universal (e.g. it does not work if you are outputting to a PowerPoint presentation).
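On the LaTeX side, a sketch of the single-command approach: one hypothetical macro, here called `\vect`, that wraps \bm and so bolds Roman letters, Greek letters and symbols alike.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{bm} % load after any maths font packages
% Hypothetical wrapper so the bolding mechanism can be swapped in one place.
\newcommand{\vect}[1]{\bm{#1}}
\begin{document}
\[ \vect{x}^\top \vect{\xi} + \vect{\nabla} f = \vect{0} \]
\end{document}
```

The point of the wrapper is that, wherever MathJax macros are available, the same source could define `\vect` in terms of \boldsymbol instead, subject to the macro-support caveat above.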

Changing case

tl;dr

These are different in traditional LaTeX but identical in Unicode XeLaTeX:

\uppercase{ab\"{u}ë} % AB\"{u}Ë
\MakeUppercase{ab\"{u}ë} % AB\"{U}Ë
