# LaΤeΧ symbols, fonts and character sets

LaTeX has been around for a few decades now, and a lot of weird stuff has happened in that time, e.g. they invented the € symbol. Here is how one keeps up with the typographical times.

First, be aware that there are many systems for encoding characters and information about them that can be used by TeX. I am no expert but there are font encodings with names like T1, OT1, T2C and what-have-you. There is also a distinct set of, I think, character encodings. There is also the modern solution of using unicode systems (which in practice means utf8 encoding), supported by modern desktop fonts. Encodings have to map to fonts, which means that fonts must support the desired encodings. The systems that make this work are nowadays routine for European text but not so much for non-European scripts or mathematical text. In particular, when combining LaTeX’s ancient recondite quirks with popular modern cosmopolitan text handling, friction arises.

## Unicode

For 5 points, can you render this in LaTeX using the OT1 encoding?

As far as my shaky understanding goes, there are two encodings you need to care about in LaTeX: the character encoding, which controls how LaTeX interprets the bytes of the document source, and the font encoding, which determines which glyph is pulled out of the font to represent the text when you write to PDF or print the thing out.

Classic LaTeX/pdfLaTeX uses ASCII or some other such basic European 8-bit encoding for the character encoding (but these days can be browbeaten into accepting UTF-8 for unicode character encoding).

Modern LaTeX variants such as XeLaTeX/LuaTeX etc use unicode throughout (?). I think.

For a long, useful, unexciting history of what on earth is going on here see Frank Mittelbach et al., LaTeX font encodings.

There is a whole other complicated side story for the mathematical parts of the document which we set aside for the moment to do with the fact that the font encodings for mathematical stuff are obscure or broken or difficult or something.

Oh, and the other complexity is citations, because they have a partly parallel infrastructure. However, the 💩 hits the fän if I try to use non-ascii ©haracters in BibTeX. I use BibLaTeX/biber instead which works fine without any further effort because it uses more of the same infrastructure as LaTeX itself. Sometimes a journal will advise against BibLaTeX, but they don’t seem to notice if I ignore them and use it anyway. So far it saves me much time. I have not yet received the complaint “Oh no! There weren’t enough glitches in your bibliography! I was offended that umlauts rendered correctly!” (although that would not be the least productive reviewer comment I had ever received).

Anyway, caveats and qualms aside, I prefer to use unicode since it means I can copy and paste text from the internet without worrying about diacritics making a mess of everything.

### pdfLaTeX

pdfLaTeX can import modern input encoding but not modern fonts, AFAICS. OK, I can work with this. I put this at the start of every file I touch:

% !TEX encoding = UTF-8 Unicode

then after the documentclass

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

To really ensure it works I probably want to put in there an appropriate magic comment:

% !TEX program = latexmk
% !TEX options = -synctex=1 -file-line-error -halt-on-error -pdf -outdir="%OUTDIR%" "%DOC%"

This is more complicated than just using straight-up XeLaTeX, which has modern font mappings and understands unicode. However, XeLaTeX is not supported on arxiv.org, so you need to fall back to this classic mode for now.
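Putting those pieces together, a minimal sketch of a complete pdfLaTeX file might look like the following. The sample sentence is purely illustrative, and loading textcomp is an assumption on my part, there so that € has a glyph on older installations (recent LaTeX kernels provide it without help).

```latex
% !TEX encoding = UTF-8 Unicode
\documentclass{article}
\usepackage[utf8]{inputenc} % tell pdfLaTeX that the source bytes are UTF-8
\usepackage[T1]{fontenc}    % 8-bit font encoding with real accented glyphs
\usepackage{textcomp}       % assumption: supplies the euro glyph on older kernels
\begin{document}
% Illustrative text only: accented Latin letters typed directly in the source.
Schrödinger, Erdős, a naïve café, and a price of 5\,€.
\end{document}
```

Characters outside what the loaded font encodings cover (emoji, most non-European scripts) will still raise an error in this mode, which is the friction described above.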

### XeTeX

\usepackage{mathspec}
\usepackage{xunicode}
% Fix incommensurability of font sizes which per default is awful
% this line must come after mathspec or fontspec
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\newcommand{\euro}{€}

Everything just works AFAICT for the base document. Because some packages are useless unmaintained hacks there are some weird fragile problems that can arise where some package will try to mess with the fonts in an inappropriate manner. It’s incompatible with a couple of primordial packages which do various fake Unicode hacks (like the ucs package) but I don’t think those are actually needed anyway.

To really ensure it works I probably want to insert an appropriate magic comment:

% !TEX program = latexmk
% !TEX options = -synctex=1 -file-line-error -halt-on-error -xelatex  -outdir="%OUTDIR%" "%DOC%"

You can change fonts using fontspec/mathspec:

\setmainfont{Times New Roman}

As a bonus complication amsmath must be loaded before anything which uses mathspec because mathspec is flakey.

You can do cleverer tricks such as turning italic green.
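For instance, fontspec lets you attach features to a single font shape. This is a minimal sketch, assuming XeLaTeX or LuaLaTeX, a reasonably recent fontspec (one that accepts the options after the font name), and that Times New Roman is installed. The hex colour is an arbitrary choice of mine.

```latex
\documentclass{article}
\usepackage{fontspec}
% ItalicFeatures applies the listed features to the italic shape only.
% Color takes an RRGGBB hex value; 228B22 is an arbitrary green.
\setmainfont{Times New Roman}[ItalicFeatures={Color=228B22}]
\begin{document}
Upright text stays black, \emph{but italic text turns green}.
\end{document}
```

Swap in any installed font family if Times New Roman is not available on your system.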

But which fonts do you have? Find out manually.

fc-list : family

or, to be even more precise,

fc-list :outline -f "%{family}\n"

PRO TIP: only load one of mathspec and fontspec because mathspec loads fontspec which causes errors.

tlmgr install mathdesign

An example of all this in action is tvwerkhoven’s XeTeX cheat sheet.

### unicode-math

OK, the complicated mathematical story begins here.

If I decided to go hardcore unicode in XeTeX et al, I could use unicode even for mathematics, via unicode-math.

With this package, changing maths fonts is as easy as changing text fonts — and there are more and more maths fonts appearing now. Maths input can also be simplified with Unicode since literal glyphs may be entered instead of control sequences in your document source.

Also if I were to copy-paste equations from a PDF generated by such means to LaTeX, they would be somewhat less mangled. The price is that it has some quirks, e.g. missing some curly letters. On the other hand it has symbols that are in unicode but not in elderly TeX math fonts, such as ⫫, which I need essentially every day and is a recurring pain point for me. Also, I do not typically have a choice of maths fonts because versions are stipulated in the style guide for the journal/conference/thesis I am writing. Therefore, while this is nifty, the benefits are insufficiently many compared to the burdens. The logic of collective action dictates I ignore it for now.

Anyway, it goes like this

\usepackage{unicode-math}
\setmathfont{texgyrepagella-math.otf}
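To make the "literal glyphs instead of control sequences" point concrete, here is a small sketch of a whole document, assuming XeLaTeX or LuaLaTeX and that the TeX Gyre Pagella Math font file is findable by the engine. The formulas are illustrative only.

```latex
\documentclass{article}
\usepackage{amsmath}      % loaded before unicode-math
\usepackage{unicode-math} % loads fontspec as well
\setmathfont{texgyrepagella-math.otf}
\begin{document}
% Classic control sequences still work:
Control sequences: $\alpha + \beta \leq \gamma$.

% The same formula typed with literal unicode characters:
Literal glyphs: $α + β ≤ γ$.
\end{document}
```

Both lines should typeset the same formula. Whether a given symbol actually appears depends on the maths font containing that glyph.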

## Modern fonts

XITS is a scientific Times-like font that seems to soothe, for example, academics, who have a great fetish for cargo cult graphic design. It is not obvious how to set these fonts and weird things happen with maths sometimes but it mostly seems to work in the end.
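One way that might work is sketched below, assuming XeLaTeX or LuaLaTeX with unicode-math, and assuming the fonts are installed such that the engine can find them by the family names XITS and XITS Math. On some installations you may need to give the font file names instead.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{unicode-math}
% Assumption: these family names resolve on your system.
\setmainfont{XITS}      % text font
\setmathfont{XITS Math} % matching maths font
\begin{document}
Times-like text with matching maths: $\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$.
\end{document}
```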

## Special symbols

Particular characters/dingbats/emoji/etc?

90% of questions on this theme can be answered by the LaTeX Math Symbols Cheat sheet, or the full-length Not So Short Introduction to LaTeX.

### Emoji

There are two dominant ways to insert emoji into LaTeX.

The dirty-yet-shiny hack is to include color emoji as images. (This needs a Mac computer lying around to raid for the images, as a one-off.)

\documentclass{article}
\usepackage{coloremoji}
\begin{document}
Hello, 🌎.
\end{document}

Elegant but less colourful, XeTeX has native monochrome emoji via DejaVu fonts.

\documentclass{article}
\usepackage{fontspec}
\usepackage{xcolor} % provides \color, which \todo below relies on

% these lines must come after fontspec
\newfontfamily\DejaSans{DejaVu Sans}
\newcommand\todo{{\color{red}\DejaSans 🚧}}

\begin{document}
\todo mention {\DejaSans 😁😂😃😇😉😈😋😍😱}
\end{document}

Google’s Noto font famously supports emoji. I wonder if that would be a good alternative? One could use the system-installed Noto font via XeTeX in the usual manner. There is also a noto package for TeX Live, but I think that is not unicode? 🤷‍♂ One could probably also try the joypixels emoji, although their license prohibits free commercial use, so be careful with where they are deployed.

### Stochastic independence symbol ⫫

A case study in doing typography right. The probabilistic independence symbol ⫫, unicode U+2AEB, (“double up tack”) does not ship in normal LaTeX maths systems for some reason. So how do you fake it? In one of many slightly unsatisfactory ways!

Jason Blevins suggests the following hack:

\newcommand{\indep}{\perp \! \! \! \perp}

It has some shortcomings, such as not setting the symbol up as a proper relation, which probably means something bad in the complicated world of LaTeX spacing. Perhaps this would be better:

\newcommand{\indep}{\mathrel{\perp \! \! \! \perp}}

Alternatively, the following does more specific space management.

\newcommand\indep{\protect\mathpalette{\protect\independenT}{\perp}}
\def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}

Another option rotates \models, which needs the graphicx package for \rotatebox:

\newcommand{\indep}{\raisebox{0.05em}{\rotatebox[origin=c]{90}{$\models$}}}

Ashwin Khadke notes that for classic LaTeX one can import the symbol in one of the massive math symbol fonts, e.g. mdsymbol:

\usepackage{mdsymbol}
\newcommand{\indep}{\upvDash}

Any of these that do not already do so would probably benefit from declaring the created symbol to be a mathematical relation via \mathrel.

Note that mdsymbol is incompatible with amssymb and amsfonts although notionally it renders them unneeded. Also it is a sans serif math font, so may not fit with your aesthetic. And it redefines various useful characters and is generally a mess. It would probably be better to import a single character via pifont or yagusylo although that is onerous in its own right.

If I am using unicode-math, it should work via:

\usepackage{unicode-math}
\newcommand{\indep}{⫫}

Any of the above should result in the independence symbol being available for use as

$X \indep Y$
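As a sanity check, here is a minimal sketch of a complete document using the \mathpalette variant from above. The test formulas are mine and purely illustrative. It needs no extra packages and should compile under pdfLaTeX or XeLaTeX.

```latex
\documentclass{article}
% The \mathpalette variant: overprints two \perp symbols, offset by 2mu,
% and declares the result a relation so it gets relation spacing.
\newcommand\indep{\protect\mathpalette{\protect\independenT}{\perp}}
\def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}
\begin{document}
In running text: $X \indep Y \mid Z$.

% \mathpalette passes the current math style along, so the symbol
% shrinks properly in subscripts.
In a subscript: $\max_{X \indep Y} f(X, Y)$.
\end{document}
```

The subscript case is the one where the simpler double \perp hacks are most likely to show their spacing problems.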

AFAIK only the Jason Blevins double \perp trick works for js mathematics, although in that case I believe you can just type ⫫ since js mathematics is happy with unicode.

### Conditional |

The stochastic conditional symbol is also fiddly to type. Jason Blevins observes that spacing is handled differently in normal and big sizes.

Normal size:

\Pr( A \mid B )

will get us

$\Pr(A \mid B)$

Big size needs a \; spacer set around a \middle\vert, e.g.:

\Pr\left( A \;\middle\vert\; \sum_{i=1}^N B_i \right)

for

$\Pr\left( A \;\middle\vert\; \sum_{i=1}^N B_i \right)$
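To avoid retyping the spacers, the two cases can be wrapped in macros. This is a sketch only: the names \cprob and \Cprob are hypothetical helpers of mine, not from Jason Blevins or any package.

```latex
\documentclass{article}
\usepackage{amsmath}
% Hypothetical helper macros (names are my own invention).
% Normal size: \mid already carries relation spacing.
\newcommand{\cprob}[2]{\Pr(#1 \mid #2)}
% Big size: \middle\vert stretches with the brackets but has no
% spacing of its own, so pad it with \; on both sides.
\newcommand{\Cprob}[2]{\Pr\left(#1 \;\middle\vert\; #2\right)}
\begin{document}
\[
  \cprob{A}{B}
  \qquad
  \Cprob{A}{\sum_{i=1}^N B_i}
\]
\end{document}
```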

These are different in traditional LaTeX but identical in unicode XeLaTeX:

\uppercase{ab\"{u}ë} % AB\"{u}Ë
\MakeUppercase{ab\"{u}ë} % AB\"{U}Ë