Grant Sanderson, a.k.a. 3blue1brown, has a curious passion project called manim for creating interactive, python-plot-compatible animations of mathematics, via code. It is YouTube-famous. Here is a powerful example of what this tool can do. Most importantly, it makes me feel fancy. Here are some notes and links I need while using it.

- I believe the community edition (source) is recommended for newcomers rather than the more idiosyncratic 3b1b version.
- ManimCommunity/manim: A community-maintained Python framework for creating mathematical animations.

Note dependencies:

```
sudo apt install libcairo2-dev libpango1.0-dev ffmpeg # Debian
brew install py3cairo ffmpeg pango scipy # macos
python -m pip install manim
```
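manim fails with cryptic errors when the system-level dependencies are missing, so it is worth checking they are on the PATH before diving in. A stdlib-only sketch (the helper name and the default binary list are my own, not part of manim):

```
import shutil

def missing_binaries(required=("ffmpeg",)):
    """Return the subset of required external tools not found on PATH."""
    return [name for name in required if shutil.which(name) is None]

# e.g. missing_binaries(("ffmpeg", "latex")) before a first render
```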

- Manim Quickstart
- /r/manim the reddit community is very helpful

- Moving things to the edge of the screen? Use `to_edge`.

- Manim Tutorial on 2D Graphs
- A deep dive into Manim’s internals
- Manim’s Output Settings
- Manim’s building blocks
- There are z_indexes to allow us to layer one thing over another.
- A lot of stuff that is badly explained, or unexplained, in the manual is beautifully demonstrated in the Example Gallery.

From the documentation one might get the impression that the SVGMobject width and height parameters set width and height. Wrong; if I use both, they only set the *width*. If I wanted to set the height, I needed to use the `stretch_to_fit_height` method.

```
def create_person(width=1.5, height=2.6, *_, color=PURPLE):
    person = SVGMobject(
        "person.svg",
        fill_color=color,
        width=width,
        height=height,
    ).set_z_index(0)
    print("requested dims", width, height, "!= actual dims", person.get_width(), person.get_height())
    person = person.stretch_to_fit_width(width).stretch_to_fit_height(height)
    print("but after stretching we should match", person.get_width(), person.get_height())
    return person
```

- manimml/ManimML: ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.
- Matheart/manim-physics: Physics simulation plugin of Manim that can generate scenes in various branches of Physics.
- NeoPlato/manim-livestream: Package that implements livestreaming configurations for Manim.
- ManimCommunity/awesome-manim: A database with many Manim users and content creators
- heejin_park’s Lectures
- naveen521kk/manim-fonts: Use Fonts from Internet With Manim.
- naveen521kk/manim-fontawesome: Font Awesome SVG's for Manim

Undocumented, except for the distressingly vague and unofficial Manim OpenGL Renderer Usage Guide.

Fast (notionally) and interactive.

Possibly using OpenGL is as simple as passing the `--renderer=opengl` flag? That does not work for me.

This is how we create a cell that will render itself:

```
%%manim -v WARNING --progress_bar None CreateCircle
class CreateCircle(Scene):
    def construct(self):
        circle = Circle()  # create a circle
        circle.set_fill(PINK, opacity=0.5)  # set the color and transparency
        self.play(Create(circle))  # show the circle on screen
```

The `-v WARNING` and `--progress_bar None` flags are there to keep output minimalist.

It is somewhat hard to find documentation for this feature by browsing, but it exists under ManimMagic, although there it punts lots of stuff to the manim command line.

```
self.play(
    *[FadeOut(mob) for mob in self.mobjects]
)
```

OK, but how do we actually create a video with helpful text etc.? People do not often post full examples. Here is one, by Act of Learning.

Painstakingly editing a video in code is very hi-fi, but also tedious and I am happy to throw away some fidelity in order to get my presentations done.

One quick hack to improve the presentation workflow is to use a screen capture tool to record the video as it plays. I use QuickTime on macOS because that is easy, but there are many alternatives on all OSes; in particular there are fancy video routers which will do live compositing.

To make that fluid there needs to be some way to play back and pause the video animation; this goes much more smoothly if there are no annoying “pause” and “play” icons on the screen.

Pro-tip: VLC has a mode without such icons, which are called the “on screen display”. They can be disabled in the advanced preferences: see How to disable the pause and play on screen icons.

That done, a screen capture of the important bit of the video playing window is a reasonably seamless way to show off the animation, and it is easy to pause and resume the animation as needed. The resulting video might occasionally be paused at the wrong moment and will have some silly resolution (811 pixels high on my machine) but TBH my years of training are in machine learning, not video production, so they are going to have to pay someone if they want something better.

Want a quirky GUI to present those animations fluidly?

A recent feature tries to make the timing of the animation smoother by synchronising a script and a recorded or synthesized voice inside the animation.

I personally am bad at writing scripts, and bad at following scripts, and I usually speak off the cuff, so this does not seem to help me.

OpenGL mode is supposedly interactive:

Adding a line `self.interactive_embed()` within your scene allows you to interact with the scene directly via an IPython shell. However, OpenGL mode does not work for me.

There is, apparently, an extension for live-previewing manim animations. It doesn’t work for me; for one, it seems to use PowerShell, somehow, but then gives an error about a missing `/bin/sh`. What shell/OS am I even supposed to be using?

Jupyter, the python code dev environment, is not a monolithic thing, but a whole ecology.
Different language back end kernels talk to various front end interfaces with varying systems of interaction and rendering in the mix.
I can execute python code interactively using several *different* front-ends, each offering different user experiences.
Because the, er, food web of this ecosystem is complicated it is not always easy to know which chunk of code is responsible for which part of the user experience.

Of the many irritations I have with jupyter, the IMO horrible default frontends (“classic” and “lab”) provide the worst user experience. But there are choices for all tastes, and some of them are not as bad. Let us examine some:

- Classic jupyter notebook, the archetypal browser-based coding environment. The command `jupyter notebook` starts this frontend.
- jupyterlab, the newer iteration of the classic notebook, extends and redesigns it into a pseudo-IDE with idiosyncratic text editors, notebooks, REPL terminals etc. The command `jupyter lab` starts this front-end. It is more powerful than classic, but also even more confusing.
- The base `ipython` shell can execute notebooks from the command line. (Is that the same as `jupyter console`?)
- ~~vscode-Jupyter is an editor for notebooks in VS Code.~~ VS Code has jupyter integration with corporate backing. It is fast and IMO less nasty than the horrible jupyter notebook UX which I constantly whinge about. This is what I currently use. IMO, IDE interfaces like this represent a generally better way of doing python execution, because for most of us, executing python code requires writing python code. Many jupyter frontends are great at running code, but not so great at editing code. Code editors and IDEs exist and have had much effort poured into them, to the point where they are pretty good at editing code. Jupyter should not need to reinvent text editors, although my opinions will not dissuade some Quixote from continuing to try. (C&C the phenomenon of Rstudio insistently reinventing code editors for R.)
- hydrogen is an equivalent for Atom that I have not used much but might be fine.
- nteract is a system for turning jupyter notebooks into apps somehow?
- pweave also executes jupyter kernels, as part of a reproducible document.
- qtconsole is a traditional client which de-emphasises browser-based stuff in favour of desktop-integrated windows.
- Probably others too.

`jupyter run notebook.ipynb`

*What is the difference between this and just running normal python scripts from the command line?*, you might ask.
For one thing, jupyter wastes more hard disk space and doesn’t version nicely.
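One mitigation for the versioning problem: notebooks are just JSON, so stripping outputs before committing makes diffs tolerable. A minimal stdlib sketch of the idea (tools like `nbstripout` do this properly; the function name here is my own):

```
import json

def strip_outputs(nb_json: str) -> str:
    """Blank outputs and execution counts so diffs show only code and text."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1)
```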

Alternatively, we can execute a notebook and capture the output in a notebook, which is possibly a more intuitive workflow if we were using a notebook in the first place.

`jupyter nbconvert --to notebook --execute my_notebook.ipynb`

See the nbconvert executing notebooks manual for more options.

papermill takes it further (too far IMO), turning a jupyter notebook into a python script with a mediocre python command line interface.

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you parameterize and execute notebooks. This opens up new opportunities for how notebooks can be used. For example:

- Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year; using parameters makes this task easier.
- Do you want to run a notebook and, depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.

VS code’s jupyter integration is good-ish and its support for plain python is excellent.

A jupyter frontend is *already* an app that we need to install to interact with the code; so why not install a good one instead of a bad one?
Further, there are a huge number of benefits to using a proper editor to edit code, instead of jupyter.
More benefits than I care to enumerate, but I’ll start.

I do not need to learn new keyboard shortcuts.
I do not need to arse around with the various defective reimplementations of text editors from inside the browser.
Further, heaps of other things that jupyter cannot even dream of just magically work!
Session sharing? No problem. Remote editing? Easy!
Type inference! Autocomplete! Debugger injection!
Search and replace across a whole project! Style checking! Refactor assistance! *Comprehensible documentation*.

See VS Code for python for more details.

Still, if that is not your jam, read on for some onerous alternatives that I hate.

The one that came recommended until, I dunno, 2015 or so I guess? I cannot recall. Worth knowing about to set expectations.

The location of the theming infrastructure, widgets, CSS etc. has moved of late; check your version number. The current location is `~/.jupyter/custom/custom.css`, not the former location `~/.ipython/profile_default/static/custom/custom.css`.

Julius Schulz’s ultimate setup guide is also the ultimate pro tip compilation.

If I must use jupyter notebook then I kill parenthesis molestation (referred to in the docs as *bracket autoclose*) with fire.
I do not like having to fight with the notebook’s misplaced faith in its ability to read my mind.
The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N”, but instead `cm_config.autoCloseBrackets`, and it is not in the preference menus. According to a support ticket, the following should work.

```
# Run this in Python once; it should take effect permanently
from notebook.services.config import ConfigManager

c = ConfigManager()
c.update('notebook', {"CodeCell": {
    "cm_config": {"autoCloseBrackets": False}}})
```

or add the following to `custom.js`:

```
define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})
```

or maybe create `~/.jupyter/nbconfig/notebook.json` with the content

```
{
    "CodeCell": {
        "cm_config": {
            "autoCloseBrackets": false
        }
    }
}
```

That doesn’t work with `jupyterlab`, which is even more righteously sure that it knows better than you what you truly wish to do with parentheses. Perhaps the following does work? Go to `Settings --> Advanced Settings Editor` and add the following to the User Overrides section:

```
{
    "codeCellConfig": {
        "autoClosingBrackets": false
    }
}
```

or put it in the file `~/.jupyter/lab/user-settings/@jupyterlab/notebook-extension/tracker.jupyterlab-settings`.

Ahhhhhhhh. Update: it is easier now.

Jupyter classic is more usable if you install the notebook extensions, which include, e.g., drag-and-drop image support.

```
$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user
```

For example, if you run `nbconvert` to generate an HTML file, a dragged-in image will remain outside of the HTML file. You can embed all images by calling `nbconvert` with the `EmbedPostProcessor`:

`$ jupyter nbconvert --post=embed.EmbedPostProcessor`

**Update** — broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

```
$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user
```

`jupyter lab` (sometimes styled `jupyterlab`) is the current cutting edge according to jupyter mainline, and reputedly is much nicer to develop plugins for than the notebook interface. From the user perspective it’s more or less the same thing, but the annoyances are different. It does not strictly dominate `notebook` in terms of user experience, although I understand it may do in terms of experience for plugin developers.

A pitch targeted at us users explains some practical implications of jupyterlab and how it is the one true way and the righteous future etc. Since we are not in the future, though, we must deal with certain friction points in the present, actually-existing jupyterlab.

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already works, but as a jankier javascript version. The jupyterlab developers have tasted that forbidden fruit, have the spidermonkey on their backs, and they hanker to reinvent things using javascript. So far they have reimplemented copy, paste, search/replace, browser tabs and the command line.

Yo dawg I heard you like notebook tabs so I put notebook tabs in your browser notebook tab.

The replacement jupyter tab system clashes with the browser tab system and keyboard shortcuts, and generally confuses the eye. Why do we get a search function which AFAICT non-deterministically sometimes does regexp matching but then doesn’t search the whole page? Possibly the intention is that it should not be run through a normal browser, but in a custom single-tab embedded browser? Or maybe, for true believers, once you load up a jupyter notebook you like it so hard that you never need to browse to a website on the internet again and you close all the other tabs forever. Whatever the rationale, the learning curve for these weird UI choices is bumpy, and the lessons are not transferable.

Is the learning process worth it?
I am *not* using jupyter for its artisanal, quirky alternate take on tabs, cut-and-paste etc., but because I want a quick interface to run some shareable code with embedded text and graphics.
Does it get me that?

Maybe. If I get in and out fast, do most of my development in a real code editor and leave the jupyter nonsense for results sharing, I neither need the weird UI nor am I bothered by it. The other features are like that button on the microwave labelled “fish”, which no one has ever needed or intentionally used, but which does not stop the microwave from defrosting things if you use the normal controls.

At the same time, some jupyterlab enthusiasts want to re-implement text editors, which is an indicator that there might be a contagion of NIH fever going around in the community, and it makes me nervous.

Whether *I* like the overwrought jupyterlab UX or not, we should allow it a moderate nonsense-baggage allowance if the developer API is truly cleaner and easier to work with. That would be a solid win in terms of delivering the features I would actually regard as improvements, including, maybe, ultimately, a better UI.

Simultaneous users are supported natively in jupyterlab, with an OK UI for identifying running notebooks. IMO this could be a killer feature of jupyterlab. As currently implemented it is usable, but sometimes calculation output gets lost in the system somewhere.

There is an under-documented project to introduce real time collaboration to jupyterlab, coordinating on notebook content, code, output *and* backend state, which apparently works, but is barely mentioned in the docs. Maybe the JupyterLab RTC Design Documentation would help there.

If you have a snakepit of different jupyter sessions running on some machine you have just logged in to, and wish to open up the browser to get a UI for them, then you want to work out which are running on a given machine so that you can attach to them. The command (for either jupyter notebook notebooks or jupyter lab sessions) is:

`jupyter notebook list`

Related to, inspired by, and maybe conflicting or intersecting with the nbextensions are the labextensions, which add bits of extra functionality to the lab interface rather than the notebook interface (where the lab interface is built upon the notebook interface and runs notebooks just like it, but with different moving parts under the hood).

I try to keep the use of these to a minimum, as I have a possibly irrational foreboding that some complicated death spiral of version clashes is beginning between all the different jupyter kernel and lab and notebook installations I have cluttering up my hard disk, and it can’t improve things to put various versions of lab extensions in the mix, can it? And I really don’t want to have to understand how it works to work out whether that is true or not, so please don’t explain it to me. I moreover do not wish to obsessively update lab extensions everywhere.

Anyway there are some useful ones, so I live with it by running install and update commands obsessively in every combination of kernel/lab/whatever environment in the hope that something sticks.

Life is easier with jupyterlab-toc, which allows you to navigate your lab notebook by markdown section headings.

`jupyter labextension install @jupyterlab/toc`

`jupyter labextension update @jupyterlab/toc`

Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

`jupyter labextension install jupyterlab-drawio`

LaTeX editor? As flagged, I think this is a terrible idea. Even worse than the diagram editor. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, e.g. as a plot point in a contrived techno-thriller script written by cloistered nerds. If you find yourself in such dramaturgical straits:

`jupyter labextension install @jupyterlab/latex`

One nerdy extension is jupyter-matplotlib, a.k.a., confusingly, `ipympl`, which integrates interactive plotting into the notebook better.

```
pip install ipympl
# If using JupyterLab
# Install nodejs: https://nodejs.org/en/download/
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib
```

jupyterlab/jupyterlab-hdf5 claims to provide a UI for HDF5 files.

`qtconsole`

A classic, i.e. non-web-browser-based, client for jupyter. No longer fashionable? Seems to work fine, but is sometimes difficult to compile and doesn’t support all the fancy client-side extensions.

`jupyter qtconsole` can connect two frontends to the same kernel. This will be loopy, since they update the same variables (presumably) but AFAICT not the same notebook content, so some care would be required to make sure you are doing what you intend.

`%qtconsole`

A proprietary google fork/extension, Colaboratory is a jupyter thingo integrated with some fancy hosting and storage infrastructure, and you get free GPUs. Looks like a neat way of sharing things, and all that they demand is your soul. Don’t be fooled, though: claims you see on the internet that this is a real-time collaborative environment are false; Google killed realtime interaction.

hydrogen, a plugin for the `atom` text editor, provides a more unified coding experience with a normal code editor. See the intro blog post.

pweave is like knitr for python. It also executes jupyter kernels. The emphasis is not interactivity but rather reproducible documents.

nteract is a system for running jupyter notebooks as desktop apps, integrating with OS indexing services and looking pretty etc. Not totally sold on this idea because it looks so bloaty, but I would like to be persuaded.

The diagrams I most often need are directed flow graphs, a formal mathematical cousin of the flowchart, which can represent graphical models and neural nets.
The diagrams I need here are nearly-enough flowchart-like that I *can* sketch them with a flowchart tool if need be;
but they are closely integrated with the equations of a particular statistical model, so I would like to incorporate them into the same system, to avoid tedious and error-prone manual sync if possible.
Further, there are a couple of things I would like handled which flowchart programs are bad at, such as plate notation, handling of inline mathematical markup, and representing network topologies.

As always, I would like to export the resulting diagrams to a modern compatible vector format, which means SVG or PDF. As a fallback I would accept other formats that can be converted to the above, such as Adobe Illustrator, EPS or xfig.

After initializing a new DAG using a command line, the researcher can evaluate what associations are introduced by adjusting for covariables. Potentially biasing paths from exposure to outcome can be identified (see eFig. 1, demonstrating harmful adjustment using an example DAG from Fleischer and Diez Roux). Functions to conveniently add or remove nodes and arcs are included, as is a function checking introduced associations and biasing paths for all possible adjustment sets […]. The graphics capabilities of R allow fairly straightforward programming of basic DAG drawing routines, while also supporting the interactive repositioning of nodes and arcs.

It additionally calculates adjustment sets and other useful functions of the graph.

LaTeX-friendly diagramming tool TikZ has a graphical model library, jluttine/tikz-bayesnet, a “TikZ library for drawing Bayesian networks, graphical models and (directed) factor graphs in LaTeX”.
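For flavour, a minimal tikz-bayesnet document might look like the following. This is a sketch based on the macro names in the tikz-bayesnet README (`latent`, `obs`, `\edge`, `\plate`); I have not compiled this exact file, so treat it as a starting point rather than gospel.

```
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{positioning}
\usetikzlibrary{bayesnet}
\begin{document}
\begin{tikzpicture}
  % latent variable z generating observed x, repeated over a plate of size N
  \node[latent] (z) {$z$};
  \node[obs, below=of z] (x) {$x$};
  \edge {z} {x};
  \plate {plateN} {(x)} {$N$};
\end{tikzpicture}
\end{document}
```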

DiagrammeR is a generic graph visualisation package for R which can incidentally do graphical models.

With the DiagrammeR package you can create, modify, analyze, and visualize network graph diagrams. The output can be incorporated into RMarkdown documents, integrated with Shiny web apps, converted to other graph formats, or exported as image files.

diagrams.net (formerly draw.io) is a browser-based diagram editor. It is a generic diagramming tool but has lots of affordances for the specific needs of graphical models.

Another browser option. mermaid is a flowcharting tool which can be pressed into service for DAGs. Unique value proposition: code-driven diagrams with a syntax that aspires to be so basic that it is easier than point-and-click. Integrates with many markdown editors. Has an online editor, and a CLI.

It also has, if anything, too many VS Code integrations. Here are two.
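To give the flavour of the syntax, here is a classic confounder triangle as a mermaid graph (paste it into the online editor to render; the node labels are my own example):

```
graph TD
  Z((Z)) --> X((X))
  Z --> Y((Y))
  X --> Y
```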

The graphviz family is nearly good, in that it supports graphing various networks, including probabilistic DAGs. Inevitably, none of its fancy algorithms ever lays it out *quite* like I want. There is a macOS gui, and a cross-platform (WX) gui called doteditor.

- numpyro automatically renders graphviz graphical model diagrams via the `render_model` method.
- So does pyro.
- There is a python graphviz wrapper which has possibly the best documentation for graphviz generally.
- fastdot.core wraps pygraphviz in a slightly nicer API.
- dot2tex is a Graphviz-to-LaTeX converter which supports mathematical markup in the nodes.
- Aaaaaand graphviz renders in jupyter. (See also other jupyter options.)
- And it runs in R via diagrammeR.
- It runs in javascript as viz.js.
- And traditional-style in R, using Rgraphviz.
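Underneath all of these wrappers, DOT is just text, so for simple DAGs no library is needed at all. A hedged stdlib sketch (the function is mine; real projects should use the python graphviz wrapper above):

```
def dag_to_dot(edges, graph_name="G"):
    """Render a list of (parent, child) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {graph_name} {{"]
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)

# A classic confounder triangle: Z causes both X and Y, X causes Y.
print(dag_to_dot([("Z", "X"), ("Z", "Y"), ("X", "Y")]))
```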

For some purposes, I would recommend dagitty over this; the syntax is similar, but dagitty additionally accepts specification by a structural equation model.

dagitty (Textor et al. 2017) is an option for the browser and also R. I’m not 100% sure how these two parts relate to each other. Documentation for the R version was hard to find; see here.

ggdag extends `dagitty` to the `ggplot` ecosystem. The ggdag bias structure vignette shows off the useful explanation diagrams available in `ggdag` and is also a good introduction to selection bias and causal DAGs themselves.

Shinydag provides a web interface to ggdag. Shinydag is tedious to install natively, especially given the poor documentation. It runs OK in docker.

`docker run -d -p 3838:3838 --name shinydag gerkelab/shinydag:latest`

Now Shinydag is waiting for you at `127.0.0.1:3838/shinyDAG`. It is a bit crashy and clunky. I’m not sure I prefer it to plain ggdag.

Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.

HOWTO: diagramming convnets using matplotlib in python. Or: daft-pgm:

Daft is a Python package that uses matplotlib to render pixel-perfect probabilistic graphical models for publication in a journal or on the internet. With a short Python script and an intuitive model-building syntax you can design directed (Bayesian Networks, directed acyclic graphs) and undirected (Markov random fields) models and save them in any formats that matplotlib supports (including PDF, PNG, EPS and SVG).

It’s OK at programmatic DAG rendering; it would be nice if it had some kind of semi-automatic layout algorithm.

yEd is a low-key nerdview diagrammer.

yEd supports a wide variety of diagram types. In addition to the illustrated types, (BPMN Diagrams, Flowcharts, Family Trees, Semantic Networks, Social Networks, UML Class Diagrams) yEd also supports organization charts, mind maps, swimlane diagrams, Entity Relationship diagrams, and many more.

Found via a blog by Jonas Kristoffer Lindeløv, which talks through pluses and minuses:

yEd is purely graphical editing which is fast to work with and great for tweaking small details. A very handy yEd feature is its intelligent snapping to alignments and equal distances when positioning objects. Actually, I don’t understand why yEd almost never makes it to the “top 10 diagram/flowchart programs” lists.

A few things I learned: To make subscripts, you have to use HTML code.… it is not possible to do double-subscripts. Also, the double-edges node is made completely manually by placing an empty ellipse above another. […] A final limitation is that arrowhead sizes cannot be changed. You can, however, zoom. Therefore, your very first decision has to be the arrowhead size. Zoom so that it is appropriate and make your graphical model.

visualizes various NNs from various frameworks.

AFAICT the style here is useful for diagnostics but does not seem to match the more manicured style that I would use for a publication. Maybe I should look deeper.

flowchart.fun (source) is a flowcharting tool which can be pressed into service for DAGs. Documentation and project goals are sparse.

TETRAD, the graph inference software, will also graph structural equation models.

Seems to do what it says on the tin, especially if you want to make convnets look pretty: HarisIqbal88/PlotNeuralNet: Latex code for making neural networks diagrams.

Laura Dietz and Jaakko Luttinen made tikz macros for drawing Bayes nets in LaTeX: tikz-bayesnet.

Breitling, Lutz Philipp. 2010. “dagR: A Suite of R Functions for Directed Acyclic Graphs.” *Epidemiology* 21 (4): 586.

Creed, Jordan, and Travis Gerke. 2018. “Gerkelab/Shinydag: Initial Release.” Zenodo.

Greenland, Sander, Judea Pearl, and James M. Robins. 1999. “Causal Diagrams for Epidemiologic Research.” *Epidemiology* 10 (1): 37.

Textor, Johannes, Juliane Hardt, and Sven Knüppel. 2011. “DAGitty: A Graphical Tool for Analyzing Causal Diagrams.” *Epidemiology* 22 (5): 745.

Textor, Johannes, Benito van der Zander, Mark S. Gilthorpe, Maciej Liśkiewicz, and George T. H. Ellison. 2017. “Robust Causal Inference Using Directed Acyclic Graphs: The R Package ‘Dagitty’.” *International Journal of Epidemiology*, January, dyw341.

On the art and science of line drawings. I am especially interested in scientific diagrams; drafting/CAD/engineering stuff is somewhat different emphasis, but you might still find some usable ideas here.

I am also interested in diagrams that are some mix of freehand and procedural; maybe partly generated algorithmically and partly through a GUI.
Generally I would like these to export to a modern compatible vector format, which means SVG or PDF or, as a fallback, one of the other formats that can be converted to the above, such as Adobe Illustrator, EPS or xfig.
I am no graphic designer, but I understand that it is a habit of successful scientists to have an effective Figure 1.
But possibly I can make*adequate* diagrams rather than*terrible* ones, while waiting on a graphic design budget.

Seeking inspiration or examples of what I am talking about? Gallery of concept visualisation: a gallery of methods for explaining complex ideas; everything from exotic 3d stuff to crayon drawings and sticks. Construct your mood board!

See also interactive visualisations.

Here’s the stack exchange list of tools.
What follows is *my* list, in descending order of (fluctuating) frequency of use.^{1}

My current recommendation: start with diagrams.net and tidy up any mathematics etc. in inkscape.

diagrams.net (formerly draw.io) is a browser-based diagram editor. It is free and open source. It has all the modern convenience of the new generation of browser-based apps and the recent versions are pretty good; good enough that this is my go-to tool. I can use it as a desktop app or integrated browser widget:

This provides support also for the flowchart-style graphical model diagrams that I need a lot, and many other process-model diagrams, plus a little bit of generic diagramming. It can import Gliffy files.

Open source Inkscape can do everything; it is essentially targeting the Adobe Illustrator audience. It is aimed firstly at general vector graphic design, which means the functions I want for sciencey stuff are not front-and-center, but it can be done.

⚠️ **Pro tip**: Inkscape supports LaTeX math rendering in its `PDF+LaTeX` mode, which is extremely useful.

`inkscape -D image.svg -o image.pdf --export-latex`

ipe has, if not an *actually* intuitive interface, at least one that makes certain specialist science graphing tasks easy.
It’s *mostly* simple to learn, modulo some oddities - e.g. you change the page size by creating and importing an XML stylesheet. Riiiight.

Also, it can’t export SVG, merely PDF and EPS, each of which is not so much a vector graphics *standard* as the battleground beneath the standards behind which the corporate armies clash.
Nonetheless, it repays persistence and can get some surprising qualitative diagrams done.

Ipe’s main features are:

- Entry of text as LaTeX source code. This makes it easy to enter mathematical expressions, and to reuse the LaTeX-macros of the main document. …
- Produces pure Postscript/PDF, including the text…
- It is easy to align objects with respect to each other (for instance, to place a point on the intersection of two lines, or to draw a circle through three given points) using various snapping modes…

That last point sounds minor, but it is an ingenious move that makes all the more questionable design choices worthwhile.
Anyone who has ever spent 90 minutes doing trigonometry to get a `pstricks` diagram right, and wondered what happened to the thing where computers eliminated drudge work, will weep upon the sight of it.

These days, `diagrams.net` has acquired many of the same alignment features and the point of differentiation is shrinking.

Has built-in Lua scripting, like everything on the planet. Integrates with matplotlib.

Recommended: handy user contributions by Stefan Huber.

See pdf2svg to export the PDFs to other formats.

The classic type-your-diagram-in-then-work-out-what-went-wrong option. If I am constructing a well-understood type of diagram (e.g. a PGM) then this is good. If we are doing something unusual, it becomes tedious.

The original flavour, pstricks, is based on PostScript and has compatibility problems with modern toolchains. Modern projects seem to prefer the more compatible (less powerful?) PGF/TikZ.

latexdraw is a Java pstricks GUI. Might be good, but I couldn’t install it.

TikZiT, a reasonably good GUI for TikZ:

a super simple GUI editor for graphs and string diagrams. Its native file format is a subset of PGF/TikZ, which means TikZiT files can be included directly in papers typeset using LaTeX.

HT Luis Riera Garcia for showing me this.

Agustinus Kristiadi, in The Last Mile of Creating Publication-Ready Plots, introduces texworld/tikzplotlib, via matplotlib, which might be a tenable way of working.

The Overleaf tutorial, LaTeX Graphics using TikZ, is probably the best one.

figma, the browser-based graphic design tool, might also be functional for diagrams. Its special selling point is that it is easy to collaboratively edit live. It is not particularly targeted at science, but it has lots of science-useful functionality. Not free, but many organisations have a corporate subscription.

Gliffy is an “enterprise” diagram tool whose selling point seems to be that it is integrated into lots of other enterprise things and works from the browser. AFAICT it is OK, but it is hard to label things with mathematics, and the alignment is not quite as nice as diagrams.net. Not free, but many organisations have a corporate subscription.

Python, R, Julia, and JavaScript all have diverse plotting infrastructure, some of which turns out to also be diagramming infrastructure. In practice I often generate part of a diagram this way, but rarely the whole thing.

I am slightly interested in a certain kind of low-key automation which is sometimes useful in diagrams, which I understand is called a “compositional layout” diagramming style, as seen in grammar-of-graphics tools, most famously R’s ggplot2. These tools lay out components as a kind of algebra of composed styling operations.

Compose.jl provides these for Julia. Diagrams is the Haskell version. R’s grid is purportedly somewhat similar for layout, as is all the ggplot2 stuff for style.

pencil is a fashionable tool for GUI design that happens to be good for some other tricky things, notably flow charts. It’s built as a browser app so has good compatibility with export and presumably some hackability. I have not looked at this one for a while; I wonder if it is still current.

Dia has some neat features; and many confusing ones I can’t imagine the use-case for. Possibly this last one is because I am not in middle management. If you need to produce specialised diagram types for your project manager, Dia probably has the function.

Dia supports more than 30 different diagram types like flowcharts, network diagrams, database models. More than a thousand readymade objects help to draw professional diagrams. Dia can read and write a number of different raster and vector image formats. […] Dia can be scripted and extended using Python.

Crashes on my recent macOS, though, and I can’t be arsed working out why.

Asymptote is a powerful descriptive vector graphics language that provides a natural coordinate-based framework for technical drawing. Labels and equations are typeset with LaTeX, for high-quality PostScript output.

Might be good; haven’t used it. Has a Jupyter extension.

An Inkscape alternative leveraging web technology is Jarosław Foksa’s Boxy. Not sure if it has any advantages over diagrams.net.

PlotDevice is a cousin of nodebox, which is to say, a Python drawing system for 2D vector printable outputs.

xfig combines the imprecision of drawing through GUIs, with the abstruseness and fragility of drawing through code. Listed here for the surpassing beauty of its manual page.

Notice there are few commercial entrants in this race? I’m too poor for any Autodesk product, and unconvinced of their utility unless I wish to build some kind of major concrete structure. I *did* buy Omnigraffle, the much-touted diagram editor for macOS. What a disaster; expensive, unintuitive, over-engineered. Like they ripped off `xfig`, took out the API, and compensated for it by putting bezels on the icons. If you want to waste hours with horribly inscrutable drafting products, there are many open-source options available that can give you that experience for free.

Classic in the modern setting: Snoek et al. (2015).

This looks interesting: Weber et al. (2018)

We propose a new method for training neural networks online in a bandit setting. Similar to prior work, we model the uncertainty only in the last layer of the network, treating the rest of the network as a feature extractor. This allows us to successfully balance between exploration and exploitation due to the efficient, closed-form uncertainty estimates available for linear models. To train the rest of the network, we take advantage of the posterior we have over the last layer, optimizing over all values in the last layer distribution weighted by probability. We derive a closed form, differential approximation to this objective and show empirically that this method leads to both better online and offline performance when compared to other methods

Haven’t seen it used much, which leads me to suspect there is some difficulty in practice.

a.k.a. Neural Linear models.

AFAICT this case is the simplest. We are concerned with the density of the predictive, so we start with a neural network. Then we treat the neural network as a feature generator in all the layers up to the last one, and treat the last layer probabilistically, as an adaptive-basis regression or classification problem, to get a decent learnable predictive uncertainty. I think this was implicit in Mackay (1992), but it was named in Snoek et al. (2015), and critiqued and extended in Lorsung (2021).

For a simple practical example, see the Probflow tutorial.

Under a last-layer Laplace approximation, we write the joint model as\(\vrv{y}= \vrv{r}^{\top}\Phi(\vrv{u})\) so the joint distribution is\[\begin{align*} \left.\left[\begin{array}{c} \vrv{y} \\ \vrv{r} \end{array}\right]\right|\vrv{u} &\sim\dist{N}\left( \left[\begin{array}{c} \vv{m}_{\vrv{y}}\\ \vv{m}_{\vrv{r}} \end{array}\right], \left[\begin{array}{cc} \mm{K}_{\vrv{y}\vrv{y}} & \mm{K}_{\vrv{y}\vrv{r}}^{\top} \\ \mm{K}_{\vrv{y}\vrv{r}} & \mm{K}_{\vrv{r}\vrv{r}} \end{array}\right] \right) \end{align*}\] with\[\begin{align*} \vv{m}_{\vrv{y}} &=\vv{m}_{\vrv{r}}^{\top}\Phi(\vrv{u}) \\ \mm{K}_{\vrv{y}\vrv{r}} &=\Phi(\vrv{u}) \mm{K}_{\vrv{r}\vrv{r}}\\ \mm{K}_{\vrv{y}\vrv{y}} &= \Phi(\vrv{u})\mm{K}_{\vrv{r}\vrv{r}} \Phi^{\top} (\vrv{u})+ \sigma^2\mm{I}. \end{align*}\] Here\(\vrv{r}\sim \dist{N}\left(\vv{m}_{\vrv{r}}, \mm{K}_{\vrv{r}\vrv{r}}\right)\) is the random weighting, and\(\Phi(\vrv{u})\) is called the feature map.
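A minimal numpy sketch of that recipe, under toy assumptions of my own (the “network body” is a hypothetical fixed feature map \(\Phi\), standing in for the trained layers), doing exact Bayesian linear regression on the last layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed feature map, standing in for the trained network body
def Phi(u):
    return np.stack([np.ones_like(u), u, np.tanh(u)], axis=-1)

u = rng.uniform(-2, 2, size=40)                 # inputs
y = np.sin(u) + 0.1 * rng.standard_normal(40)   # noisy targets
X = Phi(u)                                      # (40, 3) design matrix

sigma2, tau2 = 0.01, 1.0  # observation noise and prior variance on weights
# Gaussian posterior over last-layer weights r | y ~ N(m, S)
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(3) / tau2)
m = S @ X.T @ y / sigma2

# Predictive mean and variance at new inputs, in closed form
Xs = Phi(np.linspace(-2, 2, 5))
pred_mean = Xs @ m
pred_var = np.einsum('ij,jk,ik->i', Xs, S, Xs) + sigma2
```

This is the whole appeal: the network supplies the features, and the uncertainty over the last layer comes in closed form.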

TBD

Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. “Laplace Redux — Effortless Bayesian Deep Learning.” In *arXiv:2106.14806 [Cs, Stat]*.

Kristiadi, Agustinus, Matthias Hein, and Philipp Hennig. 2020. “Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks.” In *ICML 2020*.

Lorsung, Cooper. 2021. “Understanding Uncertainty in Bayesian Deep Learning.” arXiv.

Mackay, David J. C. 1992. “A Practical Bayesian Framework for Backpropagation Networks.” *Neural Computation* 4 (3): 448–72.

Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A Scalable Laplace Approximation for Neural Networks.” In.

Snoek, Jasper, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md Mostofa Ali Patwary, Prabhat, and Ryan P. Adams. 2015. “Scalable Bayesian Optimization Using Deep Neural Networks.” In *Proceedings of the 32nd International Conference on Machine Learning*.

Tran, Dustin, Mike Dusenberry, Mark van der Wilk, and Danijar Hafner. 2019. “Bayesian Layers: A Module for Neural Network Uncertainty.” *Advances in Neural Information Processing Systems* 32.

Weber, Noah, Janez Starc, Arpit Mittal, Roi Blanco, and Lluís Màrquez. 2018. “Optimizing over a Bayesian Last Layer.” In *NeurIPS Workshop on Bayesian Deep Learning*.

Transformers are big attention networks with some extra tricks. I am no expert. Here are some good blog posts explaining everything, for my reference, but I will not write yet another one. This is a fast-moving area and I am not keeping track of it, so if you are on this page looking for guidance you are already in trouble.

Phuong and Hutter (2022)

Transformers are deep feed-forward artificial neural networks with a (self)attention mechanism. They have been tremendously successful in natural language processing tasks and other domains. Since their inception 5 years ago, many variants have been suggested. Descriptions are usually graphical, verbal, partial, or incremental. Despite their popularity, it seems no pseudocode has ever been published for any variant. […] This report intends to rectify the situation for Transformers. It aims to be a self-contained, complete, precise and compact overview of transformer architectures and formal algorithms (but not results)

Lilian Weng, The Transformer Family

Jay Alammar’s Illustrated Transformer.

Xavier Amatriain, Transformer models: an introduction and catalog — 2023 Edition

These networks are massive (heh) in natural language processing right now.

A key point about these networks seems to be that they can be made extremely large but still remain trainable. This leads to interesting scaling laws.

A good paper read is Yannic Kilcher’s.

Transformers are pretty good at weird stuff, e.g. automata — see Unveiling Transformers with LEGO (Zhang et al. 2022).

How about Bayesian inference? (Müller et al. 2022)

Democratizing the hardware side of large language models seems to be an advertisement for some new hardware, but there is interesting background in there.

HuggingFace distributes, documents, and implements a lot of Transformer/attention NLP models and seems to be the most active neural NLP project. Certainly too active to explain what they are up to in between pumping out all the code.

The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:

- BERT (from Google), released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
- GPT (from OpenAI), released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever.
- GPT-2 (from OpenAI), released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
- Transformer-XL (from Google/CMU), released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov.
- XLNet (from Google/CMU), released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le.
- XLM (from Facebook), released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
- RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov.
- DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut, and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2.
- [very long list excised]

GPT-Neo is the code name for a series of transformer-based language models loosely styled around the GPT architecture that we plan to train and open source. Our primary goal is to replicate a GPT-3 sized model and open source it to the public, for free.

Along the way we will be running experiments withalternativearchitectures andattentiontypes, releasing any intermediate models, and writing up any findings on our blog.

It is unclear if they will release the actual weights, but you can use a miniature GPT-alike at contentyze.

UPDATE: Is GPT-J-6B: 6B JAX-Based Transformer in the same family? THAT seems to be open and available. Some background on that and the other open options here: Alberto Romero, Can’t Access GPT-3? Here’s GPT-J, Its Open-Source Cousin.

This guide to pruning multihead attention NNs should probably go somewhere useful if I actually end up doing NLP like all the recruiters seem to want.

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. “Neural Machine Translation by Jointly Learning to Align and Translate.” In *arXiv:1409.0473 [Cs, Stat]*.

Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” *arXiv:2005.14165 [Cs]*, June.

Celikyilmaz, Asli, Li Deng, Lihong Li, and Chong Wang. 2017. “Scaffolding Networks for Teaching and Learning to Comprehend.” *arXiv:1702.08653 [Cs]*, February.

Choy, Christopher B, JunYoung Gwak, Silvio Savarese, and Manmohan Chandraker. 2016. “Universal Correspondence Network.” In *Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2406–14. Curran Associates, Inc.

Freeman, Alexandra L J. 2019. “How to Communicate Evidence to Patients.” *Drug and Therapeutics Bulletin* 57 (8): 119–24.

Huang, Cheng-Zhi Anna, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. 2018. “Music Transformer: Generating Music with Long-Term Structure,” September.

Katharopoulos, Angelos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. “Transformers Are RNNs: Fast Autoregressive Transformers with Linear Attention.” *arXiv:2006.16236 [Cs, Stat]*, August.

Li, Zhuohan, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and Joseph E. Gonzalez. 2020. “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.” *arXiv:2002.11794 [Cs]*, February.

Merrill, William, and Ashish Sabharwal. 2022. “Transformers Implement First-Order Logic with Majority Quantifiers.” arXiv.

Müller, Samuel, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. 2022. “Transformers Can Do Bayesian Inference.” arXiv.

Ortega, Pedro A., Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, et al. 2021. “Shaking the Foundations: Delusions in Sequence Models for Interaction and Control.” *arXiv:2110.10819 [Cs]*, October.

Phuong, Mary, and Marcus Hutter. 2022. “Formal Algorithms for Transformers.” arXiv.

Piantadosi, Steven T., and Felix Hill. 2022. “Meaning Without Reference in Large Language Models.” arXiv.

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners,” 24.

Ramsauer, Hubert, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, et al. 2020. “Hopfield Networks Is All You Need.” *arXiv:2008.02217 [Cs, Stat]*, July.

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” *arXiv:1706.03762 [Cs]*, June.

Yang, Greg, and Edward J. Hu. 2020. “Feature Learning in Infinite-Width Neural Networks.” *arXiv:2011.14522 [Cond-Mat]*, November.

Zhang, Yi, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, and Tal Wagner. 2022. “Unveiling Transformers with LEGO: A Synthetic Reasoning Task.” arXiv.

Avoid having to integrate by parts twice

Suppose \(f(x)\) and \(g(x)\) are functions that are each proportional to their second derivatives. These include exponential, circular, and hyperbolic functions. Then the integral of \(f(x) g(x)\) can be computed in closed form with a moderate amount of work; there’s a formula that computes all these related integrals in one fell swoop (PeaseUseful1959?). Suppose \[ f^{\prime \prime}(x)=h f(x) \] and \[ g^{\prime \prime}(x)=k g(x) \] for constants \(h\) and \(k\). …Then \[ \int f(x) g(x) \,d x=\frac{1}{h-k}\left(f^{\prime}(x) g(x)-f(x) g^{\prime}(x)\right)+C. \]
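A quick numerical sanity check of that formula; the particular \(f\), \(g\), \(h\), \(k\) below are my own toy choices, not from the source:

```python
import math

# f'' = h f and g'' = k g: take f = exp(2x) (so h = 4) and g = sin(3x) (so k = -9)
h, k = 4.0, -9.0
f  = lambda x: math.exp(2 * x)
fp = lambda x: 2 * math.exp(2 * x)      # f'
g  = lambda x: math.sin(3 * x)
gp = lambda x: 3 * math.cos(3 * x)      # g'

# The claimed antiderivative of f(x) g(x)
F = lambda x: (fp(x) * g(x) - f(x) * gp(x)) / (h - k)

# Its derivative should recover the integrand; check by central difference
x0, eps = 0.7, 1e-6
Fprime = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)
assert abs(Fprime - f(x0) * g(x0)) < 1e-6
```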

consider the integral\[ \int e^{x^2} d x \] The most common approach to evaluating this integral is to expand it as a power series and integrate term-by-term, which yields\[ C+x+\frac{x^3}{3}+\frac{x^5}{10}+\frac{x^7}{42}+\frac{x^9}{216}+\cdots \] as the antiderivative, with\(C\) as the constant of integration. Maclaurin Integration is an alternative solution by series (although not a power series, since it involves the function itself in the solution) that eliminates the nuisance of calculation completely.
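Those coefficients are easy to check numerically; here is a throwaway comparison (my own sketch) of the truncated series against a Simpson-rule integral of \(e^{x^2}\):

```python
import math

def series(x, terms=10):
    # ∫₀ˣ exp(t²) dt = Σ x^(2n+1) / ((2n+1) n!), i.e. x + x³/3 + x⁵/10 + …
    return sum(x ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
               for n in range(terms))

def simpson(fn, a, b, n=1000):
    # plain composite Simpson's rule with n (even) panels
    step = (b - a) / n
    total = fn(a) + fn(b) + sum((4 if i % 2 else 2) * fn(a + i * step)
                                for i in range(1, n))
    return total * step / 3

x = 0.5
assert abs(series(x) - simpson(lambda t: math.exp(t * t), 0.0, x)) < 1e-9
```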

…the formula is simply:\[ \int f(x) d x=-\sum_{u=0}^{\infty}\left(\frac{d^u}{d x^u}(x f(x)) \sum_{n=0}^{\infty} \frac{(1-x)^{n+u+1}}{\prod_{v=1}^{u+1}(n+v)}\right)+C, \] where\(C\) is the constant of integration.…

This formula is valid only if:

1. \(f(x)\) is defined on the domain \((0,2)\),
2. \(f(x)\) is continuous on \((0,2)\), and
3. \(x f(x)\) has derivatives of all orders on \((0,2)\).

Bruda, Glenn. 2022. “Maclaurin Integration: A Weapon Against Infamous Integrals.” arXiv.

Like browser automation but without the browser, web API automation tools use the published access points of various web services (Dropbox, Facebook, GitHub etc.) to automate stuff for you.

For a while there this was how we thought the internet would be run - a smooth and integrated automated flow of information for the benefit of users. There has been a lot less hype about that recently, as it turns out the best way to monetise users is not to make them more efficient but to make them slower so they will look at your advertising or watch your antivax video. Despite the claim that APIs are dead, these web APIs are still a going concern. If anything, there are too many automation tools here.

A lot of them are examples of low-code development adapted to a particular domain, or applications of graphical flow-based programming, and there are more examples on those pages.

IFTTT is the classic here, although they’ve been a bunch of cocks recently.

zapier might be lesser cocks than IFTTT? They cost money, but come more highly recommended, e.g. by NGO coryphée Joe Moran, and look terribly useful. But they do not have great entry-level plans for occasional users and the cheapest option is USD19/month, so I have not used them.

Botize might be lesser cocks than IFTTT? Their interface is nicer to my eyes.

Pipedream seems to be Zapier for developers, with low-friction hosting of code.

Pipedream is an integration platform for developers to build and run workflows that integrate apps, data, and APIs — no servers or infrastructure to manage!

- Develop any workflow, based on any trigger.
- Workflows are code, which you can run forfree.
- No server or cloud resources to manage.

In other tools, you typically have to set up infrastructure to process events — typically you set up an HTTP endpoint, then run a script on a container, or have to manage a serverless function. This takes time to write and maintain.

Pipedream is purpose-built for running workflows on event data, so we take care of the infrastructure and boilerplate configuration for you.

**Pipedream lets you focus on** *what* you want done, and we take care of *how* to do it for you.

Microsoft Flow is a commercial entrant in this domain, which your large enterprise might have already licensed.

~~conditionalactionprogrammer has hot AI tech to do automation apparently but clearly no hot AI tech coming up with usable names.~~

- Automatisch - Open Source Zapier Alternative / automatisch/automatisch: The open source Zapier alternative. Build workflow automation without spending time and money.
- n8n is a self-hosted sort-of-open-source IFTTT alternative.
- Huginn is a self-hosted open source IFTTT alternative.
- `nodered`, the IoT graphical flow-based programming system, includes various boilerplate for internet automation. Here, for example, is a chatbot.
- trigger-happy is a self-hosted open source IFTTT alternative. Nifty: it supports pelican.

The internet is full of guides to training neural nets. Here are some selected highlights.

Michael Nielsen has a free online textbook with code examples in Python. Christopher Olah’s visual explanations make many things clear.

Andrej Karpathy’s popular, unromantic, messy guide to training neural nets in practice has a lot of tips that people tend to rediscover the hard way if they do not get them from him. (I did.)

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. … Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier.

- google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
- Making Deep Learning go Brrrr From First Principles
- Monitor & Improve GPU Usage for Model Training on Weights & Biases
- Tracking system resource (GPU, CPU, etc.) utilization during training with the Weights & Biases Dashboard
- Algorithms for Modern Hardware - Algorithmica
- pytorch profilers

I have used

- pytorch
- julia
- jax
- Occasionally, reluctantly, Tensorflow

I could use any of the other autodiff systems, such as…

- Intel’s ngraph, which compiles neural nets especially for CPUs
- Collaboratively build, visualize, and design neural nets in browser
- Theano (Python) (now defunct) was a trailblazer
- Torch (Lua) — in practice deprecated in favour of pytorch
- Caffe was popular for a while; have not seen it recently (MATLAB/Python)
- Paddlepaddle is one of Baidu’s NN properties (Python/C++)
- mindspore is Huawei’s framework based on source-transformation autodiff; it targets interesting edge hardware.
- Minimalist tiny-dnn is a C++11 implementation of certain tools for deep learning. It targets deep learning on limited-compute embedded systems and IoT devices.
- javascript: see javascript machine learning
- julia: various autodiff and full-service ML tools.

See configuring experiments; in practice I use hydra for everything.

Caffe format:

The Caffe Zoo has lots of nice models, pre-trained, on their wiki.

Here’s a great CV one, Andrej Karpathy’s image captioner, Neuraltalk2.

For lasagne: https://github.com/Lasagne/Recipes/tree/master/modelzoo

For Keras:

A lot of the time, managing deep learning is remembering which axis is which. Practically, I have found Einstein convention to solve all my needs.

However, there are alternatives. Alexander Rush argues for NamedTensor. Implementations:

- Native Pytorch
- namedtensor pytorch
- labeledtensor tensorflow
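For instance, a plain `numpy.einsum` call names every axis in the subscript string, so there is no axis-order guessing (a toy sketch; the shapes here are made up):

```python
import numpy as np

# A batch of per-timestep feature vectors times a shared weight matrix.
x = np.random.randn(4, 10, 8)   # (batch, time, feature)
w = np.random.randn(8, 3)       # (feature, output)

# Einstein notation: contract the feature axis, keep batch and time.
y = np.einsum('btf,fo->bto', x, w)

# Equivalent to a matmul over the last axis, but the intent is explicit.
assert y.shape == (4, 10, 3)
assert np.allclose(y, x @ w)
```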

Three wyrd sisters, poster, presentation, and paper, attend upon the ritual for conjuring an academic career. Of these, the poster is the least regarded, but is sometimes necessary for all that.

Get some hot advice from…

- Tullio Rossi, How to design an award-winning conference poster
- Better Posters, a poster critique and tips blog

See PDFs for some useful practical mechanics of design and printing.

Pro-tip: if you are plotting using python, be aware that seaborn has a poster mode for scaling line widths and fonts sensibly.

I do not know if biorender poster is any good, but it was created, AFAICT, by scientific communication experts, unlike many tools in this domain which were created by scientists or graphic designers, or, in the case of Microsoft Powerpoint, a committee containing none of the above.

Scribus is a good open source desktop publisher system (think InDesign, but free, with all the good and bad that this entails.)

The in-built LaTeX renderer does not support big font sizes by default, but one can force that manually by overriding the supplied preamble.

This still doesn’t get you the correct margins, which matters for long equations. For that you need Chloé-Agathe Azencott’s LaTeX geometry hacks. Combining these:

```
\documentclass[$scribus_fontsize$]{extarticle}
\usepackage[left=0cm,top=0cm,right=0cm,bottom=0cm,nohead,nofoot, paperwidth=$scribus_realwidth$pt,paperheight=$scribus_realheight$pt]{geometry}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{xcolor}
\usepackage{fourier}% uses Utopia font for text and math
\begin{document}
{
\fontsize{48pt}{48pt}
\selectfont
\begin{align*}
\mathcal{A}\{c\phi\}(\xi)&= c^2\mathcal{A}\{\phi\}(\xi)\\
\mathcal{A}\{\phi(r t)\}(\xi)&= \frac{1}{r} \mathcal{A}\{\phi\}\left(\frac{\xi}{r}\right)\\
\mathsf{E}\left[ \mathcal{A}\{S_1\phi + S_2\phi'\}(\xi)\right]
&=\mathcal{A}\{\phi\}(\xi)+ \mathcal{A}\{\phi'\}(\xi) \\
\end{align*}
where \(\{S_i\}\) are i.i.d. Rademacher variables.
}
\end{document}
```

Strong Scientific Posters: Presenting science concisely with Bruce Kirchoff

How to create a better research poster in less time (#betterposter Generation 2).

astrobites, Fixing academic posters: the #BetterPoster approach

Francl (2023)

Once touted as an innovative way to present results, sure to become the preferred mode of presentation by top scientists, the poster session quickly devolved into a venue for primarily graduate students and postdocs. Why do we consider a poster to be a second-class presentation? Are we simply stuffy and unwilling to change the nearly two-century-old style of scientific meetings? Or is there something inherent in the poster genre that makes it less attractive to experienced scientists or better suited to trainees? I rummaged through old newsletters and conference proceedings to find out what drove the adoption of the scientific poster session, and what might have led to the narrowing of its pool of presenters.

Francl, Michelle. 2023. “Poster Children.” *Nature Chemistry* 15 (1): 1–2.

Doing stuff on classic HPC clusters.

`Slurm`, `torque`, and `PlatformLSF` all implement a similar API providing concurrency guarantees specified by the famous *Byzantine committee-designed greasy totem pole* priority system.
Empirical observation: the IT department for any given cluster often seems reluctant to document which one they are using.
Typically a campus cluster will come with some gruff example commands that worked for that guy that time, but not much more.
Usually that guy that time was running a molecular simulation package written in some language I have never heard of.
Presumably this is often a combination of the understandable desire not to write documentation, and a kind of availability-through-obscurity demand-management.
They are typically less eager to allocate GPUs, slightly confused by all this modern neural network stuff, and downright flabbergasted bycontainers.

To investigate: apparently there is a modern programmatic API to some of these schedulers called DRMAA (Distributed Resource Management Application API), which allows fairly generic job definitions and which supposedly works on my local cluster, although they have for sure not documented how.

Anyway, here are some methods for getting stuff done that work well for my use-cases, which tend towards statistical inference and neural nets.

## submitit

My current fave for python.

submitit is a recent entrant which programmatically submits jobs from inside Python. It looks like this:

```
import submitit

def add(a, b):
    return a + b

# executor is the submission interface (logs are dumped in the folder)
executor = submitit.AutoExecutor(folder="log_test")
# set timeout in min, and partition for running the job
executor.update_parameters(
    timeout_min=1, slurm_partition="dev",
    tasks_per_node=4  # number of cores
)
job = executor.submit(add, 5, 7)  # will compute add(5, 7)
print(job.job_id)  # ID of your job
output = job.result()  # waits for completion and returns output
assert output == 12  # 5 + 7 = 12... your addition was computed in the cluster
```

The docs could be better. Here are some example pages that show how it goes though:

Here is a pattern I use when running a bunch of experiments via submitit:

```
import bz2
import cloudpickle

job_name = "my_cool_job"

# Phase 1: after submitting, persist the job handles so a later process
# can collect the results even if this one dies.
with bz2.open(job_name + ".job.pkl.bz2", "wb") as f:
    cloudpickle.dump(jobs, f)

# Phase 2 (possibly much later): reload the handles and harvest results.
exp_list = []
# # Optionally append these results to a previous run
# with bz2.open(job_name + ".pkl.bz2", "rb") as f:
#     exp_list.extend(cloudpickle.load(f))
with bz2.open(job_name + ".job.pkl.bz2", "rb") as f:
    jobs = cloudpickle.load(f)
[job.wait() for job in jobs]
fail_ids = [job.job_id for job in jobs if job.state not in ('DONE', 'COMPLETED')]
exp_list.extend([job.result() for job in jobs if job.job_id not in fail_ids])
failures = [job for job in jobs if job.job_id in fail_ids]
if failures:
    print("failures")
    print("===")
    for job in failures:
        print(job.state, job.stderr())
with bz2.open(job_name + ".pkl.bz2", "wb") as f:
    cloudpickle.dump(exp_list, f)
```

Cool feature: the spawning script only needs to survive as long as it takes to put jobs on the queue, and then it can die. Later on we can reload those jobs from disk.

Dask.distributed works well in a multi-machine job on the cluster, apparently, and will even spawn the Slurm job.

Easily distributing a parallel IPython Notebook on a cluster:

Have you ever asked yourself: “Do I want to spend 2 days adjusting this analysis to run on the cluster and wait 2 days for the jobs to finish or do I just run it locally with no extra work and just wait a week.”

Or: ipython-cluster-helper automates that.

“Quickly and easily parallelize Python functions using IPython on a cluster, supporting multiple schedulers. Optimizes IPython defaults to handle larger clusters and simultaneous processes.” […]

ipython-cluster-helper creates a throwaway parallel IPython profile, launches a cluster and returns a view. On program exit it shuts the cluster down and deletes the throwaway profile.

works on Platform LSF, Sun Grid Engine, Torque, SLURM. Strictly python.

Handy if what I am running is many parallel experiments; it includes a parallel job submission system. See hydra ML.

⚠️ seems to be discontinued.

An alternative option for many use cases is test-tube, a “Python library to easily log experiments and parallelize hyperparameter search for neural networks”. AFAICT there is nothing neural-network-specific in this and it will happily schedule a whole bunch of useful types of task, generating the necessary scripts and keeping track of what is going on. This function is not obvious from the front-page description of this software library, but see test-tube/SlurmCluster.md. (Thanks for pointing me to this, Chris Jackett.)

See also DRMAA Python, which is a Python wrapper around the DRMAA API.

Other ones I looked at: Andrea Zonca wrote a script that allows spawning jobs on a cluster from a Jupyter notebook. After several iterations and improvements it is now called batchspawner.

`snakemake` supports `make`-like build workflows for clusters. It seems general and powerful, but complicated.

hanythingondemand provides a set of scripts to easily set up an ad-hoc Hadoop cluster through PBS jobs.

In Julia there is a rather fancy system, JuliaParallel/ClusterManagers.jl, which supports most major HPC job managers automatically.

There is also the bare-bones cth/QsubCmds.jl: Run Julia external (shell) commands on a HPC cluster.

More modern tools facilitate very sophisticated workflows with execution graphs and pipelines and such. One that was briefly pitched to us but that I did not ultimately use: nextflow

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.

Its fluent DSL simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

Nextflow supports Docker and Singularity containers technology.

This, along with the integration of the GitHub code sharing platform, allows you to write self-contained pipelines, manage versions and to rapidly reproduce any former configuration.

It provides out of the box executors for SGE, LSF, SLURM, PBS and HTCondor batch schedulers and for Kubernetes, Amazon AWS and Google Cloud platforms.

I think that we are actually going to be given D2iQ Kaptain: End-to-End Machine Learning Platform instead? TBC.

Things that I think should be noted and filed in an orderly fashion, but which I lack time to address right now. Content will change incessantly.

I need to reclassify the bio computing links; that section has become confusing and there are too many nice ideas there not clearly distinguished.

- Do organizations have to get slower as they grow? (with Alex Komoroske) | Clearer Thinking with Spencer Greenberg
- Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle
- Community Developed Lessons
- carpentries-incubator/ml-python-supervised-learning: Supervised Learning with Python
- Introduction to Machine Learning with Scikit Learn

The Carr–Madan formula is really just a special case of a Taylor expansion. For completeness, let’s rederive the Taylor expansion with an integral remainder.
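My own sketch of both (hedged; check signs against your preferred source). First-order Taylor with integral remainder:\[ f(b) = f(a) + f'(a)(b-a) + \int_a^b f''(t)\,(b-t)\,\mathrm{d}t. \] Taking \(a=\kappa\) for a twice-differentiable payoff \(f\) of a nonnegative price \(S\), and splitting the remainder at \(\kappa\) into put and call payoffs, gives the Carr–Madan decomposition\[ f(S) = f(\kappa) + f'(\kappa)(S-\kappa) + \int_0^{\kappa} f''(K)\,(K-S)^{+}\,\mathrm{d}K + \int_{\kappa}^{\infty} f''(K)\,(S-K)^{+}\,\mathrm{d}K. \]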

When explaining becomes a sin—by Tom Stafford. File under taboos and tetlock and compassion/comprehension.

Karloo Pools and the hidden alternative swimming spots nearby—Walk My World

Cult Classic ’Fight Club’ Gets a Very Different Ending in China

A Turkish Farmer Tests Out VR Goggles on Cows To Get More Milk

How to buy a social network, with Tumblr CEO Matt Mullenweg—The Verge

Fake Feelings—ai emo. When post-hardcore emo band Silverstein… | by Dadabots—Medium

Why the super rich are inevitable

Meanwhile, the richer player will gain money. That’s because, from their perspective, every game they lose means they have an opportunity to win it back—and then some—in the next coin flip. Every game they win means, no matter what happens in the next coin flip, they’ll still be at a net-plus.

Repeat this process millions of times with millions of people, and you’re left with one very rich person.
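That dynamic is easy to simulate. Here is a toy version (my own rules, not necessarily the article’s exact model): random pairs flip a fair coin, and the loser hands a fixed fraction of the *poorer* player’s wealth to the winner.

```python
import random

def simulate(n_agents=200, rounds=20_000, frac=0.2, seed=1):
    """Yard-sale-style exchange: total wealth is conserved, yet it concentrates."""
    rng = random.Random(seed)
    wealth = [100.0] * n_agents
    for _ in range(rounds):
        i, j = rng.sample(range(n_agents), 2)
        stake = frac * min(wealth[i], wealth[j])  # the poorer player caps the bet
        if rng.random() < 0.5:
            wealth[i] += stake
            wealth[j] -= stake
        else:
            wealth[i] -= stake
            wealth[j] += stake
    return wealth

w = simulate()
top_share = sum(sorted(w)[-10:]) / sum(w)  # share held by the richest 5% of agents
```

Even though every individual flip is fair, repeated play concentrates wealth in a few hands.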

Pluralistic: Tiktok’s enshittification (21 Jan 2023) – Pluralistic: Daily links from Cory Doctorow

Getting The Word Out—by Steven Johnson

I wrote about the disappointing—though I suppose not surprising—lack of coverage of the death of Dilip Mahalanabis, the Indian doctor who played a critical role in popularizing Oral Rehydration Therapy, the amazingly simple medical intervention that has saved millions of lives around the world over the past fifty years. I noted that as far as I could tell, no mainstream news organization outside of India had run so much as a brief obituary of Mahalanabis, despite the heroic nature of his initial adoption of ORT in the middle of a refugee crisis in the early 1970s, and the long-term legacy of his work. (The Lancet once called ORT “potentially the most important medical advance of the 20th century”.) …when we talk about the history of innovation, we often over-index on the inventors and underplay the critical role of popularizers, the people who are unusually gifted at making the case for adopting a new innovation, or who have a platform that gives them an unusual amount of influence.

Pluralistic: EU to Facebook, ’Drop Dead’ (07 Dec 2022) – Pluralistic: Daily links from Cory Doctorow

In Which Long-Time Netizen & Programmer-at-Arms Dave Winer Records a Podcast for Me, Personally

Þe Forlorn Hope Þt Was Vox.com, & BRIEFLY NOTED

Against Cop Shit—Jeffrey Moro

DRMacIver’s Notebook: Three key problems with Von-Neumann Morgenstern Utility Theory

The first part is about physical difficulties with measurement—you can only know the probabilities up to some finite precision. VNM theory handwaves this away by saying that the probabilities are perfectly known, but this doesn’t help you, because that just moves the problem to a computational one, and requires you to be able to solve the halting problem: e.g. choose between \(L_1 = p B + (1-p) W\) and \(L_2 = q B + (1-q) W\), where \(p = 0.0\ldots\) until machine \(M_1\) halts and 1 after, and \(q\) is the same but for machine \(M_2\).

The second demonstrates that what you get out of the VNM theorem is not a utility function. It is an algorithm that produces a sequence converging to a utility function, and you cannot recreate even the original decision procedure from that sequence without being able to take the limit (which requires running an infinite computation, again giving you the ability to solve the halting problem) near the boundary.

Supervised Training of Conditional Monge Maps—Apple Machine Learning Research

How To Be an Academic Hyper-Producer—Economics from the Top Down

A global analysis of matches and mismatches between human genetic and linguistic histories—PNAS

Desmos—Let’s learn together; an online graphing calculator

The Cause of Depression Is Probably Not What You Think—Quanta Magazine

What Monks Can Teach Us About Paying Attention—The New Yorker

Actually, Japan has changed a lot—by Noah Smith — Japanese real estate is surprising

One Useful Thing (And Also Some Other Things) | Ethan Mollick—Substack

The radical idea that people aren’t stupid, paired with How to achieve self-control without “self-control”

Colonialism did not cause the Indian famines—History Reclaimed

Erik van Zwet, Shrinkage Trilogy Explainer, on modelling the publication process

Mathematics of the impossible: Computational Complexity—Thoughts

Download the Atkinson Hyperlegible Font—Braille Institute: “What makes it different from traditional typography design is that it focuses on letterform distinction to increase character recognition, ultimately improving readability. We are making it free for anyone to use!”

Low-Rank Approximation Toolbox: Nyström Approximation—Ethan Epperly

-ise or -ize? Is -ize American? (1/3) – Jeremy Butterfield Editorial

Iron deficiencies are very bad and you should treat them—Aceso Under Glass

The Australian academic STEMM workplace post-COVID: a picture of disarray

torchgeo—torchgeo 0.3.1 documentation/microsoft/torchgeo: TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

Merve Emre, Has Academia Ruined Literary Criticism?

Matt Clancy, Age and the Nature of Innovation: “Are there some kinds of discoveries that are easier to make when young, and some that are easier to make when older?”

Tom Stafford, Microarguments and macrodecisions

Kevin Munger, Why I am (Still) a Conservative (For Now)

Kevin Munger, Facebook is Other People

Randy Au, in Data science has a tool obsession, talks about Gear Acquisition Syndrome for data scientists.

Clive Thompson, The Power of Indulging Your Weird, Offbeat Obsessions

omg.lol - A lovable web page and email address, just for you

Donate to a highly effective charity - Effective Altruism Australia. Very poverty-focused, which is important.

What are the best charities to donate to in 2023? · Giving What We Can

karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.

What is the “forward-forward” algorithm, Geoffrey Hinton’s new AI technique?

Simon Willison, AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code

Fission: Build the future of web apps at the edge incubates several decentralized protocols

danah boyd, What if failure is the plan? “I’ve been thinking a lot about failure…”

Mastodon—and the pros and cons of moving beyond Big Tech gatekeepers

Michael Nielsen on science online

Great bloggers are rare, weird, and not team players – Kevin Drum

Swayable: RCTs for marketing campaigns via ingenious audience recruiting network

Zoomers Co-Working Community (co-working for accountability)

Normconf Lightning Talks/Normconf: The Normcore Tech Conference — a conference on the stuff that we actually need to do in ML, as opposed to the stuff we would like to pretend is what we do.

Jean Gallier and Jocelyn Quaintance, Algebra, Topology, Differential Calculus, and Optimization Theory for Computer Science and Machine Learning, 2188 pages as of 2022/10/30, and growing.

Terence Eden, You can have user accounts without needing to manage user accounts

Adam Mastroianni and EJ Ludwin-Peery, Things could be better

Adam Mastroianni, The great myths of political hatred

Big correlations and big interactions ([2105.13445] The piranha problem: Large effects swimming in a small pond)

How to keep cakes moist and cause the greatest tragedies of the 20th century

Distribution testing

GPflow/GeometricKernels: Geometric kernels on manifolds, meshes and graphs

George Ho, How to Improve Your Static Site's Typography (for code formatting)

Invasive Diffusion: How one unwilling illustrator found herself turned into an AI model

Marc ten Bosch, Let's remove Quaternions from every 3D Engine (An Interactive Introduction to Rotors from Geometric Algebra)

Michele Coscia, Meritocracy vs Topocracy

oxcsml/riemannian-score-sde: Score-based generative models for compact manifolds

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

Ti John’s Publications

Starboard, a shareable in-browser notebook that runs Python (!)

Students Are Using AI to Write Their Papers, Because Of Course They Are

Treehugger Introduces a Modern Pyramid of Energy Conservation

Vast.ai “Rent Cloud GPU Servers for Deep Learning and AI”

[Decentralized autonomous organization](https://en.wikipedia.org/wiki/Decentralized_autonomous_organization)

Michael Burnam-Fink, What is Scientific about Data Science?

Christian Lawson-Perfect’s Interesting Esoterica is a collection of weird papers in maths.

Erik Hoel, Why do most popular science books suck?

Étienne Fortier-Dubois, The Vibes Are Off

George Ho, Understanding NUTS and HMC

Gordon Brander, Coevolution creates living complexity

Gordon Brander, Thinking together

Kate Mannell and Eden T. Smith, Alternative Social Media and the Complexities of a More Participatory Culture: A View From Scuttlebutt

Peter Woit, Symmetry and Physics

Rob J Hyndman, We need more open data in Australia

Vicki Boykis, How I learn machine learning

Oshan Jarow, Markets Underinvest In Vitality

Spirals of Delusion: How AI Distorts Decision-Making and Makes Dictators More Dangerous (not convinced tbh)

Erik Hoel, The gossip trap

The Developer Certificate of Origin is a great alternative to a CLA

I. Risk Management Foundations - Machine Learning for Financial Risk Management with Python [Book]

jkbren/einet: Uncertainty and causal emergence in complex networks

Darren Wilkinson’s Bayesian inference for a logistic regression model, parts 1, 2, 3, 4, 5

Book Review: Public Choice Theory And The Illusion Of Grand Strategy

Stephen Malina — Deriving the front-door criterion with the do-calculus

Census is a tool which links all the weird different data storage systems and CRM stuff

Michael Lewis podcast on illegible experts

Nemanja Rakicevic, NeurIPS Conference: Historical Data Analysis

Yanir Seroussi, The mission matters: Moving to climate tech as a data scientist

Keir Bradwell, #1: In-group Cheems

Samuel Moore, Why open science is primarily a labour issue

Adam Mastroianni, Against All Applications

Have The Effective Altruists And Rationalists Brainwashed Me?

Anthony Lee Zhang, The War for Eyeballs

Digital artists’ post-bubble hopes for NFTs don’t need a blockchain

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

I Do Not Think It Means What You Think It Means: Artificial Intelligence, Cognitive Work & Scale

ClearerThinking.org’s courses, e.g.

- Introduction to Decision Academy: The Science of Better Decisions
- Rhetorical Fallacies: Dodging Argument Traps
- Learning from Mistakes: A Systematic Approach
- Probabilistic Fallacies: Gauging the Strength of Evidence
- Explanation Freeze: Interpreting Uncertain Events
- Aspire: A Tool to Help You Improve Your Life
- The Sunk Cost Fallacy: Focusing on the Future

Reddit for AI-generated and manipulated content

PJ Vogt, Selling Drugs to Buy Crypto

Michele Coscia, Pearson Correlations for Networks

The DAIR Institute “The Distributed AI Research Institute is a space for independent, community-rooted AI research, free from Big Tech’s pervasive influence.”

Machine Learning Trick of the Day (1): Replica Trick— Shakir Mohammed

Machine Learning Trick of the Day (7): Density Ratio Trick— Shakir Mohammed

ApplyingML - Papers, Guides, and Interviews with ML practitioners

Ryan Broderick, We were the unpaid janitors of a bloated tech monopoly

fastdownload: the magic behind one of the famous 4 lines of code · fast.ai

Schneier, When AIs Start Hacking

Multimodal Neurons in Artificial Neural Networks/Distill version of Multimodal Neurons in Artificial Neural Networks

On the Generalization Ability of Online Strongly Convex Programming Algorithms

Bookmarked but where will they ever go?

Dispel your justification-monkey with a “HWA!” - Malcolm Ocean

Roger’s Bacon, Living and Dying with a Mad God

washable & breathable flexiOH cast adapts to the patient’s skin

‘We can continue Pratchett’s efforts’: the gamers keeping Discworld alive

AO3’s 15-year journey from blog post to fanfiction powerhouse - The Verge

Today I took a desk lamp whose halogen light had burned out, whose crappy transformer always made those bulbs sputter, and whose mildly art-deco appearance I’d always liked, and swapped it over to run an LED bulb off USB power. It took about an hour’s work to replace the light with an LED and the switch with a nice heavy clicky one, and now the whole thing runs off USB-C instead of wall voltage. It emits no appreciable heat, and if these calculations are to be believed, will run for decades for a few cents per year, assuming I leave it on all the time.

I hadn’t really appreciated how big a deal USB-PD voltage negotiation was until I found out that the little chips that handle that negotiation are about the size of the end of a pencil, and that, if you include the USB-C port, you can replace basically any low-voltage transformer with something smaller than a quarter.

The magic search string, if you want to try this yourself, is “usb-pd trigger module”.

vscode-paste-image/README.md at master · mushanshitiancai/vscode-paste-image

mhoye/awesome-falsehood: 😱 Falsehoods Programmers Believe in

Gary Brecher, The War Nerd: Taiwan — The Thucydides Trapper Who Cried Woof

Evidence of Fraud in an Influential Field Experiment About Dishonesty. Looks bad for Dan Ariely. Damn.

on programming humans (Amir’s work)

Communications' digital initiative and its first digital event

Playable Half Earth Socialism simulator

flatmax/vector-synth: Old 2002 era vector synth code based on XFig

Nick Chater, Would you Stand Up to An Oppressive Regime?

Lambda School’s Job Placement Rate May Be Far Worse Than Advertised

I would like to read the diaries of Usama ibn Munqidh

The latest target of China’s tech regulation blitz: algorithms

State Power and the Power Law,State Power and the Power Law 2

Yuling Yao, The likelihood principle in model check and model evaluation: “We are (only) interested in estimating an unknown parameter \(\theta\), and there are two data generating experiments both involving \(\theta\) with observable outcomes \(y_1\) and \(y_2\) and likelihoods \(p_1\left(y_1 \mid \theta\right)\) and \(p_2\left(y_2 \mid \theta\right)\). If the outcome-experiment pair satisfies \(p_1\left(y_1 \mid \theta\right) \propto p_2\left(y_2 \mid \theta\right)\) (viewed as a function of \(\theta\)), then these two experiments and two observations will provide the same amount of information about \(\theta\).”

Liquid Information Flow Control, a confidential computing DSL

Jag Bhalla, Vaccine Greed: Capitalism Without Competition Isn’t Capitalism, It’s Exploitation

Kostas Kiriakakis, A Day At The Park

By analyzing medical text and extracting biomedical entities and relations from the entire history of published medical science, Xyla can facilitate better real-world evidence-based clinical decision support and help make clinical research—such as research into new treatments, including de novo drug design as well as the repurposing of existing drugs—smarter and faster. In so doing, Xyla is fulfilling its mission of organizing the world’s medical knowledge and making it more useful.

My2050 calculator - create your pathway for the UK to be net zero by 2050

Is Pandemic Stress to Blame for the Rise in Traffic Deaths? Nope apparently it is decreased congestion making drivers drive faster on shit roads.

Marisa Abrajano has a provocative list of research topics. I would like to read the work to see her methodology.

Do normal people need to know or care about “the metaverse”?

Apple acquires song-shifting startup AI Music, here’s what it could mean for users

Black Americans are pessimistic about their position in U.S. society

Smart technologies | Internet Policy Review

Speaking of ‘smart’ technologies we may avoid the mysticism of terms like ‘artificial intelligence’ (AI). To situate ‘smartness’ I nevertheless explore the origins of smart technologies in the research domains of AI and cybernetics. Based in postphenomenological philosophy of technology and embodied cognition rather than media studies and science and technology studies (STS), the article entails a relational and ecological understanding of the constitutive relationship between humans and technologies, requiring us to take seriously their affordances as well as the research domain of computer science. To this end I distinguish three levels of smartness, depending on the extent to which they can respond to their environment without human intervention: logic-based, grounded in machine learning or in multi-agent systems. I discuss these levels of smartness in terms of machine agency to distinguish the nature of their behaviour from both human agency and from technologies considered dumb. Finally, I discuss the political economy of smart technologies in light of the manipulation they enable when those targeted cannot foresee how they are being profiled.

Concurrent programming, with examples

Mention concurrency and you’re bound to get two kinds of unsolicited advice: first that it’s a nightmarish problem which will melt your brain, and second that there’s a magical programming language or niche paradigm which will make all your problems disappear.

We won’t run to either extreme here. Instead we’ll cover the production workhorses for concurrent software – threading and locking – and learn about them through a series of interesting programs. By the end of this article you’ll know the terminology and patterns used by POSIX threads (pthreads).

A study of lights at night suggests dictators lie about economic growth

DIY Collective Embeds Abortion Pill Onto Business Cards, Distributes Them At Hacker Conference

Penny Wyatt, Developer Innovation and the Free Puppy

Elizabeth Van Nostrand, A Quick Look At 20% Time

Chalk is a non-terrible calculator for macOS, incorporating useful things like matrices and bitwise ops

- Tutorial introductions
- As least-squares
- Going nonlinear
- Monte Carlo moves in the ensemble
- Managing overconfidence
- Ensemble methods in smoothing
- System identification in
- Theoretical basis for probabilists
- Lanczos trick in precision estimates
- Relation to particle filters
- Schilling’s filter
- Hutchinson trace estimator for
- Likelihood in
- Incoming
- References

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

A random-sampling variant/generalisation of the Kalman-Bucy filter. That description also fits particle filters, but the randomisation here is different from those; we can do both types of randomisation. This variant has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. It is a popular data assimilation method for spatiotemporal models.

Katzfuss, Stroud, and Wikle (2016); Roth et al. (2017); Fearnhead and Künsch (2018) are all pretty good. Schillings and Stuart (2017) has been recommended by Haber, Lucka, and Ruthotto (2018) as the canonical modern version. Wikle and Berliner (2007) present a broad data assimilation context for these methods, although it is too curt to be helpful for me. Mandel (2009) is helpfully longer. The inventor of the method explains it in Geir Evensen (2003), but I could make neither head nor tail of that, since it uses too much oceanography terminology. Roth et al. (2017) is probably the best for my background. Let us copy their notation.

We start from the discrete-time state-space models; the basic one is the linear system\[ \begin{aligned} x_{k+1} &=F x_{k}+G v_{k}, \\ y_{k} &=H x_{k}+e_{k}, \end{aligned} \] with state \(x\in\mathbb{R}^n\) and measurement \(y\in\mathbb{R}^m\). The initial state \(x_{0}\), the process noise \(v_{k}\), and the measurement noise \(e_{k}\) are mutually independent, such that\[\begin{aligned} \Ex x_{0}&=\hat{x}_{0}\\ \Ex v_{k}&=0\\ \Ex e_{k}&=0\\ \cov x_{0} &=P_{0}\\ \cov v_{k} & =Q\\ \cov e_{k}&=R \end{aligned}\] and all are Gaussian.

The Kalman filter propagates state estimates \(\hat{x}_{k \mid k}\) and covariance matrices \(P_{k \mid k}\) for this model.
The KF *time update*, also called the *prediction* or *forecast* step, is\[
\begin{aligned}
&\hat{x}_{k+1 \mid k}=F \hat{x}_{k \mid k} \\
&P_{k+1 \mid k}=F P_{k \mid k} F^{\top}+G Q G^{\top}
\end{aligned}
\]
We predict the observations forward using these state estimates via\[
\begin{aligned}
\hat{y}_{k \mid k-1} &=H \hat{x}_{k \mid k-1}, \\
S_{k} &=H P_{k \mid k-1} H^{\top}+R .
\end{aligned}
\]
Given these and an actual observation, we update the state estimates using a *gain matrix* \(K_{k}\):\[
\begin{aligned}
\hat{x}_{k \mid k} &=\hat{x}_{k \mid k-1}+K_{k}\left(y_{k}-\hat{y}_{k \mid k-1}\right) \\
&=\left(I-K_{k} H\right) \hat{x}_{k \mid k-1}+K_{k} y_{k}, \\
P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1}\left(I-K_{k} H\right)^{\top}+K_{k} R K_{k}^{\top}.
\end{aligned}
\]
in what geoscience types refer to as an *analysis* update.
The variance-minimising gain is given by\[
K_{k}=P_{k \mid k-1} H^{\top} S_{k}^{-1}=M_{k} S_{k}^{-1},
\]
where \(M_{k}\) is the cross-covariance between the state and output predictions.
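For concreteness, the predict/update cycle above fits in a few lines of numpy (a toy sketch of the equations, not taken from any of the cited packages):

```python
import numpy as np

def kf_step(x_hat, P, y, F, G, H, Q, R):
    """One forecast + analysis cycle of the classic Kalman filter, in the notation above."""
    # time update / forecast
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # predicted observation and its covariance
    y_pred = H @ x_pred
    S = H @ P_pred @ H.T + R
    # variance-minimising gain, then the analysis update (Joseph form)
    K = P_pred @ H.T @ np.linalg.inv(S)
    I_KH = np.eye(len(x_hat)) - K @ H
    x_new = x_pred + K @ (y - y_pred)
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new

# scalar sanity check: F = G = H = 1, Q = 0, R = 1, prior N(0, 1), observe y = 2;
# the gain is 0.5, so the estimate moves to 1.0 and the variance shrinks to 0.5
eye = np.array([[1.0]])
x_new, P_new = kf_step(np.array([0.0]), eye, np.array([2.0]),
                       eye, eye, eye, np.array([[0.0]]), eye)
```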

In the Ensemble Kalman filter, we approximate some of these quantities of interest using samples; this allows us to relax the assumption of Gaussianity and gets us computational savings in certain problems of interest. That does sound very similar to particle filters, and indeed there is a relation.

Instead of maintaining the \(n\)-dimensional estimate \(\hat{x}_{k \mid k}\) and the \(n \times n\) covariance \(P_{k \mid k}\) as such, we maintain an ensemble of \(N<n\) sampled state realizations\[X_{k}:=\left[x_{k}^{(i)}\right]_{i=1}^{N}.\]
This notation is intended to imply that we are treating these realisations as an \(n \times N\) matrix \(X_{k \mid k}\) with columns \(x_{k}^{(i)}\).
We introduce the following notation for ensemble moments:\[
\begin{aligned}
&\bar{x}_{k \mid k}=\frac{1}{N} X_{k \mid k} \one \\
&\bar{P}_{k \mid k}=\frac{1}{N-1} \widetilde{X}_{k \mid k} \widetilde{X}_{k \mid k}^{\top},
\end{aligned}
\]
where \(\one=[1, \ldots, 1]^{\top}\) is an \(N\)-dimensional vector and\[
\widetilde{X}_{k \mid k}=X_{k \mid k}-\bar{x}_{k \mid k} \one^{\top}=X_{k \mid k}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right)
\]
is an ensemble of *anomalies*/*deviations* from \(\bar{x}_{k \mid k}\), which I would call the *centred version*.
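A quick numerical check that the projection form of the centring above really is plain mean-subtraction, and that the anomalies reproduce the ensemble covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 8
X = rng.normal(size=(n, N))                   # ensemble: one column per member
one = np.ones((N, 1))

x_bar = (X @ one) / N                          # ensemble mean as an (n, 1) column
X_tilde = X @ (np.eye(N) - one @ one.T / N)    # centring via the projection matrix

assert np.allclose(X_tilde, X - x_bar @ one.T)  # same as subtracting the mean
P_bar = X_tilde @ X_tilde.T / (N - 1)
assert np.allclose(P_bar, np.cov(X))            # matches the sample covariance
```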
We attempt to match the moments of the ensemble with those realised by a true Kalman filter, in the sense that\[
\begin{aligned}
&\bar{x}_{k \mid k}:=\frac{1}{N} \sum_{i=1}^{N} x_{k}^{(i)} \approx \hat{x}_{k \mid k}, \\
&\bar{P}_{k \mid k}:=\frac{1}{N-1} \sum_{i=1}^{N}\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)^{\top} \approx P_{k \mid k} .
\end{aligned}
\]
The forecast step computes \(X_{k+1 \mid k}\) such that its moments are close to \(\hat{x}_{k+1 \mid k}\) and \(P_{k+1 \mid k}\).
An ensemble of \(N\) independent process noise realizations \(V_{k}:=\left[v_{k}^{(i)}\right]_{i=1}^{N}\), with zero mean and covariance \(Q\), is used in\[
X_{k+1 \mid k}=F X_{k \mid k}+G V_{k}.
\]

Next, \(X_{k \mid k-1}\) is adjusted to obtain the filtering ensemble \(X_{k \mid k}\) by applying an update to each ensemble member: with some gain matrix \(\bar{K}_{k}\), the KF update is applied to the ensemble via\[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top} . \] This does not yet reproduce the full Kalman covariance update — there is no term \(\bar{K}_{k} R \bar{K}_{k}^{\top}\); we have a choice of how to account for it.

In the stochastic method, we use artificial zero-mean measurement noise realizations \(E_{k}:=\left[e_{k}^{(i)}\right]_{i=1}^{N}\) with covariance \(R\):\[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top}-\bar{K}_{k} E_{k} . \] The resulting \(X_{k \mid k}\) has the correct ensemble mean and covariance, \(\hat{x}_{k \mid k}\) and \(P_{k \mid k}\), in expectation.

If we define a predicted output ensemble\[ Y_{k \mid k-1}=H X_{k \mid k-1}+E_{k} \] that evokes the classic Kalman quantities \(\hat{y}_{k \mid k-1}\) and \(S_{k}\), we can rewrite this update into one that resembles the Kalman update:\[ X_{k \mid k}=X_{k \mid k-1}+\bar{K}_{k}\left(y_{k} \one^{\top}-Y_{k \mid k-1}\right) . \]

Now, the gain matrix \(\bar{K}_{k}\) in the classic KF is computed from the covariance matrices of the predicted state and output. In the EnKF, the required \(M_{k}\) and \(S_{k}\) must be estimated from the prediction ensembles. The obvious way of doing that is to once again centre the ensembles,\[ \begin{aligned} &\widetilde{X}_{k \mid k-1}=X_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \\ &\widetilde{Y}_{k \mid k-1}=Y_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \end{aligned} \] and use the empirical ensemble covariances\[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}, \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} . \end{aligned} \] The gain \(\bar{K}_{k}\) is then the solution to the system of linear equations\[ \bar{K}_{k} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}=\widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} \]
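A minimal numpy sketch of this stochastic analysis step (perturbed observations, gain obtained by solving the linear system above rather than inverting; toy code, not from a data assimilation library):

```python
import numpy as np

def enkf_update(X_pred, y, H, R, rng):
    """Stochastic EnKF analysis: perturb observations, estimate the gain from ensembles."""
    n, N = X_pred.shape
    E = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T  # noise ensemble, (m, N)
    Y_pred = H @ X_pred + E                                      # predicted output ensemble
    Xt = X_pred - X_pred.mean(axis=1, keepdims=True)             # centred state ensemble
    Yt = Y_pred - Y_pred.mean(axis=1, keepdims=True)             # centred output ensemble
    # solve K (Yt Yt^T) = Xt Yt^T for the gain
    K = np.linalg.solve(Yt @ Yt.T, Yt @ Xt.T).T
    return X_pred + K @ (y[:, None] - Y_pred)

# scalar check: prior ensemble ~ N(0, 1), observe y = 1 with R = 1;
# the posterior ensemble mean should land near the exact KF answer of 0.5
rng = np.random.default_rng(1)
X_post = enkf_update(rng.normal(size=(1, 500)), np.array([1.0]),
                     np.eye(1), np.eye(1), rng)
```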

A resemblance to unscented/sigma-point filtering is also apparent. TBD.

Since the additive measurement noise \(e_{k}\) is independent of the state, it should not affect the cross-covariance \(M_k\). Thus it is reasonable to make the substitution\[ \widetilde{Y}_{k \mid k-1}\longrightarrow \widetilde{Z}_{k \mid k-1}=H \widetilde{X}_{k \mid k-1} \] to get a less noisy update\[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top} \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Z}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top}+R \end{aligned} \] The Kalman gain \(\bar{K}_{k}\) is then computed as in the KF. Alternatively, we can take a matrix square root \(R^{\frac{1}{2}}\) with \(R^{\frac{1}{2}} R^{\frac{\top}{2}}=R\) and factorize\[ \bar{S}_{k}=\left[\begin{array}{cc}\frac{1}{\sqrt{N-1}} \widetilde{Z}_{k \mid k-1}\quad R^{\frac{1}{2}}\end{array}\right] \left[\begin{array}{c}\frac{1}{\sqrt{N-1}} \widetilde{Z}^{\top}_{k \mid k-1} \\ R^{\frac{\top}{2}}\end{array}\right]. \]

TBD: the EAKF and ETKF (Tippett et al. 2003), which deterministically propagate an estimate\[ P_{k \mid k}^{\frac{1}{2}} P_{k \mid k}^{\frac{\top}{2}}=P_{k \mid k} \] and so introduce less sampling noise. Roth et al. (2017) explain it as rewriting the measurement update to use a square root \(P_{k \mid k-1}^{\frac{1}{2}}\), and in particular the ensemble approximation \(\frac{1}{\sqrt{N-1}} \widetilde{X}_{k \mid k-1}\):\[ \begin{aligned} P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1} \\ &=P_{k \mid k-1}^{\frac{1}{2}}\left(I-P_{k \mid k-1}^{\frac{\top}{2}} H^{\top} S_{k}^{-1} H P_{k \mid k-1}^{\frac{1}{2}}\right) P_{k \mid k-1}^{\frac{\top}{2}} \\ & \approx \frac{1}{N-1} \widetilde{X}_{k \mid k-1}\left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right) \widetilde{X}_{k \mid k-1}^{\top}. \end{aligned} \] Factorising,\[ \left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right)=\Pi_{k}^{\frac{1}{2}} \Pi_{k}^{\frac{\top}{2}}, \] the \(\Pi_{k}^{\frac{1}{2}}\in\mathbb{R}^{N\times N}\) can be used to create a deviation ensemble\[ \tilde{X}_{k \mid k}=\tilde{X}_{k \mid k-1} \Pi_{k}^{\frac{1}{2}} \] that correctly encodes \(P_{k \mid k}\) without using random perturbations. The actual filtering is achieved by updating each sample according to\[ \bar{x}_{k \mid k}=\left(I-\bar{K}_{k} H\right) F \bar{x}_{k-1 \mid k-1}+\bar{K}_{k} y_{k}, \] where \(\bar{K}_{k}\) is computed from the deviation ensembles.
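A numerical sanity check of the square-root construction (taking a symmetric eigendecomposition for \(\Pi_{k}^{\frac{1}{2}}\), which is one of several valid square roots): the deterministic anomalies should encode the same posterior covariance as the gain form \(\left(I-\bar{K}_{k} H\right) P_{k \mid k-1}\).

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, N = 5, 2, 8
Xt = rng.normal(size=(n, N))
Xt -= Xt.mean(axis=1, keepdims=True)        # prediction anomalies
H = rng.normal(size=(m, n))
R = np.eye(m)

Zt = H @ Xt
S_bar = Zt @ Zt.T / (N - 1) + R             # ensemble innovation covariance
M_bar = Xt @ Zt.T / (N - 1)                 # ensemble cross-covariance

# factorise I - Zt^T S^{-1} Zt / (N-1) = Pi^{1/2} Pi^{T/2} via eigh
A = np.eye(N) - Zt.T @ np.linalg.solve(S_bar, Zt) / (N - 1)
w, V = np.linalg.eigh(A)
Pi_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

Xt_post = Xt @ Pi_half                      # deterministic analysis anomalies
P_post = Xt_post @ Xt_post.T / (N - 1)

# compare with the covariance implied by the gain form (I - KH) P_pred
P_direct = Xt @ Xt.T / (N - 1) - M_bar @ np.linalg.solve(S_bar, M_bar.T)
assert np.allclose(P_post, P_direct)
```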

TBD. Permits calculating the operations without forming covariance matrices.

TBD

The ensemble is rank deficient. Question: When can we sample other states from the ensemble to improve the rank by stationary posterior moves?

TBD

Katzfuss, Stroud, and Wikle (2016) claim there are two major approaches to smoothing: Stroud et al. (2010)-type reverse methods, and the EnKS (Geir Evensen and van Leeuwen 2000), which augments the states with lagged copies rather than doing a reverse pass.

Here are some other papers I saw: N. K. Chada, Chen, and Sanz-Alonso (2021); Luo et al. (2015); White (2018); Zhang et al. (2018).

Can we use ensemble methods for online parameter estimation? Apparently: G. Evensen (2009); Malartic, Farchi, and Bocquet (2021); Moradkhani et al. (2005); Fearnhead and Künsch (2018).

Bishop and Del Moral (2020);Del Moral, Kurtzmann, and Tugaut (2017);Garbuno-Inigo et al. (2020);Kelly, Law, and Stuart (2014);Le Gland, Monbet, and Tran (2009);Taghvaei and Mehta (2019).

Intimate. See particle filters.

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant version which looks somehow more general than the original but also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.

Shakir Mohamed mentions Hutchinson’s Trick; he was introduced to it, as I was, by Dr Maurizio Filippone. This trick also works efficiently with the ensemble Kalman filter, where the randomised products are cheap.

TBD

- DART | The Data Assimilation Research Testbed (Fortran, …matlab?) has nice tutorials, e.g. the DART Tutorial
- OpenDA: Integrating models and observations (python and c++?)

Alsup, Terrence, Luca Venturi, and Benjamin Peherstorfer. 2022. “Multilevel Stein Variational Gradient Descent with Applications to Bayesian Inverse Problems.” In *Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference*, 93–117. PMLR.

Alzraiee, Ayman H., Jeremy T. White, Matthew J. Knowling, Randall J. Hunt, and Michael N. Fienen. 2022. “A Scalable Model-Independent Iterative Data Assimilation Tool for Sequential and Batch Estimation of High Dimensional Model Parameters and States.” *Environmental Modelling & Software* 150 (April): 105284.

Anderson, Jeffrey L. 2007. “Exploring the Need for Localization in Ensemble Data Assimilation Using a Hierarchical Ensemble Filter.” *Physica D: Nonlinear Phenomena*, Data Assimilation, 230 (1): 99–111.

———. 2009. “Ensemble Kalman Filters for Large Geophysical Applications.” *IEEE Control Systems Magazine* 29 (3): 66–82.

Anderson, Jeffrey, Tim Hoar, Kevin Raeder, Hui Liu, Nancy Collins, Ryan Torn, and Avelino Avellano. 2009. “The Data Assimilation Research Testbed: A Community Facility.” *Bulletin of the American Meteorological Society* 90 (9): 1283–96.

Bickel, Peter J., and Elizaveta Levina. 2008. “Regularized Estimation of Large Covariance Matrices.” *The Annals of Statistics* 36 (1): 199–227.

Bishop, Adrian N., and Pierre Del Moral. 2020. “On the Mathematical Theory of Ensemble (Linear-Gaussian) Kalman-Bucy Filtering.” *arXiv:2006.08843 [Math, Stat]*, June.

Bocquet, Marc, Carlos A. Pires, and Lin Wu. 2010. “Beyond Gaussian Statistical Modeling in Geophysical Data Assimilation.” *Monthly Weather Review* 138 (8): 2997–3023.

Chada, Neil K., Yuming Chen, and Daniel Sanz-Alonso. 2021. “Iterative Ensemble Kalman Methods: A Unified Perspective with Some New Variants.” *Foundations of Data Science* 3 (3): 331.

Chada, Neil, and Xin Tong. 2022. “Convergence Acceleration of Ensemble Kalman Inversion in Nonlinear Settings.” *Mathematics of Computation* 91 (335): 1247–80.

Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022. “A Novel Neural Network Training Framework with Data Assimilation.” *The Journal of Supercomputing*, June.

Chen, Yuming, Daniel Sanz-Alonso, and Rebecca Willett. 2021. “Auto-Differentiable Ensemble Kalman Filters.” *arXiv:2107.07687 [Cs, Stat]*, July.

Del Moral, P., A. Kurtzmann, and J. Tugaut. 2017. “On the Stability and the Uniform Propagation of Chaos of a Class of Extended Ensemble Kalman-Bucy Filters.” *SIAM Journal on Control and Optimization* 55 (1): 119–55.

Dubrule, Olivier. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In *Handbook of Mathematical Geosciences: Fifty Years of IAMG*, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.

Duffin, Connor, Edward Cripps, Thomas Stemler, and Mark Girolami. 2021. “Statistical Finite Elements for Misspecified Models.” *Proceedings of the National Academy of Sciences* 118 (2).

Dunbar, Oliver R. A., Andrew B. Duncan, Andrew M. Stuart, and Marie-Therese Wolfram. 2022. “Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods.” *SIAM Journal on Applied Dynamical Systems* 21 (2): 1539–72.

Evensen, G. 2009. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” *IEEE Control Systems* 29 (3): 83–104.

Evensen, Geir. 2003. “The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation.” *Ocean Dynamics* 53 (4): 343–67.

———. 2004. “Sampling Strategies and Square Root Analysis Schemes for the EnKF.” *Ocean Dynamics* 54 (6): 539–60.

———. 2009. *Data Assimilation - The Ensemble Kalman Filter*. Berlin; Heidelberg: Springer.

Evensen, Geir, and Peter Jan van Leeuwen. 2000. “An Ensemble Kalman Smoother for Nonlinear Dynamics.” *Monthly Weather Review* 128 (6): 1852–67.

Fearnhead, Paul, and Hans R. Künsch. 2018. “Particle Filters and Data Assimilation.” *Annual Review of Statistics and Its Application* 5 (1): 421–49.

Finn, Tobias Sebastian, Gernot Geppert, and Felix Ament. 2021. “Ensemble-Based Data Assimilation of Atmospheric Boundary Layer Observations Improves the Soil Moisture Analysis.” Preprint. Catchment hydrology/Modelling approaches.

Furrer, R., and T. Bengtsson. 2007. “Estimation of High-Dimensional Prior and Posterior Covariance Matrices in Kalman Filter Variants.” *Journal of Multivariate Analysis* 98 (2): 227–55.

Furrer, Reinhard, Marc G Genton, and Douglas Nychka. 2006. “Covariance Tapering for Interpolation of Large Spatial Datasets.” *Journal of Computational and Graphical Statistics* 15 (3): 502–23.

Galy-Fajou, Théo, Valerio Perrone, and Manfred Opper. 2021. “Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation.” *Entropy* 23 (8): 990.

Garbuno-Inigo, Alfredo, Franca Hoffmann, Wuchen Li, and Andrew M. Stuart. 2020. “Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.” *SIAM Journal on Applied Dynamical Systems* 19 (1): 412–41.

Grooms, Ian, and Gregor Robinson. 2021. “A Hybrid Particle-Ensemble Kalman Filter for Problems with Medium Nonlinearity.” *PLOS ONE* 16 (3): e0248266.

Guth, Philipp A., Claudia Schillings, and Simon Weissmann. 2020. “Ensemble Kalman Filter for Neural Network Based One-Shot Inversion.” arXiv.

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018. “Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.” *arXiv:1805.08034 [Cs, Math]*, May.

Hou, Elizabeth, Earl Lawrence, and Alfred O. Hero. 2016. “Penalized Ensemble Kalman Filters for High Dimensional Non-Linear Systems.” *arXiv:1610.00195 [Physics, Stat]*, October.

Houtekamer, P. L., and Herschel L. Mitchell. 2001. “A Sequential Ensemble Kalman Filter for Atmospheric Data Assimilation.” *Monthly Weather Review* 129 (1): 123–37.

Houtekamer, P. L., and Fuqing Zhang. 2016. “Review of the Ensemble Kalman Filter for Atmospheric Data Assimilation.” *Monthly Weather Review* 144 (12): 4489–4532.

Huang, Daniel Zhengyu, Tapio Schneider, and Andrew M. Stuart. 2022. “Iterated Kalman Methodology for Inverse Problems.” *Journal of Computational Physics* 463 (August): 111262.

Kantas, Nikolas, Arnaud Doucet, Sumeetpal S. Singh, Jan Maciejowski, and Nicolas Chopin. 2015. “On Particle Methods for Parameter Estimation in State-Space Models.” *Statistical Science* 30 (3): 328–51.

Katzfuss, Matthias, Jonathan R. Stroud, and Christopher K. Wikle. 2016. “Understanding the Ensemble Kalman Filter.” *The American Statistician* 70 (4): 350–57.

Kelly, D. T. B., K. J. H. Law, and A. M. Stuart. 2014. “Well-Posedness and Accuracy of the Ensemble Kalman Filter in Discrete and Continuous Time.” *Nonlinearity* 27 (10): 2579.

Kovachki, Nikola B., and Andrew M. Stuart. 2019. “Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.” *Inverse Problems* 35 (9): 095005.

Kuzin, Danil, Le Yang, Olga Isupova, and Lyudmila Mihaylova. 2018. “Ensemble Kalman Filtering for Online Gaussian Process Regression and Learning.” *2018 21st International Conference on Information Fusion (FUSION)*, July, 39–46.

Lakshmivarahan, S., and David J. Stensrud. 2009. “Ensemble Kalman Filter.” *IEEE Control Systems Magazine* 29 (3): 34–46.

Law, Kody J. H., Hamidou Tembine, and Raul Tempone. 2016. “Deterministic Mean-Field Ensemble Kalman Filtering.” *SIAM Journal on Scientific Computing* 38 (3).

Le Gland, François, Valerie Monbet, and Vu-Duc Tran. 2009. “Large Sample Asymptotics for the Ensemble Kalman Filter,” 25.

Lei, Jing, Peter Bickel, and Chris Snyder. 2009. “Comparison of Ensemble Kalman Filters Under Non-Gaussianity.” *Monthly Weather Review* 138 (4): 1293–1306.

Luo, Xiaodong, Andreas S. Stordal, Rolf J. Lorentzen, and Geir Nævdal. 2015. “Iterative Ensemble Smoother as an Approximate Solution to a Regularized Minimum-Average-Cost Problem: Theory and Applications.” *SPE Journal* 20 (05): 962–82.

Malartic, Quentin, Alban Farchi, and Marc Bocquet. 2021. “State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.” *arXiv:2107.11253 [Nlin, Physics:physics, Stat]*, July.

Mandel, Jan. 2009. “A Brief Tutorial on the Ensemble Kalman Filter.” *arXiv:0901.3725 [Physics]*, January.

Mitchell, Herschel L., and P. L. Houtekamer. 2000. “An Adaptive Ensemble Kalman Filter.” *Monthly Weather Review* 128 (2): 416.

Moradkhani, Hamid, Soroosh Sorooshian, Hoshin V. Gupta, and Paul R. Houser. 2005. “Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter.” *Advances in Water Resources* 28 (2): 135–47.

Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “Constant-Time Predictive Distributions for Gaussian Processes.” arXiv.

Popov, Andrey Anatoliyevich. 2022. “Combining Data-Driven and Theory-Guided Models in Ensemble Data Assimilation.” ETD. Virginia Tech.

Reich, Sebastian, and Simon Weissmann. 2019. “Fokker-Planck Particle Systems for Bayesian Inference: Computational Approaches,” November.

Roth, Michael, Gustaf Hendeby, Carsten Fritsche, and Fredrik Gustafsson. 2017. “The Ensemble Kalman Filter: A Signal Processing Perspective.” *EURASIP Journal on Advances in Signal Processing* 2017 (1): 56.

Sainsbury-Dale, Matthew, Andrew Zammit-Mangion, and Raphaël Huser. 2022. “Fast Optimal Estimation with Intractable Models Using Permutation-Invariant Neural Networks.” arXiv.

Schillings, Claudia, and Andrew M. Stuart. 2017. “Analysis of the Ensemble Kalman Filter for Inverse Problems.” *SIAM Journal on Numerical Analysis* 55 (3): 1264–90.

Schneider, Tapio, Andrew M. Stuart, and Jin-Long Wu. 2022. “Ensemble Kalman Inversion for Sparse Learning of Dynamical Systems from Time-Averaged Data.” *Journal of Computational Physics* 470 (December): 111559.

Shumway, Robert H., and David S. Stoffer. 2011. *Time Series Analysis and Its Applications*. Springer Texts in Statistics. New York, NY: Springer New York.

Spantini, Alessio, Ricardo Baptista, and Youssef Marzouk. 2022. “Coupling Techniques for Nonlinear Ensemble Filtering.” *SIAM Review* 64 (4): 921–53.

Stordal, Andreas S., Rafael J. Moraes, Patrick N. Raanes, and Geir Evensen. 2021. “P-Kernel Stein Variational Gradient Descent for Data Assimilation and History Matching.” *Mathematical Geosciences* 53 (3): 375–93.

Stroud, Jonathan R., Matthias Katzfuss, and Christopher K. Wikle. 2018. “A Bayesian Adaptive Ensemble Kalman Filter for Sequential State and Parameter Estimation.” *Monthly Weather Review* 146 (1): 373–86.

Stroud, Jonathan R., Michael L. Stein, Barry M. Lesht, David J. Schwab, and Dmitry Beletsky. 2010. “An Ensemble Kalman Filter and Smoother for Satellite Data Assimilation.” *Journal of the American Statistical Association* 105 (491): 978–90.

Taghvaei, Amirhossein, and Prashant G. Mehta. 2019. “An Optimal Transport Formulation of the Ensemble Kalman Filter,” October.

———. 2021. “An Optimal Transport Formulation of the Ensemble Kalman Filter.” *IEEE Transactions on Automatic Control* 66 (7): 3052–67.

Tamang, Sagar K., Ardeshir Ebtehaj, Peter J. van Leeuwen, Dongmian Zou, and Gilad Lerman. 2021. “Ensemble Riemannian Data Assimilation over the Wasserstein Space.” *Nonlinear Processes in Geophysics* 28 (3): 295–309.

Tippett, Michael K., Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, and Jeffrey S. Whitaker. 2003. “Ensemble Square Root Filters.” *Monthly Weather Review* 131 (7): 1485–90.

Ubaru, Shashanka, Jie Chen, and Yousef Saad. 2017. “Fast Estimation of \(tr(f(A))\) via Stochastic Lanczos Quadrature.” *SIAM Journal on Matrix Analysis and Applications* 38 (4): 1075–99.

Wen, Linjie, and Jinglai Li. 2022. “Affine-Mapping Based Variational Ensemble Kalman Filter.” *Statistics and Computing* 32 (6): 97.

White, Jeremy T. 2018. “A Model-Independent Iterative Ensemble Smoother for Efficient History-Matching and Uncertainty Quantification in Very High Dimensions.” *Environmental Modelling & Software* 109 (November): 191–201.

Wikle, Christopher K., and L. Mark Berliner. 2007. “A Bayesian Tutorial for Data Assimilation.” *Physica D: Nonlinear Phenomena*, Data Assimilation, 230 (1): 1–16.

Wikle, Christopher K., and Mevin B. Hooten. 2010. “A General Science-Based Framework for Dynamical Spatio-Temporal Models.” *TEST* 19 (3): 417–51.

Yang, Biao, Jonathan R. Stroud, and Gabriel Huerta. 2018. “Sequential Monte Carlo Smoothing with Parameter Estimation.” *Bayesian Analysis* 13 (4): 1137–61.

Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020. “Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In *Machine Learning, Optimization, and Data Science*, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.

Zhang, Jiangjiang, Guang Lin, Weixuan Li, Laosheng Wu, and Lingzao Zeng. 2018. “An Iterative Local Updating Ensemble Smoother for Estimation and Uncertainty Assessment of Hydrologic Model Parameters With Multimodal Distributions.” *Water Resources Research* 54 (3): 1716–33.

Services to extract information from web pages.

Some of these use browser automation, although that is kind of its own thing.

`Scrapy` is a python library to do that.
Companion project scrapy-rss converts my parsings into RSS feeds.
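Scrapy handles the crawling, scheduling and throttling; at its core the job is extracting structured bits from HTML. A dependency-free sketch of that extraction step using only the standard library (the HTML snippet is made up, and a real Scrapy spider would use its own selectors instead):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, as a spider's parse step might."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

html = '<ul><li><a href="/feed.xml">RSS</a></li><li><a href="/about">About</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```

Scrapy's value-add over this sketch is everything around the parse: request queues, politeness, retries, and item pipelines.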

Also there is a custom cloud service (scrapinghub) that will deploy it for you on a massive scale if you want.

Scrapoxy automates deployment of a distributed cloud for this purpose.

- History
- Tutorials
- Variants
- Installing TeX
- No TeX at all
- Reverse LaTeX
- Invocation
- Submitting to Arxiv
- Strikethrough
- Putting dates in drafts
- Spacing
- Commenting out
- Long documents
- Death-or-define macro
- Symbols, fonts
- Algorithms
- IDs (ORCID, DOI etc)
- Mathematical hacks
- Navigation
- Tables
- Version control
- Diagrams
- Editors
- Citations and bibliographies
- Posters
- Comments, TODOs
- Incoming
- IEEE style specialties
- Conversion to other markup formats
- References

The least worst mathematical typesetting system.
One of the better-scoured of the filthy pipes in academic plumbing. *De facto* standard for mathematicians, especially those who are not so
impertinent as to insist on writing in non-English languages, or are not so
shallow as to gainsay the simple delights in the painstaking handicraft of manually
setting line breaks, or who have grad students who will deal with all that
for free.
That is, a tool which provides comfort for that endangered animal, the Tenured
Academic, and tolerable usefulness for the rest of us.

Other alternatives include

- using MS Word, or
- stabbing your eyeballs with a pencil

… each of which I regard as similarly undesirable as the other, and, to be clear, both*somewhat less* desirable than LaTeX itself.

Standard disclaimer, before diving into the*otaku*-zone TeXonomy:

I am also aware that I am doing violence to the rich and storied ecosystem by failing to mention that almost everything I mention is but a macro system built upon Knuth’s OG TeX system. Even that is a crude simplification of the complicated truth that his original system has evolved, been reimplemented and mutated in complex and subtle ways. However, this document is not a philological exploration; it is a pragmatic guide to getting documents typeset before key deadlines pass. Nonetheless some context is occasionally helpful.

Eddie Smith, From boiling lead and black art: An essay on the history of mathematical typography; the only thing on this page you might conceivably read for pleasure.

Wasn’t that nice? Now, let Robert Kosara rant about where this has evolved to:

The tools of the trade for academics and others who write research papers are among the worst software has to offer. Whether it’s writing or citation management, there are countless issues and annoyances. How is it possible that this fairly straightforward category of software is so outdated and awful?

Grad students, Robert, and the low-marginal-cost, low-quality labour of grad students. The same labour undervaluation that keeps slave economies from developing the steam engine. As long as it is cheaper to solve typesetting problems with grad student labour, the system can shamble forward without anyone being incentivised to fix it for everyone else. Side effect: grad students also have underdeveloped software development skills and will never return to remedy the engineering mistakes of their younger selves. This will keep the average quality of the pieces of this system mediocre.

There are also standards-lock-in problems; even if someone develops a better system, will conferences and journals find it worthwhile to switch?
Or will they let the old system shamble on knowing it will cost*their editorial board* nothing to waste everyone*else’s* time?
For the authors: will a newer better system be good enough to justify the cost of learning it?
Will it be robust enough to last long enough that it repays its cost?
Cf. Moloch.

- How TeX macros actually work according to overleaf
- tug.org on LaTeX

LaTeX has a long and storied history, and that means it has lots of nasty historical design decisions preserved in it, like smallpox victims in a glacier. And yet! It is not so very backwards-compatible. Many old packages interact disastrously with new ones, and the conservatism of the community’s cargo-cult way of doing things means that people tend to carry on deprecated practices forever, so any given document you edit may be a ticking time bomb of confusing font failures and incomprehensible error messages.

There is a guide to avoiding old things, Das LaTeX2e-Sündenregister oder Veraltete Befehle, Pakete und andere Fehler, although those who do not speak German might prefer, e.g., the slightly outdated English L2 Taboos document. German is satisfying in the domain of being pedantic.

There are differences between the various
engines, formats, macro systems etc., giving us ConTeXt and LaTeX and TeX and pdfTeX and XeTeX and LuaTeX,
and they are all refreshingly unique in their choices of pain-points,
whether in formatting, interoperation, character-set handling, compatibility,
preferred name capitalisation, or community support.
Here is a more pious take by Graham Douglas, What’s in a name: a guide to the many flavours of TeX.
For now I will construe *LaTeX* broadly, which is to say “in principle any of these LaTeX-like engines”, which is to say “in practice any LaTeX engine which is sufficiently similar to the baseline horror to survive the submission process to arxiv.org”.

I am indeed cognisant of the diversity and richness of the buffet of failure cases I could choose from, if only in outline. However, standards lock-in being what it is, I will for now avoid arranging the deckchairs of incremental improvement on this sinking ship of legacy mess. If I must innovate, it will be to discreetly shuffle over there, to the lifeboats, in which I will wait for some disruptive scholarly version of Markdown to come rescue me from LaTeX entirely.

Getting LaTeX code back from screen captures or photos of (formatted) equations. Rebekah Yates reports:

- You can look things up in the **Comprehensive LaTeX symbols** list. It can usually be easily accessed with `texdoc symbols` or `texdoc symbols-a4` (in MiKTeX the latter only).
- Another good option is to try the web-based software **Detexify**, which allows you to draw the symbol and tries to recognize what you’ve drawn. […]
- If you are using the package `unicode-math`, then besides using any Unicode character list, the **list of all supported symbols** (`texdoc unimath-symbols`) is very useful as it also lists which symbols are available in the various fonts.
- Using `unicode-math`, you can also search for characters by drawing (just like with Detexify) using **ShapeCatcher**.

Frontrunner: Mathpix. They also offer a mathematical notebook, Snips.

Or! Leave the machines behind!
Train *yourself* in speed LaTeX transcription via the gamified mathematical typesetting training system TeXnique.

Per default TeX runs in an “interactive mode”, which makes usually pointless efforts to solicit my advice about badly explained syntax errors, and offers me a chance to… fix them? I guess? I have never tried; why would I want to do that halfway through formatting instead of in my text editor, where my fixes will actually persist? This probably dates to some time in the 80s when users were billed per command-line invocation or something, and is utterly contrary to modern expectations. Anyway, here is how we get normal unix halt-on-failure-with-helpful-message behaviour:

`pdflatex -interaction=nonstopmode -halt-on-error`

Or, alternatively, bloody-minded-compile-my-document-at-all-costs-I-don’t-care-how-it-is-broken:

`pdflatex -interaction=batchmode`

`latexmk` is a popular tool that makes a best-effort attempt to couple together the clanking chain of components that turn those text files into documents. It has various command-line options in the manual, but examples are more explanatory to me.

See `latexmk` options and nomenclature.
See also the `latexmkrc` files that it comes with for examples of advanced configuration.

🤓 oooh! I can set up LaTeX as an automatically updating dynamical preview, even with synctex, as a poor-man’s interactive editor.

`latexmk -pvc`
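The advanced configuration lives in a `latexmkrc` file, which is plain Perl. A minimal sketch (the output directory and flags here are my own choices, not defaults):

```perl
# latexmkrc — a minimal, hypothetical configuration sketch
$pdf_mode = 1;              # build PDF directly
$out_dir  = 'build';        # keep intermediate files out of the source tree
$pdflatex = 'pdflatex -interaction=nonstopmode -halt-on-error %O %S';
```

With this in place, a bare `latexmk` in the project directory does the right thing.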

Tectonic addresses several of my complaints at once, at least on paper. I wonder if it is as seamless as I might hope in practice.

Tectonic seems to be a light wrapping/forking of mainline LaTeX to modernise the toolchain slightly — not as regards the (La)TeX language itself but as regards the way it is built and executed.

Tectonic is a modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive. … TeX is quite archaic in some ways, but it’s still the tool of choice for documents that require precision typography or ones that involve lots of mathematical equations, which makes it especially important in the sciences. Tectonic converts TeX files into PDF files.

Tectonic is

beta software but has been demonstrated to work well in a variety of real-world situations. Contributions in any form — documentation, bug reports, test cases, new features — are most welcome. The user forum is the place to start.

Advertised features:

- Tectonic automatically downloads support files so you don’t have to install a full LaTeX system in order to start using it. If you start using a new LaTeX package, Tectonic just pulls down the files it needs and continues processing. The underlying “bundle” technology allows for completely reproducible document compiles. Thanks to the Dataverse Project for hosting the large LaTeX resource files!
- Tectonic has sophisticated logic and automatically loops TeX and BibTeX as needed, and only as much as needed. In its default mode it doesn’t write TeX’s intermediate files and always produces a fully-processed document.
- The `tectonic` command-line program is quiet and never stops to ask for input.
- Thanks to the power of XeTeX, Tectonic can use modern OpenType fonts and is fully Unicode-enabled.
- The Tectonic engine has been extracted into a completely self-contained library so that it can be embedded in other applications.
- Tectonic has been forked from the old-fashioned WEB2C implementation of TeX and is developed in the open on GitHub using modern tools like the Rust language.
- Tectonic can be used from GitHub Actions to typeset your documents whenever a change to them is made:
  - setup-tectonic - Use tectonic in your github action workflows (supports caching and optionally biber)
  - compile-latex - Thanks to Vinay Sharma for creating the action.

It is hard to imagine getting uptake because it doesn’t use enough cute 1980s typesetting stunts like naming itself canonically T^{E}C_{T}O^{N}I_{C}.

`brew install tectonic`

The manual took some work to find. See here.

Can be used in VS Code as a LaTeX Workshop build script or in the new disruptive TeXlab plugin.

Generating arbitrary LaTeX in python scripts, jupyter notebooks, or Pweave literate documents? For that I use an ingenious python script called latex_fragment to ease my burden and render my latex fragments inline. It was written by that paragon of coding cleanliness, that tireless crusader for not-dicking-around: me.

```
from IPython.display import display
import latex_fragment

l = latex_fragment.LatexFragment(r'\(x=y\)')
display(l)
```

You should totally check it out for rendering inline algorithms, or for emitting SVG equations.

Note also that pandoc markdown already includes LaTeX support for LaTeX output.

Other options include inverting this setup, and injecting python into LaTeX via an executable notebook such as knitr.

In jupyter, the inbuilt jupyter LaTeX renderer will display maths OK via HTML+JS, so why would we use this?

```
#%%
from IPython.display import Latex
Latex(r'''$x=y$''')
```

For one, once this thing has rendered there are no external dependencies, so a notebook which displays mathematics this way also works when you are offline. For another, it can display other things than mathematics, for example specialised LaTeX-only things like pseudocode, weird font samples, and exotic diagram types like Feynman diagrams and parse trees.

Here are some automations.

- google-research/arxiv-latex-cleaner: arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
- DanielCWard/latex_tools: A few scripts to automatically process latex projects for easier online publishing

Here is a decently documented template: kourgeorge/arxiv-style.

a.k.a. strikeout. Jan Söhlke says this works great in text mode.

```
\usepackage[normalem]{ulem}
\sout{text to be struck through}
```

Strikethrough is weird in math mode; the `cancel` package is recommended.
In MathJax, this needs an extension.

Certain document classes (all?) have draft modes.

`\documentclass[draft]{article}`

A universal (not document-class-dependent) option was suggested by the Malaysian LaTeX User Group, Putting Dates in Watermarks:

```
\usepackage{draftwatermark}
\usepackage{datetime2}
\SetWatermarkLightness{.9}
\SetWatermarkText{Draft\string@\DTMnow}
\SetWatermarkScale{.3}
```

On a minimalist TeX system, this may necessitate

```
tlmgr install draftwatermark everypage \
datetime2 etoolbox tracklang
```

Managing spacing between symbols is a major reason for LaTeX’s existence. It is a hard problem, optimising for legibility of symbols on a sheet of paper for all the various sorts who might read them. One must minimise the number of criticisms from grumpy, aesthetically-challenged, pedantic folk-typographers, each of whom has a different, incompatible list in their mind of what constitutes an unspeakable crime against legibility. Thus there are many compromises, tricks, edge-cases and other potholes to get your foot stuck in, especially in mathematical mode.

As far as individual mathematical characters go, here is a comprehensive guide to LaTeX mathematical spacing by Werner. **tl;dr** if things look weird I can convert a mathematical character to an “ordinal” by wrapping it in braces, as in `{=}`, add my own manual spacing back in, and it will work nicely.
When that is not sufficient, Overleaf’s Spacing in math mode is what I most often need:

```
\begin{align*}
f(x) &= x^2\! +3x\! +2 \\
f(x) &= x^2+3x+2 \\
f(x) &= x^2\, +3x\, +2 \\
f(x) &= x^2\: +3x\: +2 \\
f(x) &= x^2\; +3x\; +2 \\
f(x) &= x^2\ +3x\ +2 \\
f(x) &= x^2\quad +3x\quad +2 \\
f(x) &= x^2\qquad +3x\qquad +2
\end{align*}
```


K. Cooper’s even more comprehensive LaTeX Spacing Tricks is the guide for almost every type of spacing civilians will need.

To manage justification, a.k.a. text alignment, generally
(why is everything fully justified per default?
It makes the spacing so ugly, at least to this aesthetically-challenged folk-typographer)
one needs the `\raggedright`/`\centering` etc. commands,
or even the `ragged2e` package.
See the Overleaf documentation and wikibooks.

**PRO-TIP**: `\RaggedRight` and friends destroy paragraph indentation. The fix
is to restore the indent:

```
\newlength{\saveparindent}
\setlength{\saveparindent}{\parindent}
\RaggedRight
\setlength{\parindent}{\saveparindent}
```

Or you could do what a *real* typographer would do and put space between paragraphs, which might require some style updates.

I use RaggedRight spacing to make it easier to proofread, but revert to fully justified after the proofreading process. This has advantages:

- Editors will not complain about everything not being fully justified
- Reviewers will find it harder to read, reducing the chance they will detect any inconvenient errors I made.

Gotcha: Commands gobble following space.

The obvious way to comment stuff out is with the `%` comment marker.
For long blocks, Martin Scharrer suggests:

You can use `\iffalse ... \fi` to make (La)TeX not compile everything between them. However, this might not work properly if you have unmatched `\ifxxx ... \fi` pairs inside them or do something else special with if-switches. It should be fine for normal user text.

There is also the `comment` package which gives you the comment environment, which ignores everything in it verbatim. It allows you to define your own environments and to switch them on and off. You would use it by placing `\usepackage{comment}` in the preamble.
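A minimal sketch of the `comment` environment in use:

```latex
\documentclass{article}
\usepackage{comment}
\begin{document}
This paragraph is typeset.
\begin{comment}
Everything here is skipped verbatim, stray braces { and all.
\end{comment}
This paragraph is typeset too.
\end{document}
```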

`subfiles` is a handy package for Multi-file LaTeX projects.

`\documentclass[../main.tex]{subfiles}`
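A sketch of the layout (file names are my own invention): the main document loads the package, and each subfile names the main file so that it can also be compiled standalone:

```latex
% main.tex
\documentclass{article}
\usepackage{subfiles}
\begin{document}
\subfile{sections/intro}
\end{document}

% sections/intro.tex
\documentclass[../main.tex]{subfiles}
\begin{document}
Introduction text, compilable on its own or as part of main.
\end{document}
```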

Death-or-define is how I think of the trick to force a macro redefinition even if there is no definition to be redefined — handy if I am rendering LaTeX from some tricky source such as jupyter, or where I don’t have control over the overall document outside my section but don’t care about wreaking havoc on my collaborators; some other poor sap can deal with the macro mutations. Mwahahaha.

```
\providecommand{\foo}{}
\renewcommand{\foo}[1]{bar: #1}
```

See LaTeX algorithms.

Why is this not documented at `orcid.org`? I do not know.
So now I document it here.
AFAICT, at the basic level I should simply create a hyperlink, e.g.

`\href{https://orcid.org/0000-0001-6077-2684}{Dan MacKinlay}`

But what if I want the fancy logo so that everyone*knows* I cleverly did the ORCID thing?
If I am using some hidebound conference stylesheet from the 90s this is unlikely to work.
But for more modern setups (e.g. IEEE is usually current) I might be able to get an attractive green logo.

I made this work with the academicons package, which renders the logo using a custom font. ORCID, for example, is set up in the preamble:

```
\usepackage{academicons}
\definecolor{orcidlogocol}{HTML}{A6CE39}
```

and in the body

```
\href{https://orcid.org/0000-0001-6077-2684}{Dan MacKinlay \hspace{2mm} \textcolor{orcidlogocol}{\aiOrcid}}
```

Or use the `orcid.pdf` (which I converted from `orcid.svg`; feel free to use it):

`\href{https://orcid.org/0000-0001-6077-2684}{\includegraphics[scale=0.06]{orcid.pdf}\hspace{2mm}Dan MacKinlay}`

Tools such as git-latexdiff provide custom diffing for, in this case, LaTeX code in git.

Getting a diagram or a plot into a document? See also general diagrams for tools which create generic types of diagram. Consider scientific workbooks, which often include automatic conversion of inline plots.

Martin H says, on including SVG in TeX, that the smoothest route is to convert the SVG into PDF+TeX, as per Johan Engelen’s manual:

`inkscape -D -z --file=image.svg --export-pdf=image.pdf --export-latex`

Then invoke using

```
\begin{figure}
\centering
% set width of next svg image:
\def\svgwidth{\columnwidth}
\input{image.pdf_tex}
\end{figure}
```

This SVG → PDF → LaTeX workflow can be automated using the `svg` LaTeX package.

PGFPlots is a native diagramming/plotting package which supports PDF output for TikZ-style diagrams.

See LaTeX editors.

From within LaTeX? See BibTeX etc.

`a0poster` is popular, as expounded by Morales de Luna,
but I secretly feel that it sounds like a nightmare of legacy postscript
nonsense and doesn’t even look good. sciposter is a popular `a0poster` variant.

`tikzposter` and `beamerposter` are both highlighted on sharelatex,
but I cannot find a way of making them seem anything but fugly to me and I cannot condone their use.
It is hard enough to bring beauty into this world without making it worse.

- How to take lecture notes with LaTeX.
- git-latexdiff is a LaTeX-aware diff.

IEEEtran stylesheets have some special equation formatting nous.

```
\begin{IEEEeqnarray}{rCl}
  Z &=& x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \IEEEnonumber\\
    && +\: a + b%
\end{IEEEeqnarray}
```

Maybe you want to throw the original LaTeX out? In which case see LaTeX-free mathematics.

pandoc converts to everything

ConTeXt can handle XHTML natively.

plasTeX is a LaTeX document processing framework written entirely in Python. It currently comes bundled with an XHTML renderer (including multiple themes), as well as a way to simply dump the document to a generic form of XML. Other renderers can be added as well and are planned for future releases…

plasTeX differs from other tools like `LaTeX2HTML`, `TeX4ht`, `TtH`, etc. in that the parsing and rendering of the document are completely separated. This separation makes it possible to render the document in multiple output formats. It is active, being used by high-profile online projects such as the collaborative textbooks Stacks and Kerodon.

HEVEA is an older HTML converter with mediocre maths support (so why would you even bother?). Manual lives here.

`LaTeX2HTML`, `TeX4ht`, `TtH`, …


The python-derived entrant in the scientific workbook field is called `jupyter`.

Interactive “notebook” computing for various languages; python/julia/R/whatever plugs into the “kernel” interface. Jupyter allows easy(ish) online-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. Handy enough that it’s sometimes worth the many rough spots, and so I conquer my discomfort and use it.

But what does the painful set up of jupyter buy you?
Why bother with this contraption?
It took me a long time to realise that the answer was in part that it is a *de facto* standard for running remote computation jobs interactively.
The browser-based, network-friendly jupyter notebook is a natural, easy way to execute tedious computations on some other computer somewhere else, with some kind of a paper trail.
In particular, it is much better over unreliable networks than remote terminals or remote desktops, because the client/server architecture doesn’t need to do so many round trips to render the state of your work.
So, jupyter is a kind of re-designed network terminal.
Certainly, for a job that could be executed either over remote desktop or over jupyter, jupyter is going to be less awful if your connection has any lag at all; over remote desktop, every mouse click and keystroke involves waiting and twiddling your fingers. Jupyter has less waiting.

People make UX arguments, say, that jupyter is friendly and supports graphs and so on.
I am personally ambivalent about those arguments.
Jupyter can do some things *better than the console*.
But that is an artificially restricted comparison.
It can also do some things better than pencil and paper.
On the other hand, most things that jupyter does, it does worse than a proper IDE or decent code editor.
This comparison is more pertinent when you need to run the code in a remote environment.
However, sometimes those other things are not available on, say, your HPC cluster or cloud compute environment, and then this becomes a relevant advantage.

But for now the main takeaway, I think, is that if, like me, you are confused by jupyter enthusiasts claiming it is easy and fun, it will make more sense if you append “in comparison to the worst-case other ways of executing code remotely which is frequently what we face”.

There are other comparisons to make — some like it as a documentation format/literate coding environment. Once again, sure, it is better than text files. But then, RMarkdown is better.

See IPython.

**tl;dr** Not to besmirch the efforts of the jupyter developers, who are doing a difficult thing, in many cases for free, but I *will* complain about jupyter notebook.
It is often touted as a wonderful solution for data science but seems to me to merely offer a different selection of pain points to traditional methods.
Further, it introduces some new pain points when you try to combine the old and the new to make something better.

This is not to say it is all bad.
I’m an equivocal advocate of the jupyter notebook *interface*, which some days seems to counteract every plus with a minus.
This is partly due to the particulars of `jupyter`’s design decisions, and partly because of the problems of notebook interfaces generally (Chattopadhyay et al. 2020).
As with so many things in computer interfaces, my luke-warm endorsement is, in relative terms, *fannish enthusiasm* because often, as presaged, the alternatives are *abysmal*.

Jupyter:
It’s friendly to use, but hard to install.
It’s easy to graphically explore your data, but hard to keep that exploration in version control.
It makes it easy to explore your code output, but clashes with the fancy debugger that would make it easy to explore your code bugs.
It is open source, and written in an easy scripting language, python, so it seems it *should* be easy to tweak to taste.
In practice it’s an ill-explained spaghetti of python, javascript, compiled libraries and browsers that relate to one another in obscure ways that few people with a day job have time to understand or contribute to.
Things regularly break either at the server or client side and you might need to upgrade either or both to fix it.
You might have many different installs of each and need to upgrade a half-dozen different installs to keep them all working.
You might upgrade the wrong one at the end of a long day and have no way to get it working without a lengthy debugging process.
It is a constant struggle to keep jupyter finding the many intricate dependencies needed to keep the entire contraption running.
The sum total is IMO no easier to run than most of the other UI development messes that we tolerate in academic software, let alone *tweak*.
Case study: a dependency of a dependency of the autocomplete function broke something, spawned a multi-month confusion of cascading dependency problems, and certainly cost me several hours to fix across the few dozen different python environments I manage across several computers.
This kind of tedious intermittent breakage is much the cost of doing business with jupyter, and has been so for as long as I have been using the project, which is as long as it has existed.

These pain points are perhaps not so intrusive for projects of intermediate complexity and/or longevity.
Indeed, jupyter seems good at making such projects look smooth, shiny, and inviting.
That is, at the crucial moment when you need to make your data science project look sophisticated-yet-friendly, it lures colleagues into your web(-based IDE).
Then it is *too late, mwhahahahah, you have fallen into my trap, now you are committed*.
This entrapment might be a feature not a bug, as far as the realities of team dynamics and their relation to software development go.
You want to lure people in until your problems become their problems and you are required to work together to solve them.

Some argue that the weird / irritating constraints of jupyter can even lead to good architecture, such as Guillaume Chevallier and Jeremy Howard. This sounds like an interactive twist on the old test-driven-development rhetoric. I could be persuaded of its merits, if I found time in between all the debugging.

For now I think of the famous adage “The fastest code is code that doesn’t need to run, and the best code is code you don’t need to write”. The uncharitable corollary might be “Thus, let’s make writing code horrible so that you write less of it”. That is not even necessarily a crazy position.

Here is some verbiage by Will Crichton which explores some of these themes, The Future of Notebooks: Lessons from JupyterCon.

Pain point: The lexicon of jupyter is confusing. Terminology tarpit alert.

A *notebook* is, on one hand, a *style of interface* to which this conforms.
Other applications with a notebook style of interface are Mathematica and MATLAB.

These interfaces communicate with a computational backend, which is called a `kernel` (because in mathematics and computer science, if you don’t know what to call something, you call it a kernel; this confusing explosion of definitions is very much on-message for notebook development).

These are software packages in which a unit of development is
a type of *notebook* file on your disk, containing both code and the output of that code.
(In the case of jupyter this file format is marked by the file extension `.ipynb`, which is short for “ipython notebook”, for fraught historical reasons.)
*One* implementation of a notebook frontend over the jupyter notebook protocol is called the jupyter notebook, launched by the `jupyter notebook` command, which will open up a javascript-backed notebook interface in a web browser.
This is the one that is usually assumed.
Another common notebook-style interface implementation is called `jupyter lab`, which reuses much of the same `jupyter notebook` infrastructure but is distinct and only sometimes interoperable, in ways which I do not pretend to know in depth.
But there are multiple other ‘frontends’ besides, which talk to a kernel over the jupyter notebook protocol.

Which sense of *notebook* is intended you have to work out from context,
e.g. the following sentence is not at all tautological:

You dawg, I heard you like notebooks, so I started up your jupyter notebook in `jupyter notebook`.

Hearing this, I became enlightened.

See jupyter UI.

jupyter looks for*kernel specs* in a kernel spec directory,
depending on your platform.

Say your kernel is `dan`; then the definition can be found in the following location:

- Unixey: `~/.local/share/jupyter/kernels/dan/kernel.json`
- macOS: `~/Library/Jupyter/kernels/dan/kernel.json`
- Win: `%APPDATA%\jupyter\kernels\dan\kernel.json`

See the manual for details.
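Those lookup locations can be reproduced in a few lines; a hedged sketch (it assumes only the default per-user locations listed above, ignoring `JUPYTER_PATH` overrides and system-wide directories):

```python
# Sketch: the per-platform user kernel-spec directory, as listed above.
import os
import sys
from pathlib import Path

def user_kernel_dir() -> Path:
    """Directory where jupyter looks for per-user kernel specs."""
    if sys.platform == "darwin":
        # macOS
        return Path.home() / "Library" / "Jupyter" / "kernels"
    if os.name == "nt":
        # Windows
        return Path(os.environ["APPDATA"]) / "jupyter" / "kernels"
    # Unixey default
    return Path.home() / ".local" / "share" / "jupyter" / "kernels"
```

So the spec for a kernel named `dan` would live at `user_kernel_dir() / "dan" / "kernel.json"`.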

How to set up jupyter to use a virtualenv (or other) kernel? **tl;dr** Do this from inside the virtualenv to bootstrap it:

```
pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name
```

Addendum: for Anaconda, you can auto-install all discoverable conda envs, which worked for me, whereas the ipykernel method did not:

`conda install nb_conda_kernels`

Sometimes you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example for GPU-enabled kernels:

For computers on Linux with optimus, you have to make a kernel that will be called with `optirun` to be able to use GPU acceleration.

I made a kernel in `~/.local/share/jupyter/kernels/dan/kernel.json` and modified it thus:

```
{
  "display_name": "dan-gpu",
  "language": "python",
  "argv": [
    "/usr/bin/optirun",
    "--no-xorg",
    "/home/me/.virtualenvs/dan/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ]
}
```
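Since `kernel.json` is plain JSON, edits like this are easy to script and sanity-check; a sketch (the paths are the example’s own, not anything canonical):

```python
# Sketch: build and sanity-check a kernel.json like the one above.
import json

spec = {
    "display_name": "dan-gpu",
    "language": "python",
    "argv": [
        "/usr/bin/optirun",          # example launcher; substitute your own
        "--no-xorg",
        "/home/me/.virtualenvs/dan/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}",         # jupyter substitutes the real path here
    ],
}

# jupyter needs the connection-file placeholder somewhere in argv:
assert "{connection_file}" in spec["argv"]

text = json.dumps(spec, indent=2)  # ready to write to kernel.json
```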

Any script can be set up to see CUDA but not the actual GPU, by setting an environment variable in the script, which is handy for kernels.
So this could be in a script called `noprimusrun`:

`CUDA_VISIBLE_DEVICES= $*`
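The same trick works from inside Python, provided it happens before the CUDA-using framework initialises; a sketch:

```python
import os

# Hide every GPU from CUDA-aware libraries. This must run before the
# framework first initialises CUDA (i.e. before `import torch` or similar).
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```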

Various options. For one, github will attempt to render jupyter notebooks in github repos; I have had various glitches and inconsistencies with images and equations rendering in such notebooks. Perhaps it is better in…

The fastest way to share your notebooks - announcing NotebookSharing.space - Yuvi Panda

You can upload your notebook easily via the web interface at notebooksharing.space:

Once uploaded, the web interface will just redirect you to the beautifully rendered notebook, and you can copy the link to the page and share it!

Or you can directly use the `nbss-upload` commandline tool: …When uploading, you can opt-in to have collaborative annotations enabled on your notebook via the open source, web-standards-based hypothes.is service. You can thus directly annotate the notebook, instead of having to email back and forth about 'that cell where you are importing matplotlib' or 'that graph with the blue border'.

This is one of the coolest features of notebooksharing.space.

Jupyter can host online notebooks,
even multi-user notebook servers,
if you are brave enough to let people execute weird code on your machine.
I’m not going to go into the security implications. **tl;dr** encrypt and password-protect that connection.
Here, see this jupyterhub tutorial.

**NB**: This section is outdated.
🏗 I should probably mention the ill-explained Kaggle kernels
and google cloud ML execution of same, etc.

At base level, you can run one using a standard cloud option like buying compute time as a virtual machine or container, and using a jupyter notebook for your choice of data science workflow.

Special mention to two early movers:

sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.

Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). It requires you to use their anaconda python distribution tools to work, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine is, you tend not to want yet another one confusing things and wasting disk space.

Microsoft’s Azure notebooks:

Azure Notebooks is a free hosted service to develop and run Jupyter notebooks in the cloud with no installation. Jupyter (formerly IPython) is an open source project that lets you easily combine markdown text, executable code (Python, R, and F#), persistent data, graphics, and visualizations onto a single, sharable canvas called a notebook.

Google’s Colaboratory is hip now:

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is an intro and here is another.

Anne Bonner’s Tips, Tricks, Hacks, and Magic: How to Effortlessly Optimize Your Jupyter Notebook is actually full of useful stuff. So much stuff that I nearly forget I hate jupyter. If you must use it, read her article; it will make stuff better. Many tips here are gleaned from her.

Here are some useful ones to look up from her:

```
%%writefile basic_imports.py
# ^ cell magic: saves the rest of the cell to basic_imports.py

%load basic_imports.py
# ^ line magic: replaces the cell contents with the file
```

e.g. for latex-free mathematics.

`python -m IPython.external.MathJax /path/to/source/MathJax.zip`

Sometimes you can’t see the whole code cell, which is annoying. This is a known issue. The workaround is simple enough:

zoom out to 90% and back in to 100%: `Ctrl + -` / `Ctrl + +`

You got this error and you weren’t doing anything that bandwidth-intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0:

```
jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py
```

update the `c.ServerApp.iopub_data_rate_limit` to be big,
e.g. `c.ServerApp.iopub_data_rate_limit = 10000000`.

This is fixed after 5.0.

Modern jupyter is suspicious of connections per default and will ask you either for a magic token or a password, and thereafter, I think, encrypts the connection (?), which is sometimes what I want. Not always.

But when I am in HPC hell, accessing jupyter notebooks through a double SSH tunnel, the last thing I need is to put a hat on a hat by *triply* securing the connection.
Also, sometimes the tokens do not work over SSH tunnels for me and I cannot work out why.
I think it is something about some particular jupyter version mangling tokens, or possibly failing to report that it has not claimed a port used by someone else (although it happens more often than is plausible for the latter case). CodingMatters notes
that the following invocation will disable all jupyter-side security measures:

`$ jupyter notebook --port 5000 --no-browser --ip='*' --ServerApp.token='' --ServerApp.password=''`

Obviously never do this unless you believe that everyone sharing a network with that machine has your best interests at heart.

There are various other useful settings which one could use to reduce security.
In config-file format for `~/.jupyter/jupyter_notebook_config.py`:

```
c.ServerApp.disable_check_xsrf = True    # irritated my ssh tunnel that one time
c.ServerApp.open_browser = False         # consumes a one-time token and is pointless on a headless HPC
c.ServerApp.use_redirect_file = False    # forces display of the token rather than writing it to some file that gets lost in the containerisation and is useless on a headless HPC
c.ServerApp.allow_password_change = True # allow password setup somewhere sensible
c.ServerApp.token = ''                   # no auth needed
c.ServerApp.password = password          # actually needs to be hashed - see below
```

Eric Hodgins recommends this hack for a simple password without messing about trying to be clever with their browser infrastructure which TBH does seem to break pretty often for me:

```
c = get_config()
c.ServerApp.ip = '*'
c.ServerApp.open_browser = False
c.ServerApp.port = 5000
# setting up the password
from IPython.lib import passwd
password = passwd("your_secret_password")
c.ServerApp.password = password
```
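For the curious, the string `passwd` returns is, to my understanding, a salted hash in the form `algorithm:salt:hexdigest`; here is a stdlib sketch of that legacy scheme (this is my reading of the format, so use the real `passwd` for anything that matters):

```python
# Sketch of the legacy notebook password format "algorithm:salt:hexdigest".
# Assumption: salted digest of passphrase-bytes followed by salt-bytes.
import hashlib
import random

def hash_passphrase(passphrase: str, algorithm: str = "sha1") -> str:
    salt = "%012x" % random.getrandbits(48)  # 12 hex characters of salt
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))
```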

Jupyter Server Proxy lets you run arbitrary external processes (such as RStudio, Shiny Server, syncthing, PostgreSQL, etc) alongside your notebook, and provide authenticated web access to them.

Note

This project used to be called nbserverproxy. If you have an older version of nbserverproxy installed, remember to uninstall it before installing jupyter-server-proxy, otherwise they may conflict.

The primary use cases are:

- Use with JupyterHub / Binder to allow launching users into web interfaces that have nothing to do with Jupyter - such as RStudio, Shiny, or OpenRefine.
- Allow access from frontend javascript (in classic notebook or JupyterLab extensions) to access web APIs of other processes running locally in a safe manner. This is used by the JupyterLab extension for dask.

Chattopadhyay, Souti, Ishita Prasad, Austin Z Henley, Anita Sarma, and Titus Barik. 2020.“What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities,” 12.

Granger, Brian E., and Fernando Pérez. 2021.“Jupyter: Thinking and Storytelling With Code and Data.”*Computing in Science Engineering* 23 (2): 7–14.

Himmelstein, Daniel S., Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, and Anthony Gitter. 2019.“Open Collaborative Writing with Manubot.” Edited by Dina Schneidman-Duhovny.*PLOS Computational Biology* 15 (6): e1007128.

Otasek, David, John H. Morris, Jorge Bouças, Alexander R. Pico, and Barry Demchak. 2019.“Cytoscape Automation: Empowering Workflow-Based Network Analysis.”*Genome Biology* 20 (1): 185.

Sokol, Kacper, and Peter Flach. 2021.“You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source.” In. Zenodo.

Here be tips for using VS Code as a collaborative cloud-friendly editor, in several senses.

“Remote” edits code in particular environments from a local VS Code instance, including spinning up containers or SSH sessions, so that your editor and execution environments can be different. The remote execution environments can be elsewhere on the internet, or they can be local VMs, which is why I have “scare quotes” around the word “Remote”. VS Code tries hard to make this seamless, installing some helper files on the remote side that interact smoothly with the local side. Occasionally some extensions will not work gracefully across the barrier between local and remote, but for most of my workflows this problem does not arise. This is generally the easiest and most mainstream option, and I use it all the time.

This is best explained by using the tutorial examples.

| Tutorial | Description |
|---|---|
| Remote via SSH | Connect to remote and virtual machines with Visual Studio Code via SSH. |
| Work in WSL | Run Visual Studio Code in Windows Subsystem for Linux. |
| Develop in Containers | Run Visual Studio Code in a Docker Container. |
| GitHub Codespaces | Connect to a codespace with Visual Studio Code. |

Also I wrote one about getting remote debugging for python working, see below.

Pro tip: remote editing via an unreliable or intermittent network connection can be a drag. There is no universal persistent-SSH-session solution at time of writing. It would be nice if it worked over eternal terminal, but it does not. However, Christopher LaPointe’s solution using tmux seems OK for now.

**tl;dr**:

```
{
"terminal.integrated.profiles.linux": {
"tmux": {
"path": "/usr/bin/tmux",
"args": [
"new-session",
"-A",
"-s",
"vscode-${workspaceFolderBasename}"
],
},
},
"terminal.integrated.defaultProfile.linux": "tmux",
"remote.SSH.defaultForwardedPorts": [
{
"localPort": 9002,
"remotePort": 9002,
"name": "SA"
}
],
}
```

Slightly weirder: run a server process that makes VS Code accessible to you in a browser. No local copy of VS Code is required; you use a browser window for everything.

I have not had need of this yet, because in any scenario where I have had HTTPS browser access to a remote server, I have also had SSH access, and so it has been easier to use the VS Code Remote setup.

However, a completely browser-based IDE might be useful in some circumstances: in exotic devops scenarios, through painful firewalls, when working from a thin client, something like that?

There are at least two forks of VS code designed to be a server-side editor.

Code-server by Coder: The Developer Workspace Platform seems actively maintained and popular. AFAICT it is Linux-only.

Alternatively, gitpod-io/openvscode-server seems very similar and has an explanatory blurb:

VS Code has traditionally been a desktop IDE built with web technologies. A few years back, people started patching it in order to run it in a remote context and to make it accessible through web browsers. These efforts have been complex and error prone, because many changes had to be made across the large code base of VS Code.

Luckily, in 2019 the VS Code team started to refactor its architecture to support a browser-based working mode. While this architecture has been adopted by Gitpod and GitHub, the important bits have not been open-sourced, until now. As a result, many people in the community still use the old, hard to maintain and error-prone approach.

At Gitpod, we’ve been asked a lot about how we do it. So we thought we might as well share the minimal set of changes needed so people can rely on the latest version of VS Code, have a straightforward upgrade path and low maintenance effort.

Does this use the same stack as the previous ones? Maybe. GoogleCloudPlatform/cloud-code-vscode: Cloud Code for Visual Studio Code: Issues, Documentation and more:

Cloud Code for VS Code brings the power and convenience of IDEs to cloud-native Kubernetes and Cloud Run application development. Cloud Code works with Google’s command-line container tools like `skaffold` and `kubectl` under the hood, providing local, continuous feedback on your project as you build, edit, and run your applications locally or in the cloud.

Having said that everything is seamless, there are in fact edge cases where the divide between what is *here* on the client and what is *there* on the remote server becomes important.
One of those cases is interactive debugging.

Here is a worked example of what I needed to do to get a remote python debugger working from my local machine, to find out why code that works locally does not work in the actual target deploy environment.

First we need to install the debugpy package in our project. How to install depends on how we are managing the python environment, but it will probably be one of these:

```
pip install debugpy ## 👈🏻 this one for me
conda install debugpy
poetry add debugpy
```

Now I choose a network port for the debugger.^{1}
This port needs to be unique to me (in the sense that no-one else on the remote machine should be using it) and it should be between 1024 and 65535.
For simplicity’s sake I will assume the port is 48888, but I would recommend against that number for you because someone else might have copy-pasted these instructions and got there first, if you are using a shared machine.
(Although not me; I use different port numbers in reality, this is just a tutorial example).
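If guessing a port feels fragile, the OS can hand out one that is free right now; a sketch (there is a small race window between picking the port and the debugger actually binding it):

```python
import socket

def free_port() -> int:
    """Ask the OS for a TCP port that is currently unused."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
        return s.getsockname()[1]
```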

Next, we do a one-time setup of VS Code to know that it should look for that port by finding the debug tab in VS Code and adding a configuration.

This will open `launch.json`, into which we can add a configuration for a remote attach debugger:

```
{
  "name": "Attach",
  "type": "python",
  "request": "attach",
  "connect": {
    "host": "127.0.0.1",
    "port": 48888
  },
  "pathMappings": [
    {
      "localRoot": "${workspaceFolder}",
      "remoteRoot": "."
    }
  ]
}
```

This part is clunky, but after one-time setup everything goes smoothly. VS Code will do all the remaining necessary networking by magic.

So NOW WE ARE READY TO DEBUG. Almost. We still need to load up our code on the server to USE the debugpy module. Two options here.

Firstly, we could invoke the code from the VS Code terminal like this:

`python -m debugpy --listen 0.0.0.0:48888 ./myscript.py`

Alternatively, we could load and execute the debugging system inside `myscript.py`:

```
import debugpy
debugpy.listen(('0.0.0.0', 48888))  # the port chosen above
debugpy.wait_for_client()
```

Either way, VS Code will detect what I am doing and magically open up an SSH tunnel from *here* to *there*.
Now, if you did the `wait_for_client` thing, the code will pause and wait for a debugger connection.
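Because `wait_for_client` blocks, I find it convenient to gate the listener behind an environment variable so the same script runs unchanged outside debugging sessions; a sketch, where `DEBUGPY` is a variable name I made up:

```python
import os

DEBUG_PORT = 48888  # must match "connect.port" in launch.json

if os.environ.get("DEBUGPY") == "1":
    import debugpy  # only imported when we actually want to debug
    debugpy.listen(("0.0.0.0", DEBUG_PORT))
    debugpy.wait_for_client()  # block until VS Code attaches
```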

OK great, so now I select that debug configuration I created earlier in order to get my debugger to connect.

I was stuck for ages on this, but **now I ALSO NEED TO CLICK THE GREEN TRIANGLE** or it doesn’t start.
And then… That’s it. We are done.

The debugger will show what is happening in general, but it is especially useful when

- we hit a line that raises an exception, or
- the code hits the line `breakpoint()`, which will cause the debugger to drop into the code right *there*.

How does the debugger work?
I do not need to explain because microsoft has already done that.
But *tl;dr* there is now a lot of information about my running code, a graph of the current call stack, a little tab to inspect the current value of variables, etc.

**Pro tip.** Use behind a firewall requires the following whitelist exceptions:

For github copilot:

- `vscode-auth.github.com`
- `api.github.com`
- `copilot-proxy.githubusercontent.com`

In addition it will be easier to install extensions with the following exceptions:

- `marketplace.visualstudio.com`
- `vscode.blob.core.windows.net`
- `*.vo.msecnd.net`
- `*.gallerycdn.vsassets.io`
- `download.microsoft.com` (only some extensions)
- `download.visualstudio.microsoft.com` (only some extensions)
- `*.online.visualstudio.com`
- `*.liveshare.vsengsaas.visualstudio.com:443`

A list of miscellaneous URLs that VS Code likes includes:

- `update.code.visualstudio.com` - Visual Studio Code download and update server
- `code.visualstudio.com` - Visual Studio Code documentation
- `go.microsoft.com` - Microsoft link forwarding service
- `vscode.blob.core.windows.net` - Visual Studio Code blob storage, used for remote server
- `marketplace.visualstudio.com` - Visual Studio Marketplace
- `*.gallery.vsassets.io` - Visual Studio Marketplace
- `*.gallerycdn.vsassets.io` - Visual Studio Marketplace
- `rink.hockeyapp.net` - Crash reporting service
- `bingsettingssearch.trafficmanager.net` - In-product settings search
- `vscode.search.windows.net` - In-product settings search
- `raw.githubusercontent.com` - GitHub repository raw file access
- `vsmarketplacebadges.dev` - Visual Studio Marketplace badge service
- `az764295.vo.msecnd.net` - Visual Studio Code download CDN
- `download.visualstudio.microsoft.com` - Visual Studio download server, provides dependencies for some VS Code extensions (C++, C#)
- `vscode-sync.trafficmanager.net` - Visual Studio Code Settings Sync service
- `vscode-sync-insiders.trafficmanager.net` - Visual Studio Code Settings Sync service (Insiders)
- `vscode.dev` - Used when logging in with GitHub or Microsoft for an extension or Settings Sync
- `default.exp-tas.com` - Visual Studio Code Experiment Service, used to provide experimental user experiences

Technically we need to choose a port on the local AND remote machine, and these can be different, but why do that?↩︎

I have broken this out into various themes: trousers, low impact fashion, gym wear, footwear, and probably more by now. Particulate masks?? Hygienic masks?? The remainders are here.

Models of fashions (as opposed to fashion models):

- Red queen signalling
- Neutral mutation (Ormerod and Bentley 2010)
- Network diffusion

These can probably be considered as innovation diffusion models.

Rachel Huber wonders about blockchainized surveillance for brand fashion and its impact on the concept of ownership.

Upcycling tips?

I am interested in collared shirts with Australian designs where the money goes to the artist in a fair way.

The good cheap stuff comes from Indonesia, but Indonesia is a closed economy.

- Jadi Batek is a Malaysian one with pretty sweet designs although suspect cuts.

- LED Hoodie by Lumen.
- Uzbek Chapans, e.g. this black floral bad boy.
- Search: 2 results found for "mask venetian" – Feather.com.au
- Masquerade Mask Couples Phantom Mask Pair Simple Elegant

I keep my clothing smelling fresh with Zeroda High Performance Sports Wash, which seems both cheap and effective.

I use Nanoman waterproofing spray for fabrics. Stain repellent. It is weirdly cheaper from Amazon than direct from their store.

Bentley, R Alexander, Paul Ormerod, and Michael Batty. 2011.“Evolving Social Influence in Large Populations.”*Behavioral Ecology and Sociobiology* 65 (3): 537–46.

Bentley, R Alexander, Paul Ormerod, and Stephen Shennan. 2011.“Population-Level Neutral Model Already Explains Linguistic Patterns.”*Proceedings of the Royal Society B: Biological Sciences* 278 (1713): 1770–72.

Centola, D, and Michael W Macy. 2007.“Complex Contagions and the Weakness of Long Ties.”*American Journal of Sociology* 113 (3): 702.

DellaPosta, Daniel, Yongren Shi, and Michael Macy. 2015.“Why Do Liberals Drink Lattes?”*American Journal of Sociology* 120 (5): 1473–1511.

Luvaas, Brent. 2010.“Designer Vandalism: Indonesian Indie Fashion and the Cultural Practice of Cut‘n’ Paste.”*Visual Anthropology Review* 26 (1): 1–16.

———. 2013a.“Material Interventions: Indonesian DIY Fashion and the Regime of the Global Brand.”*Cultural Anthropology* 28 (1): 127–43.

———. 2013b.“Indonesian Fashion Blogs: On the Promotional Subject of Personal Style.”*Fashion Theory: The Journal of Dress, Body & Culture* 17 (1): 55–76.

Ormerod, Paul, and R Alexander Bentley. 2010.“Modelling Creative Innovation.”*Cultural Science* 3 (1).

Ormerod, Paul, and Greg Wiltshire. 2009.“‘Binge’ Drinking in the UK: A Social Network Phenomenon.”*Mind & Society* 8 (2): 135–52.

Serrà, Joan, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll Arcos. 2012.“Measuring the Evolution of Contemporary Western Popular Music.”*Scientific Reports* 2 (July).

Other people have written much more about principled self experimentation, so I will not. Here are some links though.

Obligatory: The tragic morality fable, Seth Roberts’ Final Column: Butter Makes Me Smarter.

- Gwern on blind trials
- White Paper: Design and Implementation of Participant-Led Research in the Quantified Self Community
- Quantified Self How-To: Designing Self-Experiments

I mention some work-monitoring apps under time management.

Measuring moods? See the Experience Sampling Method (Verhagen et al. 2016; Hektner, Schmidt, and Csikszentmihalyi 2007) or Swan (2013).

See bio-markers.

- Bearable | Mood & Symptoms Tracker App
- Exist is a hip quantified self tracker.
- Symple symptom journal and health diary
- Cronometer: Track nutrition & count calories
- Gyroscope · Your Personal Health Coach
- ActivityWatch?

- A set of watchers that record relevant information about what you do and what happens on your computer (such as if you are AFK or not, or which window is currently active).
- A way of storing data collected by the watchers.
- A data format accommodating most logging needs due to its flexibility.
- An ecosystem of tools to help users extend the software to fit their needs.

- N=1: Single-Subject Research – SLIME MOLD TIME MOLD
- N=1: Introduction – SLIME MOLD TIME MOLD
- Mark Koester,How to Export, Parse and Explore Your Apple Health Data with Python
- HealthExport - Export health data from your iPhone to CSV
- Simple Health Export CSV on the App Store
- Apple Health XML to CSV Converter - ericwolter.com
- Show & Tell Projects Archive - Quantified Self
- Bibliography - Quantified Self
- woop/awesome-quantified-self: Websites, Resources, Devices, Wearables, Applications, and Platforms for Self Tracking
- Open Humans
- markwk/qs_ledger: Quantified Self Personal Data Aggregator and Data Analysis
- onejgordon/flow-dashboard: A goal, task & habit tracker + personal dashboard to focus on what matters
- Flow
- pacogomez/health-records: Plain text health records
- heedy/heedy: An aggregator for personal metrics, and an extensible analysis engine
- heedy/heedy-notebook-plugin: Use Jupyter notebooks in Heedy
- NeuroEducate, the Book!

Chrisinger, Benjamin W. 2020.“The Quantified Self-in-Place: Opportunities and Challenges for Place-Based N-of-1 Datasets.”*Frontiers in Computer Science* 2.

Daskalova, Nediyana, Karthik Desingh, Jin Young Kim, Lixiang Zhang, Alexandra Papoutsaki, and Jeff Huang. n.d.“A Cohort of Self-Experimenters: Lessons Learned from N=1 Personal Informatics Experiments,” 12.

Dulaud, Paul, Ines Di Loreto, and Denis Mottet. 2020.“Self-Quantification Systems to Support Physical Activity: From Theory to Implementation Principles.”*International Journal of Environmental Research and Public Health* 17 (24): 9350.

Feng, Shan, Matti Mäntymäki, Amandeep Dhir, and Hannu Salmela. 2021.“How Self-Tracking and the Quantified Self Promote Health and Well-Being: Systematic Review.”*Journal of Medical Internet Research* 23 (9): e25171.

Hektner, Joel M., Jennifer A. Schmidt, and Mihaly Csikszentmihalyi. 2007.*Experience Sampling Method: Measuring the Quality of Everyday Life*. Experience Sampling Method: Measuring the Quality of Everyday Life. Thousand Oaks, CA, US: Sage Publications, Inc.

Heyen, Nils B. 2020.“From Self-Tracking to Self-Expertise: The Production of Self-Related Knowledge by Doing Personal Science.”*Public Understanding of Science* 29 (2): 124–38.

Pelayo, Verónica Rivera. n.d.“Design and Application of Quantified Self Approaches for Reflective Learning in the Workplace,” 358.

Swan, Melanie. 2013.“The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery.”*Big Data* 1 (2): 85–99.

Verhagen, Simone J W, Laila Hasmi, Marjan Drukker, J van Os, and Philippe A E G Delespaul. 2016.“Use of the Experience Sampling Method in the Context of Clinical Trials.”*Evidence-Based Mental Health* 19 (3): 86–89.

Scrapbook to collect various models of how inequity on various axes arises and is maintained. Having models for this is a good idea; otherwise we need to try to guess what we can change, what we cannot, and what the tradeoffs are, using only feelings. Yet our feelings need help.

Cailin O’Connor’s The Origins of Unfairness: Social Categories and Cultural Evolution (O’Connor 2019).

Hilbe’s inequality model (Hauser et al. 2019) is another (?) game theoretic model of the difficulties of coordinating in an unequal society.

Possibly related: collective action.

Du, Nordell, and Joseph (2021) model how the promotion pathway becomes gendered, for non-obvious reasons:

The term *glass ceiling* is applied to the well-established phenomenon in which women and people of color are consistently blocked from reaching the upper-most levels of the corporate hierarchy. Focusing on gender, we present an agent-based model that explores how empirically established mechanisms of interpersonal discrimination coevolve with social norms at both the organizational (meso) and societal (macro) levels to produce this glass ceiling effect for women. Our model extends our understanding of how the glass ceiling arises, and why it can be resistant to change. We do so by synthesizing existing psychological and structural theories of discrimination into a mathematical model that quantifies explicitly how complex organizational systems can produce and maintain inequality. We discuss implications of our findings for both intervention and future empirical analyses, and provide open-source code for those wishing to adapt or extend our work.

Their model auditions various, apparently-individually-minor, points of adverse gender discrimination and finds that the overall result is large divergence between genders. This is a stylised model, but it matches my understanding of the world well.
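The compounding logic can be sketched in a few lines. Here is my own deterministic toy (the `fraction_women_by_level` helper is hypothetical, not part of the Du, Nordell, and Joseph agent-based model): a small per-level difference in promotion success rates accumulates into a large top-level disparity.

```python
import numpy as np

def fraction_women_by_level(n_levels=10, relative_success=0.9):
    """Fraction of women at each level of a hierarchy when women's
    per-level promotion success rate is `relative_success` times men's.
    A deterministic toy, not the Du/Nordell/Joseph agent-based model."""
    fractions = []
    w, m = 0.5, 0.5                # gender-balanced intake at the bottom
    for _ in range(n_levels):
        fractions.append(w / (w + m))
        w *= relative_success      # women pass each promotion gate slightly less often
    return np.array(fractions)

fracs = fraction_women_by_level()
# A 10% per-level handicap takes women from half the intake
# to barely over a quarter of the top level.
print(fracs.round(2))
```

The point of the sketch is only that no single gate needs to be very biased for the top of the hierarchy to end up very skewed.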

Why am I mentioning this here? Because I think it characterises an important dynamic that is often under-addressed in workplace equity arguments and seems to be under-addressed in favour of getting sidetracked with other stuff that doesn’t necessarily help so much.

TODO: How to address cumulative disadvantage models best? Quotas, formal sponsorship, etc?

- Matthew effect
- Cumulative inequality theory
- Tanya Khovanova’s Mathematical Model for Gender Bias in Mathematics. This is very simple but gets straight to the heart of one of the challenges of quota systems.
- Michele Coscia, Meritocracy vs Topocracy

Clifton et al. (2019) promote a minimum viable version of this:

Here, we present a minimal mathematical model that reveals the relative role that bias and homophily (self-seeking) may play in the ascension of women through professional hierarchies. Unlike previous models, our novel model predicts that gender parity is not inevitable, and deliberate intervention may be required to achieve gender balance in several fields. To validate the model, we analyze a new database of gender fractionation over time for 16 professional hierarchies. The decreasing representation of women at increasing levels of power within hierarchical professions has been called the “leaky pipeline” effect, but the main cause of this phenomenon remains contentious. Using a mathematical model of gender dynamics within professional hierarchies and a new database of gender fractionation over time, we quantify the impact of the two major decision-makers in the ascension of people through hierarchies: those applying for promotion and those who grant promotion. The model is the first to demonstrate that intervention may be required to reach gender parity in some fields.

Inequality between groups arises because groups can coordinate to capture more resources for themselves at the expense of another group. See conflict theory.

Borondo, J., F. Borondo, C. Rodriguez-Sickert, and C. A. Hidalgo. 2014.“To Each According to Its Degree: The Meritocracy and Topocracy of Embedded Markets.”*Scientific Reports* 4 (1): 3784.

Clauset, Aaron, Samuel Arbesman, and Daniel B. Larremore. 2015.“Systematic Inequality and Hierarchy in Faculty Hiring Networks.”*Science Advances* 1 (1): e1400005.

Clifton, Sara M., Kaitlin Hill, Avinash J. Karamchandani, Eric A. Autry, Patrick McMahon, and Grace Sun. 2019.“Mathematical Model of Gender Bias and Homophily in Professional Hierarchies.”*Chaos: An Interdisciplinary Journal of Nonlinear Science* 29 (2): 023135.

Coscia, Michele, and Clara Vandeweerdt. 2022.“Posts on Central Websites Need Less Originality to Be Noticed.”*Scientific Reports* 12 (1): 15265.

Crystal, Stephen, Dennis G Shea, and Adriana M Reyes. 2017.“Cumulative Advantage, Cumulative Disadvantage, and Evolving Patterns of Late-Life Inequality.”*The Gerontologist* 57 (5): 910–20.

Du, Yuhao, Jessica Nordell, and Kenneth Joseph. 2021.“Insidious Nonetheless: How Small Effects and Hierarchical Norms Create and Maintain Gender Disparities in Organizations.”*arXiv:2110.04196 [Cs]*, October.

Fehr, Ernst, and Klaus M. Schmidt. 1999.“A Theory of Fairness, Competition, and Cooperation.”*The Quarterly Journal of Economics* 114 (3): 817–68.

Gould, Roger V. 2002.“The Origins of Status Hierarchies: A Formal Theory and Empirical Test.”*American Journal of Sociology* 107 (5): 1143–78.

Hauser, Oliver P., Christian Hilbe, Krishnendu Chatterjee, and Martin A. Nowak. 2019.“Social Dilemmas Among Unequals.”*Nature* 572 (7770): 524–27.

Hauser, Oliver P., Gordon T. Kraft-Todd, David G. Rand, Martin A. Nowak, and Michael I. Norton. n.d.“Invisible Inequality Leads to Punishing the Poor and Rewarding the Rich.”*Behavioural Public Policy*, 1–21.

Hetzer, Moritz, and Didier Sornette. 2009.“Other-Regarding Preferences and Altruistic Punishment: A Darwinian Perspective.” SSRN Scholarly Paper ID 1468517. Rochester, NY: Social Science Research Network.

Hisano, Ryohei, Didier Sornette, and Takayuki Mizuno. 2011.“Predicted and Verified Deviations from Zipf’s Law in Ecology of Competing Products.”*Physical Review E* 84 (2): 026117.

Kaldasch, Joachim. 2012.“Evolutionary Model of the Personal Income Distribution.”*Physica A: Statistical Mechanics and Its Applications* 391 (22): 5628–42.

Merton, Robert K. 1968.“The Matthew Effect in Science.”*Science* 159 (3810): 56–63.

———. 1988.“The Matthew Effect in Science, II: Cumulative Advantage and the Symbolism of Intellectual Property.”*Isis* 79 (4): 606–23.

O’Connor, Cailin. 2019.*The Origins of Unfairness: Social Categories and Cultural Evolution*. Oxford, New York: Oxford University Press.

O’Connor, Cailin, Liam Kofi Bright, and Justin P. Bruner. 2019.“The Emergence of Intersectional Disadvantage.”*Social Epistemology* 33 (1): 23–41.

Pratto, Felicia, Jim Sidanius, and Shana Levin. 2006.“Social Dominance Theory and the Dynamics of Intergroup Relations: Taking Stock and Looking Forward.”*European Review of Social Psychology* 17 (1): 271–320.

Ross, Matthew B., Britta M. Glennon, Raviv Murciano-Goroff, Enrico G. Berkes, Bruce A. Weinberg, and Julia I. Lane. 2022.“Women Are Credited Less in Science Than Men.”*Nature*, June, 1–11.

Rowe, Mary. 1977.“The Saturn’s Rings Phenomenon.” In*Conference on Women’s Leadership and Authority in the Health Professions, Santa Cruz, CA*.

Salganik, Matthew J., Peter Sheridan Dodds, and Duncan J. Watts. 2006.“Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.”*Science* 311 (5762): 854–56.

Shippee, Tetyana P., Lindsay Rinaldo, and Kenneth F. Ferraro. 2012.“Mortality Risk Among Black and White Working Women: The Role of Perceived Work Trajectories.”*Journal of Aging and Health* 24 (1): 141–67.

Smith, Jennifer E, B Natterson-Horowitz, and Michael E Alfaro. 2021.“The Nature of Privilege: Intergenerational Wealth in Animal Societies.”*Behavioral Ecology*, December, arab137.

Venkatasubramanian, Suresh, Carlos Scheidegger, Sorelle Friedler, and Aaron Clauset. 2021.“Fairness in Networks: Social Capital, Information Access, and Interventions.” In*Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining*, 4078–79. KDD ’21. New York, NY, USA: Association for Computing Machinery.

Willson, Andrea E., Kim M. Shuey, and Jr. Elder Glen H. 2007.“Cumulative Advantage Processes as Mechanisms of Inequality in Life Course Health.”*American Journal of Sociology* 112 (6): 1886–1924.

Wolpert, David H. 2010.“Why Income Comparison Is Rational.”*Ecological Economics* 69 (2): 458–74.

Wooldredge, John, James Frank, Natalie Goulette, and Lawrence Travis III. 2015.“Is the Impact of Cumulative Disadvantage on Sentencing Greater for Black Defendants?”*Criminology & Public Policy* 14 (2): 187–223.

A trick where we cleverly transform rvs to sample from tricky target distributions via a “nice” source distribution. Useful in e.g. variational inference, especially autoencoders, for density estimation in probabilistic deep learning; best summarised as “fancy change of variables such that I can differentiate through the parameters of a distribution”, usually by MC. Storchastic credits this to Glasserman and Ho (1991) as *perturbation analysis*.

Connections to optimal transport and likelihood-free inference, in that this trick can enable some clever approximate-likelihood approaches.
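For concreteness, here is the trick in its most basic form as a numpy sketch (my own toy, not from any of the cited papers): to differentiate \(\mathbb{E}[\mathsf{z}^2]\) where \(\mathsf{z}\sim\mathcal{N}(\mu,\sigma^2)\), write \(\mathsf{z}=\mu+\sigma\varepsilon\) with \(\varepsilon\sim\mathcal{N}(0,1)\) and push the derivative inside the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_estimate(mu, sigma, n=200_000):
    """Reparameterisation-gradient estimate of the gradient of E[z^2]
    with respect to (mu, sigma), where z = mu + sigma * eps, eps ~ N(0,1)."""
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    # f(z) = z^2 so df/dz = 2z; chain rule through the reparameterisation:
    dmu = np.mean(2 * z)           # dz/dmu = 1
    dsigma = np.mean(2 * z * eps)  # dz/dsigma = eps
    return dmu, dsigma

dmu, dsigma = grad_estimate(1.5, 0.7)
# Analytically E[z^2] = mu^2 + sigma^2, so the true gradients are (2mu, 2sigma).
print(dmu, dsigma)
```

The same pattern, with autodiff doing the chain rule, is what the deep learning frameworks implement.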

Terminology:
All variables here are assumed to take values in \(\mathbb{R}^D\). If I am writing about a random *variable* I write it \(\mathsf{x}\) and if I am writing about the realized values, I write it \(x\). \(\mathsf{x}\sim p(x)\) should be read “The random variable \(\mathsf{x}\) has law with density \(p(x)\).”^{1}

- Rui Shu explains change of variables in probability and shows how it induces the normalizing flow idea.
- PyMC3 example of a non-trivial usage.
- Adam Kosiorek summarises some fancy variants of normalizing flow.
- Eric Jang did a tutorial which explains how this works in TensorFlow.
- Praveen on Ruiz, Titsias, and Blei (2016).
- Yuge Shi’s variational inference tutorial is a tour of cunning reparameterisation gradient tricks. Written for her paper Shi et al. (2019). She punts some details to Mohamed et al. (2020), which in turn tells me I need to read Figurnov, Mohamed, and Mnih (2018), Devroye (2006) and Jankowiak and Obermeyer (2018).
- Dustin Tran, Denoising Criterion for Variational Auto-Encoding Framework
- Shakir Mohamed, Machine Learning Trick of the Day (4): Reparameterisation Tricks
- Papamakarios on normalizing flows
- Eric Jang on normalizing flows

In variational autoencoders (VAEs) we talk about *normalizing flows*. AFAICT the terminology is meant to imply that the reparameterisation should be differentiable and bijective. Ingmar Shuster’s summary of the foundational Rezende and Mohamed (2015) has the obvious rant about terminology:

The paper adopts the term *normalizing flow* for referring to the plain old change of variables formula for integrals.

Some of the literature is tidily summarized in the Sylvester flow paper (Berg et al. 2018). I have had the dissertation (Papamakarios 2019) recommended to me a lot as a summary for this field, although I have not read it. The shorter summary paper looks good though (Papamakarios et al. 2021).

The setup for this application is to do variational inference, particularly in a deep learning context. It would be convenient for us while doing inference to have a way of handling an approximate posterior density over the latent variables \(\mathsf{z}\) given the data \(\mathsf{x}\). The real posterior density \(p(z|x)\) is intractable, so we construct an approximate density \(q_{\phi}(z|x)\) parameterised by some variational parameters \(\phi\). We have certain desiderata. Specifically, we would like to efficiently…

- calculate the density \(q_{\phi}(z|x)\), such that…
- we can estimate expectations/integrals with respect to \(q_{\phi}(\cdot|x)\), in the sense that we can estimate \(\int q_{\phi}(z|x)f(z) \mathrm{d} z\); we are satisfied to do this via Monte Carlo, so it will suffice if we can simulate from this density, and
- we can differentiate through this density with respect to the variational parameters to find \({\textstyle \frac{\partial }{\partial \phi }q_{\phi}(z|x)}.\)

Additionally we would like our method to be flexible enough that we can persuade ourselves that it is capable of approximating a complicated, messy posterior density such as might arise in the course of our inference; that is, we would like to buy all these convenient characteristics with the lowest possible cost in verisimilitude.

This kind of challenge arises naturally in the variational autoencoder problems (Diederik P. Kingma and Welling 2014) where those properties are enough to get us affordable approximate inference. The whole problem in such a case would entail solving for the following fairly typical kind of approximate variational objective:

\[\begin{aligned} \log p_{ \theta }\left( x\right) &\geq \mathcal{L}\left( \theta, \phi ; x\right)\\ &=\mathbb{E}_{q_{\phi}( z | x )}\left[-\log q_{\phi}( z | x )+ \log p_{ \theta }( x, z )\right]\\ &=-\operatorname{KL}\left(q_{\phi}\left( z | x\right) \| p_{ \theta }( z )\right)+ \mathbb{E}_{q_{\phi}\left( z | x\right)}\left[ \log p_{ \theta }\left( x | z \right)\right]\\ &=-\mathcal{F}(\theta, \phi) \end{aligned}\]

Here \(p_{\theta}(x)\) is the marginal likelihood of the generative model for the data, \(p_{\theta}(x|z)\) the density of observations given the latent factors, and \(\theta\) parameterises the density of the generative model. We pronounce \(\mathcal{F}(\theta, \phi)\) as variational free energy for reasons of tradition.
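The bound is easy to check numerically on a toy conjugate model where the evidence is available in closed form; all names below are my own. With \(z\sim\mathcal{N}(0,1)\), \(x|z\sim\mathcal{N}(z,1)\), the exact posterior is \(\mathcal{N}(x_0/2, 1/2)\) and the bound is tight exactly when \(q\) equals it.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Toy conjugate model: z ~ N(0, 1), x | z ~ N(z, 1), one observation x0.
x0 = 2.0
log_evidence = log_normal_pdf(x0, 0.0, 2.0)   # p(x0) = N(x0; 0, 2), closed form

def elbo(q_mean, q_var, n=100_000):
    """Monte Carlo estimate of E_q[log p(x0, z) - log q(z)] via reparameterisation."""
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n)
    log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x0, z, 1.0)
    log_q = log_normal_pdf(z, q_mean, q_var)
    return np.mean(log_joint - log_q)

# With q equal to the exact posterior N(x0/2, 1/2) the bound is tight;
# any other q gives a smaller value (up to Monte Carlo noise).
print(elbo(x0 / 2, 0.5), elbo(0.0, 1.0), log_evidence)
```

With the exact posterior as \(q\), each individual sample of \(\log p(x_0,z)-\log q(z)\) already equals \(\log p(x_0)\), so the estimator has zero variance; that is a pleasant sanity check peculiar to conjugate toys.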

(For the next bit, we temporarily suppress the dependence on \(\mathsf{x}\) to avoid repetition, and the dependence of the transforms and density upon \(\phi\).)

The reparameterization trick is an answer to those desiderata. We get the magical\(q\) of our dreams by requiring it to have a particular form, then using basic multivariate calculus to approximately enforce the required properties.

Specifically, we assume that for some function (not density!) \(f:\mathbb{R}^D\to\mathbb{R}^D,\) we have \[\mathsf{z}=f(\mathsf{z}_0)\] where \(\mathsf{z}_0\sim q_{0}=\mathcal{N}(0,\mathrm{I} )\) (or some similarly easy distribution). It will turn out that by imposing some extra restrictions, we can do most of the heavy lifting in this algorithm through this simple \(\mathsf{z}_0\sim q_0\) and still get the power of a fancy posterior \(\mathsf{z}\sim q.\)

Now we can calculate (sufficiently nice) expectations with respect to \(\mathsf{z}\) using the law of the unconscious statistician: for a test function \(g\), \[\mathbb{E} g(\mathsf{z})=\mathbb{E} g(f(\mathsf{z}_0))=\int g(f(z)) q_0(z) \mathrm{d} z.\]

However, we need to impose additional conditions to guarantee a tractable form for the densities; in particular we will impose the restriction that \(f\) is invertible, so that we can use the change of variables formula to find the density

\[q\left( z \right) =q_0( f^{-1}(z) )\left| \operatorname{det} \frac{\partial f^{-1}(z)}{\partial z }\right| =q_0( f^{-1}(z) )\left| \operatorname{det} \left.\frac{\partial f(z')}{\partial z' }\right|_{z'=f^{-1}(z)}\right|^{-1}.\]

We can economise on function inversions, since we always evaluate via simulating \(\mathsf{z}\) from \(f(\mathsf{z}_0)\), i.e. \(\mathsf{z}:=f( \mathsf{z}_0)\), so we can write

\[q\left( \mathsf{z} \right) =q_0(\mathsf{z}_0 )\left| \operatorname{det} \frac{\partial f(\mathsf{z}_0)}{\partial \mathsf{z}_0 }\right|^{-1}.\]

That is to say, we do not need \(f\) inversions (in this VAE context), just the Jacobian determinant, which might be easier. Spoiler: later on we construct some \(f\) transforms for which it is substantially easier to find that Jacobian determinant without inverting the function.
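This no-inversion-needed density formula can be sanity-checked numerically. A minimal scalar sketch (my own, under the assumption \(f(z)=\exp(z)\)): pushing a standard normal base through \(\exp\) gives the standard log-normal, whose density we know analytically.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_q0(z):
    """Standard normal base log-density."""
    return -0.5 * (np.log(2 * np.pi) + z ** 2)

# Flow f(z) = exp(z): the pushforward of N(0, 1) is the standard log-normal.
z0 = rng.standard_normal(5)
z = np.exp(z0)
# Density of z at the simulated points, using only z0 and the Jacobian of f
# at z0 -- no inversion of f required. Here log|f'(z0)| = z0.
log_q = log_q0(z0) - z0
# Analytic standard log-normal log-density for comparison:
log_q_analytic = -np.log(z) - 0.5 * (np.log(2 * np.pi) + np.log(z) ** 2)
print(np.allclose(log_q, log_q_analytic))
```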

In our application, which may as well be the VAE, why not, we want this mapping to depend upon \(\mathsf{x}\) and the parameters \(\phi\), so we reintroduce that dependence now. We parameterize these functions \(f_{\phi}(\mathsf{z}):=f(\mathsf{z},\phi(\mathsf{x}))\); that is, our \(\phi\) parameters are a learned mapping that sends a base sample \(\mathsf{z}_0\), given an observation \(\mathsf{x}\), to a posterior sample \(\mathsf{z}\) with density \(q_{\phi}\left( z | x \right)\) such that, if we configure everything just right, \(q_{\phi}\left( z | x \right) \simeq p_\theta(z|x)\).

And indeed it is not too hard to find a recipe for configuring these parameters so that we have the best attainable approximation.

Specifically, we may estimate the derivatives \(\nabla_{\theta} \mathcal{F}(\theta, \phi)\) and \(\nabla_{\phi} \mathcal{F}(\theta, \phi)\), which is sufficient to minimise \(\mathcal{F}(\theta, \phi)\) by gradient descent.

In practice we are doing this in a big-data context, so we will use stochastic subsamples from \(x\) to estimate the gradient, and we will also use Monte Carlo simulations from \(\mathsf{z}_0\) to estimate the necessary integrals with respect to \(q_{\phi}(\cdot|x),\) but you can read the paper for the details of that.

So far so good. But it turns out that this is not in general *very* tractable, because determinants are notoriously expensive to calculate, scaling poorly — \(\mathcal{O}(D^3)\) — in the number of dimensions, which we will expect to be large in trendy neural network type problems.

We look for restrictions on the form of \(f_{\phi}\) such that \(\operatorname{det} \frac{\partial f_{\phi}}{\partial z }\) is cheap and yet the approximate \(q\) they induce is still “flexible enough”.

The normalizing flow solution is to choose compositions of some class of *cheap* functions \[ \mathsf{z}_{K}=f_{K} \circ \ldots \circ f_{2} \circ f_{1}\left( \mathsf{z} _{0}\right)\sim q_K.\]

By induction on the change of variables formula (and using the logarithm for tidiness), we can find the density of the variable mapped through this flow as

\[\log q_{K}\left( \mathsf{z} _{K} | x \right) =\log q_{0}\left( \mathsf{z} _{0} | x \right) - \sum_{k=1}^{K} \log \left|\operatorname{det}\left(\frac{\partial f_{k}\left( \mathsf{z} _{k-1} \right)}{\partial \mathsf{z} _{k-1}}\right)\right| \]
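This telescoping log-determinant formula can be checked with a two-layer flow whose pushforward density we know analytically. A sketch of my own, assuming layers \(f_1(z)=\sigma z+\mu\) and \(f_2(z)=\exp(z)\): the composition maps \(\mathcal{N}(0,1)\) to \(\operatorname{LogNormal}(\mu,\sigma^2)\).

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.3, 1.2

# Flow 1: z -> sigma * z + mu  (log|det| = log sigma)
# Flow 2: z -> exp(z)          (log|det| = z)
z0 = rng.standard_normal(5)
z1 = sigma * z0 + mu
z2 = np.exp(z1)

log_q0 = -0.5 * (np.log(2 * np.pi) + z0 ** 2)
log_qK = log_q0 - np.log(sigma) - z1          # subtract each layer's log|det|

# The pushforward is LogNormal(mu, sigma^2); compare with its analytic density:
log_analytic = (-np.log(z2) - np.log(sigma)
                - 0.5 * (np.log(2 * np.pi) + ((np.log(z2) - mu) / sigma) ** 2))
print(np.allclose(log_qK, log_analytic))
```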

Compositions of such cheap functions should also be a cheap-ish way of buying flexibility. But which \(f_k\) mappings are cheap? The archetypal one from Rezende and Mohamed (2015) is the “planar flow”:

\[ f(\mathsf{z})= \mathsf{z} + u h\left( w ^{\top} \mathsf{z} +b\right)\]

where \(u, w \in \mathbb{R}^{D}, b \in \mathbb{R}\) and \(h:\mathbb{R}\to\mathbb{R}\) is some monotonic differentiable activation function.

There is a standard identity, the *matrix determinant lemma*, \(\operatorname{det}\left( \mathrm{I} + u v^{\top}\right)=1+ u ^{\top} v,\) from which it follows

\[\begin{aligned} \operatorname{det} \frac{\partial f}{\partial z } &=\operatorname{det}\left( \mathrm{I} + u h^{\prime}\left( w ^{\top} z +b\right) w ^{\top}\right) \\ &=1+ u ^{\top} h^{\prime}\left( w ^{\top} z +b\right) w. \end{aligned}\] We can often simply parameterise acceptable domains for functions like these so that they remain invertible; for example, if \(h\equiv \tanh\) a sufficient condition is \(w^\top u\geq -1.\) This means that we know it should be simple to parameterise these weights \(u,w,b\). Or, as in our application, that it is easy to construct functions \(\phi:x\mapsto u,w,b\) which are guaranteed to remain in an invertible domain.
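The matrix determinant lemma step is easy to verify numerically against a brute-force \(\mathcal{O}(D^3)\) determinant; a small sketch of my own, taking \(h=\tanh\):

```python
import numpy as np

rng = np.random.default_rng(4)
D = 6
u, w, z = rng.standard_normal((3, D))
b = 0.1

hprime = 1 - np.tanh(w @ z + b) ** 2            # tanh'(a) = 1 - tanh(a)^2
# Full Jacobian of the planar flow f(z) = z + u * tanh(w.z + b):
J = np.eye(D) + np.outer(u, w) * hprime
det_full = np.linalg.det(J)                     # O(D^3) the hard way
det_lemma = 1 + (u @ w) * hprime                # O(D), matrix determinant lemma
print(np.isclose(det_full, det_lemma))
```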

For functions like this the determinant calculation is cheap, scaling only linearly with the dimension of \(\mathsf{z}\). However, we might find it hard to persuade ourselves that this mapping is flexible enough to represent \(q\) well, at least not without letting \(K\) be large, as the mapping must pass through a univariate “bottleneck” \(w ^{\top} \mathsf{z}.\) Indeed, empirically this does not in fact perform well, and a lot of time has been spent trying to do better.

A popular solution to this problem is given by the *Sylvester flow* (Berg et al. 2018) which, instead of the *matrix determinant lemma*, uses a generalisation, *Sylvester’s determinant identity*. This tells us that for all \(\mathrm{A} \in \mathbb{R}^{D \times M}, \mathrm{B} \in \mathbb{R}^{M \times D}\),

\[ \operatorname{det}\left( \mathrm{I}_{D}+ \mathrm{A} \mathrm{B} \right) =\operatorname{det}\left( \mathrm{I}_{M}+ \mathrm{B} \mathrm{A} \right)\] where \(\mathrm{I}_{M}\) and \(\mathrm{I}_{D}\) are respectively the \(M\)- and \(D\)-dimensional identity matrices.

This suggests we might look at a generalized planar flow,

\[ f(\mathsf{z}):= \mathsf{z} + \mathrm{A} h\left(\mathrm{B} \mathsf{z} +b\right)\]

with \(\mathrm{A} \in \mathbb{R}^{D \times M}, \mathrm{B} \in \mathbb{R}^{M \times D}, b \in \mathbb{R}^{M},\) and \(M \leq D.\) The determinant calculation here scales as \(\mathcal{O}(M^3)\ll \mathcal{O}(D^3),\) which is at least cheaper than the general case, and (we hope) gives us enough additional scope to design sufficiently flexible flows, since we have a bottleneck of size \(M \geq 1.\)
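Sylvester’s identity applied to the Jacobian \(\mathrm{I}_D+\mathrm{A}\operatorname{diag}(h'(\mathrm{B}z+b))\mathrm{B}\) reduces the \(D\times D\) determinant to an \(M\times M\) one, which we can confirm numerically (my own sketch, again with \(h=\tanh\)):

```python
import numpy as np

rng = np.random.default_rng(5)
D, M = 50, 3
A = rng.standard_normal((D, M))
B = rng.standard_normal((M, D))
b = rng.standard_normal(M)
z = rng.standard_normal(D)

hprime = np.diag(1 - np.tanh(B @ z + b) ** 2)            # M x M diagonal
# Jacobian of f(z) = z + A tanh(Bz + b) is I_D + A diag(h') B.
det_big = np.linalg.det(np.eye(D) + A @ hprime @ B)      # O(D^3)
det_small = np.linalg.det(np.eye(M) + hprime @ B @ A)    # O(M^3), via Sylvester
print(np.isclose(det_big, det_small))
```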

The price is an additional parameterisation problem. How do we select \(\mathrm{A}\) and \(\mathrm{B}\) such that they are still invertible for a given \(h\)? The solution in Berg et al. (2018) is to break this into two simpler parameterization problems.

They choose \(f(\mathsf{z}):=\mathsf{z}+\mathrm{Q} \mathrm{R} h\left(\tilde{\mathrm{R}} \mathrm{Q}^{T} \mathsf{z}+\mathrm{b}\right)\) where \(\mathrm{R}\) and \(\tilde{\mathrm{R}}\) are upper triangular \(M \times M\) matrices, and \(\mathrm{Q}\) is a \(D\times M\) matrix with orthonormal columns \(\mathrm{Q}=\left(\mathrm{q}_{1} \ldots \mathrm{q}_{M}\right).\) Using the Sylvester identity on this \(f\) we find

\[\operatorname{det} \frac{\partial f}{\partial z } =\operatorname{det}\left( \mathrm{I} _{M}+\operatorname{diag}\left(h^{\prime}\left(\tilde{\mathrm{R} } \mathrm{Q} ^{T} z + b \right)\right) \tilde{\mathrm{R} } \mathrm{R} \right)\]

They also show that if, in addition,

- \(h: \mathbb{R} \longrightarrow \mathbb{R}\) is a smooth function with bounded, positive derivative and
- if the diagonal entries of \(\mathrm{R}\) and \(\tilde{\mathrm{R}}\) satisfy \(r_{i i} \tilde{r}_{i i}>-1 /\left\|h^{\prime}\right\|_{\infty}\) and
- \(\tilde{\mathrm{R}}\) is invertible,

then \(f\) is invertible as required.

Now all we need to do to action this is choose a differentiable parameterisation of the upper-triangular \(\mathrm{R},\) the upper-triangular invertible \(\tilde{\mathrm{R} }\) with appropriate diagonal entries, and the orthonormal matrix \(\mathrm{Q}\). That is a whole other story though.
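One possible recipe, sketched under my own assumptions rather than following Berg et al.’s actual construction: obtain \(\mathrm{Q}\) by QR-decomposing an unconstrained \(D\times M\) matrix, squash \(\operatorname{diag}(\mathrm{R})\) into \((-1,1)\) with \(\tanh\) and \(\operatorname{diag}(\tilde{\mathrm{R}})\) into \((0,1)\) with a sigmoid, so that \(r_{ii}\tilde{r}_{ii}\in(-1,1)>-1\) (sufficient for \(h=\tanh\)) and \(\tilde{\mathrm{R}}\) stays invertible.

```python
import numpy as np

rng = np.random.default_rng(6)
D, M = 8, 3

def make_sylvester_params(theta_q, diag_r, diag_rt, offdiag_r, offdiag_rt):
    """A hypothetical parameterisation of valid Sylvester-flow weights
    for h = tanh (so ||h'||_inf = 1, and we need r_ii * rt_ii > -1):
    - Q: orthonormal columns via (reduced) QR of an unconstrained matrix;
    - diag(R) squashed into (-1, 1) with tanh;
    - diag(R~) squashed into (0, 1) with a sigmoid (positive => invertible)."""
    Q, _ = np.linalg.qr(theta_q)
    R = np.triu(offdiag_r, k=1) + np.diag(np.tanh(diag_r))
    Rt = np.triu(offdiag_rt, k=1) + np.diag(1 / (1 + np.exp(-diag_rt)))
    return Q, R, Rt

Q, R, Rt = make_sylvester_params(
    rng.standard_normal((D, M)), rng.standard_normal(M),
    rng.standard_normal(M), rng.standard_normal((M, M)),
    rng.standard_normal((M, M)),
)
print(np.allclose(Q.T @ Q, np.eye(M)))         # orthonormal columns
print(np.all(np.diag(R) * np.diag(Rt) > -1))   # invertibility condition holds
```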

The parallel with the problem of finding covariance kernels is interesting. In both cases we have some large function class we wish to parameterise so we can search over it, but we restrict it to a subset with computationally convenient properties and a simple domain. This probably arises in other nonparametrics?

Koehler, Mehta, and Risteski (2020) ask whether the invertibility restriction costs us depth:

The most fundamental restriction of the normalizing flow paradigm is that each layer needs to be invertible. We ask whether this restriction has any ‘cost’ in terms of the size, and in particular the depth, of the model. Here we’re counting depth in terms of the number of the invertible transformations that make up the flow. A requirement for large depth would explain training difficulties due to exploding (or vanishing) gradients. Since the Jacobian of a composition of functions is the product of the Jacobians of the functions being composed, the min (max) singular value of the Jacobian of the composition is the product of the min (max) singular value of the Jacobians of the functions. This implies that the smallest (largest) singular value of the Jacobian will get exponentially smaller (larger) with the number of compositions.

A natural way of formalizing this question is by exhibiting a distribution which is easy to model for an unconstrained generator network but hard for a shallow normalizing flow. Precisely, we ask: is there a probability distribution that can be represented by a shallow generator with a small number of parameters that could not be approximately represented by a shallow composition of invertible transformations?

We demonstrate that such a distribution exists.

…To reiterate the takeaway: a GLOW-style linear layer in between affine couplings could in theory make your network between 5 and 47 times smaller while representing the same function. We now have a precise understanding of the value of that architectural choice!

TBC

Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savare. 2008.*Gradient Flows: In Metric Spaces and in the Space of Probability Measures*. 2nd ed. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel.

Bamler, Robert, and Stephan Mandt. 2017.“Structured Black Box Variational Inference for Latent Time Series Models.”*arXiv:1707.01069 [Cs, Stat]*, July.

Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018.“Sylvester Normalizing Flows for Variational Inference.” In*UAI18*.

Caterini, Anthony L., Arnaud Doucet, and Dino Sejdinovic. 2018.“Hamiltonian Variational Auto-Encoder.” In*Advances in Neural Information Processing Systems*.

Charpentier, Bertrand, Oliver Borchert, Daniel Zügner, Simon Geisler, and Stephan Günnemann. 2022.“Natural Posterior Network: Deep Bayesian Uncertainty for Exponential Family Distributions.”*arXiv:2105.04471 [Cs, Stat]*, March.

Chen, Changyou, Chunyuan Li, Liqun Chen, Wenlin Wang, Yunchen Pu, and Lawrence Carin. 2017.“Continuous-Time Flows for Efficient Inference and Density Estimation.”*arXiv:1709.01179 [Stat]*, September.

Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018.“Neural Ordinary Differential Equations.” In*Advances in Neural Information Processing Systems 31*, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 6572–83. Curran Associates, Inc.

Devroye, Luc. 2006.“Chapter 4 Nonuniform Random Variate Generation.” In*Simulation*, edited by Shane G. Henderson and Barry L. Nelson, 13:83–121. Handbooks in Operations Research and Management Science. Elsevier.

Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. 2016.“Density Estimation Using Real NVP.” In*Advances In Neural Information Processing Systems*.


I am aware from painful experience that using a font to denote a random variable will irritate some people, but these are usually the same people who have strong opinions about “which” versus “that” and other such bullshit, and such people are welcome to their opinions, but they can get their own blog to express those opinions on, in notations which suit their purposes. In my context, things come out *much* clearer if we can distinguish RVs, their realisations, densities and observations using some appropriate notation which is not capitalisation or italics, which I already use for other purposes.↩︎

If you want to do SVGs, snap.js is a modern SVG library by the author of raphaël.js.

3D graphics in the browser is convenient and somewhat performant. And you can do 3d dance parties in the browser, which is surely the benchmark.

Several common options build on OpenGL ES, the mobile- and browser-friendly flavour of OpenGL.

- Scenejs seems to specialise in loading up geometries, shapes and physics for realistic scene modelling
- three.js does the same things, but does more abstract stuff with them
- shadertoy supports writing raw GLSL shaders in the browser.
- Use.GPU reimagines GPU scene painting via javascript FRP idioms

Everything supports lens flare, which is the main thing.

For desktop apps and a larger OpenGL subset there ~~is~~ was a desktop option, Plask, which seems to be some kind of particle-system-friendly macOS app, with ~~spurty~~ no ongoing development but many elegant applications.

cables.gl attempts to create a VVVV-style patcher in javascript.

cables is your model kit for creating beautiful interactive content. With an easy to navigate interface and results in real time, it allows for fast prototyping and prompt adjustments.

Working with cables is just as easy as creating cable spaghetti:

You are provided with a given set of operators such as mathematical functions, shapes and materials. Connect these to each other using virtual cables to create the scene you have in mind. Easily export your piece of work at any time. Embed it into your website or use it for any kind of creative installation.

Threefab designs THREE.js scenes.

Nunustudio does too (source code).

Threenodes.js (source) also attempts to create a VVVV-style patcher in javascript

Neuroglancer is a WebGL-based viewer for volumetric data. It is capable of displaying arbitrary (non axis-aligned) cross-sectional views of volumetric data, as well as 3-D meshes and line-segment based models (skeletons).

p5.js is a port of Processing to JS. Seems CPU-profligate.

centi.js hmm.

The *big six* exact matrix decompositions are (Stewart 2000): Cholesky decomposition; pivoted LU decomposition; QR decomposition; spectral decomposition; Schur decomposition; and singular value decomposition.

See Nick Higham’s summary of those.

Mastered QR and LU decompositions? There are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them, especially if we suspect our matrix is sparse, or could be made sparse because of some underlying constraint, or probably could, if squinted at in the right fashion, be seen as a graph transition matrix, or a Laplacian, or a noisy transform of some smooth object, or at least would be close to sparse if we chose the right metric, or…

A big matrix is close to, in some sense, the (tensor/matrix) product (or sum, or…) of some matrices that are in some way simple (small-rank, small dimension, sparse), possibly with additional constraints. Can we find those simple matrices?
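For the plain low-rank case the answer is classical: the Eckart–Young theorem says the best rank-\(r\) approximation, in Frobenius or spectral norm, is the truncated SVD. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 100 x 80 matrix that is exactly rank 5, plus a whiff of noise.
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
A += 1e-3 * rng.standard_normal(A.shape)

# Eckart-Young: the best rank-r approximation in Frobenius or spectral
# norm is the truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 5
A_r = U[:, :r] * s[:r] @ Vt[:r, :]

rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(f"relative error of the rank-{r} approximation: {rel_err:.2e}")
```

Most of the factorisations below can be read as variations on this theme with extra constraints (nonnegativity, sparsity, masks) that make the problem harder than one SVD.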

Ethan Epperly’s introduction toLow-rank Matrices puts many ideas clearly.

Here’s an example: Godec — a decomposition into low-rank *and* sparse components which, loosely speaking, combines multidimensional factorisation and outlier detection.
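Loosely in that spirit (and emphatically not Godec’s actual algorithm), here is a crude alternating sketch that peels planted sparse outliers off a planted low-rank component; the rank and threshold are assumed known, which real methods must estimate:

```python
import numpy as np

rng = np.random.default_rng(7)

# Planted low-rank part plus a few large sparse outliers.
m, n, r = 80, 60, 3
L_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
S_true = np.zeros((m, n))
mask = rng.random((m, n)) < 0.01
S_true[mask] = 20.0 * rng.choice([-1.0, 1.0], size=mask.sum())
A = L_true + S_true

# Crude alternation: SVD-truncate for the low-rank part, hard-threshold
# the residual for the sparse part.
S = np.zeros_like(A)
for _ in range(30):
    U, s, Vt = np.linalg.svd(A - S, full_matrices=False)
    L = U[:, :r] * s[:r] @ Vt[:r]
    resid = A - L
    S = np.where(np.abs(resid) > 10.0, resid, 0.0)

err = np.linalg.norm(L - L_true) / np.linalg.norm(L_true)
print(f"relative error on the low-rank part: {err:.2e}")
```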

There are so many more of these things, depending on your preferred choice of loss function, free variables and such.

Keywords: matrix sketching, low-rank approximation, traditional dimensionality reduction.

Matrix concentration inequalities turn out to be a useful tool.
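A concrete instance of sketching: a Gaussian random projection compresses high-dimensional points while approximately preserving pairwise distances (Johnson–Lindenstrauss). A minimal sketch, with dimensions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, k = 500, 5000, 400            # n points in d dimensions, sketched to k
X = rng.standard_normal((n, d))

# Dense Gaussian sketching matrix; Achlioptas-style random signs work too.
S = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ S                           # the sketch: n x k instead of n x d

# Johnson-Lindenstrauss: pairwise distances are approximately preserved.
orig = np.linalg.norm(X[0] - X[1])
sketched = np.linalg.norm(Y[0] - Y[1])
ratio = sketched / orig
print(f"distance distortion: {ratio:.3f}")
```

The distortion concentrates around 1 at rate roughly \(1/\sqrt{k}\), which is what the concentration inequalities quantify.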

I would like to learn more about

- sparse or low-rank matrix approximation as clustering for density estimation, which is how I imagine high-dimensional mixture models would need to work, and thereby also
- Mercer kernel approximation.
- The connection to manifold learning is also probably worth examining.

Igor Carron’s Matrix Factorization Jungle classifies the following problems as matrix-factorisation type.

- Kernel Factorizations
- …
- Spectral clustering
- \(A = DX\) with unknown \(D\) and \(X\); solve for sparse \(X\) and \(X_i = 0\) or \(1\)
- K-Means / K-Median clustering
- \(A = DX\) with unknown \(D\) and \(X\); solve for \(XX^{\top} = I\) and \(X_i = 0\) or \(1\)
- Subspace clustering
- \(A = AX\) with unknown \(X\); solve for sparse/other conditions on \(X\)
- Graph Matching
- \(A = XBX^{\top}\) with unknown \(X\), \(B\); solve for \(B\) and \(X\) as a permutation
- NMF
- \(A = DX\) with unknown \(D\) and \(X\); solve for elements of \(D\), \(X\) positive
- Generalized Matrix Factorization
- \(W\odot L - W\odot UV'\) with \(W\) a known mask, \(U\), \(V\) unknown; solve for \(U\), \(V\) and \(L\) lowest rank possible
- Matrix Completion
- \(A = H\odot L\) with \(H\) a known mask, \(L\) unknown; solve for \(L\) lowest rank possible
- Stable Principal Component Pursuit (SPCP) / Noisy Robust PCA
- \(A = L + S + N\) with \(L\), \(S\), \(N\) unknown; solve for \(L\) low rank, \(S\) sparse, \(N\) noise
- Robust PCA
- \(A = L + S\) with \(L\), \(S\) unknown; solve for \(L\) low rank, \(S\) sparse
- Sparse PCA
- \(A = DX\) with unknown \(D\) and \(X\); solve for sparse \(D\)
- Dictionary Learning
- \(A = DX\) with unknown \(D\) and \(X\); solve for sparse \(X\)
- Archetypal Analysis
- \(A = DX\) with unknown \(D\) and \(X\); solve for \(D = AB\) with \(D\), \(B\) positive
- Matrix Compressive Sensing (MCS)
- find a rank-\(r\) matrix \(L\) such that \(A(L) \approx b\) / or \(A(L+S) \approx b\)
- Multiple Measurement Vector (MMV)
- \(Y = AX\) with unknown \(X\) and rows of \(X\) sparse
- Compressed sensing
- \(Y = AX\) with unknown \(X\) and rows of \(X\) sparse; \(X\) is one column
- Blind Source Separation (BSS)
- \(Y = AX\) with unknown \(A\) and \(X\) and statistical independence between columns of \(X\) or subspaces of columns of \(X\)
- Partial and Online SVD/PCA
- …
- Tensor Decomposition
- Many, many options; see tensor decompositions for some tractable ones.

Truncated classic PCA is clearly also an example, but is excluded from the list for some reason. Boringness? The fact that it’s a special case of Sparse PCA?
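To make one entry of the jungle concrete: the classic multiplicative updates of Lee and Seung solve the NMF problem \(A = DX\) with nonnegative factors. A minimal sketch for the Frobenius loss, not a tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonnegative data with planted rank-3 structure: A = D X, all entries >= 0.
D_true = rng.uniform(size=(60, 3))
X_true = rng.uniform(size=(3, 40))
A = D_true @ X_true

# Lee-Seung multiplicative updates: each step rescales the factors
# entrywise, so nonnegativity is preserved automatically.
r, eps = 3, 1e-9
D = rng.uniform(size=(60, r))
X = rng.uniform(size=(r, 40))
for _ in range(500):
    X *= (D.T @ A) / (D.T @ D @ X + eps)
    D *= (A @ X.T) / (D @ X @ X.T + eps)

rel_err = np.linalg.norm(A - D @ X) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Note the recovered \(D\), \(X\) are not the planted ones; NMF is only identifiable up to scaling and permutation at best, which is part of why the uniqueness literature cited below exists.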

I also add

- Square root
- \(Y = X^{\top}X\) for \(Y\in\mathbb{R}^{N\times N}, X\in\mathbb{R}^{n\times N}\), with (typically) \(n<N\).
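A minimal sketch of that square-root factorisation via the eigendecomposition, assuming \(Y\) is positive semi-definite with known rank \(n\):

```python
import numpy as np

rng = np.random.default_rng(3)

# A rank-4 PSD matrix Y of size N x N.
N, n = 50, 4
B = rng.standard_normal((n, N))
Y = B.T @ B

# Low-rank symmetric "square root": keep the top-n eigenpairs, so that
# Y = X.T @ X with X of size n x N.
w, V = np.linalg.eigh(Y)                        # ascending eigenvalues
w, V = w[-n:], V[:, -n:]                        # keep the top n
X = np.sqrt(np.clip(w, 0.0, None))[:, None] * V.T   # n x N

err = np.linalg.norm(Y - X.T @ X) / np.linalg.norm(Y)
print(f"relative error: {err:.2e}")
```

The Cholesky factor is another square root, but the truncated eigendecomposition is the natural one when \(n < N\).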

See also learning on manifolds, compressed sensing, optimisation, randomised linear algebra, clustering, sparse regression…

For certain types of data matrix, here is a suggestive observation: Udell and Townsend (2019) ask “Why Are Big Data Matrices Approximately Low Rank?”

Matrices of (approximate) low rank are pervasive in data science, appearing in movie preferences, text documents, survey data, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention paid to explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an \(m\times n\) matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as \(\mathcal{O}(\log(m+n))\). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.

Ethan Epperly argues from a function approximation perspective (e.g.) that we can deduce this property from the smoothness of functions.
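The claim is easy to check numerically: generate entries as a smooth function of row and column latent variables and watch the singular values collapse. The Gaussian-kernel choice below is just an illustrative pick:

```python
import numpy as np

rng = np.random.default_rng(4)

# Each row/column is tied to a bounded latent variable; entries come from
# a smooth (analytic) function of the two latents. Generically full rank,
# but numerically low rank.
m, n = 300, 300
x = rng.uniform(-1, 1, size=m)
y = rng.uniform(-1, 1, size=n)
M = np.exp(-(x[:, None] - y[None, :]) ** 2)

s = np.linalg.svd(M, compute_uv=False)
# Numerical rank: singular values above 1e-6 of the largest.
rank_eps = int(np.sum(s > 1e-6 * s[0]))
print(f"numerical rank at 1e-6: {rank_eps} out of {min(m, n)}")
```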

Saul (2023) connects non-negative matrix factorisation to geometric algebra and linear algebra via deep learning and kernels. That sounds like fun.

- Data mining seminar: Matrix sketching
- Kumar and Shneider have a literature survey on low rank approximation of matrices (Kumar and Shneider 2016)
- Preconditioning tutorial by Erica Klarreich
- Andrew McGregor’s ICML Tutorial Streaming, sampling, sketching
- more at signals and graphs.
- Another one that makes the link to clustering is Chris Ding’s Principal Component Analysis and Matrix Factorizations for Learning
- Igor Carron’s Advanced Matrix Factorization Jungle.

Total Least Squares (a.k.a. orthogonal distance regression, or errors-in-variables least-squares linear regression) is a low-rank matrix approximation that minimises the Frobenius distance from the data matrix. Who knew?
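The SVD recipe: stack the variables into one matrix and take the right singular vector of the smallest singular value as the normal of the fitted subspace. A sketch for a line through the origin, with both coordinates noisy:

```python
import numpy as np

rng = np.random.default_rng(5)

# Errors-in-variables line fit: both x and y carry measurement noise.
n, true_slope = 200, 2.0
x_clean = rng.uniform(-1, 1, size=n)
x = x_clean + 0.05 * rng.standard_normal(n)
y = true_slope * x_clean + 0.05 * rng.standard_normal(n)

# TLS: the fitted subspace is the best rank-1 approximation of [x y];
# the normal vector is the right singular vector of the smallest
# singular value.
Z = np.column_stack([x, y])
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
v = Vt[-1]                        # normal of the fitted line v[0]x + v[1]y = 0
slope_tls = -v[0] / v[1]
print(f"TLS slope estimate: {slope_tls:.3f}")
```

Ordinary least squares would attenuate the slope slightly here, because it attributes all the noise to \(y\); TLS splits the blame.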

Various other dimensionality reduction techniques can be put in a regression framing, notably Exponential-family PCA.

“Sketching” is a common term to describe a certain type of low-rank factorisation, although I am not sure which types. 🏗

(Martinsson 2016) mentions CUR and interpolative decompositions. Does preconditioning fit?

It seems like low-rank matrix factorisation could be related to \(\mathcal{H}\)-matrix methods, as seen in, e.g., covariance matrices, but I do not know enough to say more.

See hmatrix.org for one lab’s backgrounder and their implementation, h2lib; see hlibpro for a black-box closed-source one.

Rather than find an optimal solution, why not just choose a random one which might be good enough? There are indeed randomised versions.
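The randomised range finder of Halko, Martinsson, and Tropp (2010) is the canonical example: sample the range of the matrix with a random test matrix, then do a small dense SVD in that subspace. A minimal sketch (the function name and oversampling default are my own):

```python
import numpy as np

def randomized_svd(A, r, oversample=10, rng=None):
    """Halko-Martinsson-Tropp style randomized range finder + small SVD."""
    if rng is None:
        rng = np.random.default_rng()
    # Sample the range of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((A.shape[1], r + oversample))
    Q, _ = np.linalg.qr(A @ Omega)     # orthonormal basis for (most of) range(A)
    # Project A onto that basis and do a small dense SVD there.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :r], s[:r], Vt[:r]

rng = np.random.default_rng(6)
A = rng.standard_normal((1000, 8)) @ rng.standard_normal((8, 600))
U, s, Vt = randomized_svd(A, r=8, rng=rng)
err = np.linalg.norm(A - U * s @ Vt) / np.linalg.norm(A)
print(f"relative error: {err:.2e}")
```

The point is the cost: one tall-skinny matmul and a tiny SVD, instead of a full decomposition of the big matrix.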

See (Grosse et al. 2012) for a mind-melting compositional matrix factorization diagram, constructing a search over hierarchical kernel decompositions that also turn out to have some matrix factorisation interpretations.

Nakajima and Sugiyama (2012):

Mnih and Salakhutdinov (2008) proposed a Bayesian maximum a posteriori (MAP) method based on the Gaussian noise model and Gaussian priors on the decomposed matrices. This method actually corresponds to minimizing the squared-loss with the trace-norm penalty (Srebro, Rennie, and Jaakkola 2004). Recently, the variational Bayesian (VB) approach (Attias 1999) has been applied to MF (Lim and Teh 2007; Raiko, Ilin, and Karhunen 2007), which we refer to as VBMF. The VBMF method was shown to perform very well in experiments. However, its good performance was not completely understood beyond its experimental success.

☜ Insert further developments here. Possibly Brouwer’s thesis (Brouwer 2017) or Shakir Mohamed’s (Mohamed 2011) would be a good start, or Benjamin Drave’s tutorial, Probabilistic Matrix Factorization, and Xinghao Ding, Lihan He, and Carin (2011).

I am currently sitting in a seminar by He Zhao on Bayesian matrix factorisation, wherein he is building up this tool for discrete data, which is an interesting case. He starts from M. Zhou et al. (2012) and builds up to Zhao et al. (2018), introducing some hierarchical descriptions along the way. His methods seem to be sampling-based rather than variational (?).

Generalized² Linear² models (Gordon 2002) unify nonlinear matrix factorisations with Generalized Linear Models. I had not heard of them until recently; I wonder how common they are?

“Enough theory! Plug this algorithm into my data!”

OK.

NMF Toolbox (MATLAB and Python)

Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants—in particular, this paper gives an overview for two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.

Vowpal Wabbit factors matrices, e.g. for recommender systems. It seems the `--qr` version is more favoured.

HPC for MATLAB, R, Python, C++: libpmf:

LIBPMF implements the CCD++ algorithm, which aims to solve large-scale matrix factorization problems such as the low-rank factorization problems for recommender systems.

NMF (R) 🏗

Matlab: Chih-Jen Lin’s nmf.m - “This tool solves NMF by alternative non-negative least squares using projected gradients. It converges faster than the popular multiplicative update approach.”

In this repository, we offer both MPI and OPENMP implementation for MU, HALS and ANLS/BPP based NMF algorithms. This can run off the shelf as well easy to integrate in other source code. These are very highly tuned NMF algorithms to work on super computers. We have tested this software in NERSC as well OLCF cluster. The openmp implementation is tested on many different Linux variants with intel processors. The library works well for both sparse and dense matrix. (Fairbanks et al. 2015; Kannan, Ballard, and Park 2016; Kannan 2016)

Spams (C++/MATLAB/python) includes some matrix factorisations in its sparse approx toolbox. (seeoptimisation)

`scikit-learn` (Python) does a few matrix factorisations in its inimitable batteries-in-the-kitchen-sink way.

… is a Python library for nonnegative matrix factorization. It includes implementations of several factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representation are supported.”

Tapkee (C++). Pro-tip — even without coding C++, tapkee does a long list of dimensionality reduction from the CLI.

- PCA and randomized PCA
- Kernel PCA (kPCA)
- Random projection
- Factor analysis

tensorly supports some interesting tensor decompositions.

Aarabi, Hadrien Foroughmand, and Geoffroy Peeters. 2018. “Music Retiler: Using NMF2D Source Separation for Audio Mosaicing.” In *Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion*, 27:1–7. AM’18. New York, NY, USA: ACM.

Abdallah, Samer A., and Mark D. Plumbley. 2004. “Polyphonic Music Transcription by Non-Negative Sparse Coding of Power Spectra.” In.

Achlioptas, Dimitris. 2003. “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” *Journal of Computer and System Sciences*, Special Issue on PODS 2001, 66 (4): 671–87.

Aghasi, Alireza, Nam Nguyen, and Justin Romberg. 2016. “Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks.” *arXiv:1611.05162 [Cs, Stat]*, November.

Ang, Andersen Man Shun, and Nicolas Gillis. 2018. “Accelerating Nonnegative Matrix Factorization Algorithms Using Extrapolation.” *Neural Computation* 31 (2): 417–39.

Arora, Sanjeev, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. 2012. “A Practical Algorithm for Topic Modeling with Provable Guarantees.” *arXiv:1212.4777 [Cs, Stat]*, December.

Attias, Hagai. 1999. “Inferring Parameters and Structure of Latent Variable Models by Variational Bayes.” In *Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence*, 21–30. UAI’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Babacan, S. Derin, Martin Luessi, Rafael Molina, and Aggelos K. Katsaggelos. 2012. “Sparse Bayesian Methods for Low-Rank Matrix Estimation.” *IEEE Transactions on Signal Processing* 60 (8): 3964–77.

Bach, Francis. 2013. “Convex Relaxations of Structured Matrix Factorizations.” *arXiv:1309.3117 [Cs, Math]*, September.

Bach, Francis R. 2013. “Sharp Analysis of Low-Rank Kernel Matrix Approximations.” In *COLT*, 30:185–209.

Bach, Francis R, and Michael I Jordan. 2002. “Kernel Independent Component Analysis.” *Journal of Machine Learning Research* 3 (July): 48.

Bach, Francis, Rodolphe Jenatton, and Julien Mairal. 2011. *Optimization With Sparsity-Inducing Penalties*. Foundations and Trends(r) in Machine Learning 1.0. Now Publishers Inc.

Bagge Carlson, Fredrik. 2018. “Machine Learning and System Identification for Estimation in Physical Systems.” Thesis/docmono, Lund University.

Barbier, Jean, Nicolas Macris, and Léo Miolane. 2017. “The Layered Structure of Tensor Estimation and Its Mutual Information.” *arXiv:1709.10368 [Cond-Mat, Physics:math-Ph]*, September.

Batson, Joshua, Daniel A. Spielman, and Nikhil Srivastava. 2008. “Twice-Ramanujan Sparsifiers.” *arXiv:0808.0163 [Cs]*, August.

Bauckhage, Christian. 2015. “K-Means Clustering Is Matrix Factorization.” *arXiv:1512.07548 [Stat]*, December.

Berry, Michael W., Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. 2007. “Algorithms and Applications for Approximate Nonnegative Matrix Factorization.” *Computational Statistics & Data Analysis* 52 (1): 155–73.

Bertin, N., R. Badeau, and E. Vincent. 2010. “Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription.” *IEEE Transactions on Audio, Speech, and Language Processing* 18 (3): 538–49.

Brouwer, Thomas Alexander. 2017. “Bayesian Matrix Factorisation: Inference, Priors, and Data Integration.”

Bruckstein, A. M., Michael Elad, and M. Zibulevsky. 2008a. “Sparse Non-Negative Solution of a Linear System of Equations Is Unique.” In *3rd International Symposium on Communications, Control and Signal Processing, 2008. ISCCSP 2008*, 762–67.

———. 2008b. “On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations.” *IEEE Transactions on Information Theory* 54 (11): 4813–20.

Buch, Michael, Elio Quinton, and Bob L Sturm. 2017. “NichtnegativeMatrixFaktorisierungnutzendesKlangsynthesenSystem (NiMFKS): Extensions of NMF-Based Concatenative Sound Synthesis.” In *Proceedings of the 20th International Conference on Digital Audio Effects*, 7. Edinburgh.

Caetano, Marcelo, and Xavier Rodet. 2013. “Musical Instrument Sound Morphing Guided by Perceptually Motivated Features.” *IEEE Transactions on Audio, Speech, and Language Processing* 21 (8): 1666–75.

Cao, Bin, Dou Shen, Jian-Tao Sun, Xuanhui Wang, Qiang Yang, and Zheng Chen. n.d. “Detect and Track Latent Factors with Online Nonnegative Matrix Factorization.” In.

Carabias-Orti, J. J., T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” *IEEE Journal of Selected Topics in Signal Processing* 5 (6): 1144–58.

Chen, Yudong, and Yuejie Chi. n.d. “Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation,” 29.

Chi, Yuejie, Yue M. Lu, and Yuxin Chen. 2019. “Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview.” *IEEE Transactions on Signal Processing* 67 (20): 5239–69.

Cichocki, A., N. Lee, I. V. Oseledets, A.-H. Phan, Q. Zhao, and D. Mandic. 2016. “Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1.” *arXiv:1609.00893 [Cs]*, September.

Cichocki, A., R. Zdunek, and S. Amari. 2006. “New Algorithms for Non-Negative Matrix Factorization in Applications to Blind Source Separation.” In *2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings*, 5:V–.

Cohen, Albert, Ingrid Daubechies, and Jean-Christophe Feauveau. 1992. “Biorthogonal Bases of Compactly Supported Wavelets.” *Communications on Pure and Applied Mathematics* 45 (5): 485–560.

Combettes, Patrick L., and Jean-Christophe Pesquet. 2008. “A Proximal Decomposition Method for Solving Convex Variational.” *Inverse Problems* 24 (6): 065014.

Dasarathy, Gautam, Parikshit Shah, Badri Narayan Bhaskar, and Robert Nowak. 2013. “Sketching Sparse Matrices.” *arXiv:1303.6544 [Cs, Math]*, March.

Dasgupta, Sanjoy, and Anupam Gupta. 2003. “An Elementary Proof of a Theorem of Johnson and Lindenstrauss.” *Random Structures & Algorithms* 22 (1): 60–65.

Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. 2016. “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering.” In *Advances In Neural Information Processing Systems*.

Desai, A., M. Ghashami, and J. M. Phillips. 2016. “Improved Practical Matrix Sketching with Guarantees.” *IEEE Transactions on Knowledge and Data Engineering* 28 (7): 1678–90.

Devarajan, Karthik. 2008. “Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology.” *PLoS Comput Biol* 4 (7): e1000029.

Ding, C., X. He, and H. Simon. 2005. “On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering.” In *Proceedings of the 2005 SIAM International Conference on Data Mining*, 606–10. Proceedings. Society for Industrial and Applied Mathematics.

Ding, C., Tao Li, and M.I. Jordan. 2010. “Convex and Semi-Nonnegative Matrix Factorizations.” *IEEE Transactions on Pattern Analysis and Machine Intelligence* 32 (1): 45–55.

Dokmanić, Ivan, and Rémi Gribonval. 2017. “Beyond Moore-Penrose Part II: The Sparse Pseudoinverse.” *arXiv:1706.08701 [Cs, Math]*, June.

Driedger, Jonathan, and Thomas Pratzlich. 2015. “Let It Bee – Towards NMF-Inspired Audio Mosaicing.” In *Proceedings of ISMIR*, 7. Malaga.

Drineas, Petros, and Michael W. Mahoney. 2005. “On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning.” *Journal of Machine Learning Research* 6 (December): 2153–75.

Dueck, Delbert, Quaid D. Morris, and Brendan J. Frey. 2005. “Multi-Way Clustering of Microarray Data Using Probabilistic Sparse Matrix Factorization.” *Bioinformatics* 21 (suppl 1): i144–51.

Eaton, Morris L. 2007a. “Chapter 5: Matrix Factorizations and Jacobians.” In *Institute of Mathematical Statistics Lecture Notes - Monograph Series*, 159–83. Beachwood, Ohio, USA: Institute of Mathematical Statistics.

———. 2007b. *Multivariate statistics: a vector space approach*. Lecture notes-monograph series / Institute of Mathematical Statistics 53. Beachwood, Ohio: Inst. of Mathematical Statistics.

Ellis, Robert L., and David C. Lay. 1992. “Factorization of Finite Rank Hankel and Toeplitz Matrices.” *Linear Algebra and Its Applications* 173 (August): 19–38.

Fairbanks, James P., Ramakrishnan Kannan, Haesun Park, and David A. Bader. 2015. “Behavioral Clusters in Dynamic Graphs.” *Parallel Computing*, Graph analysis for scientific discovery, 47 (August): 38–50.

Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. 2008. “Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis.” *Neural Computation* 21 (3): 793–830.

Flammia, Steven T., David Gross, Yi-Kai Liu, and Jens Eisert. 2012. “Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators.” *New Journal of Physics* 14 (9): 095022.

Fung, Wai Shing, Ramesh Hariharan, Nicholas J.A. Harvey, and Debmalya Panigrahi. 2011. “A General Framework for Graph Sparsification.” In *Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing*, 71–80. STOC ’11. New York, NY, USA: ACM.

Gemulla, Rainer, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. 2011. “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent.” In *Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 69–77. KDD ’11. New York, NY, USA: ACM.

Ghashami, Mina, Edo Liberty, Jeff M. Phillips, and David P. Woodruff. 2015. “Frequent Directions: Simple and Deterministic Matrix Sketching.” *arXiv:1501.01711 [Cs]*, January.

Gordon, Geoffrey J. 2002. “Generalized² Linear² Models.” In *Proceedings of the 15th International Conference on Neural Information Processing Systems*, 593–600. NIPS’02. Cambridge, MA, USA: MIT Press.

Gross, D. 2011. “Recovering Low-Rank Matrices From Few Coefficients in Any Basis.” *IEEE Transactions on Information Theory* 57 (3): 1548–66.

Gross, David, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. 2010. “Quantum State Tomography via Compressed Sensing.” *Physical Review Letters* 105 (15).

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In *Proceedings of the Conference on Uncertainty in Artificial Intelligence*.

Guan, Naiyang, Dacheng Tao, Zhigang Luo, and Bo Yuan. 2012. “NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization.” *IEEE Transactions on Signal Processing* 60 (6): 2882–98.

Guan, N., D. Tao, Z. Luo, and B. Yuan. 2012. “Online Nonnegative Matrix Factorization With Robust Stochastic Approximation.” *IEEE Transactions on Neural Networks and Learning Systems* 23 (7): 1087–99.

Hackbusch, Wolfgang. 2015. *Hierarchical Matrices: Algorithms and Analysis*. 1st ed. Springer Series in Computational Mathematics 49. Heidelberg New York Dordrecht London: Springer Publishing Company, Incorporated.

Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. 2010. “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions.” arXiv.

Hassanieh, Haitham, Piotr Indyk, Dina Katabi, and Eric Price. 2012. “Nearly Optimal Sparse Fourier Transform.” In *Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing*, 563–78. STOC ’12. New York, NY, USA: ACM.

Hassanieh, H., P. Indyk, D. Katabi, and E. Price. 2012. “Simple and Practical Algorithm for Sparse Fourier Transform.” In *Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms*, 1183–94. Proceedings. Kyoto, Japan: Society for Industrial and Applied Mathematics.

Hastie, Trevor, Rahul Mazumder, Jason D Lee, and Reza Zadeh. n.d. “Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares,” 36.

Heinig, Georg, and Karla Rost. 2011. “Fast Algorithms for Toeplitz and Hankel Matrices.” *Linear Algebra and Its Applications* 435 (1): 1–59.

Hoffman, Matthew D, David M Blei, and Perry R Cook. 2010. “Bayesian Nonparametric Matrix Factorization for Recorded Music.” In *International Conference on Machine Learning*, 8.

Hoffman, Matthew, Francis R. Bach, and David M. Blei. 2010. “Online Learning for Latent Dirichlet Allocation.” In *Advances in Neural Information Processing Systems*, 856–64.

Hoyer, P.O. 2002. “Non-Negative Sparse Coding.” In *Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, 2002*, 557–65.

Hsieh, Cho-Jui, and Inderjit S. Dhillon. 2011. “Fast Coordinate Descent Methods with Variable Selection for Non-Negative Matrix Factorization.” In *Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 1064–72. KDD ’11. New York, NY, USA: ACM.

Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In *2014 48th Asilomar Conference on Signals, Systems and Computers*.

Huang, G., M. Kaess, and J. J. Leonard. 2013. “Consistent Sparsification for Graph Optimization.” In *2013 European Conference on Mobile Robots (ECMR)*, 150–57.

Iliev, Filip L., Valentin G. Stanev, Velimir V. Vesselinov, and Boian S. Alexandrov. 2018. “Nonnegative Matrix Factorization for Identification of Unknown Number of Sources Emitting Delayed Signals.” *PLOS ONE* 13 (3): e0193974.

Ji, S., D. Dunson, and L. Carin. 2009. “Multitask Compressive Sensing.” *IEEE Transactions on Signal Processing* 57 (1): 92–106.

Kannan, Ramakrishnan. 2016. “Scalable and Distributed Constrained Low Rank Approximations,” April.

Kannan, Ramakrishnan, Grey Ballard, and Haesun Park. 2016. “A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization.” In *Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*, 9:1–11. PPoPP ’16. New York, NY, USA: ACM.

Keriven, Nicolas, Anthony Bourrier, Rémi Gribonval, and Patrick Pérez. 2016. “Sketching for Large-Scale Learning of Mixture Models.” In *2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 6190–94.

Keshava, Nirmal. 2003. “A Survey of Spectral Unmixing Algorithms.” *Lincoln Laboratory Journal* 14 (1): 55–78.

Khoromskij, B. N., A. Litvinenko, and H. G. Matthies. 2009. “Application of Hierarchical Matrices for Computing the Karhunen–Loève Expansion.” *Computing* 84 (1-2): 49–67.

Kim, H., and H. Park. 2008. “Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method.” *SIAM Journal on Matrix Analysis and Applications* 30 (2): 713–30.

Koren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix Factorization Techniques for Recommender Systems.” *Computer* 42 (8): 30–37.

Koutis, Ioannis, Gary L. Miller, and Richard Peng. 2012. “A Fast Solver for a Class of Linear Systems.” *Communications of the ACM* 55 (10): 99–107.

Kruskal, J. B. 1964. “Nonmetric Multidimensional Scaling: A Numerical Method.” *Psychometrika* 29 (2): 115–29.

Kumar, N. Kishore, and Jan Shneider. 2016. “Literature Survey on Low Rank Approximation of Matrices.” *arXiv:1606.06511 [Cs, Math]*, June.

Lahiri, Subhaneil, Peiran Gao, and Surya Ganguli. 2016.“Random Projections of Random Manifolds.”*arXiv:1607.04331 [Cs, q-Bio, Stat]*, July.

Lawrence, Neil D., and Raquel Urtasun. 2009.“Non-Linear Matrix Factorization with Gaussian Processes.” In*Proceedings of the 26th Annual International Conference on Machine Learning*, 601–8. ICML ’09. New York, NY, USA: ACM.

Lee, Daniel D., and H. Sebastian Seung. 1999.“Learning the Parts of Objects by Non-Negative Matrix Factorization.”*Nature* 401 (6755): 788–91.

———. 2001.“Algorithms for Non-Negative Matrix Factorization.” In*Advances in Neural Information Processing Systems 13*, edited by T. K. Leen, T. G. Dietterich, and V. Tresp, 556–62. MIT Press.

Li, Chi-Kwong, and Edward Poon. 2002.“Additive Decomposition of Real Matrices.”*Linear and Multilinear Algebra* 50 (4): 321–26.

Li, S.Z., XinWen Hou, HongJiang Zhang, and Qiansheng Cheng. 2001.“Learning Spatially Localized, Parts-Based Representation.” In*Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001*, 1:I-207-I-212 vol.1.

Liberty, Edo. 2013.“Simple and Deterministic Matrix Sketching.” In*Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 581–88. KDD ’13. New York, NY, USA: ACM.

Liberty, Edo, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert. 2007.“Randomized Algorithms for the Low-Rank Approximation of Matrices.”*Proceedings of the National Academy of Sciences* 104 (51): 20167–72.

Lim, Yew Jin, and Yee Whye Teh. 2007.“Variational Bayesian Approach to Movie Rating Prediction.” In*Proceedings of KDD Cup and Workshop*, 7:15–21. Citeseer.

Lin, Chih-Jen. 2007.“Projected Gradient Methods for Nonnegative Matrix Factorization.”*Neural Computation* 19 (10): 2756–79.

Lin, Zhouchen. n.d.“A Review on Low-Rank Models in Signal and Data Analysis.”

Liu, Tongliang, Dacheng Tao, and Dong Xu. 2016.“Dimensionality-Dependent Generalization Bounds for\(k\)-Dimensional Coding Schemes.”*arXiv:1601.00238 [Cs, Stat]*, January.

Liu, T., and D. Tao. 2015.“On the Performance of Manhattan Nonnegative Matrix Factorization.”*IEEE Transactions on Neural Networks and Learning Systems* PP (99): 1–1.

López-Serrano, Patricio, Christian Dittmar, Yigitcan Özer, and Meinard Müller. 2019.“NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization.” In.

Mahoney, Michael W. 2010.*Randomized Algorithms for Matrices and Data*. Vol. 3.

Mailhé, Boris, Rémi Gribonval, Pierre Vandergheynst, and Frédéric Bimbot. 2011.“Fast Orthogonal Sparse Approximation Algorithms over Local Dictionaries.”*Signal Processing*, Advances in Multirate Filter Bank Structures and Multiscale Representations, 91 (12): 2822–35.

Mairal, Julien, Francis Bach, and Jean Ponce. 2014.*Sparse Modeling for Image and Vision Processing*. Vol. 8.

Mairal, Julien, Francis Bach, Jean Ponce, and Guillermo Sapiro. 2009.“Online Dictionary Learning for Sparse Coding.” In*Proceedings of the 26th Annual International Conference on Machine Learning*, 689–96. ICML ’09. New York, NY, USA: ACM.

———. 2010.“Online Learning for Matrix Factorization and Sparse Coding.”*The Journal of Machine Learning Research* 11: 19–60.

Martinsson, Per-Gunnar. 2016.“Randomized Methods for Matrix Computations and Analysis of High Dimensional Data.”*arXiv:1607.01649 [Math]*, July.

Martinsson, Per-Gunnar, Vladimir Rockhlin, and Mark Tygert. 2006.“A Randomized Algorithm for the Approximation of Matrices.” DTIC Document.

Mensch, Arthur, Julien Mairal, Bertrand Thirion, and Gael Varoquaux. 2017.“Stochastic Subsampling for Factorizing Huge Matrices.”*arXiv:1701.05363 [Math, q-Bio, Stat]*, January.

Mnih, Andriy, and Russ R Salakhutdinov. 2008.“Probabilistic Matrix Factorization.”*Advances in Neural Information Processing Systems*, 8.

Mohamed, Shakir. 2011.“Generalised Bayesian Matrix Factorisation Models,” 140.

Nakajima, Shinichi, and Masashi Sugiyama. 2012.“Theoretical Analysis of Bayesian Matrix Factorization.”*Journal of Machine Learning Research*, 66.

Needell, Deanna, and Roman Vershynin. 2009.“Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit.”*Foundations of Computational Mathematics* 9 (3): 317–34.

Nowak, W., and A. Litvinenko. 2013.“Kriging and Spatial Design Accelerated by Orders of Magnitude: Combining Low-Rank Covariance Approximations with FFT-Techniques.”*Mathematical Geosciences* 45 (4): 411–35.

Okajima, Koki, and Yoshiyuki Kabashima. 2021.“Matrix Completion Based on Gaussian Parameterized Belief Propagation,” May.

Oymak, Samet, and Joel A. Tropp. 2015.“Universality Laws for Randomized Dimension Reduction, with Applications.”*arXiv:1511.09433 [Cs, Math, Stat]*, November.

Paatero, Pentti, and Unto Tapper. 1994.“Positive Matrix Factorization: A Non-Negative Factor Model with Optimal Utilization of Error Estimates of Data Values.”*Environmetrics* 5 (2): 111–26.

Pan, Gang, Wangsheng Zhang, Zhaohui Wu, and Shijian Li. 2014.“Online Community Detection for Large Complex Networks.”*PLoS ONE* 9 (7): e102799.

Raiko, Tapani, Alexander Ilin, and Juha Karhunen. 2007.“Principal Component Analysis for Large Scale Problems with Lots of Missing Values.” In*Machine Learning: ECML 2007*, edited by Joost N. Kok, Jacek Koronacki, Raomon Lopez de Mantaras, Stan Matwin, Dunja Mladenič, and Andrzej Skowron, 4701:691–98. Berlin, Heidelberg: Springer Berlin Heidelberg.

Rokhlin, Vladimir, Arthur Szlam, and Mark Tygert. 2009.“A Randomized Algorithm for Principal Component Analysis.”*SIAM J. Matrix Anal. Appl.* 31 (3): 1100–1124.

Rokhlin, Vladimir, and Mark Tygert. 2008.“A Fast Randomized Algorithm for Overdetermined Linear Least-Squares Regression.”*Proceedings of the National Academy of Sciences* 105 (36): 13212–17.

Ryabko, Daniil, and Boris Ryabko. 2010.“Nonparametric Statistical Inference for Ergodic Processes.”*IEEE Transactions on Information Theory* 56 (3): 1430–35.

Sachdeva, Noveen, Mehak Preet Dhaliwal, Carole-Jean Wu, and Julian McAuley. 2022.“Infinite Recommendation Networks: A Data-Centric Approach.” arXiv.

Salakhutdinov, Ruslan, and Andriy Mnih. 2008.“Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo.” In*Proceedings of the 25th International Conference on Machine Learning*, 880–87. ICML ’08. New York, NY, USA: ACM.

Saul, Lawrence K. 2023.“A Geometrical Connection Between Sparse and Low-Rank Matrices and Its Application to Manifold Learning.”*Transactions on Machine Learning Research*, January.

Schmidt, M.N., J. Larsen, and Fu-Tien Hsiao. 2007.“Wind Noise Reduction Using Non-Negative Sparse Coding.” In*2007 IEEE Workshop on Machine Learning for Signal Processing*, 431–36.

Seshadhri, C., Aneesh Sharma, Andrew Stolman, and Ashish Goel. 2020.“The Impossibility of Low-Rank Representations for Triangle-Rich Complex Networks.”*Proceedings of the National Academy of Sciences* 117 (11): 5631–37.

Shi, Jiarong, Xiuyun Zheng, and Wei Yang. 2017.“Survey on Probabilistic Models of Low-Rank Matrix Factorizations.”*Entropy* 19 (8): 424.

Singh, Ajit P., and Geoffrey J. Gordon. 2008.“A Unified View of Matrix Factorization Models.” In*Machine Learning and Knowledge Discovery in Databases*, 358–73. Springer.

Smaragdis, Paris. 2004.“Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs.” In*Independent Component Analysis and Blind Signal Separation*, edited by Carlos G. Puntonet and Alberto Prieto, 494–99. Lecture Notes in Computer Science. Granada, Spain: Springer Berlin Heidelberg.

Soh, Yong Sheng, and Venkat Chandrasekaran. 2017.“A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.”*arXiv:1701.01207 [Cs, Math, Stat]*, January.

Sorzano, C. O. S., J. Vargas, and A. Pascual Montano. 2014.“A Survey of Dimensionality Reduction Techniques.”*arXiv:1403.2877 [Cs, q-Bio, Stat]*, March.

Spielman, Daniel A., and Shang-Hua Teng. 2004.“Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems.” In*Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing*, 81–90. STOC ’04. New York, NY, USA: ACM.

———. 2006.“Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric, Diagonally Dominant Linear Systems.”*arXiv:cs/0607105*, July.

———. 2008a.“Spectral Sparsification of Graphs.”*arXiv:0808.4134 [Cs]*, August.

———. 2008b.“A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly-Linear Time Graph Partitioning.”*arXiv:0809.3232 [Cs]*, September.

Spielman, D., and N. Srivastava. 2011.“Graph Sparsification by Effective Resistances.”*SIAM Journal on Computing* 40 (6): 1913–26.

Sra, Suvrit, and Inderjit S. Dhillon. 2006.“Generalized Nonnegative Matrix Approximations with Bregman Divergences.” In*Advances in Neural Information Processing Systems 18*, edited by Y. Weiss, B. Schölkopf, and J. C. Platt, 283–90. MIT Press.

Srebro, Nathan, Jason D. M. Rennie, and Tommi S. Jaakkola. 2004.“Maximum-Margin Matrix Factorization.” In*Advances in Neural Information Processing Systems*, 17:1329–36. NIPS’04. Cambridge, MA, USA: MIT Press.

Stewart, G.W. 2000.“The Decompositional Approach to Matrix Computation.”*Computing in Science Engineering* 2 (1): 50–59.

Sun, Ying, and Michael L. Stein. 2016.“Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets.”*Journal of Computational and Graphical Statistics* 25 (1): 187–208.

Sundin, Martin. 2016.“Bayesian Methods for Sparse and Low-Rank Matrix Problems.” PhD Thesis, KTH Royal Institute of Technology.

Tropp, Joel A., Alp Yurtsever, Madeleine Udell, and Volkan Cevher. 2016.“Randomized Single-View Algorithms for Low-Rank Matrix Approximation.”*arXiv:1609.00048 [Cs, Math, Stat]*, August.

———. 2017.“Practical Sketching Algorithms for Low-Rank Matrix Approximation.”*SIAM Journal on Matrix Analysis and Applications* 38 (4): 1454–85.

Tufts, D. W., and R. Kumaresan. 1982.“Estimation of Frequencies of Multiple Sinusoids: Making Linear Prediction Perform Like Maximum Likelihood.”*Proceedings of the IEEE* 70 (9): 975–89.

Tung, Frederick, and James J. Little. n.d.“Factorized Binary Codes for Large-Scale Nearest Neighbor Search.”

Türkmen, Ali Caner. 2015.“A Review of Nonnegative Matrix Factorization Methods for Clustering.”*arXiv:1507.03194 [Cs, Stat]*, July.

Turner, Richard E., and Maneesh Sahani. 2014.“Time-Frequency Analysis as Probabilistic Inference.”*IEEE Transactions on Signal Processing* 62 (23): 6171–83.

Udell, M., and A. Townsend. 2019.“Why Are Big Data Matrices Approximately Low Rank?”*SIAM Journal on Mathematics of Data Science* 1 (1): 144–60.

Vaz, Colin, Asterios Toutios, and Shrikanth S. Narayanan. 2016.“Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.” In, 963–67.

Vincent, E., N. Bertin, and R. Badeau. 2008.“Harmonic and Inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch Transcription.” In*2008 IEEE International Conference on Acoustics, Speech and Signal Processing*, 109–12.

Virtanen, T. 2007.“Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria.”*IEEE Transactions on Audio, Speech, and Language Processing* 15 (3): 1066–74.

Vishnoi, Nisheeth K. 2013.“Lx = b.”*Foundations and Trends® in Theoretical Computer Science* 8 (1-2): 1–141.

Vo, Ba Ngu, Ba Tuong Vo, and Hung Gia Hoang. 2017.“An Efficient Implementation of the Generalized Labeled Multi-Bernoulli Filter.”*arXiv:1606.08350 [Stat]*, February.

Wager, S., L. Chen, M. Kim, and C. Raphael. 2017.“Towards Expressive Instrument Synthesis Through Smooth Frame-by-Frame Reconstruction: From String to Woodwind.” In*2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 391–95.

Wang, Boyue, Yongli Hu, Junbin Gao, Yanfeng Sun, Haoran Chen, and Baocai Yin. 2017.“Locality Preserving Projections for Grassmann Manifold.” In*PRoceedings of IJCAI, 2017*.

Wang, Shusen, Alex Gittens, and Michael W. Mahoney. 2017.“Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging.”*arXiv:1702.04837 [Cs, Stat]*, February.

Wang, Y. X., and Y. J. Zhang. 2013.“Nonnegative Matrix Factorization: A Comprehensive Review.”*IEEE Transactions on Knowledge and Data Engineering* 25 (6): 1336–53.

Wang, Yuan, and Yunde Jia. 2004.“Fisher Non-Negative Matrix Factorization for Learning Local Features.” In*In Proc. Asian Conf. On Comp. Vision*, 27–30.

Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019.“End-to-End Probabilistic Inference for Nonstationary Audio Analysis.”*arXiv:1901.11436 [Cs, Eess, Stat]*, January.

Woodruff, David P. 2014.*Sketching as a Tool for Numerical Linear Algebra*. Foundations and Trends in Theoretical Computer Science 1.0. Now Publishers.

Woolfe, Franco, Edo Liberty, Vladimir Rokhlin, and Mark Tygert. 2008.“A Fast Randomized Algorithm for the Approximation of Matrices.”*Applied and Computational Harmonic Analysis* 25 (3): 335–66.

Wright, John, and Yi Ma. 2022.*High-dimensional data analysis with low-dimensional models: Principles, computation, and applications*. S.l.: Cambridge University Press.

Xinghao Ding, Lihan He, and L. Carin. 2011.“Bayesian Robust Principal Component Analysis.”*IEEE Transactions on Image Processing* 20 (12): 3419–30.

Yang, Jiyan, Xiangrui Meng, and Michael W. Mahoney. 2015.“Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments.”*arXiv:1502.03032 [Cs, Math, Stat]*, February.

Yang, Linxiao, Jun Fang, Huiping Duan, Hongbin Li, and Bing Zeng. 2018.“Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models.”*IEEE Transactions on Signal Processing* 66 (11): 2804–17.

Yang, Wenzhuo, and Huan Xu. 2015.“Streaming Sparse Principal Component Analysis.” In*Journal of Machine Learning Research*, 494–503.

Ye, Ke, and Lek-Heng Lim. 2016.“Every Matrix Is a Product of Toeplitz Matrices.”*Foundations of Computational Mathematics* 16 (3): 577–98.

Yin, M., J. Gao, and Z. Lin. 2016.“Laplacian Regularized Low-Rank Representation and Its Applications.”*IEEE Transactions on Pattern Analysis and Machine Intelligence* 38 (3): 504–17.

Yoshii, Kazuyoshi. 2013.“Beyond NMF: Time-Domain Audio Source Separation Without Phase Reconstruction,” 6.

Yu, Chenhan D., William B. March, and George Biros. 2017.“An\(N \log N\) Parallel Fast Direct Solver for Kernel Matrices.” In*arXiv:1701.02324 [Cs]*.

Yu, Hsiang-Fu, Cho-Jui Hsieh, Si Si, and Inderjit S. Dhillon. 2012.“Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems.” In*IEEE International Conference of Data Mining*, 765–74.

———. 2014.“Parallel Matrix Factorization for Recommender Systems.”*Knowledge and Information Systems* 41 (3): 793–819.

Zass, Ron, and Amnon Shashua. 2005.“A Unifying Approach to Hard and Probabilistic Clustering.” In*Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1 - Volume 01*, 294–301. ICCV ’05. Washington, DC, USA: IEEE Computer Society.

Zhang, Kai, Chuanren Liu, Jie Zhang, Hui Xiong, Eric Xing, and Jieping Ye. 2017.“Randomization or Condensation?: Linear-Cost Matrix Sketching Via Cascaded Compression Sampling.” In*Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 615–23. KDD ’17. New York, NY, USA: ACM.

Zhang, Xiao, Lingxiao Wang, and Quanquan Gu. 2017.“Stochastic Variance-Reduced Gradient Descent for Low-Rank Matrix Recovery from Linear Measurements.”*arXiv:1701.00481 [Stat]*, January.

Zhang, Zhongyuan, Chris Ding, Tao Li, and Xiangsun Zhang. 2007.“Binary Matrix Factorization with Applications.” In*Seventh IEEE International Conference on Data Mining, 2007. ICDM 2007*, 391–400. IEEE.

Zhao, He, Lan Du, Wray Buntine, and Mingyuan Zhou. 2018.“Inter and Intra Topic Structure Learning with Word Embeddings.” In*Proceedings of the 35th International Conference on Machine Learning*, 5892–5901. PMLR.

Zhou, Mingyuan, Haojun Chen, John Paisley, Lu Ren, Guillermo Sapiro, and Lawrence Carin. 2009.“Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations.” In*Proceedings of the 22nd International Conference on Neural Information Processing Systems*, 22:2295–2303. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.

Zhou, Mingyuan, Lauren Hannah, David Dunson, and Lawrence Carin. 2012.“Beta-Negative Binomial Process and Poisson Factor Analysis.” In*Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics*, 1462–71. PMLR.

Zhou, Tianyi, and Dacheng Tao. 2011.“Godec: Randomized Low-Rank & Sparse Matrix Decomposition in Noisy Case.”

———. 2012.“Multi-Label Subspace Ensemble.”*Journal of Machine Learning Research*.

Zitnik, Marinka, and Blaz Zupan. 2018.“NIMFA: A Python Library for Nonnegative Matrix Factorization.”*arXiv:1808.01743 [Cs, q-Bio, Stat]*, August.

Kinda hate R because, as much as it is a statistical dream, it is a programming nightmare? Is MATLAB too expensive when you try to run it on your cloud server farm, and are you anyway vaguely suspicious that they get kickbacks from the companies that sell RAM, because otherwise why does it eat all your memory like that? Love the speed of C++ but have a nagging feeling that you should not need to recompile your code to do exploratory data analysis? Like the idea of Julia, but wary of depending on yet another bloody language, let alone one without the serious corporate backing or long history of the other ones I mentioned?

Python has a *different* set of warts to those other options.
Its statistical library support is narrower than R’s — probably comparable to MATLAB’s.
It is, however, sorta fast enough in practice, nicer to debug, and supports diverse general programming tasks well — web servers, academic blogs, neural networks, weird art projects, online open workbooks — and it has interfaces to an impressive numerical ecosystem…

Although it is occasionally rough, it’s ubiquitous and free, free, free so you don’t need to worry about stupid licensing restrictions, and the community is enormous, so it’s pretty easy to answer any questions you may have.

But in any case, you don’t need to choose.
Python interoperates with all these other languages, and indeed makes a specialty of *gluing stuff together*.
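For a taste of that glue, here is a minimal sketch that calls into the system C math library directly from Python using the standard-library `ctypes` module. (This assumes a Unix-like system where `libm` or `libc` can be located; the specifics are illustrative, not canonical.)

```python
import ctypes
import ctypes.util

# Locate and load the C math library (fall back to libc, which
# also exports the math symbols on some platforms).
libm_path = ctypes.util.find_library("m") or ctypes.util.find_library("c")
libm = ctypes.CDLL(libm_path)

# Declare the C signature so ctypes converts floats correctly.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # cos(0) == 1.0
```

No wrapper generation, no compilation step; this is the low end of a spectrum that runs up through cython, pybind11 and friends.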

**Aside**: A lot of useful machine-learning-type functionality, which I won’t discuss in detail here, exists in the python deep learning toolkits such as Tensorflow and Theano; you might want to check those pages too.
Also graphing is a whole separate issue, as is optimisation.

In recent times, a few major styles have been ascendant in the statistical python scene.

`pandas` plus `statsmodels` look a lot more like R.
On the minus side, this combination lacks some language features of R (e.g. regression formulae are not first-class language features).
On the plus side, it lacks some language misfeatures of R (the object model being a box of turds, copy-by-value semantics, and all those other annoyances).

I am not a huge fan of pandas, personally. The engineering behind it is impressive, but it ends up not fitting my actual problems particularly well. This might be because my workflow is idiosyncratic, or because the original author’s workflow was, and it is he who needed weird features that the rest of us do not.

In comparison to R, one crucial weakness is that R has a rich ecosystem of tools for dataframes; Python’s is a bit thinner.

Also (and I do not know if this was the true process or not), when pandas was designed, its author made a bunch of design choices differently than R did, possibly thinking “why did R not do it this way, which is clearly better”, only to discover in practice that R’s way was better, or that the cool hack ended up a bit awkward when applied to python syntax. Chief among these is the obligatory indexing of rows in the table; I spend a lot of time fighting pandas’ insistence on wanting everything to be named, even stuff that does not need to be named.
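A minimal sketch of that index friction with hypothetical toy data: a groupby silently promotes a column to a named row index, which then has to be flattened again to get back a plain table.

```python
import pandas as pd

df = pd.DataFrame({"species": ["a", "b", "a"], "mass": [1.0, 2.5, 1.2]})

# groupby promotes the grouping column to the row index...
means = df.groupby("species")["mass"].mean()
print(means.index.name)  # the index is now named "species"

# ...so to get an ordinary table back we must reset it.
flat = means.reset_index()
print(flat.columns.tolist())  # ['species', 'mass']
```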

Anyway, this is still very usable and useful.

pandas is more-or-less a dataframe class for python. Lots of nice things are built on this, such as …

statsmodels, which is more-or-less R, but in Python. It implements:

- Linear regression models
- Generalized linear models
- Discrete choice models
- Robust linear models
- Many models and functions for time series analysis
- Nonparametric estimators
- A wide range of statistical tests
- etc
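As a quick sketch of the R-ish flavour, here is a linear regression with an R-style formula on synthetic data (assumes `statsmodels` is installed; the coefficients and noise level are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
x = rng.normal(size=200)
# True model: y = 1 + 2x plus a little noise.
df = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + 0.1 * rng.normal(size=200)})

# R users will recognise the formula interface.
fit = smf.ols("y ~ x", data=df).fit()
print(fit.params)  # estimates should be near (1.0, 2.0)
```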

patsy implements a formula language for `pandas`. This does lots of things, but most importantly, it

- builds design matrices (i.e. it knows how to represent `z~x^2+x+y^3` as a matrix, which only sounds trivial if you haven’t tried it)
- statefully preconditions data (e.g. constructs data transforms that will correctly normalise the test set as well as the training data.)

pandera implements type sanity and validation.
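The patsy design-matrix expansion can be sketched like so, with toy data (note that in patsy, arithmetic transforms of numeric variables need the `I()` wrapper):

```python
import pandas as pd
from patsy import dmatrices

df = pd.DataFrame({"x": [1.0, 2.0, 3.0],
                   "y": [0.5, 1.0, 1.5],
                   "z": [1.0, 8.0, 27.0]})

# Expand a formula into response and design matrices;
# an intercept column is added automatically.
z, X = dmatrices("z ~ I(x**2) + x + I(y**3)", df)
print(X.design_info.column_names)
```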

The pandas API is popular; there are a few tools which aim to accelerate calculations by providing backends for it based on alternative data formats or parallelism.

cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF also provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

Scale your pandas workflow by changing a single line of code — The modin.pandas DataFrame is an extremely light-weight parallel DataFrame. Modin transparently distributes the data and computation so that all you need to do is continue using the pandas API as you were before installing Modin. Unlike other parallel DataFrame systems, Modin is an extremely light-weight, robust DataFrame. Because it is so light-weight, Modin provides speed-ups of up to 4x on a laptop with 4 physical cores.
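The “single line of code” in question is the import. A hedged sketch that falls back to stock pandas when Modin is not installed, so the rest of the script is identical either way:

```python
try:
    import modin.pandas as pd  # parallel drop-in, if installed
except ImportError:
    import pandas as pd        # otherwise plain pandas; same API

df = pd.DataFrame({"a": range(6)})
print(df["a"].sum())  # 15 either way
```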

Koalas: pandas API on Apache Spark seems to be Modin but for Spark.

Python dataframe interchange protocol

Python users today have a number of great choices for dataframe libraries. From Pandas and cuDF to Vaex, Koalas, Modin, Ibis, and more. Combining multiple types of dataframes in a larger application or analysis workflow, or developing a library which uses dataframes as a data structure, presents a challenge though. Those libraries all have different APIs, and there is no standard way of converting one type of dataframe into another.

Polars is a blazingly fast DataFrames library implemented in Rust using the Apache Arrow Columnar Format as the memory model.

- Lazy | eager execution
- Multi-threaded
- SIMD
- Query optimization
- Powerful expression API
- Hybrid Streaming (larger than RAM datasets)
- Rust | Python | NodeJS | ...

Does not invent a python-specific data format but instead leverages Apache Arrow.
It *looks* like pandas in many ways, but is not 100% compatible; see Polars for pandas users.
This means that the ecosystem is thinner again than the pandas ecosystem; on the other hand, some stuff looks easier than in pandas, so maybe it is not too bad in practice.

`scikit-learn` exemplifies a machine-learning style, with lots of abstract feature construction and predictive-performance style model selection built around homogeneously-typed (only floats, only ints) matrices instead of dataframes.
This style will be more familiar to MATLAB users than to R users.

scikit-learn (*sklearn* to its friends) is the flagship of this fleet. It is fast, clear and well-designed. I enjoy using it for implementing ML-type tasks. It has various algorithms such as random forests, linear regression and Gaussian processes, and reference implementations of many algorithms, both *à la mode* and *passé*. Although I miss, *sniff*, `glmnet` in R for lasso regression.

SKLL (pronounced “skull”) provides a number of utilities to make it simpler to run common scikit-learn experiments with pre-generated features.
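Speaking of lasso: sklearn’s own `Lasso` covers the basic `glmnet` use case, and illustrates the homogeneous-matrix style. A toy sketch with synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two coefficients are nonzero in the true model.
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # lasso should zero out the irrelevant features
```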

…provides a bridge between `sklearn`’s machine learning methods and pandas-style Data Frames. In particular, it provides:

- a way to map DataFrame columns to transformations, which are later recombined into features
- a way to cross-validate a pipeline that takes a pandas DataFrame as input.

libROSA, the machine listening library, is more or less in this school.

“pystruct aims at being an easy-to-use structured learning and prediction library.”

Currently it implements only max-margin methods and a perceptron, but other algorithms might follow. The learning algorithms implemented in PyStruct have various names, which are often used loosely or differently in different communities. Common names are conditional random fields (CRFs), maximum-margin Markov random fields (M3N) or structural support vector machines.

PyCaret (announcement) is some kind of low-code stats tool.

PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy, and many more.

`xarray`

xarray; what is this?

Notable application: ArviZ, for exploratory analysis of Bayesian models.

Forecasting in python: Rob J Hyndman, in Python implementations of time series forecasting and anomaly detection, recommends a number of packages.

See tslearn:

tslearn is a Python package that provides machine learning tools for the analysis of time series. This package builds on (and hence depends on) scikit-learn, numpy and scipy libraries.

It integrates with other time series tools, for example:

It automatically calculates a large number of time series characteristics, the so-called features. Further, the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks.

Cesium is an end-to-end machine learning platform for time-series, from calculation of features to model-building to predictions. Cesium has two main components - a Python library, and a web application platform that allows interactive exploration of machine learning pipelines. Take control over the workflow in a Python terminal or Jupyter notebook with the Cesium library, or upload your time-series files, select your machine learning model, and watch Cesium do feature extraction and evaluation right in your browser with the web application.

pyts is a time series classification library that seems moderately popular.

The non-python options in Forecasting are also worth looking at.

You can call R from within Python using `rpy2.ipython`:

```
%load_ext rpy2.ipython
%R library(robustbase)
%Rpush yy xx
%R mod <- lmrob(yy ~ xx);
%R params <- mod$coefficients;
%Rpull params
```

See the Revolutions blog, or Josh Devlin’s tips, for more of that.

Counter-intuitively, this is remarkably slow. I have experienced much greater speed by saving data to the file system in one language and then loading it in another. For that, see the next section.

Much faster, weirdly, and better documented. Recommended. Try Apache Arrow.

```
import pandas as pd  # feather support is built in to modern pandas
path = 'my_data.feather'
df.to_feather(path)
df = pd.read_feather(path)
```

```
library(feather)
path <- "my_data.feather"
write_feather(df, path)
df <- read_feather(path)
```

If that doesn’t work, try `hdf5` or `protobuf` or whatever.
There are many options. `hdf5` seems to work well for me.

See pycall from Julia.

⚠️ this is out of date now; the new RNG API is much better.

Seeding your RNG can be a pain in the arse, especially if you are interfacing with an external library that doesn’t have RNG state passing in the API.
So, use a context manager.
Here’s one that works for `numpy`-based code:

```
import numpy as np
from numpy.random import get_state, set_state, seed

class Seed(object):
    """
    Context manager for reproducible seeding.

    >>> with Seed(5):
    ...     print(np.random.rand())
    0.22199317108973948
    """
    def __init__(self, seed):
        self.seed = seed
        self.state = None

    def __enter__(self):
        self.state = get_state()
        seed(self.seed)

    def __exit__(self, exc_type, exc_value, traceback):
        set_state(self.state)
```

Exercise for the student: make it work with the default RNG also.
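For the newer `Generator` API the exercise more or less dissolves: there is no global state to save and restore, so you just construct an independent `Generator` and pass it around explicitly. A sketch:

```python
import numpy as np

# Two generators built from the same seed produce identical streams,
# without touching any global state.
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)
assert rng_a.random() == rng_b.random()

# A differently-seeded generator gives an independent stream.
rng_c = np.random.default_rng(6)
print(rng_c.random())
```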

agate is a stats package designed not for high performance but for ease of use and reproducibility for non-specialists, e.g. journalists.

agate is intended to fill a very particular programming niche. It should not be allowed to become as complex as numpy or pandas. Please bear in mind the following principles when considering a new feature:

- Humans have less time than computers. Optimize for humans.
- Most datasets are small. Don’t optimize for “big data”.
- Text is data. It must always be a first-class citizen.
- Python gets it right. Make it work like Python does.
- Human lives are nasty, brutish and short. Make it easy.

- hypertools is a generic dimensionality reduction toolkit. Is this worthwhile?
- Bonus scientific computation is also available through GSL, most easily via CythonGSL.
- savez makes persisting arrays fast and efficient by default. Use that if you are talking to other python processes.
- biopython is a whole other world of phylogeny and bio-data wrangling. I’m not sure if it adheres to one of the aforementioned schools or not. Should check.
- more options for speech/string analysis at natural language processing.
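The `savez` round-trip mentioned above is one line each way; a minimal sketch, writing to a throwaway temporary directory:

```python
import os
import tempfile
import numpy as np

a = np.arange(5)
b = np.eye(2)
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")

# Save several arrays into one .npz archive, keyed by name.
np.savez(path, a=a, b=b)

# Loading gives a lazy, dict-like archive.
loaded = np.load(path)
print(loaded["a"])  # [0 1 2 3 4]
```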

By javascript audio I mean, in particular, *web* audio for javascript, i.e. in the browser.
If you want to use javascript as part of a more general audio coding thingy, great, but I know nothing of that.

I want to make sound in my browser. My main excuse is the synestizer project.

There is a reasonably mature javascript API for audio, but it is missing various things you might be used to:

- Noise generator
- binary unit generator extension API
- many sensible basic mathematical DSP operations, such as modulo arithmetic, transcendental functions, sample and hold…
- reliable low-level sequencing architecture
- \[…\]

It’s more suited to game sound effects than arbitrary synthesis. But it works, it’s fast, it’s everywhere, and it comes with integrated web services, openGL and UI libraries, and runs on android tablets, so there are some positives. Therefore, let’s see how it can be made convenient.

If you want to use a library to simplify things, see below. First, here are the instructions for going “bareback”, which is still reasonably high-level:

- The MDN docs explain everything pretty well from the bottom up, e.g. the AnalyserNode documentation
- There’s no white noise generator! Here’s a pseudo-white-noise-generation workaround
- The master specification for the WebAudio API is not notably readable, but still the only documentation for some points.
- the HTML5rocks tutorial
- Buffer management
- Polyphony
- Creativejs webaudio howto
- Tizen audio API
- FX
- Samples
- Oscillators
- BBC examples are non-trivial and well documented

Not an embeddable library as such, but a site with nifty API design for JS audio.

Incredible example:

Simplifying repetitive tasks. Which of these is the hottest? I updated this list to include the openhub widgets, which give a somewhat-current estimate of the development activity, since they are all pretty sporadic.

A chiptune algorave project, Speccy has neato dorky sequencing in the browser. Integrates with random synth toy sfxr.

Tone.js is another webaudio framework; a particular selling point is that it has its own timing/scheduling system, which is what is *actually* missing from JS Audio. Also people are actually developing it still, which is not the case for everything else.

…is a bunch of helpers for WebAudio API. It isn’t a library per se.

It appears to in fact be *precisely* a library, unlike the various crazy frameworks on display here. I regard this as a plus. Technically it’s abandoned, but that might also be because it’s too simple to need additions; just copy-paste what you need.

Flocking attempts to create a declarative JSON-style audio language, with inspiration from supercollider.

Maxim is a javascript port of Mick Grierson’s C++ library Maximilian.

Gibber claims to be a full-featured “livecoding environment”, which is not my focus, but perhaps it has good parts for my use? (see also the manual). Its style is too opinionated for my tastes, and it has a weird DSL that is even slower and less consistent than JS itself. However, the maintainer is passionate.

WAAX is a fancy library, with synthesizer abstraction that feel less imperative and more declarative. However, the abstractions also feel over-engineered, and the boss-guy has been hired by Google and dropped out leaving the code less maintained.

audiolib.js is one of the others to have sequencer abstractions but is also abandoned. Its chief virtue is a lucid introduction to web audio on its homepage.

The incredible Faust DSP language can target optimised javascript. (To learn: how to mix JavascriptNode and native AudioNode with this approach.)

WebPd is a partial port of puredata to javascript. This is a bizarre project — partially porting one obscure high-level audio language to a less obscure high-level audio language doesn’t open up many new horizons AFAICT. Nonetheless I like the puredata community — who tilt the best windmills — so it deserves a link.

Timbre.js is another webaudio library. It was an early adopter, but also stagnated early.

Audiocomponents is a strategy to use the new webcomponents system to (more or less) make Webaudio a matter of pure HTML. Webcomponents are something I might use when the dust and standards wars have settled and they integrate with other stuff, if they have provided any concrete benefit after all that.

omnitone does 3d decoding of ambisonic, for all your surround-sounds needs.

beep … “is a JavaScript toolkit for building browser-based synthesizers.” Abandoned.

react-music is a cute hack - composeable React components which happen to produce audio

I don’t know why there are so many audio libraries; the most tricky thing is sequencing, and of course, most would rather paint the bike shed than solve that one.

- Tone.js does sequencing well
- motor.js is a step sequencer, simple and pretty.
- Jazzari is a cute example of a minimal algorithmic sequencer.
- manual sequencing example from a MIDI player.
- Essential tutorial on Scheduling

🏗

This is a big enough topic to merit its own section on the UI page.

NB certain complexities arise for audio stuff (e.g. you don’t really want to use React to update your GUI 44100 times/second; look out for that).

[wavesurfer.js](https://wavesurfer-js.org/) is a widget that plots the audio waveform as well as providing a player.

Works! (At least in Chrome.) Read the WebMIDI spec. Works well with Tone.js.

Some explained, some for reverse-engineering, all simple.

- MusicMappr does
- Pink Trombone
- Enough of those exist at premier JS noise vendors soft object.
- You can build your own graphically at blokdust.
- A synthesizer howto
- Chromium examples
- Building the Monotron
- A classic: the javascript dubstep generator
- Jazzari deserves a second mention.

Tips and tricks for collaborative data sharing, e.g. for reproducible research.

Related: the problems of organising the data efficiently for the task in hand.
For that, see database, and the task of versioning the data.
Also related: the problem of finding some good data for your research project, ideally without having to do the research yourself.
For some classic datasets that *use* these data sharing methods (and others) see data sets.

You’ve finished writing a paper? Congratulations.

Online services to host supporting data from your finished projects in the name of reproducible research. The data gets uploaded once and thereafter is static.

There is not much to this, except that you might want verification of your data — e.g. someone who will vouch that you have not tampered with it after publication. You might also want a persistent identifier such as a DOI so that other researchers can refer to your work in an academe-endorsed fashion.

- Figshare, which hosts the supporting data for many researchers. It gives you a DOI for your dataset. Up to 5GB. Free.
- Zenodo is similar. Backed by CERN, on their infrastructure. Uploads get a DOI. Up to 50GB. Free.
- IEEE Dataport happily hosts 2TB datasets. It gives you a DOI and integrates with many IEEE publications, plus allows convenient access from the Amazon cloud via AWS, which might be where your data is anyway. They charge USD2000 for an open access upload, and otherwise only other IEEE dataport users can get at your data. I know this is not an unusual way for access to journal articles to work, but for data sets it feels like a ham-fisted way of enforcing scarcity, and it is hard to see how they will compete with Zenodo except for the Very Large Dataset users.
- Datadryad gives 10GB of data per DOI, and validates it. USD120/dataset. Free for members which, on balance of probability, is not your institution, but why not check?
- Some campuses offer their own systems, e.g. my university offers resdata.
- DIY option. You could probably upload your data, if not too large, to github and for veracity get a trusted party to cryptographically sign it. Or indeed you could upload it to anywhere and get someone to cryptographically sign it. The problem with such DIY solutions is that they are unstable - few data sets last more than a few years with this kind of set up. Campus web servers shut down, hosting fees go up etc. On the plus side you can make a nice presentational web page explaining everything and providing nice formatting for the text and tables and such.
- Open Science Framework doesn’t host data, but it does index data sets in google drive or whatever and make them coherently available to other researchers.
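For the DIY option above, the verification half is cheap even before you find a trusted party to sign anything: publish a checksum alongside the data, and anyone can re-verify the download. A stdlib sketch (`sha256sum` is my own helper name):

```python
import hashlib
import os
import tempfile

def sha256sum(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, streamed in 1MB chunks
    so large datasets don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# demo on a throwaway file
demo = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(demo, "wb") as f:
    f.write(b"hello data")
digest = sha256sum(demo)
```

A detached GPG signature over that digest then covers the tamper-evidence part.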

The question that you are asking about all of these if you are me is: can I make a nice web front-end to my media examples? Can I play my cool movies or audio example? The answer is, AFAICT, not in general; one would need to build an extra front-end and even then it might have difficulty streaming video or whatever from the fancy data store. Media streaming is a DIY option.

Recommendation: If your data is small, make a DIY site to show it off for current users, and also make a site on e.g. Zenodo to host it for future users.

If you are sharing data for *ongoing* collaboration (the experiments are still accumulating
data) you might want a different tool, with less focus on DOIs/verification and
more on convenient updating and reproducibility of intermediate results.

Realistically, seeing how often data sets are found to be flawed, or how often they can be improved, I’m not especially interested in verifiable one-off spectacular data releases. I’m interested in accessing collaborative, incremental, and improvable data. That is, after all, how research itself progresses.

The next options are solutions to simplify that kind of thing.

Dataverse is an open-source data storage/archive system, hosted by some large partners.

TBC; I’m using this right now but have little time to say things.

For uploading serious data the DVUploader app is best. Downloads are here but it takes a little work to find the manual. It supports useful stuff like “direct upload” mode (sending files direct to the backend store instead of via the dataverse frontend), which is an order of magnitude faster and more reliable than the indirect alternative.

The python API pydataverse is not great at the time of writing; too much uploading and downloading of data, file sizes capped at 2GB with the default python distribution… Most things are just about as easy if we use curl commands from the command line, and some things are impossible with Python.

dolthub is the collaborative/sharey arm of dolt, the versioning database for relational data. I presume that means it is implicitly a data sharing system.

We’re excited to announce the public beta of XetHub, a collaborative storage platform for data management. XetHub aims to address each of the above requirements head-on towards our end goal: to make working with data as fast and collaborative as working with code. […]

With XetHub, users can run the same flows and commands they already use for code (e.g., commits, pull requests, history, and audits) with repositories of up to 1 TB. Our Git-backed protocol allows easy integration with existing workflows, with no need to reformat files or adopt heavyweight data ecosystems, and also allows for incremental difference tracking on compatible data types.

a.k.a. Data science Version control

DVC looks like it gets us data sharing as a side effect. It versions code with data assets in some external data store like S3 or whatever, which means they are shareable if you set the permissions right. Read more at DVC/data versioning.

dat tracks updates to your datasets and shares the updates in a distributed fashion. I would use this for sharing predictably updated research, because I wished to have the flexibility of updating my data, at the cost of keeping the software running. But if I publish to zenodo, all my typos and errors are immortalised for all time, so I might be too afraid to ever get around to publishing it. Read more at Dat/data versioning.

Not sure yet. TBC.

Qu publishes any old data from a mongodb store. Mongodb needs more effort to set up than I am usually prepared to tolerate, and isn’t great for dense binary blobs, which is my stock in trade, so I won’t explore that further.

Google’s open data set protocol, which they call their “Dataset Publishing Language”, is a standard for medium-size datasets with EZ visualisations.

Open Science Framework seems to strive to be github-for-preserving-data-assets. TODO.

rOpensci provides a number of open data set importers that work seamlessly. They are a noble target for your own data publishing efforts.

Dan Hopkins and Brendan Nyhan on How to make scientific research more trustworthy.

CKAN “is a powerful data management system that makes data accessible — by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.”

- Seems to have a data table library called recline.

hugo is a static site generator which I happen to find amenable sometimes. Its documentation is sometimes confusing and abstruse.

I mostly use it as a backend for blogdown but it is useful for other sites.

Pluses:

- fast (build times are quick)
- compact (does not install hundreds of megabytes of crap when you run it like most node.js-based ones)
- sort of easy, in that I do not need to learn to write javascript components to do basic stuff
- powerful shortcodes to do formatting stunts
- deep integration of markdown processing
- many community plugins

Minuses:

- documentation confusing and abstruse
- mathematics support is a dumpster fire
- when hugo is used as a backend to other things (quarto, blogdown) it is only the stock/default hugo, which does not include the powerful plugins or community support.

For 4 years or so there has been an argument about hugo and math support.
The current settlement is that it *can* work natively but that is tedious, and there are extensions that make it not terrible (`goldmark-mathjax`) but they are not where you need them (e.g. not supported by standard hosts), i.e. it is grinding low-level friction and irritating hacks.
In Add MathJax build support -- post-hugo processing · Issue #6694 · gohugoio/hugo, shreevatsa summarises:

(at least) two issues are mixed up here:

Input syntax: MathJax (LaTeX) syntax versus Markdown. The question of whether a user can just type LaTeX syntax, or needs to incorporate many workarounds like escaping underscores and backslashes, so that after being passed through Markdown the intended LaTeX syntax remains. (This seems to be

Server-side versus client-side processing: Whether the HTML ultimately generated by Hugo should simply contain the MathJax input (to be processed by client-side---i.e. JavaScript---MathJax in the reader's browser) or should already be processed (server-side) via MathJax (so no JavaScript needs to run client-side).

If a user is happy with special input and client-side MathJax, then this is already possible with stock Hugo by

- being super-careful about escaping all Markdown-special characters in one's input (probably not practical), or
- using shortcodes / `.inner`, and
- loading the MathJax (or KaTeX) JavaScript in the header or body of one's page.

But otherwise, either problem is unsolved AFAICT.

Various arguments about how to fix it:

- Use goldmark-mathjax extension
- Consider markdown extensions for math typesetting in Hugo · Issue #6544 · gohugoio/hugo
- How to render math equations properly [KaTeX, Goldmark]? - support - HUGO

Workarounds to shoehorn the current configuration into working:

- Math Typesetting in Hugo | Mert Bakır
- Render LaTeX math expressions in Hugo with MathJax 3 · Geoff Ruddock
- Writing math with Hugo | Misha Brukman

Forks of hugo supporting better fixes:

Natively I have no idea. Citations are supported via pandoc, however the hugo configuration options are insufficiently flexible. The secret pro tip is that hugo already supports pandoc config via double metadata YAML blocks.

See quarto.

Site.js is one of several neat tools that integrate hugo.

One person.

Develop and test on your own device. One server.

Sync and deploy to your own VPS. One site.

Host your own site at your own domain

Caddy is a low-key, single-binary web server with automatic hugo integration.

Native calendar event output is supported but looks a little messy.

- iCal Event Maker - generate ics file (iCalendar) to share your events
- Adding iCalendar Feeds for Events in Hugo · Jamie Tanna

Perhaps ingesting content from a remote calendar server is a better idea?


The classic python plotting tool is matplotlib. It can’t do all those modern hipster graphs without hard labour, is awful at animations and interactions, and is ugly by default. But it works OK out of the box. There are libraries which use matplotlib as a backend and build more elaborate systems on top, but these have not had much longevity so far, so I find myself falling back to plain old matplotlib. It is an acceptable default with lots of weird edge cases when you try to be clever, but gets the job 80% done.

Note some confusing terminology: an `Axes` object, which is constructed by an `add_subplot` command, contains two `Axis` objects, but is much more than a list of such objects, being the fundamental object upon which a graph is drawn.

But don’t listen to me describe it. Observe this lovely diagram which explains all.


Read Jakevdp’s manual for some pedagogic advice.

- If I am using jupyter, the nerdy extension is jupyter-matplotlib which integrates interactive plotting into the notebook better.
- Improving log y-axis plots, esp histograms
- drawnow allows dynamically updated diagrams. It is, ironically, itself updated not especially dynamically.
- Nicolas P. Rougier’s scientific visualisation book, rougier/scientific-visualization-book: An open access book on scientific visualization using python and matplotlib

Confusing.

```
%matplotlib ipympl
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(3*x)
ax.plot(x, y)
```

Traditionally annoying. There are colorbars everywhere and aspect ratios are horrible and getting multiple images to plot is vexing etc.

There are helpers for this in modern matplotlib; the keyword to look for is `mpl_toolkits.axes_grid1`. See also the tutorial.
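For instance, `make_axes_locatable` from `axes_grid1` carves a colorbar axis off the image axis, so the image keeps its aspect ratio. A minimal sketch (the headless `Agg` backend is used here so it runs in a script):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this in interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

fig, ax = plt.subplots()
im = ax.imshow(np.random.rand(10, 10))

# Steal 5% of the image axis width for the colorbar
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
fig.colorbar(im, cax=cax)
```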

Alternatively, use proplot.

matplotlib will always be horrible to use because the designers were not prophets, so some of their early API design guesses were bad, but we are stuck with them for compatibility. There are various attempts to abstract away matplotlib’s horrible API behind nicer ones.

Matplotlib is an extremely versatile plotting package used by scientists and engineers far and wide. However, matplotlib can be cumbersome or repetitive for users who…

Make highly complex figures with many subplots.

Want to finely tune their annotations and aesthetics.

Need to make new figures nearly every day.

Proplot’s core mission is to provide a smoother plotting experience for matplotlib’s most demanding users. We accomplish this by

expanding upon matplotlib’s object-oriented interface. Proplot makes changes that would be hard to justify or difficult to incorporate into matplotlib itself, owing to differing design choices and backwards compatibility considerations. This page enumerates these changes and explains how they address the limitations of matplotlib’s default interface. To start using these features, see the usage introduction and the user guide.

The next-generation seaborn interface attempts to achieve a pythonic equivalent to ggplot, at least somewhat:

as seaborn has become more powerful, one has to write increasing amounts of matplotlib code to recreate what it is doing.

So the goal is to expose seaborn’s core features — integration with pandas, automatic mapping between data and graphics, statistical transformations — within an interface that is more compositional, extensible, and comprehensive.

One will note that the result looks a bit (a lot?) like ggplot. That’s not unintentional, but the goal is also not to “port ggplot2 to Python”. (If that’s what you’re looking for, check out the very nice plotnine package). There is an immense amount of wisdom in the grammar of graphics and in its particular implementation as ggplot2. But I think that, as languages, R and Python are just too different for idioms from one to feel natural when translated literally into the other. So while I have taken much inspiration from ggplot, I’ve also made plenty of choices differently, for better or for worse.

**UPDATE** this has been released.

Plotnine implements a best-effort clone of R’s ggplot2 library for matplotlib. I believe plotnine supersedes the abandoned(?) ggplot.py by yhat (ggplot source, plotnine source).

The default matplotlib stylesheet aspires to look like 80s spreadsheet defaults, but if you are not a retrofuturist, you want to change the stylesheet. Some of the built-in stylesheets are OK.

Here is an ugly gallery of sometimes-beautiful graph styles. And here is an ugly gallery of sometimes-beautiful colour maps.
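Switching stylesheets is a one-liner; the names used below (`ggplot`, `bmh`) are long-standing built-ins:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in a script
import matplotlib.pyplot as plt

# built-in stylesheets are listed here:
styles = plt.style.available

plt.style.use("ggplot")  # switch globally

# or scoped to one figure only:
with plt.style.context("bmh"):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
```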

Seaborn is another vaunted extension, which I would describe as an “Edward Tufterizer”, originally by Michael Waskom. Extends matplotlib with modern appearance and some missing plot types.

```
import seaborn as sns
sns.set_theme()
```

A cute hack to justify matplotlib’s existence: xkcd graphs.
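`plt.xkcd()` is a context manager, so the wobbly hand-drawn style can be scoped to one figure (it warns if the Humor Sans font is missing, but still works):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in a script
import matplotlib.pyplot as plt

with plt.xkcd():  # hand-drawn style applies only inside this block
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 0])
    ax.set_title("very serious results")
n_axes = len(fig.axes)
```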

`plt.savefig("image.png", dpi=300, bbox_inches='tight', pad_inches=0)`

Suppressing axes?

```
ax = plt.gca()
ax.axes.xaxis.set_visible(False)
ax.axes.yaxis.set_visible(False)
```

An interactive exploratory matplotlib GUI toolkit/app is glue. They have solved a lot of python gui problems, bless them, and have tried to make everything more-or-less interactive.

Glue is designed with "data-hacking" workflows in mind, and can be used in different ways. For instance, you can simply make use of the graphical Glue application as is, and never type a line of code. However, you can also interact with Glue via Python in different ways:

- Using the IPython terminal built-in to the Glue application
- Sending data in the form of NumPy arrays or Pandas DataFrames to Glue for exploration from a Python or IPython session.
- Customizing/hacking your Glue setup using `config.py` files, including automatically loading and cleaning data before starting Glue, writing custom functions to parse files in your favorite file format, writing custom functions to link datasets, or creating your own data viewers.

Glue thus blurs the boundary between GUI-centric and code-centric data exploration. In addition, it is also possible to develop your own plugin packages for Glue that you can distribute to users separately, and you can also make use of the Glue framework in your own application to provide data linking capabilities.

I have an array of images in `arr`. How can I plot them on a nice simple plot? I need to do this all the time.
If I have skimage installed I can use the montage function.
I do not always have that installed though. Here is a snippet to do it by hand:

```
columns = 5
rows = 3
fsize = 6
fig = plt.figure(figsize=(fsize * columns / rows, fsize))
for i in range(columns * rows):
    img = arr[i]  # assuming images are stacked along the first axis
    ax = fig.add_subplot(rows, columns, i + 1)  # subplot indices start at 1
    ax.imshow(img)
    ax.set_axis_off()
plt.tight_layout(pad=1)
plt.show()
```

Alternatively, use proplot.

Corrupt font cache? In some places we see this advice:

```
import matplotlib.font_manager as fm
fm._rebuild()
```

That does not work for me as of matplotlib 3.6. What does work is finding the cache path thusly:

```
import matplotlib
matplotlib.get_cachedir()
```

and then removing it with `rm -rf`.

Still cannot find that font? What if you are on the HPC?

Harriet Alexander advises setting up truetype fonts properly.

Agustinus Kristiadi, in The Last Mile of Creating Publication-Ready Plots, introduces texworld/tikzplotlib, which is a tikz plotting backend; why do we want this? For one, it can match fonts to the parent document.

Someone made the idiosyncratic choice that the default font is sans serif, even for mathematical text. You can change this by setting serif fonts also for `mathtext`.

```
from matplotlib import rc
rc('font', family='serif', serif=['Palatino'])
rc('mathtext', fontset='dejavuserif')
```

Supported math fonts are reputedly:

- dejavusans (the horrible default)
- dejavuserif (beware of odd greek letters)
- cm (“Computer Modern”. Classic, dated.)
- stix (modern serif, looks OK)
- stixsans (sounds like sans serif to me)

I am not aware of any publications which use those fonts in their style guides, but the difference between, say Deja Vu Serif and Times New Roman is small enough that only font nerds notice.

Alternatively I can render graph labels with TeX, which leads to some weird spacing but allows me to match fonts better. It is also fragile and character set issues are terrible. Are these problems eased if I use XeLaTeX/LuaLaTeX?

Further suggestion: Automate these selection problems away using TUEplots.

TUEplots is a light-weight matplotlib extension that adapts your figure sizes to formats more suitable for scientific publications. It produces configurations that are compatible with matplotlib’s rcParams, and provides fonts, figure sizes, font sizes, color schemes, and more, for a number of publication formats.

```
import matplotlib.pyplot as plt
from tueplots import bundles

bundles.icml2022()
# {'text.usetex': True, 'font.family': 'serif',
#  'text.latex.preamble': '\\usepackage{times} ',
#  'figure.figsize': (3.25, 2.0086104634371584),
#  'figure.constrained_layout.use': True, 'figure.autolayout': False,
#  'savefig.bbox': 'tight', 'savefig.pad_inches': 0.015,
#  'font.size': 8, 'axes.labelsize': 8, 'legend.fontsize': 6,
#  'xtick.labelsize': 6, 'ytick.labelsize': 6, 'axes.titlesize': 8}

# Plug any of those into either the rcParams or into an rc_context:
with plt.rc_context(bundles.icml2022()):
    pass
```

I am indebted to my colleague Christian Walder for suggesting the following as a reliable initialisation procedure for matplotlib plotting with equations.

```
import matplotlib
import sys, os

# GTK GTKAgg GTKCairo MacOSX Qt4Agg TkAgg WX WXAgg CocoaAgg
# GTK3Cairo GTK3Agg WebAgg agg cairo emf gdk pdf pgf ps svg template
is_mac = sys.platform == 'darwin'
if is_mac:
    _matplotlib_backend = 'MacOSX'
else:
    _matplotlib_backend = 'pdf'
matplotlib.rcParams['svg.fonttype'] = 'none'
matplotlib.rcParams['backend'] = _matplotlib_backend
matplotlib.rcParams['mathtext.fontset'] = 'stix'
matplotlib.rcParams['font.family'] = 'Times New Roman'
matplotlib.use(_matplotlib_backend)
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.switch_backend(_matplotlib_backend)
# print(matplotlib.pyplot.get_backend())
try:
    import cairocffi as cairo
except ImportError:
    pass
    # logging.warning('import cairocffi failed')
_latex_preamble = [
    r'\usepackage{amsmath,bm}',
    r'\newcommand\what{\hat{\bm{w}}}',
    r'\newcommand\tr{^\top}',
    r'\newcommand\dt[1]{\left|#1\right|}',
]
_latex_path = '/Library/TeX/texbin/'

def use_latex_mpl(
        latex_path=_latex_path,
        latex_preamble=_latex_preamble):
    mpl.rcParams['text.usetex'] = True
    # NB newer matplotlib wants a single string here, e.g. '\n'.join(latex_preamble)
    mpl.rcParams['text.latex.preamble'] = latex_preamble
    if latex_path is not None:
        os.environ['PATH'] = '%s:%s' % (os.environ['PATH'], latex_path)
```

Yellowbrick is a matplotlib specialisation for hyperparameter optimisation.

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it’s using Matplotlib.

Doing inference where the probability metric measuring discrepancy between some target distribution and the implied inferential distribution is an optimal-transport one. Frequently intractable, but neat when we can get it.

Wasserstein GANs and OT GANs (Salimans et al. 2018) are argued to do an approximate optimal transport inference, indirectly.

See e.g. (J. H. Huggins et al. 2018b, 2018a) for a particular Bayes posterior approximation using OT.

Daniel Daza in Approximating Wasserstein distances with PyTorch touches upon Fatras et al. (2020):

Optimal transport distances are powerful tools to compare probability distributions and have found many applications in machine learning. Yet their algorithmic complexity prevents their direct use on large scale datasets. To overcome this challenge, practitioners compute these distances on minibatches i.e., they average the outcome of several smaller optimal transport problems. We propose in this paper an analysis of this practice, which effects are not well understood so far. We notably argue that it is equivalent to an implicit regularization of the original problem, with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with defects such as loss of distance property.
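For intuition about what the entropic OT solvers below actually compute, here is a toy numpy sketch of plain Sinkhorn iterations. This is an illustration only, not how POT/OTT implement it (they use log-domain stabilisation among other things), and it is numerically naive for small `eps`:

```python
import numpy as np

def sinkhorn(a, b, M, eps=0.1, n_iter=500):
    """Entropically-regularised OT plan between histograms a, b
    with cost matrix M, via Sinkhorn-Knopp matrix scaling."""
    K = np.exp(-M / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # scale to match column marginal b
        u = a / (K @ v)           # scale to match row marginal a
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(0)
n, m = 5, 7
a = np.full(n, 1 / n)             # uniform source weights
b = np.full(m, 1 / m)             # uniform target weights
M = rng.random((n, m))            # arbitrary ground cost
P = sinkhorn(a, b, M)
# the plan's marginals recover a and b
assert np.allclose(P.sum(axis=1), a, atol=1e-6)
assert np.allclose(P.sum(axis=0), b, atol=1e-6)
```

The regularised OT cost is then `(P * M).sum()`; as `eps` shrinks it approaches the unregularised Wasserstein cost, at the price of slower, less stable iterations.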

Optimal Transport Tools (OTT) (Cuturi et al. 2022), a toolbox for all things Wasserstein (documentation):

The goal of OTT is to provide sturdy, versatile and efficient optimal transport solvers, taking advantage of JAX features, such as JIT, auto-vectorization and implicit differentiation.

A typical OT problem has two ingredients: a pair of weight vectors `a` and `b` (one for each measure), with a ground cost matrix that is either directly given, or derived as the pairwise evaluation of a cost function on pairs of points taken from two measures. The main design choice in OTT comes from encapsulating the cost in a `Geometry` object, and bundling it with a few useful operations (notably kernel applications). The most common geometry is that of two clouds of vectors compared with the squared Euclidean distance, as illustrated in the example below:

A self-contained example of this in action:

```
import jax
import jax.numpy as jnp
from ott.tools import transport

# Samples two point clouds and their weights.
rngs = jax.random.split(jax.random.PRNGKey(0), 4)
n, m, d = 12, 14, 2
x = jax.random.normal(rngs[0], (n, d)) + 1
y = jax.random.uniform(rngs[1], (m, d))
a = jax.random.uniform(rngs[2], (n,))
b = jax.random.uniform(rngs[3], (m,))
a, b = a / jnp.sum(a), b / jnp.sum(b)

# Computes the couplings via Sinkhorn algorithm.
ot = transport.solve(x, y, a=a, b=b)
P = ot.matrix
```

The call to `sinkhorn` above works out the optimal transport solution by storing its output. The transport matrix can be instantiated using those optimal solutions and the `Geometry` again. That transport matrix links each point from the first point cloud to one or more points from the second, as illustrated below.

To be more precise, the `sinkhorn` algorithm operates on the `Geometry`, taking into account weights `a` and `b`, to solve the OT problem, producing a named tuple that contains two optimal dual potentials `f` and `g` (vectors of the same size as `a` and `b`), the objective `reg_ot_cost`, a log of the `errors` of the algorithm as it converges, and a `converged` flag.

POT: Python Optimal Transport (Rémi Flamary et al. 2021)

This open source Python library provides several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.

Website and documentation: https://PythonOT.github.io/

Source Code (MIT): https://github.com/PythonOT/POT

POT provides the following generic OT solvers (links to examples):

- OT Network Simplex solver for the linear program / Earth Movers Distance.
- Conditional gradient and Generalized conditional gradient for regularized OT.
- Entropic regularization OT solver with Sinkhorn Knopp Algorithm, stabilized version, greedy Sinkhorn and Screening Sinkhorn.
- Bregman projections for Wasserstein barycenter, convolutional barycenter and unmixing.
- Sinkhorn divergence and entropic regularization OT from empirical data.
- Debiased Sinkhorn barycenters / Sinkhorn divergence barycenter.
- Smooth optimal transport solvers (dual and semi-dual) for KL and squared L2 regularizations.
- Weak OT solver between empirical distributions.
- Non regularized Wasserstein barycenters with LP solver (only small scale).
- Gromov-Wasserstein distances and GW barycenters (exact and regularized), differentiable using gradients from Graph Dictionary Learning.
- Fused-Gromov-Wasserstein distances solver and FGW barycenters.
- Stochastic solver and differentiable losses for Large-scale Optimal Transport (semi-dual problem and dual problem).
- Sampled solver of Gromov Wasserstein for large-scale problems with any loss function.
- Non regularized free support Wasserstein barycenters.
- One dimensional Unbalanced OT with KL relaxation and barycenter \[10, 25\]. Also exact unbalanced OT with KL and quadratic regularization, and the regularization path of UOT.
- Partial Wasserstein and Gromov-Wasserstein (exact and entropic formulations).
- Sliced Wasserstein \[31, 32\] and Max-sliced Wasserstein, which can be used for gradient flows.
- Graph Dictionary Learning solvers.
- Several backends for easy use of POT with Pytorch/jax/Numpy/Cupy/Tensorflow arrays.

POT provides the following Machine Learning related solvers:

- Optimal transport for domain adaptation with group lasso regularization, Laplacian regularization and semi-supervised setting.
- Linear OT mapping and Joint OT mapping estimation.
- Wasserstein Discriminant Analysis (requires autograd + pymanopt).
- JCPOT algorithm for multi-source domain adaptation with target shift.
Some other examples are available in the documentation.
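The entropic solvers above all revolve around the same alternating-scaling fixed point. As a reminder of what they iterate, here is a minimal, dependency-free sketch of the textbook Sinkhorn-Knopp updates (this is not POT's API; in POT itself you would call something like `ot.sinkhorn(a, b, M, reg)`):

```python
import math

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropic OT via Sinkhorn-Knopp: alternately rescale rows and columns
    of the Gibbs kernel K = exp(-C/eps) until the marginals match a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan: P[i][j] = u[i] * K[i][j] * v[j]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two points a side, cheap diagonal: nearly all mass should stay put.
P = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

For small `eps` the kernel underflows, which is why the stabilized and log-domain variants listed above exist.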

The GeomLoss library provides efficient GPU implementations for:

- Kernel norms (also known as Maximum Mean Discrepancies).
- Hausdorff divergences, which are positive definite generalizations of the Chamfer-ICP loss and are analogous to log-likelihoods of Gaussian Mixture Models.
- Debiased Sinkhorn divergences, which are affordable yet positive and definite approximations of Optimal Transport (Wasserstein) distances.

It is hosted on GitHub and distributed under the permissive MIT license.
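The debiased Sinkhorn divergence mentioned above removes the entropic bias of regularized OT by subtracting half of each self-transport term (Feydy et al. 2019):

\[
S_\varepsilon(\alpha,\beta) = \mathrm{OT}_\varepsilon(\alpha,\beta) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\alpha,\alpha) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\beta,\beta),
\]

which vanishes when \(\alpha = \beta\) and interpolates between plain OT (as \(\varepsilon \to 0\)) and a kernel MMD (as \(\varepsilon \to \infty\)).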

GeomLoss functions are available through the custom PyTorch layers `SamplesLoss`, `ImagesLoss` and `VolumesLoss`, which allow you to work with weighted point clouds (of any dimension), density maps and volumetric segmentation masks.

Rigollet and Weed (2018):

We give a statistical interpretation of entropic optimal transport by showing that performing maximum-likelihood estimation for Gaussian deconvolution corresponds to calculating a projection with respect to the entropic optimal transport distance.

Thomas Viehmann's An efficient implementation of the Sinkhorn algorithm for the GPU is a PyTorch CUDA extension (Viehmann 2019).

Marco Cuturi’s course notes on OT include a 400-page slide deck.

Agueh, Martial, and Guillaume Carlier. 2011.“Barycenters in the Wasserstein Space.”*SIAM Journal on Mathematical Analysis* 43 (2): 904–24.

Alaya, Mokhtar Z., Maxime Berar, Gilles Gasso, and Alain Rakotomamonjy. 2019.“Screening Sinkhorn Algorithm for Regularized Optimal Transport.”*Advances in Neural Information Processing Systems* 32.

Altschuler, Jason, Jonathan Niles-Weed, and Philippe Rigollet. n.d.“Near-Linear Time Approximation Algorithms for Optimal Transport via Sinkhorn Iteration,” 11.

Ambrogioni, Luca, Umut Guclu, and Marcel van Gerven. 2018.“Wasserstein Variational Gradient Descent: From Semi-Discrete Optimal Transport to Ensemble Variational Inference.”*arXiv:1811.02827 [Cs, Stat]*, November.

Ambrogioni, Luca, Umut Güçlü, Yagmur Güçlütürk, Max Hinne, Eric Maris, and Marcel A. J. van Gerven. 2018.“Wasserstein Variational Inference.” In*Proceedings of the 32Nd International Conference on Neural Information Processing Systems*, 2478–87. NIPS’18. USA: Curran Associates Inc.

Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savare. 2008.*Gradient Flows: In Metric Spaces and in the Space of Probability Measures*. 2nd ed. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel.

Angenent, Sigurd, Steven Haker, and Allen Tannenbaum. 2003.“Minimizing Flows for the Monge-Kantorovich Problem.”*SIAM Journal on Mathematical Analysis* 35 (1): 61–97.

Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017.“Wasserstein Generative Adversarial Networks.” In*International Conference on Machine Learning*, 214–23.

Arora, Sanjeev, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. 2017.“Generalization and Equilibrium in Generative Adversarial Nets (GANs).”*arXiv:1703.00573 [Cs]*, March.

Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019.“Gaussian Processes with Multidimensional Distribution Inputs via Optimal Transport and Hilbertian Embedding.”*arXiv:1805.00753 [Stat]*, April.

Benamou, Jean-David. 2021.“Optimal Transportation, Modelling and Numerical Simulation.”*Acta Numerica* 30 (May): 249–325.

Benamou, Jean-David, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. 2014.“Iterative Bregman Projections for Regularized Transportation Problems.”*arXiv:1412.5154 [Math]*, December.

Berg, Rianne van den, Leonard Hasenclever, Jakub M. Tomczak, and Max Welling. 2018.“Sylvester Normalizing Flows for Variational Inference.” In*UAI18*.

Bishop, Adrian N., and Arnaud Doucet. 2014.“Distributed Nonlinear Consensus in the Space of Probability Measures.”*IFAC Proceedings Volumes*, 19th IFAC World Congress, 47 (3): 8662–68.

Blanchet, Jose, Lin Chen, and Xun Yu Zhou. 2018.“Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances.”*arXiv:1802.04885 [Stat]*, February.

Blanchet, Jose, Arun Jambulapati, Carson Kent, and Aaron Sidford. 2018.“Towards Optimal Running Times for Optimal Transport.”*arXiv:1810.07717 [Cs]*, October.

Blanchet, Jose, Yang Kang, and Karthyek Murthy. 2016.“Robust Wasserstein Profile Inference and Applications to Machine Learning.”*arXiv:1610.05627 [Math, Stat]*, October.

Blanchet, Jose, Karthyek Murthy, and Nian Si. 2019.“Confidence Regions in Wasserstein Distributionally Robust Estimation.”*arXiv:1906.01614 [Math, Stat]*, June.

Blanchet, Jose, Karthyek Murthy, and Fan Zhang. 2018.“Optimal Transport Based Distributionally Robust Optimization: Structural Properties and Iterative Schemes.”*arXiv:1810.02403 [Math]*, October.

Blondel, Mathieu, Vivien Seguy, and Antoine Rolet. 2018.“Smooth and Sparse Optimal Transport.” In*AISTATS 2018*.

Boissard, Emmanuel. 2011.“Simple Bounds for the Convergence of Empirical and Occupation Measures in 1-Wasserstein Distance.”*Electronic Journal of Probability* 16 (none).

Bonneel, Nicolas. n.d.“Displacement Interpolation Using Lagrangian Mass Transport,” 11.

Canas, Guillermo D., and Lorenzo Rosasco. 2012.“Learning Probability Measures with Respect to Optimal Transport Metrics.”*arXiv:1209.1077 [Cs, Stat]*, September.

Carlier, Guillaume, Marco Cuturi, Brendan Pass, and Carola Schoenlieb. 2017.“Optimal Transport Meets Probability, Statistics and Machine Learning,” 9.

Chizat, Lenaic, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. 2017.“Scaling Algorithms for Unbalanced Transport Problems.”*arXiv:1607.05816 [Math]*, May.

Chu, Casey, Jose Blanchet, and Peter Glynn. 2019.“Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning.” In*ICML*.

Corenflos, Adrien, James Thornton, George Deligiannidis, and Arnaud Doucet. 2021.“Differentiable Particle Filtering via Entropy-Regularized Optimal Transport.”*arXiv:2102.07850 [Cs, Stat]*, June.

Coscia, Michele. 2020.“Generalized Euclidean Measure to Estimate Network Distances,” 11.

Courty, Nicolas, Rémi Flamary, Devis Tuia, and Alain Rakotomamonjy. 2016.“Optimal Transport for Domain Adaptation.”*arXiv:1507.00504 [Cs]*, June.

Cuturi, Marco. 2013.“Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances.” In*Advances in Neural Information Processing Systems 26*.

Cuturi, Marco, and Arnaud Doucet. 2014.“Fast Computation of Wasserstein Barycenters.” In*International Conference on Machine Learning*, 685–93. PMLR.

Cuturi, Marco, Laetitia Meng-Papaxanthos, Yingtao Tian, Charlotte Bunne, Geoff Davis, and Olivier Teboul. 2022.“Optimal Transport Tools (OTT): A JAX Toolbox for All Things Wasserstein.”*arXiv Preprint arXiv:2201.12324*.

Fatras, Kilian, Younes Zine, Rémi Flamary, Remi Gribonval, and Nicolas Courty. 2020.“Learning with Minibatch Wasserstein : Asymptotic and Gradient Properties.” In*Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics*, 2131–41. PMLR.

Fernholz, Luisa Turrin. 1983.*von Mises calculus for statistical functionals*. Lecture Notes in Statistics 19. New York: Springer.

Feydy, Jean, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouve, and Gabriel Peyré. 2019.“Interpolating Between Optimal Transport and MMD Using Sinkhorn Divergences.” In*Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics*, 2681–90. PMLR.

Flamary, Rémi, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, et al. 2021. “POT: Python Optimal Transport.” *Journal of Machine Learning Research* 22 (78): 1–8.

Flamary, Rémi, Marco Cuturi, Nicolas Courty, and Alain Rakotomamonjy. 2018.“Wasserstein Discriminant Analysis.”*Machine Learning* 107 (12): 1923–45.

Flamary, Remi, Alain Rakotomamonjy, Nicolas Courty, and Devis Tuia. n.d.“Optimal Transport with Laplacian Regularization,” 10.

Frogner, Charlie, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. 2015.“Learning with a Wasserstein Loss.” In*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2053–61. Curran Associates, Inc.

Gao, Rui, and Anton J. Kleywegt. 2022.“Distributionally Robust Stochastic Optimization with Wasserstein Distance.” arXiv.

Garbuno-Inigo, Alfredo, Franca Hoffmann, Wuchen Li, and Andrew M. Stuart. 2020.“Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.”*SIAM Journal on Applied Dynamical Systems* 19 (1): 412–41.

Genevay, Aude, Marco Cuturi, Gabriel Peyré, and Francis Bach. 2016.“Stochastic Optimization for Large-Scale Optimal Transport.” In*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3432–40. Curran Associates, Inc.

Genevay, Aude, Gabriel Peyré, and Marco Cuturi. 2017.“Learning Generative Models with Sinkhorn Divergences.”*arXiv:1706.00292 [Stat]*, October.

Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014.“Generative Adversarial Nets.” In*Advances in Neural Information Processing Systems 27*, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2672–80. NIPS’14. Cambridge, MA, USA: Curran Associates, Inc.

Gozlan, Nathael, and Christian Léonard. 2010.“Transport Inequalities. A Survey.”*arXiv:1003.3852 [Math]*, March.

Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017.“Improved Training of Wasserstein GANs.”*arXiv:1704.00028 [Cs, Stat]*, March.

Guo, Xin, Johnny Hong, Tianyi Lin, and Nan Yang. 2017.“Relaxed Wasserstein with Applications to GANs.”*arXiv:1705.07164 [Cs, Stat]*, May.

Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018a.“Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.”*arXiv:1806.10234 [Cs, Stat]*, June.

———. 2018b.“Practical Bounds on the Error of Bayesian Posterior Approximations: A Nonasymptotic Approach.”*arXiv:1809.09505 [Cs, Math, Stat]*, September.

Huggins, Jonathan H., Mikołaj Kasprzak, Trevor Campbell, and Tamara Broderick. 2019.“Practical Posterior Error Bounds from Variational Objectives.”*arXiv:1910.04102 [Cs, Math, Stat]*, October.

Huggins, Jonathan, Ryan P Adams, and Tamara Broderick. 2017.“PASS-GLM: Polynomial Approximate Sufficient Statistics for Scalable Bayesian GLM Inference.” In*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 3611–21. Curran Associates, Inc.

Khan, Gabriel, and Jun Zhang. 2022.“When Optimal Transport Meets Information Geometry.”*Information Geometry*, June.

Kim, Jin W., and Prashant G. Mehta. 2019.“An Optimal Control Derivation of Nonlinear Smoothing Equations,” April.

Léonard, Christian. 2014.“A Survey of the Schrödinger Problem and Some of Its Connections with Optimal Transport.”*Discrete & Continuous Dynamical Systems - A* 34 (4): 1533.

Liu, Huidong, Xianfeng Gu, and Dimitris Samaras. 2018.“A Two-Step Computation of the Exact GAN Wasserstein Distance.” In*International Conference on Machine Learning*, 3159–68.

Liu, Qiang, and Dilin Wang. 2019.“Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” In*Advances In Neural Information Processing Systems*.

Louizos, Christos, and Max Welling. 2017.“Multiplicative Normalizing Flows for Variational Bayesian Neural Networks.” In*PMLR*, 2218–27.

Magyar, Jared C., and Malcolm S. Sambridge. 2022.“The Wasserstein Distance as a Hydrological Objective Function.” Preprint. Catchment hydrology/Mathematical applications.

Mahdian, Saied, Jose Blanchet, and Peter Glynn. 2019.“Optimal Transport Relaxations with Application to Wasserstein GANs.”*arXiv:1906.03317 [Cs, Math, Stat]*, June.

Mallasto, Anton, Augusto Gerolin, and Hà Quang Minh. 2021.“Entropy-Regularized 2-Wasserstein Distance Between Gaussian Measures.”*Information Geometry*, August.

Marzouk, Youssef, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016.“Sampling via Measure Transport: An Introduction.” In*Handbook of Uncertainty Quantification*, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1:1–41. Cham: Springer International Publishing.

Maurya, Abhinav. 2018.“Optimal Transport in Statistical Machine Learning : Selected Review and Some Open Questions.” In.

Minh, Hà Quang. 2022.“Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.”*SIAM/ASA Journal on Uncertainty Quantification*, February, 96–124.

Mohajerin Esfahani, Peyman, and Daniel Kuhn. 2018.“Data-Driven Distributionally Robust Optimization Using the Wasserstein Metric: Performance Guarantees and Tractable Reformulations.”*Mathematical Programming* 171 (1): 115–66.

Montavon, Grégoire, Klaus-Robert Müller, and Marco Cuturi. 2016.“Wasserstein Training of Restricted Boltzmann Machines.” In*Advances in Neural Information Processing Systems 29*, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3711–19. Curran Associates, Inc.

Ostrovski, Georg, Will Dabney, and Remi Munos. n.d.“Autoregressive Quantile Networks for Generative Modeling,” 10.

Panaretos, Victor M., and Yoav Zemel. 2019.“Statistical Aspects of Wasserstein Distances.”*Annual Review of Statistics and Its Application* 6 (1): 405–31.

Perrot, Michaël, Nicolas Courty, Rémi Flamary, and Amaury Habrard. n.d.“Mapping Estimation for Discrete Optimal Transport,” 9.

Peyré, Gabriel, and Marco Cuturi. 2019.*Computational Optimal Transport*. Vol. 11.

Peyré, Gabriel, Marco Cuturi, and Justin Solomon. 2016.“Gromov-Wasserstein Averaging of Kernel and Distance Matrices.” In*International Conference on Machine Learning*, 2664–72. PMLR.

Redko, Ievgen, Nicolas Courty, Rémi Flamary, and Devis Tuia. 2019.“Optimal Transport for Multi-Source Domain Adaptation Under Target Shift.” In*The 22nd International Conference on Artificial Intelligence and Statistics*, 849–58. PMLR.

Rezende, Danilo Jimenez, and Shakir Mohamed. 2015.“Variational Inference with Normalizing Flows.” In*International Conference on Machine Learning*, 1530–38. ICML’15. Lille, France: JMLR.org.

Rigollet, Philippe, and Jonathan Weed. 2018. “Entropic Optimal Transport Is Maximum-Likelihood Deconvolution.” arXiv.

Rustamov, Raif M. 2019.“Closed-Form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto-Encoders.”*arXiv:1901.03227 [Cs, Stat]*, January.

Salimans, Tim, Han Zhang, Alec Radford, and Dimitris Metaxas. 2018.“Improving GANs Using Optimal Transport.” arXiv.

Sambridge, Malcolm, Andrew Jackson, and Andrew P Valentine. 2022.“Geophysical Inversion and Optimal Transport.”*Geophysical Journal International* 231 (1): 172–98.

Santambrogio, Filippo. 2015.*Optimal Transport for Applied Mathematicians*. Edited by Filippo Santambrogio. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing.

Schmitzer, Bernhard. 2019.“Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems.”*arXiv:1610.06519 [Cs, Math]*, February.

Solomon, Justin, Fernando de Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. 2015.“Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains.”*ACM Transactions on Graphics* 34 (4): 66:1–11.

Spantini, Alessio, Daniele Bigoni, and Youssef Marzouk. 2017.“Inference via Low-Dimensional Couplings.”*Journal of Machine Learning Research* 19 (66): 2639–709.

Taghvaei, Amirhossein, and Prashant G. Mehta. 2019.“An Optimal Transport Formulation of the Ensemble Kalman Filter,” October.

Verdinelli, Isabella, and Larry Wasserman. 2019.“Hybrid Wasserstein Distance and Fast Distribution Clustering.”*Electronic Journal of Statistics* 13 (2): 5088–5119.

Viehmann, Thomas. 2019. “Implementation of Batched Sinkhorn Iterations for Entropy-Regularized Wasserstein Loss.” arXiv.

Wang, Prince Zizhuang, and William Yang Wang. 2019.“Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling.” In*Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)*, 284–94. Minneapolis, Minnesota: Association for Computational Linguistics.

Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020.“Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In*Proceedings of NeurIPS 2020*.

Zhu, B., J. Jiao, and D. Tse. 2020.“Deconstructing Generative Adversarial Networks.”*IEEE Transactions on Information Theory* 66 (11): 7155–79.

Friends of mine in the dating circuit occasionally raise a qualm about a prospective match: *Ah! But they are so much older/younger than me! Is that unfair, that I am fishing outside my age bracket?*

There are various heuristics to address this problem—e.g. a romantic match is acceptable if

- they are older than half your age plus ten years
- they are within ±10 years of your age
- etc.

PISH POSH. Comparable ages of romantic partners is a quantitative question, so we can SOLVE IT WITH DATA. This page is the DEFINITIVE answer to optimal dating, based on selecting partners’ ages by the optimal method: choosing them to be in the same population percentile as yourself.

To keep this simple, we start with classic vanilla heterosexual dating (males seeking females and vice versa), and we will do it with the population profile of Australia; feel free to update this with whatever configuration of inter-gender/international attraction you see fit. Bonus exercise: you could add relationship status, religious community size, etc.

At the end we will have a model of age-parity in dating, and another slice through the classic age mismatch problem, which will resolve it forever.1

Solutions here are non-prescriptive; I do not guarantee that utopia is attained either by following this model, or by ignoring it. I do not wish to quash love in any of its forms. But if your love cannot handle a bit of quantitative critique, maybe you need to rethink it.

OK, estimated Australian population statistics are at ABS catalog number 3101.0, and we can download them using standard Australian data tools.
A little googling additionally reveals that we are after table 59, *Estimated Resident Population By Single Year Of Age, Australia*.

I started writing an immaculate R script to download, extract and clean this data, but I got bored, so instead I simply copy-pasted the data for 2021 from a basic spreadsheet, which is less maintainable, but TBH no-one is paying me for this, so…

OK, let us put in some data-prep groundwork.

```
library("dplyr")
pop.males = c(
  776290, 831593, 835444, 762032,
  837110, 918413, 940528, 928244,
  815889, 817302, 794156, 762539,
  711192, 617537, 554692, 387842,
  249962, 208817)
pop.females = c(
  733669, 785181, 788448, 717600,
  786274, 903618, 959092, 939143,
  838611, 832733, 817398, 787968,
  753833, 662606, 592081, 419353,
  295446, 325443)
bin.edges = c(
  0, 5, 10, 15,
  20, 25, 30, 35,
  40, 45, 50, 55,
  60, 65, 70, 75,
  80, 85, 90)
# approximately delete ages under 16,
# which is the age of consent in most of Australia;
# see https://en.wikipedia.org/wiki/Ages_of_consent_in_Oceania#Australia
pop.males = pop.males[-3:-1]
pop.males[1] = 0.8 * pop.males[1]
pop.females = pop.females[-3:-1]
pop.females[1] = 0.8 * pop.females[1]
bin.edges = bin.edges[-3:-1]
bin.edges[1] = bin.edges[1] + 1
```

Boring data prep over; let's plot some stuff.

```
library("plotly")
fig <- plot_ly(
  # all but the last bin edge; clearer than the original
  # bin.edges[1:length(bin.edges)-1], which parses as (1:length(...))-1
  x = head(bin.edges, -1),
) %>% add_lines(
  y = pop.females,
  name = "females",
  line = list(shape = "hv"),
  hovertemplate = 'Females: %{y:.3s}'
) %>% add_lines(
  y = pop.males,
  name = "males",
  line = list(shape = "hv"),
  hovertemplate = 'Males: %{y:.3s}'
) %>% layout(
  xaxis = list(title = 'Age'),
  yaxis = list(title = 'Population')
)
fig
```

So if you are, e.g. a male in the 30-35 age bracket, you know there are about 0.94M people in it with you.

What does that tell us? Interesting stuff I did not know, in fact. Men outnumber women in the young age brackets, up to about 30 years old. After that women are in the majority. Huh. What does this mean about whom to date?

One model for who is near me in age is to take people from my target gender who are *in the same percentile of their gender as I am in mine*.
This is perhaps best explained by example.
Let us construct some lookup tables of the population percentile; these tell us where in the population a given age is (I am younger than \(x\%\) of the people if my age is \(y\)).

```
# Cumulative age tables
males.cumul = c(0, cumsum(pop.males))
females.cumul = c(0, cumsum(pop.females))
males.percentile = 100 * males.cumul/males.cumul[length(males.cumul)]
females.percentile = 100 * females.cumul/females.cumul[length(females.cumul)]
males.age.to.percentile = approxfun(bin.edges, males.percentile, rule=2)
females.age.to.percentile = approxfun(bin.edges, females.percentile, rule=2)
males.percentile.to.age = approxfun(males.percentile, bin.edges, rule=2)
females.percentile.to.age = approxfun(females.percentile, bin.edges, rule=2)
```

In action: Suppose that I am 30 and male. How many males are younger than me (in the population of males above the age of consent)?

```
sprintf(
  "As a 30 year old male, I am at the %1.1f%% male age percentile",
  males.age.to.percentile(30))
```

`## [1] "As a 30 year old male, I am at the 23.3% male age percentile"`

Nice. Which women are at the same percentile as me?

```
sprintf(
  "The population-adjusted equivalent age to mine, 30, amongst women is %1.1f",
  females.percentile.to.age(males.age.to.percentile(30)))
```

`## [1] "The population-adjusted equivalent age to mine, 30, amongst women is 30.9"`

Note that the equivalent-percentile age among women is a little higher.
This is because the two populations are not distributed identically. **tl;dr** If I am male, the females who are proportionally the same age as me with respect to *their* gender are about 1 year older than me.
By *proportionally* we mean that each of us would have about the same proportion of our *own* gender younger (or older) than us.
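For cross-checking outside R, the same piecewise-linear lookup is easy in plain Python. A sketch (population vectors transcribed from the data prep above; `interp` mimics `approxfun(..., rule=2)`, i.e. linear interpolation clamped at the ends):

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys), clamped at the ends
    (the moral equivalent of R's approxfun(..., rule=2))."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

# Populations above the age of consent, as in the R prep:
# the 15-19 bin is scaled by 0.8 to keep only the 16-19 year olds.
pop_males = [0.8 * 762032, 837110, 918413, 940528, 928244, 815889, 817302,
             794156, 762539, 711192, 617537, 554692, 387842, 249962, 208817]
pop_females = [0.8 * 717600, 786274, 903618, 959092, 939143, 838611, 832733,
               817398, 787968, 753833, 662606, 592081, 419353, 295446, 325443]
bin_edges = [16] + list(range(20, 95, 5))  # 16, 20, 25, ..., 90

def percentiles(pop):
    """Cumulative population share (in %) at each bin edge."""
    cumul = [0.0]
    for p in pop:
        cumul.append(cumul[-1] + p)
    return [100 * c / cumul[-1] for c in cumul]

m_pct, f_pct = percentiles(pop_males), percentiles(pop_females)
male_pct_at_30 = interp(30, bin_edges, m_pct)                # ~23.3
equiv_female_age = interp(male_pct_at_30, f_pct, bin_edges)  # ~30.9
```

The numbers agree with the R output above to within interpolation error.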

Demanding my potential partner be *exactly* the same proportional age as me is not plausible (although the astrological implications might be interesting).
How about we admit a "slop" factor, where people are required to be in roughly the same quintile of the population as me, i.e. my partners are within \(\pm 10\) percentile points of me?
And with that final ingredient we have everything we need.

```
library("plotly")
ages = seq(16, 80, by=0.1)
fig <- plot_ly(
  x = ages,
) %>% add_lines(
  y = females.percentile.to.age(males.age.to.percentile(ages)-10),
  name = "lower bound",
  hovertemplate = 'At male age %{x:.3s} lower bound for a female partner is %{y:.3s}'
) %>% add_lines(
  y = females.percentile.to.age(males.age.to.percentile(ages)),
  name = "equivalent age",
  hovertemplate = 'At male age %{x:.3s} equivalent age for a female partner is %{y:.3s}'
) %>% add_lines(
  y = females.percentile.to.age(males.age.to.percentile(ages)+10),
  name = "upper bound",
  hovertemplate = 'At male age %{x:.3s} upper bound for a female partner is %{y:.3s}'
) %>% layout(
  xaxis = list(title = 'male age'),
  yaxis = list(title = 'female age')
)
fig
```

We could do that over for female → male; however, that would just be the same graph reflected about the diagonal, so we leave it as an exercise for the student.

But THERE YOU GO, a perfect solution to the question of whom to date! If you disagree, you… can file a bug report, I guess.

OK, now we are in bonus time. Let us do this over for a hypothetical pansexual revolution, where everyone is indifferent to their partner’s gender:

```
# Cumulative age tables
pop.cumul = c(0, cumsum(pop.males + pop.females))
pop.percentile = 100 * pop.cumul / pop.cumul[length(pop.cumul)]
pop.age.to.percentile = approxfun(bin.edges, pop.percentile, rule=2)
pop.percentile.to.age = approxfun(pop.percentile, bin.edges, rule=2)
fig <- plot_ly(
  x = ages,
) %>% add_lines(
  y = pop.percentile.to.age(pop.age.to.percentile(ages)-10),
  name = "lower bound",
  hovertemplate = 'At age %{x:.3s} lower bound for any partner is %{y:.3s}'
) %>% add_lines(
  y = pop.percentile.to.age(pop.age.to.percentile(ages)+10),
  name = "upper bound",
  hovertemplate = 'At age %{x:.3s} upper bound for any partner is %{y:.3s}'
) %>% layout(
  xaxis = list(title = 'my age'),
  yaxis = list(title = 'their age')
)
fig
```

Nice!
OK, what are we missing?
A selectable slop factor?
A market in age-offset tokens? A dating app? A dating app that uses age-offset tokens?
Something to do with blockchains for some reason?
Feel free to approach me with your ~~business plan~~ free labour.

Oh, what I would like is a widget that showed me, GIVEN MY AGE and a potential partner's age, how far they are from me in quantile. That is tricky to plot in plotlyjs; suggestions?

*Forever* on the internet is about 30 minutes, or until a better offer comes along.↩︎

Some notes on the connection between reproducibility, scholarly discovery, intellectual property, peer review, academic business models and such.

To explain: What was I imagining the clear distinction would be between this page and publication bias?

Cameron Neylon runs a cottage industry producing pragmatic publishing critique from an institutional economics perspective:

e.g. The Marginal Costs of Article Publishing or A Journal is a Club:

we’d been talking about communities, cultures, economics, “public-making” but it was the word ‘club’ and its associated concepts, both pejorative and positive that crystalised everything. We were talking about the clubbishness of making knowledge — the term “Knowledge Clubs” emerged quickly — but also the benefits that such a club might gain in choosing to invest in wider sharing.

Working paper: Potts et al. (2016). Alternatively, see Afonso (2014), “How Academia resembles a drug gang”.

In the business setting this often leads incumbent publishers to a kind of spluttering defense of the value they create, while simultaneously complaining that the customer doesn’t appreciate their work. Flip the target slightly and we’d call this “missing the new market opportunity” or “failing to express the value offering clearly”. […]

Lingua, […] has gone from one of the most important journals in analytical linguistics to no longer being in the field, and seems well on its way to becoming irrelevant. How does a company as competent in its business strategy as Elsevier let this happen? I would argue, as I did at the time that the former editorial board of Lingua resigned to form Glossa that it was a failure to understand the assets.

The neoliberal analysis of Lingua showed an asset generating good revenues, with good analytics and a positive ROI. The capitalist analysis focussed on the fixed assets and trademarks. But it turns out these weren’t what was creating value. What was creating value was the community, built around an editorial board and the good will associated with that.

Also, see Pushing costs downstream.

Here’s a thing I would like to see said a little better, but think is important: An Adversarial Review of “Adversarial Generation of Natural Language”. The argument is that even though it’s nice that arXiv avoids some of the problems of traditional publishing, it inherits some of the problems that traditional publishing tries to avoid. No free lunches.

Journal rank, journal impact factor and so on: who cares? Your funders care, against your advice; but whatever, they have the money, so you need to care too, in order that they will keep funding you.

La Trobe explains it. Scimago Journal Rank is the Google PageRank-inspired, slightly hipper journal ranking; their search tool is probably what you want. Impact factors come from the 60s and are still around; the h-index is also a thing. journalrank might be a factor too?

According to La Trobe, we have the following indices and a (partial) list of their weaknesses.

Hirsch index: The number of articles in a journal [h] that have received at least [h] citations over a citation period.

Weaknesses:

- Editors can manipulate by requiring contributors to add citations from their journals
- Increases with age so bias towards researchers with long publication records
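The definition is easy to operationalize; a minimal sketch with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h items each have at least h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

h_index([10, 8, 5, 4, 3])  # 4: four articles each cited at least 4 times
```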

Journal Impact Factor: Citations to a journal in the JCR year to items published in the previous two years, divided by the total number of citable items (articles and reviews) published in the journal in the previous two years.

Weaknesses:

- Limited to journals within Web of Science
- Cannot be used to compare journals across different subject categories
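The impact-factor arithmetic itself is trivial; a toy sketch with made-up numbers:

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """JCR-year citations to the previous two years' content, divided by
    the citable items (articles and reviews) published in those years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# e.g. 500 citations in the JCR year to 200 citable items from the prior two years
impact_factor(500, 200)  # 2.5
```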

SCImago Journal Rank: Average number of weighted citations received in a year, by articles published in a journal in the previous 3 years.

Weaknesses are that it is “complicated” and that the numbers are small.

So I guess if you *must* do a journal ranking this is the least bad method?

See also academic reading workflow for reader-oriented tips.

A platform for scholarly publishing and peer review that empowers researchers with the

- Autonomy to pursue their passions,
- Authority to develop and disseminate their work, and
- Access to engage with the international community of scholars.

Millions of research papers are available for free on government and university web servers, legally uploaded by the authors themselves, with the express permission of publishers. Unpaywall automatically harvests these freely shared papers from thousands of legal institutional repositories, preprint servers, and publishers, making them all available to you as you read.

Zenodo “is an open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science.”

- Research. Shared. — all research outputs from across all fields of science are welcome!
- Citeable. Discoverable. — uploads get a Digital Object Identifier (DOI) to make them easily and uniquely citeable…
- Flexible licensing — because not everything is under Creative Commons.
- Safe — your research output is stored safely for the future in the same cloud infrastructure as research data from CERN’s Large Hadron Collider.

A major win is the easy (and free) DOI-linking of data and code for reproducible research.

Open Conference Systems (OCS) is a free Web publishing tool that will create a complete Web presence for your scholarly conference. OCS will allow you to:

- create a conference Web site
- compose and send a call for papers
- electronically accept paper and abstract submissions
- allow paper submitters to edit their work
- post conference proceedings and papers in a searchable format
- post, if you wish, the original data sets
- register participants
- integrate post-conference online discussions

A peeriodical is a lightweight virtual journal with you as the Editor-in-chief, giving you complete freedom in setting editorial policy to select the most interesting and useful manuscripts for your readers.

I did not find that explanation as useful as the interview the creators gave.

is an open access online scholarly publishing platform that employs open post-publication peer review. You guessed it! We think transparency from start to finish is critical in scientific communication. […]

Retraction Watch is a watchdog blog that, for sufficiently high-profile research, has somehow ended up doing a lot of well-regarded gatekeeping/exposure.

For iterations on how this system of review and dissemination could work better, see Peer review.

The biggest phenomenon in open access, as far as I can tell, is the massive pirate infrastructure providing open access to journals for free.

Copyright activism, Guerilla open access etc.

See, e.g., Jonathan Basile’s essay on AAARG, Who’s Afraid of AAARG?. Generally, see Free online libraries.

There is an interesting question about mechanism design for the important business of science, to which I do not myself pretend to know good answers.

Various open access (and occasionally also open source) journals attempt to disrupt the incumbent publishers with new business models built around the low cost of internet infrastructure. As with legacy journals, they have varying degrees of success.

One cute boutique example:

Open Journals is a collection of open source, open access journals. We currently have four main publications:

- The Journal of Open Source Software
- The Journal of Open Source Education
- The Open Journal of Astrophysics
- The Journal of Brief Ideas

All of our journals run on open source software which is available under our GitHub organization profile: github.com/openjournals.

All of our journals are open access publications with content licensed under a Creative Commons Attribution 4.0 International License. Copyright remains with the submitting authors.

How can institutions construct good decisions out of the aggregate ignorance, laziness, short-sightedness and chaos that we pump in to them?

A flagship question is about how the market does so, finding prices (maybe) efficiently despite the boundedly rational dynamics of human decisions. Are markets systems for fabricating rational-like behaviour from irrational agents? (Hayek might argue this, and I think also Gode and Sunder of Zero Intelligence Agents fame. Friedman argued that markets in fact *effectively turn people into rational agents*, which is a yet stronger claim.)
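The Gode and Sunder result is easy to replicate in miniature. Here is a toy sketch of my own (not their original simulation, and with invented valuations): budget-constrained “zero-intelligence” traders bidding randomly in a double auction still capture most of the available surplus.

```python
import random

rng = random.Random(42)

# Hypothetical market: buyer valuations and seller costs (invented numbers)
buyer_values = [200, 180, 160, 140, 120]
seller_costs = [100, 120, 140, 160, 180]

# Maximum attainable surplus: match highest-value buyers with lowest-cost sellers
max_surplus = sum(
    v - c
    for v, c in zip(sorted(buyer_values, reverse=True), sorted(seller_costs))
    if v > c
)

# Zero-intelligence-with-constraint traders: random bids/asks, but never
# bid above your valuation or ask below your cost (the ZI-C rule)
buyers = list(buyer_values)
sellers = list(seller_costs)
PRICE_CAP = 200
realized_surplus = 0.0
trades = 0

for _ in range(10_000):
    if not buyers or not sellers:
        break
    b = rng.choice(buyers)
    s = rng.choice(sellers)
    bid = rng.uniform(0, b)          # buyer: random bid below valuation
    ask = rng.uniform(s, PRICE_CAP)  # seller: random ask above cost
    if bid >= ask:                   # a deal is struck; both leave the market
        realized_surplus += b - s
        buyers.remove(b)
        sellers.remove(s)
        trades += 1

efficiency = realized_surplus / max_surplus
```

Run it a few times with different seeds; efficiency tends to land high, which is Gode and Sunder’s point: the market institution, not trader rationality, is doing most of the work.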

Are there useful measures of “how much rationality” humans have that we can use for aggregate modelling? (As opposed to the minute and detailed ones that Kahneman and Tversky devised, which are hard to scale up.)

TBC

- Henry Farrell, in Skepticism and human reason, argues “Even if human beings are bad at (some forms) of individual reasoning, they may be able to reason quite well collectively.” A criticism of Bright (2022) via Farrell, Mercier, and Schwartzberg (2022).

Akerlof, George A., and Robert J. Shiller. 2015.*Phishing for Phools: The Economics of Manipulation and Deception*. Princeton University Press.

Bright, Liam Kofi. 2022.“Neo-Rationalism.”

Camara, Modibo K. 2021.“Computationally Tractable Choice,” 80.

Easley, David, and Jon Kleinberg. 2010.*Networks, Crowds, and Markets: Reasoning about a Highly Connected World*. New York: Cambridge University Press.

Epstein, Joshua M. 2001.“Learning to Be Thoughtless: Social Norms and Individual Computation.”*Computational Economics* 18: 9–24.

———. 2007.*Generative Social Science: Studies in Agent-Based Computational Modeling*. Princeton Studies in Complexity. Princeton University Press.

Farrell, Henry, Hugo Mercier, and Melissa Schwartzberg. 2022.“Analytical Democratic Theory: A Microfoundational Approach.”*American Political Science Review*, August, 1–6.

Gagen, Michael J, and Kae Nemoto. 2006.“Variational Optimization of Probability Measure Spaces Resolves the Chain Store Paradox.”

Gode, Dhananjay K, and Shyam Sunder. 1993.“Allocative Efficiency of Markets with Zero-Intelligence Traders: Market as a Partial Substitute for Individual Rationality.”*The Journal of Political Economy* 101: 119–37.

———. 1997.“What Makes Markets Allocationally Efficient?”*The Quarterly Journal of Economics* 112: 603–30.

Graham-Tomasi, Theodore, Ford C Runge, and William F Hyde. 1986.“Foresight and Expectations in Models of Natural Resource Markets.”*Land Economics* 62: 234–49.

Henrich, Joseph, Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr, Herbert Gintis, Richard McElreath, et al. 2005.“‘Economic Man’ in Cross-Cultural Perspective: Behavioral Experiments in 15 Small-Scale Societies.”*Behavioral and Brain Sciences* 28: 795.

Holland, John H, and John H Miller. 1991.“Artificial Adaptive Agents in Economic Theory.”*The American Economic Review* 81: 365–70.

Jackson, Matthew O. 2009.“Social Structure, Segregation, and Economic Behavior.”*Presented as the Nancy Schwartz Memorial Lecture*, February.

Kirman, Alan. 2010.“Learning in Agent Based Models.”

Latek, Maciej, Robert Axtell, and Bogumil Kaminski. 2009.“Bounded Rationality via Recursion.” In, 457–64. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

Lo, Andrew W. 2004.“The Adaptive Markets Hypothesis.”*The Journal of Portfolio Management* 30: 15–29.

Lucas, Deborah J, and Robert L McDonald. 1992.“Bank Financing and Investment Decisions with Asymmetric Information about Loan Quality.”*The RAND Journal of Economics* 23 (1): 86–105.

Molavi, Pooya. 2022.“Simple Models and Biased Forecasts.”*arXiv:2202.06921 [Econ]*, February.

North, Douglass C. 1994.“Economic Performance Through Time.”*The American Economic Review* 84: 359–68.

Paich, Mark, and John D Sterman. 1993.“Boom, Bust, and Failures to Learn in Experimental Markets.”*Management Science* 39.

Rauch, James E, and Alessandra Casella. 2001.“Networks and Markets: Concepts for Bridging Disciplines.” In*Networks and Markets*. Russell Sage Foundation Publications.

Rosewell, Bridget, and Paul Ormerod. 2004.“How Much Can Firms Know?”*Computing in Economics and Finance 2004*.

Rubinstein, Ariel. 1997.*Modeling Bounded Rationality*. The MIT Press.

Sah, Raaj K. 1991.“Fallibility in Human Organizations and Political Systems.”*The Journal of Economic Perspectives* 5: 67–88.

Sargent, Thomas J. 1994.*Bounded Rationality in Macroeconomics: The Arne Ryde Memorial Lectures (Clarendon Paperbacks)*. Oxford University Press, USA.

Simon, Herbert A. 1996.*The Sciences of the Artificial*. The MIT Press.

Vanberg, Viktor J. 2004.“The Rationality Postulate in Economics: Its Ambiguity, Its Deficiency and Its Evolutionary Alternative.”*Journal of Economic Methodology* 11: 171–29.

Wilhite, Allen. 2001.“Bilateral Trade and‘Small-World’ Networks.”*Computational Economics* 18: 49–64.

Yanagita, T, and T Onozaki. 2008.“Dynamics of a Market with Heterogeneous Learning Agents.”*Journal of Economic Interaction and Coordination* 3 (1): 107–18.

Young, H Peyton. 1996.“The Economics of Convention.”*The Journal of Economic Perspectives* 10 (2): 105–22.

———. 2005.“The Spread of Innovations Through Social Learning.”

A cousin to neural automata: writing machines to code for us. We might also want to write code to speak for us, which ends up involving similar technology.

GitHub Copilot uses OpenAI Codex to suggest code completions.

**Pro tip.** Use behind a firewall requires the following whitelist exceptions:

- `vscode-auth.github.com`
- `api.github.com`
- `copilot-proxy.githubusercontent.com`

See Networks and VS Code for some more whitelist rules we need.
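For example, a hypothetical allowlist fragment (I am assuming a Squid proxy here; the domains are the ones listed above, but adapt the syntax to whatever your firewall actually is):

```
# squid.conf fragment: permit only the Copilot endpoints listed above
acl copilot_hosts dstdomain vscode-auth.github.com
acl copilot_hosts dstdomain api.github.com
acl copilot_hosts dstdomain copilot-proxy.githubusercontent.com
http_access allow copilot_hosts
```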

I am vaguely concerned about how much of the world is uploading their source code for everything to these code servers. The potential for abuse is huge.

Codeium has been developed by the team at Exafunction to build on the industry-wide momentum on foundational models. We realized that the combination of recent advances in generative models and our world-class optimized deep learning serving software could provide users with top quality AI-based products at the lowest possible costs (or ideally, free!).

Glean is a system for working with facts about source code. It is designed for collecting and storing detailed information about code structure, and providing access to the data to power tools and experiences from online IDE features to offline code analysis.

For example, Glean could answer all the questions you’d expect your IDE to answer, accurately and efficiently on a large-scale codebase. Things like:

- Where is the definition of this method?
- Where are all the callers of this function?
- Who inherits from this class?
- What are all the declarations in this file?

⚠️⚠️⚠️ I now regret using *weaponized* in the title of this notebook, because I think it emphasises the intent of large coherent actors (famously “Russian propaganda”); while that may be important, the general adversarial nature of the mediasphere, in which we are all combatants, seems important even if it has no coherent intent.
Too late now, I guess.
Also this notebook has lost cohesion and turned into a link salad.
It would be more valuable if it analysed and compared some theses; this topic area is one where we need fewer link lists, and some synthesis would actually help.

Memetic information warfare on the social information graph with viral media for the purpose of human behaviour control and steering epistemic communities. The other side to trusted news; hacking the implicit reputation system of social media to suborn factual reporting, or to motivate people to behave to suit your goals, to, e.g., sell uncertainty.

News media and public shared reality. Fake news, incomplete news, alternative facts, strategic inference, general incompetence. Kompromat, agnotology, facebooking to a molecular level. Basic media literacy and whether it helps. As seen in elections, and provocateur twitter bots.

Research in this area is vague for many reasons. It is hard for most of us to do experiments on societies at large, for reasons of ethics and practicality. Also, our tools for causality on social graphs are weak, and the problem is hard. There are some actors (nation states, social media companies) for whom experiments are practical, but they have various reasons for not sharing their results. Still, we can get a long way!

But for now, here is some qualitative journalism from the front lines.

Coscia is always low-key fun: News on Social Media: It’s not Real if I don’t Like it.

Toxic-social-media doco The Social Dilemma was a thing, although not a thing I got around to watching. Was it any good?

Renee DiResta, The Digital Maginot Line (related to insurgence economics), argues that a performative search for *bad* actors is insufficient because it misses the *worst* actors:

Information war combatants have certainly pursued regime change: there is reasonable suspicion that they succeeded in a few cases (Brexit) and clear indications of it in others (Duterte). They’ve targeted corporations and industries. And they’ve certainly gone after mores: social media became the main battleground for the culture wars years ago, and we now describe the unbridgeable gap between two polarized Americas using technological terms like filter bubble.

But ultimately the information war is about territory — just not the geographic kind.

In a warm information war, the human mind is the territory. If you aren’t a combatant, you are the territory. And once a combatant wins over a sufficient number of minds, they have the power to influence culture and society, policy and politics. […] The key problem is this: platforms aren’t incentivized to engage in the profoundly complex arms race against the worst actors when they can simply point to transparency reports showing that they caught a fair number of the mediocre actors.

If correct, this would still leave open the question of low-key, good-faith polarization by sub-standard actors, which no one seems to be tackling.

Epsilon Theory, Gell-Mann Amnesia

Michael Hobbes,The Methods of Moral Panic Journalism

Master List Of Official Russia Claims That Proved To Be Bogus lists some really interesting incidents in Trump-administration-era media reporting that make the entire press corps look pretty bad.

Just 12 People Are Behind Most Vaccine Hoaxes On Social Media, Research Shows

Elkus on information/science/media dynamics in the age of crisis

More Elkus, Twelve Angry Robots, Or Moderation And Its Discontents

The Radicalization Risks of GPT-3 and Neural Language Models

Techcrunch summary of the Facebook testing debacle

[…] every product, brand, politician, charity, and social movement is trying to manipulate your emotions on some level, and they’re running A/B tests to find out how. They all want you to use more, spend more, vote for them, donate money, or sign a petition by making you happy, insecure, optimistic, sad, or angry. There are many tools for discovering how best to manipulate these emotions, including analytics, focus groups, and A/B tests.
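The mechanics here are not exotic. A minimal sketch of the sort of two-proportion test that sits behind an engagement A/B experiment, with invented click counts and the standard normal-approximation formula:

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-test: did variant B move the rate relative to A?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)       # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical engagement counts: 520/10000 clicks for A, 600/10000 for B
z, p = two_proportion_z(520, 10_000, 600, 10_000)
```

Nothing more than this is needed to decide, at scale, which of two emotional framings makes people click more.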

Fun keywords:

- parasocial interaction, the way our monkey minds regard remote celebrities as our intimates.
- …

Sophie Zhang, I saw millions compromise their Facebook accounts to fuel fake engagement, raises an interesting point: people will willingly put their opinions in the hands of engagement farms for a small fee. In this case it is selling their logins, but it is easy to interpolate a continuum from classic old-style shilling for some interest to this new mass-market version.

As Gwern points out, Littlewood’s Law of Media implies that the anecdotes we can recount, in all truthfulness, grow increasingly weird as the population does. In a large enough sample you can find a small number of occurrences to support any hypothesis you would like.

[This] illustrates a version of Littlewood’s Law of Miracles: in a world with ~8 billion people, one which is increasingly networked and mobile and wealthy at that, a one-in-billion event will happen 8 times a month.

Human extremes are not only weirder than we suppose, they are weirder than we can suppose.
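The arithmetic in that quote is worth making explicit. A sketch, assuming (as the quote implicitly does) one independent “event opportunity” per person per month:

```python
from math import exp

population = 8_000_000_000        # ~8 billion people, as in the quote
one_in = 1_000_000_000            # a one-in-a-billion event

# One draw per person per month: expected number of occurrences
expected_per_month = population / one_in   # 8.0 events per month

# Poisson approximation: probability at least one such event occurs this month
p_at_least_one = 1 - exp(-expected_per_month)
```

So a miracle that should essentially never happen to any individual is all but guaranteed to happen to someone, somewhere, every month, and to be shared.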

But let’s, for a moment, assume that people actually have the intent to come to a shared understanding of the facts of reality, writ large and systemic.
Do they even have the skills?
I don’t know, but it *is* hard to work out when you are being fed bullshit, and we don’t do well at teaching that.
There are courses on identifying the lazier type of bullshit:

- Calling bullshit.
- Adi Robertson, How to fight lies, tricks, and chaos online

and even courses on more sophisticated bullshit detection:

Craig Silverman (ed.), Verification Handbook For Disinformation And Media Manipulation.

Will all the billions of humans on earth take such a course? Would they deploy the skills they learned thereby even if they did?

And, given that society is complex and counter-intuitive even if we are doing simple analysis of correlation, how about more complex causation, such as feedback loops? Nicky Case created a diagrammatic account of how “systems journalism” might work.

Welcome! This is an online resource guide for civil society groups looking to better deal with the problem of disinformation. Let us know your concerns and we will suggest resources, curated by civil society practitioners and the Project on Computational Propaganda.

fullfact is a full-time fact-checking agency in the UK, which does fact checks and reports such as Tackling misinformation in an Open Society.

I found the previous organisation via the Data Skeptic Podcast’s Fake News Series.

An amusing portrait of Snopes.

Facebook’s Walled Wonderland Is Inherently Incompatible With News

Unfiltered news doesn’t share well, not at all:

- It can be emotional, but in the worst sense; no one is willing to spread a gruesome account from Mosul among their peers.
- Most likely, unfiltered news will convey a negative aspect of society. Again, another revelation from The Intercept or ProPublica won’t get many clicks.
- Unfiltered news can upset users’ views, beliefs, or opinions.

Tim Harford,The Problem With Facts:

[…] will this sudden focus on facts actually lead to a more informed electorate, better decisions, a renewed respect for the truth? The history of tobacco suggests not. The link between cigarettes and cancer was supported by the world’s leading medical scientists and, in 1964, the US surgeon general himself. The story was covered by well-trained journalists committed to the values of objectivity. Yet the tobacco lobbyists ran rings round them.

In the 1950s and 1960s, journalists had an excuse for their stumbles: the tobacco industry’s tactics were clever, complex and new. First, the industry appeared to engage, promising high-quality research into the issue. The public were assured that the best people were on the case. The second stage was to complicate the question and sow doubt: lung cancer might have any number of causes, after all. And wasn’t lung cancer, not cigarettes, what really mattered? Stage three was to undermine serious research and expertise. Autopsy reports would be dismissed as anecdotal, epidemiological work as merely statistical, and animal studies as irrelevant. Finally came normalisation: the industry would point out that the tobacco-cancer story was stale news. Couldn’t journalists find something new and interesting to say?

[…] In 1995, Robert Proctor, a historian at Stanford University who has studied the tobacco case closely, coined the word “agnotology”. This is the study of how ignorance is deliberately produced; the entire field was started by Proctor’s observation of the tobacco industry. The facts about smoking — indisputable facts, from unquestionable sources — did not carry the day. The indisputable facts were disputed. The unquestionable sources were questioned. Facts, it turns out, are important, but facts are not enough to win this kind of argument.

See Conspiracy mania.

Sea-lioning is a common hack for trolls; there is a whole interesting essay in strategic conversation-derailment strategies. Here is one strategy against it, the FAQ off system for live FAQs. This is one of many dogpiling strategies that are effective online, where economies of scarce attention are important.

How do we evaluate the effects of social media interventions? Of course, with standard survey modelling.

There is some structure to exploit here, e.g. causalimpact and other such time-series causal-inference systems. How about when the data is a mixture of time-series data and one-off results (e.g. polling before an election, and the election itself)?
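For intuition only, here is the bare logic behind such tools (this is not the causalimpact API, just a hand-rolled counterfactual on invented numbers): fit a trend on the pre-intervention period, extrapolate it forward, and read the post-period gap as the estimated effect.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = a + b*t, closed form."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    b = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) \
        / sum((t - t_bar) ** 2 for t in ts)
    a = y_bar - b * t_bar
    return a, b

# Hypothetical daily metric: steady growth, then a jump after an intervention at t=10
pre = [(t, 100 + 2 * t) for t in range(10)]
post = [(t, 100 + 2 * t + 15) for t in range(10, 15)]  # true effect: +15

# Fit the pre-period trend and extrapolate it as the counterfactual
a, b = fit_line([t for t, _ in pre], [y for _, y in pre])
counterfactual = [a + b * t for t, _ in post]

# Average post-period gap between observed and counterfactual
effect = sum(y - c for (_, y), c in zip(post, counterfactual)) / len(post)
```

The real systems add Bayesian structural time series, control series, and uncertainty intervals on top, but the counterfactual logic is the same.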

Getting to the data is fraught:

Facebook’s Illusory Promise of Transparency is currently obstructing the Ad Observatory by NYU Tandon School of Engineering.

Various browser data-harvesting systems exist:

The controversial GPT-x (Radford et al. 2019) family:

GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets.

It takes 5 minutes to download this package and start generating decent fake news; whether you gain anything over the traditional manual method is an open question.
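I will not reproduce GPT-2 here, but the flavour of language-model text generation is visible even in its feeble ancestor, the bigram Markov chain. A self-contained toy, with an invented corpus:

```python
import random

corpus = ("the senator denied the report and the report was fake "
          "and the senator was angry about the fake report").split()

# Build a bigram table: word -> list of observed next words
table = {}
for w1, w2 in zip(corpus, corpus[1:]):
    table.setdefault(w1, []).append(w2)

def generate(start, n_words, seed=0):
    """Sample a chain of words by repeatedly drawing an observed successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_words - 1):
        successors = table.get(out[-1])
        if not successors:   # dead end: no observed continuation
            break
        out.append(rng.choice(successors))
    return " ".join(out)

text = generate("the", 12)
```

GPT-2 replaces the bigram table with a transformer conditioned on a long context, which is why its output is fluent where this toy’s is babble; the sampling loop is recognisably the same.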

The controversial deepcom model (Yang et al. 2019) enables automatic comment generation for your fake news.

Assembling these into a twitter bot farm is left as an exercise for the student.

Tim Starks and Aaron Schaffer summarise Eady et al. (2023): Russian trolls on Twitter had little influence on 2016 voters.

David Gilbert, YouTube’s Algorithm Keeps Suggesting Users Watch Climate Change Misinformation. The methodology here looks, at a glance, shallow, which is not to say the conclusion is implausible.

Craig Silverman, Jane Lytvynenko, William Kung,Disinformation For Hire: How A New Breed Of PR Firms Is Selling Lies Online

One firm promised to “use every tool and take every advantage available in order to change reality according to our client’s wishes.”

Kate Starbird, the surprising nuance behind the Russian troll strategy

Dan O’Sullivan, Inside the RNC Leak:

In what is the largest known data exposure of its kind, UpGuard’s Cyber Risk Team can now confirm that a misconfigured database containing the sensitive personal details of over 198 million American voters was left exposed to the internet by a firm working on behalf of the Republican National Committee (RNC) in their efforts to elect Donald Trump. The data, which was stored in a publicly accessible cloud server owned by Republican data firm Deep Root Analytics, included 1.1 terabytes of entirely unsecured personal information compiled by DRA and at least two other Republican contractors, TargetPoint Consulting, Inc. and Data Trust. In total, the personal information of potentially near all of America’s 200 million registered voters was exposed, including names, dates of birth, home addresses, phone numbers, and voter registration details, as well as data described as “modeled” voter ethnicities and religions. […]

“‘Microtargeting is trying to unravel your political DNA,’ [Gage] said. ‘The more information I have about you, the better.’ The more information [Gage] has, the better he can group people into “target clusters” with names such as ‘Flag and Family Republicans’ or ‘Tax and Terrorism Moderates.’ Once a person is defined, finding the right message from the campaign becomes fairly simple.”

But what’s often overlooked in press coverage is that ISIS doesn’t just have strong, organic support online. It also employs social-media strategies that inflate and control its message. Extremists of all stripes are increasingly using social media to recruit, radicalize and raise funds, and ISIS is one of the most adept practitioners of this approach.

British army creates team of Facebook warriors

The Israel Defence Forces have pioneered state military engagement with social media, with dedicated teams operating since Operation Cast Lead, its war in Gaza in 2008-9. The IDF is active on 30 platforms — including Twitter, Facebook, Youtube and Instagram — in six languages. “It enables us to engage with an audience we otherwise wouldn’t reach,” said an Israeli army spokesman. […] During last summer’s war in Gaza, Operation Protective Edge, the IDF and Hamas’s military wing, the Qassam Brigades, tweeted prolifically, sometimes engaging directly with one another.

Nick Statt,Facebook reportedly ignored its own research showing algorithms divided users:

An internal Facebook report presented to executives in 2018 found that the company was well aware that its product, specifically its recommendation engine, stoked divisiveness and polarization, according to a new report from The Wall Street Journal. “Our algorithms exploit the human brain’s attraction to divisiveness,” one slide from the presentation read. The group found that if this core element of its recommendation engine were left unchecked, it would continue to serve Facebook users “more and more divisive content in an effort to gain user attention & increase time on the platform.” A separate internal report, crafted in 2016, said 64 percent of people who joined an extremist group on Facebook only did so because the company’s algorithm recommended it to them, the *WSJ* reports. Leading the effort to downplay these concerns and shift Facebook’s focus away from polarization has been Joel Kaplan, Facebook’s vice president of global public policy and former chief of staff under President George W. Bush. Kaplan is a controversial figure in part due to his staunch right-wing politics (he supported Supreme Court Justice Brett Kavanaugh throughout his nomination) and his apparent ability to sway CEO Mark Zuckerberg on important policy matters.

Ray Serrator documents the kind of dynamics that we should be aware of here: one false-flag tweet circulated by partisans gets far more exposure as evidence of the vileness of the people it purports to come from than does the belated take-down of that tweet.

The problem with raging against the machine is that the machine has learned to feed off rage

80% of the 22 million comments on net neutrality rollback were fake, investigation finds

Biggest ISPs paid for 8.5 million fake FCC comments opposing net neutrality

ISPs Funded 8.5 Million Fake Comments Opposing Net Neutrality

Joan Donovan, Research Director of Harvard Kennedy School’s Shorenstein Center on Media, Politics and Public Policy, How Civil Society Can Combat Misinformation and Hate Speech Without Making It Worse.

Adam Elkus,Twelve Angry Robots, Or Moderation And Its Discontents

How the Far Right in Italy Is Manipulating Twitter and Discourse

Facebook will change algorithm to demote “borderline content” that almost violates policies

Aistrope, Tim. 2016.“Social Media and Counterterrorism Strategy.”*Australian Journal of International Affairs* 70 (2): 121–38.

Allen, Danielle, Henry Farrell, and Cosma Rohilla Shalizi. 2017.“Evolutionary Theory and Endogenous Institutional Change.”

Crooks, Andrew. n.d.“Bot Stamina: Examining the Influence and Staying Power of Bots in Online Social Networks.”

Arif, Ahmer, Leo Graiden Stewart, and Kate Starbird. 2018.“Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse.”*Proc. ACM Hum.-Comput. Interact.* 2 (CSCW): 20:1–27.

Banerjee, Abhijit, Arun G Chandrasekhar, Esther Duflo, and Matthew O Jackson. 2019.“Using Gossips to Spread Information: Theory and Evidence from Two Randomized Controlled Trials.”*The Review of Economic Studies* 86 (6): 2453–90.

Bay, Morten. 2018.“Weaponizing the Haters: The Last Jedi and the Strategic Politicization of Pop Culture Through Social Media Manipulation.”*First Monday* 23 (11).

Behr, Ines Von, Anais Reding, Charlie Edwards, and Luke Gribbon. 2013.“Radicalisation in the Digital Era: The Use of the Internet in 15 Cases of Terrorism and Extremism.”

Benkler, Yochai, Rob Faris, and Harold Roberts. 2018.*Network propaganda: manipulation, disinformation, and radicalization in American politics*. New York, NY: Oxford University Press.

Beskow, David. 2020.“Finding and Characterizing Information Warfare Campaigns.” Carnegie Mellon University.

Bessi, Alessandro. 2016.“On the Statistical Properties of Viral Misinformation in Online Social Media.”*arXiv:1609.09435 [Physics, Stat]*, September.

Bradshaw, S., and P. Howard. 2017.“Troops, Trolls and Troublemakers: A Global Inventory of Organized Social Media Manipulation” 2017.12.

Brito, Kellyton, Natalia Paula, Manoel Fernandes, and Silvio Meira. 2019.“Social Media and Presidential Campaigns – Preliminary Results of the 2018 Brazilian Presidential Election.” In*Proceedings of the 20th Annual International Conference on Digital Government Research*, 332–41. Dg.o 2019. New York, NY, USA: ACM.

Broniatowski, David A., Amelia M. Jamison, SiHua Qi, Lulwah AlKulaib, Tao Chen, Adrian Benton, Sandra C. Quinn, and Mark Dredze. 2018.“Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate.”*American Journal of Public Health* 108 (10): 1378–84.

Bursztyn, Victor S, and Larry Birnbaum. 2019.“Thousands of Small, Constant Rallies: A Large-Scale Analysis of Partisan WhatsApp Groups,” 6.

Cadwalladr, Carole. 2017.“The Great British Brexit Robbery: How Our Democracy Was Hijacked.”*The Guardian*, May 7, 2017, sec. Technology.

Callaway, Duncan S., M. E. J. Newman, Steven H. Strogatz, and Duncan J. Watts. 2000.“Network Robustness and Fragility: Percolation on Random Graphs.”*Physical Review Letters* 85 (25): 5468–71.

Campbell, David E. 2013.“Social Networks and Political Participation.”*Annual Review of Political Science* 16 (1): 33–48.

Coscia, Michele. 2017.“Popularity Spikes Hurt Future Chances for Viral Propagation of Protomemes.”*Communications of the ACM* 61 (1): 70–77.

Dittmar, Jeremiah E., and Skipper Seabold. 2015.“Media, Markets, and Radical Ideas: Evidence from the Protestant Reformation.”*Centre for Economic Performance Working Paper*.

Dodds, Peter Sheridan. 2017.“Slightly Generalized Generalized Contagion: Unifying Simple Models of Biological and Social Spreading.”*arXiv:1708.09697 [Physics]*, August.

Eady, Gregory, Tom Paskhalis, Jan Zilinsky, Richard Bonneau, Jonathan Nagler, and Joshua A. Tucker. 2023.“Exposure to the Russian Internet Research Agency Foreign Influence Campaign on Twitter in the 2016 US Election and Its Relationship to Attitudes and Voting Behavior.”*Nature Communications* 14 (1): 62.

Evans, David S. 2017.“The Economics of Attention Markets.” SSRN Scholarly Paper ID 3044858. Rochester, NY: Social Science Research Network.

Farrell, Henry. n.d.“Analysis | Blame Fox, Not Facebook, for Fake News.”*Washington Post*, sec. Monkey Cage.

Farrell, Henry, and Bruce Schneier. 2018.“Common-Knowledge Attacks on Democracy.” SSRN Scholarly Paper ID 3273111. Rochester, NY: Social Science Research Network.

Farrell, Henry, and Cosma Shalizi. 2012.“Cognitive Democracy.”*Crooked Timber* 23.

———. n.d.a.“An Outline of Cognitive Democracy,” 44.

———. n.d.b.“Evolutionary Theory and the Dynamics of Institutional Change,” 41.

Farrell, Henry, and Cosma Rohilla Shalizi. 2021.“9 Pursuing Cognitive Democracy.” In*9 Pursuing Cognitive Democracy*, 209–31. University of Chicago Press.

Freelon, Deen, Alice Marwick, and Daniel Kreiss. 2020.“False Equivalencies: Online Activism from Left to Right.”*Science* 369 (6508): 1197–1201.

Gibney, Elizabeth. 2018.“The Scant Science Behind Cambridge Analytica’s Controversial Marketing Techniques.”*Nature*, March.

Goel, Sharad, Ashton Anderson, Jake Hofman, and Duncan J. Watts. 2015.“The Structural Virality of Online Diffusion.”*Management Science*, July, 150722112809007.

Goel, Sharad, Jake M. Hofman, Sébastien Lahaie, David M. Pennock, and Duncan J. Watts. 2010.“Predicting Consumer Behavior with Web Search.”*Proceedings of the National Academy of Sciences* 107 (41): 17486–90.

Goel, Sharad, Winter Mason, and Duncan J. Watts. 2010.“Real and Perceived Attitude Agreement in Social Networks.”*Journal of Personality and Social Psychology* 99 (4): 611–21.

Goel, Sharad, Duncan J. Watts, and Daniel G. Goldstein. 2012.“The Structure of Online Diffusion Networks.” In*Proceedings of the 13th ACM Conference on Electronic Commerce - EC ’12*, 623. Valencia, Spain: ACM Press.

Gonzalez-Bailon, Sandra. 2009.“Opening the Black Box of Link Formation: Social Factors Underlying the Structure of the Web.”*Social Networks* 31 (4): 271–80.

Granovetter, Mark. 1983.“The Strength of Weak Ties: A Network Theory Revisited.”*Sociological Theory* 1 (1): 201–33.

Granovetter, Mark S. 1973.“The Strength of Weak Ties.”*The American Journal of Sociology* 78 (6): 1360–80.

Grossman, Gene, and Elhanan Helpman. 2019.“Electoral Competition with Fake News.” w26409. Cambridge, MA: National Bureau of Economic Research.

HAmid, Nafees, and Cristina Ariza. n.d.“Offline Versus Online Radicalisation: Which Is the Bigger Threat?”

Harwell, Drew. 2021.“Lonely, Angry, Eager to Make History: Online Mobs Are Likely to Remain a Dangerous Reality.”*Washington Post*, February 17, 2021.

Hassan, Ghayda, Sébastien Brouillette-Alarie, Séraphin Alava, Divina Frau-Meigs, Lysiane Lavoie, Arber Fetiu, Wynnpaul Varela, et al. 2018.“Exposure to Extremist Online Content Could Lead to Violent Radicalization:A Systematic Review of Empirical Evidence.”*International Journal of Developmental Science* 12 (1-2): 71–88.

Hawkins, Stephen, Daniel Yudkin, Miriam Juan-Torres, and Tim Dixon. 2019.“Hidden Tribes: A Study of America’s Polarized Landscape.” Preprint. PsyArXiv.

Howard, Philip N., and Bence Kollanyi. 2016.“Bots, #StrongerIn, and #Brexit: Computational Propaganda During the UK-EU Referendum.”*Browser Download This Paper*.

Hurd, T. R., and James P. Gleeson. 2012.“On Watts’ Cascade Model with Random Link Weights.”*arXiv:1211.5708 [Cond-Mat, Physics:physics]*, November.

Imhoff, Roland, and Martin Bruder. 2014.“Speaking (Un-)Truth to Power: Conspiracy Mentality as a Generalised Political Attitude.”*European Journal of Personality* 28 (1): 25–43.

Jackson, Matthew O., Suraj Malladi, and David McAdams. 2019.“Learning Through the Grapevine: The Impact of Noise and the Breadth and Depth of Social Networks.” SSRN Scholarly Paper ID 3269543. Rochester, NY: Social Science Research Network.

Jakesch, Maurice, Kiran Garimella, Dean Eckles, and Mor Naaman. 2021.“#Trend Alert: How a Cross-Platform Organization Manipulated Twitter Trends in the Indian General Election.”*arXiv:2104.13259 [Cs]*, April.

Johnson, Hollyn M., and Colleen M. Seifert. 1994.“Sources of the Continued Influence Effect: When Misinformation in Memory Affects Later Inferences.”*Learning, Memory* 20 (6): 1420–36.

Johnson, N. F., N. Velasquez, N. Johnson Restrepo, R. Leahy, R. Sear, N. Gabriel, H. Larson, and Y. Lupu. 2021.“Mainstreaming of Conspiracy Theories and Misinformation,” February.

Kellow, Christine L., and H. Leslie Steeves. 1998.“The Role of Radio in the Rwandan Genocide.”*Journal of Communication* 48 (3): 107–28.

Kim, Yonghwan. 2015.“Does Disagreement Mitigate Polarization? How Selective Exposure and Disagreement Affect Political Polarization.”*Journalism & Mass Communication Quarterly* 92 (4): 915–37.

Klausen, Jytte. 2015.“Tweeting the Jihad: Social Media Networks of Western Foreign Fighters in Syria and Iraq.”*Studies in Conflict & Terrorism* 38 (1): 1–22.

Kreps, Sarah. 2020.*Social Media and International Relations*. 1st ed. Cambridge University Press.

LaFrance, Adrienne. 2020.“The Prophecies of Q.”*The Atlantic*, June 2020.

Larson, Heidi J. 2018.“The Biggest Pandemic Risk? Viral Misinformation.”*Nature* 562 (October): 309–9.

Levy, Gilat, and Ronny Razin. 2019.“Echo Chambers and Their Effects on Economic and Political Outcomes.”*Annual Review of Economics* 11 (1): 303–28.

Lewis, Rebecca. n.d.“Broadcasting the Reactionary Right on YouTube,” 61.

Lin, Herbert, and Jaclyn Kerr. 2019.“On Cyber-Enabled Information Warfare and Information Operations.” SSRN Scholarly Paper ID 3015680. Rochester, NY: Social Science Research Network.

Machado, Caio, Beatriz Kira, Vidya Narayanan, Bence Kollanyi, and Philip Howard. 2019.“A Study of Misinformation in WhatsApp Groups with a Focus on the Brazilian Presidential Elections.” In*Companion Proceedings of The 2019 World Wide Web Conference*, 1013–19. WWW ’19. New York, NY, USA: ACM.

Mahmoodi, Ali, Dan Bang, Karsten Olsen, Yuanyuan Aimee Zhao, Zhenhao Shi, Kristina Broberg, Shervin Safavi, et al. 2015.“Equality Bias Impairs Collective Decision-Making Across Cultures.”*Proceedings of the National Academy of Sciences* 112 (12): 3835–40.

Martin, Gregory J., and Ali Yurukoglu. 2017.“Bias in Cable News: Persuasion and Polarization.”*American Economic Review* 107 (9): 2565–99.

Marwick, Alice, and Rebecca Lewis. 2017.“Media Manipulation and Disinformation Online.” Data & Society Research Institute.

Munn, Luke. 2019.“Alt-Right Pipeline: Individual Journeys to Extremism Online.”*First Monday* 24 (6).

Nyhan, Brendan. 2021.“Why the Backfire Effect Does Not Explain the Durability of Political Misperceptions.”*Proceedings of the National Academy of Sciences* 118 (15).

O’connor, Cailin, and James Owen Weatherall. 2019.*The Misinformation Age: How False Beliefs Spread*. 1 edition. New Haven: Yale University Press.

Oliver, Eric, and Tom Wood. 2014.“Larger Than Life.”*New Scientist* 224 (3000): 36–37.

Oliver, J. Eric, and Thomas J. Wood. 2014.“Conspiracy Theories and the Paranoid Style(s) of Mass Opinion.”*American Journal of Political Science* 58 (4): 952–66.

Osborne, Jonathan. 2022.“Science Education in an Age of Misinformation.”

Ottman, Bill, Daryl Davis, Jack Ottman, Jesse Morton, Justin E Lane, and F LeRon Shults. 2022.“The Censorship Effect,” 85.

Parker, Priya. 2018.*The Art of Gathering: How We Meet and Why It Matters*. International edition. New York: Riverhead Books.

Powell, Derek, and Kara Weisman. 2018.“Articulating Lay Theories Through Graphical Models: A Study of Beliefs Surrounding Vaccination Decisions,” February.

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019.“Language Models Are Unsupervised Multitask Learners,” 24.

Redlawsk, David P., Andrew J. W. Civettini, and Karen M. Emmerson. 2010.“The Affective Tipping Point: Do Motivated Reasoners Ever‘Get It’?”*Political Psychology* 31 (4): 563–93.

Reyna, Valerie F. 2021.“A Scientific Theory of Gist Communication and Misinformation Resistance, with Implications for Health, Education, and Policy.”*Proceedings of the National Academy of Sciences* 118 (15).

Ribeiro, Manoel Horta, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira. 2019.“Auditing Radicalization Pathways on YouTube.”*arXiv:1908.08313 [Cs]*, August.

Richardson, Bailey, Kevin Huynh, and Kai Elmer Sotto. 2019.*Get together: how to build a community with your people*.

Rieder, Bernhard, Ariadna Matamoros-Fernández, and Òscar Coromina. 2018.“From Ranking Algorithms to‘Ranking Cultures’: Investigating the Modulation of Visibility in YouTube Search Results.”*Convergence* 24 (1): 50–68.

Rogall, Thorsten. 2014.“Mobilizing the Masses for Genocide.”

Roose, Kevin. 2020.“Get Ready for a Vaccine Information War.”*The New York Times*, May 13, 2020, sec. Technology.

———. 2021.“Inside Facebook’s Data Wars.”*The New York Times*, July 14, 2021, sec. Technology.

Roozenbeek, Jon, and Sander van der Linden. 2019.“Fake News Game Confers Psychological Resistance Against Online Misinformation.”*Palgrave Communications* 5 (1): 1–10.

Salamanos, Nikos, Michael J. Jensen, Costas Iordanou, and Michael Sirivianos. 2020.“Did State-Sponsored Trolls Shape the 2016 US Presidential Election Discourse? Quantifying Influence on Twitter,” June.

Salganik, Matthew J., Peter Sheridan Dodds, and Duncan J. Watts. 2006.“Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.”*Science* 311 (5762): 854–56.

Salganik, Matthew J., and Duncan J. Watts. 2008.“Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.”*Social Psychology Quarterly* 74 (4): 338.

Schuchard, Ross, Andrew T. Crooks, Anthony Stefanidis, and Arie Croitoru. 2019.“Bot Stamina: Examining the Influence and Staying Power of Bots in Online Social Networks.”*Applied Network Science* 4 (1): 1–23.

Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. 2015.“Estimating the Causal Impact of Recommendation Systems from Observational Data.”*Proceedings of the Sixteenth ACM Conference on Economics and Computation - EC ’15*, 453–70.

Staines, Cassie, and Will Moy. 2018.“Tackling Misinformation in an Open Society.”

Starbird, Kate. 2017.“Examining the Alternative Media Ecosystem Through the Production of Alternative Narratives of Mass Shooting Events on Twitter.” In*Eleventh International AAAI Conference on Web and Social Media*.

———. 2019.“Disinformation’s Spread: Bots, Trolls and All of Us.”*Nature* 571 (July): 449.

Stewart, Leo G, Ahmer Arif, and Kate Starbird. 2018.“Examining Trolls and Polarization with a Retweet Network,” 6.

Stewart, Leo Graiden, Ahmer Arif, A. Conrad Nied, Emma S. Spiro, and Kate Starbird. 2017.“Drawing the Lines of Contention: Networked Frame Contests Within #BlackLivesMatter Discourse.”*Proceedings of the ACM on Human-Computer Interaction* 1 (CSCW): 1–23.

Taylor, Emily, and Stacie Hoffmann. n.d.“Industry Responses to Computational Propaganda and Social Media Manipulation.” Oxford Information Labs.

Tufekci, Zeynep. 2014.“Engineering the Public: Big Data, Surveillance and Computational Politics.”*First Monday*, July.

Uscinski, Joseph E., and Matthew Atkinson. 2013.“Why Do People Believe in Conspiracy Theories? The Role of Informational Cues and Predispositions.” SSRN Scholarly Paper ID 2268782. Rochester, NY: Social Science Research Network.

Verwimp, Philip. 2005.“An Economic Profile of Peasant Perpetrators of Genocide: Micro-Level Evidence from Rwanda.”*Journal of Development Economics* 77 (2): 297–323.

Vosoughi, Soroush, Deb Roy, and Sinan Aral. 2018.“The Spread of True and False News Online.”*Science* 359 (6380): 1146–51.

Watts, Duncan J. 2014.“Common Sense and Sociological Explanations.”*American Journal of Sociology* 120 (2): 313–51.

Watts, Duncan J., and Peter Sheridan Dodds. 2007.“Influentials, Networks, and Public Opinion Formation.”*Journal of Consumer Research* 34 (4): 441–58.

Watts, Duncan J, and Steven H Strogatz. 1998.“Collective Dynamics of‘Small-World’ Networks.”*Nature* 393 (6684): 440–42.

Wilson, Tom, Kaitlyn Zhou, and Kate Starbird. 2018.“Assembling Strategic Narratives: Information Operations As Collaborative Work Within an Online Community.”*Proc. ACM Hum.-Comput. Interact.* 2 (CSCW): 183:1–26.

Winter, Aaron. 2019.“Online Hate: From the Far-Right to the‘Alt-Right’ and from the Margins to the Mainstream.” In*Online Othering: Exploring Digital Violence and Discrimination on the Web*, edited by Karen Lumsden and Emily Harmer, 39–63. Palgrave Studies in Cybercrime and Cybersecurity. Cham: Springer International Publishing.

Yang, Ze, Can Xu, Wei Wu, and Zhoujun Li. 2019.“Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation.”*arXiv:1909.11974 [Cs]*, October.

My mentioning and citing material to do with the recreational or medical use of controlled substances should obviously not be taken as an endorsement of recreational use.
I am not qualified to make such endorsements, even if I were inclined to.
Indeed, if you are turning to a mathematician for intelligence on legal or pharmacological questions, you should probably inspect your life decisions to see if any *other* ones you have made recently are similarly suspect.

## Comments, TODOs

I quite like this one.