An infuriating quagmire. Parsing command-line arguments in Python is just hard enough to cause friction, but not so hard that enough developers are dissuaded from attempting to reinvent it. All the various solutions claim to be beautiful, and/or seamless, and/or elegant, which individually they may be. Collectively they feel like aggressive hawkers shouting in your face and at each other, wasting your time by reducing the probability that any two projects have the same dependencies.

The best CLI library is … any of the three CLI libraries your project already uses. Do not add an additional one.

OK, since I contribute to more than two Python projects with CLIs, I have more than three CLI systems that I need to deal with. Below is a spotter’s guide.

**tl;dr**: My use case involves configuring lots of ML, so wherever feasible I use Hydra, which does that very well, supports every feature I need, and will generate command-line parsers as a side effect.

`argparse`

is built into the Python stdlib and is adequate, so why not just use that and avoid other dependencies?
Answer: a dependency you already have is likely to have introduced an additional CLI parsing library anyway.
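For reference, a minimal `argparse` sketch; the script name and options here are invented for illustration:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical experiment-runner CLI; all names are illustrative.
    parser = argparse.ArgumentParser(
        prog="toyexp", description="Run a toy experiment."
    )
    parser.add_argument("config", help="path to a config file")
    parser.add_argument("--seed", type=int, default=0, help="RNG seed")
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="chatty output"
    )
    return parser


# Parse an explicit argv list; a real script would call parse_args()
# with no arguments and let argparse read sys.argv.
args = build_parser().parse_args(["conf.yaml", "--seed", "42", "-v"])
print(args)  # → Namespace(config='conf.yaml', seed=42, verbose=True)
```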

Cute hack: chriskiehl/Gooey will construct a GUI for programs by examining their `argparse` code.

However, this is not flexible enough for the kind of stuff that I frequently need to do (complicated nested options…)

“Hydra is a framework for elegantly configuring complex applications.” As a special case it builds CLIs with autocomplete and other fun stuff. Because it can do so many things of use to an ML researcher, within a very simple paradigm, this tool is all I need for experiment and workflow invocation. Because that is what I mostly do, that is now my main tool. See my hydra notes.
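For flavour: Hydra inverts the usual pattern, in that the source of truth is a YAML config tree and the CLI falls out for free. A hypothetical sketch (file name and keys invented):

```yaml
# conf/config.yaml — hypothetical experiment config
model:
  name: mlp
  hidden: 128
optimizer:
  lr: 0.001
```

An entry point decorated with `@hydra.main` then accepts dotted overrides on the command line, e.g. `python train.py optimizer.lr=0.0003 model.hidden=256`, plus parameter sweeps via `--multirun`.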

docopt/docopt: Pythonic command line arguments parser, that will make you smile

docopt helps you:

- define the interface for your command-line app, and
- automatically generate a parser for it.

docopt is based on conventions that have been used for decades in help messages and man pages for describing a program’s interface. An interface description in docopt is such a help message, but formalized.
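The flavour of such a formalized help message, with an invented program name (this string, used as the module docstring, *is* the parser definition):

```
Usage:
  toyship new <name>...
  toyship move <name> --speed=<kn>
  toyship (-h | --help)

Options:
  -h --help     Show this screen.
  --speed=<kn>  Speed in knots [default: 10].
```

Feeding this docstring to `docopt.docopt()` returns a plain dict of parsed argument values.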

Bonus feature: it is implemented in many languages, so the same command-line description will also generate parsers in C++, Julia, etc.

FFS, there is another hip one that uses even more shiny features, yay.
This one has unusually compact syntax, since it uses type hints in arguments to sort it out, which, I get it, is nice.
Is it nice enough to throw everything out and start again with a new, incompatible system?
Why yes, it *is* nice enough for at least a few developers, and now *you* need to know about it or you will be less cool than them.

Typer is internally based upon the next one, Click, so *maybe* it doesn’t count as a new dependency to add to your project if you already use Click.
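A minimal Typer sketch (names invented, and assuming Typer is installed) showing the type-hint-driven style, driven here through Typer’s test runner rather than a console entry point:

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def hello(name: str, count: int = 1) -> None:
    """Greet NAME, COUNT times; the type hints define the CLI."""
    for _ in range(count):
        typer.echo(f"Hello, {name}!")


# In a real package you would call app() from a console_scripts
# entry point; here we exercise it via the test runner instead.
result = CliRunner().invoke(app, ["Ada", "--count", "2"])
print(result.output, end="")
```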

Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It’s the “Command Line Interface Creation Kit”. It’s highly configurable but comes with sensible defaults out of the box. […]

- arbitrary nesting of commands
- automatic help page generation
- supports lazy loading of subcommands at runtime

Aims to offer an alternative to the built-in argparse, which the authors regard as excessively magical. Its special feature is setuptools integration, enabling installation of command-line tools from your current virtualenv.
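A minimal Click sketch (command and option names invented) showing the composable nested-command style; here it is driven through Click’s own test runner rather than a console entry point:

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Toy CLI demonstrating Click's nested-command style."""


@cli.command()
@click.option("--count", default=1, show_default=True,
              help="Number of greetings.")
@click.argument("name")
def hello(count, name):
    """Greet NAME, COUNT times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")


# In a real package you would register `cli` as a console_scripts
# entry point; here we drive it with Click's test runner instead.
result = CliRunner().invoke(cli, ["hello", "--count", "2", "Ada"])
print(result.output, end="")
```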

Google framework abseil has a Python CLI system whose selling points are that it

- Works across Google ML apps
- integrates C++ arguments somehow? (Build arguments? Run-time? Someone who cares can answer that)
- allows distributed definition of arguments rather than centralized, and
- has some logging and testing features bolted together into the same library.

Actual value proposition: because AFAICT abseil is a dependency of JAX and TensorFlow, if you do machine learning it comes pre-installed with all the examples. Thus you may as well keep it when copy-pasting Google sample code.
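The distributed-definition point looks roughly like this (flag names invented, and assuming `absl-py` is installed): any imported module can register flags on the shared global `FLAGS` object, which `absl.app.run(main)` would normally parse from `sys.argv` at startup. Here we parse an explicit argv list instead:

```python
from absl import flags

# Any module in the program can define flags; they all accumulate
# on the shared global FLAGS registry.
flags.DEFINE_integer("seed", 0, "RNG seed (hypothetical example flag).")
flags.DEFINE_string("run_name", "debug", "Label for this run.")

FLAGS = flags.FLAGS

# absl.app.run(main) would normally do this parse from sys.argv;
# here we feed an explicit argv (argv[0] is the program name).
FLAGS(["prog", "--seed=42", "--run_name=exp1"])
print(FLAGS.seed, FLAGS.run_name)
```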

On the other hand, pretty much every machine learning framework has an equivalent command-line whatsit, so you will probably end up copy-pasting some *other* stuff from somewhere else which doesn’t work with abseil, so maybe you could just ditch this?
It is not amazing.

Invoke provides a clean, high level API for running shell commands and defining/organizing task functions from a tasks.py file […] it offers advanced features as well — namespacing, task aliasing, before/after hooks, parallel execution and more.

argh was/is a popular extension to argparse

Argh is fully compatible with argparse. You can mix Argh-agnostic and Argh-aware code. Just keep in mind that the dispatcher does some extra work that a custom dispatcher may not do.

clip.py comes with a passive-aggressive app name (+1) and is all about wrapping generic Python commands in command-line applications easily, much like `click`. But different.

Remember when the internet was full of delightful things? It still is; it is merely hard to notice between the infowars and operant-conditioned Instagram envy-doomscrolls. There are people lovingly surfacing delightful things far from the active fronts of infowar. So sooooothing.

If I browsed things recreationally, I would consume the product painstakingly refined by the following delight-miners:

If you had to give a rough overview of what you cover, what would it say?

FINE. Roughly, I’d say something along the lines of ‘digital arts, online culture, webdesign and creativity, philosophy, economics, sex, art, death, drugs, music, animation, literary fiction, comedy, nihilism, advertising, marketing, pornography, rights, AI, identity, PR, and the crippling horror of being made of meat’.

Andy Baio’s Waxy.org

Frequent topics include internet culture, copyright and fair use, online community, independent and experimental media, and the intersection of art and technology.

The Kid Should See This is a Webby Award-winning collection of 5,000+ kid-friendly videos, curated for teachers and parents who want to share smarter, more meaningful media in the classroom and at home.

Founded in 1998, kottke.org is one of the oldest blogs on the web. It’s written and produced by Jason Kottke and covers the essential people, inventions, performances, and ideas that increase the collective adjacent possible of humanity. Frequent topics of interest among the 26,000+ posts include art, technology, science, visual culture, design, music, cities, food, architecture, sports, endless nonsense, and carefully curated current events, all of it lightly contextualized.

Called the “Tate Modern of the Internet,” Colossal is an international platform for contemporary art and visual expression that explores a vast range of creative disciplines.

Founded in 2011, The Public Domain Review is an online journal and not-for-profit project dedicated to the exploration of curious and compelling works from the history of art, literature, and ideas.

See also other free content sources.

Hello, kith. My name is Maria Popova. I am a reader and writer, and I write about what I read here on Brain Pickings — my one-woman labor of love exploring what it means to live a decent, substantive, rewarding life. Founded in 2006 as a weekly email to seven friends, eventually brought online and now included in the Library of Congress permanent web archive, it is a record of my own becoming as a person — intellectually, creatively, spiritually, poetically — drawn from my extended marginalia on the search for meaning across literature, science, art, philosophy, and the various other tentacles of human thought and feeling.

Boing Boing, are they still good? More polemical than some here, probably

Nadia Eghbal, Friend groups:

Kevin Kwok has a schtick about how friend groups underinvest in themselves. A group of people with high mutual trust, shared interests, and low coordination costs have already solved for many of the issues that otherwise prevent humans from accomplishing great things.

I’ve been thinking about whether friendships could be described as having a purpose beyond personal fulfillment, and I think it shares some parallels with dating. One of modern dating’s great albatrosses is the hedonistic treadmill: the Tinder-induced “Welcome to Hell” of meeting, connecting, dating, and ghosting, repeated ad infinitum. Dating around is a perfectly good way to pass the weeks and months and years, but at some point, if you’re not committing to something—building towards a “we” that exists outside of yourself—the initial thrill of intimacy begins to take on a saccharine, artificial quality.

Similarly, there’s a version of modern friendship that feels like dating around, especially because it’s extremely easy to meet new people these days. Maybe it’s getting coffee with interesting people from Twitter, or grabbing a drink with old acquaintances to catch up and trade “life updates” every month or quarter. It feels good to meet people, swap ideas, connect on some superficial level, and perhaps even be a small part of one another’s lives, but if you never really settle down and commit to those friendships, you’re just collecting names. Over time, those names get replaced with other names, and the cycle begins anew.…

“Commitment” in the context of friendship usually refers to things like keeping your word or showing unconditional support. But I think there could also be an implied commitment not just towards each other as individuals, but to a shared sense of self. To try on a more normative version of friendship, perhaps we could say:

“A group of friends who enjoy each others’ company ought to build something together.”

C&C this ULTRA WHOLESOME MEME VERSION.

Recreationally staring into the abyss. I do enjoy a bit of horror fiction, especially with a political inflection.

I am interested in examining some trends in recent horror writers (who I will call horrorists to save a syllable). But rather than getting to the core, here is a lightly decorated listicle of trends and thoughts I may get around to packaging up into a real essay later.

Some of the writers on this list share a red-pill motif,
in the sense of the feeling of horror arising from *seeing painful truths that the snowflakes can’t handle.*
And the painful truth is that the horror is *other people*.
Probably, in fact, other people who *look different* than the imagined audience.
Maybe black people, if you believe you are talking to white people.
Or women.
Quasi-biological-class-determinist Houellebecq and Deep-State-is-Cthulhu-Actually Zero HP Lovecraft could serve as waypoints along a red-piller’s path to the endimming.
Indeed, a dark enlightenment progenitor, Nick Land, a.k.a. The Whiteness Out Of Space, is a horror writer.

A recurring theme (Houellebecq, Tacos, Zero HP Lovecraft) is pluralistic ignorance,
i.e. the real horror is that we are all surviving in a vicious social norm that we individually secretly hate but publicly endorse for fear of censure.
In this Emperor’s-new-clothes variant of the Cassandra myth, the protagonists can see the truth but cannot say it for fear of being pilloried by other Cassandras.
The Cassandras are locked in a dilemma of collective action.
The author, in these cases, is the *real* Cassandra who dares to speak the forbidden truths but will not be believed by society at large.
This particular horror fails to land for me personally, because it does not tally with any experience I have had.
Maybe I am that one arsehat who is happy with the social norm, and by enforcing it am ruining it for the silent masses?

The kind of awful inescapable horror I more often fear is the moral maze, where we can all see and acknowledge that the situation is not ideal but our local incentives mean that we nonetheless cannot escape.
In stylised examples, if you talk about things in a pluralistic-ignorance kind of horror, the imagined risk is condemnation by the other patsies.
“Maybe this thing that most people think is bad is *actually* bad,” you say, and all the other bad-thing-haters punish you for stepping out of line by speaking the truth. Probably on Twitter.
In a moral-maze-kind, the horror is more that we all know we are in it together but are still doomed.
“Oh no, would it not be great if we were not all part of the problem?” you ask, “Yes, but how can we get there? My kids have to eat,” your colleagues answer, then your kids end up stabbing each other in the streets over food.
This is the kind of horror that, e.g. Peter Watts writes, where we are all turned into monsters by going down a path where at every step we chose the least monstrous option.
Which type of horror lands for you depends possibly upon experiences, attachment styles, paranoia etc.

The incomprehensible dread of the mediated digital world is another thread through these. Consider the conspiracy mania sweeping the internet, which is taking an occult turn. Or the uncanniness of the internet itself. Margot Harrison:

That old aesthetic term for creeping dread, famously dissected by Freud, is typically now applied to disturbing specimens of digital animation said to reside in the “uncanny valley.”

[…] The dead live on in their videos and social media feeds. Thanks to targeted advertising, a pair of boots we put in our cart months ago stalks us at every turn. The notion that a single utterance can turn a random citizen into an influencer might have sounded to Freud like magical thinking. We see it happen every day.

Or consider the uncanny motif of the double, which has inspired writers of dread from Dostoyevsky toTana French. The fear of having our identity appropriated by a look-alike doesn’t seem atavistic in the era of catfishing and deepfakes. We all lead parallel lives in which presence is absence and reality is malleable.

But I think we are aware of more uncanny and dangerous systems than the merely digital. Economist Bryan Caplan is briefly and provocatively a horrorist in this excellent metaphor about minimum wage:

I honestly wonder if we’d have better luck explaining economics if we used the metaphor of a terrifying and incomprehensible alien deity that is kept barely contained by a complicated and humanly meaningless ritual, and that if somebody upsets the ritual prices then It will break loose and all the electrical plants will simultaneously catch fire. Because that probably *is* the closest translation of the math we believe into a native human ontology. Want to help the bottom 30%? Don’t scribble over the mad inscriptions that are closest to them, trying to prettify the blood-drawn curves. Mess with any other numbers than those, move money around in any other way than that, because It is standing very near to them already.

I love this explanation. There is something deep going on in it. Quibbles: economics is not the only field that can own the eldritch horror of its subject matter; probably every social science can. And marine biology. Also I disagree with his characterisation of why one might support minimum wages (tl;dr: electoral pragmatics rather than economic optimality), but that is an essay under a different heading.

Noah Smith, Lovecraftian intelligence (Noahpinion):

Lovecraftian intelligences are so alien that we’ll never really be able to resolve the argument over whether they’re truly intelligent or not. But unlike other alien intelligences that display their incomprehensibility at all times, Lovecraftian intelligences trick us most of the time—they seem human right up until they don’t. And this is an inherent property of the way they’re created—in the case of Lovecraft’s horrors, they’re the product of a fantasy author’s twisted mind, and in the case of AIs they’re tools that we create to do our bidding. They are a product of humanity, and this is what allows them to be so creepy to humans.

There are diverse abysses to gaze into. Gaze on. There is also a goodreads list. See also Nightmare Fuel.

Something like scpwiki makes a horror fan-fiction of reality. See, e.g., infinite IKEA.

Ric Royer, The Horror is that There is No Horror:

Beyond the happy endings, commercialization, reduced violence, and cops tying a bow on it all, horror films that exploit our era of extreme partisanship tend to be a low-risk, conservative venture. As horror legend John Carpenter notes, these narratives are not only easy stories to tell, but easy to sell: “The easiest story that people will buy … is that evil is out there, beyond the woods, beyond the dark, the evil is the other … because the audience is always going to respond to the ‘other.’ It unifies us as a tribe.”

Yokay, here are some recommendations.

Everything this former marine biologist touches turns into posthuman ooze.
(“What if the real grey goo was inside your skull all along?”)
Gritty depictions of a future in which humans are cockroaches in the diverse biome of posthuman life, which eradicates them in incomprehensible ways.
Recommended: the *Firefall* series, in which the superior post-humans that rule the future rarely stop to explain to humans their fears about the even more superior aliens that pop by.

The Laundry Files is a series gradually sliding from public-service/dotcom office snark into dark tech Sleeper-in-the-Pyramid weirdness. The density of the puns decreases as their magnitude increases throughout the series.

The Odd Jobs series is Birmingham comedy civil service horror. The monsters from beyond have a flavour of Brexit-climate-change-with-tentacles. I find it satisfying and cathartic and relatable in a prim medium-density-urban kind of way. The upwardly-mobile god monsters send their kids to private schools and worry about landscaping in between sucking out souls.

This guy (?) can turn an arresting phrase and a horrific image, in e.g. God-Shaped Hole, a tale of an augmented-reality future in which our wistfully unsocialised protagonist and his PUA AI dating assistant encounter the spammy post-human dark sex gods arisen from Generative Amorous Networks.

Aside: I suspect Zero leans neoreactionary based on his tweets,
which makes me wonder vaguely if some of the layers of horror a conservative would experience
in his stuff passed me by.
(UPDATE: oh yeah, he is *deep* in the frog right.)
Maybe there would be still *more* if I were terrorised by the hot queer robot sex motifs. 🤷♂
Anyway, there is still enough macabre to go around, and social media future extrapolation is slickly done.

He wrote a biography of H.P. Lovecraft, and Atomised is full of a certain type of biological horror, so I’ll count him.
His later books seem to be more *things-that-scare-baby-boomers* horror, which has not grabbed me so much.

Another marginal entrant in this classification. The pseudonymous Delicious Tacos writes books with (appropriately enough, I suppose) a complicated exploitative relationship to the complicated and exploitative PUA community. He has that Houellebecq aura of speaking painful ideas through pluralistic ignorance, except maybe funnier? An interview excerpt that gives the flavour:

Part of the plot involves weaponized blackmail, with personal data being leveraged by a terrorist cell. In the wake of the Ashley Madison dump and so many instances of doxing, it’s an eerily plausible scenario — and of course this ties in with our increasing technological dependence. Maybe there’s not a nuclear holocaust at the end of the chain, but do you see things coming to a head?

The good version of this would be: everyone’s innermost secrets are revealed. We all realize that we’re all racist, horny, greedy, hateful. We all jerk off to unspeakable things. We all hate our husbands, wives, children etc. Everyone’s secrets come out at once and no secret has leverage over another.

This is that aforementioned pluralistic ignorance theme.
Perhaps I am the fool.
I think the odds are good that we humans all harbour varying degrees of racist/horny/greedy/etc. impulses, and I *thought* we all believed that and chatted about it reasonably openly, both on the anonymous internet and in intimate conversations.
I also imagined that the reason we have a legal system and so on is that we as a society have learned how to keep those impulses from becoming action, so we are not required to be concerned about thought police as such.
Perhaps I am the fool.
Or maybe Delicious Taco’s family would benefit from some group therapy and/or MDMA idk.

The lauded *Southern Reach* series (now a motion picture!) seems popular.
I am, AFAICT, the only person in the world who did not like *Annihilation*, the first book, and I have not read the rest.
The components are body horror/weird trippy cut sequences/bureaucratic authoritarian satire, framed as a found-footage Blair-Witch-project-with-talking-psilocybe.
I dunno, maybe if it were the first time I had tasted such a combination it might have blown me away, but as it is, I read a lot of trippy shit and my jaded palate found nothing new.
This book to me had the flavour of a bunch of separately tasty ingredients stirred together into an undifferentiated mess.
Also, a cardinal sin against my narrative preferences: lack of structural payoff.
No *plot twist*; it is displaced by *perpetual plot wringing*.

Further, this tale does not relate to my world.
I prefer my horror connected to my lived experience by stronger sinews than this; there is no horrifyingly plausible version of my future in here, no alternative take on my present, no matter how much I say *woooooo but is it?*
This is a parable about a world which I clearly cannot visit and in which I have no pen-pals.

Recommended for lovers of numinous fungus.

Ahn, Woo-Young, Kenneth T. Kishida, Xiaosi Gu, Terry Lohrenz, Ann Harvey, John R. Alford, Kevin B. Smith, et al. 2014. “Nonpolitical Images Evoke Neural Predictors of Political Ideology.” *Current Biology* 24 (22): 2693–99.

Boers, Elroy, Mohammad H. Afzali, and Patricia Conrod. 2019. “Temporal Associations of Screen Time and Anxiety Symptoms Among Adolescents.” *The Canadian Journal of Psychiatry*, November, 0706743719885486.

Giggs, Rebecca A. n.d. “Fictions for Strange Weather,” 363.

qntm. 2021. *There Is No Antimemetics Division*. Independently published.

Tingle, Chuck. n.d. *Dr. Chuck Tingle’s Complete Guide To The Void*.

VanderMeer, Jeff. 2014. *Annihilation*. First Edition. Southern Reach Trilogy 1. New York: Farrar, Straus and Giroux.

Wisker, Gina. 2018. “Speaking the Unspeakable: Women, Sex, and the Dismorphmythic in Lovecraft, Angela Carter, Caitlín R. Kiernan, and Beyond.” In *New Directions in Supernatural Horror Literature: The Critical Influence of H. P. Lovecraft*, 209–34.

The skill of communicating in the highly artificial situations of the modern human. Such crucial skills. Often not taught. Worse, we systematically fail to realise we lack them.

Here are some resources I use to work on my skills in this area. Analysis of these in a broader social context is something I consider in e.g. speech standards.

Back and Back’s classic (and cheap) Assertiveness at Work is short, clear, and has practical exercises. I buy copies of this in bulk and give them to friends with workplace friction challenges. This book is really, really good and I cannot recommend it highly enough, not least because the downside risk of a $5 second-hand book is low. Back, Back, and Bates (1991):

We use the word ‘assertion’ … to refer to behaviour that involves:

- Standing up for your own rights in such a way that you do not violate another person’s rights
- Expressing your needs, wants, opinions, feelings and beliefs in direct, honest and appropriate ways

We will demonstrate this with an example. Suppose your manager asked you to complete some additional work by the end of the month. You are the best person to do the work, but your time is already fully committed to other work. An assertive response in this situation would be:

“I appreciate that you would like this work completed by the end of the month. However, I don’t see that I can fit it in with my workload as it is at present, so can we discuss it?”

So assertiveness is based on the beliefs that in any situation:

- You have needs to be met
- The other people involved have needs to be met
- You have rights; so do others
- You have something to contribute; so do others

The aim of assertive behaviour is to satisfy the needs and wants of both parties involved in the situation.

This is one of those things where it is not rocket science when you read it back to yourself, but which in my experience is poorly taught everywhere.

I really like their framing. They discuss assertive speech as a standard of communication that you can mutually agree upon which, if we all accede to it, will lower overall stress.
It is kind of an emotional *lingua franca*.

Not yet read:

Negotiation trainer Misha Glouberman has some stuff to say on the Dave McRaney podcast: How to have better conversations with loved ones (and just about anyone) about difficult topics (and just about anything). His work seems to be based upon Stone and Heen (2011) and Ury and Fisher (2012).

Dave Bailey summarises Marshall B. Rosenberg’s Non-Violent Communication, whose key advice is similar. It looks similar to the FBI Behavioral Change Stairway Model (Vecchi, Van Hasselt, and Romano 2005), which they use for hostage negotiations and suicide threats, so it is a literally battle-tested system in that regard.

Nate Soares, Assuming Positive Intent:

I believe that the ability to expect that conversation partners are well-intentioned by default is a public good. An *extremely valuable* public good. When criticism turns to attacking the *intentions* of others, I perceive that to be burning the commons.

See also personality tests; IIRC there is some communicative stuff there.

Academic writing is not so much brusque as verbosely passive-aggressive.
We do not talk *inside* the ivory towers in the same style in which we declaim from them, or at least my teams do not.

I find Piper Harron’s postWhy I Do Not Talk About Math troubling: “Nobody was mean to me, nobody consciously laughed at me. There’s just a way that mathematicians have been socialized (I guess?!) to interact with each other that I find oppressive. If you have never had someone mansplain or whitesplain things to you, it may be hard for you to understand what I’m going to describe.”

I personally find the way that mathematicians discuss things invigorating, usually, and if I don’t enjoy it I change it. It sounds like there are roles for various communication styles; is what we really need better ways to negotiate communication styles?

NB Harron’s science communication is seriously innovative and top-grade, a real level-up for people reading it, and it would have been a loss to the field if she had given up. See her thesis for wonderfully comprehensible deep algebra explanations.

Here is a name for a norm I notice and enjoy sometimes in academia: Crocker’s rules.

Declaring yourself to be operating by “Crocker’s Rules” means that other people are allowed to optimize their messages for information, not for being nice to you. Crocker’s Rules means that you have accepted full responsibility for the operation of your own mind — if you’re offended, it’s your fault. Anyone is allowed to call you a moron and claim to be doing you a favor. (Which, in point of fact, they would be. One of the big problems with this culture is that everyone’s afraid to tell you you’re wrong, or they think they have to dance around it.) Two people using Crocker’s Rules should be able to communicate all relevant information in the minimum amount of time, without paraphrasing or social formatting. Obviously, don’t declare yourself to be operating by Crocker’s Rules unless you have that kind of mental discipline.

Note that Crocker’s Rules does not mean you can insult people; it means that other people don’t have to worry about whether they are insulting you. Crocker’s Rules are a discipline, not a privilege. Furthermore, taking advantage of Crocker’s Rules does not imply reciprocity. How could it? Crocker’s Rules are something you do for yourself, to maximize information received — not something you grit your teeth over and do as a favor.

Framing these as a consensual communication strategy defangs one of the problems with brusque communication, namely that different people interpret it differently (for some it is aggression, for others it is respect for the value of another person’s time).

I am in many academic email correspondences that are cleanskin Crocker exchanges and I find it effective.

Important to me. For an example, see A Deep Dive into the Harris-Klein Controversy, and for some background, see Decoupling revisited. Question: is decoupling possible in conflict-theoretic worldviews?

Helpful insight: Decoupling as a Moral Decision.

See workplace habits.

See managing.

Here are my current communicative preferences, affordances and commitments.

I would like to work with people to learn true things, because truth is beautiful and because truth can be used to make the world better, and thus in turn make the truths about it more beautiful. To that end I invite your disagreement and criticism and collaboration on knowledge, and I guess on life itself.

Disagreement is one of the hardest things to do in conversation. Here are some tricks and background to communicate with me generally, esp with regard to disagreement.

- Native English speaker.
- Sociable, but too busy to act on that as much as I would prefer.
- Generally interested in people and theirwonderful weirdness andvariety.
- I attach a high value to kindness, empathy, equity, truth and respect for other humans, and do my best to model that, or at least fake it until I make it.
- My preferences are valid. So are yours. Let’s negotiate.
- My metamorphosis from asker to guesser is complete and I am having trouble even remembering how guessers work.
- Going by my history, my opinions about lots of things are wrong.
- I attempt to not *believe* many things, but I assign a high certainty to some things.
- I do my best to welcome and learn from your criticism of my opinions (if I know you well enough, I am game to play by Crocker’s rules).
- I think every topic is open for discussion, although not in every context. I can provide content warnings about sensitive topics etc.
- Regardless, we need to discuss stuff with care and compassion, because speech can indeed be violence of a sort, sometimes.
- I believe the force of an argument is dictated by its logical and observational content, not who makes it (which is not to say that who you are does not influence the observations you make, or the discussions we might have).
- I start from the assumption of positive intent, and in particular…
- I do my best to take your statements at face value rather than as memetic warfare, unless you make it clear you intend otherwise.
- I would possibly enjoy debating with you in a contrary and contentious fashion, in the Rational Style, if you were game for that.
- If you tell me that something will burn me, I will probably touch it.

In short, I am a stroppy, empathetic bastard who aspires to the possibility of reasonable discussion, and who would probably like you if you have interesting opinions and positive intent.

Online comment moderation advice (I would like this to be data-backed): A Pragmatic Approach To Thorny People Problems.

Ian Leslie, Ten causes of breakdown in communication.
The top one is excellent: *believing you have communicated*.

Gaël Varoquaux on Technical maintenance discussions.

Ozy Brennan on covert contracts.

no hello is a handy webpage about why one should not start chats with “hey”.

Acemoglu, Daron, Victor Chernozhukov, and Muhamet Yildiz. 2006. “Learning and Disagreement in an Uncertain World.” Working Paper 12648. National Bureau of Economic Research.

Back, Ken, Kate Back, and Terry Bates. 1991. *Assertiveness at Work: A Practical Guide to Handling Awkward Situations*. 2nd ed. London; New York: McGraw-Hill.

Cheng, Justin, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. n.d. “Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions.”

Goel, Sharad, Winter Mason, and Duncan J. Watts. 2010. “Real and Perceived Attitude Agreement in Social Networks.” *Journal of Personality and Social Psychology* 99 (4): 611–21.

Grant, Anthony M. 2017. “The Third ‘Generation’ of Workplace Coaching: Creating a Culture of Quality Conversations.” *Coaching: An International Journal of Theory, Research and Practice* 10 (1): 37–53.

Stone, Douglas, and Sheila Heen. 2011. *Difficult Conversations: How to Discuss What Matters Most*. New York: Penguin.

Tan, Chenhao, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. “Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-Faith Online Discussions.” In *Proceedings of the 25th International Conference on World Wide Web*, 613–24. WWW ’16. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.

Ury, William, and Roger Fisher. 2012. *Getting to Yes: Negotiating an Agreement Without Giving In*. 1st edition. London: Century Trade.

Vecchi, Gregory M., Vincent B. Van Hasselt, and Stephen J. Romano. 2005. “Crisis (Hostage) Negotiation: Current Strategies and Issues in High-Risk Conflict Resolution.” *Aggression and Violent Behavior* 10 (5): 533–51.

For most of my life, some people close to me have battled with depression.1
There have been a few cases spiralling chronically out of control.
I’m keeping notes of what useful things I learn about managing/assisting/engaging with people afflicted by this condition, in the hope I can apply them in the service of various loved ones in the future.

Scott Alexander, Peer Review Request: Depression.

We all (mostly?) would like to talk to any fellow human with compassion and respect. How do we accomplish this with depressed people? I know that acute trauma definitely has special care requirements. I suspect that depression does too.

Attempting extra compassion for someone who is sad is probably a good idea, regardless of whether that sadness is a diagnosable chronic depression or not. Feeling sad, it turns out, is shit.

That said, my fraught and tragic experience in this area indicates that possibly the best way to talk to a depressed person who is sufficiently deep in their illness is with extreme care, and possibly firmly maintained emotional distance. Being actively helpful might require specialised counselling skills, as it involves communicating with someone in a vulnerable state with a compromised ability to communicate their needs.

Also, I would like to know: Which interventions are likely to be helpful, and which not? Which are likely to be*most* helpful?
In a life with finite hours I want to spend my helping time efficiently.
Also I am not a professional counsellor or psychiatric nurse or social worker, so I want to keep a firm bound, realistic for a non-specialist, on how much I can actually do.
Also, presumably the skills of a professional include various important training I do not have, such as not making things worse. What are good bounds on how helpful to be?

Or, as I now call them, myths. This kind of thing:

Life gave me lemons and I have an amazing lemonade recipe, but actually these are *citrus limon* and it tastes better with *citrus latifolia*, so there is no point in making it; I would only be thinking about the superior lemonades I cannot be making.

I *do* have ongoing questions about how to deal with the circular logic that depressed people seem to be especially likely to exhibit; I have noticed that someone who is really deep in will tend to construct a thicket of reasons to justify why the particular hole they are in is inescapable, and will grow new clauses justifying continued inaction as each potential solution arises for the old problem. *I am sad because no one wants to talk to me / No-one wants to talk to me because I seem boring / thank you for offering to help facilitate a less boring conversation, but actually I don’t want to talk to those people because actually it is they who are boring / thank you for offering to help me visit someone who I have previously said is not boring, but actually the weather is bad so I wouldn’t enjoy it…*
There are many versions of this.

When someone is depressed enough, they can always find a reason to avoid collecting evidence about whether certain things can alleviate what they claim is a direct stressor.
A particularly smart depressed person might spend what looks to me like considerable cognitive effort on finding an explanation for why they cannot spend effort on things that look prima facie pretty useful, and why all the various solutions that arise are *not quite right*, and therefore pointless to try.
I suspect this one would make more sense to me if I had suffered more depression myself; I might then learn how it is that, e.g., talking about how stuff is terrible in a dark room is a free action, but talking about how stuff is terrible in the sunshine is gruelling.

Anyway, aside from my venting… Is it better to continue to attempt to refute such reasoning clause by clause, or does that lead only further into the sadness labyrinth?

The inactivity rationalisation spiral is self-refuting. If you run far enough along a branch you will eventually find that the initial solution has been retconned out of existence. Perhaps running down these paths at all is destructive, and is just helping the depressed person to eliminate choices from their life by publicly rejecting them, and then feeling compelled to stand by their re-axiomatisation of the problem.

Would love hot tips on this.

Exercise, sunlight, sleep but not too much, apparently?

See also traditionally-recreational drugs in psychiatric care.

TODO: what is the state of evidence on psilocybin for depression? (Vargas et al. 2020; Carhart-Harris et al. 2016; Goldberg et al. 2020)

Really. Guzey, in his review of sleep research, mentions the following quote from somewhere(?)

Jeremy Hadfield writes:

My (summarized/simplified) hypothesis based on what I’ve read: depression involves rigid, non-flexible brain states that correspond to rigid depressive world models. Depression also involves a non-updating of models or inability to draw new connections (brain is even literally slightly lighter in depressed patients). Sleep involves revising/simplifying world models based on connections learned during the day, involves pruning unneeded or irrelevant synaptic connections. Thus, excessive sleep + depression = even less world model updating, even more rigid brain, even fewer new connections. Sleep deprivation can resolve this problem at least temporarily by ensuring that you stay awake for longer and keep adding connections, thus compensating for the decreased connection-building caused by depression and “forcing” a brain update.

- Once again, Scott Alexander, Peer Review Request: Depression
- I also think his Trapped Priors As A Basic Problem Of Rationality observation pertains here.
- Motivated Reasoning As Mis-applied Reinforcement Learning

Guzey argues for a deeper connection with sleep.

Question: Does the dark room problem have anything to say about this?

- My journal: years of depression and self-loathing; learning to accept myself and others; overcoming video game addiction - Alexey Guzey
- https://overcast.fm/+nuuo_rG_g Lead author on that controversial new “depression not from serotonin imbalance” paper (Moncrieff et al. 2022; Read and Moncrieff 2022) in interview. TBH she lost me in the middle when she seemed to be basically denying that any drug could count as a productive input to the human experience, ever.

Bessone, Pedro, Gautam Rao, Frank Schilbach, Heather Schofield, and Mattie Toma. 2021. “The Economic Consequences of Increasing Sleep Among the Urban Poor.” *The Quarterly Journal of Economics* 136 (3): 1887–1941.

Boers, Elroy, Mohammad H. Afzali, Nicola Newton, and Patricia Conrod. 2019. “Association of Screen Time and Depression in Adolescence.” *JAMA Pediatrics* 173 (9): 853–59.

Buhusi, Catalin V., and Warren H. Meck. 2005. “What Makes Us Tick? Functional and Neural Mechanisms of Interval Timing.” *Nature Reviews Neuroscience* 6 (10): 755–65.

Carhart-Harris, Robin L, Mark Bolstridge, James Rucker, Camilla M J Day, David Erritzoe, Mendel Kaelen, Michael Bloomfield, et al. 2016. “Psilocybin with Psychological Support for Treatment-Resistant Depression: An Open-Label Feasibility Study.” *The Lancet Psychiatry* 3 (7): 619–27.

Costello, Cory, Sanjay Srivastava, Reza Rejaie, and Maureen Zalewski. 2021. “Predicting Mental Health From Followed Accounts on Twitter.” *Collabra: Psychology* 7 (18731).

Dregan, Alex, and Martin C. Gulliford. 2012. “Is Illicit Drug Use Harmful to Cognitive Functioning in the Midadult Years? A Cohort-Based Investigation.” *American Journal of Epidemiology* 175 (3): 218–27.

Durmer, Jeffrey S., and David F. Dinges. 2005. “Neurocognitive Consequences of Sleep Deprivation.” In *Seminars in Neurology*. Vol. 25.

Gartlehner, Gerald, Catherine A. Forneris, Kimberly A. Brownley, Bradley N. Gaynes, Jeffrey Sonis, Emmanuel Coker-Schwimmer, Daniel E. Jonas, et al. 2013. *Interventions for the Prevention of Posttraumatic Stress Disorder (PTSD) in Adults After Exposure to Psychological Trauma*. AHRQ Comparative Effectiveness Reviews. Rockville (MD): Agency for Healthcare Research and Quality (US).

Goldberg, Simon B., Brian T. Pace, Christopher R. Nicholas, Charles L. Raison, and Paul R. Hutson. 2020. “The Experimental Effects of Psilocybin on Symptoms of Anxiety and Depression: A Meta-Analysis.” *Psychiatry Research* 284 (February): 112749.

Hölzel, Britta K., James Carmody, Mark Vangel, Christina Congleton, Sita M. Yerramsetti, Tim Gard, and Sara W. Lazar. 2011. “Mindfulness Practice Leads to Increases in Regional Brain Gray Matter Density.” *Psychiatry Research: Neuroimaging* 191 (1): 36–43.

Levandovski, Rosa, Giovana Dantas, Luciana Carvalho Fernandes, Wolnei Caumo, Iraci Torres, Till Roenneberg, Maria Paz Loayza Hidalgo, and Karla Viviani Allebrandt. 2011. “Depression Scores Associate with Chronotype and Social Jetlag in a Rural Population.” *Chronobiology International* 28 (9): 771–78.

Lin, Liu yi, Jaime E. Sidani, Ariel Shensa, Ana Radovic, Elizabeth Miller, Jason B. Colditz, Beth L. Hoffman, Leila M. Giles, and Brian A. Primack. 2016. “Association Between Social Media Use and Depression Among U.S. Young Adults.” *Depression and Anxiety* 33 (4): 323–31.

Moncrieff, Joanna, Ruth E. Cooper, Tom Stockmann, Simone Amendola, Michael P. Hengartner, and Mark A. Horowitz. 2022. “The Serotonin Theory of Depression: A Systematic Umbrella Review of the Evidence.” *Molecular Psychiatry*, July, 1–14.

Moreno-Domínguez, Silvia, Tania Raposo, and Paz Elipe. 2019. “Body Image and Sexual Dissatisfaction: Differences Among Heterosexual, Bisexual, and Lesbian Women.” *Frontiers in Psychology* 10.

Olders, Henry. 2003. “Average Sunrise Time Predicts Depression Prevalence.” *Journal of Psychosomatic Research* 55 (2): 99–105.

Owesson-White, Catarina A., Joseph F. Cheer, Manna Beyene, Regina M. Carelli, and R. Mark Wightman. 2008. “Dynamic Changes in Accumbens Dopamine Correlate with Learning During Intracranial Self-Stimulation.” *Proceedings of the National Academy of Sciences* 105 (33): 11957–62.

Primack, Brian A., Ariel Shensa, César G. Escobar-Viera, Erica L. Barrett, Jaime E. Sidani, Jason B. Colditz, and A. Everette James. 2017. “Use of Multiple Social Media Platforms and Symptoms of Depression and Anxiety: A Nationally-Representative Study Among U.S. Young Adults.” *Computers in Human Behavior* 69 (April): 1–9.

Read, John, and Joanna Moncrieff. 2022. “Depression: Why Drugs and Electricity Are Not the Answer.” *Psychological Medicine* 52 (8): 1401–10.

Rose, S., J. Bisson, R. Churchill, and S. Wessely. 2002. “Psychological Debriefing for Preventing Post Traumatic Stress Disorder (PTSD).” *The Cochrane Database of Systematic Reviews*, no. 2: CD000560.

Salamone, John D., and Mercè Correa. 2012. “The Mysterious Motivational Functions of Mesolimbic Dopamine.” *Neuron* 76 (3): 470–85.

Selvi, Yavuz, Adem Aydin, Murat Boysan, Abdullah Atli, Mehmed Yucel Agargun, and Lutfullah Besiroglu. 2010. “Associations Between Chronotype, Sleep Quality, Suicidality, and Depressive Symptoms in Patients with Major Depression and Healthy Controls.” *Chronobiology International* 27 (9–10): 1813–28.

Vargas, Ana Sofia, Ângelo Luís, Mário Barroso, Eugenia Gallardo, and Luísa Pereira. 2020. “Psilocybin as a New Approach to Treat Depression and Anxiety in the Context of Life-Threatening Diseases—A Systematic Review and Meta-Analysis of Clinical Trials.” *Biomedicines* 8 (9): 331.

I am hoping this is not a causal relationship.↩︎

The converse to voice fakes: generating text from speech, a.k.a. speech-to-text. We might do this in real time, to control something, or “off-line”, to turn an audio recording into text. Or something in between.

Speaking as a realtime textual input method. See the following roundups of dictation apps to start:

- Zapier dictation roundup
- the rather grimmer Linux-specific roundup.

Here are some options culled from those lists and elsewhere of vague relevance to me:

- dictation.io provides a frontend to Google speech recognition.
- A classic is Nuance Dragon Dictate.
- macOS includes dictation.
- So does Windows.

See Speaking in code: how to program by voice

Coding by voice command requires two kinds of software: a speech-recognition engine and a platform for voice coding. Dragon from Nuance, a speech-recognition software developer in Burlington, Massachusetts, is an advanced engine and is widely used for programming by voice, with Windows and Mac versions available. Windows also has its own built-in speech recognition system. On the platform side, VoiceCode by Ben Meyer and Talon by Ryan Hileman … are popular.

Two other platforms for voice programming are Caster and Aenea, the latter of which runs on Linux. Both are free and open source, and enable voice-programming functionality in Dragonfly, which is an open-source Python framework that links actions with voice commands detected by a speech-recognition engine.

See also: Programming by Voice May Be the Next Frontier in Software Development.

Full disclosure: I am researching this because I have temporarily disabled my hands. For the moment, for my purposes, the easiest option is to use Serenade for python programming, OS speech recognition for prose typing, and to leave my other activities aside for now. If my arms were to be disabled for a longer period of time I would probably accept the learning curve of using Talon, which seems to solve more problems, at the cost of greater commitment.

One point of friction which I did not anticipate, is that most of these tools will, for various reasons, do their best to switch off any music playing any time you use them. For someone like me who can't focus for three minutes straight without banging electro in the background this is tricky. My current workaround is to play music on a different device so I can sneak beats past my unnecessarily diligent speech recognition tools trying to control background noise. This means that I am wearing two headsets, which looks funny, but to be honest it is not the worst fashion sacrifice I have been forced to make in the course of this particular injury.

Contrariwise, if I were to try to do the speech control stuff in an open plan office, coworking space or in the family living room, it would be excruciatingly irritating for anyone else who could hear me. My current workaround, when I am annoying some innocent bystander, is to accuse them of being ableist.

Serenade offers simple, low-lift, intuitive voice recognition for coding. Includes deep integration for various languages and also various code editors, including visual studio code and those jetbrains ones. Free. Simple to use.

Supported languages:

- Python
- JavaScript
- HTML
- Java
- C / C++
- TypeScript
- CSS
- Markdown
- Dart
- Bash
- Sass
- C#
- Go
- Ruby
- Rust

The experience is very good for plain code. Editor integration is not awesome when using Jupyter, in line with the general rule that Jupyter makes everything more flaky and complicated.

**tl;dr**

Powerful hands-free input

- Voice Control — talk to your computer
- Noise Control — click with a back-beat
- Eye Tracking — mouse where you look
- Python Scripts — customize everything

Full length:

🤳Talon aims to bring programming, realtime video gaming, command line, and full desktop computer proficiency to people who have limited or no use of their hands, and vastly improve productivity and wow-factor of anyone who can use a computer.

System requirements:

macOS High Sierra (10.13) or newer. Talon is a universal2 build with native Apple Silicon support.

Linux / X11 (Ubuntu 18.04+, and most modern distros), Wayland support is currently limited to XWayland

Windows 8 or newer

Powerful voice control - Talon comes with a free speech recognition engine, and it is also compatible with Dragon with no additional setup.

Multiple algorithms for eye tracking mouse control (depends on a single Tobii 4C, Tobii 5 or equivalent eye tracker)

Noise recognition system (pop and hiss). Many more noises coming soon.

Scriptable with Python 3 (via embedded CPython, no need to install or configure Python on your host system).

Talon is very modular and adaptable - you can use eye tracking without speech recognition, or vice versa.

Worked example: Coding with voice dictation using Talon Voice.

Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects. Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.

Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only). See the actions sub-package documentation for more information, including code examples.

This project is a fork of the original t4ngo/dragonfly project.

Dragonfly currently supports the following speech recognition engines:

- Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported. Home, Professional Individual and previous similar editions of Dragon are supported. Other editions may work too.
- Windows Speech Recognition (WSR), included with Microsoft Windows Vista and Windows 7+, and freely available for Windows XP
- Kaldi (under development)
- CMU Pocket Sphinx (with caveats)

Your voice is the most efficient way to communicate. VoiceCode is a concise spoken language that controls your computer in real-time. When writing anything from emails to kernel code, to switching applications or navigating Photoshop – VoiceCode does the job faster and easier.

VoiceCode is different from other voice-command solutions in that commands can be chained and nested in any combination, allowing complex actions to be performed by a single spoken phrase.

By taking advantage of your brain’s natural aptitude for language you can control your computer more efficiently and naturally. It really feels like you’re in the future!

Handy if you have a recording and you want to make it into a text thing offline.

- producthunt transcription options
- descript aims to integrate editing with transcription, and in particular seems to allow editing audio via editing the transcription, via voice fake technology. Weaponised social media deep fakes, here we come. USD 10/month for 10hr/month.
- rev transcription is a human-powered service (USD 1.25/minute)
- Vatis Tech is AI-backed? USD 10/hr. Outputs to video subtitles and identifies different speakers.
- Audioburst offers transcription as part of their podcast service. The price is a mystery.
- Tony door aims to do meeting-specific transcription. AI. 4hr/month free, thereafter USD 25/month for up to 40hr.
- The all-manual option: Type it yourself.
- wreally transcribe has built their own in-browser speech recognizer as well as a manual transcription UI. More augmented-manual than automatic. $20/year.

It has been a long time since I took Phil Rose’s extravagantly weird undergraduate phonetics class, and I have forgotten much. Here is a cheating tool:

I cannot easily see how to automate phonetic transcription, but surely that is around somewhere? Some voice transcription software may well use phonetics as an intermediate representation or even as the final output.

try stylus or eye tracking systems.

Things that I think should be noted and filed in an orderly fashion but which I lack time to address right now. Content will change incessantly.

I need to reclassify the bio computing links; that section has become confusing and there are too many nice ideas there, not clearly distinguished.

“The problem with Bernoulli regression is that binary outcomes just aren’t very informative.” one of my colleagues said to me. Now I have decided that there is some meat on this bone. TODO: revisit the informativeness of categories about their covariates in the post-Imagenet era, from a classic vector quantisation perspective. Deep learning classifiers as a model for legibility.

None rn.

KeOps (file under least squares autodiff, gps, pytorch)

How I Attained Persistent Self-Love, or, I Demand Deep Okayness For Everyone

You'll forget most of what you learn. What should you do about that?

Psychology might be a big stinkin’ load of hogwash and that’s just fine

The problem with raging against the machine is that the machine has learned to feed off rage

Zuckerman, Tips to article writers

Long-term Dynamics of Fairness Intervention in Connection Recommender Systems

Spark the Mood You Want | Create a positive mental association | ClearerThinking.org

Spirals of Delusion: How AI Distorts Decision-Making and Makes Dictators More Dangerous (not convinced tbh)

Keet only shares end-to-end encrypted data between participants in your calls. Without middlemen, third-parties, or servers, there’s nobody left who can snoop or leak data.

Telios: Decentralized and Secure Email Communication Service

Transition away from expensive and complex cloud services to a rent-free P2P web! Request access for P2P features on our serverless, files, and data apps.

DatDot team is building an autonomous hosting network for p2p data systems. Think of it as a Filecoin but for Hypercore protocol, built with Substrate.

Creating an Autistic-Friendly Workplace · Public Neurodiversity Support Center

AO3’s 15-year journey from blog post to fanfiction powerhouse - The Verge

F1000Research | Open Access Publishing Platform | Beyond a Research Journal

The Developer Certificate of Origin is a great alternative to a CLA

Panel 3: Mathematical sciences to address societal challenges and issues

Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale - Criteo AI Lab

I. Risk Management Foundations - Machine Learning for Financial Risk Management with Python [Book]

jkbren/einet: Uncertainty and causal emergence in complex networks

Inform: A C library for information analysis of complex systems

[2110.05038] Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

[2110.14759] Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

[2112.12524] Emulation of greenhouse-gas sensitivities using variational autoencoders

Generalizing projections to Mahalanobis-type metrics | Mathematical Odds & Ends

Democratizing the hardware side of large language models seems to be an advertisement for some new hardware, but there is interesting background in there.

Darren Wilkinson’s Bayesian inference for a logistic regression model 1, 2, 3, 4, 5

All Minus One, an Australian ❄️🍑 club.

When Giving People Money Doesn’t Help - by Zvi Mowshowitz. Troubling, needs follow-up.

Your Book Review: Public Choice Theory And The Illusion Of Grand Strategy

Jeff Maurer, It’s Always the Adults’ Fault

Stephen Malina — Deriving the front-door criterion with the do-calculus

DIY Collective Embeds Abortion Pill Onto Business Cards, Distributes Them At Hacker Conference

Understand how other people think: a theory of worldviews – Spencer Greenberg

Census tool which links all the weird different data storage systems and CRM stuff

Michael Lewis podcast on illegible experts

Hot money on complicated porn rules

Chalk is a non-terrible calculator for macOS, incorporating useful things like matrices and bitwise ops

NeurIPS Conference: Historical Data Analysis | by Nemanja Rakicevic

M Bronstein’s ICLR 2021 Keynote, Geometric Deep Learning: The Erlangen Programme of ML

Communications' digital initiative and its first digital event

flatmax/vector-synth: Old 2002 era vector synth code based on XFig

Yanir Seroussi, The mission matters: Moving to climate tech as a data scientist

PJ Vogt, Selling Drugs to Buy Crypto

Have The Effective Altruists And Rationalists Brainwashed Me?

Field Guide to the Curve Wars: DeFi’s Fight for Liquidity - Almanack - Every

Digital artists’ post-bubble hopes for NFTs don’t need a blockchain

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

Playable Half Earth Socialism simulator

Quasi-likelihood analysis and its applications | SpringerLink

I Do Not Think It Means What You Think It Means: Artificial Intelligence, Cognitive Work & Scale

Path Integrals and Feynman Diagrams for Classical Stochastic Processes

Modern Computational Methods for Bayesian Inference — A Reading List | George Ho is a good curation of modern Bayes methods posts. The next links come from there.

will wolf, Deriving Expectation-Maximization

will wolf, Deriving Mean-Field Variational Bayes

ClearerThinking.org’s courses, e.g.

- Introduction to Decision Academy: The Science of Better Decisions
- Rhetorical Fallacies: Dodging Argument Traps
- Learning from Mistakes: A Systematic Approach
- Probabilistic Fallacies: Gauging the Strength of Evidence
- Explanation Freeze: Interpreting Uncertain Events
- Aspire: A Tool to Help You Improve Your Life
- The Sunk Cost Fallacy: Focusing on the Future

My2050 calculator - create your pathway for the UK to be net zero by 2050

Is Pandemic Stress to Blame for the Rise in Traffic Deaths? Nope; apparently it is decreased congestion making drivers drive faster on shit roads.

Review of Joseph Heath, Enlightenment 2.0, on Habermas + Kahneman ideas in the public sphere.

Agenty Duck: Intro To Naturalism Sequence

Is naturalism nonparametric learning for mind-as-ML?

Nick Chater, Would you Stand Up to An Oppressive Regime.

Reddit for AI-generated and manipulated content

Everything I wrote before *now* was overconfident

Brutalita Sans: Brutalita is an experimental font and editor; edit in your browser and download the font.

Social Desirability Bias: How Psych Can Salvage Econo-Cynicism (File under dunning-kruger theory of society, and survey analysis)

That broken tech/content culture cycle

I’m not there for 100% of this story about youtube/instagram/facebook/amazon/etc, but I will back a chunk of it. Also: compare to TikTok.

Evidence of Fraud in an Influential Field Experiment About Dishonesty. Looks bad for Dan Ariely. Damn.

on programming humans (Amir’s work)

Apple acquires song-shifting startup AI Music, here’s what it could mean for users

Michele Coscia, Pearson Correlations for Networks

Penny Wyatt, Developer Innovation and the Free Puppy

Elizabeth Van Nostrand, A Quick Look At 20% Time

Oleksandr Nikitin is working on The Cortex, their Ultimate Productivity App Thing

Marisa Abrajano has a provocative list of research topics. I would like to read the work to see her methodology.

Do normal people need to know or care about “the metaverse”?

The DAIR Institute “The Distributed AI Research Institute is a space for independent, community-rooted AI research, free from Big Tech’s pervasive influence.”

Machine Learning Trick of the Day (1): Replica Trick — Shakir Mohamed

Machine Learning Trick of the Day (3): Hutchinson’s Trick — Shakir Mohamed

Machine Learning Trick of the Day (7): Density Ratio Trick — Shakir Mohamed

vscode-paste-image/README.md at master · mushanshitiancai/vscode-paste-image

mhoye/awesome-falsehood: 😱 Falsehoods Programmers Believe in

The Paradox of Choice in Computing-Research Conferences | November 2021

ApplyingML - Papers, Guides, and Interviews with ML practitioners

The War Nerd: Taiwan — The Thucydides Trapper Who Cried Woof - By Gary Brecher

We were the unpaid janitors of a bloated tech monopoly - by Ryan Broderick - Garbage Day

Lambda School’s Job Placement Rate May Be Far Worse Than Advertised

AI research: the unreasonably narrow path and how not to be miserable

I would like to read the diaries ofUsama ibn Munqidh

The latest target of China’s tech regulation blitz: algorithms

Jag Bhalla, Vaccine Greed: Capitalism Without Competition Isn’t Capitalism, It’s Exploitation

Kostas Kiriakakis, A Day At The Park

fastdownload: the magic behind one of the famous 4 lines of code · fast.ai

State Power and the Power Law, State Power and the Power Law 2

Hosting SQLite databases on Github Pages - (or any static file hoster)

Reciprocal Convexity to reverse the Jensen Inequality - B.log

Schneier, When AIs Start Hacking

Multimodal Neurons in Artificial Neural Networks / Distill version of Multimodal Neurons in Artificial Neural Networks

Why a Universal Society Is Unattainable is a rebadged and refreshed version of the “societies need common enemies” trope.

Yuling Yao, The likelihood principle in model check and model evaluation

Algorithms for Decision Making: Decision making, in the sense of reinforcement learning

Liquid Information Flow Control, a confidential computing DSL

On the Generalization Ability of Online Strongly Convex Programming Algorithms

Thomas Lumley visualises data pooling simply and well.

To obtain the least biased information, researchers must acknowledge the potential presence of biases and take steps to avoid and minimise their effects. Equally, in assessing the results of studies, we must be aware of the different types of biases, their potential impact and how this affects interpretation and use of evidence in healthcare decision making.


The classic python plotting tool is matplotlib. It can’t do all those modern hipster graphs without hard labour, is awful at animations and interactions, and is ugly by default. It works OK out of the box. There are libraries which use matplotlib as a backend and build more elaborate systems on top, but these have not had much longevity so far, so I find myself falling back to plain old matplotlib. It is an acceptable default with lots of weird edge cases when you try to be clever, but it gets the job 80% done.

Note some confusing terminology: an `Axes` object, which is constructed by an `add_subplot` command, contains two `Axis` objects, but is much more than a list of such objects, being the fundamental object upon which a graph is drawn.

But don’t listen to me describe it. Observe this lovely diagram which explains all.
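The terminology is easy to check interactively; a minimal sketch using only stock matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)  # an Axes: the region a graph is drawn on

# The Axes *contains* two Axis objects, but is far more than that pair:
# all the plotting methods live on the Axes...
assert isinstance(ax, Axes)
ax.plot([0, 1], [0, 1])

# ...while the tick and label machinery lives on each Axis.
assert isinstance(ax.xaxis, XAxis)
assert isinstance(ax.yaxis, YAxis)
ax.xaxis.set_label_text("x")
```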


Read Jakevdp’s manual for some pedagogic advice.

- If I am using jupyter, the nerdy extension is jupyter-matplotlib, which integrates interactive plotting into the notebook better.
- Improving log y-axis plots, esp histograms
- drawnow allows dynamically updated diagrams. It is, ironically, itself updated not especially dynamically.
- rougier/scientific-visualization-book: An open access book on scientific visualization using python and matplotlib

Traditionally annoying. There are colorbars everywhere, aspect ratios are horrible, getting multiple images to plot is vexing, etc.

There are helpers for this in modern matplotlib.
The keyword to look for is `mpl_toolkits.axes_grid1`.
See also the tutorial.
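For instance, a minimal sketch (assuming matplotlib and numpy are installed) where `ImageGrid` handles the spacing and the single shared colorbar that are so painful by hand:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid

fig = plt.figure(figsize=(6, 4))
# a 2x3 grid of image axes sharing one colorbar
grid = ImageGrid(fig, 111, nrows_ncols=(2, 3),
                 axes_pad=0.1, cbar_mode='single')
rng = np.random.default_rng(0)
for ax in grid:
    im = ax.imshow(rng.random((8, 8)), vmin=0, vmax=1)
    ax.set_axis_off()
grid.cbar_axes[0].colorbar(im)  # one colorbar for the whole grid
```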

Alternatively, use proplot.

matplotlib will always be somewhat horrible to use because the designers were not prophets, so some of their early API design guesses were bad, but we are stuck with them for compatibility. There are various attempts to abstract away matplotlib’s awkward API behind nicer ones.

Matplotlib is an extremely versatile plotting package used by scientists and engineers far and wide. However, matplotlib can be cumbersome or repetitive for users who…

- Make highly complex figures with many subplots.
- Want to finely tune their annotations and aesthetics.
- Need to make new figures nearly every day.

Proplot’s core mission is to provide a smoother plotting experience for matplotlib’s most demanding users. We accomplish this by

expanding upon matplotlib’s object-oriented interface. Proplot makes changes that would be hard to justify or difficult to incorporate into matplotlib itself, owing to differing design choices and backwards compatibility considerations. This page enumerates these changes and explains how they address the limitations of matplotlib’s default interface. To start using these features, see the usage introduction and the user guide.

The Next-generation seaborn interface attempts to achieve a pythonic equivalent to ggplot, at least somewhat:

as seaborn has become more powerful, one has to write increasing amounts of matplotlib code to recreate what it is doing.

So the goal is to expose seaborn’s core features — integration with pandas, automatic mapping between data and graphics, statistical transformations — within an interface that is more compositional, extensible, and comprehensive.

One will note that the result looks a bit (a lot?) like ggplot. That’s not unintentional, but the goal is also not to “port ggplot2 to Python”. (If that’s what you’re looking for, check out the very nice plotnine package). There is an immense amount of wisdom in the grammar of graphics and in its particular implementation as ggplot2. But I think that, as languages, R and Python are just too different for idioms from one to feel natural when translated literally into the other. So while I have taken much inspiration from ggplot, I’ve also made plenty of choices differently, for better or for worse.

Note that, as exciting as this sounds, the project is 100% vaporware at this stage, with no sign of a public release or any commitment to any kind of process:

I do plan to issue a series of alpha/beta releases so that people can play around with it and give feedback, but it’s not at that point yet.

Possibly one can follow along at mwaskom/seaborn/nextgen/main.

Plotnine implements a best-effort clone of R’s ggplot2 library for matplotlib. I believe plotnine supersedes the abandoned(?) ggplot.py by yhat (ggplot source, plotnine source).

The default matplotlib stylesheet aspires to look like 80s spreadsheet defaults, but if you are not a retrofuturist, you will want to change the stylesheet. Some of the built-in stylesheets are OK.

Here is an ugly gallery of sometimes-beautiful graph styles. And here is an ugly gallery of sometimes-beautiful colour maps.
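Switching stylesheets is one line; a sketch (style names here are standard built-ins):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs anywhere

import matplotlib.pyplot as plt

print(plt.style.available[:5])  # enumerate the built-in stylesheets
plt.style.use('ggplot')         # switch globally
with plt.style.context('dark_background'):
    # ...or scope a style to a single figure
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
```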

Seaborn is another vaunted extension, which I would describe as an “Edward Tufterizer”. It extends matplotlib with a modern appearance and some missing plot types.

TUEplots is a light-weight matplotlib extension that adapts your figure sizes to formats more suitable for scientific publications. It produces configurations that are compatible with matplotlib’s rcParams, and provides fonts, figure sizes, font sizes, color schemes, and more, for a number of publication formats.

A cute hack to justify matplotlib’s existence: xkcd graphs.

`plt.savefig("image.png", dpi=300, bbox_inches='tight', pad_inches=0)`

Suppressing axes?

```
ax = plt.gca()
ax.xaxis.set_visible(False)  # hide the x axis entirely
ax.yaxis.set_visible(False)  # hide the y axis entirely
```

An interactive exploratory matplotlib GUI toolkit/app is glue. They have solved a lot of python gui problems, bless them, and have tried to make everything more-or-less interactive.

Glue is designed with "data-hacking" workflows in mind, and can be used in different ways. For instance, you can simply make use of the graphical Glue application as is, and never type a line of code. However, you can also interact with Glue via Python in different ways:

- Using the IPython terminal built-in to the Glue application
- Sending data in the form of NumPy arrays or Pandas DataFrames to Glue for exploration from a Python or IPython session.
- Customizing/hacking your Glue setup using `config.py` files, including automatically loading and cleaning data before starting Glue, writing custom functions to parse files in your favorite file format, writing custom functions to link datasets, or creating your own data viewers.

Glue thus blurs the boundary between GUI-centric and code-centric data exploration. In addition, it is also possible to develop your own plugin packages for Glue that you can distribute to users separately, and you can also make use of the Glue framework in your own application to provide data linking capabilities.

I have an array of images in `arr`.
How can I plot them on a nice simple plot?
I need to do this all the time.
If I have skimage installed I can use the montage function.
I do not always have that installed though. Here is a snippet to do it by hand:

```
columns = 5
rows = 3
fsize = 6
fig = plt.figure(figsize=(fsize * columns / rows, fsize))
for i in range(columns * rows):
    img = arr[i]  # assumes images are stacked along the first axis
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(img)
    ax.set_axis_off()
plt.tight_layout(pad=1)
plt.show()
```

Alternatively, use proplot.

Agustinus Kristiadi, in The Last Mile of Creating Publication-Ready Plots, introduces texworld/tikzplotlib, which is a tikz plotting backend. Why do we want this? For one, it can match fonts to the parent document.

Someone made the idiosyncratic choice that the default font is sans serif, even for mathematical text. You can change this by setting serif fonts also for `mathtext`.

```
from matplotlib import rc
rc('font', family='serif', serif=['Palatino'])
rc('mathtext', fontset='cm')
```

Supported math fonts are reputedly

- dejavusans (the horrible default)
- dejavuserif (beware of odd greek letters)
- cm (“Computer Modern”. Classic, dated.)
- stix (modern serif, looks OK)
- stixsans (sounds like sans serif to me)

Alternatively, I can render graph labels with TeX, which leads to some weird spacing but allows me to match fonts better. It is also fragile, and character set issues are terrible. Are these problems eased if I use XeLaTeX/LuaLaTeX?

I am indebted to my colleague Christian Walder for suggesting this as a reliable initialisation procedure for matplotlib plotting.

```
import matplotlib
import sys, os
# GTK GTKAgg GTKCairo MacOSX Qt4Agg TkAgg WX WXAgg CocoaAgg
# GTK3Cairo GTK3Agg WebAgg agg cairo emf gdk pdf pgf ps svg template
is_mac = sys.platform == 'darwin'
if is_mac:
    _matplotlib_backend = 'MacOSX'
else:
    _matplotlib_backend = 'pdf'
matplotlib.rcParams['svg.fonttype'] = 'none'
matplotlib.rcParams['backend'] = _matplotlib_backend
matplotlib.rcParams['mathtext.fontset'] = 'stix'
matplotlib.rcParams['font.family'] = 'Times New Roman'
matplotlib.use(_matplotlib_backend)
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.switch_backend(_matplotlib_backend)
# print(matplotlib.pyplot.get_backend())
try:
    import cairocffi as cairo
except ImportError:
    pass
    # logging.warning('import cairocffi failed')
_latex_preamble = [
    r'\usepackage{amsmath,bm}',
    r'\newcommand\what{\hat{\bm{w}}}',
    r'\newcommand\tr{^\top}',
    r'\newcommand\dt[1]{\left|#1\right|}',
]
_latex_path = '/Library/TeX/texbin/'

def use_latex_mpl(
        latex_path=_latex_path,
        latex_preamble=_latex_preamble):
    mpl.rcParams['text.usetex'] = True
    # NB: recent matplotlib wants a single string here, not a list
    mpl.rcParams['text.latex.preamble'] = '\n'.join(latex_preamble)
    if latex_path is not None:
        os.environ['PATH'] = '%s:%s' % (os.environ['PATH'], latex_path)
```

Yellowbrick is a matplotlib specialisation for hyperparameter optimisation.

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it’s using Matplotlib.

The python-derived entrant in the scientific workbook field is called `jupyter`.

Interactive “notebook” computing for various languages;python/julia/R/whatever plugs into the “kernel” interface. Jupyter allows easy(ish) online-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. Handy enough that it’s sometimes worth the many rough spots, and so I conquer my discomfort and use it.

But what does the painful set up of jupyter buy you?
Why bother with this contraption?
It took me a long time to realise that the answer was in part that it is a *de facto* standard for running remote computation jobs interactively.
The browser-based, network-friendly jupyter notebook is a natural, easy way to execute tedious computations on some other computer somewhere else, with some kind of a paper trail.
In particular, it is much better over unreliable networks than remote terminals or remote desktops, because the client/server architecture doesn’t need to do so many round-trips to render the state of your work.
So, jupyter is a kind of re-designed network terminal.
Certainly, if you need to execute a job that can be executed over remote desktop or jupyter, jupyter is going to be less awful if your connection has any lag at all, when every mouse click and keystroke involves waiting and twiddling your fingers. Jupyter has less waiting.

People make UX arguments, say, that jupyter is friendly and supports graphs and so on.
I am personally ambivalent about those arguments.
Jupyter can do some things *better than the console*.
But that is an artificially restricted comparison.
It can also do some things better than pencil and paper.
On the other hand, most things that jupyter does, it does worse than a proper IDE or decent code editor.
That comparison only applies when such tools are available, though: on, say, an HPC cluster or cloud compute environment, they often are not, and then jupyter regains a relevant advantage.

But for now the main takeaway, I think, is that if, like me, you are confused by jupyter enthusiasts claiming it is easy and fun, it will make more sense if you append “in comparison to the worst-case other ways of executing code remotely which is frequently what we face”.

There are other comparisons to make — some like it as a documentation format/literate coding environment. Once again, sure, it is better than text files. But then, RMarkdown is better.

**tl;dr** Not to besmirch the efforts of the jupyter developers, who are doing a difficult thing, in many cases for free, but I *will* complain about jupyter notebook.
It is often touted as a wonderful solution for data science but seems to me to merely offer a different selection of pain points to traditional methods.
Further, it introduces some new pain points when you try to combine the old and the new to make something better.

This is not to say it is all bad.
I’m an equivocal advocate of the jupyter notebook *interface*, which some days seems to counteract every plus with a minus.
This is partly due to the particulars of `jupyter`’s design decisions, and partly because of the problems of notebook interfaces generally (Chattopadhyay et al. 2020).
As with so many things in computer interfaces, my luke-warm endorsement is, in relative terms, *fannish enthusiasm* because often, as presaged, the alternatives are *abysmal*.

Jupyter:
It’s friendly to use, but hard to install.
It’s easy to graphically explore your data, but hard to keep that exploration in version control.
It makes it easy to explore your code output, but clashes with the fancy debugger that would make it easy to explore your code bugs.
It is open source, and written in an easy scripting language, python, so it seems it *should* be easy to tweak to taste.
In practice it’s an ill-explained spaghetti of python, javascript, compiled libraries and browsers that relate to one another in obscure ways that few people with a day job have time to understand or contribute to.
Things regularly break either at the server or client side and you might need to upgrade either or both to fix it.
You might have many different installs of each and need to upgrade a half-dozen different installs to keep them all working.
You might upgrade the wrong one at the end of a long day and have no way to get it working without a lengthy debugging process.
It is a constant struggle to keep jupyter finding the many intricate dependencies needed to keep the entire contraption running.
The sum total is IMO no easier to run than most of the other UI development messes that we tolerate in academic software, let alone *tweak*.
Case study: a dependency of a dependency of the autocomplete function broke something and thus spawned a multi-month confusion of cascading dependency problems, and certainly cost me several hours to fix across the few dozen different python environments I manage across several computers.
This kind of tedious intermittent breakage is much the cost of doing business with jupyter, and has been so for as long as I have been using the project, which is as long as it has existed.

These pain points are perhaps not so intrusive for projects of intermediate complexity and/or longevity.
Indeed, jupyter seems good at making such projects look smooth, shiny, and inviting.
That is, at the crucial moment when you need to make your data science project look sophisticated-yet-friendly, it lures colleagues into your web(-based IDE).
Then it is *too late mwhahahahah, you have fallen into my trap, now you are committed*.
This entrapment might be a feature not a bug, as far as the realities of team dynamics and their relation to software development.
You want to lure people in until your problems become their problems and you are required to work together to solve them.

Some argue that the weird / irritating constraints of jupyter can even lead to good architecture, such as Guillaume Chevallier and Jeremy Howard. This sounds like an interactive twist on the old test-driven-development rhetoric. I could be persuaded of its merits, if I found time in between all the debugging.

For now I think of the famous adage “The fastest code is code that doesn’t need to run, and the best code is code you don’t need to write”. The uncharitable corollary might be “Thus, let’s make writing code horrible so that you write less of it”. That is not even necessarily a crazy position.

Here is some verbiage by Will Crichton which explores some of these themes, The Future of Notebooks: Lessons from JupyterCon.

OK, here is a pain point: The lexicon of jupyter is confusing. Terminology tarpit alert.

A *notebook* is on one hand a *style of interface* which this conforms to.
Other applications with a notebook style of interface are Mathematica and MATLAB.

These interfaces communicate with a computational backend, which is called a `kernel` (because in mathematics and computer science, if you don’t know what to call something, you call it a kernel; this confusing explosion of definitions is very much on-message for notebook development).

These are software packages in which a unit of development is a type of *notebook* file on your disk, containing both code and output of that code.
(In the case of jupyter this file format is marked by the file extension `.ipynb`, which is short for “ipython notebook”, for fraught historical reasons.)
*One* implementation of a notebook frontend interface over a notebook protocol for jupyter is called the jupyter notebook, launched by the `jupyter notebook` command, which will open up a javascript-backed notebook interface in a web browser.
This is the one that is usually assumed.
Another common notebook-style interface implementation is called `jupyter lab`, which uses much of the same `jupyter notebook` infrastructure but is distinct and only sometimes interoperable, in ways which I do not pretend to know in depth.
But there are multiple ‘frontends’ besides these, which interact over the jupyter notebook protocol to talk to a kernel.

Which sense of*notebook* is intended you have to work out from context,
e.g. the following sentence is not at all tautological:

Yo dawg, I heard you like notebooks, so I started up your jupyter notebook in `jupyter notebook`.

Hearing this, I became enlightened.

See jupyter UI.

jupyter looks for *kernel specs* in a kernel spec directory, the location of which depends on your platform.

Say your kernel is `dan`; then the definition can be found in the following location:

- Unixey: `~/.local/share/jupyter/kernels/dan/kernel.json`
- macOS: `~/Library/Jupyter/kernels/dan/kernel.json`
- Win: `%APPDATA%\jupyter\kernels\dan\kernel.json`

See the manual for details.

How to set up jupyter to use a virtualenv (or other) kernel? **tl;dr** Do this from inside the virtualenv to bootstrap it:

```
pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name
```

Addendum: for Anaconda, you can auto-install all discoverable conda envs, which worked for me, whereas the ipykernel method did not.

`conda install nb_conda_kernels`

E.g. if you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example of GPU-enabled kernels:

For computers on Linux with optimus, you have to make a kernel that will be called with `optirun` to be able to use GPU acceleration.

I made a kernel in `~/.local/share/jupyter/kernels/dan/kernel.json` and modified it thus:

```
{
  "display_name": "dan-gpu",
  "language": "python",
  "argv": [
    "/usr/bin/optirun",
    "--no-xorg",
    "/home/me/.virtualenvs/dan/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ]
}
```
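Since a kernel spec is just JSON, it can be generated with the stdlib rather than hand-edited; a sketch (the paths and names simply mirror the hypothetical example above):

```python
import json

# a spec mirroring the hand-edited kernel above (paths are illustrative)
spec = {
    "display_name": "dan-gpu",
    "language": "python",
    "argv": [
        "/usr/bin/optirun", "--no-xorg",
        "/home/me/.virtualenvs/dan/bin/python",
        "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
    ],
}
kernel_json = json.dumps(spec, indent=1)
print(kernel_json)
# in real use: write this to ~/.local/share/jupyter/kernels/dan/kernel.json
```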

Any script called this way can be set up to use CUDA but not the actual GPU, by setting an environment variable in the script, which is handy for kernels. So this could be in a script called `noprimusrun`:

`CUDA_VISIBLE_DEVICES= "$@"`

Various options. For one, github will attempt to render jupyter notebooks in github repos; I have had various glitches and inconsistencies with images and equations rendering in such notebooks. Perhaps it is better in…

The fastest way to share your notebooks - announcing NotebookSharing.space - Yuvi Panda

You can upload your notebook easily via the web interface at notebooksharing.space:

Once uploaded, the web interface will just redirect you to the beautifully rendered notebook, and you can copy the link to the page and share it!

Or you can directly use the `nbss-upload` commandline tool: …When uploading, you can opt in to have collaborative annotations enabled on your notebook via the open source, web-standards-based hypothes.is service. You can thus directly annotate the notebook, instead of having to email back and forth about ‘that cell where you are importing matplotlib’ or ‘that graph with the blue border’.

This is one of the coolest features of notebooksharing.space.

Jupyter can host online notebooks, even multi-user notebook servers, if you are brave enough to let people execute weird code on your machine.
I’m not going to go into the security implications. **tl;dr** encrypt and password-protect that connection.
Here, see this jupyterhub tutorial.

**NB**: This section is outdated.
🏗 I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

At base level, you can run one using a standard cloud option like buying compute time as a virtual machine or container, and using a jupyter notebook for your choice of data science workflow.

Special mention to two early movers:

sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.

Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). It requires you to use their anaconda python distribution tools, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine, you tend not to want more confusing things wasting disk space.

Microsoft’s Azure notebooks:

Azure Notebooks is a free hosted service to develop and run Jupyter notebooks in the cloud with no installation. Jupyter (formerly IPython) is an open source project that lets you easily combine markdown text, executable code (Python, R, and F#), persistent data, graphics, and visualizations onto a single, sharable canvas called a notebook.

Google’s Colaboratory is hip now:

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is an intro and here is another.

Anne Bonner’s Tips, Tricks, Hacks, and Magic: How to Effortlessly Optimize Your Jupyter Notebook is actually full of useful stuff. So much stuff that I nearly forget I hate jupyter. If you must use it, read her article; it will make stuff better. Many tips here are gleaned from her.

Here are some useful ones to look up from her:

```
%%writefile basic_imports.py
%load basic_imports.py
```

This is all built upon `ipython`, so you invoke the debugger ipython-style, specifically:

```
from IPython.core.debugger import Tracer; Tracer()() # < 5.1
from IPython.core.debugger import set_trace; set_trace() # >= v5.1
```

e.g. for LaTeX-free mathematics:

`python -m IPython.external.MathJax /path/to/source/MathJax.zip`

Sometimes you can’t see the whole code cell, which is annoying. This is a known issue. The workaround is simple enough:

zooming out to 90% and zooming back in to 100%: `Ctrl + - / +`

You got this error and you weren’t doing anything that bandwidth intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0:

```
jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py
```

update the `c.ServerApp.iopub_data_rate_limit` to be big, e.g. `c.ServerApp.iopub_data_rate_limit = 10000000`.

This is fixed after 5.0.

Modern jupyter is suspicious of connections per default and will ask you either for a magic token or a password, and thereafter, I think, encrypts the connection (?), which is sometimes what I want. Not always.

But when I am in HPC hell, accessing jupyter notebooks through a double SSH tunnel, the last thing I need is to put a hat on a hat by *triply* securing the connection.
Also, sometimes the tokens do not work over SSH tunnels for me and I cannot work out why.
I think it is something about some particular jupyter version mangling tokens, or possibly failing to report that it has not claimed a port used by someone else (although it happens more often than is plausible for the latter case). CodingMatters notes that the following invocation will disable all jupyter-side security measures:

`$ jupyter notebook --port 5000 --no-browser --ip='*' --ServerApp.token='' --ServerApp.password=''`

Obviously, never do this unless you believe that everyone sharing a network with that machine has your best interests at heart.

There are various other useful settings which one could use to reduce security.
In config file format for `~/.jupyter/jupyter_notebook_config.py`:

```
c.ServerApp.disable_check_xsrf = True  # irritated the ssh tunnel for me that one time
c.ServerApp.open_browser = False  # consumes a one-time token and is pointless on a headless HPC
c.ServerApp.use_redirect_file = False  # forces display of the token rather than writing it to some file that gets lost in the containerisation and is useless on a headless HPC
c.ServerApp.allow_password_change = True  # allow password setup somewhere sensible
c.ServerApp.token = ''  # no auth needed
c.ServerApp.password = password  # actually needs to be hashed - see below
```

Eric Hodgins recommends this hack for a simple password, without messing about trying to be clever with their browser infrastructure, which TBH does seem to break pretty often for me.

```
c = get_config()
c.ServerApp.ip = '*'
c.ServerApp.open_browser = False
c.ServerApp.port = 5000
# setting up the password
# (in newer versions, passwd lives in jupyter_server.auth instead)
from IPython.lib import passwd
password = passwd("your_secret_password")
c.ServerApp.password = password
```
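The stored value is a salted hash of the form `algorithm:salt:digest`. A hedged stdlib sketch of my reading of what the classic `passwd()` helper produces (treat the details as illustrative, not normative):

```python
import hashlib
import secrets

def notebook_passwd(passphrase, algorithm='sha1'):
    """Sketch of the classic notebook password hash: algorithm:salt:digest.
    (My reading of notebook.auth.security.passwd; illustrative only.)"""
    salt = secrets.token_hex(6)  # 12 hex characters of salt
    h = hashlib.new(algorithm)
    h.update(passphrase.encode('utf-8') + salt.encode('ascii'))
    return ':'.join((algorithm, salt, h.hexdigest()))

hashed = notebook_passwd("your_secret_password")
print(hashed)  # e.g. sha1:<12 hex chars>:<40 hex chars>
```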

Jupyter Server Proxy lets you run arbitrary external processes (such as RStudio, Shiny Server, syncthing, PostgreSQL, etc) alongside your notebook, and provide authenticated web access to them.

Note

This project used to be called nbserverproxy. If you have an older version of nbserverproxy installed, remember to uninstall it before installing jupyter-server-proxy, otherwise they may conflict.

The primary use cases are:

- Use with JupyterHub / Binder to allow launching users into web interfaces that have nothing to do with Jupyter - such as RStudio, Shiny, or OpenRefine.
- Allow access from frontend javascript (in classic notebook or JupyterLab extensions) to access web APIs of other processes running locally in a safe manner. This is used by the JupyterLab extension for dask.

Chattopadhyay, Souti, Ishita Prasad, Austin Z Henley, Anita Sarma, and Titus Barik. 2020.“What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities,” 12.

Granger, Brian E., and Fernando Pérez. 2021.“Jupyter: Thinking and Storytelling With Code and Data.”*Computing in Science Engineering* 23 (2): 7–14.

Himmelstein, Daniel S., Vincent Rubinetti, David R. Slochower, Dongbo Hu, Venkat S. Malladi, Casey S. Greene, and Anthony Gitter. 2019.“Open Collaborative Writing with Manubot.” Edited by Dina Schneidman-Duhovny.*PLOS Computational Biology* 15 (6): e1007128.

Otasek, David, John H. Morris, Jorge Bouças, Alexander R. Pico, and Barry Demchak. 2019.“Cytoscape Automation: Empowering Workflow-Based Network Analysis.”*Genome Biology* 20 (1): 185.

Sokol, Kacper, and Peter Flach. 2021.“You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source.” In. Zenodo.

Various notes on dealing with the jupyter file format, which, in the name of convenience, gives you new and different problems to learn to manage.
Because jupyter notebooks (the file format) are a weird mash of binary multimedia content and program input and output data, all wrapped up in a JSON encoding, many things that would be simple and seamless with normal text simply do not work for the `.ipynb` jupyter file format.
This is a huge barrier to seamlessly integrating text and notebook-based development practices, and thus impairs the jupyter mission goal of providing an easy on-ramp to data science for users.
IMO this is one of the larger annoyances of the many in the jupyter system, and it would have been completely avoidable if they had settled on a less awkward textual data format than JSON for the backend storage, like R or julia did.
Too late now, I suppose.
There are various workarounds, ameliorations and so forth, but no one agrees on which to use, so I switch between all of them constantly.

One of the things that breaks is diffing and merging; things get messy when I try to put notebooks into version control. In particular, my repository gets very large, and my git client may or may not show diffs. Oh, and merging using the usual merge tools is likely to break things, because merge tools do not know about the idiosyncratic JSON-based storage. How do we fix that? Here are some options.
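To see why plain-text tooling chokes, note that even extracting just the code for a readable diff requires a JSON excursion. A minimal sketch using only the stdlib (the toy notebook dict here is made up, but has the same shape as a real `.ipynb`):

```python
import json

# a tiny stand-in for a real .ipynb: JSON cells carrying source and outputs
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "x + 1\n"],
         "outputs": [{"data": {"image/png": "…base64 blob…"}}]},
    ],
    "nbformat": 4,
}

def code_only(nb):
    """Concatenate code-cell sources, dropping outputs and markdown."""
    return "".join(
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    )

print(code_only(nb))
```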

I want an interactive workbook that can dynamically execute code and include documentation?

No problem. There are many solutions that are superior to jupyter for this, albeit less hyped. One obvious example is knitr, which can support python. Lesser-known projects like pweave also work fine. There are surely many more.

There is nothing requiring me to stick to jupyter except that it is mysteriously popular and ubiquitous.

However, popularity is a strong reason. jupyter is the qwerty of python development, so we are usually stuck with it. Here are some alternatives.

For python projects there is Nbdev, which aims to solve a number of problems at once, including versioning notebooks.

It claims these advantages:

- A robust, **two-way sync between notebooks and source code**, which allows you to use your IDE for code navigation or quick edits if desired. …
- Tools for **merge/conflict resolution** with notebooks in a **human readable format**.

In addition there are other useful things:

- **Automatically generate docs** from Jupyter notebooks. These docs are searchable and automatically hyperlinked to appropriate documentation pages by introspecting keywords you surround in backticks.
- Utilities to **automate the publishing of pypi and conda packages**, including version number management.
- Ability to **write tests directly in notebooks** without having to learn special APIs. These tests get executed in parallel with a single CLI command. You can even define certain groups of tests such that you don’t have to always run long-running tests.
- **Continuous integration (CI) comes set up for you with GitHub Actions** out of the box, which will run tests automatically for you. Even if you are not familiar with CI or GitHub Actions, this starts working right away without any manual intervention.
- **Integration with GitHub Pages for docs hosting**: nbdev allows you to easily host your documentation for free, using GitHub pages.
- Create Python modules, following **best practices such as automatically defining `__all__`** (more details) with your exported functions, classes, and variables.
- **Math equation support** with LaTeX.

So I guess that is nice? I am faintly offended that the solution to working around jupyter’s attempt to “fix” plain text storage is to recreate plain text storage. Sadly, it does not make jupyter itself into an easier place to type code.

Also I can’t be bothered installing this in all my jupyter installations. Is there a more minimal solution?

You can automatically strip images and other big things from your notebook to keep them smaller and tidier if you are using git as your version control.
They still work, but if you restore the notebook from git it does not any longer have all the graphics in it, and is many megabytes smaller.
Usually this is fine, since you already have the code to generate them again*right there*, so you don’t necessarily want them around anyway.

Manually doing it is tedious. See how fastai does this automatically with git hooks. Not well explained, but it works. The quickest way, if we are not working for fastai, is nbstripout, upon which the fastai hack is AFAICT based. nbstripout includes its own installation script, which usually works, except not in git submodules; although nothing works in submodules, so no change there. You can set up attributes so that these filters and others are invoked automatically. It’s a surprisingly under-documented thing for some reason. Excluding certain cells from nbstripout filtering can be done several ways, including in the notebook itself. See the github issue on that theme.

**tl;dr** In the repository do this:

```
pip install nbstripout # or conda install nbstripout -c conda-forge
nbstripout --install # basic mode
nbstripout --install --attributes .gitattributes # slightly more explicit attributes
```

I do this for all my notebooks now. This doesn’t entirely solve the diffing and merging hurdles, but usually removes just enough pointless cruft that merging kind-of works fairly often.
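The core of what nbstripout does is simple enough to sketch with the stdlib (a toy version of the idea; the real tool also handles metadata, attachment formats and more):

```python
import json

def strip_outputs(nb):
    """Remove outputs and execution counts from code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# a toy notebook dict with one executed code cell
nb = {"cells": [{"cell_type": "code", "source": ["1 + 1"],
                 "outputs": [{"data": {"text/plain": ["2"]}}],
                 "execution_count": 3}]}
stripped = strip_outputs(nb)
print(json.dumps(stripped, indent=1))
```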

If we want images in the notebook to be small, we can tell matplotlib to use low-quality images:

```
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('jpeg', quality=70)
```

I had to go deep to find this. The answers were in the source of IPython.core.pylabtools.select_figure_formats and matplotlib_inline/backend_inline.py.

A tip from Erin Kenna for plotly is to keep versioned images small but allow larger ones sometimes.

```
#%%
import plotly.io as pio

# Pick a renderer
# https://plotly.com/python/renderers/
renderer = "plotly_mimetype"
# renderer = "jpeg"
if renderer == "jpeg":
    jpeg_renderer = pio.renderers['jpeg']
    jpeg_renderer.width = None
    jpeg_renderer.height = None
    jpeg_renderer.scale = 1.8  # increase the scale since we won’t have zoom controls
```

Then explicitly pass that to all plots.

```
#%%
import plotly.express as px

fig = px.imshow(hemi_hor_im)
fig.show(renderer=renderer)
```

This is obviously a little more manual and error-prone, but the power of being able to explicitly include some images is useful.

`jupytext`

Another way to make my notebooks something closer to text is jupytext. It claims to do that and more:

Wish you could edit [jupyter notebooks] in your favourite IDE? And get clear and meaningful diffs when doing version control? Then… Jupytext may well be the tool you’re looking for!

Jupytext can save Jupyter notebooks as Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, C++ and q/kdb+ scripts.

There are multiple ways to use jupytext:

Directly from Jupyter Notebook or JupyterLab. Jupytext provides a contents manager that allows Jupyter to save your notebook to your favorite format (`.py`, `.R`, `.jl`, `.md`, `.Rmd`…) in addition to (or in place of) the traditional `.ipynb` file. The text representation can be edited in your favorite editor. When you’re done, refresh the notebook in Jupyter: input cells are loaded from the text file, while output cells are reloaded from the `.ipynb` file if present. Refreshing preserves kernel variables, so you can resume your work in the notebook and run the modified cells without having to rerun the notebook in full.

On the command line. `jupytext` converts Jupyter notebooks to their text representation, and back. The command line tool can act on notebooks in many ways. It can synchronize multiple representations of a notebook, pipe a notebook into a reformatting tool like `black`, etc… It can also work as a pre-commit hook if you wish to automatically update the text representation when you commit the `.ipynb` file.
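To illustrate the kind of text representation involved, here is a toy sketch of rendering notebook cells in a percent-style script. `to_percent_script` is my own simplification for orientation, not jupytext’s actual converter (which also round-trips cell metadata and a YAML header):

```python
def to_percent_script(nb: dict) -> str:
    """Render notebook cells in a py:percent-style script (sketch only)."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + cell["source"])
        else:
            # Markdown cells become commented-out percent cells
            commented = "\n".join("# " + line for line in cell["source"].splitlines())
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"

nb = {"cells": [
    {"cell_type": "markdown", "source": "Some notes"},
    {"cell_type": "code", "source": "x = 1\nprint(x)"},
]}
script = to_percent_script(nb)
print(script)
```

The payoff is that the script half of the pair is plain text: greppable, diffable, and mergeable with ordinary tools.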

A plus is that search-and-replace would then work seamlessly across normal code and wacky notebook-encoded code, which is, I assure you, a constant irritation.

One downside is that if I develop a workflow about transforming my notebook back into*proper code* in order to run it, I might wonder if the notebook has gained me anything over ordinary literate coding except circuitous workarounds.
Am I then sure I do not secretly want to useknitr?
Pro tip: although that advertises itself for R, it supports python already.
That is worth mentioning again.
We do not need to be arsing about with this thing. We can leave.

Anyway, jupytext sounds promising, right? There is a downside for me which is that I just (2020-11-02) spent 90 minutes trying to get jupytext to work on my jupyter notebook and it continues to sullenly fail to function. I do not have any more time for debugging this nonsense. I might check back in a year or two, but for now this is dead to me.

Ok, I surrender. We are stuck with the nasty jupyter notebook format. Fine. nbdime provides diffing and merging for notebooks. It has git integration:

`nbdime config-git --enable --global`

I do not use this one because it seemed too slow on the large notebooks I was using and did not play well with my git GUI. In any case it does not seem to support 3-way merging, which means most merges fail and need manual intervention anyway.

I can host static versions easily using nbviewer (and github will do this automatically). For fancy variations I need to read how the document templates work. Here is a base LaTeX template for e.g. academic use.

For special occasions I could write my own or customize an existing exporter. Julius Schulz has virtuosic tips, e.g. using cell metadata to format figures like this:

```
{
  "caption": "somecaption",
  "label": "fig:somelabel",
  "widefigure": true
}
```

fast.ai’s nb2md trick renders jupyter for blogging with my blogging platform of choice. See also `jupytext` above.

Placeholder.

AFAICS, these are generative models that use score matching to learn and Langevin MCMC to sample. I am vaguely aware that this oversimplifies a rich and interesting history in which many useful techniques converged, but I am not invested enough to reconstruct the details.

Denoising score matching: Hyvärinen (2005). See score matching.
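For concreteness, the denoising variant replaces the intractable data score with the score of a corruption kernel. Given noisy samples \(\tilde{x} \sim q_{\sigma}(\tilde{x} \mid x)\), Vincent (2011) shows that minimising\[
J(\theta)=\mathbb{E}_{x \sim p,\; \tilde{x} \sim q_{\sigma}(\tilde{x} \mid x)}\left\|s_{\theta}(\tilde{x})-\nabla_{\tilde{x}} \log q_{\sigma}(\tilde{x} \mid x)\right\|^{2}
\]recovers the score of the noise-smoothed density; for Gaussian corruption \(q_{\sigma}(\tilde{x} \mid x)=\mathcal{N}(\tilde{x}; x, \sigma^{2} I)\) the regression target is simply \(\nabla_{\tilde{x}} \log q_{\sigma}(\tilde{x} \mid x)=(x-\tilde{x})/\sigma^{2}\), i.e. the model learns to point back towards the clean data.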

- What are Diffusion Models?
- Yang Song,Generative Modeling by Estimating Gradients of the Data Distribution
- Diffusion models are autoencoders – Sander Dieleman
- Denoising Diffusion-based Generative Modeling: Foundations and Applications
- Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications
- What's the score? – Review of latest Score Based Generative Modeling papers.

Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015).

Anderson, Brian D. O. 1982.“Reverse-Time Diffusion Equation Models.”*Stochastic Processes and Their Applications* 12 (3): 313–26.

Dhariwal, Prafulla, and Alex Nichol. 2021.“Diffusion Models Beat GANs on Image Synthesis.”*arXiv:2105.05233 [Cs, Stat]*, June.

Dutordoir, Vincent, Alan Saul, Zoubin Ghahramani, and Fergus Simpson. 2022.“Neural Diffusion Processes.” arXiv.

Han, Xizewen, Huangjie Zheng, and Mingyuan Zhou. 2022.“CARD: Classification and Regression Diffusion Models.” arXiv.

Ho, Jonathan, Ajay Jain, and Pieter Abbeel. 2020.“Denoising Diffusion Probabilistic Models.”*arXiv:2006.11239 [Cs, Stat]*, December.

Hoogeboom, Emiel, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, and Tim Salimans. 2021.“Autoregressive Diffusion Models.”*arXiv:2110.02037 [Cs, Stat]*, October.

Hyvärinen, Aapo. 2005.“Estimation of Non-Normalized Statistical Models by Score Matching.”*The Journal of Machine Learning Research* 6 (December): 695–709.

Jalal, Ajil, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon Tamir. 2021.“Robust Compressed Sensing MRI with Deep Generative Priors.” In*Advances in Neural Information Processing Systems*, 34:14938–54. Curran Associates, Inc.

Jolicoeur-Martineau, Alexia, Rémi Piché-Taillefer, Ioannis Mitliagkas, and Remi Tachet des Combes. 2022.“Adversarial Score Matching and Improved Sampling for Image Generation.” In.

Nichol, Alex, and Prafulla Dhariwal. 2021.“Improved Denoising Diffusion Probabilistic Models.”*arXiv:2102.09672 [Cs, Stat]*, February.

Sohl-Dickstein, Jascha, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015.“Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.”*arXiv:1503.03585 [Cond-Mat, q-Bio, Stat]*, November.

Song, Jiaming, Chenlin Meng, and Stefano Ermon. 2021.“Denoising Diffusion Implicit Models.”*arXiv:2010.02502 [Cs]*, November.

Song, Yang, Conor Durkan, Iain Murray, and Stefano Ermon. 2021.“Maximum Likelihood Training of Score-Based Diffusion Models.” In*Advances in Neural Information Processing Systems*.

Song, Yang, and Stefano Ermon. 2020a.“Generative Modeling by Estimating Gradients of the Data Distribution.” In*Advances In Neural Information Processing Systems*. arXiv.

———. 2020b.“Improved Techniques for Training Score-Based Generative Models.” In*Advances In Neural Information Processing Systems*. arXiv.

Song, Yang, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. 2019.“Sliced Score Matching: A Scalable Approach to Density and Score Estimation.” arXiv.

Song, Yang, Liyue Shen, Lei Xing, and Stefano Ermon. 2022.“Solving Inverse Problems in Medical Imaging with Score-Based Generative Models.” In. arXiv.

Song, Yang, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2022.“Score-Based Generative Modeling Through Stochastic Differential Equations.” In.

Swersky, Kevin, Marc’Aurelio Ranzato, David Buchman, Nando D. Freitas, and Benjamin M. Marlin. 2011.“On Autoencoders and Score Matching for Energy Based Models.” In*Proceedings of the 28th International Conference on Machine Learning (ICML-11)*, 1201–8.

Vincent, Pascal. 2011.“A connection between score matching and denoising autoencoders.”*Neural Computation* 23 (7): 1661–74.

Yang, Ling, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Ming-Hsuan Yang, and Bin Cui. 2022.“Diffusion Models: A Comprehensive Survey of Methods and Applications.” arXiv.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

A random-sampling variant/generalisation of the Kalman-Bucy filter. That description also fits particle filters, but the randomisation here is different from those; we can do both types of randomisation. This variant has a few tweaks that make it more tenable in tricky situations with high-dimensional state spaces or nonlinearities in inconvenient places. A popular data assimilation method for spatiotemporal models.

Katzfuss, Stroud, and Wikle (2016), Roth et al. (2017), and Fearnhead and Künsch (2018) are all pretty good. Schillings and Stuart (2017) is recommended by Haber, Lucka, and Ruthotto (2018) as the canonical modern version. Wikle and Berliner (2007) presents it in a broader data assimilation context, although it is too curt to be helpful for me. Mandel (2009) is slightly longer. The inventor of the method explains it in Geir Evensen (2003), but I could make neither head nor tail of that, since it uses too much oceanography terminology. Roth et al. (2017) is probably the best for my background. Let us copy their notation.

We start from the discrete-time state-space models; the basic one is the linear system\[ \begin{aligned} x_{k+1} &=F x_{k}+G v_{k}, \\ y_{k} &=H x_{k}+e_{k}, \end{aligned} \] with state\(x\in\mathbb{R}^n\) and measurement\(y\in\mathbb{R}^m\). The initial state\(x_{0}\), the process noise\(v_{k}\), and the measurement noise\(e_{k}\) are mutually independent such that\[\begin{aligned} \Ex x_{0}&=\hat{x}_{0}\\ \Ex v_{k}&=0\\ \Ex e_{k}&=0\\ \cov x_{0} &=P_{0}\\ \cov v_{k} & =Q\\ \cov e_{k}&=R \end{aligned}\] and all are Gaussian.

The Kalman filter propagates state estimates\(\hat{x}_{k \mid k}\) and covariance matrices\(P_{k \mid k}\) for this model.
The KF *time update*, also called the *prediction* or *forecast*, is given by the step\[
\begin{aligned}
&\hat{x}_{k+1 \mid k}=F \hat{x}_{k \mid k} \\
&P_{k+1 \mid k}=F P_{k \mid k} F^{\top}+G Q G^{\top}
\end{aligned}
\]
We predict the observations forward using these state estimates via\[
\begin{aligned}
\hat{y}_{k \mid k-1} &=H \hat{x}_{k \mid k-1}, \\
S_{k} &=H P_{k \mid k-1} H^{\top}+R .
\end{aligned}
\]
Given these and an actual observation, we update the state estimates using a *gain matrix* \(K_{k}\):\[
\begin{aligned}
\hat{x}_{k \mid k} &=\hat{x}_{k \mid k-1}+K_{k}\left(y_{k}-\hat{y}_{k \mid k-1}\right) \\
&=\left(I-K_{k} H\right) \hat{x}_{k \mid k-1}+K_{k} y_{k}, \\
P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1}\left(I-K_{k} H\right)^{\top}+K_{k} R K_{k}^{\top}.
\end{aligned}
\]
in what geoscience types refer to as an *analysis* update.
The variance-minimising gain is given by\[
K_{k}=P_{k \mid k-1} H^{\top} S_{k}^{-1}=M_{k} S_{k}^{-1},
\]
where\(M_{k}\) is the cross-covariance between the state and output predictions.
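These steps can be sketched in a few lines of numpy; `kf_step` and the scalar random-walk example are mine, for orientation only, using the same \(F, G, H, Q, R\) notation as above:

```python
import numpy as np

def kf_step(x_hat, P, y, F, G, H, Q, R):
    """One time update plus measurement update of the linear Kalman filter."""
    # Time update / forecast
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # Predicted observation and its covariance
    y_pred = H @ x_pred
    S = H @ P_pred @ H.T + R
    # Variance-minimising gain and analysis update (Joseph-form covariance)
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - y_pred)
    I_KH = np.eye(len(x_hat)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_new, P_new

# Scalar random walk observed with noise
F = G = H = np.eye(1)
Q = np.array([[0.01]])
R = np.array([[1.0]])
x_hat, P = np.zeros(1), np.eye(1)
x_hat, P = kf_step(x_hat, P, np.array([2.0]), F, G, H, Q, R)
```

After one step the estimate moves partway towards the observation and the posterior variance shrinks, as expected.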

In the Ensemble Kalman filter, we approximate some of these quantities of interest using samples; this allows us to relax the assumption of Gaussianity and gets us computational savings in certain problems of interest. That does sound very similar to particle filters, and indeed there is a relation.

Instead of maintaining the\(n\)-dimensional estimate\(\hat{x}_{k \mid k}\) and the\(n \times n\) covariance\(P_{k \mid k}\) as such, we maintain an ensemble of\(N<n\) sampled state realizations\[X_{k}:=\left[x_{k}^{(i)}\right]_{i=1}^{N}.\]
This notation is intended to imply that we are treating these realisations as an\(n \times N\) matrix\(X_{k \mid k}\) with columns\(x_{k}^{(i)}\).
We introduce the following notation for ensemble moments:\[
\begin{aligned}
&\bar{x}_{k \mid k}=\frac{1}{N} X_{k \mid k} \one \\
&\bar{P}_{k \mid k}=\frac{1}{N-1} \widetilde{X}_{k \mid k} \widetilde{X}_{k \mid k}^{\top},
\end{aligned}
\]
where\(\one=[1, \ldots, 1]^{\top}\) is an\(N\)-dimensional vector and\[
\widetilde{X}_{k \mid k}=X_{k \mid k}-\bar{x}_{k \mid k} \one^{\top}=X_{k \mid k}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right)
\]
is an ensemble of *anomalies*/*deviations* from \(\bar{x}_{k \mid k}\), which I would call the *centred version*.
We attempt to match the moments of the ensemble with those realised by a true Kalman filter, in the sense that\[
\begin{aligned}
&\bar{x}_{k \mid k}:=\frac{1}{N} \sum_{i=1}^{N} x_{k}^{(i)} \approx \hat{x}_{k \mid k}, \\
&\bar{P}_{k \mid k}:=\frac{1}{N-1} \sum_{i=1}^{N}\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)\left(x_{k}^{(i)}-\bar{x}_{k \mid k}\right)^{\top} \approx P_{k \mid k} .
\end{aligned}
\]
The forecast step computes\(X_{k+1 \mid k}\) such that its moments are close to\(\hat{x}_{k+1 \mid k}\) and\(P_{k+1 \mid k}\).
An ensemble of\(N\) independent process noise realizations\(V_{k}:=\left[v_{k}^{(i)}\right]_{i=1}^{N}\) with zero mean and covariance\(Q\), is used in\[
X_{k+1 \mid k}=F X_{k \mid k}+G V_{k}.
\]
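A quick numpy check of the anomaly and moment definitions above, confirming that the centering-matrix form agrees with plainly subtracting the mean (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 10                                  # state dimension, ensemble size
X = rng.normal(size=(n, N))                   # ensemble members as columns
one = np.ones((N, 1))

x_bar = (X @ one / N).ravel()                 # ensemble mean
X_tilde = X @ (np.eye(N) - one @ one.T / N)   # anomalies via centering matrix
P_bar = X_tilde @ X_tilde.T / (N - 1)         # ensemble covariance

# The centering-matrix form equals subtracting the mean from each column,
# and the resulting covariance matches numpy's sample covariance.
assert np.allclose(X_tilde, X - x_bar[:, None])
assert np.allclose(P_bar, np.cov(X))
```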

Next, \(X_{k \mid k-1}\) is adjusted to obtain the filtering ensemble \(X_{k \mid k}\) by updating each ensemble member: with some gain matrix \(\bar{K}_{k}\), the KF update is applied to the ensemble via\[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top} . \] This does not yet approximate the covariance update of the full Kalman filter; there is no term \(\bar{K}_{k} R \bar{K}_{k}^{\top}\). We have a choice of how to implement that.

In the stochastic method, we use artificial zero-mean measurement noise realizations\(E_{k}:=\left[e_{k}^{(i)}\right]_{i=1}^{N}\) with covariance\(R\).\[ X_{k \mid k}=\left(I-\bar{K}_{k} H\right) X_{k \mid k-1}+\bar{K}_{k} y_{k} \one^{\top}-\bar{K}_{k} E_{k} . \] The resulting\(X_{k \mid k}\) has the correct ensemble mean and covariance,\(\hat{x}_{k \mid k}\) and\(P_{k \mid k}\).

If we define a predicted output ensemble\[ Y_{k \mid k-1}=H X_{k \mid k-1}+E_{k} \] that evokes the classic Kalman quantities \(\hat{y}_{k \mid k-1}\) and \(S_{k}\) (and encapsulates information about them), we can rewrite this update into one that resembles the Kalman update:\[ X_{k \mid k}=X_{k \mid k-1}+\bar{K}_{k}\left(y_{k} \one^{\top}-Y_{k \mid k-1}\right) . \]

Now, the gain matrix\(\bar{K}_{k}\) in the classic KF is computed from the covariance matrices of the predicted state and output. In the EnKF, the required\(M_{k}\) and\(S_{k}\) must be estimated from the prediction ensembles. The obvious way of doing that is to once again centre the ensemble,\[ \begin{aligned} &\widetilde{X}_{k \mid k-1}=X_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \\ &\widetilde{Y}_{k \mid k-1}=Y_{k \mid k-1}\left(I_{N}-\frac{1}{N} \one \one^{\top}\right) \end{aligned} \] and use the sample covariances\[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}, \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} . \end{aligned} \] The gain\(\bar{K}_{k}\) is then the solution to the system of linear equations,\[ \bar{K}_{k} \widetilde{Y}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top}=\widetilde{X}_{k \mid k-1} \widetilde{Y}_{k \mid k-1}^{\top} \]
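Putting the pieces together, here is a minimal numpy sketch of the stochastic-EnKF analysis step as written above. `enkf_analysis` and the toy one-dimensional example are my own, not lifted from any of the cited papers:

```python
import numpy as np

def enkf_analysis(X_pred, y, H, R, rng):
    """Stochastic EnKF measurement update: noise is added to the predicted
    outputs, matching Y = H X + E in the text."""
    n, N = X_pred.shape
    E = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T  # noise ensemble
    Y_pred = H @ X_pred + E                      # predicted output ensemble
    C = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    Xt, Yt = X_pred @ C, Y_pred @ C              # anomalies
    M_bar = Xt @ Yt.T / (N - 1)                  # state-output cross covariance
    S_bar = Yt @ Yt.T / (N - 1)                  # output covariance
    # Solve K S = M for the gain rather than inverting S explicitly
    K_bar = np.linalg.solve(S_bar.T, M_bar.T).T
    return X_pred + K_bar @ (y[:, None] - Y_pred)

rng = np.random.default_rng(42)
# Prior ensemble far from the observation; a near-exact measurement pulls it in
X_pred = rng.normal(0.0, 1.0, size=(1, 200))
X_post = enkf_analysis(X_pred, np.array([5.0]), np.eye(1), np.array([[0.01]]), rng)
```

Note that no \(n \times n\) covariance matrix is ever formed; everything is built from the \(n \times N\) anomaly matrices, which is the whole point in high dimensions.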

Resemblance to unscented/sigma-point filtering also apparent. TBD.

The additive measurement noise model we have used the\(e_{k}\) for should not affect the cross covariance\(M_k\). Thus it is reasonable to make the substitution\[ \widetilde{Y}_{k \mid k-1}\longrightarrow \widetilde{Z}_{k \mid k-1}=H \widetilde{X}_{k \mid k-1} \] to get a less noisy update\[ \begin{aligned} \bar{M}_{k} &=\frac{1}{N-1} \widetilde{X}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top} \\ \bar{S}_{k} &=\frac{1}{N-1} \widetilde{Z}_{k \mid k-1} \widetilde{Z}_{k \mid k-1}^{\top}+R \end{aligned} \] The Kalman gain\(\bar{K}_{k}\) is then computed as in the KF. Or we can take it to be a matrix square-root\(R^{\frac{1}{2}}\) with\(R^{\frac{1}{2}} R^{\frac{\top}{2}}=R\) and then factorize\[ \bar{S}_{k}=\left[\begin{array}{cc}\frac{1}{\sqrt{N-1}} \widetilde{Z}_{k \mid k-1}\quad R^{\frac{1}{2}}\end{array}\right] \left[\begin{array}{c}\frac{1}{\sqrt{N-1}} \widetilde{Z}^{\top}_{k \mid k-1} \\ R^{\frac{\top}{2}}\end{array}\right]. \]

TBD: EAKF and ETKF (Tippett et al. 2003), which deterministically propagate an estimate\[ P_{k \mid k}^{\frac{1}{2}} P_{k \mid k}^{\frac{\top}{2}}=P_{k \mid k} \] and thereby introduce less sampling noise. Roth et al. (2017) explain it as rewriting the measurement update to use a square root \(P_{k \mid k-1}^{\frac{1}{2}}\), and in particular the ensemble approximation \(\frac{1}{\sqrt{N-1}} \widetilde{X}_{k \mid k-1}\):\[ \begin{aligned} P_{k \mid k} &=\left(I-K_{k} H\right) P_{k \mid k-1} \\ &=P_{k \mid k-1}^{\frac{1}{2}}\left(I-P_{k \mid k-1}^{\frac{\top}{2}} H^{\top} S_{k}^{-1} H P_{k \mid k-1}^{\frac{1}{2}}\right) P_{k \mid k-1}^{\frac{\top}{2}} \\ & \approx \frac{1}{N-1} \widetilde{X}_{k \mid k-1}\left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right) \widetilde{X}_{k \mid k-1}^{\top}. \end{aligned} \] Factorising\[ \left(I-\frac{1}{N-1} \widetilde{Z}_{k \mid k-1}^{\top} \bar{S}_{k}^{-1} \widetilde{Z}_{k \mid k-1}\right)=\Pi_{k}^{\frac{1}{2}} \Pi_{k}^{\frac{\top}{2}}, \] the \(\Pi_{k}^{\frac{1}{2}}\in\mathbb{R}^{N\times N}\) can be used to create a deviation ensemble\[ \widetilde{X}_{k \mid k}=\widetilde{X}_{k \mid k-1} \Pi_{k}^{\frac{1}{2}} \] that correctly encodes \(P_{k \mid k}\) without using random perturbations. The actual filtering is achieved by updating the ensemble mean according to\[ \bar{x}_{k \mid k}=\left(I-\bar{K}_{k} H\right) F \bar{x}_{k-1 \mid k-1}+\bar{K}_{k} y_{k}, \] where \(\bar{K}_{k}\) is computed from the deviation ensembles.

TBD. Permits calculating the operations without forming covariance matrices.

TBD

The ensemble is rank deficient. Question: When can we sample other states from the ensemble to improve the rank by stationary posterior moves?

TBD

Katzfuss, Stroud, and Wikle (2016) claims there are two major approaches to smoothing: Stroud et al. (2010)-type reverse methods, and the EnKS (Geir Evensen and van Leeuwen 2000), which augments the states with lagged copies rather than doing a reverse pass.

Here are some other papers I saw: N. K. Chada, Chen, and Sanz-Alonso (2021); Luo et al. (2015); White (2018); Zhang et al. (2018).

Can we use ensemble methods for online parameter estimation? Apparently. G. Evensen (2009); Malartic, Farchi, and Bocquet (2021); Moradkhani et al. (2005); Fearnhead and Künsch (2018).

Bishop and Del Moral (2020);Del Moral, Kurtzmann, and Tugaut (2017);Garbuno-Inigo et al. (2020);Kelly, Law, and Stuart (2014);Le Gland, Monbet, and Tran (2009);Taghvaei and Mehta (2019).

Intimate. See particle filters.

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant version which looks somehow more general than the original but also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.

Anderson, Jeffrey L. 2009.“Ensemble Kalman Filters for Large Geophysical Applications.”*IEEE Control Systems Magazine* 29 (3): 66–82.

Bishop, Adrian N., and Pierre Del Moral. 2020.“On the Mathematical Theory of Ensemble (Linear-Gaussian) Kalman-Bucy Filtering.”*arXiv:2006.08843 [Math, Stat]*, June.

Chada, Neil K., Yuming Chen, and Daniel Sanz-Alonso. 2021.“Iterative Ensemble Kalman Methods: A Unified Perspective with Some New Variants.”*Foundations of Data Science* 3 (3): 331.

Chada, Neil, and Xin Tong. 2022.“Convergence Acceleration of Ensemble Kalman Inversion in Nonlinear Settings.”*Mathematics of Computation* 91 (335): 1247–80.

Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022.“A Novel Neural Network Training Framework with Data Assimilation.”*The Journal of Supercomputing*, June.

Chen, Yuming, Daniel Sanz-Alonso, and Rebecca Willett. 2021.“Auto-Differentiable Ensemble Kalman Filters.”*arXiv:2107.07687 [Cs, Stat]*, July.

Del Moral, P., A. Kurtzmann, and J. Tugaut. 2017.“On the Stability and the Uniform Propagation of Chaos of a Class of Extended Ensemble Kalman–Bucy Filters.”*SIAM Journal on Control and Optimization* 55 (1): 119–55.

Dubrule, Olivier. 2018.“Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In*Handbook of Mathematical Geosciences: Fifty Years of IAMG*, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.

Evensen, G. 2009.“The Ensemble Kalman Filter for Combined State and Parameter Estimation.”*IEEE Control Systems* 29 (3): 83–104.

Evensen, Geir. 2003.“The Ensemble Kalman Filter: Theoretical Formulation and Practical Implementation.”*Ocean Dynamics* 53 (4): 343–67.

———. 2004.“Sampling Strategies and Square Root Analysis Schemes for the EnKF.”*Ocean Dynamics* 54 (6): 539–60.

———. 2009.*Data Assimilation - The Ensemble Kalman Filter*. Berlin; Heidelberg: Springer.

Evensen, Geir, and Peter Jan van Leeuwen. 2000.“An Ensemble Kalman Smoother for Nonlinear Dynamics.”*Monthly Weather Review* 128 (6): 1852–67.

Fearnhead, Paul, and Hans R. Künsch. 2018.“Particle Filters and Data Assimilation.”*Annual Review of Statistics and Its Application* 5 (1): 421–49.

Finn, Tobias Sebastian, Gernot Geppert, and Felix Ament. 2021.“Ensemble-Based Data Assimilation of Atmospheric Boundary Layerobservations Improves the Soil Moisture Analysis.” Preprint. Catchment hydrology/Modelling approaches.

Garbuno-Inigo, Alfredo, Franca Hoffmann, Wuchen Li, and Andrew M. Stuart. 2020.“Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.”*SIAM Journal on Applied Dynamical Systems* 19 (1): 412–41.

Grooms, Ian, and Gregor Robinson. 2021.“A Hybrid Particle-Ensemble Kalman Filter for Problems with Medium Nonlinearity.”*PLOS ONE* 16 (3): e0248266.

Guth, Philipp A., Claudia Schillings, and Simon Weissmann. 2020.“Ensemble Kalman Filter for Neural Network Based One-Shot Inversion.” arXiv.

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018.“Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.”*arXiv:1805.08034 [Cs, Math]*, May.

Hou, Elizabeth, Earl Lawrence, and Alfred O. Hero. 2016.“Penalized Ensemble Kalman Filters for High Dimensional Non-Linear Systems.”*arXiv:1610.00195 [Physics, Stat]*, October.

Kantas, Nikolas, Arnaud Doucet, Sumeetpal S. Singh, Jan Maciejowski, and Nicolas Chopin. 2015.“On Particle Methods for Parameter Estimation in State-Space Models.”*Statistical Science* 30 (3): 328–51.

Katzfuss, Matthias, Jonathan R. Stroud, and Christopher K. Wikle. 2016.“Understanding the Ensemble Kalman Filter.”*The American Statistician* 70 (4): 350–57.

Kelly, D. T. B., K. J. H. Law, and A. M. Stuart. 2014.“Well-Posedness and Accuracy of the Ensemble Kalman Filter in Discrete and Continuous Time.”*Nonlinearity* 27 (10): 2579.

Kovachki, Nikola B., and Andrew M. Stuart. 2019.“Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.”*Inverse Problems* 35 (9): 095005.

Kuzin, Danil, Le Yang, Olga Isupova, and Lyudmila Mihaylova. 2018.“Ensemble Kalman Filtering for Online Gaussian Process Regression and Learning.”*2018 21st International Conference on Information Fusion (FUSION)*, July, 39–46.

Lakshmivarahan, S., and David J. Stensrud. 2009.“Ensemble Kalman Filter.”*IEEE Control Systems Magazine* 29 (3): 34–46.

Law, Kody J. H., Hamidou Tembine, and Raul Tempone. 2016.“Deterministic Mean-Field Ensemble Kalman Filtering.”*SIAM Journal on Scientific Computing* 38 (3).

Le Gland, François, Valerie Monbet, and Vu-Duc Tran. 2009.“Large Sample Asymptotics for the Ensemble Kalman Filter,” 25.

Lei, Jing, Peter Bickel, and Chris Snyder. 2009.“Comparison of Ensemble Kalman Filters Under Non-Gaussianity.”*Monthly Weather Review* 138 (4): 1293–1306.

Luo, Xiaodong, Andreas S. Stordal, Rolf J. Lorentzen, and Geir Nævdal. 2015.“Iterative Ensemble Smoother as an Approximate Solution to a Regularized Minimum-Average-Cost Problem: Theory and Applications.”*SPE Journal* 20 (05): 962–82.

Malartic, Quentin, Alban Farchi, and Marc Bocquet. 2021.“State, Global and Local Parameter Estimation Using Local Ensemble Kalman Filters: Applications to Online Machine Learning of Chaotic Dynamics.”*arXiv:2107.11253 [Nlin, Physics:physics, Stat]*, July.

Mandel, Jan. 2009.“A Brief Tutorial on the Ensemble Kalman Filter.”*arXiv:0901.3725 [Physics]*, January.

Moradkhani, Hamid, Soroosh Sorooshian, Hoshin V. Gupta, and Paul R. Houser. 2005.“Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter.”*Advances in Water Resources* 28 (2): 135–47.

Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“Constant-Time Predictive Distributions for Gaussian Processes.” In. arXiv.

Popov, Andrey Anatoliyevich. 2022.“Combining Data-Driven and Theory-Guided Models in Ensemble Data Assimilation.” ETD. Virginia Tech.

Reich, Sebastian, and Simon Weissmann. 2019.“Fokker-Planck Particle Systems for Bayesian Inference: Computational Approaches,” November.

Roth, Michael, Gustaf Hendeby, Carsten Fritsche, and Fredrik Gustafsson. 2017.“The Ensemble Kalman Filter: A Signal Processing Perspective.”*EURASIP Journal on Advances in Signal Processing* 2017 (1): 56.

Schillings, Claudia, and Andrew M. Stuart. 2017.“Analysis of the Ensemble Kalman Filter for Inverse Problems.”*SIAM Journal on Numerical Analysis* 55 (3): 1264–90.

Stroud, Jonathan R., Michael L. Stein, Barry M. Lesht, David J. Schwab, and Dmitry Beletsky. 2010.“An Ensemble Kalman Filter and Smoother for Satellite Data Assimilation.”*Journal of the American Statistical Association* 105 (491): 978–90.

Taghvaei, Amirhossein, and Prashant G. Mehta. 2019.“An Optimal Transport Formulation of the Ensemble Kalman Filter,” October.

———. 2021.“An Optimal Transport Formulation of the Ensemble Kalman Filter.”*IEEE Transactions on Automatic Control* 66 (7): 3052–67.

Tippett, Michael K., Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, and Jeffrey S. Whitaker. 2003.“Ensemble Square Root Filters.”*Monthly Weather Review* 131 (7): 1485–90.

Ubaru, Shashanka, Jie Chen, and Yousef Saad. 2017.“Fast Estimation of\(tr(f(A))\) via Stochastic Lanczos Quadrature.”*SIAM Journal on Matrix Analysis and Applications* 38 (4): 1075–99.

White, Jeremy T. 2018.“A Model-Independent Iterative Ensemble Smoother for Efficient History-Matching and Uncertainty Quantification in Very High Dimensions.”*Environmental Modelling & Software* 109 (November): 191–201.

Wikle, Christopher K., and L. Mark Berliner. 2007.“A Bayesian Tutorial for Data Assimilation.”*Physica D: Nonlinear Phenomena*, Data Assimilation, 230 (1): 1–16.

Yang, Biao, Jonathan R. Stroud, and Gabriel Huerta. 2018.“Sequential Monte Carlo Smoothing with Parameter Estimation.”*Bayesian Analysis* 13 (4): 1137–61.

Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020.“Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In*Machine Learning, Optimization, and Data Science*, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.

Zhang, Jiangjiang, Guang Lin, Weixuan Li, Laosheng Wu, and Lingzao Zeng. 2018.“An Iterative Local Updating Ensemble Smoother for Estimation and Uncertainty Assessment of Hydrologic Model Parameters With Multimodal Distributions.”*Water Resources Research* 54 (3): 1716–33.

I am especially interested in modeling how technology makes profound changes to the *rules* of
the game, not just marginal changes in some parameters.
So not, say, residual stochastic shocks (as in “Real Business Cycle” models), or the slope of a
marginal cost of production curve (as in textbook microeconomics).
That is, technological innovation that leads to a *qualitative*, rather than
incremental, change in the state of play, while respecting that a lot of marginal
changes might indeed lead to major qualitative changes.

In recognition of that emphasis, I briefly called this entry “disruptive technology” instead of mere “innovation”, but then I felt like a TED speaker and woke up sweating in the night and changed it.

This is a vibrant area at the moment, readily disrupting itself. Brr. This notebook will probably explode soon and sporulate, leaving behind some smaller ones.

This concept, innovation, is at the very limit of modelability, surely? The introduction of a new technology has many components, from social uptake, to supply chains, to the discovery process, to the unexpected interactions with the other technologies out there. The internal combustion engine changed more than just transit times. The computer network altered more than just mail delivery times.

The cascade of effects from any one alteration is likely unknowable in advance, but might have some regularities, or at least some kind of underlying set of distributions as a stochastic process: some kind of branching process, perhaps? Fixation processes, by analogy with evolutionary theory?

Gregory Clark and Julia Galef in podcast conversation, What caused the industrial revolution?:

the timing in 1770 in Britain makes it very, very difficult to explain the industrial revolution. The reason for that is that Britain at that time was institutionally a very stable society, and essentially had very little institutional change in the previous 80 years. When you’re trying to explain this event, it’s occurring against the kind of unchanged background of a society… with stable institutions. Very small government that mainly exists to fight wars abroad. You have very stable wages within the society, they’re really not changing, the cost of capital was not changing. [..] It’s an economic environment which just looks very flat. Suddenly, in the middle of all of this, you’ve got this transforming event occurring.

A well documented example of this is the “Product space” model, due originally to Hidalgo and Hausmann, and made purportedly more rigorous by Caldarelli et al.

Considers products and nations in a bipartite graph, and does various network statistics upon it.

Attempts to be predictive about the “natural level” of a country’s GDP.

(cf. Felix Reed-Tsochas’ affinity for such graphs, har har.) Note that there is an implicit third part in the graph, to wit “capabilities”, which represent the infrastructure to manufacture products.

Frank Schweitzer et al. have a similar notion of inter-firm R&D networks, which may be related? See references.

*random idea*:
Estimating the number of SKUs as a surrogate for the divisions of a modern economy, à la Beinhocker
(there is lots of research into this because of Long Tail theories,
though the primary data is rarely included; I might chase this up).

This is a cute model of certain stylised types of overhyping.

The hype cycle itself is presumably on the plateau of productivity right now since Gartner has a steady business producing diagrams based on it.

See Kevin Bryan’s paper reviews; Models of Innovation I: The Patent Race in particular discusses a lot of papers which I should file one day:

I’ve been going through some old literature on innovation again as part of a current project, so I figured I ought put up a little review of this literature. I’ll cover five strands: the patent race, the partial equilibrium/auction, the quality ladder, sequential innovation a la Scotchmer and Green, and bandit experimentation.

Moore’s law versus Eroom’s law governing trends in marginal research productivity. What does the paucity of new drugs mean? Is this the same as the problem in science? The difficulties form the basis of François Chollet’s arguments against the likelihood of a hard AI singularity (The Singularity is not coming); relate this to the question of a technological singularity.

Is science slowing down? Experiencing diminishing returns? Are good ideas getting harder to find? (Bloom et al. 2020)

Is it just me, or does this resemble a maximal statistic, or perhaps a rarefaction curve? See Scannell et al. (2012).

Do technological improvements primarily result in lower prices for consumers or in higher profits for producers? If producers are able to capture (or appropriate) most of the social returns to innovation, then profits will rise and prices will fall relatively little.

How much of the profits from a new technology are captured by innovators will vary greatly across industries. For sectors where knowledge is in the public domain, such as weather forecasting, the new knowledge cannot be appropriated and productivity improvements are passed on in lower prices. In other industries with well-defined products and strong patents, such as pharmaceuticals, producers may be successful in capturing a large fraction of social gains in “Schumpeterian profits.”

Danny Crichton,The dual PhD problem of today’s startups:

Software is so democratized today, we forget just how blisteringly difficult almost all other facets of human endeavor are to even start. A middle schooler can build and deploy a web service scalable to millions of people with some lines of code (learned from easily and widely accessible resources on the internet) and some basic cloud infrastructure tools that are designed to onboard new users expeditiously.

Try that with rocketry. Or with pharma. Or with autonomous vehicles. Or any of the interesting new frontiers with green fields that are just sitting there waiting for the taking.

Andreessen Horowitz critiques AI startup hype that claims we have a unicorn factory.

I found that the marginal returns of researchers are rapidly declining. There is what’s called a ‘standing on toes’ effect: researcher productivity declines as the field grows. Because ML has recently grown very quickly, this makes better ML models much harder to find.

His thesis (Besiroglu 2020) goes into depth.

Rebranded: Danny Dorling, Slowdown: The End of the Great Acceleration—and Why It’s Good for the Planet, the Economy, and Our Lives.

Jason Crawford,Technological stagnation: Why I came around

Thiel, P. A. (2014).

*Zero to one: notes on startups, or how to build the future*: “Thiel begins with the contrarian premise that we live in an age of technological stagnation, even if we’re too distracted by shiny mobile devices to notice. Information technology has improved rapidly, but there is no reason why progress should be limited to computers or Silicon Valley. Progress can be achieved in any industry or area of business. It comes from the most important skill that every leader must master: learning to think for yourself.

Doing what someone else already knows how to do takes the world from 1 to n, adding more of something familiar. But when you do something new, you go from 0 to 1. The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won’t make a search engine. Tomorrow’s champions will not win by competing ruthlessly in today’s marketplace. They will escape competition altogether, because their businesses will be unique.”

I gather this is re-introducing Austrian economics to the silicon valley age. The sting will be in the policy prescriptions.

Sam Kriss is glum and hyperbolical as always. The Long, Slow, Rotten March of Progress:

None of these start-ups are doing anything new or interesting. Which shouldn’t be surprising: how often does anyone have a really good idea? What you actually get is just code, sloshing around, congealing into apps and firms that exist simply to exist. Uber for dogs, GrubHub for clothes, Patreon for sex, Slack for death, PayPal for God, WhatsApp for the spaceless non-void into which a blind universe expands.

I wonder how well supported is Hartmann, Krabbe, and Spicer (2019):

What is driving the declining quality of innovation-driven entrepreneurship? In this paper, we argue the growing entrepreneurship industry is an important yet overlooked explanation. This rapidly growing industry has transformed the nature of entrepreneurship and encouraged a particular form of low-quality entrepreneurship. It has done so by leveraging the Ideology of Entrepreneurialism to mass-produce and mass-market products that make possible what we term Veblenian Entrepreneurship. This is entrepreneurship pursued primarily as a form of conspicuous consumption. Aside from lowering average entrepreneurial quality, Veblenian Entrepreneurship has a range of (short-run) positive and (medium and long-run) negative effects for both individuals and society at large. We argue that the rise of the Veblenian Entrepreneur has contributed to creating an increasingly Untrepreneurial Economy. That is an economy which superficially appears innovation-driven and dynamic, but is actually rife with inefficiencies and unable to generate economically meaningful growth through innovation.

As someone who has worked in what I cannot call a dotcom exit scam (because it did certain necessary tasks to be legally distinct from a scam and yet not ever enough to plausibly succeed) I am sympathetic to rants about this.

See astroturf and artificial reefs for now.

This needs tidying.

Originally(?) proposed by Patrick Collison and Tyler Cowen, We Need a New Science of Progress. The idea invites critique; that is kind of its purpose.

- Shannon Dea and Ted McCormick, Can ‘progress studies’ contribute to knowledge? History suggests caution.
- Progress Studies: A Discipline is a Set of Institutional Norms
- Progress Studies
- How does progress happen?
- Progress Studies – A blog about the causes and consequences of Progress, Economic Growth, Technological Change, and Innovation
- The Moral Foundations of Progress

See stroppy people.

Industrial Policy, Innovation, and Decline with Ben Landau-Taylor

On expected utility, part 1: Skyscrapers and madmen – Hands and Cities

Intellectual Property systems

Mathematical Model Reveals the Patterns of How Innovations Arise:

So Loreto, Strogatz, and co have modified Pólya’s urn model to account for the possibility that discovering a new color in the urn can trigger entirely unexpected consequences. They call this model “Pólya’s urn with innovation triggering.” [..they] then calculate how the number of new colors picked from the urn, and their frequency distribution, changes over time. The result is that the model reproduces Heaps’ and Zipf’s Laws as they appear in the real world.

“reveal” is excessive. But it is cute for sure.
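The mechanics are simple enough to simulate. Below is a hedged Python sketch of such an urn: each draw reinforces the drawn colour with ρ extra copies, and the first draw of any colour injects ν + 1 balls of brand-new colours. The parameter names loosely follow the Tria et al. papers in the references; the initial condition and values are assumptions for illustration.

```python
import random

def urn_with_triggering(steps, rho=2, nu=1, seed=0):
    """Simulate a Pólya urn with innovation triggering.

    Each draw returns the ball plus `rho` copies of its colour
    (reinforcement); the first time a colour is drawn, `nu + 1` balls of
    brand-new colours enter the urn (the "adjacent possible"). Returns
    the number of distinct colours seen after each draw.
    """
    rng = random.Random(seed)
    urn = [0]            # start with a single ball of colour 0 (assumed)
    next_colour = 1
    seen = set()
    distinct = []
    for _ in range(steps):
        ball = rng.choice(urn)
        urn.extend([ball] * rho)              # reinforce the drawn colour
        if ball not in seen:                  # a novelty triggers more novelties
            seen.add(ball)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
        distinct.append(len(seen))
    return distinct

growth = urn_with_triggering(5000)
print(growth[-1])
```

Plotting `growth` against the draw count on log-log axes should show the sublinear, Heaps’-law-like growth of the number of distinct colours that the quote describes.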

Philip Ball has ideas on What innovation really is.

Mariana Mazzucato seems to be interesting.

The Myth of the Myth of the Lone Genius is a little bit angry that narratives about collective endeavour have eclipsed lone genius narratives. There is a lot of terminological wrangling. My opinions, in my own terminology, which might be compatible with the argument and also with the argument this person is attempting to refute is “geniuses are real, I promise, I have met some. Also, they get stuff done, but in practice they get it done in a social context and sometimes that stuff is crazy and/or useless. Also teams of non-geniuses get stuff done, which can also, to be fair, be crazy and/or useless.” This summary lacks pith and invective, though.

Bert Hubert on innovation

How technology loses out in companies, countries & continents and what to do about it

Aghion, Philippe, Nicholas Bloom, Richard Blundell, Rachel Griffith, and Peter Howitt. 2002.“Competition and Innovation: An Inverted U Relationship.” Working Paper 9269. National Bureau of Economic Research.

Aghion, Philippe, Christopher Harris, Peter Howitt, and John Vickers. 2001.“Competition, Imitation and Growth with Step-by-Step Innovation.”*The Review of Economic Studies* 68 (3): 467–92.

Arbesman, Samuel, and Nicholas A Christakis. 2011.“Eurekometrics: Analyzing the Nature of Discovery.”*PLoS Comput Biol* 7 (6): e1002072.

Arbilly, Michal, and Kevin N. Laland. 2017.“The Magnitude of Innovation and Its Evolution in Social Animals.”*Proceedings of the Royal Society B: Biological Sciences* 284 (1848).

Arthur, W Brian. 2007.“The Structure of Invention.”*Research Policy* 36 (2): 274–87.

Arthur, W. Brian. 1989.“Competing Technologies, Increasing Returns, and Lock-In by Historical Events.”*The Economic Journal* 99 (394): 116–31.

Baber, Zaheer. 2010.“Society: The Rise of the‘Technium’.”*Nature* 468 (7322): 372.

Behrens, Arno, Stefan Giljum, Jan Kovanda, and Samuel Niza. 2007.“The Material Basis of the Global Economy: Worldwide Patterns of Natural Resource Extraction and Their Implications for Sustainable Resource Use Policies.”*Ecological Economics*, Special Section - Ecosystem Services and Agriculture Ecosystem Services and Agriculture, 64 (2): 444–53.

Beinhocker, Eric D. 2007.*Origin of Wealth: Evolution, Complexity, and the Radical Remaking of Economics*. Harvard Business Press.

Beinhocker, Eric D. 2011.“Evolution as Computation: Integrating Self-Organization with Generalized Darwinism.”*Journal of Institutional Economics* 7 (Special Issue 03): 393–423.

Benkler, Yochai. 2017.“Law, Innovation, and Collaboration in Networked Economy and Society.”*Annual Review of Law and Social Science* 13 (1): 231–50.

Besiroglu, Tamay. 2020.“Are Models Getting Harder to Find?”

Bhattacharya, Jay, and Mikko Packalen. 2020.“Stagnation and Scientific Incentives.” Working Paper 26752. National Bureau of Economic Research.

Bloom, Nicholas, Charles I. Jones, John Van Reenen, and Michael Webb. 2020.“Are Ideas Getting Harder to Find?”*American Economic Review* 110 (4): 1104–44.

Bloom, Nicholas, John Van Reenen, and Heidi Williams. 2019.“A Toolkit of Policies to Promote Innovation.”*Journal of Economic Perspectives* 33 (3): 163–84.

Castellani, Brian, and Rajeev Rajaram. n.d.“How Large Must a Population Be to Accomplish Great Things?” 15.

Chu, Johan S. G., and James A. Evans. 2021.“Slowed Canonical Progress in Large Fields of Science.”*Proceedings of the National Academy of Sciences* 118 (41): e2021636118.

Collison, Patrick, and Michael Nielsen. 2018.“Science Is Getting Less Bang for Its Buck.”*The Atlantic*, November 16, 2018.

Cristelli, Matthieu, Andrea Tacchella, and Luciano Pietronero. 2015.“The Heterogeneous Dynamics of Economic Complexity.”*PLoS ONE* 10 (2): e0117174.

David, Paul A. 1985.“Clio and the Economics of QWERTY.”*The American Economic Review* 75 (2): 332–37.

Filimonov, Vladimir, David Bicchetti, Nicolas Maystre, and Didier Sornette. 2014.“Quantification of the High Level of Endogeneity and of Structural Regime Shifts in Commodity Markets.”*Journal of International Money and Finance*, Understanding International Commodity Price Fluctuations, 42 (April): 174–92.

Filimonov, Vladimir, and Didier Sornette. 2012.“Quantifying Reflexivity in Financial Markets: Toward a Prediction of Flash Crashes.”*Physical Review E* 85 (5): 056108.

Filimonov, V., and D. Sornette. 2013.“A Stable and Robust Calibration Scheme of the Log-Periodic Power Law Model.”*Physica A: Statistical Mechanics and Its Applications* 392 (17): 3698–3707.

Frenken, Koen. 2006.*Innovation, Evolution and Complexity Theory*. Edward Elgar Publishing.

Funtowicz, Silvio O, and Jerome R Ravetz. 1994.“The Worth of a Songbird: Ecological Economics as a Post-Normal Science.”*Ecological Economics* 10: 197–207.

Goldenberg, Jacob, Barak Libai, Yoram Louzoun, David Mazursky, and Sorin Solomon. 2004.“Inevitably Reborn: The Reawakening of Extinct Innovations.”*Technological Forecasting and Social Change* 71 (9): 881–96.

Hammond, Allen, Albert Adriaanse, Stefan Bringzeu, Yuichi Moriguchi, Eric Rodenburg, Donald Rogich, and Helmut Schütz. 1997.*Resource Flows: The Material Basis of Industrial Economies*. World Resources Institute Washington, DC.

Hartmann, Rasmus Koss, Anders D. Krabbe, and André Spicer. 2019.“Towards an Untrepreneurial Economy? The Entrepreneurship Industry and the Rise of the Veblenian Entrepreneur.” SSRN Scholarly Paper ID 3479042. Rochester, NY: Social Science Research Network.

Hawken, Paul, Amory Lovins, and L Hunter Lovins. 2000.*Natural Capitalism: Creating the Next Industrial Revolution*. Back Bay Books.

Iribarren, José Luis, and Esteban Moro. 2011.“Branching Dynamics of Viral Information Spreading.”*Physical Review E* 84 (4): 046116.

Jackson, Matthew O. 2008.*Social and Economic Networks*. Princeton University Press.

Kali, Raja, Javier Reyes, Joshua McGee, and Stuart Shirrell. 2013.“Growth Networks.”*Journal of Development Economics* 101 (March): 216–27.

Kelly, Morgan, and Cormac Ó Gráda. 2022.“Connecting the Scientific and Industrial Revolutions: The Role of Practical Mathematics.”*The Journal of Economic History*, July, 1–33.

König, Michael D., S. Battiston, M. Napoletano, and F. Schweitzer. 2011.“Recombinant Knowledge and the Evolution of Innovation Networks.”*Journal of Economic Behavior & Organization* 79 (3): 145–64.

König, Michael D., Stefano Battiston, Mauro Napoletano, and Frank Schweitzer. 2012.“The Efficiency and Stability of R&D Networks.”*Games and Economic Behavior* 75 (2): 694–713.

Lane, David A, and Robert R Maxfield. 2005.“Ontological Uncertainty and Innovation.”*Journal of Evolutionary Economics* 15: 3–50.

Loreto, Vittorio, Vito D. P. Servedio, Steven H. Strogatz, and Francesca Tria. 2016.“Dynamics on Expanding Spaces: Modeling the Emergence of Novelties.” In*Creativity and Universality in Language*, edited by Mirko Degli Esposti, Eduardo G. Altmann, and François Pachet, 59–83. Lecture Notes in Morphogenesis. Springer International Publishing.

Mokyr, Joel. n.d.*The Lever of Riches: Technological Creativity and Economic Progress*. Oxford University Press.

Moussaïd, Mehdi, Juliane E. Kämmer, Pantelis P. Analytis, and Hansjörg Neth. 2013.“Social Influence and the Collective Dynamics of Opinion Formation.”*PLoS ONE* 8 (11): e78433.

Napolitano, Lorenzo, Evangelos Evangelou, Emanuele Pugliese, Paolo Zeppini, and Graham Room. n.d.“Technology Networks: The Autocatalytic Origins of Innovation.”*Royal Society Open Science* 5 (6): 172445.

Nelson, Richard R, and Sidney G Winter. 2002.“Evolutionary Theorizing in Economics.”*The Journal of Economic Perspectives* 16: 23–46.

Nordhaus, William D. 2005.“Schumpeterian Profits and the Alchemist Fallacy.” SSRN Scholarly Paper ID 820309. Rochester, NY: Social Science Research Network.

Nowak, Martin A, and David C Krakauer. 1999.“The Evolution of Language.”*Proceedings of the National Academy of Sciences of the United States of America* 96 (14): 8028.

Ollhoff, Jim, and Michael Walcheski. 2002.*Stepping in Wholes: Introduction to Complex Systems*. Sparrow Media Group.

Onnela, J P, and Felix Reed-Tsochas. 2010.“Spontaneous Emergence of Social Influence in Online Systems.”*Proceedings of the National Academy of Sciences* 107 (43): 18375–80.

Ormerod, Paul, and R Alexander Bentley. 2010.“Modelling Creative Innovation.”*Cultural Science* 3 (1).

Ormerod, Paul, and Rich Colbaugh. 2006.“Cascades of Failure and Extinction in Evolving Complex Systems.”*Journal of Artificial Societies and Social Simulation* 9.

Pezzey, John C V, and John M Anderies. 2003.“The Effect of Subsistence on Collapse and Institutional Adaptation in Population-Resource Societies.”*Journal of Development Economics* 72: 299–320.

Philippon, Thomas. 2022.“Additive Growth.” Working Paper. Working Paper Series. National Bureau of Economic Research.

Renn, Jürgen. 2020.*The Evolution of Knowledge: Rethinking Science for the Anthropocene*. Princeton: Princeton University Press.

Repenning, Nelson P. 2002.“A Simulation-Based Approach to Understanding the Dynamics of Innovation Implementation.”*Organization Science* 13.

Rivkin, Jan W. 2001.“Reproducing Knowledge: Replication Without Imitation at Moderate Complexity.”*Organization Science*, 274–93.

Rizzo Jr, Mario J. 1996.*The Economics of Time and Ignorance: With a New Introduction*. Routledge.

Rosewell, Bridget, and Paul Ormerod. 2004.“How Much Can Firms Know?”*Computing in Economics and Finance 2004*.

Rosický, Antonín. 2001.“Information and Social Systems Evolution.” In.

Rossman, Gabriel. 2012.*Climbing the Charts: What Radio Airplay Tells Us about the Diffusion of Innovation*. Princeton: Princeton University Press.

Scannell, Jack W., Alex Blanckley, Helen Boldon, and Brian Warrington. 2012.“Diagnosing the Decline in Pharmaceutical R&D Efficiency.”*Nature Reviews Drug Discovery* 11 (3): 191–200.

Schweitzer, Frank, Giorgio Fagiolo, Didier Sornette, Fernando Vega-Redondo, Alessandro Vespignani, and Douglas R White. 2009.“Economic Networks: The New Challenges.”*Science* 325 (5939): 422–25.

Serafinelli, Michel, and Guido Tabellini. 2017.“Creativity Over Time and Space.”*SSRN Electronic Journal*.

———. 2018.“Creativity and Freedom.”*VoxEU.org* (blog).

Solé, Ricard V, Bernat Corominas-Murtra, Sergi Valverde, and Luc Steels. 2010.“Language Networks: Their Structure, Function, and Evolution.”*Complexity* 15: 20–26.

Solé, Ricard V., Sergi Valverde, Marti Rosas Casals, Stuart A. Kauffman, Doyne Farmer, and Niles Eldredge. 2013.“The Evolutionary Ecology of Technological Innovations.”*Complexity* 18 (4): 15–27.

Sood, Vishal, Myléne Mathieu, Amer Shreim, Peter Grassberger, and Maya Paczuski. 2010.“Interacting Branching Process as a Simple Model of Innovation.”*Physical Review Letters* 105 (17): 178701.

Spranzi, Marta. 2004.“Galileo and the Mountains of the Moon: Analogical Reasoning, Models and Metaphors in Scientific Discovery.”*Journal of Cognition and Culture* 4 (3): 451–83.

Stadler, Bärbel M R, Peter F Stadler, Günter P Wagner, and Walter Fontana. 2001.“The Topology of the Possible: Formal Spaces Underlying Patterns of Evolutionary Change.”*Journal of Theoretical Biology* 213 (2): 241–74.

Stebbing, A R D. 2006.“Genetic Parsimony: A Factor in the Evolution of Complexity, Order and Emergence.”*Biological Journal of the Linnean Society,* 88: 295.

Sterman, John D. 2000.*Business Dynamics*. McGraw Hill Higher Education.

Straatman, Bas, Roger White, and Wolfgang Banzhaf. 2008.“An Artificial Chemistry-Based Model of Economies.”*Artificial Life* 11: 592.

Sutton, John. 2001.*Technology and Market Structure: Theory and History*. The MIT Press.

Tacchella, Andrea, Matthieu Cristelli, Guido Caldarelli, Andrea Gabrielli, and Luciano Pietronero. 2012.“A New Metrics for Countries’ Fitness and Products’ Complexity.”*Scientific Reports* 2 (October).

Tainter, Joseph A. 1995.“Sustainability of Complex Societies.”*Futures* 27: 397–407.

Thiel, Peter A. 2014.*Zero to One: Notes on Startups, or How to Build the Future*. First edition. New York: Crown Business.

Thorngate, Warren, Jing Liu, and Wahida Chowdhury. 2011.“The Competition for Attention and the Evolution of Science.”*Journal of Artificial Societies and Social Simulation* 14 (4): 17.

Tomasello, Mario Vincenzo, Mauro Napoletano, Antonios Garas, and Frank Schweitzer. 2013.“The Rise and Fall of R&D Networks.”*arXiv:1304.3623 [Physics]*, April.

Topolinski, Sascha, and Rolf Reber. 2010.“Gaining Insight Into the ‘Aha’ Experience.”*Current Directions in Psychological Science* 19: 402–5.

Tria, F., V. Loreto, V. D. P. Servedio, and S. H. Strogatz. 2013.“The Dynamics of Correlated Novelties.”*arXiv:1310.1953 [Physics]* 4 (October).

Valverde, Sergi, Ricard V Solé, Mark A Bedau, and Norman H Packard. 2007.“Topology and Evolution of Technology Innovation Networks.”*Phys. Rev. E* 76 (5): 056118.

Vitali, Stefania, James B. Glattfelder, and Stefano Battiston. 2011.“The Network of Global Corporate Control.”*PLoS ONE* 6 (10): e25995.

Wong, Michael L., and Stuart Bartlett. 2022.“Asymptotic Burnout and Homeostatic Awakening: A Possible Solution to the Fermi Paradox?”*Journal of The Royal Society Interface* 19 (190): 20220029.

Wu, Lingfei, Dashun Wang, and James A. Evans. 2019.“Large Teams Develop and Small Teams Disrupt Science and Technology.”*Nature* 566 (7744): 378–82.

Young, H Peyton. 1998.*Individual Strategy and Social Structure : An Evolutionary Theory of Institutions*. Princeton, N.J.: Princeton University Press.

———. 2002.“The Diffusion of Innovations in Social Networks.”

———. 2005.“The Spread of Innovations Through Social Learning.”

Zabell, S. L. 1992.“Predicting the Unpredictable.”*Synthese* 90 (2): 205–32.

Upon the efficient consumption and summarizing of news from around the world.

Remember feeds? From when we thought the internet would provide us timely, pertinent information from around the world on our topics of interest? When our information streams were not your aunt’s conspiracy theory memes on a social media feed, but curated lists of expert opinion?

No? Did you miss that bit of internet discussion? Perhaps a picture best explains it in that case.

I have been told to do this through Twitter or Facebook, but, seriously… no.
Those are systems designed to waste time with stupid distractions in order to benefit someone else.
Facebook is *informative* in the same way that thumb sucking is *nourishing*. Telling me to use someone’s social website to gain information is like telling me to play poker machines to fix my financial troubles.
Stop that.

Contrarily, I would like to find ways to *summarise* and *condense* information to *save* time for myself.
That is what feeds were designed for.
New to this game? You know what podcasts are?
Podcasts are a type of feed. An audio feed.
If I care about news articles and tumblr posts and whatever, not just audio,
then I can still use feeds, feeds of text instead of audio.
Any website can have a feed. Many do.
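Under the hood a feed is just structured XML at a well-known URL. A minimal sketch of pulling items out of an RSS 2.0 document with the Python standard library; the sample feed here is invented for illustration:

```python
import xml.etree.ElementTree as ET

# A toy RSS 2.0 document (a real one would be fetched from a feed URL).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]

for title, link in parse_rss(SAMPLE_RSS):
    print(title, "->", link)
```

That is the whole trick: a feed reader is a loop that fetches such documents on a schedule and shows you the items it has not shown you before.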

Remember when we thought the web would be a useful tool for researching and learning, and that automated research assistants would trawl the web for us? RSS Feeds were often discussed as a piece of that machine: Little updates dripped from the web, to be sliced, diced, prioritised and analysed by our software to keep us aware of… whatever.

Feeds in their current form, as useful and time-saving as they are, are not the apotheosis of information retrieval. AFAICT they were always intended to be a part of a larger infrastructure of automated knowledge classification and discovery and analysis, possibly of community sharing and curating.

This higher level, the analysis, has not eventuated, at least not as any kind of technical standard.
Most feed readers don’t do much fancy analysis or triage;
they just give you a list of new items ordered by date.
Many*people* make a living sifting feed wheat from feed chaff. This is what we called “journalism”.
Services like Canopy or Pinterest or keen, which do automatic moderation, might be useful, but they are not so common.

Still, whatever. As they are, feed readers work and they are better than browsing to the same page and pressing “refresh” constantly.

Feeds are mourned and missing from certain fancy modern blogs. Read Anil Dash on The lost infrastructure of social media.

*Pace* Anil, feeds are alive.
Feeds are available for, for example, Medium, much as that site tries to distance itself from the normal web.
Anil Dash’s Medium feed, for example, is https://medium.com/feed/@anildash.

Moreover, the feeds in my reader are more useful than ever. In academia we occupy a different slope of the hype curve. Academics have belatedly caught up on the existence of feeds, to the point that specialist blogs are having, IMO, a golden age. Perhaps because getting a PhD is a giant PITA, the barrier to entry for starting a blog seems trivial by comparison, so a lot of PhD people produce a lot of high-quality content. I think academics are enjoying the people in the machine phase.

Want to get started? Why not try the sources that I use in my feed reader? They are all available on my blogroll.

How to get those feeds? Why, an app! Either a web app, or an app app like your grandaddy used to use.

fraidycat is a desktop app or browser extension for Firefox or Chrome, with a remit to be an unapologetic passion project in an area depressingly dominated by SEO types. It has a novel UX design, by a novel piece of internet, Kicks Condor.

I use it to follow people (hundreds) on whatever platform they choose - Twitter, a blog, YouTube, even on a public TiddlyWiki.

Open source.

On that UX thing,

There is no news feed. Rather than showing you a massive inbox of new posts to sort through, you see a list of recently active individuals. No one can noisily take over this page, since every follow has a summary that takes up a mere two lines.

feedly is the current boss, and the one that I use. It targets commercial users, like web “community managers” or marketing types, but it is mostly usable despite that. Probably works for humans too. This is how you would subscribe to my site in Feedly. Point of friction: the internal feedback mechanism has the tic that it is constantly asking me whether an article is about “leadership” or not. The thought of someone combing the internet all day long, desperately optimising the efficiency with which they can find articles about leadership, fills me with gnawing dread.

Newsblur is a quirky option that I used for a while before the interface annoyed me. For me, its user interface is in the uncanny valley between radical UI redesign, and poor reimplementation of the corporate standard, but this is by no means a universal opinion.

The UI defies the last 10 years of user interface conventions, which is confusing, but it works and is cheap (and in fact has an open source app you can self-host). This is how you would subscribe to my site in Newsblur.

Inoreader seems popular amongst readers of this blog. I have not used it myself, but thanks for all your custom, Inoreader users. Care to share why you like it?

Feeder is a browser extension/site/app that reads feeds. I have not used it, but it also produces a lot of the traffic of this blog. (Feeder fans: speak up!)

The old reader reads feeds and this includes activity updates for people you follow on social media. Not sure if that is the worst or best of all worlds. I presume the name is a tribute to the fact that this technology is pretty good despite the hype curve moving on to coil about other things?

feedbin is also a thing. Not sure about the value proposition. Found it somewhere. I have spent enough time auditioning feed readers now, though, so it is unlikely I will change.

Most people can stop reading here.

If you want to get more indy/private or have a distaste for the pre-packaged, you can run your own feed reader server.
I will run my own server software if the application is compelling enough,
but let us consider the costs.
Let’s say between backups, security issues, confusing DNS failures etc,
that’s 8 hours per year of miscellaneous computer wrangling, best case, and
more hours if you have complicated things like running some multi-user enterprisey database to store data.
It is hard to imagine that cost being worth it for internet distractions.1

- selfoss is a PHP/SQLite self-hosted feed reader
- miniflux is open-source/DIY, but also offers a hosted version for $15/year.
- stringer looks like a nice little ruby app but needs postgresql. Bloat. ⚠️
- tinytinyrss is the original “minimalist” RSS reader; it still needs more databases than is sensible.
- fever is a weird commercial (USD30) application that you host on your own server. It claims to learn your information preferences, which could be cool. But I cannot be arsed installing some database-wanting app with suspiciously antique language requirements (PHP3) that also costs money to try, so I will never know.

fraidycat does this for Twitter, Instagram, Reddit, SoundCloud and Twitch. So does rssbox, for a slightly different selection of sites.

Feedity generates feeds from sites that don’t understand content aggregators.

More generally, web scraping tools can do this. For example, Scrapy, whose companion project scrapy-rss converts weird sites into RSS.

Newsletter/RSS gateway NewsletterHunt; see, e.g., Matt Levine’s Money Stuff.

Why people insist on running enterprise databases to run apps such as this is an ongoing mystery to me. The capacity to scale to many users is nice, I suppose, but by that logic everyone should drive everywhere in a school bus.↩︎

Predicting with confidence: the best machine learning idea you never heard of, from renowned passive-aggressive grumpy bastard Scott Locklin. (Sorry Scott, but you are so reliably objectionable that I am always going to need to put a disclaimer on links to you; why do you refer to female scientists as “this woman”?):

The essential idea is that a “conformity function” exists. Effectively you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work.

Cosma Shalizi recommends Samii’s Conformal Inference Tutorial and Lei et al. (2017), because he felt Vovk, Gammerman, and Shafer (2005) was badly written. Maybe Shafer’s tutorial is good? (Shafer and Vovk 2008). Modern takes in “Predicting With Confidence: Using Conformal Prediction in Drug Discovery” (2021); Zeni, Fontana, and Vantini (2020); and A Tutorial on Conformal Prediction plus accompanying video (Angelopoulos and Bates 2022).

Question: how well does this work under dataset shift? (Tibshirani et al. 2019).
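For concreteness, here is a minimal Python sketch of the *split* conformal recipe for regression: hold out a calibration set, score absolute residuals as the nonconformity function, and take the appropriate empirical quantile. Validity needs only exchangeability. The toy model and data below are invented for illustration:

```python
import math
import random

def split_conformal(x_cal, y_cal, predict, alpha=0.1):
    """Split conformal prediction intervals around a point predictor.

    Nonconformity score = absolute residual on a held-out calibration
    set; the interval half-width is the ceil((n+1)(1-alpha))-th smallest
    score. Returns a function mapping x to a (lo, hi) interval with
    ~(1 - alpha) marginal coverage, assuming exchangeability.
    """
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)  # 0-based index
    q = scores[k]
    return lambda x: (predict(x) - q, predict(x) + q)

rng = random.Random(0)
model = lambda x: 2.0 * x                     # some pre-fitted point predictor
x_cal = [rng.uniform(0, 1) for _ in range(500)]
y_cal = [2.0 * x + rng.gauss(0, 0.3) for x in x_cal]
band = split_conformal(x_cal, y_cal, model, alpha=0.1)

# Empirical coverage on fresh exchangeable data should be about 0.9.
x_new = [rng.uniform(0, 1) for _ in range(500)]
y_new = [2.0 * x + rng.gauss(0, 0.3) for x in x_new]
hits = sum(lo <= y <= hi
           for (lo, hi), y in ((band(x), y) for x, y in zip(x_new, y_new)))
print(hits / 500)
```

Note the coverage guarantee is marginal, not conditional, and it is exactly the exchangeability assumption that dataset shift breaks, which is what the Tibshirani et al. (2019) reweighting addresses.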

Angelopoulos, Anastasios N., and Stephen Bates. 2022.“A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” arXiv.

Barber, Rina Foygel, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. 2021.“Predictive Inference with the Jackknife+.”*The Annals of Statistics* 49 (1): 486–507.

Bastani, Osbert, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth. 2022.“Practical Adversarial Multivalid Conformal Prediction.” arXiv.

Lei, Jing, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. 2017.“Distribution-Free Predictive Inference For Regression.” arXiv.

“Predicting With Confidence: Using Conformal Prediction in Drug Discovery.” 2021.*Journal of Pharmaceutical Sciences* 110 (1): 42–49.

Shafer, Glenn, and Vladimir Vovk. 2008.“A Tutorial on Conformal Prediction.”*Journal of Machine Learning Research* 9 (12): 371–421.

Tibshirani, Ryan J, Emmanuel J Candès, Rina Foygel Barber, and Aaditya Ramdas. 2019.“Conformal Prediction Under Covariate Shift,” 11.

Vovk, Vladimir, Alex Gammerman, and Glenn Shafer. 2005.*Algorithmic Learning in a Random World*. Springer Science & Business Media.

Vovk, Vladimir, Ilia Nouretdinov, and Alexander Gammerman. 2009.“On-Line Predictive Linear Regression.”*The Annals of Statistics* 37 (3): 1566–90.

Zeni, Gianluca, Matteo Fontana, and Simone Vantini. 2020.“Conformal Prediction: A Unified Review of Theory and New Challenges.”*arXiv:2005.07972 [Cs, Econ, Stat]*, May.

Estimating densities by considering the observations drawn from that as a point process. In one dimension this gives us the particularly lovely trick ofsurvival analysis, but the relations are much more general.

Consider the problem of estimating the common density\(f(x)dx=dF(x)\) of indexed i.i.d. random variables\(\{X_i\}_{1\leq i\leq n}\in \mathbb{R}^d\) from\(n\) realisations of those variables,\(\{x_i\}_{i\leq n},\) where\(F:\mathbb{R}^d\rightarrow[0,1]\) is a (cumulative) distribution. We assume the distribution is absolutely continuous with respect to the Lebesgue measure, i.e.\(\mu(A)=0\Rightarrow P(X_i\in A)=0\). This implies that\(P(X_i=X_j)=0\text{ for }i\neq j\) and that the density exists as a standard function (i.e. we do not need to consider generalised functions such as distributions to handle atoms in\(F\) etc.)

Here we parameterise the density with some finite dimensional parameter vector\(\theta,\) i.e.\(f(x;\theta),\) whose value completely characterises the density; the problem of estimating the density is then the same as the one of estimating\(\theta.\)

In the method of maximum likelihood estimation, we seek to maximise the value of the empirical likelihood of the data. That is, we choose a parameter estimate\(\hat{\theta}\) to satisfy\[ \begin{aligned} \hat{\theta} &:=\operatorname{arg max}_\theta\prod_i f(x_i;\theta)\\ &=\operatorname{arg max}_\theta\sum_i \log f(x_i;\theta) \end{aligned} \]

Let’s consider the case where we try to estimate this function by constructing it from some given basis of\(p\) functions\(\phi_j: \mathbb{R}^d\rightarrow[0,\infty),\) so that

\[f(x)=\sum_{j\leq p}w_j\phi_j(x)\]

and\(\theta\equiv\{w_j\}_{j\leq p}.\) We keep this simple by requiring\(\int\phi_j(x)dx=1,\) so that they are all valid densities. Then the requirement that\(\int f(x)dx=1\) will imply that\(\sum_j w_j=1,\) i.e. we are taking a convex combination of these basis densities.

Then the maximum likelihood estimator can be written\[ \begin{aligned} \hat{\theta} &=\operatorname{arg max}_{\{w_i\}}f(\{x_i\};\{w_i\})\\ &=\operatorname{arg max}_{\{w_i\}}\sum_i \log \sum_{j\leq p}w_j\phi_j(x_i) \end{aligned} \]

A moment’s thought reveals that this equation has no finite optimum, since it is strictly increasing in each\(w_j\). However, we are missing a constraint, which is that to be a well-defined probability density, it must integrate to unity, i.e.\[ \int f(\{x\};\{w_i\})dx = 1 \] and therefore\[ \begin{aligned} \int \sum_{j\leq p}w_j\phi_j(x)dx&=1\\ \sum_{j\leq p}w_j\int\phi_j(x)dx&=1\\ \sum_{j\leq p}w_j &=1. \end{aligned} \]

We enforce that constraint with a Lagrange multiplier \(\eta\), i.e. we seek stationary points of\[ \mathcal{L}(\{w_j\},\eta) =\sum_i \log \sum_{j\leq p}w_j\phi_j(x_i) +\eta\Bigl(1-\sum_{j\leq p}w_j\Bigr), \] whence\[ \frac{\partial\mathcal{L}}{\partial w_k} =\sum_i \frac{\phi_k(x_i)}{\sum_{j\leq p}w_j\phi_j(x_i)}-\eta=0; \] multiplying by \(w_k\) and summing over \(k\) shows \(\eta=n\). By messing around with this a little further we eventually find\[ \hat{w}_k = \frac{\sum_i \phi_k(x_i)}{\sum_i \sum_j \phi_j(x_i)}. \]
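As a sanity check, here is a minimal numerical sketch of the closed-form estimate above. Everything concrete in it is my own illustrative assumption, not from the text: two unit-variance Gaussian basis densities, a toy sample of 1000 points drawn as a 70/30 mixture of them.

```python
import math
import random

random.seed(0)

def phi(x, mu, sigma=1.0):
    # unit-mass Gaussian basis density phi_j
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# toy sample: a 70/30 mixture of the two basis densities
xs = [random.gauss(-2, 1) for _ in range(700)] + [random.gauss(2, 1) for _ in range(300)]

mus = [-2.0, 2.0]
num = [sum(phi(x, mu) for x in xs) for mu in mus]  # sum_i phi_k(x_i)
den = sum(num)                                     # sum_i sum_j phi_j(x_i)
w_hat = [nk / den for nk in num]

assert abs(sum(w_hat) - 1) < 1e-9   # the convex-combination constraint holds
assert abs(w_hat[0] - 0.7) < 0.1    # roughly recovers the generating weight
```

For well-separated basis densities like these the ratio estimate lands near the generating weights; when the \(\phi_j\) overlap heavily it is cruder than a full iterative maximum-likelihood (EM-style) solution.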

Consider the problem of estimating the intensity \(\lambda\) of a *simple*,
non-interacting inhomogeneous point process \(N(B)\) on some compact \(W\subset\mathbb{R}^d\) from a realisation \(\{x_i\}_{i\leq n}\), where the counting function \(N(B)\)
counts the number of points that fall in
a set \(B\subset\mathbb{R}^d\).

The *intensity* is (in the simple non-interacting case —
see Daley and Vere-Jones (2003)
for other cases) a function \(\lambda:\mathbb{R}^d\rightarrow [0,\infty)\) such that,
for any box \(B\subset W\),\[N(B)\sim\operatorname{Poisson}(\Lambda(B))\]
where\[\Lambda(B):=\int_B\lambda(x)dx\]
and, for any disjoint boxes, \(N(A)\perp N(B).\)

After some argumentation about intensities we can find a likelihood for the observed distribution:\[f(\{x_i\};\tau)= \prod_i \lambda(x_i;\tau)\exp\left(-\int\lambda(x;\tau)dx\right). \]

Say that we wish to find the inhomogeneous intensity function by the method of maximum likelihood. We allow the intensity function to be described by a parameter vector \(\tau,\) which we write \(\lambda(x;\tau)\), and we once again construct an estimate:\[ \begin{aligned} \hat{\tau}&:=\operatorname{arg max}_\tau\log f(\{x_i\};\tau)\\ &=\operatorname{arg max}_\tau\log \left(\prod_i\lambda(x_i;\tau) \exp\left(-\int_W\lambda(x;\tau) dx\right)\right)\\ &=\operatorname{arg max}_\tau\sum_i\log \lambda(x_i;\tau)-\int_W\lambda(x;\tau)dx\\ &=\operatorname{arg max}_\tau\sum_i\log \lambda(x_i;\tau)-\Lambda(W). \end{aligned} \]

Now consider the case where we assume that the intensity can be written in a \(\phi_k\) basis as above, so that\[\lambda(x)=\sum_{j\leq p}\omega_j\phi_j(x)\] with \(\tau\equiv\{\omega_j\}.\) Then our estimate may be written\[ \begin{aligned} \hat{\tau}&=\operatorname{arg max}_{\{\omega_j\}}\log f\left(\{x_i\};\{\omega_j\} \right)\\ &=\operatorname{arg max}_{\{\omega_j\}}\sum_i\log \lambda(x_i;\tau)-\Lambda(W)\\ &=\operatorname{arg max}_{\{\omega_j\}}\sum_i\log \sum_{j\leq p}\omega_j\phi_j(x_i)-\int_W\sum_{j\leq p}\omega_j\phi_j(x)dx\\ &=\operatorname{arg max}_{\{\omega_j\}}\sum_i\log \sum_{j\leq p}\omega_j\phi_j(x_i)-\sum_{j\leq p}\omega_j\int_W \phi_j(x)dx \\ &=\operatorname{arg max}_{\{\omega_j\}}\sum_i\log \sum_{j\leq p}\omega_j\phi_j(x_i)-\sum_{j\leq p}\omega_j, \end{aligned} \] where the last step uses \(\int_W \phi_j(x)dx=1.\) We have a similar log-likelihood to the density estimation case.

Under the constraint that the expected count matches the observed one, \(\mathbb{E}\hat{N}(W)=n,\) we have \(\sum_j\omega_j=n\) and therefore\[ \hat{\tau} =\operatorname{arg max}_{\{\omega_j\}}\sum_i\left(\log \sum_{j\leq p}\omega_j\phi_j(x_i)\right)-n. \]

Note that if we consider the points *as a scaled density* we find the result is the same
as the maximum obtained by considering the points as an inhomogeneous spatial
point pattern, up to an additive constant: substituting \(\omega_j\equiv nw_j\) with \(\sum_j w_j=1\) gives\[ \sum_i\log \sum_{j\leq p}\omega_j\phi_j(x_i)-\sum_{j\leq p}\omega_j =\sum_i\log \sum_{j\leq p}w_j\phi_j(x_i)+n\log n-n, \] so the two objectives share the same maximiser.
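The correspondence \(\omega_j\equiv nw_j\) between the density and point-process objectives can be checked numerically. This sketch uses illustrative assumptions of my own (two Gaussian basis densities, a toy 100-point sample, a few arbitrary weight settings); it verifies that the two log-likelihood objectives differ only by a constant not depending on the weights:

```python
import math
import random

random.seed(1)

def phi(x, mu):
    # unit-mass Gaussian basis density
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

xs = [random.gauss(-1, 1) for _ in range(60)] + [random.gauss(2, 1) for _ in range(40)]
n = len(xs)
mus = [-1.0, 2.0]

def density_loglik(w):
    # sum_i log sum_j w_j phi_j(x_i)
    return sum(math.log(w[0] * phi(x, mus[0]) + w[1] * phi(x, mus[1])) for x in xs)

def pp_loglik(om):
    # sum_i log sum_j om_j phi_j(x_i) - sum_j om_j
    return sum(math.log(om[0] * phi(x, mus[0]) + om[1] * phi(x, mus[1])) for x in xs) - (om[0] + om[1])

# with om = n * w and sum(w) = 1, the objectives differ by the constant n log n - n
for w0 in [0.2, 0.5, 0.8]:
    diff = pp_loglik((n * w0, n * (1 - w0))) - density_loglik((w0, 1 - w0))
    assert abs(diff - (n * math.log(n) - n)) < 1e-8
```

Because the difference is constant in the weights, maximising either objective yields the same basis coefficients, up to the factor \(n\).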

From the other direction, we can formulate density estimation as a count regression. For “nice” distributions this will be the same as estimating the correct Poisson intensity for every given small region of the state space (e.g. Gu 1993; Eilers and Marx 1996). 🏗

Consider a box \(B\subset \mathbb{R}^d\). The probability of any one \(X_i\) falling within that box is\[P(X_i\in B)=E\left(\mathbb{I}\{X_i\in B\}\right)=\int_B dF(x).\]

We know that the expected number of \(X_i\) to fall within that box is \(n\) times the probability of any one falling in that box, i.e.\[E\left(\sum_{i\leq n}\mathbb{I}\{X_i\in B\}\right)=n\int_B dF(x)\] and, if the counts are Poisson,\[P(N(B)=k)=\frac{\exp(-\Lambda(B))\Lambda(B)^k}{k!}.\] …Where was I going with this? Something to do with linear point process estimation perhaps? 🏗

The score function and log-hazard rates are similar beasts. We can exploit that in other ways perhaps, e.g. in a Langevin dynamics algorithm? But will we gain something useful from that? Does Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation leverage something like that?

Interacting point processes have intensities too which may also be re-interpreted as densities. What kind of relations are implied between the RVs which would have this “dynamically evolving” density? Clearly not i.i.d. But useful somewhere?

Andersen, Per Kragh, Ornulf Borgan, Richard D. Gill, and Niels Keiding. 1997.*Statistical Models Based on Counting Processes*. Corrected 2nd printing. Springer Series in Statistics. New York, NY: Springer.

Anderson, J. A., and S. C. Richardson. 1979.“Logistic Discrimination and Bias Correction in Maximum Likelihood Estimation.”*Technometrics* 21 (1): 71–78.

Barron, Andrew R., and Chyong-Hwa Sheu. 1991.“Approximation of Density Functions by Sequences of Exponential Families.”*The Annals of Statistics* 19 (3): 1347–69.

Berman, Mark, and Peter Diggle. 1989.“Estimating Weighted Integrals of the Second-Order Intensity of a Spatial Point Process.”*Journal of the Royal Statistical Society. Series B (Methodological)* 51 (1): 81–92.

Brown, Lawrence D., T. Tony Cai, and Harrison H. Zhou. 2010.“Nonparametric Regression in Exponential Families.”*The Annals of Statistics* 38 (4): 2005–46.

Castellan, G. 2003.“Density Estimation via Exponential Model Selection.”*IEEE Transactions on Information Theory* 49 (8): 2052–60.

Cox, D. R. 1965.“On the Estimation of the Intensity Function of a Stationary Point Process.”*Journal of the Royal Statistical Society: Series B (Methodological)* 27 (2): 332–37.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008.“Fast Gaussian Process Methods for Point Process Intensity Estimation.” In*Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML ’08. New York, NY, USA: ACM Press.

Daley, Daryl J., and David Vere-Jones. 2003.*An Introduction to the Theory of Point Processes*. 2nd ed. Vol. 1. Elementary Theory and Methods. New York: Springer.

———. 2008.*An Introduction to the Theory of Point Processes*. 2nd ed. Vol. 2. General theory and structure. Probability and Its Applications. New York: Springer.

Efromovich, Sam. 1996.“On Nonparametric Regression for IID Observations in a General Setting.”*The Annals of Statistics* 24 (3): 1126–44.

———. 2007.“Conditional Density Estimation in a Regression Setting.”*The Annals of Statistics* 35 (6): 2504–35.

Eilers, Paul H. C., and Brian D. Marx. 1996.“Flexible Smoothing with B-Splines and Penalties.”*Statistical Science* 11 (2): 89–121.

Ellis, Steven P. 1991.“Density Estimation for Point Processes.”*Stochastic Processes and Their Applications* 39 (2): 345–58.

Giesecke, K., H. Kakavand, and M. Mousavi. 2008.“Simulating Point Processes by Intensity Projection.” In*Simulation Conference, 2008. WSC 2008. Winter*, 560–68.

Gu, Chong. 1993.“Smoothing Spline Density Estimation: A Dimensionless Automatic Algorithm.”*Journal of the American Statistical Association* 88 (422): 495–504.

Heigold, Georg, Ralf Schlüter, and Hermann Ney. 2007.“On the Equivalence of Gaussian HMM and Gaussian HMM-Like Hidden Conditional Random Fields.” In*Eighth Annual Conference of the International Speech Communication Association*.

Hinton, G., Li Deng, Dong Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012.“Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.”*IEEE Signal Processing Magazine* 29 (6): 82–97.

Kooperberg, Charles, and Charles J. Stone. 1991.“A Study of Logspline Density Estimation.”*Computational Statistics & Data Analysis* 12 (3): 327–47.

———. 1992.“Logspline Density Estimation for Censored Data.”*Journal of Computational and Graphical Statistics* 1 (4): 301–28.

Leung, G., and A.R. Barron. 2006.“Information Theory and Mixing Least-Squares Regressions.”*IEEE Transactions on Information Theory* 52 (8): 3396–3410.

Lieshout, Marie-Colette N. M. van. 2011.“On Estimation of the Intensity Function of a Point Process.”*Methodology and Computing in Applied Probability* 14 (3): 567–78.

Miller, Benjamin Kurt, Alex Cole, and Gilles Louppe. n.d.“Simulation-Efﬁcient Marginal Posterior Estimation with Swyft: Stop Wasting Your Precious Time.” In, 9.

Møller, Jesper, and Rasmus Plenge Waagepetersen. 2003.*Statistical Inference and Simulation for Spatial Point Processes*. Chapman and Hall/CRC.

Norets, Andriy. 2010.“Approximation of Conditional Densities by Smooth Mixtures of Regressions.”*The Annals of Statistics* 38 (3): 1733–66.

Panaretos, Victor M., and Yoav Zemel. 2016.“Separation of Amplitude and Phase Variation in Point Processes.”*The Annals of Statistics* 44 (2): 771–812.

Papangelou, F. 1974.“The Conditional Intensity of General Point Processes and an Application to Line Processes.”*Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete* 28 (3): 207–26.

Rásonyi, Miklós, and Kinga Tikosi. 2022.“On the Stability of the Stochastic Gradient Langevin Algorithm with Dependent Data Stream.”*Statistics & Probability Letters* 182 (March): 109321.

Reynaud-Bouret, Patricia. 2003.“Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.”*Probability Theory and Related Fields* 126 (1).

Saul, Lawrence K., and Daniel D. Lee. 2001.“Multiplicative Updates for Classification by Mixture Models.” In*Advances in Neural Information Processing Systems*, 897–904.

Schoenberg, Frederic Paik. 2005.“Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process.”*Journal of Statistical Planning and Inference* 128 (1): 79–93.

Sha, Fei, and Lawrence K. Saul. 2006.“Large Margin Hidden Markov Models for Automatic Speech Recognition.” In*Advances in Neural Information Processing Systems*, 1249–56.

Sugiyama, Masashi, Ichiro Takeuchi, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Daisuke Okanohara. 2010.“Conditional Density Estimation via Least-Squares Density Ratio Estimation.” In*International Conference on Artificial Intelligence and Statistics*, 781–88.

Tüske, Zoltán, Muhammad Ali Tahir, Ralf Schlüter, and Hermann Ney. 2015.“Integrating Gaussian Mixtures into Deep Neural Networks: Softmax Layer with Hidden Variables.” In*2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 4285–89. IEEE.

Willett, R. M., and R. D. Nowak. 2007.“Multiscale Poisson Intensity and Density Estimation.”*IEEE Transactions on Information Theory* 53 (9): 3171–87.

Very famous now thanks to neural diffusions.

Suggestive connection to thermodynamics (Sohl-Dickstein et al. 2015), score estimators in gradient…

Hyvärinen, Aapo. 2005.“Estimation of Non-Normalized Statistical Models by Score Matching.”*The Journal of Machine Learning Research* 6 (December): 695–709.

Sohl-Dickstein, Jascha, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015.“Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.”*arXiv:1503.03585 [Cond-Mat, q-Bio, Stat]*, November.

Song, Jiaming, Chenlin Meng, and Stefano Ermon. 2021.“Denoising Diffusion Implicit Models.”*arXiv:2010.02502 [Cs]*, November.

Song, Yang, and Stefano Ermon. 2020a.“Generative Modeling by Estimating Gradients of the Data Distribution.” In*Advances In Neural Information Processing Systems*. arXiv.

———. 2020b.“Improved Techniques for Training Score-Based Generative Models.” In*Advances In Neural Information Processing Systems*. arXiv.

Song, Yang, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. 2019.“Sliced Score Matching: A Scalable Approach to Density and Score Estimation.” arXiv.

Song, Yang, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2022.“Score-Based Generative Modeling Through Stochastic Differential Equations.” In.

Swersky, Kevin, Marc’Aurelio Ranzato, David Buchman, Nando D. Freitas, and Benjamin M. Marlin. 2011.“On Autoencoders and Score Matching for Energy Based Models.” In*Proceedings of the 28th International Conference on Machine Learning (ICML-11)*, 1201–8.

Vincent, Pascal. 2011.“A connection between score matching and denoising autoencoders.”*Neural Computation* 23 (7): 1661–74.

- OCR from screen
- Help I need to use a Windows keyboard
- Gatekeeper
- Desktop wallpaper
- Meta tips
- iMessage doesn’t know how country prefixes work any longer
- Dock disconnects hard drives when mac sleeps
- Error beeps are laptop farts
- Typing emoji
- Concatenate PDFs
- Why do files open when I click on them too long?
- `ssh-agent` died
- Preview modern image formats
- Respond to modal dialogue boxes without mouse
- Trusting homebrew casks
- Convert selected text
- Usable scripting
- iTunes never finishes syncing my phone
- Creating GUIs for shell scripts
- Automatically anonymized system stats
- Change default shell
- XCode
- Run out of file handles
- `plist` files are opaque binary messes
- Which file is crashing/hanging `$PID`?
- Bootloadey whodangle
- Networking from the command line
- Focus stealing
- Installing skype
- Stop gamed and other processes leaking your data and wasting precious network sockets for no reason
- macOS claims it forgot my email/contacts/calendar password again
- Using Chromium (“open-source chrome”)
- Reset semaphores
- Time machine
- Re-time stupid alarms
- To file

See alsocommand lines it is tedious to remember for general POSIX commands.

⚠️ Many of these commands are supposed to be run as root via sudo, and each may irremediably ruin your computer and your life, and soil everything you have ever loved. Then it might challenge you to a break dance battle, I dunno. Certainly, some of these commands have done some of that to me.

If it happens to you also, none of that will be my responsibility. The only guarantee I provide here is that some stuff helped me at least one time.

So many of these were filesystem-specific hacks that I broke those out into a specific filesystem-specific hacks section. Others were about easing the pains of using macOS in the low-bandwidth world or macOS server.

The remaining tips are arranged so that the further you get down the list the longer it has been since I have needed to know it; the later ones probably don’t work on modern macOS.

macOS supports an OCR system called Live Text, which notionally means any time I can see text on the screen I should be able to copy and paste it (e.g. I should be able to use links that I see in a video chat). However, only some apps seem to support this feature, so I end up doing circuitous round trips from screenshots to supported apps. There are better ways.

Markus Schappi’s macOCR: Get any text on your screen into your clipboard.

```
# install
brew install schappim/ocr/ocr
# invoke
ocr
```

This one is an adequate solution for me because I have a terminal open 100% of the time. I have also noticed Extract Text from a Screenshot with Shortcuts, but I cannot work out where to download it. ltxlouis has a Siri-based solution, but that just leads me to a download page for an app that claims that Siri does not work on my laptop, which is weird because it is RIGHT THERE. Tapping out.

There are a lot of keys on this Windows keyboard that are useless to me. macOS has a fairly limited repertoire of ways to remap key assignments natively. So how do I remap that Outlook button to be a Command key?

We can use `hidutil` as per Technical Note TN2450: Remapping Keys in macOS 10.12 Sierra, but that involves divining some complicated keyboard usage codes or some such.

An alternate solution is to use Karabiner Elements. I do not know how trustworthy this software is, and it requires a high level of system access, including continuous keystroke monitoring.

An ongoing mess of opaque trade-offs between security, privacy and usability.

Various scrappy fan communities make animated wallpaper.

Here is a command-line app for stitching together “dynamic” (i.e. time-of-day-sensitive) wallpapers: mczachurski/wallpapper: Console application for creating dynamic wallpapers for macOS Mojave and newer. Dynamic Wallpaper Club hosts a user-generated gallery of pretty things and also instructions.

For pre-rolled wallpapers for the busy see

- Unsplash Wallpapers (free)
- 24 Hour Wallpaper (AUD 15 but very pretty; but also limited selection; too much coast for my liking)

Mr Bishop’s Awesome macOS Command Line lists how to fix a great many things from the keyboard. Previously on GitHub, but that is now mostly a page about his grumpiness at lazy GitHub drive-by contributors.

A sampling:

```
# screenshot
screencapture -T 3 -t jpg -P delayedpic.jpg
# quicklook preview
qlmanage -p /path/to/file
#Convert Audio File to iPhone Ringtone
afconvert input.mp3 ringtone.m4r -f m4af
```

Apparently we now need to manually add country prefixes to any phone numbers in our inbox that arrive without them? Here is a script to add country codes to OS X Address Book entries:

```
tell application "Contacts"
    repeat with eachPerson in people
        repeat with eachNumber in phones of eachPerson
            set theNum to (get value of eachNumber)
            if (theNum does not start with "+" and theNum does not start with "61" and theNum starts with "0") then
                -- log "+61" & (get text 2 thru (length of theNum) of theNum)
                set value of eachNumber to "+61" & (get text 2 thru (length of theNum) of theNum)
            end if
        end repeat
    end repeat
    save
end tell
```

Note that it only works in versions of macOS before 12.1, because that is when Apple broke scripting for the Contacts app.

Thunderbolt 3 dock disconnects when MacBook sleeps:

Given that the computer sleep seems to be the problem, I turned off the "Power Nap", and it seems to be working for me now.

- System Preferences > Energy Saver
- Power Adapter tab
- Uncheck “Enable Power Nap while plugged into a power adapter”

- Open System Preferences.
- Click the Energy Saver icon.
- By default the 'Put hard disks to sleep when possible' option will be selected. Uncheck this option.

I hate user interface beeps in general. I REALLY hate them being played promiscuously through arbitrary bluetooth devices. I have done my best to turn them off.

I do not want them.
If they really must play themselves, they can play on my laptop’s internal speakers thank you very much.
Even better would be the laptop speakers of someone else.
Maybe a convicted criminal who needs this kind of low-level irritation as part of their state-mandated punishment?
Or Apple bluetooth devs needing operant conditioning.
I *particularly* do not want beeps on audio outputs to which they are not invited and have never been invited.

macOS audio alerts, though, are rude intruders. For me, every time a new bluetooth audio device or HDMI device is connected (or reconnects because a bird flies past or an angel sighs), macOS will use it for error beeps and miscellaneous notifications. The OS is a sex pest when it comes to bluetooth devices, constantly attempting to interfere with them by non-consensual error-beep interference. This is incredibly uncomfortable to be around.

Or: A macOS beep is a guy who gets in the elevator with me, farts, then leaves again before the door closes.
Why did that guy come here just to fart?
He did not get anything out of it.
Could he not have farted somewhere else?
I could have tolerated him farting if he had not made it weird, but now it’s weird because he came in here and farted pointedly*at* me so I’m kind of*obligated* to be offended I guess?

Here are two apps that claim to prevent macOS from imposing its error beeps on random devices:

- ~~Audio Profile Manager~~ (USD 5): does not do this thing. I tried. It managed other system sounds OK but apparently cannot stop system alert beeps playing from inappropriate devices. macOS will still frantically switch to whichever bluetooth device is most irritating.
- SoundSource (USD 43): I am not yet psychologically prepared to spend USD 43, because something in my heart feels there must be another way.

I should probably secure my mac. See the macOS Security and Privacy Guide.

As seen in typography. **tl;dr**: `Cmd Ctrl Space`.

```
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" \
-o PATH/TO/YOUR/MERGED/FILE.pdf \
/PATH/TO/ORIGINAL/1.pdf \
/PATH/TO/ANOTHER/2.pdf \
/PATH/TO/A/WHOLE/DIR/*.pdf
```

**UPDATE**: now that Python no longer ships with macOS, this Python 2 script is still present but tedious to execute.

If you are, e.g., concatenating chapters of a PDF book you downloaded from e.g. Springer, the creation dates might be in the correct order even when the filenames sort lexicographically in the wrong order. In that case you want (fish shell style):

```
pushd PATH/TO/PDFS
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" \
-o PATH/TO/YOUR/MERGED/FILE.pdf \
(ls -rUt)
```

There is some kind of problem with spaces in pathnames if I do it from a different folder.

I didn’t double click, honest guv.

That would be spring loading, which one can turn off.

`ssh-agent` died

```
sudo launchctl unload /System/Library/LaunchDaemons/ssh.plist
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
```

Which worked for me. It is not ideal to require root permissions to share user keys.

`influx6` asserts we can Restart SSH on Mac Terminal like this (also requiring root permission, obviously):

```
sudo launchctl stop com.openssh.sshd
sudo launchctl start com.openssh.sshd
```

WebP support is available via WebPQuickLook:

`brew install --cask WebPQuickLook`

Like many modern casks you might need to assert trust for the developer:

`brew install --cask WebPQuickLook --no-quarantine`

qlImagesize claims more features, but I have not tried it.

iPreview (“a powerful and easy-to-use Quick Look extension”, on the Mac App Store) enhances Quick Look with a variety of image formats including AVIF, WebP and EPS.

dreampiggy/AVIFQuickLook: AVIF QuickLook plugin on macOS handles AVIF for free.

```
brew install avifquicklook
xattr -d -r com.apple.quarantine ~/Library/QuickLook/AVIFQuickLook.qlgenerator
```

Sometimes I do not wish to click on buttons.
Tab-navigation of dialogue boxes is disabled by default, but there is a keyboard shortcut setting
which enables it: **System Preferences → Keyboard → Shortcuts → Full Keyboard Access… → All Controls**.

`brew install --cask WebPQuickLook --no-quarantine`

Setting `HOMEBREW_CASK_OPTS` could automatically approve some apps, although then you are casting aside a useful security feature:

`export HOMEBREW_CASK_OPTS="--no-quarantine"`

e.g. to upper case / lower case / HTML / markdown. There was this “services menu” technology which looked like it was going to do this for a while, but that is no longer fashionable? 🤷♂ These days it seems one uses a third-party app.

Apple has shipped a slow, incomprehensible scripting language for as long as I have been aware of them.
It’s called *AppleScript*.
After much time trying to persuade it to do things for me, I feel comfortable
asserting that using it will never save me time, although occasionally I execute
a one-liner via osascript.
There are some attempts
to in principle open up macOS’s Cocoa to, e.g., JavaScript via jstalk
or Python via PyObjC,
but that’s a brutally low level to do things from unless you are an app
developer, and the documentation is even worse. (And then why not just use Swift?)

However, people seem to have converged on the UIKit accessibility API as a reasonable interface to script.

On this front, try Hammerspoon or Phoenix. pyatomac does the same for Python but seems designed for UI testers, not UI users. You can do it with AppleScript of course, if your heart runneth over with bituminous hate.

It stays stuck on “importing photos”?
Because you don’t use iCloud, right?
For one thing, why would you voluntarily put your private photos in the hands of some notoriously secretive third party?
For another, even if you wanted to, if you live in a bandwidth-poor country,
iCloud sync is not just bad, it’s*comedically* bad.
Atrociously, OS-floggingly slow and glitchy. “iClod”, let us call it.
So you sync using a cable and iTunes, of course.

Except that, every couple of days, that breaks. Here’s how to fix it. (Note that this question refers to “iPhoto”, but the same bug has been faithfully carried over and reproduced in Aperture and Photos by diligent Apple devs, and the same fix works.)

Quit iTunes

`rm -rf ~/Pictures/Photos\ Library.photoslibrary/iPod\ Photo\ Cache/`

Note that this will reset iTunes to*not* sync your images, so you might need to
reconstruct your settings.

Do that, try again.

If you are reading this, you are enough of a geek to need Xcode.

Run this:

`xcode-select --install`

Hmm, who knows how this works on the latest versions?

But the traditional advice is:

`ulimit -S -n 2048`

`wget https://github.com/wilsonmar/mac-setup/raw/master/configs/limit.maxproc.plist https://github.com/wilsonmar/mac-setup/raw/master/configs/limit.maxfiles.plist`

`plist` files are opaque binary messes

This is how to convert them to text, if you regard XML as text.
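On macOS itself, `plutil -convert xml1 file.plist` rewrites a binary plist as XML in place. From Python, the standard-library `plistlib` module handles both forms; this sketch (the preference keys are my own toy examples, not real settings) round-trips a dictionary through the binary and XML representations:

```python
import plistlib

# a toy preference dictionary (illustrative keys, not real macOS settings)
prefs = {"ExampleShowAllFiles": True, "ExampleKeyRepeat": 15}

binary_blob = plistlib.dumps(prefs, fmt=plistlib.FMT_BINARY)      # the opaque mess
xml_text = plistlib.dumps(prefs, fmt=plistlib.FMT_XML).decode()   # readable, if you regard XML as readable

assert plistlib.loads(binary_blob) == prefs   # lossless round trip
assert "ExampleShowAllFiles" in xml_text
```

Handy for inspecting a binary plist on a machine without `plutil`, or for editing one programmatically before writing it back.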

Which file is crashing/hanging `$PID`?

Perpetual monitoring natively:

`fs_usage $PID | grep /path/to/file`

or the classic unix way:

`lsof -r -p $PID | grep /path/to/file`

See here for some tips on debugging runaway/hung/exploded processes. “See what syscalls does the process actually try to do and if there are any failed ones (status is not 0)”:

`sudo dtruss -p PID`

Case study: `distnoted` and `lsd`.

`lsd` runaway CPU

Some indexing jobs can cause it to choke, e.g. application bundles. Also, a corrupted database, which may be fixed thusly:

```
/System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Support/lsregister \
-kill -r -domain local -domain system -domain user ; killall Dock
```

This is some kind of notifications daemon. I have no idea why it is out of control. It seems to be related to other processes such as certain versions of Flux or Bartender, but the problem only seems to occur when my backup drive is plugged in. Hmmph. File a ticket?

Michael Rourke suggests doing this every minute:

```
#!/bin/sh
# check for runaway distnoted, kill if necessary
PATH=/bin:/usr/bin
export PATH
ps -reo '%cpu,uid,pid,command' |
awk -v UID=$UID '
/distnoted agent$/ && $1 > 100.0 && $2 == UID {
system("kill -9 " $3)
}
'
```

But note this will break backup, so maybe just don’t.

For certain problems you need to reset the SMC and PRAM, which are formally referred to collectively as the `bootloadey whodangle`.
This seemed to break often for me.
I have an inkling it is no longer a thing for modern Macs.

Symptoms include:

- machine doesn’t boot
- CPU fan going all the time
- machine is pausing lots
- having trouble getting laid
- a global geopolitical malaise is leading to the ineluctable slide of civilisation into ecosocial catastrophe

Do these things:

- Reset the SMC: Switch the computer off, then, while off, on the built-in keyboard, press the (left side) Shift-Control-Option keys and the power button at the same time.
- Reset the PRAM: Switch the computer off, then, while booting, press and hold the Option-Command-P-R keys until the startup sound chimes again.
- Build a small pyramid over your laptop from bronze and crystals. Burn some incense and your Applecare guarantee in a brazier atop it. Surround it with small pictures of your departed ancestors. Make an offering of fruit and prayer.

- Alt: offer a boot menu
- C: boot up off CD/USB
- Command-R: Recovery OS
- Shift: Safe mode
- Command-V: verbose mode
- Command-S: single-user prompt

See also network hacks for some non-mac-specific ones.

My previous awful router would get cross if I left the house and then tried to reuse the same DHCP lease when I came back. But it’s a one-liner to fix:

```
sudo ipconfig set en0 DHCP
ipconfig getpacket en0
```

odgard notes that some things are slow in macOS 10.15+. Fixes suggested include:

```
sudo spctl developer-mode enable-terminal
sudo spctl --master-disable
```

See also Jeff Johnson’s analysis.

Turn off wifi on your macbook from the macOS terminal command line:

`networksetup -setairportpower en0 off`

Turn on wifi on your macbook from the macOS terminal command line:

`networksetup -setairportpower en0 on`

List available wifi networks from the macOS terminal command line:

`/System/Library/PrivateFrameworks/Apple80211.framework/Versions/A/Resources/airport scan`

Join a wifi network from the macOS terminal command line:

`networksetup -setairportnetwork en0 WIFI_SSID WIFI_PASSWORD`

Find your network interface name:

`networksetup -listallhardwareports`

Oh no! Did you use your computer in some wacky workplace network whose DNS blocks “frivolous” websites? You need to flush the DNS cache.

The DNS flush command keeps changing, eh?:

```
sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder
```

More generally, avoid the problem with upgraded DNS.

*Stop stealing focus from me, slow app, I clicked on you like 30 seconds ago.*

CNET says: How to keep applications from stealing focus. But their first idea (edit the application, breaking its code signature) is not viable in the modern world.

The command-line background-open still works, although if I wanted to launch apps this way I would be using Linux.

`open -ga iCal`

The Apple supported solution is for you to buy a faster computer.

Never install Skype! Skype is spookware. If you must use it, use the web version. UPDATE: is MS Teams built on Skype?

`sudo defaults write /System/Library/LaunchAgents/com.apple.gamed disabled -bool true`

or:

`launchctl unload -w /System/Library/LaunchAgents/com.apple.gamed.plist`

in reference to this.

Then an hour later it forgets again again?

Woe! I fixed this once then I forgot how I did it.

Linkdump while I sort it out again again again:

Continuous [sic] request for the CalDAV password? Here’s one solution:

- Go to the Apple menu and choose System Preferences
- Choose the ‘iCloud’ preference pane
- Sign in to iCloud at the OS X preference panel — note if you’re already signed in but still seeing the pop-up message, you can sign out then sign back in to stop that password prompt from happening again
- Close System Preferences

Harder than it should be;
Google *really* wants you to use their furtively modified alternate branch,
Google Chrome.

I can’t even remember why I needed to do this, or how I worked it out, but geez it saved my bacon from something or other.

`ipcs -s | grep " 0x" | awk '{ print $2; }' | xargs -n 1 ipcrm -s`

`tmutil` is the command-line app which allows you to do proper monitoring and control of the Time Machine service.
It includes, for example, backup statistics.

There is also Time Machine editor, which controls various Time Machine settings, including the local snapshots discussed below. If you are using an oldish machine without a fast drive then this is useful.

I think this is largely a cosmetic issue, but macOS keeps a local copy of various versions of your files around, called local snapshots or mobile backups depending where you look.

If, like me, you version everything in git, this is mostly annoying, but also, I think, harmless, as they are self-cleaning. Nonetheless stuff did go wrong for me. I noticed that my Spotlight indexing was stuck on the MobileBackups folder for some reason. Why was it even indexing that?

What Crap Is This: OSX’s Mobilebackups:

`sudo tmutil disablelocal`

By default, if you use notifications from Apple Calendar,
the notification for events is at 9am,
right in the middle of the first meeting of the day.
So you are half way through the report-back presentation
about your recent conference visit,
your laptop pops up**RECTAL EXAMINATION TODAY**.

You aren’t supposed to be able to change this because the thought of this cruelty is all that gets jaded Apple executives out of bed in the morning, but there is a hack they forgot to stop.

Are you an inveterate remixer? Do you have a taste for collage? A blog you wish to illustrate with public domain images?

Here I note repositories of legally available content for remixing and mashing up. (Check for remix rights in your local jurisdiction, I ain’t no lawyer)

This is what I actually do with my life, so I have too many opinions to fit here. See sample libraries, musical corpora.

In descending order of addictiveness.

**UPDATE**: ⚠️This account was deleted without warning, all the content and annotations are gone, and the internet is a poorer place 🪦.

**UPDATE 2**: Account reinstated.

Internet Archive book images on Flickr are my favourite, and have an elegant origin story. The quirky and serendipitous search makes for amazing image finding.

They were my primary source of illustrations on this blog, although I switched to Picryl when they were deleted, and removed many links from this blog. For example, Jan David’s 1603 classic Christeliicken waerseggher images are wildly tripped out. (IPL link.) AFAICT they have stopped updating Flickr though and you might need to read books for the latest ones.

Founded in 2011, The Public Domain Review is an online journal and not-for-profit project dedicated to the exploration of curious and compelling works from the history of art, literature, and ideas.

Similar to BibliOdyssey, but with more scrupulous licensing. Annoyingly you usually can’t click through to the correct page in the source books.

Favourites:

- Cat Pianos, Sound-Houses, and Other Imaginary Musical Instruments
- Redressing the Balance: Levinus Vincent’s Wonder Theatre of Nature
- Experiments and Observations in a Heated Room (1774)
- Music of the Squares: David Ramsay Hay and the Reinvention of Pythagorean Aesthetics

They do art objects

Paul K, a.k.a. BibliOdyssey, used to lovingly hand-curate classic book images. Peking opera figures? Baltic Heraldry? The Astrolabe Molluscs? No longer active, but still worth checking out.

If your tastes are very specific, about weird, allegorical proto-comics from the 1600s, the Emblem Project Utrecht has you covered.
Has *us* covered, I should say.

The Dutch Rijksmuseum is amazing, with high quality scans of lots of historical stuff.
Of course it skews heavily, well, Dutch.
But if you have a taste for engravings and lithography of the 1600s and 1700s then … well the Dutch were doing a*lot* of that.
My favourite starting collection is mysteries.

Google Arts and Culture indexes some collections, including e.g. the Rijksmuseum. The search function is inaccurate but serendipitous; I usually find something better than what I searched for.

Unsplash is community driven copyleft photos. Less classic archival stuff, more stock-photo replacement. This is occasionally what I want.

Picryl is a paid service that indexes public domain images. High resolution images are pay-for-download and the rest are free. I could give them USD7/month for the high-resolution versions, or I could use their search index and then use a reverse image search such as TinEye to find a higher resolution version. Occasionally that way I find one that is higher resolution than even the paid-for Picryl copy.

The search function is reasonably good. The UX is rough in spots — don't get me started on their horrible date range selector. When downloading images I need to click through a remarkable number of things, and do some mandatory tagging. Maybe that goes away if I pay?

We could think of it as a pay-to-play substitute for the sadly lamented Internet Public Library Book Images project. I cannot quite justify paying their monthly rate yet (I have SO MANY MONTHLY SERVICES), but we shall see if they wear me down.

The Smithsonian Institution images were a high profileopen access launch. Have not tested them in practice.

Digital Commonwealth is a non-profit collaborative organization, founded in 2006, that provides resources and services to support the creation, management, and dissemination of cultural heritage materials held by Massachusetts libraries, museums, historical societies, and archives. Digital Commonwealth currently has over 200 member institutions from across the state.

This site, managed by the Boston Public Library, provides access to thousands of images, documents, and sound recordings that have been digitized by member institutions so that they may be available to researchers, students, and the general public.

Some really nice ones in there, e.g. Collection: 19th Century American Trade Cards.

British Library collection is useful if a little strait-laced. They had a conservative collection policy, or a conservative upload policy. It’s hard to find lurid, prurient, or provocative images, but there are some very beautiful and edifying ones, if that is your thing.

Art! Art! Art! Paris Musées have 100,000 artworks available for use, apparently.

Metmuseum Open Access has a 400,000-strong open access public domain art image collection. Resolution tends to be low. Photographs professional, though.

- British Museum collection.
- Old book illustrations lithograph curation is good, although you must pay $15 a month to get access to many full-res versions.
- Library of Congress Picture collection.
- University of Chicago overview.
- Biodiversity Heritage Library has nature sketches and such.
- NYPL collections are good but not all full resolution without a fee.

- 1930s-40s in Color | Flickr
- J.J. Grandville (See also Grandville, Visions, and Dreams – The Public Domain Review, esp. Un autre monde on Internet Archive)
- Thomas Wright’s An Original Theory or New Hypothesis of the Universe (1750) – The Public Domain Review
- WW II posters
- Athanasius Kircher light and shadow, WDB - Wolfenbütteler Digitale Bibliothek - drucke/94-2-quod-2f
- Nicolas Louis Delaunois+++
- Fludd, Robert, Utriusque cosmi maioris scilicet et minoris metaphysica, physica atque technica historia : in duo volumnina secundum cosmi differentiam diuisa
- Kircher, Athanasius, Musurgia Universalis Sive Ars Magna Consoni Et Dissoni : In X. Libros Digesta : Quà Vniuersa Sonorum doctrina, & Philosophia, Musicaeque tam Theoricae, quam practicae scientia, summa varietate traditur
- WALTER DRAESNER - schattenrisss Webseite!
- Suzanne Gerber - Giving Taste A Bad Name Since Kindergarten» Blog Archive » A Mid-November Dance with Death
- Digitale Sammlungen / Bilder des Todes oder Todtentanz für alle Stände
- Digitale Sammlungen / Ein Totentanz [1]
- Digitale Sammlungen / Ein moderner Toten... [1]

One of the many wonderful features of the Internet Public Library is not its browsing page, which is a mess. I laboriously opted in to all the old (hopefully uncopyrighted) books by clicking checkboxes. Start from here to avoid disappointment.

Great content, but not always a great online-first experience.

Thousands of items from our collections have been digitised, and copies are freely accessible online. Our digital collections cover a wide variety of topics, and are particularly strong in the areas of mental health, sex and sexual health, genetics, public health, and 19th-century books.

If you’re a member of the library, you also have access to many of our subscription databases and resources with your library card.

Digitised materials from our collections can be accessed and downloaded for use under a variety of Creative Commons non-commercial, attribution and Public Domain licences, depending on the material.

You can find all our digital items in the catalogue. An option to limit your search to online material appears once you have entered a search term. You can also search for digital images in the “Images” tab on the catalogue.

e-rara, the incredible library of ultra-high-resolution, lovingly digitised manuscripts and prints.

viaLibri is a search engine for actual physical books. It sometimes includes lavish high-resolution previews. Try, e.g.,*Rare books from 1621*.

Not a collection*per se*, rather, an obsession of mine.

- undraw is one source. From the Hacker n00bs thread we also find…
- The Noun project
- FreeSVG
- Ouch! is a bunch of quirky/chunky illustrations of common situations for your quirky/chunky website.
- Humaaans provides paper-doll identikit people
- So does Fresh Folk.
- illustrations.co is a 100-illustrations-in-100-days thing
- Isometric does isometric illustrations.
- Glaze: stock vector illustrations via crowdsourcing and profit-share
- Lukasz Adam is a productive solo illustrator’s loss-leader

- Redwood data,A Large Dataset of Object Scans
- Smithsonian 3d models

- World’s cinema has copies of many art-house films. Provenance unclear and thus suspect.
- content on torrent aggregators like torrends.to, thepiratebay.net, demonoid.is, ww1.1337x.buzz/1337x.to and rarbg.to is not
*necessarily* going to be violating copyright, but in practice, 90% of the content one finds there will be in violation, and it is best to steer clear.

- Open Culture tracks “675 Free Movies, 550 Free Audio Books, 600 Free eBooks, 170 Free Textbooks, 300 Free Language Lessons…”. AFAICT everything here is legit but once again, be careful and check the laws in your jurisdiction.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Notoriously, GP regression scales badly with dataset size, requiring us to invert a matrix full of observation covariances. But inverting a matrix is just solving a least-squares optimisation, when you think about it. So can we solve it by gradient descent and have it somehow come out cheaper? Can we incorporate updates from only some margins at a time and still converge to the same answer? More cheaply? Maybe.

Chen et al. (2020) show that we can optimise hyperparameters by subsampling sites and performing SGD, if the kernels are smooth enough and the batch sizes large enough. At inference time we are still in trouble though.

Minh (2022) bounds the Wasserstein distance when we subsample observations and sites. E.g. their Algorithm 5.1 goes:

Input: Finite samples \(\left\{\xi_k^i\left(x_j\right)\right\}\) from \(N_i\) realizations \(\xi_k^i, 1 \leq k \leq N_i\), of processes \(\xi^i, i=1,2\), sampled at \(m\) points \(x_j, 1 \leq j \leq m\)

Procedure:

- Form \(m \times N_i\) data matrices \(Z_i\), with \(\left(Z_i\right)_{j k}=\xi_k^i\left(x_j\right), i=1,2,1 \leq j \leq m, 1 \leq k \leq N_i\)
- Compute \(m \times m\) empirical covariance matrices \(\hat{K}^i=\frac{1}{N_i} Z_i Z_i^T, i=1,2\)
- Compute \(W=W_2\left[\mathcal{N}\left(0, \frac{1}{m} \hat{K}^1\right), \mathcal{N}\left(0, \frac{1}{m} \hat{K}^2\right)\right]\) according to \[ W_2^2\left(\nu_0, \nu_1\right)=\left\|m_0-m_1\right\|^2+\operatorname{tr}\left(C_0\right)+\operatorname{tr}\left(C_1\right)-2 \operatorname{tr}\left(C_0^{1 / 2} C_1 C_0^{1 / 2}\right)^{1 / 2}\]

This is pretty much as we would expect to do it naively by plugging in the sample estimates of the target quantity, except they provide bounds for the quality of the estimate we get. AFAICS the bounds are trivial for Wasserstein in infinite-dimensional Hilbert spaces. But if we care about Sinkhorn divergences they seem to have useful bounds?
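That plug-in estimator is easy to sketch in numpy (the helper name `empirical_w2` is mine, and I am assuming centred processes as in the algorithm above):

```python
import numpy as np
from scipy.linalg import sqrtm

def empirical_w2(Z1, Z2):
    """Plug-in 2-Wasserstein distance between N(0, K1/m) and N(0, K2/m),
    where K_i is the empirical covariance over the m sample sites."""
    m = Z1.shape[0]
    K1 = (Z1 @ Z1.T) / (Z1.shape[1] * m)
    K2 = (Z2 @ Z2.T) / (Z2.shape[1] * m)
    r1 = sqrtm(K1).real
    cross = sqrtm(r1 @ K2 @ r1).real
    w2sq = np.trace(K1) + np.trace(K2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(w2sq, 0.0)))  # clip tiny negative round-off

rng = np.random.default_rng(1)
Z1 = rng.standard_normal((4, 200))  # m=4 sites, N=200 realisations
Z2 = 2.0 * rng.standard_normal((4, 200))
d = empirical_w2(Z1, Z2)
```

Identical sample matrices give (numerically) zero distance; rescaled processes give a strictly positive one.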

Anyway, sometimes subsampling is OK, it seems, if we want to approximate some GP in Sinkhorn divergence. Does this tell us anything about the optimisation problem?

Song et al. (2019) maybe? I wonder if the Sliced Score Matching approach is feasible here? It works great in diffusion.

Chen, Hao, Lili Zheng, Raed Al Kontar, and Garvesh Raskutti. 2020.“Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes.” In*Proceedings of the 34th International Conference on Neural Information Processing Systems*, 2722–33. NIPS’20. Red Hook, NY, USA: Curran Associates Inc.

Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In*Proceedings of the 32nd International Conference on Neural Information Processing Systems*, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.

Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013.“Gaussian Processes for Big Data.” In*Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence*, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.

Minh, Hà Quang. 2022.“Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.”*SIAM/ASA Journal on Uncertainty Quantification*, February, 96–124.

Song, Yang, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. 2019.“Sliced Score Matching: A Scalable Approach to Density and Score Estimation.” arXiv.

Ubaru, Shashanka, Jie Chen, and Yousef Saad. 2017.“Fast Estimation of\(tr(f(A))\) via Stochastic Lanczos Quadrature.”*SIAM Journal on Matrix Analysis and Applications* 38 (4): 1075–99.

Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019.“Exact Gaussian Processes on a Million Data Points.” In*Advances in Neural Information Processing Systems*, 32:14648–59. Red Hook, NY, USA.

Many facts about the useful, boring, ubiquitous Gaussian. Djalil Chafaï lists Three reasons for Gaussians, emphasising more abstract, not-necessarily generative reasons.

- Gaussians as isotropic distributions — a Gaussian is the only distribution that can be both marginally independent and isotropic.
- Entropy maximizing (the Gaussian has the highest entropy out of any distribution with fixed variance and finite entropy)
- The only stable distribution with finite variance

Many other things give rise to Gaussians;
sampling distributions for test statistics, bootstrap samples, low dimensional projections, anything with the right Stein-type symmetries…
There are many *post hoc* rationalisations that use the Gaussian in the hope that it is close enough to the real distribution: such as when we assume something is a Gaussian process because they are tractable, or seek a noise distribution that will justify quadratic loss, or when we use Brownian motions in stochastic calculus because it comes out neatly, and so on.

The standard (univariate) Gaussian pdf is\[ \psi:x\mapsto \frac{1}{\sqrt{2\pi}}\text{exp}\left(-\frac{x^2}{2}\right). \] Typically we allow a scale-location parameterised version\[ \phi(x; \mu,\sigma ^{2})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}} \] We call the CDF\[ \Psi:x\mapsto \int_{-\infty}^x\psi(t) dt. \] In the multivariate case, where the covariance\(\Sigma\) is strictly positive definite, we can write a density of the general normal distribution over\(\mathbb{R}^k\) as\[ \psi({x}; \mu, \Sigma) = (2\pi )^{-{\frac {k}{2}}}\det(\Sigma)^{-\frac{1}{2}}\,\exp ({-\frac{1}{2}( x-\mu)^{\top}\Sigma^{-1}( x-\mu)}) \] If a random variable\(Y\) has a Gaussian distribution with parameters\(\mu, \Sigma\), we write\[Y \sim \mathcal{N}(\mu, \Sigma)\]
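As a sanity check of the multivariate density formula above, here is a throwaway implementation compared against scipy (the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_logpdf(x, mu, Sigma):
    # log psi(x; mu, Sigma) = -k/2 log(2 pi) - 1/2 log det Sigma
    #                         - 1/2 (x - mu)^T Sigma^{-1} (x - mu)
    k = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, -1.0])
assert np.isclose(gaussian_logpdf(x, mu, Sigma),
                  multivariate_normal(mu, Sigma).logpdf(x))
```

Note the `solve` instead of an explicit inverse, which is the numerically sane way to evaluate the quadratic form.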

Taylor expansion of \(e^{-x^2/2}\):\[ e^{-x^2/2} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{2^k\, k!}. \]
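A quick check that the partial sums of that series do converge to the Gaussian kernel (throwaway code, truncation length mine):

```python
import math

def gauss_taylor(x, n_terms=30):
    # partial sum of e^{-x^2/2} = sum_k (-1)^k x^{2k} / (2^k k!)
    return sum((-1) ** k * x ** (2 * k) / (2 ** k * math.factorial(k))
               for k in range(n_terms))

assert abs(gauss_taylor(1.5) - math.exp(-1.5 ** 2 / 2)) < 1e-12
```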

\[\begin{aligned} \nabla_{x}\log\psi({x}; \mu, \Sigma) &= \nabla_{x}\left(-\frac{1}{2}( x-\mu)^{\top}\Sigma^{-1}( x-\mu) \right)\\ &= -( x-\mu)^{\top}\Sigma^{-1} \end{aligned}\]

Mills’ ratio is \((1 - \Phi(x))/\phi(x)\) and is a workhorse for tail inequalities for Gaussians. See the review and extensions of classic results in Dümbgen (2010), found via Mike Spivey. Check out his extended justification for the classic identity

\[ \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt \leq \int_x^{\infty} \frac{t}{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt = \frac{e^{-x^2/2}}{x\sqrt{2\pi}}.\]
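Numerically, that upper tail bound holds for any \(x>0\) and tightens as \(x\) grows (a quick check using the stdlib `erfc`, test points mine):

```python
import math

def upper_tail(x):
    # 1 - Phi(x), computed stably via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mills_bound(x):
    # classic upper bound e^{-x^2/2} / (x sqrt(2 pi)), valid for x > 0
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

for x in [0.5, 1.0, 2.0, 4.0]:
    assert upper_tail(x) <= mills_bound(x)
```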

First, trivially,\(\phi'(x)=-\frac{e^{-\frac{x^2}{2}} x}{\sqrt{2 \pi }}.\)

Meckes (2009) explains Stein (1972)’s characterisation:

The normal distribution is the unique probability measure\(\mu\) for which\[ \int\left[f^{\prime}(x)-x f(x)\right] \mu(d x)=0 \] for all\(f\) for which the left-hand side exists and is finite.

This is incredibly useful inprobability approximation by Gaussians where it justifiesStein’s method.

\[\begin{aligned} \sigma ^2 \phi'(x)+\phi(x) (x-\mu )&=0, \text{ i.e. the annihilating operator is}\\ L &=\sigma^2 D+x-\mu\\ \end{aligned}\]

With initial conditions

\[\begin{aligned} \phi(0) &=\frac{e^{-\mu ^2/(2\sigma ^2)}}{\sqrt{2 \sigma^2\pi } }\\ \phi'(0) &=\frac{\mu}{\sigma^2}\phi(0) \end{aligned}\]
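A finite-difference sanity check that the scale-location density does satisfy \(\sigma^2\phi'(x)+\phi(x)(x-\mu)=0\) (throwaway code; the parameter values \(\mu=1, \sigma=2\) are mine):

```python
import math

MU, SIGMA = 1.0, 2.0

def phi(x):
    # scale-location Gaussian density
    return math.exp(-(x - MU) ** 2 / (2 * SIGMA ** 2)) / math.sqrt(2 * math.pi * SIGMA ** 2)

def dphi(x, h=1e-6):
    # central finite difference of phi
    return (phi(x + h) - phi(x - h)) / (2 * h)

# sigma^2 phi'(x) + phi(x)(x - mu) should vanish for all x
for x in [-1.0, 0.0, 0.5, 3.0]:
    assert abs(SIGMA ** 2 * dphi(x) + phi(x) * (x - MU)) < 1e-8
```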

🏗 note where I learned this.

From (Steinbrecher and Shaw 2008) via Wikipedia.

Let us write \(w:=\Psi^{-1}\) to keep the notation clear.

\[\begin{aligned} {\frac {d^{2}w}{dp^{2}}} &=w\left({\frac {dw}{dp}}\right)^{2}\\ \end{aligned}\]

With initial conditions

\[\begin{aligned} w\left(1/2\right)&=0,\\ w'\left(1/2\right)&={\sqrt {2\pi }}. \end{aligned}\]
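Integrating that ODE numerically from \(p=1/2\) does recover the quantile function (a quick scipy check of my own; the evaluation point \(p=0.9\) is arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

# w'' = w (w')^2, written as a first-order system in (w, w')
def rhs(p, y):
    w, dw = y
    return [dw, w * dw ** 2]

# initial conditions w(1/2) = 0, w'(1/2) = sqrt(2 pi)
sol = solve_ivp(rhs, [0.5, 0.9], [0.0, np.sqrt(2 * np.pi)],
                rtol=1e-10, atol=1e-12)
w_at_09 = sol.y[0, -1]
# agrees with the standard normal quantile norm.ppf(0.9)
```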

Z. I. Botev, Grotowski, and Kroese (2010) note

\[\begin{aligned} \frac{\partial}{\partial t}\phi(x;t) &=\frac{1}{2}\frac{\partial^2}{\partial x^2}\phi(x;t)\\ \phi(x;0)&=\delta(x-\mu) \end{aligned}\]

Look, it’s the diffusion equation of the Wiener process. Surprise! If you think about this for a while you end up discovering the Feynman-Kac formula.

For small \(p\), the quantile function has the asymptotic expansion\[ \Phi^{-1}(p) = -\sqrt{\ln\frac{1}{p^2} - \ln\ln\frac{1}{p^2} - \ln(2\pi)} + o(1). \]
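How good is that expansion in practice? A quick comparison against scipy’s quantile function (test points and tolerance are mine):

```python
import math
from scipy.stats import norm

def ppf_asymptotic(p):
    # Phi^{-1}(p) ~ -sqrt( ln(1/p^2) - ln ln(1/p^2) - ln(2 pi) ) for small p
    L = math.log(1.0 / p ** 2)
    return -math.sqrt(L - math.log(L) - math.log(2 * math.pi))

# already within a few hundredths deep in the tail
for p in [1e-8, 1e-12, 1e-16]:
    assert abs(ppf_asymptotic(p) - norm.ppf(p)) < 0.05
```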

Polynomial basis? You want the Hermite polynomials.

🏗

Univariate:

\[\begin{aligned} \left\| \frac{d}{dx}\phi_\sigma \right\|_2 &= \frac{1}{4\sqrt{\pi}\sigma^3}\\ \left\| \left(\frac{d}{dx}\right)^n \phi_\sigma \right\|_2 &= \frac{\prod_{i<n}2n-1}{2^{n+1}\sqrt{\pi}\sigma^{2n+1}} \end{aligned}\]

The normal distribution is the least “surprising” distribution in the sense that out of all distributions with a given mean and variance the Gaussian has the maximum entropy. Or maybe that is the most surprising, depending on your definition.

Linear transforms of Gaussians are especially convenient. You could say that this is a definitional property of the Gaussian. Because we have learned to represent so many things by linear algebra, this means the pairing with Gaussians is a natural one. As made famous by Gaussian process regression in Bayesian nonparametrics.

See, e.g. these lectures, or Michael I. Jordan’s backgrounders.

In practice I look up my favourite useful Gaussian identities in Petersen and Pedersen (2012) and so does everyone else I know.

The Fourier transform/Characteristic function of a Gaussian is still Gaussian.

\[\mathbb{E}\exp (i\mathbf{t}\cdot \mathbf {X}) =\exp \left( i\mathbf {t} ^{\top}{\boldsymbol {\mu }}-{\tfrac {1}{2}}\mathbf {t} ^{\top}{\boldsymbol {\Sigma }}\mathbf {t} \right).\]

Since Gaussian approximations pop up a lot in e.g. variational approximation problems, it is nice to know how to relate them in probability metrics. See distance between two Gaussians.

This *erf*, or *error function*, is a rebranding and reparameterisation of the
standard univariate normal CDF, popular in computer science; it trades the ambiguity you are used to with the “normal” density for a slightly different one.
There are scaling factors tacked on.

\[ \operatorname{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^x e^{-t^2} \, dt \] which is to say\[\begin{aligned} \Phi(x) &={\frac {1}{2}}\left[1+\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)\right]\\ \operatorname {erf}(x) &=2\Phi (\sqrt{2}x)-1\\ \end{aligned}\]
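Those conversion identities are one-liners with the stdlib `math.erf` (a minimal sketch; helper names mine):

```python
import math

def phi_cdf(x):
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def erf_from_phi(x):
    # erf(x) = 2 Phi(sqrt(2) x) - 1
    return 2.0 * phi_cdf(math.sqrt(2.0) * x) - 1.0

assert abs(phi_cdf(0.0) - 0.5) < 1e-15                      # median at zero
assert abs(phi_cdf(1.96) - 0.9750021) < 1e-6                # the familiar 97.5% point
assert abs(erf_from_phi(1.0) - math.erf(1.0)) < 1e-15       # round trip agrees
```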

See matrix Gaussian.

Botev, Z. I. 2017.“The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 79 (1): 125–48.

Botev, Z. I., J. F. Grotowski, and D. P. Kroese. 2010.“Kernel Density Estimation via Diffusion.”*The Annals of Statistics* 38 (5): 2916–57.

Botev, Zdravko, and Pierre L’Ecuyer. n.d.“A Tool for Efficient Simulation from the Truncated Multivariate Normal and Student’s Distributions,” 7.

Dümbgen, Lutz. 2010.“Bounding Standard Gaussian Tail Probabilities.”*arXiv:1012.2063 [Math, Stat]*, December.

Givens, Clark R., and Rae Michael Shortt. 1984.“A Class of Wasserstein Metrics for Probability Distributions.”*The Michigan Mathematical Journal* 31 (2): 231–40.

Gupta, A. K., and D. K. Nagar. 2018.*Matrix Variate Distributions*. CRC Press.

Magnus, Jan R., and Heinz Neudecker. 2019.*Matrix differential calculus with applications in statistics and econometrics*. 3rd ed. Wiley series in probability and statistics. Hoboken (N.J.): Wiley.

Majumdar, Rajeshwari, and Suman Majumdar. 2019.“On the Conditional Distribution of a Multivariate Normal Given a Transformation – the Linear Case.”*Heliyon* 5 (2): e01136.

Meckes, Elizabeth. 2009.“On Stein’s Method for Multivariate Normal Approximation.” In*High Dimensional Probability V: The Luminy Volume*, 153–78. Beachwood, Ohio, USA: Institute of Mathematical Statistics.

Minka, Thomas P. 2000.*Old and new matrix algebra useful for statistics*.

Petersen, Kaare Brandt, and Michael Syskind Pedersen. 2012.“The Matrix Cookbook.”

Richards, Winston A., Robin S, Ashok Sahai, and M. Raghunadh Acharya. 2010.“An Efficient Polynomial Approximation to the Normal Distribution Function and Its Inverse Function.”*Journal of Mathematics Research* 2 (4): p47.

Roy, Paramita, and Amit Choudhury. 2012.“Approximate Evaluation of Cumulative Distribution Function of Central Sampling Distributions: A Review.”*Electronic Journal of Applied Statistical Analysis* 5 (1).

Stein, Charles. 1972.“A Bound for the Error in the Normal Approximation to the Distribution of a Sum of Dependent Random Variables.”*Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory*, January, 583–602.

———. 1986.*Approximate Computation of Expectations*. Vol. 7. IMS.

Steinbrecher, György, and William T. Shaw. 2008.“Quantile Mechanics.”*European Journal of Applied Mathematics* 19 (2): 87–112.

Strecok, Anthony. 1968.“On the Calculation of the Inverse of the Error Function.”*Mathematics of Computation* 22 (101): 144–58.

Takatsu, Asuka. 2008.“On Wasserstein Geometry of the Space of Gaussian Measures,” January.

Wichura, Michael J. 1988.“Algorithm AS 241: The Percentage Points of the Normal Distribution.”*Journal of the Royal Statistical Society. Series C (Applied Statistics)* 37 (3): 477–84.

Zhang, Yufeng, Wanwei Liu, Zhenbang Chen, Ji Wang, and Kenli Li. 2022.“On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions.” arXiv.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Ornstein-Uhlenbeck-type autoregressive, stationary stochastic processes, e.g. stationary gamma processes, classic Gaussian-noise Ornstein-Uhlenbeck processes… There is a family of such induced by every Lévy process via its bridge.

Given a \(K \times K\) real matrix \(\Phi\) with all the eigenvalues of \(\Phi\) in the interval \((-1,1)\), and given a sequence \(\varepsilon_t\) of multivariate normal variables \(\varepsilon_t \sim \mathrm{N}(0, \Sigma)\), with \(\boldsymbol{\Sigma}\) a \(K \times K\) positive definite symmetric real matrix, the stationary distribution of the process\[ \mathbf{x}_t=\varepsilon_t+\boldsymbol{\Phi} \mathbf{x}_{t-1}=\sum_{h=0}^t \boldsymbol{\Phi}^h \varepsilon_{t-h} \] is given by the Lyapunov equation, or just by basic variance identities. It is Gaussian, \(\mathcal{N}(0, \Lambda)\), where \(\Lambda\) satisfies the recurrence\[ \Lambda=\Phi \Lambda \Phi^{\top}+\Sigma. \] The solution is also, apparently, the limit of a summation\[ \Lambda=\sum_{k=0}^{\infty} \Phi^k \Sigma\left(\Phi^{\top}\right)^k. \]
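scipy will solve that discrete Lyapunov equation directly, and we can check the answer against the truncated series (a toy example; the matrices are arbitrary illustration values of mine):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])      # spectral radius < 1, so the process is stable
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])    # innovation covariance

# stationary covariance: Lambda = Phi Lambda Phi^T + Sigma
Lam = solve_discrete_lyapunov(Phi, Sigma)

# agrees with the truncated series  sum_k Phi^k Sigma (Phi^T)^k
Lam_series = sum(
    np.linalg.matrix_power(Phi, k) @ Sigma @ np.linalg.matrix_power(Phi.T, k)
    for k in range(100)
)
assert np.allclose(Lam, Lam_series)
```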

Suppose we use a Wiener process \(W\) as the driving noise in continuous time with some small increment \(\epsilon\),\[ d \mathbf{x}(t)=-\epsilon A \mathbf{x}(t) d t+ \epsilon B d W(t). \] This is the Ornstein-Uhlenbeck process. If stable, at stationarity it has an analytic stationary density, \(\mathbf{x}\sim\mathcal{N}(0, \Lambda)\), where\[ A\Lambda + \Lambda A^{\top} =\epsilon B B^{\top}. \]
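The continuous Lyapunov equation \(A\Lambda + \Lambda A^{\top} = \epsilon BB^{\top}\) can be solved by vectorisation with a Kronecker identity, which also makes the equation easy to verify (a sketch of my own; matrices are arbitrary stable examples):

```python
import numpy as np

eps = 0.1
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])   # eigenvalues 1, 2: stable drift -eps*A
B = np.array([[1.0, 0.0],
              [0.3, 1.0]])
K = A.shape[0]
I = np.eye(K)

# with row-major vec: vec(A L) = kron(A, I) vec(L), vec(L A^T) = kron(I, A) vec(L),
# so (kron(A, I) + kron(I, A)) vec(Lambda) = eps * vec(B B^T)
M = np.kron(A, I) + np.kron(I, A)
Lam = np.linalg.solve(M, eps * (B @ B.T).ravel()).reshape(K, K)

assert np.allclose(A @ Lam + Lam @ A.T, eps * B @ B.T)
assert np.allclose(Lam, Lam.T)  # the solution is symmetric, as a covariance must be
```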

Over at Gamma processes, Wolpert (2021) notes several example constructions which “look like” Ornstein-Uhlenbeck processes, in that they are stationary-autoregressive, but constructed by different means. Should we look at processes like those here?

For fixed\(\alpha, \beta>0\) these notes present six different stationary time series, each with Gamma\(X_{t} \sim \operatorname{Ga}(\alpha, \beta)\) univariate marginal distributions and autocorrelation function\(\rho^{|s-t|}\) for\(X_{s}, X_{t} .\) Each will be defined on some time index set\(\mathcal{T}\), either\(\mathcal{T}=\mathbb{Z}\) or\(\mathcal{T}=\mathbb{R}\)

Five of the six constructions can be applied to other Infinitely Divisible (ID) distributions as well, both continuous ones (normal,\(\alpha\)-stable, etc.) and discrete (Poisson, negative binomial, etc). For specifically the Poisson and Gaussian distributions, all but one of them (the Markov change-point construction) coincide— essentially, there is just one “AR(1)-like” Gaussian process (namely, the\(\operatorname{AR}(1)\) process in discrete time, or the Ornstein-Uhlenbeck process in continuous time), and there is just one\(\operatorname{AR}(1)\)-like Poisson process. For other ID distributions, however, and in particular for the Gamma, each of these constructions yields a process with the same univariate marginal distributions and the same autocorrelation but with different joint distributions at three or more times.

Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012.“Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In*Proceedings of the 29th International Coference on International Conference on Machine Learning*, 1771–78. ICML’12. Madison, WI, USA: Omnipress.

Alexos, Antonios, Alex J. Boyd, and Stephan Mandt. 2022.“Structured Stochastic Gradient MCMC.” In*Proceedings of the 39th International Conference on Machine Learning*, 414–34. PMLR.

Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014.“Stochastic Gradient Hamiltonian Monte Carlo.” In*Proceedings of the 31st International Conference on Machine Learning*, 1683–91. Beijing, China: PMLR.

Chen, Zaiwei, Shancong Mou, and Siva Theja Maguluri. 2021.“Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization.” arXiv.

Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017.“Stochastic Gradient Descent as Approximate Bayesian Inference.”*JMLR*, April.

Simoncini, V. 2016.“Computational Methods for Linear Matrix Equations.”*SIAM Review* 58 (3): 377–441.

Wolpert, Robert L. 2021.“Lecture Notes on Stationary Gamma Processes.”*arXiv:2106.00087 [Math]*, May.

Combining Markov Chain Monte Carlo and Stochastic Gradient Descent for Bayesian inference, especially by using SGD to do some cheap version of MCMC posterior sampling. Overviews in Ma, Chen, and Fox (2015) and Mandt, Hoffman, and Blei (2017). A lot of probabilistic neural nets are built on this idea.

A related idea is estimating gradients of parameters by Monte Carlo; there is nothing necessarily Bayesian about that *per se*; in that case we are doing a noisy estimate of a deterministic quantity.
In *this* setting we are interested in the noise itself.

I have a vague memory that this argument is leveraged in Neal (1996)? Should check. For sure the version in Mandt, Hoffman, and Blei (2017) is a highly developed and modern take. Basically, they analyse the distribution near convergence as an autoregressive process:

Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results.

- We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions.
- We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models.
- We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly.
- We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally,
- we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.

The article is rather beautiful. Importantly they leverage the assumption that we are sampling from approximately (log-)quadratic posterior modes, which means that we should be suspicious of the method when

- The posterior is not quadratic, i.e. the distribution is not well approximated by a Gaussian at the mode, and
- The same for the tails. If there are low-probability but high importance posterior configurations such that they are not Gaussian in the tails, we should be skeptical that they will be sampled well; I have an intuition that this is a more stringent requirement, but TBH I am not sure of the exact relationship of these two conditions.

The models leverage gradient flow, which is a continuous limit of stochastic gradient descent.

A popular recent development is the Stochastic Weight Averaging family of methods (Izmailov et al. 2018, 2020; Maddox et al. 2019; Wilson and Izmailov 2020). See Andrew G. Wilson’s web page for a brief description of the sub-methods, since he seems to have been involved in all of them.

“a Markov Chain reminiscent of noisy gradient descent” (Welling and Teh 2011), extending vanilla Langevin dynamics.

- Some kind of variance control using auxiliary variables?

Ahn, Korattikara, and Welling (2012)

Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012. “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In *Proceedings of the 29th International Conference on Machine Learning*, 1771–78. ICML’12. Madison, WI, USA: Omnipress.

Alexos, Antonios, Alex J. Boyd, and Stephan Mandt. 2022. “Structured Stochastic Gradient MCMC.” In *Proceedings of the 39th International Conference on Machine Learning*, 414–34. PMLR.

Brosse, Nicolas, Éric Moulines, and Alain Durmus. 2018. “The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In *Proceedings of the 32nd International Conference on Neural Information Processing Systems*, 8278–88. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.

Chandramoorthy, Nisha, Andreas Loukas, Khashayar Gatmiry, and Stefanie Jegelka. 2022. “On the Generalization of Learning Algorithms That Do Not Converge.” arXiv.

Chaudhari, Pratik, and Stefano Soatto. 2018. “Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks.” In *2018 Information Theory and Applications Workshop (ITA)*, 1–10.

Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014. “Stochastic Gradient Hamiltonian Monte Carlo.” In *Proceedings of the 31st International Conference on Machine Learning*, 1683–91. Beijing, China: PMLR.

Chen, Zaiwei, Shancong Mou, and Siva Theja Maguluri. 2021. “Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization.” arXiv.

Choi, Hyunsun, Eric Jang, and Alexander A. Alemi. 2019. “WAIC, but Why? Generative Ensembles for Robust Anomaly Detection.” arXiv.

Dieuleveut, Aymeric, Alain Durmus, and Francis Bach. 2018. “Bridging the Gap Between Constant Step Size Stochastic Gradient Descent and Markov Chains.” arXiv.

Ding, Nan, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, and Hartmut Neven. 2014. “Bayesian Sampling Using Stochastic Gradient Thermostats.” In *Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2*, 3203–11. NIPS’14. Cambridge, MA, USA: MIT Press.

Durmus, Alain, and Eric Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” *arXiv:1605.01559 [Math, Stat]*, May.

Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. “Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” *arXiv:2105.04504 [Cs, Stat]*, May.

Ge, Rong, Holden Lee, and Andrej Risteski. 2020. “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” *arXiv:1812.00793 [Cs, Math, Stat]*, September.

Girolami, Mark, and Ben Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 73 (2): 123–214.

Grenander, Ulf, and Michael I. Miller. 1994. “Representations of Knowledge in Complex Systems.” *Journal of the Royal Statistical Society: Series B (Methodological)* 56 (4): 549–81.

Hodgkinson, Liam, Robert Salomone, and Fred Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” *arXiv:1903.12322 [Cs, Stat]*, March.

Izmailov, Pavel, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2020. “Subspace Inference for Bayesian Deep Learning.” In *Proceedings of The 35th Uncertainty in Artificial Intelligence Conference*, 1169–79. PMLR.

Izmailov, Pavel, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. “Averaging Weights Leads to Wider Optima and Better Generalization,” March.

Smith, Samuel L., and Quoc V. Le. 2018. “A Bayesian Perspective on Generalization and Stochastic Gradient Descent.” In *International Conference on Learning Representations*.

Ma, Yi-An, Tianqi Chen, and Emily B. Fox. 2015. “A Complete Recipe for Stochastic Gradient MCMC.” In *Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2*, 2917–25. NIPS’15. Cambridge, MA, USA: MIT Press.

Maclaurin, Dougal, David Duvenaud, and Ryan P. Adams. 2015. “Early Stopping as Nonparametric Variational Inference.” In *Proceedings of the 19th International Conference on Artificial Intelligence and Statistics*, 1070–77. arXiv.

Maddox, Wesley, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, and Andrew Gordon Wilson. 2019. “A Simple Baseline for Bayesian Uncertainty in Deep Learning,” February.

Mandt, Stephan, Matthew D. Hoffman, and David M. Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” *Journal of Machine Learning Research*, April.

Neal, Radford M. 1996. *Bayesian Learning for Neural Networks*. Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Norton, Richard A., and Colin Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” *arXiv:1610.00781 [Math, Stat]*, October.

Osawa, Kazuki, Siddharth Swaroop, Mohammad Emtiyaz E Khan, Anirudh Jain, Runa Eschenhagen, Richard E Turner, and Rio Yokota. 2019. “Practical Deep Learning with Bayesian Principles.” In *Advances in Neural Information Processing Systems*. Vol. 32. Red Hook, NY, USA: Curran Associates, Inc.

Parisi, G. 1981. “Correlation Functions and Computer Simulations.” *Nuclear Physics B* 180 (3): 378–84.

Rásonyi, Miklós, and Kinga Tikosi. 2022. “On the Stability of the Stochastic Gradient Langevin Algorithm with Dependent Data Stream.” *Statistics & Probability Letters* 182 (March): 109321.

Shang, Xiaocheng, Zhanxing Zhu, Benedict Leimkuhler, and Amos J Storkey. 2015. “Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling.” In *Advances in Neural Information Processing Systems*. Vol. 28. NIPS’15. Curran Associates, Inc.

Smith, Samuel L., Benoit Dherin, David Barrett, and Soham De. 2020. “On the Origin of Implicit Regularization in Stochastic Gradient Descent.” In *International Conference on Learning Representations*.

Welling, Max, and Yee Whye Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In *Proceedings of the 28th International Conference on Machine Learning*, 681–88. ICML’11. Madison, WI, USA: Omnipress.

Wenzel, Florian, Kevin Roth, Bastiaan Veeling, Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, and Sebastian Nowozin. 2020. “How Good Is the Bayes Posterior in Deep Neural Networks Really?” In *Proceedings of the 37th International Conference on Machine Learning*, 10248–59. PMLR.

Wilson, Andrew Gordon, and Pavel Izmailov. 2020. “Bayesian Deep Learning and a Probabilistic Perspective of Generalization,” February.

Xifara, T., C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” *Statistics & Probability Letters* 91 (Supplement C): 14–19.

Jolicoeur-Martineau et al. (2022); Song and Ermon (2020a, 2020b)

Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song

See log-concave distributions for a family of distributions where this works especially well. Rob Salomone explains this well; see Hodgkinson, Salomone, and Roosta (2019). Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation.

\[ x_{t+\eta} = x_t - \eta \nabla f(x_t) + \sqrt{2\eta}\xi_t,\quad \xi_t\sim N(0,I). \]
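As a concrete sketch of this discretization (the unadjusted Langevin algorithm), here is the iteration applied to a standard 2-D Gaussian target, so that \(\nabla f(x) = x\); the target, step size and chain length are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: standard 2-D Gaussian, i.e. f(x) = ||x||^2 / 2, so grad f(x) = x.
def grad_f(x):
    return x

eta = 0.01            # step size; the discretization bias vanishes only as eta -> 0
n_steps = 50_000
x = np.zeros(2)
samples = np.empty((n_steps, 2))
for t in range(n_steps):
    xi = rng.standard_normal(2)
    x = x - eta * grad_f(x) + np.sqrt(2 * eta) * xi
    samples[t] = x

# After burn-in the empirical moments approximate N(0, I), up to O(eta) bias.
burned = samples[10_000:]
print(burned.mean(axis=0), burned.std(axis=0))
```

Without a Metropolis correction the chain’s stationary distribution is slightly biased; shrinking \(\eta\) (or adding an accept/reject step, which gives MALA) removes the bias.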

Left-field: Max Raginsky, Sampling Using Diffusion Processes, from Langevin to Schrödinger:

the Langevin process gives only approximate samples from \(\mu\). I would like to discuss an alternative approach that uses diffusion processes to obtain exact samples in finite time. This approach is based on ideas that appeared in two papers from the 1930s by Erwin Schrödinger in the context of physics, and is now referred to as the Schrödinger bridge problem.

Grenander, Ulf, and Michael I. Miller. 1994. “Representations of Knowledge in Complex Systems.” *Journal of the Royal Statistical Society: Series B (Methodological)* 56 (4): 549–81.

Hodgkinson, Liam, Robert Salomone, and Fred Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” *arXiv:1903.12322 [Cs, Stat]*, March.

Jolicoeur-Martineau, Alexia, Rémi Piché-Taillefer, Ioannis Mitliagkas, and Remi Tachet des Combes. 2022. “Adversarial Score Matching and Improved Sampling for Image Generation.” In *International Conference on Learning Representations*.

Parisi, G. 1981. “Correlation Functions and Computer Simulations.” *Nuclear Physics B* 180 (3): 378–84.

Song, Yang, and Stefano Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In *Advances in Neural Information Processing Systems*. arXiv.

———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In *Advances in Neural Information Processing Systems*. arXiv.

Things I would like to re-derive for my own entertainment:

Conditioning in the sense of measure-theoretic probability. Kolmogorov formulation. Conditioning as Radon-Nikodym derivative. Clunkiness of the definition due to niceties of Lebesgue integration.

H.H. Rugh’s answer is nice.

TBC

Conditioning in full measure-theoretic glory for Bayesian nonparametrics. E.g. conditioning of Gaussian processes is also fun.

e.g. Wilson et al. (2021):

Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space and denote by \((\boldsymbol{a}, \boldsymbol{b})\) a pair of square integrable, centered random variables on \(\mathbb{R}^{n_{a}} \times \mathbb{R}^{n_{b}}\). The conditional expectation is the unique random variable that minimizes the optimization problem \[ \mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b})=\underset{\hat{\boldsymbol{a}}=f(\boldsymbol{b})}{\arg \min } \mathbb{E}(\hat{\boldsymbol{a}}-\boldsymbol{a})^{2}. \] In words, then, \(\mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b})\) is the measurable function of \(\boldsymbol{b}\) that best predicts \(\boldsymbol{a}\) in the sense of minimizing the mean square error.

Uncorrelated, jointly Gaussian random variables are independent. Consequently, when \(\boldsymbol{a}\) and \(\boldsymbol{b}\) are jointly Gaussian, the optimal predictor \(\mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b})\) manifests as the best unbiased linear estimator \(\hat{\boldsymbol{a}}=\mathbf{S} \boldsymbol{b}\) of \(\boldsymbol{a}\).
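We can check the projection property numerically. In this sketch the joint covariance is made up; the linear map \(\mathbf{S}=\boldsymbol{\Sigma}_{\boldsymbol{a},\boldsymbol{b}}\boldsymbol{\Sigma}_{\boldsymbol{b},\boldsymbol{b}}^{-1}\) should achieve a lower mean square error than a perturbed alternative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Joint covariance of centred (a, b): a scalar, b two-dimensional (made up)
Sab = np.array([0.5, 0.3])
Sbb = np.array([[1.0, 0.2],
                [0.2, 1.5]])
Sigma = np.block([[np.array([[1.0]]), Sab[None, :]],
                  [Sab[:, None], Sbb]])
L = np.linalg.cholesky(Sigma)
z = L @ rng.standard_normal((3, 100_000))
a, b = z[0], z[1:]

# Optimal linear predictor S b, with S = Sigma_ab Sigma_bb^{-1}
S = Sab @ np.linalg.inv(Sbb)
mse_opt = np.mean((a - S @ b) ** 2)

# Any other linear map should do worse, e.g. a perturbed one
S_bad = S + np.array([0.2, -0.1])
mse_bad = np.mean((a - S_bad @ b) ** 2)
print(mse_opt, mse_bad)
```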

Chang, J. T., and D. Pollard. 1997. “Conditioning as Disintegration.” *Statistica Neerlandica* 51 (3): 287–317.

Kallenberg, Olav. 2002. *Foundations of Modern Probability*. 2nd ed. Probability and Its Applications. New York: Springer-Verlag.

Schervish, Mark J. 2012. *Theory of Statistics*. Springer Series in Statistics. New York, NY: Springer Science & Business Media.

Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021. “Pathwise Conditioning of Gaussian Processes.” *Journal of Machine Learning Research* 22 (105): 1–47.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\dif}{\backslash} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]

Can we find a transformation that will turn a Gaussian process prior sample into a Gaussian process posterior sample? A special trick where we do GP regression by GP simulation.

The main tool is an old insight made useful for modern problems in J. T. Wilson et al. (2020) (brusque) and J. T. Wilson et al. (2021) (deep). Actioned in Ritter et al. (2021) to condition probabilistic neural nets somehow.

**Danger**: notation updates in the pipeline.

We start by examining a slightly different way of defining a Gaussian RV (J. T. Wilson et al. 2021), starting from the recipe for sampling:

A random vector \(\boldsymbol{x}=\left(x_{1}, \ldots, x_{n}\right) \in \mathbb{R}^{n}\) is said to be Gaussian if there exists a matrix \(\mathbf{L}\) and vector \(\boldsymbol{\mu}\) such that \[ \boldsymbol{x} \stackrel{\mathrm{d}}{=} \boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta} \quad \boldsymbol{\zeta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \] where \(\mathcal{N}(\mathbf{0}, \mathbf{I})\) is known as the standard version of a (multivariate) normal distribution, which is defined through its density.

This is the location-scale form of a Gaussian RV, as opposed to the canonical form which we use in Gaussian Belief Propagation. In location-scale form, a non-degenerate Gaussian RV’s distribution is given (uniquely) by its mean \(\boldsymbol{\mu}=\mathbb{E}(\boldsymbol{x})\) and its covariance \(\boldsymbol{\Sigma}=\mathbb{E}\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^{\top}\right].\) In this notation the density, if defined, is \[ p(\boldsymbol{x})=\mathcal{N}(\boldsymbol{x} ; \boldsymbol{\mu}, \boldsymbol{\Sigma})=\frac{1}{\sqrt{|2 \pi \boldsymbol{\Sigma}|}} \exp \left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right). \]

Since \(\boldsymbol{\zeta}\) has identity covariance, any matrix square root of \(\boldsymbol{\Sigma}\), such as the Cholesky factor \(\mathbf{L}\) with \(\boldsymbol{\Sigma}=\mathbf{L L}^{\top}\), may be used to draw \(\boldsymbol{x}=\boldsymbol{\mu}+\mathbf{L} \boldsymbol{\zeta}.\)
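In code, the location-scale recipe is a Cholesky factorization and a matrix multiply; here is a minimal numpy sketch with an arbitrary mean and covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T

# x = mu + L zeta transforms standard normal draws into N(mu, Sigma) draws
zeta = rng.standard_normal((2, 100_000))
x = mu[:, None] + L @ zeta

print(x.mean(axis=1))              # close to mu
print(np.cov(x))                   # close to Sigma
```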

**tl;dr** we can think about drawing any Gaussian RV as transforming a standard Gaussian.
So far, this is basic entry-level stuff.
What might a rule that updates a Gaussian prior into a data-conditioned posterior look like?
Like this!

We define \(\cov(a,b)=\Sigma_{a,b}\) as the covariance between two random variables (J. T. Wilson et al. 2021):

Matheron’s Update Rule: Let \(\boldsymbol{a}\) and \(\boldsymbol{b}\) be jointly Gaussian, centered random variables. Then the random variable \(\boldsymbol{a}\) conditional on \(\boldsymbol{b}=\boldsymbol{\beta}\) may be expressed as \[ (\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \stackrel{\mathrm{d}}{=} \boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}}{\boldsymbol{\Sigma}}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b}). \] Proof: Comparing the mean and covariance on both sides immediately affirms the result \[ \begin{aligned} \mathbb{E}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) & =\boldsymbol{\mu}_{\boldsymbol{a}}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}\left(\boldsymbol{\beta}-\boldsymbol{\mu}_{\boldsymbol{b}}\right) \\ & =\mathbb{E}(\boldsymbol{a} \mid \boldsymbol{b}=\boldsymbol{\beta}) \end{aligned} \] and, keeping track of the cross-covariance terms between \(\boldsymbol{a}\) and \(\boldsymbol{b}\), \[ \begin{aligned} \operatorname{Cov}\left(\boldsymbol{a}+\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1}(\boldsymbol{\beta}-\boldsymbol{b})\right) &=\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}} -2\,\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}} +\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \operatorname{Cov}(\boldsymbol{b}) \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}} \\ & =\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{a}}-\boldsymbol{\Sigma}_{\boldsymbol{a}, \boldsymbol{b}} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{b}}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{b}, \boldsymbol{a}}\\ &=\operatorname{Cov}(\boldsymbol{a} \mid \boldsymbol{b} =\boldsymbol{\beta}) \end{aligned} \]
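Matheron’s rule is easy to verify numerically: transform prior draws of \(\boldsymbol{a}\) with the update and check that the empirical moments match the closed-form conditional moments. The covariances and conditioning value below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up joint covariance of centred (a, b), a scalar and b two-dimensional
Saa = np.array([[1.0]])
Sab = np.array([[0.5, 0.3]])
Sbb = np.array([[1.0, 0.2],
                [0.2, 1.5]])
Sigma = np.block([[Saa, Sab], [Sab.T, Sbb]])
L = np.linalg.cholesky(Sigma)

beta = np.array([0.7, -1.1])       # the value we condition b on
n = 200_000
z = L @ rng.standard_normal((3, n))
a, b = z[:1], z[1:]

# Matheron's update: turn prior draws of a into draws of a | b = beta
gain = Sab @ np.linalg.inv(Sbb)
a_cond = a + gain @ (beta[:, None] - b)

# Closed-form conditional moments for comparison
mean_true = (gain @ beta).item()
var_true = (Saa - gain @ Sab.T).item()
print(a_cond.mean(), mean_true)
print(a_cond.var(), var_true)
```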

Can we find a transformation that will turn a Gaussian *process* prior sample into a Gaussian *process* posterior sample, and thus use prior samples, which are presumably cheap, to create posterior samples, which are often expensive?
If we evaluate the function at a finite number of points, we can simply use this formula to do precisely that.
It turns out we can sometimes do it for whole processes too, and sometimes it is even useful.
The resulting algorithm uses tricks from both analytic GP regression and Monte Carlo.

Exact in the sense that we do not approximate the data. These updates are not exact if our basis function representation is only an approximation to the “true” GP. There are neat extensions to the non-Gaussian and sparse cases; that comes later. For now we assume that the observation likelihood is Gaussian.

For a Gaussian process \(f \sim \mathcal{G P}(\mu, k)\) with marginal \(\boldsymbol{f}_{m}=f(\mathbf{Z})\), the process conditioned on \(\boldsymbol{f}_{m}=\boldsymbol{y}\) admits, in distribution, the representation \[ \underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {posterior }} \stackrel{\mathrm{d}}{=} \underbrace{f(\cdot)}_{\text {prior }}+\underbrace{k(\cdot, \mathbf{Z}) \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{y}-\boldsymbol{f}_{m}\right)}_{\text {update }}. \]

If our observations are contaminated by additive i.i.d. Gaussian noise, \(\boldsymbol{y}=\boldsymbol{f}_{m} +\boldsymbol{\varepsilon}\) with \(\boldsymbol{\varepsilon}\sim\mathcal{N}(\boldsymbol{0}, \sigma^2\mathbf{I}),\) we find \[ \boldsymbol{f}_{*} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, n}\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}(\boldsymbol{y}-\boldsymbol{f}-\boldsymbol{\varepsilon}). \] When sampling from exact GPs we jointly draw \(\boldsymbol{f}_{*}\) and \(\boldsymbol{f}\) from the prior. Then, we combine \(\boldsymbol{f}\) with noise variates \(\boldsymbol{\varepsilon} \sim \mathcal{N}\left(\mathbf{0}, \sigma^{2} \mathbf{I}\right)\) such that \(\boldsymbol{f}+\boldsymbol{\varepsilon}\) constitutes a draw from the prior distribution of \(\boldsymbol{y}\).
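Here is a numpy sketch of that noisy pathwise update for a toy squared-exponential GP; the kernel, inputs, observations and noise level are all invented for illustration. Averaging many transformed prior paths should recover the analytic posterior mean:

```python
import numpy as np

rng = np.random.default_rng(3)

def k(x, z, ls=0.5):
    """Squared-exponential kernel (an arbitrary choice for this sketch)."""
    return np.exp(-0.5 * (x[:, None] - z[None, :]) ** 2 / ls**2)

X = np.linspace(-2, 2, 8)                  # training inputs
Xs = np.linspace(-3, 3, 100)               # test inputs
y = np.sin(2 * X) + 0.1 * rng.standard_normal(8)
sigma2 = 0.01
n_paths = 2000

# Jointly draw (f_*, f) from the prior over [Xs, X]
Xall = np.concatenate([Xs, X])
L = np.linalg.cholesky(k(Xall, Xall) + 1e-8 * np.eye(len(Xall)))
f_all = L @ rng.standard_normal((len(Xall), n_paths))
fs, f = f_all[:100], f_all[100:]

# Pathwise update: prior path + data-driven correction, sample by sample
eps = np.sqrt(sigma2) * rng.standard_normal((8, n_paths))
v = np.linalg.solve(k(X, X) + sigma2 * np.eye(8), y[:, None] - f - eps)
post = fs + k(Xs, X) @ v                   # each column is one posterior path

# Sanity check against the analytic posterior mean
mean_analytic = k(Xs, X) @ np.linalg.solve(k(X, X) + sigma2 * np.eye(8), y)
print(np.abs(post.mean(axis=1) - mean_analytic).max())
```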

Compare this to the equivalent distributional update from classical GP regression, which updates the *moments* of a *distribution*, not *samples from a path*; the formulae are related, though:

…the conditional distribution is the Gaussian \(\mathcal{N}\left(\boldsymbol{\mu}_{* \mid y}, \mathbf{K}_{*, * \mid y}\right)\) with moments \[\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned} \]

For many purposes we need a basis function representation, a.k.a. the *weight-space* representation.
We assert the GP can be written as a random function comprising basis functions \(\boldsymbol{\phi}=\left(\phi_{1}, \ldots, \phi_{\ell}\right)\) with a Gaussian random weight vector \(\boldsymbol{w}\) so that \[
f^{(\boldsymbol{w})}(\cdot)=\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot) \quad \boldsymbol{w} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{w}}\right).
\] \(f^{(\boldsymbol{w})}\) is a random function satisfying \(\boldsymbol{f}^{(\boldsymbol{w})} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Phi}_{n} \boldsymbol{\Sigma}_{\boldsymbol{w}} \boldsymbol{\Phi}_{n}^{\top}\right)\), where \(\boldsymbol{\Phi}_{n}=\boldsymbol{\phi}(\mathbf{X})\) is a \(|\mathbf{X}| \times \ell\) matrix of features.
If we are lucky, the representation might not be too bad when the basis is truncated to a small size.

The posterior weight distribution \(\boldsymbol{w} \mid \boldsymbol{y} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{w} \mid n}, \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n}\right)\) is Gaussian with moments \[ \begin{aligned} \boldsymbol{\mu}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \boldsymbol{\Phi}^{\top} \boldsymbol{y} \\ \boldsymbol{\Sigma}_{\boldsymbol{w} \mid n} &=\left(\boldsymbol{\Phi}^{\top} \boldsymbol{\Phi}+\sigma^{2} \mathbf{I}\right)^{-1} \sigma^{2} \end{aligned} \] where \(\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{X})\) is an \(n \times \ell\) feature matrix. We solve for the right-hand side at \(\mathcal{O}\left(\min \{\ell, n\}^{3}\right)\) cost by applying the Woodbury identity as needed. So far there is nothing unusual here; the cool bit is realising we can represent this update pathwise, sample by sample.
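A minimal numpy sketch of these moments, using random Fourier features as the (arbitrarily chosen) basis and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)

n, ell = 50, 10
X = np.linspace(-3, 3, n)
y = np.sin(X) + 0.1 * rng.standard_normal(n)
sigma2 = 0.01

# Random Fourier features: an illustrative basis choice, not the only one
omega = rng.standard_normal(ell)
b = rng.uniform(0, 2 * np.pi, ell)
Phi = np.sqrt(2.0 / ell) * np.cos(X[:, None] * omega[None, :] + b[None, :])

A = Phi.T @ Phi + sigma2 * np.eye(ell)     # ell x ell system
mu_w = np.linalg.solve(A, Phi.T @ y)       # posterior mean of the weights
Sigma_w = sigma2 * np.linalg.inv(A)        # posterior covariance of the weights
```

When \(n \ll \ell\) we would instead solve the \(n \times n\) system via the Woodbury identity, which is where the \(\mathcal{O}(\min\{\ell, n\}^{3})\) cost comes from.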

In the weight-space setting, the pathwise update given an initial weight vector \(\boldsymbol{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) is \(\boldsymbol{w} \mid \boldsymbol{y} \stackrel{\mathrm{d}}{=} \boldsymbol{w}+\boldsymbol{\Phi}^{\top}\left(\boldsymbol{\Phi} \boldsymbol{\Phi}^{\top}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}-\boldsymbol{\varepsilon}\right).\)

So if we already had a nice weight-space representation for everything, we could go home at this point. However, for many models we are not given that; the natural bases for the prior and the posterior need not be the same (for one thing, the posterior is usually not stationary).

The innovation in J. T. Wilson et al. (2020) is to make different choices of functional bases for the prior and the posterior update.
We can choose anything really, AFAICT.
They suggest a Fourier basis for the prior and the *canonical basis*, i.e. the reproducing-kernel basis \(k(\cdot,\vv{x})\), for the update.
Then we have \[
\underbrace{(f \mid \boldsymbol{y})(\cdot)}_{\text {posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{n} v_{j} k\left(\cdot, \boldsymbol{x}_{j}\right)}_{\text {function-space update}} ,
\]
where we have defined \(\boldsymbol{v}=\left(\mathbf{K}_{n, n}+\sigma^{2} \mathbf{I}\right)^{-1}\left(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}- \boldsymbol{\varepsilon}\right).\)
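A numpy sketch of this decoupled sampler: random Fourier features stand in for the prior basis, and the canonical kernel basis handles the update. Kernel, data, and sizes are invented for illustration; averaging the sampled paths should recover the usual GP posterior mean:

```python
import numpy as np

rng = np.random.default_rng(6)

def k(x, z, ls=1.0):
    return np.exp(-0.5 * (x[:, None] - z[None, :]) ** 2 / ls**2)

ell, n, n_paths = 500, 10, 4000
X = np.linspace(-2, 2, n)
y = np.sin(2 * X)
Xs = np.linspace(-3, 3, 200)
sigma2 = 0.05

# Random Fourier features approximating the squared-exponential prior (ls = 1)
omega = rng.standard_normal(ell)
b = rng.uniform(0, 2 * np.pi, ell)
phi = lambda x: np.sqrt(2.0 / ell) * np.cos(x[:, None] * omega + b)

# Fourier-basis prior draws, canonical-basis (kernel) updates
w = rng.standard_normal((ell, n_paths))
eps = np.sqrt(sigma2) * rng.standard_normal((n, n_paths))
v = np.linalg.solve(k(X, X) + sigma2 * np.eye(n),
                    y[:, None] - phi(X) @ w - eps)
paths = phi(Xs) @ w + k(Xs, X) @ v         # one posterior sample path per column

# Path average vs the analytic posterior mean
mean_analytic = k(Xs, X) @ np.linalg.solve(k(X, X) + sigma2 * np.eye(n), y)
print(np.abs(paths.mean(axis=1) - mean_analytic).max())
```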

I.e. using inducing variables.

Given \(q(\boldsymbol{u})\), we approximate posterior distributions as \[ p\left(\boldsymbol{f}_{*} \mid \boldsymbol{y}\right) \approx \int_{\mathbb{R}^{m}} p\left(\boldsymbol{f}_{*} \mid \boldsymbol{u}\right) q(\boldsymbol{u}) \mathrm{d} \boldsymbol{u} . \] If \(\boldsymbol{u} \sim \mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{u}}, \boldsymbol{\Sigma}_{\boldsymbol{u}}\right)\), we compute this integral analytically to obtain a Gaussian distribution with mean and covariance \[ \begin{aligned} \boldsymbol{m}_{* \mid m} &=\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1} \boldsymbol{\mu}_{\boldsymbol{u}} \\ \mathbf{K}_{*, * \mid m} &=\mathbf{K}_{*, *}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{\Sigma}_{\boldsymbol{u}}-\mathbf{K}_{m, m}\right) \mathbf{K}_{m, m}^{-1} \mathbf{K}_{m, *} \end{aligned} \]

\[ \begin{aligned} &\boldsymbol{f}_{*} \mid \boldsymbol{u} \stackrel{\mathrm{d}}{=} \boldsymbol{f}_{*}+\mathbf{K}_{*, m} \mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{f}_{m}\right) \\ \end{aligned} \]

When sampling from sparse GPs we draw \(\boldsymbol{f}_{*}\) and \(\boldsymbol{f}_{m}\) together from the prior, and independently generate target values \(\boldsymbol{u} \sim q(\boldsymbol{u}).\) \[ \underbrace{(f \mid \boldsymbol{u})(\cdot)}_{\text {sparse posterior }} \stackrel{\mathrm{d}}{\approx} \underbrace{\sum_{i=1}^{\ell} w_{i} \phi_{i}(\cdot)}_{\text {weight-space prior}} +\underbrace{\sum_{j=1}^{m} v_{j} k\left(\cdot, \boldsymbol{z}_{j}\right)}_{\text {function-space update}} , \] where we have defined \(\boldsymbol{v}=\mathbf{K}_{m, m}^{-1}\left(\boldsymbol{u}-\boldsymbol{\Phi} \boldsymbol{w}\right),\) with \(\boldsymbol{\Phi}=\boldsymbol{\phi}(\mathbf{Z})\) the features at the inducing inputs.

(Ritter et al. 2021 appendix D) reframes the Matheron update and generalises it to matrix Gaussians. TBC.

Thus far we have talked about moves updating a prior to a posterior; how about moves within a posterior?

We could try Langevin sampling, for example, or SG-MCMC, but these all seem to require inverting the covariance matrix, so they are not likely to be efficient in general. Can we do better?

Abrahamsen, Petter. 1997. “A Review of Gaussian Random Fields and Correlation Functions.”

Abt, Markus, and William J. Welch. 1998. “Fisher Information and Maximum-Likelihood Estimation of Covariance Parameters in Gaussian Stochastic Processes.” *Canadian Journal of Statistics* 26 (1): 127–37.

Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004. “Exponential Families for Conditional Random Fields.” In *Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence*, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.

Alvarado, Pablo A., and Dan Stowell. 2018. “Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music.” *arXiv:1705.07104 [Cs, Stat]*, November.

Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015. “Fast Direct Methods for Gaussian Processes.” *arXiv:1403.6015 [Astro-Ph, Stat]*, April.

Bachoc, F., F. Gamboa, J. Loubes, and N. Venet. 2018. “A Gaussian Process Regression Model for Distribution Inputs.” *IEEE Transactions on Information Theory* 64 (10): 6620–37.

Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019. “Gaussian Processes with Multidimensional Distribution Inputs via Optimal Transport and Hilbertian Embedding.” *arXiv:1805.00753 [Stat]*, April.

Birgé, Lucien, and Pascal Massart. 2006. “Minimal Penalties for Gaussian Model Selection.” *Probability Theory and Related Fields* 138 (1-2): 33–73.

Bonilla, Edwin V., Kian Ming A. Chai, and Christopher K. I. Williams. 2007. “Multi-Task Gaussian Process Prediction.” In *Proceedings of the 20th International Conference on Neural Information Processing Systems*, 153–60. NIPS’07. USA: Curran Associates Inc.

Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019. “Generic Inference in Latent Gaussian Process Models.” *Journal of Machine Learning Research* 20 (117): 1–63.

Borovitskiy, Viacheslav, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2020. “Matérn Gaussian Processes on Riemannian Manifolds.” *arXiv:2006.10160 [Cs, Stat]*, June.

Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020. “Convergence of Sparse Variational Inference in Gaussian Processes Regression.” *Journal of Machine Learning Research* 21 (131): 1–63.

Calandra, R., J. Peters, C. E. Rasmussen, and M. P. Deisenroth. 2016. “Manifold Gaussian Processes for Regression.” In *2016 International Joint Conference on Neural Networks (IJCNN)*, 3338–45. Vancouver, BC, Canada: IEEE.

Cressie, Noel. 1990. “The Origins of Kriging.” *Mathematical Geology* 22 (3): 239–52.

———. 2015. *Statistics for Spatial Data*. John Wiley & Sons.

Cressie, Noel, and Christopher K. Wikle. 2011. *Statistics for Spatio-Temporal Data*. Wiley Series in Probability and Statistics. John Wiley and Sons.

Csató, Lehel, and Manfred Opper. 2002. “Sparse On-Line Gaussian Processes.” *Neural Computation* 14 (3): 641–68.

Csató, Lehel, Manfred Opper, and Ole Winther. 2001. “TAP Gibbs Free Energy, Belief Propagation and Sparsity.” In *Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic*, 657–63. NIPS’01. Cambridge, MA, USA: MIT Press.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In *Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML ’08. New York, NY, USA: ACM Press.

Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017. “Random Feature Expansions for Deep Gaussian Processes.” In *Proceedings of the 34th International Conference on Machine Learning*. PMLR.

Dahl, Astrid, and Edwin Bonilla. 2017. “Scalable Gaussian Process Models for Solar Power Forecasting.” In *Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy*, edited by Wei Lee Woon, Zeyar Aung, Oliver Kramer, and Stuart Madnick, 94–106. Lecture Notes in Computer Science. Cham: Springer International Publishing.

Dahl, Astrid, and Edwin V. Bonilla. 2019. “Sparse Grouped Gaussian Processes for Solar Power Forecasting.” *arXiv:1903.03986 [Cs, Stat]*, March.

Damianou, Andreas, and Neil Lawrence. 2013. “Deep Gaussian Processes.” In *Artificial Intelligence and Statistics*, 207–15.

Damianou, Andreas, Michalis K. Titsias, and Neil D. Lawrence. 2011. “Variational Gaussian Process Dynamical Systems.” In *Advances in Neural Information Processing Systems 24*, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2510–18. Curran Associates, Inc.

Dezfouli, Amir, and Edwin V. Bonilla. 2015. “Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In *Advances in Neural Information Processing Systems 28*, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.

Domingos, Pedro. 2020. “Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.” *arXiv:2012.00152 [Cs, Stat]*, November.

Dubrule, Olivier. 2018. “Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In *Handbook of Mathematical Geosciences: Fifty Years of IAMG*, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.

Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018. “How Deep Are Deep Gaussian Processes?” *Journal of Machine Learning Research* 19 (1): 2100–2145.

Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021. “Deep Neural Networks as Point Estimates for Deep Gaussian Processes.” *arXiv:2105.04504 [Cs, Stat]*, May.

Dutordoir, Vincent, Alan Saul, Zoubin Ghahramani, and Fergus Simpson. 2022. “Neural Diffusion Processes.” arXiv.

Duvenaud, David. 2014. “Automatic Model Construction with Gaussian Processes.” PhD Thesis, University of Cambridge.

Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In *Proceedings of the 30th International Conference on Machine Learning (ICML-13)*, 1166–74.

Ebden, Mark. 2015. “Gaussian Processes: A Quick Introduction.” *arXiv:1505.02965 [Math, Stat]*, May.

Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017. “Identification of Gaussian Process State Space Models.” In *Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.

Emery, Xavier. 2007. “Conditioning Simulations of Gaussian Random Fields by Ordinary Kriging.” *Mathematical Geology* 39 (6): 607–23.

Evgeniou, Theodoros, Charles A. Micchelli, and Massimiliano Pontil. 2005. “Learning Multiple Tasks with Kernel Methods.” *Journal of Machine Learning Research* 6 (Apr): 615–37.

Ferguson, Thomas S. 1973. “A Bayesian Analysis of Some Nonparametric Problems.” *The Annals of Statistics* 1 (2): 209–30.

Finzi, Marc, Roberto Bondesan, and Max Welling. 2020. “Probabilistic Numeric Convolutional Neural Networks.” *arXiv:2010.10876 [Cs]*, October.

Föll, Roman, Bernard Haasdonk, Markus Hanselmann, and Holger Ulmer. 2017. “Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation.” *arXiv:1711.00799 [Stat]*, November.

Frigola, Roger, Yutian Chen, and Carl Edward Rasmussen. 2014. “Variational Gaussian Process State-Space Models.” In *Advances in Neural Information Processing Systems 27*, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3680–88. Curran Associates, Inc.

Frigola, Roger, Fredrik Lindsten, Thomas B Schön, and Carl Edward Rasmussen. 2013. “Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC.” In *Advances in Neural Information Processing Systems 26*, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3156–64. Curran Associates, Inc.

Gal, Yarin, and Zoubin Ghahramani. 2015. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In *Proceedings of the 33rd International Conference on Machine Learning (ICML-16)*.

Gal, Yarin, and Mark van der Wilk. 2014. “Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial.” *arXiv:1402.1412 [Stat]*, February.

Galliani, Pietro, Amir Dezfouli, Edwin V Bonilla, and Novi Quadrianto. n.d. “Gray-Box Inference for Structured Gaussian Process Models.”

Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In *Proceedings of the 32nd International Conference on Neural Information Processing Systems*, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.

Gardner, Jacob R., Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“Product Kernel Interpolation for Scalable Gaussian Processes.”*arXiv:1802.08903 [Cs, Stat]*, February.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018.“Conditional Neural Processes.”*arXiv:1807.01613 [Cs, Stat]*, July, 10.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018.“Neural Processes,” July.

Ghahramani, Zoubin. 2013.“Bayesian Non-Parametrics and the Probabilistic Approach to Modelling.”*Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences* 371 (1984): 20110553.

Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015.“Scaling Multidimensional Inference for Structured Gaussian Processes.”*IEEE Transactions on Pattern Analysis and Machine Intelligence* 37 (2): 424–36.

Girolami, Mark, and Simon Rogers. 2005.“Hierarchic Bayesian Models for Kernel Learning.” In*Proceedings of the 22nd International Conference on Machine Learning - ICML ’05*, 241–48. Bonn, Germany: ACM Press.

Gramacy, Robert B. 2016.“laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R.”*Journal of Statistical Software* 72 (1).

Gramacy, Robert B., and Daniel W. Apley. 2015.“Local Gaussian Process Approximation for Large Computer Experiments.”*Journal of Computational and Graphical Statistics* 24 (2): 561–78.

Gratiet, Loïc Le, Stefano Marelli, and Bruno Sudret. 2016.“Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes.” In*Handbook of Uncertainty Quantification*, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–37. Cham: Springer International Publishing.

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012.“Exploiting Compositionality to Explore a Large Space of Model Structures.” In*Proceedings of the Conference on Uncertainty in Artificial Intelligence*.

Hartikainen, J., and S. Särkkä. 2010.“Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In*2010 IEEE International Workshop on Machine Learning for Signal Processing*, 379–84. Kittila, Finland: IEEE.

Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013.“Gaussian Processes for Big Data.” In*Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence*, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.

Huber, Marco F. 2014.“Recursive Gaussian Process: On-Line Regression and Learning.”*Pattern Recognition Letters* 45 (August): 85–91.

Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018.“Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.”*arXiv:1806.10234 [Cs, Stat]*, June.

Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020.“Deep Sigma Point Processes.” In*Conference on Uncertainty in Artificial Intelligence*, 789–98. PMLR.

Jordan, Michael Irwin. 1999.*Learning in Graphical Models*. Cambridge, Mass.: MIT Press.

Karvonen, Toni, and Simo Särkkä. 2016.“Approximate State-Space Gaussian Processes via Spectral Transformation.” In*2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)*, 1–6. Vietri sul Mare, Salerno, Italy: IEEE.

Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020.“Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.”*arXiv:2001.08055 [Physics, Stat]*, January.

Kingma, Diederik P., and Max Welling. 2014.“Auto-Encoding Variational Bayes.” In*ICLR 2014 Conference*.

Ko, Jonathan, and Dieter Fox. 2009.“GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In*Autonomous Robots*, 27:75–90.

Kocijan, Juš, Agathe Girard, Blaž Banko, and Roderick Murray-Smith. 2005.“Dynamic Systems Identification with Gaussian Processes.”*Mathematical and Computer Modelling of Dynamical Systems* 11 (4): 411–24.

Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016.“AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In*UAI17*.

Krige, D. G. 1951.“A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.”*Journal of the Southern African Institute of Mining and Metallurgy* 52 (6): 119–39.

Kroese, Dirk P., and Zdravko I. Botev. 2013.“Spatial Process Generation.”*arXiv:1308.0399 [Stat]*, August.

Lawrence, Neil. 2005.“Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.”*Journal of Machine Learning Research* 6 (Nov): 1783–1816.

Lawrence, Neil D., and Raquel Urtasun. 2009.“Non-Linear Matrix Factorization with Gaussian Processes.” In*Proceedings of the 26th Annual International Conference on Machine Learning*, 601–8. ICML ’09. New York, NY, USA: ACM.

Lawrence, Neil, Matthias Seeger, and Ralf Herbrich. 2003.“Fast Sparse Gaussian Process Methods: The Informative Vector Machine.” In*Proceedings of the 16th Annual Conference on Neural Information Processing Systems*, 609–16.

Lázaro-Gredilla, Miguel, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. 2010.“Sparse Spectrum Gaussian Process Regression.”*Journal of Machine Learning Research* 11 (Jun): 1865–81.

Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018.“Deep Neural Networks as Gaussian Processes.” In*ICLR*.

Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2021.“A Tutorial on Sparse Gaussian Processes and Variational Inference.”*arXiv:2012.13962 [Cs, Stat]*, June.

Lenk, Peter J. 2003.“Bayesian Semiparametric Density Estimation and Model Verification Using a Logistic–Gaussian Process.”*Journal of Computational and Graphical Statistics* 12 (3): 548–65.

Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011.“An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 73 (4): 423–98.

Liutkus, Antoine, Roland Badeau, and Gäel Richard. 2011.“Gaussian Processes for Underdetermined Source Separation.”*IEEE Transactions on Signal Processing* 59 (7): 3155–67.

Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014.“Automatic Construction and Natural-Language Description of Nonparametric Regression Models.” In*Twenty-Eighth AAAI Conference on Artificial Intelligence*.

Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019.“The Functional Neural Process.”*arXiv:1906.08324 [Cs, Stat]*, June.

MacKay, David J C. 1998.“Introduction to Gaussian Processes.”*NATO ASI Series. Series F: Computer and System Sciences* 168: 133–65.

———. 2002.“Gaussian Processes.” In*Information Theory, Inference & Learning Algorithms*, Chapter 45. Cambridge University Press.

Matheron, Georges. 1963a.*Traité de Géostatistique Appliquée. 2. Le Krigeage*. Editions Technip.

———. 1963b.“Principles of Geostatistics.”*Economic Geology* 58 (8): 1246–66.

Matthews, Alexander Graeme de Garis, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016.“GPflow: A Gaussian Process Library Using TensorFlow.”*arXiv:1610.08733 [Stat]*, October.

Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Guilherme A. Barreto, and Neil D. Lawrence. 2017.“Deep Recurrent Gaussian Processes for Outlier-Robust System Identification.”*Journal of Process Control*, DYCOPS-CAB 2016, 60 (December): 82–94.

Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A. Barreto, and Neil D. Lawrence. 2016.“Recurrent Gaussian Processes.” In*Proceedings of ICLR*.

Micchelli, Charles A., and Massimiliano Pontil. 2005a.“Learning the Kernel Function via Regularization.”*Journal of Machine Learning Research* 6 (Jul): 1099–1125.

———. 2005b.“On Learning Vector-Valued Functions.”*Neural Computation* 17 (1): 177–204.

Minh, Hà Quang. 2022.“Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.”*SIAM/ASA Journal on Uncertainty Quantification*, February, 96–124.

Mohammadi, Hossein, Peter Challenor, and Marc Goodfellow. 2021.“Emulating Computationally Expensive Dynamical Simulators Using Gaussian Processes.”*arXiv:2104.14987 [Stat]*, April.

Moreno-Muñoz, Pablo, Antonio Artés-Rodríguez, and Mauricio A. Álvarez. 2019.“Continual Multi-Task Gaussian Processes.”*arXiv:1911.00002 [Cs, Stat]*, October.

Nagarajan, Sai Ganesh, Gareth Peters, and Ido Nevat. 2018.“Spatial Field Reconstruction of Non-Gaussian Random Fields: The Tukey G-and-H Random Process.”*SSRN Electronic Journal*.

Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018.“State Space Gaussian Processes with Non-Gaussian Likelihood.” In*International Conference on Machine Learning*, 3789–98.

O’Hagan, A. 1978.“Curve Fitting and Optimal Design for Prediction.”*Journal of the Royal Statistical Society: Series B (Methodological)* 40 (1): 1–24.

Papaspiliopoulos, Omiros, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. 2012.“Nonparametric Estimation of Diffusions: A Differential Equations Approach.”*Biometrika* 99 (3): 511–31.

Pinder, Thomas, and Daniel Dodd. 2022.“GPJax: A Gaussian Process Framework in JAX.”*Journal of Open Source Software* 7 (75): 4455.

Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“Constant-Time Predictive Distributions for Gaussian Processes.” arXiv.

Pleiss, Geoff, Martin Jankowiak, David Eriksson, Anil Damle, and Jacob Gardner. 2020.“Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization.”*Advances in Neural Information Processing Systems* 33.

Quiñonero-Candela, Joaquin, and Carl Edward Rasmussen. 2005.“A Unifying View of Sparse Approximate Gaussian Process Regression.”*Journal of Machine Learning Research* 6 (Dec): 1939–59.

Raissi, Maziar, and George Em Karniadakis. 2017.“Machine Learning of Linear Differential Equations Using Gaussian Processes.”*arXiv:1701.02440 [Cs, Math, Stat]*, January.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006.*Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.

Reece, S., and S. Roberts. 2010.“An Introduction to Gaussian Processes for the Kalman Filter Expert.” In*2010 13th International Conference on Information Fusion*, 1–9.

Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021.“Sparse Uncertainty Representation in Deep Learning with Inducing Weights.”*arXiv:2105.14594 [Cs, Stat]*, May.

Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020.“Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.”*arXiv:2004.11408 [Stat]*, April.

Rossi, Simone, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, and Maurizio Filippone. 2020.“Rethinking Sparse Gaussian Processes: Bayesian Approaches to Inducing-Variable Approximations,” March.

Saatçi, Yunus. 2012.“Scalable Inference for Structured Gaussian Process Models.” Ph.D., University of Cambridge.

Saatçi, Yunus, Ryan Turner, and Carl Edward Rasmussen. 2010.“Gaussian Process Change Point Models.” In*Proceedings of the 27th International Conference on International Conference on Machine Learning*, 927–34. ICML’10. Madison, WI, USA: Omnipress.

Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020.“Variational Integrator Networks for Physically Structured Embeddings.”*arXiv:1910.09349 [Cs, Stat]*, March.

Salimbeni, Hugh, and Marc Deisenroth. 2017.“Doubly Stochastic Variational Inference for Deep Gaussian Processes.” In*Advances In Neural Information Processing Systems*.

Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. 2018.“Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models.” In*International Conference on Artificial Intelligence and Statistics*, 689–97.

Särkkä, Simo. 2011.“Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In*Artificial Neural Networks and Machine Learning – ICANN 2011*, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.

———. 2013.*Bayesian Filtering and Smoothing*. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.

Särkkä, Simo, and Jouni Hartikainen. 2012.“Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In*Artificial Intelligence and Statistics*.

Särkkä, Simo, A. Solin, and J. Hartikainen. 2013.“Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.”*IEEE Signal Processing Magazine* 30 (4): 51–61.

Schulam, Peter, and Suchi Saria. 2017.“Reliable Decision Support Using Counterfactual Models.” In*Proceedings of the 31st International Conference on Neural Information Processing Systems*, 1696–706. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.

Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014.“Student-t Processes as Alternatives to Gaussian Processes.” In*Artificial Intelligence and Statistics*, 877–85. PMLR.

Sidén, Per. 2020.*Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields*. Vol. 15. Linköping Studies in Statistics. Linköping: Linköping University Electronic Press.

Smith, Michael Thomas, Mauricio A. Alvarez, and Neil D. Lawrence. 2018.“Gaussian Process Regression for Binned Data.”*arXiv:1809.02010 [Cs, Stat]*, September.

Snelson, Edward, and Zoubin Ghahramani. 2005.“Sparse Gaussian Processes Using Pseudo-Inputs.” In*Advances in Neural Information Processing Systems*, 1257–64.

Solin, Arno, and Simo Särkkä. 2020.“Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.”*Statistics and Computing* 30 (2): 419–46.

Tait, Daniel J., and Theodoros Damoulas. 2020.“Variational Autoencoding of PDE Inverse Problems.”*arXiv:2006.15641 [Cs, Stat]*, June.

Tang, Wenpin, Lu Zhang, and Sudipto Banerjee. 2019.“On Identifiability and Consistency of the Nugget in Gaussian Spatial Process Models.”*arXiv:1908.05726 [Math, Stat]*, August.

Titsias, Michalis K. 2009a.“Variational Learning of Inducing Variables in Sparse Gaussian Processes.” In*International Conference on Artificial Intelligence and Statistics*, 567–74. PMLR.

———. 2009b.“Variational Model Selection for Sparse Gaussian Process Regression: Technical Supplement.” Technical report, School of Computer Science, University of Manchester.

Titsias, Michalis, and Neil D. Lawrence. 2010.“Bayesian Gaussian Process Latent Variable Model.” In*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*, 844–51.

Tokdar, Surya T. 2007.“Towards a Faster Implementation of Density Estimation With Logistic Gaussian Process Priors.”*Journal of Computational and Graphical Statistics* 16 (3): 633–55.

Turner, Richard E., and Maneesh Sahani. 2014.“Time-Frequency Analysis as Probabilistic Inference.”*IEEE Transactions on Signal Processing* 62 (23): 6171–83.

Turner, Ryan, Marc Deisenroth, and Carl Rasmussen. 2010.“State-Space Inference and Learning with Gaussian Processes.” In*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*, 868–75.

Vanhatalo, Jarno, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. 2013.“GPstuff: Bayesian Modeling with Gaussian Processes.”*Journal of Machine Learning Research* 14 (April): 1175–79.

———. 2015.“Bayesian Modeling with Gaussian Processes Using the GPstuff Toolbox.”*arXiv:1206.5754 [Cs, Stat]*, July.

Walder, Christian, Kwang In Kim, and Bernhard Schölkopf. 2008.“Sparse Multiscale Gaussian Process Regression.” In*Proceedings of the 25th International Conference on Machine Learning*, 1112–19. ICML ’08. New York, NY, USA: ACM.

Walder, C., B. Schölkopf, and O. Chapelle. 2006.“Implicit Surface Modelling with a Globally Regularised Basis of Compact Support.”*Computer Graphics Forum* 25 (3): 635–44.

Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019.“Exact Gaussian Processes on a Million Data Points.” In*Advances in Neural Information Processing Systems*, 32:14648–59. Red Hook, NY, USA.

Wikle, Christopher K., Noel Cressie, and Andrew Zammit-Mangion. 2019.*Spatio-Temporal Statistics with R*.

Wilk, Mark van der, Andrew G. Wilson, and Carl E. Rasmussen. 2014.“Variational Inference for Latent Variable Modelling of Correlation Structure.” In*NIPS 2014 Workshop on Advances in Variational Inference*.

Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019.“End-to-End Probabilistic Inference for Nonstationary Audio Analysis.”*arXiv:1901.11436 [Cs, Eess, Stat]*, January.

Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021.“Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.” arXiv.

Williams, Christopher KI, and Matthias Seeger. 2001.“Using the Nyström Method to Speed Up Kernel Machines.” In*Advances in Neural Information Processing Systems*, 682–88.

Williams, Christopher, Stefan Klanke, Sethu Vijayakumar, and Kian M. Chai. 2009.“Multi-Task Gaussian Process Learning of Robot Inverse Dynamics.” In*Advances in Neural Information Processing Systems 21*, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 265–72. Curran Associates, Inc.

Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013.“Gaussian Process Kernels for Pattern Discovery and Extrapolation.” In*International Conference on Machine Learning*.

Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015.“The Human Kernel.”*arXiv:1510.07389 [Cs, Stat]*, October.

Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011.“Generalised Wishart Processes.” In*Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence*, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.

———. 2012.“Modelling Input Varying Correlations Between Multiple Responses.” In*Machine Learning and Knowledge Discovery in Databases*, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

Wilson, Andrew Gordon, David A. Knowles, and Zoubin Ghahramani. 2012.“Gaussian Process Regression Networks.” In*Proceedings of the 29th International Conference on Machine Learning*, 1139–46. ICML’12. Madison, WI, USA: Omnipress.

Wilson, Andrew Gordon, and Hannes Nickisch. 2015.“Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP).” In*Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37*, 1775–84. ICML’15. Lille, France: JMLR.org.

Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020.“Efficiently Sampling Functions from Gaussian Process Posteriors.” In*Proceedings of the 37th International Conference on Machine Learning*, 10292–302. PMLR.

Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021.“Pathwise Conditioning of Gaussian Processes.”*Journal of Machine Learning Research* 22 (105): 1–47.

Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020.“Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In*Proceedings of NeurIPS 2020*.

\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\cov}{\operatorname{Cov}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\one}{\unicode{x1D7D9}}\]

Training neural networks by ensemble Kalman updates instead of SGD. Arises naturally from the dynamical perspective on neural networks. TBD.

Claudia Schillings’ filter (Schillings and Stuart 2017) is an elegant variant of the ensemble Kalman filter which looks somehow more general than the original but also simpler. Haber, Lucka, and Ruthotto (2018) use it to train neural nets (!) and show a rather beautiful connection to stochastic gradient descent in section 3.2.
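To make the derivative-free update concrete, here is a minimal NumPy sketch of a deterministic ensemble Kalman inversion step, applied to a toy linear forward map standing in for a network. The toy problem and all names are mine, not from the papers; real EnKF/EKI variants also perturb observations and tune the regulariser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward map standing in for a network: G(theta) = A @ theta.
A = rng.normal(size=(5, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = A @ theta_true                                 # noise-free "training targets"

def eki_step(thetas, y, G, gamma=1e-2):
    """One deterministic ensemble Kalman inversion update.
    No gradients of G are needed, only ensemble statistics."""
    Gs = np.stack([G(t) for t in thetas])          # ensemble of predictions
    tm, Gm = thetas.mean(0), Gs.mean(0)
    C_tG = (thetas - tm).T @ (Gs - Gm) / len(thetas)   # param-prediction cross-cov
    C_GG = (Gs - Gm).T @ (Gs - Gm) / len(thetas)       # prediction covariance
    K = C_tG @ np.linalg.inv(C_GG + gamma * np.eye(len(y)))  # Kalman-type gain
    return thetas + (y - Gs) @ K.T                 # move each member toward the data

thetas = rng.normal(size=(50, 3))                  # ensemble of parameter vectors
for _ in range(50):
    thetas = eki_step(thetas, y, lambda t: A @ t)
```

For a linear forward map the ensemble mean is driven toward the least-squares solution; for an actual neural net `G` the same update applies unchanged, which is the whole appeal.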

Chen, Chong, Yixuan Dou, Jie Chen, and Yaru Xue. 2022.“A Novel Neural Network Training Framework with Data Assimilation.”*The Journal of Supercomputing*, June.

Haber, Eldad, Felix Lucka, and Lars Ruthotto. 2018.“Never Look Back - A Modified EnKF Method and Its Application to the Training of Neural Networks Without Back Propagation.”*arXiv:1805.08034 [Cs, Math]*, May.

Kovachki, Nikola B., and Andrew M. Stuart. 2019.“Ensemble Kalman Inversion: A Derivative-Free Technique for Machine Learning Tasks.”*Inverse Problems* 35 (9): 095005.

Schillings, Claudia, and Andrew M. Stuart. 2017.“Analysis of the Ensemble Kalman Filter for Inverse Problems.”*SIAM Journal on Numerical Analysis* 55 (3): 1264–90.

Venturi, Daniele, and Xiantao Li. 2022.“The Mori-Zwanzig Formulation of Deep Learning.” arXiv.

Yegenoglu, Alper, Kai Krajsek, Sandra Diaz Pier, and Michael Herty. 2020.“Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-Performing Gradient Descent.” In*Machine Learning, Optimization, and Data Science*, edited by Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, and Renato Umeton, 12566:78–92. Cham: Springer International Publishing.

Buying items from other markets that will not ship to me.

Shop & Ship serves many challenging places, including some hard-to-get locations: Australia, Canada, China, Cyprus, Czech Republic, Egypt, France, Georgia, Germany, Greece, Hong Kong, India, Indonesia, Italy, Japan, Jordan, Saudi Arabia, Lebanon, Malaysia, New Zealand, Pakistan, Singapore, South Africa, South Korea, Spain, Sri Lanka, Switzerland, Thailand, Turkey, United Arab Emirates, UK, and USA.

Free basic, USD70/year premium. Will try these folks next.

I have trialled the Singapore Post service vPost. ~~USP: will reship from Indonesian merchants to Australia. (Batik!)~~
Update 2022-08-24: they discontinued all reshipping from Asia to Australia.
This service is pointless.
The below information is of purely historical interest.

I have now received one parcel via vPost.

Notes:
- First, they did not note receipt of my parcel at the Jakarta warehouse.
- I emailed the vendor (in Indonesian) and got delivery receipt details.
- vPost did not respond to my queries about this when I used the Australia vPost contact address. **Pro-tip**: the Australian contact email address is UNATTENDED and will lead to unsatisfactory customer service if you treat it as attended. Use the Singapore-based customer service form to avoid sadness.
- Once I tried that, they found my parcel in the warehouse within 2 days.

Next step: The projected shipping time to Australia was way off (36 days late, for an overall delivery time of 56 days). I guess they get a COVID pass for that, this one time.

They provided me a tracking number for Australia Post, which was valid only for the last two days of the delivery, once the parcel reached Australia. Before it reached the Australian border, the parcel was in the hands of the Singapore Post logistics arm Quantium Solutions, who do have a tracking service, but I had no Quantium tracking number, so for most of that time it was unclear whether the parcel was lost or in transit. Emails to the customer service team were answered by people who said they would look into it and get back to me within 3 business days but did not, until the last one, who told me that the parcel was now in Australia.

Nonetheless vPost is one of the very few operators on the challenging Indonesia–Australia reshipping route, so maybe that is as good as it gets. Pricing was OK I guess? AUD83 for 5 shirts is less than the full air freight rate but more than the surface LCL freight rate. Delivery was surface-freight speed with a bulk-carrier level of service, which is probably my jam and would be OK except for how much manual intervention was required.

From the USA I got a small (0.5 kg) electronics parcel delivered at a reasonable price (AUD34 including insurance). Paid on 13 Jan, estimated arrival 24 Jan–1 Feb, actual arrival on 9 Feb, so the service is converging towards punctual over time.

Shippn is a slightly different thing. It is a gig-economy style service where locals will shop for you in various countries. Not in Indonesia, but in many other places.

Previously I used Shipito, a US service, for getting audio gear. It has not been useful recently; most things I could get in the USA have been available in Australia too for comparable rates.

Various other services are reviewed in Techvise. For Australia specifically, see ShopMate and US freight-forwarding services compared.

Crowdshipping options sound like they could solve some problems here, but are any still working? Parcl was hyped for a while but is now defunct. How about Bambizz or Nimber?

- Quick intro
- Observation likelihoods
- Incorporating a mean function
- Density estimation
- Kernels
- Using state filtering
- On lattice observations
- On manifolds
- By variational inference
- With inducing variables
- By variational inference with inducing variables
- With vector output
- Deep
- Approximation with dropout
- Inhomogeneous with covariates
- For dimension reduction
- Pathwise/Matheron updates
- Implementations
- References

Gaussian random processes/fields are stochastic processes/fields with jointly Gaussian distributions of observations.
While “Gaussian *process* regression” is not wrong *per se*, there is a common convention in stochastic process theory (and also in pedagogy) to use *process* to talk about some notionally time-indexed process and *field* to talk about ones that have a space-like index without a presumption of an arrow of time.
This leads to much confusion, because Gaussian*field* regression is what we usually want to talk about. What we want to use the arrow of time for is a whole other story.
Regardless, hereafter I’ll use “field” and “process” interchangeably.

In machine learning, Gaussian fields are often used as a means of regression or classification, since it is fairly easy to condition a Gaussian field on data and produce a posterior distribution over functions. They provide a nonparametric method of inferring regression functions, with a conveniently Bayesian interpretation and reasonably elegant learning and inference steps. I would further add that this is the crystal meth of machine learning methods, in terms of the addictiveness, and of the passion of the people who use it.

The central trick is a clever union of Hilbert space tricks and probability to give a probabilistic interpretation of functional regression as a kind of nonparametric Bayesian inference.

A useful side divergence for thinking about this: representer theorems and Karhunen–Loève expansions.
Regression using Gaussian processes is common in e.g. spatial statistics, where it arises as *kriging*. Cressie (1990) traces a history of this idea via Matheron (1963a) to the work of Krige (1951).

This web site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes. Although Gaussian processes have a long history in the field of statistics, they seem to have been employed extensively only in niche areas. With the advent of kernel machines in the machine learning community, models based on Gaussian processes have become commonplace for problems of regression (kriging) and classification as well as a host of more specialized applications.

I’ve not been enthusiastic about these in the past. It’s nice to have a principled nonparametric Bayesian formalism, but it has always seemed pointless having a formalism that is so computationally demanding that people don’t try to use more than a thousand data points, or spend most of a paper working out how to approximate this simple elegant model with a complex messy model. However, that previous sentence describes most of my career now, so I guess I must have come around.

Perhaps I should be persuaded by tricks such as AutoGP (Krauth et al. 2016), which breaks some computational deadlocks by clever use of inducing variables and variational approximation to produce a compressed representation of the data with tractable inference and model selection, including kernel selection, and does the whole thing in many dimensions simultaneously. There are other clever tricks like this, e.g. Saatçi (2012) shows how to use a lattice structure for observations to make computation cheap.
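The flavour of such inducing-point methods is visible even in the plain Nyström approximation, which replaces the full \(n \times n\) Gram matrix by a low-rank surrogate built from \(m \ll n\) inducing inputs. A sketch with made-up data (this is the generic low-rank idea, not AutoGP itself):

```python
import numpy as np

def rbf(xa, xb, ell=1.0):
    """Squared-exponential kernel matrix between two 1-D point sets."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=500)        # "big" data: n = 500 inputs
z = np.linspace(-3, 3, 20)              # m = 20 inducing inputs, m << n

Knm = rbf(x, z)
Kmm = rbf(z, z) + 1e-6 * np.eye(len(z)) # jitter for conditioning
# Rank-m surrogate for the full n x n Gram matrix:
K_nystrom = Knm @ np.linalg.solve(Kmm, Knm.T)

# How bad is the low-rank surrogate? (Tiny, for a smooth kernel.)
err = np.abs(rbf(x, x) - K_nystrom).max()
```

Operating on `Knm` and `Kmm` instead of the full Gram matrix is what drops the cost from \(O(n^3)\) to \(O(nm^2)\); the variational inducing-point methods add a principled way to choose \(z\) and bound the approximation.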

I am not the right guy to provide the canonical introduction, because it already exists. Specifically, Rasmussen and Williams (2006).

This lecture by the late David MacKay is probably good; the man could talk.

There is also a well-illustrated and elementary introduction by Yuge Shi. There are many, many more.

J. T. Wilson et al. (2021):

A Gaussian process (GP) is a random function \(f: \mathcal{X} \rightarrow \mathbb{R}\), such that, for any finite collection of points \(\mathbf{X} \subset \mathcal{X}\), the random vector \(\boldsymbol{f}=f(\mathbf{X})\) follows a Gaussian distribution. Such a process is uniquely identified by a mean function \(\mu: \mathcal{X} \rightarrow \mathbb{R}\) and a positive semi-definite kernel \(k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}\). Hence, if \(f \sim \mathcal{G} \mathcal{P}(\mu, k)\), then \(\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})\) is multivariate normal with mean \(\boldsymbol{\mu}=\mu(\mathbf{X})\) and covariance \(\mathbf{K}=k(\mathbf{X}, \mathbf{X})\).

[…] we investigate different ways of reasoning about the random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\) for some non-trivial partition \(\boldsymbol{f}=\boldsymbol{f}_n \oplus \boldsymbol{f}_*\). Here, \(\boldsymbol{f}_n=f\left(\mathbf{X}_n\right)\) are process values at a set of training locations \(\mathbf{X}_n \subset \mathbf{X}\) where we would like to introduce a condition \(\boldsymbol{f}_n=\boldsymbol{y}\), while \(\boldsymbol{f}_*=f\left(\mathbf{X}_*\right)\) are process values at a set of test locations \(\mathbf{X}_* \subset \mathbf{X}\) where we would like to obtain a random variable \(\boldsymbol{f}_* \mid \boldsymbol{f}_n=\boldsymbol{y}\).

[…] we may obtain \(\boldsymbol{f}_* \mid \boldsymbol{y}\) by first finding its conditional distribution. Since process values \(\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) are defined as jointly Gaussian, this procedure closely resembles that of [the finite-dimensional case]: we factor out the marginal distribution of \(\boldsymbol{f}_n\) from the joint distribution \(p\left(\boldsymbol{f}_n, \boldsymbol{f}_*\right)\) and, upon canceling, identify the remaining distribution as \(p\left(\boldsymbol{f}_* \mid \boldsymbol{y}\right)\). Having done so, we find that the conditional distribution is the Gaussian \(\mathcal{N}\left(\boldsymbol{\mu}_{* \mid y}, \mathbf{K}_{*, * \mid y}\right)\) with moments \[\begin{aligned} \boldsymbol{\mu}_{* \mid \boldsymbol{y}}&=\boldsymbol{\mu}_*+\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1}\left(\boldsymbol{y}-\boldsymbol{\mu}_n\right) \\ \mathbf{K}_{*, * \mid \boldsymbol{y}}&=\mathbf{K}_{*, *}-\mathbf{K}_{*, n} \mathbf{K}_{n, n}^{-1} \mathbf{K}_{n, *}\end{aligned} \]
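Those moment formulas translate directly into a few lines of NumPy. A minimal sketch with a squared-exponential kernel, zero prior mean, and my own toy data (the Cholesky route is the standard numerically stable way to apply \(\mathbf{K}_{n,n}^{-1}\)):

```python
import numpy as np

def rbf(xa, xb, ell=1.0, sigma=1.0):
    """Squared-exponential kernel matrix k(xa, xb) for 1-D inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return sigma**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(Xn, y, Xs, noise=1e-6):
    """Posterior mean and covariance at test points Xs, conditioning
    on observations y at training points Xn (zero prior mean)."""
    Knn = rbf(Xn, Xn) + noise * np.eye(len(Xn))
    Ksn = rbf(Xs, Xn)
    Kss = rbf(Xs, Xs)
    L = np.linalg.cholesky(Knn)                       # O(n^3), done once
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ksn @ alpha                                  # K_{*,n} K_{n,n}^{-1} y
    V = np.linalg.solve(L, Ksn.T)
    cov = Kss - V.T @ V                               # K_{*,*} - K_{*,n} K_{n,n}^{-1} K_{n,*}
    return mu, cov

Xn = np.array([-1.0, 0.0, 1.0])
y = np.sin(Xn)
mu, cov = gp_posterior(Xn, y, np.array([0.0]))
# with tiny noise, the posterior mean at a training point reproduces the datum
```

The \(O(n^3)\) Cholesky factorisation here is exactly the computational deadlock that the sparse and lattice tricks mentioned above are designed to break.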

classification etc

Almost immediate but not *quite* trivial (Rasmussen and Williams 2006, 2.7).
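For flavour, here is a bare-bones numpy sketch of the Laplace approximation for binary GP classification with a logistic likelihood, loosely in the spirit of Rasmussen and Williams's Algorithm 3.1 (the data, kernel, and iteration count are all illustrative, and the numerically careful \(W^{1/2}\) formulation is skipped):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_gp_classify(K, y, n_iter=20):
    """Newton iteration for the posterior mode of a binary GP classifier,
    logistic likelihood, labels y in {-1, +1}."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        g = (y + 1) / 2 - pi          # gradient of log p(y | f)
        W = pi * (1 - pi)             # negative Hessian (diagonal)
        # Newton step: f <- (K^-1 + W)^-1 (W f + g) = K (I + W K)^-1 (W f + g)
        f = K @ np.linalg.solve(np.eye(n) + W[:, None] * K, W * f + g)
    return f

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * d**2) + 1e-9 * np.eye(len(x))
f_hat = laplace_gp_classify(K, y)
```

The mode \(\hat{f}\) ends up sign-aligned with the labels; predictive probabilities then come from squashing the usual GP predictive at test points.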

TODO: discuss identifiability.

Can I infer a density using GPs? Yes. One popular method is apparently the logistic Gaussian process (Tokdar 2007; Lenk 2003).
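The idea is to push a GP through an exponential and renormalise, which yields a valid density whatever the GP sample was. A toy numpy sketch of the prior on a grid (actual posterior inference over \(f\) given data, the hard part, is what Tokdar (2007) is about and is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(-3, 3, 200)
dx = grid[1] - grid[0]

# Draw one GP prior sample f on the grid (squared-exponential kernel).
d = grid[:, None] - grid[None, :]
K = np.exp(-0.5 * d**2) + 1e-9 * np.eye(len(grid))
f = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))

# Logistic-Gaussian transform: exponentiate and renormalise, so the
# result is nonnegative and integrates (on the grid) to one.
p = np.exp(f) / (np.exp(f).sum() * dx)
```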

a.k.a. covariance models.

GP regression models are kernel machines. As such, the covariance kernels are, more or less, the parameters. One can also parameterise the mean function, but let us ignore that detail for now because usually we do not use one.
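One practical consequence: sums and products of positive semi-definite kernels are again positive semi-definite, so covariances can be built compositionally. A small numpy sketch (the kernel forms and hyperparameters are illustrative):

```python
import numpy as np

def rbf(xa, xb, ell=1.0):
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def periodic(xa, xb, period=1.0, ell=1.0):
    # Exponentiated-sine-squared kernel: correlations repeat with `period`.
    d = np.abs(xa[:, None] - xb[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell**2)

x = np.linspace(0, 2, 50)
# Sums and products of PSD kernels are PSD, so both are valid covariances.
K_sum = rbf(x, x) + periodic(x, x)
K_prod = rbf(x, x, ell=2.0) * periodic(x, x)  # "locally periodic"
```

This compositional algebra is the engine behind automatic kernel search (Duvenaud et al. 2013).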

When one dimension of the input vector can be interpreted as a time dimension, we are Kalman filtering Gaussian processes, which has benefits in terms of speed.
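For intuition: the Matérn-1/2 kernel \(k(t,t') = \sigma^2 e^{-|t-t'|/\ell}\) corresponds to an Ornstein–Uhlenbeck state-space model, so GP regression at the data points reduces to an \(O(n)\) Kalman filter pass instead of an \(O(n^3)\) solve. A minimal numpy sketch (forward filter only, no smoothing pass; all hyperparameters are illustrative):

```python
import numpy as np

def ou_kalman_regression(t, y, ell=1.0, sigma2=1.0, noise=0.1):
    """O(n) filtered means/variances for GP regression under the
    Matern-1/2 (OU) kernel k(t, t') = sigma2 * exp(-|t - t'| / ell)."""
    m, P = 0.0, sigma2                       # stationary prior state
    means, variances = [], []
    t_prev = t[0]
    for tk, yk in zip(t, y):
        a = np.exp(-(tk - t_prev) / ell)     # exact OU discretisation
        m, P = a * m, a**2 * P + sigma2 * (1 - a**2)
        # Kalman update with observation y_k = x_k + N(0, noise).
        S = P + noise
        gain = P / S
        m = m + gain * (yk - m)
        P = (1 - gain) * P
        means.append(m)
        variances.append(P)
        t_prev = tk
    return np.array(means), np.array(variances)

t = np.linspace(0, 5, 100)
y = np.sin(t)
m, v = ou_kalman_regression(t, y)
```

Higher-order Matérn kernels get higher-dimensional state vectors but the same linear-time recursion; see the Särkkä and Hartikainen papers below.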

I would like to read Terenin on GPs on Manifolds, who also makes a suggestive connection to SDEs, which is the filtering GPs trick again.

🏗

“Sparse GP”. See Quiñonero-Candela and Rasmussen (2005). 🏗

See GP factoring.

See NN ensembles.

Integrated nested Laplace approximation connects to the GP-as-SDE idea, I think?

e.g. GP-LVM (N. Lawrence 2005). 🏗

See pathwise GP.

Abrahamsen, Petter. 1997.“A Review of Gaussian Random Fields and Correlation Functions.”

Abt, Markus, and William J. Welch. 1998.“Fisher Information and Maximum-Likelihood Estimation of Covariance Parameters in Gaussian Stochastic Processes.”*Canadian Journal of Statistics* 26 (1): 127–37.

Altun, Yasemin, Alex J. Smola, and Thomas Hofmann. 2004.“Exponential Families for Conditional Random Fields.” In*Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence*, 2–9. UAI ’04. Arlington, Virginia, United States: AUAI Press.

Alvarado, Pablo A., and Dan Stowell. 2018.“Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music.”*arXiv:1705.07104 [Cs, Stat]*, November.

Ambikasaran, Sivaram, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, and Michael O’Neil. 2015.“Fast Direct Methods for Gaussian Processes.”*arXiv:1403.6015 [Astro-Ph, Stat]*, April.

Bachoc, F., F. Gamboa, J. Loubes, and N. Venet. 2018.“A Gaussian Process Regression Model for Distribution Inputs.”*IEEE Transactions on Information Theory* 64 (10): 6620–37.

Bachoc, Francois, Alexandra Suvorikova, David Ginsbourger, Jean-Michel Loubes, and Vladimir Spokoiny. 2019.“Gaussian Processes with Multidimensional Distribution Inputs via Optimal Transport and Hilbertian Embedding.”*arXiv:1805.00753 [Stat]*, April.

Birgé, Lucien, and Pascal Massart. 2006.“Minimal Penalties for Gaussian Model Selection.”*Probability Theory and Related Fields* 138 (1-2): 33–73.

Bonilla, Edwin V., Kian Ming A. Chai, and Christopher K. I. Williams. 2007.“Multi-Task Gaussian Process Prediction.” In*Proceedings of the 20th International Conference on Neural Information Processing Systems*, 153–60. NIPS’07. USA: Curran Associates Inc.

Bonilla, Edwin V., Karl Krauth, and Amir Dezfouli. 2019.“Generic Inference in Latent Gaussian Process Models.”*Journal of Machine Learning Research* 20 (117): 1–63.

Borovitskiy, Viacheslav, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2020.“Matérn Gaussian Processes on Riemannian Manifolds.”*arXiv:2006.10160 [Cs, Stat]*, June.

Burt, David R., Carl Edward Rasmussen, and Mark van der Wilk. 2020.“Convergence of Sparse Variational Inference in Gaussian Processes Regression.”*Journal of Machine Learning Research* 21 (131): 1–63.

Calandra, R., J. Peters, C. E. Rasmussen, and M. P. Deisenroth. 2016.“Manifold Gaussian Processes for Regression.” In*2016 International Joint Conference on Neural Networks (IJCNN)*, 3338–45. Vancouver, BC, Canada: IEEE.

Cressie, Noel. 1990.“The Origins of Kriging.”*Mathematical Geology* 22 (3): 239–52.

———. 2015.*Statistics for Spatial Data*. John Wiley & Sons.

Cressie, Noel, and Christopher K. Wikle. 2011.*Statistics for Spatio-Temporal Data*. Wiley Series in Probability and Statistics 2.0. John Wiley and Sons.

Csató, Lehel, and Manfred Opper. 2002.“Sparse On-Line Gaussian Processes.”*Neural Computation* 14 (3): 641–68.

Csató, Lehel, Manfred Opper, and Ole Winther. 2001.“TAP Gibbs Free Energy, Belief Propagation and Sparsity.” In*Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic*, 657–63. NIPS’01. Cambridge, MA, USA: MIT Press.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008.“Fast Gaussian Process Methods for Point Process Intensity Estimation.” In*Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML ’08. New York, NY, USA: ACM Press.

Cutajar, Kurt, Edwin V. Bonilla, Pietro Michiardi, and Maurizio Filippone. 2017.“Random Feature Expansions for Deep Gaussian Processes.” In*PMLR*.

Dahl, Astrid, and Edwin Bonilla. 2017.“Scalable Gaussian Process Models for Solar Power Forecasting.” In*Data Analytics for Renewable Energy Integration: Informing the Generation and Distribution of Renewable Energy*, edited by Wei Lee Woon, Zeyar Aung, Oliver Kramer, and Stuart Madnick, 94–106. Lecture Notes in Computer Science. Cham: Springer International Publishing.

Dahl, Astrid, and Edwin V. Bonilla. 2019.“Sparse Grouped Gaussian Processes for Solar Power Forecasting.”*arXiv:1903.03986 [Cs, Stat]*, March.

Damianou, Andreas, and Neil Lawrence. 2013.“Deep Gaussian Processes.” In*Artificial Intelligence and Statistics*, 207–15.

Damianou, Andreas, Michalis K. Titsias, and Neil D. Lawrence. 2011.“Variational Gaussian Process Dynamical Systems.” In*Advances in Neural Information Processing Systems 24*, edited by J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, 2510–18. Curran Associates, Inc.

Dezfouli, Amir, and Edwin V. Bonilla. 2015.“Scalable Inference for Gaussian Process Models with Black-Box Likelihoods.” In*Advances in Neural Information Processing Systems 28*, 1414–22. NIPS’15. Cambridge, MA, USA: MIT Press.

Domingos, Pedro. 2020.“Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.”*arXiv:2012.00152 [Cs, Stat]*, November.

Dubrule, Olivier. 2018.“Kriging, Splines, Conditional Simulation, Bayesian Inversion and Ensemble Kalman Filtering.” In*Handbook of Mathematical Geosciences: Fifty Years of IAMG*, edited by B.S. Daya Sagar, Qiuming Cheng, and Frits Agterberg, 3–24. Cham: Springer International Publishing.

Dunlop, Matthew M., Mark A. Girolami, Andrew M. Stuart, and Aretha L. Teckentrup. 2018.“How Deep Are Deep Gaussian Processes?”*Journal of Machine Learning Research* 19 (1): 2100–2145.

Dutordoir, Vincent, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, and Nicolas Durrande. 2021.“Deep Neural Networks as Point Estimates for Deep Gaussian Processes.”*arXiv:2105.04504 [Cs, Stat]*, May.

Dutordoir, Vincent, Alan Saul, Zoubin Ghahramani, and Fergus Simpson. 2022.“Neural Diffusion Processes.” arXiv.

Duvenaud, David. 2014.“Automatic Model Construction with Gaussian Processes.” PhD Thesis, University of Cambridge.

Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013.“Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In*Proceedings of the 30th International Conference on Machine Learning (ICML-13)*, 1166–74.

Ebden, Mark. 2015.“Gaussian Processes: A Quick Introduction.”*arXiv:1505.02965 [Math, Stat]*, May.

Eleftheriadis, Stefanos, Tom Nicholson, Marc Deisenroth, and James Hensman. 2017.“Identification of Gaussian Process State Space Models.” In*Advances in Neural Information Processing Systems 30*, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 5309–19. Curran Associates, Inc.

Emery, Xavier. 2007.“Conditioning Simulations of Gaussian Random Fields by Ordinary Kriging.”*Mathematical Geology* 39 (6): 607–23.

Evgeniou, Theodoros, Charles A. Micchelli, and Massimiliano Pontil. 2005.“Learning Multiple Tasks with Kernel Methods.”*Journal of Machine Learning Research* 6 (Apr): 615–37.

Ferguson, Thomas S. 1973.“A Bayesian Analysis of Some Nonparametric Problems.”*The Annals of Statistics* 1 (2): 209–30.

Finzi, Marc, Roberto Bondesan, and Max Welling. 2020.“Probabilistic Numeric Convolutional Neural Networks.”*arXiv:2010.10876 [Cs]*, October.

Föll, Roman, Bernard Haasdonk, Markus Hanselmann, and Holger Ulmer. 2017.“Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation.”*arXiv:1711.00799 [Stat]*, November.

Frigola, Roger, Yutian Chen, and Carl Edward Rasmussen. 2014.“Variational Gaussian Process State-Space Models.” In*Advances in Neural Information Processing Systems 27*, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3680–88. Curran Associates, Inc.

Frigola, Roger, Fredrik Lindsten, Thomas B Schön, and Carl Edward Rasmussen. 2013.“Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC.” In*Advances in Neural Information Processing Systems 26*, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 3156–64. Curran Associates, Inc.

Gal, Yarin, and Zoubin Ghahramani. 2015.“Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In*Proceedings of the 33rd International Conference on Machine Learning (ICML-16)*.

Gal, Yarin, and Mark van der Wilk. 2014.“Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models - a Gentle Tutorial.”*arXiv:1402.1412 [Stat]*, February.

Galliani, Pietro, Amir Dezfouli, Edwin V Bonilla, and Novi Quadrianto. n.d.“Gray-Box Inference for Structured Gaussian Process Models,” 9.

Gardner, Jacob R., Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In*Proceedings of the 32nd International Conference on Neural Information Processing Systems*, 31:7587–97. NIPS’18. Red Hook, NY, USA: Curran Associates Inc.

Gardner, Jacob R., Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“Product Kernel Interpolation for Scalable Gaussian Processes.”*arXiv:1802.08903 [Cs, Stat]*, February.

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018.“Conditional Neural Processes.”*arXiv:1807.01613 [Cs, Stat]*, July, 10.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018.“Neural Processes,” July.

Ghahramani, Zoubin. 2013.“Bayesian Non-Parametrics and the Probabilistic Approach to Modelling.”*Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences* 371 (1984): 20110553.

Gilboa, E., Y. Saatçi, and J. P. Cunningham. 2015.“Scaling Multidimensional Inference for Structured Gaussian Processes.”*IEEE Transactions on Pattern Analysis and Machine Intelligence* 37 (2): 424–36.

Girolami, Mark, and Simon Rogers. 2005.“Hierarchic Bayesian Models for Kernel Learning.” In*Proceedings of the 22nd International Conference on Machine Learning - ICML ’05*, 241–48. Bonn, Germany: ACM Press.

Gramacy, Robert B. 2016.“laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R.”*Journal of Statistical Software* 72 (1).

Gramacy, Robert B., and Daniel W. Apley. 2015.“Local Gaussian Process Approximation for Large Computer Experiments.”*Journal of Computational and Graphical Statistics* 24 (2): 561–78.

Gratiet, Loïc Le, Stefano Marelli, and Bruno Sudret. 2016.“Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes.” In*Handbook of Uncertainty Quantification*, edited by Roger Ghanem, David Higdon, and Houman Owhadi, 1–37. Cham: Springer International Publishing.

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012.“Exploiting Compositionality to Explore a Large Space of Model Structures.” In*Proceedings of the Conference on Uncertainty in Artificial Intelligence*.

Hartikainen, J., and S. Särkkä. 2010.“Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models.” In*2010 IEEE International Workshop on Machine Learning for Signal Processing*, 379–84. Kittila, Finland: IEEE.

Hensman, James, Nicolò Fusi, and Neil D. Lawrence. 2013.“Gaussian Processes for Big Data.” In*Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence*, 282–90. UAI’13. Arlington, Virginia, USA: AUAI Press.

Huber, Marco F. 2014.“Recursive Gaussian Process: On-Line Regression and Learning.”*Pattern Recognition Letters* 45 (August): 85–91.

Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018.“Scalable Gaussian Process Inference with Finite-Data Mean and Variance Guarantees.”*arXiv:1806.10234 [Cs, Stat]*, June.

Jankowiak, Martin, Geoff Pleiss, and Jacob Gardner. 2020.“Deep Sigma Point Processes.” In*Conference on Uncertainty in Artificial Intelligence*, 789–98. PMLR.

Jordan, Michael Irwin. 1999.*Learning in Graphical Models*. Cambridge, Mass.: MIT Press.

Karvonen, Toni, and Simo Särkkä. 2016.“Approximate State-Space Gaussian Processes via Spectral Transformation.” In*2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)*, 1–6. Vietri sul Mare, Salerno, Italy: IEEE.

Kasim, M. F., D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, et al. 2020.“Up to Two Billion Times Acceleration of Scientific Simulations with Deep Neural Architecture Search.”*arXiv:2001.08055 [Physics, Stat]*, January.

Kingma, Diederik P., and Max Welling. 2014.“Auto-Encoding Variational Bayes.” In*ICLR 2014 Conference*.

Ko, Jonathan, and Dieter Fox. 2009.“GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In*Autonomous Robots*, 27:75–90.

Kocijan, Juš, Agathe Girard, Blaž Banko, and Roderick Murray-Smith. 2005.“Dynamic Systems Identification with Gaussian Processes.”*Mathematical and Computer Modelling of Dynamical Systems* 11 (4): 411–24.

Krauth, Karl, Edwin V. Bonilla, Kurt Cutajar, and Maurizio Filippone. 2016.“AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In*UAI17*.

Krige, D. G. 1951.“A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.”*Journal of the Southern African Institute of Mining and Metallurgy* 52 (6): 119–39.

Kroese, Dirk P., and Zdravko I. Botev. 2013.“Spatial Process Generation.”*arXiv:1308.0399 [Stat]*, August.

Lawrence, Neil. 2005.“Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models.”*Journal of Machine Learning Research* 6 (Nov): 1783–1816.

Lawrence, Neil D., and Raquel Urtasun. 2009.“Non-Linear Matrix Factorization with Gaussian Processes.” In*Proceedings of the 26th Annual International Conference on Machine Learning*, 601–8. ICML ’09. New York, NY, USA: ACM.

Lawrence, Neil, Matthias Seeger, and Ralf Herbrich. 2003.“Fast Sparse Gaussian Process Methods: The Informative Vector Machine.” In*Proceedings of the 16th Annual Conference on Neural Information Processing Systems*, 609–16.

Lázaro-Gredilla, Miguel, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. 2010.“Sparse Spectrum Gaussian Process Regression.”*Journal of Machine Learning Research* 11 (Jun): 1865–81.

Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018.“Deep Neural Networks as Gaussian Processes.” In*ICLR*.

Leibfried, Felix, Vincent Dutordoir, S. T. John, and Nicolas Durrande. 2021.“A Tutorial on Sparse Gaussian Processes and Variational Inference.”*arXiv:2012.13962 [Cs, Stat]*, June.

Lenk, Peter J. 2003.“Bayesian Semiparametric Density Estimation and Model Verification Using a Logistic–Gaussian Process.”*Journal of Computational and Graphical Statistics* 12 (3): 548–65.

Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011.“An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 73 (4): 423–98.

Liutkus, Antoine, Roland Badeau, and Gäel Richard. 2011.“Gaussian Processes for Underdetermined Source Separation.”*IEEE Transactions on Signal Processing* 59 (7): 3155–67.

Lloyd, James Robert, David Duvenaud, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2014.“Automatic Construction and Natural-Language Description of Nonparametric Regression Models.” In*Twenty-Eighth AAAI Conference on Artificial Intelligence*.

Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019.“The Functional Neural Process.”*arXiv:1906.08324 [Cs, Stat]*, June.

MacKay, David J C. 1998.“Introduction to Gaussian Processes.”*NATO ASI Series. Series F: Computer and System Sciences* 168: 133–65.

———. 2002.“Gaussian Processes.” In*Information Theory, Inference & Learning Algorithms*, Chapter 45. Cambridge University Press.

Matheron, Georges. 1963a.*Traité de Géostatistique Appliquée. 2. Le Krigeage*. Editions Technip.

———. 1963b.“Principles of Geostatistics.”*Economic Geology* 58 (8): 1246–66.

Matthews, Alexander Graeme de Garis, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2016.“GPflow: A Gaussian Process Library Using TensorFlow.”*arXiv:1610.08733 [Stat]*, October.

Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Guilherme A. Barreto, and Neil D. Lawrence. 2017.“Deep Recurrent Gaussian Processes for Outlier-Robust System Identification.”*Journal of Process Control*, DYCOPS-CAB 2016, 60 (December): 82–94.

Mattos, César Lincoln C., Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A. Barreto, and Neil D. Lawrence. 2016.“Recurrent Gaussian Processes.” In*Proceedings of ICLR*.

Micchelli, Charles A., and Massimiliano Pontil. 2005a.“Learning the Kernel Function via Regularization.”*Journal of Machine Learning Research* 6 (Jul): 1099–1125.

———. 2005b.“On Learning Vector-Valued Functions.”*Neural Computation* 17 (1): 177–204.

Minh, Hà Quang. 2022.“Finite Sample Approximations of Exact and Entropic Wasserstein Distances Between Covariance Operators and Gaussian Processes.”*SIAM/ASA Journal on Uncertainty Quantification*, February, 96–124.

Mohammadi, Hossein, Peter Challenor, and Marc Goodfellow. 2021.“Emulating Computationally Expensive Dynamical Simulators Using Gaussian Processes.”*arXiv:2104.14987 [Stat]*, April.

Moreno-Muñoz, Pablo, Antonio Artés-Rodríguez, and Mauricio A. Álvarez. 2019.“Continual Multi-Task Gaussian Processes.”*arXiv:1911.00002 [Cs, Stat]*, October.

Nagarajan, Sai Ganesh, Gareth Peters, and Ido Nevat. 2018.“Spatial Field Reconstruction of Non-Gaussian Random Fields: The Tukey G-and-H Random Process.”*SSRN Electronic Journal*.

Nickisch, Hannes, Arno Solin, and Alexander Grigorevskiy. 2018.“State Space Gaussian Processes with Non-Gaussian Likelihood.” In*International Conference on Machine Learning*, 3789–98.

O’Hagan, A. 1978.“Curve Fitting and Optimal Design for Prediction.”*Journal of the Royal Statistical Society: Series B (Methodological)* 40 (1): 1–24.

Papaspiliopoulos, Omiros, Yvo Pokern, Gareth O. Roberts, and Andrew M. Stuart. 2012.“Nonparametric Estimation of Diffusions: A Differential Equations Approach.”*Biometrika* 99 (3): 511–31.

Pinder, Thomas, and Daniel Dodd. 2022.“GPJax: A Gaussian Process Framework in JAX.”*Journal of Open Source Software* 7 (75): 4455.

Pleiss, Geoff, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018.“Constant-Time Predictive Distributions for Gaussian Processes.” arXiv.

Pleiss, Geoff, Martin Jankowiak, David Eriksson, Anil Damle, and Jacob Gardner. 2020.“Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization.”*Advances in Neural Information Processing Systems* 33.

Quiñonero-Candela, Joaquin, and Carl Edward Rasmussen. 2005.“A Unifying View of Sparse Approximate Gaussian Process Regression.”*Journal of Machine Learning Research* 6 (Dec): 1939–59.

Raissi, Maziar, and George Em Karniadakis. 2017.“Machine Learning of Linear Differential Equations Using Gaussian Processes.”*arXiv:1701.02440 [Cs, Math, Stat]*, January.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006.*Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.

Reece, S., and S. Roberts. 2010.“An Introduction to Gaussian Processes for the Kalman Filter Expert.” In*2010 13th International Conference on Information Fusion*, 1–9.

Ritter, Hippolyt, Martin Kukla, Cheng Zhang, and Yingzhen Li. 2021.“Sparse Uncertainty Representation in Deep Learning with Inducing Weights.”*arXiv:2105.14594 [Cs, Stat]*, May.

Riutort-Mayol, Gabriel, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari. 2020.“Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming.”*arXiv:2004.11408 [Stat]*, April.

Rossi, Simone, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, and Maurizio Filippone. 2020.“Rethinking Sparse Gaussian Processes: Bayesian Approaches to Inducing-Variable Approximations,” March.

Saatçi, Yunus. 2012.“Scalable inference for structured Gaussian process models.” Ph.D., University of Cambridge.

Saatçi, Yunus, Ryan Turner, and Carl Edward Rasmussen. 2010.“Gaussian Process Change Point Models.” In*Proceedings of the 27th International Conference on International Conference on Machine Learning*, 927–34. ICML’10. Madison, WI, USA: Omnipress.

Saemundsson, Steindor, Alexander Terenin, Katja Hofmann, and Marc Peter Deisenroth. 2020.“Variational Integrator Networks for Physically Structured Embeddings.”*arXiv:1910.09349 [Cs, Stat]*, March.

Salimbeni, Hugh, and Marc Deisenroth. 2017.“Doubly Stochastic Variational Inference for Deep Gaussian Processes.” In*Advances In Neural Information Processing Systems*.

Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. 2018.“Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models.” In*International Conference on Artificial Intelligence and Statistics*, 689–97.

Särkkä, Simo. 2011.“Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression.” In*Artificial Neural Networks and Machine Learning – ICANN 2011*, edited by Timo Honkela, Włodzisław Duch, Mark Girolami, and Samuel Kaski, 6792:151–58. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.

———. 2013.*Bayesian Filtering and Smoothing*. Institute of Mathematical Statistics Textbooks 3. Cambridge, U.K. ; New York: Cambridge University Press.

Särkkä, Simo, and Jouni Hartikainen. 2012.“Infinite-Dimensional Kalman Filtering Approach to Spatio-Temporal Gaussian Process Regression.” In*Artificial Intelligence and Statistics*.

Särkkä, Simo, A. Solin, and J. Hartikainen. 2013.“Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering.”*IEEE Signal Processing Magazine* 30 (4): 51–61.

Schulam, Peter, and Suchi Saria. 2017.“Reliable Decision Support Using Counterfactual Models.” In*Proceedings of the 31st International Conference on Neural Information Processing Systems*, 1696–706. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.

Shah, Amar, Andrew Wilson, and Zoubin Ghahramani. 2014.“Student-t Processes as Alternatives to Gaussian Processes.” In*Artificial Intelligence and Statistics*, 877–85. PMLR.

Sidén, Per. 2020.*Scalable Bayesian Spatial Analysis with Gaussian Markov Random Fields*. Vol. 15. Linköping Studies in Statistics. Linköping: Linköping University Electronic Press.

Smith, Michael Thomas, Mauricio A. Alvarez, and Neil D. Lawrence. 2018.“Gaussian Process Regression for Binned Data.”*arXiv:1809.02010 [Cs, Stat]*, September.

Snelson, Edward, and Zoubin Ghahramani. 2005.“Sparse Gaussian Processes Using Pseudo-Inputs.” In*Advances in Neural Information Processing Systems*, 1257–64.

Solin, Arno, and Simo Särkkä. 2020.“Hilbert Space Methods for Reduced-Rank Gaussian Process Regression.”*Statistics and Computing* 30 (2): 419–46.

Tait, Daniel J., and Theodoros Damoulas. 2020.“Variational Autoencoding of PDE Inverse Problems.”*arXiv:2006.15641 [Cs, Stat]*, June.

Tang, Wenpin, Lu Zhang, and Sudipto Banerjee. 2019.“On Identifiability and Consistency of the Nugget in Gaussian Spatial Process Models.”*arXiv:1908.05726 [Math, Stat]*, August.

Titsias, Michalis K. 2009a.“Variational Learning of Inducing Variables in Sparse Gaussian Processes.” In*International Conference on Artificial Intelligence and Statistics*, 567–74. PMLR.

———. 2009b.“Variational Model Selection for Sparse Gaussian Process Regression: Technical Supplement.” Technical report, School of Computer Science, University of Manchester.

Titsias, Michalis, and Neil D. Lawrence. 2010.“Bayesian Gaussian Process Latent Variable Model.” In*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*, 844–51.

Tokdar, Surya T. 2007.“Towards a Faster Implementation of Density Estimation With Logistic Gaussian Process Priors.”*Journal of Computational and Graphical Statistics* 16 (3): 633–55.

Turner, Richard E., and Maneesh Sahani. 2014.“Time-Frequency Analysis as Probabilistic Inference.”*IEEE Transactions on Signal Processing* 62 (23): 6171–83.

Turner, Ryan, Marc Deisenroth, and Carl Rasmussen. 2010.“State-Space Inference and Learning with Gaussian Processes.” In*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*, 868–75.

Vanhatalo, Jarno, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. 2013.“GPstuff: Bayesian Modeling with Gaussian Processes.”*Journal of Machine Learning Research* 14 (April): 1175–79.

———. 2015.“Bayesian Modeling with Gaussian Processes Using the GPstuff Toolbox.”*arXiv:1206.5754 [Cs, Stat]*, July.

Walder, Christian, Kwang In Kim, and Bernhard Schölkopf. 2008.“Sparse Multiscale Gaussian Process Regression.” In*Proceedings of the 25th International Conference on Machine Learning*, 1112–19. ICML ’08. New York, NY, USA: ACM.

Walder, C., B. Schölkopf, and O. Chapelle. 2006.“Implicit Surface Modelling with a Globally Regularised Basis of Compact Support.”*Computer Graphics Forum* 25 (3): 635–44.

Wang, Ke, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2019.“Exact Gaussian Processes on a Million Data Points.” In*Advances in Neural Information Processing Systems*, 32:14648–59. Red Hook, NY, USA.

Wikle, Christopher K., Noel Cressie, and Andrew Zammit-Mangion. 2019.*Spatio-Temporal Statistics with R*.

Wilk, Mark van der, Andrew G. Wilson, and Carl E. Rasmussen. 2014.“Variational Inference for Latent Variable Modelling of Correlation Structure.” In*NIPS 2014 Workshop on Advances in Variational Inference*.

Wilkinson, William J., Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, and Arno Solin. 2019.“End-to-End Probabilistic Inference for Nonstationary Audio Analysis.”*arXiv:1901.11436 [Cs, Eess, Stat]*, January.

Wilkinson, William J., Simo Särkkä, and Arno Solin. 2021.“Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.” arXiv.

Williams, Christopher KI, and Matthias Seeger. 2001.“Using the Nyström Method to Speed Up Kernel Machines.” In*Advances in Neural Information Processing Systems*, 682–88.

Williams, Christopher, Stefan Klanke, Sethu Vijayakumar, and Kian M. Chai. 2009.“Multi-Task Gaussian Process Learning of Robot Inverse Dynamics.” In*Advances in Neural Information Processing Systems 21*, edited by D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 265–72. Curran Associates, Inc.

Wilson, Andrew Gordon, and Ryan Prescott Adams. 2013.“Gaussian Process Kernels for Pattern Discovery and Extrapolation.” In*International Conference on Machine Learning*.

Wilson, Andrew Gordon, Christoph Dann, Christopher G. Lucas, and Eric P. Xing. 2015.“The Human Kernel.”*arXiv:1510.07389 [Cs, Stat]*, October.

Wilson, Andrew Gordon, and Zoubin Ghahramani. 2011.“Generalised Wishart Processes.” In*Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence*, 736–44. UAI’11. Arlington, Virginia, United States: AUAI Press.

———. 2012.“Modelling Input Varying Correlations Between Multiple Responses.” In*Machine Learning and Knowledge Discovery in Databases*, edited by Peter A. Flach, Tijl De Bie, and Nello Cristianini, 858–61. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

Wilson, Andrew Gordon, David A. Knowles, and Zoubin Ghahramani. 2012.“Gaussian Process Regression Networks.” In*Proceedings of the 29th International Coference on International Conference on Machine Learning*, 1139–46. ICML’12. Madison, WI, USA: Omnipress.

Wilson, Andrew Gordon, and Hannes Nickisch. 2015.“Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP).” In*Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37*, 1775–84. ICML’15. Lille, France: JMLR.org.

Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Deisenroth. 2020.“Efficiently Sampling Functions from Gaussian Process Posteriors.” In*Proceedings of the 37th International Conference on Machine Learning*, 10292–302. PMLR.

Wilson, James T, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. 2021.“Pathwise Conditioning of Gaussian Processes.”*Journal of Machine Learning Research* 22 (105): 1–47.

Zhang, Rui, Christian Walder, Edwin V. Bonilla, Marian-Andrei Rizoiu, and Lexing Xie. 2020.“Quantile Propagation for Wasserstein-Approximate Gaussian Processes.” In*Proceedings of NeurIPS 2020*.

Jha et al. (2022):

The uncertainty-aware Neural Process Family (NPF) (Garnelo, Rosenbaum, et al. 2018; Garnelo, Schwarz, et al. 2018) aims to address the aforementioned limitations of the Bayesian paradigm by exploiting the function approximation capabilities of deep neural networks to learn a family of real-world data-generating processes, a.k.a., stochastic Gaussian processes (GPs) (Rasmussen and Williams 2006). Neural processes (NPs) define uncertainties in predictions in terms of a conditional distribution over functions given the context (observations) \(C\) drawn from a distribution of functions. Here, each function \(f\) is parameterized using neural networks and can be thought of capturing an underlying data generating stochastic process.

To model the variability of \(f\) based on the variability of the generated data, NPs concurrently train and test their learned parameters on multiple datasets. This endows them with the capability to meta learn their predictive distributions over functions. The meta-learning setup makes NPs fundamentally distinguished from other non-Bayesian uncertainty-aware learning frameworks like stochastic GPs. NPF members thus combine the best of meta learners, GPs and neural networks. Like GPs, NPs learn a distribution of functions, quickly adapt to new observations, and provide uncertainty measures given test time observations. Like neural networks, NPs learn function approximation from data directly besides being efficient at inference. To learn \(f\), NPs incorporate the encoder-decoder architecture that comprises a functional encoding of each observation point followed by the learning of a decoder function whose parameters are capable of unraveling the unobserved function realizations to approximate the outputs of \(f\)…. Despite their resemblance to NPs, the vanilla encoder-decoder networks traditionally based on CNNs, RNNs, and Transformers operate merely on pointwise inputs and clearly lack the incentive to meta learn representations for dynamically changing functions (imagine \(f\) changing over a continuum such as time) and their families. The NPF members not only improve upon these architectures to model functional input spaces and provide uncertainty-aware estimates but also offer natural benefits to a number of challenging real-world tasks. Our study brings into light the potential of NPF models for several such tasks including but not limited to the handling of missing data, handling off-the-grid data, allowing continual and active learning out-of-the-box, superior interpretation capabilities all the while leveraging a diverse range of task-specific inductive biases.
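To make the encoder-decoder description concrete, here is a toy, untrained forward pass in the conditional-neural-process style using plain numpy. Every layer size, weight, and helper name here is illustrative, not the published architecture (see Garnelo, Rosenbaum, et al. 2018 for the real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def init(d_in, d_h, d_out, rng):
    # Two tanh-MLP layers with small random weights (untrained).
    return [(0.1 * rng.standard_normal((d_in, d_h)), np.zeros(d_h)),
            (0.1 * rng.standard_normal((d_h, d_out)), np.zeros(d_out))]

def mlp(params, x):
    (W1, b1), (W2, b2) = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

d_r = 8
enc = init(2, 16, d_r, rng)       # encodes each (x_c, y_c) context pair
dec = init(1 + d_r, 16, 2, rng)   # maps (x_t, r) to (mu, log_sigma)

def cnp_forward(xc, yc, xt):
    # Encode each context point, then aggregate by the mean so the
    # representation is permutation-invariant in the context set.
    r = mlp(enc, np.stack([xc, yc], axis=-1)).mean(axis=0)
    inp = np.concatenate([xt[:, None], np.tile(r, (len(xt), 1))], axis=1)
    out = mlp(dec, inp)
    mu, log_sigma = out[:, 0], out[:, 1]
    return mu, np.exp(log_sigma)

xc = np.array([-1.0, 0.0, 1.0])
yc = np.sin(xc)
xt = np.linspace(-2, 2, 5)
mu, sigma = cnp_forward(xc, yc, xt)
```

The mean aggregation is the load-bearing detail: it makes the context representation permutation-invariant, which is precisely what the pointwise CNN/RNN/Transformer encoder-decoders discussed above lack.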

Garnelo, Marta, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. “Conditional Neural Processes.” *arXiv:1807.01613 [Cs, Stat]*, July.

Garnelo, Marta, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, and Yee Whye Teh. 2018. “Neural Processes,” July.

Jha, Saurav, Dong Gong, Xuesong Wang, Richard E. Turner, and Lina Yao. 2022. “The Neural Process Family: Survey, Applications and Perspectives.” arXiv.

Louizos, Christos, Xiahan Shi, Klamer Schutte, and Max Welling. 2019. “The Functional Neural Process.” *arXiv:1906.08324 [Cs, Stat]*, June.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. *Gaussian Processes for Machine Learning*. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press.

Singh, Gautam, Jaesik Yoon, Youngsung Son, and Sungjin Ahn. 2019. “Sequential Neural Processes.” *arXiv:1906.10264 [Cs, Stat]*, June.

The main point IMO of the markdown format was that it is supposed to be easy to read and not require a specialised editor. Nonetheless, people inevitably end up wanting one, because we want to integrate document syncing, or mathematics preview, or image linking, or note taking etc.

There are many, both integrated into normal text editors and more specialized editors; see e.g. this review for an overview of some interesting ones. In particular some of the note taking systems use markdown as the backend and double as markdown editors.

🏗

My own workflow is based on VS Code these days and I use that as much as possible for my markdown editing as well. It has built-in markdown preview, although there are various friction points for mathematics.

RStudio has a neatly integrated markdown editor especially for RMarkdown documents, which I also use.

Notational Velocity, joplin, turtl and zettlr are all note-taking apps which happen to work in markdown and thus include markdown editors.

Typora is another one that seems popular. Available for Windows, macOS and Linux. Does look pretty and highly polished. Hipster support, e.g. from mathpix.

Markdown plus has an open source online version and offline apps you can buy to support the creators.

markdeep is a designerly markdown renderer with good integration to other javascript-markdown outputs.

Regression estimation with penalties on the model parameters. I am especially interested when the penalties are sparsifying penalties, and I have more notes on sparse regression.

Here I consider general penalties: ridge etc. At least in principle — I have no active projects using penalties without sparsifying them at the moment.

Why might I use such penalties? One reason would be that\(L_2\) penalties have simple forms for their information criteria(Konishi and Kitagawa 2008, §5.2.4).

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse-flavoured ice cream.

To discuss:

Ridge penalties, relationship with robust regression, statistical learning theory etc.

In nonparametric statistics we might estimate simultaneously what look like many, many parameters, which we constrain in some clever fashion, which usually boils down to something we can interpret as a “penalty” on the parameters.

“Penalization” has a genealogy unknown to me, but is probably the least abstruse term for common, general usage.

The “regularisation” nomenclature claims descent from Tikhonov (e.g. Tikhonov and Glasko (1965)), who wanted to solve ill-conditioned integral and differential equations, which is a slightly more general setting.

In statistics, the term “shrinkage” is used for very nearly the same thing.

“Smoothing” seems to be common in the spline and kernel estimate communities of Silverman (1982, 1984), Wahba (1990) et al., who usually actually want to smooth curves. When we say “smoothing” we usually mean that the predictions can be expressed as a “linear smoother”/hat matrix, which has certain nice properties in generalised cross validation.

“Smoothing” is not a great general term, since penalisation does not necessarily cause “smoothness” from any particular perspective — for example, some penalties cause the coefficients to become sparse and therefore, from the perspective of the coefficients, promote non-smooth vectors. Often the thing that becomes smooth is not obvious.
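To pin down the “linear smoother” point: for a ridge fit the predictions are linear in the responses, \(\hat{y} = H(\lambda)y\) with hat matrix \(H(\lambda) = X(X^\top X + \lambda I)^{-1}X^\top\), and generalised cross validation scores \(\lambda\) using only \(H\) and the residuals. A toy numpy sketch (my own construction, not from any particular package):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: noisy sine curve, fitted with a polynomial basis + ridge penalty.
n = 60
x = np.linspace(0, 3, n)
X = np.vander(x, 6, increasing=True)   # degree-5 polynomial design matrix
y = np.sin(x) + rng.normal(0, 0.2, n)

def gcv(lam):
    """Generalised cross-validation score for a ridge (linear) smoother.

    The fit is linear in y: y_hat = H(lam) @ y, where H is the hat matrix.
    GCV(lam) = n * ||y - H y||^2 / (n - tr(H))^2.
    """
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    resid = y - H @ y
    return n * (resid @ resid) / (n - np.trace(H)) ** 2

# Choose the penalty weight by minimising GCV over a grid.
lams = 10.0 ** np.arange(-8, 4)
best = min(lams, key=gcv)
```

The trace of the hat matrix plays the role of “effective degrees of freedom” here, which is what makes GCV cheap: no refitting on held-out data is needed.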

Regardless, what these problems share in common is that we wish to solve an ill-conditioned inverse problem, so we tame it by adding a penalty to solutions we feel reluctant to accept.

🏗 specifics

Famously, a penalty can have an interpretation as a Bayesian prior on the solution space. It is a fun exercise, for example, to “rediscover” lasso regression as a typical linear regression but with a Laplace prior on the coefficients. In that case the maximum a posteriori estimate under that prior and the lasso solution coincide. If you want to know the full posterior you have to do a lot more work. But the connection is suggestive nonetheless.
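The ridge analogue of this correspondence is easy to verify numerically: a Gaussian \(N(0,\lambda^{-1}I)\) prior on the coefficients (with unit noise variance) gives a negative log-posterior that is exactly the ridge objective, so the closed-form ridge estimate is the MAP. A toy check of my own, with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=50)
lam = 5.0

# Penalised least squares: the closed-form ridge estimate.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Bayesian view: Gaussian likelihood (unit variance) + N(0, I/lam) prior.
# Up to additive constants, the negative log-posterior IS the ridge objective.
def neg_log_post(w):
    return 0.5 * np.sum((y - X @ w) ** 2) + 0.5 * lam * np.sum(w ** 2)

# The MAP estimate is the stationary point of the (convex) objective:
# the gradient of the negative log-posterior vanishes at w_ridge.
grad_at_ridge = -X.T @ (y - X @ w_ridge) + lam * w_ridge
```

The lasso case works the same way with a Laplace prior, except that the MAP no longer has a closed form and must be found by convex optimisation.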

A related and useful connection is the interpretation of covariance kernels as priors producing smoothness in solutions. A very elegant introduction to these is given in Miller, Glennie, and Seaton (2020).

TBD. James and Stein (1961). A beautiful explainer video.
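A quick simulation makes the Stein phenomenon concrete: when estimating a normal mean of dimension \(p \geq 3\) from a single observation, shrinking the raw observation toward the origin by a data-dependent factor dominates the maximum likelihood estimator in total squared error. A toy sketch of my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
p, reps = 10, 2000
theta = np.ones(p)                       # true mean vector, dimension p >= 3
z = theta + rng.normal(size=(reps, p))   # one N(theta, I) draw per repetition

# James-Stein: shrink each observation toward the origin by 1 - (p-2)/||z||^2.
shrink = 1 - (p - 2) / np.sum(z ** 2, axis=1, keepdims=True)
js = shrink * z

mse_mle = np.mean(np.sum((z - theta) ** 2, axis=1))   # risk of the raw z: about p
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller
```

The MLE's total risk is \(p\) by construction; the James–Stein estimator's is smaller for every \(\theta\), which is the surprise.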

What should we regularize to attain specific kinds of solutions?

Here’s one thing I saw recently:

Venkat Chandrasekaran, Learning Semidefinite Regularizers via Matrix Factorization

Abstract: Regularization techniques are widely employed in the solution of inverse problems in data analysis and scientific computing due to their effectiveness in addressing difficulties due to ill-posedness. In their most common manifestation, these methods take the form of penalty functions added to the objective in optimization-based approaches for solving inverse problems. The purpose of the penalty function is to induce a desired structure in the solution, and these functions are specified based on prior domain-specific expertise. We consider the problem of learning suitable regularization functions from data in settings in which prior domain knowledge is not directly available. Previous work under the title of ‘dictionary learning’ or ‘sparse coding’ may be viewed as learning a polyhedral regularizer from data. We describe generalizations of these methods to learn semidefinite regularizers by computing structured factorizations of data matrices. Our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with operator analogs of Sinkhorn scaling. The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems. (Joint work with Yong Sheng Soh)

Akaike, Hirotugu. 1973. “Information Theory and an Extension of the Maximum Likelihood Principle.” In *Proceedings of the Second International Symposium on Information Theory*, edited by B. N. Petrov and F. Csáki, 199–213. Budapest: Akademiai Kiado.

Akaike, Hirotugu. 1973. “Maximum Likelihood Identification of Gaussian Autoregressive Moving Average Models.” *Biometrika* 60 (2): 255–65.

Azizyan, Martin, Akshay Krishnamurthy, and Aarti Singh. 2015.“Extreme Compressive Sampling for Covariance Estimation.”*arXiv:1506.00898 [Cs, Math, Stat]*, June.

Bach, Francis. 2009.“Model-Consistent Sparse Estimation Through the Bootstrap.”*arXiv:0901.3202 [Cs, Stat]*.

Banerjee, Arindam, Sheng Chen, Farideh Fazayeli, and Vidyashankar Sivakumar. 2014.“Estimation with Norm Regularization.” In*Advances in Neural Information Processing Systems 27*, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 1556–64. Curran Associates, Inc.

Barron, Andrew R., Cong Huang, Jonathan Q. Li, and Xi Luo. 2008.“MDL, Penalized Likelihood, and Statistical Risk.” In*Information Theory Workshop, 2008. ITW’08. IEEE*, 247–57. IEEE.

Battiti, Roberto. 1992.“First-and Second-Order Methods for Learning: Between Steepest Descent and Newton’s Method.”*Neural Computation* 4 (2): 141–66.

Bellec, Pierre C., and Alexandre B. Tsybakov. 2016.“Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty.”*arXiv:1609.06675 [Math, Stat]*, September.

Bickel, Peter J., Bo Li, Alexandre B. Tsybakov, Sara A. van de Geer, Bin Yu, Teófilo Valdés, Carlos Rivero, Jianqing Fan, and Aad van der Vaart. 2006.“Regularization in Statistics.”*Test* 15 (2): 271–344.

Brown, Lawrence D., and Yi Lin. 2004.“Statistical Properties of the Method of Regularization with Periodic Gaussian Reproducing Kernel.”*The Annals of Statistics* 32 (4): 1723–43.

Bühlmann, Peter, and Sara van de Geer. 2011.“Additive Models and Many Smooth Univariate Functions.” In*Statistics for High-Dimensional Data*, 77–97. Springer Series in Statistics. Springer Berlin Heidelberg.

———. 2015.“High-Dimensional Inference in Misspecified Linear Models.”*arXiv:1503.06426 [Stat]* 9 (1): 1449–73.

Burman, P., and D. Nolan. 1995.“A General Akaike-Type Criterion for Model Selection in Robust Regression.”*Biometrika* 82 (4): 877–86.

Candès, Emmanuel J., and Carlos Fernandez-Granda. 2013.“Super-Resolution from Noisy Data.”*Journal of Fourier Analysis and Applications* 19 (6): 1229–54.

Candès, Emmanuel J., and Y. Plan. 2010.“Matrix Completion With Noise.”*Proceedings of the IEEE* 98 (6): 925–36.

Cavanaugh, Joseph E. 1997.“Unifying the Derivations for the Akaike and Corrected Akaike Information Criteria.”*Statistics & Probability Letters* 33 (2): 201–8.

Chen, Yen-Chi, and Yu-Xiang Wang. n.d.“Discussion on‘Confidence Intervals and Hypothesis Testing for High-Dimensional Regression’.”

Chernozhukov, Victor, Christian Hansen, and Martin Spindler. 2015.“Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach.”*Annual Review of Economics* 7 (1): 649–88.

Efron, Bradley. 2004.“The Estimation of Prediction Error.”*Journal of the American Statistical Association* 99 (467): 619–32.

Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004.“Least Angle Regression.”*The Annals of Statistics* 32 (2): 407–99.

Flynn, Cheryl J., Clifford M. Hurvich, and Jeffrey S. Simonoff. 2013.“Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models.”*arXiv:1302.2068 [Stat]*, February.

Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010.“Regularization Paths for Generalized Linear Models via Coordinate Descent.”*Journal of Statistical Software* 33 (1): 1–22.

Fuglstad, Geir-Arne, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2019.“Constructing Priors That Penalize the Complexity of Gaussian Random Fields.”*Journal of the American Statistical Association* 114 (525): 445–52.

Geer, Sara van de. 2014a.“Weakly Decomposable Regularization Penalties and Structured Sparsity.”*Scandinavian Journal of Statistics* 41 (1): 72–86.

———. 2014b.“Statistical Theory for High-Dimensional Models.”*arXiv:1409.8557 [Math, Stat]*, September.

Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014.“On the Stability of Deep Networks.”*arXiv:1412.5896 [Cs, Math, Stat]*, December.

Golub, Gene H., Michael Heath, and Grace Wahba. 1979.“Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter.”*Technometrics* 21 (2): 215–23.

Golubev, Grigori K., and Michael Nussbaum. 1990.“A Risk Bound in Sobolev Class Regression.”*The Annals of Statistics* 18 (2): 758–78.

Green, P. J. 1990.“Bayesian Reconstructions from Emission Tomography Data Using a Modified EM Algorithm.”*IEEE Transactions on Medical Imaging* 9 (1): 84–93.

Green, Peter J. 1990.“On Use of the EM for Penalized Likelihood Estimation.”*Journal of the Royal Statistical Society. Series B (Methodological)* 52 (3): 443–52.

Gu, Chong. 1993.“Smoothing Spline Density Estimation: A Dimensionless Automatic Algorithm.”*Journal of the American Statistical Association* 88 (422): 495–504.

Gui, Jiang, and Hongzhe Li. 2005.“Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings, with Applications to Microarray Gene Expression Data.”*Bioinformatics* 21 (13): 3001–8.

Hastie, Trevor J., and Robert J. Tibshirani. 1990.*Generalized Additive Models*. Vol. 43. CRC Press.

Hastie, Trevor J., Tibshirani, Rob, and Martin J. Wainwright. 2015.*Statistical Learning with Sparsity: The Lasso and Generalizations*. Boca Raton: Chapman and Hall/CRC.

Hawe, S., M. Kleinsteuber, and K. Diepold. 2013.“Analysis Operator Learning and Its Application to Image Reconstruction.”*IEEE Transactions on Image Processing* 22 (6): 2138–50.

Hegde, Chinmay, Piotr Indyk, and Ludwig Schmidt. 2015.“A Nearly-Linear Time Framework for Graph-Structured Sparsity.” In*Proceedings of the 32nd International Conference on Machine Learning (ICML-15)*, 928–37.

Hoerl, Arthur E., and Robert W. Kennard. 1970.“Ridge Regression: Biased Estimation for Nonorthogonal Problems.”*Technometrics* 12 (1): 55–67.

Huang, Jianhua Z., Naiping Liu, Mohsen Pourahmadi, and Linxu Liu. 2006.“Covariance Matrix Selection and Estimation via Penalised Normal Likelihood.”*Biometrika* 93 (1): 85–98.

James, William, and Charles Stein. 1961.“Estimation with Quadratic Loss.” In*Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability*, 1:361–79. University of California Press.

Janson, Lucas, William Fithian, and Trevor J. Hastie. 2015.“Effective Degrees of Freedom: A Flawed Metaphor.”*Biometrika* 102 (2): 479–85.

Javanmard, Adel, and Andrea Montanari. 2014.“Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.”*Journal of Machine Learning Research* 15 (1): 2869–909.

Kaufman, S., and S. Rosset. 2014.“When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counterexamples.”*Biometrika* 101 (4): 771–84.

Kloft, Marius, Ulrich Rückert, and Peter L. Bartlett. 2010.“A Unifying View of Multiple Kernel Learning.” In*Machine Learning and Knowledge Discovery in Databases*, edited by José Luis Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, 66–81. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

Koenker, Roger, and Ivan Mizera. 2006.“Density Estimation by Total Variation Regularization.”*Advances in Statistical Modeling and Inference*, 613–34.

Konishi, Sadanori, and G. Kitagawa. 2008.*Information Criteria and Statistical Modeling*. Springer Series in Statistics. New York: Springer.

Konishi, Sadanori, and Genshiro Kitagawa. 1996.“Generalised Information Criteria in Model Selection.”*Biometrika* 83 (4): 875–90.

Lange, K. 1990.“Convergence of EM image reconstruction algorithms with Gibbs smoothing.”*IEEE transactions on medical imaging* 9 (4): 439–46.

Liu, Han, Kathryn Roeder, and Larry Wasserman. 2010.“Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models.” In*Advances in Neural Information Processing Systems 23*, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 1432–40. Curran Associates, Inc.

Meinshausen, Nicolai, and Peter Bühlmann. 2010.“Stability Selection.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 72 (4): 417–73.

Meyer, Mary C. 2008.“Inference Using Shape-Restricted Regression Splines.”*The Annals of Applied Statistics* 2 (3): 1013–33.

Miller, David L., Richard Glennie, and Andrew E. Seaton. 2020.“Understanding the Stochastic Partial Differential Equation Approach to Smoothing.”*Journal of Agricultural, Biological and Environmental Statistics* 25 (1): 1–16.

Montanari, Andrea. 2012.“Graphical Models Concepts in Compressed Sensing.”*Compressed Sensing: Theory and Applications*, 394–438.

Needell, D., and J. A. Tropp. 2008.“CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples.”*arXiv:0803.2392 [Cs, Math]*, March.

Rahimi, Ali, and Benjamin Recht. 2009.“Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In*Advances in Neural Information Processing Systems*, 1313–20. Curran Associates, Inc.

Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2015.“Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” In*Proceedings of ICML*.

Rigollet, Philippe, and Jonathan Weed. 2018.“Entropic Optimal Transport Is Maximum-Likelihood Deconvolution.” arXiv.

Shen, Xiaotong, and Hsin-Cheng Huang. 2006.“Optimal Model Assessment, Selection, and Combination.”*Journal of the American Statistical Association* 101 (474): 554–68.

Shen, Xiaotong, Hsin-Cheng Huang, and Jimmy Ye. 2004.“Adaptive Model Selection and Assessment for Exponential Family Distributions.”*Technometrics* 46 (3): 306–17.

Shen, Xiaotong, and Jianming Ye. 2002.“Adaptive Model Selection.”*Journal of the American Statistical Association* 97 (457): 210–21.

Silverman, B. W. 1982.“On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method.”*The Annals of Statistics* 10 (3): 795–810.

———. 1984.“Spline Smoothing: The Equivalent Variable Kernel Method.”*The Annals of Statistics* 12 (3): 898–916.

Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011.“Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.”*Journal of Statistical Software* 39 (5).

Smola, Alex J., Bernhard Schölkopf, and Klaus-Robert Müller. 1998.“The Connection Between Regularization Operators and Support Vector Kernels.”*Neural Networks* 11 (4): 637–49.

Somekh-Baruch, Anelia, Amir Leshem, and Venkatesh Saligrama. 2016.“On the Non-Existence of Unbiased Estimators in Constrained Estimation Problems.”*arXiv:1609.07415 [Cs, Math, Stat]*, September.

Stein, Charles M. 1981.“Estimation of the Mean of a Multivariate Normal Distribution.”*The Annals of Statistics* 9 (6): 1135–51.

Tansey, Wesley, Oluwasanmi Koyejo, Russell A. Poldrack, and James G. Scott. 2014.“False Discovery Rate Smoothing.”*arXiv:1411.6144 [Stat]*, November.

Tikhonov, A. N., and V. B. Glasko. 1965.“Use of the Regularization Method in Non-Linear Problems.”*USSR Computational Mathematics and Mathematical Physics* 5 (3): 93–107.

Uematsu, Yoshimasa. 2015.“Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application.”*arXiv:1504.06706 [Math, Stat]*, April.

Wahba, Grace. 1990.*Spline Models for Observational Data*. SIAM.

Weng, Haolei, Arian Maleki, and Le Zheng. 2018.“Overcoming the Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques.”*The Annals of Statistics* 46 (6A): 3099–129.

Wood, S. N. 2000.“Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 62 (2): 413–28.

Wood, Simon N. 2008.“Fast Stable Direct Fitting and Smoothness Selection for Generalized Additive Models.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (3): 495–518.

Wu, Tong Tong, and Kenneth Lange. 2008.“Coordinate Descent Algorithms for Lasso Penalized Regression.”*The Annals of Applied Statistics* 2 (1): 224–44.

Xie, Bo, Yingyu Liang, and Le Song. 2016.“Diversity Leads to Generalization in Neural Networks.”*arXiv:1611.03131 [Cs, Stat]*, November.

Ye, Jianming. 1998.“On Measuring and Correcting the Effects of Data Mining and Model Selection.”*Journal of the American Statistical Association* 93 (441): 120–31.

Zhang, Cun-Hui, and Stephanie S. Zhang. 2014.“Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 76 (1): 217–42.

Zhang, Yiyun, Runze Li, and Chih-Ling Tsai. 2010.“Regularization Parameter Selections via Generalized Information Criterion.”*Journal of the American Statistical Association* 105 (489): 312–23.

Zou, Hui, and Trevor Hastie. 2005.“Regularization and Variable Selection via the Elastic Net.”*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 67 (2): 301–20.

Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2007.“On the‘Degrees of Freedom’ of the Lasso.”*The Annals of Statistics* 35 (5): 2173–92.

Jupyter, the python code dev environment, is not a monolithic thing, but a whole ecology.
Different language back-end kernels talk to various front-end interfaces, with varying systems of interaction and rendering in the mix.
I can execute python code interactively using several *different* front-ends, each offering a different user experience.
Because the, er, food web of this ecosystem is complicated, it is not always easy to know which chunk of code is responsible for which part of the user experience.

Of the many irritations I have with jupyter, the IMO horrible default frontends (“classic” and “lab”) provide the worst user experience. But there are choices for all tastes and some of them are not as bad. Let us examine some:

- command line
- Classic jupyter notebook, the archetypal browser-based coding environment. The command `jupyter notebook` starts this frontend.
- jupyterlab, the newer iteration of the classic notebook, extends and redesigns it into a pseudo-IDE with idiosyncratic text editors, notebooks, REPL terminals etc. The command `jupyter lab` starts this front-end. It is more powerful than classic but also even more confusing.
- The base `ipython` shell can execute notebooks. (Is that the same as `jupyter console`?)
- ~~vscode Jupyter is an editor for notebooks in VS Code.~~ VS Code has jupyter integration with corporate backing. It is fast and IMO less nasty than the horrible jupyter notebook UX which I constantly whinge about. This is what I currently use. IMO, IDE interfaces like this represent a generally better way of doing python execution, because for most of us, executing python code requires writing python code. Many jupyter frontends are great at running code, but not so great at editing it. Code editors and IDEs exist and have had much effort poured into them, to the point where they are pretty good at editing code. Jupyter should not need to reinvent text editors, although my opinions will not dissuade some Quixote from continuing to try. (C&C the phenomenon of RStudio insistently reinventing code editors for R.)
- hydrogen is an equivalent for Atom that I have not used much but might be fine.
- nteract is a system for turning jupyter notebooks into apps somehow?
- pweave also executes jupyter kernels as part of a reproducible document.
- qtconsole is a traditional client which de-emphasises browser-based stuff in favour of desktop-integrated windows.
- Probably others too.

`jupyter execute notebook.ipynb`

*What is the difference between this and just running normal python scripts from the command line?*, you might ask.
For one thing, jupyter wastes more hard disk space and doesn’t version nicely.
But papermill improves it further, turning a jupyter notebook into a python script with a mediocre python command line interface.

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you:

- parameterize notebooks
- execute notebooks

This opens up new opportunities for how notebooks can be used. For example:

- Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year. Using parameters makes this task easier.
- Do you want to run a notebook and, depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.

VS code’s jupyter integration is good-ish and its support for plain python is excellent.

A jupyter frontend is*already* an app that we need to install to interact with the code; so why not install a good one instead of a bad one?
Further, there are a huge number of benefits to using a proper editor to edit code, instead of jupyter.
More benefits than I care to enumerate, but I’ll start.

I do not need to learn new keyboard shortcuts.
I do not need to arse around with the various defective reimplementations of text editors from inside the browser.
Further, heaps of other things that jupyter cannot even dream of just magically work!
Session sharing? No problem. Remote editing? Easy!
Type inference! Autocomplete! Debugger injection!
Search and replace across a whole project! Style checking! refactor assistance!*Comprehensible documentation*.

See VS Code for python for more details.

Still, if that is not your jam, read on for some onerous alternatives that I hate.

The one that came recommended until, I dunno, 2015 or so I guess? I cannot recall. Worth knowing about to set expectations.

The location of theming infrastructure, widgets, CSS etc. has moved of late; check your version number. The current location is `~/.jupyter/custom/custom.css`, not the former location `~/.ipython/profile_default/static/custom/custom.css`.

Julius Schulz’s ultimate setup guide is also the ultimate pro tip compilation.

If I must use jupyter notebook then I kill parenthesis molestation (referred to in the docs as *bracket autoclose*) with fire.
I do not like having to fight with the notebook’s misplaced faith in its ability to read my mind.
The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N” but `cm_config.autoCloseBrackets`, and it is not in the preference menus.
According to a support ticket, the following should work.

```
# Run this in Python once; it should take effect permanently
from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {
    "cm_config": {"autoCloseBrackets": False}}})
```

or add the following to `custom.js`:

```
define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})
```

or maybe create `~/.jupyter/nbconfig/notebook.json` with the content:

```
{
  "CodeCell": {
    "cm_config": {
      "autoCloseBrackets": false
    }
  }
}
```

That doesn’t work with `jupyterlab`, which is even more righteously sure that it knows better than you what you truly wish to do with parentheses.
Perhaps the following does work?
Go to `Settings --> Advanced Settings Editor` and add the following to the User Overrides section:

```
{
  "codeCellConfig": {
    "autoClosingBrackets": false
  }
}
```

or put the same thing in the file `~/.jupyter/lab/user-settings/@jupyterlab/notebook-extension/tracker.jupyterlab-settings`.

Ahhhhhhhh. Update: it is easier now.

Jupyter classic is more usable if you install the notebook extensions, which include, e.g., drag-and-drop image support.

```
$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user
```

For example, if you run `nbconvert` to generate an HTML file, such a dragged-in image will remain outside the HTML file. You can embed all images by calling `nbconvert` with the `EmbedPostProcessor`:

`$ jupyter nbconvert --post=embed.EmbedPostProcessor`

**Update** — broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

```
$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user
```

`jupyter lab` (sometimes styled `jupyterlab`) is the current cutting edge according to jupyter mainline, and is reputedly much nicer to develop plugins for than the notebook interface.
From the user perspective it’s more or less the same thing, but the annoyances are different.
It does not strictly dominate `notebook` in terms of user experience, although I understand it may do in terms of the experience of plugin developers.

A pitch targeted at us users explains some practical implications of jupyterlab and how it is the one true way and the righteous future etc. Since we are not in the future, though, we must deal with certain friction points in the present, actually-existing jupyterlab.

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already worked, but as a jankier javascript version. The jupyter lab creators have succumbed as far as reimplementing copy, paste, search/replace, browser tabs and the command line. In the jupyter tradition, I think of this as

Yo dawg I heard you like notebook tabs so I put notebook tabs in your notebook browser tab.

The replacement jupyter tab system clashes with the browser tab system, keyboard shortcuts and generally confuses the eye.
Why do we get a search function which AFAICT non-deterministically sometimes does regexp matching but then doesn’t search the whole page?
Possibly the intention is that it should not be run through the browser, but in a custom app?
Or maybe for true believers, once you load up a jupyter notebook you like it so hard that you never need to browse to a website on the internet again and you close all the other tabs forever.
Whatever the rationale, the learning curve for this particular bag of weird UI choices is bumpy, and the lessons are not transferable.
Is the learning process worth it though?
Am I merely grumpy because I do not like some things?
I am *not* using jupyter for its artisanal, quirky alternate take on tabs, cut-and-paste etc., but because I want a quick interface to run some shareable code with embedded graphics.
Does it get me that?

Maybe. If I get in and out fast, do most of my development in a real code editor and leave the jupyter nonsense for results sharing, I neither need the weird UI nor am bothered by it. The other features are like the button on the microwave labelled “fish”, which no one has ever needed or intentionally used, but which pointless button does not stop the microwave from defrosting things.

At the same time, some jupyterlab enthusiasts want to re-implement text editors, which is an indicator that there might be a contagion of NIH fever going about the community.

Whether*I* like the overwrought jupyter lab UX or not, we should allow it a general nonsense baggage allowance,
if the developer API is truly cleaner and easier to work with.
That would be a solid win in terms of delivering the features I would actually regard as improvements including, maybe, ultimately, a better UI.

Simultaneous users are supported natively in jupyterlab, with an OK UI for identifying running notebooks. IMO this is the killer feature of jupyter lab. Which is to say, it is useable, but sometimes calculation output gets lost in the system somewhere.

There is an underdocumented project to introduce real-time collaboration to jupyterlab, coordinating on notebook content, code, output *and* backend state, which apparently works but is barely mentioned in the docs.
Maybe the JupyterLab RTC Design Documentation would help there.

If you have a snakepit of different jupyter sessions running on some machine you have just logged in to, and wish to open up the browser to get a UI for them, then you want to work out which are running on that machine so that you can attach to them. The command (for either jupyter notebook notebooks or jupyter lab sessions) is:

`jupyter notebook list`

Related to, inspired by, and maybe conflicting or intersecting with the nbextensions are the labextensions, which add bits of extra functionality to the lab interface rather than the notebook interface (the lab interface is built upon the notebook infrastructure and runs notebooks just like it, but has some different moving parts under the hood).

I try to keep the use of these to a minimum as I have a possibly irrational foreboding that some complicated death spiral of version clashes is beginning between all the different jupyter kernel and lab and notebook installations I have cluttering up my hard disk, and it can’t improve things to put various versions of lab extensions in the mix, can it? And I really don’t want to have to understand how it works to work out whether that is true or not, so please don’t explain it to me. I moreover do not wish to obsessively update lab extensions everywhere.

Anyway there are some useful ones, so I live with it by running install and update commands obsessively in every combination of kernel/lab/whatever environment in the hope that something sticks.

Life is easier with jupyterlab-toc, which allows you to navigate your lab notebook by markdown section headings.

`jupyter labextension install @jupyterlab/toc`

`jupyter labextension update @jupyterlab/toc`

Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

`jupyter labextension install jupyterlab-drawio`

LaTeX editor? As flagged, I think this is a terrible idea. Even worse than the diagram editor. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, e.g. as a plot point in a contrived techno-thriller script written by cloistered nerds. If you find yourself in such dramaturgical straits:

`jupyter labextension install @jupyterlab/latex`

One nerdy extension is jupyter-matplotlib, a.k.a., confusingly, `ipympl`, which integrates interactive plotting into the notebook better.

```
pip install ipympl
# If using JupyterLab
# Install nodejs: https://nodejs.org/en/download/
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib
```

jupyterlab/jupyterlab-hdf5 claims to provide a UI for HDF5 files.

`qtconsole`

A classic, i.e. non-web-browser-based, client for jupyter. No longer fashionable? Seems to work fine but is sometimes difficult to compile and doesn’t support all the fancy client-side extensions.

`jupyter qtconsole`

launches it. It can connect two frontends to the same kernel.
This will be loopy since they update the same variables (presumably) but AFAICT not the same notebook content, so some care would be required to make sure you are doing what you intend.
From within a running notebook, the magic

`%qtconsole`

opens a qtconsole attached to that notebook’s kernel.

A proprietary google fork/extension, Colaboratory is a jupyter thingo integrated with some fancy hosting and storage infrastructure, and you get free GPUs. Looks like a neat way of sharing things, and all that they demand is your soul. Don’t be fooled, though: claims you see on the internet that this is a real-time collaborative environment are false; Google killed realtime interaction.

hydrogen, a plugin for the `atom` text editor, provides a more unified coding experience with a normal code editor. See the intro blog post.

pweave is like knitr for python. It also executes jupyter kernels. The emphasis is not interactive use but rather reproducible documents.

nteract is a system for running jupyter notebooks as desktop apps, integrating with OS indexing services and looking pretty etc. Not totally sold on this idea because it looks so bloaty, but I would like to be persuaded.

generative art using DALL-E 2, stable diffusion, midjourney etc., which are diffusion + transformer models.

CLIP presumably goes here.

How do I run Stable Diffusion and sharing FAQs : StableDiffusion

diffusion_models/diffusion_03_waveform.ipynb at main · acids-ircam/diffusion_models

This startup is setting a DALL-E 2-like AI free, consequences be damned | TechCrunch

Google AI Blog: High Fidelity Image Generation Using Diffusion Models

Dhariwal, Prafulla, and Alex Nichol. 2021.“Diffusion Models Beat GANs on Image Synthesis.”*arXiv:2105.05233 [Cs, Stat]*, June.

Han, Xizewen, Huangjie Zheng, and Mingyuan Zhou. 2022.“CARD: Classification and Regression Diffusion Models.” arXiv.

Ho, Jonathan, Ajay Jain, and Pieter Abbeel. 2020.“Denoising Diffusion Probabilistic Models.”*arXiv:2006.11239 [Cs, Stat]*, December.

Hoogeboom, Emiel, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, and Tim Salimans. 2021.“Autoregressive Diffusion Models.”*arXiv:2110.02037 [Cs, Stat]*, October.

Nichol, Alex, and Prafulla Dhariwal. 2021.“Improved Denoising Diffusion Probabilistic Models.”*arXiv:2102.09672 [Cs, Stat]*, February.

Sohl-Dickstein, Jascha, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015.“Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.”*arXiv:1503.03585 [Cond-Mat, q-Bio, Stat]*, November.

Song, Jiaming, Chenlin Meng, and Stefano Ermon. 2021.“Denoising Diffusion Implicit Models.”*arXiv:2010.02502 [Cs]*, November.

Yang, Ling, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Ming-Hsuan Yang, and Bin Cui. 2022.“Diffusion Models: A Comprehensive Survey of Methods and Applications.” arXiv.

Moving the mouse cursor by looking where you want it to go. These devices seem gamer-oriented right now. Some do integrate into voice control systems though.

On heuristic mechanism and institutional design for communities of scientific practice, for the common property resource that is human knowledge. Sociology of science, in other words. How do diverse underfunded teams manage to advance truth with their weird prestige economy despite the many pitfalls of publication filters and such? What is effective in designing communities, practice and social norms? Both of scientific insiders and outsiders? How much communication is too much? How much iconoclasm is right to defeat groupthink and foster the spread of good ideas? At an individual level we might wonder about soft methodology.

A place to file questions like this, in other words (O’Connor and Wu 2021):

Diversity of practice is widely recognized as crucial to scientific progress. If all scientists perform the same tests in their research, they might miss important insights that other tests would yield. If all scientists adhere to the same theories, they might fail to explore other options which, in turn, might be superior. But the mechanisms that lead to this sort of diversity can also generate epistemic harms when scientific communities fail to reach swift consensus on successful theories. In this paper, we draw on extant literature using network models to investigate diversity in science. We evaluate different mechanisms from the modeling literature that can promote transient diversity of practice, keeping in mind ethical and practical constraints posed by real epistemic communities. We ask: what are the best ways to promote the right amount of diversity of practice in such communities?

- Molecular Missionaries - by Samir Unni - New Science
- Open call: Consequences of the Scientific Reform Movement · Journal of Trial & Error
- Scott And Scurvy (Idle Words)
- Escaping science’s paradox - Works in Progress discusses red-teaming science and incentivising productive failures
- Progress Studies: A Discipline is a Set of Institutional Norms (connection to innovation)

Hanania on Tetlock and the Taliban makes a point about the illusory nature of some expertise.

[Tetlock’s results] show that “expertise” as we understand it is largely fake. Should you listen to epidemiologists or economists when it comes to COVID-19? Conventional wisdom says “trust the experts.” The lesson of Tetlock (and the Afghanistan War), is that while you certainly shouldn’t be getting all your information from your uncle’s Facebook Wall, there is no reason to start with a strong prior that people with medical degrees know more than any intelligent person who honestly looks at the available data.

He has some clever examples about science community in there.
Then he draws a longer bow and makes some IMO less considered swipes at straw-man diversity, which somewhat ruins the effect for me. Zeynep Tufekci gets at the actual problem that I think both people who talk about *contrarianism* and *diversity* would like to get at: do the incentives, and especially the incentives in social structures, actually encourage the researchers towards truths, or towards collective fictions?

Sometimes, going against consensus is conflated with contrarianism. Contrarianism is juvenile, and misleads people. It’s not a good habit.

The opposite of contrarianism isn’t accepting elite consensus or being gullible.

Groupthink, especially when big interests are involved, is common. The job is to resist groupthink with facts, logic, work and a sense of duty to the public. History rewards that, not contrarianism.

To get the right lessons from why we fail—be it masks or airborne transmission or failing to regulate tech when we could or Iraq war—it’s key to study how that groupthink occurred. It’s a sociological process: vested interests arguing themselves into positions that benefit them.

Scott Alexander, in Contrarians, Crackpots, and Consensus, tries to crack this one open with an ontology.

I think a lot of things are getting obscured by the term “scientific establishment” or “scientific consensus”. Imagine a pyramid with the following levels from top to bottom:

FIRST, specialist researchers in a field…

SECOND, non-specialist researchers in a broader field…

THIRD, the organs and administrators of a field who help set guidelines…

FOURTH, science journalism, meaning everyone from the science reporters at the New York Times to the guys writing books with titles like The Antidepressant Wars to random bloggers…

ALSO FOURTH IN A DIFFERENT COLUMN OF THE PYRAMID BECAUSE THIS IS A HYBRID GREEK PYRAMID THAT HAS COLUMNS, “fieldworkers”, aka the professionals we charge with putting the research into practice. … FIFTH, the general public.

A lot of these issues make a lot more sense in terms of different theories going on at the same time on different levels of the pyramid. I get the impression that in the 1990s, the specialist researchers, the non-specialist researchers, and the organs and administrators were all pretty responsible about saying that the serotonin theory was just a theory and only represented one facet of the multifaceted disease of depression. Science journalists and prescribing psychiatrists were less responsible about this, and so the general public may well have ended up with an inaccurate picture.

- Why open science is primarily a labour issue - Samuel Moore
- The online information environment | Royal Society
- Jonathan Rauch — The Constitution of Knowledge: A Defense of Truth
- ISPs Funded 8.5 Million Fake Comments Opposing Net Neutrality
- Biggest ISPs paid for 8.5 million fake FCC comments opposing net neutrality
- 80% of the 22 million comments on net neutrality rollback were fake, investigation finds
- How much should theories guide learning through experiments?
- Every Bug is Shallow if One of Your Readers is an Entomologist – SLIME MOLD TIME MOLD

Bright (2023)

Du Bois took quite the opposite route from trying to introduce lotteries, with their embrace of chance randomization. In fact, to a very considerable degree he centrally planned the sort of research his group would carry out so as to form an interlinking whole. Where the status quo system allows for competition between scientists to give funding out piecemeal to whoever seems best at a given moment, Du Bois’ work embodies the attitude that as far as possible our research activities should be coordinated, and not aimed at rewarding individual greatness but rather producing the best overall project. While ideas along these lines have not been totally without support in the history of philosophy of science (see e.g. Neurath 1946, Bernal 1949, Kummerfeld & Zollman 2015), it is safe to say the epistemic merits of this are relatively under-explored. Our brief examination of Du Bois’ plan will thus hopefully form a spur to generate more consideration of this sort of holistic line of action.

Agassi, Joseph. 1974.“The Logic of Scientific Inquiry.”*Synthese* 26: 498–514.

Alon, Uri. 2009.“How to Choose a Good Scientific Problem.”*Molecular Cell* 35 (6): 726–28.

Arbesman, Samuel, and Nicholas A Christakis. 2011.“Eurekometrics: Analyzing the Nature of Discovery.”*PLoS Comput Biol* 7 (6): e1002072.

Arvan, Marcus, Liam Kofi Bright, and Remco Heesen. 2022.“Jury Theorems for Peer Review.”*The British Journal for the Philosophy of Science*, January.

Azoulay, Pierre, Christian Fons-Rosen, and Joshua S. Graff Zivin. 2015.“Does Science Advance One Funeral at a Time?” Working Paper 21788. National Bureau of Economic Research.

Bhattacharya, Jay, and Mikko Packalen. 2020.“Stagnation and Scientific Incentives.” Working Paper 26752. National Bureau of Economic Research.

Board, Simon, and Moritz Meyer-ter-Vehn. 2021.“Learning Dynamics in Social Networks.”*Econometrica* 89 (6): 2601–35.

Bright, Liam Kofi. 2023.“Du Bois on the Centralised Organisation of Science.” In*Pluralising Philosophy’s Past*, edited by Marius Backmann and Amber Griffioen.

Campante, Filipe, Ruben Durante, and Andrea Tesei. 2022.“Media and Social Capital.”*Annual Review of Economics* 14 (1): 69–91.

Chu, Johan S. G., and James A. Evans. 2021.“Slowed Canonical Progress in Large Fields of Science.”*Proceedings of the National Academy of Sciences* 118 (41): e2021636118.

Dang, Haixin, and Liam Kofi Bright. 2021.“Scientific Conclusions Need Not Be Accurate, Justified, or Believed by Their Authors.”*Synthese* 199 (3-4): 8187–8203.

Devezer, Berna, Luis G. Nardin, Bert Baumgaertner, and Erkan Ozge Buzbas. 2019.“Scientific Discovery in a Model-Centric Framework: Reproducibility, Innovation, and Epistemic Diversity.”*PLOS ONE* 14 (5): e0216125.

Dubova, Marina, Arseny Moskvichev, and Kevin Zollman. 2022.“Against Theory-Motivated Experimentation in Science.” MetaArXiv.

Farrow, Robert, and Rolin Moe. 2019.“Rethinking the Role of the Academy: Cognitive Authority in the Age of Post-Truth.”*Teaching in Higher Education* 24 (3): 272–87.

Galesic, Mirta, Daniel Barkoczi, Andrew Berdahl, Dora Biro, Giuseppe Carbone, Ilaria Giannoccaro, Robert Goldstone, et al. 2022.“Beyond Collective Intelligence: Collective Adaptation.” SocArXiv.

Gasparyan, Armen Yuri, Alexey N. Gerasimov, Alexander A. Voronov, and George D. Kitas. 2015.“Rewarding Peer Reviewers: Maintaining the Integrity of Science Communication.”*Journal of Korean Medical Science* 30 (4): 360–64.

Greenberg, Steven A. 2009.“How Citation Distortions Create Unfounded Authority: Analysis of a Citation Network.”*BMJ* 339 (July): b2680.

Healy, Kieran. 2015.“The Performativity of Networks.”*European Journal of Sociology* 56 (02): 175–205.

Heesen, Remco, and Liam Kofi Bright. 2021.“Is Peer Review a Good Idea?”*The British Journal for the Philosophy of Science* 72 (3): 635–63.

Ioannidis, John P. 2005.“Why Most Published Research Findings Are False.”*PLoS Medicine* 2 (8): e124.

Jan, Zeeshan. n.d.“Recognition and Reward System for Peer-Reviewers,” 9.

Kearns, Hugh, and Maria Gardiner. 2011.“The Care and Maintenance of Your Adviser.”*Nature* 469 (7331): 570–70.

Lakatos, Imre. 1980.*The Methodology of Scientific Research Programmes: Volume 1 : Philosophical Papers*. Cambridge University Press.

McElreath, Richard, and Robert Boyd. 2007.*Mathematical Models of Social Evolution: A Guide for the Perplexed*. University Of Chicago Press.

McElreath, Richard, and Paul E. Smaldino. 2015.“Replication, Communication, and the Population Dynamics of Scientific Discovery.”*arXiv:1503.02780 [Stat]*, March.

Merrifield, Michael R, and Donald G Saari. 2009.“Telescope Time Without Tears: A Distributed Approach to Peer Review.”*Astronomy & Geophysics* 50 (4): 4.16–20.

Merton, Robert K. 1968.“The Matthew Effect in Science.”*Science* 159 (3810): 56–63.

———. 1988.“The Matthew Effect in Science, II: Cumulative Advantage and the Symbolism of Intellectual Property.”*Isis* 79 (4): 606–23.

Nissen, Silas B., Tali Magidson, Kevin Gross, and Carl T. Bergstrom. 2016.“Publication Bias and the Canonization of False Facts.”*arXiv:1609.00494 [Physics, Stat]*, September.

O’Connor, Cailin. 2017.“Evolving to Generalize: Trading Precision for Speed.”*British Journal for the Philosophy of Science* 68 (2).

O’Connor, Cailin, and Justin Bruner. 2019.“Dynamics and Diversity in Epistemic Communities.”*Erkenntnis* 84 (1): 101–19.

O’Connor, Cailin, and James Owen Weatherall. 2017.“Scientific Polarization.”*European Journal for Philosophy of Science* 8 (3): 855–75.

O’Connor, Cailin, and James Owen Weatherall. 2019.*The Misinformation Age: How False Beliefs Spread*. 1st edition. New Haven: Yale University Press.

O’Connor, Cailin, and Jingyi Wu. 2021.“How Should We Promote Transient Diversity in Science?” MetaArXiv.

Rekdal, Ole Bjørn. 2014.“Academic Urban Legends.”*Social Studies of Science* 44 (4): 638–54.

Robbins, Lionel. 1932.*An Essay on the Nature and Significance of Economic Science*. Macmillan.

Ross, Matthew B., Britta M. Glennon, Raviv Murciano-Goroff, Enrico G. Berkes, Bruce A. Weinberg, and Julia I. Lane. 2022.“Women Are Credited Less in Science Than Men.”*Nature*, June, 1–11.

Rubin, Hannah, and Cailin O’Connor. 2018.“Discrimination and Collaboration in Science.”*Philosophy of Science* 85 (3): 380–402.

Rzhetsky, Andrey, Jacob G. Foster, Ian T. Foster, and James A. Evans. 2015.“Choosing Experiments to Accelerate Collective Discovery.”*Proceedings of the National Academy of Sciences* 112 (47): 14569–74.

Smith, Lones, Peter Norman Sørensen, and Jianrong Tian. 2021.“Informational Herding, Optimal Experimentation, and Contrarianism.”*The Review of Economic Studies* 88 (5): 2527–54.

Spranzi, Marta. 2004.“Galileo and the Mountains of the Moon: Analogical Reasoning, Models and Metaphors in Scientific Discovery.”*Journal of Cognition and Culture* 4 (3): 451–83.

Stove, David Charles. 1982.*Popper and After: Four Modern Irrationalists*. Pergamon.

Suppes, Patrick. 2002.*Representation and Invariance of Scientific Structures*. CSLI Publications.

Thagard, Paul. 1993.“Societies of Minds: Science as Distributed Computing.”*Studies in History and Philosophy of Modern Physics* 24: 49.

———. 1994.“Mind, Society, and the Growth of Knowledge.”*Philosophy of Science* 61.

———. 1997.“Collaborative Knowledge.”*Noûs* 31 (2): 242–61.

———. 2005.“How to Be a Successful Scientist.”*Scientific and Technological Thinking*, 159–71.

———. 2007.“Coherence, Truth, and the Development of Scientific Knowledge.”*Philosophy of Science* 74: 28–47.

Thagard, Paul, and Abninder Litt. 2008.“Models of Scientific Explanation.” In*The Cambridge Handbook of Computational Psychology*. Cambridge: Cambridge University Press.

Thagard, Paul, and Jing Zhu. 2003.“Acupuncture, Incommensurability, and Conceptual Change.”*Intentional Conceptual Change*, 79–102.

Thurner, Stefan, and Rudolf Hanel. 2010.“Peer-Review in a World with Rational Scientists: Toward Selection of the Average.”

Valente, Thomas W, and Everett M. Rogers. 1995.“The Origins and Development of the Diffusion of Innovations Paradigm as an Example of Scientific Growth.”*Science Communication* 16 (3): 242–73.

Vazire, Simine. 2017.“Our Obsession with Eminence Warps Research.”*Nature News* 547 (7661): 7.

Wagenmakers, Eric-Jan, Alexandra Sarafoglou, and Balazs Aczel. 2022.“One Statistical Analysis Must Not Rule Them All.”*Nature* 605 (7910): 423–25.

Weisbuch, Gérard, Guillaume Deffuant, Frédéric Amblard, and Jean-Pierre Nadal. 2002.“Meet, Discuss, and Segregate!”*Complexity* 7 (3): 55–63.

Weng, L, A Flammini, A Vespignani, F Menczer, L Weng, A Flammini, A Vespignani, and F Menczer. 2012.“Competition Among Memes in a World with Limited Attention.”*Scientific Reports* 2.

Wible, James R. 1998.*Economics of Science*. Routledge.

Williams, Daniel. 2022.“The Marketplace of Rationalizations.”*Economics & Philosophy*, March, 1–25.

Woodley, Lou, and Katie Pratt. 2020.“The CSCCE Community Participation Model – A Framework to Describe Member Engagement and Information Flow in STEM Communities,” August.

Wu, Jingyi, Cailin O’Connor, and Paul E. Smaldino. 2022.“The Cultural Evolution of Science.” MetaArXiv.

Yarkoni, Tal. 2019.“The Generalizability Crisis.” Preprint. PsyArXiv.

Zimmer, Carl. 2020.“How You Should Read Coronavirus Studies, or Any Science Paper.”*The New York Times*, June 1, 2020, sec. Science.

Is the Internal Family Systems Model of psychotherapy actually useful? In this system people are encouraged to think of themselves as a family of little sub-people. What would it say about us if this works?

Multi agent mind after Kaj Sotala

This website is a static site, by which I mean, *it is a folder of files on my hard drive*.

When I want to publish new content, I run these files through a static site generator, which bundles them up, generates an index and a content page, formats everything as HTML files a web browser can understand, then copies*those* files to a server somewhere.
After that, I am free from any further responsibility for its upkeep.
The server that hosts this content can be extremely simple, which means I do not need to spend much effort on security or configuration, or hosting fees etc.

This is a high performance, low-friction way of doing things, at least for me.
I do not need to worry about manually copying my notes from my hard drive to the website.
My notes*are* my website.

The main pain point of static sites IMO is that there are many systems for making them, each pitched at a particular level of nerdiness, but there are few methods targeted at non-nerds.
Also, yak shaving risk: Such sites are highly customisable, and so cry out for automation and macros and setting up*just how I like it*, which is probably not how other people like it.
Any static site generator which is too nerdy seems incomprehensibly idiosyncratic.
Any static site generator which is not nerdy enough seems tediously menial.
The upshot is that these things are great for personal use but can be tricky for collaboration.

The academic blogging workflow is a sequel to this one, targeted at researchers,
wherein I recommend plain text static-site blogging. *Here* I do not worry so much about certain features which are important mostly to academics,
e.g. mathematical equations, graphs, citations…

Bloggers might have less academic priorities. If you want less mathematical markup and more monetization, try the blogosphere.

This documentation is oriented to my priorities, but you can find a lot of stuff googling*JAMstack*, which is the hype name for this static setup.

The static site generator. The core bit. The software that takes my plain content files and turns them into friendly websites with all the nice decorations around the edges and colour schemes and indexes and stuff.
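The core transformation is simple enough to caricature in stdlib Python. This is a hypothetical toy, not any generator mentioned below; real ones add markdown rendering, templating, themes, feeds and asset handling:

```python
from html import escape
from pathlib import Path

def build_site(content_dir: str, output_dir: str) -> None:
    """Toy static site generator: turn a folder of .txt notes into a folder
    of .html pages, plus an index.html linking them all."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for src in sorted(Path(content_dir).glob("*.txt")):
        title = src.stem.replace("-", " ")
        body = escape(src.read_text())
        # One HTML page per content file, with the text in a <pre> block.
        (out / f"{src.stem}.html").write_text(
            f"<html><head><title>{escape(title)}</title></head>"
            f"<body><h1>{escape(title)}</h1><pre>{body}</pre></body></html>"
        )
        links.append(f'<li><a href="{src.stem}.html">{escape(title)}</a></li>')
    # The index is just a list of links to every page.
    (out / "index.html").write_text(
        "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    )
```

Everything else a real generator does (shortcodes, taxonomies, colour schemes) is elaboration on this loop.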

As mentioned, there are hundreds — maybe thousands — of static site generators. The lineage is ancient, including such ancestors as the original World Wide Web and its progenitors, travelling via primordial (and no longer active) static site generators such as the venerable blosxom into the current day. See the About page to see which one(s) I am (currently) using for this site.

TODO: AFAICT there is not much to choose between the various site generators I mention below as far as the base functionality goes (taking some files and making them look acceptable on the internet).
There are some advanced features which*would*
be distinguishing, if I had treated them more thoroughly:

- Good graphical preview in an editor.
- Intuitive handling of images and other media.
- Ease of collaboration on content via a CMS of some kind.

Some interesting ones:

- Hugo (go) is a popular system. Its R companion, blogdown, is probably ascendant for academics.
- Next.js by Vercel has momentum and is beloved of people building rich interactions and sites that transcend the feeling of being a “static” site. If I wasn’t trying to be an academic this would be an interesting thing to try out simply because the tooling is good.
- Gatsby (javascript) is another hipster JS one, probably slightly past Next on the hype curve.
- Eleventy happens to be javascript but claims to be the least obtrusive static site generator. Introducing Eleventy, a new Static Site Generator mentions: “While Eleventy uses JavaScript in node.js to transform templates into content, importantly (by default) it does not recommend nor force your HTML to include any Eleventy-specific client-side JavaScript. This is a core facet of the project’s intent and goals. We are not a JavaScript framework. We want our content decoupled as much as possible from Eleventy altogether, and because Eleventy uses templating engines that are Eleventy-independent, it gets us much closer to that goal.”
- Pelican (python), the previous engine for this blog, is easy to hack if you use python.
- Jekyll (ruby) is the default for github, although I personally could never make it work for me because of something about forking and plugins and other stuff that was so boring that I erased it from my brain.
- Hakyll is a haskell variant of jekyll with good pandoc integration.
- Neuron (also haskell) is noteworthy because it puts Zettelkasten online, which is nice if that is your thing.
- There are some extra ones, below, that integrate specialised editor apps, a.k.a. CMSs.
- Not quite a static site, but org2blog publishes org mode notes to a website, even a “non static” one. As seen in Nick Higham’s Blog Workflow.

Some, like `jekyll` or `hugo`, are opinionated and provide a featureful setup per default.
Others, like lettersmith, take a DIY route where they provide the libraries to build something minimal, but it is up to you.

For my part, I used dokuwiki for a while (no longer recommended), then switched to Pelican (fine), and have now settled upon blogdown (i.e. hugo+RMarkdown) which has better support for academic blogging.

If your static site system comes with some kind of app that will edit that site it is called a*CMS*, for*content management system*.
There is a continuum between that and an editor with integrated static site generator capabilities.
Also there is truly no sharp distinction between online and offline editors, for all that I have tried to make one below for the sake of simplicity.
Sometimes the local CMS can run on the internet, sometimes that would be unwise or inconvenient.

If you use markdown, which
is the *de facto* standard markup for plain text blogging, it might be a good start to simply preview that in the old code editor.
If you are using some other weirder specialised markup, good on you but I will
not cover that complexity.
Presumably if you know enough to do that, you know the consequences.

For a combination blogging tool and encrypted markdown note storage you might want to use something like standard notes, which costs some money when you use the bells and whistles, although might be worth it if your notes include confidential ones.

Preview tools, that show you plain text as rendered web-style HTML, make it all nicer.

Lektor is a static site generator with an integrated local CMS that looks Wordpress-like. Seems to be made of python.

publii is a desktop-based CMS with integrated site generator for Windows, Mac and Linux. Seems to be based on Electron/node.js.

Text editors Atom and vs code have built-in markdown preview, which is rough but often helps.

RStudio has sophisticated integration with blogdown blogs.

blot.im (USD4/month)

A blogging platform with no interface

Why a blogging platform with no interface? So you can blog with your favorite tools. Blot turns a folder into a blog. Drag-and-drop files inside to publish. Images, text files, Word Documents, Markdown and more become blog posts automatically.

It supports mathematical markup.

Hokus is one just for Hugo sites. (Untouched for two years).

As mentioned above, Caddy has a built-in automatic hugo editor.

marked is a cheap macOS markdown editor/previewer…

restview is a previewer for an alternative markup called ReST

mou is cheap and looks nice.

and (free! open source! mou-like design): Macdown

livereload turns any browser into a preview tool.

Experts can run a localhost dev server which will host a local copy of the website.
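If your generator does not ship one, stdlib Python suffices for a static folder. A minimal sketch; `public` as the output directory is an assumption, substitute whatever your generator emits:

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve(directory: str, port: int = 8000) -> HTTPServer:
    """Return an HTTP server for `directory`, bound to localhost.

    Call .serve_forever() on the result (Ctrl-C to stop). Passing port=0
    picks a free port, readable afterwards as .server_port."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)

# e.g. serve("public").serve_forever()  # "public" is hypothetical output dir
```

(Equivalently, `python -m http.server` run inside the output folder does the same job.)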

Websites that edit your website for you.

forestry seems popular. It has a rather good interface and I quite like it, but it has some red flags:

- it seems to be a sunset product receiving no updates, and also
- it wants intrusive repository permissions (it seems to demand read and write access to *all* my private github repos?)

NetlifyCMS is Netlify’s generic CMS client for various static site backends offering a friendly, integrated CMS workflow. Code seems to have been untouched for two years.

Tina, by the creators of forestry.io, specialises in NextJS in particular but adds extra features by being tightly-coupled instead of generic. User experience at this stage is ungainly; there are a lot of logins to various different static site providers, the default config didn’t actually set up a working site for me, and now I cannot work out how to even delete it. But the demos look impressive. Maybe it will get better?

gitbook is a markdown website GUI and publishing toolchain.

“Prose provides a beautifully simple content authoring environment for CMS-free websites. It’s a web-based interface for managing content on GitHub. Use it to create, edit, and delete files, and save your changes directly to GitHub. Host your website on GitHub Pages for free, or set up your own GitHub webhook server.”

It is indeed lovely and minimalist. The subset of markdown that it supports is also minimalist, so this blog looks funky if I edit it in prose. If you do not need mathematics and citations this is not a terrible option.

Draft is a collaborative frontend for document editing, although not AFAICT publishing.

Commercial option Cosmic can do lots of stuff, but for multiple users is expensive (USD99/month).

Wagtail plus django-bakery together render static sites from a dynamic database. One could fashion a UI out of these parts, but it would be a lot of work.

Gitit is a wiki backed by a git, darcs, or mercurial filestore. Pages and uploaded files can be modified either directly via the VCS’s command-line tools or through the wiki’s web interface. Pandoc is used for markup processing, so pages may be written in (extended) markdown, reStructuredText, LaTeX, HTML, or literate Haskell, and exported in ten different formats, including LaTeX, ConTeXt, DocBook, RTF, OpenOffice ODT, and MediaWiki markup.

~~cactus is a plain website generator that features a GUI-ish client, cactus for mac~~

~~classeur attempts to be friendly for more than nerds.~~

See link rot.

Try JAMstackthemes for a smörgåsbord of themes for various software.

Here are some hosts I have auditioned to host my main static site (i.e. this blog).

github incidentally hosts sites as part of their `github pages` thing.

netlify is a hosting/CDN/etc provider with good github integration, with especially fancy affordances for javascript static site builders like gatsby or nextjs.

They support a local dev server which makes stuff convenient.

Vercel supports many static apps and has integrated tools for traffic tracking. See Simon Willison’s case study. They presumably support nextjs well because they invented it.

Various other generic hosts such as Hostinger or Vindo. Have not tried them.

Useful:

Instant site publishing from a folder of files: Netlify drop.

Hosting comments is a weak point for static sites, since by definition there is no content server to wait around for random drive-by interactions from the internet. But it is feasible by leveraging various other hosted services, in a slightly laborious, quirky, two-tiers-of-content kind of way.

I used the hosted Disqus system but it was bloated and suspect, so I seek alternatives.

- Ben Fedidat’s reviews
- Hugo’s suggestions
- Peter Baumgartner’s reviews

Welcomments is a new entrant that for a monthly fee will de-spam and stash comments in github for you. I currently use this service for this site and I love it. They handle all the hosting.

staticman is an (optionally) hosted (open source) app that integrates comments into your source. It looks mostly smooth; the lack of authentication might be tedious if there was a lot of traffic on your blog.

There is a Go option called remark42.

Netlify advocates for their own DIY solution: gotell: Netlify Comments is an API and build tool for handling large amounts of comments for JAMstack products, but it seems to be discontinued?

Web annotation tool hypothes.is can be used as a weird kind of web comment system.

Annotate the web, with anyone, anywhere. We’re a nonprofit on a mission to bring an open conversation over the whole web. Use Hypothesis right now to hold discussions, read socially, organize your research, and take personal notes.

It is targeted at academics, who are the people whose comments I generally want, plus it is run by a non-profit. It has fancy options.

Seems to be open source, and in principle one could extract one’s own data from it.

One could host a server running its own comment system.

The most hip seems to be Commento, an optionally-hosted open-source comments system. It has heavy dependencies if you are self-hosting.

Schnack is a simple `node.js` one supporting various 3rd-party authentication.
Another simple alternative is isso (python); it
has no third-party authentication support, so I am nervous about having to do my
own account management, but it has a certain kind of seductive simplicity.

Talkyard is optionally-hosted open-source forum software which integrates blog comments as a side effect.

A small amount of work can repurpose github issues as a comment system, although it is clunky, and requires your users to be prepared to open github issues if they want to comment, which is nerdier than one would like.

This can be made smoother with utterances, which automates some of the legwork in github comments at the cost of needing a helper app to run.

giscus is a comments system powered by GitHub Discussions. Let visitors leave comments and reactions on your website via GitHub! Heavily inspired by utterances.

- Open source. 🌏
- No tracking, no ads, always free. 📡 🚫
- No database needed. All data is stored in GitHub Discussions.
- Supportscustom themes! 🌗
- Supportsmultiple languages. 🌐
- Extensively configurable. 🔧
- Automatically fetches new comments and edits from GitHub. 🔃
- Can be self-hosted! 🤳

See the list of tools maintained by Hugo. For my config, algolia/algoliasearch-netlify is convenient and cheap, so I use that. TODO: work out how to link to searches.

Tips for LaTeX specific to mathematical typesetting. See also Chris Cheung’s list.

Breaking equations across lines is weird in maths.
The breqn package position paper explains many of the issues.
NB: the solutions that actually work in the javascript-backed LaTeX maths are the `multline` (nb only one `i`) and `split` environments, so in practice I use those to ensure cross-compatibility of copy-pasta.
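For example, a `split` inside `equation` renders both in real LaTeX and in the javascript renderers:

```
\begin{equation}
\begin{split}
(a+b)^2 &= (a+b)(a+b) \\
        &= a^2 + 2ab + b^2
\end{split}
\end{equation}
```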

`\numberthis`

The LaTeX macro `\numberthis` is satisfying.
Thanks Russell Tsuchida for showing it to me.
It allows me to number *only* needed lines in multi-line equations.
The macro goes like this:

`\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}}`

A minimal example in context:

```
\documentclass{article}
\usepackage{amsmath}
\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}}
\begin{document}
\begin{align*}
a &=b \\
&=c \numberthis \label{eqn}
\end{align*}
Equation \eqref{eqn} shows that $a=c$.
\begin{equation}
d = e
\end{equation}
\end{document}
```

I forget this all the time. Explained by overleaf, the math font size ordering is

```
\displaystyle % Size for equations in display mode
\textstyle % Size for equations in text mode
\scriptstyle % Size for first sub/superscripts
\scriptscriptstyle % Size for subsequent sub/superscripts
```
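For example, a sum in running text shrinks its limits unless forced into display style:

```
In text, $\sum_{i=1}^{n} x_i$ sets the limits beside the sum;
$\displaystyle\sum_{i=1}^{n} x_i$ forces the display-mode layout.
```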

There is an `array` environment which is good for typesetting equations, but it is too verbose for typesetting arrays of other things like numbers.

Use an `amsmath` matrix environment for that, e.g.

```
\begin{pmatrix}
1 & 2 & 3\\
a & b & c
\end{pmatrix}
```

If we want a brace underneath part of an equation, `amsmath` provides `\underbrace`.

```
\documentclass{article}
\usepackage{amsmath}% http://ctan.org/pkg/amsmath
\begin{document}
\[
\underbrace{u'-P(x)u^2-Q(x)u-R(x)}_{\text{=0, since~$u$ is a particular solution.}}
\]
\end{document}
```

Operators without limits, i.e. with the limits set on the side: \({\mathop{\mathrm{arg\,max}}\nolimits}_{x\to\infty} x\). Plain style (works everywhere including old MathJax):

`\newcommand{\sech}{\mathop{\mathrm{sech}}\nolimits}`

`amsmath`

style (works in AMSMath environments):

`\DeclareMathOperator{\sech}{sech}`

If we want limits underneath (this does not seem to display correctly on this blog, so we will have to use our imaginations):

Vanilla:

`\newcommand{\sech}{\mathop{\mathrm{sech}}\limits}`
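For completeness, `amsmath` also has a starred form, `\DeclareMathOperator*`, which places the limits underneath in display style:

```
\documentclass{article}
\usepackage{amsmath}
% starred form: limits go underneath in display style
\DeclareMathOperator*{\argmax}{arg\,max}
\begin{document}
\[
\argmax_{x\to\infty} x
\]
\end{document}
```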

Keeping track of not only changes in code but changes in data. One of the things we might need to do these days alongside configuring ML, tracking progress, optimising hyperparameters etc.

There is a snake nest of tangled problems in data versioning - data sets can be big, datasets are encoded in weird ways, data might not be local files, data sets might be used by many people simultaneously, data sets are coupled tightly with data cleaning procedures, which are usually code, but also they are not updated as often as code is.

How do you get the modern affordances of source code management for data sets?

Here are some tools which variously solve some of these problems.

“Data science Version control”.

DVC looks hip and solves some problems related to experiment tracking. It versions code with data assets stored in some external data store like S3 or whatever.

DVC runs on top of any Git repository and is compatible with any standard Git server or provider (Github, Gitlab, etc). Data file contents can be shared by network-accessible storage or any supported cloud solution. DVC offers all the advantages of a distributed version control system — lock-free, local branching, and versioning.

The single

`dvc repro`

command reproduces experiments end-to-end. DVC guarantees reproducibility by consistently maintaining a combination of input data, configuration, and the code that was initially used to run an experiment.
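For concreteness, the pipeline that `dvc repro` reproduces is declared in a `dvc.yaml` file; a minimal sketch (script and data file names here are illustrative, not from any real project):

```
stages:
  prepare:
    cmd: python prepare.py data/raw.csv data/clean.csv
    deps:
      - prepare.py
      - data/raw.csv
    outs:
      - data/clean.csv
```

Each stage records its command, its dependencies and its outputs, so DVC can tell which stages are stale when anything upstream changes.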

It resembles git-lfs, which is the classic git method of dealing with Really Big Files, and maybe also git-annex, which is a Big File Handler built on git. However, it puts these at the service of reproducible and easily-distributed experiments. There is a github overlay, dagshub, which specialises in DVC projects.

Perhaps in practice we should also think of DVC as an experiment tracking tool?

Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third party services.

- Track changes to your data
- Revert to previous versions
- Capture full provenance records
- Ensure complete reproducibility
A DataLad dataset is a directory with files, managed by DataLad. You can link other datasets, known as subdatasets, and perform commands recursively across an arbitrarily deep hierarchy of datasets. This helps you to create structure while maintaining advanced provenance capture abilities, versioning, and actionable file retrieval.

DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from. The collaborative power of Git, for your data.

DataLad is integrated with a variety of hosting services and data management platforms, and extended and used by a diverse community. Export datasets to third party services such as GitHub or Figshare with built-in commands. Extend DataLad to be compatible with your preferred data supplier or workflow. Or use a multitude of other DataLad-compatible services such as Dropbox or Amazon S3. Search through all integrations, extensions, and use cases to find the right fit for your data!

I think we could read that as “a friendly python frontend to git-annex”.

It is not targeted specifically at data science people but is much broader.
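A hedged sketch of the basic DataLad loop (dataset name and URL are illustrative, not real):

```
datalad create mydataset        # a new dataset is just a git/git-annex repo
cd mydataset
datalad save -m "add raw data"  # commit the current state, data included
datalad clone https://example.com/somedataset  # consume someone else's dataset
```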

git-annex supports explicit and customisable folder-tree synchronisation, merging and sneakernets, and as such I am well disposed toward it. You can choose to have things in various stores, and to copy files to and from servers or disks as they become available. It doesn’t support iOS. Windows support is experimental. Granularity is per-file. It has a weird symlink-based file access protocol which might be inconvenient for many uses. (I’m imagining this is trouble for Microsoft Word or whatever.)

Also, do you want to invoke various disk-online-disk-offline-how-sync-when options from the command line, or do you want stuff to magically replicate itself across some machines without requiring you to remember the correct incantation on a regular basis?

The documentation is nerdy and unclear, but I think my needs are nerdy and unclear by modern standards. However, the combinatorial explosion of options and excessive hands-on-ness is a serious problem which I will not realistically get around to addressing due to my to-do list already being too long.

GGD comes from the genomics community. It seems to be a system for fetching and processing data based on code recipes. The combination of raw data URL plus recipe plus some caching is what gives you the data set you actually use.

Go Get Data (ggd) is a data management system that provides access to data packages containing auto curated genomic data.

`ggd`

data packages contain all necessary information for data extraction, handling, and processing. With a growing number of scientific datasets, ggd provides access to these datasets without the hassle of finding, downloading, and processing them yourself. `ggd`

leverages the conda package management system and the infrastructure of Bioconda to provide a fast and easy way to retrieve processed annotations and datasets, supporting data provenance, and providing a stable source of reproducibility. Using the `ggd`

data management system allows any user to quickly access all desired datasets, manage that data within an environment, and provides a platform upon which to cite data access and use by way of the ggd data package name and version.

ggd consists of:

- arepository of data recipes hosted on Github
- acommand line interface (cli) to communicate with the ggd ecosystem
- a continually growing list of genomic recipes to provide quick and easy access to processed genomic data using the ggd cli tool

This strikes me as an elegant solution, applicable far beyond genomics. It seems to be a lighter version of pachyderm.

Splitgraph is a data management, building and sharing tool inspired by Docker and Git that works on top of PostgreSQL and integrates seamlessly with anything that uses PostgreSQL.

Splitgraph allows the user to manipulate data images (snapshots of SQL tables at a given point in time) as if they were code repositories by versioning, pushing and pulling them. It brings the best parts of Git and Docker, tools well-known and loved by developers, to data science and data engineering, and allows users to build and manipulate datasets directly on their database using familiar commands and paradigms.

It works on top of PostgreSQL and uses SQL for all versioning and internal operations. You can “check out” data into actual PostgreSQL tables, offering read/write performance and feature parity with PostgreSQL and allowing you to query it with any SQL client. The client application has no idea that it’s talking to a Splitgraph table and you don’t need to rewrite any of your tools to use Splitgraph. Anything that works with PostgreSQL will work with Splitgraph.

Splitgraph also defines the declarative Splitfile language with Dockerfile-like caching semantics that allows you to build Splitgraph data images in a composable, maintainable and reproducible way. When you build data with Splitfiles, you get provenance tracking. You can inspect an image’s metadata to find the exact upstream images, tables and columns that went into it. With one command, Splitgraph can use this provenance data to rebuild an image against a newer version of its upstream dependencies. You can easily integrate Splitgraph into your existing CI pipelines, to keep your data up-to-date and stay on top of changes to its inputs.

You do not need to download the full Splitgraph image to query it. Instead, you can query Splitgraph images with layered querying, which will download only the regions of the table relevant to your query, using bloom filters and other metadata. This is useful when you’re exploring large datasets from your laptop, or when you’re only interested in a subset of data from an image. This is still completely transparent to the client application, which sees a PostgreSQL schema that it can talk to using the Postgres wire protocol.

Splitgraph does not limit your data sources to Postgres databases. It includes first-class support for importing and querying data from other databases using Postgres foreign data wrappers. You can create Splitgraph images or query data in MongoDB, MySQL, CSV files or other Postgres databases using the same interface.

Maybe potentially interesting: pangeo-forge/roadmap: Pangeo Forge public roadmap

Pangeo Forge is inspired to copy the very successful pattern of Conda Forge. Conda Forge makes it easy for anyone to create a conda package, a binary software package that can be installed with the conda package manager. In Conda Forge, a maintainer contributes a recipe which is used to generate a conda package from a source code tarball. Behind the scenes, CI downloads the source code, builds the package, and uploads it to a repository. By automating the difficult parts of package creation, Conda Forge has enabled the open-source community to collaboratively maintain a huge and dynamic library of software packages.

Sno stores geospatial and tabular data in Git, providing version control at the row and cell level.

- Built on Git, works like Git
- Uses standard Git repositories and Git-like CLI commands. If you know Git, you’ll feel right at home with Sno.
- Supports current GIS workflows
- Provides repository working copies as GIS databases and files. Edit directly in common GIS software without plugins.

This is a neat approach if you have a large enough git repository I suppose.

- dolthub/dolt: Dolt – It’s Git for Data
- Dolt Use Cases in the Wild | DoltHub Blog
- Dolt Diff vs. Sqlite Diff | DoltHub Blog

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository. Connect to Dolt just like any MySQL database to run queries or update the data using SQL commands. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate’s changes.

All the commands you know for Git work exactly the same for Dolt. Git versions files, Dolt versions tables. It’s like Git and MySQL had a baby.

We also built DoltHub, a place to share Dolt databases. We host public data for free. If you want to host your own version of DoltHub, we have DoltLab. If you want us to run a Dolt server for you, we have Hosted Dolt.

It has a twin project, dolthub, which is the github of dolts, i.e. data-sharing infrastructure.

Intake TBC

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you’re looking for a way to “productionize” them, Pachyderm can make this easy for you.

- Containerized: Pachyderm is built on Docker and Kubernetes. Whatever languages or libraries your pipeline needs, they can run on Pachyderm which can easily be deployed on any cloud provider or on prem.
- Version Control: Pachyderm version controls your data as it’s processed. You can always ask the system how data has changed, see a diff, and, if something doesn’t look right, revert.
- Provenance (aka data lineage): Pachyderm tracks where data comes from. Pachyderm keeps track of all the code and data that created a result.
- Parallelization: Pachyderm can efficiently schedule massively parallel workloads.
- Incremental Processing: Pachyderm understands how your data has changed and is smart enough to only process the new data.

This is the wrong scale for me, but interesting to see how enterprise might be doing big versions of my little experiments.

How to install the right versions of everything for some python code I am developing?
How to deploy that sustainably? How to share it with others?
There are two problems: installing the right package dependencies and keeping the right dependency versions for *this* project.
In python there are various integrated solutions that solve these two problems at once with varying degrees of success.
None of it is so hard, but it is confusing and chaotic due to many long-running disputes, some of which have lately resolved and many of which will probably stay with us forever.

In the before-times there were many python packaging standards. Distutils and what-not. AFAICT, unless I am migrating extremely old code I should ignore everything about these.

**tl;dr**: only pip and conda support hardware specification in practice.
Users of GPUs must ignore any other options, no matter how attractive all the other options might seem at first glance.

Many packages specify *local versions* for particular architectures as a part of their functionality.
For example, pytorch comes in various flavours, which when using `pip` are selected like this:

```
# CPU flavour
pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
# GPU flavour
pip install torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
```

The local version is given by the `+cpu` or `+cu113` bit, and it changes completely what code will be executed when using these packages.
Specifying a GPU version is essential for many machine learning projects (essential, that is, if I do not want my code to run orders of magnitude slower).
The details of how this can be controlled with regard to the python packaging ecosystem are somewhat contentious and complicated, and thus not supported by any of the new-wave options like `poetry` or `pipenv`. Brian Wilson argues:

During my dive into the open-source abyss that is ML packages and

`+localVersions`

I discovered lots of people have strong opinions about what it should not be and like to tell other people they're wrong. Other people with opinions about what it could be are too afraid of voicing them lest there be some unintended consequence. PSF has asserted what they believe to be the intended state in PEP-440 (no local versions published) but the solution (PEP-459) is not an ML Model friendly solution because the installation providers (pip, pipenv, poetry) don’t have enough standardized hooks into the underlying hardware (cpu vs gpu vs cuda lib stack) to even understand which version to pull, let alone the Herculean effort it would take to get even just pytorch to update their package metadata.

There is no evidence that this logjam will resolve any time soon.
Since I do neural network stuff and thus use GPU/CPU versions of packages, this means that I can effectively ignore most of the python environment alternatives on this page.
The two that work are conda and pip, which support a minimum viable local version package system *de facto*, and if they are less smooth or pleasant than the new systems, at least I am not alone.
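To make the `+local` suffix concrete: PEP 440 calls everything after the `+` the local version segment. A toy stdlib-only sketch of splitting it off (`split_local` is my own illustrative helper, not a real packaging API; real code should use `packaging.version.Version`):

```python
def split_local(version):
    """Split a PEP 440 version string into (public, local) parts.

    Toy illustration only; use packaging.version for real parsing.
    """
    public, sep, local = version.partition("+")
    return public, (local if sep else None)

# The +cu113 / +cpu suffixes from the pytorch example are local versions:
print(split_local("1.10.0+cu113"))  # ('1.10.0', 'cu113')
print(split_local("1.10.0+cpu"))    # ('1.10.0', 'cpu')
print(split_local("1.10.0"))        # ('1.10.0', None)
```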

Least nerdview guide: Vicki Boykis, Alice in Python projectland.

Simplest readable guide is python-packaging.

PyPI Quick and Dirty includes some good tips such as using twine to make it automaticker.

Official docs are no longer awful but are *slightly* stale, and especially perfunctory for compilation. Kenneth Reitz shows rather than tells with a heavily documented setup.py.

Try Zed Shaw’s signature aggressively cynical and reasonably practical explanation of project structure, with bonus explication of how you should expect much time-wasting yak-shaving like this if you want to do software.

- Or copy pyskel.
- Or generate a project structure with cookiecutter.

Updated: What the heck is `pyproject.toml`?

`pip`

The default python package installer. It is best spelled as

`python -m pip install package_name`

To snapshot dependencies:

`python -m pip freeze > requirements.txt`

To restore dependencies:

`python -m pip install -r requirements.txt`
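For the local-version packages discussed above, the `-f`/`--find-links` flag can live inside the requirements file itself, so collaborators get the right flavour. A sketch, reusing the pytorch pins from the earlier example:

```
# requirements.txt
--find-links https://download.pytorch.org/whl/cu113/torch_stable.html
torch==1.10.0+cu113
```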

`venv`

works. It is a good default choice: a widely supported and adequate, if not awesome, workflow.

`pipx`

Pro tip: pipx:

pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages.

That is, pipx is an application that installs global applications for you. (There is a bootstrapping problem: How to install pipx itself.)

pip has a heavy cache overhead.
If disk space is at a premium, I invoke it as `pip --no-cache-dir`.

conda is a parallel system to pip, designed to do all the work of installing scientific python software with hefty compiled dependencies.

There are two parts here with two separate licenses

- the anaconda python distribution
- the conda python package manager.

I am slightly confused about how these two relate (can I install a non-anaconda python distribution through the conda package manager?). The distinction is important, since licensing anaconda can be expensive. See, e.g.

- Anaconda is not free for commercial use (anymore) - alternatives ? : Python
- Conda/Anaconda no longer free to use?
- See also mamba below, which aims to reduce licensing risk by reimplementing the more licensing-vulnerable parts of the anaconda ecosystem

Some things that are (or were?) painful to install by pip are painless via conda. Contrariwise, some things that are painful to install by conda are easy by pip.

I recommend working out which pain points are worse in this complicated decision by trial and error. Sometimes it would be worth the administrative burden of understanding conda’s current licensing and future licensing risks, but if it does not bring substantial value, choose pip.

This is an updated recommendation; previously I preferred conda — pip used to be much worse, and anaconda’s licensing used to be less restrictive. Now I think anaconda cannot be relied upon IP-wise.

Download e.g. Linux x64 Miniconda from the download page.

```
bash Miniconda3-latest-Linux-x86_64.sh
# login/logout here
# or do something like `exec bash -` if you are fancy
# Less aggressive conda
conda config --set auto_activate_base false
# conda for fish users
conda init fish
```

Alternatively, try miniforge, a conda-forge distribution, or fastchan, fast.ai’s conda mini-distribution.

```
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
bash Mambaforge-$(uname)-$(uname -m).sh
```

It is very much worth installing one of these minimalist dists rather than the default anaconda distro, since anaconda default is gigantic but nonetheless does not have what I need, so it simply wastes space. Some of these might additionally have less onerous licensing than the mainline? I am not sure.

If I want to install something with tricky dependencies like ViTables, I do this:

```
conda install pytables=3.2
conda install pyqt=4
```

Aside: I use fish shell, so I need to do some extra setup. Specifically, I add the line

`source (conda info --root)/etc/fish/conf.d/conda.fish`

into `~/.config/fish/config.fish`.
These days this is automated by

`conda init fish`

For jupyter compatibility one needs

`conda install nb_conda_kernels`

The main selling point of conda is that specifying dependencies for *ad hoc* python scripts or packages is easy.

Conda has a slightly different dependency management and packaging workflow than the pip ecosystem.
See, e.g. Tim Hopper’s explanation of this `environment.yml` malarkey, or the creators’ rationale and manual.

One exports the current conda environment config, by convention, into `environment.yml`:

`conda env export > environment.yml`

`conda env create --file environment.yml`
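By convention the file looks something like this (package names illustrative); note the nested `pip:` section for pip-only dependencies:

```
name: myproject
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - pip
  - pip:
      - some-pip-only-package
```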

Which to use out of `conda env create` and `conda create`?
If it involves `.yaml` environment configs, then `conda env create`.
The confusing errors and capability differences between these two are a quagmire of opaque errors, bad documentation and sadness.

One point of friction that I rapidly encountered is that the automatically-created environments are not terribly generic; I might specify from the command line a package that I know will install sanely on any platform (`matplotlib`, say) but the version as stored in the environment file is specific to where I installed it (macos, linux, windows…) and to the architecture (x64, ARM…).
For GPU software there are even more incompatibilities because there are more choices of architecture.
So to share environments with collaborators on different platforms, I need to… *be* them, I guess? Buy them new laptops that match my laptop?
idk this seems weird maybe I’m missing something.
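One partial remedy I am aware of is `conda env export --from-history`, which records only the packages explicitly requested, without platform-specific build pins (though it also omits pip-installed packages, so check the result):

```
conda env export --from-history > environment.yml
```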

**NB** Conda will fill up my hard disk if not regularly disciplined via conda clean.

`conda clean --all --yes`

If I have limited space in my home dir, I might need to move the package cache:

configure `pkgs_dirs` in `~/.condarc`:

`conda config --add pkgs_dirs /SOME/OTHER/PATH/.conda`

Possibly also required?

`chmod a-rwx ~/.conda`

I might also want to not have the gigantic MKL library installed, not being a fan. It comes baked in by default for most anaconda installs. I can usually disable it by request:

`conda create -n pynomkl python nomkl`

Clearly the packagers do not test this configuration so often, because it fails sometimes even for packages which notionally do not need MKL. Worth attempting, however. Between the various versions and installed copies, MKL alone was using about 10GB total on my mac when I last checked. I also try to reduce the number of copies of MKL by starting from miniconda as my base anaconda distribution, cautiously adding things as I need them.

A local environment folder is more isolated: keeping packages in a local folder beats keeping all environments somewhere global, where I need to remember what I named them all.

```
conda config --set env_prompt '({name})'
conda env create --prefix ./env/myenv --file environment_linux.yml
conda activate ./env/myenv
```

Gotcha: in fish shell the first line needs to be

`conda config --set env_prompt '\({name}\)'`

I am not sure why. AFAIK, fish command substitution does not happen inside strings. Either way, this will add the line

`env_prompt: ({name})`

to `.condarc`.

Robocorp’s tools claim to make conda installs more generic.

RCC is a command-line tool that allows you to create, manage, and distribute Python-based self-contained automation packages - or robots 🤖 as we call them.

Together with the robot.yaml configuration file, `rcc` provides the foundation to build and share automation with ease. In short, the RCC toolchain helps you to get rid of the phrase: “Works on my machine” so that you can actually build and run your robots more freely.

Mamba is a fully compatible drop-in replacement for conda. It was started in 2019 by Wolf Vollprecht. It can be installed as a conda package with the command `conda install mamba -c conda-forge`.

The introductory blog post is an enlightening read, which also explains conda better than conda explains itself.

See mamba-org/mamba for more.

It explicitly targets package installation for less mainstream configurations such as R, and vscode development environments.

Provide a convenient way to install developer tools in VSCode workspaces fromconda-forge withmicromamba. Get NodeJS, Go, Rust, Python or JupyterLab installed by running a single command.

It also inherits some of the debilities of conda, e.g. that dependencies are platform- and architecture-specific.

`venv`

venv is now a built-in python virtual environment system in python 3. It doesn’t support python 2, but fixes various problems, e.g. it supports framework python on macOS, which is important for GUIs, and is covered by the python docs in the python virtual environment introduction.

```
# Create venv
python3 -m venv ./venv --prompt some_arbitrary_name
# or if we want to use system packages:
python3 -m venv ./venv --prompt some_arbitrary_name --system-site-packages
# Use venv from fish OR
source ./venv/bin/activate.fish
# Use venv from bash
source ./venv/bin/activate
```

`pyenv`

pyenv is the core tool of an ecosystem which eases and automates switching between python versions. It manages python itself and thus can implicitly be used as a manager for all the other managers. The new new hipness, at least on platforms other than windows, where it does not work.

BUT WHO MANAGES THE VIRTUALENV MANAGER MANAGER? Also, what is going on in this ecosystem of bits? Logan Jones explains:

**pyenv** manages multiple versions of Python itself. **virtualenv/venv** manages virtual environments for a specific Python version. **pyenv-virtualenv** manages virtual environments across varying versions of Python.

Anyway, pyenv compiles a custom version of python and as such is extremely isolated from everything else. Here is an introduction with emphasis on my area: Intro to Pyenv for Machine Learning.

```
#initial pyenv install
pyenv init
# install a specific python version
pyenv install 3.8.13
# ensure we can find that version
pyenv rehash
# switch to that version
pyenv shell 3.8.13
```

Of course, because this is adjacent to the python packaging ecosystem, it immediately becomes complicated and confusing when you try to interact with the rest of the ecosystem, e.g.,

pyenv-virtualenvwrapper is different from `pyenv-virtualenv`, which provides extended commands like `pyenv virtualenv 3.4.1 project_name` to directly help out with managing virtualenvs. `pyenv-virtualenvwrapper` helps in interacting with `virtualenvwrapper`, but `pyenv-virtualenv` provides more convenient commands, where virtualenvs are first-class pyenv versions that can be (de)activated. That’s to say, `pyenv` and `virtualenvwrapper` are still separate while `pyenv-virtualenv` is a nice combination.

Huh. I am already too bored to think. However, I did work out a command which installed a pyenv tensorflow with an isolated virtualenv:

```
brew install pyenv pyenv-virtualenv
pyenv install 3.8.6
pyenv virtualenv 3.8.6 tf2.4
pyenv activate tf2.4
pip install --upgrade pip wheel
pip install 'tensorflow-probability>=0.12' 'tensorflow<2.5' jupyter
```

For fish shell you need to add some special lines to `config.fish`:

```
set -x PYENV_ROOT $HOME/.pyenv
set -x PATH $PYENV_ROOT/bin $PATH
## fish <3.1
# status --is-interactive; and . (pyenv init -|psub)
# status --is-interactive; and . (pyenv virtualenv-init -|psub)
## fish >=3.1
status --is-interactive; and pyenv init - | source
status --is-interactive; and pyenv virtualenv-init - | source
```

No! wait! The new new new hipness is `poetry`.
All the other previous hipnesses were not the real eternal ultimate hipness that transcends time.
I know we said this every previous time, but *this* time it’s real and our love will last forever ONO.

**⛔️⛔️UPDATE⛔️⛔️**:
OK, turns out this love was not actually quite as eternal as it seemed.
Lovely elegant design does not make up for the fact that the project is logjammed and broken in various ongoing ways; see Issue #4595: Governance—or, “what do we do with all these pull requests?”.
It might be usable if your needs are modest or you are prepared to jump into the project discord, which seems to be where the poetry hobbyists organise, but since I want to use this project merely incidentally, as a tool to develop something *else*, a hobbyist level of engagement is not something I can participate in.
poetry is not ready for prime-time.

Note also that poetry is having difficulty staying current with *local versions*, as made famous by CUDA-supporting packages.
There is an example of the kind of antics that make it work below.

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

From the introduction:

Packaging systems and dependency management in Python are rather convoluted and hard to understand for newcomers. Even for seasoned developers it might be cumbersome at times to create all files needed in a Python project: `setup.py`, `requirements.txt`, `setup.cfg`, `MANIFEST.in` and the newly added `Pipfile`. So I wanted a tool that would limit everything to a single configuration file to do: dependency management, packaging and publishing.

It takes inspiration in tools that exist in other languages, like `composer` (PHP) or `cargo` (Rust). And, finally, I started `poetry` to bring another exhaustive dependency resolver to the Python community apart from Conda’s.

What about Pipenv?

In short: I do not like the CLI it provides, or some of the decisions made, and I think we can make a better and more intuitive one.

**Editorial side-note**: Low-key dissing on similarly-dysfunctional, competing projects is an important part of python packaging.

Lazy install is via this terrifying command line (do not run if you do not know what this does):

`curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -`

Poetry could be regarded as a similar thing to `pipenv`, in that it (per default, but not necessarily) manages the dependencies in a local `venv`.
It has a much more full-service approach than systems built on `pip`.
For example, it has its own dependency resolver, which makes use of modern dependency metadata but will also work with previous dependency specifications by brute force if needed.
It separates specified dependencies from the ones that it contingently resolves in practice, which means that the dependencies seem to transport much better than conda, which generally requires you to hand-maintain a special dependency file containing just the stuff you actually wanted.
In practice the many small conveniences and thoughtful workflow are helpful.
For example, it will set up the current package for development *per default*, so that imports work as similarly as possible across this local environment and when it is distributed to users.

```
poetry config virtualenvs.create true
poetry config virtualenvs.in-project true # local venvs are easier for my brain.
```
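Since the selling point is a single configuration file, it may help to see one. A minimal `pyproject.toml` of the kind `poetry init` generates looks something like this (project name, author and versions are hypothetical placeholders):

```toml
[tool.poetry]
name = "my-package"
version = "0.1.0"
description = ""
authors = ["A. Author <author@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
numpy = "^1.23"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```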

However, poetry does not support installing build variants/profiles, which means I cannot install GPU software, so it is useless to me.

As mentioned above, the `poetry` system does not support “local versions” well and thus in practice is onerous to use for machine learning applications.
There are workarounds: Instructions for installing PyTorch shows a representative installation specification for pytorch.

```
[tool.poetry.dependencies]
python = "^3.10"
numpy = "^1.23.2"
torch = { version = "1.12.1", source="torch"}
torchaudio = { version = "0.12.1", source="torch"}
torchvision = { version = "0.13.1", source="torch"}
[[tool.poetry.source]]
name = "torch"
url = "https://download.pytorch.org/whl/cu116"
secondary = true
```

Note that this produces various errors and reportedly downloads gigabytes of supporting files unnecessarily, but apparently works eventually.

`pipenv`

**⛔️⛔️UPDATE⛔️⛔️**:
Note that the `pipenv` system does not support “local versions” and thus in practice cannot be used for machine learning applications.
This project is dead to me.
(Bear in mind that my opinions will become increasingly outdated depending on when you read this.)

`venv` has a higher-level, er, … wrapper (?) interface (?) called pipenv.

Pipenv is a production-ready tool that aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single command.

I switched to pipenv from poetry because it looked like it might be less chaos than poetry. I think it is, although the race is close.

HOWEVER, it is still pretty awful.
TBH, I would just use plain pip and `requirements.txt`, which, while it is primitive and broken, is at least broken and primitive in a well-understood way.

At time of writing the pipenv website was 3 weeks into an outage, because dependency management is a quagmire of sadness and comically broken management with terrible Bus factor. However, the backup docs site is semi-functional, albeit too curt to be useful and AFAICT outdated. The documentation site inside github is readable.

The dependency resolver is, as the poetry devs point out, broken in its own special ways. The procedure to install modern ML frameworks, for example, is gruelling.

Here is an introduction showing pipenv and venv used together.

For my setup, the important configuration settings are:

`export WORKON_HOME=~/.venvs`

To get the venv inside the project (required for sanity in my HPC) I need the following:

`export PIPENV_VENV_IN_PROJECT=1`

Pipenv will automatically load dotenv files, which is a nice touch.
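For orientation, pipenv's single source of truth is the `Pipfile`. A minimal hypothetical example (package names are placeholders):

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.10"
```

Pinned versions resolved from this file land in the machine-generated `Pipfile.lock`.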

Does python’s slapstick ongoing shambles of a failed consensus on dependency management fill you with distrust? Do you have the vague feeling that perhaps you should use something else to manage python, since python cannot manage itself? See generic dependency managers.

An alternative way of constructing graphical models which, loosely speaking, includes directed and undirected graphs by decomposing the model into more nodes and edges. I am currently interested in this because it seems to be pedagogically simpler than either of the other two DAG formalisms, of directed and undirected graphs, at least from some angles — we give up the one-to-one mapping between variables and nodes, but gain a much simpler algebra, especially in the Forney-style graphs.

Wikipedia tells us

A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm.

That is, unlike the other two DAG formalisms, we assign one type of node to variables and a different type of node to the relations between variables.

To discuss: relation to message passing updates for variational inference and expectation propagation, e.g. Cox and De Vries (2018) and Cox, van de Laar, and de Vries (2019).

We start from the Hammersley-Clifford Theorem and work backwards to the most convenient graph to represent it. Once the factor graph is written out (which is a slightly weird process) computations on the graph are essentially the same for every node. By contrast, in a classic I-graph DAG there are many different rules to remember for leaf nodes and branches, for colliders and forks etc.

TBD (Frey et al. 1997; Frey 2003; Kschischang, Frey, and Loeliger 2001). This classic version is explained in lots of places, but for various reasons I needed the FFG version first. I may return to classic factor graphs.

For now, a pretty good introduction is included in Ortiz, Evans and Davison’s tutorial on Gaussian belief propagation.

A tweaked formalism. Citations tell me this was introduced in a recondite article (Forney 2001) which I did not remotely understand because it was way too far into coding theory for me. It is explained and exploited much better for someone of my background in subsequent articles, and used in several computational toolkits (Korl 2005; Cox, van de Laar, and de Vries 2019; H.-A. Loeliger et al. 2007; H.-A. Loeliger 2004; van de Laar et al. 2018; Akbayrak, Bocharov, and de Vries 2021).

Relative to classic factor graphs, H.-A. Loeliger (2004) advocates for FFGs on the grounds of the following advantages:

- suited for hierarchical modeling ("boxes within boxes")
- compatible with standard block diagrams
- simplest formulation of the summary-product message update rule
- natural setting for Forney’s results on Fourier transforms and duality.

Mao and Kschischang (2005) argue:

Forney graphs possess a strikingly elegant duality property: by a local dualization operation, a Forney graph for a linear code may be transformed into another graph, called the dual Forney graph, which represents the dual code

The explanation in Cox, van de Laar, and de Vries (2019) gives the flavour of how graphs in this style work:

A Forney-style factor graph (FFG) offers a graphical representation of a factorized probabilistic model. In an FFG, edges represent variables and nodes specify relations between variables. As a simple example, consider a generative model (joint probability distribution) over variables \(x_{1}, \ldots, x_{5}\) that factors as \[ f\left(x_{1}, \ldots, x_{5}\right)=f_{a}\left(x_{1}\right) f_{b}\left(x_{1}, x_{2}\right) f_{c}\left(x_{2}, x_{3}, x_{4}\right) f_{d}\left(x_{4}, x_{5}\right), \] where \(f_{\bullet}(\cdot)\) denotes a probability density function. This factorized model can be represented graphically as an FFG, as shown in Fig. 1. Note that although an FFG is principally an undirected graph, in the case of generative models we specify a direction for the edges to indicate the “generative direction”. The edge direction simply anchors the direction of messages flowing on the graph (we speak of forward and backward messages that flow with or against the edge direction, respectively). In other words, the edge directionality is purely a notational issue and has no computational consequences. …

The FFG representation of a probabilistic model helps to automate probabilistic inference tasks. As an example, consider we observe \(x_{5}=\hat{x}_{5}\) and are interested in calculating the marginal posterior probability distribution of \(x_{2}\) given this observation.

In the FFG context, observing the realization of a variable leads to the introduction of an extra factor in the model which “clamps” the variable to its observed value. In our example where \(x_{5}\) is observed at value \(\hat{x}_{5}\), we extend the generative model to \(f\left(x_{1}, \ldots, x_{5}\right) \cdot \delta\left(x_{5}-\hat{x}_{5}\right).\) Following the notation introduced in Reller (2013), we denote such “clamping” factors in the FFG by solid black nodes. The FFG of the extended model is illustrated in Fig. 2 …

Computing the marginal posterior distribution of \(x_{2}\) under the observation \(x_{5}=\hat{x}_{5}\) involves integrating the extended model over all variables except \(x_{2},\) and renormalizing: \[ f\left(x_{2} \mid x_{5}=\hat{x}_{5}\right) \propto \int \ldots \int f\left(x_{1}, \ldots, x_{5}\right) \cdot \delta\left(x_{5}-\hat{x}_{5}\right) \mathrm{d} x_{1} \mathrm{~d} x_{3} \mathrm{~d} x_{4} \mathrm{~d} x_{5} \] \[ =\overbrace{\int \underbrace{f_{a}\left(x_{1}\right)}_{1} f_{b}\left(x_{1}, x_{2}\right) \mathrm{d} x_{1}}^{2} \overbrace{\iint f_{c}\left(x_{2}, x_{3}, x_{4}\right) \underbrace{\left(\int f_{d}\left(x_{4}, x_{5}\right) \cdot \delta\left(x_{5}-\hat{x}_{5}\right) \mathrm{d} x_{5}\right)}_{3} \mathrm{~d} x_{3} \mathrm{~d} x_{4}}^{(4)} . \] The nested integrals result from substituting the [original] factorization and rearranging the integrals according to the distributive law. Rearranging large integrals of this type as a product of nested sub-integrals can be automated by exploiting the FFG representation of the corresponding model. The sub-integrals indicated by circled numbers correspond to integrals over parts of the model (indicated by dashed boxes in Fig. 2), and their solutions can be interpreted as messages flowing on the FFG. Therefore, this procedure is known as message passing (or summary propagation). The messages are ordered (“scheduled”) in such a way that there are only backward dependencies, i.e., each message can be calculated from preceding messages in the schedule. Crucially, these schedules can be generated automatically, for example by performing a depth-first search on the FFG.
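That nested-integral bookkeeping can be checked mechanically. Here is a minimal numpy sketch of sum-product message passing for this exact factorization, using discrete variables (so the integrals become sums) and hypothetical random factor tables standing in for \(f_a,\ldots,f_d\); the message-passing result matches brute-force marginalization of the full joint.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # each discrete variable takes K states

# Hypothetical nonnegative factor tables for
# f(x1..x5) = fa(x1) fb(x1,x2) fc(x2,x3,x4) fd(x4,x5)
fa = rng.random(K)
fb = rng.random((K, K))
fc = rng.random((K, K, K))
fd = rng.random((K, K))
x5_hat = 1  # observed state of x5 (the "clamping" factor)

# Message passing: push sums inside products per the distributive law
m1 = fa                              # message from fa along edge x1
m2 = m1 @ fb                         # sum out x1 -> message on edge x2
m3 = fd[:, x5_hat]                   # clamp x5 -> message on edge x4
m4 = np.einsum('jkl,l->j', fc, m3)   # sum out x3 and x4 -> message on x2
posterior = m2 * m4
posterior /= posterior.sum()         # renormalize

# Brute-force check: build the full joint and marginalize directly
joint = np.einsum('a,ab,bcd,de->abcde', fa, fb, fc, fd)
brute = joint[:, :, :, :, x5_hat].sum(axis=(0, 2, 3))
brute /= brute.sum()
assert np.allclose(posterior, brute)
```

The message-passing route never materializes the \(K^5\) joint table, which is the entire point of exploiting the factorization.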

What *local* means here might not be intuitive.
What we mean is that, in this high dimensional integral, only some*dimensions* are involved in each sub-step thanks to the multiplicative factorisation of the overall function.

Extension to systems without a density is left as an exercise for the student.

Question: These introductory texts discuss *sum-product* message passing, which is essentially about solving integrals.
But what if I want to pass some other kind of update, e.g. expectation propagation?
Then we no longer have exactly the same rationale for factorising the form of the integral.
Does this factor graph still help us?

There are some weird things about FFGs. For example, since a variable can appear in only two factors, if you want a variable to appear in more than two, you add extra factors, each of which constrains the variables touching it to be equal.

What do we gain by taking a Fourier transform of a graph? (Kschischang, Frey, and Loeliger 2001; Forney 2001; Mao and Kschischang 2005)

See de Vries and Friston (2017) and van de Laar et al. (2018) for a connection to predictive coding.

How does do-calculus work in factor graphs?

Abbeel, Pieter, Daphne Koller, and Andrew Y. Ng. 2006.“Learning Factor Graphs in Polynomial Time and Sample Complexity.”*Journal of Machine Learning Research* 7 (December): 1743–88.

Akbayrak, Semih, Ivan Bocharov, and Bert de Vries. 2021.“Extended Variational Message Passing for Automated Approximate Bayesian Inference.”*Entropy* 23 (7): 815.

Bickson, Danny. 2009.“Gaussian Belief Propagation: Theory and Application.” PhD.

Cox, Marco, and Bert De Vries. 2018.“Robust Expectation Propagation in Factor Graphs Involving Both Continuous and Binary Variables.” In*2018 26th European Signal Processing Conference (EUSIPCO)*, 2583–87. Rome: IEEE.

Cox, Marco, Thijs van de Laar, and Bert de Vries. 2019.“A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms.”*International Journal of Approximate Reasoning* 104 (January): 185–204.

Dauwels, Justin. 2007.“On Variational Message Passing on Factor Graphs.” In*2007 IEEE International Symposium on Information Theory*, 2546–50. Nice: IEEE.

Dellaert, Frank, and Michael Kaess. 2017.“Factor Graphs for Robot Perception.”*Foundations and Trends® in Robotics* 6 (1-2): 1–139.

El-Kurdi, Yousef Malek. 2014.“Parallel Finite Element Processing Using Gaussian Belief Propagation Inference on Probabilistic Graphical Models.” PhD Thesis, McGill University.

Forney, G.D. 2001.“Codes on Graphs: Normal Realizations.”*IEEE Transactions on Information Theory* 47 (2): 520–48.

Frey, Brendan J. 2003.“Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models.” In*Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence*, 257–64. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Frey, Brendan J., Frank R. Kschischang, Hans-Andrea Loeliger, and Niclas Wiberg. 1997.“Factor Graphs and Algorithms.” In*Proceedings of the Annual Allerton Conference on Communication Control and Computing*, 35:666–80. Citeseer.

Korl, Sascha. 2005.“A Factor Graph Approach to Signal Modelling, System Identification and Filtering.” Application/pdf. Konstanz: ETH Zurich.

Kschischang, F.R., B.J. Frey, and H.-A. Loeliger. 2001.“Factor Graphs and the Sum-Product Algorithm.”*IEEE Transactions on Information Theory* 47 (2): 498–519.

Laar, Thijs van de, Marco Cox, Ismail Senoz, Ivan Bocharov, and Bert de Vries. 2018.“ForneyLab: A Toolbox for Biologically Plausible Free Energy Minimization in Dynamic Neural Models.” In*Conference on Complex Systems*, 3.

Loeliger, H.-A. 2004.“An Introduction to Factor Graphs.”*IEEE Signal Processing Magazine* 21 (1): 28–41.

Loeliger, Hans-Andrea, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R. Kschischang. 2007.“The Factor Graph Approach to Model-Based Signal Processing.”*Proceedings of the IEEE* 95 (6): 1295–1322.

Mao, Yongyi, and F.R. Kschischang. 2005.“On Factor Graphs and the Fourier Transform.”*IEEE Transactions on Information Theory* 51 (5): 1635–49.

Mao, Yongyi, Frank R. Kschischang, and Brendan J. Frey. 2004.“Convolutional Factor Graphs As Probabilistic Models.” In*Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence*, 374–81. UAI ’04. Arlington, Virginia, United States: AUAI Press.

Reller, Christoph. 2013.“State-Space Methods in Statistical Signal Processing: New Ideas and Applications.” Application/pdf. Konstanz: ETH Zurich.

Vries, Bert de, and Karl J. Friston. 2017.“A Factor Graph Description of Deep Temporal Active Inference.”*Frontiers in Computational Neuroscience* 11.

Zhang, Zhen, Fan Wu, and Wee Sun Lee. n.d.“Factor Graph Neural Network,” 11.

\(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\corr}{\operatorname{Corr}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\dist}[1]{\mathcal{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\disteq}{\stackrel{d}{=}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\)

Approximating probability distributions by a Gaussian with the same mode. Thanks to limit theorems this is not always a terrible idea, especially since neural networks seem pretty keen to converge to Gaussians in various senses.

Specifically, we (possibly locally) approximate some posterior density of interest using a Gaussian \[ p(\theta \mid \mathcal{D}) \approx \mathcal{N}\left(\theta_{\mathrm{MAP}}, \Sigma\right). \] The Laplace trick in general uses a local curvature estimate to choose the covariance.
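As a concrete sketch of "curvature chooses the covariance" (a hypothetical 1-D example, not from any source): take a Gamma-shaped posterior, find its mode, and set the Gaussian variance from the local curvature of the negative log density, estimated here by finite differences.

```python
import numpy as np

# Toy target: Gamma(a, b) posterior density (hypothetical example)
a, b = 5.0, 2.0

def neg_log_post(theta):
    # negative log density, up to an additive constant
    return -((a - 1) * np.log(theta) - b * theta)

# Mode of Gamma(a, b) is (a - 1) / b
theta_map = (a - 1) / b

# Curvature H = d^2/dtheta^2 [-log p] at the mode, via central differences;
# the Laplace covariance is Sigma = H^{-1}
eps = 1e-4
H = (neg_log_post(theta_map + eps)
     - 2 * neg_log_post(theta_map)
     + neg_log_post(theta_map - eps)) / eps ** 2
sigma2 = 1.0 / H
# Exact curvature for this target is (a - 1) / theta_map^2,
# so sigma2 should be close to theta_map^2 / (a - 1)
```

In higher dimensions the scalar second derivative becomes the Hessian of the negative log posterior, and the expensive part is inverting (or approximating) it.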

I am particularly keen to see it work for probabilistic neural nets, where it is a classic method (Mackay 1992). There are many variants of the technique for different assumptions. We often see this applied to estimating the posterior predictive density, which is usually compact enough to be tractable, but we can also apply it to input uncertainty and even parameters under certain simplifications (Foong et al. 2019; Immer, Korzepa, and Bauer 2021).

In classic Laplace approximation, we assume that the parameters of our model have a Gaussian distribution (independently, why not) both *a priori* and *a posteriori*.
Specifically, we then attempt to approximate these using the local curvature of the likelihood function at some maximum a posteriori estimate.

⚠️ The next bit needs a do-over. I completely lost track of the prior and the hyperparameters.

Writing that in symbols, the basic idea is that we hold \(x \in \mathbb{R}^{n}\) fixed and use the Jacobian matrix \(J(x):=\left.\nabla_{\theta} f(x ; \theta)\right|_{\theta_{\mathrm{MAP}}} \in \mathbb{R}^{d \times n}\) to linearize the network as \[ f(x ; \theta) \approx f\left(x ; \theta_{\mathrm{MAP}}\right)+J(x)^{\top}\left(\theta-\theta_{\mathrm{MAP}}\right) \] where the approximation is justified using a Taylor expansion and some smoothness assumptions on \(f\). Under this approximation, since \(\theta\) is by assumption Gaussian, \(\theta \sim \mathcal{N}\left(\theta_{\mathrm{MAP}}, \Sigma\right)\), it follows that the marginal distribution over the network output \(f(x)\) is also Gaussian, given by \[ f(x) \mid x, \mathcal{D} \sim \mathcal{N}\left(f\left(x ; \theta_{\mathrm{MAP}}\right), J(x)^{\top} \Sigma J(x)\right). \] For more on this, see e.g. (Bishop 2006, 5.167, 5.188). It can be essentially a gratis Laplace approximation in the sense that if I have fit the networks I can already calculate those Jacobians, so I am probably 1 line of code away from getting some kind of uncertainty estimate. However, I have no particular guarantees that it is well calibrated — is it meaningfully estimating the “true” uncertainty in the model, subject to all the uncertainties with this simplified model structure?
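To make the "one line of code away" claim concrete, here is a toy sketch (model, MAP estimate and covariance are all hypothetical): a scalar model, a finite-difference Jacobian in the parameters, and the resulting linearized predictive variance \(J^\top \Sigma J\).

```python
import numpy as np

# Toy scalar-output "network" f(x; theta) with 3 parameters.
def f(x, theta):
    return theta[0] * np.tanh(theta[1] * x + theta[2])

theta_map = np.array([1.5, 0.8, -0.2])   # pretend MAP estimate
Sigma = 0.05 * np.eye(3)                  # pretend posterior covariance

def jacobian(x, theta, eps=1e-6):
    """Finite-difference Jacobian of f w.r.t. theta at fixed input x."""
    J = np.zeros_like(theta)
    for i in range(len(theta)):
        dt = np.zeros_like(theta)
        dt[i] = eps
        J[i] = (f(x, theta + dt) - f(x, theta - dt)) / (2 * eps)
    return J

# Linearized (Laplace) predictive: mean f(x; theta_MAP), variance J^T Sigma J
x = 0.7
J = jacobian(x, theta_map)
pred_mean = f(x, theta_map)
pred_var = J @ Sigma @ J
```

With autodiff the Jacobian really is free at a fitted solution; the open question the text raises, calibration of `pred_var`, is not answered by this construction.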

a.k.a. Neural Linear models.

AFAICT this case is the simplest. We are concerned with the density of the predictive, so we start with a neural network. Then we treat the neural network as a feature generator in all the layers up to the last one, and treat the last layer probabilistically, as an adaptive-basis regression or classification problem, to get a decent learnable predictive uncertainty. I think this was implicit in Mackay (1992), but it was named in Snoek et al. (2015), and critiqued and extended in Lorsung (2021).

For a simple practical example, see the Probflow tutorial.

Under a last-layer Laplace approximation, we write the joint model as \(\vrv{y}= \vrv{r}^{\top}\Phi(\vrv{u})\) so the joint distribution is \[\begin{align*} \left.\left[\begin{array}{c} \vrv{y} \\ \vrv{r} \end{array}\right]\right|\vrv{u} &\sim\dist{N}\left( \left[\begin{array}{c} \vv{m}_{\vrv{y}}\\ \vv{m}_{\vrv{r}} \end{array}\right], \left[\begin{array}{cc} \mm{K}_{\vrv{y}\vrv{y}} & \mm{K}_{\vrv{y}\vrv{r}}^{\top} \\ \mm{K}_{\vrv{y}\vrv{r}} & \mm{K}_{\vrv{r}\vrv{r}} \end{array}\right] \right) \end{align*}\] with \[\begin{align*} \vv{m}_{\vrv{y}} &=\vv{m}_{\vrv{r}}^{\top}\Phi(\vrv{u}) \\ \mm{K}_{\vrv{y}\vrv{r}} &=\Phi(\vrv{u}) \mm{K}_{\vrv{r}\vrv{r}}\\ \mm{K}_{\vrv{y}\vrv{y}} &= \Phi(\vrv{u})\mm{K}_{\vrv{r}\vrv{r}} \Phi^{\top} (\vrv{u})+ \sigma^2\mm{I}. \end{align*}\] Here \(\vrv{r}\sim \dist{N}\left(\vv{m}_{\vrv{r}}, \mm{K}_{\vrv{r}\vrv{r}}\right)\) is the random weighting, and \(\Phi(\vrv{u})\) is called the feature map.
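In code, the last-layer treatment is just conjugate Bayesian linear regression on the features \(\Phi(u)\). A minimal numpy sketch, where the feature map, data and noise level are hypothetical stand-ins for a trained network body:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed feature map (stands in for the trained network body)
def phi(u):
    return np.stack([np.ones_like(u), np.tanh(u), np.tanh(2 * u)], axis=-1)

# Toy training data
u_train = rng.uniform(-2, 2, 40)
y_train = np.sin(u_train) + 0.1 * rng.standard_normal(40)
Phi = phi(u_train)                      # shape (40, 3)

sigma2 = 0.1 ** 2                       # observation noise variance
prior_cov = np.eye(3)                   # prior covariance K_rr on weights r

# Conjugate Gaussian update for the last-layer weights r
post_prec = np.linalg.inv(prior_cov) + Phi.T @ Phi / sigma2
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ Phi.T @ y_train / sigma2

# Predictive at test inputs: mean Phi m_r, cov Phi K_rr Phi^T + sigma^2 I
u_test = np.linspace(-2, 2, 5)
Phi_t = phi(u_test)
pred_mean = Phi_t @ post_mean
pred_cov = Phi_t @ post_cov @ Phi_t.T + sigma2 * np.eye(len(u_test))
```

Everything expensive happens in the (here 3-dimensional) weight space of the last layer, which is why this is the cheap end of the Bayesian-neural-net spectrum.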

**NB** ⚠️⛔️☣️: This section is *vague half-arsed notes* of *dubious accuracy*.

Agustinus Kristiadi and team have created various methods for low-overhead neural uncertainty quantification via Laplace approximation that have greater flexibility for adaptively choosing the type and manner of approximation. See, e.g. Painless Uncertainty for Deep Learning and their papers (Kristiadi, Hein, and Hennig 2020, 2021).

Kristiadi, Hein, and Hennig (2021) generalise this to learnable uncertainty, to, for example, allow the distribution to reflect uncertainty about datapoints drawn far from the training distribution.
They define an augmented *Learnable Uncertainty Laplace Approximation* (LULA) network \(\tilde{f}\) with additional parameters, \(\tilde{\theta}=(\theta_{\mathrm{MAP}}, \hat{\theta})\).

Let \(f: \mathbb{R}^{n} \times \mathbb{R}^{d} \rightarrow \mathbb{R}^{k}\) be an \(L\)-layer neural network with MAP-trained parameters \(\theta_{\text {MAP }}\) and let \(\widetilde{f}: \mathbb{R}^{n} \times \mathbb{R}^{\widetilde{d}} \rightarrow \mathbb{R}^{k}\) along with \(\widetilde{\theta}_{\text {MAP }}\) be obtained by adding LULA units. Let \(q(\widetilde{\theta}):=\mathcal{N}\left(\tilde{\theta}_{\mathrm{MAP}}, \widetilde{\Sigma}\right)\) be the Laplace-approximated posterior and \(p\left(y \mid x, \mathcal{D} ; \widetilde{\theta}_{\mathrm{MAP}}\right)\) be the (approximate) predictive distribution under the LA. Furthermore, let us denote the dataset sampled i.i.d. from the data distribution as \(\mathcal{D}_{\text {in }}\) and that from some outlier distribution as \(\mathcal{D}_{\text {out }}\), and let \(H\) be the entropy functional. We construct the following loss function to induce high uncertainty on outliers while maintaining high confidence over the data (inliers): \[ \begin{array}{rl} \mathcal{L}_{\text {LULA }}\left(\widetilde{\theta}_{\text {MAP }}\right)&:=\frac{1}{\left|\mathcal{D}_{\text {in }}\right|} \sum_{x_{\text {in }} \in \mathcal{D}_{\text {in }}} H\left[p\left(y \mid x_{\text {in }}, \mathcal{D} ; \widetilde{\theta}_{\text {MAP }}\right)\right] \\ &-\frac{1}{\left|\mathcal{D}_{\text {out }}\right|} \sum_{x_{\text {out }} \in \mathcal{D}_{\text {out }}} H\left[p\left(y \mid x_{\text {out }}, \mathcal{D} ; \widetilde{\theta}_{\text {MAP }}\right)\right] \end{array} \] and minimize it w.r.t. the free parameters \(\widehat{\theta}\).

I am assuming that by the *entropy functional* they mean the entropy of the normal distribution, \[
H(\mathcal{N}(\mu, \boldsymbol{\Sigma})) = {\frac {1}{2}}\ln \left((2\pi \mathrm {e} )^{k}\det \left({\boldsymbol {\Sigma }}\right)\right),
\]
but this looks expensive due to that determinant calculation on a (large) \(d\times d\) matrix.
Or possibly they mean some general entropy with respect to some density \(p\), \[H(p)=\mathbb{E}_{p}\left[-\log p( x)\right],\]
which I suppose one could estimate as \[H(p)\approx\frac{1}{N}\sum_{i=1}^N \left[-\log p(x_i)\right]\] without taking that normal Laplace approximation at this step, if we could find the density, and assuming the \(x_i\) were drawn from it.
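If they do mean the Gaussian entropy, the determinant itself need not be a numerical hazard: working on the log scale via `slogdet` (a minimal sketch, not from their paper) avoids overflow even for large \(d\), though the underlying \(O(d^3)\) factorization cost remains.

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Entropy of N(mu, Sigma) computed via slogdet to avoid det() overflow."""
    k = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)
```

For a 1-D covariance \(\sigma^2\) this reduces to the familiar \(\tfrac12\ln(2\pi\mathrm{e}\sigma^2)\).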

The result is a slightly weird hybrid fitting procedure that requires two loss functions and which feels a little*ad hoc*, but maybe it works?

A Quasi-Bayesian extension of a gradient descent trick called Stochastic Weight Averaging (Izmailov et al. 2018, 2020; Maddox et al. 2019; Wilson and Izmailov 2020). AFAICT it precludes using a prior? And it may not actually be a Laplace approximation per se?

This is a different objective, no longer centred at a MAP estimate. See Variational Gaussian Approximation.

**NB** ⚠️⛔️☣️: This section is *vague half-arsed notes* of *dubious accuracy*.

We can estimate marginal likelihood with respect to hyperparameters by Laplace approximation, apparently. See Immer et al. (2021).

Where the Laplace approximations are Gaussian Processes over some index space. TBC. See Piterbarg and Fatalov (1995); Wacker (2017); Alexanderian et al. (2016); Alexanderian (2021) and possibly Solin and Särkkä (2020) and possibly the INLA stuff.

Integrated nested Laplace approximation leverages the GP-as-SDE idea to generalise Matérn-type covariances to interesting domains and non-stationarity.

I think the *Kronecker-factored Approximate Curvature* (K-FAC) is a famous one.
There are others (Flaxman et al. 2015; Martens and Grosse 2015; Ritter, Botev, and Barber 2018).

A toolkit combining all the NN Laplace tricks is introduced in Agustinus Kristiadi’s Modern Arts of Laplace Approximations, which is an excellent start.

Alexanderian, Alen. 2021.“Optimal Experimental Design for Infinite-Dimensional Bayesian Inverse Problems Governed by PDEs: A Review.”*arXiv:2005.12998 [Math]*, January.

Alexanderian, Alen, Noemi Petra, Georg Stadler, and Omar Ghattas. 2016.“A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-Dimensional Bayesian Nonlinear Inverse Problems.”*SIAM Journal on Scientific Computing* 38 (1): A243–72.

Arras, Kai Oliver. 1998.“An Introduction To Error Propagation: Derivation, Meaning and Examples of Equation CY = FX CX FXT,” 22.

Bishop, Christopher M. 2006.*Pattern Recognition and Machine Learning*. Information Science and Statistics. New York: Springer.

Breslow, N. E., and D. G. Clayton. 1993.“Approximate Inference in Generalized Linear Mixed Models.”*Journal of the American Statistical Association* 88 (421): 9–25.

Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021.“Laplace Redux – Effortless Bayesian Deep Learning.” In*arXiv:2106.14806 [Cs, Stat]*.

Flaxman, Seth, Andrew Gordon Wilson, Daniel B Neill, Hannes Nickisch, and Alexander J Smola. 2015.“Fast Kronecker Inference in Gaussian Processes with Non-Gaussian Likelihoods.” In, 10.

Foong, Andrew Y. K., Yingzhen Li, José Miguel Hernández-Lobato, and Richard E. Turner. 2019.“‘In-Between’ Uncertainty in Bayesian Neural Networks.”*arXiv:1906.11537 [Cs, Stat]*, June.

Gorad, Ajinkya, Zheng Zhao, and Simo Särkkä. 2020.“Parameter Estimation in Non-Linear State-Space Models by Automatic Differentiation of Non-Linear Kalman Filters.” In, 6.

Huggins, Jonathan H., Trevor Campbell, Mikołaj Kasprzak, and Tamara Broderick. 2018.“Practical Bounds on the Error of Bayesian Posterior Approximations: A Nonasymptotic Approach.”*arXiv:1809.09505 [Cs, Math, Stat]*, September.

Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and Mohammad Emtiyaz Khan. 2021.“Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning.”*arXiv:2104.04975 [Cs, Stat]*, June.

Immer, Alexander, Maciej Korzepa, and Matthias Bauer. 2021.“Improving Predictions of Bayesian Neural Nets via Local Linearization.” In*International Conference on Artificial Intelligence and Statistics*, 703–11. PMLR.

Ingebrigtsen, Rikke, Finn Lindgren, and Ingelin Steinsland. 2014.“Spatial Models with Explanatory Variables in the Dependence Structure.”*Spatial Statistics*, Spatial Statistics Miami, 8 (May): 20–38.

Izmailov, Pavel, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2020.“Subspace Inference for Bayesian Deep Learning.” In*Proceedings of The 35th Uncertainty in Artificial Intelligence Conference*, 1169–79. PMLR.

Izmailov, Pavel, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018.“Averaging Weights Leads to Wider Optima and Better Generalization,” March.

Khan, Mohammad Emtiyaz, Alexander Immer, Ehsan Abedi, and Maciej Korzepa. 2020.“Approximate Inference Turns Deep Networks into Gaussian Processes.”*arXiv:1906.01930 [Cs, Stat]*, July.

Kristiadi, Agustinus, Matthias Hein, and Philipp Hennig. 2020.“Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks.” In*ICML 2020*.

———. 2021.“Learnable Uncertainty Under Laplace Approximations.” In*Uncertainty in Artificial Intelligence*.

Lindgren, Finn, and Håvard Rue. 2015.“Bayesian Spatial Modelling with R-INLA.”*Journal of Statistical Software* 63 (i19): 1–25.

Long, Quan, Marco Scavino, Raúl Tempone, and Suojin Wang. 2013.“Fast Estimation of Expected Information Gains for Bayesian Experimental Designs Based on Laplace Approximations.”*Computer Methods in Applied Mechanics and Engineering* 259 (June): 24–39.

Lorsung, Cooper. 2021.“Understanding Uncertainty in Bayesian Deep Learning.” arXiv.

MacKay, David J C. 2002.*Information Theory, Inference & Learning Algorithms*. Cambridge University Press.

Mackay, David J. C. 1992.“A Practical Bayesian Framework for Backpropagation Networks.”*Neural Computation* 4 (3): 448–72.

Maddox, Wesley, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, and Andrew Gordon Wilson. 2019.“A Simple Baseline for Bayesian Uncertainty in Deep Learning,” February.

Margossian, Charles C., Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020.“Hamiltonian Monte Carlo Using an Adjoint-Differentiated Laplace Approximation: Bayesian Inference for Latent Gaussian Models and Beyond.”*arXiv:2004.12550 [Stat]*, October.


A disruptive-soundin’ branding of “empirical charity”. Largely utilitarian, with the strengths and weaknesses of utilitarianism.

Bjørn Lomborg, for example, gives an eloquent justification of the need to account for opportunity costs (a dollar spent saving rich people from cancer is a dollar not spent saving poor people from malaria) but then makes IMO abysmal recommendations for optimality. Don’t get me started on his assessment of tail risk. We need a better Bjørn Lomborg. What is the opportunity cost of keeping this Bjørn Lomborg? The ROI on investing in better Bjørn Lomborgs?

Aaaaanyway, we have loads of better Bjørn Lomborgs. This page mostly exists to bookmark resources they have littered around the internet.

I bookmark 80000hours, a career-advice site, because I thought their considered analytic posturing was sweet if awkward, and it reminds me of over-earnest boyfriends, e.g.

Which would you choose from these two options?

- Prevent one person from suffering next year.
- Prevent 100 people from suffering (the same amount) 100 years from now.
Most people choose the second option. It’s a crude example, but it suggests that they value future generations.

If people didn’t want to leave a legacy to future generations, it would be hard to understand why we invest so much in science, create art, and preserve the wilderness.

We’d certainly choose the second option. […]

First, future generations matter, but they can’t vote, they can’t buy things, and they can’t stand up for their interests. This means our system neglects them; just look at what is happening with issues like climate change.

Second, their plight is abstract. We’re reminded of issues like global poverty and factory farming far more often. But we can’t so easily visualise suffering that will happen in the future. Future generations rely on our goodwill, and even that is hard to muster.

Third, there will probably be many more people alive in the future than there are today.

You might be entertained to discover that their top recommendations for problems to tackle are

- Risks from artificial intelligence
- Promoting effective altruism
- Global priorities research

Like many of the rationalist community projects, though, there is some interesting new DIY ethics in there. See also lesswrong etc., trolley problems.

The famous early cases attached a high value to certainty that any given donation would have measurable effect. GiveWell is the poster child for this idea.

Here’s a question: Does anyone have a better version of this next quote? I want one which looks at the net costs of different modes of distribution instead of lumping everything into “capitalism” vs “other”.

Excerpted from Does marginalism in economics of effective altruism lead to self-defeating behaviour?

The core problem is the bourgeois moral philosophy that the movement rests upon. Effective Altruists abstract from—and thereby exonerate—the social dynamics constitutive of capitalism. […] capital’s commodification of necessities directly undermines the self-sufficiency of entire populations by determining how resources are allocated. […]

In the meantime, capital extracts around $2 trillion annually from “developing countries” through things like illicit financial flows, tax evasion, debt service, and trade policies advantageous to the global capitalist class. […]

These dynamics, which spring from capital’s insistence on the commodification of necessities, are what turn billions of people into drowning strangers and generate a need for ever-multiplying charitable organizations in the first place.

There is some cross-talk there (reducing extractive corruption is indeed something that effective altruists would argue is a major concern of EA programs).
But there is still a critique here; marginalist incrementalism can indeed lead to Molochian equilibria.
Apparently this is called the *institutional critique*; I should track down a reference for that.

I am skeptical of incrementalist behaviour in EA myself, and in particular the preference for underwhelming changes with low-risk but unspectacular impact (subsidising mosquito net distribution) over high-risk, structurally revolutionary changes (taking land from the elites and giving it to peasants).
There is an implicit risk aversion there, about which I have Opinions that I should muster. **tl;dr**: risk-aversion is baked into this model.
Decent decision theory would presumably allow us to favour riskier bets whilst still telling us that some charities are inadmissible by any criteria, i.e. that they are with high probability a waste of money regardless of risk appetite.
More recent EA thought has addressed this; see below.

Also, I do like the fundamental EA insight that opportunity costs are important in charity; I would like to keep that around.

Portfolio-theory meets effective altruism. Take on a high-risk, high-return portfolio of bets. There is a lot to say here, but I am personally more sympathetic to this idea than I am to the (IMO inherently conservative) notion of trying to achieve low risk in donations at the cost of high return.

Anyway, see Open Philanthropy on Hits-based Giving.

TBD. Hits-based giving is about evaluating risky charities by estimated expected portfolio return. I hope to return here and talk about evaluating charities based on other criteria, such as tail returns, which I would like to argue is substantively different.
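To make the expected-return logic concrete, here is a toy calculation. All probabilities and impact numbers are invented for illustration; the point is only that a hits-based bet can dominate a safe bet in expectation even though it usually fails.

```python
# Toy comparison of a "safe" versus a "hits-based" charity portfolio.
# All numbers are made up for illustration.

def expected_return(outcomes):
    """Expected units of good per dollar, from (probability, impact) pairs."""
    return sum(p * impact for p, impact in outcomes)

# Safe bet: near-certain, modest impact per dollar.
safe = [(0.95, 1.0), (0.05, 0.0)]

# Hits-based bet: usually fails, occasionally transformative.
risky = [(0.97, 0.0), (0.03, 50.0)]

print(expected_return(safe))   # 0.95
print(expected_return(risky))  # ≈ 1.5: higher in expectation, despite mostly failing
```

A risk-averse donor picks the first; an expected-value maximiser with a large portfolio picks the second. Tail-return criteria would need a different calculation again.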

- The Blog Prize
- The Esoteric Social Movement Behind This Cycle’s Most Expensive House Race
- Alice Crary · Against ‘Effective Altruism’ (2021)
- Mushtaq Khan on using institutional economics to predict effective government reforms - 80,000 Hours
- Effective Altruism, Continued: On Measuring Impact | Boston Review
- Some blindspots in rationality and effective altruism

Crary, Alice. 2021.“Against‘Effective Altruism’.”*Radical Philosophy*, no. 210: 33–43.

In which I collect tips from esteemed and eminent minds about how to go about pro-actively discovering stuff. More meta-tips than specific agendas for discovery.

Related: Progress studies, citizen science.

Classic:

- Curiosity

According to SLIME MOLD TIME MOLD

- Stupidity
- Arrogance
- Laziness
- Carefreeness
- Beauty
- Rebellion
- Humor

Endemic Pathogens Are Making You Crazy And Then Killing You: Toxoplasmosis Spotlight

In science, the process is just as important as the discoveries. Improving our scientific processes will speed up our rate of discovery. Feyerabend claims contemporary research has over-indexed on processes such as the scientific method, and this rigidity has restrained innovation. The crux of his book *Against Method* (Feyerabend and Hacking 2010) is that paradigm shifts in science stem from epistemological anarchism. Epistemology refers to the formulation of beliefs. This anarchy, to any Thomas Kuhn fans, is what is necessary to achieve Kuhn’s Stage 4 phase of science, the ‘evolutionary phase’ in which new paradigms are created. In recent decades we have placed too much importance on science being *consistent*, while forgetting that paradigm shifts often come from those who refute mainstream assumptions. In other words, the geniuses who generated scientific paradigm shifts were anarchists to their contemporaries.

My understanding about which parts of research are hard and which things are valuable has changed.

Let us suppose that I was worried about being “scooped,” beaten to publication, which is a thing that people in science tend to worry about a little bit but not so much as I imagined before I joined in for myself.
How valuable is an insight?
It depends.
An insight like “this might be worth trying who knows” is not valueless, but the opportunity cost of trying it is possibly months of specialist time or years of grad student time.
Let us call such things *intuitions*.
Even if the certainty is high, the exercise cost is also high, so it is not worth keeping this kind of thing quiet;
rather, you want to share this kind of thing as widely as possible so the work and risk can be shared between multiple researchers.
Having good intuitions is useful, but if anything you want to cultivate the habit of giving them away.

If my idea is “I can *definitely* solve this particular problem in a particular way,” the value of this intelligence is higher.
Let us call these*solutions*.
Often, simply *knowing a solution exists* in a particular domain can lead someone else rapidly to a solution, because it constrains the search for solutions enough to allow other people to find it quickly.
This is because most of mathematics, at least for me, is trying bone-headed things that turn out to be silly, and merely narrowing the search space can make things drastically easier.
So if you were in a competitive, idea-stealing environment, the existence of solutions tends to be valuable intelligence.
It is not the whole way to a paper which will get you credit for being clever (and I personally value a paper which has more than one great idea in it) but a good solution is an important chunk.

On the other hand, academia can often operate like a cooperative endeavour, or even a competitively cooperative potlatch, and giving away valuable things is a way to signal status and attract collaborators and funding.

Overcoming Bias : The InDirect-Check Sweet Spot

Tony Kulesa, in Tyler Cowen is the best curator of talent in the world, gives a fannish profile of Tyler Cowen that has some interesting ideas about how you would become expert at identifying underexploited talent. Part of that process, interestingly, is *not* having a committee process, and being shamelessly personal-judgment-driven. Insert speculation about wisdom of crowds versus madness of committees. Also, equity concerns.

As a researcher motivated by big picture ideas (How can we survive on planet earth without being consumed in battles over dwindling resources and environmental crises?) as much as aesthetic ideas, I am sometimes considered damaged goods. A fine scientist, many claim, is safely myopic, the job of discovery being detailed piecework.

On one hand, the various research initiatives that pay my way*are* tied to
various real world goals (“Predict financial crises!”, “Tell us the future of
the climate!”). On the other, researchers involved tell me that it is useless
to try and solve these large issues wholesale, but that one must identify
small retail questions that one can hope to make progress on. On the other
Then again, they’ve just agreed to take a lot of money to solve big problems. In
this school, then, the presumed logic is that one takes a large research grant
to strike a light on lots of small problems that lie in the penumbra of the
large issue, in the hope that one flares up to illuminate the shade. Or
burns the lot to the ground. The example given by the Oxonian scholar who most
recently expounded this to me was Paul David and the path dependence of the
QWERTY keyboard. Deep issue of the contingency of the world, seen through the
tiny window opened by substandard keyboard design.

Truth, in these formulations, is a cat: don’t look at it directly or it will perversely slope off to rub against someone else’s leg. Your acceptance is all in the sidling-up, the feigning disinterest, and waiting for truth to come up and show you its belly. I’m not sure I am persuaded by this. It’s the kind of science that would be expounded in an educational film directed by Alejandro Jodorowsky.

On the other hand, I’m not sure that I buy the grant-maker’s side of this story either, at least the story that grant-makers seem to expound in Australia, which is that they give out money to go out and find something out. There are productivity outcomes on the application form where you fill out the goals that your research will fulfill; this rules out much of the research done, by restricting you largely to marginally refining a known-good idea rather than trying something new. I romantically imagine that in much research, you would not know what you were discovering in advance.

The compromise is that we meet in the middle and swap platitudes. We will “improve our understanding of X”, we will “find strategies to better manage Y”. We certainly don’t mention that we might spend a while pondering keyboard layouts when the folks ask us to work out how to manage a complex non-linear economy.

Are fields plagued by hyperselection? Can we find fields that are ripe for disruption by outsiders with radical out-of-the-box ideas? Do we need left-field eccentrics to roam about asking if the emperor has clothes?

Is it just stirring the pot? How many, to choose an example, physicists, can get published by ignoring everyone else’s advances?

How do you know that your left field idea is a radically simple left-field idea that causes the entire field to advance? And how do you know that it is not the crazed ramblings of someone missing the advances of the last several decades, an asylum inmate wandering out of the walled disciplinary asylum in a dressing gown, railing against the Vietnam War?

Edward Kmett — Learning to Learn seems popular.

Alexey Guzey quotes an anonymous twitter user:

The bare truth of science:

- Nobody believes a computational model except the person who built it.
- Everybody believes an experimental measurement except the person who made it.

— … June 28, 2018

Alon, Uri. 2009.“How to Choose a Good Scientific Problem.”*Molecular Cell* 35 (6): 726–28.

Arbesman, Samuel, and Nicholas A Christakis. 2011.“Eurekometrics: Analyzing the Nature of Discovery.”*PLoS Comput Biol* 7 (6): –1002072.

Azoulay, Pierre, Christian Fons-Rosen, and Joshua S. Graff Zivin. 2015.“Does Science Advance One Funeral at a Time?” Working Paper 21788. National Bureau of Economic Research.

Devezer, Berna, Luis G. Nardin, Bert Baumgaertner, and Erkan Ozge Buzbas. 2019.“Scientific Discovery in a Model-Centric Framework: Reproducibility, Innovation, and Epistemic Diversity.”*PLOS ONE* 14 (5): e0216125.

Duchin, Moon. 2004.“The Sexual Politics of Genius,” 34.

Dyba, Tore, Barbara A. Kitchenham, and Magne Jorgensen. 2005.“Evidence-Based Software Engineering for Practitioners.”*IEEE Software*, 2005.

Feyerabend, Paul, and Ian Hacking. 2010.*Against Method*. Fourth Edition. London ; New York: Verso.

Newman, M. E. J. 2009.“The First-Mover Advantage in Scientific Publication.”*EPL (Europhysics Letters)* 86 (6): 68001.

Nissen, Silas B., Tali Magidson, Kevin Gross, and Carl T. Bergstrom. 2016.“Publication Bias and the Canonization of False Facts.”*arXiv:1609.00494 [Physics, Stat]*, September.

Rekdal, Ole Bjørn. 2014.“Academic Urban Legends.”*Social Studies of Science* 44 (4): 638–54.

Schwartz, Martin A. 2008.“The Importance of Stupidity in Scientific Research.”*Journal of Cell Science* 121 (11): 1771.

Thagard, Paul. 1993.“Societies of Minds: Science as Distributed Computing.”*Studies in History and Philosophy of Modern Physics* 24: 49.

———. 1997.“Collaborative Knowledge.”*Noûs* 31 (2): 242–61.

———. 2005.“How to Be a Successful Scientist.”*Scientific and Technological Thinking*, 159–71.

Weng, L, A Flammini, A Vespignani, F Menczer, L Weng, A Flammini, A Vespignani, and F Menczer. 2012.“Competition Among Memes in a World with Limited Attention.”*Scientific Reports* 2.

A terminal famous for thoughtful UI. macOS only.
🏗 It has many features:

- `tmux` integration.
- Little utilities that do useful things like `scp` files from a remote host to your local folder.
- A Python API.

I cannot help but feel superstitiously that a python API for a terminal is begging for security holes.
This feels like adding a web server to your pacemaker.
Anyway, it is simple and easy and works.

If you are using other systems than macos, read on.

If you are worried that your current terminal doesn’t use enough RAM, you can use hyper, which is a JavaScript app version of a terminal. It’s not too bad for one of these web-technology desktop apps based on Electron or similar, although it is not lightweight. It has lots of sexy features and nice graphics, to compensate for the obviously hefty RAM usage.

Weird quirk 1: it does not support dragging files into the terminal, which pretty much every alternative does. qweasd1’s hyper-drop-file extension enables that.

`hyper install hyper-drop-file`

Weird quirk 2: anything which looks remotely like a URL in the terminal becomes a link which the terminal will aggressively open if you click on it or even drag over it, which is annoying and slightly dangerous. Apparently this behaviour has become configurable now, and I can put `webLinksActivationKey: ctrl` in my config file to only do it on Ctrl-Click.

VS Code has a built-in terminal. It is pretty good, just so long as you do not need to full-screen it, and is ubiquitous and therefore useful.

Windows consoles that also speak linux terminal: Cmder | Console Emulator seems to be a passable default. Per default, it is a friendlified/packaged version of ConEmu. Neat features: POSIX terminals.

Cmder can also work best with `bash` on Windows. Bash is the default shell found on *NIX systems such as Linux and macOS.

It doesn’t even need ConEmu, apparently:

For the best experience, we suggest that you integrate your favorite IDEs with Cmder -- it makes your editor terminal work as productively as the Cmder shell itself.

simple terminal aims to have fewer lines of code than anything else, and only as many features as are useful. It still does lots of stuff, and is tiny.

Alacritty (Source) is a GPU-accelerated terminal emulator that aims to draw text real fast, and be otherwise minimalistic. ~~That is not my main problem with terminals, so I have not used it.~~
I recant. A fast simple terminal that is not quite so hardline about the simplicity as st is useful.
It is written in Rust, which gets it some kind of hip points.
There are fewer features to break than hyper has.
POSIX/Windows.

The project’s architecture and features are guided by a set of values:

- Correctness: Alacritty should be able to properly render modern terminal applications like `tmux` and `vim`. Glyphs should be rendered properly, and the proper glyphs should be displayed.
- Performance: Alacritty should be the fastest terminal emulator available anywhere.
- Appearance: Alacritty should have beautiful font rendering and look fantastic on all supported platforms.
- Simplicity: Alacritty should be conservative about which features it offers. As we’ve learned from past terminal emulators, it’s far too easy to become bloated. `st` taught us that it doesn’t need to be that way. Features like GUI-based configuration, tabs and scrollback are unnecessary. The latter features are better provided by a terminal multiplexer like `tmux`.
- Portability: Alacritty should support major operating systems including Linux, macOS, and Windows.

Warp, “the blazingly fast, Rust-based terminal”, has presumably even more Rust hipness points than Alacritty. Currently in closed beta. It has lots of hype-compatible features such as shared/networked connections, encrypted stuff, multiple cursors and inline manuals.

Is old and messy; let it go.

Tilix is a terminal emulator thatGnome people tend to like.
It has consistent keyboard shortcuts, tiles (but tiles terminals only) and integrates into the Gnome Experience.
The tiles do not spark joy for me; if I wanted to tile things I would tile more than*only* terminals.

Kovid Goyal made a terminal with C inner loops and python UI extensibility called kitty. It’s not famous, but probably worth checking out, since Kovid is a powerhouse of feature-packed development. If anything, I should probably be suspicious that it may have too many features, because that would be on-brand for Kovid. macOS/Unices.

terminator seems to be an acceptable default option for a pure native GNOME app without many frills.

terminus supports some HTML graphics, and appears to work-ish.

TermKit, a designerly, graphics-friendly terminal, aims to reinvent terminal protocols! It has a vision statement! However, it’s dead in the water.

Some cool features

- Smart token-based input with inline autocomplete and automatic escaping
- Rich output for common tasks and formats, using MIME types + sniffing
- Asynchronous views for background / parallel tasks
- Full separation between front/back-end
TermKit is not a…

- …Web application. It runs as a regular desktop app.
- …Scripting language like PowerShell or bash. It focuses on executing commands only.
- …Full terminal emulator. It does not aim to e.g. host ‘vim’.
- …Reimplementation of the Unix toolchain. It replaces and/or enhances built-in commands and wraps external tools.

Stochastic optimization uses noisy (possibly approximate) first-order gradient information to find the argument which minimises

\[ x^*=\operatorname{arg\,min}_{x} f(x) \]

for an objective function \(f:\mathbb{R}^n\to\mathbb{R}\).

That this works with little fuss in very high dimensions is a major pillar of deep learning.
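A minimal sketch of the recipe on a toy linear regression. All data and constants here are my own invention for illustration; the essential move is that each update uses the gradient of the loss at a single randomly chosen example, which is a noisy but unbiased estimate of the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression: y = X @ w_true + noise.
n, d = 1000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Plain SGD on the squared loss, one random example per step.
w = np.zeros(d)
lr = 0.01
for step in range(5_000):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]   # noisy, unbiased gradient estimate
    w -= lr * grad

print(np.linalg.norm(w - w_true))  # small: the noisy iterates home in on w_true
```

Note that each step touches one row of `X`; the full dataset is never needed at once, which is the entire point at scale.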

The original version, framed as root finding, is (Herbert Robbins and Monro 1951); the analysis was later generalised in (H. Robbins and Siegmund 1971), using martingale arguments to analyze convergence. There is some historical context in (Lai 2003). That article was written before the current craze for SGD in deep learning; after 2013 or so the problem is rather that there is so much information on the method that the challenge becomes sifting the AI hype from the useful.

I recommend Francis Bach’s Sum of geometric series trick as an introduction to showing advanced things about SGD using elementary tools.

Francesco Orabona on how to prove SGD converges:

to balance the universe of first-order methods, I decided to show how to easily prove the convergence of the iterates in SGD, even in unbounded domains.

Gradient flows are a continuous-limit SDE version of SGD (Ljung, Pflug, and Walk 1992; Mandt, Hoffman, and Blei 2017). Many super nice things are easy to prove using these bad boys, especially SGMCMC things. Worth the price of dusting off the old stochastic calculus.
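A toy one-dimensional illustration of the SDE view (my own invented numbers, not from the cited papers): SGD with additive gradient noise on a quadratic is exactly an Euler–Maruyama discretisation of an Ornstein–Uhlenbeck process, so its stationary distribution can be read off with textbook stochastic calculus.

```python
import numpy as np

rng = np.random.default_rng(1)

# f(w) = 0.5 * a * w**2, so the exact gradient is a * w.
a, eta, sigma = 2.0, 0.05, 0.3

w = 1.0
for _ in range(2_000):
    noisy_grad = a * w + sigma * rng.normal()  # stochastic gradient
    # The SGD step w -= eta * noisy_grad is an Euler-Maruyama step, dt = eta,
    # of the SDE  dw = -a w dt + sqrt(eta) * sigma dB,
    # whose stationary law is N(0, eta * sigma**2 / (2 * a)).
    w -= eta * noisy_grad

print(w)  # hovers near 0; stationary std dev is about 0.034 here
```

The stationary variance formula immediately explains, e.g., why shrinking the step size shrinks the noise ball around the optimum.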

🏗

Zeyuan Allen-Zhu: Faster Than SGD 1: Variance Reduction:

SGD is well-known for large-scale optimization. In my mind, there are two (and only two) fundamental improvements since the original introduction of SGD: (1) variance reduction, and (2) acceleration. In this post I’d love to conduct a survey regarding (1),
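The variance-reduction idea can be sketched in a few lines. Below is a toy SVRG-style loop on noiseless least squares (my own illustrative implementation, not taken from the linked post): each noisy gradient is recentred by its value at a periodic snapshot plus the exact full gradient there, which keeps it unbiased while its variance shrinks as the iterates converge.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noiseless least squares; per-example gradient is (x_i @ w - y_i) * x_i.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_i(w, i):
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
lr = 0.01
for epoch in range(50):
    snapshot = w.copy()
    full_grad = X.T @ (X @ snapshot - y) / n  # exact gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Control-variate gradient: unbiased, and its variance vanishes
        # as both w and the snapshot approach the optimum.
        g = grad_i(w, i) - grad_i(snapshot, i) + full_grad
        w -= lr * g

print(np.linalg.norm(w - w_true))  # essentially zero on this noiseless problem
```

Plain SGD with a fixed step size would stall at a noise floor; the control variate lets a constant step size converge geometrically here.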

Zhiyuan Li and Sanjeev Arora argue:

You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with a learning rate that increases exponentially during training. It was meant to be a dramatic illustration that what we learned in optimization classes and books isn’t always a good fit for modern deep learning, specifically, normalized nets, which is our term for nets that use any one of the popular normalization schemes, e.g. BatchNorm (BN), GroupNorm (GN), WeightNorm (WN). Today’s post (based upon our paper with Kaifeng Lyu at NeurIPS20) identifies other surprising incompatibilities between normalized nets and traditional analyses.

See SGMCMC.

…

Yellowfin, an automatic SGD momentum tuner

Mini-batch and stochastic methods are for minimising loss when you have a lot of data, or a lot of parameters, and using it all at once is silly, or when you want to iteratively improve your solution as data comes in, and you have access to a gradient for your loss, ideally automatically calculated. It’s not clear at all that this should work better than collating all your data and optimising offline, except that much of modern machine learning shows that it does.

Sometimes this apparently stupid trick might even be fast for small-dimensional cases, so you may as well try.

Technically, “online” optimisation in bandit/RL problems might imply that you have to “minimise regret online”, which has a slightly different meaning: it involves seeing each training example only as it arrives along some notional arrow of time, yet wishing to make the “best” decision at the next time, and possibly choosing your next experiment in order to trade off exploration versus exploitation etc.

In SGD you can see your data as often as you want and in whatever order, but you only look at a bit at a time. Usually the data is given and predictions make no difference to what information is available to you.

Some of the same technology pops up in each of these notions of online optimisation, but I am mostly thinking about SGD here.
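A minimal mini-batch sketch (synthetic data, invented constants): we revisit the data as often as we like and in a fresh order each epoch, but each update only ever touches a small batch, and the predictions make no difference to what data arrives next.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic linear regression, pretend it is too big to use all at once.
n, d, batch = 1000, 5, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.05
for epoch in range(20):
    order = rng.permutation(n)  # reshuffle: any order of the data is fine
    for start in range(0, n, batch):
        idx = order[start:start + batch]
        resid = X[idx] @ w - y[idx]
        w -= lr * (X[idx].T @ resid) / len(idx)  # averaged mini-batch gradient

print(np.linalg.norm(w - w_true))  # close to zero, up to the noise floor
```

Averaging over the batch reduces gradient variance relative to single-example SGD, which is why a larger step size is safe here.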

There are many more permutations and variations used in practice.

Ahn, Sungjin, Anoop Korattikara, and Max Welling. 2012.“Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In*Proceedings of the 29th International Coference on International Conference on Machine Learning*, 1771–78. ICML’12. Madison, WI, USA: Omnipress.

Alexos, Antonios, Alex J. Boyd, and Stephan Mandt. 2022.“Structured Stochastic Gradient MCMC.” In*Proceedings of the 39th International Conference on Machine Learning*, 414–34. PMLR.

Bach, Francis R., and Eric Moulines. 2013.“Non-Strongly-Convex Smooth Stochastic Approximation with Convergence Rate O(1/n).” In*arXiv:1306.2119 [Cs, Math, Stat]*, 773–81.

Bach, Francis, and Eric Moulines. 2011.“Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning.” In*Advances in Neural Information Processing Systems (NIPS)*, –. Spain.

Benaïm, Michel. 1999.“Dynamics of Stochastic Approximation Algorithms.” In*Séminaire de Probabilités de Strasbourg*, 33:1–68. Lecture Notes in Math. Berlin: Springer, Berlin.

Bensoussan, Alain, Yiqun Li, Dinh Phan Cao Nguyen, Minh-Binh Tran, Sheung Chi Phillip Yam, and Xiang Zhou. 2020.“Machine Learning and Control Theory.”*arXiv:2006.05604 [Cs, Math, Stat]*, June.

Botev, Zdravko I., and Chris J. Lloyd. 2015.“Importance Accelerated Robbins-Monro Recursion with Applications to Parametric Confidence Limits.”*Electronic Journal of Statistics* 9 (2): 2058–75.

Bottou, Léon. 1991.“Stochastic Gradient Learning in Neural Networks.” In*Proceedings of Neuro-Nîmes 91*. Nimes, France: EC2.

———. 1998.“Online Algorithms and Stochastic Approximations.” In*Online Learning and Neural Networks*, edited by David Saad, 17:142. Cambridge, UK: Cambridge University Press.

———. 2010.“Large-Scale Machine Learning with Stochastic Gradient Descent.” In*Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010)*, 177–86. Paris, France: Springer.

Bottou, Léon, and Olivier Bousquet. 2008.“The Tradeoffs of Large Scale Learning.” In*Advances in Neural Information Processing Systems*, edited by J.C. Platt, D. Koller, Y. Singer, and S. Roweis, 20:161–68. NIPS Foundation (http://books.nips.cc).

Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016.“Optimization Methods for Large-Scale Machine Learning.”*arXiv:1606.04838 [Cs, Math, Stat]*, June.

Bottou, Léon, and Yann LeCun. 2004.“Large Scale Online Learning.” In*Advances in Neural Information Processing Systems 16*, edited by Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf. Cambridge, MA: MIT Press.

Bubeck, Sébastien. 2015.*Convex Optimization: Algorithms and Complexity*. Vol. 8. Foundations and Trends in Machine Learning. Now Publishers.

Cevher, Volkan, Stephen Becker, and Mark Schmidt. 2014.“Convex Optimization for Big Data.”*IEEE Signal Processing Magazine* 31 (5): 32–43.

Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014.“Stochastic Gradient Hamiltonian Monte Carlo.” In*Proceedings of the 31st International Conference on Machine Learning*, 1683–91. Beijing, China: PMLR.

Chen, Xiaojun. 2012.“Smoothing Methods for Nonsmooth, Nonconvex Minimization.”*Mathematical Programming* 134 (1): 71–99.

Chen, Zaiwei, Shancong Mou, and Siva Theja Maguluri. 2021.“Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization.” arXiv.

Di Giovanni, Francesco, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, and Michael M. Bronstein. 2022.“Graph Neural Networks as Gradient Flows.” arXiv.

Domingos, Pedro. 2020.“Every Model Learned by Gradient Descent Is Approximately a Kernel Machine.”*arXiv:2012.00152 [Cs, Stat]*, November.

Duchi, John, Elad Hazan, and Yoram Singer. 2011.“Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.”*Journal of Machine Learning Research* 12 (Jul): 2121–59.

Friedlander, Michael P., and Mark Schmidt. 2012.“Hybrid Deterministic-Stochastic Methods for Data Fitting.”*SIAM Journal on Scientific Computing* 34 (3): A1380–1405.

Ghadimi, Saeed, and Guanghui Lan. 2013a.“Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.”*SIAM Journal on Optimization* 23 (4): 2341–68.

———. 2013b.“Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming.”*arXiv:1310.3787 [Math]*, October.

Goh, Gabriel. 2017.“Why Momentum Really Works.”*Distill* 2 (4): e6.

Hazan, Elad, Kfir Levy, and Shai Shalev-Shwartz. 2015.“Beyond Convexity: Stochastic Quasi-Convex Optimization.” In*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 1594–1602. Curran Associates, Inc.

Heyde, C. C. 1974.“On Martingale Limit Theory and Strong Convergence Results for Stochastic Approximation Procedures.”*Stochastic Processes and Their Applications* 2 (4): 359–70.

- Learning git
- Handy git config
- Handy git commands
- What git calls things
- Filters
- Commit hooks
- Subtrees/submodules/subprojects/subdirs/subterranean mole people
- Deleting all tags
- Helpers
- git-branchless
- git-undo
- Importing some files across a branch
- Garbage collecting
- Editing history
- Making it work with a broken-permission FS
- Detecting if there are changes to commit
- Emergency commit
- Git hosting
- git email workflows
- Content-specific diffing
- SSH credentials
- Jupyter
- Decent GUIs
- Which repo am I in?
- Data versioning

My own git notes, not intended to be a tutorial; there are better learning resources than this. Some are noted here, in fact. See also the more universally acclaimed classic git tips.

See the fastai masterclass for many more helpful tips/links/scripts/recommendations. Learn Git Branching explains the mechanics in a friendly fashion. Steve Bennett’s 10 things I hate about Git is also useful.

Mark Dominus’ master class is also recommended.

I often do my editing in VS Code, so it is convenient to set it as the git editor:

`git config --global core.editor "code-insiders --wait" # insiders`

To globally ignore macOS `.DS_Store` files:

```
echo .DS_Store >> $HOME/.gitignore_global
git config --global core.excludesfile $HOME/.gitignore_global
```

During a merge, `git checkout --theirs filename` (or `--ours`) will check out their (or our) version of the file.
The following sweet hack will resolve *all* conflicted files accordingly:

`grep -lr '<<<<<<<' . | xargs git checkout --theirs --`

Git can also list conflicted files natively, without grep: `git diff --name-only --diff-filter=U`.

Searching history for a string is easy, except for the abstruse naming: the feature is called “pickaxe” and spelled `-S`.

`git log -Sword`

Kashyap Kondamudi advises using the `--follow` option in `git log` to view a file’s history across renames:

`git log --oneline --find-renames --stat --follow -- src/somefile.ts`

To clone only a single branch:

`git clone --single-branch --branch <branchname> <remote-repo>`

To untrack a file without deleting it:

`git rm --cached blah.tmp`

To delete a remote branch:

`git push <remote_name> --delete <branch_name>`

`git push origin HEAD:refs/heads/backdoor`

This is almost obvious, except that git’s naming of things seems arbitrary.
Why `refs/heads/SOMETHING`? Well…

By which I mean that which is formally referred to as *git references*. git references is the canonical description of the mechanics. **tl;dr** the most common names are `refs/heads/SOMETHING` for branch `SOMETHING`, `refs/tags/SOMETHING` for tags, and `remotes/SOMEREMOTE/SOMETHING` for the (last known state of) a remote branch.

As alexwlchan explains, these references are friendly names for commits.
The uses are (at least partly) convention, and other references can be used too.
For example `gerrit` uses `refs/for/` for code review purposes.
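These names can be inspected directly. A throwaway demonstration in a scratch repository (the identities and tag name are illustrative):

```shell
# make a scratch repo so the commands have something to show
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git tag v0.1

git show-ref           # full names: refs/heads/<branch>, refs/tags/v0.1
git symbolic-ref HEAD  # prints refs/heads/<current branch>
```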

Filters are commands applied to your files on the way into and out of the repository.
Keywords: `smudge`, `clean`, `.gitattributes`.
These are a long story, but not so complicated in practice.
A useful one is stripping crap from jupyter notebooks.
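One way to set that up is nbstripout, which registers itself as a `clean` filter for notebooks. A sketch, assuming pip is available and you are inside a git checkout (the repo path is illustrative):

```shell
pip install nbstripout
cd /path/to/your/repo                # illustrative path; any git checkout
nbstripout --install                 # writes the filter config and .gitattributes
git config filter.nbstripout.clean   # inspect what got registered
```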

For doing stuff before you put it in cold storage. For me this means, e.g., asking DID YOU REALLY WANT TO INCLUDE THAT GIANT FILE?

Here is a commit hook that does exactly that. I made a slightly modernized version:

`curl -L https://gist.github.com/danmackinlay/6e4a0e5c38a43972a0de2938e6ddadba/raw/install.sh | bash`

After that installation you can retrofit the hook to an existing repository thusly:

`cp -R ~/.git_template/hooks .git/`

There are various frameworks for managing hooks, if you have lots.
For example,pre-commit is a mini-system for managing git hooks, based on python.Husky is a`node.js`

-based one.

I am not sure whether hook management system actually saves time overall for a solo developer, since the kind of person who remembers to install a pre-commit hook is also the kind of person who is relatively less likely to need one. Also it is remarkably labour-intensive to install the dependencies for all these systems, so if you are using heterogeneous systems this becomes tedious.
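For what it is worth, pre-commit’s built-in `check-added-large-files` hook answers the DID-YOU-REALLY-WANT-THAT-GIANT-FILE question declaratively. A minimal `.pre-commit-config.yaml` sketch (the `rev` and size threshold are illustrative; pin `rev` to whatever is current):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-added-large-files
        args: ["--maxkb=500"]   # reject files larger than ~500 kB
```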

To skip the pre-commit hook,

`git commit --no-verify`

Sub-projects inside other projects? External projects? The simplest way of integrating external projects is as subtrees. Once this is set up you can mostly ignore them. Alternatively there are submodules, which have various complications. More recently, there is the subtrac system, which I have not yet used.

Including external projects as separate repositories within a repository is possible, but I won’t document it, since it’s well documented elsewhere and I use it less often.
NB some discipline is required to make it go; you need to remember to `git submodule init` etc.

Have not yet tried.

`subtrac` is a helper tool that makes it easier to keep track of your git submodule contents. It collects the entire contents of the entire history of all your submodules (recursively) into a separate git branch, which can be pushed, pulled, forked, and merged however you want.

Subtree subsumes one git tree into another in a usually-transparent way (no separate checkout as with submodules). It can be used for temporary merging or for splicing and dicing projects.

Creatin’:

```
git fetch remote branch
git subtree add --prefix=subdir remote branch --squash
```

Updatin’:

```
git fetch remote branch
git subtree pull --prefix=subdir remote branch --squash
git subtree push --prefix=subdir remote branch
```

Con: rebasin’ with a subtree in your repo is slow and involved.

Use `subtree split` to prise out one chunk. It has some wrinkles but is fast and easy.

```
pushd superproject
git subtree split -P project_subdir -b project_branch
popd
mkdir project
pushd project
git init
git pull ../superproject project_branch
```

Alternatively, to comprehensively rewrite history to exclude everything outside a subdir:

```
# from the directory containing superproject
git clone superproject subproject
pushd subproject
git filter-branch \
  --subdirectory-filter project_subdir \
  --prune-empty -- \
  --all
```

This works for GitHub at least; I think anything running `git-svn` would work:

- replace `tree/master` => `trunk` in the repository URL
- `svn co` the new URL

`svn co https://github.com/buckyroberts/Source-Code-from-Tutorials/trunk/Python`

arxanas/git-branchless: Branchless workflow for Git

The branchless workflow is designed for use in a repository with a single main branch that all commits are rebased onto. It improves developer velocity by encouraging fast and frequent commits, and helps developers operate on these commits fearlessly.

In the branchless workflow, the commits you’re working on are inferred based on your activity, so you no longer need branches to keep track of them. Nonetheless, branches are sometimes convenient, and `git-branchless` fully supports them. If you prefer, you can continue to use your normal workflow and benefit from features like `git sl` or `git undo` without going entirely branchless.

Also: git undo: We can do better:

How is it so easy to “lose” your data in a system that’s supposed to never lose your data?

Well, it’s not that it’s too easy to lose your data — but rather, that it’s too difficult to recover it. For each operation you want to recover from, there’s a different “magic” incantation to undo it. All the data is still there in principle, but it’s not accessible to many in practice.

…To address this problem, I offer `git undo`.

`gerrit`

Gerrit is a code review system for git.

`legit`

`legit` simplifies feature branch workflows.

`rerere`

Not repeating yourself during merges/rebases? git rerere automates this:

```
git config --global rerere.enabled true
git config --global rerere.autoupdate true
```

To import some files from another branch:

`git checkout my_branch -- my_file/`

In brief, this will purge a lot of stuff from a constipated repo in emergencies:

`git reflog expire --expire=now --all && git gc --prune=now`

To strip large files from history, `bfg` does that:

```
git clone --mirror git://example.com/some-big-repo.git
cd some-big-repo.git
git repack
bfg --strip-blobs-bigger-than 10M .
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push -f
```

I think `bfg` also does this. There is also native support:

```
git filter-branch -f \
  --index-filter \
  'git rm -r --cached --ignore-unmatch unwanted_files'
```

e.g. you are editing a git repo on NTFS via Linux and things are silly.

`git config core.filemode false`

```
if output=$(git status --porcelain) && [ -z "$output" ]; then
# Working directory clean
else
# Uncommitted changes
fi
```

*Oh crap I’m leaving the office in a hurry and I just need to get my work into
git ASAP for continuing on another computer.
I don’t care about sensible commit messages because I am on my own private
branch and no-one else will see them when I squash the pull request.*

I put this little script in a file called `gitbang` to automate this case.

```
#!/usr/bin/env bash
# I’m leaving the office. Capture all changes in my private branch and push to server.
if output=$(git status --porcelain) && [ -z "$output" ]; then
echo "nothing to commit"
else
git add --all && git commit -m bang
fi
git pull && git submodule update --init --recursive && git push
```

- Codeberg is a gitea host (see What is Codeberg? | Codeberg Documentation)
- Gitlab
- github

Or maybe avoid the need for hosting by using…

Tools such as git-latexdiff provide custom diffing for, in this case, `LaTeX` code.
These need to be found on a case-by-case basis.

Managing SSH credentials in git is non-obvious. See SSH.

For sanity in git+jupyter, see `jupyter`.

See Git GUIs.

For the `fish` and `bash` shells, see bash-git-prompt.

See data versioning.

Estimating the thing that is given to you by oracles in statistics homework assignments: the covariance matrix, or its inverse, the precision matrix. Or, if your data is indexed in some fashion, the covariance kernel. We are especially interested in this in Gaussian processes, where the covariance kernel characterises the process up to its mean.

I am not introducing a complete theory of covariance estimation here, merely mentioning a couple of tidbits for future reference.

Two big-data problems can arise here: large \(p\) (ambient dimension) and large \(n\) (sample size). Large \(p\) is a problem because the covariance matrix is a \(p \times p\) matrix, and frequently we need to invert it to calculate some target estimand.

Often life can be made not too bad for large \(n\) with Gaussian structure because, essentially, it has a nice exponential family structure and hence has sufficient statistics.
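Concretely: the running sums \(\sum_i x_i\) and \(\sum_i x_i x_i^\top\) are sufficient, so the covariance can be accumulated in a single pass over arbitrarily many samples. A numpy sketch with made-up streaming data:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
s = np.zeros(p)         # running sum of samples
S = np.zeros((p, p))    # running sum of outer products
n = 0
for _ in range(200):    # a stream of minibatches we never need to store
    X = rng.standard_normal((50, p))
    s += X.sum(axis=0)
    S += X.T @ X
    n += len(X)

mean = s / n
cov = S / n - np.outer(mean, mean)   # (biased) MLE covariance estimate
```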

Estimate the covariance matrix, then invert it. This is the baseline. 🏗
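A minimal numpy sketch of that baseline on synthetic data. For large \(p\) relative to \(n\) the inverse is ill-conditioned or singular, which is where the fancier regularized estimators come in:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.standard_normal((n, p))       # n samples, p dimensions

Sigma_hat = np.cov(X, rowvar=False)   # empirical covariance (rows = observations)
Theta_hat = np.linalg.inv(Sigma_hat)  # baseline precision-matrix estimate
```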

🏗 Wishart priors?

Aragam, Bryon, Jiaying Gu, and Qing Zhou. 2017. “Learning Large-Scale Bayesian Networks with the Sparsebn Package.” *arXiv:1703.04025 [Cs, Stat]*, March.

Avagyan, Vahe, and Xiaoling Mei. 2022. “Precision Matrix Estimation Under Data Contamination with an Application to Minimum Variance Portfolio Selection.” *Communications in Statistics - Simulation and Computation* 51 (4): 1381–1400.

Chen, Xiaohui, Mengyu Xu, and Wei Biao Wu. 2013. “Covariance and Precision Matrix Estimation for High-Dimensional Time Series.” *The Annals of Statistics* 41 (6).

Fan, Jianqing, Yuan Liao, and Han Liu. 2016. “An Overview of the Estimation of Large Covariance and Precision Matrices.” *The Econometrics Journal* 19 (1): C1–32.

Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep D. Ravikumar. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” *Journal of Machine Learning Research* 15 (1): 2911–47.

Hsieh, Cho-Jui, Mátyás A. Sustik, Inderjit S. Dhillon, Pradeep Ravikumar, and Russell A. Poldrack. 2013. “BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables.” In *Advances in Neural Information Processing Systems*, 16. NIPS’13. Red Hook, NY, USA: Curran Associates Inc.

Janková, Jana, and Sara van de Geer. 2015. “Honest Confidence Regions and Optimality in High-Dimensional Precision Matrix Estimation.” *arXiv:1507.02061 [Math, Stat]*, July.

Khoshgnauz, Ehsan. 2012. “Learning Markov Network Structure Using Brownian Distance Covariance.” *arXiv:1206.6361 [Cs, Stat]*, June.

Kuismin, Markku O., and Mikko J. Sillanpää. 2017. “Estimation of Covariance and Precision Matrix, Network Structure, and a View Toward Systems Biology.” *WIREs Computational Statistics* 9 (6): e1415.

Lam, Clifford, and Jianqing Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” *Annals of Statistics* 37 (6B): 4254–78.

Le, Thien-Minh, and Ping-Shou Zhong. 2021. “High-Dimensional Precision Matrix Estimation with a Known Graphical Structure.” arXiv.

Meier, Alexander, Claudia Kirch, and Renate Meyer. 2020. “Bayesian Nonparametric Analysis of Multivariate Time Series: A Matrix Gamma Process Approach.” *Journal of Multivariate Analysis* 175 (January): 104560.

Moscone, Francesco, Elisa Tosetti, and Veronica Vinciotti. 2017. “Sparse Estimation of Huge Networks with a Block-Wise Structure.” *The Econometrics Journal* 20 (3): S61–85.

Pourahmadi, Mohsen. 2011. “Covariance Estimation: The GLM and Regularization Perspectives.” *Statistical Science* 26 (3): 369–87.

Ubaru, Shashanka, Jie Chen, and Yousef Saad. 2017. “Fast Estimation of \(\mathrm{tr}(f(A))\) via Stochastic Lanczos Quadrature.” *SIAM Journal on Matrix Analysis and Applications* 38 (4): 1075–99.

Wang, Lingxiao, Xiang Ren, and Quanquan Gu. n.d. “Precision Matrix Estimation in High Dimensional Gaussian Graphical Models with Faster Rates,” 9.

Wu, Wei Biao, and Mohsen Pourahmadi. 2003. “Nonparametric Estimation of Large Covariance Matrices of Longitudinal Data.” *Biometrika* 90 (4): 831–44.

Yuan, Ming. 2010. “High Dimensional Inverse Covariance Matrix Estimation via Linear Programming.” *The Journal of Machine Learning Research* 11: 26.

Zhang, T., and H. Zou. 2014. “Sparse Precision Matrix Estimation via Lasso Penalized D-Trace Loss.” *Biometrika* 101 (1): 103–20.

## Comment servers less suspect than Disqus

Hyvor Talk is a Disqus alternative. The following two are self-hosted, but also offer a paid non-self-hosted version: