Probably actually reading/writing

2020-03-05 — 2025-07-11

Wherein a running catalogue of active reading and writing is presented, with Bayes‑meets‑neural‑nets and AI‑safety drafts, ecology of agency notes, and a miscellany of music, tooling and conference links being maintained.

Stuff I’m currently reading or otherwise working on. If you’re looking at this and you’re not me, maybe we should reconsider our hobbies.

1 Refactoring

I need to reclassify the bio computing links; that section’s become confusing and there are too many good ideas there that aren’t clearly distinguished.

2 Currently writing

Not all of it is published yet.

Bayes-meets-neural-nets
1. Italian school Predictive Bayes
2. Singular Learning Theory
3. Continual learning
AI Safety
1. metrics that come apart from their goals
2. domestication of humans
3. Causal agency
4. learning with theory of mind
Ecology of agency
1. empowerment
2. utility as a local linearization of fitness
3. Bayesian epistemics
4. opponent shaping
5. big history
6. Intelligence in big history
7. human collective agency
8. coalition games
9. generic collective agency
10. multi-scale agency
11. commitment
Foundation models and their world models
1. Causal/Bayesian inference in foundation models
Community building
1. Collective care
2. Social calendaring
3. Psychological resilience
4. Nationalism
So you’ve just joined a union
When is computation “statistical”? I mean this in the sense that we know the dynamics of a population of solutions even when we can’t carry out the individual computations. I’m not sure of the scope (maybe I’m reinventing computational mechanics), so let’s use some examples to flesh it out:
1. Trading equities. We can’t know every trade, but we can price options well under no-arbitrage assumptions, even though traders’ calculations can be far more complex than ours. No-arbitrage assumptions aren’t strictly true, but the returns to spending extra compute on finding arbitrage opportunities seem to diminish, so in aggregate prices stay close to the no-arbitrage prediction.
2. Scaling laws. We can’t know the exact computations an LLM will do, but we can predict its performance remarkably well from its budget of data, parameters and training compute.
3. Algorithmic statistics and pseudorandomness study the statistical behaviour of certain classes of algorithms, whose outputs become nearly indistinguishable from randomness in precise technical senses.
4. …
Reality gap
Is academic literary studies actually distinct from the security discipline of studying side-channel attacks?
Goodhart coordination
Structural problems are hard — let’s do training programs
Is residual prediction different from adversarial prediction?
Science communication for ML
Human superorganisms
1. Moral orbits
2. Revisit probability collectives
3. Movement design
4. Returns on hierarchy
5. Effective collectivism
6. Alignment
7. Emancipating my tribe: the cruelty of collectivism (and why I love it anyway)
8. Institutions for angels
9. Institutional alignment
10. Beliefs and rituals of tribes
11. Where to deploy taboo
12. The Great Society will never feel great; it’ll merely be better than the alternatives
13. Player versus game
14. ~~Something about the fungibility of hipness and cash~~
15. Monastic traditions
Nationalism
Approximate conditioning
Nested sampling
What even are GFlownets?
Public sphere business models
How to do house stuff (renovation etc)
Power and inscrutability
Strategic ignorance
What is an energy-based model? — tl;dr: a brand for models that handle likelihoods via a potential function that isn’t normalised as a density.
Funny-shaped learning
1. Causal attention
2. ~~Graphical ML~~
3. Gradient message passing
4. All inference is already variational inference
Human learner series
1. Which self?
2. Is language symbolic?
3. Our moral wetware
4. Is “is” “ought”?
5. Morality under uncertainty and computational constraint
6. Superstimuli
7. Clickbait bandits
8. Correlation construction
9. Moral explainability
  1. Burkean conservatism is about identifying when moral training data is out-of-distribution.
  2. Something about universal grammar and its learnable local approximations, versus universal ethics and its learnable local approximations. Morality by template; the computational difficulty of moral identification. Leading by example of necessity.
10. Righting and wronging
11. Akrasia in stochastic processes: What time-integrated happiness should we optimize?
12. ~~Comfort traps~~ ✅ Good enough for now
13. ~~Myths~~ ✅ a few notes are enough
Classification and society series
1. Constructivist rationalism
2. Affirming the consequent and evaporative tribalism
3. Classifications are not very informative
4. Adversarial categorization
5. AUC and collateral damage
6. Bias and base rates
7. Decision theory
8. Decision theory and prejudice
Shouting at each other on the internet series (Teleological liberalism)
1. Modern politics seems excellent at reducing the vast spectrum of policy options to two mediocre choices, then arguing about which is worse. What is this tendency called?
2. The Activist and decoupling games, and game-changing
3. Lived evidence deductions and/or ad hominem for discussing genetic arguments.
4. Diffusion of responsibility — is this distinct from messenger shooting?
5. Iterative game theory of communication styles
6. Invasive arguments
7. Coalition games
8. ~~All We Need Is Hate~~
9. Speech standards
10. ~~Pluralism~~ ✅
Learning in context
1. Interaction effects are what we want
2. Interpolation is what we want
3. Optimal conditioning is what we want
4. Correlation construction is easier than causation learning
Epistemic community design
1. Scientific community
2. Messenger shooting
3. Experimental ethics and surveillance
4. Steps to an ecology of mind
5. Epistemic bottlenecks is probably in this series too.
6. Ensemble strategies at the population level. I don’t need to guess right; we need a society in which people, in aggregate, guess in a calibrated way.
7. Truth-effectiveness heat pumps
Epistemic bottlenecks and bandwidth problems
1. Information versus learning as a fundamental question of ML. When do we store exemplars on disk? When do we do gradient updates? How much compute should we spend on compressing?
2. What is special about science? One thing is transmissibility. Can ChatGPT do transmission? Or is it 100% tacit? How does explainability relate to transmissibility?
DIY and the feast of fools
Tail risks and epistemic uncertainty
1. Black swan farming
2. Wicked tail risks
3. Planning under uncertainty
Economic dematerialization via
1. Enclosing the intellectual commons
2. Creative economy jobs
Academic publications as Veblen goods
~~Stein variational gradient descent~~ good enough for now
Edge of chaos, history of
X is Yer than Z
But what can I do?
1. Starfish problems
2. Ethical consumption
3. Prefigurative politics
Haunting and exchangeability. Connection to interpolation, individuation, legibility and nonparametrics.
Doing complicated things naively
Conspiracies as simulations
The uncanny ally
Elliptical belief propagation
Privilege accountancy
~~Anthropic principles~~ ✅ Good enough
~~You can’t talk about us without us~~ ❌ What did I even mean? Something about mottes and baileys?
~~Subculture dynamics~~ ✅ Good enough
~~Opinion dynamics (memetics for beginners)~~ ✅ Good enough
~~Table stakes versus tokenism~~ ✅
~~Iterative game theory under bounded rationality~~ ❌ too general
~~Memetics~~ ❌ (too big, will never finish)
~~Cradlesnatch calculator~~ ✅ Good enough
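
A minimal sketch of the energy-based-model tl;dr in the list above, under stated assumptions: the double-well `energy` function, the `metropolis` helper and all parameter values are hypothetical choices made for illustration, not anything from the notes themselves. It shows the one property the tl;dr names: the model is specified by a potential whose exponential is never normalised into a density, yet density ratios and sampling still work because the normalising constant cancels.

```python
import math
import random


def energy(x):
    """Hypothetical double-well potential E(x) = (x^2 - 1)^2 (an assumption)."""
    return (x * x - 1.0) ** 2


def unnormalised_density(x):
    """exp(-E(x)): proportional to the model density, with unknown constant Z."""
    return math.exp(-energy(x))


def metropolis(energy_fn, n_steps, step=0.5, x0=0.0, seed=0):
    """Random-walk Metropolis sampler that only ever evaluates energy differences."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        # log of p(proposal) / p(x); the normalising constant Z cancels here.
        log_ratio = energy_fn(x) - energy_fn(proposal)
        accept_prob = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis(energy, n_steps=20000)
    second_moment = sum(d * d for d in draws) / len(draws)
    print("ratio p(1)/p(0):", unnormalised_density(1.0) / unnormalised_density(0.0))
    print("estimated E[x^2]:", second_moment)
```

Nothing in the sketch computes the integral of `exp(-E(x))`. That is the practical appeal, and also the catch: anything that needs an actual likelihood value, rather than a ratio or a sample, has to estimate that constant some other way.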

3 music stuff

4 Misc

Transforming Probability Spaces
Doesn’t CGD find a pursuit basis?

5 Workflow optimization

Shell Integration - Documentation - iTerm2 - macOS Terminal Replacement

6 graphical models

7 “transfer” learning

Bernhard Schölkopf: From statistical to causal learning
Bernhard Schölkopf: Learning Causal Mechanisms (ICLR invited talk)
thuml/Transfer-Learning-Library / Transfer Learning Library 0.0.24 documentation: “Transfer Learning Library for Domain Adaptation, Task Adaptation, and Domain Generalization”.
thuml/A-Roadmap-for-Transfer-Learning

8 Custom diffusion

9 Commoncog

10 Music skills

11 Internal

12 ICML 2023 workshop

13 NeurIPS 2022 follow-ups

Arya et al. (2022): stochastic gradients are more general than deterministic ones because they can still be defined for discrete variables
Rudner et al. (2022)
Phillips et al. (2022) — diffusions in the spectral domain allow us to handle continuous function valued inputs
Gahungu et al. (2022)
Wu, Maruyama, and Leskovec (2022): LE-PDE is a learnable low-rank approximation method
Holl, Koltun, and Thuerey (2022) — Physics loss via forward simulations, without the need for sensitivity.
Neural density estimation
Metrics for inverse design and inverse inference problems — the former is in fact easier. Or is it? Can we simply attain forward prediction loss?
Noise injection in emulator learning (see refs in Su et al. (2022))