Probably actually reading/writing

2020-03-05 — 2025-10-23

Wherein a catalogue of current readings and projects is presented, and a lengthy list of in‑progress essays on Bayes‑meets‑neural‑nets, continual learning, AI safety and human collectivism is enumerated.


Stuff I’m currently reading or otherwise working on. If you’re looking at this and you’re not me, maybe you should reconsider your hobbies.

1 Big Things

Hanging out my shingle.

2 Currently writing

Not all of it is published yet.

  1. A social divide I’ve seen a lot recently: people who value cheap signalling highly versus those who view it negatively.

  2. Bayes-meets-neural-nets

    1. Italian school Predictive Bayes
    2. Singular Learning Theory
    3. Continual learning.
  3. AI Safety

    1. Generalized economics of compute and cognition
    2. Metrics that come apart from their goals
    3. domestication of humans
    4. Causal agency.
    5. learning with theory of mind
  4. Ecology of agency

    1. empowerment
    2. utility as a local linearization of fitness
    3. Bayesian epistemics
    4. opponent shaping
    5. big history
    6. Intelligence in big history
    7. human collective agency
    8. coalition games
    9. generic collective agency
    10. multi-scale agency
    11. commitment
  5. Foundation models and their world models

    1. Causal/Bayesian inference in foundation models
  6. Community building

    1. Collective care
    2. Social calendaring
    3. Psychological resilience
    4. Nationalism
  7. So you’ve just joined a union

  8. When is computation “statistical”? I mean this in the sense that, as in statistical mechanics, we know some bulk statistics of a population of solutions even when we can’t do the calculations for every element (like: air pressure doesn’t require simulating every molecule). It seems that machine learning sometimes behaves like this. I’m not sure of the scope of this idea (maybe I’m reinventing computational mechanics), so let’s use some examples to flesh it out:

    1. Trading equities. We can’t know every trade, but we can price options well under no-arbitrage assumptions, even though traders’ calculations can be far more complex than ours. No-arbitrage assumptions aren’t strictly true, but the returns from extra complexity to find arbitrage opportunities seem to diminish with compute, so in the wash it’s pretty similar.
    2. Statistical mechanics of statistics
    3. Scaling laws: we can’t know the exact computations an LLM will do, but we can predict its performance remarkably well given a data-parameter-train-compute budget (a common parametric form is sketched after this list).
    4. Algorithmic statistics and pseudorandomness study the statistical behaviours of some classes of algorithms, where they become near-indistinguishable from randomness in technical senses.
  9. Reality gap

  10. Is academic literary studies actually distinct from the security discipline of studying side-channel attacks?

  11. Goodhart coordination

  12. Structural problems are hard — let’s do training programs

  13. Is residual prediction different from adversarial prediction?

  14. Science communication for ML

  15. Human superorganisms

    1. Moral orbits.
    2. Revisit probability collectives
    3. Movement design
    4. Returns on hierarchy
    5. Effective collectivism
    6. Alignment
    7. Emancipating my tribe: the cruelty of collectivism (and why I love it anyway)
    8. Institutions for angels
    9. Institutional alignment
    10. Beliefs and rituals of tribes
    11. Where to deploy taboo
    12. The Great Society will never feel great; it’ll merely be better than the alternatives
    13. Player versus game
    14. Something about the fungibility of hipness and cash
    15. Monastic traditions
  16. nationalism

  17. Approximate conditioning

  18. Nested sampling

  19. What even are GFlownets?

  20. Public sphere business models

  21. How to do house stuff (renovation etc)

  22. Power and inscrutability

  23. Strategic ignorance

  24. What is an energy-based model? tl;dr: a brand for models that handle likelihoods via a potential function that isn’t normalised as a density (see the sketch after this list).

  25. Funny-shaped learning

    1. Causal attention
    2. Graphical ML
    3. Gradient message passing
    4. All inference is already variational inference
  26. Human learner series

    1. Which self?

    2. Is language symbolic?

    3. Our moral wetware

    4. Is “is” “ought”?

    5. Morality under uncertainty and computational constraint

    6. Superstimuli

    7. Clickbait bandits

    8. Correlation construction

    9. Moral explainability

      1. Burkean conservatism is about identifying when moral training data is out-of-distribution.
      2. Something about universal grammar and its learnable local approximations versus universal ethics and their learnable local approximations. Morality by template; the computational difficulty of moral identification. Leading by example of necessity.
    10. Righting and wronging

    11. Akrasia in stochastic processes: What time-integrated happiness should we optimize?

    12. Comfort traps ✅ Good enough for now

    13. Myths ✅ a few notes are enough

  27. Classification and society series

    1. Constructivist rationalism
    2. Affirming the consequent and evaporative tribalism
    3. Classifications are not very informative
    4. Adversarial categorization
    5. AUC and collateral damage
    6. Bias and base rates
    7. Decision theory
    8. Decision theory and prejudice
  28. Shouting at each other on the internet series (Teleological liberalism)

    1. Modern politics seems excellent at reducing the vast spectrum of policy options to two mediocre choices, then arguing about which is worse. What is this tendency called?
    2. The Activist and decoupling games, and game-changing
    3. Lived evidence deductions and/or ad hominem for discussing genetic arguments.
    4. Diffusion of responsibility — is this distinct from messenger shooting?
    5. Iterative game theory of communication styles
    6. Invasive arguments
    7. Coalition games
    8. All We Need Is Hate
    9. Speech standards
    10. Pluralism
  29. Learning in context

    1. Interaction effects are what we want
    2. Interpolation is what we want
    3. Optimal conditioning is what we want
    4. Correlation construction is easier than causation learning
  30. Epistemic community design

    1. Scientific community
    2. Messenger shooting
    3. Experimental ethics and surveillance
    4. Steps to an ecology of mind
    5. Epistemic bottlenecks probably belongs in this series too.
    6. Ensemble strategies at the population level. I don’t need to guess right; we need a society in which people in aggregate guess in a calibrated way.
    7. Truth-effectiveness heat pumps
  31. Epistemic bottlenecks and bandwidth problems

    1. Information versus learning as a fundamental question of ML. When do we store exemplars on disk? When do we do gradient updates? How much compute should we spend on compressing?
    2. What is special about science? One thing is transmissibility. Can ChatGPT transmit knowledge? Or is it 100% tacit? How does explainability relate to transmissibility?
  32. DIY and the feast of fools

  33. Tail risks and epistemic uncertainty

    1. Black swan farming
    2. Wicked tail risks
    3. Planning under uncertainty
  34. Economic dematerialization via:

    1. Enclosing the intellectual commons
    2. Creative economy jobs
  35. Academic publications as Veblen goods

  36. Stein variational gradient descent: good enough for now

  37. Edge of chaos, history of

  38. X is Yer than Z

  39. But what can I do?

    1. Starfish problems
    2. Ethical consumption
    3. Prefigurative politics
  40. Haunting and exchangeability. Connection to interpolation, individuation, legibility and nonparametrics.

  41. Doing complicated things naively

  42. Conspiracies as simulations

  43. The uncanny ally

  44. Elliptical belief propagation

  45. Strategic ignorance

  46. Privilege accountancy

  47. Anthropic principles ✅ Good enough

  48. You can’t talk about us without us ❌ What did I even mean? Something about mottes and baileys?

  49. Subculture dynamics ✅ Good enough

  50. Opinion dynamics (memetics for beginners) ✅ Good enough

  51. Table stakes versus tokenism

  52. Iterative game theory under bounded rationality ❌ too general

  53. Memetics ❌ (too big, will never finish)

  54. Cradlesnatch calculator ✅ Good enough
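
A worked form for the scaling-laws point in item 8: one commonly quoted parametric shape (a sketch for illustration, not a claim about any particular fit) writes expected test loss as a sum of power laws in parameter count $N$ and training tokens $D$,

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
$$

so a handful of fitted constants $(E, A, B, \alpha, \beta)$ summarize the bulk behaviour of training runs whose internal computations we never inspect, which is exactly the statistical-mechanics flavour of the claim.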
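
And to pin down the energy-based-model tl;dr in item 24: the unnormalised potential usually enters as

$$
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-E_\theta(x)) \, \mathrm{d}x,
$$

where we work with the energy $E_\theta$ directly (via score matching, contrastive estimation, MCMC, etc.) precisely because the normalising constant $Z(\theta)$ is typically intractable. This is just the standard textbook form, included as a reminder of what the branding refers to.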

3 Refactoring

I need to reclassify the bio computing links; that section’s become confusing and there are too many good ideas that aren’t clearly distinguished.

4 music stuff

5 Misc

6 Workflow optimization

7 graphical models

8 “transfer” learning

9 Custom diffusion

10 Commoncog

11 Music skills

12 Internal

13 ICML 2023 workshop

14 Neurips 2022 follow-ups

  1. Arya et al. (2022): stochastic gradients are more general than deterministic ones because they can be defined even for discrete variables
  2. Rudner et al. (2022)
  3. Phillips et al. (2022): diffusions in the spectral domain allow us to handle continuous function-valued inputs
  4. Gahungu et al. (2022)
  5. Wu, Maruyama, and Leskovec (2022): LE-PDE is a learnable low-rank approximation method
  6. Holl, Koltun, and Thuerey (2022): physics loss via forward simulations, without the need for sensitivity
  7. Neural density estimation
  8. Metrics for inverse design and inverse inference problems. The former is in fact easier. Or is it? Can we get away with plain forward prediction loss?
  9. Noise injection in emulator learning (see refs in Su et al. (2022))

15 Conf, publication venues

16 Neurips 2022

17 Neurips 2021

18 Music

Nestup / cutelabnyc/nested-tuplets: Fancy JavaScript for manipulating nested tuplets.

19 Hot topics

20 Stein stuff

21 newsletter migration

22 GP research

22.1 Invenia’s GP expansion ideas

23 SDEs, optimization and gradient flows

Nguyen and Malinsky (2020)

Statistical Inference via Convex Optimization.

Conjugate functions illustrated.

Francis Bach on the use of geometric sums and a different take by Julyan Arbel.

Tutorial on approximating differentiable control problems. An extension of this is universal differential equations.

24 Career tips and metalearning

25 Ensembles and particle methods

26 Foundations of ML

So much Michael Betancourt.

27 nonparametrics

28 References

Arya, Schauer, Schäfer, et al. 2022. “Automatic Differentiation of Programs with Discrete Randomness.” In.
Gahungu, Lanyon, Álvarez, et al. 2022. “Adjoint-Aided Inference of Gaussian Process Driven Differential Equations.” In.
Holl, Koltun, and Thuerey. 2022. “Scale-Invariant Learning by Physics Inversion.” In.
Lai, Takida, Murata, et al. 2022. “Regularizing Score-Based Models with Score Fokker-Planck Equations.” In.
Nguyen, and Malinsky. 2020. “Exploration and Implementation of Neural Ordinary Differential Equations.”
Phillips, Seror, Hutchinson, et al. 2022. “Spectral Diffusion Processes.” In.
Rudner, Chen, Teh, et al. 2022. “Tractable Function-Space Variational Inference in Bayesian Neural Networks.” In.
Su, Kempe, Fielding, et al. 2022. “Adversarial Noise Injection for Learned Turbulence Simulations.” In.
Wu, Maruyama, and Leskovec. 2022. “Learning to Accelerate Partial Differential Equations via Latent Global Evolution.”