Sundry schools of thought on how to stitch mathematics to the world, with brief notes and questions thereto. Justin Domke wrote a Dummy’s guide to risk and decision theory, which explains the different assumptions underlying each methodology from the risk and decision theory angle.
A lot of the obvious debates are, IMO, uninteresting. From where I am standing, Bayes methods are an attractive option for a bunch of pragmatic reasons (MCMC is often simpler and easier than special-case frequentist methods, Bayes methods come with baked-in regularisation through the prior, and Bayes methods come with what is, to my mind, a conceptually easier interpretation of uncertainty). Bayes methods are annoying because some cool things that are natural in a frequentist framing (the bootstrap, lasso regression) feel contrived and/or awkward in a Bayes context. To my mind, these relative advantages are dwarfed by the practical problems I face, which persist in both Bayes and frequentist contexts.
But, dig a little and you will find some debates that are more about the philosophy of science itself, and some more that are about branding and market positioning. Bayesian statistics is controversial amongst some frequentists, and vice versa. Sometimes this is for purely terminological reasons, and sometimes for profound philosophical ones. The real flash point is whether Bayesian inference by itself gives a complete, sufficient description of the scientific enterprise and/or rationality. AFAICS the answer to this is trivially “no” but also “that does not at all matter for the statistical problem in front of me right now why are you bothering me with the entirety of science I have a paper deadline?” I imagine this is a common viewpoint, but have not done a survey.
Avoiding the whole accursed issue
You are a card-carrying frequentist and want to use a Bayesian estimator because it’s tractable and simple? No problem. Discuss prior beliefs in terms of something other than probability, use the Bayesian formalism, then produce a frequentist justification.
Now everyone is happy, apart from you, because you had to miss your family’s weekend in the countryside, and cannot remember the name of your new niece.
This is the best option; just be clear about which guarantees your method of choice will give you. There is a diversity of such guarantees across different fields of statistics, and no free lunches. You know, just like you’d expect.
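To make “produce a frequentist justification” concrete, here is a minimal Python sketch in the simplest conjugate setting I could think of (my assumption, not anything from the sources above): a normal mean with known noise standard deviation and a normal prior. It computes the 95% equal-tailed posterior interval, then checks by simulation how often that interval covers a fixed true mean, which is the frequentist guarantee one would want to state. The function names and parameter values are illustrative.

```python
import random
from statistics import NormalDist

def credible_interval(xs, sigma, prior_mean, prior_sd, level=0.95):
    """Equal-tailed posterior interval for a normal mean, with known noise
    sd `sigma` and a conjugate N(prior_mean, prior_sd**2) prior."""
    n = len(xs)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    # Posterior mean is the precision-weighted average of prior mean and sample mean.
    post_mean = post_var * (prior_prec * prior_mean + data_prec * (sum(xs) / n))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * post_var**0.5
    return post_mean - half_width, post_mean + half_width

def coverage(true_mean, n=20, sigma=1.0, prior_mean=0.0, prior_sd=1.0,
             reps=2000, seed=0):
    """Fraction of simulated data sets whose credible interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = credible_interval(xs, sigma, prior_mean, prior_sd)
        hits += low <= true_mean <= high
    return hits / reps

print(coverage(0.0), coverage(5.0))
```

With the prior centred on the truth the interval covers slightly more often than nominal; with the truth far from the prior centre (`coverage(5.0)` against a prior centred at 0) coverage drops to roughly 80% in this configuration. So the guarantee you can honestly state depends on where the truth sits relative to the prior, which is exactly the fine print worth being clear about.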
Frequentist vs Bayesian acrimony
Would I prefer to spend time in an interminable and, to outsiders, useless debate? Is there someone I wish to irritate at the next faculty meeting?
Well then, why not try to use your current data set as a case study to answer the following questions:
Can I recycle the Bayesian belief-updating formalism as a measure of certainty for a hypothesis, or not? Which bizarre edge case can I demonstrate by assuming I can? Or by assuming I can’t? Can I straw-man the “other side” into sounding like idiots?
If I can phrase an estimator in terms of Bayesian belief updates, does it mean that anyone who doesn’t phrase an estimator in terms of Bayesian belief updates is doing it wrong and I need to tell them so? If someone produces a perfectly good estimator by belief updating, do I regard it as broken if it uses the language of probabilities to describe belief, even when it still satisfies frequentist desiderata such as admissibility? If I can find a Bayesian rationale for a given frequentist method (say, regularisation), does it mean that what the frequentist is “really” doing is the Bayesian thing I just rationalised, and that they are ignorant for not describing it in terms of priors?
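The regularisation case is easy to check by hand. A minimal sketch, assuming a one-coefficient linear model with no intercept, Gaussian noise of known standard deviation `sigma`, and a zero-mean Gaussian prior with standard deviation `tau` on the coefficient: the ridge estimate with penalty `lam` coincides with the posterior mean (equivalently the MAP estimate, since this posterior is Gaussian) exactly when `lam = sigma**2 / tau**2`. The function names and data are hypothetical.

```python
import math

def ridge_coef(xs, ys, lam):
    """Penalised least squares: argmin_b sum_i (y_i - b*x_i)**2 + lam * b**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def posterior_mean_coef(xs, ys, sigma, tau):
    """Posterior mean of b under y_i ~ N(b*x_i, sigma**2) with prior b ~ N(0, tau**2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Precisions add: the data contribute sxx / sigma**2, the prior 1 / tau**2.
    post_prec = sxx / sigma**2 + 1.0 / tau**2
    return (sxy / sigma**2) / post_prec

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
sigma, tau = 0.5, 2.0
b_ridge = ridge_coef(xs, ys, lam=sigma**2 / tau**2)
b_bayes = posterior_mean_coef(xs, ys, sigma, tau)
print(b_ridge, b_bayes, math.isclose(b_ridge, b_bayes))
```

The arithmetic is silent on which description is the “real” one, which is rather the point of the question.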
That should give me some controversies. Now, I can weigh in!
Here is a sampling of opinions probably more expert than mine:
Probabilistic reasoning —always to be understood as subjective— merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported or forgotten; it may even relate to something more or less knowable (by means of a computation, a logical deduction, etc.) but for which we are not willing or able to make the effort; and so on.
More or less, claims “Bayesian statistical practice IS science”. Makes frequentists angry.
Deborah Mayo, as a philosopher of science and especially of the practice of frequentism, has more than you could possibly wish to know about the details of statistical practice, as well as a rhetorical dissection of the F-vs-B debate, and says, BTW, that “Bayesian statistics are not science”. Makes Bayesians angry.
Larry Wasserman: Freedman’s neglected theorem
In this post I want to review an interesting result by David Freedman […]
The result gets very little attention. Most researchers in statistics and machine learning seem to be unaware of the result. The result says that, “almost all” Bayesian prior distributions yield inconsistent posteriors, in a sense we’ll make precise below. The math is uncontroversial but, as you might imagine, the interpretation of the result is likely to be controversial.
[…] as Freedman says in his paper:
“ … it is easy to prove that for essentially any pair of Bayesians, each thinks the other is crazy.”
David Childers, Some issues with Bayesian epistemology
My main concerns are, effectively, computational. As I understand computer science, the processing of information requires real resources, (mostly time, but also energy, space, etc) and so any theory of reasoning which mandates isomorphism between statements for which computation is required to demonstrate equivalence is effectively ignoring real costs that are unavoidable and so must have some impact on decisions. Further, as I understand it, there is no way to get around this by simply adding this cost as a component of the decision problem.…
The question of these processing costs becomes more interesting to the extent that they are quantitatively nontrivial. As somebody who spends hours running and debugging MCMC samplers and does a lot of reading about Bayesian computation, my takeaway from this literature is that the limits are fundamental. In particular, there are classes of distributions such that the Bayesian update step is hard, for a variety of hardness classes. This includes many distributions where the update step is NP complete, so that our best understanding of P vs NP suggests that the time to perform the update can be exponential in the size of the problem (sampling from spin glass models is an archetypal example, though really any unrestricted distribution over long strings of discrete bits will do). I suppose a kind of trivial example of this is the case with prior mass 1, in which case the hardness reduces to the hardness of the deterministic computation problem, and so encompasses every standard problem in computer science. More than just exponential time (which can mean use of time longer than the length of the known history of the universe for problems of sizes faced practically by human beings every day, like drawing inferences from the state of a high resolution image), some integration problems may even be uncomputable in the Turing sense, and so not just wildly impractical but impossible to implement on any physical substrate (at least if the Church-Turing hypothesis is correct). Amusingly, this extends to the problem above of determining the costs of practical transformations, as determining whether a problem is computable in finite time is itself the classic example of a problem which is not computable.
So, exact Bayesianism for all conceivable problems is physically impossible, which makes it slightly less compelling as a normative goal.
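To give a feel for the scaling in the spin-glass example, here is a minimal Python sketch of the naive exact computation: the log normalising constant of an Ising-type model on n spins taking values ±1, with arbitrary pairwise couplings `J` and fields `h`, summed over every one of the 2**n configurations. This is only the brute-force method; special structures such as chains or trees admit much faster exact algorithms, but for general dense couplings no polynomial-time exact algorithm is known. The function names and the random-coupling setup are illustrative assumptions.

```python
import itertools
import math
import random

def log_partition_brute_force(J, h):
    """Return (log Z, number of terms summed) for
    Z = sum over s in {-1,+1}^n of exp(sum_{i<j} J[i][j] s_i s_j + sum_i h[i] s_i)."""
    n = len(h)
    exponents = []
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        exponents.append(e)
    # Log-sum-exp for numerical stability.
    m = max(exponents)
    log_z = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_z, len(exponents)

def random_couplings(n, seed=0):
    """Dense Gaussian couplings scaled by 1/sqrt(n), in the spirit of the SK spin glass."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = rng.gauss(0.0, 1.0) / math.sqrt(n)
    return J

for n in (4, 8, 12):
    log_z, terms = log_partition_brute_force(random_couplings(n), [0.0] * n)
    print(n, terms, round(log_z, 3))
```

The `terms` column is 2**n, so each extra spin doubles the work; at a few hundred spins (a tiny image) the sum has more terms than there are atoms in the observable universe.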
I am told I should look at Andrew Gelman’s model of Bayesian methodology, which is supposed to be reasonable even to frequentists (‘I always feel that people who like Gelman would prefer to have no Bayes at all.’).
Mathematical invective from Shalizi, showing that stubbornly applying Bayesian methods to a sufficiently uncooperative problem with a sufficiently bad model effectively produces a replicator system. Which is to say, the failure modes are interesting. (Question: is this behaviour much worse than that of a misspecified frequentist parametric model for dependent data? I should read it and find out.)
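The replicator reading is visible even in a toy setting. A minimal Python sketch, assuming i.i.d. coin flips and a finite set of candidate biases, none of them true (much simpler than the dependent-data setting Shalizi analyses): each Bayes step multiplies a hypothesis’s posterior weight by its likelihood for the new observation and renormalises, which is exactly the discrete-time replicator update w_i ← w_i f_i / Σ_j w_j f_j, with likelihood playing the role of fitness. Under this kind of misspecification the posterior piles onto the hypothesis closest to the truth in Kullback-Leibler divergence. Function names and numbers are illustrative.

```python
import random

def replicator_step(weights, fitness):
    """One Bayes update = one discrete replicator step: w_i <- w_i * f_i / mean fitness."""
    mean_fitness = sum(w * f for w, f in zip(weights, fitness))
    return [w * f / mean_fitness for w, f in zip(weights, fitness)]

def posterior_after(thetas, true_p, n_obs, seed=0):
    """Posterior over candidate coin biases `thetas` after n_obs flips of a coin with bias true_p."""
    rng = random.Random(seed)
    weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior
    for _ in range(n_obs):
        heads = rng.random() < true_p
        fitness = [t if heads else 1.0 - t for t in thetas]  # likelihood of this flip
        weights = replicator_step(weights, fitness)
    return weights

# The true bias 0.5 is not among the candidates; 0.45 is closest in KL divergence.
print(posterior_after([0.2, 0.45, 0.9], true_p=0.5, n_obs=2000))
```

In this benign i.i.d. case, concentration on the least-wrong hypothesis is reassuring; the question above is how badly things go when the model is worse and the data less cooperative.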
Sims has a Nobel Memorial Prize, so he gets to speak on behalf of Bayesian econometrics, I guess.
I want to lay out some intuitions about why bayesianism is not very useful as a conceptual framework for thinking either about AGI or human reasoning. This is not a critique of bayesian statistical methods; it’s instead aimed at the philosophical position that bayesianism defines an ideal of rationality which should inform our perspectives on less capable agents, also known as “strong bayesianism”. As described here:
The Bayesian machinery is frequently used in statistics and machine learning, and some people in these fields believe it is very frequently the right tool for the job. I’ll call this position “weak Bayesianism.” There is a more extreme and more philosophical position, which I’ll call “strong Bayesianism,” that says that the Bayesian machinery is the single correct way to do not only statistics, but science and inductive inference in general — that it’s the “aspirin in willow bark” that makes science, and perhaps all speculative thought, work insofar as it does work.
I cannot help but feel this computational-unattainability-of-Bayes argument is reminiscent of critiques of equilibrium arguments in economics, where the claims about efficiency at equilibrium presume too much time and computation to be relevant.
nostalgebraist grumps about strong Bayes as a methodology for science. Key point: you do not have all the possible models, and you would not have the computational resources to assign them posterior probabilities if you did; assuming that you can do Bayesian learning over them is thus a broken model for learning about the world.