Bayesians vs frequentists

Just because we both get the same answer doesn’t mean neither of us is wrong

November 25, 2014 — January 14, 2021

how do science
statistics
Figure 1: Disagreements in posterior updates

Sundry schools thought in how to stitch mathematics to the world, brief notes and questions thereto. Justin Domke wrote a Dummy’s guide to risk and decision theory which explains the different assumptions underlying each methodology from the risk and decision theory angle.

A lot of the obvious debates are, IMO, uninteresting. From where I am standing, Bayes methods are an attractive option for a bunch of pragmatic reasons (MCMC is often simpler and easier than special-case frequentist methods, Bayes methods comes with baked in regularisation through the prior, Bayes methods come with a to-my-mind conceptually easier interpretation of uncertainty). Bayes methods are annoying because some cool things that are natural in a frequentist framing (bootstrap, lasso regression) feel contrived and/or awkward in a Bayes context. These relative advantages are to my mind dwarfed by the practical problems I face that persist in Bayes and frequentist contexts.

But, dig a little and you will find some debates that are more about the philosophy of science itself, and some more that are about branding and market positioning. Bayesian statistics is controversial amongst some frequentists, and vice versa. Sometimes this is for purely terminological reasons, and sometimes for profound philosophical ones. The real flash point is whether Bayesian inference by itself gives a complete, sufficient description of the scientific enterprise and/or rationality. AFAICS the answer to this is trivially “no” but also “that does not at all matter for the statistical problem in front of me right now why are you bothering me with the entirety of science I have a paper deadline?” I imagine this is a common viewpoint, but have not done a survey.

1 Avoiding the whole accursed issue

You are a card-carrying frequentist and want to use a Bayesian estimator because it’s tractable and simple? No problem. Discuss prior beliefs in terms of something other than probability, use the Bayesian formalism, then produce a frequentist justification.

Now everyone is happy, apart from you, because you had to miss your family’s weekend in the countryside, and cannot remember the name of your new niece.

This is the best option; just be clear about which guarantees your method of choice will give you. There is a diversity of such guarantees across different fields of statistics, and no free lunches. You know, just like you’d expect.

2 Frequentist vs Bayesian acrimony

Would I prefer to spend time in an interminable and, to outsiders, useless debate? Is there someone I wish to irritate at the next faculty meeting?

Well then, why not try to use your current data set as a case study to answer the following questions:

Can I recycle Bayes belief updating formalism as a measures of certainty for a hypothesis, or not? Which bizarre edge case can I demonstrate by assuming I can? Or by assuming I can’t? Can I straw-man the “other side” into sounding like idiots?

If I can phrase an estimator in terms of Bayesian belief updates, does it mean that anyone who doesn’t phrase an estimator in terms of Bayesian belief updates is doing it wrong and I need tell them so? If someone produces a perfectly good estimator by belief updating, do I regard it as broken if it uses the language of probabilities to describe belief, even when it still satisfies frequentist desiderata such as admissibility? If I can find a Bayesian rationale for a given frequentist method — say, regularisation — does it mean that what the frequentist is “really” doing is the Bayesian thing I just rationalised, but they are ignorant for not describing it in terms of priors?

That should give me some controversies. Now, I can weigh in!

Here is a sampling of expert opinions probably more expert than mine:

  • Deutsch, Popper, Gelman and Shalizi, with a side of Mayo, on Bayesian ideas, models and fallibilism in the philosophy of science and in statistics

    I think this argument may be reaching a point that is often reached when smart people, indeed smart communities of people, discuss, over many years, fundamental issues like this on which they start out with strong differences of opinion: positions become more nuanced on each side, and effectively closer, but each side wants to keep the labels they started with, perhaps in part as a way of wanting to point to the valid or partially valid insights that have come from “their” side of the argument (even if they have come from the other side as well in somewhat different terms), and perhaps also as a way of wanting to avoid admitting having been wrong in “fundamental” ways. For example, one sees insights similar to those in the work of Richard Jeffrey and others from a “broadly Bayesian” perspective, about how belief change isn’t always via conditionalization using fixed likelihoods, also arising in the work of the “hypothetico-deductive” camp, where they are used against the simpler “all-conditionalization-all-the-time” Bayesianism. Similarly, probably Popperian ideas played a role in converting some “relatively crude” inductivists to more sophisticated Bayesian or Jefferian approach. (Nelson Goodman’s “Fact, Fiction, and Forecast”, with its celebrated “paradox of the grue emeralds”, probably played this role a generation or two later.) Roughly speaking, the “corroboration” of hypotheses of which Popper speaks, involves not just piling up observations compatible with the hypothesis (a caricature of “inductive support”) but rather the passage of stringent tests. In the straight “falsification” view of Popper, these are stringent because there is a possibility they will generate results inconsistent with the hypothesis, thereby “falsifying” it; on the view which takes it as pointing toward a more Bayesian view of things (I believe I once read something by I.J. Good in which he said that this was the main thing to be gotten from Popper), this might be relaxed to the statement that there are outcomes that are very unlikely if the hypothesis is true, thereby having the potential, at least, of leading to a drastic lowering of the posterior probability of the hypothesis (perhaps we can think of this as a softer version of falsification) if observed. The posterior probability given that such an outcome is observed of course does not depend only on the prior probability of the hypothesis and the probability of the data conditional on the hypothesis—it also depends on many other probabilities. So, for instance, one might also want such a test to have the property that “it would be difficult (rather than easy) to get an accordance between data × and H (as strong as the one obtained) if H were false (or specifiably flawed)”.

  • Wagenmakers on de Finetti on Probability

    Probabilistic reasoning —always to be understood as subjective— merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported or forgotten; it may even relate to something more or less knowable (by means of a computation, a logical deduction, etc.) but for which we are not willing or able to make the effort; and so on.

  • (Jaynes and Bretthorst 2003)

    More or less, claims “Bayesian statistical practice IS science”. Makes frequentists angry.

  • Deborah Mayo as a philosopher of science and especially of the practice of frequentism, has more than you could possibly wish to know about the details of statistical practice, as well as rhetorical dissection of the F-vs-B debate, and says BTW that “Bayesian statistics are not science”. Makes Bayesians angry.

  • Larry Wasserman: Freedman’s neglected theorem

    In this post I want to review an interesting result by David Freedman […]

    The result gets very little attention. Most researchers in statistics and machine learning seem to be unaware of the result. The result says that, “almost all” Bayesian prior distributions yield inconsistent posteriors, in a sense we’ll make precise below. The math is uncontroversial but, as you might imagine, the interpretation of the result is likely to be controversial.

    […] as Freedman says in his paper:

    “ … it is easy to prove that for essentially any pair of Bayesians, each thinks the other is crazy.”

    Is that theorem one of these? (Diaconis and Freedman 1986; Freedman 1999)

  • David Childers, Some issues with Bayesian epistemology

    My main concerns are, effectively, computational. As I understand computer science, the processing of information requires real resources, (mostly time, but also energy, space, etc) and so any theory of reasoning which mandates isomorphism between statements for which computation is required to demonstrate equivalence is effectively ignoring real costs that are unavoidable and so must have some impact on decisions. Further, as I understand it, there is no way to get around this by simply adding this cost as a component of the decision problem.…

    The question of these processing costs becomes more interesting to the extent that they are quantitatively nontrivial. As somebody who spends hours running and debugging MCMC samplers and does a lot of reading about Bayesian computation, my takeaway from this literature is that the limits are fundamental. In particular, there are classes of distributions such that the Bayesian update step is hard, for a variety of hardness classes. This includes many distributions where the update step is NP complete, so that our best understanding of P vs NP suggests that the time to perform the update can be exponential in the size of the problem (sampling from spin glass models is an archetypal example, though really any unrestricted distribution over long strings of discrete bits will do). I suppose a kind of trivial example of this is the case with prior mass 1, in which case the hardness reduces to the hardness of the deterministic computation problem, and so encompasses every standard problem in computer science. More than just exponential time (which can mean use of time longer than the length of the known history of the universe for problems of sizes faced practically by human beings every day, like drawing inferences from the state of a high resolution image), some integration problems may even be uncomputable in the Turing sense, and so not just wildly impractical but impossible to implement on any physical substrate (at least if the Church-Turing hypothesis is correct). Amusingly, this extends to the problem above of determining the costs of practical transformations, as determining whether a problem is computable in finite time is itself the classic example of a problem which is not computable.

    So, exact Bayesianism for all conceivable problems is physically impossible, which makes it slightly less compelling as a normative goal.

  • (Gelman 2011)

    I am told I should look at Andrew Gelman’s model of Bayesian methodology, which is supposed to be reasonable even to frequentists (‘I always feel that people who like Gelman would prefer to have no Bayes at all.’)

  • (Shalizi 2004, 2009)

    Mathematical invective from Shalizi, showing that stubbornly applying Bayesian methods to a sufficiently un-cooperative problem with a sufficiently bad model is effectively producing a replicator system. Which is to say, the failure modes are interesting. (Question: Is this behaviour much worse than in a mis-specified dependent frequentist parametric model? I should read it and find out.)

    Chatty summary here.

  • (Sims 2010)

    Sims has a Nobel Memorial Prize, so he gets to speak on behalf of Bayesian econometrics I guess.

3 Strong Bayesianism

See Saying Bayes is not enough.

4 References

Bacchus, Kyburg, and Thalos. 1990. “Against Conditionalization.” Synthese.
Bernardo, Jose M, and de Valencia. 2006. “A Bayesian Mathematical Statistics Primer.”
Bernardo, José M., and Smith. 2000. Bayesian Theory.
Diaconis, and Freedman. 1986. On the Consistency of Bayes Estimates.” The Annals of Statistics.
Freedman. 1999. Wald Lecture: On the Bernstein-von Mises Theorem with Infinite-Dimensional Parameters.” The Annals of Statistics.
Gelman. 2011. Induction and Deduction in Bayesian Data Analysis.” Rationality, Markets and Morals.
Gelman, and Shalizi. 2013. Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology.
Jaynes. 1963. “Information Theory and Statistical Mechanics.” In Statistical Physics. Brandeis University Summer Institute Lectures in Theoretical Physics.
Jaynes, and Bretthorst. 2003. Probability Theory: The Logic of Science.
Mayo, and Spanos. 2011. “Error Statistics.” Philosophy of Statistics.
Shalizi. 2004. “The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic.”
———. 2009. Dynamics of Bayesian Updating with Dependent Data and Misspecified Models.” Electronic Journal of Statistics.
Sims. 2010. Understanding Non-Bayesians.” Unpublished Chapter, Department of Economics, Princeton University.