Bayesians vs frequentists

Just because we both get the same answer doesn’t mean neither of us is wrong


Sundry schools of thought on how to stitch mathematics to the world; brief notes and questions thereto. Justin Domke wrote a Dummy’s guide to risk and decision theory which explains the different assumptions underlying each methodology from the risk and decision theory angle.

A lot of the obvious debates here are, IMO, uninteresting. From where I am standing, Bayes methods are an attractive option for a bunch of pragmatic reasons: MCMC is often simpler and easier than special-case frequentist methods, Bayes methods come with baked-in regularisation through the prior, and Bayes methods come with a to-my-mind conceptually easier interpretation of uncertainty. Bayes methods are annoying because some cool things that are natural in a frequentist framing (bootstrap, lasso regression) feel contrived and/or awkward in a Bayes context. These relative advantages are, to my mind, dwarfed by the practical problems I face that persist in both Bayes and frequentist contexts.
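To make the baked-in-regularisation point concrete, here is a minimal sketch of my own (a toy assuming only numpy, not anyone’s canonical demo): for a linear-Gaussian model with a Gaussian prior on the weights, the MAP estimate coincides with ridge regression with penalty $\lambda = \sigma^2/\tau^2$, so the two vocabularies produce the same numbers here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X w + noise.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.5                      # observation noise std dev
y = X @ w_true + sigma * rng.normal(size=n)

tau = 1.0                        # prior std dev: w ~ N(0, tau^2 I)
lam = sigma**2 / tau**2          # the implied ridge penalty

# Frequentist route: ridge regression, argmin ||y - Xw||^2 + lam ||w||^2.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayesian route: posterior mean of w | y (equal to the MAP estimate,
# since the posterior is Gaussian) under the conjugate Gaussian prior.
posterior_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(d) / tau**2)
w_map = posterior_cov @ (X.T @ y) / sigma**2

assert np.allclose(w_ridge, w_map)   # same numbers, two vocabularies
```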

But, dig a little and you will find some debates that are more about the philosophy of science itself, and some more that are about branding and market positioning. Bayesian statistics is controversial amongst some frequentists, and vice versa. Sometimes this is for purely terminological reasons, and sometimes for profound philosophical ones. The real flash point is whether Bayesian inference by itself gives a complete, sufficient description of the scientific enterprise and/or rationality. AFAICS the answer to this is trivially “no” but also “that does not at all matter for the statistical problem in front of me right now why are you bothering me with the entirety of science I have a paper deadline?” I imagine this is a common viewpoint, but have not done a survey.

Avoiding the whole accursed issue

You are a card-carrying frequentist and want to use a Bayesian estimator because it’s tractable and simple? No problem. Discuss prior beliefs in terms of something other than probability, use the Bayesian formalism, then produce a frequentist justification.

Now everyone is happy, apart from you, because you had to miss your family’s weekend in the countryside, and cannot remember the name of your new niece.

This is the best option; just be clear about which guarantees your method of choice will give you. There is a diversity of such guarantees across different fields of statistics, and no free lunches. You know, just like you’d expect.
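Here is a minimal sketch of what that recipe can look like (a toy of my own, not anyone’s canonical workflow): take the Beta–Binomial posterior mean as a point estimator of a coin’s bias, reading the Beta(a, b) “prior” as nothing more than a pair of tuning parameters, then justify it in purely frequentist terms by simulating its risk against the MLE.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20                # flips per experiment
a = b = 2.0           # Beta(2, 2) "prior", read here as tuning parameters
p_true = 0.5          # the fixed unknown parameter, frequentist-style
reps = 100_000        # repeated experiments for the risk estimate

k = rng.binomial(n, p_true, size=reps)   # successes in each experiment
mle = k / n                              # maximum likelihood estimator
bayes = (k + a) / (n + a + b)            # posterior mean under Beta(a, b)

# Frequentist risk: mean squared error over repeated sampling at p_true.
print("MSE of MLE:       ", np.mean((mle - p_true) ** 2))
print("MSE of Bayes est.:", np.mean((bayes - p_true) ** 2))
# Near p_true = 0.5 the shrinkage estimator wins; no probability-as-belief
# talk was needed to state or check that guarantee.
```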

Frequentist vs Bayesian acrimony

Would I prefer to spend time in an interminable and, to outsiders, useless debate? Is there someone I wish to irritate at the next faculty meeting?

Well then, why not try to use your current data set as a case study to answer the following questions:

Can I recycle the Bayes belief-updating formalism as a measure of certainty for a hypothesis, or not? Which bizarre edge case can I demonstrate by assuming I can? Or by assuming I can’t? Can I straw-man the “other side” into sounding like idiots?

If I can phrase an estimator in terms of Bayesian belief updates, does it mean that anyone who doesn’t phrase an estimator in terms of Bayesian belief updates is doing it wrong and I need to tell them so? If someone produces a perfectly good estimator by belief updating, do I regard it as broken if it uses the language of probabilities to describe belief, even when it still satisfies frequentist desiderata such as admissibility? If I can find a Bayesian rationale for a given frequentist method — say, regularisation — does it mean that what the frequentist is “really” doing is the Bayesian thing I just rationalised, but they are ignorant for not describing it in terms of priors?

That should give me some controversies. Now, I can weigh in!
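For concreteness, the regularisation bait in that last question can be made into one line of algebra: under a Gaussian likelihood with noise variance $\sigma^2$ and i.i.d. Laplace priors with scale $b$ on the coefficients, the MAP estimate is exactly the lasso,

$$
\hat\beta_{\text{lasso}}
= \arg\min_\beta \Bigl\{ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \Bigr\}
= \arg\max_\beta \Bigl\{ \log \mathcal{N}(y \mid X\beta, \sigma^2 I) + \sum_j \log \tfrac{1}{2b} e^{-\lvert \beta_j \rvert / b} \Bigr\},
\qquad \lambda = \frac{2\sigma^2}{b}.
$$

Whether this means the lasso is “really” Bayesian is exactly the bait: as far as I know, the posterior mean and median under this Laplace prior are not sparse, so the identification only covers the MAP point estimate.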

Here is a sampling of opinions, probably more expert than mine:

  • Wagenmakers on de Finetti on Probability

    Probabilistic reasoning —always to be understood as subjective— merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported or forgotten; it may even relate to something more or less knowable (by means of a computation, a logical deduction, etc.) but for which we are not willing or able to make the effort; and so on.

  • (Jaynes and Bretthorst 2003)

    More or less, claims “Bayesian statistical practice IS science”. Makes frequentists angry.

  • Deborah Mayo, as a philosopher of science and especially of the practice of frequentism, has more than you could possibly wish to know about the details of statistical practice, as well as rhetorical dissection of the F-vs-B debate, and says, BTW, that “Bayesian statistics is not science”. Makes Bayesians angry.

  • Larry Wasserman: Freedman’s neglected theorem

    In this post I want to review an interesting result by David Freedman […]

    The result gets very little attention. Most researchers in statistics and machine learning seem to be unaware of the result. The result says that, “almost all” Bayesian prior distributions yield inconsistent posteriors, in a sense we’ll make precise below. The math is uncontroversial but, as you might imagine, the interpretation of the result is likely to be controversial.

    […] as Freedman says in his paper:

    “ … it is easy to prove that for essentially any pair of Bayesians, each thinks the other is crazy.”

    Is that theorem one of these? (Diaconis and Freedman 1986; Freedman 1999)
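    My gloss on the shape of the result, hedged because I have not worked through the proofs: in Freedman’s nonparametric setting (i.i.d. draws from an unknown distribution $\theta$ on the integers, with a prior $\pi$ over such distributions), the well-behaved pairs are topologically negligible,

    $$
    \bigl\{ (\theta, \pi) : \pi(\cdot \mid X_{1:n}) \Rightarrow \delta_\theta \text{ as } n \to \infty \bigr\}
    \ \text{is meager in}\ \Theta \times \mathcal{P}(\Theta),
    $$

    which is a claim about topological genericity, not about the priors anyone actually uses; see (Diaconis and Freedman 1986) for the careful version.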

  • Some issues with Bayesian epistemology | David Childers

    My main concerns are, effectively, computational. As I understand computer science, the processing of information requires real resources (mostly time, but also energy, space, etc.) and so any theory of reasoning which mandates isomorphism between statements for which computation is required to demonstrate equivalence is effectively ignoring real costs that are unavoidable and so must have some impact on decisions. Further, as I understand it, there is no way to get around this by simply adding this cost as a component of the decision problem.…

    The question of these processing costs becomes more interesting to the extent that they are quantitatively nontrivial. As somebody who spends hours running and debugging MCMC samplers and does a lot of reading about Bayesian computation, my takeaway from this literature is that the limits are fundamental. In particular, there are classes of distributions such that the Bayesian update step is hard, for a variety of hardness classes. This includes many distributions where the update step is NP complete, so that our best understanding of P vs NP suggests that the time to perform the update can be exponential in the size of the problem (sampling from spin glass models is an archetypal example, though really any unrestricted distribution over long strings of discrete bits will do). I suppose a kind of trivial example of this is the case with prior mass 1, in which case the hardness reduces to the hardness of the deterministic computation problem, and so encompasses every standard problem in computer science. More than just exponential time (which can mean use of time longer than the length of the known history of the universe for problems of sizes faced practically by human beings every day, like drawing inferences from the state of a high resolution image), some integration problems may even be uncomputable in the Turing sense, and so not just wildly impractical but impossible to implement on any physical substrate (at least if the Church-Turing hypothesis is correct). Amusingly, this extends to the problem above of determining the costs of practical transformations, as determining whether a problem is computable in finite time is itself the classic example of a problem which is not computable.

    So, exact Bayesianism for all conceivable problems is physically impossible, which makes it slightly less compelling as a normative goal.
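    A toy illustration of the scaling (my own sketch; nothing here is specific to spin glasses beyond the flavour of the coupling matrix): exact Bayes over n binary variables means a table of 2^n posterior values, so even a single exact update step costs time and memory exponential in n.

    ```python
    import itertools

    import numpy as np

    def exact_posterior(n, J, data_loglik):
        """Exact Bayes update over all 2**n binary configurations.

        J: (n, n) coupling matrix defining a spin-glass-flavoured prior,
        p(s) proportional to exp(s' J s); data_loglik maps a configuration
        to its log likelihood. Cost is Theta(2**n), which is the point.
        """
        configs = np.array(list(itertools.product([-1, 1], repeat=n)))
        log_prior = np.einsum('ci,ij,cj->c', configs, J, configs)
        log_post = log_prior + np.array([data_loglik(s) for s in configs])
        log_post -= log_post.max()           # stabilise the exponentiation
        post = np.exp(log_post)
        return configs, post / post.sum()

    # Fine at n = 10 (1024 states); at n = 300 the table would outnumber
    # the atoms in the observable universe.
    rng = np.random.default_rng(2)
    n = 10
    J = rng.normal(size=(n, n)) / n
    configs, post = exact_posterior(n, J, lambda s: 0.5 * s.sum())
    print(post.max(), configs[post.argmax()])
    ```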

  • (Gelman 2011)

    I am told I should look at Andrew Gelman’s model of Bayesian methodology (see also Gelman and Shalizi 2013), which is supposed to be reasonable even to frequentists (‘I always feel that people who like Gelman would prefer to have no Bayes at all.’)

  • (Shalizi 2004, 2009)

    Mathematical invective from Shalizi, showing that stubbornly applying Bayesian methods to a sufficiently uncooperative problem with a sufficiently bad model effectively produces a replicator system. Which is to say, the failure modes are interesting. (Question: Is this behaviour much worse than in a misspecified dependent frequentist parametric model? I should read it and find out.)

    Chatty summary here.
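    As I read the replicator analogy, it falls out of the form of the update itself: over a fixed hypothesis set, conditioning is discrete-time replicator dynamics with likelihood in the role of fitness,

    $$
    p_{t+1}(\theta) = p_t(\theta)\,
    \frac{L(x_{t+1} \mid \theta)}{\sum_{\theta'} p_t(\theta')\, L(x_{t+1} \mid \theta')},
    $$

    so under misspecification the posterior “population” concentrates on whichever hypotheses have the best average log fitness, i.e. the smallest KL divergence from the data-generating process, whether or not those hypotheses are any good in absolute terms.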

  • (Sims 2010)

    Sims has a Nobel Memorial Prize, so he gets to speak on behalf of Bayesian econometrics, I guess.

  • nostalgebraist grumps about strong Bayes as a methodology for science. Key point: you do not have all the possible models, and you would not have the computational resources to assign them posterior probabilities even if you did; assuming that you can do Bayesian learning over them is thus a broken model for learning about the world.

References

Bacchus, Fahiem, Henry E. Kyburg, and Mariam Thalos. 1990. “Against Conditionalization.” Synthese 85 (3): 475–506.
Bernardo, José M. 2006. “A Bayesian Mathematical Statistics Primer.”
Bernardo, José M., and Adrian F. M. Smith. 2000. Bayesian Theory. 1 edition. Chichester: Wiley.
Diaconis, Persi, and David Freedman. 1986. “On the Consistency of Bayes Estimates.” The Annals of Statistics 14 (1): 1–26. http://www.jstor.org/stable/2241255.
Freedman, David. 1999. “Wald Lecture: On the Bernstein-von Mises Theorem with Infinite-Dimensional Parameters.” The Annals of Statistics 27 (4): 1119–41. https://doi.org/10.1214/aos/1017938917.
Gelman, Andrew. 2011. “Induction and Deduction in Bayesian Data Analysis.” Rationality, Markets and Morals 2: 67–78. http://www.stat.columbia.edu/~gelman/research/unpublished/philosophy_online4.pdf.
Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66 (1): 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.
Jaynes, Edwin Thompson. 1963. “Information Theory and Statistical Mechanics.” In Statistical Physics. Vol. 3. Brandeis University Summer Institute Lectures in Theoretical Physics.
Jaynes, Edwin Thompson, and G Larry Bretthorst. 2003. Probability Theory: The Logic of Science. Cambridge, UK; New York, NY: Cambridge University Press.
Mayo, Deborah G., and Aris Spanos. 2011. “Error Statistics.” Philosophy of Statistics 7: 153.
Shalizi, Cosma Rohilla. 2004. “The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic.”
———. 2009. “Dynamics of Bayesian Updating with Dependent Data and Misspecified Models.” Electronic Journal of Statistics 3: 1039–74. https://doi.org/10.1214/09-EJS485.
Sims, C. 2010. “Understanding Non-Bayesians.” Unpublished Chapter, Department of Economics, Princeton University. http://sims.princeton.edu/yftp/UndrstndgNnBsns/GewekeBookChpter.pdf.