Even for the most curmudgeonly frequentist it is sometimes refreshing to shift your effort from deriving frequentist estimators for intractable models to simply using the damn Bayesian ones, which fail in different and interesting ways than the ones you are used to. If it works and you are feeling fancy, you might then justify your Bayesian method on frequentist grounds, which washes away the sin.
Here are some scattered tidbits about getting into it. No attempt is made to be comprehensive, novel, or even expert.
Everyone references Bayesian Data Analysis (Gelman et al. 2013) as a first stopping point here. It is simple and readable.
The visualisation howto from, basically, the Stan team is deeper than it sounds (Gabry et al. 2019).
See also BAT, the Bayesian Analysis Toolkit, which does sophisticated Bayes modelling, although AFAICT it uses a fairly basic Metropolis–Hastings sampler?
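For reference, the Metropolis–Hastings recipe is short enough to sketch in full. Below is a minimal random-walk version targeting a standard normal, written in plain numpy; this is an illustration of the generic algorithm, not BAT's implementation, and all names here are my own.

```python
import numpy as np

def log_target(x):
    """Log-density of the target, here a standard normal (up to a constant)."""
    return -0.5 * x ** 2

def metropolis_hastings(log_target, x0=0.0, n_steps=20000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose a Gaussian perturbation,
    accept with probability min(1, target(proposal) / target(current))."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step * rng.normal()  # symmetric proposal, so no correction term
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal  # accept; otherwise stay put and record x again
        samples[i] = x
    return samples

samples = metropolis_hastings(log_target)
print(samples.mean(), samples.std())  # should land near 0 and 1
```

The symmetric proposal is what lets the acceptance ratio drop the proposal densities; an asymmetric proposal would need the full Hastings correction.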
Dirichlet processes, Gaussian Processes etc. 🏗
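Pending that section, here is a minimal sketch of the Gaussian-process prior in action: draw correlated function values on a grid using a squared-exponential kernel. Kernel choice, lengthscale, and jitter value are illustrative assumptions.

```python
import numpy as np

def se_kernel(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D grids of points."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 50)
K = se_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for numerical stability
L = np.linalg.cholesky(K)
draws = L @ rng.normal(size=(len(xs), 3))  # three smooth functions from the prior
print(draws.shape)
```

Each column of `draws` is one random smooth function; shrinking the lengthscale makes the draws wigglier, which is the whole game in GP prior design.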
Bacchus, F, H E Kyburg, and M Thalos. 1990. “Against Conditionalization.” Synthese 85 (3): 475–506.
Barbier, Jean, and Nicolas Macris. 2017. “The Stochastic Interpolation Method: A Simple Scheme to Prove Replica Formulas in Bayesian Inference,” May. http://arxiv.org/abs/1705.02780.
Bernardo, José M., and Adrian F. M. Smith. 2000. Bayesian Theory. 1st edition. Chichester: Wiley.
Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” arXiv Preprint arXiv:1509.07164. http://arxiv.org/abs/1509.07164.
Caruana, Rich. 1998. “Multitask Learning.” In Learning to Learn, 95–133. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5529-2_5.
Fink, Daniel. 1997. “A Compendium of Conjugate Priors,” 46.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (2): 389–402. https://doi.org/10.1111/rssa.12378.
Gelman, Andrew. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34. https://doi.org/10.1214/06-BA117A.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd edition. Boca Raton: Chapman and Hall/CRC.
Gelman, Andrew, and Donald B. Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology 25: 165–73. https://doi.org/10.2307/271064.
Goodman, Noah, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Daniel Tarlow. 2012. “Church: A Language for Generative Models,” June. http://arxiv.org/abs/1206.3255.
Goodrich, Ben, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Bob Carpenter, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.
Li, Meng, and David B. Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models,” November. http://arxiv.org/abs/1611.01241.
Linden, Sander van der, and Breanne Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics 3. https://doi.org/10.3389/fams.2017.00012.
MacKay, David JC. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation 11 (5): 1035–68. https://doi.org/10.1162/089976699300016331.
MacKay, David J. C. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems 6 (3): 469–505. https://doi.org/10.1088/0954-898X_6_3_011.
Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25: 111–63. https://doi.org/10.2307/271063.
Robert, Christian P. 2007. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. 2nd ed. Springer Texts in Statistics. New York: Springer.