statistics on The Dan MacKinlay family of variably-well-considered enterprises
https://danmackinlay.name/tags/statistics.html
Recent content in statistics on The Dan MacKinlay family of variably-well-considered enterprises. Generated by Hugo (gohugo.io), en-us. Last updated Wed, 03 Mar 2021 12:29:39 +1100.

Learning on manifolds
https://danmackinlay.name/notebook/learning_on_manifolds.html
Wed, 03 Mar 2021 12:29:39 +1100
Learning on a given manifold · Information Geometry · Hamiltonian Monte Carlo · Langevin Monte Carlo · Natural gradient · Homogeneous probability · References
Abraham Bosse, Moyen vniuersel de pratiquer la perspectiue sur les tableaux, ou surfaces irregulieres : ensemble quelques particularitez concernant cet art, & celuy de la graueure en taille-douce (1653)
A placeholder for learning on curved spaces. Not discussed: learning OF the curvature of spaces.
Learning on a given manifold: learning where there is an a priori manifold seems to also be a usage here?

Mind reading by computer
https://danmackinlay.name/notebook/mind_reading.html
Wed, 03 Mar 2021 10:44:24 +1100
Base level: brain imaging · Advanced: brain decoding · References
A placeholder.
I’d like to know how good the results are getting in this area, and how general across people/technologies etc. How close are we to the point that someone can put an arbitrary individual in some kind of tomography machine and say what they are thinking without pre-training or priming?
Base level: brain imaging. The instruments we have are blunt.

Data sets
https://danmackinlay.name/notebook/spatial_data_sets.html
Tue, 02 Mar 2021 10:25:06 +1100
Satellite images of various stripes are reviewed here. If you just want eye candy, NASA Visible Earth is a good one. I’m fond of LANDSAT maps. Various images can be found through Earth Explorer.
See also Australia-specific stuff.
Also interesting: CHIRPS, Rainfall Estimates from Rain Gauge and Satellite Observations.
pangeo is an umbrella organisation providing many geospatial data tools, including a catalogue.

Combining kernels
https://danmackinlay.name/notebook/kernel_compound.html
Mon, 01 Mar 2021 19:53:09 +1100
Locally stationary kernels · Stationary reducible kernels · Other nonstationary kernels · References
A sum or product (or outer sum, or tensor product) of kernels is still a kernel. For other transforms, YMMV.
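The closure claim is easy to check numerically. A minimal sketch (toy kernels of my own choosing, not from the source): build Gram matrices for two valid kernels and confirm that their sum and elementwise product remain positive semi-definite.

```python
import numpy as np

# Empirically check that the sum and the (elementwise/Schur) product of two
# valid kernels yield positive semi-definite Gram matrices.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))

def k_rbf(a, b, ell=1.0):
    # Squared-exponential kernel
    return np.exp(-0.5 * (a - b.T) ** 2 / ell**2)

def k_lin(a, b):
    # Linear (dot-product) kernel
    return a @ b.T

K1, K2 = k_rbf(x, x), k_lin(x, x)
for K in (K1 + K2, K1 * K2):
    eigmin = np.linalg.eigvalsh(K).min()
    assert eigmin > -1e-8, eigmin  # PSD up to numerical noise
```

The same check fails for, say, the difference `K1 - K2`, which is the point: only some algebraic operations preserve kernel-ness.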
For example, in the case of Gaussian processes, suppose that, independently,
\[\begin{aligned} f_{1} &\sim \mathcal{GP}\left(\mu_{1}, k_{1}\right)\\ f_{2} &\sim \mathcal{GP}\left(\mu_{2}, k_{2}\right) \end{aligned}\] then
\[ f_{1}+f_{2} \sim \mathcal{GP} \left(\mu_{1}+\mu_{2}, k_{1}+k_{2}\right) \] so \(k_{1}+k_{2}\) is also a kernel.

Kernel zoo
https://danmackinlay.name/notebook/kernel_zoo.html
Mon, 01 Mar 2021 19:53:09 +1100
Stationary dot-product · Arc-cosine kernel · Causal kernels · Markov kernels · Wiener process kernel · Squared exponential · Rational Quadratic · Matérn · Periodic · Locally periodic · “Integral” kernel · Composing kernels · Stationary spectral kernels · Nonstationary spectral kernels · Compactly supported · Genton kernels · Kernels with desired symmetry · Stationary reducible kernels · Other nonstationary kernels · References
What follows are some useful kernels to have in my toolkit, mostly over \(\mathbb{R}^n\), or at least over some space with a metric.

Convolutional Gaussian processes
https://danmackinlay.name/notebook/gp_convolution.html
Mon, 01 Mar 2021 17:08:51 +1100
Convolutions with respect to a non-stationary driving noise · Varying convolutions with respect to a stationary white noise · References
Gaussian processes defined by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) and can encode non-stationarity.
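The construction can be sketched in a few lines; this is an illustrative toy (grid, bandwidth and seed are my own choices, not from the source): convolve white noise with a Gaussian smoothing kernel to draw an approximate GP sample path.

```python
import numpy as np

# Construct an (approximately) Gaussian process on a grid by convolving
# white noise with a Gaussian smoothing kernel.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 400)
white = rng.normal(size=t.size)

ell = 0.05  # smoothing bandwidth; wider => smoother sample paths
weights = np.exp(-0.5 * ((t[:, None] - t[None, :]) / ell) ** 2)
f = weights @ white  # one sample path of the smoothed process

# (Truncating the kernel to a small window would give a local, sparse
# version of the same construction.)
assert f.shape == t.shape
```

The locality remark in the text corresponds to truncating `weights` to a band matrix, which is where the computational convenience comes from.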
Convolutions with respect to a non-stationary driving noise H. K.

Random fields as stochastic differential equations
https://danmackinlay.name/notebook/random_fields_as_sdes.html
Mon, 01 Mar 2021 17:08:40 +1100
Creating a stationary Markov SDE with desired covariance · Convolution representations · References
The representation of certain random fields, especially Gaussian random fields, as stochastic differential equations. This is the engine that makes filtering of Gaussian processes go, and it is also a natural framing for probabilistic spectral analysis.
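A minimal instance of a stationary Markov SDE with a desired covariance (a sketch with parameters of my own choosing, not from the source): the Ornstein–Uhlenbeck process, whose stationary covariance is the exponential (Matérn-1/2) kernel.

```python
import numpy as np

# The OU SDE  dX = -theta*X dt + sigma dW  is a stationary Markov process
# with covariance  sigma^2/(2*theta) * exp(-theta*|s-t|).  Simulate it by
# Euler-Maruyama and compare the empirical autocovariance at lag 0.5.
rng = np.random.default_rng(2)
theta, sigma, dt, n = 1.0, 1.0, 0.01, 200_000
x = np.zeros(n)
for i in range(1, n):
    x[i] = x[i - 1] - theta * x[i - 1] * dt + sigma * np.sqrt(dt) * rng.normal()

lag_time = 0.5
lag = int(lag_time / dt)
cov_theory = sigma**2 / (2 * theta) * np.exp(-theta * lag_time)
cov_emp = np.cov(x[:-lag], x[lag:])[0, 1]
assert abs(cov_emp - cov_theory) < 0.1
```

Higher-order Matérn kernels arise the same way from higher-order linear SDEs, which is the observation that powers the filtering connection.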
I do not have much to say right now about this, but I am using it, so watch this space.

Convolutional stochastic processes
https://danmackinlay.name/notebook/stochastic_convolution.html
Tue, 01 Mar 2021 16:13:24 +1100
References
Stochastic processes defined by convolution of white noise with smoothing kernels.
For now this effectively means Gaussian processes.
References: Bolin, David. 2014. “Spatial Matérn Fields Driven by Non-Gaussian Noise.” Scandinavian Journal of Statistics 41 (3): 557–79. https://doi.org/10.1111/sjos.12046. Higdon, Dave. 2002. “Space and Space-Time Modeling Using Process Convolutions.” In Quantitative Methods for Current Environmental Issues, edited by Clive W. Anderson, Vic Barnett, Philip C. Chatwin, and Abdel H.

Covariance functions
https://danmackinlay.name/notebook/kernel_learning.html
Mon, 01 Mar 2021 13:25:10 +1100
Learning kernel hyperparameters · Learning kernel composition · Hyperkernels · References
This is usually in the context of Gaussian processes, where everything can work out nicely if you are lucky, but other kernel machines are OK too. The goal for most of these methods is to maximise the marginal likelihood, a.k.a. model evidence, as is conventional in Bayesian ML.
Learning kernel hyperparameters 🏗
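A bare-bones version of hyperparameter learning by marginal likelihood (a sketch on toy data I made up, not from the source): evaluate the GP log marginal likelihood over a grid of lengthscales and pick the maximiser.

```python
import numpy as np

# Choose a GP lengthscale by maximising the log marginal likelihood
# (model evidence) on a toy 1-D regression problem.
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 40)[:, None]
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=40)

def log_marginal(ell, noise_var=0.01):
    # Squared-exponential Gram matrix plus observation noise
    K = np.exp(-0.5 * (x - x.T) ** 2 / ell**2) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

ells = np.geomspace(0.05, 5, 30)
best = ells[np.argmax([log_marginal(e) for e in ells])]
# A sane lengthscale for sin(x) data is order 1, not 0.05 or 5.
assert 0.1 < best < 4.0
```

In practice one would optimise with gradients rather than a grid, but the objective is the same, and the trade-off it encodes (data fit vs. complexity penalty via the log-determinant) is visible in the three terms of `log_marginal`.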
Learning kernel composition: automating kernel design by some composition of simpler atomic kernels.

Memetics
https://danmackinlay.name/notebook/memetics.html
Sat, 27 Feb 2021 11:11:07 +1100
Girardian mimetic violence · Pluralistic ignorance · Toxoplasma of rage · References
Placeholder.
Clara Vandeweerdt
I work on public opinion, media and climate politics. My latest paper talks about how reporting on events looks very different depending on the ideological color of the media outlet.
YY Ahn
Giulio Rossetti, author of various network analysis libraries such as dynetx and ndlib.
Francesco Bonchi
Sune Lehmann
Writ large, Everything is Obvious (once you know the answer) is Duncan Watts’ laundry list of examples of how all our intuitions about how society works are self-justifying guesses divorced from evidence, except for his, because Yahoo let him build his own experimental online social networks.

Probability divergences
https://danmackinlay.name/notebook/probability_metrics.html
Tue, 23 Feb 2021 12:20:08 +1100
Overview · Norms with respect to Lebesgue measure on the state space · \(\phi\)-divergences · Kullback-Leibler divergence · Total variation distance · Hellinger divergence · \(\alpha\)-divergence · \(\chi^2\) divergence · Hellinger inequalities · Pinsker inequalities · Integral probability metrics · Wasserstein distance(s) · Fisher distances · Others · Induced topologies · To read · References
Quantifying the difference between probability measures; measuring the distribution itself, e.g. to assess badness of approximation of a statistical fit. The theory of binary experiments.

Statistics and machine learning
https://danmackinlay.name/notebook/statistics_ml.html
Sun, 21 Feb 2021 12:12:09 +1100
References
This page mostly exists to collect a good selection of overview statistics introductions that are not terrible. I’m especially interested in modern fusion methods that harmonise what we would call statistics and what we would call machine learning, and in clearing up the unnecessary terminological confusion between those traditions.
Here are some recommended courses to get started if you don’t know what you’re doing.
Larry Wasserman’s stats course · Shalizi’s regression lectures · Moritz Hardt and Benjamin Recht, Patterns, Predictions, and Actions: A Story About Machine Learning. See also the recommended texts below.

The Gaussian distribution
https://danmackinlay.name/notebook/gaussian_distribution.html
Thu, 18 Feb 2021 12:45:45 +1100
What is Erf again? · Representations of density and CDF · Left tail of iCDF · ODE representation for the univariate density · ODE representation for the univariate iCDF · Density PDE representation as a diffusion equation · Orthogonal basis · Rational approximations · Roughness · Multidimensional marginals and conditionals · Transformed variables · Metrics · Wasserstein · Kullback-Leibler · Hellinger · References
Stunts with bell curve distributions.
Let’s start here with the basic thing.

Causal inference in the continuous limit
https://danmackinlay.name/notebook/causality_continuous.html
Wed, 17 Feb 2021 20:41:08 +1100
References
Causality on continuous index spaces and, what turns out to be related, equilibrium dynamics. Placeholder.
References: Blom, Tineke, Stephan Bongers, and Joris M. Mooij. 2020. “Beyond Structural Causal Models: Causal Constraints Models.” In Uncertainty in Artificial Intelligence, 585–94. PMLR. http://proceedings.mlr.press/v115/blom20a.html. Bongers, Stephan, and Joris M. Mooij. 2018. “From Random Differential Equations to Structural Causal Models: The Stochastic Case.” March 27, 2018. http://arxiv.org/abs/1803.08784. Hansen, Niels, and Alexander Sokol.

Stability in linear dynamical systems
https://danmackinlay.name/notebook/stability_dynamical_linear.html
Tue, 16 Feb 2021 08:23:02 +1100
Pole representations · Reparameterisation · Continuous time · Stability and gradient descent · References
The intersection of linear dynamical systems and stability of dynamical systems.
There is not much content here because I spent 2 years working on it and am too traumatised to revisit it.
Informally, I am admitting as “stable” any dynamical system which does not explode super-polynomially fast; we can think of these as systems where, even if the system is not stationary, at least its rate of change might be.

R
https://danmackinlay.name/notebook/r.html
Sun, 07 Feb 2021 18:42:53 +1100
Pros and cons · Good · Bad · Installing R · Installing packages · ODEs · Command-line scripting · Recommended config · UI · Path surgery for linuxbrew · Needful packages · Shiny · The tidyverse · Blogging / reports / reproducible research · Machine learning · Dataframe alternatives · Testing · Plotting · High performance R · Interacting with Julia · IDEs · Rstudio · Jamovi · VS Code for R · Radian · Exploratory · Intro help · Saving and loading · Subsetting hell · Data exchange · How to pass sparse matrices between R and Python · Debugging · Inspecting frames post hoc · Basic interactive debugger · Graphical interactive optionally-web-based debugger · R for Pythonistas · Opaque imports · No scalar types… · …yet verbose vector literal syntax · what files do I need?

Mind as statistical learner
https://danmackinlay.name/notebook/mind_as_ml.html
Thu, 28 Jan 2021 19:30:56 +1100
Language theory · Descriptive Bayesian models of cognition · That free energy thing · References
Various morsels on the theme of what machine learning teaches us about our own learning. Thus biomimetic algorithms find their converse in our algo-mimetic biology.
This should be more about general learning theory insights. The nitty-gritty details of how computing is done by biological systems are more what I think of as biocomputing. If you can unify those, then well done, you can grow minds in a petri dish.

Feynman-Kac formulae
https://danmackinlay.name/notebook/feynman_kac.html
Wed, 27 Jan 2021 11:55:19 +1100
References
There is a mathematically rich theory about how particle filters work. The notoriously abstruse Del Moral (2004) and Doucet, Freitas, and Gordon (2001) are universally commended for unifying and making consistent the theory of diffusion processes, Feynman-Kac formulae, and “propagation of chaos”. I will get around to them eventually, maybe?
References: Cérou, F., P. Del Moral, T. Furon, and A. Guyader. 2011. “Sequential Monte Carlo for Rare Event Estimation.”

Kernel warping
https://danmackinlay.name/notebook/kernel_warping.html
Thu, 21 Jan 2021 10:55:36 +1100
Stationary reducible kernels · Classic deformations · MacKay warping · Learning transforms · References
A nonlinear way of transforming stationary kernels into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).
This is of interest in the context of composing kernels to have known desirable properties via known transforms, and also of learning (somewhat) arbitrary transforms to attain stationarity.

Miscellaneous nonstationary kernels
https://danmackinlay.name/notebook/kernel_nonstationary.html
Thu, 21 Jan 2021 10:55:36 +1100
References
Kernels that are nonstationary, constructed by means other than warping stationary ones.
Maybe start with Jun and Stein (2008); Fuglstad et al. (2015); Fuglstad et al. (2013)?

References: Bolin, David, and Kristin Kirchner. 2020. “The Rational SPDE Approach for Gaussian Random Fields With General Smoothness.” Journal of Computational and Graphical Statistics 29 (2): 274–85. https://doi.org/10.1080/10618600.2019.1665537. Bolin, David, and Finn Lindgren. 2011. “Spatial Models Generated by Nested Stochastic Partial Differential Equations, with an Application to Global Ozone Mapping.

Bayesians vs frequentists
https://danmackinlay.name/notebook/bayesians_vs_frequentists.html
Thu, 14 Jan 2021 15:35:59 +1100
Avoiding the whole accursed issue · Frequentist vs Bayesian acrimony · Strong Bayesianism · References
Disagreements in posterior updates
Sundry schools of thought on how to stitch mathematics to the world; brief notes and questions thereto. Justin Domke wrote a Dummy’s guide to risk and decision theory, which explains the different assumptions underlying each methodology from the risk and decision theory angle.
A lot of the obvious debates here are, IMO, uninteresting.

Statistical mechanics of statistics
https://danmackinlay.name/notebook/statistical_mechanics_of_statistics.html
Wed, 06 Jan 2021 12:46:59 +1100
Phase transitions in statistical inference · Replicator equations and evolutionary processes · References
Boaz Barak has a miniature dictionary for statisticians:
I’ve always been curious about the statistical physics approach to problems from computer science. The physics-inspired algorithm survey propagation is the current champion for random 3SAT instances, statistical-physics phase transitions have been suggested as explaining computational difficulty, and statistical physics has even been invoked to explain why deep learning algorithms seem to often converge to useful local minima.

Covariance functions
https://danmackinlay.name/notebook/covariance_kernels.html
Tue, 05 Jan 2021 15:07:38 +1100
Covariance kernels of some example processes · A simple Markov chain · The Hawkes process · Gaussian processes · General real covariance kernels · Bonus: complex covariance kernels · Kernel zoo · Learning kernels · Non-positive kernels · References
A realisation of a nonstationary rough covariance process (partially observed)
On the interpretation of kernels as the covariance functions of stochastic processes, which is one way to define stochastic processes.
Suppose we have a real-valued stochastic process

Diagramming and visualising graphical models
https://danmackinlay.name/notebook/diagrams_graphical_models.html
Sun, 03 Jan 2021 20:41:12 +1100
Daggity · dagR · yEd · diagrammeR · Mermaid · TETRAD · Matplotlib · Graphviz · tikz · Misc · References
On the art and science of algorithmic line drawings for representing graphical models, which is an important part of statistics. The diagrams we need here are nearly flowchart-like, so I can sketch them with a flowchart tool if need be; but they are closely integrated with the equations of a particular statistical model, so I would like to produce both in the same system to avoid tedious and error-prone manual syncing.

Psychometrics
https://danmackinlay.name/notebook/psychometrics.html
Fri, 18 Dec 2020 09:38:19 +1100
Causality and confounding · G-factor · No but actually 8 factors · Big-5 personality traits · Methodology · References
On measuring the minds of people, and possibly even discovering something about them thereby.
Causality and confounding: some attempts to measure people’s minds end up tautological rather than explanatory. Because of my own intellectual history, I think of this as what Bateson (2002) calls a dormitive principle problem.

Reparameterization tricks in inference
https://danmackinlay.name/notebook/reparameterization_trick.html
Fri, 18 Dec 2020 09:28:14 +1100
For variational autoencoders · “Normalized” flows · For vanilla density estimation · Representational power of · Tutorials · References
Approximating the desired distribution by perturbation of the available distribution
A trick used in e.g. variational inference, especially autoencoders, and in density estimation in probabilistic deep learning, best summarised as “a fancy change of variables so that I can differentiate through the parameters of a distribution”. There are connections to optimal transport and likelihood-free inference, in that this trick can enable some clever approximate-likelihood approaches.

Garbled highlights from NeurIPS 2020
https://danmackinlay.name/post/neurips2020.html
Fri, 11 Dec 2020 09:35:39 +1100
Workshops: Machine Learning for Creativity and Design; Workshop on Deep Learning and Inverse Problems; Differentiable vision, graphics, and physics applied to machine learning; Learning Meaningful Representations of Life; Tackling Climate Change with Machine Learning; AI for Earth Sciences; Causal Discovery & Causality-Inspired Machine Learning; Interpretable Inductive Biases and Physically Structured Learning.
Interesting papers, by ad hoc theme.
Causality: Causal Imitation Learning With Unobserved Confounders; Causal Learning; Domain Adaptation as a Problem of Inference on Graphical Models; Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding; Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs; Differentiable Causal Discovery from Interventional Data.
Learning in continuous time/depth: Almost surely stable deep dynamics; Learning Differential Equations that are Easy to Solve; Dissecting Neural ODEs; STEER: Simple Temporal Regularization For Neural ODE Training; Generative Adversarial Networks by Solving Ordinary Differential Equations; Ode to an ODE; Time-Reversal Symmetric ODE Network; Hypersolvers: Toward Fast Continuous-Depth Models; On Second Order Behaviour in Augmented Neural ODEs; Neural Controlled Differential Equations for Irregular Time Series.
Learning with weird losses: Quantile Propagation for Wasserstein-Approximate Gaussian Processes; All your loss are belong to Bayes; The Wasserstein Proximal Gradient Algorithm; Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality.
ML physical sciences. References: Alexander Norcliffe, Cristian Bodnar, Ben Day, Jacob Moss, and Pietro Liò.

Data sets
https://danmackinlay.name/notebook/data_sets.html
Wed, 02 Dec 2020 14:11:44 +1100
Datasets about Australia · Miscellaneous data sets · Collected open data sets at cloud providers · Social network-ey ones · 3d sensor data · Geodata · Generic tools for construction thereof
Datasets about Australia: see Australia in data.
Miscellaneous data sets: sods/ods: Open Data Science includes a huge number of small and useful datasets wrapped in a python interface. The documentation is not clear or obvious, and the release schedule is abysmal.

Free energy
https://danmackinlay.name/notebook/free_energy.html
Tue, 01 Dec 2020 15:26:58 +1100
In variational Bayes · As a model for cognition · References
Not “free as in speech” or “free as in beer”, nor “free energy” in the sense of perpetual motion machines, zero-point energy, or pills that turn your water into petroleum, but rather a particular mathematical object that pops up in variational Bayes inference and in wacky theories of cognition.
In variational Bayes: variational Bayes inference is a formalism for learning, borrowing bits from statistical mechanics and graphical models.

Free energy
https://danmackinlay.name/notebook/predictive_processing.html
Tue, 01 Dec 2020 15:26:58 +1100
Free energy · References
Confirmation Bias in Action – Put A Number On It!; Book Review: Surfing Uncertainty | Slate Star Codex. Related: mind as learning process.
Free energy: this term, with an analogous definition to its use in variational inference, appears to pop up in a “free energy principle”, where it is instrumental as a unifying concept for learning systems such as brains.

Random embeddings and hashing
https://danmackinlay.name/notebook/random_embedding.html
Tue, 01 Dec 2020 14:01:36 +1100
References
Separation of inputs by random projection
See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, randomised regressions, etc.
Occasionally we might use non-linear projections to increase the dimensionality of our data, in the hope of making a non-linear regression problem approximately linear, an idea which dates back to Cover (1965).
Cover’s Theorem (Cover 1965):
It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems.

Randomised regression
https://danmackinlay.name/notebook/randomised_regression.html
Tue, 01 Dec 2020 14:00:10 +1100
References
Tackling your regression by using random projections of the predictors.
Usually this means using those projections to reduce the dimensionality of a high-dimensional regression, in which case it is not far from compressed sensing, except in how we handle noise. In the linear-model case this is of course randomised linear algebra, and may amount to a randomised matrix factorisation.
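The basic recipe can be sketched in a few lines (dimensions and data are illustrative, not from the source): project a wide design matrix down to a few random directions and regress on those instead.

```python
import numpy as np

# Reduce a high-dimensional regression by randomly projecting the
# predictors, then solving ordinary least squares in the lower dimension.
rng = np.random.default_rng(5)
n, p, k = 200, 1000, 50  # n observations, p predictors, k << p projected dims
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                     # a few truly active predictors
y = X @ beta + 0.1 * rng.normal(size=n)

S = rng.normal(size=(p, k)) / np.sqrt(k)  # random projection matrix
Z = X @ S                                  # projected predictors, n x k
gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)

# The projected model cannot recover beta itself, but it still predicts:
resid = y - Z @ gamma
assert np.var(resid) < np.var(y)
```

Note the contrast with compressed sensing: there the goal is to recover `beta` from the projection; here we settle for prediction in the projected coordinates.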
I am especially interested in seeing how this might be useful for dependent data, especially time series.

Distribution regression
https://danmackinlay.name/notebook/distribution_regression.html
Tue, 01 Dec 2020 08:31:58 +1100
References
Poczos et al. (2013):
‘Distribution regression’ refers to the situation where a response \(Y\) depends on a covariate \(P\), where \(P\) is a probability distribution. The model is \(Y=f(P)+\mu\) where \(f\) is an unknown regression function and \(\mu\) is a random error. Typically, we do not observe \(P\) directly, but rather, we observe a sample from \(P\).
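A crude illustration of the setup (my own toy, not from Poczos et al.): each covariate is a sample from an unseen \(P_i\); we featurise each sample by summary statistics (an ad hoc stand-in for a kernel mean embedding) and fit a linear regression of \(Y\) on those features.

```python
import numpy as np

# Toy distribution regression: Y depends on P only through (mu, sd);
# we observe only a sample from each P.
rng = np.random.default_rng(6)

def make_case():
    mu, sd = rng.uniform(-2, 2), rng.uniform(0.5, 2)
    sample = rng.normal(mu, sd, size=500)   # what we actually observe
    y = mu**2 + sd                          # response depends on P via f
    return sample, y

samples, ys = zip(*[make_case() for _ in range(300)])
# Moment features per sample: mean, std, second raw moment
feats = np.array([[s.mean(), s.std(), (s**2).mean()] for s in samples])
A = np.c_[np.ones(len(ys)), feats]
w, *_ = np.linalg.lstsq(A, np.array(ys), rcond=None)
pred = A @ w
# f(P) = mu^2 + sd is close to linear in these moment features,
# so the fit should be decent.
assert np.corrcoef(pred, np.array(ys))[0, 1] > 0.8
```

Proper treatments replace the moment features with richer embeddings of the empirical distribution, but the two-stage structure (sample → feature vector → regression) is the same.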
References: Bachoc, F., F.

Databases viewers / editors
https://danmackinlay.name/notebook/database_uis.html
Tue, 01 Dec 2020 07:54:52 +1100
Filing · Directus · OpenRefine · Sqlite browser · sqlite studio · MySQL Workbench · datasette · sqlelectron · redash · dbeaver
If you want to access databases and query/interact/edit them, there are myriad options. I’ve split off the tools that specialize in visualisation and plotting under data dashboards. You can sometimes get db-editing-like behaviour out of spreadsheets. See Data organization in spreadsheets.
Filing: graphql-editor/graphql-editor: 📺 Visual Editor & GraphQL IDE. Draw GraphQL schemas using visual 🔷 nodes and explore GraphQL API with beautiful UI.

Recommender systems
https://danmackinlay.name/notebook/recommender_systems.html
Mon, 30 Nov 2020 14:55:18 +1100
References
Not my area, but I need a landing page to refer to for some non-specialist contacts of mine.
I am most familiar with the matrix factorization approaches (e.g. factorization machines, NNMF), but there are many others; e.g. variational autoencoder approaches are en vogue.
An overview by Javier lists many approaches.
Most Popular recommendations (the baseline); Item-User similarity based recommendations; kNN Collaborative Filtering recommendations; GBM based recommendations; Non-Negative Matrix Factorization recommendations; Factorization Machines (Steffen Rendle, 2010); Field Aware Factorization Machines (Yuchin Juan et al., 2016); Deep Learning based recommendations (Wide and Deep, Heng-Tze Cheng et al., 2016); Neural Collaborative Filtering (Xiangnan He et al.

R packaging, installation etc
https://danmackinlay.name/notebook/r_packaging.html
Mon, 30 Nov 2020 09:54:55 +1100
Installing R · Installing R packages · Smoother experience for macos · Choosing package location · Dependency management · Writing packages · easy project reload · Hadley Wickham pro-style
Installing R: the following seemed to be simplest for me on macos.
brew install --cask r
Note that brew install r without the --cask part leads to some compilation issues for package dependencies.
For Ubuntu, it seems less troublesome to use the OS packages.

Variational inference by message-passing in graphical models
https://danmackinlay.name/notebook/message_passing.html
Wed, 25 Nov 2020 17:42:32 +1100
References
Variational inference where the model factorizes over some graphical independence structure, which means we get cheap and distributed inference. I am currently particularly interested in this for latent GP models. Many things can be expressed as message-passing algorithms. The grandparent idea in this unification seems to be “belief propagation”, a.k.a. “sum-product message-passing”, credited to Pearl (1982) for DAGs and then generalised to MRFs, PGMs, factor graphs, etc.

Probabilistic spectral analysis
https://danmackinlay.name/notebook/probabilistic_spectral_analysis.html
Wed, 25 Nov 2020 11:33:34 +1100
Classic: stochastic processes studied via correlation function · Non-stationary spectral kernel · Change point detection version · Non-Gaussian approaches · References
Graphical introduction to nonstationary modelling of audio data: the input (bottom) is a sound recording of female speech. We seek to decompose the signal into Gaussian process carrier waveforms (blue block) multiplied by a spectrogram (green block). The spectrogram is learned from the data as a nonnegative matrix of weights times positive modulators (top).

Hidden Markov Model inference for Gaussian Process regression
https://danmackinlay.name/notebook/gp_filtering.html
Wed, 25 Nov 2020 11:28:43 +1100
Spatio-temporal usage · Miscellaneous notes towards implementation · References
Classic flavours together: Gaussian processes and state filters / stochastic differential equations, and random fields as stochastic differential equations.
I am interested here in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering does Wiener filtering. This means you get to solve a GP regression as an SDE.

Data dashboards
https://danmackinlay.name/notebook/data_dashboards.html
Wed, 18 Nov 2020 10:06:11 +1100
Dash · Voilà · streamlit · Grafana · R · Tableau · superset · blazer · Metabase · Database flow
At the intersection of data visualisation and database UI is the data dashboard. AFAICT this means “an exploratory graphing tool for your data which requires little or no programming or statistics special knowledge”. Occasionally useful. Occasionally cargo-culted by bizdev people who don’t know what they are doing. See also, e.g., the open source dashboard framework roundup, or the alternativeto Tableau listing.

Tensorflow
https://danmackinlay.name/notebook/tensorflow.html
Tue, 10 Nov 2020 14:20:46 +1100
Abstractions · Tutorials · Debugging · Tensorboard · Getting data in · (Non-recurrent) convolutional networks · Recurrent networks · Official documentation · Community guides · Keras: the recommended way of using tensorflow · Getting models out · Training in the cloud because you don’t have NVIDIA sponsorship · Extending · Misc HOWTOs · Nightly builds · Dynamic graphs · GPU selection · Silencing tensorflow · Hessians and higher order optimisation · Manage tensorflow environments · Optimisation tricks · Probabilistic networks
A C++/Python/etc neural network toolkit by Google.

External validity
https://danmackinlay.name/notebook/external_validity.html
Mon, 09 Nov 2020 15:58:56 +1100
Standard graphical models · Tools · Salad · Meta · References
TBD.
This Māori gentleman from the 1800s demonstrates an artful transfer learning from the western fashion domain.
One could read Sebastian Ruder’s NN-style introduction to “transfer learning”. NN people like to think about this in a particular way, which I like because of the diversity of out-of-the-box ideas it invites, and which I dislike because it is sloppy.

Observability and sensitivity in learning dynamical systems
https://danmackinlay.name/notebook/sensitivity.html
Mon, 09 Nov 2020 13:38:40 +1100
References
The contact between ergodic theorems and statistical identifiability: how precisely can I learn a given parameter of a dynamical system from observation? In ODE theory a useful concept is sensitivity analysis, which tells us how much gradient information our observations give us about a parameter. This comes in local (at my current estimate) and global (for all parameter ranges) flavours.
In linear systems theory, the term observability is used to discuss whether we can in fact identify a parameter or a latent state; I will conflate the two for current purposes.

Weighted data in statistics
https://danmackinlay.name/notebook/weighted_data.html
Fri, 06 Nov 2020 08:48:18 +1100
Thomas Lumley helpfully disambiguates the “three and a half distinct uses of the term weights in statistical methodology”.
The three main types of weights are
the ones that show up in the classical theory of weighted least squares. These describe the precision (1/variance) of observations. … I call these precision weights; Stata calls them analytic weights.
the ones that show up in categorical data analysis. These describe cell sizes in a data set, so a weight of 10 means that there are 10 identical observations in the dataset, which have been compressed to a covariate pattern plus a count.

Causal inference on DAGs
https://danmackinlay.name/notebook/causal_inference.html
Wed, 04 Nov 2020 12:36:13 +1100
Learning materials · Counterfactuals · External validity · Propensity scores · Causal graph inference from data · Causal time series DAGs · Drawing graphical models · Tools · References
Inferring the optimal intervention requires accounting for which arrows are independent of which.
Inferring cause and effect from nature. Graphical models and related techniques for doing it. Avoiding the danger of folk statistics. Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions, moral equivalence…
Statistics, computational complexity thereof
https://danmackinlay.name/notebook/statistics_complexity.html
Tue, 03 Nov 2020 16:04:50 +1100https://danmackinlay.name/notebook/statistics_complexity.htmlReferences Statistical inference when computation isn’t free; what does this tell us about the learnable?
References Bossaerts, Peter, Nitin Yadav, and Carsten Murawski. 2019. “Uncertainty and Computational Complexity.” Philosophical Transactions of the Royal Society B: Biological Sciences 374 (1766): 20180138. https://doi.org/10.1098/rstb.2018.0138. Frey, B. J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Classification
https://danmackinlay.name/notebook/classification.html
Mon, 02 Nov 2020 08:02:45 +1100https://danmackinlay.name/notebook/classification.htmlMulti-label Unbalanced class problems Calibration Metric Zoo Matthews correlation coefficient ROC/AUC Cross entropy f-measure et al Philosophical connection to semantics References Multi-label Precision/Recall and f-scores all work for multi-label classification, although they behave badly on unbalanced classes.
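The Matthews correlation coefficient, by contrast, uses all four confusion-matrix cells, which is why it stays honest on unbalanced classes where the F-score flatters. A quick sketch from raw counts (my own toy code, not from the post):

```python
import math

def f1_score(tp, fp, fn):
    """F1 ignores true negatives entirely."""
    return 2 * tp / (2 * tp + fp + fn)

def matthews_corrcoef(tp, fp, fn, tn):
    """MCC uses all four cells; 0 by convention when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# A classifier that just says "positive" on a 90/10 data set:
# F1 looks great, while MCC correctly reports no skill.
tp, fp, fn, tn = 90, 10, 0, 0
```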
Unbalanced class problems 🏗
Calibration Kenneth Tay says
In the context of binary classification, calibration refers to the process of transforming the output scores from a binary classifier to class probabilities.
Publication bias
https://danmackinlay.name/notebook/publication_bias.html
Sun, 01 Nov 2020 15:16:14 +1100https://danmackinlay.name/notebook/publication_bias.htmlFixing P-hacking References We’re out here everyday, doing the dirty work finding noise and then polishing it into the hypotheses everyone loves. It’s not easy. —John Schmidt, The noise miners
The noise mining process
Multiple testing across a whole scientific field, with a side helping of uneven data release.
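The arithmetic of the mining operation is easy to simulate: if many studies of true-null effects each test at p < 0.05, “discoveries” arrive at roughly the false-positive rate, field-wide. A toy simulation of my own devising:

```python
import random
import statistics

random.seed(1)
crit = 1.96  # two-sided 5% critical value for a z-statistic
n_studies, n_obs = 1000, 50

publishable = 0
for _ in range(n_studies):
    # Every study here measures a true-null effect: pure mean-zero noise.
    sample = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
    z = statistics.mean(sample) / (statistics.stdev(sample) / n_obs ** 0.5)
    if abs(z) > crit:
        publishable += 1

# Roughly 5% of these pure-noise studies clear the significance bar.
```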
On one hand we hope that journals will help us find things that are relevant.ELBO
https://danmackinlay.name/notebook/elbo.html
Wed, 28 Oct 2020 10:59:07 +1100https://danmackinlay.name/notebook/elbo.htmlReferences \(\renewcommand{\Ex}{\mathbb{E}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\kl}{\operatorname{KL}} \renewcommand{\H}{\mathbb{H}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\pd}{\partial}\)
On using the most convenient probability metric (i.e. KL divergence) to do variational inference.
There is nothing novel here. But everyone who is doing variational inference has to work through this just once, and I’m doing so here.
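The decomposition in question, in the notation set up above, for data \(\vv{x}\), latents \(\vv{z}\) and an approximating density \(q\) (nothing here beyond the textbook identity):
\[
\begin{aligned}
\log p(\vv{x})
&= \underbrace{\Ex_{q(\vv{z})}\left[\log \frac{p(\vv{x}, \vv{z})}{q(\vv{z})}\right]}_{\text{ELBO}}
+ \kl\left(q(\vv{z})\,\middle\|\,p(\vv{z}\mid\vv{x})\right),\\
\text{ELBO}
&= \Ex_{q(\vv{z})}\left[\log p(\vv{x}\mid\vv{z})\right]
- \kl\left(q(\vv{z})\,\middle\|\,p(\vv{z})\right).
\end{aligned}
\]
Since \(\kl \geq 0\), the ELBO lower-bounds \(\log p(\vv{x})\); and since \(\log p(\vv{x})\) does not depend on \(q\), maximising the ELBO over \(q\) minimises the KL divergence to the true posterior.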
Yuge Shi’s introduction is the best short intro that gets to the state of the art. The canonical intro is de Garis Matthews (2017), who did a thesis on it.
Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100https://danmackinlay.name/notebook/gp_factoring.htmlBasic sparsity via inducing variables SVI for Gaussian processes Latent Gaussian Process models References There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.
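The simplest such factorisation is the inducing-point (Nyström / subset-of-regressors) trick: pick \(m \ll n\) inducing inputs \(Z\) and stand in for the full kernel matrix with the rank-\(m\) matrix \(K_{xz} K_{zz}^{-1} K_{zx}\), so downstream solves cost \(O(nm^2)\) rather than \(O(n^3)\). A bare numpy sketch (illustrative only; the variational versions layer more machinery on top of this):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix between 1-d input sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))  # n = 200 training inputs
z = np.linspace(0, 10, 30)            # m = 30 inducing inputs

Kxz = rbf(x, z)
Kzz = rbf(z, z) + 1e-8 * np.eye(len(z))       # jitter for numerical stability
K_approx = Kxz @ np.linalg.solve(Kzz, Kxz.T)  # rank-m stand-in for rbf(x, x)

# The smooth SE kernel has rapidly decaying spectrum, so a rank-30
# approximation of the 200x200 kernel matrix is already very close.
rel_err = np.linalg.norm(rbf(x, x) - K_approx) / np.linalg.norm(rbf(x, x))
```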