machine_learning on Dan MacKinlay
https://danmackinlay.name/tags/machine_learning.html
Recent content in machine_learning on Dan MacKinlay. Generated by Hugo (gohugo.io), en-us. Tue, 11 May 2021 11:36:54 +1000

Infinite width limits of neural networks
https://danmackinlay.name/notebook/nn_infinite_width.html
Tue, 11 May 2021 11:36:54 +1000
https://danmackinlay.name/notebook/nn_infinite_width.html
Contents: Neural Network Gaussian Process; Neural Network Tangent Kernel; Implicit regularization; Dropout; As stochastic processes; References
Large-width limits of neural nets.
Neural Network Gaussian Process
For now: See Neural network Gaussian process on Wikipedia.
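The infinite-width claim is easy to check empirically: sample many random one-hidden-layer nets and watch the joint distribution of their outputs at a pair of inputs settle down to a fixed Gaussian as width grows. A minimal numpy sketch, with the usual NNGP scaling assumed (output weights scaled by 1/√width):

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_net_outputs(x1, x2, width, n_nets=2000):
    """Outputs of many random 1-hidden-layer tanh nets at two scalar inputs."""
    W = rng.normal(size=(n_nets, width))   # input-to-hidden weights ~ N(0, 1)
    b = rng.normal(size=(n_nets, width))   # hidden biases ~ N(0, 1)
    # NNGP scaling: hidden-to-output weights ~ N(0, 1/width)
    v = rng.normal(size=(n_nets, width)) / np.sqrt(width)
    f1 = (v * np.tanh(W * x1 + b)).sum(axis=1)
    f2 = (v * np.tanh(W * x2 + b)).sum(axis=1)
    return f1, f2

# As width grows, (f(x1), f(x2)) across random nets approaches a bivariate
# Gaussian whose 2x2 covariance is the NNGP kernel evaluated at (x1, x2).
f1, f2 = wide_net_outputs(0.3, -0.5, width=4096)
C = np.cov(f1, f2)   # empirical estimate of the NNGP kernel matrix
```

Repeating this for a few widths shows the covariance stabilising, which is the practical content of the Neal (1996a) result.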
The field that sprang from the insight (Neal 1996a) that in the infinite limit, random neural nets with Gaussian weights and appropriate scaling asymptotically approach Gaussian processes, and there are useful conclusions we can draw from that.

Compressing neural nets
https://danmackinlay.name/notebook/nn_compressing.html
Fri, 07 May 2021 09:22:39 +1000
https://danmackinlay.name/notebook/nn_compressing.html
Contents: Lottery tickets; References
How to make neural nets smaller while still preserving their performance. This is a subtle problem, as we suspect that part of their special sauce is precisely that they are overparameterized, which is to say, one reason they work is precisely that they are bigger than they “need” to be. The problem of finding the smaller network hiding inside the bigger one it seems to need to be is tricky.

Overparameterization
https://danmackinlay.name/notebook/overparameterization.html
Fri, 07 May 2021 09:20:00 +1000
https://danmackinlay.name/notebook/overparameterization.html
Contents: References
Notes on the general technique of increasing the number of slack parameters you have, especially in machine learning. Maybe this should rather be simply parameterization, given that I am increasingly concerned with how to select parameterisations for problems with respect to which inference will find good optima, which is not necessarily
The combination of overparameterization and SGD is argued to be the secret to how deep learning works, by Zeyuan Allen-Zhu, Yuanzhi Li and Zhao Song.

Moral philosophy
https://danmackinlay.name/notebook/moral_philosophy.html
Wed, 21 Apr 2021 18:01:39 +0800
https://danmackinlay.name/notebook/moral_philosophy.html
The human side of moral calculations, except you can usually get away without having to write it down.
Miscellaneous link salad.
Robust egg offsetting: It is hard to explain the experience of reading this, but it is recommended. Imagine dining with a person who wishes to offset cruelty but also does not wish to implement policies to reduce cruelty en masse. I foresee a fascinating chat over foie gras and ortolan.

Machine learning for partial differential equations
https://danmackinlay.name/notebook/ml_pde.html
Tue, 13 Apr 2021 16:24:05 +0800
https://danmackinlay.name/notebook/ml_pde.html
Contents: Learning a PDE; The PINN lineage; Deterministic PINN; Stochastic PINN; Weak formulation; Learning a PDE forward operator; Fourier neural operator; DeepONet; Advection-diffusion PDEs in particular; Boundary conditions; Inverse problems; Differentiable solvers; DeepXDE; ADCME; TenFEM; JuliaFEM; Trixi; FEniCS; taichi; References
Using statistical or machine learning approaches to solve PDEs, and maybe even to perform inference through them. There are various approaches here, which I will document on an ad hoc basis as I need them.

ML benchmarks and their pitfalls
https://danmackinlay.name/notebook/ml_benchmarks.html
Tue, 13 Apr 2021 16:01:37 +0800
https://danmackinlay.name/notebook/ml_benchmarks.html
Contents: References
Machine learning’s gamified version of the replication crisis is a paper mill, or perhaps paper treadmill. In this system something counts as “results” if it performs on some conventional benchmarks. But how often does that demonstrate real progress, and how often is it overfitting to benchmarks?
Oleg Trott on How to sneak up competition leaderboards.
Filip Piekniewski on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law.

Hydrology models
https://danmackinlay.name/notebook/hydrology.html
Tue, 13 Apr 2021 14:36:12 +0800
https://danmackinlay.name/notebook/hydrology.html
Contents: Tools; Simulation; Geostats Framework; Datasets; References
Refugees from Brazil’s Grande Seca drought of 1878
TBD: Ground water hydrology, surface water hydrology, coastal water hydrology… UCSB Climate hazards data.
Tools
Simulation
AFAICT the go-to applied reference here is Anderson, Woessner, and Hunt (2015).
MODFLOW
MODFLOW is the USGS’s modular hydrologic model. MODFLOW is considered an international standard for simulating and predicting groundwater conditions and groundwater/surface-water interactions.

Machine learning for physical sciences
https://danmackinlay.name/notebook/ml_physics.html
Fri, 09 Apr 2021 11:48:30 +0800
https://danmackinlay.name/notebook/ml_physics.html
Contents: ML for PDEs; Causality, identifiability, and observational data; Likelihood free inference; Emulation approaches; The other direction: What does physics say about learning?; But statistics is ML; Applications; References
Consider a spherical flame
In physics, typically, we are concerned with identifying True Parameters for Universal Laws, applicable without prejudice across all the cosmos. We are hunting something like the Platonic ideals that our experiments are poor shadows of.

Statistics and machine learning
https://danmackinlay.name/notebook/statistics_ml.html
Thu, 08 Apr 2021 13:08:28 +0800
https://danmackinlay.name/notebook/statistics_ml.html
Contents: Taxonomies; Gotchas; References
This page mostly exists to collect a good selection of overview statistics introductions that are not terrible. I’m especially interested in modern fusion methods that harmonise what we would call statistics and machine learning methods, and in the unnecessary terminological confusion between those systems.
Here are some recommended courses to get started if you don’t know what you’re doing.
Larry Wasserman’s stats course
Shalizi’s regression lectures
Moritz Hardt and Benjamin Recht, Patterns, predictions, and actions: A story about machine learning
See also the recommended texts below.

pytorch
https://danmackinlay.name/notebook/pytorch.html
Tue, 06 Apr 2021 17:39:12 +1000
https://danmackinlay.name/notebook/pytorch.html
Contents: Getting started; DSP in pytorch; Custom functions; It’s just as well it’s easy to roll your own recurrent nets because the default implementations are bad; Logging and visualizing training; Visualising graphs; Utility libraries, derived software; Lightning; ODEs; NLP; Visdom; Fenics; Pyro; pyprob; Inferno; Kornia; TNT; Sparse matrixes; Debugging; Memory leaks; References
Successor to Lua’s torch. Evil twin to Google’s Tensorflow.

Model interpretation and explanation
https://danmackinlay.name/notebook/model_interpretation.html
Mon, 15 Mar 2021 16:45:55 +1100
https://danmackinlay.name/notebook/model_interpretation.html
Contents: References
The meeting point of differential privacy, accountability, interpretability, the tank detection story, and clever horses in machine learning.
Closely related: am I explaining the model so I can see if it is fair?
There is much work here; I understand little of it at the moment, but I keep needing to refer to papers.
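The local-surrogate idea behind LIME is simple enough to sketch in a few lines: perturb the input, weight the perturbations by proximity, and fit a weighted penalised linear model whose coefficients serve as local attributions. This is a sketch of the idea only, not the `lime` library’s API, and the helper name and parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def lime_sketch(black_box, x, n_samples=500, scale=0.5, lam=1e-3):
    """Local explanation in the LIME spirit: perturb around x, weight by
    proximity, fit a weighted ridge regression (the penalised step)."""
    d = x.shape[0]
    Z = x + scale * rng.normal(size=(n_samples, d))           # local perturbations
    y = black_box(Z)
    w = np.exp(-((Z - x) ** 2).sum(axis=1) / (2 * scale**2))  # proximity weights
    X = np.hstack([Z - x, np.ones((n_samples, 1))])           # linear + intercept
    A = (X * w[:, None]).T @ X + lam * np.eye(d + 1)          # weighted ridge
    coef = np.linalg.solve(A, (X * w[:, None]).T @ y)
    return coef[:d]   # per-feature local attributions

f = lambda Z: np.sin(Z[:, 0]) + 0.1 * Z[:, 1] ** 2            # toy black box
attrib = lime_sketch(f, np.array([0.0, 1.0]))
```

At the chosen point the attributions approximate the local gradient of the black box, which is the sense in which the explanation is “local”.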
Frequently I need the link to LIME, a neat method that uses penalised regression to do local model explanations.

Diagramming and visualising graphical models
https://danmackinlay.name/notebook/diagrams_graphical_models.html
Mon, 15 Mar 2021 16:44:16 +1100
https://danmackinlay.name/notebook/diagrams_graphical_models.html
Contents: Daggity; dagR; yEd; diagrammeR; flowchart.fun; Mermaid; TETRAD; Matplotlib; Graphviz; tikz; Misc; References
On the art and science of algorithmic line drawings for representing graphical models, which is an important part of statistics. The diagrams we need here are nearly flowchart-like, so I can sketch them with a flowchart if need be; but they are closely integrated with the equations of a particular statistical model, so I would like to incorporate them into the same system to avoid tedious and error-prone manual sync.

Neural nets with implicit layers
https://danmackinlay.name/notebook/nn_implicit.html
Mon, 15 Mar 2021 12:16:50 +1100
https://danmackinlay.name/notebook/nn_implicit.html
Contents: References
A unifying framework for various networks, including neural ODEs, where our layers are not simple forward operations but whose evaluation is represented as the solution of some optimisation problem.
For some info see the NeurIPS 2020 tutorial, Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond, by Zico Kolter, David Duvenaud, and Matt Johnson.
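The simplest instance of the idea is a deep-equilibrium-style layer whose output is defined implicitly as a fixed point rather than by an explicit forward pass. A minimal sketch, solving the fixed point by naive iteration (real implementations use better root-finders and implicit differentiation; the names here are illustrative):

```python
import numpy as np

def deq_layer(x, W, U, n_iter=100):
    """Implicit layer: the output z* is defined by the fixed-point
    equation z = tanh(W z + U x), solved here by naive iteration."""
    z = np.zeros(W.shape[0])
    for _ in range(n_iter):
        z = np.tanh(W @ z + U @ x)
    return z

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4, 4))   # small norm, so the map is a contraction
U = rng.normal(size=(4, 3))
x = rng.normal(size=3)
z_star = deq_layer(x, W, U)
# verify z* really satisfies the implicit equation
residual = np.abs(z_star - np.tanh(W @ z_star + U @ x)).max()
```

The point of the framework is that gradients through z* can be obtained from the implicit function theorem without storing the iteration, which is what the tutorial above develops.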
NB: This is different to the implicit representation method. Since implicit layers and implicit representations also occur in the same problems (such as ML for PDEs), this terminological confusion will haunt us.

Statistics, computational complexity thereof
https://danmackinlay.name/notebook/statistics_complexity.html
Wed, 10 Mar 2021 07:29:58 +1100
https://danmackinlay.name/notebook/statistics_complexity.html
Contents: References
Statistical inference when computation isn’t free and memory is not infinite; what does this tell us about the learnable?
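Naïve Gaussian process regression is the canonical example of a cubic-cost method: the posterior mean requires factorising an N×N kernel matrix. A minimal sketch (the function name and hyperparameters are illustrative):

```python
import numpy as np

def gp_posterior_mean(X, y, Xstar, lengthscale=1.0, noise=0.1):
    """Naive GP regression posterior mean. The Cholesky factorisation of
    the N x N kernel matrix is the O(N^3) step."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * lengthscale**2))   # RBF kernel
    K = k(X, X) + noise**2 * np.eye(len(X))         # N x N Gram matrix
    L = np.linalg.cholesky(K)                       # O(N^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return k(Xstar, X) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
mu = gp_posterior_mean(X, y, np.array([[0.0]]))     # should be near sin(0)
```

Doubling N multiplies the cost of the Cholesky step by roughly eight, which is why so much of the scalable-GP literature is about avoiding it.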
Lots of statistics is already concerned with complexity scaling; in Gaussian process regression, for example, we care a lot about avoiding naïve methods that scale as \(\mathcal{O}(N^3)\). In optimisation we care greatly about the trade-off between approach to the optimum and number of steps.

Here’s how I would do art with machine learning if I had to
https://danmackinlay.name/notebook/generative_art_nn.html
Tue, 09 Mar 2021 13:44:46 +1100
https://danmackinlay.name/notebook/generative_art_nn.html
Contents: Visual synthesis; Text synthesis; Music; Symbolic composition via scores/MIDI/etc; Audio synthesis; Misc; References
I’ve a weakness for ideas that give me plausible deniability for making generative art while doing my maths homework.
Quasimondo: so do you.
This page is more chaotic than the already-chaotic median, sorry. Good luck making sense of it. The problem here is that this notebook is in the anti-sweet spot of “stuff I know too much about to need notes but not working on enough to promote”.

Neural nets with basis decomposition layers
https://danmackinlay.name/notebook/nn_basis.html
Tue, 09 Mar 2021 12:06:42 +1100
https://danmackinlay.name/notebook/nn_basis.html
Contents: Neural networks with continuous basis functions; Convolutional neural networks as sparse coding; References
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyze neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons; perhaps the right basis gives you better priors for understanding a partial differential equation?

Stochastic processes which represent measures over the reals
https://danmackinlay.name/notebook/measure_priors.html
Mon, 08 Mar 2021 16:44:16 +1100
https://danmackinlay.name/notebook/measure_priors.html
Contents: Subordinators; Other measure priors; References
Often I need to have a nonparametric representation for a measure over some non-finite index set. We might want to represent a probability, or mass, or a rate. I might want this representation to be something flexible and low-assumption, like a Gaussian process. If I want a nonparametric representation of functions this is not hard; I can simply use a Gaussian process.

Convolutional subordinator processes
https://danmackinlay.name/notebook/subordinator_convolution.html
Mon, 08 Mar 2021 15:29:19 +1100
https://danmackinlay.name/notebook/subordinator_convolution.html
Contents: References
Stochastic processes by convolution of noise with smoothing kernels, where the driving noise is a Lévy subordinator.
Why would we want this? One reason is that this gives us a way to create nonparametric distributions over measures.
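A minimal discretised sketch of the construction: simulate the increments of a gamma subordinator on a grid (independent, nonnegative jumps) and convolve them with a smoothing kernel, yielding a nonnegative random function that can serve as an unnormalised random measure. Grid sizes and rate parameters here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gamma subordinator increments on a grid: independent, nonnegative jumps.
dt = 0.01
t = np.arange(0, 10, dt)
increments = rng.gamma(shape=1.0 * dt, scale=1.0, size=t.size)

# Convolve with a nonnegative smoothing kernel: the result is a
# nonnegative random "density", i.e. a draw from a random measure
# (unnormalised), smoother than the raw jump process.
kernel_t = np.arange(-1, 1, dt)
kernel = np.exp(-kernel_t**2 / (2 * 0.1**2))
field = np.convolve(increments, kernel, mode="same")
```

Nonnegativity is automatic because both the subordinator increments and the kernel are nonnegative, which is exactly what a measure-valued prior needs.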
References
Barndorff-Nielsen, O. E., and J. Schmiegel. 2004. “Lévy-Based Spatial-Temporal Modelling, with Applications to Turbulence.” Russian Mathematical Surveys 59 (1): 65. https://doi.org/10.1070/RM2004v059n01ABEH000701.
Çinlar, E. 1979. “On Increasing Continuous Processes.

Mind reading by computer
https://danmackinlay.name/notebook/mind_reading.html
Wed, 03 Mar 2021 10:44:24 +1100
https://danmackinlay.name/notebook/mind_reading.html
Contents: Base level: brain imaging; Advanced: brain decoding; References
A placeholder.
I’d like to know how good the results are getting in this area, and how general across people/technologies etc. How close are we to the point that someone can put an arbitrary individual in some kind of tomography machine and say what they are thinking without pre-training or priming?
Base level: brain imaging
The instruments we have are blunt.

Memory in machine learning
https://danmackinlay.name/notebook/ml_memory.html
Wed, 03 Mar 2021 09:38:20 +1100
https://danmackinlay.name/notebook/ml_memory.html
Contents: References
How best should learning mechanisms store and retrieve memories? Important in reinforcement learning. Implicit in recurrent networks. One of the chief advantages of neural Turing machines. A great apparent success of transformers.
But, as my colleague Tom Blau points out, perhaps best considered as a topic in its own right.
References
Charles, Adam, Dong Yin, and Christopher Rozell. 2016. “Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks.

Convolutional Gaussian processes
https://danmackinlay.name/notebook/gp_convolution.html
Mon, 01 Mar 2021 17:08:51 +1100
https://danmackinlay.name/notebook/gp_convolution.html
Contents: Convolutions with respect to a non-stationary driving noise; Varying convolutions with respect to a stationary white noise; References
Gaussian processes by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) and non-stationary.
Convolutions with respect to a non-stationary driving noise
H. K.

Convolutional stochastic processes
https://danmackinlay.name/notebook/stochastic_convolution.html
Mon, 01 Mar 2021 16:13:24 +1100
https://danmackinlay.name/notebook/stochastic_convolution.html
Contents: References
Stochastic processes generated by convolution of white noise with smoothing kernels, which is not unlike kernel density estimation where the “data” is random.
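The Gaussian special case is one line of numpy: convolve a white-noise sequence with a smoothing kernel and you get a draw from a stationary Gaussian process whose covariance is the kernel convolved with itself. A sketch on a regular grid, with arbitrary kernel width:

```python
import numpy as np

rng = np.random.default_rng(0)

# A stationary Gaussian process sample on a grid, by convolving white
# noise with a Gaussian smoothing kernel (a "process convolution").
dt = 0.01
t = np.arange(0, 10, dt)
white = rng.normal(size=t.size) * np.sqrt(dt)     # discretised white noise
kernel_t = np.arange(-1, 1, dt)
kernel = np.exp(-kernel_t**2 / (2 * 0.2**2))      # smoothing kernel
sample = np.convolve(white, kernel, mode="same")
# implied covariance: the kernel convolved with itself (here again Gaussian)
```

Letting the kernel vary with location is what turns this into the non-stationary constructions of the previous entry.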
For now, I am mostly interested in certain special cases: Gaussian process convolutions and subordinator convolutions.
patrick-kidger/Deep-Signature-Transforms: Code for “Deep Signature Transforms”
patrick-kidger/signatory: Differentiable computations of the signature and logsignature transforms, on both CPU and GPU.
References
Bolin, David.

Causal inference in the continuous limit
https://danmackinlay.name/notebook/causality_continuous.html
Wed, 17 Feb 2021 20:41:08 +1100
https://danmackinlay.name/notebook/causality_continuous.html
Contents: References
Causality on continuous index spaces, and, which turns out to be related, equilibrium/feedback dynamics. Placeholder.
Bongers and Mooij (2018):
Uncertainty and random fluctuations are a very common feature of real dynamical systems. For example, most physical, financial, biochemical and engineering systems are subjected to time-varying external or internal random disturbances. These complex disturbances and their associated responses are most naturally described in terms of stochastic processes.

Neural net attention mechanisms
https://danmackinlay.name/notebook/nn_attention.html
Wed, 10 Feb 2021 14:22:50 +1100
https://danmackinlay.name/notebook/nn_attention.html
Contents: References
What’s that now?
Long story, but the most developed family here is the transformer; see the Sparse Transformer etc. for particularly developed examples and explanations of this sub-field. The best illustrated blog post is Jay Alammar’s Illustrated Transformer.
These networks are absolutely massive (heh) in natural language processing right now.
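The operation at the core of all of these is scaled dot-product attention, which is compact enough to sketch directly (a single unbatched head, no masking or learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # convex mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out = attention(Q, K, V)
```

Because each output row is a convex combination of value rows, the layer is differentiable and parallelisable across positions, which is part of why these models scale so well.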
A key point about these networks is that they can be made extremely large but still remain trainable.

Voice transcriptions and speech recognition
https://danmackinlay.name/notebook/speech_transcription.html
Sat, 30 Jan 2021 11:10:49 +1100
https://danmackinlay.name/notebook/speech_transcription.html
Contents: Dictation; Transcribing recordings; Automation
The converse to voice fakes: generating text from speech, a.k.a. speech-to-text.
This is an older practice than I thought. Check out Volume 89 of Popular Science monthly: Lloyd Darling, The Marvelous Voice Typewriter for the state-of-the-art dictation machine of 1916 (PDF version).
Dictation
Speaking as a realtime interactive textual input method. See the following roundups of dictation apps to start:
Zapier dictation roundup
the rather grimmer Linux-specific roundup.

Neural nets for “implicit representations”
https://danmackinlay.name/notebook/nn_implicit_rep.html
Thu, 21 Jan 2021 12:50:41 +1100
https://danmackinlay.name/notebook/nn_implicit_rep.html
Contents: References
A cute hack for generative neural nets. Unlike other structures, here we allow the output to depend upon image coordinates, rather than some presumed-invariant latent factors. I am not quite sure what the rationale is for implicit being used as a term here. Which representations are implicit or explicit seems particularly viewpoint-dependent.
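The coordinate-in, pixel-out idea can be sketched without training a full network: fit a function from pixel coordinates to intensities. Real implicit-representation nets (SIREN and friends) train an MLP by gradient descent; here, purely for illustration, I assume a random-Fourier-feature model fit by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# An "implicit representation": a function from pixel coordinates to
# intensity, fit to a tiny synthetic image.
H = W = 16
yy, xx = np.mgrid[0:H, 0:W] / H
coords = np.stack([yy.ravel(), xx.ravel()], axis=1)   # (H*W, 2) inputs
image = np.sin(6 * yy) * np.cos(6 * xx)               # toy "image" target

B = 10 * rng.normal(size=(2, 64))                     # random Fourier frequencies
feats = np.concatenate([np.sin(coords @ B), np.cos(coords @ B)], axis=1)
w, *_ = np.linalg.lstsq(feats, image.ravel(), rcond=None)
recon = (feats @ w).reshape(H, W)                     # image from coordinates alone
```

Once fit, `recon` can be evaluated at any continuous coordinate, which is the selling point: the image lives in the weights, at arbitrary resolution.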
NB this is different to the “implicit layers” trick, which allows an optimisation problem to be implicitly solved in a neural net.

Auditory features
https://danmackinlay.name/notebook/auditory_features.html
Thu, 14 Jan 2021 11:06:37 +1100
https://danmackinlay.name/notebook/auditory_features.html
Contents: Just ask someone; Deep neural networks; Sparse comb filters; Autocorrelation features; Linear Predictive coefficients; Cepstra; MFCC; Filterbanks; Dynamic dictionaries; Cochlear activation models; Units; References
In machine listening and related tasks like audio analysis we often want compact representations of audio signals in some manner that is not “raw”; something a little more useful than the simple record of the vibrations of the microphone as given by the signal pressure level time series.
https://danmackinlay.name/notebook/statistical_mechanics_of_statistics.html
Wed, 06 Jan 2021 12:46:59 +1100
https://danmackinlay.name/notebook/statistical_mechanics_of_statistics.html
Contents: Phase transitions in statistical inference; Replicator equations and evolutionary processes; References
Boaz Barak has a miniature dictionary for statisticians:
I’ve always been curious about the statistical physics approach to problems from computer science. The physics-inspired algorithm survey propagation is the current champion for random 3SAT instances, statistical-physics phase transitions have been suggested as explaining computational difficulty, and statistical physics has even been invoked to explain why deep learning algorithms seem to often converge to useful local minima.

Why does deep learning work?
https://danmackinlay.name/notebook/nn_why.html
Mon, 14 Dec 2020 18:09:05 +1100
https://danmackinlay.name/notebook/nn_why.html
Contents: Synthetic tutorials; Magic of (stochastic) gradient descent; … with saddle points; Magic of SGD+overparameterization; Function approximation theory; Crazy physics stuff I have not read; There is nothing to see here; References
No time to frame this well, but there are a lot of versions of the question, so… pick one. The essential idea is that we say: Oh my, that deep learning model I just trained had terribly good performance compared with some simpler thing I tried.

Ensembling neural nets
https://danmackinlay.name/notebook/nn_ensemble.html
Mon, 14 Dec 2020 17:16:45 +1100
https://danmackinlay.name/notebook/nn_ensemble.html
Contents: Explicit ensembles; Distilling; Dropout; Is this dangerous?; Questions; References
One of the practical forms of Bayesian inference for massively parameterised networks.
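The arithmetic of the explicit-ensemble recipe is simple: given predictions from K independently trained networks at the same test points, pool them. A sketch in numpy, where the member predictions are a synthetic stand-in for what K trained nets would produce:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the predictions of K independently trained networks at
# n_test points (in practice these come from K separate training runs).
K, n_test = 10, 100
member_preds = rng.normal(loc=1.0, scale=0.3, size=(K, n_test))

# Ensemble pooling: the mean estimates the posterior predictive mean,
# the across-member spread estimates the (epistemic) uncertainty.
pred_mean = member_preds.mean(axis=0)
pred_var = member_preds.var(axis=0, ddof=1)
```

The across-member variance is exactly the quantity that collapses when the members are not diverse enough, which is why initialisation and data-ordering randomness matter so much in this recipe.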
Explicit ensembles
Train a collection of networks and calculate empirical means and variances to estimate the posterior predictive mean and variance (He, Lakshminarayanan, and Teh 2020; Huang et al. 2016; Lakshminarayanan, Pritzel, and Blundell 2017; Wen, Tran, and Ba 2020; Xie, Xu, and Chuang 2013). This is neat, and on one hand we might think there is nothing special to do here since it’s already more or less classical model ensembling, as near as I can tell.

Bayesian deep learning
https://danmackinlay.name/notebook/nn_bayesian.html
Thu, 10 Dec 2020 17:36:51 +1100
https://danmackinlay.name/notebook/nn_bayesian.html
Contents: Backgrounders; Sampling from Stochastic Gradient Descent; Ensemble methods; Practicalities; References
Probably approximately a horse
WARNING: more than usually chaotic notes here
Bayesian inference for massively parameterised networks.
To learn:
marginal likelihood in model selection: how does it work with many optima? Closely related: Generative models where we train a process to generate the phenomenon of interest.
Backgrounders
Radford Neal’s thesis (Neal 1996) is a foundational asymptotically-Bayesian use of neural networks.

Nonparametrically learning dynamical systems
https://danmackinlay.name/notebook/nn_learning_dynamics.html
Tue, 08 Dec 2020 13:05:58 +1100
https://danmackinlay.name/notebook/nn_learning_dynamics.html
Contents: Questions; Tools; References
Learning stochastic differential equations. Related: Analysing a neural net itself as a dynamical system, which is not quite the same but crosses over. Variational state filters.
A deterministic version of this problem is what e.g. the famous Vector Institute Neural ODE paper (Chen et al. 2018) did. Author Duvenaud argues that in some ways the hype ran away with the Neural ODE paper, and credits CasADI with the innovations here.

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_functional.html
Mon, 07 Dec 2020 20:43:06 +1100
https://danmackinlay.name/notebook/gp_regression_functional.html
Contents: References
In which I discov… Learning operators via GPs.
References
Brault, Romain, Florence d’Alché-Buc, and Markus Heinonen. 2016. “Random Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning, 110–25. http://arxiv.org/abs/1605.02536.
Brault, Romain, Néhémy Lim, and Florence d’Alché-Buc. n.d. “Scaling up Vector Autoregressive Models With Operator-Valued Random Fourier Features.” Accessed August 31, 2016. https://aaltd16.irisa.fr/files/2016/08/AALTD16_paper_11.pdf.
Brouard, Céline, Marie Szafranski, and Florence D’Alché-Buc.

Regularising neural networks
https://danmackinlay.name/notebook/nn_regularising.html
Tue, 01 Dec 2020 16:41:48 +1100
https://danmackinlay.name/notebook/nn_regularising.html
Contents: Implicit regularisation; Early stopping; Noise layers; Input perturbation; Regularisation penalties; Adversarial training; Bayesian optimisation; Normalization; Weight Normalization; References
TBD: I have not examined this stuff for a long time and it is probably out of date.
How do we get generalisation from neural networks? As in all ML it is probably about controlling overfitting to the training set by some kind of regularization.

Variational inference by message-passing in graphical models
https://danmackinlay.name/notebook/message_passing.html
Wed, 25 Nov 2020 17:42:32 +1100
https://danmackinlay.name/notebook/message_passing.html
Contents: References
Variational inference where the model factorizes over some graphical independence structure, which means we get cheap and distributed inference. I am currently particularly interested in this for latent GP models. Many things can be expressed as message passing algorithms. The grandparent idea in this unification seems to be “Belief propagation”, a.k.a. “sum-product message-passing”, credited to Pearl (1982) for DAGs and then generalised to MRFs, PGMs, factor graphs etc.

Graph neural nets
https://danmackinlay.name/notebook/nn_graph.html
Tue, 24 Nov 2020 08:33:48 +1100
https://danmackinlay.name/notebook/nn_graph.html
Contents: References
Neural networks applied to graph data. (Neural networks of course can already be represented as directed graphs, or applied to phenomena which arise from a causal graph, but that is not what we mean here.)
The version of graphical neural nets with which I am familiar is applying convnets to spectral graph representations. e.g. Thomas Kipf summarises research there.
I gather that the field has moved on and I am no longer across what is happening.

External validity
https://danmackinlay.name/notebook/external_validity.html
Mon, 09 Nov 2020 15:58:56 +1100
https://danmackinlay.name/notebook/external_validity.html
Contents: Standard graphical models; Tools; Salad; Meta; References
TBD.
This Māori gentleman from the 1800s demonstrates an artful transfer learning from the western fashion domain
One could read Sebastian Ruder’s NN-style introduction to “transfer learning”. NN people like to think about this in a particular way, which I like because of the diversity of out-of-the-box ideas it invites and which I dislike because it is sloppy.

Causal inference on DAGs
https://danmackinlay.name/notebook/causal_inference.html
Wed, 04 Nov 2020 12:36:13 +1100
https://danmackinlay.name/notebook/causal_inference.html
Contents: Learning materials; do-calculus; Counterfactuals; Continuously indexed fields; External validity; Propensity scores; Causal Graph inference from data; Causal time series DAGs; Drawing graphical models; Tools; References
Inferring the optimal intervention requires accounting for which arrows are independent of which
Inferring cause and effect from nature. Graphical models and related techniques for doing it. Avoiding the danger of folk statistics. Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions, moral equivalence…

Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100
https://danmackinlay.name/notebook/gp_factoring.html
Contents: Basic sparsity via inducing variables; SVI for Gaussian processes; Latent Gaussian Process models; References
There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
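The building block underneath most inducing-point schemes is a Nyström-style low-rank factorisation of the kernel matrix: approximate the full N×N Gram matrix through its cross-covariance with m ≪ N inducing points. A sketch (inducing locations chosen on a grid for simplicity; real sparse GPs optimise them):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

X = rng.uniform(-3, 3, size=(200, 1))        # N = 200 inputs
Z = np.linspace(-3, 3, 15)[:, None]          # m = 15 inducing points
Kmm = rbf(Z, Z) + 1e-8 * np.eye(15)          # jitter for stability
Knm = rbf(X, Z)
# Rank-m Nystroem approximation of the full N x N kernel matrix:
K_approx = Knm @ np.linalg.solve(Kmm, Knm.T)
err = np.abs(K_approx - rbf(X, X)).max()     # how good is rank 15 here?
```

Working with the rank-m factor instead of the full matrix is what reduces the cubic cost in N to cubic in m, which is the whole point of these factorisations.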
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.

Audiovisuals
https://danmackinlay.name/notebook/audioviz.html
Mon, 26 Oct 2020 09:16:06 +1100
Misc notes on visuals that are co-generated with sound.
For inspiration, see Andy Thomas’s Visual Sounds of the Amazon.

Grammatical inference
https://danmackinlay.name/notebook/grammatical_inference.html
Tue, 13 Oct 2020 10:08:51 +1100
https://danmackinlay.name/notebook/grammatical_inference.html
Contents: References
Mathematically speaking, inferring the “formal language” which can describe a set of expressions. In the slightly looser sense used by linguists studying natural human language, discovering the syntactic rules of a given language, which is kinda the same thing but with every term sloppier, and the subject matter itself messier.
This is already a crazily complex area, and being naturally perverse, I am interested in an especially esoteric corner of it, to wit, grammars of things that aren’t speech; inferring design grammars, say, could allow you to produce more things off the same “basic plan” from some examples of the thing; look at enough trees and you know how to build the rest of the forest, that kind of thing.

Natural language processing
https://danmackinlay.name/notebook/nlp.html
Thu, 01 Oct 2020 07:51:38 +1000
https://danmackinlay.name/notebook/nlp.html
Contents: What is NLP?; Software; HuggingFace; SpaCy; Stanza; Blingfire; pytorch.text; pytext; NLTK; NLP4J; Misc other; References
Computational language translation, parsing, search, generation and understanding.
A mare’s nest of intersecting computational philosophical and mathematical challenges (e.g. semantics, grammatical inference, learning theory) that humans seem to be able to handle subconsciously and which we therefore hope to train machines on. Moreover it is a problem of great commercial benefit so it is likely we can muster the resources to tackle it.

Big data ML best practice
https://danmackinlay.name/notebook/ml_best_practice.html
Mon, 21 Sep 2020 12:42:17 +1000
https://danmackinlay.name/notebook/ml_best_practice.html
Contents: Tools; References
A grab bag of links I have found pragmatically useful in the topsy-turvy world of ML research. Here, where even though we have big data about the world, we still have small data about our own experimental models of the world, because they are so computationally expensive.
see also Surrogate optimisation of experiments.
Martin Zinkevich’s Rules of ML for engineers, and Google’s broad brush workflow overview.

Causal inference in highly parameterized ML
https://danmackinlay.name/notebook/causality_ml.html
Fri, 18 Sep 2020 09:34:46 +1000
https://danmackinlay.name/notebook/causality_ml.html
Contents: References
TBD.
Léon Bottou, From Causal Graphs to Causal Invariance
For many problems, it’s difficult to even attempt drawing a causal graph. While structural causal models provide a complete framework for causal inference, it is often hard to encode known physical laws (such as Newton’s gravitation, or the ideal gas law) as causal graphs. In familiar machine learning territory, how does one model the causal relationships between individual pixels and a target prediction?

Dimensionality reduction
https://danmackinlay.name/notebook/dimensionality_reduction.html
Fri, 11 Sep 2020 08:20:03 +1000
https://danmackinlay.name/notebook/dimensionality_reduction.html
Contents: Bayes; Learning a summary statistic; Feature selection; PCA and cousins; Learning a distance metric; UMAP; For indexing my database; Locality Preserving projections; Diffusion maps; As manifold learning; Multidimensional scaling; Random projection; Stochastic neighbour embedding and other visualisation-oriented methods; Autoencoder and word2vec; Misc; References
🏗🏗🏗🏗🏗
I will restructure learning on manifolds and dimensionality reduction into a more useful distinction.
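Of the methods listed above, PCA remains the workhorse, and it reduces to a single SVD. A minimal sketch on synthetic data whose intrinsic dimension is known:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca(X, k):
    """Dimensionality reduction by PCA via the SVD of the centred data."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]            # scores, principal components

# 200 points that are essentially 2-dimensional, embedded in 10-D
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))
scores, comps = pca(X, 2)                   # recovers the 2-D structure
```

The reconstruction `scores @ comps` recovers the centred data up to the small noise term, which is the sense in which two components suffice here.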
You have lots of predictors in your regression model!

Neural nets
https://danmackinlay.name/notebook/nn.html
Wed, 09 Sep 2020 15:56:52 +1000
https://danmackinlay.name/notebook/nn.html
Contents: What?; Why bother?; The ultimate regression algorithm; Cool maths; Insight into the mind; Trippy art projects; Hip keywords for NN models; Probabilistic/variational; Convolutional; Generative Adversarial Networks; Recurrent neural networks; Transfer learning; Attention mechanism; Spike-based; Kernel networks; Autoencoding; Optimisation methods; Preventing overfitting; Activations for neural networks; Practicalities; Managing those dimensions; Software stuff; pre-computed/trained models; Howtos; References
Bjorn Stenger’s brief history of machine learning.
https://danmackinlay.name/notebook/nn_activation_gradient.html
Mon, 07 Sep 2020 10:34:23 +1000
https://danmackinlay.name/notebook/nn_activation_gradient.html
Contents: References
The Rectified Linear Unit circa 1920
There is a whole cottage industry in showing neural networks are reasonably universal function approximators with various nonlinearities as activations, under various conditions. In practice you can take this as a given. Nonetheless, you might like to play with the precise form of the nonlinearities, even making them themselves directly learnable, because some function shapes might have better approximation properties with respect to various assumptions on the learning problems, in a sense which I will not attempt to make rigorous now, vague hand-waving arguments being the whole point of deep learning.

Causal graphical model reading group 2020
https://danmackinlay.name/post/reading_group_2020_causal_dags.html
Thu, 03 Sep 2020 11:10:51 +1000
https://danmackinlay.name/post/reading_group_2020_causal_dags.html
Contents: Motivational examples; Generally; Machinery; Structural Equation Models; Directed Acyclic Graphs (DAGs); Causal interpretation; do-calculus; Case study: Causal GPs; Recommended reading; Quick intros; Textbooks; Questions; References
See also a previous version, and the notebook on causal inference this will hopefully inform one day.
We follow Pearl’s summary (Pearl 2009a), sections 1-3.
In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions, (also called “causal effects” or “policy evaluation”) (2) queries about probabilities of counterfactuals, (including assessment of “regret,” “attribution” or “causes of effects”) and (3) queries about direct and indirect effects (also known as “mediation”)

Causal Bayesian networks
https://danmackinlay.name/notebook/causal_bayesian_networks.html
Tue, 01 Sep 2020 08:29:10 +1000
https://danmackinlay.name/notebook/causal_bayesian_networks.html
Contents: References
Some kind of alternative graphical formalism for causal independence graphs 🤷?
discrete probability trees, sometimes also called staged tree models. A probability tree is one of the simplest models for representing the causal generative process of a random experiment or stochastic process The semantics are self-explanatory: each node in the tree corresponds to a potential state of the process, and the arrows indicate both the probabilistic transitions and the causal dependencies between them.

Emulators and surrogate models
https://danmackinlay.name/notebook/ml_emulation.html
Wed, 26 Aug 2020 15:12:14 +1000
https://danmackinlay.name/notebook/ml_emulation.html
Contents: References
Emulation, a.k.a. surrogate modelling. In this context, it means reducing complicated physics-driven simulations to simpler and/or faster ones using ML techniques. Especially popular in the ML for physics pipeline. I have mostly done this in the context of surrogate optimisation for experiments.
A recent, hyped paper that exemplifies this approach is Kasim et al. (2020), which (somewhat implicitly) uses arguments from Gaussian process regression to produce quasi-Bayesian emulations of notoriously slow simulations.