Neural Nets on Dan MacKinlay
https://danmackinlay.name/tags/neural_nets.html
Recent content in Neural Nets on Dan MacKinlay. Generated by Hugo (gohugo.io), en-us. Last build: Tue, 13 Apr 2021 16:01:37 +0800.

ML benchmarks and their pitfalls
https://danmackinlay.name/notebook/ml_benchmarks.html
Tue, 13 Apr 2021 16:01:37 +0800
Machine learning’s gamified version of the replication crisis is a paper mill, or perhaps paper treadmill. In this system something counts as “results” if it performs well on some conventional benchmarks. But how often does that demonstrate real progress, and how often is it overfitting to benchmarks?
Oleg Trott on How to sneak up competition leaderboards.
Filip Piekniewski on the tendency to select bad target losses for convenience, which he analyses as a flavour of Goodhart’s law.

pytorch
https://danmackinlay.name/notebook/pytorch.html
Tue, 06 Apr 2021 17:39:12 +1000
Contents: Getting started · DSP in pytorch · Custom functions · Logging and visualizing training · Visualising graphs · Utility libraries, derived software · Lightning · ODEs · NLP · Visdom · Fenics · Pyro · pyprob · Inferno · Kornia · TNT · Sparse matrices · Debugging · Memory leaks · References
It’s just as well that it’s easy to roll your own recurrent nets, because the default implementations are bad. Successor to Lua’s torch; evil twin to Google’s Tensorflow.

Infinite width limits of neural networks
https://danmackinlay.name/notebook/nn_infinite_width.html
Mon, 29 Mar 2021 13:35:28 +1100
Contents: Neural Network Gaussian Process · Neural Network Tangent Kernel · Implicit regularization · Dropout · As stochastic processes · References
Large-width limits of neural nets.
Neural Network Gaussian Process
See Neural network Gaussian process on Wikipedia.
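For intuition, here is a quick numerical check (my own sketch, not from the post): sample many random one-hidden-layer nets at a fixed input; with the usual 1/√width scaling, the output distribution approaches a centred Gaussian as the width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_net_outputs(x, width, n_nets):
    """Outputs f(x) = v . tanh(W x) / sqrt(width) of many random nets at one input."""
    W = rng.normal(size=(n_nets, width, x.shape[0]))
    v = rng.normal(size=(n_nets, width))
    h = np.tanh(W @ x)                            # (n_nets, width)
    return (v * h).sum(axis=1) / np.sqrt(width)   # CLT over hidden units

x = np.array([1.0, -0.5])
samples = wide_net_outputs(x, width=1000, n_nets=4000)
print(samples.mean(), samples.std())  # mean ≈ 0; the histogram tends to a Gaussian
```

At a single input this is just the central limit theorem; the Gaussian-process statement is the joint version over many inputs.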
The field that sprang from the insight (Neal 1996a) that in the infinite-width limit deep NNs asymptotically approach Gaussian processes, and that there are theories we can draw on from that. Far from the infinite limit, there are neural nets which exploit this.

Neural nets with implicit layers
https://danmackinlay.name/notebook/nn_implicit.html
Mon, 15 Mar 2021 12:16:50 +1100
A unifying framework for various networks, including neural ODEs, where our layers are not simple forward operations but whose evaluation is represented as some optimisation problem.
For some info see the NeurIPS 2020 tutorial, Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond, by Zico Kolter, David Duvenaud, and Matt Johnson.
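A minimal sketch of the idea (my own toy example, not from the tutorial): an implicit layer’s output is defined as the solution z* of a fixed-point equation rather than by a single forward pass, here found by naive iteration, with W scaled so the iteration is a contraction.

```python
import numpy as np

rng = np.random.default_rng(1)

def implicit_layer(x, W, tol=1e-8, max_iter=500):
    """An implicit layer: its output z* solves z = tanh(W z + x),
    rather than being computed by one forward operation."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + x)
        if np.max(np.abs(z_new - z)) < tol:
            return z_new
        z = z_new
    return z

# Scale W to spectral norm 0.5 so the iteration is a contraction (unique fixed point).
W = rng.normal(size=(4, 4))
W *= 0.5 / np.linalg.norm(W, 2)
x = rng.normal(size=4)
z_star = implicit_layer(x, W)
print(np.max(np.abs(z_star - np.tanh(W @ z_star + x))))  # residual ≈ 0
```

In the deep-equilibrium-model setting one would backpropagate through z* via the implicit function theorem instead of unrolling the loop.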
NB: This is different to the implicit representation method. Since implicit layers and implicit representation layers also occur in the same problems (such as ML PDEs), this terminological confusion will haunt us.

Here’s how I would do art with machine learning if I had to
https://danmackinlay.name/notebook/generative_art_nn.html
Tue, 09 Mar 2021 13:44:46 +1100
Contents: Visual synthesis · Text synthesis · Music · Symbolic composition via scores/MIDI/etc · Audio synthesis · Misc · References
I’ve a weakness for ideas that give me plausible deniability for making generative art while doing my maths homework.
Quasimondo: so do you.
This page is more chaotic than the already-chaotic median, sorry. Good luck making sense of it. The problem here is that this notebook is in the anti-sweet spot of “stuff I know too much about to need notes but am not working on enough to promote”.

Neural nets with basis decomposition layers
https://danmackinlay.name/notebook/nn_basis.html
Tue, 09 Mar 2021 12:06:42 +1100
Contents: Neural networks with continuous basis functions · Convolutional neural networks as sparse coding · References
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyze neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons: perhaps the right basis gives you better priors for understanding a partial differential equation?

Memory in machine learning
https://danmackinlay.name/notebook/ml_memory.html
Wed, 03 Mar 2021 09:38:20 +1100
How best should learning mechanisms store and retrieve memories? Important in reinforcement learning. Implicit in recurrent networks. One of the chief advantages of neural Turing machines. A great apparent success of transformers.
But, as my colleague Tom Blau points out, perhaps best considered as a topic in its own right.
References: Charles, Adam, Dong Yin, and Christopher Rozell. 2016. “Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks.”

Causal inference in the continuous limit
https://danmackinlay.name/notebook/causality_continuous.html
Wed, 17 Feb 2021 20:41:08 +1100
Causality on continuous index spaces and, what turns out to be related, equilibrium/feedback dynamics. Placeholder.
Bongers and Mooij (2018):
Uncertainty and random fluctuations are a very common feature of real dynamical systems. For example, most physical, financial, biochemical and engineering systems are subjected to time-varying external or internal random disturbances. These complex disturbances and their associated responses are most naturally described in terms of stochastic processes.

Neural net attention mechanisms
https://danmackinlay.name/notebook/nn_attention.html
Wed, 10 Feb 2021 14:22:50 +1100
What’s that now?
Long story, but the most developed family here is the transformer; see the Sparse Transformer etc. for particularly developed examples and explanations of this sub-field. The best illustrated blog post is Jay Alammar’s Illustrated Transformer.
These networks are absolutely massive (heh) in natural language processing right now.
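The core computation is small enough to sketch in a few lines of numpy: scaled dot-product attention, the building block of the transformer (my own illustrative code, not from the post).

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries in dimension 8
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out, w = attention(Q, K, V)
print(out.shape)  # (3, 8); each attention row sums to 1
```

The trainability-at-scale point is partly that this operation is all dense matrix multiplies, which parallelise well.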
A key point about these networks is that they can be made extremely large but still remain trainable.

Neural nets for “implicit representations”
https://danmackinlay.name/notebook/nn_implicit_rep.html
Thu, 21 Jan 2021 12:50:41 +1100
A cute hack for generative neural nets. Unlike other structures, here we allow the output to depend upon image coordinates, rather than some presumed-invariant latent factors. I am not quite sure what the rationale is for “implicit” being used as a term here. Which representations are implicit or explicit seems particularly viewpoint-dependent.
NB this is different to the “implicit layers” trick, which allows an optimisation problem to be implicitly solved in a neural net.

Statistical mechanics of statistics
https://danmackinlay.name/notebook/statistical_mechanics_of_statistics.html
Wed, 06 Jan 2021 12:46:59 +1100
Contents: Phase transitions in statistical inference · Replicator equations and evolutionary processes · References
Boaz Barak has a miniature dictionary for statisticians:
I’ve always been curious about the statistical physics approach to problems from computer science. The physics-inspired algorithm survey propagation is the current champion for random 3SAT instances, statistical-physics phase transitions have been suggested as explaining computational difficulty, and statistical physics has even been invoked to explain why deep learning algorithms seem to often converge to useful local minima.

Why does deep learning work?
https://danmackinlay.name/notebook/nn_why.html
Mon, 14 Dec 2020 18:09:05 +1100
Contents: Synthetic tutorials · Magic of (stochastic) gradient descent · … with saddle points · Magic of SGD+overparameterization · Function approximation theory · Crazy physics stuff I have not read · There is nothing to see here · References
No time to frame this well, but there are a lot of versions of the question, so… pick one. The essential idea is that we say: Oh my, that deep learning model I just trained had terribly good performance compared with some simpler thing I tried.

Probabilistic neural nets
https://danmackinlay.name/notebook/nn_ensemble.html
Mon, 14 Dec 2020 17:16:45 +1100
Contents: Explicit ensembles · Distilling · Dropout · Is this dangerous? · Questions · References
One of the practical forms of Bayesian inference for massively parameterised networks.
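The explicit-ensembles recipe is simple enough to sketch. In this toy version (my own, with cheap random-feature regressors standing in for trained networks) we fit several members from different random initialisations and report the empirical predictive mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for "train a network": each ensemble member is a small
# random-feature regressor fit to the same data from a different random init.
def fit_member(X, y, width=50):
    W = rng.normal(size=(X.shape[1], width))
    H = np.tanh(X @ W)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda Xq: np.tanh(Xq @ W) @ beta

X = rng.uniform(-2, 2, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
ensemble = [fit_member(X, y) for _ in range(10)]

Xq = np.linspace(-2, 2, 7)[:, None]
preds = np.stack([f(Xq) for f in ensemble])        # (members, query points)
mean, var = preds.mean(axis=0), preds.var(axis=0)  # empirical predictive mean/variance
print(preds.shape)  # (10, 7)
```

The member-to-member variance is then read as (a crude proxy for) predictive uncertainty.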
Explicit ensembles: train a collection of networks and calculate empirical means and variances to estimate the posterior predictive (He, Lakshminarayanan, and Teh 2020; Huang et al. 2016; Lakshminarayanan, Pritzel, and Blundell 2017; Wen, Tran, and Ba 2020; Xie, Xu, and Chuang 2013). This is neat, and on one hand we might think there is nothing special to do here, since it’s already more or less classical model ensembling, as near as I can tell.

Garbled highlights from NeurIPS 2020
https://danmackinlay.name/post/neurips2020.html
Fri, 11 Dec 2020 09:35:39 +1100
Workshops: Machine Learning for Creativity and Design · Workshop on Deep Learning and Inverse Problems · Differentiable vision, graphics, and physics applied to machine learning · Learning Meaningful Representations of Life · Tackling Climate Change with Machine Learning · AI for Earth Sciences · Causal Discovery & Causality-Inspired Machine Learning · Interpretable Inductive Biases and Physically Structured Learning
Interesting papers by ad hoc theme —
Causality: Causal Imitation Learning With Unobserved Confounders · Causal Learning · Domain Adaptation as a Problem of Inference on Graphical Models · Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding · Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs · Differentiable Causal Discovery from Interventional Data
Learning in continuous time/depth: Almost surely stable deep dynamics · Learning Differential Equations that are Easy to Solve · Dissecting Neural ODEs · STEER: Simple Temporal Regularization For Neural ODE Training · Generative Adversarial Networks by Solving Ordinary Differential Equations · Ode to an ODE · Time-Reversal Symmetric ODE Network · Hypersolvers: Toward Fast Continuous-Depth Models · On Second Order Behaviour in Augmented Neural ODEs · Neural Controlled Differential Equations for Irregular Time Series
Learning with weird losses: Quantile Propagation for Wasserstein-Approximate Gaussian Processes · All your loss are belong to Bayes · The Wasserstein Proximal Gradient Algorithm · Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
References: Alexander Norcliffe, Cristian Bodnar, Ben Day, Jacob Moss, and Pietro Liò.

Bayesian deep learning
https://danmackinlay.name/notebook/nn_bayesian.html
Thu, 10 Dec 2020 17:36:51 +1100
Contents: Backgrounders · Sampling from Stochastic Gradient Descent · Ensemble methods · Practicalities · References
Probably approximately a horse
WARNING: more than usually chaotic notes here
Bayesian inference for massively parameterised networks.
To learn:
marginal likelihood in model selection: how does it work with many optima? Closely related: Generative models where we train a process to generate the phenomenon of interest.
Backgrounders: Radford Neal’s thesis (Neal 1996) is a foundational asymptotically-Bayesian use of neural networks.

Nonparametrically learning dynamical systems
https://danmackinlay.name/notebook/nn_learning_dynamics.html
Tue, 08 Dec 2020 13:05:58 +1100
Learning stochastic differential equations. Related: analysing a neural net itself as a dynamical system, which is not quite the same but crosses over. Variational state filters.
A deterministic version of this problem is what e.g. the famous Vector Institute Neural ODE paper (Chen et al. 2018) did. Co-author Duvenaud argues that in some ways the hype ran away with the Neural ODE paper, and credits CasADi with the innovations here.

Regularising neural networks
https://danmackinlay.name/notebook/nn_regularising.html
Tue, 01 Dec 2020 16:41:48 +1100
Contents: Implicit regularisation · Early stopping · Noise layers · Input perturbation · Regularisation penalties · Adversarial training · Bayesian optimisation · Normalization · Weight Normalization · References
TBD: I have not examined this stuff for a long time and it is probably out of date.
How do we get generalisation from neural networks? As in all ML, it is probably about controlling overfitting to the training set by some kind of regularization.

Graph neural nets
https://danmackinlay.name/notebook/nn_graph.html
Tue, 24 Nov 2020 08:33:48 +1100
Neural networks applied to graph data. (Neural networks of course can already be represented as directed graphs, or applied to phenomena which arise from a causal graph, but that is not what we mean here.)
The version of graphical neural nets with which I am familiar is applying convnets to spectral graph representations. e.g. Thomas Kipf summarises research there.
I gather that the field has moved on and I am no longer across what is happening.

Tensorflow
https://danmackinlay.name/notebook/tensorflow.html
Tue, 10 Nov 2020 14:20:46 +1000
Contents: Abstractions · Tutorials · Debugging · Tensorboard · Getting data in · (Non-recurrent) convolutional networks · Recurrent networks · Official documentation · Community guides · Keras: The recommended way of using tensorflow · Getting models out · Training in the cloud because you don’t have NVIDIA sponsorship · Extending · Misc HOWTOs · Nightly builds · Dynamic graphs · GPU selection · Silencing tensorflow · Hessians and higher order optimisation · Manage tensorflow environments · Optimisation tricks · Probabilistic networks
A C++/Python/etc neural network toolkit by Google.

Big data ML best practice
https://danmackinlay.name/notebook/ml_best_practice.html
Mon, 21 Sep 2020 12:42:17 +1000
A grab bag of links I have found pragmatically useful in the topsy-turvy world of ML research. Here, even though we have big data about the world, we still have small data about our own experimental models of the world, because they are so computationally expensive.
See also Surrogate optimisation of experiments.
Martin Zinkevich’s Rules of ML for engineers, and Google’s broad-brush workflow overview.

Causal inference in highly parameterized ML
https://danmackinlay.name/notebook/causality_ml.html
Fri, 18 Sep 2020 09:34:46 +1000
TBD.
Léon Bottou, From Causal Graphs to Causal Invariance
For many problems, it’s difficult to even attempt drawing a causal graph. While structural causal models provide a complete framework for causal inference, it is often hard to encode known physical laws (such as Newton’s gravitation, or the ideal gas law) as causal graphs. In familiar machine learning territory, how does one model the causal relationships between individual pixels and a target prediction?

Dimensionality reduction
https://danmackinlay.name/notebook/dimensionality_reduction.html
Fri, 11 Sep 2020 08:20:03 +1000
Contents: Bayes · Learning a summary statistic · Feature selection · PCA and cousins · Learning a distance metric · UMAP · For indexing my database · Locality Preserving projections · Diffusion maps · As manifold learning · Multidimensional scaling · Random projection · Stochastic neighbour embedding and other visualisation-oriented methods · Autoencoder and word2vec · Misc · References
🏗🏗🏗🏗🏗
I will restructure learning on manifolds and dimensionality reduction into a more useful distinction.
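For reference, the workhorse on that contents list, PCA, reduces dimension by projecting centred data onto its top singular vectors. A minimal numpy sketch (mine, for illustration):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via the SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]   # scores and components

rng = np.random.default_rng(0)
# 200 points that are nearly one-dimensional inside R^5.
X = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 5)) + 0.01 * rng.normal(size=(200, 5))
Z, comps = pca(X, 2)
print(Z.shape)  # (200, 2); almost all variance lands in the first score column
```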
You have lots of predictors in your regression model!

Neural nets
https://danmackinlay.name/notebook/nn.html
Wed, 09 Sep 2020 15:56:52 +1000
Contents: What? · Why bother? · The ultimate regression algorithm · Cool maths · Insight into the mind · Trippy art projects · Hip keywords for NN models · Probabilistic/variational · Convolutional · Generative Adversarial Networks · Recurrent neural networks · Transfer learning · Attention mechanism · Spike-based · Kernel networks · Autoencoding · Optimisation methods · Preventing overfitting · Activations for neural networks · Practicalities · Managing those dimensions · Software stuff · pre-computed/trained models · Howtos · References
Bjorn Stenger’s brief history of machine learning.
https://danmackinlay.name/notebook/nn_activation_gradient.html
Mon, 07 Sep 2020 10:34:23 +1000
The Rectified Linear Unit circa 1920
There is a whole cottage industry in showing that neural networks are reasonably universal function approximators with various nonlinearities as activations, under various conditions. In practice you can take this as a given. Nonetheless, you might like to play with the precise form of the nonlinearities, even making them themselves directly learnable, because some function shapes might have better approximation properties with respect to various assumptions on the learning problems, in a sense which I will not attempt to make rigorous now, vague hand-waving arguments being the whole point of deep learning.

Learning of manifolds
https://danmackinlay.name/notebook/learning_of_manifolds.html
Tue, 23 Jun 2020 09:34:49 +1000
Contents: Implementations · TTK · scikit-learn · tapkee · References
🏗🏗🏗🏗🏗
I will restructure learning on manifolds and dimensionality reduction into a more useful distinction.
Berger, Daniels and Yu on manifolds in Genome search
As in — handling your high-dimensional, or graphical, data by trying to discover a low(er)-dimensional manifold that contains it. That is, inferring a hidden constraint that happens to have the form of a smooth surface of some low-ish dimension.

Deep fakery
https://danmackinlay.name/notebook/deep_fakery.html
Mon, 15 Jun 2020 09:43:36 +1000
There is little to see here. I am not a researcher in deep video fakes (although I have a research interest in audio fakes), but I like to keep abreast of how cheap it is to fabricate evidence of things, and fret about what this will mean for future society’s agreements on facts. This is obviously going to become ubiquitous in weaponised social media.
Sarah Thompson, Fake Faces: People Who Do Not Exist Invade Facebook To Influence 2020 Elections is an interesting bit of meta-analysis on Lead Stories.

Generative neural net models
https://danmackinlay.name/notebook/nn_generative.html
Sun, 14 Jun 2020 19:04:57 +1000
Observations arising from unobserved latent factors
Certain famous models in neural nets are generative — informally, they produce samples from some distribution, and that distribution is tweaked until it resembles, say, the distribution of our observed data.
Tangent: Learning problems involve composition of differentiating and integrating various terms that measure various properties of how well you have approximated the state of the world.

Probabilistic neural nets
https://danmackinlay.name/notebook/nn_probabilistic.html
Sun, 14 Jun 2020 19:04:57 +1000
Inferring densities and distributions in a massively parameterised deep learning setting.
This is not intrinsically a Bayesian thing to do, but in practice much of the demand comes from Bayesian posterior inference for neural nets, and accordingly most of the action is over there.
References: Abbasnejad, Ehsan, Anthony Dick, and Anton van den Hengel. 2016. “Infinite Variational Autoencoder for Semi-Supervised Learning.” In Advances in Neural Information Processing Systems 29.

Convolutional neural networks
https://danmackinlay.name/notebook/nn_conv.html
Fri, 24 Apr 2020 13:30:29 +1000
Contents: Visualising · Connection to filter theory · References
The network topology that more or less kicked off the current revolution in computer vision and thus the whole modern neural network craze.
Convolutional nets (convnets or CNNs to the suave) are well described elsewhere. I’m just going to collect some oddities here. Classic signal processing baked into neural networks.
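For concreteness, the core operation of a convolutional layer is just 2-D (cross-)correlation with a small kernel; a toy numpy sketch (mine), using a classic Sobel edge filter from signal processing:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the core op of a convolutional layer."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0  # a vertical edge between columns 2 and 3
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], float)
response = conv2d(image, sobel_x)
print(response)  # nonzero only in the output columns straddling the edge
```

A learned convolutional layer is the same computation with the kernel entries as trainable parameters.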
There is a long story here about how convolutions naturally encourage certain invariances and symmetries, although AFAICT it’s all somewhat hand-wavey.

Learning summary statistics
https://danmackinlay.name/notebook/learning_summary_statistics.html
Wed, 22 Apr 2020 14:40:07 +1000
A dimensionality reduction/feature engineering trick for likelihood-free inference methods such as indirect inference or approximate Bayesian computation.
TBD. See de Castro and Dorigo (2019):
Simulator-based inference is currently at the core of many scientific fields, such as population genetics, epidemiology, and experimental particle physics. In many cases the implicit generative procedure defined in the simulation is stochastic and/or lacks a tractable probability density p(x|θ), where θ ∈ Θ is the vector of model parameters.

Learning Gamelan
https://danmackinlay.name/notebook/learning_gamelan.html
Mon, 06 Apr 2020 16:21:53 +1000
Attention conservation notice: crib notes for a 2-year-long project, ultimately abandoned in late 2018, about approximating convnets with recurrent neural networks for analysing time series. This project currently exists purely as LaTeX files on my hard drive, which need to be imported here for future reference. I did learn some useful tricks along the way about controlling the poles of IIR filters for learning by gradient descent, and those will be actually interesting.

Deep learning as a dynamical system
https://danmackinlay.name/notebook/nn_dynamical.html
Thu, 02 Apr 2020 17:26:01 +1100
Contents: Convnets/Resnets as discrete PDE approximations · References
Image: Donnie Darko
A recurring movement within neural network learning research which tries to render the learning of prediction functions tractable by considering them as dynamical systems, and using the theory of stability in the context of Hamiltonians, optimal control and/or ODE solvers, to make it all work.
I’ve been interested by this since seeing the (Haber and Ruthotto 2018) paper, but it’s got a kick from T.

Nonparametrically learning spatiotemporal systems
https://danmackinlay.name/notebook/nn_spatiotemporal.html
Thu, 02 Apr 2020 17:26:01 +1100
On learning stochastic partial differential equations and other processes using neural networks, Gaussian processes and other differentiable techniques. Uses the tools of dynamical NNs and their ilk. Probably handy for machine learning physics.
I know little about this yet, but here are some links.
References: Arridge, Simon, Peter Maass, Ozan Öktem, and Carola-Bibiane Schönlieb. 2019. “Solving Inverse Problems Using Data-Driven Models.” Acta Numerica 28 (May): 1–174.

Teaching computers to write music
https://danmackinlay.name/notebook/generative_music.html
Wed, 25 Mar 2020 08:10:01 +0800
Contents: Useful infrastructure · Tutorials · Audio synthesis · Examples · References
Seems like it should be easy, until you think about it.
Related: Arpeggiate by numbers which discusses music theory, and analysis/resynthesis, which discusses audio.
Useful infrastructure: pypianoroll and music21 for, respectively, piano rolls and MIDI scores. See, e.g., MIDO for live MIDI.
Tutorials: a tutorial on generating music using Restricted Boltzmann Machines for the conditional random field density, and an RNN for the time dependence, after (Boulanger-Lewandowski, Bengio, and Vincent 2012).

Learnable indexes and hashes
https://danmackinlay.name/notebook/learnable_indexes.html
Tue, 18 Feb 2020 12:20:29 +1100
Contents: Learnable hashes for similarity search · Learnable indexes for arbitrary search · References
Dr. Wu-Jun Li’s excellent lit review and practicalities supporting their own papers. Kevin Zakka’s kNN classification using Neighbourhood Components Analysis is an illustrated guide to a type of dimensionality reduction I had not heard of before that looks handy for nearest-neighbour search, which I suppose is the entry-level use here. (Dwibedi et al. 2019)

Gradient descent, first-order, stochastic
https://danmackinlay.name/notebook/gd_1st_order_stochastic.html
Fri, 07 Feb 2020 12:27:58 +1100
Contents: Variance-reduced · Normalized · Sundry Hacks · References
Stochastic optimization uses noisy (possibly approximate) 1st-order gradient information to find the argument which minimises
\[ x^*=\operatorname{argmin}_{x} f(x) \]
for some objective function \(f:\mathbb{R}^n\to\mathbb{R}\).
That this works with little fuss in very high dimensions is a major pillar of deep learning.
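A toy illustration (mine, not from the post): SGD on a noiseless least-squares objective, using one randomly drawn row per step as an unbiased estimate of the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = mean over rows of (a_i . x - b_i)^2, minimised here by SGD.
A = rng.normal(size=(500, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x = np.zeros(3)
step = 0.01  # small constant step; decaying Robbins–Monro schedules also work
for t in range(5000):
    i = rng.integers(len(A))
    grad = 2 * (A[i] @ x - b[i]) * A[i]  # gradient of one summand: unbiased estimate
    x -= step * grad
print(x)  # ≈ [1.0, -2.0, 0.5]
```

In this realisable (zero-noise) case SGD converges to the exact minimiser; with observation noise, the iterates instead hover in a neighbourhood whose size is set by the step size.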
The original version, given in terms of root finding, is (Herbert Robbins and Monro 1951), who later generalised the analysis in (H.

Recurrent neural networks
https://danmackinlay.name/notebook/nn_recurrent.html
Thu, 23 Jan 2020 13:32:31 +1100
Contents: Intro · Flavours · Linear · Vanilla non-linear · Long Short Term Memory (LSTM) · Gated Recurrent Unit (GRU) · Unitary · Probabilistic · Phased · Other · Practicalities · General · References
Feedback networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever). Many wheels are re-invented with these, but the essential model is that we have a heavily nonlinear state filter inferred by gradient descent.

Gradient descent, Newton-like, stochastic
https://danmackinlay.name/notebook/gd_2nd_order_stochastic.html
Thu, 23 Jan 2020 10:35:22 +1100
Contents: Subsampling · General case · References
Stochastic Newton-type optimization, unlike deterministic Newton optimisation, uses noisy (possibly approximate) 2nd-order gradient information to find the argument which minimises
\[ x^*=\operatorname{argmin}_{x} f(x) \]
for some objective function \(f:\mathbb{R}^n\to\mathbb{R}\).
Subsampling: most of the good tricks here are set up for ML-style training losses where the bottleneck is summing a large number of loss functions.
LiSSA attempts to make 2nd-order gradient descent methods scale to large parameter sets (Agarwal, Bullins, and Hazan 2016):

Gradient descent, Higher order
https://danmackinlay.name/notebook/gd_3rd_order.html
Sat, 26 Oct 2019 16:56:24 +0800
Newton-type optimization uses 2nd-order gradient information (i.e. a Hessian matrix) to solve optimization problems. Higher-order optimisation uses 3rd-order gradients and so on. They are elegant for univariate functions.
This is rarely done in problems that I face, because 3rd-order derivatives of multivariate optimisations are usually too big in time and space complexity to be tractable. They are not (simply) expressible as matrices, so they can benefit from a little tensor theory.

Differentiable learning of automata
https://danmackinlay.name/notebook/nn_automata.html
Wed, 11 Sep 2019 15:29:20 +0100
Learning stack machines, random access machines, nested hierarchical parsing machines, Turing machines and whatever other automata-with-memory you wish, from data. In other words, teaching computers to program themselves, via a deep learning formalism.
This is a kind of obvious idea and there are some charming toy examples. Indeed this is sort-of what we have traditionally imagined AI might do.
Obviously a hypothetical superhuman Artificial General Intelligence would be good at handling such problems; it’s not the absolute hippest research area right now though, on account of being hard in general, just like we always imagined from earlier attempts.

Gradient descent, Newton-like
https://danmackinlay.name/notebook/gd_2nd_order.html
Tue, 03 Sep 2019 14:06:19 +1000
Contents: Vanilla Newton methods · Quasi-Newton methods · Hessian-free · Natural gradient descent · Stochastic · References
Newton-type optimization, unlike basic gradient descent, uses (possibly approximate) 2nd-order gradient information to find the argument which minimises
\[ x^*=\operatorname{argmin}_{x} f(x) \]
for some objective function \(f:\mathbb{R}^n\to\mathbb{R}\).
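A minimal sketch of the basic Newton iteration on a smooth convex function (my own toy example): each step solves the local quadratic model exactly by applying the inverse Hessian to the gradient.

```python
import numpy as np

# Newton's method on the smooth convex function f(x) = sum(exp(x_i) + x_i^2 / 2).
def f_grad(x):
    return np.exp(x) + x

def f_hess(x):
    return np.diag(np.exp(x)) + np.eye(len(x))

x = np.ones(3)
for _ in range(20):
    # Newton step: x <- x - H(x)^{-1} g(x), via a linear solve rather than inversion.
    x -= np.linalg.solve(f_hess(x), f_grad(x))
print(np.linalg.norm(f_grad(x)))  # ≈ 0 at the minimiser
```

Near the optimum the convergence is quadratic, which is the main attraction over plain gradient descent; the catch, as below, is the cost of forming and solving with the Hessian at scale.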
Optimization over arbitrary functions typically gets discussed in terms of line-search and trust-region methods, both of which can be construed, AFAICT, as second-order methods.

Compressing neural nets
https://danmackinlay.name/notebook/nn_compressing.html
Mon, 27 May 2019 17:59:12 +1000
How to make neural nets smaller while still preserving their performance. This is a subtle problem, as we suspect that part of their special sauce is precisely that they are overparameterized; which is to say, one reason they work is precisely that they are bigger than they “need” to be. The problem of finding a network that is smaller than the one it seems to need to be is tricky.

Overparameterization
https://danmackinlay.name/notebook/overparameterization.html
Tue, 11 Dec 2018 20:18:08 +1100
Notes on the general technique of increasing the number of slack parameters you have, especially in machine learning. Maybe this should rather be simply parameterization, given that I am increasingly concerned with how to select parameterisations for problems with respect to which inference will find good optima, which is not necessarily
The combination of overparameterization and SGD is argued to be the secret to how deep learning works, by Zeyuan Allen-Zhu, Yuanzhi Li and Zhao Song.

Entity embeddings
https://danmackinlay.name/notebook/entity_embeddings.html
Sat, 01 Apr 2017 09:56:50 +0800
Feature construction for inconvenient data; made famous by word embeddings such as word2vec being surprisingly semantic. Note that word2vec has a complex relationship to its documentation.
Entity embeddings of categorical variables (code)
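Mechanically, an entity embedding is just a learned lookup table mapping category codes to dense vectors; a minimal sketch of the forward pass (mine, with the table left untrained for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

n_levels, dim = 7, 3                   # e.g. 7 categorical levels embedded in R^3
E = rng.normal(size=(n_levels, dim))   # the embedding table; in practice, learned
                                       # jointly with the network by backprop

def embed(category_ids):
    """Replace integer category codes with their dense embedding vectors."""
    return E[np.asarray(category_ids)]

batch = [2, 2, 5]
print(embed(batch).shape)  # (3, 3); identical categories share one vector
```

Compared with one-hot encoding, this costs dim parameters per level instead of one coordinate per level, and the learned geometry can place similar categories near each other.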
We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables. The mapping is learned by a neural network during the standard supervised training process. Entity embedding not only reduces memory usage and speeds up neural networks compared with one-hot encoding, but more importantly by mapping similar values close to each other in the embedding space it reveals the intrinsic properties of the categorical variables.

Random neural networks
https://danmackinlay.name/notebook/nn_random.html
Sun, 19 Feb 2017 13:19:38 +1100
Contents: Recurrent: Echo State Machines / Random reservoir networks · Random convolutions · References
If you do not bother to train your neural net, what happens? In the infinite-width limit you get a Gaussian process. There are a number of net architectures which do not make use of that argument and which are still random though.
Recurrent: Echo State Machines / Random reservoir networks. This sounds deliciously lazy; at a glance it sounds like the process is to construct a random recurrent network, i.

Garbled highlights from NIPS 2016
https://danmackinlay.name/post/nips2016.html
Fri, 03 Feb 2017 15:37:20 +1100
Contents: Time series workshop · Generative Adversarial models · MetaGrad: Multiple Learning rates in Online Learning · Structured Orthogonal Random Features · Universal Correspondence Network · Weight Normalization: A simple reparameterisation to Accelerate Training of Deep Neural Networks · Relevant sparse codes with variational information bottleneck · Dense Associative Memory for Pattern recognition · Density estimation using Real NVP · InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets · Parameter Learning for Log-supermodular Distributions · Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates · High dimensional learning with structure · Doug Eck · Computing with spikes workshop · Bayesian Deep Learning workshop · NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop · Adaptive and Scalable Nonparametric Methods in Machine Learning · Brains and Bits: Neuroscience Meets Machine Learning · Spatiotemporal forecasting · Constructive machine learning · References
Full paper listing.

Pattern machine
https://danmackinlay.name/notebook/pattern_machine.html
Tue, 24 Nov 2015 12:48:49 +0800
Pattern machine is an algorithmic audiovisual art project. Check out our project blog, or some source code.