Linear Algebra on Dan MacKinlay
https://danmackinlay.name/tags/linear_algebra.html
Recent content in Linear Algebra on Dan MacKinlay. Generated by Hugo (gohugo.io), en-us. Last updated Mon, 12 Apr 2021 12:19:45 +0800.
Randomized low dimensional projections
https://danmackinlay.name/notebook/low_d_projections.html
Mon, 12 Apr 2021 12:19:45 +0800
Random projections are kinda Gaussian Random projections are distance preserving Projection statistics Concentration theorems for projections Weird spherical distribution facts of use References
One way I can get at the confusing behaviours of high dimensional distributions is to instead look at low dimensional projections of them. If I have a (possibly fixed) data matrix and a random low-dimensional projection, what distribution does the projection have?
High dimensional statistics
https://danmackinlay.name/notebook/high_d_statistics.html
Tue, 23 Mar 2021 14:49:42 +1100
Soap bubbles Empirical processes in high dimensions Markov Chain Monte Carlo in high dimensions References
Placeholder to think about the many weird problems arising in very high dimensional statistical inference. There are many approaches to this problem: throwing out dimensions/predictors as in model selection, considering low dimensional projections, viewing objects with matrix structure for concentration or factorisation, or even tensor structure.
Soap bubbles High dimensional distributions are extremely odd, and concentrate in weird ways.
Numerical PDE solvers
https://danmackinlay.name/notebook/pde_solvers.html
Mon, 15 Mar 2021 16:42:29 +1100
HOWTO ADCME DIY in Julia TenFEM JuliaFEM Trixi Firedrake FEniCS PolyFEM SfePy fipy PyAMG
A numerical solver iteratively massages an initial condition into a solution with the desired form.
There are many methods for numerically solving partial differential equations. Finite element methods, geometric multigrid, algebraic multigrid. Stochastic methods…
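To fix ideas, here is a minimal finite-difference sketch of that iterative massaging, not any of the libraries above; the grid size, diffusion coefficient and step count are arbitrary choices for a 1D heat equation:

```python
import numpy as np

def heat_step(u, alpha=0.1):
    """One explicit finite-difference step of the 1D heat equation
    with fixed (zero Dirichlet) boundary values. Stable for alpha <= 0.5."""
    u_new = u.copy()
    u_new[1:-1] = u[1:-1] + alpha * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u_new

# Initial condition: a spike in the middle of the domain.
u = np.zeros(51)
u[25] = 1.0
for _ in range(500):
    u = heat_step(u)
# Diffusion smooths the spike out symmetrically toward the boundaries.
```

Real solvers differ mainly in how cleverly they do this step (implicit schemes, multigrid coarsening, finite element bases), not in the overall shape of the loop.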
For a list that weights them by usefulness in a machine learning context, see ML PDEs.
Neural nets with implicit layers
https://danmackinlay.name/notebook/nn_implicit.html
Mon, 15 Mar 2021 12:16:50 +1100
References
A unifying framework for various networks, including neural ODEs, where our layers are not simple forward operations but whose evaluation is represented as some optimisation problem.
For some info see the NeurIPS 2020 tutorial, Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond, by Zico Kolter, David Duvenaud, and Matt Johnson.
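A toy sketch of the idea (the sizes and the map are my own illustration, not from the tutorial): a "layer" whose output is defined implicitly as the fixed point \(z^* = \tanh(W z^* + x)\), found by iteration rather than by a fixed forward pass.

```python
import numpy as np

def implicit_layer(x, W, tol=1e-10, max_iter=500):
    """Evaluate an implicit layer: return z solving z = tanh(W z + x).

    For spectral norm ||W|| < 1 the map is a contraction, so plain
    fixed-point iteration converges; real deep equilibrium models use
    fancier root-finders, and implicit differentiation for the backward pass."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
W = A / (2 * np.linalg.norm(A, 2))  # spectral norm 0.5, so a contraction
x = rng.standard_normal(4)
z_star = implicit_layer(x, W)
# The defining equation holds at the output: z_star = tanh(W @ z_star + x)
```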
NB: This is different to the implicit representation method. Since implicit layers and implicit representation layers also occur in the same problems (such as ML PDEs), this terminological confusion will haunt us.
Radial functions
https://danmackinlay.name/notebook/radial_functions.html
Fri, 12 Mar 2021 10:32:33 +1100
Basic spherical integrals As dot-product kernels Transforms Hankel transforms Transform algebra Directional statistics Random projections References
A function \(g: \mathbb{R}^{d}/\{0\} \rightarrow \mathbb{R}\) is radial if there is a function \(k : \mathbb{R}^+ \rightarrow \mathbb{R}\) such that \[g(x)=k(\|x\|),\,x\in\mathbb{R}^d/\{0\}.\] We need to handle these things a lot, in arbitrary dimensions.
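A quick numerical illustration (my own toy example, with \(k(r)=e^{-r^2}\)): a radial function is invariant under any orthonormal change of coordinates, since rotations preserve the norm.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 5

def g(x):
    # Radial function g(x) = k(||x||) with k(r) = exp(-r^2).
    return np.exp(-np.sum(x**2))

# A random orthonormal matrix via QR of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
x = rng.standard_normal(d)

# Rotating the argument leaves a radial function unchanged.
print(np.isclose(g(x), g(Q @ x)))  # expect: True
```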
Let us put it another way. Consider the \(n\)-dimensional polar coordinates representation of a vector, which is unique for non-null vectors: \[ x=r x^{\prime}, \quad \text { where } \quad r=\|x\| \quad \text { and } \quad x'=\frac{x}{\|x\|}.\]
Orthonormal and unitary matrices
https://danmackinlay.name/notebook/orthonormal_matrices.html
Thu, 11 Mar 2021 13:59:33 +1100
Parametrising Take the QR decomposition Iterative normalising Householder reflections Givens rotation Parametric sub families Structured Higher rank References
In which I think about parameterisations and implementations of finite dimensional energy-preserving operators, a.k.a. matrices. A particular nook in the linear feedback process library, closely related to stability in linear dynamical systems, since every orthonormal matrix is the forward operator of an energy-preserving system, which is an edge case for certain natural types of stability.
Neural nets with basis decomposition layers
https://danmackinlay.name/notebook/nn_basis.html
Tue, 09 Mar 2021 12:06:42 +1100
Neural networks with continuous basis functions Convolutional neural networks as sparse coding References
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyze neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons - perhaps the right basis gives you better priors for understanding a partial differential equation?
Automatic differentiation
https://danmackinlay.name/notebook/autodiff.html
Mon, 08 Mar 2021 11:54:00 +1100
Application to backpropagation Computational complexity Forward- versus reverse-mode Symbolic differentiation Misc Software jax Tensorflow Pytorch Julia Aesara taichi Classic python autograd Micrograd Enzyme Theano Casadi ADOL ad ceres solver audi algopy References
Gradient field in python
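As a toy illustration of what such libraries do under the hood (my own sketch, not any of the packages listed above): forward-mode differentiation can be implemented with dual numbers, carrying \((v, \dot v)\) through the computation so the derivative falls out without symbolic manipulation or finite differences.

```python
import math

class Dual:
    """A number v + eps*d with eps^2 = 0; the d slot tracks the derivative."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.v + other.v, self.d + other.d)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule, baked into arithmetic.
        return Dual(self.v * other.v, self.v * other.d + self.d * other.v)
    __rmul__ = __mul__
    def sin(self):
        # Chain rule for sin.
        return Dual(math.sin(self.v), math.cos(self.v) * self.d)

def derivative(f, x):
    """Evaluate f at Dual(x, 1); the d-slot of the result is f'(x)."""
    return f(Dual(x, 1.0)).d

# f(x) = x^2 sin(x), so f'(x) = 2x sin(x) + x^2 cos(x).
f = lambda x: x * x * x.sin()
x0 = 1.3
exact = 2 * x0 * math.sin(x0) + x0**2 * math.cos(x0)
print(abs(derivative(f, x0) - exact) < 1e-12)  # expect: True
```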
Getting your computer to tell you the gradient of a function, without resorting to finite difference approximation, or coding an analytic derivative by hand. We usually mean this in the sense of automatic forward or reverse mode differentiation, which is not, as such, a symbolic technique, but symbolic differentiation gets an incidental look-in, and these ideas do of course relate.
Matrix measure concentration inequalities and bounds
https://danmackinlay.name/notebook/matrix_concentration.html
Mon, 08 Mar 2021 11:08:41 +1100
Matrix Chernoff Matrix Chebychev Matrix Bernstein Matrix Efron-Stein Gaussian References
Concentration inequalities for matrix-valued random variables.
Recommended overviews are J. A. Tropp (2015); van Handel (2017); Vershynin (2018).
Matrix Chernoff J. A. Tropp (2015) summarises:
In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts.
Random fields as stochastic differential equations
https://danmackinlay.name/notebook/random_fields_as_sdes.html
Mon, 01 Mar 2021 17:08:40 +1100
Creating a stationary Markov SDE with desired covariance Convolution representations Covariance representation Input measures \(\mu\) is a hypercube \(\mu\) is the unit sphere \(\mu\) is an isotropic Gaussian Without stationarity via Green’s functions References
\(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\sinc}{\operatorname{sinc}}\)
The representation of certain random fields, especially Gaussian random fields, as stochastic differential equations. This is the engine that makes filtering Gaussian processes go, and is also a natural framing for probabilistic spectral analysis.
Frames and Riesz bases
https://danmackinlay.name/notebook/frames.html
Wed, 24 Feb 2021 08:47:49 +1100
References
Overcomplete basis
You want a fancy basis for your vector space? Try frames! You might care in this case about restricted isometry properties.
Morgenshtern and Bölcskei (2011):
Hilbert spaces and the associated concept of orthonormal bases are of fundamental importance in signal processing, communications, control, and information theory. However, linear independence and orthonormality of the basis elements impose constraints that often make it difficult to have the basis elements satisfy additional desirable properties.
Jax
https://danmackinlay.name/notebook/jax.html
Thu, 18 Feb 2021 07:58:37 +1100
Idioms Deep learning frameworks Haiku Flax Probabilistic programming frameworks Numpyro Stheno graph networks
jax (python) is a successor to classic python/numpy autograd. It includes various code optimisations: jit-compilation, differentiation and vectorisation.
So, a numerical library with certain high performance machine-learning affordances. Note, it is not a deep learning framework per se, but rather the producer species at the lowest trophic level of a deep learning ecosystem.
Partial differential equations
https://danmackinlay.name/notebook/pdes.html
Mon, 01 Feb 2021 07:33:46 +1100
Green’s functions Basis function methods References
Placeholder.
Green’s functions Cole’s list of Green’s Functions.
Basis function methods TBD. Operator adjoints.
Fourier transforms
https://danmackinlay.name/notebook/fourier_transforms.html
Sat, 30 Jan 2021 11:09:20 +1100
References
Placeholder.
The greatest of the integral transforms.
I especially need to learn about Fourier transforms of radial functions.
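A small numpy sanity check of the workhorse fact, the convolution theorem (my own example, sizes arbitrary): circular convolution in the time domain is pointwise multiplication in the frequency domain.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.standard_normal(n)
h = rng.standard_normal(n)

# Circular convolution computed directly, O(n^2)...
direct = np.array(
    [np.sum(x * np.roll(h[::-1], k + 1)) for k in range(n)]
)
# ...and via the FFT, O(n log n).
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

print(np.allclose(direct, via_fft))  # expect: True
```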
The Fast Fourier Transform (FFT): Most Ingenious Algorithm Ever? is a virtuosic, illustrated, surprisingly deep explanation of some ideas here (specifically the fast Fourier transform) by Reducible. Also, great animations. Related: 246B, Notes 2: Some connections with the Fourier transform, and 246B – complex analysis, at What’s new.
Integral transforms
https://danmackinlay.name/notebook/integral_transforms.html
Sat, 30 Jan 2021 11:09:20 +1100
Fourier transform Laplace transform Hankel transform Mellin transform References
\[\renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}}\]
The way we usually analytically solve integral equations and PDEs is via integral transforms. Fourier transforms, Laplace transforms, Mellin transforms, Hankel transforms…
The transforms and applications handbook, edited by Alexander D. Poularikas.
Fourier transform See Fourier transforms.
Laplace transform TBD.
Hankel transform TBD.
Random embeddings and hashing
https://danmackinlay.name/notebook/random_embedding.html
Tue, 01 Dec 2020 14:01:36 +1100
References
Separation of inputs by random projection
See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, randomised regressions etc.
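A quick sketch of the basic phenomenon (my own example, nothing tuned): a random Gaussian projection from 1000 down to 200 dimensions roughly preserves pairwise distances, in the spirit of the Johnson-Lindenstrauss lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 200

X = rng.standard_normal((n, d))               # data: n points in R^d
P = rng.standard_normal((d, k)) / np.sqrt(k)  # random projection, scaled
Y = X @ P                                     # projected points in R^k

def pdists(Z):
    """All pairwise Euclidean distances between rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))[np.triu_indices(len(Z), 1)]

ratio = pdists(Y) / pdists(X)
# Distance ratios concentrate near 1, roughly 1 ± O(1/sqrt(k)).
```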
Occasionally we might use non-linear projections to increase the dimensionality of our data in the hope of making a non-linear regression approximately linear, which dates back to Cover (1965).
Cover’s Theorem (Cover 1965):
It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems.
Randomised regression
https://danmackinlay.name/notebook/randomised_regression.html
Tue, 01 Dec 2020 14:00:10 +1100
References
Tackling your regression by using random projections of the predictors.
Usually this means using those projections to reduce the dimensionality of a high dimensional regression. In this case it is not far from compressed sensing, except in how we handle noise. In this linear model case, this is of course random linear algebra, and may be a randomised matrix factorisation.
I am especially interested in seeing how this might be useful for dependent data, especially time series.
Recommender systems
https://danmackinlay.name/notebook/recommender_systems.html
Mon, 30 Nov 2020 14:55:18 +1100
References
Not my area, but I need a landing page to refer to for some non-specialist contacts of mine.
I am most familiar with the matrix factorization approaches (e.g. factorization machines, NNMF) but there are many, e.g. variational autoencoder approaches are en vogue.
An overview by Javier lists many approaches.
Most Popular recommendations (the baseline); Item-User similarity based recommendations; kNN Collaborative Filtering recommendations; GBM based recommendations; Non-Negative Matrix Factorization recommendations; Factorization Machines (Steffen Rendle 2010); Field Aware Factorization Machines (Yuchin Juan et al. 2016); Deep Learning based recommendations (Wide and Deep, Heng-Tze Cheng et al. 2016); Neural Collaborative Filtering (Xiangnan He et al.
Probabilistic spectral analysis
https://danmackinlay.name/notebook/probabilistic_spectral_analysis.html
Wed, 25 Nov 2020 11:33:34 +1100
Classic: stochastic processes studied via correlation function Non-stationary spectral kernel Change point detection version Non-Gaussian approaches References
Graphical introduction to nonstationary modelling of audio data. The input (bottom) is a sound recording of female speech. We seek to decompose the signal into Gaussian process carrier waveforms (blue block) multiplied by a spectrogram (green block). The spectrogram is learned from the data as a nonnegative matrix of weights times positive modulators (top).
Hidden Markov Model inference for Gaussian Process regression
https://danmackinlay.name/notebook/gp_filtering.html
Wed, 25 Nov 2020 11:28:43 +1100
Spatio-temporal usage Miscellaneous notes towards implementation References
Classic flavours together: Gaussian processes and state filters / stochastic differential equations, and random fields as stochastic differential equations.
I am interested here in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering. This means you get to solve a GP as an SDE.
Weighted data in statistics
https://danmackinlay.name/notebook/weighted_data.html
Fri, 06 Nov 2020 08:48:18 +1100
Thomas Lumley helpfully disambiguates the “three and a half distinct uses of the term weights in statistical methodology”.
The three main types of weights are
1. the ones that show up in the classical theory of weighted least squares. These describe the precision (1/variance) of observations. … I call these precision weights; Stata calls them analytic weights.
2. the ones that show up in categorical data analysis. These describe cell sizes in a data set, so a weight of 10 means that there are 10 identical observations in the dataset, which have been compressed to a covariate pattern plus a count.
Inverse problems for complex models
https://danmackinlay.name/notebook/inverse_problems_for_complex_models.html
Tue, 13 Oct 2020 12:07:35 +1100
References
Inverse problems where the model is more or less a black box.
Sparse model selection
https://danmackinlay.name/notebook/sparse_model_selection.html
Fri, 02 Oct 2020 17:50:51 +1000
FOCI Stability selection Relaxed Lasso Dantzig Selector Garotte Degrees-of-freedom penalties References
On choosing the right model and regularisation parameter in sparse regression, which turn out to be nearly the same problem, and closely coupled to doing the regression. There are some wrinkles.
🏗 Talk about when degrees-of-freedom penalties work, when cross-validation does, and so on.
FOCI The new hotness sweeping the world is FOCI, a sparse model selection procedure (Azadkia and Chatterjee 2019) based on Chatterjee’s ξ statistic as an independence test.
Data summarization
https://danmackinlay.name/notebook/data_summarization.html
Fri, 18 Sep 2020 06:21:41 +1000
Coresets representative subsets Directly approximate log likelihood References
Summary statistics which don’t require you to keep all the data but which allow you to do inference nearly as well. E.g. sufficient statistics in exponential families allow you to do certain kinds of inference perfectly without anything except summaries. Methods such as variational Bayes summarize data by maintaining a posterior density (usually a mixture model) as a summary of all the data, at some cost in accuracy.
https://danmackinlay.name/notebook/dimensionality_reduction.html
Fri, 11 Sep 2020 08:20:03 +1000
Bayes Learning a summary statistic Feature selection PCA and cousins Learning a distance metric UMAP For indexing my database Locality Preserving projections Diffusion maps As manifold learning Multidimensional scaling Random projection Stochastic neighbour embedding and other visualisation-oriented methods Autoencoder and word2vec Misc References
🏗🏗🏗🏗🏗
I will restructure learning on manifolds and dimensionality reduction into a more useful distinction.
You have lots of predictors in your regression model!
Online learning
https://danmackinlay.name/notebook/online_learning.html
Wed, 26 Aug 2020 16:48:40 +1000
Mirror descent Follow-the-regularized leader Parameter-free Covariance References
An online learning perspective gives bounds on the regret: the gap in performance between online estimation and the optimal estimator when we have access to the entire data.
A lot of things are sort-of online learning; stochastic gradient descent, for example, is closely related. However, if you meet someone who claims to study “online learning” they usually mean to emphasise particular things.
(Approximate) matrix factorisation
https://danmackinlay.name/notebook/matrix_factorisation.html
Fri, 03 Jul 2020 19:51:38 +1000
Why does it ever work Overviews Non-negative matrix factorisations As regression Sketching \([\mathcal{H}]\)-matrix methods Randomized methods Connections to kernel learning Implementations References
Forget QR and LU decompositions; there are now so many ways of factorising matrices that there are not enough acronyms in the alphabet to hold them, especially if you suspect your matrix is sparse, or could be made sparse because of some underlying constraint, or probably could, if squinted at in the right fashion, be seen as a graph transition matrix, or Laplacian, or noisy transform of some smooth object, or at least would be close to sparse if you chose the right metric, or…
Automatic differentiation in Julia
https://danmackinlay.name/notebook/julia_autodiff.html
Fri, 05 Jun 2020 12:22:53 +1000
Julia has an embarrassment of different methods of automatic differentiation (homoiconicity and introspection make this comparatively easy) and it’s not always clear what the comparative selling points of each are.
The juliadiff project produces ForwardDiff.jl and ReverseDiff.jl which do what I would expect, namely autodiff in forward and reverse mode respectively. ForwardDiff claims to be advanced. ReverseDiff works but is abandoned.
ForwardDiff implements methods to take derivatives, gradients, Jacobians, Hessians, and higher-order derivatives of native Julia functions.
Matrix calculus
https://danmackinlay.name/notebook/matrix_calculus.html
Tue, 19 May 2020 12:00:06 +1000
Matrix differentials Indexed tensor calculus References
We can generalise the high school calculus, which is about scalar functions of a scalar argument, in various ways to handle matrix-valued functions or matrix-valued arguments. One could generalise this further to full tensor calculus. But it happens that specifically matrix/vector operations are at a useful point of complexity for lots of algorithms, kind of an MVP. (I usually want this for higher order gradient descent.)
Mixture models for density estimation
https://danmackinlay.name/notebook/mixture_models.html
Fri, 24 Apr 2020 14:02:02 +1000
Moments of a mixture Mixture zoo “Classic Mixtures” Continuous mixtures Bayesian Dirichlet mixtures Non-affine mixtures In Bayesian variational inference Estimation/selection methods (Local) maximum likelihood Method of moments Minimum distance Regression smoothing formulation Convergence and model selection Large sample results for mixtures Finite sample results for mixtures Sieve method Akaike Information criterion Quantization and coding theory Minimum description length/BIC Unsatisfactory thing: scale parameter selection theory Connection to Mercer kernel methods Miscellany References
pyxelate uses mixture models to create pixel art colour palettes
Learning summary statistics
https://danmackinlay.name/notebook/learning_summary_statistics.html
Wed, 22 Apr 2020 14:40:07 +1000
References
A dimensionality reduction/feature engineering trick for likelihood-free inference methods such as indirect inference or approximate Bayes computation.
TBD. See de Castro and Dorigo (2019):
Simulator-based inference is currently at the core of many scientific fields, such as population genetics, epidemiology, and experimental particle physics. In many cases the implicit generative procedure defined in the simulation is stochastic and/or lacks a tractable probability density p(x|θ), where θ ∈ Θ is the vector of model parameters.
Cross validation
https://danmackinlay.name/notebook/cross_validation.html
Fri, 13 Mar 2020 09:13:31 +0800
Basic Cross Validation Generalised Cross Validation Caveats References
On substituting simulation for analysis in model selection, e.g. in choosing the “right” regularisation parameter for sparse regression.
Asymptotically equivalent to generalised Akaike information criteria (e.g. Stone (1977)). Related to the bootstrap in various ways.
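A bare-bones sketch of the simulation-for-analysis substitution (synthetic data, and my own arbitrary grid of ridge penalties): use K-fold cross validation to pick a regularisation parameter.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 120, 10
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:3] = [2.0, -1.0, 0.5]                 # sparse-ish truth
y = X @ beta + 0.5 * rng.standard_normal(n)

def ridge_fit(X, y, lam):
    """Ridge regression: solve (X'X + lam I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, K=5):
    """Mean held-out squared error over K folds."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[test] - X[test] @ b) ** 2))
    return np.mean(errs)

lams = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = [cv_error(X, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
# With this much signal, CV should not pick the heaviest penalty.
```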
The computationally expensive default option when your model doesn’t have any obvious short cuts for complexity regularization, for example when AIC cannot be shown to work.
Restricted isometry properties
https://danmackinlay.name/notebook/restricted_isometry_props.html
Mon, 09 Mar 2020 15:02:39 +1100
Restricted Isometry Irrepresentability Incoherence Frame theory References
Restricted isometry properties, a.k.a. uniform uncertainty principles (E. Candès and Tao 2005; E. J. Candès, Romberg, and Tao 2006), mutual incoherence (David L. Donoho 2006; D. L. Donoho, Elad, and Temlyakov 2006), irrepresentability conditions (Zhao and Yu 2006)…
This is mostly notes while I learn some definitions; expect no actual thoughts.
Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing etc.
Divisibility, decomposability, stability
https://danmackinlay.name/notebook/divisible_distributions.html
Tue, 28 Jan 2020 12:48:19 +1100
Infinitely divisible Decomposable Self-decomposable Stable Induced processes References
🏗 All of these are about sums; but presumably we can construct this over other algebraic structures of distributions, e.g. max-stable processes.
For now, some handy definition disambiguation.
Infinitely divisible The Lévy process quality.
A probability distribution is infinitely divisible if it can be expressed as the probability distribution of the sum of any arbitrary natural number of independent and identically distributed random variables.
Cherchez la martingale
https://danmackinlay.name/notebook/martingales.html
Sat, 30 Nov 2019 18:09:40 +0100
References
Like Markov processes, a weirdly useful class of stochastic processes. Often you can find a martingale within some stochastic process, or construct a martingale from a stochastic process and prove something nifty thereby; this idea connects and solves a bunch of tricky problems at once.
TODO: examples, maybe a CLT and something else wacky like the life table estimators of Aalen (1978).
I am indebted to Saif Syed for setting my head straight about the utility of martingales, and Kevin Ross who, in part of Amir Dembo’s course materials, was the one whose explanation of the orthogonality interpretation of martingales finally communicated the neatness of this idea to me.
Sparse coding
https://danmackinlay.name/notebook/sparse_coding.html
Tue, 05 Nov 2019 16:28:28 +0100
Resources Wavelet bases Matching Pursuits Learnable codings Codings with desired invariances Misc Implementations References
Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled enough throughout the field.
Optimal control
https://danmackinlay.name/notebook/optimal_control.html
Fri, 01 Nov 2019 12:58:54 +1100
Nuts and bolts Online References
Nothing to see here; I don’t do optimal control. But here are some notes for when I thought I might.
Feedback Systems: An Introduction for Scientists and Engineers by Karl J. Åström and Richard M. Murray is an interesting control systems theory course from Caltech.
The online control blog post mentioned below has a summary:
Perhaps the most fundamental setting in control theory is an LDS with quadratic costs \(c_t\) and i.
Gaussian processes on lattices
https://danmackinlay.name/notebook/gp_on_lattices.html
Wed, 30 Oct 2019 13:23:08 +1100
References
Gaussian Processes with a stationary kernel are faster if you are working on a grid of points. The main tricks here seem to be circulant embeddings and circulant approximations, which enable one to leverage fast Fourier transforms. This complements, perhaps, the trick of filtering Gaussian processes.
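The core fact, in a few lines of numpy (a generic illustration of my own, not from the references): the eigenvalues of a circulant matrix are the DFT of its first column, which is why a stationary kernel on a circular grid can be diagonalised, sampled and solved with the FFT.

```python
import numpy as np

n = 64
x = np.arange(n)
# A stationary, squared-exponential-ish covariance on a circular grid:
# distance is measured around the circle, so the matrix is circulant.
dist = np.minimum(x, n - x)
c = np.exp(-(dist**2) / 20.0)                       # first column
C = np.array([np.roll(c, k) for k in range(n)]).T   # full circulant matrix

# Eigenvalues of a circulant matrix = FFT of its first column.
eig_fft = np.fft.fft(c).real
eig_direct = np.linalg.eigvalsh(C)

print(np.allclose(np.sort(eig_fft), np.sort(eig_direct)))  # expect: True
```

Sampling a stationary Gaussian field on the grid then costs one FFT of white noise scaled by the square roots of those eigenvalues, rather than a dense Cholesky factorisation.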
Sparse regression
https://danmackinlay.name/notebook/sparse_regression.html
Thu, 24 Oct 2019 12:34:31 +1100
LASSO Adaptive LASSO LARS Graph LASSO Elastic net Grouped LASSO Model selection Debiased LASSO Sparse basis expansions Sparse neural nets Other coefficient penalties Other prediction losses Bayesian Lasso Implementations Tidbits References
Penalised regression where the penalties are sparsifying. The prediction losses could be anything: likelihood, least-squares, robust Huberised losses, absolute deviation etc.
I will play fast and loose with terminology here regarding theoretical and empirical losses, and the statistical models we attempt to fit.
Delays and reverbs for audio processing
https://danmackinlay.name/notebook/delays.html
Tue, 22 Oct 2019 14:30:13 +1100
Designing stable delays Designing allpass delays Designing delay lengths Delays for signal interpolations Things to try References
In which I think about parameterisations and implementations of audio recurrence for use in music.
A particular nook in the linear feedback process library.
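A minimal feedback delay (comb filter) sketch in numpy, with made-up delay length and gain, just to fix ideas: each pass through the loop re-emits the input attenuated by the feedback gain.

```python
import numpy as np

def feedback_comb(x, delay=10, g=0.5):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay].
    Stable (energy decays) for |g| < 1."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

# Impulse response: echoes at multiples of the delay, decaying by g each time.
impulse = np.zeros(50)
impulse[0] = 1.0
h = feedback_comb(impulse)
# h[0] == 1.0, h[10] == 0.5, h[20] == 0.25, everything else zero.
```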
Designing stable delays Also, parameterising stable Multi-Input-Multi-Output (MIMO) systems in signal processing can be done by using orthogonal and unitary matrices as the transfer operator, parameterising them as stable linear systems.
Non-negative matrix factorisation
https://danmackinlay.name/notebook/nnmf.html
Mon, 14 Oct 2019 15:56:01 +1100
References
A cute hack in the world of sparse matrix factorisation, where the goal is to decode an element-wise non-negative matrix into a product of two smaller matrices, which looks a lot like sparse coding if you squint at it.
David Austin gives a simple introduction to classic non-negative matrix factorization for the American Mathematical Society.
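The classic Lee-Seung multiplicative updates for the \(l_2\) loss fit in a few lines (a generic sketch; the rank, iteration count and test matrix are arbitrary choices of mine):

```python
import numpy as np

def nmf(V, rank=3, n_iter=500, eps=1e-9, seed=0):
    """Factor V ≈ W @ H with W, H elementwise non-negative,
    via Lee-Seung multiplicative updates for the Frobenius loss.
    Multiplicative updates preserve non-negativity by construction."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A matrix with exact non-negative rank 3 should be recovered well.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf(V)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```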
This method is famous for decomposing things into parts in a sparse way using \(l_2\) loss.
Random matrix theory
https://danmackinlay.name/notebook/random_matrix.html
Thu, 10 Oct 2019 13:49:25 +1100
To read References
Matrices with distributions for the elements give rise to random matrix theory.
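For a taste, a Monte Carlo sketch (sizes and tolerances are my own arbitrary choices): the eigenvalues of a large symmetric matrix with i.i.d. standard Gaussian entries essentially fill the interval \([-2\sqrt{n}, 2\sqrt{n}]\), as Wigner's semicircle law predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
A = rng.standard_normal((n, n))
S = (A + A.T) / np.sqrt(2)   # symmetric; off-diagonal entries ~ N(0, 1)

eigs = np.linalg.eigvalsh(S)
# The spectrum concentrates on [-2 sqrt(n), 2 sqrt(n)]; a histogram of
# eigs would trace out the semicircle density.
edge = 2 * np.sqrt(n)
frac_inside = np.mean(np.abs(eigs) <= 1.1 * edge)
```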
Where you consider a “matrix valued random variable” because you are discussing a classic distribution, the Wishart distribution or whatever, those are also random matrices, obviously, but not the ones that would usually occur to one when thinking of random matrix theory. The archetypal result in capitalised Random Matrix Theory is Wigner’s semicircle law, which gives the distribution of eigenvalues in growing symmetric square matrices with a certain element-wise real distribution.
State filtering parameters
https://danmackinlay.name/notebook/recursive_estimation.html
Tue, 01 Oct 2019 15:33:56 +1000
Classic recursive estimation Iterated filtering Questions Basic Construction Awaiting filing Implementations References
a.k.a. state space model calibration, recursive identification. Sometimes indistinguishable from online estimation.
State filters are cool for estimating time-varying hidden states given known fixed system parameters. How about learning those parameters of the model generating your states? Classic ways that you can do this in dynamical systems include basic linear system identification, and general system identification.
Correlograms
https://danmackinlay.name/notebook/correlograms.html
Sun, 22 Sep 2019 13:23:31 +1000
References
This material is revised and expanded from the appendix of draft versions of a recent conference submission, for my own reference. I used (deterministic) correlograms a lot in that, and it was startlingly hard to find a decent summary of their properties anywhere. Nothing new here, but… see the material about doing this in a probabilistic way via Wiener-Khintchine representation and covariance kernels, which leads to a natural probabilistic spectral analysis.
Nonparametric state filters via Gaussian Processes
https://danmackinlay.name/notebook/gp_state_filters.html
Wed, 18 Sep 2019 10:21:15 +1000https://danmackinlay.name/notebook/gp_state_filters.htmlReferences Two classic flavours together, Gaussian Processes and state filters. There are other nonparametric state filters, e.g. Variational filters and particle filters.
This is a kind of a dual to using a state filter to calculate a Gaussian process regression as a computational shorthand.
Here we use Gaussian processes to define the filter, in particular to learn nonparametric transition, observation or state densities for a generalized Kalman filter.
Fourier interpolation
https://danmackinlay.name/notebook/fourier_interpolation.html
Wed, 19 Jun 2019 13:11:40 +0200https://danmackinlay.name/notebook/fourier_interpolation.htmlMinimum curvature interpolant Derivatives References \[\renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\cc}[1]{\mathcal{#1}}\]
Video
Jezzamon’s Fourier hand
a.k.a. spectral resampling/differentiation/integration.
Rick Lyons, How to Interpolate in the Time-Domain by Zero-Padding in the Frequency Domain. Also more classic Rick Lyons: FFT Interpolation Based on FFT Samples: A Detective Story With a Surprise Ending.
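The zero-padding recipe in miniature (my own sketch, not Lyons' code): insert zeros in the middle of the spectrum, inverse-transform, and rescale by the length ratio; for a band-limited input the result interpolates the original samples exactly.

```python
import numpy as np

n, m = 16, 64  # original and upsampled lengths
t = np.arange(n) / n
x = np.sin(2 * np.pi * 3 * t)  # a single harmonic, well inside the band

X = np.fft.fft(x)
# Split the spectrum between positive and negative frequencies, pad
# zeros in the middle, and rescale by m/n to preserve amplitude.
Xpad = np.concatenate([X[: n // 2], np.zeros(m - n, dtype=complex), X[n // 2 :]])
y = np.fft.ifft(Xpad).real * (m / n)

t_fine = np.arange(m) / m  # y interpolates x onto this finer grid
```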
Steven Johnson’s Notes on FFT-based differentiation is all I really need here; it points out a couple of subtleties about DTFT-based differentiation of functions.
(Weighted) least squares fits
https://danmackinlay.name/notebook/least_squares.html
Wed, 22 May 2019 11:52:37 +1000https://danmackinlay.name/notebook/least_squares.htmlIteratively reweighted References A classic. Surprisingly deep.
A few non-comprehensive notes on approximating things by the arbitrary-but-convenient expedient of minimising the sum of the squares of the deviations.
As used in many, many problems, e.g. lasso regression.
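A minimal weighted least squares sketch (my own, using numpy's lstsq; the design and noise scales are made up): with heteroscedastic noise, whitening each row by its noise standard deviation reduces the problem to ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
beta_true = np.array([0.5, 2.0])
sigma = np.where(np.arange(n) < n // 2, 0.1, 1.0)  # heteroscedastic noise
y = X @ beta_true + sigma * rng.standard_normal(n)

# Weighted least squares: divide each observation by its noise standard
# deviation, then solve the resulting ordinary least squares problem.
w = 1.0 / sigma
beta_hat, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
```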
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems.
Nearly sufficient statistics
https://danmackinlay.name/notebook/nearly_sufficient_statistics.html
Mon, 14 Jan 2019 15:10:53 +1100https://danmackinlay.name/notebook/nearly_sufficient_statistics.htmlSufficient statistics in exponential families References 🏗
I’m working through a small realisation, for my own interest, which has been helpful in my understanding of variational Bayes; specifically, relating it to non-Bayesian variational inference. Also sequential Monte Carlo.
By starting from the idea of sufficient statistics, we come to the idea of variational inference in a natural way, via some other interesting stopovers.
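A concrete instance of that starting point (my own sketch): for a Gaussian sample, the triple (n, sum of x, sum of x squared) is sufficient, so the maximum-likelihood estimates computed from those three numbers match the ones computed from the raw data.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=2.0, size=1000)

# Sufficient statistics for the Gaussian: nothing else about the
# sample is needed to recover the MLE of mean and variance.
n, s1, s2 = len(x), x.sum(), (x ** 2).sum()
mu_hat = s1 / n
var_hat = s2 / n - mu_hat ** 2
```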
Consider the Bayes filtering setup.
Decaying sinusoid dictionaries
https://danmackinlay.name/notebook/decaying_sinusoids.html
Mon, 07 Jan 2019 11:45:01 +1100https://danmackinlay.name/notebook/decaying_sinusoids.htmlInner products of decaying sinusoidal atoms Normalizing decaying sinusoidal atoms Normalizing decaying sinusoidal molecules To file References \(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}[1]{\mathbb{I}\{#1\}} \renewcommand{\inner}[2]{\langle #1,#2\rangle} \renewcommand{\Inner}[2]{\left\langle #1,#2\right\rangle} \renewcommand{\argmax}{\mathop{\mathrm{argmax}}} \renewcommand{\argmin}{\mathop{\mathrm{argmin}}} \renewcommand{\omp}{\mathop{\mathrm{OMP}}}\)
Notes on some calculations with decaying sinusoid atoms as a sparse dictionary basis.
Consider an \(L_2\) signal \(f: \bb{R}\to\bb{R}.\) We will overload notation and write it with a free argument \(\xi\), so that \(f(r\xi-\phi),\) for example, refers to the signal \(\xi\mapsto f(r\xi-\phi).\)
Variational state filtering
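For a flavour of the atom calculations (my own sketch, with made-up gamma and omega): the squared L2 norm of the decaying sinusoid atom exp(-gamma*t)*sin(omega*t) on t >= 0 has the closed form omega^2 / (4*gamma*(gamma^2 + omega^2)), obtained from sin^2 = (1 - cos 2wt)/2 and standard Laplace-type integrals, which a crude Riemann sum confirms.

```python
import numpy as np

# Atom g(t) = exp(-gamma * t) * sin(omega * t) on t >= 0
gamma, omega = 0.5, 2 * np.pi * 3.0

# Closed-form squared L2 norm:
#   ||g||^2 = omega^2 / (4 * gamma * (gamma^2 + omega^2))
norm2_closed = omega**2 / (4 * gamma * (gamma**2 + omega**2))

# Numerical check: Riemann sum on a fine grid; the tail beyond t = 40
# is negligible since exp(-2 * gamma * 40) is astronomically small.
dt = 1e-4
t = np.arange(0.0, 40.0, dt)
g = np.exp(-gamma * t) * np.sin(omega * t)
norm2_num = np.sum(g**2) * dt
```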
https://danmackinlay.name/notebook/state_filters_variational.html
Fri, 07 Dec 2018 12:39:45 +1100https://danmackinlay.name/notebook/state_filters_variational.htmlReferences A placeholder: state filtering and estimation where the unobserved state and/or process noise are variationally learned distributions. For now the only version that is even peripherally related to my work is the Gaussian process state filter.
References Archer, Evan, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. 2015. “Black Box Variational Inference for State Space Models.” November 23, 2015. http://arxiv.org/abs/1511.07367. Bayer, Justin, and Christian Osendorfer.