hilbert_space on Dan MacKinlay
https://danmackinlay.name/tags/hilbert_space.html
Recent content in hilbert_space on Dan MacKinlay. Last updated Tue, 13 Apr 2021 14:40:14 +0800.

Gaussian process regression
https://danmackinlay.name/notebook/gp_regression.html
Tue, 13 Apr 2021 14:40:14 +0800
Quick intro Density estimation Kernels Using state filtering On lattice observations On manifolds By variational inference With inducing variables By variational inference with inducing variables With vector output Approximation with dropout For dimension reduction Readings Implementations Geostat Framework GPy Stheno GPyTorch GPFlow Misc python Stan AutoGP scikit-learn Misc julia MATLAB References
Chi Feng’s GP regression demo.
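A minimal numpy-only sketch of GP regression (my own illustration, not the notebook's code): posterior mean and covariance under a squared-exponential kernel, straight from the standard equations.

```python
import numpy as np

def rbf(xa, xb, ell=1.0, sigma=1.0):
    # Squared-exponential (RBF) covariance kernel on 1-d inputs.
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    # Textbook GP regression: posterior mean and covariance at x_test.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    Kss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

x = np.array([-1.0, 0.0, 1.0])
y = np.sin(x)
xs = np.array([0.0])
mu, cov = gp_posterior(x, y, xs)  # near-interpolation at a training point
```

With small observation noise the posterior mean nearly interpolates the training data, and the posterior variance collapses at observed locations.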
Gaussian random fields are stochastic processes/fields with jointly Gaussian distributions of observations.

Dynamical systems via Koopman operators
https://danmackinlay.name/notebook/koopmania.html
Fri, 09 Apr 2021 11:46:21 +0800
References
NB: Koopman here is B.O. Koopman (Koopman 1931) not S.J. Koopman, who also works in dynamical systems.
I do not know how this works, but maybe this fragment of abstract will do for now (Budišić, Mohr, and Mezić 2012):
A majority of methods from dynamical system analysis, especially those in applied settings, rely on Poincaré’s geometric picture that focuses on “dynamics of states.”

Kernel zoo
https://danmackinlay.name/notebook/kernel_zoo.html
Tue, 30 Mar 2021 14:20:40 +1100
Stationary dot-product NN kernels NN Erf kernel Arc-cosine kernel Causal kernels Wiener process kernel Squared exponential Rational Quadratic Matérn Periodic Locally periodic “Integral” kernel Composed kernels Stationary spectral kernels Nonstationary spectral kernels Compactly supported Markov kernels Genton kernels Kernels with desired symmetry Stationary reducible kernels Other nonstationary kernels References
What follows are some useful kernels to have in my toolkit, mostly over \(\mathbb{R}^n\) or at least some space with a metric.

Neural nets with basis decomposition layers
https://danmackinlay.name/notebook/nn_basis.html
Tue, 09 Mar 2021 12:06:42 +1100
Neural networks with continuous basis functions Convolutional neural networks as sparse coding References
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyze neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons - perhaps the right basis gives you better priors for understanding a partial differential equation?

Combining kernels
https://danmackinlay.name/notebook/kernel_combo.html
Mon, 01 Mar 2021 19:53:09 +1100
Locally stationary kernels Stationary reducible kernels Other nonstationary kernels References
A sum or product (or outer sum, or tensor product) of kernels is still a kernel. For other transforms YMMV.
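The closure claim is easy to spot-check numerically. A sketch of my own (numpy only): build Gram matrices for the sum and product of two base kernels and confirm each is (numerically) positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))

def gram(k, x):
    return np.array([[k(a, b) for b in x] for a in x])

# Two base kernels: squared exponential and a (shifted) dot-product kernel.
k1 = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))
k2 = lambda a, b: 1.0 + a @ b

for k in (lambda a, b: k1(a, b) + k2(a, b),   # sum of kernels
          lambda a, b: k1(a, b) * k2(a, b)):  # product of kernels
    K = gram(k, x)
    # A valid kernel must yield a PSD Gram matrix (up to roundoff).
    assert np.linalg.eigvalsh(K).min() > -1e-9
```

The product case is the Schur product theorem in action: the elementwise product of two PSD matrices is PSD.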
For example, in the case of Gaussian processes, suppose that, independently,
\[\begin{aligned} f_{1} &\sim \mathcal{GP}\left(\mu_{1}, k_{1}\right)\\ f_{2} &\sim \mathcal{GP}\left(\mu_{2}, k_{2}\right) \end{aligned}\] then
\[ f_{1}+f_{2} \sim \mathcal{GP} \left(\mu_{1}+\mu_{2}, k_{1}+k_{2}\right) \] so \(k_{1}+k_{2}\) is also a kernel.

Convolutional Gaussian processes
https://danmackinlay.name/notebook/gp_convolution.html
Mon, 01 Mar 2021 17:08:51 +1100
Convolutions with respect to a non-stationary driving noise Varying convolutions with respect to a stationary white noise References
Gaussian processes by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) while still allowing non-stationarity.
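The basic construction, sketched in numpy (my own toy, not the notebook's code): convolve discretised white noise with a Gaussian smoothing kernel, which yields an approximate draw from a stationary GP whose covariance is the kernel's self-convolution (squared-exponential, with lengthscale inflated by \(\sqrt{2}\)).

```python
import numpy as np

rng = np.random.default_rng(1)
n, dx = 2000, 0.01
t = np.arange(n) * dx

# Discretised driving white noise: variance 1/dx per sample.
w = rng.normal(scale=np.sqrt(1 / dx), size=n)

# Gaussian smoothing kernel with lengthscale ell.
ell = 0.1
u = np.arange(-5 * ell, 5 * ell, dx)
h = np.exp(-0.5 * (u / ell) ** 2) * dx  # dx = Riemann weight for the integral

# f(t) = ∫ h(t - s) dW(s), approximated by discrete convolution.
f = np.convolve(w, h, mode="same")
```

Making the smoothing kernel vary with location is exactly how this construction buys non-stationarity, while its compact support buys locality.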
Convolutions with respect to a non-stationary driving noise H. K.

Convolutional stochastic processes
https://danmackinlay.name/notebook/stochastic_convolution.html
Mon, 01 Mar 2021 16:13:24 +1100
References
Stochastic processes generated by convolution of white noise with smoothing kernels, which is not unlike kernel density estimation where the “data” is random.
For now, I am mostly interested in certain special cases: Gaussian process convolutions and subordinator convolutions.
patrick-kidger/Deep-Signature-Transforms: Code for "Deep Signature Transforms" patrick-kidger/signatory: Differentiable computations of the signature and logsignature transforms, on both CPU and GPU. References Bolin, David.

Covariance functions
https://danmackinlay.name/notebook/kernel_learning.html
Mon, 01 Mar 2021 13:25:10 +1100
Learning kernel hyperparameters Learning kernel composition Hyperkernels References
This is usually in the context of Gaussian processes, where everything can work out nicely if you are lucky, but other kernel machines are OK too. The goal for most of these is to maximise the marginal posterior likelihood, a.k.a. model evidence, as is conventional in Bayesian ML.
Learning kernel hyperparameters 🏗
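A bare-bones illustration of evidence maximisation (my own sketch, numpy only): simulate data from a GP with a known lengthscale, then pick the lengthscale on a small grid that maximises the log marginal likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(x, ell):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(y, K, noise=1e-2):
    # log N(y | 0, K + noise*I): the GP model evidence.
    Ky = K + noise * np.eye(len(y))
    sign, logdet = np.linalg.slogdet(Ky)
    return -0.5 * (y @ np.linalg.solve(Ky, y) + logdet
                   + len(y) * np.log(2 * np.pi))

# Simulate from a GP with lengthscale 0.5, then maximise evidence on a grid.
x = np.linspace(0, 5, 80)
true_ell = 0.5
L = np.linalg.cholesky(rbf(x, true_ell) + 1e-8 * np.eye(len(x)))
y = L @ rng.normal(size=len(x))

grid = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
best = max(grid, key=lambda ell: log_marginal_likelihood(y, rbf(x, ell)))
```

In practice one would differentiate the log marginal likelihood with respect to the hyperparameters and use gradient ascent rather than a grid, but the objective is the same.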
Learning kernel composition Automating kernel design by some composition of simpler atomic kernels.

Frames and Riesz bases
https://danmackinlay.name/notebook/frames.html
Wed, 24 Feb 2021 08:47:49 +1100
References
Overcomplete basis
You want a fancy basis for your vector space? Try frames! You might care in this case about restricted isometry properties.
Morgenshtern and Bölcskei (Morgenshtern and Bölcskei 2011):
Hilbert spaces and the associated concept of orthonormal bases are of fundamental importance in signal processing, communications, control, and information theory. However, linear independence and orthonormality of the basis elements impose constraints that often make it difficult to have the basis elements satisfy additional desirable properties.

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_vector.html
Tue, 23 Feb 2021 12:09:36 +1100
Co-regionalization Multi-task Multi Output Spectral Mixture Kernel References
In which I discover for myself whether “multi-task” and “co-regionalized” approaches are different. Álvarez, Rosasco, and Lawrence (2012)
Overview from Invenia: Gaussian Processes: from one to many outputs
Co-regionalization [the] community has begun to turn its attention to covariance functions for multiple outputs. One of the paradigms that has been considered (Bonilla, Chai, and Williams 2007; Osborne et al.

Polynomial bases
https://danmackinlay.name/notebook/polynomial_bases.html
Wed, 17 Feb 2021 13:08:53 +1100
Fun things Well known facts Zoo References
Placeholder.
Fun things Terry Tao on Conversions between standard polynomial bases.
Well known facts Xiu and Karniadakis (2002) mention the following “Well known facts”:
All orthogonal polynomials \(\left\{Q_{n}(x)\right\}\) satisfy a three-term recurrence relation \[ -x Q_{n}(x)=A_{n} Q_{n+1}(x)-\left(A_{n}+C_{n}\right) Q_{n}(x)+C_{n} Q_{n-1}(x), \quad n \geq 1 \] where \(A_{n}, C_{n} \neq 0\) and \(C_{n} / A_{n-1}>0.\) Together with \(Q_{-1}(x)=0\) and \(Q_{0}(x)=1,\) all \(Q_{n}(x)\) can be determined by the recurrence relation.

Stability in linear dynamical systems
https://danmackinlay.name/notebook/stability_dynamical_linear.html
Tue, 16 Feb 2021 08:23:02 +1100
Pole representations Reparameterisation Continuous time Stability and gradient descent References
The intersection of linear dynamical systems and stability of dynamic systems.
There is not much content here because I spent 2 years working on it and am too traumatised to revisit it.
Informally, I am admitting as “stable” any dynamical system which does not explode super-polynomially fast; we can think of these as systems where, if the system is not stationary, then at least the rate of change might be.

Chaos expansions
https://danmackinlay.name/notebook/chaos_expansion.html
Mon, 15 Feb 2021 10:53:01 +1100
Polynomial chaos expansion “Generalized” chaos expansion Arbitrary chaos expansion References
Placeholder, for a topic which has a slightly confusing name. To explore: connection to/difference from other methods of keeping track of evolution of uncertainty in dynamical systems. C&C Gaussian process regression as used in Gratiet, Marelli, and Sudret (2016), functional data analysis etc.
This is not the same thing as chaos in the sense of the deterministic chaos made famous by dynamical systems theory and fractal t-shirts.

Fourier transforms
https://danmackinlay.name/notebook/fourier_transforms.html
Sat, 30 Jan 2021 11:09:20 +1100
References
Placeholder.
The greatest of the integral transforms.
I especially need to learn about Fourier transforms of radial functions.
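A sanity-check sketch of my own, for calibrating intuitions: the FFT computes exactly the \(O(n^2)\) discrete Fourier transform, just in \(O(n \log n)\).

```python
import numpy as np

def dft(x):
    # Direct O(n^2) discrete Fourier transform, straight from the definition:
    # X_k = sum_n x_n exp(-2*pi*i*k*n / N).
    n = len(x)
    k = np.arange(n)
    W = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return W @ x

rng = np.random.default_rng(3)
x = rng.normal(size=64)
X_slow = dft(x)
X_fast = np.fft.fft(x)  # same transform, computed in O(n log n)
```

The two agree to machine precision; the asymptotic gap between them is the entire point of the Reducible video above.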
The Fast Fourier Transform (FFT): Most Ingenious Algorithm Ever? is a virtuosic, illustrated, surprisingly deep explanation of some ideas here (specifically the fast Fourier transform) by Reducible. Also, great animations. 246B, Notes 2: Some connections with the Fourier transform | What’s new 246B – complex analysis | What’s new References Dokmanic, I.

Integral transforms
https://danmackinlay.name/notebook/integral_transforms.html
Sat, 30 Jan 2021 11:09:20 +1100
Fourier transform Laplace transform Hankel transform Mellin transform References
\[\renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\cc}[1]{\mathcal{#1}}\]
The way we usually analytically solve integral equations and PDEs is via integral transforms. Fourier transforms, Laplace transforms, Mellin transforms, Hankel transforms…
The transforms and applications handbook, edited by Alexander D. Poularikas.
Fourier transform See Fourier transforms.
Laplace transform TBD.
Hankel transform TBD.

Kernel warping
https://danmackinlay.name/notebook/kernel_warping.html
Thu, 21 Jan 2021 10:55:36 +1100
Stationary reducible kernels Classic deformations MacKay warping As a function of input Learning transforms References
A nonlinear way of transforming stationary kernels into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).
This is of interest in the context of composing kernels to have known desirable properties by known transforms, and also learning (somewhat) arbitrary transforms to attain stationarity.

Miscellaneous nonstationary kernels
https://danmackinlay.name/notebook/kernel_nonstationary.html
Thu, 21 Jan 2021 10:55:36 +1100
References
Nonstationary kernels constructed by means other than warping stationary ones.
Maybe start with Jun and Stein (2008); Fuglstad et al. (2015); Fuglstad et al. (2013)?
References Bolin, David, and Kristin Kirchner. 2020. “The Rational SPDE Approach for Gaussian Random Fields With General Smoothness.” Journal of Computational and Graphical Statistics 29 (2): 274–85. https://doi.org/10.1080/10618600.2019.1665537. Bolin, David, and Finn Lindgren. 2011. “Spatial Models Generated by Nested Stochastic Partial Differential Equations, with an Application to Global Ozone Mapping.

Covariance functions
https://danmackinlay.name/notebook/covariance_kernels.html
Tue, 05 Jan 2021 15:07:38 +1100
Covariance kernels of some example processes A simple Markov chain The Hawkes process Gaussian processes General real covariance kernels Bonus: complex covariance kernels Kernel zoo Learning kernels Non-positive kernels References
A realisation of a nonstationary rough covariance process (partially observed)
On the interpretation of kernels as the covariance functions of stochastic processes, which is one way to define stochastic processes.
Suppose we have a real-valued stochastic process

Wiener-Khintchine representation
https://danmackinlay.name/notebook/wiener_khintchine.html
Sun, 03 Jan 2021 15:47:35 +1100
Wiener theorem: Deterministic case Wiener-Khinchine theorem: Spectral density of covariance kernels Bochner’s Theorem: stationary spectral kernels Yaglom’s theorem References
\[ \renewcommand{\lt}{<} \renewcommand{\gt}{>} \renewcommand{\var}{\operatorname{Var}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mmm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\ff}[1]{\mathfrak{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \]
Consider a real-valued stochastic process \(\{X_{\vv{t}}\}_{\vv{t}\in\mathcal{T}}\) over an index (metric) space \(\mathcal{T}\), i.e. a realisation of such a process is a function \(\mathcal{T}\to\mathbb{R}\). For the sake of concreteness we will take \(\mathcal{T}=\mathbb{R}^{d}\) here.

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_functional.html
Mon, 07 Dec 2020 20:43:06 +1100
References
Learning operators via GPs.
References Brault, Romain, Florence d’Alché-Buc, and Markus Heinonen. 2016. “Random Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning, 110–25. http://arxiv.org/abs/1605.02536. Brault, Romain, Néhémy Lim, and Florence d’Alché-Buc. n.d. “Scaling up Vector Autoregressive Models With Operator-Valued Random Fourier Features.” Accessed August 31, 2016. https://aaltd16.irisa.fr/files/2016/08/AALTD16_paper_11.pdf. Brouard, Céline, Marie Szafranski, and Florence D’Alché-Buc.

Hidden Markov Model inference for Gaussian Process regression
https://danmackinlay.name/notebook/gp_filtering.html
Wed, 25 Nov 2020 11:28:43 +1100
Spatio-temporal usage Miscellaneous notes towards implementation References
Classic flavours together: Gaussian processes and state filters/stochastic differential equations, and random fields as stochastic differential equations.
I am interested here in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way Kalman filtering does Wiener filtering. This means you get to solve a GP as an SDE.

Tensor decompositions
https://danmackinlay.name/notebook/tensor_decompositions.html
Thu, 19 Nov 2020 15:22:26 +1100
References
I know nothing about decomposing tensors. I get that this is somewhat more general than decomposing matrices. At a glance, they look like they generalise the usual linear algebra to make multilinear regression tractable.
See maybe the tensorly decomposition list.
References Anandkumar, Anima, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. 2015. “Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT).” In Algorithmic Learning Theory, edited by Kamalika Chaudhuri, Claudio Gentile, and Sandra Zilles, 19–38.

Tensor regression
https://danmackinlay.name/notebook/tensor_regression.html
Thu, 19 Nov 2020 15:00:31 +1100
References
Generalise your usual linear regression to multilinear regression. Useful tool: tensor decompositions. Tensorly, I think, is the main implementation of note here.
References Anandkumar, Anima, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. 2015. “Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT).” In Algorithmic Learning Theory, edited by Kamalika Chaudhuri, Claudio Gentile, and Sandra Zilles, 19–38. Lecture Notes in Computer Science.

Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100
Basic sparsity via inducing variables SVI for Gaussian processes Latent Gaussian Process models References
There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.

Transforms of RVs
https://danmackinlay.name/notebook/transforms_of_rvs.html
Fri, 23 Oct 2020 07:54:19 +1000
Stochastic Itō-Taylor expansion Linearization Unscented transform References
I have a nonlinear transformation of a random process. What is its distribution?
Stochastic Itō-Taylor expansion See stochastic Taylor expansion. tl;dr: more trouble than it is worth.
Linearization As seen in the Ensemble Kalman Filter.
Unscented transform The great invention of Uhlmann and Julier is the unscented transform, which uses a ‘\(\sigma\)-point approximation’.
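A compact sketch of my own of the \(2n+1\) sigma-point construction: deterministically place points around the mean using a matrix square root of the covariance, push them through the nonlinearity, and recompute weighted moments. For a linear map the transform is exact, which makes a convenient self-check.

```python
import numpy as np

def unscented_transform(f, mu, P, kappa=0.0):
    # Propagate mean mu and covariance P through a nonlinearity f
    # using the standard 2n+1 sigma points of Julier and Uhlmann.
    n = len(mu)
    L = np.linalg.cholesky((n + kappa) * P)
    sigma = np.vstack([mu, mu + L.T, mu - L.T])  # 2n+1 sigma points
    w = np.full(2 * n + 1, 1 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    mean = w @ y
    cov = (y - mean).T @ np.diag(w) @ (y - mean)
    return mean, cov

mu = np.array([1.0, 2.0])
P = np.array([[0.5, 0.1], [0.1, 0.3]])
A = np.array([[2.0, 0.0], [1.0, 1.0]])
m, C = unscented_transform(lambda x: A @ x, mu, P, kappa=1.0)
# For linear f the result is exact: m = A @ mu, C = A @ P @ A.T
```

For nonlinear \(f\) it captures the mean and covariance to higher order than simple linearisation, at the cost of \(2n+1\) function evaluations.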
In the context of Kalman filtering,

Filter design, linear
https://danmackinlay.name/notebook/filter_design_linear.html
Fri, 18 Sep 2020 10:15:52 +1000
Relationship of discrete LTI to continuous time filters Quick and dirty digital filter design State-Variable Filters Time-varying IIR filters References
Linear Time-Invariant (LTI) filter design is a field of signal processing, and a special case of state filtering that doesn’t necessarily involve a hidden state.
z-Transforms, bilinear transforms, Bode plots, design etc.
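A worked micro-example of my own of the bilinear transform: discretise the first-order analog low-pass \(H(s) = \omega_c/(s+\omega_c)\) via \(s \leftarrow 2 f_s \frac{1 - z^{-1}}{1 + z^{-1}}\), with the cutoff prewarped so the analog and digital cutoff frequencies agree.

```python
import numpy as np

def bilinear_lowpass(fc, fs):
    # First-order analog prototype H(s) = wc / (s + wc), mapped to the
    # z-plane by the bilinear transform with frequency prewarping.
    wc = 2 * fs * np.tan(np.pi * fc / fs)  # prewarped analog cutoff
    c = 2 * fs
    b = np.array([wc, wc]) / (c + wc)         # numerator coeffs in z^-1
    a = np.array([1.0, (wc - c) / (c + wc)])  # denominator coeffs in z^-1
    return b, a

def freq_response(b, a, w):
    # Evaluate H(z) on the unit circle at digital frequency w (radians/sample).
    z = np.exp(1j * w)
    return (b[0] + b[1] / z) / (a[0] + a[1] / z)

b, a = bilinear_lowpass(fc=1000.0, fs=8000.0)
H_dc = freq_response(b, a, 0.0)                      # gain 1 at DC
H_ny = freq_response(b, a, np.pi)                    # gain 0 at Nyquist
H_fc = freq_response(b, a, 2 * np.pi * 1000 / 8000)  # -3 dB at the cutoff
```

The zero the bilinear transform places at \(z = -1\) (Nyquist) is visible directly in the numerator \(b_0 = b_1\).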
I am going to consider this in discrete time (i.e. for digital implementation) unless otherwise stated, because I’m implementing this in software, not with capacitors or whatever.

Non-Gaussian Bayesian functional regression
https://danmackinlay.name/notebook/stochastic_process_regression.html
Wed, 16 Sep 2020 14:07:32 +1000
References
Regression using non-Gaussian random fields. Generalised Gaussian process regression.
Is there ever an actual need for this? Or can we mostly get by with a Gaussian process plus some non-Gaussian marginal distribution and pretend, via GP quantile regression, some variational GP approximation, or a non-Gaussian likelihood over Gaussian latents? Presumably if we suspect moments higher than the second are important, or that there is some actual stochastic process that we know matches our phenomenon, we might bother with this, but oh my it can get complicated.

Gaussian process quantile regression
https://danmackinlay.name/notebook/gp_quantile_regression.html
Wed, 16 Sep 2020 13:44:32 +1000
References
How to do quantile regression with GPs.
References Boukouvalas, Alexis, Remi Barillec, and Dan Cornford. 2012. “Gaussian Process Quantile Regression Using Expectation Propagation.” In ICML 2012. http://arxiv.org/abs/1206.6391. Reich, Brian J. 2012. “Spatiotemporal Quantile Regression for Detecting Distributional Changes in Environmental Processes.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 61 (4): 535–53. https://doi.org/10.1111/j.1467-9876.2011.01025.x. Reich, Brian J., Montserrat Fuentes, and David B.

Statistics of spatio-temporal processes
https://danmackinlay.name/notebook/spatio_temporal.html
Fri, 11 Sep 2020 13:30:12 +1000
Tools References
The dynamics of spatial processes evolving in time.
Clearly there are many different problems one might wonder about here. I am thinking in particular of the kind of problem whose discretisation might look like this, as a graphical model.
This is highly stylized - I’ve imagined there is one spatial dimension, but usually there would be two or three. The observed nodes are where we have sensors that can measure the state of some parameter of interest \(w\) which evolves in time \(t\).

Stochastic signal sampling
https://danmackinlay.name/notebook/signal_sampling_stochastic.html
Thu, 11 Jun 2020 06:45:08 +1000
References
Signal sampling is the study of approximating continuous signals with discrete ones and vice versa. What if the signal you are trying to recover is random, but you have a model for that randomness, and can thus assign likelihoods (posterior probabilities, even) to some sample paths? Now you are sampling a stochastic process.
This is a particular take on a classic inverse problem that arises in many areas, framed how electrical engineers frame it.

Functional regression
https://danmackinlay.name/notebook/functional_data.html
Thu, 28 May 2020 11:17:20 +1000
Regression using curves Functional autoregression References
Statistics where the samples are not just data but whole curves and manifolds, or subsamples from them. Function approximation meets statistics.
Regression using curves To quote Jim Ramsay:
Functional data analysis, […] is about the analysis of information on curves or functions. For example, these twenty traces of the writing of “fda” are curves in two ways: first, as static traces on the page that you see after the writing is finished, and second, as two sets of functions of time, one for the horizontal “X” coordinate, and the other for the vertical “Y” coordinate.

Lévy stochastic differential equations
https://danmackinlay.name/notebook/levy_sdes.html
Sat, 23 May 2020 18:19:58 +1000
References
Stochastic differential equations driven by Lévy noise are not so tidy as Itō diffusions (although they are still somewhat tidy), so they are frequently brushed aside in stochastic calculus texts. But I need ’em! There is a developed sampling theory for these creatures called sparse stochastic process theory.
Possibly chaos expansions might also be a useful tool for modelling these, and/or Malliavin calculus, whatever that is.

Restricted isometry properties
https://danmackinlay.name/notebook/restricted_isometry_props.html
Mon, 09 Mar 2020 15:02:39 +1100
Restricted Isometry Irrepresentability Incoherence Frame theory References
Restricted isometry properties, a.k.a. uniform uncertainty principles (E. Candès and Tao 2005; E. J. Candès, Romberg, and Tao 2006), mutual incoherence (David L. Donoho 2006; D. L. Donoho, Elad, and Temlyakov 2006), irrepresentability conditions (Zhao and Yu 2006)…
This is mostly notes while I learn some definitions; expect no actual thoughts.
Recoverability conditions, as seen in sparse regression, sparse basis dictionaries, function approximation, compressed sensing etc.

Kernel approximation
https://danmackinlay.name/notebook/kernel_approximation_inversion.html
Fri, 06 Mar 2020 11:50:22 +1100
Stationary kernels Inner product kernels Lectures: Fastfood etc Implementations Connections References
InkedIcon, on inversions in Hilbert space, after Charles Stross
A page where I document what I don’t know about kernel approximation. A page about what I do know would be empty.
What I mean is: approximating implicit Mercer kernel feature maps with explicit features; equivalently, approximating the Gram matrix, which is also related to mixture model inference and clustering.

Cepstral transforms and harmonic identification
https://danmackinlay.name/notebook/cepstrum.html
Thu, 13 Feb 2020 19:19:46 +1100
References
See also machine listening, system identification.
The cepstrum of a time series represents the power spectrum using a log link function. I haven’t actually read the foundational literature here (e.g. Bogert, Healy, and Tukey 1963), merely used some algorithms; but it seems to be mostly a hack for rapid identification of correlation lags where said lags are long.
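The lag-identification hack, sketched in numpy (my own toy example): take the inverse FFT of the log power spectrum; a signal with period 50 samples leaves a harmonic comb in the log spectrum, which shows up as a peak at quefrency 50 in the cepstrum.

```python
import numpy as np

# Signal with period 50 samples: the first five harmonics of 1/50.
n, period = 1000, 50
t = np.arange(n)
x = sum(np.cos(2 * np.pi * k * t / period) for k in range(1, 6))

# Real cepstrum: inverse FFT of the log power spectrum.
p = np.abs(np.fft.fft(x)) ** 2
logp = np.log(p + 1e-8 * p.max())  # floor keeps the log finite off the comb
cep = np.fft.ifft(logp).real

# The dominant quefrency in lags 20..80 recovers the period.
q = 20 + np.argmax(cep[20:81])
```

This is why it works even when autocorrelation is murky: the log flattens the comb's amplitude variation, so the harmonic spacing (hence the lag) dominates.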
For a generalized modern version, see Proietti and Luati (2019).

(Reproducing) kernel tricks
https://danmackinlay.name/notebook/kernel_methods.html
Mon, 20 Jan 2020 13:55:43 +1100
Introductions Kernel approximation RKHS distribution embedding Specific kernels Non-scalar-valued “kernels” References
Kernel in the sense of the “kernel trick”. Not to be confused with smoothing-type convolution kernels, nor the dozens of related-but-slightly-different clashing definitions of kernel; those can have their own respective pages.
Kernel tricks use “reproducing” kernels as inner products between functions, a.k.a. Mercer kernels (Mercer 1909). The classic machine learning explanation is that this induces a particularly tasty flavour of Hilbert space to work with.

Gaussian processes
https://danmackinlay.name/notebook/gaussian_processes.html
Tue, 03 Dec 2019 10:11:26 +1100
Relationship between addition of covariance kernels and of processes References
“Gaussian Processes” are stochastic processes/fields with jointly Gaussian distributions of observations. The most familiar of these to many of us is the Gauss-Markov process, a.k.a. the Wiener process, but there are many others. These processes are convenient due to certain useful properties of the multivariate Gaussian distribution, e.g. being uniquely specified by first and second moments, nice behaviour under various linear operations, kernel tricks….

Non-uniform signal sampling
https://danmackinlay.name/notebook/signal_sampling_nonuniform.html
Tue, 03 Dec 2019 08:18:29 +1100
References
Signal sampling without a uniform grid, and thus without a simple Nyquist theorem. It turns out that this generalisation is not necessarily fatal for the theory.
Reviews in a functional analysis setting are given in (Piroddi and Petrou 2004; Babu and Stoica 2010; Unser 2000; Adcock et al. 2014; Adcock and Hansen 2016).
This problem AFAICT becomes much easier if one can use priors to provide a theoretically tractable model of the nonuniformly sampled signal.

Cherchez la martingale
https://danmackinlay.name/notebook/martingales.html
Sat, 30 Nov 2019 18:09:40 +0100
References
Like Markov processes, a weirdly useful class of stochastic processes. Often you can find a martingale within some stochastic process, or construct a martingale from a stochastic process and prove something nifty thereby; this idea connects and solves a bunch of tricky problems at once.
TODO: examples, maybe a CLT and something else wacky like the life table estimators of Aalen (1978).
I am indebted to Saif Syed for setting my head straight about the utility of martingales, and to Kevin Ross who, in part of Amir Dembo’s course materials, was the one whose explanation of the orthogonality interpretation of martingales finally communicated the neatness of this idea to me.

Time frequency analysis
https://danmackinlay.name/notebook/time_frequency.html
Tue, 26 Nov 2019 09:25:01 +1100
References
🏗
The approximation of a non-stationary signal by many locally stationary signals.
Here I care more about the hack where you take a non-localised spectrogram and attempt to localise it over short windows of a long signal. That comes next.
Chromatic derivatives, Welch-style DTFT spectrograms, wavelets sometimes. Wigner distribution (which is sort-of a joint distribution over time and frequency). Constant Q transforms.
Much to learn here, even in the deterministic case.

Phase retrieval
https://danmackinlay.name/notebook/phase_retrieval.html
Thu, 07 Nov 2019 18:28:49 +0100
References
You know the power of the signal; what is the phase? Griffin-Lim algorithm, Wirtinger flow methods based on Wirtinger calculus, phase-gradient heap integration (Pru and Søndergaard 2016).
Diagram from TiFGAN (Marafioti et al. 2019) via CJ.
TODO: investigate Yue M. Lu’s work on phase retrieval as an important example in a large class of somewhat-analytically-understood nonconvex problems, starting from his recent slide deck on that theme.

Sparse coding
https://danmackinlay.name/notebook/sparse_coding.html
Tue, 05 Nov 2019 16:28:28 +0100
Resources Wavelet bases Matching Pursuits Learnable codings Codings with desired invariances Misc Implementations References
Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you may wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled enough throughout the field.

Discrete time Fourier and related transforms
https://danmackinlay.name/notebook/dtft.html
Thu, 17 Oct 2019 09:23:59 +1100
Chirp z-transform Windowing the DTFT Chromatic derivatives References
Care and feeding of discrete-time Fourier transforms (DTFT), especially Fast Fourier Transforms, and other operators on discrete time series. Complexity results, timings, algorithms, properties. These are useful in a vast number of applications, such as filter design, time series analysis, various nifty optimisations of other algorithms etc.
Chirp z-transform Chirplets, a one-sided discrete Laplace transform related to damped sinusoid representation.

Spatial processes and statistics thereof
https://danmackinlay.name/notebook/spatial_statistics.html
Thu, 03 Oct 2019 09:46:31 +1000
incoming Intros Kriging Spatial point processes Implementations spatstat Pysal PASSaGE References
Statistics on fields with index sets of more than one dimension of support and, frequently, an implicit 2-norm. Sometimes they are also time-indexed. Especially, for processes on a continuous index set with continuous state and undirected interaction. Sometimes over fancy manifolds, although often you can get away with plain old Euclidean space, unless you are doing spatial statistics over the entire planet, which turns out to be curved.

Correlograms
https://danmackinlay.name/notebook/correlograms.html
Sun, 22 Sep 2019 13:23:31 +1000
References
This material is revised and expanded from the appendix of draft versions of a recent conference submission, for my own reference. I used (deterministic) correlograms a lot in that, and it was startlingly hard to find a decent summary of their properties anywhere. Nothing new here, but… see the material about doing this in a probabilistic way via the Wiener-Khintchine representation and covariance kernels, which leads to a natural probabilistic spectral analysis.

Nonparametric state filters via Gaussian Processes
https://danmackinlay.name/notebook/gp_state_filters.html
Wed, 18 Sep 2019 10:21:15 +1000
References
Two classic flavours together: Gaussian Processes and state filters. There are other nonparametric state filters, e.g. variational filters and particle filters.
This is a kind of dual to using a state filter to compute a Gaussian process regression as a computational shorthand.
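That dual direction can be sketched numerically: for a Matérn-1/2 (Ornstein-Uhlenbeck) kernel, a scalar Kalman filter with the matching transition model recovers the GP regression posterior mean at the last observation exactly. A toy check, with invented data and hyperparameters, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
ts = np.sort(rng.uniform(0, 10, 8))          # irregular observation times
y = np.sin(ts) + 0.1 * rng.standard_normal(8)
sig2, ell, noise = 1.0, 2.0, 0.01            # kernel variance, lengthscale, obs noise

# GP regression: posterior mean at the last observation time.
K = sig2 * np.exp(-np.abs(ts[:, None] - ts[None, :]) / ell)
gp_mean = K[-1] @ np.linalg.solve(K + noise * np.eye(len(ts)), y)

# Kalman filter on the equivalent scalar state-space model.
m, P = 0.0, sig2                             # stationary prior
prev = ts[0]
for t, obs in zip(ts, y):
    a = np.exp(-(t - prev) / ell)            # exact OU transition over the gap
    m, P = a * m, a * a * P + sig2 * (1 - a * a)
    g = P / (P + noise)                      # Kalman gain
    m, P = m + g * (obs - m), (1 - g) * P
    prev = t
# m now equals gp_mean to numerical precision
```

The equivalence is exact for Matérn kernels of half-integer order, which have finite-dimensional state-space representations; for other kernels it is an approximation.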
Here we use Gaussian processes to define the filter, in particular to learn nonparametric transition, observation, or state densities for a generalized Kalman filter.
Representer theorems
https://danmackinlay.name/notebook/representer_theorems.html
Mon, 16 Sep 2019 12:27:34 +1000https://danmackinlay.name/notebook/representer_theorems.htmlReferences Arising in spatial statistics, Gaussian processes, kernel machines, covariance functions, and regularisation.
🏗
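Pending a proper writeup, a toy illustration of what the classical (not deep) representer theorem buys you: the minimiser of a ridge-regularised RKHS fit is a finite kernel expansion over the training points, \(f(x) = \sum_i \alpha_i k(x_i, x)\). Data, kernel, and hyperparameters below are invented for illustration:

```python
import numpy as np

X = np.linspace(-3, 3, 20)   # toy scalar inputs
y = np.tanh(X)               # toy targets

def k(a, b, ell=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

lam = 1e-3                   # ridge weight
# Representer coefficients: alpha = (K + lam I)^{-1} y
alpha = np.linalg.solve(k(X, X) + lam * np.eye(len(X)), y)

x_star = np.array([0.5])
f_star = k(x_star, X) @ alpha  # prediction is a weighted sum of kernels at the data
```

The point is that the infinite-dimensional optimisation over the RKHS collapses to solving for the 20 coefficients `alpha`, one per data point.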
References Bohn, Bastian, Michael Griebel, and Christian Rieger. 2018. “A Representer Theorem for Deep Kernel Learning.” June 7, 2018. http://arxiv.org/abs/1709.10441. Boyer, Claire, Antonin Chambolle, Yohann de Castro, Vincent Duval, Frédéric de Gournay, and Pierre Weiss. 2018. “Convex Regularization and Representer Theorems.” In. http://arxiv.org/abs/1812.04355.
Wirtinger calculus
https://danmackinlay.name/notebook/wirtinger_calculus.html
Tue, 10 Sep 2019 10:02:46 +1000https://danmackinlay.name/notebook/wirtinger_calculus.htmlReferences How do you differentiate real-valued functions of complex arguments? Wirtinger calculus. This is a ridiculous hack that happens to work well for signal processing over the complex field, especially in optimisation. It arises naturally in, for example, phase retrieval (Zhang and Liang 2016; Candes, Li, and Soltanolkotabi 2015; Chen and Candès 2015; Seuret and Gouaisbaut 2013). Because of its area of popularity, it will almost surely arise in combination with matrix calculus.
(Weighted) least squares fits
https://danmackinlay.name/notebook/least_squares.html
Wed, 22 May 2019 11:52:37 +1000https://danmackinlay.name/notebook/least_squares.htmlIteratively reweighted References A classic. Surprisingly deep.
A few non-comprehensive notes on approximating by the arbitrary-but-convenient expedient of minimising the sum of squared deviations.
As used in many, many problems, e.g. lasso regression.
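As a sketch of the "iteratively reweighted" flavour: IRLS turns a robust, approximately-L1 linear fit into a sequence of weighted least squares solves. The data and parameters here are invented for illustration, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])  # design matrix
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.01 * rng.standard_normal(50)
y[0] += 5.0                                   # one gross outlier

beta = np.linalg.lstsq(X, y, rcond=None)[0]   # plain least-squares start
for _ in range(50):
    r = y - X @ beta
    w = 1.0 / np.maximum(np.abs(r), 1e-8)     # L1 weights: downweight big residuals
    W = X * w[:, None]
    beta = np.linalg.solve(X.T @ W, W.T @ y)  # weighted normal equations
# The reweighted fit shrugs off the outlier; plain least squares does not.
```

Each pass solves the weighted normal equations \(X^\top W X \beta = X^\top W y\) with weights chosen so that, at convergence, the squared-loss solution mimics the absolute-loss one.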
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Non-linear Least Squares problems with bounds constraints and general unconstrained optimization problems.
Wiener theorem
https://danmackinlay.name/notebook/wiener_theorem.html
Wed, 08 May 2019 13:47:56 +1000https://danmackinlay.name/notebook/wiener_theorem.htmlReferences The special deterministic case of the Wiener-Khintchine theorem, written up with a slightly different notation for a slightly different project.
As seen in correlograms.
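The deterministic identity is easy to check numerically on a finite sequence: the DFT of the circular autocorrelation (the correlogram) equals the power spectrum. A sketch assuming numpy, with an arbitrary test signal:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(64)

# Circular autocorrelation r[k] = sum_n x[n] x[(n+k) mod N], computed directly.
r = np.array([x @ np.roll(x, -k) for k in range(len(x))])

power = np.abs(np.fft.fft(x)) ** 2  # periodogram of x
lhs = np.fft.fft(r).real            # spectrum of the correlogram
# lhs and power agree to numerical precision
```

In practice one goes the other way, computing the correlogram via two FFTs in \(O(N \log N)\) rather than the \(O(N^2)\) direct sum above.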