probability on Dan MacKinlay
https://danmackinlay.name/tags/probability.html
Recent content in probability on Dan MacKinlay. Generated by Hugo (gohugo.io), en-us. Last build: Mon, 12 Apr 2021 12:19:45 +0800.
Randomized low dimensional projections
https://danmackinlay.name/notebook/low_d_projections.html
Mon, 12 Apr 2021 12:19:45 +0800
Random projections are kinda Gaussian Random projections are distance preserving Projection statistics Concentration theorems for projections Weird spherical distribution facts of use References
One way I can get at the confusing behaviours of high dimensional distributions is to instead look at low dimensional projections of them. If I have a (possibly fixed) data matrix and a random low-dimensional projection, what distribution does the projection have?
Prediction processes
https://danmackinlay.name/notebook/prediction_processes.html
Fri, 09 Apr 2021 16:15:20 +0800
References
Placeholder. idk really, but Cosma Shalizi has opinions on unifying some interesting ideas in this area using chains with complete connections. Maybe related (?): predictive processing as a model of the mind.
References Blasques, F., S. J. Koopman, and A. Lucas. 2015. “Information-Theoretic Optimality of Observation-Driven Time Series Models for Continuous Responses.” Biometrika 102 (2): 325–43. https://doi.org/10.1093/biomet/asu076. Cox, D. R., Gudmundur Gudmundsson, Georg Lindgren, Lennart Bondesson, Erik Harsaae, Petter Laake, Katarina Juselius, and Steffen L.
Trading, hedging and portfolios in practice
https://danmackinlay.name/notebook/trading.html
Thu, 08 Apr 2021 12:50:46 +1000
Fundamental considerations Behavioural considerations Portfolio design Betting Statistical considerations Technical considerations References
A balanced trading portfolio
Nothing to see here at the moment, apart from snippets I found interesting, as a guy with good probability theory but weak financial skills.
Financial Hacker is pragmatic. I can’t tell if it is fun because I can’t even tell if they are joking, but numerai’s introduction to secrecy and information in financial markets is a …singular perspective.
Probability divergences
https://danmackinlay.name/notebook/probability_metrics.html
Fri, 26 Mar 2021 08:39:15 +1100
Overview Norms with respect to Lebesgue measure on the state space Relative distributions \(\phi\)-divergences Kullback-Leibler divergence Total variation distance Hellinger divergence \(\alpha\)-divergence \(\chi^2\) divergence Hellinger inequalities Pinsker inequalities Integral probability metrics Wasserstein distance(s) Bounded Lipschitz distance Fisher distances Others Induced topologies To read References
Allison Chaney
Quantifying the difference between probability measures. Measuring the distribution itself, for, e.g., badness of approximation of a statistical fit.
High dimensional statistics
https://danmackinlay.name/notebook/high_d_statistics.html
Tue, 23 Mar 2021 14:49:42 +1100
Soap bubbles Empirical processes in high dimensions Markov Chain Monte Carlo in high dimensions References
Placeholder to think about the many weird problems arising in very high dimensional statistical inference. There are many approaches to this problem: throwing out dimensions/predictors as in model selection, considering low dimensional projections, viewing objects with matrix structure for concentration or factorisation, or tensor structure even.
Soap bubbles
High dimensional distributions are extremely odd, and concentrate in weird ways.
The Gaussian distribution
https://danmackinlay.name/notebook/gaussian_distribution.html
Tue, 23 Mar 2021 10:10:08 +1100
Density, CDF Differential representations Stein’s representation ODE representation for the univariate density ODE representation for the univariate icdf Density PDE representation as a diffusion equation Extremes Orthogonal basis Rational function approximations Roughness Entropy Multidimensional marginals and conditionals Fourier representation Transformed variables Metrics Wasserstein Kullback-Leibler Hellinger What is Erf again? References
Bell curves
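Among the metrics listed above, the Kullback-Leibler divergence between univariate Gaussians has a simple closed form. A minimal sketch (my own sanity check, not from the notebook) comparing it against brute-force numerical integration:

```python
import numpy as np

# Closed-form KL divergence between univariate Gaussians:
# KL(N(mu1, s1^2) || N(mu2, s2^2))
#   = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2.
def kl_gaussian(mu1, s1, mu2, s2):
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def normal_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
x = np.linspace(-12.0, 12.0, 200_001)
# integrand of the KL: p(x) log(p(x)/q(x))
f = normal_pdf(x, mu1, s1) * np.log(normal_pdf(x, mu1, s1) / normal_pdf(x, mu2, s2))
# trapezoid rule by hand, to stay NumPy-version-agnostic
kl_numeric = float(np.sum((f[1:] + f[:-1]) * 0.5 * np.diff(x)))
kl_closed = float(kl_gaussian(mu1, s1, mu2, s2))
```

The two agree to many decimal places; the tails beyond twelve standard deviations contribute negligibly.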
Many facts about the useful, boring, ubiquitous Gaussian.
Generically approximating probability distributions
https://danmackinlay.name/notebook/approximating_dists.html
Mon, 22 Mar 2021 14:20:29 +1100
Stein’s method References
There are various approximations we might use for a probability distribution: empirical CDFs, kernel density estimates, variational approximations, Edgeworth expansions, Laplace approximations…
From each of these we might get close in some metric to the desired target.
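As a minimal illustration of the first of these, the empirical CDF: its sup-norm distance to the true CDF shrinks as the sample grows (Glivenko-Cantelli, quantified by the DKW inequality). A sketch under the assumption of standard-normal data; the function names are mine:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def normal_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / 2**0.5))

def sup_ecdf_error(n):
    """Sup distance between the ECDF of n standard-normal draws and
    the true CDF, evaluated at the ECDF's jump points."""
    xs = np.sort(rng.standard_normal(n))
    F = np.array([normal_cdf(v) for v in xs])
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

d_small = sup_ecdf_error(100)
d_large = sup_ecdf_error(100_000)
```

With a thousand-fold larger sample the sup error drops by roughly a factor of \(\sqrt{1000}\), as the \(O(n^{-1/2})\) DKW rate predicts.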
This is a broad topic which I cannot hope to cover in full generality. Special cases of interest include
Statements about where the probability mass is with high probability (concentration theorems); statements about the asymptotic distributions of variables eventually approaching some distribution as some parameter goes to infinity (limit theorems).
Stein’s method
https://danmackinlay.name/notebook/steins_method.html
Mon, 22 Mar 2021 14:20:29 +1100https://danmackinlay.name/notebook/steins_method.htmlNon-Gaussian Stein Multivariate Stein References A famous generic method for approximating distributions is Stein’s method of exchangeable pairs (Stein 1986, 1972). Wikipedia is good on this.
Meckes (2009) summarises.
Heuristically, the univariate method of exchangeable pairs goes as follows. Let \(W\) be a random variable conjectured to be approximately Gaussian; assume that \(\mathbb{E} W=0\) and \(\mathbb{E} W^{2}=1\). From \(W,\) construct a new random variable \(W^{\prime}\) such that the pair \(\left(W, W^{\prime}\right)\) has the same distribution as \(\left(W^{\prime}, W\right)\).
Diagramming and visualising graphical models
https://danmackinlay.name/notebook/diagrams_graphical_models.html
Mon, 15 Mar 2021 16:44:16 +1100
Daggity dagR yEd diagrammeR flowchart.fun Mermaid TETRAD Matplotlib Graphviz tikz Misc References
On the art and science of algorithmic line drawings for representing graphical models, which is an important part of statistics. The diagrams we need here are nearly flowchart-like, so I can sketch them with a flowchart if need be; but they are closely integrated with the equations of a particular statistical model, so I would like to incorporate them into the same system to avoid tedious and error-prone manual sync.
Log concave distributions
https://danmackinlay.name/notebook/log_concave_dist.html
Thu, 11 Mar 2021 09:32:05 +1100
Langevin MCMC References
Langevin MCMC: “a Markov Chain reminiscent of noisy gradient descent”. Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation:
\[ x_{t+\eta} = x_t - \eta \nabla f(x_t) + \sqrt{2\eta}\xi_t,\quad \xi_t\sim N(0,I). \]
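A minimal sketch of the update above (the unadjusted Langevin algorithm) targeting a standard Gaussian, for which \(f(x) = x^2/2\) and so \(\nabla f(x) = x\). Note the uncorrected discretisation carries an \(O(\eta)\) bias:

```python
import numpy as np

rng = np.random.default_rng(42)

# Unadjusted Langevin algorithm for the target N(0, 1):
# f(x) = x^2 / 2, so grad f(x) = x.
eta = 0.01                      # step size
n_steps = 200_000
noise = rng.standard_normal(n_steps)
samples = np.empty(n_steps)
x = 0.0
for t in range(n_steps):
    # x <- x - eta * grad f(x) + sqrt(2 eta) * xi
    x = x - eta * x + np.sqrt(2 * eta) * noise[t]
    samples[t] = x

burned = samples[10_000:]       # discard burn-in
mean_hat, var_hat = float(burned.mean()), float(burned.var())
```

The chain's long-run mean and variance land close to the target's 0 and 1; shrinking \(\eta\) shrinks the residual discretisation bias.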
Rob Salomone explains this well; see Hodgkinson, Salomone, and Roosta (2019).
Andrej Risteski’s Beyond log-concave sampling series is also a good introduction to log-concave sampling.
Markov Chain Monte Carlo methods
https://danmackinlay.name/notebook/mcmc.html
Thu, 11 Mar 2021 09:32:05 +1100
Hamiltonian Monte Carlo Connection to variational inference Adaptive MCMC Langevin Monte Carlo Tempering Mixing rates Debiasing via coupling References
This chain pump is not a good metaphor for how a Markov chain Monte Carlo sampler works, but it does correctly evoke the effort involved.
Despite working within this area, I have nothing to say about MCMC broadly, but I do have some things I wish to keep notes on.
Reparameterization tricks in inference
https://danmackinlay.name/notebook/reparameterization_trick.html
Mon, 08 Mar 2021 18:07:53 +1100
For variational autoencoders “Normalized” flows For density estimation Representational power of Tutorials References
Approximating the desired distribution by perturbation of the available distribution
A trick in e.g. variational inference, especially autoencoders, and in density estimation in probabilistic deep learning, best summarised as “a fancy change of variables so that I can differentiate through the parameters of a distribution”. Connections to optimal transport and likelihood-free inference, in that this trick can enable some clever approximate-likelihood approaches.
Stochastic processes which represent measures over the reals
https://danmackinlay.name/notebook/measure_priors.html
Mon, 08 Mar 2021 16:44:16 +1100
Subordinators Other measure priors References
Often I need a nonparametric representation for a measure over some non-finite index set. We might want to represent a probability, or mass, or a rate. I might want this representation to be something flexible and low-assumption, like a Gaussian process. If I want a nonparametric representation of functions this is not hard; I can simply use a Gaussian process.
Matrix measure concentration inequalities and bounds
https://danmackinlay.name/notebook/matrix_concentration.html
Mon, 08 Mar 2021 11:08:41 +1100
Matrix Chernoff Matrix Chebychev Matrix Bernstein Matrix Efron-Stein Gaussian References
Concentration inequalities for matrix-valued random variables.
Recommended overviews are J. A. Tropp (2015); van Handel (2017); Vershynin (2018).
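A toy instance of the phenomenon these bounds quantify (my example, not from the overviews): the average of random rank-one projectors \(vv^\top\), with \(v\) uniform on the sphere in \(\mathbb{R}^d\), concentrates around its mean \(I/d\) in spectral norm, at the rate matrix Bernstein-type inequalities predict.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 20_000

# n random unit vectors in R^d; each rank-one projector v v^T has mean
# I/d, so their average should concentrate around I/d in spectral norm.
V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
mean_mat = V.T @ V / n
spectral_dev = float(np.linalg.norm(mean_mat - np.eye(d) / d, 2))
```

The deviation is an order of magnitude smaller than the spectral norm of the mean itself, \(1/d\).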
Matrix Chernoff
J. A. Tropp (2015) summarises:
In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts.
Measure concentration inequalities
https://danmackinlay.name/notebook/concentration_of_measure.html
Thu, 04 Mar 2021 09:34:53 +1100
Background Markov Chebychev Chernoff Hoeffding Efron-Stein Kolmogorov Gaussian Sub-Gaussian Martingale bounds Khintchine Empirical process theory Matrix concentration References
A corral captures the idea of concentration of measure; we have some procedure that guarantees that most of the mass (of buffalos) is where we can handle it. Image: Kevin M Klerks, CC BY 2.0
Welcome to the probability inequality mines!
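For a concrete worked example (mine, not from the notebook), Hoeffding's inequality for bounded i.i.d. variables can be checked against simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, n_trials = 200, 0.1, 20_000

# Hoeffding's inequality for X_i i.i.d. in [0, 1]:
#   P(|mean - E[X]| >= t) <= 2 exp(-2 n t^2).
means = rng.uniform(size=(n_trials, n)).mean(axis=1)
empirical = float(np.mean(np.abs(means - 0.5) >= t))
hoeffding_bound = float(2 * np.exp(-2 * n * t**2))
```

For uniform variates the bound is very loose (the true deviation probability here is orders of magnitude below it), which is typical: these inequalities trade sharpness for generality.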
When something in your process (measurement, estimation) means that you can be pretty sure that a whole bunch of your stuff is particularly likely to be somewhere in particular.
Random fields as stochastic differential equations
https://danmackinlay.name/notebook/random_fields_as_sdes.html
Mon, 01 Mar 2021 17:08:40 +1100
Creating a stationary Markov SDE with desired covariance Convolution representations Covariance representation Input measures \(\mu\) is a hypercube \(\mu\) is the unit sphere \(\mu\) is an isotropic Gaussian Without stationarity via Green’s functions References
\(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\sinc}{\operatorname{sinc}}\)
The representation of certain random fields, especially Gaussian random fields, as stochastic differential equations. This is the engine that makes filtering Gaussian processes go, and is also a natural framing for probabilistic spectral analysis.
Frames and Riesz bases
https://danmackinlay.name/notebook/frames.html
Wed, 24 Feb 2021 08:47:49 +1100
References
Overcomplete basis
You want a fancy basis for your vector space? Try frames! You might care in this case about restricted isometry properties.
Morgenshtern and Bölcskei (2011):
Hilbert spaces and the associated concept of orthonormal bases are of fundamental importance in signal processing, communications, control, and information theory. However, linear independence and orthonormality of the basis elements impose constraints that often make it difficult to have the basis elements satisfy additional desirable properties.
Causal inference in the continuous limit
https://danmackinlay.name/notebook/causality_continuous.html
Wed, 17 Feb 2021 20:41:08 +1100
References
Causality on continuous index spaces and, it turns out, the related topic of equilibrium/feedback dynamics. Placeholder.
Bongers and Mooij (2018):
Uncertainty and random fluctuations are a very common feature of real dynamical systems. For example, most physical, financial, biochemical and engineering systems are subjected to time-varying external or internal random disturbances. These complex disturbances and their associated responses are most naturally described in terms of stochastic processes.
Stability in linear dynamical systems
https://danmackinlay.name/notebook/stability_dynamical_linear.html
Tue, 16 Feb 2021 08:23:02 +1100
Pole representations Reparameterisation Continuous time Stability and gradient descent References
The intersection of linear dynamical systems and stability of dynamical systems.
There is not much content here because I spent 2 years working on it and am too traumatised to revisit it.
Informally, I am admitting as “stable” any dynamical system which does not explode super-polynomially fast; we can think of these as systems where, if the system is not stationary, then at least the rate of change might be.
Chaos expansions
https://danmackinlay.name/notebook/chaos_expansion.html
Mon, 15 Feb 2021 10:53:01 +1100
Polynomial chaos expansion “Generalized” chaos expansion Arbitrary chaos expansion References
Placeholder, for a topic which has a slightly confusing name. To explore: connections to, and differences from, other methods of keeping track of the evolution of uncertainty in dynamical systems. Compare and contrast Gaussian process regression as used in Gratiet, Marelli, and Sudret (2016), functional data analysis, etc.
This is not the same thing as chaos in the sense of the deterministic chaos made famous by dynamical systems theory and fractal t-shirts.
Mind as statistical learner
https://danmackinlay.name/notebook/mind_as_ml.html
Thu, 28 Jan 2021 19:30:56 +1100
Language theory Descriptive Bayesian models of cognition That free energy thing References
Various morsels on the theme of what-machine-learning-teaches-us-about-our-own-learning. Thus biomimetic algorithms find their converse in our algo-mimetic biology.
This should be more about general learning-theory insights. Nitty-gritty details about how computing is done by biological systems are more what I think of as biocomputing. If you can unify those then well done, you can grow minds in a petri dish.
Stochastic partial differential equations
https://danmackinlay.name/notebook/spdes.html
Wed, 27 Jan 2021 12:42:41 +1100
References
Placeholder, for the multidimensional PDE version of SDEs.
This picture of ice floes on the Bering shelf looks like it might be some kinda stochastic PDE thing, right?
References Bolin, David, and Kristin Kirchner. 2020. “The Rational SPDE Approach for Gaussian Random Fields With General Smoothness.” Journal of Computational and Graphical Statistics 29 (2): 274–85. https://doi.org/10.1080/10618600.2019.1665537. Dalang, Robert C., Davar Khoshnevisan, and Firas Rassoul-Agha, eds.
Feynman-Kac formulae
https://danmackinlay.name/notebook/feynman_kac.html
Wed, 27 Jan 2021 11:55:19 +1100
References
There is a mathematically rich theory about how particle filters work. The notoriously abstruse Del Moral (2004) and Doucet, Freitas, and Gordon (2001) are universally commended for unifying and making consistent the diffusion processes, Feynman-Kac formulae and “propagation of chaos”. I will get around to them eventually, maybe?
References Cérou, F., P. Del Moral, T. Furon, and A. Guyader. 2011. “Sequential Monte Carlo for Rare Event Estimation.”
Generative adversarial learning
https://danmackinlay.name/notebook/adversarial_learning_generative.html
Mon, 14 Dec 2020 16:32:29 +1100
Wasserstein loss/regularisation Conditional Invertible GANs as SDEs References
The critic providing a gradient update to the generator
Game theory meets learning. Hip, especially in combination with deep learning, because it provides an elegant means of likelihood free inference.
I don’t know anything about it. Something about training two systems together to both generate and classify examples of a phenomenon of interest.
Sanjeev Arora gives a cogent intro. He also suggests a link with learning theory.
Free energy
https://danmackinlay.name/notebook/free_energy.html
Tue, 01 Dec 2020 15:26:58 +1100
In variational Bayes As a model for cognition References
Not “free as in speech” or “free as in beer”, nor “free energy” in the sense of perpetual motion machines, zero point energy or pills that turn your water into petroleum, but rather a particular mathematical object that pops up in variational Bayes inference and in wacky theories of cognition.
In variational Bayes
Variational Bayes inference is a formalism for learning, borrowing bits from statistical mechanics and graphical models.
Predictive processing
https://danmackinlay.name/notebook/predictive_processing.html
Tue, 01 Dec 2020 15:26:58 +1100
Layperson intros Free energy References
Related: mind as learning process. Maybe related (?): prediction processes. To learn: is this what the information-dynamics folks are wondering about also, e.g. Ay et al. (2008) or Tishby and Polani (2011)?
Layperson intros
Confirmation Bias in Action
Book Review: Surfing Uncertainty
Free energy
This term, with an analogous definition to its use in variational inference, appears to pop up in a “free energy principle”, where it is instrumental as a unifying concept for learning systems such as brains.
Random embeddings and hashing
https://danmackinlay.name/notebook/random_embedding.html
Tue, 01 Dec 2020 14:01:36 +1100
References
Separation of inputs by random projection
See also matrix factorisations, for some extra ideas on why random projections have a role in motivating compressed sensing, randomised regressions, etc.
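A quick numerical illustration (my own sketch) of the distance preservation that motivates all this, Johnson-Lindenstrauss style: a Gaussian projection from 1000 down to 200 dimensions distorts pairwise distances only mildly.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_points, d, k = 50, 1000, 200

X = rng.standard_normal((n_points, d))
# Gaussian random projection, scaled so squared norms are preserved
# in expectation (Johnson-Lindenstrauss flavour).
P = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ P

# worst relative distortion over all pairwise distances
max_distortion = max(
    abs(np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j]) - 1)
    for i, j in combinations(range(n_points), 2))
```

Even the worst of the 1225 pairwise distances is preserved to within a modest relative error, roughly \(\sqrt{\log n / k}\) as the JL lemma predicts.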
Occasionally we might use non-linear projections to increase the dimensionality of our data in the hope of making a non-linear regression approximately linear, an idea which dates back to Cover (1965).
Cover’s Theorem (Cover 1965):
It was shown that, for a random set of linear inequalities in \(d\) unknowns, the expected number of extreme inequalities, which are necessary and sufficient to imply the entire set, tends to \(2d\) as the number of consistent inequalities tends to infinity, thus bounding the expected necessary storage capacity for linear decision algorithms in separable problems.
Randomised regression
https://danmackinlay.name/notebook/randomised_regression.html
Tue, 01 Dec 2020 14:00:10 +1100
References
Tackling your regression by using random projections of the predictors.
Usually this means using those projections to reduce the dimensionality of a high dimensional regression. In this case it is not far from compressed sensing, except in how we handle noise. In this linear model case, this is of course random linear algebra, and may be a randomised matrix factorisation.
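A minimal sketch of the idea, assuming the simplest sketch-and-solve variant (names and sizes are mine): compress the tall least-squares problem with a Gaussian sketch and solve the small problem.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 2000, 5, 200

beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.standard_normal((n, d))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Full least squares on all n rows.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sketch-and-solve: compress the n-row problem to k rows with a
# Gaussian sketch S, then solve the small k x d problem.
S = rng.standard_normal((k, n)) / np.sqrt(k)
beta_sketch, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)

err_full = float(np.max(np.abs(beta_full - beta_true)))
err_sketch = float(np.max(np.abs(beta_sketch - beta_true)))
```

The sketched solve touches a tenth of the rows yet recovers the coefficients to within the noise level; the accuracy penalty scales roughly like \(\sqrt{d/k}\) times the residual.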
I am especially interested in seeing how this might be useful for dependent data, especially time series.
Recommender systems
https://danmackinlay.name/notebook/recommender_systems.html
Mon, 30 Nov 2020 14:55:18 +1100
References
Not my area, but I need a landing page to refer to for some non-specialist contacts of mine.
I am most familiar with the matrix factorization approaches (e.g. factorization machines, NNMF) but there are many, e.g. variational autoencoder approaches are en vogue.
An overview by Javier lists many approaches.
Most Popular recommendations (the baseline)
Item-User similarity based recommendations
kNN Collaborative Filtering recommendations
GBM based recommendations
Non-Negative Matrix Factorization recommendations
Factorization Machines (Steffen Rendle 2010)
Field Aware Factorization Machines (Yuchin Juan, et al., 2016)
Deep Learning based recommendations (Wide and Deep, Heng-Tze Cheng, et al., 2016)
Neural Collaborative Filtering (Xiangnan He et al.
Variational inference by message-passing in graphical models
https://danmackinlay.name/notebook/message_passing.html
Wed, 25 Nov 2020 17:42:32 +1100
References
Variational inference where the model factorizes over some graphical independence structure, which means we get cheap and distributed inference. I am currently particularly interested in this for latent GP models. Many things can be expressed as message passing algorithms. The grandparent idea in this unification seems to be “belief propagation”, a.k.a. “sum-product message-passing”, credited to Pearl (1982) for DAGs and then generalised to MRFs, PGMs, factor graphs, etc.
Probabilistic spectral analysis
https://danmackinlay.name/notebook/probabilistic_spectral_analysis.html
Wed, 25 Nov 2020 11:33:34 +1100
Classic: stochastic processes studied via correlation function Non-stationary spectral kernel Change point detection version Non-Gaussian approaches References
Graphical introduction to nonstationary modelling of audio data. The input (bottom) is a sound recording of female speech. We seek to decompose the signal into Gaussian process carrier waveforms (blue block) multiplied by a spectrogram (green block). The spectrogram is learned from the data as a nonnegative matrix of weights times positive modulators (top).
Hidden Markov Model inference for Gaussian Process regression
https://danmackinlay.name/notebook/gp_filtering.html
Wed, 25 Nov 2020 11:28:43 +1100
Spatio-temporal usage Miscellaneous notes towards implementation References
Classic flavours together: Gaussian processes and state filters/stochastic differential equations, and random fields as stochastic differential equations.
I am interested here in the trick which makes certain Gaussian process regression problems soluble by making them local, i.e. Markov, with respect to some assumed hidden state, in the same way that Kalman filtering does Wiener filtering. This means you get to solve a GP as an SDE.
External validity
https://danmackinlay.name/notebook/external_validity.html
Mon, 09 Nov 2020 15:58:56 +1100
Standard graphical models Tools Salad Meta References
TBD.
This Maori gentleman from the 1800s demonstrates artful transfer learning from the western fashion domain
One could read Sebastian Ruder’s NN-style introduction to “transfer learning”. NN people like to think about this in a particular way, which I like because of the diversity of out-of-the-box ideas it invites, and which I dislike because it is sloppy.
Weighted data in statistics
https://danmackinlay.name/notebook/weighted_data.html
Fri, 06 Nov 2020 08:48:18 +1100
Thomas Lumley helpfully disambiguates the “three and a half distinct uses of the term weights in statistical methodology”.
The three main types of weights are
the ones that show up in the classical theory of weighted least squares. These describe the precision (1/variance) of observations. … I call these precision weights; Stata calls them analytic weights.
the ones that show up in categorical data analysis. These describe cell sizes in a data set, so a weight of 10 means that there are 10 identical observations in the dataset, which have been compressed to a covariate pattern plus a count.
Causal inference on DAGs
https://danmackinlay.name/notebook/causal_inference.html
Wed, 04 Nov 2020 12:36:13 +1100
Learning materials do-calculus Counterfactuals Continuously indexed fields External validity Propensity scores Causal Graph inference from data Causal time series DAGS Drawing graphical models Tools References
Inferring the optimal intervention requires accounting for which arrows are independent of which
Inferring cause and effect from nature. Graphical models and related techniques for doing it. Avoiding the danger of folk statistics. Observational studies, confounding, adjustment criteria, d-separation, identifiability, interventions, moral equivalence…
Importance sampling
https://danmackinlay.name/notebook/importance_sampling.html
Wed, 28 Oct 2020 13:16:47 +1100
References
TBD. Sampling from approximate distributions by reweighting.
Art Owen’s Importance sampling chapter
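A minimal worked example (mine, not Owen's): estimating a rare-event probability under a standard Gaussian by reweighting samples from a shifted proposal.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
n = 200_000
a = 4.0   # estimate P(X > a) for X ~ N(0, 1), a rare event

# Proposal shifted into the rare region: Y ~ N(a, 1).
y = rng.standard_normal(n) + a
# Importance weight phi(y) / phi(y - a) = exp(-a y + a^2 / 2).
w = np.exp(-a * y + 0.5 * a * a)
est = float(np.mean((y > a) * w))

truth = 0.5 * erfc(a / sqrt(2.0))  # exact tail probability
rel_err = abs(est - truth) / truth
```

Naive Monte Carlo with the same budget would see the event only a handful of times (the probability is about \(3 \times 10^{-5}\)); the reweighted estimator gets a few percent relative accuracy.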
References Ben Rached, N., A. Kammoun, M.-S. Alouini, and R. Tempone. 2016. “Unified Importance Sampling Schemes for Efficient Simulation of Outage Capacity over Generalized Fading Channels.” IEEE Journal of Selected Topics in Signal Processing 10 (2): 376–88. https://doi.org/10.1109/JSTSP.2015.2500201. Ben Rached, Nadhir, Zdravko Botev, Abla Kammoun, Mohamed-Slim Alouini, and Raul Tempone.
ELBO
https://danmackinlay.name/notebook/elbo.html
Wed, 28 Oct 2020 10:59:07 +1100
References
\(\renewcommand{\Ex}{\mathbb{E}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\kl}{\operatorname{KL}} \renewcommand{\H}{\mathbb{H}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\pd}{\partial}\)
On using the most convenient probability metric (i.e. KL divergence) to do variational inference.
There is nothing novel here. But everyone who is doing variational inference has to work through this just once, and I’m doing so here.
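Here is the one-off calculation in miniature (my toy example, not from the notebook): for a conjugate Gaussian model the ELBO has a closed form, sits below the log evidence everywhere, and touches it exactly at the true posterior.

```python
import numpy as np

# Toy conjugate model: z ~ N(0, 1), x | z ~ N(z, 1), one observation x.
# Marginally p(x) = N(x; 0, 2); the posterior is N(x/2, 1/2).
x = 1.0
log_evidence = -0.5 * np.log(4 * np.pi) - x**2 / 4

# Closed-form ELBO for the variational family q(z) = N(m, s^2):
# E_q[log p(x|z)] + E_q[log p(z)] + entropy of q.
def elbo(m, s):
    e_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s**2)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s**2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)
    return e_loglik + e_logprior + entropy

gaps = [log_evidence - elbo(m, s)
        for m in np.linspace(-2.0, 2.0, 9)
        for s in (0.3, 0.7, 1.0, 2.0)]
gap_at_posterior = float(log_evidence - elbo(x / 2, np.sqrt(0.5)))
```

The gap is exactly the KL divergence from \(q\) to the posterior, hence non-negative everywhere and zero when \(q\) equals the posterior.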
Yuge Shi’s introduction is the best short intro that gets to the state of the art. The canonical intro is de Garis Matthews (2017), who did a thesis on it.
Path continuity of stochastic processes
https://danmackinlay.name/notebook/path_continuity.html
Tue, 27 Oct 2020 07:22:04 +1100
Kolmogorov continuity theorem. Stochastic continuity Continuity entailments SDEs with rough paths “Random DEs” References
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
“When are the paths of a stochastic process continuous?” is a question one might like to ask. But things are never so simple in stochastic process theory. Continuity is not unambiguous here; we need to ask more precise questions. If we are concerned about whether the paths sampled from the process are almost-surely continuous functions then we probably mean something like:
Efficient factoring of GP likelihoods
https://danmackinlay.name/notebook/gp_factoring.html
Mon, 26 Oct 2020 12:46:34 +1100
Basic sparsity via inducing variables SVI for Gaussian processes Latent Gaussian Process models References
There are many ways to cleverly slice up GP likelihoods so that inference is cheap.
This page is about some of them, especially the union of sparse and variational tricks. Scalable Gaussian process regressions choose cunning factorisations such that the model collapses down to a lower-dimensional thing than it might have seemed to need, at least approximately.
Transforms of RVs
https://danmackinlay.name/notebook/transforms_of_rvs.html
Fri, 23 Oct 2020 07:54:19 +1100
Stochastic Itō-Taylor expansion Linearization Unscented transform References
I have a nonlinear transformation of a random process. What is its distribution?
Stochastic Itō-Taylor expansion
See stochastic Taylor expansion. tl;dr: more trouble than it is worth.
Linearization
As seen in the Ensemble Kalman Filter.
Unscented transform
The great invention of Uhlmann and Julier is the unscented transform, which uses a ‘\(\sigma\)-point approximation.’
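A sketch of the scalar version (my example; for \(n=1\) and \(\kappa=2\) the classic sigma points coincide with 3-point Gauss-Hermite quadrature):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar unscented transform with kappa = 2: for n = 1 dimension the
# sigma points are mu, mu +/- sqrt(n + kappa) * sigma, with weights
# kappa/(n + kappa) and 1/(2(n + kappa)).
def unscented_mean(g, mu, sigma, kappa=2.0):
    n = 1
    spread = np.sqrt(n + kappa) * sigma
    points = np.array([mu, mu + spread, mu - spread])
    weights = np.array([kappa / (n + kappa),
                        0.5 / (n + kappa),
                        0.5 / (n + kappa)])
    return float(weights @ g(points))

# Push X ~ N(0, 1) through cos; the exact answer is E[cos X] = exp(-1/2).
ut = unscented_mean(np.cos, 0.0, 1.0)
mc = float(np.cos(rng.standard_normal(1_000_000)).mean())
truth = float(np.exp(-0.5))
```

Three deterministic function evaluations get within about one percent of the truth, which a million Monte Carlo samples then confirm.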
In the context of Kalman filtering,
Itō-Taylor expansion
https://danmackinlay.name/notebook/stochastic_taylor_expansion.html
Thu, 15 Oct 2020 13:38:07 +1100
References
Placeholder, for discussing the Taylor expansion equivalent for an SDE.
Let \(f\) denote a smooth function. Then from Itō’s lemma, \[ f\left(X_{t}\right)=f\left(X_{0}\right)+\int_{s=0}^{t} L^{0} f\left(X_{s}\right) d s+\int_{s=0}^{t} L^{1} f\left(X_{s}\right) d B_{s} \] where the operators \(L^{0}\) and \(L^{1}\) are defined by \[ L^{0}=a(x) \frac{\partial}{\partial x}+\frac{1}{2} b(x)^{2} \frac{\partial^{2}}{\partial x^{2}} \quad \text { and } \quad L^{1}=b(x) \frac{\partial}{\partial x} \] We may repeat this procedure arbitrarily many times.
Differentiating through the Gamma
https://danmackinlay.name/notebook/gamma_diff.html
Thu, 15 Oct 2020 10:50:59 +1100
References
Suppose I want to find a distributional gradient for a gamma process. Generically I would find this via Monte Carlo gradient estimation.
Here is a problem-specific method:
I allow the latent random state to have more dimensions than a univariate. Let’s get specific. An example arises if we raid the random-variate-generation literature for transform methods to generate RNGs and differentiate them. A Gamma variate can be generated from a transformed normal and a uniform random variable, or two uniforms, depending on the parameter range.
Gamma processes
https://danmackinlay.name/notebook/gamma_processes.html
Tue, 13 Oct 2020 15:13:34 +1100
Gamma distribution Moments Multivariate gamma distribution with dependence Gamma superpositions The Gamma process Gamma bridge Time-warped gamma process Matrix gamma processes Centred gamma process As a Lévy process Gradients Gamma random field References
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}}\]
Gamma processes provide the classic subordinator models, i.e. non-decreasing Lévy processes. By “gamma process” in fact I mean specifically a Lévy process with gamma increments.
Inverse problems for complex models
https://danmackinlay.name/notebook/inverse_problems_for_complex_models.html
Tue, 13 Oct 2020 12:07:35 +1100
References
Inverse problems where the model is more or less a black box.
References Brehmer, Johann, Gilles Louppe, Juan Pavez, and Kyle Cranmer. 2020. “Mining Gold from Implicit Models to Improve Likelihood-Free Inference.” Proceedings of the National Academy of Sciences 117 (10): 5242–49. https://doi.org/10.1073/pnas.1915980117. Cranmer, Kyle, Johann Brehmer, and Gilles Louppe. 2020. “The Frontier of Simulation-Based Inference.” Proceedings of the National Academy of Sciences, May.
Subordinators
https://danmackinlay.name/notebook/subordinators.html
Thu, 08 Oct 2020 15:54:10 +1100
Properties Gamma processes Poisson processes Compound Poisson processes with non-negative increments Inverse Gaussian processes An increasing linear function is a subordinator Positive linear combinations of other subordinators Subordination of other subordinators Generalized Gamma Convolutions via Kendall’s identity Multivariate Subordinator-valued stochastic process References
\[\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\bf}[1]{\mathbf{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}{\mathbb{I}}\]
A subordinator is an a.s. non-decreasing Lévy process \(\{\rv{g}(t)\}, t \in \mathbb{R}\) with state space \(\mathbb{R}_+\equiv [0,\infty)\) such that
Sparse model selection
https://danmackinlay.name/notebook/sparse_model_selection.html
Fri, 02 Oct 2020 17:50:51 +1000https://danmackinlay.name/notebook/sparse_model_selection.htmlFOCI Stability selection Relaxed Lasso Dantzig Selector Garotte Degrees-of-freedom penalties References On choosing the right model and regularisation parameter in sparse regression, which turn out to be nearly the same, and closely coupled to doing the regression. There are some wrinkles.
🏗 Talk about when degrees-of-freedom penalties work, when cross-validation does, and so on.
FOCI The new hotness sweeping the world is FOCI, a sparse model selection procedure (Azadkia and Chatterjee 2019) based on Chatterjee’s ξ statistic as an independence test.
Monte Carlo gradient estimation
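For concreteness, Chatterjee’s ξ has a simple closed form in the no-ties case: sort the pairs by \(x\), rank the reordered \(y\), and penalise large jumps between consecutive ranks. A sketch implementation (mine, not Azadkia and Chatterjee’s reference code, and ignoring the tie-handling their paper covers):

```python
import numpy as np

def xi_coefficient(x, y):
    """Chatterjee's xi correlation, no-ties formula.

    Near 0 when y is independent of x; tends to 1 as y becomes a
    (noiseless, measurable) function of x. Note it is asymmetric in
    its arguments, unlike Pearson or Spearman correlation.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x, kind="stable")
    ranks = np.argsort(np.argsort(y[order])) + 1  # ranks of y, in x-order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

Unlike coefficients built on monotone association, this detects \(y = f(x)\) for non-monotone \(f\) as well, which is what makes it usable as a general independence test inside a selection procedure.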
https://danmackinlay.name/notebook/mc_grad.html
Wed, 30 Sep 2020 10:59:22 +1000https://danmackinlay.name/notebook/mc_grad.htmlReferences Taking gradients through integrals.
See Mohamed et al. (2020) for a roundup.
https://github.com/deepmind/mc_gradients
A common activity for me at the moment is differentiating through an integral, for example through the inverse-CDF lookup.
You see, what I would really like is the derivative of the mass-preserving continuous map \(\phi_{\theta, \tau}\) such that
\[\mathsf{z}\sim F(\cdot;\theta) \Rightarrow \phi_{\theta, \tau}(\mathsf{z})\sim F(\cdot;\tau). \] Now suppose I wish to optimise or otherwise perturb \(\theta\).
Monte Carlo optimisation
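When the CDF is tractable, one such map is just the composition \(\phi_{\theta,\tau} = F^{-1}(\cdot;\tau) \circ F(\cdot;\theta)\), and its \(\theta\)-derivative can be taken through that composition. A sketch for the exponential family \(F(z;\theta) = 1 - e^{-\theta z}\), where everything is closed-form (the parameter values are arbitrary):

```python
import numpy as np

def phi(z, theta, tau):
    """Mass-preserving map: push z ~ Exp(theta) forward to Exp(tau).

    u = F(z; theta) is Uniform(0, 1), and F^{-1}(u; tau) maps it to
    the target. Here the composition collapses to (theta / tau) * z,
    so the theta-derivative is simply z / tau.
    """
    u = 1.0 - np.exp(-theta * z)   # F(z; theta)
    return -np.log1p(-u) / tau     # F^{-1}(u; tau)

rng = np.random.default_rng(0)
theta, tau = 2.0, 5.0
z = rng.exponential(1.0 / theta, size=100_000)  # z ~ F(.; theta)
w = phi(z, theta, tau)                          # w ~ F(.; tau)
```

In general the composition will not collapse to anything this neat, which is why one reaches for automatic differentiation or the estimators surveyed in Mohamed et al. (2020).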
https://danmackinlay.name/notebook/mc_opt.html
Wed, 30 Sep 2020 10:59:22 +1000https://danmackinlay.name/notebook/mc_opt.htmlReferences Optimisation via Monte Carlo Simulation. Annealing and all that. TBD.
References Abernethy, Jacob, and Elad Hazan. 2016. “Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier.” In International Conference on Machine Learning, 2520–28. PMLR. http://proceedings.mlr.press/v48/abernethy16.html. Botev, Zdravko I., and Dirk P. Kroese. 2008. “An Efficient Algorithm for Rare-Event Probability Estimation, Combinatorial Optimization, and Counting.” Methodology and Computing in Applied Probability 10 (4): 471–505.
Splitting simulation
https://danmackinlay.name/notebook/splitting_simulation.html
Mon, 28 Sep 2020 10:38:21 +1000https://danmackinlay.name/notebook/splitting_simulation.htmlReferences Splitting is a method for zooming in on the important region of an intractable probability distribution.
I have just spent so much time writing about this that I had better pause for a while and leave this as a placeholder.
References Aalen, Odd O., Ørnulf Borgan, and S. Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Statistics for Biology and Health.
Extreme value theory
https://danmackinlay.name/notebook/extreme_value_theory.html
Fri, 25 Sep 2020 16:25:18 +1000https://danmackinlay.name/notebook/extreme_value_theory.htmlGeneralized Pareto Distribution Generalized Extreme Value distributions Burr distribution References In a satisfying way, it turns out that there are only so many shapes that probability densities can assume as they head off toward infinity. Extreme value theory makes this notion precise, and gives us some tools to work with them.
See also densities and intensities, survival analysis.
🏗
Generalized Pareto Distribution Best intro from Hosking and Wallis (1987):
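Since the GPD has a closed-form quantile function, inverse-CDF sampling is one line. A sketch in the usual shape/scale convention for exceedances over a zero threshold (function and parameter names are mine; Hosking and Wallis use an opposite sign convention for the shape):

```python
import numpy as np

def gpd_sample(n, shape, scale, rng=None):
    """Inverse-CDF sampling from the Generalized Pareto Distribution.

    Convention: F(x) = 1 - (1 + shape * x / scale)^(-1/shape), x >= 0,
    with shape -> 0 recovering the exponential distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)
    if shape == 0.0:
        return -scale * np.log1p(-u)          # exponential limit
    return scale * ((1.0 - u) ** (-shape) - 1.0) / shape

x = gpd_sample(200_000, shape=0.25, scale=1.0, rng=np.random.default_rng(1))
```

For shape < 1 the mean is scale / (1 − shape); larger shape means a heavier power-law tail, and at shape ≥ 1 the mean is infinite, which is exactly the tail-heaviness dial that makes the GPD the canonical model for threshold exceedances.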