regression on Dan MacKinlay
https://danmackinlay.name/tags/regression.html
Recent content in regression on Dan MacKinlay.

Gaussian process regression
https://danmackinlay.name/notebook/gp_regression.html
Tue, 13 Apr 2021 14:40:14 +0800
Quick intro · Density estimation · Kernels · Using state filtering · On lattice observations · On manifolds · By variational inference · With inducing variables · By variational inference with inducing variables · With vector output · Approximation with dropout · For dimension reduction · Readings · Implementations (GeoStat Framework, GPy, Stheno, GPyTorch, GPflow, misc Python, Stan, AutoGP, scikit-learn, misc Julia, MATLAB)
Chi Feng’s GP regression demo.
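Since the post is about GP regression, a minimal sketch of the core computation may help (plain NumPy rather than any of the libraries listed; the sine-wave data, kernel choice, and noise level are illustrative assumptions): the posterior mean at new inputs follows from conditioning the joint Gaussian on noisy observations.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # squared-exponential covariance between two sets of 1-D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)                      # training inputs
y = np.sin(X) + 0.1 * rng.standard_normal(20)  # noisy observations

Xs = np.linspace(0, 5, 50)            # test inputs
K = rbf(X, X) + 0.1**2 * np.eye(20)   # Gram matrix plus observation noise
alpha = np.linalg.solve(K, y)
mean = rbf(Xs, X) @ alpha             # GP posterior mean at the test inputs
```

The same conditioning also yields a posterior covariance; the libraries above mostly differ in how they approximate this O(n³) solve at scale.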
Gaussian random fields are stochastic processes/fields with jointly Gaussian distributions of observations.

Dynamical systems via Koopman operators
https://danmackinlay.name/notebook/koopmania.html
Fri, 09 Apr 2021 11:46:21 +0800
NB: Koopman here is B. O. Koopman (Koopman 1931), not S. J. Koopman, who also works in dynamical systems.
I do not know how this works, but maybe this fragment of abstract will do for now (Budišić, Mohr, and Mezić 2012):
A majority of methods from dynamical system analysis, especially those in applied settings, rely on Poincaré’s geometric picture that focuses on “dynamics of states.” …

Neural nets with implicit layers
https://danmackinlay.name/notebook/nn_implicit.html
Mon, 15 Mar 2021 12:16:50 +1100
A unifying framework for various networks, including neural ODEs, where our layers are not simple forward operations but whose evaluation is represented as some optimisation problem.
For some info see the NeurIPS 2020 tutorial, Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond, by Zico Kolter, David Duvenaud, and Matt Johnson.
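To make the “layer as optimisation problem” idea concrete, a hedged toy sketch (my own, not from the post): a layer whose output is defined implicitly as the fixed point of z = tanh(Wz + x), found here by naive iteration. Real deep-equilibrium models use proper root-finders and get gradients via the implicit function theorem.

```python
import numpy as np

def implicit_layer(x, W, n_iter=100):
    # Output defined implicitly by the equation z = tanh(W @ z + x);
    # naive fixed-point iteration converges when the map is a contraction.
    z = np.zeros_like(x)
    for _ in range(n_iter):
        z = np.tanh(W @ z + x)
    return z

rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((4, 4))  # small norm => contraction => unique fixed point
x = rng.standard_normal(4)
z = implicit_layer(x, W)

# the output satisfies the defining equation to numerical precision
residual = np.max(np.abs(z - np.tanh(W @ z + x)))
```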
NB: This is different to the implicit representation method. Since implicit layers and implicit representation layers also occur in the same problems (such as ML PDEs), this terminological confusion will haunt us.

Neural nets with basis decomposition layers
https://danmackinlay.name/notebook/nn_basis.html
Tue, 09 Mar 2021 12:06:42 +1100
Neural networks with continuous basis functions · Convolutional neural networks as sparse coding
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyse neural nets’ mysterious success. For another, it gives you interpolation for free. There are possibly other reasons - perhaps the right basis gives you better priors for understanding a partial differential equation?

Stochastic processes which represent measures over the reals
https://danmackinlay.name/notebook/measure_priors.html
Mon, 08 Mar 2021 16:44:16 +1100
Subordinators · Other measure priors
Often I need a nonparametric representation for a measure over some non-finite index set. We might want to represent a probability, a mass, or a rate. I might want this representation to be something flexible and low-assumption, like a Gaussian process. If I want a nonparametric representation of functions this is not hard; I can simply use a Gaussian process.

Convolutional subordinator processes
https://danmackinlay.name/notebook/subordinator_convolution.html
Mon, 08 Mar 2021 15:29:19 +1100
Stochastic processes by convolution of noise with smoothing kernels, where the driving noise is a Lévy subordinator.
Why would we want this? One reason is that this gives us a way to create nonparametric distributions over measures.
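A toy sketch of the construction, under illustrative assumptions (i.i.d. gamma increments standing in for a discretised gamma subordinator, a Gaussian smoothing kernel): because both the jumps and the kernel are non-negative, the smoothed field is non-negative, so it can serve as a random density or rate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt = 500, 0.01
# increments of a gamma subordinator: independent Gamma(rate * dt) jumps, all > 0
increments = rng.gamma(shape=2.0 * dt, scale=1.0, size=n)

# smooth the jump measure with a non-negative Gaussian kernel
kernel = np.exp(-0.5 * ((np.arange(-50, 51) * dt) / 0.1) ** 2)
kernel /= kernel.sum()
field = np.convolve(increments, kernel, mode="same")
# field >= 0 everywhere: a usable nonparametric random measure density
```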
References: Barndorff-Nielsen, O. E., and J. Schmiegel. 2004. “Lévy-Based Spatial-Temporal Modelling, with Applications to Turbulence.” Russian Mathematical Surveys 59 (1): 65. https://doi.org/10.1070/RM2004v059n01ABEH000701. Çinlar, E. 1979. “On Increasing Continuous Processes.” …

Convolutional Gaussian processes
https://danmackinlay.name/notebook/gp_convolution.html
Mon, 01 Mar 2021 17:08:51 +1100
Convolutions with respect to a non-stationary driving noise · Varying convolutions with respect to a stationary white noise
Gaussian processes by convolution of noise with smoothing kernels, which is a kind of dual to defining them through covariances.
This is especially interesting because it can be made computationally convenient (we can enforce locality) and can accommodate non-stationarity.
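As a minimal illustration of the stationary special case (my own toy discretisation, not code from the post): convolving discretised white noise with a smoothing kernel yields a stationary Gaussian process whose covariance function is the autocorrelation of the kernel.

```python
import numpy as np

rng = np.random.default_rng(3)
noise = rng.standard_normal(1000)      # discretised white noise

x = np.linspace(-3, 3, 121)
kernel = np.exp(-0.5 * x**2)           # Gaussian smoothing kernel
kernel /= np.sqrt(np.sum(kernel**2))   # normalise so the process has unit variance

# each output is a fixed linear combination of Gaussians, hence jointly Gaussian;
# locality comes for free because the kernel has (effectively) compact support
gp = np.convolve(noise, kernel, mode="valid")
```

Non-stationarity then amounts to letting the kernel shape vary with location.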
Convolutions with respect to a non-stationary driving noise: H. K. …

Convolutional stochastic processes
https://danmackinlay.name/notebook/stochastic_convolution.html
Mon, 01 Mar 2021 16:13:24 +1100
Stochastic processes generated by convolution of white noise with smoothing kernels, which is not unlike kernel density estimation where the “data” is random.
For now, I am mostly interested in certain special cases: Gaussian process convolutions and subordinator convolutions.
patrick-kidger/Deep-Signature-Transforms: code for “Deep Signature Transforms”. patrick-kidger/signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU. References: Bolin, David. …

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_vector.html
Tue, 23 Feb 2021 12:09:36 +1100
Co-regionalization · Multi-task · Multi-Output Spectral Mixture Kernel
In which I discover for myself whether “multi-task” and “co-regionalized” approaches are different. Álvarez, Rosasco, and Lawrence (2012).
Overview from Invenia: Gaussian Processes: from one to many outputs
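The intrinsic coregionalization model from this literature can be sketched in a few lines (illustrative inputs and coupling matrix, my own toy code): a PSD matrix B couples the outputs, and the joint Gram matrix over all (output, input) pairs is the Kronecker product of B with the input kernel matrix.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # squared-exponential kernel over 1-D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

X = np.linspace(0, 1, 5)
Kx = rbf(X, X)                        # shared input covariance

# intrinsic coregionalization: B = A A^T is PSD by construction and
# encodes correlations between the two outputs
A = np.array([[1.0, 0.0], [0.5, 1.0]])
B = A @ A.T
K = np.kron(B, Kx)                    # joint covariance over (output, input) pairs
```

Because K is a Kronecker product of PSD matrices, it is itself a valid (PSD) covariance, which is the whole trick.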
Co-regionalization: [the] community has begun to turn its attention to covariance functions for multiple outputs. One of the paradigms that has been considered (Bonilla, Chai, and Williams 2007; Osborne et al. …

Random-forest-like methods
https://danmackinlay.name/notebook/boosting_bagging.html
Thu, 11 Feb 2021 08:29:09 +1100
Random trees, forests, jungles · Self-regularising properties · Gradient boosting · Bayes · Implementations (surfin, xgboost, catboost, bartMachine)
Doubling down on ensemble methods: mixing predictions from many weak learners (in this case decision trees) to get strong learners. “A selection of randomly stopped clocks is never far from wrong.”
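A toy sketch of the bagging idea (decision stumps as the weak learners, illustrative data of my own; real implementations like xgboost are far more sophisticated): fit each weak learner to a bootstrap resample and average their predictions.

```python
import numpy as np

def fit_stump(X, y):
    # exhaustively pick the single threshold split minimising squared error
    best = (np.inf, 0.0, y.mean(), y.mean())
    for s in np.unique(X):
        left, right = y[X <= s], y[X > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, s, left.mean(), right.mean())
    return best[1:]  # (threshold, left prediction, right prediction)

def bagged_predict(stumps, X):
    # average the weak learners' predictions
    return np.mean([np.where(X <= s, lo, hi) for s, lo, hi in stumps], axis=0)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, 200)
y = (X > 0.5).astype(float) + 0.1 * rng.standard_normal(200)

# bagging: bootstrap-resample, fit a stump to each resample, average
stumps = []
for _ in range(25):
    idx = rng.integers(0, 200, 200)
    stumps.append(fit_stump(X[idx], y[idx]))
yhat = bagged_predict(stumps, X)
```

Averaging over resamples is what smooths out the high variance of the individual trees.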
There are many flavours of random-forest-like learning systems. The rule of thumb seems to be “fast to train, fast to use.” …

Multi-output Gaussian process regression
https://danmackinlay.name/notebook/gp_regression_functional.html
Mon, 07 Dec 2020 20:43:06 +1100
In which I discover… Learning operators via GPs.
References: Brault, Romain, Florence d’Alché-Buc, and Markus Heinonen. 2016. “Random Fourier Features for Operator-Valued Kernels.” In Proceedings of The 8th Asian Conference on Machine Learning, 110–25. http://arxiv.org/abs/1605.02536. Brault, Romain, Néhémy Lim, and Florence d’Alché-Buc. n.d. “Scaling up Vector Autoregressive Models With Operator-Valued Random Fourier Features.” Accessed August 31, 2016. https://aaltd16.irisa.fr/files/2016/08/AALTD16_paper_11.pdf. Brouard, Céline, Marie Szafranski, and Florence d’Alché-Buc. …

Observability and sensitivity in learning dynamical systems
https://danmackinlay.name/notebook/sensitivity.html
Mon, 09 Nov 2020 13:38:40 +1100
The contact between ergodic theorems and statistical identifiability. How precisely can I learn a given parameter of a dynamical system from observation? In ODE theory a useful concept is sensitivity analysis, which tells us how much gradient information our observations give us about a parameter. This comes in local (at my current estimate) and global (over all parameter ranges) flavours.
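A hedged toy example of the local flavour (the logistic-growth ODE, Euler discretisation, and step sizes are all my illustrative assumptions): how much does the endpoint of the trajectory move per unit change in the growth-rate parameter?

```python
import numpy as np

def solve_logistic(r, x0=0.1, T=5.0, n=500):
    # forward-Euler solution of dx/dt = r * x * (1 - x), returning x(T)
    dt = T / n
    x = x0
    for _ in range(n):
        x += dt * r * x * (1 - x)
    return x

# local sensitivity of the endpoint to the growth rate r, by central differences;
# adjoint/forward sensitivity equations do this more efficiently for many parameters
r, h = 1.0, 1e-5
sens = (solve_logistic(r + h) - solve_logistic(r - h)) / (2 * h)
```

A small sensitivity here would mean observations of the endpoint carry little information about r, linking directly to the identifiability question.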
In linear systems theory the term observability is used to discuss whether we can in fact identify a parameter or a latent state, which I will conflate for the current purposes.

Non-Gaussian Bayesian functional regression
https://danmackinlay.name/notebook/stochastic_process_regression.html
Wed, 16 Sep 2020 14:07:32 +1000
Regression using non-Gaussian random fields. Generalised Gaussian process regression.
Is there ever an actual need for this? Or can we just use a mostly-Gaussian process with some non-Gaussian marginal distribution and pretend, via GP quantile regression, some variational GP approximation, or a non-Gaussian likelihood over Gaussian latents? Presumably if we suspect moments higher than the second are important, or that there is some actual stochastic process that we know matches our phenomenon, we might bother with this, but oh my, it can get complicated.

Gaussian process quantile regression
https://danmackinlay.name/notebook/gp_quantile_regression.html
Wed, 16 Sep 2020 13:44:32 +1000
How to do quantile regression with GPs.
References: Boukouvalas, Alexis, Remi Barillec, and Dan Cornford. 2012. “Gaussian Process Quantile Regression Using Expectation Propagation.” In ICML 2012. http://arxiv.org/abs/1206.6391. Reich, Brian J. 2012. “Spatiotemporal Quantile Regression for Detecting Distributional Changes in Environmental Processes.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 61 (4): 535–53. https://doi.org/10.1111/j.1467-9876.2011.01025.x. Reich, Brian J., Montserrat Fuentes, and David B. …

Long memory time series
https://danmackinlay.name/notebook/long_memory_processes.html
Thu, 28 May 2020 10:56:49 +1000
Hurst exponents, non-stationarity, etc.
TBD.
References: Beran, Jan. 1992. “Statistical Methods for Data with Long-Range Dependence.” Statistical Science 7 (4): 404–16. Beran, Jan. 1994. Statistics for Long-Memory Processes. CRC Press. http://books.google.com?id=jdzDYWtfPC0C. Beran, Jan. 2010. “Long-Range Dependence.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (1): 26–35. https://doi.org/10.1002/wics.52. Beran, Jan, and Norma Terrin. 1996. “Testing for a Change of the Long-Memory Parameter.” Biometrika 83 (3): 627–38. https://doi. …

Forecasting
https://danmackinlay.name/notebook/forecasting.html
Thu, 21 May 2020 11:32:13 +1000
Model selection · Software (Tidyverse time series analysis and forecasting packages, prophet, Causal impact, asap, Micropredictions.org)
Time series prediction niceties, where what needs to be predicted is the future. Filed under forecasting because in machine learning terminology, prediction is a general term that does not necessarily imply extrapolation into the future.
🏗 handball to Rob Hyndman.
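The rolling-origin (expanding-window) scheme Hyndman advocates for cross-validating time series can be sketched as follows (the naive last-value forecaster and random-walk data are my illustrative assumptions; the point is that test sets always lie strictly in the training window's future):

```python
import numpy as np

def rolling_origin_splits(n, min_train, horizon=1):
    # expanding-window splits: train on [0, t), test on [t, t + horizon)
    for t in range(min_train, n - horizon + 1):
        yield np.arange(t), np.arange(t, t + horizon)

rng = np.random.default_rng(5)
y = np.cumsum(rng.standard_normal(60))  # a random walk to forecast

errs = []
for train, test in rolling_origin_splits(len(y), min_train=30):
    forecast = y[train[-1]]             # naive last-value forecast
    errs.append(abs(y[test[0]] - forecast))
cv_mae = np.mean(errs)
```

Shuffled k-fold CV would leak the future into the training set here; the expanding window avoids that.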
Model selection: Rob Hyndman explains how to cross-validate time series models that use only lagged observations.

Post stratification
https://danmackinlay.name/notebook/post_stratification.html
Wed, 22 Apr 2020 08:44:52 +1000
A trick for handling a non-random sampling problem particularly common in survey data.
MRP, a.k.a. Mister P, is one method for correcting for non-response bias and other such sampling biases. See also RPP.
I have not used this tool practically and so am not at all qualified to comment.
What I can do is link to my reading list of examples and explainers.

Survey modelling
https://danmackinlay.name/notebook/survey_modelling.html
Tue, 21 Apr 2020 19:26:06 +1000
Sampling challenges · Post stratification · Cluster randomized trials · Ordinal data · Confounding and observational studies · Graph sampling · Data sets
Tricks of particular use in modelling survey data: hierarchical models to adjust for issues such as non-random sampling and the varied great difficulties of eliciting human preferences by asking about them. A grab bag of weird data types, problems, and sampling biases.
Sampling challenges: what is that Lizardman constant?

Applied psephology
https://danmackinlay.name/notebook/psephology.html
Mon, 20 Jan 2020 06:34:30 +1100
Betting on them · Practicalities · Australian specifics · On voters strategically changing electorates
Tom Gauld:
Voting system
On the practicalities of voter modeling in elections, for the purpose of influencing how they vote, with special reference to Australian elections. Marketing psychology for governments, and for those who wish to have control of governments. This also relates, in these increasingly polarised times, to the difficulties of getting along.

Gaussian processes
https://danmackinlay.name/notebook/gaussian_processes.html
Tue, 03 Dec 2019 10:11:26 +1100
Relationship between addition of covariance kernels and of processes
“Gaussian processes” are stochastic processes/fields with jointly Gaussian distributions of observations. The most familiar of these to many of us is the Gauss-Markov process, a.k.a. the Wiener process, but there are many others. These processes are convenient due to certain useful properties of the multivariate Gaussian distribution, e.g. being uniquely specified by first and second moments, nice behaviour under various linear operations, kernel tricks…

Sparse coding
https://danmackinlay.name/notebook/sparse_coding.html
Tue, 05 Nov 2019 16:28:28 +0100
Resources · Wavelet bases · Matching pursuits · Learnable codings · Codings with desired invariances · Misc implementations
Linear expansion with dictionaries of basis functions, with respect to which you wish your representation to be sparse; i.e. in the statistical case, basis-sparse regression. But even outside statistics, you may wish simply to approximate some data compactly. My focus here is on the noisy-observation case, although the same results are recycled throughout the field.

Delays and reverbs for audio processing
https://danmackinlay.name/notebook/delays.html
Tue, 22 Oct 2019 14:30:13 +1100
Designing stable delays · Designing allpass delays · Designing delay lengths · Delays for signal interpolation · Things to try
In which I think about parameterisations and implementations of audio recurrence for use in music.
A particular nook in the linear feedback process library.
Designing stable delays: parameterising stable Multi-Input Multi-Output (MIMO) systems in signal processing can be done by using orthogonal and unitary matrices as the transfer operator, parameterising them as stable linear systems.

Statistical learning theory for time series
https://danmackinlay.name/notebook/learning_theory_time_series.html
Tue, 01 Oct 2019 16:20:07 +1000
Statistical learning theory for dependent data such as time series, and possibly other dependency structures. But I only know about results for time series.
Non-stationary, non-asymptotic bounds please. Keywords: Ergodic, α-, β-mixing.
Mohri and Kuznetsov have done lots of work here; see, e.g., their NIPS 2016 tutorial. There seem to be a lot of types of ergodic/mixing results, about which I as yet know nothing.

Models for count data
https://danmackinlay.name/notebook/count_models.html
Sun, 22 Sep 2019 13:24:44 +1000
Poisson · Negative binomial (mean/dispersion parameterisation, Pólya) · Geometric (mean parameterisation) · Lagrangian distributions (Poisson-Poisson Lagrangian, delta Lagrangian, general Lagrangian) · Discrete stable · Zipf/zeta models · Yule-Simon · Conway-Maxwell-Poisson · Decomposability properties (stability, self-divisibility)
I have data and/or predictions made up of non-negative integers \({\mathbb N }\cup\{0\}\). What probability distributions can I use to model them?
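A quick numerical illustration of why one reaches past the Poisson (my own toy example, not from the post): negative-binomial counts are overdispersed, with variance mu + mu²/r, and the dispersion parameter r can be recovered by the method of moments.

```python
import numpy as np

rng = np.random.default_rng(6)
# NumPy's parameterisation: number of failures before r successes, P(success) = p
r, p = 3.0, 0.4
counts = rng.negative_binomial(r, p, size=20000)

mu, var = counts.mean(), counts.var()
# for a Poisson, var == mu; negative binomial has var = mu + mu^2 / r,
# so r can be recovered as mu^2 / (var - mu) by the method of moments
r_hat = mu**2 / (var - mu)
```

Checking whether the sample variance exceeds the sample mean is the usual first diagnostic before committing to a Poisson model.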
All the distributions I discuss here have support unbounded above.

Covariance estimation for stochastic processes
https://danmackinlay.name/notebook/covariance_estimation.html
Sat, 21 Sep 2019 14:34:44 +1000
Bayesian · Sandwich estimators · Parametric covariance functions · To read
Estimating the thing that is given to you by oracles in statistics homework assignments: the covariance and precision matrices of things, or, if your data is indexed in some fashion, the covariance kernel. A complement to Gaussian process simulation.
I am not doing a complete theory of covariance estimation here, just mentioning a couple of tidbits for future reference.

Hierarchical models
https://danmackinlay.name/notebook/hierarchical_models.html
Mon, 19 Aug 2019 13:25:30 +1000
The classical regression set-up: your process of interest generates observations conditional on certain predictors. The observations (but not the predictors) are corrupted by noise.
Hierarchical set-up: there is a directed graph of interacting random processes generating the observations you observe, and you would like to reconstruct the parameters, possibly even conditional distributions of the parameters.
Known as mixed-effects models, hierarchical models, nested models (careful!) …

(Weighted) least squares fits
https://danmackinlay.name/notebook/least_squares.html
Wed, 22 May 2019 11:52:37 +1000
Iteratively reweighted
A classic. Surprisingly deep.
A few non-comprehensive notes on approximating by the arbitrary-but-convenient expedient of minimising the sum of the squares of the deviances.
As used in many, many problems, e.g. lasso regression.
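A minimal weighted-least-squares sketch (heteroskedastic toy data of my own; the weights are the usual inverse noise variances): WLS is just OLS after rescaling each row by the square root of its weight.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0, 1, n)
sigma = 0.05 + 0.5 * x                        # noise grows with x (heteroskedastic)
y = 2.0 + 3.0 * x + sigma * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
w = 1.0 / sigma**2                            # inverse-variance weights

# weighted least squares = ordinary least squares on sqrt(w)-rescaled rows
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```

With the right weights this is the Gauss-Markov efficient linear estimator; with iteratively re-estimated weights it becomes IRLS.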
Nonlinear least squares with ceres-solver:
Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve non-linear least squares problems with bounds constraints, and general unconstrained optimization problems.

Wacky regression
https://danmackinlay.name/notebook/wacky_regression.html
Thu, 02 May 2019 16:21:05 +1000
I used to maintain a list of regression methods that were almost nonparametric, but as fun as that category was, I was not actually using it often, so I broke it up.
See bagging and boosting methods, neural networks, functional data analysis, Gaussian process regression, and randomised regression.
References: Fomel, Sergey. 2000. “Inverse B-Spline Interpolation.” Citeseer. http://www.reproducibility.org/RSF/book/sep/bspl/paper.pdf. Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” …

Feedback system identification, linear
https://danmackinlay.name/notebook/system_identification_linear.html
Tue, 23 Oct 2018 16:51:07 +1100
Intros · Instrumental variable regression · Unevenly sampled · Model estimation/system identification · Slotting · Method of transformed coefficients · State filters · Online · Misc · Linear Predictive Coding
In system identification, we infer the parameters of a stochastic dynamical system of a certain type, i.e. usually one with feedback, so that we can e.g. simulate it, or deconvolve it to find the inputs and hidden state, maybe using state filters.

Feedback system identification, not necessarily linear
https://danmackinlay.name/notebook/system_identification_nonlinear.html
Wed, 03 Jan 2018 17:24:23 +1100
After all, if you have a system whose future evolution is important to predict, why not try to infer a plausible model instead of a convenient one?
I am in the process of taxonomising here. Stuff which fits the particular (likelihood) model of recursive estimation and so on will be kept there. Miscellaneous other approaches here.
A compact overview is inserted incidentally in Cosma’s review of Fan and Yao (2003), wherein he also recommends Bosq and Blanke (2007), Bosq (1998), and Taniguchi and Kakizawa (2000).

Marketing psychology
https://danmackinlay.name/notebook/marketing_psychology.html
Mon, 29 May 2017 16:00:43 +1000
Lots of stuff here, especially at the intersection of privacy and behavioural economics.
However, I don’t know it.
Here are some links I return to.
How online shopping makes suckers of us all. 🏗 The notorious case of the supermarket finding out someone was pregnant before her family did, and using it in marketing.

Generalised linear models
https://danmackinlay.name/notebook/glm.html
Wed, 31 Aug 2016 10:27:42 +1000
TODO · Classic linear models · Generalised linear models (response distribution, linear predictor, link function, quasilikelihood) · Hierarchical generalised linear models · Generalised additive models · Generalised additive models for location, scale and shape · Generalised hierarchical additive models for location, scale and shape · Generalised estimating equations
Using the machinery of linear regression to predict in somewhat more general regressions, using least-squares or quasi-likelihood approaches. This means you are still doing something like familiar linear regression, but outside the setting of e. …

Count time series models
https://danmackinlay.name/notebook/count_time_series.html
Wed, 09 Dec 2015 13:09:18 +0800
Maximum processes · Finite-state Markov chains · GLM-type autoregressive · Linear branching-type and self-decomposable · Queueing models · Other
Statistical models for time series with discrete time index and discrete state index, i.e. lists of non-negative whole numbers with a causal ordering.
C&c symbolic dynamics, nonlinear time series wizardry, random fields, branching processes, and Galton-Watson processes for some important special cases. If there is no serial dependence, you might want unadorned count models.

High frequency time series estimation
https://danmackinlay.name/notebook/high_frequency_time_series.html
Wed, 02 Dec 2015 12:29:44 +0800
a.k.a. “fancy ARIMA”.
Classically, you estimate statistics from many i.i.d. realisations from a presumed generating process.
What if your data are realisations of sequentially dependent time series? How do you estimate parameters from a single time series realisation?
By being a flashy quant!
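Less flashily, a toy answer for the simplest case (an AR(1) with known-zero mean; the model and sample size are my illustrative assumptions): a single long realisation identifies the autoregressive coefficient via the lag-1 autocorrelation, i.e. the Yule-Walker estimator.

```python
import numpy as np

rng = np.random.default_rng(8)
phi, n = 0.8, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()  # AR(1) recursion

# Yule-Walker for a zero-mean AR(1): the lag-1 sample autocorrelation
# is a consistent estimator of phi, by ergodicity of the stationary chain
phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
```

Ergodicity is doing the work here: time averages over one realisation stand in for the ensemble averages we cannot observe.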
Bonus points: how do you do this with many time series, whose parameters themselves have a distribution you wish to estimate?

Sparse regression for inhomogeneous Hawkes processes
https://danmackinlay.name/post/masters_thesis.html
Tue, 12 May 2015 12:42:05 +0200
Sampling method for our insane social media dataset.
I completed my Master’s thesis in 2015 at the Swiss Federal Institute of Technology (ETHZ) under the supervision of Professors Sara van de Geer and Didier Sornette, and was awarded my MSc.
Keywords that the thesis combines:
sparse regression · maximum likelihood · contagion processes · count process inference · social media models · Hawkes process
The novel part is a method of sparse regression and identification of branching processes under inhomogeneous conditions, which I will summarise and make available when I have time.