control on Dan MacKinlay
Recent content in control on Dan MacKinlay (https://danmackinlay.name/tags/control.html)

Markov decision problems
https://danmackinlay.name/notebook/pomdp.html
Tue, 07 Jun 2022 15:13:23 +1000

Contents: Classic · POMDP · POMDP while learning forward propagator · References

TODO: connect to optimal control.
Classic

Bellman and Howard’s classic discrete-time stochastic control problem.
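The Bellman recursion for that classic finite problem can be iterated to a fixed point (value iteration). A minimal numpy sketch — the array shapes and the toy two-state chain are my own illustrative assumptions, not code from the linked notebook:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Bellman optimality backup iterated to convergence for a finite MDP.

    P: (A, S, S) transition kernels, one S x S stochastic matrix per action.
    R: (A, S) expected immediate reward for taking action a in state s.
    Returns the (near-)optimal value function and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)      # (A, S) action-value table
        V_new = Q.max(axis=0)        # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy 2-state, 2-action chain: action 0 stays put, action 1 swaps states.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],    # staying pays only in state 1
              [1.0, 0.0]])   # swapping pays only from state 0
V, policy = value_iteration(P, R)
```

Because the backup is a γ-contraction in the sup norm, the loop converges geometrically; here the fixed point is V = (10, 10) with policy "swap from state 0, stay in state 1".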
Warren Powell’s Introduction to Markov decision processes.

POMDP

Figure: Partial observation of Mrs Brown’s
“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state.”

Generative flow
https://danmackinlay.name/notebook/generative_flow.html
Mon, 07 Mar 2022 12:15:25 +1100

Contents: References

Placeholder. There are a lot of keywords in this new technique that sound intriguing, so here is a notebook to revisit if I ever have time.
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
GFlowNet Tutorial

Bengio et al. (2021):
“Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function.”

Rough path theory
https://danmackinlay.name/notebook/rough_paths.html
Thu, 05 Aug 2021 14:22:03 +1000

Contents: Discrete approximations · In learning · Signatures · Code · References

I am not sure yet. Some kind of alternative extension of integrals which happens to make pathwise calculations over stochastic integrals simple, in some sense. I am pretty sure they mean rough in the sense of approximate rather than in the sense of not smooth. Or maybe both?
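One concrete object from this theory is the path signature: the sequence of iterated integrals of a path, which for piecewise-linear paths has a closed form via Chen’s identity. A sketch under my own conventions (not code from the notebook):

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path.

    path: (n_points, d) array of samples, interpolated linearly between rows.
    Level 1 is the total increment; level 2 accumulates the iterated
    integrals via Chen's identity: appending a linear segment with
    increment dx adds outer(S1_so_far, dx) + outer(dx, dx) / 2.
    """
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + np.outer(dx, dx) / 2.0
        s1 += dx
    return s1, s2

# An L-shaped path in the plane: (0,0) -> (1,0) -> (1,1).
s1, s2 = signature_level2(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))
levy_area = (s2[0, 1] - s2[1, 0]) / 2.0  # signed area between path and chord
```

The symmetric part of the level-2 term is determined by level 1 (s2 + s2ᵀ = s1 ⊗ s1); the antisymmetric part, the Lévy area, is the first genuinely new information the signature carries.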
Seems to originate in a fairly impenetrable body of work by Lyons, e.g. T. Lyons (1994), but modern recommendations are to read Friz and Hairer (2020), available free online, as an introduction, which covers the simplest (?

Bandit problems
https://danmackinlay.name/notebook/bandit_problems.html
Fri, 16 Oct 2020 07:49:23 +1100

Contents: Pseudopolitical diversion · Intros · Theory · Practice · Bandits-meet-optimisation · Bandits-meet-evolution · Details · Delayed/sparse reward · Multi-world testing · Extensions · Deep reinforcement learning · Markov decision problems · Connection to graphical models · Practicalities · Sequential surrogate interactive model optimisation · Bandits with theory of mind · References

Bandit problems, Markov decision processes, a smattering of dynamic programming, game theory, optimal control, and online learning of the solutions to such problems, esp. reinforcement learning.
Learning, where you must learn an optimal action in response to your stimulus (possibly an optimal “policy” of trying different actions over time), not just an MMSE-minimal prediction from complete data.

Zeros of random trigonometric polynomials
https://danmackinlay.name/notebook/trig_roots.html
Mon, 14 Oct 2019 09:54:09 +1100

Contents: References

For a certain nonconvex optimisation problem, I would like to know the expected number of real zeros of trigonometric polynomials
\[0=\sum_{k=1}^{N}\left[A(k)\sin(kx)+B(k)\cos(kx)\right]\]
for given distributions over \(A(k)\) and \(B(k)\).
This is not exactly the usual sense of polynomial, although if one thinks about polynomials over the complex numbers and squints at it, the relationship is not hard to see: substituting \(z=e^{ix}\) turns the sum into a Laurent polynomial in \(z\).
This problem is well studied for i.i.d. standard normal coefficients \(A(k),B(k)\).
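For the Gaussian case the expected zero count can be checked numerically by counting sign changes on a fine grid. A quick Monte Carlo sketch of my own (the grid-based counter and the Kac–Rice constant are standard, not taken from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

def count_real_zeros(N, n_grid=20000):
    """Count sign changes of T(x) = sum_k A(k) sin(kx) + B(k) cos(kx)
    on a fine grid over [0, 2*pi), with i.i.d. standard normal A(k), B(k)."""
    A = rng.standard_normal(N)
    B = rng.standard_normal(N)
    x = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    kx = np.outer(x, np.arange(1, N + 1))      # (n_grid, N) phase table
    T = np.sin(kx) @ A + np.cos(kx) @ B
    return int(np.count_nonzero(np.diff(np.sign(T))))

# Monte Carlo estimate for N = 50. For i.i.d. standard normal coefficients
# the Kac-Rice formula gives E[#zeros on [0, 2*pi)] = 2*sqrt((N+1)(2N+1)/6),
# which is roughly 2N/sqrt(3) for large N.
N = 50
est = np.mean([count_real_zeros(N) for _ in range(200)])
kac_rice = 2.0 * np.sqrt((N + 1) * (2 * N + 1) / 6.0)
```

The grid resolution (about 400 samples per period of the highest frequency) makes missed pairs of nearby zeros rare, so the empirical mean should sit close to the Kac–Rice value of roughly 58.6 for N = 50.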