Gradient descent, Higher order
2019-10-26 — 2019-10-26
Wherein higher-order extensions of gradient descent are examined, and 3rd-order Halley–Chebyshev methods are noted to require tensors beyond Hessian matrices, rendering them costly in multivariate settings.
Newton-type optimization uses 2nd-order derivative information (i.e. the Hessian matrix) to solve optimization problems. Higher-order optimization uses 3rd-order derivatives and so on. These methods are elegant for univariate functions.
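To make the contrast concrete, here are the textbook univariate updates (standard forms, not spelled out in the original note): Newton's step uses $f'$ and $f''$, while the 3rd-order Halley step, obtained by applying Halley's root-finder to $f'$, also needs $f'''$.

$$
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} \quad \text{(Newton)},
\qquad
x_{k+1} = x_k - \frac{2 f'(x_k)\, f''(x_k)}{2 f''(x_k)^2 - f'(x_k)\, f'''(x_k)} \quad \text{(Halley)}.
$$

In $n$ dimensions the analogue of $f'''$ is an $n \times n \times n$ array, which is where the trouble below starts.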
This is rarely done in problems that I face because
- The 3rd-order derivatives of a multivariate objective form an $n \times n \times n$ array, which is usually too expensive in time and space to be tractable
- They are not (simply) expressible as matrices, so handling them would benefit from a little tensor theory
- Other reasons I don’t know about?
I have nothing to say about this now, but for my own reference, a starting keyword is Halley-Chebyshev methods, which seems to be what the 3rd-order methods are called.
John Cook has a nifty demo of how this works in the univariate case.
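As a reminder to myself, here is a minimal sketch of that univariate case, assuming we minimise by applying Halley's update to $f'$; the helper `halley_minimise` and the example objective $f(x) = x^4 + x$ are my own illustrative choices, not from Cook's post.

```python
# Minimal sketch: a 3rd-order ("Halley") step for univariate minimisation,
# i.e. Halley's root-finding iteration applied to f'(x) = 0.
# Derivatives are hand-coded for the toy objective f(x) = x**4 + x.

def halley_minimise(df, d2f, d3f, x, steps=20, tol=1e-12):
    """Iterate the Halley update on f' until the gradient is tiny."""
    for _ in range(steps):
        g, h, t = df(x), d2f(x), d3f(x)
        if abs(g) < tol:
            break
        # Halley step for a root of f': needs f'' and f''' (the extra "tensor" term)
        x = x - 2 * g * h / (2 * h * h - g * t)
    return x

# f(x) = x**4 + x is convex; its minimiser solves 4x**3 + 1 = 0.
df = lambda x: 4 * x**3 + 1
d2f = lambda x: 12 * x**2
d3f = lambda x: 24 * x

print(halley_minimise(df, d2f, d3f, x=1.0))  # approx -0.62996 = -(1/4)**(1/3)
```

Note that the update only finds a stationary point of $f$; for the convex example above that is the minimiser, and each step already needs $f'''$ on top of the usual Newton ingredients.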