Gradient descent, Higher order

October 26, 2019 — October 26, 2019

functional analysis
neural nets

Newton-type optimization uses 2nd-order gradient information (i.e. a Hessian matrix) to solve optimiztion problems. Higher order optimisation uses 3rd order gradients and so on. They are elegant for univariate functions.

This is rarely done in problems that I face because

  1. 3rd order derivatives of multivariate optimisations are usually too big in time and space complexity to be tractable
  2. They are not (simply) expressible as matrices so can benefit from a little tensor theory.
  3. Other reasons I don’t know about?

I have nothing to say about this now, but for my own reference, a starting keyword is Halley-Chebyshev methods which seems to be what the 3rd order methods are called.

John Cook has a nifty demo of how this works in the univariate case.

1 References

Gower, and Gower. 2016. Higher-Order Reverse Automatic Differentiation with Emphasis on the Third-Order.” Mathematical Programming.
Gutiérrez, and Hernández. 2001. An Acceleration of Newton’s Method: Super-Halley Method.” Applied Mathematics and Computation.
Hernández, and Salanova. 1993. A Family of Chebyshev-Halley Type Methods.” International Journal of Computer Mathematics.