Gradient descent, Higher order

Newton-type optimization uses 2nd-order gradient information (i.e. a Hessian matrix) to solve optimiztion problems. Higher order optimisation uses 3rd order gradients and so on. They are elegant for univariate functions.

This is rarely done in problems that I face because

  1. 3rd order derivatives of multivariate optimisations are usually too big in time and space complexity to be tractable
  2. They are not (simply) expressible as matrices so can benefit from a little tensor theory.
  3. Other reasons I don’t know about?

I have nothing to say about this now, but for my own reference, a starting keyword is Halley-Chebyshev methods which seems to be what the 3rd order methods are called.

John Cook has a nifty demo of how this works in the univariate case.

Gower, R. M., and A. L. Gower. 2016. “Higher-Order Reverse Automatic Differentiation with Emphasis on the Third-Order.” Mathematical Programming 155 (1-2): 81–103. https://doi.org/10.1007/s10107-014-0827-4.

Gutiérrez, J. M., and M. A. Hernández. 2001. “An Acceleration of Newton’s Method: Super-Halley Method.” Applied Mathematics and Computation 117 (2): 223–39. https://doi.org/10.1016/S0096-3003(99)00175-7.

Hernández, M. A., and M. A. Salanova. 1993. “A Family of Chebyshev-Halley Type Methods.” International Journal of Computer Mathematics 47 (1-2): 59–63. https://doi.org/10.1080/00207169308804162.