Newton-type optimization uses 2nd-order gradient information (i.e. a Hessian matrix) to solve optimiztion problems. Higher order optimisation uses 3rd order gradients and so on. They are elegant for univariate functions.
This is rarely done in problems that I face because
- 3rd order derivatives of multivariate optimisations are usually too big in time and space complexity to be tractable
- They are not (simply) expressible as matrices so can benefit from a little tensor theory.
- Other reasons I don’t know about?
I have nothing to say about this now, but for my own reference, a starting keyword is Halley-Chebyshev methods which seems to be what the 3rd order methods are called.
John Cook has a nifty demo of how this works in the univariate case.
Gower, R. M., and A. L. Gower. 2016. “Higher-Order Reverse Automatic Differentiation with Emphasis on the Third-Order.” Mathematical Programming 155 (1-2): 81–103. https://doi.org/10.1007/s10107-014-0827-4.
Gutiérrez, J. M., and M. A. Hernández. 2001. “An Acceleration of Newton’s Method: Super-Halley Method.” Applied Mathematics and Computation 117 (2): 223–39. https://doi.org/10.1016/S0096-3003(99)00175-7.
Hernández, M. A., and M. A. Salanova. 1993. “A Family of Chebyshev-Halley Type Methods.” International Journal of Computer Mathematics 47 (1-2): 59–63. https://doi.org/10.1080/00207169308804162.