# Gradient descent, Higher order

October 26, 2019 — October 26, 2019

functional analysis

neural nets

optimization

statmech

Newton-type optimization uses 2nd-order gradient information (i.e. a Hessian matrix) to solve optimization problems. Higher order optimization uses 3rd order gradients and so on. They are elegant for univariate functions.

This is rarely done in problems that I face because

- 3rd order derivatives of multivariate optimizations are usually too big in time and space complexity to be tractable
- They are not (simply) expressible as matrices so can benefit from a little tensor theory.
- Other reasons I don’t know about?

I have nothing to say about this now, but for my own reference, a starting keyword is *Halley-Chebyshev methods* which seems to be what the 3rd order methods are called.

John Cook has a nifty demo of how this works in the univariate case.

## 1 References

Gower, and Gower. 2016. “Higher-Order Reverse Automatic Differentiation with Emphasis on the Third-Order.”

*Mathematical Programming*.
Gutiérrez, and Hernández. 2001. “An Acceleration of Newton’s Method: Super-Halley Method.”

*Applied Mathematics and Computation*.
Hernández, and Salanova. 1993. “A Family of Chebyshev-Halley Type Methods.”

*International Journal of Computer Mathematics*.