This package implements classic numerical optimization methods for training Artificial Neural Networks.
These methods are not very common in deep learning frameworks because of their computational requirements: Newton-Raphson and Levenberg-Marquardt, for example, need a large amount of memory since they use second-derivative (Hessian) information about the loss function. For this reason, it is recommended to apply these algorithms only to Neural Networks with few hidden layers.
There are also a couple of methods that do not require as much memory, such as SGD with line search and the Conjugate Gradient method.
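To get a sense of the memory cost, a dense Hessian has one entry per pair of parameters, so its storage grows quadratically with the parameter count. A minimal sketch (the model architecture here is only an illustrative assumption):

```python
import torch.nn as nn

# Rough memory estimate for storing a dense Hessian of a small MLP.
# The Hessian has n_params x n_params entries, so memory grows
# quadratically with the number of parameters.
model = nn.Sequential(nn.Linear(784, 64), nn.Tanh(), nn.Linear(64, 10))
n_params = sum(p.numel() for p in model.parameters())
hessian_bytes = n_params ** 2 * 4  # float32 entries
print(f"{n_params} parameters -> dense Hessian of ~{hessian_bytes / 1e9:.1f} GB")
```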
Reference: Numerical Optimization, Jorge Nocedal and Stephen J. Wright.
Note: Approximate Greatest Descent is not interesting enough to be included; the method shares an author with the review paper that covers it, which makes its inclusion in that review seem biased. The method can be replicated by applying damping to the Hessian in Newton's method, together with a trust-region strategy to choose the step.
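As a rough illustration of that idea, a damped Newton step solves (H + λI) p = −g. The sketch below is a generic example, not this package's implementation; the function name and damping value are assumptions:

```python
import torch

def damped_newton_step(grad, hessian, damping=1e-2):
    # Solve (H + damping * I) p = -g for the update direction p.
    # The damping term keeps the system well conditioned / positive definite;
    # a trust-region scheme would additionally adapt damping between iterations.
    n = grad.numel()
    identity = torch.eye(n, dtype=hessian.dtype, device=hessian.device)
    return torch.linalg.solve(hessian + damping * identity, -grad)
```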
- Newton-Raphson
- Gauss-Newton
- Levenberg-Marquardt (LM)
- Stochastic Gradient Descent with Line Search (see the line-search sketch after this list)
- Conjugate Gradient
- AdaHessian
- Quasi-Newton (LBFGS already in pytorch)
- Hessian-free / truncated Newton
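For instance, the SGD-with-line-search entry can be paired with a simple Armijo backtracking rule on a flattened parameter vector. The sketch below is a generic illustration under that assumption, not this package's API:

```python
import torch

def armijo_line_search(f, x, grad, direction, alpha=1.0, rho=0.5, c=1e-4, max_iter=20):
    # Backtracking (Armijo) line search: shrink the step size alpha until
    # f(x + alpha * d) decreases by at least c * alpha * g^T d.
    # f is a callable returning the scalar loss at a flat parameter vector x;
    # direction should be a descent direction (e.g. -grad for plain SGD).
    f0 = f(x)
    gtd = torch.dot(grad, direction)  # negative for a descent direction
    for _ in range(max_iter):
        if f(x + alpha * direction) <= f0 + c * alpha * gtd:
            break
        alpha *= rho  # step rejected: shrink and try again
    return alpha
```

The returned step size would then be applied to the parameters along the chosen direction before the next gradient evaluation.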
If you feel an algorithm is missing, you can open an issue with the name of the algorithm, some references, and a justification for why you think it should be included in the package.