#### Visualization of different optimization algorithms used in deep learning on a Rastrigin-type function

Click anywhere on the function heatmap to start a minimization. You can toggle the individual algorithms (SGD, Momentum, RMSProp, Adam) by clicking the circles in the lower bar.

The function used here is related to the [Rastrigin function](https://en.wikipedia.org/wiki/Rastrigin_function), which is essentially a convex quadratic with added cosines. It is a well-known test function for optimization algorithms.

*Note:* The learning rate is 1e-2 for Adam, SGD with Momentum, and RMSProp, while it is 2e-2 for plain SGD (to make it converge faster).

The algorithms are:

1. [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
2. [Momentum gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum)
3. [RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
4. [Adam](http://arxiv.org/abs/1412.6980)
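To make the four update rules concrete, here is a minimal sketch of all of them minimizing a 2-D Rastrigin function. This is not the visualization's actual code; the function names (`rastrigin`, `grad`, `minimize`), the step count, and the decay constants are illustrative assumptions, with the learning rates matching the note above.

```python
import math

def rastrigin(x, y, A=10.0):
    # 2-D Rastrigin: convex quadratic (x^2 + y^2) plus cosine ripples.
    # Global minimum is f(0, 0) = 0.
    return 2 * A + (x * x - A * math.cos(2 * math.pi * x)) \
                 + (y * y - A * math.cos(2 * math.pi * y))

def grad(x, y, A=10.0):
    # Analytic gradient of the 2-D Rastrigin function.
    g = lambda t: 2 * t + 2 * math.pi * A * math.sin(2 * math.pi * t)
    return (g(x), g(y))

def minimize(start, method="sgd", lr=1e-2, steps=500,
             mu=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
    # One loop covering all four update rules; all state starts at zero.
    p = list(start)
    vel = [0.0, 0.0]   # Momentum velocity buffer
    sq = [0.0, 0.0]    # RMSProp / Adam second-moment accumulator
    m = [0.0, 0.0]     # Adam first-moment accumulator
    for t in range(1, steps + 1):
        g = grad(*p)
        for i in range(2):
            if method == "sgd":
                p[i] -= lr * g[i]
            elif method == "momentum":
                vel[i] = mu * vel[i] - lr * g[i]
                p[i] += vel[i]
            elif method == "rmsprop":
                sq[i] = mu * sq[i] + (1 - mu) * g[i] ** 2
                p[i] -= lr * g[i] / (math.sqrt(sq[i]) + eps)
            elif method == "adam":
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                sq[i] = beta2 * sq[i] + (1 - beta2) * g[i] ** 2
                mhat = m[i] / (1 - beta1 ** t)    # bias-corrected moments
                vhat = sq[i] / (1 - beta2 ** t)
                p[i] -= lr * mhat / (math.sqrt(vhat) + eps)
    return tuple(p)
```

For example, `minimize((0.3, 0.3), "adam")` starts inside the central basin and walks toward the global minimum at the origin, while plain SGD would typically use the larger `lr=2e-2` mentioned in the note.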