#### Visualization of different optimization algorithms used in deep learning on a Rastrigin-type function

Click anywhere on the function heatmap to start a minimization. You can toggle the individual algorithms (SGD, Momentum, RMSProp, Adam) by clicking the circles in the lower bar.

The function used here is related to the [Rastrigin function](https://en.wikipedia.org/wiki/Rastrigin_function), which is essentially a convex quadratic with added cosines. It is a well-known test function for optimization algorithms.

*Note:* The learning rate is 1e-2 for Adam, SGD with Momentum, and RMSProp, while it is 2e-2 for plain SGD (to make it converge faster).

The algorithms are:

1. [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
2. [Momentum gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum)
3. [RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
4. [Adam](http://arxiv.org/abs/1412.6980)
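To make the four update rules concrete, here is a minimal sketch of all of them minimizing a 2-D Rastrigin function. This is not the visualization's actual code; the function names (`rastrigin`, `grad`, `minimize`), the step count, and the decay constants are illustrative assumptions, with the learning rates matching the note above.

```python
import math

def rastrigin(x, y, A=10.0):
    # 2-D Rastrigin: convex quadratic (x^2 + y^2) plus cosine ripples.
    # Global minimum is f(0, 0) = 0.
    return 2 * A + (x * x - A * math.cos(2 * math.pi * x)) \
                 + (y * y - A * math.cos(2 * math.pi * y))

def grad(x, y, A=10.0):
    # Analytic gradient of the 2-D Rastrigin function.
    g = lambda t: 2 * t + 2 * math.pi * A * math.sin(2 * math.pi * t)
    return (g(x), g(y))

def minimize(start, method="sgd", lr=1e-2, steps=500,
             mu=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
    # One loop covering all four update rules; all state starts at zero.
    p = list(start)
    vel = [0.0, 0.0]   # Momentum velocity buffer
    sq = [0.0, 0.0]    # RMSProp / Adam second-moment accumulator
    m = [0.0, 0.0]     # Adam first-moment accumulator
    for t in range(1, steps + 1):
        g = grad(*p)
        for i in range(2):
            if method == "sgd":
                p[i] -= lr * g[i]
            elif method == "momentum":
                vel[i] = mu * vel[i] - lr * g[i]
                p[i] += vel[i]
            elif method == "rmsprop":
                sq[i] = mu * sq[i] + (1 - mu) * g[i] ** 2
                p[i] -= lr * g[i] / (math.sqrt(sq[i]) + eps)
            elif method == "adam":
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                sq[i] = beta2 * sq[i] + (1 - beta2) * g[i] ** 2
                mhat = m[i] / (1 - beta1 ** t)    # bias-corrected moments
                vhat = sq[i] / (1 - beta2 ** t)
                p[i] -= lr * mhat / (math.sqrt(vhat) + eps)
    return tuple(p)
```

For example, `minimize((0.3, 0.3), "adam")` starts inside the central basin and walks toward the global minimum at the origin, while plain SGD would typically use the larger `lr=2e-2` mentioned in the note.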