A Proof of the Convergence of Gradient Descent

In this post, I give a proof of the convergence of gradient descent for the class of convex functions with Lipschitz continuous gradient. Then, I show that the convergence rate of gradient descent can be improved by simply adding a momentum (extrapolation) step. The proofs in this post are adapted from the paper "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems" by Beck and Teboulle.
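Before the proofs, here is a minimal sketch of the two methods side by side. It is only an illustration under my own assumptions: the function names, the diagonal quadratic test problem, and the iteration count are hypothetical choices, and the step size 1/L and the momentum sequence t_k are the usual ones from the FISTA paper specialized to the smooth case. The final checks use the standard bounds L||x0 - x*||^2 / (2k) for gradient descent and 2L||x0 - x*||^2 / (k+1)^2 for the momentum variant.

```python
import numpy as np

def gradient_descent(grad, x0, L, iters):
    """Plain gradient descent with constant step size 1/L."""
    x = x0.copy()
    for _ in range(iters):
        x = x - grad(x) / L
    return x

def accelerated_gradient_descent(grad, x0, L, iters):
    """Gradient descent plus a momentum (extrapolation) step."""
    x = x0.copy()
    y = x0.copy()   # extrapolated point where the gradient is evaluated
    t = 1.0         # momentum sequence, t_1 = 1
    for _ in range(iters):
        x_next = y - grad(y) / L                          # gradient step from y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # extrapolation
        x, t = x_next, t_next
    return x

# Hypothetical test problem: f(x) = 0.5 * sum(eigs * x^2), a convex quadratic
# whose gradient is Lipschitz with constant L = max(eigs). The minimizer is
# x* = 0 with f(x*) = 0.
eigs = np.linspace(1e-3, 1.0, 50)
f = lambda x: 0.5 * np.sum(eigs * x**2)
grad = lambda x: eigs * x
L = eigs.max()
x0 = np.ones(50)
k = 200

f_gd = f(gradient_descent(grad, x0, L, k))
f_agd = f(accelerated_gradient_descent(grad, x0, L, k))

# Worst-case bounds on f(x_k) - f(x*), using x* = 0 and f(x*) = 0.
bound_gd = L * np.dot(x0, x0) / (2 * k)
bound_agd = 2 * L * np.dot(x0, x0) / (k + 1) ** 2

print(f"gradient descent: f = {f_gd:.3e}, bound = {bound_gd:.3e}")
print(f"with momentum:    f = {f_agd:.3e}, bound = {bound_agd:.3e}")
```

The only difference between the two loops is where the gradient is evaluated: the momentum variant takes its gradient step from the extrapolated point y rather than from the current iterate x.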

