Abstract
We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square-summable. We study the trade-offs between the convergence rate and robust-ness to gradient errors when designing the parameters of a first-order algorithm. We focus on a general class of momentum methods (GMM) with constant stepsize and two momentum parameters which can recover gradient descent (GD), Nesterov's accelerated gradient (NAG), the heavy-ball (HB) and the triple momentum methods (TMM) as special cases. We measure the robustness of an algorithm in terms of the cumulative suboptimality over the iterations normalized by the squared 2 norm of the gradient errors. This quantity can be interpreted as the (squared) 2 gain of a dynamical system that represents the GMM iterations where the input is the gradient error sequence and the output is a weighted distance to the optimum. For quadratic objectives, we compute the 2 gain explicitly leveraging its representation as the H∞ norm of the GMM system in the frequency domain and construct gradient errors that lead to worst-case performance explicitly. We also study the stability of GMM with respect to multiplicative errors by characterizing the structured real and stability radius of the GMM system through their connections to the H∞ norm. This allows us to compare GD, HB, NAG methods in terms of robustness, and argue that HB is not as robust as NAG despite being the fastest in terms of the rate. We then develop the robustly stable heavy ball method that can be faster than NAG while being at the best robustness level possible. We also propose the robustly stable gradient descent that is the fastest version of GD with constant stepsize while being at the best robustness level. Finally, we extend our framework to general strongly convex smooth objectives, providing non-asymptotic rate results for inexact GMM methods and bounds on the 2 gain where we can choose the GMM parameters to systematically trade off the rate to robustness in a computationally tractable framework.