Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Rakhlin, Alexander; Shamir, Ohad; Sridharan, Karthik

Penn collection

Statistics Papers

Subject

Computer Sciences
Statistics and Probability

Permalink

https://repository.upenn.edu/handle/20.500.14332/47452

Author

Rakhlin, Alexander

Shamir, Ohad

Sridharan, Karthik

Abstract

Stochastic gradient descent (SGD) is a simple and popular method for solving the stochastic optimization problems that arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), obtained by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
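
The contrast between the two averaging schemes can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' code: the toy objective F(w) = (lam/2) w^2 + |w| on [-1, 1], the Gaussian noise model, the step size 1/(lam * t), and the function names sgd_iterates, full_average and suffix_average are all assumptions made for this example. The modified averaging step is taken here to be an average over only the last fraction alpha of the iterates (suffix averaging); the abstract itself does not spell out the modification.

```python
import math
import random


def sgd_iterates(T, lam=1.0, radius=1.0, noise=1.0, seed=0):
    """Projected stochastic subgradient descent on the toy objective
    F(w) = (lam / 2) * w**2 + |w| over [-radius, radius] (minimum at w = 0).
    Returns the list of iterates w_1, ..., w_T. All settings are illustrative."""
    rng = random.Random(seed)
    w = radius  # start at the boundary of the domain
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)
        # Subgradient of F at w (using 0 for |w| at w = 0) plus zero-mean noise.
        sign = (w > 0) - (w < 0)
        g = lam * w + sign + rng.gauss(0.0, noise)
        w = w - g / (lam * t)             # assumed step size 1 / (lam * t)
        w = max(-radius, min(radius, w))  # project back onto the domain
    return iterates


def full_average(iterates):
    """Standard averaging: the mean of all T iterates."""
    return sum(iterates) / len(iterates)


def suffix_average(iterates, alpha=0.5):
    """Modified averaging: the mean of only the last ceil(alpha * T) iterates."""
    k = max(1, math.ceil(alpha * len(iterates)))
    tail = iterates[-k:]
    return sum(tail) / len(tail)


def objective(w, lam=1.0):
    """The toy non-smooth, strongly convex objective F(w)."""
    return 0.5 * lam * w * w + abs(w)


if __name__ == "__main__":
    ws = sgd_iterates(T=10_000)
    print("full average   F =", objective(full_average(ws)))
    print("suffix average F =", objective(suffix_average(ws, alpha=0.5)))
```

Only the final averaging step differs between the two estimates; the SGD loop itself is unchanged, which matches the claim that no other change to the algorithm is necessary. A single run on a toy problem like this one is only an illustration and does not by itself demonstrate either rate.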

Date of presentation

2012-01-01

Conference name

Statistics Papers

Conference dates

2023-05-17T15:30:08.000

Collection

Presentations