Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Penn collection
Statistics Papers
Degree type
Discipline
Subject
Computer Sciences
Statistics and Probability
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Rakhlin, Alexander
Shamir, Ohad
Sridharan, Karthik
Contributor
Abstract

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
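The modified averaging step referred to in the abstract is α-suffix averaging: instead of averaging all T iterates, average only the last αT of them. A minimal illustrative sketch (not the paper's experimental setup; the toy objective, noise model, and parameter names here are my own assumptions) on a one-dimensional λ-strongly convex problem:

```python
import random

def sgd_with_averaging(T=10000, lam=1.0, alpha=0.5, seed=0):
    """Run SGD on f(x) = (lam/2) * x^2 with noisy gradients and return
    both the full iterate average and the alpha-suffix average."""
    rng = random.Random(seed)
    x = 10.0  # arbitrary starting point; the optimum is x* = 0
    iterates = []
    for t in range(1, T + 1):
        g = lam * x + rng.gauss(0.0, 1.0)  # unbiased stochastic gradient
        x -= g / (lam * t)                 # standard step size 1/(lam*t)
        iterates.append(x)
    full_avg = sum(iterates) / T
    # alpha-suffix averaging: average only the last alpha*T iterates,
    # discarding the early, high-error ones.
    suffix = iterates[int((1 - alpha) * T):]
    suffix_avg = sum(suffix) / len(suffix)
    return full_avg, suffix_avg
```

On smooth problems like this toy quadratic both averages converge; the paper's point is that on non-smooth strongly convex problems the full average can suffer the Ω(log(T)/T) rate while the suffix average recovers O(1/T).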

Advisor
Date of presentation
2012-01-01
Conference name
Conference dates
Conference location
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Series name and number
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Recommended citation
Collection