Mechanical Engineering Seminar
Tuesday, March 13, 2018
10:15 a.m., 3540 Engineering Building
Refreshments Served at 10:00 a.m.
Distributed Asynchronous Stochastic Optimization with Unbounded Delays: How Slow Can You Go?
Zhengyuan Zhou, PhD candidate in the Department of Electrical Engineering, Stanford University
Abstract: One of the most widely used optimization methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD). However, a key issue therein is that of delayed gradients: by the time a computing node asynchronously contributes a local gradient to the global model, the global model parameters may already have changed, rendering this information stale. In massively parallel computing grids, these delays can quickly add up when a node's computational throughput is saturated, so the convergence of DASGD is uncertain under these conditions.
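The staleness issue can be illustrated with a minimal toy simulation (this is an illustrative sketch, not the authors' algorithm or experimental setup): SGD on a one-dimensional quadratic where each applied gradient is computed at an older iterate, with delays allowed to grow polynomially in the iteration count.

```python
import random

def stale_sgd(steps=20000, delay_exponent=0.3, seed=0):
    """SGD on f(x) = x^2 using gradients evaluated at stale iterates.

    The delay at step t is drawn uniformly from {0, ..., floor(t^0.3)},
    so delays grow without bound, but only polynomially in t.
    """
    rng = random.Random(seed)
    history = [10.0]  # iterate history; x_0 = 10, optimum is x* = 0
    for t in range(1, steps + 1):
        # pick a stale iterate: up to t^0.3 steps behind the current one
        delay = rng.randint(0, int(t ** delay_exponent))
        stale_x = history[max(0, len(history) - 1 - delay)]
        grad = 2.0 * stale_x + rng.gauss(0.0, 0.1)  # noisy gradient of x^2
        step_size = 1.0 / t                          # diminishing step size
        history.append(history[-1] - step_size * grad)
    return history[-1]

print(abs(stale_sgd()))  # small: the iterates still approach x* = 0
```

With a diminishing step size, the iterates still settle near the optimum despite the growing delays, which is the qualitative phenomenon the talk's convergence results make precise.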
In this work, we consider two commonly adopted distributed computing architectures (the master-slave architecture and the multi-processor architecture with shared memory) and show that, perhaps surprisingly, for a broad class of optimization objectives (strictly including convex, pseudo-convex, and star-convex problems), DASGD converges almost surely to global optimal solutions even when the delays grow unbounded at a polynomial rate. In this way, our results help clarify and reaffirm the recent empirical success of applying DASGD in large-scale learning applications.
This is joint work with Nick Bambos, Peter Glynn, Fei-Fei Li, Jia Li, Panayotis Mertikopoulos, and Yinyu Ye.
Bio: Zhengyuan Zhou is a fifth-year PhD candidate in Electrical Engineering at Stanford, with master's degrees in Statistics and in Computer Science. He received a B.E. in Electrical Engineering and Computer Sciences and a B.A. in Mathematics, both from UC Berkeley. His research interests include machine learning, optimization, control, game theory, and applied probability.