Minimization of functions that are a sum of a smooth part and a convex but possibly nonsmooth part that promotes regularization is prevalent in many application areas, and proximal-type methods are the most effective algorithms for such regularized optimization problems. In this talk, I will present our recent advancements on the convergence analysis of first-order and second-order proximal-type methods for regularized optimization in [1, 2] and their applications in distributed machine learning in . I will first show that the long-known O(1/k) convergence rate on convex problems of gradient descent and coordinate descent in the function value is not tight and can be improved to o(1/k), and extend this result to proximal gradient and proximal coordinate descent for regularized optimization. The second part of this talk will focus on a general framework of inexact successive quadratic approximation that has second-order proximal methods including proximal Newton and proximal quasi-Newton as special cases. Global convergence rates for strongly convex, convex, and nonconvex problems will be discussed.
Finally, an application of the inexact successive quadratic approximation in distributed machine learning will be discussed.