This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, each of which updates one variable, or a small block of variables, while fixing the others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize.
The strong performance of coordinate update methods depends on the subproblems being simple to solve. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates.
Based on the discovered coordinate-friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, and sub-areas of optimization. Several problems are treated with coordinate update for the first time. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
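As an illustration of what a low-cost coordinate update can look like, the following minimal sketch applies cyclic coordinate descent to a least-squares problem, minimizing 0.5 * ||Ax - b||^2. The problem choice, the function name coordinate_descent_least_squares, and the parameter num_epochs are assumptions made for this example and are not taken from the paper. The point it shows is that caching the residual Ax - b lets each coordinate update cost only one column of work.

```python
import numpy as np


def coordinate_descent_least_squares(A, b, num_epochs=200):
    """Cyclic coordinate descent for min_x 0.5 * ||A x - b||^2 (illustrative sketch)."""
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                        # cached residual, kept equal to A x - b
    col_sq_norms = (A ** 2).sum(axis=0)  # squared norm of each column of A
    for _ in range(num_epochs):
        for i in range(n):
            if col_sq_norms[i] == 0.0:
                continue                 # a zero column does not affect the objective
            # Exact minimization over x[i] with all other coordinates fixed.
            step = (A[:, i] @ r) / col_sq_norms[i]
            x[i] -= step
            # Refresh the residual in O(m) instead of recomputing A @ x.
            r -= step * A[:, i]
    return x
```

Each inner step touches a single column of A, so its cost is roughly 1/n of a full gradient evaluation.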
Finding a fixed point of a nonexpansive operator, i.e., x = Tx, abstracts many
problems in numerical linear algebra, optimization, and other areas of data science. To solve
fixed-point problems, we propose ARock, an algorithmic framework in which multiple agents (machines,
processors, or cores) update x in an asynchronous parallel fashion. Asynchrony is crucial to parallel
computing since it reduces synchronization waits, relaxes communication bottlenecks, and thus speeds
up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate
x_i based on possibly out-of-date information on x. The agents share x through either global memory
or communication. If writing x_i is atomic, the agents can read and write x without memory locks.
We prove that if the nonexpansive operator T has a fixed point, then with probability one, ARock
generates a sequence that converges to a fixed point of T. Our conditions on T and step sizes are
weaker than those in comparable work. Linear convergence is obtained under suitable assumptions.
We propose special cases of ARock for linear systems, convex optimization, machine learning, as
well as distributed and decentralized consensus problems. Numerical experiments on solving sparse
logistic regression problems are presented.
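The asynchronous coordinate update described above can be imitated in a single thread. The sketch below is a serial simulation, not the ARock implementation: it assumes a relaxed update of the form x_i <- x_i - eta * (x_hat - T x_hat)_i, where x_hat is a stale copy of x, and it models staleness by reading a whole earlier iterate (real asynchronous reads may mix coordinates of different ages). The names arock_simulation, apply_T, step_size, and max_delay, and the default values, are hypothetical choices for this example.

```python
import random
from collections import deque

import numpy as np


def arock_simulation(apply_T, x0, step_size=0.1, max_delay=3,
                     num_updates=20000, seed=0):
    """Serially simulate stale, randomized coordinate updates for x = T x."""
    rng = random.Random(seed)
    x = np.array(x0, dtype=float)
    # Keep the most recent iterates so that reads can be out of date.
    history = deque([x.copy()], maxlen=max_delay + 1)
    for _ in range(num_updates):
        # Simulated agent reads an iterate that is 0..max_delay updates old.
        delay = rng.randint(0, len(history) - 1)
        x_hat = history[-1 - delay]
        # Pick one coordinate uniformly at random.
        i = rng.randrange(x.size)
        # Coordinate i of (I - T) x_hat. For clarity the full T x_hat is formed;
        # an efficient implementation would compute only coordinate i.
        residual_i = x_hat[i] - apply_T(x_hat)[i]
        x[i] -= step_size * residual_i
        history.append(x.copy())
    return x
```

For example, for a symmetric positive definite matrix A, the gradient-step operator T x = x - gamma * (A x - b) with gamma = 1 / ||A||_2 is nonexpansive and its fixed points solve A x = b, so passing it as apply_T turns the simulation into a solver for that linear system. The step size here is simply chosen conservatively and is not derived from the paper's conditions.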
The stochastic gradient (SG) method can quickly solve a problem with a large number
of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The
block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with
multiple (blocks of) variables. This paper introduces a block SG (BSG) method that combines the
strengths of SG and BCD for problems with many components in the objective and with multiple
(blocks of) variables. The method applies to both convex and nonconvex programs.
BSG generalizes SG by updating all the blocks of variables in a Gauss–Seidel manner (the update of
the current block depends on the previously updated blocks), in either a fixed or randomly shuffled order.
Although BSG does slightly more work at each iteration, it typically outperforms SG because of
BSG’s Gauss–Seidel updates and larger step sizes, the latter of which are determined by the smaller
per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex
cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex
case, its convergence is established in terms of the expected violation of a first-order optimality
condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no
longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and
logistic regression, which are convex, and low-rank tensor recovery and bilinear logistic regression,
which are nonconvex. On the convex problems, BSG performed significantly better than SG. On the
nonconvex problems, BSG significantly outperformed the deterministic BCD method because the
latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both SG
approximation and block coordinate updates and is especially useful for solving large-scale nonconvex