In this work, we give a geometric interpretation to the Generative Adversarial Networks (GANs). The geometric view is based on the intrinsic relation between Optimal Mass Transportation (OMT) theory and convex geometry, and leads to a variational approach to solve the Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes.
By using the optimal transportation view of GAN model, we show that the discriminator computes the Wasserstein distance via the Kantorovich potential, the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential can give the optimal transportation map by a close-form formula. Therefore, it is sufficient to solely optimize the discriminator. This shows the adversarial competition can be avoided, and the computational architecture can be simplified.
Preliminary experimental results show the geometric method outperforms the traditional Wasserstein GAN for approximating probability measures with multiple clusters in low dimensional space.
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to such optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of million data points with O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
In this paper, we aim at developing scalable neural network-type learning systems. Motivated by the idea of constructive neural networks in approximation theory, we focus on constructing rather than training feed-forward neural networks (FNNs) for learning, and propose a novel FNNs learning system called the constructive FNN (CFN). Theoretically, we prove that the proposed method not only overcomes the classical saturation problem for constructive FNN approximation, but also reaches the optimal learning rate when the regression function is smooth, while the state-of-the-art learning rates established for traditional FNNs are only near optimal (up to a logarithmic factor). A series of numerical simulations are provided to show the efficiency and feasibility of CFN.
Personal videos often contain visual distractors, which are objects that are accidentally captured that can distract viewers from focusing on the main subjects. We propose a method to automatically detect and localize these distractors through learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we propose extracting features at the Temporal-Superpixel (TSP) level using a traditional SVM-based learning framework. We also experiment with end-to-end learning using Convolutional Neural Networks (CNNs), which achieves slightly higher performance than other methods. The classification result is further refined in a post-processing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve the video quality, including video hole filling; video frame replacement; and camera path re-planning. The user study results show that our method can significantly improve the aesthetic quality of videos.
We propose an effective framework for multi-phase image segmentation and semi-supervised data clustering by introducing a novel region force term into the Potts model. Assume the probability that a pixel or a data point belongs to each class is known a priori. We show that the corresponding indicator function obeys the Bernoulli distribution and the new region force function can be computed as the negative log-likelihood function under the Bernoulli distribution. We solve the Potts model by the primal-dual hybrid gradient method and the augmented Lagrangian method, which are based on two different dual problems of the same primal problem. Empirical evaluations of the Potts model with the new region force function on benchmark problems show that it is competitive with existing variational methods in both image segmentation and semi- supervised data clustering.
This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize.
The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates.
Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.
Finding a fixed point to a nonexpansive operator, i.e., x = Tx, abstracts many
problems in numerical linear algebra, optimization, and other areas of data sciences. To solve xed-
point problems, we propose ARock, an algorithmic framework in which multiple agents (machines,
processors, or cores) update x in an asynchronous parallel fashion. Asynchrony is crucial to parallel
computing since it reduces synchronization wait, relaxes communication bottleneck, and thus speeds
up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate
xi based on possibly out-of-date information on x. The agents share x through either global memory
or communication. If writing xi is atomic, the agents can read and write x without memory locks.
We prove that if the nonexpansive operator T has a fixed point, then with probability one, ARock
generates a sequence that converges to a fixed point of T. Our conditions on T and step sizes are
weaker than comparable work. Linear convergence is obtained under suitable assumptions.
We propose special cases of ARock for linear systems, convex optimization, machine learning, as
well as distributed and decentralized consensus problems. Numerical experiments of solving sparse
logistic regression problems are presented.
The stochastic gradient (SG) method can quickly solve a problem with a large number
of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The
block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with
multiple (blocks of) variables. This paper introduces a method that combines the great features of
SG and BCD for problems with many components in the objective and with multiple (blocks of)
variables. This paper proposes a block SG (BSG) method for both convex and nonconvex programs.
BSG generalizes SG by updating all the blocks of variables in the Gauss–Seidel type (updating the
current block depends on the previously updated block), in either a fixed or randomly shuffled order.
Although BSG has slightly more work at each iteration, it typically outperforms SG because of
BSG’s Gauss–Seidel updates and larger step sizes, the latter of which are determined by the smaller
per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex
cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex
case, its convergence is established in terms of the expected violation of a first-order optimality
condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no
longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and
logistic regression, which are convex, and low-rank tensor recovery and bilinear logistic regression,
which are nonconvex. On the convex problems, BSG performed significantly better than SG. On the
nonconvex problems, BSG significantly outperformed the deterministic BCD method because the
latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both SG
approximation and block coordinate updates and is especially useful for solving large-scale nonconvex