Penghang Yin (University of California at Los Angeles), Shuai Zhang (University of California at Irvine), Jiancheng Lyu (University of California at Irvine), Stanley Osher (University of California at Los Angeles), Yingyong Qi (University of California at Irvine), Jack Xin (University of California at Irvine)
We propose BinaryRelax, a simple two-phase algorithm for training deep neural networks with quantized weights. The set constraint that characterizes the quantization of weights is not imposed until the late stage of training, and a sequence of pseudo-quantized weights is maintained. Specifically, we relax the hard constraint into a continuous regularizer via the Moreau envelope, which turns out to be the squared Euclidean distance to the set of quantized weights. The pseudo-quantized weights are obtained by linearly interpolating between the float weights and their quantizations. A continuation strategy is adopted to push the weights towards the quantized state by gradually increasing the regularization parameter. In the second phase, an exact quantization scheme with a small learning rate is invoked to guarantee fully quantized weights. We test BinaryRelax on the benchmark CIFAR-10 and CIFAR-100 color image datasets to demonstrate the superiority of the relaxed quantization approach and its improved accuracy over state-of-the-art training methods. Finally, we prove the convergence of BinaryRelax under an approximate orthogonality condition.
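To make the interpolation concrete, here is a minimal NumPy sketch of the two-phase scheme on a toy least-squares problem. The pseudo-quantized weights are the proximal point of the Moreau-envelope regularizer (lam/2)·dist(w, Q)², i.e. a convex combination of the float weights and their projection onto the quantized set. The binary projection with scale mean(|w|), the continuation factor, and the learning rates are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_quantize(w):
    """Project w onto the binary set {-s, +s}; s = mean(|w|) is the
    scale minimizing the Euclidean distance to w."""
    s = np.mean(np.abs(w))
    return s * np.sign(w)

def pseudo_quantize(w, lam):
    """Pseudo-quantized weights: proximal point of (lam/2)*dist(., Q)^2
    at w, i.e. a linear interpolation between w and its quantization."""
    return (w + lam * binary_quantize(w)) / (1.0 + lam)

# Toy problem: fit quantized weights to a least-squares target.
A = rng.standard_normal((128, 32))
b = A @ np.sign(rng.standard_normal(32))

w = rng.standard_normal(32)          # float weights kept throughout phase 1
lam, rho, lr = 0.1, 1.02, 1e-3       # continuation schedule (assumed values)

for step in range(500):              # phase 1: relaxed quantization
    w_q = pseudo_quantize(w, lam)
    grad = A.T @ (A @ w_q - b)       # gradient evaluated at pseudo weights
    w -= lr * grad
    lam *= rho                       # push weights toward the quantized state

for step in range(100):              # phase 2: exact quantization, small lr
    w_q = binary_quantize(w)
    grad = A.T @ (A @ w_q - b)
    w -= 0.1 * lr * grad

print("final quantized weights:", binary_quantize(w)[:8])
```

As lam grows, the pseudo weights (w + lam·q)/(1 + lam) slide from the float weights toward their quantization q, which is the continuation strategy the abstract describes.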
Bao Wang (Department of Mathematics, UCLA), Penghang Yin (Department of Mathematics, UCLA), Andrea L. Bertozzi (Department of Mathematics, UCLA), P. Jeffrey Brantingham (Department of Anthropology, UCLA), Stanley J. Osher (Department of Mathematics, UCLA), Jack Xin (Department of Mathematics, UCLA & UCI)
Real-time crime forecasting is important, yet accurate prediction of when and where the next crime will occur is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time, and the signal of interest is weak. In this work, we first present a proper representation of crime data. We then adapt the spatial-temporal residual network to the well-represented data to predict the distribution of crime in Los Angeles at the scale of hours in neighborhood-sized parcels. These experiments, as well as comparisons with several existing prediction approaches, demonstrate the superiority of the proposed model in terms of accuracy. Finally, we present a ternarization technique to address the resource consumption issue for deployment in the real world. This work is an extension of our short conference proceedings paper [Wang et al., arXiv:1707.03340].
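As a rough illustration of the kind of representation involved, the sketch below bins time-stamped, geo-coded events into an hours-by-grid count tensor of the sort a spatial-temporal network consumes. The grid size, time bin, and coordinate ranges are illustrative assumptions, not the paper's actual preprocessing.

```python
import numpy as np

def events_to_tensor(times_h, xs, ys, n_hours, grid=(16, 16),
                     x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Bin time-stamped, geo-coded events into a (n_hours, H, W) count
    tensor: one spatial crime-intensity map per hour."""
    H, W = grid
    T = np.zeros((n_hours, H, W), dtype=np.float32)
    ix = np.clip(((np.asarray(xs) - x_range[0]) /
                  (x_range[1] - x_range[0]) * W).astype(int), 0, W - 1)
    iy = np.clip(((np.asarray(ys) - y_range[0]) /
                  (y_range[1] - y_range[0]) * H).astype(int), 0, H - 1)
    it = np.clip(np.asarray(times_h).astype(int), 0, n_hours - 1)
    np.add.at(T, (it, iy, ix), 1.0)   # accumulate event counts per cell
    return T

# Toy usage: 1000 random events over a 24-hour window.
rng = np.random.default_rng(0)
T = events_to_tensor(rng.uniform(0, 24, 1000),
                     rng.uniform(0, 1, 1000),
                     rng.uniform(0, 1, 1000), n_hours=24)
print(T.shape, T.sum())   # (24, 16, 16) 1000.0
```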
We present LBW-Net, an efficient optimization-based method for the quantization and training of low bit-width convolutional neural networks (CNNs). Specifically, we quantize the weights to zero or powers of two by minimizing the Euclidean distance between full-precision weights and quantized weights during backpropagation. We characterize the combinatorial nature of the low bit-width quantization problem. For 2-bit (ternary) CNNs, the quantization of N weights can be done by an exact formula in O(N log N) complexity. When the bit-width is three or above, we further propose a semi-analytical thresholding scheme with a single free parameter for quantization that is computationally inexpensive. The free parameter is determined by network retraining and object detection tests. LBW-Net has several desirable advantages over full-precision CNNs, including considerable memory savings, energy efficiency, and faster deployment. Our experiments on the PASCAL VOC dataset show that, compared with its 32-bit floating-point counterpart, the 6-bit LBW-Net is nearly lossless in object detection tasks and can even do better in some real-world visual scenes, while empirically enjoying more than 4× faster deployment.
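A minimal sketch of a quantizer of the kind described above, assuming a simple thresholding rule: weights below a threshold map to zero, and the rest map to the nearest power of two (in log scale, with matching sign) within the exponent range the bit-width allows. The zero threshold used here is an illustrative placeholder, not the paper's semi-analytical formula.

```python
import numpy as np

def pow2_quantize(w, bits=6):
    """Quantize weights to {0, +/-2^k}: zero below a threshold, otherwise
    the nearest power of two, with exponents limited by the bit-width."""
    s = np.max(np.abs(w)) + 1e-12           # top of the exponent range
    e_max = int(np.floor(np.log2(s)))
    n_levels = 2 ** (bits - 1) - 1          # magnitudes available per sign
    e_min = e_max - (n_levels - 1)
    t = 2.0 ** e_min / 2                    # illustrative zero threshold
    q = np.zeros_like(w)
    nz = np.abs(w) >= t
    e = np.clip(np.round(np.log2(np.abs(w[nz]))), e_min, e_max)
    q[nz] = np.sign(w[nz]) * 2.0 ** e
    return q

w = np.random.default_rng(1).standard_normal(10)
print(np.round(w, 3))
print(pow2_quantize(w, bits=6))
```

Restricting weights to zero or powers of two lets multiplications be replaced by bit shifts, which is the source of the memory and deployment-speed advantages the abstract cites.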
We study the minimization problem of a non-convex sparsity-promoting penalty function, the transformed l1 (TL1), and its application in compressed sensing (CS). The TL1 penalty interpolates the l0 and l1 norms through a nonnegative parameter a ∈ (0, +∞), similar to lp with p ∈ (0, 1], and is known to satisfy unbiasedness, sparsity, and Lipschitz continuity properties. We first consider the constrained minimization problem and discuss the exact recovery of the l0-norm minimal solution based on the null space property (NSP). We then prove the stable recovery of the l0-norm minimal solution if the sensing matrix A satisfies a restricted isometry property (RIP). We formulate a normalized problem to overcome the lack of a scaling property of the TL1 penalty function. For a general sensing matrix A, we show that the support set of a local minimizer corresponds to linearly independent columns of A. Next, we present difference-of-convex algorithms for TL1 (DCATL1) for computing TL1-regularized constrained and unconstrained problems in CS. The DCATL1 algorithm involves outer and inner iteration loops, a one-time matrix inversion, and repeated shrinkage operations and matrix-vector multiplications. The inner loop solves an l1 minimization problem, for which we employ the Alternating Direction Method of Multipliers (ADMM). For the unconstrained problem, we prove convergence of DCATL1 to a stationary point satisfying the first-order optimality condition. In numerical experiments, we identify the optimal value a = 1 and compare DCATL1 with other CS algorithms on two classes of sensing matrices: Gaussian random matrices and over-sampled discrete cosine transform (DCT) matrices. Among existing algorithms, the iteratively reweighted least squares method based on the l1/2 norm is the best in sparse recovery for Gaussian matrices, and the DCA algorithm based on the l1 minus l2 penalty is the best for over-sampled DCT matrices. We find that for both classes of sensing matrices, the DCATL1 algorithm (initialized with l1 minimization) always ranks near the top (if not at the top) and is the most robust choice, insensitive to the conditioning of the sensing matrix A. DCATL1 is also competitive with DCA applied to other non-convex penalty functions with two hyperparameters that are commonly used in statistics.
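For reference, the interpolation the abstract describes can be made explicit. In the standard form used in the TL1 literature, the penalty acts coordinate-wise:

```latex
\rho_a(t) = \frac{(a+1)\,|t|}{a + |t|}, \qquad
P_a(x) = \sum_{i=1}^{n} \rho_a(x_i), \qquad a \in (0, +\infty),
```

with the limiting behavior

```latex
\lim_{a \to 0^+} \rho_a(t) = \mathbf{1}_{\{t \neq 0\}}, \qquad
\lim_{a \to +\infty} \rho_a(t) = |t|,
```

so that P_a tends to the l0 norm as a → 0+ and to the l1 norm as a → +∞, which is the interpolation property referred to above.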
We study a non-convex low-rank promoting penalty function, the transformed Schatten-1 (TS1), and its applications in matrix completion. The TS1 penalty, a matrix quasi-norm defined on the singular values, interpolates the rank and the nuclear norm through a nonnegative parameter a ∈ (0, +∞). We consider the unconstrained TS1-regularized low-rank matrix recovery problem and develop a fixed-point representation for its global minimizer. The TS1 thresholding functions are in closed analytical form for all parameter values. The TS1 threshold values differ in the subcritical (supercritical) parameter regime, where the TS1 threshold functions are continuous (discontinuous). We propose TS1 iterative thresholding algorithms and compare them with some state-of-the-art algorithms on matrix completion test problems. For problems with known rank, a fully adaptive TS1 iterative thresholding algorithm consistently performs the best under different conditions, where the ground truth matrices are generated by multivariate Gaussian, (0, 1) uniform, and Chi-square distributions. For problems with unknown rank, TS1 algorithms with an additional rank estimation procedure approach the level of IRucL-q, an iteratively reweighted algorithm that is non-convex in nature and the best in performance.
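To show the shape of such an iterative thresholding scheme for matrix completion, here is a minimal NumPy sketch: each step fills in the observed residual and then applies a scalar thresholding function to the singular values. Soft-thresholding (the nuclear-norm proximal map) is used as a stand-in for the closed-form TS1 thresholding function from the paper, and the fixed λ and unit step size are illustrative assumptions.

```python
import numpy as np

def sv_threshold(Y, thresh_fn, lam):
    """Apply a scalar thresholding function to the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(thresh_fn(s, lam)) @ Vt

def soft(s, lam):
    """Soft-thresholding: stand-in for the closed-form TS1 threshold map."""
    return np.maximum(s - lam, 0.0)

def complete(M, mask, lam=0.5, iters=300):
    """Iterative singular-value thresholding for matrix completion:
    X <- threshold(X + P_Omega(M - X))."""
    X = np.zeros_like(M)
    for _ in range(iters):
        X = sv_threshold(X + mask * (M - X), soft, lam)
    return X

# Toy usage: recover a rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(M.shape) < 0.6
X = complete(M, mask)
print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```

Swapping `soft` for the TS1 threshold function (continuous in the subcritical regime, discontinuous in the supercritical one) yields the TS1 iterative thresholding scheme the abstract studies.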