We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapsing of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections, the network will, depending on the nonlinearity, adopt subexponential forward and backward dynamics, which in many cases are in fact polynomial. The exponents of these polynomials are obtained through analytic methods and verified empirically to be correct. In terms of the "edge of chaos" hypothesis, these subexponential and polynomial laws allow residual networks to "hover over the boundary between stability and chaos," thus preserving the geometry of the input space and the gradient information flow. In our experiments, for each activation function we study here, we initialize residual networks with different hyperparameters and train them on MNIST. Remarkably, our initialization-time theory can accurately predict test-time performance of these networks, by tracking either the expected amount of gradient explosion or the expected squared distance between the images of two input vectors. Importantly, we show, theoretically as well as empirically, that common initializations such as the Xavier or the He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth.
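To make the forward-dynamics contrast concrete, here is a minimal NumPy sketch (our own illustration, not the paper's code; the width, depth, and weight scale sigma_w = 1.5 are arbitrary choices). It propagates two nearby inputs through a random tanh feedforward network and a random tanh residual network and prints the squared distance between their representations at several depths; in mean field theory, the feedforward distance approaches its fixed point exponentially fast in depth, while the residual distance evolves subexponentially because each tanh residual block adds only a bounded amount.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, sigma_w = 512, 100, 1.5      # illustrative hyperparameters

x = rng.standard_normal(width)
y = x + 0.1 * rng.standard_normal(width)   # a nearby second input

h_ff, g_ff = x.copy(), y.copy()            # feedforward states for the two inputs
h_res, g_res = x.copy(), y.copy()          # residual states for the two inputs
for t in range(depth):
    # same random weights applied to both inputs, fresh at every layer
    W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    h_ff, g_ff = np.tanh(W @ h_ff), np.tanh(W @ g_ff)
    V = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    h_res, g_res = h_res + np.tanh(V @ h_res), g_res + np.tanh(V @ g_res)
    if (t + 1) % 20 == 0:
        print(f"depth {t + 1:3d}: "
              f"feedforward dist {np.sum((h_ff - g_ff) ** 2):9.3e}, "
              f"residual dist {np.sum((h_res - g_res) ** 2):9.3e}")
```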
This paper proposes a neural network approach for efficiently solving general nonlinear convex programs with second-order cone constraints. The proposed neural network model is based on a smoothed natural residual merit function arising from an unconstrained minimization reformulation of the complementarity problem. We study the existence and convergence of the trajectory of the neural network. Moreover, we establish several stability properties of the neural network, namely Lyapunov stability, asymptotic stability, and exponential stability. The examples in this paper provide a further demonstration of the effectiveness of the proposed neural network. This paper can be viewed as a follow-up to , because more stability results are obtained.
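As a hedged illustration of the gradient-flow idea (a toy analogue, not the paper's model: a small linear complementarity problem x >= 0, F(x) >= 0, x'F(x) = 0 stands in for the second-order cone setting, and phi_mu below is one standard smoothing of the natural residual min(a, b)), the network dynamics dx/dt = -rho * grad Psi(x) can be integrated on the merit function Psi(x) = 0.5 * ||phi_mu(x, F(x))||^2:

```python
import numpy as np
from scipy.integrate import solve_ivp

M = np.array([[2.0, 0.5], [0.5, 1.0]])     # positive definite, so a unique solution exists
q = np.array([-1.0, 1.0])
F = lambda x: M @ x + q                    # linear stand-in for the paper's nonlinear map
mu = 1e-3                                  # smoothing parameter

def phi_mu(a, b):
    # smoothed natural residual: tends to min(a, b) as mu -> 0
    return 0.5 * (a + b - np.sqrt((a - b) ** 2 + 4.0 * mu ** 2))

def merit(x):
    # Psi(x) = 0.5 * ||phi_mu(x, F(x))||^2, which vanishes at a solution as mu -> 0
    r = phi_mu(x, F(x))
    return 0.5 * r @ r

def merit_grad(x, eps=1e-6):
    # central-difference gradient of the merit function
    return np.array([(merit(x + eps * e) - merit(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rho = 10.0                                 # time scaling of the network dynamics
sol = solve_ivp(lambda t, x: -rho * merit_grad(x), (0.0, 20.0), [1.0, 1.0])
x_star = sol.y[:, -1]
print("x* ~", x_star, " F(x*) ~", F(x_star), " merit ~", merit(x_star))
```

For this example the trajectory settles near x* = (0.5, 0), where x* >= 0, F(x*) >= 0, and the complementarity product vanishes.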
This paper proposes using neural networks to efficiently solve second-order cone programs (SOCP). To establish the neural networks, the SOCP is first reformulated as a second-order cone complementarity problem (SOCCP) via the Karush-Kuhn-Tucker conditions of the SOCP. SOCCP functions, which transform the SOCCP into a set of nonlinear equations, are then utilized to design the neural networks. We propose two kinds of neural networks based on different SOCCP functions. The first neural network uses the Fischer-Burmeister function to achieve an unconstrained minimization with a merit function. We show that the merit function is a Lyapunov function and that this neural network is asymptotically stable. The second neural network utilizes the natural residual function with the cone projection function to achieve low computational complexity. It is shown to be Lyapunov stable and to converge globally.
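The Jordan-algebra construction behind the first network can be sketched concretely. Below is a minimal, hedged Python illustration (our own, for a single second-order cone K = {z : z_0 >= ||z_bar||}) of the Fischer-Burmeister SOCCP function phi_FB(x, y) = (x o x + y o y)^{1/2} - (x + y), whose zeros are exactly the complementary pairs; the network itself then flows along the negative gradient of the merit function 0.5 * ||phi_FB||^2.

```python
import numpy as np

def jordan_square(z):
    # z o z in the Jordan algebra of the second-order cone, z = (z0, z_bar)
    z0, zb = z[0], z[1:]
    return np.concatenate(([z0 ** 2 + zb @ zb], 2.0 * z0 * zb))

def jordan_sqrt(z):
    # spectral square root, valid when z lies in the cone:
    # z = lam1 * u1 + lam2 * u2 with lam_{1,2} = z0 -/+ ||z_bar||
    z0, zb = z[0], z[1:]
    nzb = np.linalg.norm(zb)
    w = zb / nzb if nzb > 0 else np.zeros_like(zb)
    u1 = 0.5 * np.concatenate(([1.0], -w))
    u2 = 0.5 * np.concatenate(([1.0], w))
    return np.sqrt(max(z0 - nzb, 0.0)) * u1 + np.sqrt(max(z0 + nzb, 0.0)) * u2

def phi_fb(x, y):
    # phi_FB(x, y) = (x o x + y o y)^{1/2} - (x + y); it vanishes exactly when
    # x in K, y in K, and <x, y> = 0 (the SOC complementarity condition)
    return jordan_sqrt(jordan_square(x) + jordan_square(y)) - (x + y)

# sanity check on a complementary pair lying on the cone boundary
x = np.array([1.0, 1.0, 0.0])    # in K: x0 >= ||x_bar||
y = np.array([1.0, -1.0, 0.0])   # in K, with <x, y> = 0
print(phi_fb(x, y))              # approximately [0. 0. 0.]
```

The sketch only verifies that phi_FB vanishes on a complementary pair; a full network would integrate the gradient flow of the merit function over the KKT variables, as in the toy example above.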