It is well acknowledged in the image processing domain that image information can be decomposed into different frequency bands, each with its own merits. However, existing neural networks typically ignore these distinctions and feed all the information into the network at once, treating every component equally. In this paper, we propose a novel neural network framework, named J-Net, that decomposes images into different frequency bands and processes them sequentially. Concretely, images are decomposed by the wavelet transform, and the resulting wavelet coefficients are fed into the network at different depths according to their decomposition levels. An attention module facilitates the fusion of network features with the injected information, yielding a significant performance gain. Furthermore, we show how information at different frequencies affects the accuracy of neural networks. Experiments show accuracy improvements of 5.91%, 5.32%, and 2.00% on Caltech 101, Caltech 256, and ImageNet, respectively.
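To make the decomposition step concrete, the following is a minimal sketch of a multi-level 2D Haar wavelet decomposition in NumPy. The function name `haar_decompose` and the coarsest-band-first ordering are illustrative assumptions, not the authors' exact pipeline; the ordering is chosen to mirror how lower-frequency coefficients would be injected at shallower network depths and finer details at deeper ones.

```python
import numpy as np

def haar_decompose(img, levels=2):
    """Multi-level 2D Haar decomposition (illustrative sketch).

    Returns [LL_n, (LH_n, HL_n, HH_n), ..., (LH_1, HL_1, HH_1)]:
    the coarsest low-frequency band first, followed by detail
    bands from coarse to fine -- one candidate order in which a
    J-Net-style model could inject them at increasing depths.
    """
    bands = []
    ll = img
    for _ in range(levels):
        # Split each 2x2 block into its four corner samples.
        a = ll[0::2, 0::2]; b = ll[0::2, 1::2]
        c = ll[1::2, 0::2]; d = ll[1::2, 1::2]
        lh = (a - b + c - d) / 4   # horizontal detail (high freq.)
        hl = (a + b - c - d) / 4   # vertical detail
        hh = (a - b - c + d) / 4   # diagonal detail
        ll = (a + b + c + d) / 4   # low-frequency approximation
        bands.insert(0, (lh, hl, hh))  # coarser levels go first
    return [ll] + bands

img = np.random.rand(64, 64)
coeffs = haar_decompose(img, levels=2)
print(coeffs[0].shape)     # (16, 16) coarse approximation
print(coeffs[1][0].shape)  # (16, 16) level-2 details
print(coeffs[2][0].shape)  # (32, 32) level-1 finer details
```

Each sub-band halves the spatial resolution, so the coarse approximation naturally matches the smaller feature maps found deeper in a convolutional network.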