In its simplest form, a convolutional neural network (CNN) consists of a fully connected layer g composed with a sequence of convolutional layers T. Although g is known to have the universal approximation property, it is not known whether CNNs, which have the form g ◦ T, inherit this property, especially when the kernel size in T is small. In this paper, we show that under suitable conditions CNNs do inherit the universal approximation property, and we characterize their sample complexity. In addition, we discuss concretely how the nonlinearity of T can improve the approximation power. Finally, we show that when the target function class has a certain compositional form, convolutional networks are far more advantageous than fully connected networks in terms of the number of parameters needed to achieve a desired accuracy.
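For concreteness, a minimal sketch of the architecture described above (the depth L, activation σ, and weight notation are illustrative and not notation fixed by the paper): the convolutional part is T = T_L ◦ · · · ◦ T_1, where each layer acts as T_ℓ(x) = σ(W_ℓ x + b_ℓ) with W_ℓ a sparse, weight-shared convolution matrix whose bandwidth is determined by the (small) kernel size, and the full network is f = g ◦ T with g a fully connected layer. The question studied here is whether f of this restricted form can still approximate a given target class, and at what cost in parameters and samples.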