"Deep learning" convolution layer speed-up factorized convolutional neural Networks


Wang, Min, Baoyuan Liu, and Hassan Foroosh. "Factorized Convolutional Neural Networks." arXiv preprint (2016).

This paper focuses on optimizing the convolution layer in deep networks. The proposed layer has three notable features:
- It can be trained directly; there is no need to train the original model first and then compress it with sparsification, bit reduction, or similar tricks.
- It keeps the original inputs and outputs of the convolution layer, so it can easily replace layers in already designed networks.
- It is simple to implement and can be built by combining classic convolution layers.

A classification network designed with this method achieves accuracy comparable to GoogLeNet, ResNet-18, and VGG-16, with a model size of only 2.8M and about 470\times 10^6 multiplications, only 65% of AlexNet's.

Standard convolution layer

Let's first review the convolution process. A standard convolution places the kernel over a window of the input I, multiplies element-wise, and sums the products to produce one pixel of the output O.

The convolution kernel covers k^2 pixels on each channel, and the numbers of input and output channels are m and n respectively.

In current popular networks, the main role of the convolution layer is to extract features, so it usually keeps the spatial size of the image unchanged; shrinking the image is generally done by pooling layers. For simplicity, the input and output are assumed to have the same spatial size h \times w.

The number of multiplications required to compute one output pixel is:
k^2 \times m

The total number of multiplications is:

k^2 \times m \times n \times h \times w
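
As a quick sanity check, here is a small Python sketch that evaluates this count (the layer sizes below are illustrative assumptions, not figures from the paper):

```python
# Multiplication count of a standard convolution layer:
# k^2 * m * n * h * w  (stride 1, "same" padding assumed).
def standard_conv_mults(k, m, n, h, w):
    return k * k * m * n * h * w

# Illustrative example: a 3x3 layer, 256 -> 256 channels, 28x28 feature map.
print(standard_conv_mults(k=3, m=256, n=256, h=28, w=28))  # 462,422,016
```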

m and n reflect how many features are extracted and are usually large, often in the hundreds; in contrast, k is generally around 1~5 and rarely exceeds 7, since high-level features are realized by stacking multiple convolution layers with small kernels.

Optimization of the convolution layer

This paper introduces three optimized variants of the convolution layer.

Using bases

Given a kernel size of k^2 and input/output channel numbers m and n, this method splits the convolution into two steps.

In the first step, each input channel is processed independently: b two-dimensional k \times k kernels are applied to every channel, so the m-channel input becomes m \times b intermediate channels. Each channel of this intermediate result is called a basis.
In the second step, the channels are merged, but with a kernel size of 1.
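
The two steps can be sketched with stock layers, for example in PyTorch as below. This is an illustrative interpretation (a grouped convolution for the per-channel step followed by a 1x1 convolution), not the authors' code; the class name FactorizedConv and all sizes are assumptions for the example:

```python
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    """Sketch of the two-step 'bases' factorization (illustrative,
    not the authors' implementation)."""
    def __init__(self, m, n, k, b):
        super().__init__()
        # Step 1: each of the m input channels is convolved independently
        # with b two-dimensional k x k kernels -> m*b intermediate channels
        # (the "bases"). groups=m keeps the channels separate.
        self.bases = nn.Conv2d(m, m * b, kernel_size=k, padding=k // 2,
                               groups=m, bias=False)
        # Step 2: merge the m*b channels into n outputs with a 1x1 convolution.
        self.merge = nn.Conv2d(m * b, n, kernel_size=1, bias=False)

    def forward(self, x):
        return self.merge(self.bases(x))

# Usage: same input/output shape as a standard 3x3, 256 -> 256 layer.
x = torch.randn(1, 256, 28, 28)
y = FactorizedConv(m=256, n=256, k=3, b=2)(x)
print(y.shape)  # torch.Size([1, 256, 28, 28])
```

With groups=m, every input channel gets its own set of b kernels, which matches the per-channel first step, and the padding preserves the h \times w spatial size assumed above.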

The number of multiplications is:
k^2 \times b \times m \times h \times w + b \times m \times n \times h \times w = (k^2 + n) \times b \times m \times h \times w
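
Mirroring the earlier sketch, this count can be evaluated for the same illustrative layer (sizes are assumptions, not figures from the paper):

```python
# Multiplication count of the two-step factorized layer:
# (k^2 + n) * b * m * h * w.
def factorized_conv_mults(k, m, n, b, h, w):
    return (k * k + n) * b * m * h * w

# Same illustrative layer as before (3x3, 256 -> 256, 28x28), with b = 2.
print(factorized_conv_mults(k=3, m=256, n=256, b=2, h=28, w=28))  # 106,373,120
```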

The ratio to the number of multiplications of the traditional convolution is:
\frac{k^2 b + n b}{k^2 n} = \frac{b}{n} + \frac{b}{k^2}
Note that the dominant term here is the one involving n: since n is much larger than k^2, the ratio is roughly b / k^2, so as long as b < k^2 the factorized layer requires fewer multiplications than the standard one.
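
A quick numeric check of this ratio, again with illustrative sizes:

```python
# Ratio of multiplications: factorized / standard = b/n + b/k^2.
def mult_ratio(k, n, b):
    return (k * k * b + n * b) / (k * k * n)

# Illustrative sizes: k=3, n=256, b=2 -> about 0.23, i.e. roughly 4x fewer
# multiplications; the b/n term (2/256) is negligible next to b/k^2 (2/9).
print(mult_ratio(k=3, n=256, b=2))
```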
