Application Test of Neural Network Model Compression and SqueezeNet

Source: Internet
Author: User

Deep learning has made great breakthroughs in many fields, but trained deep-learning models are often large: models trained on datasets such as ImageNet or COCO frequently exceed hundreds of megabytes. That is no problem for mainstream computers, but it can be prohibitive for mobile devices or embedded hardware. Neural network compression is therefore an important topic in both deep-learning research and applications.

In a talk, Qin Tao of Microsoft mentioned that existing network-compression techniques fall into four main categories.

The first is called pruning. A neural network consists of layers of nodes connected by edges, each edge carrying a weight. The idea of pruning is simple: if some edges have very small weights, those edges are probably unimportant and can be removed. After training the large model, we look for the edges with the smallest weights, remove them, and then retrain the model on the edges that remain.
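As a minimal sketch of this idea (not the exact procedure from any particular paper), magnitude-based pruning can be written in a few lines of NumPy. The choice of criterion here — drop the smallest fraction of weights by absolute value — is an assumption for illustration:

```python
import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    flat = np.abs(weights).flatten()
    k = int(len(flat) * fraction)
    if k == 0:
        return weights.copy()
    threshold = np.sort(flat)[k - 1]        # k-th smallest magnitude
    mask = np.abs(weights) > threshold      # keep only weights above it
    return weights * mask

# Example: prune 50% of a small weight matrix
w = np.array([[0.9, -0.01], [0.02, -0.8]])
pruned = prune_by_magnitude(w, 0.5)
# the two small weights (-0.01 and 0.02) are zeroed; 0.9 and -0.8 survive
```

In practice the surviving weights are then fine-tuned (retrained), since removing edges perturbs the function the network computes.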

Another route to model compression is weight sharing. Suppose two adjacent layers are fully connected and each has 1,000 nodes; then there are 1,000 × 1,000 = 1 million weights (parameters) between them. We can cluster those 1 million weights to see which values are very close, and replace every weight in a cluster with the cluster mean, so that many edges (those clustered together) share the same weight. If we cluster 1 million weights into 1,000 clusters, the number of distinct parameter values drops from 1 million to 1,000, with each edge storing only a small cluster index. This is another important technique for shrinking a model.
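The clustering step described above can be sketched with a tiny 1-D k-means in NumPy. This is an illustrative toy, assuming Lloyd's algorithm with evenly spaced initial centroids, not any specific paper's procedure:

```python
import numpy as np

def share_weights(weights, n_clusters, n_iter=20):
    """Cluster weights with 1-D k-means; replace each weight by its cluster mean."""
    flat = weights.flatten()
    # initialise centroids evenly across the weight range
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # assign each weight to its nearest centroid, then recompute means
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if len(members):
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape), centroids

w = np.array([0.11, 0.09, 0.10, 0.52, 0.48, 0.50])
shared, codebook = share_weights(w, 2)
# weights near 0.1 now share one value; weights near 0.5 share another
```

After clustering, the model only needs to store the codebook (1,000 floats in the example above) plus one index per edge, rather than a full float per edge.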

A third technique, which can be seen as taking weight sharing one step further, is quantization. The parameters of a deep neural network model are stored as floating-point numbers, typically 32 bits each. In practice there is no need to retain such high precision: we can quantize, for example mapping the original 32-bit values onto the range 0 to 255, sacrificing some precision to reduce the space each weight requires.
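A minimal sketch of the 0-to-255 mapping described above is affine (linear) quantization, shown here in NumPy; the min/max calibration is an assumption for illustration:

```python
import numpy as np

def quantize_uint8(weights):
    """Map float32 weights linearly onto 0..255 (8-bit affine quantization)."""
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / 255.0
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + lo

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, lo = quantize_uint8(w)
w_hat = dequantize(q, scale, lo)
# storage drops from 32 bits to 8 bits per weight; the reconstruction
# error is bounded by about half a quantization step (scale / 2)
```

The compressed model stores only `q` plus the two calibration constants, a 4x reduction over float32 at a small cost in precision.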

The most extreme form of quantization is the fourth technique, the binarized neural network. In a binarized neural network, weights are no longer stored as floating-point numbers at all: each weight is a binary value, either +1 or -1. A weight that originally required 32 bits now needs only a single bit, which dramatically reduces the size of the model.
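A minimal sketch of weight binarization follows. Keeping a per-tensor scale (the mean absolute weight) is a common refinement, as in XNOR-Net-style methods, and is an added assumption here rather than something stated in the text:

```python
import numpy as np

def binarize(weights):
    """Replace each weight by its sign (+1 / -1); keep one shared scale
    (the mean absolute value) so overall magnitudes are roughly preserved."""
    scale = np.abs(weights).mean()
    b = np.where(weights >= 0, 1.0, -1.0)
    return b, scale

w = np.array([0.7, -0.3, 0.1, -0.9])
b, scale = binarize(w)
# b = [1, -1, 1, -1]; storage is 1 bit per weight plus one float for scale
```

At inference time, the effective weight is `scale * b`, so multiplications reduce to sign flips plus a single scalar multiply per tensor.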

In the field of deep-model compression, the best-known work is SqueezeNet, and most importantly its results are quite good. It is a classic compressed model, designed as a compact alternative to AlexNet. Its main approach is to replace many of the original 3×3 convolutions with 1×1 convolutions, which eliminates a large number of parameters.

Of course, the improvements go beyond that: there is also the expand stage of the Fire module, pooling operations interspersed in a manner reminiscent of residual processing, and, in particular, the replacement of the final fully connected layer with global average pooling.
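The parameter savings from the squeeze/expand design can be checked with simple arithmetic. The sketch below uses the channel sizes of an early Fire module from the SqueezeNet paper (96 input channels, 16 squeeze channels, 64 + 64 expand channels); biases are ignored for simplicity:

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution layer (biases ignored)."""
    return in_ch * out_ch * k * k

def fire_params(in_ch, squeeze_ch, expand_ch):
    """Fire module: a 1x1 squeeze layer feeding parallel 1x1 and 3x3 expand layers."""
    squeeze = conv_params(in_ch, squeeze_ch, 1)
    expand1 = conv_params(squeeze_ch, expand_ch // 2, 1)
    expand3 = conv_params(squeeze_ch, expand_ch // 2, 3)
    return squeeze + expand1 + expand3

plain = conv_params(96, 128, 3)     # a plain 3x3 conv: 110,592 weights
fire = fire_params(96, 16, 128)     # Fire(squeeze=16, expand=64+64): 11,776 weights
# roughly 9x fewer parameters for the same number of output channels
```

The squeeze layer first shrinks the channel count so that the expensive 3×3 filters operate on 16 channels instead of 96, which is where most of the savings come from.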

(2) Test

In our test of SqueezeNet, the trained model was about 4.8 MB, which is already very small. Reports online indicate that by converting the parameters to an integer type and similar tricks, the size can be brought down to around 0.8 MB.

We also tested with an image containing multiple scene types:

Its recognition results are as follows (recognition speed is very fast):

Test picture two:
