Model compression: Network Slimming (Learning Efficient Convolutional Networks through Network Slimming)


Network Slimming: Learning Efficient Convolutional Networks through Network Slimming (paper)
An ICCV 2017 paper with a clear line of thought and a refreshingly clean idea ~

Innovation points:
1. The scaling factor γ in batch normalization is used as an importance factor: the smaller γ is, the less important the corresponding channel is, and that channel can be cropped (pruned).
2. To constrain the magnitude of γ, a regularization term on γ is added to the objective function, so the network can be pruned automatically during training, which previous model-compression methods did not offer (a training-time sketch follows this list).
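
A minimal sketch of point 2, assuming PyTorch (the official code linked at the end is Torch/Lua, and the function name here is my own illustration, not the authors'): after the normal backward pass, add the subgradient λ·sign(γ) to the gradient of every BN scaling factor, then step the optimizer.

```python
import torch
import torch.nn as nn

def add_bn_l1_subgradient(model: nn.Module, lam: float = 1e-4) -> None:
    """Add the subgradient of lam * |gamma| to each BN scaling factor's gradient.

    Call this after loss.backward() and before optimizer.step().
    """
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            # gamma is stored in m.weight; sign(gamma) is the L1 subgradient
            m.weight.grad.add_(lam * torch.sign(m.weight.detach()))

# Typical training step:
#   loss = criterion(model(x), y)
#   loss.backward()
#   add_bn_l1_subgradient(model, lam=1e-4)   # lam plays the role of lambda below
#   optimizer.step()
```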

Three things a model-compression method should address:
1. Model size
2. Run-time memory: a small model does not automatically mean low run-time memory; if the parameters are few but the intermediate computation is large, memory use still does not drop.
3. Number of computing operations

Shortcomings of existing model-compression approaches:
1. Low-rank decomposition: works reasonably well for fully connected layers but poorly for convolutional layers; model size may shrink by about 3x, but there is no obvious speed-up in computation.
2. Weight quantization: HashNet can reduce the number of parameters that must be stored by grouping and sharing weights, but run-time memory is not reduced.
3. Weight binarization: loses accuracy.
4. Weight pruning/sparsifying: requires dedicated hardware or libraries; in [12] the training process is neither "constrained" nor "guided" toward sparsity.
5. Structured pruning/sparsifying: the category this paper belongs to, so of course no shortcomings are listed... even if there were, the article would not mention them ~

—————————————— Split Line —————————————
Body:
Network Slimming uses the scaling factor γ in the BN layer to measure the importance of each channel during training, and prunes the unimportant channels to compress the model and speed up inference.
Look at the model diagram: the left side is the network being trained, the middle column lists the scaling factors (the γ of the BN layers), and when a γ is small (e.g., 0.001 or 0.003 in the figure), the corresponding channel is pruned away, giving the model shown on the right. The idea is simple but clever: adding γ to the objective function makes it possible to prune while training.

Look at the objective function (written out below):
The first term is the model's prediction loss, and the second term constrains γ; λ trades the two off and, as the experiments below show, is typically set to 1e-4 or 1e-5. For g(·) the paper uses g(s) = |s|, i.e. the L1 norm, which induces sparsity. That is the whole principle ~
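
In LaTeX notation, the objective from the paper is:

```latex
% Training loss plus an L1 penalty on the BN scaling factors gamma;
% lambda trades the two terms off.
L = \sum_{(x, y)} l\bigl(f(x, W),\, y\bigr) + \lambda \sum_{\gamma \in \Gamma} g(\gamma),
\qquad g(s) = |s|
```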

Next, look at how the whole pipeline runs: train, prune, then train again. The overall flow is shown in the following illustration:

It has three steps: first, train with the sparsity penalty; second, prune the channels with small γ; third, fine-tune the pruned model. These steps can be looped.

Specific operation details:
λ is usually set to 1e-4 or 1e-5, depending on the specific case.
After training with the γ penalty, how do we decide which channels to cut, i.e. how small is "small" for γ? Similar to keeping components by cumulative energy in PCA, the γ values are collected and sorted from large to small, and the larger portion is kept, typically around 70% (adjust to the specific situation); a sketch of this thresholding step follows below.
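
A rough sketch of that thresholding step, again assuming PyTorch (the function and variable names are my own, not from the official repo): gather every |γ|, sort, pick the threshold implied by the desired pruning ratio, and build a keep/prune mask per BN layer. The paper determines this threshold as a percentile over the scaling factors.

```python
import torch
import torch.nn as nn

def bn_channel_masks(model: nn.Module, prune_ratio: float = 0.3):
    """Return {bn_layer: bool mask} where True marks a channel to keep.

    prune_ratio is the fraction of channels (those with the smallest |gamma|)
    that gets cut; e.g. 0.3 keeps roughly the larger 70% mentioned above.
    """
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = int(gammas.numel() * prune_ratio)           # number of channels to prune
    threshold = gammas.sort().values[k]             # threshold over all gathered gammas
    return {m: m.weight.detach().abs() > threshold  # True = keep this channel
            for m in model.modules()
            if isinstance(m, nn.BatchNorm2d)}
```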

The effect of the choice of λ on γ is shown in the figure:

When λ = 0, the objective does not penalize γ at all. When λ = 1e-5, more than 450 of the γ values are already close to 0, and the whole distribution shifts toward 0. When λ = 1e-4, the sparsity constraint on γ is much stronger, and nearly 2000 γ values end up near 0.
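
To run this kind of inspection on your own model after training with a given λ, one can simply count and histogram the |γ| values; a small illustrative helper (again a PyTorch sketch, names are mine):

```python
import torch
import torch.nn as nn

def gamma_stats(model: nn.Module, near_zero: float = 0.01, bins: int = 20):
    """Print how many BN scaling factors are near zero and return a histogram."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    n_small = int((gammas < near_zero).sum())
    print(f"{n_small} of {gammas.numel()} scaling factors are below {near_zero}")
    return torch.histc(gammas, bins=bins, min=0.0, max=float(gammas.max()))
```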

Pruning percentage: the more you cut, the smaller the model, but also the more accuracy you lose. These pull in opposite directions, so the authors ran a comparison to see how much pruning is appropriate. The experiments show that accuracy starts to drop once more than about 80% of the channels are pruned.

For the specific experiments, please read the original paper; they cover VGG, ResNet-164 (pre-activation), and DenseNet-40. The results are very good: not only is the model size compressed and the inference speed improved, the classification accuracy can even improve.

Torch code: https://github.com/liuzhuang13/slimming
