This is a note on a network-compression paper from the ICLR 2017 OpenReview submissions, "Training Compressed Fully-Connected Networks with a Density-Diversity Penalty".
The title alone tells you that the method targets the fully-connected layers, which cut my enthusiasm in half.

———————— Introduction ————————
The author uses VGG as the example to argue that the fully-connected layers occupy the great majority of the resources, so compressing them matters most. Unfortunately the convolutional layers are not addressed (T_T), even though there is plenty to cut there as well.
The article introduces two terms that I find very apt: "density" and "diversity". Between them, these two terms essentially cover most of the existing compression methods for deep models.
"Density" leads to a more representative approach is pruning, matrix decomposition, etc., that is, reduce the network sparsity (redundancy), so that the model is compressed.
"Diversity" leads to a more representative method is a quantitative method, with a small number of code words to represent a large weight matrix, that is, reduce the diversity of network parameters, so you can only store these different code words, thus compressing the model.
Accordingly, the article adds penalties on the density and diversity of the fully-connected layers to the loss, with the aim of making the network sparser and less diverse.
And this is my favorite point in the article: the author penalizes the density and diversity of the fully-connected layers in the loss not in order to obtain a small model directly, but so that pruning and quantization (see the references) work better on top of it. The sparser the network, the more branches we can prune; the lower the diversity of the parameters, the fewer codewords we need to quantize them.
PS. I have seen several articles before that also add the network's width and depth into the loss, hoping to directly train a model that is both accurate and small. I am not keen on that kind of work: it is very complex and brings a lot of training difficulty. It is easy to compress a trained model into a small one, but very hard to train a small model directly.

———————— Method: Loss Function ————————
The method is in fact divided into two steps: first, a sparse, low-diversity network is trained with the loss function below; then the network is compressed using pruning and quantization.
As you can see, the loss contains three parts: \(L(\hat{y}, y)\) is the prediction error; \(\Vert W_j \Vert_p\) is the p-norm of the weight matrix, used to describe its density; \(\vert W_j(a,b) - W_j(a',b')\vert\), summed over pairs of entries, describes the diversity of the weight matrix. In addition, \(\lambda_j\) adjusts the weight of each layer's density and diversity losses, mainly to balance the differences in magnitude caused by each layer's parameter count.
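Putting these pieces together, the overall objective presumably looks something like the following (my reconstruction from the description above; the exact summation ranges and scaling may differ from the paper):

\[
\mathcal{L} \;=\; L(\hat{y}, y)
\;+\; \sum_{j} \lambda_j \Big(
\underbrace{\Vert W_j \Vert_p}_{\text{density}}
\;+\;
\underbrace{\sum_{a,b}\sum_{a',b'} \big\vert W_j(a,b) - W_j(a',b') \big\vert}_{\text{diversity}}
\Big)
\]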
For back-propagation, computing the gradient of the diversity loss \(\vert W_j(a,b) - W_j(a',b')\vert\) naively is far too expensive, so the author proposes a fast method; I will not repeat it here, interested readers can look at the original paper.
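To give a feel for why sorting helps with sums of pairwise absolute differences at all (my own illustration of a standard identity, not necessarily the paper's exact derivation), compare the naive O(n²) computation with an O(n log n) one:

```python
import numpy as np

def diversity_naive(w):
    # sum of |w_a - w_b| over all ordered pairs, O(n^2)
    return np.abs(w[:, None] - w[None, :]).sum()

def diversity_sorted(w):
    # same value via sorting, O(n log n):
    # 2 * sum_{i<j} (w_(j) - w_(i)) = 2 * sum_i w_(i) * (2i - n + 1)
    s = np.sort(w)
    n = s.size
    coeff = 2 * np.arange(n) - n + 1
    return 2 * (s * coeff).sum()

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
print(np.isclose(diversity_naive(w), diversity_sorted(w)))   # True

# a subgradient follows the same counting idea (ignoring ties):
# d/dw_i sum_{a,b} |w_a - w_b| = 2 * (#{w_j < w_i} - #{w_j > w_i})
rank = np.argsort(np.argsort(w))
subgrad = 2 * (2 * rank - (w.size - 1))
```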
For training, the author also notes that this loss can hurt model performance, so a step-by-step, alternating approach is taken (illustrated by a diagram in the paper): first train with the density and diversity losses added (accuracy will be rather poor at this stage), then remove these losses, keep the pattern of the fully-connected layers unchanged, and train the other layers; these two steps are then repeated.
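Below is a toy, runnable sketch of this alternating idea on random data (my own simplification, not the authors' code: hyperparameters and sizes are made up, only the sparsity part of the pattern is enforced in the recovery phase, and the penalty reuses the sorting identity from the previous snippet):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randint(0, 3, (512,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
task_loss = nn.CrossEntropyLoss()

def dd_penalty(weight, lam=1e-4):
    # density term: entrywise L1 norm (p = 1 chosen for simplicity)
    density = weight.abs().sum()
    # diversity term: sum of pairwise absolute differences via the sorting identity
    s, _ = torch.sort(weight.reshape(-1))
    n = s.numel()
    coeff = 2.0 * torch.arange(n, dtype=s.dtype) - n + 1
    diversity = 2.0 * (s * coeff).sum()
    return lam * (density + diversity)

def run(epochs, use_penalty, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(epochs):
        opt.zero_grad()
        loss = task_loss(model(X), y)
        if use_penalty:
            loss = loss + sum(dd_penalty(m.weight)
                              for m in model if isinstance(m, nn.Linear))
        loss.backward()
        opt.step()
        if masks is not None:            # keep pruned weights at zero while recovering
            with torch.no_grad():
                for m, mask in masks.items():
                    m.weight.mul_(mask)

for phase in range(3):
    # step 1: train with the density-diversity penalty
    # (accuracy drops, weights become sparse and collapse toward shared values)
    run(epochs=20, use_penalty=True)
    # record a sparsity pattern from the near-zero weights
    # (the weight-tying part of the pattern is omitted in this toy version)
    masks = {m: (m.weight.abs() > 1e-3).float()
             for m in model if isinstance(m, nn.Linear)}
    # step 2: drop the penalty, keep the pattern fixed, train to recover accuracy
    run(epochs=20, use_penalty=False, masks=masks)
```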