This brief introduction to the MSRA initialization method is derived from Kaiming He's paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification".
- Motivation
- MSRA initialization
- Derivation proof
- Additional Information
Motivation
Network initialization matters a great deal. The traditional approach of initializing weights from a Gaussian distribution with a fixed variance makes the model difficult to converge once the network becomes deep. The VGG team worked around the initialization problem differently: they first trained an 8-layer network and then used it to initialize deeper networks.
"Xavier" is a relatively good initialization method, which I have described in my other blog post, "Deep learning--xavier initialization method". However, when the Xavier derivation assumes that the activation function is linear , it is clear that the relu and prelu that we commonly use today do not satisfy this condition.
MSRA Initialization
When only the number of inputs is considered, MSRA initialization draws the weights from a Gaussian distribution with mean 0 and variance 2/n:

$$W \sim \mathcal{N}\!\left(0,\ \frac{2}{n}\right)$$

where n is the number of input elements of the layer (for a convolutional layer, n = k^2 c with kernel size k and c input channels).
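As a concrete illustration (my own sketch, not code from the paper or any particular framework), the rule can be implemented in a few lines of NumPy: compute the fan-in of the weight tensor and sample from a Gaussian with standard deviation sqrt(2/n).

```python
import numpy as np

def msra_normal(shape, rng=None):
    """Sample a weight tensor of the given shape from N(0, 2/fan_in).

    shape is assumed to be (out_channels, in_channels, k, k) for a conv
    layer, or (out_features, in_features) for a fully connected layer.
    """
    if rng is None:
        rng = np.random.default_rng()
    fan_in = int(np.prod(shape[1:]))        # n = k*k*c for a conv layer
    std = np.sqrt(2.0 / fan_in)             # Var[w] = 2/n  =>  std = sqrt(2/n)
    return rng.normal(0.0, std, size=shape)

W = msra_normal((64, 3, 3, 3))              # a 3x3 conv mapping 3 -> 64 channels
print(W.std(), np.sqrt(2.0 / (3 * 3 * 3)))  # empirical std vs. theoretical std
```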
Derivation Proof
The derivation process is similar to that of Xavier initialization.
First, the response of the l-th convolutional layer is written as:

$$y_l = W_l x_l + b_l$$
Assuming that the elements of x_l and W_l are mutually independent and identically distributed (in the formula below, n_l denotes the number of input elements, while x_l and w_l each denote a single element), the variance is:

$$\mathrm{Var}[y_l] = n_l\,\mathrm{Var}[w_l x_l]$$
When the weights w_l have zero mean, the above variance can be further written as:

$$\mathrm{Var}[y_l] = n_l\,\mathrm{Var}[w_l]\,E[x_l^2]$$
For the ReLU activation function we have x_l = f(y_{l-1}) = max(0, y_{l-1}) (where f is the activation function). Since y_{l-1} has zero mean and a symmetric distribution, ReLU zeroes out half of it, so:

$$E[x_l^2] = \frac{1}{2}\,\mathrm{Var}[y_{l-1}]$$
Substituting this back into the variance formula above gives:

$$\mathrm{Var}[y_l] = \frac{1}{2}\, n_l\,\mathrm{Var}[w_l]\,\mathrm{Var}[y_{l-1}]$$
To keep the variance of the data consistent across layers, the weights should satisfy:

$$\frac{1}{2}\, n_l\,\mathrm{Var}[w_l] = 1, \quad \text{i.e.}\quad \mathrm{Var}[w_l] = \frac{2}{n_l}$$
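To see the effect of this condition numerically, the following sketch (my own illustration, not from the paper) pushes random data through a stack of fully connected ReLU layers and prints the standard deviation of the final activations: with Var[w] = 2/n the scale stays roughly constant across layers, while with Var[w] = 1/n it collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 512, 20
x0 = rng.normal(size=(1000, n))

for var in (2.0 / n, 1.0 / n):           # MSRA variance vs. 1/n for comparison
    x = x0
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(var), size=(n, n))
        x = np.maximum(0.0, x @ W)       # linear layer followed by ReLU
    print(f"Var[w] = {var:.4f}: std after {depth} layers = {x.std():.4f}")
```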
Additional Information
(1) For the first layer, the input has not passed through a ReLU, so in theory its initialization variance should be 1/n. However, since this affects only a single layer, the extra factor is almost insignificant, and for simplicity the 2/n variance is used for it as well;
(2) As with Xavier, back-propagation must also be considered. The same derivation can be carried out for the backward pass; the conclusion is again that the variance should be 2/n, except that n is now the number of outputs rather than the number of inputs. The paper states that either choice is sufficient to make the model converge. A small sketch of the two choices follows below.
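These two choices correspond to the "fan-in" and "fan-out" modes exposed by common frameworks (for example, the mode argument of torch.nn.init.kaiming_normal_ in PyTorch). Below is a minimal sketch of computing both for a convolutional weight tensor; the (out_channels, in_channels, k, k) layout and variable names are my own assumptions, not taken from the paper.

```python
import numpy as np

# Assumed weight layout: (out_channels, in_channels, k, k)
shape = (64, 32, 3, 3)
out_c, in_c, k, _ = shape

fan_in = in_c * k * k        # forward-pass derivation: n = k*k*c_in
fan_out = out_c * k * k      # backward-pass derivation: n = k*k*c_out

print("std (fan_in): ", np.sqrt(2.0 / fan_in))
print("std (fan_out):", np.sqrt(2.0 / fan_out))
```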
(3) For the PReLU activation function, f(y) = max(0, y) + a min(0, y), the condition becomes:

$$\frac{1}{2}(1 + a^2)\, n_l\,\mathrm{Var}[w_l] = 1, \quad \text{i.e.}\quad \mathrm{Var}[w_l] = \frac{2}{(1 + a^2)\, n_l}$$
So the initialization depends on the PReLU parameter a, although the current Caffe code does not support manually specifying the value of a for MSRA initialization.
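Where a framework does not expose a, the PReLU-adjusted standard deviation is easy to compute by hand. Here is a minimal sketch; the function name msra_std and the example fan-in values are my own, not from the paper or from Caffe.

```python
import numpy as np

def msra_std(fan_in, a=0.0):
    """Standard deviation for MSRA init; a is the PReLU negative slope.

    a = 0 recovers the plain ReLU case, std = sqrt(2 / fan_in).
    """
    return np.sqrt(2.0 / ((1.0 + a * a) * fan_in))

print(msra_std(fan_in=576))          # ReLU: e.g. a 3x3 conv with 64 input channels
print(msra_std(fan_in=576, a=0.25))  # PReLU with negative slope 0.25
```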
(4) The paper also reports comparative experiments showing that MSRA initialization is significantly better than Xavier initialization as the network gets deeper; the difference is especially pronounced when the network is increased to 33 layers.