Traditionally, weights are randomly initialized from the standard normal distribution (mean 0, variance 1), and this is actually unreasonable in some respects.

Standard normal distribution:

As you can see, most real samples fall in the region near the peak of the curve, which is what a normal distribution looks like.
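To make "near the peak" concrete, here is a quick check of my own (not from the original post): roughly 68% of standard-normal samples land within one standard deviation of the mean.

```python
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Fraction of samples within one standard deviation of the mean;
# for N(0, 1) theory says this should be close to 68.3%.
within_one_std = sum(1 for x in samples if -1.0 <= x <= 1.0) / len(samples)
```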
The following is reproduced from another post.
Weight initialization in Caffe
First, a note: the fillers' source lives in Caffe's include/caffe/filler.hpp. You can go read it if you want; personally I don't want to dig into the code details right now, a big-picture idea is enough. If you do want the code specifics, see http://blog.csdn.net/xizero00/article/details/50921692, which is written quite well (though a few of its remarks are incorrect, and I don't know whether they have since been fixed).
The file filler.hpp provides 7 weight-initialization methods, namely: constant initialization (constant), Gaussian-distribution initialization (gaussian), positive_unitball initialization, uniform-distribution initialization (uniform), Xavier initialization (xavier), MSRA initialization (msra), and bilinear initialization (bilinear).
```cpp
template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
  const std::string& type = param.type();
  if (type == "constant") {
    return new ConstantFiller<Dtype>(param);
  } else if (type == "gaussian") {
    return new GaussianFiller<Dtype>(param);
  } else if (type == "positive_unitball") {
    return new PositiveUnitballFiller<Dtype>(param);
  } else if (type == "uniform") {
    return new UniformFiller<Dtype>(param);
  } else if (type == "xavier") {
    return new XavierFiller<Dtype>(param);
  } else if (type == "msra") {
    return new MSRAFiller<Dtype>(param);
  } else if (type == "bilinear") {
    return new BilinearFiller<Dtype>(param);
  } else {
    CHECK(false) << "Unknown filler name: " << param.type();
  }
  return (Filler<Dtype>*)(NULL);
}
```
Combine this with the FillerParameter message used in .prototxt files to see how it is used:
```protobuf
message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0];   // the min value in uniform filler
  optional float max = 4 [default = 1];   // the max value in uniform filler
  optional float mean = 5 [default = 0];  // the mean value in gaussian filler
  optional float std = 6 [default = 1];   // the std value in gaussian filler
  // The expected number of non-zero output weights for a given input in
  // gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];
}
```
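For concreteness, here is how these fields are typically set when defining a network in a .prototxt file. The layer below is a hypothetical example of mine, not from the original post:

```protobuf
layer {
  name: "conv1"              # hypothetical layer name
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    # FillerParameter fields in use:
    weight_filler { type: "gaussian" mean: 0 std: 0.01 }
    bias_filler   { type: "constant" value: 0 }
  }
}
```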
Constant initialization (constant):

It initializes the weights or biases to a constant; what constant is up to you. Its value equals the value field in the .prototxt file above, and defaults to 0.

Here are the related definitions in the .proto file, which you may use when defining a network:

```protobuf
optional string type = 1 [default = 'constant'];
optional float value = 2 [default = 0]; // the value in constant filler
```
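As a sketch of what constant filling amounts to, here is a plain-Python illustration (names are mine, not Caffe's actual code):

```python
def constant_fill(shape, value=0.0):
    """Fill a weight blob of the given shape with one constant,
    mimicking what the `value` field controls in a constant filler."""
    count = 1
    for dim in shape:
        count *= dim
    return [value] * count  # flat list standing in for the blob's data

weights = constant_fill((2, 3), value=0.5)
```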
Uniform initialization (uniform)

It initializes the weights and biases from a uniform distribution, with min and max controlling the lower and upper bounds; by default the range is (0, 1).

Here are the related definitions in the .proto file, which you may use when defining a network:

```protobuf
optional string type = 1 [default = 'constant'];
optional float min = 3 [default = 0]; // the min value in uniform filler
optional float max = 4 [default = 1]; // the max value in uniform filler
```
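A plain-Python sketch of uniform filling (my own illustration of what the min/max fields control, not Caffe's actual code):

```python
import random

def uniform_fill(count, min_val=0.0, max_val=1.0, seed=None):
    """Draw `count` weights i.i.d. from the uniform distribution on
    [min_val, max_val], as the `min`/`max` fields describe."""
    rng = random.Random(seed)
    return [rng.uniform(min_val, max_val) for _ in range(count)]

w = uniform_fill(1000, min_val=-0.05, max_val=0.05, seed=42)
```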
Gaussian initialization (gaussian)

Given the mean and standard deviation of the Gaussian, it generates the weights from that Gaussian distribution.

The interesting point, however, is that Gaussian initialization can be sparsified, meaning some of the weights can be set to 0; the sparse parameter controls this. sparse is the expected number of non-zeros relative to num_output: in the implementation, sparse/num_output is used as the probability of a Bernoulli distribution, one Bernoulli sample (0 or 1) is drawn per weight, and multiplying by those samples zeroes out part of the weights. What I don't quite understand is why sparse isn't simply defined as a probability directly; that would be easier to use and to understand. As for num_output, it is bound to appear in the .prototxt when you define your network; go look if you don't believe me.
Here are the related definitions in the .proto file, which you may use when defining a network:

```protobuf
optional string type = 1 [default = 'constant'];
optional float mean = 5 [default = 0]; // the mean value in gaussian filler
optional float std = 6 [default = 1];  // the std value in gaussian filler
optional int32 sparse = 7 [default = -1]; // -1 means no sparsification
```
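The sparsification scheme described above can be sketched in plain Python. This is my own illustration of the Bernoulli masking, not Caffe's actual implementation, and the names are mine:

```python
import random

def gaussian_fill(count, num_output, mean=0.0, std=1.0, sparse=-1, seed=None):
    """Gaussian init with optional sparsification: each weight is kept with
    probability sparse/num_output (a Bernoulli mask) and zeroed otherwise.
    sparse = -1 disables the masking, as in the .proto default."""
    rng = random.Random(seed)
    weights = [rng.gauss(mean, std) for _ in range(count)]
    if sparse >= 0:
        keep_prob = sparse / num_output  # expected non-zeros per input
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(count)]
        weights = [w * m for w, m in zip(weights, mask)]
    return weights

# With sparse=10 and num_output=100, about 90% of weights end up zero.
w = gaussian_fill(10_000, num_output=100, std=0.01, sparse=10, seed=1)
```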