This article explains several other commonly used layers and their parameter configuration: the Softmax-loss layer, the Inner Product layer, the Accuracy layer, the Reshape layer, and the Dropout layer.
1. Softmax-loss
The Softmax-loss layer and the Softmax layer perform largely the same computation. Softmax is a classifier that outputs the probability (likelihood) of each class and is a generalization of logistic regression: logistic regression handles only two classes, while Softmax handles multiple classes.
The difference between Softmax and Softmax-loss:

Softmax computes the likelihood of class j from the input scores z:

    p_j = exp(z_j) / sum_k exp(z_k)

Softmax-loss additionally takes the true label y and outputs the negative log-likelihood:

    L = -log(p_y) = -z_y + log( sum_k exp(z_k) )
A more detailed comparison of the two can be found in: Softmax vs. Softmax-loss
If the user only needs the probability (likelihood) of each class, a Softmax layer alone is enough, and the loss computation is unnecessary. Conversely, if the user has already obtained likelihood values by some other means and only wants to perform maximum-likelihood estimation, then only the softmax-loss part is needed, without the preceding Softmax operation. Providing two separate layer types is therefore much more flexible than providing only a single combined softmax-loss layer.
Neither the Softmax layer nor the Softmax-loss layer has parameters to set; only the layer type differs.
The Softmax-loss layer outputs the loss value:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
The Softmax layer outputs the likelihood values:

layer {
  name: "prob"
  type: "Softmax"
  bottom: "cls3_fc"
  top: "prob"
}
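The relationship between the two can be sketched in NumPy (a minimal illustration of the formulas, not Caffe code):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_loss(z, label):
    # negative log-likelihood of the true class: L = -log(p_label)
    return -np.log(softmax(z)[label])

scores = np.array([2.0, 1.0, 0.1])  # raw scores from the previous layer
probs = softmax(scores)             # likelihood of each class
loss = softmax_loss(scores, 0)      # loss when the true label is class 0
```

As the article notes, a Softmax layer alone yields `probs`, while the SoftmaxWithLoss layer goes on to compute `loss`.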
2. Inner Product
Fully connected layer: the input is treated as a vector, and the output is also a simple vector (the width and height of the input blob both become 1).
Input: n*c0*h*w
Output: n*c1*1*1
The fully connected layer is actually a convolution layer whose kernels are the same size as the input data. Its parameters are therefore basically the same as the convolution layer's parameters.
Layer type: InnerProduct
lr_mult: a coefficient on the learning rate; the effective learning rate is this number multiplied by base_lr in the solver.prototxt configuration file. If there are two lr_mult entries, the first applies to the weights and the second to the bias. The bias learning rate is usually set to twice the weight learning rate.
Parameters that must be set:
num_output: the number of outputs (filters), i.e. c1
Other parameters:
weight_filler: initialization of the weights. The default is "constant" with all values 0; in practice the "xavier" algorithm is often used, and "gaussian" is also possible.
bias_filler: initialization of the bias. Usually set to "constant" with value 0.
bias_term: whether to use a bias term; defaults to true (enabled).
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
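The "input treated as a vector" behavior can be sketched in NumPy; the sizes below (batch 2, 3*4*4 input, 5 outputs) are made up for illustration:

```python
import numpy as np

n, c0, h, w = 2, 3, 4, 4   # hypothetical input blob shape n*c0*h*w
num_output = 5             # c1, the layer's num_output

x = np.random.randn(n, c0, h, w)
W = np.random.randn(num_output, c0 * h * w)  # weights (e.g. xavier-initialized)
b = np.zeros(num_output)                     # bias_filler "constant" -> all zeros

flat = x.reshape(n, -1)   # flatten each input into a vector of length c0*h*w
out = flat @ W.T + b      # output blob: n * num_output (width and height become 1)
```

Each output element is a dot product over the whole flattened input, which is why the layer is equivalent to a convolution whose kernel matches the input size.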
3. Accuracy
The Accuracy layer outputs the classification (prediction) accuracy. It is only meaningful in the test phase, so the include parameter is required.
Layer type: Accuracy
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
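What the layer computes can be sketched as follows; the scores and labels are made-up values standing in for the "ip2" and "label" blobs:

```python
import numpy as np

# one row of class scores per test sample (stand-in for the "ip2" blob)
scores = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 0, 0])   # ground-truth class of each sample

pred = scores.argmax(axis=1)         # predicted class = highest score
accuracy = (pred == labels).mean()   # fraction of correct predictions
```

Here the first two samples are predicted correctly and the third is not, so the accuracy is 2/3.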
4. Reshape
Changes the dimension of the input without changing the data.
Layer type: Reshape

First, look at an example:
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "input"
  top: "output"
  reshape_param {
    shape {
      dim: 0   # copy the dimension from below
      dim: 2
      dim: 3
      dim: -1  # infer it from the other dimensions
    }
  }
}
There is an optional parameter group, shape, which specifies the value of each dimension of the blob (a blob is four-dimensional data: n*c*h*w).
dim: 0 means that dimension is unchanged, i.e. the input and output have the same value for it.
dim: 2 or dim: 3 sets that dimension to 2 or 3.
dim: -1 means the dimension is computed automatically. Since the total amount of data is unchanged, the system infers this dimension's value from the other three dimensions of the blob.
Suppose the original data is 64*3*28*28, representing 64 color images with 3 channels, each 28*28.
After reshape transformation:
reshape_param {
  shape {
    dim: 0
    dim: 0
    dim: 14
    dim: -1
  }
}
Output data is: 64*3*14*56
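The same dimension arithmetic can be checked with NumPy's reshape, where -1 plays the same inferred-dimension role:

```python
import numpy as np

x = np.random.randn(64, 3, 28, 28)  # original blob: 64 images, 3 channels, 28*28

# dim:0 keeps n and c unchanged, 14 fixes h, and -1 is inferred:
# total size 64*3*28*28 is constant, so the last dimension must be 56
y = x.reshape(x.shape[0], x.shape[1], 14, -1)
```

Because 28*28 = 14*56, the inferred dimension comes out to 56, matching the 64*3*14*56 output above.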
5. Dropout
Dropout is a trick to prevent overfitting: during training, some hidden-layer nodes of the network are randomly disabled (their outputs set to zero).
First, look at an example:
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7-conv"
  top: "fc7-conv"
  dropout_param {
    dropout_ratio: 0.5
  }
}
Only the dropout_ratio needs to be set.
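The training-time behavior can be sketched as follows (a minimal NumPy illustration; Caffe additionally rescales the surviving activations at training time so the expected output is unchanged, making the layer a no-op at test time):

```python
import numpy as np

def dropout_train(x, ratio=0.5, rng=np.random.default_rng(0)):
    # zero out each activation with probability `ratio`; surviving
    # values are scaled by 1/(1 - ratio) to keep the expectation unchanged
    mask = rng.random(x.shape) >= ratio
    return x * mask / (1.0 - ratio)

x = np.ones((4, 4))               # stand-in for a hidden layer's activations
y = dropout_train(x, ratio=0.5)   # roughly half the entries become 0
```

With `ratio: 0.5`, each surviving activation is doubled, so on average the layer's output magnitude matches the input.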
[Reposted] Caffe First Look (7): Other commonly used layers and their parameters