Original link: caffe.berkeleyvision.org/tutorial/layers.html
To create a Caffe model, you first define the model architecture in a protocol buffer definition file (prototxt).
In Caffe, the convolution layer exploits the most salient property of an image: its spatial structure.
Convolution:
Documentation:
Parameters (ConvolutionParameter convolution_param)

Required:
- num_output (c_o): the number of outputs (number of filters)
- kernel_size (or kernel_h and kernel_w): specifies the height and width of each filter

Strongly recommended:
- weight_filler [default type: 'constant', value: 0]

Optional:
- bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
- pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
- stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
- group (g) [default 1]: if g > 1, we restrict the connectivity of each filter to a subset of the input; the input and output channels are separated into g groups, and the i-th output group is connected only to the i-th input group

Input:
- n * c_i * h_i * w_i

Output:
- n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1 and w_o likewise.
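The output-size formula above is easy to sketch in plain Python (the helper name here is illustrative, not part of Caffe):

def conv_output_dim(input_dim, kernel, pad=0, stride=1):
    # (input + 2 * pad - kernel) / stride + 1, using integer division
    return (input_dim + 2 * pad - kernel) // stride + 1

# e.g. a 96x96 input convolved with an 8x8 filter, no padding, stride 1:
print(conv_output_dim(96, 8))  # 89, so each output map is 89 x 89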
Example
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
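Plugging this layer's settings into the output-size formula, and assuming a 227x227 input (the AlexNet-style size; the prototxt above does not fix the input dimensions):

# conv1 above: kernel_size 11, stride 4, pad defaults to 0
h_i, kernel, pad, stride = 227, 11, 0, 4
h_o = (h_i + 2 * pad - kernel) // stride + 1
print(h_o)  # 55, so each of the 96 filters produces a 55x55 map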
Pooling:
deeplearning.stanford.edu/wiki/index.php/pooling
Pooling: Overview
After features have been obtained by convolution, the next step is to use them for classification. In theory, one could train a classifier, such as a softmax classifier, on all of the extracted features, but this strains computational capacity. For example, for a 96x96-pixel image, suppose we have learned 400 features, each defined over an 8x8 input; convolving each feature with the image yields a (96 - 8 + 1) * (96 - 8 + 1) = 7921-dimensional convolved feature, and since there are 400 features, each sample (example) yields a convolved feature vector of 7921 * 400 = 3,168,400 dimensions. Training a classifier with more than 3 million feature inputs is unwieldy and prone to overfitting.
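The dimension count in this example can be checked directly:

conv_dim = (96 - 8 + 1) * (96 - 8 + 1)  # convolved features per filter: 7921
total = conv_dim * 400                  # 400 learned features
print(conv_dim, total)                  # 7921 3168400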
To solve this problem, first recall that we chose convolution because images have a "stationarity" property: a feature that is useful in one region of an image is likely to be just as useful in another region. Therefore, to describe a large image, a natural idea is to aggregate statistics of a feature at different locations; for example, one can compute the mean (or maximum) value of a particular feature over a region of the image. These summary statistics not only have a much lower dimension (compared with using all of the extracted features) but also tend to improve results (they are less prone to overfitting). This aggregation operation is called pooling, sometimes referred to as mean pooling or max pooling depending on how the pooling statistic is computed.
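For concreteness, here is a minimal NumPy sketch of max pooling over non-overlapping regions (the function name and the edge-trimming choice are mine, not Caffe's API):

import numpy as np

def max_pool(feature_map, size):
    # Max-pool a 2-D map over non-overlapping size x size regions.
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size   # trim so the map divides evenly
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))      # one maximum per region

fmap = np.random.rand(89, 89)           # one 89x89 convolved feature map
print(max_pool(fmap, 8).shape)          # (11, 11): far fewer dimensions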