Transferred from: https://ymgd.github.io/codereader/2016/10/20/caffe_sourcecode_analysis/
1. Preface
Image processing and natural language processing nowadays use neural networks / deep learning in many places, and the seemingly magical results have made a great many programmers eager to learn. But after seeing the long series of formulas in deep learning material, their scalps go numb and many give up on the whole idea.
From the point of view of industrial use, if you do not intend to do cutting-edge research and just want to use existing methods or frameworks to complete some tasks, there is no need to get stuck on the theory. It is better to look at neural networks simply, as a process of assembling building blocks: what distinguishes a convolutional neural network (CNN) from a recurrent neural network (RNN) is only the blocks (layers and functions) and the way they are put together, and there is a whole body of theory to help us match the blocks to the task at hand.
Much of the mathematical background may have been forgotten, but the daily habit of writing code has not, so studying the code of an excellent open-source deep learning framework may well be a good entry point into learning neural networks.
What is organized and shared here is Caffe, a very widely used deep learning framework. One of the earliest such frameworks, it originated at Berkeley and is used for a great many neural network tasks: a large number of paper experiments are run with it, and many computer vision applications at e-commerce and other Internet companies are built on it. Its code structure is clear and well suited for study.

2. Caffe Code Structure

2.1 General Overview
A typical neural network is a hierarchy: each layer performs a different operation (it is easy to think of each layer as having a different function). Stacking these operations makes up the forward pass; comparing the result with the "standard answer" gives the "gap" (the loss); back-propagation then computes what is needed to correct the "building-block structure" (the parameters), after which the parameter update is carried out.
Caffe accordingly defines a set of interlocking classes to carry out this process. Clearly it must involve the data, the network layers, the network structure, and the optimization of the network; Caffe is organized along exactly these lines, and its source directory reflects this structure.
In many introductions to Caffe you can see that Blob, Layer, Net, and Solver run through the whole framework and are its main classes. These four classes are responsible for data transmission, the network layers, the network skeleton, and the parameter-solving strategy respectively, forming a bottom-up, interlocking whole. Implementations under these names can be found in the source code. In detail, the four parts are responsible for:

Blob: the medium of data transmission. The input and output data of the network, the weight parameters, and so on are all stored in the Blob data structure.

Layer: the basic unit of the neural network; both the forward and the backward pass between layers are implemented here. Because network design involves layers of many kinds, the layers implemented here (convolution layers, activation layers, pooling layers, fully connected layers, and so on) form a very rich set of "building blocks".

Net: the skeleton of the whole network; it integrates the layers into the network's hierarchical structure.

Solver: the optimization strategy for solving the network, which makes the network you assembled from the various "building blocks" fit the samples of the current task as well as possible. If you do research on deep learning optimization, this is the module you would modify.
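As a concrete picture of how the four classes interlock, here is a hedged sketch of a minimal training entry point. It follows the pattern of tools/caffe.cpp in a recent caffe-master; the file name solver.prototxt is a placeholder.

    #include <boost/shared_ptr.hpp>
    #include "caffe/caffe.hpp"

    int main() {
      // SolverParameter (defined in caffe.proto) carries the optimization strategy.
      caffe::SolverParameter solver_param;
      caffe::ReadSolverParamsFromTextFileOrDie("solver.prototxt", &solver_param);

      // The Solver builds the Net, the Net builds its Layers, and the Layers
      // allocate Blobs; Solve() then drives the forward/backward passes and
      // the parameter updates.
      boost::shared_ptr<caffe::Solver<float> > solver(
          caffe::SolverRegistry<float>::CreateSolver(solver_param));
      solver->Solve();
      return 0;
    }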
2.2 Code Reading Order Suggestions
Once you have a general impression of the overall structure, you can begin reading the source code. One reference reading order is roughly:
Step 1. caffe.proto: in the directory caffe-master\src\caffe\proto\caffe.proto
Step 2. The .hpp files, including:
a. solver.hpp — caffe-master\include\caffe\solver.hpp
b. net.hpp — caffe-master\include\caffe\net.hpp
c. layer.hpp — caffe-master\include\caffe\layer.hpp
d. blob.hpp — caffe-master\include\caffe\blob.hpp
The four parts d, c, b, a above are in fact a bottom-up structure.
Step 3. The .cpp/.cu files: the concrete implementations of the Blob, Net, and Solver classes mentioned above, so you will see blob.cpp, net.cpp, and solver.cpp. Note, however, that there is no layer.cpp; instead, under \src\caffe\layers you can find the classes derived for the various neural network layers, such as \src\caffe\layers\data_layer.cpp, conv_layer.cpp, relu_layer.cpp, pooling_layer.cpp, inner_product_layer.cpp, and so on. (Generally speaking, the Caffe framework already implements the most common network structures; if you have needs of your own, you can add some new layers.)
Step 4. The tools files: Caffe provides a set of tools in the caffe-master\tools directory, for example for computing image means, fine-tuning networks, visualization, and so on.
2.3 Source Code Main-Line Structure Diagram

(The original post shows a simplified diagram of the main line through the Caffe source code here.)
2.4 Code Details

2.4.1 caffe.proto
caffe.proto is the recommended first part to read; it sits in the ...\src\caffe\proto directory. First, a note on Google Protocol Buffers (protobuf): it is a cross-language data standard used inside Google, a lightweight and efficient structured-data storage format that can serialize structured data, used for data storage or as an RPC data interchange format. Compiling caffe.proto generates the two files caffe.pb.cc and caffe.pb.h, which contain many structured-data classes.
Each message in caffe.proto defines a parameter structure that needs to be transferred, and package caffe puts everything defined in caffe.proto under the caffe namespace. The approximate code skeleton is as follows:
    package caffe;

    message BlobProto { ... }
    message BlobProtoVector { ... }
    message Datum { ... }
    ...
    message V0LayerParameter { ... }
A message defines a parameter structure to be transferred: a required field must be given a value, an optional field may be omitted, and a repeated field holds a vector of values of the stated type. For example:
    message NetParameter {
      optional string name = 1;  // consider giving the network a name
      // DEPRECATED. See InputParameter. The input blobs to the network.
      repeated string input = 3;
      // DEPRECATED. See InputParameter. The shape of the input blobs.
      repeated BlobShape input_shape = 8;

      // 4D input dimensions -- deprecated. Use "input_shape" instead.
      // If specified, for each input blob there should be four
      // values specifying the num, channels, height and width of the input blob.
      // Thus, there should be a total of (4 * #input) numbers.
      repeated int32 input_dim = 4;

      // Whether the network will force every layer to carry out backward operation.
      // If set False, then whether to carry out backward is determined
      // automatically according to the net structure and learning rates.
      optional bool force_backward = 5 [default = false];
      // The current "state" of the network, including the phase, level, and stage.
      // Some layers may be included/excluded depending on this state and the states
      // specified in the layers' include and exclude fields.
      optional NetState state = 6;

      // Print debugging information about results while running Net::Forward,
      // Net::Backward, and Net::Update.
      optional bool debug_info = 7 [default = false];

      // The layers that make up the net. Each of their configurations, including
      // connectivity and behavior, is specified as a LayerParameter.
      repeated LayerParameter layer = 100;  // ID 100 so layers are printed last.

      // DEPRECATED: use 'layer' instead.
      repeated V1LayerParameter layers = 2;
    }
After compilation, each message in caffe.proto automatically gets a set of generated functions, roughly following a naming convention: set_ + field sets the field's value, has_ checks whether the field has been set, clear_ clears the field, mutable_ returns a mutable pointer (e.g. for string fields), and _size gives the number of elements in a repeated field.
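As a hedged illustration of this naming convention, the following sketch uses the standard protobuf-generated API on NetParameter once caffe.pb.h has been generated (the field values here are made up):

    #include "caffe/proto/caffe.pb.h"

    void accessor_demo() {
      caffe::NetParameter net_param;
      net_param.set_name("my_net");           // set_ + field: assign a value
      bool named = net_param.has_name();      // has_: has the field been set?
      net_param.add_input("data");            // add_: append to a repeated field
      int n_inputs = net_param.input_size();  // _size: count of repeated entries
      net_param.clear_name();                 // clear_: unset the field
      (void)named; (void)n_inputs;            // silence unused-variable warnings
    }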
The messages fall roughly into a few categories:

Belonging to Blob: BlobProto, BlobProtoVector, Datum.

Belonging to Layer: FillerParameter, LayerParameter, ArgMaxParameter, TransformationParameter, LossParameter, AccuracyParameter, ConcatParameter, ContrastiveLossParameter, ConvolutionParameter, DataParameter, DropoutParameter, DummyDataParameter, EltwiseParameter, ExpParameter, HDF5DataParameter, HDF5OutputParameter, HingeLossParameter, ImageDataParameter, InfogainLossParameter, InnerProductParameter, LRNParameter, MemoryDataParameter, MVNParameter, PoolingParameter, PowerParameter, PythonParameter, ReLUParameter, SigmoidParameter, SliceParameter, SoftmaxParameter, TanHParameter, ThresholdParameter, and so on.

Belonging to Net: NetParameter, SolverParameter, SolverState, NetState, NetStateRule, ParamSpec.

2.4.2 Blob
As mentioned above, Blob is the most basic data structure; it stores the data produced as it flows through the network as well as the learned parameters. For example, Layer holds its learnable parameters in the form vector<shared_ptr<Blob<Dtype> > > blobs_;, where Blob is exactly the class defined here. Part of its code is as follows:
    template <typename Dtype>
    class Blob {
     public:
      Blob()
          : data_(), diff_(), count_(0), capacity_(0) {}

      /// @brief Deprecated; use <code>Blob(const vector<int>& shape)</code>.
      explicit Blob(const int num, const int channels, const int height,
          const int width);
      explicit Blob(const vector<int>& shape);

      /// @brief Deprecated; use <code>Reshape(const vector<int>& shape)</code>.
      void Reshape(const int num, const int channels, const int height,
          const int width);
      ...
Here template <typename Dtype> declares a template; Dtype can stand for data types such as float or double. A Blob is a four-dimensional contiguous array (4-D contiguous array, e.g. type = float32). If its shape is written (N, C, H, W), the meaning of each dimension is:

N: number. The amount of input data, for example the mini-batch size for SGD.

C: channel. For image data, it can be taken as the number of channels.

H, W: height and width. For image data, they can be taken as the height and width of the image.

The value at position (n, c, h, w) of a Blob is physically located at index ((n * C + c) * H + h) * W + w.
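As a stand-alone sanity check of this index formula, here is a minimal sketch; the function name blob_index is hypothetical, and Caffe's own equivalent is Blob::offset():

    #include <cstddef>

    // Flat index of element (n, c, h, w) in a row-major array of shape (N, C, H, W).
    inline std::size_t blob_index(int n, int c, int h, int w,
                                  int C, int H, int W) {
      return ((static_cast<std::size_t>(n) * C + c) * H + h) * W + w;
    }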
Inside the Blob there are two memory fields, data and diff: data holds the forward-flowing data (the outputs), while diff stores the gradients computed by back-propagation.
The header files included by blob.hpp can be understood along the following lines:

#include "caffe/common.hpp": the singleton Caffe class; it wraps the random-number generators of boost and CUDA behind a unified interface.

#include "caffe/proto/caffe.pb.h": the generated header mentioned in the previous section.

#include "caffe/syncedmem.hpp": mainly memory allocation and deallocation. The class SyncedMemory defines memory-allocation management and synchronization between the CPU and the GPU. Blob uses SyncedMemory to decide automatically when to copy data, which improves efficiency: a copy is made only when the GPU or CPU side actually needs one.

#include "caffe/util/math_functions.hpp": wraps many CBLAS matrix operations; basically the processing functions for matrices and vectors.
The functions defined in Blob can be briefly described as follows:

Reshape(): changes the size of a Blob;

ReshapeLike(): reallocates space for data and diff at the same size as another Blob;

num_axes(): returns the number of dimensions of the Blob;

count(): computes count = num * channels * height * width;

offset(): gets the flat offset of position (n, c, h, w) in the Blob's data;

CopyFrom(): copies data from a source Blob, with copy_diff as a flag distinguishing whether to copy data or diff;

FromProto(): reads data in from a proto, which is in fact deserialization;

ToProto(): saves the Blob's data into a proto;

ShareData()/ShareDiff(): make this Blob share the data/diff of another Blob.
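Here is a hedged usage sketch of a few of these functions, assuming a Caffe build to compile and link against (the shape and the written value are made up):

    #include "caffe/blob.hpp"

    void blob_demo() {
      // A blob shaped (N, C, H, W) = (2, 3, 3, 4): a mini-batch of two
      // three-channel 3x4 images.
      caffe::Blob<float> blob(2, 3, 3, 4);
      int total = blob.count();  // 2 * 3 * 3 * 4 == 72
      // offset(n, c, h, w) is exactly the flat index described above.
      float* data = blob.mutable_cpu_data();
      data[blob.offset(1, 2, 0, 3)] = 1.0f;
      (void)total;
    }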
2.4.3 Layer
Layer is the basic unit of the network (the "building block"), from which the various layer classes are derived. If your research involves changing how data features are expressed, this is the part to modify. The derived layer classes obtain their various functions by implementing two virtual functions, Forward() and Backward(): Forward computes the top blobs from the bottom blobs, and Backward does just the opposite. In the network structure definition file (*.prototxt), the bottom and top entries given for each layer determine the number of elements in the corresponding vectors.
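To make the Forward()/Backward() contract concrete, here is a hedged sketch of a derived layer in the style of Caffe's simple layers (compare relu_layer.cpp); the layer itself, which merely doubles its input, is invented for illustration:

    #include <vector>
    #include "caffe/layer.hpp"

    namespace caffe {

    // Hypothetical layer: top = 2 * bottom. Not part of Caffe.
    template <typename Dtype>
    class DoubleLayer : public Layer<Dtype> {
     public:
      explicit DoubleLayer(const LayerParameter& param)
          : Layer<Dtype>(param) {}
      virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top) {
        top[0]->ReshapeLike(*bottom[0]);  // output has the input's shape
      }
      virtual inline const char* type() const { return "Double"; }

     protected:
      // Forward: compute the top blob from the bottom blob.
      virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top) {
        const Dtype* bottom_data = bottom[0]->cpu_data();
        Dtype* top_data = top[0]->mutable_cpu_data();
        for (int i = 0; i < bottom[0]->count(); ++i) {
          top_data[i] = Dtype(2) * bottom_data[i];
        }
      }
      // Backward: propagate the gradient from the top diff to the bottom diff.
      virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down,
          const vector<Blob<Dtype>*>& bottom) {
        if (propagate_down[0]) {
          const Dtype* top_diff = top[0]->cpu_diff();
          Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
          for (int i = 0; i < top[0]->count(); ++i) {
            bottom_diff[i] = Dtype(2) * top_diff[i];  // d(2x)/dx = 2
          }
        }
      }
    };

    }  // namespace caffe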
Let's take a look at layer.hpp.