C++ convolutional neural network example: tiny_cnn code explanation (10) -- layer_base and layer Class Structure Analysis


In the previous blog posts we analyzed most of the layer structure classes. This post turns to the last two, the base classes layer_base and layer that sit at the bottom of the class hierarchy, and gives them a brief analysis. Since layer is only a thin specialization of layer_base, the discussion here concentrates on layer_base.

First, we outline the basic structure of the layer_base class.

I. Member Variables

Since layer_base is the base class of the whole architecture and the cornerstone on which every network layer is built, it encapsulates the basic attributes of a network layer and therefore carries a fairly large number of member variables.

The basic meanings of these member variables are as follows:

(1) in_size_ and out_size_: the input and output data sizes of the current layer.

(2) parallelize_: Boolean flag marking whether the project uses TBB multi-threaded acceleration.

(3) next_ and prev_: two pointers of layer_base type that point to the next layer and the previous layer of the current layer; they are the key links that maintain the inter-layer connections.

(4) a_: holds the intermediate (pre-activation) results of the current layer's convolution operation.

(5) output_: the final feature output of the current layer after the activation function is applied.

(6) prev_delta_: the error sensitivity propagated back to the previous layer (used in the gradient descent method).

(7) W_ and b_: convolution kernel weights and biases of the current layer.

(8) dW_ and db_: weight and bias derivatives, used to update the weights and biases.

(9) Whessian_ and bhessian_: variables related to the Hessian matrix; their exact meaning will be explained in a later post.

(10) prev_delta2_: the second derivative of the error with respect to the input, used mainly for error calculation in the fully connected layer.
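
As a rough orientation, here is a minimal sketch of what these declarations might look like, reconstructed purely from the descriptions above (float_t, vec_t, layer_size_t, and CNN_TASK_SIZE stand in for tiny_cnn's own typedefs and constants; the per-worker array nesting of the outputs and gradients is an assumption):

    #include <vector>
    #include <cstddef>

    // Stand-ins for tiny_cnn's own typedefs, for illustration only.
    typedef double float_t;
    typedef std::vector<float_t> vec_t;
    typedef std::size_t layer_size_t;
    #define CNN_TASK_SIZE 8   // number of parallel workers (value is an assumption)

    class layer_base {        // sketch reconstructed from the descriptions, not the actual source
    protected:
        layer_size_t in_size_, out_size_;   // input / output data sizes of this layer
        bool parallelize_;                  // whether TBB acceleration is enabled

        layer_base* next_;                  // next layer in the network
        layer_base* prev_;                  // previous layer in the network

        vec_t a_[CNN_TASK_SIZE];            // pre-activation (weighted-sum) results
        vec_t output_[CNN_TASK_SIZE];       // feature output after activation
        vec_t prev_delta_[CNN_TASK_SIZE];   // error sensitivity passed back to the previous layer

        vec_t W_, b_;                       // kernel weights and biases
        vec_t dW_[CNN_TASK_SIZE];           // weight gradients
        vec_t db_[CNN_TASK_SIZE];           // bias gradients
        vec_t Whessian_, bhessian_;         // Hessian (second-derivative) terms
        vec_t prev_delta2_;                 // second derivative of the error w.r.t. the input
    };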

II. Constructor

The constructor is very simple: it calls the set_size() member function to initialize the layer's variables:

    layer_base(layer_size_t in_dim, layer_size_t out_dim, size_t weight_dim, size_t bias_dim)
        : parallelize_(true), next_(nullptr), prev_(nullptr) {
        set_size(in_dim, out_dim, weight_dim, bias_dim); // initialize the layer's parameters
    }

Note that the parallelize_ flag is initialized to true by default, that is, TBB acceleration is enabled by default. As for the set_size() function, it initializes the parameters by calling the vector member function resize().

III. Weight Initialization

Weight initialization is done mainly through the set_size() function (note that this function is not only called from the constructor). As mentioned above, it essentially just calls resize():

    void set_size(layer_size_t in_dim, layer_size_t out_dim, size_t weight_dim, size_t bias_dim) {
        in_size_ = in_dim;
        out_size_ = out_dim;
        W_.resize(weight_dim);
        b_.resize(bias_dim);
        Whessian_.resize(weight_dim);
        bhessian_.resize(bias_dim);
        prev_delta2_.resize(in_dim);
        for (auto& o : output_)     o.resize(out_dim);
        for (auto& a : a_)          a.resize(out_dim);
        for (auto& p : prev_delta_) p.resize(in_dim);
        for (auto& dw : dW_) dw.resize(weight_dim);
        for (auto& db : db_) db.resize(bias_dim);
    }

Note that a range-based for loop is used to traverse the elements of the vector containers. This is a C++11 feature that may take some getting used to, but from a traversal standpoint it really is more convenient and safer than the traditional for loop.

IV. Pure Virtual Functions

Since layer_base is a common base class, it needs to declare virtual and pure virtual functions for the various derived layer types to override. The author declares the activation function and the forward/backward propagation routines as pure virtual, and the reason is clear: the forward/backward propagation algorithms differ from layer to layer, and the activation function is selectable:

    /* Declare the activation function and the forward/backward propagation routines as pure virtual functions */
    virtual activation::function& activation_function() = 0;
    virtual const vec_t& forward_propagation(const vec_t& in, size_t worker_index) = 0;
    virtual const vec_t& back_propagation(const vec_t& current_delta, size_t worker_index) = 0;
    virtual const vec_t& back_propagation_2nd(const vec_t& current_delta2) = 0; // name garbled in the source; reconstructed as the second-order backward pass

V. Saving Intermediate States

Because convolutional neural networks take a long time to train, an interface for saving intermediate training results is needed so that training can be checkpointed and resumed. layer_base therefore provides the member functions save and load for saving and loading the intermediate training state of the network:

    /* Save the weights and biases (intermediate training results) of the layer */
    virtual void save(std::ostream& os) const {
        if (is_exploded()) throw nn_error("failed to save weights because of infinite weight");
        for (auto w : W_) os << w << " ";
        for (auto b : b_) os << b << " ";
    }

    /* Load the intermediate training values */
    virtual void load(std::istream& is) {
        for (auto& w : W_) is >> w;
        for (auto& b : b_) is >> b;
    }

Here stream operations handle the input and output of the results, which also shows off C++'s strengths.
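
As a quick illustration (not from the original post), checkpointing a layer through these interfaces might look like this; the file name is made up for the example, and some_layer stands for any object derived from layer_base:

    #include <fstream>

    // Hypothetical usage: write a layer's weights to disk, then restore them.
    void checkpoint_example(layer_base& some_layer) {
        {
            std::ofstream ofs("layer_weights.txt"); // file name chosen for the example
            some_layer.save(ofs);                   // writes W_ and b_ as text
        }
        {
            std::ifstream ifs("layer_weights.txt");
            some_layer.load(ifs);                   // reads them back in the same order
        }
    }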

VI. Weight Update

layer_base performs two main operations on its weights. The first is the initialization of the weight and bias parameters in set_size(), described above; the second is the update performed by the update_weight() function. update_weight() updates the weights and biases by calling the update() function of whichever convergence algorithm is in use (for example, the gradient_descent_levenberg_marquardt algorithm used by default):
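
The update_weight() listing is not reproduced here; the fragment below is only a rough sketch of the calling pattern being described, assuming the optimizer exposes an update() method taking the gradient, the Hessian term, and the parameter vector (the merge and clear_diff steps are assumptions, and the exact signature in tiny_cnn may differ):

    // Sketch only: how layer_base might hand its gradients to the optimizer.
    template <typename Optimizer>
    void update_weight(Optimizer& o, size_t worker_size) {
        merge(worker_size);              // hypothetical step: accumulate the per-worker dW_/db_
        o.update(dW_[0], Whessian_, W_); // update kernel weights from first- and second-order info
        o.update(db_[0], bhessian_, b_); // update biases the same way
        clear_diff(worker_size);         // hypothetical step: reset gradients for the next batch
    }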

The implementation details of the update function depend on the convergence algorithm used; that part will be covered in detail when the convergence algorithms (the Optimizer structures) are introduced. But even from the call site it is clear that updating the weights in the BP algorithm requires dW_ (the first-order derivatives) and the Hessian terms (the second-order derivatives).

VII. Attribute Accessors

These member functions appear in almost every network layer; they let you inspect the layer's parameter information and its feature output. They fall into two groups: simple accessors that return the relevant member variables of the layer (sometimes after a small internal computation), and the output_to_image() conversion functions, which turn the convolution kernels and feature outputs into images for viewing. Both were covered in previous posts and are not repeated here.
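
For concreteness, a couple of representative accessors might look like the fragment below; the exact names vary per layer and are not verified against the original source, so treat this only as an illustration:

    // Illustrative accessors, names based on the descriptions above.
    layer_size_t in_size()  const { return in_size_; }
    layer_size_t out_size() const { return out_size_; }
    const vec_t& output(int worker_index) const { return output_[worker_index]; }
    size_t param_size() const { return W_.size() + b_.size(); } // weights plus biases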

VIII. layer Structure Analysis

Compared with layer_base, the layer class is much simpler. Its members fall into three groups: instantiating the activation function, instantiating the save/load functions, and defining error messages.

8.1 Activation Function Instantiation

Since the activation function is declared as a pure virtual function in layer_base, the author chooses to instantiate it in the subclass layer:
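
The listing for this override is not shown here; a minimal sketch of the idea, assuming layer is a class template parameterized on the activation type (the member name h_ and the exact constructor signature are assumptions), is:

    // Sketch: layer instantiates the activation function left pure virtual in layer_base.
    template <typename Activation>
    class layer : public layer_base {
    public:
        layer(layer_size_t in_dim, layer_size_t out_dim, size_t weight_dim, size_t bias_dim)
            : layer_base(in_dim, out_dim, weight_dim, bias_dim) {}

        activation::function& activation_function() override { return h_; }

    protected:
        Activation h_;   // e.g. a sigmoid, tanh, or ReLU object from the Activation class family
    };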

This involves the Activation class, which encapsulates the various types of activation functions. One or two follow-up posts will be devoted to analyzing that class.

8.2 Saving and Loading the Intermediate Training Values

There is little to elaborate on here: input and output go through stream operators defined on std::basic_ostream and std::basic_istream:

    template <typename Char, typename CharTraits>
    std::basic_ostream<Char, CharTraits>& operator << (std::basic_ostream<Char, CharTraits>& os, const layer_base& v) {
        v.save(os);
        return os;
    }

    template <typename Char, typename CharTraits>
    std::basic_istream<Char, CharTraits>& operator >> (std::basic_istream<Char, CharTraits>& os, layer_base& v) {
        v.load(os);
        return os;
    }

8.3 Error Message Functions

Three error-reporting functions are defined in layer: connection mismatch, input feature dimension mismatch, and subsampling (pooling) dimension mismatch:

(1) Connection mismatch function connection_mismatch: called when the program finds that the feature output dimension of the current layer differs from the feature input dimension of the next layer; it formats the error message and names the layer where the problem occurred.

(2) Input feature dimension mismatch function data_mismatch: called when the program finds that the input data dimension does not match the input dimension of the current layer; it formats the error message and names the layer where the problem occurred.

(3) Subsampling dimension mismatch function pooling_size_mismatch: called mainly when the program finds that the current feature dimension is not divisible by the subsampling window size; it formats the error message and names the layer where the problem occurred.

Note that these three functions are only responsible for formatting and printing the error messages; the actual error-checking logic must be written at each place where such an error could occur.
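
As an illustration of the pattern only, a connection_mismatch-style helper could be written roughly as below; the message text is invented, and the in_size()/out_size() accessors are the hypothetical ones sketched earlier:

    #include <sstream>

    // Sketch of an error-reporting helper: format a message and throw tiny_cnn's nn_error.
    void connection_mismatch(const layer_base& from, const layer_base& to) {
        std::ostringstream os;
        os << "output size of the current layer (" << from.out_size() << ") "
           << "must equal the input size of the next layer (" << to.in_size() << ")";
        throw nn_error(os.str());
    }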

IX. Precautions

1. Range-based for Loops

Containers are traversed frequently in the tiny_cnn project, and all of them use range-based for loops. This may take some getting used to for readers accustomed to the traditional for loop, but the range-based form is safer and simpler, and it will be used more and more.
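
The difference is small, as this side-by-side sketch shows (output_ and out_dim are the names from set_size() above):

    // Traditional index-based loop:
    for (size_t i = 0; i < CNN_TASK_SIZE; ++i)
        output_[i].resize(out_dim);

    // C++11 range-based loop: no index to get wrong, and the element type is deduced.
    for (auto& o : output_)
        o.resize(out_dim);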

2. The functions of layer_base are not completely described.

The member functions of the layer_base class have not all been described above; a few helpers used in minor places will be explained when they come up.

3. The activation function is not the same as the convergence algorithm

Here we stress a distinction that often confuses beginners: activation functions and convergence algorithms are two completely different things. Activation functions include sigmoid, tanh, and ReLU, while convergence algorithms mainly refer to gradient-descent-style optimization.
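
A tiny sketch makes the distinction concrete: the activation function shapes a single neuron's output, while the convergence algorithm decides how a weight is moved by its gradient:

    #include <cmath>

    // Activation function: maps a neuron's weighted sum to its output.
    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Convergence algorithm (plain gradient descent): updates one weight from its gradient.
    // 'alpha' is the learning rate; dW is dE/dW for this weight.
    double gradient_descent_step(double w, double dW, double alpha) {
        return w - alpha * dW;
    }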
