Convolutional Neural Networks (CNN)

Source: Internet
Author: User
Tags: types of filters

The convolutional neural network (CNN) is a foundation of deep learning. A traditional fully-connected neural network takes a vector of numerical values as input.

To work with image data, you first need to extract features from the image and down-sample them. A CNN combines feature extraction (convolution), down-sampling, and a traditional fully-connected network into a single network.



This blog post assumes you are already familiar with basic neural-network concepts such as "layer" and "neuron".

1. Theoretical basis

                  Figure 1

As shown in Figure 1, this is a simple convolutional neural network. A C layer is obtained by filtering (convolving) the input, and is called a convolution layer. An S layer is obtained by subsampling the previous layer (today this usually means max-pooling). Here C1 and C3 are convolution layers, and S2 and S4 are subsampling layers.

Each C and S layer consists of several two-dimensional planes, and each plane is a feature map.

Using the CNN in Figure 1 as an example, here is how an image is processed:

After the image enters the network, it is convolved with three filters to produce the three feature maps of layer C1. Each C1 feature map is subsampled to produce the three feature maps of layer S2. These are then convolved with a filter to produce the three feature maps of C3, and, as before, subsampled to produce the three feature maps of S4. Finally, the S4 feature maps are flattened into a vector, and this vector is fed into a traditional fully-connected network for classification.
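To make the flow concrete, here is a minimal PyTorch sketch of the Figure 1 pipeline. Only the layer order (convolution, subsample, convolution, subsample, flatten, fully connected) and the three feature maps per convolution layer come from the figure; the 28x28 input, 5x5 filters, 2x2 max-pooling, and 10 output classes are assumptions for illustration.

import torch
import torch.nn as nn

class Figure1CNN(nn.Module):
    def __init__(self, num_classes=10):              # assumed class count
        super().__init__()
        self.c1 = nn.Conv2d(1, 3, kernel_size=5)     # 3 filters -> 3 feature maps (C1)
        self.s2 = nn.MaxPool2d(2)                    # subsampling (S2)
        self.c3 = nn.Conv2d(3, 3, kernel_size=5)     # 3 feature maps (C3)
        self.s4 = nn.MaxPool2d(2)                    # subsampling (S4)
        self.fc = nn.Linear(3 * 4 * 4, num_classes)  # 4x4 follows from the assumed 28x28 input

    def forward(self, x):
        x = self.s2(torch.relu(self.c1(x)))
        x = self.s4(torch.relu(self.c3(x)))
        x = x.flatten(1)                             # flatten the S4 maps into a vector
        return self.fc(x)

out = Figure1CNN()(torch.randn(1, 1, 28, 28))        # -> shape (1, 10)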

The size of every feature map in C1, S2, C3, and S4 can be described in pixels x pixels. Isn't any image's size described that way? Yes, but here it matters more, because these feature maps form the convolution and subsampling layers of a neural network, and in a neural network each layer consists of neurons. Each pixel in a feature map is exactly one neuron. So the total number of pixels across all feature maps in a layer is the number of neurons in that layer. How to compute it is shown below.

Hidden layers:

With the concept of neurons in place, we can discuss the hidden layers between the C and S layers of a CNN (the hidden layers of the fully-connected part are not covered here), along with the number of neurons and the number of connections between layers.

1. The hidden layer between the input layer and the convolution layer:

A filter sits between these two layers, and is generally described as:

filter_width x filter_height x filter_channels → filter_types

Here filter_width and filter_height are the width and height of the region the filter covers, filter_channels is the number of channels the filter operates on, and filter_types is the number of distinct filters.

For example, 5x5x3→20 means: each filter covers a region 5 pixels wide and 5 pixels high across 3 channels, and there are 20 such filters in total.

The concepts of local receptive fields and weight sharing are covered below.

  

In general, if every neuron in a C layer were connected to every pixel of the input image, the number of connections would be enormous, and the number of parameters to learn correspondingly jaw-dropping.

For example, suppose the input image is 200x200 and layer C1 has 6 feature maps, each of size 20x20 (that is, 20x20 = 400 neurons per map). Suppose the filter is 10x10 and single-channel (channels = 1), with a stride of 10, so that adjacent filter regions do not overlap; this keeps the arithmetic simple. Fully connected, the number of connections would be 200x200 x (6x20x20) = 96,000,000. If each connection had its own parameter, that would be over 90 million learnable parameters. A network that complex is hopeless; the Earth would be destroyed before the parameters finished training...

In everyday life, when we look at something, we usually notice its parts first; we rarely take in the whole thing at once. The "local receptive field" captures this phenomenon, and CNNs exploit it: each neuron in a convolution layer is connected only to the pixels in a local region of the input image, and the size of that region is exactly the filter size, filter_width x filter_height.

A feature map in a C layer is the result of passing the input image through one filter. Suppose this filter extracts the "Tiga" feature. Each neuron on the feature map is connected to the local region of the original image that it corresponds to (that is, to the pixels in that region). Once every neuron has extracted the "Tiga" feature of its own region, taken together they have extracted the "Tiga" feature of the whole image. That is the great advantage of local receptive fields!

Now let's recompute the number of connections. Each neuron connects to only one 10x10 region, so there are 10x10 x (6x20x20) = 240,000 connections. Down from 96 million to 240,000! And that's not all: CNNs also make many parameters repeat, so the number that actually needs training is smaller still.
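If you want to double-check these counts, a few lines of Python reproduce them (all numbers taken directly from the example above):

input_pixels = 200 * 200           # pixels in the input image
c1_neurons   = 6 * 20 * 20         # 6 feature maps, 20x20 neurons each
print(input_pixels * c1_neurons)   # fully connected: 96000000
print(10 * 10 * c1_neurons)        # 10x10 local receptive fields: 240000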

PS: Each neuron's value is obtained by convolving the filter with all the pixel values in the region of the original image that the neuron covers.
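Here is a minimal NumPy sketch of that computation for the running example: a 10x10 filter slid over a 200x200 image with stride 10, each output value being the dot product of the filter with one image patch. The random image and filter values are placeholders, and plain loops are used for clarity rather than speed.

import numpy as np

image  = np.random.rand(200, 200)   # placeholder input image
filt   = np.random.rand(10, 10)     # one 10x10 single-channel filter
stride = 10

n = (image.shape[1] - filt.shape[1]) // stride + 1   # output width
m = (image.shape[0] - filt.shape[0]) // stride + 1   # output height
fmap = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        patch = image[i*stride:i*stride+10, j*stride:j*stride+10]
        fmap[i, j] = np.sum(patch * filt)            # one neuron's value

print(fmap.shape)   # (20, 20) -- the 20x20 feature map from the example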

Next, let's talk about weight sharing. We already know that the input image produces a feature map by being convolved with one filter, and that each neuron on the feature map is connected to the pixels of one rectangular region of the original image, 10x10 in the example above. So each neuron has 10x10 connections to the input. Because all the neurons on one feature map are extracting the same feature, they are filtered by the same filter, and therefore the 10x10 weights are identical for every neuron. Identical! In other words, these 10x10 parameters are shared by all neurons on the feature map. That is weight sharing! So even with 6 feature maps, only 6x10x10 = 600 parameters need training!

Looked at another way, these 10x10 parameters belong to the filter itself: 6 filters, each with 100 parameters, and it is precisely these parameters that get trained. It is like a filter that learns itself. Suddenly it feels rather magical.

Now we know:

number of training parameters = (filter size + optional bias) x filter_types

where:

i. filter size = filter_width x filter_height x filter_channels, the size of one filter

ii. bias: optional, generally 1 per filter

iii. filter_types: the number of filters

e.g. if the hidden layer's filter is defined as 5x5x3→20, then the number of training parameters = (5x5x3 + 1) x 20 = 1520.
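As a sanity check, the formula is easy to wrap in a small Python function; the two calls below reproduce the 1520 figure and the 600 shared parameters from the earlier 10x10 example (which used no bias):

def num_params(filter_w, filter_h, channels, types, bias=1):
    return (filter_w * filter_h * channels + bias) * types

print(num_params(5, 5, 3, 20))           # (5x5x3 + 1) x 20 = 1520
print(num_params(10, 10, 1, 6, bias=0))  # 10x10 x 6 = 600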

    

Number of neurons in a convolution layer = number of feature maps x number of neurons on each feature map

The number of neurons on each feature map is determined by the width and height of the input, the width and height of the filter, and the filter's stride.

Let the width and height of a feature map be N and M respectively. Given the input size, the filter size, and the stride, they are:

N = (input_width - filter_width) / stride + 1

M = (input_height - filter_height) / stride + 1

number of neurons = N x M

Checking against the example above: N = M = (200 - 10) / 10 + 1 = 20, so each feature map has 20x20 = 400 neurons.

The derivation is simple: sliding the filter horizontally with the given stride, its left edge can sit at positions 0, stride, 2 x stride, ..., up to input_width - filter_width, which gives N positions; the same argument vertically gives M.

To summarize:

number of neurons on a feature map = N x M, which is determined by the input width and height, the filter width and height, and the stride, via the formula above

number of connections = number of neurons x filter area size

In the running example: (6 x 20 x 20) x (10 x 10) = 240,000, matching the count we got earlier.
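The same formulas in code, checked against the running example (a 200x200 input, 10x10 filter, stride 10):

def feature_map_size(input_w, input_h, filter_w, filter_h, stride):
    n = (input_w - filter_w) // stride + 1   # feature map width
    m = (input_h - filter_h) // stride + 1   # feature map height
    return n, m

n, m = feature_map_size(200, 200, 10, 10, 10)
print(n, m)              # 20 20
print(n * m)             # 400 neurons per feature map
print(6 * n * m * 100)   # 240000 connections (6 maps, 10x10 filter area)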

  

2. The hidden layer between the convolution layer and the subsampling layer:

What does the subsampling layer do? It reduces the resolution of each feature map, typically by taking the maximum (max-pooling) or the average over small non-overlapping windows. This shrinks the representation and makes it more tolerant of small shifts in the input.

Because pooling itself has no weights (or, in older designs such as LeNet, just one trainable coefficient and one bias per feature map), the subsampling layer contributes few or no parameters. Its number of neurons is the pooled feature-map size times the number of feature maps, and its number of connections is the number of neurons times the pooling window area (plus the bias connection, if there is one).
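A minimal NumPy sketch of max-pooling over 2x2 non-overlapping windows; the 2x2 window is an assumption for illustration (older LeNet-style networks instead applied a trainable coefficient and bias to the window average):

import numpy as np

fmap = np.random.rand(20, 20)                         # placeholder feature map
pooled = fmap.reshape(10, 2, 10, 2).max(axis=(1, 3))  # max over each 2x2 window
print(pooled.shape)                                   # (10, 10): width and height halved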

2. Calculation Practice

The following uses Yann LeCun's handwritten-digit-recognition CNN, LeNet-5, as an example and calculates some of its parameters:

Figure 2

(The figure is blurry in the original paper itself, not because of anything I did; bear with it.)
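As a taste of the calculation, here is layer C1 of LeNet-5 worked out with the formulas from Section 1. The figures (a 32x32 input, six 5x5 single-channel filters, stride 1) are from LeCun et al. (1998); note that the paper counts each filter's bias as a connection too, hence the +1 inside the connection count.

filter_area = 5 * 5
n = (32 - 5) // 1 + 1                     # 28: C1 feature maps are 28x28
neurons = 6 * n * n                       # 4704 neurons
params  = (filter_area + 1) * 6           # 156 trainable parameters
conns   = (filter_area + 1) * 6 * n * n   # 122304 connections
print(n, neurons, params, conns)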
