CNN (Convolutional Neural Network)


Convolutional neural networks (CNNs) date back to the 1960s, when Hubel and Wiesel, studying cells in the cat's visual cortex, showed that the brain processes information from the outside world through a hierarchy of receptive fields. Building on the receptive-field idea, Fukushima proposed the Neocognitron in 1980, the first application of this idea in artificial neural networks. In 1998, LeCun's LeNet-5 model succeeded at handwritten character recognition, drawing academic attention to convolutional neural networks. In 2012, Krizhevsky et al.'s AlexNet won first place in the ImageNet image classification competition, leading the runner-up in accuracy by roughly 11%. After AlexNet, VGG (Visual Geometry Group), GoogLeNet, and ResNet followed.

ANN Simplification Basis

We could also use an ANN (a fully connected network) to classify images, but doing so produces far too many model parameters. For example, if the training images are 100x100x3 and the first hidden layer has 1000 neurons, the weights alone already number 100x100x3x1000 = 3x10^7. We therefore use prior knowledge to simplify the ANN and reduce the number of model parameters; the simplified structure is the CNN.
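The parameter count in this example can be checked with a quick calculation (a sketch using the layer sizes quoted above; biases are ignored for simplicity):

```python
# Weight count for a fully connected first layer: every input pixel value
# connects to every hidden neuron.
input_size = 100 * 100 * 3      # 30,000 input values per image
hidden_neurons = 1000
num_weights = input_size * hidden_neurons
print(num_weights)              # 30,000,000, i.e. 3 x 10^7
```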

The bases for simplifying an ANN are as follows. ① A neuron does not need to be connected to the entire image; connecting it to a local region is enough to detect a local object, as shown in Figure 1-1.

Figure 1-1 Global region and local region

② The same object may appear in different regions of different pictures. For example, a bird's beak may be in the upper-left corner of one picture and in the middle of another, as shown in Figure 1-2.

Fig. 1-2 Beak in different areas

③ Subsampling a picture does not greatly affect the understanding of its content. We can remove every other row and every other column of pixels, shrinking the picture to 1/4 of its original size without affecting our understanding of its content, as shown in Figure 1-3. Subsampling reduces the picture size and hence the number of model parameters.

Figure 1-3 Subsampling a picture

CNN Structure

Convolutional Layer (Convolution Layer)

A convolutional layer mainly performs convolution operations. We first define a filter (also called a convolution kernel), which is simply a matrix (Figure 1-4).

Figure 1-4 Filter

Taking Figures 1-4 and 1-5 as an example, the convolution operation is shown in Figure 1-6. A convolution operation takes the filter and the local region of the image it covers (the red area in Figure 1-5), multiplies corresponding values first, and then accumulates the products. If each value on the filter is f_ij and the pixels of the covered region are a_ij, then the convolution output is

output = Σ_ij f_ij × a_ij

Figure 1-5 Image

Figure 1-6 Convolution operation

After one convolution is completed, the filter is shifted and the convolution is performed again; the shift distance is called the stride. Figure 1-7 shows a stride of 1, moving to the right. Figure 1-8 shows the dynamic process of a single filter completing its convolutions over the whole picture.

Figure 1-7 Step Size

Figure 1-8 Full convolution operation

The effect of a filter's convolution is that when a local region of the image resembles the filter, the output value after convolution is large; the larger the convolution value, the more strongly it indicates that the corresponding object has been detected. The filter covers only a local region of the image, which corresponds to "ANN simplification basis ①". Meanwhile, the filter stays unchanged as it moves, so it can detect the same object in different regions, which corresponds to "ANN simplification basis ②". Because the filter's parameters remain constant during the move, this is also called weight sharing.
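As a minimal sketch of the mechanics described above (multiply the covered pixels by the filter element-wise, accumulate, then slide by the stride), here is a plain NumPy implementation; the 6x6 image and 3x3 diagonal-line filter are illustrative values, not the ones in the figures:

```python
import numpy as np

def conv2d(image, filt, stride=1):
    # Slide the filter over the image; at each position, multiply the
    # covered patch by the filter element-wise and sum the products.
    fh, fw = filt.shape
    oh = (image.shape[0] - fh) // stride + 1
    ow = (image.shape[1] - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+fh, j*stride:j*stride+fw]
            out[i, j] = np.sum(patch * filt)
    return out

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]], dtype=float)
filt = np.array([[ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]], dtype=float)  # responds to a diagonal line

result = conv2d(image, filt)
print(result.shape)     # (4, 4): a 6x6 image with a 3x3 filter, stride 1
print(result[0, 0])     # 3.0: the top-left patch matches the diagonal pattern
```

Note that, as in most deep-learning libraries, this is strictly cross-correlation (the filter is not flipped), which is what the text describes.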

Multi-Filter Convolution Operation

For one input image, we can define several filters and convolve with each of them. For example, convolving with 5 filters yields 5 feature maps of the same size. These feature maps are then stacked into a new image of depth 5, which serves as the input to the next convolutional layer. In general, a convolutional layer outputs a single image whose depth equals the number of filters.
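The stacking described above can be sketched as follows (the map size and filter count are illustrative):

```python
import numpy as np

# Five filters each produce one feature map of the same size; stacking
# them gives a single output "image" whose depth equals the filter count.
n_filters = 5
maps = [np.zeros((4, 4)) for _ in range(n_filters)]   # placeholder feature maps
next_input = np.stack(maps, axis=0)                    # shape: (depth, height, width)
print(next_input.shape)  # (5, 4, 4)
```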

Pooling Layer

The convolution operation in a convolutional layer produces a matrix called the feature map. Pooling divides a feature map into regions and represents each region by a single value. In Figure 1-9, the feature map is divided into four 2x2 regions, each represented by its maximum value (one could also take the mean, the minimum, or other statistics).

Figure 1-9 Maximum pooling operation

The function of the pooling layer is to shrink the feature map, obtaining a smaller map that still characterizes the original image. The pooling operation therefore corresponds to "ANN simplification basis ③".
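The 2x2 max pooling described above can be sketched like this (the feature-map values are illustrative):

```python
import numpy as np

def max_pool(fm, size=2):
    # Split the feature map into size x size regions and keep each region's max.
    h, w = fm.shape
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.array([[ 3, -1, -3,  1],
               [-3,  1,  0, -3],
               [-3, -3,  0,  1],
               [ 3, -2, -2, -1]], dtype=float)
print(max_pool(fm))   # [[3. 1.]
                      #  [3. 1.]]
```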

Parameter Training Process

The training process of a CNN is not much different from that of an ANN: both use BP (backpropagation) to update parameters. However, because a CNN adds weight sharing, the update requires special handling. As shown in Figure 1-10, weights of the same color share the same value during forward propagation. Yet the gradients BP computes for them differ, because their input values differ. To keep the same-colored weights equal, the gradient of each copy of a shared weight is computed separately and then averaged, and finally every copy is updated by the same amount.

Figure 1-10 Weight Sharing
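The averaging step for a shared weight can be sketched as follows (the gradient values and learning rate are made up):

```python
# Each copy of a shared weight sees different inputs, so BP gives each
# copy a different gradient; average them and apply one common update.
grads = [0.2, -0.4, 0.6]           # per-position gradients for one shared weight
avg_grad = sum(grads) / len(grads)
lr = 0.1
w = 1.0
w = w - lr * avg_grad              # every tied copy receives this same new value
print(w)
```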

What Each Layer of the CNN Learns

CNNs have indeed succeeded in image processing, but because of their internal complexity, a CNN is often regarded as a black box. So we still need to explore what the CNN has actually learned.

Detection Objects of the Filters

How do we find out what the filters in each layer are detecting? In the normal forward process, we input a picture, and the larger the output value after convolution, the more strongly the filter indicates it has detected its object. Therefore, we reverse the normal process: with the premise of making the output value as large as possible, we generate the input picture that achieves it.

Denote by a^k_ij each element of the feature map produced by the k-th filter after convolution (Figure 1-10), where k indexes the filter. Define the degree to which the k-th filter is activated as

a^k = Σ_i Σ_j a^k_ij

The evaluation function is a^k, and the objective is to find the input x* = arg max_x a^k, which is solved by gradient ascent.

Figure 1-10 Feature Map
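The gradient-ascent idea can be sketched on a toy example where the "activation" is just the summed output of one small linear filter, so the gradient has a closed form (a real CNN would use automatic differentiation; the filter, image size, step size, and iteration count here are all made up):

```python
import numpy as np

FILT = np.array([[ 1., -1.],
                 [-1.,  1.]])      # toy 2x2 filter

def activation(x):
    # a^k: sum of the filter's feature map over all stride-1 positions
    total = 0.0
    for i in range(x.shape[0] - 1):
        for j in range(x.shape[1] - 1):
            total += np.sum(x[i:i+2, j:j+2] * FILT)
    return total

def grad(x):
    # d a^k / d x: each pixel accumulates the filter weight of every
    # window that covers it (exact for this linear activation).
    g = np.zeros_like(x)
    for i in range(x.shape[0] - 1):
        for j in range(x.shape[1] - 1):
            g[i:i+2, j:j+2] += FILT
    return g

x = np.random.default_rng(0).normal(size=(3, 3))
before = activation(x)
for _ in range(50):
    x += 0.1 * grad(x)             # gradient ascent on the activation
print(activation(x) > before)      # True: the input now excites the filter more
```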

The detection objects learned by the filters are shown in Figure 1-11. Each filter detects a different pattern: some detect vertical lines, others horizontal lines, and so on.

Figure 1-11 Detection object of the filter

Detection Objects of Fully Connected Layer Neurons

We can use the same method as for the filters to find the detection objects of the fully connected layer's neurons, as shown in Figure 1-12. Unlike a filter's detection object, what a fully connected neuron responds to is more like a global pattern than a local one.

Figure 1-12 Detection objects of fully connected layer neurons

Detection Objects of Output-Layer Neurons

When we examine the detection objects of the output-layer neurons using the same method as for the filters, the result, shown in Figure 1-13, is very poor: it is impossible to tell what the generated pictures are.

Fig. 1-13 Detection object of output layer neuron

However, if you feed the "8" from Figure 1-13 into the CNN, the output will indicate that the image is an 8. Deep neural networks are therefore not as intelligent as they seem; they are easy to fool. To make the detection object of the output-layer neuron look more like a digit, we need to modify the evaluation function. For a digit, the strokes occupy only a small fraction of the picture, so we want the generated x* to have as few active pixels as possible. The final evaluation function is

x* = arg max_x ( y_i − Σ_ij |x_ij| )

Compared with the original evaluation function, this simply adds the L1 regularization term Σ_ij |x_ij|. To make the evaluation function large, the class output y_i should be large while the L1 term stays small. The new evaluation function gives a slightly better result, as shown in Figure 1-14.

Figure 1-14 The detection object of the output layer neuron after L1
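A toy sketch of the L1-penalized evaluation function (the class score y_i, the pixel values, and the penalty weight lam are all made up for illustration):

```python
import numpy as np

def score(y_i, x, lam=1.0):
    # Evaluation function with an L1 penalty: reward a high class score
    # while keeping most pixels at zero (digit strokes are sparse).
    return y_i - lam * np.abs(x).sum()

sparse = np.array([[0., 1., 0.],
                   [0., 1., 0.],
                   [0., 1., 0.]])   # stroke-like: few active pixels
dense = np.ones((3, 3))             # everything lit up
print(score(5.0, sparse) > score(5.0, dense))  # True: sparsity is preferred
```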

Practical Applications of CNN

Deep Dream

In Deep Dream, you give the machine a picture, and based on the picture's content the machine exaggerates and adds in the objects it sees (Figure 1-15). The general idea of Deep Dream is similar to the detection-object method above.

Figure 1-15 Deep Dream

Deep Dream's rough procedure is shown in Figure 1-16. Given a picture, first obtain the CNN's output vector for it. Then modify the vector: make the positive components larger and the negative components more negative. The idea is to exaggerate whatever the CNN detects. Finally, the modified vector is used as the target, and the image is adjusted in reverse to match it.

Figure 1-16 Deep Dream Approximate process
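The exaggeration step can be sketched as follows (the output vector and scale factor are illustrative):

```python
import numpy as np

# Deep Dream target: push positive CNN outputs up and negative outputs
# further down, then adjust the input image toward this target.
y = np.array([2.0, -1.0, 0.5, -0.3])   # toy CNN output vector
target = 1.5 * y                        # positives grow, negatives get more negative
print(target)
```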

Deep Style

Deep Style is given two pictures: one supplies the content and the other supplies the style. The two images are then "fused", with the effect shown in Figure 1-17.

Figure 1-17 Deep Style

The general idea of Deep Style is shown in Figure 1-18. One CNN extracts the content of the content picture; another CNN extracts the style of the style picture, where style is mainly reflected in the correlations between the output values of different filters. The generated image is then constructed so that its CNN output is similar to the content of the left picture, while the correlations between its filter output values are similar to those of the right picture.

Figure 1-18 Deep Style thought
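One common concrete measure of the "correlation between filter output values" is the Gram matrix of the flattened feature maps, as used in neural style transfer; the text does not name it, so this is an assumed choice, and the feature values below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))   # 5 filters, each a flattened 4x4 feature map
gram = feats @ feats.T             # (5, 5): co-activation of every filter pair
print(gram.shape)                  # matching these matrices transfers style
```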

Playing Go

To use a CNN to play Go, the input is the current board position and the output is the next move. The whole board can be encoded with (1, -1, 0) representing (black stone, white stone, empty). The reason a CNN can be applied to Go is that some properties of the game resemble the "ANN simplification basis": a move sometimes requires only local information, and the same local pattern may appear in different places on the board. However, a Go board, unlike a picture, cannot be subsampled, so when applying a CNN to Go, the pooling layers should be removed.

Figure 1-19 Weiqi
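The board encoding described above can be sketched as follows (the stone positions are illustrative):

```python
import numpy as np

board = np.zeros((19, 19), dtype=int)  # 0 = empty point on a 19x19 board
board[3, 3] = 1                        # 1 = black stone
board[15, 15] = -1                     # -1 = white stone
print((board == 0).sum())              # 359 empty points remain
```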

References

[1] Hung-yi Lee (Li Hongyi), Machine Learning course
