Convolutional Neural Network (CNN) Beginner's Guide 1


http://blog.csdn.net/real_myth/article/details/52273930


Introduction

Convolutional neural networks: it sounds like a strange mix of biology, math, and a dash of computer science, but these networks have produced some of the most influential innovations in the field of computer vision. Neural networks rose to prominence in 2012, the year Alex Krizhevsky used them in the ImageNet competition (ImageNet can be thought of as the annual "Olympics" of computer vision) to drop the classification error record from 26% to 15%, an astonishing improvement at the time. Since then, many companies have put deep learning at the core of their services: Facebook uses neural networks for its automatic tagging algorithms, Google applies them to image search, Amazon uses them for product recommendations, Pinterest applies them to its home feed personalization, and Instagram uses deep learning in its image search architecture.

However, the classic, and arguably most popular, use case for these networks is image processing. Within image processing, this article focuses on how to use convolutional neural networks for image classification.

Problem space

Image classification is the task of taking an input image and outputting a class (cat, dog, etc.) or a probability of classes that best describes the image. For humans, recognition is one of the first skills we learn after birth, and as adults it feels completely natural and effortless. Without hesitation, we can quickly identify the environment and the objects around us; when we see a picture or look at a scene, most of the time we can immediately characterize it and give every object a label, without even consciously observing. These skills of quickly recognizing patterns, generalizing from prior experience, and adapting to different images or environments are ones we do not share with our machines.

Input and output

When a computer sees an image (that is, takes an image as input), it sees an array of pixel values. Depending on the resolution and size of the image, it might see a 32 x 32 x 3 array of numbers (the 3 refers to the RGB color channels). To make this concrete, suppose we have a color image in JPG format with a size of 480 x 480; the representative array will be 480 x 480 x 3. Each of these numbers has a value from 0 to 255 which describes the pixel intensity at that point. Although these numbers mean nothing to us when we classify an image, they are the only data the computer receives when the image is input. The idea is that you give the computer this array of numbers, and it outputs numbers that describe the probability of the image being a certain class (for example .80 for cat, .15 for dog, .05 for bird).
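A minimal sketch of what the computer "sees", assuming Pillow and NumPy are available; "dog.jpg" is a hypothetical 480 x 480 RGB image used only for illustration:

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("dog.jpg"))   # shape: (480, 480, 3)
print(img.shape)                          # (height, width, RGB channels)
print(img.dtype, img.min(), img.max())    # uint8 values in the range 0-255

# The classifier's job is to turn that array into class probabilities,
# e.g. something like:
probabilities = {"cat": 0.80, "dog": 0.15, "bird": 0.05}
```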

What do we want the computer to do?

Now that we understand the inputs and outputs of the problem, let's think about how to approach it. What we want the computer to do is to differentiate between all the images it is given and figure out the unique features that make a dog a dog or a cat a cat. This is the process of recognition that goes on in our own minds: when we look at a picture of a dog, we can classify it as such because the image contains identifiable features like paws or four legs. In a similar way, the computer can perform image classification by looking for low-level features such as edges and curves, and then building up more abstract concepts through a series of convolutional layers. This is a general overview of what a convolutional neural network does; we'll explore the details below.

Biological links

First, a little background. When you first heard the term convolutional neural network, you may have wondered whether it has something to do with neuroscience or biology, and you would be partly right. Convolutional neural networks are indeed biologically inspired by the visual cortex, which contains small regions of cells that are sensitive to specific regions of the visual field.

In 1962, Hubel and Wiesel found that some individual neurons in the brain fire only in the presence of edges of a certain orientation. For example, some neurons fire when exposed to vertical edges and some when shown horizontal or diagonal edges. Hubel and Wiesel found that all of these neurons are organized in a columnar architecture and that, together, they produce visual perception. The idea that specialized components of a system have specific tasks (neurons in the visual cortex looking for specific features) is one that machines can use as well, and it is the basis behind convolutional neural networks.

Architecture

A more detailed description of a convolutional neural network is that you pass the image through a series of convolutional, nonlinear, pooling (downsampling), and fully connected layers, and then get an output. As we said earlier, the output is a class or a probability of classes that best describes the image. Now, the hard part is understanding what each of these layers does.

First Layer - Math

The first layer in a convolutional neural network is always a convolutional layer, and the first thing to keep in mind is what the input to this layer is. As we mentioned earlier, the input is a 32 x 32 x 3 array of pixel values. The best way to explain a convolutional layer is to imagine a flashlight shining over the top left of the image, and suppose the area the flashlight covers is 5 x 5. Now imagine this flashlight sliding across every area of the input image. In machine learning terminology, this flashlight is called a filter (sometimes referred to as a neuron or a kernel), and the region it shines over is called the receptive field. The filter is itself an array of numbers (the numbers are called weights or parameters). It must be mentioned that the depth of the filter has to be the same as the depth of the input (this is what makes the math work out), so the dimensions of this filter are 5 x 5 x 3. Now, take the first position of the filter as an example. As the filter slides, or convolves, over the input image, its values are multiplied with the original pixel values of the image (element-wise multiplication), and all of these products are summed up (mathematically, that is 75 multiplications in total). So now you have a single number. Keep in mind that this number is only representative of the filter being at the top left of the image; we now repeat the process for every position. (The next step would be moving the filter to the right by 1 unit, then right again by 1 unit, and so on.) Every unique position on the input volume produces a number. After sliding the filter over all the locations, you will find that what remains is a 28 x 28 x 1 array of numbers, which we call an activation map or feature map. The reason you get a 28 x 28 array is that there are 784 different locations where a 5 x 5 filter can fit on a 32 x 32 input image, and these 784 numbers map to a 28 x 28 array.
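A minimal NumPy sketch of the convolution just described: slide a 5 x 5 x 3 filter over a 32 x 32 x 3 input, multiplying element-wise and summing (75 multiplications per position) to build a 28 x 28 activation map. The input and filter values here are random placeholders, not trained weights.

```python
import numpy as np

image = np.random.rand(32, 32, 3)     # input volume
kernel = np.random.rand(5, 5, 3)      # one filter (its weights)

out_size = 32 - 5 + 1                 # 28 valid positions per dimension
activation_map = np.zeros((out_size, out_size))

for i in range(out_size):
    for j in range(out_size):
        receptive_field = image[i:i + 5, j:j + 5, :]           # 5 x 5 x 3 patch
        activation_map[i, j] = np.sum(receptive_field * kernel)

print(activation_map.shape)           # (28, 28) -- one number per position
```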

If we use two 5 x 5 x 3 filters instead of one, our output volume becomes 28 x 28 x 2. By using more filters, we are able to preserve the spatial dimensions better. On a mathematical level, this is what goes on in a convolutional layer.

First Layer - A Higher-Level Perspective

Let's talk about what this convolution is doing from a higher level. Each of these filters can be thought of as a feature identifier. When I say features, I am talking about things like straight edges, simple colors, and curves: the simplest characteristics that all images share. Say our first filter is 7 x 7 x 3 and is a curve detector. (In this section, let's ignore the fact that the filter is 3 units deep and consider only the top slice of the filter and of the image.) As a curve detector, the filter will have high values where the pixel structure has the shape of a curve (remember, these filters are just numbers).

Now, let's go back to visualizing the math. When we place this filter at the top left corner of the input, it computes the products between the filter values and the pixel values in that region. Now let's take an example of an image we want to classify and put our filter at its top left corner.

Remember, what we have to do is multiply the original pixel values in the image with the values in the filter.

Basically, if there is a shape in the input image that generally resembles the curve this filter represents, then all of the products summed together will result in a large value. Now let's see what happens when we move our filter.

The value should be much lower! This is because there is nothing in that part of the image that responds to the curve detector filter. Remember, the output of this convolutional layer is an activation map. So, in the simple case of a single-filter convolution (and that filter being a curve detector), the activation map will show the areas where curve-like shapes are most likely present in the picture. In this example, the value at the top left of our 28 x 28 x 1 activation map will be 6600, which means it is likely that some sort of curve in the input caused the filter to activate. The value at the top right of the activation map will be 0, because there was nothing in that part of the input to make the filter activate (or, more simply, there was no curve in that region of the original image). Remember, this is just one filter. This filter detects curves that bend outward and to the right; we can have other filters for curves that bend to the left or for straight edges. The more filters we have, the greater the depth of the activation map, and the more information we get about the input.
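An illustrative sketch with made-up numbers (the 6600 in the text comes from the article's own figure; the values below are placeholders chosen only to show the effect): when the pixel pattern under the filter resembles the curve the filter encodes, the element-wise products sum to a large number, and over a blank region the sum is zero.

```python
import numpy as np

curve_filter = np.array([       # high weights along a diagonal "curve"
    [0,  0,  0, 30],
    [0,  0, 30,  0],
    [0, 30,  0,  0],
    [30, 0,  0,  0],
])

curvy_patch = np.array([        # image patch containing a similar shape
    [0,  0,  0, 50],
    [0,  0, 50,  0],
    [0, 50,  0,  0],
    [50, 0,  0,  0],
])
blank_patch = np.zeros((4, 4))  # image patch with no curve at all

print(np.sum(curve_filter * curvy_patch))   # 6000 -- strong activation
print(np.sum(curve_filter * blank_patch))   # 0    -- no activation
```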

Disclaimer: The filter described in this section is simplistic; its main purpose is to illustrate the math that happens during a convolution. Visualizations of the actual filters of the first convolutional layer in a trained network look different, but the main argument remains the same.

Going Deeper Through the Network

In a traditional convolutional neural network structure, other layers are interspersed between these convolutional layers. Interested readers are strongly encouraged to read about their functions and roles, but in general they provide nonlinearities and dimensionality preservation that help improve the robustness of the network while also controlling overfitting. A classic convolutional neural network architecture looks like this:
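Since the original figure is not reproduced here, below is a sketch of the kind of layer stack such a figure typically shows, written with PyTorch for concreteness. The exact sizes and layer counts are assumptions, not the article's figure: conv -> ReLU -> pool blocks followed by a fully connected classifier.

```python
import torch.nn as nn

classic_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),   # 32x32x3  -> 28x28x16
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 28x28x16 -> 14x14x16
    nn.Conv2d(16, 32, kernel_size=5),  # 14x14x16 -> 10x10x32
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 10x10x32 -> 5x5x32
    nn.Flatten(),
    nn.Linear(5 * 5 * 32, 10),         # fully connected layer: 10 class scores
)
```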

The last layer is a very important one, but we will get to it later. Let's step back and review what we have covered so far. We talked about what the filters in the first convolutional layer are designed to detect: low-level features such as edges and curves. As you can imagine, to predict the type of an image, we need the network to recognize higher-level features such as hands, paws, and ears. So let's think about what the output of the network is after the first convolutional layer: it would be a 28 x 28 x 3 volume (assuming we use three 5 x 5 x 3 filters). When we pass through another convolutional layer, the output of the first convolutional layer becomes the input of the second one, and this is harder to visualize. When we were talking about the first layer, the input was just the original image. When we talk about the second convolutional layer, however, the input is the activation map(s) produced by the first layer. So each input layer basically describes the positions in the original image where certain low-level features appear. Now, when you apply a set of filters on top of that (pass it through the second convolutional layer), the output will be activations that represent higher-level features. Types of these features could be semicircles (a combination of a curve and a straight edge) or squares (a combination of several straight edges). As you pass through more convolutional layers of the network, you get activation maps that represent more and more complex features. By the end of the network, there may be filters that activate when they see handwriting in the image, filters that activate when they see pink objects, and so on. Another interesting thing is that as you go deeper into the network, the filters begin to have a larger and larger receptive field, which means they can take in information from a larger area of the original input.
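A small sketch (again with PyTorch, and with the shapes as assumptions) showing that the second convolutional layer takes the first layer's activation maps, not the raw image, as its input, so its filters respond to combinations of the lower-level features:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)            # one 32x32 RGB image
conv1 = nn.Conv2d(3, 3, kernel_size=5)   # three 5x5x3 filters
conv2 = nn.Conv2d(3, 8, kernel_size=5)   # filters over the activation maps

a1 = conv1(x)        # torch.Size([1, 3, 28, 28]) -- low-level feature maps
a2 = conv2(a1)       # torch.Size([1, 8, 24, 24]) -- higher-level features
print(a1.shape, a2.shape)
```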

Fully Connected Layer

Now that we can detect these high-level features, the icing on the cake is attaching a fully connected layer to the end of the network. This layer takes an input volume (whatever the output of the preceding convolutional, ReLU, or pooling layer is) and outputs an N-dimensional vector, where N is the number of classes the program has to choose from. The fully connected layer works by looking at the output of the previous layer (the activation maps that represent high-level features) and determining which features correlate most strongly with a particular class. For example, if the program is predicting that an image is a dog, it will have high values in the activation maps that represent high-level features like a paw or four legs. Similarly, if the program is predicting that an image is a bird, it will have high values in the activation maps that represent high-level features like wings or a beak.
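A minimal NumPy sketch of what this final layer does: flatten the last activation volume, multiply by a weight matrix, and squash the resulting scores into an N-dimensional probability vector with a softmax. All values are random placeholders.

```python
import numpy as np

activations = np.random.rand(5, 5, 32)          # output of the previous layer
num_classes = 10

x = activations.reshape(-1)                     # flatten to an 800-vector
W = np.random.randn(num_classes, x.size) * 0.01 # fully connected weights
b = np.zeros(num_classes)

scores = W @ x + b                              # one score per class
probs = np.exp(scores) / np.sum(np.exp(scores)) # softmax -> probabilities
print(probs.round(2), probs.sum())              # probabilities sum to 1.0
```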

Training process

The training process is the one part of neural networks that I deliberately have not mentioned yet, and it may be the most important part. You may have run into many questions while reading: How do the filters in the first convolutional layer know to look for edges and curves? How does the fully connected layer know which activation maps to look at? How do the filters in each layer know what values to take? The way the computer adjusts its filter values (or weights) is through a training process called backpropagation.

Before we get to backpropagation, let's first look at what a neural network needs in order to work. At the moment we were born, our minds were brand new; we didn't know what a cat or a bird was. Similarly, before a convolutional neural network starts training, the weights or filter values are random: the filters don't know to look for edges and curves, and the filters in the higher layers don't know to look for paws and beaks. As we grew a little older, however, our parents and teachers showed us different pictures and images and gave us corresponding labels. This idea of being given an image and a label is the training process that convolutional neural networks (CNNs) go through. Before getting into it, let's say we have a training set with thousands of images of dogs, cats, and birds, and each image has a label saying what animal the picture shows.

Backpropagation can be separated into 4 distinct parts: the forward pass, the loss calculation, the backward pass, and the weight update. During the forward pass, you take a training image, which as we remember is a 32 x 32 x 3 array of numbers, and pass it through the whole network. On our first training example, since all the weights or filter values were randomly initialized, the output will probably be something like [.1 .1 .1 .1 .1 .1 .1 .1 .1 .1], basically an output that doesn't give preference to any class in particular. The network, with its current weights, isn't able to find those low-level features and therefore can't make any reasonable conclusion about the classification. This leads to the loss calculation part of backpropagation. Remember that we are using training data, which has both an image and a label. Say, for example, that the first training image input is a 3; then the label for the image would be [0 0 0 1 0 0 0 0 0 0]. A loss can be defined in many different ways, but a common one is MSE (mean squared error), which is ½ times (actual - predicted) squared.
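A minimal sketch of that loss calculation with placeholder numbers: an untrained network outputs a nearly uniform vector, the label for a "3" is a one-hot vector, and the MSE-style loss measures how far apart they are.

```python
import numpy as np

prediction = np.full(10, 0.1)                  # [.1 .1 ... .1] from the network
label = np.zeros(10)
label[3] = 1.0                                 # one-hot label for a "3"

loss = 0.5 * np.sum((label - prediction) ** 2) # ½ * sum of squared differences
print(loss)                                    # 0.45 -- large for an untrained net
```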

Let's say the variable L is equal to that value. As you can imagine, the loss will be very high for the first few training images. Now, let's think about this more intuitively. We want to reach a point where the predicted label (the output of the ConvNet) is the same as the training label (which means our network got its prediction right). To get there, we want to minimize the amount of loss we have. Viewing this as just an optimization problem in calculus, we want to find out which inputs (the weights, in our case) most directly contributed to the loss (or error) of the network.

This is the mathematical equivalent of dL/dW, where W are the weights at a particular layer. Now, what we want to do is perform a backward pass through the network, determining which weights contributed most to the loss, and finding ways to adjust them so that the loss decreases. Once we have computed this derivative, we go to the last step, the weight update. Here, all the filter weights are updated so that they change in the opposite direction of the gradient.

The learning rate is a parameter chosen by the programmer. A high learning rate means that bigger steps are taken in the weight updates, so it may take less time for the model to converge on an optimal set of weights. However, a learning rate that is too high can result in jumps that are too large and not precise enough to reach the optimal point.
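A sketch of the weight-update rule: each weight moves a small step in the direction opposite the gradient, scaled by the learning rate. The values here are placeholders rather than real gradients.

```python
import numpy as np

weights = np.random.randn(5, 5, 3)          # one filter's weights
grad = np.random.randn(5, 5, 3)             # dL/dW from backpropagation
learning_rate = 0.01                        # chosen by the programmer

weights = weights - learning_rate * grad    # gradient descent step
```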

The process of a forward pass, loss calculation, backward pass, and parameter update over every training image is known as one epoch. The program repeats this process for a fixed number of epochs. Once the parameter update from the last training example is finished, the network should be trained well enough that the weights of each layer are tuned correctly.
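A minimal sketch of that training loop. The helper functions (forward, compute_loss, backward, update) are hypothetical stand-ins for the real network code, passed in as parameters so the structure of the four steps is clear.

```python
def train(network, training_set, forward, compute_loss, backward, update,
          num_epochs=10, learning_rate=0.01):
    """Run the four backpropagation steps over every labeled image, per epoch."""
    for epoch in range(num_epochs):
        for image, label in training_set:
            prediction = forward(network, image)     # 1. forward pass
            loss = compute_loss(prediction, label)   # 2. loss calculation
            grads = backward(network, loss)          # 3. backward pass
            update(network, grads, learning_rate)    # 4. weight update
    return network
```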

Test

Finally, to test whether our convolutional neural network works, we take a different set of images and labels, pass the images through the network, and compare the outputs with the ground truth to see whether the network is functioning properly.
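A minimal sketch of that test step: run held-out images through the trained network, take the most probable class, and compare with the true labels. Here `trained_net`, `test_images`, and `test_labels` are hypothetical placeholders.

```python
import numpy as np

def accuracy(trained_net, test_images, test_labels):
    predictions = [np.argmax(trained_net(img)) for img in test_images]
    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return correct / len(test_labels)
```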

How the industry uses convolutional neural networks

Data, data, data. The more training data you can give a convolutional neural network, the more training iterations you can run, the more weight updates you can make, and the better tuned the network becomes. Facebook (and Instagram) can use all the photos of its hundreds of millions of users, Pinterest can use the information from the 50 billion pins on its site, Google can use search data, and Amazon can use data from the millions of products bought every day.

