The Principles of Image Recognition and Convolutional Neural Network Architecture


Reposted from: Machine Heart (机器之心)

Introduction


Frankly, for a long time I couldn't really understand deep learning. I read the relevant research papers and articles and felt that deep learning was extremely complex. I tried to understand neural networks and their variants, but it still felt out of reach.


Then one day, I decided to take it step by step. I broke the technique down into its individual operations and performed those steps (and calculations) by hand until I understood how they work. It was time-consuming and stressful, but the results were extraordinary.


Now I not only have a comprehensive understanding of deep learning, but I can also build good ideas on top of it, because my foundation is solid. Applying a neural network blindly is one thing; understanding what it is and the mechanism behind it is quite another.


Today, I'm going to share my experience with you and show how I started with convolutional neural networks and finally figured them out. I'll give a holistic walkthrough that should leave you with an in-depth understanding of how a CNN works.


In this article, I'll discuss the architecture behind CNNs, which is designed to address image recognition and classification problems. I assume you already have a basic understanding of neural networks.


Table of Contents


1. How a machine sees an image

2. How to help a neural network identify an image

3. Defining a convolutional neural network


Convolution layer

Pooling layer

Output layer


4. Summary

5. Using a CNN to classify images


1. How a machine sees an image


The human brain is a very powerful machine. It can see (capture) multiple images per second and process them without any conscious effort. Machines cannot do this. The first step in machine image processing is therefore understanding how to represent an image so that the machine can read it.


In simple terms, every image is an arrangement of dots (pixels) in a specific order. If you change the order or color of the pixels, the image changes as well. For example, consider storing and reading an image of the digit 4.


Basically, the machine breaks the image down into a matrix of pixels and stores a color code for each pixel at its position. In the figure below, a value of 1 is white and 256 is the darkest shade of green (for simplicity, our example is limited to a single color).




Once the image information is stored in this format, the next step is to get a neural network to understand this arrangement and its patterns.
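To make the idea of a pixel matrix concrete, here is a minimal sketch in Python/NumPy. The tiny 4x4 image and its intensity values are made up purely for illustration; real images simply have many more pixels (and one value per color channel).

```python
import numpy as np

# A toy 4x4 "image": one intensity value per pixel (values are made up).
# A real grayscale image works the same way, just at a larger size.
image = np.array([
    [  0,   0, 180,   0],
    [  0, 200, 180,   0],
    [210, 200, 190, 220],
    [  0,   0, 190,   0],
])

print(image.shape)  # (4, 4): 4 rows x 4 columns of pixel values
print(image[2, 3])  # the color code stored at row 2, column 3
```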


2. How to help a neural network identify an image


The pixel values that characterize the image are arranged in a particular way.




Suppose we try to recognize the image using a fully connected network. How would we do it?


A fully connected network flattens the image, treats it as a one-dimensional array, and uses the pixel values as the features for predicting what is in the image. Frankly, it is very difficult for the network to understand what is going on in the figure below.




Even a human would struggle to see that this represents the digit 4. We have completely lost the spatial arrangement of the pixels.
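As a rough sketch of what "flattening" means, the snippet below (NumPy, reusing the same made-up 4x4 image as before) turns the 2D pixel grid into a 1D feature vector, which is exactly where the spatial information gets lost.

```python
import numpy as np

# The same toy 4x4 image as above (values are made up for illustration).
image = np.array([
    [  0,   0, 180,   0],
    [  0, 200, 180,   0],
    [210, 200, 190, 220],
    [  0,   0, 190,   0],
])

# Flattening: a fully connected network sees only this 1D list of 16
# numbers and has no idea which pixels were next to each other.
flat = image.flatten()
print(flat.shape)  # (16,)
print(flat)
```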


What can we do? We can try to extract features from the original image in a way that preserves the spatial arrangement.

Case 1


Here we multiply the original pixel values by a weight.




Now it is easier for the naked eye to recognize that this is a 4. But before handing it over to the fully connected network, we still have to flatten it, which means we are still unable to preserve the spatial arrangement of the image.
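A minimal sketch of Case 1, assuming "a weight" means a single scalar applied to every pixel (the weight value 0.01 is arbitrary and chosen only for illustration):

```python
import numpy as np

# Toy 4x4 image with made-up intensity values.
image = np.array([
    [  0,   0, 180,   0],
    [  0, 200, 180,   0],
    [210, 200, 190, 220],
    [  0,   0, 190,   0],
], dtype=float)

weight = 0.01                 # arbitrary illustrative weight
weighted = weight * image     # every pixel is scaled by the same weight

# To feed a fully connected network we would still have to flatten it,
# which again discards the spatial arrangement.
flat = weighted.flatten()     # shape (16,)
```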




Case 2


By now we can see that flattening the image completely destroys its arrangement. We need to find a way to feed the image to the network without flattening it, one that preserves the spatial arrangement; in other words, we need to feed in the 2D/3D arrangement of pixel values.


We can try taking two pixel values of the image at a time instead of one. This gives the network insight into the characteristics of neighboring pixels. Since we take two pixels at a time, we also use two weights at a time.




Hopefully you noticed that the image went from the previous 4 columns of values down to 3 columns. Because we now move across two pixels at a time (and pixels are shared between moves), the output image becomes smaller. Even though it has shrunk, we can still largely tell that this is a 4. Another important point to note is that we take two consecutive horizontal pixels, so only the horizontal arrangement is considered here.
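Here is a minimal sketch of Case 2, assuming the two-pixel step means sliding a window of two horizontal neighbors across each row, multiplying them by two weights and summing (the weights 0.3 and 0.7 are made up); note how 4 columns shrink to 3:

```python
import numpy as np

# Toy 4x4 image with made-up intensity values.
image = np.array([
    [  0,   0, 180,   0],
    [  0, 200, 180,   0],
    [210, 200, 190, 220],
    [  0,   0, 190,   0],
], dtype=float)

w = np.array([0.3, 0.7])           # two illustrative weights, one per pixel

rows, cols = image.shape
out = np.zeros((rows, cols - 1))   # 4 columns become 3
for i in range(rows):
    for j in range(cols - 1):
        # weight the pair of horizontal neighbors and sum them
        out[i, j] = w[0] * image[i, j] + w[1] * image[i, j + 1]

print(out.shape)  # (4, 3)
```

This is essentially a 1x2 sliding window with a stride of one pixel, which is the basic idea the convolution layer generalizes.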


This is one way for us to extract features from an image. Looking at the result, the left and middle parts are clear, but the right part is not quite as clear. This is mainly because of two problems:


1. The pixels in the left and right corners of the image are multiplied by a weight only once.

2. The left part is retained because its weight is higher, while the right part is partially lost because its weight is slightly lower.


Now we have two problems, and we need two solutions for them.

