Notes on the evolution of deep neural networks in image recognition applications


"Minibatch" You use a data point to calculate to modify the network, may be very unstable, because you this point of the lable may be wrong. At this point you may need a Minibatch method that averages the results of a batch of data and modifies it in their direction. During the modification process, the change intensity (learning rate) can be adjusted. At the beginning of the time, not in doubt to learn quickly a slow, slowly have a grasp, to learn slowly.

"Dropout", "batchnormalization"


"Black Magic"

But one thing has not fundamentally changed: designing the network structure still belongs to black magic. In essence, these structures come out of experiments. You experiment until you find a good structure or function, and only afterwards explain why it works; there is always some reason found in hindsight.

"LeNet5" "AlexNet" "googlenet" "Residual Net" ""

After LeNet-5, the 2012 AlexNet drastically reduced the image-recognition error rate, by about 10 percentage points. Then Google's GoogLeNet pushed the complexity of the network structure beyond everyone's imagination; before that, everyone had been doing it in the LeCun way. Microsoft's residual net is also good work, making the network structure deeper and more effective.



Convolutional neural networks introduce two improvements.

"Local Connection" "Weight sharing"

The first is that a node in the next layer is not connected to all nodes in the previous layer; it connects only to a few nodes in one local area. For example, if layer 1 has 10 nodes and layer 2 has 10 nodes, then a node in layer 2 connects to a small local region, say 3 nodes. In an image, this means a node is connected only to a small patch of the image.

The second improvement is that the weights connecting to one small area are the same as the weights connecting to the next small area; this is called weight sharing. In other words, you have a linear convolutional kernel. A 3x3 kernel, for example, has 9 numbers; those 9 numbers are applied at one position of the input to compute one convolution output, then shifted to another position to compute another output, and the kernel used at every position is the same. So, going from the MLP to the CNN, the concept of a convolution is defined. These are the two differences.
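As a rough sketch of weight sharing: the same 3x3 kernel, its 9 numbers fixed, is slid over every position of the input. Plain NumPy, "valid" convolution with stride 1; the input and kernel values are made up:

```python
import numpy as np

def conv2d(image, kernel):
    # The same kernel weights are reused (shared) at every position, and each
    # output value depends only on a small local patch of the input.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.ones((3, 3)) / 9.0                     # one set of 9 shared weights
print(conv2d(image, kernel).shape)                 # (3, 3)
```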

To restate: first, a node in the next layer connects only to a few local inputs of the previous layer; second, the weights over these local positions are shared and fixed across space. An input may be processed by several convolutional kernels. If you take one kernel, one filter, and scan it across the input, doing a multiply-and-add at each position, you get one output value there; scanning the whole input gives you a map on the output side called a feature map. If you have six kernels, the output is 6x28x28; that is, six convolutions turn the 32x32 input image into six 28x28 feature maps (with a 5x5 kernel, 32 - 5 + 1 = 28). This is the first convolution layer. After that you need some downsampling (subsampling).
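The shape arithmetic of that first layer can be checked directly; this small sketch assumes LeNet-5's 5x5 kernels, stride 1, and no padding:

```python
# Output size of a "valid" convolution with stride 1: out = in - k + 1
in_size, k, n_kernels = 32, 5, 6
out_size = in_size - k + 1
print((n_kernels, out_size, out_size))            # (6, 28, 28): six 28x28 feature maps
# After 2x2 subsampling, each feature map shrinks by half:
print((n_kernels, out_size // 2, out_size // 2))  # (6, 14, 14)
```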

At the time, LeCun downsampled immediately to keep the amount of computation small. After subsampling there are still 6 feature maps. He then defined 16 feature maps in the C3 layer. For each of these 16 feature maps there is a kernel for each of the 6 feature maps of the previous layer, and each of those feature maps is convolved, sliding continuously across space. After the convolutions, the results are combined linearly, and the parameters of that linear combination are themselves learned. The linear combination gives one feature map of the next layer. Since there are 16, you repeat this process 16 times and turn the 6 feature maps into 16.
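A sketch of how one such C3 feature map could be formed; the kernel values are random placeholders, and SciPy's correlate2d stands in for the convolution:

```python
import numpy as np
from scipy.signal import correlate2d   # cross-correlation, i.e. a "conv" as CNNs use it

rng = np.random.default_rng(0)
in_maps = rng.normal(size=(6, 14, 14))   # the 6 subsampled feature maps from the previous layer
kernels = rng.normal(size=(6, 5, 5))     # one 5x5 kernel per input map (placeholder values)

# One output feature map: convolve each input map with its own kernel while
# sliding across space, then combine the results (here a plain sum).
out_map = sum(correlate2d(in_maps[i], kernels[i], mode="valid") for i in range(6))
print(out_map.shape)                     # (10, 10)
# Repeating this with 16 different kernel sets gives the 16 feature maps of C3.
```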

In fact, LeCun could not afford that much computation when doing this work. So the mapping from 6 feature maps to 16 feature maps was chosen somewhat arbitrarily: only one feature map of the C3 layer connects to all 6, and the others connect to only 3 or 4 of them.

At the time, LeCun's article explained this as a way to break symmetry, so that the network would not be completely symmetric.

But I think the main reason was the amount of computation. Toward the end, similarly, you reduce the scale a bit more, perform another round of subsampling, and then add some fully connected layers. Full connection means that each input is connected to each output in the next layer, which is not a convolutional form.
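For contrast, a minimal sketch of one fully connected layer; the 120-to-84 sizes echo LeNet-5's later layers, and the weight values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=120)            # flattened input from the previous layer
W = rng.normal(size=(84, 120))      # one weight per (output, input) pair: nothing is shared
b = np.zeros(84)
y = W @ x + b                       # every input contributes to every output
print(y.shape)                      # (84,)
```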

At the time, the structure had about 60,000 parameters and could recognize handwritten digits such as zip codes. LeCun and his colleagues actually turned it into a product after the research was done: it was deployed to handle handwriting recognition for about a third of U.S. mail, which should count as the pinnacle of the 1980s and 1990s.

By the time of AlexNet in 2012, the structure itself had not changed substantially, but the scale was much larger.

"AlexNet"


LeNet-5 has about 60,000 parameters; AlexNet has about 60 million.

The network is split into an upper and a lower half. The input is a 224x224 image, and the first layer is an 11x11 convolution: an 11x11 filter is placed on the image and translated across it, doing a multiply-and-add at each position to compute one point; after that comes a round of sampling.
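The output size of such a strided convolution follows a simple formula. The stride of 4 and the padding below are the commonly cited AlexNet settings, used here as assumptions:

```python
def conv_out(in_size, k, stride=1, pad=0):
    # Output size of a strided convolution: (in + 2*pad - k) // stride + 1
    return (in_size + 2 * pad - k) // stride + 1

print(conv_out(224, 11, stride=4))          # 54 with no padding
print(conv_out(224, 11, stride=4, pad=2))   # 55, roughly the size usually quoted for AlexNet
```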

"Average pooling" "Max pooling"

Sampling here means that out of several points you keep one value. This kind of sampling is called pooling. For example, if you have 4 points and take the maximum, that is max pooling; if you average the 4 points into one value, that is average pooling. Any operation that turns multiple points into one point is called pooling.
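A sketch of both kinds of pooling over non-overlapping 2x2 windows, in plain NumPy (assumes the height and width are even):

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    # Group the map into non-overlapping 2x2 windows and reduce each to one value.
    h, w = feature_map.shape
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return windows.max(axis=(1, 3))      # max pooling
    return windows.mean(axis=(1, 3))         # average pooling

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(fm, "max"))   # each 2x2 block replaced by its maximum -> a 2x2 output
print(pool2x2(fm, "avg"))   # each 2x2 block replaced by its average
```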

After max pooling, you continue convolving, then pool again, and so on.

"Dropout"

Essentially the structure is the same as the original, just a lot bigger, so big that one GPU could not train it and two GPUs were used at the same time. At the time, a GPU might have only 3 GB of memory. During training, they also normalized the inputs and added dropout, randomly setting some nodes to 0 in the course of training.

In a neural network there are many connections and many nodes; each node outputs a value, an activation, to the next layer, so every node contributes to the final result. Dropout randomly sets 50% or 40% of the node outputs to 0 during training, so that they contribute nothing. The classification result you train therefore cannot rely on all nodes working properly, or on collecting all the information. To still get a good result, the network cannot depend too heavily on the data being similar to what it has already seen, so it can handle some unknown data. This is AlexNet's improvement, but the underlying network structure itself is still much the same.
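A sketch of dropout as a random mask over the activations; the scaling by 1/(1-p) ("inverted dropout") is a common convention added here, not something stated above:

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    # During training, randomly zero a fraction of node outputs so the network
    # cannot rely on every node being present.
    if not training or drop_prob == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(activations.shape) >= drop_prob
    # Scaling keeps the expected activation unchanged, so nothing special is
    # needed at test time.
    return activations * mask / (1.0 - drop_prob)

h = np.random.default_rng(0).normal(size=(4, 8))   # a small batch of activations
print(dropout(h, drop_prob=0.5))                   # roughly half the entries are zero
```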


"Vgg"
