Convolutional Neural Networks

Source: Internet
Author: User
Tags: theano

Convolutional Neural Network Origins: The Cat's Visual Cortex

In 1958, a group of neuroscientists inserted electrodes into cats' brains to observe the activity of the visual cortex. They inferred that the biological visual system starts by perceiving small, local parts of an object;

after layers of abstraction, these parts are finally assembled in a processing center, reducing the ambiguity of object recognition. This approach runs counter to the BP (backpropagation) network.

The BP network assumes that each neuron perceives the entire object (all pixels fully connected) and performs a direct mapping, without abstracting the object.

Who is right and who is wrong? Convolutional neural networks were the first to demonstrate how unscientific the BP network's assumption is.

CNNs originated with machine learning master LeCun, who in the late 1980s built a neural network to recognize handwritten digits on bank checks. Departing from the BP network, he experimented with multi-layer convolution, down-sampling, and partially-connected network ideas,

and the training results were surprisingly good. LeCun's mentor is Hinton, the computer scientist and neuroscientist who first proposed the concept of deep learning in 2006.

LeCun himself is half a neuroscientist. Together, mentor and apprentice propped up the sky of deep learning.

Part I: Convolution and Down-Sampling of Images

The convolution operation has a smoothing, blurring effect on a signal: it takes a weighted sum of the original signal over the neighborhood covered by the convolution kernel. An image can be divided into two parts: noise and detail.

By merging the signal content within each neighborhood, convolution serves two functions: eliminating noise and highlighting detail. At the same time, the kernel's parameters can be trained automatically by the machine, which makes them a natural optimization target for a neural network.

Take two-dimensional convolution: the method is simple; refer to http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

That page shows "valid" convolution; there is also "full" convolution, which lets the kernel slide beyond the edges of the source image. Valid convolution shrinks the output dimensions, while full convolution enlarges them (full is generally not used):

valid: dim_new = dim_old - dim_filter + 1

full: dim_new = dim_old + dim_filter - 1

There is one subtlety. By the definition of two-dimensional discrete convolution, the kernel g is indexed in the opposite order from the input f.

That is, the yellow 3x3 convolution kernel in the figure is actually rotated 180 degrees before it slides over the image taking products.

[ 1 3 5 ]   rotated 180°   [ 6 4 2 ]
[ 2 4 6 ]  ------------->  [ 5 3 1 ]

So do not forget to rotate the kernel when implementing convolution.
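
As a quick check, here is a minimal NumPy/SciPy sketch (not from the original post): scipy.signal.convolve2d performs the 180-degree rotation internally, while correlate2d does not, so the two agree only once the kernel is rotated by hand.

```python
import numpy as np
from scipy import signal

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 3., 5.],
                   [2., 4., 6.]])

# True convolution: convolve2d flips (rotates 180 degrees) the kernel internally.
conv = signal.convolve2d(image, kernel, mode='valid')

# Cross-correlation with a manually rotated kernel gives the same result.
corr = signal.correlate2d(image, np.rot90(kernel, 2), mode='valid')
assert np.allclose(conv, corr)

# Output sizes match the dimension formulas above:
# valid: 5 - 2 + 1 = 4 rows;  full: 5 + 2 - 1 = 6 rows.
print(signal.convolve2d(image, kernel, mode='valid').shape)  # (4, 3)
print(signal.convolve2d(image, kernel, mode='full').shape)   # (6, 7)
```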

Pooling has a size; if it is (2, 2), every 2x2 block of pixels is merged into one pixel.

The max pooling method takes the maximum as the new value, making the feature more prominent.

The mean pooling method takes the mean as the new value, making the feature smoother.

These details aside, pooling cuts the pixel count by an order of magnitude, which ① reduces the amount of computation and ② helps preserve scaling and rotation invariance [?].
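
A minimal NumPy sketch of both pooling modes (my own illustration, assuming even dimensions and non-overlapping 2x2 blocks):

```python
import numpy as np

def pool_2x2(feature_map, mode='max'):
    """Merge each 2x2 block of pixels into one (assumes even dimensions)."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == 'max':
        return blocks.max(axis=(1, 3))   # max pooling: keep the most prominent value
    return blocks.mean(axis=(1, 3))      # mean pooling: smooth the features

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 1.],
                 [0., 2., 1., 3.]])
print(pool_2x2(fmap, 'max'))   # [[4. 8.] [2. 3.]]
print(pool_2x2(fmap, 'mean'))  # [[2.5 6.5] [0.5 1.5]]
```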

Part II: Structure of the CNN

This is LeCun's LeNet-5 structure, which is often used for teaching CNNs.

The first concepts to distinguish are the filter and the feature map. A feature map is one of the large boards in the figure above; its size here is 28x28.

A filter is the convolution kernel; the number of filters determines the number of boards (feature maps) in the next layer, and the filter size is the convolution kernel size.

With convolution embedded in a neural network comes the concept of weight sharing. In a traditional neural network, 28x28 pixels means 784 neurons; if the next layer also has 784 neurons, you need 784x784 weight values.

But in a convolutional neural network, the convolution kernel's weights take the place of those connections. How large is a kernel? 5x5 = 25 weights, and all 784 pixels share these 25 weights as the kernel rolls across the image; that is one set of weights.

It turns out that the number of kernels has little to do with the size of the data; rather, matching the character of the visual system, the number of kernels should grow as the layers deepen. For example, LeNet's convolution layers use 6 and 16 kernels respectively.

In the fully-connected scheme, although there are num(layer K) x num(layer K-1) sets of weights, the number of feature maps per layer is limited and each set of weights is small, so training is quite easy.
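
A rough count, using the 28x28 / 5x5 numbers above and assuming the fully-connected scheme between map layers:

```python
# Fully connected 784 -> 784: every pixel connects to every neuron.
full_weights = 784 * 784              # 614,656 weights

# Convolutional layer: all 784 pixels share one 5x5 kernel per filter.
kernel_weights = 5 * 5                # 25 shared weights per filter
lenet_c1 = 6 * kernel_weights         # 6 kernels -> 150 weights (+ 6 biases)
lenet_c3 = 16 * 6 * kernel_weights    # 16 kernels over 6 input maps -> 2,400 weights

print(full_weights, lenet_c1, lenet_c3)  # 614656 150 2400
```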

Theano's tutorial uses a 4D tensor (in effect, an array with four dimensions) to represent how data moves through the structure, assuming a batch of 500 images, all black and white.

(number of images, number of maps, height, width) --- image/input/output. The input's map count is the number of channels; e.g., RGB has three channels, so it would be 3.

(number of filters, number of input maps, height, width) --- filter

The I/O shapes transfer as follows: (500,1,32,32) => (500,6,28,28) => (500,6,14,14) => (500,16,10,10) => (500,16,5,5) => (500,120,1,1); after that come the hidden layer + classifier.

The filter shapes transfer as follows: (6,1,5,5) => (16,6,5,5) => (120,16,5,5)
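
A small sketch that walks this shape chain using the valid-convolution and (2,2)-pooling formulas from Part I (the function names are mine, not Theano's):

```python
def conv_valid(dim, f):   # valid convolution: dim_new = dim_old - dim_filter + 1
    return dim - f + 1

def pool(dim, p=2):       # (2,2) pooling halves each spatial dimension
    return dim // p

dim = 32                  # input: (500, 1, 32, 32)
dim = conv_valid(dim, 5)  # C1:    (500, 6, 28, 28)
dim = pool(dim)           # S2:    (500, 6, 14, 14)
dim = conv_valid(dim, 5)  # C3:    (500, 16, 10, 10)
dim = pool(dim)           # S4:    (500, 16, 5, 5)
dim = conv_valid(dim, 5)  # C5:    (500, 120, 1, 1)
print(dim)                # 1
```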

It is worth noting the sigmoid applied after the convolution, as well as the bias b.

Because the layers are fully connected, the convolutions of the previous layer's maps are summed, and the result is fed into the sigmoid. Each pixel of each map in the new layer uniformly gets b added. (Theano broadcasts the one-dimensional b into four dimensions.)

That is, the number of b's equals the number of filters, which is not the same as the number of W's. The explanation in Tornadomeet's post is wrong on this point.

Of course, since there is also down-sampling, the sigmoid and the bias b can be deferred until after pooling, and then applied to form the next layer's input.
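
Putting the pieces together, here is a sketch of one such layer in the style of Theano's LeNet tutorial. The variable names are mine, and the pool_2d import path and argument names vary across Theano versions (older releases use theano.tensor.signal.downsample.max_pool_2d with ds= instead of ws=):

```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d  # older Theano: downsample.max_pool_2d

rng = np.random.RandomState(0)
x = T.tensor4('x')                             # (batch, maps, height, width)
W = theano.shared(rng.randn(6, 1, 5, 5))       # (filters, input maps, h, w)
b = theano.shared(np.zeros(6))                 # one bias per filter, not per weight

conv_out = T.nnet.conv2d(x, W)                 # (500,1,32,32) -> (500,6,28,28)
pooled = pool_2d(conv_out, ws=(2, 2), ignore_border=True)  # -> (500,6,14,14)

# Bias and sigmoid applied after pooling; dimshuffle broadcasts the 1-D b
# over (batch, map, height, width), adding b uniformly to every pixel.
output = T.nnet.sigmoid(pooled + b.dimshuffle('x', 0, 'x', 'x'))

layer = theano.function([x], output)
print(layer(rng.randn(500, 1, 32, 32)).shape)  # (500, 6, 14, 14)
```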

For the S2 => C3 step (6 maps => 16 maps), LeCun did not connect everything to everything; instead he used a partial connection scheme closer to biological vision. Refer to Tornadomeet's explanation.

This separates features better, but it is not very convenient to implement. So Theano simply ships a straightforward version using the fully-connected form of its conv2d function.

If I were to do it, I think I would first split the output maps, then hand-crank the 16 individual maps with conv2d and combine them into the next layer's input.
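
A rough sketch of that hand-rolled partial connection. The connection table below is a made-up abbreviation for illustration, not LeNet-5's actual S2 => C3 table (see the paper for the real, irregular pattern):

```python
import theano.tensor as T

# Hypothetical, abbreviated connection table: each output map reads a
# contiguous window of 3 of the 6 S2 maps. LeNet-5's real table has 16 entries.
windows = [(0, 3), (1, 4), (2, 5), (3, 6)]   # (start, stop) into the S2 maps

def partial_conv(s2, filters):
    """s2: (batch, 6, 14, 14); filters[i]: a shared variable of shape (1, 3, 5, 5)."""
    maps = []
    for (start, stop), f in zip(windows, filters):
        sub = s2[:, start:stop]               # only the connected input maps
        maps.append(T.nnet.conv2d(sub, f))    # -> (batch, 1, 10, 10)
    return T.concatenate(maps, axis=1)        # -> (batch, len(windows), 10, 10)
```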
