From AlexNet to MobileNet: An Introduction to Deep Neural Networks

Last Update: 2018-05-08 Source: Internet

Author: User



Summary: On March 13, 2018, Shen Junan of the Harbin Institute of Technology gave a community talk introducing typical deep neural network models. This article traces the development of deep neural networks and describes in detail the structure and characteristics of the representative model at each stage.

Here are the highlights of the video:

Guiding questions

A good way to learn is to start from questions, so this article is organized around the following three:
1. What is a DNN, how is it defined, and how does it differ from a CNN?
2. Why are DNNs so popular now, and how did they develop?
3. DNN architectures are very complex; how can a beginner actually try one out?
The mind map of this article is as follows:

Development history

DNN: definitions and concepts

In convolutional neural networks, convolution and pooling operations are stacked together in an organized way to form the backbone of the CNN.
Inspired by the multilayer processing between the macaque retina and the visual cortex, deep neural network architectures emerged and achieved good performance. A DNN is best understood as an architecture rather than a single model: a neural network that is deeper than earlier networks built from a few similar layers, usually reaching dozens of layers, or one composed of complex modules.

ILSVRC (the ImageNet Large Scale Visual Recognition Challenge) has driven this research every year. As models have grown deeper, the top-5 error rate has kept falling, reaching about 3.5% at the time of writing, while the human top-5 error rate on the ImageNet dataset is around 5.1%. On this benchmark, deep learning models now recognize images better than humans.
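To make the metric concrete: top-5 error counts a prediction as correct if the true class is anywhere among the model's five highest-scoring classes. The sketch below computes it in plain Python; the function name `top_k_error` and the scores are made up for illustration and do not come from any ILSVRC tooling.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for 3 samples over 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 2 is ranked 2nd -> top-5 hit
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],  # true class 5 is ranked last -> top-5 miss
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.10],  # true class 3 is ranked 1st -> hit
]
labels = [2, 5, 3]
print(top_k_error(scores, labels, k=5))  # 1 miss out of 3 samples
print(top_k_error(scores, labels, k=1))  # 2 misses out of 3 samples
```

Top-1 error is the same computation with `k=1`, which is why it is always at least as high as top-5 error.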

From Alexnet to Mobilenet

AlexNet

AlexNet was the first convolutional neural network to achieve a breakthrough result in large-scale computer vision.
AlexNet was proposed by Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton and won the ILSVRC 2012 championship with a top-5 error rate of only 15.3%, a major improvement over the 26.2% achieved by the runner-up using traditional methods.
Compared with the earlier LeNet, AlexNet stacks more convolutional layers, making the model deeper and wider, while GPUs made it possible to train it in an acceptable amount of time. Together these advances drove the development of convolutional neural networks and of deep learning as a whole.
Here is the architecture of AlexNet:

AlexNet's main features are:
1. It is trained on ImageNet, a dataset of about 15 million labeled images in about 22,000 categories that resembles complex real-world scenes.
2. It uses a deeper and wider CNN to increase learning capacity.
3. It uses ReLU as the activation function, which trains much faster than sigmoid.
4. It uses multiple GPUs to increase model capacity.
5. It introduces competition between neurons through local response normalization (LRN), which helps generalization and improves performance.
6. It uses dropout to randomly ignore some neurons during training, which reduces overfitting.
7. It uses data augmentation such as scaling, flipping, and cropping to reduce overfitting.
These are typical techniques in deep neural network applications.
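A small numeric sketch of two items in the list. The gradient comparison shows why ReLU trains faster than sigmoid: the sigmoid derivative is at most 0.25 and shrinks toward 0 for large inputs, while the ReLU derivative stays at 1 for every positive input. The `lrn` function applies local response normalization across channels at one spatial position; the constants k=2, n=5, α=1e-4, β=0.75 are the values reported in the AlexNet paper, and all function names here are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, close to 0 for large |x|


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # constant 1 for every positive input


def lrn(activations: list[float], k: float = 2.0, n: int = 5,
        alpha: float = 1e-4, beta: float = 0.75) -> list[float]:
    """Local response normalization across channels at one spatial position.

    activations[i] is the (ReLU) output of channel i. Each value is divided by
    (k + alpha * sum of squares over the n neighbouring channels) ** beta,
    so a strongly active channel suppresses its neighbours.
    """
    num_channels = len(activations)
    out = []
    for i, a in enumerate(activations):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        sq_sum = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


for x in (0.5, 2.0, 5.0):
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.4f}, relu'={relu_grad(x):.1f}")

print(lrn([relu(v) for v in (1.0, -3.0, 4.0, 10.0, 0.5)]))
```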
When AlexNet was developed, the GTX 580 GPU it was trained on had only 3 GB of memory, so the model was split across two GPUs in an innovative way. Its structure is as follows:
1. The first layer is a convolutional layer applied to the 224x224x3 input image: 96 kernels of size 11x11x3 with stride 4, followed by LRN and 3x3 max pooling with stride 2.
2. The second layer is a convolutional layer that convolves only the first-layer outputs on the same GPU: 256 kernels of size 5x5x48, followed by LRN and 3x3 max pooling with stride 2.
3. The third layer is a convolutional layer that convolves all of the second-layer outputs (from both GPUs): 384 kernels of size 3x3x256.
4. The fourth layer is a convolutional layer that convolves only the third-layer outputs on the same GPU: 384 kernels of size 3x3x192.
5. The fifth layer is a convolutional layer that convolves only the fourth-layer outputs on the same GPU: 256 kernels of size 3x3x192, followed by 3x3 max pooling with stride 2.
6. The sixth layer is a fully connected layer with 4,096 neurons.
7. The seventh layer is a fully connected layer with 4,096 neurons.
8. The eighth layer is a fully connected layer with 1,000 outputs feeding a softmax over the 1,000 categories.
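The layer list can be checked with the standard output-size formula (W - K + 2P) / S + 1. The sketch below traces spatial shapes through AlexNet's convolutional stack. It assumes a 227x227 input and the padding values used by common re-implementations (a 224x224 input does not divide evenly with an 11x11 kernel at stride 4 and no padding, which the code reports as an error), and it uses 3x3 max pooling with stride 2, the overlapping pooling described in the AlexNet paper. It ignores the two-GPU split, which changes how channels are grouped but not the total output shapes.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling window: (W - K + 2P) / S + 1."""
    span = size - kernel + 2 * pad
    if span % stride != 0:
        raise ValueError(f"window does not tile evenly: ({size} - {kernel} + 2*{pad}) / {stride}")
    return span // stride + 1


# (name, kernel, stride, pad, output channels); pooling keeps the channel count (None).
# The padding values are an assumption taken from common re-implementations.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(size: int = 227, channels: int = 3):
    """Return (name, height, width, channels) after each layer of the stack."""
    shapes = []
    for name, kernel, stride, pad, out_ch in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, pad)
        channels = out_ch if out_ch is not None else channels
        shapes.append((name, size, size, channels))
    return shapes


for name, h, w, c in trace_shapes():
    print(f"{name}: {h}x{w}x{c}")
# The flattened pool5 output (6 * 6 * 256 = 9216 values) feeds the 4,096-unit sixth layer.
```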
VGGNet
VGGNet is a CNN model proposed by the Visual Geometry Group at the University of Oxford. It won the ILSVRC 2014 localization task with a 25.3% error rate and placed second in classification, behind GoogLeNet, with a top-5 error rate of 7.32%.
VGGNet and GoogLeNet both independently adopted deeper networks, but their designs differ. VGGNet inherits AlexNet's design with several refinements:
1. Deeper networks: the commonly used 16-layer and 19-layer versions perform well.
2. Simpler structure: it uses only 3x3 convolution kernels and 2x2 max pooling, which makes it possible to study the relationship between depth and performance.
3. Influenced by Network in Network, some VGGNet configurations also use 1x1 convolution kernels.
4. It trains in parallel on multiple GPUs.
5. Local response normalization was dropped because it brought no clear benefit.
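One reason VGGNet can get away with only 3x3 kernels: a stack of small stride-1 convolutions covers the same receptive field as one larger kernel, with fewer weights and a nonlinearity between each layer. The sketch below checks this arithmetic; the channel count C = 256 is an arbitrary example and biases are ignored.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 convolutions of size kernel x kernel."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weights of one kernel x kernel conv mapping `channels` -> `channels` (bias ignored)."""
    return kernel * kernel * channels * channels


C = 256  # hypothetical channel count
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = n * conv_weights(3, C)
    single = conv_weights(rf, C)
    print(f"{n} x 3x3 -> {rf}x{rf} field: {stacked:,} vs {single:,} weights")
```

Two stacked 3x3 layers match a 5x5 field with 18C² instead of 25C² weights; three match a 7x7 field with 27C² instead of 49C².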
The network structure is roughly as follows:

In deep learning, techniques such as centering, rotation, horizontal and vertical shifts, and horizontal flipping are often used for data augmentation to reduce overfitting.
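A minimal sketch of three of these augmentations on a tiny single-channel image stored as nested lists. It interprets "centering" as subtracting the mean pixel value, which is an assumption, and omits rotation because that needs interpolation. Real pipelines would use an image library and random parameters; the function names here are illustrative.

```python
Image = list[list[int]]  # a single-channel image as rows of pixel values


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def shift(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Translate by dx columns (right if positive) and dy rows (down if positive)."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = img[src_y][src_x]
    return out


def center(img: Image) -> list[list[float]]:
    """Subtract the mean pixel value so the image is zero-centred (assumed meaning of 'centering')."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in img]


img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))   # [[3, 2, 1], [6, 5, 4]]
print(shift(img, dx=1, dy=0))  # [[0, 1, 2], [0, 4, 5]]
print(center(img))             # [[-2.5, -1.5, -0.5], [0.5, 1.5, 2.5]]
```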
ResNet
ResNet (residual neural network) was proposed by Kaiming He and colleagues at Microsoft Research Asia. Using residual units, they successfully trained a 152-layer deep neural network and won ILSVRC 2015 with a top-5 error rate of 3.57%, while using far fewer parameters than VGGNet.
ResNet was motivated by the following problem. Earlier studies showed that depth is critical to model performance, yet as depth keeps increasing, accuracy degrades. Surprisingly, this degradation is not caused by overfitting, because accuracy on the training set also drops. In principle a deeper model should do no worse: if the added layers were identity mappings, the training error would at least not increase.
The solution is to introduce residuals. Let the input to a block be x and the desired output be H(x). If x is passed directly to the output through an identity shortcut, the stacked nonlinear layers in between only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x. The authors hypothesize that optimizing the residual mapping is easier than optimizing the original mapping; in the extreme case where the identity is optimal, the residual F(x) can simply be pushed to 0:

The above is the residual unit of ResNet. Its benefit is that during backpropagation the gradient can flow directly to earlier layers through the shortcut, which mitigates vanishing gradients and makes much deeper networks trainable. ResNet also uses batch normalization, so residual units are easier to train and generalize better than earlier designs.
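A minimal sketch of the residual unit in plain Python, assuming fully connected weight layers in place of convolutions and leaving out batch normalization to keep it short; `residual_unit` and the weights are illustrative. Because the shortcut contributes an identity term, the derivative of F(x) + x with respect to x is 1 + dF/dx, so the gradient signal does not vanish even when dF/dx is small. With all-zero weights the residual branch outputs 0 and the unit passes its (non-negative) input straight through, which is the "residual pushed to 0" case from the text.

```python
def relu_vec(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def matvec(w: list[list[float]], v: list[float]) -> list[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_unit(x: list[float], w1: list[list[float]], w2: list[list[float]]) -> list[float]:
    """y = ReLU(F(x) + x), with residual branch F(x) = W2 . ReLU(W1 . x).

    The identity shortcut adds x back before the final ReLU, as in the
    original ResNet unit (batch normalization omitted in this sketch).
    """
    f = matvec(w2, relu_vec(matvec(w1, x)))
    return relu_vec([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_unit(x, zeros, zeros))  # F(x) = 0, so the unit outputs x: [1.0, 2.0, 0.5]

w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w2 = [[0.1, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, 0.2]]
print(residual_unit(x, w1, w2))  # a small learned correction on top of x
```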
GoogLeNet
GoogLeNet was proposed by Christian Szegedy et al. Its main idea is to use a deeper network to achieve better performance while keeping the computational cost down.
GoogLeNet builds on the Network in Network idea. In AlexNet, a convolutional layer computes inner products between linear convolution kernels and local image patches, followed by a nonlinear activation on each local output; the results are called feature maps. A convolution kernel is a generalized linear model, so feature extraction implicitly assumes the underlying features are linearly separable, which is generally not true in practice. To address this, Network in Network proposed using a multilayer perceptron to perform nonlinear convolution, which is equivalent to inserting 1x1 convolutions while keeping the feature map size unchanged.
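To see why a 1x1 convolution acts like a per-pixel fully connected layer: at every spatial location it applies the same linear map across channels, so the feature map's height and width are unchanged while the channel count can change. Stacking such layers with nonlinearities gives the small per-pixel MLP that Network in Network describes. A plain-Python sketch, with an illustrative feature map stored as [row][col][channel]:

```python
FeatureMap = list[list[list[float]]]  # indexed as [row][col][channel]


def conv1x1(fmap: FeatureMap, weights: list[list[float]], bias: list[float]) -> FeatureMap:
    """1x1 convolution: the same linear map (C_in -> C_out) applied at every pixel."""
    out = []
    for row in fmap:
        out_row = []
        for pixel in row:  # pixel is the C_in-length channel vector at one location
            out_row.append([
                sum(w * p for w, p in zip(w_out, pixel)) + b
                for w_out, b in zip(weights, bias)
            ])
        out.append(out_row)
    return out


def relu_map(fmap: FeatureMap) -> FeatureMap:
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in fmap]


# Hypothetical 2x2 feature map with 3 channels, reduced to 2 channels.
fmap = [[[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],
        [[3.0, 1.0, 0.0], [1.0, 1.0, 1.0]]]
weights = [[1.0, 0.0, -1.0],   # output channel 0
           [0.5, 0.5, 0.5]]    # output channel 1
bias = [0.0, 0.0]
out = relu_map(conv1x1(fmap, weights, bias))
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2: spatial size kept, channels 3 -> 2
```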

The benefits of using 1x1 convolutions are: they add local feature abstraction through nonlinear transformations, they can replace fully connected layers and so reduce overfitting, and they reduce dimensionality, which requires fewer parameters. In a sense, Network in Network confirmed that deeper networks perform better.
GoogLeNet stacks Inception modules to build a deeper, sparsely connected network that controls computational cost while maintaining model performance, making it better suited to inference in resource-constrained scenarios.
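The dimensionality-reduction benefit is easiest to see with arithmetic. The sketch below compares a direct 5x5 convolution with a 1x1 "bottleneck" that first shrinks the channel count, in the spirit of an Inception branch. The sizes (a 28x28x192 input, 16 bottleneck channels, 32 output channels) are illustrative assumptions, not figures from the text; biases are ignored.

```python
def conv_cost(h: int, w: int, c_in: int, c_out: int, k: int) -> tuple[int, int]:
    """(weights, multiply-accumulates) of a stride-1, same-padded k x k conv; bias ignored."""
    weights = k * k * c_in * c_out
    return weights, weights * h * w


H = W = 28                       # hypothetical feature-map size
C_IN, C_MID, C_OUT = 192, 16, 32  # hypothetical channel counts

direct = conv_cost(H, W, C_IN, C_OUT, k=5)
reduce_1x1 = conv_cost(H, W, C_IN, C_MID, k=1)
conv_5x5 = conv_cost(H, W, C_MID, C_OUT, k=5)
bottleneck = (reduce_1x1[0] + conv_5x5[0], reduce_1x1[1] + conv_5x5[1])

print(f"direct 5x5:   {direct[0]:,} weights, {direct[1]:,} MACs")
print(f"1x1 then 5x5: {bottleneck[0]:,} weights, {bottleneck[1]:,} MACs")
```

Here the bottleneck needs roughly a tenth of the weights and multiply-accumulates of the direct convolution while producing an output of the same shape.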
MobileNet
Traditional CNN models tend to focus on accuracy and are often impractical on mobile phones and embedded devices. To address this problem, Google proposed a new model architecture, MobileNet.
MobileNet is a small but high-performing CNN model that lets users run computer vision directly on mobile or embedded devices without relying on computing power in the cloud. As the computing power of mobile devices grows, MobileNet helps bring AI onto these devices.
MobileNet has the following characteristics: it uses depthwise separable convolutions to reduce the number of parameters and the computational cost; it introduces two global hyperparameters, a width multiplier and a resolution multiplier, that trade off latency against accuracy for mobile and embedded applications; and it achieves competitive performance, validated on ImageNet classification and shown to be practical in mobile applications such as object detection, fine-grained recognition, face attribute recognition, and large-scale geolocalization.
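The cost reduction can be checked with the counting used in the MobileNet paper: a standard Dk x Dk convolution with M input channels, N output channels, and a Df x Df output map costs Dk·Dk·M·N·Df·Df multiply-adds, while a depthwise separable one (a per-channel Dk x Dk depthwise step plus a 1x1 pointwise step) costs Dk·Dk·M·Df·Df + M·N·Df·Df, a ratio of 1/N + 1/Dk². The width multiplier α scales M and N, and the resolution multiplier ρ scales Df. The layer sizes below are illustrative.

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Dk x Dk conv, M input channels, N output channels, Df x Df output map."""
    return dk * dk * m * n * df * df


def separable_conv_macs(dk: int, m: int, n: int, df: int,
                        alpha: float = 1.0, rho: float = 1.0) -> float:
    """Depthwise (one Dk x Dk filter per channel) + pointwise (1x1, M -> N) cost.

    alpha is the width multiplier (thins channels), rho the resolution
    multiplier (shrinks the feature map); both default to 1, the base model.
    """
    m_a, n_a, df_r = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m_a * df_r * df_r
    pointwise = m_a * n_a * df_r * df_r
    return depthwise + pointwise


dk, m, n, df = 3, 512, 512, 14  # illustrative layer sizes
std = standard_conv_macs(dk, m, n, df)
sep = separable_conv_macs(dk, m, n, df)
print(f"standard: {std:,}  separable: {sep:,.0f}  ratio: {sep / std:.4f}")
print(f"1/N + 1/Dk^2 = {1 / n + 1 / dk**2:.4f}")
print(f"alpha=0.5, rho=0.5: {separable_conv_macs(dk, m, n, df, 0.5, 0.5):,.0f}")
```

With 3x3 kernels the separable version costs roughly 8 to 9 times less than a standard convolution, and halving both multipliers cuts it by a further factor of about 16.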

Understanding the implementation: style transfer with VGGNet
Style transfer is one of the most interesting applications of deep learning: it "transfers" the style of one image onto another to create a new image.
Deep learning has been especially successful in computer vision: image classification, recognition, localization, super-resolution, transformation, style transfer, captioning, and more can all be done with deep learning. The technology behind this can be summed up in one sentence: deep convolutional neural networks are very good at extracting image features.
The success of the style transfer algorithm rests mainly on two observations: 1. When two images are passed through a pre-trained classification network, the smaller the distance between their extracted high-level features, the more similar their content. 2. When two images are passed through a pre-trained classification network, the more nearly equal their low-level features are statistically, the more similar their styles. Based on these two points, a suitable loss function can be designed to optimize the network.
In a deep network, the deep convolutional layers extract features well, and features extracted at different layers have different meanings, so every well-trained network can be regarded as a good feature extractor. In addition, a deep network is built from layers of nonlinear functions, so it can be regarded as a complex multivariate nonlinear function that maps the input image to the output. As a result, a well-trained deep network can be used as a loss function calculator.

The model structure consists of two parts: an image transformation network T (Image Transform Net) and a pre-trained loss network, VGG-16. The image transformation network T takes the content image x as input and outputs the stylized image y'. The stylized image y', the content image y_c, and the style image y_s are then fed into VGG-16 to compute features.
In this network, the loss function for the final image y' has two parts: a content loss and a style loss.
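Content loss: the perceptual loss computed on the VGG-16 feature map of a layer j. Writing φ_j(x) for the C_j × H_j × W_j feature map of image x at that layer:

$$\ell_{content}(y', y_c) = \frac{1}{C_j H_j W_j} \left\lVert \phi_j(y') - \phi_j(y_c) \right\rVert_2^2$$

Style loss: computed from the Gram matrix G of the feature maps, which captures the correlations between channels:

$$G_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c}\,\phi_j(x)_{h,w,c'}$$

$$\ell_{style}(y', y_s) = \sum_{j} \left\lVert G_j(y') - G_j(y_s) \right\rVert_F^2$$

Total loss: a weighted sum of the two terms,

$$\mathcal{L}(y') = \lambda_c\, \ell_{content}(y', y_c) + \lambda_s\, \ell_{style}(y', y_s)$$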
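The content, style, and total losses can be sketched in plain Python. This follows the usual perceptual-loss formulation, which is an assumption about the exact normalization: the content loss is the squared distance between feature maps divided by C·H·W, the Gram matrix is also normalized by C·H·W, the style loss is the squared Frobenius norm of the Gram difference, and the total loss is a weighted sum. Small hand-written feature maps stored as [channel][row][col] stand in for VGG-16 activations, and a single layer is used for both terms (in practice the style term sums over several layers). The function names and the weights `lam_c`, `lam_s` are illustrative.

```python
Features = list[list[list[float]]]  # one layer's feature map, indexed [channel][row][col]


def gram(phi: Features) -> list[list[float]]:
    """G[c][c'] = sum over positions of phi[c] * phi[c'], divided by C*H*W."""
    c_n, h, w = len(phi), len(phi[0]), len(phi[0][0])
    flat = [[v for row in ch for v in row] for ch in phi]  # each channel as a vector
    norm = c_n * h * w
    return [[sum(a * b for a, b in zip(flat[i], flat[k])) / norm for k in range(c_n)]
            for i in range(c_n)]


def content_loss(phi_out: Features, phi_content: Features) -> float:
    """Squared Euclidean distance between feature maps, divided by C*H*W."""
    c_n, h, w = len(phi_out), len(phi_out[0]), len(phi_out[0][0])
    sq = sum((a - b) ** 2
             for ch_a, ch_b in zip(phi_out, phi_content)
             for row_a, row_b in zip(ch_a, ch_b)
             for a, b in zip(row_a, row_b))
    return sq / (c_n * h * w)


def style_loss(phi_out: Features, phi_style: Features) -> float:
    """Squared Frobenius norm of the Gram-matrix difference at one layer."""
    g_out, g_style = gram(phi_out), gram(phi_style)
    return sum((a - b) ** 2 for ra, rb in zip(g_out, g_style) for a, b in zip(ra, rb))


def total_loss(phi_out: Features, phi_content: Features, phi_style: Features,
               lam_c: float = 1.0, lam_s: float = 1.0) -> float:
    return lam_c * content_loss(phi_out, phi_content) + lam_s * style_loss(phi_out, phi_style)


a = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]  # 2 channels, 2x2
b = [[[0.0, 1.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]]]  # same values, pixels rearranged
print(content_loss(a, b))  # 2.5: the pixel layouts differ
print(style_loss(a, b))    # 0.0: the channel statistics are identical
```

The example shows the key property behind observation 2: the Gram matrix ignores where features occur, so two maps with the same channel statistics have zero style loss even though their content differs.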


This article is an English version of an article originally published in Chinese on aliyun.com.
