1. VGG16 2. VGG19 3. ResNet50 4. Inception V3 5. Xception Introduction -- Transfer Learning


ResNet, AlexNet, VGG, Inception: Understanding the various CNN architectures

This article is translated from "ResNet, AlexNet, VGG, Inception: Understanding various architectures of Convolutional Networks"; the original author retains copyright.

Convolutional neural networks deliver amazing performance on visual recognition tasks. A good CNN is a behemoth with millions of parameters and many hidden layers. In fact, a rough rule of thumb is: the deeper the network, the better the results. AlexNet, VGG, Inception, and ResNet are some of the most popular CNNs. Why do these networks perform so well? How are they designed? Why are they structured the way they are? Answering these questions is not easy, but here we try to discuss some of them. Network architecture design is a complex process that takes a while to learn, and even longer to experiment with on your own. First, let's discuss a basic question:

Why do CNN models beat traditional computer vision approaches?

Image classification is the task of assigning a given image to one of several predefined categories. The traditional image classification pipeline involves two modules: feature extraction and classification.

Feature extraction means deriving higher-level features from the raw pixels, features that can capture the differences between categories. Feature extraction is done in an unsupervised manner: information is extracted from the pixels without using the image's category label. Common traditional features include GIST, HOG, SIFT, and LBP. After feature extraction, a classification model is trained on these image features together with their category labels. Commonly used classification models include SVMs, logistic regression, random forests, and decision trees.

A big problem with this pipeline is that feature extraction cannot be adjusted based on the images and their labels. If the chosen features lack the representational power to distinguish the categories, the accuracy of the model suffers, regardless of the classification strategy you adopt. With the traditional pipeline, a common improvement is to use multiple feature extractors and combine their outputs into a better feature. But this requires many heuristic rules and a great deal of manual parameter tuning for each domain to reach an accuracy approaching human level. This is why it takes years of traditional computer vision work to build a good system (for OCR, face verification, image recognition, object detection, and so on) that can handle the wide variety of data seen in real-world applications. We once built a better-performing CNN model for a company in 6 weeks; achieving the same result with traditional computer vision techniques would have taken about a year.

Another problem with the traditional pipeline is that it is completely different from how humans learn to recognize objects. Since birth, a child perceives the surroundings, and as he grows, he encounters more data and learns to recognize objects. This is the philosophy behind deep learning: no hard-coded feature extractor is built in. It integrates the feature extraction and classification modules into one system, which learns to extract features from images and classify them based on labeled data.

Such an integrated system is a multilayer perceptron, a neural network with densely connected layers of neurons. A classic deep network has many parameters, and without enough training samples it is virtually impossible to train a model that doesn't overfit. With a CNN, however, you can train the network from scratch on a large dataset such as ImageNet. The reason lies in two properties of CNNs: weight sharing between neurons and sparse connections between convolutional layers. In a convolutional layer, each neuron is connected only locally to the neurons of the input layer, and the convolution kernel parameters are shared across the 2-D feature map.

To understand the design philosophy behind CNNs, you might ask: what is the goal?

(1) Accuracy

If you are building an intelligent system, the most important thing is of course for it to be as accurate as possible. To be fair, accuracy depends not only on the network but also on the amount of training data. Consequently, CNN models are typically compared on a standard dataset, ImageNet.

The ImageNet project is still growing; it currently has 14,197,122 images in 21,841 categories. Since 2010, the ImageNet image recognition competition has been held every year, using 1.2 million images across 1,000 categories drawn from the ImageNet dataset. Each network architecture reports its accuracy on these 1,000 classes using these 1.2 million images.

(2) Computational cost

Most CNNs demand large amounts of memory and computation, especially during training, so computational cost is an important concern. Similarly, if you want to deploy on mobile devices, the size of the final trained model also needs special consideration. As you can imagine, getting better accuracy requires a more compute-intensive network, so accuracy and computational cost must be traded off against each other.

Beyond these two factors, there are others to consider, such as ease of training and the generalization ability of the model. The most popular CNN architectures are introduced below in the order in which they appeared, and accuracy improves with each.

AlexNet

AlexNet was one of the first deep networks applied to ImageNet, and its accuracy was far higher than that of traditional methods. It consists of 5 convolutional layers followed by 3 fully connected layers.

Alex Krizhevsky's AlexNet uses the ReLU activation function rather than the tanh or sigmoid functions used in earlier traditional neural networks. Mathematically, ReLU is expressed as:

f(x) = max(0, x)

The advantage of ReLU over sigmoid is faster training: the derivative of sigmoid is very small in its saturated regions, so the weights essentially stop updating. This is the vanishing gradient problem. Hence AlexNet uses ReLU after both the convolutional layers and the fully connected layers.

Another feature of AlexNet is that it reduces overfitting by adding a dropout layer after each fully connected layer. A dropout layer randomly sets the activations of neurons in the current layer to zero with a certain probability.

Why is dropout effective?

The idea behind dropout is similar to model ensembles. In a dropout layer, different combinations of neurons are switched off; each combination represents a different architecture, and all of these architectures are trained in parallel on subsets of the data, with the weights summing to one. With n neurons attached to a dropout layer, 2^n different sub-architectures can be formed. At prediction time, this amounts to averaging over an ensemble of these models. This structured model regularization helps avoid overfitting. Another view of why dropout works is that because neurons are chosen at random, they avoid developing co-adaptations among themselves, which helps the network extract meaningful features that are independent of one another.
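As a concrete illustration, here is a minimal sketch (mine, not from the original article) of how dropout is attached to fully connected layers in Keras; the layer sizes follow AlexNet's classifier head, and 0.5 is the classic drop rate:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# a minimal sketch: dropout after each fully connected layer, AlexNet-style
model = Sequential()
model.add(Dense(4096, activation="relu", input_shape=(9216,)))
model.add(Dropout(0.5))  # randomly zero 50% of the activations during training
model.add(Dense(4096, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1000, activation="softmax"))

At test time Keras disables dropout automatically, which corresponds to averaging over the ensemble of sub-networks described above.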

VGG16

VGG16 was proposed by the VGG group at Oxford University. Compared with AlexNet, VGG16's key improvement is the use of several consecutive 3x3 convolution kernels in place of the larger kernels (11x11, 5x5) in AlexNet. For a given receptive field (the local region of the input image associated with an output), stacking small convolution kernels is preferable to using one large kernel: the multiple nonlinear layers increase the network depth, enabling it to learn more complex patterns, and at a lower cost (fewer parameters).

For example, three stride-1 3x3 convolution kernels acting in succession cover a 7x7 receptive field, with a total of 3 x (3^2 x C^2) = 27C^2 parameters; directly using a 7x7 kernel would require 7^2 x C^2 = 49C^2 parameters, where C is the number of input and output channels. Moreover, 3x3 kernels help preserve finer-grained properties of the image. The architecture of the VGG network is shown in the following table:

Looking at VGG-D, you can see the use of a block structure: convolution kernels of the same size are applied repeatedly to extract more complex and more expressive features. This structure of blocks/modules became widely used after VGG.
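To make the block idea concrete, here is a minimal sketch (mine, not from the article) of the first two VGG-style blocks in Keras; each block repeats 3x3 convolutions and then halves the feature map with max pooling:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# block 1: two 3x3 convolutions with 64 channels, then 2x2 max pooling
model.add(Conv2D(64, (3, 3), activation="relu", padding="same",
                 input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# block 2: the channel count doubles to 128 while the map size halves
model.add(Conv2D(128, (3, 3), activation="relu", padding="same"))
model.add(Conv2D(128, (3, 3), activation="relu", padding="same"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

Stacking three such 3x3 layers covers the same 7x7 receptive field as a single 7x7 layer, with 27C^2 instead of 49C^2 parameters, as computed above.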

The VGG convolutional layers are followed by 3 fully connected layers. The number of channels starts small at 64 and then doubles after each downsampling or pooling layer, while the size of the feature map halves. VGG's final top-5 accuracy on ImageNet is 92.3%.

GoogLeNet/Inception

Although VGG performs well on ImageNet, it is hard to deploy on even moderately sized GPUs because it is demanding in both memory and time. VGG is inefficient because of the large number of channels in its convolutional layers. For example, a 3x3 convolution with 512 input and 512 output channels requires computation on the order of 9x512x512 per output position.

In a convolution operation, each position on the output feature maps is connected to all input feature maps, a dense connection structure. GoogLeNet builds on the idea that most activations in a deep network are either unnecessary (zero) or redundant because of correlations between them. The most efficient deep architecture would therefore have sparse connections between activations, meaning that the 512 output feature maps need not be connected to all 512 input feature maps. There are pruning techniques for obtaining sparse weights or connections. However, multiplication with sparse convolution kernels is not optimized in BLAS or cuBLAS, so a sparse connection structure actually runs slower than a dense one.

Accordingly, GoogLeNet designed a module called Inception, which approximates a sparse CNN with dense structures. As noted above, only a small number of neurons are really effective, so the number of convolution kernels of any given size is kept small. GoogLeNet also uses convolution kernels of different sizes to capture receptive fields of different scales.

Another feature of the Inception module is its use of a bottleneck layer (actually a 1x1 convolution) to reduce the amount of computation:

Suppose the input to an Inception module has 192 channels, and the module uses 128 3x3 kernels and 32 5x5 kernels. The computation for the 5x5 convolution is then on the order of 25x32x192 per position. As the network grows deeper, the numbers of channels and kernels increase, and with them the computation. To avoid this, the number of input channels is reduced before the large kernels are applied. So in the Inception module, the input is first fed to a 1x1 convolutional layer with only 16 kernels and then passed to the 5x5 convolutional layer. This reduces the overall computation to 16x192 + 25x32x16, allowing the network to afford a larger number of channels. (Translator's note: the 1x1 convolutional layer is called a bottleneck because it has the fewest channels in the Inception module, like the narrowest part of a bottle.)
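Here is a minimal sketch (mine) of an Inception-style module with 1x1 bottlenecks, written with the Keras functional API; the 16- and 32-kernel counts follow the 192-channel example above, while the other branch sizes are illustrative:

from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

inputs = Input(shape=(28, 28, 192))  # a 192-channel input feature map

# branch 1: a plain 1x1 convolution
b1 = Conv2D(64, (1, 1), activation="relu", padding="same")(inputs)

# branch 2: 1x1 bottleneck, then the 128 3x3 kernels
b2 = Conv2D(96, (1, 1), activation="relu", padding="same")(inputs)
b2 = Conv2D(128, (3, 3), activation="relu", padding="same")(b2)

# branch 3: 1x1 bottleneck with 16 kernels, then the 32 5x5 kernels,
# cutting the 5x5 cost from 25x32x192 to 16x192 + 25x32x16 per position
b3 = Conv2D(16, (1, 1), activation="relu", padding="same")(inputs)
b3 = Conv2D(32, (5, 5), activation="relu", padding="same")(b3)

# branch 4: 3x3 max pooling followed by a 1x1 projection
b4 = MaxPooling2D((3, 3), strides=(1, 1), padding="same")(inputs)
b4 = Conv2D(32, (1, 1), activation="relu", padding="same")(b4)

# concatenate all branches along the channel axis
outputs = concatenate([b1, b2, b3, b4])
module = Model(inputs, outputs)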

Another special design in GoogLeNet is that the final convolutional layer is followed by a global average pooling layer instead of a fully connected layer; global pooling takes the mean over each entire 2D feature map. This greatly reduces the number of parameters in the model. Consider that in AlexNet, the fully connected layers account for about 90% of all network parameters. Using a deeper and wider network allowed GoogLeNet to remove the fully connected layers without hurting accuracy. Its top-5 accuracy on ImageNet is 93.3%, and it is also faster than VGG.
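As a rough sketch of that replacement in Keras (the shapes here are assumed for illustration), global average pooling collapses each feature map to one value and adds no parameters at all:

from keras.layers import Input, GlobalAveragePooling2D, Dense
from keras.models import Model

features = Input(shape=(7, 7, 1024))          # final convolutional feature maps
x = GlobalAveragePooling2D()(features)        # (7, 7, 1024) -> (1024,), 0 parameters
preds = Dense(1000, activation="softmax")(x)  # only 1024x1000 + 1000 weights remain
head = Model(features, preds)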

ResNet

As we have seen, network accuracy should increase with depth, as long as overfitting is kept in check. The problem with increased depth is that the added layers must propagate the signal for parameter updates: the gradient is back-propagated from later layers to earlier ones, and with greater depth the gradients at the earlier layers become very small. This means those layers essentially stop learning, which is the vanishing gradient problem. A second problem with deep networks is optimization: a deeper network means a larger parameter space, so the optimization problem becomes harder, and simply increasing depth can actually lead to higher training error. The residual network, ResNet, introduces a residual module that lets us train much deeper networks.

This difficulty of training deep networks is called the degradation problem. The intuition for why the residual unit can solve it is as follows. Consider a network A with training error x. Build network B by stacking more layers on top of A, with the new layers doing nothing but copying A's output; call these new layers C. Then network B should have the same training error as A, and if you train network B, its training error should be no worse than A's. But in practice it is worse, and the only reason is that it is not easy for the extra layers C to learn the identity map. To solve this degradation problem, the residual module adds a direct connection between its input and output, so that the new layers C only need to learn what to add on top of the original input, that is, to learn the residual, which is comparatively easy.

Like GoogLeNet, ResNet ends with a global average pooling layer. With the residual module, a 152-layer residual network can be trained. Its accuracy is higher than that of VGG and GoogLeNet, while being more computationally efficient than VGG. The 152-layer ResNet achieves a top-5 accuracy of 95.51%.

ResNet mainly uses 3x3 convolutions, similar to VGG. On top of a VGG-like plain network, shortcut connections are inserted to form the residual network.
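Here is a minimal sketch (mine) of a basic residual block in the Keras functional API; the batch normalization placement follows the 2015 paper, and the shapes are illustrative:

from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
from keras.models import Model

inputs = Input(shape=(56, 56, 64))

# the residual branch: two stacked 3x3 convolutions
x = Conv2D(64, (3, 3), padding="same")(inputs)
x = BatchNormalization()(x)
x = Activation("relu")(x)
x = Conv2D(64, (3, 3), padding="same")(x)
x = BatchNormalization()(x)

# the shortcut: add the block input directly to the branch output,
# so the branch only has to learn the residual
x = add([x, inputs])
outputs = Activation("relu")(x)
block = Model(inputs, outputs)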

Experiments on residual networks show that a plain 34-layer network has higher training error than a plain 18-layer network, which is the degradation problem mentioned above, while the 34-layer residual network has lower training error than the 18-layer residual network.

Summary

As more and more complex architectures are proposed, some networks may fall out of favor within a few years, but the design philosophies behind them are worth learning from. This article has summarized the design principles of the most popular recent CNN architectures. Translator's note: you can see that network depth keeps increasing in pursuit of better accuracy, and that architectures tend toward smaller convolution kernels, such as 1x1 and 3x3, showing that CNN design values computational efficiency. One obvious trend is the use of modular structure, as seen in GoogLeNet and ResNet; this is a good design pattern, since a modular structure shrinks the network design space, and using bottlenecks inside modules reduces computation, a further advantage. This article does not cover the recent lightweight CNN models aimed at mobile, such as MobileNet, SqueezeNet, and ShuffleNet, which are very small and computationally efficient enough to meet mobile demands, striking a balance between accuracy and speed.

Image recognition is the mainstream application of deep learning today, and Keras is the easiest and most convenient deep learning framework to get started with, so getting up to speed on image recognition should be fast, not a grind. This article lets you work through five popular network architectures in the shortest possible time and quickly reach the forefront of image recognition technology.


A few months ago, I wrote a tutorial on how to classify images using pre-trained convolutional neural network models (especially VGG16) trained on the ImageNet dataset, using Python and the Keras deep learning library.

These pre-trained models, now integrated into Keras (they were formerly separate from it), can identify objects from 1,000 categories that we see in daily life (such as puppies and kittens) with very high accuracy.

Previously, the pre-trained ImageNet models were separate from the Keras library: we had to clone a standalone GitHub repo, maintained separately, and add it to the project.

However, since my tutorial was published, the pre-trained models (VGG16, VGG19, ResNet50, Inception V3, and Xception) have been fully integrated into the Keras library (no separate repo needs to be cloned); the link below shows the integrated model code. So I am writing a new tutorial that demonstrates how to use these state-of-the-art models.

https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py

Specifically, we will write a Python script that can load these network models, with TensorFlow or Theano as the backend, and then make predictions on your test images.

VGGNet, ResNet, Inception, and Xception with Keras

In the first half of this tutorial, we briefly discuss the VGG, ResNet, Inception, and Xception model architectures included in the Keras library.

Then, using Keras, we write a Python script that loads these pre-trained network models from disk and makes predictions on a test set.

Finally, we look at the classification results on several sample images.

The best deep learning image classifiers in Keras

The following five convolutional neural network models are included in the Keras library and available out of the box:

1. VGG16
2. VGG19
3. ResNet50
4. Inception V3
5. Xception

We start with an overview of the ImageNet dataset and then briefly discuss each model architecture.

What is ImageNet?

ImageNet is a manually labeled image database (intended for machine vision research) that currently has 22,000 categories.

However, when we hear the term "ImageNet" in the context of deep learning and convolutional neural networks, we usually mean the ImageNet Large Scale Visual Recognition Challenge, ILSVRC.

The goal of this image classification challenge is to train a model that correctly classifies an input image into one of 1,000 categories, with 1.2 million training images, 50,000 validation images, and 100,000 test images.

These 1,000 image categories are things we encounter in daily life, such as dogs, cats, various household items, vehicle types, and so on. The full list of image categories in the ILSVRC contest is here:

http://image-net.org/challenges/LSVRC/2014/browse-synsets

For image classification, accuracy on the ImageNet challenge has become the benchmark for computer vision classification algorithms. Since 2012, convolutional neural networks and deep learning have dominated the leaderboards of this contest.

Keras includes several of the best-performing CNN (convolutional neural network) models from recent years of the ImageNet challenge. Through transfer learning techniques (feature extraction and fine-tuning), these models generalize well to datasets beyond ImageNet.

VGG16 and VGG19

The VGG model architecture was introduced by Simonyan and Zisserman in their 2014 paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".

Paper Address: https://arxiv.org/abs/1409.1556

The VGG structure is simple and effective: the early layers use only 3x3 convolution kernels to increase depth, max pooling successively reduces the number of neurons per layer, and the last three layers are two fully connected layers with 4,096 neurons each plus a softmax layer.

"16" and "19" indicate the number of network layers in the network that need to be updated weight (the parameters to learn) (the columns D and E in Figure 2 below), including convolution layer, full join layer, Softmax layer:

Figure from "Very Deep Convolutional Networks for Large-Scale Image Recognition", Simonyan & Zisserman (2014)

In 2014, 16- and 19-layer networks were considered very deep, though compared with the current ResNet architecture, which reaches 50-200 layers on ImageNet and 1,000+ layers on CIFAR-10, they no longer seem so.

Simonyan and Zisserman found training VGG16 and VGG19 difficult (in particular, getting the deeper networks to converge). To make training easier, they first trained smaller networks with fewer weight layers (columns A and C in Figure 2).

Once the smaller networks converged, their weights were used to initialize the weights of the deeper networks; this is pre-training. That may not sound like a problem, but pre-training takes a long time before the full model can even begin training.

Today, in most cases, we prefer Xavier/Glorot initialization or MSRA initialization over pre-trained model initialization. Reading the paper All You Need Is a Good Init gives a deeper understanding of the importance of weight initialization to deep network convergence.

MSRA initialization: https://arxiv.org/abs/1502.01852
All You Need Is a Good Init: https://arxiv.org/abs/1511.06422

Unfortunately, VGG has two big drawbacks:

1. The network weights are quite large and consume considerable disk space.
2. Training is very slow.

Because of the number of fully connected nodes and the depth of the network, the VGG16 weight file is over 533MB and VGG19's is 574MB, which makes deploying VGG time-consuming. We still use VGG for many deep learning image classification problems; however, smaller network architectures (such as SqueezeNet and GoogLeNet) are usually more desirable.

ResNet (Residual network)

Unlike traditional sequential network architectures such as AlexNet, OverFeat, and VGG, ResNet adds identity mapping (y=x) shortcut layers, which keep the network from degrading as depth increases. In the building block shown, the input passes through two weight layers and is then added back to the input, forming a micro-architecture module; the full ResNet is ultimately composed of many such modules.

In the 2015 "deep residual learning for Image recognition" paper, he and others first proposed that the Resnet,resnet architecture has become a meaningful model, It can train very deep networks by using the residuals module and regular SGD (which requires a reasonable initialization of weight):

Paper Address: https://arxiv.org/abs/1512.03385

A later paper, "Identity Mappings in Deep Residual Networks", published in 2016, showed that even higher accuracy can be achieved by updating the residual module to use identity mappings.

Paper Address: https://arxiv.org/abs/1603.05027

(left) original residual module; (right) updated residual module

It is important to note that the ResNet50 implementation (50 weight layers) in the Keras library is based on the earlier 2015 paper.

Even though ResNet is much deeper than VGG16 and VGG19, the model size is actually quite small; using global average pooling instead of fully connected layers reduces the model size to 102MB.

Inception V3

The "Inception" micro-architecture was first proposed by Szegedy and others in the 2014 paper "going deeper with convolutions".

Paper Address: https://arxiv.org/abs/1409.4842

The original Inception module used in GoogLeNet

The Inception module acts as a "multi-level feature extractor": it computes 1x1, 3x3, and 5x5 convolutions and concatenates their outputs to form the input to the next layer.

This architecture, formerly called GoogLeNet, is now simply called Inception vN, where N refers to the version number assigned by Google. The Inception V3 architecture in the Keras library is based on the later paper by Szegedy et al., "Rethinking the Inception Architecture for Computer Vision", which proposed updates to the Inception module that further improved ImageNet classification accuracy. Inception V3 has fewer weights than both VGG and ResNet, at 96MB.

Paper Address: https://arxiv.org/abs/1512.00567

Xception

Xception Architecture

Xception was created by François Chollet himself (the maintainer of Keras). Xception is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.

The original paper, "Xception: Deep Learning with Depthwise Separable Convolutions", is here:

Paper Address: https://arxiv.org/abs/1610.02357

Xception has the smallest weight file of the set, at only 91MB.

What about SqueezeNet?

The "Fire" model of Squeezenet

The SqueezeNet architecture achieves AlexNet-level accuracy with a model size of only 4.9MB, by using "squeeze" and "expand" layers (combinations of 1x1 and 3x3 convolution kernels).

Although the SqueezeNet model is very small, training it takes skill. In my forthcoming book, "Deep Learning for Computer Vision with Python", I explain in detail how to train SqueezeNet from scratch on the ImageNet dataset.

Using Python and the Keras library to classify images

Let's learn how to use the pre-trained convolutional neural network model in the Keras Library for image classification.

Create a new file named classify_image.py, and enter the following code:
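The original code listings were not preserved in this copy, so each step below is accompanied by a sketch reconstructed from the description that follows; the line numbers cited in the text refer to the original script. The script begins with the imports:

# import the necessary packages
from keras.applications import ResNet50
from keras.applications import InceptionV3
from keras.applications import Xception  # TensorFlow backend only
from keras.applications import VGG16
from keras.applications import VGG19
from keras.applications import imagenet_utils
from keras.applications.inception_v3 import preprocess_input
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
import numpy as np
import argparse
import cv2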

Lines 2-13 import the required Python packages, most of which belong to the Keras library.

Specifically, lines 2-6 import ResNet50, Inception V3, Xception, VGG16, and VGG19, respectively.

Note that the Xception network works only with the TensorFlow backend (the class throws an error if you use the Theano backend).

Line 7 imports the imagenet_utils module, which contains convenience functions for preprocessing input images and decoding output classifications.

We also import a few helper functions, followed by NumPy for numerical processing and cv2 (OpenCV) for image handling.

Next, parse the command-line arguments:
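A sketch of the argument parsing described below:

# parse the command line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
                help="name of the pre-trained network to use")
args = vars(ap.parse_args())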

We need only one required command line argument, --image, the path of the input image to classify.

We also accept an optional command line argument, --model, which specifies the pre-trained model to use; vgg16 is used by default.

Given the pre-trained model name supplied on the command line, we need to define a Python dictionary that maps each model name (a string) to its actual Keras class.
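A sketch of that dictionary, using the model names the rest of this post assumes:

# map model name strings to their Keras classes
MODELS = {
    "vgg16": VGG16,
    "vgg19": VGG19,
    "inception": InceptionV3,
    "xception": Xception,  # TensorFlow backend only
    "resnet": ResNet50
}

# raise an AssertionError if the --model name is not a valid key
if args["model"] not in MODELS.keys():
    raise AssertionError("The --model command line argument should "
                         "be a key in the MODELS dictionary")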

Lines 25-31 define the MODELS dictionary, mapping each model name string to the corresponding class.

If the --model name is not found in MODELS, an AssertionError is raised (lines 34-36).

A convolutional neural network takes an image as input and returns a set of probabilities corresponding to the class labels as output.

Typical CNN input image sizes are 224x224, 227x227, 256x256, and 299x299, but other sizes are possible.

VGG16, VGG19, and ResNet all accept 224x224 input images, while Inception V3 and Xception require 299x299 pixel inputs, as shown in the following code block:
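A sketch of that logic:

# initialize the input shape (224x224) and the default pre-processing function
inputShape = (224, 224)
preprocess = imagenet_utils.preprocess_input

# Inception V3 and Xception expect 299x299 inputs and use a separate
# pre-processing function that scales the pixels differently
if args["model"] in ("inception", "xception"):
    inputShape = (299, 299)
    preprocess = preprocess_input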

We initialize inputShape to 224x224 pixels and use the preprocess_input function to perform mean subtraction.

However, if we are using Inception or Xception, we set inputShape to 299x299 pixels and use a separate preprocess_input function that scales the image in a different way.

The next step is to load the pre-trained model weights from disk and instantiate the model:
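A sketch of the loading step:

# load the network weights from disk (downloaded and cached automatically
# the first time the script runs)
print("[INFO] loading {}...".format(args["model"]))
Network = MODELS[args["model"]]
model = Network(weights="imagenet")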

Line 58 gets the model name from the --model command line argument and maps it to the corresponding class via the MODELS dictionary.

Line 59 then instantiates the convolutional neural network with the pre-trained ImageNet weights.

Note: the weight files for VGG16 and VGG19 are larger than 500MB; ResNet's is about 100MB, and Inception's and Xception's are between 90 and 100MB. If this script is run for the first time, these weight files are automatically downloaded and cached to local disk. Depending on your network speed, this may take some time. Once the weight files are downloaded, however, they will not need to be downloaded again, and running classify_image.py again will be much faster.

The model is now loaded and ready for image classification; we just need to prepare the image:
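A sketch of the image preparation described below:

# load the input image at the required size and convert it to an array
print("[INFO] loading and pre-processing image...")
image = load_img(args["image"], target_size=inputShape)
image = img_to_array(image)

# add a batch dimension: (inputShape[0], inputShape[1], 3)
# becomes (1, inputShape[0], inputShape[1], 3)
image = np.expand_dims(image, axis=0)

# normalize the pixels with the appropriate pre-processing function
image = preprocess(image)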

Line 65 loads the input image from disk, resizing its width and height to inputShape.

Line 66 converts the image from a PIL/Pillow instance to a NumPy array.

The input image is now represented as a NumPy array with shape (inputShape[0], inputShape[1], 3).

On line 72, because we usually train/classify images in batches with convolutional neural networks, we add an extra dimension (the batch dimension) to the array via np.expand_dims.

After np.expand_dims, the image has shape (1, inputShape[0], inputShape[1], 3). Without this extra dimension, calling .predict() would cause an error.

Finally, line 76 calls the appropriate preprocessing function to perform data normalization.

We then run the model prediction and get the output classification:
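A sketch of the prediction step:

# classify the image and decode the predictions into readable labels
print("[INFO] classifying image with '{}'...".format(args["model"]))
preds = model.predict(image)
P = imagenet_utils.decode_predictions(preds)

# print the top-5 predictions and their probabilities to the terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
    print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))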

Line 80 calls .predict() on the CNN to obtain the predictions. These predictions are passed to the ImageNet helper function decode_predictions, which yields the ImageNet class labels (converting class IDs to human-readable names) and the probability associated with each label.

The top 5 predictions (the labels with the highest probabilities) are then printed to the terminal on lines 85 and 86.

Before ending the example, the last thing we do here is load the input image from disk via OpenCV, draw the top prediction on the image, and display the image on screen:
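A sketch of the display step:

# load the original image via OpenCV, draw the top prediction on it,
# and display it on screen
orig = cv2.imread(args["image"])
(imagenetID, label, prob) = P[0][0]
cv2.putText(orig, "Label: {}".format(label), (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)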

To see the actual operation of the pre-trained model, see the next section.

Classification results for VGGNet, ResNet, Inception, and Xception

All the examples in this post use Keras >= 2.0 with the TensorFlow backend. If you use TensorFlow, make sure to use a version >= 1.0, or you will run into errors. I also tested the script with the Theano backend and confirmed that it works.

After installing TensorFlow/Theano and Keras, click the source code + sample images link at the bottom to download them.

Now we can classify an image with VGG16:
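An example invocation (the image filename here is illustrative):

$ python classify_image.py --image soccer_ball.jpg --model vgg16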

We can see that VGG16 correctly classifies the image as "soccer ball" with 93.43% probability.

To use VGG19, we only need to change the --model command line argument:

VGG19 correctly classifies the input image as "convertible" with 91.76% probability. Looking at the other top-5 predictions: "sports car" at 4.98% (the car is actually a sedan), "limousine" at 1.06% (incorrect but plausible-looking), and "wheel" at 0.75% (also correct from the model's perspective, since the image contains wheels).

In the following example, we use the pre-trained ResNet architecture and look at the top-5 probabilities:

ResNet correctly classifies this image of Clint Eastwood holding a gun as "revolver" with 69.79% probability. In the top-5, "rifle" gets 7.74% and "submachine gun" 5.63%. Given the viewing angle of the revolver and the length of its barrel, a CNN can easily mistake it for a rifle, so the rifle probability is also relatively high.

The next example uses ResNet to classify a dog's image:

The dog's breed is correctly identified as "beagle", with 94.48% probability.

Next, I try to classify this image of Johnny Depp, the Pirates of the Caribbean actor:

Although there is a "boat" class in imagenet, it is interesting that the inception network can correctly identify the scene as "Wreck" and has a 96.29% probability. All other predictive labels, including "Waterfront", "canoe", "paddle" and "breakwater" are related and in some cases are absolutely correct.

For another example of the Inception network, I took a photo of my office couch:

Inception correctly predicts a "table lamp" in the image with 69.68% probability. The other top-5 predictions are also spot-on, including "studio couch", "curtain" (at the far right of the image, barely noticeable), "lampshade", and "pillow".

Although Inception is not used as an object detector, it can still predict the top 5 objects in the image. Convolutional neural networks do an excellent job of object recognition!

Next, let's look at Xception:

Here we have a picture of a Scotch whisky barrel, holding in particular my favorite Scotch, Lagavulin. Xception correctly classifies this image as "barrel".

The last example again classifies with VGG16:

A few months ago, after finishing a playthrough of The Witcher 3: Wild Hunt, I took this photo of my monitor. VGG16's first prediction is "home theater", a reasonable prediction given that "television/monitor" is also in the top-5 predictions.

As the examples in this article show, models pre-trained on the ImageNet dataset recognize a wide variety of common everyday objects. You can use this code in your own projects!

Summary

To briefly recap, in today's blog post we covered the five convolutional neural network models available in Keras:

VGG16
VGG19
ResNet50
Inception V3
Xception

We then demonstrated how to classify images using these neural network models. I hope this article has been helpful to you.

Original address:
http://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception ...

