[Repost] Stop procrastinating: become an image recognition expert after reading this


Image recognition is the mainstream application of deep learning today, and Keras is the easiest and most convenient deep learning framework to get started with, so getting up to speed on image recognition should be quick rather than a grind. This article walks you through five popular network architectures in the shortest possible time and brings you quickly to the forefront of image recognition technology.

Author | Adrian Rosebrock

Translator | Guo Hongguang

Edit | Pigeons

Translation Address: https://cloud.tencent.com/developer/article/1111154

Original address: http://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/

A few months ago, I wrote a tutorial on how to classify images using convolutional neural networks (specifically VGG16) pre-trained on the ImageNet dataset, with Python and the Keras deep learning library.

These pre-trained models, which are now integrated into Keras (they used to be separate from it), can recognize 1,000 categories of objects that we see in our daily lives (such as puppies, kittens, etc.) with very high accuracy.

Previously, the pre-trained ImageNet models were separate from the Keras library, so we had to clone a standalone GitHub repo and add it to the project. Maintaining the models in a separate GitHub repo worked well enough.

However, since my tutorial was published, the pre-trained models (VGG16, VGG19, ResNet50, Inception V3, and Xception) have been fully integrated into the Keras library (no separate repo needs to be cloned), and the link below shows where the integrated models live. I have therefore decided to write a new tutorial that demonstrates how to use these state-of-the-art models.

https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py

Specifically, we will write a Python script that loads these network models, using either the TensorFlow or Theano backend, and then classifies your own test images.

VGGNet, ResNet, Inception, and Xception with Keras

In the first half of this tutorial, we briefly discuss the VGG, ResNet, Inception, and Xception model architectures included in the Keras library.

Then we use Keras to write a Python script that loads these pre-trained network models from disk and classifies the test images.

Finally, we review the classification results on several sample images.

State-of-the-art deep learning image classifiers in Keras

The following five convolutional neural network models are already in the Keras library and are available out of the box:

    1. VGG16
    2. VGG19
    3. ResNet50
    4. Inception V3
    5. Xception

We start with an overview of the ImageNet dataset and then briefly discuss each model architecture.

What is ImageNet?

ImageNet is a manually labeled image database (for computer vision research) that currently has about 22,000 categories.

However, when we hear the word "ImageNet" in the context of deep learning and convolutional neural networks, we are most likely referring to the ImageNet Large Scale Visual Recognition Challenge, or ILSVRC for short.

The goal of this image classification challenge is to train a model that correctly classifies an input image into one of 1,000 categories. The training set has about 1.2 million images, the validation set 50,000, and the test set 100,000.

These 1,000 image categories are things we encounter in our daily lives, such as dogs, cats, various household items, vehicle types, etc. The complete list of image categories in the ILSVRC challenge is here:

http://image-net.org/challenges/LSVRC/2014/browse-synsets

For image classification, accuracy on the ImageNet challenge has become the de facto benchmark for computer vision classification algorithms. Since 2012, convolutional neural networks and deep learning techniques have dominated the leaderboard of this challenge.

Keras includes several of the best-performing CNN (convolutional neural network) models from the last few years of the ImageNet challenge. These models generalize well to datasets other than ImageNet through transfer learning (feature extraction and fine-tuning).

VGG16 and VGG19

The VGG model architecture was introduced by Simonyan and Zisserman in their 2014 paper "Very Deep Convolutional Networks for Large Scale Image Recognition".

Paper Address: https://arxiv.org/abs/1409.1556

The VGG model structure is simple and effective: the early layers use only 3x3 convolution kernels stacked to increasing depth, max pooling shrinks the feature maps step by step, and the last three layers are two fully connected layers with 4,096 neurons each followed by a softmax layer.

The "16" and "19" stand for the number of weight layers in the network, that is, layers with parameters to learn (columns D and E in Figure 2 below), counting the convolutional layers, the fully connected layers, and the softmax layer:

Figure 2: Table 1 of "Very Deep Convolutional Networks for Large Scale Image Recognition", Simonyan & Zisserman (2014)
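The layer counts behind the names can be checked with a few lines of Python. This is only a sketch: the channel lists below are transcribed from configurations D and E of the VGG paper, and the names VGG_CONFIGS and count_weight_layers are made up for this illustration.

```python
# Configurations D (VGG16) and E (VGG19) from Table 1 of the VGG paper.
# An integer is the number of output channels of a 3x3 convolution;
# "M" marks a max-pooling layer, which has no learnable weights.
VGG_CONFIGS = {
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

# Two 4,096-unit fully connected layers plus the final 1,000-way layer
# that feeds the softmax.
NUM_DENSE_LAYERS = 3


def count_weight_layers(config):
    """Count layers with learnable weights (pooling layers are skipped)."""
    num_conv = sum(1 for entry in config if entry != "M")
    return num_conv + NUM_DENSE_LAYERS


for name, config in VGG_CONFIGS.items():
    print(name, count_weight_layers(config))
```

VGG16 comes out as 13 convolutional layers plus 3 dense layers, and VGG19 as 16 plus 3.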

In 2014, 16- and 19-layer networks were considered very deep, but today's ResNet architecture can be trained at depths of 50-200 layers on ImageNet and more than 1,000 layers on CIFAR-10.

Simonyan and Zisserman found it difficult to train VGG16 and VGG19 (especially to get the deeper networks to converge). To make training easier, they first trained smaller models with fewer weight layers (columns A and C in Figure 2).

Once the smaller networks converged, their weights were used to initialize the deeper networks; this process is called pre-training. It makes sense logically, but a pre-training model takes a long time to train before it can be used to initialize a deeper network.

In most cases, instead of initializing from a pre-trained model, we now prefer Xavier/Glorot initialization or MSRA initialization. The paper "All you need is a good init" gives a deeper understanding of the importance of weight initialization for the convergence of deep neural networks.

MSRA initialization: https://arxiv.org/abs/1502.01852

All you need is a good init: https://arxiv.org/abs/1511.06422

Unfortunately, VGG has two big drawbacks:

    1. The network weights are quite large and consume a lot of disk space.
    2. Training is very slow.

Because of its depth and its number of fully connected nodes, VGG16 is over 533MB and VGG19 is 574MB. This makes deploying VGG more time consuming. We still use VGG in many deep learning image classification problems; however, smaller network architectures are usually more desirable (e.g. SqueezeNet, GoogLeNet, etc.).
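To see why the fully connected layers dominate the size, the parameters of VGG16 can be counted by hand. The sketch below assumes a 224x224 RGB input and 1,000 output classes; the helper names are hypothetical and are not part of Keras.

```python
# Configuration D (VGG16): integers are 3x3 conv output channels, "M" is max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def vgg16_param_counts(input_channels=3, input_size=224, num_classes=1000):
    """Return (convolutional parameters, fully connected parameters)."""
    channels, size, conv_total = input_channels, input_size, 0
    for entry in VGG16_CONV:
        if entry == "M":
            size //= 2  # each 2x2 max pooling halves height and width
        else:
            conv_total += conv_params(channels, entry)
            channels = entry
    flattened = size * size * channels  # 7 * 7 * 512 for a 224x224 input
    dense_total = (dense_params(flattened, 4096)
                   + dense_params(4096, 4096)
                   + dense_params(4096, num_classes))
    return conv_total, dense_total


conv_total, dense_total = vgg16_param_counts()
total = conv_total + dense_total
print("convolutional layers:   ", conv_total)
print("fully connected layers: ", dense_total)
print("share of dense layers:   {:.1%}".format(dense_total / total))
```

Under these assumptions, close to 90% of the roughly 138 million parameters sit in the three fully connected layers. At 4 bytes per float32 weight, that puts the weight file on the order of half a gigabyte.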

ResNet (Residual Network)

Unlike traditional sequential network architectures such as AlexNet, OverFeat, and VGG, ResNet adds a y = x shortcut (an identity mapping), which keeps the network from degrading as its depth increases. In the building block shown in the figure, the input passes through two weight layers and is then added back to their output, forming a micro-architecture module. A ResNet is ultimately built by stacking many of these micro-architecture modules.
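The idea of adding the input back onto the output of two weight layers can be sketched in plain Python. This is a toy version under simplifying assumptions: it uses small fully connected layers instead of the convolutions (and batch normalization) a real ResNet uses, and all function names are made up for this illustration.

```python
def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]


def dense(vector, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two weight layers with a ReLU between them."""
    hidden = relu(dense(x, w1, b1))
    residual = dense(hidden, w2, b2)
    # The shortcut: add the unchanged input to the output of the weight layers.
    return relu([r + xi for r, xi in zip(residual, x)])


# With all weights at zero, F(x) = 0 and the block simply passes x through.
x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_bias = [0.0] * 3
print(residual_block(x, zero_weights, zero_bias, zero_weights, zero_bias))  # [1.0, 2.0, 3.0]
```

Because a block whose weights are near zero behaves like the identity, stacking more blocks does not have to make the network worse, which is the intuition behind training very deep ResNets.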

ResNet was first proposed by He et al. in the 2015 paper "Deep Residual Learning for Image Recognition". The architecture has become a landmark model: it shows that very deep networks can be trained with residual modules and standard SGD (given a reasonable weight initialization):

Paper Address: https://arxiv.org/abs/1512.03385

A follow-up paper published in 2016, "Identity Mappings in Deep Residual Networks", shows that higher accuracy can be achieved by updating the residual module to use identity mappings.

Paper Address: https://arxiv.org/abs/1603.05027

(Left) The original residual module. (Right) The updated residual module.

It is important to note that the ResNet50 implementation (50 weight layers) in the Keras library is based on the earlier 2015 paper.

Even though ResNet is much deeper than VGG16 and VGG19, the model is actually considerably smaller: using global average pooling instead of fully connected layers brings the model size down to 102MB.
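The saving from global average pooling is easy to estimate. The sketch below assumes a final feature map of 7x7x2048 (the shape ResNet50 produces for a 224x224 input) and 1,000 classes, and compares a classifier attached to the flattened feature map with one attached to the pooled features; dense_params is a hypothetical helper.

```python
def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


height, width, channels = 7, 7, 2048  # assumed final feature map
num_classes = 1000

# Option 1: flatten the whole feature map and connect it to the classifier.
flatten_head = dense_params(height * width * channels, num_classes)

# Option 2: average each channel over all spatial positions first,
# leaving one value per channel.
gap_head = dense_params(channels, num_classes)

print(flatten_head)  # 100353000
print(gap_head)      # 2049000
```

Pooling first makes this classifier roughly 49 times smaller, and VGG's 4,096-unit hidden layers are avoided altogether.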

Inception V3

The "Inception" micro-architecture was first proposed by Szegedy et al. in the 2014 paper "Going Deeper with Convolutions".

Paper Address: https://arxiv.org/abs/1409.4842

The original Inception module used in GoogLeNet

The purpose of the Inception module is to act as a "multi-level feature extractor": it computes 1x1, 3x3, and 5x5 convolutions within the same module and concatenates their outputs along the channel dimension as the input to the next layer.
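The concatenation step can be illustrated with NumPy. The arrays below are zero-filled stand-ins for the branch outputs, not real convolutions, and the channel counts (including the extra pooling branch) are illustrative values in the spirit of an early GoogLeNet module, not something stated in this article.

```python
import numpy as np

height, width = 28, 28

# Number of output channels per parallel branch (illustrative values).
branch_channels = {"1x1": 64, "3x3": 128, "5x5": 32, "pool": 32}

# Every branch keeps the spatial size (padding="same"), so only the
# channel dimension differs between the branch outputs.
branch_outputs = [np.zeros((height, width, channels), dtype="float32")
                  for channels in branch_channels.values()]

# The module output stacks all branches along the channel axis.
module_output = np.concatenate(branch_outputs, axis=-1)
print(module_output.shape)  # (28, 28, 256)
```

The next layer therefore sees features computed at several receptive-field sizes side by side.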

This architecture was originally called GoogLeNet and is now simply called Inception vN, where N is the version number set by Google. The Inception V3 architecture in the Keras library is based on the later paper by Szegedy et al., "Rethinking the Inception Architecture for Computer Vision", which proposes updates to the Inception module that further improve ImageNet classification accuracy. The Inception V3 weights are smaller than those of VGG and ResNet, at 96MB.

Paper Address: https://arxiv.org/abs/1512.00567

Xception

Xception Architecture

Xception was proposed by François Chollet himself (the Keras maintainer). Xception is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.

The original paper, "Xception: Deep Learning with Depthwise Separable Convolutions", is here:

Paper Address: https://arxiv.org/abs/1610.02357
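A rough way to see what depthwise separable convolutions buy is to compare weight counts. The sketch below ignores biases and uses arbitrary example channel sizes; both helper names are hypothetical.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """One k x k filter spanning all input channels, for every output channel."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_params(kernel, in_channels, out_channels):
    """Depthwise step (one k x k filter per input channel) followed by a
    pointwise 1x1 convolution that mixes the channels."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


print(standard_conv_params(3, 128, 256))   # 294912
print(separable_conv_params(3, 128, 256))  # 33920
```

For this example layer the separable version needs under an eighth of the weights of the standard convolution.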

Xception has the smallest weight file of the group, at only 91MB.

What about SqueezeNet?

The "fire" module in SqueezeNet

The SqueezeNet architecture achieves AlexNet-level accuracy with a model size of only 4.9MB by using "fire" modules, which combine a squeeze convolutional layer with an expand layer (a mix of 1x1 and 3x3 convolution kernels).
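The parameter saving of a fire module can be estimated in the same way. The channel sizes below (96 inputs, 16 squeeze filters, 64 + 64 expand filters) are assumed example values typical of an early fire module, and the function names are made up for this sketch.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def fire_module_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Squeeze with 1x1 filters, then expand with parallel 1x1 and 3x3 filters."""
    return (conv_params(1, in_channels, squeeze)
            + conv_params(1, squeeze, expand_1x1)
            + conv_params(3, squeeze, expand_3x3))


fire = fire_module_params(in_channels=96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = conv_params(3, 96, 128)  # an ordinary 3x3 layer with the same 128 outputs

print(fire)   # 11920
print(plain)  # 110720
```

Squeezing to 16 channels before the 3x3 filters is what keeps the count low; here the fire module needs about a ninth of the parameters of the plain layer.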

Although the SqueezeNet model is very small, training it can be tricky. In my forthcoming book, "Deep Learning for Computer Vision with Python", I will explain in detail how to train SqueezeNet from scratch on the ImageNet dataset.

Classifying images with Python and Keras

Let's learn how to use the pre-trained convolutional neural network model in the Keras Library for image classification.

Create a new file named classify_image.py, and enter the following code:

Lines 2-13 import the required Python packages, most of which belong to the Keras library.

Specifically, Lines 2-6 import the Keras implementations of ResNet50, Inception V3, Xception, VGG16, and VGG19, respectively.

It is important to note that the Xception network can only be used with the TensorFlow backend (with the Theano backend, the class throws an error).

Line 7 imports the imagenet_utils module, which has some convenient functions for preprocessing input images and decoding output classifications.

The remaining imports are other helper functions, followed by NumPy for numerical processing and cv2 for image editing.

Next, parse the command-line arguments:

We only need one command-line argument, --image, which is the path of the input image to classify.

We also accept an optional command-line argument, --model, which specifies the pre-trained model to use; VGG16 is the default.
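A minimal sketch of this argument parsing with the standard argparse module is shown below. It is not the original listing: the short flags, the help texts, and the build_parser name are assumptions, and example.jpg is a placeholder path.

```python
import argparse


def build_parser():
    """Create the parser for the two options described above."""
    parser = argparse.ArgumentParser(
        description="Classify an image with a pre-trained network")
    parser.add_argument("-i", "--image", required=True,
                        help="path to the input image")
    parser.add_argument("-m", "--model", type=str, default="vgg16",
                        help="name of the pre-trained network to use")
    return parser


# Parse an explicit argument list here; a real script would call
# parse_args() without arguments so that it reads sys.argv.
args = vars(build_parser().parse_args(["--image", "example.jpg"]))
print(args)  # {'image': 'example.jpg', 'model': 'vgg16'}
```

Leaving out --model falls back to the default, while leaving out --image makes argparse exit with a usage message.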

Since the name of the pre-trained model is supplied as a command-line argument, we need to define a Python dictionary that maps the model name (a string) to its actual Keras class.

Lines 25-31 define the models dictionary, which maps each model name string to the corresponding class.

If the --model name is not found in models, an AssertionError is raised (Lines 34-36).
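One way to sketch this lookup is shown below. It deviates from the description above in one respect: the dictionary stores class names from keras.applications rather than the classes themselves, and the import happens lazily, so the name check can run even where Keras is not installed. The dictionary keys and the resolve_model_class helper are assumptions made for this illustration.

```python
import importlib

# Model name (as given on the command line) -> class name in keras.applications.
MODELS = {
    "vgg16": "VGG16",
    "vgg19": "VGG19",
    "inception": "InceptionV3",
    "xception": "Xception",  # TensorFlow backend only
    "resnet": "ResNet50",
}


def resolve_model_class(name):
    """Return the Keras class for `name`, or raise AssertionError if unknown."""
    if name not in MODELS:
        raise AssertionError(
            "The --model argument should be one of: " + ", ".join(sorted(MODELS)))
    # Imported only when needed; this line requires Keras to be installed.
    applications = importlib.import_module("keras.applications")
    return getattr(applications, MODELS[name])


try:
    resolve_model_class("alexnet")
except AssertionError as error:
    print(error)
```

With Keras installed, `Network = resolve_model_class("vgg16")` followed by `model = Network(weights="imagenet")` would instantiate the network and, on the first run, download the weight file.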

The convolutional neural network takes an image as input and returns a set of probabilities corresponding to the class label as output.

Typical input image sizes for a CNN are 224x224, 227x227, 256x256, and 299x299, but other sizes are possible as well.

VGG16, VGG19, and ResNet all accept 224x224 input images, while Inception V3 and Xception require 299x299 pixel inputs, as shown in the following code block:

We initialize inputShape to 224x224 pixels and set the preprocessing function to the standard preprocess_input from Keras, which performs mean subtraction.

However, if we use Inception or Xception, we need to set inputShape to 299x299 pixels and switch to a separate preprocess_input function that scales the pixel values in a different way.
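The two preprocessing styles can be approximated with NumPy to make the difference concrete. The functions below imitate what the Keras helpers do (mean subtraction in BGR order versus scaling to the range [-1, 1]). They are hand-written stand-ins, not the Keras functions themselves, and the model-name keys are assumptions.

```python
import numpy as np


def input_shape_for(model_name):
    """Inception V3 and Xception expect 299x299 inputs; the others 224x224."""
    return (299, 299) if model_name in ("inception", "xception") else (224, 224)


def mean_subtract(image_rgb):
    """VGG/ResNet style: reorder RGB to BGR, then subtract the per-channel
    ImageNet mean pixel values."""
    bgr = image_rgb[..., ::-1].astype("float32")
    return bgr - np.array([103.939, 116.779, 123.68], dtype="float32")


def scale_to_unit_range(image_rgb):
    """Inception/Xception style: map pixel values from [0, 255] to [-1, 1]."""
    return image_rgb.astype("float32") / 127.5 - 1.0


white = np.full((2, 2, 3), 255, dtype="uint8")
print(input_shape_for("xception"))       # (299, 299)
print(input_shape_for("vgg16"))          # (224, 224)
print(scale_to_unit_range(white).max())  # 1.0
print(mean_subtract(white)[0, 0])
```

Feeding a network images preprocessed with the wrong function typically gives poor predictions, because the weights were trained on inputs in the other value range.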

The next step is to load the pre-trained model weights from disk and instantiate the model:

Line 58 takes the model name from the --model command-line argument and maps it to the corresponding class through the models dictionary.

Line 59 then instantiates the convolutional neural network using the pre-trained ImageNet weights.

Note: The weight files for VGG16 and VGG19 are larger than 500MB. ResNet is about 100MB, while Inception and Xception are between 90 and 100MB. If this is the first time you run the script, these weight files are automatically downloaded and cached on the local disk. Depending on your network speed, this may take some time. However, once the weight files are downloaded, they do not need to be downloaded again, and later runs of classify_image.py will be much faster.

The model is now loaded and ready for image classification; we just need to prepare the image:

Line 65 loads the input image from disk and resizes it to the width and height given by inputShape.

Line 66 converts the image from a PIL/Pillow instance to a NumPy array.

The input image is now represented as a NumPy array with shape (inputShape[0], inputShape[1], 3).

However, we usually train and classify images in batches with convolutional neural networks, so on Line 72 we add an extra dimension (the batch dimension) to the array via np.expand_dims.

After np.expand_dims, the image has the shape (1, inputShape[0], inputShape[1], 3). If this extra dimension is not added, calling .predict will cause an error.
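The effect of np.expand_dims is easy to check on a dummy array. In this sketch a zero-filled array stands in for a loaded image, and the 224x224 size is just an example.

```python
import numpy as np

input_shape = (224, 224)

# Stand-in for an image loaded from disk: height x width x 3 color channels.
image = np.zeros((input_shape[0], input_shape[1], 3), dtype="float32")

# Add a leading batch dimension so the array looks like a batch of one image.
batch = np.expand_dims(image, axis=0)

print(image.shape)  # (224, 224, 3)
print(batch.shape)  # (1, 224, 224, 3)
```

The new axis is the batch axis at position 0; the color channels stay in the last position.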

Finally, Line 76 calls the appropriate preprocessing function to normalize the data.

We can now pass the image through the model and obtain the output classifications:

Line 80 calls .predict on the CNN to obtain the predictions. These predictions are then passed to the ImageNet helper function decode_predictions, which returns the ImageNet class label IDs, the human-readable label names, and the probability associated with each label.

The top-5 predictions (that is, the labels with the highest probabilities) are then printed to the terminal on Lines 85 and 86.
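What "top-5" means can be sketched without Keras. The labels and probabilities below are made-up toy values, not real model output, and top_k is a hypothetical helper rather than decode_predictions itself.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values for illustration only.
labels = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]
probabilities = [0.05, 0.60, 0.02, 0.15, 0.10, 0.08]

for rank, (label, probability) in enumerate(top_k(probabilities, labels), start=1):
    print("{}. {}: {:.2f}%".format(rank, label, probability * 100))
```

The first line printed is the model's best guess, and the remaining lines show what else the model considered plausible.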

The last thing we do before closing out the example is load the input image from disk via OpenCV, draw the top prediction on the image, and finally display the image on the screen:

To see the actual operation of the pre-trained model, see the next section.

Classification results of VGGNet, ResNet, Inception, and Xception

All the examples in this post use Keras >= 2.0 and the TensorFlow backend. If you use TensorFlow, be sure to use version >= 1.0, or you will run into errors. I also tested the script with the Theano backend and confirmed that it works with Theano as well.

After installing TensorFlow/Theano and Keras, download the source code and sample images from the link at the bottom of the original post.

Now we can classify the images with VGG16:

We can see that VGG16 correctly classifies the image as "soccer ball" with a probability of 93.43%.

To use VGG19, we only need to change the --model command-line argument:

VGG19 correctly classifies the input image as "convertible" with a probability of 91.76%. Look at the other top-5 predictions: "sports car" at 4.98% (which the car also is), "limousine" at 1.06% (incorrect, but still reasonable), and "car wheel" at 0.75% (also technically correct, because the image contains wheels).

In the following example, we use the pre-trained resnet architecture to see the top-5 probability values:

ResNet correctly classifies this image of Clint Eastwood holding a gun as "revolver" with a probability of 69.79%. In the top-5, "rifle" follows at 7.74% and "submachine gun" at 5.63%. Given the viewing angle of the revolver and the long barrel, it is easy to see why the CNN also assigns a relatively high probability to a rifle.

The next example uses ResNet to classify a dog's image:

The dog's breed is correctly identified as "beagle" with a probability of 94.48%.

Then I tried classifying this image of Johnny Depp from the Pirates of the Caribbean franchise:

Although there is a "boat" class in ImageNet, it is interesting that the Inception network correctly identifies the scene as "wreck" with a probability of 96.29%. All other predicted labels, including "waterfront", "canoe", "paddle", and "breakwater", are relevant, and in some cases absolutely correct.

For another example of the inception network, I took a photo of the office couch:

Inception correctly predicts that there is a "table lamp" in the image, with a probability of 69.68%. The other top-5 predictions are also correct, including "studio couch", "window shade" (at the far right of the image, barely noticeable), "lampshade", and "pillow".

Although Inception is not being used as an object detector here, it is still able to name the main objects in the image within its top-5 predictions. Convolutional neural networks are remarkably good at recognizing objects!

Moving on to Xception:

Here we have a picture of Scotch barrels, specifically for my favorite Scotch whisky, Lagavulin. Xception correctly classifies this image as "barrels".

The last example uses VGG16 for classification:

A few months ago, after I finished The Witcher III: The Wild Hunt, I took this photo of the monitor. VGG16's first prediction was "home theater", which is reasonable given that "television/monitor" also appears in the top-5 predictions.

As you can see from the examples in this article, models pre-trained on the ImageNet dataset can recognize a wide variety of common everyday objects. You can use this code in your own projects!

Summary

To briefly recap, in today's blog post we reviewed five convolutional neural network models available in Keras:

    1. VGG16
    2. VGG19
    3. ResNet50
    4. Inception V3
    5. Xception

I then demonstrated how to classify images using these neural network models. I hope this article is helpful to you.


AI100 Reviews:

This article only covers how to use the pre-trained models in Keras. Using these models directly can give expert-level results, but to understand the specific structure of each model, how to tune its parameters, and the ideas behind it, readers will need to consult other materials. I suggest reading the code in Keras directly and then looking up any concepts you do not know.

If you need the code and other materials, visit the original address.
