How to retrain Inception's final layer for new categories

Original:
How do I retrain the last layer of Inception to recognize new categories?

Modern object recognition models have millions of parameters and can take weeks to fully train. Transfer learning is a shortcut that skips a lot of that work by taking a model already fully trained on a set of classes such as ImageNet and retraining it for new classes from the existing weights. In this example we'll retrain only the final layer and leave all the other layers untouched. For more information on this approach, see http://arxiv.org/pdf/1310.1531v1.pdf.

While it doesn't perform as well as full training, it is surprisingly effective for many applications and can be run in as little as thirty minutes on a laptop, without requiring a GPU. This tutorial will show you how to run the example script on your own images and explain some of the options you have to help control the training process.

Content

1. Training on flowers
2. Bottlenecks
3. Training
4. Using the retrained model
5. Training your own categories
6. Creating a set of training images
7. Training steps
8. Distortions
9. Hyperparameters
10. Training, validation and test sets

1. Training on flowers
Before you start any training, you'll need a set of images to teach the network about the new classes you want it to recognize. There's a later section that explains how to prepare your own images, but to make it easy to get started we've created an archive of creative-commons licensed flower photos to use initially. To get the set of flower photos, run these commands:
cd ~
curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
Once you have the images, you can build the retrainer like this, from the root of your TensorFlow source directory:
bazel build tensorflow/examples/image_retraining:retrain
If you have a machine which supports the AVX instruction set (https://en.wikipedia.org/wiki/Advanced_Vector_Extensions), common in x86 CPUs produced in the last few years, you can improve the speed of the retraining by building for that architecture, like this:
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
Then the retrainer can be run like this:
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/flower_photos

This script loads the pre-trained Inception v3 model, removes the old top layer, and trains a new one on the flower photos you've downloaded. None of the flower species were in the original ImageNet classes the full network was trained on. The magic of transfer learning is that the lower layers that were trained to distinguish between some objects can be reused for many recognition tasks without any alteration.

2. Bottlenecks
The script can take thirty minutes or more to complete, depending on the speed of your machine. The first phase analyzes all the images on disk and calculates the bottleneck values for each of them. 'Bottleneck' is an informal term we often use for the layer just before the final output layer that actually does the classification. This second-to-last layer has been trained to output a set of values that's good enough for the classifier to use to distinguish between all the classes it's been asked to recognize. That means it has to be a meaningful and compact summary of the images, since it has to contain enough information for the classifier to make a good choice in a very small set of values. The reason retraining only the last layer can work on new classes is that the kind of information needed to distinguish between all 1,000 ImageNet classes turns out to also be useful for recognizing new kinds of objects.

Because every image is reused multiple times during training, and calculating each bottleneck takes a significant amount of time, it speeds things up to cache these bottleneck values on disk so they don't have to be repeatedly recalculated. By default they're stored in /tmp/bottleneck, and if you rerun the script they'll be reused so you don't have to wait for this part again.
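If /tmp isn't a convenient place for the cache, the retrain script should let you point it somewhere else. A minimal sketch, assuming the flag is called --bottleneck_dir (check the script's --help output to confirm the exact name in your version):
bazel-bin/tensorflow/examples/image_retraining/retrain \
--image_dir ~/flower_photos \
--bottleneck_dir ~/flower_bottlenecks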

3. Training
Once the bottlenecks are complete, the actual training of the top layer of the network begins. You'll see a series of step outputs, each one showing the training accuracy, the validation accuracy, and the cross entropy. The training accuracy shows what percentage of the images in the current training batch were labeled with the correct class. The validation accuracy is the precision on a randomly selected group of images from a different set. The key difference is that the training accuracy is based on images the network has been able to learn from, so the network can overfit to the noise in the training data. A true measure of the network's performance is how it does on a data set not contained in the training data, and that is what the validation accuracy measures. If the training accuracy is high but the validation accuracy remains low, it means the network is overfitting, memorizing features of the training images that aren't helpful more generally. Cross entropy is a loss function that gives a glimpse into how well the learning process is progressing. The training's objective is to make the loss as small as possible, so you can tell whether the learning is working by checking that the loss keeps trending downwards, ignoring the short-term noise.
By default this script runs 4,000 training steps. Each step chooses ten images at random from the training set, finds their bottlenecks from the cache, and feeds them into the final layer to get predictions. Those predictions are then compared against the actual labels, and the result is used to update the final layer's weights through the back-propagation process. As the process continues you should see the reported accuracy improve, and after all the steps are done a final test accuracy evaluation is run on a set of images kept separate from the training and validation pictures. This test evaluation is the best estimate of how the trained model will perform on the classification task. You should see an accuracy value of between 90% and 95%, though the exact value will vary from run to run since there is randomness in the training process. This number is the percentage of images in the test set that are given the correct label after the model is fully trained.

4. Using the retrained model
The script writes out a version of Inception v3 with a final layer retrained on your categories to /tmp/output_graph.pb, and a text file containing the labels to /tmp/output_labels.txt. These are both in a format that the C++ and Python image classification examples (https://www.tensorflow.org/versions/master/tutorials/image_recognition/index.html) can read, so you can start using your new model immediately. Since you've replaced the top layer, you will need to specify the new name in the script, for example with the flag --output_layer=final_result if you're using the label_image program.
Here's an example of how to build and run label_image with your retrained graph:
bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result \
--image=$HOME/flower_photos/daisy/21652746_cc379e0eea_m.jpg
You should see a list of flower labels, in most cases with daisy on top (though each retrained model may differ slightly). You can replace the --image parameter with your own images to try those out, and use the C++ code as a template to integrate with your own applications.

5. Training your own categories
If you've managed to get the script working on the example flower images, you can start looking at teaching it to recognize the categories you actually care about instead. In theory, all you need to do is point it at a directory containing a set of sub-folders, each named after one of your categories and containing only photos of that category. If you do that and pass the root folder of the sub-directories as the --image_dir argument, the script should train on your classes just as it did for the flowers.
Here's an example of the kind of file structure the flowers archive uses, to show you the layout the script is looking for:
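A rough sketch of that layout (apart from the daisy photo used earlier, the individual file names shown are just illustrative):
flower_photos/
  daisy/
    21652746_cc379e0eea_m.jpg
    ...
  dandelion/
    ...
  roses/
    ...
  sunflowers/
    ...
  tulips/
    ...
The sub-folder name is used as the label, and every photo inside it is treated as an example of that label.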

In practice it may take some work to get the accuracy you want. I'll try to guide you through some of the common problems you may encounter below.

6. Creating a set of training images
The first place to start is by looking at the images you've gathered, since the most common issues we see with training come from the data being fed in. For training to work well, you should gather at least a hundred photos of each kind of object you want to recognize. The more you can gather, the better the accuracy of your trained model is likely to be. You also need to make sure the photos are a good representation of what your application will actually encounter. For example, if all your photos are taken indoors but your users are trying to recognize objects outdoors, you probably won't see good results when you deploy.
Another pitfall to avoid is that the learning process will pick up on anything the labeled images have in common with each other, and if you're not careful that might be something useless. For example, if you photograph one kind of object in a blue room and another kind in a green one, the model will end up basing its predictions on the background color rather than the features of the object you actually care about. To avoid this, try to take photos in as wide a variety of situations as you can, at different times and with different devices. If you want to know more about this problem, you can read about the classic (and possibly apocryphal) tank-recognition example: http://www.jefftk.com/p/detecting-tanks.
You may also want to think about the categories you use. It might be worth splitting a big category that covers a lot of different physical forms into smaller ones that are more visually distinct. For example, instead of a single transportation class you might use "cars", "motorcycles" and "trucks". It's also worth thinking about whether you have a "closed world" or an "open world" problem. In a closed world, the only things you'll ever be asked to categorize are the classes of object you already know about. This might apply to a plant recognition app where you know the user is likely to be taking a picture of a flower, so all you have to do is decide which species it is. By contrast, a roaming robot might see all sorts of different things through its camera as it wanders around the world. In that case you'd want the classifier to report when it isn't sure what it's seeing. This can be hard to do well, but often, if you collect a large number of typical "background" photos containing none of your relevant objects, you can add them to an extra "unknown" class in your image folders.
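As a sketch of that idea (the folder and file names here are made up for illustration), the "unknown" class is just one more sub-folder alongside your real categories, filled with background shots that contain none of the objects you care about:
my_images/
  cars/
  motorcycles/
  trucks/
  unknown/
    empty_street_01.jpg
    office_wall_02.jpg
You then pass my_images to --image_dir just as before.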

It's also worth checking that all of your images are labeled correctly. Often user-generated tags are unreliable for our purposes, for example using #daisy to tag photos of a person named Daisy. If you go through your images and weed out any mistakes, it can do wonders for your overall accuracy.

7. Training steps
If you're happy with your images, you can look at improving your results by altering the details of the learning process. The simplest one to try is --how_many_training_steps. This defaults to 4,000, but if you increase it to 8,000 it will train for twice as long. The rate of improvement in accuracy slows down the longer you train, and at some point it will stop improving altogether, but you can experiment to see when that happens for your model.
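For example, doubling the default number of steps looks like this (8,000 is just the value used in the illustration above; you can pass any number):
bazel-bin/tensorflow/examples/image_retraining/retrain \
--image_dir ~/flower_photos \
--how_many_training_steps 8000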

8. Distortions
A common way of improving the results of image training is to deform, crop, or brighten the training inputs in random ways. This has the advantage of expanding the effective size of the training data thanks to all the possible variations of the same images, and it tends to help the network learn to cope with all the distortions that will occur in real-life uses of the classifier. The biggest disadvantage of enabling these distortions in our script is that the bottleneck caching is no longer useful, since input images are never reused exactly. This means the training process takes a lot longer, so I recommend trying distortions as a way of fine-tuning your model once you already have one you're reasonably happy with.
You enable these distortions by passing --random_crop, --random_scale and --random_brightness to the script. These are all percentage values that control how much of each distortion is applied to each image. It's reasonable to start with values of 5 or 10 for each of them and then experiment to see which of them help with your application. --flip_left_right will randomly mirror half of the images horizontally, which makes sense as long as those kinds of flips are likely to happen in your application. For example, it wouldn't be a good idea if you were trying to recognize letters, since flipping them destroys their meaning.
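Putting that together, a run with modest distortions enabled might look like this (the percentage values are just the starting points suggested above; add --flip_left_right as well if horizontal mirroring makes sense for your data):
bazel-bin/tensorflow/examples/image_retraining/retrain \
--image_dir ~/flower_photos \
--random_crop 10 \
--random_scale 10 \
--random_brightness 10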

9. Hyperparameters
There are several other parameters you can try adjusting to see whether they help your results. --learning_rate controls the magnitude of the updates to the final layer during training. Intuitively, if this is smaller the learning will take longer, but it can end up helping the overall precision. That's not always the case, though, so you need to experiment carefully to see what works for your situation. --train_batch_size controls how many images are examined during one training step, and because the learning rate is applied per batch, you'll need to reduce it if you use larger batches to get the same overall effect.
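For instance, a run that lowers the learning rate while raising the batch size might look like this (the particular values are purely illustrative):
bazel-bin/tensorflow/examples/image_retraining/retrain \
--image_dir ~/flower_photos \
--learning_rate 0.005 \
--train_batch_size 32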

10. Training, validation and test sets
When the script is pointed at a folder of images, it splits them into three different sets. The largest is usually the training set, which are all the images fed into the network during training, with the results used to update the model's weights. You might wonder why we don't use all of the images for training. A big potential problem when we're doing machine learning is that our model may just be memorizing irrelevant details of the training images in order to come up with the right answers. For example, you could imagine a network remembering a pattern in the background of each photo it was shown and using that to match labels with objects. It could produce good results on all the images it saw during training, but then fail on new images, because it hasn't learned the general characteristics of the objects, just memorized unimportant details of the training images.
This problem is known as overfitting, and to avoid it we keep some of our data out of the training process so that the model can't memorize it. We then use those images as a check to make sure overfitting isn't occurring, since if we see good accuracy on them it's a good sign the network isn't overfitting. The usual split is to put 80% of the images into the main training set, keep 10% aside to run as validation frequently during training, and use a final 10% less often as a testing set to predict the real-world performance of the classifier. These ratios can be controlled with the --testing_percentage and --validation_percentage flags. One subtle thing the script does is use the file name of each image to decide which set it is placed in. This is designed to make sure images don't move between the training and testing sets on different runs, since that could be a problem if images that had been used to train a model were later used in a validation set. In general you should be able to leave these values at their defaults, since you won't usually find any advantage to training from adjusting them.
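For example, to hold out more data for evaluation than the defaults, you could run something like this (the exact percentages are just for illustration):
bazel-bin/tensorflow/examples/image_retraining/retrain \
--image_dir ~/flower_photos \
--testing_percentage 15 \
--validation_percentage 15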
