Image Classification in 5 Methods
https://medium.com/towards-data-science/image-classification-in-5-methods-83742aeb3645
Image classification, as the name suggests, is the problem of taking an image as input and outputting a label that describes its content. It is a core task in computer vision and is widely used in practice.
The traditional approach to image classification relies on hand-crafted feature description and detection. Such methods may work for simple images, but they quickly become overwhelmed as scenes grow more complicated. So instead of trying to describe every image category with hand-written code, we turned to machine learning to handle the classification problem.
At present, many researchers use CNNs and other deep learning models for image classification; the classical KNN and SVM algorithms have also achieved good results. However, it is hard to say which method works best for the image classification problem.
In this project, we have done some interesting things:
This article compares CNN and transfer learning, which are widely used for image classification in industry, with KNN, SVM, and BP neural networks.
Gain deep learning experience.
Explore Google's machine learning framework TensorFlow.
The detailed implementation is described below.
I. System Design
In this project, the 5 algorithms we experimented with are KNN, SVM, BP neural network, CNN, and transfer learning. We experimented in the following three ways:
KNN, SVM, and BP neural networks are methods we learned in school; they are powerful and easy to deploy. So as a first step, we mainly use sklearn to implement KNN, SVM, and the BP neural network.
Although the traditional multilayer perceptron model once performed well on image recognition, its fully connected structure between nodes makes the recognition rate far from ideal on high-resolution images. So in this step, we build a CNN with Google's TensorFlow framework.
Retrain a deep neural network that has already been trained: Inception V3, provided by TensorFlow and trained on ImageNet data since 2012. ImageNet is a classic challenge in computer vision in which contestants try to build models that classify images into 1,000 categories. To retrain this pre-trained model, we must make sure that our own dataset was not part of its training data.
II. Implementation
The first approach: preprocess the data with sklearn and implement KNN, SVM, and a BP neural network.
Step 1: using the OpenCV package, define 2 preprocessing functions, namely an image feature vector function (which resizes the image and flattens it into a list of raw row pixels) and a color histogram function (which extracts a 3D color histogram from the HSV color space and normalizes it with cv2.normalize), both sketched in code after step 5.
Step 2: construct the arguments. Since we want to test performance on the entire dataset as well as on sub-datasets with different numbers of classes, we treat each (sub-)dataset as an argument for the experimental analysis. In addition, we treat the number of neighbors in KNN as an argument.
Step 3: extract the image features and write them to arrays. We use the cv2.imread function to read the images and classify them according to their normalized file names. Then we run the 2 functions mentioned in step 1 to obtain the 2 kinds of image features and write them to their respective arrays.
Step 4: use the function train_test_split to split the dataset: 85% of the data as the training set and 15% as the test set.
Step 5: evaluate the data with the KNN, SVM, and BP neural network methods: KNeighborsClassifier for KNN, SVC for SVM, and MLPClassifier for the BP neural network.
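A minimal sketch of this pipeline, assuming OpenCV and scikit-learn and the parameter choices described later (128x128 resize, 32 bins per channel, 50-neuron hidden layers, max_iter=1000, balanced class weights); the helper names and the way features/labels are built in step 3 are illustrative assumptions:

```python
import cv2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def image_to_feature_vector(image, size=(128, 128)):
    # Resize the image to a fixed size and flatten it into a vector of raw pixels.
    return cv2.resize(image, size).flatten()

def extract_color_histogram(image, bins=(32, 32, 32)):
    # Compute a 3D color histogram in the HSV color space and normalize it.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()

# features: histogram (or raw-pixel) vectors from step 3; labels: class names parsed from file names.
(train_x, test_x, train_y, test_y) = train_test_split(
    features, labels, test_size=0.15, random_state=42)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),           # k is tuned per dataset
    "SVM": SVC(max_iter=1000, class_weight="balanced"),
    "BP":  MLPClassifier(hidden_layer_sizes=(50, 50)),
}
for name, model in models.items():
    model.fit(train_x, train_y)
    print(name, "accuracy:", model.score(test_x, test_y))
```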
The second approach: build a CNN based on TensorFlow. TensorFlow builds a computational graph and executes it in C++, which is more efficient than running the computation directly in Python.
Several concepts are used in TensorFlow: placeholder variables, model variables, mathematical formulas, a cost measure, an optimization method, and the CNN architecture.
Step 1: feed the images into the first (input) layer.
Step 2: build 3 convolutional layers, each followed by 2x2 max-pooling and ReLU. The input is a 4-dimensional tensor: [image number, y-coordinate, x-coordinate, channel]. The output is another processed 4-dimensional tensor: [image number (unchanged), y-coordinate, x-coordinate, channel].
Step 3: build 2 fully connected layers. The input is a 2-dimensional tensor: [image number, input number]. The output is a 2-dimensional tensor: [image number, output number].
Step 4: use a flatten layer to link the convolutional layers and the fully connected layers.
Step 5: normalize the output with a softmax layer.
Step 6: optimize the training. We use cross entropy as the cost function and take its mean value; the optimization method is tf.train.AdamOptimizer().
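A hedged sketch of steps 1-6 in TensorFlow 1.x-style code; the layer sizes used here are illustrative assumptions, not the exact values from the project:

```python
import tensorflow as tf

img_size, num_channels, num_classes = 64, 3, 10

x = tf.placeholder(tf.float32, [None, img_size, img_size, num_channels])  # step 1: input images
y_true = tf.placeholder(tf.float32, [None, num_classes])

def conv_layer(inputs, filters, kernel_size):
    # step 2: convolution followed by ReLU and 2x2 max-pooling
    conv = tf.layers.conv2d(inputs, filters, kernel_size, padding="same", activation=tf.nn.relu)
    return tf.layers.max_pooling2d(conv, pool_size=2, strides=2)

net = conv_layer(x, 32, 5)
net = conv_layer(net, 64, 3)
net = tf.layers.flatten(net)                             # step 4: flatten layer
net = tf.layers.dense(net, 128, activation=tf.nn.relu)   # step 3: fully connected layers
logits = tf.layers.dense(net, num_classes)
y_pred = tf.nn.softmax(logits)                           # step 5: softmax output

# step 6: mean cross-entropy cost optimized with Adam
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
```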
The third method: retrain Inception V3. We retrain Inception V3 and use transfer learning to reduce the workload.
We take the pre-trained model, remove the original top layer, and train a new one. The script first analyzes all the images on disk and calculates their bottleneck values. It then runs 4,000 training steps; each step randomly picks 10 images from the training set, looks up their bottleneck values, and feeds them into the last layer to get predictions. Then, during backpropagation, the weights of the new final layer are updated according to the comparison between the predicted results and the actual labels.
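A minimal sketch of this transfer-learning idea, assuming cached 2048-dimensional Inception V3 bottleneck values; the names and optimizer here are illustrative rather than the exact retraining script:

```python
import tensorflow as tf

bottleneck_size, num_classes = 2048, 10   # Inception V3 bottlenecks are 2048-dimensional

# Cached bottleneck values stand in for the frozen, pre-trained part of the network.
bottleneck_input = tf.placeholder(tf.float32, [None, bottleneck_size])
ground_truth = tf.placeholder(tf.float32, [None, num_classes])

# The new final layer replaces the original top layer; only its weights are trained.
logits = tf.layers.dense(bottleneck_input, num_classes)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=ground_truth, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# Each of the 4,000 training steps feeds a random batch of 10 cached bottleneck values.
```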
III. The Experiment
The dataset used in the experiments is the Oxford-IIIT Pet Dataset.
http://www.robots.ox.ac.uk/~vgg/data/pets/
It contains 25 dog breeds and 12 cat breeds, with about 200 images per class. We use the 10 cat categories from the dataset: ['Sphynx', 'Siamese', 'Ragdoll', 'Persian', 'maine-coon', 'british-shorthair', 'Bombay', 'Birman', 'Bengal', 'Abyssinian']. That gives about 2,000 images. Because the images come in different sizes, we resize them all to a fixed size of 64x64 or 128x128.
IV. Parameter Settings
The first method: KNN, SVM, and BP neural network
Part one: preprocess the data with sklearn and implement KNN, SVM, and the BP neural network. In the image_to_feature_vector function, we set the size to 128x128. The results show that a larger image size gives more accurate results but a heavier computational burden, so we finally settled on 128x128. In the extract_color_histogram function, we set the number of bins per channel to 32, 32, 32. For the datasets, 3 sets of data are used: the first is a sub-dataset with 400 images and 2 labels; the second is a sub-dataset with 1,000 images and 5 labels; the third is the entire dataset with 1,997 images and 10 labels.
In KNeighborsClassifier, we only change the number of neighbors and record the result under the best k value for each dataset; the other parameters are left at their defaults.
In MLPClassifier, we set 50 neurons per hidden layer.
In SVC, the maximum number of iterations is 1,000 and the class weight is "balanced".
Depending on the dataset (from 2 labels up to 10 labels), a run takes approximately 3 to 5 minutes.
The second approach: building a CNN based on TensorFlow
Because training on the entire dataset takes a long time, we process the data in batches at each iteration, typically 32 or 64 images per batch. The dataset is split into a training set of 1,600 images, a validation set of 400 images, and a test set of 300 images.
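A small sketch of the random batch selection assumed here (the helper name next_batch is illustrative):

```python
import numpy as np

def next_batch(images, labels, batch_size=64):
    # Randomly pick a small batch from the (NumPy array) training set for one training iteration.
    idx = np.random.choice(len(images), batch_size, replace=False)
    return images[idx], labels[idx]

# Example: x_batch, y_batch = next_batch(train_images, train_labels, batch_size=64)
```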
Many parameters can be tuned in this method: the learning rate is set to 1e-4, the image size to 64x64 or 128x128, and then the number of layers and their shapes. There are too many combinations to tune exhaustively, so we experimented to find the best result.
To find the best set of layers, we ran experiments. First, the parameters were as follows:
# Convolutional layer 1.
filter_size1 = 5
num_filters1 =
# Convolutional layer 2.
filter_size2 = 5
num_filters2 =
# Convolutional layer 3.
filter_size3 = 5
num_filters3 = 128
# Fully-connected layer 1.
fc1_size = 256
# Fully-connected layer 2.
fc2_size = 256
We used 3 convolutional layers and 2 fully connected layers, but unfortunately the model overfit. We found that, for this structure, our dataset is too small and the network too complex.
Finally, we used the following parameters:
# Convolutional layer 1.
filter_size1 = 5
num_filters1 =
# Convolutional layer 2.
filter_size2 = 3
num_filters2 =
# Fully-connected layer 1. Number of neurons in the fully-connected layer.
fc1_size = 128
# Fully-connected layer 2. Number of neurons in the fully-connected layer.
fc2_size = 128
# Number of color channels for the images: 1 channel for gray-scale.
num_channels = 3
We used only 2 convolutional layers and 2 fully connected layers. The result was still unsatisfactory: after 4,000 iterations the model was still overfitting, but fortunately the test result was about 10% better than before. Finally, after 5,000 iterations, we reached 43% accuracy, and the running time was over half an hour.
PS: we also experimented with another dataset, CIFAR-10.
CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html
This dataset contains 60,000 32x32 color images in 10 categories, with 6,000 images per category: a training set of 50,000 images and a test set of 10,000 images. Using the same network structure, after 10 hours of training we reached a final accuracy of 78%.
The third method: Retrain Inception V3
Similar to the method above, the number of training steps is 4,000 and can be adjusted according to the results. The learning rate is adjusted according to the number of images per batch. 80% of the data is used for training, 10% for validation, and 10% for testing.
V. Experimental Results
The first method: KNN, SVM, and BP neural network
Because of overfitting, we could not obtain good experimental results. The running time was generally about half an hour, and given the overfitting, we do not consider the running time meaningful.
The second method: building a CNN with TensorFlow
Comparing with Method 1, we conclude that even though the CNN overfit the training set, its experimental results were still superior to those of Method 1.
The third method: Retrain Inception V3
The whole training process took no more than 10 minutes, and we obtained very good results. This shows how powerful deep learning and transfer learning can be.
Demo:
VI. Conclusion
Based on the above experimental comparisons, we conclude that:
KNN, SVM, and BP neural networks are not effective enough for image classification.
Even though the CNN overfit, its experimental results were still better than those of the traditional algorithms.
Transfer learning is very effective for image classification: the running time is short and the results are accurate, and it handles overfitting and small datasets well.
Through this project, we have gained a lot of valuable experience, as follows:
Resize the image to make it smaller.
For each iteration of the training, randomly select small batches of data.
Randomly select a small batch of data as the validation set, and report the validation score during training.
Use image augmentation to transform the set of input images into a larger new dataset (see the sketch after this list).
The image dataset should be larger than 200 images x 10 classes.
A more complex network structure requires a larger training set.
Be careful of overfitting.
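A minimal sketch of the augmentation idea using Keras' ImageDataGenerator (see reference 5); the transform parameters here are illustrative assumptions:

```python
from keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift, and flip the training images to enlarge the dataset.
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)

# train_images: a 4D array (num_images, height, width, channels); train_labels: one-hot labels.
# datagen.flow(train_images, train_labels, batch_size=32) yields augmented batches during training.
```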
References
1. CS231n: Convolutional Neural Networks for Visual Recognition
2. TensorFlow: Convolutional Neural Networks
3. How to Retrain Inception's Final Layer for New Categories
4. k-NN Classifier for Image Classification
5. Image Augmentation for Deep Learning with Keras
6. Convolutional Neural Network TensorFlow Tutorial
Note: this project was performed by Ji Tong.
https://github.com/ji-tong
Originally published at Gist.github.com.