This article introduces the problem of image classification: given an image, assign it a label from a preset set of categories, such as {person, cat, dog, ...}. This is a core problem in computer vision (CV). Classification takes many forms in practical applications, and many seemingly unrelated problems (such as object detection and segmentation) can ultimately be reduced to image classification.
A color image usually has three channels (RGB), and each channel is a two-dimensional array. For example, a 200*150 image with three RGB channels can be represented as a one-dimensional array of 200*150*3 = 90000 values, where each value ranges from 0 (black) to 255 (white). Image classification assigns a label, such as dog, to this 90000-dimensional array.
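The flattening described above can be sketched with NumPy. This is only an illustration: the image is random data standing in for a real picture, and the (height, width, channels) layout is an assumption, since other libraries order the axes differently.

```python
import numpy as np

# Hypothetical image: 150 rows (height) x 200 columns (width) x 3 channels (RGB).
# Random 8-bit intensities stand in for real pixel data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(150, 200, 3), dtype=np.uint8)

# Flatten the three 2-D channels into a single 1-D feature vector.
x = image.reshape(-1)

print(image.shape)  # (150, 200, 3)
print(x.shape)      # (90000,)
```

A classifier then treats x as one point in a 90000-dimensional space and maps it to a label.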
The current challenges of image recognition are:
- Viewpoint variation: the viewing angle changes.
- Scale variation: the size of the object changes.
- Deformation: some objects can change shape freely, such as a person stretching.
- Occlusion: only a small part of the target is visible in the image.
- Illumination conditions: the lighting changes.
- Background clutter: the background interferes with the target.
- Intra-class variation: objects within one class differ, such as birds that vary in size and color.
Image classification is mainly approached with supervised learning from machine learning: a classifier, such as the KNN algorithm, is trained on labeled training data {x(i), y(i)} and then used to classify new images.
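As a rough illustration of how a KNN classifier uses the labeled pairs, here is a minimal NumPy sketch. The function name `knn_predict`, its parameters, and the toy 2-D data are assumptions made for this example, not something taken from a particular library.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=1, metric="l2"):
    """Label each query row by majority vote among its k nearest training rows."""
    # Cast to float so that uint8 pixel values do not wrap around when subtracted.
    X_train = np.asarray(X_train, dtype=np.float64)
    X_query = np.asarray(X_query, dtype=np.float64)
    y_train = np.asarray(y_train)
    predictions = []
    for q in X_query:
        diff = X_train - q
        if metric == "l1":
            dists = np.abs(diff).sum(axis=1)          # L1 (Manhattan) distance
        elif metric == "l2":
            dists = np.sqrt((diff ** 2).sum(axis=1))  # L2 (Euclidean) distance
        else:
            raise ValueError("metric must be 'l1' or 'l2'")
        nearest = np.argsort(dists)[:k]               # indices of the k closest rows
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        # On a tie, argmax picks the first label in sorted order.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Toy data: two small clusters standing in for flattened images.
X_train = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
y_train = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])
X_query = np.array([[0.5, 0.5], [10.5, 10.5]])

print(knn_predict(X_train, y_train, X_query, k=3, metric="l1"))  # ['cat' 'dog']
```

Note that "training" here only stores the data; all of the work happens at prediction time.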
The KNN algorithm has hyperparameters that must be chosen: the value of K and the distance metric (L1 or L2 distance). The data therefore needs to be split into a training set and a test set. The test set is precious because it is what measures how well the model generalizes, so it should not be used for tuning. To obtain an accurate model, we can further split the training data and carry out cross-validation. The following is 5-fold cross-validation: the best model is found by cross-validation, and the test set is then used to test the generalization ability of that model.
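A minimal sketch of choosing K by 5-fold cross-validation, under stated assumptions: the helper names `knn_accuracy` and `cross_validate_k`, the synthetic two-cluster data, and the candidate K values are all hypothetical, and the labels are assumed to be non-negative integers. The test set does not appear anywhere in this loop.

```python
import numpy as np

def knn_accuracy(X_tr, y_tr, X_val, y_val, k):
    """Accuracy of a KNN classifier (squared L2 distance) on a validation fold."""
    # Pairwise squared distances, shape (n_val, n_train), via broadcasting.
    d = ((X_val[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_tr[nearest]  # labels of the k nearest neighbours, shape (n_val, k)
    preds = np.array([np.bincount(row).argmax() for row in votes])
    return float((preds == y_val).mean())

def cross_validate_k(X, y, k_choices, num_folds=5):
    """Return the mean validation accuracy over the folds for each candidate K."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    mean_acc = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            # Fold i is the validation fold; the remaining folds form the training data.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            accs.append(knn_accuracy(X_tr, y_tr, X_folds[i], y_folds[i], k))
        mean_acc[k] = float(np.mean(accs))
    return mean_acc

# Synthetic training data: two well-separated clusters, shuffled before splitting.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, size=(50, 2)),
                    rng.normal(6.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50, dtype=np.int64), np.ones(50, dtype=np.int64)])
order = rng.permutation(len(X))
X, y = X[order], y[order]

scores = cross_validate_k(X, y, k_choices=[1, 3, 5, 7])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Only after `best_k` is fixed would the model be evaluated once on the held-out test set.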
KNN is very slow at prediction time, because every prediction requires computing the distance to every image in the training set and then finding the top K. When using KNN in practice, pay attention to the following points:
1) Preprocess the data to zero mean and unit variance. (All pixels of an image lie in the range 0-255, so the dimensions of image data already have similar distributions, and this step can be omitted for images.)
2) For high-dimensional data, consider reducing the dimensionality with PCA.
3) If there are many hyperparameters, make sure the validation set used to evaluate them is large enough. If training data is scarce, use cross-validation; the more folds, the higher the computational cost.
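4) With cross-validation, as in the 5-fold split in the figure above, fold1, fold2, fold3, and fold5 are used for training and fold4 for validation to select the best model. When that model is then evaluated on the test set, fold4 does not have to be added back into the training data; it can be treated as spent on hyperparameter selection and discarded.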
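Points 1) and 2) above can be sketched as follows. The helper names `standardize` and `pca_fit`, the random data, and the choice of 10 components are assumptions for illustration; PCA is computed here through NumPy's SVD, which is one common way to do it rather than the only one.

```python
import numpy as np

def standardize(X_train, X_other, eps=1e-8):
    """Scale both sets to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # eps guards against division by zero for constant features.
    return (X_train - mean) / (std + eps), (X_other - mean) / (std + eps)

def pca_fit(X_centered, num_components):
    """Return the top principal directions (as rows) of already-centered data."""
    # Rows of Vt are orthonormal directions ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return Vt[:num_components]

# Hypothetical data: 200 training and 20 test vectors with 50 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))
X_test = rng.normal(size=(20, 50))

X_train_std, X_test_std = standardize(X_train, X_test)
components = pca_fit(X_train_std, num_components=10)

# Project both sets onto the same 10 directions learned from the training data.
X_train_red = X_train_std @ components.T
X_test_red = X_test_std @ components.T
print(X_train_red.shape, X_test_red.shape)  # (200, 10) (20, 10)
```

The mean, standard deviation, and principal directions are all estimated from the training data and then reused for the test data, so no information from the test set leaks into preprocessing.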
Getting Started with Computer Vision (Introduction to Computer Vision)