First of all, note that the Haar training provided by OpenCV is based on Haar features (refer to my other article on Haar features: http://blog.csdn.net/carson2005/article/details/8094699), and the classifier it trains is an AdaBoost cascade classifier (if you need background on the AdaBoost algorithm, please refer to my other article: http://blog.csdn.net/carson2005/article/details/8130557). A so-called cascade classifier is a series of simple component classifiers (which can be understood as ordinary classifiers) connected in sequence: a detection result is considered valid only if it passes every component classifier in order. Otherwise, the current detection region is considered not to contain the target we are looking for.
Using the OpenCV Haar training program to train a classifier requires the following steps:
(1) Collect training samples:
Training samples include positive and negative samples. A positive sample, simply put, is an image that contains only the target you want to detect, while a negative sample is any image that does not contain the target. However, negative samples should not be chosen at random. For example, if the target you need to detect is a car, the positive samples should be images containing only cars, and the negative samples should not be pictures of the sky, the ocean, or landscapes, because the classifier you are training is meant to detect cars, and cars appear on roads. In other words, the images the final classifier will actually see contain roads, traffic signs, buildings, billboards, cars, motorcycles, tricycles, pedestrians, bicycles, and so on. Accordingly, the negative samples here should include motorcycles, tricycles, bicycles, pedestrians, roads, bushes, flowers and plants, traffic signs, billboards, and the like.
In addition, it should be noted that AdaBoost is a classical machine-learning algorithm, and a precondition of machine-learning methods is that the test samples and training samples are independent and identically distributed (i.i.d.). In simple terms, this means the training samples should be very close to, or consistent with, the data seen in the final application; otherwise, an algorithm based on machine learning cannot guarantee its effectiveness. Sufficient training samples (at least several thousand positive samples and several thousand negative samples) are also a precondition for effective training.
Here, assume that all positive samples are placed under the F:/pos folder and all negative samples under the F:/neg folder.
(2) Dimension normalization of all positive samples:
The positive samples collected in the previous step come in many sizes: some 200*300, some 500*800, and so on. The goal of size normalization is to scale all the images to the same size, for example 50*60.
(3) Generate a positive sample description file:
The so-called positive sample description file is actually a plain text file, although many people like to change its suffix to .dat. Each line of the file contains: the file name, the number of targets in the image, and the location of each target in the image (x, y, width, height).
A typical positive sample description file looks like this:
0.jpg 1 0 0 30 40
1.jpg 1 0 0 30 40
2.jpg 1 0 0 30 40
It is not difficult to see that each positive sample occupies one line of the description file: each line starts with the positive sample's file name, followed by the number of targets in the image (usually 1), and the position of the target within the image.
Suppose there are 5,000 positive sample images under the F:\pos folder, each containing exactly one target. We can then write a small program that traverses all the image files in the folder and writes each file name, together with the target's position and size in the image, to a Pos.dat file, which serves as the positive sample description file.
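The traversal program just described can be sketched as follows; the image-suffix list and the fixed 0 0 30 40 target box (each target assumed to fill the whole normalized image, matching the sample file above) are assumptions:

```python
import os

def write_pos_dat(pos_dir, out_name="pos.dat", width=30, height=40):
    """Write one description line per image: <name> <count> <x> <y> <w> <h>.

    Assumes every image contains exactly one target that fills the whole
    (already size-normalized) picture, so the box is always 0 0 width height.
    """
    lines = []
    for name in sorted(os.listdir(pos_dir)):
        if name.lower().endswith((".jpg", ".png", ".bmp")):
            lines.append(f"{name} 1 0 0 {width} {height}")
    with open(os.path.join(pos_dir, out_name), "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines
```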
(4) Creating a positive sample Vec file
Since haartraining requires the positive samples to be supplied as a vec file, you need to use the createsamples program to convert the positive samples into one.
Run the executable named createsamples (renamed opencv_createsamples in newer versions of OpenCV) in the bin folder under the OpenCV installation directory. Note that the program must be launched from the command line (refer to my other blog: http://blog.csdn.net/carson2005/article/details/6704589), specifying the path where the positive samples are located and the path where the generated positive sample vec file should be saved (for example: F:\pos\pos.vec).
The command-line arguments of the createsamples program are:
-vec <vec_file_name>: the output file name for the generated positive samples.
-img <image_file_name>: the source target image (for example, a company logo).
-bg <background_file_name>: the background description file.
-num <number_of_samples>: the number of positive samples to generate, generally the same as the number of positive sample images.
-bgcolor <background_color>: the background color (assuming the current images are grayscale). The background color denotes the transparent color. Because images may be compressed, a color tolerance is specified by -bgthresh <background_color_threshold>; pixels between bgcolor-bgthresh and bgcolor+bgthresh are treated as transparent.
-inv: if specified, the colors are inverted.
-randinv: if specified, the colors are inverted randomly.
-maxidev <max_intensity_deviation>: the maximum intensity deviation of the foreground sample pixels.
-maxxangle, -maxyangle, -maxzangle: the maximum rotation angles about each axis, in radians.
-show: if specified, each sample is displayed; pressing Esc turns the display off (the sample pictures are no longer shown) and the creation process continues. This is a useful debugging option.
-w <sample_width>: the width of the output samples, in pixels.
-h <sample_height>: the height of the output samples, in pixels.
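Putting the arguments together, a typical invocation can be sketched like this. The paths, sample count, and 20*20 output size are illustrative assumptions; note that when building a vec file from real positive images described by pos.dat (rather than synthesizing samples from a single -img), the description file is passed with -info:

```python
import subprocess

# Illustrative command; adjust the paths to your own layout.
cmd = [
    "opencv_createsamples",        # named "createsamples" in older OpenCV versions
    "-info", "F:/pos/pos.dat",     # positive sample description file from step (3)
    "-vec", "F:/pos/pos.vec",      # output vec file
    "-num", "5000",                # number of positive samples
    "-w", "20", "-h", "20",        # output sample size in pixels
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once the OpenCV tools are on PATH
```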
(5) Create a negative sample description file
A negative sample description file is generated in the folder where the negative samples are stored. Unlike the positive sample description file, it only needs to list the image file names, one per line, without target coordinates; the steps are otherwise the same as in (3) and will not be repeated here.
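A minimal sketch of generating the negative description file (folder path, output name, and suffix list are assumptions):

```python
import os

def write_neg_dat(neg_dir, out_name="neg.dat"):
    """List every negative image, one file name per line (no coordinates)."""
    names = [n for n in sorted(os.listdir(neg_dir))
             if n.lower().endswith((".jpg", ".png", ".bmp"))]
    with open(os.path.join(neg_dir, out_name), "w") as out:
        out.write("\n".join(names) + "\n")
    return names
```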
(6) Conduct sample training
This step is done by calling the haartraining program under the opencv\bin directory (renamed opencv_haartraining in newer versions of OpenCV). The command-line arguments of haartraining are:
-data <dir_name>: the directory in which the trained classifier is stored.
-vec <vec_file_name>: the positive sample vec file (created by the createsamples program or by another method).
-bg <background_file_name>: the background (negative sample) description file.
-npos <number_of_positive_samples>, -nneg <number_of_negative_samples>: the number of positive/negative samples used to train each classifier stage. Reasonable values are, for example, npos = 7000 and nneg = 3000.
-nstages <number_of_stages>: the number of cascade stages to train.
-nsplits <number_of_splits>: determines the weak classifier used in the stage classifiers. If 1, a simple stump classifier is used; if 2 or more, a CART classifier with number_of_splits internal nodes is used.
-mem <memory_in_MB>: the amount of memory available for precomputation, in megabytes. The more memory, the faster the training.
-sym, -nonsym: specifies whether the target object is vertically symmetric. Vertical symmetry speeds up training; for example, a frontal face is vertically symmetric.
-minhitrate <min_hit_rate>: the minimum hit rate required of each stage classifier. The overall hit rate is min_hit_rate^number_of_stages.
-maxfalsealarm <max_false_alarm_rate>: the maximum false alarm rate allowed for each stage classifier. The overall false alarm rate is max_false_alarm_rate^number_of_stages.
-weighttrimming <weight_trimming>: specifies whether and how much weight trimming should be used. A reasonable choice is 0.9.
-mode <basic (default) | core | all>: selects the type of Haar features used for training. basic uses only upright features, while all uses the full set of upright and 45-degree rotated features.
-w <sample_width>, -h <sample_height>: the size of the training samples, in pixels. Must match the size used when the samples were created.
An example of a training classifier:
"D:\Program files\opencv\bin\haartraining.exe"-data Data\cascade-vec data\pos.vec-bg negdata\negdata.dat-npos 49-nn eg 49-mem 200-mode all-w 20-h 20
After training, several subdirectories are generated under the data directory; together they form the trained classifier.
(7) Generating an XML file
In the previous step, haartraining generated some directories and txt files under the data directory. We need to call opencv\bin\haarconv.exe to convert these txt files into a single XML file, which is the classifier.
At this point, the training of the classifier is complete. All that remains is to load the XML file in your program and call the corresponding function interface to perform classification and detection.