For work I need to detect a target with a classifier, so I had to train my own classifier.
Here I briefly describe the steps and the things to watch out for.
Haartraining Step 1: Positive and negative sample processing
Positive samples need to be normalized to a common size. In general you can use Photoshop to resize all the images uniformly, for example to 20*20 or 24*24. Other dimensions such as 240*15 also work; the samples do not have to be square. The size depends on the shape of your target. The resolution of the positive samples should not be too high, though: if it is, training crashes because memory allocation fails.
The error is as follows: OpenCV Error: Insufficient memory (Failed to allocate 250343435 bytes) in cv::OutOfMemoryError
Figure 1.1 Crash caused by failed memory allocation when the positive sample resolution is too high
If you do not pre-process the positive samples in Photoshop, you can also use the opencv_createsamples tool for this. In fact, even samples already processed in Photoshop are processed again by that tool. The advantage of pre-processing in Photoshop is that the position of every sample is 0, 0, so writing the positive sample description file is relatively simple. This matters most when there are many positive samples: it reduces errors in the positive sample description file and so improves the detection rate of the classifier.
1.1 Positive Sample Description file
The positive sample description file is created as follows:
A .txt or .dat file contains the positive sample information.
Structure of each line (fields separated by spaces):
relative or full path of a positive sample image, number of positive samples in that image, position x y of the first positive sample, width and height w h of the first positive sample, position x y of the second positive sample, width and height w h of the second positive sample, ... position x y of the nth positive sample, width and height w h of the nth positive sample
img01.jpg 1 0 0
img\\img02.jpg 1 0 0
e://img/img03.jpg 2 3 4 248
....
This shows why it pays to process the images in Photoshop first: if all images have the same size and each contains exactly one positive sample, the positive sample description file looks like this:
Img01.jpg 1 0 0 20 20
Img02.jpg 1 0 0 20 20
Img03.jpg 1 0 0 20 20
......
Since every line has the same form, the file can be generated in batch:
Command under Windows: dir imgdir /b > ImgDiscr.txt
(Under Linux use: ls imgdir > ImgDiscr.txt; this can also be saved as a .sh file so that you do not have to type the command every time)
I save the command directly as a .bat file, so instead of typing it each time I can click the file to generate ImgDiscr.txt, which contains the names of all the positive sample images:
Img01.jpg
Img02.jpg
Img03.jpg
....
Then open the file in an editor and replace jpg with jpg 1 0 0 20 20
This quickly produces a positive sample description file.
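The listing and search-and-replace steps can also be done in one script. The following is a minimal Python sketch, assuming every image holds exactly one positive sample that fills the whole image (position 0 0, size 20*20 by default). The function name write_description and the extension filter are illustrative choices, not part of OpenCV.

```python
import os

def write_description(img_dir, out_path, width=20, height=20, exts=(".jpg",)):
    """Write one 'name 1 0 0 w h' line per image found in img_dir.

    Assumption: each image contains a single positive sample that fills
    the whole image, so its position is always 0 0.
    Returns the number of lines written.
    """
    # Sorted so the output is stable between runs.
    names = sorted(n for n in os.listdir(img_dir) if n.lower().endswith(exts))
    # newline="\n" keeps Unix line endings even when run under Windows.
    with open(out_path, "w", newline="\n") as f:
        for name in names:
            f.write(f"{name} 1 0 0 {width} {height}\n")
    return len(names)
```

As with dir /b, only bare file names are written, so the description file is expected to sit where those names resolve.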
1.2 Generation of negative sample description files
This works the same way as for the positive samples, except that the negative sample images do not need any processing, as long as they are not smaller than the positive sample images. The negative sample description file only needs to contain the image paths:
Non-img01.jpg
Non-img02.jpg
Non-img03.jpg
....
2. opencv_createsamples: generating the positive sample .vec file
During training the classifier does not read the positive sample description file directly. It reads a .vec file that contains the positive samples, which is a binary file generated by opencv_createsamples.
The parameters are set as follows:
opencv_createsamples.exe -info imgDiscr.txt -vec samples.vec -num -w -h 20
Command-line arguments:
- [-vec <vec_file_name>] Output file containing the positive samples for training.
- [-img <image_file_name>] Input image file name (for example, a company logo).
- [-bg <background_file_name>] Description file of the background images; it lists image file names that are randomly selected as the object's background.
- [-num <number_of_samples>] The number of positive samples to generate.
- [-maxxangle <max_x_rotation_angle>] The maximum rotation angle around the x-axis, in radians.
- [-maxyangle <max_y_rotation_angle>] The maximum rotation angle around the y-axis, in radians.
- [-maxzangle <max_z_rotation_angle>] The maximum rotation angle around the z-axis, in radians.
- [-w <sample_width>] The width of the output samples, in pixels.
- [-h <sample_height>] The height of the output samples, in pixels.
The value of -num should be the total number of positive samples, which is the sum of the second column in the positive sample description file.
-w and -h are the width and height of the output images. The samples are normalized here as well: we normalized the images in Photoshop beforehand, but even without that step the createsamples tool normalizes the positive samples so that they all have the same size. Also, if the images were normalized to 20 pixels in Photoshop and 24 pixels is set here, the images inside the generated .vec file are 24 pixels.
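Because -num has to equal the sum of the second column, it is safer to compute it from the description file than to count by hand. A minimal Python sketch, assuming whitespace-separated fields and image paths without spaces; count_samples is a hypothetical helper name. It also checks that every line carries four numbers (x y w h) per declared sample, which catches truncated lines early.

```python
def count_samples(lines):
    """Return the total number of positive samples declared in a description file.

    Each non-empty line is expected to look like:
        path count x1 y1 w1 h1 ... (four numbers per sample)
    Raises ValueError when a line's field count does not match its declared count.
    """
    total = 0
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: missing sample count")
        count = int(fields[1])
        if len(fields) != 2 + 4 * count:
            raise ValueError(
                f"line {lineno}: expected {4 * count} numbers for "
                f"{count} sample(s), got {len(fields) - 2}"
            )
        total += count
    return total
```

Passing an open file object (or any list of lines) to count_samples gives the value to use for -num.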
3. haartraining: training the classifier
The opencv_haartraining trainer is obsolete. It is still included in 2.4.9 but is gone after 2.4.10. However, the classifier trained with opencv_traincascade could not be used in the sample program, and opencv_traincascade in OpenCV 3.10 had problems in use, so I was forced to go back to the older haartraining. opencv_traincascade produced the following error:
Train dataset for temp stage can not be filled. Branch training terminated.
According to StackOverflow, the cause is that a description file generated under Windows ends each line with \r\n, while under Linux lines end with \n and no \r, so the image files cannot be read. However, training under Linux still showed the same problem, so I do not think the description file is the cause. The command used to train the classifier is:
- 2 0.999 0.5 $ - - the 1024x768 1 +
The number of positive samples here is 200, the negative sample number is 20, the sample width is 240, and the height is 15.
The minimum hit rate per stage is 0.999. Because -nstages is 20, the overall hit rate is 0.999^20 = 0.98. The maximum false alarm rate per stage is 0.5, so after 20 stages the overall false alarm rate is 0.5^20 = 0.0000009536.
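The same arithmetic applies to any stage count, since the per-stage rates simply multiply across the cascade. A short Python check using only the values from the text (0.999, 0.5, 20 stages); the function name is illustrative, and the result assumes every stage exactly meets its per-stage target.

```python
def cascade_rates(min_hit_rate, max_false_alarm, n_stages):
    """Overall hit rate and false alarm rate after n_stages stages,
    assuming every stage exactly meets its per-stage targets."""
    overall_hit = min_hit_rate ** n_stages
    overall_false_alarm = max_false_alarm ** n_stages
    return overall_hit, overall_false_alarm

# Values from the training run described in the text.
hit, fa = cascade_rates(0.999, 0.5, 20)
print(f"overall hit rate    ~ {hit:.4f}")
print(f"overall false alarm ~ {fa:.10f}")
```

Trying a few values of n_stages this way shows how quickly a slightly lower per-stage hit rate erodes the overall detection rate.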
Basic parameters:
-data <dir_name>
-vec <vec_file_name>
-bg <background_file_name>
[-npos <number_of_positive_samples = %d>]
[-nneg <number_of_negative_samples = %d>]
[-nstages <number_of_stages = %d>]
[-nsplits <number_of_splits = %d>]
[-mem <memory_in_MB = %d>]
[-sym (default)] [-nonsym]
[-minhitrate <min_hit_rate = %f>]
[-maxfalsealarm <max_false_alarm_rate = %f>]
[-weighttrimming <weight_trimming = %f>]
[-eqw]
[-mode <BASIC (default) | CORE | ALL>]
[-w <sample_width = %d>]
[-h <sample_height = %d>]
[-bt <DAB | RAB | LB | GAB (default)>]
[-err <misclass (default) | gini | entropy>]
[-maxtreesplits <max_number_of_splits_in_tree_cascade = %d>]
[-minpos <min_number_of_positive_samples_per_cluster = %d>]
Note: set the -nneg parameter to the actual number of negative samples * maxfalsealarm.
The reason is that once the first stage has been trained, less than the full set of negative samples is left for the second stage, because some of the negative samples are no longer loaded. If -nneg is set to the actual number of negative samples, the program is likely to enter an infinite loop. If the sample set is not large and there is still no progress after a few hours, the trainer is stuck in an infinite loop. With the 200 positive and 20 negative samples above, the first 10 stages basically finish within a minute and the later stages take a few minutes; if nothing moves for more than 10 minutes, it is an infinite loop.
If the parameters are not set correctly, training does not succeed. Another possible cause of an infinite loop is that the false alarm rate cannot be brought down, that is, it stays above maxfalsealarm, so the trainer keeps running and never produces a result. If training is still running after a long time and the parameters are fine, set the number of stages to a suitable value so that training stops and you get the classifier.
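The -nneg rule above is easy to apply in a script before launching the trainer. A sketch in Python, assuming the rule exactly as stated in this post (actual negative count * maxfalsealarm) and rounding down; suggested_nneg is a hypothetical helper, not an OpenCV function.

```python
import math

def suggested_nneg(actual_negatives, max_false_alarm):
    """Apply the rule from the text: -nneg = actual negatives * maxfalsealarm.

    Rounds down, and never returns less than 1 so that the trainer
    is still asked for at least one negative sample.
    """
    if not 0.0 < max_false_alarm <= 1.0:
        raise ValueError("max_false_alarm must be in (0, 1]")
    return max(1, math.floor(actual_negatives * max_false_alarm))
```

This encodes a rule of thumb from experience with the stall described above rather than a documented OpenCV requirement.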
Training with opencv_traincascade can use TBB for multi-core processing;
enable TBB when compiling OpenCV.
The classifier trained with opencv_haartraining is stored in an XML file named after the directory given in the -data parameter.
Training can be stopped at any time. When it is restarted, the trainer reads the previous parameters and continues from where it stopped.
End of training: