Training an OpenCV Haar classifier


Because of my work I need to use a classifier to detect a target, so I had to train my own classifier.

Here I will briefly describe the steps and the things to watch out for.

1. Positive and negative sample processing

Positive samples need to be normalized. In general you can use Photoshop to resize all the images to a uniform size, for example 20*20 or 24*24; other dimensions such as 240*15 can also be used as samples, and a square is not required: it depends on the shape of your target. However, the resolution of the positive samples should not be too high, or training will fail to allocate enough memory and crash.

The error is as follows: OpenCV Error: Insufficient memory (Failed to allocate 250343435 bytes) in cv::OutOfMemoryError

Figure 1.1 Out-of-memory crash caused by a positive sample resolution that is too large

If you do not pre-process the positive samples with Photoshop, you can also let the opencv_createsamples tool do it; in fact, even samples that were already processed in Photoshop are processed again there. The advantage of pre-processing in Photoshop is that every sample's position is simply 0 0, so the positive sample description file is easy to write; especially when the number of positive samples is large, this reduces the error rate in the description file and improves the detection rate of the classifier.

1.1 Positive sample description file

The positive sample description file is created as follows:

A .txt or .dat file contains the positive sample information.

Structure:

relative or full path of the positive sample image   number of objects contained in the image   x y position of the first object   w h width and height of the first object   x y position of the second object   w h width and height of the second object   ...   x y position of the nth object   w h width and height of the nth object

img01.jpg 1 0 0 20 20

img\img02.jpg 1 0 0 20 20

e:/img/img03.jpg 2 3 4 20 20 100 50 20 20

....

Here you can see why Photoshop was used to process the images beforehand: all the images have a uniform size and contain exactly one positive sample, so the positive sample description file simply looks like this:

img01.jpg 1 0 0 20 20

img02.jpg 1 0 0 20 20

img03.jpg 1 0 0 20 20

......

Since every line is identical except for the file name, the file can be generated with a batch command:

Command under Windows: dir /b imgdir > ImgDiscr.txt

(Under Linux use: ls imgdir > imgdiscr.txt; this can also be saved as a .sh file so you do not have to type the command every time.)

I save it directly as a .bat file so I do not have to type the command each time; clicking it generates ImgDiscr.txt, a file containing all the positive sample image names:

img01.jpg

img02.jpg

img03.jpg

....

Then open it in an editor and replace every ".jpg" with ".jpg 1 0 0 20 20",

and the positive sample description file is generated quickly.
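As a sketch, the listing and the replacement can also be combined in a single .bat file; imgdir and ImgDiscr.txt are just example names, and it assumes every image is 20*20 and contains one object at position 0 0:

@echo off
rem List every .jpg in imgdir and append " 1 0 0 20 20" to each file name
(for %%f in (imgdir\*.jpg) do @echo %%~nxf 1 0 0 20 20) > ImgDiscr.txt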

1.2 Generation of negative sample description files

This works the same way as for the positive samples, except that the negative sample images do not need to be processed; they only need to be no smaller than the positive sample size. The negative sample description file only needs to contain the image path names:

Non-img01.jpg

Non-img02.jpg

Non-img03.jpg

....
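As with the positive samples, the listing can be generated with one command; negdir and NegDiscr.txt are placeholder names, and the paths must be resolvable from the directory where the trainer is run (adding /s writes full paths instead of bare names):

dir /b negdir > NegDiscr.txt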

2. opencv_createsamples: generating the positive sample .vec file

During training the classifier does not read the positive sample description file directly; instead it reads a .vec file, a vector file that packs all the positive samples, generated with the opencv_createsamples tool.

The parameters are set as follows:

opencv_createsamples.exe -info ImgDiscr.txt -vec samples.vec -num <number_of_positive_samples> -w 20 -h 20

Command-line arguments:

    • [-vec <vec_file_name>] Output file that will contain the positive samples for training.
    • [-img <image_file_name>] Input object image (for example, a company logo).
    • [-bg <background_file_name>] Background description file; it contains a list of image file names that are randomly chosen as backgrounds for the object.
    • [-num <number_of_samples>] Number of positive samples to generate.
    • [-maxxangle <max_x_rotation_angle>] Maximum rotation angle around the x-axis, in radians.
    • [-maxyangle <max_y_rotation_angle>] Maximum rotation angle around the y-axis, in radians.
    • [-maxzangle <max_z_rotation_angle>] Maximum rotation angle around the z-axis, in radians.
    • [-w <sample_width>] Width of the output samples, in pixels.
    • [-h <sample_height>] Height of the output samples, in pixels.

The value of -num should be the total number of positive objects, that is, the sum of the second column of the positive sample description file.

-w and -h are the width and height of the output samples; the samples are normalized again here. We already normalized the images beforehand with Photoshop, but if you did not, the createsamples tool normalizes the positive samples at this point, so that all of them end up the same size. Also note that if the images were normalized to 20 pixels in Photoshop but 24 pixels is set here, the images inside the generated .vec file will be 24 pixels.
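Putting this together, a minimal sketch that matches the 20*20 example above (ImgDiscr.txt and samples.vec as in section 1.1, and assuming the description file lists 200 objects in total):

opencv_createsamples.exe -info ImgDiscr.txt -vec samples.vec -num 200 -w 20 -h 20
rem the packed samples can be reviewed afterwards with the -show option
opencv_createsamples.exe -vec samples.vec -w 20 -h 20 -show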

3. haartraining: training the classifier

opencv_haartraining is an obsolete trainer; it still exists in 2.4.9 but is gone after 2.4.10. However, the classifier trained with opencv_traincascade could not be used in my sample program, and opencv_traincascade in OpenCV 3.1.0 also had problems in use, so I was forced to fall back on the old haartraining. opencv_traincascade produced the following error:

Train dataset for temp stage can not be filled. Branch training terminated.

According to StackOverflow, the cause is that a description file generated under Windows ends its lines with \r\n, while under Linux it ends with \n and no \r, which results in the image files not being read. But the problem still occurred when training under Linux, so I do not think the problem is in the description file. The classifier was trained with a command of the following form:

opencv_haartraining -data <classifier_directory> -vec samples.vec -bg <negative_description_file> -npos 200 -nneg 20 -nstages 20 -minhitrate 0.999 -maxfalsealarm 0.5 -w 240 -h 15

Here the number of positive samples is 200, the number of negative samples is 20, the sample width is 240, and the sample height is 15.

The minimum hit rate for positive samples is 0.999; since -nstages is 20, the overall hit rate of the cascade is 0.999^20 ≈ 0.98. The maximum false alarm rate per stage for negative samples is 0.5, so after 20 stages the overall false alarm rate is 0.5^20 ≈ 0.00000095.
Basic parameters:

"-data <dir_name>\n" "-vec <vec_file_name>\n" "-BG <background_file_name>\n" "[-npos <number_of_positive_samples =%d>]\n" "[-nneg <number_of_negative_samples =%d>]\n" "[-nstages <number_of_stages =%d>]\n" "[-nsplits <number_of_splits =%d>]\n" "[-mem <memory_in_mb =%d>]\n" "[-sym (default)] [-nonsym]\n" "[-minhitrate <min_hit_rate =%f>]\n" "[-maxfalsealarm <max_false_alarm_rate =%f>]\n" "[-weighttrimming <weight_trimming =%f>]\n" "[-eqw]\n" "[-mode <basic (default) | CORE | all>]\n" "[-W <sample_width =%d>]\n" "[-H <sample_height =%d>]\n" "[-BT <dab | RAB | LB | GAB (default) >]\n" "[-err <misclass (default) | gini | entropy>]\n" "[-maxtreesplits <max_number_of_splits_in_tree_cascade =%d>]\n" "[-minpos <min_number_of_positive_samples_per_cluster =%d>]\n",

Note that the -nneg parameter should be roughly the actual number of negative samples * maxfalsealarm (for example, with 1000 actual negatives and maxfalsealarm = 0.5, set -nneg to about 500).

The reason is that after the first stage finishes, the number of negative samples available for the second stage is no longer the full set, because negatives that were not misclassified are not loaded again. So if -nneg is set to the actual negative sample count, the program is likely to enter an endless loop. If the sample count is small but there is no progress after a few hours, it has entered an endless loop; with the 200 positive and 20 negative samples above, the first 10 stages finish within about a minute and each later stage takes a few minutes, so if there is no progress for more than 10 minutes it is stuck. If the parameters are not set correctly the training will also fail; another way to enter an endless loop is that the false alarm rate never comes down, that is, it stays above maxfalsealarm, so the trainer keeps running and never stops. If it has been training for a while and is still going, and the parameters look fine, set the number of stages appropriately lower so that training stops and you still get a classifier.

Training with opencv_traincascade can be combined with TBB for multi-core processing;

TBB support must be enabled when OpenCV is compiled.
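For example, with a standard CMake build of the OpenCV source (the source path below is a placeholder), TBB is switched on at configure time:

cmake -D WITH_TBB=ON <path_to_opencv_source>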

The classifier trained with opencv_haartraining is stored in an XML file named after the directory given by the -data parameter, under the current directory.

Training can be stopped at any time; when training is restarted, the trainer reads the previously saved results and continues from where it stopped.

End of training.
