Train your own classifier with Haar features



OpenCV provides a number of methods and utilities for training classifiers. The training procedure for a human face detection classifier is called Haar training, and we can use these tools to create our own classifier.

(i) Data preparation
Positive sample (human face)
We need to collect images that contain only faces. The UMIST face database has video-like, continuous sequences of face images, both frontal and in profile. I thought that training on these images would produce a face detector that is very robust to head pose, but that turned out to be too optimistic; in practice the results were only mediocre. Later I used a frontal-face set based on the CMU PIE database, which covers many different lighting conditions, but the results were similar and still not ideal. The MIT CBCL face data is another option; it contains 2,429 frontal face images with some variation in expression and lighting, which would otherwise make it well suited to Haar training, but the images in that library are only 19*19 pixels, so we cannot experiment with detecting faces at larger sample sizes.
It is also possible to use the FERET database, as the OpenCV developers did.
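Once the cropped face images have been collected, they have to be packed into the vec file that haartraining expects (the samples.vec used in the training command later on). A minimal sketch of that step, assuming a description file named positives.dat and the createsamples utility built alongside haartraining (called opencv_createsamples in newer OpenCV releases):

# positives.dat lists one cropped face image per line in the form
#   <filename> <number of faces> <x> <y> <width> <height>
# e.g.  positives/face_0001.png 1 0 0 24 24
$ createsamples -info positives.dat -vec samples.vec -num 7000 -w 20 -h 20
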
Negative sample (background)
We also need to collect images of things we are not interested in, that is, images that contain no faces, to serve as negative samples for the Haar classifier.
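The negative images only need to be listed, one path per line, in a background description file; this is the negatives.dat passed to haartraining below. A minimal sketch, assuming the background images are stored under a negatives/ directory:

$ find negatives/ -name '*.jpg' > negatives.dat
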
Test images in the natural state (faces within a background)
We can use the createsamples utility to synthesize a test image set, but a dedicated set of natural test images is even better.
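If you take the synthetic route, createsamples can paste a single cropped face onto the background images and write out the resulting scenes together with their ground-truth rectangles. A minimal sketch with assumed file names (face.png, tests.dat) and distortion limits:

$ createsamples -img face.png -num 100 -bg negatives.dat -info tests.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3
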
The OpenCV developers used the CMU-MIT frontal face test set for this kind of image. The set comes with annotations giving the positions of the eye centers, the nose, and the corners and center of the lips, but it does not describe the face region as a rectangle.
However, a rectangle for the face region can be computed from these annotations, for example as follows:
Take the vertical position of the nose as the upper edge of the rectangle, the vertical position of the mouth as the lower edge, the left corner of the mouth as the left edge, and the right corner of the mouth as the right edge.
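A minimal sketch of that computation, assuming a hypothetical annotation layout in which each whitespace-separated line holds the file name followed by the x/y coordinates of the left eye, right eye, nose, left mouth corner, mouth center, and right mouth corner (check the real CMU-MIT ground-truth format before using this):

# prints: <filename> <x> <y> <width> <height> for each annotated face
$ awk '{ print $1, $8, $7, $12 - $8, $11 - $7 }' ground_truth.txt > cmu_tests.dat
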
Although this method is not perfect, it seems workable.
How to quickly crop images by hand
The software I use is imageclipper, which is helpful not only for Haar training but also for other work in computer vision and machine learning. Its main features are:
- it automatically opens the image sequence in a directory, and opens video files frame by frame;
- a shortcut key crops the selected region and jumps to the next image;
- the region to crop is selected directly with the left mouse button, and moved or resized with the right mouse button;
- the selected region is carried over and shown on the next picture.


1. Haar training
Now we use the haartraining utility to train our own classifier. Its usage is as follows:

Usage: ./haartraining
-data <dir_name>
-vec <vec_file_name>
-bg <background_file_name>
[-npos <number_of_positive_samples = 2000>]
[-nneg <number_of_negative_samples = 2000>]
[-nstages <number_of_stages = 14>]
[-nsplits <number_of_splits = 1>]
[-mem <memory_in_MB = 200>]
[-sym (default)] [-nonsym]
[-minhitrate <min_hit_rate = 0.995000>]
[-maxfalsealarm <max_false_alarm_rate = 0.500000>]
[-weighttrimming <weight_trimming = 0.950000>]
[-eqw]
[-mode <BASIC (default) | CORE | ALL>]
[-w <sample_width = 24>]
[-h <sample_height = 24>]
[-bt <DAB | RAB | LB | GAB (default)>]
[-err <misclass (default) | gini | entropy>]
[-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
[-minpos <min_number_of_positive_samples_per_cluster = 500>]


Kuranov et al. point out that a sample size of 20*20 gives the highest recognition rate. In addition, for 18*18 samples, weak trees with four split nodes performed best, while for 20*20 samples trees with two split nodes were clearly better. The differences among weak tree classifiers with 2, 3, or 4 split nodes are otherwise small.
There is also a rationale for training 20 stages: assuming my test set is representative of the learning task, I can expect an overall false alarm rate of about 0.5^20 ≈ 9.5e-07 and an overall hit rate of about 0.999^20 ≈ 0.98.
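As a quick check of this arithmetic (a sketch, where f_i <= 0.5 and d_i >= 0.999 are the false alarm and hit rates that stage i is trained to, and the per-stage rates multiply across a 20-stage cascade):

F \le \prod_{i=1}^{20} f_i = 0.5^{20} \approx 9.5 \times 10^{-7}, \qquad D \ge \prod_{i=1}^{20} d_i = 0.999^{20} \approx 0.98
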
So a sample size of 20*20 with nsplits=2, nstages=20, minhitrate=0.999 (default: 0.995), maxfalsealarm=0.5 (default: 0.5), and weighttrimming=0.95 (default: 0.95) is a reasonably good combination.

$ haartraining -data haarcascade -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 7000 -nneg 3019 -w 20 -h 20 -nonsym -mem 512 -mode ALL


The "-nonsym" option is used when the object class has no vertical (left-right) symmetry. If the object class is vertically symmetric, as a frontal face is, use "-sym" (the default). This speeds up processing because only half of the Haar features need to be evaluated.
"-mode ALL" uses the extended set of Haar-like features. By default only the upright features are used; ALL adds the 45°-rotated features on top of them.
"-mem 512" is the amount of memory, in megabytes, that the training process may use. The default is 200 MB.
There are other options that are not used:

[-bt <DAB | RAB | LB | GAB (default)>]
[-err <misclass (default) | gini | entropy>]
[-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
[-minpos <min_number_of_positive_samples_per_cluster = 500>]


# You can use OpenMP (multi-processing) to speed up training.
# A single training run lasted three days.

2. Generating an XML file
When the Haar training process has completely finished, it generates an XML file.
If you want to convert an intermediate Haar training output directory tree into an XML file, there is a program available at opencv/samples/c/convert_cascade.c.
Its usage is:

$ convert_cascade --size="<sample_width>x<sample_height>" <haartraining_output_dir> <output_file>


Example:

$ convert_cascade --size="20x20" haarcascade haarcascade.xml
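
The resulting haarcascade.xml can then be loaded just like the cascades that ship with OpenCV, for example by the facedetect sample (a sketch; the exact option syntax of the sample varies between OpenCV versions):

$ ./facedetect --cascade="haarcascade.xml" test.jpg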
