Reading the paper: Binarized Normed Gradients for Objectness Estimation at 300fps

Last Update:2018-08-03 Source: Internet

Author: User

Tags bitwise svm

About the paper

Over the past two days I went through the CVPR 2014 papers, found Ming-Ming Cheng's paper on objectness detection, and read it. The paper makes two main points:

First, objects have closed boundaries, so a window resized to 8x8 can still be classified as object or background. Resizing to 8x8 lowers the resolution of the window and thereby speeds up the computation. It is like spotting an acquaintance on the road: from far away, even without seeing the face, the outline alone is enough to tell whether we know the person, whereas a close-up of only part of the face may not be. The authors also use the simplest possible feature, the gradient, which has very few dimensions.

Second, the authors cleverly turn (or approximate) the computation of the window score into bitwise operations. The higher the score, the more likely the window is an object; the lower, the more likely it is background. On this basis they reach a computation time of 0.003 s per image.

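Window scoring is done by a linear model (in effect, a filter):

$s_l = \langle w, g_l \rangle$  (1)

where s_l is the score of the window at location l, w is the learned weight vector, and g_l is the gradient feature of that window.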

To obtain the weights w, the model has to be trained on training data, and the authors use the simplest choice, a linear SVM. The process is roughly as follows. In the training data, object windows and background windows are given different labels (judging from the program, object windows are labeled 1 and background windows -1). The linear SVM adjusts w to minimize the error on the training data. The resulting w is then used to predict window scores: a window whose score is greater than 1 is treated as an object window, otherwise as a background window.
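The training step can be sketched as follows. This is only an illustration, not the author's code: it assumes labels of +1 and -1, swaps in plain sub-gradient descent on the regularized hinge loss for a real SVM solver, classifies by the sign of the score, and uses made-up 4-dimensional toy data. The names `train_linear_svm` and `score` are hypothetical.

```python
import random

def score(w, b, x):
    """Linear model: s = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss (labels are +1/-1)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * score(w, b, x)
            # The regularization term shrinks w a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample violates the margin: move w and b toward y * x.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Toy data (hypothetical): "object" windows have strong responses, background weak ones.
objects = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.7], [0.7, 0.7, 0.8, 0.8]]
backgrounds = [[0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1], [0.0, 0.1, 0.2, 0.2]]
X = objects + backgrounds
Y = [1] * len(objects) + [-1] * len(backgrounds)

w, b = train_linear_svm(X, Y)
predictions = [1 if score(w, b, x) > 0 else -1 for x in X]
print(predictions)
```

In the real program w has 64 dimensions, one per cell of the 8x8 window, and the training windows come from the VOC annotations.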

Training the scoring parameters with a linear SVM is nothing special. The key is the preprocessing of the windows. Since objects are generally "not too small", the authors pick a set of fixed sliding-window sizes, such as 10x160 and 10x320, and lower the resolution by resizing each window to 8x8. Scoring and the training of w are both done on the resized windows.

My understanding of the resize is this. Resizing can shrink the difference between foreground and background and so make them harder to tell apart, but it also shrinks the differences among backgrounds and among foregrounds. As long as those within-class differences shrink more than the foreground-background difference does, resizing still helps separate foreground from background. There should be a best compromise somewhere, but the authors, apparently for the sake of computational efficiency, resize directly to 8x8. So w and g_l in (1) are both 64-dimensional vectors.
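The preprocessing and scoring described above can be sketched like this. It follows this post's description (resize the window to 8x8, then take gradients) rather than the author's code. The nearest-neighbour resize, the [-1, 0, 1] gradient mask with clamped borders, the clipping at 255, and the uniform placeholder weights are all assumptions made for the example.

```python
def resize_nearest(img, out_h=8, out_w=8):
    """Nearest-neighbour resize of a 2-D list of gray values
    (an assumption; the real program relies on OpenCV)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normed_gradient(img):
    """Gradient magnitude min(|gx| + |gy|, 255) with a [-1, 0, 1] mask,
    clamping coordinates at the border."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            row.append(min(abs(gx) + abs(gy), 255))
        out.append(row)
    return out

def window_score(w64, g64):
    """Equation (1): inner product of the weights and the 64-D feature."""
    return sum(wi * gi for wi, gi in zip(w64, g64))

# Hypothetical 16x16 window: a bright square on a dark background.
window = [[200 if 4 <= r < 12 and 4 <= c < 12 else 10 for c in range(16)]
          for r in range(16)]

small = resize_nearest(window)        # 8x8 window
ng = normed_gradient(small)           # 8x8 gradient map
g = [v for row in ng for v in row]    # flattened 64-D feature
w = [1.0 / 64] * 64                   # placeholder weights, not trained values
print(len(g), window_score(w, g))
```

The gradient map is non-zero only around the closed boundary of the square, which is the cue the paper relies on.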

Once w is known, the score of a window could be computed directly from (1) to decide whether it is an object. The authors do not simply apply (1), however. They convert it into bitwise operations, which is why the feature is called BING (the B stands for binarized), so that hardware instructions can be used directly and the speed improves significantly. To use binary operations, both w and g_l have to be converted into binary models. Algorithm 1 converts w into a binary model. As I see it, the principle is basically to project w onto a series of orthogonal vectors; if Algorithm 1 is unclear, look carefully at how it operates: isn't it just Gram-Schmidt orthogonalization? Only the first N_w vectors, which carry most of the information, are kept as output, in order to reduce the amount of computation. The NG feature g_l is converted into a binary model as

$g_l \approx \sum_{k=1}^{N_g} 2^{8-k} b_{k,l}$

I think the meaning is as follows. Take the decimal number 121, which is 0111 1001 in binary. We can truncate the low bits directly: dropping the lowest 3 bits (that is, keeping the top 5 bits, so N_g = 5 in the formula's notation) approximates 121 by 0111 1000, which is 120. There is still something I do not understand, though: isn't b_{k,l} an 8x8-dimensional feature? I do not see how summing these matrices could give a scalar g_l. I feel the subscripts are used somewhat confusingly and are not explained clearly enough. To compute the 64-D BING feature, 64 points would have to be scanned; the authors use Algorithm 2 to reduce this cost with binary shift operations, which, as the authors say, is somewhat similar to computing an integral image (the integral image representation).
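The truncation can be checked with a few lines of code. This is a sketch of one possible reading, in which the sum is taken element-wise: each b_k is a plane of 64 single bits (one per cell of the 8x8 window), and the weighted sum of the planes gives back a 64-D vector rather than a scalar. The function names are made up for the example.

```python
def top_bits(value, n_g):
    """Keep only the n_g most significant bits of an 8-bit value."""
    mask = (0xFF << (8 - n_g)) & 0xFF
    return value & mask

def bit_planes(values, n_g):
    """Split 8-bit values into n_g binary planes; plane k = 1 holds the MSBs."""
    return [[(v >> (8 - k)) & 1 for v in values] for k in range(1, n_g + 1)]

def reconstruct(planes):
    """Element-wise sum over k of 2^(8-k) * b_k."""
    n = len(planes[0])
    return [sum((1 << (8 - k)) * plane[i]
                for k, plane in enumerate(planes, start=1))
            for i in range(n)]

# 121 = 0111 1001; keeping the top 5 bits gives 0111 1000 = 120.
print(format(top_bits(121, 5), "08b"))

# A few hypothetical feature values, approximated by their top 4 bits.
values = [121, 255, 7, 64]
planes = bit_planes(values, 4)
print(reconstruct(planes))
```

Reconstructing from the planes gives the same numbers as masking off the low bits, so the two views of the approximation agree.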

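Finally, Algorithms 1 and 2 are combined, so that scoring a window changes from a convolution into mostly bitwise operations:

$s_l \approx \sum_{j=1}^{N_w} \beta_j \sum_{k=1}^{N_g} C_{j,k}$

where C_{j,k} is

$C_{j,k} = 2^{8-k} \left( 2 \langle a_j^+, b_{k,l} \rangle - |b_{k,l}| \right)$

Here \beta_j and a_j are the coefficients and binary basis vectors that Algorithm 1 produces for w, a_j^+ is a_j with its -1 entries replaced by 0, and b_{k,l} is the k-th bit plane of g_l.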
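To see how the pieces fit together, here is a small sketch of the binarized scoring. It is my reading of the scheme, not the author's implementation: w is approximated greedily by a few {-1, +1} vectors a_j with coefficients beta_j, the feature is cut into bit planes, and each inner product becomes an AND followed by a bit count, using <a, b> = 2<a+, b> - |b|. The values N_W = 2 and N_G = 4 and the synthetic w and g are chosen only for illustration.

```python
def binarize_w(w, n_w):
    """Greedy approximation w ~= sum_j beta_j * a_j with a_j in {-1, +1}^d."""
    residual = list(w)
    betas, bases = [], []
    for _ in range(n_w):
        a = [1 if r >= 0 else -1 for r in residual]
        beta = sum(ai * ri for ai, ri in zip(a, residual)) / len(a)
        residual = [ri - beta * ai for ri, ai in zip(residual, a)]
        betas.append(beta)
        bases.append(a)
    return betas, bases

def approx_w(betas, bases):
    """Rebuild the approximated weight vector from its binary model."""
    dim = len(bases[0])
    return [sum(beta * a[i] for beta, a in zip(betas, bases)) for i in range(dim)]

def pack_bits(bits):
    """Pack a list of 0/1 values into one integer (bit i = bits[i])."""
    word = 0
    for i, bit in enumerate(bits):
        if bit:
            word |= 1 << i
    return word

def popcount(word):
    return bin(word).count("1")

def bitwise_score(betas, bases, g, n_g):
    """s ~= sum_j beta_j * sum_k 2^(8-k) * (2 * popcount(a_j+ & b_k) - popcount(b_k))."""
    a_plus = [pack_bits([1 if ai > 0 else 0 for ai in a]) for a in bases]
    planes = [pack_bits([(v >> (8 - k)) & 1 for v in g]) for k in range(1, n_g + 1)]
    total = 0.0
    for beta, ap in zip(betas, a_plus):
        for k, b in enumerate(planes, start=1):
            c_jk = (1 << (8 - k)) * (2 * popcount(ap & b) - popcount(b))
            total += beta * c_jk
    return total

N_W, N_G = 2, 4
# Synthetic 64-D weights and 8-bit features (not real trained values).
w = [((i * 37) % 64 - 32) / 32.0 for i in range(64)]
g = [(i * 53) % 256 for i in range(64)]

betas, bases = binarize_w(w, N_W)
approx = bitwise_score(betas, bases, g, N_G)

# Reference: ordinary dot product of the approximated w and the truncated g.
w_hat = approx_w(betas, bases)
g_hat = [v & (0xFF << (8 - N_G)) & 0xFF for v in g]
reference = sum(wi * gi for wi, gi in zip(w_hat, g_hat))
exact = sum(wi * gi for wi, gi in zip(w, g))
print(exact, approx, reference)
```

The bitwise result matches the dot product of the approximated vectors up to rounding; the gap to the exact score comes only from truncating w and g.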

The calculation above is easy to carry out quickly with bitwise operations and SSE instructions (which support 8x8 = 64 bits).

About the program

I also ran Cheng's program and skimmed through the code. I really admire experts like him: even if I wanted to, I probably could not write such a program this well, and in C++ at that. The program requires OpenCV, and older versions do not seem to work. The author originally used VS2012; if you do not want to recompile OpenCV, it is best to use OpenCV 2.4.8 or later (I use 2.4.10). Once the OpenCV environment is ready, you also need to prepare the following:

Download the VOC dataset. The author's page gives the link, but pay attention to the annotations: the official VOC annotations are in XML format, and the author converted them to YML so that OpenCV can read them easily. After downloading the original VOC dataset, overwrite its XML annotation files with the author's files. These can be found under the downloads option at http://mmcheng.net/bing/downloads.

Configure VS2012. The program uses parallel processing, so turn on /openmp under the "C/C++ -> Language" options. You may also need SSE instructions, enabled with /arch:SSE2 under "C/C++ -> Code Generation". I did enable it, but the compiler reported that it was ignoring the unknown option "/arch:SSE2". The option is not accepted for my x64 build (x64 targets always have SSE2 available), so I simply ran the program as it was.

The results on my computer seem to fall well short of those reported in the paper, but the speed is still better than that of earlier methods, though I do not know by how much.

Output of the run

Stage I, which trains the w parameters, surprisingly took 13 s, Stage II took 344 s, and a single image took 0.1 s. If you uncomment the last line of the program, objNess.illuTestReults(boxesTests), you can see the predicted object windows drawn on the images under voc2007/local/.

Object window results

The results look quite good, although I have not analyzed the accuracy in detail. After the program finishes, a PerImgAll.m file is generated under voc2007/results/, and running it directly in MATLAB produces the result curves: with 1000 windows the DR is about 96%, and with more than 2038 windows the DR reaches 97%.

DR and MABO

The accuracy curve above is called the DR-#WIN curve and comes from a TPAMI 2012 paper, "Measuring the objectness of image windows". That paper also proposes normalizing the range of the number of windows, for example [0, 5000], to [0, 1] and using the area under the curve (AUC) as the measure of detection quality, so that the AUC lies in [0, 1].

Calculation of the detection rate (DR)

The calculation of DR follows the PASCAL Visual Object Classes (VOC) challenge. In the detection task, a detection is judged to be a true or false positive by comparing it with the ground truth: the area of the intersection of the predicted box and the ground-truth box is divided by the area of their union:

$a_o = \frac{\mathrm{area}(B_p \cap B_{gt})}{\mathrm{area}(B_p \cup B_{gt})}$

A detection counts as correct when this ratio is greater than 50%, and DR is the fraction of ground-truth objects that are detected in this sense. In fact 50% is quite low, barely acceptable as a detection result, so it is no wonder that these algorithms (BING included) easily reach more than 95%.
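The overlap criterion and the two summary numbers can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are (x1, y1, x2, y2) in continuous coordinates (the VOC toolkit counts pixels inclusively, which is not reproduced here), the boxes and curve values are made up, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def detection_rate(ground_truth, proposals, threshold=0.5):
    """Fraction of ground-truth boxes covered by some proposal with IoU > threshold."""
    hits = sum(1 for gt in ground_truth
               if any(iou(gt, p) > threshold for p in proposals))
    return hits / len(ground_truth)

def normalized_auc(num_windows, rates):
    """Trapezoidal area under the DR-#WIN curve, with #WIN rescaled to [0, 1]."""
    lo, hi = num_windows[0], num_windows[-1]
    area = 0.0
    for i in range(1, len(num_windows)):
        dx = (num_windows[i] - num_windows[i - 1]) / (hi - lo)
        area += dx * (rates[i] + rates[i - 1]) / 2
    return area

# Hypothetical boxes: the first object is covered, the second is missed.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 8), (50, 50, 60, 60)]
print(iou(ground_truth[0], proposals[0]))
print(detection_rate(ground_truth, proposals))

# Hypothetical DR values at 0, 2500 and 5000 windows.
print(normalized_auc([0, 2500, 5000], [0.0, 0.9, 1.0]))
```

Because only one proposal per object has to pass the 50% overlap, DR rises quickly as the number of windows grows, which matches the remark above about how easily 95% is reached.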

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
