How to improve object detection accuracy with Soft-NMS

Last Update:2018-07-26 Source: Internet

Author: User

Reposted from: Global AI http://www.sohu.com/a/135469270_642762

Paper Address: https://arxiv.org/pdf/1704.04503.pdf

GitHub Project: https://github.com/bharatsingh430/soft-nms

Improve object detection accuracy with one line of code

Paper abstract

Non-maximum suppression (NMS) is an important part of the object detection pipeline. It first sorts all detection boxes by their scores. The detection box M with the highest score is selected, and all other detection boxes that overlap significantly with M (above a predefined threshold) are suppressed. This process is applied recursively to the remaining boxes. By design, if an object lies within the predefined overlap threshold of M, it is missed. We therefore propose Soft-NMS, an algorithm that decays the detection scores of non-maximum boxes rather than removing them completely. It requires only a simple change to the traditional NMS algorithm and adds no extra parameters. Soft-NMS improves results on the standard datasets PASCAL VOC 2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN). In addition, Soft-NMS has the same computational complexity as traditional NMS and is efficient to use. Because it requires no additional training and is simple to implement, it can be easily integrated into any object detection pipeline. The Soft-NMS source code is available on GitHub: http://bit.ly/2nJLNMu.

Introduction to NMS Algorithms

Object detection is a classic problem in computer vision: the detector produces bounding boxes for specific object categories and assigns each box a classification score. Traditional object detection pipelines often use multi-scale sliding windows and assign each window a foreground/background score per object category, based on features computed in that window. However, adjacent windows tend to have correlated scores, which increases the number of false positives. To avoid this problem, non-maximum suppression is applied as a post-processing step to obtain the final detections. To this day, non-maximum suppression remains a popular post-processing algorithm for object detection and effectively reduces false positives.

In existing object detection frameworks (shown in Figure 1), every detection box receives a detection score, so a single object in the image may correspond to multiple detection boxes. In this case, all boxes except the most accurate one (the one with the highest detection score) count as false positives. The non-maximum suppression algorithm addresses this problem by applying an overlap threshold separately for each object category.

Figure 1: The object detection process using NMS

The traditional non-maximum suppression algorithm starts with a set of detection boxes B and their corresponding scores S for the image. The detection box M with the maximum score is selected, removed from set B, and placed in the final detection set D. At the same time, any box in set B whose overlap with M is greater than the overlap threshold N_t is removed. The process is then repeated on the remaining boxes. The biggest problem with non-maximum suppression is that it forces the scores of neighboring detection boxes to zero. If a real object is present in the overlapping region, it will not be detected, which reduces the average precision (AP) of the algorithm.
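The greedy procedure described above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' code: the function names `box_iou` and `greedy_nms`, the `[x1, y1, x2, y2]` box layout with continuous coordinates and positive areas, and the default `nt=0.3` are all assumptions made for the example.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, each [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Width and height of the intersection are clipped at zero for disjoint boxes.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, nt=0.3):
    """Return the indices of the kept boxes (set D), highest score first."""
    remaining = np.argsort(-scores)   # set B, sorted by descending score
    keep = []                         # set D
    while remaining.size > 0:
        m = remaining[0]              # box M with the maximum score
        keep.append(int(m))
        rest = remaining[1:]
        overlaps = box_iou(boxes[m], boxes[rest])
        # Hard pruning: every box whose overlap with M reaches nt is dropped.
        remaining = rest[overlaps < nt]
    return keep
```

With two heavily overlapping boxes and one distant box, the lower-scoring member of the overlapping pair is discarded outright at `nt=0.3`, which is exactly the behavior that causes missed detections when both boxes cover real objects.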

Alternatively, we can simply lower the scores of neighboring detection boxes with a function of their overlap with M rather than culling them outright. Although its score is reduced, a neighboring box then remains in the ranked list of detections. The example in Figure 2 illustrates this problem.

Figure 2: When objects overlap heavily, the occluded object receives different detection scores under different detection algorithms.

Soft-NMS improves the average precision of object detection

To address this problem with NMS, we propose a new Soft-NMS algorithm (Figure 3), which improves on the traditional greedy NMS algorithm by changing a single line of code. In this algorithm, the score of a neighboring detection box is decayed by a function of its overlap with M rather than being set to zero. Simply put, if a detection box overlaps heavily with M, it receives a very low score, and if it overlaps only slightly with M, its original detection score is barely affected. On the standard datasets PASCAL VOC and MS-COCO, Soft-NMS significantly improves the average precision of existing object detection algorithms, particularly when multiple objects overlap. At the same time, Soft-NMS requires no additional training and is simple to implement, so it is easy to integrate into current object detection pipelines.

Figure 3: Soft-NMS pseudocode. Replacing the NMS step (red box) with the Soft-NMS step (green box) is the only change required.

The pruning step of traditional NMS can be expressed as the following rescoring function:
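Written out in the notation of the Soft-NMS paper linked above, where s_i is the score of detection box b_i, M is the currently selected maximum-score box, and N_t is the overlap threshold, the hard-threshold rule is:

```latex
s_i =
\begin{cases}
s_i, & \operatorname{iou}(M, b_i) < N_t \\
0,   & \operatorname{iou}(M, b_i) \ge N_t
\end{cases}
```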

In this formula, NMS uses a hard threshold to decide whether a neighboring detection box is kept. Suppose instead that we decay the detection score of a box b_i that overlaps heavily with M rather than suppressing it completely. If b_i contains an object different from the one in M, that object is not missed as long as the detection threshold is low enough. However, if b_i contains no object and its score is still high after the decay, it produces a false positive. Therefore, the following points need to be taken into account when using NMS as a post-processing step for object detection:

The scores of neighboring detection boxes should be lowered enough that they are less likely to add false positives, while still ranking above obvious false positives.

Removing all neighboring detection boxes with a low NMS overlap threshold is not optimal and easily leads to missed detections, especially where objects overlap heavily.

When NMS uses a high overlap threshold, the average precision may drop accordingly.

Rescoring functions in Soft-NMS

Decaying the detection scores of the boxes that overlap with M is an effective improvement to the NMS algorithm. The more heavily a box overlaps with M, the more likely it is to be a false positive, and the more strongly its score should be decayed. We therefore make the following change to the original NMS rescoring function:
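In the same notation as the paper (s_i for the score of box b_i, M for the selected box, N_t for the overlap threshold), the linear rescoring rule leaves scores below the threshold untouched and scales the others by one minus the overlap:

```latex
s_i =
\begin{cases}
s_i, & \operatorname{iou}(M, b_i) < N_t \\
s_i \left(1 - \operatorname{iou}(M, b_i)\right), & \operatorname{iou}(M, b_i) \ge N_t
\end{cases}
```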

When the overlap between a neighboring detection box and M exceeds the overlap threshold N_t, the score of that box is decayed linearly with the overlap. As a result, boxes close to M are decayed strongly, while boxes far from M are not affected.

However, this rescoring function is not continuous: it changes abruptly once the overlap reaches the threshold N_t, which may cause large changes in the ranked list of detections. We would therefore prefer a continuous rescoring function that applies no decay to boxes that do not overlap with M and a large decay to boxes that overlap heavily. Taking these factors into account, we further improve the Soft-NMS rescoring function with a Gaussian weighting:
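As given in the paper, the continuous rule uses a Gaussian penalty with a width parameter σ and is applied to every box b_i that has not yet been moved to the final set D:

```latex
s_i = s_i \, e^{-\frac{\operatorname{iou}(M, b_i)^2}{\sigma}}, \qquad \forall\, b_i \notin D
```

With zero overlap the exponent is zero and the score is unchanged, while the penalty grows smoothly as the overlap approaches 1.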

In the Soft-NMS algorithm (Figure 3), f(iou(M, b_i)) is a weighting function based on the overlap between detection boxes. Each step of the algorithm has complexity O(N), where N is the number of detection boxes in the image, because the scores of all boxes that overlap with M are updated. For N detection boxes, the overall complexity of Soft-NMS is therefore O(N^2), the same as traditional greedy NMS. Because boxes whose scores fall below a minimum threshold are pruned directly, Soft-NMS does not need to operate on all detection boxes, so the computation is inexpensive and does not slow down current detectors.
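The loop described above can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' Cython implementation: the names `box_iou` and `soft_nms`, the string-valued `method` argument, and the `[x1, y1, x2, y2]` layout with continuous coordinates are hypothetical choices, and the linear branch follows the paper's rule (decay when the overlap reaches `nt`). The defaults `sigma=0.5`, `nt=0.3`, and `threshold=0.001` mirror the signature of the author's `cpu_soft_nms`.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, each [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, nt=0.3, threshold=0.001, method="gaussian"):
    """Return (kept indices, rescored values) in the order boxes were selected."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()  # do not modify the caller's scores
    idx = np.arange(len(scores))                     # boxes still in set B
    keep, kept_scores = [], []
    while idx.size > 0:
        best = np.argmax(scores[idx])                # O(N) search for box M
        m = idx[best]
        keep.append(int(m))
        kept_scores.append(float(scores[m]))
        idx = np.delete(idx, best)
        ov = box_iou(boxes[m], boxes[idx])           # O(N) overlap computation
        if method == "linear":
            weight = np.where(ov >= nt, 1.0 - ov, 1.0)
        elif method == "gaussian":
            weight = np.exp(-(ov ** 2) / sigma)
        else:                                        # traditional hard NMS
            weight = np.where(ov >= nt, 0.0, 1.0)
        scores[idx] *= weight                        # the "one line" that differs from NMS
        idx = idx[scores[idx] >= threshold]          # prune boxes with negligible scores
    return keep, kept_scores
```

Each pass over the loop does O(N) work and at most N boxes are selected, which gives the O(N^2) bound. The final detection threshold is applied afterwards to the rescored values, so a decayed box survives only if its new score is still high enough.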

It is worth noting that Soft-NMS is also a greedy algorithm and is not guaranteed to find the globally optimal rescoring of the detection boxes. However, Soft-NMS is a more general form of non-maximum suppression, and traditional NMS can be regarded as a special case of it with a discontinuous binary weighting function. Besides these two rescoring functions, other functions with more parameters, such as the Gompertz function, could also be explored. However, such functions would introduce additional parameters into the rescoring step.
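To make the "special case" relationship concrete, the three weighting functions can be written side by side as scalar functions of the overlap. This is a small sketch: the function names are hypothetical, and the defaults `nt=0.3` and `sigma=0.5` are simply the values used in the experiments.

```python
import math

def hard_weight(ov, nt=0.3):
    """Traditional NMS: a binary weight that drops to 0 once the overlap reaches nt."""
    return 0.0 if ov >= nt else 1.0

def linear_weight(ov, nt=0.3):
    """Linear Soft-NMS: weight 1 - ov once the overlap reaches nt, still discontinuous at nt."""
    return 1.0 - ov if ov >= nt else 1.0

def gaussian_weight(ov, sigma=0.5):
    """Gaussian Soft-NMS: continuous decay, exactly 1 when there is no overlap."""
    return math.exp(-(ov * ov) / sigma)

# Print the multiplier each rule applies to a neighboring box's score.
for ov in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"iou={ov:.1f}  hard={hard_weight(ov):.2f}  "
          f"linear={linear_weight(ov):.2f}  gaussian={gaussian_weight(ov):.2f}")
```

The hard and linear rules both jump at `nt`, whereas the Gaussian rule changes smoothly over the whole range of overlaps.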

Experimental data analysis

We ran experiments on two standard datasets, PASCAL VOC and MS-COCO. The PASCAL dataset has 20 object categories and the MS-COCO dataset has 80. We use the VOC 2007 test set to measure performance. The sensitivity analysis is carried out on a set of 5,000 MS-COCO images. In addition, we report results on an MS-COCO set of 20,288 images. To evaluate our algorithm, we ran the experiments with two existing detectors, Faster-RCNN and R-FCN.

In Table 1, we use the MS-COCO dataset to compare traditional NMS and Soft-NMS on the R-FCN and Faster-RCNN detectors. We set N_t to 0.3 for the linear weighting function and σ to 0.5 for the Gaussian weighting function. Soft-NMS clearly improves performance in all of these settings, especially when multiple objects overlap. For example, Soft-NMS increases the average precision of R-FCN and Faster-RCNN by 1.3% and 1.1% respectively, a significant improvement on the MS-COCO dataset. It is worth emphasizing that this gain requires only a minor change to the original NMS algorithm. We ran the same experiment on the PASCAL dataset, and Table 2 shows that Soft-NMS increases the average precision of both Faster-RCNN and R-FCN by 1.7%. In all subsequent experiments, we use Soft-NMS with the Gaussian weighting function.

The experiment in Figure 4 shows that Soft-NMS improves the precision of R-FCN on every object category in the MS-COCO dataset. Animal categories such as zebra, giraffe, sheep, elephant, and horse gain 3% to 6% in precision. By contrast, for categories such as toaster, sports ball, and hair dryer, where several instances rarely appear in the same image, the gain in average precision is small. Overall, Soft-NMS effectively improves detection performance without affecting running speed.

Figure 4: Per-class improvement in precision with Soft-NMS for R-FCN (left) and Faster-RCNN (right)

Sensitivity analysis

Soft-NMS has a parameter σ that needs to be set, and traditional NMS has a parameter N_t. To analyze the sensitivity to these parameters, we vary their values and observe the change in average precision on the MS-COCO data. As shown in Figure 5, for both detectors the average precision (AP) is stable for parameter values between 0.3 and 0.6 and drops significantly outside that range. Soft-NMS outperforms traditional NMS for parameter values from 0.1 to 0.7. For values between 0.4 and 0.7, the average precision of both detectors with Soft-NMS is approximately 1% higher than with traditional NMS. Although Soft-NMS performs better at σ = 0.6, we set σ to 0.5 to keep the experiments consistent.

Figure 5: Sensitivity of R-FCN to the parameters σ (Soft-NMS) and N_t (NMS)

Soft-NMS is more accurate than traditional NMS

Localization performance: Average precision alone does not clearly show where Soft-NMS brings significant gains in detection performance. We therefore report the average precision of traditional NMS and Soft-NMS measured at different overlap thresholds (O_t). We also vary the parameters of NMS and Soft-NMS in this experiment to gain a deeper understanding of the two algorithms. Table 3 shows that average precision decreases as the NMS overlap threshold N_t increases. Although a high N_t performs relatively well at high O_t, it causes a significant drop in AP at low O_t. Soft-NMS behaves differently: parameter settings that perform well at high O_t still perform well at low O_t. Across parameter settings, Soft-NMS achieves better performance than traditional NMS, and the improvement is larger at high O_t. Compared to traditional NMS, Soft-NMS therefore gives better localization in object detection.

Precision vs. recall for Soft-NMS and NMS

Finally, we look at how much Soft-NMS improves on NMS at different overlap thresholds. As the overlap threshold and the recall increase, Soft-NMS yields a larger gain in precision. This is because traditional NMS sets the score of every box that overlaps too much with M to zero, so it misses many objects and precision is lower at high recall. Soft-NMS rescores neighboring boxes instead of suppressing them completely, which improves precision at high recall. Moreover, because NMS, which completely suppresses neighboring boxes, is more likely to miss objects at higher overlap thresholds, Soft-NMS also improves detection performance at low recall.

Figure 6: Precision vs. recall at different overlap thresholds (O_t)

Qualitative analysis

In Figure 7, we present a qualitative analysis on images from the COCO validation set. We use R-FCN to detect the objects in the images, with a detection threshold of 0.45. Soft-NMS improves the results markedly when a false positive has only a small overlap with a true detection. In image No. 8, for example, a wide detection box that covers multiple people is kept by NMS but effectively suppressed by Soft-NMS: because it overlaps slightly with several higher-scoring boxes in the image, its score is decayed by the rescoring function. The same situation appears in image No. 9. In the beach scene in image No. 1, Soft-NMS decays the score of the larger box around the woman's handbag to below 0.45, and the false positive in image No. 4 is also suppressed. In the animal images No. 2, 5, 7, and 13, NMS over-suppresses neighboring boxes, whereas Soft-NMS decays their scores and thus detects more correct objects above the 0.45 threshold.

Figure 7: Qualitative results. In each pair, the left image uses NMS and the right image uses Soft-NMS. Images above the blue line are successful cases and those below it are failure cases. The detected category is person in image No. 14, bench in No. 15, and potted plant in No. 21.

Conclusion: Soft-NMS is more effective for object detection

This paper proposes a new soft-weighting non-maximum suppression algorithm. It works by rescoring detection boxes with a function of their overlap and detection score. Building on the traditional greedy NMS algorithm, two rescoring functions are proposed and validated on two existing detection datasets. The analysis shows that a soft weighting function based on box overlap and detection score effectively improves object detection precision. Future work could explore learning more complex parametric or non-parametric functions. In addition, an end-to-end learning framework for object detection would be the ideal solution, since it would not need to consider factors such as non-maximum suppression, detection scores, and box positions when generating detection boxes.

Here is the beginning of the code given by the author (of course, it is more than one line T_T); the complete version is in the GitHub project linked above:

def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0):
    cdef unsigned int N = boxes.shape[0]
    cdef float iw, ih, box_area
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov

    for i in range(N):
        maxscore = boxes[i, 4]
        maxpos = i

        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # get max box
        while pos < N:
            if maxscore < boxes[pos, 4]:
                maxscore = boxes[pos, 4]
                maxpos = pos
            pos = pos + 1

        # add max box as a detection
        boxes[i,0] = boxes[maxpos,0]
        boxes[i,1] = boxes[maxpos,1]
        boxes[i,2] = boxes[maxpos,2]
        boxes[i,3] = boxes[maxpos,3]
        boxes[i,4] = boxes[maxpos,4]

        # swap ith box with position of max box
        boxes[maxpos,0] = tx1
        boxes[maxpos,1] = ty1
        boxes[maxpos,2] = t
