A target detection algorithm based on deep learning: YOLO

Last Update: 2018-07-26  Source: Internet

Author: User


The RCNN-series target detection algorithms studied previously first extract candidate regions, then use a classifier to recognize each region and refine its position. This kind of pipeline is complex and has shortcomings such as slow speed and difficult training.

The YOLO algorithm treats detection as a regression problem. A single neural network uses information from the whole image to predict the bounding boxes of targets and identify their categories, achieving end-to-end target detection, as shown in the figure. Compared with previous algorithms, YOLO has the following advantages:

1) Very fast. The YOLO pipeline is simple and fast enough for real-time detection.

2) YOLO uses full-image information for prediction. Unlike sliding-window and region-proposal methods, YOLO sees the entire image during both training and testing. Fast R-CNN mistakes background patches for targets because it cannot see the global context of the image at detection time. Compared with Fast R-CNN, YOLO cuts the background error rate roughly in half.

3) YOLO learns generalizable representations of targets. When YOLO is trained on natural images and then tested on artwork, its accuracy is much higher than that of other target detection algorithms.

YOLO:

This method divides the input image into an S×S grid. If the center of a target falls into a grid cell, that cell is responsible for detecting the target. Each grid cell predicts B bounding boxes and a confidence score for each of them. I did not understand the confidence score when I first read the paper; my understanding now is as follows. For each bounding box, the YOLO model also predicts a confidence value, defined as $\Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$. This value reflects both how likely it is that the box contains a target and how accurate the box is. During training, if the corresponding grid cell contains no target, the confidence target is 0; otherwise, the confidence target is the IOU between the predicted box and the ground-truth box.
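A minimal Python sketch of the two ideas in this paragraph: finding the grid cell responsible for a target, and computing the confidence value a box predictor is trained toward. It assumes boxes are given as pixel corners (x_min, y_min, x_max, y_max); the helper names `responsible_cell`, `iou` and `confidence_target` are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers illustrating grid-cell responsibility and the
# confidence target; boxes are (x_min, y_min, x_max, y_max) in pixels.

def responsible_cell(box, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell that contains the box center."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    # Clamp so a center lying exactly on the right/bottom border maps to the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pred_box, gt_box, cell_has_object):
    """0 if the cell holds no target, otherwise IOU(predicted box, ground truth)."""
    return iou(pred_box, gt_box) if cell_has_object else 0.0


gt = (100, 150, 300, 350)      # ground-truth box, center (200, 250)
pred = (120, 150, 320, 350)    # a prediction shifted 20 px to the right
print(responsible_cell(gt, 448, 448))     # (3, 3)
print(confidence_target(pred, gt, True))   # 36000 / 44000 ≈ 0.818
print(confidence_target(pred, gt, False))  # 0.0
```

With a 448×448 input and S = 7, each cell covers 64×64 pixels, so the center (200, 250) lands in row 3, column 3.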

Each bounding box consists of 5 values: x, y, w, h, and confidence. (x, y) is the center of the box relative to the bounds of its grid cell, and (w, h) are the width and height of the box relative to the whole image. Confidence represents the IOU between the predicted box and the ground-truth box; note that this value is predicted by the network rather than actually computed.
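To make the (x, y, w, h) parameterization concrete, here is a small sketch that converts a pixel-space box into cell-relative center offsets and image-relative sizes, and back. The function names `encode_box` and `decode_box` and the corner input format are assumptions for illustration, not code from the paper.

```python
# Hypothetical encode/decode helpers for the (x, y, w, h) parameterization:
# (x, y) = box center as an offset inside its grid cell, in [0, 1];
# (w, h) = box size as a fraction of the image size.

def encode_box(box, img_w, img_h, S=7):
    x_min, y_min, x_max, y_max = box
    gx = (x_min + x_max) / 2.0 / img_w * S   # center in grid units
    gy = (y_min + y_max) / 2.0 / img_h * S
    col, row = min(int(gx), S - 1), min(int(gy), S - 1)
    x, y = gx - col, gy - row                 # offset inside the responsible cell
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (row, col), (x, y, w, h)


def decode_box(row, col, xywh, img_w, img_h, S=7):
    x, y, w, h = xywh
    cx = (col + x) / S * img_w                # back to pixel coordinates
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


cell, xywh = encode_box((100, 150, 300, 350), 448, 448)
print(cell, xywh)                         # (3, 3), x ≈ 0.125, y ≈ 0.906
print(decode_box(*cell, xywh, 448, 448))  # recovers the original box up to float rounding
```

Keeping (x, y) bounded to a single cell and (w, h) normalized by the image keeps all four targets roughly in [0, 1], which suits a sum-squared regression loss.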

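Each grid cell also predicts C conditional class probabilities, $\Pr(\text{Class}_i \mid \text{Object})$. Only one set of class probabilities is predicted per cell, regardless of the number of boxes B.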

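At test time, multiplying each bounding box's confidence by the conditional class probabilities gives a class-specific confidence score for each box:

$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$

This score reflects both the probability that the class appears in the box and how well the box fits the target.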
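In array form, this test-time step is a single broadcast multiplication. The NumPy sketch below uses random placeholder values in place of real network outputs; the names `box_conf`, `class_prob` and `class_specific_scores` are illustrative assumptions.

```python
import numpy as np

S, B, C = 7, 2, 20
rng = np.random.default_rng(0)

# Random placeholder network outputs, for illustration only.
box_conf = rng.uniform(size=(S, S, B))     # Pr(Object) * IOU, one per box
class_prob = rng.uniform(size=(S, S, C))   # Pr(Class_i | Object), one set per cell
class_prob /= class_prob.sum(axis=-1, keepdims=True)


def class_specific_scores(box_conf, class_prob):
    """Broadcast (S, S, B, 1) * (S, S, 1, C) -> (S, S, B, C) class-specific scores."""
    return box_conf[..., None] * class_prob[:, :, None, :]


scores = class_specific_scores(box_conf, class_prob)
print(scores.shape)                  # (7, 7, 2, 20)
best_class = scores.argmax(axis=-1)  # most likely class for each box
```

Because the class probabilities in each cell sum to 1, the class-specific scores of a box sum back to that box's confidence; the two boxes in a cell share the same class distribution but keep their own confidence.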

The YOLO network architecture is somewhat similar to GoogLeNet, with 24 convolutional layers followed by 2 fully connected layers, as shown in the following figure. The convolutional layers are first pretrained on the ImageNet 1000-class dataset. Pretraining uses the first 20 convolutional layers in the figure, followed by an average-pooling layer and a fully connected layer. The model is then converted into a detection model: the author adds 4 convolutional layers and 2 fully connected layers to the pretrained network and increases the input resolution from 224×224 to 448×448. The last layer outputs both the class probabilities and the bounding-box coordinates.
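The layer counts and resolutions in this paragraph can be checked with a small table of layers. The list below is my transcription of the architecture figure in the YOLO v1 paper and should be treated as an assumption; it also assumes "same" padding, so that only stride-2 convolutions and 2×2 max-pools shrink the feature map. The names `BACKBONE_20`, `DETECTION_CONVS_4` and `trace` are hypothetical.

```python
# Layer list transcribed (as an assumption) from the architecture figure of the
# YOLO v1 paper. Each conv is (kernel, filters, stride); "M" is a 2x2, stride-2 max-pool.
M = "M"

BACKBONE_20 = (
    [(7, 64, 2), M, (3, 192, 1), M]
    + [(1, 128, 1), (3, 256, 1), (1, 256, 1), (3, 512, 1), M]
    + [(1, 256, 1), (3, 512, 1)] * 4 + [(1, 512, 1), (3, 1024, 1), M]
    + [(1, 512, 1), (3, 1024, 1)] * 2
)
DETECTION_CONVS_4 = [(3, 1024, 1), (3, 1024, 2), (3, 1024, 1), (3, 1024, 1)]
YOLO_V1 = BACKBONE_20 + DETECTION_CONVS_4


def trace(layers, size, channels=3):
    """Return (spatial size, channels) after the layers, assuming 'same' padding,
    so only strides and max-pools shrink the feature map."""
    for layer in layers:
        if layer == M:
            size //= 2
        else:
            _kernel, filters, stride = layer
            size //= stride
            channels = filters
    return size, channels


def num_convs(layers):
    return sum(1 for layer in layers if layer != M)


S, B, C = 7, 2, 20
print(num_convs(BACKBONE_20), num_convs(YOLO_V1))  # 20 24
print(trace(BACKBONE_20, 224))  # (7, 1024) at pretraining resolution
print(trace(YOLO_V1, 448))      # (7, 1024) at detection resolution
# The 2 fully connected layers then map 7*7*1024 -> 4096 -> S*S*(B*5 + C).
print(7 * 7 * 1024, 4096, S * S * (B * 5 + C))  # 50176 4096 1470
```

The extra stride-2 convolution in the added detection layers is what brings the 448×448 input down to the same 7×7 grid that the pretrained backbone produces at 224×224.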

To evaluate YOLO on PASCAL VOC, we use S = 7 and B = 2. PASCAL VOC has 20 labelled classes, so C = 20. Each cell outputs B × 5 + C = 2 × 5 + 20 = 30 values, so the final prediction is a 7 × 7 × 30 tensor.
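A short sketch of how the 7 × 7 × 30 output can be split into box predictions and class probabilities. The per-cell layout used here (B blocks of x, y, w, h, confidence followed by C class probabilities) is an assumption made for readability; real implementations may order the 1470 values differently.

```python
import numpy as np

S, B, C = 7, 2, 20
DEPTH = B * 5 + C   # 2 * 5 + 20 = 30 values per cell


def split_prediction(pred, S=S, B=B, C=C):
    """Split an (S, S, B*5 + C) prediction into box and class parts.
    Assumed per-cell layout: B blocks of (x, y, w, h, confidence), then C class
    probabilities. Real implementations may order the values differently."""
    if pred.shape != (S, S, B * 5 + C):
        raise ValueError(f"expected {(S, S, B * 5 + C)}, got {pred.shape}")
    boxes = pred[..., : B * 5].reshape(S, S, B, 5)
    class_probs = pred[..., B * 5 :]
    return boxes, class_probs


# The final fully connected layer emits a flat vector of S*S*DEPTH = 1470 values.
flat = np.arange(S * S * DEPTH, dtype=float)
boxes, class_probs = split_prediction(flat.reshape(S, S, DEPTH))
print(boxes.shape, class_probs.shape)  # (7, 7, 2, 5) (7, 7, 20)
```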

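Each grid cell predicts multiple bounding boxes. During training, however, we want only one bounding-box predictor to be responsible for each target, so the predictor whose box currently has the highest IOU with the ground-truth box is chosen. Training uses the following sum-squared loss function:

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
+ \lambda_{\text{noobj}} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=1}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$

Here $\mathbb{1}_{ij}^{\text{obj}}$ indicates that the j-th box predictor in cell i is responsible for a target, $\mathbb{1}_{ij}^{\text{noobj}}$ indicates that it is not, $\mathbb{1}_{i}^{\text{obj}}$ indicates that a target appears in cell i, $C_i$ denotes the confidence (not the number of classes), and the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$.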
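A minimal NumPy sketch of this loss for a single image, with no gradients, meant only to show how the terms and the "responsible predictor" selection fit together. Assumptions beyond the text: the per-cell layout of the prediction tensor (B blocks of x, y, w, h, confidence, then C class probabilities), targets given as normalized (cx, cy, w, h, class_id), non-negative predicted w and h, and at most one target per grid cell. The function name `yolo_v1_loss` is hypothetical.

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5   # weights reported in the YOLO paper


def _iou_cxcywh(a, b):
    """IOU of two boxes given as (center_x, center_y, w, h)."""
    iw = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    ih = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def yolo_v1_loss(pred, targets, S=7, B=2, C=20):
    """Sum-squared YOLO v1 style loss for one image (illustration only).

    pred: (S, S, B*5 + C); per cell, B blocks of (x, y, w, h, conf) then C class
          probabilities (assumed layout). x, y are offsets inside the cell;
          w, h are fractions of the image and assumed non-negative.
    targets: list of (cx, cy, w, h, class_id), coordinates normalized to [0, 1].
             Assumes at most one target per grid cell.
    """
    boxes = pred[..., : B * 5].reshape(S, S, B, 5)
    probs = pred[..., B * 5 :]
    responsible = np.zeros((S, S, B), dtype=bool)
    loss = 0.0
    for cx, cy, w, h, cls in targets:
        col, row = min(int(cx * S), S - 1), min(int(cy * S), S - 1)
        tx, ty = cx * S - col, cy * S - row          # target offset inside the cell
        # Responsible predictor: the box with the highest current IOU with the ground truth.
        ious = [
            _iou_cxcywh(((col + b[0]) / S, (row + b[1]) / S, b[2], b[3]), (cx, cy, w, h))
            for b in boxes[row, col]
        ]
        j = int(np.argmax(ious))
        responsible[row, col, j] = True
        px, py, pw, ph, pc = boxes[row, col, j]
        loss += LAMBDA_COORD * ((px - tx) ** 2 + (py - ty) ** 2)
        loss += LAMBDA_COORD * ((np.sqrt(pw) - np.sqrt(w)) ** 2 + (np.sqrt(ph) - np.sqrt(h)) ** 2)
        loss += (pc - ious[j]) ** 2                  # confidence target = IOU
        one_hot = np.zeros(C)
        one_hot[cls] = 1.0
        loss += np.sum((probs[row, col] - one_hot) ** 2)
    # Every predictor not responsible for a target is pushed toward confidence 0.
    loss += LAMBDA_NOOBJ * np.sum(boxes[..., 4][~responsible] ** 2)
    return float(loss)


# Usage: a perfect prediction for one target, plus one stray confident box elsewhere.
pred = np.zeros((7, 7, 30))
pred[3, 3, 0:5] = [0.5, 0.5, 0.2, 0.4, 1.0]   # box predictor 0 in cell (3, 3)
pred[3, 3, 10 + 3] = 1.0                      # class 3 probability
pred[0, 0, 4] = 0.4                           # stray confidence in a cell with no target
print(yolo_v1_loss(pred, [(0.5, 0.5, 0.2, 0.4, 3)]))  # ≈ 0.5 * 0.4**2 = 0.08
```

The small λ_noobj keeps the many empty cells from overwhelming the gradient from the few cells that contain targets, which is the paper's stated reason for the weighting.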

However, there are some drawbacks to the YOLO algorithm:

1) Each grid cell predicts only two bounding boxes and can have only one class, which limits the number of nearby targets the model can detect. As a result, YOLO's detection accuracy is low for small targets that appear in groups, such as flocks of birds.

2) The loss function treats errors in small bounding boxes the same as errors in large ones, which hurts localization accuracy: a small error in a large box is generally harmless, but the same error in a small box has a much greater effect on IOU.
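Both drawbacks can be illustrated with a few lines of Python. The sketch below (hypothetical helper names, made-up example coordinates) counts grid cells that receive more than one target center, and compares the IOU lost when a small and a large box are each shifted by the same 2 pixels.

```python
from collections import Counter


def iou_xyxy(a, b):
    """IOU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def shifted_iou(size, shift):
    """IOU between a size x size box and the same box shifted right by `shift` px."""
    return iou_xyxy((0, 0, size, size), (shift, 0, size + shift, size))


def crowded_cells(centers, img_w, img_h, S=7):
    """Grid cells that contain more than one target center (drawback 1)."""
    cells = Counter(
        (min(int(cy / img_h * S), S - 1), min(int(cx / img_w * S), S - 1))
        for cx, cy in centers
    )
    return {cell: n for cell, n in cells.items() if n > 1}


# Drawback 1: three nearby "birds" fall into one cell of a 448 x 448 image.
print(crowded_cells([(200, 210), (215, 230), (225, 205), (50, 400)], 448, 448))  # {(3, 3): 3}

# Drawback 2: the same 2-pixel error costs far more IOU on a small box.
print(shifted_iou(10, 2))   # 80 / 120 ≈ 0.667
print(shifted_iou(100, 2))  # 9800 / 10200 ≈ 0.961
```

With only one class and two boxes per cell, at least one of the three targets in cell (3, 3) cannot be detected; and a raw coordinate error that drops IOU to about 0.667 on a 10-pixel box barely affects a 100-pixel box.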

This article is an English version of an article originally published in Chinese on aliyun.com.
