The R-CNN family of object detection algorithms studied previously first extracts candidate regions, then uses a classifier to identify each region and refine its position. This pipeline is complex, and it suffers from slow speed and difficult training.
The YOLO algorithm treats detection as a regression problem. A single neural network uses information from the whole image to predict the bounding boxes of targets and identify their categories, which gives end-to-end object detection, as shown in the figure. Compared with previous algorithms, YOLO has the following advantages:
1) YOLO is very fast. The YOLO pipeline is simple and fast enough for real-time detection.
2) YOLO uses full-image information to predict. Unlike sliding-window and region-proposal methods, YOLO sees the full image during both training and prediction. Fast R-CNN mistakes background patches for targets because it cannot see the global image at detection time. Compared with Fast R-CNN, YOLO cuts the background error rate by half.
3) YOLO learns generalizable representations of targets. When YOLO is trained on natural images and then tested on artwork, its accuracy is much higher than that of other object detection algorithms.
YOLO:
This method divides the image into an S x S grid. If the center of a target falls into a grid cell, that cell is responsible for detecting the target. Each grid cell predicts B bounding boxes and a confidence score for each box. I did not understand the confidence score when I first read the paper; my later understanding is this: for every bounding box it predicts, the YOLO model also predicts a confidence value, defined as Pr(Object) * IOU(pred, truth). This value reflects both how confident the model is that the box contains a target and how accurate it believes the box to be. When training the model, if the corresponding grid cell contains no target, we want the confidence to equal 0; otherwise, we want it to equal the IOU between the predicted box and the ground truth.
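The cell assignment and the confidence training target described above can be sketched as follows. This is a minimal illustration, not code from the paper: the function names, the pixel-coordinate inputs, and the edge clamping are assumptions made for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy are the target's center in pixels; the image is split into
    S x S equally sized cells. (Hypothetical helper for illustration.)
    """
    # Clamp so that a center lying exactly on the right or bottom edge
    # still maps to the last cell instead of index S.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def confidence_target(cell_has_object, iou_with_truth):
    """Training target for a box's confidence value.

    0 when the cell contains no target, otherwise the IOU between the
    predicted box and the ground truth.
    """
    return iou_with_truth if cell_has_object else 0.0
```

For a 448 x 448 image and S = 7, a target centered at (224, 224) falls in cell (3, 3), the middle of the grid.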
Each bounding box consists of 5 values: x, y, w, h, and confidence. (x, y) is the center of the bounding box, (w, h) is its width and height, and confidence is the IOU between the predicted box and the ground-truth box. Note that this value is predicted by the network rather than actually calculated.
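Since IOU appears throughout, here is a minimal sketch of how it can be computed for two boxes in the (x, y, w, h) center format used above. The function name and the assumption that both boxes share one coordinate system are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h),
    where (x, y) is the box center and (w, h) its width and height."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Convert center format to corner coordinates.
    a_x1, a_y1, a_x2, a_y2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    b_x1, b_y1, b_x2, b_y2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2

    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(a_x2, b_x2) - max(a_x1, b_x1))
    inter_h = max(0.0, min(a_y2, b_y2) - max(a_y1, b_y1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes give an IOU of 1, and two boxes that do not overlap give 0.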
Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).
At test time, we multiply a bounding box's confidence by the conditional class probabilities to get class-specific confidence scores. Each score represents both the probability that the class appears in the bounding box and how well the bounding box fits the target.
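The multiplication at test time can be sketched in a few lines. The function names and the use of plain Python lists are assumptions for illustration.

```python
def class_scores(box_confidence, class_probs):
    """Class-specific confidence scores for one bounding box.

    box_confidence: the confidence value predicted for the box.
    class_probs: the C conditional class probabilities of its grid cell.
    """
    return [box_confidence * p for p in class_probs]


def best_class(box_confidence, class_probs):
    """Return (class_index, score) of the highest-scoring class."""
    scores = class_scores(box_confidence, class_probs)
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]
```

For example, a box with confidence 0.5 in a cell whose class probabilities are [0.2, 0.8] gets scores of about 0.1 and 0.4, so the second class wins.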
The YOLO algorithm uses a network structure somewhat similar to GoogLeNet, with 24 convolutional layers and 2 fully connected layers, as shown in the following figure. The convolutional layers are pre-trained on the ImageNet 1000-class data set. The pre-training phase uses the first 20 convolutional layers in the diagram, followed by an average-pooling layer and a fully connected layer. To turn the model into a detection model, the authors then add 4 convolutional layers and 2 fully connected layers to the pre-trained model and raise the input resolution from 224 x 224 to 448 x 448. The final layer outputs both the class probabilities and the bounding boxes.
For evaluating YOLO on PASCAL VOC, we use S = 7 and B = 2. PASCAL VOC has 20 labelled classes, so C = 20. Our final prediction is a 7 x 7 x 30 tensor.
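The size 30 follows from the quantities defined earlier: each cell predicts B boxes of 5 values each plus C class probabilities. A small sketch of that arithmetic (the function name is made up for this example):

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: S x S cells, each holding
    B boxes * 5 values (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


# PASCAL VOC settings from the text: S = 7, B = 2, C = 20.
print(output_shape(7, 2, 20))  # (7, 7, 30)
```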
Each grid cell predicts multiple bounding boxes. In training, however, we want only one bounding box predictor to be responsible for each target, so we select the predictor whose box has the highest IOU with the ground-truth box. Training uses the following loss function:
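For reference, the sum-squared-error loss given in the original YOLO paper can be written as below. Here the indicator with superscript obj and indices i, j is 1 when predictor j in cell i is responsible for a target, the one with superscript noobj is its complement, C_i is the confidence, and p_i(c) is the class probability; hatted symbols are the corresponding target values. The weights lambda_coord = 5 and lambda_noobj = 0.5 are the paper's values and are not stated in the text above.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two terms penalize localization error, the next two penalize confidence error for boxes with and without a target, and the last term penalizes classification error in cells that contain a target.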
However, there are some drawbacks to the YOLO algorithm:
1) Each grid cell can predict only two bounding boxes and one class, which lowers the accuracy of detecting adjacent targets. As a result, YOLO's detection accuracy for groups of closely packed targets is low.
2) The loss function treats errors in small bounding boxes and large bounding boxes equally, which hurts detection accuracy, because a small error has a much greater impact on a small bounding box.
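The second drawback can be made concrete with a small worked example: the same 5-pixel localization error is applied to a large box and to a small box, and the resulting IOU is compared. The box sizes and the helper name are arbitrary choices for illustration.

```python
def iou_xywh(box_a, box_b):
    """IOU of two boxes in (x_center, y_center, width, height) format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, computed from the box edges.
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


shift = 5  # identical horizontal error, in pixels, for both boxes

# 200 x 200 box predicted 5 pixels to the right of the ground truth.
large_iou = iou_xywh((100, 100, 200, 200), (100 + shift, 100, 200, 200))

# 20 x 20 box with the same 5-pixel error.
small_iou = iou_xywh((100, 100, 20, 20), (100 + shift, 100, 20, 20))

print(f"large box IOU: {large_iou:.3f}")  # large box IOU: 0.951
print(f"small box IOU: {small_iou:.3f}")  # small box IOU: 0.600
```

The squared coordinate error is identical in both cases, yet the IOU of the small box drops to 0.6 while the large box stays above 0.95.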