These notes cover the third week of the convolutional neural networks course: Object Detection (1) — basic object detection algorithms
The main contents are:
1. Object localization
2. Feature point detection
3. Object detection
Object localization
The algorithm must decide whether the image contains the target object and, if so, also mark the object's position in the picture with a bounding box.
Among the problems studied here, ideas from image classification help with classification and localization, and ideas from classification and localization in turn help with object detection.
A. In the classification-with-localization problem, there is usually one relatively large object near the middle of the picture.
B. In the object detection problem, a picture can contain multiple objects, possibly of several different categories.
How image classification helps classification with localization:
Start from an ordinary image classification network:
For classification with localization, bounding-box outputs bx, by, bh, bw and a classification label (c1, c2, c3) are added to the fully connected layer of the image classification network.
Defining the target label y
The label y is:
y = (pc, bx, by, bh, bw, c1, c2, c3)
Element meanings:
A. Element pc: the probability that the picture contains one of the object classes we want to detect. In the video's example the classes are car, motorcycle, pedestrian, and background; only the first three are objects of interest, so pc = 1 if the picture contains a car, motorcycle, or pedestrian, and pc = 0 if it contains only background.
B. Elements bx, by: the coordinates (bx, by) of the center of the bounding box. The upper-left corner of the picture is (0, 0) and the lower-right corner is (1, 1).
C. Elements bh, bw: the size of the bounding box; bw is its width and bh its height.
D. Elements c1, c2, c3, ..., cn: the category labels, where n is the actual number of categories; exactly one of c1, ..., cn is 1. In the video only cars, motorcycles, and pedestrians are of interest, so n is 3.
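As a NumPy sketch of this label layout (the helper name `make_label` and the class ordering car/motorcycle/pedestrian are assumptions for illustration, not from the lecture):

```python
import numpy as np

# Label layout: y = [pc, bx, by, bh, bw, c1, c2, c3]
# Assumed class ordering: 0 = car, 1 = motorcycle, 2 = pedestrian
def make_label(contains_object, bx=0.0, by=0.0, bh=0.0, bw=0.0, class_id=None):
    y = np.zeros(8)
    if contains_object:
        y[0] = 1.0                 # pc = 1: an object of interest is present
        y[1:5] = [bx, by, bh, bw]  # box center and size, all in [0, 1]
        y[5 + class_id] = 1.0      # one-hot category label
    # when pc = 0, the remaining elements are "don't care"
    return y

# A car centered at (0.5, 0.7), with a box 0.4 high and 0.3 wide:
y_car = make_label(True, bx=0.5, by=0.7, bh=0.4, bw=0.3, class_id=0)
# A picture containing only background:
y_background = make_label(False)
```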
Loss function:
When pc is 1, the loss equals the sum of the squared differences of all corresponding elements of the prediction and the label.
When pc is 0, only the accuracy of the network's pc output matters, so the loss is just the squared difference of the first element (y1 is pc).
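A minimal NumPy sketch of this piecewise squared-error loss (the function name is an assumption; the label follows the layout y = [pc, bx, by, bh, bw, c1, c2, c3] defined above, and the concrete numbers are made up for illustration):

```python
import numpy as np

def detection_loss(y_hat, y):
    """Squared-error loss as described above:
    if pc (= y[0]) is 1, every element is compared;
    if pc is 0, only the pc element is compared."""
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)

# Made-up numbers for illustration.
y_true = np.array([1.0, 0.5, 0.7, 0.4, 0.3, 1.0, 0.0, 0.0])
y_pred = np.array([0.9, 0.5, 0.7, 0.4, 0.3, 0.8, 0.1, 0.1])
loss = detection_loss(y_pred, y_true)  # 0.01 + 0.04 + 0.01 + 0.01, approx 0.07
```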
Feature Point Detection
A neural network can recognize target features by outputting the (x, y) coordinates of feature points on the image.
To build such a network, choose the number of feature points and generate a training set labeled with them: the images X and the labels y (the ordering of the feature points must be hard-coded consistently across all examples). Then train the neural network to output the locations of the feature points in the image.
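For the face example from the video with 64 landmarks, the label vector has 1 + 2·64 = 129 units. A sketch of that layout (the helper name is an assumption):

```python
import numpy as np

NUM_LANDMARKS = 64  # the face example from the lecture uses 64 feature points

def make_landmark_label(is_face, points=None):
    # Output layout: [is_face, x1, y1, x2, y2, ..., x64, y64]
    y = np.zeros(1 + 2 * NUM_LANDMARKS)
    if is_face:
        y[0] = 1.0
        # points has shape (64, 2); the landmark ordering must follow the
        # same hard-coded order for every training example.
        y[1:] = np.asarray(points).reshape(-1)
    return y

points = np.random.rand(NUM_LANDMARKS, 2)  # coordinates in [0, 1]
y = make_landmark_label(True, points)
```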
Target detection
An object detection algorithm built on sliding windows
Step 1: Create a labeled training set from closely cropped picture samples, so that the detected object is centered and fills almost the whole picture.
Step 2: Train a convolutional network on these crops; this network is then used for sliding-window detection.
Step 3: Sliding-window detection
Choose a window of a specific size and slide it with a fixed stride across every region of the image. Feed each cropped region into the convolutional network trained above, which outputs a 0/1 classification for each position (whether the crop contains the object to be detected). Then pick a larger window and repeat the process.
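The steps above can be sketched in plain NumPy (the trivial brightness-test classifier here merely stands in for the trained ConvNet):

```python
import numpy as np

def sliding_window_detect(image, classifier, window, stride):
    """Slide a (window x window) crop across the image with a fixed stride
    and run the classifier on every crop, collecting positive positions.
    `classifier` stands in for the ConvNet trained on cropped samples."""
    detections = []
    h, w = image.shape[:2]
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crop = image[top:top + window, left:left + window]
            if classifier(crop) == 1:
                detections.append((top, left, window))
    return detections

# Toy example: the "object" is a bright square; the stand-in classifier
# outputs 1 when a crop's mean brightness exceeds 0.5.
image = np.zeros((8, 8))
image[4:8, 4:8] = 1.0
found = sliding_window_detect(image, lambda c: int(c.mean() > 0.5),
                              window=4, stride=2)
```

In practice this per-crop loop is the algorithm's weakness: every window requires a full forward pass, which the convolutional implementation below removes.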
Convolutional implementation of sliding windows (an improvement on the algorithm above)
Convert the fully connected layer into a convolution layer:
Principle: as the video explains, from a mathematical point of view a fully connected layer can be converted into a convolutional layer: each of the 400 nodes has a filter of dimension 5x5x16, and its value is a linear function of the previous layer's 5x5x16 activations.
Full Connectivity Model:
Converted to convolutional layers: the first FC layer is implemented with 400 filters of size 5x5x16 (producing a 1x1x400 volume), and the next FC layer with 400 1x1 filters (the final 1x1 filters feed a softmax, just as the last FC layer does).
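This FC-to-conv equivalence can be checked numerically: one FC node over a 5x5x16 activation volume and one 5x5x16 convolution filter applied at its single valid position compute the same linear function. A sketch with random weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Previous layer's activations: 5x5 spatial, 16 channels.
a = rng.standard_normal((5, 5, 16))

# One node of the FC layer: one weight per input, 5*5*16 = 400 weights.
w_fc = rng.standard_normal(5 * 5 * 16)
fc_out = float(w_fc @ a.reshape(-1))

# The same node as a convolution: a single 5x5x16 filter applied at the
# only valid position, i.e. an elementwise product followed by a sum.
w_conv = w_fc.reshape(5, 5, 16)
conv_out = float(np.sum(w_conv * a))
```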
Implementing the sliding-window object detection algorithm with convolution:
For a single convolution implementation process:
Convolutional sliding-window implementation: the sliding window can be treated as a filter of the convolutional network, with the sliding stride as the filter's stride. There is no need to crop the input image into separate windows; instead the whole picture goes through the convolutional network in a single forward pass, and the overlapping regions share much of the computation.
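A small numerical sketch of why the outputs match (a single linear 4x4 filter stands in here for the entire per-window network, which is a simplifying assumption): cropping each window and evaluating it separately produces exactly the grid of values that one strided pass over the full image produces at once.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((6, 6))

# Stand-in for the per-window network: a single linear 4x4 filter.
w = rng.standard_normal((4, 4))
window, stride = 4, 2

def per_window(img):
    """Sliding-window version: crop each window, evaluate it separately."""
    outs = {}
    for top in range(0, img.shape[0] - window + 1, stride):
        for left in range(0, img.shape[1] - window + 1, stride):
            crop = img[top:top + window, left:left + window]
            outs[(top, left)] = float(np.sum(w * crop))
    return outs

def one_pass(img):
    """Convolutional version: one valid convolution with stride 2 over the
    whole image yields the entire grid of window outputs in a single pass."""
    n = (img.shape[0] - window) // stride + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            patch = img[i * stride:i * stride + window,
                        j * stride:j * stride + window]
            out[i, j] = np.sum(w * patch)
    return out

crops = per_window(image)
grid = one_pass(image)  # grid[i, j] equals crops[(i * stride, j * stride)]
```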
Andrew Ng, "Deep Learning Engineer" course, 04. Convolutional Neural Networks, week 3: Object Detection (1) — basic object detection algorithms