Comparison of R-CNN and SPP-net


One. R-CNN:

1. First, run selective search on the image to be detected to extract about 2000 candidate windows (region proposals).

2. Each of the 2000 candidate windows is scaled to 227*227 and then input into the CNN; one feature vector is extracted per candidate window, that is, the CNN is used to extract the feature vectors.

3. The feature vector of each candidate window is then classified with the SVM algorithm.

You can see that R-CNN's computation is certainly very large: each of the 2000 candidate windows must be input into the CNN separately for feature extraction, so the total cost is enormous, as the sketch below illustrates.
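To make the per-window cost concrete, here is a minimal sketch of the pipeline just described. Every helper in it (selective_search, warp, cnn_features, svm_classify) is a hypothetical placeholder standing in for the real component, not R-CNN's actual code.

    # Minimal R-CNN pipeline sketch; all helpers are hypothetical placeholders.
    def rcnn_detect(image):
        proposals = selective_search(image, max_windows=2000)  # step 1
        detections = []
        for box in proposals:
            patch = warp(image, box, size=(227, 227))          # step 2: one
            feature = cnn_features(patch)                      # full CNN pass
            label, score = svm_classify(feature)               # per window
            detections.append((box, label, score))             # step 3
        return detections

The expensive line is cnn_features: it runs once per candidate window, i.e., about 2000 CNN forward passes for a single image.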

Two. SPP-net:

1. First, run selective search on the image to be detected to extract about 2000 candidate windows. This step is the same as in R-CNN.

2. Feature extraction stage. This is the biggest difference from R-CNN. Both use a convolutional neural network for feature extraction, but SPP-net adds spatial pyramid pooling. The step works as follows: the entire image to be detected is input into the CNN once, producing the feature maps; then the region corresponding to each candidate box is located on the feature maps, and spatial pyramid pooling is applied to that region to extract a fixed-length feature vector (see the sketch after this list). R-CNN, by contrast, feeds each candidate box into the CNN separately. Because SPP-net only needs to run the convolutional layers over the entire image once, it is much faster; reportedly up to about 100 times faster, since R-CNN effectively runs the CNN 2000 times while SPP-net runs it once.

3. The last step is the same as in R-CNN: the SVM algorithm classifies the feature vectors.
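Here is a minimal sketch of the spatial pyramid pooling step, written with PyTorch's adaptive pooling. The pyramid levels {4, 2, 1} are an illustrative choice (the paper uses, e.g., {6, 3, 2, 1} on ZF-5); the point is that the output length is fixed no matter what size the feature-map crop is.

    import torch
    import torch.nn.functional as F

    def spp(region, levels=(4, 2, 1)):
        # region: feature-map crop of shape (C, H, W) for one candidate box
        parts = []
        for n in levels:
            # max-pool the crop down to an n x n grid, whatever H and W are
            pooled = F.adaptive_max_pool2d(region.unsqueeze(0), (n, n))
            parts.append(pooled.flatten())               # C * n * n values
        return torch.cat(parts)                          # length C * sum(n*n)

    # Crops of different sizes yield feature vectors of identical length:
    a = spp(torch.randn(256, 13, 9))
    b = spp(torch.randn(256, 7, 21))
    assert a.shape == b.shape == (256 * (16 + 4 + 1),)   # 5376 in both cases

This is what lets the convolutional layers run over the whole image just once: each of the 2000 candidate boxes only costs a cheap pooling over its feature-map region instead of a full CNN forward pass.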

Three. A problem:

How do we find the region on the feature maps that corresponds to a candidate box in the original image?

The candidate boxes are detected on the whole original image, while the feature maps have a different size from the original image, since they are obtained after a series of operations such as convolution and subsampling. So how do we find the corresponding region on the feature maps? This is the problem of mapping a window to the feature maps. The authors give a very convenient formula: suppose (x', y') denotes a coordinate point on the feature map and (x, y) the corresponding point on the original input image; then they satisfy the following relation:

(x, y) = (S*x', S*y')

where S is the product of all the strides in the CNN up to that layer. For example, for the ZF-5 network used in the paper:

S=2*2*2*2=16

And for Overfeat-5/7, S = 12; this can be verified against the network architecture table in the paper.

It is important to note that the strides include both pooling strides and convolution strides. You can verify for yourself that the product for Overfeat-5/7 (over its first 5 layers) equals 12.
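As a quick worked example of the relation (x, y) = (S*x', S*y') above, with my own numbers and S = 16 for ZF-5:

    # A point on ZF-5's conv5 feature map maps back to the original image
    # by scaling with S, the product of all strides up to that layer.
    S = 2 * 2 * 2 * 2             # ZF-5: S = 16
    x_prime, y_prime = 10, 8      # coordinate on the feature map
    x, y = S * x_prime, S * y_prime
    print((x, y))                 # (160, 128) on the original image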

Conversely, if we want to solve for (x', y') given (x, y), the rounding direction matters, and the paper gives one formula per boundary. So for each rectangular candidate window detected on the original image, we take its corners and map them as follows:

Left, top: x' = floor(x/S) + 1

Right, bottom: x' = ceil(x/S) - 1

(These formulas assume each layer with filter size p is padded by floor(p/2), as in the paper; otherwise an offset must be added.)
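A minimal sketch of this corner mapping in Python (the function names are mine; S = 16 corresponds to ZF-5):

    import math

    def map_left_top(x, S=16):
        # left/top boundary: round down, then step one cell inward
        return math.floor(x / S) + 1

    def map_right_bottom(x, S=16):
        # right/bottom boundary: round up, then step one cell inward
        return math.ceil(x / S) - 1

    def window_to_feature_map(box, S=16):
        left, top, right, bottom = box   # corners in original-image pixels
        return (map_left_top(left, S), map_left_top(top, S),
                map_right_bottom(right, S), map_right_bottom(bottom, S))

    print(window_to_feature_map((64, 32, 320, 256)))   # -> (5, 3, 19, 15)

The two opposite rounding directions shrink the box slightly, so the mapped feature-map region does not spill outside the original candidate window.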
