About the ROI Pooling Layer

Source: Internet
Author: User

ROI pooling, as the name implies, is a pooling layer, and specifically one that pools over ROIs.

The whole ROI pooling process takes each proposal and extracts from it a feature map of uniform size.

What is ROI? (Https://www.sogou.com/link?url=DOb0bgH2eKh1ibpaMGjuyy_CKu9VidU_Nm_z987mVIMm3Pojx-sH_PfgfR9iaaFcn666hxi--_g.)

ROI is shorthand for region of interest. In the Faster R-CNN architecture, it refers to the boxes corresponding to the proposals produced by the RPN layer.

Input for ROI pooling

The inputs are made up of two parts:
1. Data: the feature map of the conv layer that comes before the RPN layer, commonly referred to as "Share_conv";
2. Rois: the output of the RPN layer, a set of rectangular boxes of shape 1x5x1x1 (4 coordinates + index). Note that the coordinates are expressed relative to the original image (the network's first input), not relative to the feature map.

Output of ROI pooling

The output is a batch of vectors, where the batch size equals the number of ROIs and each vector has size channel x w x h. In other words, ROI pooling maps rectangular boxes of different sizes to rectangular boxes of one fixed size, w x h.

We first map the ROI coordinates onto the feature map. The mapping rule is relatively simple: divide the coordinates by the ratio of the input image size to the feature map size, which gives the box coordinates on the feature map. We then apply pooling to get the output. Because the inputs differ in size, SPP-style pooling is used here: for each pixel of the pooled result, the pooling step computes the range on the feature map that this pixel corresponds to, and then takes the max or the average over that range.
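The mapping and pooling steps just described can be sketched in plain Python. This is only an illustration under stated assumptions, not the code of any particular framework: the function name `roi_max_pool`, the single-channel feature map stored as a list of rows, the round-to-nearest mapping, and the inclusive box convention are all choices made for this example.

```python
import math

def roi_max_pool(feature, roi, spatial_scale, pooled_h, pooled_w):
    """Max-pool one ROI of a 2-D feature map down to pooled_h x pooled_w.

    feature: list of rows (one channel only, to keep the sketch short).
    roi: (x1, y1, x2, y2) in input-image pixels, treated as inclusive (assumption).
    spatial_scale: feature-map size divided by input-image size, e.g. 1/16.
    """
    height, width = len(feature), len(feature[0])
    # Map image coordinates onto the feature map, rounding to nearest (assumption).
    x1, y1, x2, y2 = (int(math.floor(v * spatial_scale + 0.5)) for v in roi)
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    # Each output pixel covers a bin of this (fractional) size inside the ROI.
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w
    out = [[0.0] * pooled_w for _ in range(pooled_h)]
    for ph in range(pooled_h):
        # Range of feature-map rows covered by output row ph, clipped to the map.
        hstart = min(max(math.floor(ph * bin_h) + y1, 0), height)
        hend = min(max(math.ceil((ph + 1) * bin_h) + y1, 0), height)
        for pw in range(pooled_w):
            wstart = min(max(math.floor(pw * bin_w) + x1, 0), width)
            wend = min(max(math.ceil((pw + 1) * bin_w) + x1, 0), width)
            values = [feature[h][w]
                      for h in range(hstart, hend)
                      for w in range(wstart, wend)]
            # An empty bin (fully clipped) is left at 0.
            out[ph][pw] = max(values) if values else 0.0
    return out

# Toy 8x8 feature map whose value encodes its position: feature[y][x] = 8*y + x.
feature = [[8 * y + x for x in range(8)] for y in range(8)]
pooled = roi_max_pool(feature, (32, 32, 96, 96), 1 / 16, 2, 2)
print(pooled)  # [[36, 38], [52, 54]]
```

Whatever the size of the ROI, the result always has pooled_h x pooled_w entries. Replacing `max` with a mean over `values` would give the average-pooling variant mentioned above.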
---------------------

(Https://www.sogou.com/link?url=44aejrzSKwWwrNJcKKLVtEK1rJUb32uHp37TwbVHvja5OaZX_AHBzQ. )

The pooling layer in TensorFlow has a fixed size.

(Https://www.sogou.com/link?url=DSOYnZeCC_rR_TP93bdO6GxT14t4sbuOSwJr4L_oLI5lf9NGYfOU6pULrym3hTBVtsCnpVGpPpA.)

RoI pooling maps a region of the original image to the corresponding region of conv5, and finally pools that region to a fixed size.

Inputs: B0 is the convolutional feature map, and B1 is the ROIs.

Reshape: top is reshaped to num_b1 (the number of ROIs) x c_b0 x pooled_height x pooled_width, and max_idx_ is reshaped to the same shape as top.

Forward (forward propagation) first computes the coordinates of each ROI mapped onto the feature map, that is, the original coordinates * spatial_scale (the reciprocal of the product of all strides). It then works out, for each output point, the region that the point represents. The size of this region is bin_h = roi_height / pooled_height, bin_w = roi_width / pooled_width. For every top point, it traverses the feature map region that the point maps back to, finds the maximum value, and records the location of that maximum.

Backward (backpropagation) is written directly in GPU form, but to begin with it can be read as traversing the feature map and recording n, c, h, w in preparation for writing bottom_diff, and then computing the coordinates of each ROI mapped onto the feature map. At first I thought there was a small problem here: the author's intent is that if (h, w) is not within the ROI area, the loop can simply continue. This is not hard to understand: a point inside the ROI may contribute to that ROI's top (as the maximum of one bin), whereas a point outside the area cannot contribute to top at all. A point may also contribute to multiple regions, so when the loss is propagated back, the gradients arriving at the same point accumulate.
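The accumulation in the backward pass can be illustrated with a small sketch. It is not the GPU kernel discussed above: that code loops over feature-map points and gathers gradients, while this version loops over output bins and scatters them, which yields the same sums. The function name `roi_pool_backward`, the flat-index encoding of the recorded maxima, and the use of -1 for an empty bin are assumptions made for this example.

```python
def roi_pool_backward(top_diffs, argmaxes, height, width):
    """Scatter each ROI's output gradients back onto one feature-map channel.

    top_diffs[r][i]: gradient of output bin i of ROI r.
    argmaxes[r][i]: flat index (h * width + w) of the feature-map point that
        was the maximum of that bin in the forward pass, or -1 for an empty bin.
    """
    bottom_diff = [[0.0] * width for _ in range(height)]
    for top_diff, argmax in zip(top_diffs, argmaxes):
        for grad, idx in zip(top_diff, argmax):
            if idx < 0:
                continue  # empty bin: no point contributed to this output
            h, w = divmod(idx, width)
            # Use += rather than =: the same point may be the maximum of
            # bins in several ROIs, and all of those gradients add up.
            bottom_diff[h][w] += grad
    return bottom_diff

# Two overlapping ROIs on a 4x4 map; both picked point (1, 1) (flat index 5).
top_diffs = [[1.0, 2.0], [0.5, 4.0]]
argmaxes = [[5, 7], [5, -1]]
bottom_diff = roi_pool_backward(top_diffs, argmaxes, 4, 4)
print(bottom_diff[1][1])  # 1.5, the sum of 1.0 and 0.5
```

Points that were never the maximum of any bin keep a gradient of zero, which matches the observation that a point outside an ROI cannot contribute to that ROI's top.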

(Https://www.sogou.com/link?url=DSOYnZeCC_rR_TP93bdO6GxT14t4sbuOSwJr4L_oLI5lf9NGYfOU6pULrym3hTBVtsCnpVGpPpA.)

We know that in Faster R-CNN there are two main outputs for each ROI (also called a candidate object): one is the classification result, which is the label of the predicted box, and the other is the regression result, which is the coordinate offset of the predicted box. Mask R-CNN adds a third output, the object mask: a mask is output for each ROI, which is achieved through an FCN (the two convolutional layers in Figure 1). These three output branches run in parallel, and this parallel design is both simpler and more efficient than instance segmentation algorithms that segment first and classify afterwards.
---------------------
The path of AI
Source: CSDN
Original: 81878644

A brief review of how the ROI pooling layer works:

The inputs of ROI pooling are the ROI coordinates and the output features of some layer. Whether it is ROI pooling or RoIAlign, the aim is to extract the features at those ROI coordinates from the output feature map. The ROI coordinates produced by the RPN network are relative to the input image size, so they must first be scaled down to the size of the output feature map. Assuming the output feature map is 1/16 the size of the input image, the ROI coordinates are divided by 16 and rounded to integers (the first quantization). The ROI is then divided into h x w bins (the paper uses 7x7; 14x14 is sometimes used as well). Because the bin coordinates produced by this division are floating-point values, they are quantized too: the upper-left coordinate is rounded down and the lower-right coordinate is rounded up. Finally, max pooling is applied to each bin, that is, the maximum value within a bin becomes the value of that bin. Doing this for every bin yields ROI features of size h x w. As this description shows, ROI pooling involves two quantization steps, and these two steps introduce errors.
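To make the two quantization steps concrete, here is a one-dimensional sketch. It is only an illustration: the function name `quantized_bins` is hypothetical, and truncation is assumed for the first step because the text does not say which rounding mode is used.

```python
import math

def quantized_bins(x1, x2, stride, num_bins):
    """Return the quantized ROI extent and the integer bin ranges along one axis."""
    # First quantization: scale to the feature map and truncate (assumption).
    q1 = int(x1 / stride)
    q2 = int(x2 / stride)
    span = q2 - q1
    bins = []
    for i in range(num_bins):
        # Second quantization: fractional bin edges are rounded outward,
        # down for the start of the bin and up for the end.
        start = math.floor(q1 + i * span / num_bins)
        end = math.ceil(q1 + (i + 1) * span / num_bins)
        bins.append((start, end))
    return (q1, q2), bins

# An ROI spanning x = 100..300 in the image, with a stride-16 feature map.
extent, bins = quantized_bins(100, 300, 16, 5)
print(100 / 16, 300 / 16)  # 6.25 18.75, the exact positions on the feature map
print(extent)              # (6, 18), after the first quantization
print(bins)                # [(6, 9), (8, 11), (10, 14), (13, 16), (15, 18)]
```

In this example the first step shifts the ROI edges by 0.25 and 0.75 feature-map cells, which is 4 and 12 pixels in the input image. The second step makes neighbouring bins overlap and vary in width (3 or 4 cells) even though every exact bin is 2.4 cells wide.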
---------------------
