3D Graph Neural Networks for RGBD Semantic segmentation
2018-04-13 19:19:48
1. Introduction:
With the development of depth sensors, RGBD semantic segmentation is applied to many problems, such as virtual reality, robot, human-computer interaction and so on. Compared with the existing 2D semantic segmentation, RGBD semantic segmentation can use real-world geometric information to assist segmentation by exploring depth information. As shown in the normal 2D image, that is, the sub-figure (a), the table and the microwave oven pixels are called neighbors, but in the 3D world, there is no such confusion, because these pixels in the 3D point cloud is far away.
There are also many ways to do RGBD segmentation and 2D segmentation, and depth image as an input image. The characteristics of these two images are extracted by using neural network respectively. This approach requires two CNNs, which makes the computation and memory usage twice times the original one. Missing on the contents of the collection may also cause errors, as shown in the following: the 2D CNN model mistakenly considers table to be counter.
Another method is to use 3D CNN to handle it. But there are certain limitations: since 3D Point clouds is quite sparse, effective representation learning from such data is challenging. In addition, 3D CNNs are computationally more expensive than their 2D version, thus& nbsp; it is difficult-scale up these systems to deal with a large number of classes.
< EM id= "__mcedel" > Span class= "FONTSTYLE0" to solve the above challenges, we propose an end-to-end 3D graph neural Network to learn its representation directly from 3D points (directly learns its represent Ation from 3D points). We first convert the pixels to 3D based on the depth information, and then we use a unary eigenvector to connect each of them, namely: an output of a 2D segmentation CNN. Then we build a graph whose nodes is 3D points,edges is the nearest neighbor found from 3D. For each node, we use the image eigenvector as the initial expression, and then iterate over the update with a recurrent function. The core idea of this dynamic computing mechanism is that the node State was determined by its history state and the messages sent by its neighbors, while Tak ing both appearance and 3d information into consideration.
We use the final state of each node to classify the nodes. We use BPTT to calculate the gradient of the graph neural network. We pass the gradient to the unary CNN for the end to end training. Our experimental results show that the top segmentation effect is achieved on the challenging data sets.
2. Related Works:
Slightly
3. Graph Neural Networks:
----
Paper notes: 3D Graph neural Networks for RGBD Semantic segmentation