Single-image scene understanding: paper notes (2014 CV)

Source: Internet
Author: User

Karsch K, Liu C, Kang S B. Depth transfer: Depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.

Given an RGB-D database and an input RGB image, the method uses GIST features to retrieve the top-k most similar images in the database, then uses SIFT flow to warp the k corresponding depth maps onto the input image. The k warped depth maps are combined by minimizing an energy function, which yields the depth map of the input image. The formulation of the energy function is the highlight; the rest builds on two 2011 PAMI papers by Ce Liu, SIFT Flow and nonparametric scene parsing (label transfer). In addition, the method works very well for 2D-to-3D video conversion.
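The retrieve-then-fuse structure of this pipeline can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: the descriptors are toy 4-D vectors standing in for GIST, the SIFT flow warp is replaced by the identity, and a per-pixel median stands in for the energy minimization. The function names `top_k_candidates` and `fuse_depths` are hypothetical.

```python
import numpy as np

def top_k_candidates(query_feat, db_feats, k):
    """Indices of the k database descriptors closest to the query (L2 distance)."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

def fuse_depths(warped_depths):
    """Per-pixel median of the candidate depth maps.

    Assumption: a simple stand-in for the paper's energy minimization.
    """
    return np.median(np.stack(warped_depths, axis=0), axis=0)

# Toy database: 5 images with 4-D descriptors and constant 2x2 depth maps.
db_feats = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 1.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
])
db_depths = [np.full((2, 2), d) for d in (1.0, 2.0, 10.0, 3.0, 20.0)]
query = np.array([0.1, 0.1, 0.0, 0.0])

idx = top_k_candidates(query, db_feats, k=3)
# In the real pipeline each candidate depth map would first be warped
# to the input image with SIFT flow; here the warp is the identity.
depth = fuse_depths([db_depths[i] for i in idx])
```

With these toy values the three nearest candidates are images 0, 1 and 3, and the fused depth is their per-pixel median.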


Saxena A, Chung S H, Ng A Y. Learning depth from single monocular images[C]//Advances in Neural Information Processing Systems. 2005: 1161-1168.

Absolute depth features: the image is processed at 3 scales; at each scale a patch and its 4 neighbors are used, giving 3*5 = 15 patches, plus 4 vertical (column) patches. Each patch is convolved with 17 filters, and the responses are summed with exponents 1 and 2, giving 34 values per patch, so the absolute depth feature has (15+4)*34 = 646 dimensions. Relative depth features: each patch is again convolved with the 17 filters, and each set of responses is summarized by a 10-bin histogram, so each patch has a 170-dimensional feature. The feature vectors of two adjacent patches are subtracted, and the difference is used as the relative depth feature. Finally, an MRF is used to learn the model parameters.
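The dimension counts above can be checked with a short sketch. It assumes the filter responses for a patch are already available as a `(17, h, w)` array (random numbers here, not real filter outputs), and the histogram range of [-1, 1] is an arbitrary choice for illustration; `energy_features` and `histogram_features` are hypothetical helper names.

```python
import numpy as np

N_FILTERS = 17   # filters applied to each patch
N_SCALES = 3     # image scales
N_BINS = 10      # histogram bins for the relative features

def energy_features(responses):
    """Sum of |response|^k over the patch for k = 1, 2 -> 2 * 17 = 34 values."""
    abs_r = np.abs(responses).reshape(responses.shape[0], -1)
    return np.concatenate([abs_r.sum(axis=1), (abs_r ** 2).sum(axis=1)])

def histogram_features(responses, value_range=(-1.0, 1.0)):
    """10-bin histogram of each filter's responses -> 17 * 10 = 170 values."""
    hists = [np.histogram(r, bins=N_BINS, range=value_range)[0] for r in responses]
    return np.concatenate(hists).astype(float)

# Stand-in filter responses for two adjacent patches (assumption: random data).
rng = np.random.default_rng(0)
patch_a = rng.uniform(-1, 1, size=(N_FILTERS, 8, 8))
patch_b = rng.uniform(-1, 1, size=(N_FILTERS, 8, 8))

per_patch = energy_features(patch_a)             # 34 values per patch
n_patches = N_SCALES * 5 + 4                     # 15 + 4 = 19 patches
absolute_dim = n_patches * per_patch.size        # 19 * 34 = 646

# Relative feature: difference of the two 170-D histogram vectors.
relative = histogram_features(patch_a) - histogram_features(patch_b)
```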


Saxena A, Chung S H, Ng A Y. 3-D depth reconstruction from a single still image[J]. International Journal of Computer Vision, 2008, 76(1): 53-69.

Compared with the previous paper, this one additionally incorporates depth information obtained from stereo disparity.


Saxena A, Sun M, Ng A Y. Make3D: Learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 824-840.

The image is segmented into many superpixels, and depth is expressed through the plane parameters of each superpixel. Feature selection is the same as in the earlier papers. An MRF model is then built from three terms. The first term relates the image features to the plane parameters; its goal is that the depth predicted inside each superpixel agrees with the ground truth. The second term has three parts. In the first part, neighboring superpixels are likely to be connected in the 3D model unless one occludes the other, so the goal is that the two superpixels have consistent depth where they meet. In the second part, if neighboring superpixels have similar features and no edge lies between them, they are likely to be coplanar in the 3D model; the goal is that, along a given viewing direction, the distance from the camera to each of the two superpixel planes is the same. In the third part, a straight line in the image plane is likely to be a straight line in the 3D model; the goal is that, for a line passing through two superpixels, the depth along a given viewing direction computed from one superpixel agrees with the depth computed from the other. The third term uses depths estimated from disparity across multiple images. Finally, the problem is solved with linear programming.
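The link between plane parameters and depth can be illustrated with a small sketch. It assumes a common parameterization in which a superpixel's plane is described by a vector `alpha` such that every 3D point `q` on the plane satisfies `alpha . q = 1`, so a unit camera ray `r` hits the plane at depth `1 / (r . alpha)`. The relative (fractional) depth gap between two neighboring planes along a shared ray is then the kind of quantity a connectivity or coplanarity penalty would act on. The function names and toy values are hypothetical, not taken from the paper's code.

```python
import numpy as np

def depth_along_ray(alpha, ray):
    """Depth at which a unit camera ray meets the plane {q : alpha . q = 1}."""
    return 1.0 / float(np.dot(ray, alpha))

def fractional_error(d_est, d_ref):
    """Relative depth error (d_est - d_ref) / d_ref."""
    return (d_est - d_ref) / d_ref

# Two fronto-parallel planes, z = 5 and z = 4 (toy values).
alpha_i = np.array([0.0, 0.0, 0.2])    # plane z = 5
alpha_j = np.array([0.0, 0.0, 0.25])   # plane z = 4
ray = np.array([0.0, 0.0, 1.0])        # ray through a shared boundary pixel

d_i = depth_along_ray(alpha_i, ray)
d_j = depth_along_ray(alpha_j, ray)

# A pairwise term would penalize this gap when the two superpixels
# are believed to be connected or coplanar.
gap = abs(fractional_error(d_i, d_j))
```

Because the depth is the reciprocal of a linear function of `alpha`, penalties written on fractional depth differences can be kept linear in the plane parameters, which fits with solving the model by linear programming.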


Fouhey D F, Gupta A, Hebert M. Data-driven 3D primitives for single image understanding[C]//Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013: 3392-3399.

The paper defines primitives that are both easy to recognize and informative about three-dimensional structure. A large set of such primitives is learned from many patches and negative samples, with HOG-based detection. The learned primitives are then used to interpret the 3D scene of the input image and obtain surface normals, in a manner similar to depth transfer.


Fouhey D F, Gupta A, Hebert M. Unfolding an indoor origami world[M]//Computer Vision–ECCV 2014. Springer International Publishing, 2014: 687-702.

The three vanishing points of the single image are found first. Sweeping rays from each pair of vanishing points produces a grid plane, giving three grids in total. The grids are aligned with the image to determine, for each image region, which cell of which grid supplies its value, using the label transfer method of the first paper. A smoothness constraint over convex and concave structures is then introduced to infer the normals of adjacent surfaces, which keeps the inferred structure accurate and complete.
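A vanishing point, the starting point of the grid construction described above, can be estimated as the intersection of image lines that are parallel in 3D. The following is a minimal sketch using homogeneous coordinates with only two hand-picked lines; a real detector would fit the point to many noisy line segments, and the function names are hypothetical.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product of the points)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines.

    Assumes the lines are not parallel in the image (third coordinate nonzero).
    """
    x = np.cross(l1, l2)
    return x[:2] / x[2]

# Two image lines that converge: y = x and y = 4 - x.
l1 = line_through((0, 0), (1, 1))
l2 = line_through((0, 4), (1, 3))
vp = intersection(l1, l2)   # the two lines meet at (2, 2)
```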
