To begin, we want to recover a scene depth map from the original image sequence. This raises the following question:
(1) How can an RGB image be converted into a scene depth map?
First, we need to be clear about one point: depth information cannot be recovered from a single RGB image alone.
The grayscale value of each pixel in a depth image represents the distance from the corresponding scene point to the camera.
The methods of acquiring depth images can be divided into two categories: passive ranging sensing and active depth sensing.
In short: the pixel values of a depth image encode the distance from objects in the scene to the camera, and depth images are acquired either by passive ranging sensing or by active depth sensing.
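As a concrete illustration of "pixel value encodes distance", here is a minimal sketch of decoding a raw depth image. The 16-bit-millimetre encoding and the use of 0 for invalid pixels are assumptions (they match common active sensors such as the Kinect), not something fixed by the text; the array values are made up.

```python
import numpy as np

# Hypothetical raw depth image: 16-bit integers in millimetres,
# with 0 conventionally marking "no valid measurement".
raw = np.array([[0,    500, 1500],
                [2750, 4000,   0]], dtype=np.uint16)

meters = raw.astype(np.float64) / 1000.0  # millimetres -> metres
meters[raw == 0] = np.nan                 # flag invalid pixels
```

After this conversion, each finite entry of `meters` is the camera-to-point distance in metres, which is what the rest of the pipeline consumes.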
Passive ranging sensing
The most commonly used passive ranging method is binocular stereo vision: two cameras a fixed distance apart (the baseline) capture two images of the same scene, a stereo matching algorithm finds the corresponding pixels in the two images, and the horizontal offset between each pair of matches is computed according to the triangulation principle. This offset is the disparity, and a simple transformation turns the disparity information into the depth of the object in the scene.
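The two steps above, matching and triangulation, can be sketched as follows. This is a toy sum-of-absolute-differences (SAD) block matcher, not a production stereo algorithm, and the focal length and baseline fed into the triangulation formula Z = f·B/d are camera calibration parameters that the text does not specify; the values used here are illustrative.

```python
import numpy as np

def disparity_sad(left, right, block=5, max_disp=8):
    """Toy stereo matcher: for each left-image pixel, find the horizontal
    shift d that minimises the sum of absolute differences between a block
    around the pixel and the correspondingly shifted block in the right image."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)  # best-matching shift = disparity
    return disp

def depth_from_disparity(disp, focal_px, baseline_m):
    """Triangulation: Z = f * B / d, so larger disparity means a closer point."""
    with np.errstate(divide="ignore"):
        z = focal_px * baseline_m / disp
    z[disp == 0] = np.inf  # zero disparity: point at infinity / no match
    return z
```

For real imagery one would use a calibrated, rectified pair and a robust matcher (e.g. OpenCV's `cv2.StereoBM`); the sketch only shows why matching plus the triangle relation yields depth.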
For the underlying theory, see: the mathematical principles of binocular stereo vision.
With a stereo matching algorithm, a depth image of the scene can also be obtained by photographing a set of images of the same scene from different angles. In addition, scene depth can be inferred indirectly from image cues such as photometric and shading features.