Vision-based three-dimensional reconstruction refers to acquiring images of a scene with a camera, analyzing and processing those images, and using computer vision techniques to deduce the three-dimensional information of the objects in the real environment.
1. Related concepts
(1) Color image and depth image
A color image is also known as an RGB image: the R, G and B components correspond to the red, green and blue channels, and their superposition produces the different intensities of the image pixels. The RGB color space is the foundation for representing the colorful real world. A depth image, also known as a range image, differs from a grayscale image in what each pixel stores: instead of a luminance value, each pixel stores the distance from that point to the camera, i.e. the depth value. Figure 2-1 shows the relationship between the depth image and the grayscale image.
Figure 2-1 Depth image and grayscale image
Fig.2-1 Depth image and grayscale image
The depth value is the distance between the target object and the measuring device. Because it depends only on this distance and is unaffected by environment, illumination, orientation and other factors, the depth image can truly and accurately reflect the geometric depth information of the scene. By building a spatial model of the object, it provides a more solid foundation for deeper computer vision applications.
Figure 2-2 color images and depth images of characters
Fig.2-2 Color image and depth image of the characters
(2) PCL
PCL (Point Cloud Library) is an open-source project developed and maintained by Dr. Radu Rusu and other researchers at Stanford University, originally built on ROS (Robot Operating System) and initially used to support robotic sensing, cognition and actuation. PCL was formally released to the public in 2011. With the addition and extension of three-dimensional point cloud algorithms, PCL has gradually developed into a free, open-source, large-scale, cross-platform C++ programming library. The PCL framework includes many advanced algorithms and typical data structures, covering filtering, segmentation, registration, recognition, tracking, visualization, model fitting, surface reconstruction and many other functions. It runs on a variety of operating systems and most embedded systems, giving it strong portability. Because PCL is so widely used, the experts and scholars maintaining the library update it very promptly; development has now reached version 1.7.0. With more fresh, practical and interesting features than earlier versions, it provides a modular, standardized solution for working with point cloud data. Leading high-performance techniques such as shaders, shared-memory parallel programming and the unified compute device architecture (CUDA) raise the speed of PCL-related processing and enable real-time application development.
In terms of algorithms, PCL provides sets of algorithms for processing point cloud data, including data filtering, point cloud registration, surface generation, image segmentation and location search. The algorithm sets are organized by type and together cover the whole three-dimensional reconstruction pipeline, while keeping each set compact, reusable and executable. For example, the interface flow of a pipeline operation in PCL is as follows, illustrated by the code sketch after the list:
① create the processing object (e.g. a filter, feature estimator or segmenter);
② pass the initial point cloud into the processing module via setInputCloud;
③ set the parameters of the algorithm;
④ call the operation function to execute the algorithm and output the result.
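A minimal sketch of this four-step flow, using a voxel-grid down-sampling filter as the processing object ("scene.pcd" is a hypothetical input file):

```cpp
// Sketch of the four-step PCL pipeline described above.
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

int main()
{
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::PointCloud<pcl::PointXYZ>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::io::loadPCDFile("scene.pcd", *cloud);   // initial point cloud data

  pcl::VoxelGrid<pcl::PointXYZ> filter;        // ① create the processing object
  filter.setInputCloud(cloud);                 // ② input the point cloud
  filter.setLeafSize(0.01f, 0.01f, 0.01f);     // ③ set the algorithm parameters
  filter.filter(*filtered);                    // ④ execute and output the result
  return 0;
}
```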
To support modular application and development, PCL is subdivided into groups of independent code sets, so it can be applied to embedded systems conveniently and quickly, and each module can be ported and compiled separately. Some of the commonly used algorithm modules are listed below:
libpcl_io: completes data input and output, such as reading and writing point cloud data;
libpcl_filters: completes data sampling, feature extraction, parameter fitting and related processes;
libpcl_registration: completes the registration of depth images, for example with the iterative closest point algorithm;
libpcl_surface: completes generation of the three-dimensional model, including triangular meshing and surface smoothing.
These commonly used algorithm modules are covered by regression tests to ensure that no errors are introduced as they evolve. Testing is typically carried out by dedicated maintainers who write libraries of test cases; when a regression is detected, the report is immediately fed back to the responsible author. This improves the safety and stability of PCL and of the systems built on it.
(3) Point cloud data
A typical point cloud data (Point Cloud Data, PCD) model is shown in Figure 2-3.
Figure 2-3 Point cloud data and its amplification effect
Point cloud data commonly appears in reverse engineering; it is a collection of information about the surface of an object acquired by a ranging device. The scanned data are recorded as points, each of which may carry three-dimensional coordinates as well as color or light intensity information. The attributes commonly used with point cloud data include point coordinate accuracy, spatial resolution and surface normal vectors. Point clouds are generally stored in the PCD format, in which the data are easier to operate on and which can speed up point cloud registration and fusion. The point cloud data studied in this paper are unstructured, scattered point clouds, which is characteristic of point clouds used in three-dimensional reconstruction.
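For illustration, a minimal ASCII PCD (v0.7) header for an unorganized cloud of three XYZ points might look like the following; the coordinate values are purely illustrative:

```
# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
COUNT 1 1 1
WIDTH 3
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 3
DATA ascii
0.120 0.015 1.530
0.118 0.019 1.528
0.121 0.022 1.531
```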
(4) Coordinate system
In three-dimensional space, every point must be represented in the form of coordinates and can be converted between different coordinate systems. The concepts, calculation and interrelation of the basic coordinate systems are introduced first.
① Image coordinate system
The image coordinate system comes in two forms: the pixel coordinate system and the physical coordinate system. Digital image information is stored in matrix form, that is, the image data are stored as a matrix of pixels. The image pixel coordinate system takes the top-left corner of the image as its origin and the pixel as its basic unit, with u and v as the horizontal and vertical axes respectively. The image physical coordinate system takes the intersection of the camera's optical axis with the image plane as its origin, with the x and y axes parallel to the u and v axes respectively. Figure 2-4 shows the positional relationship between the two coordinate systems:
Figure 2-4 Image pixel coordinate system and physical coordinate system
Fig.2-4 Image pixel coordinate system and physical coordinate system
Let the origin of the image physical coordinate system have coordinates $(u_0, v_0)$ in the u-v coordinate system, and let $dx$ and $dy$ denote the physical size of a pixel along the x axis and the y axis. Then every pixel in the image satisfies the following relationship (2-1) between its coordinates in the u-v (pixel) system and in the x-y (physical) system:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & s & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{2-1}$$

Here $s$ is the skew factor, which arises when the axes of the image coordinate system do not intersect at a right angle.
② Camera coordinate system
The camera coordinate system is composed of the camera's optical center $O_c$ and the three axes $X_c$, $Y_c$ and $Z_c$. Its $X_c$ and $Y_c$ axes correspond to the x and y axes of the image physical coordinate system, while the $Z_c$ axis is the camera's optical axis and is perpendicular to the plane formed by the origin and the $X_c$, $Y_c$ axes. The relationship is shown in Figure 2-5:
Figure 2-5 Camera coordinate system
Let the focal length of the camera be $f$. A point $(x, y)$ in the image physical coordinate system is related to the corresponding point $(X_c, Y_c, Z_c)$ in the camera coordinate system by (2-2):

$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{2-2}$$
③ World coordinate system
Considering the uncertainty of the camera position, a world coordinate system is needed to unify the coordinate relationship between camera and object. The world coordinate system consists of the origin $O_w$ and the three axes $X_w$, $Y_w$ and $Z_w$. World coordinates and camera coordinates are related by the transformation (2-3):

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{2-3}$$

where $R$ is the rotation matrix, representing the orientation of the camera in the world coordinate system, and $t$ is the translation vector, representing the position of the camera.
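The chain of transformations in (2-1) to (2-3) can be composed numerically. The following sketch (using Eigen; all numeric values for $f$, $dx$, $dy$, $u_0$, $v_0$, $R$ and $t$ are illustrative assumptions, not calibrated parameters) projects a world point onto pixel coordinates:

```cpp
// Sketch: project a world point to pixel coordinates via the
// world -> camera -> image plane -> pixel chain described above.
#include <Eigen/Dense>
#include <cstdio>

int main()
{
  Eigen::Vector3d Pw(0.2, -0.1, 1.5);                // point in world coordinates (m)
  Eigen::Matrix3d R = Eigen::Matrix3d::Identity();   // camera orientation, eq. (2-3)
  Eigen::Vector3d t(0.0, 0.0, 0.0);                  // camera position,    eq. (2-3)

  Eigen::Vector3d Pc = R * Pw + t;                   // world -> camera

  const double dx = 0.0001, dy = 0.0001;             // pixel size (m), assumed
  const double f  = 525.0 * dx;                      // focal length (m), assumed
  const double u0 = 320.0, v0 = 240.0;               // principal point (pixels), assumed

  double x = f * Pc.x() / Pc.z();                    // camera -> image plane, eq. (2-2)
  double y = f * Pc.y() / Pc.z();

  double u = x / dx + u0;                            // image plane -> pixel, eq. (2-1)
  double v = y / dy + v0;
  std::printf("u = %.1f, v = %.1f\n", u, v);
  return 0;
}
```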
2. Three-dimensional reconstruction process
In this paper, a Kinect is used to capture point cloud data of the scene, and the three-dimensional reconstruction of the scene is completed through the steps of depth image enhancement, point cloud computation and registration, data fusion, and surface generation.
Figure 2-6 Three-dimensional reconstruction flowchart based on a depth sensor
Fig.2-6 Flow Chart of 3D reconstruction based on depth sensor
As the process in Figure 2-6 shows, the first six steps are carried out for each acquired depth image frame until all frames have been processed; finally, texture mapping is performed. Each step is explained in detail below.
2.1 Getting the depth image
The depth image of the scene is captured by a Kinect on the Windows platform, and the corresponding color image can be obtained at the same time. To collect enough images, the same scene must be shot from different angles so that all of its information is covered. This can be done either with a fixed Kinect capturing objects on a rotating platform, or with a moving Kinect capturing a fixed object. Depth sensors are inexpensive and simple to operate, and they acquire real-time depth images of the scene, which greatly facilitates their application.
2.2 Preprocessing
Because of limitations such as device resolution, the depth information contains many defects. To support subsequent applications based on the depth image, image enhancement steps such as denoising and hole repair must be applied to the depth image. As the focus of this paper, the specific processing method is explained in detail in Chapter 4.
2.3 Point Cloud computing
After preprocessing, the depth image carries two-dimensional information whose pixel values are depth measurements: the straight-line distance from the object surface to the Kinect sensor, in millimetres. Based on the camera imaging principle described above, the conversion between the world coordinate system and the image pixel coordinate system can be written as:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f/dx & s & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

The matrix $K$ depends only on $f$, $dx$, $dy$, $u_0$ and $v_0$, i.e. only on the internal structure of the camera, so it is called the camera's intrinsic parameter matrix. Taking the camera coordinate system as the world coordinate system ($R = I$, $t = 0$), the measured depth value is exactly $Z_c$, and $(u, v)$ is the corresponding point on the image plane.
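Conversely, with the camera taken as the world coordinate system, each depth pixel can be back-projected to a 3D point. A minimal sketch, assuming Kinect-like intrinsics (fx = fy = 525, cx = 320, cy = 240) and depth in millimetres:

```cpp
// Sketch: back-project one depth pixel (u, v, depth) to a 3D point in the
// camera (= world) coordinate system. Intrinsics are assumed, not calibrated.
struct Point3f { float x, y, z; };

Point3f depthToPoint(int u, int v, unsigned short depth_mm)
{
  const float fx = 525.0f, fy = 525.0f;   // focal length in pixels (assumed)
  const float cx = 320.0f, cy = 240.0f;   // principal point (assumed)

  Point3f p;
  p.z = depth_mm * 0.001f;                // millimetres -> metres
  p.x = (u - cx) * p.z / fx;              // invert the pinhole projection
  p.y = (v - cy) * p.z / fy;
  return p;
}
```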
2.4 Point Cloud Registration
Multi-frame images taken from different angles each contain a certain common part. To use depth images for three-dimensional reconstruction, the images must be analyzed to solve for the transformation parameters between frames. Depth image registration takes the common part of the scene as a basis and superimposes multi-frame images obtained at different times, angles and illuminations into a unified coordinate system: the corresponding translation vectors and rotation matrices are computed, and redundant information is eliminated. Point cloud registration not only restricts the speed of three-dimensional reconstruction, it also affects the fineness and overall quality of the final model; the performance of the point cloud registration algorithm must therefore be improved.
According to different image input conditions and reconstruction output requirements, the registration of three-dimensional depth information is divided into three methods: coarse registration, fine registration and global registration [48].
(1) Coarse registration
Coarse registration operates on depth images acquired as multiple frames from different angles. First, feature points are extracted from the two frames; these can be explicit features such as lines, corner points or curve curvatures, or features of custom types such as symbols, spin images or axes. Preliminary registration is then achieved from the feature equations. After coarse registration, the source point cloud and the target point cloud are in the same scale (pixel sampling interval) and the same reference coordinate system, and an initial value for matching is obtained from the automatically recorded coordinates.
(2) Fine registration
Fine registration is a deeper level of registration. The coarse registration of the previous step yields a transformation estimate; taking this value as the initial value, fine registration converges iteratively to a more accurate result. Take the classic ICP (Iterative Closest Point) algorithm proposed by Besl and McKay [49] as an example. The algorithm first computes the distances between all points of the initial point cloud and the target point cloud, establishes correspondences between each point and its nearest point in the target cloud, and constructs an objective function from the sum of squared residuals. The error function is minimized by least squares, and the procedure iterates until the mean square error falls below a set threshold. ICP can obtain accurate registration results, which is of great significance for the registration of free-form surfaces. In addition, algorithms such as SA (simulated annealing) and GA (genetic algorithm) also have their own characteristics and scope of use.
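Fine registration with ICP is available directly in PCL. A minimal sketch, in which the point type, iteration cap and convergence threshold are illustrative and the two clouds are assumed to be filled elsewhere:

```cpp
// Sketch: fine registration of a source cloud onto a target cloud
// using PCL's iterative closest point implementation.
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr source, target;  // assumed filled elsewhere

void alignClouds()
{
  pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
  icp.setInputSource(source);                 // cloud to be transformed
  icp.setInputTarget(target);                 // reference cloud
  icp.setMaximumIterations(50);               // iteration cap (illustrative)
  icp.setTransformationEpsilon(1e-8);         // convergence threshold (illustrative)

  pcl::PointCloud<pcl::PointXYZ> aligned;
  icp.align(aligned);                                 // run the ICP iterations
  Eigen::Matrix4f T = icp.getFinalTransformation();   // rotation + translation
  (void)T;
}
```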
(3) Global registration
Global registration uses entire images to compute the transformation matrices directly. Building on the pairwise fine registration results, multi-frame images are registered either in a fixed order or all at once; these two approaches are called sequential registration and simultaneous registration respectively. During registration, the matching error is distributed evenly across the multi-frame images from the various viewpoints, which reduces the cumulative error caused by repeated iterations. It is worth noting that although global registration can reduce error, it consumes a large amount of memory and greatly increases the time complexity of the algorithm.
2.5 Data Fusion
The depth information after registration is still scattered, unordered point cloud data in space, which shows only part of the scene. The point cloud data must therefore be fused to obtain a finer reconstruction model. A volumetric grid is constructed from the initial position of the Kinect sensor; the grid divides the point cloud space into a very large number of tiny cubes called voxels. A surface is modelled implicitly by assigning every voxel an SDF (Signed Distance Field) value, equal to the minimum distance from that voxel to the reconstructed surface: a positive SDF value means the voxel lies in front of the surface, a negative value means it lies behind the surface, and the closer the SDF value is to zero, the closer the voxel is to the real surface of the scene. Although KinectFusion reconstructs scenes efficiently and in real time, the space it can reconstruct is small, mainly because a large amount of memory is used to store the full range of voxels. To solve the problem of voxels occupying too much space, Curless et al. [50] proposed the TSDF (Truncated Signed Distance Field) algorithm, which stores only the few layers of voxels close to the real surface rather than all voxels. This greatly reduces the memory consumption of KinectFusion and reduces redundancy in the model.
Figure 2-7 Point Cloud fusion based on the space body
The TSDF algorithm represents three-dimensional space as a grid of cubes, each of which stores its distance to the object surface. Positive and negative TSDF values indicate, respectively, voxels in front of the surface and occluded voxels behind it, while points on the surface have the value 0; the left side of Figure 2-7 shows one model in the grid cube. When another model enters the cube, fusion is carried out according to formulas (2-9) and (2-10):

$$D_{i+1}(x) = \frac{W_i(x)\,D_i(x) + w_{i+1}(x)\,d_{i+1}(x)}{W_i(x) + w_{i+1}(x)} \tag{2-9}$$

$$W_{i+1}(x) = W_i(x) + w_{i+1}(x) \tag{2-10}$$

Here $d_{i+1}(x)$ is the distance from the new point cloud to the grid cell, $D_i(x)$ is the distance already stored in the cell, and $w_{i+1}(x)$ and $W_i(x)$ are the weights used to fuse distance values of the same cell. As shown on the right side of Figure 2-7, the sum of the two weights becomes the new weight. In the KinectFusion algorithm, the weight of the current point cloud is set to 1.
Since the TSDF algorithm performs a least-squares style optimization and the point cloud fusion takes the weights into account, the algorithm also has an evident denoising effect on the point cloud data.
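Formulas (2-9) and (2-10) amount to a running weighted average per voxel. A minimal sketch of that update, in which the truncation distance and the data types are assumptions:

```cpp
// Sketch: weighted TSDF update for one voxel, following (2-9) and (2-10).
// 'dist' is the signed distance from the new point cloud (frame i+1) to the
// voxel, truncated to [-trunc, +trunc]; tsdf/weight hold the accumulated
// values D_i and W_i.
#include <algorithm>

struct Voxel { float tsdf; float weight; };

void updateVoxel(Voxel& v, float dist, float trunc, float new_weight = 1.0f)
{
  float d = std::max(-trunc, std::min(trunc, dist));   // truncation of the SDF
  v.tsdf   = (v.weight * v.tsdf + new_weight * d)      // (2-9) weighted average
             / (v.weight + new_weight);
  v.weight = v.weight + new_weight;                    // (2-10) accumulate weight
}
```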
2.6 Surface Generation
The purpose of surface generation is to construct the visible contour of the object, and the usual voxel-level methods process the original scalar (gray-level) data directly. Lorensen [51] proposed the classical voxel-level reconstruction algorithm: the MC (Marching Cubes) method. The marching cubes method first stores the data of eight adjacent locations in the data field at the eight vertices of a cubic voxel element. For the two endpoints of one edge of the voxel, if one value is greater than the given constant T and the other is smaller than T, then there must be a vertex of the iso-surface on this edge. The intersections of the voxel's twelve edges with the iso-surface are then computed and the triangular patches inside the voxel are constructed; these patches separate the voxel into a region inside the iso-surface and a region outside it. Finally, the triangular patches of all voxels in the data field are connected to form the iso-surface. Merging the contours of all cubes yields the complete three-dimensional surface.
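The key numerical step in the marching cubes method is locating the iso-surface vertex on an edge whose two endpoint values straddle the threshold T. A linear-interpolation sketch (types and names are illustrative):

```cpp
// Sketch: linear interpolation of the iso-surface crossing on one voxel edge.
// p1/p2 are the edge endpoints, v1/v2 their scalar values, T the iso-value.
struct Vec3 { float x, y, z; };

Vec3 edgeIntersection(const Vec3& p1, const Vec3& p2, float v1, float v2, float T)
{
  float a = (T - v1) / (v2 - v1);       // fraction along the edge (v1 != v2 assumed)
  return { p1.x + a * (p2.x - p1.x),
           p1.y + a * (p2.y - p1.y),
           p1.z + a * (p2.z - p1.z) };
}
```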
3. Performance optimization
The advent of the Kinect and other depth sensors has not only brought changes to entertainment applications but also opened a new direction for scientific research, especially in the field of three-dimensional reconstruction. However, because the three-dimensional reconstruction process involves processing large amounts of dense point cloud data, the computational load is huge, so optimizing the performance of the system is very important. This paper uses GPU (Graphics Processing Unit) based parallel computing to improve the overall operating efficiency.
NVIDIA introduced the GPU concept in 1999. Over the past decade and more, driven by innovation in the hardware industry, the number of transistors on a chip has kept increasing and GPU performance has doubled roughly every six months. The floating-point computing power of the GPU far exceeds that of the CPU, yet its energy consumption is low, making it very cost-effective. GPUs are widely used not only in graphics and image processing but also in video processing, oil exploration, biochemistry, satellite remote sensing data analysis, weather forecasting, data mining and other fields.

As the creator of the GPU, NVIDIA has continued to improve GPU performance and introduced the CUDA architecture in 2007. CUDA (Compute Unified Device Architecture) is a parallel computing program architecture. With CUDA support, users can write programs that exploit NVIDIA GPUs for massively parallel computing; in CUDA the GPU serves as a general-purpose computing device, not just an image processor. In CUDA, the computer's CPU is called the host and the GPU is called the device. Programs run on both sides: the host side mainly handles program flow and the serial computing modules, while the device side specializes in parallel computation. The parallel computation on the device side is written in kernel functions, and the host side launches the parallel computation through the kernel function entry point. During execution, each kernel thread runs the same code but processes different data. Kernel functions are written in an extended C language called CUDA C.

It is important to note that not all operations can be implemented as CUDA parallel computations; only mutually independent computations are suitable. For example, matrix addition and subtraction only combine elements with the same subscript, and elements with different subscripts are unrelated, so it parallelizes well; computing a factorial, by contrast, must multiply all the numbers together, so it cannot be parallelized in this way. CUDA has a three-level thread, block and grid architecture: a computation is usually performed by a single grid, the grid is divided evenly into several blocks, each block consists of multiple threads, and each basic operation is ultimately carried out by a single thread, as shown in Figure 2-8.
Figure 2-8 Cuda model
To further understand the computational process of the CUDA model, take as an example the conversion between a point's depth value and its three-dimensional coordinates given by formula (2-11) presented earlier:
In the formula, the depth value and the intrinsic parameter matrix are known quantities, and the result is the coordinate of the point. The conversion of one point is independent of the conversion of every other point, so the coordinate transformation of all points in the image can be executed in parallel, which significantly increases the overall computation rate. For example, to convert a depth image to three-dimensional coordinates with a single grid, the grid only needs to be divided into a number of blocks, each containing a number of threads, so that each thread operates on one pixel; all coordinate conversion operations are then completed easily.
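In CUDA terms, this conversion maps naturally to one thread per pixel. A minimal kernel sketch, in which the 640×480 image size, the intrinsics and the launch configuration are illustrative assumptions:

```cpp
// Sketch: one CUDA thread converts one depth pixel to a 3D point.
// Intrinsics (fx, fy, cx, cy) and the 640x480 image size are assumed values.
__global__ void depthToPoints(const unsigned short* depth, float3* points,
                              int width, int height,
                              float fx, float fy, float cx, float cy)
{
  int u = blockIdx.x * blockDim.x + threadIdx.x;   // column handled by this thread
  int v = blockIdx.y * blockDim.y + threadIdx.y;   // row handled by this thread
  if (u >= width || v >= height) return;

  int idx = v * width + u;
  float z = depth[idx] * 0.001f;                   // millimetres -> metres
  points[idx] = make_float3((u - cx) * z / fx,
                            (v - cy) * z / fy,
                            z);
}

// Host-side launch: a grid of 16x16-thread blocks covering the whole image.
// dim3 block(16, 16);
// dim3 grid((640 + 15) / 16, (480 + 15) / 16);
// depthToPoints<<<grid, block>>>(d_depth, d_points, 640, 480,
//                                525.f, 525.f, 320.f, 240.f);
```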
Through GPU parallel computing, the performance of three-dimensional reconstruction is greatly improved and real-time input and output is achieved, laying the foundation for applying the Kinect in real production and everyday life.
Summary
First, the basic concepts related to three-dimensional reconstruction were introduced, including the depth image, point cloud data, the four coordinate systems and the transformations between them.
Then, an overview of the three-dimensional reconstruction technology was given.