When reprinting, please credit the source: http://blog.csdn.net/wangyaninglm/article/details/51533549
From:
Shiter, The art of writing programs
Global-optimization stereo matching algorithms such as graph cuts and belief propagation rely on iterative refinement, so their running time is long and they cannot meet the real-time requirement of stereo matching, even though real-time demand is widespread in stereo matching applications. Many local matching algorithms run quickly, but because they only consider cost aggregation inside the matching window, their results are poor: the disparity map is often little more than a sparse map that has to be completed by interpolation. Such results are clearly unsuitable for scenarios such as vehicle navigation and object grasping, which need accurate results and also place requirements on running speed.
1 Local cost aggregation
A window-based local stereo matching algorithm searches for the best matching point under the matching constraints. To obtain the matching cost of the left and right images at a point for disparity d, it takes the average (or some other measure) of the costs of all points in the matching window centered on that point, as shown in Figure 4-1:
Figure 4-1 Cost propagation in local aggregation
We call this process cost aggregation. This region-based matching method uses a similarity measure between local windows to match the spatial coordinates of corresponding primitives, and it works well in regions with clear, continuous detail. Clearly, however, points outside the matching window cannot influence the cost value of the center point, so the aggregated cost has no global character and the global structure of the matching primitives is lost. Mismatches therefore arise very easily in regions with weak texture.
How can cost aggregation capture the global features of the matching primitives, so that the local aggregation method overcomes these shortcomings? In contrast to region-based local window stereo matching, this chapter uses the minimum spanning tree from graph theory to aggregate cost globally over a tree structure.
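The window-based aggregation described in this section can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the (disparity, height, width) layout of the cost volume, and the edge padding are all assumptions made for the example.

```python
import numpy as np

def box_aggregate(cost_volume, radius):
    """Average every disparity slice over a (2*radius+1)^2 window.

    Assumption: cost_volume has shape (disparities, height, width).
    Borders are handled by replicating edge pixels (an arbitrary choice).
    """
    d, h, w = cost_volume.shape
    k = 2 * radius + 1
    padded = np.pad(cost_volume,
                    ((0, 0), (radius, radius), (radius, radius)),
                    mode="edge")
    out = np.zeros((d, h, w), dtype=np.float64)
    # Sum the k*k shifted copies of the volume, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def winner_takes_all(aggregated):
    """Pick, per pixel, the disparity with the lowest aggregated cost."""
    return np.argmin(aggregated, axis=0)
```

Only pixels inside the window contribute to each output value, which is exactly the limitation the text points out: a pixel outside the window has no effect on the aggregated cost.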
2 Bilateral filtering and cost aggregation
The bilateral filter is a filter that can be used for edge-preserving denoising. Put simply, it takes into account both the spatial distance between pixels and the difference in their intensities, which is why it preserves image edges. The filter is controlled by two parameters: one controls the geometric (spatial) distance, the other controls the pixel-value difference.
Figure 4-2 Bilateral filter weights of the central pixel (combined effect of spatial and color weights)
In a traditional Gaussian filter, the weight depends only on the spatial distance between pixels, so the same filtering is applied regardless of the image content. The bilateral filter adds a weight based on the pixel-value difference on top of the Gaussian filter. Formula (4-1) is as follows:
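A commonly used form of the bilateral filter is given here as a sketch. The notation is an assumption: \Omega_p is the support window around pixel p, W_p the normalization factor, and \sigma_s and \sigma_r the spatial and range (color) parameters.

```latex
I^{bf}(p) = \frac{1}{W_p} \sum_{q \in \Omega_p}
  \exp\!\left(-\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}}\right)
  \exp\!\left(-\frac{\lvert I(p) - I(q) \rvert^{2}}{2\sigma_r^{2}}\right) I(q),
\qquad
W_p = \sum_{q \in \Omega_p}
  \exp\!\left(-\frac{\lVert p - q \rVert^{2}}{2\sigma_s^{2}}\right)
  \exp\!\left(-\frac{\lvert I(p) - I(q) \rvert^{2}}{2\sigma_r^{2}}\right)
```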
Formula (4-1) is a weighted average over the image I. Its two parameters determine the amount of filtering: the former controls the weight of the distance information, the latter the weight of the color information. As a result, in regions where pixel intensity varies little, bilateral filtering behaves like Gaussian filtering, while in places with large intensity gradients, such as image edges, the gradient is preserved. In stereo matching this property can replace image segmentation, or serve as a preprocessing step for it, and so reduce the computational load of the core matching algorithm.
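The behavior described above can be checked with a small brute-force bilateral filter. This is a sketch under assumptions: Gaussian kernels for both weights, a single-channel image, and edge-replicated borders; the names `sigma_s` and `sigma_r` are chosen for the example.

```python
import numpy as np

def bilateral_filter(img, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter for a 2-D grayscale image.

    Assumptions: Gaussian spatial and range kernels, edge-replicated borders.
    sigma_s controls the spatial weight, sigma_r the intensity (range) weight.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image: value of pixel (y+dy, x+dx) for every (y, x).
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            w_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            w_r = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * shifted
            den += w_s * w_r
    return num / den  # den >= 1 because the center pixel has weight 1
```

With a small `sigma_r`, pixels across a strong edge receive almost no weight, so the edge survives; with a very large `sigma_r` the range weight is nearly constant and the filter degenerates to Gaussian smoothing.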
Let C_d(p) denote the matching cost of pixel p at disparity level d, and C_d^A(p) the aggregated cost. The bilateral filter of formula (4-1) can then be combined with the cost aggregation.
Here q is a pixel in the support window, and, as in formula (4-1), the two parameters adjust the spatial similarity and the color (grayscale) similarity respectively. In practice the normalization step of the bilateral filter can usually be omitted in this computation, so formula (4-3) can be simplified to:
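Written out under assumed notation, with w_s(p,q) and w_r(p,q) standing for the spatial and color weights of the bilateral filter and \Omega_p for the support window of p, the normalized aggregation and its simplified form would read:

```latex
C_d^{A}(p) = \frac{1}{W_p} \sum_{q \in \Omega_p} w_s(p,q)\, w_r(p,q)\, C_d(q),
\qquad
W_p = \sum_{q \in \Omega_p} w_s(p,q)\, w_r(p,q)

% Simplified form with the normalization step omitted:
C_d^{A}(p) = \sum_{q \in \Omega_p} w_s(p,q)\, w_r(p,q)\, C_d(q)
```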
3 Minimum Spanning Tree
The minimum spanning tree is also called the minimum-weight spanning tree. In a given graph G = (V, E), (u,v) denotes the edge connecting vertex u and vertex v, and w(u,v) denotes the weight of this edge. If there is a subset T of E that contains no cycle, connects all vertices, and has the smallest total weight w(T), then T is a minimum spanning tree of G.
When the image is viewed as a 4-connected graph, the weight of the edge between two neighboring pixels is defined as the difference of their gray values (or another measure, such as color information). The minimum spanning tree built under this definition matches the goal of adding global attributes to the matching window.
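The construction described above can be sketched with Kruskal's algorithm and a union-find structure. This is an illustrative sketch only: the source does not say which MST algorithm is used, and the function name and the row-major node numbering are assumptions.

```python
import numpy as np

def grid_mst(gray):
    """Minimum spanning tree of the 4-connected pixel grid (Kruskal).

    Assumption: pixel (y, x) is node y * width + x, and an edge weight is
    the absolute gray-value difference of its two end pixels.
    Returns a list of (weight, u, v) tree edges.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    edges = []
    for y in range(h):
        for x in range(w):
            u = y * w + x
            if x + 1 < w:   # edge to the right neighbor
                edges.append((abs(gray[y, x] - gray[y, x + 1]), u, u + 1))
            if y + 1 < h:   # edge to the bottom neighbor
                edges.append((abs(gray[y, x] - gray[y + 1, x]), u, u + w))
    edges.sort()

    parent = list(range(h * w))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for wt, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:        # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((wt, u, v))
    return tree
```

Because low-weight edges are taken first, pixels of similar gray value end up close together in the tree, and strong intensity edges are crossed only where the tree has to stay connected.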
4 Cost aggregation based on the minimum spanning tree
When computing the cost of the two images to be matched at disparity d, the region-based window aggregation method does not let points outside the window influence the cost value of a point. The aim here is to give the aggregated cost a global attribute, so that every point in the image passes some amount of support to the point in question: pixels that are far away or differ strongly in color pass little support, while pixels that are close and differ little in color pass a large amount.
As noted for the minimum spanning tree structure, when the image is viewed as a 4-connected graph, the weight of the edge between two neighboring pixels can be defined as the difference of their gray values. The MST built under this definition is exactly what we want: it effectively adds global properties to the local algorithm without increasing the amount of computation.
Cost aggregation based on the minimum spanning tree is very simple. Once a minimum spanning tree has been generated for the image to be matched, there are two main aggregation passes:
1. Bottom-up aggregation, a traversal from the leaf nodes to the root.
2. Top-down aggregation, a traversal from the root to the leaf nodes.
Computing the aggregated cost of every node requires only two traversals of the spanning tree (4-).
Figure 4-3 Two cost aggregation schemes
Let S(p,q) be the similarity between two points p and q, D(p,q) the distance between them (the length of the shortest path between the two points in the MST), and C_d^A the aggregated cost. The similarity is governed by a single parameter σ, and substituting it for the two weights in the bilateral-filtering result of formula (4-4) gives the aggregated cost over the tree:
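Following the non-local aggregation of Yang [11], these two relations can be written as below. The symbol names are assumptions carried over from that paper: C_d(q) is the matching cost of pixel q at disparity d, and the sum runs over all pixels q of the image.

```latex
S(p,q) = S(q,p) = \exp\!\left(-\frac{D(p,q)}{\sigma}\right)

C_d^{A}(p) = \sum_{q} S(p,q)\, C_d(q)
           = \sum_{q} \exp\!\left(-\frac{D(p,q)}{\sigma}\right) C_d(q)
```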
Note that formula (4-4) has two filter control parameters. Because the minimum spanning tree itself carries a distance measure, and pixels that are closer in the tree are more similar, formula (4-7) uses only one parameter to control the similarity.
The following two subsections describe how the aggregated cost is computed in each of the two passes.
4.1 Bottom-up aggregation (Leaf to Root)
Figure 4-4 Bottom-up (leaf to root) aggregation
Bottom-up (leaf to root) aggregation accumulates cost from the leaf nodes toward the root node. Take Figure 4-4 as an example and assume it shows a minimum spanning tree whose edge values are the weights. To compute the aggregated cost of node V4, we can directly add up the aggregated values of its child nodes (such as V3), each multiplied by the similarity of the connecting edge, together with V4's own cost. Because V4 is the root node, there is no parent node whose effect has to be considered. The upward arrows represent the aggregated cost passed from the leaves to the current node. The aggregated cost of V4 can be expressed as formula (4-8):
From formula (4-8) the general bottom-up rule can be derived: the aggregated cost of a node is its own matching cost plus the sum, over its child nodes, of each child's aggregated cost multiplied by the similarity of the connecting edge:
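In the notation of Yang [11], assumed here, P(v_c) denotes the parent of node v_c, C_d(v) the matching cost of node v at disparity d, and the upward arrow marks a bottom-up aggregated value. The rule then reads:

```latex
C_d^{A\uparrow}(v) = C_d(v) + \sum_{P(v_c) = v} S(v, v_c)\, C_d^{A\uparrow}(v_c)
```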
If node v is a leaf node, it has no children, so its bottom-up aggregated cost is simply its own matching cost.
Because of the tree structure, each level of the bottom-up pass only needs the products contributed by its direct child nodes: the aggregated cost of a child already contains the influence of the grandchildren and all further descendants. This greatly reduces the amount of computation.
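The bottom-up pass can be sketched on a tiny tree. Everything concrete here is hypothetical: the dictionary-based tree representation, the node labels, and the assumption that the similarity of an edge of weight w is exp(-w / sigma).

```python
import math

def bottom_up(cost, children, weight, root, sigma):
    """Leaf-to-root aggregation for one disparity level.

    cost:     dict node -> matching cost at this disparity
    children: dict node -> list of child nodes (leaves may be absent)
    weight:   dict (parent, child) -> edge weight in the tree
    Returns dict node -> bottom-up aggregated cost.
    """
    # Pre-order listing: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    up = {}
    # Reversed pre-order guarantees children are finished before their parent.
    for v in reversed(order):
        up[v] = cost[v] + sum(
            math.exp(-weight[(v, c)] / sigma) * up[c]
            for c in children.get(v, [])
        )
    return up
```

For example, with root 4, children {4: [3, 5], 3: [1, 2]}, costs equal to the node labels and every edge similarity equal to 0.5, node 3 aggregates to 3 + 0.5*1 + 0.5*2 = 4.5 and the root to 4 + 0.5*4.5 + 0.5*5 = 8.75. Each node is visited once, so the pass is linear in the number of pixels.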
4.2 Top-down aggregation (Root to leaf)
In Figure 4-4, V4 has no parent node, which is a special case. What if we want to compute the aggregated cost of V3? Clearly it is not enough to consider only V1 and V2; the influence of V4, that is, the influence from top to bottom, must be considered as well, as shown in Figure 4-5:
Figure 4-5 Top-down (root to leaf) aggregation
We can now treat V3 as the root node, so that its parent node V4 becomes one of its child nodes, and use the same method: add the aggregated cost of V4 multiplied by the similarity of the connecting edge. However, the aggregated cost of V4 already includes the contribution of V3, so that contribution (V3's bottom-up aggregated cost weighted by the same edge similarity) must first be subtracted from V4's value. The aggregated cost of V3 can then be expressed as:
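Under assumed notation, where the upward arrow marks a bottom-up aggregated value and C_d^A a final aggregated value, the step just described corresponds to:

```latex
C_d^{A}(V_3) = C_d^{A\uparrow}(V_3)
  + S(V_4, V_3) \left[ C_d^{A}(V_4) - S(V_3, V_4)\, C_d^{A\uparrow}(V_3) \right]
```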
Here the top-down aggregated value is the final aggregated cost. It is computed layer by layer from the top down, which likewise reduces the amount of computation. For the more general case, in which cost is aggregated from the root node toward the leaf nodes, the general form can be derived from formula (4-10):
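With P(v) denoting the parent of node v (notation assumed, as in Yang [11]), replacing V3 by an arbitrary non-root node v and V4 by P(v) gives:

```latex
C_d^{A}(v) = C_d^{A\uparrow}(v)
  + S(P(v), v) \left[ C_d^{A}(P(v)) - S(v, P(v))\, C_d^{A\uparrow}(v) \right]
```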
To simplify:
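Since the similarity is symmetric, collecting terms gives C_d^A(v) = S(P(v),v) * C_d^A(P(v)) + [1 - S(v,P(v))^2] * C_d^A↑(v), with the root keeping its bottom-up value. The sketch below runs both passes on a small tree. The data layout is hypothetical (a parent map, per-node weight of the edge to the parent, and a root-first node order), and the similarity is assumed to be exp(-w / sigma).

```python
import math

def aggregate_on_tree(cost, parent, weight, order, sigma):
    """Two-pass tree aggregation for one disparity level.

    cost:   dict node -> matching cost
    parent: dict node -> parent node (None for the root)
    weight: dict node -> weight of the edge to its parent
    order:  nodes listed root-first (every parent precedes its children)
    Returns dict node -> final aggregated cost.
    """
    sim = {v: math.exp(-weight[v] / sigma)
           for v in parent if parent[v] is not None}

    # Pass 1, leaf to root: push each finished subtree sum into the parent.
    up = dict(cost)
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            up[p] += sim[v] * up[v]

    # Pass 2, root to leaf: combine the parent's final value with the
    # node's own bottom-up value using the simplified formula.
    final = {}
    for v in order:
        p = parent[v]
        if p is None:
            final[v] = up[v]
        else:
            final[v] = sim[v] * final[p] + (1.0 - sim[v] ** 2) * up[v]
    return final
```

On the tree with root 4, children 3 and 5 under the root, children 1 and 2 under node 3, costs equal to the node labels and all edge similarities 0.5, node 3 ends at 0.5*8.75 + 0.75*4.5 = 7.75, which equals the direct sum over all nodes weighted by their path similarity to node 3.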
5 General parallel processing for stereo matching
Programming models for parallel program development fall into two main categories: 1. the message-passing model, 2. the shared-memory model. This article uses the shared-memory model with a coarse-grained parallel partition over the channels of the color image: the filtering algorithm and the construction of the minimum spanning tree are processed in parallel within each channel, combined with instruction-level vectorization based on the processor's SIMD extensions.
The main flow is shown in the flowchart below:
Figure 4- Parallelization of the stereo matching process
First, the whole pipeline of the global stereo matching algorithm based on the minimum spanning tree is modeled and analyzed computationally, and the compute-intensive tasks are identified and extracted. The bilateral filtering is optimized with reference to [32].
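The coarse-grained, per-channel partition can be illustrated as follows. This is only an analogy: it uses Python's `ThreadPoolExecutor` as a stand-in for the shared-memory threading the text describes, and `process_channel` is a hypothetical placeholder (a 3x3 mean filter) for the real per-channel filtering and tree construction.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def process_channel(channel):
    """Hypothetical per-channel task: 3x3 mean filter with edge padding."""
    h, w = channel.shape
    padded = np.pad(channel.astype(np.float64), 1, mode="edge")
    total = sum(padded[dy:dy + h, dx:dx + w]
                for dy in range(3) for dx in range(3))
    return total / 9.0

def process_channels_parallel(image, worker=process_channel):
    """Run `worker` on every color channel, one thread per channel."""
    channels = [image[:, :, c] for c in range(image.shape[2])]
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        # map() preserves channel order in the results.
        results = list(pool.map(worker, channels))
    return np.stack(results, axis=2)
```

The channels share no intermediate state, so each one can be handed to its own thread without synchronization. How much speed-up threads give in Python depends on whether the per-channel work releases the interpreter lock, so the sketch shows the partitioning, not the performance.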
5.1 OpenMP Thread Parallelization
OpenMP is a shared-memory parallel programming system that provides a set of compiler directives (annotations placed in the source code). Today's mainstream x86 desktop processors from Intel and AMD, as well as ARM-based processors, support OpenMP well. As the mainstream shared-memory model, it is supported by almost all commercial compilers and has good portability.
In essence, once the compiler's OpenMP option is enabled, the same code is portable across these platforms.
5.2 General-purpose processor instruction optimization (SIMD vectorization calculation)
Almost all processor vendors have added multimedia extensions to their processor products. Parallel computing on graphics processors requires additional hardware investment, and exchanging data with main memory can be time-consuming. Multimedia extensions, by contrast, are usually built into the processor as a vector unit, and the corresponding instruction set takes the form of single instruction, multiple data (SIMD) instructions.
The performance advantage of SIMD can be seen with an addition instruction. On a single instruction, single data (SISD) CPU, after the addition instruction is decoded, the execution unit first accesses memory to obtain the first operand, then accesses memory again to obtain the second operand, and only then performs the addition. On a SIMD CPU, after the instruction is decoded, several execution units access memory at the same time and obtain all operands at once. This makes SIMD particularly suitable for data-intensive operations such as multimedia applications.
SIMD instructions can speed up programs written in languages such as C and Java. Vector instructions work in parallel across data elements, allowing the host to process large amounts of data quickly. This is a boon for social media and big data workloads, but it does not seem to be much help for system programmers facing ordinary workloads.
SIMD instructions increase throughput in a number of ways. Most machine instructions overwrite one of their input operands with the result, whereas most SIMD instructions take two input registers and store the result in a third register. This means that programmers spend less time juggling registers.
In the IBM vector facility described here, a vector register is 128 bits long. The first 16 vector registers overlap the 64-bit floating-point registers (FPRs), so changing an FPR also destroys all bytes of the corresponding vector register. There are some special rules for preserving vector registers across program calls, as detailed in IBM's Assembler Services guide.
The SIMD vector instructions cover arithmetic operations and the floating-point modes. There are also string-handling instructions and instructions for loading and storing data.
Reference documents
[11] Yang Q. A non-local cost aggregation method for stereo matching[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). 2012: 1402-1409.
[12] Yang Q, Ji P, Li D, et al. Fast stereo matching using adaptive guided filtering[J]. Image and Vision Computing, 2014, 32(3): 202-211.
[13] Yang Q. Hardware-efficient bilateral filtering for stereo matching[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 1026-1032.
[14] Yang Q. Stereo matching using tree filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(4): 834-846.
Introduction of real-time stereo matching algorithm based on minimum spanning tree