3D-HEVC Coding Framework
The 3D-HEVC coding structure is an extension of HEVC: the texture video and the depth map of each viewpoint are coded essentially with the HEVC framework, on top of which a number of new coding tools are added to make it better suited to depth and multi-view coding.
Figure 1 3D-HEVC Coding structure
As shown, the 3D-HEVC codec structure is similar to that of MVC. All of the input video pictures and depth maps in the figure show the same scene at the same time instant, captured from different positions, and together they form an access unit. Within an access unit, the independent viewpoint (base view) is coded first, followed by its depth map, and then the video pictures and depth maps of the other viewpoints. In principle, the pictures of each viewpoint, both the video picture and the depth map, can be coded with the HEVC framework, and all of the resulting bitstreams are multiplexed into a single 3D bitstream.
The independent viewpoint is coded with the unmodified HEVC coding structure. Since its coding does not depend on any other viewpoint, its portion of the bitstream can be extracted separately to form a 2D bitstream from which 2D video can be recovered; in this sense 3D-HEVC is compatible with 2D video coding. The other viewpoints and the depth maps are coded with a modified HEVC coding structure. The red arrows in Figure 1 indicate where the similarity between viewpoints is exploited to remove inter-view redundancy and improve coding performance.
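As a rough illustration of this coding order, the sketch below walks one access unit through the process: base-view texture first, then base-view depth, then each dependent view. The View container, the encode_texture/encode_depth placeholders and the string "packets" are hypothetical stand-ins for the actual HEVC/3D-HEVC encoding routines, not the reference software API.

```python
# Illustrative sketch of the 3D-HEVC coding order within one access unit:
# base-view texture first, then base-view depth, then each dependent view's
# texture and depth. All names here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class View:
    texture: object          # texture picture of this view at the current instant
    depth: object            # depth map of this view at the current instant
    is_base: bool = False    # True for the independent (base) viewpoint

def encode_texture(pic, inter_view_refs):
    # placeholder: run (possibly extended) HEVC texture coding
    return f"tex-bits({pic}, refs={len(inter_view_refs)})"

def encode_depth(pic, inter_view_refs):
    # placeholder: run HEVC coding with the depth-map tools enabled
    return f"dep-bits({pic}, refs={len(inter_view_refs)})"

def encode_access_unit(views):
    """Encode all pictures of one time instant and return the multiplexed packets."""
    bitstream, coded = [], []          # 'coded' collects pictures usable as inter-view references
    for view in views:                 # views must be ordered with the base view first
        refs = [] if view.is_base else coded
        bitstream.append(encode_texture(view.texture, refs))
        bitstream.append(encode_depth(view.depth, refs))
        coded.append(view.texture)
    return bitstream                   # base-view packets alone form an extractable 2D HEVC stream

au = [View("T0", "D0", is_base=True), View("T1", "D1"), View("T2", "D2")]
print(encode_access_unit(au))
```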
Non-independent viewpoint coding technology
When coding a non-independent viewpoint, 3D-HEVC not only uses all of the tools available for independent-view coding, but also extends the HEVC coding tools for 3D so that they are better suited to multi-view coding. For example, information of an already-coded independent viewpoint can be used to predict information of the currently coded viewpoint, which reduces inter-view redundancy and improves coding efficiency. The main extensions are disparity-compensated prediction, inter-view motion prediction and inter-view residual prediction.
Disparity-Compensated Prediction
Disparity-compensated prediction (DCP) is an important coding tool for non-independent viewpoints. Disparity compensation and motion compensation are similar concepts, and both can be understood as forms of inter-frame prediction, but their reference frames differ fundamentally: the reference frame of motion-compensated prediction (MCP) is an already-coded frame of the same viewpoint at a different time instant, whereas the reference of DCP is an already-coded frame of a different viewpoint at the same time instant. Because DCP is so similar to MCP, it is added alongside MCP as a prediction mode in the reference picture list. No changes are made to the block-level syntax or decoding process; only high-level syntax elements are extended, so that already-coded pictures of the same access unit can be inserted into the reference picture list.
Fig. 2 Disparity-compensated prediction and motion-compensated prediction
As shown, the reference picture index (the value r) distinguishes MCP from DCP: r = 1 indicates DCP, and the other values indicate MCP.
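The sketch below illustrates how DCP can reuse the MCP machinery purely through the reference picture list: the dependent view's list holds a temporal reference at index 0 and a same-instant picture of another view at index 1, and the same compensation routine serves both modes. The data layout and function names are illustrative assumptions, not 3D-HEVC syntax.

```python
# Sketch: DCP as "MCP with a different reference". The reference index alone
# decides whether a block is motion- or disparity-compensated.

def motion_compensate(ref_pic, block_pos, vector):
    """Copy a 4x4 block at block_pos + vector from ref_pic (integer-pel, no filtering)."""
    (x, y), (dx, dy) = block_pos, vector
    return [row[x + dx : x + dx + 4] for row in ref_pic[y + dy : y + dy + 4]]

def predict_block(ref_list, ref_idx, block_pos, vector):
    # The same compensation routine serves both modes; only the reference differs.
    mode = "DCP" if ref_list[ref_idx]["inter_view"] else "MCP"
    pred = motion_compensate(ref_list[ref_idx]["pixels"], block_pos, vector)
    return mode, pred

pic = [[(x + y) % 255 for x in range(16)] for y in range(16)]
ref_list = [
    {"pixels": pic, "inter_view": False},  # r = 0: temporal reference -> MCP
    {"pixels": pic, "inter_view": True},   # r = 1: same-instant other view -> DCP
]
print(predict_block(ref_list, 0, (4, 4), (1, 0))[0])   # MCP with a motion vector
print(predict_block(ref_list, 1, (4, 4), (3, 0))[0])   # DCP with a disparity vector
```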
Inter-view motion prediction
Multi-view video consists of the same scene captured at the same time by multiple cameras from different angles, so the motion of an object is similar across viewpoints. The motion information of the current viewpoint can therefore be predicted from the motion of a viewpoint that has already been coded at the same time instant.
One simple way to predict motion between viewpoints is to use a single constant disparity vector for all blocks of a picture. To determine the correspondence between the current block and its counterpart in the reference viewpoint more effectively, depth map information can also be used to predict the relation between the current viewpoint and the reference viewpoint more accurately.
Fig. 3 Derivation of the motion parameters of the current coded viewpoint from the motion parameters of the reference viewpoint
As shown, if the depth map of the current picture is given or can be estimated, the maximum depth value of the current coding block is converted into a disparity vector. This disparity vector is added to the centre position x of the current block, yielding the position xR in the reference viewpoint. If the block at xR was coded with motion-compensated prediction, its motion vectors can serve as a reference for the motion information of the current block. Similarly, the disparity vector derived from the maximum depth value of the current block can itself be used for DCP.
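A minimal sketch of this derivation is given below. The linear scale/offset depth-to-disparity conversion is a simplification of the camera-parameter-based mapping, and the dictionary used to look up the reference view's motion field is purely illustrative.

```python
# Sketch of inter-view motion prediction: the maximum depth value of the current
# block gives a disparity, the disparity locates xR in the already-coded reference
# view, and that block's motion vector (if coded with MCP) becomes a candidate.

def depth_to_disparity(depth_value, scale, offset, shift):
    """Simplified linear mapping from a depth sample to a horizontal disparity."""
    return (scale * depth_value + offset) >> shift

def inter_view_motion_candidate(cur_block, depth_block, ref_view_motion, scale, offset, shift):
    # 1. Convert the maximum depth value of the current block into a disparity vector.
    disparity = depth_to_disparity(max(max(row) for row in depth_block), scale, offset, shift)
    # 2. Add the disparity to the block centre x to obtain the position xR in the reference view.
    cx, cy = cur_block["center"]
    x_r = (cx + disparity, cy)
    # 3. If the reference-view block covering xR was coded with MCP, reuse its motion vector.
    ref_info = ref_view_motion.get(x_r)          # motion field of the reference view, keyed by position
    if ref_info is not None and ref_info["mode"] == "MCP":
        return ref_info["mv"], disparity         # motion candidate, plus the disparity (usable for DCP)
    return None, disparity

depth_block = [[60, 64], [70, 72]]                          # depth samples of the current block
ref_view_motion = {(12, 4): {"mode": "MCP", "mv": (2, -1)}}
print(inter_view_motion_candidate({"center": (8, 4)}, depth_block,
                                  ref_view_motion, scale=4, offset=0, shift=6))
# -> ((2, -1), 4): the reference view's motion vector becomes a candidate for the current block
```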
Inter-view residual prediction
The motion information and residual information of already-coded pictures in the same access unit can be used to improve the coding performance of non-independent viewpoints. To exploit the inter-view redundancy of the residual, a flag is added to the syntax elements of the coding block to indicate whether the block uses inter-view residual prediction. The process is similar to that of inter-view motion prediction (a small numeric sketch follows the three steps below):
1. First, the maximum depth value is converted into a disparity vector, as in Figure 3.
2. Then, the corresponding position in the reference viewpoint is located with this disparity vector and its residual information is fetched.
3. Finally, only the difference between the residual of the current block and the predicted residual is coded. If the disparity points to a sub-pixel position, the residual of the reference viewpoint must first be interpolated by filtering.
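The sketch below runs these three steps on a small numeric example, assuming the disparity has already been derived from the depth maximum, that it is integer-pel (so no interpolation filter is needed) and that the weighting factor is 1; all arrays and names are illustrative.

```python
# Sketch of inter-view residual prediction for one 2x2 block: the disparity locates
# the corresponding block in the reference view, that block's residual serves as a
# predictor, and only the difference is coded.

def block_at(picture, x, y, size=2):
    return [row[x:x + size] for row in picture[y:y + size]]

def residual_difference(cur_residual, ref_residual_pic, block_pos, disparity):
    # Step 2: fetch the residual of the corresponding block in the reference view.
    x, y = block_pos
    ref_res = block_at(ref_residual_pic, x + disparity, y)
    # Step 3: code only the difference between the current and the predicted residual.
    return [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur_residual, ref_res)]

# Reference-view residual picture (available in reconstructed form at the decoder).
ref_residual_pic = [[0, 1, 3, 2],
                    [0, 2, 4, 1],
                    [1, 0, 2, 2],
                    [0, 0, 1, 3]]
cur_residual = [[3, 2],     # residual of the current block after MCP/DCP
                [4, 2]]
disparity = 2               # step 1: assumed already derived from the depth maximum
print(residual_difference(cur_residual, ref_residual_pic, (0, 0), disparity))
# -> [[0, 0], [0, 1]]: the small difference signal that is actually transformed and coded
```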
Fig. 4 Inter-view residual prediction structure
As shown, Dc denotes the current coding block in the current viewpoint (View 1); Bc and Dr denote, respectively, the corresponding block in the reference viewpoint (View 0) at the same time instant (Tj) and the corresponding block in the same viewpoint (View 1) at a different time instant (Ti); VD is the motion information between Dc and Dr. Since Bc and Dc are projections of the same object at the same time instant from different viewpoints, the motion information of the two blocks should be the same. Therefore the temporal prediction block Br of Bc can be located with VD, and the residual of Bc obtained with the motion information VD can in turn be mapped onto Dc, scaled by a weighting factor, as a residual prediction.
Depth map coding
In general, all of the coding tools used for video pictures can also be applied to depth maps, but HEVC is designed to code video sequences as well as possible and is not optimal for depth maps. In contrast to video sequences, depth maps are characterized by large areas of constant value and sharp edges.
For intra coding of depth pictures, 3D-HEVC adds four new modes on top of ordinary video coding, divided into two categories: wedgelet partitioning (Wedgelets), which splits a block along a straight line, and contour partitioning (Contours), which allows an arbitrarily shaped split. In both cases a depth block is divided into two non-rectangular regions, each of which is represented by a constant value. To represent this partition, at least two kinds of parameters must be determined: the partition information indicating which region each sample belongs to, and the constant value of each region.
Figure 5 Wedgelet partition mode
As shown, the main difference between wedgelet partitioning and contour partitioning is the way the block is split. In wedgelet partitioning, the two regions of a depth block are separated by a straight line; the two regions are denoted P1 and P2, and the partition line is described by its start position S and end position E. As can be seen from Figure 5, for a continuous (analog) signal (left), the partition line can be represented by a linear function. The middle diagram shows the partition of the discrete signal, a sample matrix of size uB × vB, where the start point S and the end point E correspond to border samples of the matrix and determine the position of the partition line. For wedgelet partitioning, the partition pattern is stored during coding: the stored information is a uB × vB matrix whose elements are binary values indicating whether the corresponding sample of the current block belongs to P1 or P2. The diagram on the right shows the resulting partition pattern, where the white part represents region P1 and the black part represents region P2.
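The sketch below builds such a binary partition pattern geometrically from a start point S and an end point E on the block border, assigning each sample to P1 or P2 with a line-side test. This only illustrates what the pattern means; the reference software enumerates and signals wedgelet patterns by other means.

```python
# Sketch of a wedgelet partition pattern: given border points S and E of a uB x vB
# block, every sample is assigned to region P1 or P2 depending on which side of the
# line S-E it lies.

def wedgelet_pattern(ub, vb, s, e):
    """Return a ub x vb binary matrix: 0 for region P1, 1 for region P2."""
    (sx, sy), (ex, ey) = s, e
    pattern = []
    for y in range(vb):
        row = []
        for x in range(ub):
            # Sign of the cross product tells on which side of the line S->E the sample lies.
            side = (ex - sx) * (y - sy) - (ey - sy) * (x - sx)
            row.append(1 if side > 0 else 0)
        pattern.append(row)
    return pattern

# Example: an 8x8 block with a partition line from the top border to the right border.
for row in wedgelet_pattern(8, 8, s=(3, 0), e=(7, 5)):
    print("".join(str(v) for v in row))
```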
Figure 6 Contour partition mode
As shown, the partition line of a contour partition cannot be represented by a geometric function as in the wedgelet case; P1 and P2 can take arbitrary shapes and may even consist of several parts. Apart from this, contour partitioning and wedgelet partitioning are represented very similarly by their partition patterns.
In addition to the partition information, the depth value of each partitioned region must also be transmitted. The value of each region is a constant, and the best choice is the mean of the original depth samples within that region.
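Continuing the previous sketch, the following computes the constant value of each region as the mean of the original depth samples belonging to it; the function and variable names are again illustrative.

```python
# Sketch of the constant partition values: given the original depth block and a
# binary partition pattern, each region is represented by the mean of its own
# original depth samples, which is the best constant in the mean-squared-error sense.

def constant_partition_values(depth_block, pattern):
    sums, counts = [0, 0], [0, 0]
    for d_row, p_row in zip(depth_block, pattern):
        for d, p in zip(d_row, p_row):
            sums[p] += d
            counts[p] += 1
    return [s // c if c else 0 for s, c in zip(sums, counts)]   # [value of P1, value of P2]

depth_block = [[80, 80, 82, 120],
               [80, 81, 118, 121],
               [80, 117, 119, 122],
               [115, 118, 120, 123]]
pattern = [[0, 0, 0, 1],
           [0, 0, 1, 1],
           [0, 1, 1, 1],
           [1, 1, 1, 1]]
print(constant_partition_values(depth_block, pattern))   # -> [80, 119] (integer means)
```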
According to how the partition is derived and which information is transmitted, the new depth-map intra coding modes are therefore divided into four types:
1. Explicit wedgelet signalling: the encoder determines the best matching wedgelet partition and transmits the partition information in the bitstream; using this transmitted information, the decoder can reconstruct the block signal.
2. Intra-predicted wedgelet partition: the wedgelet partition of the current block is predicted from neighbouring, already intra-coded blocks, and a refinement value is transmitted.
3. Inter-component wedgelet prediction: the partition information of the current block is derived from the reconstructed co-located block, i.e. the block at the same position as the current coding block in the associated picture.
4. Inter-component contour prediction: a partition into two arbitrarily shaped regions is derived from the reconstructed co-located block.
3D-HEVC Video Encoding technology