Kinect for Windows SDK Development Primer (19): Kinect Fusion

Original: http://www.cnblogs.com/yangecnu/p/3428647.html

Kinect Fusion was introduced in the Kinect for Windows SDK 1.7 and was improved and enhanced in SDK 1.8. It lets us use the Kinect for Windows sensor to perform three-dimensional geometric reconstruction of real-world scenes, and it currently supports exporting the .obj and .stl three-dimensional formats. Kinect Fusion can model objects in three dimensions in real time on GPU-accelerated machines. Compared with traditional 3D modeling approaches, its greatest advantage is that it is quick and easy.

Kinect Fusion can be used in industrial design, 3D printing, game production, medical education and other fields.

The figure in the original post shows the Kinect Fusion workflow. The depth data acquired by the Kinect sensor is initially full of holes; by moving the sensor to scan the object for a few seconds, a static scene smooth enough to reconstruct is captured, producing a point cloud and a 3D surface model.

1. Hardware Requirements

Kinect Fusion places high demands on computer hardware. It can process data on a DirectX 11-compatible GPU using C++ AMP, or it can process data on the CPU; which mode is used is determined by the reconstruction processor type set when the reconstruction volume is created. The CPU mode is suitable for offline processing, and only recent DirectX 11-compatible GPUs support real-time, interactive reconstruction.
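As a rough sketch (the variable names are mine, not from the original post), the choice looks like this when the reconstruction volume is created with the managed Fusion API from the developer toolkit; the full initialization appears in section 3:

// GPU mode (C++ AMP on a DirectX 11 card) for real-time, interactive reconstruction;
// ReconstructionProcessor.Cpu is the offline alternative.
Reconstruction volume = Reconstruction.FusionCreateReconstruction(
    volumeParameters,               // size of the reconstruction cube (see section 3)
    ReconstructionProcessor.Amp,    // or ReconstructionProcessor.Cpu
    -1,                             // device index; -1 lets the SDK pick a device
    Matrix4.Identity);              // initial camera pose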

The minimum configuration for GPU-based reconstruction is a graphics card that supports DirectX 11; without one, Kinect Fusion cannot perform GPU reconstruction. At present, hardware with an NVIDIA GeForce GTX 560 or AMD Radeon 6950, or a comparable or better graphics card, can achieve real-time interactive three-dimensional reconstruction.

The officially recommended configuration is a desktop machine with a multi-core CPU at 3 GHz or above and a discrete graphics card with 2 GB of dedicated memory. A laptop with a DirectX 11-capable graphics card can also be used, but it will run much slower than a comparable desktop. In general, processing 30 frames per second is supported, which gives very smooth tracking and modeling.

2. How Kinect Fusion Works

Kinect Fusion reconstructs a smooth, single-frame surface model of an object by fusing depth image data acquired from multiple angles. As the sensor moves, the camera's pose, that is, its position and orientation, is recorded. Because the pose of each frame and the relationship between frames are known, depth data collected from different angles can be fused into a single reconstructed voxel cube. You can imagine a huge virtual cube in space enclosing the real-world scene; as the sensor moves, depth data is continuously added to it.

The figure in the original post shows the Kinect Fusion processing pipeline, which proceeds as follows:

    • The first step is conversion of the depth image data. The SDK converts the raw depth frame obtained from the Kinect into floating-point data in meters, then optimizes it and, using the camera's coordinate information, converts the floating-point data into a point cloud oriented consistently with the Kinect camera. The surface information of these points can be obtained using the AlignPointClouds function.
    • The second step is to compute the global camera pose, that is, the camera's position and orientation, using an iterative registration (alignment) algorithm that runs continuously while the camera moves, so the system always knows the camera's current pose relative to the starting frame. Kinect Fusion has two registration algorithms. The first, NuiFusionAlignPointClouds, registers a point cloud computed from the reconstruction against a point cloud obtained from the Kinect depth data; it can also be used on its own, for example to register data from different views of the same scene. The second, AlignDepthToReconstruction, can achieve higher-precision tracking when processing the reconstruction cube, but it may be less robust to moving objects in the scene. If tracking is interrupted, the camera must be re-aligned with the last tracked camera pose before tracking can continue.
    • The third step is to fuse the depth data produced from the known camera pose into the cube that represents the scene in the camera's field of view. This fusion is performed frame by frame and continuously; a smoothing algorithm removes noise, and certain dynamic changes in the scene, such as small objects being added or removed, are also handled. As the sensor moves, the surface of the object is observed from different viewpoints, any gaps or holes not visible in earlier frames are filled in, and as the camera gets closer to the object its surface is continuously refined with new, higher-precision data.
    • Finally, the reconstruction cube is ray cast from the sensor's viewpoint, and the resulting point cloud can be shaded to produce a rendered image of the reconstructed volume.

Kinect Fusion tracks objects using only the depth data stream produced by the Kinect sensor. This tracking relies heavily on variation in the depth image data: it combines what it sees and computes the difference in the sensor's position. If you point the Kinect at a flat wall or at objects with very little relief, tracking may fail. Scenes work best when objects are scattered throughout them, so if tracking fails while scanning a scene with Kinect Fusion, try adding objects so the depth data has more variation.

Kinect Fusion has two tracking algorithms, implemented by the AlignDepthFloatToReconstruction and AlignPointClouds functions; both can be used to track the camera pose. Using AlignDepthFloatToReconstruction while building the reconstruction cube may give better tracking accuracy. AlignPointClouds, in contrast, can be used on its own to align two point clouds without a reconstruction cube.

3. Related APIs

The above describes how Kinect Fusion works. Using the related APIs in the SDK, we can have Kinect Fusion perform three-dimensional reconstruction of real-life scenes. The overall process is as follows.

First comes initialization. During initialization, Kinect Fusion establishes the world coordinate system used for modeling and constructs a static virtual cube around the real scene being scanned; during modeling we only care about the part of the real scene that lies inside this virtual cube.
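The following is a minimal initialization sketch, assuming the Microsoft.Kinect.Toolkit.Fusion namespace from the developer toolkit and an illustrative volume of 384 x 384 x 384 voxels at 256 voxels per meter (a cube roughly 1.5 m on a side); the volume object it creates is reused in the per-frame sketches below:

using Microsoft.Kinect.Toolkit.Fusion;

// 256 voxels per meter and 384 voxels along each axis: a virtual cube about 1.5 m per side.
ReconstructionParameters volumeParameters = new ReconstructionParameters(256, 384, 384, 384);

// The world coordinate system initially coincides with the camera, so the starting
// world-to-camera transform is the identity.
Matrix4 worldToCameraTransform = Matrix4.Identity;

// Create the reconstruction volume on the GPU (ReconstructionProcessor.Cpu for offline use).
Reconstruction volume = Reconstruction.FusionCreateReconstruction(
    volumeParameters, ReconstructionProcessor.Amp, -1, worldToCameraTransform);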

Then each frame of depth image data is processed as shown in the figure in the original post. The following is a brief introduction to the related functions in Kinect Fusion.

DepthToDepthFloatFrame function

The signature of the function is as follows:

public void DepthToDepthFloatFrame(DepthImagePixel[] depthImageData, FusionFloatImageFrame depthFloatFrame, float minDepthClip, float maxDepthClip, bool mirrorDepth)

This method converts a frame of unsigned short depth data into a frame of floating-point depth data representing each pixel's distance from the Kinect sensor in meters. The result is stored in the pre-allocated depthFloatFrame; depthImageData and depthFloatFrame must be the same size. The function runs on the GPU.

depthImageData is the raw depth image data obtained from the Kinect sensor. minDepthClip is the minimum depth threshold: values below it are set to 0. maxDepthClip is the maximum depth threshold: values above it are set to 1000. The last parameter, mirrorDepth, is a Boolean indicating whether to mirror the depth data.

The minimum and maximum depth thresholds can be used to clip the input data, for example to exclude particular objects from the three-dimensional reconstruction.
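A minimal sketch of the call, assuming a 640 x 480 frame and a depthImagePixels array copied from the sensor's depth stream; the clip values use the FusionDepthProcessor default constants:

int width = 640, height = 480;
FusionFloatImageFrame depthFloatFrame = new FusionFloatImageFrame(width, height);

// depthImagePixels is a DepthImagePixel[] filled from the sensor's depth frame event.
volume.DepthToDepthFloatFrame(
    depthImagePixels,
    depthFloatFrame,
    FusionDepthProcessor.DefaultMinimumDepth,   // minimum depth clip, roughly 0.35 m
    FusionDepthProcessor.DefaultMaximumDepth,   // maximum depth clip, roughly 8 m
    false);                                     // do not mirror the depth data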

ProcessFrame function

Next you can call the ProcessFrame function, which internally calls the AlignDepthFloatToReconstruction and IntegrateFrame functions. ProcessFrame itself is introduced first.

public bool ProcessFrame(FusionFloatImageFrame depthFloatFrame, int maxAlignIterationCount, int maxIntegrationWeight, Matrix4 worldToCameraTransform)

This function further processes each frame of depth data that DepthToDepthFloatFrame has converted. If an error occurs during the AlignDepthFloatToReconstruction stage, the subsequent IntegrateFrame stage is not performed and the camera pose remains unchanged. The maximum image resolution supported by this function is 640 x 480.

The maxAlignIterationCount parameter is the number of iterations used during registration by the camera-tracking (alignment) algorithm. The minimum value is 1; smaller values are faster, but setting it too small can keep the registration from converging, so the correct transformation is never found.

The maxIntegrationWeight parameter controls the smoothing of the depth-data fusion. Small values make the result noisier, but moving objects appear and disappear more quickly, which suits dynamic scene modeling. Larger values make objects fuse more slowly but preserve more detail with fewer noise points.

The worldToCameraTransform parameter is the most recent camera pose.

If the method returns true, processing succeeded; if it returns false, the algorithm had trouble aligning the depth image data and could not compute a valid transformation.

We can call AlignDepthFloatToReconstruction and IntegrateFrame separately for finer control, but ProcessFrame may be faster. After the method succeeds, if you want to output the reconstructed image you only need to call the CalculatePointCloud method and then FusionDepthProcessor.ShadePointCloud.
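A minimal per-frame sketch, with assumed variable names carried over from the earlier sketches, of calling ProcessFrame and keeping the tracked camera pose up to date:

bool trackingSucceeded = volume.ProcessFrame(
    depthFloatFrame,
    FusionDepthProcessor.DefaultAlignIterationCount,  // alignment iterations
    FusionDepthProcessor.DefaultIntegrationWeight,    // fusion smoothing weight
    volume.GetCurrentWorldToCameraTransform());

if (trackingSucceeded)
{
    // Remember the pose the tracker just computed; it seeds the next frame.
    worldToCameraTransform = volume.GetCurrentWorldToCameraTransform();
}
else
{
    // Tracking was lost for this frame; keep the previous pose, and consider resetting
    // the reconstruction after many consecutive failures.
}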

AlignDepthFloatToReconstruction function

public bool AlignDepthFloatToReconstruction(FusionFloatImageFrame depthFloatFrame, int maxAlignIterationCount, FusionFloatImageFrame deltaFromReferenceFrame, out float alignmentEnergy, Matrix4 worldToCameraTransform)

This method aligns a depth image frame to the reconstruction cube and computes the camera's spatial pose for the current depth frame. The camera-tracking algorithm requires the reconstruction cube, and if tracking succeeds, the camera pose held internally is updated. The maximum resolution supported by this method is 640 x 480.

The maxAlignIterationCount parameter has the same meaning as in the ProcessFrame method.

deltaFromReferenceFrame is a registration-error frame: a pre-allocated floating-point image that stores, for each observed pixel, how well it aligns with the reference frame. It can be used to produce a color rendering or as input to other vision algorithms, such as object segmentation. The residuals are normalized to the range -1 to 1 and represent the registration error of each pixel. Where a valid depth value exists but no reconstruction has been built behind it, the value is 0, indicating perfect alignment with the reconstruction cube. Where the depth value is invalid, the value is 1. If you do not need this information, simply pass null.

alignmentEnergy indicates the accuracy of the registration; 0 means a perfect match.

worldToCameraTransform is the camera pose at the time of the calculation, usually obtained from FusionDepthProcessor.AlignPointClouds or AlignDepthFloatToReconstruction.

This function returns true if the alignment succeeds; returning false indicates that the algorithm had trouble aligning the depth image data and could not compute a valid transformation.
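A minimal sketch, with assumed variable names, of calling the tracking step on its own; null is passed for deltaFromReferenceFrame because the residual image is not needed here:

float alignmentEnergy;
bool aligned = volume.AlignDepthFloatToReconstruction(
    depthFloatFrame,
    FusionDepthProcessor.DefaultAlignIterationCount,
    null,                    // no residual image needed
    out alignmentEnergy,     // 0 means a perfect match
    volume.GetCurrentWorldToCameraTransform());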

IntegrateFrame function

public void IntegrateFrame(FusionFloatImageFrame depthFloatFrame, int maxIntegrationWeight, Matrix4 worldToCameraTransform)

This method fuses a depth data frame into the reconstructed scene; maxIntegrationWeight controls how smooth the fusion is.

worldToCameraTransform is the camera pose for this depth frame; it can be obtained from the registration APIs.
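A minimal sketch of the two calls that ProcessFrame wraps, made separately for finer control: the frame is only integrated when the alignment above succeeded (the aligned flag and other names are carried over from the previous sketch):

if (aligned)
{
    worldToCameraTransform = volume.GetCurrentWorldToCameraTransform();
    volume.IntegrateFrame(
        depthFloatFrame,
        FusionDepthProcessor.DefaultIntegrationWeight,  // fusion smoothing weight
        worldToCameraTransform);
}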

CalculatePointCloud function

public void CalculatePointCloud(FusionPointCloudImageFrame pointCloudFrame, Matrix4 worldToCameraTransform)

This method uses ray casting to compute the point cloud visible from a given viewpoint.

The resulting point cloud can be passed to the FusionDepthProcessor.AlignPointClouds or FusionDepthProcessor.ShadePointCloud functions, for example to produce a visible output image.

The pointCloudFrame parameter has a fixed image size; for example, you can compute the point cloud at the size of a window, place an Image control of that size, and fill it by calling FusionDepthProcessor.ShadePointCloud. Note, however, that the larger the image, the more computation is required.

The pointCloudFrame parameter is a pre-allocated point cloud frame that is filled with the point cloud ray cast from the reconstruction cube; it is then usually consumed by the FusionDepthProcessor.AlignPointClouds or FusionDepthProcessor.ShadePointCloud functions.

The worldToCameraTransform parameter is the camera viewpoint, which determines where the rays are cast from.
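A minimal sketch, with assumed variable names from the earlier sketches, of ray casting the reconstruction and shading it into a color frame whose pixels can then be copied into a WriteableBitmap for display:

FusionPointCloudImageFrame pointCloudFrame = new FusionPointCloudImageFrame(width, height);
FusionColorImageFrame shadedSurfaceFrame = new FusionColorImageFrame(width, height);
int[] shadedPixels = new int[width * height];

volume.CalculatePointCloud(pointCloudFrame, worldToCameraTransform);

FusionDepthProcessor.ShadePointCloud(
    pointCloudFrame,
    worldToCameraTransform,
    shadedSurfaceFrame,   // shaded surface image
    null);                // shaded surface normals image, not needed here

shadedSurfaceFrame.CopyPixelDataTo(shadedPixels);  // raw pixels for a WriteableBitmap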

CalculateMesh function

public Mesh CalculateMesh(int voxelStep)

This method returns the geometric mesh of the reconstructed scene, producing a polygonal surface model from the reconstruction cube. voxelStep is the sampling step: the smaller the step, the more detailed the returned model.
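A minimal sketch of extracting the mesh; the accessors used here (GetVertices, GetNormals, GetTriangleIndexes) are recalled from the toolkit samples, and writing the data out as .obj or .stl is left to helper code such as that shipped with the KinectFusionExplorer sample:

Mesh mesh = volume.CalculateMesh(1);   // voxelStep = 1 gives the finest sampling

var vertices = mesh.GetVertices();                // 3D vertex positions
var normals = mesh.GetNormals();                  // per-vertex normals
var triangleIndexes = mesh.GetTriangleIndexes();  // vertex indices, three per triangle

Console.WriteLine("Mesh has {0} vertices and {1} triangles.",
    vertices.Count, triangleIndexes.Count / 3);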

4. Conclusion

Once you understand these functions, you should be just about able to use the features Kinect Fusion provides. The best way to learn is to read the code of the sample programs in the Kinect Developer Toolkit; with the introduction above, the functions there should not be too hard to follow.

Because my laptop does not meet the hardware requirements, I cannot give a demonstration here, but I believe the introduction above will help you understand and use Kinect Fusion.
