Kinect 2.0 + OpenCV: displaying depth data, skeleton information, hand state, and a binary map of people

Source: Internet
Author: User
1. Preface

The performance of Kinect 2.0 is greatly improved compared with the first generation!

I wanted to simply find a tutorial and copy-paste from it, but it seems no one has written a C++ tutorial for Kinect 2.0. So I worked it out myself, and I'm sharing the results with you here.

The following functions are implemented: depth data, body (skeleton) data, hand state, and a binary map of the person (the one in Figure 1; the official Microsoft approach is to extract and display the body index data).

The effect is as follows:

Figure 1: skeleton information, binary map, and hand state

Figure 2: depth information

2. Installation

The hardware requirements of Kinect 2.0 are quite demanding; you can check them on the official website. The installation itself is very easy. If you have any questions, leave a comment.

One aside: I use a Mac and have been running Windows in a Parallels Desktop virtual machine. Because of the GPU requirements, the GPU exposed inside the virtual machine is not a real GPU, so my Kinect 2.0 could not be installed in the VM. If you have solved this problem, please let me know. Much appreciated!

3. Code profiling

If you want to study it yourself, there are basically two sources to learn from. One is the sample code provided with the Kinect 2.0 SDK; the other is Microsoft's official Kinect website (in particular, I recommend the "read technical docs" link). I figured things out by combining these two materials.

The full code is appended at the end, but first we need to understand the working mechanism of Kinect 2.0. Once you understand this, the code is very easy to follow! The function interfaces of Kinect 2.0 are really much simpler than those of 1.0.

3.1 Operating Mechanism

In Kinect 2.0, each type of data corresponds to three classes: a source, a reader, and a frame. For example, to read the skeleton there are IBodyFrameSource, IBodyFrameReader, and IBodyFrame; to read depth data there are IDepthFrameSource, IDepthFrameReader, and IDepthFrame. Other data types, such as body index, infrared, and color, are named in the same pattern.

So what do these three interfaces mean?

3.1.1 Source

After we initialize and open the Kinect, we need to ask the Kinect to open a source, from which we will continuously obtain information. The code is:


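The original code block did not survive the copy; here is a minimal sketch of requesting a body frame source, based on the standard Kinect v2 SDK calls (error handling abbreviated):

```cpp
// Sketch (assumes <Kinect.h>): get the default sensor, open it,
// then ask it for the body frame source.
IKinectSensor* m_pKinectSensor = nullptr;
IBodyFrameSource* pBodyFrameSource = nullptr;

HRESULT hr = GetDefaultKinectSensor(&m_pKinectSensor);
if (SUCCEEDED(hr)) hr = m_pKinectSensor->Open();
if (SUCCEEDED(hr)) hr = m_pKinectSensor->get_BodyFrameSource(&pBodyFrameSource);
```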
m_pKinectSensor is the interface to the Kinect itself, and pBodyFrameSource is an IBodyFrameSource.

3.1.2 Reader

Because the source belongs to the Kinect device rather than to our program, we need to create a reader bound to the source above; we can then obtain information by polling this reader. The code is:


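The code block was lost here as well; a minimal sketch, following the SDK samples (SafeRelease is the small release helper used throughout those samples):

```cpp
// Sketch: open a reader on the source. Once the reader is open,
// the source interface can be released, as the SDK samples do.
IBodyFrameReader* m_pBodyFrameReader = nullptr;
HRESULT hr = pBodyFrameSource->OpenReader(&m_pBodyFrameReader);
SafeRelease(pBodyFrameSource);
```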
m_pBodyFrameReader is an IBodyFrameReader.

3.1.3 Frame

The frame is the class that actually stores the data. Each time, the reader reads data into a frame, and we then extract the various final data from that frame. Code:


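Again the code block is missing; a minimal sketch of acquiring a frame from the reader, using the standard SDK call:

```cpp
// Sketch: poll the reader for the latest frame, and release the frame
// as soon as we are done with it so the next one can be delivered.
IBodyFrame* pBodyFrame = nullptr;
HRESULT hr = m_pBodyFrameReader->AcquireLatestFrame(&pBodyFrame);
if (SUCCEEDED(hr)) {
    // ... extract data from pBodyFrame (see section 3.2) ...
}
SafeRelease(pBodyFrame);
```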
pBodyFrame is an IBodyFrame.

3.2 How to obtain data from Frame

Requesting the source and the reader is identical for every data type, but extracting information from the frame differs. The following describes how to extract depth information, skeleton information, hand state, and the binary map.

3.2.1 Depth information:

In Kinect 2.0, the depth coordinate space is 424 × 512 (height × width), as described on the official website. Extracting data from a depth frame mainly means copying the data in the frame into an array (see the official documentation). Code:

pDepthFrame->CopyFrameDataToArray(cDepthHeight * cDepthWidth, depthArray);
Here, cDepthHeight is 424, cDepthWidth is 512, and depthArray is a 16-bit unsigned int (UINT16) array of size 424*512 that stores the depth data.
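Since the article displays the depth image with OpenCV, a sketch of wrapping the copied buffer in a cv::Mat may help; the 4500 mm normalization constant is my assumption (a rough maximum depth range), not from the original:

```cpp
// Sketch: copy one depth frame into a UINT16 buffer, then wrap it
// in an OpenCV Mat (no copy) and scale it down to 8 bits for display.
static const int cDepthWidth = 512;
static const int cDepthHeight = 424;
static UINT16 depthArray[cDepthWidth * cDepthHeight];

pDepthFrame->CopyFrameDataToArray(cDepthHeight * cDepthWidth, depthArray);

cv::Mat depthMat(cDepthHeight, cDepthWidth, CV_16UC1, depthArray);
cv::Mat depthShow;
depthMat.convertTo(depthShow, CV_8U, 255.0 / 4500.0); // ~4.5 m assumed max range
cv::imshow("depth", depthShow);
```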

3.2.2 Skeleton information:

Kinect 2.0 can track the skeletons of up to six people at the same time, so each call retrieves all six skeleton slots (if a slot has no person, it is a null pointer). Code:

pBodyFrame->GetAndRefreshBodyData(_countof(ppBodies), ppBodies);
Here, ppBodies is an IBody* array of length 6; IBody is the class that stores a tracked skeleton's information.

After obtaining this class, we need to further extract the joint positions from it. For each element pBody in ppBodies, the code is:

pBody->GetJoints(_countof(joints), joints);

Here, joints is an array of length 25; each element holds one joint's position. However, this position is in the camera coordinate system (camera space, measured in meters relative to the sensor), so we need to convert it to the depth coordinate system. The ICoordinateMapper class is used for this. The code is as follows:

m_pCoordinateMapper->MapCameraPointToDepthSpace(joints[j].Position, &depthSpacePosition[j]);
Creating the ICoordinateMapper is very simple; see the source code for details. depthSpacePosition is an array of length 25; each element is a DepthSpacePoint, which contains the X and Y coordinates in the depth coordinate system.
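Putting the skeleton steps above together, a minimal sketch of the per-frame body loop (the tracked-state check is standard SDK usage; variable names follow the article):

```cpp
// Sketch: fetch the six body slots, skip untracked ones, read the 25
// joints, and map each camera-space joint into depth space.
IBody* ppBodies[BODY_COUNT] = { 0 };
HRESULT hr = pBodyFrame->GetAndRefreshBodyData(_countof(ppBodies), ppBodies);
if (SUCCEEDED(hr)) {
    for (int i = 0; i < _countof(ppBodies); ++i) {
        IBody* pBody = ppBodies[i];
        BOOLEAN bTracked = FALSE;
        if (!pBody || FAILED(pBody->get_IsTracked(&bTracked)) || !bTracked)
            continue;

        Joint joints[JointType_Count];
        DepthSpacePoint depthSpacePosition[JointType_Count];
        pBody->GetJoints(_countof(joints), joints);
        for (int j = 0; j < _countof(joints); ++j)
            m_pCoordinateMapper->MapCameraPointToDepthSpace(
                joints[j].Position, &depthSpacePosition[j]);
    }
}
```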

3.2.3 Hand state:

The pBody from the skeleton section above also contains the hand state of the tracked person. The specific code is:

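The code block is missing here; a minimal sketch of reading both hand states from a tracked IBody, using the standard SDK getters:

```cpp
// Sketch: query the left and right hand states from a tracked body.
HandState leftHandState  = HandState_Unknown;
HandState rightHandState = HandState_Unknown;
pBody->get_HandLeftState(&leftHandState);
pBody->get_HandRightState(&rightHandState);
```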
leftHandState and rightHandState are both HandState values, which have five states: open, closed, lasso, not tracked, and unknown. The first three are specific gestures; the last two both mean the hand could not be identified. The special one is lasso, which the official website explains as follows: make a scissors gesture and press the index and middle fingers together. In fact, extending only the index finger can also be detected as lasso, but using the index and middle fingers together tracks better and is more stable.

3.2.4 Binary map of people:

Acquiring the binary-map data is very similar to acquiring the depth data. The specific code is:

pBodyIndexFrame->CopyFrameDataToArray(cDepthHeight * cDepthWidth, bodyIndexArray);
Here, however, bodyIndexArray is a 424*512 8-bit unsigned char array. If a pixel is considered part of a person, it is drawn black; otherwise (the background) it is drawn white.
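A sketch of turning the body-index buffer into the black/white image described above; the value 255 for "no body at this pixel" is the SDK's convention for body index frames:

```cpp
// Sketch: body-index pixels hold 0-5 for a tracked person and 255 for
// background; draw person pixels black and background pixels white.
static BYTE bodyIndexArray[cDepthWidth * cDepthHeight];
pBodyIndexFrame->CopyFrameDataToArray(cDepthHeight * cDepthWidth, bodyIndexArray);

cv::Mat binaryMat(cDepthHeight, cDepthWidth, CV_8UC1);
for (int i = 0; i < cDepthHeight * cDepthWidth; ++i)
    binaryMat.data[i] = (bodyIndexArray[i] != 255) ? 0 : 255;
cv::imshow("binary", binaryMat);
```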

4. Source Code

The source code is relatively long and I have commented it. Please download it from my GitHub: otnt's GitHub

Note: You must manually configure the project to include the Kinect and OpenCV libraries.

If you have any questions or suggestions, please leave a message below. I will keep this updated ~

