Many somatosensory (motion-sensing) interactions rely on threshold values, such as how far a hand must move relative to some other body part before a gesture is detected.
However, these values differ from person to person; the reach of someone with long arms is quite different from that of someone with short arms.
To adapt well to users of different body shapes (heights), the thresholds need to scale with the user, which leads to the idea of using the Kinect to estimate the user's height.
First, let's look at what data the Kinect 2 API provides:
- Bottom layer
  - YUV color buffer (1080p)
  - Depth buffer (512 × 424)
  - IR buffer (512 × 424)
- Middle layer
  - CameraSpacePoint for each skeleton joint
- High layer
  - Speech / emotion / gesture recognition
The most direct idea is to use (head.Position.Y - foot.Position.Y) to estimate the height; if more precision is needed, the boundary points of the depth buffer could be examined as well, but this article does not cover that. Let's first look at the value range of each CameraSpacePoint component (a sketch of how these joints are read from the SDK follows the list):
- X: about [-1.3, 1.3]
- Y: [-1, 1]
- Z: about [0.5, 4.5]
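Regardless of which estimate is used, the joint positions come from the body tracking API. Below is a minimal sketch (not the original article's code) of how the head and foot CameraSpacePoints might be read with the official Kinect 2 C++ SDK, assuming a tracked IBody* obtained elsewhere; error handling is mostly omitted:

```cpp
#include <Kinect.h>

// Sketch: pull the head and left-foot positions out of a tracked body.
// `body` is assumed to be a valid IBody* from IBodyFrame::GetAndRefreshBodyData.
bool GetHeadAndFoot(IBody* body, CameraSpacePoint& head, CameraSpacePoint& foot)
{
    BOOLEAN tracked = FALSE;
    if (FAILED(body->get_IsTracked(&tracked)) || !tracked)
        return false;

    Joint joints[JointType_Count];
    if (FAILED(body->GetJoints(JointType_Count, joints)))
        return false;

    head = joints[JointType_Head].Position;     // CameraSpacePoint of the head joint
    foot = joints[JointType_FootLeft].Position; // the right foot (or an average) works too
    return true;
}
```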
Obviously, simply taking the Y difference does not work. Let's look back at how the Kinect works:
Anyone with graphics experience will recognize the perspective transformation at a glance. Simply put, the Kinect uses the IR and depth data for skeleton detection and converts the result into CameraSpacePoints. Combined with the Kinect 2 hardware parameters, we can work out how a CameraSpacePoint is produced:
|                 | Version 1     | Version 2     |
|-----------------|---------------|---------------|
| Depth range     | 0.4 m → 4.0 m | 0.4 m → 4.5 m |
| Color stream    | 640 × 480     | 1920 × 1080   |
| Depth stream    | 320 × 240     | 512 × 424     |
| Infrared stream | None          | 512 × 424     |
| Audio stream    | 4-mic array   | 4-mic array   |
| USB             | 2.0           | 3.0           |
Although X and Y are not a traditional projection space, considering the FOV and the aspect ratio of the depth buffer, the X direction can be seen as simply extending a bit beyond the usual range, so the X and Y components can still be treated as relative coordinates after a projection transform. The Z component is obviously the actual distance in meters; we can manually remap it to the [0, 1] range that projection space uses for depth:
```cpp
// Remap depth from the [0.4 m, 4.5 m] sensing range to [0, 1].
XMVECTOR headP = XMVectorSet(head.X, head.Y, (head.Z - 0.4f) / 4.1f, 1.0f);
XMVECTOR footP = XMVectorSet(foot.X, foot.Y, (foot.Z - 0.4f) / 4.1f, 1.0f);
```
It is then straightforward to convert projection-space coordinates back into camera-space coordinates: simply multiply by the inverse of the projection matrix. How do we construct that matrix? Based on the reference and the hardware parameters, the aspect ratio should be estimated as 1 rather than 512/424, and the near/far clip planes are readily available:
```cpp
// 60-degree vertical FOV, aspect ratio 1, near plane 0.4 m, far plane 4.5 m.
XMMATRIX project = XMMatrixPerspectiveFovRH(60.0f / 180.0f * XM_PI, 1.0f, 0.4f, 4.5f);
XMMATRIX inverse = XMMatrixInverse(nullptr, project);
```
Once the coordinates are converted back into meters, the height can be computed directly in that space:
```cpp
// Transform both points back through the inverse projection,
// then take the distance between head and foot.
XMVECTOR headW = XMVector3Transform(headP, inverse);
XMVECTOR footW = XMVector3Transform(footP, inverse);
XMVECTOR height = XMVector3Length(XMVectorSubtract(headW, footW));
```
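For completeness, here is the same pipeline wrapped into one small helper; the function name and structure are illustrative, not taken from the original code:

```cpp
#include <DirectXMath.h>
#include <Kinect.h>
using namespace DirectX;

// Illustrative wrapper around the steps above: remap Z, apply the inverse
// projection, and measure the head-to-foot distance (in meters).
float EstimateHeight(const CameraSpacePoint& head, const CameraSpacePoint& foot)
{
    XMMATRIX project = XMMatrixPerspectiveFovRH(60.0f / 180.0f * XM_PI, 1.0f, 0.4f, 4.5f);
    XMMATRIX inverse = XMMatrixInverse(nullptr, project);

    XMVECTOR headP = XMVectorSet(head.X, head.Y, (head.Z - 0.4f) / 4.1f, 1.0f);
    XMVECTOR footP = XMVectorSet(foot.X, foot.Y, (foot.Z - 0.4f) / 4.1f, 1.0f);

    XMVECTOR headW = XMVector3Transform(headP, inverse);
    XMVECTOR footW = XMVector3Transform(footP, inverse);
    return XMVectorGetX(XMVector3Length(XMVectorSubtract(headW, footW)));
}
```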
With this value, other thresholds can be scaled proportionally, which largely solves the problem that the experience varies greatly between people of different heights. In addition, if you need to handle a seated posture, you can estimate the spine-to-head distance instead, since that value is relatively stable and is not affected by leg movement. The demo effect is shown below:
Although the result has an error of about 5 cm, it is accurate enough to serve as an adaptation parameter.
In addition, you can apply a joint filtering algorithm to smooth out jitter and avoid unstable results.
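The white paper listed in the references describes a full double-exponential (Holt) filter; as a minimal illustration only (this is not the SDK's or the white paper's implementation), a simple exponential smoothing of a joint position could look like this:

```cpp
#include <Kinect.h>

// Minimal exponential smoothing of a CameraSpacePoint. `alpha` in (0, 1]:
// smaller values smooth more aggressively but add latency. Illustration only.
CameraSpacePoint SmoothJoint(const CameraSpacePoint& prev, const CameraSpacePoint& raw, float alpha = 0.5f)
{
    CameraSpacePoint smoothed;
    smoothed.X = prev.X + alpha * (raw.X - prev.X);
    smoothed.Y = prev.Y + alpha * (raw.Y - prev.Y);
    smoothed.Z = prev.Z + alpha * (raw.Z - prev.Z);
    return smoothed;
}
```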
Reference:
- Kinect for Windows Version 2: Overview, http://pterneas.com/2014/02/08/kinect-for-windows-version-2-overview/
- Skeletal Joint Smoothing White Paper, http://msdn.microsoft.com/en-us/library/jj131429.aspx