Kinect Human-Computer Interaction Development Practice


Kinect for Windows SDK

Skeletal tracking: tracks one or two people moving within the Kinect field of view, resolving up to 20 joints per body.

Depth camera: the depth sensor provides three-dimensional position information about the environment (in a depth image, each pixel stores that point's distance from the Kinect sensor). Space is encoded with infrared light from the Kinect's IR emitter, so ambient light does not affect the measurements.

Audio processing: integrates with the Microsoft Speech recognition API.

Processing of Kinect color and infrared image data

Color image quality: normal quality or high quality; the choice determines the speed at which data is transferred from the Kinect to the PC.

Normal quality: the image is compressed on the sensor side before being passed to the runtime, which then decompresses the data for the application. Compression lets the color data arrive at a frame rate of 30, but degrades image quality.

High quality: delivered to the runtime uncompressed, with a maximum frame rate of no more than 15.

Color data is available in two color formats: RGB and YUV.

The color image type is represented by the enumeration type ColorImageFormat.
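For illustration, selecting a format when enabling the color stream might look like the following sketch (it assumes a connected, initialized KinectSensor instance named `sensor`):

```csharp
// Normal quality: compressed RGB at 30 frames per second.
sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

// High quality: uncompressed 1280x960 RGB, limited to 12 frames per second.
// sensor.ColorStream.Enable(ColorImageFormat.RgbResolution1280x960Fps12);
```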

Infrared data stream

Principle: the infrared emitter on the left side of the Kinect projects infrared light into the environment. Because this light pattern is highly random, the speckle it forms at any two different positions in space is different, so the environment is covered with a three-dimensional "light code". The infrared receiver on the right captures the infrared image of the Kinect's field of view, and finally a series of calculations combining this image with the Kinect's calibration parameters yields three-dimensional depth information for the field of view.

The KinectSensor class provides the interfaces for switching the Kinect device on and off and for acquiring all of its data.

An infrared image is actually a special form of color image: kinectSensor.ColorStream.Enable(ColorImageFormat.InfraredResolution640x480Fps30)

The statement that displays the image: this.colorImage.Source = BitmapSource.Create(imageFrame.Width, imageFrame.Height, 96, 96, PixelFormats.Gray16, null, pixelData, imageFrame.Width * imageFrame.BytesPerPixel);
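Putting the pieces together, a minimal ColorFrameReady handler for infrared frames might look like this sketch (`colorImage` is assumed to be a WPF Image control; the null check and `using` block follow the SDK's frame-lifetime pattern):

```csharp
void Sensor_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (ColorImageFrame imageFrame = e.OpenColorImageFrame())
    {
        if (imageFrame == null) return;   // the frame may no longer be available

        byte[] pixelData = new byte[imageFrame.PixelDataLength];
        imageFrame.CopyPixelDataTo(pixelData);

        // Infrared frames are 16-bit grayscale, hence PixelFormats.Gray16.
        this.colorImage.Source = BitmapSource.Create(
            imageFrame.Width, imageFrame.Height, 96, 96,
            PixelFormats.Gray16, null, pixelData,
            imageFrame.Width * imageFrame.BytesPerPixel);
    }
}
```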

Processing of depth data

By processing the depth data, Kinect identifies up to two human figures in front of the sensor and builds a segmentation map: a bitmap whose pixels hold the player index of the nearest player in the field of view at that position (an index of 0 means no player was found at that location; 1 and 2 are the numbers of the detected players).

Although the player segmentation data is logically a separate stream, in practice the depth data and the player segmentation data are merged into a single structure.

The high 13 bits of each pixel give the distance, in millimetres, from the depth sensor to the nearest object at that coordinate; 13 bits give a theoretical range of 0 to 8191 mm.

The low 3 bits of each pixel identify the player index tracked at that pixel, and can be read as a 3-bit integer.

The depth image data type is DepthImageFrame; its CopyPixelDataTo() method copies the depth image data obtained from the Kinect into a short array, each element of which packs one pixel's depth information and player index into 16 bits.
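The packing described above can be undone with the SDK's bitmask constants; a sketch, assuming `depthData` is the short array filled by CopyPixelDataTo() and `i` is a pixel index:

```csharp
short raw = depthData[i];

// Low 3 bits: player index (0 = no player).
// DepthImageFrame.PlayerIndexBitmask == 0x0007.
int playerIndex = raw & DepthImageFrame.PlayerIndexBitmask;

// High 13 bits: depth in millimetres.
// DepthImageFrame.PlayerIndexBitmaskWidth == 3.
int depthMm = raw >> DepthImageFrame.PlayerIndexBitmaskWidth;
```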

Processing skeletal tracking data

Skeletal tracking is Kinect's core technology: it accurately locates 20 key points on the human body and tracks the positions of those 20 points in real time.

The data is provided as SkeletonFrame objects; each tracked skeleton holds up to 20 points, each represented by the Joint type.

JointType: the type of the joint; an enumeration listing the specific names of the 20 joints, e.g. JointType.HandLeft.

Position: a value of type SkeletonPoint giving the joint's location; SkeletonPoint is a structure with three fields, X, Y and Z, storing the joint's three-dimensional coordinates.

TrackingState: an enumeration giving the joint's tracking state (Tracked means the joint was captured correctly, NotTracked means it was not captured, Inferred means its position was estimated and is uncertain).
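Reading one joint from a tracked Skeleton object, as a sketch (the `skeleton` variable is assumed to come from a skeleton frame):

```csharp
Joint leftHand = skeleton.Joints[JointType.HandLeft];

if (leftHand.TrackingState == JointTrackingState.Tracked)
{
    SkeletonPoint p = leftHand.Position;   // X, Y, Z in skeleton space (metres)
    // ... use p.X, p.Y, p.Z ...
}
```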

Seated Mode

Seated (upper-body) mode: the system captures only the 10 joints of the upper body and ignores the lower body, so unstable or missing lower-body joint data does not affect the upper-body data.

The mode is defined in the enumeration type SkeletonTrackingMode (values Default and Seated).
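Switching to seated mode is a one-line setting on the skeleton stream (sketch, assuming a KinectSensor instance named `sensor`):

```csharp
// Track only the 10 upper-body joints.
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
sensor.SkeletonStream.Enable();
```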

An application gets the next frame of skeleton data the same way it gets color and depth image data: through a callback function and a buffer -- OpenSkeletonFrame.

If new skeleton data is ready, the system copies it into the buffer.

Polling mode reads skeleton frames by calling the OpenNextFrame function of the SkeletonStream class.

public SkeletonFrame OpenNextFrame(int millisecondsWait)

The OpenNextFrame() function returns when new data is ready or when the wait time is exceeded.
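A polling sketch (assuming a started sensor with the skeleton stream enabled):

```csharp
// Block for up to 100 ms waiting for the next skeleton frame.
using (SkeletonFrame frame = sensor.SkeletonStream.OpenNextFrame(100))
{
    if (frame != null)   // null means the wait timed out
    {
        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
        // ... process skeletons ...
    }
}
```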

Event mode obtains frames in an event-driven manner, which is more flexible and accurate.

The application attaches an event handler to the SkeletonFrameReady event, which is defined in the KinectSensor class and is called as soon as the next frame of skeleton data is ready.
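The event-driven wiring, sketched (again assuming a KinectSensor instance named `sensor`):

```csharp
sensor.SkeletonStream.Enable();
sensor.SkeletonFrameReady += Sensor_SkeletonFrameReady;
sensor.Start();

void Sensor_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null) return;   // no new data for this callback

        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
        // ... process tracked skeletons ...
    }
}
```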

RGB image data and depth image data (and hence skeleton data) use different spatial coordinate systems: the former comes from the RGB camera, the latter from the infrared camera. Drawing the obtained joint coordinates directly onto the RGB image therefore introduces error.

Coordinate system conversion: kinectSensor.CoordinateMapper.MapSkeletonPointToColorPoint()
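A sketch of the conversion (`joint` is an assumed tracked joint, and the format passed in should match the format the color stream was enabled with):

```csharp
ColorImagePoint colorPoint = sensor.CoordinateMapper.MapSkeletonPointToColorPoint(
    joint.Position, ColorImageFormat.RgbResolution640x480Fps30);

// colorPoint.X and colorPoint.Y are pixel coordinates in the RGB image,
// suitable for drawing the joint overlay.
```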

Joint rotation information (relative rotation and absolute rotation) is expressed both as a rotation matrix and as a quaternion.

The joint rotation information is defined in the BoneOrientation class:

StartJoint: starting joint

EndJoint: ending joint

HierarchicalRotation: relative rotation information

AbsoluteRotation: absolute rotation information

BoneRotation hierarchical = orientation.HierarchicalRotation;

BoneRotation absolute = orientation.AbsoluteRotation;

The BoneRotation type records the rotation information as both a matrix and a quaternion.
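A sketch reading both forms for every bone of a tracked skeleton (the `skeleton` variable is assumed to be a tracked Skeleton object):

```csharp
foreach (BoneOrientation orientation in skeleton.BoneOrientations)
{
    BoneRotation hierarchical = orientation.HierarchicalRotation; // relative to the parent bone
    BoneRotation absolute = orientation.AbsoluteRotation;         // relative to the camera

    Matrix4 m = absolute.Matrix;        // rotation as a matrix
    Vector4 q = absolute.Quaternion;    // rotation as a quaternion (X, Y, Z, W)
}
```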

Using the audio API: the four-element microphone array

The task of speech recognition is to use a computer program to convert speech into a string of words.

The Kinect for Windows SDK, together with the Microsoft Speech API, provides managed applications using the Kinect microphone array with the infrastructure needed to support the latest speech algorithms.

The angle-confidence property represents the confidence level of the audio-source location estimate.

To monitor changes in the beam direction, handle the BeamAngleChanged event, which is raised when the BeamAngle property of KinectAudioSource changes.

The SpeechRecognitionEngine class provides a series of methods for acquiring and managing the speech recognition engine (loading grammars, starting speech recognition, ending speech recognition).

InstalledRecognizers is a static method that returns a list of speech recognizers, including all recognizers installed on the current system.

The speech engine raises the following three events:

The SpeechRecognitionEngine.SpeechHypothesized event occurs each time a phrase is attempted; its handler receives an event-args object containing the best-match word chosen from the command set and an estimated confidence value.

The SpeechRecognitionEngine.SpeechRecognized event occurs when the attempted phrase is recognized as a member of the command set; its handler receives a SpeechRecognizedEventArgs object containing the recognized command.

The SpeechRecognitionEngine.SpeechRejected event occurs when the attempted phrase is not recognized as a member of the command set; it passes a SpeechRecognitionRejectedEventArgs object to the event handler.
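A minimal sketch wiring up the events with a small command grammar (the color-word commands are illustrative, and the Kinect audio-stream setup is omitted):

```csharp
using System;
using Microsoft.Speech.Recognition;

var recognizer = SpeechRecognitionEngine.InstalledRecognizers()[0];
var engine = new SpeechRecognitionEngine(recognizer);

// Build a grammar from a fixed set of commands.
var commands = new Choices("red", "green", "blue");
engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

engine.SpeechHypothesized += (s, e) =>
    Console.WriteLine("Guessing: {0} ({1:F2})", e.Result.Text, e.Result.Confidence);
engine.SpeechRecognized += (s, e) =>
    Console.WriteLine("Recognized: {0}", e.Result.Text);
engine.SpeechRejected += (s, e) =>
    Console.WriteLine("Phrase not in the command set");

engine.SetInputToDefaultAudioDevice();
engine.RecognizeAsync(RecognizeMode.Multiple);
```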

Improving recognition accuracy:

Use longer recognition phrases: longer command strings are less likely to be confused with one another.

Design a gesture that turns on speech recognition only when Kinect captures that particular gesture; otherwise recognition stays off.

Face Tracking SDK

Human face data that can be identified:

Feature point coordinates (identifies and tracks 100 facial feature points based on the depth and color images provided by Kinect)

Face Orientation

Bounding box

Parameters based on the CANDIDE-3 face model
