Kinect for Windows SDK
Skeletal (bone) tracking--tracks one or two people moving within the Kinect field of view, locating up to 20 joints on each body
Depth camera--the depth sensor provides three-dimensional position information about the environment (depth image: each pixel holds the distance from the Kinect sensor). Space is encoded using infrared light emitted by the Kinect's infrared emitter, so ambient light does not affect the measurement results
Audio processing--integrates with the Microsoft Speech recognition API
Processing of Kinect color and infrared image data
Color image quality--normal quality and high quality--determines the speed at which data is transferred from the Kinect to the PC
Normal quality--the image is compressed on the sensor side before being passed to the runtime, which decompresses it before handing the data to the application. Compression lets the returned color data reach a frame rate of 30 fps, but degrades image quality
High quality--the data is delivered to the runtime directly, without compression, with a maximum frame rate of 15 fps
Color data is available in two color formats--RGB and YUV
The color image type is represented by the enumeration type ColorImageFormat
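A minimal sketch of how these pieces fit together (assuming a connected sensor and the Microsoft.Kinect assembly): enable the color stream in a chosen ColorImageFormat and read frames through the frame-ready event.

using System.Linq;
using Microsoft.Kinect;

KinectSensor sensor = KinectSensor.KinectSensors
    .FirstOrDefault(s => s.Status == KinectStatus.Connected);
// Normal-quality RGB at 30 fps; other ColorImageFormat values select YUV or high quality
sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
sensor.ColorFrameReady += (s, e) =>
{
    using (ColorImageFrame frame = e.OpenColorImageFrame())
    {
        if (frame == null) return;               // no new frame available
        byte[] pixels = new byte[frame.PixelDataLength];
        frame.CopyPixelDataTo(pixels);           // pixel data for this frame
    }
};
sensor.Start();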
Infrared data stream
Principle: the infrared emitter on the left side of the Kinect first projects infrared light into the environment. Because this infrared pattern is highly random, the speckle it forms differs between any two positions in space, stamping the environment with a three-dimensional "light code". The infrared receiver on the right side captures the infrared image of the Kinect's field of view, and finally a series of complex calculations on the infrared image, combined with the Kinect's original calibration parameters, yields three-dimensional depth information for the field of view.
The KinectSensor class provides the interfaces for switching the Kinect device on and off and for acquiring all of its data
An infrared image is actually a special form of color image--kinectSensor.ColorStream.Enable(ColorImageFormat.InfraredResolution640x480Fps30)
Statement to display the image: this.ColorImage.Source = BitmapSource.Create(imageFrame.Width, imageFrame.Height, 96, 96, PixelFormats.Gray16, null, pixelData, imageFrame.Width * imageFrame.BytesPerPixel);
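Putting the two statements above into a working frame handler (a sketch; ColorImage is assumed to be an Image element in the window's XAML, and sensor a started KinectSensor):

sensor.ColorStream.Enable(ColorImageFormat.InfraredResolution640x480Fps30);
sensor.ColorFrameReady += (s, e) =>
{
    using (ColorImageFrame imageFrame = e.OpenColorImageFrame())
    {
        if (imageFrame == null) return;
        byte[] pixelData = new byte[imageFrame.PixelDataLength];
        imageFrame.CopyPixelDataTo(pixelData);
        // Infrared pixels are 16-bit grayscale, hence PixelFormats.Gray16
        this.ColorImage.Source = BitmapSource.Create(
            imageFrame.Width, imageFrame.Height, 96, 96,
            PixelFormats.Gray16, null, pixelData,
            imageFrame.Width * imageFrame.BytesPerPixel);   // stride in bytes
    }
};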
Processing of depth data
By processing the depth data, Kinect identifies the images of up to two people in front of the sensor and creates a segmentation map--a bitmap whose pixel values give the player index of the nearest player at that position in the field of view (a player index of 0 means no player was found at that position; values 1 and 2 are the numbers of the detected players)
Although the player segmentation data is logically a separate stream, in practice the depth data and the player segmentation data are packed into a single structure
The high 13 bits of each pixel give the distance from the depth sensor to the nearest object at that coordinate in its field of view, with a theoretical range of 0~8192 mm
The low 3 bits of each pixel give the index of the visible player tracked at that pixel's coordinates, and can be treated as a 3-bit integer value
The depth image data type is DepthImageFrame; its CopyPixelDataTo() method copies the depth image data obtained from the Kinect device into a short array, each element of which packs one pixel's depth information and player index (16 bits in total)
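A sketch of unpacking one frame inside a DepthFrameReady event handler, using the bitmask constants the SDK provides for exactly this split (the depth stream, and the skeleton stream for player indexes, are assumed to be enabled):

using (DepthImageFrame depthFrame = e.OpenDepthImageFrame())
{
    if (depthFrame == null) return;
    short[] depthData = new short[depthFrame.PixelDataLength];
    depthFrame.CopyPixelDataTo(depthData);
    for (int i = 0; i < depthData.Length; i++)
    {
        // Low 3 bits: player index (0 = no player at this pixel)
        int playerIndex = depthData[i] & DepthImageFrame.PlayerIndexBitmask;
        // High 13 bits: distance from the sensor in millimeters
        int distanceMm = depthData[i] >> DepthImageFrame.PlayerIndexBitmaskWidth;
    }
}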
Processing of skeletal tracking data
Kinect's core technology: it accurately locates 20 key points on the human body and tracks the positions of these 20 points in real time
Skeleton data is provided as a SkeletonFrame object; each frame can hold up to 20 points per skeleton, each represented by the Joint type, whose main members are (a usage sketch follows the list):
JointType: the type of the joint, an enumeration listing the specific names of the 20 joints--e.g. JointType.HandLeft
Position: of type SkeletonPoint, holds the joint's position information; SkeletonPoint is a structure containing three members, X, Y, and Z, that store the joint's three-dimensional coordinates.
TrackingState: an enumeration representing the joint's tracking state (Tracked means the joint was captured correctly, NotTracked means it was not captured, Inferred means its position is estimated and uncertain)
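A small sketch of reading these three fields (the skeleton variable is assumed to hold a tracked Skeleton):

Joint hand = skeleton.Joints[JointType.HandLeft];
if (hand.TrackingState == JointTrackingState.Tracked)
{
    SkeletonPoint p = hand.Position;   // p.X, p.Y, p.Z: 3-D coordinates in meters
}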
Bust Mode
Seated mode--bust mode: the system captures only the 10 upper-body joints and ignores the lower body (even if the lower-body joint data is unstable or missing, the upper-body joint data is unaffected)
Bust mode is selected through the enumeration type SkeletonTrackingMode (values Default and Seated)
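Switching modes is a single property assignment (sketch, with sensor a started KinectSensor):

// Seated (bust) mode: only the 10 upper-body joints are tracked
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;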
An application obtains the next frame of skeleton data the same way it obtains color image and depth image data: by calling a frame-opening function and passing in a buffer--OpenSkeletonFrame
If new skeleton data is ready, the system copies it into the buffer
Polling mode reads skeleton frames by calling the OpenNextFrame function of the SkeletonStream class.
public SkeletonFrame OpenNextFrame(int millisecondsWait)
The OpenNextFrame() function returns as soon as new data is ready, or when the wait time is exceeded
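A polling sketch that waits up to 100 ms for the next frame (sensor is assumed to be a started KinectSensor with the skeleton stream enabled):

using (SkeletonFrame frame = sensor.SkeletonStream.OpenNextFrame(100))
{
    if (frame != null)   // null if the wait timed out
    {
        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
    }
}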
Event mode obtains data in an event-driven manner, which is more flexible and accurate
The application attaches an event handler to the SkeletonFrameReady event, which is defined in the KinectSensor class; the handler is called as soon as the next frame of skeleton data is ready.
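An event-mode sketch (continuing with the sensor variable from the earlier sketches):

sensor.SkeletonStream.Enable();
sensor.SkeletonFrameReady += (s, e) =>
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null) return;
        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
        foreach (Skeleton skeleton in skeletons)
        {
            if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
            {
                // process the tracked skeleton's joints here
            }
        }
    }
};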
RGB image data and depth image data (and skeleton data) use different spatial coordinate systems--the former comes from the RGB camera, the latter from the infrared camera--so drawing the obtained joint coordinates directly onto the RGB image introduces a corresponding error.
Coordinate system conversion: kinectSensor.CoordinateMapper.MapSkeletonPointToColorPoint()
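A sketch of mapping a joint position into RGB pixel coordinates before drawing (hand is the Joint obtained earlier):

ColorImagePoint colorPoint = sensor.CoordinateMapper.MapSkeletonPointToColorPoint(
    hand.Position, ColorImageFormat.RgbResolution640x480Fps30);
// colorPoint.X / colorPoint.Y are now pixel coordinates in the RGB image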
Rotation information for joints (relative rotation information and absolute rotation information)--expressed either as a rotation matrix or as a quaternion
Joint rotation information is defined by the BoneOrientation class:
StartJoint: the starting joint of the bone
EndJoint: the ending joint of the bone
HierarchicalRotation: relative rotation information
AbsoluteRotation: absolute rotation information
BoneRotation hierarchical = orientation.HierarchicalRotation;
BoneRotation absolute = orientation.AbsoluteRotation;
The BoneRotation type records the rotation information both as a rotation matrix and as a quaternion
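A sketch of walking the bone orientations of a tracked skeleton (skeleton as above):

foreach (BoneOrientation orientation in skeleton.BoneOrientations)
{
    BoneRotation hierarchical = orientation.HierarchicalRotation; // relative to the parent bone
    BoneRotation absolute = orientation.AbsoluteRotation;         // relative to camera space
    Matrix4 m = absolute.Matrix;       // rotation as a matrix
    Vector4 q = absolute.Quaternion;   // rotation as a quaternion
}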
Using the audio API--the four-element microphone array
The task of speech recognition is to use a computer program to convert speech into a string of words.
The Kinect for Windows SDK, together with the Microsoft Speech API, provides managed applications using the Kinect microphone array with the infrastructure needed to support the latest speech algorithms
The SoundSourceAngleConfidence property represents the confidence level of the audio source location estimate
To monitor changes in the beam direction, the BeamAngleChanged event fires whenever the BeamAngle property of KinectAudioSource changes
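A sketch wiring up both angle notifications (sensor as above):

KinectAudioSource audio = sensor.AudioSource;
audio.BeamAngleChanged += (s, e) =>
    Console.WriteLine("Beam angle: {0:0.0}", e.Angle);
audio.SoundSourceAngleChanged += (s, e) =>
    Console.WriteLine("Source angle {0:0.0}, confidence {1:0.00}",
        e.Angle, e.ConfidenceLevel);
System.IO.Stream audioStream = audio.Start();   // starts capturing from the microphone array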
The SpeechRecognitionEngine class provides a series of methods for obtaining and managing the speech recognition engine (loading a grammar, starting speech recognition, ending speech recognition)
InstalledRecognizers() is a static method that returns a list of speech recognizers--all speech recognizers installed on the current system
The speech engine raises the following three events (a wiring sketch follows the list):
The SpeechRecognitionEngine.SpeechHypothesized event occurs each time a command attempt is processed; it passes the event handler a SpeechHypothesizedEventArgs object containing the best-matching word selected from the command set and an estimated confidence value
The SpeechRecognitionEngine.SpeechRecognized event occurs when the attempted command is recognized as a member of the command set; it passes the event handler a SpeechRecognizedEventArgs object containing the recognized command
The SpeechRecognitionEngine.SpeechRecognitionRejected event occurs when the attempted command is not recognized as a member of the command set; it passes the event handler a SpeechRecognitionRejectedEventArgs object
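A minimal recognition sketch (assumes the Microsoft.Speech runtime and a Kinect language pack are installed; the command words are placeholders, and sensor is a started KinectSensor):

using System.Linq;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

RecognizerInfo info = SpeechRecognitionEngine.InstalledRecognizers()
    .FirstOrDefault(r => r.Culture.Name == "en-US");
var engine = new SpeechRecognitionEngine(info.Id);

// Build a small command set and load it as a grammar
var commands = new Choices("start", "stop", "exit");
engine.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

engine.SpeechRecognized += (s, e) =>
    Console.WriteLine("Recognized: {0} ({1:0.00})", e.Result.Text, e.Result.Confidence);
engine.SpeechRecognitionRejected += (s, e) =>
    Console.WriteLine("Not a known command");

// Feed the Kinect microphone array into the engine (16 kHz, 16-bit mono PCM)
engine.SetInputToAudioStream(
    sensor.AudioSource.Start(),
    new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
engine.RecognizeAsync(RecognizeMode.Multiple);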
Improving recognition accuracy:
Increase the number of words in each recognition string
Design a trigger gesture: speech recognition turns on only when Kinect captures this particular gesture, and otherwise stays off
Face Tracking SDK
Face data that can be recognized (a usage sketch follows the list):
Feature point coordinates (100 facial feature points are identified and tracked, based on the depth and color images provided by Kinect)
Face Orientation
Bounding box
Parameters based on the CANDIDE-3 face model
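A usage sketch (assumes the Microsoft.Kinect.Toolkit.FaceTracking assembly from the Developer Toolkit; colorPixels, depthPixels, and skeleton are assumed to hold the current frame's data):

using Microsoft.Kinect.Toolkit.FaceTracking;

FaceTracker faceTracker = new FaceTracker(sensor);
FaceTrackFrame faceFrame = faceTracker.Track(
    sensor.ColorStream.Format, colorPixels,
    sensor.DepthStream.Format, depthPixels,
    skeleton);
if (faceFrame.TrackSuccessful)
{
    var featurePoints = faceFrame.GetProjected3DShape(); // tracked feature points
    var rotation = faceFrame.Rotation;                   // face orientation
    var box = faceFrame.FaceRect;                        // bounding box in the color image
}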
Kinect Human-Computer Interaction Development Practice