Development--Basic SDK and Windows programming tips (color image video streaming, depth image video streaming, skeleton tracking, audio processing, speech recognition API)
Depth data is the essence and soul of Kinect; many problems become pattern-recognition problems on depth images.
AForge.NET is a framework written in C# that provides computer vision and machine learning functionality (www.aforgenet.com)
Image processing demands significant computational resources; using a managed language like C# for the heavy lifting is often unwise, and OpenCV should be used instead
Application-Layer APIs in Detail
NUI API
Kinect Audio DMO: provides beamforming and sound source localization
Windows Speech SDK: provides audio, speech, and multimedia API sets, plus Microsoft's speech recognition capabilities
Kinect's core NUI API
Up to four Kinect devices can be connected to the same computer, but an application can enable skeleton tracking on only one of them. Multiple applications cannot share a single Kinect sensor at the same time
1. Get the KinectSensor instance
KinectSensor sensor = (from sensorToCheck in KinectSensor.KinectSensors
                       where sensorToCheck.Status == KinectStatus.Connected
                       select sensorToCheck).FirstOrDefault();
Or, equivalently, with a foreach loop:
foreach (KinectSensor kinectSensor in KinectSensor.KinectSensors)
{
    if (kinectSensor.Status == KinectStatus.Connected)
    {
        sensor = kinectSensor;
        break;
    }
}
2. Call the KinectSensor.Start method to initialize and start the Kinect sensor
3. Register the relevant events (such as the color or depth frame-arrival events and the skeleton tracking event), and call the SDK-provided APIs to process the data when these events fire:
KinectSensor.ColorFrameReady
KinectSensor.DepthFrameReady
KinectSensor.SkeletonFrameReady
KinectSensor.AllFramesReady
4. Call the KinectSensor.Stop method to shut down the Kinect sensor
The Kinect NUI API handles data from the Kinect sensor as a "pipeline". At initialization, the application specifies which sensor data it needs (color, depth, depth plus player index, skeleton tracking).
These options must be set at initialization, or the corresponding data will not be available.
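Putting the four steps together, a minimal sketch assuming the Kinect for Windows SDK v1.x (Microsoft.Kinect namespace); the lambda bodies are placeholders for real frame handlers:

using System;
using System.Linq;
using Microsoft.Kinect;

class KinectLifecycle
{
    static void Main()
    {
        // Step 1: take the first connected sensor.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null) return;

        // Specify the needed data up front; streams that are not
        // enabled here never deliver frames.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30);
        sensor.SkeletonStream.Enable();

        // Step 3: register frame-ready events.
        sensor.ColorFrameReady += (s, e) => { /* process color frame */ };
        sensor.DepthFrameReady += (s, e) => { /* process depth frame */ };
        sensor.SkeletonFrameReady += (s, e) => { /* process skeletons */ };

        // Step 2: initialize and start the sensor.
        sensor.Start();
        Console.ReadLine();

        // Step 4: shut the sensor down.
        sensor.Stop();
    }
}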
Kinect Audio DMO
To improve audio quality, the DMO provides:
echo cancellation and echo suppression
automatic gain control (the gain algorithm keeps the sound amplitude consistent as the user moves toward or away from the microphone)
beamforming
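These features are exposed as properties on KinectAudioSource. A hedged sketch of enabling them (SDK v1.x; the chosen modes are examples, not the only options):

// Configure the audio pipeline before starting the audio source.
KinectAudioSource audio = sensor.AudioSource;
audio.EchoCancellationMode = EchoCancellationMode.CancellationAndSuppression;
audio.AutomaticGainControlEnabled = true;      // keep amplitude consistent
audio.BeamAngleMode = BeamAngleMode.Adaptive;  // let the DMO steer the beam
System.IO.Stream audioStream = audio.Start();  // 16 kHz, 16-bit mono PCM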
The key object in the Microsoft.Speech class library is SpeechRecognitionEngine, which takes the noise-preprocessed audio data stream from the Kinect sensor, then analyzes and interprets it to match the most appropriate voice command.
SpeechRecognitionEngine recognizes speech commands against a defined grammar. A Grammar object consists of a series of individual words or phrases and is built with the GrammarBuilder class; a grammar can express alternatives through the Choices class and wildcards.
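A minimal sketch of wiring these classes together, assuming the Microsoft.Speech assemblies installed with the Kinect SDK; kinectAudioStream would be the stream returned by KinectAudioSource.Start() above, and the command words are made-up examples:

using System;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

class SpeechCommands
{
    static void StartSpeechRecognition(System.IO.Stream kinectAudioStream)
    {
        // Grammar: a verb followed by a color, e.g. "show red".
        GrammarBuilder commands = new GrammarBuilder();
        commands.Append(new Choices("show", "hide"));
        commands.Append(new Choices("red", "green", "blue"));

        SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
        engine.LoadGrammar(new Grammar(commands));
        engine.SpeechRecognized += (s, e) =>
            Console.WriteLine("Heard: " + e.Result.Text);

        // Kinect delivers 16 kHz, 16-bit, mono PCM audio.
        engine.SetInputToAudioStream(kinectAudioStream,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.RecognizeAsync(RecognizeMode.Multiple);
    }
}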
Data Flow Overview
1. Color image data
Image quality affects the transmission rate between the Kinect sensor and the computer.
Applications can set the encoding format for color images: RGB or YUV.
For example: a transmission speed of 30 frames per second at a resolution of 320*240.
2. Player segmentation data
Each pixel of the depth image consists of 2 bytes, 16 bits in total.
The high 13 bits of each pixel represent the distance, in millimeters, from the Kinect infrared camera to the nearest object.
The low 3 bits represent the index of the tracked player; the index is interpreted as an integer value, not as flag bits.
Do not hard-code a specific player index when writing code; the player index returned by Kinect skeleton tracking may change, even for the same person.
Player segmentation data makes it possible to separate the player's depth image from the full depth image and, with coordinate mapping, to further separate the player's color image from the full color image, achieving an "augmented reality" effect.
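A hedged sketch of reading the player index out of each depth pixel inside a DepthFrameReady handler (SDK v1.x; what to do with a player pixel is left as a comment):

static void SensorDepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (DepthImageFrame depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame == null) return;
        short[] pixels = new short[depthFrame.PixelDataLength];
        depthFrame.CopyPixelDataTo(pixels);

        for (int i = 0; i < pixels.Length; i++)
        {
            // Low 3 bits: player index (0 = no player, 1..6 = tracked players).
            int playerIndex = pixels[i] & DepthImageFrame.PlayerIndexBitmask;
            if (playerIndex > 0)
            {
                // This pixel belongs to a player; a background-removal app
                // would keep the mapped color pixel here and discard the rest.
            }
        }
    }
}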
3. Depth image data
Each pixel contains specific distance information.
For the Kinect infrared camera, the DepthImageStream indicates the current operating range (the DepthRange enumeration) and marks out-of-range pixels with the special depth values:
TooFarDepth
TooNearDepth
UnknownDepth
Each depth image pixel is 16 bits, so define a short array to store the depth image:
short[] depthPixelData = new short[depthFrame.PixelDataLength];
depthFrame.CopyPixelDataTo(depthPixelData);
For each point p(x, y) in the depth image, where depthFrame.Width is the depth image width, the distance from the target object to the Kinect is computed with a bitwise shift:
Int32 depth = depthPixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;
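For completeness, a small sketch of computing pixelIndex for a given point (variable names are illustrative):

// The pixel array is row-major: index = x + y * width.
int pixelIndex = x + y * depthFrame.Width;
// Shift out the 3 player-index bits to leave the 13 distance bits (mm).
int distanceInMm = depthPixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;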
How to get the data streams
1. Polling mode (pull)
First the image data stream is opened, then frame data is requested with a wait time T, in milliseconds. If the frame data is not ready, the call blocks for up to T milliseconds before returning. If the frame data returns successfully, the application can request the next frame and perform other operations on the same thread.
OpenNextFrame(T), where T is the maximum time to wait for new data to be returned
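A minimal polling sketch, assuming the depth stream has been enabled and the sensor started (the 100 ms timeout is an arbitrary choice):

// Pull model: block for up to 100 ms waiting for the next depth frame.
using (DepthImageFrame frame = sensor.DepthStream.OpenNextFrame(100))
{
    if (frame != null)
    {
        short[] pixels = new short[frame.PixelDataLength];
        frame.CopyPixelDataTo(pixels);
        // ... process the frame, then request the next one on this thread.
    }
}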
2. Event model
The application registers the FrameReady event for the data stream; when the event fires, the handler calls the Open*Frame method on the event arguments (for example, ColorImageFrameReadyEventArgs.OpenColorImageFrame) to get the data frame.
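A sketch of the event model for the color stream (SDK v1.x; the handler name is arbitrary):

// During initialization:
sensor.ColorFrameReady += SensorColorFrameReady;

static void SensorColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (ColorImageFrame frame = e.OpenColorImageFrame())
    {
        if (frame == null) return;  // the frame may already have been dropped
        byte[] pixels = new byte[frame.PixelDataLength];
        frame.CopyPixelDataTo(pixels);
        // ... render or analyze the 32-bit color pixel data.
    }
}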
You cannot use both modes on the same data stream.
The AllFramesReady event covers all three data streams; if an application registers the AllFramesReady event, any attempt to get data from one of those streams in polling (pull) mode will throw an InvalidOperationException.
In some applications, to keep the depth image and color image as synchronized as possible, you can use polling mode and compare the Timestamp property of the frames.
Skeleton Tracking
Skeleton Information Retrieval
1. Polling mode: SkeletonStream.OpenNextFrame
2. Event model: the KinectSensor.AllFramesReady event; once new skeleton data is ready, the event is triggered, and the handler calls OpenSkeletonFrame on the event arguments to get the frame
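A sketch of retrieving and iterating the skeleton data (SDK v1.x; written for the SkeletonFrameReady event, but AllFramesReady works the same way through its own event arguments):

static void SensorSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null) return;
        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);

        foreach (Skeleton skeleton in skeletons)
        {
            if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
            {
                SkeletonPoint head = skeleton.Joints[JointType.Head].Position;
                // head.X, head.Y, head.Z are in meters.
            }
        }
    }
}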
Skeleton Tracking Object Selection
If you need to select the tracked players manually, use the SkeletonStream.AppChoosesSkeletons property and the ChooseSkeletons method.
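A short sketch, assuming trackingId1 and trackingId2 were read from previously received skeletons:

// Take over tracking selection from the runtime, then pin two players
// by tracking ID (at most two skeletons are fully tracked at a time).
sensor.SkeletonStream.AppChoosesSkeletons = true;
sensor.SkeletonStream.ChooseSkeletons(trackingId1, trackingId2);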
NUI Coordinate conversion
MapDepthToColorImagePoint -- depth image coordinate system to color image coordinate system
MapDepthToSkeletonPoint -- depth image coordinate system to skeleton tracking coordinate system
MapSkeletonPointToColor -- skeleton tracking coordinate system to color image coordinate system
MapSkeletonPointToDepth -- skeleton tracking coordinate system to depth image coordinate system
Even at the same resolution, the pixels of a depth image frame cannot be mapped directly onto a color image frame, because the two cameras are located at different positions on the Kinect.
Three coordinate conversion methods in the DepthImageFrame class:
MapFromSkeletonPoint maps a skeleton joint point coordinate to a depth image point coordinate
MapToColorImagePoint maps a point coordinate in the depth image to the point coordinate in the synchronized color image frame
MapToSkeletonPoint maps a point coordinate in the depth image to the point coordinate in the corresponding skeleton data frame
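For example, a hedged sketch of projecting a tracked head joint onto the color image with KinectSensor.MapSkeletonPointToColor (SDK v1.x; the format argument must match the enabled color stream):

// Project a skeleton-space joint (meters) into color image space (pixels).
SkeletonPoint head = skeleton.Joints[JointType.Head].Position;
ColorImagePoint colorPoint = sensor.MapSkeletonPointToColor(
    head, ColorImageFormat.RgbResolution640x480Fps30);
// colorPoint.X / colorPoint.Y can now be used to draw over the color frame.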
The z-axis is the optical axis of the infrared camera, perpendicular to the image plane; the intersection of the optical axis with the image plane is the origin of the image coordinate system.
Both the depth image coordinate system and the skeleton tracking coordinate system are Kinect camera coordinate systems: the origin is the infrared camera center, the x-axis and y-axis are parallel to the image's x-axis and y-axis, and the z-axis is the infrared camera's optical axis, perpendicular to the image plane.
Screen coordinate system -- the origin is the upper-left corner, with the x-axis positive to the right and the y-axis positive downward.
Depth image space coordinates -- in millimeters
Skeleton space coordinates -- in meters
Sensor array and tilt compensation
Each skeleton frame includes a value describing gravity. The value is computed from the internal three-axis accelerometer and the sensor's image measurements. When the sensor is in motion, the accelerometer measures the direction of gravity and the remaining horizontal and vertical vectors.
Skeleton Mirroring
Non-mirrored skeleton tracking is not available in the SDK.
Implementing a mirrored skeleton is simple: negating the x-coordinate of each skeleton joint achieves the effect.
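A minimal sketch; Joint and SkeletonPoint are value types in SDK v1.x, so the modified position must be written back into the joint collection:

// Mirror a skeleton by negating the X component of every joint position.
static void MirrorSkeleton(Skeleton skeleton)
{
    foreach (JointType type in Enum.GetValues(typeof(JointType)))
    {
        Joint joint = skeleton.Joints[type];  // Joint is a struct (a copy)
        SkeletonPoint p = joint.Position;
        p.X = -p.X;                           // flip across the vertical axis
        joint.Position = p;
        skeleton.Joints[type] = joint;        // write the copy back
    }
}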