Kinect for Xbox 360 does not support "close-to-scene mode"
Three eyes: an infrared projector, an RGB camera, and an infrared depth camera. Each pixel in the color image corresponds to one pixel in the depth image.
Four ears: an L-shaped microphone array that filters background noise and locates the sound source. The direction of the source is determined from the differences in the time at which each microphone receives the sound.
Tilt motor: a drive motor used to control the elevation angle
PS1080 SoC chip: depth image acquisition
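The time-difference idea behind the microphone array can be sketched for a single pair of microphones. This is only an illustration, not Kinect's actual audio pipeline: it assumes a far-field source, a speed of sound of 343 m/s, and a hypothetical function name and microphone spacing.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def arrival_angle(delta_t, mic_spacing):
    """Angle of the source from broadside, in radians, for one microphone pair.

    delta_t is the arrival-time difference in seconds and mic_spacing the
    distance between the two microphones in meters (far-field assumption).
    """
    ratio = SPEED_OF_SOUND * delta_t / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.asin(ratio)


# A source straight ahead reaches both microphones at the same time.
angle = arrival_angle(0.0, 0.1)
```

With more than two microphones, several such pairwise estimates can be combined, which is one reason an array is used rather than a single pair.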
Kinect for Xbox 360: effective viewing range 1.2-3.5 m; nearest distance: 2.26 meters
Kinect for Windows: default mode (0.8-4.0 m); close-to-scene mode, called near mode in the Kinect for Windows SDK (0.4-3.5 m)
In close-to-scene mode, skeleton tracking covers only the hip-center joint, not the complete 20-joint skeleton.
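The Kinect for Windows ranges above can be written down as a small lookup. The numbers are the ones from these notes, in meters; the dictionary and function names are made up for illustration and are not part of any SDK.

```python
# Ranges in meters, copied from the notes above (names are hypothetical).
DEPTH_RANGES_M = {
    "default": (0.8, 4.0),
    "near": (0.4, 3.5),  # the close-to-scene mode
}


def in_range(distance_m, mode="default"):
    """True if a distance falls inside the usable range of the given mode."""
    lo, hi = DEPTH_RANGES_M[mode]
    return lo <= distance_m <= hi


# An object at 0.5 m is only visible in close-to-scene mode.
visible_near = in_range(0.5, "near")
visible_default = in_range(0.5, "default")
```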
Working principle of Kinect
Three categories of raw data: the depth data stream, the color video stream, and raw audio data
These correspond to three processing pipelines: skeletal tracking, identity recognition, and the speech pipeline.
Skeleton tracking: separates players from the background in the depth data in order to find them. Using the results of machine learning, it quickly classifies human body parts, identifies joints, and builds a skeleton model.
Identity recognition (motion recognition and face recognition): the face is divided into several key facial landmarks, and face matching is performed on features taken from the color video.
Speech recognition: a beam mechanism for sound source localization, a noise suppression mechanism that automatically filters environmental noise, and voice command recognition
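The first step of skeleton tracking, separating players from the background in the depth data, can be illustrated with plain background subtraction. This is a swapped-in, much simpler technique than whatever Kinect really does; it assumes a depth frame of the empty scene is available, that depths are in millimeters, and that 0 means "no reading".

```python
def segment_player(depth, background, min_diff_mm=100):
    """Mark pixels that are clearly closer than the empty-scene background.

    depth and background are equal-sized lists of rows of depths in mm.
    A pixel with depth 0 has no reading and is never marked.
    """
    mask = []
    for d_row, b_row in zip(depth, background):
        mask.append([d > 0 and (b - d) > min_diff_mm
                     for d, b in zip(d_row, b_row)])
    return mask


background = [[3000, 3000, 3000] for _ in range(3)]
frame = [[3000, 3000, 3000],
         [3000, 2000, 0],
         [3000, 2950, 3000]]
mask = segment_player(frame, background)  # only the 2000 mm pixel is marked
```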
Skeleton tracking
Compression and sensing of skeleton joints (the depth image sampling rate of Kinect is not the main cause of latency and reduced spatial accuracy; the limit is the processing speed of the chip and of the software)
Action Recognition
Actions can be abstracted as a state, or a sequence of states, of the skeleton nodes.
Static posture
Movements: command-word recognition, natural semantics
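The "state or sequence of states" view can be sketched as follows: classify each frame into a static posture, then look for a posture sequence. The joint names, the posture labels, and the rule "hand above head" are all invented for this example, and y is assumed to point up.

```python
from itertools import groupby


def posture(joints):
    """Classify one frame; joints maps a joint name to (x, y, z), y pointing up."""
    head_y = joints["head"][1]
    left_up = joints["hand_left"][1] > head_y
    right_up = joints["hand_right"][1] > head_y
    if left_up and right_up:
        return "both_up"
    if left_up or right_up:
        return "one_up"
    return "down"


def contains_gesture(frames, pattern):
    """True if the per-frame postures, with repeats collapsed, contain pattern."""
    states = [key for key, _ in groupby(posture(f) for f in frames)]
    n = len(pattern)
    return any(states[i:i + n] == list(pattern)
               for i in range(len(states) - n + 1))


def frame(left_y, right_y):
    """Build a test frame with the head at 1.6 m and the hands at given heights."""
    return {"head": (0.0, 1.6, 2.0),
            "hand_left": (-0.3, left_y, 2.0),
            "hand_right": (0.3, right_y, 2.0)}


frames = [frame(1.0, 1.0), frame(1.0, 1.0), frame(1.8, 1.0),
          frame(1.8, 1.8), frame(1.8, 1.8), frame(1.0, 1.0)]
raised = contains_gesture(frames, ["one_up", "both_up"])
```

A static posture is then just a one-element pattern, while a movement is a longer one.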
Face Recognition
RGB camera: the 640×480 resolution is limited, so a compromise is used: extracting mid-level features of the face
Ready-made API interfaces, such as those provided by face.com, can be used.
Speech Recognition
Voice commands, voice feature recognition, language recognition, word segmentation, and detection of tone and emotion
In "Xbox, let's play", the word "Xbox" is the equivalent of a command prompt
Kinect detection has nothing to do with temperature!
Basically, Kinect looks for an object shaped like the character "大" ("big"), that is, one close to the proportions of the human body.
Kinect detection does not depend on whether the lights are on
For opaque objects, Kinect can see their "depth". However, luminous objects will affect the detection.
How Kinect renders depth images: the same principle as colored contour lines. A color gradient from near to far makes the image look more layered.
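The contour-line style of rendering amounts to mapping each depth to a color on a gradient. The sketch below uses an arbitrary red-to-blue gradient and assumed near and far limits in millimeters; the actual palette used by any Kinect viewer may differ.

```python
def depth_to_color(depth_mm, near_mm=800, far_mm=4000):
    """Map a depth to an (R, G, B) triple: near is red, far is blue.

    A depth of 0 (no reading) is drawn black; values outside the range
    are clamped to the nearest limit.
    """
    if depth_mm <= 0:
        return (0, 0, 0)
    clamped = min(max(depth_mm, near_mm), far_mm)
    t = (clamped - near_mm) / (far_mm - near_mm)
    return (round(255 * (1 - t)), 0, round(255 * t))


row = [0, 800, 2400, 4000]
colors = [depth_to_color(d) for d in row]
```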
Depth image imaging principle
Types of infrared depth camera: time of flight (TOF), and structured light measurement based on light coding
Principle of structured light scanning: structured light is first projected onto the object surface, and a camera then captures the pattern reflected from that surface. Because the captured pattern is deformed by the three-dimensional shape of the object, the position and degree of deformation of the pattern in the camera image can be used to calculate the spatial information of the object surface.
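How a pattern shift turns into a distance can be illustrated with the standard triangulation relation z = f * b / d. This assumes a pinhole camera and a rectified projector-camera pair, neither of which is stated in these notes, and the focal length and baseline below are hypothetical values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth in meters from the shift of a projected feature.

    focal_px is the focal length in pixels, baseline_m the projector-camera
    distance in meters, disparity_px the observed shift in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: a larger shift means a closer surface.
far_surface = depth_from_disparity(580.0, 0.075, 14.5)
near_surface = depth_from_disparity(580.0, 0.075, 29.0)
```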
Light coding technology: laser speckle, the random diffraction spots formed when laser light hits a rough surface or passes through ground glass. These speckles are highly random and change pattern with distance, so the patterns at any two places in space are different. In effect the whole space is marked: when an object is put into this space, you only need to look at the pattern on the object to know where it is. The measurement accuracy depends only on the density of the reference planes taken during calibration, not on the spatial geometry.
Calibration of the light source: take a reference plane at regular intervals and record the speckle pattern on each one. These are used for 3D reconstruction of objects (interpolation is also required).
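The reference-plane idea can be sketched as: correlate the observed speckle patch with the pattern recorded at each calibrated distance, take the best match, and interpolate between neighbors. Normalized cross-correlation and the parabolic peak fit are choices made for this example, not details given in the notes, and the patterns are toy data.

```python
def correlation(a, b):
    """Normalized cross-correlation of two equal-length intensity patches."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a)
           * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def estimate_distance(patch, references):
    """references: (distance, pattern) pairs, evenly spaced, sorted by distance."""
    scores = [correlation(patch, pattern) for _, pattern in references]
    best = max(range(len(scores)), key=scores.__getitem__)
    distance = references[best][0]
    if 0 < best < len(scores) - 1:
        s0, s1, s2 = scores[best - 1], scores[best], scores[best + 1]
        curvature = s0 - 2 * s1 + s2
        if curvature != 0:
            # Vertex of the parabola through the three neighboring scores.
            step = references[best + 1][0] - references[best][0]
            distance += 0.5 * (s0 - s2) / curvature * step
    return distance


references = [(0.5, [1, 5, 2, 8, 3, 9]),
              (1.0, [7, 1, 6, 2, 9, 4]),
              (1.5, [3, 3, 8, 1, 5, 7])]
estimate = estimate_distance([7, 1, 6, 2, 9, 4], references)
```

Denser reference planes leave less for the interpolation to guess, which matches the note that accuracy depends on the reference density.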
From depth image to skeleton
The sofa and other objects behind you interfere with skeleton extraction, so this is a process of extracting useful information from noise
Look for objects shaped like the character "大" ("big"), determine which parts belong to the human body (a series of computer vision operations), extract the target feature points, and extract the player from the background image
Human body part classification: machine learning (32 different parts)
Kinect first identifies the human body and then infers the joints. This is an approximate probability matching and evaluation process: pixels are scanned one by one, working from the local to the whole
Long-term skeleton inference: the result is derived from neighboring nodes and from machine learning
Kinect's brain: a huge tree structure produced by machine learning (a decision tree)
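Per-pixel classification with a decision tree can be sketched as below. The tree here is hand-built and tiny, whereas the notes describe a huge learned one; the node layout, the depth-difference feature, and the labels are assumptions for illustration only.

```python
FAR_MM = 10_000  # stand-in depth for pixels that are off-image or unread


def depth_at(depth, x, y):
    """Depth in mm at (x, y); off-image or zero readings count as far away."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]) and depth[y][x] > 0:
        return depth[y][x]
    return FAR_MM


def classify_pixel(tree, depth, x, y):
    """Walk the tree, comparing a depth difference to a threshold at each node."""
    node = tree
    while "label" not in node:
        dx, dy = node["offset"]
        feature = depth_at(depth, x + dx, y + dy) - depth_at(depth, x, y)
        node = node["left"] if feature < node["threshold"] else node["right"]
    return node["label"]


# Hand-built toy tree: if the pixel above is much farther, call it "head".
tree = {"offset": (0, -1), "threshold": 500,
        "left": {"label": "torso"},
        "right": {"label": "head"}}
depth = [[0, 0, 0],
         [0, 2000, 0],
         [0, 2000, 0]]
top = classify_pixel(tree, depth, 1, 1)
below = classify_pixel(tree, depth, 1, 2)
```

A learned tree would have many levels and one leaf distribution per body part, but each pixel would still be classified by this kind of cheap comparison.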
Within Kinect, hardware acquisition and chip processing are not the main cause of latency; software processing is the main bottleneck.
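The step noted earlier, inferring joints once pixels have been assigned to body parts, can be approximated by a probability-weighted centroid per part. This is a deliberate simplification chosen for the example, not necessarily the estimator Kinect uses, and the tuple layout is hypothetical.

```python
def joint_positions(pixels):
    """Weighted mean position per body part.

    pixels is an iterable of (x, y, depth, part, probability) tuples.
    Returns a dict mapping each part to its (x, y, depth) centroid.
    """
    sums = {}
    for x, y, z, part, p in pixels:
        acc = sums.setdefault(part, [0.0, 0.0, 0.0, 0.0])
        acc[0] += p * x
        acc[1] += p * y
        acc[2] += p * z
        acc[3] += p
    return {part: (sx / w, sy / w, sz / w)
            for part, (sx, sy, sz, w) in sums.items() if w > 0}


pixels = [(0, 0, 2.0, "head", 1.0),
          (2, 0, 2.0, "head", 1.0),
          (1, 5, 2.1, "torso", 0.5)]
joints = joint_positions(pixels)
```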