Working principle of Kinect


"You are the controller ." (You are the controller.) If you are interested in Kinect, I believe you have heard of this powerful advertisement word. From Kinect Adventures! In the Zune play interface, the hands and feet and plugging the water holes, waved to change the song, the Kinect opened a more natural way of entertainment interaction. In this blog article, I will reveal the secrets behind this somatosensory system and how it allows developers to create a Kinect experience. Arjun Dayal, the Project Manager of the Kinect team, shows how to control the Xbox dashboard and the Kinect hub through gestures. First, let's start with guiding the concept and principles of the Kinect R & D.

We live in an analog world

Traditional programming is built on a series of rules: cause and effect, black or white, true or false. This approach works well when modeling a simple system with a limited number of inputs and outputs. Take the game Halo, for example: press a button and the Master Chief jumps; push the left stick forward and he moves forward; push the right stick up and he looks up. Either A or B. Unfortunately, the real world we live in is not digital like this; it is analog.
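As a toy sketch in Python, the rule-based model looks something like this. The control and action names are invented for illustration; they are not taken from any actual game code.

```python
# A toy illustration of rule-based input handling: every input maps to
# exactly one predetermined outcome. Either A or B, nothing in between.
def handle_input(control: str) -> str:
    rules = {
        "button_a": "jump",
        "left_stick_forward": "move forward",
        "right_stick_up": "look up",
    }
    # Unknown inputs simply do nothing; the system has no notion of "maybe".
    return rules.get(control, "do nothing")

print(handle_input("button_a"))  # -> jump
```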

In an analog world, there is not just "yes" and "no" but also "maybe." There is not just "right" and "wrong" but everything in between. Imagine all the possible variations of a simple wave of the hand: differences in physical intensity, in environment, in clothing, and in movement shaped by cultural background. You might need to account for something on the order of 10 to the 23rd power of possibilities. Clearly, it is unrealistic to solve such a problem with traditional programming methods.

From the very beginning, we knew we had to solve this problem in a completely new way, one closer to how the human brain works. When you meet someone, your brain immediately focuses on them and identifies them from experience. This process is not implemented through hundreds of layers of decision trees. A baby struggles to tell two people apart, but after years of learning and training we can do it within seconds. In fact, you can probably estimate a stranger's age, gender, mood, and even personality with fair accuracy. This ability is part of what makes us human.

Kinect was built in a similar way. It observes the world around it and watches your movements. Even if it has never seen you wave before, it can quickly infer the meaning of your action from the terabytes of data it has learned from.

Kinect sensor

At the core of Kinect's skeletal tracking is a CMOS infrared sensor that can perceive the world regardless of the ambient lighting conditions. The sensor sees the environment as a black-and-white spectrum: pure black represents infinite distance, pure white represents infinitely close, and the shades of gray in between correspond to the physical distance from an object to the sensor. By collecting every point in its field of view, it forms a depth image of the surroundings. The sensor produces this depth image stream at 30 frames per second, reconstructing the surrounding environment in 3D in real time. If you have ever played with a pin-art toy, the idea may be easier to grasp: press your hands (or your face, if you like) into the pins and you create a rough 3D model of that part of your body.
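To make the black-and-white depth encoding concrete, here is a minimal Python sketch. The frame size, range limit, and raw millimeter format are assumptions for illustration, not the sensor's actual specifications.

```python
import numpy as np

# A minimal sketch of the depth-to-grayscale encoding described above,
# assuming a hypothetical 640x480 depth frame of millimeter distances.
# Pure white = infinitely close, pure black = infinitely far.
def depth_to_grayscale(depth_mm: np.ndarray, max_range_mm: int = 4000) -> np.ndarray:
    # Clip to an assumed usable range, then invert so that nearer
    # pixels are brighter (255) and farther pixels darker (0).
    clipped = np.clip(depth_mm, 0, max_range_mm)
    gray = 255 - (clipped * 255 // max_range_mm)
    return gray.astype(np.uint8)

frame = np.random.randint(500, 4000, size=(480, 640))  # fake depth data
print(depth_to_grayscale(frame).shape)  # (480, 640) grayscale image
```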

Finding what moves

The next thing Kinect does is look for moving objects in the image that are most likely to be human, much as the human eye subconsciously focuses on movement. Kinect then evaluates the depth image pixel by pixel to identify the different parts of the human body, and the whole process must be optimized to keep response times short.

Kinect uses a segmentation strategy to separate the human body from the background environment, that is, to extract the useful signal from the noise. Kinect can actively track the full skeletons of up to two players, or passively track the shape and position of up to four players. At this stage, a so-called segmentation mask is created for each tracked player in the depth image: a depth image with background objects (such as chairs and pets) removed. In subsequent processing, only the segmentation-mask portion is passed along, reducing the amount of computation.
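Here is a rough sketch of the segmentation-mask idea, assuming a hypothetical per-pixel player-index map alongside the depth frame; the real pipeline's data formats are not shown in the article.

```python
import numpy as np

# A hedged sketch of segmentation masking: keep depth values only where
# a pixel is labeled as the tracked player, so later stages process just
# the player's silhouette instead of the full frame.
def apply_segmentation_mask(depth: np.ndarray, player_index: np.ndarray,
                            player: int) -> np.ndarray:
    # Background pixels (chairs, pets, walls) are zeroed out.
    return np.where(player_index == player, depth, 0)

depth = np.random.randint(500, 4000, size=(480, 640))   # fake depth frame
labels = np.random.randint(0, 3, size=(480, 640))       # 0 = background
player_one = apply_segmentation_mask(depth, labels, player=1)
```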

 

Kinect brain

This is where the real magic happens. Each pixel of the segmented player image is fed into a machine learning system that recognizes the human body. The system then outputs the probability that a given pixel belongs to each body part. For example, a pixel might have an 80% probability of belonging to a foot, 60% to a leg, and 40% to the chest. It might seem natural to simply take the highest probability as the answer, but that would be too hasty. Instead, all of these probabilities are passed into the next stage of the pipeline, and the final judgment is deferred to the last step.
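The key design choice here is deferring the decision. A simplified sketch, using the illustrative probabilities from the text rather than any real model output:

```python
# A simplified sketch of the idea above: rather than committing to the
# argmax per pixel, the full set of candidates is carried forward.
pixel_probs = {"foot": 0.8, "leg": 0.6, "chest": 0.4}  # illustrative only

# Premature decision: pick the single most likely part right now.
best_part = max(pixel_probs, key=pixel_probs.get)  # "foot"

# What the pipeline does instead: pass every candidate downstream and let
# the final skeleton-fitting stage resolve the ambiguity with more context.
candidates = sorted(pixel_probs.items(), key=lambda kv: -kv[1])
print(best_part, candidates)
```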

After reading this introduction, you may be wondering how we taught Kinect to recognize human body parts. Developing this artificial intelligence (known as the Exemplar system) was no easy task: tens of terabytes of data were fed into a cluster to teach Kinect, at the pixel level, to recognize the hands, feet, and other body parts it sees. (The original article included an image here showing one of the samples used to train and test Exemplar.)

Model matching: generating the skeleton system

The final step of the pipeline uses the output of the previous stage to generate a skeleton system based on 20 tracked joints. Kinect evaluates every candidate pixel output by Exemplar to determine the joint positions, so the body's pose is estimated as accurately as the available information allows. In addition, output filters are applied at this model-matching stage to smooth the output and handle special cases such as occluded joints.
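The article does not specify which filters are used, but a simple exponential smoothing filter illustrates the general idea of stabilizing a tracked joint across frames:

```python
# A minimal sketch of one plausible output filter: exponential smoothing
# of a joint position over time. This is an assumption for illustration,
# not the Kinect pipeline's actual filter.
def smooth_joint(prev: tuple, raw: tuple, alpha: float = 0.5) -> tuple:
    # alpha near 1.0 trusts the new measurement; near 0.0 it trusts the
    # history, trading responsiveness for stability (less jitter).
    return tuple(alpha * r + (1 - alpha) * p for p, r in zip(prev, raw))

position = (0.0, 1.0, 2.0)      # smoothed (x, y, z) from the last frame
measurement = (0.1, 1.05, 1.9)  # raw joint estimate for this frame
position = smooth_joint(position, measurement)
print(position)  # (0.05, 1.025, 1.95)
```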

One of the goals of the skeletal tracking system is to present the various outputs of the pipeline as a menu of options. Game developers can pick any combination of system components to build a wide variety of game experiences. For example, you could use just the segmentation mask to create some stunning visual effects (Your Shape: Fitness Evolved is a good example).

At this point, we have a complete real-time motion-sensing system that can be used to control games and entertainment. Next, Arjun will introduce the improved Xbox dashboard and the Kinect hub. He will show how these two user interfaces use the depth image stream and the 20-joint skeleton system to create a natural, gesture-based, all-new way to access games, movies, music, and other entertainment.

 

Kinect: how technology finally understands you

Today, technology plays an important role in our daily lives, but until now technology products have not done well at truly understanding human intent or adapting to individual differences in style. The arrival of Kinect changes all that. Stand in front of Kinect and it knows who you are; it can even tell you apart from your partner. When you move, the sensor tracks you instantly. Want to interact? You can play movies, play games, and chat with friends using your voice and body, with no new control scheme to learn. It feels like magic!

Previously, Ron, a Program Manager on the Kinect team, described the deep technology behind the Kinect sensor's real-time tracking of player movement, but how do we put it to best use? Our goal was to let players control the Xbox as freely as possible while keeping every control gesture easy for all users to learn and understand. Next, we will dig into this integration and talk about the Kinect hub and dashboard experience.

Gestures: where do you start?

When you hear that we wanted to design a gesture for moving an object up or down, you might think: "No problem, move your hand to the object, select it, and move it in the desired direction!"

Wait, don't be so confident. Ask your friends what they would do, and you may be surprised to find their answers differ from yours. Is your method better? Not necessarily; it just feels more logical to you. What makes humans remarkable is that we can accomplish a given task in many different ways. Take driving as an example. Ask 100 people to mime how they drive and you will get many different answers. Some hold the wheel with both hands at the ten and two o'clock positions, some hold only the twelve o'clock position with one hand, and some lean back in the seat; likewise, miming a foot on the accelerator, brake, and clutch varies just as widely. All of these methods let us drive, and technology's job is to recognize all of them: to make technology understand you!

How complicated can recognizing a seemingly simple action be? Take reaching out a hand as an example. When you reach out to grab something, you imagine your hand moving straight out, perpendicular to the plane of your body. In fact, because of how the shoulder and arm joints work together, you cannot reach out in a perfectly straight line. Every person performs the reach slightly differently, yet everyone considers it the same reaching motion. Successful gesture recognition means understanding these nuances of human movement and teaching technology to accommodate the differences.

In developing this revolutionary product, we had to overcome these challenges while keeping the product easy to use. Every decision we made was unprecedented in the field of human-computer interaction, and our work may redefine the future of interactive entertainment technology.

Gesture prototyping: distilling the essence

We used a common method to create the control gestures for screen navigation: record every idea we could think of, such as selecting menus with the feet. Once we realized there were far too many such ideas, we knew we needed a more reliable way to choose among them.

We collected and recorded all the ideas and prototyped them one by one to test which worked best for ordinary users. Prototype testing with ordinary users is essential; through it we learned a great deal about human movement and used that knowledge to re-tune each new test. Existing human-computer interaction rules do not always apply to motion-sensing interaction in the living room, at a distance of roughly ten feet. Testing helped us better understand user behavior, such as how comfortable a gesture remains when held for a long time, and whether the control gestures we created conflicted with natural human gestures.

Our philosophy during testing was to fail fast and keep what works. We constantly discarded solutions that didn't fit and retained those that did. The engineering, user research, and design teams were all fully involved in prototyping the gesture set and testing it with ordinary users to determine the best gestures from all the data gathered.

After months of testing, observation, and research, we arrived at a simple, easy-to-understand control scheme: hover selection and paging control. Hover selection is an easy-to-learn, highly reliable, and predictable mechanism, while paging provides a more tactile way to move through on-screen content.
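As an illustration of how a hover-selection mechanism might work, here is a minimal sketch with an assumed dwell time; the actual timings and hit-testing logic are not described in the article.

```python
import time

# A hedged sketch of hover selection: hold the hand cursor over a target
# until a dwell timer fills, then the selection fires. The 1.5-second
# dwell and the hit-test input are assumptions for illustration.
DWELL_SECONDS = 1.5

class HoverButton:
    def __init__(self, name: str):
        self.name = name
        self.hover_started = None  # timestamp when the hand entered

    def update(self, hand_is_over: bool) -> bool:
        """Return True once the hand has dwelled long enough to select."""
        if not hand_is_over:
            self.hover_started = None  # leaving the target resets the timer
            return False
        if self.hover_started is None:
            self.hover_started = time.monotonic()
        return time.monotonic() - self.hover_started >= DWELL_SECONDS
```

Called once per frame with whether the hand cursor overlaps the button, this makes selection predictable: nothing fires accidentally, and backing out just means moving your hand away.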

Let's look more closely at the details of this control model through its implementation in the Xbox dashboard and the Kinect hub.

The Kinect hub: home base for the Kinect experience!

The Kinect hub is the center of the Kinect experience in the Xbox dashboard; from it you can use gestures to access all Kinect content. The hub's design is simple and easy to understand: notice the large, clearly colored tiles that let users easily find and select what they want to do.
