Kinect Development: Gesture Recognition (I)

Just as the click is the core interaction of the GUI platform and the tap is the core of the touch platform, the gesture is the core of Kinect applications.

The definition of a gesture centers on its ability to communicate: the meaning of a gesture lies in what it expresses rather than in what it executes.

In human-computer interaction, gestures are usually used to issue simple commands rather than to communicate facts, describe problems, or present ideas.

Using gestures to operate a computer is usually imperative, which is not what gestures are for in everyday life. For example, a wave is usually a greeting in the real world, but greeting is rarely the point in human-computer interaction. The first program one writes typically prints "Hello", yet we have no real interest in greeting the computer.

In a busy restaurant, however, the same wave takes on a different meaning: it catches a waiter's attention and signals that service is needed. Attracting the computer's attention has a similar significance. Today, a sleeping computer is usually woken by hitting the keyboard or moving the mouse, reminding it to "pay attention". With Kinect you can do this more intuitively: raise your hand or simply wave at the computer, as you would to a person, and it wakes from sleep.

Whatever Kinect gesture we use, its meaning must be shared between the users of the application and its designers and developers.

The natural user interface (NUI) is a collection of technologies, including speech recognition, multi-touch, and motion-sensing interfaces like Kinect. It stands in contrast to the graphical user interface (GUI), the familiar mouse-and-keyboard interaction model of Windows and Mac operating systems.

NUI design takes full advantage of skills users already have for interacting with an interface, skills so ingrained that users have forgotten where they learned them.

Natural user interfaces rely on prior knowledge and on direct, unmediated interaction. These two traits are common to every NUI.

A common feature of buttons in a GUI is the hover state, which indicates that the cursor is correctly positioned over the button. The hover state is distinct from the click action, and it can present additional information about the button. When buttons are ported to a touch interface, the hover state is lost: a touch screen can only respond to touches. Compared with buttons in a desktop GUI, touch buttons therefore offer only a click capability, with no pointing (hover) capability.

In a Kinect-based graphical interface, button behavior is the opposite of the touch interface: a button offers only the hover (pointing) capability and has no native ability to be "clicked". This weakness has frustrated user-experience designers and, over the past few years, has pushed them to keep refining Kinect buttons with ever more clever ways to activate visual elements. These refinements include hovering over a button for a fixed period of time and pushing the palm forward (an awkward imitation of pressing a button).

Natural interaction interfaces are generally divided into three types: voice interfaces, touch interfaces, and gesture interfaces.

In gesture interfaces, pure gestures, poses, tracking, and combinations of these make up the basic vocabulary of interaction. For Kinect, there are currently eight common gestures: wave, hover button, magnet button, push button, magnetic slide, universal pause, vertical scrolling, and swiping.

Interaction design involves two usability concepts: affordance and feedback. Feedback tells the user that an operation is in progress. On a web page, a clicked button shifts slightly to show that the interaction succeeded; in a sense, the sound of a physical mouse click is also feedback that the mouse is working.

If feedback occurs during or after an operation, affordance occurs before it. An affordance is a hint or cue that tells the user a visual element can be interacted with and suggests how to use it. In the GUI, the button is the element that embodies these concepts best: it advertises its function through text or an icon, and its hover state signals that it will respond to the user's intent.

The above is basically all preamble ~~ the following describes how to actually implement gesture recognition.

There are three basic approaches to recognizing gestures: algorithm-based, neural-network-based, and sample-library (template) based. Each has its own advantages and disadvantages; which one a developer chooses depends on the gestures to be recognized, project requirements, development time, and skill level. Algorithm-based recognition is relatively simple and easy to implement, while neural networks and gesture sample libraries are more complex.

The basic process of the algorithmic approach is to define processing rules and conditions that the input must satisfy. In gesture recognition, the result of such an algorithm is binary: a movement either matches the predefined gesture or it does not. This simplest and most direct method has a matching drawback: its simplicity limits the kinds of gestures it can recognize. A good wave-recognition algorithm cannot recognize throwing or swinging motions; the former is simple and regular, the latter subtle and variable.
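To make the binary nature of this approach concrete, here is a minimal illustrative rule, not part of this article's wave recognizer, written against the Kinect SDK 1.x types used later in the article. The pose either satisfies every condition or it does not:

// An illustrative algorithmic rule: a hard-coded yes/no check on joint positions.
private static bool IsHandAboveHead(Skeleton skeleton)
{
    Joint hand = skeleton.Joints[JointType.HandRight];
    Joint head = skeleton.Joints[JointType.Head];

    // Both joints must be tracked, and the hand must be higher than the head.
    return hand.TrackingState != JointTrackingState.NotTracked
        && head.TrackingState != JointTrackingState.NotTracked
        && hand.Position.Y > head.Position.Y;
}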

Algorithms also have an inherent scalability problem. Although some code can be reused, each gesture needs its own custom recognition algorithm, so as new gesture algorithms are added, the class library grows rapidly. This hurts program performance, because recognizing which gesture is being performed may require running many algorithms against the data.

Finally, each recognition algorithm needs its own parameters, such as time intervals and thresholds, and this becomes especially apparent when tuning recognition gesture by gesture. Developers must test and experiment repeatedly to find appropriate parameter values for each algorithm. This is challenging, tedious work, and the recognition of each gesture brings its own special problems.

 

For example, a jump gesture means that the user briefly jumps up and the feet leave the ground. This definition does not provide enough information to recognize the action. At first glance the movement seems simple enough for algorithmic recognition, but first, there are many kinds of jumps: the basic jump, the hurdle, the long jump, the hop, and so on. The bigger problem is that, because of the limits of Kinect's field of view, the floor cannot always be detected, which makes it hard to determine when the feet leave it. Imagine a user who bends at the knees and then jumps up: should the recognition engine treat this as one gesture or several, a squat, a jump, or both? If the user squats longer than they jump, the gesture may be recognized as a squat rather than a jump. Such poses are hard to define precisely, which makes them impossible to recognize with hand-written rules; at the same time, the rules and conditions multiply until the algorithms become unmanageable and unstable. A right-or-wrong binary strategy is too simple and not robust enough to distinguish similar actions such as jumps and squats.

Neural networks organize and decide based on statistics and probability, which makes them well suited to a fuzzy process like gesture recognition. A network-based recognition engine reports, for example, that a movement has an 80% probability of being a jump and a 10% probability of being a squat.
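The article does not implement such a network, but the shape of its output can be sketched. In this hypothetical snippet (the gesture names, scores, and confidence threshold are all invented), the engine returns a probability per gesture rather than a yes/no answer, and the caller decides how much confidence is enough to act on:

using System;
using System.Collections.Generic;
using System.Linq;

class ProbabilisticOutputDemo
{
    static void Main()
    {
        // Hypothetical output of a probabilistic recognizer: one score per gesture.
        var probabilities = new Dictionary<string, double>
        {
            { "Jump",  0.80 },
            { "Squat", 0.10 },
            { "Wave",  0.10 }
        };

        // Pick the most likely gesture, but only act when it clears a confidence bar.
        var best = probabilities.OrderByDescending(p => p.Value).First();
        if (best.Value >= 0.75)
            Console.WriteLine("Recognized: {0} ({1:P0})", best.Key, best.Value);
    }
}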

Besides recognizing complex and subtle gestures, the neural-network approach also solves the scalability problem of algorithm-based recognition. A neural network contains many neurons, each of which is a small algorithm that judges one tiny part of a gesture's movement. Many gestures can share neurons within the network, but each gesture is recognized by its own unique combination of neurons, and neurons process information through an efficient data structure. This makes recognition highly efficient.

Compared with the algorithmic approach, however, a neural network depends on a large number of parameters to produce accurate results, and the number of parameters grows with the number of neurons. Since each neuron can participate in recognizing several gestures, changing one neuron's parameters affects the recognition results of other gestures. Configuring and tuning these parameters is an art that takes experience and follows no fixed rules. That said, when parameter adjustment is driven by machine learning rather than by hand, the system's recognition accuracy improves over time.

 

A sample- or template-based recognition system matches the user's movement against known gestures. The gestures in the template library have been standardized, so a match score can be computed against them. There are two sample-based methods: one stores a series of points; the other uses a skeleton-tracking system like the one in the Kinect SDK. In the latter, the system holds a library of skeleton and depth-frame data and matches incoming frames against the known frames to recognize gestures.

This approach depends heavily on machine learning. The recognition engine records, processes, and reuses frame data, so recognition accuracy gradually improves over time and the system gets better at recognizing the specific gestures you intend. New gestures are easy to add, and the method handles complex gestures better than the other two. But building such a system is not easy. First, it depends on a large amount of sample data, and the more data, the higher the accuracy, so it needs substantial storage and CPU time for searching and matching. Second, for any given gesture it needs samples from people of different heights and builds wearing different clothing (which affects the body contour extracted from the depth data).
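As a rough sketch of the point-sequence variant, and not any actual Kinect SDK API, the following hypothetical matcher scores a recorded hand path against each stored template by average point-to-point distance. GestureTemplate and TemplateMatcher are invented names; SkeletonPoint is the Kinect SDK 1.x joint-position type:

using System;
using System.Collections.Generic;
using Microsoft.Kinect;   // for SkeletonPoint (Kinect for Windows SDK 1.x)

// Hypothetical types for illustration only.
public class GestureTemplate
{
    public string Name;
    public List<SkeletonPoint> Path;   // standardized hand path for one gesture
}

public static class TemplateMatcher
{
    // Returns the template whose path is closest to the candidate recording,
    // scored by average point-to-point distance (lower is better).
    public static GestureTemplate Match(List<SkeletonPoint> candidate,
                                        IEnumerable<GestureTemplate> library,
                                        out double bestScore)
    {
        GestureTemplate best = null;
        bestScore = double.MaxValue;

        foreach (GestureTemplate template in library)
        {
            // A real system would resample both paths to a common length first.
            if (template.Path.Count != candidate.Count)
                continue;

            double total = 0;
            for (int i = 0; i < candidate.Count; i++)
            {
                double dx = candidate[i].X - template.Path[i].X;
                double dy = candidate[i].Y - template.Path[i].Y;
                double dz = candidate[i].Z - template.Path[i].Z;
                total += Math.Sqrt(dx * dx + dy * dy + dz * dz);
            }

            double score = total / candidate.Count;
            if (score < bestScore)
            {
                bestScore = score;
                best = template;
            }
        }

        return best;
    }
}

A production system would also normalize for the user's position and body size before comparing, which is what "standardized in the template library" implies.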

Recognizing common gestures

The choice of recognition method usually depends on the project's needs. If the project only needs to recognize a few simple gestures, algorithm-based or neural-network-based recognition is enough. For other kinds of projects, it may be worth investing the time to build a reusable gesture recognition engine, or to use recognition algorithms that have already been written.

 

Whichever recognition method you choose, you must consider the range of variation in a gesture. The system must be flexible enough to allow a gesture to vary within limits. The trick is to capture how most people perform the gesture and standardize on that; a better recognizer focuses on the core of the gesture rather than its incidental details.

 

Waving is the simplest and most basic gesture. The algorithmic approach recognizes it easily, though any of the methods above would also work. Simple as it is, how do we identify this gesture in code? Wave at yourself in front of a mirror and watch your hand carefully, paying particular attention to the relationship between your hand and your arm. Keep observing that relationship, then observe your whole body posture while making the gesture. Some people keep the body and arm still and wave by swinging the hand left and right from the wrist; others keep the body and arm still and wave by moving the hand back and forth from the wrist. Observing these variations teaches you the different ways people wave.

The wave in Xbox is defined as bending the arm at the elbow: the user pivots the forearm back and forth around the elbow, the movement stays in a plane level with the shoulders, the upper arm is parallel to the ground, and at the midpoint of the gesture (Figure 1) the forearm is perpendicular to both the upper arm and the ground.

From this we can observe some patterns. The first rule: for most of a wave, the hand and wrist are above the elbow and shoulder. This is the first criterion for recognizing the gesture. The first figure shows the neutral (middle) position, with the forearm perpendicular to the upper arm. When the arm departs from this relationship and the forearm swings to the left or right of that vertical line, we treat it as one segment of the gesture. A segment must repeat several times, or the motion is not a complete wave. This movement pattern gives the second rule: for a movement to be a wave, the hand or wrist must cross from one side of the neutral position to the other a specific number of times. With these two observations we can define algorithmic rules to recognize the waving gesture.

The algorithm counts how many times the hand leaves the neutral zone, a region around the elbow extended by a threshold. The algorithm also requires the gesture to complete within a certain time, or recognition fails. The wave recognizer defined here is a standalone algorithm, not part of a multi-layer gesture recognition system. It maintains its own state and reports the result to the caller through an event when recognition completes. It monitors both hands of multiple users, and since it recomputes on every new skeleton frame, recognition state must be recorded between frames.

 

The following code defines two enumerations and a structure for recording recognition state. The first enumeration, WavePosition, defines the possible positions of the hand during a wave. The recognizer uses the WaveGestureState enumeration to track the state of each user's hands. The WaveGestureTracker structure holds the data needed during recognition. It has a Reset method: when a hand stops satisfying the gesture's basic preconditions, for example when it drops below the elbow, Reset clears the data used in recognition.

private enum WavePosition
{
    None = 0,
    Left = 1,
    Right = 2,
    Neutral = 3
}

private enum WaveGestureState
{
    None = 0,
    Success = 1,
    Failure = 2,
    InProgress = 3
}

private struct WaveGestureTracker
{
    public int IterationCount;          // times the hand has crossed the neutral zone
    public WaveGestureState State;
    public long Timestamp;              // time of the last position change
    public WavePosition StartPosition;
    public WavePosition CurrentPosition;

    public void Reset()
    {
        IterationCount = 0;
        State = WaveGestureState.None;
        Timestamp = 0;
        StartPosition = WavePosition.None;
        CurrentPosition = WavePosition.None;
    }
}

 

The following code shows the basic structure of the recognition class. It defines five constants: the neutral-zone threshold, the gesture timeout, the number of times the hand must move left and right across the neutral zone, and identifiers for the left and right hands. Ideally these would be stored as items in a configuration file; they are declared as constants here for convenience. A WaveGestureTracker array stores the recognition state for both hands of every player, and when a wave is detected the GestureDetected event fires.

When the main program receives a new data frame, it calls WaveGesture's Update method, which loops over each user's skeleton data and calls TrackWave for the left and right hands. When a skeleton is not being tracked, its recognition state is reset.

public class WaveGesture
{
    private const float WAVE_THRESHOLD = 0.1f;       // width of the neutral zone
    private const int WAVE_MOVEMENT_TIMEOUT = 5000;  // max time between movements (ms)
    private const int LEFT_HAND = 0;
    private const int RIGHT_HAND = 1;
    private const int REQUIRED_ITERATIONS = 4;       // crossings needed for a wave

    // One tracker per player (Kinect tracks up to 6 skeletons) per hand.
    private WaveGestureTracker[,] _PlayerWaveTracker = new WaveGestureTracker[6, 2];

    public event EventHandler GestureDetected;

    public void Update(Skeleton[] skeletons, long frameTimestamp)
    {
        if (skeletons != null)
        {
            Skeleton skeleton;
            for (int i = 0; i < skeletons.Length; i++)
            {
                skeleton = skeletons[i];
                if (skeleton.TrackingState != SkeletonTrackingState.NotTracked)
                {
                    TrackWave(skeleton, true, ref this._PlayerWaveTracker[i, LEFT_HAND], frameTimestamp);
                    TrackWave(skeleton, false, ref this._PlayerWaveTracker[i, RIGHT_HAND], frameTimestamp);
                }
                else
                {
                    this._PlayerWaveTracker[i, LEFT_HAND].Reset();
                    this._PlayerWaveTracker[i, RIGHT_HAND].Reset();
                }
            }
        }
    }
}
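For context, this is a minimal sketch of how a main program might drive WaveGesture, assuming the Kinect for Windows SDK 1.x. The frame-copy and event calls are standard SDK usage, while the surrounding method names are made up for illustration:

private WaveGesture _waveGesture = new WaveGesture();
private Skeleton[] _skeletons;

private void StartTracking(KinectSensor sensor)
{
    _waveGesture.GestureDetected += (s, e) =>
        System.Diagnostics.Debug.WriteLine("Wave detected!");

    sensor.SkeletonStream.Enable();
    sensor.SkeletonFrameReady += Sensor_SkeletonFrameReady;
    sensor.Start();
}

private void Sensor_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null)
            return;

        // Copy the skeleton data out of the frame and hand it to the recognizer
        // along with the frame timestamp it uses for the timeout check.
        if (_skeletons == null || _skeletons.Length != frame.SkeletonArrayLength)
            _skeletons = new Skeleton[frame.SkeletonArrayLength];

        frame.CopySkeletonDataTo(_skeletons);
        _waveGesture.Update(_skeletons, frame.Timestamp);
    }
}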

The code below is the core of TrackWave, the main recognition logic. It checks the conditions we defined for the gesture and updates the recognition state. The first condition, for either hand, is that both the hand and elbow joints are being tracked; if either joint is unavailable, the tracking state is reset. Otherwise the method proceeds to the next checks.

If the gesture has been in progress longer than the timeout allows, the tracking data is reset and recognition fails. The next check verifies that the hand joint is above the elbow joint; if not, recognition either fails or the tracker resets, depending on its current state. If the hand is above the elbow on the Y axis, the method examines the hand's X position relative to the elbow and calls UpdatePosition with the appropriate position value. After updating the position, it finally checks whether the required number of repetitions has been reached; if so, the wave has been recognized and the GestureDetected event fires.

private void TrackWave(Skeleton skeleton, bool isLeft, ref WaveGestureTracker tracker, long timestamp)
{
    JointType handJointId = (isLeft) ? JointType.HandLeft : JointType.HandRight;
    JointType elbowJointId = (isLeft) ? JointType.ElbowLeft : JointType.ElbowRight;
    Joint hand = skeleton.Joints[handJointId];
    Joint elbow = skeleton.Joints[elbowJointId];

    if (hand.TrackingState != JointTrackingState.NotTracked &&
        elbow.TrackingState != JointTrackingState.NotTracked)
    {
        if (tracker.State == WaveGestureState.InProgress &&
            tracker.Timestamp + WAVE_MOVEMENT_TIMEOUT < timestamp)
        {
            tracker.UpdateState(WaveGestureState.Failure, timestamp);
            System.Diagnostics.Debug.WriteLine("Fail!");
        }
        else if (hand.Position.Y > elbow.Position.Y)
        {
            // (0, 0) is the center of the coordinate space: from the user's
            // perspective, X is negative to the left and positive to the right.
            if (hand.Position.X <= elbow.Position.X - WAVE_THRESHOLD)
            {
                tracker.UpdatePosition(WavePosition.Left, timestamp);
            }
            else if (hand.Position.X >= elbow.Position.X + WAVE_THRESHOLD)
            {
                tracker.UpdatePosition(WavePosition.Right, timestamp);
            }
            else
            {
                tracker.UpdatePosition(WavePosition.Neutral, timestamp);
            }

            if (tracker.State != WaveGestureState.Success &&
                tracker.IterationCount == REQUIRED_ITERATIONS)
            {
                tracker.UpdateState(WaveGestureState.Success, timestamp);
                System.Diagnostics.Debug.WriteLine("Success!");

                if (GestureDetected != null)
                {
                    GestureDetected(this, new EventArgs());
                }
            }
        }
        else
        {
            if (tracker.State == WaveGestureState.InProgress)
            {
                tracker.UpdateState(WaveGestureState.Failure, timestamp);
                System.Diagnostics.Debug.WriteLine("Fail!");
            }
            else
            {
                tracker.Reset();
            }
        }
    }
    else
    {
        tracker.Reset();
    }
}
The following code completes the WaveGestureTracker structure. These helper methods maintain the structure's fields and keep TrackWave readable. The one worth noting is UpdatePosition, which TrackWave calls whenever the hand moves. Its main job is to update the CurrentPosition and Timestamp fields, but it is also responsible for incrementing IterationCount and setting the InProgress state.
public void UpdateState(WaveGestureState state, long timestamp)
{
    State = state;
    Timestamp = timestamp;
}

public void Reset()
{
    IterationCount = 0;
    State = WaveGestureState.None;
    Timestamp = 0;
    StartPosition = WavePosition.None;
    CurrentPosition = WavePosition.None;
}

public void UpdatePosition(WavePosition position, long timestamp)
{
    if (CurrentPosition != position)
    {
        // Only Left and Right count as wave segments; Neutral is a pass-through.
        if (position == WavePosition.Left || position == WavePosition.Right)
        {
            if (State != WaveGestureState.InProgress)
            {
                // First movement of a new wave: start tracking it.
                State = WaveGestureState.InProgress;
                IterationCount = 0;
                StartPosition = position;
            }
            IterationCount++;
        }
        CurrentPosition = position;
        Timestamp = timestamp;
    }
}
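To see how the counting behaves, here is a hypothetical trace of one hand swinging back and forth (positions and timestamps invented; it also assumes the code runs inside the class that declares these private types). Neutral updates change the recorded position but never the count:

WaveGestureTracker tracker = new WaveGestureTracker();
tracker.UpdatePosition(WavePosition.Left, 0);      // InProgress, IterationCount = 1
tracker.UpdatePosition(WavePosition.Neutral, 150); // passes through center, still 1
tracker.UpdatePosition(WavePosition.Right, 300);   // IterationCount = 2
tracker.UpdatePosition(WavePosition.Neutral, 450); // still 2
tracker.UpdatePosition(WavePosition.Left, 600);    // IterationCount = 3
tracker.UpdatePosition(WavePosition.Right, 750);   // IterationCount = 4: at this point in
                                                   // TrackWave, REQUIRED_ITERATIONS is met
                                                   // and GestureDetected fires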
