'''
I have wanted to write this article for a long time. My own work has been submitted to CVPR2018, so I want to summarize the area.
Update: the CVPR2018 submission has been accepted; I will introduce it in the next blog post when I have time.
'''
There are many early methods, such as HMMs, CRBMs, and Gaussian processes. This article mainly summarizes the methods used in recent top-venue papers.
Forecasting Human Dynamics from Static Images (CVPR2017)
This paper mainly adds an RNN to the Hourglass network, as shown above. Using skip connections, it predicts the skeleton of the next frame, and the 2D-to-3D conversion is integrated into the same network. There are also many 2D-to-3D networks; one related ICCV2017 work has code on Github. The following diagram shows the sequence prediction process.
When the first frame is predicted, the original image is the input; for every later step the input is 0. Readers familiar with seq2seq can think of this as the decoding phase of seq2seq. In some seq2seq models, however, the input at each decoding step is the output of the previous step. Also note that predicting an action from a single input is often ambiguous. In a squat, for example, a single frame does not tell you whether the person is squatting down or standing up. Earlier work therefore takes a pair of frames as input to eliminate the ambiguity.
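The two decoder input schemes described above can be contrasted with a minimal sketch. The scalar cell `rnn_step` and its weights are hypothetical stand-ins, not the paper's network; the point is only how the input at each step is chosen.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.9):
    """Toy scalar recurrent cell; the weights are made up for illustration."""
    return math.tanh(w_x * x + w_h * h)

def decode(first_input, steps, feed_previous):
    """Unroll the toy cell for `steps` frames.

    feed_previous=False: only the first step sees a real input, later inputs are 0.
    feed_previous=True:  each step receives the previous step's output as input.
    """
    h = 0.0
    x = first_input
    outputs = []
    for _ in range(steps):
        h = rnn_step(x, h)
        outputs.append(h)  # in this toy, the output is just the hidden state
        x = h if feed_previous else 0.0
    return outputs

zero_fed = decode(1.0, steps=4, feed_previous=False)
self_fed = decode(1.0, steps=4, feed_previous=True)
```

Both rollouts agree on the first frame and diverge from the second frame on, because only then do their inputs differ.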
Recurrent Network Models for Human Dynamics (ICCV2015)
This is an earlier paper, and its method is relatively simple and crude.
The idea is encoder-RNN-decoder. It is very simple, but compared with more recent methods it tends to converge quickly to the mean pose. Beyond motion prediction, the paper also explores a new use of the network: predicting the locations of key points.
Structural-RNN: Deep Learning on Spatio-Temporal Graphs (CVPR2016)
This paper is also highly regarded, and several websites have introduced it. The framework is as follows:
Using a graph model to solve the problem is very reasonable, but the drawback is also obvious: there are too many parameters, especially with so many RNNs, which makes training complicated. The network looks very complex, but after reading the paper carefully the idea turns out to be simple. Each node has its own RNN, and the interactions between nodes are captured by additional RNNs. To predict a node's action in the next frame, the model considers not only the node's own past information but also the interaction information from its surroundings. Code is available on Github.
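A heavily simplified sketch of the node-RNN plus edge-RNN idea follows. The three-node graph, the scalar features, and the shared weights are all assumptions made for illustration; the paper's actual factor sharing and temporal edges are not reproduced here.

```python
import math

# Hypothetical skeleton graph: the spine interacts with one arm and one leg.
NODES = ["spine", "arm", "leg"]
EDGES = [("spine", "arm"), ("spine", "leg")]

def cell(x, h, w_x=0.5, w_h=0.5):
    """Toy scalar recurrent cell standing in for a real RNN."""
    return math.tanh(w_x * x + w_h * h)

def srnn_step(features, node_h, edge_h):
    """One time step: edge RNNs run first, then node RNNs use their outputs."""
    # Each edge RNN reads the features of both of its endpoints.
    new_edge_h = {}
    for a, b in EDGES:
        new_edge_h[(a, b)] = cell(features[a] + features[b], edge_h[(a, b)])
    # Each node RNN reads its own feature plus the outputs of its incident edges.
    new_node_h = {}
    for n in NODES:
        context = sum(h for e, h in new_edge_h.items() if n in e)
        new_node_h[n] = cell(features[n] + context, node_h[n])
    return new_node_h, new_edge_h

features = {"spine": 0.1, "arm": 0.2, "leg": 0.3}
node_h0 = {n: 0.0 for n in NODES}
edge_h0 = {e: 0.0 for e in EDGES}
node_h1, edge_h1 = srnn_step(features, node_h0, edge_h0)
```

Within a single step, changing the arm's feature changes the spine's state through their shared edge, while the leg, which has no edge to the arm in this toy graph, is unaffected.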
On human motion prediction using recurrent neural networks (CVPR2017)
I like this work the most: the network is simple and the results are very good.
It adopts the seq2seq structure and introduces a residual connection, which makes it easier to train. Note the paper's sampling-based loss for long-sequence prediction. The question is whether the seq2seq decoder is fed the ground truth or the prediction from the previous frame. Because no ground truth is available at test time, the decoder is fed its own predictions during training as well.
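The residual connection and the sampling-based loss can be sketched together as below. `predict_delta` is a hypothetical one-parameter stand-in for the decoder, and poses are plain lists of joint values; only the data flow is meant to match the description above.

```python
def predict_delta(pose, w):
    """Hypothetical decoder stand-in: returns a per-joint velocity."""
    return [w * p for p in pose]

def residual_rollout(last_observed, steps, w):
    """Predict `steps` future poses, feeding each prediction back as input."""
    pose = list(last_observed)
    predictions = []
    for _ in range(steps):
        delta = predict_delta(pose, w)
        # Residual connection: the model outputs a change, added to its input.
        pose = [p + d for p, d in zip(pose, delta)]
        predictions.append(pose)
    return predictions

def sampling_loss(last_observed, targets, w):
    """Mean squared error of a rollout that never sees the ground truth as input."""
    predictions = residual_rollout(last_observed, len(targets), w)
    total, count = 0.0, 0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count

last_pose = [1.0, 2.0]
static_targets = [[1.0, 2.0], [1.0, 2.0]]
loss_zero_velocity = sampling_loss(last_pose, static_targets, w=0.0)
loss_drifting = sampling_loss(last_pose, static_targets, w=0.1)
```

With `w=0.0` the rollout simply repeats the last observed pose, so the residual formulation starts from a zero-velocity prediction rather than from an arbitrary pose.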
Deep Representation Learning for Human Motion Prediction and Classification (CVPR2017)
As the title says, this paper focuses on deep representations and explores three different approaches. The more interesting part is the use of convolution along the time dimension. This relates to other skeleton-based work: a skeleton sequence with S joint values over T time steps forms a matrix, and convolving over that matrix can be understood as an operation over both the space and time dimensions.
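To make the matrix view concrete, here is a minimal sketch of a "valid" 2D convolution (implemented as cross-correlation, as is usual in deep learning) over a T x S motion matrix. The tiny matrix and the hand-picked kernels are illustrative assumptions, not the paper's learned filters.

```python
def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix` without padding (rows = time, columns = joints)."""
    T, S = len(matrix), len(matrix[0])
    kt, ks = len(kernel), len(kernel[0])
    out = []
    for t in range(T - kt + 1):
        row = []
        for s in range(S - ks + 1):
            acc = 0.0
            for i in range(kt):
                for j in range(ks):
                    acc += matrix[t + i][s + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# 4 frames (T) by 3 joint values (S); every value grows by 1 per frame.
motion = [
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]

# A 2x1 kernel spans two frames of one joint: a frame-to-frame difference.
velocity = conv2d_valid(motion, [[-1.0], [1.0]])

# A 2x2 kernel spans two frames and two neighbouring joints at once.
space_time_mean = conv2d_valid(motion, [[0.25, 0.25], [0.25, 0.25]])
```

The first kernel acts purely along time, while the second mixes neighbouring columns and neighbouring frames, which is the sense in which one convolution covers both dimensions.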
Summary: for individual action prediction, and for anything involving skeletons, there are many unusual ways to approach the problem, as the skeleton-based recognition papers of recent years show, but most of them do not matter much. What matters is being able to model the behavior itself. Gaussian processes are well suited to this; of course, deep learning networks can do it as well. My tentative idea is to model a behavior and then combine that with a GAN for the generation task.