Learning Motion Patterns in Videos
CVPR 2017
Torch code: http://thoth.inrialpes.fr/research/mpnet
The problem addressed in this paper is determining whether an object is in motion, irrespective of camera motion.
Note that the camera itself may be moving. If the camera is static, the problem is relatively simple; camera motion makes it considerably harder.
The schematic is as follows:
Synthetic FlyingThings3D dataset: given two adjacent frames in which the camera is moving and several objects are moving, the goal is to segment out the moving objects.
3 Learning Motion Patterns
MP-Net takes as input the optical flow field corresponding to a pair of consecutive frames of a video sequence, and produces per-pixel motion labels.
In other words, a CNN takes the optical flow field between two adjacent frames as input and outputs a moving/static label for each pixel.
The video is treated as a sequence of frame pairs, and the labels are computed independently for each pair, as sketched below.
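A minimal sketch of this per-pair pipeline. Here `motion_net` is a placeholder standing in for MP-Net, and OpenCV's Farneback flow is only a stand-in for whatever flow estimator is actually used; the paper's own flow input may differ.

```python
import cv2

# Each consecutive frame pair is processed independently: compute dense
# optical flow for the pair, then predict per-pixel motion labels from it.
def label_video(frames, motion_net):
    labels = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        g0 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        g1 = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between the two frames, shape (H, W, 2).
        flow = cv2.calcOpticalFlowFarneback(
            g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Per-pixel moving/static prediction for this pair only.
        labels.append(motion_net(flow))
    return labels
```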
3.1. Network Architecture
Our task is to distinguish the different motion patterns present in the optical flow field, and a CNN is used to do so. This requires a large receptive field, while the output must have the same resolution as the input image. A large receptive field is essential for incorporating context information into the model: with a small receptive field, the network cannot tell the motion of an object apart from the motion of the camera.
The resulting network is very similar to the architectures used for semantic segmentation, essentially the same.
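The exact architecture is given in the paper and the released Torch code. As a rough illustration only, here is a minimal PyTorch sketch of a segmentation-style encoder-decoder on flow input; the layer counts and channel widths below are made up, not the paper's.

```python
import torch.nn as nn

# Strided downsampling enlarges the receptive field; transposed convolutions
# restore the input resolution so every pixel receives a motion label.
class MotionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # one logit per pixel
        )

    def forward(self, flow):                      # flow: (N, 2, H, W)
        return self.decoder(self.encoder(flow))  # logits: (N, 1, H, W)
```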
3.2. Training with synthetic data
The CNN is trained on the synthetic FlyingThings3D dataset, which comes with ground-truth optical flow from which per-pixel motion labels can be derived.
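A minimal training-step sketch, reusing the `MotionNet` sketch above: ground-truth flow in, binary motion mask as the target. Per-pixel binary cross-entropy and all hyperparameters here are assumptions for illustration, not the paper's choices.

```python
import torch
import torch.nn as nn

model = MotionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()  # per-pixel binary cross-entropy

def train_step(gt_flow, motion_mask):
    # gt_flow: (N, 2, H, W) float; motion_mask: (N, 1, H, W) in {0, 1}
    optimizer.zero_grad()
    logits = model(gt_flow)
    loss = criterion(logits, motion_mask)
    loss.backward()
    optimizer.step()
    return loss.item()
```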
4.2. Refining the segmentation
A fully-connected CRF is applied as post-processing to refine the segmentation results.
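One common way to implement this step is the `pydensecrf` package (an assumption here; the paper's own implementation may differ). The kernel parameters below are typical defaults, not the paper's values.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

# Refine per-pixel moving probabilities with a fully-connected CRF,
# using image appearance to sharpen the motion boundaries.
def refine(prob_moving, rgb):
    # prob_moving: (H, W) float in [0, 1]; rgb: (H, W, 3) uint8
    h, w = prob_moving.shape
    probs = np.stack([1.0 - prob_moving, prob_moving]).astype(np.float32)
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)  # location-only smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb),
                           compat=10)       # appearance (color + location) kernel
    q = np.array(d.inference(5)).reshape(2, h, w)
    return q.argmax(axis=0)                 # refined binary motion mask
```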
The experiments cover: ablations over different kinds of input information and their combinations, comparison against competing methods on DAVIS, and results on BMS-26.