Finding Action Tubes - CVPR 2015


Paper title: Finding Action Tubes (paper link). This is a CVPR 2015 paper, mainly about action tube localization.
Letting the paper's overview figure speak for itself, the core idea can be split into two components:
1 Action detection at every frame of the video
2 Linking detections in time to produce action tubes
Here is each component separately.

1 Action detection at every frame of the video
The idea is to train a spatial-CNN and a motion-CNN as feature extractors, and then train a linear SVM for each action category on top of those features. The steps are as follows:
A. Find the interesting regions in each frame and build positive and negative samples from those regions and the ground-truth action label. IoU against the ground-truth box is used here: regions with IoU > 0.5 are positives, regions with IoU < 0.3 are negatives (a small sketch of this labeling rule appears at the end of step A). Why do it this way? Personally I think the action tube in this paper is built around the actor inside it: the task is essentially tracking and classifying the action of a single actor in the video, and the datasets do provide the action category and the corresponding actor box for every frame. So how are these regions found, and how are the unnecessary ones eliminated?
    There are many ways to generate proposals; the paper uses selective search to generate about 2k proposals per frame of the video.
    Obviously, a large fraction of these proposals is non-discriminative and causes serious computational cost, which is not good for anything close to real-time detection.
    The paper uses a very simple trick to eliminate these non-discriminative regions: motion saliency. The optical flow of the frame is computed, and proposals that contain little or no motion are discarded, which cuts the number of boxes down sharply.
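    A minimal sketch of this kind of flow-based pruning (my own illustration; the exact rule and thresholds in the paper may differ):

    import numpy as np

    def prune_proposals(boxes, flow, mag_thresh=1.0, frac_thresh=0.1):
        """Keep only proposals that contain 'enough' motion.

        boxes: (N, 4) array of [x1, y1, x2, y2] selective-search proposals.
        flow:  (H, W, 2) optical flow of the frame.
        mag_thresh, frac_thresh: illustrative thresholds, not the paper's values.
        """
        mag = np.linalg.norm(flow, axis=2)       # per-pixel flow magnitude
        moving = mag > mag_thresh                # binary motion-saliency map
        keep = []
        for i, (x1, y1, x2, y2) in enumerate(boxes.astype(int)):
            patch = moving[y1:y2 + 1, x1:x2 + 1]
            if patch.size > 0 and patch.mean() > frac_thresh:
                keep.append(i)                   # enough moving pixels inside the box
        return boxes[keep]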
        
    It is important to note that the RGB and motion (flow) images share the same regions: the proposals are extracted on the RGB frames with the method above and then used directly on the flow images.
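    To make the labeling in step A concrete, here is a minimal sketch (mine, not the authors' code) of the IoU rule: IoU > 0.5 against the ground-truth actor box is a positive, IoU < 0.3 a negative, and everything else is ignored:

    import numpy as np

    def iou(box_a, box_b):
        """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter + 1e-9)

    def label_proposals(boxes, gt_box, gt_label, pos_thr=0.5, neg_thr=0.3):
        """Return one label per proposal: gt_label, 0 (background), or -1 (ignored)."""
        labels = np.full(len(boxes), -1, dtype=int)
        for i, box in enumerate(boxes):
            ov = iou(box, gt_box)
            if ov > pos_thr:
                labels[i] = gt_label
            elif ov < neg_thr:
                labels[i] = 0
        return labels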
B. Training spatial-CNN and motion-CNN
  This is where the paper shows the framework figure of the two CNN models; see the paper for the concrete architecture.
  They are trained in the same way as R-CNN; for the concrete procedure you can refer to a labmate's blog post on R-CNN.
    To me, the main points of this training stage are two:
      I. Training is done on single frames.
      II. Initialization of the CNN models.
        As we all know, initialization is very important for deep models.
        The spatial-CNN is initialized with a CNN trained on the PASCAL VOC 2012 detection task.
        The motion-CNN is initialized with a CNN trained on optical-flow (motion) images from UCF101.
  As for the remaining training details, such as the learning rate and data augmentation, please read the paper yourself.
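  The paper fine-tunes the networks R-CNN style in the Caffe framework of the time. Purely as an illustration (not the authors' setup), a minimal fine-tuning sketch in PyTorch could look like the following; the ImageNet-pretrained AlexNet backbone, class count, and hyperparameters are placeholders:

    import torch
    import torch.nn as nn
    import torchvision.models as models

    NUM_ACTIONS = 21   # placeholder number of action classes (+1 background below)

    # Stand-in for the paper's initialization: an ImageNet-pretrained AlexNet.
    # (The paper initializes spatial-CNN from a VOC 2012 detection model and
    #  motion-CNN from a UCF101 flow model; those weights are not reproduced here.)
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    net.classifier[6] = nn.Linear(4096, NUM_ACTIONS + 1)   # replace the final fc layer

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

    def finetune_step(crops, labels):
        """One SGD step on a batch of warped region crops.

        crops:  (N, 3, 224, 224) tensor of region crops (RGB frames, or flow
                rendered as a 3-channel image for the motion-CNN).
        labels: (N,) tensor of class ids, 0 = background.
        """
        net.train()
        optimizer.zero_grad()
        loss = criterion(net(crops), labels)
        loss.backward()
        optimizer.step()
        return loss.item()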
C. Extract the fc7 features of spatial-CNN and motion-CNN for the training regions.  The two fc7 feature vectors are simply concatenated together; simple and brute-force. You can also look into how the two features might be fused in a smarter way.
D. Train the linear SVMs for the actions (one per category).
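A rough sketch of steps C and D, assuming the fc7 features have already been extracted; scikit-learn's LinearSVC stands in for the per-class SVMs, and the paper's details (hard-negative mining, the exact C value, etc.) are omitted:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_action_svms(fc7_spatial, fc7_motion, labels, num_actions):
        """fc7_spatial, fc7_motion: (N, 4096) fc7 features per training region.
        labels: (N,) ints, 0 = background, 1..num_actions = action classes.
        Returns one binary linear SVM per action class."""
        feats = np.hstack([fc7_spatial, fc7_motion])    # simple concatenation of the two fc7 vectors
        svms = {}
        for c in range(1, num_actions + 1):
            clf = LinearSVC(C=1e-3)                     # placeholder C value
            clf.fit(feats, (labels == c).astype(int))   # one-vs-rest training
            svms[c] = clf
        return svms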

2 Linking detections in time to produce action tubes
This step builds on the first component.
A. Extract the corresponding regions of every frame, pass each region through spatial-CNN and motion-CNN to extract fc7 features, then run them through the SVMs to obtain the per-class action scores.
B. For each category and each video, use the linking score described below to find the linked action tubes.
    
    
    That is, between every two adjacent frames, for a given action category, pick the pair of regions (one per frame) with the highest linking score, which combines the two regions' class scores and their IoU overlap; as I recall the paper, roughly e(R_t, R_{t+1}) = s(R_t) + s(R_{t+1}) + λ · ov(R_t, R_{t+1}), and the best path over the whole video is found with the Viterbi algorithm (dynamic programming).
    These regions are then strung together to form the action tube.  So how is the score of an action tube computed?
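    A minimal dynamic-programming sketch of the linking step (my own illustration, not the authors' code); lam and the final tube-score aggregation are placeholders, see the paper for the exact definitions:

    import numpy as np

    def iou(a, b):
        """Same IoU helper as in the labeling sketch above."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / float(union + 1e-9)

    def link_action_tube(boxes_per_frame, scores_per_frame, lam=1.0):
        """Link one tube for a single action class across T frames.

        boxes_per_frame:  list of (N_t, 4) proposal boxes, one array per frame.
        scores_per_frame: list of (N_t,) SVM scores for this class, one array per frame.
        Maximizes the accumulated per-region score plus lam * IoU between consecutive
        boxes via dynamic programming (Viterbi), a close variant of the paper's
        linking objective.
        """
        T = len(boxes_per_frame)
        best = [scores_per_frame[0].astype(float)]   # best path score ending at each box of frame 0
        back = []
        for t in range(1, T):
            prev_boxes, cur_boxes = boxes_per_frame[t - 1], boxes_per_frame[t]
            overlap = np.array([[iou(p, c) for p in prev_boxes] for c in cur_boxes])  # (N_t, N_{t-1})
            cand = best[-1][None, :] + lam * overlap   # score of extending each previous path
            back.append(cand.argmax(axis=1))           # which previous box each path extends
            best.append(cand.max(axis=1) + scores_per_frame[t])
        # Backtrack the highest-scoring path.
        idx = int(np.argmax(best[-1]))
        path = [idx]
        for t in range(T - 1, 0, -1):
            idx = int(back[t - 1][idx])
            path.append(idx)
        path.reverse()
        tube = [boxes_per_frame[t][i] for t, i in enumerate(path)]
        # Placeholder tube score: mean of the per-region scores along the tube
        # (see the paper for the exact definition).
        tube_score = float(np.mean([scores_per_frame[t][i] for t, i in enumerate(path)]))
        return tube, tube_score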
        

Of course, the paper does not stop there: based on the action tubes, video-level action classification is straightforward; see the formula in the paper.
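Purely as a stand-in (my simplification, not necessarily the paper's exact rule), a video-level prediction could be obtained by aggregating tube scores, for example:

    def classify_video(tubes_per_class):
        """tubes_per_class: dict mapping class id -> list of (tube, tube_score) pairs.

        Placeholder rule (my assumption, not the paper's formula): a video's score
        for a class is the score of its best tube for that class.
        """
        return {c: max(score for _, score in tubes)
                for c, tubes in tubes_per_class.items() if tubes}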

As for the results, they were state-of-the-art at the time, as you would expect.


The paper's main contributions, in my opinion, are the following:
A. It combines appearance and motion signals.
B. It confirms that the appearance and motion signals are complementary.
C. It uses the motion signal to eliminate non-discriminative regions, which is relatively novel.

Of course there are also shortcomings:
A. The datasets mostly contain a single actor, and the method degrades badly when there are multiple actors.
B. Motion (optical flow) is pre-computed rather than learned.
C. The whole framework is very much a multi-stage pipeline rather than end-to-end.
Well, that's about it. Feel free to reach out with questions ...


