CV | Semantic Co-segmentation in Videos (video semantic co-segmentation)

Source: Internet
Author: User


This paper was published at ECCV 2016. I had previously read the same authors' CVPR 2016 paper on object flow, and since I recently had to give a report on this one, I took the opportunity to organize my study notes. Discussion is welcome.


Semantic co-segmentation of videos
Few papers have been published in this direction so far; it is a relatively new research area. In short, the task is to co-segment a set of videos and attach a semantic label to each output segment. The figure below, borrowed from the paper, gives a rough idea of the result.

The figure compares the results of three methods, with the method proposed in this paper listed last. Given three groups of input videos, co-segmentation outputs three groups of segmentation results; semantic co-segmentation additionally attaches semantic labels to those results, such as the elephants, giraffes, and lions in the figure.
Summary/Contributions
The main contribution of the paper is a semantic co-segmentation algorithm for videos, which consists of three parts:
1. Use an FCN (fully convolutional network) to obtain the initial segmentation results and the per-category semantic scores.
2. Generate semantic tracklets by tracking, starting from the segmentation results.
3. Optimize over the tracklets with a submodular function, selecting tracklets that are highly similar and accurately segmented to obtain the final semantic segmentation.
Comparison with existing work
1. Video object segmentation / co-segmentation
The more common video segmentation algorithms today are either based on proposals or work by propagating foreground information. Most co-segmentation algorithms assume that the input videos share at least one common object.
The method proposed in this paper does not depend on that assumption or on proposals, and it does not restrict the type or number of objects to be segmented.
2. Weakly supervised object segmentation
Most current weakly supervised object segmentation algorithms know in advance which category is to be segmented, so they segment better and have attracted attention. These algorithms fall into two classes: training-based methods, which rely on training samples, and training-free methods, which rely on visual detection algorithms or proposals.
The authors point out that their method is unsupervised and does not depend on proposals; instead, it applies submodular optimization on a graph structure to obtain the segmentation results across different videos.
Overall Framework
The overall block diagram of the proposed method is shown below. Apart from the FCN preprocessing that produces the initial segmentation, the method consists of two main parts.

The algorithm runs roughly as follows: the input videos are processed by the FCN to obtain segmentation results for each video; the results are clustered; tracklets are generated from the segmentation results of each category; finally, submodular optimization selects the tracklets that are highly similar and well segmented, giving the final result.
Semantic Tracklet Generation
This stage has three main parts:

1. Initialization: an FCN (fully convolutional network) produces the semantic segmentation of every frame.
2. Clustering: the segmentation results are clustered. Because the method is unsupervised and the object categories contained in the videos are not known in advance, the mean-shift algorithm is used, and the N largest clusters are kept.
3. Tracklet generation by tracking: for the segmentation results of each category obtained from clustering, several frames are sampled as initializations for the tracklets. Each initialization is constrained to lie within 20 frames so that tracking quality is preserved. The key point is that tracking runs both forward and backward from each initialization, which effectively handles occlusion of the object, as shown in the figure below. In the end, each category produces two semantic tracklets.
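The clustering in step 2 can be illustrated with a small, dependency-free Python sketch. It is only a toy under stated assumptions: the paper's actual features and bandwidth are not given in these notes, so the 2-D points, the flat kernel, and the names `mean_shift` and `top_n_clusters` are all hypothetical.

```python
import math
from collections import Counter

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean-shift: each mode moves to the mean of the data
    points lying within `bandwidth` of it, until nothing moves."""
    pts = [tuple(p) for p in points]
    modes = list(pts)
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Neighbours always come from the original data points.
            near = [p for p in pts if math.dist(m, p) <= bandwidth] or [m]
            mean = tuple(sum(c) / len(near) for c in zip(*near))
            moved = max(moved, math.dist(m, mean))
            new_modes.append(mean)
        modes = new_modes
        if moved < tol:
            break
    # Merge modes that converged to (almost) the same location.
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

def top_n_clusters(labels, n):
    """Ids of the n most populated clusters, largest first."""
    return [k for k, _ in Counter(labels).most_common(n)]

# Hypothetical feature vectors: two dense groups and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 10.0)]
labels, centers = mean_shift(points, bandwidth=1.0)
print(labels)                     # [0, 0, 0, 0, 1, 1, 1, 2]
print(top_n_clusters(labels, 2))  # [0, 1]
```

Mean-shift fits the unsupervised setting because the number of clusters does not have to be specified up front; only the bandwidth is chosen, and small clusters such as the outlier are dropped when keeping the N largest.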


Output of this stage: 2N semantic tracklets, organized by category.
To refine the segmentation results between consecutive frames, a CRF energy function is defined. Its definition is fairly standard, so I will not explain it in detail. The formula is as follows:
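For orientation, a CRF energy of this kind usually has the general shape below. This is the generic textbook form in assumed notation, not a transcription of the paper's equation: x_p is the label of pixel p, U is a unary (data) term, V is a spatial pairwise term over neighbors within a frame, and W is a temporal pairwise term over pixels linked between the two frames.

```latex
E(x) = \sum_{p} U(x_p)
     + \sum_{(p,q) \in \mathcal{N}_s} V(x_p, x_q)
     + \sum_{(p,q) \in \mathcal{N}_t} W(x_p, x_q)
```

Minimizing E(x) trades off agreement with the per-pixel evidence against smoothness inside each frame and consistency across frames.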

Semantic Tracklet Co-selection
The input to this step is the output of the previous one: for each video, multiple tracklets belonging to different categories.
The purpose of this step is to select the more reliable tracklets and obtain optimized segmentation results.
The paper uses submodular optimization for this. Before the submodular function can be given, a graph has to be constructed over the tracklets of each category in each video. Define the graph G = (V, E), where V is the set of nodes, each node being a tracklet, and E is the set of edges linking related tracklets. Suppose there are M categories of segmentation targets, L = {1, 2, ..., M}; each category l ∈ L has an initial tracklet set O and a target (selected) tracklet set A.
The submodular function encodes the objective: the tracklets selected within a category should 1) have more similar features and 2) have better segmentation results. These correspond to a facility location term and a unary term (data penalty), respectively.
Facility location term
To find nodes that are more similar to each other, the facility location model commonly used in submodular optimization is adopted, defined as follows:
Here w_ij is the similarity between a candidate facility node v_i (in this paper, a candidate good tracklet to be selected) and the current node v_j; w_ij is large when the two nodes are similar. The second term is a cost: it lowers F(A) by the price that has to be paid for selecting v_i.
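As a rough illustration of a facility location term, here is a minimal Python sketch. It assumes the textbook form F(A) = Σ_j max_{i∈A} w_ij − Σ_{i∈A} c_i, which may differ from the exact formula in the paper; the function name `facility_location`, the similarity matrix, and the costs are all hypothetical.

```python
def facility_location(selected, weights, costs):
    """Textbook facility location value of the set `selected`.

    Coverage: every node j is served by its most similar selected node.
    Opening cost: each selected node i pays costs[i].
    """
    if not selected:
        return 0.0
    n = len(weights)
    coverage = sum(max(weights[i][j] for i in selected) for j in range(n))
    opening = sum(costs[i] for i in selected)
    return coverage - opening

# Hypothetical similarities: nodes 0 and 1 are alike, node 2 is unlike both.
weights = [[1.0, 0.8, 0.1],
           [0.8, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
costs = [0.5, 0.5, 0.5]

print(facility_location([0], weights, costs))  # about 1.4
print(facility_location([2], weights, costs))  # about 0.8
```

Node 0 scores higher than node 2 because it is similar to more of the other nodes, while the cost term keeps the set from growing for free.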
Unary term
This term favors nodes with high segmentation quality and plays the role of the usual unary term. It is defined as follows:
Here Φ_o is the objectness score measuring how likely v_i is to belong to a target category, Φ_m measures motion consistency, and Φ_s measures shape consistency. The larger U(A) is, the higher the segmentation quality.
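One simple way to combine the objectness, motion, and shape scores is a weighted sum over the selected nodes. The form below is an assumption for illustration, with hypothetical weights λ_o, λ_m, λ_s; the paper's exact weighting may differ.

```latex
U(A) = \sum_{v_i \in A} \bigl( \lambda_o \Phi_o(v_i) + \lambda_m \Phi_m(v_i) + \lambda_s \Phi_s(v_i) \bigr)
```

Under this assumed form U(A) is a plain sum over the selected nodes, i.e. modular, so adding it to a submodular similarity term keeps the combined objective submodular.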
Submodular function optimization
To obtain tracklets that are both highly similar and well segmented:
In other words, the set A that maximizes the objective gives the best result. This step is solved with a greedy algorithm, whose rough flow is as follows (H(A^i) is the energy gain at the i-th iteration, and N is the total number of nodes):
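The greedy selection can be sketched in Python as follows. This is a minimal sketch, not the paper's exact procedure: the stopping rule (stop when the best marginal gain is no longer positive), the objective (textbook facility location plus a summed quality score), and all numbers are assumptions chosen for illustration.

```python
# Hypothetical toy data: three tracklets of one category.
SIM = [[1.00, 0.85, 0.82],
       [0.85, 1.00, 0.83],
       [0.82, 0.83, 1.00]]      # pairwise similarities (all high)
QUALITY = [0.90, 0.80, 0.297]   # unary segmentation-quality scores
COST = 0.6                      # price of selecting any one node

def objective(selected):
    """Assumed objective: coverage - selection cost + summed quality."""
    if not selected:
        return 0.0
    coverage = sum(max(SIM[i][j] for i in selected) for j in range(len(SIM)))
    unary = sum(QUALITY[i] for i in selected)
    return coverage - COST * len(selected) + unary

def greedy_select(nodes, objective):
    """Grow the set one node at a time, always taking the largest gain."""
    selected = []
    remaining = list(nodes)
    current = objective(selected)
    while remaining:
        # Marginal gain of adding each remaining node to the current set.
        gains = {v: objective(selected + [v]) - current for v in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no remaining node improves the objective
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected

print(greedy_select(range(3), objective))  # [0, 1]
```

In this toy run the third node is just as similar as the other two, but its low quality score cannot pay for its selection cost, so its marginal gain is negative and it is left out.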

The figure below illustrates the greedy optimization. Suppose three tracklet nodes are highly similar to each other (the similarity values in the figure are all above 80), and the strategy has already selected the two nodes on the left into the set A (the highlighted nodes in the figure). When deciding whether to select the third node, its similarity is high, but its unary score, which measures segmentation quality, is only 29.7, so the node is not selected in the end.

Experimental Results
The method is evaluated on the YouTube-Objects, MOViCS, and Safari datasets and achieves good results on all of them. A few of the experimental results are pasted below.




Overall, the novelty of the paper is that it performs semantic co-segmentation of multiple videos without relying on proposals. Tracking the segments in both directions between frames handles the occlusion problem well, and submodular optimization is then used to select the best tracklets as the final result.

You are welcome to study and discuss together.

Off-topic: another noon without a nap... Lately I have wanted to do too many things, and I cannot possibly do them all well. Shouldn't I put some of them down and do things properly, one at a time?
Off-topic: the weekend is coming, a short holiday to clear my head.
I think it's nice to write well. *^_____________________________^*




