A review of classical algorithms for image segmentation


Reposted from: https://www.leiphone.com/news/201801/vV9tk5kK95g0spUG.html


Semantic image segmentation is an important branch of AI and a key part of image understanding in machine vision. In recent years it has been applied in autonomous driving: a vehicle-mounted camera captures images, and a back-end computer automatically segments and classifies them so that the vehicle can avoid obstacles such as pedestrians and other vehicles. Intensive research in recent years has advanced image segmentation considerably. This article introduces the classic deep learning algorithms for image segmentation.

Recently, at the Lei Feng Net (public account: Lei Feng Net) GAIR lecture hall, Liu Hantang, a doctoral student at Zhejiang University, gave a live technical talk titled "Classic algorithms for image segmentation". This article is based on that live talk. Readers interested in the speaker's content can also watch the live playback on AI MU School. (Watching the video playback is recommended for technical details.)

Liu Hantang is a doctoral student in the computer science department of Zhejiang University and an intern at Alibaba IDST. Research interests: computer vision and deep learning. Personal public account: Jarvis's Daily (jarvisdaily).

Talk outline:

The problem definition of image segmentation and application examples in real scenes

Fully convolutional networks

Bilinear upsampling

Feature pyramid

Mask R-CNN

Hello everyone. I am Liu Hantang, a doctoral student at Zhejiang University, currently interning at Alibaba IDST. This talk first introduces what image segmentation does, where it is applied, and several datasets that are often used in image segmentation experiments.

After that, some image segmentation methods are explained, in two parts. The first part covers traditional image segmentation algorithms, which are rarely used today but are elegant. The second part covers deep learning algorithms and introduces the classic techniques that have been popular in recent years.

What is image segmentation?

Image segmentation predicts, for each pixel in the image, the category or object it belongs to. It has two sub-problems. The first predicts category-level segmentation only: each pixel is labeled with a category (semantic segmentation). The second additionally distinguishes the individual objects (instance-level segmentation).
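To make the two sub-problems concrete, here is a minimal sketch in plain Python. The 3x4 label map, the class names and the `instance_to_class` mapping are all made up for illustration; they do not come from the talk.

```python
# Hypothetical toy example: a 3x4 image containing two cars and background.
# Instance-level map: 0 = background, 1 = first car, 2 = second car.
instance_map = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]

# Assumed mapping from instance id to category name.
instance_to_class = {0: "background", 1: "car", 2: "car"}

# Category-level (semantic) map: every pixel gets only a category,
# so the two cars can no longer be told apart.
semantic_map = [[instance_to_class[i] for i in row] for row in instance_map]

# Instance-level segmentation keeps the individual objects separate.
car_ids = {i for row in instance_map for i in row
           if instance_to_class[i] == "car"}
num_cars = len(car_ids)

print(semantic_map[1])  # ['background', 'car', 'background', 'car']
print(num_cars)         # 2
```

The semantic map answers "which pixels are car", while the instance map also answers "which car".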

Application scenarios include autonomous driving, 3D map reconstruction, image beautification, face modeling, and so on.

The most commonly used datasets

Three are introduced here: Pascal VOC, Cityscapes, and MS COCO.

The first is the Pascal VOC dataset

This is a relatively old dataset. It provides 20 categories, including people, cars, and so on. It has 6929 annotated images with both category-level and instance-level annotations. This means it supports semantic segmentation (separating cars from everything else) as well as instance-level segmentation (when there are several cars, labeling each car separately).

The second is the Cityscapes dataset

It mainly targets road driving scenes and has 30 fine-grained categories. 5000 images are finely annotated, accurate to the pixel level, and another 20000 images have coarse annotations. It also supports both semantic-level and instance-level segmentation.

The third is the MS COCO dataset

This is the largest semantic segmentation dataset so far. It provides 80 categories and more than 330,000 images, of which 200,000 are annotated, with more than 1.5 million object instances in the whole dataset. Some of the latest papers report results on MS COCO because it is the most difficult and the most challenging.

Traditional method: graph cut

A graph cut removes some edges so that the graph falls apart into two disconnected subgraphs. The goal of graph cut is to find the cut that minimizes the sum of the weights of the removed edges.
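The objective above can be sketched in a few lines of Python. The graph, its weights and the function names are hypothetical, and the exhaustive search is only there to make the definition tangible on a five-node graph; it is exponential in the number of nodes and is not how practical graph-cut segmentation is solved.

```python
from itertools import combinations

# Hypothetical weighted undirected graph: (node_u, node_v, weight).
edges = [
    ("a", "b", 3.0),
    ("a", "c", 2.0),
    ("b", "c", 4.0),
    ("c", "d", 1.0),
    ("d", "e", 5.0),
]
nodes = sorted({n for u, v, _ in edges for n in (u, v)})


def cut_weight(edges, side_a):
    """Sum of the weights of edges with exactly one endpoint in side_a."""
    return sum(w for u, v, w in edges if (u in side_a) != (v in side_a))


def brute_force_min_cut(nodes, edges):
    """Try every non-trivial two-way split; only viable for tiny graphs."""
    best = None
    for k in range(1, len(nodes)):
        for subset in combinations(nodes, k):
            weight = cut_weight(edges, set(subset))
            if best is None or weight < best[0]:
                best = (weight, set(subset))
    return best


weight, side_a = brute_force_min_cut(nodes, edges)
print(weight, sorted(side_a))  # 1.0 ['d', 'e']
```

In this toy graph the cheapest cut removes only the edge between c and d, leaving {a, b, c} and {d, e} as the two subgraphs.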

Advantages and disadvantages of graph cut

The advantages are that the segmentation quality is good and that it is a general framework that works with many kinds of features. The disadvantages are that the time and space complexity are high and that the number of segments must be chosen in advance.

A failure case of graph cut (diagram)

Plain minimum cut tends to split off very small subgraphs, because isolating a few nodes removes only a few edges. To overcome this failure, a paper proposed normalized cut. It adds a weighting term, the volume, to the graph cut objective. Volume(A) is the sum of the weights of all edges attached to the nodes in A. This balances the sizes of the subgraphs.
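The usual form of the normalized cut objective is Ncut(A, B) = cut(A, B) / Volume(A) + cut(A, B) / Volume(B). The sketch below evaluates both objectives on a hypothetical graph with two tight clusters and one outlier node; the graph, the weights and the function names are illustrative assumptions, not taken from the talk.

```python
# Hypothetical graph: two tight clusters {a, b, c} and {d, e, f} joined by
# two edges, plus an outlier g attached to f by a single edge.
edges = [
    ("a", "b", 5.0), ("a", "c", 5.0), ("b", "c", 5.0),
    ("d", "e", 5.0), ("d", "f", 5.0), ("e", "f", 5.0),
    ("c", "d", 2.0), ("b", "e", 2.0),
    ("f", "g", 3.0),
]
nodes = {n for u, v, _ in edges for n in (u, v)}


def cut(edges, side_a):
    """Total weight of the edges crossing between side_a and the rest."""
    return sum(w for u, v, w in edges if (u in side_a) != (v in side_a))


def volume(edges, side):
    """Sum, over the nodes in `side`, of the weights of their attached edges."""
    return sum(w * ((u in side) + (v in side)) for u, v, w in edges)


def normalized_cut(edges, nodes, side_a):
    """Ncut(A, B) = cut / Volume(A) + cut / Volume(B)."""
    side_b = nodes - side_a
    c = cut(edges, side_a)
    return c / volume(edges, side_a) + c / volume(edges, side_b)


isolate_outlier = {"g"}
split_clusters = {"a", "b", "c"}

for name, side in [("isolate g", isolate_outlier),
                   ("split clusters", split_clusters)]:
    print(name, cut(edges, side), round(normalized_cut(edges, nodes, side), 3))
```

The plain cut weight prefers isolating the outlier (3 versus 4), whereas the normalized cut is much lower for the balanced split (about 0.22 versus about 1.04), because the tiny volume of {g} makes its term large.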

Deep learning algorithms

The first paper to use neural networks for image segmentation with notable success was Fully Convolutional Networks (hereinafter referred to as FCN).

A traditional classification network works as follows: the image passes through multiple convolutional layers, producing a feature map of reduced resolution; this feature map goes through fully connected layers into a classifier, which finally outputs a vector of class scores. That vector is the classification result.

FCN replaces all the fully connected layers with convolutional layers, so a network that originally could output only one classification for the whole image can output a classification result at every pixel of the feature map. This turns the classification vector into a classification feature map.
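The idea can be sketched with the simplest case, a 1x1 convolution: the same weights that classify one feature vector are applied at every pixel of the feature map. The sizes, weights and names below are hypothetical. More generally, a fully connected layer that consumed an h x w feature map corresponds to a convolution with an h x w kernel.

```python
# Hypothetical sizes: 2 feature channels, 3 classes.
# fc_weights[k][c] is the weight from channel c to class k.
fc_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]


def fully_connected(feature_vector):
    """Classifier head: one feature vector in, one score vector out."""
    return [sum(w * x for w, x in zip(row, feature_vector))
            for row in fc_weights]


def conv1x1(feature_map):
    """The same weights used as a 1x1 convolution over an H x W x C map."""
    return [[fully_connected(pixel) for pixel in row] for row in feature_map]


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


# A 2x2 feature map; each pixel holds a 2-channel feature vector.
feature_map = [
    [[2.0, 0.0], [0.0, 3.0]],
    [[1.0, 1.5], [4.0, 1.0]],
]

score_map = conv1x1(feature_map)  # H x W x num_classes
label_map = [[argmax(p) for p in row] for row in score_map]
print(label_map)  # [[0, 1], [1, 0]]
```

The output is no longer a single class vector but a map with one class decision per feature-map pixel.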

To restore the classification feature map to the original image size, upsampling layers are used. Specific details can be viewed in the video playback.

FCN structure diagram

The following describes how the feature map is enlarged.

Two concepts are involved: the first is the deconvolution layer (deconvolution), and the second is bilinear interpolation upsampling (bilinear upsampling).

The "deconvolution" here is not really the inverse of convolution; "transposed convolution" would be the more appropriate name. The original paper uses "deconvolution", however, so we use that word below. It can be computed as an equivalent ordinary convolution. Its main purpose is upsampling.
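A minimal 1D sketch of why "transposed" is the better name, written in plain Python with hypothetical function names: the deconvolution scatters a scaled copy of the kernel for every input value, and it satisfies the defining property of a transpose with respect to the ordinary convolution that uses the same kernel and stride.

```python
def conv1d(x, kernel, stride=1):
    """Ordinary 1D convolution (cross-correlation form, no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def transposed_conv1d(y, kernel, stride=1):
    """Transpose of conv1d: each input value scatters a scaled kernel copy."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


kernel = [1.0, 2.0, 1.0]
small = [1.0, 0.0, 2.0]

up = transposed_conv1d(small, kernel, stride=2)
print(len(small), "->", len(up))  # 3 -> 7
print(up)                         # [1.0, 2.0, 1.0, 0.0, 2.0, 4.0, 2.0]

# Transpose property: <conv(x), y> equals <x, conv_transposed(y)>.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(dot(conv1d(x, kernel, stride=2), small), dot(x, up))  # 56.0 56.0
```

The output is longer than the input, which is the upsampling effect, but applying the deconvolution after the convolution does not recover the original signal in general.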


For the detailed process of computing the deconvolution, watch the free video playback on AI MU School.

Let's talk about padding and stride.

For a deconvolution layer, padding and stride actually refer to the ordinary convolution that it transposes, not to the ordinary convolution that is equivalent to the deconvolution.
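One way to see what this convention means is to compare output sizes. The sketch below uses the standard size formulas for a convolution and for the deconvolution that transposes it (ignoring dilation and any extra output padding); the function names and the example layer are assumptions for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of an ordinary convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_output_size(size, kernel, stride=1, padding=0):
    """Output size of the deconvolution that transposes the convolution
    with the same kernel, stride and padding."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical layer: 4x4 kernel, stride 2, padding 1 (halves the size).
down = conv_output_size(8, kernel=4, stride=2, padding=1)
up = deconv_output_size(down, kernel=4, stride=2, padding=1)
print(down, up)  # 4 8

# In the deconvolution, a larger stride enlarges the output and a larger
# padding shrinks it, the opposite of an ordinary convolution.
print(deconv_output_size(3, kernel=3, stride=2))             # 7
print(deconv_output_size(3, kernel=3, stride=2, padding=1))  # 5
```

With the same kernel, stride and padding, the deconvolution maps the 4-wide output of the convolution back to the 8-wide input size.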

Bilinear interpolation upsampling

Three uses of bilinear interpolation upsampling: initializing the weights of a deconvolution layer; using it in place of deconvolution; and combining upsampling with convolution.
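The first use, initializing deconvolution weights, can be sketched as follows. The kernel formula is the one commonly used for this purpose; the function names are hypothetical, and the 1D scatter routine is included only so that the block runs on its own.

```python
def bilinear_kernel_1d(factor):
    """1D bilinear weights for upsampling by an integer `factor`."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def bilinear_kernel_2d(factor):
    """2D kernel as the outer product of the 1D weights."""
    w = bilinear_kernel_1d(factor)
    return [[a * b for b in w] for a in w]


def transposed_conv1d(y, kernel, stride):
    """Each input value scatters a scaled copy of the kernel."""
    out = [0.0] * ((len(y) - 1) * stride + len(kernel))
    for i, value in enumerate(y):
        for j, k in enumerate(kernel):
            out[i * stride + j] += value * k
    return out


w = bilinear_kernel_1d(2)
print(w)  # [0.25, 0.75, 0.75, 0.25]

# Upsample a ramp by 2: the interior outputs stay on the same straight line.
up = transposed_conv1d([0.0, 4.0, 8.0], w, stride=2)
print(up[2:6])  # [1.0, 3.0, 5.0, 7.0]
```

A deconvolution layer initialized this way starts out as plain bilinear upsampling and can then be refined by training.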

Dilated convolution, also called atrous convolution, is described below.

It enlarges the receptive field of the feature map without increasing the amount of computation. For image segmentation, the benefit is that global information is easier to extract, which improves segmentation accuracy considerably.
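A minimal 1D sketch of this trade-off, with hypothetical function names: the kernel always has three weights, so the work per output value is unchanged, but spacing the taps apart widens the input span that each output sees.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """1D convolution whose taps are `dilation` samples apart (no padding)."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # input samples covered by one output
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers with the given dilations."""
    field = 1
    for d in dilations:
        field += d * (kernel_size - 1)
    return field


x = list(range(10))
kernel = [1, 1, 1]  # always 3 multiplications per output value

print(dilated_conv1d(x, kernel, dilation=1)[0])  # 0 + 1 + 2 = 3
print(dilated_conv1d(x, kernel, dilation=3)[0])  # 0 + 3 + 6 = 9
print(receptive_field(3, [1, 1, 1]))             # 7
print(receptive_field(3, [1, 2, 4]))             # 15
```

Three stacked 3-tap layers see 7 input samples without dilation and 15 with dilations 1, 2 and 4, using the same number of weights.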

Feature pyramid

There are several kinds of feature pyramid

Feature Pyramid Network

Pyramid pooling

The former extracts features at different scales, whereas pyramid pooling pools the features to several different sizes after they have been extracted.
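Pyramid pooling can be sketched on a single-channel toy feature map. The pooling levels (1, 2 and 4 bins), the feature values and the function name are assumptions for illustration, and the helper only handles sizes that divide evenly.

```python
def average_pool_to(feature_map, bins):
    """Average-pool a square map to bins x bins (size must divide evenly)."""
    size = len(feature_map)
    step = size // bins
    pooled = []
    for by in range(bins):
        row = []
        for bx in range(bins):
            cells = [
                feature_map[y][x]
                for y in range(by * step, (by + 1) * step)
                for x in range(bx * step, (bx + 1) * step)
            ]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Hypothetical single-channel 4x4 feature map.
feature_map = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]

# Assumed pyramid levels: the same features summarized at several sizes.
pyramid = {bins: average_pool_to(feature_map, bins) for bins in (1, 2, 4)}
print(pyramid[1])  # [[2.5]]
print(pyramid[2])  # [[1.0, 2.0], [3.0, 4.0]]
```

The 1x1 level carries a global summary and the finer levels keep more local detail. Networks that use this idea typically upsample the pooled maps and combine them with the original features.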

Characteristics of Mask R-CNN

The first feature is its multi-branch output: it simultaneously outputs the object category, the bounding box, and the mask.

The second feature is that it uses a binary mask. Earlier networks predicted multi-class masks, whereas here the mask only needs to determine where the object is.
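A minimal sketch of the binary-mask idea, under the assumption that the class is decided by the separate classification branch and the mask branch produces one independent map per class. The logits, class names and the 0.5 threshold are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mask logits for one region: mask_logits[class][y][x].
mask_logits = {
    "person": [[-2.0, 1.5], [0.5, 3.0]],
    "car": [[4.0, -1.0], [-3.0, -0.5]],
}

# The class comes from the separate classification branch (assumed here).
predicted_class = "person"

# Binary mask: each pixel is thresholded independently for that one class,
# so the classes do not compete with each other at the pixel level.
binary_mask = [
    [sigmoid(z) > 0.5 for z in row] for row in mask_logits[predicted_class]
]
print(binary_mask)  # [[False, True], [True, True]]
```

The mask only answers "object or not" for each pixel; which object it is has already been decided elsewhere.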

The last feature is the RoIAlign layer, which maps the position of the object accurately onto the corresponding position of the feature map.

For details, please watch the free live playback video.

Comparison of RoI pooling and RoIAlign
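The difference between the two can be sketched for a single sample point. RoI pooling snaps a fractional feature-map coordinate to the integer grid (floor is used here as an assumption), whereas RoIAlign keeps the fractional coordinate and interpolates bilinearly. The feature map, the stride of 16 and the function names are hypothetical, and a full RoIAlign layer samples several such points per output bin and aggregates them.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Value at a fractional (y, x), by bilinear interpolation."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(feature_map) - 1)
    x1 = min(x0 + 1, len(feature_map[0]) - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feature_map, y, x):
    """RoI-pooling-style lookup: snap the coordinate to the integer grid."""
    return feature_map[int(math.floor(y))][int(math.floor(x))]


feature_map = [
    [0.0, 10.0],
    [20.0, 30.0],
]

# Hypothetical point: image coordinate (y=4, x=12) on a stride-16 feature map.
stride = 16
y, x = 4 / stride, 12 / stride  # (0.25, 0.75) on the feature map

roi_align_value = bilinear_sample(feature_map, y, x)
roi_pool_value = quantized_sample(feature_map, y, x)
print(roi_align_value, roi_pool_value)  # 12.5 0.0
```

The quantized lookup discards the fractional offset entirely, which is the misalignment that RoIAlign is designed to avoid.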

Lei Feng Net's AI MU School provides the playback video of this live talk. Direct link: http://www.mooc.ai/course/414/learn#lesson/2266
