Deep Learning Image Segmentation: The U-Net Network


Foreword:

I have never developed the habit of organizing my notes, which has caused me to forget and miss many things. I am taking this opportunity to build that habit:

to organize and record what I have already done, and to explore and share new things.

So the main content of this blog will be records of my own work and study, along with notes on new algorithms and network architectures I am learning. It is basically about deep learning and machine learning.

This first post covers image segmentation with deep learning using the U-Net network. Follow-up posts will work through deep learning as systematically as possible, with records along the way.

I will update at least once a week.

Image segmentation in deep learning derives from classification: it is, in essence, classification applied to regions of pixels.

Unlike image segmentation in classical machine learning, which typically uses clustering, image segmentation in deep learning is a supervised problem: a segmentation gold standard (ground truth) is required as the training label.

During image segmentation, the network's loss function is usually based on the Dice coefficient. Simply put, for a segmentation result A and gold standard B, Dice = 2|A ∩ B| / (|A| + |B|): twice the number of overlapping pixels, divided by the total area of the two regions.
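As a minimal sketch (not from the original post), here is how the Dice coefficient and a Dice-based loss are commonly written in Keras; the smoothing constant smooth, which avoids division by zero, is an assumption on my part:

    from keras import backend as K

    def dice_coef(y_true, y_pred, smooth=1.0):
        # Flatten both masks and measure overlap versus total area.
        y_true_f = K.flatten(y_true)
        y_pred_f = K.flatten(y_pred)
        intersection = K.sum(y_true_f * y_pred_f)
        return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

    def dice_coef_loss(y_true, y_pred):
        # Dice is a similarity in [0, 1]; negate it so it can be minimized.
        return -dice_coef(y_true, y_pred)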

"80573234, the commonly used measurement index in medical image segmentation"

U-Net reference paper:

U-Net: Convolutional Networks for Biomedical Image Segmentation

https://arxiv.org/pdf/1505.04597.pdf

U-Net Network Structure

U-Net is a CNN-based image segmentation network used mainly for medical images. It was originally proposed for cell segmentation and has since shown excellent performance on tasks such as lung nodule detection and retinal vessel extraction.

The original U-Net is composed mainly of convolutional layers, max-pooling layers (downsampling), deconvolution layers (upsampling), and ReLU nonlinear activations. The network proceeds as follows.

Max pooling (downsampling) path:

Assume the initial input is a 572x572 grayscale image. Two successive 3x3x64 convolutions (64 kernels, producing 64 feature maps) turn it into a 568x568x64 volume,

and a 2x2 max-pooling operation then reduces it to 284x284x64. (Note: every 3x3 convolution is followed by a ReLU nonlinearity; it is omitted here for brevity.)

This process is repeated 4 times in total, i.e. (two 3x3 convolutions + one 2x2 max pooling) x 4, and the number of 3x3 kernels doubles in the first convolution after each pooling (64 → 128 → 256 → 512).

At the bottom level, after the 4th max pooling, the feature map is 32x32x512. Two 3x3x1024 convolutions then bring it to 28x28x1024.
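As a concrete sketch of this contracting path in Keras (the layer arguments are my assumptions, chosen to match the sizes above, not the paper's released code):

    from keras.layers import Input, Conv2D, MaxPooling2D

    inputs = Input((572, 572, 1))           # 572x572 grayscale input

    x = inputs
    skips = []                              # feature maps kept for copy-and-crop
    filters = 64
    for _ in range(4):
        x = Conv2D(filters, 3, activation='relu', padding='valid')(x)
        x = Conv2D(filters, 3, activation='relu', padding='valid')(x)
        skips.append(x)                     # saved before pooling, for the upsampling path
        x = MaxPooling2D(pool_size=(2, 2))(x)
        filters *= 2                        # 64 -> 128 -> 256 -> 512

    # Bottom of the "U": 32x32x512 in, two 3x3x1024 convolutions -> 28x28x1024.
    x = Conv2D(1024, 3, activation='relu', padding='valid')(x)
    x = Conv2D(1024, 3, activation='relu', padding='valid')(x)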

Deconvolution (upsampling) path:

The feature map is now 28x28x1024. A 2x2 deconvolution first changes it to 56x56x512. The feature map saved before the corresponding max-pooling layer is then copied and cropped (copy and crop)

and concatenated with the deconvolution output, giving a 56x56x1024 volume, on which 3x3x512 convolutions are performed.

This process is repeated 4 times in total, i.e. (one 2x2 deconvolution + 3x3 convolutions) x 4, and the number of 3x3 kernels is halved in the first convolution after each concatenation (512 → 256 → 128 → 64).

At the top level, after the 4th deconvolution, the feature map is 392x392x64. Copying, cropping, and concatenating gives 392x392x128, and two 3x3x64 convolutions then

produce a 388x388x64 feature map, on which a final 1x1x2 convolution is performed.
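Continuing the sketch above (reusing x and skips from the contracting path), the expanding path with copy-and-crop might look like this; the per-side crop sizes are derived from the 572x572 input:

    from keras.layers import Conv2D, Conv2DTranspose, Cropping2D, concatenate

    filters = 512
    for crop in [4, 16, 40, 88]:            # per-side crop for each skip level
        skip = skips.pop()                  # deepest saved encoder map first
        x = Conv2DTranspose(filters, 2, strides=2)(x)   # 2x2 "up-conv"
        skip = Cropping2D(cropping=crop)(skip)          # copy and crop
        x = concatenate([skip, x])          # channel count doubles here
        x = Conv2D(filters, 3, activation='relu', padding='valid')(x)
        x = Conv2D(filters, 3, activation='relu', padding='valid')(x)
        filters //= 2                       # 512 -> 256 -> 128 -> 64

    # Final 1x1 convolution maps the 64 feature channels to 2 classes
    # (pixel-wise softmax), giving the 388x388x2 output.
    outputs = Conv2D(2, 1, activation='softmax')(x)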

The result looks like the figure in the paper (not reproduced here): the segmentation of the yellow region is inferred from the larger blue input region. In practical applications, of course, size-preserving convolutions (zero-padded convolutions) are usually chosen instead, so the image size stays unchanged.
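A quick illustration of that padding note (assuming Keras with the TensorFlow backend): an unpadded ('valid') 3x3 convolution shrinks the map by 2 pixels, while a zero-padded ('same') one keeps the size unchanged.

    from keras import backend as K
    from keras.layers import Input, Conv2D

    x = Input((572, 572, 1))
    # Unpadded 3x3 convolution: loses a 1-pixel border on each side.
    print(K.int_shape(Conv2D(64, 3, padding='valid')(x)))   # (None, 570, 570, 64)
    # Zero-padded 3x3 convolution: output size equals input size.
    print(K.int_shape(Conv2D(64, 3, padding='same')(x)))    # (None, 572, 572, 64)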

(For background on convolution and deconvolution, refer to: 80520950.)

That covers how U-Net works; now for its strengths and weaknesses. Notice that the structure involves no fully connected layers at all, and that the upsampling path reuses the results of the downsampling path,

so the deep convolutions also have access to shallow, simple features. This makes the convolutions' input richer, and the result naturally reflects more of the original image information.

(In a CNN, the shallow convolutions capture an image's simple features, while the deep convolutions capture its complex features.)

As stated above, the U-Net structure is essentially a development of the FCN (fully convolutional network): the shallow layers near the input extract information at relatively small scales (simple features),

while the deeper layers near the output extract information at relatively large scales (complex features). By adding shortcuts (connections that merge earlier information, unmanipulated, with later results), the network integrates multi-scale information into its prediction.

However, U-Net predicts at only a single scale, so it does not handle changes in object size well.

"Tianchi Medical First Team:https://tianchi.aliyun.com/forum/new_articledetail.html?spm=5176.8366600.0.0.6021311f0wiltq& raceid=231601&postsid=2947 "

As for improvements to the network, here is what I personally have tried:

1. Add a fully connected layer at the bottom level (after the last downsampling, before the first upsampling), in order to attach a cross-entropy loss and inject extra information (for example, whether the image contains a certain type of object).

2. Produce an output (prediction) at each upsampling level and fuse the results, similar in spirit to an FPN (feature pyramid network), although that network contains other components as well.

3. Add BN (Batch Normalization) layers; see the sketch after this list.

These improvements were, of course, somewhat helpful for the specific problems being addressed.
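A minimal sketch of improvement 3, inserting a BN layer between each convolution and its ReLU activation; the helper name conv_bn_relu is mine, for illustration only:

    from keras.layers import Conv2D, BatchNormalization, Activation

    def conv_bn_relu(x, filters):
        # One convolution block with Batch Normalization (improvement 3).
        x = Conv2D(filters, 3, padding='same')(x)   # convolution, no activation yet
        x = BatchNormalization()(x)                 # normalize the pre-activations
        return Activation('relu')(x)                # nonlinearity applied after BN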

Finally, the code. Because the U-Net structure is relatively simple, it is commonly written in Keras, and I use Keras as well. Once I have cleaned it up, I will paste a link to the code.
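Until then, here is a compact sketch of the kind of Keras U-Net I mean, using the size-preserving ('same' padding) variant mentioned above so that no cropping is needed. The input size, single-channel sigmoid output, and optimizer settings are illustrative assumptions; dice_coef_loss and dice_coef refer to the Dice sketch earlier in this post.

    from keras.models import Model
    from keras.layers import (Input, Conv2D, MaxPooling2D, Conv2DTranspose,
                              concatenate)
    from keras.optimizers import Adam

    def unet(input_shape=(512, 512, 1)):
        inputs = Input(input_shape)

        # Contracting path: two 3x3 'same' convolutions, then 2x2 max pooling.
        skips, x, filters = [], inputs, 64
        for _ in range(4):
            x = Conv2D(filters, 3, activation='relu', padding='same')(x)
            x = Conv2D(filters, 3, activation='relu', padding='same')(x)
            skips.append(x)
            x = MaxPooling2D(2)(x)
            filters *= 2

        # Bottom of the "U".
        x = Conv2D(filters, 3, activation='relu', padding='same')(x)
        x = Conv2D(filters, 3, activation='relu', padding='same')(x)

        # Expanding path: 2x2 up-conv, concatenate the matching skip, then
        # two 3x3 convolutions. With 'same' padding the sizes already match,
        # so no cropping is required.
        for skip in reversed(skips):
            filters //= 2
            x = Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
            x = concatenate([skip, x])
            x = Conv2D(filters, 3, activation='relu', padding='same')(x)
            x = Conv2D(filters, 3, activation='relu', padding='same')(x)

        # 1x1 convolution to a single foreground-probability channel.
        outputs = Conv2D(1, 1, activation='sigmoid')(x)

        model = Model(inputs=inputs, outputs=outputs)
        # dice_coef_loss / dice_coef come from the Dice sketch above.
        model.compile(optimizer=Adam(1e-4), loss=dice_coef_loss,
                      metrics=[dice_coef])
        return model

    model = unet()
    model.summary()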
