# Computer Vision - Semantic Segmentation (Part 2)

U-Net works well for medical image segmentation and natural image generation, for two main reasons:

1. Its skip connections concatenate encoder features at the same resolution, so low-level features compensate for the information lost during upsampling.

2. Medical imaging datasets are usually small, which makes those low-level features especially important.
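The same-resolution concatenation described above can be sketched as a single decoder step. This is a minimal illustration with hypothetical module and parameter names (`UpBlock`, `in_ch`, `skip_ch`, `out_ch`), not U-Net's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """One hypothetical U-Net decoder step: upsample, then concatenate
    the same-resolution encoder feature map (the skip connection)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        # Upsample the coarse decoder features to the encoder resolution
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        # The skip connection supplies low-level detail lost during downsampling
        x = torch.cat([x, skip], dim=1)
        return F.relu(self.conv(x))
```

The concatenation doubles the channel count fed into the convolution, which is why `in_ch + skip_ch` appears in the layer definition.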

Common loss functions for segmentation:

1. The most common choice is binary cross-entropy loss combined with Dice coefficient loss. The former is a pixel-level loss; the latter is an image-level (or batch-level) loss, well suited to problems evaluated with IoU.

2. Online bootstrapped cross-entropy loss, a form of hard-example mining used in networks such as FRRN.

3. Lovász loss, from the paper "The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks"; it is likewise well suited to IoU-evaluated problems.
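The BCE + Dice combination from item 1 can be written compactly. The module name, the `smooth` constant, and the `dice_weight` balance are hypothetical choices for this sketch, not a reference implementation:

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Sketch of a combined loss: pixel-level BCE plus a soft Dice term.

    `smooth` and `dice_weight` are illustrative hyperparameters.
    """
    def __init__(self, smooth=1.0, dice_weight=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth
        self.dice_weight = dice_weight

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        # Flatten each sample so Dice is computed per image, then averaged
        probs = probs.view(probs.size(0), -1)
        targets_flat = targets.view(targets.size(0), -1)
        inter = (probs * targets_flat).sum(dim=1)
        union = probs.sum(dim=1) + targets_flat.sum(dim=1)
        dice = (2.0 * inter + self.smooth) / (union + self.smooth)
        # 1 - Dice is minimized when prediction and target overlap fully
        return bce + self.dice_weight * (1.0 - dice.mean())
```

Because the Dice term is computed over whole images rather than independent pixels, it directly rewards overlap, which is why it pairs naturally with an IoU-style evaluation metric.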

Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks
The SE block from SE-Net reweights the individual channels of a feature map; this paper adds a spatial counterpart and combines the two.

```python
import torch.nn as nn

class cSELayer(nn.Module):
    """Channel SE: squeeze spatially, then reweight each channel."""
    def __init__(self, channel, reduction=2):
        super(cSELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ELU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y


class sSELayer(nn.Module):
    """Spatial SE: a 1x1 convolution produces a per-pixel gate."""
    def __init__(self, channel):
        super(sSELayer, self).__init__()
        self.fc = nn.Conv2d(channel, 1, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.fc(x)
        y = self.sigmoid(y)
        return x * y


class scSELayer(nn.Module):
    """Concurrent spatial and channel SE: sum of the two branches."""
    def __init__(self, channels, reduction=2):
        super(scSELayer, self).__init__()
        self.sSE = sSELayer(channels)
        self.cSE = cSELayer(channels, reduction=reduction)

    def forward(self, x):
        sx = self.sSE(x)
        cx = self.cSE(x)
        x = sx + cx
        return x
```

```python
import torch.nn as nn
import torch.nn.functional as F

class Dblock(nn.Module):
    """Cascaded dilated convolutions (dilation 1, 2, 4); the input and
    every intermediate output are summed for the final response."""
    def __init__(self, channel):
        super(Dblock, self).__init__()
        self.dilate1 = nn.Conv2d(channel, channel, kernel_size=3, dilation=1, padding=1)
        self.dilate2 = nn.Conv2d(channel, channel, kernel_size=3, dilation=2, padding=2)
        self.dilate3 = nn.Conv2d(channel, channel, kernel_size=3, dilation=4, padding=4)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        dilate1_out = F.relu(self.dilate1(x), inplace=True)
        dilate2_out = F.relu(self.dilate2(dilate1_out), inplace=True)
        dilate3_out = F.relu(self.dilate3(dilate2_out), inplace=True)
        out = x + dilate1_out + dilate2_out + dilate3_out
        return out
```

OCNet: Object Context Network for Scene Parsing

Hypercolumns for Object Segmentation and Fine-grained Localization

```python
# Hypercolumn-style decoder excerpt: every decoder stage is upsampled to
# the finest resolution and concatenated along the channel dimension.
d5 = self.decoder5(center)
d4 = self.decoder4(d5, e4)
d3 = self.decoder3(d4, e3)
d2 = self.decoder2(d3, e2)
d1 = self.decoder1(d2, e1)
f = torch.cat((
    d1,
    F.interpolate(d2, scale_factor=2, mode='bilinear', align_corners=False),
    F.interpolate(d3, scale_factor=4, mode='bilinear', align_corners=False),
    F.interpolate(d4, scale_factor=8, mode='bilinear', align_corners=False),
    F.interpolate(d5, scale_factor=16, mode='bilinear', align_corners=False),
), 1)
```
