MXNet: Dropout

Last updated: 2018-08-23. Source: Internet

Uploaded by: User


Besides the weight decay introduced earlier, deep learning models often use dropout to combat overfitting.

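Method and Principle

To keep the model deterministic at test time, dropout is applied only while training the model, not while testing it. When a layer of a neural network uses dropout, each neuron in that layer is dropped with a certain probability.

Let the drop probability be $p$. Specifically, after the activation function is applied, each neuron in the layer is multiplied by 0 with probability $p$, and divided by $1-p$ (rescaled) with probability $1-p$. The drop probability is a hyperparameter of dropout.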

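In a multilayer perceptron with four inputs, the output of hidden node $i$ is:

\[h_i = \phi(x_1 w_1^{(i)} + x_2 w_2^{(i)} + x_3 w_3^{(i)} + x_4 w_4^{(i)} + b^{(i)}).\]

Let the drop probability be $p$, and let the random variable $\xi_i$ be 0 with probability $p$ and 1 with probability $1-p$. Then the hidden unit $h_i$ under dropout is computed as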

\[h_i = \frac{\xi_i}{1-p} \phi(x_1 w_1^{(i)} + x_2 w_2^{(i)} + x_3 w_3^{(i)} + x_4 w_4^{(i)} + b^{(i)}).\]

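Note that dropout is not used when testing the model. Since $\mathbb{E}(\xi_i) = 1-p$, we have $\mathbb{E} (\frac{\xi_i}{1-p}) =\frac{\mathbb{E}(\xi_i)}{1-p}=1$, so the expected output of a given neuron is the same during training and testing.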
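The expectation argument above can be checked numerically. The following is a minimal sketch in plain Python (no MXNet), assuming an illustrative activation value `h = 2.0`, a drop probability `p = 0.5`, and a fixed seed; the helper name `scaled_dropout_sample` is hypothetical and not part of the article.

```python
import random


def scaled_dropout_sample(h, p, rng):
    """Return one training-time sample of xi / (1 - p) * h (requires p < 1)."""
    xi = 0.0 if rng.random() < p else 1.0  # xi = 0 with probability p
    return xi / (1.0 - p) * h


rng = random.Random(0)        # fixed seed, an arbitrary choice for repeatability
h, p, n = 2.0, 0.5, 100_000   # illustrative values, not taken from the article
mean = sum(scaled_dropout_sample(h, p, rng) for _ in range(n)) / n
print(f"empirical mean = {mean:.3f}, activation h = {h}")
```

Each individual sample is either 0 or `h / (1 - p)`, yet the average over many samples stays close to `h`, which is why no rescaling is needed at test time.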

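Output layer:

\[o_1 = \phi(h_1 w_1' + h_2 w_2' + h_3 w_3' + h_4 w_4' + h_5 w_5' + b')\]

During training, any of the hidden units may be dropped, so the computation of $o_1$ cannot rely too heavily on any single one of $h_1, \ldots, h_5$. This typically pushes the weights $w_1', \ldots, w_5'$ in the expression for $o_1$ toward 0. Dropout therefore acts as a regularizer and can be used to combat overfitting.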
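As a concrete illustration under assumed values (not from the article): take $p = 0.5$ and suppose that on one training step $\xi_2 = \xi_5 = 0$ while $\xi_1 = \xi_3 = \xi_4 = 1$. Writing $\tilde{h}_i$ for the activation before dropout (a symbol introduced here only for this example), so that $h_i = \frac{\xi_i}{1-p}\tilde{h}_i$, the output becomes

\[o_1 = \phi(2\tilde{h}_1 w_1' + 2\tilde{h}_3 w_3' + 2\tilde{h}_4 w_4' + b').\]

On another training step a different subset of hidden units would be zeroed, so no single weight can carry the prediction on its own.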

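Implementation

Drop the values in X according to drop_prob.

    # gluonbook is the helper package used throughout this article.
    import gluonbook as gb
    from mxnet import autograd, nd
    from mxnet.gluon import loss as gloss

    def dropout(X, drop_prob):
        assert 0 <= drop_prob <= 1
        keep_prob = 1 - drop_prob
        # In this case every element is dropped.
        if keep_prob == 0:
            return X.zeros_like()
        # Keep each element with probability keep_prob.
        mask = nd.random.uniform(0, 1, X.shape) < keep_prob
        # Rescale the kept elements so the expected value is unchanged.
        return mask * X / keep_prob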
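To see how the three cases of this function behave (nothing dropped, some dropped, everything dropped), here is a small sketch that mirrors the same logic on plain Python lists instead of NDArrays. The helper `dropout_list`, the input values, and the seed are assumptions made for illustration only.

```python
import random


def dropout_list(xs, drop_prob, rng):
    """Inverted dropout on a list of floats (hypothetical pure-Python helper)."""
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    if keep_prob == 0:
        # Every element is dropped.
        return [0.0 for _ in xs]
    # Keep each element with probability keep_prob and rescale it.
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]


rng = random.Random(42)  # arbitrary seed
xs = [1.0, 2.0, 3.0, 4.0]
print(dropout_list(xs, 0.0, rng))  # nothing dropped, values unchanged
print(dropout_list(xs, 0.5, rng))  # each entry is either 0 or doubled
print(dropout_list(xs, 1.0, rng))  # everything dropped
```

With `drop_prob = 0.5` the surviving entries are doubled, which matches the division by `keep_prob` in the NDArray version.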

Define the network parameters: a three-layer network for an MNIST-style task (784 input pixels, 10 classes).

    num_inputs = 784
    num_outputs = 10
    num_hiddens1 = 256
    num_hiddens2 = 256

    W1 = nd.random.normal(scale=0.01, shape=(num_inputs, num_hiddens1))
    b1 = nd.zeros(num_hiddens1)
    W2 = nd.random.normal(scale=0.01, shape=(num_hiddens1, num_hiddens2))
    b2 = nd.zeros(num_hiddens2)
    W3 = nd.random.normal(scale=0.01, shape=(num_hiddens2, num_outputs))
    b3 = nd.zeros(num_outputs)

    params = [W1, b1, W2, b2, W3, b3]
    for param in params:
        param.attach_grad()

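Chain the fully connected layers with the ReLU activation function, and apply dropout to the output of the activation function. The drop probability can be set separately for each layer. It is generally recommended to use a smaller drop probability for layers closer to the input. The network is defined as follows:

    drop_prob1 = 0.2
    drop_prob2 = 0.5

    def net(X):
        X = X.reshape((-1, num_inputs))
        H1 = (nd.dot(X, W1) + b1).relu()
        # Use dropout only when training the model.
        if autograd.is_training():
            # Add a dropout layer after the first fully connected layer.
            H1 = dropout(H1, drop_prob1)
        H2 = (nd.dot(H1, W2) + b2).relu()
        if autograd.is_training():
            # Add a dropout layer after the second fully connected layer.
            H2 = dropout(H2, drop_prob2)
        return nd.dot(H2, W3) + b3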
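The effect of gating dropout on the training mode can be illustrated without MXNet. The sketch below is a toy forward pass in plain Python, where an explicit `training` argument stands in for `autograd.is_training()`; the tiny hand-picked weights, the helper names, and the drop probability of 0.5 are all assumptions for illustration.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, W, b):
    """Compute v @ W + b for a vector v and a weight matrix W given as rows."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) + b[j]
            for j in range(len(b))]


def toy_dropout(v, drop_prob, rng):
    keep_prob = 1 - drop_prob
    if keep_prob == 0:
        return [0.0] * len(v)
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in v]


def forward(x, params, training, rng):
    W1, b1, W2, b2 = params
    h = relu(dense(x, W1, b1))
    if training:  # dropout is applied only in training mode
        h = toy_dropout(h, 0.5, rng)
    return dense(h, W2, b2)


rng = random.Random(0)  # arbitrary seed
# Hand-picked weights: 2 inputs, 3 hidden units, 1 output.
params = ([[1.0, -1.0, 0.5], [0.5, 1.0, -0.5]], [0.0, 0.0, 0.0],
          [[1.0], [1.0], [1.0]], [0.0])
x = [1.0, 2.0]
print("test mode: ", forward(x, params, False, rng))
print("train mode:", [forward(x, params, True, rng) for _ in range(3)])
```

In test mode the output is always `[3.0]` for these weights, while in training mode it varies from call to call depending on which hidden units are dropped.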

Training and testing:

    num_epochs = 5
    lr = 0.5
    batch_size = 256
    loss = gloss.SoftmaxCrossEntropyLoss()
    train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
    gb.train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size, params,
                 lr)

Output:

    epoch 1, loss 0.9913, train acc 0.663, test acc 0.931
    epoch 2, loss 0.2302, train acc 0.933, test acc 0.954
    epoch 3, loss 0.1601, train acc 0.953, test acc 0.958
    epoch 4, loss 0.1250, train acc 0.964, test acc 0.973
    epoch 5, loss 0.1045, train acc 0.969, test acc 0.974

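Gluon Implementation

When training the model, the Dropout layer randomly drops output elements of the previous layer with the specified drop probability; when testing the model, the Dropout layer has no effect.
With Gluon, we can build a multilayer neural network that uses dropout more conveniently.

    import sys
    sys.path.append('..')
    import gluonbook as gb
    from mxnet import autograd, gluon, init, nd
    from mxnet.gluon import loss as gloss, nn

    drop_prob1 = 0.2
    drop_prob2 = 0.5

    net = nn.Sequential()
    net.add(nn.Flatten())
    net.add(nn.Dense(256, activation="relu"))
    # Add a dropout layer after the first fully connected layer.
    net.add(nn.Dropout(drop_prob1))
    net.add(nn.Dense(256, activation="relu"))
    # Add a dropout layer after the second fully connected layer.
    net.add(nn.Dropout(drop_prob2))
    net.add(nn.Dense(10))
    net.initialize(init.Normal(sigma=0.01))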

Training and results:

    num_epochs = 5
    batch_size = 256
    loss = gloss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
    train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
    gb.train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size,
                 None, None, trainer)

    # output
    epoch 1, loss 0.9815, train acc 0.668, test acc 0.927
    epoch 2, loss 0.2365, train acc 0.931, test acc 0.952
    epoch 3, loss 0.1634, train acc 0.952, test acc 0.968
    epoch 4, loss 0.1266, train acc 0.963, test acc 0.972
    epoch 5, loss 0.1069, train acc 0.969, test acc 0.976
