MXNet: Dropout


In addition to the weight decay described earlier, deep learning models often use dropout to cope with overfitting.

Methods and principles

To keep the model's predictions deterministic at test time, dropout is applied only when the model is trained, not when it is tested. When a layer in a neural network uses dropout, each neuron in that layer is discarded with a certain probability.

Let the drop probability be \(p\). Specifically, after the activation function is applied, any neuron's output is set to zero with probability \(p\), and with probability \(1-p\) it is divided by \(1-p\), i.e. stretched. The drop probability is a hyperparameter of dropout.

In a multilayer perceptron, the output of a hidden unit is:

\[h_i = \phi (x_1 w_1^{(i)} + x_2 w_2^{(i)} + x_3 w_3^{(i)} + x_4 w_4^{(i)} + b^{(i)}), \]

Let the drop probability be \(p\), and let the random variable \(\xi_i\) be 0 with probability \(p\) and 1 with probability \(1-p\). Then, with dropout, the hidden unit \(h_i\) is computed as

\[h_i = \frac{\xi_i}{1-p} \phi (x_1 w_1^{(i)} + x_2 w_2^{(i)} + x_3 w_3^{(i)} + x_4 w_4^{(i)} + b^{(i)}).\]

Note that dropout is not used when testing the model. Because \(\mathbb{E}\left(\frac{\xi_i}{1-p}\right) = \frac{\mathbb{E}(\xi_i)}{1-p} = 1\), the expected output of a neuron is the same during training and testing.
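As a quick sanity check (not part of the original text), we can estimate this expectation by sampling a large rescaled mask and looking at its mean; the variable names below are illustrative:

from mxnet import nd

p = 0.5
# xi is 1 with probability 1-p and 0 with probability p.
xi = nd.random.uniform(0, 1, shape=(1000000,)) < (1 - p)
# The rescaled mask xi/(1-p) has a sample mean close to 1.
print((xi / (1 - p)).mean())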

When dropout is applied to the hidden layer during training, the computation of the output layer,
\[o_1 = \phi(h_1 w_1' + h_2 w_2' + h_3 w_3' + h_4 w_4' + h_5 w_5' + b'),\]

cannot be overly dependent on any one of \(h_1, \ldots, h_5\), since each of them may be zeroed out. This typically keeps the weight parameters \(w_1', \ldots, w_5'\) in the expression for \(o_1\) close to 0. Therefore, dropout acts as a form of regularization and can be used to deal with overfitting.

Implementation

The dropout function below drops the elements of X with probability drop_prob.

def dropout(X, drop_prob):
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # In this case, all elements are dropped.
    if keep_prob == 0:
        return X.zeros_like()
    mask = nd.random.uniform(0, 1, X.shape) < keep_prob
    return mask * X / keep_prob
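To see the function in action, we can run it on a small array with different drop probabilities (a quick check, not part of the original text; the exact elements dropped vary between runs because the mask is random):

from mxnet import nd

X = nd.arange(16).reshape((2, 8))
print(dropout(X, 0))    # drop nothing: X is returned unchanged
print(dropout(X, 0.5))  # roughly half of the elements are zeroed, the rest scaled by 1/(1-0.5)
print(dropout(X, 1))    # drop everything: all elements become 0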

Define the network parameters: a three-layer network for the Fashion-MNIST task.

num_inputs = 784
num_outputs = 10
num_hiddens1 = 256
num_hiddens2 = 256
W1 = nd.random.normal(scale=0.01, shape=(num_inputs, num_hiddens1))
b1 = nd.zeros(num_hiddens1)
W2 = nd.random.normal(scale=0.01, shape=(num_hiddens1, num_hiddens2))
b2 = nd.zeros(num_hiddens2)
W3 = nd.random.normal(scale=0.01, shape=(num_hiddens2, num_outputs))
b3 = nd.zeros(num_outputs)
params = [W1, b1, W2, b2, W3, b3]
for param in params:
    param.attach_grad()

Each fully connected layer is followed by a ReLU activation, and dropout is applied to the activation's output. We can set the drop probability of each layer separately; in general, it is recommended to use a smaller drop probability for layers closer to the input. The network structure is as follows:

drop_prob1 = 0.2
drop_prob2 = 0.5

def net(X):
    X = X.reshape((-1, num_inputs))
    H1 = (nd.dot(X, W1) + b1).relu()
    # Use dropout only when training the model.
    if autograd.is_training():
        # Apply dropout after the first fully connected layer.
        H1 = dropout(H1, drop_prob1)
    H2 = (nd.dot(H1, W2) + b2).relu()
    if autograd.is_training():
        # Apply dropout after the second fully connected layer.
        H2 = dropout(H2, drop_prob2)
    return nd.dot(H2, W3) + b3
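Because net checks autograd.is_training(), dropout is applied only when the forward pass runs inside autograd.record(); outside of it the computation is deterministic. A minimal check (assuming the parameters and imports above are in place):

from mxnet import autograd, nd

X = nd.random.uniform(shape=(2, num_inputs))
print(net(X))              # prediction mode: is_training() is False, no dropout
with autograd.record():    # record() enables training mode by default
    print(net(X))          # training mode: dropout masks are applied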

Training and testing:

num_epochs = 5
lr = 0.5
batch_size = 256
loss = gloss.SoftmaxCrossEntropyLoss()
train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
gb.train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size, params,
             lr)

Result output:

epoch 1, loss 0.9913, train acc 0.663, test acc 0.931
epoch 2, loss 0.2302, train acc 0.933, test acc 0.954
epoch 3, loss 0.1601, train acc 0.953, test acc 0.958
epoch 4, loss 0.1250, train acc 0.964, test acc 0.973
epoch 5, loss 0.1045, train acc 0.969, test acc 0.974
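For reference, gb.train_cpu is a helper from the gluonbook package; its inner loop corresponds roughly to the sketch below. This is illustrative only, not the helper's actual code, and the sgd function here is a plain minibatch SGD update written for this example:

from mxnet import autograd

def sgd(params, lr, batch_size):
    # Plain minibatch SGD: update each parameter in place.
    for param in params:
        param[:] = param - lr * param.grad / batch_size

for epoch in range(num_epochs):
    for X, y in train_iter:
        with autograd.record():
            l = loss(net(X), y)   # forward pass in training mode, so dropout is active
        l.backward()
        sgd(params, lr, batch_size)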
Gluon implementation

During training, the Dropout layer randomly discards the previous layer's output elements with the specified drop probability; when the model is being tested, the Dropout layer is inactive.
Using Gluon, we can construct multilayer neural networks and apply dropout more easily.

import sys
sys.path.append('..')
import gluonbook as gb
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import loss as gloss, nn

drop_prob1 = 0.2
drop_prob2 = 0.5

net = nn.Sequential()
net.add(nn.Flatten())
net.add(nn.Dense(256, activation="relu"))
# Add a dropout layer after the first fully connected layer.
net.add(nn.Dropout(drop_prob1))
net.add(nn.Dense(256, activation="relu"))
# Add a dropout layer after the second fully connected layer.
net.add(nn.Dropout(drop_prob2))
net.add(nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))
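As a small check (not part of the original text), printing the Sequential block shows the layer order, and a forward pass on dummy data confirms the output shape:

print(net)                             # lists the Flatten/Dense/Dropout layer stack
X = nd.random.uniform(shape=(2, 784))
print(net(X).shape)                    # (2, 10): one score per Fashion-MNIST class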

Training and Results:

num_epochs = 5
batch_size = 256
loss = gloss.SoftmaxCrossEntropyLoss()
train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
gb.train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size,
             None, None, trainer)

# output
epoch 1, loss 0.9815, train acc 0.668, test acc 0.927
epoch 2, loss 0.2365, train acc 0.931, test acc 0.952
epoch 3, loss 0.1634, train acc 0.952, test acc 0.968
epoch 4, loss 0.1266, train acc 0.963, test acc 0.972
epoch 5, loss 0.1069, train acc 0.969, test acc 0.976
