MXNET: Classification Model

Source: Internet
Author: User
Tags: shuffle, mxnet, dataloader

The linear regression model is suitable for scenarios where the output is a continuous value, such as a house price. In other scenarios, the model output may instead be a discrete value, such as a sample's category. For such classification problems, we can use a classification model, such as Softmax regression.

For the sake of discussion, let's assume that the input image has size 2x2 and that its four features, that is, its pixel values, are \(x_1, x_2, x_3, x_4\). Suppose the true label of an image in the training dataset is dog, cat, or chicken, corresponding to the discrete values \(y_1, y_2, y_3\).

Vector expression for single-sample classification

For the above problem, assume that the weight and bias parameters of the classification model are:

\[\boldsymbol{W}=\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix}, \quad \boldsymbol{b}=\begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}\]

Let a sample be \(x^{(i)}=[x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, x_4^{(i)}]\), the output layer be \(o^{(i)}=[o_1^{(i)}, o_2^{(i)}, o_3^{(i)}]\), and the predicted probabilities of dog, cat, and chicken be \(\hat{y}^{(i)}=[\hat{y}_1^{(i)}, \hat{y}_2^{(i)}, \hat{y}_3^{(i)}]\).
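
Written out with the symbols just defined, the single-sample computation (the standard Softmax regression expression, stated here for completeness) is

\[o^{(i)} = x^{(i)} \boldsymbol{W} + \boldsymbol{b}, \quad \hat{y}^{(i)} = \text{softmax}(o^{(i)}), \quad \text{where } \hat{y}_j^{(i)} = \frac{\exp(o_j^{(i)})}{\sum_{k=1}^{3} \exp(o_k^{(i)})}.\]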

Vector expression for a mini-batch of samples

If each mini-batch contains n samples, the number of input features is x, and the number of output categories is y, then the batch feature matrix \(\boldsymbol{X}\) has shape \(n \times x\), the weight \(\boldsymbol{W}\) has shape \(x \times y\), and the bias \(\boldsymbol{b}\) has shape \(1 \times y\).

\[\boldsymbol{O} = \boldsymbol{X}\boldsymbol{W} + \boldsymbol{b}, \quad \hat{\boldsymbol{Y}} = \text{softmax}(\boldsymbol{O})\]
The addition operation here uses the broadcasting mechanism.
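
As a minimal sketch of that broadcasting (shapes chosen only for illustration: 2 samples, 4 features, 3 categories, matching the toy example above), the 1 x 3 bias is added to every row of the 2 x 3 product:

from mxnet import nd

# Illustrative shapes only: n = 2 samples, 4 features, 3 categories
X = nd.random_normal(shape=(2, 4))
W = nd.random_normal(shape=(4, 3))
b = nd.array([[1, 2, 3]])         # shape (1, 3)

O = nd.dot(X, W) + b              # b is broadcast along the sample axis to shape (2, 3)
print(O.shape)                    # (2, 3)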

Cross-entropy loss function

Softmax regression uses the cross-entropy loss function (cross-entropy loss). Suppose the classification problem has m categories. When the label category of sample i is \(y_j\) (\(1 \leq j \leq m\)), set \(q_j^{(i)}=1\), and set \(q_k^{(i)}=0\) for \(k \neq j\), \(1 \leq k \leq m\).
Let \(p_j^{(i)}\) (\(1 \leq j \leq m\)) be the probability the model predicts for sample i on category \(y_j\). Assuming the training dataset contains n samples, the cross-entropy loss function is defined as
\[\ell(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m q_j^{(i)} \log p_j^{(i)},\]

where \(\boldsymbol{\theta}\) represents the model parameters. When training the Softmax regression, we use an optimization algorithm to iteratively update the model parameters and continuously reduce the value of the loss function.
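
As context for the training code later in this article, a sketch of the mini-batch stochastic gradient descent update (assuming, as the sgd function below does, that the gradient is averaged over a mini-batch \(\mathcal{B}\)) is

\[\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_{\boldsymbol{\theta}} \ell^{(i)}(\boldsymbol{\theta}),\]

where \(\eta\) is the learning rate, \(|\mathcal{B}|\) is the batch size, and \(\ell^{(i)}\) is the loss of sample \(i\); this corresponds to the param[:] = param - lr * param.grad / batch_size update implemented later in the sgd function.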

The cross-entropy loss function can also be understood from another angle. When training the model, for each sample in the training dataset we want the predicted probability of its true label category to be as large as possible; that is, we want the model to be as likely as possible to output the true label category.

Let \(p_{\text{label}_i}\) be the probability the model predicts for the true label category of sample \(i\), and let the number of samples in the training dataset be \(n\). Because the logarithmic function is monotonically increasing, maximizing the joint predicted probability of all true label categories over the training dataset, \(\prod_{i=1}^n p_{\text{label}_i}\), is equivalent to maximizing \(\sum_{i=1}^n \log p_{\text{label}_i}\), which is in turn equivalent to minimizing \(-\sum_{i=1}^n \log p_{\text{label}_i}\), and therefore also equivalent to minimizing the cross-entropy loss function defined above.
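
Because \(q^{(i)}\) equals one for the true label and zero elsewhere, the inner sum of the loss defined above keeps only the term of the true label, which makes this equivalence explicit:

\[\ell(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^n \log p_{\text{label}_i},\]

so minimizing \(-\sum_{i=1}^n \log p_{\text{label}_i}\) and minimizing \(\ell(\boldsymbol{\theta})\) differ only by the constant factor \(1/n\).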

Softmax regression: getting the MNIST dataset

We use the MNIST handwritten-digit dataset. In this dataset, each image has size \(28 \times 28\), and there are 10 categories in total.

from mxnet.gluon import data as gdata

def transform(feature, label):
    return feature.astype('float32') / 255, label.astype('float32')

mnist_train = gdata.vision.MNIST(train=True, transform=transform)
mnist_test = gdata.vision.MNIST(train=False, transform=transform)

feature, label = mnist_train[0]
print('feature shape: ', feature.shape, 'label: ', label)

Reading data
batch_size = 256
train_iter = gdata.DataLoader(mnist_train, batch_size, shuffle=True)
test_iter = gdata.DataLoader(mnist_test, batch_size, shuffle=False)

Initialize model parameters

The length of the model's input vector is \(28 \times 28 = 784\): each element of the vector corresponds to one pixel of the image. Because the images fall into 10 categories, the output layer of the single-layer neural network has 10 outputs. As discussed in the previous section, the weight and bias parameters of Softmax regression are therefore matrices of shape \(784 \times 10\) and \(1 \times 10\) respectively.

from mxnet import nd

num_inputs = 784
num_outputs = 10

W = nd.random_normal(scale=0.01, shape=(num_inputs, num_outputs))
b = nd.zeros(num_outputs)
params = [W, b]
for param in params:
    param.attach_grad()

Defining the Softmax function
# Keep both the row and column dimensions in the result (keepdims=True)
def softmax(X):
    exp = X.exp()
    partition = exp.sum(axis=1, keepdims=True)
    return exp / partition  # broadcasting is applied here

Defining the Model

Each original picture is converted to a vector of length num_inputs through the reshape function.

def net(X):
    return softmax(nd.dot(X.reshape((-1, num_inputs)), W) + b)

Defining the loss function

Using the pick function, we can obtain the predicted probabilities for the labels of two samples.

y_hat = nd.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = nd.array([0, 2])
nd.pick(y_hat, y)
# output
# [ 0.1  0.5]
# <NDArray 2 @cpu(0)>

The cross-entropy loss function is translated into code:

def cross_entropy(y_hat, y):
    return -nd.pick(y_hat.log(), y)

Evaluating the model: computing classification accuracy

Classification accuracy is the ratio of the number of correct predictions to the total number of predictions.

def accuracy(y_hat, y):
    return (nd.argmax(y_hat, axis=1) == y).asnumpy().mean()

def evaluate_accuracy(data_iter, net):
    acc = 0
    for X, y in data_iter:
        acc += accuracy(net(X), y)
    return acc / len(data_iter)

Because we have randomly initialized the model net, the accuracy of this model should be close to 1/num_outputs = 0.1.

evaluate_accuracy(test_iter, net)
# output
# 0.0947265625

Training the model

When training the model, the number of epochs num_epochs and the learning rate lr are both tunable hyperparameters. Changing their values may yield a more accurate classification model.

from mxnet import autograd

def sgd(params, lr, batch_size):
    for param in params:
        param[:] = param - lr * param.grad / batch_size

num_epochs = 5
lr = 0.1
loss = cross_entropy

def train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, trainer=None):
    for epoch in range(1, num_epochs + 1):
        train_l_sum = 0
        train_acc_sum = 0
        for X, y in train_iter:
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            if trainer is None:
                sgd(params, lr, batch_size)
            else:
                trainer.step(batch_size)
            train_l_sum += nd.mean(l).asscalar()
            train_acc_sum += accuracy(y_hat, y)
        test_acc = evaluate_accuracy(test_iter, net)
        print("epoch %d, loss %.4f, train acc %.3f, test acc %.3f"
              % (epoch, train_l_sum / len(train_iter),
                 train_acc_sum / len(train_iter), test_acc))

train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
# output
# epoch 1, loss 0.7105, train acc 0.842, test acc 0.884
# epoch 2, loss 0.4296, train acc 0.887, test acc 0.899
# epoch 3, loss 0.3840, train acc 0.896, test acc 0.905
# epoch 4, loss 0.3607, train acc 0.901, test acc 0.909
# epoch 5, loss 0.3461, train acc 0.905, test acc 0.911

Prediction
data, label = mnist_test[0:9]
predicted_labels = net(data).argmax(axis=1)
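
To inspect the result, a minimal follow-up (reusing the data and label variables obtained above) can print the true and predicted labels side by side:

print('true labels:     ', label)
print('predicted labels:', predicted_labels)
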
Softmax regression using Gluon: softmax and cross-entropy loss function

Defining the softmax operation and the cross-entropy loss function separately may cause numerical instability. Gluon therefore provides a single function that combines the softmax operation and the cross-entropy loss calculation, giving better numerical stability.
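
A minimal sketch of that instability (the extreme logit value is chosen purely for illustration): computing softmax separately overflows exp() and yields nan, while the combined Gluon loss stays finite on the same input.

from mxnet import nd
from mxnet.gluon import loss as gloss

logits = nd.array([[1000.0, -1000.0, 0.0]])

# Separate softmax: exp(1000) overflows to inf, so the normalized result contains nan
naive = logits.exp() / logits.exp().sum(axis=1, keepdims=True)
print(naive)   # [[nan  0.  0.]]

# SoftmaxCrossEntropyLoss applies softmax and log together, so the loss stays finite
loss = gloss.SoftmaxCrossEntropyLoss()
print(loss(logits, nd.array([0])))   # a finite value (about 0 here)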

Program implementation
# -*- coding: utf-8 -*-
from mxnet import init

# Data source
import gb
batch_size = 256
train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)

# Define the network
from mxnet.gluon import nn
net = nn.Sequential()
net.add(nn.Flatten())
net.add(nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))

# Loss function
from mxnet.gluon import loss as gloss
loss = gloss.SoftmaxCrossEntropyLoss()

# Optimization algorithm
from mxnet.gluon import Trainer
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# Train the model
num_epochs = 5
gb.train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size, None,
             None, trainer)

Base function package gb.py
from mxnet.gluon import data as gdata
from mxnet import autograd
from mxnet import ndarray as nd

def transform(feature, label):
    return feature.astype('float32') / 255, label.astype('float32')

mnist_train = gdata.vision.MNIST(train=True, transform=transform)
mnist_test = gdata.vision.MNIST(train=False, transform=transform)

def load_data_fashion_mnist(batch_size):
    train_iter = gdata.DataLoader(mnist_train, batch_size, shuffle=True)
    test_iter = gdata.DataLoader(mnist_test, batch_size, shuffle=False)
    return train_iter, test_iter

def accuracy(y_hat, y):
    return (nd.argmax(y_hat, axis=1) == y).asnumpy().mean()

def evaluate_accuracy(data_iter, net):
    acc = 0
    for X, y in data_iter:
        acc += accuracy(net(X), y)
    return acc / len(data_iter)

def sgd(params, lr, batch_size):
    for param in params:
        param[:] = param - lr * param.grad / batch_size

def train_cpu(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, trainer=None):
    for epoch in range(1, num_epochs + 1):
        train_l_sum = 0
        train_acc_sum = 0
        for X, y in train_iter:
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            if trainer is None:
                sgd(params, lr, batch_size)
            else:
                trainer.step(batch_size)
            train_l_sum += nd.mean(l).asscalar()
            train_acc_sum += accuracy(y_hat, y)
        test_acc = evaluate_accuracy(test_iter, net)
        print("epoch %d, loss %.4f, train acc %.3f, test acc %.3f"
              % (epoch, train_l_sum / len(train_iter),
                 train_acc_sum / len(train_iter), test_acc))
