MXNet: Supervised learning


Linear regression

Given a set of data points X and the corresponding target values y, the goal of a linear model is to find a weight vector w and a bias b that describe a line which is as close as possible to each sample x[i] and y[i].

The mathematical formula is \(\hat{y} = Xw + b\).

The objective is to minimize the sum of squared errors \(\sum_{i=1}^{n} (\hat{y}_i - y_i)^2\).
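For reference, a standard derivation (not spelled out in the original) of the gradients that gradient descent will use later: with \(\hat{y}_i = x_i w + b\),

\[
\frac{\partial}{\partial w} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} 2 (\hat{y}_i - y_i)\, x_i^{\top},
\qquad
\frac{\partial}{\partial b} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} 2 (\hat{y}_i - y_i).
\]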

A neural network is a collection of nodes (neurons) and directed edges. We usually group nodes into layers; each layer takes its input from the outputs of the nodes in the layer below and passes its own output up to the layer above. To compute a node's value, we take a weighted sum of the outputs of the connected nodes (the weights are w), add the bias, and then apply an activation function. The activation function here is the identity, \(f(x) = x\).
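As a minimal illustration of this view (a hypothetical snippet, not part of the original code), a single layer with an identity activation computes exactly the weighted sum \(Xw + b\) described above:

from mxnet import ndarray as nd

def identity(x):
    # activation function f(x) = x
    return x

X_demo = nd.random_normal(shape=(4, 2))            # 4 samples with 2 features each
w_demo = nd.random_normal(shape=(2, 1))            # weights of the layer
b_demo = nd.zeros((1,))                            # bias of the layer
out = identity(nd.dot(X_demo, w_demo) + b_demo)    # weighted sum, then activation
print(out.shape)                                   # (4, 1): one output per sample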

Create the dataset: \(y = 2 \, x[0] - 3.4 \, x[1] + 4.2 + \text{noise}\)

# -*- coding: utf-8 -*-
from mxnet import ndarray as nd
from mxnet import autograd

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2

X = nd.random_normal(shape=(num_examples, num_inputs))
y = true_w[0] * X[:, 0] + true_w[1] * X[:, 1] + true_b
y += .01 * nd.random_normal(shape=y.shape)
print('dataset')

import matplotlib.pyplot as plt
plt.scatter(X[:, 1].asnumpy(), y.asnumpy())
plt.show()

When we start training the neural network, we need to read blocks of data repeatedly. Here we define a function that, each time it is called, returns a random sample of batch_size examples together with the corresponding labels.

import random

batch_size = 10

def data_iter():
    # generate a random ordering of the example indices
    idx = list(range(num_examples))
    random.shuffle(idx)
    for i in range(0, num_examples, batch_size):
        j = nd.array(idx[i:min(i + batch_size, num_examples)])
        yield nd.take(X, j), nd.take(y, j)

for data, label in data_iter():
    print(data, label)
    break  # just look at the first batch

We initialize the model parameters randomly. Afterwards we need to take derivatives with respect to these parameters to update their values and make the loss as small as possible, so we attach gradient buffers to them.

w = nd.random_normal(shape=(num_inputs, 1))
b = nd.zeros((1,))
params = [w, b]

for param in params:
    param.attach_grad()

Define the network

def net(X):
    return nd.dot(X, w) + b

Define the loss function

def square_loss(yhat, y):
    # reshape y to yhat's shape to avoid unintended automatic broadcasting
    return (yhat - y.reshape(yhat.shape)) ** 2

Define the optimization routine; we solve with stochastic gradient descent. At each step we move the model parameters a certain distance in the direction opposite to the gradient; that distance is controlled by the learning rate lr.

def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad

Now we can start training. Training usually means iterating over the data several times; here epochs denotes the total number of passes over the dataset. In each iteration we randomly read a fixed number of data points, compute the gradient, and update the model parameters.

epochs = 5
learning_rate = .001
niter = 0
moving_loss = 0
smoothing_constant = .01

# training
for e in range(epochs):
    total_loss = 0
    for data, label in data_iter():
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        SGD(params, learning_rate)
        total_loss += nd.sum(loss).asscalar()

        # track the moving average of the loss after each batch is read
        niter += 1
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

        if (niter + 1) % 100 == 0:
            print("Epoch %d, batch %d. Average loss: %f" % (e, niter, moving_loss))

print(params)

# output:
# [[ 1.99952257]
#  [-3.39969802]]
# <NDArray 2x1 @cpu(0)>, [ 4.19949913]
# <NDArray 1 @cpu(0)>
Linear regression using Gluon

Here we use the Gluon interface provided by MXNet to implement linear regression training more concisely.

First, generate the data set

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2

features = nd.random_normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random_normal(scale=0.01, shape=labels.shape)

Read the data with the data module provided by Gluon. In each iteration we randomly read a mini-batch of 10 samples.

from mxnet.gluon import data as gdata

batch_size = 10
dataset = gdata.ArrayDataset(features, labels)
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)

In the previous section we had to define the model parameters ourselves and use them step by step to describe how the model is computed. These steps become cumbersome as the model structure grows more complex. Gluon provides a large number of predefined layers, which lets us focus on which layers to use to build the model.
First, import the nn module. We define a model variable net, which is a Sequential instance. In Gluon, a Sequential instance can be seen as a container that chains layers in series: as we construct the model we add layers to the container in turn, and when input data is given, each layer in the container computes its output and passes it on as the input to the next layer.
The output layer of linear regression is also called a fully connected layer. In Gluon, a fully connected layer is a Dense instance; we set its number of outputs to 1.

from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(1))

It is worth mentioning that in Gluon we do not need to specify the input shape of each layer, such as the number of inputs for linear regression. Gluon automatically infers the number of inputs for each layer when the model first sees data, for example when net(X) is executed later.
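A small check of this deferred shape inference (a hypothetical snippet, not part of the original code; it uses a separate throw-away model so the net defined above is left untouched):

from mxnet import ndarray as nd
from mxnet.gluon import nn

demo_net = nn.Sequential()
demo_net.add(nn.Dense(1))
demo_net.initialize()                      # parameter shapes are not fixed yet (deferred initialization)
demo_net(nd.random_normal(shape=(3, 2)))   # first forward pass infers the input size
print(demo_net[0].weight.data().shape)     # expected: (1, 2), i.e. 1 output and 2 inputs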

Initialize the model parameters: import the init module from MXNet and pass init.Normal(sigma=0.01), which makes each weight element be randomly sampled at initialization from a normal distribution with mean 0 and standard deviation 0.01. The bias parameter is initialized to zero.

from mxnet import init

net.initialize(init.Normal(sigma=0.01))

Define the loss function by importing the loss module from Gluon.

from mxnet.gluon import loss as gloss

loss = gloss.L2Loss()

Define the optimization algorithm: after importing Gluon we can create a Trainer instance and pass the model parameters to it.

from mxnet.gluon import Trainer

trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})

To train the model, we iterate over the data and update the model parameters by calling the trainer's step function. Since the variable l is an NDArray with batch_size elements, executing l.backward() is equivalent to l.sum().backward(). Following the definition of mini-batch stochastic gradient descent, we pass batch_size to the step function so that the mini-batch stochastic gradient is the average of the per-sample gradients in the batch.

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        with autograd.record():
            l = loss(net(X), y)
        l.backward()
        trainer.step(batch_size)
    print("epoch %d, loss: %f"
          % (epoch, loss(net(features), labels).asnumpy().mean()))

dense = net[0]
print(true_w, dense.weight.data())
print(true_b, dense.bias.data())

You can retrieve the layer you need from net and access its weights and bias. The learned values are very close to the true parameters.
