MXNet: Supervised learning


Linear regression

Given a set of data points X and the corresponding target values y, the goal of a linear model is to find a weight vector w and a bias b that describe a line which is as close as possible to each sample x[i] and y[i].

The mathematical formula is \(\hat{y} = Xw + b\).

The objective is to minimize the sum of squared errors \(\sum_{i=1}^{n} (\hat{y}_i - y_i)^2\).
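For reference, a standard derivation (not spelled out in the original) of the gradients that gradient descent will use later: with \(\hat{y}_i = x_i w + b\),

\[
\frac{\partial}{\partial w} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} 2 (\hat{y}_i - y_i)\, x_i^{\top},
\qquad
\frac{\partial}{\partial b} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} 2 (\hat{y}_i - y_i).
\]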

A neural network is a collection of nodes (neurons) and directed edges. We usually group nodes into layers; each layer takes its input from the outputs of the nodes in the layer below and passes its own output up to the layer above. To compute a node's value, we take a weighted sum of the outputs of the connected nodes (the weights are w), add the bias, and then apply an activation function. The activation function here is the identity, \(f(x) = x\).
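As a minimal illustration of this view (a hypothetical snippet, not part of the original code), a single layer with an identity activation computes exactly the weighted sum \(Xw + b\) described above:

from mxnet import ndarray as nd

def identity(x):
    # activation function f(x) = x
    return x

X_demo = nd.random_normal(shape=(4, 2))            # 4 samples with 2 features each
w_demo = nd.random_normal(shape=(2, 1))            # weights of the layer
b_demo = nd.zeros((1,))                            # bias of the layer
out = identity(nd.dot(X_demo, w_demo) + b_demo)    # weighted sum, then activation
print(out.shape)                                   # (4, 1): one output per sample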

Create the dataset: \(y = 2 \, x[0] - 3.4 \, x[1] + 4.2 + \text{noise}\)

# -*- coding: utf-8 -*-
from mxnet import ndarray as nd
from mxnet import autograd

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2

X = nd.random_normal(shape=(num_examples, num_inputs))
y = true_w[0] * X[:, 0] + true_w[1] * X[:, 1] + true_b
y += .01 * nd.random_normal(shape=y.shape)
print('dataset')

import matplotlib.pyplot as plt
plt.scatter(X[:, 1].asnumpy(), y.asnumpy())
plt.show()

When we start training the neural network, we need to read blocks of data repeatedly. Here we define a function that, each time it is called, returns a random sample of batch_size examples together with the corresponding labels.

import random

batch_size = 10

def data_iter():
    # generate a random ordering of the example indices
    idx = list(range(num_examples))
    random.shuffle(idx)
    for i in range(0, num_examples, batch_size):
        j = nd.array(idx[i:min(i + batch_size, num_examples)])
        yield nd.take(X, j), nd.take(y, j)

for data, label in data_iter():
    print(data, label)
    break  # just look at the first batch

We initialize the model parameters randomly. Afterwards we need to take derivatives with respect to these parameters to update their values and make the loss as small as possible, so we attach gradient buffers to them.

w = nd.random_normal(shape=(num_inputs, 1))
b = nd.zeros((1,))
params = [w, b]

for param in params:
    param.attach_grad()

Define the network

def net(X):
    return nd.dot(X, w) + b

Define the loss function

def square_loss(yhat, y):
    # reshape y to yhat's shape to avoid unintended automatic broadcasting
    return (yhat - y.reshape(yhat.shape)) ** 2

Define the optimization routine; we solve with stochastic gradient descent. At each step we move the model parameters a certain distance in the direction opposite to the gradient; that distance is controlled by the learning rate lr.

def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad

Now we can start training. Training usually means iterating over the data several times; here epochs denotes the total number of passes over the dataset. In each iteration we randomly read a fixed number of data points, compute the gradient, and update the model parameters.

epochs = 5
learning_rate = .001
niter = 0
moving_loss = 0
smoothing_constant = .01

# training
for e in range(epochs):
    total_loss = 0
    for data, label in data_iter():
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        SGD(params, learning_rate)
        total_loss += nd.sum(loss).asscalar()

        # track the moving average of the loss after each batch is read
        niter += 1
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

        if (niter + 1) % 100 == 0:
            print("Epoch %d, batch %d. Average loss: %f" % (e, niter, moving_loss))

print(params)

# output:
# [[ 1.99952257]
#  [-3.39969802]]
# <NDArray 2x1 @cpu(0)>, [ 4.19949913]
# <NDArray 1 @cpu(0)>
Linear regression using Gluon

Here we use the Gluon interface provided by MXNet to implement linear regression training more concisely.

First, generate the data set

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2

features = nd.random_normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random_normal(scale=0.01, shape=labels.shape)

Read the data with the data module provided by Gluon. In each iteration we randomly read a mini-batch of 10 samples.

from mxnet.gluon import data as gdata

batch_size = 10
dataset = gdata.ArrayDataset(features, labels)
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)

In the previous section we had to define the model parameters ourselves and use them step by step to describe how the model is computed. These steps become cumbersome as the model structure grows more complex. Gluon provides a large number of predefined layers, which lets us focus on which layers to use to build the model.
First, import the nn module. We define a model variable net, which is a Sequential instance. In Gluon, a Sequential instance can be seen as a container that chains layers in series: as we construct the model we add layers to the container in turn, and when input data is given, each layer in the container computes its output and passes it on as the input to the next layer.
The output layer of linear regression is also called a fully connected layer. In Gluon, a fully connected layer is a Dense instance; we set its number of outputs to 1.

from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(1))

It is worth mentioning that in Gluon we do not need to specify the input shape of each layer, such as the number of inputs for linear regression. Gluon automatically infers the number of inputs for each layer when the model first sees data, for example when net(X) is executed later.
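A small check of this deferred shape inference (a hypothetical snippet, not part of the original code; it uses a separate throw-away model so the net defined above is left untouched):

from mxnet import ndarray as nd
from mxnet.gluon import nn

demo_net = nn.Sequential()
demo_net.add(nn.Dense(1))
demo_net.initialize()                      # parameter shapes are not fixed yet (deferred initialization)
demo_net(nd.random_normal(shape=(3, 2)))   # first forward pass infers the input size
print(demo_net[0].weight.data().shape)     # expected: (1, 2), i.e. 1 output and 2 inputs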

Initialize the model parameters: import the init module from MXNet and pass init.Normal(sigma=0.01), which makes each weight element be randomly sampled at initialization from a normal distribution with mean 0 and standard deviation 0.01. The bias parameter is initialized to zero.

from mxnet import init

net.initialize(init.Normal(sigma=0.01))

Define the loss function by importing the loss module from Gluon.

from mxnet.gluon import loss as gloss

loss = gloss.L2Loss()

Define the optimization algorithm: after importing Gluon we can create a Trainer instance and pass the model parameters to it.

from mxnet.gluon import Trainer

trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})

To train the model, we iterate over the data and update the model parameters by calling the trainer's step function. Since the variable l is an NDArray with batch_size elements, executing l.backward() is equivalent to l.sum().backward(). Following the definition of mini-batch stochastic gradient descent, we pass batch_size to the step function so that the mini-batch stochastic gradient is the average of the per-sample gradients in the batch.

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        with autograd.record():
            l = loss(net(X), y)
        l.backward()
        trainer.step(batch_size)
    print("epoch %d, loss: %f"
          % (epoch, loss(net(features), labels).asnumpy().mean()))

dense = net[0]
print(true_w, dense.weight.data())
print(true_b, dense.bias.data())

You can retrieve the layer you need from net and access its weights and bias. The learned values are very close to the true parameters.
