MXNet: Multi-GPU Parallel Programming
I. Overview of Ideas
Suppose a machine has k GPUs. Given the model to be trained, each GPU independently maintains a complete copy of the model parameters.
In any iteration of training, given a random minibatch, we split its samples into k shares and give one share to each GPU.
Each GPU then computes the gradient of the model parameters based on the data shard it was given and the parameter copy it maintains.
Next, the local gradients from the k GPUs are added together to obtain the gradient for the current minibatch.
Finally, each GPU uses this minibatch gradient to update the complete copy of the model parameters that it maintains.
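The reason the summed per-GPU gradients can stand in for the full minibatch gradient is that, for a loss written as a sum over examples, the gradient on the whole batch equals the sum of the gradients on its shards. The tiny CPU-only sketch below is not from the original text; it uses a toy squared-error loss and made-up data purely to check this numerically:

# Minimal numerical check that shard gradients add up to the full-batch gradient.
from mxnet import autograd, nd

w = nd.array([2.0, -3.4])           # toy parameter vector
X = nd.random.normal(shape=(4, 2))  # 4 fake examples
y = nd.random.normal(shape=(4,))    # 4 fake labels

def grad_of_loss(Xs, ys):
    # Gradient of a summed squared-error loss with respect to a fresh copy of w.
    wc = w.copy()
    wc.attach_grad()
    with autograd.record():
        l = ((nd.dot(Xs, wc) - ys) ** 2).sum()
    l.backward()
    return wc.grad

print(grad_of_loss(X, y))                                        # gradient on the full batch
print(grad_of_loss(X[:2], y[:2]) + grad_of_loss(X[2:], y[2:]))   # sum of the two shard gradients

Both prints show (up to floating-point error) the same vector. This is also why the SGD update at the end of this section divides by the full batch size: the summed gradients correspond to a loss summed over every example in the minibatch.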
II. Network and auxiliary functions
We use the LeNet model introduced in the "Convolutional Neural Networks from Scratch" section as the example model for this section:
import gluonbook as gb  # companion utilities from the book (load_data_fashion_mnist, evaluate_accuracy, sgd)
import mxnet as mx
from mxnet import autograd, nd
from mxnet.gluon import loss as gloss
from time import time

# Initialize the model parameters.
scale = 0.01
W1 = nd.random.normal(scale=scale, shape=(20, 1, 3, 3))
b1 = nd.zeros(shape=20)
W2 = nd.random.normal(scale=scale, shape=(50, 20, 5, 5))
b2 = nd.zeros(shape=50)
# The fully connected shapes below assume 28 x 28 Fashion-MNIST images,
# which flatten to 50 * 4 * 4 = 800 features after the two conv/pool stages.
W3 = nd.random.normal(scale=scale, shape=(800, 128))
b3 = nd.zeros(shape=128)
W4 = nd.random.normal(scale=scale, shape=(128, 10))
b4 = nd.zeros(shape=10)
params = [W1, b1, W2, b2, W3, b3, W4, b4]

# Define the model.
def lenet(X, params):
    h1_conv = nd.Convolution(data=X, weight=params[0], bias=params[1],
                             kernel=(3, 3), num_filter=20)
    h1_activation = nd.relu(h1_conv)
    h1 = nd.Pooling(data=h1_activation, pool_type='avg', kernel=(2, 2),
                    stride=(2, 2))
    h2_conv = nd.Convolution(data=h1, weight=params[2], bias=params[3],
                             kernel=(5, 5), num_filter=50)
    h2_activation = nd.relu(h2_conv)
    h2 = nd.Pooling(data=h2_activation, pool_type='avg', kernel=(2, 2),
                    stride=(2, 2))
    h2 = nd.flatten(h2)
    h3_linear = nd.dot(h2, params[4]) + params[5]
    h3 = nd.relu(h3_linear)
    y_hat = nd.dot(h3, params[6]) + params[7]
    return y_hat

# Cross-entropy loss function.
loss = gloss.SoftmaxCrossEntropyLoss()
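As a quick sanity check (not part of the original code), we can run the randomly initialized network on a dummy Fashion-MNIST-sized batch on the CPU and confirm that the output has one row per example and ten class scores per row:

# Illustrative sanity check: 4 fake 1 x 28 x 28 images should yield 4 rows of 10 class scores.
X = nd.random.uniform(shape=(4, 1, 28, 28))
print(lenet(X, params).shape)  # Expected: (4, 10)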
Copying the parameter list to a specified device
The following function copies the model parameters [parameter one, parameter two, ...] to a specific GPU and attaches gradient buffers to the copies:
def get_params(params, ctx):
    # Copy each parameter to the target device and allocate gradient memory there.
    new_params = [p.copyto(ctx) for p in params]
    for p in new_params:
        p.attach_grad()
    return new_params
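For instance, assuming the machine has at least one GPU, copying params to gpu(0) could look like this (an illustrative snippet, not from the original text); the copies come back with gradient buffers attached:

# Illustrative usage; assumes at least one GPU is available.
new_params = get_params(params, mx.gpu(0))
print('b1 weight:', new_params[1])
print('b1 grad:', new_params[1].grad)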
Synchronizing the same parameter across devices
The following function sums a given parameter's data across all GPUs and then broadcasts the result back to every GPU:
def allreduce(data):
    # The input is a list holding copies of the same parameter on different devices.
    for i in range(1, len(data)):
        data[0][:] += data[i].copyto(data[0].context)  # Copy element i to device 0 and accumulate it.
    for i in range(1, len(data)):
        data[0].copyto(data[i])  # Overwrite element i with the accumulated result from device 0.
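A small way to check the behavior (illustrative, and it assumes two GPUs) is to place different values on each device, run allreduce, and confirm that both copies end up holding the sum:

# Illustrative test; assumes two GPUs. data[0] starts as [[1, 1]] and data[1] as [[2, 2]].
data = [nd.ones((1, 2), ctx=mx.gpu(i)) * (i + 1) for i in range(2)]
print('before allreduce:', data)
allreduce(data)
print('after allreduce:', data)  # Both entries now hold [[3, 3]] on their own device.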
Splitting data across devices
Given a batch of data samples, the following function divides them evenly and copies each part onto a GPU:
def split_and_load(data, ctx):
    # Split the batch into len(ctx) equal shards and place one shard on each device.
    n, k = data.shape[0], len(ctx)
    m = n // k
    assert m * k == n, '# examples is not divided by # devices.'
    return [data[i * m: (i + 1) * m].as_in_context(ctx[i]) for i in range(k)]
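For example (illustrative, and again assuming two GPUs), six rows of data split into two shards of three rows each:

# Illustrative usage; assumes two GPUs.
batch = nd.arange(24).reshape((6, 4))
ctx = [mx.gpu(0), mx.gpu(1)]
print('input:', batch)
print('load into:', ctx)
print('output:', split_and_load(batch, ctx))  # Two (3, 4) shards, one per GPU.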
III. The training process
The train function first copies the full model parameters onto each GPU, then performs multi-GPU training on a single minibatch at every iteration:
def train(num_gpus, batch_size, lr):
    train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
    ctx = [mx.gpu(i) for i in range(num_gpus)]  # List of devices to use.
    print('running on:', ctx)
    # Copy the model parameters to each of the num_gpus GPUs.
    gpu_params = [get_params(params, c) for c in ctx]  # Each element holds the parameters on one device.
    for epoch in range(1, 6):
        start = time()
        for X, y in train_iter:
            # Multi-GPU training on a single minibatch.
            train_batch(X, y, gpu_params, ctx, lr)
        nd.waitall()
        print('epoch %d, time: %.1f sec' % (epoch, time() - start))
        # Validate the model on gpu(0).
        net = lambda x: lenet(x, gpu_params[0])
        test_acc = gb.evaluate_accuracy(test_iter, net, ctx[0])
        print('validation accuracy: %.4f' % test_acc)
The train_batch function called above implements multi-GPU training on a single minibatch:
def train_batch(X, y, gpu_params, ctx, lr):
    # Divide the minibatch of samples and copy the shards onto each GPU.
    gpu_Xs = split_and_load(X, ctx)
    gpu_ys = split_and_load(y, ctx)
    # Compute the loss on each GPU.
    with autograd.record():
        ls = [loss(lenet(gpu_X, gpu_W), gpu_y)  # One loss per device.
              for gpu_X, gpu_y, gpu_W in zip(gpu_Xs, gpu_ys, gpu_params)]
    # Back-propagate on each GPU.
    for l in ls:
        l.backward()
    # Sum the gradients across GPUs, then broadcast the sum to all GPUs.
    for i in range(len(gpu_params[0])):  # gpu_params[0]: all parameters on device 0.
        allreduce([gpu_params[c][i].grad for c in range(len(ctx))])
    # Each GPU updates the full copy of the model parameters it maintains.
    for param in gpu_params:
        gb.sgd(param, lr, X.shape[0])
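With train_batch in place, the whole pipeline can be launched, for example on two GPUs (the hyperparameter values below are illustrative; adjust them to your hardware):

# Example invocation; assumes two GPUs are available.
train(num_gpus=2, batch_size=256, lr=0.2)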
"MXNet"--multi-GPU parallel programming