"MXNet"--multi-GPU parallel programming


I. Overview of Ideas

Suppose a machine has k GPUs. Given the model to be trained, each GPU independently maintains a complete set of the model parameters.

In each training iteration, a random mini-batch of training data is split into k shares, and each GPU is given one share.

Each GPU then computes the gradient of the model parameters based on the mini-batch share it was given and the model parameters it maintains.

Next, the gradients computed on the k GPUs are added together to obtain the gradient of the current mini-batch.

Finally, each GPU uses this mini-batch gradient to update the complete set of model parameters that it maintains.
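The key point is that adding up the per-GPU gradients recovers the gradient of the full mini-batch, because the loss is a sum over examples. The sketch below (not part of the original text; the toy parameter w, batch X, and loss are illustrative) checks this on the CPU: the gradients computed on two halves of a batch sum to the gradient computed on the whole batch.

from mxnet import autograd, nd

# Illustrative toy setup: a 2-dimensional parameter and a batch of 4 examples.
w = nd.array([1.0, -2.0])
w.attach_grad()
X = nd.random.normal(shape=(4, 2))

def toy_loss(X, w):
    # Any loss that is a sum over examples works here.
    return (nd.dot(X, w) ** 2).sum()

# Gradient on the full mini-batch.
with autograd.record():
    l = toy_loss(X, w)
l.backward()
full_grad = w.grad.copy()

# Gradients on the two halves, then summed -- mimics allreduce across 2 GPUs.
shard_grads = []
for shard in (X[:2], X[2:]):
    with autograd.record():
        l = toy_loss(shard, w)
    l.backward()
    shard_grads.append(w.grad.copy())

print(full_grad)                        # gradient of the whole batch
print(shard_grads[0] + shard_grads[1])  # sum of the half-batch gradients: the same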

II. Network and Auxiliary Functions

Use the "convolutional neural network-zero-based" Lenet as an example of this section:

import mxnet as mx
from mxnet import autograd, nd
from mxnet.gluon import loss as gloss
from time import time
import gluonbook as gb

# Initialize the model parameters.
scale = 0.01
W1 = nd.random.normal(scale=scale, shape=(20, 1, 3, 3))
b1 = nd.zeros(shape=20)
W2 = nd.random.normal(scale=scale, shape=(50, 20, 5, 5))
b2 = nd.zeros(shape=50)
W3 = nd.random.normal(scale=scale, shape=(800, 128))
b3 = nd.zeros(shape=128)
W4 = nd.random.normal(scale=scale, shape=(128, 10))
b4 = nd.zeros(shape=10)
params = [W1, b1, W2, b2, W3, b3, W4, b4]

# Define the model.
def lenet(X, params):
    h1_conv = nd.Convolution(data=X, weight=params[0], bias=params[1],
                             kernel=(3, 3), num_filter=20)
    h1_activation = nd.relu(h1_conv)
    h1 = nd.Pooling(data=h1_activation, pool_type='avg',
                    kernel=(2, 2), stride=(2, 2))
    h2_conv = nd.Convolution(data=h1, weight=params[2], bias=params[3],
                             kernel=(5, 5), num_filter=50)
    h2_activation = nd.relu(h2_conv)
    h2 = nd.Pooling(data=h2_activation, pool_type='avg',
                    kernel=(2, 2), stride=(2, 2))
    h2 = nd.flatten(h2)
    h3_linear = nd.dot(h2, params[4]) + params[5]
    h3 = nd.relu(h3_linear)
    y_hat = nd.dot(h3, params[6]) + params[7]
    return y_hat

# Cross-entropy loss function.
loss = gloss.SoftmaxCrossEntropyLoss()
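As a quick sanity check (not in the original text), the sketch below runs one forward pass of lenet on a dummy batch shaped like Fashion-MNIST images; the batch size of 4 is arbitrary. The parameters live on the CPU at this point, so no GPU is needed yet.

# Dummy batch of 4 single-channel 28x28 images.
X = nd.random.uniform(shape=(4, 1, 28, 28))
print(lenet(X, params).shape)  # expected: (4, 10), one score per class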
Copying the parameter list to a specified device

The following function copies the model parameters [param 1, param 2, ...] to a specific GPU and attaches gradient storage to them:

def get_params(params, ctx):
    new_params = [p.copyto(ctx) for p in params]  # copy each parameter to ctx
    for p in new_params:
        p.attach_grad()  # allocate gradient storage on that device
    return new_params
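For example, copying the parameters to gpu(0) and inspecting one of them could look like the sketch below (it assumes at least one GPU is available):

# Assumes at least one GPU is available.
new_params = get_params(params, mx.gpu(0))
print('b1 weight:', new_params[1])
print('b1 grad:', new_params[1].grad)  # all zeros until a backward pass runs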
Synchronizing the same parameter across devices

The following function sums the data of the same parameter across all GPUs and then broadcasts the result back to every GPU:

def allreduce(data):
    # data is a list holding copies of the same parameter on different devices.
    for i in range(1, len(data)):
        # Copy data[i] to device 0 and accumulate it into data[0].
        data[0][:] += data[i].copyto(data[0].context)
    for i in range(1, len(data)):
        # Overwrite data[i] with the accumulated result from data[0].
        data[0].copyto(data[i])
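A small test of allreduce, assuming two GPUs are available (the initial values are arbitrary): before the call the two copies differ, afterwards both hold the element-wise sum.

# Device i starts with an array filled with i + 1.
data = [nd.ones((1, 2), ctx=mx.gpu(i)) * (i + 1) for i in range(2)]
print('before allreduce:', data)
allreduce(data)
print('after allreduce:', data)  # both copies now equal [[3. 3.]]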
Partitioning data across devices

Given a batch of data samples, the following function divides them evenly and copies each share onto a GPU:

def split_and_load(data, ctx):
    n, k = data.shape[0], len(ctx)
    m = n // k  # number of examples per device
    assert m * k == n, '# examples is not divided by # devices.'
    return [data[i * m: (i + 1) * m].as_in_context(ctx[i]) for i in range(k)]
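For instance, splitting a toy batch of 6 examples across two GPUs could look like this (a sketch assuming two GPUs; the array contents are arbitrary):

# A toy batch of 6 examples with 4 features each.
batch = nd.arange(24).reshape((6, 4))
ctx = [mx.gpu(0), mx.gpu(1)]
splitted = split_and_load(batch, ctx)
print('input:', batch)
print('load into', ctx)
print('output:', splitted)  # two (3, 4) arrays, one per GPU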
III. The Training Process

The train function copies the full model parameters onto multiple GPUs and performs multi-GPU training on a single mini-batch in each iteration:

def train(num_gpus, batch_size, lr):
    train_iter, test_iter = gb.load_data_fashion_mnist(batch_size)
    ctx = [mx.gpu(i) for i in range(num_gpus)]  # list of device contexts
    print('running on:', ctx)
    # Copy the model parameters to each of the num_gpus GPUs.
    gpu_params = [get_params(params, c) for c in ctx]  # one parameter list per device
    for epoch in range(1, 6):
        start = time()
        for X, y in train_iter:
            # Multi-GPU training on a single mini-batch.
            train_batch(X, y, gpu_params, ctx, lr)
        nd.waitall()
        print('epoch %d, time: %.1f sec' % (epoch, time() - start))
        # Validate the model on gpu(0).
        net = lambda x: lenet(x, gpu_params[0])
        test_acc = gb.evaluate_accuracy(test_iter, net, ctx[0])
        print('validation accuracy: %.4f' % test_acc)

The train_batch function implements multi-GPU training on a single mini-batch:

def train_batch(X, y, gpu_params, ctx, lr):
    # Split the mini-batch of data samples and copy the shares onto each GPU.
    gpu_Xs = split_and_load(X, ctx)
    gpu_ys = split_and_load(y, ctx)
    # Compute the loss on each GPU.
    with autograd.record():
        ls = [loss(lenet(gpu_X, gpu_W), gpu_y)  # one loss per device
              for gpu_X, gpu_y, gpu_W in zip(gpu_Xs, gpu_ys, gpu_params)]
    # Backpropagate on each GPU.
    for l in ls:
        l.backward()
    # Sum the gradients across GPUs and broadcast the result to all of them.
    for i in range(len(gpu_params[0])):  # gpu_params[0]: all parameters on device 0
        allreduce([gpu_params[c][i].grad for c in range(len(ctx))])
    # Each GPU updates the full set of model parameters it maintains.
    for param in gpu_params:
        gb.sgd(param, lr, X.shape[0])  # scale by the full mini-batch size
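Note that allreduce sums the gradients rather than averaging them; gb.sgd then divides by the full mini-batch size X.shape[0], so the update matches what single-GPU training on the whole mini-batch would produce. A possible invocation is sketched below; the batch size and learning rate are illustrative values, not ones given in the original text.

# Illustrative hyperparameters; set num_gpus to the number of GPUs at hand.
train(num_gpus=2, batch_size=256, lr=0.3)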

"MXNet"--multi-GPU parallel programming

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.