"Reprint" Distributed deep learning on MPP and Hadoop


December 17, 2014 | FEATURES | by Regunathan Radhakrishnan

Joint work performed by Regunathan Radhakrishnan, Gautam Muralidhar, Ailey Crow, and Sarah Aerni of Pivotal's Data Science Labs.

Deep learning greatly improves upon the manual design of features, allows companies to get more insights from data, and shortens the time to explore, understand, and operationalize analytical results. The approach has recently become popular, both in academia and industry, as a machine-learning framework for learning structure (commonly referred to as features) from unlabeled data, as well as for feature generation for a supervised learning task (with labeled data). Researchers in computer vision and natural language processing (NLP) have shown that deep-learning-generated features provide state-of-the-art performance compared to engineered features (those manually designed). In this blog article, we show how deep learning can be implemented on a distributed computing platform such as Pivotal Greenplum Database (GPDB) and Hadoop. In the following sections, we'll briefly introduce the building block of deep learning, the auto-encoder, and then describe the details of the implementation itself.

Deep Learning Examples and Extending the Reach of Machine Learning

Applications of deep learning include classification of images into different types where the total number of classes is not known. For example, using a large volume of YouTube videos, researchers were able to automatically identify various types of content in videos, which might be useful in automatically curating and recommending new content to users. A second example is the automated generation of features from gene expression data to detect or classify cancer types. This publication explains how a deep-learning-based classifier outperforms the state of the art on several image classification tasks such as handwritten digit recognition, traffic sign detection, and more.

The complexity of designing features, particularly in the former case of identifying the space of possible classes, is daunting. The use of deep learning can increase the reach of machine learning by removing the reliance on, and limitations of, human-generated features. Since deep learning is computationally intensive, it lends itself naturally to a distributed framework on large-scale computing platforms such as Hadoop and massively parallel processing (MPP) databases to cycle through the desired large datasets.

1. Auto-encoder: The Building Block of Deep Learning

An auto-encoder is a neural network with one hidden layer that learns an identity function under sparsity and regularization constraints. In other words, the auto-encoder attempts to reconstruct the input data by projecting it onto a lower-dimensional subspace defined by the hidden nodes. Hence, the hidden layer is forced to learn structure from the input training examples so that it can reconstruct the input at the output. For instance, consider the auto-encoder shown in Figure 1 below for input image patches, which learns a hidden layer y1 to produce the output x'. The input layer x is a set of intensity values from image patches. The hidden layer nodes project the high-dimensional input layer into a set of low-dimensional activation values of the hidden nodes. The activation values of the hidden nodes y1 are combined to create the output layer x', which is an approximation to the input pixels. The hidden layer in this case learns structure from pixels in the form of edges in various orientations. The hidden layer typically has fewer nodes than the input layer, and hence the hidden nodes are forced to compress the information in the input layer in such a way that the output layer can still be reconstructed. Since most local image patches tend to be smooth, the only structure that the hidden layer needs to learn is the set of edges in different orientations that is common among the images. A minimal code sketch of this forward pass is shown after Figure 1 below.

Figure 1: Auto-encoder to learn structure from pixels
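
To make the forward pass concrete, here is a minimal R sketch of a single auto-encoder applied to a batch of 8x8 image patches. The dimensions, the random initialization, and the variable names (W1, b1, W2, b2, a1, x_hat) are illustrative assumptions, not the article's actual implementation.

    sigmoid <- function(z) 1 / (1 + exp(-z))

    n_input  <- 64   # 8x8 pixel patch, flattened
    n_hidden <- 25   # fewer hidden nodes than inputs forces compression

    # Randomly initialized parameters (placeholder values; they are learned in practice)
    W1 <- matrix(rnorm(n_hidden * n_input, sd = 0.01), n_hidden, n_input)
    b1 <- rep(0, n_hidden)
    W2 <- matrix(rnorm(n_input * n_hidden, sd = 0.01), n_input, n_hidden)
    b2 <- rep(0, n_input)

    # x holds one image patch per column (10 random patches for illustration)
    x <- matrix(runif(n_input * 10), n_input, 10)

    a1    <- sigmoid(W1 %*% x + b1)   # activation values of the hidden nodes (y1)
    x_hat <- sigmoid(W2 %*% a1 + b2)  # reconstruction x' of the input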

Auto-encoders can be stacked one on top of the other to learn higher-order structures that encode different relationships between the structural elements from the previous layer. For example, we can learn another auto-encoder whose input is y1 and whose hidden layer is y2. The hidden layer y2 now learns relationships between edges to form shapes, just as the first layer learned relationships between pixels in local regions. We can derive higher-order features by building on the hidden layer of the previous auto-encoder, as shown in Figure 2 below.

Figure 2: Auto-encoders for computer vision

Stacking the hidden layers y1, y2, y3 yields a deep learning framework based on auto-encoders, commonly referred to as a stacked auto-encoder. It can create features at the level of object attributes, starting from information in pixels, in a completely unsupervised manner (without any labels on the input image examples). Figure 3 below shows the final stacked auto-encoder.

Figure 3: Deep learning framework using stacked auto-encoders

2. Learning an Auto-encoder

In order to learn an auto-encoder from a set of N unlabeled training examples, we need to find the set of parameters P = (W1, b1, W2, b2) such that the reconstruction error Σ(x - x')² is minimized, subject to regularization and sparsity constraints on the parameters. Figure 4 below shows an example of an auto-encoder with 3 input variables and 2 hidden nodes; a rough sketch of the full cost computation follows the figure.

Figure 4: Parameters learned in an auto-encoder (W1, b1, W2, b2)
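
Before turning to the learning procedure, here is a rough R sketch of the full cost, building on the forward-pass sketch above. It combines the reconstruction, regularization, and sparsity terms that are detailed in Section 3.1; the hyperparameter names and default values (lambda, beta, rho) are assumptions for illustration.

    # Cost of the auto-encoder on a batch x (one example per column), assuming the
    # sigmoid and parameter shapes from the earlier sketch. lambda, beta, and rho are
    # hypothetical hyperparameters (weight decay, sparsity weight, target activation).
    autoencoder_cost <- function(x, W1, b1, W2, b2,
                                 lambda = 1e-4, beta = 3, rho = 0.01) {
      a1    <- sigmoid(W1 %*% x + b1)
      x_hat <- sigmoid(W2 %*% a1 + b2)

      reconstruction <- sum((x - x_hat)^2)                 # reconstruction error term
      regularization <- lambda * (sum(W1^2) + sum(W2^2))   # regularization term
      rho_j          <- rowMeans(a1)                       # average activation per hidden node
      sparsity       <- beta * sum(rho * log(rho / rho_j) +
                                   (1 - rho) * log((1 - rho) / (1 - rho_j)))  # sparsity term

      reconstruction + regularization + sparsity
    }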

The parameters that minimize this cost function can be learned using a gradient descent procedure, as suggested in the Unsupervised Feature Learning and Deep Learning tutorial. The high-level steps during learning are the following:

    • Step 1: Initialize the parameters P randomly.
    • Step 2: Compute the cost function and the gradient of the cost function with the current set of parameters P.
    • Step 3: Apply the gradient descent rule to update P.
    • Repeat steps 2 and 3 until the cost function converges (a minimal sketch of this loop follows the list).
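
A minimal sketch of that loop in R might look as follows. Here, cost_and_gradient is a hypothetical helper standing in for the forward and backward propagation described next, and the learning rate and stopping tolerance are illustrative values.

    # Sketch of the learning loop (steps 1-3). cost_and_gradient is a hypothetical
    # helper that runs forward and backward propagation and returns
    # list(cost = ..., grad = ...), with grad laid out like the flattened parameters.
    theta     <- c(W1, b1, W2, b2)   # step 1: start from randomly initialized parameters
    alpha     <- 0.1                 # learning rate (illustrative value)
    prev_cost <- Inf

    for (iter in 1:1000) {
      cg    <- cost_and_gradient(theta, x)          # step 2: cost and gradient at current theta
      theta <- theta - alpha * cg$grad              # step 3: gradient descent update
      if (abs(prev_cost - cg$cost) < 1e-6) break    # stop when the cost has converged
      prev_cost <- cg$cost
    }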

The computation of the gradient and the cost function is based on the popular neural network techniques of forward and backward propagation. For a large dataset of training examples, this process is computationally intensive, and a distributed platform and framework speeds the process well beyond what is possible with traditional systems.

3. Distributed Learning of the Auto-encoder on Pivotal GPDB

In this next section, we show how to distribute the learning problem on Pivotal Greenplum Database (GPDB) and Pivotal HD by explaining how the cost and gradient computations are distributed.

3.1 Distributed Computation of the Cost Function

In order to understand how the cost function computation can be distributed, let us consider the computational tasks involved. For each training example x, perform forward propagation as shown by the equations below:

a1 = sigmoid(W1 x + b1)

x' = sigmoid(W2 a1 + b2)

Here, the first equation computes the activations (a1) of the hidden nodes for the input example x, while the second equation computes the output response x' of the output layer. Both of these steps can be performed in parallel in all the segments of GPDB on the corresponding data that resides in those segments. After computing a1 and x', the cost function can be computed as a sum of the following terms:

    1. Σ(x - x')² - the reconstruction error term
    2. Σ||W1||² + Σ||W2||² - the regularization term
    3. Σ [ρ log(ρ/ρ_j) + (1 - ρ) log((1 - ρ)/(1 - ρ_j))] - the sparsity term, which is a function of the average activation value of each hidden node (ρ_j) over all the examples

All of this can be accomplished in GPDB through a PL/R function that is called on the data residing in each of the segments. Then, the final cost function value for all the data can be computed as the aggregated sum of the individual cost function values from each of the segments. Figure 5 below illustrates this procedure in GPDB, and a rough code sketch of the per-segment computation follows the figure. In each of the N segments in GPDB, a PL/R function computes the forward propagation steps on the data stored in the corresponding segment to obtain a1 and x'. Then, the three terms outlined above are computed to obtain "cost_i", the cost function calculated on the data stored in segment i. Finally, the cost values from all segments are aggregated to obtain "cost_all".

Figure 5: Distributed computation of the cost function in GPDB while learning the auto-encoder
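
The sketch below shows, in plain R, the kind of per-segment computation such a PL/R function might perform. The function name, the partial-sum layout, and the commented driver that aggregates across segments are assumptions for illustration, not the actual Pivotal implementation.

    # Per-segment piece of the cost computation, written as plain R. In GPDB this body
    # would sit inside a PL/R function invoked on the image patches stored in each
    # segment; the names below are illustrative.
    segment_cost_parts <- function(x_seg, W1, b1, W2, b2) {
      a1    <- sigmoid(W1 %*% x_seg + b1)   # forward propagation on the local data
      x_hat <- sigmoid(W2 %*% a1 + b2)
      # Return partial sums that can simply be added across segments: the local
      # squared reconstruction error, the per-node activation totals (needed to form
      # the global average activation rho_j), and the local example count.
      list(sq_error = sum((x_seg - x_hat)^2),
           act_sum  = rowSums(a1),
           n_local  = ncol(x_seg))
    }

    # Hypothetical driver combining the N per-segment results (the database performs
    # this aggregation in practice):
    #   parts    <- lapply(segments, segment_cost_parts, W1, b1, W2, b2)
    #   sq_error <- sum(sapply(parts, function(p) p$sq_error))
    #   rho_j    <- Reduce(`+`, lapply(parts, function(p) p$act_sum)) /
    #               sum(sapply(parts, function(p) p$n_local))
    #   cost_all <- sq_error + regularization_term + sparsity_term(rho_j)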

3.2 Distributed Computation of the Gradient Function

For the gradient computation, in addition to computing the activations (a1) and the output response (x') as shown in the previous section, we need to perform backward propagation. This step propagates the error through the auto-encoder, as shown by the equations below:

delta(3) = -(x - x') * sigmoid_derivative(W2 a1 + b2)

delta(2) = (W2ᵀ delta(3)) * sigmoid_derivative(W1 x + b1)

where * denotes element-wise multiplication and W2ᵀ is the transpose of W2.

Finally, the gradient value is computed as a function of the activation values and the delta values. Similar to the cost function computation, the computation of the deltas and the gradient values can be distributed using a PL/R function in GPDB, so each segment computes the gradient only for the data that resides on that segment. Then, we aggregate the gradient values from all the segments to perform one step of the gradient descent algorithm.
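
The following R sketch shows a hypothetical per-segment gradient computation along these lines; the sparsity term's contribution to delta(2) is omitted for brevity, and the function and variable names are assumptions.

    sigmoid_derivative <- function(z) { s <- sigmoid(z); s * (1 - s) }

    # Per-segment gradient computation (again, the kind of body a PL/R function
    # might contain). The sparsity term's contribution to delta2 is omitted.
    segment_gradient <- function(x_seg, W1, b1, W2, b2) {
      z1 <- W1 %*% x_seg + b1;  a1    <- sigmoid(z1)   # forward propagation
      z2 <- W2 %*% a1    + b2;  x_hat <- sigmoid(z2)

      delta3 <- -(x_seg - x_hat) * sigmoid_derivative(z2)    # error at the output layer
      delta2 <- (t(W2) %*% delta3) * sigmoid_derivative(z1)  # error propagated to the hidden layer

      # Partial gradients for the data on this segment; summing these across all
      # segments gives the full gradient used for one gradient descent step.
      list(grad_W2 = delta3 %*% t(a1),    grad_b2 = rowSums(delta3),
           grad_W1 = delta2 %*% t(x_seg), grad_b1 = rowSums(delta2))
    }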

4. Learned Hidden Nodes from Natural Image Patches

We implemented the distributed deep learning algorithm in GPDB on the same natural image dataset referenced here and obtained the following hidden layer. As illustrated in Figure 6, the first-level hidden layer uncovers edges and ridges at different orientations from the raw input pixel data.

Figure 6: Deep learning features (hidden nodes from the first auto-encoder) learned from natural image patches (8x8)

5. Distributed Learning of the Auto-encoder on Pivotal Hadoop and HAWQ

Where Pivotal really provides an advantage is in the seamless reuse of the GPDB deep learning implementation on Pivotal HD and on HAWQ, Pivotal's SQL-on-Hadoop solution. In Hadoop, the per-segment gradient and cost function computations, which are implemented as PL/R functions, can easily be implemented as mapper functions. A reducer function can then simply aggregate the values from all the mappers. In HAWQ, the GPDB PL/R functions are deployed as is, and the algorithm can be run entirely in-database within HAWQ. For running deep learning on Pivotal Hadoop or HAWQ, the image patches need to be stored in HDFS.
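
For illustration, a Hadoop Streaming mapper for the per-segment cost could look roughly like the R script below. The params.rds file, its layout, and the output key are assumptions; a trivial reducer would simply sum the emitted values.

    #!/usr/bin/env Rscript
    # Hypothetical Hadoop Streaming mapper in R: reads one comma-separated 8x8 patch
    # per line from stdin, computes the local reconstruction error with the current
    # parameters, and emits a single partial sum under the key "cost".
    sigmoid <- function(z) 1 / (1 + exp(-z))
    p   <- readRDS("params.rds")        # W1, b1, W2, b2 shipped with the job (assumed)
    con <- file("stdin", open = "r")
    partial <- 0
    while (length(line <- readLines(con, n = 1)) > 0) {
      x       <- as.numeric(strsplit(line, ",")[[1]])
      a1      <- sigmoid(p$W1 %*% x + p$b1)    # forward propagation, as in Section 3.1
      x_hat   <- sigmoid(p$W2 %*% a1 + p$b2)
      partial <- partial + sum((x_hat - x)^2)
    }
    cat("cost\t", partial, "\n", sep = "")     # the reducer sums all values for "cost"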

Conclusions

Deep learning, which is a framework for learning structure from unlabeled data, can be implemented to run on distributed computing platforms such as Hadoop, GPDB, and HAWQ. We showed that the computation of the gradient descent steps can be distributed across multiple compute nodes using PL/R in HAWQ and GPDB, and using MapReduce in Hadoop. Also, the implementation in R allowed seamless code reuse across multiple platforms: PL/R in GPDB and HAWQ, and MapReduce using R streaming on Hadoop. The ease of using this framework on these platforms ensures that we can learn features from large collections of unlabeled data and learn about a domain in an unsupervised fashion. The described implementation provides an important toolkit for data scientists: deep learning functionality for large volumes of data on Hadoop or GPDB/HAWQ. We note that the volume of disk I/O for the gradient descent iterations can become a bottleneck during the learning of the auto-encoder, both in Hadoop and in GPDB/HAWQ. Nevertheless, it is useful to have deep learning functionality in the toolkit of data scientists on these platforms as well.

In a future blog post, our colleague Victor Fang will describe how this limitation for particularly large datasets can be addressed by implementing deep learning on Spark, which is rapidly gaining popularity as a distributed, in-memory computing framework.

"Reprint" Distributed deep learning on MPP and Hadoop
