ParallelX: Running Hadoop Jobs on the GPU

Source: Internet
Author: User

The MapReduce paradigm does not always perform well on large-scale, compute-intensive algorithms. To address this bottleneck, a small start-up team has built a product called ParallelX, which aims to speed up Hadoop jobs significantly by harnessing the computing power of GPUs.

Tony Diepenbrock, a co-founder of ParallelX, describes it as a "GPU compiler that translates code written by users into OpenCL and runs on Amazon's AWS GPU Cloud." The final product is a service similar to Amazon's Elastic MapReduce, except that it takes advantage of EC2 GPU instance types.

Amazon is certainly not the only cloud provider that offers GPU servers; others, such as IBM/SoftLayer and Nimbix, also offer servers with NVIDIA GPUs. When asked whether ParallelX would support cloud providers other than Amazon, Tony replied: "Not yet, but we will have an SDK for customers using in-house Hadoop clusters. Most GPU cloud providers offer GPUs in HPC clouds, but we want to be able to use GPUs in cloud services at a relatively low price. After all, that's what Hadoop was designed for: cheap commodity hardware."

Before looking more closely at what the ParallelX compiler can do, it helps to know that different types of GPU support different parallel computing platforms, such as CUDA or OpenCL. Tony describes how ParallelX works: "The compiler converts JVM bytecode into OpenCL 1.2 code, which the OpenCL compiler can then compile into shader assembly to run on the GPU. Some FPGA hardware can now run OpenCL code as well, but support for general parallel hardware may have to wait until some point in the future." Although ParallelX does not support reflection or native calls in Java source code, its goal remains that developers should only have to make the necessary adjustments to their MapReduce job code, and the fewer the better.
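To make the restriction on reflection and native calls concrete, here is a minimal sketch of map and reduce logic written in that constrained style: primitives and arrays only, with each element processed independently. It is plain Java and uses no ParallelX or Hadoop API (the source does not show one); the class and method names are hypothetical, and whether ParallelX would accept exactly this code is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain Java, no ParallelX or Hadoop API.
// The logic uses primitives and arrays, with no reflection and no native
// (JNI) calls, which is the style the restriction above implies.
class GpuFriendlyMapSketch {

    // "Map" step: each input value is transformed independently of the others.
    static long[] mapSquares(int[] input) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (long) input[i] * input[i];
        }
        return out;
    }

    // "Reduce" step: combine the mapped values into a single sum.
    static long reduceSum(long[] mapped) {
        long sum = 0L;
        for (long v : mapped) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] mapped = mapSquares(new int[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(mapped) + " -> " + reduceSum(mapped));
    }
}
```

Because no iteration of the map loop reads another iteration's result, each element could in principle be handled by a separate GPU thread.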

When the ParallelX team studied throughput gains for I/O-bound jobs, Tony reported that the product "also supports real-time processing, queries expressed in Pig and Hive code, and large data streams for I/O-bound tasks. In our testing, using our pipelined framework, I/O throughput almost reaches the level of GPU computing throughput."

Although the ParallelX team is currently focusing on Amazon's Hadoop distribution, it also plans to support other popular Hadoop distributions, such as Cloudera's CDH. In a ParallelX environment, it would be very useful to take advantage of the many improvements these commercial distributions bring to Hive and Pig.

ParallelX has an unusual origin story. In an article, Tony describes the epic project, which lasted 2.5 years: first a social network built for a community, then a widget plugin for Facebook, followed by a tool for detecting plagiarized code. These projects had something in common: graph analysis and GPU-based algorithms. From there, the idea for ParallelX emerged almost naturally.

ParallelX is suitable for many different workloads, but it focuses on heavy analytics such as high-performance computing, graph processing, and machine learning. As an example, the ParallelX team cites clustering a large social network on a single GPU in a matter of seconds, a job that previously took six computers working in parallel an hour to complete. In practice it is not limited to these workloads: any program written for MapReduce can be compiled with ParallelX into code that runs on the GPU.

The ParallelX team plans to publish data and white papers in the future to show how this "Hadoop to GPU" compiler performs on real-world workloads. The community's response has been mixed. Some are waiting to read the white paper before deciding whether to switch to ParallelX. When the news was posted on Hacker News, the comments struck a similar note: "Extraordinary claims require extraordinary evidence."

Developers can already try out GPUs on Hadoop today by using Aparapi. Aparapi is a set of Java APIs that lets developers run specific code snippets on the GPU by translating Java bytecode into OpenCL; these snippets can be embedded in any MapReduce job written in Java.
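The general shape of such a snippet can be sketched without the library itself. GPU code of this kind is typically written as a kernel whose body runs once per index; the sketch below imitates that pattern in plain Java, with a hypothetical `IndexKernel` interface and a parallel stream as a CPU stand-in for the GPU launch. None of these names come from Aparapi, and no OpenCL translation happens here.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical CPU stand-in for a per-index GPU kernel.
// IndexKernel and KernelPatternSketch are illustrative names, not Aparapi API.
interface IndexKernel {
    void run(int globalId); // body executed once per index
}

class KernelPatternSketch {

    // Runs the kernel body for every index in [0, range). On a GPU, each
    // index would be handled by its own work-item.
    static void execute(IndexKernel kernel, int range) {
        IntStream.range(0, range).parallel().forEach(kernel::run);
    }

    // Element-wise vector addition: each index writes only its own slot,
    // so the iterations are independent and safe to run in parallel.
    static float[] add(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        float[] sum = new float[a.length];
        execute(gid -> sum[gid] = a[gid] + b[gid], a.length);
        return sum;
    }

    public static void main(String[] args) {
        float[] result = add(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f});
        System.out.println(Arrays.toString(result));
    }
}
```

The design point is that the kernel body touches only data addressed by its own index, which is what makes a mechanical translation to a GPU work-item plausible.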

ParallelX may be a significant step toward extending Hadoop to meet the growing demand for complex algorithms. For example, graph analysis algorithms can achieve very good performance with the bulk synchronous parallel (BSP) computing model promoted by Apache Hama. If ParallelX could run graph analysis algorithms as MapReduce jobs in combination with projects such as Apache Giraph, it would add a valuable tool to any data scientist's graph analysis toolbox.
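For readers unfamiliar with the bulk synchronous parallel model, the following is a minimal single-process sketch of its superstep structure, using maximum-value propagation over a small graph as the example. It does not use Apache Hama or Apache Giraph APIs; the class and method names are hypothetical, and a real BSP system would distribute the vertices across machines.

```java
import java.util.Arrays;

// Hypothetical, single-process illustration of BSP supersteps.
// It does not use Apache Hama or Apache Giraph APIs.
class BspMaxValueSketch {

    // Propagates the largest vertex value through the graph.
    // neighbors[v] lists the vertices that v sends messages to.
    static int[] propagateMax(int[] initial, int[][] neighbors) {
        int[] values = initial.clone();
        boolean changed = true;
        while (changed) { // one loop iteration corresponds to one superstep
            // Communication: every vertex sends its current value to its
            // neighbors; each receiver keeps the maximum it has seen.
            int[] inbox = values.clone();
            for (int v = 0; v < values.length; v++) {
                for (int n : neighbors[v]) {
                    inbox[n] = Math.max(inbox[n], values[v]);
                }
            }
            // Barrier: new values become visible only after every message
            // of this superstep has been delivered.
            changed = !Arrays.equals(inbox, values);
            values = inbox;
        }
        return values;
    }

    public static void main(String[] args) {
        // Path graph 0 - 1 - 2 - 3 with edges in both directions.
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(propagateMax(new int[] {3, 6, 2, 1}, neighbors)));
    }
}
```

Within a superstep, messages are computed only from the previous superstep's values, so the per-vertex work is independent and could run in parallel.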

Readers can now sign up online with an email address for the ParallelX beta. ParallelX plans to offer a freemium plan that provides access to a powerful GPU with limited storage space.
