Using Hadoop-RDMA to Speed Up Big Data Processing


China's most influential and largest big data event, the 2013 China Big Data Technology Conference (Big Data Technology Conference, BDTC), was held in Beijing on December 5-6, 2013. With dozens of leading companies participating and nearly 70 keynote speeches, it covered not only technical directions such as the Hadoop ecosystem, stream computing, real-time computing, NoSQL, and NewSQL, but also innovative cases from the Internet, finance, telecommunications, transportation, and healthcare industries, along with in-depth discussion of laws and regulations on big data resources and policy controls on the commercial use of big data.

At the plenary session on the first day of the conference, Dhabaleswar K. Panda, a professor of computer science at Ohio State University, delivered a speech entitled "Using Hadoop-RDMA to Accelerate Big Data Processing". He began with high-performance computing networks, emphasizing the role of MPI middleware, and then asked whether HPC technology can be applied to Hadoop. Panda argued that the many companies trying to improve the socket layer are actually on the wrong track, because sockets were never designed for high performance. He then shared the performance improvements of components such as HDFS and MapReduce after adopting RDMA.

The following is a transcript of the speech:

Dhabaleswar K. Panda: Good afternoon! I am very pleased to have this opportunity to speak at the forum. Let me tell you about the work our team has done in recent years around RDMA, an advanced communication technique. If you are not familiar with it, RDMA is essentially remote DMA: one machine reads and writes another machine's memory directly over the network. What are the characteristics of RDMA, and how do we exploit them? This morning we talked about big data: big data has entered our businesses and become an element of business analytics, and it brings unique opportunities to management decision-making. The volume of this data is enormous, perhaps reaching a total of around 35 ZB within ten years. Big data is characterized by the three Vs: the first V is Volume, the data is very large; the second V is Velocity, it arrives very fast; and the third V is Variety, it comes in many different forms.

Let's take a look at who is using Hadoop. Many different businesses use it, and Hadoop has a large number of users around the world. I will first introduce the different components of Hadoop; second, look at worldwide trends in network technologies and protocols; and third, discuss the challenges in Hadoop development and our designs for its different components. Finally I will introduce our work on Memcached.

This is the architecture of Hadoop: we have HDFS, MapReduce, RPC, HBase, and so on, which together form the whole Hadoop framework. Our focus here is on the interaction between these components: can we use a higher-performance protocol to carry this communication? Consider HDFS first. It is the underlying storage of Hadoop, with error correction and very strong reliability, and it is deployed by many well-known enterprises. What we want to see is how RDMA can be embedded into it, and what the benefit is. At the network level, a client interacts with the HDFS DataNodes, so if we connect them with a high-performance network we can accelerate the HDFS client path; clients pay a substantial cost to fetch data from the DataNodes, and these operations are relatively expensive.

Another important component is HBase, a distributed database built on this design and also used in many enterprises; we heard about HBase applications this morning. HBase involves two different networks: one connects the HBase client to the server, and the other connects the region servers to the HDFS DataNodes.

Now let's look at network technology and protocol trends. Over the past 20 years, the Top500 list shows the trend toward commodity computing clusters. Back in 1999 the list was dominated by custom supercomputers; after the great rise of commodity clusters, about 85% of the systems in recent years are clusters. From the next slide we can see that high-performance interconnects have been increasingly adopted, including by many companies in China.
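Before moving on: to make the client-to-DataNode path concrete, here is a minimal sketch of an HDFS write using the standard Java FileSystem API. The NameNode address and file path are placeholders, and this runs over the stock socket-based transport, i.e., the path the RDMA design is meant to replace.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; adjust for your cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // create() contacts the NameNode for metadata only; the bytes
        // written below are streamed over client-to-DataNode connections
        // and then through the DataNode replication pipeline -- the
        // network traffic discussed in the talk.
        FSDataOutputStream out = fs.create(new Path("/tmp/rdma-demo.txt"), true);
        out.writeUTF("hello HDFS");
        out.close(); // flushes the final packet down the pipeline

        fs.close();
    }
}
```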

In recent years RDMA has been converging: there is now an integrated interface that unifies the related architectures, and these interfaces are very good. Here is what that buys you. Latency is very low, possibly only a few microseconds; bandwidth is high, up to around 100 Gb/s; and the CPU overhead of communication is low, roughly 5% to 10%, so you get efficient interaction and higher speed. Its development is open source, much like Linux, which also drives broad adoption. On this chart the lines in different colors are different technologies; look at the green one, which has grown steadily for many years. The chart shows what proportion of the 500 systems used each network technology over the years, and with the continued development of interconnects, adoption keeps growing. A modern interconnect runs at around 56 Gb/s, so the performance gap is quite large compared with how the same applications behave on Ethernet; you can see there is a lot to consider in designing systems. In the 2013 Top500 list, 207 systems used such clusters, 19 of them among the highest-ranked systems, and many of course come from China.

Why has this network grown so fast in high-performance computing? The key is a middleware layer, MPI, a programming model that has driven high-performance computing for years. As new network technologies appeared, people did a great deal of research and development, including much work in our group, which pushed performance and scalability further and further. It is not just the API; we have been in this field for about 13 years, and other programming models are now emerging as well. As I mentioned, our group has been studying RDMA for a long time, running our software on all kinds of equipment and machines: it is used in about 71 countries by more than 2,000 organizations, including universities, and it powers some very highly ranked systems, such as the one ranked seventh, which is built on our MPI.

Look at the performance: as I just mentioned, the latency is very low across the different metrics. The blue line is the latest data, measured on Intel's newest architecture. A node can send data to another node at roughly 12.8 GB per second, which is about 100 Gb/s; with that kind of transmission speed and very low latency, the performance potential is clear. So what does all this high-performance development have to do with Hadoop? That is what we have been working on for the last two years. We care about finding Hadoop's bottlenecks and then giving it a new design, asking whether that design can adopt the approach that worked for the MPI model; if we can achieve that level of performance, why wouldn't we? We have some research results on this. At the same time, many such clusters already exist, and China has already deployed systems like these. So the question is whether these machines can be used in environments like Hadoop, and whether clusters built on high-performance computing networks can run Hadoop for big data applications.

I just mentioned the challenges; look at the overall stack and its different layers. At the bottom are multi-core architectures, networking, and storage technology, which of course is also very important. At the top are the big data applications, and in the middle sit the middleware and programming models. If we want sufficient performance at this layer, along with better quality of service and so on, these are the research challenges we now face. The work I present today covers only a small part of this space; the research area is very broad, and there are many performance improvements to be had. The slide shows the common protocol choices. There are traditional sockets over 1 GigE and 10 GigE networks, and sockets also have alternative protocols on other fabrics at tens of gigabits. One alternative is SDP, the Sockets Direct Protocol, which is of particular interest to me now; it enables some socket applications to achieve better performance. Then there are the native verbs, which give better performance than all of the socket variants they are compared against. These are some of the directions going forward.
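As a side note on SDP: on JDK 7 with an OFED-based InfiniBand stack, an unmodified Java socket application can be redirected onto SDP purely through configuration, which is exactly the "accelerated sockets without code changes" idea. A minimal sketch, with the subnet and port range as placeholder values for your own IPoIB interface:

```
# sdp.conf -- each rule is "bind" or "connect", an address, and a port range
# (the subnet and ports below are placeholders)
bind    192.168.1.0/24  *
connect 192.168.1.0/24  1024-*
```

The application is then launched with `java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true ...`, and TCP sockets matching the rules transparently use SDP instead of the kernel TCP stack.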

People will ask: given that this is how the designs stand, with applications running on 1 GigE or 10 GigE networks, and some companies doing improvement work on accelerated sockets, can Hadoop actually be accelerated by these networks and protocols? Sockets were not designed for high performance, and they do not necessarily match what the upper layers need. So the question is, for example, whether a fresh design underneath Hadoop can improve its overall performance, and whether we can actually achieve it. This is the problem we set out to solve.
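To make the socket-overhead argument concrete, the sketch below shows the kind of round-trip latency microbenchmark used to compare transports. It is a minimal illustration, not the speaker's benchmark: the port is arbitrary, it runs over loopback, and a real measurement would add warm-up rounds and run between two machines.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketPingPong {
    static final int ROUNDS = 10000;

    public static void main(String[] args) throws Exception {
        // Echo server; on a real network this would run on a second machine.
        final ServerSocket listener = new ServerSocket(5001);
        Thread server = new Thread(new Runnable() {
            public void run() {
                try {
                    Socket s = listener.accept();
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) out.write(b); // echo each byte
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        server.start();

        Socket s = new Socket("127.0.0.1", 5001);
        s.setTcpNoDelay(true); // disable Nagle batching for a latency test
        InputStream in = s.getInputStream();
        OutputStream out = s.getOutputStream();

        long t0 = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) {
            out.write(42); // one-byte "ping"
            in.read();     // wait for the one-byte "pong"
        }
        long t1 = System.nanoTime();
        System.out.printf("average round trip: %.1f us%n",
                (t1 - t0) / 1000.0 / ROUNDS);
        s.close();
        listener.close();
    }
}
```

Even on loopback, each one-byte round trip through the kernel socket stack typically costs tens of microseconds, against the few-microsecond RDMA latencies quoted earlier; that gap is the motivation for redesigning Hadoop's communication layer rather than tuning sockets.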

What we have done in the last two years is redesign these lower layers, and here are some details. We released a Hadoop 2.0 package this summer, and the new version carries some of the latest techniques from our research. The current release is based on Hadoop 1.2.1, and you can download it; it has been tested extensively and on different platforms. Within this architecture you can see how we approach the design. So far people have assumed the network is slow; I will show you some numbers suggesting the future can be much faster, which is why redesigning the upper layers for performance is worth studying. In a paper we published at a conference in 2012 we showed how much speed can be gained. All of these numbers use the same software under two comparisons: IPoIB, where red is our design, and 10 GigE. On the same hardware, our design improves communication time by 30% over IPoIB and by 56% over 10 GigE.

Here is another evaluation with similar results. The left side shows the numbers for HDD DataNodes, and you can see the per-node figures: with a 20 GB file size, the improvement over IPoIB is more than 24%. With a cluster of four SSD nodes and the same 20 GB file size, again on a high-performance network versus IPoIB, performance also improves; combining modern storage with a high-performance network gives a significant gain. When people say that using a high-performance network brings no acceleration, the cause may lie in software problems.

This is another experiment similar to the previous one, again comparing the two networks. The improvement is not limited to one disk configuration: with one or two hard drives the pattern is the same, and you can see that the high-performance network really does provide some acceleration.

This is another experiment, carried out at SDSC. With 33 SDSC nodes and 100 GB of data, latency improved by 28%, so our design is doing well on scalability. That is only part of the SDSC results. Here we also optimized some of the compute-side memory modules to reduce the communication time between them. In the next two parts, through RDMA, we take full advantage of the acceleration features of each piece of hardware. Here are some typical numbers: the left side is eight DataNodes, the right side is four DataNodes; with eight DataNodes there is a 24% improvement. Compared with IPoIB our improvement is 24%, and with SSDs the effect is even more pronounced: still at 100 GB, there is a 31% improvement, shown with SSDs on the right. Finally, this is what we ran on a larger cluster of 64 nodes with 240 GB of data: the improvement is 39%, and that is still only part of the results.

There are various benchmark metrics, as you can see here; for example, at 30 GB the improvement reaches 46%. We have also done work on RPC, which is a very important component because so much of Hadoop's communication runs through it. Our design here includes on-demand connection setup, RDMA-based immediate communication, and so on. The left side is latency: you see 39 microseconds, roughly a 50% improvement, cutting the latency in half. The overall throughput on the right improves by up to 82%.

Let's look at another component, HBase, in a bit more detail. Here you can see that if we apply the same techniques, we can also greatly reduce communication time: for a 1 KB get, we achieve up to a 6x reduction in communication time compared with 10 GigE. This is the framework for HBase. The setup is a single HBase server with multiple clients, so we can compare HBase get latency directly: with 16 clients on 10 GigE, we improve it by 27%. On the next slide, for a mixed HBase read-write workload, we reduce latency further, improving by 42%.
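For reference, this is roughly what the measured operation looks like with the HBase client API of that era (0.94-style); the table, column family, and row key below are placeholder names:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetExample {
    public static void main(String[] args) throws Exception {
        // Table, column family, and row key are placeholders.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");

        // A put travels client -> region server -> HDFS DataNodes,
        // crossing both networks described above.
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);

        // A small get like this exercises the client-to-region-server
        // path whose latency is measured in the talk.
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));

        table.close();
    }
}
```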

We can also integrate the different components. This slide puts them together: for 5 GB we improve by about 53% compared with 10 GigE, and by about 10% for 20 GB; compared with 10 GigE using HDDs, the integrated HDFS design still improves by 46%. Next is a similar comparison for cluster-wide sorting: with about eight nodes sorting 80 GB, we see a 40% improvement, and 32% over 10 GigE using HDDs without the enhanced HDFS.
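These sort workloads are shuffle-bound: essentially every input record is sorted and sent across the network from mappers to reducers, which is the traffic an RDMA-based MapReduce design accelerates. Below is a minimal sketch of such a shuffle-heavy job, using the Hadoop 1.x-era API with the stock identity Mapper and Reducer; the class name and paths are placeholders, and the talk's actual benchmark is the standard Hadoop Sort example, not this sketch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ShuffleHeavyJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "shuffle-heavy-identity"); // Hadoop 1.x-era API

        job.setJarByClass(ShuffleHeavyJob.class);
        // The stock Mapper and Reducer pass every record through unchanged,
        // so the job's cost is dominated by the shuffle: each map output
        // record is sorted and sent over the network to a reducer.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class); // default TextInputFormat keys
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```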

The diagram shows experiments at 100 GB on a cluster of about eight nodes: we get a 45% improvement, and 39% when actually using HDDs. This is our sort experiment at SDSC: on 32 nodes with 200 GB and random readers, we improve by 20% over IPoIB. Finally, let me briefly introduce an RDMA-based design similar to the ones we had before; it also depends on what kind of client you have, and the host side can work well together with the client.

Finally, let's look at the remaining challenges. We have integrated the traditional plug-in architecture with RDMA, and after integration we hope the quality and performance of the related components improve significantly, and that things can change further from an operational perspective; the more tools we have at this level, the more benefit there will be. There is also much research ahead: we hope to release a new version with support for balanced replication across HDDs, more advanced upper-layer designs, and further optimizations.

That concludes my talk. Many thanks to my team; they work very hard. Finally, my team is recruiting new members: if you are interested in our research, you are welcome to join us; you can contact me or Lucy.
