What is InfiniBand multi-rail?

Source: Internet
Author: User
The explanation is as follows:

Multi-rail support (multiple ports per adapter and multiple adapters)

At first glance this looks like bonding multiple NICs, or multiple ports on a single InfiniBand NIC, but it is more than that. Bonding multiple NICs or ports does increase bandwidth, but multi-rail is not that simple. The term "multi-rail" is also not specific to InfiniBand; it applies equally to Ethernet and Myrinet.

Multi-rail means communicating over multiple NICs, or multiple ports on a single NIC, at the same time. It has two purposes: first, to increase bandwidth and relieve bandwidth bottlenecks; second, to improve fault tolerance. In HPC, we naturally care more about the first.

In HPC, communication is usually bidirectional. With a single NIC, that one card has to send and receive packets at the same time, which is less efficient. With two NICs, one sending and one receiving, efficiency improves considerably, especially for HPC applications with frequent bidirectional communication. This is the most typical multi-rail use case.

The multi-rail discussion is therefore mostly about strategy. On the hardware side, it amounts to little more than one more NIC or one more port. The policies, however, are many. A policy decides which NIC and port (collectively called a rail) is used to send a given message. The traditional scheme divides the ID of the receiving process by the number of rails in the system and uses the remainder as the rail that sends the message. This method is clearly problematic: some rails may carry a lot of traffic while others sit idle.
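The remainder-based scheme and its imbalance can be shown with a minimal sketch. The function names and the traffic pattern here are hypothetical and chosen only for illustration; they do not come from any MPI library.

```python
from collections import Counter


def static_rail(dest_rank: int, num_rails: int) -> int:
    """Pick a rail as the remainder of the receiving process ID."""
    return dest_rank % num_rails


def rail_load(dest_ranks, num_rails):
    """Count how many messages land on each rail under the static scheme."""
    load = Counter({rail: 0 for rail in range(num_rails)})
    for rank in dest_ranks:
        load[static_rail(rank, num_rails)] += 1
    return dict(load)


# Hypothetical traffic: this process only talks to even-numbered ranks.
messages = [0, 2, 4, 6, 8, 2, 4, 0]
print(rail_load(messages, num_rails=2))  # {0: 8, 1: 0}
```

With two rails and only even-numbered receivers, every message goes to rail 0 and rail 1 stays idle, which is exactly the imbalance described above.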

As a result, various policies have been proposed to implement multi-rail: static (based on round-robin), local dynamic, dynamic, and hybrid. For details, see the PDF in the attachment, which is from the Ramos lab. It is very good and covers the principles, an introduction to the policies, and actual test data. The conclusion on the last page points out that the static policy performs worst but is the easiest to implement. The dynamic policy gives the best communication bandwidth and efficiency, especially for large messages. For applications that frequently send small messages, combining the dynamic policy with the hybrid policy works best.
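To make the difference between a static and a dynamic policy concrete, here is a rough sketch. The text does not spell out how each policy works internally, so this assumes that "static" means plain round-robin and that "dynamic" means choosing the rail with the fewest pending bytes; the class names are hypothetical, and the simulation ignores rails draining over time.

```python
import itertools


class RoundRobinPolicy:
    """Static policy: cycle through the rails regardless of their load."""

    def __init__(self, num_rails):
        self._cycle = itertools.cycle(range(num_rails))

    def pick(self, pending_bytes):
        return next(self._cycle)


class LeastLoadedPolicy:
    """Assumed dynamic policy: pick the rail with the fewest pending bytes."""

    def pick(self, pending_bytes):
        return min(range(len(pending_bytes)), key=pending_bytes.__getitem__)


def schedule(policy, message_sizes, num_rails):
    """Assign each message to a rail and return the bytes queued per rail."""
    pending = [0] * num_rails
    for size in message_sizes:
        rail = policy.pick(pending)
        pending[rail] += size
    return pending


sizes = [1000, 10, 1000, 10]  # alternating large and small messages
print(schedule(RoundRobinPolicy(2), sizes, 2))   # [2000, 20]
print(schedule(LeastLoadedPolicy(), sizes, 2))   # [1010, 1010]
```

In this toy workload round-robin puts both large messages on the same rail, while the load-aware choice evens them out. The price is that the load-aware policy has to track per-rail state for every message, which is one reason a static policy is simpler to implement.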

Next, let's look at the makefile that mvapich provides with multi-rail support. Compared with the normal one, two extra definitions are added at configure time; they appear to enable data copies across multiple InfiniBand NICs or multiple ports.

The one thing I am unsure about is where the policies live. The multi-rail hardware itself is straightforward; the key is the implementation of the policies. Because these policies are transparent to the application, they must be implemented in an underlying layer. So are they implemented in hardware (for example, many InfiniBand switches currently support multi-rail) or in software (for example, mvapich has a special version for multi-rail)? My basic guess is that the two work together: a basic implementation in hardware, coupled with software optimization, such as mvapich's optimization for MPI (that is, for the HPC field).

Now I think that multi-rail should be implemented mainly by hardware for the following reasons:

1. The overhead of these policies stays small only when they are implemented in hardware. If they are implemented in software, the overhead is larger, because it is paid every time a message is sent or received.

2. Searching Google for "mpich multi-rail" across almost all interconnect solutions, such as InfiniBand, Myrinet, and QsNet, turns up nothing. I have never heard of a multi-rail version of mpich.

3. After reading the makefile of mvapich with multi-rail support, I found two more definitions in configure, but these two definitions cannot by themselves implement a multi-rail policy. They look more like preparation for the program to run on hardware that supports multi-rail.

 
