Troubleshooting slow parallel computing in a small project (III)

Source: Internet
Author: User

After a few days of testing, the final results came out on January 6, 2015, and this time they are satisfactory: the same task took 61,899 seconds on 32 cores and 32,659 seconds on 64 cores. The 10-gigabit network has a clear advantage over the gigabit network.
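The two timings above can be turned into a speedup and a scaling efficiency. The following is a minimal sketch of that arithmetic, using only the two figures reported in this post; the function and constant names are my own and not part of any tool used in the project.

```python
# Timings reported in the post (seconds of wall-clock time).
T_32_CORES = 61_899
T_64_CORES = 32_659


def speedup(t_base, t_scaled):
    """How many times faster the scaled run is than the base run."""
    return t_base / t_scaled


def scaling_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Speedup divided by the growth in core count (1.0 would be ideal)."""
    return speedup(t_base, t_scaled) / (cores_scaled / cores_base)


if __name__ == "__main__":
    s = speedup(T_32_CORES, T_64_CORES)
    e = scaling_efficiency(T_32_CORES, 32, T_64_CORES, 64)
    print(f"speedup 32 -> 64 cores: {s:.2f}x")
    print(f"scaling efficiency:     {e:.1%}")
```

With the reported numbers this gives a speedup of about 1.90x and an efficiency of about 94.8%, which is what "almost halved" means in concrete terms.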

There are two compute nodes with 32 cores each. On the earlier gigabit network, running the same task on 32 cores and on 64 cores took about the same time; this time, on the 10-gigabit network, the 64-core run took almost half the time of the 32-core run. This shows that the bottleneck in this parallel computing system was indeed the network. When we first recommended a configuration to the customer, there were only two compute nodes, so for cost reasons the company did not recommend a 10-gigabit or InfiniBand (IB) network, and that caused the problem in this project. It is also a wake-up call for me: in future high-performance systems, whenever there is more than one compute node, the network between compute nodes should be IB first; if the budget is limited and the number of compute nodes is small, choose a 10-gigabit network (bearing in mind that a 10-gigabit network can also become a bottleneck if you do not know the computing scale or how many compute nodes there will be).

The day after the test results came out, when I went to the customer's site to collect the test switch, I submitted another 64-core task and collected some data on the two compute nodes, including traffic monitoring data for the 10-gigabit network cards and system load data. Below are the traffic monitoring graphs for the network cards of the two compute nodes:

[Figure: network card traffic monitoring graph for a compute node (22222222222.png)]

[Figure: network card traffic monitoring graph for a compute node (Cu02-net.png)]

The two graphs above were captured about 5 minutes after I submitted a 64-core task. Network bandwidth reached nearly 400 Mbps, which is similar to what I saw on the gigabit network and far below the limit of the 10-gigabit network; it looks just like gigabit bandwidth. This is very strange. Why?

While capturing the two graphs above, I ran the top command on both compute nodes and saw that the CPUs of both nodes were working at full load. Once the task was stopped, the traffic on both network cards dropped to about 10 kbit/s, so the network cards are working normally. But then why does the measured bandwidth not reflect the advantage of the 10-gigabit network cards, while the test results clearly do?

Could it be that LAMMPS does not need high bandwidth all the time during a parallel run, and only benefits from it at certain steps of the calculation, that is, at certain moments when communication between nodes in the parallel environment makes use of the high bandwidth? This is just my guess!
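One way to test this guess would be to sample the network card counters at a much shorter interval than a monitoring graph does, and compare the peak rate with the average. The following is a rough sketch under several assumptions: the nodes run Linux and expose /proc/net/dev, the interface name "eth2" in the usage note is hypothetical, and the sampling interval of the graphs above is not known from this post.

```python
import time


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from the text of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mbps(bytes_before, bytes_after, seconds):
    """Convert a byte-counter difference into megabits per second."""
    return (bytes_after - bytes_before) * 8 / seconds / 1e6


def sample_rates(iface, interval=0.1, count=50, path="/proc/net/dev"):
    """Sample one interface repeatedly; return a list of (rx_mbps, tx_mbps)."""
    def read():
        with open(path) as f:
            return parse_proc_net_dev(f.read())[iface]

    rates = []
    prev, prev_t = read(), time.monotonic()
    for _ in range(count):
        time.sleep(interval)
        cur, cur_t = read(), time.monotonic()
        dt = cur_t - prev_t  # use the measured elapsed time, not the nominal one
        rates.append((rate_mbps(prev[0], cur[0], dt),
                      rate_mbps(prev[1], cur[1], dt)))
        prev, prev_t = cur, cur_t
    return rates
```

For example, calling sample_rates("eth2") while the 64-core task is running and comparing max() with the mean of the returned values would show whether short bursts go well above 400 Mbps. This only checks the guess; it does not by itself explain the speedup.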

This time I also captured the output of the top command and compared it with the output of "mpstat -P ALL". Last time, when I used these two commands to view the system's CPU utilization, top showed CPU utilization at around 90%, but the per-core utilization reported by "mpstat -P ALL" was low for almost every core, at around 30%. This time, "mpstat -P ALL" shows every core at around 70%, a significant improvement over before, and pressing "1" (the digit one) in top also shows high utilization on every core. How can this be explained?
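To cross-check the per-core figures independently of top and mpstat, per-core utilization can be computed from two snapshots of /proc/stat. This is only a sketch of that idea, assuming a Linux node; the function names are my own, and counting iowait as idle time is a choice made here that may differ from how a given tool reports it.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue  # skip the aggregate 'cpu' line and all non-CPU lines
        # Fields: user nice system idle iowait irq softirq steal
        # (guest time is already included in user, so it is not added again).
        ticks = [int(x) for x in parts[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks)
        cores[parts[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before[name]
        dt = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / dt if dt else 0.0
    return result


def read_snapshot(path="/proc/stat"):
    """Read and parse the current per-core counters."""
    with open(path) as f:
        return parse_proc_stat(f.read())


def print_report(seconds=1.0):
    """Take two snapshots 'seconds' apart and print per-core utilization."""
    first = read_snapshot()
    time.sleep(seconds)
    second = read_snapshot()
    for name, pct in sorted(utilization(first, second).items(),
                            key=lambda kv: int(kv[0][3:])):
        print(f"{name}: {pct:5.1f}% busy")
```

Calling print_report() on a compute node during a run gives one line per core, which can then be compared with the per-core view in top and with "mpstat -P ALL".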

[Figure: output of the top command (Top.png)]

From a technical point of view, the problem with this system has been solved. After switching from the gigabit network to the 10-gigabit network, doubling the number of cores from 32 to 64 for the same task almost halves the computing time. But along the way I found some phenomena in the performance data that I cannot explain, so I am writing them down here; if I later get the chance to study parallel computing in more depth, I will come back and answer them.

Finish!!!

This article is from the "SNAIL" blog; please keep this source link when reposting: http://357742954.blog.51cto.com/368705/1600861
