The "Superman" of computing: HPC and its big headaches

Source: Internet
Author: User

If we compare a PC to an ordinary person, then an HPC system, or supercomputer, is a "Superman": even an ordinary HPC system has several thousand times the computing power of a PC. Top-level HPC systems are more capable still, such as the IBM Blue Gene series popular in Europe and America, or China's domestic Sugon, Inspur, Lenovo, and Shenwei systems. Because of this, they can shoulder the responsibility of solving large and complex problems.

However, just as the superheroes in science fiction movies suffer heavy material losses, get entangled in matters of love and family, or face public-opinion attacks launched by their enemies, HPC systems, the "supermen" of the computing field, also run into challenges and obstacles, both internal and external, as they grow. For now, these troubles fall mainly into the following six areas:

HPC troubles 1. Hard work, low application efficiency

The global HPC TOP500 list, like China's own top rankings, measures the potential of an HPC system, that is, its theoretical peak computing speed and its Linpack benchmark performance; it does not reflect the system's performance in practical use. In fact, among research institutes, universities, enterprises, and other HPC users, there are many examples of HPC systems being used inefficiently because of software, configuration, or management factors. For example, the hardware scale of some users' HPC systems keeps expanding while their actual computing power has not improved significantly; and although a large number of HPC clusters with hundreds or even thousands of computing cores have emerged, few applications can fully utilize their performance. The result is that these users can afford HPC but cannot make good use of it.
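The gap between ranking figures and practical performance can be made concrete with a small calculation. The sketch below is only an illustration: the cluster size, clock speed, FLOPs per cycle, and both measured figures are hypothetical numbers chosen for easy arithmetic, not data from any real system.

```python
def theoretical_peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS, assuming every core retires its
    maximum number of floating-point operations on every cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def efficiency(achieved_gflops, peak_gflops):
    """Fraction of the theoretical peak that a workload actually achieves."""
    return achieved_gflops / peak_gflops


# Hypothetical cluster: 128 nodes, 8 cores each, 2.5 GHz, 4 FLOPs per cycle.
peak = theoretical_peak_gflops(nodes=128, cores_per_node=8,
                               clock_ghz=2.5, flops_per_cycle=4)

# Hypothetical measurements: a Linpack run and a real application.
linpack_gflops = 8192.0
application_gflops = 1024.0

print(f"peak:               {peak:.0f} GFLOPS")
print(f"Linpack efficiency: {efficiency(linpack_gflops, peak):.0%}")
print(f"app efficiency:     {efficiency(application_gflops, peak):.0%}")
```

With these made-up numbers, the benchmark reaches 80% of peak while the application reaches only 10%, which is the kind of gap a ranking list does not show.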

HPC troubles 2. Insufficient "balance" in system configuration

HPC stands for high performance computing, but performance here is not just raw computing speed. It covers three aspects: CPU floating-point capability, I/O bandwidth, and memory bandwidth. Different types of applications place different demands on these three. Taking the oil exploration industry as an example, reservoir simulation is sensitive to memory bandwidth and latency, whereas seismic data processing requires powerful computing performance.

If users do not configure the HPC system according to the performance requirements of their application software, the system will inevitably be born with a "congenital disorder". For example, if a hardware platform optimized for compute-intensive applications is used for communication-intensive applications, each node has powerful computing capability, but limited I/O and communication bandwidth causes data to pile up, and computing resources are largely wasted.
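One common way to reason about this kind of balance is the roofline model, which the article itself does not mention and which is brought in here purely as an illustration. It caps a node's attainable performance at the lower of its compute peak and its memory bandwidth multiplied by the application's arithmetic intensity (FLOPs per byte moved). The node figures and the two intensities below are hypothetical, not measurements of reservoir simulation or seismic codes.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline estimate: performance is capped either by the compute
    peak or by how fast memory can feed the cores."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


def limiting_resource(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Name the resource that sets the cap in the roofline estimate."""
    if mem_bandwidth_gbs * flops_per_byte < peak_gflops:
        return "memory bandwidth"
    return "compute"


# Hypothetical node: 80 GFLOPS peak, 20 GB/s memory bandwidth.
PEAK_GFLOPS = 80.0
BANDWIDTH_GBS = 20.0

# Hypothetical arithmetic intensities (FLOPs per byte) for two workloads.
workloads = {"bandwidth-sensitive job": 0.5, "compute-heavy job": 10.0}

for name, intensity in workloads.items():
    perf = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    limit = limiting_resource(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    print(f"{name}: {perf:.0f} GFLOPS attainable, limited by {limit}")
```

On this hypothetical node the bandwidth-sensitive job tops out at 10 of the 80 available GFLOPS, so buying more floating-point capability would not help it; more memory bandwidth would.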

HPC troubles 3. Unemployment crisis: unbalanced hardware and software development

Although Chinese-made HPC systems in the multi-trillion-operations-per-second classes have made frequent appearances this year, they are at risk of "unemployment" at any time!

Hard to believe? Consider two examples. The first is the petaflops-class Jaguar XT5 system at the U.S. Department of Energy's Oak Ridge National Laboratory. It has 150 thousand CPU cores, and its scientific computing workload is well distributed: jobs using fewer than 30 thousand CPU cores account for 50% of all jobs, jobs using up to 90 thousand cores account for a further 32%, and still larger jobs account for the remaining 18%. In contrast, although the Sugon HPC system at the Shanghai Supercomputer Center in China has 30 thousand CPU cores, its applications are far from keeping up: jobs using fewer than 16 cores account for 60%, jobs using fewer than 160 cores account for a further 39%, and larger jobs account for only 1%.

The application gap above stems mainly from the lack of HPC application software in China. Although the hardware technology of China's HPC systems has developed greatly over the past few years, the application software foundation remains very weak; moreover, the relevant talent, software investment, and innovative R&D systems are not sound enough. As a result, China's HPC applications keep facing limited computing scale, low computing accuracy, and low resolution, and key applications are constrained and hard to improve. This leads to the dilemma of "big machines, small applications" and of imbalance between hardware and software development.
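A job-size breakdown like the ones quoted here can be computed from a scheduler's job records. The sketch below is a minimal illustration: the function name, the band boundaries, and the job list are all hypothetical, and the job list is synthetic data constructed to mirror the proportions reported for the Shanghai system rather than real accounting logs.

```python
from bisect import bisect_right


def job_size_shares(job_core_counts, boundaries):
    """Return the share of jobs falling in each core-count band.

    With boundaries [b0, b1], the bands are: fewer than b0 cores,
    b0 up to (but not including) b1 cores, and b1 cores or more.
    """
    if not job_core_counts:
        raise ValueError("need at least one job")
    counts = [0] * (len(boundaries) + 1)
    for cores in job_core_counts:
        # bisect_right picks the band whose lower bound is <= cores.
        counts[bisect_right(boundaries, cores)] += 1
    total = len(job_core_counts)
    return [count / total for count in counts]


# Synthetic job log: 60 small jobs, 39 medium jobs, 1 large job.
jobs = [8] * 60 + [64] * 39 + [1024] * 1

shares = job_size_shares(jobs, boundaries=[16, 160])
for label, share in zip(["< 16 cores", "16 to 159 cores", ">= 160 cores"], shares):
    print(f"{label}: {share:.0%}")
```

A distribution weighted this heavily toward the smallest band means most of the machine's cores are never used together by a single application.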

HPC troubles 4. Too big an appetite: staggering energy consumption

As the saying goes, people are iron and food is steel: skip one meal and you go hungry. HPC systems are the same; their computing power is "super", and so is their appetite for electricity. As demands on computing workload, computing time, and problem complexity rise, HPC systems keep growing in scale, and the number of CPUs they use grows by the thousands. Electricity usage naturally rises with it, overwhelming the enterprises and institutions that foot the bill. For example, an HPC system in the hundred-trillion-operations-per-second class costs RMB 20 to 30 thousand per day in electricity, which adds up to millions of RMB a year! Systems in still higher performance classes consume more power than a small city.
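The step from a daily electricity bill to an annual one is simple arithmetic, sketched below. The daily figures of RMB 20 to 30 thousand come from the article; the power draw and the electricity price in the second function's example are hypothetical values chosen only to show how such a daily figure could arise.

```python
def annual_cost_rmb(daily_cost_rmb, days_per_year=365):
    """Annual electricity cost, assuming the system runs every day."""
    return daily_cost_rmb * days_per_year


def daily_cost_rmb(power_kw, price_rmb_per_kwh, hours_per_day=24):
    """Daily electricity cost for a constant power draw."""
    return power_kw * hours_per_day * price_rmb_per_kwh


# Daily costs quoted in the article: RMB 20 to 30 thousand.
low = annual_cost_rmb(20_000)
high = annual_cost_rmb(30_000)
print(f"annual cost: RMB {low:,} to RMB {high:,}")

# Hypothetical example: a 1000 kW draw at RMB 1.0 per kWh.
example_daily = daily_cost_rmb(power_kw=1000, price_rmb_per_kwh=1.0)
print(f"example daily cost: RMB {example_daily:,.0f}")
```

At the quoted daily rates, the yearly bill comes to between RMB 7.3 million and RMB 10.95 million, consistent with the article's "millions a year".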

HPC troubles 5. Hard to "slim down": the challenge of increasing computing density

People tend to put on weight in middle age, and HPC systems are no different. These supercomputing systems bulk up easily, especially at the hundred-trillion-operations-per-second scale and above: if traditional 1U or 2U rack servers are used as nodes, the physical footprint becomes astonishing, which is a real burden for users with smaller data centers. In addition, a mass of cables hangs off the back of these nodes, which is unsightly and hard to manage. As a result, people have begun building large-scale HPC systems from blade servers or improved high-density servers, such as twin servers that put two motherboards in one chassis. This approach brings new challenges of its own, such as adapting the data center environment, especially cabinet power supply and machine-room heat dissipation, and the absence of uniform standards for blade servers.
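The footprint difference between node form factors can be estimated with a rough rack count. The sketch below rests on several assumptions that are not in the article: a 1000-node system, 42 usable rack units per rack with nothing reserved for switches or power distribution, and a twin server that fits its two boards into a 1U chassis.

```python
import math


def racks_needed(nodes, nodes_per_chassis, chassis_height_u, usable_u_per_rack=42):
    """Number of racks needed to hold the given number of nodes."""
    chassis_per_rack = usable_u_per_rack // chassis_height_u
    nodes_per_rack = chassis_per_rack * nodes_per_chassis
    return math.ceil(nodes / nodes_per_rack)


NODES = 1000  # hypothetical system size

# (nodes per chassis, chassis height in U) for each assumed form factor.
form_factors = {
    "2U rack server": (1, 2),
    "1U rack server": (1, 1),
    "1U twin server": (2, 1),
}

for name, (per_chassis, height_u) in form_factors.items():
    racks = racks_needed(NODES, per_chassis, height_u)
    print(f"{name}: {racks} racks")
```

Under these assumptions, the same 1000 nodes take 48 racks as 2U servers, 24 racks as 1U servers, and 12 racks as twin servers. The power and cooling load per rack rises accordingly, which is the data center challenge the paragraph describes.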

HPC troubles 6. Every manufacturer for itself: lack of unified standards

Many manufacturers can now build HPC systems, including overseas giants such as IBM, HP, Dell, and Sun, domestic enterprises such as Sugon, Inspur, Lenovo, and Baode, and a number of smaller local vendors; some systems are even built "DIY" by their users. If x86 rack servers are used, the situation is manageable. Such products follow unified standards, so compatibility and interoperability pose no major problems, accessories are easy to find, and service is not a worry. If blade servers are used, however, their inconsistent standards can cause great trouble.

Unlike traditional rack servers, blade servers have never had a uniform standard, even though they have been around for ten years. Blade chassis alone have appeared in 50 or 60 different versions, with more than a dozen vendors competing in the market, and the blades themselves differ even more. They cannot be substituted for one another and are difficult to make interoperate; some manufacturers' new blade servers are not even compatible with the blade chassis those same manufacturers launched earlier. For HPC users, this situation means vendor lock-in, difficulty purchasing related accessories, high service costs, and high platform switching costs. Platforms from different vendors can only run independently within an HPC system and cannot combine their capabilities.

Conclusion

The six troubles listed above are common problems that most HPC users encounter. If they are not solved, an HPC system will sit idle or deliver only mediocre results, and this computing "Superman" will have nothing "extraordinary" about it. To avoid this, HPC users, system manufacturers, processor and computing platform providers, and application software developers must all find the causes of these problems in order to propose targeted solutions.
