How cloud computing conquers the peak of high-performance computing resource scheduling

Source: Internet
Author: User
Keywords: cloud computing

Resource scheduling is a difficult problem that every cluster must face. It is complex, often frustrating for users, and keeps system administrators busy, yet it is something they have to do. The most common complaint is: "Why is my job not running?" The most common answer involves explaining some of the scheduling rules; sometimes the system is simply fully loaded, or, in rare cases, a user's own program caused the problem.

If you don't know what a resource scheduler is, read the next few paragraphs. The idea is that you have a set of resources and a queue full of jobs, and you need to assign jobs to those resources so that the system stays as busy as possible. Common resource schedulers include Sun Grid Engine, Torque/Maui, Moab, PBS, Platform LSF, and Platform Lava. A cluster is the best example of where resource scheduling matters. Consider a 128-node cluster in which each compute node has eight cores, for 1,024 cores in total. Most user programs require 1-16 cores, but some require 256. The problem is: given a list of jobs, what is the best way to keep the cluster working?
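To make the idea concrete, here is a minimal sketch of one naive policy, first-fit placement, on the 128-node, eight-core cluster described above. The job list and the first-fit policy are illustrative assumptions; real schedulers such as SGE, PBS, and Moab use far richer algorithms (backfill, priorities, fair share).

```python
# First-fit placement sketch: assign queued jobs (by core count) to a
# cluster of 128 nodes x 8 cores. Illustrative only, not a real
# scheduler's algorithm.

def first_fit(jobs, nodes=128, cores_per_node=8):
    free = [cores_per_node] * nodes       # free cores on each node
    placed, waiting = [], []
    for job_id, cores in jobs:
        need = cores
        pieces = []                       # (node, cores) slices for this job
        for n in range(nodes):
            if need == 0:
                break
            take = min(free[n], need)
            if take:
                pieces.append((n, take))
                need -= take
        if need == 0:
            for n, take in pieces:        # commit the allocation
                free[n] -= take
            placed.append(job_id)
        else:
            waiting.append(job_id)        # not enough free cores: job waits
    return placed, waiting

jobs = [("a", 16), ("b", 4), ("c", 256), ("d", 1024)]
placed, waiting = first_fit(jobs)
print(placed)   # ['a', 'b', 'c']  (16 + 4 + 256 = 276 of 1024 cores in use)
print(waiting)  # ['d']            (only 748 cores remain free)
```

Even this toy version shows the core tension: job "d" must wait until the whole machine drains, and a smarter scheduler would decide whether to hold cores idle for it or backfill smaller jobs in the meantime.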

When a user submits a "job", a script is typically handed to a command such as qsub ("queue submit"), which inserts the job into the scheduler's queue. If allowed, the user can then monitor the job with a command such as qstat ("queue status"). qstat prints some status information, but it rarely answers "why is my job not running?" (it does provide hints, though sending a message to the system administrator often seems like the easiest way to find out).

To make the scheduling problem a little trickier, in some cases we don't know how long an application will run, and jobs may require other resources as well (such as memory capacity, storage, or processor type). Resource scheduling is therefore not a simple task, but it is critical for cluster utilization. In fact, the rise of multi-core processors has made kernel-level scheduling more important (and, of course, more difficult) than ever before. At the kernel level, tasks must be scheduled and migrated between cores with caching in mind. Interestingly, high-level resource scheduling has now been extended down to the CPU: controlling which cores a job runs on is essential for getting the best performance.
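The multi-dimensional part of the problem can be sketched simply: a job can only be placed on a node if every requested resource is satisfied at once. The field names below are illustrative assumptions, not any real scheduler's syntax.

```python
# Sketch of multi-dimensional resource matching: a job fits on a node
# only if cores, memory, and processor type are all satisfied.
# Field names are illustrative, not real scheduler syntax.

def node_satisfies(node, request):
    return (node["free_cores"] >= request.get("cores", 1)
            and node["free_mem_gb"] >= request.get("mem_gb", 0)
            and request.get("cpu_type", node["cpu_type"]) == node["cpu_type"])

node = {"free_cores": 8, "free_mem_gb": 32, "cpu_type": "x86_64"}
print(node_satisfies(node, {"cores": 4, "mem_gb": 16}))         # True
print(node_satisfies(node, {"cores": 4, "cpu_type": "sparc"}))  # False: wrong CPU
print(node_satisfies(node, {"cores": 16}))                      # False: too few cores
```

Adding dimensions like this is exactly what turns scheduling from simple bin packing into the hard, utilization-critical task the paragraph describes.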

Why has resource scheduling become the new cool tool in high-performance computing? Not because of a new front-end GUI or some other mysterious feature. The real reason is cloud computing. That does not mean the cloud will soon be everywhere; in fact, resource scheduling will put the cloud in its proper place.

Recently, I heard David Perel of the New Jersey Institute of Technology describe experiments with dynamic resource allocation for Apache Hadoop using Sun Grid Engine (SGE). I then read more deeply into articles on the Sun Grid Engine update. The new version has two attractive features: the first is cloud computing support, and the second is Hadoop integration.

More specifically, the new version of SGE allows jobs to spill over into a cloud such as Amazon's EC2, with SGE managing the connection. For EC2, the user needs to build an AMI image for the application and supply EC2 account credentials. Once this is done, the user can submit jobs to the queue, and when local resources are exhausted the jobs "cloud burst" to EC2.
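SGE's actual EC2 integration is configured through SGE itself, the user's AMI, and account credentials, but the "cloud burst" decision can be illustrated roughly as follows. The dispatch function and its return values are hypothetical, purely to show the shape of the decision.

```python
# Rough illustration of a "cloud burst" decision: run a job locally
# when enough cores are free, otherwise send it to the cloud.
# This dispatch() function is hypothetical; it is not SGE's API.

def dispatch(job_cores, local_free_cores, burst_enabled=True):
    if job_cores <= local_free_cores:
        return "local"
    if burst_enabled:
        return "ec2"      # would launch instances built from the user's AMI
    return "queued"       # no bursting: job waits for local cores

print(dispatch(16, 64))                         # local
print(dispatch(256, 64))                        # ec2
print(dispatch(256, 64, burst_enabled=False))   # queued
```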

Another new feature is integration with Hadoop. If you don't know what Hadoop is, Google it. Standing up a Hadoop cluster is not easy. Hadoop implements MapReduce, a powerful pattern for processing large data sets that does not rely on a single database. Typically, a MapReduce job starts map tasks on many servers, with each task working on the data stored on its own local hard drive, and then reduces the partial results into a final answer. SGE has been enhanced so that Hadoop jobs can now be submitted directly.
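The MapReduce pattern itself is simple enough to sketch in plain Python. The classic word-count example below is a single-machine stand-in for what Hadoop distributes across many servers, with each map task reading from its own local disk.

```python
# Word count in the MapReduce style, shrunk to one machine: map each
# input chunk to (word, 1) pairs, then reduce by summing counts per
# word. Hadoop runs the same pattern across a cluster.
from collections import defaultdict

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in this chunk of input.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["the cloud", "the cluster the cloud"]   # stand-ins for HDFS blocks
pairs = [p for chunk in chunks for p in map_phase(chunk)]
print(reduce_phase(pairs))   # {'the': 3, 'cloud': 2, 'cluster': 1}
```

The point of the pattern is that the map phase is embarrassingly parallel, which is exactly why a scheduler like SGE can farm it out across cluster nodes.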

At this point, high-performance computing in the cloud is a mixed blessing. Unless you use a specially designed HPC cloud, like Penguin's POD service, the I/O resources that are critical to HPC performance can vary widely. This may change as individual servers come to contain more cores. HPC application surveys show that 57% of HPC users run jobs on 32 or fewer cores; this matches Clustermoney.net's survey figure of 55%. When cloud computing starts using 48-core servers, it may eliminate some server-to-server communication problems.

High-performance computing may thus take a different approach to cloud computing, built on dense multi-core servers. A user could submit jobs to SGE from the desktop, and this kind of resource scheduling would reach out to local or cloud resources capable of running virtual machines. That could make HPC valuable from the desktop. It sounds like grid computing, but it's simpler.
