How does HPC fit with the cloud?

Today, "cloud computing," with its promise of simplicity and low cost, has become a buzzword in the IT industry, and professionals and vendors alike hope it will transform the data center. Many users of high-performance computing (HPC) are likewise asking whether their workloads are suited to deployment in a cloud environment. Opinions differ mainly because the answers to two questions vary widely: "What is cloud computing?" and "What kinds of applications are HPC applications?"

HPC applications are not "one-size-fits-all," and no single type of application can stand for HPC as a whole. The shift in the early 2000s from large, centrally controlled mainframe computers to distributed computing clusters illustrates this fact. That shift not only brought commodity hardware into the HPC field but also gave customers more choice. HPC users could save on investment costs by building systems that met only the minimum requirements of their applications. As a result, some HPC systems have fewer nodes, each with more memory per processor but with lower bandwidth and higher latency between nodes, while other HPC systems are designed around different parameters. All are tailored to the needs of the application.

However, most businesses and IT departments face growing pressure to reduce costs, which is why the "pay-per-use" cloud model has emerged. The catch is that not all types of computation are suited to a cloud architecture.

Rackspace, Amazon, Savvis, and other IaaS providers use virtualization technologies to manage their underlying hardware resources. Unfortunately, the technology differs from vendor to vendor, and its details are sometimes confidential, as with AWS EC2. For HPC applications, therefore, the choice between virtual machines and physical machines needs to be discussed specifically before building an HPC cloud.

Virtualization Issues

HPC architects have been slow to adopt virtualization for two main reasons. First, virtualization is commonly perceived as seriously degrading application performance, so that its benefits are outweighed by the reduction in application throughput. Second, the utilization of traditional HPC infrastructure is already very high (usually 80% to 95%). As a result, the usual drivers of enterprise virtualization (increasing hardware utilization, consolidating servers, or increasing license utilization) are not enough to offset the added complexity and cost of running workloads on virtualized resources.

In many cases, however, HPC architects are willing to sacrifice 5% of application performance to gain the flexibility and resiliency that virtualization offers. The main reasons are as follows:

• Security: A virtual machine instance can be added to or removed from a virtual local area network (VLAN). Some HPC environments require data and hosts to be isolated between groups of users, or even between individual users. Traditionally, VLANs are tied to physical servers, which creates resource islands; under changing load, these islands lead to low resource utilization. Used together, virtual machines and VLANs can isolate users from one another and isolate data so that only users authorized to access it can do so.

• Application stack control: Many applications require specific operating system versions, updates, code libraries, and configurations. In a mixed application environment, where multiple applications share the same physical hardware, it is difficult to meet every application's requirements for a particular stack. Virtualization solves this problem, because the entire stack can be deployed as part of the application in a virtualized environment.

• Make the most of high-value assets: In heterogeneous HPC systems, the newest (and fastest) machines are usually in highest demand. To meet this demand, some enterprises use a reservation system to minimize conflicts between users. Unfortunately, reserved machines are often not fully utilized. By contrast, when computing jobs run in virtual machines, the migration tools in most hypervisors allow opportunistic workloads to use high-value assets even during a window reserved for a different user. When the user who holds the reservation finally submits a workload, the opportunistic workload can migrate to lower-value assets and continue processing without wasting any processor cycles.

• Handle long-running jobs: Several HPC applications do not provide checkpoint/restart functionality. Virtual machine technology, however, can capture and checkpoint the entire state of a virtual machine, which makes it possible to checkpoint applications that could not otherwise be checkpointed. If a job runs long enough to approach the mean time between failures (MTBF) of the entire solution, checkpointing at the virtual machine level becomes very appealing. In addition, if server maintenance is frequent or predictable, virtual machine checkpoints can be used to migrate or pause long-running jobs, preventing loss of computation time while removing obstacles to regular server maintenance.
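
The relationship between job length and MTBF in the last point can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes failures follow an exponential distribution, which the article does not state, and the function names and the sample figures (a 200-hour job, a 500-hour MTBF, a 4-hour checkpoint interval) are hypothetical.

```python
import math
from typing import Optional


def failure_probability(job_hours: float, mtbf_hours: float) -> float:
    """Probability of at least one failure while the job runs.

    Assumes exponentially distributed failures (an illustrative
    assumption, not something stated in the article).
    """
    return 1.0 - math.exp(-job_hours / mtbf_hours)


def worst_case_lost_hours(job_hours: float,
                          checkpoint_interval_hours: Optional[float]) -> float:
    """Worst-case work lost to a single failure.

    Without checkpoints the whole job may have to be rerun; with
    checkpoints, at most one checkpoint interval is lost.
    """
    if checkpoint_interval_hours is None:
        return job_hours
    return min(job_hours, checkpoint_interval_hours)


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    job, mtbf = 200.0, 500.0
    print(f"Failure probability: {failure_probability(job, mtbf):.1%}")
    print(f"Worst-case loss, no checkpoints: {worst_case_lost_hours(job, None)} h")
    print(f"Worst-case loss, 4 h checkpoints: {worst_case_lost_hours(job, 4.0)} h")
```

Under these assumptions, the longer a job runs relative to the MTBF, the more likely it is to be interrupted, while checkpointing caps the work that any single failure can destroy.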

Business justification for using an HPC cloud

Several key factors have prompted companies to consider implementing cloud computing, which can help companies cut costs and provide better services to their internal users. These factors are:

• Pay per use: Customers pay for the time an application runs and for the storage and data transfer services they use.

• Near-limitless infrastructure: Infrastructure is available almost instantly; conversely, when there is no workload, it can shrink back to nearly zero resources.

• Workload-based resource configuration: Operating systems and server types can be assigned at any time according to the workload, dramatically improving the flexibility of resource configuration.

Obstacles to implementing the HPC cloud

While cloud computing has many advantages, there are also many hurdles to overcome when considering whether it is suitable for an HPC environment.

• Security and intellectual property: The data in a cloud environment is often the core intellectual property of a commercial enterprise. The possibility that competitors may be using the same shared computing resources must not be overlooked. From a legal point of view, when a data leak occurs, intellectual property protection and compensation provisions give cloud computing users only limited recourse.

• Software licensing: Most commercial enterprises use third-party software from independent software vendors (ISVs) to run or manage HPC jobs. These applications are purchased under license agreements that specify where the application may run, often requiring that it run only at the customer's site.

• Data transfer: Unless the enterprise moves entirely to cloud computing and abandons its own data center, the models used for simulation and the results they produce must be transferred between the cloud provider and the customer's data center. This is complicated because Internet bandwidth limits the transfer of large files, and most infrastructure-as-a-service (IaaS) providers bill for all data transferred to and from the cloud environment.

• Pricing model: A pay-as-you-go model is often attractive to customers, but if a public cloud is used over a long period, the cost is usually two to three times that of owning and maintaining the hardware over two years. Companies should carefully determine the level of usage at which a public cloud is more advantageous than local servers.
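
The pricing point above comes down to a break-even calculation. The sketch below is a simplified illustration, not a costing method from the article: it compares a flat hourly cloud rate with a fixed cost of ownership over a period, and every figure in it (a 20,000 cost of ownership, a two-year period, a rate of 3.0 per hour) is hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def cloud_cost(hours_used: float, cloud_rate_per_hour: float) -> float:
    """Pay-per-use cost: only the hours actually used are billed."""
    return hours_used * cloud_rate_per_hour


def break_even_utilization(ownership_cost: float,
                           period_hours: float,
                           cloud_rate_per_hour: float) -> float:
    """Fraction of the period a server must be busy before owning it
    becomes cheaper than renting equivalent cloud capacity.

    ownership_cost is the total cost of buying and maintaining the
    hardware over the period (a hypothetical, all-inclusive figure).
    """
    return ownership_cost / (period_hours * cloud_rate_per_hour)


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    ownership = 20000.0
    period = 2 * HOURS_PER_YEAR
    rate = 3.0

    full_time_cloud = cloud_cost(period, rate)
    print(f"Cloud cost at 100% use: {full_time_cloud:.0f}")
    print(f"Ratio to ownership:     {full_time_cloud / ownership:.2f}x")
    print(f"Break-even utilization: "
          f"{break_even_utilization(ownership, period, rate):.0%}")
```

With these made-up numbers, continuous cloud use costs between two and three times as much as ownership, and the cloud is cheaper only below the break-even utilization. Data transfer and storage charges, which this sketch leaves out, would lower that threshold further.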

HPC-oriented cloud computing

HPC data centers must weigh the business rationale for cloud computing against the obstacles it faces to determine whether the model is appropriate for the enterprise and which variant suits it best.

Implementing an HPC cloud requires several tools: a virtual machine hypervisor platform, a workload manager, and an infrastructure management toolkit. The management toolkit should provide functions such as policy definition and enforcement, configuration management, resource reservation, and reporting. The hypervisor platform should provide a solid foundation for the virtualized part of the cloud's computing resources. Finally, the workload manager should provide job management.

For most large HPC environments, users need to consider either a private cloud or a hybrid cloud solution. In a hybrid cloud environment, an external public cloud can be used during peak demand, a practice called "cloud bursting." A smaller HPC environment might instead consider a public cloud, with all of its resources in the cloud. Whichever model is used, it is particularly important that the infrastructure include management software that can take full advantage of both physical and virtual resources, because HPC applications still run primarily on physical machines. Ideally, the management software should be able to merge the hypervisor environment and the physical environment into a dynamically shared infrastructure that supports multiple operating systems and heterogeneous environments.

Hybrid cloud scenarios can be very advantageous for HPC environments because they provide the extra computing power required to complete a job. Cloud bursting should be considered in particular in the following situations:

• The expected time to run the job is too long;
• The cumulative running time required to run the job locally is too long;
• The job does not need to transfer much data to and from the cloud environment.
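
The three conditions above can be combined into a simple comparison of time to result. The following sketch is one possible way to frame that comparison, not a policy from the article or from any workload manager: the `Job` fields, the `should_burst` rule, and the sample numbers are all hypothetical, and cost is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Job:
    local_wait_hours: float   # expected wait before the job can start locally
    local_run_hours: float    # expected run time on local resources
    cloud_run_hours: float    # expected run time on cloud resources
    data_gb: float            # total data moved to and from the cloud


def transfer_hours(data_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to move data_gb gigabytes over a link of
    bandwidth_mbps megabits per second (decimal units, no overhead)."""
    megabits = data_gb * 8 * 1000
    return megabits / bandwidth_mbps / 3600


def should_burst(job: Job, bandwidth_mbps: float) -> bool:
    """Burst only if the cloud delivers the result sooner than waiting
    for and running on local resources."""
    local_total = job.local_wait_hours + job.local_run_hours
    cloud_total = (transfer_hours(job.data_gb, bandwidth_mbps)
                   + job.cloud_run_hours)
    return cloud_total < local_total


if __name__ == "__main__":
    # Hypothetical job: long local queue, moderate data volume.
    queued = Job(local_wait_hours=12.0, local_run_hours=6.0,
                 cloud_run_hours=6.0, data_gb=450.0)
    print(should_burst(queued, bandwidth_mbps=1000.0))
```

In this framing, a job with a long local wait and modest data volume is a good bursting candidate, while the same job with no local wait is not, because the transfer time is pure overhead.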

Once an enterprise has identified the best way to implement cloud computing, it can evaluate IaaS providers in several ways to see which one best suits its application and workload requirements. Factors to consider include performance, reliability, the speed at which instances are created, and price, as well as each provider's negotiation and pricing processes and policies, and how its reliability measures up against its service level agreements (SLAs). Weighing each of these factors helps companies choose the provider that best suits their purposes.
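
One common way to weigh such factors is a weighted score per provider. The sketch below is only an illustration of that idea: the factor names follow the paragraph above, but the weights, the normalized 0-to-1 scores (where 1 is best), and the provider names are invented, and a real evaluation would need measured data.

```python
from typing import Dict, List


def score_provider(metrics: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted average of normalized factor scores (0 = worst, 1 = best)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total_weight


def rank_providers(providers: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[str]:
    """Provider names ordered from highest to lowest weighted score."""
    return sorted(providers,
                  key=lambda name: score_provider(providers[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical weights reflecting one enterprise's priorities.
    weights = {"performance": 0.4, "reliability": 0.3,
               "instance_startup": 0.1, "price": 0.2}
    # Hypothetical, already-normalized scores for two made-up providers.
    providers = {
        "provider_a": {"performance": 0.9, "reliability": 0.8,
                       "instance_startup": 0.5, "price": 0.4},
        "provider_b": {"performance": 0.6, "reliability": 0.9,
                       "instance_startup": 0.9, "price": 0.9},
    }
    for name in rank_providers(providers, weights):
        print(name, round(score_provider(providers[name], weights), 2))
```

The weights encode which factors matter most to a given workload, so two enterprises evaluating the same providers can reasonably arrive at different rankings.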

Author Introduction

Chris Porter is currently Platform's HPC Cloud product manager. He has extensive experience in HPC and cloud computing and has written numerous white papers on both subjects.

"Responsible editor: Xiao Ming TEL: (010) 68476606"
