Build small clusters and use warehouse-scale computing as needed

Source: Internet
Author: User
Keywords: warehouse-scale computing

This approach lets an architect build a local cluster sized for the expected workload and overflow to on-demand cloud HPC to cope with peak load. Part 1 focuses on how system builders and HPC application developers can scale their systems and applications most efficiently.

Exotic HPC architectures built on custom-scaled processor cores and shared-memory interconnects are rapidly being replaced by on-demand clusters. These clusters use off-the-shelf general-purpose vector coprocessors, converged Ethernet (links of a gigabit per second or faster), and multicore headless servers. These new on-demand HPC resources resemble what is called warehouse-scale computing, where each node is homogeneous and headless and the focus is on total cost of ownership and overall power usage efficiency. However, HPC has requirements that go beyond those of social networks, web search, and other typical warehouse-scale computing solutions. This article focuses on how system builders and HPC application developers can scale their systems and applications most efficiently.

Migrating to High-performance Computing

Since 1994, supercomputers on the TOP500 and Green500 lists have typically not been custom designs. Instead, they are designed and integrated from off-the-shelf headless servers, converged Ethernet or InfiniBand cluster interconnects, and general-purpose graphics processing unit (GP-GPU) coprocessors, which are used not for graphics but for single program, multiple data (SPMD) workloads. High-performance computing (HPC) is moving away from exotic custom processor and memory interconnect designs and toward off-the-shelf equipment (warehouse-scale computing), driven by the need to control total cost of ownership (TCO), increase power efficiency, and balance operating expenses (OpEx) against capital expenditures (CapEx) for both new and established HPC operations. This means that you can build your own small cluster in a similar way and use warehouse-scale HPC resources on demand when you need them.
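The idea of running on a small local cluster and overflowing to on-demand resources can be sketched as a simple placement rule. This is a minimal illustration, not a real scheduler: the `Job` class, the `place_jobs` function, the job names, and the core counts are all hypothetical, and the greedy first-fit policy is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int  # cores the job needs for its whole run (hypothetical unit)


def place_jobs(jobs, local_cores):
    """Greedy first-fit: fill the local cluster, send the overflow to the cloud."""
    free = local_cores
    placement = {}
    for job in jobs:
        if job.cores <= free:
            placement[job.name] = "local"
            free -= job.cores
        else:
            placement[job.name] = "cloud"
    return placement


if __name__ == "__main__":
    # Hypothetical peak-load queue on a 128-core local cluster.
    queue = [Job("cfd", 64), Job("render", 48), Job("analytics", 32)]
    print(place_jobs(queue, local_cores=128))
```

With the hypothetical queue above, the first two jobs fit locally and the third overflows to the cloud. A production workload manager would also account for queue wait time, data locality, and cost.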

The famous 3D torus interconnect used by Cray and others has never completely disappeared (today, about one-third of the TOP500 are massively parallel processors [MPPs] and two-thirds of the top performers use a cluster architecture). However, the focus on efficiency and on new OpEx metrics, such as the Green500's floating-point operations per second (FLOPS) per watt, is driving HPC development and keeping architecture focused on clustering. In addition, many interesting applications today are data-driven (for example, digital video analytics), so many systems need not only traditional sequential high-performance storage for HPC checkpoints (the saved state of a long-running job) but also random access to large structured (database) and unstructured (file) datasets. Big data access is a requirement shared by traditional warehouse-scale computing for cloud services and by current and emerging HPC workloads. As a result, warehouse-scale computing is not HPC, but HPC applications can use data center-inspired technology for on-demand cloud HPC, provided they are designed for it from the outset.
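The FLOPS-per-watt metric mentioned above is a plain ratio of sustained performance to power draw. The sketch below shows the arithmetic only; the function name and both sets of numbers are hypothetical and do not describe any real system.

```python
def gflops_per_watt(sustained_gflops, power_watts):
    """Energy efficiency as sustained GFLOPS divided by average power in watts."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return sustained_gflops / power_watts


if __name__ == "__main__":
    # Hypothetical systems: a small efficient cluster and a larger, hungrier one.
    small = gflops_per_watt(sustained_gflops=20_000.0, power_watts=10_000.0)
    large = gflops_per_watt(sustained_gflops=1_000_000.0, power_watts=800_000.0)
    print(f"small cluster: {small:.2f} GFLOPS/W")
    print(f"large system:  {large:.2f} GFLOPS/W")
```

In this made-up comparison the larger system delivers far more total performance, yet the smaller cluster ranks higher on an efficiency list.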

The focus of scalable computing architecture has shifted over time, in roughly four phases:

Early focus on a fast single processor (uniprocessor), pushing the stored-program arithmetic logic unit CPU to the highest possible clock rate and instruction throughput: John von Neumann, Alan Turing, Robert Noyce (founder of Intel), Ted Hoff (Intel's proponent of the universal processor), and Gordon Moore saw initial scaling as the challenge of scaling digital logic and clocking a processor as fast as possible. Until at least 1984 (and possibly later), the general view was that "the processor makes the computer." Cray Computer designed vector computers (X-MP and Y-MP) and distributed-memory multiprocessors connected by a six-way interconnected 3D torus for custom MPP machines, but these designs were unique to supercomputing. IBM's early focus was on scalable mainframes and fast uniprocessors until 1999, when it announced the IBM® Blue Gene® architecture, which combines a multicore IBM® POWER® architecture with a 3D torus interconnect. The current TOP500 contains a number of Blue Gene systems, which have often held the top spot in the LINPACK-measured TOP500.

From 1994 until recently, HPC evolved toward a few custom MPP machines and mostly off-the-shelf clusters, using both custom interconnects (such as those of Blue Gene and Cray) and off-the-shelf converged Ethernet (10G, 40G) and InfiniBand: The TOP500 has become cluster-dominated, and clusters make up most of today's top-performing HPC solutions (two-thirds). As the TOP500 chart by architecture since 1994 shows, clusters and MPP now dominate (relative to single instruction, multiple data [SIMD] vector machines, fast uniprocessors, symmetric multiprocessing [SMP] shared memory, and other, more obscure architectures). John Gage of Sun Microsystems (now Oracle) stated that "the network is the computer," referring to distributed systems and the Internet, but low-latency networks within clusters have likewise become core to scaling. Coprocessors attached to cluster nodes through memory-mapped I/O, including GP-GPUs and hybrid field-programmable gate array (FPGA) processors, are used to accelerate specific compute workloads on each cluster node.

Warehouse-scale computing and the cloud emerge, with a focus on MapReduce and on what HPC calls embarrassingly parallel applications: The TOP500 is measured with LINPACK and FLOPS, so it does not focus on operating costs (such as FLOPS/watt) or on data access. Memory access is critical, but storage access is less so, except for job checkpoints (so that a job can be restarted if needed). Many data-driven applications have emerged in the new century, including social networks, Internet search, global geographic information systems, and analytics associated with more than a billion Internet users. This is not HPC in the traditional sense but warehouse-scale computing operating at massive scale. Luiz André Barroso states that "the datacenter is the computer," which marks a second shift away from a processor-centric focus. Data centers are highly focused on OpEx and CapEx, so they are a better fit for HPC where FLOPS/watt and data access matter. Google data centers have a power usage effectiveness (PUE) of less than 1.2; PUE is the total energy a facility consumes divided by the energy used for computation. Amazon launched Amazon Elastic Compute Cloud (Amazon EC2), which is best suited to web services but has some scalable and high-throughput computing features.

On-demand cloud HPC services expand, with an emphasis on clusters, storage, coprocessors, and elastic scaling: Many private and public HPC clusters appear in the TOP500 rankings. They run Linux® and use common open source tools, so users can build and scale applications on small clusters and then migrate to the cloud for on-demand handling of large jobs. Some companies, such as Penguin Computing with its Penguin On-Demand service, offer off-the-shelf clusters (InfiniBand and converged 10G/40G Ethernet), Intel or AMD multicore headless nodes, GP-GPU coprocessors, and scalable redundant array of independent disks (RAID) storage. IBM Platform Computing provides IBM xSeries® and zSeries® on-demand computing, along with workload management tools and features. Many universities and start-ups use on-demand HPC, through cloud services or off-the-shelf clusters, to supplement their private services. Two examples I know of are the University of Alaska Arctic Region Supercomputing Center (ARSC) Pacman (Penguin Computing) and the University of Colorado JANUS cluster supercomputer. A common Red Hat Enterprise Linux (RHEL) open source workload tool set and an open architecture allow applications to migrate from private to public cloud HPC systems.
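The PUE figure quoted in the list above is total facility energy divided by the energy used for computation. The following sketch works through that ratio and the overhead it implies. The function names and the kilowatt values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power over IT (compute) power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw, pue_value):
    """Power spent on cooling, power distribution, and so on, for a given PUE."""
    return it_equipment_kw * (pue_value - 1.0)


if __name__ == "__main__":
    # Hypothetical facility: 1,000 kW of compute load, 1,200 kW drawn in total.
    ratio = pue(total_facility_kw=1200.0, it_equipment_kw=1000.0)
    print(f"PUE: {ratio:.2f}")
    print(f"overhead at that PUE: {overhead_kw(1000.0, ratio):.0f} kW")
    print(f"overhead at PUE 2.0:  {overhead_kw(1000.0, 2.0):.0f} kW")
```

A PUE of 1.0 would mean that every watt goes to computation, so values closer to 1.0 indicate less overhead.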

Figure 1 shows the TOP500's migration to clusters and MPP since the mid-1990s.

Figure 1. The evolution of TOP500 to clusters and MPP since 1994

The on-demand cloud HPC approach requires well-defined off-the-shelf clusters and compute nodes, as well as tolerance for WAN latency when transferring workloads. As a result, these systems are unlikely to take the top spots in the TOP500, but they may well place on the Green500 and provide efficient scaling for many workloads, and such clusters now make up the majority of the TOP500.
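Tolerance for WAN latency can be estimated with a rough break-even check: offloading pays off only if the cloud runtime plus the time to move the data is shorter than running locally. The sketch below is a simplified model under stated assumptions. It treats the transfer as one bulk copy at a fixed bandwidth plus a single round-trip delay, and the function names, job sizes, and link speed are all hypothetical.

```python
def transfer_seconds(data_gb, bandwidth_gbit_s, rtt_s=0.0):
    """Time to move data_gb gigabytes over a link of bandwidth_gbit_s gigabits/s."""
    if bandwidth_gbit_s <= 0:
        raise ValueError("bandwidth_gbit_s must be positive")
    return rtt_s + (data_gb * 8.0) / bandwidth_gbit_s


def cloud_is_faster(local_runtime_s, cloud_runtime_s, data_gb, wan_gbit_s, rtt_s=0.05):
    """True if cloud compute time plus WAN transfer time beats the local runtime."""
    total_cloud = cloud_runtime_s + transfer_seconds(data_gb, wan_gbit_s, rtt_s)
    return total_cloud < local_runtime_s


if __name__ == "__main__":
    # Hypothetical long job: 8 hours locally, 1 hour in the cloud, 500 GB to move.
    print(cloud_is_faster(28_800.0, 3_600.0, data_gb=500.0, wan_gbit_s=1.0))
    # Hypothetical short job with the same data: the transfer dominates.
    print(cloud_is_faster(600.0, 120.0, data_gb=500.0, wan_gbit_s=1.0))
```

Under these assumptions, long compute-bound jobs tolerate the transfer cost well, while short jobs with large datasets are better kept on the local cluster.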
