A "Flood" of Big Data: Will Many-Core Push Parallel Computing into the Mainstream?

Source: Internet
Author: User
Keywords: Intel, coprocessor, big data, parallel computing

This morning Beijing time, at the SC12 supercomputing conference held in Salt Lake City, Intel officially released the Xeon Phi coprocessor, based on its Many Integrated Core (MIC) architecture. The first model, the Xeon Phi coprocessor 5110P, is shipping now with general availability on January 28, 2013, at a recommended customer price of $2,649; the Xeon Phi coprocessor 3100 family will be available in the first half of 2013 at a recommended customer price below $2,000.

The Intel Xeon Phi coprocessor family: the 5110P and the custom SE10P use passive cooling and are intended for data centers; the 3100 series offers both passive and active cooling options, making it suitable for any environment, including workstations

Despite ARM's strong challenge in the consumer market, and despite seeing its market capitalization surpassed for the first time, Intel still has strong backing in the enterprise market: the Xeon family continues to encroach on RISC territory in servers and storage, while AMD, its rival in the x86 camp, has been forced to "defect" to ARM.

Think back more than a decade: RISC ruled the data center while x86 controlled desktop computing. Now the situation is almost reversed. On the consumer front end, ARM's success in smartphones and tablets threatens the PC, and it is opportunistically attacking the enterprise back end, hoping to repeat the x86 story. As the saying goes, "an army has no constant formation, as water has no constant shape." Technologies of the same era rarely hold absolute advantages or disadvantages; the winner is whoever best exploits the circumstances and moves with the times.

The Intel Xeon Phi coprocessor works alongside Intel Xeon CPUs in the form of a PCI Express (PCIe) card

In the enterprise market, Intel follows the trends: cloud computing last year, big data this year. It is never the first to preach a concept, but neither is it ever out of date. Logically, cloud computing and big data could only become hot topics because x86 proliferated at the infrastructure level; yet whenever Intel latches onto big data, people feel something is off.

Intel indicates that the Xeon Phi coprocessor core adds a number of features to the P54C design, including 64-bit support. In the current core plus L2 cache, less than 2% of the area is x86-specific component (x87 logic)

I remember the second Big Data World Forum this July: a reporter friend walked into the venue, saw Intel's logo, and exclaimed, "What does Intel have to do with big data?" That sparked a lively discussion on Weibo about the popularity of Hadoop (though of course Hadoop cannot be equated with big data). Last month, when Intel introduced the Xeon Phi processor and associated HPC (high-performance computing) with big data, it again drew pushback from peers.

I would rather read this as an aversion to "big data" being attached to every recent industry trend. Setting that layer aside, Intel's point stands: as I have found in private discussions with several circles of friends, the big-data applications typified by Hadoop have much in common with high-performance computing, above all a high degree of parallelism, from computing through to I/O.

From computing to storage, big data resembles high-performance computing

Subsequently, when China's HPC TOP100 rankings came out at the end of October, 4 of the top 10 systems were installed at Internet service providers. Overall, 35 systems on the list, or 35%, were applied to Internet services, keeping that sector in first place among industries with a markedly higher share than before.

This is not to say that Internet services equal Hadoop, or big data, but the applications are at least closely related, and they are a long way from scientific computing in the "traditional sense." To some extent, Internet services and big-data applications have expanded the scope of HPC, helping it out of the ivory tower and into everyday life.

Following this thread, we are in an era of accelerating transition to parallel processing. The CPU's emphasis on multi-core and multithreading goes without saying; the hard disk drive (HDD), facing partial or total replacement by the solid-state drive (SSD), reflects the same truth. When Intel was promoting SSDs, one of its chief indictments of the hard drive was that its performance had improved only 1.3x in a decade, far behind the CPU's progress. That assessment is somewhat biased, and I will not examine it here, but it catches the key point: for many years the hard drive has had only one head working at any moment, so its parallelism is very poor, and performance gains have come almost entirely from speeding up its mechanical parts, to limited effect. Not so the SSD: although each flash chip offers modest performance and capacity, the controller can read and write multiple chips over multiple channels simultaneously. With that degree of parallelism, SSD performance easily leaves the hard drive several streets behind.

The HPC market's compound annual growth rate (CAGR) is close to that of the cloud

Although its performance has barely grown alongside its capacity, the hard drive's strength in capacity and price is still beyond the SSD's reach. Since the hard drive is not parallel, try to avoid making it do two things at once (that is, reduce random access). For example, in my office setup, Outlook runs inside a virtual machine; closing Outlook and shutting down the virtual machine both write data to the hard disk. If I close Outlook and, without waiting for its data files to finish writing, immediately shut down the virtual machine, the two write operations overlap and the shutdown takes a very long time. If instead I wait for Outlook to close completely before shutting down the virtual machine, the total time is significantly shorter. In other words, on a system that lacks parallelism, executing two tasks fully serially takes less time than switching back and forth between them. (1 + 1 < 2?)

The hard drive's parallel capability is poor, but multiple hard drives working at the same time can provide both parallel access and large capacity, which is exactly what storage systems (RAID) and Hadoop do.

TACC's Stampede system ranked 7th in the freshly released Top500 rankings, benefiting from thousands of custom Xeon Phi SE10P coprocessors

And if you need the ultimate in parallel access capability, as with Taobao's database during past "Double 11" sales, when closed transactions alone ran over a hundred million in a single day, it is unimaginable to do without high-performance PCIe SSDs.

What is the point? Match the degree of parallelism to the combination of devices. The hard drive is not designed for concurrent access, but in sequential access its throughput is not much worse than an SSD's, and it keeps its advantage in capacity and price. When the degree of parallelism is modest, a combination of hard drives will do; as demand for parallel access grows, introduce SSDs, or even rely on them entirely.

Intel Xeon Phi coprocessor SE10P

In Hadoop systems, however, the hard drive still dominates, SSDs are relatively rare, and per-node memory capacity is not very large, even though vendors are advocating "in-memory computing." The Internet industry's culture is to distribute tasks through the overall architecture as much as possible rather than rely on expensive hardware. Compared with the supercomputers at the tip of the pyramid, these clusters are a civilian version of HPC: they stress the ratio of input to output and are easier to replicate.

At TACC's booth, information on the Intel Xeon Phi coprocessor SE10P was on display, including its 61 cores and 8GB of GDDR5 memory; note the TACC Stampede and MIC coprocessor displays below

Now look back at computing. The parallelism of an x86 CPU is beyond any hard drive's, but next to a GPU designed for high parallelism the gap is very obvious. The Titan system, which has just taken the top spot on the new global Top500 list, relies on NVIDIA Tesla K20X GPU accelerator chips.

The top 10 of the freshly released Top500 list; note entries 1, 7 and 8

The Stampede system at the Texas Advanced Computing Center (TACC), a hybrid of Dell PowerEdge C8220X servers with 8-core Xeon E5-2680 CPUs and Intel Xeon Phi coprocessors, narrowly beats the list's number one of two years ago (also a CPU + GPU machine), but still trails the Titan system by a wide margin.

TACC's Stampede system employs 6,400 Dell PowerEdge C8220X blades, each with two 8-core Intel Xeon E5-2680 processors and 32GB of memory

Intel's own GPUs are a weak link, and it can hardly sit by and watch AMD's or NVIDIA's GPUs grow bigger, so it naturally opposes the CPU + GPU hybrid system. Intel's answer is to position the Xeon Phi as a coprocessor that replaces the GPU and assists the CPU with highly parallel tasks. Attacking the GPU-as-coprocessor approach, Intel emphasizes that adopting a GPU requires a great deal of reprogramming and is not universal. There has been much verbal sparring over this; my knowledge of HPC is limited and I am no programming expert, so I will not repeat it here, and will mainly discuss Intel's approach instead.

Dell PowerEdge C8220X blade server

First, Intel emphasizes that the Xeon E5 is the cornerstone of HPC. There are several layers to this. Among x86 CPUs, the E5 holds a clear advantage: in a CPU + GPU pairing, the latter cannot afford to be dragged down by the former, and although AMD's Opteron has more cores, it is on the whole at a disadvantage. In addition, the Xeon E5 platform integrates PCI Express on the processor, so the interface sits that much closer to the Xeon Phi, further shortening latency.

After Intel announced the Xeon Phi (then called MIC) coprocessor, NVIDIA wrote that there is "no free lunch," meaning the claim that MIC could run x86 programs without code changes was nonsense. James Reinders, Intel's director of parallel programming, responded humorously that parallel programming is indeed important, and nobody gets a free lunch.

Intel understands as clearly as NVIDIA that many current programs are written serially and must be parallelized as far as possible to exploit the parallel computing power of a GPU or a Xeon Phi coprocessor. But James Reinders stressed that parallel programming also helps tap the potential of the CPU itself.

A Xeon CPU running parallel code developed for the Xeon Phi coprocessor can see its performance increase a hundredfold

He gave the example of SAXPY (Scalar Alpha X Plus Y, a combination of scalar multiplication and vector addition that is a common operation in parallel vector computing): the parallel code running on a Xeon Phi was 340.6 times as fast as serial code running on a 6-core Xeon E5-2600. But once the code running on the Xeon E5 was also compiled in parallel, that multiple (Xeon Phi over E5-2600) dropped sharply to 2.3.

Performance improvement of a single Xeon Phi coprocessor (right) over dual Xeon E5s (left)

Intel's aim is to demonstrate that highly parallel devices such as the Xeon Phi require highly parallel programming, and that ordinary (moderately parallel) processors such as the E5 can benefit from it too. In the parallel era, parallel programming will be necessary regardless. James Reinders then poses the question: wouldn't you want to meet highly parallel requirements with the same languages, the same parallel programming models, and similar tools?

In other cases the Xeon Phi coprocessor delivers a performance improvement of up to 10x

TACC's Jay Boisseau notes that users want performance jumps without paying the price (changing code); but what happens when, having done the thing they did not want to do (in order to improve performance), they find themselves locked into a particular hardware architecture (the GPU)? The Xeon Phi may not match the GPU's performance on common parallel computations, but it solves the problem of hardware-specific coding and can be programmed without restriction in Fortran, C, and C++. Since the Xeon Phi runs serial applications slowly, it should be paired with the Xeon E5.

In summary, the Xeon Phi combines leading performance with the benefits of the standard CPU programming model, which is the main reason the Stampede system pairs it with the Xeon E5.

Several cases of the Xeon CPU and Xeon Phi coprocessor working together

Every user has sufficient reasons for choosing a particular solution. Whether, in the near future, the Xeon CPU + Xeon Phi coprocessor combination can defeat CPU + GPU on the Top500 list, or even take the throne, is not something I can judge.

My view is that while "big data" does risk being overhyped, Intel's use of big data as an example of parallel computing, even linking it with HPC, is not necessarily a whim to hitch a ride on a hot concept. The Xeon Phi only truly reaches the market in early 2013, and in the short term it must contend in the traditional HPC field with a CPU + GPU architecture that already has considerable accumulation there (though there is still a market to be divided). But the Xeon Phi's compatibility advantage (relative to the GPU) may prove attractive across a broader range of data applications, especially the Internet services market that builds Hadoop clusters on the Intel Xeon platform. If the market accepts Intel's ideas, it could have the effect of "surrounding the cities from the countryside" on the HPC market...

Intel has done similar things before, and ARM is doing them now. The future? Leave that to the future.

(Responsible editor: Schpeppen)
