Multi-core technology

Last Update: 2018-07-26 Source: Internet

Author: User

Tags: Xeon E5


1. First, distinguish multi-core from multi-processor, that is, a multi-core CPU from multiple CPUs. http://www.zhihu.com/question/20998226

Architectures can vary endlessly; what matters is designing for the requirements and weighing all the considerations together.
Let's take an example. Suppose we are designing the processor part of a computer's architecture, and we have two options: multiple single-core CPUs, or a single multi-core CPU.

If we choose multiple single-core CPUs, each CPU needs its own largely independent supporting circuitry and its own cache, and the CPUs communicate with each other over the bus on the motherboard. Suppose we run a multi-threaded program on this architecture (the typical case) and ignore hyper-threading. Each thread then runs on a separate CPU, all cooperation between threads has to go over the bus, and shared data is likely to be present in several caches at the same time. The bus overhead is therefore large relative to the useful work being done. With so many caches, even if we do not mind the wasted storage capacity, how do we keep them consistent (cache coherence)? And even if all of this can be built, the extra CPUs take up more space on the motherboard, which makes layout and routing considerably harder.

If we choose a single multi-core CPU, we need only one chipset and one memory system. The cores communicate over the chip's internal bus and share memory. When we run a multi-threaded program on this architecture, communication between threads is faster than in the previous design. And once built, it takes up little space on the board, so the pressure on layout and routing is also low.

A single multi-core CPU looks like a clear win. But what if you need to run several large programs at the same time? Suppose there are two large programs, each well multi-threaded and each almost filling the cache. When they share the CPU, every switch between them costs a great deal just in replacing instructions and data in the cache.
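The switching cost can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of real hardware. It assumes a fully associative cache with LRU replacement and a hypothetical capacity of 100 entries. Two programs each repeatedly touch a working set that fills 90% of that cache. The class and function names are illustrative only, and real CPU caches are set-associative and work in cache lines.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement that counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)        # hit: mark as most recently used
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used entry


def run(cache, working_set, passes):
    """One time slice: the program loops over its working set `passes` times."""
    for _ in range(passes):
        for addr in working_set:
            cache.access(addr)


CAPACITY = 100                 # hypothetical cache size (entries)
prog_a = range(0, 90)          # program A's working set: 90% of the cache
prog_b = range(1000, 1090)     # program B's working set: another 90%

# Case 1: program A runs alone for 10 time slices.
alone = LRUCache(CAPACITY)
for _ in range(10):
    run(alone, prog_a, passes=5)

# Case 2: programs A and B alternate, 10 time slices each.
shared = LRUCache(CAPACITY)
for _ in range(10):
    run(shared, prog_a, passes=5)
    run(shared, prog_b, passes=5)

print(alone.misses, shared.misses)
```

In this model, A running alone misses only on its first pass. When A and B alternate, each program has evicted the other's data by the time the other is scheduled again. Every time slice therefore starts with a full reload of its working set.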

So most of the computers we use have a single multi-core CPU. Our Dell T3600, for example, has an Intel Xeon E5-1650 with 6 cores, which Hyper-Threading presents as 12 logical cores. A small number of high-end users who need more multitasking power use multiple multi-core CPUs; a Mac Pro, for example, can have two.
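A quick way to see how many logical cores a machine exposes is Python's standard library; this is a small sketch, not part of the original article. `os.cpu_count()` returns the number of logical CPUs. On a machine like the Xeon E5-1650 above with Hyper-Threading enabled, one would expect it to report 12 rather than 6. The standard library does not report physical core counts.

```python
import os

# os.cpu_count() counts *logical* CPUs (hardware threads), not physical cores.
# It returns None if the count cannot be determined.
logical = os.cpu_count()
print(f"logical CPUs visible to the OS: {logical}")

# On Linux, os.sched_getaffinity(0) gives the set of CPUs this process may run on.
if hasattr(os, "sched_getaffinity"):
    print(f"CPUs usable by this process: {len(os.sched_getaffinity(0))}")
```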

2. Understanding multi-core technology (Baidu Encyclopedia)

Multi-core refers to integrating two or more complete compute engines (cores) in a single processor. Multi-core technology arose because engineers recognized a problem with the approach taken in earlier processor generations. Raising the clock speed of a single-core chip produces too much heat without a corresponding gain in performance. They realized that, at the rate seen in earlier products, the heat density of a processor would soon exceed that of the surface of the sun. Even setting heat aside, the price/performance ratio was unacceptable, because faster processors were much more expensive.

Multi-core technology lets servers process tasks in parallel that in the past might have required multiple processors. Multi-core systems are easier to scale and deliver stronger processing performance in a slimmer form factor. They also consume less power and generate less heat per unit of computing power. Multi-core is an inevitable stage of processor development. Over the past 20 years, two main factors have driven microprocessor performance: rapid progress in semiconductor process technology and continuous development of processor architecture.

Each advance in semiconductor process technology raises new problems for microprocessor architecture research and opens up new fields. Advances in architecture, in turn, further improve microprocessor performance on top of the progress in semiconductor technology. The two factors influence and reinforce each other. Generally speaking, advances in process and circuit technology have increased processor performance by about 20 times, advances in architecture by about 4 times, and advances in compiler technology by about 1.4 times. Today, however, this trend is difficult to sustain. Multi-core is therefore an inevitable product of both technological development and application demand.
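The source does not say how these three factors combine. Assume, roughly, that they are independent and compound multiplicatively. Under that assumption, the figures above imply a combined improvement of about two orders of magnitude:

$$
S_{\text{total}} \approx S_{\text{process}} \times S_{\text{architecture}} \times S_{\text{compiler}} \approx 20 \times 4 \times 1.4 = 112
$$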

A chip multiprocessor (CMP) improves program parallelism by integrating multiple microprocessor cores on one chip. Each core is essentially a relatively simple single-threaded or multithreaded microprocessor, so the cores can execute program code in parallel, giving high thread-level parallelism. Because a CMP uses relatively simple microprocessors as its cores, it offers several advantages:

- high clock frequency
- short design and verification cycles
- simple control logic
- good scalability
- ease of implementation
- low power consumption
- low communication latency

In addition, a CMP can exploit both instruction-level and thread-level parallelism across different applications. Applications with high thread-level parallelism, such as commercial workloads, can make good use of this structure to improve performance. The chip multiprocessor has now become the inevitable direction of processor architecture development.

3. Data-level parallelism, instruction-level parallelism, and thread-level parallelism http://www.zhihu.com/question/21823699

The processor is the machine that executes instructions, and performance is one of its most important metrics. The book "Computer Architecture: A Quantitative Approach" gives the following processor performance equation:

CPU time = instruction count × CPI × clock cycle time

Here CPI (cycles per instruction) is the average number of clock cycles each instruction takes. To improve performance, we can therefore work on any of the three factors.

First, reduce the instruction count. Both hardware and software can help:
<1> Eliminate redundant instructions through compiler optimization.
<2> Use data-level parallelism. In multimedia processing, the same operation is often applied repeatedly to many data items, so many processors provide SIMD (Single Instruction, Multiple Data) instructions. For example, an ordinary processor would execute the following two instructions:

and Dest1, op_a, op_b
and Dest2, op_c, op_d

On a SIMD processor, a single instruction may be enough:

simd_and Dest1, Dest2, op_a, op_b, op_c, op_d

The disadvantage is that the hardware cost is slightly higher.
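The instruction-count effect of SIMD can be made concrete with a toy model that feeds into the CPU time equation. This is a sketch under stated assumptions, not real vector code. It assumes a hypothetical 4-lane SIMD width and counts one "instruction" per scalar AND or per 4-wide group. It also assumes the scalar and SIMD instructions have the same CPI (1) and cycle time (1 ns). All function names are illustrative.

```python
LANES = 4  # hypothetical SIMD width, e.g. four 32-bit lanes in a 128-bit register


def scalar_and(a, b):
    """Element-wise AND, counting one 'and' instruction per element pair."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x & y)
        instructions += 1
    return out, instructions


def simd_and(a, b, lanes=LANES):
    """Element-wise AND, counting one 'simd_and' instruction per group of `lanes` pairs."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x & y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


a = list(range(1000))
b = [0b1010] * 1000

scalar_out, n_scalar = scalar_and(a, b)
simd_out, n_simd = simd_and(a, b)
assert scalar_out == simd_out  # same results, fewer instructions

# Assumption: CPI = 1 and a 1 ns clock cycle for both instruction kinds.
print(n_scalar, cpu_time(n_scalar, 1.0, 1e-9))
print(n_simd, cpu_time(n_simd, 1.0, 1e-9))
```

Under these assumptions, the instruction count drops by the SIMD width and CPU time drops by the same factor. In practice, the wider hardware mentioned above may change CPI or cycle time, so the real gain is usually smaller.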
Second, reduce the clock cycle time, which is equivalent to raising the clock frequency. There are two main ways to do this:
<1> Use a more advanced semiconductor process. This directly shortens the delay of the critical path, so more instructions can be completed in a fixed amount of time; Moore's law reflects this kind of progress.
<2> Use pipelining, a form of instruction-level parallelism. The execution path is split into several stages to reduce waiting, so that at any moment n instructions are in execution. The principle is the same as an industrial assembly line. The drawback is that dependences between instructions cause stalls, so an n-fold speedup is hard to achieve in practice.

Third, reduce the CPI, the average number of clock cycles each instruction takes. Executing multiple instructions per cycle is instruction-level parallelism. The pipelining described above is one such technique, and multiple issue is another.

Multithreading deserves separate mention. The multithreading we usually hear about is at the operating-system level. For example, while we chat on QQ and listen to music, the operating system time-slices the processor between the QQ program and the music player. It switches so fast that we do not notice. Processor multithreading is similar. Suppose the processor is running three programs, T1, T2, and T3 (we call them three threads). On a single-issue processor, only one thread can occupy the processor at any moment. At some point, T1 may reach an operation that is slow but does not need the processor, such as fetching data from the hard disk. Rather than suffer the performance loss of the processor waiting a long time, we can switch to executing T2's code, that is, switch to thread T2.
Of course, T1's breakpoint (its execution context) must be saved before the switch, so that the processor can later switch back to T1 and continue from where it stopped. This is also thread-level parallelism, and in effect it reduces the CPI.
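Python cannot show hardware multithreading directly. The same latency-hiding idea can be sketched at the operating-system level with threads: while one thread waits on slow I/O, another can use the CPU. In this sketch, `time.sleep` is a stand-in for a slow disk read (an assumption; no real I/O happens). CPython releases the GIL while sleeping, so the other threads can run during the wait. The task names T1 to T3 mirror the example above, and the delay value is hypothetical.

```python
import threading
import time

IO_DELAY_S = 0.2  # hypothetical stand-in for a slow disk read


def task(name, log):
    """Do a short burst of computation, then block on (simulated) I/O."""
    total = sum(range(10_000))          # short burst of CPU work
    log.append((name, "computed", total))
    time.sleep(IO_DELAY_S)              # waiting: the CPU is free for other threads
    log.append((name, "I/O done"))


def run_sequential(names):
    log = []
    start = time.perf_counter()
    for n in names:
        task(n, log)                    # each wait stalls every task behind it
    return time.perf_counter() - start, log


def run_threaded(names):
    log = []                            # list.append is atomic in CPython
    threads = [threading.Thread(target=task, args=(n, log)) for n in names]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # wait for all three threads to finish
    return time.perf_counter() - start, log


names = ["T1", "T2", "T3"]
seq_s, seq_log = run_sequential(names)
thr_s, thr_log = run_threaded(names)
print(f"sequential: {seq_s:.2f} s, threaded: {thr_s:.2f} s")
```

Run sequentially, the three waits add up. With threads, the waits overlap, so total time is close to a single wait. A hardware-multithreaded processor applies the same principle at a much finer time scale, switching between thread contexts in hardware.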

This article is an English version of an article originally published in Chinese on aliyun.com.
