What is the Bulldozer architecture in the CPU?

Source: Internet
Author: User

CMP and SMT

CMP: The CMP (chip multiprocessing) approach is straightforward. In simple terms, CMP raises a processor's performance on multithreaded software by replicating the physical core, which is the simplest and most direct way to add performance. Its disadvantages are that it is expensive to manufacture and that the manufacturing process limits how large the chip can become. CMP is also demanding of the workload: only a properly parallelized load can make full use of it, so many-core CMP designs often waste resources. In some applications, dual-core or quad-core processors with a higher frequency and a simpler structure can achieve better performance.
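The trade-off between more cores and a higher clock can be illustrated with Amdahl's law, a model that this article does not itself invoke. The sketch below is only a rough illustration: the parallel fractions, core counts, and clock speeds are made-up values, `relative_performance` is a hypothetical helper, and equal per-clock performance is assumed for both configurations.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_performance(parallel_fraction: float, cores: int, clock_ghz: float) -> float:
    """Toy metric: Amdahl speedup scaled by clock frequency (assumes equal IPC)."""
    return amdahl_speedup(parallel_fraction, cores) * clock_ghz


# Illustrative values only, not measurements of any real processor.
for p in (0.50, 0.95):
    quad = relative_performance(p, cores=4, clock_ghz=3.4)
    many = relative_performance(p, cores=16, clock_ghz=2.3)
    print(f"parallel fraction {p:.2f}: "
          f"4 cores @ 3.4 GHz = {quad:.2f}, 16 cores @ 2.3 GHz = {many:.2f}")
```

With these assumed numbers, the faster four-core configuration comes out ahead when only half of the work is parallel, while the 16-core configuration wins once 95% of the work is parallel.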

SMT: SMT (simultaneous multithreading) is a relatively inexpensive technique; Intel's Hyper-Threading, for example, allows two threads to run simultaneously on each physical core. The design idea of SMT is to make full use of each core's resources. If a physical core has only one thread of execution, the core sits idle whenever that thread stalls waiting for critical code or data from memory, so core utilization is low. SMT lets a physical core run two or more threads and switch between them dynamically: if one thread is stalled waiting on memory, the other thread's instructions can use all of the core's execution units, so the physical core is used more fully.
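The utilization argument can be made concrete with a toy simulation. This is a deliberately simplified sketch, not a model of Hyper-Threading or any real pipeline: it assumes a single issue slot per cycle, a fixed memory stall of four cycles, and a hypothetical `run` helper and op encoding invented for this example.

```python
def run(threads, stall_cycles=4):
    """Simulate one core with a single issue slot per cycle.

    Each thread is a string of ops: 'C' computes for one cycle, 'M' issues a
    memory access that blocks that thread for `stall_cycles` further cycles.
    Returns (total_cycles, busy_cycles).
    """
    pc = [0] * len(threads)         # index of the next op for each thread
    ready_at = [0] * len(threads)   # cycle at which each thread may issue again
    cycle = busy = 0
    while any(pc[t] < len(ops) for t, ops in enumerate(threads)):
        for t, ops in enumerate(threads):
            if pc[t] < len(ops) and ready_at[t] <= cycle:
                if ops[pc[t]] == "M":
                    ready_at[t] = cycle + 1 + stall_cycles
                pc[t] += 1
                busy += 1
                break               # only one op issues per cycle
        cycle += 1
    return cycle, busy


single = run(["CMCMC"])             # one thread alone on the core
paired = run(["CMCMC", "CMCMC"])    # two threads sharing the core
for label, (total, used) in (("1 thread", single), ("2 threads", paired)):
    print(f"{label}: {total} cycles, issue slot busy {used / total:.0%} of the time")
```

In this toy model a single thread needs 13 cycles, so running the two threads one after another would take 26, while running them together takes 15, because one thread fills the slots left empty by the other's memory stalls.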

For SMT to work, every part of the processor that stores code and data needs to be replicated or partitioned. For example, a two-thread SMT processor requires two sets of architectural registers and rename registers, one for thread A and the other for thread B. In addition, the shared instruction queue that makes up the instruction window needs to be large, so that the window can hold enough instructions from both threads to keep the execution units busy. Finally, any unit shared by the two threads, such as the instruction cache serving different parts of the pipeline, cannot be monopolized by either thread. In other words, the two threads of an SMT core must share resources closely with each other, so that the core's shared units are never left empty or without work from either thread.

Analysis of the Bulldozer structure

AMD's "Bulldozer" will use a 32 nm SOI process, which lets "Bulldozer" provide 33% more cores and 50% higher throughput than the "Magny-Cours" Opteron processor without increasing power consumption. Unlike all previous AMD processors, "Bulldozer" uses a modular design: each module contains two processor cores and somewhat resembles a single-core processor with SMT enabled. Each core has its own integer scheduler and four dedicated pipelines, while the two cores share a floating-point scheduler and two 128-bit FMAC (fused multiply-accumulate) units.

By contrast, in the K10 architecture the ALUs and AGUs share three pipelines (1.5 each on average), whereas in "Bulldozer" each core's integer unit has four pipelines: two dedicated to AGUs and two dedicated to ALUs. The L1 cache also differs. In K10, each core has a 64 KB L1 instruction cache and a 64 KB L1 data cache; in "Bulldozer", each core has a 16 KB L1 data cache and each module has a 64 KB two-way L1 instruction cache. Whether the smaller L1 cache will hurt performance remains to be seen. The two cores in a module share an L2 cache, and the modules share the L3 cache and the northbridge.
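The cache and pipeline figures are easier to compare when totalled for two cores. The short calculation below uses only the numbers quoted in this section; the constant and function names are made up for this example.

```python
# Figures as quoted in the text; cache sizes are in KB.
K10_L1I_PER_CORE = 64
K10_L1D_PER_CORE = 64
K10_INT_PIPES_PER_CORE = 3      # shared between ALUs and AGUs

BD_L1I_PER_MODULE = 64          # shared by the module's two cores
BD_L1D_PER_CORE = 16
BD_INT_PIPES_PER_CORE = 4       # 2 AGU + 2 ALU
BD_CORES_PER_MODULE = 2


def k10_totals(cores: int) -> dict:
    """Total L1 capacity and integer pipelines for a number of K10 cores."""
    return {"l1_kb": cores * (K10_L1I_PER_CORE + K10_L1D_PER_CORE),
            "int_pipes": cores * K10_INT_PIPES_PER_CORE}


def bulldozer_totals(modules: int) -> dict:
    """Total L1 capacity and integer pipelines for a number of Bulldozer modules."""
    cores = modules * BD_CORES_PER_MODULE
    return {"l1_kb": modules * BD_L1I_PER_MODULE + cores * BD_L1D_PER_CORE,
            "int_pipes": cores * BD_INT_PIPES_PER_CORE}


print("two K10 cores:       ", k10_totals(2))
print("one Bulldozer module:", bulldozer_totals(1))
```

On these figures, two K10 cores carry 256 KB of L1 cache and 6 integer pipelines, while one two-core Bulldozer module carries 96 KB of L1 cache and 8 integer pipelines.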

AMD "Bulldozer" module

"Module" versus "core" can be confusing. For users, there is no need to pay particular attention to the concept of a "module": it is only AMD's name for a design unit, and when products reach the market they will still be labeled by core count. For example, the "Interlagos" server processor, which uses the Bulldozer architecture, is described as having 16 cores, not 8 modules. As for the main reason this modular design was adopted, AMD said it was "to reduce the redundant circuit of CPUs."

With CMP, as the number of cores increases, the CPU's die area grows, the number of duplicated circuits rises, and so does power consumption, because CMP works by replicating the whole core. A modular design can greatly reduce this redundant circuitry, which matters more as the core count increases. In "Bulldozer", for example, the two cores of a module share the floating-point units. In most server applications, integer operations far outnumber floating-point operations (high-performance computing is the exception), so sharing the floating-point execution units does not affect performance in most applications. The integer units are not shared, because sharing them would create a bottleneck.

Having reviewed the characteristics of CMP and SMT, we can think of the AMD "Bulldozer" architecture as a design that sits between the two: two threads (cores) share the floating-point execution units, but each has its own integer execution resources. This looks like another form of SMT, or a "third way" devised by AMD. The difference from traditional SMT is that SMT replicates only the core's storage, giving each thread its own register file, whereas in the "Bulldozer" architecture each thread also gets its own integer execution hardware: every thread has a register file and a complete set of integer execution units.
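The three designs can be summarized by counting what two threads get in each case. The sketch below is a simplified bookkeeping model based only on the description in this article; `TwoThreadDesign`, its field names, and the grouping of hardware into "integer clusters" and "FP units" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoThreadDesign:
    """How much hardware two threads get between them (simplified counts)."""
    name: str
    register_files: int     # per-thread architectural state
    integer_clusters: int   # integer scheduler plus integer pipelines
    fp_units: int           # floating-point scheduler plus FP execution units


DESIGNS = [
    TwoThreadDesign("CMP (two full cores)", 2, 2, 2),
    TwoThreadDesign("SMT (one core, two threads)", 2, 1, 1),
    TwoThreadDesign("Bulldozer module", 2, 2, 1),
]


def sharing(count: int, threads: int = 2) -> str:
    """Describe whether a resource is duplicated per thread or shared."""
    return "per thread" if count >= threads else "shared"


for d in DESIGNS:
    print(f"{d.name:28s} registers: {sharing(d.register_files):10s} "
          f"integer: {sharing(d.integer_clusters):10s} "
          f"floating point: {sharing(d.fp_units)}")
```

Read this way, the Bulldozer module matches CMP for registers and integer execution and matches SMT for floating point, which is the "in between" position the paragraph describes.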
