From Scali's OpenBlog
https://scalibq.wordpress.com/2012/02/14/the-myth-of-cmt-cluster-based-multithreading/
The first time I heard the term "CMT", I wondered whether I had overlooked some kind of multithreading technology. When I looked it up, I understood: if you Google the term, you mostly land on AMD promotional material, where CMT is explained as "cluster-based multithreading" or "clustered multithreading".
Oddly, on another page (http://dl.acm.org/citation.cfm?id=640477.640525), you'll see a different explanation:
Clustering with high-speed network connectivity has become a cost-effective platform for performing computationally intensive parallel multithreaded applications.
Clearly, the term "cluster multithreading" was in use before AMD proposed CMT, and with a clearer meaning: conventional PC clusters used to build a virtual supercomputer.
So CMT is just an "invention" of AMD's marketing department, which coined a term that sounds like SMT in order to compete with Intel's Hyper-Threading technology. HT is a marketing name too, but it is simply Intel's name for its SMT implementation. SMT (simultaneous multithreading) is a widely recognized multithreading concept, and Intel was not the first to come up with it: IBM was already researching it as early as 1968.
The problem now is that a lot of people have bought into it. They think CMT is very close in concept to SMT, or even equivalent, so they benchmark CMT against SMT. A few days ago I came across such a review on AnandTech (http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/6).
AMD has claimed more than once that CMT is more efficient than SMT (Hyper-Threading) when it comes to server applications.
Let's take a look at the CPU used for the evaluation:
Opteron 6276: 8 modules, 16 threads; two Bulldozer dies with 1.2 billion transistors each (2.4 billion in total)
Opteron 6220: 4 modules, 8 threads; 1.2 billion transistors
Opteron 6174: 12 cores, 12 threads; two dies with 1.8 billion transistors in total
Xeon X5650: 6 cores, 12 threads; 1.17 billion transistors
The problem is obvious from the transistor counts alone: the Opteron 6276 is roughly twice as large as the Xeon, so is it fair to compare their performance? If you throw that much more hardware at the problem, you should be able to handle more load. On paper the Opteron still has the advantage there, because it can handle 16 threads at the same time while the Xeon can only handle 12.
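To put numbers on that comparison, here is a minimal Python sketch. It uses only the transistor and thread counts listed above for the two chips being compared; the dictionary layout and the helper name `relative_to` are illustrative assumptions, not anything taken from the AnandTech review.

```python
# Transistor counts (in billions) and hardware thread counts from the list above.
# The Opteron 6276 total is derived from two dies of 1.2 billion transistors each.
cpus = {
    "Opteron 6276": {"transistors": 2 * 1.2, "threads": 16},
    "Xeon X5650": {"transistors": 1.17, "threads": 12},
}


def relative_to(baseline, key):
    """Express each CPU's value for `key` as a multiple of the baseline CPU's value."""
    base = cpus[baseline][key]
    return {name: spec[key] / base for name, spec in cpus.items()}


transistor_ratio = relative_to("Xeon X5650", "transistors")
thread_ratio = relative_to("Xeon X5650", "threads")

for name in cpus:
    print(f"{name}: {transistor_ratio[name]:.2f}x transistors, "
          f"{thread_ratio[name]:.2f}x threads (relative to Xeon X5650)")
```

With these figures the Opteron 6276 comes out at about twice the transistors of the Xeon for only a third more hardware threads.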
But the actual results show the opposite. AMD needs two dies to match the performance of a single Intel die, and what is worse, the response time of Intel's single die is shorter than AMD's. If that is not convincing enough: even the older-architecture Opteron 6174 is more efficient than the 6276.
And what are we running here? A series of database workloads, MySQL and MSSQL, which is integer code. What does CMT do for that? Nothing special for integer arithmetic at all! Each module simply has two independent integer cores. The FPU inside the module is shared, but we are not using it, so this scenario is the best case for CMT.
We continue to look at the execution resources available to the CPU.
Opteron 6276 with CMT disabled:
8 modules, 8 threads, 4 ALUs per module;
Each thread has 2 ALUs, and threads in the same module cannot share each other's ALUs, so disabling CMT halves the number of threads and also halves the number of usable ALUs;
16 ALUs in total;
With CMT enabled:
8 modules, 16 threads, 4 ALUs per module, 2 ALUs per thread;
32 ALUs in total;
Because CMT does not share ALUs, a CMT-based CPU works exactly like a conventional SMP (symmetric multiprocessing) system, so you can expect the same scaling. Each execution unit belongs to one specific thread, so enabling CMT simply lets more threads run.
Xeon X5650 with SMT disabled:
6 cores, 6 threads;
3 ALUs per core, 3 ALUs per thread;
18 ALUs in total;
With SMT enabled:
6 cores, 12 threads;
3 ALUs per core, with two threads sharing the 3 ALUs;
18 ALUs in total;
The difference between CMT and SMT is obvious. When a core runs a single thread, an SMT thread has more ALUs available than a CMT thread (3 versus 2). When it runs two threads, each SMT thread effectively gets fewer ALUs than a CMT thread, because the two threads share 3 ALUs instead of each owning 2.
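The ALU bookkeeping above can be reproduced with a short Python sketch. The function names `cmt` and `smt` and the returned dictionary layout are illustrative assumptions. The per-thread figure for SMT with two threads per core is only an average share, since in practice the threads compete dynamically for the same ALUs.

```python
def cmt(modules, threads_per_module, alus_per_thread=2):
    """CMT model: every thread owns a fixed set of ALUs that no other thread can use."""
    threads = modules * threads_per_module
    usable_alus = threads * alus_per_thread  # ALUs of disabled threads sit idle
    return {"threads": threads, "usable_alus": usable_alus,
            "alus_per_thread": usable_alus / threads}


def smt(cores, threads_per_core, alus_per_core=3):
    """SMT model: all threads on a core share that core's full set of ALUs."""
    threads = cores * threads_per_core
    usable_alus = cores * alus_per_core  # always fully usable, even with one thread
    return {"threads": threads, "usable_alus": usable_alus,
            "alus_per_thread": usable_alus / threads}  # average share per thread


configs = {
    "Opteron 6276, CMT off": cmt(modules=8, threads_per_module=1),
    "Opteron 6276, CMT on": cmt(modules=8, threads_per_module=2),
    "Xeon X5650, SMT off": smt(cores=6, threads_per_core=1),
    "Xeon X5650, SMT on": smt(cores=6, threads_per_core=2),
}

for name, c in configs.items():
    print(f"{name}: {c['threads']} threads, {c['usable_alus']} usable ALUs, "
          f"{c['alus_per_thread']:.1f} ALUs per thread")
```

In this model the CMT thread always sees 2 ALUs regardless of the setting, while the SMT thread sees 3 when it has the core to itself and an average of 1.5 when it shares the core.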
This is why SMT delivers and CMT does not. AMD's previous CPUs had 3 ALUs per thread, but in order to keep the module size down, AMD can now only give each thread 2 ALUs. As a result, Bulldozer's single-threaded performance falls behind both the earlier Opteron processors and Intel's Xeons.
At the same time, CMT does not save much die area, since each module still contains 4 ALUs. Yes, each module can now run two threads, but single-threaded performance is so weak that AMD can only hope to make up for it through scaling.
So what does CMT bring? Nothing. The die is larger than the competitor's, and even larger than AMD's own older products. The Xeon has very good single-threaded performance, so even if SMT does not scale as well as CMT, it can still handle heavy multithreaded loads. But the biggest advantage of SMT is that it saves die area. If it wanted to, Intel could put two dies in one package just like AMD and build a 12-core, 24-thread processor that would blow AMD away.
I'm not sure AMD really believes that CMT is "efficient". It requires more die area and consumes more power, and to catch up with a Xeon that is not even high-end, AMD has to bring out its best part, the Opteron 6276. The X5650 is just a 2.66 GHz processor, while the top-end X5690 runs at 3.46 GHz, which also shows the advantage of a small die that can reach high clock speeds.
So let's not pretend that CMT is a useful technology. Compared with SMT, it is nothing more than a marketing concept.
Translation of "The Myth of CMT"