In 2009 (the "tick"), Intel's manufacturing process entered the 32nm era; in 2010 (the "tock"), Intel launched the processor code-named Sandy Bridge, which is built on that 32nm process. Sandy Bridge (previously known as Gesher) is the successor to Nehalem and moves from Nehalem's 45nm process to 32nm. Sandy Bridge will come in versions with up to eight cores; the level-two cache remains 512KB, while the level-three cache will grow to 16MB. The headline feature of Sandy Bridge is the addition of the AVX (Advanced Vector Extensions) instructions, formerly known as VSSE. Intel claims that matrix computation using AVX will be 90% faster than with SSE. Its importance is comparable to the introduction of SSE with the Pentium III in 1999.
From a high-level perspective, the SNB architecture is only an evolution, but judged by how many transistors have changed since Nehalem/Westmere, it is definitely a revolution. Core 2 introduced a logic block called the Loop Stream Detector (LSD). When the CPU executes a software loop, the LSD shuts down the branch predictor and the prefetch/decode engines and feeds the execution units from its own buffer of cached micro-ops. Shutting down the front end during loop execution saves power and improves performance.
SNB adds a micro-op cache that holds instructions as they are decoded. There is no strict admission algorithm; instructions are simply placed in the cache as they are decoded. When the fetch hardware fetches a new instruction, it first checks whether that instruction is already in the micro-op cache. If it is, the cache services the rest of the pipeline and the front end is powered down. The decode hardware is a very complex part of the x86 pipeline, and shutting it down saves a significant amount of power. The Atom processor architecture would certainly benefit if this technique were introduced there as well.
The cache is direct mapped and can store approximately 1.5K micro-ops, roughly equivalent to a 6KB instruction cache. Its contents are included in the level-one instruction cache, and it achieves a hit rate of about 80% for most programs; its bandwidth is somewhat higher and more consistent than that of the level-one instruction cache. The actual level-one instruction and data caches are unchanged at 32KB each, 64KB in total.
This looks a bit like the Pentium 4's trace cache, but the big difference is that it does not cache traces. It is more like an instruction cache that stores micro-ops rather than x86 instructions (macro-ops).
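The lookup flow described above can be sketched as a toy simulation. This is only an illustrative model, not Intel's design: the class name MicroOpCache, the helper toy_decode, the use of the instruction address as the tag, and the exact entry count of 1536 (standing in for "approximately 1.5K") are all assumptions made for the example.

```python
# Toy model of a direct-mapped micro-op cache in front of a decoder.
# All names and sizes are illustrative assumptions, not Intel's design.

NUM_ENTRIES = 1536  # stands in for "approximately 1.5K" micro-ops


class MicroOpCache:
    def __init__(self, num_entries=NUM_ENTRIES):
        self.num_entries = num_entries
        # Each slot holds (address tag, decoded micro-ops) or None.
        self.slots = [None] * num_entries
        self.hits = 0
        self.misses = 0

    def fetch(self, address, decode):
        """Return micro-ops for `address`, calling `decode` only on a miss."""
        index = address % self.num_entries  # direct mapped: one slot per address
        slot = self.slots[index]
        if slot is not None and slot[0] == address:
            self.hits += 1                  # hit: the decoder can stay idle
            return slot[1]
        self.misses += 1                    # miss: decode, then fill the slot
        uops = decode(address)
        self.slots[index] = (address, uops)
        return uops


def toy_decode(address):
    # Stand-in for the real x86 decoder: one fake micro-op per instruction.
    return ["uop@%d" % address]


def run_loop(cache, body_addresses, iterations):
    """Fetch the same loop body repeatedly and report (hits, misses)."""
    for _ in range(iterations):
        for addr in body_addresses:
            cache.fetch(addr, toy_decode)
    return cache.hits, cache.misses


if __name__ == "__main__":
    cache = MicroOpCache()
    hits, misses = run_loop(cache, [0, 4, 8, 12], iterations=10)
    print(hits, misses)  # 36 hits, 4 misses
```

In this sketch a four-instruction loop run ten times misses only on the first pass, so the decoder stand-in is called 4 times out of 40 fetches. Because the mapping is direct, two addresses that share a slot evict each other, which is one reason a real hit rate stays below 100%.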
Sandy Bridge Benefits:
1. Front End
2. Physical register file (PRF) and execution improvements
3. Ring bus and level-three cache
4. System Agent
5. Integrated graphics core
6. Media Engine
7. Next Generation Turbo Boost
Sandy Bridge Features:
1. Wider vector operations: from 128-bit to 256-bit, while maintaining backward compatibility
2. Enhanced data rearrangement: a single operation can handle eight 32-bit values simultaneously
3. Support for three-operand and four-operand, non-destructive syntax
4. Flexible support for unaligned memory access
5. An extensible new opcode encoding (VEX)
6. A stronger integrated graphics core
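The vector-width and non-destructive-syntax points in the list above can be illustrated with a small model. It only mimics the semantics in plain Python and executes no real SIMD instructions; the function names lanes, vadd, add_in_place and vector_ops_needed are hypothetical helpers invented for this sketch.

```python
# Toy model of 128-bit (SSE) versus 256-bit (AVX) registers of 32-bit values.
# Semantics only: no real SIMD instructions are executed here.

LANE_BITS = 32


def lanes(register_bits):
    """Number of 32-bit elements that fit in one vector register."""
    return register_bits // LANE_BITS


def vadd(a, b):
    """Three-operand, non-destructive add: returns a new vector."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def add_in_place(a, b):
    """Two-operand, destructive add: the result overwrites operand a."""
    for i in range(len(a)):
        a[i] += b[i]
    return a


def vector_ops_needed(num_elements, register_bits):
    """Vector adds needed to cover num_elements 32-bit values."""
    per_op = lanes(register_bits)
    return -(-num_elements // per_op)  # ceiling division


if __name__ == "__main__":
    print(lanes(128), lanes(256))   # 4 8
    a = [1, 2, 3, 4, 5, 6, 7, 8]    # eight 32-bit values: one 256-bit register
    b = [10] * 8
    c = vadd(a, b)                  # a and b are left untouched
    print(c)
    print(vector_ops_needed(1024, 128), vector_ops_needed(1024, 256))  # 256 128
```

Doubling the register width halves the number of vector operations needed for the same array, and the non-destructive form avoids the extra copy that a destructive two-operand add requires whenever the original operand is still needed.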