Analysis of the heavily revised SM architecture
NVIDIA says GF100 adopts its third-generation streaming multiprocessor (SM) design, so next we will look at what NVIDIA has changed in the design of the SM array.
According to NVIDIA, the third-generation SM architecture introduces a number of improvements that make the SM array not only more powerful but also more programmable and more efficient.
Analysis of the streaming multiprocessor (SM)
Every time NVIDIA revises its architecture, it changes the number of stream processors in each SM array (in GF100, NVIDIA calls them CUDA cores). In G80, each SM array contains 16 stream processors; GT200 raised this to 24, and the latest GF100 has 32. In G80 and GT200, the stream processors are organized in groups of eight: one G80 SM array holds two such groups, and one GT200 SM array holds three. In GF100, the stream processor cores are no longer deliberately divided into groups.
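The per-array counts above follow from simple arithmetic, which the short Python sketch below reproduces. It uses only the figures quoted in the text; the dictionary layout and the names `GROUP_SIZE`, `sm_arrays` and `processors_per_array` are assumptions made for illustration, not NVIDIA terminology.

```python
# Illustrative sketch: stream processors per SM array, using the figures in the text.
# All names here are hypothetical and chosen only for this example.
GROUP_SIZE = 8  # G80 and GT200 organize stream processors in groups of eight

sm_arrays = {
    "G80": {"groups": 2},    # two groups of eight per SM array
    "GT200": {"groups": 3},  # three groups of eight per SM array
    "GF100": {"cores": 32},  # no grouping; 32 CUDA cores per SM array
}


def processors_per_array(chip):
    """Return the number of stream processors in one SM array of the given chip."""
    spec = sm_arrays[chip]
    if "groups" in spec:
        return spec["groups"] * GROUP_SIZE
    return spec["cores"]


counts = {chip: processors_per_array(chip) for chip in sm_arrays}
print(counts)  # {'G80': 16, 'GT200': 24, 'GF100': 32}
```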
Figure: SM array logical architecture
As the diagram shows, each SM array contains the following units, from top to bottom:

Instruction Cache (blue box at the top): the high-speed instruction cache, which receives and stores the instructions of incoming thread blocks.

Two warp schedulers (two orange boxes in the second row): they break thread blocks up into warps of threads for the cores below.

Two dispatch units (two orange boxes in the third row): they dispatch the instructions handed out by the warp schedulers.

Register file (blue box in the fourth row): it holds the registers of the warps that have been dispatched.

32 CUDA cores (green squares): they execute the arithmetic instructions.

16 load/store units (green boxes marked "LD/ST" to the right of the cores): they compute the memory addresses used by the threads.

4 special function units (green boxes marked "SFU" on the far right): they execute transcendental instructions such as sine, cosine, reciprocal and square root.

64 KB shared memory/L1 cache (blue box in the fifth row from the bottom): 64 KB of fast on-chip memory that can be flexibly divided between shared memory and L1 cache.

Uniform cache (blue box in the fourth row from the bottom): the unified cache.

Four texture units (dark blue boxes in the third row from the bottom): they handle texture sampling and filtering.

Texture cache (blue box in the second row from the bottom): the cache serving the texture units.

PolyMorph Engine (yellow box in the bottom row): it carries out the main part of the tessellation work.
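To put the unit counts in perspective, the sketch below records them in a small Python structure and estimates how many passes a unit type needs to serve one warp. It rests on two assumptions not stated in this section: a warp consists of 32 threads, and each unit serves one thread per pass. The class and function names are hypothetical, and the result is a rough illustration rather than a model of the real pipeline.

```python
import math
from dataclasses import dataclass

WARP_SIZE = 32  # assumption: threads per warp, not stated in this section


@dataclass(frozen=True)
class SMResources:
    """Unit counts of one GF100 SM array, as listed in the text."""
    cuda_cores: int = 32
    load_store_units: int = 16
    special_function_units: int = 4
    texture_units: int = 4
    warp_schedulers: int = 2
    dispatch_units: int = 2
    shared_l1_kb: int = 64


def passes_per_warp(unit_count, warp_size=WARP_SIZE):
    """Passes needed to cover one warp if each unit serves one thread per pass."""
    return math.ceil(warp_size / unit_count)


sm = SMResources()
ldst_passes = passes_per_warp(sm.load_store_units)       # 32 threads / 16 units
sfu_passes = passes_per_warp(sm.special_function_units)  # 32 threads / 4 units
print(ldst_passes, sfu_passes)  # 2 8
```

Under these assumptions, the four SFUs need four times as many passes per warp as the sixteen load/store units, which fits their role as units for comparatively rare instructions.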