What is Fermi? (10)

Source: Internet
Author: User


64 KB Shared Memory

In its first-generation CUDA architecture, NVIDIA introduced the concept of shared memory to improve application execution efficiency, and it delivered good results. Shared memory is built into each SM and connected directly to the stream processors, greatly improving data access speed.

Having seen the importance of shared memory, NVIDIA provides each SM in GF100 with 64 KB of on-chip memory that serves as both shared memory and L1 cache.

The 64 KB in each SM is divided into a 16 KB portion and a 48 KB portion, configurable in two modes: 16 KB of L1 cache with 48 KB of shared memory, or 48 KB of L1 cache with 16 KB of shared memory.

By offering the two configurations, the L1 cache complements the high-speed shared memory. The main difference between them is that shared memory accelerates algorithms whose memory access patterns are well defined, while the L1 cache accelerates the remaining irregular algorithms, whose data addresses are not known in advance.
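As a minimal sketch of the first case (an assumed example, not from the article): a 1-D averaging filter whose access pattern is known in advance, so each thread block explicitly stages its tile in shared memory before computing.

```cuda
// Assumed example kernel: 3-point averaging filter.
// The access pattern is regular and known, so the block stages its
// working set in shared memory explicitly (assumes blockDim.x == 256).
__global__ void smooth(const float *in, float *out, int n)
{
    __shared__ float tile[256 + 2];            // block width plus halo
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;                 // offset for left halo

    if (gid < n) tile[lid] = in[gid];          // predictable, coalesced load
    if (threadIdx.x == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                            // tile is now resident on chip

    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}
```

An irregular kernel, for example a gather through an index array, could not stage its data this way, since the addresses are only known at run time; that is the case the L1 cache is meant to catch.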

In graphics processing, each SM uses a 16 KB L1 cache, which also serves as a buffer for register spills, improving efficiency. In parallel computing, the L1 cache and shared memory work together: threads within a thread block can cooperate through shared memory, reducing off-chip data traffic and greatly improving the execution efficiency of CUDA programs. Allocating the 64 KB according to the workload's requirements yields the best performance.
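The split can be selected per kernel through the CUDA runtime. A minimal sketch, assuming two hypothetical kernels `smooth` (shared-memory heavy) and `gather` (irregular access); `cudaFuncSetCacheConfig` and the `cudaFuncCachePrefer*` enums are the actual runtime API:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the two workload types.
__global__ void smooth(const float *in, float *out, int n) { /* ... */ }
__global__ void gather(const float *in, const int *idx, float *out, int n) { /* ... */ }

int main(void)
{
    // Tiled kernel with a known access pattern:
    // prefer 48 KB shared memory / 16 KB L1 cache.
    cudaFuncSetCacheConfig(smooth, cudaFuncCachePreferShared);

    // Irregular kernel with addresses unknown in advance:
    // prefer 48 KB L1 cache / 16 KB shared memory.
    cudaFuncSetCacheConfig(gather, cudaFuncCachePreferL1);
    return 0;
}
```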


Globally Shared L2 Cache


In addition to the L1 caches, NVIDIA also designed a 768 KB unified L2 cache for GF100, shared among the four GPCs. The L2 cache services load, store, and texture requests, and the data it holds can be shared across the entire GPU, greatly improving data communication between GPCs and SMs.

Unlike the read-only L2 texture cache in the GT200 architecture, the L2 cache in GF100 supports both read and write operations, making it far more flexible. NVIDIA says it adopted a priority-based eviction algorithm for the L2 cache, with various checks that help ensure the data a program needs stays resident in the cache.

For example, the L2 cache speeds up workloads whose data addresses are not known in advance, such as physics simulation, ray tracing, and sparse data structures. It is also a good fit when multiple SMs need to read the same data, as in a post-processing filter.

In addition, the L2 cache balances cache usage across the SMs. With per-SM caches, a program that exhausts the cache attached to one SM cannot spill into another SM's cache, so capacity in a lightly used SM sits idle even though it is not fully occupied. With a unified L2 cache, overflow from a heavily loaded SM can occupy the free space left by lightly loaded ones, making full use of the cache.

In GF100, the unified L2 cache replaces the L2 texture cache, the ROP cache, and the on-chip FIFOs of earlier NVIDIA GPUs.

In addition, the L2 cache executes memory accesses in program order, which lays a solid foundation for CUDA's support of the C/C++ languages. When the read and write paths are separate (for example, a read-only texture path and a write-only ROP path), a write followed by a read of the same location risks returning stale data; a unified read/write path ensures the program behaves as written.
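As a minimal sketch (an assumed example, not from the article) of the C-style idiom this enables: an in-place read-modify-write of global memory, where each read must observe the preceding write to the same address.

```cuda
// Assumed example: each thread scales its own element in place.
// The read of data[i] and the subsequent write go through the same
// unified read/write path, so this ordinary C-style read-modify-write
// behaves as written, rather than splitting across a read-only texture
// path and a write-only ROP path.
__global__ void scale_in_place(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * factor;   // read, then write, same address
}
```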
