Functions, structure, and working principle of the cache (high-speed buffer memory)
The cache (high-speed buffer memory) is a level of memory that sits between main memory and the CPU. It is built from static RAM (SRAM) chips. Its capacity is small, but it is much faster than main memory, with a speed close to that of the CPU. The cache stores the instructions and data that are likely to be needed in the near future, with the aim of speeding up the CPU's access to memory. Achieving this requires solving two technical problems: first, mapping and translating between main memory addresses and cache addresses; second, replacing the cache contents according to a definite policy.
The structure and working principle of the cache are shown in Figure 2.3.1.
The cache consists of three parts:
Cache storage body: stores the instruction and data blocks transferred from main memory.
Address translation unit: maintains a directory table that translates main memory addresses into cache addresses.
Replacement unit: when the cache is full, replaces data blocks according to a given policy and updates the address translation unit.
2.3.2 Address mapping and translation
Address mapping is the correspondence between the address of a piece of data in main memory and its address in the cache. Three address mapping methods are described below.
1. Fully associative mapping
Mapping rule: any block of main memory can be mapped to any block of the cache.
(1) Main memory and the cache are divided into data blocks of the same size.
(2) A main memory block can be loaded into any cache block.
This is shown in Figure 2.3.2. If the cache has CB blocks and main memory has MB blocks, there are CB × MB possible mappings in total.
Figure 2.3.3 shows the directory table format and the address translation rules. The directory table is held in associative (content-addressable) memory. Each entry has three parts: the block address of the data block in main memory, the block address it occupies in the cache, and a valid bit (also called the loaded bit). Because the mapping is fully associative, the directory table needs as many entries as there are cache blocks.
Example: a machine has 1 MB of main memory and a 32 KB cache, and each block is 16 words (or bytes). Give the main memory and cache address formats and the format and capacity of the directory table.
Capacity: the same as the number of cache blocks, that is, 2^11 = 2048 entries (32 KB / 16 B = 2048).
Advantage: high hit rate and high utilization of cache storage space.
Disadvantage: every access to the associative memory must compare against all entries, which is slow and expensive, so this method is rarely used.
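The address and directory formats asked for in the example follow from the three sizes alone. The following Python sketch works through that arithmetic; it assumes byte addressing and power-of-two sizes, and the function name `fully_associative_fields` is illustrative, not from the original text.

```python
def log2_exact(n):
    """Number of address bits needed to index n items (n assumed to be a power of two)."""
    return n.bit_length() - 1

def fully_associative_fields(main_bytes, cache_bytes, block_bytes):
    """Field widths in bits for fully associative mapping (byte addressing assumed)."""
    offset_bits = log2_exact(block_bytes)                      # in-block address
    main_block_bits = log2_exact(main_bytes // block_bytes)    # main memory block number
    cache_block_bits = log2_exact(cache_bytes // block_bytes)  # cache block number
    return {
        "main_address": (main_block_bits, offset_bits),
        "cache_address": (cache_block_bits, offset_bits),
        # One directory entry: main block number + cache block number + valid bit
        "directory_entry_bits": main_block_bits + cache_block_bits + 1,
        "directory_entries": cache_bytes // block_bytes,
    }

# 1 MB main memory, 32 KB cache, 16-byte blocks
fields = fully_associative_fields(1 << 20, 32 << 10, 16)
print(fields)
```

Under these assumptions the 20-bit main memory address splits into a 16-bit block number and a 4-bit in-block address, and the 15-bit cache address into an 11-bit block number and a 4-bit in-block address.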
2. Direct mapping
Mapping rule: a block of main memory can be mapped to only one specific block of the cache.
(1) Main memory and the cache are divided into data blocks of the same size.
(2) The main memory capacity must be an integer multiple of the cache capacity. Main memory is divided into zones the size of the cache, so the number of blocks in each zone equals the total number of cache blocks.
(3) A block in any zone of main memory can be stored only in the cache block with the same block number.
Figure 2.3.4 shows the direct mapping rule. Blocks with the same block number in different zones of main memory all map to the cache block with that block number, but only one of them can be held in the cache at any time. Because the block number is the same in main memory and in the cache, the directory needs to record only the zone number of the loaded block.
Figure 2.3.5 shows the main memory and cache address formats, the directory table format, and the address translation rules. The block number and in-block address fields of the main memory address are identical to those of the cache address. The directory table is held in a small, high-speed memory, and each entry has two parts: the zone number of the data block in main memory and a valid bit. The directory table has as many entries as there are cache blocks.
Address translation process: use block number B of the main memory address to index the directory memory, and compare the zone number read out with zone number E of the main memory address. If they are equal and the valid bit is 1, the access is a cache hit, and the cache address formed from the block number and the in-block address can be used directly to fetch the data from the cache. If they are not equal and the valid bit is 1, the cache block holds a block from another zone and must be replaced. If the valid bit is 0, the required block can be loaded directly.
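The translation process above can be sketched in a few lines of Python. This is a toy model rather than a description of any real hardware: the class name `DirectMappedCache`, its methods, and the outcome strings are all illustrative, and only the directory (zone number and valid bit) is modelled, not the data itself.

```python
class DirectMappedCache:
    """Toy direct-mapped directory: one (zone number, valid bit) entry per cache block."""

    def __init__(self, num_blocks, block_bytes):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.zone = [0] * num_blocks        # zone number E recorded for each cache block
        self.valid = [False] * num_blocks   # valid bit for each cache block

    def split(self, address):
        """Split a main memory address into (zone E, block B, in-block address)."""
        offset = address % self.block_bytes
        block_number = address // self.block_bytes
        return block_number // self.num_blocks, block_number % self.num_blocks, offset

    def access(self, address):
        """Return 'hit', 'load' (entry was invalid) or 'replace' (zone mismatch)."""
        zone, block, _ = self.split(address)
        if self.valid[block] and self.zone[block] == zone:
            return "hit"
        outcome = "replace" if self.valid[block] else "load"
        self.zone[block], self.valid[block] = zone, True
        return outcome

# 32 KB cache with 16-byte blocks: 2048 cache blocks
cache = DirectMappedCache(num_blocks=2048, block_bytes=16)
results = [cache.access(a) for a in (0x00010, 0x00018, 0x08010, 0x00010)]
print(results)
```

Addresses 0x00010 and 0x08010 have the same block number but different zone numbers, so in this sketch they keep evicting each other, which illustrates why direct mapping replaces blocks so often.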
Advantage: the mapping is simple. An access only needs to check whether the zone numbers are equal, so access is fast and the hardware is simple.
Disadvantage: replacements are frequent and the hit rate is low.
Example: as in the example above, main memory is 1 MB, the cache is 32 KB, and each block is 16 words (or bytes). Give the main memory and cache address formats and the format and capacity of the directory table.
Capacity: the same as the number of cache blocks, that is, 2^11 = 2048 entries (32 KB / 16 B = 2048).
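For direct mapping, the field widths in this example can again be derived from the sizes. A minimal Python sketch, assuming byte addressing and power-of-two sizes; `direct_mapped_fields` is an illustrative name, not from the original text.

```python
def log2_exact(n):
    """Number of address bits needed to index n items (n assumed to be a power of two)."""
    return n.bit_length() - 1

def direct_mapped_fields(main_bytes, cache_bytes, block_bytes):
    """Field widths in bits for direct mapping (byte addressing assumed)."""
    offset_bits = log2_exact(block_bytes)                 # in-block address
    block_bits = log2_exact(cache_bytes // block_bytes)   # block number B (same in both addresses)
    zone_bits = log2_exact(main_bytes // cache_bytes)     # zone number E
    return {
        "main_address": (zone_bits, block_bits, offset_bits),
        "cache_address": (block_bits, offset_bits),
        # One directory entry: zone number + valid bit
        "directory_entry_bits": zone_bits + 1,
        "directory_entries": cache_bytes // block_bytes,
    }

# 1 MB main memory, 32 KB cache, 16-byte blocks
fields = direct_mapped_fields(1 << 20, 32 << 10, 16)
print(fields)
```

Under these assumptions main memory has 32 zones, so the 20-bit main memory address has a 5-bit zone number, an 11-bit block number, and a 4-bit in-block address, and each of the 2048 directory entries needs only 6 bits.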
3. Set-associative (group-associative) mapping
Mapping rules:
(1) Main memory and the cache are divided into blocks of the same size.
(2) Main memory and the cache are divided into groups of the same size.
(3) The main memory capacity is an integer multiple of the cache capacity. Main memory is divided into zones the size of the cache, so each zone contains the same number of groups as the cache.
(4) When a main memory block is loaded into the cache, its group number in main memory and its group number in the cache must be the same. That is, a block in any zone can be stored only in the cache group with the same group number, but it may occupy any block within that group. In other words, direct mapping is used between main memory groups and cache groups, and fully associative mapping is used within each pair of corresponding groups.
Figure 2.3.6 shows set-associative mapping. In the figure, the cache is divided into CG groups of GB blocks each. Main memory is Me times the size of the cache, so it has Me zones, each with CG groups of GB blocks. The main memory address therefore has four fields: zone number, group number within the zone, block number within the group, and in-block address. The cache address has three fields: group number, block number within the group, and in-block address. Translation from a main memory address to a cache address has two parts: the group is selected by address, as in direct mapping, and the block within the group is found by content, as in fully associative mapping. The set-associative address translation unit is also built from associative memory, as shown in Figure 2.3.7.
Each entry of the associative memory contains zone number E and in-group block number B of the main memory address, together with the corresponding cache block address b. The number of entries equals the number of cache blocks. To access data, first use the group number to locate the directory entries of the blocks in that group, then compare the zone number and in-group block number of the address being accessed against each entry of the group. If an entry matches and its valid bit is 1, the access is a hit.
The corresponding cache block address b is then sent to the block address field of the cache address register, where it is combined with the group number and the in-block address to form the cache address. If no entry matches, the access is a miss: the requested block is not yet in the cache and must be loaded, replacing a block in the group if necessary. If an entry's valid bit is 0, that cache block is unused or its data is invalid, and a new block can be loaded into it.
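The two-step translation (group selected by address, block found by content) can be sketched as follows. This is a toy Python model with illustrative names (`SetAssociativeCache`, `split`, `access`). A missing directory entry stands in for valid bit 0, and because the text leaves the choice of victim to the replacement policy, the sketch simply evicts the oldest-loaded entry as a placeholder.

```python
class SetAssociativeCache:
    """Toy set-associative directory: per group, maps (zone E, in-group block B) to cache block b."""

    def __init__(self, num_groups, blocks_per_group, block_bytes):
        self.num_groups = num_groups
        self.blocks_per_group = blocks_per_group
        self.block_bytes = block_bytes
        self.directory = [dict() for _ in range(num_groups)]

    def split(self, address):
        """Split a main memory address into (zone, group, in-group block, in-block address)."""
        offset = address % self.block_bytes
        block_number = address // self.block_bytes
        in_group_block = block_number % self.blocks_per_group
        group = (block_number // self.blocks_per_group) % self.num_groups
        zone = block_number // (self.blocks_per_group * self.num_groups)
        return zone, group, in_group_block, offset

    def access(self, address):
        """Return (outcome, cache address) where cache address = (group, block b, in-block address)."""
        zone, group, in_group_block, offset = self.split(address)
        entries = self.directory[group]            # step 1: select the group by address
        tag = (zone, in_group_block)
        if tag in entries:                         # step 2: search the group by content
            return "hit", (group, entries[tag], offset)
        if len(entries) < self.blocks_per_group:   # a free (invalid) block exists in the group
            used = set(entries.values())
            cache_block = next(b for b in range(self.blocks_per_group) if b not in used)
            outcome = "load"
        else:                                      # placeholder policy: evict the oldest-loaded entry
            victim = next(iter(entries))
            cache_block = entries.pop(victim)
            outcome = "replace"
        entries[tag] = cache_block
        return outcome, (group, cache_block, offset)

cache = SetAssociativeCache(num_groups=2, blocks_per_group=2, block_bytes=16)
trace = [cache.access(a) for a in (0x00, 0x40, 0x04, 0x80)]
print(trace)
```

In this small configuration, addresses 0x00, 0x40, and 0x80 all fall in group 0 but in different zones. The first two can coexist in the group's two blocks, which direct mapping would not allow, and only the third forces a replacement.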
Advantage: the probability of block conflicts is relatively low, block utilization is greatly improved, and the block miss rate drops significantly.
Disadvantage: it is harder and more expensive to implement than direct mapping.
2.3.3 Replacement policy
According to the principle of program locality, the instructions and data used most recently while a program runs tend to be used again soon. This provides the theoretical basis for replacement policies. Weighing hit rate, implementation difficulty, and speed, the main replacement policies are random, first-in-first-out, and least recently used.
1. Random method (RAND)
The random method chooses the block to replace at random: a random number generator is provided, and the generated number selects the block to be replaced. The method is simple and easy to implement, but its hit rate is relatively low.
2. First-in-first-out method (FIFO)
The first-in-first-out method replaces the block that was loaded earliest. A block that was loaded first but has been hit many times is still likely to be replaced first, so this method does not reflect program locality. Its hit rate is better than that of the random method but is still not satisfactory. The method is easy to implement. For example, the cache of the Solar-16/65 machine uses set-associative mapping with four blocks per group, and each block has a two-bit counter. When a block is loaded or replaced, its counter is cleared to 0 and the counters of the other blocks in the same group are incremented by 1. When a replacement is needed, the block with the largest count is replaced.
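The counter scheme described for FIFO can be sketched in Python as below. The class name `FifoGroup` and the use of `None` for an empty cache block are illustrative assumptions; the sketch models one group of four blocks and shows that a block that has just been hit can still be the one replaced.

```python
class FifoGroup:
    """One cache group under FIFO replacement, using per-block age counters."""

    def __init__(self, size=4):
        self.blocks = [None] * size   # main memory block held by each cache block (None = empty)
        self.counters = [0] * size    # number of loads since this block was loaded

    def access(self, block):
        if block in self.blocks:
            return "hit"              # FIFO: a hit does not change any counter
        if None in self.blocks:
            slot, outcome = self.blocks.index(None), "load"
        else:
            slot, outcome = self.counters.index(max(self.counters)), "replace"
        for i, held in enumerate(self.blocks):
            if held is not None and i != slot:
                self.counters[i] += 1  # every other occupied block ages by one
        self.blocks[slot], self.counters[slot] = block, 0
        return outcome

group = FifoGroup()
outcomes = [group.access(b) for b in (1, 2, 3, 4, 1, 1, 5)]
print(outcomes, group.blocks, group.counters)
```

With four blocks per group the counters never exceed 3, so two bits per counter are enough, as in the example above.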
3. Least recently used method (LRU)
The LRU method tracks how each block is used and always replaces the block that has gone unused for the longest time. This method reflects program locality well.
There are several ways to implement the LRU policy. The counter method, the register stack method, and the hardware logic comparison method are described below.
Counter method: each cache block has a counter, which operates according to the following rules:
(1) When a block is loaded or replaced, its counter is cleared to 0 and the counters of the other blocks are incremented by 1.
(2) On a hit, the count of every block is compared with the count of the hit block. If a block's count is smaller than the hit block's count, it is incremented by 1; if it is larger, it is left unchanged. Finally, the counter of the hit block is cleared to 0.
(3) When a replacement is needed, the block with the largest count is replaced.
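The three rules can be sketched in Python as follows. The class name `LruCounterGroup` and the starting state are illustrative: the group of four cache blocks initially holds main memory blocks 1, 3, and 4 with counts 2, 1, and 0, and the fourth block is empty (`None`).

```python
class LruCounterGroup:
    """One cache group under LRU replacement, implemented with per-block counters."""

    def __init__(self, blocks, counters):
        self.blocks = list(blocks)      # main memory block held by each cache block (None = empty)
        self.counters = list(counters)

    def access(self, block):
        if block in self.blocks:        # rule (2): hit
            slot = self.blocks.index(block)
            hit_count = self.counters[slot]
            for i, held in enumerate(self.blocks):
                if held is not None and self.counters[i] < hit_count:
                    self.counters[i] += 1
            self.counters[slot] = 0
            return "hit"
        if None in self.blocks:         # rule (1): load into an empty block
            slot, outcome = self.blocks.index(None), "load"
        else:                           # rule (3): replace the block with the largest count
            slot, outcome = self.counters.index(max(self.counters)), "replace"
        for i, held in enumerate(self.blocks):
            if held is not None and i != slot:
                self.counters[i] += 1   # rule (1): the other counters are incremented
        self.blocks[slot], self.counters[slot] = block, 0
        return outcome

group = LruCounterGroup(blocks=[1, 3, 4, None], counters=[2, 1, 0, 0])
outcomes = [group.access(b) for b in (2, 3, 5)]
print(outcomes, group.blocks, group.counters)
```

Accessing main memory blocks 2, 3, and 5 in turn gives a load, a hit, and a replacement of block 1, whose count had reached 3 (binary 11).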
For example, the cache of the IBM 370/65 machine uses set-associative mapping with four blocks per group, and each block has a two-bit counter. Its operation is shown in Table 2.3.1.
Table 2.3.1 Implementing the LRU policy with the counter method
Main memory block address | Block 4                | Block 2                | Block 3                | Block 5
                          | Block number | Counter | Block number | Counter | Block number | Counter | Block number | Counter
Cache block 0             | 1            | 10      | 1            | 11      | 1            | 11      | 5            | 00
Cache block 1             | 3            | 01      | 3            | 10      | 3            | 00      | 3            | 01
Cache block 2             | 4            | 00      | 4            | 01      | 4            | 10      | 4            | 11
Cache block 3             | Null         | xx      | 2            | 00      | 2            | 01      | 2            | 10
Operation                 | Starting status        | Load                   | Hit                    | Replace
Register stack method: a register stack is provided whose depth equals the number of blocks that are candidates for replacement; with set-associative mapping, this is the number of blocks in a group. From the top of the stack to the bottom, the stack records the numbers of the cache blocks in order from most recently to least recently accessed. The following example uses a group of four blocks, as shown in Table 2.3.2, where 1 to 4 are the four cache block numbers.
Table 2.3.2 Implementing the LRU policy with a register stack
Cache operation | Initial status | Load block 2 | Hit block 4 | Replace block 1
Register 0      | 3              | 2            | 4           | 1
Register 1      | 4              | 3            | 2           | 4
Register 2      | 1              | 4            | 3           | 2
Register 3      | Null           | 1            | 1           | 3
(1) While the cache still has free blocks, a miss can load the data block directly. The number of the newly accessed cache block is pushed onto the top of the stack, and the existing entries are each pushed down one position, as far as the first free register.
(2) When the cache is full and an access hits, the number of the accessed cache block is moved to the top of the stack, and the entries above its original position are each pushed down one position to fill that position. If the access misses, a block must be replaced. The block number in the bottom register of the stack is the one that has gone unused the longest, so the block it identifies is replaced: the number of the newly loaded cache block is pushed onto the top of the stack, every other entry is pushed down one position, and the old bottom entry drops out of the stack.
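The two rules can be sketched in Python with a list whose index 0 is the top of the stack. The class name `LruStack` and its methods are illustrative, and the example starts from a stack holding cache blocks 3, 4, and 1 (top to bottom) with cache block 2 still free.

```python
class LruStack:
    """LRU bookkeeping with a register stack; index 0 is the top (most recently used)."""

    def __init__(self, cache_blocks, initial=()):
        self.cache_blocks = list(cache_blocks)   # all cache block numbers in the group
        self.stack = list(initial)

    def hit(self, cache_block):
        """Rule (2), hit: move the accessed block number to the top of the stack."""
        self.stack.remove(cache_block)
        self.stack.insert(0, cache_block)

    def miss(self):
        """Choose the cache block that receives the new data and return its number."""
        free = [b for b in self.cache_blocks if b not in self.stack]
        if free:
            cache_block = free[0]             # rule (1): a free cache block exists
        else:
            cache_block = self.stack.pop()    # rule (2), miss: bottom of stack is least recently used
        self.stack.insert(0, cache_block)
        return cache_block

lru = LruStack(cache_blocks=[1, 2, 3, 4], initial=[3, 4, 1])
loaded = lru.miss()               # cache block 2 is still free
after_load = list(lru.stack)
lru.hit(4)
after_hit = list(lru.stack)
replaced = lru.miss()             # cache is full: the bottom entry is replaced
after_replace = list(lru.stack)
print(after_load, after_hit, replaced, after_replace)
```

A software list makes the hit case a remove-and-insert; in hardware the same effect comes from shifting only the registers above the hit position.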
Comparison method: a set of hardware logic circuits records the order in which the blocks have been used.
Assume each cache group has four blocks. On a replacement, the block to choose is the one of the four that has gone unused the longest. Among four blocks there are six pairwise comparisons. If one RS flip-flop records each pairwise comparison, six flip-flops are needed (T12, T13, T14, T23, T24, T34). T12 = 0 means that block 1 has gone unused longer than block 2; T12 = 1 means that block 2 has gone unused longer than block 1. Each time a block is accessed or newly loaded, the flip-flops involving that block must be updated. In this way, the combined state of the six flip-flops identifies the block to be replaced. For example, block 1 is replaced when T12 = 0, T13 = 0, and T14 = 0; block 2 is replaced when T12 = 1, T23 = 0, and T24 = 0; and so on.
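A Python sketch of the six-flip-flop scheme follows. The class name `PairwiseLru` and the dictionary `T` keyed by block pairs are illustrative stand-ins for the hardware flip-flops. The update rule on access is an assumption derived from the encoding in the text: the accessed block becomes the most recently used one in every pair it belongs to.

```python
from itertools import combinations

class PairwiseLru:
    """LRU by pairwise comparison: T[(i, j)] == 0 means block i has gone unused longer than block j."""

    def __init__(self, num_blocks=4):
        self.blocks = range(1, num_blocks + 1)
        self.T = {pair: 0 for pair in combinations(self.blocks, 2)}

    def access(self, block):
        """Update every flip-flop that involves the accessed (or newly loaded) block."""
        for (i, j) in self.T:
            if i == block:
                self.T[(i, j)] = 1   # block j is now the one unused longer
            elif j == block:
                self.T[(i, j)] = 0   # block i is now the one unused longer

    def victim(self):
        """Return the block that every pairwise comparison marks as unused longest."""
        for block in self.blocks:
            if all((self.T[(i, j)] == 0) if i == block else (self.T[(i, j)] == 1)
                   for (i, j) in self.T if block in (i, j)):
                return block

lru = PairwiseLru()
for b in (1, 2, 3, 4):
    lru.access(b)
first_victim = lru.victim()    # block 1 was accessed earliest
lru.access(1)
second_victim = lru.victim()   # now block 2 has gone unused longest
print(len(lru.T), first_victim, second_victim)
```

The number of flip-flops grows as n(n - 1)/2 with the number of blocks n per group, which is why this approach suits small groups such as the four-block case discussed here.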
2.3.4 Cache consistency
The cache contents are a copy of part of main memory and should agree with main memory. They can become inconsistent because:
(1) the CPU writes to the cache, but main memory is not updated immediately;
(2) an I/O processor or I/O device writes to main memory.
As a result, the cache contents no longer match main memory, as shown in Figure 2.3.8.
Solutions to inconsistency caused by cache writes:
1. Write-through method (WT), also called the direct write method
Method: whenever data is written to the cache, it is also written to main memory.
Advantages: high reliability and a simple operation.
Disadvantage: writes are not sped up; they take as long as writing to main memory.
2. Write-back method (WB)
Method: when the CPU performs a write, it writes only the cache and not main memory; a modified block is written back to main memory when it is replaced.
Advantage: high speed.
Disadvantage: lower reliability and more complicated control.
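The difference between the two write policies can be illustrated with a toy Python model that counts writes to main memory. The class `TinyCache`, its `dirty` set, and the explicit `evict` call are assumptions of the sketch, standing in for the dirty bit and the replacement step of a real cache.

```python
class TinyCache:
    """Toy cache that counts main memory writes under a given write policy."""

    def __init__(self, policy):
        self.policy = policy      # "write-through" or "write-back"
        self.cache = {}           # block number -> value held in the cache
        self.dirty = set()        # blocks modified in the cache but not yet in main memory
        self.memory = {}          # block number -> value held in main memory
        self.memory_writes = 0

    def write(self, block, value):
        self.cache[block] = value
        if self.policy == "write-through":
            self._write_memory(block, value)   # main memory is updated on every write
        else:
            self.dirty.add(block)              # write-back: only the cache is updated

    def evict(self, block):
        """Remove a block from the cache, writing it back first if it was modified."""
        value = self.cache.pop(block)
        if block in self.dirty:
            self.dirty.discard(block)
            self._write_memory(block, value)

    def _write_memory(self, block, value):
        self.memory[block] = value
        self.memory_writes += 1

wt, wb = TinyCache("write-through"), TinyCache("write-back")
for value in (1, 2, 3):
    wt.write(7, value)
    wb.write(7, value)
stale = 7 not in wb.memory        # main memory has not seen block 7 yet under write-back
wb.evict(7)
print(wt.memory_writes, wb.memory_writes, stale, wb.memory[7])
```

Three writes to the same block cost three main memory writes under write-through but only one under write-back, at the price of main memory being out of date until the block is evicted.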
2.3.5 Cache performance analysis
1. Speedup of the cache system
The main purpose of using a cache in the memory system is to increase memory access speed, so the speedup is an important performance parameter. The speedup Sp of the cache memory system is:
Sp = Tm / T = Tm / (h × Tc + (1 - h) × Tm) = 1 / ((1 - h) + h × Tc/Tm)
where Tm is the access time (cycle) of main memory, Tc is the access time of the cache, T is the equivalent access time of the cache memory system, and h is the hit rate.
The speedup therefore depends on two factors: the hit rate h and the ratio Tc/Tm of the cache access time to the main memory access time. The higher the hit rate, the higher the speedup. Figure 2.3.9 shows the relationship between speedup and hit rate.
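The dependence on h and Tc/Tm can be checked numerically. The Python sketch below assumes the usual model in which the equivalent access time is T = h × Tc + (1 - h) × Tm, so that Sp = Tm / T = 1 / ((1 - h) + h × Tc/Tm); the function name `speedup` and the ratio Tc/Tm = 0.1 are illustrative choices.

```python
def speedup(hit_rate, tc_over_tm):
    """Sp = Tm / T, with T = h*Tc + (1 - h)*Tm, expressed through the ratio Tc/Tm."""
    return 1.0 / ((1.0 - hit_rate) + hit_rate * tc_over_tm)

# Assume the cache is ten times faster than main memory (Tc/Tm = 0.1)
table = [(h, round(speedup(h, 0.1), 2)) for h in (0.0, 0.5, 0.9, 0.99, 1.0)]
for h, sp in table:
    print(f"h = {h:<4}  Sp = {sp}")
```

Under this model the speedup rises slowly at first and steeply as h approaches 1, where it reaches the upper bound Tm/Tc.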
2. Cache hit rate
Many factors affect the cache hit rate, including the cache capacity, the block size, the mapping method, the replacement policy, and the distribution of the address stream during program execution. In general, the larger the cache capacity, the higher the hit rate, but once the capacity reaches a certain level the hit rate improves only slightly. Likewise, a larger block size raises the hit rate up to a point, beyond which the hit rate falls. Direct mapping has a low hit rate and fully associative mapping has a high hit rate. With set-associative mapping, the more groups the cache is divided into, the lower the hit rate.
Article by: http://www.cnblogs.com/freebye/archive/2005/04/08/133699.html