This chapter focuses on page-based virtual memory mapping and the address translation process; the LRU and FIFO replacement algorithms; LRU stack analysis; set-associative cache address mapping and LRU block replacement; and virtual storage. Performance analysis of the cache must be comprehensive. This is a key chapter. The basic concepts required include LRU, FIFO, fully associative mapping, direct mapping, set-associative mapping, the fast table (TLB), hit rate, address translation, pages, segments, segment-page management, virtual memory, the high-speed cache, and so on.
I. The Locality Principle of Memory Access
The computer's requirements for memory are high speed, large capacity, and low price.
Extensive statistics show that 90% of a program's accesses to memory are confined to 10% of the region where the data resides, while the remaining 10% of accesses are spread over the other 90% of the storage space. This is the general principle of locality. The locality of memory access takes two forms:
1. Temporal locality: if a storage item is accessed, it is likely to be accessed again soon.
2. Spatial locality: if a storage item is accessed, that item and the items adjacent to it are likely to be accessed soon.
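As an illustrative sketch (the loop and names are our own, not from the text), the address trace of a simple array-summing loop shows both forms of locality at once:

```python
# Record the "addresses" touched while summing an array.
# Temporal locality: the accumulator location is touched on every iteration.
# Spatial locality: the array elements a[0], a[1], ... sit at adjacent addresses.

def access_trace(n):
    trace = []
    for i in range(n):
        trace.append(("a", i))    # read a[i] -- consecutive addresses
        trace.append(("acc",))    # read-modify-write the running sum
    return trace

trace = access_trace(4)
# ("acc",) recurs throughout the trace (temporal locality), while the
# ("a", i) accesses step through neighbouring locations (spatial locality).
```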
To resolve the conflict between storage capacity and speed, designers apply the locality principle and organize the storage system as a hierarchy. This hierarchical storage system generally consists of registers, high-speed cache, primary memory (main memory), and external memory (hard disk, etc.). The register is the highest level of storage, with the smallest capacity and the fastest speed. Registers are not transparent to the programmer: they are accessed by register name rather than by address.
II. Basic Principles of Storage System Composition
Because the storage system adopts a layered structure, managing data movement between the layers is essential. Management functions are generally distributed across the layers: the storage management controller at each layer controls data access at that layer and the adjacent layers. The unit of data transfer between layers is the block or page.
The hit rate is the ratio of hits to total accesses; the miss rate is the ratio of misses to total accesses. The hit time includes the time required to determine whether the access hits plus the time to access the upper-level storage. The miss penalty includes the access time of the lower-level memory plus the time required to transfer the data from the lower level to the upper level (transfer time).
The purpose of memory design is to reduce the average access time, not simply to raise the hit rate. In other words, the speed indicator of a hierarchical storage system is the average memory access time; other indicators include bandwidth and storage cycle.
Average access time = hit time + miss rate × miss penalty
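The formula can be exercised with a small sketch; the numbers below are illustrative assumptions, not values from the text:

```python
# Average access time for a two-level hierarchy, per the formula above.
def avg_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# e.g. a 1 ns cache hit, 5% miss rate, and a 60 ns penalty to reach main memory:
t = avg_access_time(1.0, 0.05, 60.0)   # -> 4.0 ns
```

Note how a modest miss rate dominates the result: raising the hit rate matters only insofar as it lowers this average.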
The Hierarchical Storage System must solve three problems:
1. The placement and lookup problem: where is a data block stored in the higher-level memory, and how is that block identified and found? This is the block identification and addressing problem. Generally, a lookup table is used to map, identify, and locate the block or page.
2. The replacement problem: on a miss, the data block must be brought up from the lower level. If the upper level is full, which upper-level block should be replaced, and what is the best method? This is the replacement policy problem.
3. The update problem: on a write access, when should the upper-level result be written to the lower-level storage? After computation the upper-level data is newer than the lower-level data, so the write method must solve the data-consistency problem between the upper and lower layers.
This chapter focuses on solving these three problems. Once they are solved, the main problems of hierarchical storage system management are solved as well.
III. Cache
This section describes how the Cache solves the three problems above; the other layers are managed in a similar way.
The high-speed cache is a storage subsystem between the CPU and primary memory. Its main purpose is to raise the average access speed of the memory so that memory speed matches CPU speed.
1. Basic working principle and structure of Cache
The Cache usually consists of two parts: the block table (directory) and the fast storage itself. Figure 7.4 shows the basic structure of the Cache. The working principle is as follows: the processor accesses memory with a main-memory address; the high-order segment of that address is checked by the main-memory-to-Cache address-mapping mechanism to determine whether the addressed storage unit is in the Cache. If it is, the Cache hits and is accessed using the Cache address. Otherwise the Cache misses, primary memory must be accessed, and the corresponding data block is transferred from primary memory into the Cache; if the Cache is full, some block must first be replaced according to an algorithm and the address-mapping relationship updated.
From this working principle we can see that two problems are involved: first placement and lookup, then replacement.
The Cache is transparent to programmers. Address translation and the block replacement algorithm are implemented in hardware. The Cache is generally integrated into the CPU to increase access speed.
2. Address mapping and translation in the Cache
Because the processor issues main-memory addresses and the Cache is much smaller than primary memory, how do we know whether the accessed content is in the Cache, and where in the Cache it is? This requires an address mapping from main-memory addresses to Cache addresses, so that each storage block in the Cache corresponds to several blocks of primary memory. Then, when a main-memory address is accessed, we can tell which Cache location (if any) holds it. There are three mapping methods: direct mapping, fully associative mapping, and set-associative mapping.
Direct mapping maps each main-memory block to one fixed location in the Cache. At any time, the data of a given main-memory unit can be loaded into only that one Cache location. If data already occupies that location, a conflict occurs and the original block is replaced unconditionally.
Fully associative mapping allows any main-memory block to be mapped to any Cache location, so the data of a main-memory unit can be placed anywhere in the Cache. A block conflict occurs only when all the blocks in the Cache are full.
Set-associative mapping divides the Cache into several sets: the mapping between main memory and the sets is direct, while the blocks within a set are fully associative.
The following table compares the three mapping modes.
| | Direct mapping | Fully associative mapping | Set-associative mapping |
| --- | --- | --- | --- |
| Translation process | (1) The main-memory address is divided into tag (area code), block number, and in-block address. (2) The block-number and in-block fields are taken directly as the Cache address. (3) The directory table is indexed by block number, and the stored tag is compared with the tag in the main-memory address. (4) If they are equal, the access hits. (5) If they are not equal, or the block is invalid, Cache access stops; primary memory is accessed and the block is loaded. | (1) The main-memory address is divided into the main-memory block number and the in-block address. (2) The main-memory block number is compared associatively with the directory table. (3) On a match, the Cache block number is read out and concatenated with the in-block address to access the Cache. (4) If none match, a block miss occurs and the block is loaded. | (1) The main-memory address is divided into tag, set number, block number, and in-block address. (2) The set number selects a set. (3) Within the set, the tag and block number are compared associatively. (4) If no match is found, the access misses. (5) On a match, the Cache block number read out is concatenated with the set number and in-block address to form the Cache address. |
| Directory table | Length: number of Cache blocks. Width: main-memory address bits − Cache address bits (the tag). | Length: number of Cache blocks. Width: (main-memory block number + Cache block number) bits; the main-memory block number participates in the comparison. | Length: number of Cache blocks. Width: (tag + block number + Cache block number) bits; the tag and block number participate in the comparison. |
| Advantages | (1) Saves hardware: small directory table, low cost. (2) The Cache and the tag table can be accessed simultaneously. | (1) Fewest block conflicts. (2) Highest Cache space utilization. | Combines the advantages of fully associative and direct mapping, compensating for their respective shortcomings. |
| Disadvantages | (1) High probability of block conflicts. (2) Low Cache space utilization. | (1) The mapping table is long. (2) Associative lookup is slow. | Block conflicts are still more frequent than with fully associative mapping; utilization is lower than fully associative; the directory table is larger than with direct mapping. |
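The three schemes can be sketched as functions from a main-memory block number to the set of Cache slots that may hold it. The Cache geometry below (8 blocks, 2-way sets) is an illustrative assumption:

```python
# Which cache slots may hold a given main-memory block, per mapping scheme.
CACHE_BLOCKS = 8          # blocks in the cache
WAYS = 2                  # associativity for the set-associative case
SETS = CACHE_BLOCKS // WAYS

def direct_mapped_slot(block_no):
    # Direct mapping: each memory block has exactly one possible cache slot.
    return block_no % CACHE_BLOCKS

def fully_associative_slots(block_no):
    # Fully associative: any cache block may hold it.
    return list(range(CACHE_BLOCKS))

def set_associative_slots(block_no):
    # Set-associative: direct mapping between sets,
    # fully associative among the ways within a set.
    s = block_no % SETS
    return [s * WAYS + w for w in range(WAYS)]

print(direct_mapped_slot(13))        # 13 % 8 = 5
print(set_associative_slots(13))     # set 13 % 4 = 1 -> slots [2, 3]
```

The modulo on the block number plays the role of the "block number / set number" field extracted from the main-memory address in the table above.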
3. Replacement and main-memory update policies
There are three methods for the Cache to fetch blocks from primary memory: demand fetch, prefetch, and selective fetch. Each has its own advantages and disadvantages; note the comparison. The textbook remarks that "it is more appropriate to place shared data in primary memory than in the Cache, especially in multiprocessor systems": shared data is often rewritten by other processors, which raises data-consistency issues, so keeping a single copy in primary memory avoids consistency errors.
In a hierarchical storage system, when content at one level is accessed, the data block is copied to the level above. Since the upper level has less capacity than the lower level, copying data upward may force an existing block out. If the replaced block contains newly written data (such as computation results), that data must be written back to the corresponding block of the lower-level storage; this involves the update policy.
In direct mapping mode there is no block replacement algorithm: because each block's position is fixed, the location of the incoming block is determined directly. The other two mapping modes need a replacement policy to decide which Cache block the incoming block should displace, i.e., a replacement algorithm.
The replacement algorithm aims at overall memory performance, mainly the access hit rate of the upper-level memory. The following table compares several common replacement algorithms:
| | Idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Random (RAND) | Generate the number of the page to be replaced with a software or hardware random-number generator | Simple and easy to implement | Uses no "history information" of the upper-level storage, does not reflect program locality, low hit rate |
| First-in first-out (FIFO) | Replace the page that was loaded into the upper level earliest | Easy to implement; uses the loading history of primary memory | Does not correctly reflect the locality principle, so the hit rate is not high; an anomaly (Belady's anomaly) may occur |
| Least recently used (LRU) | Replace the page that has gone unused for the longest time | Correctly reflects program locality; uses the history of memory accesses; high hit rate | Complicated to implement |
| Optimal (OPT) | Replace the page that will not be used for the longest time in the future | Highest hit rate; serves as a yardstick for other replacement algorithms | Unrealizable; an ideal algorithm only |
For block replacement policies, you must master the replacement process of FIFO and especially of the LRU algorithm, and be able to tabulate the allocation states step by step for analysis.
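The tabulation exercise can be checked against a small simulator. This is a minimal sketch (frame count and reference string are illustrative) that returns the hit count of each policy over a reference string:

```python
from collections import OrderedDict, deque

def fifo_hits(refs, frames):
    q, resident, hits = deque(), set(), 0
    for r in refs:
        if r in resident:
            hits += 1                          # FIFO does not reorder on a hit
        else:
            if len(q) == frames:
                resident.discard(q.popleft())  # evict the earliest-loaded block
            q.append(r)
            resident.add(r)
    return hits

def lru_hits(refs, frames):
    cache, hits = OrderedDict(), 0
    for r in refs:
        if r in cache:
            hits += 1
            cache.move_to_end(r)               # mark as most recently used
        else:
            if len(cache) == frames:
                cache.popitem(last=False)      # evict the least recently used
            cache[r] = True
    return hits

refs = [1, 2, 3, 1, 4, 1, 2, 5]
print(fifo_hits(refs, 3), lru_hits(refs, 3))   # LRU wins here: 1 vs 2 hits
```

On this string LRU hits twice where FIFO hits only once, because LRU keeps the repeatedly used page 1 resident while FIFO evicts it by load order.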
To maintain consistency between the data in the Cache and the data in primary memory, writes must eventually reach both. There are generally two update policies:
| Update policy | Idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Write-back | When the CPU executes a write, the data is written only to the Cache. Only when the block is about to be replaced is the modified Cache block first written back to primary memory, and then the new block loaded. | Saves the large, unnecessary overhead of writing intermediate results to primary memory. | Requires a modified (dirty) bit, increasing Cache complexity. |
| Write-through (direct write) | On a write, the data is written to both the Cache and primary memory. | Simple and low in hardware cost. | Much time is spent writing intermediate results to primary memory. |
In addition, on a write miss (the block being written has already been replaced and cannot be found in the Cache), should the block be brought back into the Cache? There are two approaches. The first is no-write-allocate: the data is written directly to primary memory, and the block corresponding to that address is not transferred back into the Cache. The second is write-allocate: the data is written to primary memory and the block is transferred from primary memory into the Cache. Generally, the write-back method uses write-allocate, while the write-through method does not.
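The traffic difference between the two update policies can be sketched with a toy count. We assume (for illustration only) that every CPU write hits the cache and that every dirty block is eventually evicted exactly once:

```python
# Main-memory write traffic under the two update policies.
# 'writes' lists the block numbers written by the CPU, all cache hits here.

def write_through_traffic(writes):
    # Write-through: every CPU write also goes to main memory.
    return len(writes)

def write_back_traffic(writes):
    # Write-back: a block goes to memory only once, when its dirty
    # copy is evicted (assumed to happen exactly once per block).
    return len(set(writes))

writes = [7, 7, 7, 3, 7, 3]
print(write_through_traffic(writes))  # 6 memory writes
print(write_back_traffic(writes))     # 2 (blocks 7 and 3, written back once each)
```

Repeatedly rewritten intermediate results are exactly the case where write-back saves traffic, matching the advantage listed in the table.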
4. Data Cache, instruction Cache, and unified Cache
A cache that stores data and instructions separately gives the first two types; a unified cache stores both data and instructions. In general, the hit rate improves after the caches are separated.
5. cache Performance Analysis (Simple Application)
The Cache hit rate has a significant impact on computer speed. Practice shows that the smaller the Cache capacity, the greater the influence of the address-mapping method and the replacement policy on the hit rate.
When the set size is fixed, the larger the Cache capacity, the higher the hit rate.
When the Cache capacity is fixed, the set size and block size affect the hit rate: the larger the set, the closer the mapping approaches fully associative and the higher the hit rate.
We know that the speed and performance of the storage system should be measured by the average access time. The formula for calculating the average read access time is as follows:
Ta = Hc × Tc + (1 − Hc) × Tm
Here Hc is the hit rate, Tc the access time on a hit, and Tm the access time on a miss. The same formula can be applied level by level for a multi-level cache: the upper-level hit time plus, on a miss, the time to access the lower-level memory.
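Applying the formula level by level can be sketched as follows; all the timing numbers are illustrative assumptions:

```python
# Ta = Hc*Tc + (1 - Hc)*Tm, extended to a two-level cache:
# the miss path of L1 is itself the average access time of the L2/memory level.

def avg_time(hit_rate, hit_time, miss_time):
    return hit_rate * hit_time + (1 - hit_rate) * miss_time

t_l2  = avg_time(0.80, 10.0, 100.0)   # L2: 80% hits at 10 ns, else 100 ns memory
t_sys = avg_time(0.95, 1.0, t_l2)     # L1: 95% hits at 1 ns, else the L2 level
print(t_l2, t_sys)                    # 28.0 ns at the L2 level, 2.35 ns overall
```

Note how the fast L1 keeps the overall average near its own hit time even though the lower levels are much slower; this is the point of the hierarchy.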
IV. Widening Primary Memory Bandwidth
We have studied the cache, the upper level of the hierarchical storage system; now we turn to primary memory.
Memory performance indicators include capacity, speed, and price. Memory speed indicators include access time, storage cycle time, and bandwidth. The main measures for increasing primary-memory bandwidth are:
1. Increase the data width of the memory (i.e., the number of data bits read out at once).
2. Use multi-bank interleaving.
Multi-bank interleaving is an effective way to increase memory bandwidth. Here is an explanation in plainer terms:
We know that every storage unit in memory must be given an address to be accessed. A parallel memory is composed of multiple banks, and parallel access can speed things up, but this depends on the addressing scheme. Suppose the primary memory has eight storage units (2 to the 3rd power, chosen for simplicity; real memories have far more), so the whole address space is covered by three binary digits, 000 to 111, and the memory is built from two banks. There are two addressing methods.
One is high-order interleaving: the top bit of the address code selects the bank, 0 for the first bank and 1 for the second (with four banks the top two bits would be used, and so on). The units of the first bank are numbered 000, 001, 010, 011 (note the leading 0), and the four addresses of the second bank are 100, 101, 110, 111. With this scheme, accessing two adjacent units, say 110 and 111, touches only the second bank, so only that bank works while the first sits idle. Since data is usually stored at contiguous addresses, we can now see why high-order interleaving suits multiprocessor systems: each processor generally accesses its own data, which can be placed in different banks, so the banks work simultaneously and speed increases.
The other method is low-order interleaving: the bottom bit of the address code selects the bank. The first bank then holds units 000, 010, 100, 110 (the last bit always 0) and the second holds 001, 011, 101, 111. This scheme spreads storage units with adjacent addresses across different banks, so when accessing data at adjacent addresses, multiple banks can work in parallel; it therefore suits high-speed sequential access within a single processor.
Increasing the memory data width, generally with a single-bank multi-word organization, is the other way to increase bandwidth. In practice, the actual bandwidth of an interleaved multi-bank memory is higher than that of a single-bank multi-word memory.
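The two bank-selection rules for the 8-unit, two-bank example above can be sketched directly:

```python
# Splitting a 3-bit address between two banks under high-order vs
# low-order interleaving, matching the 8-unit example in the text.
NUM_BANKS = 2

def high_order_bank(addr):
    # High-order interleaving: the top bit selects the bank,
    # so consecutive addresses stay in the same bank.
    return addr >> 2          # for a 3-bit address, bit 2 is the bank number

def low_order_bank(addr):
    # Low-order interleaving: the bottom bit selects the bank,
    # so consecutive addresses alternate between banks.
    return addr % NUM_BANKS

print([high_order_bank(a) for a in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print([low_order_bank(a) for a in range(8)])   # [0, 1, 0, 1, 0, 1, 0, 1]
```

The alternating pattern in the second line is exactly what lets a sequential access stream keep both banks busy at once.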
V. Virtual Memory
Virtual memory is an extension of primary memory. The size of the virtual space depends on the computer's addressing capability rather than on the actual size of external storage; the real storage space can be smaller than the virtual address space. From the programmer's point of view, external storage is regarded as part of one logical storage space, and the addresses issued are logical (virtual) addresses. Virtual memory thus gives the storage system the capacity of external storage with an access speed close to that of primary memory.
Access to virtual memory likewise involves a mapping between virtual and real addresses and a replacement algorithm, much as in the cache. The address mapping discussed earlier works in units of blocks; in virtual memory it works in units of pages. Designing a virtual storage system requires attention to primary-memory space utilization and the primary-memory hit rate.
Virtual-memory management and cache management have much in common: both need address-mapping tables and address-translation mechanisms. But they also differ in important ways; note the comparison.
Virtual memory has three management methods, classified by storage-mapping algorithm: segment, page, and segment-page management. Their basic principles are similar.
Segment management: a storage management method that allocates primary memory by segments; it is a modular scheme. Each program module can be made one segment, and a module can access only the primary-memory space of the segments allocated to it. Segment length can be set freely and can grow or shrink.
The system uses a segment table to record each segment's position in primary memory. The segment table includes the segment name (segment number), segment start address, loaded bit, and segment length. The segment table itself is also a segment. Segments are generally divided by program module.
Page management: divides the virtual space and the real space into fixed-size pages. Each virtual page can be loaded into a different real page position in primary memory. Under paging, the processor's logical address consists of a virtual page number and an in-page address; the real address likewise consists of a page number and an in-page address. The address-mapping mechanism translates the virtual page number into the real page number in primary memory.
Page management uses a page table, containing for each page the page number, its starting position in primary memory, and the loaded bit. The page table is a mapping table from virtual page numbers to physical page numbers. Page management is implemented by the operating system and is transparent to application programmers.
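The page-table translation can be sketched as follows; the page size and table contents are illustrative assumptions:

```python
# Page-mode address translation: split the virtual address into a virtual
# page number and an in-page offset, then look up the page table.
PAGE_SIZE = 4096                      # 4 KiB pages -> 12-bit in-page offset
page_table = {0: 5, 1: 2, 3: 7}       # virtual page no. -> physical page no.

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # Missing page: in a real system this raises a page-fault interrupt
        # and the OS loads the page from external storage.
        raise LookupError("page fault: page %d not loaded" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))   # virtual page 1 -> physical page 2: 0x2234
```

The offset passes through unchanged; only the page number is remapped, which is why the page table needs one entry per page rather than per byte.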
Segment-page management: a combination of the two methods above. Memory is divided into segments by logical module, and each segment is divided into several pages. Memory access goes through a segment table and several page tables. The segment length must be an integer multiple of the page length, and a segment must start at a page boundary.
Operating systems today generally adopt segment-page management. The following table compares the three management methods:
| | Address translation process | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Segment management | The multi-user (module) address is divided into three parts: program number, segment number, and in-segment offset. (1) The program number locates the segment-table base register, which holds the start address and length of the segment table. (2) The segment number is checked against the segment-table length for an out-of-bounds violation; if normal, translation continues. (3) The corresponding segment-table entry is found; it contains the primary-memory address, loaded bit, access rights, segment length, and secondary-storage address. (4) If the loaded bit is 1 (in primary memory), go to (5); otherwise a missing-segment interrupt occurs and the segment is loaded from secondary storage. (5) The real physical address is formed as the primary-memory segment start address + the in-segment offset. | (1) Programs are segmented by module, supporting multiprogramming and parallel development and reducing programming time. (2) Segments are relatively independent, so modifying or extending one does not affect the others. (3) Implements virtual storage. (4) Easy sharing and protection. | (1) Managing primary memory by segments gives low utilization and many fragments. (2) Forming an effective address requires several memory accesses, reducing access speed. (3) Allocating and reclaiming free areas is complicated. (4) The address and length fields in the segment table are long, reducing lookup speed. |
| Page management | The user logical address is divided into three parts: user number, virtual page number, and in-page offset. (1) The user number locates the page-table base register. (2) The page-table start address plus the page number locates the page-table entry. (3) If the loaded bit is 1 (in primary memory), go to (4); otherwise a missing-page interrupt occurs. (4) The effective address is formed from the real page number and the in-page offset. | (1) Page-table entries are short, reducing table-access time. (2) Less fragmentation. (3) Fast transfer. | (1) Paging is forced: a page has no logical meaning, which is unfavorable for storage protection and extension. (2) Forming an effective address still requires multiple accesses, reducing speed. |
| Segment-page management | The user logical address is divided into four parts: user number, segment number, page number, and in-page offset. (1) The user number locates the segment-table base register. (2) The segment number is checked against the segment-table length for an out-of-bounds violation. (3) The segment-table start address plus the segment number locates the segment-table entry. (4) The loaded bit and segment length are checked. (5) The page-table start address plus the page number locates the page-table entry. (6) The loaded bit is checked. (7) The real page number plus the in-page offset forms the effective address. | Combines the advantages of segmentation and paging. | Forming an effective address requires three memory accesses; speed is slow. |
Please understand the management methods in the textbook.
Page-based virtual memory structure and its implementation: the main problems to be solved are the handling of page faults and the speed of translation from virtual addresses to real addresses. There are also virtual-memory protection issues.
During address translation in virtual memory, the virtual page number must be translated into a primary-memory page number; this internal address translation is generally done by querying the internal page table. If the page is absent, the external page table must also be consulted and the page transferred in from external storage. Improving page-table access speed is therefore the key to improving translation speed. By the locality principle of memory access, the entries for the pages most likely to be used are placed in a small table built from fast hardware, while the full table resides in main memory; this introduces the concepts of the fast table (TLB) and the slow table. On a lookup, the fast table and the slow table can be searched simultaneously. The fast table is transparent to all programmers.
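The fast-table/slow-table arrangement can be sketched as a small LRU-managed lookup in front of the full page table. The sizes, the contents of the tables, and the sequential (rather than simultaneous) search order are illustrative simplifications:

```python
from collections import OrderedDict

slow_table = {vpn: vpn + 100 for vpn in range(64)}   # full page table in memory

class FastTable:
    """A tiny TLB: check the fast table first; on a miss, fall back to the
    slow table and cache the entry, evicting the least recently used one."""
    def __init__(self, capacity=4):
        self.entries = OrderedDict()                  # LRU-ordered entries
        self.capacity = capacity
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)             # refresh LRU position
            return self.entries[vpn]
        self.misses += 1
        ppn = slow_table[vpn]                         # slow path: full page table
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)          # evict least recently used
        self.entries[vpn] = ppn
        return ppn

tlb = FastTable()
for vpn in [1, 2, 1, 1, 3]:                           # locality: repeats hit
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)                           # 2 hits, 3 misses
```

Because of locality, the handful of entries in the fast table absorbs most translations, which is exactly why such a small hardware table pays off.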
Virtual-memory protection is essential for multiprogrammed systems and multi-user systems. Storage-system protection divides into protection of storage areas and protection of access modes. Protection methods for virtual memory include mapping-table protection, key protection, and ring protection.