"In-depth understanding of computer systems" notes
1. Review
The larger a storage device, the longer its addressing time and the lower its efficiency. Although in principle any piece of data might eventually be needed, at any given stage of execution the range of data actually in use is relatively fixed.
Processors need to run fast, which means they need to fetch instructions and data quickly; but those instructions and data sit in low-level storage (a hard disk, whether local or on the network). Taking a hard disk as an example, a read costs seek time plus rotational latency plus transfer time, which is far too slow. To feed the processor data faster and make better use of its performance, modern processors add functional units such as branch prediction, allowing the processor to compute past a branch before it is resolved (the details are out of scope here). In addition, besides the internal register file, computers insert a multilevel cache between the processor and main memory to hold the instructions and data the processor is about to use. So while the processor is running, techniques such as branch prediction can bring the instructions and data that will be needed next into the cache (instruction cache / data cache) ahead of time, which keeps the whole processor pipeline running more efficiently.
Note: going down from the register file (L0) through the L1, L2, and L3 caches, access efficiency drops by up to a hundredfold, and L4 (main memory) compared with L1 is a world of difference. Under otherwise equal conditions, the larger the storage, the slower the addressing. The levels also differ in materials and technology, so size is not the only factor, but the materials side is outside the scope of this discussion.
2. Cache
A cache is a cache of the storage level below it; put another way, each level caches the one beneath it (L1 is a cache of L2, L2 is a cache of L3). By establishing a set of rules and logic, data from low-level storage is read into the higher-level cache on demand.
The structure of a cache divides into the following parts (concepts):

Set    Line    Data block
 S      E        B

The total size of a cache is C = S * E * B, where S (the number of sets) and B (the block size in bytes) must each be a power of 2.
2.1 Why divide it this way?
We need a mechanism that maps a memory address to a location in the cache, and this division is one way to build that mapping.
S is the number of sets in the cache, E is the number of lines per set, and B is the block size (in bytes) of each line. Given S and B, we can mask a memory address to obtain the corresponding set index and block offset.
In simple steps:
1) Information in a computer is represented in binary.
2) The number of sets S is a power of 2, which means every set index fits exactly into a fixed number of binary digits. (If S were 5, the valid set indices would be 0, 1, 10, 11, 100 in binary; but a 3-bit field also admits 101, 110, and 111, so the set index could not be obtained by simply slicing 3 bits out of the address.)
3) The block size B is also a power of 2, for the same reason as S.
So s = log2(S) is the number of set-index bits in the address, and b = log2(B) is the number of block-offset bits.
Take a 16-bit address as an example:

1000 1000 1000 1000
   T        S      B

The high bits are the tag T, the middle s bits are the set index S, and the low b bits are the block offset B.
2.2 What is T?
T is the tag. Think about it: since the cache caches a lower storage level, it must be smaller than that level (which is part of why it is faster). Being smaller means many memory addresses have to share the same cache location (the replacement rules are covered later), so each cache line needs a tag to record which memory block it currently holds; otherwise, on a read, we could not tell whose data we were reading.
The number of tag bits is t = m - s - b, where m is the length of a memory address in bits: strip off the set-index bits and the block-offset bits, and what remains is the tag.
2.3 So what does the cache look like overall?
Sets and lines organize the cache into a tree-like structure. When we need to read from low-level storage, the address is split into T, S, and B fields according to the rules above: the S field locates the corresponding set (E lines, each of cache-line size), the tag T identifies the matching line within that set, and the block offset B picks the data out of that line's block. At a high level it really is that simple.
In practice there are many detailed questions: what if the lookup misses? What if the tag matches but the line is not yet warmed up (no data has been loaded)? If all the lines in the set an address maps to are full, which one should be replaced? And so on. Today's computers have policies for each of these; interested readers can consult the book.
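To make the lookup concrete, here is a minimal sketch of the data structures and the hit/miss check. The geometry and all names (`line_t`, `cache_lookup`, etc.) are my own assumptions for illustration, not from the book or the author's code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical geometry for this sketch. */
#define SETS  16  /* S: number of sets         */
#define E      2  /* lines per set             */
#define BLOCK 16  /* B: block size in bytes    */

typedef struct {
    bool     valid;        /* has data been loaded into this line?  */
    unsigned tag;          /* identifies which memory block is held */
    uint8_t  data[BLOCK];  /* the cached bytes                      */
} line_t;

typedef struct { line_t lines[E];   } set_t;
typedef struct { set_t  sets[SETS]; } cache_t;

/* Return a pointer to the requested byte on a hit, NULL on a miss. */
uint8_t *cache_lookup(cache_t *c, unsigned tag, unsigned set, unsigned offset) {
    for (int i = 0; i < E; i++) {
        line_t *l = &c->sets[set].lines[i];
        if (l->valid && l->tag == tag)  /* valid bit AND tag must both match */
            return &l->data[offset];
    }
    return NULL;  /* miss: the caller must fetch the block from below */
}
```

On a miss the caller would read the block from low-level storage into a line of this set (evicting one if the set is full) and retry.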
3. What am I going to do?
I am going to try writing a cache mechanism in C. First, I have no real low-level storage (this is not hardware for now), so low-level storage is abstracted as a pointer to a region of memory. The cache, likewise, is abstracted as a pointer to another region of memory. Writing this should give a better understanding of how a cache works.
The functions to be implemented:
1) The cache read path: decompose a memory address into its T, S, and B fields and fetch the corresponding cache line from the "cache space".
2) The cache write path: the current plan is write-through, i.e. writes go straight to "low-level storage" without tracking the cache line's write state.
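For the write path, here is a sketch of the planned write-through behavior, with "low-level storage" abstracted as a plain byte array; the names and sizes are my assumptions, not the author's final design:

```c
#include <stdint.h>

/* "Low-level storage" abstracted as a plain byte array (assumed size). */
#define MEM_SIZE 4096
static uint8_t memory[MEM_SIZE];

/* Write-through: every write goes straight to low-level storage, so no
 * dirty bit is needed. A fuller version would also update or invalidate
 * the matching cache line, if any, to keep the cache consistent. */
void cache_write(uint16_t addr, uint8_t value) {
    memory[addr % MEM_SIZE] = value;
}

uint8_t mem_read(uint16_t addr) {
    return memory[addr % MEM_SIZE];
}
```

The trade-off: write-through is simple (no dirty-line bookkeeping, storage is always up to date) at the cost of every write paying the low-level storage latency.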
At present the component diagram and activity diagrams have been drawn up, as follows:
1) Component diagram:
2) Interaction diagram
3) Read activity diagram
4) Write activity diagram
The code will be pushed to Git@code.aliyun.com:qdxiayongming/c.learn.git later.
Corrections from more experienced readers are welcome.