A brief talk on some basic programming optimization techniques in C

Source: Internet
Author: User

Probably everyone who starts learning C has been told by their predecessors that C is the fastest programming language in the world. This is not bragging, nor a put-down of other languages: it is not that only C can produce fast code, but that C makes it easier to write high-speed programs (high speed does not necessarily mean high efficiency). Still, even a good tool can only languish in the hands of an unskilled user.

With modern compilers and modern CPUs, we should try to work with the CPU's design (its architecture, its instruction handling, and so on). The compiler works for the programmer and does its best to optimize the code we write, but it is still bound by what the hardware can do: if our code does not let the compiler understand our intent, the compiler cannot optimize it for us, and we will not get a fast program.

For now we can set the CPU's design aside: at the source level we only need to consider how to meet the compiler's optimization rules, and the CPU is the business of the language and the compiler.

In C, there are several ways to increase the speed of a program:

First, design a reasonable structure; as the old joke goes, the biggest performance improvement is when a program goes from not working to working. Then:

    • Avoid repeated, unnecessary function calls.
    • Eliminate unnecessary memory references (within reason).
    • Use loop unrolling; the compiler's optimization options will generally rewrite your code and unroll loops for you.
    • For the time-critical core of a computation, apply these techniques by hand to improve its speed.
    • Try several styles of writing instead of trusting intuition, because the computer does not think the way you do.

Note: As compiler versions advance, even with optimization options turned off, the compiler's built-in improvements still provide some optimization for the code we write; this is one reason not to use an old compiler. A programmer should not rely too heavily on the compiler, but the times move forward and the amount of information grows without bound, while the human brain and its energy stay limited. In other words, since our memory is finite, we should not focus on redoing what our predecessors have already done, but stand on the shoulders of giants and look farther. We should make full use of tools to take care of what already exists, so that programmers can focus on discovering new ideas. Relying on the compiler is not a sin, at least until the Turing test has been broken.

Take GCC as an example of a current compiler (GCC is more than just a compiler, but here it stands in for compilers in general): -O2 is an optimization level acceptable to most people. Among other compilers, Clang (developed under the LLVM project, with Apple and Google among its backers) is also a good choice. With the right options, GCC can generally perform loop-unrolling optimizations automatically.
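For example, a typical invocation (these are real GCC flags; -funroll-loops explicitly requests loop unrolling on top of -O2):

 gcc -O2 -funroll-loops main.c -o main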

Let's begin.

 /*struct.h*/
 #include <stdio.h>

 typedef struct me {
   int value;
   struct me* next;
 } data_t;

 typedef struct {
   int index;
   data_t* storage;
 } block;

For testing convenience we first define two structures:

block represents a block: each block has an index (int) and a data field (data_t*).
data_t represents the data field; it is essentially a linked list, and each data_t object holds one piece of data and one pointer.

 /*main.c*/
 #include "struct.h"
 #define arr_size 16   /* example size; any positive count works */

 static inline int Get_len (const data_t* data)
 {
  int len = 0;

  if (!data)
   fprintf (stderr, "The data in %p is null\n", (const void*) data);
  else while (data)
  {
   ++len;
   data = data->next;
  }
  return len;
 }

 static inline void mix_cal (const block* process, int result[])
 {
  for (int i = 0; i < Get_len (process->storage); ++i)
  {
   *result += (process->storage)[i].value;
  }
 }

At this point we have two test functions: Get_len obtains the length of a data_t list, and mix_cal computes the sum of the data field.

 /*main.c*/
 int main (void)
 {
  block* block_in_all[arr_size] = {NULL};
  int result_in_all[arr_size] = {0};
  /*
   * Assume many 'block' objects exist,
   * placed in an array whose element type is 'block*',
   * and that every block contains a non-null data_t data field.
   */
  for (int i = 0; i < arr_size; ++i)
  {
   mix_cal (block_in_all[i], result_in_all + i);
  }
  for (int i = 0; i < arr_size; ++i)
  {
   printf ("The %dth block has the total %d data\n",
      block_in_all[i]->index, result_in_all[i]);
  }

  return 0;
 }

Read the code above patiently: it sums one field over all the elements. With a little analysis it is easy to spot some drawbacks, the biggest being the call to Get_len inside mix_cal's loop condition. It looks obvious here, but would we notice the problem while writing the program ourselves?
What we must do with such unnecessary function calls is hoist them out, and a temporary variable is a good way to do it, because a temporary gives the compiler a chance to make full use of the CPU's registers. I say a chance, because registers are few and the compiler weighs many complicated factors when allocating them, so a temporary variable will not land in a register every time. But that does not stop us from improving the program's performance.

Here, we should pull the Get_len call out of the for loop's condition and receive its result in a temporary variable outside the loop, instead of calling the function on every iteration:

 int len = Get_len (process->storage);
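With len in hand, the loop condition no longer calls a function on every iteration; a minimal sketch of the reworked loop, using the .value field from the struct definitions above:

 for (int i = 0; i < len; ++i)
  *result += (process->storage)[i].value;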


Staying with the code above, let's talk about loop unrolling.

As for mix_cal, how else could we help the compiler improve its speed? We said that a small change can have a great impact on the final code of a program; the most common approach is simply to try. The road has already been paved, so there is no need to reinvent the wheel.

Loop unrolling:

 int reality = len - 1, i;
 for (i = 0; i < reality; i += 2)
 {
  *result = *result + (process->storage)[i].value
      + (process->storage)[i+1].value;
 }
 for (; i < len; ++i)
 {
  *result += (process->storage)[i].value;
 }

This is 2-way loop unrolling; there is also n-way unrolling, as we will see below.

Likewise, recalling what was said earlier about using registers and cutting unnecessary overhead: in this program a memory reference like (process->storage)[i] is too wasteful, and as always we prefer a cheap temporary variable:

 data_t* local_data = process->storage;

This brings considerable savings to the program. These tasks can be covered by compiler optimization, but once our code becomes hard for the compiler to understand (even though every compiler upgrade aims to improve the optimizer), we are likely to end up with a poorly performing program. So when we are not pressed for time, we can do this work ourselves instead of handing it all to the compiler.

The same waste occurs with the externally stored result: we should accumulate into a temporary variable instead of dereferencing result every time, and write the sum out once at the end.

 int local_result = 0;
 /*...*/
 local_result = local_result + local_data[i].value + local_data[i+1].value;
 /*...*/
 *result = local_result;

Above we called this 2-way loop unrolling, so naturally it can be inferred that there is n-way unrolling, and naturally there is. For n-way unrolling, the loop's upper bound has a simple formula:

 reality = len - n + 1;

(For n = 2 this gives reality = len - 1, exactly the bound used above.) As for which n is best, it still depends on the environment. So the final version should be:

 static inline void mix_cal (const block* process, int result[])
 {
  int local_result = 0;
  int len = Get_len (process->storage);
  int reality = len - 1, i;
  data_t* local_data = process->storage;

  for (i = 0; i < reality; i += 2)
   local_result += local_data[i].value + local_data[i+1].value;
  for (; i < len; ++i)
   local_result += local_data[i].value;

  *result = local_result;
 }

Explanation: the unrolled loop handles the elements in two parts. The first part adds two elements per iteration; since it cannot cover a possible leftover element, the remaining elements are added in the second part.
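To convince ourselves the two-part loop really sums every element, here is a minimal test harness of my own (a sketch, not from the original text). It builds one block whose storage is a contiguous array of data_t, chained through the next pointers so that Get_len can walk it:

 /* test_mix_cal.c -- hypothetical harness; assumes struct.h and the
    final mix_cal above are available in the same project */
 #include <stdio.h>
 #include "struct.h"

 int main (void)
 {
  data_t nodes[5];
  block b = { 0, nodes };
  int result = 0;

  for (int i = 0; i < 5; ++i) {
   nodes[i].value = i + 1;                        /* 1+2+3+4+5 = 15 */
   nodes[i].next  = (i < 4) ? &nodes[i + 1] : NULL;
  }

  mix_cal (&b, &result);
  printf ("sum = %d\n", result);                  /* expected: 15 */
  return 0;
 }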

There is also a technique called reassociation, which regroups the operands of an expression to shorten its critical path, but I have never tested this method, so I will not cover it.

As for the time lost to conditional-branch mispredictions, called the penalty: the popular explanation is that when your code contains a conditional branch, the processor guesses which branch is the correct one and speculatively runs ahead down it, avoiding changes to any actual registers or memory until the real outcome is known. If the guess was wrong, that is miserable: all that speculative work was wasted time. But there is no need to be overly anxious about branch prediction, which is why I saved it for last.

Here are two representative styles, one called imperative and one called functional.

Imperative style:

 for (int i = 0; i < n; ++i)
 {
  if (a[i] > b[i]) {
   int temp = a[i];
   a[i] = b[i];
   b[i] = temp;
  }
 }

Functional style:

 int min, max;
 for (int i = 0; i < n; ++i)
 {
  min = a[i] < b[i] ? a[i] : b[i];
  max = a[i] < b[i] ? b[i] : a[i];
  a[i] = min;
  b[i] = max;
 }


A very clear example: the former obviously performs a different number of steps depending on the data, while the latter performs the same steps in every case.

Each form has its benefit: the former is a good model for predictable data, while the latter is the safe middle path. What counts as predictable or unpredictable? Whether a number is negative or positive, for instance, is unpredictable, and with the former code there will be many penalties.

Multiple parallel chains are also an important way of thinking, although in many people's eyes two statements written in sequence must behave exactly the same as the two merged into one. A drawback of multi-way parallelism is the number of registers it requires: when registers run out (a situation known as spilling), performance stops rising and falls back. The same holds for loop unrolling. This time we use four-way unrolling plus two parallel accumulators:

 int local_result_1 = 0, local_result_2 = 0;
 int reality = len - 3;   /* for 4-way unrolling: reality = len - 4 + 1 */

 for (i = 0; i < reality; i += 4) {
  local_result_1 += local_data[i].value + local_data[i+1].value;
  local_result_2 += local_data[i+2].value + local_data[i+3].value;
 }
 /* This could also be split four ways with one accumulator per element
    (see the sketch below); the approach makes full use of the CPU pipeline. */
 for (; i < len; ++i)
  local_result_1 += local_data[i].value;

 *result = local_result_1 + local_result_2;
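And a sketch of the "four ways, one accumulator per element" variant mentioned in the comment above (my own illustration; len and local_data are as defined earlier):

 int r0 = 0, r1 = 0, r2 = 0, r3 = 0;
 int reality = len - 3;    /* n = 4: reality = len - n + 1 */
 int i;

 for (i = 0; i < reality; i += 4) {
  r0 += local_data[i].value;
  r1 += local_data[i + 1].value;
  r2 += local_data[i + 2].value;
  r3 += local_data[i + 3].value;
 }
 for (; i < len; ++i)      /* leftover elements */
  r0 += local_data[i].value;

 *result = r0 + r1 + r2 + r3;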

Tips:

Why were most of the functions above written with the static inline keywords? The first thing to establish is that static on a function is meaningless for a single file that is not part of a project (meaningless for visibility, that is). Many people are confused by static functions because: "I clearly declared a function static, but I can still access it from other files!"

In fact this comes from not understanding what a C project is. Most people test it like this:

First create two files in one folder, test_static.c and static.h:

 /*static.h*/
 #ifndef STATIC_H
 #define STATIC_H
 #include <stdio.h>

 static void Test (void);

 static void Test (void)
 {
  printf ("Hello world!\n");
 }
 #endif
...

 /*test_static.c*/
 #include <stdio.h>
 #include "static.h"

 int main (void)
 {
  Test ();   /* Compiles and runs. */
  return 0;
 }

Then compile and run, and discover that it works! Doesn't the standard say a static function is not visible in other files? And once the #include "static.h" line is removed, the build fails with Test undefined; beginners fall apart on the spot.

Well, this is really the fault of seniors and textbooks: every outward appearance tells us from the start that a C program is a set of separate files, but nobody tells us to put them into a project first! If you learned C in Visual Studio you may grasp the concept of a project slightly better, although VS is not recommended for learning C.

To see that a static function is visible only within its own file, you need to absorb the concept of a project: every notion of visible or invisible is defined within a project, not across files glued together as above. If you #include a file, then of course everything in it is visible: you pasted it in yourself. So "a static function is invisible from other C source files" is meant per project, per translation unit. That is why you will often see seniors answer your question like this:

 /*static.h*/
 #ifndef STATIC_H
 #define STATIC_H
 #include <stdio.h>

 static void Test (void);

 static void Test (void)
 {
  printf ("Hello world!\n");
 }
 #endif
...

 /*test_static.c*/
 #include <stdio.h>

 void Test (void);
 int main (void)
 {
  Test ();   /* Error: Test is a static function in another file. */
  return 0;
 }

Did you spot it? Compared with the first version, the line #include "static.h" is missing, and now the build fails, because the two files are compiled as separate translation units within one project rather than pasted together by #include. Try it yourself with the project features of whatever IDE you use, instead of just dropping two source files into one folder.

To get back to the point, a final word on static functions: static gives a function internal linkage, keeping it private to its translation unit and giving the compiler more freedom to optimize it, for example by inlining it without keeping an out-of-line copy; that is why it is so often paired with the inline keyword to make a function faster. Everything has pros and cons, though; weigh them yourself.

Note: the memory mountain is a three-dimensional plot of read throughput for working sets of different sizes accessed with different strides. It looks like a mountain: the z-axis is throughput, the x-axis the stride, the y-axis the working-set size (bytes). Some mainstream benchmarking software works on this principle (apply a simple transformation to the memory-mountain image and you get the kind of picture such software renders).

As mentioned above, any minor change may significantly alter a program's performance. Why?

We never dug into this, because inertia makes us assume the computer operates the way a human does, and past experience tells us the computer must surpass humans in every way. In fact, apart from being faster at repetitive work, the computer lags far behind the human brain; even our most ordinary visual recognition (seeing a thing and identifying it) is an extremely advanced field for computers. Our era of computing is still at its beginning; in this era the programmer's role is irreplaceable, and likewise the programmer's every move shapes what the computer becomes.

You have probably been exposed, in many settings, to the main components of a computer; the ones programmers deal with most closely are the CPU, main memory, and the disk. Perhaps many programmers still wonder what coding has to do with memory. Yet for a programmer, and especially one writing C, the biggest influence comes exactly from here. A computer's memory structure is divided into four levels:
CPU registers, cache, main memory, and hard disk.

But have you ever wondered why a computer's memory system is divided into this four-level structure? We know the read/write speed of these four memories decreases level by level, so why not pick one material of modest speed and modest price and build every level out of it?

One explanation given is that a well-written program always tends to access the higher levels of memory. Since today's technology cannot mass-produce the fastest storage cheaply, we arrange the levels hierarchically, which lets us approach the speed of the fastest memory at close to the cost of the cheapest.
Just like on your own computer: when we open a very bulky application, we find that opening it the next time may be quicker. A classic example left over from history is Visual Studio 2008 on Windows XP: the first launch was always sluggish, but after closing it the second launch was smooth. Reference books mention two key points for evaluating a program's speed: temporal locality and spatial locality.
Temporal locality: after a block of memory has been accessed, it is likely to be accessed again in the near future.
Spatial locality: after a block of memory has been accessed, its adjacent storage locations are likely to be accessed.
Improving locality generally improves the performance of the program.
Locality means that when we use some resources, they are kept for a while in a faster, more convenient level of storage, so that the next access to them can be served more efficiently.
To make a less-than-apt analogy: suppose the computer is a house and the CPU is a person. Everything in the house has its fixed place; when this person wants to work he needs his tools, so he fetches them from wherever they live and puts them back when done; those places are memory. After a while this wastes time, since some things are stored too far away. So the person keeps the things he uses most within arm's reach, and his efficiency rises; if something goes unused for a while, he puts it back in its original place to make room for items he needs more. The more frequently an item is used, the closer it stays to the person. That is the meaning of the hierarchical structure of computer memory.
For a program with good locality, we can always find the data we need in the nearest place. Back to the computer: its memory is layered, each layer with its own read/write speed (CPU registers > cache > main memory > hard disk). Our programs always search in order from the fast end to the slow end, and each time a required datum is found it is, as a rule, moved up to the next faster level so that the following access is quicker; finding a datum at a level is what we call a hit. The better the program, the more of the data it currently needs sits toward the fast end. This is what locality means.
Of course, layering memory like this has its critics; it is simply the only way, while a real gap remains between processor speed and memory speed, to keep the processor fully utilized instead of idling while memory reads and writes complete. Perhaps one day, when the price per bit of RAM comes close to that of an ordinary hard disk, memory will simply be the disk. Even today some people use special software to carve a chunk of a machine's large RAM into a virtual disk, whose access speed a hard disk can hardly catch.
Locality:

Locality was introduced above; one embodiment is that the larger the stride, the lower the spatial locality, and usually the lower the performance. The most common example is looping over multidimensional arrays (one reason I rarely use them). A multidimensional array is really just a wrapper around one-dimensional arrays; C has no true multidimensional array, but lays it out as one contiguous one-dimensional block of memory. When we traverse it, C spares us the underlying details and creates the illusion of multidimensional traversal:

Let's revisit the multidimensional array:

 #include <stdio.h>
 int main (void)
 {
  int dim_1_arr[4] = {1, 2, 3, 4};
  int dim_2_arr[2][2] = {{1, 2}, {3, 4}};
  int result_1 = 0;
  int result_2 = 0;

  for (int i = 0; i < 4; ++i)
   result_1 += dim_1_arr[i];
  return 0;
 }

In this example the one-dimensional array is traversed with stride 1. Assuming the array starts at address 0 in memory, the accesses go:

 0 => 4 => 8 => 12

 for (int j = 0; j < 2; ++j) {
  for (int i = 0; i < 2; ++i) {
   result_2 += dim_2_arr[i][j];
  }
 }

In this case, what is our stride? Let's take a look:

0 => 8 => 4 => 12
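We can verify this jump sequence directly. A small sketch of my own that prints each access's byte offset from the start of the array:

 #include <stdio.h>

 int main (void)
 {
  int dim_2_arr[2][2] = {{1, 2}, {3, 4}};
  char* base = (char*) dim_2_arr;

  for (int j = 0; j < 2; ++j)
   for (int i = 0; i < 2; ++i)
    printf ("%d => ", (int)((char*)&dim_2_arr[i][j] - base));
  printf ("\n");          /* with 4-byte int: 0 => 8 => 4 => 12 => */
  return 0;
 }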

You can clearly see the difference in the jumps between the two pieces of code. Why? In the multidimensional traversal we departed from the usual practice, iterating over j first and then i, which forces the program to hop irregularly through the memory block. "Irregular" here means irregular from the computer's point of view; to our eyes there is a pattern, and an excellent compiler can optimize it. In short, this program's spatial locality is relatively poor: programs that make large, irregular jumps through memory suffer for it. This matters most for contiguous blocks of memory, such as C structs.

In fact C can also do object orientation, but it is very complicated, like knitting a sweater with a cudgel. The C struct lets us understand the concept of an object to a certain extent, because it is a complete individual, although undefended against the outside world.

For structs:

 #define VECTOR 4
 typedef struct {
   double salary;
   int index[4];
 } Test_data;

 int main (void)
 {
  int result_1 = 0;
  int result_2 = 0;
  Test_data dim_1_arr[VECTOR];
  /* ... fill in the data ... */

  for (int i = 0; i < VECTOR; ++i)
  {
   for (int j = 0; j < 4; ++j)
    result_1 += dim_1_arr[i].index[j];
  } /* For Loop 1 */

  for (int j = 0; j < 4; ++j)
  {
   for (int i = 0; i < VECTOR; ++i)
    result_2 += dim_1_arr[i].index[j];
  } /* For Loop 2 */
  return 0;
 }

As above, suppose dim_1_arr starts at address 0.

For Loop 1:

 8 => 12 => 16 => 20 => 32 => 36 => ...

For Loop 2:

 8 => 32 => 56 => 80 => 12 => 36 => ...

Even from this partial comparison, Loop 1 has better spatial locality than Loop 2. Clearly, in Loop 2 the CPU's reads hop around irregular memory positions, while Loop 1 reads memory in a monotonically increasing order.

Let's review the relevant facts about C structs here:
For any fully defined struct, the memory each object occupies obeys the rules of memory alignment.
For each member of a struct, the distance of its storage address from the start of the object is called its offset.
Explain:

Memory alignment means that a struct's size is always a multiple of the size of its largest member, where "largest member" refers to the largest basic (built-in) member, namely:

 typedef struct{
   Test_data test_1;
   int  test_2;
 } test_data_2;

 /*...*/
 printf ("the size of test_data_2 = %zu\n", sizeof (test_data_2));
 /*...*/

Output: the size of test_data_2 = 32

 typedef struct{
   int index[4];
   int store_1;
   int store_2;
 } Test_data_3;

 typedef struct{
   Test_data_3 test_3;
   int   test_4;
 } Test_data_4;

 /*...*/
 printf ("the size of Test_data_4 = %zu\n", sizeof (Test_data_4));
 /*...*/

Output:

 the size of Test_data_4 = 28

Carefully compare Test_data_3 with Test_data and you can find the difference: the latter contains a double member, whose length on my machine is 8, while the former replaces it with two int members of length 4 each, so their total lengths look intuitively the same (24 bytes). Yet in actual use we can perceive the difference: test_data_2 pads out to 32 because its largest basic member is a double, while Test_data_4 stays at 28 because everything inside it is an int. This is what I meant above by the most basic member. Although we can use a struct as a whole, it differs in ways like this from a built-in type.
An offset, in layman's terms, is the distance of a member's starting address from the starting position of the object. How does C lay out the size of a struct? Besides memory alignment, the order in which members are declared matters: whoever is declared first sits in front. The offset of a member is its starting address minus the starting address of the object it belongs to. (Note that subtracting two unrelated pointers gives a meaningless result; pointer subtraction is meaningful only when both pointers point into the same object or array, so to avoid latent errors we need to be careful with it.)
Looking back over the loop walkthrough above, you should now see that those numbers were computed from the offsets.
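These numbers are easy to check with sizeof and offsetof from <stddef.h>. A sketch, repeating the definitions from above so it is self-contained (exact values are platform-dependent; the comments give what a typical 64-bit machine produces):

 #include <stdio.h>
 #include <stddef.h>

 typedef struct { double salary; int index[4]; } Test_data;
 typedef struct { Test_data test_1; int test_2; } test_data_2;

 int main (void)
 {
  printf ("sizeof(Test_data)   = %zu\n", sizeof (Test_data));   /* 24 */
  printf ("sizeof(test_data_2) = %zu\n", sizeof (test_data_2)); /* 32: padded to a multiple of 8 */
  printf ("offsetof(Test_data, index)    = %zu\n",
      offsetof (Test_data, index));                             /* 8 */
  printf ("offsetof(test_data_2, test_2) = %zu\n",
      offsetof (test_data_2, test_2));                          /* 24 */
  return 0;
 }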

The reason temporal locality got no detailed introduction is that its biggest influencing factor is the size of the region being worked on, such as the size of the array or file we operate on: the smaller it is, the better the temporal locality, since with a small file we are far more likely to touch the same place many times than with a large one. And the size of the data is often not something we get to choose, so we can only pay more attention to spatial locality.
The cache:

As mentioned earlier, a program with good locality generally runs more efficiently than one with poor locality. For local variables, a good compiler always optimizes as far as it can so that the CPU's registers are put to full use. One level below the registers, closest to them in speed, sits the so-called cache. The cache's greatest role is buffering, and buffering has two layers of meaning:

Caching data, so that the data needed next is as close to the CPU as possible; "close" here is not physical closeness.
Cushioning the huge speed gap between the CPU and main memory, preventing the CPU from idling.
On today's computers the CPU basically has three levels of cache: level 1 (L1), level 2 (L2), and level 3 (L3). You can inspect your CPU's caches with CPU-Z (Windows) or the System Report (macOS). There you will see that the L1 cache has two parts, L1 data and L1 instruction: one holds only data, the other only instructions, and the split lets the core fetch a datum and an instruction at the same time. That is per core: on a multi-core CPU, each core has its own such set (L1 + L2). The L2 cache likewise lives with its core, each core having its own, while the last level, L3, is the only one, shared by all cores.

In general, then, the cache has three levels: the first is special in being split into two independent components; the second, like the third, serves both functions (it stores data and instructions); the first and second belong to each core separately, and the third is shared by all cores. That is why on personal computers the L3 size is usually quoted in megabytes while the first level is mostly in kilobytes.
In practice, people who like to study computers will see their CPU configuration in professional software; in the cache column of levels 1 and 2 you can always see parameters like 2 x 32 KBytes: 32 KBytes is the size of that cache level per core, and the leading 2 is the number of cores, consistent with what was said before.

The cache levels still obey the rule of stepwise slowdown, that is, read cycles go L1 < L2 < L3. The bigger factor is the hit rate mentioned above. A higher level of cache always maps the memory below it into its own storage, and logically the space at the upper level is smaller than at the lower level, since the upper space is more valuable and faster. That makes a one-to-one mapping of the lower level into the upper impossible, so instead of mapping the lower memory completely, the upper level selectively pulls up part of the lower level's contents: this is the operation that follows a miss.

When the CPU reads data through the cache and memory, an extended concept appears: the hit. There are only two outcomes, hit or miss. A freshly initialized cache is necessarily empty; physically it may not be, but to the program it is, and to mark the distinction the cache keeps a bit per group indicating whether it is valid (that is, whether it is empty). Being empty, it cannot hit on the first access, so that cache level goes down one level to look for the required data; each such request to the level below is generally called the penalty. Once the required data has been loaded from memory into the cache, operation begins, and everything about cache efficiency is focused on raising the hit rate.

Suppose we need to operate on an array. Since an array is contiguous memory, a stride-1 pass over it has good spatial locality, which makes it a good example. From the cache's point of view, reading an array vector of n elements (n larger than one cache block) is not a single read: it takes several, and if it takes k block-reads then there are at least k misses; that is unavoidable, and what gets fetched is not necessarily only what we need. As in the book's example:
 vector: |[0]|[1]|[2]|[3]| []| []| []| []| []| []| []|
Assume every element of the array is operated on and the cache loads four int values at a time (the principle is the same whatever the block size). At initialization the cache is empty, so the first access necessarily misses, and one block of four elements is loaded (as shown above).

This is easy to understand: the cache is empty, so the first access cannot hit, and the needed data has to be read from the memory below. The next access to the cache can then hit vector[0], and the following ones hit too, until vector[4] is needed and misses again; then the previous step repeats, another block is read, and so on to the end.

 vector: |[0]|[1]|[2]|[3]|[4]|[5]|[6]|[7]| []| []| []|

Now we can explain, at a certain level, why a program with good locality outperforms one with poor locality: it uses the cache better. First, repeated use of local temporary variables exploits the cache's read/write optimization to the full; second, smaller strides squeeze the most out of each block the cache reads. Think back now to the traversal of multidimensional arrays: if spatial locality is ignored (jumping in large steps first, then in small ones), the miss rate in the cache becomes outrageous, and that is where the inefficiency comes from.
On the other hand, different strides also affect the cache hit rate. Take the vector above:

 Stride          | 1 | 2 | 3 | 4 | 5 |
 Misses/accesses |1/4|1/2|3/4|1/1|1/1|
As you can see, once the stride passes a certain limit every request misses; that level of cache is then effectively void, and all the time goes into moving data up from the level below, since no read ever hits. You can use the example above to infer this yourself.
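The table can be reproduced with a tiny simulation of my own, assuming one cache block holds 4 int values and blocks are never evicted during a single pass:

 #include <stdio.h>

 #define N     240   /* element count, divisible by every stride below */
 #define BLOCK 4     /* ints per cache block (an assumption) */

 int main (void)
 {
  for (int stride = 1; stride <= 5; ++stride) {
   int misses = 0, accesses = 0, last_block = -1;

   for (int i = 0; i < N; i += stride) {
    ++accesses;
    if (i / BLOCK != last_block) {  /* first touch of a new block: miss */
     ++misses;
     last_block = i / BLOCK;
    }
   }
   printf ("stride %d: %d misses / %d accesses\n",
       stride, misses, accesses);
  }
  return 0;
 }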

Every read from the next level of memory happens on memory-aligned boundaries. If your data, say a struct being read, does not sit at an aligned position (aligned here to the read size used at that level, not to the struct's own alignment when it was stored in memory), then what is fetched is the aligned region around it, padded at the front and back to a multiple of the read size:

    • next-level memory: 0 1 2 3 4 5 6 7 8 9 A B C D E F
    • the struct's data occupies positions 4~F
    • each read at this level fetches 12 units
    • because the struct's storage is not aligned to the fetch boundary, the fetch may cover 0~B

Which also means that not every cache read is so perfect that it starts exactly at the head of your data in memory; whole blocks are read at multiples of the block size.
