System optimization rules

Source: Internet
Author: User


1) Alignment rules

Which takes less time to access: a long or a single byte?

C/C++ programmers know the alignment principle: on 32-bit CPUs, data is normally aligned on 4-byte boundaries. Why? On many 32-bit CPUs, accessing a single byte can cost more than accessing a long, because the CPU's hardware logic processes one word (a long) at a time. If the hardware does not support writing data smaller than a word, a byte cannot be written directly: writing the whole word would overwrite the neighboring bytes that share it, so software has to do the job. To write one byte to memory, the program first reads the whole word containing that byte into a general-purpose register, then modifies the register with shift, AND, NOT and similar operations, and finally writes the word back to memory. A single byte write therefore takes three steps: read, modify, write. Writing a long, by contrast, is a single direct store to memory.

If you work with media other than memory, such as flash, it is even more important to follow the alignment principle. Flash can only be operated on in whole blocks. Worse, flash must be erased before it can be written, and erasure works only on whole pages: even if you need to change a single byte, you have to operate on an entire page or block. Most programmers never deal with flash directly, because in an operating system the file system and drivers take care of it. Embedded programmers and driver engineers, however, need to understand this, and for anyone designing a file system it is basic common sense.

Back to alignment in the traditional sense. Most compilers align data for you, but note that they *may* do so, not that they always will. For data exchanged between processes, and especially for network protocol data, you must pay attention not only to the size of each field but also to its alignment. It is advisable to take alignment into account before designing such a protocol. This not only reduces misalignment errors but also improves efficiency.

2) Try to make sure accesses hit the cache

Here is a well-known example:

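Code 1:

int cache[200][300];
long sum = 0;

for (int i = 0; i < 200; i++)        /* outer loop over rows */
    for (int j = 0; j < 300; j++)    /* inner loop over columns: consecutive addresses */
        sum += cache[i][j];

Code 2:

for (int j = 0; j < 300; j++)        /* outer loop over columns */
    for (int i = 0; i < 200; i++)    /* inner loop over rows: each step skips 300 ints */
        sum += cache[i][j];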

Which of the two do you think runs faster? Most CPUs have a cache: a block of memory that is much faster than external memory. When the CPU reads from external memory, the data that follows the requested data is loaded into the cache as well, so when the CPU needs the next piece of data it looks in the cache first. If the data is there, it is read directly; this is called a hit. If it is not, it still has to be read from external memory (a miss). Checking the cache and reading from it take far less time than reading from external memory, so the more accesses hit, the faster the program runs.

Programmers generally cannot control the cache directly; caching is done by the hardware. However, if you know how the cache works and write code that hits it as often as possible, your program will run much faster. C stores two-dimensional arrays row by row, so Code 1 walks through memory in address order and consecutive accesses fall in the same cache line. Code 2 jumps a whole row of 300 ints between consecutive accesses, so it may miss the cache on almost every access and run much more slowly.

3) Place the most frequently executed code in the fastest RAM

Without a doubt, if memory access could keep pace with the CPU, the efficiency of the whole system would improve enormously. But that is only wishful thinking: fast RAM costs more, so it falls to us programmers to make the most of what little of it we have.

A system's memory is usually organized in several levels, for example on-chip RAM and off-chip DDR or other RAM. On-chip RAM is fast but small, so to improve the execution efficiency of the whole system we need to make the best use of it. A good choice is to put the code that runs most often into the fastest RAM. For example, in some LCD systems, graphics operations and operations on graphics data are the most frequent. If that code executes from the fast on-chip RAM, the display quality and speed of the LCD improve greatly, and the improvement can be dramatic. How to place code in a particular RAM region is a matter for the compiler and linker.

4) Do not stall the pipeline

Pipelining is nothing new in CPUs. Ideally, a seven-stage pipeline delivers roughly the throughput of seven non-pipelined CPUs running at the same clock frequency, because up to seven instructions are in flight at once. The gain is considerable, so don't break the pipeline casually.
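The "seven CPUs" claim can be checked with a short calculation under idealized assumptions: every stage takes one cycle, there are no stalls, and a non-pipelined CPU needs $k$ cycles per instruction. For $n$ instructions on a $k$-stage pipeline:

```latex
T_{\text{unpipelined}} = n k, \qquad T_{\text{pipelined}} = k + (n - 1)

S = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{n k}{k + n - 1}
    \;\xrightarrow{\;n \to \infty\;}\; k

k = 7,\; n = 1000: \qquad S = \frac{7000}{1006} \approx 6.96
```

Every stall adds cycles to $T_{\text{pipelined}}$ and pulls the speedup $S$ below $k$, which is why the next paragraph warns against dependencies that stall the pipeline.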

Compilers perform some optimizations here, so that the machine code generated from your source fits the pipeline as well as possible. But no compiler is smarter than a person: if you ever write an algorithm in hand-written assembly, try to reduce control (branch) dependencies and data dependencies, because they stall the pipeline and reduce your seven CPUs to one.

5) Interact with slow devices as little as possible

Anyone in China who has suffered through endless queues will understand this one. If you can avoid a queue, don't queue; if you can avoid accessing a slow device, don't access it. If you must access it, get as much done as possible each time you queue, so that you queue as few times as possible. There is, of course, another option: asynchronous access, where the device does its work and notifies you when it is finished, instead of you waiting on it. Sensible resource allocation can also reduce the number of accesses to slow devices, and this is the most important point. As we all know, the train-ticket queues at the end of every year are not caused by a shortage of tickets but by unreasonable distribution: everyone traveling for the Spring Festival eventually gets a ticket, yet you still suffer in the queue or pay a higher price. So the way resources are allocated matters a great deal. Keep accesses to slow devices in your system to a minimum.

6) Use dedicated hardware for the most frequent operations

As IC technology advances and chip design and production costs fall, more and more applications use dedicated hardware instead of software for computation. For example, however much you optimize the graphics system mentioned above in software, its performance cannot match that of a dedicated GPU. A GPU handles graphics operations, DMA handles data transfers, and dedicated hardware codecs handle encoding and decoding, freeing up the CPU. In such a design the system is centered not on the CPU but on the data flow: the speed of the whole system depends on how fast data moves between the various dedicated processors.

That's all for now; additions from readers are welcome.

Thank you
