5. In-Depth Understanding of Computer Systems Notes: Optimizing Program Performance

Source: Internet
Author: User

1. Optimization techniques are divided into "machine-independent" and "machine-dependent" ones. Machine-independent techniques can be applied without regard to the characteristics of the computer that will execute the code. Machine-dependent techniques rely on many low-level details of the machine.

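2. Least-squares fitting: a loop's cost in cycles per element (CPE) is estimated as the slope of a least-squares line fitted to running times measured for different numbers of elements.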
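A loop's running time can be modeled as roughly a fixed overhead plus a per-element cost, cycles ≈ overhead + CPE × n, so the CPE is the slope of the fitted line. The sketch below is a plain ordinary-least-squares fit; the function name `fit_cpe` and the sample numbers used with it are hypothetical, not measurements.

```c
#include <assert.h>

/* Hypothetical helper: ordinary least-squares fit of y = a + b*x.
 * x[i] = number of elements processed, y[i] = measured cycles.
 * On return, *overhead holds the intercept a and *cpe the slope b. */
void fit_cpe(const double *x, const double *y, int count,
             double *overhead, double *cpe)
{
    assert(count >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (int i = 0; i < count; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = count * sxx - sx * sx;   /* zero only if all x[i] are equal */
    assert(denom != 0.0);
    *cpe = (count * sxy - sx * sy) / denom;
    *overhead = (sy - *cpe * sx) / count;
}
```

Fed the made-up points (100, 500), (200, 900) and (300, 1300), which lie exactly on 100 + 4n, it yields an overhead of 100 cycles and a CPE of 4.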

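3. Optimization: eliminating loop inefficiencies

A computation whose result does not change from one iteration to the next can be moved out of the loop, so that it is performed only once. This is called code motion.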
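As a sketch of code motion, consider converting a string to lower case. In `lower1`, `strlen` is called on every loop test, so the work grows quadratically with the string length; `lower2` computes the length once before the loop. The compiler generally cannot make this change by itself, because `s` is modified inside the loop and it cannot easily prove the length stays the same. Both function names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Before code motion: strlen(s) is re-evaluated on every loop test,
 * so each test costs O(n) and the whole loop O(n^2). */
void lower1(char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = s[i] - 'A' + 'a';
}

/* After code motion: only letters are rewritten, never the terminating
 * '\0', so the length is invariant and is computed once, before the loop. */
void lower2(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = s[i] - 'A' + 'a';
}
```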

4. Reduce unnecessary procedure calls. If it can be guaranteed that an index always stays within bounds, the bounds check does not need to be performed on every access.
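The point about procedure calls and bounds checks can be sketched with a small vector type. The names `vec`, `get_vec_element`, `sum_checked` and `sum_direct` are hypothetical, loosely modeled on the book's vector example rather than copied from it. `sum_checked` pays for a call and a range test per element; `sum_direct` relies on the loop bound already guaranteeing a valid index and reads the array directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector type: a length plus a pointer to the elements. */
typedef struct {
    size_t len;
    int *data;
} vec;

/* Bounds-checked accessor: returns 0 if idx is out of range, 1 otherwise. */
static int get_vec_element(const vec *v, size_t idx, int *dest)
{
    if (idx >= v->len)
        return 0;
    *dest = v->data[idx];
    return 1;
}

/* One procedure call and one bounds check per element. */
long sum_checked(const vec *v)
{
    long sum = 0;
    for (size_t i = 0; i < v->len; i++) {
        int val;
        if (get_vec_element(v, i, &val))   /* always true here: i < v->len */
            sum += val;
    }
    return sum;
}

/* The loop bound already guarantees 0 <= i < len, so the per-element
 * call and check are dropped and the array is accessed directly. */
long sum_direct(const vec *v)
{
    assert(v->len == 0 || v->data != NULL);
    const int *data = v->data;
    long sum = 0;
    for (size_t i = 0; i < v->len; i++)
        sum += data[i];
    return sum;
}
```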

5. Eliminate unnecessary memory references

Sample code:

```c
/* Accumulates directly through the pointer: every iteration updates *ptoint in memory. */
void test1(int *ptoint)
{
    *ptoint = 0;
    for (int i = 0; i < 10; i++) {
        *ptoint += i;
    }
}

/* Accumulates in a local variable and writes to memory only once. */
void test2(int *ptoint2)
{
    int itemp = 0;
    for (int i = 0; i < 10; i++) {
        itemp += i;
    }
    *ptoint2 = itemp;
}
```

When the loop runs for many iterations, test1 can be much slower than test2, because every update of *ptoint is a memory reference.

Introduce a temporary variable to hold intermediate results, and store the result into the array or global variable only after the final value has been computed. With optimization enabled, the compiler can then keep the intermediate value in a register (typically %eax on IA32), which you can confirm by inspecting the generated assembly code.
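The two styles are not always equivalent, which is one reason a compiler may not make this transformation on its own: if the destination pointer aliases the input, the results differ. The sketch below uses hypothetical names `sum_mem` and `sum_local` to show this.

```c
#include <assert.h>

/* Accumulates through *dest: every iteration loads and stores memory. */
void sum_mem(const int *a, int n, int *dest)
{
    assert(n >= 0);
    *dest = 0;
    for (int i = 0; i < n; i++)
        *dest += a[i];
}

/* Accumulates in a local variable (which the compiler can keep in a register)
 * and writes to memory only once, after the final value is known. */
void sum_local(const int *a, int n, int *dest)
{
    assert(n >= 0);
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += a[i];
    *dest = acc;
}
```

With separate storage, both return 10 for {1, 2, 3, 4}. If `dest` points at the last input element (for example `sum_mem(b, 4, &b[3])` with `b = {1, 2, 3, 4}`), `sum_mem` leaves 12 in `b[3]`, because it zeroes that element first and later adds it to itself, while `sum_local` leaves 10.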

6. Modern processor structure

Superscalar: the processor can perform multiple operations on every clock cycle, and it can execute them out of order. The order in which instructions are executed need not match their order in the assembly-language program.

The design has two main parts: the ICU (Instruction Control Unit) and the EU (Execution Unit). The ICU reads a sequence of instructions from memory and generates from them a set of primitive operations to perform on program data; the EU carries out these operations.

The retirement unit keeps track of the ongoing processing and makes sure that it obeys the sequential semantics of the machine-level program.

7. Most functional units are pipelined and can start a new operation every clock cycle. The only exceptions are the floating-point multiplier and the two dividers; the dividers are not pipelined.

Latency: the total number of cycles needed to perform a single operation.

Issue time: the minimum number of cycles between two successive, independent operations. (These figures are taken from Intel's documentation.)
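A rough worked estimate of how latency and issue time interact, assuming a single functional unit with latency $L$ and issue time $I$ and ignoring every other bottleneck: $n$ operations that form a dependency chain (each needs the previous result) cannot overlap, while $n$ independent operations can be pipelined.

$$
T_{\text{dependent}}(n) \approx n\,L, \qquad T_{\text{independent}}(n) \approx L + (n-1)\,I
$$

With hypothetical values $L = 3$, $I = 1$ and $n = 100$, the dependent chain takes about 300 cycles and the independent operations about $3 + 99 = 102$ cycles. For a unit that is not pipelined at all, $I = L$, and both expressions reduce to $n\,L$.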

8. Reducing loop overhead

We can reduce the impact of loop overhead by performing more data operations in each iteration. This is called loop unrolling. The idea is to access and multiply several array elements within a single iteration, so that the loop-index update and the loop test are executed fewer times.
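Loop unrolling for an array product can be sketched as follows. `product1` handles one element per iteration; `product2` handles two, with a cleanup loop for an odd leftover element. The function names are illustrative. This sketch keeps a single accumulator, so it cuts loop overhead but still forms one chain of dependent multiplications.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one element per iteration, so the index update and the
 * loop test are executed once per element. */
long product1(const int *a, size_t n)
{
    long prod = 1;
    for (size_t i = 0; i < n; i++)
        prod *= a[i];
    return prod;
}

/* Unrolled by 2: two elements are multiplied per iteration, halving the
 * loop overhead. The second loop handles the final element when n is odd. */
long product2(const int *a, size_t n)
{
    assert(n == 0 || a != NULL);
    long prod = 1;
    size_t i = 0;
    for (; i + 1 < n; i += 2)      /* i + 1 < n avoids unsigned underflow of n - 1 */
        prod = (prod * a[i]) * a[i + 1];
    for (; i < n; i++)             /* leftover element, if any */
        prod *= a[i];
    return prod;
}
```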

9. On IA32 processors, all floating-point operations are performed in 80-bit extended precision, and the floating-point registers hold values in this format. A register value is converted to 32-bit (single-precision, float) or 64-bit (double-precision) format only when it is stored to memory.

10. Improving performance

1) Choose appropriate data structures and algorithms.

2) Coding:

(1) Watch out for the sources of poor performance listed above.

11. Program profiling

Program profiling involves running a version of the program in which instrumentation code has been inserted to determine how much time each part of the program requires.

Unix systems provide the profiling program GPROF. For more details, search online or see Chapter 5.

12. Amdahl's law

The main idea: when we speed up one part of a system, the effect on overall system performance depends on both how significant that part is (the fraction of total running time it accounts for) and how much it is sped up (the speedup factor).
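Amdahl's law is commonly written as S = 1 / ((1 − α) + α / k), where α is the fraction of the original running time spent in the part being improved and k is the factor by which that part is sped up. A direct C transcription, with the function name `amdahl_speedup` chosen for illustration:

```c
#include <assert.h>

/* Overall speedup when a fraction alpha (0..1) of the original running
 * time is made k times faster and the rest is unchanged. */
double amdahl_speedup(double alpha, double k)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    assert(k > 0.0);
    return 1.0 / ((1.0 - alpha) + alpha / k);
}
```

For example, speeding up a part that takes 60% of the time by a factor of 3 gives 1 / (0.4 + 0.2) ≈ 1.67 overall. Even as k grows without bound, the speedup cannot exceed 1 / (1 − α), here 2.5.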

Computer Systems: A Programmer's Perspective
