Reading notes: Efficient C++: Performance Programming Techniques

Last Update:2018-12-03 Source: Internet

Author: User


CHP 1 the tracing war story

Tracing becomes necessary when your code size exceeds several thousand lines.

When tracing is added to a very small but frequently called function, it can reduce system performance by an order of magnitude if you are not careful.

In C++ programs, unnecessary object construction and destruction carry significant overhead.

Functions that are good candidates for inlining are often poor candidates for tracing.
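A minimal sketch of the cost described above and one way to reduce it. The `Trace` class, the `TraceLog` sink, and `add_one` are hypothetical names chosen for illustration. The constructor takes a `const char*` and builds a `std::string` only when tracing is active, so an inactive trace costs a flag test instead of a string construction and destruction on every call.

```cpp
#include <string>
#include <vector>

// Hypothetical trace sink: collects messages instead of printing them.
struct TraceLog {
    static std::vector<std::string>& lines() {
        static std::vector<std::string> storage;
        return storage;
    }
};

class Trace {
public:
    static bool active;

    // Taking a const char* avoids constructing a std::string when tracing is off.
    explicit Trace(const char* name) : name_(nullptr) {
        if (active) {
            name_ = name;
            TraceLog::lines().push_back(std::string("Enter ") + name);
        }
    }

    ~Trace() {
        // name_ is set only if tracing was active on entry.
        if (name_) {
            TraceLog::lines().push_back(std::string("Exit ") + name_);
        }
    }

private:
    const char* name_;
};

bool Trace::active = false;

// A tiny, frequently called function: the kind where tracing hurts most.
int add_one(int x) {
    Trace t("add_one");
    return x + 1;
}
```

With `Trace::active` left at `false`, a call to `add_one` records nothing and constructs no strings; setting it to `true` records an "Enter" and an "Exit" line per call.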

CHP 3 virtual function

Programming languages have evolved to make programming easier by shifting more of the work to compilers, interpreters, assemblers, and linkers. Virtual functions shift the task of type resolution from the programmer to the compiler.

The time overhead of the virtual function mechanism is as follows:

1. The vptr must be set correctly in the constructor.
2. Function calls must be made indirectly through a pointer.
3. Virtual functions and inlining rarely work together, because the call target is usually not known at compile time.

Of these three points, the third has the greatest impact on performance.

When a virtual function becomes a performance bottleneck, there are two solutions: hard-code the type resolution by hand, or pass the type as a template parameter.

Templates often perform well because they move type resolution from run time to compile time.
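The two approaches can be sketched side by side. The types `Shape`, `Square`, and `PlainSquare` and both functions are hypothetical examples, not taken from the book. In the first function each `area()` call goes through the vptr; in the second the type is a template parameter, so the call target is known at compile time and can be inlined.

```cpp
#include <cstddef>

// Run-time resolution: area() is dispatched through the vptr on every call.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

double total_area_virtual(const Shape* const* shapes, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += shapes[i]->area();  // indirect call
    }
    return sum;
}

// Compile-time resolution: no virtual functions, the type is a template parameter.
struct PlainSquare {
    double side;
    double area() const { return side * side; }
};

template <typename T>
double total_area_static(const T* items, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += items[i].area();  // direct call, a candidate for inlining
    }
    return sum;
}
```

The trade-off is flexibility: the template version needs every element to have the same type, known when the code is compiled.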

CHP 4 return value Optimization

The basic idea of RVO (return value optimization) is to eliminate the local variable in the function, so that all operations are performed directly on the temporary object that will be returned.

The basic premise of any compiler optimization is that it preserves correctness. For RVO, this is not always easy. When a function contains multiple return statements that return different local variables, RVO cannot preserve the semantics, so in that case it is not applied.

Simply put, the premise of RVO is that all the return statements of the function return the same local variable.

Although the standard permits optimizing both named and unnamed return values, in practice an unnamed return value is more likely to be optimized.

In addition, for RVO to be applied the class must also provide a copy constructor. Otherwise, RVO is silently disabled.
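The three cases above can be sketched with a type that counts its copy constructions. `Complex`, its `copies` counter, and the three functions are hypothetical names used only for illustration. Since C++17 the unnamed form is guaranteed to be constructed in place; for the named form elision is permitted but left to the compiler.

```cpp
// Hypothetical value type that counts how often it is copy-constructed.
struct Complex {
    double re;
    double im;
    static int copies;

    Complex(double r, double i) : re(r), im(i) {}
    Complex(const Complex& other) : re(other.re), im(other.im) { ++copies; }
};

int Complex::copies = 0;

// Unnamed return value: the result is built directly in the return statement.
Complex add_unnamed(const Complex& a, const Complex& b) {
    return Complex(a.re + b.re, a.im + b.im);
}

// Named return value: the only return statement names one local variable,
// so the compiler may construct `result` in the caller's storage.
Complex add_named(const Complex& a, const Complex& b) {
    Complex result(a.re, a.im);
    result.re += b.re;
    result.im += b.im;
    return result;
}

// Different locals are returned on different paths, so neither can simply
// stand in for the return value; a copy is to be expected here.
Complex pick(bool first, const Complex& a, const Complex& b) {
    Complex x(a.re, a.im);
    Complex y(b.re, b.im);
    if (first) {
        return x;
    }
    return y;
}
```

Reading `Complex::copies` before and after each call shows which forms a given compiler actually optimizes.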

CHP 5 temporaries

RVO can be roughly divided into two levels according to how aggressive it is.

1. Eliminate the local variable in the function and perform all operations directly on the temporary object to be returned.

2. In an expression such as "t = Foo()", go one step further and eliminate the temporary object returned by Foo() as well, performing all operations directly on the final object t. (This optimization requires that "=" denotes initialization rather than assignment.)
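The difference between the two meanings of "=" can be made visible with a counting type. `Counter`, `make`, `init_form`, and `assign_form` are hypothetical names for this sketch. When "=" is initialization the returned object can be built directly in `t`; when it is assignment, `t` already exists, so a temporary has to be created and then assigned from.

```cpp
// Hypothetical type that counts constructions and assignments.
struct Counter {
    static int constructions;
    static int assignments;
    int value;

    explicit Counter(int v) : value(v) { ++constructions; }
    Counter(const Counter& other) : value(other.value) { ++constructions; }
    Counter& operator=(const Counter& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};

int Counter::constructions = 0;
int Counter::assignments = 0;

Counter make(int v) {
    return Counter(v);
}

// "=" is initialization: no separate temporary is needed for the result.
int init_form() {
    Counter t = make(7);
    return t.value;
}

// "=" is assignment: t is constructed first, then a temporary returned by
// make() is assigned to it and destroyed.
int assign_form() {
    Counter t(0);
    t = make(7);
    return t.value;
}
```

Both functions return 7; the counters show that the assignment form pays for an extra object and an assignment operator call.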

CHP 8 inline basics

One cost of inlining is increased compilation time.

Understanding the overhead of function calls in depth

IP (Instruction pointer): usually called the PC (program counter); it contains the address of the next instruction.

LR (Link register): the address of the next instruction to execute after the current function returns.

SP (Stack pointer): records stack usage.

AP (Argument pointer): specifies the position of the arguments on the stack.

FP (Frame pointer): separates the frames of different functions on the stack.

In general, the overhead of a function call and return is 25 to 100 cycles.

The existence of exceptions reduces the possibility of enabling RVO.

An important advantage of inlining is that it reduces the impact of branches. Branches can significantly reduce the performance of modern processors that rely on instruction prefetching.

On the one hand, inlining eliminates the machine code for the call and return, so program size tends to decrease. On the other hand, the function body is duplicated at every call site, so program size tends to increase.

CHP 9 inlining-performance considerations

The obvious advantage of inlining is that it eliminates the overhead of function calls. However, this is only part of the story. The real significance of inlining is that it gives the compiler more information, so that it can perform "cross-call optimization" in a richer context.

Cross-call optimization generally means that operations originally executed at run time are completed in advance at compile time.

Compared with eliminating function call overhead, cross-call optimization can improve performance far more. However, it depends heavily on the quality of the compiler and is less reliable than the former.

To borrow the fable: of the performance gains inlining can deliver, cross-call optimization is the hare, and the elimination of function call overhead is the tortoise.

Literal constants play a key role in the most common optimizations.
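A small sketch of why literal constants matter. The `Mode` enum and the functions `apply`, `twice`, and `dynamic` are hypothetical. When `apply` is inlined into `twice`, the compiler can see that the mode is the constant `kDouble` and may discard the other branches; whether it does so depends on the compiler and its optimization settings.

```cpp
// Hypothetical general-purpose helper with a branch on its first argument.
enum Mode { kCopy, kDouble, kSquare };

inline int apply(Mode mode, int x) {
    switch (mode) {
        case kCopy:   return x;
        case kDouble: return 2 * x;
        case kSquare: return x * x;
    }
    return x;
}

// The mode is a literal constant at the call site. After inlining, only the
// kDouble branch is reachable, so the body can shrink to `2 * x`.
int twice(int x) {
    return apply(kDouble, x);
}

// Here the mode is known only at run time, so the switch has to remain.
int dynamic(Mode mode, int x) {
    return apply(mode, x);
}
```

Without inlining, `twice` would pay for the call and for a switch whose outcome was already known when the code was compiled.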

Be wary of the code bloat that inlining can cause: it reduces the program's locality and raises the probability of cache misses, which can completely offset or even outweigh the benefits of inlining.

In addition, inlining adds compilation dependencies. Any change to the implementation of an inline function forces every module that uses it to be recompiled.

CHP 12 reference count

Reference counting is undoubtedly a good mechanism for reducing memory consumption, but whether it improves speed has no single answer.

If the objects do not hold many resources and are rarely shared, reference counting degrades performance. It means that an object that could have been allocated on the stack is allocated on the heap instead, which is clearly more expensive; in other words, creating a reference-counted object for the first time carries a high overhead.

Conversely, if the objects hold a large amount of resources and are shared frequently, reference counting is the best choice.
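A minimal sketch of the mechanism, assuming a hypothetical `SharedText` class (the name and interface are illustrative, and the counter is not thread-safe). Creating the first handle pays for a heap allocation; every later copy only increments a counter instead of duplicating the string.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class SharedText {
    // Shared representation: lives on the heap, hence the up-front cost.
    struct Rep {
        std::string text;
        std::size_t count;
    };

public:
    explicit SharedText(const std::string& s) : rep_(new Rep{s, 1}) {}

    // Copying shares the representation and bumps the count.
    SharedText(const SharedText& other) : rep_(other.rep_) { ++rep_->count; }

    // Copy-and-swap: the by-value parameter holds the old representation
    // after the swap and releases it when it goes out of scope.
    SharedText& operator=(SharedText other) {
        std::swap(rep_, other.rep_);
        return *this;
    }

    // The last handle to go away frees the representation.
    ~SharedText() {
        if (--rep_->count == 0) {
            delete rep_;
        }
    }

    const std::string& str() const { return rep_->text; }
    std::size_t use_count() const { return rep_->count; }
    bool shares_with(const SharedText& other) const { return rep_ == other.rep_; }

private:
    Rep* rep_;
};
```

For a short string that is never copied, the allocation is pure overhead; for a large string copied many times, each copy becomes a pointer copy and an increment.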

CHP 14 design Optimizations

Fundamentally, flexibility and efficiency are opposing goals in a program.

CHP 15 scalability

Basic Features of SMP

1. Multiple processors
2. All processors have equal status in the system
3. Everything else is still single: a single memory, a single copy of the kernel code, a single run queue

The bottleneck in SMP is the bus. The usual solution is to give each processor its own cache. This introduces the cache coherence problem, which is usually solved by hardware-level update protocols that are transparent to software.

Thundering herd

When multiple threads are suspended waiting on the same resource (such as a lock), you must decide what to do with them when the current owner releases the resource:

1. Wake up only one of the threads, for example the one with the highest priority or the longest wait.

2. Wake up all threads suspended on the resource. This causes the thundering herd symptom: although every thread is woken, only one can actually obtain the resource. After paying for an expensive context switch to get the CPU, all the other threads can do is suspend again, which seriously degrades performance.

When a thundering herd is likely (few resources but many threads), more threads only mean worse performance.

In essence, a thundering herd is a large number of threads competing for a small amount of resources.
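The two wake-up policies map roughly onto `notify_one` and `notify_all` of `std::condition_variable`. The sketch below is an illustration under that assumption; `run_pool` and its parameters are hypothetical, and which thread `notify_one` wakes is unspecified (it is not necessarily the highest-priority or longest-waiting one). Both policies hand out every token, but with `wake_all` set each release wakes every waiter and all but one go straight back to sleep.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Releases `tokens` units of a resource to `workers` waiting threads and
// returns how many units were consumed in total.
int run_pool(int workers, int tokens, bool wake_all) {
    std::mutex m;
    std::condition_variable cv;
    int available = 0;
    int consumed = 0;
    bool done = false;

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            for (;;) {
                // Sleep until a token is available or the producer is finished.
                cv.wait(lock, [&] { return available > 0 || done; });
                if (available > 0) {
                    --available;
                    ++consumed;
                } else {
                    return;  // done and nothing left to take
                }
            }
        });
    }

    for (int i = 0; i < tokens; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            ++available;
        }
        if (wake_all) {
            cv.notify_all();  // policy 2: the whole herd wakes up
        } else {
            cv.notify_one();  // policy 1: a single waiter wakes up
        }
    }

    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();  // shutdown: here every thread really must wake
    for (auto& t : pool) {
        t.join();
    }
    return consumed;
}
```

The result is the same either way; the difference is in the wasted wake-ups, which grow with the number of workers when `wake_all` is true.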

CHP 16 system architecture dependency

Memory hierarchies

Access time is a latency issue: how long does it take to start getting data?

Width is a bandwidth issue: how much data can I get once it starts arriving?

This is akin to the notion of a fire hose: how long does it take to open the valve, once it is open how much water does the hose deliver per second, and how much water is in the truck?

Latency has not improved as quickly as bandwidth. This holds for both memory and hard disks.

Generally, an L1 cache differs little from registers in latency, but it is at a disadvantage in bandwidth.
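The distinction can be put into a rough first-order model: time = latency + size / bandwidth. This model and the function name `transfer_seconds` are assumptions added for illustration, not something the notes or the book state, and any figures fed into it are placeholders rather than measurements.

```cpp
// First-order cost model (an assumption, not a measurement):
//   time = latency + bytes / bandwidth
// latency_s:   seconds until the first byte arrives
// bytes:       amount of data requested
// bytes_per_s: sustained transfer rate once data is flowing
double transfer_seconds(double latency_s, double bytes, double bytes_per_s) {
    return latency_s + bytes / bytes_per_s;
}
```

For a small request the latency term dominates, and for a large one the bandwidth term does, which is why many small scattered accesses cost more than one large sequential access of the same total size.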

This article is an English version of an article originally published in Chinese on aliyun.com.
