Intel 64 and IA-32 Architectures Optimization Reference Manual: Multi-Core and Hyper-Threading Technology, 8.3 Optimization Guidelines

Source: Internet
Author: User
Tags: prefetch

8.3 Optimization Guidelines

This section summarizes optimization guidelines for tuning multithreaded applications. Five areas are listed below (in order of importance):

● Thread Synchronization

● Bus utilization

● Memory Optimization

● Frontend Optimization

● Resource Optimization

This section lists the practices associated with each area; the sections that follow discuss the guidelines in each area in greater depth.

Most of these coding recommendations improve performance on both multi-core processors and processors with HT Technology. Techniques that apply only to a single-threaded environment are not covered here.

8.3.1 Key Practices of Thread Synchronization

The key practices for minimizing the cost of thread synchronization are summarized below:

● Insert the PAUSE instruction in fast spin loops, and keep the number of loop repetitions to a minimum, to improve overall system performance.

● Replace a spin lock that may be acquired by multiple threads with pipelined locks, so that no more than two threads have write access to any one lock. If only one thread needs to write to a variable shared by two threads, no lock is required.

● Use a thread-blocking API in a long idle loop to free up the processor.

● Prevent "false sharing" of per-thread data between two threads.

● Place each synchronization variable alone, separated by 128 bytes, or in a separate cache line.
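The first and last practices above can be sketched together in C: a spin-wait loop that issues the PAUSE hint while contending, with the lock flag padded so it occupies a 128-byte region of its own and cannot be falsely shared. The type and function names here are illustrative, not taken from the manual.

```c
#include <immintrin.h>   /* _mm_pause: the PAUSE spin-wait hint */
#include <stdatomic.h>

/* Hypothetical lock type: the flag is aligned and padded to 128 bytes so
 * that no other shared data lands in the same region and causes false
 * sharing between threads. */
typedef struct {
    _Alignas(128) atomic_flag flag;
    char pad[128 - sizeof(atomic_flag)];
} padded_spinlock_t;

static void spin_lock(padded_spinlock_t *l)
{
    /* Fast spin loop: PAUSE reduces power consumption and avoids
     * memory-order pipeline flushes while waiting. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        _mm_pause();
}

static void spin_unlock(padded_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

In a real application, a loop that may spin for a long time should fall back to a thread-blocking API, as the third bullet recommends, rather than spin indefinitely.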

8.3.2 Key Practices of System Bus Optimization

Managing bus traffic can significantly affect the overall performance of multithreaded software and MP systems. Key practices for optimizing the system bus for high data throughput and quick response include:

● Improve data and code locality to conserve bus command bandwidth.

● Avoid excessive use of software prefetch instructions and allow automatic hardware prefetching to work. Used improperly, software prefetches can significantly and unnecessarily increase bus utilization.

● Consider using overlapping multiple back-to-back memory reads to improve effective cache miss latency.

● Use full write transactions to achieve higher data throughput.
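As a sketch of measured software prefetching, the loop below issues a prefetch for a strided access pattern, the kind the hardware prefetcher may not track well, while leaving a sequential stream alone. `PREFETCH_AHEAD` is an assumed tuning parameter, not a value from the manual; issuing too many prefetches wastes bus bandwidth, which is exactly what the guideline warns against.

```c
#include <immintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#include <stddef.h>

/* Assumed prefetch distance: how many iterations ahead to fetch.
 * The right value depends on memory latency and loop work per iteration. */
#define PREFETCH_AHEAD 4

long sum_strided(const long *data, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + PREFETCH_AHEAD * stride;
        /* Hint the cache hierarchy about a future strided access. */
        if (ahead < n)
            _mm_prefetch((const char *)&data[ahead], _MM_HINT_T0);
        sum += data[i];
    }
    return sum;
}
```

For `stride == 1` the hardware prefetcher already detects the sequential stream, so the explicit prefetch is redundant there; the hint only earns its bus traffic when the stride defeats automatic prefetching.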

8.3.3 Key Practices for Memory Optimization

Key practices for optimizing memory operations are summarized below:

● Use cache blocking to improve data-access locality. When targeting processors that support HT Technology, aim for a block size of one quarter to one half of the cache size.

● Minimize the sharing of data between threads that execute on different physical processors sharing a common bus.

● Minimize data access patterns within each thread that are offset by multiples of 64 KB.

● When targeting processors that support HT Technology, adjust the private stack of each thread in an application so that the spacing between these stacks is not offset by a multiple of 64 KB or 1 MB, to prevent unnecessary cache-line evictions.

● When two instances of the same application are executing in lock step, add a per-instance stack offset so that memory accesses are not offset by multiples of 64 KB or 1 MB, when targeting processors that support HT Technology.
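The cache-blocking practice in the first bullet can be illustrated with a blocked matrix transpose. `BLOCK` is an assumed tile size: it should be chosen so that the working set of one tile is roughly a quarter to a half of the target cache, per the guideline above; the value used here is for illustration only.

```c
#include <stddef.h>

/* Assumed tile edge; tune so one src tile plus one dst tile fits in
 * about 1/4 to 1/2 of the cache shared by the logical processors. */
#define BLOCK 32

void transpose_blocked(const double *src, double *dst, size_t n)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            /* Work within one tile so the touched lines of both src and
             * dst stay cache-resident for the duration of the tile. */
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Without blocking, the column-wise writes to `dst` touch a new cache line on every iteration and evict lines long before they are reused; the tile loop restores that reuse.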

8.3.4 Key Practices of Front-End Optimization

Key practices for front-end optimization on processors that support HT Technology include:

● Avoid excessive loop unrolling to ensure the Trace Cache operates efficiently.

● Optimize code size to improve Trace Cache locality and increase effective trace length.

8.3.5 Key Practices of Execution Resource Optimization

Each physical processor has dedicated execution resources; the logical processors within a physical processor that supports HT Technology share specific on-chip execution resources. Key practices for execution resource optimization include:

● Optimize each thread to achieve optimal frequency scaling first.

● Optimize multithreaded applications to achieve optimal scaling with respect to the number of physical processors.

● Use on-chip execution resources cooperatively if two threads share the execution resources of the same physical processor package.

● For each processor that supports HT Technology, consider adding functionally uncorrelated threads to increase the hardware resource utilization of each physical processor package.

8.3.6 Generality and Performance Impact

The next five sections cover the optimization techniques in detail. The recommendations discussed in each section are ranked by importance in terms of estimated local impact and generality.

These rankings are subjective and approximate; they can vary depending on programming style, application, and threading domain. Each recommendation carries an impact level of high, medium, or low to provide a relative indicator of the performance gain that can be expected when it is implemented.

It is not possible to predict how often a given code pattern will occur across many applications, so an impact level cannot be directly correlated to application-level performance gains. The generality ranking is likewise subjective and approximate.

Coding recommendations that do not impact all application domains are generally ranked medium or low.
