Multi-core multiprocessor performance issues and scalability bottlenecks

Last Update: 2014-12-24 Source: Internet

Author: User

Keywords: performance issues, scalability, multiprocessor, multi-core


Computing hardware is evolving rapidly. While clock speeds have largely stalled, transistor density keeps growing, and processor manufacturers are using it to improve multiprocessing by putting multiple cores and hardware threads on each chip. For example, the IBM POWER7® symmetric multiprocessor architecture supports up to 4 threads per core, 8 cores per chip, and 32 chip sockets per server, for a total of 1024 concurrent hardware threads. By contrast, the IBM POWER6® architecture supports only 2 threads per core, 2 cores per chip, and 32 chip sockets per server, for a total of 128 parallel hardware threads.
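The hardware-thread totals above are the product of the three per-level counts:

```latex
\begin{aligned}
N_{\text{threads}} &= (\text{threads per core}) \times (\text{cores per chip}) \times (\text{sockets per server}) \\
N_{\text{POWER7}} &= 4 \times 8 \times 32 = 1024 \\
N_{\text{POWER6}} &= 2 \times 2 \times 32 = 128
\end{aligned}
```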

When developing software, designers now need to consider the multiprocessor, multi-core architectures on which the software might be deployed. This is because:

With more cores, hardware threads, and memory available, applications should perform better, scale more efficiently, and meet growing performance and efficiency requirements. As multi-core, multiprocessor systems become more widespread, software design should include ways to distribute software functionality effectively across these computing resources. If this is not considered during design, running an application in a multiprocessor, multi-core environment can cause serious performance problems that are difficult to diagnose.
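As one illustration of distributing work across the available processing resources, the Java sketch below uses the JDK's fork/join framework. The class name `ForkJoinSum` and the `THRESHOLD` of 10,000 elements are hypothetical choices for illustration, not tuned values. A large summation is split recursively into subtasks until each is small, and a `ForkJoinPool` (whose no-argument constructor sets parallelism to `Runtime.getRuntime().availableProcessors()`) runs the subtasks on its worker threads, with idle workers stealing queued subtasks from busy ones.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch only: divide-and-conquer summation on the fork/join framework.
class ForkJoinSum extends RecursiveTask<Long> {
    // Hypothetical cut-off: ranges this small are summed serially, because
    // splitting them further would cost more than it saves.
    static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;            // overflow-safe midpoint
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                            // queue the left half; an idle worker may steal it
        long rightSum = right.compute();        // work on the right half in this thread
        return rightSum + left.join();          // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        // The no-argument constructor sets parallelism to availableProcessors().
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[5_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println("logical processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("sum: " + parallelSum(data));
    }
}
```

Work stealing also helps with the uneven workloads discussed later: a worker that finishes early takes pending subtasks from others instead of sitting idle. The right threshold depends on the workload and hardware and would normally be found by measurement.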

This article will briefly describe some important considerations for designing software for multicore and multiprocessor environments.

Barriers to software scalability on chip multithreading, multi-core, multiprocessor architectures

Applications should be able to scale and perform better in multi-core, multiprocessor environments. However, if an application is designed inefficiently, it may perform poorly in such environments instead of scaling and performing better by using the available computing resources. Some of the major impediments to this scalability are:

Inefficient parallelization: A monolithic application cannot effectively use the available computing resources; you need to organize the application into parallel tasks. This problem is often seen in legacy applications or software that does not support multithreading. Such applications do not scale on chip multithreading, multi-core, multiprocessor hardware and do not achieve better throughput. At the same time, too many threads produce poor results, just as too few threads do.

Serial bottlenecks: Applications that share data structures across multiple threads or processes may have serial bottlenecks. To maintain data integrity, access to these shared data structures may have to be serialized using locking and synchronization techniques (for example, read locks, read-write locks, write locks, spin locks, and mutexes). Poorly designed locking can create a serial bottleneck through high lock contention among threads or processes trying to acquire the same lock. This can degrade the performance of your application or software, and performance may even decrease as the number of cores or processors increases.

Over-reliance on the operating system (OS) or runtime environment: You cannot rely on the operating system, the runtime environment, or the compiler to do everything needed to scale your application or software. Compilers and runtime environments can provide some optimizations, but you cannot rely on them to resolve all scalability issues. For example, you cannot rely on the Java™ virtual machine (JVM) to discover the best scalability opportunities for a Java application by parallelizing it automatically.

Workload imbalance: An uneven distribution of work can prevent computing resources from being used efficiently. You may have to divide larger tasks into smaller tasks that can run in parallel, and you may have to change serial algorithms into parallel algorithms to improve performance and scalability.

I/O bottlenecks: Bottlenecks caused by blocking disk input/output (I/O) or high network latency can severely inhibit the scalability of your application.

Inefficient memory management: On multi-core platforms, raw computation can be very inexpensive because there are many processing units, and main memory capacity may not be a problem because it keeps growing. However, memory bandwidth has been a bottleneck because all processor cores share a common bus. Inefficient memory management can cause performance problems that are difficult to detect, such as false sharing.
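To make the serial-bottleneck point concrete, the Java sketch below (the class and method names are hypothetical) has every thread increment one counter guarded by a single `synchronized` monitor, so all increments serialize on that lock. It compares this with `java.util.concurrent.atomic.LongAdder`, a JDK class that spreads concurrent updates over internal cells and adds them together when read. Timings depend entirely on the hardware and JVM, so the printed numbers are only a rough indication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: compares a lock-protected counter with LongAdder.
class ContentionDemo {

    // Every increment must acquire this object's monitor, so all threads
    // serialize on one lock: a deliberate serial bottleneck.
    static final class LockedCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Starts `threads` threads that each run `increment` `perThread` times,
    // waits for all of them, and returns the elapsed wall-clock milliseconds.
    static long run(int threads, int perThread, Runnable increment) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) increment.run();
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int perThread = 1_000_000;

        LockedCounter locked = new LockedCounter();
        long lockedMs = run(threads, perThread, locked::increment);

        // LongAdder keeps several internal cells so concurrent increments
        // rarely collide; sum() adds the cells together when read.
        LongAdder adder = new LongAdder();
        long adderMs = run(threads, perThread, adder::increment);

        System.out.printf("threads=%d locked=%d (%d ms) adder=%d (%d ms)%n",
                threads, locked.get(), lockedMs, adder.sum(), adderMs);
    }
}
```

Both counters end with the same total; any difference appears only in elapsed time. On many multi-core machines the locked version gains little or nothing from extra threads, which is the serial bottleneck described above.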

Low processor utilization can indicate that resources are not being used optimally. To understand performance issues, you need to evaluate whether your application has too few or too many threads, lock or synchronization problems, network or I/O latency, memory thrashing, or other memory management issues. High processor utilization is usually good, as long as the processor time is spent by application threads doing meaningful work.
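One way to start this evaluation in a Java application is to look at what its threads are doing. The sketch below (the class name is hypothetical) uses the standard `java.lang.management.ThreadMXBean` to count live threads by `Thread.State`. A large share of `BLOCKED` threads hints at lock contention. Note, however, that a thread waiting in native socket or file I/O is usually still reported as `RUNNABLE`, so this snapshot is only a first clue, not a diagnosis.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Sketch only: a first look at what the JVM's threads are doing.
class ThreadStateSnapshot {

    // Counts the JVM's live threads by Thread.State.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: do not collect locked monitors/synchronizers (cheaper)
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null) {   // defensive null check
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Thread.State, Integer> counts = countByState();
        System.out.println("threads by state: " + counts);
        int blocked = counts.getOrDefault(Thread.State.BLOCKED, 0);
        if (blocked > 0) {
            System.out.println(blocked + " thread(s) BLOCKED on a monitor: check for lock contention");
        }
    }
}
```

Taking several snapshots over time, rather than one, gives a better picture, since a single sample can catch threads in transient states.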

Overview of chip multithreading (CMT), multi-core, and multiprocessor (MP) systems

Before discussing design considerations for chip multithreading, multi-core, multiprocessor environments, we briefly describe such systems. The system shown in Figure 1 has two processors, with two cores per processor and two hardware threads per core. Each core has an L1 cache and access to an L2 cache: each core may have its own L2 cache, or the cores on the same processor may share one. Hardware threads on the same core share that core's L1 and L2 caches.

Figure 1. A typical chip multithreading, multi-core, multiprocessor system

All cores and processors share the system bus and access main memory (RAM) through it. To the operating system and applications, this system appears as 8 logical processors (2 processors × 2 cores × 2 hardware threads).
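The private per-core caches described above are what make false sharing, the hard-to-detect memory problem mentioned earlier, possible: when threads on different cores repeatedly write to distinct variables that sit on the same cache line, the cache-coherence protocol keeps moving that line between cores. The Java sketch below (names are hypothetical) starts one thread per logical processor reported by `Runtime.getRuntime().availableProcessors()`, and each thread increments only its own slot of a `java.util.concurrent.atomic.AtomicLongArray`. With a stride of 1 the slots are adjacent and likely share cache lines; with a stride of 16 longs (128 bytes), no two slots can share a line, assuming 64-byte cache lines. The extra spacing is an assumption meant to also cover CPUs that fetch lines in adjacent pairs.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch only: adjacent vs. padded per-thread counters.
class FalseSharingDemo {
    // Assumption: 64-byte cache lines. 16 longs = 128 bytes, so padded slots
    // never share a line, even on CPUs that fetch lines in adjacent pairs.
    static final int PADDED_STRIDE = 16;

    // Starts `threads` threads; thread t increments only slot t * stride,
    // `iterations` times. Returns the array so the counts can be checked.
    static AtomicLongArray run(int threads, int stride, int iterations) {
        AtomicLongArray counts = new AtomicLongArray(threads * stride);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) counts.incrementAndGet(slot);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counts;
    }

    static long timeMs(int threads, int stride, int iterations) {
        long start = System.nanoTime();
        run(threads, stride, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int iterations = 10_000_000;
        // No JIT warm-up here: a rough comparison, not a rigorous benchmark.
        System.out.println("stride 1 (adjacent): " + timeMs(threads, 1, iterations) + " ms");
        System.out.println("stride " + PADDED_STRIDE + " (padded): "
                + timeMs(threads, PADDED_STRIDE, iterations) + " ms");
    }
}
```

Every slot ends with the same count in both runs; only the elapsed time differs, by an amount that depends on the CPU. A benchmark harness such as JMH would give more reliable numbers than this single unwarmed run.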

The following important concepts will help us understand the challenges of designing applications for such chip multithreading, multi-core, multiprocessor environments.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
