Multi-core multiprocessor performance issues and scalability bottlenecks

Source: Internet
Author: User
Keywords: performance issues, scalability, multiprocessor, multi-core

Computing hardware is evolving rapidly. Clock speeds have plateaued, but transistor density keeps growing, so processor manufacturers are increasing multiprocessing capability by putting multiple cores and hardware threads on each chip. For example, the IBM POWER7® symmetric multiprocessor architecture supports up to 4 threads per core, 8 cores per chip, and 32 chip sockets per server, for a total of 1024 concurrent hardware threads. By contrast, the IBM POWER6® architecture supports only 2 threads per core, 2 cores per chip, and 32 chip sockets per server, for a total of 128 concurrent hardware threads.
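The totals quoted above are the product of threads per core, cores per chip, and chip sockets per server. A minimal sketch of that arithmetic follows; the function and parameter names are illustrative, not part of any IBM tool.

```python
def total_hardware_threads(threads_per_core: int, cores_per_chip: int, chip_sockets: int) -> int:
    """Maximum concurrent hardware threads for a fully populated server."""
    return threads_per_core * cores_per_chip * chip_sockets


# Figures taken from the paragraph above.
power7 = total_hardware_threads(threads_per_core=4, cores_per_chip=8, chip_sockets=32)
power6 = total_hardware_threads(threads_per_core=2, cores_per_chip=2, chip_sockets=32)

print(power7)  # 1024
print(power6)  # 128
```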

When developing software, designers now need to consider the multiprocessor, multi-core architectures on which the software might be deployed. This is because:

Applications are expected to perform better and scale more efficiently by using more cores, hardware threads, and memory, in order to meet growing performance and efficiency requirements.

With the increasing use of multi-core, multiprocessor systems, software design should now include ways to distribute software functionality effectively across these computing resources. If this is not taken into account during design, running an application in a multiprocessor, multi-core environment can cause serious performance problems that are difficult to track down.

This article will briefly describe some important considerations for designing software for multicore and multiprocessor environments.

Barriers to software scalability on chip multithreading, multi-core, multiprocessor architectures

Applications are expected to scale and perform better in multi-core, multiprocessor environments. However, an inefficiently designed application may perform poorly in such environments instead of scaling and performing better with the available computing resources. Some of the major impediments to this scalability may be:

Inefficient parallelization: A monolithic application cannot use the available computing resources effectively. You need to organize your application into parallel tasks. This problem is often seen in legacy applications that do not support multithreading. Such applications do not scale on multi-core, multiprocessor, chip multithreading hardware and do not achieve better throughput. Too many threads can be as detrimental as too few.

Serial bottlenecks: Applications that share data structures across multiple threads or processes may have serial bottlenecks. To maintain data integrity, access to these shared data structures may have to be serialized using locking techniques (for example, read locks, read-write locks, write locks, spin locks, mutexes, and so on). Poorly designed locking can create a serial bottleneck through high lock contention, as many threads or processes attempt to acquire the same lock. This can degrade the performance of your application, and performance may even get worse as the number of cores or processors increases.

Over-reliance on the operating system (OS) or runtime environment: You cannot rely on the operating system, the runtime environment, or the compiler to do everything needed to scale your application. Compilers and runtime environments can provide some optimizations, but you cannot rely on them to resolve all scalability issues. For example, you cannot rely on the Java™ virtual machine (JVM) to discover the best scalability opportunities for a Java application by automatically parallelizing it.

Workload imbalance: Uneven distribution of the workload can leave computing resources underutilized. You may have to divide larger tasks into smaller tasks that can run in parallel, and you may have to replace serial algorithms with parallel ones to improve performance and scalability.

I/O bottlenecks: Blocking disk input/output (I/O) or high network latency can severely limit the scalability of your application.

Inefficient memory management: On multi-core platforms, pure computation can be very cheap because there are many processing units, and main memory capacity may not be a problem because it keeps growing. However, memory bandwidth remains a bottleneck because all processor cores share a common bus. Inefficient memory management can cause performance problems that are difficult to detect, such as false sharing.
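The serial-bottleneck and workload-division points above can be sketched together. In the hypothetical example below, a task is split into near-equal chunks, and two variants update one shared total: one takes the mutex for every item, the other accumulates privately and takes the mutex once per chunk. All names (`split_into_chunks`, `SharedTotal`, and so on) are invented for illustration. In CPython the global interpreter lock limits CPU-bound thread parallelism, so this shows the locking pattern rather than a measured speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous chunks of near-equal size."""
    chunk_count = max(1, min(chunk_count, len(items)))
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


class SharedTotal:
    """A shared accumulator protected by a single mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.lock_acquisitions = 0

    def add(self, amount):
        with self._lock:
            self.value += amount
            self.lock_acquisitions += 1


def sum_squares_per_item_locking(chunks, total):
    """Contended variant: every item acquires the shared lock."""
    def work(chunk):
        for x in chunk:
            total.add(x * x)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        list(pool.map(work, chunks))


def sum_squares_per_chunk_locking(chunks, total):
    """Less contended variant: accumulate privately, lock once per chunk."""
    def work(chunk):
        total.add(sum(x * x for x in chunk))
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        list(pool.map(work, chunks))


items = list(range(10_000))
chunks = split_into_chunks(items, 4)

contended = SharedTotal()
sum_squares_per_item_locking(chunks, contended)

batched = SharedTotal()
sum_squares_per_chunk_locking(chunks, batched)

print(contended.value == batched.value)  # True
print(contended.lock_acquisitions, batched.lock_acquisitions)  # 10000 4
```

Both variants compute the same result; the difference is how often threads meet at the shared lock, which is what tends to limit scaling as the number of cores grows.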

Low processor utilization may indicate that computing resources are not being used optimally. To understand the performance problem, you need to evaluate whether your application has too few or too many threads, locking or synchronization problems, network or I/O latency, memory thrashing, or other memory management issues. High processor utilization is usually good, as long as the processor time is spent in application threads doing meaningful work.
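As a rough first check of processor utilization, one can compare the CPU time a process consumed with the wall-clock time that elapsed. The sketch below uses only the Python standard library; `measure_utilization` and the two sample workloads are hypothetical helpers, and a real investigation would normally rely on OS-level or profiler tooling.

```python
import os
import time


def measure_utilization(workload):
    """Run workload() and return (wall_seconds, cpu_seconds, utilization).

    utilization is the CPU time consumed by this process divided by the
    CPU time that was available (wall time x logical processors).
    """
    logical_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    if wall_seconds > 0:
        utilization = cpu_seconds / (wall_seconds * logical_cpus)
    else:
        utilization = 0.0
    return wall_seconds, cpu_seconds, utilization


def io_like_workload():
    time.sleep(0.2)  # stands in for blocking disk or network I/O


def cpu_workload():
    sum(i * i for i in range(2_000_000))  # pure computation on one thread


io_wall, io_cpu, io_util = measure_utilization(io_like_workload)
cpu_wall, cpu_cpu, cpu_util = measure_utilization(cpu_workload)

print(f"I/O-like: wall={io_wall:.3f}s cpu={io_cpu:.3f}s utilization={io_util:.1%}")
print(f"CPU-bound: wall={cpu_wall:.3f}s cpu={cpu_cpu:.3f}s utilization={cpu_util:.1%}")
```

A workload that mostly waits shows CPU time far below wall time. A single-threaded computation keeps one logical processor busy, so its utilization across the whole machine stays low on a system with many logical processors.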

Overview of chip multithreading (CMT), multi-core, and multiprocessor (MP) systems

Before discussing design considerations for a chip multithreading, multi-core, multiprocessor environment, we briefly describe such a system. The system shown in Figure 1 has two processors, with two cores per processor and two hardware threads per core. Each core has an L1 cache and an L2 cache. Depending on the design, each core may have its own L2 cache, or the cores on the same processor may share an L2 cache. Hardware threads on the same core share the L1 and L2 caches.




Figure 1. A typical chip multithreading, multi-core, multiprocessor system





All cores and processors share the system bus and access main memory (RAM) through it. To applications and the operating system, this system appears as 8 logical processors.
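The topology described for Figure 1 can be modelled in a few lines to show where the 8 logical processors come from and which of them share a core's caches. The numbering of logical processors below is an assumption made for illustration; real operating systems may enumerate them in a different order.

```python
from collections import defaultdict
from itertools import product

# Topology of the example system: 2 processors x 2 cores x 2 hardware threads.
PROCESSORS = 2
CORES_PER_PROCESSOR = 2
THREADS_PER_CORE = 2

# Each logical processor is identified by (processor, core, hardware_thread).
logical_processors = list(
    product(range(PROCESSORS), range(CORES_PER_PROCESSOR), range(THREADS_PER_CORE))
)

# Hardware threads on the same core share that core's L1 and L2 caches,
# so group logical processor ids by their (processor, core) pair.
cache_siblings = defaultdict(list)
for logical_id, (processor, core, _thread) in enumerate(logical_processors):
    cache_siblings[(processor, core)].append(logical_id)

print(len(logical_processors))  # 8
for (processor, core), ids in cache_siblings.items():
    print(f"processor {processor}, core {core}: logical processors {ids}")
```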

The following important concepts will help us understand the challenges of designing applications for such a chip multithreading, multi-core, multiprocessor environment.
