Load Balancing Problems in Multi-Core Programming

Author: Zhou Weiming

Related articles: Lock competition in multi-core programming; Several difficulties of multi-core programming and their countermeasures (problem 1); OpenMP parallel programming (2); OpenMP parallel programming (1); Fast sorting efficiency on dual-core CPU

The lock contention problem in multi-core programming, covered in the related article listed above, is a serialization problem. This article explains another problem in multi-core programming: load balancing.

To make full use of a multi-core CPU, the tasks assigned to each core must be well balanced. Otherwise some cores are busy while others sit idle, and the advantage of having multiple cores is lost. There are usually two approaches to achieving good load balance: static load balancing and dynamic load balancing.

1 Static Load Balancing

In static load balancing, the programmer manually divides the program into parts that can execute in parallel and ensures that these parts are distributed evenly across the CPUs. In other words, the workload must be spread evenly over the tasks to achieve a high speedup (acceleration coefficient).

Mathematically, static load balancing is an NP-complete problem. Richard M. Karp, Jeffrey D. Ullman, Christos H. Papadimitriou, M. Garey, D. Johnson, and others proved the NP-completeness of static load balancing problems under several different constraints between 1972 and 1983.

Although NP-completeness makes the problem hard in the mathematical sense, it is not the difficulty this article is concerned with, because NP-complete problems can generally be handled by very effective approximation algorithms.

2 Dynamic Load Balancing

Dynamic load balancing assigns tasks while the program is running in order to balance the load. In practice, many problems cannot be solved by static load balancing.
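
To make the contrast between the two approaches concrete, here is a minimal Python sketch that simulates three ways of assigning tasks to workers and reports the finishing time of the slowest worker. It is an idealized model, not real threaded code: the task costs are a hypothetical workload, all function names are invented for illustration, and scheduling overhead is ignored. The static variants are a naive equal-length split and the greedy "longest processing time first" (LPT) heuristic, used here as one example of the kind of approximation algorithm mentioned above. The dynamic variant models idle workers pulling the next task from a shared queue.

```python
import heapq


def greedy_makespan(costs, workers):
    """Give each task, in list order, to the worker that becomes free first."""
    free_at = [0] * workers  # min-heap: time at which each worker is next idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


def static_block_makespan(costs, workers):
    """Naive static division: cut the task list into equal-length chunks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def static_lpt_makespan(costs, workers):
    """Static division with the LPT heuristic; needs every cost before the run."""
    return greedy_makespan(sorted(costs, reverse=True), workers)


def dynamic_makespan(costs, workers):
    """Dynamic scheduling: idle workers pull the next task from a shared queue."""
    return greedy_makespan(costs, workers)


# Hypothetical workload: four cheap tasks followed by four expensive ones.
costs = [1, 1, 1, 1, 8, 8, 8, 8]
print(static_block_makespan(costs, 4))  # 16
print(static_lpt_makespan(costs, 4))    # 9
print(dynamic_makespan(costs, 4))       # 9
```

In this workload the naive static split leaves two workers with almost nothing to do, while both LPT and the dynamic queue reach the ideal finishing time of 9. The difference is that LPT needs every task cost before the run starts, whereas the dynamic queue does not.
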
For example, when the iteration count of a large loop comes from external input and is not known in advance, a static load balancing policy can hardly balance the load.

In dynamic load balancing, task scheduling is generally implemented by the system. Programmers can only choose among the available dynamic scheduling policies and cannot modify them. Because real tasks involve many uncertainties, the scheduling algorithm cannot be optimal, so dynamic load balancing sometimes fails to meet the intended load balancing requirements.

3 Where are the difficulties in load balancing?

The difficulty of load balancing is not the degree of balance achieved. Even if the execution times of the tasks assigned to each CPU differ somewhat, the total execution time still falls as the number of cores grows, so the speedup (acceleration coefficient) still increases with the number of cores.

The difficulty lies in the fact that the parallel blocks in a program largely have to be divided by the programmer. When the number of cores is small, such as two or four, this division is not very hard. As the number of cores grows, however, the granularity of the division becomes finer and finer. Beyond 16 cores, dividing tasks by hand is likely to drive programmers to distraction. For example, to run a piece of sequential code on a 128-core CPU, you would need to divide it manually into 128 tasks, and the difficulty of doing so is easy to imagine.

The impact of division error also grows with the number of CPU cores. For example, if a program that requires 16 time units is split into four tasks, the average execution time of each task is 4 time units. If the division error is 1 time unit, the speedup is 16/(4 + 1) = 3.2, which is 80% of the ideal speedup of 4.
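
The arithmetic in this example, and in the 16-core case the text turns to next, can be checked with a short Python sketch. The function names are invented for illustration, and the model assumes that the slowest task takes the even share of the work plus the division error, which is how the figures in the text are computed.

```python
def speedup(total_time, tasks, division_error):
    """Speedup when the slowest task exceeds the even share by division_error."""
    even_share = total_time / tasks
    return total_time / (even_share + division_error)


def fraction_of_ideal(total_time, tasks, division_error):
    """Achieved speedup divided by the ideal speedup (the number of tasks)."""
    return speedup(total_time, tasks, division_error) / tasks


# 16 time units on 4 tasks, division error of 1 time unit
print(round(speedup(16, 4, 1), 2), round(fraction_of_ideal(16, 4, 1), 3))
# prints: 3.2 0.8

# 16 time units on 16 tasks, division error of 0.5 time units
print(round(speedup(16, 16, 0.5), 2), round(fraction_of_ideal(16, 16, 0.5), 3))
# prints: 10.67 0.667
```

Although the absolute error is halved in the second case, the even share per task shrinks from 4 time units to 1, so the same kind of error costs a larger fraction of the ideal speedup.
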
However, if the same program runs on a 16-core CPU and the division error of one task is 0.5 time units, the speedup is 16/(1 + 0.5) = 10.67, only 66.7% of the ideal speedup of 16. As the number of cores grows further, this amplification of the error makes the achieved speedup an ever smaller fraction of the ideal speedup.

The difficulty of dividing the load also shows up when CPUs and software are upgraded. A division that is balanced on a 4-core CPU may become unbalanced again on 8 or 16 cores. The same holds for software upgrades: adding features can upset the balance, and the load must be divided again to restore it. This greatly increases the difficulty and effort of software design.

When locks are used, a seemingly balanced load may also become unbalanced because of lock contention. For details, see: http://blog.csdn.net/drzhouweiming/archive/2007/04/10/1559718.aspx

4 Strategies for load balancing

Software with a small amount of computation runs fast even on a single-core CPU, so load balance has little effect on it. In practice, load balancing matters for computation-heavy, large-scale software, which must balance its load across multiple cores to benefit from multi-core performance.

For large-scale software, the countermeasure is to develop a macro-level method for dividing parallel blocks, dividing at the level of the whole software system rather than decomposing individual local routines and algorithms, because a local routine is usually hard to break down into dozens of tasks.

Another strategy is at the tool level: compilation tools can assist manual decomposition into parallel blocks and help find a good decomposition.
Intel has made some efforts in this direction, but tools need to become more powerful to cope with large numbers of cores.

http://blog.csdn.net/drzhouweiming/article/details/1559698
