30. Java Concurrency and Multithreading - Amdahl's Law


The following is translated from http://ifeve.com/amdahls-law/:

Amdahl's law can be used to calculate how much a program can be sped up by running parts of it in parallel. The law is named after Gene Amdahl, who presented it in 1967. Most developers working with parallel or concurrent systems have an intuitive feel for the speedup that parallelism can bring, even without knowing Amdahl's law. Nevertheless, it is still useful to know the law.

I will first explain Amdahl's law arithmetically, and then illustrate it with diagrams.

Definition of Amdahl's law

A program (or an algorithm) can be divided into the following two parts according to whether it can be parallelized:

    • Parts that can be parallelized
    • Parts that cannot be parallelized

Suppose a program processes files on disk. A small part of this program scans the directory path and builds a file directory in memory. Once that is done, each file is processed in a separate thread. Scanning the path and building the file directory cannot be parallelized, but processing the files can.

The total time of the program's serial (non-parallel) execution is denoted T. The time T includes both the non-parallelizable and the parallelizable part. The non-parallelizable part is denoted B. The parallelizable part is then T - B. The following list sums up these definitions:

    • T = total time of serial execution
    • B = total time of the non-parallelizable part
    • T - B = total time of the parallelizable part

From this it follows that:

T = B + (T - B)

It may look a little strange at first that the parallelizable part of the program does not have its own symbol in this equation. However, since the parallelizable part can be expressed using the total time T and B (the non-parallelizable part), the equation is actually conceptually simpler: it contains fewer variables.

T - B is the parallelizable part, and executing it in parallel is what speeds the program up. How much it can be sped up depends on how many threads or CPUs execute it. The number of threads or CPUs is denoted N. The fastest time in which the parallelizable part can be executed is given by:

(T - B) / N

or expressed like this:

(1/N) * (T - B)

The second form is the one used in the Wikipedia article.

According to Amdahl's law, when the parallelizable part of a program is executed using N threads or CPUs, the total execution time is:

T(N) = B + (T - B) / N

T(N) is the total execution time with a parallel factor of N. Thus, T(1) is the total execution time with a parallel factor of 1, i.e. the serial execution time. Using T(1) instead of T, Amdahl's law looks like this:

T(N) = B + (T(1) - B) / N

The meaning of the expression is the same.

A calculation example

To better understand Amdahl's law, let's look at a calculation example. The total execution time of a program is set to 1. The non-parallelizable part of the program is 40%, which out of the total time of 1 equals 0.4. The parallelizable part is then 1 - 0.4 = 0.6.

In the case of a parallel factor of 2, the execution time of the program will be:

T(2) = 0.4 + (1 - 0.4) / 2
     = 0.4 + 0.6 / 2
     = 0.4 + 0.3
     = 0.7

In the case of a parallel factor of 5, the execution time of the program will be:

T(5) = 0.4 + (1 - 0.4) / 5
     = 0.4 + 0.6 / 5
     = 0.4 + 0.12
     = 0.52
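The two worked examples above can be sketched as a small Java program. The class and method names are illustrative, not from the article; the total serial time T is normalized to 1, as in the examples:

```java
// A minimal sketch of Amdahl's law with the total serial time T
// normalized to 1. Class and method names are illustrative.
public class Amdahl {

    // T(N) = B + (T - B) / N, with T = 1
    static double executionTime(double b, int n) {
        return b + (1.0 - b) / n;
    }

    public static void main(String[] args) {
        // B = 0.4, as in the examples above
        System.out.printf("T(2) = %.2f%n", executionTime(0.4, 2)); // T(2) = 0.70
        System.out.printf("T(5) = %.2f%n", executionTime(0.4, 5)); // T(5) = 0.52
    }
}
```

Note how quickly the serial part B comes to dominate: even with an unlimited N, the execution time can never drop below 0.4.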

Amdahl's law illustrated

To better understand Amdahl's law, I will try to illustrate with diagrams how the law is derived.

First, a program can be divided into two parts: a non-parallelizable part B and a parallelizable part 1 - B, as illustrated below:

The line with the dividers at the top represents the total time T(1).

Below you can see the execution time in the case of a parallel factor of 2:

And in the case of a parallel factor of 3:

Optimizing the algorithm

From Amdahl's law it follows that the parallelizable part of a program can be made to run faster by using more hardware (more threads or CPUs). For the non-parallelizable part, the only way to execute it faster is to optimize the code. Thus, you can also speed up your program by optimizing the non-parallelizable part. You might change its algorithm a little and, if possible, move some of its work into the parallelizable part.

Optimizing Serial Components

If you optimize the serial part of a program, you can also use Amdahl's law to calculate the program's execution time after the optimization. If the non-parallelizable part is optimized by a factor O, Amdahl's law looks like this:

T(O, N) = B / O + (1 - B / O) / N

Remember, the non-parallelizable part of the program now takes B / O of the time, so the parallelizable part takes 1 - B / O of the time.

If B is 0.4, O is 2 and N is 5, the calculation looks like this:

T(2, 5) = 0.4 / 2 + (1 - 0.4 / 2) / 5
        = 0.2 + (1 - 0.2) / 5
        = 0.2 + 0.8 / 5
        = 0.2 + 0.16
        = 0.36
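The optimized variant of the formula can be sketched the same way. Again the names are illustrative, and T(1) is assumed to be 1:

```java
// Sketch of Amdahl's law with the serial part optimized by a factor O,
// assuming T(1) = 1. Class and method names are illustrative.
public class AmdahlOptimized {

    // T(O, N) = B/O + (1 - B/O) / N
    static double executionTime(double b, double o, int n) {
        double serial = b / o;              // optimized non-parallelizable part
        return serial + (1.0 - serial) / n; // plus the parallelized remainder
    }

    public static void main(String[] args) {
        // B = 0.4, O = 2, N = 5 gives 0.2 + 0.8 / 5 = 0.36
        System.out.printf("T(2, 5) = %.2f%n", executionTime(0.4, 2.0, 5));
    }
}
```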

Execution time vs. speedup

So far, we have only used Amdahl's law to calculate the execution time of a program or algorithm after optimization or parallelization. We can also use Amdahl's law to calculate the speedup, that is, how much faster the optimized or parallelized version is than the original.

If the execution time of the old version of the program or algorithm is T, then the speedup is:

Speedup = T / T(O, N)

To do this calculation, we often set T to 1, so the speedup is expressed relative to the original time. The formula then looks like this:

Speedup = 1 / T(O, N)

If we substitute Amdahl's law for T(O, N), we get the following formula:

Speedup = 1 / (B / O + (1 - B / O) / N)

With B = 0.4, O = 2 and N = 5, the calculation becomes:

Speedup = 1 / (0.4 / 2 + (1 - 0.4 / 2) / 5)
        = 1 / (0.2 + (1 - 0.2) / 5)
        = 1 / (0.2 + 0.8 / 5)
        = 1 / (0.2 + 0.16)
        = 1 / 0.36
        = 2.77777...

As the calculation above shows, if you optimize the non-parallelizable part by a factor of 2 and parallelize the parallelizable part by a factor of 5, the latest optimized version of the program or algorithm will run at most 2.77777 times faster than the original version.
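The speedup calculation can be sketched in the same style. Names are illustrative, and T(1) is assumed to be 1:

```java
// Sketch of the speedup formula derived above, assuming T(1) = 1.
// Class and method names are illustrative.
public class AmdahlSpeedup {

    // Speedup = 1 / (B/O + (1 - B/O) / N)
    static double speedup(double b, double o, int n) {
        double serial = b / o; // non-parallelizable part after optimization
        return 1.0 / (serial + (1.0 - serial) / n);
    }

    public static void main(String[] args) {
        // B = 0.4, O = 2, N = 5 gives 1 / 0.36 = 2.7777...
        System.out.printf("Speedup = %.4f%n", speedup(0.4, 2.0, 5));
    }
}
```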

Measure, don't just calculate

Although Amdahl's law lets you calculate the theoretical speedup of a parallelized algorithm, don't rely too heavily on such calculations. In real-world scenarios, many other factors come into play when you optimize or parallelize an algorithm.

The speed of memory, CPU caches, disks, network cards, and so on may all be limiting factors. If the latest version of an algorithm is parallelized but causes a lot of CPU cache misses, you will probably not get the expected speedup of N from using N CPUs. The same is true if your memory bus, disk, network card, or network connection is already heavily loaded.

Our advice is to use Amdahl's law as a guide during the optimization process, and also to measure the actual speedup the optimization delivers. Remember, sometimes a highly serialized algorithm outperforms a parallelized one, because the serialized version has no coordination overhead (such as context switching), and a single CPU can make much more consistent use of the underlying hardware (CPU pipelines, CPU caches, and so on).

