Parallelism has become a very popular topic recently. .NET 4 introduced the TPL (Task Parallel Library) and PLINQ (Parallel LINQ) to simplify parallel programming.
So I wanted to try it out, starting with the simplest case, a parallel sum, to see how efficient the parallel operation is.
My CPU is an i3 530, which is a pseudo quad-core (two physical cores with Hyper-Threading, so four logical processors). If a parallel sum really ran on four cores, it should take about 25% of the time of the non-parallel sum.
Let's verify this with code:
Parallel sum efficiency test
First, build an array filled with random numbers:
// Requires: using System; using System.Linq;
var random = new Random();
var data = Enumerable.Range(1, 67108864)
    .Select(i => (long)random.Next(int.MaxValue))
    .ToArray();
This array is fairly large (67,108,864 elements) and takes about 5 seconds to generate.
The Stopwatch class is used for timing during the test:
Stopwatch w = new Stopwatch();
// Parallel
w.Start();
var sum1 = data.AsParallel().Sum();
w.Stop();
Console.WriteLine(w.ElapsedMilliseconds);
// Non-parallel
w.Reset();
w.Start();
var sum2 = data.Sum();
w.Stop();
Console.WriteLine(w.ElapsedMilliseconds);
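The measurement can be repeated more conveniently with a small helper. The following is only a sketch: the class name TimingSketch and the method Measure are hypothetical names introduced here, and the untimed warm-up call is an assumption of mine, not something the original test did.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs `action` once untimed as a warm-up, then `runs` more times,
    // and returns the elapsed milliseconds of each timed run.
    public static long[] Measure(Action action, int runs)
    {
        action(); // warm-up: JIT compilation and thread start-up are not timed
        var times = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var w = Stopwatch.StartNew();
            action();
            w.Stop();
            times[i] = w.ElapsedMilliseconds;
        }
        return times;
    }
}
```

With the data array from earlier, a call such as TimingSketch.Measure(() => data.AsParallel().Sum(), 10) would produce ten timings for the parallel sum, and the same call with data.Sum() would give the non-parallel timings.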
The results of ten runs are as follows (since the execution order may affect the results, runs 6 to 10 perform the non-parallel sum before the parallel sum):
Run            1    2    3    4    5    6    7    8    9    10
Parallel       385  385  376  409  437  342  419  347  379  342
Non-parallel   733  666  668  665  669  673  667  670  669  672
(Unit: milliseconds)
On average, the parallel sum takes about 57% of the time of the non-parallel sum. Efficiency has improved, but this is far from the 25% we initially expected.
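For reference, the ratio can be recomputed from the timings in the table. This is a small sketch; RatioSketch and TimeRatio are hypothetical names used only for illustration, and the two arrays simply copy the measured values listed above.

```csharp
using System;
using System.Linq;

public static class RatioSketch
{
    // Timings copied from the i3 530 results table, in milliseconds.
    public static readonly long[] ParallelMs =
        { 385, 385, 376, 409, 437, 342, 419, 347, 379, 342 };
    public static readonly long[] SequentialMs =
        { 733, 666, 668, 665, 669, 673, 667, 670, 669, 672 };

    // Ratio of the mean parallel time to the mean non-parallel time.
    // An ideal split across four cores would give 1.0 / 4 = 0.25.
    public static double TimeRatio(long[] parallelMs, long[] sequentialMs)
    {
        return parallelMs.Average() / sequentialMs.Average();
    }
}
```

RatioSketch.TimeRatio(RatioSketch.ParallelMs, RatioSketch.SequentialMs) evaluates to roughly 0.57, that is 382.1 ms against 675.2 ms on average.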
To find out whether only two cores were being used, I checked the CPU usage in Task Manager. However, a single execution takes less than one second, which is too short to observe clearly, so I added a loop to the code:
for (int i = 0; i < 100; i++)
{
    var sum = data.AsParallel().Sum();
}
Running this shows that all four logical cores are in use.
So the problem seems to lie in the parallel summation algorithm itself. I decided to write an improved version myself.
Simple parallel summation algorithm
The idea is very simple. Divide the array into several partitions, let each CPU core compute the sum of one partition, and finally add the partition sums together:
var partitionCount = 4; // number of partitions
var partitionLength = (data.Length - 1) / partitionCount + 1; // length of each partition, rounded up
var results = new long[partitionCount]; // holds the sum of each partition
var r = Parallel.For(0, partitionCount, i => results[i] = PartitionSum(data, partitionLength * i, partitionLength));
var sum3 = results.Sum(); // add up the partition sums
PartitionSum is a function used to calculate the sum of a certain partition:
public static long PartitionSum(long[] source, int start, int length)
{
    long result = 0;
    int end = start + length;
    // Clamp the last partition so it does not run past the end of the array.
    if (end > source.Length) end = source.Length;
    for (int i = start; i < end; i++)
        result += source[i];
    return result;
}
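The two snippets above can be combined into one self-contained sketch that is easy to check against the ordinary sum. The class name PartitionSumSketch and the method ParallelSum are hypothetical, and taking the partition count as a parameter (for example Environment.ProcessorCount instead of the fixed 4) is my assumption rather than part of the original test.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionSumSketch
{
    // Sums source[start .. start + length), clamped to the end of the array.
    public static long PartitionSum(long[] source, int start, int length)
    {
        long result = 0;
        int end = Math.Min(start + length, source.Length);
        for (int i = start; i < end; i++)
            result += source[i];
        return result;
    }

    // Splits data into partitionCount contiguous partitions, sums each one
    // in parallel, then adds the partial sums sequentially.
    public static long ParallelSum(long[] data, int partitionCount)
    {
        if (data.Length == 0) return 0;
        int partitionLength = (data.Length - 1) / partitionCount + 1; // rounded up
        var results = new long[partitionCount];
        Parallel.For(0, partitionCount, i =>
        {
            // Each iteration writes only its own slot, so no locking is needed.
            results[i] = PartitionSum(data, partitionLength * i, partitionLength);
        });
        return results.Sum();
    }
}
```

A quick sanity check is to compare PartitionSumSketch.ParallelSum(data, Environment.ProcessorCount) with data.Sum() on the same array; the two values should be equal whenever the total fits in a long.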
Test times:
Run                  1    2    3    4    5    6    7    8    9    10
Improved parallel    153  161  151  154  152  156  157  153  155  153
(Unit: milliseconds)
That is about 23% of the non-parallel execution time, which is close to our initial expectation of 25%, and even slightly better.
Summary
Although the test in this article is simple, crude, and far from authoritative, it does suggest that the parallel summation algorithm in .NET 4 is inefficient to some extent.
This article only tests parallel summation, which is a small part of the parallel features in .NET 4. The low efficiency of the parallel summation algorithm does not mean that the other parts of .NET 4 are inefficient.
Postscript: test results on a real quad-core CPU (Xeon E5606)
A reader suggested in the comments that the pseudo quad-core CPU might be the cause. So I found a server with a Xeon E5606, which is a genuine quad-core processor.
The test results are as follows:
Run                  1    2    3    4    5    6    7    8    9    10
Non-parallel         858  863  861  859  858  862  863  859  861  859
Parallel             250  248  280  248  248  248  249  248  249  249
Improved parallel    121  121  121  121  121  121  121  121  121  121
(Unit: milliseconds)
Compared with the non-parallel sum, the .NET parallel sum takes about 29% of the time, and my simple improved algorithm takes about 14%.
29% is very close to the expected 25%, so the earlier shortfall does seem to have been caused by the pseudo quad-core CPU.
However, this raises another question: why is my simple improved algorithm so much more efficient?
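The question is left open here. One possible factor, which is purely an assumption and was not measured in this article, is that a LINQ-style Sum pays some cost per element, while the hand-written loop indexes the array directly. One way to probe this with the TPL's own building blocks is to hand each worker a range of indices and let it run a plain loop. The sketch below does that with Partitioner.Create and the Parallel.ForEach overload that keeps a per-task subtotal; the class name RangeSumSketch is hypothetical, and no timings for it are claimed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSumSketch
{
    public static long Sum(long[] data)
    {
        if (data.Length == 0) return 0; // Partitioner.Create needs a non-empty range
        long total = 0;
        Parallel.ForEach(
            // Yields index ranges as Tuple<int, int> (from inclusive, to exclusive).
            Partitioner.Create(0, data.Length),
            () => 0L, // each task starts with its own local subtotal
            (range, state, local) =>
            {
                // Plain indexed loop over this range, with no per-element delegate call.
                for (int i = range.Item1; i < range.Item2; i++)
                    local += data[i];
                return local;
            },
            // Merge each task's subtotal into the shared total exactly once.
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

If RangeSumSketch.Sum(data) turned out to run about as fast as the hand-written partition version, that would support the per-element overhead explanation; if not, the cause lies elsewhere.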