.NET Parallel Programming Advanced Tutorial: Parallel
I have always felt that I do not know enough about concurrency. Reading "Clean Code" in particular convinced me that concurrent programming is worth learning, because performance is also an important measure of code quality. The book "Out of Control" also mentions concurrency many times: computers and living creatures alike deal with many things concurrently. Strangely, once you focus on something, you start noticing it everywhere around you. So curiosity drove me to learn about concurrency, and this article is the result.
I. Understanding hardware threads and software threads
Multicore processors have more than one physical core. A physical core is a truly independent processing unit, and multiple physical cores allow multiple instructions to run in parallel. Hardware threads are also known as logical cores; a physical core can use Hyper-Threading technology to provide multiple hardware threads, so a hardware thread does not necessarily correspond to a physical core. Each running program in Windows is a process, and each process creates and runs one or more threads, called software threads. A hardware thread is like a swim lane, and a software thread is the swimmer in it.
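As a small illustration of the distinction above, the following sketch prints the number of logical cores (hardware threads) the runtime reports. The class and method names are made up for this example; Environment.ProcessorCount is the standard .NET property.

```csharp
using System;

static class CoreInfo
{
    // Returns the number of logical cores (hardware threads) visible to this process.
    // With Hyper-Threading enabled this is usually larger than the physical core count.
    public static int LogicalCores()
    {
        return Environment.ProcessorCount;
    }

    static void Main()
    {
        Console.WriteLine("Logical cores (hardware threads): " + LogicalCores());
    }
}
```

The value counts swim lanes, not swimmers: a process may create far more software threads than this number.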
II. Parallel scenarios
.NET Framework 4 introduced the Task Parallel Library (TPL), which supports data parallelism, task parallelism, and pipelining, so developers can handle different parallel scenarios.
- Data parallelism: There is a lot of data to process, and the same operation must be performed on each piece of data. For example, encrypting 100 Unicode strings with the AES algorithm and a 256-bit key.
- Task parallelism: Run different operations concurrently through tasks. For example, generate a file hash code, encrypt the string, create a thumbnail image.
- Pipelining: This is a combination of task parallelism and data parallelism.
TPL introduces the System.Threading.Tasks namespace. Its main class is Task, which represents an asynchronous, concurrent operation. We do not necessarily have to use instances of the Task class, though; we can use the static Parallel class instead. It provides three methods: Parallel.Invoke, Parallel.For, and Parallel.ForEach.
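To make the three methods concrete, here is a minimal sketch that uses each of them once. The class name, method names, and sample data are assumptions for illustration only; the Parallel and Interlocked calls are standard .NET APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelOverview
{
    // Data parallelism with Parallel.For: the same operation on every element.
    public static int[] SquareAll(int[] input)
    {
        var output = new int[input.Length];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, input.Length, i => { output[i] = input[i] * input[i]; });
        return output;
    }

    // Data parallelism with Parallel.ForEach, using a thread-safe accumulator.
    public static long SumAll(int[] input)
    {
        long total = 0;
        Parallel.ForEach(input, value => Interlocked.Add(ref total, value));
        return total;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        int[] squares = null;
        long sum = 0;

        // Task parallelism with Parallel.Invoke: two unrelated operations at once.
        Parallel.Invoke(
            () => squares = SquareAll(data),
            () => sum = SumAll(data));

        Console.WriteLine(string.Join(",", squares) + " sum=" + sum);
    }
}
```

Interlocked.Add is used in SumAll because several iterations may update the shared total at the same time.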
III. Parallel.Invoke
The simplest way to run several methods in parallel is the Invoke method of the Parallel class. For example, suppose there are four methods:
- WatchMovie
- HaveDinner
- ReadBook
- WriteBlog
You can run them in parallel with the following code.
System.Threading.Tasks.Parallel.Invoke(WatchMovie, HaveDinner, ReadBook, WriteBlog);
This code creates a delegate pointing to each method. The Invoke method accepts a parameter array of Action delegates.
public static void Invoke(params Action[] actions);
The same effect can be achieved with lambda expressions or an anonymous delegate.
System.Threading.Tasks.Parallel.Invoke(() => WatchMovie(), () => HaveDinner(), () => ReadBook(), delegate() { WriteBlog(); });
1. There is no specific order of execution.
The Parallel.Invoke method does not return until all 4 methods have completed. At least 4 hardware threads are needed for these 4 methods to run concurrently. Even then, there is no guarantee that the 4 methods start running at the same time: if one or more cores are busy, the underlying scheduling logic may delay the start of some methods.
By adding a delay to each method, you can see that the call waits for the longest-running method to complete before returning to the main method.
This can cause many logical cores to remain idle for a long time.
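The delay experiment described above can be sketched as follows. The sleep durations are arbitrary values chosen for illustration, and the class name is hypothetical; the four method names follow the list earlier in this section.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class InvokeTiming
{
    // Simulated work: each method just sleeps for a different (assumed) duration.
    static void WatchMovie() { Thread.Sleep(300); }
    static void HaveDinner() { Thread.Sleep(200); }
    static void ReadBook()   { Thread.Sleep(100); }
    static void WriteBlog()  { Thread.Sleep(50); }

    // Runs all four methods in parallel and returns the total elapsed time.
    public static TimeSpan RunAll()
    {
        var sw = Stopwatch.StartNew();
        // Invoke blocks until every delegate has finished.
        Parallel.Invoke(WatchMovie, HaveDinner, ReadBook, WriteBlog);
        return sw.Elapsed;
    }

    static void Main()
    {
        Console.WriteLine("All methods finished after " + RunAll());
    }
}
```

The elapsed time cannot be shorter than the longest method. Cores whose methods finish early have nothing to do until the call returns.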
IV. Parallel.For
Parallel.For provides load-balanced parallel execution for a fixed number of independent for loop iterations. (Load balancing means the work is distributed across different tasks so that all tasks stay busy most of the time.) This makes it possible to take full advantage of all available cores.
We compare the following two methods, one using a for loop and one using Parallel.For. Each generates AES keys and converts them to hexadecimal strings.
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Serial version: generates the keys one after another.
private static void GenerateAESKeys()
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < NUM_AES_KEYS; i++)
    {
        var aesM = new AesManaged();
        aesM.GenerateKey();
        byte[] result = aesM.Key;
        string hexStr = ConvertToHexString(result);
    }
    Console.WriteLine("AES: " + sw.Elapsed.ToString());
}

// Parallel version: the same loop body, distributed by Parallel.For.
private static void ParallelGenerateAESKeys()
{
    var sw = Stopwatch.StartNew();
    System.Threading.Tasks.Parallel.For(1, NUM_AES_KEYS + 1, (int i) =>
    {
        var aesM = new AesManaged();
        aesM.GenerateKey();
        byte[] result = aesM.Key;
        string hexStr = ConvertToHexString(result);
    });
    Console.WriteLine("Parallel_AES: " + sw.Elapsed.ToString());
}

private static int NUM_AES_KEYS = 100000;

static void Main(string[] args)
{
    Console.WriteLine("Executes " + NUM_AES_KEYS + " times:");
    GenerateAESKeys();
    ParallelGenerateAESKeys();
    Console.ReadKey();
}
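Results for a run of 1 million iterations (ten times the value shown in the listing above):
In this run, the parallel version took about half as long as the serial one.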
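The listings call a hex-conversion helper that the article never shows. A minimal sketch of what such a helper might look like follows, together with a Parallel.For loop that keeps its results. The helper body, the class name, and the use of Aes.Create() in place of AesManaged are all assumptions, not the author's code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

static class HexKeys
{
    // Hypothetical stand-in for the helper used in the listings:
    // formats each byte as two uppercase hexadecimal digits.
    public static string ConvertToHexString(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Generates 'count' 256-bit AES keys in parallel and returns them as hex strings.
    public static string[] GenerateHexKeys(int count)
    {
        var keys = new string[count];
        Parallel.For(0, count, i =>
        {
            using (var aes = Aes.Create())
            {
                aes.KeySize = 256;
                aes.GenerateKey();
                keys[i] = ConvertToHexString(aes.Key); // each iteration owns slot i
            }
        });
        return keys;
    }

    static void Main()
    {
        foreach (string key in GenerateHexKeys(4))
        {
            Console.WriteLine(key);
        }
    }
}
```

Writing each result into its own array slot keeps the iterations independent, which is what allows Parallel.For to schedule them freely.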
V. Parallel.ForEach
With Parallel.For, optimizing an existing loop can sometimes be a very complex task. Parallel.ForEach provides load-balanced parallel execution for a fixed number of independent foreach loop iterations and supports custom partitioners, giving users full control over how the data is distributed. In essence, it splits the data to be processed into multiple parts and then runs a serial loop over each part in parallel.
Modify the code above:
System.Threading.Tasks.Parallel.ForEach(Partitioner.Create(1, NUM_AES_KEYS + 1), range =>
{
    var aesM = new AesManaged();
    Console.WriteLine("AES Range ({0}, {1}) loop start time: {2}", range.Item1, range.Item2, DateTime.Now.TimeOfDay);
    // Each partition is processed with an ordinary serial loop.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        aesM.GenerateKey();
        byte[] result = aesM.Key;
        string hexStr = ConvertToHexString(result);
    }
    Console.WriteLine("AES: " + sw.Elapsed.ToString());
});
The output shows that 13 ranges were executed.
A second run still produces 13 ranges, with slightly different timings. We did not specify the number of partitions, so Partitioner.Create used its built-in default.
We also found that these partitions were not all executed at the same time; they ran in roughly three batches. The order of execution also differed between runs. The total time is about the same as with the Parallel.For method.
public static ParallelLoopResult ForEach<TSource>(Partitioner<TSource> source, Action<TSource> body)
This overload of Parallel.ForEach takes two parameters, source and body. source is the partitioner, which provides a data source split into multiple partitions. body is the delegate to invoke; it receives each partition as its argument. Parallel.ForEach has more than 20 overloads. In the example above, the partition type is Tuple<int, int>, a two-element tuple. The method returns a ParallelLoopResult value.
How Partitioner.Create creates partitions depends on the number of logical cores and other factors.
public static OrderablePartitioner<Tuple<int, int>> Create(int fromInclusive, int toExclusive)
{
    int num = 3;
    if (toExclusive <= fromInclusive)
        throw new ArgumentOutOfRangeException("toExclusive");
    int rangeSize = (toExclusive - fromInclusive) / (PlatformHelper.ProcessorCount * num);
    if (rangeSize == 0)
        rangeSize = 1;
    return Partitioner.Create<Tuple<int, int>>(Partitioner.CreateRanges(fromInclusive, toExclusive, rangeSize), EnumerablePartitionerOptions.NoBuffering);
}
So we can change the number of partitions ourselves by passing a range size. With the code below, rangeSize works out to roughly 250,000, because my machine has 4 logical cores.
var rangeSize = (int)(NUM_AES_KEYS / Environment.ProcessorCount) + 1;
System.Threading.Tasks.Parallel.ForEach(Partitioner.Create(1, NUM_AES_KEYS + 1, rangeSize), range =>
Execute again:
There are now four partitions, with no significant difference in total time (the first time shown is the serial time). We can see that the four partitions start almost simultaneously. In most cases the load-balancing mechanism that TPL uses behind the scenes is very efficient, but controlling the partitioning yourself makes it easier to analyze your workload and improve overall performance.
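The partition counts reported above (13 by default, 4 with a custom range size) can be checked with a little arithmetic. The sketch below assumes 1 million items and 4 logical cores, as in the runs described, and hard-codes the core count so that the result does not depend on the machine. The class and method names are hypothetical; the factor 3 comes from the Create listing above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

static class RangeMath
{
    // Mirrors the arithmetic in the Create listing: range = items / (cores * 3).
    public static int DefaultRangeSize(int fromInclusive, int toExclusive, int logicalCores)
    {
        int size = (toExclusive - fromInclusive) / (logicalCores * 3);
        return size == 0 ? 1 : size;
    }

    // Counts how many ranges the partitioner actually produces for a range size.
    public static int CountRanges(int fromInclusive, int toExclusive, int rangeSize)
    {
        return Partitioner.Create(fromInclusive, toExclusive, rangeSize)
                          .GetDynamicPartitions()
                          .Count();
    }

    static void Main()
    {
        const int items = 1000000; // assumed iteration count
        const int cores = 4;       // assumed logical core count

        int defaultSize = DefaultRangeSize(1, items + 1, cores);
        int customSize = items / cores + 1;

        Console.WriteLine("Default range size " + defaultSize + ", ranges: " + CountRanges(1, items + 1, defaultSize));
        Console.WriteLine("Custom range size " + customSize + ", ranges: " + CountRanges(1, items + 1, customSize));
    }
}
```

With these assumptions the default range size is 83,333: twelve full ranges plus one short remainder range gives 13. The custom size of 250,001 gives exactly 4 ranges.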
Parallel.ForEach can also be used to refactor a loop over an IEnumerable<int> collection. Enumerable.Range generates the sequence of numbers. However, this does not give the partitioning behavior shown above.
private static void ParallelForEachGenerateMD5Hashes()
{
    var sw = Stopwatch.StartNew();
    System.Threading.Tasks.Parallel.ForEach(Enumerable.Range(1, NUM_AES_KEYS), number =>
    {
        var md5M = MD5.Create();
        byte[] data = Encoding.Unicode.GetBytes(Environment.UserName + number);
        byte[] result = md5M.ComputeHash(data);
        string hexString = ConvertToHexString(result);
    });
    Console.WriteLine("MD5: " + sw.Elapsed.ToString());
}
VI. Exiting from the loop
Unlike break in a serial loop, ParallelLoopState provides two methods for stopping the execution of Parallel.For and Parallel.ForEach.
- Break: Stops the loop as soon as possible after the current iteration. For example, if Break is called in iteration 100, the loop still processes all iterations below 100.
- Stop: Stops the loop as soon as possible. If Stop is called in iteration 100, there is no guarantee that all iterations below 100 will be processed.
Modify the method above so that it exits after 3 seconds of execution.
// Reports how the loop ended: ran to completion, ended by Break, or ended by Stop.
private static void ParallelLoopResult(ParallelLoopResult loopResult)
{
    string text;
    if (loopResult.IsCompleted)
    {
        text = "loop complete";
    }
    else
    {
        // LowestBreakIteration only has a value when Break was called.
        if (loopResult.LowestBreakIteration.HasValue)
        {
            text = "Break terminated";
        }
        else
        {
            text = "Stop terminated";
        }
    }
    Console.WriteLine(text);
}

private static void ParallelForEachGenerateMD5HashesBreak()
{
    var sw = Stopwatch.StartNew();
    var loopResult = System.Threading.Tasks.Parallel.ForEach(Enumerable.Range(1, NUM_AES_KEYS), (int number, ParallelLoopState loopState) =>
    {
        var md5M = MD5.Create();
        byte[] data = Encoding.Unicode.GetBytes(Environment.UserName + number);
        byte[] result = md5M.ComputeHash(data);
        string hexString = ConvertToHexString(result);
        // TotalSeconds measures the whole elapsed time; Seconds is only the 0-59 component.
        if (sw.Elapsed.TotalSeconds > 3)
        {
            loopState.Stop();
        }
    });
    ParallelLoopResult(loopResult);
    Console.WriteLine("MD5: " + sw.Elapsed);
}
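The listing above uses Stop. For comparison, here is a minimal sketch of the Break guarantee: every iteration below the breaking one still runs. The class name, array size, and break index are assumptions chosen for illustration.

```csharp
using System;
using System.Threading.Tasks;

static class BreakDemo
{
    // Marks each executed iteration in 'visited' and calls Break at index 'breakAt'.
    public static ParallelLoopResult RunWithBreak(bool[] visited, int breakAt)
    {
        return Parallel.For(0, visited.Length, (i, state) =>
        {
            visited[i] = true;
            if (i == breakAt)
            {
                // Iterations below 'breakAt' are still guaranteed to run;
                // iterations above it may or may not have started already.
                state.Break();
            }
        });
    }

    static void Main()
    {
        var visited = new bool[10000];
        ParallelLoopResult result = RunWithBreak(visited, 100);

        Console.WriteLine("IsCompleted: " + result.IsCompleted);
        Console.WriteLine("LowestBreakIteration: " + result.LowestBreakIteration);
    }
}
```

Replacing state.Break() with state.Stop() removes the guarantee for the lower iterations and leaves LowestBreakIteration without a value.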
VII. Catching exceptions that occur in parallel loops
When a delegate called in a parallel iteration throws an exception that is not caught inside the delegate, the exception becomes part of a set of exceptions, and System.AggregateException is responsible for carrying that set.
private static void ParallelForEachGenerateMD5HashesException()
{
    var sw = Stopwatch.StartNew();
    var loopResult = new ParallelLoopResult();
    try
    {
        loopResult = System.Threading.Tasks.Parallel.ForEach(Enumerable.Range(1, NUM_AES_KEYS), (number, loopState) =>
        {
            var md5M = MD5.Create();
            byte[] data = Encoding.Unicode.GetBytes(Environment.UserName + number);
            byte[] result = md5M.ComputeHash(data);
            string hexString = ConvertToHexString(result);
            // TotalSeconds measures the whole elapsed time; Seconds is only the 0-59 component.
            if (sw.Elapsed.TotalSeconds > 3)
            {
                throw new TimeoutException("execute more than three seconds");
            }
        });
    }
    catch (AggregateException ex)
    {
        // Each exception thrown by an iteration is collected in InnerExceptions.
        foreach (var innerEx in ex.InnerExceptions)
        {
            Console.WriteLine(innerEx.ToString());
        }
    }
    ParallelLoopResult(loopResult);
    Console.WriteLine("MD5: " + sw.Elapsed);
}
Results:
The exception appears several times in the output, because several iterations running in parallel each threw it before the loop stopped.
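A smaller sketch of the same behavior follows. It makes every even-numbered iteration throw and counts how many exceptions end up inside the AggregateException. The class name and the failure rule are assumptions for illustration, and the exact count varies from run to run.

```csharp
using System;
using System.Threading.Tasks;

static class AggregateDemo
{
    // Runs a loop in which even iterations throw, and returns how many
    // exceptions were collected before the loop shut down.
    public static int CountFailures(int iterations)
    {
        try
        {
            Parallel.For(0, iterations, i =>
            {
                if (i % 2 == 0)
                {
                    throw new InvalidOperationException("iteration " + i + " failed");
                }
            });
            return 0; // only reached if nothing threw
        }
        catch (AggregateException ex)
        {
            // One entry per exception thrown by a running iteration.
            return ex.InnerExceptions.Count;
        }
    }

    static void Main()
    {
        Console.WriteLine("Exceptions collected: " + CountFailures(1000));
    }
}
```

The count is at least 1 and depends on how many iterations were running when the first failure was noticed.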
VIII. Specifying the degree of parallelism
TPL methods always try to use all available logical cores to achieve the best results, but sometimes you do not want a parallel loop to use every core. For example, you may need to keep one core out of the parallel computation so that the application stays responsive to the user and that core can run other parts of the code. A good solution in this case is to specify the maximum degree of parallelism.
This requires creating an instance of ParallelOptions and setting its MaxDegreeOfParallelism property.
private static void ParallelMaxDegree(int maxDegree)
{
    var parallelOptions = new ParallelOptions();
    parallelOptions.MaxDegreeOfParallelism = maxDegree;
    var sw = Stopwatch.StartNew();
    System.Threading.Tasks.Parallel.For(1, NUM_AES_KEYS + 1, parallelOptions, (int i) =>
    {
        var aesM = new AesManaged();
        aesM.GenerateKey();
        byte[] result = aesM.Key;
        string hexStr = ConvertToHexString(result);
    });
    Console.WriteLine("AES: " + sw.Elapsed.ToString());
}
Call it as follows. On a quad-core processor, this uses 3 cores.
ParallelMaxDegree(Environment.ProcessorCount - 1);
This is somewhat slower (the first Parallel.For run took 3.18 s), but it frees up a core to handle other work.
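To see the limit take effect, the following sketch records the highest number of iterations that were ever running at the same moment. The class name, sleep duration, and iteration count are assumptions for illustration; ParallelOptions.MaxDegreeOfParallelism is the standard property.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class DegreeDemo
{
    // Returns the peak number of loop bodies observed running concurrently.
    public static int ObservedPeak(int maxDegree, int iterations)
    {
        var gate = new object();
        int current = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For(0, iterations, options, i =>
        {
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }

            Thread.Sleep(5); // simulated work

            lock (gate)
            {
                current--;
            }
        });
        return peak;
    }

    static void Main()
    {
        // Guard against a single-core machine, where ProcessorCount - 1 would be 0.
        int degree = Math.Max(1, Environment.ProcessorCount - 1);
        Console.WriteLine("Requested " + degree + ", observed peak " + ObservedPeak(degree, 200));
    }
}
```

MaxDegreeOfParallelism must be a positive number or -1 (no limit). Passing Environment.ProcessorCount - 1 directly would throw on a single-core machine, hence the Math.Max guard in this sketch.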
Summary: In this article we learned about the methods of the Parallel class, how to exit from parallel loops, how to catch exceptions, and how to set the degree of parallelism, along with some background on parallelism. There are similar posts elsewhere in the blogging community, but I wrote this one to organize my own knowledge.
A fellow blogger's series: 8 days of play and concurrency
Book read: Advanced tutorial on parallel programming in C#