C++11 Multi-Threading Tutorial (Part I)

Source: Internet
Author: User
Tags: posix macbook

The code for this tutorial is available on GitHub: https://github.com/sol-prog/threads.

In previous tutorials, I showed some of the newest C++11 language features:

    • 1. Regular Expressions (http://solarianprogrammer.com/2011/10/12/cpp-11-regex-tutorial/)
    • 2. Raw String Literals (http://solarianprogrammer.com/2011/10/16/cpp-11-raw-strings-literals-tutorial/)
    • 3. Lambdas (http://solarianprogrammer.com/2011/11/01/cpp-11-lambda-tutorial/)

Support for multithreading is perhaps one of the biggest changes to the C++ language. Previously, C++ could only achieve multi-core computing through the operating system's facilities (the Pthreads library on Unix-family systems) or through libraries such as OpenMP and MPI.

This tutorial is meant to give you a head start with C++11 threads, rather than simply restating the language standard here.

Creating and starting a C++ thread is as simple as adding the thread header to a C++ source file. Let's look at how to create a simple threaded Hello World:

#include <iostream>
#include <thread>

// This function will be called from a thread
void call_from_thread() {
    std::cout << "Hello, World" << std::endl;
}

int main() {
    // Launch a thread
    std::thread t1(call_from_thread);

    // Join the thread with the main thread
    t1.join();

    return 0;
}

On Linux systems, the above code can be compiled with g++:

g++ -std=c++0x -pthread file_name.cpp

On Mac OS X with Xcode 4.x, you can compile the above code with clang++:

clang++ -std=c++0x -stdlib=libc++ file_name.cpp

On Windows, you can use the commercial library Just::Thread to compile multithreaded code. Unfortunately, they don't provide a trial version of the library, so I couldn't test it.

In a real-world application, the function call_from_thread would do some computational work independently of the main function. In the code above, the main function creates a thread and waits at t1.join() for the t1 thread to finish. If you forget to wait for a thread to finish, the main thread may reach the end of its own execution first, and the entire program will kill the previously created thread on exit, regardless of whether call_from_thread has finished executing.

The above code is fairly concise compared with the equivalent code using POSIX threads. For comparison, here is the POSIX version:

#include <iostream>
#include <pthread.h>

// This function will be called from a thread
void *call_from_thread(void *) {
    std::cout << "Hello, World" << std::endl;
    return NULL;
}

int main() {
    pthread_t t;

    // Launch a thread
    pthread_create(&t, NULL, call_from_thread, NULL);

    // Join the thread with the main thread
    pthread_join(t, NULL);

    return 0;
}

Usually we want to start several threads at once and have them work in parallel. To do this, we can create a group of threads instead of creating a single thread as in the previous example. In the following example, the main function creates a group of 10 threads and waits for those threads to complete their task (the POSIX version of this example is also included in the GitHub repository):

...

static const int num_threads = 10;

...

int main() {
    std::thread t[num_threads];

    // Launch a group of threads
    for (int i = 0; i < num_threads; ++i) {
        t[i] = std::thread(call_from_thread);
    }

    std::cout << "Launched from the main\n";

    // Join the threads with the main thread
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }

    return 0;
}

Remember that the main function also runs on a thread, usually called the main thread, so the code above actually has 11 threads running. After the thread group is launched, both the worker threads and the main function are free to do other work before they are joined; at the end of this tutorial we will illustrate this with an image-processing example.

What about using a function with parameters in a thread? C++11 lets us pass any required parameters when starting the thread. To illustrate, we can modify the code above to accept an integer parameter (the POSIX version of this example is also included in the GitHub repository):

static const int num_threads = 10;

// This function will be called from a thread
void call_from_thread(int tid) {
    std::cout << "Launched by thread " << tid << std::endl;
}

int main() {
    std::thread t[num_threads];

    // Launch a group of threads
    for (int i = 0; i < num_threads; ++i) {
        t[i] = std::thread(call_from_thread, i);
    }

    std::cout << "Launched from the main\n";

    // Join the threads with the main thread
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }

    return 0;
}

On my system, the result of the above code is:

sol$ ./a.out
Launched by thread 0
Launched by thread 1
Launched by thread 2
Launched from the main
Launched by thread 3
Launched by thread 5
Launched by thread 6
Launched by thread 7
Launched by thread Launched by thread 4
8L
aunched by thread 9
sol$

As you can see from the results above, once the program creates its threads, the order in which they run is non-deterministic. The programmer's task is to make sure the group of threads does not trip over itself when accessing shared data. The scrambled output in the last few lines shows that when thread 8 started, thread 4 had not yet finished writing to stdout. In fact, if you run the code on your own machine, you may get a completely different result, or even garbled character output, because the program's 11 threads are competing for the shared stdout resource (this is a race condition).

To avoid the above problem, you can use synchronization primitives in your code, such as std::mutex, to make the group of threads access the shared resource in a synchronized manner, or, where feasible, give each thread private data structures and avoid shared resources altogether. We will also talk about thread synchronization in future tutorials, including atomic types and mutexes.

In principle, the code above covers all the concepts needed to write more complex parallel programs.

For the next example, I'm going to show the power of parallel programming on a slightly more complicated problem: removing noise from an image with a blur filter. The idea is to replace each pixel with a weighted mean of itself and its neighboring pixels (a smoothing filter), thereby removing the noise from the picture.

This tutorial is not about optimized image processing, and I am no specialist in that area, so we take a fairly simple approach. Our goal is to show how to write parallel code; how to access the image efficiently, and how to compute the filter's convolution, are not the focus. As an example, I use only the definition of spatial convolution, rather than the faster, but slightly harder to implement, frequency-domain convolution using a fast Fourier transform.

For simplicity, we will use the simple uncompressed PPM image format. Below is a simplified C++ class header responsible for reading and writing PPM images, storing the image in memory as three unsigned char arrays (the R, G, B channels):

class PPM {
    bool flag_alloc;
    void init();
    // Info about the PPM file (height and width)
    unsigned int nr_lines;
    unsigned int nr_columns;

public:
    // Arrays for storing the R, G, B values
    unsigned char *r;
    unsigned char *g;
    unsigned char *b;

    unsigned int height;
    unsigned int width;
    unsigned int max_col_val;

    // Total number of elements (pixels)
    unsigned int size;

    PPM();
    // Create a PPM object and fill it with the data stored in fname
    PPM(const std::string &fname);
    // Create an "empty" PPM image with a given width and height;
    // the R, G, B arrays are filled with zeros
    PPM(const unsigned int _width, const unsigned int _height);
    // Free the memory used by the R, G, B vectors when the object is destroyed
    ~PPM();

    // Read the PPM image from fname
    void read(const std::string &fname);
    // Write the PPM image to fname
    void write(const std::string &fname);
};

A viable approach is:

    • Load the image into memory.
    • Split the image into several parts, each handled by one thread; the number of threads should be the maximum the system can run concurrently, e.g. a quad-core machine with hyper-threading can run 8 threads.
    • Launch the worker threads, each processing its own block of the image.
    • Let the main thread process the last image block.
    • Join the worker threads with the main thread, waiting for all of them to finish.
    • Save the processed image.

Next we list the main function, which implements the algorithm above (thanks to wiched for the suggested code changes):

int main() {
    std::string fname = std::string("your_file_name.ppm");

    PPM image(fname);
    PPM image2(image.width, image.height);

    // Number of threads to use (the image will be divided between the threads)
    int parts = 8;

    std::vector<int> bnd = bounds(parts, image.size);

    std::thread *tt = new std::thread[parts - 1];

    time_t start, end;
    time(&start);

    // Launch parts - 1 threads
    for (int i = 0; i < parts - 1; ++i) {
        tt[i] = std::thread(tst, &image, &image2, bnd[i], bnd[i + 1]);
    }

    // Use the main thread to do part of the work too!
    for (int i = parts - 1; i < parts; ++i) {
        tst(&image, &image2, bnd[i], bnd[i + 1]);
    }

    // Join the parts - 1 threads with the main thread
    for (int i = 0; i < parts - 1; ++i)
        tt[i].join();

    time(&end);
    std::cout << difftime(end, start) << " seconds" << std::endl;

    // Save the result
    image2.write("test.ppm");

    // Free memory and exit
    delete[] tt;

    return 0;
}

Please disregard the hard-coded image file name and thread count; in a real application you would let the user supply these parameters interactively.

Now, to see how the parallel code performs, we need a large enough workload; otherwise the overhead of creating and destroying threads will dominate and make our parallel test meaningless. The input image should be large enough to show a measurable improvement in the performance of the parallel code. To this end, I used a PPM image of 16000x10626 pixels, occupying about 512 MB:

I used the GIMP software to mix some noise into the picture. The noisy version looks like this:

The result of running the preceding code:

As you can see, the noise level of the image has been visibly reduced.

The results of the sample code running on a dual-core MacBook Pro:

Compiler   Optimization   Threads   Time   Speedup
clang++    none           1         60s
clang++    none           4         30s    2x
clang++    -O4            1         12s
clang++    -O4            4         6s     2x

On this dual-core machine, the parallel version is twice as fast as the serial (single-threaded) version.

I also tested on a quad-core Intel i7 Linux machine, with the following results:

Compiler   Optimization   Threads   Time   Speedup
g++        none           1         33s
g++        none           8         13s    2.54x
g++        -O4            1         9s
g++        -O4            8         3s     3x

Apparently, Apple's clang++ gains more from parallelism here. In any case, this comes down to compiler and machine characteristics; it may also matter that the MacBook Pro has 8 GB of memory while the Linux machine has only 6 GB.

If you are interested in learning the new C++11 syntax, I recommend reading Professional C++ or C++ Primer Plus. For the C++11 multithreading topic, I recommend C++ Concurrency in Action, which is a good book.

From: http://article.yeeyan.org/view/234235/268247
