The key to letting the compiler and the operating system extract higher performance from an application is to give them sufficient information about the code's intent. When the compiler and runtime fully understand what the code is trying to accomplish, they can maximize its parallel throughput at compile time and at run time. Developers can then concentrate on the business problems they actually care about and hand the heavyweight planning of multi-core, multi-processor work over to the compiler, the runtime library, and the infrastructure code in the operating system.
Loops are an important target, because distributing the independent iterations of a loop across all available hardware resources generally yields higher application performance. Consider a small example: iterating over all the elements of a collection to compute their sum. The simplest and most direct implementation looks like this:
std::vector<int> v;
v.push_back(1);
v.push_back(5);
int total = 0;
for (int ix = 0; ix < v.size(); ++ix) {
    total += v[ix];
}
This example is easy to read and write by hand, and for developers familiar with the C family of languages the intent of the loop is obvious. For the compiler and runtime library to schedule this loop across multiple threads, however, they need hints such as OpenMP compiler directives to tell them where the opportunity for optimization lies:
std::vector<int> v;
v.push_back(1);
v.push_back(5);
int total = 0;
#pragma omp parallel for
for (int ix = 0; ix < static_cast<int>(v.size()); ++ix) {
#pragma omp atomic
    total += v[ix];
}
The first OpenMP directive, omp parallel for, asks that the for loop be run on multiple threads; the second, omp atomic, prevents multiple threads from writing to the total variable at the same time. Detailed descriptions of all the OpenMP directives are available in the MSDN Library reference documentation.
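As an aside (a minimal sketch, not taken from the original example), OpenMP also offers a reduction clause for exactly this pattern: each thread accumulates its own private copy of total, and the copies are combined when the loop finishes, which avoids paying for an atomic operation on every iteration:

std::vector<int> v;
v.push_back(1);
v.push_back(5);
int total = 0;
// reduction(+:total) gives every thread a private total initialized to 0
// and adds the private copies together after the parallel loop completes.
#pragma omp parallel for reduction(+:total)
for (int ix = 0; ix < static_cast<int>(v.size()); ++ix) {
    total += v[ix];
}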
Using a declarative loop makes it easier to apply parallelism to the vector summation, and the STL for_each algorithm is an ideal replacement. The example can be rewritten as follows:
class Adder {
private:
    int _total;
public:
    Adder() : _total(0) {}
    void operator()(int& i)
    {
        _total += i;
    }
    operator int()
    {
        return _total;
    }
};

void VectorAdd()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(5);
    int total = std::for_each(v.begin(), v.end(), Adder());
}
Here the explicit for loop is gone, and the code that computes the vector sum has become cleaner. Defining a whole class to carry the running total, however, considerably complicates the solution. Unless the code base contains a large number of similar summations, few developers will spend the extra time defining a new class just to reap the benefits of STL for_each.
A careful look at the Adder class shows that most of it exists only so that instances can be called as function objects; the only line that does any computation is _total += i. With this in mind, C++0x introduces a greatly simplified syntax in the form of lambda functions. Lambdas remove the need for all that boilerplate and allow the predicate function to be defined inline, within another statement. The VectorAdd function can therefore be rewritten as follows:
std::vector<int> v;
v.push_back(1);
v.push_back(5);
int total = 0;
std::for_each(v.begin(), v.end(),
    [&total](int x) { total += x; }
);
The syntax of lambda functions is straightforward. The first element, in square brackets, tells the compiler that the local variable total is captured by reference (capture by reference is needed here so that the vector sum is still available after for_each returns). The second part is the parameter list, and the last part is the body of the function. In this example, the value of the parameter x is added to the total variable.
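For reference, a self-contained version of the same snippet might look like the following; the headers and the function name SumWithLambda are added here purely for illustration and are not part of the original example:

#include <algorithm>
#include <vector>

int SumWithLambda()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(5);

    int total = 0;
    // total is captured by reference, so the accumulated sum is still
    // available after std::for_each returns.
    std::for_each(v.begin(), v.end(),
        [&total](int x) { total += x; });

    return total; // 6
}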
If the lambda function has no variables to capture, the square brackets at the beginning are simply left empty (variables can also be captured by copy, as shown shortly):
std::for_each(v.begin(), v.end(), [](int x) {
    std::cout << x << std::endl;
});
Captures by reference and by copy can also be mixed; here total is captured by reference while displayInput is captured by copy:
int total = 0;
bool displayInput = true;
std::for_each(v.begin(), v.end(), [&total, displayInput](int x) {
    total += x;
    if (displayInput) {
        std::cout << x << std::endl;
    }
});
Here the variable displayInput is captured by value (a copy). Because the lambda is not declared mutable, any attempt to modify displayInput inside the body makes the Visual C++ compiler report error C3491: 'displayInput': a by-value capture cannot be modified in a non-mutable lambda.
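If the copy really does need to change inside the body, the lambda can be declared mutable. The following is a minimal sketch, assuming the same v, total, and displayInput as in the previous example; it is not part of the original text:

// 'mutable' allows the by-value copy of displayInput to be modified inside
// the lambda body; only the copy changes, the outer variable keeps its value.
std::for_each(v.begin(), v.end(), [&total, displayInput](int x) mutable {
    total += x;
    if (displayInput) {
        std::cout << x << std::endl;
        displayInput = false; // legal only because the lambda is mutable
    }
});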
Another noteworthy part of lambda functions is the return type. The compiler will usually deduce the return type of a lambda expression on its own, but for a more complex multi-statement body the return type may need to be declared explicitly. This is done by placing the -> operator between the lambda's parameter list and its body, followed by the type to be returned:
std::for_each(v.begin(), v.end(),
    [&](int x) -> void { total += x; });
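The -> void above adds little, since void would be deduced anyway. A case where the explicit return type earns its keep is a multi-statement body whose return statements have different types. The following sketch is illustrative only (the transform call and the doubled values are assumptions, not part of the original example); declaring -> double makes both return statements convert to the same type:

#include <algorithm>
#include <vector>

void DoubleValues()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(5);
    std::vector<double> results(v.size());

    // The body contains two return statements of different types,
    // so the return type is spelled out as -> double.
    std::transform(v.begin(), v.end(), results.begin(),
        [](int x) -> double {
            if (x < 0) {
                return 0.0;   // already a double
            }
            return x * 2;     // int, converted to double
        });
}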
The addition of lambda functions to C++ makes declarative programming and the use of STL algorithms much more concise. Lambdas can be defined right in the middle of executable code inside a function body. Besides giving the compiler strong hints for optimization, the coding patterns that lambdas encourage make it easier to see which piece of code implements which piece of functionality. Visual C++ 2010 brings significant improvements in parallel processing, and lambda functions are one of the important tools for delivering them.