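Today I studied OpenMP sections. The sections construct lets the user split a task into several independent section blocks. Each section is executed once by one thread of the team, so different sections can be handled by different threads.
C/C++ test code: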
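#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 2;
    int b = 20;
    int c = 200;
    int d = 2000;
    int sum;

    omp_set_num_threads(4);
    /* sum is private so that the threads do not all write one shared variable */
    #pragma omp parallel private(sum)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                a = 1;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                b = 10;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                c = 100;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
            #pragma omp section
            {
                d = 1000;
                printf("execute thread ID is %d\n", omp_get_thread_num());
            }
        }   /* implicit barrier: all four sections have finished here */

        /* still inside the parallel region: every thread runs these two lines */
        sum = a + b + c + d;
        printf("sum = %d\n", sum);
    }
    return 0;
}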
Test output:
execute thread ID is 3
execute thread ID is 1
execute thread ID is 2
execute thread ID is 0
sum = 1111
sum = 1111
sum = 1111
sum = 1111
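As the output shows, the four sections were assigned to four threads, one each. A sections construct ends with an implicit barrier (unless nowait is specified), so every section has finished before any thread computes sum, and sum has the expected value.
To make this synchronization easier to see (all threads wait at the end of the sections construct before sum is computed), modify the code as follows: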
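#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 2;
    int b = 20;
    int c = 200;
    int d = 2000;
    int sum;

    omp_set_num_threads(4);
    /* sum is private so that the threads do not all write one shared variable */
    #pragma omp parallel private(sum)
    {
        #pragma omp sections
        {
            /* Each section loops longer than the previous one. The sections
               deliberately write the same shared variables, so the last one
               to finish decides the final values. This is a data race and is
               kept only for the demonstration. An optimizing compiler may
               remove these loops, so build without optimization. */
            #pragma omp section
            {
                a = 1;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7ffff; i++)
                {
                    a = 1;
                    b = 2;
                    c = 3;
                    d = 4;
                }
            }
            #pragma omp section
            {
                b = 10;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7fffff; i++)
                {
                    a = 10;
                    b = 20;
                    c = 30;
                    d = 40;
                }
            }
            #pragma omp section
            {
                c = 100;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7ffffff; i++)
                {
                    a = 100;
                    b = 200;
                    c = 300;
                    d = 400;
                }
            }
            #pragma omp section
            {
                d = 1000;
                printf("execute thread ID is %d\n", omp_get_thread_num());
                for (int i = 0; i < 0x7fffffff; i++)
                {
                    a = 1000;
                    b = 2000;
                    c = 3000;
                    d = 4000;
                }
            }
        }   /* implicit barrier: waits for the slowest (fourth) section */

        sum = a + b + c + d;
        printf("sum = %d\n", sum);
    }
    return 0;
}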
Test output:
execute thread ID is 0
execute thread ID is 2
execute thread ID is 1
execute thread ID is 3
sum = 10000
sum = 10000
sum = 10000
sum = 10000
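The fourth section takes the longest to run, so sum is only computed after the fourth section has finished. That is why the values it wrote last are the ones that get added up: 1000 + 2000 + 3000 + 4000 = 10000. Strictly speaking, the sections here write the same shared variables at the same time, which is a data race, so the result is only reliable as a demonstration of the waiting behaviour.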
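An important question in parallel programming is how to divide the problem into separate modules. When the modules are independent of one another, sections can be used to hand them to different processors.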
Note that sum = 10000 is printed four times. This is because the statement sum = a + b + c + d; is inside the parallel region, so every thread executes it once.
Excerpt (OpenMP specification): The binding thread set for a sections region is the current team. A sections region binds to the innermost enclosing
parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the
structured blocks and (optional) implicit barrier of the sections region.