Timing Issues in OpenMP Multithreaded Programming

Source: Internet
Author: User
Tags: xeon e5

While testing a parallelized matrix multiplication, I ran into a problem when timing it with clock() from <time.h>.

First, look at the serial program:

matrix_cpu.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* memset */
    #include <time.h>

    #define NUM 2048

    /* C (M x N) = A (M x K) * B (K x N), all stored row-major. */
    void matrixmul(float *A, float *B, float *C, int M, int K, int N)
    {
        int i, j, k;
        for (i = 0; i < M; i++)
        {
            for (j = 0; j < N; j++)
            {
                float sum = 0.0f;
                for (k = 0; k < K; k++)
                {
                    sum += A[i*K+k] * B[k*N+j];
                }
                C[i*N+j] = sum;
            }
        }
    }

    int main(int argc, char* argv[])
    {
        float *A, *B, *C;
        clock_t start, finish;
        double duration;

        A = (float *) malloc(sizeof(float) * NUM * NUM);
        B = (float *) malloc(sizeof(float) * NUM * NUM);
        C = (float *) malloc(sizeof(float) * NUM * NUM);
        memset(A, 0, sizeof(float) * NUM * NUM);
        memset(B, 0, sizeof(float) * NUM * NUM);
        memset(C, 0, sizeof(float) * NUM * NUM);

        printf("start...\n");
        start = clock();
        matrixmul(A, B, C, NUM, NUM, NUM);
        finish = clock();

        /* clock() counts CPU time used by the process, not elapsed time. */
        duration = (double)(finish - start) / CLOCKS_PER_SEC;
        printf("Time: %fs\n", duration);
        return 0;
    }

After compiling, running the program gives the following result:

    $ ./matrix_cpu
    start...
    Time: 26.130000s

The CPU is a Xeon E5-2650, which is relatively fast, but the program is still serial (a single thread on a single core), so it takes 26 seconds (171 seconds on the author's i5-4200 ThinkPad).

Running it again under the time command gives the following:

    $ time ./matrix_cpu
    start...
    Time: 26.770000s

    real    0m28.073s
    user    0m26.779s
    sys     0m0.019s

The time reported by the program agrees with the user time reported by the time command. The real time is somewhat longer because it also covers malloc and the other work outside the timed region, which is reasonable.


Next, look at the parallel OpenMP program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* memset */
    #include <time.h>

    #define NUM 2048
    #define THREAD_NUM 2

    /* C (M x N) = A (M x K) * B (K x N), rows of C split across threads. */
    void matrixmul(float *A, float *B, float *C, int M, int K, int N)
    {
        int i, j, k;
    #pragma omp parallel for private(j,k) num_threads(THREAD_NUM)
        for (i = 0; i < M; i++)
        {
            for (j = 0; j < N; j++)
            {
                float sum = 0.0f;
    #pragma ivdep
                for (k = 0; k < K; k++)
                {
                    sum += A[i*K+k] * B[k*N+j];
                }
                C[i*N+j] = sum;
            }
        }
    }

    int main(int argc, char* argv[])
    {
        float *A, *B, *C;
        clock_t start, finish;
        double duration;

        A = (float *) malloc(sizeof(float) * NUM * NUM);
        B = (float *) malloc(sizeof(float) * NUM * NUM);
        C = (float *) malloc(sizeof(float) * NUM * NUM);
        memset(A, 0, sizeof(float) * NUM * NUM);
        memset(B, 0, sizeof(float) * NUM * NUM);
        memset(C, 0, sizeof(float) * NUM * NUM);

        printf("start...\n");
        start = clock();
        matrixmul(A, B, C, NUM, NUM, NUM);
        finish = clock();

        duration = (double)(finish - start) / CLOCKS_PER_SEC;
        printf("Time: %fs\n", duration);
        return 0;
    }

As you can see, the OpenMP program uses only two threads, so in theory the run time should be halved.

After compiling, running the program gives the following result:

    $ ./matrix_omp
    start...
    Time: 26.550000s

This is strange. Counting in my head, the run took about 15 seconds, so why does the program report 26 seconds?

Run it again under the time command:

    $ time ./matrix_omp
    start...
    Time: 26.440000s

    real    0m13.438s
    user    0m26.457s
    sys     0m0.016s

As you can see, the real run time is about 13 seconds, but the user time is about 26 seconds, almost twice the real time.

I looked it up and found this explanation:

real: wall-clock time, that is, the elapsed time from the start of the program to its end
user: CPU time spent executing the program's own code (excluding kernel calls), that is, the CPU time actually consumed by the process
sys: CPU time spent in kernel calls made by the program

In a single-threaded serial program only one thread is running, so user is simply the CPU time of that thread. In a multithreaded program, a process may have several threads executing in parallel, and user adds up the CPU time of all of them. That total is therefore roughly equal to the user time of the single-threaded run, however short the real time becomes. On Linux, clock() likewise reports the CPU time of the whole process, which is why the program printed 26 seconds rather than 13.
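
The difference between the two clocks can be seen from inside a program. The following is a minimal sketch, not the article's code: busy_sum is a hypothetical stand-in workload, and wall_seconds, timing and time_busy_sum are names invented for illustration. It assumes a POSIX system where clock_gettime with CLOCK_MONOTONIC is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock; only differences are meaningful. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in workload (not the article's matrixmul).
   Built without -fopenmp, the pragma is ignored and the loop runs serially. */
double busy_sum(long n)
{
    double total = 0.0;
    long i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
    {
        total += (double) i * 0.5;
    }
    return total;
}

/* Both timings of one call to busy_sum, in seconds. */
typedef struct {
    double cpu;    /* from clock(): CPU time of the whole process */
    double wall;   /* from clock_gettime(): elapsed real time */
    double result; /* workload result, kept so the call is not optimized away */
} timing;

timing time_busy_sum(long n)
{
    timing t;
    clock_t c0 = clock();
    double w0 = wall_seconds();

    t.result = busy_sum(n);

    t.wall = wall_seconds() - w0;
    t.cpu = (double) (clock() - c0) / CLOCKS_PER_SEC;
    return t;
}

void print_timing(timing t)
{
    printf("clock(): %fs, wall clock: %fs\n", t.cpu, t.wall);
}
```

Calling print_timing(time_busy_sum(n)) with a large n from a build that uses -fopenmp should show the clock() figure exceeding the wall-clock figure by roughly the number of busy threads, while a serial build should show the two close together. OpenMP also provides omp_get_wtime() in <omp.h> as its own wall-clock timer, which could replace wall_seconds here.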

To check this, set the number of threads to 4 and run the code again (it takes about 7 seconds):

    $ ./matrix_omp
    start...
    Time: 27.270000s
    $ time ./matrix_omp
    start...
    Time: 27.170000s

    real    0m7.486s
    user    0m27.176s
    sys     0m0.018s

The real running time is about 7 seconds and the total CPU time is 27 seconds, roughly the same total as before.

Then set the number of threads to 16 and run the code again (it takes a little over 2 seconds):

    $ ./matrix_omp
    start...
    Time: 33.980000s
    $ time ./matrix_omp
    start...
    Time: 33.530000s

    real    0m2.241s
    user    0m33.479s
    sys     0m0.075s

The total CPU time now shows an upward trend, but the real time is still greatly reduced. The E5-2650 has 8 cores and 16 hardware threads, and at 16 threads the summed thread time goes up instead of staying constant.
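
One way to read these figures is to divide user by real, which gives the average number of cores the process kept busy. The sketch below only redoes that arithmetic on the numbers reported by the time command in this article; busy_cores and print_busy_cores are names invented for illustration.

```c
#include <stdio.h>

/* user / real: average number of cores the process kept busy. */
double busy_cores(double user_s, double real_s)
{
    return user_s / real_s;
}

/* Prints the ratio for each run reported in the article. */
void print_busy_cores(void)
{
    /* real and user seconds copied from the time outputs above */
    const int threads[] = { 1, 2, 4, 16 };
    const double real_s[] = { 28.073, 13.438, 7.486, 2.241 };
    const double user_s[] = { 26.779, 26.457, 27.176, 33.479 };
    int i;

    for (i = 0; i < 4; i++)
    {
        printf("%2d thread(s): user/real = %.2f\n",
               threads[i], busy_cores(user_s[i], real_s[i]));
    }
}
```

The ratios come out at roughly 0.95, 1.97, 3.63 and 14.94, which tracks the thread count in each run and is consistent with user being the sum of the per-thread CPU times.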


Summary: for multithreaded programs, it is better to measure the run time with the time command.
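
If the same three figures are wanted from inside the program rather than from the shell, something along the following lines may work. This is a sketch under assumptions, not what the time command itself does: it assumes a POSIX system with getrusage and clock_gettime, it only covers the calling process (not child processes), and times_snapshot, take_snapshot and print_elapsed are names invented for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose getrusage and clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical container for the three figures that time prints. */
typedef struct {
    double real; /* wall-clock seconds on a monotonic clock (arbitrary origin) */
    double user; /* CPU seconds in user mode, summed over all threads */
    double sys;  /* CPU seconds in kernel mode, summed over all threads */
} times_snapshot;

static double timeval_seconds(struct timeval tv)
{
    return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Take a snapshot; subtract two snapshots to time a region of code. */
times_snapshot take_snapshot(void)
{
    times_snapshot s;
    struct timespec now;
    struct rusage ru;

    clock_gettime(CLOCK_MONOTONIC, &now);
    getrusage(RUSAGE_SELF, &ru); /* resource usage of this process */

    s.real = (double) now.tv_sec + (double) now.tv_nsec / 1e9;
    s.user = timeval_seconds(ru.ru_utime);
    s.sys = timeval_seconds(ru.ru_stime);
    return s;
}

/* Print the differences in the same order as the time command. */
void print_elapsed(times_snapshot begin, times_snapshot end)
{
    printf("real %.3fs\n", end.real - begin.real);
    printf("user %.3fs\n", end.user - begin.user);
    printf("sys  %.3fs\n", end.sys - begin.sys);
}
```

Taking one snapshot before the call to matrixmul and one after it, then passing both to print_elapsed, would report real, user and sys for just that region, so the malloc and memset calls are left out of the measurement.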
