C++ Compiler Performance Comparison

Source: Internet
Author: User
Tags intel core 2 duo

 

The mainstream C/C++ compilers on the market today include Microsoft's cl, gcc, Intel's icl, PGI's pgcc, and CodeGear's bcc (formerly Borland's). cl is the most widely used on Windows, while gcc is the first choice across a broader range of platforms. When it comes to optimization capability, however, the ranking may not match market share.

Today we compare the numerical performance of these compilers. The test code is a numerical integration program. It is taken from the Intel compiler's sample programs, with one header file modified so that every compiler can build it.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

// Function to be integrated
// Define and prototype it here
// | sin(x) |
#define INTEG_FUNC(x) fabs(sin(x))

// Prototype timing function (unused here; clock() is called instead)
double dclock(void);

int main(void)
{
    // Loop counters and number of interior points
    unsigned int i, j, N;
    // Stepsize, independent variable x, and accumulated sum
    double step, x_i, sum;
    // Timing variables for evaluation
    double start, finish, duration;
    // Start integral from
    double interval_begin = 0.0;
    // Complete integral at
    double interval_end = 2.0 * 3.141592653589793238;

    // Start timing for the entire application
    start = clock();

    printf("\n");
    printf("   Number of     | Computed Integral |\n");
    printf(" Interior Points |                   |\n");
    for (j = 2; j < 27; j++)
    {
        printf("-------------------------------------\n");

        // Compute the number of (internal rectangles + 1)
        N = 1u << j;

        // Compute stepsize for N-1 internal rectangles
        step = (interval_end - interval_begin) / N;

        // Approx. 1/2 area in first rectangle: f(x0) * [step/2]
        sum = INTEG_FUNC(interval_begin) * step / 2.0;

        // Apply midpoint rule:
        // Given length = f(x), compute the area of
        // the rectangle of width step
        // Sum areas of internal rectangle: f(xi + step) * step
        for (i = 1; i < N; i++)
        {
            x_i = i * step;
            sum += INTEG_FUNC(x_i) * step;
        }

        // Approx. 1/2 area in last rectangle: f(xN) * [step/2]
        sum += INTEG_FUNC(interval_end) * step / 2.0;

        printf("%10u | %14e |\n", N, sum);
    }
    finish = clock();
    duration = (finish - start);
    printf("\n");
    printf("Application Clocks = %10e\n", duration);
    printf("\n");

    return 0;
}
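As a reading aid, the sum accumulated by the loop above can be written out as a formula. This is only a restatement of the listing: the symbols h, a, b and S_N are introduced here for illustration and do not appear in the source. The source comments call it the midpoint rule, but a sum with half-weighted end points is what is usually called the composite trapezoidal rule.

```latex
% a = 0, b = 2\pi, f(x) = |\sin x|, N = 2^j for j = 2, \dots, 26
h = \frac{b - a}{N}, \qquad x_i = a + i\,h

S_N = h \left[ \tfrac{1}{2} f(x_0) + \sum_{i=1}^{N-1} f(x_i) + \tfrac{1}{2} f(x_N) \right]

% Exact value the printed column should approach:
\int_0^{2\pi} |\sin x| \, dx = 2 \int_0^{\pi} \sin x \, dx = 2 \bigl[ -\cos x \bigr]_0^{\pi} = 4

% Smallest case, N = 4, h = \pi/2, f = 0, 1, 0, 1, 0 at the five points:
S_4 = \frac{\pi}{2} \, (0 + 1 + 0 + 1 + 0) = \pi \approx 3.1416
```

Over all values of j the program evaluates sin roughly 1.3 × 10^8 times, so the benchmark is dominated by how well each compiler handles the call to sin inside the inner loop.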

Since the code comes from Intel, it naturally suits the Intel compiler well. The following tests were run on an Intel Core 2 Duo.

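Compilers: gcc = GCC 4.3.0 (TDM-2 for MinGW); cl = VC 9.0 (cl 15.00.21022.08); icl = Intel (icl 10.1); pgcc = PGI (pgcc 7.16); bcc32 = CodeGear (bcc32 6.10). Values are the "Application Clocks" printed by the program; lower is better.

Setting                   gcc      cl     icl    pgcc   bcc32
Optimization disabled   17161   14461   12441   10514   13400
                        17133   14430   11687    9956   12917
                        17155   14476   11871   10099   13026
-O2                     13011    7737    4540    9348   12636
                        16571    7706    4185    9148   13026
                        16573    7706    4042    9183   13057
Platform optimization   16060    7710    1938    9578     n/a

Options with optimization disabled (as listed in the source): -O0 /Od -O0 -Od
Platform optimization options: gcc -march=core2 -O2; cl /arch:SSE2 /O2; icl -QxT; pgcc -tp core2 -O2; bcc32 none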

The results show that the Intel compiler is very strong at numerical computation, and that its CPU-specific optimization in particular brings a large gain: with -QxT the time drops to 1938, against 4042 to 4540 at -O2. GCC is somewhat disappointing. Comparing the unoptimized builds with the -O2 builds, the Intel and Microsoft compilers improve markedly, while the other compilers gain very little from optimization. Ranked by performance, the order is icl > cl > pgcc > bcc > gcc.

 

 

 

In addition, on a Pentium 4 1.5 GHz server running Linux, the test results are as follows:

Setting                       gcc        icc       pgCC
-O2                      24920000   10840000   22270000
-O0                      28290000   19210000   24320000
Platform optimization    24990000    6640000   22150000

Platform optimization options: gcc -march=pentium4 -O2; icc -xN; pgCC -tp piv -O2

 

 

 

 

Again, Intel is the best and gcc is the worst.

We also ran the test on an Athlon X2 4800+ under Linux, with the following results:

Setting                      gcc        icc       pgcc
-O0                      9390000   14950000    9950000
-O2                      8910000    9240000    9400000
Platform optimization    8800000    3800000    9030000

Platform optimization options: gcc -march=amdfam10 -O2; icc -msse3 -O2; pgcc -tp k8-32 -O2

Although icc mainly targets Intel processors, with the right optimization options it also greatly improves performance on AMD CPUs. gcc returns to a normal level here as well. Oddly, I have not yet found any options that help the PGI compiler.

In conclusion, for numerical computation, the title of "fastest" compiler belongs to Intel.
