C++ Compiler Performance Comparison

Last Update: 2013-12-08 Source: Internet

Author: User



The mainstream C/C++ compilers currently on the market include Microsoft's cl, gcc, Intel's icl, PGI's pgcc, and CodeGear's bcc (originally from Borland). cl is the most widely used on Windows, while gcc is the first choice across a broader range of platforms. When it comes to optimization capability, however, the ranking does not necessarily match market share.

Today we compared the numerical performance of these compilers. The test code is a numerical integration program. It comes from the sample programs shipped with the Intel compiler, with one header file modified so that every compiler can build it.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

// Function to be integrated
// Define and prototype it here
// | sin(x) |
#define INTEG_FUNC(x) fabs(sin(x))

// Prototype timing function
double dclock(void);

int main(void)
{
    // Loop counters and number of interior points
    unsigned int i, j, N;
    // Stepsize, independent variable x, and accumulated sum
    double step, x_i, sum;
    // Timing variables for evaluation
    double start, finish, duration;
    // Start integral from
    double interval_begin = 0.0;
    // Complete integral at
    double interval_end = 2.0 * 3.141592653589793238;

    // Start timing for the entire application
    start = clock();

    printf("\n");
    printf("Number of | Computed Integral |\n");
    printf("Interior Points |\n");

    for (j = 2; j < 27; j++)
    {
        printf("-------------------------------------\n");

        // Compute the number of (internal rectangles + 1)
        N = 1 << j;

        // Compute stepsize for N-1 internal rectangles
        step = (interval_end - interval_begin) / N;

        // Approx. 1/2 area in first rectangle: f(x0) * [step/2]
        sum = INTEG_FUNC(interval_begin) * step / 2.0;

        // Apply midpoint rule:
        // Given length = f(x), compute the area of
        // rectangle of width step
        // Sum areas of internal rectangle: f(xi + step) * step
        for (i = 1; i < N; i++)
        {
            x_i = i * step;
            sum += INTEG_FUNC(x_i) * step;
        }

        // Approx. 1/2 area in last rectangle: f(xN) * [step/2]
        sum += INTEG_FUNC(interval_end) * step / 2.0;

        // N is unsigned, so print it with %u
        printf("%10u | %14e |\n", N, sum);
    }

    finish = clock();
    duration = (finish - start);

    printf("\n");
    printf("Application Clocks = %10e\n", duration);
    printf("\n");

    return 0;
}

Because this code comes from Intel, it naturally suits the Intel compiler well. The following tests were run on an Intel Core 2 Duo.

Compilers tested: gcc 4.3.0 (GCC TDM-2 for MinGW), VC 9.0 (cl 15.00.21022.08), Intel (icl 10.1), PGI (pgcc 7.16), CodeGear (bcc32 6.10). The figures are the "Application Clocks" values printed by the program; lower is better.

Optimization disabled (options as listed: -O0 /Od -O0 -Od)

gcc      cl       icl      pgcc     bcc32
17161    14461    12441    10514    13400
17133    14430    11687    9956     12917
17155    14476    11871    10099    13026

Compile option -O2

gcc      cl       icl      pgcc     bcc32
13011    7737     4540     9348     12636
16571    7706     4185     9148     13026
16573    7706     4042     9183     13057

Platform optimization

gcc                cl                icl      pgcc             bcc32
-march=core2 -O2   /arch:SSE2 /O2    -QxT     -tp core2 -O2    none
16060              7710              1938     9578

The results show that the Intel compiler is very strong at numerical computation, and that optimizing for a specific CPU improves performance considerably further. GCC is somewhat disappointing. Comparing the unoptimized builds with the -O2 builds, the Intel and Microsoft compilers gain a great deal from optimization, while the other compilers improve only slightly. Ranked by performance: icl > cl > pgcc > bcc > gcc.

In addition, on a P4 1.5 GHz server running Linux, the results are as follows:

Options               gcc                   icc        pgCC
-O2                   24920000              10840000   22270000
-O0                   28290000              19210000   24320000
Platform options      -march=pentium4 -O2   -xN        -tp piv -O2
                      24990000              6640000    22150000

Again, Intel is the fastest and gcc the slowest.

We also tested an Athlon X2 4800+ on Linux, with the following results.

Options               gcc                   icc          pgcc
-O0                   9390000               14950000     9950000
-O2                   8910000               9240000      9400000
Platform options      -march=amdfam10 -O2   -msse3 -O2   -tp k8-32 -O2
                      8800000               3800000      9030000

Although icc mainly targets Intel processors, with the right optimization options it can also greatly improve performance on AMD CPUs. gcc also returns to a normal level here. Oddly, I have not yet found any options that work well for the PGI compiler.

In conclusion, for numerical computation, the "fastest" choice is the Intel compiler.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
