Comparison of mathematical computing performance of C#/C++/Fortran in 32-bit/64-bit

Last Update: 2018-12-05 Source: Internet

Author: User


Test Platform

In my previous blog post, I compared the performance of C# and C++ on compute-intensive programs in VS2010. Link to the previous post:

http://www.cnblogs.com/ytyt2002ytyt/archive/2011/11/24/2261104.html

Those results were measured on an AMD Phenom X4 9650 CPU (4 cores).

With the release of VS2012 and Intel Parallel Studio XE 2013, this post tests how much the VC11 compiler improves on VC10, and how .NET 4.5, .NET 4.0, and C++ differ in performance. Fortran is compiled with the latest Intel Parallel Studio XE 2013.

Fortran is a long-established scientific computing language, so this test also focuses on how its performance differs from the mainstream modern languages C++ and C#. As the earliest compiled high-level language, Fortran is very convenient for matrix operations and has held the performance crown for decades. Fortran 90/95 and Fortran 2003/2008 added many modern language features, and parallel support was built in as early as 20 years ago.

Test Platform:

CPU: Intel Xeon E3 1230v2, 3.5 GHz, 4 cores / 8 threads

OS: Windows 7 64-bit

Compiler:

C++: VC11 (VS2012)

Fortran: Intel Parallel Studio XE 2013

C#: .NET 4.0 and .NET 4.5

Test code

To keep the comparison fair, the following tests use a single thread, with no parallelism and no matrix operations. Everything is compiled with default settings.

The C# and C++ code is the same as in the previous test program.

C++ code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
// provides cin and cout
#include <iostream>
using namespace std;

#define INTEG_FUNC(x) fabs(sin(x)) // integrand

double dclock(void);

int main(void)
{
    unsigned int i, j, N;
    double step, x_i, sum;
    double start, finish, duration;
    double interval_begin = 0.0;
    double interval_end = 2.0 * 3.141592653589793238;

    start = clock(); // start time
    printf("\n");
    printf(" Number of | Computed Integral |\n");
    // printf(" Interior Points | |\n");
    for (j = 2; j < 27; j++)
    {
        N = 1 << j; // N = 2^j subintervals
        step = (interval_end - interval_begin) / N;
        sum = INTEG_FUNC(interval_begin) * step / 2.0;
        for (i = 1; i < N; i++)
        {
            x_i = i * step;
            sum += INTEG_FUNC(x_i) * step;
        }
        sum += INTEG_FUNC(interval_end) * step / 2.0;
        // printf(" %10d | %14e |\n", N, sum);
        printf("%14e\n", sum);
    }
    finish = clock(); // end time
    duration = (finish - start); // in clock ticks
    printf("\n");
    printf("time = %10e\n", duration);
    printf("\n");

    int tempA;
    cin >> tempA; // keep the console window open
    return 0;
}

C# code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            int time = System.Environment.TickCount; // start the timer (milliseconds)
            #region
            int i, j, N;
            double step, x_i, sum;
            double interval_begin = 0.0;
            double interval_end = 2.0 * 3.141592653589793238;
            for (j = 2; j < 27; j++)
            {
                N = 1 << j; // N = 2^j subintervals
                step = (interval_end - interval_begin) / N;
                sum = Math.Abs(Math.Sin(interval_begin)) * step / 2.0;
                for (i = 1; i < N; i++)
                {
                    x_i = i * step;
                    sum += Math.Abs(Math.Sin(x_i)) * step;
                }
                sum += Math.Abs(Math.Sin(interval_end)) * step / 2.0;
                Console.Write(sum.ToString() + "\r\n");
            }
            Console.Write((System.Environment.TickCount - time).ToString());
            Console.ReadLine(); // keep the console window open
            #endregion
        }
    }
}

Fortran code:

program ForAllProgram
    implicit none
    real(8) :: time1, time2
    integer :: i, j, k, N
    real(8) :: step, x_i, s
    real(8) :: interval_begin = 0.0
    ! as written, the literal on the next line is default (single) precision
    real(8) :: interval_end = 2.0*3.141592653589793238
    real, allocatable :: ArrySum(:)
    !
    call CPU_TIME(time1)
    do j = 2, 26
        N = 2**j  ! N = 1 << j; exponentiation replaces the bit shift
        step = (interval_end - interval_begin)/N
        s = Abs(Sin(interval_begin))*step/2.0
        do i = 1, N - 1  ! the C++ condition i < N becomes N-1 here
            x_i = i*step
            s = s + Abs(Sin(x_i))*step
        end do
        s = s + Abs(Sin(interval_end))*step/2.0
        print *, s
    end do
    call CPU_TIME(time2)
    print *, time2 - time1
end program

Note that in the Fortran version the exponentiation operator (2**j) replaces the bit shift (1 << j), and the do loop running up to N-1 corresponds to the i < N condition in C++.
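For reference, all three programs evaluate the same quantity: a composite trapezoidal approximation of the integral of |sin x| over [0, 2π], repeated for N = 2^j subintervals with j = 2, ..., 26. The symbols T_N, h, a, and b below are notation introduced here for illustration and are not names from the code; h corresponds to step, and x_i to the x_i variable.

```latex
T_N = h\left[\tfrac{1}{2}f(a) + \sum_{i=1}^{N-1} f(x_i) + \tfrac{1}{2}f(b)\right],
\qquad f(x) = \lvert\sin x\rvert,\quad a = 0,\quad b = 2\pi,\quad
h = \frac{b-a}{N},\quad x_i = a + i\,h

\int_0^{2\pi} \lvert\sin x\rvert \, dx = 4,
\qquad
\sum_{j=2}^{26}\left(2^{j} + 1\right) = 134\,217\,749
```

The first line is the sum each inner loop accumulates. The second line gives the exact value that every printed result should approach as N grows, and the total number of integrand evaluations over the whole run, which is why the benchmark is dominated by calls to sin.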

Test Results

[Result charts: time in milliseconds; smaller is better.]
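The three listings time themselves with different clocks: clock() in C++ (clock ticks), Environment.TickCount in C# (milliseconds), and CPU_TIME in Fortran (seconds), so the raw printed numbers are not all in the same unit. As a minimal sketch of one way to report wall time uniformly in milliseconds on the C++ side, the following uses std::chrono::steady_clock. The helper names elapsed_ms and trapezoid_abs_sin are hypothetical, C++11 is assumed, and this is not the timing method behind the reported results.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: composite trapezoidal rule for |sin x| on [0, 2*pi]
// with n subintervals, mirroring the inner loop of the test program.
double trapezoid_abs_sin(unsigned int n) {
    const double a = 0.0;
    const double b = 2.0 * 3.141592653589793238;
    const double step = (b - a) / n;
    double sum = std::fabs(std::sin(a)) * step / 2.0;
    for (unsigned int i = 1; i < n; ++i) {
        sum += std::fabs(std::sin(i * step)) * step;
    }
    sum += std::fabs(std::sin(b)) * step / 2.0;
    return sum;
}

// Hypothetical helper: runs f() once and returns the elapsed wall time
// in milliseconds as a double.
template <typename F>
double elapsed_ms(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(finish - start).count();
}
```

A call such as `elapsed_ms([&] { result = trapezoid_abs_sin(1u << 20); })` returns the time for one value of N. Storing the result in a variable that is used afterwards keeps the compiler from discarding the computation.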

Test conclusion

C#: comparing .NET 4.5 with .NET 4.0, performance improves only slightly, and only in the 32-bit build under .NET 4.5. Oddly, under .NET 4.5 the 32-bit build is faster than the 64-bit build.

C++: VS2012 is significantly faster than VS2010. Microsoft's C++/CX performance may now be close to that of Intel C++. The 64-bit build is significantly faster than the 32-bit build.

Fortran: on compute-intensive problems, Fortran's performance is formidable, exceeding even what I had expected. Without any tuning, it is more than 3 times faster than C++ and 5-6 times faster than C#. The leader in numerical computation is none other than Fortran. This may be because SIMD vectorization (the AVX instruction set on this machine) is fully exploited by default. In C++, even with Intel's automatic vectorization enabled (Intel enables it by default), the more complex syntax makes full automatic vectorization hard to achieve. You need to add vectorization directives such as #pragma simd, or even hand-code vector instructions (as in the optimized implementations in OpenCV). This greatly increases the workload and complexity of program optimization.
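To illustrate the kind of vectorization directive mentioned above, here is a minimal sketch of the benchmark loop with such a hint. It uses the standard OpenMP form `#pragma omp simd` rather than the Intel-specific `#pragma simd`, and the function name trapezoid_abs_sin_simd is hypothetical. Whether the loop actually vectorizes depends on the compiler, on OpenMP support being enabled, and on a vectorized sin being available; a compiler that does not recognize the pragma ignores it and runs the loop as scalar code.

```cpp
#include <cmath>

// Hypothetical name: composite trapezoidal rule for |sin x| on [0, 2*pi]
// with n subintervals, with a vectorization hint on the interior loop.
double trapezoid_abs_sin_simd(int n) {
    const double a = 0.0;
    const double b = 2.0 * 3.141592653589793238;
    const double step = (b - a) / n;
    double interior = 0.0;

    // Ask the compiler to vectorize the loop and to treat `interior`
    // as a sum reduction. This is a request, not a guarantee.
    #pragma omp simd reduction(+:interior)
    for (int i = 1; i < n; ++i) {
        interior += std::fabs(std::sin(a + i * step));
    }

    // Endpoints carry half weight in the trapezoidal rule.
    const double ends = (std::fabs(std::sin(a)) + std::fabs(std::sin(b))) / 2.0;
    return (interior + ends) * step;
}
```

Because a reduction lets the compiler reorder the additions, the last few digits of the result may differ from the strictly sequential loop in the test program.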

For large-scale scientific computing, then, Fortran is still the most suitable choice. In addition, many existing mathematical libraries are written in Fortran, and its syntax is relatively simple. It is a very good match for numerical computing.

C++ has inherent advantages when interacting with the underlying system. C# is well suited to presentation-layer development and overall architecture design, where it is the most convenient and elegant.

Outlook

The next article will continue by testing CPU parallelism and GPU acceleration. Based on past experience, a GTX460 graphics card can reach 10-20 times the performance of a single CPU thread in optimized single-precision (float) computation. However, once multi-core CPU parallelism and Fortran are taken into account, the GPU's final advantage will probably be much smaller, perhaps only 2-3 times.

For double-precision computation, desktop graphics cards deliver only 1/8 of their single-precision throughput. Tesla compute cards deliver 1/2 but are expensive; the latest Kepler GK110 architecture cards, the Tesla K20 and Titan, deliver 1/3, with theoretical double-precision throughput above 1 TFLOPS. I therefore estimate that double precision on a Tesla card can reach only 2-3 times the performance of 8-thread CPU parallel code, and Kepler may do better. This is only speculation, though, and the answer will not be known until the next test.

Address: Yang Tao's learning memorandum http://www.cnblogs.com/ytyt2002ytyt/archive/2013/04/02/2996718.html.

