The first is the compiler. Everyone knows this.
More important is Intel's Math Kernel Library (MKL).
In addition, Intel has a set of cluster tools, including cluster MKL, ITC, ITA, and Intel MPI.
As we all know, high-performance computing is mostly used for engineering computation. At present, engineering codes mainly rely on the following: FFT (fast Fourier transform), LAPACK (Linear Algebra PACKage), BLAS (Basic Linear Algebra Subprograms), and ScaLAPACK (Scalable LAPACK, aimed mainly at distributed-memory architectures, that is, a parallel LAPACK for clusters). All of these have free open-source implementations, such as FFTW (the FFT library from MIT), LAPACK, and ScaLAPACK. ScaLAPACK in turn needs BLACS (Basic Linear Algebra Communication Subprograms), and BLAS also has the ATLAS and GotoBLAS implementations. I had never come across BLACS before. Here is a figure provided by Intel that is very intuitive and clear (from the figure we can see that BLACS is a linear algebra communication library built on MPI):
Appendix 1
Intel's MKL and Cluster MKL provide all of the above, and their performance is much better than that of the open-source implementations! MKL provides BLAS, LAPACK, and FFT. Cluster MKL builds on this and also provides ScaLAPACK and an optimized LINPACK executable, xhpl. Therefore, we recommend Intel MKL to cluster users. We should promote Cluster MKL! I did not know about it before!
For example, LINPACK's xhpl requires BLAS, GROMACS requires FFTW, and the parallel version of VASP requires FFT, ScaLAPACK, and BLAS. With Intel's Cluster MKL you get all of these directly. You don't need to compile the open-source libraries yourself, and the performance is better!
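To make the BLAS dependency concrete: xhpl is generally understood to spend most of its run time in the BLAS matrix-multiply routine (GEMM), which computes C = alpha*A*B + beta*C. The pure-Python sketch below only illustrates that operation. The function name `gemm_reference` is made up for this post, it leaves out the transpose options of the real routine, and it is nowhere near as fast as MKL, ATLAS, or GotoBLAS.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    Illustrative only: `gemm_reference` is a hypothetical name, not a BLAS binding.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Every row of `a` must have as many entries as `b` has rows.
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of `a` with column j of `b`.
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * dot + beta * c[i][j])
        result.append(row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
product = gemm_reference(1.0, a, b, 0.0, c)  # plain matrix product, beta = 0
```

An optimized BLAS performs the same arithmetic but blocks the loops for the cache and uses vector instructions, which is where the performance gap between implementations comes from.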
However, there is one small problem here: FFT. There are many fast Fourier transform libraries, and FFTW, the one from MIT, is the most widely used. Many codes are written against it. Intel built its own improved implementation on this basis, but Intel's FFT API is not exactly the same as FFTW's, which causes some difficulties when migrating applications. For example, an application written for FFTW may need some modifications before it can move to Intel's MKL or Cluster MKL, because the two APIs differ. Here is an official Intel document with some guidance on this:
Okay. From the above we can see that Intel's MKL is a good solution for these engineering math libraries, with FFT as the one caveat. :)
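The FFT API mismatch can be pictured roughly like this. FFTW is organized around plans (create a plan, execute it, destroy it), while MKL's own FFT interface is organized around descriptors (create a descriptor, commit it, compute, free it). A thin adapter can hide the difference, and as far as I know MKL's FFTW interface wrappers follow the same idea. The sketch below is plain Python with made-up class names (`DescriptorFFT`, `PlanFFT`). It calls neither library, and it uses a slow textbook DFT in place of a real FFT.

```python
import cmath


class DescriptorFFT:
    """Hypothetical descriptor-style backend: create, commit, then compute."""

    def __init__(self, length):
        self.length = length
        self.twiddles = None  # filled in by commit()

    def commit(self):
        # Precompute exp(-2*pi*i*k/n) once, as a commit step might.
        n = self.length
        self.twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

    def compute_forward(self, data):
        if self.twiddles is None:
            raise RuntimeError("descriptor must be committed first")
        if len(data) != self.length:
            raise ValueError("input length does not match descriptor")
        n = self.length
        # Textbook O(n^2) DFT, standing in for a real FFT kernel.
        return [sum(data[t] * self.twiddles[(k * t) % n] for t in range(n))
                for k in range(n)]


class PlanFFT:
    """Hypothetical plan-style front end that forwards to the descriptor."""

    def __init__(self, length):
        self._descriptor = DescriptorFFT(length)
        self._descriptor.commit()  # "planning" maps onto create + commit

    def execute(self, data):
        return self._descriptor.compute_forward(data)


plan = PlanFFT(4)
spectrum = plan.execute([1, 1, 1, 1])  # constant signal: all energy in bin 0
```

Code written against the plan-style front end never sees the descriptor calls, which is why a wrapper layer can save most of the porting work when the two interfaces differ mainly in shape and not in the math.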
Finally, let's walk briefly through Intel's whole software product line:
1. Intel compilers-nothing more to say
2. VTune-this is a very powerful performance tuning tool. First, it does not need the source code of the program being tuned; an executable file is enough. Second, it can analyze the binary and report the number of clock cycles spent on each instruction of the object code. While the program runs, it can also show the hit rates of the L1 cache, L2 cache, and so on! So it is very powerful. However, VTune seems able to tune only programs on SMP machines, not programs running across a cluster.
3. Intel Integrated Performance Primitives (IPP)-this seems to be used for video/audio encoding and decoding, and has little to do with HPC
4. Intel Math Kernel Library and Intel Cluster Math Kernel Library-as described above
5. Intel thread tools-thread debugging tools, which have little to do with HPC
6. Intel Cluster Toolkit-in fact, this is a bundle that combines Intel MPI, ITC, ITA, and Intel Cluster MKL! Haha, you don't have to buy the pieces separately, and the bundle is cheaper, hoho
7. Intel Trace Analyzer and Intel Trace Collector (ITA and ITC)-tuning tools for cluster programs. First, the ITC library has to be linked in when the program is linked. After the program runs, ITC generates a pile of data files, and you can then view the analysis results with ITA. It can analyze the proportion of computation to communication in the program and so tell us where the bottleneck is. In other words, as long as we have the program's .o files or source files, we can analyze it (having only the executable won't work). Advanced features can analyze a specific piece of code, but this requires some changes to the code.
8. Intel MPI-an MPICH-based MPI whose main feature is compatibility with InfiniBand and Myrinet. In addition, other components can be integrated into Intel MPI through an Intel development language. My overall impression is that Intel MPI aims less at raw performance than at integrating multiple interconnects and other modules behind a unified interface.
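As a rough picture of the computation-versus-communication analysis described in item 7, the sketch below computes, for each MPI rank, the share of traced time spent in communication. The record layout, the function name `communication_share`, and the numbers are all invented for illustration. ITC's real trace files and ITA's real reports look different.

```python
from collections import defaultdict


def communication_share(events):
    """Return {rank: fraction of traced time spent in 'mpi' events}.

    `events` is a hypothetical list of (rank, kind, start_s, end_s) tuples,
    not the format that ITC actually writes.
    """
    total = defaultdict(float)
    comm = defaultdict(float)
    for rank, kind, start, end in events:
        duration = end - start
        total[rank] += duration
        if kind == "mpi":
            comm[rank] += duration
    return {rank: comm[rank] / total[rank] for rank in total if total[rank] > 0}


# Made-up trace: rank 1 finishes its work early and then waits in MPI.
events = [
    (0, "compute", 0.0, 3.0),
    (0, "mpi", 3.0, 4.0),
    (1, "compute", 0.0, 1.0),
    (1, "mpi", 1.0, 4.0),
]
shares = communication_share(events)
```

In this made-up trace, rank 1 spends three quarters of its time in MPI, which is the kind of load imbalance a trace tool is meant to expose.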