The Process of Compiling HPL (hpl-2.0_FERMI_v08.tar)


HPL: A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers

Installing HPL for GPU assumes the machine already has a compiler, an MPI parallel environment, and a BLAS (or VSIPL) library.

I installed BLAS and CBLAS. I do not remember whether it was strictly necessary, but I also installed LAPACK (Linear Algebra PACKage, http://www.netlib.org/lapack).

1. BLAS

Relatively simple: just run make.
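A minimal sketch of that step, assuming the Netlib reference BLAS tarball (the directory name inside the archive may differ between versions):

    wget http://www.netlib.org/blas/blas.tgz
    tar xzf blas.tgz
    cd BLAS
    make        # produces the reference BLAS archive, blas_LINUX.a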

2. CBLAS

In the Makefile there is a BLLIB path pointing to ../librefblas.a; change it to the location of the blas_LINUX.a built for BLAS in step 1.
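As a sketch, the edit looks like this (the path is illustrative; point it at wherever your blas_LINUX.a actually lives):

    # in CBLAS's Makefile.in:
    BLLIB = /home/user/BLAS/blas_LINUX.a

Then run make again.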

3. LAPACK

In make.inc, specify the BLAS location (a sketch of the edit follows the test summary). After a successful build and test run, the following result is displayed.

                    --> LAPACK testing summary <--
Processing LAPACK testing output found in the TESTING directory
SUMMARY                 nb test run    numerical error    other error
================        ===========    ===============    ===========
REAL                        1064911       39 (0.004%)      0 (0.000%)
DOUBLE PRECISION            1052315      203 (0.019%)      0 (0.000%)
COMPLEX                      508588        2 (0.000%)      0 (0.000%)
COMPLEX16                    530862       28 (0.005%)      0 (0.000%)

--> ALL PRECISIONS          3156676      272 (0.009%)      0 (0.000%)
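The make.inc edit mentioned above, as a sketch (the path is illustrative):

    # in LAPACK's make.inc (start from make.inc.example):
    BLASLIB = /home/user/BLAS/blas_LINUX.a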

4. HPL for GPU

  1. The key is to change the locations given by these two variables in the Make file, keeping the defaults for everything else:
    LAdir: the directory where the CBLAS (or VSIPL) library is located
    LAlib: the CBLAS (or VSIPL) library file to link against (headers go in LAinc)
    After compilation, the executable xhpl is generated (in bin/<arch> under the HPL top directory). My <arch> is the default CUDA_pinned.
    You can refer to the Make file templates for various platforms in the setup directory; I started from Make.Linux_PII_CBLAS. A build sketch appears after this list.

  2. There were several errors in the middle saying that libhpl.a, needed by 'dexe.grd', cannot be found. After I changed the libhpl.a path defined in Make.CUDA_pinned to an absolute path, the error disappeared.
    Original: HPLlib = $(LIBdir)/libhpl.a
    Changed to: HPLlib = /home/michaelchen/shoC/hpl-2.0_FERMI_v08/lib/CUDA_pinned/libhpl.a
  3. The MPI paths must also be correct. I use OpenMPI, and MPlib points to lib/libmpi.so.
  4. The linker in Make.CUDA_pinned is g77, which does not exist here, so I changed it to gfortran.
  5. If you hit a "Make.inc: access permission denied" error, it is usually because Make.inc links to a root-owned file; re-link it to our own Make.CUDA_pinned:
    ln -sf source/Make.CUDA_pinned Make.inc
  6. The following is the complete build file, Make.CUDA_pinned.
    (The bold highlighting from the original post is lost here; the lines I modified are HPLlib, MPdir, MPlib, LAdir, LAlib, and LINKER.)
    # ----------------------------------------------------------------------
    # - shell --------------------------------------------------------------
    # ----------------------------------------------------------------------
    #
    SHELL        = /bin/sh
    #
    CD           = cd
    CP           = cp
    LN_S         = ln -s
    MKDIR        = mkdir
    RM           = /bin/rm -f
    TOUCH        = touch
    #
    # ----------------------------------------------------------------------
    # - Platform identifier ------------------------------------------------
    # ----------------------------------------------------------------------
    #
    ARCH         = CUDA_pinned
    #
    # ----------------------------------------------------------------------
    # - HPL Directory Structure / HPL library ------------------------------
    # ----------------------------------------------------------------------
    #
    TOPdir       = /home/michaelchen/shoC/hpl-2.0_FERMI_v08
    INCdir       = $(TOPdir)/include
    BINdir       = $(TOPdir)/bin/$(ARCH)
    LIBdir       = $(TOPdir)/lib/$(ARCH)
    #
    HPLlib       = /home/michaelchen/shoC/hpl-2.0_FERMI_v08/lib/CUDA_pinned/libhpl.a
    #
    # ----------------------------------------------------------------------
    # - Message Passing library (MPI) --------------------------------------
    # ----------------------------------------------------------------------
    # MPinc tells the C compiler where to find the Message Passing library
    # header files, MPlib is defined to be the name of the library to be
    # used. The variable MPdir is only used for defining MPinc and MPlib.
    #
    MPdir        = /opt/openmpi-1.4.3
    MPinc        = -I$(MPdir)/include
    MPlib        = $(MPdir)/lib/libmpi.so
    #
    # ----------------------------------------------------------------------
    # - Linear Algebra library (BLAS or VSIPL) -----------------------------
    # ----------------------------------------------------------------------
    # LAinc tells the C compiler where to find the Linear Algebra library
    # header files, LAlib is defined to be the name of the library to be
    # used. The variable LAdir is only used for defining LAinc and LAlib.
    #
    LAdir        = $(HOME)/shoC/cblas
    LAinc        =
    LAlib        = $(LAdir)/lib/cblas_LINUX.a
    #
    # ----------------------------------------------------------------------
    # - F77 / C interface --------------------------------------------------
    # ----------------------------------------------------------------------
    # You can skip this section if and only if you are not planning to use
    # a BLAS library featuring a Fortran 77 interface. Otherwise, it is
    # necessary to fill out the F2CDEFS variable with the appropriate
    # options. **One and only one** option should be chosen in **each** of
    # the 3 following categories:
    #
    # 1) name space (How C calls a Fortran 77 routine)
    #
    # -DAdd_              : all lower case and a suffixed underscore (Suns,
    #                       Intel, ...),                           [default]
    # -DNoChange          : all lower case (IBM RS6000),
    # -DUpCase            : all upper case (Cray),
    # -DAdd__             : the FORTRAN compiler in use is f2c.
    #
    # 2) C and Fortran 77 integer mapping
    #
    # -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
    # -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
    # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
    #
    # 3) Fortran 77 string handling
    #
    # -DStringSunStyle    : the string address is passed at the string loca-
    #                       tion on the stack, and the string length is then
    #                       passed as an F77_INTEGER after all explicit
    #                       stack arguments,                       [default]
    # -DStringStructPtr   : the address of a structure is passed by a
    #                       Fortran 77 string, and the structure is of the
    #                       form: struct {char *cp; F77_INTEGER len;},
    # -DStringStructVal   : a structure is passed by value for each Fortran
    #                       77 string, and the structure is of the form:
    #                       struct {char *cp; F77_INTEGER len;},
    # -DStringCrayStyle   : special option for Cray machines, which uses
    #                       Cray fcd (fortran character descriptor) for
    #                       interoperation.
    #
    F2CDEFS      =
    #
    # ----------------------------------------------------------------------
    # - HPL includes / libraries / specifics -------------------------------
    # ----------------------------------------------------------------------
    #
    HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
    HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
    #
    # - Compile time options -----------------------------------------------
    #
    # -DHPL_COPY_L           force the copy of the panel L before bcast;
    # -DHPL_CALL_CBLAS       call the cblas interface;
    # -DHPL_CALL_VSIPL       call the vsip library;
    # -DHPL_DETAILED_TIMING  enable detailed timers;
    #
    # By default HPL will:
    #    *) not copy L before broadcast,
    #    *) call the BLAS Fortran 77 interface,
    #    *) not display detailed timing information.
    #
    HPL_OPTS     = -DHPL_CALL_CBLAS
    #
    # ----------------------------------------------------------------------
    #
    HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
    #
    # ----------------------------------------------------------------------
    # - Compilers / linkers - Optimization flags ---------------------------
    # ----------------------------------------------------------------------
    #
    CC           = /usr/bin/gcc
    CCNOOPT      = $(HPL_DEFS)
    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
    #
    # On some platforms, it is necessary to use the Fortran linker to find
    # the Fortran internals used in the BLAS library.
    #
    LINKER       = /usr/bin/gfortran
    LINKFLAGS    = $(CCFLAGS)
    #
    ARCHIVER     = ar
    ARFLAGS      = r
    RANLIB       = echo
    #
    # ----------------------------------------------------------------------

  7. Several times at runtime, certain .so and .a files could not be found. You can add their directories to the library search path in a shell profile (remember to source it so the change takes effect), or directly copy/link the needed file into a directory that is searched.
    For example, I use CUDA 4.0, but HPL for CUDA was built against CUDA 3.0. So when it looks for libcublas.so.3 and friends in the /usr/local/cuda/lib64 directory, it complains that they cannot be found; only the .so.4 versions exist.
    The solution above handles this (see the sketch after this list).
  8. An error may also occur at runtime:
    hpl-2.0_FERMI_v08/bin/CUDA_pinned/xhpl: error while loading shared libraries: libcublas.so.3: wrong ELF class: ELFCLASS32
    This means the wrong CUDA lib directory is being used: take the .so files from /usr/local/cuda/lib64, not /usr/local/cuda/lib.
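As promised in step 1, a minimal build sketch (paths follow the layout used throughout this post; adjust them to your machine):

    cd /home/michaelchen/shoC/hpl-2.0_FERMI_v08
    # start from a setup/ template and edit it into the Make.CUDA_pinned above
    cp setup/Make.Linux_PII_CBLAS Make.CUDA_pinned
    make arch=CUDA_pinned        # produces bin/CUDA_pinned/xhpl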
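And the runtime fixes from steps 7 and 8, as a sketch (assuming CUDA 4.0 installed under /usr/local/cuda):

    # confirm which ELF class each copy of the library is:
    file /usr/local/cuda/lib/libcublas.so.4      # 32-bit: wrong for a 64-bit xhpl
    file /usr/local/cuda/lib64/libcublas.so.4    # 64-bit: the one to use
    # make the 64-bit directory visible to the loader
    # (put this in a profile and source it so it takes effect):
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    # or link the missing soname to the version that is present:
    ln -s /usr/local/cuda/lib64/libcublas.so.4 /usr/local/cuda/lib64/libcublas.so.3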

mpirun -np 1 xhpl then runs successfully.

If a problem such as "error allocating scratch space 2048.00 MB" occurs, find the scratch-space allocation in the corresponding source file under src/cuda (for example, around line 229 of cuda_dgemm.c) and reduce the requested size (for example, from 2048 MB to 1024 MB):
Eg: err1 = cudaMalloc((void **)&dev_scratch[0], (size_t)(1024.0 * 1024.0 * 1024.0));

For details about how to fine-tune the values in HPL.dat, refer to the TUNING file in the HPL root directory.
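For orientation, the entries in HPL.dat that matter most look roughly like this (the values are placeholders, not recommendations; TUNING explains how to choose them):

    25000        Ns     (problem size N, sized to the available memory)
    768          NBs    (block size NB)
    1            Ps     (P x Q process grid; P*Q must equal the MPI rank count)
    1            Qs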

Ref: http://blog.sina.com.cn/s/blog_442806280100mxbu.html
