Performance of Intel Compiler

Source: Internet
Author: User

I had not touched this for a long time, and today I suddenly wanted to try it. The code is not finished; I will fill in the rest later.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <math.h>
    #include <Windows.h>

    #define M 1024

    float MatA[M][M];
    float MatB[M][M];
    float MatC[M][M];

    /* Fill an M x M matrix (viewed as a flat array) with random values. */
    void InitMatrix(float *matrixX)
    {
        register int i;
        for (i = 0; i < M * M; i++)
        {
            /* The modulus and divisor were lost in the source;
               100 and 10 are placeholders. */
            *matrixX++ = (float)(rand() % 100) / 10;
        }
    }

    /* matrixC = matrixA * matrixB, all M x M, row-major. */
    void MulMatrix(float *matrixA, float *matrixB, float *matrixC)
    {
        register int i, j, k;
        register float *p, *q, f;
        for (j = 0; j < M; j++)
        {
            for (i = 0; i < M; i++)
            {
                p = matrixA + j * M;   /* start of row j of A */
                q = matrixB + i;       /* top of column i of B */
                f = 0;
                for (k = 0; k < M; k++)
                {
                    f += *p * *q;
                    p++;
                    q += M;            /* next row of B, same column */
                }

                matrixC[j * M + i] = f;
            }
        }
    }

    int main()
    {
        DWORD t;
        //register int i, j;

        srand((unsigned int)time(NULL));

        InitMatrix((float *)MatA);
        InitMatrix((float *)MatB);

        t = ::GetTickCount();
        MulMatrix((float *)MatA, (float *)MatB, (float *)MatC);
        t = ::GetTickCount() - t;

        /*for (j = 0; j < M; j++)
        {
            for (i = 0; i < M; i++)
            {
                printf("%.2f ", MatC[j][i]);
            }
            printf("\n");
        }*/

        printf("time:%lu\n", (unsigned long)t);

        return 0;
    }

Machine configuration: E3-1231 v3, 16 GB memory, VS2010 SP1, ICC 2015 XE, GTX 660. In the future CUDA will be added to the comparison.

1. CPU, single thread, with only O2 optimization

Generally about 4750 ms.

Multithreading was measured earlier, but the threaded code is not included this time. With 4 threads on the physical cores it should be about 6 seconds or so. With Hyper-Threading I estimate it would be better, probably about 5 seconds.
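The threaded code is not shown in the post, so the following is only a rough sketch of one common way to parallelize the outer loop. It assumes OpenMP, which may not be what was used for the earlier measurement. The function name mul_matrix_parallel and the size parameter n are hypothetical, introduced so the sketch can be checked on small inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threaded variant of MulMatrix: c = a * b for n x n
   row-major matrices. Assumes OpenMP; without OpenMP enabled the
   pragma is ignored and the loop simply runs serially. */
void mul_matrix_parallel(const float *a, const float *b, float *c, int n)
{
    int j;
    assert(a && b && c && n > 0);

    /* Each thread gets a set of rows of c, so no two threads write
       the same element. */
#pragma omp parallel for
    for (j = 0; j < n; j++) {
        int i, k;
        for (i = 0; i < n; i++) {
            const float *p = a + (size_t)j * n;  /* row j of a */
            const float *q = b + i;              /* column i of b */
            float f = 0.0f;
            for (k = 0; k < n; k++) {
                f += *p * *q;
                p++;
                q += n;
            }
            c[(size_t)j * n + i] = f;
        }
    }
}
```

To enable the pragma, the usual switches are /openmp for the Visual C++ compiler, /Qopenmp for the Intel compiler on Windows, and -fopenmp for GCC. Called as mul_matrix_parallel((float *)MatA, (float *)MatB, (float *)MatC, M), it should produce the same product as the serial version.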

2. The same single file compiled with ICC, with the additional optimizations /Qipo and /Qparallel

About 2600 ms.

Multithreading is still not measured here; it will be added later.

3. CUDA: not tested.

4. MKL: not measured. That is a bit unfair to this CPU. This post was written on a whim; I must fill this in later.

5. What is funnier: on a whim I changed the code that accumulates into matrixC so that it uses a local variable, just to see whether it had any effect. It really did: about 100 ms less on average.

It seems that what the experts teach about cache hits is quite reasonable.

The code above is the changed version. Before the change it was:

    void MulMatrix(float *matrixA, float *matrixB, float *matrixC)
    {
        register int i, j, k, t;
        register float *p, *q;
        for (j = 0; j < M; j++)
        {
            for (i = 0; i < M; i++)
            {
                p = matrixA + j * M;
                q = matrixB + i;
                t = j * M + i;
                matrixC[t] = 0;
                for (k = 0; k < M; k++)
                {
                    matrixC[t] += *p * *q;
                    p++;
                    q += M;
                }
            }
        }
    }
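Besides caching, one possible contributor to the difference, stated here as an assumption and not something measured in this post: matrixC, p and q are all float pointers, so the compiler generally cannot prove they do not overlap. It may therefore have to write matrixC[t] back to memory on every inner iteration, while a local accumulator can stay in a register. The sketch below puts the two variants side by side on a small size so it is easy to confirm they compute the same result. The names mul_store_each_step and mul_local_sum and the parameter n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: accumulate directly into c[t], as in the earlier code. */
void mul_store_each_step(const float *a, const float *b, float *c, int n)
{
    int i, j, k;
    assert(a && b && c && n > 0);
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            const float *p = a + (size_t)j * n;
            const float *q = b + i;
            size_t t = (size_t)j * n + i;
            c[t] = 0.0f;
            for (k = 0; k < n; k++) {
                c[t] += *p * *q;   /* read-modify-write through a pointer */
                p++;
                q += n;
            }
        }
    }
}

/* Variant 2: accumulate in a local and store once, as in the changed code. */
void mul_local_sum(const float *a, const float *b, float *c, int n)
{
    int i, j, k;
    assert(a && b && c && n > 0);
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            const float *p = a + (size_t)j * n;
            const float *q = b + i;
            float f = 0.0f;        /* can be kept in a register */
            for (k = 0; k < n; k++) {
                f += *p * *q;
                p++;
                q += n;
            }
            c[(size_t)j * n + i] = f;   /* single store per element */
        }
    }
}
```

Timing both variants with the same compiler flags, and inspecting the generated assembly for the inner loop, would show whether the store is really what costs the extra 100 ms on a given compiler.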

6. Even funnier: in the statement q += M; I changed M to 100 ... and the time dropped to 1/10 of the original.

Is that the cache as well?
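A likely explanation, offered as reasoning and not as a measurement: with q += M and M = 1024, each step of the inner loop jumps 1024 * sizeof(float) = 4096 bytes, so every load from matrixB lands on a different cache line and one pass down a column touches 4 MB of address range. With a step of 100 floats the jump is 400 bytes and the pass covers about 400 KB, which is much friendlier to the cache. Note that with a step of 100 the loop no longer walks a column of the matrix, so the result is not a matrix product any more. A common way to get stride-1 access while keeping the result correct is to reorder the loops so the innermost loop walks along a row of B and a row of C. The sketch below shows that reordering; the name mul_matrix_ikj and the parameter n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop-reordered multiply: c = a * b for n x n row-major
   matrices. The innermost loop reads b and updates c with stride 1. */
void mul_matrix_ikj(const float *a, const float *b, float *c, int n)
{
    int i, j, k;
    size_t idx;
    assert(a && b && c && n > 0);

    /* c is built up by accumulation, so it must start at zero. */
    for (idx = 0; idx < (size_t)n * n; idx++) {
        c[idx] = 0.0f;
    }

    for (j = 0; j < n; j++) {                      /* row of a and c */
        float *crow = c + (size_t)j * n;
        for (k = 0; k < n; k++) {                  /* row of b */
            const float ajk = a[(size_t)j * n + k];
            const float *brow = b + (size_t)k * n;
            for (i = 0; i < n; i++) {
                crow[i] += ajk * brow[i];          /* unit stride on both */
            }
        }
    }
}
```

Because the additions happen in the same order for each element (k from 0 to n - 1), this version should give the same values as the original loop order; only the memory access pattern changes.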
