First, the naming rules:
There are three levels of blas operations,
-
Level 1
-
Vector operations, e.g.
Y =/alpha x + yVector operations
-
Level 2
-
Matrix-vector operations, e.g.
Y =/Alpha a x +/beta yMatrix and vector operations
-
Level 3
-
Matrix-matrix operations, e.g.
C =/Alpha a B + COperations on Matrices and Matrices
Each routine has a name which specifies the operation, the type of matrices involved and their precisions. Some of the most common operations and their names are given below,
-
Dot
-
Scalar Product,
X ^ t y
-
Axpy
-
Vector Sum,
/Alpha x + y
-
MV
-
Matrix-vector product,
A x
-
SV
-
Matrix-vector solve,
Inv (a) x
-
Mm
-
Matrix-matrix product,
A B
-
Sm
-
Matrix-matrix solve,
Inv (a) B
The type of matrices are,
-
Ge
-
General
-
GB
-
General band
-
Sy
-
Symmetric
-
Sb
-
Symmetric band
-
SP
-
Symmetric packed
-
He
-
Hermitian
-
HB
-
Hermitian band
-
HP
-
Hermitian packed
-
Tr
-
Triangular
-
TB
-
Triangular band
-
TP
-
Triangular packed
Each operation is defined for four precisions,
-
S
-
Single real
-
D
-
Double real
-
C
-
Single Complex
-
Z
-
Double Complex
Thus, for example, the name sgemm stands for "Single-precision general matrix-matrix multiply" and zgemm stands for "double-precision complex matrix-matrix multiply ".
Therefore, for example, the function named sgemm indicates "Single-precision normal matrix multiplication", and zgemm indicates "double-precision complex matrix multiplication ".
More, can refer to: http://www.netlib.org/blas/blasqr.pdf
The following example is displayed on the csdn Forum. You can use gotoblas2 to make a successful call.
Program Description: The following matrix. the C file calls C code, Blas Level 1 function (ddot), Blas Level 2 function (dgemv), And Blas Level 3 function (dgemm) to complete matrix calculation: yours_multiply is the c source code, it directly relies on the compiler to generate optimization code. Ddot_multiply and dgemv_multiply use gotoblas2 to call partial matrix operations. Dgemm_multiply directly calls the matrix computing function of gotoblas2.
// Simple minded matrix multiply <br/> # include <stdio. h> <br/> # include <time. h> <br/> # include <stdlib. h> <br/> # include <Common. h> // gotoblas2 <br/> # include <cblas. h> // gotoblas2 </P> <p> void print_arr (int n, char * Name, double * array); <br/> void init_arr (int n, double * A); <br/> void dgemm_multiply (double * a, double * B, double * C, int N); <br/> void dgemv_multiply (double *, double * B, double * C, int N); <B R/> void ddot_multiply (double * a, double * B, double * C, int N); <br/> void yours_multiply (double * a, double * B, double * C, int N); </P> <p> int main (INT argc, char * argv []) <br/>{< br/> clock_t start, stop; <br/> int I, j; <br/> int N; <br/> double * A; <br/> double * B; <br/> double * C; <br/> If (argc <2) <br/> {<br/> printf ("Enter matrix size N = "); <br/> // please enter small number first to ensure that t He <br/> // multiplication is correct! And then you may enter <br/> // a "Reasonably" large number say like 500 or even 1000 <br/> scanf ("% d", & N ); <br/>}< br/> else <br/> {<br/> N = atoi (argv [1]); <br/>}</P> <p> A = (double *) malloc (sizeof (double) * n); <br/> B = (double *) malloc (sizeof (double) * n); <br/> C = (double *) malloc (sizeof (double) * n ); </P> <p> init_arr (n, a); <br/> init_arr (n, B); </P> <p> Start = clock (); <br/> yours_multiply (A, B, C, N); <br/> stop = clock (); <br/> printf ("roll_your_own_multiply (). elapsed time = % G seconds/N ", <br/> (double) (stop-start)/clocks_per_sec ); <br/> // print simple test case of data to be sure multiplication is correct <br/> If (n <7) <br/>{< br/> print_arr (n, "A", a); <br/> print_arr (n, "B", B ); <br/> print_arr (n, "C", c); <br/>}< br/> free (a); <br/> free (B ); <br/> free (c); </P> <p> // ddot m Ultiply <br/> A = (double *) malloc (sizeof (double) * n); <br/> B = (double *) malloc (sizeof (double) * n); <br/> C = (double *) malloc (sizeof (double) * n); <br/> init_arr (n, ); <br/> init_arr (n, B); </P> <p> Start = clock (); <br/> ddot_multiply (A, B, C, N ); <br/> stop = clock (); </P> <p> printf ("ddot_multiply (). elapsed time = % G seconds/N ", <br/> (double) (stop-start)/clocks_per_sec); <br/> // print simpl E test case of data to be sure multiplication is correct <br/> If (n <7) <br/>{< br/> print_arr (n, "", a); <br/> print_arr (n, "B", B); <br/> print_arr (n, "C", C ); <br/>}< br/> free (a); <br/> free (B); <br/> free (C ); </P> <p> // dgemv multiply <br/> // reallcoate to force cash to be flushed <br/> A = (double *) malloc (sizeof (double) * n); <br/> B = (double *) malloc (sizeof (double) * n); <br/> C = (double *) m Alloc (sizeof (double) * n); <br/> init_arr (n, a); <br/> init_arr (n, B ); </P> <p> Start = clock (); <br/> dgemv_multiply (A, B, C, N); <br/> stop = clock (); </P> <p> printf ("dgemv_multiply (). elapsed time = % G seconds/N ", <br/> (double) (stop-start)/clocks_per_sec ); <br/> // print simple test case of data to be sure multiplication is correct <br/> If (n <7) <br/>{< br/> print_arr (n, "A", a); <br/> Print _ Arr (n, "B", B); <br/> print_arr (n, "C", c); <br/>}< br/> free (); <br/> free (B); <br/> free (C ); </P> <p> // dgemm multiply <br/> // reallocate to force cash to be flushed <br/> A = (double *) malloc (sizeof (double) * n); <br/> B = (double *) malloc (sizeof (double) * n); <br/> C = (double *) malloc (sizeof (double) * n); <br/> init_arr (n, a); <br/> init_arr (n, B ); </P> <p> Start = clock (); <br/> dgemm_multiply (a, B, c, N); <br/> stop = clock (); </P> <p> printf ("dgemm_multiply (). elapsed time = % G seconds/N ", <br/> (double) (stop-start)/clocks_per_sec ); <br/> // print simple test case of data to be sure multiplication is correct <br/> If (n <7) <br/>{< br/> print_arr (n, "A", a); <br/> print_arr (n, "B", B ); <br/> print_arr (n, "C", c); <br/>}</P> <p> free (); <br/> free (B); <br/> free (c); </P> <p> return 0; <br/>}</ P> <p> // brute force way of matrix multiply <br/> void yours_multiply (double * a, double * B, double * C, int N) <br/> {<br/> int I, j, k; <br/> for (I = 0; I <n * n; I ++) <br/> C [I] = 0; <br/> for (I = 0; I <n; I ++) <br/> {<br/> for (j = 0; j <n; j ++) <br/> {<br/> for (k = 0; k <n; k ++) <br/> {<br/> C [N * I + J] + = A [n * I + k] * B [N * k + J]; <br/>}</P> <p> // The ddot way to matrix multiply <br/> void DDO T_multiply (double * a, double * B, double * C, int N) <br/>{< br/> int I, j; <br/> int incx = 1; <br/> int incy = N; <br/> for (I = 0; I <n; I ++) <br/> {<br/> for (j = 0; j <n; j ++) <br/> {<br/> C [N * I + J] = cblas_ddot (n, & A [n * I], incx, & B [J], incy); <br/>}< br/> // dgemv way of matrix multiply <br/> void dgemv_multiply (double *, double * B, double * C, int N) <br/>{< br/> int I; <br/> double alpha = 1.0, Beta = 0 .; <br/> int incx = 1; <br/> int incy = N; <br/> for (I = 0; I <n; I ++) <br/>{< br/> cblas_dgemv (cblasrowmajor, cblasnotrans, N, N, Alpha, A, N, & B [I], n, beta, & C [I], n); </P> <p >}< br/>}</P> <p> // dgemm way. the prefered way, especially for large matrices <br/> void dgemm_multiply (double * a, double * B, double * C, int N) <br/>{< br/> int I; <br/> double alpha = 1.0, Beta = 0 .; <br/> int incx = 1; <Br/> int incy = N; <br/> cblas_dgemm (cblasrowmajor, cblasnotrans, cblasnotrans, N, Alpha, B, n, A, N, beta, c, N); <br/>}</P> <p> // initialize array with random data <br/> void init_arr (int n, double *) <br/> {<br/> int I, j; <br/> for (I = 0; I <n; I ++) <br/> {<br/> for (j = 0; j <n; j ++) <br/> {<br/> A [I * n + J] = (I + J + 1) % 10; // keep all entries less than 10. pleasing to the eye! <Br/>}</P> <p> // print array to STD out <br/> void print_arr (int n, char * Name, double * array) <br/>{< br/> int I, j; <br/> printf ("/n % s/n", name ); <br/> for (I = 0; I <n; I ++) <br/> {<br/> for (j = 0; j <N; j ++) <br/>{< br/> printf ("% G/T", array [N * I + J]); <br/>}< br/> printf ("/N"); <br/>}< br/>