C Language Scientific Computing introduction of matrix multiplication related computation _c language

Source: Internet
Author: User

1. Matrix multiplication
matrix multiplication should meet the conditions:
(1) The number of columns in matrix A must equal the number of rows of matrix B, and matrix A and matrix B can be multiplied;
(2) The row number of matrix C equals the number of rows of matrix A, and the number of columns in Matrix C equals that of matrix B;
(3) The sum of the Elements of column I, J of the Matrix C, of the line I of matrix A and the product of the first J of Matrix B,

Such as:

The

2. Common matrix multiplication algorithm
the first line of a is multiplied by the elements of column J of B, and the elements of the J column of C are obtained, in which the access of B is accessed by column, and the code is as follows:

void Arymul (int a[4][5], int b[5][3], int c[4][3])
{
 int i, j, K;
 int temp;
 for (i = 0; i < 4; i++) {for
 (j = 0; J < 3; J +) {
  temp = 0;
  for (k = 0; k < 5; k++) {
  temp + = a[i][k] * b[k][j];
  }
  C[I][J] = temp;
  printf ("%d/t", C[i][j]);
 printf ("%d/n");
 }

3. Improved algorithm
matrices A, B, and C are accessed by rows (the order in which the data is stored), to increase the efficiency of memory access, for the first line of a, the elements of the J column are multiplied by the elements of the J line of B, and the same column K in B is summed in the above calculation to obtain the data in line K of column C, and the code is as follows:

void arymul1 (int a[4][5], int b[5][3], int c[4][3])
{
 int i, j, K;
 int temp[3] = {0};
 for (i = 0; i < 4; i++) {for
 (k = 0; k < 3; k + +)
  temp[k] = 0;
 for (j = 0; J < 5; J + +) {//when forward each element for
  (k = 0; k < 3; k++) {
  temp[k] = a[i][j] * b[j][k]
  ;
 }
 for (k = 0; k < 3; k++) {
  c[i][k] = temp[k];
  printf ("%d/t", C[i][k]);
 printf ("%d/n");
 }

This algorithm can easily go to the multiplication algorithm of sparse matrices.

PS: implementation of Strathearn algorithm
Strathearn method, a method proposed by V. Strathearn in 1969.

We first discuss the method of calculating the second-order matrix.
For second-order matrices

A11 A12 b11 b12 
A = A21 A22 B = B21 B22 

First calculate the following 7 quantities (1)

X1 = (A11 + a22) * (B11 + b22); 
x2 = (A21 + a22) * B11; 
x3 = A11 * (b12-b22); 
x4 = A22 * (B21-B11); 
X5 = (A11 + a12) * B22; 
x6 = (A21-A11) * (B11 + b12); 
X7 = (a12-a22) * (B21 + B22); 

Then set c = AB. According to the rules multiplied by matrices, the elements of C are (2)

C11 = A11 * B11 + A12 * B21 
C12 = A11 * b12 + A12 * B22 
C21 = A21 * B11 + A22 * B21 
C22 = A21 * b12 + A22 * B2 2 

Comparison (1) (2), each element of C can be expressed as (3)

C11 = x1 + x4-x5 + x7 
C12 = x3 + x5 
C21 = x2 + x4 
c22 = x1 + x3-x2 + x6 

Based on the above methods, we can calculate the 4-order matrix, the first 4-order matrix A and B into four block 2-order matrix, respectively, using the formula to calculate their product, and then use (1) (3) to calculate the final results.

Ma11 ma12 mb11 mb12 
A4 = ma21 ma22 B4 = Mb21 mb22 

which

A12 A13 A14 B11 b12 b13 A11 b14 = ma11 A21 A22 
= ma12 A23 A24 = mb11 B21 B22 = mb12 b23 b24 a31 a32 a33 

B32 b33 b34 
ma21 = A41 a42 ma22 = A43 A44 mb21 = b41 B42 mb22 = b43 b44 

Realize

Calculates the 2x2 matrix void multiply2x2 (float& fout_11, float& Fout_12, float& fout_21, float& fout_22, float f1_11 , float F1_12, float f1_21, float f1_22, float f2_11, float F2_12, float f2_21, float f2_22) {const FLOAT x1 (f1_11 + 
F1_22) * (F2_11 + f2_22)); 
Const FLOAT X2 ((f1_21 + f1_22) * f2_11); 
Const FLOAT x3 (F1_11 * (f2_12-f2_22)); 
Const FLOAT x4 (f1_22 * (F2_21-f2_11)); 
Const float X5 ((f1_11 + f1_12) * f2_22); 
Const FLOAT x6 ((f1_21-f1_11) * (F2_11 + F2_12)); 
Const float X7 ((f1_12-f1_22) * (f2_21 + f2_22)); 
Fout_11 = x1 + x4-x5 + x7; 
Fout_12 = x3 + x5; 
fout_21 = x2 + x4; 
fout_22 = x1-x2 + x3 + x6; ///Calculate 4x4 matrix void Multiply (claymatrix& mout, const claymatrix& M1, const claymatrix& m2) {float ftmp[7][4 
]; (MA11 + ma22) * (MB11 + mb22) multiply2x2 (ftmp[0][0], ftmp[0][1], ftmp[0][2], ftmp[0][3], M1._11 + m1._33, M1._12 + M 1._34, m1._21 + m1._43, m1._22 + m1._44, M2._11 + m2._33, M2._12 + m2._34, m2._21 + m2._43, m2._22 + m2._44); (Ma21 + ma22) * Mb11 multiply2x2 (ftmp[1][0], ftmp[1][1], ftmp[1][2], ftmp[1][3], m1._31 + m1._33, M1._32 + M1._34, M1 
. _41 + m1._43, m1._42 + m1._44, M2._11, M2._12, m2._21, m2._22); MA11 * (mb12-mb22) multiply2x2 (ftmp[2][0], ftmp[2][1], ftmp[2][2], ftmp[2][3, M1._11, M1._12, m1._21, m1._22, M2. 
_13-m2._33, m2._14-m2._34, m2._23-m2._43, m2._24-m2._44); Ma22 * (MB21-MB11) multiply2x2 (ftmp[3][0], ftmp[3][1], ftmp[3][2], ftmp[3][3, m1._33, m1._34, m1._43, m1._44, M2. 
_31-m2._11, M2._32-m2._12, m2._41-m2._21, m2._42-m2._22); (MA11 + ma12) * Mb22 multiply2x2 (ftmp[4][0], ftmp[4][1], ftmp[4][2], ftmp[4][3], M1._11 + m1._13, M1._12 + M1._14, M1 
. _21 + m1._23, m1._22 + m1._24, m2._33, m2._34, m2._43, m2._44); (MA21-MA11) * (MB11 + mb12) multiply2x2 (ftmp[5][0], ftmp[5][1], ftmp[5][2], ftmp[5][3], M1._31-m1._11, m1._32-m 
1._12, m1._41-m1._21, m1._42-m1._22, M2._11 + m2._13, M2._12 + m2._14, m2._21 + m2._23, m2._22 + m2._24); (MA12-MA22) * (Mb21 + mb22) multiply2x2 (ftmp[6][0], ftmp[6][1], ftmp[6][2], ftmp[6][3], m1._13-m1._33, m1._14-m1._34, 
m1._23-m1._43, m1._24-m1._44, m2._31 + m2._33, m2._32 + m2._34, m2._41 + m2._43, m2._42 + m2._44); 
The first block mout._11 = ftmp[0][0] + ftmp[3][0]-ftmp[4][0] + ftmp[6][0]; 
Mout._12 = ftmp[0][1] + ftmp[3][1]-ftmp[4][1] + ftmp[6][1]; 
mout._21 = ftmp[0][2] + ftmp[3][2]-ftmp[4][2] + ftmp[6][2]; 
Mout._22 = ftmp[0][3] + ftmp[3][3]-ftmp[4][3] + ftmp[6][3]; 
The second block mout._13 = ftmp[2][0] + ftmp[4][0]; 
Mout._14 = ftmp[2][1] + ftmp[4][1]; 
mout._23 = ftmp[2][2] + ftmp[4][2]; 
mout._24 = ftmp[2][3] + ftmp[4][3]; 
The third block mout._31 = ftmp[1][0] + ftmp[3][0]; 
Mout._32 = ftmp[1][1] + ftmp[3][1]; 
mout._41 = ftmp[1][2] + ftmp[3][2]; 
mout._42 = ftmp[1][3] + ftmp[3][3]; 
Block IV mout._33 = ftmp[0][0]-ftmp[1][0] + ftmp[2][0] + ftmp[5][0]; 
mout._34 = ftmp[0][1]-ftmp[1][1] + ftmp[2][1] + ftmp[5][1]; 
mout._43 = ftmp[0][2]-ftmp[1][2] + ftmp[2][2] + ftmp[5][2]; MOut._44 = ftmp[0][3]-ftmp[1][3] + ftmp[2][3] + ftmp[5][3]; 
 }

Compare
in the standard definition algorithm we need to do n * N-n multiplication, the new algorithm we need to do 7log2n multiplication, for the most commonly used 4-order matrix: The original algorithm of the new algorithm
Addition Times 48 72 (48 addition, 24 times subtraction)
Multiplication times The
requires extra space for * sizeof (FLOAT) * sizeof (float)
The new algorithm is 24 times more subtraction than the original algorithm and 15 times less multiplication. But because the speed of floating-point multiplication is much slower than the addition/subtraction operation, the overall speed of the new algorithm is improved.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.