1. Matrix multiplication
matrix multiplication should meet the conditions:
(1) The number of columns in matrix A must equal the number of rows of matrix B, and matrix A and matrix B can be multiplied;
(2) The row number of matrix C equals the number of rows of matrix A, and the number of columns in Matrix C equals that of matrix B;
(3) The sum of the Elements of column I, J of the Matrix C, of the line I of matrix A and the product of the first J of Matrix B,
Such as:
The
2. Common matrix multiplication algorithm
the first line of a is multiplied by the elements of column J of B, and the elements of the J column of C are obtained, in which the access of B is accessed by column, and the code is as follows:
void Arymul (int a[4][5], int b[5][3], int c[4][3])
{
int i, j, K;
int temp;
for (i = 0; i < 4; i++) {for
(j = 0; J < 3; J +) {
temp = 0;
for (k = 0; k < 5; k++) {
temp + = a[i][k] * b[k][j];
}
C[I][J] = temp;
printf ("%d/t", C[i][j]);
printf ("%d/n");
}
3. Improved algorithm
matrices A, B, and C are accessed by rows (the order in which the data is stored), to increase the efficiency of memory access, for the first line of a, the elements of the J column are multiplied by the elements of the J line of B, and the same column K in B is summed in the above calculation to obtain the data in line K of column C, and the code is as follows:
void arymul1 (int a[4][5], int b[5][3], int c[4][3])
{
int i, j, K;
int temp[3] = {0};
for (i = 0; i < 4; i++) {for
(k = 0; k < 3; k + +)
temp[k] = 0;
for (j = 0; J < 5; J + +) {//when forward each element for
(k = 0; k < 3; k++) {
temp[k] = a[i][j] * b[j][k]
;
}
for (k = 0; k < 3; k++) {
c[i][k] = temp[k];
printf ("%d/t", C[i][k]);
printf ("%d/n");
}
This algorithm can easily go to the multiplication algorithm of sparse matrices.
PS: implementation of Strathearn algorithm
Strathearn method, a method proposed by V. Strathearn in 1969.
We first discuss the method of calculating the second-order matrix.
For second-order matrices
A11 A12 b11 b12
A = A21 A22 B = B21 B22
First calculate the following 7 quantities (1)
X1 = (A11 + a22) * (B11 + b22);
x2 = (A21 + a22) * B11;
x3 = A11 * (b12-b22);
x4 = A22 * (B21-B11);
X5 = (A11 + a12) * B22;
x6 = (A21-A11) * (B11 + b12);
X7 = (a12-a22) * (B21 + B22);
Then set c = AB. According to the rules multiplied by matrices, the elements of C are (2)
C11 = A11 * B11 + A12 * B21
C12 = A11 * b12 + A12 * B22
C21 = A21 * B11 + A22 * B21
C22 = A21 * b12 + A22 * B2 2
Comparison (1) (2), each element of C can be expressed as (3)
C11 = x1 + x4-x5 + x7
C12 = x3 + x5
C21 = x2 + x4
c22 = x1 + x3-x2 + x6
Based on the above methods, we can calculate the 4-order matrix, the first 4-order matrix A and B into four block 2-order matrix, respectively, using the formula to calculate their product, and then use (1) (3) to calculate the final results.
Ma11 ma12 mb11 mb12
A4 = ma21 ma22 B4 = Mb21 mb22
which
A12 A13 A14 B11 b12 b13 A11 b14 = ma11 A21 A22
= ma12 A23 A24 = mb11 B21 B22 = mb12 b23 b24 a31 a32 a33
B32 b33 b34
ma21 = A41 a42 ma22 = A43 A44 mb21 = b41 B42 mb22 = b43 b44
Realize
Calculates the 2x2 matrix void multiply2x2 (float& fout_11, float& Fout_12, float& fout_21, float& fout_22, float f1_11 , float F1_12, float f1_21, float f1_22, float f2_11, float F2_12, float f2_21, float f2_22) {const FLOAT x1 (f1_11 +
F1_22) * (F2_11 + f2_22));
Const FLOAT X2 ((f1_21 + f1_22) * f2_11);
Const FLOAT x3 (F1_11 * (f2_12-f2_22));
Const FLOAT x4 (f1_22 * (F2_21-f2_11));
Const float X5 ((f1_11 + f1_12) * f2_22);
Const FLOAT x6 ((f1_21-f1_11) * (F2_11 + F2_12));
Const float X7 ((f1_12-f1_22) * (f2_21 + f2_22));
Fout_11 = x1 + x4-x5 + x7;
Fout_12 = x3 + x5;
fout_21 = x2 + x4;
fout_22 = x1-x2 + x3 + x6; ///Calculate 4x4 matrix void Multiply (claymatrix& mout, const claymatrix& M1, const claymatrix& m2) {float ftmp[7][4
]; (MA11 + ma22) * (MB11 + mb22) multiply2x2 (ftmp[0][0], ftmp[0][1], ftmp[0][2], ftmp[0][3], M1._11 + m1._33, M1._12 + M 1._34, m1._21 + m1._43, m1._22 + m1._44, M2._11 + m2._33, M2._12 + m2._34, m2._21 + m2._43, m2._22 + m2._44); (Ma21 + ma22) * Mb11 multiply2x2 (ftmp[1][0], ftmp[1][1], ftmp[1][2], ftmp[1][3], m1._31 + m1._33, M1._32 + M1._34, M1
. _41 + m1._43, m1._42 + m1._44, M2._11, M2._12, m2._21, m2._22); MA11 * (mb12-mb22) multiply2x2 (ftmp[2][0], ftmp[2][1], ftmp[2][2], ftmp[2][3, M1._11, M1._12, m1._21, m1._22, M2.
_13-m2._33, m2._14-m2._34, m2._23-m2._43, m2._24-m2._44); Ma22 * (MB21-MB11) multiply2x2 (ftmp[3][0], ftmp[3][1], ftmp[3][2], ftmp[3][3, m1._33, m1._34, m1._43, m1._44, M2.
_31-m2._11, M2._32-m2._12, m2._41-m2._21, m2._42-m2._22); (MA11 + ma12) * Mb22 multiply2x2 (ftmp[4][0], ftmp[4][1], ftmp[4][2], ftmp[4][3], M1._11 + m1._13, M1._12 + M1._14, M1
. _21 + m1._23, m1._22 + m1._24, m2._33, m2._34, m2._43, m2._44); (MA21-MA11) * (MB11 + mb12) multiply2x2 (ftmp[5][0], ftmp[5][1], ftmp[5][2], ftmp[5][3], M1._31-m1._11, m1._32-m
1._12, m1._41-m1._21, m1._42-m1._22, M2._11 + m2._13, M2._12 + m2._14, m2._21 + m2._23, m2._22 + m2._24); (MA12-MA22) * (Mb21 + mb22) multiply2x2 (ftmp[6][0], ftmp[6][1], ftmp[6][2], ftmp[6][3], m1._13-m1._33, m1._14-m1._34,
m1._23-m1._43, m1._24-m1._44, m2._31 + m2._33, m2._32 + m2._34, m2._41 + m2._43, m2._42 + m2._44);
The first block mout._11 = ftmp[0][0] + ftmp[3][0]-ftmp[4][0] + ftmp[6][0];
Mout._12 = ftmp[0][1] + ftmp[3][1]-ftmp[4][1] + ftmp[6][1];
mout._21 = ftmp[0][2] + ftmp[3][2]-ftmp[4][2] + ftmp[6][2];
Mout._22 = ftmp[0][3] + ftmp[3][3]-ftmp[4][3] + ftmp[6][3];
The second block mout._13 = ftmp[2][0] + ftmp[4][0];
Mout._14 = ftmp[2][1] + ftmp[4][1];
mout._23 = ftmp[2][2] + ftmp[4][2];
mout._24 = ftmp[2][3] + ftmp[4][3];
The third block mout._31 = ftmp[1][0] + ftmp[3][0];
Mout._32 = ftmp[1][1] + ftmp[3][1];
mout._41 = ftmp[1][2] + ftmp[3][2];
mout._42 = ftmp[1][3] + ftmp[3][3];
Block IV mout._33 = ftmp[0][0]-ftmp[1][0] + ftmp[2][0] + ftmp[5][0];
mout._34 = ftmp[0][1]-ftmp[1][1] + ftmp[2][1] + ftmp[5][1];
mout._43 = ftmp[0][2]-ftmp[1][2] + ftmp[2][2] + ftmp[5][2]; MOut._44 = ftmp[0][3]-ftmp[1][3] + ftmp[2][3] + ftmp[5][3];
}
Compare
in the standard definition algorithm we need to do n * N-n multiplication, the new algorithm we need to do 7log2n multiplication, for the most commonly used 4-order matrix: The original algorithm of the new algorithm
Addition Times 48 72 (48 addition, 24 times subtraction)
Multiplication times The
requires extra space for * sizeof (FLOAT) * sizeof (float)
The new algorithm is 24 times more subtraction than the original algorithm and 15 times less multiplication. But because the speed of floating-point multiplication is much slower than the addition/subtraction operation, the overall speed of the new algorithm is improved.