# HDU 4920 Matrix Multiplication (Multi-University Training Contest 5)

**Matrix Multiplication**

Time Limit: 4000/2000 MS (Java/Others), Memory Limit: 131072/131072 K (Java/Others)

**Problem Description**

Given two matrices A and B of size n × n, find their product.

Bobo hates big integers. So you are only asked to find the result modulo 3.
**Input**

The input consists of several tests. For each test:

The first line contains n (1 ≤ n ≤ 800). Each of the following n lines contains n integers describing matrix A: the j-th integer in the i-th line equals $A_{ij}$. The next n lines describe matrix B in the same format ($0 \le A_{ij}, B_{ij} \le 10^9$).
**Output**

For each test:

Print n lines, each containing n integers: the matrix A × B in the same format.
**Sample Input**

```
1
0
1
2
0 1
2 3
4 5
6 7
```

**Sample Output**

```
0
0 1
2 1
```
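Here is a hand check of the sample. The first case is 1 × 1, and 0 · 1 = 0. The second case (n = 2) works out as follows:

$$
AB = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}
\begin{pmatrix} 4 & 5 \\ 6 & 7 \end{pmatrix}
= \begin{pmatrix} 0\cdot 4 + 1\cdot 6 & 0\cdot 5 + 1\cdot 7 \\ 2\cdot 4 + 3\cdot 6 & 2\cdot 5 + 3\cdot 7 \end{pmatrix}
= \begin{pmatrix} 6 & 7 \\ 26 & 31 \end{pmatrix}
\equiv \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix} \pmod{3}
$$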
Problem summary: given two n × n matrices, compute their product and output every entry modulo 3.

Analysis: I first used the classic triple-loop matrix multiplication, but the submission timed out. I then searched online for ways to optimize matrix multiplication and found the method below. Unfortunately, I still don't understand why it is faster.
```cpp
#include <cstdio>
#include <cstring>
using namespace std;

const int N = 805;
int a[N][N], b[N][N], ans[N][N];

// Computes ans = a * b (mod 3); every entry of a and b is already in {0, 1, 2}.
void Multi(int n)
{
    int i, j, k, L, *p2;
    int tmp[N], con;
    for (i = 0; i < n; ++i)
    {
        memset(tmp, 0, sizeof(tmp));               // tmp accumulates row i of the product
        for (k = 0, L = (n & ~15); k < L; ++k)     // L = n rounded down to a multiple of 16
        {
            con = a[i][k];
            for (j = 0, p2 = b[k]; j < n; ++j, ++p2)   // walk row k of b sequentially
                tmp[j] += con * (*p2);
            if ((k & 15) == 15)                    // reduce only once every 16 values of k
            {
                for (j = 0; j < n; ++j) tmp[j] %= 3;
            }
        }
        for ( ; k < n; ++k)                        // remaining n % 16 values of k
        {
            con = a[i][k];
            for (j = 0, p2 = b[k]; j < n; ++j, ++p2)
                tmp[j] += con * (*p2);
        }
        for (j = 0; j < n; ++j)
            ans[i][j] = tmp[j] % 3;
    }
}

int main()
{
    int n, i, j;
    while (~scanf("%d", &n))                       // read test cases until EOF
    {
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
            {
                scanf("%d", &a[i][j]);
                a[i][j] %= 3;                      // reduce each entry as soon as it is read
            }
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
            {
                scanf("%d", &b[i][j]);
                b[i][j] %= 3;
            }
        Multi(n);
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < n - 1; j++)
                printf("%d ", ans[i][j]);
            printf("%d\n", ans[i][n - 1]);
        }
    }
    return 0;
}
```
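This is my best reading of why `Multi()` passes. It is an interpretation, not the original author's explanation. The function changes two things relative to the classic i-j-k loop:

- **Loop order.** The loops run in i-k-j order. For fixed i and k, the innermost loop walks `b[k][0..n-1]` and `tmp[0..n-1]` sequentially, because C arrays are row-major. That access pattern is cache-friendly and easy for a compiler to vectorize. In i-j-k order, the innermost loop reads `b[k][j]` down a column instead, jumping a whole row of N ints per step.
- **Deferred modulo.**
  - (x · y) mod 3 = ((x mod 3)(y mod 3)) mod 3, and the same holds for sums, so the inputs can be reduced once on reading.
  - Each product is then at most 4.
  - Reducing `tmp` only once every 16 values of k keeps it at most 2 + 16 · 4 = 66.
  - As a result, no division runs in the innermost loop and nothing overflows.

The sketch below checks that the deferred-modulo i-k-j order gives the same answer as a per-step-modulo i-j-k reference. The names `Matrix`, `classicMulMod3` and `ikjMulMod3` are hypothetical, and the sketch uses `std::vector` instead of the post's global arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // hypothetical alias

// Reference: classic i-j-k order, reducing mod 3 after every multiply-add.
Matrix classicMulMod3(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i][j] = (c[i][j] + (a[i][k] % 3) * (b[k][j] % 3)) % 3;
    return c;
}

// Same technique as Multi(): i-k-j order with a deferred modulo.
// Takes copies so the inputs can be reduced to {0, 1, 2} once, up front.
Matrix ikjMulMod3(Matrix a, Matrix b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    for (auto& row : a) for (int& x : row) x %= 3;
    for (auto& row : b) for (int& x : row) x %= 3;

    Matrix c(n, std::vector<int>(n, 0));
    std::vector<int> tmp(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::fill(tmp.begin(), tmp.end(), 0);      // tmp accumulates row i of a * b
        for (std::size_t k = 0; k < n; ++k) {
            const int con = a[i][k];
            const std::vector<int>& row = b[k];    // row k of b, read left to right
            for (std::size_t j = 0; j < n; ++j)
                tmp[j] += con * row[j];            // each product is at most 4
            if ((k & 15) == 15)                    // here tmp <= 2 + 16 * 4 = 66
                for (int& t : tmp) t %= 3;
        }
        for (std::size_t j = 0; j < n; ++j) c[i][j] = tmp[j] % 3;
    }
    return c;
}
```

For example, `ikjMulMod3({{0, 1}, {2, 3}}, {{4, 5}, {6, 7}})` returns `{{0, 1}, {2, 1}}`, matching the second sample.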

The following method can also be used:
```cpp
#include <cstdio>
#include <cstring>
using namespace std;

const int N = 805;
int a[N][N], b[N][N], ans[N][N];

int main()
{
    int n, i, j, k;
    while (~scanf("%d", &n))
    {
        for (i = 1; i <= n; i++)
            for (j = 1; j <= n; j++)
            {
                scanf("%d", &a[i][j]);
                a[i][j] %= 3;
            }
        for (i = 1; i <= n; i++)
            for (j = 1; j <= n; j++)
            {
                scanf("%d", &b[i][j]);
                b[i][j] %= 3;
            }
        memset(ans, 0, sizeof(ans));
        // In the classic algorithm this k loop is the innermost one, and that times out.
        // Placed outermost (or in the middle) it does not; I don't know why.
        for (k = 1; k <= n; k++)
            for (i = 1; i <= n; i++)
                for (j = 1; j <= n; j++)
                {
                    ans[i][j] += a[i][k] * b[k][j];
                    // ans[i][j] %= 3;  // reducing mod 3 here makes it time out
                }
        for (i = 1; i <= n; i++)
        {
            for (j = 1; j < n; j++)
                printf("%d ", ans[i][j] % 3);
            printf("%d\n", ans[i][n] % 3);
        }
    }
    return 0;
}
```
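This addresses the question left open in the comment above. My understanding is an explanation, not something the original post states. With k outermost, the innermost j loop touches `ans[i][...]` and `b[k][...]` left to right, one contiguous row each, just like the i-k-j order of the first program. Only the classic i-j-k order strides down a column of `b`.

Putting `ans[i][j] %= 3` back into the innermost loop adds an integer division to every one of the n^3 multiply-adds (5.12 × 10^8 at n = 800), which likely explains that timeout too. Without it, entries in {0, 1, 2} keep every sum at most 4n = 3200, so `int` is safe.

The hypothetical helpers below (`Grid`, `mulIJK`, `mulKIJ`, `elapsedMs`) let you time the two loop orders on the same input. They assume the entries are already reduced mod 3.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // hypothetical alias; entries assumed in {0, 1, 2}

// i-j-k order: the innermost loop reads b[k][j] down a column,
// so consecutive reads are a whole row (n ints) apart in memory.
Grid mulIJK(const Grid& a, const Grid& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Grid c(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            int sum = 0;                     // at most 4n, no overflow for n <= 800
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum % 3;
        }
    return c;
}

// k-i-j order (the second program's loop order): the innermost loop walks
// c[i][...] and b[k][...] left to right, one contiguous row each.
Grid mulKIJ(const Grid& a, const Grid& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Grid c(n, std::vector<int>(n, 0));
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i) {
            const int aik = a[i][k];
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += aik * b[k][j];
        }
    for (auto& row : c)
        for (int& x : row) x %= 3;           // one reduction per entry, at the end
    return c;
}

// Wall-clock milliseconds taken by one call of f().
template <class F>
double elapsedMs(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double, std::milli> d =
        std::chrono::steady_clock::now() - start;
    return d.count();
}
```

For example, fill two 800 × 800 `Grid`s with values in {0, 1, 2`}` and compare `elapsedMs([&] { mulIJK(a, b); })` against `elapsedMs([&] { mulKIJ(a, b); })`. The gap depends on the machine and compiler flags, so no timings are quoted here.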
