Matrix Multiplication

**Time Limit: 4000/2000 MS (Java/Others) Memory Limit: 131072/131072 K (Java/Others)**
Problem Description

Given two matrices A and B of size n × n, find the product of them.

Bobo hates big integers. So you are only asked to find the result modulo 3.

Input

The input consists of several tests. For each test:

The first line contains n (1 ≤ n ≤ 800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in similar format (0 ≤ A_ij, B_ij ≤ 10^9).

Output

For each test:

Print n lines. Each of them contains n integers -- the matrix A × B in similar format.

Sample Input

    1
    0
    1
    2
    0 1
    2 3
    4 5
    6 7

Sample Output

    0
    0 1
    2 1

Problem summary: two n × n matrices are given, and we must output their product with every entry taken modulo 3.

Analysis: I first solved it with the classic matrix multiplication method, and the submission timed out. I then searched online for matrix multiplication optimizations and found one. Unfortunately, I still don't understand why it is faster.
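
For reference, the classic method mentioned above can be sketched as follows. This is a reconstruction rather than the submission that timed out (which is not shown here); `Matrix` and `multiplyNaiveMod3` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Classic i-j-k multiplication, reducing modulo 3 after every addition.
// multiplyNaiveMod3 is a hypothetical name used only for this sketch.
Matrix multiplyNaiveMod3(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);  // both matrices are assumed to be n x n
    Matrix c(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                // Both operands are at most 2 after % 3, so nothing overflows.
                c[i][j] = (c[i][j] + (a[i][k] % 3) * (b[k][j] % 3)) % 3;
    return c;
}
```

Called on the second sample test, it should reproduce the two rows shown in the sample output.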

    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    using namespace std;

    const int N = 805;
    int a[N][N], b[N][N], ans[N][N];

    // Computes ans = a * b (mod 3), one output row at a time.
    void Multi(int n)
    {
        int i, j, k, L, *p2;
        int tmp[N], con;
        for (i = 0; i < n; ++i) {
            memset(tmp, 0, sizeof(tmp));
            // Main part: k runs up to the largest multiple of 16 not exceeding n.
            for (k = 0, L = (n & ~15); k < L; ++k) {
                con = a[i][k];
                for (j = 0, p2 = b[k]; j < n; ++j, ++p2)
                    tmp[j] += con * (*p2);
                // Reduce modulo 3 only once every 16 values of k.
                if ((k & 15) == 15) {
                    for (j = 0; j < n; ++j)
                        tmp[j] %= 3;
                }
            }
            // Remaining values of k (fewer than 16 of them).
            for ( ; k < n; ++k) {
                con = a[i][k];
                for (j = 0, p2 = b[k]; j < n; ++j, ++p2)
                    tmp[j] += con * (*p2);
            }
            for (j = 0; j < n; ++j)
                ans[i][j] = tmp[j] % 3;
        }
    }

    int main()
    {
        int n, i, j, k;
        while (~scanf("%d", &n)) {
            // Read A and B, reducing every entry modulo 3 right away.
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    scanf("%d", &a[i][j]);
                    a[i][j] %= 3;
                }
            for (i = 0; i < n; i++)
                for (j = 0; j < n; j++) {
                    scanf("%d", &b[i][j]);
                    b[i][j] %= 3;
                }
            Multi(n);
            for (i = 0; i < n; i++) {
                for (j = 0; j < n - 1; j++)
                    printf("%d ", ans[i][j]);
                printf("%d\n", ans[i][n - 1]);
            }
        }
        return 0;
    }
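
As far as I can tell, the idea in Multi above is to accumulate a whole output row and reduce modulo 3 only once per 16 values of k. Every operand is at most 2 after reduction, so each step adds at most 4, and an int accumulator stays far below overflow for n ≤ 800. The following is a minimal sketch of that idea using `std::vector`; `multiplyDeferredMod3` and its `block` parameter are hypothetical names, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Row-accumulating (i-k-j) product modulo 3 with deferred reduction.
// multiplyDeferredMod3 and `block` are hypothetical names for this sketch;
// block = 16 mirrors the (k & 15) == 15 test in Multi.
Matrix multiplyDeferredMod3(const Matrix& a, const Matrix& b,
                            std::size_t block = 16) {
    const std::size_t n = a.size();
    assert(b.size() == n && block > 0);
    Matrix c(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<int>& row = c[i];            // accumulator for output row i
        for (std::size_t k = 0; k < n; ++k) {
            const int coeff = a[i][k] % 3;       // at most 2
            for (std::size_t j = 0; j < n; ++j)
                row[j] += coeff * (b[k][j] % 3); // adds at most 4 per step
            if ((k + 1) % block == 0)            // reduce once per block of k
                for (int& v : row) v %= 3;
        }
        for (int& v : row) v %= 3;               // final reduction for the tail
    }
    return c;
}
```

Passing `block = 1` reduces after every value of k and should give the same matrix, which is a convenient way to check the deferred version on small inputs.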


The following method can also be used:

    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    #include <cmath>
    using namespace std;

    const int N = 805;
    int a[N][N], b[N][N], ans[N][N];

    int main()
    {
        int n, i, j, k;
        while (~scanf("%d", &n)) {
            // Read A and B (1-indexed), reducing every entry modulo 3.
            for (i = 1; i <= n; i++)
                for (j = 1; j <= n; j++) {
                    scanf("%d", &a[i][j]);
                    a[i][j] %= 3;
                }
            for (i = 1; i <= n; i++)
                for (j = 1; j <= n; j++) {
                    scanf("%d", &b[i][j]);
                    b[i][j] %= 3;
                }
            memset(ans, 0, sizeof(ans));
            // In the classic algorithm the k loop is the innermost one, and that
            // version times out. With k as the outermost or the middle loop it
            // does not; I do not know why.
            for (k = 1; k <= n; k++)
                for (i = 1; i <= n; i++)
                    for (j = 1; j <= n; j++) {
                        ans[i][j] += a[i][k] * b[k][j];
                        // ans[i][j] %= 3;  // taking the remainder here causes a timeout
                    }
            for (i = 1; i <= n; i++) {
                for (j = 1; j < n; j++)
                    printf("%d ", ans[i][j] % 3);
                printf("%d\n", ans[i][n] % 3);
            }
        }
        return 0;
    }
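
A likely explanation for the loop-order effect noted in the comment above is memory layout. C++ stores a 2-D array row by row, so with k as the innermost loop each step of `b[k][j]` jumps a full row of 805 ints, while with j innermost the reads are adjacent in memory and much friendlier to the cache. The sketch below only computes those two strides; `flatIndex` and the stride helpers are hypothetical names, and the actual running time still depends on the judge's hardware.

```cpp
#include <cassert>
#include <cstddef>

// Row length of the global arrays in the listings above (const int N = 805).
const std::ptrdiff_t kRowLength = 805;

// Position of b[k][j] when a 2-D array is laid out row by row in memory.
// flatIndex and the stride helpers are hypothetical names for this sketch.
std::ptrdiff_t flatIndex(std::ptrdiff_t k, std::ptrdiff_t j) {
    assert(k >= 0 && j >= 0 && j < kRowLength);
    return k * kRowLength + j;
}

// k innermost: consecutive iterations read b[k][j], b[k+1][j], ...
std::ptrdiff_t strideWithKInnermost() {
    return flatIndex(1, 0) - flatIndex(0, 0);
}

// j innermost: consecutive iterations read b[k][j], b[k][j+1], ...
std::ptrdiff_t strideWithJInnermost() {
    return flatIndex(0, 1) - flatIndex(0, 0);
}
```

The strides are measured in array elements. With a typical 4-byte int, the k-innermost order touches a new location more than 3 KB away on every step, which would be consistent with the timeout described above.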