This article describes how to extend the Rabin-Karp algorithm to two dimensions in order to solve the two-dimensional pattern matching problem.
Problem:
Search for a given m1 * m2 pattern in an n1 * n2 two-dimensional array of characters. See exercise 32.2-3 in Introduction to Algorithms.
Analysis:
1. First, we will briefly introduce the Rabin-Karp Algorithm.
Rabin-Karp is a string matching algorithm. Its main idea is to compute the hash value of the pattern string in advance and, during matching, to compute the hash value of each candidate substring of the text. Comparing the hash value of the pattern with that of the current substring then indicates whether the two match.
For ease of description, the following uses strings of digits as an example (each character is a decimal digit, as in the string 31415). Given a pattern P[1..m], let p denote the value of the corresponding decimal number. Similarly, given the text T[1..n], let t_s denote the decimal value of the length-m substring T[s+1..s+m] (s = 0, 1, ..., n-m). Clearly, t_s = p if and only if T[s+1..s+m] matches P[1..m]. If p can be computed in O(m) time and all the t_s values in a total of O(n-m+1) time, then comparing p with each t_s finds all matches in O(m) + O(n-m+1) = O(n) time.
The RK algorithm uses Horner's rule to compute p in O(m) time:
$$p = P[m] + 10\,(P[m-1] + 10\,(P[m-2] + \cdots + 10\,(P[2] + 10\,P[1]) \cdots))$$
Similarly, t_0 can be computed from T[1..m] in O(m) time.
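As a minimal sketch of Horner's rule for digit strings (the class and method names are illustrative, and the input is assumed to contain only the characters '0' to '9' and to fit in a long):

```java
class HornerSketch {
    // Interprets a string of decimal digits as a number using Horner's rule:
    // each step multiplies the running value by 10 and adds the next digit.
    static long hornerValue(String digits) {
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            value = 10 * value + (digits.charAt(i) - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(hornerValue("31415")); // prints 31415
    }
}
```

The loop performs one multiplication and one addition per character, which is where the O(m) bound comes from.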
Because
$$t_{s+1} = 10\,(t_s - 10^{m-1}\,T[s+1]) + T[s+m+1]$$
t_{s+1} can be computed from t_s in constant time.
For example, if m = 5 and t_s = 12345, and the next digit is 6, then removing the high-order digit 1 and appending the low-order digit 6 gives:
$$t_{s+1} = 10\,(12345 - 10000 \cdot 1) + 6 = 23456$$
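The rolling update can be sketched as follows. This is only an illustration: the names `roll` and `windowValues` are made up for the example, and the code assumes a digit string with 1 <= m <= n whose window values fit in a long.

```java
import java.util.Arrays;

class RollingSketch {
    // Computes t_{s+1} from t_s: drop the high-order digit, shift left by one
    // decimal place, and append the incoming low-order digit.
    static long roll(long ts, int outgoingDigit, int incomingDigit, long highPower) {
        return 10 * (ts - highPower * outgoingDigit) + incomingDigit;
    }

    // Returns t_0 .. t_{n-m} for the digit string text and window length m.
    static long[] windowValues(String text, int m) {
        int n = text.length();
        long[] values = new long[n - m + 1];
        long highPower = 1;
        for (int i = 1; i < m; i++) {
            highPower *= 10; // ends as 10^(m-1)
        }
        long t = 0;
        for (int i = 0; i < m; i++) {
            t = 10 * t + (text.charAt(i) - '0'); // t_0 by Horner's rule
        }
        values[0] = t;
        for (int s = 0; s + m < n; s++) {
            t = roll(t, text.charAt(s) - '0', text.charAt(s + m) - '0', highPower);
            values[s + 1] = t;
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [12345, 23456, 34567]
        System.out.println(Arrays.toString(windowValues("1234567", 5)));
    }
}
```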
Therefore, the Rabin-Karp algorithm can find the pattern P[1..m] in the text T[1..n] with O(m) preprocessing time and O(n-m+1) matching time.
The only problem in this process is that the values of p and t_s may be too large to work with conveniently. The RK algorithm handles this by working modulo a suitable number q: the values actually computed are p and t_s modulo q. Equal values modulo q do not guarantee that the strings match, so whenever the two values are equal the strings must be compared directly to confirm the match. This article does not take this into consideration. If you are interested, refer to Introduction to Algorithms.
In addition, this example is based on strings of digits. In general, each character can be treated as a digit in radix-d notation, in which case the 10 in the two formulas above must be replaced with d.
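For readers who want to see the modular version, here is a minimal sketch of one-dimensional Rabin-Karp with a modulus q and an explicit verification step. The class name, the method signature, and the particular d and q values in `main` are assumptions chosen for illustration, not values taken from the article.

```java
import java.util.ArrayList;
import java.util.List;

class RabinKarpSketch {
    // Returns every 0-based shift s at which pattern occurs in text.
    // d is the radix and q the modulus; both are caller-chosen assumptions.
    static List<Integer> search(String text, String pattern, int d, int q) {
        List<Integer> shifts = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return shifts;
        }
        long h = 1; // d^(m-1) mod q
        for (int i = 1; i < m; i++) {
            h = (h * d) % q;
        }
        long p = 0;
        long t = 0;
        for (int i = 0; i < m; i++) {
            p = (d * p + pattern.charAt(i)) % q;
            t = (d * t + text.charAt(i)) % q;
        }
        for (int s = 0; s <= n - m; s++) {
            // Equal hashes only suggest a match, so compare the characters too.
            if (p == t && text.regionMatches(s, pattern, 0, m)) {
                shifts.add(s);
            }
            if (s < n - m) {
                t = (d * (t - text.charAt(s) * h) + text.charAt(s + m)) % q;
                if (t < 0) {
                    t += q; // Java's % can return a negative remainder
                }
            }
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println(search("2359023141526739921", "31415", 10, 13)); // prints [6]
    }
}
```

A small q such as 13 produces many spurious hash hits, which is exactly why the character comparison is needed; a larger prime reduces them but never removes the need to verify.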
2. Extending to two dimensions
First, look at the pattern matrix. Treat each of its m2 columns as a one-dimensional string of length m1 and compute its hash value separately (using Horner's rule). The pattern matrix then becomes a one-dimensional pattern string of length m2.
Then, apply the same method to the first m1 rows of the large matrix to obtain a string of length n2.
In this way, finding the pattern matrix in the first m1 rows of the large matrix becomes a one-dimensional string matching problem. (Any one-dimensional string matching algorithm, such as KMP, can be used here.)
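The column-to-number conversion can be sketched as follows. This is an illustrative helper, not the article's own routine: the name `columnStamps` is made up, and it assumes every character is a lowercase letter 'a' to 'z' and that the band height is small enough for 26^height to fit in a long. The matrices are the ones used in the article's listing.

```java
import java.util.Arrays;

class ColumnStampSketch {
    // Hashes each column of rows[top .. top+height-1] with Horner's rule in
    // radix 26, mapping 'a'..'z' to the digits 0..25.
    static long[] columnStamps(char[][] rows, int top, int height) {
        long[] stamps = new long[rows[0].length];
        for (int col = 0; col < stamps.length; col++) {
            for (int r = top; r < top + height; r++) {
                stamps[col] = 26 * stamps[col] + (rows[r][col] - 'a');
            }
        }
        return stamps;
    }

    public static void main(String[] args) {
        char[][] text = {
            "ababa".toCharArray(),
            "ababa".toCharArray(),
            "abbaa".toCharArray(),
            "abaab".toCharArray(),
            "bbaba".toCharArray()
        };
        char[][] pattern = { "ab".toCharArray(), "ba".toCharArray() };

        System.out.println(Arrays.toString(columnStamps(pattern, 0, 2))); // [1, 26]
        System.out.println(Arrays.toString(columnStamps(text, 0, 2)));    // [0, 27, 0, 27, 0]
    }
}
```

Here the pattern becomes the one-dimensional string [1, 26], which does not occur in [0, 27, 0, 27, 0], so the pattern does not appear in the first two rows of the text.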
Finally, rows 2 to m1 + 1 of the large matrix, rows 3 to m1 + 2, and so on can be matched in the same way.
The key point is that, for each match, the converted one-dimensional string can be computed directly from the previous one. (This is similar to Rabin-Karp, where t_{s+1} is computed from t_s in constant time; here each of the n2 column values is updated in constant time.)
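A sketch of this downward rolling step, under the same assumptions as before (lowercase letters only, hypothetical helper names `columnStamps` and `rollDown`, small band height): each column value drops the contribution of the row leaving the band and appends the row entering it.

```java
import java.util.Arrays;

class RollDownSketch {
    // Direct computation of the column values for rows top .. top+height-1.
    static long[] columnStamps(char[][] rows, int top, int height) {
        long[] stamps = new long[rows[0].length];
        for (int col = 0; col < stamps.length; col++) {
            for (int r = top; r < top + height; r++) {
                stamps[col] = 26 * stamps[col] + (rows[r][col] - 'a');
            }
        }
        return stamps;
    }

    // Updates stamps in place from the band starting at row top to the band
    // starting at row top + 1, in constant time per column.
    static void rollDown(long[] stamps, char[][] text, int top, int height) {
        long high = 1;
        for (int i = 1; i < height; i++) {
            high *= 26; // ends as 26^(height-1)
        }
        for (int col = 0; col < stamps.length; col++) {
            int outgoing = text[top][col] - 'a';
            int incoming = text[top + height][col] - 'a';
            stamps[col] = 26 * (stamps[col] - high * outgoing) + incoming;
        }
    }

    public static void main(String[] args) {
        char[][] text = {
            "ababa".toCharArray(),
            "ababa".toCharArray(),
            "abbaa".toCharArray(),
            "abaab".toCharArray(),
            "bbaba".toCharArray()
        };
        long[] stamps = columnStamps(text, 0, 2);
        rollDown(stamps, text, 0, 2);
        System.out.println(Arrays.toString(stamps)); // [0, 27, 1, 26, 0]
    }
}
```

The rolled result equals what a direct computation over rows 2 to 3 of the text would give, and it contains the pattern string [1, 26] starting at column index 2.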
Source code (Java)
01 public class StringMatch2D {
02
03     public static void main(String[] args) {
04         char[][] text = {
05             { 'a', 'b', 'a', 'b', 'a' },
06             { 'a', 'b', 'a', 'b', 'a' },
07             { 'a', 'b', 'b', 'a', 'a' },
08             { 'a', 'b', 'a', 'a', 'b' },
09             { 'b', 'b', 'a', 'b', 'a' }
10         };
11         char[][] pattern = {
12             { 'a', 'b' },
13             { 'b', 'a' }
14         };
15
16         matrixPatternMatch(text, pattern);
17     }
18
19     private static void matrixPatternMatch(char[][] text, char[][] pattern) {
20         // pre-process
21         int[] patternStamp = new int[pattern[0].length];
22         int[] textStamp = new int[text[0].length];
23
24         caculateStamp(pattern, pattern.length, patternStamp);
25         caculateStamp(text, pattern.length, textStamp);
26
27         int[] next = new int[patternStamp.length];
28         caculateNext(patternStamp, next);
29
30         for (int i = 0; i < (text.length - pattern.length + 1); i++) {
31             int col = isMatch(patternStamp, textStamp, next);
32             if (col != -1) {
33                 System.out.println("found");
34                 System.out.println(i+", "+col);
35             }
36
37             // move down
38             if(i < text.length - pattern.length)
39                 caculateNextStamp(text, pattern.length, textStamp, i);
40         }
41
42     }
43
44     private static int isMatch(int[] patternStamp, int[] textStamp, int[] next) {
45         int i = 0, j = 0;
46         while (j < patternStamp.length && i < textStamp.length) {
47             if (j == -1 || patternStamp[j] == textStamp[i]) {
48                 i++;
49                 j++;
50             } else {
51                 j = next[j];
52             }
53         }
54
55         if (j == patternStamp.length) {
56             return i-j;
57         } else {
58             return -1;
59         }
60     }
61
62     private static void caculateNext(int[] pattern, int[] next) {
63         next[0] = -1;
64
65         int i = 0, j = -1;
66         while(i<pattern.length-1) {
67             if(j==-1 || pattern[i] == pattern[j]) {
68                 i++;
69                 j++;
70                 next[i] = j;
71             } else {
72                 j = next[j];
73             }
74         }
75
76     }
77
78     private static void caculateNextStamp(char[][] text, int height,
79             int[] textStamp, int row) {
80         int d = (int) Math.pow(26, height-1);
81         for (int i = 0; i < textStamp.length; i++) {
82             textStamp[i] = 26 * (textStamp[i] - d * text[row][i]) + text[row + height][i];
83         }
84     }
85
86     private static void caculateStamp(char[][] input, int height, int[] result) {
87         for (int i = 0; i < result.length; i++) {
88             result[i] = 0;
89             for (int j = 0; j < height; j++) {
90                 result[i] = 26 * result[i] + input[j][i];
91             }
92         }
93     }
94
95 }
Lines 21-28 perform the preprocessing for matching.
Lines 21-22: patternStamp and textStamp store the one-dimensional strings converted from the pattern matrix and from the first m1 rows of the text matrix.
Lines 86-93 define the function caculateStamp. The input is a two-dimensional character matrix, and the output is the converted one-dimensional numeric string. A value is computed for each column of the input matrix using Horner's rule and used as one position of the converted one-dimensional numeric string.
Lines 24-25 call caculateStamp to compute the converted one-dimensional numeric strings.
Lines 27-28 compute the next array of the converted one-dimensional pattern string. It is used by the KMP algorithm for one-dimensional string matching.
Lines 30-40 perform the matching.
During matching, m1 rows of the text matrix are matched at a time. Before matching, they are converted into a one-dimensional numeric string, and the KMP algorithm (line 31) performs the one-dimensional string matching. A total of n1-m1+1 matches are performed. Note that isMatch returns only the first matching column in each band of rows.
Lines 38-39: the band of text rows to be matched moves down one row, and the new one-dimensional string is computed.
Lines 78-84: caculateNextStamp computes the new string from the previous converted one-dimensional string, similar to the second formula in the RK algorithm.
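The listing reports at most one column per band of rows and trusts the column values without checking them. As a hedged sketch of one possible variant (not the article's code), the class below keeps the KMP scan running after each hit and verifies every candidate character by character. The names, the radix 31, and the use of long arithmetic that is allowed to wrap around are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class AllMatches2DSketch {
    // Returns every {row, col} at which pattern occurs in text.
    static List<int[]> findAll(char[][] text, char[][] pattern) {
        List<int[]> hits = new ArrayList<>();
        int m1 = pattern.length, m2 = pattern[0].length;
        int n1 = text.length, n2 = text[0].length;
        if (m1 > n1 || m2 > n2) return hits;

        long[] p = stamps(pattern, m1);
        long[] t = stamps(text, m1);
        int[] fail = failure(p);
        long high = 1;
        for (int i = 1; i < m1; i++) high *= 31; // 31^(m1-1), may wrap mod 2^64

        for (int top = 0; top + m1 <= n1; top++) {
            int k = 0; // number of pattern values currently matched
            for (int col = 0; col < n2; col++) {
                while (k > 0 && p[k] != t[col]) k = fail[k - 1];
                if (p[k] == t[col]) k++;
                if (k == m2) {
                    int left = col - m2 + 1;
                    // Wrapped values can collide, so confirm the real characters.
                    if (verify(text, pattern, top, left)) hits.add(new int[] {top, left});
                    k = fail[k - 1]; // keep scanning for further matches
                }
            }
            if (top + m1 < n1) { // roll every column value down one row
                for (int col = 0; col < n2; col++) {
                    t[col] = 31 * (t[col] - high * text[top][col]) + text[top + m1][col];
                }
            }
        }
        return hits;
    }

    // Horner's rule over the first height rows of each column.
    static long[] stamps(char[][] rows, int height) {
        long[] s = new long[rows[0].length];
        for (int col = 0; col < s.length; col++)
            for (int r = 0; r < height; r++)
                s[col] = 31 * s[col] + rows[r][col];
        return s;
    }

    // Standard KMP prefix function on the pattern's column values.
    static int[] failure(long[] p) {
        int[] fail = new int[p.length];
        int k = 0;
        for (int i = 1; i < p.length; i++) {
            while (k > 0 && p[k] != p[i]) k = fail[k - 1];
            if (p[k] == p[i]) k++;
            fail[i] = k;
        }
        return fail;
    }

    static boolean verify(char[][] text, char[][] pattern, int top, int left) {
        for (int r = 0; r < pattern.length; r++)
            for (int c = 0; c < pattern[0].length; c++)
                if (text[top + r][left + c] != pattern[r][c]) return false;
        return true;
    }

    public static void main(String[] args) {
        char[][] text = { "ababa".toCharArray(), "ababa".toCharArray(),
            "abbaa".toCharArray(), "abaab".toCharArray(), "bbaba".toCharArray() };
        char[][] pattern = { "ab".toCharArray(), "ba".toCharArray() };
        for (int[] hit : findAll(text, pattern)) {
            System.out.println(hit[0] + ", " + hit[1]); // prints "1, 2" then "3, 3"
        }
    }
}
```

Letting the long values wrap keeps the rolling update consistent for any pattern height, at the cost of possible collisions, which the verification step absorbs. Verification adds O(m1 * m2) work per reported candidate.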
Time Complexity:
Preprocessing: O(n2 * m1)
Computing the pattern matrix stamp O(m1 * m2) + computing the stamp of the first m1 rows of the text matrix O(n2 * m1) + computing the next array of the pattern matrix stamp O(m2)
Matching: O((n1-m1+1) * n2)
A total of n1-m1+1 matches are performed.
Each match costs: matching with the KMP algorithm O(n2) + computing the stamp of the next m1 rows of the text matrix O(n2)
Total time complexity: O(n1 * n2)
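The total can be checked by adding the terms above. The only assumptions are m1 <= n1 and m2 <= n2, which hold whenever the pattern fits inside the text:

```latex
\begin{aligned}
T &= \underbrace{O(m_1 m_2) + O(n_2 m_1) + O(m_2)}_{\text{preprocessing}}
   + \underbrace{(n_1 - m_1 + 1)\cdot O(n_2)}_{\text{matching}} \\
  &= O(n_2 m_1) + O\big((n_1 - m_1 + 1)\, n_2\big)
   && \text{since } m_2 \le n_2 \\
  &= O\big((n_1 + 1)\, n_2\big) = O(n_1 n_2).
\end{aligned}
```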
[Reposted from]: http://www.slimeden.com/2010/10/algorithm/matrixpatternmatching