The Rabin-Karp string matching algorithm: an efficient matcher whose underlying idea is the key.

Source: Internet
Author: User

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * n:    length of the text string T
     * m:    length of the pattern string P
     * h:    d^(m-1) mod q
     * d:    base (26 in this example)
     * t[s]: base-d value, mod q, of the substring of T at shift s
     * p:    base-d value, mod q, of the pattern string P
     * q:    a prime modulus. A larger q rules out more invalid shifts s,
     *       but if it is too large the modular arithmetic overflows.
     *       Here q = 13.
     */

    enum { maxn = 10000 };
    int n, m, h, p, d, q, t[maxn];
    char T[maxn], P[maxn];   /* both strings are stored starting at index 1 */

    /*
     * rabinkarpmatcher(): string matching
     * Preprocessing time: O(m) for p and t[0]; the remaining t[s] cost
     *   O(1) each, O(n-m+1) in total.
     * Worst-case matching time: O((n-m+1)*m), although in practice it
     *   usually runs much faster than that.
     * Preprocessing converts P and every length-m substring of T into
     *   base-d integers, so matching becomes the integer test t[s] == p.
     * The integers can be very large, so they are reduced mod q.
     * After reduction, t[s] == p no longer proves that the strings are
     *   equal, so each hit must be verified to rule out a spurious hit.
     * If the values differ after reduction, the shift is definitely not
     *   a match.
     */

    int modpow(int x, int y, int z)
    {
        int ret = 1, i = 0;
        for (; i < y; ++i)
        {
            /* the product is formed in 64 bits so that it cannot overflow */
            ret = (int)((ret * (long long)x) % z);
        }
        return ret;
    }

    /* Horner's rule: compute p and t[0] from the first m characters. */
    void dnum(void)
    {
        int i = 1;
        p = 0;
        t[0] = 0;
        for (; i <= m; ++i)
        {
            p = (int)(((long long)p * d + (P[i] - 'a')) % q);
            t[0] = (int)(((long long)t[0] * d + (T[i] - 'a')) % q);
        }
    }

    /* Assumes both strings consist of lowercase letters 'a'..'z'. */
    bool rabinkarpmatcher(void)
    {
        d = 26;   /* base 26 */
        q = 13;   /* modulus 13 */
        n = (int)strlen(T) - 1;   /* index 0 holds a placeholder */
        m = (int)strlen(P) - 1;
        if (m == 0 || m > n)
        {
            return false;
        }
        h = modpow(d, m - 1, q);  /* d^(m-1) mod q */
        dnum();
        int s, j;
        bool has = false;

        for (s = 0; s <= n - m; ++s)
        {
            if (t[s] == p)
            {
                /* hash hit: verify character by character */
                bool is = true;
                for (j = 1; j <= m; ++j)
                {
                    if (T[s + j] != P[j])
                    {
                        is = false;
                        break;
                    }
                }
                if (is)
                {
                    printf("Valid position in T: %d\n", s + 1);
                    has = true;
                }
            }
            if (s < n - m)
            {
                /* + 26 * q keeps the intermediate value non-negative */
                t[s + 1] = (int)((d * (t[s] - (T[s + 1] - 'a') * (long long)h + 26 * q)
                                  + (T[s + m + 1] - 'a')) % q);
            }
        }
        return has;
    }

    int main(void)
    {
        T[0] = P[0] = 'Z';   /* placeholder so that the strings start at index 1 */
        printf("Input text string T and pattern string P, separated by a space or newline.\n");
        if (scanf("%9998s%9998s", T + 1, P + 1) != 2)
        {
            return 1;
        }
        if (!rabinkarpmatcher())
        {
            printf("Sorry, no match found!\n");
        }
        return 0;
    }

 

 

The idea behind the algorithm:

 

Naive string matching compares the pattern string P against the substring of the text string T starting at each position. There are n-m+1 such positions, and in the worst case each one requires scanning m characters, so the complexity is O((n-m+1)*m).
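As a point of comparison, the naive approach can be sketched as follows. This is only an illustration: the function name `naive_match` and its output-buffer interface are assumptions made for this example and are not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of 0-based shifts s at which
 * pattern occurs in text, storing up to out_cap of them in out. */
size_t naive_match(const char *text, const char *pattern,
                   size_t *out, size_t out_cap)
{
    assert(text != NULL && pattern != NULL);
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    size_t count = 0;

    if (m == 0 || m > n)
        return 0;

    for (size_t s = 0; s + m <= n; ++s) {   /* n - m + 1 candidate shifts */
        size_t j = 0;
        while (j < m && text[s + j] == pattern[j])   /* up to m comparisons */
            ++j;
        if (j == m) {
            if (count < out_cap)
                out[count] = s;
            ++count;
        }
    }
    return count;
}
```

The two nested loops are exactly where the (n-m+1)*m bound comes from.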

 

How does the Rabin-Karp algorithm optimize this process?

 

The starting point is simpler than naive matching. The substring of T at each shift s is converted into an integer t[s], and the pattern string P is converted into an integer p. If t[s] = p, that is, if the integers corresponding to the two strings are equal, the two strings obviously match. An example makes this clear.

 

Assume that the strings contain only three characters, A, B, and C, and map them to decimal digits: A: 0, B: 1, C: 2.

 

Let the text string be T: ABC

Let the pattern string be P: AB

 

First, convert P to an integer: p = 0*10 + 1 = 1.

The length-2 substrings of T are converted in the same way: t[0] = 0*10 + 1 = 1; t[1] = 1*10 + 2 = 12.

 

In other words, each character is mapped to one digit of a base-d number, and the corresponding integer is obtained by positional notation, just as for a binary or decimal number. The code computes it iteratively, which is convenient; this method is called Horner's rule.
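Horner's rule for this conversion can be sketched as below. The function name `horner_value` and the parameter `first` (the character that maps to digit 0) are assumptions introduced for this illustration; no modulus is applied yet, so the caller has to make sure the result fits in a `long long`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: interprets the first m characters of s as an
 * m-digit base-d number, most significant digit first. The digit of a
 * character c is taken to be c - first. No overflow protection. */
long long horner_value(const char *s, size_t m, int d, char first)
{
    assert(s != NULL && d > 1);
    long long value = 0;
    for (size_t i = 0; i < m; ++i)
        value = value * d + (s[i] - first);   /* shift left one digit, add the next */
    return value;
}
```

With d = 10 and `first` = 'A', this reproduces the numbers from the example: "AB" gives 1 and "BC" gives 12.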

 

Matching now reduces to testing whether t[s] = p; if they are equal, the match succeeds.

 

However, if the pattern string P is very long, the integer becomes very large and overflows, so some extra processing is required: the values are reduced modulo q.

 

Two different strings may have the same value mod q, so equality after reduction no longer proves a match. The converse still holds, though: if the integers corresponding to two strings differ mod q, the two strings cannot be the same!

 

This test can therefore be used on top of naive matching to rule out many positions without comparing characters, which reduces the running time in practice.
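The effect of the reduction can be seen in a small sketch. The helper `horner_mod` and its parameter `first` are assumptions made for this illustration; it is Horner's rule with a reduction mod q after every step, and it assumes every character is at least `first` so that all digits are non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: base-d value of the first m characters of s,
 * reduced mod q at every step so that it never grows beyond q * d. */
int horner_mod(const char *s, size_t m, int d, int q, char first)
{
    assert(s != NULL && d > 1 && q > 0);
    long long value = 0;
    for (size_t i = 0; i < m; ++i)
        value = (value * d + (s[i] - first)) % q;
    return (int)value;
}
```

Using d = 10, q = 13 and digits A: 0, B: 1, and so on, "AB" has value 1 and "BE" has value 14, and both reduce to 1 mod 13. That is a spurious hit, which only a character-by-character check can expose. "BC" reduces to 12, so it can be rejected against "AB" without looking at the characters at all.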

 

Preprocessing computes p and the values t[s]. Computing every t[s] from scratch costs O(m) each, which is O((n-m+1)*m) in total and no better than naive matching, so a clever recurrence is used instead to obtain each t[s+1] from t[s].

 

An example shows the idea without writing out the general formula.

 

Let the text string be T: ABC

Let the pattern string be P: AB

 

t[0] corresponds to AB, that is, 01

t[1] = (01 - 0*10) * 10 + 2 = 12

 

That is, AB corresponds to 01; remove the leading 0 contributed by A, shift the remaining 1 one digit to the left, and append C = 2 on the right to get 12. In general, this step follows directly from how positional (base-d) notation works.
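The same step with the modulus included can be sketched as a single function. The name `roll` and its parameter list are assumptions for this illustration; it expects h = d^(m-1) mod q, a value t_s already reduced into the range 0..q-1, and digits in the range 0..d-1. It computes (d * (t_s - out_digit * h) + in_digit) mod q while keeping every intermediate value non-negative.

```c
#include <assert.h>

/* Hypothetical helper: given t_s, the value mod q of the window at shift s,
 * returns the value mod q of the window at shift s + 1.
 * out_digit is the digit leaving on the left, in_digit the one entering
 * on the right, and h must equal d^(m-1) mod q. */
int roll(int t_s, int out_digit, int in_digit, int d, int q, int h)
{
    assert(d > 1 && q > 0);
    /* remove the leading digit; the difference may be negative */
    long long x = (long long)t_s - ((long long)out_digit * h) % q;
    x = (x % q + q) % q;            /* bring it back into 0..q-1 */
    x = (x * d + in_digit) % q;     /* shift left one digit, append the new one */
    return (int)x;
}
```

For the example, d = 10, m = 2 and q = 13 give h = 10, and `roll(1, 0, 2, 10, 13, 10)` turns the value 1 of AB into the value 12 of BC. Each update costs a constant number of operations, independent of m.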

 

 

That is all for now. With this idea in mind, the treatment in Introduction to Algorithms should be easy to follow.
