Sunday Algorithm Research: String Matching Beyond KMP

Source: Internet
Author: User

The first time I heard about the Sunday algorithm, I dismissed it as hype. After reading an illustrated explanation, though, I found that it is easy to understand and is often faster in practice than KMP and BM.

So I tried writing an implementation myself, and since it turned out well, I am sharing it here.

First, some background:

The Sunday algorithm was proposed by Daniel M. Sunday in 1990 and is generally faster in practice than the BM (Boyer-Moore) algorithm. Its core idea is that the pattern string does not have to be compared strictly from left to right or from right to left. When a mismatch is found, the algorithm skips as many characters as possible before the next attempt, which improves matching efficiency.

 

Assume that a mismatch occurs at S[i] ≠ T[j], where 1 ≤ i ≤ n and 1 ≤ j ≤ m. At this point the matched part is u, and the length of the string u is assumed to be L, as shown in Figure 1. Clearly, S[L+i+1] (the character of the main string immediately after the current window) must take part in the next round of matching, and T[m] must move at least to this position (that is, the pattern string T must move at least one character to the right).

Figure 1: A mismatch in the Sunday algorithm

There are two cases:

(1) S[L+i+1] does not appear in the pattern string T. In this case, the pattern is moved so that T[0] is aligned with the character after S[L+i+1], as shown in Figure 2.

Figure 2: Case 1 of the Sunday algorithm

(2) S[L+i+1] appears in the pattern string. Here S[L+i+1] is searched for in T from the right, that is, in the order T[m-1], T[m-2], ..., T[0]. When a character of T equal to S[L+i+1] is found, record its position as k, 0 ≤ k ≤ m-1, with T[k] = S[L+i+1]. The pattern string T is then moved m-k characters to the right, that is, to the position where T[k] is aligned with S[L+i+1], as shown in Figure 3.
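The two cases can be folded into a single lookup table computed once from the pattern. The following is a minimal sketch under stated assumptions, not the author's code: the helper name `buildShiftTable` and the 256-entry table indexed by byte value are both illustrative choices. Absent characters get a shift of m+1 (case 1), and characters present in the pattern get m-k for their rightmost position k (case 2).

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the article): for every possible byte value,
// how far the pattern moves when that byte is the character just past the window.
std::array<std::size_t, 256> buildShiftTable(std::string_view pattern)
{
    const std::size_t m = pattern.size();
    std::array<std::size_t, 256> shift;

    // Case (1): the character does not occur in the pattern, so the pattern
    // jumps completely past it, a shift of m + 1.
    shift.fill(m + 1);

    // Case (2): the character occurs at position k, shift by m - k.
    // Later positions overwrite earlier ones, so the rightmost occurrence wins.
    for (std::size_t k = 0; k < m; ++k)
        shift[static_cast<unsigned char>(pattern[k])] = m - k;

    return shift;
}
```

For the pattern "abcde" this gives a shift of 5 for 'a', 1 for 'e', and 6 for any character that is not in the pattern.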

Figure 3: Case 2 of the Sunday algorithm

And so on. If every character of the pattern matches, the search succeeds. Otherwise, shift and start the next round, until the window reaches the right end of the main string S. The worst-case time complexity of this algorithm is O(n * m). The algorithm is especially fast when matching short pattern strings.

Code:

    #include <cstring>
    #include <iostream>
    using namespace std;

    // Assumes both strings contain only lowercase letters 'a'..'z'.
    int Sunday(const char *src, const char *des)
    {
        int i, j, pos = 0;
        int len_s, len_d;
        int next[26] = {0};                   // next array, filled during preprocessing
        len_s = strlen(src);
        len_d = strlen(des);
        for (j = 0; j < 26; ++j)              // initialize the next array:
            next[j] = len_d + 1;              // absent characters skip the whole pattern
        for (j = 0; j < len_d; ++j)           // set the next array (rightmost occurrence wins)
            next[des[j] - 'a'] = len_d - j;
        while (pos < (len_s - len_d + 1))     // traverse the original string
        {
            i = pos;
            for (j = 0; j < len_d; ++j, ++i)  // compare
            {
                if (src[i] != des[j])         // mismatch: jump ahead in the original string
                {
                    if (pos + len_d >= len_s) // no character after the window, so no match
                        return -1;
                    pos += next[src[pos + len_d] - 'a'];
                    break;
                }
            }
            if (j == len_d)                   // the whole pattern matched
                return pos;
        }
        return -1;                            // if no substring exists, -1 is returned
    }

    int main()
    {
        char src[] = "abcdacdaahfacabcdabcdeaa";
        char des[] = "abcde";
        cout << Sunday(src, des) << endl;
        return 0;
    }
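The article's code indexes its table with `c - 'a'`, so it only works for lowercase letters. As a hedged sketch of one way to lift that restriction and to observe the O(n * m) worst case, the variant below uses a 256-entry table indexed by byte value and optionally counts character comparisons. The function name `sundaySearch`, the `comparisons` out-parameter, and the use of `std::string_view` are assumptions made for illustration, not part of the original.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Returns the index of the first match, or -1 if there is none.
// If `comparisons` is non-null, it receives the number of character comparisons made.
long sundaySearch(std::string_view text, std::string_view pattern,
                  std::size_t* comparisons = nullptr)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    std::size_t count = 0;
    long result = -1;

    // Shift table: m + 1 for absent bytes, m - k for the rightmost position k.
    std::array<std::size_t, 256> shift;
    shift.fill(m + 1);
    for (std::size_t k = 0; k < m; ++k)
        shift[static_cast<unsigned char>(pattern[k])] = m - k;

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the window left to right, counting every comparison.
        std::size_t j = 0;
        while (j < m) {
            ++count;
            if (text[pos + j] != pattern[j]) break;
            ++j;
        }
        if (j == m) { result = static_cast<long>(pos); break; }

        // The shift is decided by the character just past the window;
        // if there is none, no further alignment is possible.
        if (pos + m >= n) break;
        pos += shift[static_cast<unsigned char>(text[pos + m])];
    }

    if (comparisons) *comparisons = count;
    return result;
}
```

With the article's test strings, `sundaySearch("abcdacdaahfacabcdabcdeaa", "abcde")` returns 17. A text of sixteen 'a' characters searched for "aaab" shows the slow side: every window is compared almost to the end before failing, and the shift is only 2, so the sketch makes 28 comparisons over 7 alignments without finding a match.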
