Data structure--simple pattern of string and KMP matching algorithm

Source: Internet
Author: User

First, the naive pattern matching algorithm

Suppose we want to find the substring t = "google" in the main string s = "goodgoogle".


Let i be the current subscript in the main string and j the current subscript in the substring. In the first round (starting at i = 1), the first three characters match and the comparison fails at i = 4, j = 4 ('d' against 'g'). The pointers then fall back: i returns to 2, j returns to 1, and comparison starts again from i = 2.


This repeats until either j exceeds the substring length (a complete match) or the main string is exhausted; no start position beyond (main string length - substring length + 1) can produce a match. In this example, the main string and the substring match exactly when the comparison starts at i = 5.


#include <stdio.h>

/* Naive pattern matching.
 * s[0] holds the length of the main string and t[0] the length of the
 * substring; characters are stored from index 1. Returns the 1-based
 * position of the first match at or after pos, or 0 if there is none. */
int Index(const char s[], const char t[], int pos)
{
    int i = pos;  /* current subscript in the main string */
    int j = 1;    /* current subscript in the substring */
    while (i <= s[0] && j <= t[0]) {
        if (s[i] == t[j]) {
            /* characters equal: continue with the next pair */
            ++i;
            ++j;
        } else {
            /* mismatch: move i back to the next start position, restart j */
            i = i - j + 2;
            j = 1;
        }
    }
    /* j > t[0] means the whole substring was matched */
    if (j > t[0])
        return i - t[0];
    else
        return 0;
}

int main(void)
{
    char s[] = {10, 'g', 'o', 'o', 'd', 'g', 'o', 'o', 'g', 'l', 'e'};
    char t[] = {6, 'g', 'o', 'o', 'g', 'l', 'e'};
    int i = Index(s, t, 1);
    printf("%d\n", i);  /* prints 5 */
    return 0;
}
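The code above follows the convention of storing the length in element 0 and using 1-based subscripts. For comparison, here is a minimal sketch of the same naive algorithm on ordinary NUL-terminated C strings with 0-based indexing. The name naive_index and the -1 "not found" return value are assumptions of this sketch, not part of the original code; note that the 1-based position 5 above corresponds to the 0-based index 4 here.

```c
#include <assert.h>
#include <string.h>

/* Naive (brute-force) matching on NUL-terminated C strings.
 * Returns the 0-based index of the first occurrence of t in s at or
 * after pos, or -1 if there is none (hypothetical helper). */
int naive_index(const char *s, const char *t, size_t pos)
{
    size_t m = strlen(s);
    size_t n = strlen(t);
    assert(pos <= m);
    /* only start positions pos .. m - n can hold a complete match */
    for (size_t start = pos; start + n <= m; ++start) {
        size_t j = 0;
        while (j < n && s[start + j] == t[j])
            ++j;                  /* extend the current partial match */
        if (j == n)
            return (int)start;    /* whole substring matched */
    }
    return -1;
}
```

For example, naive_index("goodgoogle", "google", 0) returns 4.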
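Analysis: in the best case the substring matches at the very first position, so the number of comparisons is just the substring length (O(1) if that length is treated as a constant). In the worst case every attempt fails only at the last character of the substring, giving (m - n + 1) * n comparisons, i.e. O(m*n), where m is the length of the main string and n is the length of the substring.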
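To make the worst case concrete, the following sketch counts the character comparisons made by the naive algorithm (0-based C strings; the function name naive_comparisons is hypothetical). With a main string of nine '0' characters followed by '1' (m = 10) and the substring "00001" (n = 5), each of the m - n + 1 = 6 attempts makes n = 5 comparisons, 30 in total, which is exactly the (m - n + 1) * n bound.

```c
#include <assert.h>
#include <string.h>

/* Counts the character comparisons made by the naive algorithm on
 * NUL-terminated strings; the search stops at the first full match. */
long naive_comparisons(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    size_t m = strlen(s), n = strlen(t);
    long count = 0;
    for (size_t start = 0; start + n <= m; ++start) {
        size_t j = 0;
        while (j < n) {
            ++count;              /* one comparison of s[start + j] with t[j] */
            if (s[start + j] != t[j])
                break;            /* mismatch: try the next start position */
            ++j;
        }
        if (j == n)
            break;                /* full match found */
    }
    return count;
}
```

By contrast, naive_comparisons("googlegood", "google") needs only 6 comparisons, the best case described above.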

Second, the KMP algorithm

For strings with many repeated characters, such as binary strings made of 0s and 1s, the naive algorithm above is very slow because it re-scans the same characters of the main string again and again. The KMP algorithm avoids most of this repeated traversal.

Let's look at the basic idea of the KMP algorithm.


Suppose that in the first round of comparison between the main string S and the substring T, the first 5 characters are equal and the first mismatch occurs at i = 6, j = 6. Since the 5 characters "abcde" at the start of T are all different from one another, and S[2..5] equals T[2..5], the first character 'a' of T cannot equal any of S[2], S[3], S[4] or S[5]. Those start positions can therefore be skipped, and comparison can continue directly at i = 6.


Similarly, suppose the substring T contains repeated characters (for example, the characters at j = 1, 2 equal those at j = 4, 5). By the analysis above we could jump directly to start position i = 4. But we already know that T[1..2] equals T[4..5], and that S[4..5] equals T[4..5], so S[4..5] must equal T[1..2]. Comparing i = 4, 5 against j = 1, 2 again is unnecessary; the next comparison can be S[6] against T[3].

The KMP pattern matching algorithm never moves the i pointer back. Since i does not fall back, we have to change the value of j instead. The observations above show that how j changes has nothing to do with the main string; it depends only on whether the substring T contains repeated parts, that is, a prefix of the already matched part of T that equals a suffix of it.
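This observation can be checked directly: after T[1..j-1] has matched and T[j] fails, comparison can resume at j = k + 1, where k is the length of the longest proper prefix of T[1..j-1] that is also a suffix of it. The brute-force helper below (hypothetical name longest_border, 0-based C strings, for illustration only) computes k; note that it never looks at the main string.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest proper prefix of the first k characters of t
 * that is also a suffix of those k characters (brute force). */
size_t longest_border(const char *t, size_t k)
{
    assert(k <= strlen(t));
    /* try the longest candidate first, then shorter ones */
    for (size_t len = (k > 0 ? k - 1 : 0); len > 0; --len) {
        /* compare t[0 .. len-1] with t[k-len .. k-1] */
        if (memcmp(t, t + k - len, len) == 0)
            return len;
    }
    return 0;
}
```

With the example strings "abcdex" and "abcabx" (assumed here for illustration) and a mismatch at j = 6, longest_border gives k = 0 for "abcde" (resume at j = 1) and k = 2 for "abcab" (resume at j = 3), matching the two cases discussed above.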

For each position j of T, we record the value that j should change to after a mismatch at T[j] in an array next, so next has the same length as T. next[j] is determined by the longest prefix of T[1..j-1] that is also a suffix of it, which gives the following function:
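The formula itself is not preserved in this copy. A reconstruction consistent with get_next below (1-based subscripts, T_j denotes the j-th character of T) is the standard definition:

```latex
\mathrm{next}[j] =
\begin{cases}
0, & j = 1 \\[4pt]
\max\{\, k \mid 1 < k < j,\ T_1 \cdots T_{k-1} = T_{j-k+1} \cdots T_{j-1} \,\}, & \text{if this set is nonempty} \\[4pt]
1, & \text{otherwise}
\end{cases}
```

For example, for T = "abcabx" this gives next = 0 1 1 1 2 3: at j = 6, k = 3 works because T_1 T_2 = "ab" = T_4 T_5.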


#include <stdio.h>

/* Compute the next array for substring t.
 * t[0] is the length of t; characters are stored from t[1]. */
void get_next(const char t[], int *next)
{
    int i = 1;   /* t[i] is a single character of the suffix */
    int j = 0;   /* t[j] is a single character of the prefix */
    next[1] = 0;
    while (i < t[0]) {
        if (j == 0 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];   /* characters differ: j falls back */
        }
    }
}

int INDEX_KMP(const char s[], const char t[], int pos)
{
    int i = pos;
    int j = 1;
    int next[255];     /* assumes t[0] < 255 */
    get_next(t, next);
    while (i <= s[0] && j <= t[0]) {
        /* compared with the naive algorithm, a j == 0 test is added */
        if (j == 0 || s[i] == t[j]) {
            ++i;
            ++j;
        } else {
            /* j falls back to the appropriate position; i is unchanged */
            j = next[j];
        }
    }
    if (j > t[0])
        return i - t[0];
    else
        return 0;
}

int main(void)
{
    char s[] = {10, 'g', 'o', 'o', 'd', 'g', 'o', 'o', 'g', 'l', 'e'};
    char t[] = {6, 'g', 'o', 'o', 'g', 'l', 'e'};
    printf("%d\n", INDEX_KMP(s, t, 1));   /* prints 5 */
    return 0;
}
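For reference, here is a sketch of the same idea in the 0-based form common in other texts, working on NUL-terminated strings. It uses a failure array fail[q] (the length of the longest proper prefix of t[0..q] that is also a suffix of it) instead of the next array above; the two are related by next[j] = fail[j - 2] + 1 for j >= 2. The names kmp_index and fail and the -1 return value are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* KMP search on NUL-terminated C strings with 0-based indexing.
 * Returns the 0-based index of the first match of t in s, or -1
 * (also -1 if the failure array cannot be allocated). */
int kmp_index(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    size_t m = strlen(s), n = strlen(t);
    if (n == 0)
        return 0;
    size_t *fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* build the failure array from the pattern alone */
    fail[0] = 0;
    for (size_t q = 1, k = 0; q < n; ++q) {
        while (k > 0 && t[q] != t[k])
            k = fail[k - 1];      /* fall back to a shorter border */
        if (t[q] == t[k])
            ++k;
        fail[q] = k;
    }

    /* scan the main string once; i never moves backwards */
    int result = -1;
    for (size_t i = 0, j = 0; i < m; ++i) {
        while (j > 0 && s[i] != t[j])
            j = fail[j - 1];      /* only j falls back */
        if (s[i] == t[j])
            ++j;
        if (j == n) {
            result = (int)(i + 1 - n);
            break;
        }
    }
    free(fail);
    return result;
}
```

Both the table construction and the scan take linear time (O(n) and O(m) respectively), because j can fall back at most as many times as it has been advanced.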


