Data structures: naive string pattern matching and the KMP matching algorithm

Last Update:2014-12-07 Source: Internet

Author: User


First, naive pattern matching

Suppose we want to find the substring T = "google" in the main string S = "goodgoogle".

Let i be the current position (subscript) in the main string and j the current position in the substring. In the first round, which starts at i=1, the characters agree until i=4 and j=4, where they differ. Both pointers then fall back: j returns to 1 and the comparison restarts from i=2.

This repeats until the round starting at i = (main string length - substring length + 1) has been tried, or until j runs past the end of the substring, at which point the comparison loop exits. For the strings above, the substring matches completely in the round that starts at i=5.

#include <stdio.h>

// s[0] is the length of the main string, t[0] is the length of the substring
int Index(const char s[], const char t[], int pos)
{
    int i = pos;  // current subscript in the main string
    int j = 1;    // current subscript in the substring

    while (i <= s[0] && j <= t[0]) {
        if (s[i] == t[j]) {
            // equal: continue with the next pair of characters
            ++i;
            ++j;
        } else {
            // not equal: both pointers fall back
            i = i - j + 2;  // main string falls back to the next starting position
            j = 1;
        }
    }
    // j > t[0] means the substring was fully matched
    if (j > t[0])
        return i - t[0];
    else
        return 0;
}

int main(void)
{
    char s[] = {10, 'g', 'o', 'o', 'd', 'g', 'o', 'o', 'g', 'l', 'e'};
    char t[] = {6, 'g', 'o', 'o', 'g', 'l', 'e'};
    int i = Index(s, t, 1);
    printf("%d\n", i);
    return 0;
}

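Analysis of this matching algorithm: in the best case the substring matches in the very first round, so only a single pass over the substring is needed, which is O(n). In the worst case every round fails only on the last character of the substring, so up to (m - n + 1) * n comparisons are made and the time complexity is O(m*n), where m is the length of the main string and n is the length of the substring.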

Second, the KMP algorithm

For strings with many repeated characters, such as binary strings made of 0s and 1s, the naive matching above keeps re-traversing the same characters and becomes very slow. The KMP algorithm avoids most of this repeated traversal.

Let's look at the basic idea behind the KMP algorithm.

In the first round of comparison, the main string S and the substring T agree on the first 5 characters and differ only at position i=6, j=6. Because the five characters "abcde" at the start of T are all different from one another, and S equals T at positions 1 to 5, the first character "a" of T cannot equal the characters of S at positions 2, 3, 4 and 5. The rounds that start at those positions are therefore unnecessary, and the comparison can jump straight to i=6.

Next, suppose the substring T contains repeated characters, for example the characters at j=1,2 are the same as those at j=4,5, and S has already matched T through position 5. By the analysis above, we could jump directly to comparing from i=4. But we already know that the characters at j=1,2 equal those at j=4,5, and that the characters of S at i=4,5 equal those of T at j=4,5. So the comparisons of i=4,5 with j=1,2 can be skipped as well.

The KMP pattern matching algorithm is designed so that the pointer i never falls back. Since i does not fall back, what has to change is j. The observations above show that the new value of j has nothing to do with the main string; it depends only on whether the substring T contains repetition.

We record, for each position j of T, the value that j should change to after a mismatch at that position, in an array called next. The length of next equals the length of T, and it can be computed with the following function:

#include <stdio.h>

void get_next(const char t[], int *next)
{
    int i, j;
    i = 1;
    j = 0;
    next[1] = 0;
    // t[0] is the length of the substring T
    while (i < t[0]) {
        // t[i] is the current character of the suffix,
        // t[j] is the current character of the prefix
        if (j == 0 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

int Index_KMP(const char s[], const char t[], int pos)
{
    int i = pos;
    int j = 1;
    int next[255];
    get_next(t, next);

    while (i <= s[0] && j <= t[0]) {
        // compared with the naive algorithm, this adds the j == 0 check
        if (j == 0 || s[i] == t[j]) {
            ++i;
            ++j;
        } else {
            // j falls back to the appropriate position, i is unchanged
            j = next[j];
        }
    }
    if (j > t[0]) {
        return i - t[0];
    } else {
        return 0;
    }
}

int main(void)
{
    return 0;
}


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
