KMP string matching algorithm [Z]

Last Update:2018-12-05 Source: Internet

Author: User


The plain string matching algorithm needs no explanation, so I will simply paste the code:

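// Plain matching: returns the index of the first occurrence of sub in str, or -1.
// Note: this shares its name with the standard library strstr but takes
// its arguments in the opposite order (substring first).
int strstr(char *sub, char *str) {
    int i = 0;                        // number of characters matched so far
    char *p = str, *q = sub;          // p: start of the current attempt in str
    while (*(p + i) != '\0' && *(q + i) != '\0') {
        if (*(q + i) == *(p + i))
            i++;
        else {
            p++;                      // mismatch: restart one position later
            i = 0;                    // and go back to the head of sub
        }
    }
    if (*(q + i) == '\0')             // reached the end of sub: full match
        return (int)(p - str);
    return -1;
}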

Next, let me describe the KMP algorithm as I understand it. Unlike the plain matching algorithm, when a match fails KMP does not restart from the head of the substring. Instead, it uses a next function to compute the position in the substring from which matching should continue. This is easier to describe with an example.

Red marks the position where the match fails, '^' marks the current pointer position, and '~' marks the position in the parent string where the current attempt started.

With the plain matching algorithm, when a match fails the pointer into substring C goes back to the head, and the pointer into parent string A goes back to the character after '~'. That is, the next attempt starts by comparing the second character of A ('b') with the first character of C ('a'). This repeats until C matches at some position in A.

However, the matching process above shows that the blue parts of A and C were already confirmed equal in the first step. The comparisons in the four steps above can therefore be viewed as comparing the front of the blue part with its back. Since the blue part is already known to match, these comparisons are wasted: they do not depend on the parent string A at all and only reflect information about the substring C itself. Only step 4, the comparison involving 'c', touches a character of the parent string.

The KMP algorithm exploits this: when a match fails, it uses information about the substring itself to compute where matching should continue. That is, when the last 'b' of C fails to match, KMP skips the unnecessary steps 1, 2 and 3, and the next comparison is between 'd' and 'c'. This way the pointer into parent string A never needs to backtrack, and a next function determines how far the pointer into C falls back. The next function is therefore the key to the algorithm.
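The repeated work of the plain matcher can be made visible by counting its character comparisons. The following is a minimal sketch, not the article's code: `countNaiveComparisons` is a hypothetical helper, and the strings "abacabadabacabab" and "abacabab" used below are an assumption, chosen to agree with the characters mentioned in the text because the original figure is not available.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: runs the plain matcher and returns how many
// character comparisons it performs before it stops.
std::size_t countNaiveComparisons(const std::string& text, const std::string& pat) {
    std::size_t start = 0;        // where the current attempt begins in text
    std::size_t i = 0;            // characters matched in the current attempt
    std::size_t comparisons = 0;
    while (start + i < text.size() && i < pat.size()) {
        ++comparisons;
        if (text[start + i] == pat[i]) {
            ++i;
        } else {
            ++start;              // parent string pointer backtracks to start + 1
            i = 0;                // substring pointer goes back to the head
        }
    }
    return comparisons;
}
```

For the assumed 16-character parent string "abacabadabacabab" and substring "abacabab", this returns 28: the characters between the first mismatch and the real match are examined several times, which is exactly the work KMP is designed to avoid.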

From the analysis above, the next function is determined solely by the substring C itself.

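Assume that the substring is P, with characters P[0], P[1], ..., P[n-1].

next(j) = k (k >= 0): when character P[j + 1] fails to match, the parent string pointer does not backtrack, and the next step compares P[k + 1] with the current character of the parent string. The next function can be expressed as

next(j) = max{ k | 0 <= k < j and P[0 ~ k] = P[j-k ~ j] }    if such a k exists
next(j) = -1                                                 otherwise            (1)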

When next(j) = k (k >= 0), the substring pointer goes back to position k + 1 and the parent string pointer stays where it is;
When next(j) = -1, the substring pointer goes back to the head, and the parent string pointer moves one step forward if the first character of the substring does not match either.
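To make the definition concrete, the next values can be evaluated directly, without any recursion: next(j) is the largest k < j for which the first k + 1 characters of P[0 ~ j] equal its last k + 1 characters, or -1 if there is no such k. The sketch below assumes exactly that reading of the definition; `nextByDefinition` is a hypothetical name, and the routine is deliberately slow and only meant for checking faster versions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: evaluates the definition of next directly.
std::vector<int> nextByDefinition(const std::string& P) {
    std::vector<int> next(P.size(), -1);
    for (std::size_t j = 0; j < P.size(); ++j) {
        // Try the longest candidate first: k runs from j - 1 down to 0.
        for (int k = static_cast<int>(j) - 1; k >= 0; --k) {
            // Prefix P[0 ~ k] and suffix P[j-k ~ j] both have k + 1 characters.
            if (P.compare(0, k + 1, P, j - k, k + 1) == 0) {
                next[j] = k;
                break;
            }
        }
    }
    return next;
}
```

For P = "abacabab" this gives -1, -1, 0, -1, 0, 1, 2, 1.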

When writing a program to compute the next values, we do not need to search for the maximum k from scratch at every step. The values can be computed recursively.

For example:
Assume that the substring is P = "abacabab" and we want the next value of the last 'b', i.e. next[7].
Suppose next[0 ~ 6] are already known: next[0] = -1, next[1] = -1, next[2] = 0, next[3] = -1, next[4] = 0, next[5] = 1, next[6] = 2.

"aba c aba b"

next[6] = 2 means that P[0 ~ 2] (blue) and P[4 ~ 6] (red) are the same.

To find next[7], take the longest prefix of the first seven characters ("abacaba") that is equal to a suffix of them, and check whether the character following that prefix is equal to P[7]. In this example the prefix is P[0 ~ next[6]], i.e. P[0 ~ 2]. So we compare 'c' and 'b', that is, P[next[6] + 1] ('c') and P[7] ('b').

If they are equal, next[7] = next[6] + 1.
If not, we look for a shorter prefix that is equal to a suffix. Because the prefix "aba" and the suffix "aba" are identical, finding a pair shorter than "aba" in "abacaba" is the same as finding one inside "aba" itself, and that is given by next[2], i.e. next[next[6]]. We then compare P[next[next[6]] + 1] and P[7]. If they still differ, we keep looking for shorter pairs in the same way.

In the preceding example, P[next[6] + 1] = P[3] ('c') is not equal to P[7] ('b'), but P[next[next[6]] + 1] = P[next[2] + 1] = P[1] ('b') is equal to P[7] ('b'), so

next[7] = next[next[6]] + 1 = next[2] + 1 = 1;
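The fallback in this worked example can be traced in code. The sketch below is only an illustration: `fallbackChain` is a hypothetical helper that lists every k tried while computing next[i], assuming next[0 ~ i-1] is already filled in and i >= 1.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the sequence of k values that are tried
// while computing next[i] from the already known next[0 ~ i-1].
std::vector<int> fallbackChain(const std::string& P, const std::vector<int>& next, int i) {
    std::vector<int> tried;
    int k = next[i - 1];
    while (true) {
        tried.push_back(k);
        // Stop when P[k + 1] extends the pair, or when no shorter pair is left.
        if (P[k + 1] == P[i] || k < 0) break;
        k = next[k];              // fall back to the next shorter pair
    }
    return tried;
}
```

With P = "abacabab", the known values -1, -1, 0, -1, 0, 1, 2 and i = 7, the chain is 2, 0: k = 2 fails because P[3] is 'c', and k = 0 succeeds because P[1] is 'b', which gives next[7] = 0 + 1 = 1.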

Code for calculating the next value:

#include <cstring>

// Fills next[0 ~ strlen(P)-1]. P must not be empty.
void calnext(char *P, int next[]) {
    int n = (int)strlen(P);            // compute the length once, not on every iteration
    next[0] = -1;                      // the next of the first element is always -1: according to (1), there is no k smaller than j = 0
    for (int i = 1; i < n; i++) {
        int k = next[i - 1];           // computed recursively: next[i - 1] is already known
        while (k >= 0 && P[k + 1] != P[i]) {   // fall back to the next shorter prefix/suffix pair
            k = next[k];
        }
        if (P[k + 1] == P[i])          // the following characters are equal: the pair P[0 ~ k] can be extended
            next[i] = k + 1;           // by one more matching character
        else
            next[i] = -1;              // no prefix of P is a suffix of P[0 ~ i]
    }
}
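The next values used here differ by one from the prefix function (often written pi) found in many textbook presentations of KMP, where pi[i] is the length of the longest proper prefix of P[0 ~ i] that is also its suffix, so that next[i] = pi[i] - 1. The sketch below shows that relationship under this assumption; `prefixFunction` and `nextFromPrefixFunction` are hypothetical names, not part of the article's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// pi[i]: length of the longest proper prefix of P[0 ~ i] that is also a suffix.
std::vector<int> prefixFunction(const std::string& P) {
    std::vector<int> pi(P.size(), 0);
    for (std::size_t i = 1; i < P.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && P[k] != P[i]) k = pi[k - 1];   // fall back to a shorter border
        if (P[k] == P[i]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Converts to the convention used in this article: an index instead of a
// length, with -1 meaning "no such prefix".
std::vector<int> nextFromPrefixFunction(const std::string& P) {
    std::vector<int> next = prefixFunction(P);
    for (int& v : next) --v;
    return next;
}
```

For "abacabab" the prefix function is 0, 0, 1, 0, 1, 2, 3, 2, and subtracting one from each entry gives the next values -1, -1, 0, -1, 0, 1, 2, 1 used in the example above.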

Matching code:

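// Returns the index of the first occurrence of pat in T, or -1 if there is none.
int find(char *T, char *pat) {
    int n = (int)strlen(pat);
    if (n == 0)                        // empty pattern matches at index 0; also keeps calnext from writing into an empty array
        return 0;
    int *next = new int[n];
    calnext(pat, next);
    char *p = T, *q = pat;
    int i = 0;                         // number of pattern characters matched so far
    while (*p != '\0' && *(q + i) != '\0') {
        if (*p == *(q + i)) {
            p++;
            i++;
        } else {
            if (i == 0)
                p++;                   // nothing matched: advance the parent string pointer
            else
                i = next[i - 1] + 1;   // fall back in the pattern; p does not move
        }
    }
    delete[] next;                     // release the next table
    if (*(q + i) == '\0')
        return (int)(p - T - n);
    else
        return -1;
}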
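One way to gain confidence in the matcher is to compare it with a library search on a few inputs. The following sketch assumes a std::string port of the same logic: `kmpFind` and `agreesWithStdFind` are hypothetical names, and only `std::string::find` is a standard library call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical std::string variant of find(): index of the first
// occurrence of pat in text, or -1.
int kmpFind(const std::string& text, const std::string& pat) {
    if (pat.empty()) return 0;
    const int n = static_cast<int>(pat.size());
    std::vector<int> next(n, -1);
    for (int i = 1; i < n; ++i) {                // same recursion as calnext
        int k = next[i - 1];
        while (k >= 0 && pat[k + 1] != pat[i]) k = next[k];
        next[i] = (pat[k + 1] == pat[i]) ? k + 1 : -1;
    }
    std::size_t p = 0;                           // parent string position, never decreases
    int i = 0;                                   // pattern characters matched so far
    while (p < text.size() && i < n) {
        if (text[p] == pat[i]) { ++p; ++i; }
        else if (i == 0) ++p;
        else i = next[i - 1] + 1;
    }
    return i == n ? static_cast<int>(p) - n : -1;
}

// Returns true when kmpFind agrees with std::string::find on this input.
bool agreesWithStdFind(const std::string& text, const std::string& pat) {
    const std::size_t expected = text.find(pat);
    const int got = kmpFind(text, pat);
    return expected == std::string::npos ? got == -1
                                         : got == static_cast<int>(expected);
}
```

For instance, kmpFind("abacabadabacabab", "abacabab") returns 8, and agreesWithStdFind can be called on any pair of strings to check an edge case such as overlapping partial matches.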

I am writing down my understanding of KMP here so that I do not forget it.

This article is an English version of an article originally written in Chinese on aliyun.com.
