A simple C-language example of the KMP algorithm and an exploration of how it works

Source: Internet
Author: User

I had come across the KMP algorithm before. Back then it always felt esoteric: I spent a whole midday chewing through my data structures textbook and came away with only a rough understanding, and later the only takeaway left whenever KMP came up was "Oh, that's the one for pattern matching." Recently I had some time and looked it up in Introduction to Algorithms, and it turns out to be quite simple (the idea is simple, that is, leaving the implementation aside).

The classic application of pattern matching is finding the position of a pattern string within a string. For example, in "abcdef" the pattern "cde" appears at the third position of the original string. Let's start with the basics.

A simple pattern matching algorithm

A: ABCDEFG    B: CDE

First, align B with the first character of A and compare character by character (b++ == a++). If every comparison succeeds, return the position; if not, stop, restart from the second position of A, and so on.

The code is as follows:

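/*
 * Houkai, 2014-9-16
 * Function: pattern matching
 */
#include <iostream>
using namespace std;

// Returns the 0-based index of the first occurrence of b in a, or -1 if none.
int index(const char *a, const char *b)
{
    int tarindex = 0;                 // candidate start position in a
    while (a[tarindex] != '\0')
    {
        int tarlen = tarindex;        // current position in a for this attempt
        int patlen;
        for (patlen = 0; b[patlen] != '\0'; patlen++)
        {
            if (a[tarlen++] != b[patlen])
            {
                break;                // mismatch: give up on this start position
            }
        }
        if (b[patlen] == '\0')        // reached the end of b: full match
        {
            return tarindex;
        }
        tarindex++;
    }
    return -1;
}

int main()
{
    const char *a = "abcdef";
    const char *b = "cdf";
    // "cdf" does not occur in "abcdef", so this prints -1
    cout << index(a, b) << endl;
    return 0;
}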

The idea is simple and direct, but the time complexity is O(mn), where m and n are the lengths of the main string and the pattern string, respectively. Pattern matching is a very common problem, so many people have worked on optimizing it, which led to the Rabin-Karp algorithm, finite-automaton matching, and the KMP (Knuth-Morris-Pratt) algorithm.
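
To make the O(mn) behaviour concrete, here is a minimal sketch of the same naive matcher that also counts character comparisons. The function name naive_index_counted and the comparisons parameter are illustrative assumptions, not part of the original program.

```cpp
#include <cstddef>
#include <string>

// Naive matcher that also counts character comparisons (hypothetical helper
// for illustration). Returns the 0-based index of the first match, or -1 if
// the pattern does not occur.
int naive_index_counted(const std::string& text, const std::string& pat,
                        long& comparisons) {
    comparisons = 0;
    // Only start positions where the whole pattern still fits are tried.
    for (std::size_t start = 0; start + pat.size() <= text.size(); ++start) {
        std::size_t j = 0;
        while (j < pat.size()) {
            ++comparisons;                         // one character comparison
            if (text[start + j] != pat[j]) break;  // mismatch: next start
            ++j;
        }
        if (j == pat.size()) return static_cast<int>(start);
    }
    return -1;
}
```

For a text of 1000 'a' characters and the pattern "aaaaaaaaab" (m = 10), each of the 991 possible start positions costs 10 comparisons before failing, so the call returns -1 after 9910 comparisons. That is the (n-m+1)*m pattern behind the O(mn) bound.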

KMP algorithm

Where to optimize: suppose we know that the first character a of the pattern differs from the characters that follow it. If the first comparison then finds that the next 4 characters all match, the next attempt can align a directly with f. In other words, backtracking the main-string position i is unnecessary. This is the most basic and most important idea and goal of KMP.

Another example:

Because the abc at the start of the pattern equals the abc that appears later in it, the alignment marked in red can be obtained directly. The previous comparison has already shown that this abc matches, so it does not need to be compared again; comparison simply resumes at f and a. Again, the main-string position i does not need to backtrack. What changes is the position j in the pattern string (j does not have to restart from 1, as the second example shows).

How j changes depends on how similar the prefix of the pattern string is to a later part of it. In example 2, the leading abc equals the abc just before x, so the matching prefix is abc and matching resumes at j=4.

The new j is the length of this prefix within the part of the pattern matched so far (its first few characters, 6 in the preceding example), plus 1. It depends on a prefix of the pattern and an identical substring that ends just before the current position (a suffix), because the next step slides that prefix into the place the suffix occupied. The longest such prefix must be used, because sliding the pattern any further could skip over a possible match.

So if the current position is j, the next position is the length of the longest prefix of the substring before j that is also its suffix, plus 1, and this new position can then be compared directly with position i of the main string.

The current position is j, but what exactly is the next one, and how is it calculated? In fact, this j->x relationship can be built just by looking at the pattern string. It is called the prefix function, and its results are stored in an array called the prefix array.

Pseudo code:

COMPUTE-PREFIX-FUNCTION(P)
    m <- length[P]
    pi[1] <- 0
    k <- 0
    for q <- 2 to m
        do while k > 0 and P[k+1] != P[q]
               do k <- pi[k]    // the prefix of the prefix ...
           if P[k+1] == P[q]
               then k <- k + 1
           pi[q] <- k
    return pi
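
The pseudocode above uses 1-based indices. As a minimal sketch, it could be written in C++ with 0-based indices as follows; the function name compute_prefix and the use of std::vector are choices made here for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// 0-based prefix function: pi[q] is the length of the longest proper prefix
// of p[0..q] that is also a suffix of p[0..q].
std::vector<int> compute_prefix(const std::string& p) {
    std::vector<int> pi(p.size(), 0);
    int k = 0;  // length of the prefix matched so far
    for (std::size_t q = 1; q < p.size(); ++q) {
        while (k > 0 && p[k] != p[q]) {
            k = pi[k - 1];  // fall back to the prefix of the prefix
        }
        if (p[k] == p[q]) {
            ++k;            // the matched prefix grows by one character
        }
        pi[q] = k;
    }
    return pi;
}
```

Taking "abcabcx" as an assumed stand-in for the pattern of the second example, compute_prefix returns {0, 0, 0, 1, 2, 3, 0}. The value 3 at the sixth character says that the prefix abc equals the suffix abc of "abcabc", which corresponds to resuming at j=4 in the 1-based description above.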

With the prefix array, pattern matching can be carried out quickly. The following procedure finds all occurrences of the pattern in the string:

KMP-MATCHER(T, P)
    n <- length[T]
    m <- length[P]
    pi <- COMPUTE-PREFIX-FUNCTION(P)
    q <- 0
    for i <- 1 to n
        do while q > 0 and P[q+1] != T[i]
               do q <- pi[q]    // the prefix of the prefix ...
           if P[q+1] == T[i]
               then q <- q + 1
           if q == m
               then print "pattern occurs with shift" i-m
                    q <- pi[q]

The two pieces of pseudocode follow exactly the same idea: when a comparison fails, fall back to the prefix of the prefix ..., which is rather ingenious. If KMP is hard to understand, the difficulty probably lies in this pseudocode.

The time complexity of the KMP algorithm is O(n+m).
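
The two procedures can be put together into a runnable form. The following is a sketch in C++ with 0-based indices; the names prefix_function and kmp_search, the choice to return the shifts in a vector instead of printing them, and the handling of an empty pattern are all assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Prefix function, 0-based: pi[q] is the length of the longest proper prefix
// of p[0..q] that is also a suffix of p[0..q].
std::vector<int> prefix_function(const std::string& p) {
    std::vector<int> pi(p.size(), 0);
    int k = 0;
    for (std::size_t q = 1; q < p.size(); ++q) {
        while (k > 0 && p[k] != p[q]) k = pi[k - 1];
        if (p[k] == p[q]) ++k;
        pi[q] = k;
    }
    return pi;
}

// Returns the 0-based start index of every occurrence of p in t.
std::vector<int> kmp_search(const std::string& t, const std::string& p) {
    std::vector<int> shifts;
    if (p.empty()) return shifts;  // assumption: an empty pattern reports nothing
    const int n = static_cast<int>(t.size());
    const int m = static_cast<int>(p.size());
    const std::vector<int> pi = prefix_function(p);
    int q = 0;  // number of pattern characters matched so far
    for (int i = 0; i < n; ++i) {  // i only moves forward, never backtracks
        while (q > 0 && p[q] != t[i]) {
            q = pi[q - 1];         // reuse the part that is known to match
        }
        if (p[q] == t[i]) ++q;
        if (q == m) {
            shifts.push_back(i - m + 1);  // full match ending at position i
            q = pi[q - 1];                // keep going to find overlapping matches
        }
    }
    return shifts;
}
```

For example, kmp_search("abababa", "aba") returns {0, 2, 4}, including the overlapping occurrences, and kmp_search("abcdef", "cde") returns {2}.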

It should be emphasized that KMP shows its advantage only when the pattern and the main string have many partial matches, because during a partial match KMP's i does not need to backtrack; otherwise it is hardly different from naive pattern matching.
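
This point can be checked by counting character comparisons. The sketch below is only an illustration: the function names naive_comparisons and kmp_comparisons are hypothetical, both functions scan the whole text without stopping at the first match, and the KMP count covers the text scan only, not the construction of the prefix array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Character comparisons made by the naive matcher over all start positions.
long naive_comparisons(const std::string& t, const std::string& p) {
    long count = 0;
    for (std::size_t s = 0; s + p.size() <= t.size(); ++s) {
        for (std::size_t j = 0; j < p.size(); ++j) {
            ++count;
            if (t[s + j] != p[j]) break;
        }
    }
    return count;
}

// Character comparisons made by KMP while scanning the text.
long kmp_comparisons(const std::string& t, const std::string& p) {
    if (p.empty()) return 0;
    const int m = static_cast<int>(p.size());
    std::vector<int> pi(m, 0);  // prefix array, 0-based
    for (int q = 1, k = 0; q < m; ++q) {
        while (k > 0 && p[k] != p[q]) k = pi[k - 1];
        if (p[k] == p[q]) ++k;
        pi[q] = k;
    }
    long count = 0;
    int q = 0;  // number of pattern characters matched so far
    for (char c : t) {
        while (true) {
            ++count;                        // compare p[q] with the text character
            if (p[q] == c) { ++q; break; }  // match: advance in the pattern
            if (q == 0) break;              // nothing left to fall back to
            q = pi[q - 1];                  // fall back, text position stays put
        }
        if (q == m) q = pi[q - 1];          // full match: continue searching
    }
    return count;
}
```

With a text of 1000 'a' characters and the pattern "aaaaaaaaab", where almost every position is a long partial match, the naive matcher makes 9910 comparisons and KMP makes 1991. With the same text and the pattern "bbbbbbbbbb", where there are no partial matches at all, the naive matcher makes 991 comparisons and KMP makes 1000, so KMP gains nothing.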
