The KMP algorithm nearly killed me, but I finally understand it.

Source: Internet
Author: User
To understand the KMP algorithm, first look at the naive pattern matching function:

#include <string.h>

int match(char *string, char *pat) // the naive pattern matching function
{
    int i = 0, j = 0;
    int lens = strlen(string);
    int lenp = strlen(pat);

    while (i < lens && j < lenp) {
        if (string[i] == pat[j]) {
            i++;
            j++;
        } else {
            i = i - j + 1; // backtrack in string
            j = 0;         // restart the pattern
        }
    }

    if (j == lenp)
        return i - lenp; // index where the match starts
    else
        return -1;       // no match (0 would be a valid match index)
}

Backtracking in the string is what increases the time complexity of this algorithm, but simply dropping the backtracking can miss a successful match. For example:

string = aaabababcaaadfe
pat    =   ababc         // mismatch on the last character (i = 6, j = 4)

If we do not backtrack and simply restart the pattern at i = 6, the real match, which starts at i = 4, is missed:

string = aaabababcaaadfe
pat    =     ababc       // match successful

Why is it missed?
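The example above can be checked with a small sketch. The two function names are made up for this illustration; the second one is a deliberately broken variant that restarts the pattern at the current position instead of backtracking in the string.

```c
#include <string.h>

/* Naive matcher: on a mismatch, move i back to one past the start of
   the current attempt and restart the pattern. */
int match_with_backtrack(const char *s, const char *p)
{
    int i = 0, j = 0;
    int lens = (int)strlen(s), lenp = (int)strlen(p);

    while (i < lens && j < lenp) {
        if (s[i] == p[j]) {
            i++;
            j++;
        } else {
            i = i - j + 1;
            j = 0;
        }
    }
    return (j == lenp) ? i - lenp : -1;
}

/* Hypothetical broken variant: i never moves backward; on a mismatch
   the pattern simply restarts at the current i. */
int match_without_backtrack(const char *s, const char *p)
{
    int i = 0, j = 0;
    int lens = (int)strlen(s), lenp = (int)strlen(p);

    while (i < lens && j < lenp) {
        if (s[i] == p[j]) {
            i++;
            j++;
        } else if (j == 0) {
            i++;
        } else {
            j = 0;
        }
    }
    return (j == lenp) ? i - lenp : -1;
}
```

For string = aaabababcaaadfe and pat = ababc, the first function returns 4 and the second returns -1, which is exactly the missed match described above.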
This is exactly the problem KMP studies. The miss is caused by the nature of pat = ababc: the pattern overlaps with itself (the prefix ab appears again right before the c).
So how does KMP work?
KMP backtracks only in the pattern, while the position in the string never moves backward. This improves the efficiency of the algorithm, especially when the string is very long.
Now we can study the KMP algorithm.
In the example above, if we set j -= 2 at the mismatch and leave i unchanged, is the match still missed? No: pat[2] is now compared with string[6], and the rest of the matching function proceeds without any problem. This is the heart of the KMP algorithm. So how do we know by how much to reduce j? That is what the failure function defined by KMP tells us.
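To check the j -= 2 claim by hand, here is a minimal sketch. `continue_from` is a hypothetical helper, not part of the original code: it just keeps comparing from a given pair of positions and never moves i backward.

```c
#include <string.h>

/* Hypothetical helper: keep comparing from string index i and pattern
   index j, never moving i backward. Returns the start of the match,
   or -1 if another mismatch or the end of the string comes first. */
int continue_from(const char *s, const char *p, int i, int j)
{
    int lens = (int)strlen(s), lenp = (int)strlen(p);

    while (i < lens && j < lenp && s[i] == p[j]) {
        i++;
        j++;
    }
    return (j == lenp) ? i - lenp : -1;
}
```

With string = aaabababcaaadfe and pat = ababc, the mismatch happens at i = 6, j = 4. Continuing from i = 6, j = 2 finds the match at index 4, while continuing from i = 6, j = 0 returns -1.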
Let p = p0 p1 ... p(n-1) be a pattern. Its failure function f is defined as:
f(j) = the largest integer i such that i < j and p0 p1 ... pi = p(j-i) p(j-i+1) ... pj, if such an i >= 0 exists
f(j) = -1, otherwise
For example, for the pattern pat = abcabcacab, we have:
j      0  1  2  3  4  5  6  7  8  9
pat    a  b  c  a  b  c  a  c  a  b
f(j)  -1 -1 -1  0  1  2  3 -1  0  1
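The table can be double-checked straight from the definition. The sketch below is a slow, brute-force reading of the definition, meant only for verification; the name `failure_by_definition` is made up here and this is not how KMP computes the table in practice.

```c
#include <string.h>

/* Brute-force f(j): try every i from j-1 down to 0 and return the first
   (largest) one for which pat[0..i] equals pat[j-i..j].
   Assumes 0 <= j < strlen(pat). */
int failure_by_definition(const char *pat, int j)
{
    int i;

    for (i = j - 1; i >= 0; i--) {
        if (memcmp(pat, pat + (j - i), (size_t)(i + 1)) == 0)
            return i;
    }
    return -1; /* no prefix of pat ends at position j */
}
```

Calling it for j = 0 through 9 on abcabcacab gives the f(j) row of the table above.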

Feeling a bit dizzy? I was too. It doesn't matter, keep going! Life will get better! Compare this with the earlier idea and things start to make sense. Suppose a mismatch occurs at j = 6. Then f(5) + 1 = 3, so we can resume the comparison at j = 3. In other words, the pattern abcabcacab moves back three positions, because the three characters of the string just before position i must be abc. Starting to see it?
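The j = 6 case can be spelled out in a few lines. Both helper names below are invented for this sketch, and the failure table is assumed to be supplied by the caller (for instance, copied from the table above).

```c
#include <string.h>

/* After a mismatch at pattern index j > 0, the comparison resumes at
   pattern index f(j-1) + 1. */
int resume_index(const int *failure, int j)
{
    return failure[j - 1] + 1;
}

/* Returns 1 if the first k characters of pat equal the k characters
   that end just before index j (the part that was already matched),
   and 0 otherwise. Assumes 0 <= k <= j <= strlen(pat). */
int prefix_matches_tail(const char *pat, int j, int k)
{
    return memcmp(pat, pat + (j - k), (size_t)k) == 0;
}
```

For abcabcacab with the table above, `resume_index(failure, 6)` is 3, and `prefix_matches_tail("abcabcacab", 6, 3)` is 1: the three characters already matched before the mismatch are abc, the same as the first three characters of the pattern, so they do not need to be compared again.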
Anyhow, let's implement this failure function first:

/* Author: livahu
** 2006.11.26
*/
void fail(char *pat, int *failure)
{
    int i, j;
    int n = strlen(pat);
    failure[0] = -1; // define the initial condition
    for (j = 1; j < n; j++) {
        i = failure[j - 1];
        // fall back to shorter and shorter prefixes until one can be extended
        while (pat[j] != pat[i + 1] && i >= 0)
            i = failure[i];
        if (pat[j] == pat[i + 1])
            failure[j] = i + 1;
        else
            failure[j] = -1; // no overlap: no prefix of pat ends at position j
    }
}

Now let's look at how the failure function is used.
It gives us the pattern matching rule. Suppose s(i-j) ... s(i-1) = p0 p1 ... p(j-1) and s(i) != p(j). If j != 0, the next comparison is between the mismatched character s(i) and the pattern character p(f(j-1)+1). If j = 0, we move on and compare s(i+1) with p0.
Now implement the pattern matching function pmatch():

/* Author: livahu
** 2006.11.26
*/
#include <stdio.h>
#include <string.h>
#define max_string_size 100
#define max_pat_size 100

void fail(char *pat, int *failure); // the failure function defined earlier

int pmatch(char *string, char *pat)
{
    int failure[max_pat_size];
    int i = 0, j = 0;
    int lens = strlen(string);
    int lenp = strlen(pat);

    if (lenp > max_pat_size)
        return -1; // pattern does not fit in failure[]
    fail(pat, failure); // obtain the value of f at every position of pat

    while (i < lens && j < lenp) {
        if (string[i] == pat[j]) {
            i++;
            j++;
        } else if (j == 0) { // mismatch on the first pattern character: advance in string
            i++;
        } else {
            j = failure[j - 1] + 1; // resume at pattern character p(f(j-1)+1)
        }
    }

    return (j == lenp) ? (i - lenp) : -1; // match index, or -1 if there is no match
}
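For trying the whole thing out in one piece, here is a self-contained sketch that combines the logic of fail() and pmatch(). It is a variation, not the author's code: the name `kmp_search` is made up, and the table is allocated with malloc (an assumption of this sketch) so the pattern length is not limited by a fixed-size array.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the index of the first occurrence of pat in s, or -1 if there
   is none (or if memory for the table cannot be allocated). */
int kmp_search(const char *s, const char *pat)
{
    int lens = (int)strlen(s), lenp = (int)strlen(pat);
    int i, j, result;
    int *f;

    if (lenp == 0)
        return 0; /* the empty pattern matches at index 0 */
    f = malloc((size_t)lenp * sizeof *f);
    if (f == NULL)
        return -1;

    /* Build the failure table, same logic as fail(). */
    f[0] = -1;
    for (j = 1; j < lenp; j++) {
        i = f[j - 1];
        while (i >= 0 && pat[j] != pat[i + 1])
            i = f[i];
        f[j] = (pat[j] == pat[i + 1]) ? i + 1 : -1;
    }

    /* Scan the string, same logic as pmatch(); i never moves backward. */
    i = 0;
    j = 0;
    while (i < lens && j < lenp) {
        if (s[i] == pat[j]) {
            i++;
            j++;
        } else if (j == 0) {
            i++;
        } else {
            j = f[j - 1] + 1;
        }
    }
    result = (j == lenp) ? i - lenp : -1;
    free(f);
    return result;
}
```

On the earlier example, `kmp_search("aaabababcaaadfe", "ababc")` returns 4, the match that the version without backtracking missed.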

I really admire D. E. Knuth, J. H. Morris and V. R. Pratt. The principle is actually very simple, but they were the ones who worked it out and wrote it down.
Innovation is the hardest part!
