HDU 3336: interpreting the next array of the KMP algorithm

Source: Internet
Author: User


The problem is roughly this: given a string, count how many times each of its prefixes occurs in the string, and add the counts up. For example, in the string abab, a appears 2 times, ab appears 2 times, aba appears 1 time, and abab appears 1 time. The total is 6.

The result can be very large, so it must be taken modulo 10007.
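To make the problem statement concrete, here is a minimal brute-force sketch that counts the occurrences by direct comparison. The function name bruteForcePrefixCount is made up for illustration, and this approach is far too slow for the real input sizes; it is only meant as a reference to check small cases against.

```cpp
#include <string>

// Hypothetical helper (not from the original solution): for every prefix of s,
// count its occurrences in s by direct comparison, and return the total
// modulo 10007. Roughly cubic time in the worst case, so small inputs only.
int bruteForcePrefixCount(const std::string& s)
{
    const int MOD = 10007;
    int n = static_cast<int>(s.size());
    int sum = 0;
    for (int len = 1; len <= n; ++len)                  // prefix s[0..len)
        for (int start = 0; start + len <= n; ++start)  // candidate position
            if (s.compare(start, len, s, 0, len) == 0)
                sum = (sum + 1) % MOD;
    return sum;
}
```

For abab this gives 2 + 2 + 1 + 1 = 6, matching the example above.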

AC Code

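#include <iostream>
#include <string>

// std:: is spelled out because a global array named next can clash with
// std::next when "using namespace std;" is in effect.
std::string s;
int n, next[200005];

// Build the next array, using the convention next[0] = -1.
void getNext()
{
    int len = n;
    next[0] = -1;
    int i = 0, j = -1;
    while (i < len)
    {
        if (j == -1 || s[i] == s[j])
        {
            ++i;
            ++j;
            next[i] = j;
        }
        else
            j = next[j];
    }
}

int main()
{
    int t;
    std::cin >> t;
    while (t--)
    {
        std::cin >> n;
        std::cin >> s;
        getNext();
        int sum = 0;
        for (int i = 1; i <= n; i++)
        {
            int j = i;
            while (j)
            {
                sum = (sum + 1) % 10007;
                j = next[j];
            }
        }
        std::cout << sum << std::endl;
    }
    return 0;
}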

An overview of the next array of KMP

Pasting code is not the goal; explaining the algorithm is.

The solution is based on the KMP algorithm, but not on the complete algorithm. It only uses the construction of the next array.

That construction is, however, the heart of KMP itself. The getNext function in the code above is the classic implementation of the next array. It is template code and very easy to find.

The key here is to explain the idea behind the next array.

In the many write-ups floating around online, the next array is represented in roughly two ways:

    • The first element of the next array is -1
    • The first element of the next array is 0

The two are basically the same. Here I use the scheme whose first element is -1. Note that with this scheme, the size of the next array is the length of the pattern string + 1! For example:

Subscript         0   1   2   3   4
Pattern string    a   b   a   b
Next array       -1   0   0   1   2

Of course, the pattern here is a C++ string, not a C-style string, so I did not write '\0'. If it were a C-style string (a character array), the cell at subscript 4 (highlighted in red in the original) would hold '\0'. That is not the point here, though.
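As a small sketch of the table above, the same construction as getNext can be written as a function that returns the array. The name buildNext and the use of std::vector are choices made for this illustration only; the logic follows the getNext function from the AC code.

```cpp
#include <string>
#include <vector>

// Illustrative variant of getNext: returns the next array of pattern p,
// using the next[0] = -1 convention. The result has p.size() + 1 entries.
std::vector<int> buildNext(const std::string& p)
{
    int n = static_cast<int>(p.size());
    std::vector<int> next(n + 1);
    next[0] = -1;
    int i = 0, j = -1;
    while (i < n)
    {
        if (j == -1 || p[i] == p[j])
        {
            ++i;
            ++j;
            next[i] = j;   // longest border length of the prefix p[0..i)
        }
        else
            j = next[j];   // fall back to a shorter candidate border
    }
    return next;
}
```

For the pattern abab, buildNext returns the five values -1, 0, 0, 1, 2, one more than the length of the string.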

In the KMP algorithm, there are generally two ways to understand the next array (taking the scheme whose first element is -1 as the example; the wording differs slightly when the first element is 0):

    1. It tells the pattern string pointer where to fall back to when the pattern string mismatches the main string at some position.

    2. It is the maximum length of a suffix, ending at the position just before the current one, that matches a prefix of the pattern string (the whole string matching itself does not count).

Here are some explanations for these two points:

Point 1

Suppose there is a main string abacabab and a pattern string abab, and we want to know whether the main string contains the pattern string. We traverse the two strings in step, with one pointer (a pointer in the logical sense, also called a cursor) on each string.

At the beginning, the first three characters match.

Subscript         0   1   2   3   4   5   6   7
Main string       a   b   a   c   a   b   a   b
Pattern string    a   b   a   b

Then a mismatch occurs at subscript 3: the main string has c there, while the pattern string has b.

The naive string matching algorithm moves the main string pointer back to subscript 1 and resets the pattern string pointer to zero, that is, to its first character. This is obviously inefficient.

In this situation the KMP algorithm does not change the main string pointer. It only changes the pattern string pointer, which is why KMP is also called a matching algorithm without backtracking. Where the pattern string pointer moves to is determined by the next array.

In the example above, the mismatch is at pattern subscript 3, so we look up next[3], which is 1.

So the pattern string pointer moves directly to subscript 1, which lines the pattern up like this:

Subscript         0   1   2   3   4   5   6   7
Main string       a   b   a   c   a   b   a   b
Pattern string            a   b   a   b

This position mismatches again (c against b), so we look up next[1] = 0 and continue in the same way until the traversal is complete. The KMP algorithm runs in O(m+n), where m and n are the lengths of the main string and the pattern string respectively.
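The matching process described above can be sketched as follows. This is only an illustration under the same next[0] = -1 convention; the function name kmpFind and its return convention (index of the first occurrence, or -1) are assumptions of this sketch, since the original solution never performs a full match.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first occurrence of pat in
// text, or -1 if there is none. The main string pointer i never moves back.
int kmpFind(const std::string& text, const std::string& pat)
{
    int m = static_cast<int>(text.size());
    int n = static_cast<int>(pat.size());
    if (n == 0) return 0;

    // Build the next array of the pattern (same logic as getNext).
    std::vector<int> next(n + 1);
    next[0] = -1;
    for (int i = 0, j = -1; i < n; )
    {
        if (j == -1 || pat[i] == pat[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }

    int i = 0, j = 0;            // i: main string pointer, j: pattern pointer
    while (i < m && j < n)
    {
        if (j == -1 || text[i] == pat[j]) { ++i; ++j; }
        else j = next[j];        // mismatch: only the pattern pointer changes
    }
    return j == n ? i - n : -1;
}
```

With the main string abacabab and the pattern abab, the pattern pointer falls back 3 → 1 → 0 → -1 at main subscript 3, and the match is then found starting at subscript 4.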

Point 2

Let's look at the table for the next array again.

Subscript         0   1   2   3   4
Pattern string    a   b   a   b
Next array       -1   0   0   1   2

    • When the subscript is 1, look at the string before it, that is, a. A string matching itself does not count, so the next value is 0.
    • When the subscript is 2, look at ab. The suffix b does not match the prefix a, so the next value is 0.
    • When the subscript is 3, look at aba. The a at the end matches the prefix a. The match length is 1, so the next value is 1.
    • When the subscript is 4, look at abab. The ab at the end matches the prefix ab. The match length is 2, so the next value is 2.
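The four cases above apply the definition directly, which can be sketched as a brute-force function. The name nextByDefinition is hypothetical, and the code is deliberately slow: it is only meant to show that the definition yields the same values as the template getNext code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: computes the next array straight from the definition.
// next[i] is the largest len < i such that the last len characters of
// p[0..i) equal the first len characters of p. next[0] is -1 by convention.
std::vector<int> nextByDefinition(const std::string& p)
{
    int n = static_cast<int>(p.size());
    std::vector<int> next(n + 1, 0);
    next[0] = -1;
    for (int i = 1; i <= n; ++i)
        for (int len = i - 1; len >= 1; --len)   // len < i: no self-match
            if (p.compare(0, len, p, i - len, len) == 0)
            {
                next[i] = len;                   // longest match found first
                break;
            }
    return next;
}
```

For abab this produces -1, 0, 0, 1, 2, the same row as in the table.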
Back to the subject

In the code, this part:

        int sum = 0;
        for (int i = 1; i <= n; i++)
        {
            int j = i;
            while (j)
            {
                sum = (sum + 1) % 10007;
                j = next[j];
            }
        }

is used to compute the total number of occurrences of all prefixes. So why does it work?

First look at the for loop. It runs from 1 to n, which should be clear: our next array has one more element than the string has characters, so subscript n is valid.

Because j starts at i, and i = 1, 2, 3, ..., n is never zero, the while (j) loop runs at least once for every i and adds 1 to sum each time.

This is easy to understand. Take abab: its 4 prefixes a, ab, aba and abab each occur at least once, right? So a string of length n contributes at least n to sum.


Next comes j = next[j]. To explain it we work backwards from another example: the sum for the string ababa, that is, the total number of occurrences of its prefixes. Its next array is:

Subscript         0   1   2   3   4   5
Pattern string    a   b   a   b   a
Next array       -1   0   0   1   2   3

Plug this into the code above. Compared with abab, there is only one more character, a. So look directly at the iteration where i equals n (n is 5), starting from sum = 6 (the sum for abab is 6).

    • j = i = 5 // the longest prefix, ababa, of length 5
    • while (j) holds: sum = 6 + 1 = 7
    • j = next[5] = 3 // the prefix aba, of length 3
    • while (j) holds: sum = 7 + 1 = 8
    • j = next[3] = 1 // the shortest prefix, a, of length 1
    • while (j) holds: sum = 8 + 1 = 9
    • j = next[1] = 0
    • while (j) fails, so the loop ends.
    • Finally sum = 9

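To understand the comments above, go back to the second interpretation of the next array. Every value of j visited in the chain is the length of a prefix that also ends at position i, so each step of the while loop counts exactly one occurrence of that prefix.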
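The trace above can be reproduced with a short sketch. The helper names buildNext, borderChain and countPrefixOccurrences are made up for this illustration; the counting loop itself is the one from the AC code, wrapped in a function so that small cases can be checked.

```cpp
#include <string>
#include <vector>

// Same construction as getNext, returning the array (next[0] = -1).
std::vector<int> buildNext(const std::string& p)
{
    int n = static_cast<int>(p.size());
    std::vector<int> next(n + 1);
    next[0] = -1;
    for (int i = 0, j = -1; i < n; )
    {
        if (j == -1 || p[i] == p[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }
    return next;
}

// Hypothetical helper: the prefix lengths visited by j = next[j] from i.
std::vector<int> borderChain(const std::vector<int>& next, int i)
{
    std::vector<int> chain;
    for (int j = i; j > 0; j = next[j])
        chain.push_back(j);       // a prefix of length j ends at position i
    return chain;
}

// Total occurrences of all prefixes of s, modulo 10007.
int countPrefixOccurrences(const std::string& s)
{
    const int MOD = 10007;
    std::vector<int> next = buildNext(s);
    int sum = 0;
    for (int i = 1; i <= static_cast<int>(s.size()); ++i)
        for (int j = i; j > 0; j = next[j])
            sum = (sum + 1) % MOD;
    return sum;
}
```

For ababa, the chain starting at i = 5 is 5, 3, 1, and the totals are 6 for abab and 9 for ababa, as in the trace. Note that following the whole chain for every i can be slow on inputs such as a long run of the same letter; that is a property of this straightforward loop, not something the original discusses.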
