Interview Classic: Reviewing KMP (Explained Without Pictures, As Best I Can)

Source: Internet
Author: User
The KMP algorithm has confused every beginner. Data Structures textbooks introduce it so early: how can you expect someone who has not yet understood binary tree traversal to understand KMP? What you memorize quickly, you forget quickly. Only many years later, looking back, did I find that the KMP algorithm struck me like an oracle. It is hard to imagine how Knuth, Pratt, and Morris came up with it at around the same time.
Suppose you hold a string of red and blue beads in your hand, and a long string of beads, also red and blue, hangs on the wall. Your task is to find, on the wall, the same sequence as the beads in your hand. The simplest and most obvious way is to start from the first bead on the wall: if the beads in your hand match the beads on the wall from there, you are done; if not, start again from the second bead on the wall, and so on. (I will not give code in this article; if you need it, copy it from a textbook.)
Some people will think: if the beads in my hand are all red and I still use this method, isn't that stupid? Yes, and Knuth thought so too. Look: if you hold 100 consecutive red beads, and starting from the first bead on the wall you find 99 red beads but the 100th is not red, would you go back and start over from the second bead on the wall? Nobody would do that except a programmer. A normal person would simply keep looking for a run of red beads from the 100th bead onward.
With this insight, we can enter the KMP algorithm. Setting the bead metaphor aside, we now speak of strings. The beads in your hand are called the pattern string, and the beads on the wall are called the main string. The task is to find the pattern string in the main string. To highlight the advantages of KMP, our strings consist only of 0s and 1s. The biggest problem with the naive method is that whenever a mismatch occurs, it has to go back and compare again. This is called backtracking.
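The naive method just described fits in a few lines of Python. This is a minimal illustration, not code from the article: the name naive_search is hypothetical, and it follows the article's 1-based positions, returning the position where the match starts, or 0 if there is none. The line that resets i is the backtracking of the main string.

```python
def naive_search(main: str, pattern: str) -> int:
    """Naive matching with 1-based positions: return the position in `main`
    where `pattern` first starts, or 0 if it never occurs."""
    n, m = len(main), len(pattern)
    i, j = 1, 1  # i: position in the main string, j: position in the pattern
    while i <= n and j <= m:
        if main[i - 1] == pattern[j - 1]:
            # Characters agree: keep comparing one by one.
            i += 1
            j += 1
        else:
            # Mismatch: backtrack the main string to one past where this
            # attempt started, and restart the pattern from its 1st character.
            i = i - j + 2
            j = 1
    return i - m if j > m else 0
```

For instance, naive_search("0001", "01") gives 3, the same answer as "0001".find("01") + 1.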
The KMP algorithm uses a clever method to avoid backtracking in the main string: the main string only needs to be scanned once from start to end. (So far there is no way to avoid backtracking in the pattern string, so from here on "backtracking" always means backtracking of the main string.) Two questions arise: Is it possible not to backtrack at all? And can backtracking be avoided in every situation? For the first, we have already seen that in some cases, such as a pattern of 100 consecutive 1s, no backtracking is needed. For the second, suppose the first five characters match and the 6th differs. Can we simply skip past the five compared characters of the main string and restart the pattern from there? No: if a match starts at the 2nd character of the main string, you would miss it.
The magic of KMP is that when the 6th character mismatches, it compares the 6th character of the main string with one of the pattern's 1st to 5th characters, chosen by the value of next. Is this safe? Absolutely. Here is why. When the 6th character mismatches, the longest overlap falls into one of five cases: the pattern's first four characters equal the four main-string characters just before the 6th; or its first three equal the three just before the 6th; or its first two do; or its first one does; or none (zero) do. No matter how the main string and the pattern change, these five cases cover every possibility, and in each case the comparison simply continues with one of the pattern's characters 1 to 5. For example, if the first four are the same, the comparison resumes at the pattern's 5th character; if none are the same, the pattern restarts from its 1st character. This shows that in every case the main string never needs to backtrack, provided we have the next value.
(Wait, you have ignored a problem. What if the next value depends on the main string?) (Reply: the comparison so far already settles this. The pattern's first five characters are known to equal the main string's, so the main string itself is no longer needed: anything I would want to look back at in the main string, I can look up in the pattern instead. Are you satisfied with this explanation?)
Now the KMP algorithm is ready to work. The method is as follows:
1. At first, compare characters one by one, as in the naive method.
2. If they are not equal, say at the i-th character of the main string and the j-th character of the pattern, continue by comparing the next(j)-th character of the pattern with the i-th character of the main string.
2.1. If they are equal, go back to comparing one by one.
2.2. If they are different, repeat step 2 with next(j) in place of j.
3. If the last character of the pattern has been matched, the task is finished.
4. If the end of the main string is reached and the task is still not finished, give up: there is no match.
In step 2, next(j) acts as a function. Don't worry about why it is so amazing; as long as it tells you where to continue, you just use it. Haha, it's like divine guidance: you just compare, and God tells you which character to continue from. That is the KMP algorithm. Job done.
---------------------------------------------
(But I haven't said where next(j) comes from yet. How can that be acceptable?) In fact, computing next(j) is the most critical part of the KMP algorithm. Understand it, and you understand KMP! Let's explore how next(j) works internally. As mentioned above, next(j) has nothing to do with the main string, which tells us that only the pattern string is needed to generate it. So next can be regarded as a function whose arguments are the pattern string and the mismatch position j. Suppose that in some pattern string next(6) is 3. What does this mean?
It means that when position 6 mismatches, I directly compare the pattern's 3rd character with the main string's 6th, which assumes that the pattern's 1st and 2nd characters match the main string's 4th and 5th. But the main string's 4th and 5th characters also match the pattern's 4th and 5th, so by transitivity of equality, the pattern's 1st and 2nd characters equal its 4th and 5th. This tells us that if the pattern contains identical fragments, such as positions 1 to 3 and 3 to 5, or 1 to 2 and 5 to 6, then when a mismatch occurs right after the later fragment, we can continue comparing right after the earlier fragment. If no such identical fragments exist, no match can start anywhere before the mismatch position in the main string (otherwise there would be a contradiction), so we can continue comparing from the main string's mismatch position with the start of the pattern.
The first fragment of such a pair always starts at the pattern's 1st character. This makes the search easy: compare the pattern against itself, aligning its 1st character with its 2nd, then its 1st with its 3rd, then its 1st with its 4th, and so on, and you find all possible identical fragments. Coincidentally, this is itself a pattern-matching task in which the pattern plays both the main string and the pattern string. In this self-comparison, every time a comparison succeeds we can record a next value. When a mismatch occurs, the "main string" (here, the pattern itself) still does not need to backtrack: haven't we already recorded the earlier next values? We simply use them to decide where to continue. For the very first comparison we set a rule: next(1) = 0 and next(2) = 1, so that by the time the 1st and 2nd characters are compared, next(2) already exists, and at every comparison the next value for the current position is already available. A next value of 0 means that no match can start at or before the mismatch position; in that case the main string moves on to its next position and comparison restarts from the pattern's 1st character.
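Putting together the matching steps 1 to 4 and the self-comparison for next, here is a minimal Python sketch, assuming the article's 1-based convention with next(1) = 0 and next(2) = 1. The names get_next and kmp_search are illustrative, and the loops follow the common textbook formulation rather than anything the article spells out. Note that get_next runs essentially the same loop as kmp_search, with the pattern matched against itself.

```python
def get_next(pattern: str) -> list[int]:
    """next(j) for j = 1..m in the article's 1-based convention.
    Index 0 of the returned list is unused padding."""
    m = len(pattern)
    p = " " + pattern          # pad so that p[1..m] are the pattern's characters
    nxt = [0] * (m + 1)        # next(1) = 0
    i, j = 1, 0                # the pattern is matched against itself
    while i < m:
        if j == 0 or p[i] == p[j]:
            i += 1
            j += 1
            nxt[i] = j         # mismatch at i -> resume at pattern position j
        else:
            j = nxt[j]         # mismatch in the self-comparison: reuse recorded values
    return nxt


def kmp_search(main: str, pattern: str) -> int:
    """KMP matching with 1-based positions: return the position in `main`
    where `pattern` first starts, or 0 if it never occurs."""
    n, m = len(main), len(pattern)
    s, p = " " + main, " " + pattern
    nxt = get_next(pattern)
    i, j = 1, 1
    while i <= n and j <= m:
        if j == 0 or s[i] == p[j]:
            # Step 1 / 2.1: equal (or j == 0, meaning the pattern restarts
            # at the next main-string position): advance both.
            i += 1
            j += 1
        else:
            # Step 2 / 2.2: mismatch. i never moves backwards; only j jumps.
            j = nxt[j]
    return i - m if j > m else 0
```

For the textbook pattern "abaabcac", get_next returns next(1..8) = 0, 1, 1, 2, 2, 3, 1, 2 after the padding entry.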
So the process of finding next looks a lot like KMP itself! However, it doesn't have to. In fact, next can be found without the KMP trick, using the naive method; it is just more work, and if you do it that way, take care not to overwrite next values you have already recorded (earlier alignments yield longer identical fragments, and the longest is the one we want).
At this point the essence of the KMP algorithm has been fully covered. The last point is the so-called corrected next value. The correction is applied while computing next; leaving it out does not affect the matching result, because the correction is purely an optimization! Specifically: when position j mismatches, we jump to next(j); if that mismatches again, we jump to next(next(j)); if that mismatches again, to next(next(next(j)))... To cut short this chain of jumps, notice that if the pattern's j-th character equals its next(j)-th character, then since the j-th character mismatched, the next(j)-th is bound to mismatch as well, so why compare them at all? So when computing next, we check whether the two characters are the same; if they are, we directly reuse the corrected value already computed for position next(j).
With that, KMP is complete. Code is everywhere on the Internet, so I will not give it here.
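Both alternatives mentioned above, the plain self-comparison for next (without overwriting earlier values) and the corrected next value, can be sketched in Python under the same 1-based convention (index 0 unused). The names naive_next and get_nextval are illustrative; "nextval" is the name many textbooks use for the corrected value, and its loop here is the usual textbook formulation rather than code from the article.

```python
def naive_next(pattern: str) -> list[int]:
    """next(j) via plain self-comparison (no KMP trick), 1-based, index 0 unused.
    Slide the pattern's 1st character under its 2nd, 3rd, ... character and
    compare one by one, never overwriting a value already recorded."""
    m = len(pattern)
    nxt = [0] * (m + 1)
    for j in range(2, m + 1):
        nxt[j] = 1             # default: no shared fragment, restart at the 1st character
    recorded = [False] * (m + 1)
    for shift in range(1, m):
        length = 0             # how many characters currently agree at this shift
        while shift + length + 1 <= m and pattern[length] == pattern[shift + length]:
            length += 1
            # (1-based) positions 1..length equal positions shift+1..shift+length,
            # so a mismatch right after them (position j) can resume at length + 1.
            j = shift + length + 1
            if j <= m and not recorded[j]:
                # Smaller shifts come first and give longer fragments,
                # so the first value recorded is the one to keep.
                nxt[j] = length + 1
                recorded[j] = True
    return nxt


def get_nextval(pattern: str) -> list[int]:
    """Corrected next values (nextval), 1-based, index 0 unused."""
    m = len(pattern)
    p = " " + pattern
    nv = [0] * (m + 1)
    i, j = 1, 0
    while i < m:
        if j == 0 or p[i] == p[j]:
            i += 1
            j += 1
            # If p[i] == p[j], a character that mismatches p[i] also mismatches
            # p[j], so skip straight to the value already computed for j.
            nv[i] = nv[j] if p[i] == p[j] else j
        else:
            j = nv[j]
    return nv
```

For the all-1 pattern "11111" (the hundred red beads, shortened), naive_next gives 0, 1, 2, 3, 4 for positions 1 to 5, while get_nextval gives all zeros: in a 0/1 string, a mismatch against a 1 means the main string holds a 0 there, which no position of this pattern can match, so the main string simply moves on.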
