KMP algorithm (prelude to the Aho-Corasick automaton)

Last Update: 2018-12-06 Source: Internet

Author: User


The KMP we are talking about here is not the one used to play movies (although I do like that software). The KMP algorithm handles string matching. In other words, given two strings, you need to answer whether string B is a substring of string A (whether A contains B). For example, if A = "I'm matrix67" and B = "matrix", then B is a substring of A. You could tactfully ask your girl: "If you were going to confess to someone you like, would my name be a substring of your confession?"
To solve this problem, we usually enumerate every position in string A at which a match with string B could start, and then check whether B actually matches there. If A has length n and B has length m, the complexity of this method is O(mn). In practice the cost often falls well short of mn (a mismatch is usually found after comparing only the first one or two letters), but there are plenty of "worst cases", for example A = "aaaaaaaaaaaaaaaaaaaaaaaaaaaab", B = "aaaaaaaab". We will introduce an algorithm that is O(n) in the worst case (assuming m <= n): the legendary KMP algorithm.
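The enumeration approach described above can be sketched as follows. This is only an illustration, not code from the original article: the function name `naive_find` and the comparison counter are assumptions added to make the worst case visible, and the worst-case input is a string of the same shape as the example (a run of 28 a's followed by b, chosen here for easy counting).

```python
def naive_find(a: str, b: str) -> tuple[int, int]:
    """Try every start position of b inside a.

    Returns (index, comparisons): the 0-based index of the first
    occurrence of b in a (or -1 if there is none), and the number of
    single-character comparisons that were performed.
    """
    n, m = len(a), len(b)
    comparisons = 0
    for start in range(n - m + 1):
        k = 0
        # Compare b against a[start:start + m] one character at a time.
        while k < m:
            comparisons += 1
            if a[start + k] != b[k]:
                break
            k += 1
        if k == m:  # every character of b matched
            return start, comparisons
    return -1, comparisons


if __name__ == "__main__":
    print(naive_find("I'm matrix67", "matrix"))
    # Worst-case shape: almost every start position matches 8 letters
    # before failing on the 9th, so the cost approaches m * n.
    print(naive_find("a" * 28 + "b", "a" * 8 + "b"))
```

On the second input there are 21 candidate start positions and each one costs 9 comparisons, which is the kind of repeated work KMP avoids.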
It is called KMP because the algorithm was proposed by Knuth, Morris, and Pratt, and the name takes the first letter of each of the three surnames. At this point you may suddenly understand why the AVL tree is called AVL, or why the mark in the middle of Bellman-Ford is a hyphen rather than a dot. Sometimes seven or eight people have all worked on the same thing; how do you name it then? Usually such a thing is simply not named after people, to avoid disputes; the "3x + 1 problem" is an example. But I digress.
Personally, I think KMP is the topic that least needs explaining, because a great deal of material on it can be found online. However, the explanations online almost always involve the concepts of "shifting" and the "next function", which are very easy to misunderstand (at least, a year and a half ago I could not figure them out when I read those materials to learn KMP). Here I use a different approach to explain the KMP algorithm.

Suppose A = "ababaababacb" and B = "ababacb", and let's see how KMP works. We use two pointers, i and j, which mean that A[i-j+1..i] is exactly equal to B[1..j]. That is, i keeps increasing, and as i increases j changes accordingly, always satisfying that the string of length j ending at A[i] exactly matches the first j characters of B (the larger j is, the better). Now we need to examine the relationship between A[i+1] and B[j+1]. When A[i+1] = B[j+1], i and j each increase by one. When j = m, we say that B is a substring of A (all of B has been matched), and the matching position can be computed from the current value of i. When A[i+1] <> B[j+1], KMP's strategy is to adjust the position of j (decrease j) so that A[i-j+1..i] still matches B[1..j] and the new B[j+1] matches A[i+1] (so that i and j can continue to increase). Let's take a look at the situation when i = j = 5.
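Before stepping through that case, it may help to see which values of j the invariant pins down for this example. The sketch below is deliberately not KMP: it recomputes, by brute force for every i, the largest j such that A[i-j+1..i] equals B[1..j], which is exactly the quantity KMP maintains efficiently. The function name `longest_match_lengths` is an assumption made for illustration; i is kept 1-based to follow the text, and Python's 0-based slice `a[i - j:i]` corresponds to A[i-j+1..i].

```python
def longest_match_lengths(a: str, b: str) -> list[int]:
    """For each 1-based position i in a, return the largest j such that
    the j characters of a ending at position i equal the first j
    characters of b. Brute force, for illustration only."""
    result = []
    for i in range(1, len(a) + 1):
        j = min(i, len(b))  # j cannot exceed i or the length of b
        # Shrink j until the suffix of a[:i] matches the prefix of b.
        while j > 0 and a[i - j:i] != b[:j]:
            j -= 1
        result.append(j)
    return result


if __name__ == "__main__":
    a, b = "ababaababacb", "ababacb"
    for i, j in enumerate(longest_match_lengths(a, b), start=1):
        print(f"i = {i:2d}  j = {j}")
```

For these two strings j climbs to 5 at i = 5, drops to 1 at i = 6, and finally reaches 7 = m at i = 12, which is where B is found.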