Data structures and algorithms in JavaScript (V): The classic KMP algorithm
KMP algorithm and BM algorithm
KMP is the classic prefix-matching algorithm, and BM is the classic suffix-matching algorithm. The difference between prefix matching and suffix matching lies only in the order of comparison.
Prefix matching: the pattern string is compared with the text string from left to right, and the pattern string also moves from left to right.
Suffix matching: the pattern string is compared with the text string from right to left, while the pattern string still moves from left to right.
The BF algorithm from the previous chapter is also a prefix algorithm, but it matches one position at a time by brute force, and its O(mn) cost hardly needs mentioning. There are plenty of explanations of KMP online, but most take such a lofty route that you end up still confused. I will try to describe it in the most down-to-earth way, based on my own understanding.
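For reference, the BF approach mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the code from the previous chapter; the function name bfSearch and its parameter names are assumptions made for this example.

```javascript
// Brute-force (BF) search: try every alignment, shifting the pattern by 1 each time.
// bfSearch is a hypothetical name used only for this sketch.
function bfSearch(text, pattern) {
  for (var start = 0; start + pattern.length <= text.length; start++) {
    var j = 0;
    // compare left to right until a mismatch or the end of the pattern
    while (j < pattern.length && text.charAt(start + j) === pattern.charAt(j)) {
      j++;
    }
    if (j === pattern.length) {
      return start; // the whole pattern matched at this alignment
    }
  }
  return -1; // no alignment matched
}

console.log(bfSearch("aaronaabbcc", "naab")); // 4
```

Every failed alignment throws away what was learned from the characters that did match, which is exactly the information KMP tries to reuse.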
KMP
KMP is also a prefix algorithm, but an optimized one. The name comes from its authors: Knuth, Morris, and Pratt. Compared with BF, what KMP optimizes is the distance of each shift: it dynamically adjusts how far the pattern string moves each time. BF always moves by 1; KMP does not necessarily.
Comparison of the BF and KMP prefix algorithms
Comparing the two in the figure, we find:
When searching for the pattern string P in the text string T, matching proceeds normally until the 6th letter, c, where a mismatch occurs. At that point BF moves the whole pattern string P by one position, while KMP moves it by two.
We already know how BF matches, but why does KMP move two positions rather than one or three?
Take the figure above: the pattern string P matches correctly through ababa and fails at c. The idea of KMP is that ababa has already been matched correctly, so can we use that information? Instead of moving the search position back over characters that have already been compared, keep moving it forward, which improves efficiency.
So the question is: how do we know how many positions to move?
The authors of KMP summarized the shift rule as follows:
Shift distance = number of matched characters - partial match value
The shift depends only on the pattern string; it has nothing to do with the text string. Pay special attention to this.
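As a small worked example of this formula, take the pattern ababacb, where ababa (5 characters) matched before the mismatch at c. The partial match value, explained in the following paragraphs, is 3 for ababa, because aba is both its head and its tail. The helper name shiftDistance is an assumption made for this sketch.

```javascript
// shiftDistance is a hypothetical helper name used only for this sketch.
// matchedCount: number of pattern characters matched before the mismatch
// partialMatchValue: partial match value of that matched portion
function shiftDistance(matchedCount, partialMatchValue) {
  return matchedCount - partialMatchValue;
}

// "ababa" matched (5 characters); its partial match value is 3 ("aba").
console.log(shiftDistance(5, 3)); // 2
```

The result, 2, agrees with the two-position move described for KMP above. Note that nothing from the text string enters the calculation.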
So how should we understand the number of matched characters and the corresponding partial match value of the pattern string?
Matched characters:
T: abababaabab
P: ababacb
The matched characters are the leading ababa of P, which is easy to understand.
Partially matched values:
This is the core of the algorithm, and the hardest part to understand.
Suppose:
T: aaronaabbcc
P: aaronaac
Observe: if the mismatch occurs at c, then given the structure of what has already been matched, which position is the most reasonable one to move to next?
aaronaabbcc
     aaronaac
That is to say: within the pattern string, if some segment of characters begins and ends with the same content, the part in between can be skipped during matching. This idea is reasonable.
Knowing this rule, the algorithm for the partial match table is as follows:
First, two concepts are needed: "prefix" and "suffix". The "prefixes" of a string are all of its leading substrings except the string itself, so each one leaves out at least the last character. The "suffixes" are all of its trailing substrings except the string itself, so each one leaves out at least the first character.
The "partial match value" is the length of the longest element common to the prefixes and the suffixes.
Let's first look at how aaronaac is divided when matching with BF.
BF division: a, aa, aar, aaro, aaron, aarona, aaronaa, aaronaac
What about the KMP division? This is where prefixes and suffixes come in.
First, here is the resulting KMP matching table:
 a  a  r  o  n  a  a  c
[0, 1, 0, 0, 0, 1, 2, 0]
This probably looks confusing. Before breaking it down, let's look at prefixes and suffixes first.
Matching string: "aaron"
Prefixes: a, aa, aar, aaro
Suffixes: aron, ron, on, n
Shift distance: for each matched portion, compare its prefixes with its suffixes and take the length of the longest one they have in common.
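The prefix and suffix lists, and the comparison between them, can be sketched like this. The function names properPrefixes, properSuffixes, and partialMatchValue are assumptions made for this illustration; they do not appear in the article's own code.

```javascript
// Hypothetical helper names, used only for this sketch.

// All prefixes of str except str itself, shortest first.
function properPrefixes(str) {
  var result = [];
  for (var len = 1; len < str.length; len++) {
    result.push(str.slice(0, len));
  }
  return result;
}

// All suffixes of str except str itself, shortest first.
function properSuffixes(str) {
  var result = [];
  for (var len = 1; len < str.length; len++) {
    result.push(str.slice(str.length - len));
  }
  return result;
}

// Length of the longest string that is both a prefix and a suffix.
function partialMatchValue(str) {
  var prefixes = properPrefixes(str);
  var suffixes = properSuffixes(str);
  var best = 0;
  for (var k = 0; k < prefixes.length; k++) {
    // prefixes[k] and suffixes[k] both have length k + 1
    if (prefixes[k] === suffixes[k]) {
      best = prefixes[k].length;
    }
  }
  return best;
}

console.log(properPrefixes("aaron").join(", ")); // a, aa, aar, aaro
console.log(properSuffixes("aaron").join(", ")); // n, on, ron, aron
console.log(partialMatchValue("aaron"));         // 0
console.log(partialMatchValue("aaronaa"));       // 2
```

Because both lists are ordered by length, only entries at the same index can be equal, and the last equal pair found is the longest.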
Decomposition of the partial match table
The matching table in KMP is computed as follows, where p denotes the prefixes, n the suffixes, and r the result.
a,        p => 0, n => 0, r = 0
aa,       p => [a], n => [a], r = a.length => 1
aar,      p => [a, aa], n => [r, ar], r = 0
aaro,     p => [a, aa, aar], n => [o, ro, aro], r = 0
aaron,    p => [a, aa, aar, aaro], n => [n, on, ron, aron], r = 0
aarona,   p => [a, aa, aar, aaro, aaron], n => [a, na, ona, rona, arona], r = a.length = 1
aaronaa,  p => [a, aa, aar, aaro, aaron, aarona], n => [a, aa, naa, onaa, ronaa, aronaa], r = Math.max(a.length, aa.length) = 2
aaronaac, p => [a, aa, aar, aaro, aaron, aarona, aaronaa], n => [c, ac, aac, naac, onaac, ronaac, aronaac], r = 0
As with the BF algorithm, we break the pattern down by each position where a match might stop, and cache the partial match value for each one in advance. During matching, this partial match table tells us how many positions to move.
Therefore, the final matching table for aaronaac is [0, 1, 0, 0, 0, 1, 2, 0].
Below is a JS implementation of KMP, in two versions:
KMP implementation (1): KMP with a cached matching table
KMP implementation (2): KMP that computes next dynamically
KMP implementation (1)
Matching table
The most important part of the KMP algorithm is the matching table. Without it, the algorithm is just BF; with it, it is KMP.
The matching table determines how far to move next.
Following the table rules above, we design a kmpGetStrPartMatchValue method.
function kmpGetStrPartMatchValue(str) {
  var prefix = [];
  var suffix = [];
  var partMatch = [];
  for (var i = 0, j = str.length; i < j; i++) {
    var newStr = str.substring(0, i + 1);
    if (newStr.length === 1) {
      partMatch[i] = 0;
    } else {
      for (var k = 0; k < i; k++) {
        // prefix of length k + 1
        prefix[k] = newStr.slice(0, k + 1);
        // suffix of length k + 1
        suffix[k] = newStr.slice(-k - 1);
        // if they are equal, record the length in the result set
        if (prefix[k] === suffix[k]) {
          partMatch[i] = prefix[k].length;
        }
      }
      if (!partMatch[i]) {
        partMatch[i] = 0;
      }
    }
  }
  return partMatch;
}
Following the matching table algorithm of KMP, str.substring(0, i + 1) breaks the string down as a -> aa -> aar -> aaro -> aaron -> aarona -> aaronaa -> aaronaac
Then, for each piece, the length of the longest common element is computed from its prefixes and suffixes.
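The substring-based method is easy to follow, but it builds and compares many substrings. As a point of comparison, here is a sketch of the usual linear-time construction of the same table, which reuses earlier entries instead of slicing. It is a different technique from the method above, and the name buildPartialMatchTable is an assumption made for this example.

```javascript
// buildPartialMatchTable is a hypothetical name; this is the standard
// linear-time construction, not the substring-based method above.
function buildPartialMatchTable(pattern) {
  var table = [];
  var len = 0; // length of the longest prefix that is also a suffix so far
  for (var i = 0; i < pattern.length; i++) {
    if (i === 0) {
      table[0] = 0; // a single character has no proper prefix or suffix
      continue;
    }
    // fall back to shorter candidates until one can be extended
    while (len > 0 && pattern.charAt(i) !== pattern.charAt(len)) {
      len = table[len - 1];
    }
    if (pattern.charAt(i) === pattern.charAt(len)) {
      len++;
    }
    table[i] = len;
  }
  return table;
}

console.log(buildPartialMatchTable("aaronaac").join(", ")); // 0, 1, 0, 0, 0, 1, 2, 0
```

For aaronaac it produces the same values as the decomposition shown earlier.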
Backtracking algorithm
KMP is also a prefix algorithm, so the BF code can be reused almost unchanged. The only modification: where BF simply adds 1 when backtracking, KMP uses the matching table to compute the next position.
// subloop: pattern string
for (var j = 0; j < searchLength; j++) {
  // if the character matches the main string
  if (searchStr.charAt(j) === sourceStr.charAt(i)) {
    if (j === searchLength - 1) {
      // the whole pattern has matched
      result = i - j;
      break;
    } else {
      // matched so far: advance the main string index and continue the loop
      i++;
    }
  } else {
    if (j > 1 && part[j - 1] > 0) {
      // shift = matched characters (j) - partial match value (part[j - 1]);
      // the outer loop's i++ supplies the final + 1
      i = i - part[j - 1] - 1;
    } else {
      // move one position: go back to the start of this attempt
      i = i - j;
    }
    break;
  }
}
The core of KMP is how the next position is computed in the mismatch branch: shift = number of matched characters - corresponding partial match value
Complete KMP Algorithm
<!doctype html>
<div id="test2">
<div>
<script type="text/javascript">
function kmpGetStrPartMatchValue(str) {
  var prefix = [];
  var suffix = [];
  var partMatch = [];
  for (var i = 0, j = str.length; i < j; i++) {
    var newStr = str.substring(0, i + 1);
    if (newStr.length === 1) {
      partMatch[i] = 0;
    } else {
      for (var k = 0; k < i; k++) {
        // prefix and suffix of length k + 1
        prefix[k] = newStr.slice(0, k + 1);
        suffix[k] = newStr.slice(-k - 1);
        if (prefix[k] === suffix[k]) {
          partMatch[i] = prefix[k].length;
        }
      }
      if (!partMatch[i]) {
        partMatch[i] = 0;
      }
    }
  }
  return partMatch;
}

function KMP(sourceStr, searchStr) {
  // generate the matching table
  var part = kmpGetStrPartMatchValue(searchStr);
  var sourceLength = sourceStr.length;
  var searchLength = searchStr.length;
  var result;
  var i = 0;
  var j = 0;
  for (; i < sourceLength; i++) { // outermost loop: main string
    // subloop: pattern string
    for (j = 0; j < searchLength; j++) {
      // if the character matches the main string
      if (searchStr.charAt(j) === sourceStr.charAt(i)) {
        if (j === searchLength - 1) {
          // the whole pattern has matched
          result = i - j;
          break;
        } else {
          // matched so far: advance the main string index and continue the loop
          i++;
        }
      } else {
        if (j > 1 && part[j - 1] > 0) {
          // shift = matched characters (j) - partial match value (part[j - 1]);
          // the outer loop's i++ supplies the final + 1
          i = i - part[j - 1] - 1;
        } else {
          // move one position: go back to the start of this attempt
          i = i - j;
        }
        break;
      }
    }
    if (result || result === 0) {
      break;
    }
  }
  if (result || result === 0) {
    return result;
  } else {
    return -1;
  }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDABD";

show('indexOf', function() { return s.indexOf(t); });
show('KMP', function() { return KMP(s, t); });

function show(bf_name, fn) {
  var myDate = +new Date();
  var r = fn();
  var div = document.createElement('div');
  div.innerHTML = bf_name + ' algorithm, search location: ' + r + ", time consumed: " + (+new Date() - myDate) + "ms";
  document.getElementById("test2").appendChild(div);
}
</script>
</div>
</div>
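The implementation above restarts the inner loop from the first pattern character after every shift, so some text characters are compared again. For comparison, here is a sketch of the more common formulation in which the text index never moves backward and only the pattern index falls back through the table. It is a different arrangement of the same idea, not the article's code; the names kmpSearch and buildTable are assumptions made for this example.

```javascript
// kmpSearch and buildTable are hypothetical names used only for this sketch.

// Partial match table, built in linear time by reusing earlier entries.
function buildTable(pattern) {
  var table = [];
  var len = 0;
  for (var i = 0; i < pattern.length; i++) {
    if (i === 0) {
      table[0] = 0;
      continue;
    }
    while (len > 0 && pattern.charAt(i) !== pattern.charAt(len)) {
      len = table[len - 1];
    }
    if (pattern.charAt(i) === pattern.charAt(len)) {
      len++;
    }
    table[i] = len;
  }
  return table;
}

function kmpSearch(text, pattern) {
  if (pattern.length === 0) {
    return 0; // same convention as indexOf for an empty pattern
  }
  var table = buildTable(pattern);
  var j = 0; // number of pattern characters currently matched
  for (var i = 0; i < text.length; i++) { // i only ever moves forward
    // on a mismatch, keep the longest prefix that can still match
    while (j > 0 && text.charAt(i) !== pattern.charAt(j)) {
      j = table[j - 1];
    }
    if (text.charAt(i) === pattern.charAt(j)) {
      j++;
    }
    if (j === pattern.length) {
      return i - j + 1; // start index of the match
    }
  }
  return -1;
}

console.log(kmpSearch("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // 15
```

Setting j to table[j - 1] has the same effect as shifting the pattern by (matched characters - partial match value) while keeping the characters that are already known to match.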
KMP implementation (2)
The first KMP implementation is straightforward: it looks up a cached matching table, the usual trade of space for time. The other approach computes on demand: it takes the specific string that has been matched and calculates its partial match value on the spot. The principle is the same.
When the cached table is generated, all entries are computed at once. Now we only need to compute one of them: the one for the portion the algorithm has currently matched.
next algorithm
function next(str) {
  var prefix = [];
  var suffix = [];
  var partMatch;
  var i = str.length;
  var newStr = str.substring(0, i + 1);
  // k < i - 1: the whole string is neither a prefix nor a suffix of itself
  for (var k = 0; k < i - 1; k++) {
    // prefix and suffix of length k + 1
    prefix[k] = newStr.slice(0, k + 1);
    suffix[k] = newStr.slice(-k - 1);
    if (prefix[k] === suffix[k]) {
      partMatch = prefix[k].length;
    }
  }
  if (!partMatch) {
    partMatch = 0;
  }
  return partMatch;
}
In fact, it is the same as building the matching table, only with the outer loop removed: the function works directly on the string that has been matched so far.
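The two approaches can also be combined: compute a value only when it is first needed, then remember it. The sketch below illustrates that trade-off under stated assumptions; createCachedNext is a hypothetical helper, not part of the article's code, and it uses a Map, which requires an ES2015 environment.

```javascript
// createCachedNext is a hypothetical helper used only for this sketch.
// It returns a function that computes a partial match value on demand
// and remembers the result for later calls with the same string.
function createCachedNext() {
  var cache = new Map(); // matched string -> partial match value
  return function (matched) {
    if (cache.has(matched)) {
      return cache.get(matched);
    }
    var value = 0;
    // try every proper prefix/suffix length: 1 .. matched.length - 1
    for (var len = 1; len < matched.length; len++) {
      if (matched.slice(0, len) === matched.slice(-len)) {
        value = len; // lengths increase, so the last hit is the longest
      }
    }
    cache.set(matched, value);
    return value;
  };
}

var cachedNext = createCachedNext();
console.log(cachedNext("ABCDAB"));  // 2
console.log(cachedNext("aaronaa")); // 2
```

Only the portions of the pattern that actually fail during a search are ever computed, at the cost of keeping the cache in memory.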
Complete KMP. next Algorithm
<!doctype html>
<div id="testnext">
<div>
<script type="text/javascript">
function next(str) {
  var prefix = [];
  var suffix = [];
  var partMatch;
  var i = str.length;
  var newStr = str.substring(0, i + 1);
  // k < i - 1: the whole string is neither a prefix nor a suffix of itself
  for (var k = 0; k < i - 1; k++) {
    // prefix and suffix of length k + 1
    prefix[k] = newStr.slice(0, k + 1);
    suffix[k] = newStr.slice(-k - 1);
    if (prefix[k] === suffix[k]) {
      partMatch = prefix[k].length;
    }
  }
  if (!partMatch) {
    partMatch = 0;
  }
  return partMatch;
}

function KMP(sourceStr, searchStr) {
  var sourceLength = sourceStr.length;
  var searchLength = searchStr.length;
  var result;
  var i = 0;
  var j = 0;
  for (; i < sourceLength; i++) { // outermost loop: main string
    // subloop: pattern string
    for (j = 0; j < searchLength; j++) {
      // if the character matches the main string
      if (searchStr.charAt(j) === sourceStr.charAt(i)) {
        if (j === searchLength - 1) {
          // the whole pattern has matched
          result = i - j;
          break;
        } else {
          // matched so far: advance the main string index and continue the loop
          i++;
        }
      } else {
        if (j > 1) {
          // shift = matched characters (j) - partial match value of the matched part;
          // the outer loop's i++ supplies the final + 1
          i = i - next(searchStr.slice(0, j)) - 1;
        } else {
          // move one position: go back to the start of this attempt
          i = i - j;
        }
        break;
      }
    }
    if (result || result === 0) {
      break;
    }
  }
  if (result || result === 0) {
    return result;
  } else {
    return -1;
  }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDAB";

show('indexOf', function() { return s.indexOf(t); });
show('KMP.next', function() { return KMP(s, t); });

function show(bf_name, fn) {
  var myDate = +new Date();
  var r = fn();
  var div = document.createElement('div');
  div.innerHTML = bf_name + ' algorithm, search location: ' + r + ", time consumed: " + (+new Date() - myDate) + "ms";
  document.getElementById("testnext").appendChild(div);
}
</script>
</div>
</div>
Git code download: https://github.com/JsAaron/data_structure