This article, part (V) of the series on data structures and algorithms in JavaScript, introduces the classic KMP algorithm and explains it in detail; readers who need it may find it a useful reference.
KMP Algorithm and BM algorithm
KMP is the classic prefix-matching algorithm and BM is the classic suffix-matching algorithm. The only difference between prefix matching and suffix matching is the order in which characters are compared.
Prefix matching: the pattern string is compared with the text (parent) string from left to right, and the pattern also moves from left to right.
Suffix matching: the pattern string is compared with the text string from right to left, while the pattern still moves from left to right.
The BF algorithm from the previous chapter is clearly also a prefix algorithm, but a very naive one: it tries every position one by one, so its O(mn) efficiency hardly needs comment. Many explanations of KMP online take such a lofty, abstract route that readers come away confused, so I will try to describe it in the most down-to-earth way I can, based on my own understanding.
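For reference, a brute-force (BF) search can be sketched as follows. This is not the previous chapter's code, which is not reproduced here, and `bfSearch` is a hypothetical name. After every mismatch it moves the pattern by exactly one position, which is where the O(mn) worst case comes from.

```javascript
// Minimal brute-force (BF) search sketch: try every start position and
// compare the pattern left to right. Worst case O(m * n).
function bfSearch(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  // Each start position s is tried in turn; after a mismatch we move by 1.
  for (let s = 0; s + m <= n; s++) {
    let j = 0;
    while (j < m && text[s + j] === pattern[j]) {
      j++;
    }
    if (j === m) return s; // full match found at start position s
  }
  return -1; // no match anywhere
}
```

Like `String.prototype.indexOf`, it returns the first match position or -1.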
KMP
KMP is also an optimized prefix algorithm. The name comes from the initials of its three authors: Knuth, Morris, and Pratt. Compared with BF, KMP's optimization lies in the distance the pattern moves after each mismatch: BF always moves by 1, while KMP adjusts the distance dynamically, so it is not necessarily 1.
Figure: the difference between BF and KMP, two prefix algorithms
Comparing the two in the figure, we find:
When searching for pattern string P in text string T, the two disagree at the 6th letter, C. At that point BF moves the whole pattern P by one position, while KMP moves it by two.
We know how BF matching works, but why does KMP move two positions, and not one, three, or four?
This is what the previous figure shows. The pattern P matched "ababa" correctly, and the error occurs at "c". The idea of KMP is that "ababa" is information from a correct match; by using it, we avoid moving the search position back to positions that have already been compared and keep moving forward instead, which improves efficiency.
So the question is: how do we know how many positions to move?
The authors of KMP summed up the shift rule as follows:
Positions to move = number of matched characters - corresponding partial match value
The shift rule depends only on the pattern string and has nothing to do with the text string; this deserves special attention.
So how should we understand the "number of matched characters" and the "corresponding partial match value"?
Matched characters:
```
T: abababaabab
P: ababacb
```
The matched characters in P are its first five, "ababa"; this part is easy to understand.
Partial match value:
This is the core of the algorithm, and also the harder part to understand.
Suppose:
```
T: aaronaabbcc
P: aaronaac
```
Looking at this text, if matching fails at C, where is the most reasonable position to move the pattern to next?
```
aaronaabbcc
     aaronaac
```
That is to say: when a run of characters in the pattern repeats (here the "aa" at the start reappears just before the mismatch), the search can naturally skip the content in between and line up the repeated part; this idea is reasonable.
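The "most reasonable move" can be checked by brute force. The sketch below (the name `smallestSafeShift` is hypothetical, not from the article) tries shifts 1, 2, 3, ... and keeps the first one at which the characters already matched still agree with the shifted pattern. It only illustrates the idea; it is not part of KMP itself.

```javascript
// Illustration only: after `matched` characters of `pattern` have matched
// the text, find the smallest shift s (1..matched) at which the text
// characters we already know still agree with the shifted pattern, i.e.
// known.slice(s) equals the pattern's first (matched - s) characters.
function smallestSafeShift(pattern, matched) {
  const known = pattern.slice(0, matched); // text characters already seen
  for (let s = 1; s <= matched; s++) {
    if (known.slice(s) === pattern.slice(0, matched - s)) {
      return s;
    }
  }
  return 1; // matched === 0: nothing is known, so move by one as BF does
}
```

For `aaronaac` with 7 characters matched it returns 5, which lines the leading "aa" up with the "aa" just before the mismatch; for `ababacb` with 5 characters matched it returns 2, the two-position move from the first example.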
Knowing this rule, the partial match table algorithm is as follows:
First, understand two concepts: "prefix" and "suffix". A "prefix" is any head of a string that leaves out at least the last character; a "suffix" is any tail of a string that leaves out at least the first character.
The "partial match value" is the length of the longest element common to the set of prefixes and the set of suffixes.
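The definition translates almost directly into code. This is a minimal sketch with hypothetical names (`partialMatchValue`, `shiftFor`); it tries the longest candidate first and then applies the shift rule stated earlier (positions to move = matched characters - partial match value).

```javascript
// Partial match value: length of the longest string that is both a
// prefix (leaves out the last character) and a suffix (leaves out the first).
function partialMatchValue(str) {
  // Try the longest candidate first and stop at the first hit.
  for (let len = str.length - 1; len > 0; len--) {
    if (str.slice(0, len) === str.slice(str.length - len)) {
      return len;
    }
  }
  return 0;
}

// Shift rule: positions to move = matched characters - partial match value
function shiftFor(matchedStr) {
  return matchedStr.length - partialMatchValue(matchedStr);
}
```

For "ababa" the value is 3 ("aba"), so the shift is 5 - 3 = 2; for "aaronaa" it is 2 ("aa"), so the shift is 7 - 2 = 5.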
First, here is how BF breaks up "aaronaac" while matching it:
BF decomposition: a, aa, aar, aaro, aaron, aarona, aaronaa, aaronaac
What about KMP? This is where prefixes and suffixes come in.
Let's look at the resulting KMP partial match table:
```
 a  a  r  o  n  a  a  c
[0, 1, 0, 0, 0, 1, 2, 0]
```
This is probably confusing at first. No hurry: let's break it down into prefixes and suffixes.
```
Matched string: "aaron"
Prefixes: a, aa, aar, aaro
Suffixes: aron, ron, on, n
```
How far to move: for each matched string, we compare its prefixes and suffixes for equality and take the length of the longest one they share.
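The decomposition can be reproduced mechanically. In this sketch (all function names are hypothetical), `properPrefixes` and `properSuffixes` list the candidates in the same order as the "aaron" example, and `commonElements` keeps those present in both lists; the partial match value is the length of the longest of them.

```javascript
// Prefixes: heads of the string that leave out at least the last character.
function properPrefixes(str) {
  const out = [];
  for (let k = 1; k < str.length; k++) out.push(str.slice(0, k));
  return out;
}

// Suffixes: tails that leave out at least the first character,
// longest first, as in the "aron, ron, on, n" listing.
function properSuffixes(str) {
  const out = [];
  for (let k = 1; k < str.length; k++) out.push(str.slice(k));
  return out;
}

// Elements that are both a prefix and a suffix, shortest first.
function commonElements(str) {
  const suffixes = new Set(properSuffixes(str));
  return properPrefixes(str).filter((p) => suffixes.has(p));
}
```

For "aaron" there are no common elements, so its value is 0; for "aaronaa" the common elements are "a" and "aa", giving 2.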
Decomposing the partial match table
Partial match table computation in KMP, where P denotes the prefixes, N the suffixes, and R the result:
```
a,        P => [], N => [], R = 0
aa,       P => [a], N => [a], R = a.length = 1
aar,      P => [a, aa], N => [r, ar], R = 0
aaro,     P => [a, aa, aar], N => [o, ro, aro], R = 0
aaron,    P => [a, aa, aar, aaro], N => [n, on, ron, aron], R = 0
aarona,   P => [a, aa, aar, aaro, aaron], N => [a, na, ona, rona, arona], R = a.length = 1
aaronaa,  P => [a, aa, aar, aaro, aaron, aarona], N => [a, aa, naa, onaa, ronaa, aronaa], R = Math.max(a.length, aa.length) = 2
aaronaac, P => [a, aa, aar, aaro, aaron, aarona, aaronaa], N => [c, ac, aac, naac, onaac, ronaac, aronaac], R = 0
```
Just as BF decomposes the pattern, we first decompose it at every possible match position and cache the results; during matching, this "partial match table" tells us how many positions to move.
That is how the final table for aaronaac, [0, 1, 0, 0, 0, 1, 2, 0], is obtained.
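Enumerating every prefix and suffix, as above, repeats a lot of string comparison. The usual way to build the same table in linear time is the classic failure-function recurrence, which is a different technique from the enumeration described in this article. In the sketch below (`buildPartialMatchTable` is a hypothetical name), each entry reuses the entries already computed to fall back to shorter common prefix/suffix lengths.

```javascript
// Builds the partial match table in O(m) with the failure-function
// recurrence (not the prefix/suffix enumeration used in the article).
function buildPartialMatchTable(pattern) {
  const table = new Array(pattern.length).fill(0);
  let len = 0; // length of the longest prefix that is also a suffix so far
  for (let i = 1; i < pattern.length; i++) {
    // Fall back to shorter candidates until pattern[i] can extend one
    while (len > 0 && pattern[i] !== pattern[len]) {
      len = table[len - 1];
    }
    if (pattern[i] === pattern[len]) {
      len++;
    }
    table[i] = len;
  }
  return table;
}
```

It yields [0, 1, 0, 0, 0, 1, 2, 0] for "aaronaac", the same table as the decomposition.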
Below we implement KMP in JavaScript in two ways:
KMP implementation (I): KMP with a cached partial match table
KMP implementation (II): KMP that computes next dynamically
KMP implementation (I)
Partial match table
The most important part of KMP is the partial match table: without it, the algorithm is just BF; add the table and it becomes KMP.
The partial match table determines how far to move next.
Following the partial match table rules above, we design a kmpGetStrPartMatchValue method:
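```javascript
// Build the partial match table: entry i is the partial match value
// of the pattern's first i + 1 characters
function kmpGetStrPartMatchValue(str) {
  var prefix = [];
  var suffix = [];
  var partMatch = [];
  for (var i = 0, j = str.length; i < j; i++) {
    // Decompose: a -> aa -> aar -> ...
    var newStr = str.substring(0, i + 1);
    if (newStr.length === 1) {
      partMatch[i] = 0;
    } else {
      for (var k = 0; k < i; k++) {
        // Prefix of length k + 1 (never the whole string, since k < i)
        prefix[k] = newStr.slice(0, k + 1);
        // Suffix of length k + 1
        suffix[k] = newStr.slice(-k - 1);
        // If equal, record the length; longer matches overwrite shorter ones
        if (prefix[k] === suffix[k]) {
          partMatch[i] = prefix[k].length;
        }
      }
      if (!partMatch[i]) {
        partMatch[i] = 0;
      }
    }
  }
  return partMatch;
}
```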
This follows the partial match table algorithm exactly: str.substring(0, i + 1) decomposes the pattern into a -> aa -> aar -> aaro -> aaron -> aarona -> aaronaa -> aaronaac.
Then, for each piece, the length of the longest common element of its prefixes and suffixes is computed.
Fallback algorithm
KMP is also a prefix algorithm, so we can reuse the whole BF routine. The only change is the fallback: BF simply adds 1, while KMP uses the partial match table to compute how far to move next.
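```javascript
// Inner loop over the pattern (excerpt from the complete KMP below)
for (var j = 0; j < searchLength; j++) {
  // Current pattern character matches the main string
  if (searchStr.charAt(j) === sourceStr.charAt(i)) {
    // The whole pattern has matched
    if (j === searchLength - 1) {
      result = i - j;
      break;
    } else {
      // Partial match: keep looping; i++ advances the index into the main string
      i++;
    }
  } else {
    // Mismatch: i was advanced while matching, so rewind it.
    // New start = old start (i - j) + (j - part[j - 1]) = i - part[j - 1];
    // subtract 1 because the outer loop's i++ runs next.
    // When j === 0 nothing matched, and the outer i++ moves one position, as in BF.
    if (j > 0) {
      i = i - part[j - 1] - 1;
    }
    break;
  }
}
```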
The mismatch branch is the key point of KMP: next shift = number of matched characters (j) - corresponding partial match value (part[j - 1]).
A complete KMP algorithm
```html
<!doctype html>
<div id="test2">
<div>
<script type="text/javascript">
  // Build the partial match table for the pattern
  function kmpGetStrPartMatchValue(str) {
    var partMatch = [];
    for (var i = 0, j = str.length; i < j; i++) {
      var newStr = str.substring(0, i + 1);
      partMatch[i] = 0;
      for (var k = 0; k < i; k++) {
        // Compare the prefix and suffix of length k + 1
        if (newStr.slice(0, k + 1) === newStr.slice(-k - 1)) {
          partMatch[i] = k + 1;
        }
      }
    }
    return partMatch;
  }

  function KMP(sourceStr, searchStr) {
    // Generate the partial match table
    var part = kmpGetStrPartMatchValue(searchStr);
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;

    // Outer loop: start positions in the main string
    for (; i < sourceLength; i++) {
      // Inner loop: walk the pattern
      for (j = 0; j < searchLength; j++) {
        if (searchStr.charAt(j) === sourceStr.charAt(i)) {
          if (j === searchLength - 1) {
            // Whole pattern matched
            result = i - j;
            break;
          }
          // Partial match: advance in the main string
          i++;
        } else {
          // Mismatch: move the start by (j - part[j - 1]); the outer i++ adds the final 1.
          // When j === 0 the outer i++ simply moves one position, as in BF.
          if (j > 0) {
            i = i - part[j - 1] - 1;
          }
          break;
        }
      }
      if (result !== undefined) {
        break;
      }
    }
    return result !== undefined ? result : -1;
  }

  var s = "bbc abcdab abcdabcdabde";
  var t = "abcdabd";
  show('indexOf', function () { return s.indexOf(t); });
  show('KMP', function () { return KMP(s, t); });

  // Run fn, then print its result and elapsed time into #test2
  function show(bfName, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bfName + ' algorithm, search position: ' + r + ', time: ' + (+new Date() - myDate) + ' ms';
    document.getElementById('test2').appendChild(div);
  }
</script>
</div>
</div>
```
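The version above rewinds `i` after each mismatch and then compares again the characters covered by the partial match value. For comparison, the more common textbook formulation only ever moves the text index forward and lets the pattern index fall back through the table. The sketch below is that textbook variant, not the article's code; `kmpSearch` and `buildPartialMatchTable` are hypothetical names, and the table builder is included so the block runs on its own (for example under Node.js, without a DOM).

```javascript
// Partial match table via the failure-function recurrence.
function buildPartialMatchTable(pattern) {
  const table = new Array(pattern.length).fill(0);
  let len = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (len > 0 && pattern[i] !== pattern[len]) len = table[len - 1];
    if (pattern[i] === pattern[len]) len++;
    table[i] = len;
  }
  return table;
}

// Textbook-style KMP sketch: the text index i never moves backwards;
// on a mismatch only the count of matched pattern characters (j) falls back.
function kmpSearch(text, pattern) {
  if (pattern.length === 0) return 0; // mirror String.prototype.indexOf
  const table = buildPartialMatchTable(pattern);
  let j = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    // Shift the pattern by (j - table[j - 1]) without re-reading the text
    while (j > 0 && text[i] !== pattern[j]) {
      j = table[j - 1];
    }
    if (text[i] === pattern[j]) {
      j++;
    }
    if (j === pattern.length) {
      return i - pattern.length + 1; // start of the first match
    }
  }
  return -1;
}
```

`kmpSearch("bbc abcdab abcdabcdabde", "abcdabd")` returns 15, the same position `indexOf` reports. Because `i` never moves backwards, each text character is read once in the main loop, and the total work stays linear in the length of the text plus the pattern.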
KMP implementation (II)
The first KMP implementation is straightforward: it caches the partial match table, the common trade of space for time. The other approach computes the value on the fly: given the specific substring that has been matched so far, it calculates that substring's partial match value. The principle is the same.
Building the cached table computes every entry up front; here we compute only the single entry needed at each mismatch.
Next algorithm
```javascript
// Compute the partial match value of one matched substring on demand
function next(str) {
  var prefix = [];
  var suffix = [];
  var partMatch = 0;
  // k + 1 runs from 1 to str.length - 1, so the whole string is never
  // compared with itself (which would always "match")
  for (var k = 0; k < str.length - 1; k++) {
    prefix[k] = str.slice(0, k + 1);
    suffix[k] = str.slice(-k - 1);
    // Longer matches overwrite shorter ones
    if (prefix[k] === suffix[k]) {
      partMatch = prefix[k].length;
    }
  }
  return partMatch;
}
```
It is the same as the partial match table computation with the outer loop removed: it works directly on the substring that has just been matched successfully.
A complete KMP.next algorithm
```html
<!doctype html>
<div id="Testnext">
<div>
<script type="text/javascript">
  // Partial match value of the matched substring, computed on demand
  function next(str) {
    var partMatch = 0;
    // Only prefixes/suffixes shorter than the whole string: lengths 1 .. str.length - 1
    for (var k = 0; k < str.length - 1; k++) {
      if (str.slice(0, k + 1) === str.slice(-k - 1)) {
        partMatch = k + 1;
      }
    }
    return partMatch;
  }

  function KMP(sourceStr, searchStr) {
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;

    // Outer loop: start positions in the main string
    for (; i < sourceLength; i++) {
      // Inner loop: walk the pattern
      for (j = 0; j < searchLength; j++) {
        if (searchStr.charAt(j) === sourceStr.charAt(i)) {
          if (j === searchLength - 1) {
            // Whole pattern matched
            result = i - j;
            break;
          }
          // Partial match: advance in the main string
          i++;
        } else {
          // Mismatch: compute next() for the j matched characters and move the
          // start by (j - next); the outer i++ adds the final 1.
          // When j === 0 the outer i++ simply moves one position, as in BF.
          if (j > 0) {
            i = i - next(searchStr.slice(0, j)) - 1;
          }
          break;
        }
      }
      if (result !== undefined) {
        break;
      }
    }
    return result !== undefined ? result : -1;
  }

  var s = "bbc abcdab abcdabcdabde";
  var t = "abcdab";
  show('indexOf', function () { return s.indexOf(t); });
  show('KMP.next', function () { return KMP(s, t); });

  // Run fn, then print its result and elapsed time into #Testnext
  function show(bfName, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bfName + ' algorithm, search position: ' + r + ', time: ' + (+new Date() - myDate) + ' ms';
    document.getElementById('Testnext').appendChild(div);
  }
</script>
</div>
</div>
```