Data Structures and Algorithms in JavaScript (V): The Classic KMP Algorithm

Source: Internet
Author: User
Tags comparison

This article is part (V) of the series on data structures and algorithms in JavaScript and introduces the classic KMP algorithm. It explains the idea behind KMP and walks through two implementations; readers who need it can use it as a reference.

KMP algorithm and BM algorithm

KMP is the classic prefix-matching algorithm and BM is the classic suffix-matching algorithm. The only difference between prefix matching and suffix matching is the order of comparison.

Prefix matching: the pattern string is compared with the parent string (the text) from left to right, and the pattern string also moves from left to right.

Suffix matching: the pattern string is compared with the parent string from right to left, while the pattern string still moves from left to right.
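To make the difference in comparison order concrete, here is a minimal sketch. The names firstMismatchLeftToRight and firstMismatchRightToLeft are hypothetical helpers, not part of this article's code. For one fixed alignment of the pattern, each returns the pattern index at which a mismatch is first detected, or -1 if the whole window matches.

```javascript
// Hypothetical helpers: compare one alignment of the pattern against the text.
// "start" is the text index where the pattern's first character is placed.

// Prefix-style comparison: scan the window from left to right.
function firstMismatchLeftToRight(text, pattern, start) {
  for (let j = 0; j < pattern.length; j++) {
    if (text[start + j] !== pattern[j]) return j;
  }
  return -1; // the whole window matched
}

// Suffix-style comparison: scan the same window from right to left.
function firstMismatchRightToLeft(text, pattern, start) {
  for (let j = pattern.length - 1; j >= 0; j--) {
    if (text[start + j] !== pattern[j]) return j;
  }
  return -1; // the whole window matched
}

console.log(firstMismatchLeftToRight("abababaabab", "ababacb", 0)); // 5
console.log(firstMismatchRightToLeft("abababaabab", "ababacb", 0)); // 6
```

Both functions look at the same window; only the order in which characters are compared differs, so they can detect the mismatch at different positions.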

As the previous chapter showed, the BF (brute-force) algorithm is also a prefix algorithm, but a very crude one: it matches character by character, and its O(MN) efficiency needs no further comment. There are plenty of KMP explanations online, but most of them take such a lofty route that they leave you confused, so I will try to describe it in the most down-to-earth way I can, based on my own understanding.
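As a baseline for what follows, here is a minimal sketch of BF matching. It assumes the previous chapter's algorithm works roughly like this; the name bruteForceSearch is hypothetical. After every failed attempt the pattern moves exactly one position, which is where the O(MN) worst case comes from.

```javascript
// Hypothetical BF baseline: try every start position, moving one step at a time.
function bruteForceSearch(text, pattern) {
  for (let start = 0; start + pattern.length <= text.length; start++) {
    let j = 0;
    // Compare left to right until a mismatch or the end of the pattern.
    while (j < pattern.length && text[start + j] === pattern[j]) {
      j++;
    }
    if (j === pattern.length) return start; // every character matched
  }
  return -1; // no match anywhere
}

console.log(bruteForceSearch("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // 15
console.log(bruteForceSearch("abababaabab", "ababacb")); // -1
```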

KMP

KMP is an optimized prefix algorithm. It is named KMP after the initials of its three authors: Knuth, Morris, and Pratt. Compared with BF, what KMP optimizes is the distance moved after each mismatch: it adjusts the shift of the pattern string dynamically. BF always moves by 1; KMP does not necessarily.

[Figure: the difference between the BF and KMP prefix algorithms]

Comparing the two in the figure, we find:

When searching for the pattern string P in the text string T, a mismatch is found at the 6th letter, c. BF then moves the whole pattern string P by one position, whereas KMP moves it by two.

We know how BF matching works, but why does KMP move two positions rather than one, three, or four?

The figure explains it: the pattern string P matches correctly through ababa and fails at c. The idea of KMP is that ababa has already been matched correctly, so can we use this information to avoid moving the "search position" back to positions that have already been compared, and keep moving it forward instead? That is what improves efficiency.

So the question is: how do we know how many positions to move?

The authors of KMP summed up this shift rule for us:

The code is as follows:

Shift distance = number of matched characters - corresponding partial match value

The shift rule depends only on the substring (the pattern string); it has nothing whatsoever to do with the text string. This deserves special attention.

So how should we understand the number of matched characters in the substring and the corresponding partial match value?

Matched characters:

The code is as follows:

T:abababaabab

P:ababacb

The leading ababa in P (marked in red in the original) is the part that has already been matched, which is easy to understand.

Partial match values:

This is the core of the algorithm, and also the harder part to understand.

Suppose:

The code is as follows:

T:aaronaabbcc

P:aaronaac

Looking at this text, if matching fails at c, where is the most reasonable place to move the pattern next, given the structure matched so far?

The code is as follows:

aaronaabbcc

     aaronaac

In other words, if a run of characters at the start of the pattern string appears again at the end of the part already matched, the positions in between can be skipped. This idea is reasonable.

With this rule in mind, the algorithm for the partial match table is as follows:

First, understand two concepts: "prefix" and "suffix". The "prefixes" of a string are all of its leading substrings that exclude the last character; the "suffixes" are all of its trailing substrings that exclude the first character.

The "partial match value" is the length of the longest element common to the prefixes and the suffixes.
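The definition can be written down almost literally. The sketch below is an illustration only, and the name partialMatchValue is hypothetical: it collects the prefixes, walks through the suffixes from shortest to longest, and keeps the length of the longest string that appears in both.

```javascript
// Hypothetical helper: partial match value of one string, straight from the definition.
function partialMatchValue(str) {
  // All prefixes that exclude the last character.
  const prefixes = new Set();
  for (let len = 1; len < str.length; len++) {
    prefixes.add(str.slice(0, len));
  }
  // All suffixes that exclude the first character, shortest first,
  // so the last hit is the longest common element.
  let best = 0;
  for (let len = 1; len < str.length; len++) {
    const suffix = str.slice(str.length - len);
    if (prefixes.has(suffix)) best = len;
  }
  return best;
}

console.log(partialMatchValue("aaron"));   // 0
console.log(partialMatchValue("aaronaa")); // 2 ("aa")
console.log(partialMatchValue("ababa"));   // 3 ("aba")
```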

Let's look at how aaronaac is split up when matching with BF:

BF displacement: a, aa, aar, aaro, aaron, aarona, aaronaa, aaronaac

What about KMP? This is where prefixes and suffixes come in.

Let's first look at the resulting KMP partial match table:

The code is as follows:

a a r o n a a c

[0, 1, 0, 0, 0, 1, 2, 0]

This is bound to be confusing at first. No hurry; let's break it down into prefixes and suffixes.

The code is as follows:

Match string: "aaron"

Prefixes: a, aa, aar, aaro

Suffixes: aron, ron, on, n

Move position: for each matched string, compare its prefixes and suffixes for equality, and take the length of the longest one that is equal.

Decomposition of the partial match table

The matching table algorithm in KMP, where p denotes the prefixes, n the suffixes, and r the result:

The code is as follows:

a, p=>[], n=>[], r = 0

aa, p=>[a], n=>[a], r = a.length => 1

aar, p=>[a,aa], n=>[r,ar], r = 0

aaro, p=>[a,aa,aar], n=>[o,ro,aro], r = 0

aaron, p=>[a,aa,aar,aaro], n=>[n,on,ron,aron], r = 0

aarona, p=>[a,aa,aar,aaro,aaron], n=>[a,na,ona,rona,arona], r = a.length = 1

aaronaa, p=>[a,aa,aar,aaro,aaron,aarona], n=>[a,aa,naa,onaa,ronaa,aronaa], r = Math.max(a.length, aa.length) = 2

aaronaac, p=>[a,aa,aar,aaro,aaron,aarona,aaronaa], n=>[c,ac,aac,naac,onaac,ronaac,aronaac], r = 0

Just as with the BF algorithm, the pattern is first decomposed at every possible index position and the results are cached. During matching, this "partial match table" is used to find how many positions to move.

So the final matching table for aaronaac is 0,1,0,0,0,1,2,0.

Next we implement KMP in JS, in two versions:

KMP implementation (i): KMP with a cached matching table

KMP implementation (ii): KMP that computes next dynamically

KMP implementation (i)

Matching tables

The most important part of the KMP algorithm is the matching table. Without the matching table it is just the BF implementation; with it, it is KMP.

The matching table determines the next shift count.

Following the matching table rules above, we design a kmpGetStrPartMatchValue method:

function kmpGetStrPartMatchValue(str) {
    var prefix = [];
    var suffix = [];
    var partMatch = [];
    for (var i = 0, j = str.length; i < j; i++) {
        // Leading piece of the pattern: a, aa, aar, ...
        var newStr = str.substring(0, i + 1);
        if (newStr.length == 1) {
            partMatch[i] = 0;
        } else {
            for (var k = 0; k < i; k++) {
                // prefix of length k + 1
                prefix[k] = newStr.slice(0, k + 1);
                // suffix of length k + 1
                suffix[k] = newStr.slice(-k - 1);
                // If they are equal, record the length in the result set;
                // k increases, so the longest match is kept
                if (prefix[k] == suffix[k]) {
                    partMatch[i] = prefix[k].length;
                }
            }
            if (!partMatch[i]) {
                partMatch[i] = 0;
            }
        }
    }
    return partMatch;
}

This follows the KMP matching table algorithm exactly: str.substring(0, i + 1) decomposes the string into a -> aa -> aar -> aaro -> aaron -> aarona -> aaronaa -> aaronaac.

Then, for each piece, the length of the longest common element of its prefixes and suffixes is calculated.
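For comparison, the same table can also be built in a single pass. The sketch below is the common textbook construction (often called the prefix function or failure function), not the method used in this article, and the name buildPartialMatchTable is hypothetical. It reuses entries that are already computed instead of re-slicing every prefix and suffix.

```javascript
// Textbook single-pass construction of the partial match table.
// Not the article's method; shown only as an alternative for comparison.
function buildPartialMatchTable(pattern) {
  const table = new Array(pattern.length).fill(0);
  let len = 0; // length of the current longest common prefix/suffix
  for (let i = 1; i < pattern.length; i++) {
    // On a mismatch, fall back to the next shorter candidate.
    while (len > 0 && pattern[i] !== pattern[len]) {
      len = table[len - 1];
    }
    if (pattern[i] === pattern[len]) {
      len++;
    }
    table[i] = len;
  }
  return table;
}

console.log(buildPartialMatchTable("aaronaac")); // [0, 1, 0, 0, 0, 1, 2, 0]
```

The output agrees with the table derived by hand above. The slicing version is easier to read against the definition; this one avoids creating substrings.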

Fallback algorithm

KMP is also a prefix algorithm, so the whole BF routine can be carried over. The only change is the fallback step: where BF simply adds 1, KMP computes the next value from the matching table.

// Sub-loop over the pattern (substring)
for (var j = 0; j < searchLength; j++) {
    // The pattern character matches the main string
    if (searchStr.charAt(j) == sourceStr.charAt(i)) {
        // The whole pattern has been matched
        if (j == searchLength - 1) {
            result = i - j;
            break;
        } else {
            // Matched so far: keep looping; i++ advances the main string index
            i++;
        }
    } else {
        // During this attempt i has been advanced j times
        if (j > 1 && part[j - 1] > 0) {
            // Shift by j - part[j - 1]: the next attempt starts at i - part[j - 1];
            // the outer loop's i++ supplies the final +1
            i = i - part[j - 1] - 1;
        } else {
            // Move one position, as in BF
            i = (i - j);
        }
        break;
    }
}

The key point of KMP (marked in red in the original) is the mismatch branch. Next value = number of matched characters - corresponding partial match value
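As a worked check of this formula, the sketch below applies it to the two examples used earlier. The name shiftDistance is hypothetical. The table for aaronaac is the one derived above; the table for ababacb is worked out by hand here under the same rules and is an assumption of this sketch.

```javascript
// Hypothetical helper: how far to move the pattern after a mismatch.
// "matched" is the number of pattern characters matched before the mismatch.
function shiftDistance(table, matched) {
  if (matched === 0) return 1; // nothing matched: move one position, as in BF
  return matched - table[matched - 1];
}

// P = ababacb fails at c after matching "ababa" (5 characters, value 3).
const tableAbabacb = [0, 0, 1, 2, 3, 0, 0]; // hand-computed for this sketch
console.log(shiftDistance(tableAbabacb, 5)); // 2

// P = aaronaac fails at c after matching "aaronaa" (7 characters, value 2).
const tableAaronaac = [0, 1, 0, 0, 0, 1, 2, 0];
console.log(shiftDistance(tableAaronaac, 7)); // 5
```

The first result is the two-position move from the BF versus KMP comparison; the second is the five-position move for aaronaac.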

A complete KMP algorithm

<!doctype html>
<div id="test2"><div>
<script type="text/javascript">
function kmpGetStrPartMatchValue(str) {
    var prefix = [];
    var suffix = [];
    var partMatch = [];
    for (var i = 0, j = str.length; i < j; i++) {
        var newStr = str.substring(0, i + 1);
        if (newStr.length == 1) {
            partMatch[i] = 0;
        } else {
            for (var k = 0; k < i; k++) {
                // prefix
                prefix[k] = newStr.slice(0, k + 1);
                // suffix
                suffix[k] = newStr.slice(-k - 1);
                if (prefix[k] == suffix[k]) {
                    partMatch[i] = prefix[k].length;
                }
            }
            if (!partMatch[i]) {
                partMatch[i] = 0;
            }
        }
    }
    return partMatch;
}

function KMP(sourceStr, searchStr) {
    // Generate the matching table
    var part = kmpGetStrPartMatchValue(searchStr);
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;

    for (; i < sourceLength; i++) { // outermost loop, main string
        // Sub-loop over the pattern
        for (j = 0; j < searchLength; j++) {
            // The pattern character matches the main string
            if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                // The whole pattern has been matched
                if (j == searchLength - 1) {
                    result = i - j;
                    break;
                } else {
                    // Matched so far: keep looping; i++ advances the main string index
                    i++;
                }
            } else {
                // During this attempt i has been advanced j times
                if (j > 1 && part[j - 1] > 0) {
                    // Next attempt starts at i - part[j - 1]; the outer i++ adds the final +1
                    i = i - part[j - 1] - 1;
                } else {
                    // Move one position
                    i = (i - j);
                }
                break;
            }
        }
        if (result || result == 0) {
            break;
        }
    }

    if (result || result == 0) {
        return result;
    } else {
        return -1;
    }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDABD";

show('indexOf', function() { return s.indexOf(t); });
show('KMP', function() { return KMP(s, t); });

function show(bf_name, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bf_name + ' algorithm, match position: ' + r + ', time taken: ' + (+new Date() - myDate) + 'ms';
    document.getElementById("test2").appendChild(div);
}
</script>
</div></div>
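The implementation above moves the pattern forward by the computed shift and then compares again from the start of the pattern. The usual textbook formulation of KMP goes one step further and never moves the text index backwards, which is the "do not move the search position back" idea described earlier. The sketch below shows that variant for comparison; it is not this article's code, the name kmpSearchWithTable is hypothetical, and the table for ABCDABD is worked out by hand for this sketch.

```javascript
// Textbook KMP search: the text index i only ever moves forward.
// "table" is the partial match table of the pattern.
function kmpSearchWithTable(text, pattern, table) {
  if (pattern.length === 0) return 0;
  let j = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    // On a mismatch, keep the longest prefix that is still consistent.
    while (j > 0 && text[i] !== pattern[j]) {
      j = table[j - 1];
    }
    if (text[i] === pattern[j]) {
      j++;
    }
    if (j === pattern.length) {
      return i - j + 1; // start index of the match
    }
  }
  return -1;
}

const tableAbcdabd = [0, 0, 0, 0, 1, 2, 0]; // hand-computed for "ABCDABD"
console.log(kmpSearchWithTable("BBC ABCDAB ABCDABCDABDE", "ABCDABD", tableAbcdabd)); // 15
```

Because j falls back through the table instead of i being reset, each text character is examined a bounded number of times, which gives linear running time in the length of the text.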

KMP (ii)

The first KMP implementation looks up a cached matching table, the usual trade of space for time. The other approach computes the value on the fly: pass in the part of the string matched so far and calculate its match value. The principle is the same.

The cached table computes every entry up front; now we compute just one of them on demand, so all we need is an algorithm that works out the match value for the current position.

The next algorithm

function next(str) {
    var prefix = [];
    var suffix = [];
    var partMatch;
    var i = str.length;
    var newStr = str.substring(0, i + 1);
    // k < i - 1: only prefixes and suffixes shorter than the whole string are compared
    for (var k = 0; k < i - 1; k++) {
        // prefix
        prefix[k] = newStr.slice(0, k + 1);
        // suffix
        suffix[k] = newStr.slice(-k - 1);
        if (prefix[k] == suffix[k]) {
            partMatch = prefix[k].length;
        }
    }
    if (!partMatch) {
        partMatch = 0;
    }
    return partMatch;
}

It is essentially the same as the matching table, with the outer loop removed: it works directly on the part of the string that has been matched so far.

A complete KMP.next algorithm

<!doctype html>
<div id="Testnext"><div>
<script type="text/javascript">
function next(str) {
    var prefix = [];
    var suffix = [];
    var partMatch;
    var i = str.length;
    var newStr = str.substring(0, i + 1);
    // k < i - 1: only prefixes and suffixes shorter than the whole string are compared
    for (var k = 0; k < i - 1; k++) {
        // prefix
        prefix[k] = newStr.slice(0, k + 1);
        // suffix
        suffix[k] = newStr.slice(-k - 1);
        if (prefix[k] == suffix[k]) {
            partMatch = prefix[k].length;
        }
    }
    if (!partMatch) {
        partMatch = 0;
    }
    return partMatch;
}

function KMP(sourceStr, searchStr) {
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;

    for (; i < sourceLength; i++) { // outermost loop, main string
        // Sub-loop over the pattern
        for (j = 0; j < searchLength; j++) {
            // The pattern character matches the main string
            if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                // The whole pattern has been matched
                if (j == searchLength - 1) {
                    result = i - j;
                    break;
                } else {
                    // Matched so far: keep looping; i++ advances the main string index
                    i++;
                }
            } else {
                if (j > 1) {
                    // Next attempt starts at i - next(matched part); the outer i++ adds the final +1
                    i = i - next(searchStr.slice(0, j)) - 1;
                } else {
                    // Move one position
                    i = (i - j);
                }
                break;
            }
        }
        if (result || result == 0) {
            break;
        }
    }

    if (result || result == 0) {
        return result;
    } else {
        return -1;
    }
}

var s = "BBC ABCDAB ABCDABCDABDE";
var t = "ABCDAB";

show('indexOf', function() { return s.indexOf(t); });
show('KMP.next', function() { return KMP(s, t); });

function show(bf_name, fn) {
    var myDate = +new Date();
    var r = fn();
    var div = document.createElement('div');
    div.innerHTML = bf_name + ' algorithm, match position: ' + r + ', time taken: ' + (+new Date() - myDate) + 'ms';
    document.getElementById("Testnext").appendChild(div);
}
</script>
</div></div>