Data Structures and Algorithms in JavaScript (V): Strings (the Classic KMP Algorithm)

Source: Internet
Author: User

KMP Algorithm and BM algorithm

KMP is the classic prefix-matching algorithm and BM (Boyer-Moore) is the classic suffix-matching algorithm. The only difference between prefix matching and suffix matching is the order in which characters are compared.

Prefix matching: the pattern string is compared with the text string from left to right, and the pattern string moves from left to right.

Suffix matching: the pattern string is compared with the text string from right to left, while the pattern string still moves from left to right.

The BF algorithm from the previous chapter is also a prefix-matching algorithm, but it grinds through the text one position at a time, so its efficiency, O(mn), is nothing to boast about. There are plenty of KMP explanations online, but most take such a lofty route that readers end up confused, so I will try to describe it in the most down-to-earth way I can, based on my own understanding.

KMP

KMP is an optimized prefix-matching algorithm. The name comes from the initials of its three authors: Knuth, Morris, and Pratt. Compared with BF, what KMP optimizes is the distance the pattern string moves after each mismatch. KMP adjusts that distance dynamically every time, whereas BF always moves by exactly 1; KMP does not necessarily do so.
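For reference, the BF behaviour described here can be sketched as follows. This is a minimal illustration, not the code from the previous chapter; the function name bruteForceSearch is made up for this example.

```javascript
// Hypothetical helper for illustration only; not the author's BF code.
function bruteForceSearch(text, pattern) {
    // Try every alignment of the pattern against the text, left to right.
    for (var start = 0; start + pattern.length <= text.length; start++) {
        var j = 0;
        // Compare characters left to right at the current alignment.
        while (j < pattern.length && text.charAt(start + j) === pattern.charAt(j)) {
            j++;
        }
        if (j === pattern.length) {
            return start; // every pattern character matched
        }
        // Mismatch: BF always moves the pattern by exactly one position (start++).
    }
    return -1;
}

console.log(bruteForceSearch("aaronaabbcc", "aabb"));        // 5
console.log(bruteForceSearch("abababaabab", "ababacb"));     // -1
```

Whatever was learned from the characters that did match is thrown away on every mismatch, which is where the O(mn) cost comes from.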

Comparison of the BF and KMP prefix-matching algorithms

Comparing the two in the figure, we find:

When searching for pattern string P in text string T, a mismatch is found at the 6th letter, c. BF then moves the whole pattern string P by one position, while KMP moves it by two.

We know how BF matches, but why does KMP move two positions rather than one, three, or four?

As the figure above shows, pattern string P had correctly matched ababa when the mismatch occurred at c. The idea of KMP is that ababa is information from a completed match, and we can use it: instead of moving the search position back over positions already compared, keep moving it forward, which improves efficiency.

So the question is: how do we know how many positions to move?

The authors of KMP arrived at this shift formula:

Positions to move = number of matched characters - corresponding partial match value

The shift depends only on the pattern string (the substring), not on the text string, so pay special attention here.

So how should we understand the number of matched characters in the pattern and the corresponding partial match value?

Characters that have been matched:

T: abababaabab

P: ababacb

The characters already matched in P are the first five, ababa (marked in red in the original). This part is easy to understand.

Partial match values:

This is the core of the algorithm, and it is harder to understand.

Suppose:

T: aaronaabbcc

P: aaronaac

Looking at this text, if the mismatch occurs when we reach c, where is the most reasonable place to move the pattern next, given the structure above?

T: aaronaabbcc
P:      aaronaac

In other words, when a run of characters at the start of the pattern appears again later in the pattern, we can skip straight over the content in between. This idea is quite reasonable.

Knowing this rule, the algorithm for the partial match table is as follows:

First, you need to understand two concepts: prefix and suffix. The "prefixes" of a string are all of its leading substrings except the one that includes the last character; the "suffixes" are all of its trailing substrings except the one that includes the first character.

The "partial match value" is the length of the longest element shared by the prefixes and the suffixes.

Let's first see how aaronaac is broken down for BF matching.

BF's decomposition: a, aa, aar, aaro, aaron, aarona, aaronaa, aaronaac

And how does KMP divide it? This is where prefixes and suffixes come in.

Let's look at the resulting KMP partial match table first:

a  a  r  o  n  a  a  c
0  1  0  0  0  1  2  0

This may look confusing, but no hurry; let's break it down into prefixes and suffixes.

Matched string: "aaron"; prefixes: a, aa, aar, aaro; suffixes: aron, ron, on, n

Shift: for each matched substring, check which prefixes and suffixes are equal, then take the length of the longest common one.
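The prefix and suffix lists above can be generated mechanically. The sketch below is only an illustration of the definitions; the helper names properPrefixes, properSuffixes, and partialMatchValue are invented for this example and do not appear in the original code.

```javascript
// Hypothetical helpers illustrating the prefix/suffix definitions.
function properPrefixes(str) {
    var result = [];
    // Every leading substring except the full string.
    for (var k = 1; k < str.length; k++) {
        result.push(str.slice(0, k));
    }
    return result;
}

function properSuffixes(str) {
    var result = [];
    // Every trailing substring except the full string, shortest first.
    for (var k = 1; k < str.length; k++) {
        result.push(str.slice(-k));
    }
    return result;
}

function partialMatchValue(str) {
    var prefixes = properPrefixes(str);
    var suffixes = properSuffixes(str);
    var longest = 0;
    for (var k = 0; k < prefixes.length; k++) {
        // prefixes[k] and suffixes[k] both have length k + 1.
        if (prefixes[k] === suffixes[k]) {
            longest = k + 1;
        }
    }
    return longest;
}

console.log(properPrefixes("aaron").join(", ")); // a, aa, aar, aaro
console.log(properSuffixes("aaron").join(", ")); // n, on, ron, aron
console.log(partialMatchValue("aaron"));         // 0
console.log(partialMatchValue("aaronaa"));       // 2
```

Because both lists are ordered by length, comparing them index by index is enough to find the longest common element.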

Decomposition of the partial match table

The match-table algorithm in KMP, where p denotes the prefixes, n denotes the suffixes, and r denotes the result:

a:        p => [], n => [], r = 0
aa:       p => [a], n => [a], r = a.length = 1
aar:      p => [a, aa], n => [r, ar], r = 0
aaro:     p => [a, aa, aar], n => [o, ro, aro], r = 0
aaron:    p => [a, aa, aar, aaro], n => [n, on, ron, aron], r = 0
aarona:   p => [a, aa, aar, aaro, aaron], n => [a, na, ona, rona, arona], r = a.length = 1
aaronaa:  p => [a, aa, aar, aaro, aaron, aarona], n => [a, aa, naa, onaa, ronaa, aronaa], r = Math.max(a.length, aa.length) = 2
aaronaac: p => [a, aa, aar, aaro, aaron, aarona, aaronaa], n => [c, ac, aac, naac, onaac, ronaac, aronaac], r = 0

As with the BF algorithm, the pattern is first decomposed position by position. The value for each possible match position is cached in advance, and during matching the "partial match table" is used to look up how many positions to move.

So the final match table for aaronaac is 0,1,0,0,0,1,2,0.
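With the table in hand, the shift formula "positions to move = number of matched characters - corresponding partial match value" becomes a single lookup. The sketch below hard-codes the table for aaronaac from the text; the function name shiftAfterMismatch is an assumption made for this illustration.

```javascript
// Partial match table for "aaronaac", copied from the text.
var table = [0, 1, 0, 0, 0, 1, 2, 0];

// Hypothetical helper: matchedCount is the number of pattern characters
// that matched before the mismatch (assumed to be at least 1).
function shiftAfterMismatch(table, matchedCount) {
    // The partial match value belongs to the last matched character.
    return matchedCount - table[matchedCount - 1];
}

// Mismatch at the final c: "aaronaa" (7 characters) matched, partial value 2.
console.log(shiftAfterMismatch(table, 7)); // 5

// Mismatch at r: "aa" (2 characters) matched, partial value 1.
console.log(shiftAfterMismatch(table, 2)); // 1
```

The first result agrees with the aaronaabbcc example: the pattern moves 5 positions so that its leading aa lines up with the aa that was already matched.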

Below are two JS implementations of KMP:

KMP implementation (I): KMP with a cached match table

KMP implementation (II): KMP that computes next dynamically

KMP implementation (I)

Match table

The most important part of the KMP algorithm is the match table: without it you have the BF implementation, and with it you have KMP.

The match table determines how far the next shift moves.

Following the rules for the match table above, we design a kmpGetStrPartMatchValue method:

    function kmpGetStrPartMatchValue(str) {
        var prefix = [];
        var suffix = [];
        var partMatch = [];
        for (var i = 0, j = str.length; i < j; i++) {
            // Decompose: the first i + 1 characters of the pattern
            var newStr = str.substring(0, i + 1);
            if (newStr.length == 1) {
                // A single character has no prefix or suffix
                partMatch[i] = 0;
            } else {
                for (var k = 0; k < i; k++) {
                    // Prefix of length k + 1
                    prefix[k] = newStr.slice(0, k + 1);
                    // Suffix of length k + 1
                    suffix[k] = newStr.slice(-k - 1);
                    // If they are equal, record the length; a later (longer) hit overwrites it
                    if (prefix[k] == suffix[k]) {
                        partMatch[i] = prefix[k].length;
                    }
                }
                if (!partMatch[i]) {
                    partMatch[i] = 0;
                }
            }
        }
        return partMatch;
    }

This follows the match-table algorithm in KMP exactly: str.substring(0, i + 1) decomposes the pattern as a -> aa -> aar -> aaro -> aaron -> aarona -> aaronaa -> aaronaac.

In each decomposition, the length of the longest common element is then computed from the prefixes and suffixes.
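Comparing every prefix with every suffix is easy to follow, but it repeats a lot of work for long patterns. As a point of comparison, here is a sketch of the textbook linear-time construction, which reuses the values already computed. This is the standard technique, not the method used in this article, and the name buildPartialMatchTable is chosen only for this example.

```javascript
// Standard linear-time construction of the partial match table.
// Not the article's method; shown for comparison only.
function buildPartialMatchTable(pattern) {
    var table = [];
    var len = 0; // length of the longest prefix that is also a suffix so far
    for (var i = 0; i < pattern.length; i++) {
        if (i === 0) {
            table[0] = 0; // a single character has no proper prefix or suffix
            continue;
        }
        // Fall back to shorter candidates until one can be extended.
        while (len > 0 && pattern.charAt(i) !== pattern.charAt(len)) {
            len = table[len - 1];
        }
        if (pattern.charAt(i) === pattern.charAt(len)) {
            len++;
        }
        table[i] = len;
    }
    return table;
}

console.log(buildPartialMatchTable("aaronaac").join(",")); // 0,1,0,0,0,1,2,0
```

It produces the same table as the decomposition shown earlier for aaronaac.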

Fallback algorithm

KMP is also a prefix-matching algorithm, so the BF loop can be carried over wholesale. The only change is what happens on a mismatch: BF steps back to the start of the attempt and adds 1, while KMP computes the next position from the match table.

    // Inner loop over the pattern (the outer loop advances i over the main string)
    for (var j = 0; j < searchLength; j++) {
        // The current characters match
        if (searchStr.charAt(j) == sourceStr.charAt(i)) {
            // The whole pattern has matched
            if (j == searchLength - 1) {
                result = i - j;
                break;
            } else {
                // Partial match: keep looping; i++ advances the main-string index
                i++;
            }
        } else {
            // i was advanced during the partial match, so this attempt started at i - j
            if (j > 1 && part[j - 1] > 0) {
                // Shift = matched characters (j) - partial match value;
                // the extra -1 offsets the i++ of the outer loop
                i = (i - j) + (j - part[j - 1]) - 1;
            } else {
                // Move one position: back to the start, then the outer i++ advances it
                i = (i - j);
            }
            break;
        }
    }

The branch that uses the match table (marked in red in the original) is the core of KMP: next = number of matched characters - corresponding partial match value
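The same idea can also be written so that the index into the main string never moves backward at all, which is how KMP is usually presented in textbooks. The sketch below is that standard form rather than the article's loop; the name kmpSearchWithTable is an assumption, the pattern is assumed to be non-empty, and the match table is passed in as a literal so the example stands on its own.

```javascript
// Textbook-style KMP search; the table is the partial match table of the pattern.
// Hypothetical helper, assuming a non-empty pattern.
function kmpSearchWithTable(text, pattern, table) {
    var i = 0; // index into text; never decreases
    var j = 0; // number of pattern characters currently matched
    while (i < text.length) {
        if (text.charAt(i) === pattern.charAt(j)) {
            i++;
            j++;
            if (j === pattern.length) {
                return i - j; // start index of the match
            }
        } else if (j > 0) {
            // Keep the longest prefix that is also a suffix of the matched part.
            // This is the same as shifting by j - table[j - 1]; i stays where it is.
            j = table[j - 1];
        } else {
            i++; // nothing matched at this position; move on by one
        }
    }
    return -1;
}

// In "ABCDABD" only "ABCDA" (value 1) and "ABCDAB" (value 2) share a prefix and suffix.
console.log(kmpSearchWithTable("BBC ABCDAB ABCDABCDABDE", "ABCDABD", [0, 0, 0, 0, 1, 2, 0])); // 15

// The aaronaac example: no match in aaronaabbcc.
console.log(kmpSearchWithTable("aaronaabbcc", "aaronaac", [0, 1, 0, 0, 0, 1, 2, 0])); // -1
```

Setting j to the partial match value keeps the characters that are known to match already, so they are not compared again.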

The complete KMP algorithm

<!doctype html>
<div id="test2"><div>
<script type="text/javascript">
    // Build the partial match table for the pattern
    function kmpGetStrPartMatchValue(str) {
        var prefix = [];
        var suffix = [];
        var partMatch = [];
        for (var i = 0, j = str.length; i < j; i++) {
            var newStr = str.substring(0, i + 1);
            if (newStr.length == 1) {
                partMatch[i] = 0;
            } else {
                for (var k = 0; k < i; k++) {
                    // Take the prefix and the suffix of length k + 1
                    prefix[k] = newStr.slice(0, k + 1);
                    suffix[k] = newStr.slice(-k - 1);
                    if (prefix[k] == suffix[k]) {
                        partMatch[i] = prefix[k].length;
                    }
                }
                if (!partMatch[i]) {
                    partMatch[i] = 0;
                }
            }
        }
        return partMatch;
    }

    function KMP(sourceStr, searchStr) {
        // Generate the match table
        var part = kmpGetStrPartMatchValue(searchStr);
        var sourceLength = sourceStr.length;
        var searchLength = searchStr.length;
        var result;
        var i = 0;
        var j = 0;

        for (; i < sourceLength; i++) { // outermost loop over the main string
            // Inner loop over the pattern
            for (j = 0; j < searchLength; j++) {
                // The current characters match
                if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                    // The whole pattern has matched
                    if (j == searchLength - 1) {
                        result = i - j;
                        break;
                    } else {
                        // Partial match: keep looping; i++ advances the main-string index
                        i++;
                    }
                } else {
                    // i was advanced during the partial match, so this attempt started at i - j
                    if (j > 1 && part[j - 1] > 0) {
                        // Shift = matched characters - partial match value;
                        // the extra -1 offsets the outer i++
                        i = (i - j) + (j - part[j - 1]) - 1;
                    } else {
                        // Move one position
                        i = (i - j);
                    }
                    break;
                }
            }
            if (result || result == 0) {
                break;
            }
        }

        if (result || result == 0) {
            return result;
        } else {
            return -1;
        }
    }

    var s = "BBC abcdab Abcdabcdabde";
    var t = "abcdabd";

    show('indexOf', function() {
        return s.indexOf(t);
    });

    show('KMP', function() {
        return KMP(s, t);
    });

    function show(bf_name, fn) {
        var myDate = +new Date();
        var r = fn();
        var div = document.createElement('div');
        div.innerHTML = bf_name + ' algorithm, search location: ' + r + ', time consumed: ' + (+new Date() - myDate) + 'ms';
        document.getElementById("test2").appendChild(div);
    }
</script>
</div></div>

KMP implementation (II)

The first KMP implementation is straightforward: it trades space for time by looking values up in a cached match table. The other approach computes on demand: pass in the string matched so far and calculate its partial match value. The principle is the same.

Generating the cached table computes every entry up front. Now we effectively pick out just one of them, so all we need is an algorithm that handles the current match.

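Next algorithm

    function next(str) {
        var prefix = [];
        var suffix = [];
        var partMatch;
        var i = str.length;
        var newStr = str.substring(0, i + 1);
        // Stop before the full string: only proper prefixes and suffixes count
        for (var k = 0; k < i - 1; k++) {
            // Take the prefix and the suffix of length k + 1
            prefix[k] = newStr.slice(0, k + 1);
            suffix[k] = newStr.slice(-k - 1);
            if (prefix[k] == suffix[k]) {
                partMatch = prefix[k].length;
            }
        }
        if (!partMatch) {
            partMatch = 0;
        }
        return partMatch;
    }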

This is essentially the same as the match table, with the outer loop removed so that it works directly on the string that has been matched so far.

The complete KMP.next algorithm

<!doctype html>
<div id="testnext"><div>
<script type="text/javascript">
    // Partial match value of the string matched so far
    function next(str) {
        var prefix = [];
        var suffix = [];
        var partMatch;
        var i = str.length;
        var newStr = str.substring(0, i + 1);
        // Stop before the full string: only proper prefixes and suffixes count
        for (var k = 0; k < i - 1; k++) {
            // Take the prefix and the suffix of length k + 1
            prefix[k] = newStr.slice(0, k + 1);
            suffix[k] = newStr.slice(-k - 1);
            if (prefix[k] == suffix[k]) {
                partMatch = prefix[k].length;
            }
        }
        if (!partMatch) {
            partMatch = 0;
        }
        return partMatch;
    }

    function KMP(sourceStr, searchStr) {
        var sourceLength = sourceStr.length;
        var searchLength = searchStr.length;
        var result;
        var i = 0;
        var j = 0;

        for (; i < sourceLength; i++) { // outermost loop over the main string
            // Inner loop over the pattern
            for (j = 0; j < searchLength; j++) {
                // The current characters match
                if (searchStr.charAt(j) == sourceStr.charAt(i)) {
                    // The whole pattern has matched
                    if (j == searchLength - 1) {
                        result = i - j;
                        break;
                    } else {
                        // Partial match: keep looping; i++ advances the main-string index
                        i++;
                    }
                } else {
                    if (j > 1) {
                        // Shift = matched characters - partial match value of the matched part;
                        // the extra -1 offsets the outer i++
                        i = (i - j) + (j - next(searchStr.slice(0, j))) - 1;
                    } else {
                        // Move one position
                        i = (i - j);
                    }
                    break;
                }
            }
            if (result || result == 0) {
                break;
            }
        }

        if (result || result == 0) {
            return result;
        } else {
            return -1;
        }
    }

    var s = "BBC abcdab Abcdabcdabde";
    var t = "Abcdab";

    show('indexOf', function() {
        return s.indexOf(t);
    });

    show('KMP.next', function() {
        return KMP(s, t);
    });

    function show(bf_name, fn) {
        var myDate = +new Date();
        var r = fn();
        var div = document.createElement('div');
        div.innerHTML = bf_name + ' algorithm, search location: ' + r + ', time consumed: ' + (+new Date() - myDate) + 'ms';
        document.getElementById("testnext").appendChild(div);
    }
</script>
</div></div>

git code Download: https://github.com/JsAaron/data_structure
