The Rabin-Karp algorithm in the Go language source code


Go's strings package (strings.go) implements the Rabin-Karp algorithm, and the implementation is quite interesting.

About the algorithm:
The Turing community has an article on it: "Illustrated Rabin-Karp string search algorithm".
About the Go source implementation:
The blogger Golove has written a very detailed explanation: http://www.cnblogs.com/golove/p/3234673.html

Golove's analysis is very clear, but the long run of explanatory text before the code is hard to follow, so I have replaced his description with code.

Run it directly and the process becomes much easier to see.

    package main

    import (
        "fmt"
        "unicode/utf8"
    )

    func main() {
        count := Count("9876520210520", "520")
        fmt.Println("count==", count)
    }

    // primeRK is the prime base used in Rabin-Karp algorithm.
    // primeRK acts as the radix. This example uses only the ten digits 0-9,
    // i.e. an alphabet of 10 symbols, so it is set to 10 here.
    // The Go source sets it to 16777619, i.e. base 16777619.
    // The magic is in the interesting relationship between the special prime
    // 16777619 (2^24 + 403) and 2^32 and 2^8.
    const primeRK = 10 // 16777619

    // hashstr returns the hash and the appropriate multiplicative
    // factor for use in Rabin-Karp algorithm.
    func hashstr(sep string) (uint32, uint32) {
        hash := uint32(0)
        charcode := [...]uint32{5, 2, 0} // digit values of "520"
        for i := 0; i < len(sep); i++ {
            // hash = hash*primeRK + uint32(sep[i])
            hash = hash*primeRK + charcode[i]
        }
        // pow is the multiplicative factor, like a place value (tens, hundreds, ...)
        // in decimal; for sep "520" the resulting pow is 1000.
        var pow, sq uint32 = 1, primeRK
        for i := len(sep); i > 0; i >>= 1 {
            // len(sep)=3  i: {3,1}  sq: {10,100}
            if i&1 != 0 {
                pow *= sq
            }
            sq *= sq
        }
        /*
            var pow uint32 = 1
            for i := len(sep); i > 0; i-- {
                pow *= primeRK
            }
        */
        fmt.Println("hashstr() sep:", sep, " hash:", hash, " pow:", pow)
        return hash, pow
    }

    // Count counts the number of non-overlapping instances of sep in s.
    func Count(s, sep string) int {
        fmt.Println("Count() s:", s, " sep:", sep)
        n := 0
        // special cases
        switch {
        case len(sep) == 0:
            // sep is empty: return the rune count plus 1
            return utf8.RuneCountInString(s) + 1
        case len(sep) == 1:
            // sep is a single byte: a direct scan is enough
            // special case worth making fast
            c := sep[0]
            for i := 0; i < len(s); i++ {
                if s[i] == c {
                    n++
                }
            }
            return n
        case len(sep) > len(s):
            return 0
        case len(sep) == len(s):
            if sep == s {
                return 1
            }
            return 0
        }
        // Rabin-Karp search
        hashsep, pow := hashstr(sep)
        lastmatch := 0 // position just past the last match
        // digit values of the string "9876520210520"
        charcode := [...]uint32{9, 8, 7, 6, 5, 2, 0, 2, 1, 0, 5, 2, 0}
        // hash the first len(sep) bytes of s and check whether they match
        h := uint32(0)
        for i := 0; i < len(sep); i++ {
            // h = h*primeRK + uint32(s[i])
            h = h*primeRK + charcode[i]
        }
        // if the first len(sep) bytes of s match, n++ and lastmatch points to len(sep)
        if h == hashsep && s[:len(sep)] == sep {
            n++
            lastmatch = len(sep)
        }
        for i := len(sep); i < len(s); {
            fmt.Println("\na h ==", h)
            h *= primeRK
            // add the new character
            // h += uint32(s[i])
            h += charcode[i]
            fmt.Println("b h ==", h)
            // remove the old character
            // h -= pow * uint32(s[i-len(sep)])
            h -= pow * charcode[i-len(sep)]
            fmt.Println("c h ==", h)
            i++
            if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {
                n++
                lastmatch = i
                fmt.Println("found n==", n, " lastmatch==", lastmatch)
            }
        }
        return n
    }

With this substitution, the output shows clearly how the process works:

    Count() s: 9876520210520  sep: 520
    hashstr() sep: 520  hash: 520  pow: 1000

    a h == 987
    b h == 9876
    c h == 876

    a h == 876
    b h == 8765
    c h == 765

    a h == 765
    b h == 7652
    c h == 652

    a h == 652
    b h == 6520
    c h == 520
    found n== 1  lastmatch== 7

    a h == 520
    b h == 5202
    c h == 202

    a h == 202
    b h == 2021
    c h == 21

    a h == 21
    b h == 210
    c h == 210

    a h == 210
    b h == 2105
    c h == 105

    a h == 105
    b h == 1052
    c h == 52

    a h == 52
    b h == 520
    c h == 520
    found n== 2  lastmatch== 13
    count== 2
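The trace above uses base 10 and a digit table so that the hash values stay readable. As a rough sketch of the same loop without that substitution, the version below hashes the raw bytes with the prime 16777619 mentioned in the code comments and lets the uint32 arithmetic wrap around. The names hashStr and countRK are made up for this illustration, and countRK simply returns 0 for an empty sep; this is not a copy of the standard library source.

```go
package main

import "fmt"

// primeRK is the base mentioned in the article's comments (2^24 + 403).
const primeRK = 16777619

// hashStr (illustrative name) returns the hash of sep and primeRK^len(sep).
// Both values wrap around modulo 2^32 because they are uint32.
func hashStr(sep string) (uint32, uint32) {
	hash := uint32(0)
	for i := 0; i < len(sep); i++ {
		hash = hash*primeRK + uint32(sep[i])
	}
	// Exponentiation by squaring, as in the loop shown in the article.
	var pow, sq uint32 = 1, primeRK
	for i := len(sep); i > 0; i >>= 1 {
		if i&1 != 0 {
			pow *= sq
		}
		sq *= sq
	}
	return hash, pow
}

// countRK (illustrative name) counts non-overlapping occurrences of sep in s
// with a rolling hash. Assumption: an empty sep is not handled and yields 0.
func countRK(s, sep string) int {
	m := len(sep)
	if m == 0 || m > len(s) {
		return 0
	}
	hashsep, pow := hashStr(sep)
	h, _ := hashStr(s[:m]) // hash of the first window
	n, lastmatch := 0, 0
	if h == hashsep && s[:m] == sep {
		n++
		lastmatch = m
	}
	for i := m; i < len(s); {
		h = h*primeRK + uint32(s[i]) // shift the window and add the new byte
		h -= pow * uint32(s[i-m])    // drop the byte that left the window
		i++
		// Compare the strings too, since equal hashes do not guarantee equal text.
		if h == hashsep && lastmatch <= i-m && s[i-m:i] == sep {
			n++
			lastmatch = i
		}
	}
	return n
}

func main() {
	fmt.Println(countRK("9876520210520", "520"))
	fmt.Println(countRK("1111", "11"))
}
```

Because the subtraction and multiplication are done in uint32, an intermediate value may overflow or go "negative", but the wraparound keeps the hash consistent modulo 2^32, so no explicit modulo step is needed.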

In addition, the condition "if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {" can be understood as:

    // Guard against the case where the hashes are equal but the actual strings differ
    if h == hashsep && s[i-len(sep):i] == sep {
        // For example, Count("1111", "11") must count 2 matches, not 3
        if lastmatch <= i-len(sep) {
            n++
            lastmatch = i
        }
    }

That is why lastmatch is needed: it ensures that only non-overlapping matches are counted.
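To see the effect of this check in isolation, here is a small sketch that compares the standard library's strings.Count with a naive counter that allows overlaps. countOverlapping is a hypothetical helper written only for this illustration and assumes a non-empty sep.

```go
package main

import (
	"fmt"
	"strings"
)

// countOverlapping is a hypothetical helper: it counts every position at
// which sep occurs in s, including overlapping occurrences.
// Assumption: sep is non-empty.
func countOverlapping(s, sep string) int {
	n := 0
	for i := 0; i+len(sep) <= len(s); i++ {
		if s[i:i+len(sep)] == sep {
			n++
		}
	}
	return n
}

func main() {
	// Non-overlapping: matches at offsets 0 and 2.
	fmt.Println(strings.Count("1111", "11")) // 2
	// Overlapping: matches at offsets 0, 1 and 2.
	fmt.Println(countOverlapping("1111", "11")) // 3
}
```

The match at offset 1 overlaps the one at offset 0, which is exactly the case that the lastmatch <= i-len(sep) test rejects.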



Blog: http://blog.csdn.net/xcl168


