The Rabin-Karp Algorithm in the Go Source Code


    The strings package (strings.go) implements a Rabin-Karp search, and the implementation is rather interesting.

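About the algorithm:
 The Turing community (圖靈社區) has an illustrated article on Rabin-Karp string search: 圖說Rabin-Karp字串尋找演算法
About the Go source implementation:
   GoLove has already written a very detailed explanation: http://www.cnblogs.com/golove/p/3234673.html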

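   GoLove's analysis is very clear, but the textual walkthrough at the start is long. I have turned that walkthrough into code.

Running it directly makes the process easier to follow.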

 

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	count := Count("9876520210520", "520")
	fmt.Println("count==", count)
}

// primeRK is the prime base used in Rabin-Karp algorithm.
// primeRK acts as the radix of the number system.
// This example only uses the digits 0-9, i.e. an alphabet of 10 symbols, so it is set to 10.
// The Go source uses 16777619, i.e. base 16777619.
// The magic is in the interesting relationship between the special prime
// 16777619 (2^24 + 403) and 2^32 and 2^8.
const primeRK = 10 // 16777619

// hashStr returns the hash and the appropriate multiplicative
// factor for use in Rabin-Karp algorithm.
func hashStr(sep string) (uint32, uint32) {
	hash := uint32(0)
	charcode := [...]uint32{5, 2, 0} // digits of "520"
	for i := 0; i < len(sep); i++ {
		//hash = hash*primeRK + uint32(sep[i])
		hash = hash*primeRK + charcode[i]
	}
	// Like tens -> hundreds -> thousands: compute the multiplier (pow).
	// For "520" in this example, pow is 1000.
	var pow, sq uint32 = 1, primeRK
	for i := len(sep); i > 0; i >>= 1 { // len(sep)=3: i is 3, then 1; sq is 10, then 100
		if i&1 != 0 {
			pow *= sq
		}
		sq *= sq
	}
	/*
		Equivalent simple loop:
		var pow uint32 = 1
		for i := len(sep); i > 0; i-- {
			pow *= primeRK
		}
	*/
	fmt.Println("hashStr() sep:", sep, " hash:", hash, " pow:", pow)
	return hash, pow
}

// Count counts the number of non-overlapping instances of sep in s.
func Count(s, sep string) int {
	fmt.Println("Count() s:", s, " sep:", sep)
	n := 0
	// special cases
	switch {
	case len(sep) == 0: // sep is empty: return the rune count plus 1
		return utf8.RuneCountInString(s) + 1
	case len(sep) == 1: // sep is a single byte: just scan and compare
		// special case worth making fast
		c := sep[0]
		for i := 0; i < len(s); i++ {
			if s[i] == c {
				n++
			}
		}
		return n
	case len(sep) > len(s):
		return 0
	case len(sep) == len(s):
		if sep == s {
			return 1
		}
		return 0
	}
	// Rabin-Karp search
	hashsep, pow := hashStr(sep)
	lastmatch := 0 // end position of the last counted match
	charcode := [...]uint32{9, 8, 7, 6, 5, 2, 0, 2, 1, 0, 5, 2, 0} // digits of "9876520210520"
	// Hash the first window, s[0:len(sep)].
	h := uint32(0)
	for i := 0; i < len(sep); i++ {
		//h = h*primeRK + uint32(s[i])
		h = h*primeRK + charcode[i]
	}
	// If the first len(sep) bytes of s match, count it and move lastmatch to len(sep).
	if h == hashsep && s[:len(sep)] == sep {
		n++
		lastmatch = len(sep)
	}
	for i := len(sep); i < len(s); {
		fmt.Println("\na h ==", h)
		h *= primeRK
		// add the new digit
		//h += uint32(s[i])
		h += charcode[i]
		fmt.Println("b h ==", h)
		// remove the old digit
		//h -= pow * uint32(s[i-len(sep)])
		h -= pow * charcode[i-len(sep)]
		fmt.Println("c h ==", h)
		i++
		if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {
			n++
			lastmatch = i
			fmt.Println("found n==", n, " lastmatch==", lastmatch)
		}
	}
	return n
}
With this substitution, the output shows clearly how the search proceeds:

Count() s: 9876520210520  sep: 520
hashStr() sep: 520  hash: 520  pow: 1000

a h == 987
b h == 9876
c h == 876

a h == 876
b h == 8765
c h == 765

a h == 765
b h == 7652
c h == 652

a h == 652
b h == 6520
c h == 520
found n== 1  lastmatch== 7

a h == 520
b h == 5202
c h == 202

a h == 202
b h == 2021
c h == 21

a h == 21
b h == 210
c h == 210

a h == 210
b h == 2105
c h == 105

a h == 105
b h == 1052
c h == 52

a h == 52
b h == 520
c h == 520
found n== 2  lastmatch== 13
count== 2
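The run above uses base 10 and a digit table so that the hashes stay readable. As a minimal sketch of the same update with real byte values and the base 16777619 named in the code comments, the following checks that every rolled hash equals a hash computed from scratch for the same window. The helper names hashWindow, powRK and rollingMatchesDirect are hypothetical and chosen only for this illustration; they are not taken from the Go source.

```go
package main

import "fmt"

// primeRK is the base that the article's comments attribute to the Go source.
const primeRK = 16777619

// hashWindow hashes s from scratch: each byte is one "digit" in base primeRK.
// uint32 arithmetic wraps modulo 2^32, so the hash stays in 32 bits.
func hashWindow(s string) uint32 {
	var h uint32
	for i := 0; i < len(s); i++ {
		h = h*primeRK + uint32(s[i])
	}
	return h
}

// powRK returns primeRK^n modulo 2^32, using the same square-and-multiply
// loop as hashStr in the article.
func powRK(n int) uint32 {
	var pow, sq uint32 = 1, primeRK
	for i := n; i > 0; i >>= 1 {
		if i&1 != 0 {
			pow *= sq
		}
		sq *= sq
	}
	return pow
}

// rollingMatchesDirect slides a window of width m over s and reports whether
// every rolled hash equals the hash computed directly for that window.
func rollingMatchesDirect(s string, m int) bool {
	if m <= 0 || m > len(s) {
		return false
	}
	pow := powRK(m)
	h := hashWindow(s[:m])
	for i := m; i < len(s); i++ {
		// Shift left by one digit, add the new byte, drop the oldest byte.
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i-m])
		if h != hashWindow(s[i-m+1:i+1]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(rollingMatchesDirect("9876520210520", 3))
}
```

Because addition, subtraction and multiplication all wrap consistently modulo 2^32, the update stays correct even though the intermediate values overflow.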
 

Also, the condition "if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {" can be read as follows:

// Guard against equal hashes where the actual substrings differ.
if h == hashsep && s[i-len(sep):i] == sep {
	// For example Count("1111","11"): "1111" counts as 2 matches, not 3.
	if lastmatch <= i-len(sep) {
		n++
		lastmatch = i
	}
}

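That is why lastmatch is needed: a match is counted only if it starts at or after the end of the previous counted match.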


One more point: why 16777619? See this article by Bryce:
http://blog.cyeam.com/golang/2015/01/15/go_index/
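As a small aside on the constant itself, the sketch below checks the identity 16777619 = 2^24 + 403 quoted in the code comments, and shows that the same number is the prime used by 32-bit FNV-1 hashing in Go's hash/fnv package. Whether that is the reason it was chosen for Rabin-Karp is not established here. The helpers fnv1 and matchesStdlib are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const primeRK = 16777619

// fnv1 computes 32-bit FNV-1 by hand: multiply by the prime, then XOR the byte.
func fnv1(s string) uint32 {
	h := uint32(2166136261) // 32-bit FNV offset basis
	for i := 0; i < len(s); i++ {
		h *= primeRK
		h ^= uint32(s[i])
	}
	return h
}

// matchesStdlib reports whether fnv1 agrees with hash/fnv's New32 for s.
func matchesStdlib(s string) bool {
	ref := fnv.New32()
	ref.Write([]byte(s))
	return fnv1(s) == ref.Sum32()
}

func main() {
	fmt.Println(primeRK == 1<<24+403) // true
	fmt.Println(matchesStdlib("520"))
}
```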


MAIL: xcl_168@aliyun.com

BLOG:http://blog.csdn.net/xcl168


