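The strings package (strings.go) implements the Rabin-Karp algorithm, which is rather interesting.
About the algorithm:
The Turing community (圖靈社區) has an article: 圖說Rabin-Karp字串尋找演算法 (An Illustrated Guide to the Rabin-Karp String Search Algorithm)
About the Go source implementation:
GoLove has already written a very detailed explanation: http://www.cnblogs.com/golove/p/3234673.html
GoLove's analysis is already very clear, but the long run of explanation at the front is hard to get through, so I turned his explanation into code.
Running it directly makes things easier to see.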
    package main

    import (
        "fmt"
        "unicode/utf8"
    )

    func main() {
        count := Count("9876520210520", "520")
        fmt.Println("count==", count)
    }

    // primeRK is the prime base used in Rabin-Karp algorithm.
    // primeRK acts as the radix (number base).
    // This example only uses the digits 0-9, i.e. 10 distinct characters, so it is set to 10.
    // In the Go source it is 16777619, i.e. the hash works in base 16777619.
    // The magic is in the interesting relationship between the special prime
    // 16777619 (2^24 + 403) and 2^32 and 2^8.
    const primeRK = 10 // 16777619

    // hashStr returns the hash and the appropriate multiplicative
    // factor for use in Rabin-Karp algorithm.
    func hashStr(sep string) (uint32, uint32) {
        hash := uint32(0)
        charcode := [...]uint32{5, 2, 0} // digit values of "520"
        for i := 0; i < len(sep); i++ {
            //hash = hash*primeRK + uint32(sep[i])
            hash = hash*primeRK + charcode[i]
        }
        // Like stepping through the tens, hundreds and thousands places to get
        // the multiplicative factor (pow). For "520" in this example, pow is 1000.
        var pow, sq uint32 = 1, primeRK
        for i := len(sep); i > 0; i >>= 1 { // len(sep)=3 i>>{1,0} sq:{10,100}
            if i&1 != 0 {
                pow *= sq
            }
            sq *= sq
        }
        /*
            var pow uint32 = 1
            for i := len(sep); i > 0; i-- {
                pow *= primeRK
            }
        */
        fmt.Println("hashStr() sep:", sep, " hash:", hash, " pow:", pow)
        return hash, pow
    }

    // Count counts the number of non-overlapping instances of sep in s.
    func Count(s, sep string) int {
        fmt.Println("Count() s:", s, " sep:", sep)
        n := 0
        // special cases
        switch {
        case len(sep) == 0: // sep is empty: return the rune count plus 1
            return utf8.RuneCountInString(s) + 1
        case len(sep) == 1: // sep is a single byte: a plain scan is enough
            // special case worth making fast
            c := sep[0]
            for i := 0; i < len(s); i++ {
                if s[i] == c {
                    n++
                }
            }
            return n
        case len(sep) > len(s):
            return 0
        case len(sep) == len(s):
            if sep == s {
                return 1
            }
            return 0
        }
        // Rabin-Karp search
        hashsep, pow := hashStr(sep)
        lastmatch := 0 // position of the last match
        charcode := [...]uint32{9, 8, 7, 6, 5, 2, 0, 2, 1, 0, 5, 2, 0} // digit values of "9876520210520"
        // Hash the first len(sep) bytes of s to check whether they match.
        h := uint32(0)
        for i := 0; i < len(sep); i++ {
            //h = h*primeRK + uint32(s[i])
            h = h*primeRK + charcode[i]
        }
        // If the first len(sep) bytes of s match, n++ and lastmatch moves to len(sep).
        if h == hashsep && s[:len(sep)] == sep {
            n++
            lastmatch = len(sep)
        }
        for i := len(sep); i < len(s); {
            fmt.Println("\na h ==", h)
            h *= primeRK
            // add the new character
            //h += uint32(s[i])
            h += charcode[i]
            fmt.Println("b h ==", h)
            // remove the old character
            //h -= pow * uint32(s[i-len(sep)])
            h -= pow * charcode[i-len(sep)]
            fmt.Println("c h ==", h)
            i++
            if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {
                n++
                lastmatch = i
                fmt.Println("found n==", n, " lastmatch==", lastmatch)
            }
        }
        return n
    }

With these substitutions, the run shows clearly how the algorithm proceeds:
    Count() s: 9876520210520 sep: 520
    hashStr() sep: 520 hash: 520 pow: 1000

    a h == 987
    b h == 9876
    c h == 876

    a h == 876
    b h == 8765
    c h == 765

    a h == 765
    b h == 7652
    c h == 652

    a h == 652
    b h == 6520
    c h == 520
    found n== 1 lastmatch== 7

    a h == 520
    b h == 5202
    c h == 202

    a h == 202
    b h == 2021
    c h == 21

    a h == 21
    b h == 210
    c h == 210

    a h == 210
    b h == 2105
    c h == 105

    a h == 105
    b h == 1052
    c h == 52

    a h == 52
    b h == 520
    c h == 520
    found n== 2 lastmatch== 13
    count== 2
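The base-10 version is only a teaching aid. The same rolling update can be written with real byte values and the real base. What follows is a minimal sketch, not the standard library's code: rollingCount is a hypothetical helper name, the special cases are reduced to a single guard, and the hash relies on uint32 arithmetic wrapping modulo 2^32.

```go
package main

import (
	"fmt"
	"strings"
)

// primeRK is the base used in the Go source discussed above.
const primeRK = 16777619

// hashStr returns the hash of sep and primeRK^len(sep), both modulo 2^32.
func hashStr(sep string) (uint32, uint32) {
	var hash uint32
	for i := 0; i < len(sep); i++ {
		hash = hash*primeRK + uint32(sep[i])
	}
	var pow uint32 = 1
	for i := 0; i < len(sep); i++ {
		pow *= primeRK
	}
	return hash, pow
}

// rollingCount (hypothetical name) counts non-overlapping occurrences of sep in s.
// Simplification for this sketch: an empty sep returns 0.
func rollingCount(s, sep string) int {
	if len(sep) == 0 || len(sep) > len(s) {
		return 0
	}
	hashsep, pow := hashStr(sep)
	var h uint32
	for i := 0; i < len(sep); i++ {
		h = h*primeRK + uint32(s[i])
	}
	n, lastmatch := 0, 0
	if h == hashsep && s[:len(sep)] == sep {
		n++
		lastmatch = len(sep)
	}
	for i := len(sep); i < len(s); {
		h = h*primeRK + uint32(s[i])     // shift the window and add the new byte
		h -= pow * uint32(s[i-len(sep)]) // drop the byte that left the window
		i++
		if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {
			n++
			lastmatch = i
		}
	}
	return n
}

func main() {
	s, sep := "9876520210520", "520"
	// Both counts should agree.
	fmt.Println(rollingCount(s, sep), strings.Count(s, sep))
}
```

The subtraction can underflow and the multiplication can overflow, but both wrap modulo 2^32, so the window hash stays consistent with the hash of sep.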
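Also, the line "if h == hashsep && lastmatch <= i-len(sep) && s[i-len(sep):i] == sep {" can be read like this:

    // Guard against the case where the hashes are equal but the strings differ.
    if h == hashsep && s[i-len(sep):i] == sep {
        // For example Count("1111", "11"): "1111" should count as 2 matches, not 3.
        if lastmatch <= i-len(sep) {
            n++
            lastmatch = i
        }
    }

That is why lastmatch is needed.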
One more thing: why 16777619? See this post by Bryce: http://blog.cyeam.com/golang/2015/01/15/go_index/
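The sketch below does not reproduce the argument in that post. It only checks a few arithmetic facts about the constant: the decomposition 2^24 + 403 quoted in the code comment, the equivalent form 2^24 + 2^8 + 0x93 (the form in which the 32-bit FNV prime is usually written), that the number is prime, and that uint32 multiplication by it wraps modulo 2^32. isPrime is a hypothetical helper using plain trial division.

```go
package main

import "fmt"

const primeRK = 16777619

// isPrime (hypothetical helper) reports whether n is prime, by trial division.
// This is fast enough for a 25-bit number.
func isPrime(n uint32) bool {
	if n < 2 {
		return false
	}
	for d := uint32(2); d*d <= n; d++ {
		if n%d == 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(primeRK == (1<<24)+403)         // true
	fmt.Println(primeRK == (1<<24)+(1<<8)+0x93) // true
	fmt.Println(isPrime(primeRK))               // true

	// uint32 multiplication silently wraps modulo 2^32, so the hash never
	// needs an explicit modulo operation.
	var h uint32 = 1<<32 - 1
	fmt.Println(h*primeRK == uint32(uint64(h)*primeRK)) // true
}
```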
MAIL: xcl_168@aliyun.com
BLOG:http://blog.csdn.net/xcl168