1. The KMP algorithm
Code
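def compute_prefix_function(p):
    # pi[q] = length of the longest proper prefix of p[:q + 1] that is also its suffix
    m = len(p)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        # Fall back to shorter borders until p[q] can extend one.
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k = k + 1
        pi[q] = k
    return pi

def kmp_matcher(t, p):
    # Return the index of the first occurrence of p in t, or -1 if there is none.
    n = len(t)
    m = len(p)
    if m == 0:
        return 0  # the empty pattern matches at position 0
    pi = compute_prefix_function(p)
    q = 0  # number of pattern characters matched so far
    for i in range(n):
        while q > 0 and p[q] != t[i]:
            q = pi[q - 1]
        if p[q] == t[i]:
            q = q + 1
        if q == m:
            return i - m + 1
    return -1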
2. A BM algorithm example
Code
def BoyerMooreHorspool(pattern, text):
    # Horspool's simplification of Boyer-Moore: only a bad-character shift table is used.
    m = len(pattern)
    n = len(text)
    if m > n:
        return -1
    # Shift table indexed by character code; assumes every code is below 256.
    skip = [m] * 256
    for k in range(m - 1):
        skip[ord(pattern[k])] = m - k - 1
    k = m - 1  # text index aligned with the last pattern character
    while k < n:
        # Compare right to left, starting at the end of the current window.
        j = m - 1
        i = k
        while j >= 0 and text[i] == pattern[j]:
            j -= 1
            i -= 1
        if j == -1:
            return i + 1  # start index of the match
        k += skip[ord(text[k])]
    return -1

if __name__ == '__main__':
    text = "this is the string to search in"
    pattern = "the"
    s = BoyerMooreHorspool(pattern, text)
    print('Text:', text)
    print('Pattern:', pattern)
    if s > -1:
        print('Pattern "' + pattern + '" found at position', s)
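Both algorithms are mainly used for string matching. Comments online say BM outperforms KMP, but I haven't verified that myself. I can test it with cProfile some other day.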
P.S.: Today I used each of the two algorithms to search for the string on the last line of a 69K document. KMP used 0.053 of CPU time, while BM used only 0.025.
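For anyone who wants to repeat this kind of measurement, here is a rough sketch of timing a single search call with cProfile. The helper name `profile_call` and the stand-in matcher `builtin_find` are made up for illustration, and the test text is synthetic rather than the 69K document mentioned above. To compare the real thing, pass `kmp_matcher` or `BoyerMooreHorspool` instead of the stand-in, minding that the two take their arguments in opposite orders.

```python
import cProfile
import io
import pstats


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, total seconds, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Keep only the five most expensive entries in the textual report.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stats.total_tt, stream.getvalue()


def builtin_find(text, pattern):
    # Hypothetical stand-in matcher so that this sketch runs on its own.
    return text.find(pattern)


# Synthetic input: many filler lines with the target on the last line.
text = "filler line\n" * 5000 + "needle on the last line"
position, seconds, report = profile_call(builtin_find, text, "needle")
print("found at", position, "in", seconds, "seconds")
```

A single run is noisy, so the numbers are probably only meaningful when averaged over several runs on the same input.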
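What I'd really like to look at is the Sunday algorithm, which is said to be an improvement on BM with a considerable performance gain. I studied the algorithm a little, and it is much more intuitive. There is currently no Python implementation of it online, though, so I'll try to put one together some other day.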
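As a starting point, here is a minimal sketch of how the Sunday algorithm might look in Python. It is my own reading of the idea, not a reference implementation: after a failed comparison, the character just past the current window decides how far the window jumps. The function name `sunday_search` and the use of a dict for the shift table are choices made for this sketch.

```python
def sunday_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # For each pattern character, the distance from its last occurrence
    # to the position just past the end of the pattern.
    shift = {c: m - i for i, c in enumerate(pattern)}
    k = 0  # start of the current window in text
    while k + m <= n:
        if text[k:k + m] == pattern:
            return k
        if k + m >= n:
            break  # no character after the window, so nothing left to try
        # Look at the character just past the window; if it does not occur
        # in the pattern, the window can jump completely past it.
        k += shift.get(text[k + m], m + 1)
    return -1


print(sunday_search("this is the string to search in", "the"))
```

Since the shift table is a dict rather than a 256-entry list, this version should not be limited to single-byte characters.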