#!/usr/bin/python
# -*- coding: UTF-8 -*-
# filename: BF
import time

t = "this is a big apple, this is a big apple, this is a big apple, this is a big apple."
p = "apple"

# Alternative, longer test text from the original post. Its pattern string
# is empty in the source, so this pair is left commented out.
# t = "Why is vector space model? In fact, we can regard each word as a dimension, and the word frequency as its value (directed), that is, vector, in this way, the word and frequency of each article constitute an I-dimensional spatial graph. The similarity between the two documents is the closeness of the two spatial graphs. Assuming that the article only has two dimensions, a spatial graph can be drawn in a plane Cartesian coordinate system, and the reader can imagine two articles with only two words for understanding."
# p = ""

i = 0
count = 0
start = time.time()
while i <= len(t) - len(p):
    j = 0
    # Compare p with the window of t that starts at position i.
    while j < len(p) and t[i + j] == p[j]:
        j = j + 1
    if j == len(p):
        # Every character of p matched, so record one occurrence.
        count = count + 1
    # Slide p one position to the right and start again from p[0].
    i = i + 1
print(count)
print(time.time() - start)
Algorithm idea: the target string t and the pattern string p are compared character by character. If the corresponding characters match, the next pair is compared. If they differ, p shifts one position to the right, and the comparison starts again from the first character of p.
Algorithm features: the overall movement direction is fixed: p slides from left to right along t. At each position, the comparison starts from the leftmost character of p and proceeds to the right against the corresponding characters of t. The sliding distance of p is always 1, which makes the BF algorithm less efficient than algorithms such as BM and KMP, which can jump several positions at once; BF slides without jumping.
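The sliding behaviour described above can be sketched as a small function. This is only an illustration: the name bf_find_all and the choice to report no matches for an empty pattern are assumptions, not part of the original script.

```python
def bf_find_all(t, p):
    """Return every start index where p occurs in t (overlaps included).

    bf_find_all is a hypothetical helper name used for illustration.
    """
    positions = []
    n, m = len(t), len(p)
    if m == 0:
        # Assumption: an empty pattern reports no matches.
        return positions
    # p slides right exactly one position per iteration.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right, starting from the first character of p.
        while j < m and t[i + j] == p[j]:
            j += 1
        if j == m:
            positions.append(i)
    return positions


print(bf_find_all("this is a big apple", "apple"))  # [14]
print(bf_find_all("aaaa", "aa"))                    # [0, 1, 2]
```

Returning the start positions rather than a count makes the one-step slide visible: overlapping occurrences such as those in "aaaa" are all reported because p never jumps past a match.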
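The worst-case time complexity of this algorithm is O(len(t) * len(p)), and the space complexity is O(len(t) + len(p)), which is the storage for the two strings themselves; the matching loop needs only a constant amount of extra space.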
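To make the time bound concrete, the following sketch counts character comparisons. The function name bf_count_comparisons and the test strings are assumptions chosen for illustration; the input is built so that every alignment matches all but the last character of p, which is the worst case for BF.

```python
def bf_count_comparisons(t, p):
    """Return (matches, comparisons) for a brute-force scan of t for p.

    bf_count_comparisons is a hypothetical helper name used for illustration.
    """
    matches = 0
    comparisons = 0
    n, m = len(t), len(p)
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1          # one character comparison
            if t[i + j] != p[j]:
                break                 # mismatch: slide p by one
            j += 1
        if m > 0 and j == m:
            matches += 1
    return matches, comparisons


# Worst case: 16 alignments, each failing only on the 5th character.
t = "a" * 20
p = "a" * 4 + "b"
print(bf_count_comparisons(t, p))  # (0, 80)
```

Here the count equals (len(t) - len(p) + 1) * len(p) = 16 * 5 = 80, which grows as O(len(t) * len(p)). On ordinary text most alignments fail on the first character, so the observed count is usually much closer to len(t).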