Interviewing at Baidu is simply asking for punishment in every possible way.
First of all, while the interviewer was reading my resume, he asked me to define string similarity myself and then write an algorithm to compute it.
... I had honestly never heard of this; I had not even heard of PHP's similar_text function. Earlier, when I was reading about SEO, I picked up a basic understanding of page similarity. Baidu's algorithms very often need to judge whether a page is a duplicate, and duplicates are certainly not indexed. That is why a large part of SEO work is writing original articles: they keep the site updated and get Baidu to index it, which increases traffic.
Page similarity is purely mathematical. Because Baidu mainly indexes Chinese pages, the Chinese text first has to be segmented into words, and then the frequency of each word in the article is counted. The words are then weighted to form a vector for each page, and the cosine of the angle between the two pages' vectors is computed. That was surely not what the interviewer was asking for, so I digress.
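The page-similarity idea above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: whitespace splitting stands in for real Chinese word segmentation, raw term counts stand in for whatever weighting a search engine actually uses, and the function names term_frequencies and cosine_similarity are made up for this example.

```python
import math
from collections import Counter


def term_frequencies(text):
    # Assumption: splitting on whitespace stands in for a real
    # Chinese word segmenter, which this sketch does not include.
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    # Build one term-frequency vector per page (raw counts as weights).
    tf_a = term_frequencies(text_a)
    tf_b = term_frequencies(text_b)

    # Dot product over the words the two pages share.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())

    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))

    # An empty page has no direction, so treat it as not similar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A value close to 1 means the two pages use nearly the same words in nearly the same proportions; a value of 0 means they share no words at all.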
The string similarity in question here is about plain strings with no meaning, such as ABACBCD and ABCBCD.
Since I had to define it myself, I went with a simple definition: find the longest common substring and its length. (This misses a lot of possible cases.)
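The simple definition I chose could be written like this. It is a sketch using the standard dynamic-programming approach to the longest common substring, not necessarily what I wrote on the day; the function name longest_common_substring is just for this example.

```python
def longest_common_substring(a, b):
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1];
    # curr[j] holds the same for a[i-1], b[j-1].
    best_len = 0
    best_end = 0  # exclusive end index of the best run inside a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one character earlier in both.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end], best_len
```

For the two example strings, ABACBCD and ABCBCD, this returns the substring CBCD with length 4. The shared AB at the start is ignored entirely, which is one of the cases this definition misses.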
After I got back, I looked it up myself. Here is what I found:
similar_text works in three steps
The first step
To be continued.
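In the meantime, here is a rough Python sketch of how PHP's similar_text is commonly described as behaving: find the first longest common substring, then repeat the same search on the pieces to its left and to its right, and add up the lengths. This is an approximation for illustration, not the PHP source and not necessarily the three-step breakdown mentioned above; the helper similar_percent is a made-up name that mirrors the percentage PHP reports.

```python
def similar_text(a, b):
    # Nothing can match if either side is empty.
    if not a or not b:
        return 0

    # Find the first longest common substring by brute force.
    best_len = pos_a = pos_b = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, pos_a, pos_b = k, i, j

    if best_len == 0:
        return 0

    # Count the match, then recurse on what lies to its left and right.
    left = similar_text(a[:pos_a], b[:pos_b])
    right = similar_text(a[pos_a + best_len:], b[pos_b + best_len:])
    return left + best_len + right


def similar_percent(a, b):
    # Matching characters counted on both sides, over the combined length.
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return similar_text(a, b) * 2 * 100 / total
```

For ABACBCD and ABCBCD this counts CBCD (4) plus the AB found to its left (2), giving 6 matching characters, so it credits the shared prefix that the plain longest-common-substring definition throws away.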
Baidu interview question: the similar_text string similarity algorithm and the page similarity algorithm