Content websites generally need to show a list of related articles on each article page. The most common approach is to build a keyword list, determine which keywords each article contains, and then pick the most relevant articles based on those keywords. For websites with varied content, maintaining that keyword list is obviously troublesome.
Later I looked through some PHP functions and felt that similar_text (PHP 4, PHP 5) could easily meet my requirements. The idea is: fetch all the article titles from the article list, compare each one with the current title using similar_text, store the comparison results in an array, and re-sort the titles by their similarity to the current title. The result is a list of similar articles.
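The key function used in this approach is:
int similar_text ( string $first, string $second [, float &$percent ] )
It returns the number of matching bytes in the two strings. If the optional third argument is supplied, it is passed by reference and receives the similarity as a percentage.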
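As a quick illustration of the return value and the optional third argument, here is a minimal sketch. The sample strings are chosen purely for illustration and do not come from the article.

```php
<?php
// similar_text() returns the number of matching bytes. When a third
// argument is given, the similarity percentage is written into it.
$matched = similar_text("World", "Word", $percent);

echo $matched, "\n";            // number of matching bytes
echo round($percent, 2), "\n";  // similarity as a percentage

// The count is in bytes, not characters: for identical multibyte
// (for example UTF-8 Chinese) titles it equals strlen(), which is
// larger than the number of visible characters.
$bytes = similar_text("魔法", "魔法");
echo $bytes, " bytes matched, strlen is ", strlen("魔法"), "\n";
```

Because the count is byte-based, the percentage is usually the more convenient value when titles differ a lot in length.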
Following this idea, we write the function below. It reorders the $arr_title array so that the titles most similar to $title come first.
<?php
$demo_title = "";
$demo_arr_title = array(
    "simple modern magic",
    "simple modern magic",
    "concise ancient magic",
    "modern magic is not simple",
    "modern magic is difficult to understand"
);

$new_array = getSimilar($demo_title, $demo_arr_title);
// print_r($new_array);

echo "The first three articles most relevant to [$demo_title] are: <br/>";
for ($j = 0; $j <= 2; $j++) {
    echo ($j + 1) . ": " . $new_array[$j] . "<br/>";
}

// $title: the current title. $arr_title: the array of titles to search.
function getSimilar($title, $arr_title)
{
    $arr_similar = array();
    $arr_len = count($arr_title);
    for ($i = 0; $i <= ($arr_len - 1); $i++) {
        // Number of bytes the two strings have in common
        $arr_similar[$i] = similar_text($arr_title[$i], $title);
    }

    // Sort by the number of matching bytes, highest first, keeping the keys
    arsort($arr_similar);

    // Rebuild the title list in the new order
    $new_title_array = array();
    $index = 0;
    foreach ($arr_similar as $old_index => $similar) {
        $new_title_array[$index] = $arr_title[$old_index];
        $index++;
    }
    return $new_title_array;
}
?>
Output of the program:
The first three articles most relevant to [helper's house] are:
1: simple and clear modern magic
2: easy to understand modern magic
3: Concise and concise ancient magic
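A variation on the same idea, offered only as a sketch: rank by the similarity percentage instead of the raw byte count, skip the current title itself, and return just the top few entries. The helper name relatedTitles, its $limit parameter, and the sample titles are all hypothetical and are not part of the original code.

```php
<?php
// Hypothetical helper: rank $candidates by similarity to $title and
// return at most $limit of them, most similar first.
function relatedTitles(string $title, array $candidates, int $limit = 3): array
{
    $scores = [];
    foreach ($candidates as $index => $candidate) {
        if ($candidate === $title) {
            continue; // do not recommend the current article to itself
        }
        // The third argument receives the similarity percentage by reference
        similar_text($candidate, $title, $percent);
        $scores[$index] = $percent;
    }

    arsort($scores); // highest percentage first, keys preserved

    $ranked = [];
    foreach (array_slice($scores, 0, $limit, true) as $index => $percent) {
        $ranked[] = $candidates[$index];
    }
    return $ranked;
}

// Illustrative data only
$titles = ["Garden tools", "PHP array sorting", "PHP string functions"];
$related = relatedTitles("PHP string comparison", $titles, 2);
print_r($related);
```

The percentage takes the lengths of both strings into account, so a long title does not rank highly merely because it contains more bytes to match.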
Note the following:
- Someone has tested the speed of similar_text and reported the following:
The speed issues for similar_text seem to be only an issue for long sections of text (> 20000 chars).
I found a huge performance improvement in my application by just testing if the string to be tested was less than 20000 chars before calling similar_text.
20000+ took 3-5 secs to process, anything else (10000 and below) took a fraction of a second. fortunately for me, there was only a handful of instances with > 20000 chars which I couldn't get a comparison %.
So comparing full article bodies directly, rather than titles, may be slow.
- This function may not work very well for English (I have not tried it). You could split an English sentence on spaces into individual words and then write a word-based function that works like similar_text.
- When a sentence contains many non-keyword characters, such as ", and so on, the result may be unsatisfactory.
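The notes above could be handled roughly as follows. This is a sketch under stated assumptions, not a tested recommendation: the 20000 cut-off is simply taken from the quoted report, both helper names are hypothetical, and the word-level measure (a Jaccard overlap of lower-cased words with punctuation removed) is one possible stand-in for a word-based similar_text. It assumes plain ASCII English text.

```php
<?php
// Assumed cut-off, taken from the quoted report; tune it for your own data.
const MAX_COMPARE_LENGTH = 20000;

// Hypothetical guard: only call similar_text() on inputs below the cut-off.
// Returns null when either string is too long to compare cheaply.
function guardedSimilarText(string $a, string $b): ?int
{
    if (strlen($a) >= MAX_COMPARE_LENGTH || strlen($b) >= MAX_COMPARE_LENGTH) {
        return null;
    }
    return similar_text($a, $b);
}

// Hypothetical word-level similarity for English text: the share of
// distinct words the two strings have in common (Jaccard index, 0.0 to 1.0).
function wordSimilarity(string $a, string $b): float
{
    $tokenize = function (string $text): array {
        // Lower-case, replace punctuation with spaces, split on whitespace
        $clean = preg_replace('/[^a-z0-9\s]+/', ' ', strtolower($text));
        return array_unique(preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY));
    };

    $wordsA = $tokenize($a);
    $wordsB = $tokenize($b);
    $union = count(array_unique(array_merge($wordsA, $wordsB)));
    if ($union === 0) {
        return 0.0; // neither string contains a word
    }
    return count(array_intersect($wordsA, $wordsB)) / $union;
}

echo wordSimilarity('Modern magic, "simplified"!', 'modern magic is simple'), "\n";
```

Stripping punctuation before comparing keeps quotation marks and similar characters from influencing the score. Filtering out very common words would be a natural next step, but it is not shown here.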