MySQL full-text search in dvbbs.php

Source: Internet
Author: User
Tags mysql manual

No new article for several days. Alas, people are just lazy.

First, download a copy of the dvbbs.php beta1 code and decompress the package. Put the PHP code aside for now and get out your MySQL manual. If you have no manual, follow the examples below.

MySQL full-text search:

MATCH (col1, col2, ...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION])

For example:

SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('database');

The MATCH() function performs a natural-language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value, that is, a measure of similarity between the search string and the text of that row in the columns named in the MATCH() list.
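
As a minimal sketch of how this syntax might be assembled from PHP, the helper below builds the query text with a placeholder for the search string. The function name build_match_query and its parameters are made up for illustration and are not part of dvbbs.php; the table and column names come from the example above.

```php
<?php
// Hypothetical helper (not from dvbbs.php): builds a MATCH ... AGAINST query.
// $modifier may be '', 'IN BOOLEAN MODE' or 'WITH QUERY EXPANSION'.
function build_match_query(string $table, array $columns, string $modifier = ''): string
{
    $allowed = ['', 'IN BOOLEAN MODE', 'WITH QUERY EXPANSION'];
    if (!in_array($modifier, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported modifier: {$modifier}");
    }
    // Identifiers cannot be bound as parameters, so accept plain names only.
    foreach (array_merge([$table], $columns) as $name) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
            throw new InvalidArgumentException("Invalid identifier: {$name}");
        }
    }
    // The search string itself is left as a '?' placeholder for a prepared statement.
    $against = $modifier === '' ? '?' : "? {$modifier}";
    return sprintf(
        'SELECT * FROM %s WHERE MATCH (%s) AGAINST (%s)',
        $table,
        implode(', ', $columns),
        $against
    );
}

echo build_match_query('articles', ['title', 'body']), "\n";
```

The search string is then bound when the statement is executed, so it never has to be escaped by hand.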

The next example is more complex. The query returns the relevance values and also sorts the rows in order of decreasing relevance. To achieve this, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This adds no extra overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and runs the full-text search code only once.

Reference content is as follows:

mysql> SELECT id, body, MATCH (title, body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title, body) AGAINST
    -> ('Security implications of running MySQL as root');

So much for MySQL full-text search itself.

There is one thing to watch out for.

Some words are ignored in full-text search:

* Any word that is too short is ignored. The default minimum length of a word that full-text search can find is 4 characters (the ft_min_word_len setting for MyISAM tables).

* Words in the stopword list are ignored.
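
To make the two rules concrete, here is a small sketch that filters a query the same way, assuming plain English input. The function name searchable_words and the sample stopword list are illustrative only; MySQL's real stopword list is much longer, and MySQL does this filtering itself.

```php
<?php
// Hypothetical sketch: which words of a query survive the two rules above.
// $stopwords is a tiny sample list, not MySQL's built-in one.
function searchable_words(string $text, array $stopwords, int $minLen = 4): array
{
    // Lowercase the text and split it on anything that is not a letter, digit or underscore.
    $words = preg_split('/[^a-z0-9_]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $kept = [];
    foreach ($words as $word) {
        if (strlen($word) < $minLen) {
            continue; // too short: ignored
        }
        if (in_array($word, $stopwords, true)) {
            continue; // stopword: ignored
        }
        $kept[] = $word;
    }
    return $kept;
}

print_r(searchable_words('How about the security of MySQL as root', ['about', 'the']));
```

With this sample input only security, mysql and root remain, which shows why short search terms can come back with no results.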

MySQL also offers query expansion (WITH QUERY EXPANSION), which we will not discuss here.

Next comes an analysis of the PHP side of full-text search.

There was once a MySQL build that supported Chinese full-text search (the "Massive" MySQL Chinese+ build, labeled GPL but not open source).

The key to Chinese full-text search is word segmentation. MySQL itself does not support CJK word segmentation (CJK: Chinese, Japanese, Korean), so

!!!! ***** How to simulate word segmentation in PHP is the key to MySQL full-text indexing ***** !!!!

Chinese is the hardest language to segment into words, and no one has solved the problem completely (although the search engines do it well).

Reference content is as follows:

// fcicq: let's take a look at the PHP word segmentation here.
function &dv_chinesewordsegment($str, $encodingname = 'gbk') {
    static $objenc = null;
    if ($objenc === null) {
        if (!class_exists('dv_encoding')) {
            require_once root_path . 'inc/dv_encoding.class.php';
        }
        $objenc =& dv_encoding::getencoding($encodingname);
    }
    $strlen = $objenc->strlength($str);
    $returnval = array();
    if ($strlen <= 1) {
        return $str;
    }
    $arrstopwords =& dv_getstopwordlist();
    // print_r($arrstopwords);
    // Filter all HTML tags (opening and closing)
    $str = preg_replace('#</?[a-zA-Z]+.*?>#is', '', $str);
    // Filter all stopwords
    $str = str_replace($arrstopwords['strrepl'], '', $str);
    $str = preg_replace($arrstopwords['pregrepl'], '', $str);
    // echo "\$str: {$str}\n";
    $arr = explode(' ', $str);
    // fcicq: well, this is the key to the PHP word segmentation *************
    foreach ($arr as $tmpstr) {
        if (preg_match('/^[\x00-\x7f]+$/i', $tmpstr) == 1) {
            // fcicq: it is all English (ASCII). No problem, MySQL can understand it.
            $returnval[] = '' . $tmpstr;
        } else { // fcicq: mixed English and Chinese...
            preg_match_all('/([a-zA-Z]+)/i', $tmpstr, $matches);
            if (!empty($matches)) { // fcicq: the English part
                foreach ($matches[0] as $matche) {
                    $returnval[] = $matche;
                }
            }
            // Filter out the ASCII characters
            $tmpstr = preg_replace('/([\x00-\x7f]+)/i', '', $tmpstr); // fcicq: see, isn't everything left Chinese?
            $strlen = $objenc->strlength($tmpstr) - 1;
            for ($i = 0; $i < $strlen; $i++) {
                $returnval[] = $objenc->substring($tmpstr, $i, 2);
                // fcicq: note that this substring() is not the substr() from the manual.
                // fcicq: look closely: everything is split into two-character words.
                // For example, the Chinese for "database applications" is split into
                // overlapping two-character pairs, and so is the Chinese for "full-text search".
                // This segmentation is naturally not very good.
                // However, the same thing is done at search time:
                // searching for the three-character word "database" is equivalent to
                // searching for its two overlapping pairs.
                // This is the traditional word segmentation method for full-text search.
            }
        }
    }
    return $returnval;
} // end function dv_chinesewordsegment

// fcicq: this is the legendary substring. I believe many people write better PHP code than this.
// It is a method of the encoding class (hence $this), and it assumes that every
// non-ASCII character occupies three bytes.
function &substring(&$str, $start, $length = null) {
    if (!is_numeric($start)) {
        return false;
    }
    $strlen = strlen($str);
    if ($strlen <= 0) {
        return false;
    }
    if ($start < 0 || $length < 0) {
        $mbstrlen = $this->strlength($str);
    } else {
        $mbstrlen = $strlen;
    }
    if (!is_numeric($length)) {
        $length = $mbstrlen;
    } elseif ($length < 0) {
        $length = $mbstrlen + $length - 1;
    }
    if ($start < 0) {
        $start = $mbstrlen + $start;
    }
    $returnval = '';
    $mbstart = 0;
    $mbcount = 0;
    for ($i = 0; $i < $strlen; $i++) {
        if ($mbcount >= $length) {
            break;
        }
        $currord = ord($str[$i]);
        if ($mbstart >= $start) {
            $returnval .= $str[$i];
            if ($currord > 0x7f) {
                $returnval .= $str[$i + 1] . $str[$i + 2];
                $i += 2;
            }
            $mbcount++;
        } elseif ($currord > 0x7f) {
            $i += 2;
        }
        $mbstart++;
    }
    return $returnval;
} // end function substring

// Update the full-text search word segmentation tables. There are two in total: topic_ft and bbs_ft.
$arrtopicindex =& dv_chinesewordsegment($topic);
if (!empty($arrtopicindex) && is_array($arrtopicindex)) {
    // Join the tokens with spaces so that MySQL sees each one as a separate word.
    $topicindex = $db->escape_string(implode(' ', $arrtopicindex));
    if ($topicindex != '') {
        $db->query("UPDATE {$DV}topic_ft SET topicindex='{$topicindex}' WHERE topicid='{$rootid}'");
    } else {
        $db->query("DELETE FROM {$DV}topic_ft WHERE topicid='{$rootid}'");
    }
}

That is the so-called word segmentation for MySQL full-text search: MySQL cannot segment the words, but PHP can. It is that simple.
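
The search side has to segment the user's query the same way before handing it to MySQL. The sketch below shows one way this might look, under several assumptions: the names cjk_bigrams and boolean_search_expr are invented, the query is pure Chinese in UTF-8, the table prefix dv_ is a guess, topicindex is assumed to carry a FULLTEXT index, and boolean mode with the + operator is my choice here rather than something the dvbbs.php code is known to do.

```php
<?php
// Hypothetical sketch: build a boolean-mode search expression from a Chinese query.
function cjk_bigrams(string $text): array
{
    // Split into single UTF-8 characters, then pair up neighbours.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $pairs = [];
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

function boolean_search_expr(string $query): string
{
    $terms = [];
    foreach (cjk_bigrams($query) as $pair) {
        $terms[] = '+' . $pair; // '+' means the term must be present
    }
    return implode(' ', $terms);
}

// Assumed table prefix "dv_"; the expression is bound to the '?' placeholder.
$sql = 'SELECT topicid FROM dv_topic_ft WHERE MATCH (topicindex) AGAINST (? IN BOOLEAN MODE)';

echo boolean_search_expr('数据库'), "\n"; // +数据 +据库
```

Requiring every pair keeps a three-character query from matching rows that contain only one of its two halves.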

This is a relatively outdated method, but it is very practical.
