I believe many people have looked into full-text indexing with word segmentation in MySQL. At present, MySQL's full-text index does not support Chinese text.
I have found many articles online about PHP + MySQL full-text indexing, but I still did not know how to use it. One PHP Chinese word segmenter comes from DEDE; its dictionary is not very powerful, but it is usable.
Another option is SCWS, the Simple Chinese Word Segmentation system. As a newbie I have not figured out how to use it yet. Someone has compiled a Windows version, but installing it is quite troublesome and I have not tested it.
// ==================================
Without further ado: first download the word segmentation functions and dictionary from 9streets: http://www.9streets.cn/userfiles/9streetssplit.rar
The archive contains detailed usage instructions. Here is an example.
MySQL table name: music
Fields: title, tag
<?php
require("lib_splitword_full.php");
$str = "here is the content you want to split. generally, it is better to keep it within a limited number of KB, otherwise it will be slow!";
$sp = new SplitWord(); // instantiate the segmenter
$dd = explode(" ", $sp->SplitRMM($str)); // split the space-separated result into words
$tag = ""; // encoded words are collected here, separately from the input $str
$i = 0;
foreach ($dd as $key => $var) {
    // Skip single characters. For UTF-8 use strlen($var) > 3 instead,
    // because a Chinese character usually takes 3 bytes in UTF-8.
    if (strlen($var) > 2)
    {
        // MySQL does not support Chinese full-text indexing, so each word must be
        // converted into letters or digits. Here I use base64 encoding.
        // You could also convert Chinese to location codes, but in my tests that
        // method performed only moderately, so I recommend base64.
        $tag .= base64_encode($var) . " ";
        $i++;
    }
    // Maximum number of words to store. An article may contain many words, so
    // you can raise this value, but for a title 50 is enough.
    if ($i >= 50) break;
    // Unencoded result, for example: the word segmentation content should not exceed KB. otherwise
    // Base64-encoded result: 1eLA7w== t9a0yg== xNrI3Q== 0ruw4w== srvSqg== s6y5/Q== mw.netc 0ru14w== t/HU8g==
}
?>
In this way, the segmented words are converted into encoded tokens that can be stored in the tag field. The tag field must have a full-text index, and the table type must be MyISAM (InnoDB has supported full-text indexes only since MySQL 5.6).
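The encoding step can be sketched on its own, without the segmentation library. This is a minimal sketch that assumes the words have already been segmented and joined with spaces; the function name encode_tags and its parameters are hypothetical and are not part of lib_splitword_full.php.

```php
<?php
// Hypothetical helper: turn space-separated, already segmented words
// into a single string of base64 tokens suitable for the tag field.
function encode_tags(string $segmented, int $minBytes = 3, int $maxWords = 50): string
{
    $tokens = [];
    // Split on runs of whitespace and drop empty pieces.
    foreach (preg_split('/\s+/', $segmented, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (strlen($word) < $minBytes) {
            continue; // skip words shorter than $minBytes bytes (single characters)
        }
        $tokens[] = base64_encode($word);
        if (count($tokens) >= $maxWords) {
            break; // cap the number of stored words
        }
    }
    return implode(' ', $tokens);
}

echo encode_tags("ab abc abcd"), "\n"; // prints "YWJj YWJjZA=="
```

One thing to watch: base64 output can contain "+", "/" and "=", which MySQL's built-in full-text parser does not treat as word characters, so a token may be split at those positions. If that causes bad matches, a hex encoding such as bin2hex() is a possible alternative; that is my suggestion, not something the original method uses.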
The tag field can be of type CHAR, VARCHAR, or TEXT. For how to create a MySQL full-text index, search for "mysql index creation"; guides are everywhere!
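For reference, a table definition along these lines might look like the sketch below. The column sizes, the index name ft_tag, and the function name music_table_ddl are assumptions made for illustration; only the table name, the title and tag fields, the id column used in the query, and the MyISAM requirement come from the text.

```php
<?php
// Hypothetical helper: returns a CREATE TABLE statement for the music table
// with a FULLTEXT index on the tag column. Column sizes are assumed.
function music_table_ddl(): string
{
    return <<<SQL
CREATE TABLE music (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    tag TEXT NOT NULL,
    FULLTEXT KEY ft_tag (tag)
) ENGINE=MyISAM
SQL;
}

// Print the statement so it can be pasted into a MySQL client.
echo music_table_ddl(), "\n";
```

For an existing table, the equivalent would be an ALTER TABLE music ADD FULLTEXT (tag) statement.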
The following example runs a full-text query based on the content of the tag field:
<?php
include("Mysql.class.php");
$id = (int) $id; // force an integer so the ID cannot inject SQL
$rs = $DB->get_one("select title, tag from music where id = $id"); // read the title and tag fields for this ID
$title = $rs['title'];
$tag = addslashes(trim($rs['tag'])); // escape the tag string before embedding it in SQL
// Fetch the 20 most similar records, best match first.
// The limit is 21 because the record itself will also match.
$sql = $DB->query("select title, MATCH (tag) AGAINST ('" . $tag . "' IN BOOLEAN MODE) AS score from music where MATCH (tag) AGAINST ('" . $tag . "' IN BOOLEAN MODE) order by score DESC limit 21");
while ($rs = $DB->fetch_array($sql))
{
    if ($rs['title'] != $title) // skip the record we started from
    {
        echo $rs['title'] . "\n";
    }
}
$DB->close();
?>
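The query construction can also be isolated into a small function, which makes it easier to inspect the SQL before running it. This is only a sketch: the function name build_similar_query is hypothetical, and addslashes() is used just to keep the example free of a database connection. With a real connection, a prepared statement would be the safer choice.

```php
<?php
// Hypothetical helper: build the similarity query for one record's tag string.
function build_similar_query(string $tags, int $limit = 21): string
{
    // Base64 tokens contain no quotes, but escape anyway in case the
    // stored value is not what we expect.
    $against = addslashes(trim($tags));
    $limit = max(1, $limit);
    return "SELECT title, MATCH(tag) AGAINST('$against' IN BOOLEAN MODE) AS score "
         . "FROM music WHERE MATCH(tag) AGAINST('$against' IN BOOLEAN MODE) "
         . "ORDER BY score DESC LIMIT $limit";
}

// Print the generated SQL for a sample tag string.
echo build_similar_query("YWJj YWJjZA=="), "\n";
```

The default limit of 21 mirrors the example above, where one extra row is requested because the source record matches itself.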
The examples above implement a simple Chinese full-text index in PHP + MySQL based on word segmentation!