PHP + MySQL: simple Chinese word segmentation for full-text indexing

Source: Internet
Author: User
Tags mysql index

I believe many people have looked into MySQL's full-text index. Currently, MySQL's full-text index does not support Chinese.

I found many articles online about PHP + MySQL full-text indexing, but I could not work out how to use any of them. One PHP Chinese word segmenter comes from DEDE. The DEDE dictionary is not very powerful, but it is usable.

Another option is SCWS, the Simple Chinese Word Segmentation system. As a beginner, I do not yet know how to use it. Someone has compiled a Windows version, but it is quite troublesome to install, and I have not tested it yet.

// ==================================

Without further ado: first download the word segmentation functions and the word dictionary from DEDE: http://www.dedecms.com/html/chanpinxiazai/20061229/3.html

Detailed usage instructions are provided there. Here is an example.

MySQL table name: music

Fields: title, tag

  

<?php
require("lib_splitword_full.php");

$str = "here is the content you want to split. Generally, it is better not to exceed kb, otherwise it will be slow!";
$sp = new SplitWord(); // instantiate the segmenter
$dd = explode(" ", $sp->SplitRMM($str)); // split the segmenter output on spaces
$tag = ""; // encoded words are collected here instead of being appended to the input $str
$i = 0;
foreach ($dd as $key => $var) {
    // Skip single characters, which are not worth storing. A Chinese character is
    // 2 bytes in GBK; with UTF-8 use 3 here, because a Chinese character is usually 3 bytes.
    if (strlen($var) > 2) {
        // MySQL full-text indexing does not support Chinese, so each word must be
        // converted to letters or digits. base64 encoding is used here.
        // You could also convert Chinese to its location (区位) code, but in my tests
        // that approach gave mediocre results, so I recommend base64.
        $tag .= base64_encode($var) . " ";
        $i++;
    }
    // Cap the number of words stored. An article body may contain many words, so you
    // can raise the limit for it; for a title, 50 is enough.
    if ($i >= 50) break;
}
// Unencoded result looks like this: the word segmentation content should not exceed kb. Otherwise
// Base64 encoded result: 1eLA7w== t9a0yg== xNrI3Q== 0ruw4w== srvSqg== s6y5/Q== mw.netc 0ru14w== t/HU8g==
?>
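The filtering and encoding loop can also be wrapped in a small function so that it can be reused for titles, article bodies, and search keywords. This is only a sketch: encode_words() and its parameters are hypothetical names, not part of the DEDE library, and it assumes the segmenter output has already been split into an array of words.

```php
<?php
/**
 * Encode segmented words into a space-separated tag string.
 * encode_words() is a hypothetical helper, not part of the DEDE library.
 *
 * @param string[] $words    words produced by the segmenter
 * @param int      $minBytes skip words shorter than this many bytes
 * @param int      $limit    maximum number of words to keep
 */
function encode_words(array $words, int $minBytes = 3, int $limit = 50): string
{
    $encoded = [];
    foreach ($words as $word) {
        $word = trim($word);
        if (strlen($word) < $minBytes) {
            continue; // drop empty strings and single characters
        }
        $encoded[] = base64_encode($word);
        if (count($encoded) >= $limit) {
            break; // enough words collected
        }
    }
    return implode(' ', $encoded);
}

// Usage with UTF-8 input: a single Chinese character is 3 bytes, so require at least 4.
echo encode_words(['全文', '的', '索引'], 4), "\n";
```

The default of 3 bytes matches the strlen($var) > 2 test used for GBK input. Raise it to 4 for UTF-8 input so that single characters are still filtered out.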

In this way, the segmented words are converted into encoded tokens and stored in the tag field. The tag field must have a full-text index, and the table type must be MyISAM.
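One thing to watch with base64: its output can contain +, / and =, and it is case-sensitive. MySQL's built-in full-text parser treats non-word characters as separators, + and - are operators in boolean mode, and matching ignores case under a case-insensitive collation. A possible alternative, which is not the author's method, is hexadecimal encoding, since it produces only the characters 0-9 and a-f. The sketch below uses the hypothetical helper names hex_tokens() and decode_hex_tokens().

```php
<?php
/**
 * Encode each word as a hexadecimal string and join the results with spaces.
 * hex_tokens() is a hypothetical helper shown as an alternative to base64.
 */
function hex_tokens(array $words): string
{
    return implode(' ', array_map('bin2hex', $words));
}

/**
 * Reverse hex_tokens(): turn a stored tag string back into the original words.
 */
function decode_hex_tokens(string $tags): array
{
    if ($tags === '') {
        return [];
    }
    return array_map('hex2bin', explode(' ', $tags));
}

$stored = hex_tokens(['全文', '索引']);
echo $stored, "\n";
print_r(decode_hex_tokens($stored));
```

The trade-off is length: a hex token is twice the byte length of the word, while base64 is about a third longer.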

The tag field can be of type CHAR, VARCHAR, or TEXT. For how to create a MySQL full-text index, search for "mysql index creation"; guides are everywhere.
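As a rough illustration of those requirements, the table could be defined as follows. Only the table name music and the columns id, title, and tag come from the article. The column types, sizes, and the index name ft_tag are assumptions.

```php
<?php
// Hypothetical schema: column types, sizes and the index name ft_tag are assumptions.
$ddl = <<<'SQL'
CREATE TABLE music (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    tag   TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_tag (tag)
) ENGINE=MyISAM
SQL;

// Print the statement; to apply it, pass $ddl to your database layer,
// for example mysqli::query().
echo $ddl, "\n";
```

For a table that already exists, ALTER TABLE music ADD FULLTEXT (tag) adds the index without recreating the table.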

The following example runs a full-text query based on the content of the tag field:

  

<?php
include("Mysql.class.php"); // database wrapper class, assumed to provide $DB

// Read the title and tag fields for the given ID; the cast keeps $id numeric
$rs = $DB->get_one("select title, tag from music where id = " . (int) $id);
$title = $rs['title'];
$tag = trim($rs['tag']);

// Query 20 similar records, sorted by relevance. The limit is 21 because the record itself also matches!
$sql = $DB->query("select title, MATCH(tag) AGAINST ('" . $tag . "' IN BOOLEAN MODE) AS score from music where MATCH(tag) AGAINST ('" . $tag . "' IN BOOLEAN MODE) order by score DESC limit 21");
while ($rs = $DB->fetch_array($sql)) {
    if ($rs['title'] != $title) { // skip the record itself
        echo $rs['title'] . "\n";
    }
}
$DB->close();
?>
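The query above builds its SQL by string concatenation. If the tag text ever comes from user input, a bound parameter is safer. The following is a minimal sketch using positional placeholders in the style of PDO. similar_query() is a hypothetical helper, and $pdo in the usage comment is assumed to be an already connected PDO instance rather than the $DB class used above.

```php
<?php
/**
 * Build the "similar records" query with two positional placeholders
 * for the tag text. similar_query() is a hypothetical helper.
 */
function similar_query(int $limit = 21): string
{
    return 'SELECT title, MATCH(tag) AGAINST (? IN BOOLEAN MODE) AS score '
         . 'FROM music '
         . 'WHERE MATCH(tag) AGAINST (? IN BOOLEAN MODE) '
         . 'ORDER BY score DESC LIMIT ' . $limit;
}

echo similar_query(), "\n";

// Usage, assuming $pdo is a connected PDO instance and $tag holds the stored tag string:
// $stmt = $pdo->prepare(similar_query());
// $stmt->execute([trim($tag), trim($tag)]);
// foreach ($stmt as $row) {
//     echo $row['title'], "\n";
// }
```

The tag string is passed twice because the MATCH ... AGAINST expression appears in both the select list and the WHERE clause.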

To see the effect, visit the demo on my site: http://www.dj965.com

The example above implements simple Chinese word segmentation with full-text indexing in PHP + MySQL.

I hope it is helpful for beginners like me!

BLOG: http://hi.baidu.com/web20/

Article from: http://www.admin5.com/article/20080717/94313.shtml
