A simple PHP + MySQL implementation of Chinese word segmentation for full-text indexing

Source: Internet
Author: User
Keywords mysql php


I believe many people have looked into Chinese full-text indexing in MySQL. Unfortunately, MySQL currently does not support full-text indexing of Chinese text.

I found plenty of articles online about PHP + MySQL Chinese full-text indexing, but none of them is good, and the average beginner would not know how to use them. For Chinese word segmentation in PHP there is Dede. Its dictionary is not very powerful, but it is usable.

There is also SCWS, a "simple Chinese word segmentation system", but again the average beginner would not know how to use it. Someone has compiled a Windows version, but installation is still troublesome, and I have not tested it.

//==========================

Enough talk. First download the word segmentation functions and dictionary from Dede. Download address: http://www.dedecms.com/html/chanpinxiazai/20061229/3.html

Detailed usage instructions are included there. Here I give one usage example.

MySQL table name: music

Fields: title, tag

  

<?php
require("lib_splitword_full.php");

$str = "Here is the content you want to segment. It is best to keep it under 1024KB, otherwise segmentation will be very slow!";

$sp = new SplitWord(); // instantiate the segmenter

$dd = explode(" ", $sp->SplitRMM($str)); // the segmenter returns the words separated by spaces

$i = 0;
$str = ""; // reset so that $str collects only the encoded words

foreach ($dd as $key => $var) {
    // Skip single characters. For UTF-8 use 3 instead of 2, because a Chinese character in UTF-8 usually takes 3 bytes.
    if (strlen($var) > 2) {
        // MySQL does not support Chinese full-text indexing, so each word is converted to letters and digits. I chose Base64 here.
        // Chinese could also be converted to region codes, but I tested that method and found the results mediocre, so I recommend Base64.
        $str .= base64_encode($var) . " ";
        $i++;
    }
    // Maximum number of words to store. Raise it for long articles with many words; for a title, 50 is enough.
    if ($i >= 50) break;
}
?>

Result of this example before encoding: here, word segmentation, content, generally, do not, exceed, 1024KB, a bit, otherwise

Base64-encoded result: 1eLA7w== t9a0yg== xNrI3Q== 0ruw4w== srvSqg== s6y5/Q== MTAyNEtC 0ru14w== t/HU8g==
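The filter-and-encode step of the loop above can be isolated as a small function. This is only a minimal sketch: the name encode_words, its parameters, and the sample word list are my own assumptions and are not part of Dede's library. It takes an already segmented word list, so it runs without lib_splitword_full.php.

```php
<?php
/**
 * Filter segmented words and encode them for storage in the tag field.
 *
 * $words    words returned by the segmenter (hypothetical input here)
 * $minBytes shortest word, in bytes, that is kept; 3 drops single GBK
 *           characters, 4 drops single UTF-8 Chinese characters
 * $limit    maximum number of encoded words to return
 * $encoder  name of the encoding function, e.g. 'base64_encode' or 'bin2hex'
 */
function encode_words(array $words, $minBytes = 3, $limit = 50, $encoder = 'base64_encode')
{
    $tokens = array();
    foreach ($words as $word) {
        $word = trim($word);
        if (strlen($word) < $minBytes) {
            continue; // skip empty strings and single characters
        }
        $tokens[] = call_user_func($encoder, $word);
        if (count($tokens) >= $limit) {
            break; // enough words collected
        }
    }
    return implode(' ', $tokens);
}

// Hypothetical segmenter output for a UTF-8 string
$words = array('全文', '的', '索引', 'MySQL');
echo encode_words($words, 4), "\n";                // Base64 tokens
echo encode_words($words, 4, 50, 'bin2hex'), "\n"; // hex tokens
```

One reason to consider the bin2hex variant: as far as I know, MySQL's built-in full-text parser counts only letters, digits and underscores as word characters, and the default collations are case-insensitive. Base64 output can contain +, / and =, and it relies on letter case, so some tokens may be split or may collide. Hex tokens contain only 0-9 and a-f, at the cost of being longer.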

This way we can segment the text, encode each word, and store the result in the tag field. The tag field must of course have a FULLTEXT index, and the table type must be MyISAM.

The tag field's data type can be CHAR, VARCHAR, or TEXT. For how to create a MySQL full-text index, search for MySQL index creation; there is plenty of material.
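As a rough illustration of the table setup described above, the statement could look like the following. The id column comes from the query used later in this article; the column sizes and the index name ft_tag are assumptions of mine, so adjust them to your own data.

```php
<?php
// A sketch of the music table: MyISAM engine with a FULLTEXT index on tag.
// Column sizes and the index name ft_tag are assumptions.
$ddl = <<<SQL
CREATE TABLE music (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    tag   TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_tag (tag)
) ENGINE=MyISAM
SQL;

// Print the statement; run it with whatever database wrapper you use.
echo $ddl, "\n";
```

For a table that already exists, ALTER TABLE music ADD FULLTEXT (tag) adds the index afterwards.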

The following example runs a full-text query based on the contents of the tag field:

  

<?php
include("Mysql.class.php"); // assumed to create the $DB connection object

// Read the title and tag fields by ID
$rs = $DB->get_one("SELECT title, tag FROM music WHERE id = " . intval($id));

$title = $rs['title'];

// Query the 20 most similar records, sorted by similarity. The limit is 21 because the record itself also matches.
$sql = $DB->query("SELECT title, MATCH(tag) AGAINST('" . trim($rs['tag']) . "' IN BOOLEAN MODE) AS score FROM music WHERE MATCH(tag) AGAINST('" . trim($rs['tag']) . "' IN BOOLEAN MODE) ORDER BY score DESC LIMIT 21");

while ($rs = $DB->fetch_array($sql)) {
    if ($rs['title'] != $title) { // skip the record itself
        echo $rs['title'] . "<br />";
    }
}

$DB->close();
?>
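The same query can also be written with placeholders, so that the tag string is not concatenated into the SQL text, and with the current record excluded by ID instead of by comparing titles. This is a sketch of one possible variant, not the method used above: it assumes a PDO connection rather than the Mysql.class.php wrapper, and the function names similar_sql and similar_titles are made up for illustration.

```php
<?php
/**
 * Build the "similar records" query with named placeholders.
 * :tag and :tag2 receive the same value; two names are used because a
 * named placeholder cannot always be repeated within one statement.
 */
function similar_sql($limit = 20)
{
    $limit = max(1, (int) $limit); // LIMIT is inlined, so force an integer
    return "SELECT title, MATCH(tag) AGAINST(:tag IN BOOLEAN MODE) AS score "
         . "FROM music "
         . "WHERE id <> :id AND MATCH(tag) AGAINST(:tag2 IN BOOLEAN MODE) "
         . "ORDER BY score DESC LIMIT " . $limit;
}

/**
 * Return the titles most similar to record $id.
 * $pdo is assumed to be an open PDO connection to the database that
 * holds the music table; $tag is that record's encoded tag string.
 */
function similar_titles(PDO $pdo, $id, $tag, $limit = 20)
{
    $stmt = $pdo->prepare(similar_sql($limit));
    $stmt->execute(array(
        ':tag'  => trim($tag),
        ':tag2' => trim($tag),
        ':id'   => (int) $id,
    ));
    return $stmt->fetchAll(PDO::FETCH_COLUMN, 0); // first column: title
}

echo similar_sql(20), "\n";
```

Because the record itself is filtered out in the WHERE clause, the limit can be 20 instead of 21.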

If you want to see the effect, you can view the demo on my site: http://www.dj965.com

The example above implements a simple PHP + MySQL Chinese word segmentation full-text index.

I hope it is of some help to beginners like me.

My blog: http://hi.baidu.com/web20/

 
