Implementing full-text search with PHP + MySQL, and a full-text search tool

Source: Internet
Author: User
Tags explode urlencode

This approach relies on a word segmentation library. For the segmentation library, see: http://www.xunsearch.com/scws/

How can full-text search be implemented in PHP?
Many people can think of several options, such as searching files or using SQL LIKE statements, but these methods are fairly inefficient.
This article introduces a more efficient way to implement full-text search in PHP: MySQL's FULLTEXT index type. However, MySQL FULLTEXT does not support Chinese well, so this article also describes how to achieve Chinese full-text search with PHP + MySQL.
First, you need SCWS, a Chinese word segmentation extension module for PHP. For installation and usage, see http://www.ftphp.com/scws/ (if you have questions, please leave a message).
Next, some background on the MySQL FULLTEXT index type:
MySQL has supported full-text indexing and searching since version 3.23.23. A full-text index in MySQL is an index of type FULLTEXT.
FULLTEXT indexes are used with MyISAM tables and can be created on CHAR, VARCHAR, or TEXT columns, either at CREATE TABLE time or afterwards with ALTER TABLE or CREATE INDEX. (InnoDB tables have also supported FULLTEXT indexes since MySQL 5.6.) For large data sets, it is much faster to load the data into a table without a FULLTEXT index and then create the index with ALTER TABLE (or CREATE INDEX) than to load the data into a table that already has one.

MySQL full-text search is performed with the MATCH() function.
Here is a simple example:
1. Create a data table:
CREATE TABLE fulltext_sample (copy TEXT, FULLTEXT (copy)) ENGINE=MyISAM;
Here copy is a TEXT column with a FULLTEXT index. If no full-text index was defined when the table was created, it can be added with ALTER TABLE, for example:
ALTER TABLE fulltext_sample ADD FULLTEXT (copy);
2. Insert data:
INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');
3. Retrieve data:
SELECT * FROM fulltext_sample WHERE MATCH (copy) AGAINST ('love');
This is MySQL's full-text search feature. Note that searches on a full-text index are case-insensitive.

Now let's see how to achieve Chinese full-text search.
A FULLTEXT index works on words, and words must be separated by spaces. Chinese sentences have no spaces between words, so the text must first be segmented into words. This is why the Chinese word segmentation extension module mentioned above is needed.
However, even after segmentation, MySQL's MATCH cannot perform full-text search on Chinese text directly; the words must first be converted in some way. A relatively simple and practical method (there are certainly better ones) is the following function, which converts each Chinese word with urlencode.
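function q_encode($str)
{
    // Split the segmented text on spaces and drop empty items
    $data = array_filter(explode(" ", $str));
    // Remove duplicate words
    $data = array_flip(array_flip($data));
    $data_code = "";
    foreach ($data as $ss) {
        // Skip one-byte words; strip % from the urlencoded result
        if (strlen($ss) > 1)
            $data_code .= str_replace("%", "", urlencode($ss)) . " ";
    }
    $data_code = trim($data_code);
    return $data_code;
}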
Save the converted content in the column that carries the FULLTEXT index. At query time, the search keywords must be converted in the same way.
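
As a rough illustration of the conversion described above, the sketch below encodes a space-separated string of already segmented words into the %-free tokens that would be stored in the FULLTEXT column. The function name encode_terms and the sample input are assumptions made for this example, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (name chosen for this example): turn space-separated,
// already segmented UTF-8 words into %-free urlencoded tokens.
function encode_terms(string $segmented): string
{
    // Split on single spaces and drop empty items.
    $words = array_filter(explode(' ', $segmented), 'strlen');
    // Remove duplicates while keeping first-seen order.
    $words = array_unique($words);
    $tokens = [];
    foreach ($words as $word) {
        // Skip one-byte words, as the function in the article does.
        if (strlen($word) > 1) {
            $tokens[] = str_replace('%', '', urlencode($word));
        }
    }
    return implode(' ', $tokens);
}

// Sample input: three segmented words, one of them repeated.
echo encode_terms('中文 全文 检索 全文'), "\n";
// prints: E4B8ADE69687 E585A8E69687 E6A380E7B4A2
```

Each Chinese word becomes a run of hexadecimal characters, which MySQL can index like an ordinary space-separated English word.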

A PHP + MySQL method for UTF-8 full-text search

This article explains how to perform full-text search quickly over a very large amount of data. MySQL provides a full-text indexing feature: add a FULLTEXT index to a column, then search it with the MATCH ... AGAINST clause of a SELECT statement.

We developed a purely English site, touchus, the Global Yellow Pages & Business Directory (www.touchus.org), using this MySQL feature; its average full-text retrieval time is under 0.5 seconds. But while developing the Chinese version of touchus, the City Yellow Pages Network (www.city39.cn), we ran into a new problem. English text separates words with spaces, so FULLTEXT supports it fully. Chinese and other East Asian text is not so simple: there is no visible separator between words, so MySQL full-text search does not work on it.

How can MySQL be made to support full-text search in Chinese? One idea occurred to us: segment the Chinese text into words, encode each word into English characters so that every Chinese word maps to a specific ASCII string, and then run the full-text search on the encoded text. Would that give us a Chinese full-text index? After trying it, the answer is yes. The following is the process as implemented on the City Yellow Pages Network:

1. Create a separate index table. For example, for the members table we create a members_index table.

User information table (members)    User information full-text index table (members_index)
user_id                             user_id
user_name                           index_intro
user_introduction

Add a FULLTEXT index to the index_intro column of the members_index table.
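
The index table might be created along the following lines. This is only a sketch: the table and column names come from the layout above, while the column types, the character set, and the MyISAM engine are assumptions made for illustration.

```php
<?php
// Assumed schema for illustration: column types, character set, and the
// MyISAM engine are guesses; only the table and column names come from
// the article.
$ddl = [
    "CREATE TABLE members_index (
        user_id     INT UNSIGNED NOT NULL PRIMARY KEY,
        index_intro TEXT NOT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8",
    "ALTER TABLE members_index ADD FULLTEXT (index_intro)",
];

// With a PDO connection in $pdo (not created here), each statement
// could be run with $pdo->exec($sql). Here they are only printed.
foreach ($ddl as $sql) {
    echo $sql, ";\n";
}
```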

2. Segment the Chinese text in the user_introduction column of the user information table (members).

For the segmentation itself, refer to the Simple Chinese Word Segmentation system (SCWS) at http://www.ftphp.com/scws/. On the City Yellow Pages Network we use the SCWS PHP extension module. The extension is easy to install and works after a simple compile and configuration step. In our PHP code, the following function segments the text and joins the resulting words with spaces.

// Chinese word segmentation function
function str_fc($str) {
    $so = scws_new();
    $so->set_charset('utf8');
    // set_dict and set_rule are not called here, so the system
    // automatically tries the dictionary and rule files at the
    // paths specified in the ini file
    $so->send_text($str);
    $mystr = "";
    while ($tmp = $so->get_result()) {
        foreach ($tmp as $ss) {
            $s = trim($ss['word']);
            if ($s)
                $mystr .= $s . " ";
            // Debug output: echo urlencode($s) . " ";
        }
    }
    return $mystr;
}

This function returns the segmented words joined by spaces.

3. Encode the segmentation results. Many encodings would work, such as base64, urlencode, or converting Chinese characters to pinyin; GB2312 text could even use its location-code encoding. Considering storage space and convenience, we use PHP's urlencode. Two points deserve attention. Before encoding, remove duplicate words to save storage space. After encoding, remove the % symbols from the result, because urlencode, which follows RFC 1738, generates many % characters, and % is a wildcard character in MySQL. Here is the PHP code used for the encoding step:

$data = str_fc($data);                      // Chinese word segmentation
$data = array_filter(explode(" ", $data));  // remove empty array items
$data = array_flip(array_flip($data));      // remove duplicate items

// urlencode the segmentation results
$data_code = "";
foreach ($data as $ss) {
    if (strlen($ss) > 1)
        $data_code .= str_replace("%", "", urlencode($ss)) . " ";
}

Here $data_code is the encoded result. Store it in the user information full-text index table (members_index), keyed by user_id.
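
Storing the encoded result could look roughly like this. The helper name build_index_write, the sample user_id, and the use of REPLACE (which assumes user_id is the primary key of members_index) are all assumptions for this sketch, not the site's actual code.

```php
<?php
// Hypothetical helper: build a parameterized statement that stores the
// encoded tokens for one user. REPLACE overwrites an existing row with the
// same user_id, which assumes user_id is the primary key of members_index.
function build_index_write(int $userId, string $dataCode): array
{
    $sql = 'REPLACE INTO members_index (user_id, index_intro) VALUES (?, ?)';
    return [$sql, [$userId, $dataCode]];
}

// Sample values chosen for illustration only.
[$sql, $params] = build_index_write(42, 'E4B8ADE69687 E585A8E69687');

// With a PDO connection in $pdo (not created here):
//   $pdo->prepare($sql)->execute($params);
echo $sql, "\n";
```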

4. At search time, apply the same segmentation and encoding to the keywords the user enters, then run a fast full-text search with a MySQL SELECT ... MATCH ... AGAINST statement. Using the user_id values in the search results, the original data can be read from the user information table (members) for display, so nothing needs to be decoded or reassembled.
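
A minimal sketch of the search step follows. It assumes the keywords have already been segmented into space-separated words (for example by the SCWS-based function), and the helper name build_search is hypothetical. The table and column names are the ones used in this article.

```php
<?php
// Hypothetical helper: encode already segmented keywords the same way as the
// indexed text, then build a parameterized MATCH ... AGAINST query.
function build_search(string $segmentedKeywords): array
{
    $tokens = [];
    foreach (array_unique(explode(' ', $segmentedKeywords)) as $word) {
        // Skip empty items and one-byte words; strip % from the result.
        if (strlen($word) > 1) {
            $tokens[] = str_replace('%', '', urlencode($word));
        }
    }
    $sql = 'SELECT user_id FROM members_index WHERE MATCH (index_intro) AGAINST (?)';
    return [$sql, [implode(' ', $tokens)]];
}

[$sql, $params] = build_search('全文 检索');

// With a PDO connection in $pdo (not created here):
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
//   $userIds = $stmt->fetchAll(PDO::FETCH_COLUMN);
echo $params[0], "\n"; // prints: E585A8E69687 E6A380E7B4A2
```

The returned user_id values can then be used to fetch the original rows from the members table.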

That is the method for MySQL UTF-8 Chinese full-text search.


Full-Text Search tool: http://www.xunsearch.com

