MySQL Chinese Full-Text Search: Learning Notes

Last Update:2017-01-13 Source: Internet

Author: User

Tags character set ini mysql tutorial


In MySQL, a fuzzy keyword search can be done with LIKE, which finds fields that contain the keyword. When the table is large (millions of rows, say), a pattern such as '%keyword%' cannot use an index, and the resulting full table scan makes performance poor. How, then, do you find the rows that contain a keyword? This is where MySQL full-text search comes in. There is a prerequisite: the searched columns need a FULLTEXT index, and in the MySQL versions covered here that index can only be built on MyISAM tables. The search then takes the form MATCH (field1[, field2, field3 ...]) AGAINST ('keyword1 keyword2'), which finds the rows whose given fields contain certain keywords.
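The two query styles described above can be sketched with a pair of small PHP helpers that build the SQL with a bound placeholder for the keyword. The function names are hypothetical, and the sketch assumes the table and column names come from your own code rather than from user input, since they are interpolated directly into the SQL.

```php
<?php
// Hypothetical helpers building the two query shapes compared above.
// Table and column names are interpolated as-is, so they must come from
// trusted code; the search term itself is always bound as a parameter.

// LIKE: the bound value will be '%keyword%'. The leading wildcard prevents an
// ordinary index on the column from being used, so every row is scanned.
function build_like_sql(string $table, string $column): string
{
    return "SELECT * FROM {$table} WHERE {$column} LIKE ?";
}

// Full-text: the MATCH() column list must match the columns of a FULLTEXT
// index; several keywords are passed as one space-separated string.
function build_fulltext_sql(string $table, array $columns): string
{
    $cols = implode(', ', $columns);
    return "SELECT * FROM {$table} WHERE MATCH ({$cols}) AGAINST (?)";
}

echo build_like_sql('articles', 'body'), PHP_EOL;
echo build_fulltext_sql('articles', ['title', 'body']), PHP_EOL;
```

With PDO, for example, `$pdo->prepare(build_fulltext_sql('articles', ['title', 'body']))` followed by `->execute(['keyword1 keyword2'])` runs the full-text search, while the LIKE version needs the wildcards in the bound value, as in `->execute(['%keyword%'])`.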

Another difference: LIKE matches characters, while full-text search matches words. Take the sentence "Welcome to the Martian era IT development": with LIKE, any run of consecutive characters from the sentence, even a meaningless fragment, will find the row. Full-text search only finds it through meaningful indexed words such as "Martian era" or "development". As a rule, LIKE is fine for short content such as a fuzzy search on names, while looking up keywords inside long passages of text calls for full-text search.

For English content, full-text search works directly: words are separated by spaces, so MySQL can split the text at the spaces and find the records that contain the search keyword. Chinese text has no spaces between words, so MySQL cannot find records that contain a Chinese keyword directly. The fix is to split the content into words before writing it to the database and store it with a space between every word; MySQL can then run full-text searches on the Chinese content. For example, the Chinese sentence rendered here as "Welcome to the Martian era IT development" is written to the database with a space inserted between each of its words.

I came across SCWS, a Chinese word segmenter that works quite well, and spent a little time learning it. It ships with rules for proper nouns, personal names, place names, numbers, dates and so on, splits a sentence into words according to these rules, and has an accuracy of roughly 90% to 95%. Following the installation instructions, put the SCWS extension into PHP's extension directory, download the rule and dictionary files, and reference them in the PHP configuration file; SCWS is then ready for word segmentation.

Here is a simple usage example:

<?php
// Instantiate the SCWS core class
$so = scws_new();
// Set the character set of the text to be segmented
$so->set_charset('utf-8');
// Set the segmentation dictionary (a UTF-8 dictionary here)
$so->set_dict('/path/dict.utf8.xdb');
// Set the segmentation rules
$so->set_rule('/path/rules.utf8.ini');
// Strip punctuation before segmentation
$so->set_ignore(true);
// Compound segmentation: a word such as 中国人 ("Chinese person")
// is returned together with its component words 中国 and 人
$so->set_multi(true);
// Automatically aggregate stray single characters into two-character words
$so->set_duality(true);
// Text to be segmented
$so->send_text("Welcome to the Mars era IT development");
// Fetch the results batch by batch; use get_tops() instead to extract high-frequency keywords
while ($tmp = $so->get_result()) {
    print_r($tmp);
}
$so->close();
?>
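Segmenting is only half of the job: the words must then be joined with spaces before they are written to the indexed column. A minimal sketch, assuming `$so` is an SCWS object set up as in the example above and that each entry returned by `get_result()` carries the word under a `'word'` key; `scws_to_index_text` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: drain an SCWS session and return its words joined by
// single spaces, the form MySQL's full-text parser can index.
// Assumes $so came from scws_new(), is configured, and has already received
// text via send_text(). Each get_result() call returns one batch of word
// entries (arrays with a 'word' key), and false once the text is used up.
function scws_to_index_text($so): string
{
    $words = [];
    while ($batch = $so->get_result()) {
        foreach ($batch as $entry) {
            $word = trim($entry['word']);
            if ($word !== '') {      // skip whitespace-only tokens
                $words[] = $word;
            }
        }
    }
    // One space between words, e.g. ['火星', '时代'] becomes '火星 时代'
    return implode(' ', $words);
}
```

A design note: running the user's search keywords through the same segmenter, with the same dictionary and rules, keeps the query terms consistent with the words that were indexed.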

Note: as in the SCWS example above, the input text, the dictionary and the rules file must all use the same character set. In addition, some MySQL 4.x versions do not support Chinese full-text search; in that case you can store the location codes (the GB2312 区位码) corresponding to the keywords instead, which makes full-text search possible.
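The location-code workaround can be sketched as follows. This rests on my own assumptions: that "location code" means the 4-digit GB2312 row/position code, that every character of a keyword exists in GB2312, and that the mbstring extension is available; `to_location_code` is a hypothetical name.

```php
<?php
// Hypothetical helper: replace each Chinese character of a keyword with its
// 4-digit GB2312 location code (row 01-94 followed by position 01-94), so the
// stored token consists of ASCII digits only. ASCII characters are kept as-is.
// Characters missing from GB2312 are turned into '?' by mb_convert_encoding.
function to_location_code(string $word): string
{
    // EUC-CN is the byte encoding of GB2312: row + 0xA0, position + 0xA0
    $gb = mb_convert_encoding($word, 'EUC-CN', 'UTF-8');
    $out = '';
    $len = strlen($gb);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($gb[$i]);
        if ($b >= 0xA1 && $i + 1 < $len) {
            // Double-byte character: subtract 0xA0 from each byte
            $row = $b - 0xA0;
            $pos = ord($gb[++$i]) - 0xA0;
            $out .= sprintf('%02d%02d', $row, $pos);
        } else {
            // Single-byte (ASCII) character
            $out .= $gb[$i];
        }
    }
    return $out;
}

echo to_location_code('啊阿'), PHP_EOL; // 16011602
```

The same conversion has to be applied to the search keyword before it goes into AGAINST(). Since every Chinese character becomes four digits, even a one-character word yields a token that meets the default minimum word length of 4.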

MySQL full-text retrieval notes

1. MySQL 4.x and later versions support full-text search, but the table must use the MyISAM storage engine (from MySQL 5.6, InnoDB tables also support FULLTEXT indexes).
Here is the CREATE TABLE statement; note that the storage engine is set explicitly:

CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT (title, body)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Here FULLTEXT (title, body) builds one full-text index over the title and body columns, so when searching you must name both columns in MATCH().

2. Insert test data

INSERT INTO articles (title, body) VALUES
('MySQL Tutorial', 'DBMS stands for DataBase ...'),
('How to use MySQL ok', 'after you went through a ...'),
('Optimizing MySQL', 'in this tutorial we will show ...'),
('1001 MySQL Tricks', '1. Never run mysqld as root. 2...'),
('MySQL vs. Yoursql', 'in the following database comparison ...'),
('MySQL security', 'when configured properly, MySQL ...');

3. Full-Text Search test

SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('database');

The search returns the following rows:

5 | MySQL vs. Yoursql | in the following database comparison ...
1 | MySQL Tutorial | DBMS stands for DataBase ...

This shows that full-text matching is case-insensitive: the keyword 'database' also matched 'DataBase'.

4. Problems that may be encountered

So far, so good. But what happens if the query is changed to the following?

SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('OK');

The result was surprising: no rows came back, even though the second row's title contains "ok". It puzzled me for a long time until I searched online and found the explanation:

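MySQL sets a minimum word length for full-text indexing, 4 characters by default. Shorter words are not indexed, so a search term must be at least 4 characters long to return results. You can check the current value with SHOW VARIABLES LIKE 'ft_min_word_len'; and change it by adding a line such as ft_min_word_len = 2 under the [mysqld] section of the MySQL configuration file my.ini, then restarting MySQL. After the restart, rebuild the existing FULLTEXT indexes (for example with REPAIR TABLE articles QUICK;), because words skipped when the index was built are not added automatically.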
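The length rule above explains the empty result: "OK" has only 2 characters. A small check can flag such terms before the query is sent. This is a hypothetical helper of my own, and it assumes the query is split on whitespace and that length is counted in characters (not bytes), matching the default minimum of 4.

```php
<?php
// Hypothetical helper: list the search terms shorter than ft_min_word_len,
// which the full-text index would not contain and which therefore cannot match.
function terms_below_min_len(string $query, int $minLen = 4): array
{
    // Split on any run of whitespace; drop empty pieces
    $terms = preg_split('/\s+/u', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    // Keep only terms with fewer than $minLen characters, then reindex
    return array_values(array_filter(
        $terms,
        fn (string $t): bool => mb_strlen($t, 'UTF-8') < $minLen
    ));
}

echo implode(', ', terms_below_min_len('OK database')), PHP_EOL; // OK
```

Passing the configured value, e.g. `terms_below_min_len($query, 2)` after lowering ft_min_word_len to 2, keeps the check in line with the server setting.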

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
