MySQL Chinese Fuzzy Lookup inexact solution

Source: Internet
Author: User
Tags ini mysql in mysql tutorial

For example, by using the title to retrieve a news library, the keyword may contain both Chinese and English, the following SQL statement:

Select Id,title,name from achech_com.news where title like '%a% '


Returns, some title fields are determined with the "a" keyword, while others have only Chinese, but are returned in the search results.

Workaround, using the BINARY property to retrieve:

Select Id,title,name from achech_com.news where binary title like '%a% '

The returned results are more correct than before, but the English letters are case-sensitive, so there are times when the results of retrieval such as "Achech" and "Achech" are different. Know the use of the BINARY property can solve the previous problem, and then look at the MySQL-supported UCASE and CONCAT functions, where UCASE is to all uppercase in English, and the function of the CONCAT function is to connect the characters, the following is a fully resolved SQL statement:

Select Id,title,name from achech_com.news where binary UCase (title) like Concat ('% ', UCase (' a '), '% ')


The retrieval step is to specify the attribute as BINARY to accurately retrieve the result, while the title content of like is the possibility of uppercase and lowercase letters, so first use the UCase function to convert all the field contents to uppercase letters, and then do like operations, while like operations using the Fuzzy method, using the CO NCAT's advantage is that it can be a direct keyword, do not need to take the "%" million, the "' A '" directly into your variables, in any language everything is all right. Of course, you can also write this:

Select Id,title,name from achech_com.news where binary UCase (title) Like UCase ('%a% ')


Another way, higher performance, MySQL Full-text search

1. MySQL 4.x version and above version provides Full-text search support, but the table's storage engine type must be MyISAM, the following is the table SQL, note that the storage engine type is explicitly set


CREATE TABLE Articles (
ID INT UNSIGNED auto_increment not NULL PRIMARY KEY,
Title VARCHAR (200),
Body TEXT,
Fulltext (Title,body)
) Engine=myisam DEFAULT Charset=utf8; where Fulltext (title, body) to the title and body of the two columns to establish a Full-text index, and then retrieve the note must be specified both columns.

2. Insert test data


INSERT into articles (Title,body) VALUES
(' MySQL Tutorial ', ' DBMS stands for DataBase ... '),
(' How to use MySQL ok ', ' after you went through a ... '),
(' Optimizing MySQL ', ' in this tutorial we'll show ... '),
(' 1001 MySQL Tricks ', ' 1. Never run mysqld as root. 2... '),
(' MySQL vs. Yoursql ', ' in the following database comparison ... '),
(' MySQL security ', ' when configured properly, MySQL ... ');

3. Full-Text Search test

SELECT * from articles
WHERE MATCH (title,body) against (' database '); The results of the search are as follows:

5 MySQL vs. Yoursql in the following database comparison ...   1 MySQL Tutorial DBMS stands for DataBase ... Indicates that case is ignored for full-text matching.

4. Problems that may be encountered

So far, so good, but what if the search for SQL is changed to below?

SELECT * from articles
WHERE MATCH (title,body) against (' OK '); The results of the people surprised, I began to be puzzled for a long time, then went online to find out the original is so:

mysql Specifies the minimum character length, the default is 4, you must match more than 4 will have return results, you can use show VARIABLES like ' Ft_min_word_len ' to see the specified length of characters, can also be in the MySQL configuration file My.ini   Change the minimum character length by adding a row to the My.ini, such as: Ft_min_word_len = 2, restart MySQL after the change. So the above cannot return the result. But I used the above method to change the configuration file and restart the MySQL server, and then use the show command to view, and did not change.

In addition, MySQL calculates the weight of a word to determine whether it appears in the result set, as follows:

MySQL in the set and query for each of the appropriate words will first calculate their weight, a word appearing in multiple documents will have a lower weight (may even have a 0 weight), because in this particular set, it has a lower semantic value.   Otherwise, if the word is less, it will get a higher weight, the MySQL default threshold is 50%, above ' you ' appears in every document, so is 100%, only below 50% will appear in the result set. But what if you don't consider the weight? MySQL provides Boolean Full-text search (Boolean fulltext search)

Assuming well to appear in all records, and Ft_min_word_len has been changed to 2, the following SQL retrieval statement results in a result set that contains all the records:

SELECT * from articles WHERE MATCH (title,body)
Against (' OK ' in BOOLEAN MODE);

5. Boolean Full-text Search Syntax

The above method specifies that Full-text search is a Boolean Full-text search through in Boolean mode. MySQL also provides some syntax similar to those we use when using the seo/seo.html "target=" _blank > Search engine: Logical, logical, or logical. Several examples of SQL statements are used to illustrate

SELECT * from articles WHERE MATCH (title,body)
Against (' +apple-banana ' in BOOLEAN MODE); + means and, that must be included. -Indicates not, that is, not included.

SELECT * from articles WHERE MATCH (title,body)
Against (' Apple banana ' in BOOLEAN MODE); There is a space between Apple and banana, and a space denotes or, that is, at least one of Apple, banana.

SELECT * from articles WHERE MATCH (title,body)
Against (' +apple banana ' in BOOLEAN MODE); You must include Apple, but you will get a higher weight if you also include banana.

SELECT * from articles WHERE MATCH (title,body)
Against (' +apple ~banana ' in BOOLEAN MODE); ~ is the familiar XOR or operator. The returned record must contain Apple, but if it also contains banana, the weight will be lowered. But it is not +apple-banana strict because the latter does not return if it contains banana.

SELECT * from articles WHERE MATCH (title,body)
Against (' +apple + (>banana <orange) ' in BOOLEAN MODE); Returns records that contain both Apple and banana or both apple and orange. But the records that contain both Apple and banana are more weighted than the records that contain both Apple and orange.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.