How to solve the problem of inaccurate mysql fuzzy search

Source: Internet
Author: User
Tags mysql tutorial

In my understanding, mysql itself has poor support for Chinese characters. If we have a small amount of non-standard operations, then the Chinese characters will become even more messy, the following describes how to solve the problem of inaccurate fuzzy search in mysql.

For example, if you search the news library by using the "title", the keywords may contain both Chinese and English. The following SQL statement:

Select id, title, name from achech_com.news where title like '% a %'


In the returned results, some title fields are identified with the "a" keyword, while some are only Chinese, but are also returned in the search results.

Solution: Use the BINARY Attribute for retrieval:

Select id, title, name from achech_com.news where binary title like '% a %'

The returned results are correct, but the English letters are case-sensitive. Therefore, sometimes the results for searching such as "Achech" and "achech" are different. Now that we know that using the BINARY Attribute can solve the preceding problem, let's look at the UCASE and CONCAT functions supported by MySQL. UCASE converts all English words into uppercase letters, the CONCAT function is used to connect characters. The following is the SQL statement after the full solution:

Select id, title, name from achech_com.news where binary ucase (title) like concat ('%', ucase ('A'), '% ')


The search procedure is to first specify the attribute as BINARY for exact search results. The content of the like title may contain uppercase/lowercase letters, therefore, the ucase function is used to convert all the fields into uppercase letters and then perform the like operation. The like operation uses the fuzzy method, the advantage of using concat is that the input can be a direct keyword, without the "%" universal character, replace "'A'" with your variable directly, everything is easy in any language. You can also write the following statement:

Select id, title, name from achech_com.news where binary ucase (title) like ucase ('% a % ')


Another way is to achieve higher performance, MySQL full-text search

1. mySQL 4. full-text search is supported in Version x and later versions, but the storage engine type of the table must be MyISAM. The following table creation SQL statements are used. Note that the storage engine type is explicitly set.


Create table articles (
Id int unsigned AUTO_INCREMENT not null primary key,
Title VARCHAR (200 ),
Body TEXT,
FULLTEXT (title, body)
) ENGINE = MyISAM default charset = utf8; FULLTEXT (title, body) creates a full-text index for the title and body columns, and you must specify these two columns at the same time.

 

2. Insert Test Data


Insert into articles (title, body) VALUES
('Mysql Tutorial ', 'dbms stands for DataBase ...'),
('How To Use MySQL well', 'after you went through ...'),
('Optimizing mysql', 'In this tutorial we will show ...'),
('1970 MySQL Tricks ', '1. Never run mysqld as root. 2 ....'),
('Mysql vs. yoursql', 'In the following database comparison ...'),
('Mysql security', 'When configured properly, MySQL ...');

3. Full-text search test

SELECT * FROM articles
Where match (title, body) AGAINST ('database'); the search result is as follows:

5 MySQL vs. YourSQL In the following database comparison... 1 MySQL Tutorial DBMS stands for DataBase... the case sensitivity is ignored during full-text matching.

 

4. Possible troubles

So far it has been quite smooth, but what if I change the search SQL to the following?

SELECT * FROM articles
Where match (title, body) AGAINST:

Mysql specifies the minimum character length. The default value is 4. The returned result must be matched to a value greater than 4. You can use show variables like 'ft _ min_word_len 'to view the specified character length, you can also go to the mysql configuration file my. ini to change the minimum character length, in the my. add a line for ini, for example, ft_min_word_len = 2. Restart mysql after modification. Therefore, the preceding results cannot be returned. However, I used the above method to modify the configuration file and restart the MySQL server, and then run the show command to check it.

In addition, MySQL calculates the weight of a word to determine whether it appears in the result set, as shown below:

Mysql calculates their weights for each appropriate word in a set and query. A word that appears in multiple documents has a lower weight (or even a zero weight ), because in this particular set, it has lower semantic values. Otherwise, if the word is less, it will get a higher weight. The default mysql threshold value is 50%. The above 'you' appears in each document, so it is 100%, only less than 50% will appear in the result set. But what should we do if we do not consider the weight? MySQL provides boolean fulltext search)

If well appears in all records and ft_min_word_len is changed to 2, The result set obtained by the following SQL query statement will contain all records:

 

SELECT * FROM articles where match (title, body)
AGAINST ('well' in boolean mode );

 

5. boolean full-text search syntax

In boolean mode is used to specify the full-text retrieval MODE as BOOLEAN full-text retrieval. MySQL also provides some similar syntaxes that we usually use when using seo/seo.html "target =" _ blank "> search engines: logic and, logic or, non-logic. Several SQL statement examples are used to describe

SELECT * FROM articles where match (title, body)
AGAINST ('+ apple-banana' in boolean mode); + indicates AND, which must be included. -Indicates NOT, that is, NOT included.

SELECT * FROM articles where match (title, body)
AGAINST ('apple banana 'in boolean mode); There is a space between apple and banana, and a space indicates OR, that is, at least one of apple and banana is included.

SELECT * FROM articles where match (title, body)
AGAINST ('+ apple banana' in boolean mode); must contain apple, but if it also contains banana, it will get a higher weight.

SELECT * FROM articles where match (title, body)
AGAINST ('+ apple ~ Banana 'in boolean mode );~ Is an exclusive OR operator that we are familiar. The returned record must contain apple, but if it also contains banana, the weight is reduced. However, it is not strict with apple-banana, because the latter does not return if it contains banana.

SELECT * FROM articles where match (title, body)
AGAINST ('+ apple + (> banana <orange)' in boolean mode); returns records that contain both apple and banana or both apple and orange. However, records that contain both apple and banana have a higher weight than those that contain both apple and orange.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.