Two ways to implement fuzzy queries in MySQL (LIKE and REGEXP)

Source: Internet
Author: User

There are two ways to implement fuzzy queries in MySQL: LIKE / NOT LIKE, and REGEXP / NOT REGEXP. This tutorial introduces how each of them is used.

First, use LIKE / NOT LIKE.
Second, use REGEXP / NOT REGEXP (or RLIKE / NOT RLIKE, which are synonyms).

First: standard SQL pattern matching.

LIKE has two wildcards: "_" and "%". "_" matches any single character, while "%" matches any number of characters (including zero).
Example:

SELECT * FROM table_name WHERE column_name LIKE 'm%'; # all records whose field begins with m or M
SELECT * FROM table_name WHERE column_name LIKE '%m%'; # all records whose field contains m or M
SELECT * FROM table_name WHERE column_name LIKE '%m'; # all records whose field ends with m or M
SELECT * FROM table_name WHERE column_name LIKE '_m_'; # all records whose field has three characters with m or M in the middle
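
The four patterns above can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite is not MySQL, but its LIKE treats "%" and "_" the same way and is case-insensitive for ASCII letters, as MySQL is under its default collations). The sample rows and the helper name like are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption, see lead-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("mysql",), ("Mango",), ("Item",), ("ham",), ("amy",), ("dog",)],
)


def like(pattern):
    """Return the values for which column_name LIKE pattern holds, sorted."""
    rows = conn.execute(
        "SELECT column_name FROM table_name WHERE column_name LIKE ?",
        (pattern,),
    )
    return sorted(row[0] for row in rows)


starts_with_m = like("m%")   # begins with m or M
contains_m = like("%m%")     # contains m or M anywhere
ends_with_m = like("%m")     # ends with m or M
m_in_middle = like("_m_")    # exactly three characters, m or M in the middle

print(starts_with_m, contains_m, ends_with_m, m_in_middle)
```

Note that "_m_" only matches three-character values, which is why "Mango" and "mysql" are not returned by the last pattern.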

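What if we want to search for strings that themselves contain a wildcard character, such as 50% or _get?
The answer is to escape it. You can escape a wildcard directly with a backslash, or use the ESCAPE clause to define your own escape character. Either way, the escape character applies only to the single character that follows it.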

For example:

SELECT * FROM table_name WHERE column_name LIKE '%50\%%'; /* the 2nd % is escaped: all records whose field contains 50% */
SELECT * FROM table_name WHERE column_name LIKE '%50/%%' ESCAPE '/'; # the 2nd % is escaped
SELECT * FROM table_name WHERE column_name LIKE '%/_get%' ESCAPE '/'; /* "_" is escaped: all records whose field contains _get */
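
To see what the ESCAPE clause changes, here is a small sketch that again uses Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite supports the same ESCAPE clause, but unlike MySQL it has no default escape character, so only the explicit '/' form is shown. The sample rows and helper names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption, see lead-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("save 50% today",), ("room 501",), ("call _get first",), ("forget it",)],
)


def like_plain(pattern):
    """LIKE without an escape character: % and _ are always wildcards."""
    rows = conn.execute(
        "SELECT column_name FROM table_name WHERE column_name LIKE ?",
        (pattern,),
    )
    return sorted(row[0] for row in rows)


def like_escaped(pattern):
    """LIKE with '/' as escape character: '/%' and '/_' are literals."""
    rows = conn.execute(
        "SELECT column_name FROM table_name "
        "WHERE column_name LIKE ? ESCAPE '/'",
        (pattern,),
    )
    return sorted(row[0] for row in rows)


plain_percent = like_plain("%50%%")          # matches anything containing "50"
escaped_percent = like_escaped("%50/%%")     # matches only a literal "50%"
plain_underscore = like_plain("%_get%")      # "_" still matches any character
escaped_underscore = like_escaped("%/_get%") # matches only a literal "_get"

print(plain_percent, escaped_percent, plain_underscore, escaped_underscore)
```

Without escaping, "room 501" and "forget it" are matched as well, because "%" and "_" are still treated as wildcards.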


Second: pattern matching with extended regular expressions.

Here are the meanings of some characters in extended regular expressions:
".": matches any single character.
"?": matches the preceding subexpression 0 or 1 times.
"+": matches the preceding subexpression one or more times.
"*": matches the preceding subexpression zero or more times. x* matches zero or more x characters; [0-9]* matches any number of digits.
"^": matches at the start of the string.
"$": matches at the end of the string.
"[]": denotes a character set. [hi] matches h or i; [a-d] matches any one of a, b, c, and d.
"{}": denotes a repetition count. 8{5} matches five 8s, that is, 88888; [0-9]{5,11} matches 5 to 11 digits.

Let's look at an example:

SELECT * FROM table_name WHERE column_name REGEXP '^50%{1,3}';

/* Query all records whose field starts with 50%, 50%%, or 50%%% */
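
The operators listed above can be tried quickly with Python's re module. This is only a sketch: Python's regular expression engine is not the one MySQL uses, and the two differ in details, but the basic operators shown here (".", "?", "+", "*", "^", "$", "[]", "{}") behave the same way. The helper name regexp is hypothetical; it uses re.search because REGEXP succeeds when the pattern matches anywhere in the value unless it is anchored.

```python
import re


def regexp(pattern, value):
    """Mimic `value REGEXP pattern`: true if the pattern matches anywhere."""
    return re.search(pattern, value) is not None


# The example pattern from the text: "50" followed by one to three "%".
starts_50_percent = regexp("^50%{1,3}", "50% off")
starts_50_double = regexp("^50%{1,3}", "50%%")
no_percent = regexp("^50%{1,3}", "50 percent")
not_at_start = regexp("^50%{1,3}", "about 50%")

# A few of the other operators from the list.
five_eights = regexp("8{5}", "x88888y")           # {} repetition count
four_digits = regexp("^[0-9]{5,11}$", "1234")     # too short for {5,11}
five_digits = regexp("^[0-9]{5,11}$", "12345")
set_match = regexp("^[a-d]+$", "abcd")            # [] set with + repetition
set_miss = regexp("^[a-d]+$", "abe")
optional_u = regexp("colou?r", "color")           # ? makes "u" optional

print(starts_50_percent, starts_50_double, no_percent, not_at_start)
```

Because the pattern '^50%{1,3}' has no "$" at the end, it matches any value that begins with 50%, whatever follows.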

Third: for more advanced needs, use FULLTEXT full-text search.

We will walk through full-text search step by step with examples.

First, we create the table and initialize the data.

CREATE TABLE IF NOT EXISTS `category` (
  `id` int(10) NOT NULL auto_increment,
  `fid` int(10) NOT NULL,
  `catname` char(255) NOT NULL,
  `addtime` char(10) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `catname` (`catname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5;


INSERT INTO `category` (`id`, `fid`, `catname`, `addtime`) VALUES
(1, 0, 'Welcome to you!', '123'),
(2, 0, 'Hello phpjs, you are welcome', '123'),
(3, 0, 'This is the fan site of you', '123');

Before the examples, let's look at the syntax of MySQL full-text search. The MATCH() function performs a natural language search for a string against a text collection (a set of one or more columns included in a FULLTEXT index). The search string is given as the argument to AGAINST(). The search is case-insensitive. Put simply, MATCH names the columns to search (they must be covered by a FULLTEXT index), and AGAINST gives the string to look for. Multiple words in the search string are separated by spaces or punctuation; MySQL splits them automatically.


1. Search for the keyword phpjs:

SELECT * FROM `category` WHERE MATCH (catname) AGAINST ('phpjs');

 

Returned results:

id  fid  catname                       addtime
2   0    Hello phpjs, you are welcome  1263363416

The query matches the row that contains the keyword phpjs.

 
2. Search for the keyword this:

SELECT * FROM `category` WHERE MATCH (catname) AGAINST ('this');

 
 
Following the same reasoning, the third row contains "this", so the query should match the third row. Strangely, the result is empty. Why?

It turns out that MySQL enforces a minimum word length for full-text indexing. The default is 4, and words shorter than that are not indexed, so they can never be matched. (Very common words such as "this" are additionally on the default stopword list and are ignored regardless of length.) You can view the current value with SHOW VARIABLES LIKE 'ft_min_word_len', and change it in the MySQL configuration file my.ini by adding a line such as ft_min_word_len = 2. After the change, restart MySQL and rebuild the FULLTEXT index (for example with REPAIR TABLE category QUICK).

 
3. Suppose we change the minimum word length to 2. All three rows contain "you", so we would expect a search for "you" to return every row.

SELECT * FROM `category` WHERE MATCH (catname) AGAINST ('you');

 
The result is still empty. Why?

The reason is that MySQL computes a weight for every word in the collection and in the query. A word that appears in many documents has a lower weight (possibly even zero), because it carries less semantic value in that particular collection. Conversely, a rare word receives a higher weight. MySQL's default threshold is 50%: "you" appears in every row, that is, in 100% of them, and only words that appear in fewer than 50% of the rows contribute to the result set.
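
The two filters described so far, the minimum word length and the 50% threshold, can be illustrated with a toy model. The Python sketch below is not MySQL's implementation: it is a simplified illustration under stated assumptions, it deliberately leaves out the stopword list and relevance ranking, and the function names words, natural_language_search and boolean_search are made up. The rows are the three catname values inserted earlier.

```python
import re

# The three catname values from the example table, keyed by id.
rows = {
    1: "Welcome to you!",
    2: "Hello phpjs, you are welcome",
    3: "This is the fan site of you",
}


def words(text, min_len):
    """Lowercase the text, split on non-word characters, drop short words."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}


def boolean_search(term, min_len=4):
    """Toy boolean-mode search: minimum word length applies, no threshold."""
    term = term.lower()
    if len(term) < min_len:
        return []  # too short to be indexed at all
    return [rid for rid, text in rows.items() if term in words(text, min_len)]


def natural_language_search(term, min_len=4, threshold=0.5):
    """Toy natural-language search: also drops words found in >= 50% of rows."""
    hits = boolean_search(term, min_len)
    if len(hits) >= threshold * len(rows):
        return []  # the word is too common in this collection
    return hits


phpjs_hits = natural_language_search("phpjs")          # rare word, long enough
you_default = natural_language_search("you")           # shorter than 4
you_min_len_2 = natural_language_search("you", 2)      # in 3 of 3 rows
you_boolean = boolean_search("you", min_len=2)         # threshold not applied

print(phpjs_hits, you_default, you_min_len_2, you_boolean)
```

In this toy model, lowering the minimum word length alone does not help for "you", because the word then fails the 50% check instead; only the search without the threshold returns all three rows.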
 

4. Some readers may say: I don't care about weights. If there is a match, just return it. What should I do?

As of MySQL 4.0.1, you can use the IN BOOLEAN MODE modifier to perform a boolean full-text search. Boolean mode does not apply the 50% threshold; the minimum word length and the stopword list still apply.

SELECT * FROM `category` WHERE MATCH (catname) AGAINST ('you' IN BOOLEAN MODE);

 
 

Conclusion:

1. Pay attention to the minimum word length;

2. Pay attention to keyword weights.
