MySQL Chinese search match and regular

Source: Internet
Author: User
Tags control characters

Today when using the SQL fuzzy query contains the letter D, found that some do not include this letter is also queried:

SELECT *  from Custom WHERE   like ' %d% '

Query for a moment, found the following instructions:

(Last modified to:

SELECT *  from Custom WHERE   like BINARY ' %d% '

Today, when I was doing a search for MySQL, I found out that when I use the select name from the contact where name is like '%a% ', the result of a name other than the name "new" is included in the search results. This makes me want to find out how MySQL matches the pattern and rules, so I decided to check the information to understand, in addition to match the regular expression is also very common! So I'm going to record my learning of these two things here!
This problem occurs because MySQL is case insensitive when querying strings, and when Yi MySQL is generally used as the default character set for the ISO-8859 character set, this behavior is caused by the Chinese encoding character-case conversion during comparison.


Workaround:
1. When building a table, add the "binary" attribute to the field containing the Chinese, making it a binary comparison, for example, "name char (10)" to "name Char (TEN) binary". But this is case-sensitive when you match the field of the table.


2. If you compile MySQL using the source code, you can use the--WITH--CHARSET=GBK parameter when compiling, so that MySQL supports Chinese search and sorting directly.


3. Use MySQL's locate function to judge. Such as:
SELECT * FROM table WHERE Locate (SUBSTR,STR) >0;
There are two forms of locate (): LOCATE(substr,str)LOCATE(substr,str,pos)。返回substr在str中的位置,如果str不包含substr返回0。这个函数也是不区分大小写的。


4.这样使用sql语句:SELECT * FROM TABLE WHERE FIELDS LIKE BINARY ‘%FIND%‘,但是这和1一样是区分大小写的如果你想进行不区分大小写的查询的时候就要使用upper或者lower进行转换。


5.使用binary和ucase函数及concat函数。ucase是讲英文全部转换大写,concat对字符串进行连接。新的sql语句如下:
select id,title,name from achech_com.news where binary ucase(title) like concat(‘%‘,ucase(‘a‘),‘%‘) 
也可以写为select id,title,name from achech_com.news where binary ucase(title) like ucase(‘%a%‘) 
检索的结果还算满意吧,不过速度可能会因此而慢N毫秒喔。 因为使用like和%进行匹配的话对效率会有一定的影响。

正则表达式:
Regular expressions are a powerful way to specify patterns for complex searches.
^
The matched string begins with a string that follows
mysql> Select "Fonfo" REGEXP "^fo$"; -0 (indicates mismatch)
mysql> Select "Fofo" REGEXP "^FO"; --1 (indicates a match)
$
The matched string ends with the preceding string
Mysql> Select "Fono" REGEXP "^fono$"; --1 (indicates a match)
Mysql> Select "Fono" REGEXP "^fo$"; -0 (indicates mismatch)
.
Match any character (including new lines)
mysql> Select "Fofo" REGEXP "^f.*"; --1 (indicates a match)
mysql> Select "Fonfo" REGEXP "^f.*"; --1 (indicates a match)
A *
Match any number of Aces (including empty strings)
mysql> Select "Ban" REGEXP "^ba*n"; --1 (indicates a match)
mysql> Select "Baaan" REGEXP "^ba*n"; --1 (indicates a match)
mysql> Select "Bn" REGEXP "^ba*n"; --1 (indicates a match)

A +
Matches any sequence of 1 or more a characters.

mysql> Select "Ban" REGEXP "^ba+n"; --1 (indicates a match)
mysql> Select "Bn" REGEXP "^ba+n"; -0 (indicates mismatch)

A?
Match one or 0 a
mysql> Select "Bn" REGEXP "^ba?n"; --1 (indicates a match)
mysql> Select "Ban" REGEXP "^ba?n"; --1 (indicates a match)
mysql> Select "Baan" REGEXP "^ba?n"; -0 (indicates mismatch)

De|abc
Match de or ABC
mysql> Select "PI" REGEXP "Pi|apa"; --1 (indicates a match)
Mysql> select "Axe" REGEXP "Pi|apa"; -0 (indicates mismatch)
Mysql> Select "APA" REGEXP "Pi|apa"; --1 (indicates a match)
Mysql> Select "APA" REGEXP "^ (Pi|apa) $"; --1 (indicates a match)
mysql> Select "PI" REGEXP "^ (Pi|apa) $"; --1 (indicates a match)
mysql> select "Pix" REGEXP "^ (Pi|apa) $"; -0 (indicates mismatch)

(ABC) *
Match any number of ABC (including Empty strings)
mysql> Select "PI" REGEXP "^ (pi) *$"; --1 (indicates a match)
mysql> Select "Pip" REGEXP "^ (pi) *$"; -0 (indicates mismatch)
mysql> Select "Pipi" REGEXP "^ (pi) *$"; --1 (indicates a match)

{1} {2,3}
This is a more comprehensive approach that enables the functionality of several previous reserved words
A *
Can be written as a{0,}
A
Can be written as A{1,}
A?
can be written a{0,1}
There is only one integer parameter I in {}, which indicates that the character can only appear I, and there is an integer parameter I in {}.
Followed by a ",", indicating that the character can appear I or I times or more, in {} There is only one integer parameter I,
Followed by a ",", followed by an integer parameter J, indicating that the character can only appear more than I times, J times the following
(including I and J times). Where the integer parameter must be greater than or equal to 0, less than or equal to Re_dup_max (default is 25
5). If both M and n,m are given, they must be less than or equal to N.

[A-DX], [^A-DX]

Matches any character that is (or is not, if used ^) A, B, C, D, or X. The "-" character between the two other characters forms a range that matches all characters from the beginning of the 1th character to the 2nd character. For example, [0-9] matches any decimal number. To include the literal character "]", it must be immediately after the opening parenthesis "[". To include the literal character "-", it must be written first or last. for [] Any character that does not have any special meaning defined internally, it only matches itself.

Mysql> Select "AxBC" REGEXP "[a-dxyz]"; 1 (means match)  
mysql> Select "AxBC" REGEXP "^[a-dxyz]$"; 0 ( &NBSP,
mysql> Select "AxBC" REGEXP "^[a-dxyz] $"; 1 (for matching)  
mysql> Select "AxBC" REGEXP "^[^a -DXYZ] $ "; -0 (indicates mismatch)  
mysql> Select "Gheis" REGEXP "^[^a-dxyz] $"; 1 (for matching)  
mysql> Select "Gheisa" REGEXP "^[^a-dxyz] $"; -0 (indicates mismatch)

[[. characters.]]  
represents the order in which elements are compared. The order of the characters within the parentheses is unique. But the parentheses can contain wildcard characters,  
so he can match more characters. For example: regular expression [[. Ch.]] *c matches the first five characters of a CHCHCC  
.

[=character_class=] 
represents equal classes, which can be substituted for other equal elements in the class, including itself. For example, if O and () are  
is a member of an equal class, then [[=o=], [[= () =]], and [O ()] are exactly equivalent.  

[: character_class:] 
in parentheses, in the middle of [: and:] is the name of the character class, which can represent all the characters belonging to this class.   The
character class names are: Alnum, digit, punct, alpha, graph, space, blank, lower, uppe ,
R, Cntrl, print, and xdigit 
mysql> Select "Justalnums" REGEXP "[[: Alnum:]]"; -1 (means match)  
mysql> select "!!" REGEXP "[[: Alnum:]]"; -0 (indicates mismatch)  

Alnum

Literal numeric characters

Alpha

Literal characters

Blank

Whitespace characters

Cntrl

Control characters

Digit

Numeric characters

Graph

Graphic characters

Lower

Lowercase text characters

Print

Graphic or space character

Punct

Punctuation characters

Space

Spaces, tabs, new lines, and carriage returns

Upper

Uppercase text characters

Xdigit

Hexadecimal numeric characters


[[: <:]]
[[:;:]]
Match an empty string at the beginning and end of a word, and the beginning and end of the word are not included in the Alnum.
Characters cannot be underlined.
Mysql> Select "A word a" REGEXP "[[: <:]]word[[:>:]]"; --1 (indicates a match)
Mysql> Select "A Xword a" REGEXP "[[: <:]]word[[:>:]]"; -0 (indicates mismatch)
Mysql> Select "Weeknights" REGEXP "^ (Wee|week) (knights|nights) $"; -1 (indicates
Match

To use a literal instance of a special character in a regular expression, precede it with a 2 backslash "/" character. The MySQL parser is responsible for interpreting one, and the regular expression Library is responsible for interpreting the other. For example, to match the string "1+2" that contains the special character "+", only the last one is correct in the following regular expression:

SELECT ' 1+2 ' REGEXP ' 1+2 ';                       0
SELECT ' 1+2 ' REGEXP ' 1/+2 ';                      0
SELECT ' 1+2 ' REGEXP ' 1//+2 ';                     1

MySQL Chinese search match and regular

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.