MySQL regular expression and its application

Source: Internet
Author: User
Tags character classes control characters alphanumeric characters

Pattern matching for MySQL WHERE clause

Today in the application of this problem encountered,
There is a field T1, where the value is similar to: 1,1,1,2,3,3,4,4,5,5,2,4,3,2,1,2

Need to search from inside for example: the first comma before the number range is 3-5, the third comma before the number of the range of 3-5, the 10th comma before the number range is 3-5, the rest is 1-5 ...

The SQL statement can be written like this:

    1. SELECT *  from TB WHERE T1 REGEXP ' ^[3-5],[1-5],[3-5],[1-5],[1-5],[1-5],[1-5],[1-5],[1-5],[3-5],[1-5],[1-5],[1-5],[1-5],[1-5],[1-5]% ' ;

1. Use the like and not-like comparison operators (note that you cannot use = or! =);

2. The default mode is to ignore the case;

3. Allow the use of "_" to match any single character, "%" matches any number of characters (including 0 characters);

MySQL also provides an extended regular expression pattern matching format like the Unix utility:

1. Use the regexp and not regexp operators (or rlike and not rlike, they are synonyms);

2. The regexp pattern match matches any place of the matched character, then the match succeeds (that is, if the matched character contains or can be equal to the defined pattern, the match succeeds);

Unlike the like pattern match, only matches the entire value, and the match succeeds (that is, only the matched character matches the defined pattern exactly).

3. RegExp default is also case-insensitive, you can use binary keyword to enforce case-sensitive;

such as: SELECT * from pet WHERE name REGEXP BINARY ' ^b ';

4. The regular expression is an expression that can describe a set of strings. The regexp operator completes an extended regular expression match for MySQL . The function implemented by RegExp is that the match is successful if it is partially or exactly the character that is described by the defined expression.

1) The simplest regular expression is a regular expression that does not contain any special characters, such as Hello.

SELECT * from pet WHERE name REGEXP ' Hello '; means that if a row of name contains the word hello, the match succeeds. (Note the difference between like and like that a row of name must be exactly equal to Hello in order to match successfully).

2) non-trivial regular Expressions , in addition to containing the simplest expression of those things, but also need to use special special structure, the use of characters, look down. (therefore, the usual regular expression is an expression that consists of ordinary words and these regular expression characters)

5. Extend some characters of the regular expression :

1) '. ' Match any single character;

2) [...] Matches any character within square brackets, you can use '-' to represent a range, such as [a-z],[0-9], and you can mix [a-dxyz] to match any one of the a,b,c,d,x,y,z; (note using parentheses and ' | ') Method can also achieve the same effect, such as (A|B|C) match any one of the a,b,c), in addition, you can use ' ^ ' to denote negation, such as [^a-z] means that there is no one character in the middle of a-Z;

3) ' * ' indicates a match of 0 or more characters in front of it. If x* represents 0 or more x characters,. * Indicates that any number of characters are matched;

4) The pattern positioning must match the beginning or end of the matched character, preceded by a "^" in the matching pattern: Indicates that the match starts at the beginning of the matched character and adds "$" after the matching pattern: Indicates that the match is to be made to the end of the matched character.

5) ' + ' indicates a match of 1 or more characters in front of it. such as A + represents 1 or more a characters.

6) '? ' Represents a character that matches 0 or 1 characters in front of it. If a? represents 0 or 1 a characters.

7) ' | ' Such as DE|ABC indicates a matching sequence de or ABC. Note though [...] It can also represent one of the matches, but each time only a single character is represented and [a-bxyz] actually represents only one character per time.

() parentheses can be applied in an expression, making it easier to understand.

9) A{5} indicates a matching total of 5 a,a{2,8} to match 2~8 a.

A * can be written as a{0,} The second parameter ellipsis means no upper bound; A + can be written as a{1,};a? can be written as a{0,1}

More precisely, a{n} matches the n instances of a exactly. A{n,} matches N or more instances of a. A{m,n} matches an m~n instance of a, containing M and n
M and n must be in the range of 0~re_dup_max (default 255) and contain 0 and Re_dup_max. If both M and n,m are given, they must be less than or equal to N.
<!--[If!supportlinebreaknewline]-->
<!--[endif]-->

10) Standard Category [: Character_class:]:

Some common standard categories are commonly used in [], as they are used in [] and [A-z], each time only one character is replaced. (This is somewhat similar to some of the common standard categories defined in Perl: \w represents a word character that is [a-za-z0-9];\w a non-word character opposite \w; \d a number is [0-9];\d a non-numeric; \s A white space character is [\t\f\r\n];\f for a page break; \s a non-white space character)

Standard category name:

Alnum
Literal numeric characters

Alpha
Literal characters

Blank
White space characters

Cntrl
Control characters

Digit
numeric characters

Graph
Graphic characters

Lower
lowercase text characters

Print
Graphic or space character

Punct
Punctuation character

Space
Spaces, tabs, new lines, and carriage returns

Upper
Uppercase literal characters

Xdigit
hexadecimal numeric characters

Usage examples:

SELECT ' justalnums ' REGEXP ' [[: alnum:]]+ ';
explanation [[: Alnum:]] because [: Alnum:] represents a literal numeric character, it is used in [], so [[: Alnum:]] Represents a character it is a literal or a number. The following + sign indicates 1 or more such words or numbers.

The above statement returns 1. That's because the justalnums is made up of letters.

11) Word Boundary: [[: <:]] means start, [[:];:]] means end:

It defines the beginning and ending bounds of a word, which is a word character, so that [[: <:]] represents the part of the character that precedes it, [[:]:] Represents the part that follows this character. Character characters are alphanumeric characters or underscores (_) for the Alnum class, so [[: <:], [[:]:] All represent characters that are not character characters, as long as they are not alphanumeric characters and underscore (_). So it can be nothing. therefore [[: <:]]word[[:>:]] can match all of the following conditions:

That is, the word word itself, word*** explanation * * * represents any character that is not alphanumeric and _ (e.g., word-net); ***word (for example, Micorsoft Word); ***word*** (for example, this is a word program.)

Example: [[: <:]]word[[:>:]]:

SELECT ' A word a ' REGEXP ' [[: <:]]word[[:>:]] '; The result is true
SELECT ' A Xword a ' REGEXP ' [[: <:]]word[[:>:]] '; The result is false

Note For the last note:
To use special characters in regular expressions , you need to add 2 backslashes ' \ ' in front of these characters,
Example:
SELECT ' 1+2 ' REGEXP ' 1+2 '; Result is 0
SELECT ' 1+2 ' REGEXP ' 1\+2 '; Result is 0
SELECT ' 1+2 ' REGEXP ' 1\\+2 '; Result is 1
Explain:
This is because the MySQL parser resolves the SQL statement:
First, the string ' 1\\+2 ' is parsed into 1\+2;
The 1\+2 is then treated as a regular expression , parsed by the regular Expression library, which represents 1+2.
Therefore, you need to add 2 backslashes.

Do not always make a backslash error, plus a backslash will be confusing:
such as select ' 1t2 ' REGEXP ' 1\t2 ';
The result will return 1
Originally meant to match 1 tabs \ t and 2, but because only one \ So, after parsing programming 1t2, so the match succeeds.
12)
[. Characters.] and [=character_class=]

Appendix G:mysql Regular Expressions

Regular expressions are a powerful way to specify patterns for complex searches.

MySQL is implemented using the regular expression of Henry Spencer , whose goal is to conform to POSIX 1003.2. See C. Credits "href=" http://dev.mysql.com/doc/refman/5.1/zh/credits.html "> Appendix C: Thank you. MySQL uses an extended version to support pattern matching operations that are used with the REGEXP operator in SQL statements. See Pattern Matching "href=" http://dev.mysql.com/doc/refman/5.1/zh/tutorial.html#pattern-matching ">3.3.4.7 section," Pattern matching ".

In this appendix, we summarize the special characters and structures that can be used for REGEXP operations in MySQL, and give some examples. This appendix does not contain all the details that can be found on the regex (7) manual page of Henry Spencer. This manual page is included in the MySQL source distribution, in the regex.7 file in the regex directory.

A regular expression describes a set of strings. The simplest regular expression is a regular expression that does not contain any special characters. For example, the regular expression Hello matches Hello.

Non-trivial regular expressions use special specific structures that allow them to match more than 1 strings. For example, the regular expression Hello|word matches the string Hello or string word.

As a more complex example, the regular expression b[an]*s matches any one of the following strings:Bananas,Baaaaas,Bs, and B any other string that begins, ends with s , and contains any number of a or n characters.

For the REGEXP operator, the regular expression can use any of the following special characters and structures:

·          ^

Matches the starting part of the string.

mysql> SELECT ' fo\nfo ' REGEXP ' ^fo$ '; 0
mysql> SELECT ' fofo ' REGEXP ' ^fo '; 1

·          $

Matches the ending part of the string.

mysql> SELECT ' fo\no ' REGEXP ' ^fo\no$ '; 1
mysql> SELECT ' fo\no ' REGEXP ' ^fo$ '; 0

·          .

Matches any character (including carriage return and new line).

mysql> SELECT ' fofo ' REGEXP ' ^f.*$ '; 1
mysql> SELECT ' fo\r\nfo ' REGEXP ' ^f.*$ '; 1

· a *

Matches any sequence of 0 or more a characters.

mysql> SELECT ' Ban ' REGEXP ' ^ba*n '; 1
mysql> SELECT ' Baaan ' REGEXP ' ^ba*n '; 1
mysql> SELECT ' Bn ' REGEXP ' ^ba*n '; 1

· a +

Matches any sequence of 1 or more a characters.

mysql> SELECT ' Ban ' REGEXP ' ^ba+n '; 1
mysql> SELECT ' Bn ' REGEXP ' ^ba+n '; 0

· a?

Match 0 or 1 a characters.

mysql> SELECT ' Bn ' REGEXP ' ^ba?n '; 1
mysql> SELECT ' Ban ' REGEXP ' ^ba?n '; 1
mysql> SELECT ' Baan ' REGEXP ' ^ba?n '; 0

· de|abc

Match the sequence de or ABC.

mysql> SELECT ' pi ' REGEXP ' Pi|apa '; 1
mysql> SELECT ' axe ' REGEXP ' Pi|apa '; 0
mysql> SELECT ' apa ' REGEXP ' Pi|apa '; 1
mysql> SELECT ' apa ' REGEXP ' ^ (Pi|apa) $ '; 1
mysql> SELECT ' pi ' REGEXP ' ^ (Pi|apa) $ '; 1
mysql> SELECT ' pix ' REGEXP ' ^ (Pi|apa) $ '; 0

· (ABC) *

Match 0 or more instances of a sequence ABC .

mysql> SELECT ' pi ' REGEXP ' ^ (pi) *$ '; 1
mysql> SELECT ' pip ' REGEXP ' ^ (pi) *$ '; 0
mysql> SELECT ' pipi ' REGEXP ' ^ (pi) *$ '; 1

· {1}, {2,3}

The {n} or {m,n} symbols provide a more general way to write regular expressions that match many of the aforementioned atoms (or " portions ") of the pattern. m and n are integers.

o A *

Can be written as a{0,}.

o A +

Can be written as A{1,}.

o a?

Can be written as a{0,1}.

More precisely,a{n} matches the n instances of a exactly. A{n,} matches n or more instances of a . A{m,n} matches the m~N instances of a , containing m and n.

m and n must be in the range of 0~Re_dup_max(default = 255) and contain 0 and re_ Dup_max. If both m and nare given,m must be less than or equal to n.

mysql> SELECT ' abcde ' REGEXP ' a[bcd]{2}e '; 0
mysql> SELECT ' abcde ' REGEXP ' a[bcd]{3}e '; 1
mysql> SELECT ' abcde ' REGEXP ' a[bcd]{1,10}e '; 1

· [A-DX], [^A-DX]

Matches any character that is (or is not, if used ^) A, B, C, D, or X. The "-" character between the two other characters forms a range that matches all characters from the beginning of the 1th character to the 2nd character. For example, [0-9] matches any decimal number. To include the literal character "]", it must be immediately after the opening parenthesis "[". To include the literal character "-", it must be written first or last. for [] Any character that does not have any special meaning defined internally, it only matches itself.

mysql> SELECT ' axbc ' REGEXP ' [a-dxyz] '; 1
mysql> SELECT ' axbc ' REGEXP ' ^[a-dxyz]$ '; 0
mysql> SELECT ' axbc ' REGEXP ' ^[a-dxyz]+$ '; 1
mysql> SELECT ' axbc ' REGEXP ' ^[^a-dxyz]+$ '; 0
mysql> SELECT ' Gheis ' REGEXP ' ^[^a-dxyz]+$ '; 1
mysql> SELECT ' Gheisa ' REGEXP ' ^[^a-dxyz]+$ '; 0

· [. characters.]

In the parentheses expression (using [ and ]), match the sequence of characters used to proofread the element. Characters are single characters or character names such as new lines. In file regexp/cname.h , you can find a complete list of character names.

mysql> SELECT ' ~ ' REGEXP ' [[. ~.]] '; 1
mysql> SELECT ' ~ ' REGEXP ' [[. Tilde.]] '; 1

· [=character_class=]

In the parentheses expression (using [and]), [=character_class=] represents the same type. It matches all characters that have the same collation value, including itself, for example, if O and (+) are members of the equivalent class, then [[[=o=]], [[= (+) =]], and [O (+)] are synonyms. The same class must not be used as a range endpoint.

· [: character_class:]

In the parentheses expression (using [and]), [: Character_class:] represents the character class that matches all the characters of the term class. The standard class name is:

alnum

Literal numeric characters

Alpha

Literal characters

blank

Blank character

cntrl

Control characters

digit

Numeric characters

graph

Graphic characters

Lower

Lowercase text characters

print

Graphics or space characters

punct

Punctuation character

space

Spaces, tabs, new lines, and carriage returns

Upper

Uppercase text characters

xdigit

Hexadecimal numeric characters

They represent the character classes defined in the CType (3) manual page. other class names may be available in specific regions. the character class must not be used as the endpoint of a range.

mysql> SELECT ' justalnums ' REGEXP ' [[: alnum:]]+ '; 1
mysql> SELECT '! '               REGEXP ' [[: alnum:]]+ '; 0

· [[: <:]], [[:;:]]

These tokens represent word boundaries. They match the start and end of Word, respectively. Word is a series of character characters, preceded and followed by no word characters. Character characters are alphanumeric characters or underscores (_) in the Alnum class.

mysql> SELECT ' A word a ' REGEXP ' [[: <:]]word[[:>:]] '; 1
mysql> SELECT ' A xword a ' REGEXP ' [[: <:]]word[[:>:]] '; 0

To use a literal instance of a special character in a regular expression, precede it with a 2 backslash "\" character. The MySQL Parser is responsible for interpreting one, and the regular expression Library is responsible for interpreting the other. For example, to match the string "1+2" that contains the special character "+", only the last one is correct in the following regular expression:

mysql> SELECT ' 1+2 ' REGEXP ' 1+2 '; 0
mysql> SELECT ' 1+2 ' REGEXP ' 1\+2 '; 0
  mysql>    SELECT ' 1+2 ' REGEXP ' 1\\+2 ';                         -> 1  

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.