Regular Expressions are a powerful way to specify patterns for complex searches.
MySQL adopts the Regular Expression of Henry Spencer, and its goal is to comply with POSIX 1003.2. SeeAppendix C: Thank you. MySQL adopts an extended version to support the pattern matching operation used with the Regexp operator in SQL statements. See3.3.4.7, "pattern matching".
In this Appendix, we summarize the special characters and structures that can be used for Regexp operations in MySQL, and provide some examples. This appendix does not contain all the details that can be found on Henry Spencer's RegEx (7) manual page. This manual page is included in the regex.7 file in the RegEx directory of the MySQL source code distribution edition.
A regular expression describes a set of strings. The simplest regular expression is a regular expression without any special characters. For example, the regular expression Hello matches hello.
Extraordinary Regular Expressions use special structures so that they can match more than one string. For example, the regular expression Hello | word matches the string hello or string word.
As a more complex example, the regular expression B [an] * s matches any of the following strings: bananas, baaaaas, BS, and any other string that starts with B, ends with S, and contains any number of A or n characters.
For the Regexp operator, regular expressions can use any of the following special characters and structures:
· ^
Match the start part of the string.
mysql> SELECT 'fo/nfo' REGEXP '^fo$'; -> 0
mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
· $
Matches the end part of the string.
mysql> SELECT 'fo/no' REGEXP '^fo/no$'; -> 1
mysql> SELECT 'fo/no' REGEXP '^fo$'; -> 0
·.
Match any character (including carriage return and new line ).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1
mysql> SELECT 'fo/r/nfo' REGEXP '^f.*$'; -> 1
· *
Matches any sequence of 0 or multiple A characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
· A +
Matches any sequence of one or more A characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
·?
Matches 0 or 1 a character.
mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
· De | ABC
Match the sequence de or ABC.
mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0
mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
· (ABC )*
Matches 0 or multiple instances of the sequence ABC.
mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1
mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0
mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
· {1}, {2, 3}
The {n} or {m, n} symbol provides a more common way to write regular expressions and can match many of the aforementioned atoms (or "parts") of the pattern "). Both m and n are integers.
O *
Can be written as a {0 ,}.
O A +
Can be written as a {1 ,}.
O?
Can be written as a {0, 1 }.
More accurately, A {n} exactly matches n instances of. A {n,} matches N or more instances of. A {m, n} matches M ~ of ~ N instances, including M and N.
M and N must be between 0 and ~ The range of re_dup_max (255 by default) includes 0 and re_dup_max. If both m and n are given, m must be less than or equal to n.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0
mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1
mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
· [A-dx], [^ A-dx]
Match any character that is (or is not, if ^ is used) a, B, c, d, or X. The "-" character between two other characters constitutes a range, and matches all characters starting from 1st characters to 2nd characters. For example, [0-9] matches any decimal number. To include the text character "]", it must be followed by the brackets. To contain the text character "-", it must be written first or last. For any character that does not define any special meaning for [], it only matches itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0
mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1
mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
· [. Characters.]
In the brackets expression (using [and]), match the character sequence used for checking the element. The character is a single character or a new line or another character name. In the Regexp/cname. h file, you can find the complete list of character names.
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1
mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
· [= Character_class =]
In the brackets (using [and]), [= character_class =] indicates the same type. It matches all characters with the same value, including itself. For example, if both O and (+) are similar members, [[= O =], [[= (+) =], and [O (+)] are synonyms. And so on.
· [: Character_class:]
In a bracket expression (using [and]), [: character_class:] indicates the character class that matches all characters in the term class. The standard class name is:
Alnum |
Character |
Alpha |
Character |
Blank |
White space characters |
Cntrl |
Control characters |
Digit |
Numeric characters |
Graph |
Graphical characters |
Lower |
Lowercase characters |
Print |
Graphical or space characters |
Punct |
Punctuation |
Space |
Space, tab, new line, and carriage return |
Upper |
Uppercase characters |
Xdigit |
Hexadecimal numeric characters |
They represent character classes defined on the ctype (3) manual page. Other class names may be provided for specific regions. Character classes cannot be used as endpoints of a range.
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1
mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
· [[: <:], [[:>:]
These tags indicate word boundaries. They match the start and end of word respectively. Word is a series of character characters, with no character at the front and back. The character is a letter, digit, or underscore (_) in the alnum class (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a text instance with special characters in a regular expression, add two backslash (/) characters before it. The MySQL parser is responsible for interpreting one of them, and the regular expression library is responsible for interpreting the other. For example, to match the string "1 + 2" that contains the special character "+", in the following regular expression, only the last one is correct:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1/+2'; -> 0
mysql> SELECT '1+2' REGEXP '1//+2'; -> 1