Regular expressions
Single character notation
The character itself <--any character other than the special characters listed below matches itself
. <--any character
\d <--digit, one of 0123456789
\D <--non-digit
\w <--word character: letters, digits, underscore (_)
\W <--non-word character
\t <--tab
\r <--carriage return
\n <--newline
\s <--whitespace: space, \t, \r, \n
\S <--non-whitespace
[abc] <--a, or b, or c
[a-c] <--a, or b, or c
[0-2] <--0, or 1, or 2
[1-3a-cx-z] <--1, 2, 3, a, b, c, x, y, z
[^abc] <--any character except a, b, and c
[:upper:] <--uppercase letters, [A-Z]
[:lower:] <--lowercase letters, [a-z]
[:alpha:] <--alphabetic characters, [a-zA-Z]
[:alnum:] <--alphanumeric characters, [a-zA-Z0-9]
[:digit:] <--digits, [0-9]
[:xdigit:] <--hexadecimal digits, [a-fA-F0-9]
[:punct:] <--punctuation and symbols, [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]
[:blank:] <--space and tab, [ \t]
[:space:] <--all whitespace characters including line breaks, [ \t\r\n\v\f]
[:cntrl:] <--control characters, [\x00-\x1f\x7f]
[:graph:] <--visible characters (anything except spaces, control characters, etc.), [\x21-\x7e]
[:print:] <--visible characters and spaces (anything except control characters, etc.), [\x20-\x7e]
[:word:] <--word characters (letters, digits, and underscores), [a-zA-Z0-9_]
[:ascii:] <--ASCII characters, [\x00-\x7f]
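As a rough illustration of the bracket expressions and named classes above, the sketch below pulls matching characters out of a sample string with grep -o. The sample string and variable names are made up for the example. Note that in grep a named class has to sit inside a bracket expression, so it is written [[:upper:]] rather than [:upper:], and [[:digit:]] is used in place of \d because POSIX extended regular expressions do not define \d.

```shell
#!/bin/sh
# Use the C locale so that ranges such as [a-c] mean plain ASCII ranges.
export LC_ALL=C

# Hypothetical sample input for this sketch.
sample='Ab3_x-9 Z'

# grep -o prints each match on its own line; tr -d '\n' joins them.
upper=$(printf '%s\n' "$sample" | grep -o '[[:upper:]]' | tr -d '\n')
digits=$(printf '%s\n' "$sample" | grep -o '[[:digit:]]' | tr -d '\n')

# A character range, and a negated set built from a named class.
a_to_c=$(printf '%s\n' "$sample" | grep -o '[a-c]' | tr -d '\n')
not_alnum=$(printf '%s\n' "$sample" | grep -o '[^[:alnum:]]' | tr -d '\n')

printf 'upper=[%s] digits=[%s] a_to_c=[%s] not_alnum=[%s]\n' \
    "$upper" "$digits" "$a_to_c" "$not_alnum"
```

With this input the negated set picks up the underscore, the hyphen, and the space, since none of them is alphanumeric.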
Special character notation
The characters ^ $ . * + ? | \ { } [ ] ( ) have special meanings. To match one of them literally, escape it with a backslash, for example:
\. matches a dot
\\ matches a backslash
\^ matches a caret
\$ matches a dollar sign
A hyphen inside [...] is also special. To include a literal hyphen in [...], put it at the head or tail of the set, for example:
[abc-] or [-abc]
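The escaping rules above can be checked with a small sketch like the following. The input lines are invented for the example; grep -c counts how many input lines match.

```shell
#!/bin/sh
export LC_ALL=C

# An unescaped dot matches any character, so both lines match.
any_char=$(printf '%s\n' 'a.c' 'abc' | grep -cE 'a.c')

# An escaped dot matches only a literal dot, so only 'a.c' matches.
literal_dot=$(printf '%s\n' 'a.c' 'abc' | grep -cE 'a\.c')

# Two backslashes in the pattern match one literal backslash in the input.
backslash=$(printf '%s\n' 'a\b' 'ab' | grep -cE 'a\\b')

# An escaped dollar sign is a literal character, not an end-of-line anchor.
dollar=$(printf '%s\n' 'cost $5' 'cost 5' | grep -cE '\$5')

# A hyphen at the tail of the set is literal: matches x-y and xay, not xcy.
hyphen=$(printf '%s\n' 'x-y' 'xay' 'xcy' | grep -cE 'x[ab-]y')

printf 'any_char=%s literal_dot=%s backslash=%s dollar=%s hyphen=%s\n' \
    "$any_char" "$literal_dot" "$backslash" "$dollar" "$hyphen"
```

The patterns are kept in single quotes so that the shell passes the backslashes through to grep unchanged.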
Quantitative notation
Used to denote the number of repetitions of the preceding character or group of characters (c below is the repetition count)
* <--any number of times, {0,}, c >= 0
+ <--at least 1 time, {1,}, c >= 1
? <--0 or 1 time, {0,1}, c == 0 || c == 1
{m} <--exactly m times, c == m
{m,} <--at least m times, c >= m
{m,n} <--m to n times, c >= m && c <= n
A quantifier applies to the single preceding character, or to a group of characters enclosed in parentheses.
ab+ matches ab, abb, abbb, abbbb ...
(ab)+ matches ab, abab, ababab, abababab ...
By default a quantifier is greedy (maximum match). Some regular expression engines support appending a question mark (?) to the quantifier to make it lazy (minimum match).
.*b  matches aaabababa <--maximum match
             ^^^^^^^^
.*?b matches aaabababa <--minimum match
             ^^^^
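The quantifier behavior described above can be tried out with a sketch along these lines. It assumes a grep that supports -o and -x. Since grep -E has no lazy quantifier, the negated set [^b]*b is used here as a stand-in that happens to give the same first match for this input; the grep -P branch is only attempted if the local grep was built with PCRE support, which is an assumption about the environment.

```shell
#!/bin/sh
export LC_ALL=C

text='aaabababa'

# Greedy: .* grabs as much as possible, so the match ends at the last b.
greedy=$(printf '%s\n' "$text" | grep -oE '.*b')

# Portable stand-in for a lazy match: stop at the first b.
first_b=$(printf '%s\n' "$text" | grep -oE '[^b]*b' | head -n 1)

# Interval quantifier applied to a single character (-x: whole line must match).
two_to_three=$(printf '%s\n' ab abb abbb abbbb | grep -cxE 'ab{2,3}')

# Quantifier applied to a parenthesized group.
repeated_group=$(printf '%s\n' ab abab aba | grep -cxE '(ab)+')

# Optional: a true lazy quantifier, only where grep -P is available.
lazy='(grep -P unavailable)'
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    lazy=$(printf '%s\n' "$text" | grep -oP '.*?b' | head -n 1)
fi

printf 'greedy=%s first_b=%s two_to_three=%s repeated_group=%s lazy=%s\n' \
    "$greedy" "$first_b" "$two_to_three" "$repeated_group" "$lazy"
```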
Grouping notation
(abc) <--the contiguous sequence of characters abc
(aa|bb) <--the contiguous sequence of characters aa or bb
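To see what the parentheses add to an alternation, a small sketch (with an invented input string) can compare the grouped and ungrouped forms. With the group, a following quantifier repeats the whole alternation; without it, each alternative is matched separately.

```shell
#!/bin/sh
export LC_ALL=C

text='aabbaa'

# Grouped alternation with +: one match covering repeated aa/bb pairs.
grouped=$(printf '%s\n' "$text" | grep -oE '(aa|bb)+')

# Plain alternation: every aa or bb is reported as its own match.
separate=$(printf '%s\n' "$text" | grep -oE 'aa|bb' | wc -l | tr -d ' ')

# Whole-line match against either alternative: aa and bb match, ab does not.
either=$(printf '%s\n' aa bb ab | grep -cxE '(aa|bb)')

printf 'grouped=%s separate=%s either=%s\n' "$grouped" "$separate" "$either"
```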
Boundary notation
^ <--the beginning of the string
$ <--End of string
\b <--Word boundary
\B <--non-word boundary
\< <--left word boundary (start of a word)
\> <--right word boundary (end of a word)
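The boundary notations above can be compared on one line of text. This sketch assumes GNU grep, where \b, \B, \< and \> are available as extensions to POSIX regular expressions; other grep implementations may not accept them. The sample line is made up for the example, and note that grep applies ^ and $ to each input line.

```shell
#!/bin/sh
export LC_ALL=C

line='cat concat catalog'

# Count the matches that grep -o reports for a given pattern.
count_matches() {
    printf '%s\n' "$line" | grep -oE "$1" | wc -l | tr -d ' '
}

whole_word=$(count_matches '\bcat\b')  # only the standalone word cat
word_start=$(count_matches '\<cat')    # cat and the start of catalog
word_end=$(count_matches 'cat\>')      # cat and the end of concat
inside=$(count_matches '\Bcat')        # only the cat inside concat

# ^ and $ anchor the pattern to the whole line: 'cat food' does not match.
anchored=$(printf '%s\n' 'cat' 'cat food' | grep -cE '^cat$')

printf 'whole_word=%s word_start=%s word_end=%s inside=%s anchored=%s\n' \
    "$whole_word" "$word_start" "$word_end" "$inside" "$anchored"
```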
Reference notation
Groups are numbered by counting left parentheses (opening parentheses) from the left, starting at 1. The text matched by the first pair of parentheses can be referenced with \1, the text matched by the second pair with \2, and so on.
$ echo abcabcabcaabb | grep -E '(a(bc)){2}\1' --color
abcabcabcaabb
$ echo abcabcabcaabb | grep -E '(a(bc)){2}a\2' --color
abcabcabcaabb
$ echo "hello world, hello world, hello beautiful world" | grep -E --color '((hello) (world)), \1, \2.* \3'
hello world, hello world, hello beautiful world
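Since --color only highlights the match on a terminal, a variant with grep -o can make the matched text itself visible. The sketch below assumes GNU grep (back-references in -E patterns and \< \> are extensions there) and a sed that accepts -E; the doubled-word and swap examples are illustrative additions, not part of the commands above.

```shell
#!/bin/sh
export LC_ALL=C

# \1 repeats the text captured by the outer group, so the match is abc three times.
triple=$(printf '%s\n' 'abcabcabcaabb' | grep -oE '(a(bc)){2}\1')

# Find an accidentally doubled word: a word, a space, then the same word again.
doubled=$(printf '%s\n' 'this is is a test' | grep -oE '\<([a-z]+) \1\>')

# Back-references also work in sed replacements: swap the two captured words.
swapped=$(printf '%s\n' 'hello world' | sed -E 's/(hello) (world)/\2 \1/')

printf 'triple=%s doubled=%s swapped=%s\n' "$triple" "$doubled" "$swapped"
```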
Bash Shell summary (4): regular expressions