Regular Expression Basics (Reading Notes)
A regular expression (regex) is a tool for describing and matching patterns in text.
Two basic functions of a regular expression: search and replace.
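As a rough illustration of these two functions, here is a minimal sketch using Python's built-in re module. The notes do not name a regex engine, so the choice of Python and the sample sentence are assumptions made for this example.

```python
import re

text = "The cat sat on the mat."

# Search: locate the first occurrence of the pattern "cat".
match = re.search(r"cat", text)
found_at = match.start() if match else -1

# Replace: substitute every occurrence of "cat" with "dog".
replaced = re.sub(r"cat", "dog", text)

print(found_at)
print(replaced)
```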
. (period) matches any single character: a letter, a digit, or even the . character itself. In most regex engines the one exception is the line-break character.
\ is the escape character. It is a metacharacter, meaning a character that carries a special meaning instead of standing for itself. Placing \ before a metacharacter makes it match literally; for example, \. matches only a period.
(Conclusion: . matches any single character; \ escapes a metacharacter so that it is matched literally.)
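The difference between an unescaped and an escaped period can be sketched as follows. This assumes Python's re module; the file names are made up for the example.

```python
import re

names = "sales1.xls sales2.xls na1.xls sa1.xls sales.txt"

# Unescaped ".": stands for any single character after "sa".
any_char = re.findall(r"sa.", names)

# Escaped "\.": matches only a literal period before "xls".
literal_dot = re.findall(r".a.\.xls", names)

print(any_char)
print(literal_dot)
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslash before the regex engine sees it.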
[ and ] do not match any characters themselves. They only delimit a character set.
- (hyphen) is a metacharacter that defines a character range. It acts as a metacharacter only between [ and ]; outside a character set it is an ordinary character.
Valid character ranges:
A-Z: matches all uppercase letters from A to Z;
a-z: matches all lowercase letters from a to z;
A-z: matches all characters from ASCII A to ASCII z (not commonly used, because the range also includes non-letter characters such as [ and ^);
^ is the negation metacharacter. Placed immediately after [, it negates the whole character set, so the set matches any character not listed.
Metacharacters can be roughly divided into two types: those used to match text (for example, .) and those required by the regular expression syntax (for example, [ and ]).
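Character sets, ranges, and negation can be tried out with a short sketch like the one below. It assumes Python's re module, and the sample strings are invented for illustration.

```python
import re

text = "na1.xls sa1.xls ca1.xls sam.xls saZ.xls"

# [ns] matches "n" or "s"; [0-9] is a range covering any digit.
set_and_range = re.findall(r"[ns]a[0-9]\.xls", text)

# [^0-9] negates the set: any character that is not a digit.
negated = re.findall(r"sa[^0-9]\.xls", text)

# A-z spans ASCII "A" to "z", which also covers "[" and "_".
wide_range = re.findall(r"[A-z]", "Z[a_9")

# Outside [ and ], the hyphen is an ordinary character.
plain_hyphen = re.findall(r"a-z", "a-z abc")

print(set_and_range, negated, wide_range, plain_hyphen)
```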
// 2015.02.17
Whitespace metacharacters:
[\b] | Backspace (moves back and deletes one character)
\f | Form feed
\n | Line feed (newline)
\r | Carriage return
\t | Tab (Tab key)
\v | Vertical tab
Numeric metacharacters:
\d | Any digit (equivalent to [0-9])
\D | Any non-digit character (equivalent to [^0-9])
Alphanumeric metacharacters:
\w | Any letter (uppercase or lowercase), digit, or underscore (equivalent to [a-zA-Z0-9_])
\W | Any character that is not a letter, digit, or underscore (equivalent to [^a-zA-Z0-9_])
Whitespace class metacharacters:
\s | Any whitespace character (equivalent to [\f\n\r\t\v]; most engines also include the space character)
\S | Any non-whitespace character (equivalent to [^\f\n\r\t\v])
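The class metacharacters above can be checked with a small sketch. It assumes Python's re module, where \d and \w also accept non-ASCII digits and letters by default, so the bracket equivalents hold exactly only for ASCII input. The sample string is invented.

```python
import re

text = "ID_42:\tok\n"

digits = re.findall(r"\d", text)         # each digit separately
non_digits = re.findall(r"\D", "a1b2")   # everything except digits
word_runs = re.findall(r"\w+", text)     # runs of letters, digits, underscore
whitespace = re.findall(r"\s", text)     # the tab and the newline
non_space = re.findall(r"\S+", text)     # runs of non-whitespace

print(digits, non_digits, word_runs, whitespace, non_space)
```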
+ matches one or more occurrences of the preceding character or character set (at least one; it never matches zero occurrences).
* matches zero or more occurrences of the preceding character or character set.
? matches zero or one occurrence of the preceding character or character set.
{n} sets an exact number of repetitions (for example, {3} means the preceding character or character set must appear exactly three times in a row).
{n,m} sets an interval for the number of repetitions (for example, {2,4} means the preceding character or character set must appear at least twice and at most four times in a row; {3,} means it must appear at least three times).
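A minimal sketch of these repetition metacharacters, assuming Python's re module. The helper full_matches is hypothetical and exists only so that each candidate string is tested as a whole.

```python
import re

words = "gd god good"
one_or_more = re.findall(r"go+d", words)   # "o" must appear at least once
zero_or_more = re.findall(r"go*d", words)  # "o" may be absent
optional_s = re.findall(r"https?", "http https")  # "s" is optional


def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


candidates = ["1", "12", "123", "1234", "12345"]
exactly_three = full_matches(r"\d{3}", candidates)
two_to_four = full_matches(r"\d{2,4}", candidates)
at_least_three = full_matches(r"\d{3,}", candidates)

print(one_or_more, zero_or_more, optional_s)
print(exactly_three, two_to_four, at_least_three)
```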
Greedy metacharacters and their lazy versions: * and *?, + and +?, {n,} and {n,}?.
(Conclusion: the real power of regular expressions shows in repeated matching. + matches one or more occurrences of a character or character set, * matches zero or more, and ? matches zero or one. For finer control, the {} syntax sets an exact count or a minimum and maximum number of repetitions. Repetition metacharacters come in two kinds, "greedy" and "lazy"; to prevent over-matching, build the expression with the lazy versions.)
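Over-matching is easiest to see side by side. The following sketch assumes Python's re module and an invented HTML fragment; it compares the greedy * with a ? appended to make it lazy.

```python
import re

html = "<b>AK</b> and <b>HI</b>"

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first closing tag, giving one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)
print(lazy)
```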
\b matches the start or end of a word (a word boundary).
\B matches a position that is not a word boundary.
^ defines the start of a string, and $ defines the end of a string.
(?m) enables multiline mode; (?m) must appear at the very beginning of the pattern.
(Conclusion: regular expressions can match not only text blocks of any length but also text at a specific position in a string. \b specifies a word boundary (\B is the opposite). ^ and $ specify the string boundaries (the start and the end of a string). Combined with (?m), ^ and $ also match at the start and end of each line, immediately after and before a line break (the line break is then treated as a string separator).)
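Position matching can be sketched as below, assuming Python's re module, which accepts (?m) at the start of a pattern (not every engine does). The sample text is invented for the example.

```python
import re

text = "The cat scattered his food. concatenate cat"

# \b...\b: only "cat" as a whole word is replaced.
whole_word = re.sub(r"\bcat\b", "DOG", text)

# \B...\B: only "cat" buried inside a longer word is replaced.
inside_word = re.sub(r"\Bcat\B", "CAT", text)

lines = "first line\nsecond line\nthird line"

# Without (?m), ^ matches only at the start of the whole string.
string_start = re.findall(r"^\w+", lines)

# With (?m), ^ and $ also match at the start and end of every line.
line_starts = re.findall(r"(?m)^\w+", lines)
line_ends = re.findall(r"(?m)\w+$", lines)

print(whole_word)
print(inside_word)
print(string_start, line_starts, line_ends)
```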