Concepts and basic elements:
1. Metacharacters:
Regular expressions consist of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings.
A. Common metacharacters:
Metacharacter | Description
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a line feed, '\\' matches "\", and '\(' matches "(".
. | Matches any single character except "\n". To match any character including "\n", use a pattern such as '[\s\S]' or '(.|\n)'. For example, the regular expression r.t matches "rat", "rut" and "r t", but not "root".
^ | Matches the start of the input string. For example, the regular expression ^When in matches the start of the string "When in the course of human events", but does not match "What is in the".
$ | Matches the end of the input string. For example, the regular expression book$ matches the end of the string "This was a book", but does not match the string "They have many books".
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in 'plan'.
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'. Several ranges can be combined: [a-zA-Z] matches any uppercase or lowercase letter.
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'very'.
\d | Matches a digit character. Equivalent to [0-9].
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\w | Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\n | Matches a line feed. Equivalent to \x0a.
\r | Matches a carriage return. Equivalent to \x0d.
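The examples in this table can be checked with a short sketch. It assumes Python's `re` module, since the article does not name a particular language or regex engine; in Python, `.` can also be made to match a line feed by passing the `re.DOTALL` flag.

```python
import re

def first_match(pattern, text):
    """Return the first substring that pattern matches in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, text, expected first match) based on the table; None = no match
examples = [
    (r"\(", "f(x)", "("),
    (r"r.t", "rat", "rat"),
    (r"r.t", "root", None),
    (r"a.b", "a\nb", None),            # '.' does not match a line feed
    (r"a[\s\S]b", "a\nb", "a\nb"),     # [\s\S] matches any character
    (r"^When in", "When in the course of human events", "When in"),
    (r"^When in", "What is in the", None),
    (r"book$", "This was a book", "book"),
    (r"book$", "They have many books", None),
    (r"[abc]", "plan", "a"),
    (r"[a-zA-Z]+", "42 Apples", "Apples"),
    (r"er\b", "never", "er"),
    (r"er\b", "very", None),
    (r"\d", "abc7", "7"),
    (r"\s", "tab\there", "\t"),
    (r"\w+", "__init__()", "__init__"),
]

for pattern, text, expected in examples:
    result = first_match(pattern, text)
    print(f"{pattern!r:12} in {text!r:40} -> {result!r}")
    assert result == expected, (pattern, text, result)
```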
B. Repetition:
Metacharacter | Description
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? | Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" in "do" or "does". ? is equivalent to {0,1}.
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in 'Bob', but matches the two o's in 'food'.
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in 'Bob', but matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.
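A sketch of the quantifier examples, again assuming Python's `re` module: `fullmatch` requires the pattern to cover the whole string, and `findall` shows how many characters each quantifier consumes on each match.

```python
import re

def matches(pattern, text):
    """True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"zo*", "z"), matches(r"zo*", "zoo"))             # True True
print(matches(r"zo+", "zo"), matches(r"zo+", "z"))              # True False
print(matches(r"do(es)?", "do"), matches(r"do(es)?", "does"))   # True True

# findall returns every non-overlapping match, scanning left to right
print(re.findall(r"o{2}", "Bob"))          # []
print(re.findall(r"o{2}", "food"))         # ['oo']
print(re.findall(r"o{2,}", "foooood"))     # ['ooooo']
print(re.findall(r"o{1,3}", "fooooood"))   # ['ooo', 'ooo']
```

Because quantifiers are greedy by default, 'o{1,3}' takes three o's at a time, so the six o's in "fooooood" yield two matches.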
C. Negation:
Metacharacter | Description
\B | Matches a non-word boundary. 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'.
\D | Matches a non-digit character. Equivalent to [^0-9].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in 'plain'.
[^a-z] | Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not a lowercase letter from 'a' to 'z'.
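The negated forms are the complements of the lowercase classes and of ordinary sets. A small sketch, assuming Python's `re` module; the sample strings other than 'verb', 'never' and 'plain' are made up for illustration.

```python
import re

# \B: a position that is NOT a word boundary
print(re.findall(r"er\B", "verb"))     # ['er']
print(re.findall(r"er\B", "never"))    # []

# \D, \S, \W are the complements of \d, \s, \w
print(re.findall(r"\D", "a1b2"))          # ['a', 'b']
print(re.findall(r"\S+", "two  words"))   # ['two', 'words']
print(re.findall(r"\W", "a_b!c"))         # ['!']  (underscore is a word character)

# [^abc] and [^a-z]: negated set and negated range
print(re.search(r"[^abc]", "plain").group())   # p
print(re.findall(r"[^a-z]", "abC1d"))          # ['C', '1']
```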
D. Other:
1. x|y matches either x or y; the two alternatives are combined with a logical "or".
2. We have already seen how to repeat a single character (by placing a quantifier directly after it). To repeat several characters, enclose them in parentheses to form a subexpression, also called a group. For example, (\d{1,3}\.){3}\d{1,3} is a simple IP address pattern, but it also matches impossible addresses such as 256.300.888.999.
3. Lazy quantifiers:
Metacharacter | Description
*? | Repeat any number of times, but as few times as possible.
+? | Repeat one or more times, but as few times as possible.
?? | Repeat zero times or once, but as few times as possible.
{n,m}? | Repeat n to m times, but as few times as possible.
{n,}? | Repeat at least n times, but as few times as possible.
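Alternation, grouping and lazy quantifiers can be seen side by side in a short sketch, assuming Python's `re` module. The `<b>one</b><b>two</b>` string is a hypothetical sample chosen to show the difference between greedy and lazy matching.

```python
import re

# 1. Alternation: x|y matches either alternative
print(re.findall(r"cat|dog", "cat, dog, bird"))   # ['cat', 'dog']

# 2. Grouping: (\d{1,3}\.){3} repeats "one to three digits and a dot" three times
naive_ip = r"(\d{1,3}\.){3}\d{1,3}"
print(bool(re.fullmatch(naive_ip, "192.168.0.1")))       # True
print(bool(re.fullmatch(naive_ip, "256.300.888.999")))   # True, though not a real address

# 3. Greedy vs lazy: the lazy form stops at the first place the rest can match
html = "<b>one</b><b>two</b>"
print(re.findall(r"<b>.*</b>", html))    # ['<b>one</b><b>two</b>']
print(re.findall(r"<b>.*?</b>", html))   # ['<b>one</b>', '<b>two</b>']
print(re.match(r"a+?", "aaa").group())        # a
print(re.match(r"a{2,4}?", "aaaa").group())   # aa
```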
Simple regular expression applications:
- Registered account name: 5-20 letters, digits or underscores; the first character must be a letter
Pattern: ^[a-zA-Z]\w{4,19}$
- Phone number with area code: XXX-XXXXXXXX or XXXX-XXXXXXX
Pattern: ^(0\d{2}-\d{8}|0\d{3}-\d{7})$
- Valid IP address: (0-255).(0-255).(0-255).(0-255)
Pattern: ^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$
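A sketch that checks the three application patterns, written here in corrected, anchored form, assuming Python's `re` module. Two Python-specific choices are assumptions worth noting: `re.ASCII` keeps `\w` and `\d` limited to the ASCII definitions given in the tables (by default Python also accepts Unicode letters and digits), and `fullmatch` is used because a bare `$` also matches just before a trailing newline. The sample inputs are made up.

```python
import re

# Corrected, anchored patterns; re.ASCII restricts \w to [A-Za-z0-9_] and \d to [0-9]
ACCOUNT = re.compile(r"^[a-zA-Z]\w{4,19}$", re.ASCII)
PHONE = re.compile(r"^(0\d{2}-\d{8}|0\d{3}-\d{7})$", re.ASCII)
IP = re.compile(r"^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$", re.ASCII)

def is_valid(pattern, text):
    """True only if pattern covers all of text (a trailing newline is rejected)."""
    return pattern.fullmatch(text) is not None

checks = [
    ("account", ACCOUNT, ["user_01", "1user", "abc"]),
    ("phone", PHONE, ["010-12345678", "0755-1234567", "010-1234567"]),
    ("ip", IP, ["192.168.0.1", "255.255.255.255", "256.300.888.999"]),
]

for label, pattern, samples in checks:
    for s in samples:
        print(f"{label:8} {s!r:20} {is_valid(pattern, s)}")
```

The IP pattern rejects 256.300.888.999 because each octet must fall into one of the alternatives 200-249, 250-255 or 0-199.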