Summary: In this part we learned basic regular expression matching, which is an entry-level skill. Next we move on to more advanced material.
III. Regular Expressions: Boundaries
Warm-up: Boundaries are one of the core parts of regular expressions. The one term you need is assertion (zero-width assertion).
An assertion (zero-width assertion) marks a boundary. It consumes no characters: instead of matching a character, it matches a position in the string.
Start and end of a string or line: "^" matches the start of a line or string, and in some contexts the start of the entire document.
"$" matches the end of a line or string.
Example:
Regular Expression: "^word$"
Matching string: "word" (only a string consisting of exactly this word, starting with w and ending with d).
Word and non-word boundaries: "\bxxx\b" matches xxx only where it sits between word boundaries.
"\b" is a zero-width assertion. On the surface it appears to match a space or the beginning of a line; in fact it matches a zero-width position.
"\B" is the non-word boundary: it matches every position that is not a word boundary.
Example:
Regular Expression: "\Ba\B"
Matching string: "fhrrhahhr" (neither side of the a is a word boundary, so the character a is matched here).
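The difference between "\b" and "\B" can be checked with a short sketch in Python's re module. The helper name find_positions and the sample strings other than "fhrrhahhr" are illustrative assumptions, not part of the original examples.

```python
import re

def find_positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# \B on both sides: the "a" must have word characters on each side.
inside = find_positions(r"\Ba\B", "fhrrhahhr")

# A lone "a" sits between two word boundaries, so \Ba\B cannot match it.
alone = find_positions(r"\Ba\B", "a")

# \b on both sides: only the standalone word "a" is matched.
word = find_positions(r"\ba\b", "is a word")

print(inside, alone, word)
```

Both assertions consume nothing, which is why the match itself is always just the single character a.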
Other anchors: "\A" is similar to "^": it matches the beginning of the subject. It is not available in every regular expression implementation, but it works in Perl and PCRE. To match the end of the subject, use "\Z"; "\z" can also be used in some contexts.
Example:
Regular Expression: "\Aaaaa\Z"
Matching string: "aaaa" (the subject both starts and ends with aaaa, that is, the whole subject is "aaaa")
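A minimal Python sketch of how "\A" and "\Z" differ from "^" and "$" follows. The three-line subject string is an assumption made for illustration. Note that in Python's re module "\Z" matches only at the very end of the string, which is slightly stricter than "\Z" in Perl.

```python
import re

subject = "aaaa\nbbbb\naaaa"

# With re.MULTILINE, ^ and $ match at the start and end of every line.
per_line = re.findall(r"^aaaa$", subject, flags=re.MULTILINE)

# \A and \Z ignore the flag: they only match at the very start
# and the very end of the whole subject.
whole = re.findall(r"\Aaaaa\Z", subject, flags=re.MULTILINE)

# When the subject is exactly "aaaa", \A...\Z matches.
exact = re.findall(r"\Aaaaa\Z", "aaaa")

print(per_line, whole, exact)
```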
Literal values of metacharacters: The characters between "\Q" and "\E" are matched as a string literal. The 15 metacharacters ". ^ $ * + ? | ( ) { } [ ] \ -" have special meanings in regular expressions and are used to write patterns. The hyphen (-) indicates a range inside the square brackets of a regular expression; elsewhere it has no special meaning. If you type these characters directly into a regular expression, they are interpreted as operators instead of being matched literally. To match them literally, place them between "\Q" and "\E", or put a backslash ("\") in front of each one.
Example:
Regular Expression: "\Q$\E" or "\$"
Matching character: the $ character itself
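Python's re module does not support "\Q...\E", so the sketch below uses re.escape as the closest equivalent; treating it as a stand-in for "\Q...\E" is an assumption of this example, and the sample string is made up.

```python
import re

price = "cost: $5.00 (approx.)"

# A backslash turns the metacharacter $ into a literal dollar sign.
dollars = re.findall(r"\$", price)

# re.escape backslash-escapes every metacharacter in a literal string,
# which plays the role that \Q...\E plays in Perl or PCRE.
literal = "$5.00"
pattern = re.escape(literal)
found = re.search(pattern, price)

# Because the dot was escaped, it no longer matches an arbitrary character.
not_found = re.search(pattern, "cost: $5x00")

print(dollars, found.group(), not_found)
```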
Practice: As in the previous section, we keep adding markup with the Linux sed command. The insert command (i) in sed inserts text before a position in a file or string. Its counterpart, the append command (a), adds text after a position. Concrete sed (or grep, vi, and Vim) examples are not given here; look them up and try them yourself. Here we focus on the regular expressions themselves.
Summary: We learned about boundaries and assertions (zero-width assertions). Without further summary, we move on to the essence of regular expressions.
IV. Alternation, Grouping, and Backreferences
Alternation: Alternation matches one of several alternative patterns. For example, to count how many times "the" occurs in any of the forms the, The, or THE in "the android developer need Fix bug on the bug system.", use alternation.
Regular Expression: "(the|The|THE)" or "(?i)the"
Original string: "the android developer need Fix bug on the bug system."
Matching result: the, the
Both expressions match "the" in these capitalizations; "(?i)the" additionally matches every other mix of upper and lower case.
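As a rough illustration in Python, the two ways of writing the pattern can be compared side by side. The sample sentence is an assumption chosen so that all capitalizations appear.

```python
import re

text = "The bug, the fix, THE end, tHe rest"

# Alternation: only the three spellings that are listed will match.
listed = re.findall(r"(the|The|THE)", text)

# Inline option (?i): every mix of upper and lower case matches.
any_case = re.findall(r"(?i)the", text)

print(len(listed), listed)
print(len(any_case), any_case)
```

The alternation is explicit about which spellings count, while "(?i)" is shorter but also accepts spellings such as "tHe".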
The following are other options and modifiers (note: not every option is supported by every regular expression platform):
Option | Description | Supported platforms
(?d) | Unix lines | Java
(?i) | Case insensitive | PCRE, Perl, Java
(?J) | Allow duplicate names | PCRE
(?m) | Multiline | PCRE, Perl, Java
(?s) | Single line (dotall) | PCRE, Perl, Java
(?u) | Unicode case | Java
(?U) | Default match lazy | PCRE
(?x) | Ignore whitespace and comments | PCRE, Perl, Java
(?-...) | Unset or turn off options | PCRE
Subpatterns: A subpattern is one or more groups within a regular expression, that is, a pattern within a pattern. In most cases a subpattern can match only if the pattern before it has matched, but there are exceptions. For example, in "(the|The|THE)" the alternatives do not depend on one another, because whichever one matches first is used. This example has three subpatterns: the, The, and THE. Subpatterns can be written in many ways; here we focus only on subpatterns in parentheses.
Example (a subpattern whose match depends on the preceding pattern):
Regular Expression: "(t|T)h(e|E)"
Match: the, The, thE, ThE
In the example above, the second subpattern "(e|E)" depends on the first subpattern "(t|T)".
Note in particular that parentheses are not required for a subpattern, as follows:
Regular Expression: "\b[tT]h[eE]"
Match: the, The, thE, ThE
The character class "[tT]" above can be regarded as the first subpattern, and "[eE]" as the second.
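The parenthesized form and the character-class form can be compared with a small Python sketch. The word list is an illustrative assumption, and fullmatch is used so that each word is tested as a whole.

```python
import re

words = ["the", "The", "thE", "ThE", "tHe", "this"]

grouped = re.compile(r"(t|T)h(e|E)")   # subpatterns in parentheses
classes = re.compile(r"[tT]h[eE]")     # same alternatives as character classes

grouped_hits = [w for w in words if grouped.fullmatch(w)]
class_hits = [w for w in words if classes.fullmatch(w)]

# Parentheses also capture what each subpattern matched; classes do not.
captured = grouped.fullmatch("The").groups()
uncaptured = classes.fullmatch("The").groups()

print(grouped_hits, class_hits, captured, uncaptured)
```

Both forms accept the same words; the only visible difference is that the parenthesized form records its pieces as groups.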
Capturing groups and backreferences: When all or part of a pattern is enclosed in a pair of parentheses, the text it matches is captured and temporarily stored in memory. The captured text can be reused through a backreference, written as follows:
"\1", "\2", or "$1", "$2", where the number N refers to the N-th captured group.
sed accepts only the "\1" form.
Example (a backreference with the sed command on Linux):
echo "YanBo is an Android Developer!" | sed -En 's/(YanBo is) (an Android Developer)/\2 \1/p'
Output: an Android Developer YanBo is!
Explanation:
-E tells sed to use ERE (extended regular expressions), so the parentheses do not need to be escaped.
-n suppresses the default printing of every line.
The substitution swaps capture groups 1 and 2, and the trailing p prints the result.
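The same swap can be sketched in Python with re.sub, which also uses the "\1" and "\2" form in the replacement string. This is an equivalent written for illustration, not a translation of sed's internals.

```python
import re

sentence = "YanBo is an Android Developer!"

# Group 1 captures "YanBo is", group 2 captures "an Android Developer".
# The replacement emits them in reverse order.
swapped = re.sub(r"(YanBo is) (an Android Developer)", r"\2 \1", sentence)

print(swapped)
```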
Named groups: A named group is a group with a name. You can reference the group by name instead of by number.
Named group syntax:
Syntax | Description
(?<name>...) | Named group
(?'name'...) | Another way to name a group
(?P<name>...) | Named group in Python
\k<name> | Reference by name in Perl
\k'name' | Reference by name in Perl
\g{name} | Reference by name in Perl
\k{name} | Reference by name in .NET
(?P=name) | Reference by name in Python
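A short sketch of the Python rows of the table: "(?P<name>...)" defines a group, "(?P=name)" refers back to it inside the pattern, and "\g<name>" refers to it in a replacement string. The group names and sample strings are assumptions for illustration.

```python
import re

# Define two named groups and read them back by name.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.search("released 2015-03")
year, month = m.group("year"), m.group("month")

# (?P=word) must match the same text that the group "word" captured.
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
repeat = doubled.search("fix the the bug")

# In a replacement string, \g<name> inserts the named group.
swapped = re.sub(r"(?P<first>\w+) (?P<second>\w+)", r"\g<second> \g<first>", "hello world")

print(year, month, repeat.group(), swapped)
```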
Non-capturing groups: A non-capturing group does not store its content in memory. Use one when you do not need to reference the group. Because nothing is stored, a non-capturing group is cheaper than a capturing group.
Example:
Written as a capturing group: "(the|The|THE)"
If you do not need any backreference, you can write: "(?:the|The|THE)"
Case insensitive: "(?i)(?:the)" or "(?:(?i)the)" or (recommended) "(?i:the)"
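The effect of "(?:...)" is easy to see in Python by inspecting groups(). The sample strings are assumptions; only the recommended "(?i:...)" form is shown for case insensitivity, because recent Python versions reject an inline "(?i)" that is not at the start of the pattern.

```python
import re

capturing = re.search(r"(the|The|THE) bug", "fix The bug")
non_capturing = re.search(r"(?:the|The|THE) bug", "fix The bug")

# Same overall match, but only the capturing version stores a group.
stored = capturing.groups()
not_stored = non_capturing.groups()

# (?i:the) limits case insensitivity to the group: " bug" stays case sensitive.
scoped = re.findall(r"(?i:the) bug", "THE bug and the Bug")

print(stored, not_stored, scoped)
```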
Atomic groups: The atomic group is another kind of non-capturing group. If the regular expression engine would otherwise backtrack, an atomic group disables backtracking, but only for the part inside the atomic group, not for the entire expression. Syntax:
"(?>the)"
Backtracking is one reason regular expressions can be slow.
Summary: No summary here; let's keep going. The next part is more powerful.
VI. Regular Expressions: Quantifiers
Greedy, lazy, and possessive: Quantifiers are greedy by default. A greedy quantifier first tries to match the entire string, then backs off one character at a time until a match is found. It therefore consumes the most resources.
A lazy quantifier uses the opposite strategy. It starts at the beginning of the target and extends the match one character at a time, trying to match the entire string only as a last resort. To make a quantifier lazy, add a question mark (?) after the regular quantifier.
A possessive quantifier grabs the entire target and then tries to find a match, but it tries only once and never backtracks. To make a quantifier possessive, add a plus sign (+) after the regular quantifier.
Matching with *, +, and ?: The following basic quantifiers are greedy by default.
Syntax | Description
? | Zero or one
+ | One or more
* | Zero or more
For example:
Regular Expression: "9+"
Match: one or more 9s
Matching a specific number of times: The following brace quantifiers are the most precise quantifiers. By default, they are greedy.
Syntax | Description
{n} | Match exactly n times
{n,} | Match n or more times
{m,n} | Match m to n times
{0,1} | Same as ?, zero or one
{1,} | Same as +, one or more
{0,} | Same as *, zero or more
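The brace quantifiers and their equivalence to ?, +, and * can be checked with a small Python sketch. The sample strings and the use of "\b" to isolate whole runs of 9s are assumptions made for illustration.

```python
import re

digits = "9 99 999 9999"

exact = re.findall(r"\b9{3}\b", digits)      # exactly three 9s
at_least = re.findall(r"\b9{2,}\b", digits)  # two or more 9s
between = re.findall(r"\b9{1,2}\b", digits)  # one to two 9s

# Each brace form behaves like its shorthand on the same input.
sample = "a99b9"
same_as_optional = re.findall(r"9{0,1}", sample) == re.findall(r"9?", sample)
same_as_plus = re.findall(r"9{1,}", sample) == re.findall(r"9+", sample)
same_as_star = re.findall(r"9{0,}", sample) == re.findall(r"9*", sample)

print(exact, at_least, between)
print(same_as_optional, same_as_plus, same_as_star)
```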
Lazy quantifiers in practice:
Regular Expression: "8?"
Match: zero or one 8
Regular Expression: "8??" (lazy)
Match: not even a single 8 is matched, because a lazy quantifier matches as little as possible.
Regular Expression: "8*?" (lazy)
Match: not even a single 8 is matched, because a lazy quantifier matches as little as possible.
Regular Expression: "8+?" (lazy)
Match: exactly one 8 is matched.
Regular Expression: "8{3,8}?" (lazy)
Match: three 8s are matched.
Lazy quantifiers:
Syntax | Description
?? | Lazy match, zero or one time
+? | Lazy match, one or more times
*? | Lazy match, zero or more times
{n}? | Lazy match, exactly n times
{n,}? | Lazy match, n or more times
{m,n}? | Lazy match, m to n times
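The lazy examples above can be reproduced in Python by matching each pattern at the start of a run of eight 8s. The helper name first and the target string are illustrative assumptions.

```python
import re

eights = "88888888"  # eight 8s

def first(pattern):
    """Return what the pattern matches at the start of the target."""
    return re.match(pattern, eights).group()

results = {
    "8?": first(r"8?"),            # greedy: takes one 8
    "8??": first(r"8??"),          # lazy: takes nothing
    "8*?": first(r"8*?"),          # lazy: takes nothing
    "8+?": first(r"8+?"),          # lazy: takes the required single 8
    "8{3,8}": first(r"8{3,8}"),    # greedy: takes all eight
    "8{3,8}?": first(r"8{3,8}?"),  # lazy: takes the minimum of three
}

for pattern, matched in results.items():
    print(pattern, repr(matched))
```

A lazy quantifier still expands when the rest of the pattern forces it to; here nothing follows the quantifier, so it stops at its minimum.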
Possessive quantifiers:
Syntax | Description
?+ | Possessive match, zero or one time
++ | Possessive match, one or more times
*+ | Possessive match, zero or more times
{n}+ | Possessive match, exactly n times
{n,}+ | Possessive match, n or more times
{m,n}+ | Possessive match, m to n times
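Example:
Regular Expression: "1.*+"
Match: everything from the first 1 to the end of the line, so in a string of 1s all of them are highlighted.
Regular Expression: ".*+1"
Match: no match, because ".*+" takes the whole line and does not backtrack to give back a 1.
Regular Expression: ".*1"
Match: everything up to and including the last 1. This is greedy mode, which does backtrack.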
Summary: The quantifiers introduced here are central to regular expression efficiency. Without further comment, let's keep going.
VII. Regular Expressions: Lookaround
A lookaround is a non-capturing group, also known as a zero-width assertion.
Positive lookahead: Example:
Regular Expression: "(?i)aaa (?=bbb)"
Original string: "aaa ccc bbb aaa bbb ccc AAA"
Match: only the second "aaa" is matched.
This matches aaa only where the word after it is bbb. The positive lookahead checks for bbb without including it in the match.
Negative lookahead: Negative lookahead is the negation of positive lookahead.
Example:
Regular Expression: "(?i)aaa (?!bbb)"
Original string: "aaa ccc bbb aaa bbb ccc"
Match: only the first "aaa" is matched.
This matches aaa only where the word after it is not bbb. The negative lookahead achieves this.
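Both lookahead examples can be run in Python. The sketch below uses the first original string for both patterns and records match positions; using positions instead of highlighting is a choice made for this illustration.

```python
import re

text = "aaa ccc bbb aaa bbb ccc AAA"

# Positive lookahead: "aaa " must be followed by "bbb".
followed = [m.start() for m in re.finditer(r"(?i)aaa (?=bbb)", text)]

# Negative lookahead: "aaa " must NOT be followed by "bbb".
not_followed = [m.start() for m in re.finditer(r"(?i)aaa (?!bbb)", text)]

# The lookahead is zero-width, so "bbb" is never part of the match.
matched_text = re.search(r"(?i)aaa (?=bbb)", text).group()

print(followed, not_followed, repr(matched_text))
```

The final "AAA" matches neither pattern because no space follows it.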
Positive lookbehind: Positive lookbehind looks in the opposite direction from positive lookahead: it checks what comes before the match.
Example:
Regular Expression: "(?<=aaa) bbb"
Original string: "aaa ccc bbb aaa bbb ccc AAA"
Match: only the second "bbb" is matched.
Negative lookbehind: Negative lookbehind, written "(?<!...)", is the negation of positive lookbehind.
Example:
Regular Expression :"(?
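Lookbehind can be sketched in Python in the same way. The positive pattern is the one from the example above; the negative pattern "(?<!aaa) bbb" is an assumption built by analogy, not a pattern taken from the text.

```python
import re

text = "aaa ccc bbb aaa bbb ccc AAA"

# Positive lookbehind: " bbb" must be preceded by "aaa".
after_aaa = [m.start() for m in re.finditer(r"(?<=aaa) bbb", text)]

# Negative lookbehind (assumed example): " bbb" must NOT be preceded by "aaa".
not_after_aaa = [m.start() for m in re.finditer(r"(?<!aaa) bbb", text)]

# The lookbehind is zero-width, so "aaa" is never part of the match.
matched_text = re.search(r"(?<=aaa) bbb", text).group()

print(after_aaa, not_after_aaa, repr(matched_text))
```

Python requires the pattern inside a lookbehind to have a fixed width, which "aaa" satisfies.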
Summary: The examples in this section speak for themselves, so no summary is needed.
Finale: This covers nearly everything about regular expressions. As for how to learn them: be bold in practice, think each pattern through, and then verify it in an editor.
Regular Expression Basics