11. Grouping
You can use parentheses to denote grouping:
- (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day matches one of the days of the week
- (\w*)ility is the same as \w*ility. Both match a word that ends with "ility". We'll explain later why the first form is more useful.
- \(\) matches a pair of parentheses.
- [()] matches either an opening parenthesis or a closing parenthesis
- (red|blue|) matches red, blue, or an empty string
- abc()def is the same as abcdef
- (red|blue)? is the same as (red|blue|)
- \w+(\s+\w+)* matches one or more words separated by whitespace
Example
\w+\W+\w+\W+\w+ = \w+(\W+\w+){2}
\w+\W+\w+\W+\w+\W+\w+\W+\w+\W+\w+ = \w+(\W+\w+){5}
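The equivalences above can be checked with a short Python sketch. The sample text and variable names are made up for illustration, and Python's re module is only one of many implementations, so treat this as a rough check rather than a definitive reference.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "one, two; three - four"

# Three words separated by non-word characters: written out, then with a group.
long_form = re.compile(r"\w+\W+\w+\W+\w+")
short_form = re.compile(r"\w+(\W+\w+){2}")

long_match = long_form.match(text)
short_match = short_form.match(text)
print(long_match.group(0))
print(short_match.group(0))

# (red|blue)? and (red|blue|) should accept the same inputs,
# including the empty string.
optional_a = re.compile(r"(red|blue)?")
optional_b = re.compile(r"(red|blue|)")
for candidate in ["red", "blue", ""]:
    assert optional_a.fullmatch(candidate) is not None
    assert optional_b.fullmatch(candidate) is not None
```

Both compiled patterns stop after the third word, which is what the {2} repetition of the group expresses.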
12. Word Boundaries
- \b matches a word boundary
- \b\w\w\w\b matches a three-letter word
- a\ba requires a word boundary between two a's. This regular expression never matches anything, whatever the input text
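A small Python sketch of the word-boundary rules above. The sample sentences are invented for illustration, and the behavior shown is that of Python's re module.

```python
import re

# Hypothetical sample sentence.
text = "the cat sat on a mat"

# \b\w\w\w\b keeps only the words that are exactly three characters long.
three_letter_words = re.findall(r"\b\w\w\w\b", text)
print(three_letter_words)

# a\ba can never match: two adjacent word characters have no boundary between them.
never_matches = re.search(r"a\ba", "a a aa banana")
print(never_matches)
```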
13. Line Boundaries
A text consists of one or more lines, and the lines are separated by line breaks, for example:
Line
Line break
Line
Line break
...
Line break
Line
Note that a text always ends with a line, never with a line break. However, any line may be empty, including the last one. The start of a line is the zero-width position between a line break and the first character of the next line. As with word boundaries, the start of the text also counts as the start of a line. The end of a line is the zero-width position between the last character of the line and the line break. As with word boundaries, the end of the text also counts as the end of a line. Based on this:
- ^ matches the start of a line
- $ matches the end of a line
- ^$ matches an empty line
- ^.*$ matches the entire text, because a line break is also a character and "." matches it (this assumes an implementation where "." matches line breaks). To match a single line, use the non-greedy ^.*?$
- \^\$ matches the string "^$"
- [$] matches a dollar sign. However, [^] is not a valid regular expression. Remember that special characters behave differently inside square brackets. To match ^ inside square brackets, use [\^]
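The line-boundary rules can be tried out in Python, with two caveats that are specific to Python's re module and not stated in the notes: ^ and $ only act as line anchors when re.MULTILINE is set, and "." only matches a line break when re.DOTALL is set. The sample text is hypothetical.

```python
import re

# Hypothetical three-line text whose middle line is empty.
text = "first line\n\nthird line"

# With MULTILINE, ^ and $ match at the start and end of every line.
lines = re.findall(r"^.*$", text, flags=re.MULTILINE)
empty_lines = re.findall(r"^$", text, flags=re.MULTILINE)

# With DOTALL as well, "." also matches line breaks, so the greedy form
# swallows the whole text while the non-greedy form stops at the first line end.
both = re.MULTILINE | re.DOTALL
greedy = re.search(r"^.*$", text, flags=both).group(0)
lazy = re.search(r"^.*?$", text, flags=both).group(0)

# Escaped, ^ and $ are ordinary characters.
literal = re.search(r"\^\$", "the string ^$ here").group(0)
caret_in_class = re.findall(r"[\^]", "2^3^4")

print(lines, empty_lines, repr(greedy), repr(lazy), literal, caret_in_class)
```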
14. Text Boundaries
In many regular expression implementations, ^ and $ can also stand for the start and end of the whole text, usually depending on a flag.
Other implementations provide separate metacharacters, \A and \z, for the start and end of the text.
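As a rough illustration in Python: by default ^ anchors at the start of the text, and the re.MULTILINE flag switches ^ and $ to line anchors. Note that Python's re module spells the end-of-text anchor \Z, while \z is the spelling used by several other engines. The sample text is made up.

```python
import re

# Hypothetical three-line text.
text = "alpha\nbeta\ngamma"

# Without flags, ^ only matches at the start of the text.
text_start = re.findall(r"^\w+", text)

# With MULTILINE, ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A and \Z stay tied to the text boundaries even when MULTILINE is set.
first_word = re.findall(r"\A\w+", text, flags=re.MULTILINE)
last_word = re.findall(r"\w+\Z", text, flags=re.MULTILINE)

print(text_start, line_starts, first_word, last_word)
```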
15. Capturing Groups
The regular expression (\w*)ility matches a word ending in "ility". Capture group 1 is the part matched by \w*. For example, if the input text contains the word accessibility, capture group 1 is accessib. If the text contains ility on its own, capture group 1 is an empty string.
Practical use
Credit card verification: \d*(\d\d*){16}
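A minimal Python sketch of the capture described above. The input strings are hypothetical; group(0) is the whole match and group(1) is the first captured part.

```python
import re

pattern = re.compile(r"(\w*)ility")

# Hypothetical input containing the word from the text above.
match = pattern.search("improved accessibility")
whole = match.group(0)     # the full matched word
captured = match.group(1)  # the part matched by \w*

# When "ility" stands alone, the group still takes part but captures nothing.
alone = pattern.search("ility").group(1)

print(whole, captured, repr(alone))
```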
Summary:
- Literal characters:
a b c d 1 2 3 4 etc.
- Character classes:
. [abc] [a-z] \d \w \s
. means "any character"
\d means "a digit"
\w means "a word character", [0-9A-Za-z_]
\s means "a space, tab, carriage return, or line feed"
- Negated character classes:
[^abc] \D \W \S
- Repetition:
{4} {3,16} {1,} ? * +
? means "zero or one time"
* means "zero or more times"
+ means "one or more times"
- Unless followed by ?, every repetition takes the longest possible match (greedy)
- Alternation and grouping:
(Septem|Octo|Novem|Decem)ber
- Word, line, and text boundaries:
\b ^ $ \A \z
- Backreferences:
\1 \2 \3 etc. (available in both match expressions and replacement expressions)
- Metacharacters:
. \ [ ] { } ? * + | ( ) ^ $
- Metacharacters inside character classes:
[ ] \ - ^
- Metacharacters can always be escaped with a backslash:
\
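Three summary items that have no example of their own (greedy versus non-greedy repetition, backreferences, and backslash escaping) can be sketched in Python as follows. All sample strings are invented for illustration.

```python
import re

# Greedy vs. non-greedy repetition on a hypothetical snippet.
markup = "<b>bold</b> and <i>italic</i>"
greedy = re.search(r"<.+>", markup).group(0)   # longest possible match
lazy = re.search(r"<.+?>", markup).group(0)    # shortest possible match

# \1 in the match expression finds a doubled word;
# \1 in the replacement expression keeps a single copy of it.
sentence = "this is is a test"
doubled = re.search(r"\b(\w+) \1\b", sentence).group(0)
fixed = re.sub(r"\b(\w+) \1\b", r"\1", sentence)

# A backslash turns the metacharacter $ into a literal dollar sign.
prices = re.findall(r"\$\d+", "cost: $15 or $20")

print(greedy, lazy, doubled, fixed, prices)
```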
Regular Expression Learning notes (iii)