Mastering Regular Expressions
This book by Jeffrey E. F. Friedl has been famous for a long time. I purchased a copy from Dangdang in December, but I have never found the time to read it. The main reason, of course, is that my English is poor, and I have not developed the habit of, or the ability for, reading in English.
I regret not having spent more time on English.
The daunting doubled-word search problem at the beginning of the book is very vivid. If I ran into such a requirement myself, I would certainly find it quite distressing.
- Check any number of files, find doubled words (such as "this this"), report each line of each file in which they occur, and highlight them using standard ANSI escape sequences;
- The search must also work across lines, catching a word at the end of one line that is repeated at the beginning of the next (non-empty) line.
- The search must ignore differences in case and treat any amount of whitespace between the words as a single space. Most importantly, one or both of the doubled words may be surrounded by HTML tags! For example, "... it is <B> very </B> very important ..."
Such a requirement sounds annoying, but with regular expressions, everything becomes easy.
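As a rough illustration of the requirements above, here is a minimal Python sketch. It is not the solution from the book: the names `DOUBLED`, `find_doubled`, and `highlight_doubled` are hypothetical, the pattern only treats runs of ASCII letters as words, and reading files and reporting line numbers are left out.

```python
import re

# Hypothetical pattern: a word, then whitespace and/or HTML tags, then the same word.
DOUBLED = re.compile(
    r"\b([a-z]+)"          # a word, captured as group 1
    r"((?:\s|<[^>]+>)+)"   # separator: any mix of whitespace (incl. newlines) and tags
    r"(\1)\b",             # the same word again; IGNORECASE also applies to \1
    re.IGNORECASE,
)

HIGHLIGHT = "\033[7m"  # ANSI escape sequence: reverse video
RESET = "\033[m"       # ANSI escape sequence: back to normal


def find_doubled(text):
    """Return the first word of each doubled pair, lowercased."""
    return [m.group(1).lower() for m in DOUBLED.finditer(text)]


def highlight_doubled(text):
    """Return text with both words of each doubled pair wrapped in ANSI codes."""
    def wrap(m):
        first, separator, second = m.group(1), m.group(2), m.group(3)
        return f"{HIGHLIGHT}{first}{RESET}{separator}{HIGHLIGHT}{second}{RESET}"
    return DOUBLED.sub(wrap, text)


print(find_doubled("... it is <B> very </B> very important ..."))
print(highlight_doubled("at the end of\nof the line"))
```

Because `\s` also matches a newline, passing the whole file contents as one string is enough to catch a word repeated across a line break.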
The following table shows the basic metacharacters:
Metacharacter | Name | Matching Behavior | Remarks |
^ | Caret | Matches the position at the start of a line | |
$ | Dollar sign | Matches the position at the end of a line | |
\< | Backslash and less-than sign | Matches the position at the start of a word | Not all versions of egrep support this feature. |
\> | Backslash and greater-than sign | Matches the position at the end of a word | Not all versions of egrep support this feature. |
. | Dot | Matches any single character | |
[...] | Character class | Matches any one of the characters listed in the square brackets | |
[^...] | Negated character class | Matches any one character not listed in the square brackets | |
| | Or (alternation) | Matches either of the expressions it separates | |
(...) | Parentheses | Used to limit the scope of the "or" symbol | |
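The metacharacters in the table can be tried out with a short Python sketch. This is only an illustration: the sample strings and the helper `matching` are made up for it, and because Python's `re` module does not support `\<` and `\>`, the sketch uses `\b` (a word boundary at either end) as a stand-in.

```python
import re

# Made-up sample lines, one string per "line" as egrep would see them.
lines = ["cat", "concatenate", "the cat sat", "cot", "grey", "gray"]


def matching(pattern, candidates):
    """Return the candidates in which pattern matches somewhere, like egrep."""
    return [s for s in candidates if re.search(pattern, s)]


starts_with_cat = matching(r"^cat", lines)     # ['cat']
ends_with_cat = matching(r"cat$", lines)       # ['cat']
whole_word_cat = matching(r"\bcat\b", lines)   # ['cat', 'the cat sat']
c_any_t = matching(r"c.t", lines)              # ['cat', 'concatenate', 'the cat sat', 'cot']
grey_class = matching(r"gr[ae]y", lines)       # ['grey', 'gray']
grey_or = matching(r"gr(a|e)y", lines)         # ['grey', 'gray']
not_c_first = matching(r"^[^c]", lines)        # ['the cat sat', 'grey', 'gray']

print(whole_word_cat)
```

Note that `gr[ae]y` and `gr(a|e)y` accept the same strings here; the parentheses keep the "or" from applying to the whole expression.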
Note:
- If a metacharacter appears inside a character class (a list of characters enclosed in square brackets), it is no longer a metacharacter. For example, outside square brackets the dot is a metacharacter that represents any character; inside square brackets it represents the dot character itself.
- In character classes and negated character classes, a minus sign (dash) in the first position represents the dash itself; otherwise it denotes a range. In [-a-z0-9], for example, the first dash is a literal dash, the second denotes the range of the 26 lowercase letters from a to z, and the third denotes the range of digits from 0 to 9 in the same way.
- A negated character class still has to match a character. [^x] does not mean "match as long as there is no x" but "match any one character that is not x". The former would match an empty line, but [^x] does not.
- Some versions of egrep support the -i option to perform case-insensitive matching.
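The notes above can be checked with a few lines of Python. This is a sketch using Python's `re` module rather than egrep, with made-up sample strings; `re.IGNORECASE` plays the role that the -i option plays for egrep.

```python
import re

# A dot inside square brackets is a literal dot; outside, it matches any character.
dot_in_class = re.findall(r"[.]", "a.c")    # ['.']
dot_outside = re.findall(r".", "a.c")       # ['a', '.', 'c']

# In [-a-z0-9] the leading dash is literal; the other two dashes denote ranges.
dash_class = re.findall(r"[-a-z0-9]", "A-b_9")   # ['-', 'b', '9']

# [^x] must consume one character that is not x, so it cannot match an empty line.
empty_line = re.search(r"[^x]", "")         # None
only_x = re.search(r"[^x]", "x")            # None
has_other = re.search(r"[^x]", "xy")        # matches 'y'

# Case-insensitive matching, comparable to egrep -i.
ignore_case = re.search(r"friedl", "Jeffrey Friedl", re.IGNORECASE)

print(dash_class, empty_line, has_other.group())
```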