I. Linux text search commands
Before discussing Linux regular expressions, we first introduce three common commands for searching text files in Linux:
1. grep: the earliest text-matching program. It matches text using the Basic Regular Expressions (BRE) defined by POSIX.
2. egrep: extended grep, which matches text using Extended Regular Expressions (ERE).
3. fgrep: fast grep. This version matches fixed strings rather than regular expressions. The original fgrep was also the only variant that could match multiple strings in parallel.
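To make the difference between the three modes concrete, here is a minimal sketch that feeds the same pattern, a+b, through plain grep, grep -E and grep -F. It assumes a POSIX sh and a grep that supports -E and -F (GNU and BSD grep both do); the three sample lines are made up for illustration.

```shell
#!/bin/sh
# Three made-up sample lines, fed to each command through a pipe.
input='ab
aab
a+b'

# grep (BRE): '+' is an ordinary character, so 'a+b' matches the literal text "a+b".
bre=$(printf '%s\n' "$input" | grep 'a+b')

# grep -E (ERE): '+' means "one or more of the preceding item", so 'a+b' matches "ab" and "aab".
ere=$(printf '%s\n' "$input" | grep -E 'a+b')

# grep -F (fixed strings): nothing is special, so 'a+b' is again taken literally.
fixed=$(printf '%s\n' "$input" | grep -F 'a+b')

# grep -F with several -e options searches for all of the fixed strings at once.
several=$(printf '%s\n' "$input" | grep -F -e 'aab' -e 'a+b')

printf 'grep:    %s\n' "$bre"
printf 'grep -E: %s\n' "$ere"
printf 'grep -F: %s\n' "$fixed"
printf 'grep -F (two strings): %s\n' "$several"
```

The same pattern text selects different lines purely because each mode gives + a different meaning.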
The following is a brief introduction to the grep command:
Syntax:
grep [options...] pattern-spec [files...]
Purpose:
Print the text lines that match one or more patterns.
Options:
-E: Match using extended regular expressions. grep -E can replace the traditional egrep command.
-F: Match fixed strings. grep -F can replace the traditional fgrep command.
-e pat-spec: Normally the first non-option argument is taken as the pattern to match. Several patterns can be given at once by enclosing them in single quotes and separating them with newlines.
When a pattern begins with a minus sign, -e marks the following argument as a pattern even though it starts with a minus, so it is not mistaken for an option.
-f pat-file: Read the patterns to match from pat-file.
-i: Ignore case differences during pattern matching.
-l: List the names of files that contain matches instead of printing the matching lines.
-q: Quiet. Nothing is written to standard output; grep exits with a success status if any line matches, and with a failure status otherwise.
-s: Suppress error messages. This is often used together with -q.
-v: Print the lines that do not match the pattern.
Note: Several files can be searched at once. When more than one file is given, each output line is prefixed with the file name and a colon to show which file it came from.
Multiple -e or -f options can be used to build up a list of patterns to search for.
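The options above can be exercised in one short script. This is a sketch under stated assumptions: a POSIX sh, mktemp available, and a grep that follows the POSIX exit-status convention (0 = match, 1 = no match, 2 = error), as GNU and BSD grep do. The files a.txt, b.txt and pats are made up for the demo.

```shell
#!/bin/sh
# Work in a throwaway directory; a.txt, b.txt and pats are made-up demo files.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'Hello World\nhello grep\n-v is not an option here\n' > a.txt
printf 'nothing to see\n' > b.txt
printf 'World\nsee\n' > pats

icase=$(grep -i 'hello' a.txt)               # -i: "Hello World" and "hello grep"
invert=$(grep -v 'hello' a.txt)              # -v: every line without lowercase "hello"
names=$(grep -l 'grep' a.txt b.txt)          # -l: only the file name, a.txt
prefixed=$(grep -e World -e see a.txt b.txt) # two -e patterns, two files: "file:" prefixes
dash=$(grep -e '-v' a.txt)                   # -e: '-v' is a pattern here, not an option
fromfile=$(grep -f pats a.txt)               # -f: patterns read from the file pats

# -q: nothing is printed; only the exit status says whether a line matched.
if grep -q 'grep' a.txt; then quiet=found; else quiet=missing; fi

# -s: no error message for the missing file; the exit status is still 2 (error).
status=0
grep -s 'x' missing.txt || status=$?

printf '%s\n---\n' "$icase" "$invert" "$names" "$prefixed" "$dash" "$fromfile" "$quiet" "$status"
cd / && rm -rf "$tmp"
```

Note how -q and -s are checked through the exit status rather than the output, which is how they are typically used in scripts.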
II. Regular Expressions
1. Components of a regular expression
(1). Ordinary characters: characters with no special meaning.
(2). Special characters (metacharacters): characters that have special meanings in regular expressions.
2. Common metacharacters in regular expressions
(1). Metacharacters in both POSIX BRE and ERE:
\: Usually turns the special meaning of the following character on or off, as in \(...\) and \{...\}.
.: Matches any single character (except NUL).
*: Matches zero or more occurrences of the single character (or subexpression) before it. For example, since . stands for any character, .* matches any number of any characters, including none.
^: Matches the following regular expression at the beginning of a line or string. In BRE it is special only at the beginning of the regular expression; in ERE it is special anywhere.
$: Matches the preceding regular expression at the end of a line or string. In BRE it is special only at the end of the regular expression; in ERE it is special anywhere.
[...]: Matches any one of the characters inside the brackets. A hyphen (-) denotes a range of consecutive characters. If ^ is the first character inside the brackets, the expression matches any character that is not in the list.
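The metacharacters above can be compared side by side with a small sketch. It assumes a POSIX sh and plain grep (BRE); the sample words are made up, and the helper function match is a hypothetical convenience, not part of grep.

```shell
#!/bin/sh
# Made-up sample lines; each variable holds the lines its pattern selects.
lines='cat
cut
ct
coot
scat'
# Hypothetical helper: run plain grep (BRE) with pattern $1 over the sample lines.
match() { printf '%s\n' "$lines" | grep "$1"; }

dot=$(match 'c.t')          # '.': exactly one character between c and t -> cat, cut, scat
star=$(match 'co*t')        # '*': zero or more o's between c and t      -> ct, coot
anchors=$(match '^c.*t$')   # '^' and '$': starts with c, ends with t    -> cat, cut, ct, coot
bracket=$(match '^c[au]t$') # '[au]': exactly one of a or u              -> cat, cut
negated=$(match '^[^c]')    # '[^c]': first character is anything but c  -> scat

printf '%s\n---\n' "$dot" "$star" "$anchors" "$bracket" "$negated"
```

Without ^ and $, c.t also finds "cat" inside "scat", which is why anchoring matters.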
(2) Metacharacters in POSIX BRE only:
\{n,m\}: Interval expression; the single character (or subexpression) before it must repeat a given number of times. \{n\} means exactly n times, \{n,\} at least n times, and \{n,m\} between n and m times.
\(\): Saves the text matched by the enclosed subpattern in a "hold space". Up to nine subpatterns can be saved in a single pattern. For example, \(ab\).*\1 matches ab, followed by any number of characters, followed by ab again.
\n: Replays, at this point in the pattern, the text matched by the nth subpattern enclosed in \( and \) (n is a digit from 1 to 9).
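Interval expressions and backreferences are easiest to see on strings that differ only in the number of repetitions. A minimal sketch, assuming a POSIX sh and plain grep (BRE); the sample lines and the match helper are made up for illustration.

```shell
#!/bin/sh
# Made-up sample lines that differ in how many b's follow the a.
lines='ab
abb
abbb
abbbb
ab-xyz-ab
ab-xyz-ba'
# Hypothetical helper: run plain grep (BRE) with pattern $1 over the sample lines.
match() { printf '%s\n' "$lines" | grep "$1"; }

exact=$(match '^ab\{2\}$')     # \{n\}: exactly 2 b's                  -> abb
between=$(match '^ab\{2,3\}$') # \{n,m\}: 2 to 3 b's                   -> abb, abbb
atleast=$(match '^ab\{3,\}$')  # \{n,\}: 3 or more b's                 -> abbb, abbbb
backref=$(match '\(ab\).*\1')  # \1 replays the text saved by \(ab\)   -> ab-xyz-ab

printf '%s\n---\n' "$exact" "$between" "$atleast" "$backref"
```

The ^ and $ anchors keep the interval from matching a prefix of a longer run of b's.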
(3) Metacharacters in POSIX ERE only:
{n,m}: Same as \{n,m\} in BRE.
+: Matches one or more occurrences of the preceding regular expression.
?: Matches zero or one occurrence of the preceding regular expression.
|: Matches either the regular expression before the | symbol or the one after it.
(): Groups the regular expression enclosed in parentheses.
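The ERE-only operators can be tried with grep -E. The sketch below assumes a POSIX sh and a grep supporting -E; the words and the match helper are made up. The last line shows the BRE spelling of ? as the interval \{0,1\}.

```shell
#!/bin/sh
# Made-up sample lines.
lines='color
colour
colouur
gray
grey
ab
abab'
# Hypothetical helper: run grep -E (ERE) with pattern $1 over the sample lines.
match() { printf '%s\n' "$lines" | grep -E "$1"; }

optional=$(match '^colou?r$') # ?: zero or one u          -> color, colour
oneplus=$(match '^colou+r$')  # +: one or more u's        -> colour, colouur
either=$(match '^gr(a|e)y$')  # (a|e): a or e             -> gray, grey
twice=$(match '^(ab){2}$')    # (ab){2}: the group twice  -> abab

# Plain grep (BRE): \{0,1\} plays the role of ERE '?'.
bre_optional=$(printf '%s\n' "$lines" | grep '^colou\{0,1\}r$')

printf '%s\n---\n' "$optional" "$oneplus" "$either" "$twice" "$bre_optional"
```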
(4) Bracket expressions ([...])
4.1. Character classes [:...:]
The following character classes are supported:
[:alnum:]: alphanumeric characters (letters and digits)
[:digit:]: digits
[:punct:]: punctuation characters
[:alpha:]: letters
[:graph:]: non-space (visible) characters
[:space:]: whitespace characters
[:blank:]: space and tab characters
[:lower:]: lowercase letters
[:upper:]: uppercase letters
[:cntrl:]: control characters
[:print:]: printable characters
[:xdigit:]: hexadecimal digits
4.2. Collating symbols
A sequence of several characters is treated as a single unit. For example, [.ch.] treats ch as one collating element.
4.3. Equivalence classes
Several characters are treated as equivalent. For example, in a French locale [=e=] can match e as well as characters considered equivalent to it, such as é or è; the full set is not listed here.
Note: All three constructs must appear inside the outer square brackets of a bracket expression.
Example: [[:alpha:]!] matches any English letter or an exclamation mark.
[[.ch.]] matches the collating element ch, but does not match a lone c or h.
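Character classes are used inside an outer pair of brackets, exactly as in the [[:alpha:]!] example. The sketch below assumes a POSIX sh and forces the C locale so the classes mean plain ASCII; the sample lines and the match helper are made up. Collating symbols such as [.ch.] are left out because they only work in locales that define such an element.

```shell
#!/bin/sh
# Assumption: the C locale, so every class below means plain ASCII.
LC_ALL=C
export LC_ALL
lines='Hello!
abc
123
0x1F
?'
# Hypothetical helper: run plain grep (BRE) with pattern $1 over the sample lines.
match() { printf '%s\n' "$lines" | grep "$1"; }

letters_bang=$(match '^[[:alpha:]!]*$')       # only letters and '!'          -> Hello!, abc
digits=$(match '^[[:digit:]][[:digit:]]*$')   # one or more digits, nothing else -> 123
hex=$(match '^0x[[:xdigit:]][[:xdigit:]]*$')  # 0x followed by hex digits     -> 0x1F
upper=$(match '[[:upper:]]')                  # contains an uppercase letter  -> Hello!, 0x1F
punct=$(match '^[[:punct:]]$')                # a single punctuation mark     -> ?

printf '%s\n---\n' "$letters_bang" "$digits" "$hex" "$upper" "$punct"
```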
3. Simple regular expression matching examples
china: matches lines containing china anywhere
^china: matches lines that begin with china
china$: matches lines that end with china
^china$: matches lines consisting of exactly the five characters china
[Cc]hina: matches lines containing China or china
ch.na: matches lines containing ch, followed by any single character, followed by na
ch.*na: matches lines containing ch, followed by zero or more characters, followed by na
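All seven patterns can be run against one set of lines to see how they differ. A minimal sketch assuming a POSIX sh and plain grep; the sample lines and the match helper are made up for illustration.

```shell
#!/bin/sh
# Made-up sample lines.
lines='china
china town
made in china
China
ch1na
chna
chiiina'
# Hypothetical helper: run plain grep (BRE) with pattern $1 over the sample lines.
match() { printf '%s\n' "$lines" | grep "$1"; }

anywhere=$(match 'china')  # -> china, china town, made in china
start=$(match '^china')    # -> china, china town
ending=$(match 'china$')   # -> china, made in china
only=$(match '^china$')    # -> china
either=$(match '[Cc]hina') # -> china, china town, made in china, China
one=$(match 'ch.na')       # -> china, china town, made in china, ch1na
many=$(match 'ch.*na')     # -> every line except "China" (grep is case-sensitive)

printf '%s\n---\n' "$anywhere" "$start" "$ending" "$only" "$either" "$one" "$many"
```

Note that ch.*na also matches "chna" (zero characters in between), while ch.na does not.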
III. Examples
Here we use grep to practice BRE and ERE matching. The content of the source file url.txt is as follows:
www.baidu.com
http://www.baidu.com
https://www.baidu.com
http://wwwbaiducom
baidu.com
baidu
1. URL matching
Match lines that start with http or https and contain a dot (.) somewhere after it.
BRE matching:
grep '^https\{0,1\}.*\..*' url.txt
ERE matching:
grep -E '^https?.*\..*' url.txt
The matching result is as follows:
http://www.baidu.com
https://www.baidu.com
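The URL example can be reproduced end to end with the script below, which recreates url.txt and runs both the BRE and ERE forms. It assumes a POSIX sh with mktemp and a grep supporting -E. As an addition of our own, it also shows that these patterns do not actually check for the colon: the made-up line httpx.cn matches the loose pattern but not a variant that requires :// (a variant we introduce here, not from the original).

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Recreate url.txt from the listing above.
printf '%s\n' 'www.baidu.com' 'http://www.baidu.com' 'https://www.baidu.com' \
    'http://wwwbaiducom' 'baidu.com' 'baidu' > url.txt

bre=$(grep '^https\{0,1\}.*\..*' url.txt)  # BRE: \{0,1\} makes the s optional
ere=$(grep -E '^https?.*\..*' url.txt)     # ERE: ? does the same job

# Made-up line with no "://": the loose pattern still accepts it...
loose=$(printf 'httpx.cn\n' | grep -E '^https?.*\..*')
# ...while a variant that requires "://" after http/https rejects it.
strict=$(printf 'httpx.cn\n' | grep -E '^https?://.*\.' || true)

printf 'BRE:\n%s\nERE:\n%s\nloose: [%s] strict: [%s]\n' "$bre" "$ere" "$loose" "$strict"
cd / && rm -rf "$tmp"
```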
2. Email matching
The content of the sample file email.txt is:
hfutwyy@qq.com
aaaa@
aaa@.com
aaa@gmail.com
@baidu.com
Match lines that start with one or more letters, digits, or underscores, followed by @, then zero or more letters or digits, then a dot (.).
grep '^[[:alpha:][:digit:]_]\{1,\}@[[:alpha:][:digit:]]*\..*' email.txt
Matching result:
hfutwyy@qq.com
aaa@.com
aaa@gmail.com
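The email example can be checked with the script below, which recreates email.txt and compares two versions of the pattern. It assumes a POSIX sh with mktemp. The point it illustrates: if the part before @ uses * (zero or more), nothing is required before the @, so @baidu.com also matches; requiring at least one character with \{1,\} gives the three-line result listed above. Both patterns still accept aaa@.com, because the part after @ may be empty.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Recreate email.txt from the listing above.
printf '%s\n' 'hfutwyy@qq.com' 'aaaa@' 'aaa@.com' 'aaa@gmail.com' '@baidu.com' > email.txt

# At least one letter/digit/underscore before @, then letters/digits, then a dot.
strict=$(grep '^[[:alpha:][:digit:]_]\{1,\}@[[:alpha:][:digit:]]*\..*' email.txt)

# Same pattern with * before @: zero characters are allowed, so "@baidu.com" slips in.
loose=$(grep '^[[:alpha:][:digit:]_]*@[[:alpha:][:digit:]]*\..*' email.txt)

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
cd / && rm -rf "$tmp"
```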
That's all for now; more will follow later.