A regular expression is made up of two kinds of characters:
- Ordinary characters: characters with no special meaning, which match themselves;
- Special characters (metacharacters): characters that have a special meaning in a regular expression.
Common metacharacters in regular expressions
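The difference between the two kinds of characters is easy to see with grep. In this minimal sketch, three made-up lines go to grep. Left unescaped, `.` is a metacharacter and matches any single character. Escaped as `\.`, it becomes an ordinary character and matches only a literal dot. The `sample` helper function is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical sample input used only for this illustration.
sample() {
  printf '%s\n' 'abc' 'a.c' 'a-c'
}

# "." is a metacharacter: it matches any single character,
# so all three lines match (abc, a.c, a-c).
sample | grep 'a.c'

# "\." turns off the special meaning, so only the literal dot matches (a.c).
sample | grep 'a\.c'
```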
- Metacharacters shared by POSIX BRE and ERE
  - \ turns the special meaning of the following character on or off, as in \(...\) and \{...\};
  - . matches any single character (except NUL);
  - * matches zero or more occurrences of the single character (or group) before it. For example, since . matches any one character, .* matches any string of any length, including the empty string;
  - ^ anchors the regular expression that follows it to the start of a line. In BRE it is special only at the beginning of the regular expression; in ERE it is special in any position;
  - $ anchors the preceding regular expression to the end of a line. In BRE it is special only at the end of the regular expression; in ERE it is special in any position;
  - [...] matches any one of the characters inside the brackets. A hyphen (-) denotes a range of consecutive characters, and a ^ in the first position inside the brackets negates the list, so it matches any character not in the list;
- Metacharacters only in POSIX BRE
  - \{n,m\}: an interval expression that specifies how many times the single character before it may occur. \{n\} means exactly n times; \{n,m\} means n to m times;
  - \(\): groups a subpattern and saves the text it matches; up to 9 subpatterns can be saved in a single pattern. For example, \(ab\).*\1 matches two occurrences of ab with any number of characters between them;
  - \n: a back-reference that matches the same text that was matched by the nth \(\) subpattern;
- Metacharacters only in POSIX ERE
  - {n,m}: the same as \{n,m\} in BRE, but written without backslashes;
  - + matches one or more occurrences of the preceding regular expression;
  - ? matches zero or one occurrence of the preceding regular expression;
  - | matches either the regular expression before it or the one after it;
  - () groups the enclosed regular expression so that it can be treated as a unit;
- Character classes
  POSIX character classes are written inside a bracket expression, for example [[:digit:]], and include the following:
  - [:alnum:]: alphanumeric characters (letters and digits)
  - [:digit:]: digits
  - [:punct:]: punctuation characters
  - [:alpha:]: alphabetic characters
  - [:graph:]: visible characters (printable characters other than space)
  - [:space:]: whitespace characters (space, tab, newline, and so on)
  - [:blank:]: space and tab characters
  - [:lower:]: lowercase letters
  - [:upper:]: uppercase letters
  - [:cntrl:]: control characters
  - [:print:]: printable characters (including space)
  - [:xdigit:]: hexadecimal digits (0-9, a-f, A-F)
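The following sketch runs the metacharacters and character classes above through grep. It uses BRE by default and ERE with -E. The sample lines and the `sample` helper function are hypothetical, chosen only so that each pattern has something to match. The trailing comments show which lines each command should print.

```shell
#!/bin/sh
# Hypothetical sample input, chosen only to exercise the metacharacters above.
sample() {
  printf '%s\n' 'ab' 'abb' 'abbb' 'ac' 'abXab' '2023' 'cat' 'dog'
}

# Interval: "a" followed by 2 to 3 "b", anchored to the whole line.
# BRE escapes the braces; ERE does not.
sample | grep '^ab\{2,3\}$'          # BRE -> abb, abbb
sample | grep -E '^ab{2,3}$'         # ERE -> abb, abbb

# BRE grouping and back-reference: "ab", anything, then the same "ab" again.
sample | grep '\(ab\).*\1'           # -> abXab

# ERE operators: + (one or more), ? (zero or one), | and () (alternation, grouping).
sample | grep -E '^ab+$'             # -> ab, abb, abbb
sample | grep -E '^ab?$'             # -> ab
sample | grep -E '^(cat|dog)$'       # -> cat, dog

# A character class is used inside a bracket expression: [[:digit:]].
sample | grep '^[[:digit:]]\{4\}$'   # -> 2023
```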
Case One
The contents of the original file Url.txt are as follows:
www.baidu.com
http://www.baidu.com
https://www.baidu.com
http://wwwbaiducom
baidu.com
baidu
Requirement: match lines that begin with http or https, followed by a colon (:), and that contain a dot (.) after the colon.
BRE match:
grep '^https\{0,1\}:.*\..*' Url.txt
ERE match:
grep -E '^https?:.*\..*' Url.txt
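To try Case One end to end, this sketch writes the sample lines to Url.txt in a temporary directory and runs both the BRE and the ERE command. Creating the directory with mktemp -d is an assumption of this sketch; any writable location works. The sample lines are assumed to be lowercase, because grep is case-sensitive and ^https would not match a line starting with Http://.

```shell
#!/bin/sh
# Scratch directory for this sketch (assumption); removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate Url.txt from Case One.
printf '%s\n' \
  'www.baidu.com' \
  'http://www.baidu.com' \
  'https://www.baidu.com' \
  'http://wwwbaiducom' \
  'baidu.com' \
  'baidu' > Url.txt

# BRE: s\{0,1\} makes the "s" optional; \. requires a literal dot after the colon.
grep '^https\{0,1\}:.*\..*' Url.txt

# ERE: s? expresses the same optional "s" without backslashes.
grep -E '^https?:.*\..*' Url.txt
```

Both commands should print http://www.baidu.com and https://www.baidu.com. The line http://wwwbaiducom is rejected because no dot follows its colon.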
Case Two
Email match
The contents of the original file Email.txt are as follows:
[Email protected]
[Email protected]
[Email protected]
[Email protected]
@@baidu.com
Requirement: match lines that begin with one or more letters, digits, or underscores, followed by @, then one or more letters, digits, or underscores, and then a dot (.).
grep '^[[:alpha:][:digit:]_]\{1,\}@[[:alpha:][:digit:]_]\{1,\}\..*' Email.txt
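The original sample addresses in Email.txt are not recoverable, so this sketch uses hypothetical stand-ins written to a temporary directory (mktemp -d is an assumption). The BRE pattern uses \{1,\} rather than * so that at least one character is required on each side of @. With *, a line such as @baidu.com would also match. An ERE equivalent using + is included for comparison.

```shell
#!/bin/sh
# Scratch directory for this sketch (assumption); removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical stand-in addresses: three valid, three invalid.
printf '%s\n' \
  'user_01@example.com' \
  'abc123@test.org' \
  '_tmp@mail.net' \
  'no-at-sign.com' \
  '@@baidu.com' \
  '@baidu.com' > Email.txt

# BRE: one or more letters/digits/underscores, "@",
# one or more letters/digits/underscores, then a literal dot.
grep '^[[:alpha:][:digit:]_]\{1,\}@[[:alpha:][:digit:]_]\{1,\}\..*' Email.txt

# ERE equivalent: + replaces \{1,\}, and [:alnum:] covers [:alpha:][:digit:].
grep -E '^[[:alnum:]_]+@[[:alnum:]_]+\..*' Email.txt
```

Both commands should print the first three addresses. The last three lines are rejected because none of them starts with a letter, digit, or underscore followed by @.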
Note: This article draws on http://www.jb51.net/article/42989.htm