What is a regular expression?
A regular expression describes the features of a piece of text by combining symbols with letters and digits. Some of these characters do not stand for their literal meaning; instead they act as control or pattern characters. Such special characters are called metacharacters.
Why use regular expressions?
Regular expressions can match features of a text segment, such as dates (year, month, and day), blank lines, words, lines of text, and IP addresses. This lets you find files whose contents contain text with those features, and then copy, delete, or replace the matched text or assign it to a variable. Many programming languages support regular expressions; they may use different regular expression engines and slightly different syntax. vi, grep, sed, awk, and other text-processing tools support regular expressions. Here we mainly use grep for demonstration. grep supports basic regular expressions, egrep supports extended regular expressions, and grep -E is equivalent to egrep.
Regular expressions differ from globbing in what they match: regular expressions match text content, while globbing matches file names.
Globbing characters
* matches any string of characters of any length, including an empty one
? matches any single character
[] matches any single character listed inside the square brackets
[abc] matches a, b, or c
[^] matches any single character not listed inside the brackets (the opposite of [])
[^abc] matches any single character other than a, b, or c
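The difference between globbing and regular expressions can be seen by checking one string both ways. The following is a minimal POSIX sh sketch; the file name report.txt is a hypothetical example. The case statement uses glob syntax, where * alone means "any characters", while grep needs .* and an escaped dot to express the same idea as a regular expression.

```shell
#!/bin/sh
# Hypothetical file name used only for illustration.
name="report.txt"

# Globbing: case patterns use glob syntax, so *.txt means
# "any characters, then .txt".
case "$name" in
  *.txt) glob_result="glob matched" ;;
  *)     glob_result="glob did not match" ;;
esac

# Regular expression: . is any character and * repeats the preceding
# item, so the equivalent pattern is .*\.txt$ (the dot before txt is escaped).
if printf '%s\n' "$name" | grep -q '.*\.txt$'; then
  regex_result="regex matched"
else
  regex_result="regex did not match"
fi

echo "$glob_result"
echo "$regex_result"
```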
This section describes how to use regular expressions.
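Regular expressions come in two kinds: basic regular expressions and extended regular expressions.
Basic regular expressions provide several kinds of metacharacters: character matching, matching times (repetition), position anchoring, grouping, and back-references.
\ is the escape character. It changes the meaning of the character that follows it: in a basic regular expression it gives special meaning to characters such as { } ( ) (written \{ \} \( \)), and in front of a metacharacter such as . or * it makes that character match literally.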
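A small sketch of the escape character, assuming a POSIX shell and grep; the three sample lines are made up for illustration. Without a backslash, . matches any character, so a.c matches all three lines; with \. it matches only the literal dot.

```shell
#!/bin/sh
# Three sample lines that differ only in the middle character.
sample='a.c
abc
a-c'

# Unescaped '.' is a metacharacter and matches any single character.
any=$(printf '%s\n' "$sample" | grep -c 'a.c')

# Escaped '\.' loses its special meaning and matches only a literal dot.
literal=$(printf '%s\n' "$sample" | grep -c 'a\.c')

printf 'unescaped: %s line(s), escaped: %s line(s)\n' "$any" "$literal"
```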
Character matching: these metacharacters match characters in the text content.
. matches any single character: a letter, digit, symbol, or space.
[] matches any single character in the specified set or range.
[^] matches any single character that is not in the specified set or range; the ^ negates the set.
[abc] matches a, b, or c. Matching is case-sensitive, so A, B, and C are not matched.
[1-9] matches any single digit from 1 to 9.
[^a-zA-Z0-9] matches any character that is neither a letter nor a digit; ^ negates the set.
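The bracket expressions above can be tried on a few made-up lines. This sketch assumes GNU grep and sets LC_ALL=C so that ranges such as a-z behave as plain ASCII ranges; the helper function count is hypothetical, defined only for this example. Note that [abc]at does not match Cat, because matching is case-sensitive.

```shell
#!/bin/sh
# Use the C locale so that ranges such as a-z are plain ASCII ranges.
export LC_ALL=C

sample='cat
Cat
c7t
c#t'

# Hypothetical helper: count the sample lines that match pattern $1.
count() { printf '%s\n' "$sample" | grep -c "$1" || true; }

dot=$(count 'c.t')               # . : any single character
set_abc=$(count '[abc]at')       # [abc] : a, b or c (case-sensitive, so not Cat)
digit=$(count 'c[1-9]t')         # [1-9] : one digit from 1 to 9
other=$(count 'c[^a-zA-Z0-9]t')  # [^...] : neither a letter nor a digit

printf '%s %s %s %s\n' "$dot" "$set_abc" "$digit" "$other"
```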
Example: grep '^r.' /etc/passwd searches /etc/passwd for lines that start with a lowercase r followed by any single character, as shown below.
[Screenshot: output of the command above]
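To try the ^r. example without depending on the contents of a real /etc/passwd, the sketch below feeds a few hypothetical passwd-style lines to grep. The final line, a lone r, is not matched because . requires one more character after the r.

```shell
#!/bin/sh
# Hypothetical /etc/passwd-style lines; the last line is just "r".
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
r'

# ^r. : a lowercase r at the start of the line, then any one character.
printf '%s\n' "$sample" | grep '^r.'
count=$(printf '%s\n' "$sample" | grep -c '^r.')
```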
[:space:] indicates whitespace characters: space, tab, newline, carriage return, and so on. A class name such as [:space:] is a fixed unit that is used inside a bracket expression, so one whitespace character is written [[:space:]].
[:punct:] indicates all punctuation characters.
[:lower:] indicates all lowercase letters.
[:upper:] indicates all uppercase letters.
[:alpha:] indicates all letters.
[:digit:] indicates all digits.
[:alnum:] indicates all letters and digits.
For example, find the lines in /etc/inittab that start with a letter or digit:
grep '^[[:alnum:]].*' /etc/inittab
[Screenshot: output of the command above]
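The character classes can be compared on hypothetical inittab-style lines (not the contents of a real /etc/inittab). This sketch assumes GNU grep in the C locale, and the count helper is hypothetical; it shows that [[:alnum:]] accepts a leading digit while [[:alpha:]] does not.

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical inittab-style lines.
sample='# comment line
id:3:initdefault:
  indented line
9 starts with a digit'

# Hypothetical helper: count the sample lines that match pattern $1.
count() { printf '%s\n' "$sample" | grep -c "$1" || true; }

alnum=$(count '^[[:alnum:]].*')   # starts with a letter or digit
alpha=$(count '^[[:alpha:]]')     # starts with a letter
punct=$(count '^[[:punct:]]')     # starts with punctuation (#)
space=$(count '^[[:space:]]')     # starts with whitespace

printf 'alnum=%s alpha=%s punct=%s space=%s\n' "$alnum" "$alpha" "$punct" "$space"
```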
Matching times: these metacharacters specify how many times the preceding character may appear.
* matches the preceding character any number of times, including zero.
For example, x*y matches xxy, xy, and y; y is matched because x may appear zero times. Because grep searches for the pattern anywhere in a line, x*y on its own matches every line that contains a y.
.* matches any number of any characters. Together, these two metacharacters are equivalent to * in globbing.
\? matches the preceding character zero times or once; that is, it makes the character optional.
Example: x\?y matches xy and y.
\+ matches the preceding character at least once, with no upper limit.
\{m\} matches the preceding character exactly m times.
For example, x\{2\}y matches xxy. A line containing xxxxy also matches, because grep finds the substring xxy inside it; use ^x\{2\}y$ to match only whole lines.
\{m,n\} matches the preceding character at least m and at most n times; m must not be greater than n.
For example, x\{2,5\}y matches xxy, xxxy, xxxxy, and xxxxxy.
\{m,\} matches the preceding character at least m times.
\{0,m\} matches the preceding character at most m times; it may also not appear at all.
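The repetition metacharacters above can be checked against one set of sample lines. This is a sketch assuming GNU grep, since \? and \+ are GNU extensions to basic regular expressions; the count helper is hypothetical. Most patterns are anchored with ^ and $ so that the count reflects whole-line matches, and the two unanchored ones show how grep also accepts a match anywhere inside a line.

```shell
#!/bin/sh
# Sample lines: different numbers of x before y, plus one line "ay".
sample='y
xy
xxy
xxxy
xxxxxxy
ay'

# Hypothetical helper: count the sample lines that match pattern $1.
count() { printf '%s\n' "$sample" | grep -c "$1" || true; }

star=$(count 'x*y')            # unanchored: any line containing y
star_whole=$(count '^x*y$')    # zero or more x, then y
opt=$(count '^x\?y$')          # \?      : zero or one x (GNU grep)
plus=$(count '^x\+y$')         # \+      : one or more x (GNU grep)
exact=$(count '^x\{2\}y$')     # \{2\}   : exactly two x
exact_sub=$(count 'x\{2\}y')   # unanchored: any line containing xxy
range=$(count '^x\{2,5\}y$')   # \{2,5\} : two to five x
atleast=$(count '^x\{2,\}y$')  # \{2,\}  : two or more x
atmost=$(count '^x\{0,1\}y$')  # \{0,1\} : at most one x
anything=$(count '^a.*$')      # .*      : any characters after a

printf '%s\n' "$star $star_whole $opt $plus $exact $exact_sub $range $atleast $atmost $anything"
```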
For example, find the lines in /etc/passwd that start with r, followed by exactly two o's and then a lowercase letter:
grep '^ro\{2\}[[:lower:]].*' /etc/passwd
[Screenshot: output of the command above]
Position anchoring: these metacharacters specify where in the line or word the pattern must appear.
^ anchors the beginning of the line: the match must start at the beginning of the line. Write it at the far left of the pattern.
$ anchors the end of the line. Write it at the far right of the pattern.
^$ matches an empty line.
\< anchors the beginning of a word; write it to the left of the word.
\> anchors the end of a word; write it to the right of the word.
\b anchors a word boundary and can be written at either the beginning or the end of a word, so it can stand in for either \< or \>.
Here a word is a sequence of letters, digits, and underscores that contains no other special characters.
For example, find the lines in /etc/passwd that contain the word home:
grep '\<home\>' /etc/passwd
Part of the output is shown below.
[Screenshot: output of the command above]
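A sketch of the anchors on hypothetical sample lines, assuming GNU grep (which supports \<, \> and \b); the sample and count helpers are hypothetical. The word-anchored pattern \<home\> matches home in /home/user and my home dir, but not in homework or athome.

```shell
#!/bin/sh
# Hypothetical sample lines, including an empty one.
sample() {
  printf '%s\n' 'home' '/home/user' 'homework' 'athome' '' 'my home dir'
}

# Hypothetical helper: count the sample lines that match pattern $1.
count() { sample | grep -c "$1" || true; }

starts=$(count '^home')      # line begins with home
ends=$(count 'home$')        # line ends with home
empty=$(count '^$')          # empty line
word=$(count '\<home\>')     # home as a whole word
word_b=$(count '\bhome\b')   # same, written with \b (GNU grep)
prefix=$(count '\<home')     # a word that begins with home

printf '%s\n' "$starts $ends $empty $word $word_b $prefix"
sample | grep '\<home\>'
```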
Grouping: the content inside parentheses is treated as a single unit, and a group can be referenced later in the pattern.
\(\) treats the enclosed characters as a single unit.
\(ab\)* matches ab repeated any number of times.
\n, where n is a number, is a back-reference: it matches the same text that the nth group matched. Groups are numbered by their opening parentheses from left to right: the text between the first ( and its matching ) is referenced as \1, the text between the second ( and its matching ) as \2, and so on.
For example, in \(1\(2\(3\)\)\)\(4\)\1\2\3\4, \1 is 123, \2 is 23, \3 is 3, and \4 is 4, so the pattern matches 12341232334.
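The back-reference example can be verified directly. This sketch assumes GNU grep; grep -o prints only the matched part of the line, which confirms that \1 to \4 expand to 123, 23, 3 and 4. It also shows \(ab\)* anchored to the whole line, which accepts an empty line because the group may occur zero times.

```shell
#!/bin/sh
sample='12341232334
12341234'

# Group 1 = 123, group 2 = 23, group 3 = 3, group 4 = 4, so the pattern
# matches 1234 followed by 123, 23, 3 and 4.
matched=$(printf '%s\n' "$sample" | grep -o '\(1\(2\(3\)\)\)\(4\)\1\2\3\4')
echo "$matched"

# \(ab\)* anchored with ^ and $: the whole line must consist of ab's.
repeats=$(printf '%s\n' 'ababab' 'abba' '' | grep -c '^\(ab\)*$' || true)
echo "$repeats"
```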
For example, the text file wukui contains the following content, which we match using back-references:
[Screenshot: contents of the file wukui]
grep '\(1\(2\(3\)\)\)\(4\)\1\2\3\4' wukui
[Screenshot: output of the command above]
grep '\(1\(2\(3\)\)\).*\(4\)\1\2\3\4' wukui
[Screenshot: output of the command above]
For example, find the lines in /etc/passwd whose first word is the same as the last word (that is, the user name is the same as the name of the login shell):
grep '\(^[^[:space:][:punct:]]*\>\).*\1$' /etc/passwd
[Screenshot: output of the command above]
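The same-first-and-last-word search can be tried on hypothetical passwd-style lines. This sketch assumes GNU grep in the C locale. Only the sync and shutdown lines match: in each, the captured first word reappears at the end of the line, whereas root ends in bash and bin ends in nologin.

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical /etc/passwd-style lines.
sample='root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
bin:x:1:1:bin:/bin:/sbin/nologin'

# \(^[^[:space:][:punct:]]*\>\) captures the first word (up to the first
# space or punctuation character such as :), and .*\1$ requires the same
# word at the very end of the line.
printf '%s\n' "$sample" | grep '\(^[^[:space:][:punct:]]*\>\).*\1$'
count=$(printf '%s\n' "$sample" | grep -c '\(^[^[:space:][:punct:]]*\>\).*\1$' || true)
```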
Extended Regular Expression
Extended regular expressions offer a little more than basic ones: they add "or" matching (alternation with |), and the repetition and grouping metacharacters ?, +, {}, and () no longer need a backslash. Only the word anchors and back-references are still written with a backslash, as in basic regular expressions. Use grep -E or egrep for extended regular expressions.
Character matching
. any single character
[] any single character in the specified set or range
[^] any single character outside the specified set or range
Matching times
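* the preceding character any number of times, including zero
? the preceding character zero times or once
+ the preceding character at least once, with no upper limit
{m} the preceding character exactly m times
{m,n} the preceding character at least m and at most n times
{m,} the preceding character at least m times, with no upper limit
{0,n} the preceding character at most n times; it may also not appear at all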
For example, find the lines in /etc/passwd that start with r, followed by exactly two o's and then a lowercase letter:
grep -E '^ro{2}[[:lower:]].*' /etc/passwd
[Screenshot: output of the command above]
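A sketch comparing the basic and extended forms of the same search on hypothetical passwd lines, assuming GNU grep. Both forms match only the root line. The last command shows why -E matters: in a basic regular expression an unescaped {2} is ordinary text, so nothing matches.

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical /etc/passwd-style lines.
sample='root:x:0:0:root:/root:/bin/bash
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
roo:x:1000:1000::/home/roo:/bin/sh'

# Basic regular expression: the interval braces are escaped.
bre=$(printf '%s\n' "$sample" | grep -c '^ro\{2\}[[:lower:]].*' || true)
# Extended regular expression (grep -E): the same braces without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E -c '^ro{2}[[:lower:]].*' || true)
# Without -E, an unescaped {2} is taken literally, so nothing matches here.
literal=$(printf '%s\n' "$sample" | grep -c '^ro{2}' || true)

printf 'BRE=%s ERE=%s literal=%s\n' "$bre" "$ere" "$literal"
```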
Anchoring works the same way as in basic regular expressions.
^ beginning of the line
$ end of the line
\<, \b beginning of a word
\>, \b end of a word
^$ an empty line
^[[:space:]]*$ a blank line: one that is empty or contains only whitespace
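The difference between ^$ and ^[[:space:]]*$ shows up on lines that contain only spaces or tabs. In this sketch (assuming a POSIX shell and GNU grep; the sample lines are made up), ^$ counts only the truly empty line, while ^[[:space:]]*$ also counts the line of spaces and the line holding a tab.

```shell
#!/bin/sh
tab=$(printf '\t')

# An empty line, a line of spaces, a line holding one tab, and a text line.
sample() { printf '%s\n' '' '   ' "$tab" 'text'; }

empty=$(sample | grep -E -c '^$' || true)
blank=$(sample | grep -E -c '^[[:space:]]*$' || true)

printf 'empty=%s blank=%s\n' "$empty" "$blank"
```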
Grouping works the same way as in basic regular expressions, except that the parentheses are no longer escaped with \.
() treats the content inside the parentheses as a single unit
Back-references:
The usage is the same as that of the basic regular expression.
\1, \2, and so on
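Grouping and back-references in extended syntax, as a sketch. Note the assumption: POSIX does not define back-references in extended regular expressions, so \1 to \4 here rely on GNU grep -E, which supports them as an extension. The results are the same as with the escaped basic form.

```shell
#!/bin/sh
# ERE grouping: parentheses without backslashes. Back-references in -E
# patterns are a GNU grep extension, not part of POSIX ERE.
matched=$(printf '%s\n' '12341232334' | grep -E -o '(1(2(3)))(4)\1\2\3\4')

# (ab)* anchored with ^ and $: the whole line must consist of ab's.
repeats=$(printf '%s\n' 'ababab' 'abba' '' | grep -E -c '^(ab)*$' || true)

printf '%s %s\n' "$matched" "$repeats"
```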
Alternation: | matches either the expression before it or the expression after it.
a|b matches a or b.
conc|cat matches conc or cat, because | applies to everything on each side of it. To restrict the alternation to part of the pattern, enclose that part in parentheses: ab(c|a)bc matches abcbc or ababc.
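A sketch of alternation on made-up lines, assuming grep -E. Without parentheses, conc|cat chooses between the two whole words; with parentheses, ab(c|a)bc varies only the third character.

```shell
#!/bin/sh
sample='conc
cat
cot
abcbc
ababc
abbbc'

# | has the lowest precedence: conc|cat means "conc" or "cat".
whole=$(printf '%s\n' "$sample" | grep -E -c 'conc|cat' || true)
# Parentheses limit the alternation to one position: ab, then c or a, then bc.
part=$(printf '%s\n' "$sample" | grep -E -c 'ab(c|a)bc' || true)

printf 'whole=%s part=%s\n' "$whole" "$part"
printf '%s\n' "$sample" | grep -E 'ab(c|a)bc'
```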
For example, find the lines in /etc/passwd that start with r or b:
grep -E '^r|^b' /etc/passwd
[Screenshot: output of the command above]
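A sketch of the r-or-b search on hypothetical passwd lines, assuming GNU grep. With -E the pattern matches root, bin and rpc. With plain grep, | is an ordinary character in a basic regular expression, so the same pattern matches nothing; GNU grep offers \| as a non-standard alternation operator in basic syntax.

```shell
#!/bin/sh
# Hypothetical /etc/passwd-style lines.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin'

ere=$(printf '%s\n' "$sample" | grep -E -c '^r|^b' || true)  # alternation with -E
bre=$(printf '%s\n' "$sample" | grep -c '^r|^b' || true)     # plain grep: | is literal
gnu=$(printf '%s\n' "$sample" | grep -c '^r\|^b' || true)    # GNU BRE: \| is alternation

printf 'ERE=%s BRE=%s GNU-BRE=%s\n' "$ere" "$bre" "$gnu"
```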
If you find any errors, please point them out. Thanks!
This article is from my blog. Please keep this source link: http://wukui.blog.51cto.com/1080241/1437537