(Repost) Regular expressions that sed recognizes directly (sed for beginners)!
Source: Internet
Author: User
When using sed, I was confused by its regular expressions in a few places. To understand them thoroughly, I tried them out one by one...
Here are the results. Corrections are welcome! I hope they also help other beginners avoid the detours I took.
Some of these points are easy to understand and may not seem worth posting. But until I understood them they were really confusing, so I decided to post them anyway.
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
--- The above is from the REGULAR EXPRESSIONS section of man grep.
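To see the difference the man page describes, here is a minimal sketch. It assumes GNU sed, where \+ in a basic regular expression is a GNU extension and -E switches sed to extended syntax. The helper `sample` is hypothetical and only prints three test lines.

```shell
#!/bin/sh
# Assumes GNU sed (\+ in a BRE is a GNU extension; -E selects ERE syntax).
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '1+' '1111' '222'; }

# BRE: a plain + is an ordinary character, so only the literal line "1+" matches.
sample | sed -n '/1+/p'

# BRE with the GNU \+ quantifier: one or more 1s, so "1+" and "1111" match.
sample | sed -n '/1\+/p'

# ERE (-E): + is special without a backslash, giving the same result as \+.
sample | sed -E -n '/1+/p'
```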
Quantifiers in regular expressions:
* Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood".
{n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.
\d matches a digit character. It is equivalent to [0-9].
\D matches a non-digit character. It is equivalent to [^0-9].
\w matches any word character, including the underscore. It is equivalent to '[A-Za-z0-9_]'.
\W matches any non-word character. It is equivalent to '[^A-Za-z0-9_]'.
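The list above uses extended (ERE/Perl-style) notation. As a sketch, assuming GNU sed, here are the same examples in sed's default basic syntax. There, (, ), {, } need a backslash to become special, and \+ and \? are GNU extensions.

```shell
#!/bin/sh
# Assumes GNU sed; each line is one example from the list, in BRE spelling.
printf '%s\n' z zoo | sed 's/zo*/X/'                 # * : zero or more
printf '%s\n' z zo zoo | sed -n '/zo\+/p'            # \+ : one or more (GNU)
printf '%s\n' do does | sed -n 's/do\(es\)\?/[&]/p'  # \( \) group, \? optional
printf '%s\n' Bob food | sed -n '/o\{2\}/p'          # \{n\} : exactly n
printf '%s\n' fooooood | sed 's/o\{1,3\}/X/'         # \{n,m\} : greedy, takes 3
```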
P.S. Tested on Linux 9 with sed-4.0.5-1.
$ cat text
1*
1+
1?
1.
^1
1$
1{}
1{0,}
1{1,}
1{1,10}
1()
1[]
1[a-z]
1[^a-z]
1[a-z$]
1111
222222
1a
1bb
1\\
aaa aa
boy\gire\people
boy3gire5people
boy4gire9people
x|y
aab|cddd
212
21\
t\w
t\W
y\d
y\D
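To follow along, the test file can be recreated with a here-document. This is a sketch. The file name text matches the commands below, and the quoted 'EOF' delimiter stops the shell from interpreting the backslashes and the $ in the lines.

```shell
#!/bin/sh
# Recreate the test file "text" (one test pattern per line).
# The quoted 'EOF' disables expansion and backslash processing.
cat > text <<'EOF'
1*
1+
1?
1.
^1
1$
1{}
1{0,}
1{1,}
1{1,10}
1()
1[]
1[a-z]
1[^a-z]
1[a-z$]
1111
222222
1a
1bb
1\\
aaa aa
boy\gire\people
boy3gire5people
boy4gire9people
x|y
aab|cddd
212
21\
t\w
t\W
y\d
y\D
EOF
wc -l < text
```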
$ cat text | sed -n '/1*/p'
All lines are printed. sed recognizes * directly, and /1*/ also matches zero 1s, so it matches every line, whether or not it contains a 1.
$ cat text | sed -n '/1\*/p'
Only the line 1* is printed. Once escaped, * is an ordinary character.
$ cat text | sed -n '/1+/p'
Only the line 1+ is printed, because sed does not recognize + directly and treats it as an ordinary character.
$ cat text | sed -n '/1\+/p'
All lines containing at least one 1 are printed. Once escaped, \+ becomes the "one or more" quantifier (a GNU extension).
$ cat text | sed -n '/1?/p'
Only the line 1? is printed, because sed does not recognize ? directly and treats it as an ordinary character.
$ cat text | sed -n '/1\?/p'
All lines are printed: /1\?/ matches zero or one 1, so it matches lines with and without a 1.
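Because * and \? can match zero characters, they match every line. A minimal sketch, assuming GNU sed (\? is a GNU extension); `sample` is a hypothetical helper that prints three lines, one of which has no 1 at all.

```shell
#!/bin/sh
# Assumes GNU sed.
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '1*' '1?' '222'; }

sample | sed -n '/1*/p'    # zero or more 1s: all three lines match
sample | sed -n '/1\*/p'   # escaped: only the literal line 1*
sample | sed -n '/1?/p'    # plain ? is literal in a BRE: only 1?
sample | sed -n '/1\?/p'   # zero or one 1: all three lines match
```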
$ cat text | sed -n '/1./p'
Lines in which a 1 is followed by at least one more character are printed, because sed recognizes "." directly. (So ^1, whose only 1 is the last character, is not printed.)
$ cat text | sed -n '/1\./p'
Only the line 1. is printed. Once escaped, "." is an ordinary character.
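Since "." must consume one character, a 1 at the very end of a line cannot match /1./. A small sketch; `sample` is a hypothetical helper.

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '^1' '1.' '12'; }

sample | sed -n '/1./p'    # 1 plus any character: prints 1. and 12, not ^1
sample | sed -n '/1\./p'   # escaped dot: prints only 1.
```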
$ cat text | sed -n '/^1/p'
All lines starting with 1 are printed, because sed recognizes ^ directly.
$ cat text | sed -n '/\^1/p'
The line ^1 is printed. Once escaped, ^ is an ordinary character.
$ cat text | sed -n '/1$/p'
All lines ending with 1 are printed, because sed recognizes $ directly.
$ cat text | sed -n '/1\$/p'
The line 1$ is printed. Once escaped, $ is an ordinary character.
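The four anchor cases side by side, as a sketch on a few invented lines (`sample` is a hypothetical helper):

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '^1' '1$' '11' '21'; }

sample | sed -n '/^1/p'    # starts with 1: 1$ and 11
sample | sed -n '/\^1/p'   # literal caret: ^1
sample | sed -n '/1$/p'    # ends with 1: ^1, 11 and 21
sample | sed -n '/1\$/p'   # literal dollar: 1$
```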
$ cat text | sed -n '/1{}/p'
Only 1{} is printed, because sed does not recognize the braces {} directly.
$ cat text | sed -n '/1{1,}/p'
Same principle.
$ cat text | sed -n '/1{}/p'
Same principle.
$ cat text | sed -n '/1{0,}/p'
Same principle.
$ cat text | sed -n '/1()/p'
Same principle: the parentheses are ordinary characters too.
$ cat text | sed -n '/1\{2,\}/p'
Once escaped, \{ \} act as an interval quantifier in sed, so lines containing two or more consecutive 1s are printed (in this file, only 1111).
$ cat text | sed -n '/1\{0,\}/p'
Same principle as above: every line is printed, with or without a 1.
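The difference between literal braces and the escaped interval, sketched on invented lines (`sample` is a hypothetical helper). The \{m,n\} form is standard POSIX basic-regex syntax.

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '1{2,}' '1' '11' '1111'; }

sample | sed -n '/1{2,}/p'       # plain braces are literal: only 1{2,}
sample | sed -n '/1\{2,\}/p'     # two or more consecutive 1s: 11 and 1111
sample | sed -n '/^1\{1,2\}$/p'  # a whole line of one or two 1s: 1 and 11
```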
$ cat text | sed -n '/1[]/p'
sed reports the error "sed: -e expression #1, char 6: unterminated address regex". sed does recognize [] directly, but a ] that comes right after [ is taken as a literal member of the set rather than as its closing bracket, so the bracket expression is never closed. An empty set cannot be written at all.
$ cat text | sed -n '/1\[\]/p'
1[] is printed. Once escaped, [ and ] are ordinary characters.
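The rule behind that error is that a ] written first inside [ ] belongs to the set. A sketch, assuming GNU sed. The exact error text may differ between sed versions, so only the non-zero exit status is relied on. `sample` is a hypothetical helper.

```shell
#!/bin/sh
# Assumes GNU sed.
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '1[]' '1a' '1]'; }

# An empty set cannot be written: sed rejects the script and exits non-zero.
sample | sed -n '/1[]/p' 2>/dev/null || echo "sed reported an error"

sample | sed -n '/1\[\]/p'   # escaped brackets: the literal line 1[]
sample | sed -n '/1[]]/p'    # ] placed first is a literal member: 1]
sample | sed -n '/1[][]/p'   # set containing ] and [: 1[] and 1]
```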
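$ cat text | sed -n '/1[0-9]/p'
1{1,10}
1111
At first I didn't understand why the first line was printed. Is something wrong with my machine??? No: /1[0-9]/ matches the "10" inside it! (Run against the full test file above, the command also prints 212, because it contains "12".)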
$ cat text | sed -n '/1[^0-9]/p'
All lines in which a 1 is followed by a character that is not a digit are printed.
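A sketch of both bracket expressions on a few lines taken from the test file (`sample` is a hypothetical helper). Note that 212 matches /1[0-9]/ because of its "12".

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' '1{1,10}' '1111' '212' '21\' '1a'; }

sample | sed -n '/1[0-9]/p'    # a 1 followed by a digit: 1{1,10}, 1111, 212
sample | sed -n '/1[^0-9]/p'   # a 1 followed by a non-digit: 1{1,10}, 21\, 1a
```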
$ cat text | sed -n '/boy\gire\people/p'
Nothing is printed. Each backslash in the pattern escapes the letter after it instead of matching a literal backslash, and \g and \p mean nothing special to sed, so no line matches! :D
$ cat text | sed -n '/boy\\gire\\people/p'
The line boy\gire\people is printed. The first \ turns the following \ into an ordinary character.
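The same \\ escape works in both the address and the s command. A sketch (`sample` is a hypothetical helper); single quotes keep the shell from touching the backslashes before sed sees them.

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' 'boy\gire\people' 'boy3gire5people'; }

# \\ in the pattern matches one literal backslash.
sample | sed -n '/boy\\gire\\people/p'

# Replace every backslash with a slash (\/ is an escaped delimiter).
printf '%s\n' 'boy\gire\people' | sed 's/\\/\//g'
```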
$ cat text | sed -n '/boy[0-9]/p'
boy3gire5people
boy4gire9people
However, neither of these prints anything:
$ cat text | sed -n '/boy\d/p'
$ cat text | sed -n '/boy\d\{1\}/p'
If \d were equivalent to [0-9], these would print the same two lines. Why is nothing printed here?
I have not found any tips on using \d, \D, \w and \W. Please advise!
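Where \d is not available, a bracket expression does the job. A sketch, assuming GNU sed; `sample` is a hypothetical helper, and the line boyx is invented to show a non-match.

```shell
#!/bin/sh
# Assumes GNU sed. Digits are matched with a bracket expression, not \d.
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' boy3gire5people boy4gire9people boyx; }

sample | sed -n '/boy[0-9]/p'          # the two boyN... lines
sample | sed -n '/boy[[:digit:]]/p'    # same, with the POSIX class [:digit:]
sample | sed -n '/boy[0-9]\{1\}/p'     # interval form of the \d\{1\} attempt
```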
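After testing, I found that:
\w in sed seems to be equivalent to '[A-Za-z_]' (the GNU sed manual, however, defines \w as any letter, digit or underscore)
\W in sed seems to be equivalent to '[^A-Za-z_]'
\d in sed is equivalent to 'd'
\D in sed is equivalent to 'D'
\s in sed is equivalent to 's'
\S in sed is equivalent to 'S' (on the sed 4.0.5 tested here; later GNU sed releases document \s and \S as whitespace and non-whitespace)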
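For comparison, a sketch of how a current GNU sed treats \w, \W and \s. The assumptions are a recent GNU sed and the C locale. Results on the old sed 4.0.5 tested above may differ.

```shell
#!/bin/sh
# Assumes a recent GNU sed; the C locale keeps the word-character set simple.
export LC_ALL=C
printf '%s\n' 'a1_ -' | sed 's/\w/W/g'   # letters, digits and _ are word characters
printf '%s\n' 'a1_ -' | sed 's/\W/N/g'   # the space and - are non-word characters
printf '%s\n' 'a b'   | sed 's/\s/S/g'   # \s matches whitespace
```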
$ cat text | sed -n '/b|c/p'
The line aab|cddd is printed. sed treats | as an ordinary character.
$ cat text | sed -n '/x|y/p'
The line x|y is printed. sed treats | as an ordinary character.
$ cat text | sed -n '/aab\|212/p'
aab|cddd
212
Once escaped, \| means "or" (a GNU extension), so the lines matching aab or 212 are printed.
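Alternation in its three spellings, as a sketch assuming GNU sed (\| in a basic regex is a GNU extension; with -E a plain | is already "or"). `sample` is a hypothetical helper.

```shell
#!/bin/sh
# Assumes GNU sed.
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' 'x|y' 'aab|cddd' 212 xy; }

sample | sed -n '/x|y/p'         # BRE: | is literal, prints only x|y
sample | sed -n '/aab\|212/p'    # GNU BRE: \| means "or"
sample | sed -E -n '/aab|212/p'  # ERE: a plain | means "or"
```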
Summary:
REs that sed recognizes directly:
* . ^ $ [] \c
\w (word character; documented as [A-Za-z0-9_]) \W (non-word character, [^A-Za-z0-9_])
REs that must be escaped in sed:
+? {} () | <>
Others:
\d (d) \D (D) \s (s) \S (S)
To match a single quote ', enclose the sed script in double quotes.
To match a double quote ", enclose the sed script in single quotes.
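The quoting rule in practice, as a sketch (`sample` is a hypothetical helper). The third line shows the common alternative of closing the single quotes, adding an escaped \', and reopening them. Inside double quotes the shell still expands $ and some backslashes, so the single-quoted forms are safer for longer scripts.

```shell
#!/bin/sh
# sample: hypothetical helper that prints a few test lines.
sample() { printf '%s\n' "it's" 'say "hi"' plain; }

sample | sed -n "/'/p"       # script in double quotes: match a single quote
sample | sed -n '/"/p'       # script in single quotes: match a double quote
sample | sed -n '/'\''/p'    # stay in single quotes: close, \', reopen
```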
\b matches the boundary at the start or end of a word (every character in [^A-Za-z0-9_] forms a word boundary)
\B matches a position that is not a word boundary (every character in [^A-Za-z0-9_] forms a word boundary)
\< matches the position at the start of a word (every character in [^A-Za-z0-9_] forms a word boundary)
\> matches the position at the end of a word (every character in [^A-Za-z0-9_] forms a word boundary)
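A sketch of the four boundary operators, assuming GNU sed (all four are GNU extensions). The sample line is invented.

```shell
#!/bin/sh
# Assumes GNU sed.
line='cat concat cats'
printf '%s\n' "$line" | sed 's/\<cat\>/DOG/g'  # whole word only
printf '%s\n' "$line" | sed 's/\bcat\b/DOG/g'  # same, written with \b
printf '%s\n' "$line" | sed 's/\Bcat/X/g'      # "cat" not at the start of a word
printf '%s\n' "$line" | sed 's/cat\>/X/g'      # "cat" at the end of a word
```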
Decimal character codes are supported, in the form \d followed by two or three decimal digits. For example, \d065 or \d65 represents the character A.
Hexadecimal codes use the form \x followed by two hexadecimal digits. For example, \x61 represents the character a.
Octal codes are supported in the form \o followed by two or three octal digits. For example, \o077 or \o77 represents the character ?, and \o101 represents the character A.
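A sketch of the three numeric escapes in a pattern, assuming GNU sed (\dNNN, \xHH and \oNNN are GNU extensions). The input line Aa is invented.

```shell
#!/bin/sh
# Assumes GNU sed.
printf '%s\n' 'Aa' | sed 's/\d065/X/'   # decimal 65 is A
printf '%s\n' 'Aa' | sed 's/\x61/Y/'    # hexadecimal 61 is a
printf '%s\n' 'Aa' | sed 's/\o101/Z/'   # octal 101 is A
```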