Using grep and egrep regular expressions in Linux


The origin of regular expressions

A regular expression (often abbreviated in programming languages as regex or regexp) is a single string that describes, and matches, a set of strings that follow a given syntax rule.

Regular expressions are commonly used to search for, and to replace, text that matches a pattern.
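To illustrate the difference between retrieving and replacing, the minimal sketch below runs the same anchored pattern through grep (retrieve) and sed (replace). The log lines and the function names sample_log, retrieve_lines and replace_text are hypothetical.

```shell
# Minimal sketch: one pattern, used to retrieve lines (grep) and to replace text (sed).
# The log lines are hypothetical sample data.
sample_log() {
  printf '%s\n' 'error: disk full' 'info: backup done' 'error: timeout'
}

# Retrieve: print only the lines that start with "error"
retrieve_lines() { sample_log | grep '^error'; }

# Replace: turn a leading "error" into "ERROR"; other lines pass through unchanged
replace_text() { sample_log | sed 's/^error/ERROR/'; }

retrieve_lines   # error: disk full, error: timeout
replace_text     # ERROR: disk full, info: backup done, ERROR: timeout
```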

In the 1960s, Ken Thompson, one of the fathers of Unix, brought regular expressions into the QED editor, then into the ed editor, and eventually into grep. Since then, regular expressions have been widely used in the tools of Unix and Unix-like systems, and in languages such as Perl.

Today, regular expressions are available in every mainstream operating system and programming language, and using them fluently has become an essential skill for system administrators and developers.

About grep and egrep

Linux uses the GNU versions of grep and egrep.

The name grep comes from "global search regular expression and print out the line": it searches the input for lines that match a regular expression and prints them.

grep uses regular expressions to search text and prints either the matching lines or, with -o, only the matching parts. The -E option gives it the behavior of egrep (extended regular expressions), and the -P option enables Perl-compatible regular expressions.
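A minimal sketch, assuming GNU grep, of the difference between printing matching lines (the default) and printing only the matches with -o. The sample lines and function names are made up for illustration.

```shell
# Minimal sketch, assuming GNU grep; the input lines are hypothetical.
sample() { printf '%s\n' 'port 22 open' 'port 80 open' 'no ports here'; }

# Default: print every line that contains a digit
matching_rows() { sample | grep '[0-9]'; }

# -o: print only the matched text, one match per output line
matching_text() { sample | grep -o '[0-9][0-9]*'; }

matching_rows    # port 22 open, port 80 open
matching_text    # 22, 80
```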

No.  Name                          Abbreviation
1    Basic Regular Expression      BRE
2    Extended Regular Expression   ERE
3    Perl Regular Expression       PRE
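The sketch below writes "one or more digits" in each of the three flavors listed above. It assumes GNU grep built with PCRE support, since -P is unavailable otherwise. The input string and function names are hypothetical.

```shell
# Minimal sketch, assuming GNU grep built with PCRE support (required for -P).
# The input string is hypothetical.
input='order 1234 shipped'

bre_digits()  { echo "$input" | grep -o  '[0-9]\+'; }  # BRE: "+" must be escaped (GNU extension)
ere_digits()  { echo "$input" | grep -oE '[0-9]+'; }   # ERE: a bare "+" is a quantifier
pcre_digits() { echo "$input" | grep -oP '\d+'; }      # Perl syntax: \d is a digit class

bre_digits; ere_digits; pcre_digits   # each prints 1234
```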
grep and egrep syntax

Both grep and egrep search text files line by line for the given pattern.

1. grep

grep [OPTION]... PATTERN [FILE]...

Basic regular expressions (BRE) are used by default.

With the -E option, extended regular expressions (ERE) are used.

With the -P option, Perl regular expressions are used.
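Because the default is BRE, the same characters can mean different things with and without -E. A minimal sketch, assuming GNU grep. The input lines and function names are made up:

```shell
# Minimal sketch, assuming GNU grep; the input lines are hypothetical.
sample() { printf '%s\n' 'a+b' 'aab'; }

# Default (BRE): an unescaped "+" is an ordinary character, so only "a+b" matches
bre_plus() { sample | grep 'a+b'; }

# -E (ERE): "+" means "one or more of the preceding item", so only "aab" matches
ere_plus() { sample | grep -E 'a+b'; }

bre_plus   # a+b
ere_plus   # aab
```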

2. egrep

egrep [OPTION]... PATTERN [FILE]...

Extended regular expressions are used by default.

With the -P option, Perl regular expressions are used.
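A minimal sketch, assuming GNU grep, showing that egrep and grep -E return the same lines. GNU grep 3.8 and later also print a warning that egrep is obsolescent, so the sketch discards egrep's stderr and only calls egrep if it is installed. The sample words and function names are hypothetical.

```shell
# Minimal sketch, assuming GNU grep; the sample words are hypothetical.
sample() { printf '%s\n' cat dog cow; }

# Extended syntax via grep -E: alternation with a bare "|"
with_grep_e() { sample | grep -E 'cat|dog'; }

# The same search via egrep; stderr is discarded because GNU grep 3.8+
# prints an "egrep is obsolescent" warning there
with_egrep() { sample | egrep 'cat|dog' 2>/dev/null; }

with_grep_e   # cat, dog
if command -v egrep >/dev/null 2>&1; then with_egrep; fi   # cat, dog
```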

grep usage examples

Example 1: Extract the machine's IP addresses

1. ifconfig | grep -o "inet addr:[^[:space:]]\+" | cut -d: -f2

2. ifconfig | grep -o -E "inet addr:[^[:space:]]+" | cut -d: -f2

3. ifconfig | egrep -o "inet addr:[^[:space:]]+" | cut -d: -f2
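Example 1 depends on the older net-tools ifconfig output ("inet addr:192.168.1.10"). Newer ifconfig versions print "inet 192.168.1.10" instead, and many current systems use "ip -4 addr" in place of ifconfig. The sketch below replays the first command against canned output in the old format and shows one possible adaptation for the new format. The interface lines, addresses and function names are made up.

```shell
# Minimal sketch: Example 1 run against canned ifconfig output instead of a live
# interface, so the result is predictable. All addresses below are made up.
old_ifconfig() {   # older net-tools format: "inet addr:A.B.C.D"
  printf '%s\n' \
    'eth0      Link encap:Ethernet  HWaddr 00:0C:29:AA:BB:CC' \
    '          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0' \
    'lo        Link encap:Local Loopback' \
    '          inet addr:127.0.0.1  Mask:255.0.0.0'
}
new_ifconfig() {   # newer net-tools format: "inet A.B.C.D"
  printf '%s\n' \
    'eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500' \
    '        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255' \
    '        inet6 fe80::20c:29ff:feaa:bbcc  prefixlen 64  scopeid 0x20<link>'
}

# First command from Example 1 (BRE), unchanged apart from the input source
ips_old() { old_ifconfig | grep -o "inet addr:[^[:space:]]\+" | cut -d: -f2; }

# Adapted for the newer format: keep "inet <addr>" (not "inet6"), then take field 2
ips_new() { new_ifconfig | grep -oE 'inet [0-9.]+' | cut -d' ' -f2; }

ips_old   # 192.168.1.10, 127.0.0.1
ips_new   # 192.168.1.10
```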

Example 2: Extract function names

1. grep -o "\<[[:alnum:]]\+\>()" /etc/rc.d/init.d/functions

2. egrep -o "\<[[:alnum:]]+\>\(\)" /etc/rc.d/init.d/functions
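A self-contained sketch of Example 2 that reads a short, made-up script instead of /etc/rc.d/init.d/functions, a file found mainly on Red Hat-style systems. It assumes GNU grep for \< and \>. The script lines and function names are hypothetical.

```shell
# Minimal sketch: Example 2 run on a small hypothetical script.
sample_functions() {
  printf '%s\n' \
    'checkpid() {' \
    '    local i' \
    'daemon() {' \
    '    echo "not a function definition"' \
    'killproc() {'
}

# BRE: "\+" is the quantifier and "()" are literal parentheses
names_bre() { sample_functions | grep -o "\<[[:alnum:]]\+\>()"; }

# ERE: "+" is the quantifier, so literal parentheses must be escaped as "\(\)"
names_ere() { sample_functions | grep -oE "\<[[:alnum:]]+\>\(\)"; }

names_bre   # checkpid(), daemon(), killproc()
names_ere   # same output
```

Names that contain underscores are not matched, because [[:alnum:]] does not include "_". A bracket expression such as [[:alnum:]_] would cover them.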

Example 3: Create a text file with the following content:

He like his lover.
He love his lover.
He like his liker.
He love his liker.

Find the lines in which the last word is an earlier word followed by "r".

# cat > 1.txt <<EOF
> He like his lover.
> He love his lover.
> He like his liker.
> He love his liker.
> EOF

1. egrep "(\<[[:alnum:]]+\>).*\1r" 1.txt

2. grep "\(\<[[:alnum:]]\+\>\).*\1r" 1.txt

3. grep "\(\<[[:alnum:]]\{1,\}\>\).*\1r" 1.txt
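To check Example 3 without leaving a 1.txt behind, the sketch below writes the same four lines to a temporary file and runs all three commands. It assumes GNU grep, which accepts the \1 back-reference in both BRE and ERE. All three commands print the second and third lines ("He love his lover." and "He like his liker."). The function names cmd1 to cmd3 are hypothetical.

```shell
# Minimal sketch, assuming GNU grep (its BRE and ERE both accept \1).
# A temporary file stands in for 1.txt and is removed when the shell exits.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
He like his lover.
He love his lover.
He like his liker.
He love his liker.
EOF

# Command 1: ERE (egrep is equivalent to grep -E)
cmd1() { grep -E "(\<[[:alnum:]]+\>).*\1r" "$file"; }
# Command 2: BRE with \+
cmd2() { grep "\(\<[[:alnum:]]\+\>\).*\1r" "$file"; }
# Command 3: BRE with the interval \{1,\} instead of \+
cmd3() { grep "\(\<[[:alnum:]]\{1,\}\>\).*\1r" "$file"; }

cmd1   # He love his lover. / He like his liker.
```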

Regular expression metacharacter comparison table

Each entry gives a metacharacter and its meaning, followed by how it is written in basic (BRE), extended (ERE), Python and Perl regular expressions. The BRE and ERE columns describe GNU grep.

\  Escape character.
    BRE: \   ERE: \   Python: \   Perl: \
^  Matches the beginning of a line, e.g. '^dog' matches lines that begin with "dog" (in awk, '^' matches the beginning of the string).
    BRE: ^   ERE: ^   Python: ^   Perl: ^
$  Matches the end of a line, e.g. 'dog$' matches lines that end with "dog" (in awk, '$' matches the end of the string).
    BRE: $   ERE: $   Python: $   Perl: $
^$  Matches an empty line.
    BRE: ^$   ERE: ^$   Python: ^$   Perl: ^$
^string$  Matches a line consisting only of string, e.g. '^dog$' matches lines that contain just "dog".
    BRE: ^string$   ERE: ^string$   Python: ^string$   Perl: ^string$
\<  Matches the beginning of a word, e.g. '\<frog' (equivalent to '\bfrog') matches words that begin with "frog".
    BRE: \<   ERE: \<   Python: not supported (use \b, e.g. '\bfrog')   Perl: not supported (use \b, e.g. '\bfrog')
\>  Matches the end of a word, e.g. 'frog\>' (equivalent to 'frog\b') matches words that end with "frog".
    BRE: \>   ERE: \>   Python: not supported (use \b, e.g. 'frog\b')   Perl: not supported (use \b, e.g. 'frog\b')
\<x\>  Matches a whole word x, e.g. '\<frog\>' (equivalent to '\bfrog\b') or '\<g\>'.
    BRE: \<x\>   ERE: \<x\>   Python: not supported (use '\bx\b')   Perl: not supported (use '\bx\b', e.g. '\bfrog\b')
()  Groups a subexpression, e.g. '(frog)'.
    BRE: not supported (use \(\), e.g. '\(dog\)')   ERE: ()   Python: ()   Perl: ()
\(\)  Groups a subexpression, e.g. '\(frog\)'.
    BRE: \(\)   ERE: not supported (use ())   Python: not supported (use ())   Perl: not supported (use ())
?  Matches the preceding subexpression 0 or 1 times (equivalent to {0,1}), e.g. 'where(is)?' matches "where" and "whereis".
    BRE: not supported (use \?)   ERE: ?   Python: ?   Perl: ?
\?  Matches the preceding subexpression 0 or 1 times (equivalent to '\{0,1\}'), e.g. 'where\(is\)\?' matches "where" and "whereis".
    BRE: \?   ERE: not supported (use ?)   Python: not supported (use ?)   Perl: not supported (use ?)
?  (placed after *, +, ?, {n}, {n,} or {n,m}) Makes that quantifier non-greedy: it matches as little of the string as possible, whereas the default greedy quantifier matches as much as possible. For "oooo", 'o+?' matches a single "o", while 'o+' matches all four.
    BRE: not supported   ERE: not supported   Python: supported (e.g. +?)   Perl: supported (e.g. +?)
.  Matches any single character except a newline ("\n") (in awk, '.' also matches a newline).
    BRE: .   ERE: .   Python: . (to include "\n", use the re.DOTALL flag or '(?s).')   Perl: . (to include "\n", use the /s modifier or '[\s\S]')
*  Matches the preceding subexpression 0 or more times (equivalent to {0,}), e.g. 'zo*' matches "z" and "zoo".
    BRE: *   ERE: *   Python: *   Perl: *
\+  Matches the preceding subexpression 1 or more times (equivalent to '\{1,\}'), e.g. 'where\(is\)\+' matches "whereis" and "whereisis".
    BRE: \+   ERE: not supported (use +)   Python: not supported (use +)   Perl: not supported (use +)
+  Matches the preceding subexpression 1 or more times (equivalent to {1,}), e.g. 'zo+' matches "zo" and "zoo" but not "z".
    BRE: not supported (use \+)   ERE: +   Python: +   Perl: +
{n}  n is a non-negative integer; matches exactly n times, e.g. 'zo{2}' matches "zooz" but not "Bob".
    BRE: not supported (use \{n\})   ERE: {n}   Python: {n}   Perl: {n}
{n,}  n is a non-negative integer; matches n or more times, e.g. 'go{2,}' matches "good" but not "god".
    BRE: not supported (use \{n,\})   ERE: {n,}   Python: {n,}   Perl: {n,}
{n,m}  n and m are non-negative integers with n <= m; matches at least n and at most m times, e.g. 'o{1,3}' matches the first three o's in "fooooood". There must be no spaces around the comma.
    BRE: not supported (use \{n,m\})   ERE: {n,m}   Python: {n,m}   Perl: {n,m}
x|y  Matches x or y, e.g. 'z|food' matches "z" or "food", and '(z|f)ood' matches "zood" or "food".
    BRE: not supported (use x\|y)   ERE: x|y   Python: x|y   Perl: x|y
[0-9]  Matches any digit from 0 to 9 (ranges must be written in ascending order).
    BRE: [0-9]   ERE: [0-9]   Python: [0-9]   Perl: [0-9]
[xyz]  Character set: matches any one of the listed characters, e.g. '[abc]' matches the "a" in "lay". Metacharacters such as . and * lose their special meaning inside [].
    BRE: [xyz]   ERE: [xyz]   Python: [xyz]   Perl: [xyz]
[^xyz]  Negated character set: matches any character not listed, e.g. '[^abc]' matches the "l" in "lay" (in awk, [^xyz] also matches a newline).
    BRE: [^xyz]   ERE: [^xyz]   Python: [^xyz]   Perl: [^xyz]
[A-Za-z]  Matches any uppercase or lowercase letter (ranges must be written in ascending order).
    BRE: [A-Za-z]   ERE: [A-Za-z]   Python: [A-Za-z]   Perl: [A-Za-z]
[^A-Za-z]  Matches any character except an uppercase or lowercase letter.
    BRE: [^A-Za-z]   ERE: [^A-Za-z]   Python: [^A-Za-z]   Perl: [^A-Za-z]
\d  Matches any digit (equivalent to [0-9]).
    BRE: not supported   ERE: not supported   Python: \d   Perl: \d
\D  Matches any non-digit character (equivalent to [^0-9]).
    BRE: not supported   ERE: not supported   Python: \D   Perl: \D
\S  Matches any non-whitespace character (equivalent to [^ \f\n\r\t\v]).
    BRE: \S   ERE: \S   Python: \S   Perl: \S
\s  Matches any whitespace character, including space, tab and form feed (equivalent to [ \f\n\r\t\v]).
    BRE: \s   ERE: \s   Python: \s   Perl: \s
\W  Matches any non-word character (equivalent to [^A-Za-z0-9_]).
    BRE: \W   ERE: \W   Python: \W   Perl: \W
\w  Matches any word character, including the underscore (equivalent to [A-Za-z0-9_]).
    BRE: \w   ERE: \w   Python: \w   Perl: \w
\B  Matches a non-word boundary, e.g. 'er\B' matches the "er" in "verb" but not the "er" in "never".
    BRE: \B   ERE: \B   Python: \B   Perl: \B
\b  Matches a word boundary, i.e. the position between a word character and a non-word character, e.g. 'er\b' matches the "er" in "never" but not the "er" in "verb".
    BRE: \b   ERE: \b   Python: \b   Perl: \b
\t  Matches a horizontal tab (equivalent to \x09 and \cI).
    BRE: not supported   ERE: not supported   Python: \t   Perl: \t
\v  Matches a vertical tab (equivalent to \x0b and \cK).
    BRE: not supported   ERE: not supported   Python: \v   Perl: \v
\n  Matches a newline (equivalent to \x0a and \cJ).
    BRE: not supported   ERE: not supported   Python: \n   Perl: \n
\f  Matches a form feed (equivalent to \x0c and \cL).
    BRE: not supported   ERE: not supported   Python: \f   Perl: \f
\r  Matches a carriage return (equivalent to \x0d and \cM).
    BRE: not supported   ERE: not supported   Python: \r   Perl: \r
\\  Matches a literal backslash "\".
    BRE: \\   ERE: \\   Python: \\   Perl: \\
\cx  Matches the control character indicated by x, e.g. \cM matches Control-M (a carriage return). x must be a letter A-Z or a-z; otherwise "c" is treated as a literal "c".
    BRE: not supported   ERE: not supported   Python: not supported   Perl: \cx
\xn  Matches the character whose hexadecimal code is n. The escape must be exactly two hex digits long, e.g. '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
    BRE: not supported   ERE: not supported   Python: \xn   Perl: \xn
\num  A back-reference to the text captured by group num, where num is a positive integer, e.g. '\1'.
    BRE: \num   ERE: \num (GNU extension)   Python: \num   Perl: \num
[:alnum:]  Any letter or digit ([A-Za-z0-9]), e.g. '[[:alnum:]]'.
    BRE: [:alnum:]   ERE: [:alnum:]   Python: not supported   Perl: [:alnum:]
[:alpha:]  Any letter ([A-Za-z]), e.g. '[[:alpha:]]'.
    BRE: [:alpha:]   ERE: [:alpha:]   Python: not supported   Perl: [:alpha:]
[:digit:]  Any digit ([0-9]), e.g. '[[:digit:]]'.
    BRE: [:digit:]   ERE: [:digit:]   Python: not supported   Perl: [:digit:]
[:lower:]  Any lowercase letter ([a-z]), e.g. '[[:lower:]]'.
    BRE: [:lower:]   ERE: [:lower:]   Python: not supported   Perl: [:lower:]
[:upper:]  Any uppercase letter ([A-Z]), e.g. '[[:upper:]]'.
    BRE: [:upper:]   ERE: [:upper:]   Python: not supported   Perl: [:upper:]
[:space:]  Any whitespace character, including space and tab, e.g. '[[:space:]]'.
    BRE: [:space:]   ERE: [:space:]   Python: not supported   Perl: [:space:]
[:blank:]  Space and horizontal tab only, e.g. '[[:blank:]]' (equivalent to '[ \t]').
    BRE: [:blank:]   ERE: [:blank:]   Python: not supported   Perl: [:blank:]
[:graph:]  Any visible, printable character (excludes space and control characters such as newline), e.g. '[[:graph:]]'.
    BRE: [:graph:]   ERE: [:graph:]   Python: not supported   Perl: [:graph:]
[:print:]  Any printable character, including space (excludes the [:cntrl:] characters), e.g. '[[:print:]]'.
    BRE: [:print:]   ERE: [:print:]   Python: not supported   Perl: [:print:]
[:cntrl:]  Any control character: ASCII codes 0 to 31 (e.g. newline, tab) and 127 (DEL), e.g. '[[:cntrl:]]'.
    BRE: [:cntrl:]   ERE: [:cntrl:]   Python: not supported   Perl: [:cntrl:]
[:punct:]  Any punctuation character (printable characters not in [:alnum:], [:cntrl:] or [:space:]).
    BRE: [:punct:]   ERE: [:punct:]   Python: not supported   Perl: [:punct:]
[:xdigit:]  Any hexadecimal digit (0-9, a-f, A-F).
    BRE: [:xdigit:]   ERE: [:xdigit:]   Python: not supported   Perl: [:xdigit:]
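Several rows of the table can be tried directly with grep. The sketch below exercises the interval, alternation, word-boundary and non-greedy entries on inputs adapted from the table's own examples. It assumes GNU grep with PCRE support for -P, and the function names are hypothetical.

```shell
# Minimal sketch, assuming GNU grep with PCRE support (needed for -P).

# {n,m}: escaped in BRE, bare in ERE; 'o{1,3}' takes up to three o's per match
interval_bre() { echo fooooood | grep -o 'o\{1,3\}'; }
interval_ere() { echo fooooood | grep -oE 'o{1,3}'; }

# Alternation: \| in BRE (GNU extension), | in ERE
alt_bre() { printf '%s\n' zood food good | grep 'z\|food'; }
alt_ere() { printf '%s\n' zood food good | grep -E '(z|f)ood'; }

# Word boundary \b versus non-word boundary \B
boundary()     { printf '%s\n' never verb | grep 'er\b'; }
non_boundary() { printf '%s\n' never verb | grep 'er\B'; }

# Whole word: GNU \<...\> versus Perl-style \b...\b
word_gnu()  { echo 'frog frogs leapfrog' | grep -o '\<frog\>'; }
word_perl() { echo 'frog frogs leapfrog' | grep -oP '\bfrog\b'; }

# Greedy versus non-greedy quantifier (Perl syntax only)
greedy()     { echo '<b>bold</b>' | grep -oP '<.+>'; }
non_greedy() { echo '<b>bold</b>' | grep -oP '<.+?>'; }

interval_bre   # first match: ooo
alt_bre        # zood, food
boundary       # never
non_boundary   # verb
greedy         # <b>bold</b>
non_greedy     # <b>, </b>
```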

Learning materials

1. Regular Expression 30-Minute Tutorial. An excellent introduction, and the one from which the author originally learned regular expressions.

http://deerchao.net/tutorials/regex/regex.htm

References

1. Wikipedia

2. cnblogs (Blog Park), "College Donkey": Regular expression flavors and shell regular expressions

http://www.cnblogs.com/finallyliuyu/archive/2013/05/27/3101220.html

This article is from the "End of Nanshan" blog. Please keep this source attribution: http://me2xp.blog.51cto.com/6716920/1435095
