The Linux text-processing tool grep and regular expressions (including how egrep differs from grep)


The text-processing tool grep and regular expressions are a common source of confusion when learning Linux. This article shares some experience with both.


grep: Global search REgular expression and Print out the line

Function: a text search tool that checks the target text line by line against a user-specified 'pattern' (the filter condition) and prints the lines that match.

'Pattern': a filter condition written with regular expression metacharacters and ordinary text characters.

grep [OPTIONS] PATTERN [FILE...]

grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

Common options:

-i: ignore character case

-o: show only the part of the line that the pattern matched

-v: show the lines that the pattern does not match

-E: support extended regular expression metacharacters

-q: quiet mode; print nothing and report the result only through the exit status

-A #: after; also show the # lines after each matching line

-B #: before; also show the # lines before each matching line

-C #: context; also show the # lines before and after each matching line

-n: show the line number of each matching line (less used)

-c: count the number of matching lines (less used)
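The options listed above can be tried with a minimal sketch like the following. The sample text and variable names are made up for illustration, and GNU grep is assumed.

```shell
# Hypothetical sample text, fed to grep on standard input.
sample='Apple pie
banana split
apple tart
cherry cake'

ignore_case=$(printf '%s\n' "$sample" | grep -i 'apple')    # Apple and apple lines
only_match=$(printf '%s\n' "$sample" | grep -o 'an')        # each matched string on its own line
inverted=$(printf '%s\n' "$sample" | grep -v 'apple')       # lines without lowercase "apple"
count=$(printf '%s\n' "$sample" | grep -ic 'apple')         # number of matching lines
numbered=$(printf '%s\n' "$sample" | grep -n 'cherry')      # line number, colon, line
after=$(printf '%s\n' "$sample" | grep -A 1 'banana')       # the match plus 1 following line

# -q prints nothing; only the exit status says whether a match was found.
if printf '%s\n' "$sample" | grep -q 'tart'; then found=yes; else found=no; fi

printf '%s\n' "$ignore_case" "$only_match" "$inverted" "$count" "$numbered" "$after" "$found"
```

Because -q communicates only through the exit status, it is mostly useful inside if or && constructs in scripts.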

Here are a few small experiments that demonstrate grep and its options concretely.

Experiment directory: /test; text file: /test/head

# cat head
12345 sdabc Saber Berar bserac12cds67890123451236

[Screenshot: Grep4.gif]


Regular expressions: Regular Expression, regexp

A regular expression is a pattern written with special characters and ordinary text characters. Some of the characters do not stand for their literal meaning; instead they express control or wildcard functions.

There are two classes: basic regular expressions (BRE) and extended regular expressions (ERE).


Basic regular expression metacharacters:

They cover character matching, number of matches, position anchoring, and grouping.


Character matching:

.: matches any single character

[]: matches any single character in the specified set or range

[^]: matches any single character outside the specified set or range

Common character classes: [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]. Each is written inside a bracket expression, for example [[:digit:]].

[Screenshot: Grep-a1.gif]

[Screenshot: Grep-a2.gif]

[Screenshot: Grep-a3.gif]
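As a rough illustration of the character-matching metacharacters, the sketch below runs each one against a small made-up word list (the sample lines and variable names are assumptions, not taken from the screenshots).

```shell
# Hypothetical sample lines; note that "c t" contains a space.
sample='cat
cut
c9t
c t
coat'

any_char=$(printf '%s\n' "$sample" | grep 'c.t')          # . stands for any one character
in_set=$(printf '%s\n' "$sample" | grep 'c[au]t')         # one character from the set a, u
not_in_set=$(printf '%s\n' "$sample" | grep 'c[^au]t')    # one character NOT in the set
digit=$(printf '%s\n' "$sample" | grep 'c[[:digit:]]t')   # class inside a bracket expression
space=$(printf '%s\n' "$sample" | grep 'c[[:space:]]t')

printf '%s\n' "$any_char" "$in_set" "$not_in_set" "$digit" "$space"
```

The line coat never matches, because each pattern allows exactly one character between c and t.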


Number of matches: written after the character whose repetition is being specified, to say how many times that preceding character may occur

*: matches the preceding character any number of times, including 0 times. Matching is greedy: it matches as much as possible.

.*: any characters, of any length

\?: matches the preceding character 0 or 1 times

\+: matches the preceding character at least 1 time

\{m\}: matches the preceding character exactly m times

\{m,n\}: matches the preceding character at least m times and at most n times

\{,n\}: matches the preceding character at most n times

\{m,\}: matches the preceding character at least m times

[Screenshot: Grep-b1.gif]
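The repetition operators can be compared side by side with a small sketch. The sample lines are invented, and GNU grep is assumed, since \? and \+ in basic regular expressions are GNU extensions.

```shell
# Hypothetical sample: "a", then zero to three "b", then "c".
sample='ac
abc
abbc
abbbc'

star=$(printf '%s\n' "$sample" | grep 'ab*c')             # b zero or more times
optional=$(printf '%s\n' "$sample" | grep 'ab\?c')        # b zero or one time
plus=$(printf '%s\n' "$sample" | grep 'ab\+c')            # b one or more times
exact=$(printf '%s\n' "$sample" | grep 'ab\{2\}c')        # b exactly twice
range=$(printf '%s\n' "$sample" | grep 'ab\{2,3\}c')      # b two or three times
at_least=$(printf '%s\n' "$sample" | grep 'ab\{1,\}c')    # b at least once

# Greedy matching: .* extends to the LAST X on the line, not the first.
greedy=$(printf 'aXbXc\n' | grep -o 'a.*X')

printf '%s\n' "$star" "$optional" "$plus" "$exact" "$range" "$at_least" "$greedy"
```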

Position anchoring: pins down where the pattern must appear

^: start-of-line anchor; placed at the leftmost side of the pattern

$: end-of-line anchor; placed at the rightmost side of the pattern

^pattern$: the pattern must match the entire line

^$: empty line

^[[:space:]]*$: empty line, or a line containing only whitespace

Word: a continuous run of letters, digits, and underscores is treated as a word

\< or \b: start-of-word anchor; placed on the left side of the word pattern

\> or \b: end-of-word anchor; placed on the right side of the word pattern

\<pattern\>: matches a complete word

[Screenshot: Grep-c1.gif]

1. Find lines that begin with for

[Screenshot: Grep-c2.gif]

2. Match for only as a whole word; match any content that contains for

[Screenshot: Grep-c6.gif]

3. Find lines that end with for

[Screenshot: Grep-c5.gif]
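The three searches for the word for can be sketched as follows. The sample lines are hypothetical and are not the contents of the screenshots; GNU grep is assumed for \< and \>.

```shell
# Hypothetical sample lines containing "for" in different positions.
sample='for loop
wait for
format this
before'

starts=$(printf '%s\n' "$sample" | grep '^for')          # line begins with for
ends=$(printf '%s\n' "$sample" | grep 'for$')            # line ends with for
whole_word=$(printf '%s\n' "$sample" | grep '\<for\>')   # for as a complete word
contains=$(printf '%s\n' "$sample" | grep 'for')         # for anywhere, even inside a word

# Count empty or whitespace-only lines among: "a", "", "  ", "b".
blank_count=$(printf 'a\n\n  \nb\n' | grep -c '^[[:space:]]*$')

printf '%s\n' "$starts" "$ends" "$whole_word" "$contains" "$blank_count"
```

Note that ^for also selects format this, while \<for\> does not, which is the difference between a line anchor and a word anchor.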


Grouping: \(\): binds one or more characters together so they are handled as a unit, for example \(root\)\+

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regular expression engine in internal variables, named \1, \2, \3, ...

\1: the characters matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis

Example: \(string1\+\(string2\)*\)

\1: string1\+\(string2\)*

\2: string2

Back-reference: refers to the characters matched by the pattern in an earlier pair of grouping parentheses (not to the pattern itself)

[Screenshot: Grep-d1.gif]

The command above searches for the string for followed by any one character, occurring twice in a row. The \1 repeats exactly the text that the first pair of parentheses matched.
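The point that a back-reference repeats the matched text, not the pattern, can be checked with a small sketch. The pattern \(for.\)\1 is a guess at the kind of command described, and the sample lines are invented.

```shell
# Hypothetical sample lines.
sample='for1for1 twice
for1for2 differs
forxforx again'

# \1 must equal the exact text the group matched (for1 ... for1).
same_text=$(printf '%s\n' "$sample" | grep '\(for.\)\1')

# Writing the pattern twice only requires the same shape (for1 ... for2 is fine).
same_pattern=$(printf '%s\n' "$sample" | grep 'for.for.')

printf '%s\n' "$same_text" "$same_pattern"
```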


egrep = grep -E

egrep [OPTIONS] PATTERN [FILE...]

Extended regular expression metacharacters:


Character matching: same as basic regular expressions


Number of matches:

*: matches the preceding character any number of times

?: 0 or 1 times

+: 1 or more times

{m}: exactly m times

{m,n}: at least m times and at most n times


Position anchoring: same as basic regular expressions


Grouping:

()

Back-reference: \1, \2, ...


Or:

a|b

C|cat: C or cat

(C|c)at: Cat or cat
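A short sketch of the difference in practice: the same search in basic and extended syntax, plus the effect of grouping on the "or" operator. The sample lines are hypothetical, and GNU grep is assumed.

```shell
# Hypothetical sample lines.
sample='Cat
cat
C
bat
caat'

bre_plus=$(printf '%s\n' "$sample" | grep 'ca\+t')        # basic syntax needs the backslash
ere_plus=$(printf '%s\n' "$sample" | grep -E 'ca+t')      # extended syntax does not
braces=$(printf '%s\n' "$sample" | grep -E 'ca{2}t')      # {m} without backslashes
alt_loose=$(printf '%s\n' "$sample" | grep -E 'C|cat')    # "C" OR "cat"
alt_group=$(printf '%s\n' "$sample" | grep -E '(C|c)at')  # "Cat" OR "cat"

printf '%s\n' "$bre_plus" "$ere_plus" "$braces" "$alt_loose" "$alt_group"
```

Without the parentheses, | splits the whole pattern, so C|cat also selects the line that contains only C.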


Finally, here are 9 examples that show how grep combines with regular expressions in practice.

1. Display the lines in /proc/meminfo that begin with an uppercase or lowercase s

[Screenshot: Grep-f1.gif]

This is easy to solve once you know grep's -i option.
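One way this could look, run against a made-up excerpt in the style of /proc/meminfo rather than the real file (the values are invented):

```shell
# Hypothetical excerpt; on a real system, pass /proc/meminfo as the file argument.
meminfo='MemTotal:       16314512 kB
SwapCached:            0 kB
Shmem:            123456 kB
Slab:             234567 kB
Buffers:          345678 kB'

with_i=$(printf '%s\n' "$meminfo" | grep -i '^s')        # -i makes s match S as well
with_set=$(printf '%s\n' "$meminfo" | grep '^[Ss]')      # equivalent, without -i

printf '%s\n' "$with_i"
```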


2. Display the lines in /etc/passwd that do not end with /bin/bash

[Screenshot: Grep-f2.gif]

First write the pattern that matches lines ending with "/bin/bash", then use grep's -v option to select the lines that pattern does not match. This is similar to taking the complement of a set in mathematics.
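A minimal sketch of the complement idea, using invented passwd-style lines instead of the real /etc/passwd:

```shell
# Hypothetical passwd-style entries.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync'

# The $ anchor ties /bin/bash to the end of the line; -v inverts the selection.
non_bash=$(printf '%s\n' "$passwd_sample" | grep -v '/bin/bash$')

printf '%s\n' "$non_bash"
```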


3. Find all of the machine's IPv4 addresses in the output of the ifconfig command

[Screenshot: Grep-f3.gif]

This takes three steps:

1) Use grep to select the lines that contain an IPv4 address. Inspecting the ifconfig output shows that every line with an IPv4 address begins with the word inet, so it is enough to search for that word.

2) Use tr to replace every space with ":" and squeeze repeated ones

3) Use cut to extract the result.
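The three steps above can be sketched as a single pipeline. The ifconfig output below is a made-up sample in the newer "inet ADDRESS" layout; older ifconfig versions print "inet addr:ADDRESS", which would need a different cut field, so treat the field number as an assumption.

```shell
# Hypothetical ifconfig output (newer net-tools layout).
ifconfig_sample='eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'

# 1) keep lines with the whole word inet (inet6 is excluded by \>)
# 2) turn each run of spaces into a single ":"
# 3) the leading ":" makes field 1 empty, field 2 "inet", field 3 the address
ipv4=$(printf '%s\n' "$ifconfig_sample" | grep '\<inet\>' | tr -s ' ' ':' | cut -d: -f3)

printf '%s\n' "$ipv4"
```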


4. Find the maximum percentage of partition space usage

[Screenshot: Grep-f41.gif]

[Screenshot: Grep-f42.gif]

This takes roughly 6 steps:

1) Filter out the Chinese characters

2) Use tr to replace every space with ":" and squeeze repeated ones

3) Use cut to cut out the field that holds the usage percentage

4) Use tr again to delete the %

5) Sort numerically

6) Use tail to take the maximum value
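The six steps above might be chained roughly as follows. The df output is a made-up English-locale sample, so step 1 here drops the header line with grep -v instead of filtering Chinese characters; that substitution is an assumption.

```shell
# Hypothetical df output (English locale).
df_sample='Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       51475068 9876543  39000000  21% /
tmpfs            1932540       0   1932540   0% /dev/shm
/dev/sda2        1038336  195000    843336  19% /boot'

max_use=$(printf '%s\n' "$df_sample" \
    | grep -v '^Filesystem' \
    | tr -s ' ' ':' \
    | cut -d: -f5 \
    | tr -d '%' \
    | sort -n \
    | tail -n 1)
# Stages in order: drop header, squeeze spaces to ":", take the Use% field,
# delete "%", sort numerically, keep the last (largest) value.

printf '%s\n' "$max_use"
```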


5. Show the default shell of the user rpc

[Screenshot: Grep-f5.gif]

The search condition above is the line that begins with rpc, from which the characters at the end of the line are taken.
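A possible version of this search, sketched against invented passwd-style lines; the exact command in the screenshot is not recoverable, so both variants below are assumptions.

```shell
# Hypothetical passwd-style entries; rpcuser must NOT be selected.
passwd_sample='rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/bin/false
root:x:0:0:root:/root:/bin/bash'

# ^rpc\> anchors the whole word rpc at the start of the line;
# [^:]*$ then keeps only the run of non-colon characters at the end.
rpc_shell=$(printf '%s\n' "$passwd_sample" | grep '^rpc\>' | grep -o '[^:]*$')

# The same field can also be taken by position with cut.
rpc_shell_cut=$(printf '%s\n' "$passwd_sample" | grep '^rpc\>' | cut -d: -f7)

printf '%s\n' "$rpc_shell" "$rpc_shell_cut"
```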


6. Find the two-digit or three-digit numbers in /etc/passwd

[Screenshot: Grep-f6.gif]

An extended regular expression is used here because it is more concise than the basic form.

Note that a two-digit or three-digit number must be anchored at both its beginning and its end; otherwise part of a longer number would also match.
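The effect of the word anchors is easy to see in a small sketch with invented passwd-style lines, comparing the anchored and unanchored patterns:

```shell
# Hypothetical passwd-style entries.
passwd_sample='root:x:0:0:root:/root:/bin/bash
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# Anchored: only complete numbers of two or three digits.
anchored=$(printf '%s\n' "$passwd_sample" | grep -oE '\<[0-9]{2,3}\>')

# Unanchored: also cuts "100" out of the four-digit 1000.
unanchored=$(printf '%s\n' "$passwd_sample" | grep -oE '[0-9]{2,3}')

printf '%s\n' "$anchored" "$unanchored"
```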


7. Find the lines in /etc/rc.d/init.d/functions that begin with a word (underscores included) followed by a pair of parentheses

[Screenshot: Grep-f7.gif]

When writing the filter condition, anchor both the beginning of the line and the end of the word; otherwise the condition is not strict enough.

Note the .*\>\(\) here: rewritten as .*\(\)\> it fails. The \> anchor marks the end of a word, and a closing parenthesis is not a word character, so no word can end immediately after the parentheses.
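The placement of \> can be tested with a sketch like this one. The sample lines only imitate the style of a shell function library, and the pattern uses a character class instead of .* to keep the match tight; both are assumptions rather than the command from the screenshot. In extended syntax, \( and \) match literal parentheses.

```shell
# Hypothetical lines in the style of a shell function library.
functions_sample='checkpid() {
__pids_var_run() {
    echo "not a definition"
# strstr() is documented here'

# Word at line start, end-of-word anchor, then literal ().
defs=$(printf '%s\n' "$functions_sample" | grep -oE '^[[:alnum:]_]+\>\(\)')

# With \> moved after the parentheses nothing matches: ")" cannot end a word.
misplaced=$(printf '%s\n' "$functions_sample" | grep -cE '^[[:alnum:]_]+\(\)\>')

printf '%s\n' "$defs" "$misplaced"
```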


8. Use egrep to extract the base name of /etc/rc.d/init.d/functions

[Screenshot: Grep-f81.gif]

[Screenshot: Grep-f82.gif]

There are two methods: one uses grep to match the base name directly, and the other splits the path into parts first. Each has its own characteristics.
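Since the original commands are only available as screenshots, the following is one plausible sketch of the direct-match approach, with a second variant that tolerates a trailing slash. Both patterns are assumptions, not the author's exact commands.

```shell
path='/etc/rc.d/init.d/functions'

# The last run of non-slash characters on the line is the base name.
base=$(printf '%s\n' "$path" | grep -Eo '[^/]+$')

# If the path may end with "/", allow one optional slash and cut it off afterwards.
base_slash=$(printf '%s\n' "$path/" | grep -Eo '[^/]+/?$' | cut -d/ -f1)

printf '%s\n' "$base" "$base_slash"
```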


9. Use extended regular expressions to represent 0-9, 10-99, 100-199, 200-249, and 250-255, respectively

\<[0-9]\>: 0-9

\<[1-9][0-9]\>: 10-99

\<1[0-9][0-9]\> or \<1[0-9]{2}\>: 100-199

\<2[0-4][0-9]\>: 200-249

\<25[0-5]\>: 250-255
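The five ranges can be joined with | into one pattern for a number from 0 to 255, as in the sketch below. The helper name is_octet and the test values are hypothetical; GNU grep is assumed for \< and \>.

```shell
# The five ranges combined into a single alternation inside one pair of word anchors.
octet='\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'

# Hypothetical helper: succeeds if the argument contains a number from 0 to 255.
is_octet() {
    printf '%s\n' "$1" | grep -Eq "$octet"
}

accepted=''
for n in 0 9 10 99 100 199 200 249 250 255 256 300 999; do
    if is_octet "$n"; then
        accepted="$accepted $n"
    fi
done

printf 'accepted:%s\n' "$accepted"
```

The values 256, 300, and 999 are rejected because the word anchors prevent a shorter alternative from matching only part of the number.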


This article is from "Zhang Fan-it's fantasy drifting" blog, please be sure to keep this source http://chawan.blog.51cto.com/9179874/1835426
