Getting Started with Linux: Text Processing with grep, One of the Three Musketeers

Working with Linux means processing large numbers of text files, which follows from the Linux philosophy that everything is a file. For that reason, Linux ships with three built-in text-processing tools, nicknamed the Three Musketeers: grep, sed, and awk. Each has a different focus; this article concentrates on grep, the text filtering tool.

1. Use of the grep command

grep [OPTIONS] PATTERN [FILE...]

grep root /etc/passwd

Common grep options:

--color=auto: Highlight the matched text in color (on by default on the author's release 7 system, off on release 6)

-i: Ignore character case

-n: Show the line number of each matching line

-o: Show only the matched string

-q: Quiet mode; prints nothing and reports the result only through the exit status

-e: Specify a pattern; several -e options are combined as a logical OR

grep -e 'cat' -e 'dog' file

-w: Match whole words only
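
The options listed above can be tried on a few made-up lines instead of a real /etc/passwd. This is a minimal sketch; the sample function and the variable names are illustrative assumptions, not part of the original article.

```shell
# Hypothetical sample input, printed by a helper so no file is needed.
sample() { printf '%s\n' 'root:x:0:0' 'Root of the tree' 'chroot jail' 'daemon:x:1:1'; }

# -i ignores case, -n prefixes each matching line with its line number.
with_numbers=$(sample | grep -in 'root')

# -w accepts whole words only, so "chroot" is skipped; -o prints just the match.
whole_words=$(sample | grep -iow 'root')

# -e may be repeated; a line matching either pattern is selected.
either=$(sample | grep -e 'daemon' -e 'jail')

# -q prints nothing; only the exit status says whether a match exists.
if sample | grep -q 'daemon'; then found=yes; else found=no; fi

printf '%s\n' "$with_numbers" "$whole_words" "$either" "$found"
```

Because -q reports through the exit status, it fits naturally in an if condition, as in the last step.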

2. Basic Regular Expressions

No discussion of grep is complete without regular expressions. They come in two flavors, basic and extended, and many of the tools used later support them as well, including vim, sed, awk, and Python. Regular expression metacharacters fall into four classes: character matching, number of matches, position anchoring, and grouping.

Character matching:

. matches any single character

[] matches any single character within the specified set

[^] matches any single character outside the specified set

[:alnum:] letters and digits

[:alpha:] any letter, uppercase or lowercase, i.e. A-Z and a-z

[:lower:] lowercase letters [:upper:] uppercase letters

[:blank:] horizontal whitespace (space and tab)

[:space:] horizontal and vertical whitespace (a wider set than [:blank:])

[:cntrl:] non-printable control characters (backspace, delete, bell, ...)

[:digit:] decimal digits [:xdigit:] hexadecimal digits

[:graph:] printable non-whitespace characters

[:print:] printable characters

[:punct:] punctuation
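
The character classes above are written inside a bracket expression, so the usable form is [[:digit:]] rather than [:digit:] on its own. A short sketch, assuming a made-up input line; the variable names are illustrative only.

```shell
# Hypothetical input line chosen to contain digits, capitals and punctuation.
line='User42 logged in at 09:15, exit code 0x1F!'

# grep -o prints each match on its own line; tr joins them back together.
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]' | tr -d '\n')
uppers=$(printf '%s\n' "$line" | grep -o '[[:upper:]]' | tr -d '\n')
puncts=$(printf '%s\n' "$line" | grep -o '[[:punct:]]' | tr -d '\n')

# [^...] negates the set: anything that is neither alphanumeric nor whitespace.
others=$(printf '%s\n' "$line" | grep -o '[^[:alnum:][:space:]]' | tr -d '\n')

printf 'digits=%s uppers=%s puncts=%s others=%s\n' "$digits" "$uppers" "$puncts" "$others"
```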

Number of matches:

* matches the preceding character any number of times, including 0

.* matches any characters of any length

\? matches the preceding character 0 or 1 times

\+ matches the preceding character at least 1 time

\{n\} matches the preceding character exactly n times

\{m,n\} matches the preceding character at least m and at most n times

\{,n\} matches the preceding character at most n times

\{n,\} matches the preceding character at least n times
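
To see how the repetition operators differ, the sketch below runs each one against the same made-up word list. It assumes GNU grep, since \? and \+ are GNU extensions to POSIX basic regular expressions. The -x option, which is not covered above, requires the whole line to match; xargs is used only to join the results onto one line.

```shell
# Hypothetical word list: "g", then zero to four "o", then "d".
words() { printf '%s\n' 'gd' 'god' 'good' 'goood' 'gooood'; }

star=$(words | grep -x 'go*d' | xargs)            # zero or more "o"
optional=$(words | grep -x 'go\?d' | xargs)       # zero or one "o"
plus=$(words | grep -x 'go\+d' | xargs)           # one or more "o"
exact=$(words | grep -x 'go\{2\}d' | xargs)       # exactly two "o"
range=$(words | grep -x 'go\{2,3\}d' | xargs)     # two or three "o"
at_least=$(words | grep -x 'go\{3,\}d' | xargs)   # three or more "o"

printf '%s\n' "*: $star" "\\?: $optional" "\\+: $plus" \
    "{2}: $exact" "{2,3}: $range" "{3,}: $at_least"
```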

Position anchoring:

^ anchors the beginning of the line; placed at the far left of the pattern

$ anchors the end of the line; placed at the far right of the pattern

^pattern$ the pattern must match the entire line

^$ empty line

^[[:space:]]*$ blank line (empty or whitespace only)

\< or \b anchors the beginning of a word; placed on the left side of the word pattern

\> or \b anchors the end of a word; placed on the right side of the word pattern

\<pattern\> matches a whole word
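
The anchors above can be compared by counting how many lines each pattern selects. This is a sketch on made-up input; the -c option (print a count of matching lines instead of the lines themselves) is not covered in the article, and \< and \> assume GNU grep.

```shell
# Hypothetical input: includes one empty line and one line of three spaces.
text() { printf '%s\n' 'cat' 'concatenate' '' '   ' 'the cat sat' 'cat food'; }

starts=$(text | grep -c '^cat')            # lines beginning with "cat"
ends=$(text | grep -c 'cat$')              # lines ending with "cat"
whole_line=$(text | grep -c '^cat$')       # lines that are exactly "cat"
empty=$(text | grep -c '^$')               # truly empty lines
blank=$(text | grep -c '^[[:space:]]*$')   # empty or whitespace-only lines
whole_word=$(text | grep -c '\<cat\>')     # "cat" as a separate word

echo "starts=$starts ends=$ends whole_line=$whole_line empty=$empty blank=$blank whole_word=$whole_word"
# prints: starts=2 ends=1 whole_line=1 empty=1 blank=2 whole_word=3
```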

Grouping:

\(\) binds one or more characters together so they are treated as a unit

The text matched by each group is saved by the regular expression engine in internal variables named \1, \2, \3, ...

\1 is the text matched by the pattern between the first opening parenthesis (counting from the left) and its matching closing parenthesis

Example: cat text | grep -o "\(cat\).*\1\b"

Back reference: refers to the text that the earlier group matched, not to the pattern itself

Or: \|

Example:

cat text | grep "cat\|dog"
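
The difference between repeating a pattern and repeating the matched text shows up when a group contains an alternation. The following sketch uses made-up lines and assumes GNU grep, where \| is available in basic regular expressions.

```shell
# Hypothetical input lines.
lines() { printf '%s\n' 'cat chases cat' 'cat chases dog' 'dog chases dog' 'bird sings'; }

# \(cat\|dog\) captures whichever animal matched; \1 must repeat that same text,
# so "cat chases dog" is rejected.
same_animal=$(lines | grep '\(cat\|dog\).*\1')

# Plain alternation: any line containing either word is counted.
any_animal=$(lines | grep -c 'cat\|dog')

printf '%s\n' "$same_animal" "lines with cat or dog: $any_animal"
```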

3. Basic Regular Expression Exercises

1. Use df and grep to extract the usage percentage of each disk partition and sort the results from largest to smallest

df | grep -o "[[:digit:]]\+%" | sort -nr

2. Find the two-digit or three-digit numbers in /etc/passwd

cat /etc/passwd | grep -o "\b[0-9]\{2,3\}\b"

3. Take the string 'Welcome to magedu Linux', deduplicate and sort its characters, and list the most frequent characters first

echo 'Welcome to magedu Linux' | grep -o '[^[:space:]]' | sort | uniq -c | sort -nr

4. Extended Regular Expressions

The exercises above show that basic regular expressions need a lot of backslashes. A pattern is clear while you are writing it, but hard to read afterwards. Extended regular expressions address this. They are used through egrep or grep -E, which follow the same rules; the syntax needs far fewer backslashes and is easier to read.

Character Matching:

. Any single character

[] Specify the range of characters

[^] characters not in the specified range

Number of matches:

*: the preceding character any number of times

?: 0 or 1 times

+: 1 or more times

{m}: exactly m times

{m,n}: at least m and at most n times

Position anchoring:

^: beginning of the line

$: end of the line

\<, \b: beginning of a word

\>, \b: end of a word

Grouping:

()

Back reference: \1, \2, ...

Or: |

Summary: in extended regular expressions, only back references (\1, \2, \3), the word-start anchors (\<, \b), and the word-end anchors (\>, \b) still need a backslash; everything else is written without one.
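
As a rough check of that summary, the sketch below writes the same two patterns in both flavors and compares the results. The input is made up, and GNU grep is assumed for \|, \< and \> in the basic form. It uses grep -E rather than egrep, which recent GNU grep releases flag as obsolescent.

```shell
# Hypothetical input lines.
input() { printf '%s\n' 'aa bb' 'abab' 'ab' 'xyz'; }

# Grouping, a repeat count and alternation: backslashes in basic, none in extended.
bre=$(input | grep    '\(ab\)\{2\}\|xyz')
ere=$(input | grep -E '(ab){2}|xyz')

# Back references and word anchors keep their backslash in both flavors.
bre_ref=$(input | grep    '\<\(.\)\1\>')
ere_ref=$(input | grep -E '\<(.)\1\>')

[ "$bre" = "$ere" ] && echo "grouping/alternation: same result"
[ "$bre_ref" = "$ere_ref" ] && echo "back reference: same result"
```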

5. Extended Regular Expression Exercises

1. Find the lines in /etc/rc.d/init.d/functions that begin with a word (underscores included) followed by a pair of parentheses

cat /etc/rc.d/init.d/functions | egrep "^.*\>\(\)"

2. Use egrep to extract the base name of /etc/rc.d/init.d/functions

echo /etc/rc.d/init.d/functions/ | egrep -o "[^/]+/?$"

3. Use egrep to extract the directory name of the path above

echo "/etc/rc.d/init.d/functions" | egrep -o ".*/." | egrep -o ".*/"
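
One way to sanity-check exercises 2 and 3 is to compare the grep results with the standard basename and dirname utilities. The article itself does not make this comparison; the sketch is only illustrative and uses grep -E in place of egrep.

```shell
path='/etc/rc.d/init.d/functions'

# Last path component: non-slash characters, an optional trailing slash, end of line.
base=$(printf '%s\n' "$path" | grep -Eo '[^/]+/?$')

# First keep everything up to one character past the last slash, then trim to the slash.
dir=$(printf '%s\n' "$path" | grep -Eo '.*/.' | grep -Eo '.*/')

printf 'grep:      %s  %s\n' "$base" "$dir"
printf 'utilities: %s  %s\n' "$(basename "$path")" "$(dirname "$path")"
```

The grep version keeps the trailing slash on the directory, while dirname drops it.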

4. Use an extended regular expression to match 0-9, 10-99, 100-199, 200-249, and 250-255 (often used to validate IPv4 addresses)

echo {1..300} | egrep -o "\b[0-9]\b|\b[1-9][0-9]\b|\b1[0-9][0-9]\b|\b2[0-4][0-9]\b|\b25[0-5]\b" (the output is long and is not shown in full)

5. Display all IPv4 addresses in the output of the ifconfig command

ifconfig | egrep -o "\<(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>"
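
The octet pattern from exercises 4 and 5 can be stored once and reused. The sketch below is a variation under stated assumptions rather than the article's command: it anchors the pattern with ^ and $ to validate a single string instead of extracting addresses from ifconfig output, and the names octet, ipv4 and is_ipv4 are hypothetical.

```shell
# One octet: 0-9, 10-99, 100-199, 200-249 or 250-255.
octet='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'

# Three "octet." groups followed by a final octet, anchored to the whole line.
ipv4="^($octet\.){3}$octet\$"

# Succeeds (exit status 0) only when the argument is a dotted-quad IPv4 address.
is_ipv4() { printf '%s\n' "$1" | grep -Eq "$ipv4"; }

# Of the numbers 0..300, exactly 0..255 should match the octet pattern (-x: whole line).
count=$(seq 0 300 | grep -Exc "$octet")
echo "numbers accepted as an octet: $count"

for candidate in 192.168.1.10 255.255.255.0 256.1.1.1 10.0.0 01.2.3.4; do
    if is_ipv4 "$candidate"; then
        echo "$candidate valid"
    else
        echo "$candidate invalid"
    fi
done
```

Note that this pattern rejects leading zeros such as 01, which matches the ranges listed in exercise 4.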

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
