Linux Text Processing "Three Musketeers"--grep

Source: Internet
Author: User

Anyone who has spent a little time with Linux knows that it has three very powerful text processing tools: grep, sed, and awk. You have probably heard of them.


The "three musketeers" of Linux text processing:

grep, egrep, fgrep: text filtering tools (filter by PATTERN);

grep: basic regular expressions; supports -E, -F

egrep: extended regular expressions; supports -G, -F

fgrep: does not support regular expressions

sed: stream editor, a text editing tool;

awk: implemented on Linux as gawk; a text report generator (formatted text output);


All three tools support regular expressions.


Regular expressions: Regular Expression, REGEXP

A pattern written with special characters and ordinary text characters, in which some characters do not stand for their literal meaning but instead express control or wildcard functions;

Divided into two categories:

Basic regular expressions: BRE

Extended regular expressions: ERE


grep: Global search REgular expression and Print out the line.

Function: a text search tool that checks the target text line by line against a user-specified pattern (the filter condition) and prints the matching lines

Pattern: a filter condition written with regular expression metacharacters and text characters


Format:

grep [OPTIONS] PATTERN [FILE...]

grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
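The two invocation forms can be compared on a small file. This is a minimal sketch, assuming a grep that supports repeated -e options; the file names fruits.txt and patterns.txt are made up for the illustration.

```shell
# Hypothetical sample data created only for this sketch.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/fruits.txt"
printf 'apple\ncherry\n' > "$tmpdir/patterns.txt"

# Form 1: the pattern is the first non-option argument.
single=$(grep 'an' "$tmpdir/fruits.txt")

# Form 2 with -e: the option may be repeated; a line is printed
# if it matches any of the patterns.
multi_e=$(grep -e 'apple' -e 'cherry' "$tmpdir/fruits.txt")

# Form 2 with -f: the patterns are read from a file, one per line.
from_file=$(grep -f "$tmpdir/patterns.txt" "$tmpdir/fruits.txt")

printf '%s\n' "$single" "$multi_e" "$from_file"
rm -rf "$tmpdir"
```

Here -e and -f select the same two lines, because the pattern file holds the same two patterns.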


Options:

--color=auto: highlights the matched text in color; (in CentOS 7 this is predefined as an alias for the administrator; in CentOS 6 it is not)

# alias

alias grep='grep --color=auto'


-i: ignore case, ignores the case of characters

# grep -i "UUID" /etc/fstab

UUID=48604746-41c1-41df-aaf1-f3588bfd3edc / xfs defaults 0 0


-o: show only the matched string itself

# grep -o "UUID" /etc/fstab

UUID


-v, --invert-match: display the lines that the pattern does not match;

# grep -v "UUID" /etc/fstab


-n: display the line number of each matching line

-E: use extended regular expression metacharacters

-q, --quiet, --silent: silent mode, prints nothing; only the exit status tells whether a match was found

-A #: after, also print the # lines after each match

# grep -A 1 "root" /etc/passwd

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

-B #: before, also print the # lines before each match

# grep -B 1 "^bin" /etc/passwd

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

-C #: context, also print the # lines before and after each match

# grep -C 1 "^bin" /etc/passwd

root:x:0:0:root:/root:/bin/bash

bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin
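The options above can be tried side by side on one small file. The following is a sketch only: the file sample.txt and its contents are invented for the illustration, and GNU grep is assumed for -o and the context options.

```shell
# Hypothetical five-line sample file.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'one\ntwo\nThree\nfour\nfive\n' > "$sample"

ignore_case=$(grep -i 'three' "$sample")   # Three
only_match=$(grep -o 'ou' "$sample")       # ou (the matched part of "four")
inverted=$(grep -v 'o' "$sample")          # Three, five
numbered=$(grep -n 'four' "$sample")       # 4:four
after=$(grep -A 1 'Three' "$sample")       # Three, four
before=$(grep -B 1 'Three' "$sample")      # two, Three
context=$(grep -C 1 'Three' "$sample")     # two, Three, four

# -q prints nothing; the exit status alone reports whether a match exists.
if grep -q 'five' "$sample"; then found=yes; else found=no; fi

printf '%s\n' "$context" "$found"
rm -rf "$tmpdir"
```

The -q form is the one usually placed in an if condition in scripts, since only the exit status matters there.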

Basic regular expression metacharacters:

Character matching:

.: matches any single character;

# grep "[[:punct:]]." issue

\S

[]: matches any single character within the specified set;

[^]: matches any single character outside the specified set;


[[:upper:]]: all uppercase letters

[[:lower:]]: all lowercase letters

[[:alpha:]]: all letters

[[:digit:]]: all digits

[[:alnum:]]: all letters and digits

[[:punct:]]: punctuation characters

[[:space:]]: whitespace characters

Note: range matching in regular expressions is case-sensitive
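A short sketch of the character classes on invented data (the file classes.txt is hypothetical). Note that the class name, including its own brackets, sits inside an outer bracket expression.

```shell
# Hypothetical sample: one line per kind of content.
tmpdir=$(mktemp -d)
sample="$tmpdir/classes.txt"
printf 'abc\nABC\n123\na1!\n' > "$sample"

digits=$(grep '[[:digit:]]' "$sample")       # lines containing a digit
no_lower=$(grep -v '[[:lower:]]' "$sample")  # lines with no lowercase letter
punct=$(grep '[[:punct:]]' "$sample")        # lines containing punctuation
non_alnum=$(grep '[^[:alnum:]]' "$sample")   # lines with a non-alphanumeric character

printf '%s\n' "$digits" "$no_lower" "$punct" "$non_alnum"
rm -rf "$tmpdir"
```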


Number of matches:

Placed after a character to limit how many times that preceding character may occur; matching is greedy by default


*: matches the preceding character any number of times: 0, 1, or many;

For example, grep "x*y" matches all of the following:

abxy, aby, xxxxxy, yab

.*: matches any characters of any length

\?: matches the preceding character 0 or 1 times, that is, the preceding character is optional;

\+: matches the preceding character 1 or more times, that is, the preceding character must appear at least once;

\{m\}: matches the preceding character exactly m times;

\{m,n\}: matches the preceding character at least m times and at most n times;

\{0,n\}: at most n times

\{m,\}: at least m times
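To see which part of each line a quantified pattern actually consumes, -o can be combined with the "x*y" example above. This is a sketch assuming GNU grep (where \? is available in basic regular expressions as an extension); the input words other than the four from the example are invented.

```shell
# Every input line contains a "y", so every line matches x*y;
# -o shows how many preceding "x" characters were taken (greedy).
matched_parts=$(printf 'abxy\naby\nxxxxxy\nyab\n' | grep -o 'x*y')

# \{2,3\} with both anchors: only lines made of exactly 2 or 3 "a".
two_or_three=$(printf 'a\naa\naaa\naaaa\n' | grep '^a\{2,3\}$')

# \? makes the preceding "u" optional.
optional_u=$(printf 'color\ncolour\ncolouur\n' | grep '^colou\?r$')

printf '%s\n' "$matched_parts" "$two_or_three" "$optional_u"
```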


Position anchoring:

^: start-of-line anchor; used at the leftmost side of the pattern;

$: end-of-line anchor; used at the rightmost side of the pattern;

^PATTERN$: PATTERN must match the entire line;

^$: an empty line;

^[[:space:]]*$: an empty line or a line containing only whitespace characters;

\< or \b: word-start anchor, used on the left side of a word pattern;

\> or \b: word-end anchor, used on the right side of a word pattern;

\<PATTERN\>: matches a whole word;


Word: a continuous string of non-special characters (letters, digits, and underscores) is called a word;
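The anchors can be checked against a few invented lines. This sketch assumes GNU grep for \< and \>; the file anchors.txt is hypothetical, and -c (print a count of matching lines) is used only to keep the blank-line results readable.

```shell
# Hypothetical sample: a word, an empty line, a whitespace-only line,
# and two longer words that merely contain "root".
tmpdir=$(mktemp -d)
sample="$tmpdir/anchors.txt"
printf 'root\n\n   \nrooter\nchroot\n' > "$sample"

starts=$(grep '^root' "$sample")           # root, rooter
ends=$(grep 'root$' "$sample")             # root, chroot
whole_line=$(grep '^root$' "$sample")      # root
whole_word=$(grep '\<root\>' "$sample")    # root
empty=$(grep -c '^$' "$sample")                    # 1
empty_or_blank=$(grep -c '^[[:space:]]*$' "$sample")  # 2

printf '%s\n' "$starts" "$ends" "$whole_word" "$empty" "$empty_or_blank"
rm -rf "$tmpdir"
```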


Grouping and referencing:

\(\): binds one or more characters together so that they are treated as a unit;

Example: \(xy\)*ab

Note: the text matched by the pattern inside the grouping parentheses is automatically recorded by the regular expression engine in internal variables, named as follows:

\1: the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis

\2: the text matched by the pattern between the second left parenthesis, counting from the left, and its matching right parenthesis

\3: the text matched by the pattern between the third left parenthesis, counting from the left, and its matching right parenthesis

...


Example:

He loves his lover.

He likes his lover.

She likes her liker.

She loves her liker.

# grep '\(l..e\).*\1' lovers.txt

Back reference: refers to the text that was matched by the pattern in a preceding pair of grouping parentheses;
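The lovers example can be reproduced in a temporary directory. This is a sketch: the four lines are the ones given above, while the temporary path is made up. Only the lines in which the same four-letter text ("love" or "like") appears twice should be printed.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/lovers.txt" <<'EOF'
He loves his lover.
He likes his lover.
She likes her liker.
She loves her liker.
EOF

# \(l..e\) captures "love" or "like"; \1 requires the same text again later.
result=$(grep '\(l..e\).*\1' "$tmpdir/lovers.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The second and fourth lines are rejected because \1 refers to the text that was matched, not to the pattern: "like" followed by "love" does not satisfy it.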


The above covers the use of the grep command and basic regular expressions; grep supports regular expressions, and the two are used together. Let's practice with a few exercises:

1. Display the lines in the /etc/passwd file that do not end in /bin/bash;

2. Find the two-digit or three-digit numbers in the /etc/passwd file;

3. In the /etc/rc.d/rc.sysinit or /etc/grub2.cfg file, find the lines that begin with at least one whitespace character followed by a non-whitespace character;

4. In the output of the "netstat -tan" command, find the lines that end with 'LISTEN' followed by 0, 1, or more whitespace characters;


Answer:

1. # grep -v "/bin/bash$" /etc/passwd

2. # grep "\<[0-9]\{2,3\}\>" /etc/passwd

3. # grep "^[[:space:]]\+[^[:space:]]" /etc/grub2.cfg

4. # netstat -tan | grep "LISTEN[[:space:]]*$"
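Answers 1 and 2 can be checked without touching the real /etc/passwd. The sketch below runs them against a hypothetical passwd-style file whose entries are invented for the illustration; GNU grep is assumed for \< and \>.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
user1:x:1000:1000::/home/user1:/bin/bash
EOF

# Answer 1: lines that do not end in /bin/bash.
not_bash=$(grep -v '/bin/bash$' "$tmpdir/passwd.sample" | cut -d: -f1)

# Answer 2: lines containing a standalone two- or three-digit number.
# 1000 is rejected because the word anchors forbid a longer run of digits.
two_three=$(grep '\<[0-9]\{2,3\}\>' "$tmpdir/passwd.sample" | cut -d: -f1)

printf '%s\n' "$not_bash" "$two_three"
rm -rf "$tmpdir"
```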


===========================================================================================


egrep:

Text filtering like grep, implemented with extended regular expressions; equivalent to grep -E


Format:

egrep [OPTIONS] PATTERN [FILE...]

Options:

-i, -o, -v, -q, -A, -B, -C

-G: use basic regular expressions

Extended regular expression metacharacters:

Compared with basic regular expressions, extended regular expressions add an "or" matching mode, and they are easier to read


Character matching:

.: any single character

[]: any single character within the specified set

[^]: any single character outside the specified set


Number of matches:

*: any number of times: 0, 1, or many;

?: 0 or 1 times; the preceding character is optional;

+: the preceding character at least 1 time;

{m}: the preceding character exactly m times;

{m,n}: at least m times and at most n times;

{0,n}

{m,}

Position anchoring:

^: start-of-line anchor;

$: end-of-line anchor;

\<, \b: word-start anchor;

\>, \b: word-end anchor;

Grouping and referencing:

(): grouping; the text matched by the pattern in parentheses is recorded in the internal variables of the regular expression engine;

Back reference: \1, \2, ...


Or:

a|b: a or b;

C|cat: C or cat

(c|C)at: cat or Cat
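The difference between C|cat and (c|C)at is easy to verify on a few invented words. A minimal sketch using grep -E; the input lines are made up, and the ^ and $ anchors are added here so that each pattern must cover the whole line.

```shell
words=$(printf 'cat\nCat\nC\nbat\n')

# Without grouping, | splits the whole pattern: either "C" or "cat".
either=$(printf '%s\n' "$words" | grep -E '^(C|cat)$')

# With grouping, | applies only inside the parentheses: "cat" or "Cat".
grouped=$(printf '%s\n' "$words" | grep -E '^(c|C)at$')

printf '%s\n' "$either" "$grouped"
```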


Practice:

1. Find all lines in the /proc/meminfo file that begin with an uppercase or lowercase s; implement it in at least three ways;

2. Display information about the users root, centos, or user1 on the current system;

3. Find the lines in the /etc/rc.d/init.d/functions file that contain a word followed by a pair of parentheses

4. Use the echo command to output an absolute path, and use egrep to extract its base name;

For /var/log/messages, extract its directory name, similar to the result of running the dirname command on it;

5. Find the values between 1 and 255 in the output of the ifconfig command;

6. Find the IP addresses in the output of the ifconfig command;

7. Add the users bash, testbash, basher and nologin (whose shell is /sbin/nologin), then find the lines in /etc/passwd whose user name is the same as the shell name;


Solution:

1.

# grep -i '^s' /proc/meminfo

# grep '^[sS]' /proc/meminfo

# egrep '^(s|S)' /proc/meminfo

2. The users must be created first

# grep -E "^(root|centos|user1)\>" /etc/passwd

3. # grep -E -o "[_[:alnum:]]+\(\)" /etc/rc.d/init.d/functions

4. # echo /etc/sysconfig/ | grep -E -o "[^/]+/?$"

# echo /var/log/messages | grep -E -o "^/.*/"

5. # ifconfig | grep -E -o "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"

6.

7. # grep -E "^([^:]+\>).*\1$" /etc/passwd
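No solution is listed for exercise 6. One possible approach, offered only as a sketch and not as the author's answer, builds a dotted-quad pattern from an octet sub-pattern in the same style as solution 5. The variable names octet and ipv4 and the sample line are invented; the sample imitates one line of ifconfig output so that the sketch runs without a network interface. GNU grep is assumed.

```shell
# One octet: 0-255 (written longest alternatives first for readability).
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots, bounded by word anchors.
ipv4="\<(${octet}\.){3}${octet}\>"

# Hypothetical line in the style of ifconfig output.
sample='inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255'

addresses=$(printf '%s\n' "$sample" | grep -E -o "$ipv4")
printf '%s\n' "$addresses"
```

Against real output this would be run as ifconfig | grep -E -o "$ipv4". The pattern accepts any dotted quad, so netmasks and broadcast addresses are printed as well.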


fgrep: does not support regular expression metacharacters;

When a pattern does not need any metacharacters, fgrep performs better and is faster;
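Besides speed, fixed-string matching matters when the search text itself contains metacharacters. The sketch below uses grep -F, which is the equivalent spelling of fgrep; the three input lines are invented.

```shell
lines=$(printf 'a.c\nabc\na*c\n')

# As a regular expression, "." matches any single character,
# so all three lines are selected.
as_regex=$(printf '%s\n' "$lines" | grep 'a.c')

# With -F the pattern is a literal string: only the line containing
# an actual dot (or an actual asterisk) is selected.
literal_dot=$(printf '%s\n' "$lines" | grep -F 'a.c')
literal_star=$(printf '%s\n' "$lines" | grep -F 'a*c')

printf '%s\n' "$as_regex" "$literal_dot" "$literal_star"
```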




Text viewing and processing tools: wc, cut, sort, uniq, diff, patch



wc: word count (counts the lines, words, and bytes of text)

Format:

wc [OPTION]... [FILE]...

Options:

-l: lines (number of lines)

-w: words (a word is a run of consecutive non-whitespace characters)

-c: bytes (number of bytes: the size)

Example:

# wc anaconda-ks.cfg

43 101 1143 anaconda-ks.cfg

Explanation: 43 lines, 101 words, 1143 bytes
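The three counters can also be requested one at a time. A small sketch on an invented two-line file (words.txt is hypothetical); reading from standard input keeps the file name out of the output, and tr strips the padding that some wc implementations add.

```shell
tmpdir=$(mktemp -d)
printf 'hello world\nfoo bar baz\n' > "$tmpdir/words.txt"

lines=$(wc -l < "$tmpdir/words.txt" | tr -d ' ')   # 2
words=$(wc -w < "$tmpdir/words.txt" | tr -d ' ')   # 5
bytes=$(wc -c < "$tmpdir/words.txt" | tr -d ' ')   # 24 (newlines included)

printf '%s lines, %s words, %s bytes\n' "$lines" "$words" "$bytes"
rm -rf "$tmpdir"
```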



cut: extracts the specified parts of each line

Format:

cut OPTION... [FILE]...

Options:

-b LIST: select by byte position. Byte positions ignore multibyte character boundaries unless the -n flag is also specified

-n: do not split multibyte characters. Used only with the -b flag. If the last byte of a character falls within the range indicated by the LIST argument of the -b flag, the character is written out;

-d CHAR: use the specified character as the delimiter;

-f FIELDS: select the fields;

#: a single field;

#-#: multiple consecutive fields;

#,#: multiple discrete fields;

#-: from the given field to the last

Example:

# cat issue | cut -b 2-6

# cat /etc/passwd | cut -d: -f2
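Each form of the field list can be tried on a single passwd-style line. In this sketch the line is a typical root entry used as sample input rather than read from the system.

```shell
line='root:x:0:0:root:/root:/bin/bash'

single=$(printf '%s\n' "$line" | cut -d: -f1)       # one field
range=$(printf '%s\n' "$line" | cut -d: -f3-4)      # consecutive fields
discrete=$(printf '%s\n' "$line" | cut -d: -f1,7)   # discrete fields
to_end=$(printf '%s\n' "$line" | cut -d: -f6-)      # field 6 to the last
by_bytes=$(printf '%s\n' "$line" | cut -b 1-4)      # bytes 1 through 4

printf '%s\n' "$single" "$range" "$discrete" "$to_end" "$by_bytes"
```

When several fields are selected, cut joins them with the same delimiter that was given to -d.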


sort: sorting

Format:

sort [OPTION]... [FILE]...

Options:

-n: sort by numeric value instead of by characters;

-t CHAR: specifies the field delimiter;

-k #: the field used for the sort comparison;

-r: reverse order;

-f: ignore character case

-u: keep only one copy of duplicate lines;

Duplicate lines: lines that are consecutive and identical;

Example:

# sort -t: -k 3 -n /etc/passwd

# sort -t: -k 3 -n -r /etc/passwd
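The effect of -n is clearest when the same data is sorted with and without it. A sketch on three invented name:number records:

```shell
data=$(printf 'c:10\na:2\nb:33\n')

# Character comparison on field 2: "10" sorts before "2".
lexical=$(printf '%s\n' "$data" | sort -t: -k2)

# Numeric comparison on field 2: 2 < 10 < 33.
numeric=$(printf '%s\n' "$data" | sort -t: -k2 -n)

# The same numeric sort, reversed.
reversed=$(printf '%s\n' "$data" | sort -t: -k2 -n -r)

# -u keeps one copy of lines that compare equal.
unique=$(printf 'b\na\nb\n' | sort -u)

printf '%s\n' "$lexical" "$numeric" "$reversed" "$unique"
```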


uniq: report or remove duplicate lines

Format:

uniq [OPTION]... [INPUT [OUTPUT]]

Options:

-c: shows the number of times each line occurs;

-u: displays only the lines that are not repeated;

-d: displays only the lines that are repeated;
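uniq compares only adjacent lines, which is why it is usually fed from sort. The sketch below shows the difference on four invented lines; the sed step only strips the leading padding that uniq -c puts before each count.

```shell
data=$(printf 'a\nb\na\na\n')

# Unsorted input: only the two adjacent "a" lines are merged.
adjacent_only=$(printf '%s\n' "$data" | uniq)

# Sorted input: identical lines become adjacent, so the counts are complete.
counts=$(printf '%s\n' "$data" | sort | uniq -c | sed 's/^ *//')

repeated=$(printf '%s\n' "$data" | sort | uniq -d)      # lines occurring more than once
not_repeated=$(printf '%s\n' "$data" | sort | uniq -u)  # lines occurring exactly once

printf '%s\n' "$adjacent_only" "$counts" "$repeated" "$not_repeated"
```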


diff: compare files line by line

Format:

diff [OPTION]... FILES

diff /path/to/oldfile /path/to/newfile > /path/to/patch_file

Options:

-u: use the unified format, which shows the context around the changed lines; the default is 3 lines;

Example:

# diff issue.bak issue >> buding_issue.patch


patch: apply a patch to files

Format:

patch [OPTIONS] -i /path/to/patch_file /path/to/oldfile

# patch -i buding_issue.patch issue


patch /path/to/oldfile < /path/to/patch_file

# patch ./issue < ./buding_issue.patch


Note: diff and patch are used together: diff first compares two files and generates a patch file; patch then applies that patch file to the specified file
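The full round trip can be sketched in a temporary directory. The file names old.txt, new.txt, and change.patch are invented for the illustration; the patch is generated from the old and new versions and then applied to the old one, after which the two files should be identical.

```shell
tmpdir=$(mktemp -d)
printf 'line1\nline2\n' > "$tmpdir/old.txt"
printf 'line1\nline2 changed\nline3\n' > "$tmpdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# scripts that run under "set -e" from stopping here.
diff -u "$tmpdir/old.txt" "$tmpdir/new.txt" > "$tmpdir/change.patch" || true

# Apply the patch to the old file; patch rewrites it in place.
patch -i "$tmpdir/change.patch" "$tmpdir/old.txt" > /dev/null

# cmp -s is silent and reports equality through its exit status.
if cmp -s "$tmpdir/old.txt" "$tmpdir/new.txt"; then
    result=identical
else
    result=different
fi

printf '%s\n' "$result"
rm -rf "$tmpdir"
```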


This article is from the "Disguised geek" blog; reprinting is declined!
