The Three Musketeers of Linux Text Processing: grep

Source: Internet
Author: User
Tags: expression, engine, egrep

The grep command on a Linux system is a powerful text search tool that uses regular expressions to search text and print the lines that match.


The name grep stands for Global search Regular Expression and Print out the line (from the ed editor command g/re/p). grep works by searching for a string pattern in one or more files. If the pattern includes spaces, it must be quoted; all arguments after the pattern are treated as file names. The results of the search are sent to standard output, and the contents of the files are not modified.

grep can also be used in shell scripts, because it reports the outcome of the search through its exit status: 0 if the pattern was found, 1 if it was not found, and 2 if an error occurred (for example, the searched file does not exist). You can use these return values to automate text-processing tasks.
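The exit status described above can be tested directly in a script. The following is a minimal sketch, assuming GNU grep; the helper name check and the temporary file contents are made up for illustration.

```shell
# Sketch: branch on grep's exit status (0 = match, 1 = no match, 2 = error).
# The sample file and the helper name "check" are hypothetical.
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

check() {
    # -q suppresses normal output, so only the exit status matters.
    # 2>/dev/null hides the error message in the missing-file case.
    if grep -q "$1" "$2" 2>/dev/null; then
        echo "found"
    else
        # In the else branch, $? still holds grep's exit status.
        case $? in
            1) echo "not found" ;;
            *) echo "error" ;;
        esac
    fi
}

r_match=$(check 'beta' "$tmp")                    # pattern present
r_nomatch=$(check 'gamma' "$tmp")                 # pattern absent
r_error=$(check 'beta' "$tmp.does-not-exist")     # file missing
echo "$r_match / $r_nomatch / $r_error"

rm -f "$tmp"
```

Wrapping grep in an if statement keeps the script safe under set -e, since a status of 1 (no match) is then treated as a normal outcome rather than a failure.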

The grep family includes grep, egrep, and fgrep. The egrep and fgrep commands differ only slightly from grep. egrep is the extended version of grep and supports more regular expression metacharacters. fgrep is fixed grep or fast grep: it treats every character in the pattern literally, so regular expression metacharacters lose their special meaning and stand for themselves. Linux uses the GNU version of grep, which is more powerful and selects these behaviors with command-line options: -G for basic regular expressions (the default), -E for the egrep behavior, and -F for the fgrep behavior.
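To illustrate the difference between the basic and extended syntax, here is a small sketch, assuming GNU grep (the \? form in basic mode is a GNU extension). The sample words are made up.

```shell
# Sketch: the same search in basic (default, -G) and extended (-E) syntax.
text='color
colour
colr'

# Basic regular expression: the optional-character operator is written \?
bre=$(printf '%s\n' "$text" | grep 'colou\?r')

# Extended regular expression: the same operator is written ? without a backslash
ere=$(printf '%s\n' "$text" | grep -E 'colou?r')

echo "$bre"   # prints color and colour, one per line
```

Both commands select the same two lines; only the amount of escaping differs.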

Regular expression definition: a string pattern written with ordinary characters and special characters.
Function: searches text based on the pattern and displays the lines of text that match it.
Pattern: the match condition, written as literal text characters combined with regular expression metacharacters.

# A regular expression contains metacharacters: characters that do not stand for themselves but act as wildcards.

Note: Regular expressions work in greedy mode by default; each quantifier matches as many characters as possible.
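A short sketch of what greedy matching means in practice, assuming a grep that supports -o (GNU grep does). The sample line is made up; the negated bracket expression is a common workaround, not something the original article prescribes.

```shell
# Sketch: greedy matching with .* versus a negated bracket expression.
line='<b>bold</b> and <i>italic</i>'

# .* is greedy: the match runs from the first "<" to the LAST ">" on the line.
greedy=$(printf '%s\n' "$line" | grep -o '<.*>')

# [^>]* cannot cross a ">", so each tag is matched separately.
tags=$(printf '%s\n' "$line" | grep -o '<[^>]*>')

echo "$greedy"
echo "$tags"
```

Here the greedy pattern returns the whole line as one match, while the bracket version returns the four tags on separate lines.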

Basic regular expression metacharacters:
Character Matching:
. : matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;
Character classes, used inside brackets (for example [[:digit:]]): [:digit:] [:alpha:] [:upper:] [:lower:] [:space:] [:alnum:] [:punct:]
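The character-matching metacharacters above can be tried on a few sample strings. This is a minimal sketch; the sample data is made up.

```shell
# Sketch: ".", "[]", "[^]" and a POSIX character class on made-up input.
sample='cat
cot
c9t
c t
ct'

# "." needs exactly one character between c and t, so "ct" is excluded.
any=$(printf '%s\n' "$sample" | grep -c 'c.t')

# Only a or o is accepted between c and t.
vowel=$(printf '%s\n' "$sample" | grep 'c[ao]t')

# Anything except a letter between c and t.
nonalpha=$(printf '%s\n' "$sample" | grep 'c[^[:alpha:]]t')

# A class name must sit inside a bracket expression: [[:digit:]].
digit=$(printf '%s\n' "$sample" | grep 'c[[:digit:]]t')

echo "$any"     # 4
echo "$digit"   # c9t
```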

Number of matches: placed after a character to limit how many times that preceding character may occur;
*: matches the preceding character any number of times: 0, 1, or more;
.*: matches any characters of any length;
\?: matches the preceding character 0 or 1 times, that is, the preceding character is optional;
\+: matches the preceding character one or more times, that is, the preceding character appears at least once;
\{m\}: matches the preceding character exactly m times;
\{m,n\}: matches the preceding character at least m times, up to n times;
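The quantifiers above can be compared side by side on the same input. A small sketch, assuming GNU grep (\? and \+ are GNU extensions to the basic syntax); the sample strings are made up.

```shell
# Sketch: basic-regex quantifiers applied to the character "b".
sample='ac
abc
abbc
abbbc'

star=$(printf '%s\n' "$sample" | grep -c 'ab*c')          # zero or more b
opt=$(printf '%s\n' "$sample" | grep 'ab\?c')             # zero or one b
plus=$(printf '%s\n' "$sample" | grep -c 'ab\+c')         # one or more b
exact=$(printf '%s\n' "$sample" | grep 'ab\{2\}c')        # exactly two b
range=$(printf '%s\n' "$sample" | grep 'ab\{1,2\}c')      # one or two b

echo "$star $plus"   # 4 3
echo "$exact"        # abbc
```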

Location anchoring:
^: anchors the match at the beginning of the line; placed at the leftmost side of the pattern;
$: anchors the match at the end of the line; placed at the rightmost side of the pattern;
^PATTERN$: uses PATTERN to match the whole line;
^$: empty line;
^[[:space:]]*$: empty lines or lines that contain only whitespace characters;
\< or \b: anchors the match at the beginning of a word; placed at the leftmost side of the word;
\> or \b: anchors the match at the end of a word; placed at the rightmost side of the word;
\<PATTERN\>: exact match of a complete word;
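The effect of each anchor is easiest to see by counting matches on the same file. A minimal sketch, assuming GNU grep for the \< and \> word anchors; the file contents are made up.

```shell
# Sketch: line and word anchors on a made-up five-line file.
tmp=$(mktemp)
# Lines: "root:x:0", an empty line, a whitespace-only line, "chroot user", "root"
printf 'root:x:0\n\n   \nchroot user\nroot\n' > "$tmp"

unanchored=$(grep -c 'root' "$tmp")        # any line containing "root"
at_start=$(grep -c '^root' "$tmp")         # lines beginning with "root"
whole_line=$(grep -c '^root$' "$tmp")      # lines that are exactly "root"
empty=$(grep -c '^$' "$tmp")               # truly empty lines
blank=$(grep -c '^[[:space:]]*$' "$tmp")   # empty or whitespace-only lines
word=$(grep -c '\<root\>' "$tmp")          # "root" as a whole word ("chroot" is excluded)

echo "$unanchored $at_start $whole_line $empty $blank $word"   # 3 2 1 1 2 2
rm -f "$tmp"
```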

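Grouping and referencing
\(\): binds one or more characters together so that they are treated as a whole;
\(xy\)*ab: matches "xy" repeated any number of times (including zero), followed by "ab"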

Note:
The text matched by the pattern inside each pair of grouping parentheses is automatically recorded by the regular expression engine in internal variables, which are:
\1: the text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis;
\2: the text matched by the pattern between the second opening parenthesis, counting from the left, and its matching closing parenthesis;
\3: ...
...
* Back reference: refers to the text that the pattern in an earlier pair of grouping parentheses matched
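A back reference matches the same text the group matched, not just the same pattern. The sketch below shows this with made-up sentences, assuming a grep that supports -o (GNU grep does).

```shell
# Sketch: grouping with \( \) and back-referencing the group with \1.
sample='He loves his lover
She likes her liker
He likes his lover
She loves her liker'

# \(l..e\) captures "love" or "like"; \1 then requires the SAME text again,
# so a line with "like" followed by "love" does not match.
same=$(printf '%s\n' "$sample" | grep '\(l..e\).*\1')

# A group can also be repeated as a unit: zero or more "xy", then "ab".
repeated=$(printf 'xyxyab\n' | grep -o '\(xy\)*ab')

echo "$same"
echo "$repeated"   # xyxyab
```

Only the first two sentences are printed, because in the other two the group and the later word differ.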

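Extended regular expression metacharacters

Character Matching:
. : matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;

Number of matches:
*: matches the preceding character any number of times;
?: matches the preceding character 0 or 1 times;
+: matches the preceding character at least once;
{m}: matches the preceding character exactly m times;
{m,n}: matches the preceding character at least m times, up to n times;

Location anchoring:
^: anchors the match at the beginning of the line;
$: anchors the match at the end of the line;
\<, \b: anchors the match at the beginning of a word;
\>, \b: anchors the match at the end of a word;

Grouping and referencing:
(): grouping; the characters matched by the pattern in parentheses are recorded in the internal variables of the regular expression engine;
Back references: \1, \2, ...

Or:
a|b: a or b;
C|cat: C or cat;
(c|C)at: cat or Cat;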
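The difference between C|cat and (c|C)at is a common source of confusion, so here is a small sketch with made-up input. It uses grep -E, which enables the extended syntax.

```shell
# Sketch: alternation with and without grouping in extended regex.
sample='C
cat
Cat
bat'

# Without parentheses, | splits the whole pattern: "C" OR "cat".
# "Cat" also matches here, because it contains "C".
alt=$(printf '%s\n' "$sample" | grep -E 'C|cat')

# With parentheses, only the first letter alternates: "cat" or "Cat".
grp=$(printf '%s\n' "$sample" | grep -E '^(c|C)at$')

echo "$alt"
echo "$grp"
```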


grep:
    # Text search and filtering tool; supports basic regular expressions
    synopsis: grep [OPTIONS] PATTERN [FILE ...]
    options:
        --color=auto: highlight the matched text;
        -i, --ignore-case: ignore character case;
        -o: display only the matched string itself, not the whole line;
        -v, --invert-match: display the lines that do not match;
        -n: prefix each output line with its line number;
        -c: print the number of matching lines instead of the lines themselves;
        -w: match whole words only;
        -E, --extended-regexp: support extended regular expression metacharacters;
        -q, --quiet, --silent: silent mode, do not output anything;
        -A #: after; also print the # lines after each match;
        -B #: before; also print the # lines before each match;
        -C #: context; also print the # lines before and after each match
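Several of these options can be tried together on one small file. This is a sketch under the assumption of GNU grep; the file contents are made up. It also shows that -c counts lines, so counting individual occurrences needs -o piped into a second count.

```shell
# Sketch: common grep options on a made-up five-line file.
tmp=$(mktemp)
printf 'one\nTwo apples\nthree\ntwo\nfour two twos\n' > "$tmp"

numbered=$(grep -n 'two' "$tmp")                   # matching lines with line numbers
lines=$(grep -c 'two' "$tmp")                      # number of matching LINES
lines_ci=$(grep -ic 'two' "$tmp")                  # same, ignoring case ("Two" counts)
occurrences=$(grep -o 'two' "$tmp" | grep -c '')   # every occurrence, one per line, then counted
words=$(grep -ow 'two' "$tmp" | grep -c '')        # whole-word occurrences only ("twos" excluded)
inverted=$(grep -v 'two' "$tmp")                   # lines that do NOT contain "two"
after=$(grep -A 1 'three' "$tmp")                  # the match plus one line after it

echo "$numbered"
echo "$lines $lines_ci $occurrences $words"        # 2 3 3 2
rm -f "$tmp"
```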


e.g:

1. View the status of the sda disk partitions and display line numbers
~]# df -lh | grep -n "\<sda.*"
2:/dev/sda3 30G 3.2G 27G 11% /
7:/dev/sda1 497M 154M 344M 31% /boot

2. Display only the three-letter words in the file test
~]# grep -o -w "[[:alpha:]]\{3\}" test
var
ntp
etc
ntp
bin
bin



egrep:
    Text filtering like grep, with support for extended regular expressions
    synopsis: egrep [OPTIONS] [-e PATTERN | -f FILE] [FILE ...]
    options:
           same as grep
           -G: use basic regular expressions
e.g:

1. Find all one-digit and two-digit numbers in the file and show only the matched strings
~]# grep -Eo "(\<[0-9][0-9]\>|\<[0-9]\>)" test
89
89
38
38
72
72
0

2. Display the lines of /etc/grub2.cfg that begin with at least one whitespace character followed by a non-whitespace character, and output only the last five lines of the result
~]# grep -n "^[[:space:]]\+[^[:space:]]" /etc/grub2.cfg | tail -5
112:     fi
113:    linux16 /vmlinuz-0-rescue-f6d8429dc7d34e548fde61cc3c526f0c root=UUID=49ceaa41-d060-4dd1-b3d0-c9d7928958fc ro crashkernel=auto rhgb quiet
114:     initrd16 /initramfs-0-rescue-f6d8429dc7d34e548fde61cc3c526f0c.img
136:  source  ${config_directory}/custom.cfg
138:  source  $prefix/custom.cfg;


# egrep removes the need for many of the backslash escapes that grep's basic regular expressions require, which makes commands more concise; see the extended regular expression metacharacters.

fgrep: regular expression metacharacters are not supported; every character in the pattern is matched literally.
It is better to use fgrep when the pattern does not need any metacharacters.
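Literal matching matters most when the search string itself contains metacharacters. A minimal sketch with made-up data, using grep -F, which is the option form of fgrep:

```shell
# Sketch: regex matching versus fixed-string matching for a string containing dots.
tmp=$(mktemp)
printf '192.168.1.1\n192x168y1z1\n' > "$tmp"

# As a regular expression, each "." matches ANY character, so both lines match.
regex_hits=$(grep -c '192.168.1.1' "$tmp")

# With -F the dots are literal, so only the real address matches.
fixed_hits=$(grep -F -c '192.168.1.1' "$tmp")

echo "$regex_hits $fixed_hits"   # 2 1
rm -f "$tmp"
```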



This article is from the "Linux OPS" blog; please keep this source attribution: http://allenyang.blog.51cto.com/10991027/1788196
