Linux: Regular expressions and the grep command

Source: Internet
Author: User

Basic syntax
A regular expression is often referred to as a pattern, which is used to describe or match a series of strings that conform to a certain syntactic rule.

I. Alternation: |

    • | The vertical bar indicates alternation; for example, "boy|girl" matches "boy" or "girl".

II. Quantifiers: + ? *

    • + indicates that the preceding character must appear at least once (1 or more times); for example, "goo+gle" matches "google", "gooogle", "goooogle", and so on.
    • ? indicates that the preceding character appears at most once (0 or 1 times); for example, "colou?r" matches "color" or "colour".
    • * indicates that the preceding character may be absent or may appear once or many times (0 or more times); for example, "0*42" matches 42, 042, 0042, 00042, and so on.
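
The three quantifiers above can be tried directly with grep in extended mode (-E). This is a minimal sketch; the word list and the variable names (words, plus, opt, star) are made up for illustration, and each pattern is anchored with ^ and $ so that a line is printed only when the whole line matches.

```shell
# Sample words, one per line (illustrative data, not from the article).
words=$(printf 'gogle\ngoogle\ngooogle\ncolor\ncolour\n42\n0042\n')

# + : "goo+gle" needs "go", then one or more "o", then "gle".
plus=$(printf '%s\n' "$words" | grep -E '^goo+gle$')

# ? : the "u" in "colou?r" may appear zero or one time.
opt=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# * : "0*42" allows any number of leading zeros, including none.
star=$(printf '%s\n' "$words" | grep -E '^0*42$')

printf 'plus:\n%s\nopt:\n%s\nstar:\n%s\n' "$plus" "$opt" "$star"
```

Here "gogle" is rejected by the first pattern because it has only one "o" before "gle".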

III. Scope and precedence
() Parentheses define the scope and precedence of part of a pattern; in simple terms, whatever is enclosed in parentheses is treated as a single unit. For example, "gr(a|e)y" is equivalent to "gray|grey" (this shows precedence: the vertical bar chooses between "a" and "e" rather than between "gra" and "ey"), and "(grand)?father" matches "father" and "grandfather" (this shows scope: the ? applies to the whole content of the parentheses).
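
A short sketch of alternation and grouping with grep -E follows. The input words and the variable names (either, colors, family) are assumptions chosen for the example.

```shell
# Plain alternation: a line is printed if it contains "boy" or "girl".
either=$(printf 'boy\ngirl\ncat\n' | grep -E 'boy|girl')

# Parentheses limit the alternation to the single letter a or e.
colors=$(printf 'gray\ngrey\ngruy\n' | grep -E '^gr(a|e)y$')

# Parentheses make ? apply to the whole group "grand".
family=$(printf 'father\ngrandfather\ngrandmother\n' | grep -E '^(grand)?father$')

printf '%s\n%s\n%s\n' "$either" "$colors" "$family"
```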

IV. Syntax (partial)
Regular expressions come in many flavors. The rules listed below are a commonly used subset of PCRE that applies to the Perl and Python programming languages and to grep or egrep.

PCRE (Perl Compatible Regular Expressions) is a regular expression library written in C by Philip Hazel. It is a lightweight library, much smaller than regular expression libraries such as the one in Boost. PCRE is easy to use yet powerful, and it is reported to outperform the POSIX regular expression library and some other classic regular expression libraries.

V. Character descriptions

  • \ marks the next character as a special character or as a literal. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\" and "\(" matches "(".
  • ^ matches the starting position of the input string.
  • $ matches the end position of the input string.
  • {n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
  • {n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
  • {n,m} m and n are non-negative integers with n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no spaces between the comma and the two numbers.
  • * matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". * is equivalent to {0,}.
  • + matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
  • ? matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
  • ? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy. Non-greedy matching consumes as little of the searched string as possible, while the default greedy matching consumes as much as possible. For example, for the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's. Non-greedy quantifiers are a Perl-style feature; POSIX basic and extended regular expressions do not define them.
  • . matches any single character except "\n". To match any character including "\n", use a pattern like "(.|\n)".
  • (pattern) matches pattern and captures the matched substring, which can then be used in back-references. To match literal parentheses, use "\(" or "\)".
  • x|y matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
  • [xyz] Character set (character class). Matches any one of the contained characters. For example, "[abc]" matches the "a" in "plain". Inside the brackets, only the backslash \ keeps its special meaning as the escape character (this holds for PCRE; in POSIX bracket expressions the backslash is literal too). Other special characters such as the asterisk, the plus sign, and the various brackets are treated as ordinary characters. A caret ^ in the first position denotes a negated character set; anywhere else it is an ordinary character. A hyphen - denotes a character range when it appears between two characters, and is an ordinary character when it appears in the first position.
  • [^xyz] Negated character set. Matches any character that is not listed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain".
  • [a-z] Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".
  • [^a-z] Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
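
Several of the rules above can be checked with grep. This is a sketch under a few assumptions: the variable names are made up, and the -o option (print only the matched text, one match per line) is an extension found in GNU and BSD grep rather than part of POSIX.

```shell
# {n}: "o{2}" needs two consecutive o's, so "Bob" is skipped.
two=$(printf 'Bob\nfood\n' | grep -E 'o{2}')

# {1,} and + are equivalent: both results should be identical.
interval=$(printf 'z\nzo\nzoo\n' | grep -E 'zo{1,}')
plus=$(printf 'z\nzo\nzoo\n' | grep -E 'zo+')

# [abc] matches one listed character; -o shows which part matched.
set_match=$(printf 'plain\n' | grep -o '[abc]')

# [^abc] matches every character that is not a, b, or c.
neg_match=$(printf 'plain\n' | grep -o '[^abc]')

printf '%s\n%s\n%s\n%s\n' "$two" "$interval" "$set_match" "$neg_match"
```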

VI. Precedence
Precedence decreases from top to bottom; operators on the same row are listed from left to right:

Operator                        Description
\                               Escape character
(), (?:), (?=), []              Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}       Quantifiers
^, $, \any-metacharacter        Anchors and sequences
|                               Alternation
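
Because alternation has the lowest precedence, anchors bind to the nearest branch unless parentheses say otherwise. The sketch below illustrates this with made-up input lines; the variable names (lines, loose, strict) are assumptions.

```shell
lines=$(printf 'ab\nabx\nxcd\ncd\nx\n')

# Without parentheses this reads as (^ab) OR (cd$):
# lines that start with "ab" or end with "cd".
loose=$(printf '%s\n' "$lines" | grep -E '^ab|cd$')

# With parentheses both anchors apply to either branch:
# only lines that are exactly "ab" or exactly "cd".
strict=$(printf '%s\n' "$lines" | grep -E '^(ab|cd)$')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```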

grep Pattern Matching command

I. Basic usage
The grep command prints the lines of its input that match a pattern, using regular expressions as the matching criteria. grep supports three regular expression engines, each selected by its own option:

Option    Description

    • -E    POSIX extended regular expressions (ERE)
    • -G    POSIX basic regular expressions (BRE), the default
    • -P    Perl-compatible regular expressions (PCRE)
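
The difference between the basic and extended engines shows up with the interval braces. In the sketch below the two input lines and the variable names are assumptions made for the example; the behavior shown is that of a typical GNU grep.

```shell
# Two lines: the text "aa" and the literal text "a{2}".
sample=$(printf 'aa\na{2}\n')

# BRE (default, same as -G): unescaped braces are ordinary characters,
# so this finds the literal text "a{2}".
bre_literal=$(printf '%s\n' "$sample" | grep 'a{2}')

# BRE: escaped braces form an interval, so this finds "aa".
bre_interval=$(printf '%s\n' "$sample" | grep 'a\{2\}')

# ERE (-E): unescaped braces form an interval, so this also finds "aa".
ere_interval=$(printf '%s\n' "$sample" | grep -E 'a{2}')

printf '%s\n%s\n%s\n' "$bre_literal" "$bre_interval" "$ere_interval"
```

The -P option is specific to GNU grep and works only when grep was built with PCRE support, so it is left out of this sketch.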

Before using regular expressions with grep, here are its most common options:

Option    Description

    • -a    treat binary files as text
    • -c    print the number of matching lines
    • -i    ignore case
    • -n    show the line number of each matching line
    • -v    invert the selection and output the lines that do not match
    • -r    search recursively
    • -A n    n is a positive integer; A stands for "after": besides each matching line, also print the n lines that follow it
    • -B n    n is a positive integer; B stands for "before": besides each matching line, also print the n lines that precede it
    • --color=auto    highlight the matched text in color when the output is a terminal
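
The options above can be combined with any pattern. A small sketch follows, using a made-up five-line input; the variable names are assumptions, and -A and -B are extensions available in GNU and BSD grep.

```shell
text=$(printf 'alpha\nBeta\ngamma\nbeta\ndelta\n')

count=$(printf '%s\n' "$text" | grep -c 'beta')      # matching lines, case-sensitive
icount=$(printf '%s\n' "$text" | grep -ci 'beta')    # same, ignoring case
numbered=$(printf '%s\n' "$text" | grep -n 'gamma')  # prefix with the line number
inverted=$(printf '%s\n' "$text" | grep -v 'eta')    # lines that do NOT contain "eta"
after=$(printf '%s\n' "$text" | grep -A 1 'gamma')   # match plus 1 following line
before=$(printf '%s\n' "$text" | grep -B 1 'gamma')  # match plus 1 preceding line

printf '%s %s %s\n' "$count" "$icount" "$numbered"
```

Note that -c counts matching lines, not individual matches within a line.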

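Using regular expressions

Using basic regular expressions (BRE)

Position

Find the lines that start with "shiyanlou" in the /etc/group file:

    # prints every line that contains "shiyanlou"
    $ grep 'shiyanlou' /etc/group
    # prints only the lines that start with "shiyanlou"
    $ grep '^shiyanlou' /etc/group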

Quantity

    # matches every string that starts with 'z' and ends with 'o'
    $ printf 'zero\nzo\nzoo\n' | grep 'z.*o'
    # matches strings that start with 'z', end with 'o', and have exactly one arbitrary character in between
    $ printf 'zero\nzo\nzoo\n' | grep 'z.o'
    # matches strings that start with 'z' followed by any number of 'o'
    $ printf 'zero\nzo\nzoo\n' | grep 'zo*'
    # Note: \n is a newline; printf is used because echo does not expand \n in every shell

Selection

    # grep is case-sensitive by default; this matches all lowercase letters
    $ printf '1234\nabcd\n' | grep '[a-z]'
    # matches all digits
    $ printf '1234\nabcd\n' | grep '[0-9]'
    # matches all digits
    $ printf '1234\nabcd\n' | grep '[[:digit:]]'
    # matches all lowercase letters
    $ printf '1234\nabcd\n' | grep '[[:lower:]]'
    # matches all uppercase letters
    $ printf '1234\nabcd\n' | grep '[[:upper:]]'
    # matches all letters and digits, i.e. 0-9, a-z, A-Z
    $ printf '1234\nabcd\n' | grep '[[:alnum:]]'
    # matches all letters
    $ printf '1234\nabcd\n' | grep '[[:alpha:]]'

The complete list of special symbols and their descriptions:

Special symbol    Description

  • [:alnum:] English uppercase and lowercase letters and digits, i.e. 0-9, A-Z, a-z
  • [:alpha:] any English uppercase or lowercase letter, i.e. A-Z, a-z
  • [:blank:] the space key and the [Tab] key
  • [:cntrl:] the control characters, including CR, LF, Tab, Del, etc.
  • [:digit:] digits, i.e. 0-9
  • [:graph:] all characters except whitespace (the space key and the [Tab] key)
  • [:lower:] lowercase letters, i.e. a-z
  • [:print:] any character that can be printed
  • [:punct:] punctuation symbols, e.g. " ' ? ! ; : # $ ...
  • [:upper:] uppercase letters, i.e. A-Z
  • [:space:] any character that produces whitespace, including the space key, [Tab], CR, etc.
  • [:xdigit:] hexadecimal digits, i.e. 0-9, A-F, a-f

Note: these special symbols exist because [a-z] does not behave the same way everywhere. Its meaning depends on the host's current locale, that is, the value of the LANG environment variable. Under zh_CN.UTF-8, [a-z] covers exactly the lowercase letters, but in other locales the collation order may interleave upper and lower case, such as "a A b B ... z Z", so [a-z] may include uppercase letters. When you use [a-z], check the effect of the current locale; [[:lower:]] does not have this problem.
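
One way to make a range such as [a-z] predictable is to run grep in the C locale for that one command. The sketch below assumes a two-line made-up input; the variable names are hypothetical.

```shell
# The character class does not depend on collation order.
by_class=$(printf 'abc\nABC\n' | grep '[[:lower:]]')

# Setting LC_ALL=C for this command makes [a-z] a plain byte range a..z.
by_range=$(printf 'abc\nABC\n' | LC_ALL=C grep '[a-z]')

printf '%s\n%s\n' "$by_class" "$by_range"
```

Setting LC_ALL on the command line affects only that grep invocation, so the rest of the session keeps its own locale.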

    # exclude characters: matches any character other than 'o'
    $ printf 'geek\ngood\n' | grep '[^o]'

Note: when ^ is placed first inside brackets it negates the character set; outside brackets it denotes the start of a line.
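
The two meanings of ^ can be compared side by side. In this sketch the third input line "ooo" is added for illustration, and the variable names are assumptions.

```shell
sample=$(printf 'geek\ngood\nooo\n')

# Inside brackets: lines containing at least one character that is not "o".
not_o=$(printf '%s\n' "$sample" | grep '[^o]')

# Outside brackets: lines that begin with "o".
starts_o=$(printf '%s\n' "$sample" | grep '^o')

printf 'not_o:\n%s\nstarts_o:\n%s\n' "$not_o" "$starts_o"
```

The line "good" is printed by the first command even though it contains o's, because "g" and "d" are characters other than "o".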

Using extended regular expressions (ERE)

To use extended regular expressions with grep, add the -E option or use egrep.

Quantity

    # matches exactly "zo"; the line "zoo" is printed too because it contains "zo"
    $ printf 'zero\nzo\nzoo\n' | grep -E 'zo{1}'
    # matches all words that start with "zo"
    $ printf 'zero\nzo\nzoo\n' | grep -E 'zo{1,}'

Note: it is enough to master {n,m}; +, ?, and * are less intuitive and easy to confuse.

Selection

    # matches "www.shiyanlou.com" and "www.google.com"
    $ printf 'www.shiyanlou.com\nwww.baidu.com\nwww.google.com\n' | grep -E 'www\.(shiyanlou|google)\.com'
    # or print the lines that do not contain "www.baidu.com"
    $ printf 'www.shiyanlou.com\nwww.baidu.com\nwww.google.com\n' | grep -Ev 'www\.baidu\.com'

Note: because the . character has a special meaning, it must be escaped.
