Regular Expressions and One of the Three Musketeers of Text Processing: grep and egrep

Source: Internet
Author: User
Tags: control characters, uppercase letter, expression engine, egrep

For people who are just getting started, learning and using regular expressions can be painful and confusing, but with a little enthusiasm you will find them very interesting. So let us lift our spirits and bring our own interest and passion into the wonderful world of Linux.

What is a regular expression?

A regular expression is a pattern template that you define and that Linux tools use to filter text. In other words, it is a means of matching and filtering text with certain tools (such as grep and egrep, the subject of this article). The matching is carried out by a regular expression engine, the underlying software that interprets the pattern and applies it to the text. Two regular expression engines are popular in Linux, corresponding to basic regular expressions and extended regular expressions respectively:

(1) POSIX Basic Regular Expression (BRE) engine

(2) POSIX Extended Regular Expression (ERE) engine

What are grep and egrep?

As mentioned above, matching and filtering text with regular expressions requires a tool, and grep and egrep are two such tools (in fact "grep -E" is equivalent to egrep, so the two can be treated as one tool).
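The difference between the two engines is easiest to see by giving the same pattern to both. The following is a minimal sketch, assuming GNU grep on Linux; the sample lines and the variable names (text, bre_result, ere_result) are made up for illustration.

```shell
# Hypothetical sample input: two lines that look alike but are not.
text=$(printf '%s\n' 'abbc' 'ab+c')

# Basic engine (the default): "+" is an ordinary character here.
bre_result=$(printf '%s\n' "$text" | grep 'ab+c')

# Extended engine (-E): "+" means "one or more of the preceding character".
ere_result=$(printf '%s\n' "$text" | grep -E 'ab+c')

echo "BRE matched: $bre_result"   # BRE matched: ab+c
echo "ERE matched: $ere_result"   # ERE matched: abbc
```

Running egrep 'ab+c' on the same input should behave like the grep -E line.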

Basic Regular Expressions:

Character Matching:

.: Matches any single character. Note that it must be exactly one character; for example, "ab." matches three characters: a, b, and any one character.

[]: Matches any single character listed inside the brackets, similar to a bracket wildcard in the shell. Regular expressions are case-sensitive by default. A "-" can be used to match any character in a range; for example, "[0-9]" matches any single digit from 0 to 9 inclusive, "[a-z]" matches a lowercase letter, and "[A-Z]" matches an uppercase letter.

[[:alpha:]]: Matches any single letter, equivalent to [a-zA-Z].

[[:upper:]]: Matches any single uppercase letter, equivalent to [A-Z].

[[:lower:]]: Matches any single lowercase letter, equivalent to [a-z].

[[:digit:]]: Matches any single digit, equivalent to [0-9].

[[:alnum:]]: Matches any single letter or digit (whitespace and other special characters are not included); equivalent to [[:alpha:][:digit:]] and also to [0-9a-zA-Z].

[[:punct:]]: Matches any single punctuation character.

[[:space:]]: Matches any single whitespace character (including space, tab, and newline).

[[:graph:]]: Matches any single visible character (not a space or a control character).

[[:cntrl:]]: Matches any single control character (for example, the characters produced by Ctrl+key).

[[:print:]]: Matches any single printable character (the visible characters plus the space).

[[:xdigit:]]: Matches any single hexadecimal digit.

Note: These class names are only valid inside a bracket expression [], which is why each one above is written with an extra pair of [] around it.

[^]: Matches any single character that is not listed inside the brackets. The leading ^ itself is not part of the list (^ has its special meaning only in the first position); for example, [^^] matches any single character other than ^. To negate any of the classes above, put a ^ in front of it inside the brackets; for example, [^[:alpha:]] matches any single non-letter character. To repeat the important point: the outer [] was added only because a class must appear inside a bracket expression, so the class itself is written [:alpha:].
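The bracket expressions and character classes above can be tried out on a few lines of text. This is only a sketch: the sample lines and variable names are invented, and the locale is pinned to C as an assumption so that the classes behave predictably.

```shell
# Assumption: pin the locale so classes and ranges cover plain ASCII only.
export LC_ALL=C

# Hypothetical sample input, one item per line.
sample=$(printf '%s\n' 'hello' 'WORLD' '2024' 'a-b' '^')

# Lines containing at least one uppercase letter.
has_upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')

# Lines containing at least one character that is NOT a letter.
non_alpha=$(printf '%s\n' "$sample" | grep '[^[:alpha:]]')

# Lines containing at least one character other than "^".
not_caret=$(printf '%s\n' "$sample" | grep '[^^]')

echo "uppercase:  $has_upper"
echo "non-letter: $non_alpha"
echo "not caret:  $not_caret"
```

Here has_upper holds only WORLD, non_alpha holds 2024, a-b and ^, and not_caret holds every line except the lone ^.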

Number of matches: used after a character to limit how many times that character may occur:

*: Matches the preceding character any number of times, including zero. For example, ab*c matches any line containing ac, abc, or abbc. It says "containing" because grep prints every line in which the pattern matches somewhere, not only lines that are identical to the pattern, so in the example above aacc also matches (it contains ac).

\+: Matches the preceding character one or more times; the preceding character must be present. For example, ab\+c matches abc and abbc but not ac.

\?: Matches the preceding character zero times or once.

\{m\}: Matches the preceding character exactly m times.

\{m,n\}: Matches the preceding character at least m times and at most n times.

\{m,\}: Matches the preceding character at least m times, with no upper limit.

\{,n\}: Matches the preceding character at most n times, possibly 0 times.
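The repetition operators above can be compared side by side on one small input. A sketch, assuming GNU grep (which accepts \+ and \? in basic regular expressions); the sample lines and the helper function match are hypothetical.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' 'aacc' 'adc')

# Illustrative helper: run one basic regular expression against the sample.
match() { printf '%s\n' "$sample" | grep "$1"; }

star=$(match 'ab*c')          # zero or more b: ac abc abbc abbbc aacc
plus=$(match 'ab\+c')         # one or more b:  abc abbc abbbc
opt=$(match 'ab\?c')          # zero or one b:  ac abc aacc
exact=$(match 'ab\{2\}c')     # exactly two b:  abbc
atleast=$(match 'ab\{2,\}c')  # two or more b:  abbc abbbc
range=$(match 'ab\{1,2\}c')   # one or two b:   abc abbc

echo "$star"
```

Note that aacc appears in the first and third results only because it contains ac, and adc never matches because no repetition of b can stand in for d.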

Positional anchoring:

^: Anchors the match to the beginning of the line. It must be the leftmost character of the pattern to act as an anchor; elsewhere it has no anchoring effect.

$: Anchors the match to the end of the line. It must be the rightmost character of the pattern to act as an anchor; elsewhere it has no anchoring effect.

Note: These anchor positions within the whole line. Used alone, each one only requires the line to begin or to end with the pattern; to match a line that consists entirely of the pattern, use both together. For example, the following two patterns are special cases worth remembering:

^$: Matches an empty line.

^[[:space:]]*$: Matches an empty line or a line that contains only whitespace characters.

\< or \b: Anchors the beginning of a word; it must be placed on the left side of the word to be anchored.

\> or \b: Anchors the end of a word; it must be placed on the right side of the word to be anchored.

Note: Used alone, neither of these matches a whole word. To match a whole word, use both together: \<word pattern\>
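The line and word anchors above can be checked with a short experiment. This is a sketch assuming GNU grep (the \< and \> word anchors are GNU extensions); the sample lines and variable names are made up, and -c is the standard grep option that prints a count of matching lines.

```shell
# Hypothetical sample input; the fourth line is empty and the fifth
# contains only three spaces.
sample=$(printf '%s\n' 'root:x:0' 'chroot' 'the root user' '' '   ' 'rooted')

starts=$(printf '%s\n' "$sample" | grep '^root')        # begins with root
ends=$(printf '%s\n' "$sample" | grep 'root$')          # ends with root
whole_word=$(printf '%s\n' "$sample" | grep '\<root\>') # root as a whole word

# Count empty lines, then lines that are empty or whitespace-only.
empty_count=$(printf '%s\n' "$sample" | grep -c '^$')
blank_count=$(printf '%s\n' "$sample" | grep -c '^[[:space:]]*$')

echo "starts with root: $starts"
echo "ends with root:   $ends"
echo "whole word root:  $whole_word"
echo "empty: $empty_count, blank: $blank_count"   # empty: 1, blank: 2
```

The whole-word search keeps root:x:0 and the root user but rejects chroot and rooted, which is exactly the case that a single \< or \> cannot handle alone.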

Grouping and referencing: When part of a pattern has to be matched more than once, the best approach is to group it and then refer back to the group. A group is written "\(pattern\)". Groups can also be nested. However deeply they are nested, the regular expression engine stores the text matched by the outermost group in \1 and then, working from the outside in, the text matched by the following groups in \2, \3, \4, and so on. Note that what is stored is the text that was matched, not the pattern itself. To refer back to a group, simply write \1, \2, \3, and so on:

\(pattern\): The pattern inside a group may itself contain further groups, so pay attention to the numbering.
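A back-reference is easiest to understand from an example where the referenced text, not the pattern, must repeat. The following sketch uses basic regular expressions with invented sample sentences and variable names.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'He loves his lover' 'He likes his liker' 'He loves his liker')

# \(l..e\) captures a four-letter string such as "love" or "like";
# \1 then requires that very same text to appear again later in the line.
same_word=$(printf '%s\n' "$sample" | grep '\(l..e\).*\1')

# Nested groups: \1 is the outer group (ab) and \2 is the inner group (b),
# so the pattern below only matches the text "abbab".
nested=$(printf '%s\n' 'abbab' 'ababb' | grep '\(a\(b\)\)\2\1')

echo "$same_word"
echo "$nested"
```

The third sentence is rejected even though both love and like fit the pattern l..e, because \1 stands for the text captured in this particular match.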

Extended regular expressions (here I only explain what differs from basic regular expressions; as the saying goes, "don't reinvent the wheel"):

Character matching (same usage and meaning as in basic regular expressions):

.:

[]:

[^]:

Number of matches:

*: Same usage as in basic regular expressions

+: In extended regular expressions "+" does not need to be escaped; the meaning is the same as "\+" in basic regular expressions

?: Likewise, "?" does not need to be escaped; the meaning is the same as "\?" in basic regular expressions

{m}: Likewise, "{" and "}" do not need to be escaped; the meaning is the same as "\{m\}" in basic regular expressions

{m,n}

{,n}

{m,}
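To confirm that only the spelling changes between the two syntaxes, the same searches can be run both ways and compared. A sketch assuming GNU grep; the sample lines and the helper functions bre and ere are hypothetical.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc')

# Illustrative helpers: the same input searched with each syntax.
bre() { printf '%s\n' "$sample" | grep "$1"; }
ere() { printf '%s\n' "$sample" | grep -E "$1"; }

bre_plus=$(bre 'ab\+c');        ere_plus=$(ere 'ab+c')
bre_opt=$(bre 'ab\?c');         ere_opt=$(ere 'ab?c')
bre_range=$(bre 'ab\{2,3\}c');  ere_range=$(ere 'ab{2,3}c')

# Each pair should be identical; report any difference.
[ "$bre_plus" = "$ere_plus" ]   || echo "plus differs"
[ "$bre_opt" = "$ere_opt" ]     || echo "optional differs"
[ "$bre_range" = "$ere_range" ] || echo "range differs"

echo "$ere_range"
```

If the pairs agree, nothing is reported and the last line prints abbc and abbbc, the two lines with two or three b characters.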

Positional anchoring (same usage as in basic regular expressions):

^

$

\< or \b

\> or \b

Grouping and referencing:

(pattern): The parentheses need no escaping; back-references and their meaning are the same as in basic regular expressions

Or (specific to extended regular expressions):

|: Matches either the left side or the right side. Note that the left side means everything to the left of "|" and the right side means everything to its right

For example: "like|r" matches "like" or "r", not "like" or "likr"; to get the latter, limit the alternation with a group: "lik(e|r)"
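How far "|" reaches is easy to get wrong, so it helps to see what different placements of parentheses do. The following is a sketch with invented sample lines and variable names, using grep -E.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'like' 'likr' 'r' 'lik')

# No group: the alternatives are "^like" and "r$", i.e. everything on
# each side of "|", anchors included.
loose=$(printf '%s\n' "$sample" | grep -E '^like|r$')

# Group around the whole alternation: the line must be exactly like or r.
whole=$(printf '%s\n' "$sample" | grep -E '^(like|r)$')

# Group around the last letter only: the line must be like or likr.
last_letter=$(printf '%s\n' "$sample" | grep -E '^lik(e|r)$')

echo "loose:       $loose"
echo "whole:       $whole"
echo "last letter: $last_letter"
```

The first search accepts like, likr and r, the second only like and r, and the third only like and likr.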

That covers the basic metacharacters of regular expressions and their meanings. Remember to practice at the keyboard: learning without practice comes to nothing. Now for the two tools that support regular expressions, grep and egrep (wood is of little use without a saw).

What are grep and egrep?

grep and egrep are text search tools that print the lines matching a pattern. Without options, grep supports only basic regular expressions; the "-E" option enables extended regular expressions. egrep supports extended regular expressions by default and means the same as "grep -E".

grep command usage:

grep [options] pattern file

Common options:

-e: Specifies a pattern; it can be given several times to use multiple patterns at once, for example: grep -e 'pattern 1' -e 'pattern 2' ... file

-f scriptfile: Reads the patterns from the file scriptfile (one per line) and matches them against the input file

-q: Quiet mode; prints nothing whether or not the match succeeds (the result is reported only through the exit status)

--color=auto: Highlights the text that the pattern matched

-v: Outputs only the lines that do not match the pattern

-o: Outputs only the text matched by the pattern, not the whole line

-E: Supports extended regular expressions

-i: Ignores case when matching

-A #: In addition to each matching line, outputs the # lines after it

-B #: In addition to each matching line, outputs the # lines before it

-C #: In addition to each matching line, outputs the # lines before and after it
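Most of the options listed above can be tried on a five-line input. This is only a sketch: the sample lines, the variable names, and the temporary pattern file are invented for illustration.

```shell
# Hypothetical sample input resembling a small log.
sample=$(printf '%s\n' 'line 1' 'Error: disk' 'line 3' 'error: net' 'line 5')
run() { printf '%s\n' "$sample" | grep "$@"; }   # illustrative helper

ignore_case=$(run -i 'error')           # both error lines, any case
inverted=$(run -v -i 'error')           # the three lines without "error"
only_match=$(run -o '[[:digit:]]')      # just the digits: 1, 3, 5
two_patterns=$(run -e 'disk' -e 'net')  # lines matching either pattern
after=$(run -A 1 'disk')                # the match plus 1 line after it
around=$(run -C 1 'line 3')             # the match plus 1 line each side

# -q prints nothing; only the exit status says whether a match was found.
if run -q 'net'; then found=yes; else found=no; fi

# -f reads the patterns (one per line) from a file.
patterns_file=$(mktemp)
printf '%s\n' 'disk' 'net' > "$patterns_file"
from_file=$(run -f "$patterns_file")
rm -f "$patterns_file"

echo "$around"
echo "found: $found"
```

The -e and -f searches return the same two lines here, since the pattern file simply holds the same two patterns.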

egrep command usage:

-e, -f, -q, --color=auto, -v, -o, -i, -A #, -B #, -C #: Same meaning as in grep

-G: Supports basic regular expressions, which is equivalent to grep.
