grep and regular expressions for Shell programming

Last Update:2016-07-11 Source: Internet

Author: User


Text processing tools:

The three musketeers of text processing on Linux:
grep: text filtering tool (filters lines by pattern);
grep: basic regular expressions by default; options -E, -F
egrep: extended regular expressions; options -G, -F
fgrep: no regular expression support (fixed strings only); options -E, -G
sed: stream editor, a text editing tool;
awk: implemented on Linux as gawk; a text report generator (formatted output);
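
The difference between the three grep variants can be seen by feeding the same pattern to each. The following is a minimal sketch; the sample input and variable names are made up for illustration, and the -E and -F options are used in place of the egrep and fgrep commands.

```shell
# Hypothetical sample input: one line holds the literal text "a+b", the other "aab".
sample=$(printf 'a+b\naab\n')

# BRE (plain grep): "+" is an ordinary character, so only the literal "a+b" line matches.
bre=$(printf '%s\n' "$sample" | grep 'a+b')

# ERE (grep -E): "+" means "one or more of the preceding character", so "aab" matches.
ere=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Fixed string (grep -F): the pattern is taken literally, with no metacharacters at all.
fixed=$(printf '%s\n' "$sample" | grep -F 'a+b')

printf 'BRE:   %s\nERE:   %s\nFixed: %s\n' "$bre" "$ere" "$fixed"
```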

Regular expression: regular expression, regexp
A pattern written with special characters and literal text characters, in which some characters do not stand for their literal meaning but instead express control or wildcard functions;
Divided into two categories:
Basic regular expressions: BRE
Extended regular expressions: ERE

Regular expression engine:
The software module that checks text against regular expressions; different engines use different matching algorithms;
PCRE (Perl Compatible Regular Expressions) is one such engine

Metacharacters: \(hello[[:space:]]\+\)\+
Metacharacter categories: character matching, number of matches, position anchoring, grouping

grep: Global search REgular expression and Print out the line.

Function: text search tool; checks the target text line by line against a user-specified pattern (filter) and prints the matching lines;
Pattern: the filter condition, written with regular expression metacharacters and literal text characters;

Usage: grep "UUID" /etc/fstab
grep [OPTIONS] PATTERN [FILE ...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE ...]

grep root /etc/passwd
grep "$USER" /etc/passwd
grep "$(whoami)" /etc/passwd

Options:
--color=auto: highlights the matched text in color;
-i, --ignore-case: ignores character case;
-o, --only-matching: displays only the matching part of each line;
-n, --line-number: displays the line number of each matching line;
-v, --invert-match: inverts the match, showing the lines that do not match;
-E, --extended-regexp: uses extended regular expressions;
-q, --quiet, --silent: silent mode, that is, does not output anything; only the exit status reports whether a match was found;
-w, --word-regexp: matches the pattern only as a whole word;
-c, --count: prints a count of matching lines instead of the lines themselves;

-A #: after, also prints the # lines after each match
-B #: before, also prints the # lines before each match
-C #: context, also prints the # lines before and after each match
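
The options above can be tried on a small piece of text. This is only a sketch: the sample lines and variable names are invented for illustration, and -o and -A are assumed to be available (they are in GNU and BSD grep, but are not required by POSIX).

```shell
# Hypothetical sample text; any small file would do.
text=$(printf 'Root line\nroot again\nnobody here\nuproot\n')

# -i ignores case, -c counts matching lines instead of printing them.
count_ci=$(printf '%s\n' "$text" | grep -ic 'root')

# -n prefixes each matching line with its line number.
numbered=$(printf '%s\n' "$text" | grep -n 'nobody')

# -v prints the lines that do NOT match.
inverted=$(printf '%s\n' "$text" | grep -iv 'root')

# -w matches "root" only as a whole word, so "uproot" is skipped.
whole_word=$(printf '%s\n' "$text" | grep -w 'root')

# -o prints only the matched part of the line.
only_match=$(printf '%s\n' "$text" | grep -o 'no[a-z]*')

# -A 1 also prints one line of context after each match.
after=$(printf '%s\n' "$text" | grep -A 1 'again')

printf '%s\n' "$count_ci" "$numbered" "$inverted" "$whole_word" "$only_match" "$after"
```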

Basic regular expression metacharacters:

Character Matching:
.: matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;

Number of matches: placed after a character to limit how many times that preceding character may occur;
*: matches the preceding character any number of times: 0, 1, or many;
For example: grep "x*y" matches every one of these lines, because x may occur zero times:
abxy
aby
xxxxy
yab
.*: matches any string of any length;
\?: matches the preceding character 0 or 1 times;
\+: matches the preceding character 1 or more times, that is, the preceding character must appear at least once;
\{m\}: matches the preceding character exactly m times;
\{m,n\}: matches the preceding character at least m times and at most n times;
\{0,n\}: at most n times
\{m,\}: at least m times
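
The repetition operators above can be compared on the sample lines from the x*y example. This is a sketch with an extra line, ab, added to show a non-match; the variable names are illustrative. Note that \+ in a basic regular expression is a GNU extension, so this assumes GNU grep or a compatible implementation.

```shell
# Sample lines from the x*y example, plus "ab", which contains no "y".
lines=$(printf 'abxy\naby\nxxxxy\nyab\nab\n')

# x*y: zero or more "x" followed by "y"; every line that contains a "y" matches.
star=$(printf '%s\n' "$lines" | grep 'x*y')

# x\+y: at least one "x" directly before "y".
plus=$(printf '%s\n' "$lines" | grep 'x\+y')

# x\{2,\}y: at least two "x" directly before "y".
at_least_two=$(printf '%s\n' "$lines" | grep 'x\{2,\}y')

# x\{1,3\}y with -o: shows which part of each line was actually matched.
one_to_three=$(printf '%s\n' "$lines" | grep -o 'x\{1,3\}y')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$star" "$plus" "$at_least_two" "$one_to_three"
```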

Position anchoring:
^: beginning-of-line anchor; placed at the leftmost side of the pattern;
grep '^root' /etc/passwd matches lines that begin with root

$: end-of-line anchor; placed at the rightmost side of the pattern;
grep 'r.*h$' /etc/passwd matches lines that contain an r and end with h

^$: an empty line
^[[:space:]]*$: an empty line or a line containing only whitespace characters;

Word: a continuous string of letters, digits, and underscores is treated as a word;

\< or \b: word-start anchor, placed on the left side of the word pattern; it defines the left edge of the word;
\> or \b: word-end anchor, placed on the right side of the word pattern;
hello\> matches words ending with hello
\<pattern\>: matches a complete word;
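
A short sketch of the line and word anchors in action follows. The sample text and variable names are invented for illustration; \< and \> are assumed to be supported, as they are in GNU grep.

```shell
# Hypothetical sample: line 3 is empty and line 4 holds only spaces.
text=$(printf 'root:x:0:0\nchroot user\n\n   \nsay hello\nhelloworld\n')

# ^root: only lines that begin with "root".
starts_root=$(printf '%s\n' "$text" | grep '^root')

# ^$: counts truly empty lines.
empty=$(printf '%s\n' "$text" | grep -c '^$')

# ^[[:space:]]*$: counts empty lines and whitespace-only lines.
blank=$(printf '%s\n' "$text" | grep -c '^[[:space:]]*$')

# hello\>: "hello" must end a word, so "helloworld" is skipped.
word_end=$(printf '%s\n' "$text" | grep 'hello\>')

# \<root: "root" must start a word, so "chroot" is skipped.
word_start=$(printf '%s\n' "$text" | grep '\<root')

printf '%s\n' "$starts_root" "$empty" "$blank" "$word_end" "$word_start"
```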

Grouping and referencing
\(\): binds one or more characters together so they are treated as a unit; groups cannot overlap, but they can be nested;
\(xy\)*ab

Note: the text matched by the pattern inside each group is automatically recorded by the regular expression engine in internal variables, which are:
\1: the text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis;
\2: the text matched by the pattern between the second opening parenthesis, counting from the left, and its matching closing parenthesis;
\3
...

vim lovers.txt

He loves his lover.
He likes his lover.
She likes her liker.
She loves her liker.

grep "\(l..e\).*\1" lovers.txt

grep "^\(r..t\).*\1" /etc/passwd

Back reference: \1 refers back to the text that the first group actually matched, not to the group's pattern;
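
The effect of the back reference can be checked against the four sample sentences. In this sketch the sentences are supplied through a variable rather than a lovers.txt file, and the variable names are illustrative.

```shell
# The four sample sentences, held in a variable instead of a file.
lovers=$(printf 'He loves his lover.\nHe likes his lover.\nShe likes her liker.\nShe loves her liker.\n')

# \1 must repeat the exact text the group matched ("love" ... "love" or "like" ... "like").
with_backref=$(printf '%s\n' "$lovers" | grep '\(l..e\).*\1')

# Repeating the pattern instead lets the two halves differ, so every line matches.
without_backref=$(printf '%s\n' "$lovers" | grep -c 'l..e.*l..e')

printf '%s\n' "$with_backref"
printf 'lines matching without the back reference: %s\n' "$without_backref"
```

Only the first and third sentences are printed by the first command, because they are the only ones in which the same four-letter string occurs twice.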

\d: matches a digit; equivalent to [0-9] (a Perl-style escape; GNU grep accepts it only with -P);
\w: matches letters, digits, and underscores;
\W: matches any character other than letters, digits, and underscores;
\n: line feed;
\r: carriage return;
\t: tab;
\f: form feed (page break);
\s: whitespace character;
\S: non-whitespace character;
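
Because support for these shorthand escapes varies between grep implementations, the POSIX bracket classes are often a safer choice. The sketch below uses [[:digit:]] and [[:space:]] as portable stand-ins for \d and \s, and uses \w on the assumption that the grep in use supports it, as GNU grep does. The sample text and variable names are invented for illustration.

```shell
# Hypothetical sample: one line with a digit, one without, one containing a tab.
text=$(printf 'id 42\nno_digits\ntab\there\n')

# [[:digit:]] is the portable equivalent of \d.
digit_lines=$(printf '%s\n' "$text" | grep '[[:digit:]]')

# [[:space:]] is the portable equivalent of \s; it matches both the space and the tab.
space_count=$(printf '%s\n' "$text" | grep -c '[[:space:]]')

# \w\+ (assumed supported, as in GNU grep) extracts runs of letters, digits, and underscores.
words=$(printf 'id 42\n' | grep -o '\w\+')

printf '%s\n' "$digit_lines" "$space_count" "$words"
```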

egrep

Implements text filtering like grep, but with extended regular expressions; equivalent to grep -E

egrep [OPTIONS] PATTERN [FILE ...]
Options: same as grep
Special option:
-G: use basic regular expressions

Extended regular expression metacharacters:
Character matching:
.: any single character
[]: any single character within the specified range
[^]: any single character outside the specified range
Number of matches:
*: any number of times: 0, 1, or many;
?: 0 or 1 times, that is, the preceding character is optional;
+: the preceding character at least 1 time;
{m}: the preceding character exactly m times;
{m,n}: at least m times, at most n times;
{0,n}
{m,}
Position anchoring
^: beginning-of-line anchor;
$: end-of-line anchor;
\<, \b: word-start anchor;
\>, \b: word-end anchor;
grep "\<abc" f1 filters lines containing a word that starts with abc
grep "abc\>" f1 filters lines containing a word that ends with abc
grep 'c.\{2\}t' f1 matches c followed by any two characters and then t

Grouping and referencing
(): grouping; the text matched by the pattern in parentheses is recorded in the internal variables of the regular expression engine;
Back reference: \1, \2, ...
Or:
a|b: a or b;
C|cat: C or cat;
(c|C)at: cat or Cat
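
The extended syntax can be tried with grep -E, which needs no backslashes before ?, +, {}, (), or |. The word list and variable names below are made up for illustration.

```shell
# Hypothetical word list, one word per line.
words=$(printf 'cat\nCat\nC\nat\ncolor\ncolour\ncaat\n')

# C|cat: alternation covers everything on each side, so this means "C" or "cat".
either=$(printf '%s\n' "$words" | grep -E 'C|cat')

# (c|C)at: grouping limits the alternation to the first letter.
grouped=$(printf '%s\n' "$words" | grep -E '^(c|C)at$')

# colou?r: the "u" is optional.
optional=$(printf '%s\n' "$words" | grep -E 'colou?r')

# ^ca+t$: one or more "a" between "c" and "t".
one_or_more=$(printf '%s\n' "$words" | grep -E '^ca+t$')

# ^ca{2}t$: exactly two "a" between "c" and "t".
exactly_two=$(printf '%s\n' "$words" | grep -E '^ca{2}t$')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$either" "$grouped" "$optional" "$one_or_more" "$exactly_two"
```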

fgrep: regular expression metacharacters are not supported; the pattern is treated as a fixed string;
Use fgrep for better performance when the pattern does not need any metacharacters;
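
Fixed-string matching also avoids accidental matches when the search text contains characters such as a dot. The sketch below uses grep -F, the option form of fgrep; the sample data and variable names are invented for illustration.

```shell
# Hypothetical sample data: two lines contain the literal text "1.5", one contains "125".
text=$(printf '1.5\n125\nprice: $1.5\n')

# As a regular expression, "." matches any character, so "125" is counted too.
regex_hits=$(printf '%s\n' "$text" | grep -c '1.5')

# As a fixed string, only the literal "1.5" is counted.
fixed_hits=$(printf '%s\n' "$text" | grep -F -c '1.5')

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
```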


