The three musketeers of Linux text processing: grep, sed, awk
grep: Global search REgular expression and Print out the line.
Function: A text search tool that matches the target text against user-specified filter criteria and prints the lines that qualify
Pattern: The filter condition, written with regular expression metacharacters and literal text characters;
Usage: grep [OPTIONS] PATTERN [FILE...]
OPTIONS:
--color=auto: Highlights the matched string in color
-i: ignore-case, ignores the case of characters;
-o: Displays only the matched string itself;
-v: Inverts the match, displaying only the lines that do not match
-E: Supports extended regular expression metacharacters;
-q, --quiet, --silent: Silent mode, that is, does not output anything; only the exit status reports whether a match was found
-A #: after, also displays the # lines after each matching line
-B #: before, also displays the # lines before each matching line
-C #: context, also displays the # lines before and after each matching line
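A few of the options above, tried on a small sample. This is only a sketch: the passwd-style lines and the variable names are made up for illustration.

```shell
# Hypothetical sample data; the user names and shells are invented.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
Alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/zsh'

# -i: case-insensitive, so the pattern "alice" also matches "Alice".
ignore_case=$(printf '%s\n' "$sample" | grep -i '^alice')

# -o: print only the matched part of each line, not the whole line.
only_match=$(printf '%s\n' "$sample" | grep -o '/bin/[a-z]*$')

# -v: print the lines that do NOT match.
inverted=$(printf '%s\n' "$sample" | grep -v 'bash$')

# -q: print nothing; only the exit status says whether a match was found.
if printf '%s\n' "$sample" | grep -q '^root:'; then
    found_root=yes
else
    found_root=no
fi

# -A 1: print each matching line plus the one line after it.
with_after=$(printf '%s\n' "$sample" | grep -A 1 '^daemon')

printf '%s\n' "$ignore_case" "$only_match" "$inverted" "$found_root" "$with_after"
```

Because -q prints nothing, it is normally used inside if or with && and ||, as in the sketch.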
Basic regular expression metacharacters
Character matching:
.: matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;
[:digit:] [:alpha:] [:upper:] [:lower:] [:alnum:] [:punct:] [:space:]: POSIX character classes, written inside a bracket expression, for example [[:digit:]]
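The character-matching metacharacters can be tried like this. It is a minimal sketch on made-up three-character lines. Note that a POSIX class goes inside a bracket expression, so the digit class is written [[:digit:]].

```shell
# Hypothetical sample: "a", one varying character, then "c".
sample='a1c
abc
a-c
a c
aXc'

# .: any single character between a and c; -c counts the matching lines.
any_count=$(printf '%s\n' "$sample" | grep -c 'a.c')

# [0-9]: one character from the given range.
range=$(printf '%s\n' "$sample" | grep 'a[0-9]c')

# [^...]: one character NOT in the set; here, anything but a lowercase letter.
not_lower=$(printf '%s\n' "$sample" | grep 'a[^[:lower:]]c')

# POSIX classes inside a bracket expression.
punct=$(printf '%s\n' "$sample" | grep 'a[[:punct:]]c')
space=$(printf '%s\n' "$sample" | grep 'a[[:space:]]c')

printf '%s\n' "$any_count" "$range" "$not_lower" "$punct" "$space"
```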
Number of matches:
*: matches the preceding character any number of times: 0, 1, or more
.*: matches any string of any length
\?: matches the preceding character 0 or 1 times;
\+: matches the preceding character at least once;
\{m\}: matches the preceding character exactly m times
\{m,n\}: matches the preceding character at least m times and at most n times
\{0,n\}: at most n times
\{m,\}: at least m times
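The repetition operators of basic regular expressions, sketched on made-up sample lines. One assumption to flag: \? and \+ are GNU extensions to basic regular expressions, so this example assumes GNU grep.

```shell
# Hypothetical sample: "a", zero to three "b" characters, then "c".
sample='ac
abc
abbc
abbbc'

# *: zero or more b, so every line matches.
star=$(printf '%s\n' "$sample" | grep 'ab*c')

# \?: zero or one b (GNU extension in basic regular expressions).
optional=$(printf '%s\n' "$sample" | grep 'ab\?c')

# \+: one or more b (GNU extension in basic regular expressions).
one_plus=$(printf '%s\n' "$sample" | grep 'ab\+c')

# \{2\}: exactly two b.
exactly_two=$(printf '%s\n' "$sample" | grep 'ab\{2\}c')

# \{1,2\}: at least one and at most two b.
one_to_two=$(printf '%s\n' "$sample" | grep 'ab\{1,2\}c')

# \{2,\}: at least two b.
two_plus=$(printf '%s\n' "$sample" | grep 'ab\{2,\}c')

printf '%s\n' "$star" "$optional" "$one_plus" "$exactly_two" "$one_to_two" "$two_plus"
```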
Position anchoring:
^: anchors the beginning of the line; the pattern after the symbol must appear at the start of the line
$: anchors the end of the line; the pattern before the symbol must appear at the end of the line
\< or \b: anchors the beginning of a word; used on the left side of the word pattern
\> or \b: anchors the end of a word; used on the right side of the word pattern
\<PATTERN\>: matches PATTERN as a whole word
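The anchors are easiest to tell apart when the same word is matched in different positions. A minimal sketch with made-up lines; the word anchors \< and \> are assumed to be available, as they are in GNU grep.

```shell
# Hypothetical sample: "root" at different positions in the line and the word.
sample='root is here
the root
rooted tree
uproot'

# ^: "root" must appear at the start of the line.
line_start=$(printf '%s\n' "$sample" | grep '^root')

# $: "root" must appear at the end of the line.
line_end=$(printf '%s\n' "$sample" | grep 'root$')

# \<: "root" must begin a word ("uproot" is excluded).
word_start=$(printf '%s\n' "$sample" | grep '\<root')

# \<...\>: "root" must be a whole word on its own.
whole_word=$(printf '%s\n' "$sample" | grep '\<root\>')

printf '%s\n' "$line_start" "$line_end" "$word_start" "$whole_word"
```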
Grouping and back-references:
\(\): binds one or more characters together so they are treated as a single unit;
Note: The text matched by the pattern inside each group is automatically recorded by the regular expression engine in internal variables, which are:
\1: the text matched by the pattern between the first opening parenthesis, counting from the left, and its matching closing parenthesis;
\2: the text matched by the pattern between the second opening parenthesis, counting from the left, and its matching closing parenthesis;
\3: ...
...
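A back-reference matches the exact text its group captured, not the group's pattern again. The sketch below, on made-up sentences, keeps only the lines where the four characters captured by \(l..e\) appear a second time later in the line.

```shell
# Hypothetical sample sentences.
sample='He loves his lover
She likes her liker
He likes his lover
She loves her liker'

# \(l..e\) captures "love" or "like"; \1 requires that same text again.
repeated=$(printf '%s\n' "$sample" | grep '\(l..e\).*\1')

# A group can also be repeated as a unit: "ab" exactly twice in a row.
doubled=$(printf 'ab\nabab\n' | grep '\(ab\)\{2\}')

printf '%s\n' "$repeated" "$doubled"
```

The third and fourth lines are dropped because the captured "like" or "love" is followed by a different word, even though that word also fits the pattern l..e.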
egrep:
A text filter like grep that uses extended regular expressions; equivalent to grep -E
egrep [OPTIONS] PATTERN [FILE...]
Options:
-i, -o, -v, -q, -A, -B, -C
-G: Supports basic regular expressions
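The same filter can be written in both syntaxes. The sketch below uses grep -E rather than the egrep command itself, since recent GNU grep releases treat egrep as an obsolescent alias. The sample words are made up.

```shell
# Hypothetical sample words.
sample='color
colour
colr'

# Basic syntax (the default, same as -G): the interval braces are escaped.
bre=$(printf '%s\n' "$sample" | grep 'colou\{0,1\}r')

# Extended syntax (-E): the same interval without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E 'colou{0,1}r')

# The other options combine with -E in the usual way.
combined=$(printf 'My Colour\n' | grep -E -i -o 'colou?r')

printf '%s\n' "$bre" "$ere" "$combined"
```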
Extended regular expression metacharacters:
Character matching:
.: matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;
[:digit:] [:alpha:] [:upper:] [:lower:] [:alnum:] [:punct:] [:space:]: POSIX character classes, written inside a bracket expression, for example [[:digit:]]
Number of matches:
*: any number of times
?: 0 or 1 times
+: 1 or more times
{m}: exactly m times
{m,n}: at least m times, at most n times
{0,n}, {m,}: at most n times, at least m times
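In extended syntax the repetition operators need no backslashes. A minimal sketch that filters made-up numeric strings by length and optional sign:

```shell
# Hypothetical sample values.
sample='7
42
2024
123456
-5'

# +: one or more digits and nothing else on the line.
unsigned=$(printf '%s\n' "$sample" | grep -E '^[0-9]+$')

# ?: an optional leading minus sign.
signed=$(printf '%s\n' "$sample" | grep -E '^-?[0-9]+$')

# {2}: exactly two digits.
two=$(printf '%s\n' "$sample" | grep -E '^[0-9]{2}$')

# {2,4}: between two and four digits.
two_to_four=$(printf '%s\n' "$sample" | grep -E '^[0-9]{2,4}$')

# {5,}: five digits or more.
five_plus=$(printf '%s\n' "$sample" | grep -E '^[0-9]{5,}$')

printf '%s\n' "$unsigned" "$signed" "$two" "$two_to_four" "$five_plus"
```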
Position anchoring:
^: anchors the beginning of the line; the pattern after the symbol must appear at the start of the line
$: anchors the end of the line; the pattern before the symbol must appear at the end of the line
\< or \b: anchors the beginning of a word; used on the left side of the word pattern
\> or \b: anchors the end of a word; used on the right side of the word pattern
\<PATTERN\>: matches PATTERN as a whole word
Grouping and back-references:
(): binds one or more characters together so they are treated as a single unit;
Note: The text matched by the pattern inside each group is automatically recorded by the regular expression engine in internal variables
Or:
a|b: a or b
C|cat: C or cat
(c|C)at: cat or Cat
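The scope of | is the whole expression on each side unless parentheses limit it. The sketch below, on made-up words, contrasts C|cat with (c|C)at.

```shell
# Hypothetical sample words.
sample='cat
Cat
C
bat
concat'

# C|cat: either a "C" anywhere or the string "cat" anywhere.
either=$(printf '%s\n' "$sample" | grep -E 'C|cat')

# (c|C)at: "c" or "C", followed by "at".
grouped=$(printf '%s\n' "$sample" | grep -E '(c|C)at')

# Adding anchors keeps only lines that are exactly "cat" or "Cat".
exact=$(printf '%s\n' "$sample" | grep -E '^(c|C)at$')

printf '%s\n' "$either" "$grouped" "$exact"
```

The lone "C" line matches the first pattern but not the second, which shows that the parentheses change what the alternation applies to.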