The Linux text-processing "trio"
grep, egrep, fgrep: text search tools; they search the given text for lines that match a "pattern";
Regular expression (REGEX): a pattern written with special characters and ordinary text characters; some of these characters do not stand for their literal meaning but express control or wildcard functions;
Divided into two categories:
Basic Regular Expressions: BRE
Extended Regular expression: ERE
grep family:
grep: Supports the use of basic regular expressions;
egrep: supports extended regular expressions;
fgrep: does not support regular expressions; patterns are treated as fixed strings;
grep command:
Function: text search tool; it checks the target text line by line against the user-specified "pattern" (filter condition) and prints the lines that match;
Common options:
--color=auto: highlight the matched text in color;
-i, --ignore-case: ignore character case;
-o, --only-matching: display only the matched text itself, not the whole line;
-v, --invert-match: invert the match, printing the lines that do not match;
-E: support extended regular expressions;
-q, --quiet, --silent: silent mode, do not output anything; the exit status reports whether a match was found;
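The options above can be sketched as follows. This is a minimal illustration assuming GNU grep; the sample text and the variable names (sample, ci, only, inverted, found) are made up for the example.

```shell
# Hypothetical sample text, used only for illustration.
sample='Root login
root:x:0:0
daemon:x:1:1'

# -i: match "root" regardless of case.
ci=$(printf '%s\n' "$sample" | grep -i 'root')

# -o: print only the matched part, one match per output line.
only=$(printf '%s\n' "$sample" | grep -o 'x:[0-9]')

# -v: print the lines that do NOT contain the pattern.
inverted=$(printf '%s\n' "$sample" | grep -v 'x')

# -q: print nothing; only the exit status says whether a match was found.
if printf '%s\n' "$sample" | grep -q 'daemon'; then
    found=yes
else
    found=no
fi

printf '%s\n' "$ci" "$only" "$inverted" "$found"
```

Because -q produces no output, it is usually combined with if or && in scripts, as in the last step.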
Basic regular expression metacharacters:
Character matching:
.: matches any single character;
[]: matches any single character within the specified set or range;
[^]: matches any single character outside the specified set or range;
Character classes: [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:space:], [:blank:], [:punct:]; they are written inside a bracket expression, e.g. [[:digit:]];
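A short sketch of the character-matching metacharacters, assuming GNU grep. The sample lines and variable names are invented for illustration; -c is used where only the number of matching lines matters.

```shell
# Hypothetical sample lines: each has one character between "c" and "t".
sample='cat
cut
c9t
c t
cot!'

# . matches any single character, so all five lines match.
any=$(printf '%s\n' "$sample" | grep -c 'c.t')

# [au] matches exactly one of the listed characters.
listed=$(printf '%s\n' "$sample" | grep 'c[au]t')

# [^[:alpha:]] matches one character that is not a letter.
nonalpha=$(printf '%s\n' "$sample" | grep -c 'c[^[:alpha:]]t')

# A class name such as [:digit:] must sit inside a bracket expression.
digit=$(printf '%s\n' "$sample" | grep 'c[[:digit:]]t')

# [[:punct:]] matches one punctuation character.
punct=$(printf '%s\n' "$sample" | grep '[[:punct:]]')

printf '%s\n' "$any" "$listed" "$nonalpha" "$digit" "$punct"
```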
Number of matches:
Placed after a character to limit how many times that preceding character may occur; matching is greedy by default;
*: matches the preceding character any number of times (0, 1, or more);
.*: any characters of any length;
\+: matches the preceding character at least 1 time;
\?: matches the preceding character 0 or 1 times;
\{m\}: the preceding character occurs exactly m times, where m is a non-negative integer;
\{m,n\}: the preceding character occurs at least m times and at most n times, where m and n are non-negative integers;
\{0,n\}: at most n times;
\{m,\}: at least m times;
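The quantifiers can be tried out with a small sketch like the one below. It assumes GNU grep (\+ and \? are GNU extensions to basic regular expressions); the sample words and variable names are hypothetical.

```shell
# Hypothetical sample: "g", then 0 to 4 letters "o", then "gle".
sample='ggle
gogle
google
gooogle
goooogle'

count() {
    # Print how many sample lines match the basic regular expression in $1.
    printf '%s\n' "$sample" | grep -c "$1"
}

star=$(count 'go*gle')          # 0 or more "o"
plus=$(count 'go\+gle')         # at least 1 "o"
opt=$(count 'go\?gle')          # 0 or 1 "o"
exact=$(count 'go\{2\}gle')     # exactly 2 "o"
range=$(count 'go\{2,3\}gle')   # 2 to 3 "o"
atmost=$(count 'go\{0,1\}gle')  # at most 1 "o"
atleast=$(count 'go\{3,\}gle')  # at least 3 "o"

# Greedy matching: .* extends to the last "X" on the line, not the first.
greedy=$(printf 'aXbXc\n' | grep -o 'a.*X')

printf '%s\n' "$star" "$plus" "$opt" "$exact" "$range" "$atmost" "$atleast" "$greedy"
```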
Position anchoring:
Restricts where in the target text the pattern is allowed to match;
^: beginning-of-line anchor; placed at the leftmost side of the pattern, ^pattern
$: end-of-line anchor; placed at the rightmost side of the pattern, pattern$
^pattern$: the pattern must match an entire line;
^$: an empty line;
Word: a continuous string of non-special characters (letters, digits, and underscores) is called a word;
\< or \b: beginning-of-word anchor, placed on the left side of the word pattern, written \<pattern or \bpattern
\> or \b: end-of-word anchor, placed on the right side of the word pattern, written pattern\> or pattern\b
\<pattern\>: matches a whole word;
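The effect of each anchor can be seen by counting matching lines in a small sample. This is a sketch assuming GNU grep (\<, \> and \b are GNU extensions); the sample text and names are invented for illustration.

```shell
# Hypothetical sample; the fourth line is intentionally empty.
sample='root
root user
chroot

the root of it
rooted'

count() {
    # Print how many sample lines match the basic regular expression in $1.
    printf '%s\n' "$sample" | grep -c "$1"
}

line_start=$(count '^root')     # lines that begin with "root"
line_end=$(count 'root$')       # lines that end with "root"
whole_line=$(count '^root$')    # lines that are exactly "root"
empty=$(count '^$')             # empty lines
word_start=$(count '\<root')    # "root" at the start of a word
word_end=$(count 'root\>')      # "root" at the end of a word
whole_word=$(count '\<root\>')  # "root" as a complete word
with_b=$(count '\broot\b')      # same as above, written with \b

printf '%s\n' "$line_start" "$line_end" "$whole_line" "$empty" \
    "$word_start" "$word_end" "$whole_word" "$with_b"
```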
Grouping and referencing:
\(pattern\): the characters matched by this pattern are treated as one indivisible unit;
Note: the text matched by the pattern inside each pair of grouping parentheses is automatically recorded by the regular expression engine in internal variables, which are \1, \2, \3, ...
Back reference: refers to the string matched by the pattern in an earlier pair of parentheses, not to the pattern itself;
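A minimal sketch of grouping and back references, assuming GNU grep. The sample sentences and variable names are made up; the point is that \1 repeats the text the group actually matched, not the group's pattern.

```shell
# Hypothetical sample sentences.
sample='He loves his lover
He likes his lover
She likes her liker
She loves her liker'

# \(l..e\) records a four-character string such as "love" or "like";
# \1 then requires that same string to appear again later on the line.
same=$(printf '%s\n' "$sample" | grep '\(l..e\).*\1')

# A group is treated as one unit, so a quantifier applies to the whole group:
# \(ab\)\{2\} needs "ab" twice in a row.
grouped=$(printf 'ab\nabab\nabb\n' | grep '\(ab\)\{2\}')

printf '%s\n' "$same" "$grouped"
```

Only the first and third sentences are printed for the first pattern: in the other two, "love" and "like" both fit l..e, but neither string occurs twice.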
Common options, part two:
-E, --extended-regexp: use extended regular expressions
-F, --fixed-strings: treat the pattern as a fixed string with no regular expression support; equivalent to fgrep;
-G, --basic-regexp: use basic regular expressions;
-P, --perl-regexp: use PCRE (Perl-compatible) regular expressions;
-e PATTERN, --regexp=PATTERN: specify a pattern; may be given several times to match any of several patterns;
-f FILE, --file=FILE: FILE is a text file containing one pattern per line, i.e. a grep script;
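The difference between these options can be sketched as below, assuming GNU grep and a system that provides mktemp. The sample text, the temporary pattern file, and the variable names are hypothetical.

```shell
# Hypothetical sample text.
sample='a.c
abc
xyz
a+c'

# -F: the pattern is a fixed string, so "." is a literal dot.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

# -G (the default): "." is a metacharacter that matches any character.
basic=$(printf '%s\n' "$sample" | grep -G 'a.c')

# -e given twice: a line is printed if it matches either pattern.
multi=$(printf '%s\n' "$sample" | grep -e 'abc' -e 'xyz')

# -f: read the patterns from a file, one pattern per line.
patterns=$(mktemp)
printf '%s\n' 'abc' 'xyz' > "$patterns"
fromfile=$(printf '%s\n' "$sample" | grep -f "$patterns")
rm -f "$patterns"

printf '%s\n' "$fixed" "$basic" "$multi" "$fromfile"
```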
egrep
A grep command that supports extended regular expressions; equivalent to grep -E;
Extended regular expression metacharacters:
Character Matching:
.: Any single character
[]: Any single character within the range
[^]: Any single character outside the range
Number of matches:
*: any number of times;
?: 0 or 1 times;
+: 1 or more times;
{m}: exactly m times;
{m,n}: at least m times, at most n times;
{0,n}: at most n times
{m,}: at least m times
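The extended quantifiers do the same job as the basic ones, only without the backslashes. A minimal comparison, assuming GNU grep and using grep -E in place of egrep; the sample words and variable names are invented.

```shell
# Hypothetical sample words.
sample='ggle
gogle
google
gooogle'

# Basic syntax: the braces must be escaped.
bre=$(printf '%s\n' "$sample" | grep 'go\{1,2\}gle')

# Extended syntax: the same quantifier without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E 'go{1,2}gle')

# An unescaped "+" is an ordinary character in a basic regular expression
# but a quantifier in an extended one.
plus_bre=$(printf 'a+\naa\n' | grep 'a+')
plus_ere=$(printf 'a+\naa\n' | grep -E 'a+')

printf '%s\n' "$bre" "$ere" "$plus_bre" "$plus_ere"
```

The last two commands show why a pattern cannot always be moved between grep and grep -E unchanged: the same text means different things in the two syntaxes.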
Location anchoring:
^: Beginning of the line
$: End of line
\<, \b: beginning of a word
\>, \b: end of a word
Grouping and referencing:
(pattern): grouping; the text matched by the pattern in parentheses is recorded in a variable inside the regular expression engine;
Or:
a|b: a or b
C|cat: C or cat
(C|c)at: Cat or cat
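How far the | operator reaches is easiest to see side by side. The sketch below assumes GNU grep; the sample words and variable names are hypothetical, and -x (--line-regexp) is added only so that each pattern has to match the whole line.

```shell
# Hypothetical sample words.
sample='C
cat
Cat
concat
at'

# Without parentheses, | splits the whole pattern: "C" or "cat".
either=$(printf '%s\n' "$sample" | grep -E -x 'C|cat')

# With parentheses, | only chooses the first letter: "Cat" or "cat".
grouped=$(printf '%s\n' "$sample" | grep -E -x '(C|c)at')

printf '%s\n' "$either" "$grouped"
```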
This article is from the "11284919" blog, please be sure to keep this source http://11294919.blog.51cto.com/11284919/1748909