Text Processing tools:
The three musketeers of text processing on Linux:
grep: text filtering tool (filters lines by pattern);
grep: basic regular expressions; -E, -F
egrep: extended regular expressions; -G, -F
fgrep: regular expressions are not supported (fixed strings); -E, -G
sed: Stream EDitor, a stream-oriented text editing tool;
awk: text report generator (formats text); the implementation on Linux is gawk;
Regular expressions: REGular EXPression, regexp
A pattern written with special characters and literal text characters, in which some characters do not stand for their literal meaning but instead serve a control or wildcard function;
Divided into two categories:
Basic Regular Expressions: BRE
Extended Regular Expressions: ERE
Regular expression engine:
A software module that processes regular expressions; different engines use different matching algorithms;
PCRE (Perl Compatible Regular Expressions)
Metacharacters, for example: \(hello[[:space:]]\+\)\+
Metacharacter categories: character matching, repetition counts, position anchoring, grouping
grep: Global search REgular expression and Print out the line.
Function: a text search tool that checks the target text line by line against a user-specified pattern (filter) and prints the matching lines;
Pattern: a filter condition written with regular expression metacharacters and literal text characters;
Usage: grep "UUID" /etc/fstab
grep [OPTIONS] PATTERN [FILE ...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE ...]
grep root /etc/passwd
grep "$USER" /etc/passwd
grep "$(whoami)" /etc/passwd
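A minimal sketch of the usage forms above, assuming GNU grep and a POSIX shell. A hypothetical temporary file stands in for /etc/passwd, and USER is set explicitly so the result is reproducible. It shows that double quotes let the shell expand the variable before grep sees it, single quotes do not, and -e / -f supply patterns on the command line or from a file.

```shell
# Hypothetical sample data standing in for /etc/passwd.
tmp=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/sh\n' > "$tmp"
USER=alice   # assumption: fixed value instead of the real login name

# Double quotes: the shell expands $USER, so grep searches for "alice".
dq=$(grep "$USER" "$tmp")

# Single quotes: no expansion; grep receives the literal text $USER.
# (A "$" that is not the last character of a BRE is an ordinary character.)
sq=$(grep '$USER' "$tmp")

# -e: several patterns; a line matching any of them is printed.
multi=$(grep -e '^root' -e '/bin/sh$' "$tmp")

# -f: read the patterns from a file, one per line.
pats=$(mktemp)
printf 'alice\n' > "$pats"
fromfile=$(grep -f "$pats" "$tmp")

rm -f "$tmp" "$pats"
```

Here sq is empty, because no line contains the literal text $USER.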
Options:
--color=auto: highlight the matched text in color;
-i, --ignore-case: ignore character case;
-o, --only-matching: show only the matched part of each line;
-n, --line-number: show line numbers;
-v, --invert-match: invert the match, showing lines that do not match;
-E, --extended-regexp: use extended regular expressions;
-q, --quiet, --silent: quiet mode, print nothing; the exit status tells whether a match was found;
-w, --word-regexp: match only whole words;
-c, --count: print only the number of matching lines;
-A #: after, also show # lines after each match;
-B #: before, also show # lines before each match;
-C #: context, show # lines before and after each match;
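A short sketch exercising several of the options above, assuming GNU grep; the sample file contents are hypothetical.

```shell
# Hypothetical sample file used to exercise the options.
tmp=$(mktemp)
printf 'Apple pie\nbanana split\napple tart\ncherry\n' > "$tmp"

ci=$(grep -i 'apple' "$tmp")         # -i: matches both "Apple" and "apple"
num=$(grep -n 'apple' "$tmp")        # -n: prefix the match with its line number
inv=$(grep -v 'apple' "$tmp")        # -v: lines that do NOT contain "apple"
cnt=$(grep -ci 'apple' "$tmp")       # -c (with -i): number of matching lines
only=$(grep -o 'an' "$tmp")          # -o: each matched part on its own line
after=$(grep -A 1 'banana' "$tmp")   # -A 1: the match plus the line after it
# -q: prints nothing; only the exit status reports whether a match exists.
if grep -q 'cherry' "$tmp"; then found=yes; else found=no; fi

rm -f "$tmp"
```

Note that without -i, "Apple pie" does not match 'apple', which is why it shows up in the -v output.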
Basic regular expression metacharacters:
Character matching:
.: matches any single character;
[]: matches any single character within the specified range;
[^]: matches any single character outside the specified range;
Repetition counts: placed after a character to specify how many times the preceding character may occur;
*: matches the preceding character any number of times (0, 1, or more);
For example, grep "x*y" matches every one of the following lines, because x* may also match zero x's:
abxy
aby
xxxxy
yab
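A sketch of the x*y example, assuming GNU grep. The extra line "abc" is a hypothetical addition with no y, so it is the only line that fails to match; grep -o shows exactly how much text x*y consumed on each line.

```shell
# The four sample lines from the notes, plus a hypothetical "abc" with no y.
lines='abxy
aby
xxxxy
yab
abc'
# Every line containing a y matches, because x* is allowed to match zero x's.
matched=$(printf '%s\n' "$lines" | grep 'x*y')
# -o prints only the matched part: "xy", "y", "xxxxy", "y".
parts=$(printf '%s\n' "$lines" | grep -o 'x*y')
```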
.*: matches any string of any length, including the empty string;
\?: matches the preceding character 0 or 1 times;
\+: matches the preceding character 1 or more times, that is, it must appear at least once;
\{m\}: matches the preceding character exactly m times;
\{m,n\}: matches the preceding character at least m times and at most n times;
\{0,n\}: at most n times;
\{m,\}: at least m times;
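A sketch comparing the BRE repetition operators on the same hypothetical lines, assuming GNU grep (\? and \+ are GNU extensions; POSIX BRE itself only defines * and the \{ \} intervals).

```shell
tmp=$(mktemp)
printf 'ac\nabc\nabbc\nabbbc\n' > "$tmp"
star=$(grep -c 'ab*c' "$tmp")          # b zero or more times
opt=$(grep 'ab\?c' "$tmp")             # b zero or one time
plus=$(grep -c 'ab\+c' "$tmp")         # b at least once
exact=$(grep 'ab\{2\}c' "$tmp")        # exactly two b's
range=$(grep -c 'ab\{1,2\}c' "$tmp")   # one or two b's
atleast=$(grep 'ab\{2,\}c' "$tmp")     # two or more b's
rm -f "$tmp"
```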
Position anchoring:
^: line-start anchor; used at the leftmost side of the pattern;
grep '^root' /etc/passwd matches lines that start with root
$: line-end anchor; used at the rightmost side of the pattern;
grep '^r.*h$' /etc/passwd matches lines that start with r and end with h
^$: empty line;
^[[:space:]]*$: an empty line or a line containing only whitespace characters;
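A sketch of the line anchors, assuming GNU grep. The sample stands in for /etc/passwd (hypothetical contents) and adds an empty line and a spaces-only line to show the difference between ^$ and ^[[:space:]]*$.

```shell
tmp=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\n\n   \nnobody:x:65534:65534::/nonexistent:/usr/sbin/nologin\n' > "$tmp"
start=$(grep -c '^root' "$tmp")            # lines beginning with root
both=$(grep -c '^r.*h$' "$tmp")            # lines beginning with r and ending with h
empty=$(grep -c '^$' "$tmp")               # truly empty lines only
blank=$(grep -c '^[[:space:]]*$' "$tmp")   # empty or whitespace-only lines
rm -f "$tmp"
```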
Word: a continuous string of non-special characters (letters, digits and underscores) is called a word;
\< or \b: word-start anchor; placed on the left side of the word pattern to mark the left edge of a word;
\> or \b: word-end anchor; placed on the right side of the word pattern to mark the right edge of a word;
hello\> matches words ending with hello
\<pattern\>: matches the pattern only as a complete word;
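A sketch of the word anchors on hypothetical lines, assuming GNU grep. Because the underscore counts as a word character, "hello_there" does not end a word after "hello"; grep -w gives the same result as \<hello\> here.

```shell
tmp=$(mktemp)
printf 'hello world\nsayhello\nhelloworld\nsay hello_there\n' > "$tmp"
ends=$(grep 'hello\>' "$tmp")       # hello at the end of a word
starts=$(grep '\<hello' "$tmp")     # hello at the start of a word
whole=$(grep '\<hello\>' "$tmp")    # hello as a complete word
w=$(grep -w 'hello' "$tmp")         # -w: whole-word match
rm -f "$tmp"
```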
Grouping and referencing
\(\): binds one or more characters together so they are treated as a unit; groups cannot overlap, but they can be nested;
\(xy\)*ab
Note: the text matched by the pattern inside each pair of grouping parentheses is automatically recorded by the regular expression engine in internal variables:
\1: the text matched by the pattern between the first opening parenthesis from the left and its matching closing parenthesis;
\2: the text matched by the pattern between the second opening parenthesis from the left and its matching closing parenthesis;
\3
...
vim lovers.txt
He loves his lover.
He likes his lover.
She likes her liker.
She loves her liker.
grep "\(l..e\).*\1" lovers.txt
grep "^\(r..t\).*\1" /etc/passwd
Back-reference: \1 refers back to the exact text (not the pattern) matched by the first parenthesized group;
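A sketch of both back-reference examples, assuming GNU grep. \(l..e\) may capture "love" or "like", but \1 then demands that same text again later in the line, so only the first and third lines of lovers.txt match. The passwd-style lines are hypothetical: the first repeats "root", the second ("rant") never repeats.

```shell
tmp=$(mktemp)
printf 'He loves his lover.\nHe likes his lover.\nShe likes her liker.\nShe loves her liker.\n' > "$tmp"
same=$(grep '\(l..e\).*\1' "$tmp")
rm -f "$tmp"

# Hypothetical passwd-style lines.
pw=$(printf 'root:x:0:0:root:/root:/bin/bash\nrant:x:1001:1001::/srv:/bin/sh\n' | grep '^\(r..t\).*\1')
```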
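Perl-style (PCRE) shorthand escapes, available in GNU grep with -P; GNU grep also accepts \w, \W, \s and \S in basic and extended mode:
\d: matches a digit; equivalent to [0-9];
\w: matches letters, digits and underscores;
\W: matches any character other than letters, digits and underscores;
\n: newline;
\r: carriage return;
\t: tab;
\f: form feed (page break);
\s: whitespace character;
\S: non-whitespace character;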
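A sketch of the shorthand classes that GNU grep accepts without -P, on hypothetical input. grep -P is left out because PCRE support depends on how grep was built; the bracket expression [0-9] is the portable way to get \d in basic and extended mode.

```shell
# \w\+ : runs of letters, digits and underscores (GNU extension in BRE).
words=$(printf 'foo_bar 42, baz!\n' | grep -o '\w\+')
# \S\+ : runs of non-whitespace characters (separated by a space and a tab).
fields=$(printf 'a b\tc\n' | grep -o '\S\+')
# [0-9]\+ : portable equivalent of \d+ for basic regular expressions.
digits=$(printf 'room 101, floor 3\n' | grep -o '[0-9]\+')
```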
egrep
A text filter like grep that uses extended regular expressions; equivalent to grep -E;
egrep [OPTIONS] PATTERN [FILE ...]
Options: same as grep
Special options:
-G: use basic regular expressions;
Extended regular expression metacharacters:
Character matching:
.: Any single character
[]: Any single character within the specified range
[^]: Any single character outside the specified range
Repetition counts:
*: any number of times (0, 1, or more);
?: 0 or 1 times, i.e. the preceding character is optional;
+: the preceding character at least once;
{m}: the preceding character exactly m times;
{m,n}: at least m times, at most n times;
{0,n}: at most n times;
{m,}: at least m times;
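A sketch of the ERE repetition operators, assuming GNU grep. It uses grep -E, which egrep is equivalent to (recent GNU grep releases print an obsolescence warning when egrep is invoked). The last line shows that in a basic regular expression the same braces are ordinary characters.

```shell
tmp=$(mktemp)
printf 'ac\nabc\nabbc\nabbbc\n' > "$tmp"
opt=$(grep -E 'ab?c' "$tmp")          # b optional
plus=$(grep -cE 'ab+c' "$tmp")        # b at least once
exact=$(grep -E 'ab{2}c' "$tmp")      # exactly two b's
upto=$(grep -cE 'ab{0,1}c' "$tmp")    # at most one b
atleast=$(grep -E 'ab{2,}c' "$tmp")   # two or more b's
# BRE: "{" and "}" are literal, so this looks for the text "ab{2}c".
bre=$(grep -c 'ab{2}c' "$tmp")
rm -f "$tmp"
```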
Position anchoring:
^: line-start anchor;
$: line-end anchor;
\<, \b: word-start anchor;
\>, \b: word-end anchor;
grep "\<abc" F1: lines containing a word that starts with abc
grep "abc\>" F1: lines containing a word that ends with abc
grep 'c.\{2\}t' F1: lines containing c, then any two characters, then t (ERE form: egrep 'c.{2}t' F1)
Grouping and referencing
(): grouping; the text matched by the pattern inside the parentheses is recorded in internal variables of the regular expression engine;
Back-references: \1, \2, ...
Or:
a|b: a or b;
C|cat: C or cat (alternation has the lowest precedence, so it splits the whole pattern);
(c|C)at: cat or Cat;
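A sketch of alternation precedence with grep -E on hypothetical lines, assuming GNU grep. Without parentheses, 'C|cat' means "C" or "cat", so "C++" matches; with parentheses the choice is limited to the first letter.

```shell
tmp=$(mktemp)
printf 'cat\nCat\nC++\ndog\n' > "$tmp"
# Alternation binds loosest: this is "C" OR "cat", not "Cat" OR "cat".
loose=$(grep -E 'C|cat' "$tmp")
# Parentheses restrict the alternation to the first letter: "cat" or "Cat".
grouped=$(grep -E '(c|C)at' "$tmp")
rm -f "$tmp"
```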
fgrep: regular expression metacharacters are not supported; every character of the pattern is matched literally (equivalent to grep -F);
Use fgrep for better performance when the pattern does not need metacharacters;
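A sketch of fixed-string matching, assuming GNU grep and using grep -F, which fgrep is equivalent to. With a regular expression the dot matches any character, so the hypothetical line "1x5" also matches; with -F the dot and the star are plain characters.

```shell
tmp=$(mktemp)
printf 'total: 1.5*2\ntotal: 1x5\n' > "$tmp"
as_regex=$(grep -c '1.5' "$tmp")     # "." matches any character, so "1x5" matches too
as_fixed=$(grep -cF '1.5' "$tmp")    # -F: the dot is a literal dot
literal=$(grep -F '5*2' "$tmp")      # "*" is literal as well
rm -f "$tmp"
```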
grep and regular expressions for Shell programming