Working with Linux means processing a large number of text files, which fits the Linux philosophy that everything is a file. For that reason Linux ships with three text-processing tools, often called the Three Musketeers: grep, sed, and awk. Each has a different focus; this article covers the text-filtering tool grep.
1. Use of the grep command
grep [OPTIONS] PATTERN [FILE...]
grep root /etc/passwd
Common grep options:
--color=auto: highlight the matched text in color (applied automatically on CentOS 7, not on CentOS 6)
-i: ignore case
-n: show the line number of each matching line
-o: show only the matched string
-q: quiet mode; prints nothing (the exit status still reports whether a match was found)
-e: specify a pattern; repeating -e combines several patterns with a logical OR
grep -e 'cat' -e 'dog' file
-w: match whole words only
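The options above can be tried on a small file. The following is a minimal sketch: the sample file and its contents are made up for illustration, and the variable names are hypothetical.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' 'Root user' 'chroot jail' 'cat and dog' 'category' > "$sample"

# -i ignores case, -n prefixes each matching line with its line number
with_numbers=$(grep -in 'root' "$sample")

# -o prints only the matched text, one match per line
only_match=$(grep -o 'cat' "$sample")

# -w matches whole words only, so "category" is skipped
whole_word=$(grep -w 'cat' "$sample")

# -e may be repeated; a line is selected if any of the patterns matches
either=$(grep -e 'cat' -e 'jail' "$sample")

# -q prints nothing; only the exit status says whether a match was found
if grep -q 'root' "$sample"; then found=yes; else found=no; fi

printf '%s\n' "$with_numbers"
printf 'found=%s\n' "$found"
rm -f "$sample"
```

Because -q produces no output, it is mostly useful as a condition in scripts, as in the if statement above.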
2. Basic Regular Expressions
You cannot talk about grep without talking about regular expressions. They come in two flavors, basic and extended, and many of the tools used later support them, including vim, sed, awk, and Python. Regular expression metacharacters fall into 4 classes: character matching, number of matches, position anchoring, and grouping.
Character Matching:
. matches any single character
[] matches any single character within the specified range
[^] matches any single character outside the specified range
[:alnum:] letters and digits
[:alpha:] any English uppercase or lowercase letter, i.e. A-Z, a-z
[:lower:] lowercase letters [:upper:] uppercase letters
[:blank:] blank characters (spaces and tabs)
[:space:] horizontal and vertical whitespace characters (a wider set than [:blank:])
[:cntrl:] non-printable control characters (backspace, delete, bell, ...)
[:digit:] decimal digits [:xdigit:] hexadecimal digits
[:graph:] printable non-whitespace characters
[:print:] printable characters
[:punct:] punctuation
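A named class such as [:digit:] is only valid inside a bracket expression, so in a pattern it is written [[:digit:]]. The sketch below illustrates this on a made-up string; the helper function chars and the variable names are hypothetical.

```shell
# Hypothetical sample text, chosen only for this demonstration.
text='Room 42B: ok, fine!'

# Helper (hypothetical name): join every single-character match into one string.
chars() { printf '%s\n' "$text" | grep -o "$1" | tr -d '\n'; }

digits=$(chars '[[:digit:]]')             # decimal digits
uppers=$(chars '[[:upper:]]')             # uppercase letters
puncts=$(chars '[[:punct:]]')             # punctuation
others=$(chars '[^[:alnum:][:space:]]')   # neither letter, digit, nor whitespace

printf 'digits=%s uppers=%s puncts=%s others=%s\n' "$digits" "$uppers" "$puncts" "$others"
```

The last pattern shows that several classes can share one bracket expression and that ^ negates the whole set.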
Number of matches:
* matches the preceding character any number of times, including 0
.* matches any characters of any length
\? matches the preceding character 0 or 1 times
\+ matches the preceding character at least 1 time
\{n\} matches the preceding character exactly n times
\{m,n\} matches the preceding character at least m and at most n times
\{,n\} matches the preceding character at most n times
\{n,\} matches the preceding character at least n times
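To see how the repetition operators differ, the sketch below runs each one against the same made-up line. It assumes GNU grep, where \? and \+ are available in basic regular expressions; the helper function matches is hypothetical.

```shell
# Hypothetical sample line: "g", a varying number of "o", then "d".
line='gd god good goood gooood'

# Helper (hypothetical name): list every match of a basic regular expression.
matches() { printf '%s\n' "$line" | grep -o "$1" | paste -s -d ' ' -; }

any_count=$(matches 'go*d')           # zero or more "o"
optional=$(matches 'go\?d')           # zero or one "o"
at_least_one=$(matches 'go\+d')       # one or more "o"
exactly_two=$(matches 'go\{2\}d')     # exactly two "o"
two_to_three=$(matches 'go\{2,3\}d')  # two or three "o"
three_or_more=$(matches 'go\{3,\}d')  # three or more "o"

printf '%s\n' "$any_count" "$optional" "$at_least_one" \
              "$exactly_two" "$two_to_three" "$three_or_more"
```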
Position anchoring:
^ start-of-line anchor; goes at the leftmost end of the pattern
$ end-of-line anchor; goes at the rightmost end of the pattern
^pattern$ the pattern must match the entire line
^$ empty line
^[[:space:]]*$ blank line (empty or whitespace only)
\< or \b word-start anchor; goes on the left side of the word pattern
\> or \b word-end anchor; goes on the right side of the word pattern
\<pattern\> matches a whole word
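The anchors are easiest to compare by counting how many lines each one selects. This is a minimal sketch on a made-up file; it uses grep -c, which prints the number of matching lines instead of the lines themselves, and the variable names are hypothetical.

```shell
# Hypothetical sample file: includes one empty line and one line of spaces.
sample=$(mktemp)
printf '%s\n' 'root' 'root:x:0:0' '' '   ' 'chroot' 'the root user' > "$sample"

starts=$(grep -c '^root' "$sample")          # lines that begin with "root"
ends=$(grep -c 'root$' "$sample")            # lines that end with "root"
whole_line=$(grep -c '^root$' "$sample")     # lines that are exactly "root"
empty=$(grep -c '^$' "$sample")              # empty lines
blank=$(grep -c '^[[:space:]]*$' "$sample")  # empty or whitespace-only lines
word=$(grep -c '\<root\>' "$sample")         # "root" as a whole word ("chroot" is skipped)

printf 'starts=%s ends=%s whole_line=%s empty=%s blank=%s word=%s\n' \
       "$starts" "$ends" "$whole_line" "$empty" "$blank" "$word"
rm -f "$sample"
```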
Grouping:
\(\) binds one or more characters together so they are treated as a unit
The text matched by each parenthesized group is recorded by the regular expression engine in internal variables named \1, \2, \3, ...
\1 is the text matched by the pattern between the first opening parenthesis from the left and its matching closing parenthesis
Example: cat text | grep -o "\(cat\).*\1\b"
Back reference: refers to the characters that the earlier group matched, not to the group's pattern itself
Or: \|
Example:
cat text | grep "cat\|dong"
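The difference between reusing the matched text and reusing the pattern shows up when a group contains an alternation. The sketch below uses made-up input lines and assumes GNU grep, where \| is available in basic regular expressions.

```shell
# \1 must repeat the exact text the group matched, so the mixed line
# "cat chases dog" is rejected even though "dog" fits the group's pattern.
repeated=$(printf '%s\n' 'cat chases cat' 'cat chases dog' 'dog chases dog' |
           grep '\(cat\|dog\) chases \1')

# Plain alternation: select lines that contain either word.
either=$(printf '%s\n' 'a cat' 'a dog' 'a bird' | grep 'cat\|dog')

printf '%s\n' "$repeated" "$either"
```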
3. Basic Regular Expression Exercises
1. Use df and grep to extract the disk partition utilization and sort it from largest to smallest
df | grep -o "[[:digit:]]\+%" | sort -nr
2. Find the two-digit or three-digit numbers in /etc/passwd
cat /etc/passwd | grep -o "\b[0-9]\{2,3\}\b"
3. Take every character of the string 'Welcome to magedu Linux', remove duplicates, and sort so that the most frequent characters come first
echo 'Welcome to magedu Linux' | grep -o '[^[:space:]]' | sort | uniq -c | sort -nr
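The character-counting pipeline in exercise 3 is easier to follow on a shorter input. This sketch uses the made-up word banana; the final awk step is an addition that only normalizes the column padding of uniq -c so the result is easy to compare.

```shell
word='banana'   # hypothetical input

counts=$(printf '%s\n' "$word" |
         grep -o '.' |             # one character per line
         sort |                    # group identical characters together
         uniq -c |                 # collapse each group into "count character"
         sort -nr |                # highest count first
         awk '{print $1 ":" $2}')  # normalize spacing to count:character

printf '%s\n' "$counts"
```

The first sort is required because uniq only merges adjacent identical lines.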
4. Extended Regular Expressions
The previous exercises show that basic regular expressions need a lot of backslashes. A finished pattern ends up looking like a string of emoticons: the idea is clear while you are writing it, but the result is hard to read afterwards. Extended regular expressions address this and are used through egrep or grep -E, which are equivalent. The rules are the same as for basic regular expressions, but with far fewer backslashes the patterns are much easier to read.
Character Matching:
. Any single character
[] Specify the range of characters
[^] characters not in the specified range
Number of matches:
*: matches the preceding character any number of times
?: 0 or 1 times
+: 1 or more times
{m}: matches exactly m times
{m,n}: at least m, at most n times
Position anchoring:
^: start of line
$: end of line
\<, \b: start of word
\>, \b: end of word
Grouping:
()
Back reference: \1, \2, ...
Or: |
Summary: in extended regular expressions only back references (\1, \2, \3), word-start anchors (\<, \b), and word-end anchors (\>, \b) still need a backslash; the other metacharacters do not.
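The summary can be checked by writing one pattern in both syntaxes. The sketch below uses made-up input and grep -E, which the text above gives as equivalent to egrep; the variable names are hypothetical.

```shell
line='ab abab ababab'   # hypothetical input

# Same pattern in both syntaxes: the group "ab" repeated two or three times.
bre=$(printf '%s\n' "$line" | grep -o '\(ab\)\{2,3\}')
ere=$(printf '%s\n' "$line" | grep -oE '(ab){2,3}')

# In the extended syntax the word anchors and the back reference keep their backslash.
doubled=$(printf '%s\n' 'this is is a test' | grep -oE '\<([a-z]+) \1\>')

printf '%s\n' "$bre" "$ere" "$doubled"
```

Both forms select the same text; only the amount of escaping differs.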
5. Extended Regular Expression Exercises
1. Find the lines in /etc/rc.d/init.d/functions that begin with a word (which may include underscores) followed by a pair of parentheses
cat /etc/rc.d/init.d/functions | egrep "^.*\>\(\)"
2. Use egrep to extract the base name of /etc/rc.d/init.d/functions
echo /etc/rc.d/init.d/functions/ | egrep -o "[^/]+/?$"
3. Use egrep to extract the directory name of the path above
echo "/etc/rc.d/init.d/functions" | egrep -o ".*/." | egrep -o ".*/"
4. Use an extended regular expression to represent 0-9, 10-99, 100-199, 200-249, and 250-255 (often used to check that an IPv4 address is valid)
echo {1..300} | egrep -o "\b[0-9]\b|\b[1-9][0-9]\b|\b1[0-9][0-9]\b|\b2[0-4][0-9]\b|\b25[0-5]\b"
5. Display all IPv4 addresses in the output of the ifconfig command
ifconfig | egrep -o "\<(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>"
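Since ifconfig output differs between machines, the IPv4 pattern can also be tried on fixed text. In this sketch the sample lines are made up to stand in for ifconfig output, and the octet and ipv4 variables are hypothetical names used to build the pattern from exercise 4.

```shell
# One octet: 0-9, 10-99, 100-199, 200-249, 250-255 (the ranges from exercise 4).
octet='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
# Three "octet." groups followed by a final octet, anchored on word boundaries.
ipv4="\<($octet\.){3}$octet\>"

# Hypothetical sample input standing in for ifconfig output.
sample='inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
inet 127.0.0.1  netmask 255.0.0.0
bogus 999.1.1.1 and 256.300.1.1'

addresses=$(printf '%s\n' "$sample" | grep -oE "$ipv4")
printf '%s\n' "$addresses"
```

Building the pattern from a variable keeps the octet definition in one place. Note that a dot also counts as a word boundary, so a valid-looking tail of a longer dotted string can still match.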