grep, regular expressions, and the difference between grep and egrep in Linux
The text-processing tool grep and regular expressions are an easy place to get confused and stuck when learning Linux. Here I share some of my own thoughts on them.
grep: Global search REgular expression and Print out the line
Purpose: a text search tool that checks the target text line by line against the user-specified 'pattern' (filter condition) and prints the lines that match.
'Pattern': a filter condition written with regular expression metacharacters and text characters.
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
Common options:
-i: ignore case
-o: display only the matched strings
-v: display the lines that the pattern does not match
-E: support extended regular expression metacharacters
-q: quiet mode; no matches are displayed
-A #: after; also display the # lines after each matching line
-B #: before; also display the # lines before each matching line
-C #: context; also display the # lines before and after each matching line
-n: display the line number of each matched line (used less often)
-c: count the number of matched lines (used less often)
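A minimal sketch of these options in action. The sample file and its contents are made up for illustration and are not the article's own test file; the options themselves are standard grep options.

```shell
# Hypothetical sample text written to a temporary file for the demo.
sample=$(mktemp)
printf '%s\n' 'Saber 12345' 'saber' 'berar' '67890' > "$sample"

opt_i=$(grep -i 'saber' "$sample")        # "Saber 12345" and "saber"
opt_o=$(grep -o '[0-9][0-9]*' "$sample")  # only the digit runs: 12345, 67890
opt_v=$(grep -v 'ber' "$sample")          # lines without "ber": 67890
opt_n=$(grep -n 'berar' "$sample")        # line number prefix: 3:berar
opt_c=$(grep -c 'ber' "$sample")          # number of matching lines: 3
opt_A=$(grep -A 1 '^saber' "$sample")     # matching line plus 1 line after it

# -q prints nothing; only the exit status says whether there was a match.
if grep -q '67890' "$sample"; then opt_q='match'; else opt_q='no match'; fi

printf '%s\n' "-i:" "$opt_i" "-o:" "$opt_o" "-v:" "$opt_v" \
    "-n:" "$opt_n" "-c:" "$opt_c" "-A 1:" "$opt_A" "-q:" "$opt_q"
rm -f "$sample"
```

-B and -C take a number in the same way as -A.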
The following describes the usage and options of grep in a few small experiments.
Experiment directory: /test; test file: /test/head
[root@localhost test]# cat head
12345 Sdabc saber Berar bserac 12cds 67890 12 345 123 6
Regular expression: Regular Expression, REGEXP
A regular expression is a pattern written with special characters and text characters. Some of the characters do not stand for their literal meaning but instead serve a control or wildcard function.
There are two types: basic regular expressions (BRE) and extended regular expressions (ERE).
Basic regular expression metacharacters:
These cover character matching, repetition counts, position anchoring, and grouping.
Character matching:
.: match any single character; []: match any single character within the specified range
[^]: match any single character outside the specified range
Common sets (used inside a bracket expression, for example [[:digit:]]): [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]
Repetition counts: placed after the character they apply to, to specify how many times that preceding character appears
*: match the preceding character any number of times, including 0; greedy mode: match as much as possible
.*: any characters of any length
\?: match the preceding character 0 or 1 time
\+: match the preceding character at least once
\{m\}: match the preceding character m times
\{m,n\}: match the preceding character at least m times and at most n times
\{,n\}: match the preceding character at most n times
\{m,\}: match the preceding character at least m times
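The repetition metacharacters above can be tried on a few made-up words. This is only a sketch: the word list is hypothetical, and \? and \+ in a basic regular expression assume GNU grep.

```shell
# Hypothetical sample words, one per line: a, then 0 to 3 b's, then c.
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc')

star=$(printf '%s\n' "$words" | grep 'ab*c')         # b zero or more times: all four
opt=$(printf '%s\n' "$words" | grep 'ab\?c')         # b zero or one time: ac, abc
plus=$(printf '%s\n' "$words" | grep 'ab\+c')        # b at least once: abc, abbc, abbbc
exact=$(printf '%s\n' "$words" | grep 'ab\{2\}c')    # b exactly twice: abbc
range=$(printf '%s\n' "$words" | grep 'ab\{2,3\}c')  # b two or three times: abbc, abbbc

printf '%s\n' "*:" "$star" "\\?:" "$opt" "\\+:" "$plus" \
    "\\{2\\}:" "$exact" "\\{2,3\\}:" "$range"
```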
Position anchoring: pin the match to a particular position
^: anchors the beginning of the line; used at the leftmost end of the pattern
$: anchors the end of the line; used at the rightmost end of the pattern
^PATTERN$: the pattern must match the entire line
^$: empty line
^[[:space:]]*$: blank line
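A short sketch of the line anchors, counting matching lines with -c. The five sample lines (one of them empty, one containing only spaces) are invented for the demo.

```shell
# Hypothetical sample: line 3 is empty, line 4 holds three spaces.
text=$(printf '%s\n' 'root' 'root user' '' '   ' 'chroot')

at_start=$(printf '%s\n' "$text" | grep -c '^root')         # root, root user
at_end=$(printf '%s\n' "$text" | grep -c 'root$')           # root, chroot
whole=$(printf '%s\n' "$text" | grep -c '^root$')           # root only
empty=$(printf '%s\n' "$text" | grep -c '^$')               # the empty line
blank=$(printf '%s\n' "$text" | grep -c '^[[:space:]]*$')   # empty line + spaces-only line

echo "start=$at_start end=$at_end whole=$whole empty=$empty blank=$blank"
```

The difference between the last two counts shows why ^$ and ^[[:space:]]*$ are listed separately: a line holding only spaces is blank but not empty.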
Word: a run of consecutive non-special characters (letters, digits, and underscores) is called a word in Linux.
\< or \b: the beginning of a word; used on the left side of the word pattern
\> or \b: the end of a word; used on the right side of the word pattern
\<PATTERN\>: match a complete word
1. Search for rows starting with
2. Search for strings containing only for; search for content containing
3. Search for rows ending with for; search for strings ending with
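The original commands for these searches are not reproduced here, so the following is only a sketch of how line anchors and word anchors differ for the string for. The sample lines are hypothetical, and \< and \> assume GNU grep.

```shell
# Hypothetical sample lines containing "for" in different positions.
text=$(printf '%s\n' 'for each item' 'wait for' 'forget it' 'comfort' 'therefor')

line_start=$(printf '%s\n' "$text" | grep '^for')      # lines starting with for
line_end=$(printf '%s\n' "$text" | grep 'for$')        # lines ending with for
whole_word=$(printf '%s\n' "$text" | grep '\<for\>')   # for as a complete word
word_start=$(printf '%s\n' "$text" | grep -c '\<for')  # words beginning with for
word_end=$(printf '%s\n' "$text" | grep -c 'for\>')    # words ending with for

printf '%s\n' "^for:" "$line_start" "for\$:" "$line_end" "whole word:" "$whole_word"
echo "word_start=$word_start word_end=$word_end"
```

Note that comfort matches none of the anchored patterns, because there the string sits in the middle of a word.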
Grouping: \(\): binds one or more characters together so they are handled as a unit, for example \(root\)\+
The content matched by the pattern inside the group parentheses is recorded by the regular expression engine in internal variables, which are named \1, \2, \3, ...
\1: counting from the left, the characters matched by the pattern between the first left parenthesis and its matching right parenthesis;
Example: \(string1\+\(string2\)*\)
\1: string1\+\(string2\)*
\2: string2
Back reference: refers to the characters matched by the pattern in the group parentheses (rather than to the pattern itself)
The preceding command retrieves lines that contain the string for followed by any one character, with that matched text then appearing once more. The "\1" that follows repeats whatever the first pair of parentheses matched.
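To make the point about back references concrete, here is a small sketch with invented sentences (not the article's own command). The group \(l..e\) matches a four-letter string, and \1 then requires that exact same string, not just anything fitting l..e, to appear again later in the line.

```shell
# Hypothetical sample sentences.
text=$(printf '%s\n' 'He loves his lover' 'She likes her liker' 'He likes his lover')

# Lines 1 and 2 repeat the captured text (love...love, like...like).
# Line 3 has "like" and "love": both fit l..e, but neither is repeated.
repeated=$(printf '%s\n' "$text" | grep '\(l..e\).*\1')

printf '%s\n' "$repeated"
```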
egrep = grep -E
egrep [OPTIONS] PATTERN [FILE...]
Metacharacters of extended regular expressions:
Character matching: same as in basic regular expressions
Repetition counts:
*: match the preceding character any number of times
?: 0 or 1 time
+: 1 or more times
{m}: exactly m times
{m,n}: at least m and at most n times
Position anchoring: same as in basic regular expressions
Grouping:
()
Back reference: \1, \2, ...
Or:
a|b
C|cat: C or cat
(C|c)at: Cat or cat
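A brief sketch of the alternation examples, using grep -E on four made-up lines. The last command writes the grouped pattern in basic syntax for comparison; \| in a basic regular expression assumes GNU grep.

```shell
# Hypothetical sample lines.
text=$(printf '%s\n' 'Cat' 'cat' 'C' 'at')

alt=$(printf '%s\n' "$text" | grep -E 'C|cat')       # C or cat: Cat, cat, C
grp=$(printf '%s\n' "$text" | grep -E '(C|c)at')     # Cat or cat
bre=$(printf '%s\n' "$text" | grep '\(C\|c\)at')     # same pattern in basic syntax

printf '%s\n' "C|cat:" "$alt" "(C|c)at:" "$grp"
```

Without the parentheses the alternation splits the whole pattern, which is why C|cat also matches the line containing only C.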
Finally, nine examples show what grep and regular expressions can do in combination.
1. Display the lines in the /proc/meminfo file that start with an uppercase or lowercase s
This is easy once you know grep's -i option.
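One way this might look, run here against a few made-up lines in the style of /proc/meminfo so the result is predictable (the values and the lowercase entry are invented). On a real system the same pattern would be applied to /proc/meminfo directly.

```shell
# Hypothetical excerpt; on a real system use: grep -i '^s' /proc/meminfo
meminfo_sample=$(printf '%s\n' \
    'MemTotal:        1003432 kB' \
    'SwapCached:            0 kB' \
    'Shmem:              7164 kB' \
    'Slab:              70124 kB' \
    'sample_lowercase:      0 kB')

with_i=$(printf '%s\n' "$meminfo_sample" | grep -i '^s')       # -i ignores case
with_class=$(printf '%s\n' "$meminfo_sample" | grep '^[sS]')   # same result without -i

printf '%s\n' "$with_i"
```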
2. Display the lines in the /etc/passwd file that do not end with /bin/bash
Use grep to match the lines that end with "/bin/bash", then use grep's -v option to return the lines that are not in that result. The effect is similar to a complement set in mathematics.
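A sketch of the idea on three invented passwd-style lines; on a real system the input would be /etc/passwd itself.

```shell
# Hypothetical entries; on a real system use: grep -v '/bin/bash$' /etc/passwd
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin')

bash_users=$(printf '%s\n' "$passwd_sample" | grep '/bin/bash$')     # the matched set
not_bash=$(printf '%s\n' "$passwd_sample" | grep -v '/bin/bash$')    # its complement

printf '%s\n' "$not_bash"
```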
3. Find all IPv4 addresses of the local machine in the output of the ifconfig command
There are three steps:
1) Use grep to pick out the lines containing IPv4 addresses. Looking at the ifconfig output shows the rule: every IPv4 line begins with the word "inet", so we only need to search for that.
2) Use tr to replace all spaces with ":" and squeeze the repeats.
3) Use cut to extract the result.
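The three steps could be chained roughly as follows. This sketch assumes the newer ifconfig layout, where the line reads "inet ADDRESS netmask ..."; the sample output and its addresses are made up, and the cut field number would differ for the older "inet addr:ADDRESS" layout.

```shell
# Hypothetical ifconfig output; on a real system pipe `ifconfig` into the same chain.
ifconfig_sample=$(printf '%s\n' \
    'eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500' \
    '        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255' \
    '        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>' \
    'lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536' \
    '        inet 127.0.0.1  netmask 255.0.0.0')

# 1) keep the IPv4 lines (the \> stops "inet" from matching "inet6")
# 2) turn every run of spaces into a single ":"
# 3) the address is then the third ":"-separated field
ipv4=$(printf '%s\n' "$ifconfig_sample" \
    | grep '\<inet\>' \
    | tr -s ' ' ':' \
    | cut -d: -f3)

printf '%s\n' "$ipv4"
```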
4. Find the maximum percentage of partition space usage
This takes roughly six steps:
1) Filter out the lines containing Chinese characters.
2) Use tr to replace all spaces with ":" and squeeze the repeats.
3) Use cut to extract the field containing the usage percentage.
4) Use tr again to remove the %.
5) Use sort to sort the values numerically.
6) Use tail to take the maximum value.
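A sketch of these six steps, assuming the input comes from df. The sample output below is invented and uses an English header, so step 1 here drops the header line by matching the word Filesystem rather than Chinese characters.

```shell
# Hypothetical df output; on a real system pipe `df` into the same chain.
df_sample=$(printf '%s\n' \
    'Filesystem     1K-blocks    Used Available Use% Mounted on' \
    '/dev/sda2       51475068 9423512  39413732  20% /' \
    '/dev/sda1         508588  130920    377668  26% /boot' \
    'tmpfs             502068       0    502068   0% /dev/shm')

max_use=$(printf '%s\n' "$df_sample" \
    | grep -v '^Filesystem' \
    | tr -s ' ' ':' \
    | cut -d: -f5 \
    | tr -d '%' \
    | sort -n \
    | tail -n 1)

echo "highest usage: ${max_use}%"
```

Sorting numerically with sort -n matters here: a plain text sort would place 9 after 26.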
5. Display the default shell program of user rpc
The search condition above is a line that begins with the name rpc, with the end of the word anchored as well so that a longer name that merely starts with rpc does not match.
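One possible form of this search, shown on two invented passwd-style entries. The shell is the seventh ":"-separated field, so cut is used to pull it out; the entries and their shells are assumptions for the demo.

```shell
# Hypothetical entries; on a real system use /etc/passwd as the input.
passwd_sample=$(printf '%s\n' \
    'rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin' \
    'rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/bin/false')

# ^ anchors the start of the line, \> the end of the word "rpc",
# so the rpcuser entry is not selected.
rpc_shell=$(printf '%s\n' "$passwd_sample" | grep '^rpc\>' | cut -d: -f7)

echo "$rpc_shell"
```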
6. Find the two- or three-digit numbers in /etc/passwd
Here we use an extended regular expression because it allows a more concise pattern.
Note that the string must be exactly two or three digits, so both the beginning and the end of the string need to be anchored.
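A sketch of the anchored pattern on four invented passwd-style lines. The -o option is added here only to show the matched numbers themselves; without the \< and \> anchors, parts of the five-digit 65534 would match as well.

```shell
# Hypothetical entries; on a real system use /etc/passwd as the input.
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin' \
    'polkitd:x:999:998:User for polkitd:/:/sbin/nologin' \
    'nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin')

# Exactly two or three digits, bounded on both sides by non-word characters.
numbers=$(printf '%s\n' "$passwd_sample" | grep -Eo '\<[0-9]{2,3}\>')

printf '%s\n' "$numbers"
```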
7. Find the lines in the /etc/rc.d/init.d/functions file in which a word (underscores included) is followed by a pair of parentheses
When working out the filter condition, we need to anchor both the beginning of the line and the word; otherwise the condition is not strict enough.
Note that .*\>\(\) fails if it is changed to .*\(\)\>. The \> anchor has to come directly after a word character, and ")" is not one, so there is no end of word after the parentheses for it to match.
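A sketch of one pattern that fits this description, written in basic syntax where ( and ) are literal characters. It is not necessarily the article's original command, and the sample lines only imitate the style of the functions file.

```shell
# Hypothetical lines in the style of a shell function library.
funcs_sample=$(printf '%s\n' \
    'checkpid() {' \
    '__pids_var_run() {' \
    '    local i' \
    '# a comment about daemon() usage' \
    'echo_success() {')

# A name made of letters, digits and underscores at the start of the line,
# followed directly by "()".
defs=$(printf '%s\n' "$funcs_sample" | grep '^[[:alpha:]_][[:alnum:]_]*()')

# Putting \> after the parentheses never matches, because ")" is not a word
# character. grep exits non-zero when nothing matches, hence the "|| true".
after_paren=$(printf '%s\n' "$funcs_sample" | grep -c '()\>' || true)

printf '%s\n' "$defs"
echo "matches with \\> after the parentheses: $after_paren"
```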
8. Use egrep to extract the base name of /etc/rc.d/init.d/functions
Two methods are used above: one extracts the base name directly with grep, and the other splits the path apart first.
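The direct method might look like the sketch below, which is one possible pattern rather than the article's original command. It takes the last run of non-slash characters, allowing for an optional trailing slash.

```shell
path='/etc/rc.d/init.d/functions'

# [^/]+ is a run of characters that are not "/", /? allows one trailing
# slash, and $ pins the match to the end of the string.
base=$(echo "$path" | grep -Eo '[^/]+/?$')

echo "$base"
```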
9. Use extended regular expressions to represent 0-9, 10-99, 100-199, 200-249, and 250-255 respectively
\<[0-9]\>: 0-9
\<[1-9][0-9]\>: 10-99
\<1[0-9][0-9]\> or \<1[0-9]{2}\>: 100-199
\<2[0-4][0-9]\>: 200-249
\<25[0-5]\>: 250-255
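As a sketch, the five ranges can be joined with | inside one group to match any number from 0 to 255. The combined pattern and the test numbers below are an illustration added here, not part of the original examples.

```shell
# The five ranges combined into one alternation, with shared word anchors.
octet='\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'

# Boundary values of each range, plus three numbers that are out of range.
valid=$(printf '%s\n' 0 9 10 99 100 199 200 249 250 255 256 300 1000 \
    | grep -E "$octet")

printf '%s\n' "$valid"
```

The numbers 256, 300, and 1000 are filtered out because no alternative can reach the end of the word.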
The above is only a brief summary of grep and regular expressions, but once you have mastered these basics you can go on to study them in more depth.