Text filtering tool (grep)

Source: Internet
Author: User
Tags: egrep

Quote: originally posted by "net player":
On Linux you will find three related programs: grep, egrep, and fgrep. The differences are roughly as follows:

* grep:
The traditional grep program. With no options, it prints only the lines that match the regular expression (RE). Common options are as follows:
-v: invert the match; print only the lines that do not match the RE.
-r: recursive mode; search the files in all subdirectories as well.
-q: quiet mode; nothing is printed (except errors on stderr). It is typically used for the return value alone: 0 (true) if a match is found, nonzero (false) if not.
-i: ignore case.
-w: match whole words only, similar to \<word\>.
-n: print the line number along with each matching line.
-c: print only a count of the matching lines.
-l: print only the names of the files that contain a match.
-o: print only the part of each line that matches the RE. (Found in newer GNU versions; not every grep supports it.)
-E: switch to extended regular expressions, as egrep does.

* egrep:
The extended version of grep, which improves on many things that traditional grep cannot do or does awkwardly. For example:
- grep does not support the ? and + modifiers, but egrep does. (GNU grep accepts \? and \+ as an extension.)
- grep does not support alternation such as a|b or (abc|xyz), but egrep does.
- grep must write {n,m} with \{ and \}, but egrep does not need the backslashes.
I personally recommend using egrep instead of grep... ^_^

* fgrep:
The pattern is treated as a plain fixed string; all regular-expression metacharacters are disabled.
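
The three modes can be compared side by side. The following is a minimal sketch that uses the standard -E and -F options in place of the egrep and fgrep commands (recent GNU grep releases warn that those two names are obsolescent). The sample strings are invented for illustration.

```shell
# Hypothetical sample input, one string per line.
sample=$(printf '%s\n' 'aab' 'a+b' 'abc' 'xyz')

# Basic RE: the interval needs backslashes; matches "aab".
bre=$(printf '%s\n' "$sample" | grep 'a\{2\}b')

# Extended RE: grouping and alternation work without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '(abc|xyz)')

# Fixed string: "+" is taken literally; matches only "a+b".
fixed=$(printf '%s\n' "$sample" | grep -F 'a+b')

printf 'BRE:   %s\n' "$bre"
printf 'ERE:   %s\n' $ere
printf 'fixed: %s\n' "$fixed"
```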

The general format of grep is:

Code:
grep [options] basic-regular-expression [file]

Here, the basic regular expression can also be a plain string.

Double quotation marks
When you pass a string argument to the grep command, it is best to enclose it in double quotation marks, for example "mystring". There are two reasons for doing this: it keeps the shell from misinterpreting the string as part of the command, and it lets you search for strings made up of several words.

Single quotation marks should be used when matching a pattern (a regular expression).

When you use a variable, double quotation marks should also be used, for example: grep "$myvar" filename. Without them, a value that contains spaces is split into several arguments and the search does not return the expected results.
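
A small sketch of the variable case follows. The file contents and the variable name myvar are assumptions made up for this example.

```shell
# Hypothetical file and variable, for illustration only.
file=$(mktemp)
printf '%s\n' 'please sort it now' 'sort later' > "$file"

myvar='sort it'

# Quoted: grep receives the single pattern "sort it".
quoted=$(grep "$myvar" "$file")
echo "$quoted"

# Unquoted, $myvar would be split into the pattern "sort" plus a
# file operand named "it", which is not what was intended.
rm -f "$file"
```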

Common grep options include:

Quote:
-c: print only a count of the matching lines.
-i: ignore case (applies to single characters only).
-h: do not print file names when searching multiple files.
-l: when searching multiple files, print only the names of the files that contain a match.
-n: print each matching line with its line number.
-s: do not print error messages about nonexistent or unreadable files.
-v: print all lines that do not contain the matching text.

Before starting the discussion, create a file containing a block of text, with a <Tab> after each column. The vast majority of the grep examples below use this file, which is named data.f. The record structure of data.f is as follows:

Quote:
Column 1: city location number.
Column 2: month.
Column 3: storage code and delivery year.
Column 4: product code.
Column 5: standard product price.
Column 6: ID number.
Column 7: qualified quantity.

The file content is as follows:

Code:
$ cat data.f
48 dec 3BC1977 LPSX 68.00 LVX2A 138
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 Nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234

1. Querying multiple files
Search for the phrase "sort it" in all files in the current directory:

Code:
$ grep "sort it" *

2. Line matching
1) Display the lines that contain the string "48":

Code:
$ grep "48" data.f

2) Count the matching lines

Code:
$ grep -c "48" data.f
4

grep returns the number 4, meaning that 4 lines contain the string "48".

3) Line numbers
Display the line number of each line that matches the pattern:

Code:
$ grep -n "48" data.f

The line number appears in the first output column, followed by each matching line that contains 48.

4) Show non-matching lines
Show all lines that do not contain 48:

Code:
$ grep -v "48" data.f

5) Exact match
You may have noticed that in the previous examples, searching for "48" also returned lines containing longer strings such as 484 and 483. What we really want is only the line whose city code is exactly 48.
An effective way to get an exact match with grep is to append \> (end of word) to the search string. To extract exactly 48:

Code:
$ grep "48\>" data.f

Quote: Another method, which I tried but which does not seem to work:
Each field in the file is followed by a <Tab>, so match the string together with the tab that follows it. Here <Tab> means pressing the Tab key:
$ grep "48<Tab>" data.f
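
Typing a literal tab at an interactive prompt often triggers completion instead, which may be why the second method appears to fail. As a hedged sketch, the tab can be produced with printf and stored in a variable (the name tab is an assumption of this example). The -w option is shown as an alternative; it is not required by POSIX but is widely supported.

```shell
# Two records from data.f, tab-separated as described in the article.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  48  dec  3BC1977 LPSX 68.00 LVX2A 138 \
  483 Sept 5AP1996 USP  65.00 LVX2C 189 > "$data"

tab=$(printf '\t')    # a literal tab character

# Anchor at the start of the line and require the tab after 48.
by_tab=$(grep "^48$tab" "$data")

# -w matches 48 only when it stands as a whole word.
by_word=$(grep -w '48' "$data")

echo "$by_tab"
echo "$by_word"
rm -f "$data"
```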

6) Case sensitivity
By default, grep is case-sensitive. To search without regard to case, use the -i switch. In data.f the month Sept appears in both uppercase and lowercase forms. To find every occurrence regardless of case:

Code:
$ grep -i "sept" data.f

grep and regular expressions

Regular expressions add rules to pattern matching, giving you more control over what is extracted. It is best to enclose a regular expression in single quotes, so that the shell does not interpret characters that have a special meaning in grep patterns.

1. Pattern range
To extract the lines whose city code is 484 or 483, use [] to specify a range of characters:

Code:
$ grep "48[34]" data.f
483 Sept 5AP1996 USP 65.00 LVX2C 189
484 Nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644

2. Not matching at the start of a line
To find the lines that do not begin with 4 or 8, use the ^ mark inside the square brackets:

Code:
$ grep "^[^48]" data.f
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234

Conversely, to list the lines that do begin with 4 or 8, invert the match with -v:

Code:
$ grep -v "^[^48]" data.f

3. Ignoring case
Use the -i switch to ignore the case of the month Sept:

Code:
[sam@chenwy sam]$ grep -i "sept" data.f
483 Sept 5AP1996 USP 65.00 LVX2C 189
216 sept 3ZL1998 USP 86.00 KVM9E 234

Alternatively, use a [] pattern to extract every line containing Sept or sept:

Code:
[sam@chenwy sam]$ grep '[Ss]ept' data.f

If you want every line that contains Sept in either case and also contains the string 483, use a pipe: the output of the command to the left of "|" becomes the input of the command to the right. For example:

Code:
[sam@chenwy sam]$ grep '[Ss]ept' data.f | grep 483
483 Sept 5AP1996 USP 65.00 LVX2C 189

Do not give a file name to the second grep command, because its input comes from the output of the first grep command.
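
The same two-stage filter can be sketched with -i in the first stage. This is only an illustration on three records from data.f, assuming the tab-separated layout described earlier.

```shell
# Three records from data.f, tab-separated.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  483 Sept 5AP1996 USP 65.00 LVX2C 189 \
  483 may  5PA1998 USP 37.00 KVM9D 644 \
  216 sept 3ZL1998 USP 86.00 KVM9E 234 > "$data"

# Stage 1 keeps the Sept/sept lines; stage 2 reads them from the
# pipe (no file name) and keeps only those that contain 483.
result=$(grep -i 'sept' "$data" | grep '483')

echo "$result"
rm -f "$data"
```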

4. Matching any character
To extract all codes that begin with K and end with D, use the following pattern. The code is known to be five characters long, so three dots stand for the three characters in between:

Code:
[sam@chenwy sam]$ grep 'K...D' data.f
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
483 may 5PA1998 USP 37.00 KVM9D 644

A slight variation: the first two characters are uppercase letters, the next two are arbitrary, and the code ends with C:

Code:
[sam@chenwy sam]$ grep '[A-Z][A-Z]..C' data.f
483 Sept 5AP1996 USP 65.00 LVX2C 189
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 Nov 7PL1996 CAD 49.00 PLV2C 234

5. Date query
A common kind of query is a date query. To find all records whose storage code starts with 5 and ends with 1996 or 1998, use the pattern 5..199[68]. This means: the first character is 5, followed by two dots (any two characters), followed by 199, and the last digit is 6 or 8. No comma is needed inside the brackets; a comma there would itself be matched literally.

Code:
[sam@chenwy sam]$ grep '5..199[68]' data.f
483 Sept 5AP1996 USP 65.00 LVX2C 189
483 may 5PA1998 USP 37.00 KVM9D 644

6. Combining ranges
Ranges in [] can be combined. Suppose you want the city codes whose first character is 0-9, whose second character is between 0 and 5, and whose third character is between 0 and 6. Try the following pattern:

Code:
[sam@chenwy sam]$ grep '[0-9][0-5][0-6]' data.f
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
484 Nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234

This returns more than was wanted. The result is correct for the pattern as written, because the three digits may match anywhere in the line (512, 234, and 644 in the last column all qualify). To restrict the match to the city code, anchor the pattern to the start of the line with ^:

Code:
[sam@chenwy sam]$ grep '^[0-9][0-5][0-6]' data.f
216 sept 3ZL1998 USP 86.00 KVM9E 234

This returns the expected result.
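
The effect of the ^ anchor can be checked with a short sketch. It rebuilds data.f in a temporary file, assuming the tab-separated layout and the records listed earlier; the variable names are invented for the example.

```shell
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  48  dec  3BC1977 LPSX 68.00 LVX2A 138 \
  483 Sept 5AP1996 USP  65.00 LVX2C 189 \
  47  Oct  3ZL1998 LPSX 43.00 KVM9D 512 \
  219 dec  2CC1999 CAD  23.00 PLV2C 68 \
  484 Nov  7PL1996 CAD  49.00 PLV2C 234 \
  483 may  5PA1998 USP  37.00 KVM9D 644 \
  216 sept 3ZL1998 USP  86.00 KVM9E 234 > "$data"

# Unanchored: the three digits may match in any column.
unanchored=$(grep -c '[0-9][0-5][0-6]' "$data")

# Anchored: only the city code at the start of the line is tested.
anchored=$(grep '^[0-9][0-5][0-6]' "$data" | cut -f1)

echo "unanchored matches: $unanchored"
echo "anchored city codes: $anchored"
rm -f "$data"
```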
7. Number of occurrences of a pattern
To extract all lines in which the digit 4 appears at least twice in a row:

Code:
[sam@chenwy sam]$ grep '4\{2,\}' data.f
483 may 5PA1998 USP 37.00 KVM9D 644

The syntax above means that the digit 4 must be repeated at least twice. Note that the pattern has no boundary characters, so the repeated digits may appear anywhere in the line.
Similarly, to extract records containing 999 (three 9s in a row):

Code:
[sam@chenwy sam]$ grep '9\{3,\}' data.f
219 dec 2CC1999 CAD 23.00 PLV2C 68

To match an exact number of repetitions, use the following syntax. Here the digit 9 is repeated three times, then two times:

Code:
[sam@chenwy sam]$ grep '9\{3\}' data.f
219 dec 2CC1999 CAD 23.00 PLV2C 68
[sam@chenwy sam]$ grep '9\{2\}' data.f
483 Sept 5AP1996 USP 65.00 LVX2C 189
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
484 Nov 7PL1996 CAD 49.00 PLV2C 234
483 may 5PA1998 USP 37.00 KVM9D 644
216 sept 3ZL1998 USP 86.00 KVM9E 234

Because the pattern is not anchored, 9\{2\} matches every line that contains 99, including the line with 999.

Sometimes you need to match a number of repetitions within a range, for example a digit or letter that appears between 2 and 6 times. The following example matches the digit 8 repeated 2 to 6 times, followed by 3:

Code:
[sam@chenwy sam]$ cat myfile
83
888883
8884
88883
[sam@chenwy sam]$ grep '8\{2,6\}3' myfile
888883
88883
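
For comparison, here is the same interval written as an extended regular expression, where the braces need no backslashes. This is a sketch that feeds the four sample lines through a pipe instead of reading myfile.

```shell
# The four lines of myfile, supplied on standard input.
matches=$(printf '%s\n' 83 888883 8884 88883 | grep -E '8{2,6}3')

# "83" has only one 8 and "8884" has no 3, so both are rejected.
printf '%s\n' "$matches"
```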

8. Using grep to match "and" or "or" patterns
Add the -E option to the grep command to enable extended pattern matching. For example, to extract the lines whose city code is 219 or 216:

Code:
[sam@chenwy sam]$ grep -E '219|216' data.f
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234

9. Empty lines
Use ^ and $ together to find empty lines. Use the -c option to display the number of empty lines:

Code:
[sam@chenwy sam]$ grep -c '^$' myfile

Use the -n option to display the line numbers of the empty lines:

Code:
[sam@chenwy sam]$ grep -n '^$' myfile
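
A minimal sketch of both commands on a small file. The file contents are invented for this example: four lines, of which the second and fourth are empty.

```shell
file=$(mktemp)
printf 'first\n\nthird\n\n' > "$file"

count=$(grep -c '^$' "$file")      # how many lines are empty
numbers=$(grep -n '^$' "$file")    # "N:" for each empty line

echo "empty lines: $count"
printf '%s\n' "$numbers"
rm -f "$file"
```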

10. Matching special characters
To search for characters that have a special meaning, such as $ . ' " * [ ] ^ | \ + ?, put a backslash (\) in front of them. For example, to find all lines containing a period:

Code:
[sam@chenwy sam]$ grep '\.' myfile

Or a double quotation mark:

Code:
[sam@chenwy sam]$ grep '\"' myfile

In the same way, to search for the file name conftroll.conf (a configuration file):

Code:
[sam@chenwy sam]$ grep 'conftroll\.conf' myfile
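
To see why the backslash matters, the sketch below compares an escaped dot, an unescaped dot, and a fixed-string search with -F. The second input line, conftrollXconf, is a made-up name that only an unescaped dot would match.

```shell
# Hypothetical input: the real name plus a near miss.
names=$(printf '%s\n' 'conftroll.conf' 'conftrollXconf')

# Escaped dot: matches a literal period only.
escaped=$(printf '%s\n' "$names" | grep 'conftroll\.conf')

# Unescaped dot: matches any character, so both lines count.
unescaped=$(printf '%s\n' "$names" | grep -c 'conftroll.conf')

# -F treats the whole pattern as a fixed string; no escaping needed.
literal=$(printf '%s\n' "$names" | grep -F 'conftroll.conf')

echo "escaped:   $escaped"
echo "unescaped: $unescaped lines"
echo "fixed:     $literal"
```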

11. Matching formatted file names
A regular expression can match file names that follow a pattern. Suppose the system uses a standard naming format for text files: at most six lowercase characters, followed by a period, followed by two uppercase characters.

Code:
[sam@chenwy sam]$ grep '^[a-z]\{1,6\}\.[A-Z]\{1,2\}' filename

I am not sure whether this is written correctly. :oops:

12. Matching IP addresses
Suppose you want to look up a network address of the form nnn.nnn but have forgotten the rest, and know only that it contains two periods, as in nnn.nnn. (with the trailing period). To extract all such addresses, use [0-9]\{3\}\.[0-9]\{3\}\. The pattern means: any digit three times, followed by a period, followed by any digit three times, followed by another period.

Code:
[0-9]\{3\}\.[0-9]\{3\}\.

There is still an error here; I will correct it another day.
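
One possible way to turn the pattern into a complete command is sketched below. This is an assumption about the intent, not the original author's correction. The sample addresses are invented, and the looser {1,3} variant is added because real address fields are not always three digits long.

```shell
# Hypothetical input lines.
addrs=$(printf '%s\n' '192.168.1.10' '10.0.0.1' 'no address here')

# Strict: exactly the pattern from the text (three digits, period, twice).
three=$(printf '%s\n' "$addrs" | grep '[0-9]\{3\}\.[0-9]\{3\}\.')

# Looser extended-RE variant: one to three digits per field.
loose=$(printf '%s\n' "$addrs" | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.')

echo "strict: $three"
printf 'loose:  %s\n' $loose
```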

1. Class names

grep also accepts character class names, which allow pattern matching for international character sets.
Class names and their equivalent regular expressions:

Quote:
Class         Equivalent RE     Class         Equivalent RE
[[:upper:]]   [A-Z]             [[:alnum:]]   [0-9a-zA-Z]
[[:lower:]]   [a-z]             [[:space:]]   space or Tab
[[:digit:]]   [0-9]             [[:alpha:]]   [a-zA-Z]

Example: extract the lines containing a 5 followed by at least two uppercase letters:

Code:
$ grep '5[[:upper:]][[:upper:]]' data.f

Extract all product codes ending with P or D:

Code:
$ grep '[[:upper:]][[:upper:]][PD]' data.f
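
A short sketch of the first class-name example, run on three records from data.f (tab-separated, as assumed throughout). It prints only the third column of the matching lines so the effect of the pattern is easy to see.

```shell
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  483 Sept 5AP1996 USP  65.00 LVX2C 189 \
  47  Oct  3ZL1998 LPSX 43.00 KVM9D 512 \
  483 may  5PA1998 USP  37.00 KVM9D 644 > "$data"

# A 5 followed by two uppercase letters; "512" does not qualify.
codes=$(grep '5[[:upper:]][[:upper:]]' "$data" | cut -f3)

printf '%s\n' "$codes"
rm -f "$data"
```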

2. Matching with the * wildcard

Code:
$ cat testfile
looks
likes
looker
long

Try the following, which matches lines containing an l followed, after any number of characters, by an s:

Code:
grep "l.*s" testfile

To match a string at the end of a line, try the following pattern:

Code:
grep "ng$" testfile

This matches every line in testfile that ends with ng.

3. Using grep on system files

The passwd file

Code:
[root@linux_chenwy sam]# grep "sam" /etc/passwd
sam:x:506:4:/usr/sam:/bin/bash

The command above checks whether the /etc/passwd file contains the string sam.

If the file name is mistyped:

Code:
[root@linux_chenwy sam]# grep "sam" /etc/password
grep: /etc/password: No such file or directory

grep returns the error 'No such file or directory', which means the file name given does not exist. Use grep's -s switch to suppress the error message; the command then returns to the prompt without reporting that the file does not exist.

Code:
[root@linux_chenwy sam]# grep -s "sam" /etc/password

If your grep does not support the -s switch, use the following command instead:

Code:
[root@linux_chenwy sam]# grep "sam" /etc/password >/dev/null 2>&1

This sends both the command's output and its errors (2>&1) to the system's bit bucket. Most system administrators call /dev/null the bit bucket; think of it as a bottomless pit that never fills up.

Neither of the two examples above is ideal, because the goal here is only to know whether the search succeeded.
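
When only success or failure matters, the exit status can be tested directly. The following is a sketch using the -q and -s options; a temporary file stands in for /etc/passwd, and the variable names are assumptions of this example.

```shell
# Stand-in for /etc/passwd, holding the one record shown above.
file=$(mktemp)
printf 'sam:x:506:4:/usr/sam:/bin/bash\n' > "$file"

# -q prints nothing; the exit status alone reports the result.
if grep -q 'sam' "$file"; then found=yes; else found=no; fi

# -s also hides the "No such file" message for a missing file.
if grep -qs 'sam' "$file.missing"; then missing=yes; else missing=no; fi

echo "existing file: $found"
echo "missing file:  $missing"
rm -f "$file"
```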

To save the results of a grep command, redirect its output to a file.

Code:
[root@linux_chenwy sam]# grep "sam" /etc/passwd >/usr/sam/passwd.out
[root@linux_chenwy sam]# cat /usr/sam/passwd.out
sam:x:506:4:/usr/sam:/bin/bash

The command redirects the output to the file passwd.out under /usr/sam.

Using grep with the ps command
You can use grep with the ps x command to look for processes running on the system. The ps x command lists all the processes running on the system. To check whether the DNS server (usually called named) is running, use the following:

Code:
[root@linux_chenwy sam]# ps ax | grep "named"
2897 pts/1 s grep named

The output also includes the grep command itself, because grep creates its own process and ps x finds it. Use grep's -v option to filter the grep process out of the ps output (for example, by piping the result through grep -v "grep"). If ps x does not work on your system, use ps -ef instead. On my machine no DNS service is running, so only the grep process appears.

Using grep on a string
grep works not only on files but also on strings. To do this, echo the string and pipe it into the grep command.

Code:
[root@linux_chenwy sam]# STR="Mary Joe Peter Paine"
[root@linux_chenwy sam]# echo "$STR" | grep "Mary"
Mary Joe Peter Paine

The match succeeds.

Code:
[root@linux_chenwy sam]# echo "$STR" | grep "Simon"

Because no matching string exists, nothing is printed.

4. egrep
egrep stands for "expression grep" or "extended grep", depending on whom you ask. egrep accepts the full range of regular expressions. One notable feature is that it can read its patterns from a file, given with the -f switch (plain grep accepts -f as well). Suppose you create a file named grepstrings containing 484 and 47:

Code:
[root@linux_chenwy sam]# vi grepstrings
[root@linux_chenwy sam]# cat grepstrings
484
47

Code:
[root@linux_chenwy sam]# egrep -f grepstrings data.f
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
484 Nov 7PL1996 CAD 49.00 PLV2C 234

The command above matches all records in data.f that contain 484 or 47. The -f switch is very useful when matching a large number of patterns, since typing them all on the command line would be cumbersome.
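
A sketch of the pattern-file approach next to the inline equivalent with repeated -e options. Temporary files stand in for grepstrings and data.f, and only three records are used.

```shell
data=$(mktemp)
patterns=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  48  dec 3BC1977 LPSX 68.00 LVX2A 138 \
  47  Oct 3ZL1998 LPSX 43.00 KVM9D 512 \
  484 Nov 7PL1996 CAD  49.00 PLV2C 234 > "$data"
printf '%s\n' 484 47 > "$patterns"    # one pattern per line

# Patterns read from a file: a line matches if any pattern matches.
from_file=$(grep -f "$patterns" "$data" | cut -f1)

# The same patterns given inline with repeated -e options.
inline=$(grep -e '484' -e '47' "$data" | cut -f1)

printf '%s\n' "$from_file"
rm -f "$data" "$patterns"
```

Matches are printed in the order the lines appear in the data file, not in the order of the patterns.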

To find the storage codes 3ZL or 2CC, use the (|) symbol, which matches either of the alternatives on the two sides of "|".

Code:
[root@linux_chenwy sam]# egrep '(3ZL|2CC)' data.f
47 Oct 3ZL1998 LPSX 43.00 KVM9D 512
219 dec 2CC1999 CAD 23.00 PLV2C 68
216 sept 3ZL1998 USP 86.00 KVM9E 234

You can use any number of vertical bars "|". For example, to check whether the accounts louise, matty, or pauline are logged in to the system, run the who command and pipe its output to egrep. The pattern must be quoted so that the shell does not interpret the parentheses and bars.

Code:
$ who | egrep '(louise|matty|pauline)'

You can also exclude strings by combining -v with the ^ anchor. To list the users on the system other than matty and pauline:

Code:
$ who | egrep -v '^(matty|pauline)'

To search a set of files for shutdown, shutdowns, reboot, and reboots, egrep makes it easy:

Code:
$ egrep '(shutdown|reboot)(s)?' *
