Shell text filtering (awk)
If you want to format reports or extract data from a large text file, awk can do the job.
To extract the information you need, the text must be structured: a field separator divides each record into fields. The separator can be any character.
The most basic feature of the awk language is to scan a file or string and extract information according to specified rules. Once awk has extracted the information, other text operations can be performed on it. Awk scripts are usually used to format the information in text files.
1. Call awk
① Command line method:
- awk [-F field-separator] 'commands' input-file(s)    # 'commands' is the actual awk program
[-F field-separator] is optional; by default awk splits fields on whitespace (spaces and tabs).
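As a minimal sketch of the command-line method, the following compares an explicit separator with the default one. The colon-separated records are made up for illustration and do not come from the article.

```shell
# Hypothetical colon-separated records, made up for this example.
records='alice:100:admin
bob:200:user'

# -F: makes the colon the field separator, so $1 is the name and $3 the role.
colon_out=$(printf '%s\n' "$records" | awk -F: '{print $1, $3}')
printf '%s\n' "$colon_out"

# Without -F, awk splits on runs of spaces and tabs.
space_out=$(printf 'one   two\tthree\n' | awk '{print $2}')
printf '%s\n' "$space_out"
```

The comma in print $1, $3 joins the two fields with a single space in the output.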
② Put all the awk commands in a file, make the file executable, and name the awk interpreter on the first line of the script; the script can then be run by typing its name.
③ Put all the awk commands in a separate file and pass that file to awk with -f:
- awk -f awk-script-file input-file(s)
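The separate-file method might look like the sketch below. The file names names.awk and grade.txt are hypothetical, and both files are created in a temporary directory so the example is self-contained.

```shell
tmpdir=$(mktemp -d)

# The awk commands live in their own file (hypothetical name).
cat > "$tmpdir/names.awk" <<'EOF'
# print the first field of every record
{ print $1 }
EOF

# A two-line input file, also hypothetical.
printf 'M.Tansley 05/99\nJ.Lulu 06/99\n' > "$tmpdir/grade.txt"

# Lowercase -f names the script file; uppercase -F would set the field separator.
script_out=$(awk -f "$tmpdir/names.awk" "$tmpdir/grade.txt")
printf '%s\n' "$script_out"

rm -rf "$tmpdir"
```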
2. awk script
An awk script, whether given on the command line or in a file, consists of patterns and actions.
Each time awk reads a record (a line), it splits the record into fields using the specified separator.
① Patterns and actions
Every awk statement consists of a pattern and an action. The pattern determines when the action is triggered; the action is the processing performed on the data. If the pattern is omitted, the action is executed for every record.
A pattern can be any conditional statement, compound expression, or regular expression.
There are two special patterns: BEGIN and END.
The BEGIN statement runs before any input is read. It is typically used to initialize counters and print headers.
The END statement runs after awk has read all the input. It is typically used to print totals and a closing status line. If no pattern is specified, awk matches every record, and if no action is specified, it prints every matching record.
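The order in which BEGIN, the per-record action, and END run can be seen in a small sketch. The three input lines and the variable name count are made up for illustration.

```shell
report=$(printf 'apple\npear\nplum\n' | awk '
BEGIN { print "Items"; count = 0 }    # runs once, before the first record
      { print $1; count++ }           # no pattern, so it runs for every record
END   { print "total: " count }       # runs once, after the last record
')
printf '%s\n' "$report"
```

The header comes first, then one line per record, then the total.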
3. Fields and records
When awk runs, the fields of each record are referred to as $1, $2, ..., $n. $n denotes the nth field and $0 denotes the entire record. To refer to several fields, separate them with commas.
To print one field or all fields, use the print command. print is an awk action; actions are enclosed in braces {}.
① Extracting fields
Example:
- M.Tansley 05/99 48311 green 8 40 44
- J.Lulu 06/99 48317 green 9 24 26
- P.Bunny 02/99 48 yellow 12 35 28
- J.Troll 07/99 4842 brown-3 12 26 26
- L.Tansley 05/99 4712 brown-2 12 30 28
First, look at the file and see how each record divides into fields.
② Saving awk output
There are two ways to save awk output at the shell prompt.
The first is to use the output redirection operator, > filename:
- awk '{print $0}' readfile > SaveFile
The second is to use the tee command, which writes the output to the screen and to the file at the same time:
- awk '{print $0}' readfile | tee SaveFile
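A sketch of the two ways of saving output side by side. The input lines and the file names saved_by_redirect and saved_by_tee are hypothetical, and everything is written to a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'one 1\ntwo 2\n' > "$tmpdir/readfile"

# Method 1: redirection writes the output to the file only.
awk '{print $1}' "$tmpdir/readfile" > "$tmpdir/saved_by_redirect"

# Method 2: tee writes to the file and also passes the output through.
shown=$(awk '{print $1}' "$tmpdir/readfile" | tee "$tmpdir/saved_by_tee")

redirect_copy=$(cat "$tmpdir/saved_by_redirect")
tee_copy=$(cat "$tmpdir/saved_by_tee")
printf '%s\n' "$shown"

rm -rf "$tmpdir"
```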
③ Using standard input
An awk script can take its input from a file argument or from standard input.
- Method 1: $ awkscript readfile
- Method 2 (redirection): $ awkscript < readfile
- Method 3 (pipe): $ cat readfile | awkscript
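The three input methods should produce identical output, which the following sketch checks with an inline awk program in place of a script file. The two input lines are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'x 1\ny 2\n' > "$tmpdir/readfile"

# Method 1: the file is named as an argument.
from_arg=$(awk '{print $2}' "$tmpdir/readfile")

# Method 2: the shell redirects the file to standard input.
from_redirect=$(awk '{print $2}' < "$tmpdir/readfile")

# Method 3: another command writes the data into a pipe.
from_pipe=$(cat "$tmpdir/readfile" | awk '{print $2}')

printf '%s\n' "$from_arg"
rm -rf "$tmpdir"
```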
④ Printing all records
- awk '{print $0}' readfile    # print the entire file
⑤ Printing individual fields
Use $1, $2, ..., $n, separating the field identifiers with commas:
- awk '{print $1, $4}' readfile    # print field 1 and field 4
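To try this against the sample records, a sketch like the following can be used. It assumes each name is written as a single field (for example M.Tansley) and stores the records in a temporary file called readfile.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/readfile" <<'EOF'
M.Tansley 05/99 48311 green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

# Field 1 is the name and field 4 is the colour.
fields=$(awk '{print $1, $4}' "$tmpdir/readfile")
printf '%s\n' "$fields"

rm -rf "$tmpdir"
```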
⑥ Printing a report header
- awk 'BEGIN {print "XXXX"} {print $1 "\t" $4}' readfile
⑦ Printing a report footer
- awk 'BEGIN {print "XXX"} {print $1} END {print "end"}' readfile
4. Regular expressions in awk
Here, a regular expression is enclosed in slashes: /string/
① Match
To match a field against a regular expression, write the field, then the '~' operator, then the regular expression. You can also use an if statement; in awk, the condition after if is enclosed in parentheses.
- awk '{if ($4 ~ /string/) print $0}' readfile    # if field 4 contains the string, print the whole record
- awk '$0 ~ /string/' readfile    # if the record contains the string, print the whole record
② Exact match
- awk '{if ($3 ~ /string/) print $0}' readfile    # matches every record whose field 3 contains the string, so it is not exact
- awk '$3 == "string" {print $0}' readfile    # matches only records whose field 3 is exactly the string
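The difference between ~ and == shows up clearly on the sample records. The sketch below assumes each name is a single field and searches field 3 for 48.

```shell
data='M.Tansley 05/99 48311 green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28'

# ~ matches anywhere in the field, so 48311, 48317 and 4842 match as well.
loose=$(printf '%s\n' "$data" | awk '$3 ~ /48/ {print $1}')

# == compares the whole field, so only the record with exactly 48 matches.
exact=$(printf '%s\n' "$data" | awk '$3 == "48" {print $1}')

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
```

Note that a single = would assign the string to $3 instead of comparing it.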
③ Non-match
- awk '{if ($4 !~ /string/) print $0}' readfile
④ Less than
- awk '{if ($6 < $7) print "XXX"}' readfile
⑤ Less than or equal to
- awk '{if ($6 <= $7) print "XXX"}' readfile
⑥ Greater than
- awk '{if ($6 > $7) print "XXX"}' readfile
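Applied to the sample records (assuming each name is a single field), the three comparisons select different subsets. This sketch prints the name instead of a fixed string so the selected records are visible.

```shell
data='M.Tansley 05/99 48311 green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28'

# Fields 6 and 7 look like numbers, so awk compares them numerically.
less=$(printf '%s\n' "$data" | awk '{if ($6 < $7) print $1}')
less_equal=$(printf '%s\n' "$data" | awk '{if ($6 <= $7) print $1}')
greater=$(printf '%s\n' "$data" | awk '{if ($6 > $7) print $1}')

printf 'less:\n%s\nless or equal:\n%s\ngreater:\n%s\n' "$less" "$less_equal" "$greater"
```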
⑦ Matching either case
To match both upper and lower case, list the alternatives inside [] brackets:
- awk '/[Gg]reen/' readfile    # match lines containing Green or green
⑧ Any character
A dot matches any single character:
- awk '$1 ~ /^...a/' readfile    # match records whose first field has a as its fourth character
⑨ Alternation (or)
When using the or operator |, the alternatives are enclosed in parentheses:
- awk '$0 ~ /(string1|string2)/' readfile    # match either of the two patterns
⑩ Start of line
- awk '/^string/' readfile
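The regular expression features above can be tried on the sample records in one go. The sketch assumes each name is a single field. The colours and the letter J are chosen only because they occur in that data.

```shell
data='M.Tansley 05/99 48311 green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28'

# [Gg] accepts either an upper-case or a lower-case g.
either_case=$(printf '%s\n' "$data" | awk '/[Gg]reen/ {print $1}')

# ^ anchors at the start of field 1; the three dots skip three characters.
fourth_a=$(printf '%s\n' "$data" | awk '$1 ~ /^...a/ {print $1}')

# (yellow|brown) matches records containing either word.
alternation=$(printf '%s\n' "$data" | awk '$0 ~ /(yellow|brown)/ {print $1}')

# ^J matches records that start with J.
line_start=$(printf '%s\n' "$data" | awk '/^J/ {print $1}')

printf '%s\n--\n%s\n--\n%s\n--\n%s\n' "$either_case" "$fourth_a" "$alternation" "$line_start"
```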
⑪ Other operators
&& And: the expressions on both sides must be true.
|| Or: at least one of the expressions on either side must be true.
! Not: inverts the result of the expression.
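A sketch of the three logical operators in compound patterns, again on the sample records with each name as a single field. The thresholds 30 and 9 are arbitrary values picked for illustration.

```shell
data='M.Tansley 05/99 48311 green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28'

# && : field 4 is green AND field 6 is above 30.
both=$(printf '%s\n' "$data" | awk '$4 == "green" && $6 > 30 {print $1}')

# || : field 4 is yellow OR field 5 is below 9.
either=$(printf '%s\n' "$data" | awk '$4 == "yellow" || $5 < 9 {print $1}')

# ! : records whose field 4 does NOT match green.
negated=$(printf '%s\n' "$data" | awk '!($4 ~ /green/) {print $1}')

printf '%s\n--\n%s\n--\n%s\n' "$both" "$either" "$negated"
```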