Shell text filtering (awk)

Source: Internet
Author: User

If you want to format reports or extract data from a large text file, awk can do the job.

For awk to extract the information you need, the text must be structured: a field separator divides each line into fields, and awk extracts data field by field. The separator can be any character.

awk's most basic job is to scan a file or string and extract information according to rules you specify. Once the information has been extracted, further text operations can be performed on it. awk scripts are typically used to format the information in text files.

1. Invoking awk

① From the command line:

    awk [-F field-separator] 'commands' input-file(s)

Here 'commands' are the actual awk commands.

The -F field-separator option is optional; without it, awk splits each line into fields at runs of blanks (spaces and tabs), ignoring leading and trailing blanks.
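As a sketch of the -F option, the shell function below (user_and_uid is a hypothetical name) splits colon-separated lines in /etc/passwd format. The last command shows the default whitespace splitting for comparison.

```shell
# Hypothetical helper: print fields 1 and 3 of colon-separated input.
# -F: makes the colon the field separator instead of whitespace.
user_and_uid() {
  awk -F: '{ print $1, $3 }' "$@"
}

# Two sample lines in /etc/passwd format, fed through standard input
printf 'root:x:0:0:root:/root:/bin/sh\nalice:x:1000:1000::/home/alice:/bin/sh\n' | user_and_uid
# root 0
# alice 1000

# Default separator: runs of blanks, with leading blanks ignored
echo '  one   two  three ' | awk '{ print $2 }'
# two
```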

② Put all the awk commands in a file, make the file executable, and give it a first line that names the awk interpreter (for example #!/usr/bin/awk -f; the path varies between systems). The script can then be run by typing its name.
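A minimal sketch of method ②, assuming a POSIX shell. The interpreter line is generated from command -v awk rather than hard-coded, because awk is installed in different directories on different systems; the script name first_field.awk is hypothetical.

```shell
script=./first_field.awk   # hypothetical script name

# First line: "#!<path to awk> -f", so the system runs awk -f on this file
printf '#!%s -f\n' "$(command -v awk)" > "$script"

# The awk program itself: print the first field of every input line
cat >> "$script" <<'EOF'
{ print $1 }
EOF

chmod +x "$script"

# The script can now be run by name like any other command
printf 'alpha beta\ngamma delta\n' | "$script"
# alpha
# gamma
```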

③ Put all the awk commands in a separate file and pass it to awk with the -f option:

    awk -f awk-script-file input-file(s)
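A sketch of method ③: the awk commands live in their own file and awk -f loads them. The file name swap.awk is hypothetical, and the demo runs in a scratch directory from mktemp -d so no existing file is overwritten.

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory for the demo files

# The awk program, stored in its own file (no #! line is needed with -f)
cat > swap.awk <<'EOF'
# Print field 2, a comma, then field 1
{ print $2 "," $1 }
EOF

# awk -f reads the program from swap.awk; the input comes from stdin here
printf 'a b\nc d\n' | awk -f swap.awk
# b,a
# d,c
```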


2. awk scripts

Whether the commands are typed on the command line or stored in a script file, an awk program is made up of patterns and actions.

Each time awk reads a record (by default, one line of input), it splits the record into fields using the specified separator.

① Patterns and actions

Every awk statement consists of a pattern and an action. The pattern determines when the action is triggered; the action is the processing applied to the data. If the pattern is omitted, the action is executed for every record.

A pattern can be any conditional expression, compound expression, or regular expression.

There are two special patterns: BEGIN and END.

A BEGIN block runs before any input is read. It is typically used to initialize counters and print report headers.

An END block runs after awk has read all of its input. It is typically used to print totals and a closing status line. A statement without a pattern matches every record, so its action runs once per line.
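The following sketch puts a BEGIN block, a pattern with an action, and an END block together. The function name high_scores and the name/score input are hypothetical.

```shell
# Hypothetical helper: list names whose score (field 2) is at least 80
high_scores() {
  awk '
    BEGIN     { print "Name Score"; n = 0 }  # runs once, before input is read
    $2 >= 80  { print $1, $2; n++ }          # pattern plus action
    END       { print "Total:", n }          # runs once, after the last record
  ' "$@"
}

printf 'alice 90\nbob 72\ncarol 85\n' | high_scores
# Name Score
# alice 90
# carol 85
# Total: 2
```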

 

3. Fields and records

When awk runs, the fields of the current record are referred to as $1, $2, ..., $n, where $n is the nth field and $0 is the entire record.

To print one field or the whole record, use the print command, which is an awk action. Separate multiple fields in a print statement with commas.
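To see the difference between $0, a single field, and fields printed with or without a comma, here is a sketch that runs awk on one record in the same layout as the sample file below (the name written as a single field):

```shell
line='M.Tansley 05/99 48311 Green 8 40 44'

echo "$line" | awk '{ print $0 }'      # whole record: M.Tansley 05/99 48311 Green 8 40 44
echo "$line" | awk '{ print $2 }'      # second field: 05/99

# A comma inserts the output field separator (a space by default) ...
echo "$line" | awk '{ print $1, $4 }'  # M.Tansley Green
# ... while writing the fields side by side concatenates them
echo "$line" | awk '{ print $1 $4 }'   # M.TansleyGreen
```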

① Extracting fields

Example (the file readfile used in the examples that follow):

    M.Tansley 05/99 48311 Green 8 40 44
    J.Lulu 06/99 48317 green 9 24 26
    P.Bunny 02/99 48 yellow 12 35 28
    J.Troll 07/99 4842 brown-3 12 26 26
    L.Tansley 05/99 4712 brown-2 12 30 28

awk reads each line of this file as a record and splits it into fields at whitespace, so each column can be extracted as $1, $2, and so on.

② Save the awk output

There are two ways to save the output of an awk script at the shell prompt.

The first is to redirect the output to a file with > filename:

    awk '{print $0}' readfile > SaveFile

The second is to pipe the output through the tee command, which writes it to the screen and to the file at the same time:

    awk '{print $0}' readfile | tee SaveFile
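A sketch comparing the two methods, run in a scratch directory with a two-line stand-in for readfile. SaveFile2 is a hypothetical second output name, used only so the two results can be compared with cmp.

```shell
cd "$(mktemp -d)" || exit 1        # scratch directory for the demo files
printf 'one\ntwo\n' > readfile      # small stand-in input

# Method 1: redirection; nothing appears on the screen
awk '{print $0}' readfile > SaveFile

# Method 2: tee; the lines appear on the screen and are saved to SaveFile2
awk '{print $0}' readfile | tee SaveFile2

# Both files end up with the same contents
cmp SaveFile SaveFile2 && echo 'SaveFile and SaveFile2 are identical'
```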

③ Using standard input

An awk script reads standard input when it is not given an input file name, so the following three commands are equivalent (awkscript stands for any executable awk script):

    awkscript readfile          # method 1: file name argument
    awkscript < readfile        # method 2: input redirection
    cat readfile | awkscript    # method 3: pipe
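To try the three methods without creating an executable file, the sketch below uses a shell function as a stand-in for awkscript. It passes its arguments straight to awk, so awk reads standard input whenever no file name is given.

```shell
cd "$(mktemp -d)" || exit 1       # scratch directory
printf 'a 1\nb 2\n' > readfile     # small stand-in input

# Stand-in for awkscript: print field 2 of each record.
# With no arguments, "$@" is empty and awk reads standard input.
awkscript() {
  awk '{ print $2 }' "$@"
}

awkscript readfile          # method 1
awkscript < readfile        # method 2
cat readfile | awkscript    # method 3
# each command prints 1 and 2 on separate lines
```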

④ Printing all records

    awk '{print $0}' readfile    # print the entire file

⑤ Printing individual fields

Refer to fields as $1, $2, ..., $n, and separate them with commas in the print statement:

    awk '{print $1, $4}' readfile    # print fields 1 and 4
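Applied to the sample data, the command picks out the name and the fourth column of each record. The sketch recreates readfile in a scratch directory, with each name written as a single field (for example M.Tansley) so that the color lands in $4:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory, so no real readfile is overwritten
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '{print $1, $4}' readfile
# M.Tansley Green
# J.Lulu green
# P.Bunny yellow
# J.Troll brown-3
# L.Tansley brown-2
```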

⑥ Printing a report header

    awk 'BEGIN {print "XXXX"} {print $1 "\t" $4}' readfile    # XXXX is the header text
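A sketch with a concrete header in place of XXXX. The header text Name/Color is an assumption (field 4 of the sample holds a color), and readfile is recreated in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

# BEGIN prints the header once; "\t" puts a tab between the columns
awk 'BEGIN {print "Name\tColor"} {print $1 "\t" $4}' readfile
```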

⑦ Printing a closing message

    awk 'BEGIN {print "XXX"} {print $1} END {print "end"}' readfile
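END is also where totals are usually printed. In this sketch (the function name name_report is hypothetical) each record adds field 6 to a running total, and the END block prints the sum after the last record:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

name_report() {
  awk '
    BEGIN { print "Name" }                  # before the first record
    { print $1; total += $6 }               # every record: name, running total
    END { print "Sum of field 6:", total }  # after the last record
  ' "$@"
}

name_report readfile
# Name, the five names, then: Sum of field 6: 155
```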

 

4. Regular expressions in awk

In awk, a regular expression is enclosed in slashes: /string/.
① Match

To test a field against a regular expression, write the field, then the ~ operator, then the regular expression. The test can be used directly as a pattern or as the condition of an if statement; in awk, the condition after if must be enclosed in parentheses.

    awk '{if ($4 ~ /string/) print $0}' readfile    # if field 4 matches string, print the whole record
    awk '$0 ~ /string/' readfile                    # if the record matches string, print it
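A sketch against the sample records (recreated in a scratch directory), with brown and Lulu substituted for the placeholder string:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

# Field 4 must contain "brown"; the whole record is printed
awk '{if ($4 ~ /brown/) print $0}' readfile
# J.Troll 07/99 4842 brown-3 12 26 26
# L.Tansley 05/99 4712 brown-2 12 30 28

# A pattern with no action prints the matching record by default
awk '$0 ~ /Lulu/' readfile
# J.Lulu 06/99 48317 green 9 24 26
```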

② Exact match

    awk '{if ($3 ~ /string/) print $0}' readfile    # any field 3 containing string matches: not exact
    awk '$3 == "string" {print $0}' readfile         # field 3 must equal string exactly
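A sketch of the difference on field 3 of the sample records, using 48 as the search value. An anchored regular expression is shown as a third, equivalent way to get an exact match:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

# Substring match: 48311, 48317, 48 and 4842 all contain "48"
awk '$3 ~ /48/ {print $1}' readfile      # M.Tansley, J.Lulu, P.Bunny, J.Troll

# Exact match: only a field 3 equal to 48
awk '$3 == "48" {print $1}' readfile     # P.Bunny

# Anchoring the regular expression at both ends is also exact
awk '$3 ~ /^48$/ {print $1}' readfile    # P.Bunny
```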

③ Mismatch

    awk '{if ($4 !~ /string/) print $0}' readfile    # print records whose field 4 does not match
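A sketch of !~ on the sample records: every record whose field 4 does not contain brown is kept (only the name is printed, to keep the output short):

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '{if ($4 !~ /brown/) print $1}' readfile
# M.Tansley
# J.Lulu
# P.Bunny
```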

④ Less than

    awk '{if ($6 < $7) print "XXX"}' readfile

⑤ Less than or equal to

    awk '{if ($6 <= $7) print "XXX"}' readfile

⑥ Greater than

    awk '{if ($6 > $7) print "XXX"}' readfile
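Fields 6 and 7 of the sample hold numbers, so awk compares them numerically. This sketch prints the name instead of the placeholder XXX. Note that > inside the if condition is a comparison; directly after print, > would instead redirect the output to a file.

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '{if ($6 < $7) print $1}' readfile    # M.Tansley (40 < 44), J.Lulu (24 < 26)
awk '{if ($6 <= $7) print $1}' readfile   # the same two plus J.Troll (26 <= 26)
awk '{if ($6 > $7) print $1}' readfile    # P.Bunny (35 > 28), L.Tansley (30 > 28)
```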

⑦ Matching either case

To match a word whether it starts with an upper- or lowercase letter, list both forms of the letter in a bracket expression []:

    awk '/[Gg]reen/' readfile    # match records containing Green or green
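A sketch on the sample records, where the first record contains Green and the second green. Plain regular expressions are case sensitive, so only the bracket expression catches both:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '/[Gg]reen/ {print $1}' readfile   # M.Tansley (Green) and J.Lulu (green)
awk '/green/ {print $1}' readfile      # case sensitive: J.Lulu only
```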

⑧ Matching any character

In a regular expression, a dot matches any single character, and ^ anchors the match to the beginning of the string:

    awk '$1 ~ /^...a/' readfile    # records whose first field has a as its fourth character
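A sketch on the sample records: the three dots consume the first three characters of field 1 (for example M, the period, and T), so a must be the fourth:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '$1 ~ /^...a/ {print $1}' readfile
# M.Tansley
# L.Tansley
```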

⑨ Matching one of several patterns

To match either of two (or more) patterns, separate them with | (or) and group them in parentheses:

    awk '$0 ~ /(string1|string2)/' readfile    # match records containing either string1 or string2
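A sketch on the sample records. The second command shows why the parentheses matter: they keep the ^ anchor and the literal period (written \.) outside the alternation, so the pattern means "J or L at the start, followed by a period":

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '$0 ~ /(yellow|brown)/ {print $1}' readfile   # P.Bunny, J.Troll, L.Tansley

# Names that start with J. or L.
awk '$1 ~ /^(J|L)\./ {print $1}' readfile         # J.Lulu, J.Troll, L.Tansley
```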

⑩ Matching at the start of a line

    awk '/^string/' readfile    # records that begin with string
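A sketch on the sample records. The ^ anchor applies to whatever is being matched: the whole record in the first command, field 4 in the second:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

awk '/^J/ {print $1}' readfile        # records starting with J: J.Lulu, J.Troll
awk '$4 ~ /^b/ {print $1}' readfile   # field 4 starting with b: J.Troll, L.Tansley
```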

 

Others

&& And: the conditions on both sides must be true at the same time.

|| Or: at least one of the two conditions must be true.

! Not: negates the condition that follows it.
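A sketch of the three operators on the sample records, printing only the name of each matching record:

```shell
cd "$(mktemp -d)" || exit 1    # scratch directory
cat > readfile <<'EOF'
M.Tansley 05/99 48311 Green 8 40 44
J.Lulu 06/99 48317 green 9 24 26
P.Bunny 02/99 48 yellow 12 35 28
J.Troll 07/99 4842 brown-3 12 26 26
L.Tansley 05/99 4712 brown-2 12 30 28
EOF

# &&: field 4 contains brown AND field 6 is greater than 28
awk '$4 ~ /brown/ && $6 > 28 {print $1}' readfile          # L.Tansley

# ||: field 4 contains yellow OR field 3 equals 4842
awk '$4 ~ /yellow/ || $3 == "4842" {print $1}' readfile    # P.Bunny, J.Troll

# !: records that do NOT contain brown
awk '!/brown/ {print $1}' readfile                          # M.Tansley, J.Lulu, P.Bunny
```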

 

 
