Text Analysis Tool: awk
I. Introduction to AWK
Awk is a powerful text analysis tool. Where grep searches and sed edits, awk is particularly strong at data analysis and report generation. Put simply, awk reads input line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes the resulting fields.
There are three common versions of awk: awk, nawk, and gawk. On Linux, awk usually means gawk, the GNU version of AWK.
The name awk comes from the first letters of its creators' surnames: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.
II. Usage
awk 'pattern {action}' {filenames}
However complex the operation, the syntax always takes this form. The pattern describes what awk looks for in the data, and the action is a series of commands executed when matching content is found. Curly braces ({}) need not always appear in a program, but they are used to group the series of commands that belong to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
The most basic function of the awk language is to browse and extract information from a file or string according to specified rules. Only after awk has extracted the information can other text operations be performed. A complete awk script is usually used to format the information in a text file.
In general, awk processes a file one line at a time. Each time awk reads a line, it executes the corresponding commands on that line.
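The pattern and action described above can be sketched with a few lines of made-up input. The sample data (names, departments, salaries) is purely hypothetical and is piped in so that the example does not depend on any file.

```shell
# Hypothetical sample data: name, department, salary.
data='alice sales 3000
bob dev 4200
carol dev 3900'

# The pattern /dev/ selects lines containing "dev";
# the action prints the first and third fields of each selected line.
dev_rows=$(printf '%s\n' "$data" | awk '/dev/ {print $1, $3}')
echo "$dev_rows"
```

Lines that do not match the pattern are read but produce no output.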
III. Ways to invoke awk
There are three ways to invoke awk:
1. Command Line
awk [-F field-separator] 'commands' input-file(s)
Here, commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each line of a file is made up of fields separated by the field separator. If -F is not given, the default field separator is whitespace.
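The effect of -F can be seen by splitting the same input with and without it. This is a minimal sketch; the two passwd-style records are hypothetical stand-ins for a real file.

```shell
# Hypothetical passwd-style records (name:password:UID).
records='root:x:0
bin:x:1'

# Without -F the line contains no whitespace, so the whole line is field 1.
default_split=$(printf '%s\n' "$records" | awk '{print $1}')

# With -F: the colon separates fields, so $1 is the name and $3 the UID.
colon_split=$(printf '%s\n' "$records" | awk -F: '{print $1, $3}')

echo "$default_split"
echo "$colon_split"
```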
2. Shell script
Put all the awk commands into a file and make the awk program executable. The awk command interpreter then serves as the first line of the script, and the program is invoked by typing the script name.
This is equivalent to the first line of a shell script: #!/bin/sh
which can be changed to: #!/bin/awk -f (the -f is needed so that awk reads the program from the script file)
3. Insert all the awk commands into a separate file, and then call:
awk -f awk-script-file input-file(s)
Here the -f option loads the awk program from awk-script-file, and input-file(s) is the same as above.
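The third method might look like the following sketch. The program file is created with mktemp only to keep the example self-contained; in practice it would be an ordinary file you maintain, and its name is arbitrary.

```shell
# Write a small awk program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the second field before the first.
{ print $2, $1 }
EOF

# -f loads the program from the file instead of the command line.
swapped=$(printf 'a 1\nb 2\n' | awk -f "$script")
echo "$swapped"

rm -f "$script"
```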
IV. Introduction to basic awk commands
Options:
-F fs: specifies the input field separator, for example -F:
-v var=value: assigns a value to a built-in or custom variable
Example 1: Use a colon as the field separator and print the first and third fields of each line (that is, the user name and UID)
# gawk -F: '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
... (remaining output omitted)
If the two fields are joined without a comma, they are printed with nothing between them; the comma is what inserts the output delimiter.
# gawk -F: '{print $1$3}' /etc/passwd
root0
bin1
daemon2
... (remaining output omitted)
These are examples of awk with only an action: with no pattern given, the action {print $1, $3} is executed for every line.
V. awk output commands: print and printf
Both print and printf are provided in awk.
5.1. print command:
Command usage: print item1, item2, ...
Usage tips:
1. Items are separated by commas, and the output field separator is inserted between them on output.
2. Each output item can be a string, a number, a field of the current record ($n), a variable, or an awk expression. Numeric values are implicitly converted to strings for output.
3. If the items after print are omitted, it is equivalent to print $0 (the whole line is output). To output an empty line, use print "".
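The usage tips above can be checked with a short sketch on a single hypothetical input line.

```shell
line='root x 0'

# Comma-separated items are joined with the output field separator (a space by default).
with_comma=$(echo "$line" | awk '{print $1, $3}')

# Items written without a comma are concatenated directly.
no_comma=$(echo "$line" | awk '{print $1 $3}')

# print with no items behaves like print $0.
whole=$(echo "$line" | awk '{print}')

echo "$with_comma"
echo "$no_comma"
echo "$whole"
```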
5.2. printf command:
Command format:
printf format, item1, item2, ...
Usage tips:
1. The format string must be specified.
2. printf does not add a line break automatically; add \n to the format yourself where one is needed.
3. The format must contain one format character for each item that follows it.
Format characters start with %, followed by a single character:
%c: a single character (a numeric item is treated as a character code);
%d, %i: a decimal integer;
%e, %E: a number in scientific notation;
%f: a floating-point number;
%g, %G: a number in scientific notation or floating-point format;
%s: a string;
%u: an unsigned integer;
%%: the % character itself
Modifiers:
#[.#]: the first # sets the display width, for example %30s; the second (.#) sets the precision after the decimal point
-: left alignment
+: displays the sign of a numeric value
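As a rough illustration of the format characters and modifiers, the sketch below formats one hypothetical record with a left-aligned string, a width-padded integer, and a float with two decimal places, then prints a signed number followed by a literal percent sign.

```shell
# %-8s  : string, left-aligned in a field 8 characters wide
# %5d   : integer, right-aligned in a field 5 characters wide
# %8.2f : float, width 8 with 2 digits after the decimal point
formatted=$(echo 'root 0 3.14159' | awk '{printf "%-8s|%5d|%8.2f\n", $1, $2, $3}')

# %+d shows the sign; %% prints a literal percent sign.
signed=$(awk 'BEGIN {printf "%+d %%\n", 5}')

echo "$formatted"
echo "$signed"
```

The \n at the end of each format is required, since printf does not add a line break on its own.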
VI. awk variables
6.1. Built-in variables
Records: Row-related
Fields: field-related
FS: input field separator. The default is whitespace.
# awk -v FS=":" '{print $1, $3}' /etc/passwd
OFS: output field separator
The delimiter placed between fields on output. The default is a single space.
# awk 'BEGIN {FS=":"; OFS="="} {print $1, $3}' /etc/passwd
RS: input record separator, the delimiter between input records. The default is a newline.
Example: Use a colon as the record separator and output the full text
# awk -v RS=":" '{print $0}' /etc/passwd
ORS: output record separator, the line separator used on output.
The default is a newline, and it can be customized.
The following reads records separated by ":" and writes them back out separated by "#":
# awk 'BEGIN {RS=":"; ORS="#"} {print $0}' /etc/passwd
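To see RS and ORS at work without /etc/passwd, here is a small sketch on a hypothetical one-line input. The input is written without a trailing newline so that the last record is exactly "c".

```shell
# Records are split on ":" when read and terminated with "#" when written.
rejoined=$(printf 'a:b:c' | awk 'BEGIN {RS=":"; ORS="#"} {print $0}')
echo "$rejoined"
```

Note that every record, including the last one, is followed by the output record separator.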
NF: number of fields in the current record
Count the number of fields in each line of the /etc/issue file:
# awk '{print NF}' /etc/issue
Note: NF is a variable reference and is written without $. $NF instead refers to the content of the field at position NF, that is, the last field.
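The difference between NF and $NF can be sketched on two hypothetical lines of different lengths.

```shell
# NF is the field count of the line; $NF is the content of the last field.
counts=$(printf 'a b c\nd e\n' | awk '{print NF, $NF}')
echo "$counts"
```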
NR: number of input records, that is, the number of the line currently being processed
If there are multiple files, the count continues across all of them.
FNR: unlike NR, FNR counts the lines processed in the current file only, and starts again for each file.
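A minimal sketch of NR versus FNR, assuming two small throwaway files created in a temporary directory (the file names one.txt and two.txt are hypothetical):

```shell
# Create two small input files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
numbers=$(awk '{print NR, FNR, $0}' "$dir/one.txt" "$dir/two.txt")
echo "$numbers"

rm -rf "$dir"
```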
ARGV: an array holding the command-line arguments, starting with the command itself. For awk '{print $0}' file1 file2, ARGV[0] holds awk.
ARGC: the number of arguments on the awk command line, counting the command itself but not the options or the program text.
For example, a command with the arguments awk, /etc/fstab, and /etc/issue has three.
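ARGC and ARGV can be inspected from a BEGIN block, as in this sketch. The operands first.txt and second.txt are hypothetical and are never opened, because a program consisting only of BEGIN reads no input. ARGV[0] is left out of the listing since its exact value depends on the awk implementation.

```shell
# ARGC counts the command name plus the two operands.
argc=$(awk 'BEGIN {print ARGC}' first.txt second.txt)

# List the operands; index 0 (the command name) is skipped.
operands=$(awk 'BEGIN {for (i = 1; i < ARGC; i++) print i, ARGV[i]}' first.txt second.txt)

echo "$argc"
echo "$operands"
```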
FILENAME: current file name
IGNORECASE: controls whether matching ignores case (a gawk extension).
6.2. Custom variables
Direct use
-v var=value: variable names are case sensitive
1. Variables can be defined in the program
2. Variables can be defined in the options (with -v)
For example:
Equivalent:
# awk -v file="passwd" '{print file, $1}' /etc/passwd
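The two ways of defining a custom variable can be compared side by side. This is a sketch on one hypothetical passwd-style line; the BEGIN assignment is an assumed form of the in-program definition, not necessarily the one the original example used.

```shell
# Defined inside the program, in a BEGIN block.
in_program=$(echo 'root:x:0' | awk -F: 'BEGIN {file = "passwd"} {print file, $1}')

# Defined on the command line with -v.
in_option=$(echo 'root:x:0' | awk -F: -v file="passwd" '{print file, $1}')

echo "$in_program"
echo "$in_option"
```

Both commands print the same line, which is why either form can be used.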