Linux Awk notes

1. The awk language

awk is a programming language used to process data and generate reports. Awk scans a file line by line, from the first line to the last, searches for the lines that match the specified pattern, and performs the selected actions on those lines (the actions are enclosed in curly braces). An awk program consists of patterns, actions, or pattern-action pairs.

A pattern controls which input lines awk acts on. It can be a regular expression, an expression that evaluates to true or false, or a combination of the two. The regular expressions supported by awk are basically the same as those supported by egrep. In a replacement string you can also use & to stand for the text that was matched.

An action is one or more statements enclosed in curly braces and separated by semicolons (statements can also be separated by line breaks if each is on its own line).

Awk supports formatted output with print and printf. The printf format characters are the same as those in C. print appends a line break automatically, while printf does not; to get a line break with printf, the escape sequence \n is required.

2. Built-in variables

The names of awk's built-in variables are written in uppercase. They can be used in expressions and can be reassigned.

Variable name   Meaning
ARGC            number of command-line arguments
ARGV            array of command-line arguments
FILENAME        name of the current input file
NF              number of fields in the current record
NR              number of records read so far from all input files
FNR             number of records read so far from the current file
FS              input field separator; a space by default. You can also set it with the -F option on the command line
OFS             output field separator
RS              input record separator
ORS             output record separator
OFMT            output format for numbers; printf can also set the number format explicitly
SUBSEP          subscript separator; it can be used to implement multi-dimensional arrays
RLENGTH         length of the string matched by the match function
RSTART          starting position of the string matched by the match function

3. The BEGIN and END patterns

BEGIN is followed by an action block, which is executed before awk processes any input. The BEGIN action is usually used to change the values of built-in variables such as FS, RS, and OFS, to assign initial values to user-defined variables, or to print a title as part of the output.

Eg: awk 'BEGIN {FS = ":"; OFS = "\t"} {print $1, $2}' filename

The END pattern does not match any input line. The action attached to END is executed only after awk has finished processing all input lines.

Eg: awk 'END {print "The number of records is " NR}' filename

4. awk built-in functions

getline
The getline function reads input from standard input, a pipe, or a file other than the one currently being processed.

Eg: awk 'BEGIN { "date" | getline dt; print dt }' filename

system
The system function takes a shell command as its argument, runs the command, and returns the exit status to awk.

sub and gsub
The sub function finds the leftmost, longest string in the record that matches the regular expression and replaces it with the replacement string. sub takes the target string (eg: $1) as an optional third argument; if no target string is specified, the entire record is used. The gsub function replaces every match of the regular expression in the record.

sub(regular expression, substitution string [, target string])
gsub(regular expression, substitution string [, target string])

index
The index function returns the position of the first occurrence of the substring in a string; positions are counted from 1.

index(string, substring)

length
The length function returns the number of characters in a string. If no argument is given, it returns the number of characters in the current record.

length [(string)]

substr
The substr function returns the substring of a string that begins at the given starting position. The first character of the string is at position 1.
substr(string, starting position [, length of substring])

match
The match function returns the position in the string at which the regular expression matches, or 0 if there is no match. It also sets the built-in variable RSTART to the starting position of the matched substring and RLENGTH to the number of characters in the matched substring.

match(string, regular expression)

split
The split function divides a string into the elements of an array. split takes the field separator as an optional third argument; if it is not provided, the current value of FS is used.

split(string, array [, field separator])

sprintf
The sprintf function returns a string built from an expression in the specified format. The format specifications of printf are allowed.

int
The int function removes any digits after the decimal point to produce an integer.

rand
The rand function generates a pseudo-random number greater than or equal to 0 and less than 1.

srand
If srand is called without an argument, the time of day is used to seed the rand function. srand(x) uses x as the seed.
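
The BEGIN/END patterns and the built-in functions described above can be tried out with a short script. This is only a minimal sketch: the shell function names (sample, report, strings_demo, io_demo) and the name:city:score sample records are made up for illustration and are not part of the original notes.

```shell
#!/bin/sh
# Hypothetical sample input: colon-separated name:city:score records.
sample() {
  printf '%s\n' 'alice:Boston:91.7' 'bob:Austin:78.2' 'carol:Boston:85.5'
}

# BEGIN sets FS and OFS before any input is read;
# END runs once, after the last record has been processed.
report() {
  sample | awk 'BEGIN { FS = ":"; OFS = "\t" }
                { print $1, $2 }
                END { print "The number of records is " NR }'
}

# String and numeric functions applied to fixed values.
strings_demo() {
  awk 'BEGIN {
    s = "hello world"
    print index(s, "world")              # 7
    print length(s)                      # 11
    print substr(s, 1, 5)                # hello
    if (match(s, /o w/))
      print RSTART, RLENGTH              # 5 3
    n = split("a:b:c", parts, ":")
    print n, parts[2]                    # 3 b
    t = s; gsub(/o/, "0", t); print t    # every match: hell0 w0rld
    u = s; sub(/o/, "0", u);  print u    # leftmost match only: hell0 world
    print sprintf("%.2f", 3.14159)       # 3.14
    print int(3.99)                      # 3
  }'
}

# getline reading from a pipe, and system returning an exit status.
io_demo() {
  awk 'BEGIN {
    cmd = "echo from-pipe"
    cmd | getline line                   # read one line of the command output
    close(cmd)
    print line
    status = system("true")              # the true command exits with status 0
    print "exit status: " status
  }'
}

report
strings_demo
io_demo
```

Running everything inside BEGIN blocks keeps strings_demo and io_demo independent of any input file, while report shows the usual BEGIN / main / END flow on piped input.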