Awk Basics Summary (1)

1. Getting started

Awk is well suited to text processing and report generation, and it has many carefully designed features that allow for real programming finesse. Its syntax will feel familiar: it borrows essential pieces from several languages, such as C, Python, and bash. Let's look at a first awk command to see how it works. Enter the following at the command line:

    $ awk '{ print }' /etc/passwd

You will see the contents of /etc/passwd scroll past. Here is what awk did: when we called awk, we specified /etc/passwd as its input file. As awk ran, it executed the print command once for every line of /etc/passwd, in order, and all output was sent to stdout. The result is identical to running cat /etc/passwd.

Now, about the { print } code block: in awk, curly braces group statements together, much as they do in C. The block contains just one command, print. When print appears by itself in awk, it prints the entire current line. Here is another awk example that does exactly the same thing:

    $ awk '{ print $0 }' /etc/passwd

In awk, the $0 variable holds the entire current line, so print and print $0 behave identically.

An awk program can also produce output that has nothing to do with the input data. Example 1:

    $ awk '{ print "" }' /etc/passwd

Passing the empty string "" to print prints a blank line. Run this and awk outputs one blank line for every line in /etc/passwd, which shows that awk executes the code block once per input line. Example 2:

    $ awk '{ print "hiya" }' /etc/passwd

Run this script and your screen fills with "hiya".

2. Processing multiple fields

Awk is very good at handling text that is split into multiple logical fields, and it lets you refer to each field individually from within an awk script. The following prints a list of all user accounts on the system:

    $ awk -F":" '{ print $1 }' /etc/passwd

In this example, the -F option passed when awk is invoked specifies ":" as the field separator. When awk processes the print $1 command, it prints the first field of each line of the input file. Here is another example:

    $ awk -F":" '{ print $1 $3 }' /etc/passwd

An excerpt of the output:

    halt7
    operator11
    root0
    shutdown6
    sync5
    bin1
    ...

As you can see, awk prints the first and third fields of /etc/passwd, which are the username and UID fields respectively. But the output isn't ideal: there is no space between the two fields! If you are used to programming in bash or Python, you might expect print $1 $3 to insert a space between them. However, when two strings appear next to each other in an awk program, awk concatenates them without adding anything in between. The following command inserts a space between the two fields:

    $ awk -F":" '{ print $1 " " $3 }' /etc/passwd

Called this way, print concatenates $1, " ", and $3, producing readable output. You can also insert text labels:

    $ awk -F":" '{ print "username: " $1 "\t\tuid: " $3 }' /etc/passwd

which produces output like this:

    username: halt       uid: 7
    username: operator   uid: 11
    username: root       uid: 0
    username: shutdown   uid: 6
    username: sync       uid: 5
    username: bin        uid: 1
    ...
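As a quick illustrative sketch of the same field-printing idea (assuming the standard /etc/passwd layout, where the seventh field is the login shell), you could label two different fields at once:

    $ awk -F":" '{ print "user " $1 " logs in with " $7 }' /etc/passwd

On a typical system this prints lines such as "user root logs in with /bin/bash", though the exact shells will vary.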
3. Calling external scripts

Passing the script to awk as a command-line argument is handy for one-liners, but for multi-line programs you will want to compose the script in an external file and then tell awk to use it with the -f option:

    $ awk -f myscript.awk myfile.in

Putting the script in its own text file also lets you use additional awk features. For example, this multi-line script prints the first field of each line in /etc/passwd:

    BEGIN { FS=":" }
    { print $1 }

In this script, the field separator is specified inside the code itself (by setting the FS variable) rather than with the -F":" command-line option. It's generally best to set the field separator inside the script, simply because it's one less command-line argument to type.

4. The BEGIN and END blocks

Normally, awk executes each code block of your script once for each input line. However, you may need to run initialization code before awk starts processing the text of the input file. For that, awk lets you define a BEGIN block. We used a BEGIN block in the previous example. Because awk executes the BEGIN block before it processes the input file, it is an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that the program will reference later. Awk also provides another special block, called the END block. Awk executes this block after all the lines in the input file have been processed. Typically, the END block performs final calculations or prints summary information that should appear at the end of the output stream.

5. Regular expressions and blocks

Awk lets you use regular expressions to decide whether to execute an individual code block, based on whether the expression matches the current line. To output only the lines containing the character sequence "foo":

    /foo/ { print }

A more complex example prints only lines containing a floating-point number:

    /[0-9]+\.[0-9]*/ { print }

You can also place any boolean expression before a code block to control when that block is executed; awk executes the block only when the expression evaluates to true. The following example script outputs the third field of every line whose first field equals "fred". If the first field of the current line is not "fred", awk continues processing the file without executing the print statement for that line:

    $1 == "fred" { print $3 }

Awk offers a full set of comparison operators, including "==", "<", ">", "<=", ">=", and "!=". In addition, awk provides the "~" and "!~" operators, which mean "matches" and "does not match". They are used with a variable on the left of the operator and a regular expression on the right. If the fifth field of a line contains the character sequence "root", this example prints the third field of that line:

    $5 ~ /root/ { print $3 }

6. Conditional statements

Awk also provides a very nice C-like if statement. An example:

    { if ( $5 ~ /root/ ) { print $3 } }

Here the block is executed for every input line, and the if statement decides whether the print command runs. A more complex example of an awk if statement:

    {
      if ( $1 == "foo" ) {
        if ( $2 == "foo" ) {
          print "uno"
        } else {
          print "one"
        }
      } else if ( $1 == "bar" ) {
        print "two"
      } else {
        print "three"
      }
    }

Using if statements, we can also convert this code:

    !/matchme/ { print $1 $3 $4 }

into this:

    { if ( $0 !~ /matchme/ ) { print $1 $3 $4 } }

Both scripts output only those lines that do not contain the character sequence "matchme". Awk also allows the boolean operators "||" (logical or) and "&&" (logical and) for building more complex boolean expressions:

    ( $1 == "foo" ) && ( $2 == "bar" ) { print }

This example prints only the lines whose first field equals "foo" and whose second field equals "bar".
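As a small sketch tying these pieces together (the file name countbash.awk and the /bin/bash path are illustrative choices; adjust them for your system), a script file could combine a BEGIN block, a comparison pattern, and an END block to count accounts that use bash as their login shell:

    BEGIN { FS=":"; count = 0 }        # set the separator and a counter before any input is read
    $7 == "/bin/bash" { count++ }      # runs only for lines whose seventh field matches
    END { print count " accounts use /bin/bash" }

Saved as countbash.awk, it would be run with awk -f countbash.awk /etc/passwd.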
7. Awk variables

Numeric variables. So far, we have either printed strings, the whole line, or specific fields. However, awk can also do integer and floating-point math. Using mathematical expressions, it's very easy to write a script that counts the number of blank lines in a file:

    BEGIN { x = 0 }
    /^$/  { x = x + 1 }
    END   { print "I found " x " blank lines. :)" }

In the BEGIN block, we initialize the integer variable x to zero. Then, every time awk encounters a blank line, it executes the statement x = x + 1, incrementing x. After all the lines have been processed, the END block runs and awk prints a final summary giving the number of blank lines it found.

String variables. One of the nice things about awk variables is that they are "simple and stringy". I call awk variables "stringy" because all awk variables are stored internally as strings. At the same time, they are "simple" because you can do math with them: as long as a variable contains a valid numeric string, awk handles the string-to-number conversion automatically. To see what I mean, consider this example:

    x = "1.01"    # we just set x to contain the *string* "1.01"
    x = x + 1     # we just added one to a *string*
    print x       # incidentally, these are comments :)

Awk outputs: 2.01. Even though the string value "1.01" was assigned to the variable x, we can still add one to it. We couldn't do that in bash or Python. Bash doesn't support floating-point arithmetic, and while bash variables are "stringy", they aren't "simple": to do any math, bash makes us wrap the numbers in the clumsy $(( )) construct. With Python, the string "1.01" would have to be converted to a floating-point value before any math could be done with it. That's not difficult, but it is an extra step. With awk it's all automatic, which keeps our code clean and tidy. If we wanted to square the first field of each input line and add one, we could use this script:

    { print ($1^2)+1 }

You'll also find that if a particular variable doesn't contain a valid number, awk treats it as numeric zero when evaluating the mathematical expression.

8. Operators

Awk has a complete set of mathematical operators. Besides standard addition, subtraction, multiplication, and division, awk gives us the exponent operator "^" used above, the modulo (remainder) operator "%", and a number of other handy assignment operators borrowed from C. These include increment and decrement, both pre and post (i++, --foo), and the add/subtract/multiply/divide assignment operators (a += 3, b *= 2, c /= 2.2, d -= 6.2). And that's not all: we also get the handy modulo and exponent assignment operators (a ^= 2, b %= 4).
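For a small sketch of these operators at work (again assuming the /etc/passwd layout, with the UID in the third field), the following totals and averages the UIDs using += and ++:

    BEGIN { FS=":"; total = 0; count = 0 }
    { total += $3; count++ }              # += and ++ update running totals for each line
    END { print "summed " count " UIDs, total " total ", average " total/count }

Nothing needs to be converted by hand; awk treats the numeric-looking fields as numbers automatically.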
Field separators

Awk has its own set of special variables. Some of them let you fine-tune how awk operates, while others can be read to glean valuable information about the input. We have already touched on one of them, FS. As mentioned earlier, this variable lets you set the character sequence awk should look for between fields. When we used /etc/passwd as input, we set FS to ":". That worked fine, but FS gives us even more flexibility: its value is not limited to a single character. It can be set to a regular expression specifying a character pattern of any length. If you are processing fields separated by one or more tabs, you will want to set FS like this:

    FS="\t+"

Here we use the special "+" regular-expression character, which means "one or more of the previous character". If your fields are separated by whitespace (one or more spaces or tabs), you may be tempted to set FS to this regular expression:

    FS="[[:space:]]+"

This assignment works, but it isn't necessary. Why? Because by default, FS is set to a single space character, which awk interprets as meaning "one or more spaces or tabs". In this particular example, the default FS setting is exactly what you want! Complex regular expressions are no problem either. Even if your records are separated by the word "foo" followed by three digits, the following regular expression still parses the data correctly:

    FS="foo[0-9][0-9][0-9]"

Number of fields

The next two variables aren't normally meant to be assigned to; instead they are read to gain useful information about the input. The first is the NF variable, also called the "number of fields" variable. Awk automatically sets it to the number of fields in the current record. You can use NF to display only certain input lines:

    NF == 3 { print "this particular record has three fields: " $0 }

Of course, you can also use NF in a conditional statement, like this:

    { if ( NF > 2 ) { print $1 " " $2 ": " $3 } }

9. Record number

The record number, NR, is another handy variable. It always contains the number of the current record (awk counts the first record as record number 1). Up to now, we have dealt with input files that contain one record per line; in those situations NR also tells you the current line number. However, that won't be true when we start processing multi-line records later in this series. NR can be used just like NF to print only certain input lines:

    (NR < 10) || (NR > 100) { print "we are on record number 1-9 or 101+" }

Another example:

    {
      # skip header
      if ( NR > 10 ) {
        print "ok, now for the real information!"
      }
    }

Awk provides additional variables suitable for all sorts of purposes; we will cover them in future articles.
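A quick way to watch NR and NF in action is a sketch like the following, again using /etc/passwd as input (on most systems every line should report seven fields):

    $ awk -F":" '{ print "record " NR " has " NF " fields" }' /etc/passwd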
Multi-line records

Awk is an excellent tool for reading in and processing structured data, such as the system's /etc/passwd file. /etc/passwd is the UNIX user database; it is a colon-delimited text file containing a lot of important information, including all existing user accounts and user IDs. Earlier, I showed how easily awk can parse this file: we only had to set the FS (field separator) variable to ":". With the FS variable set correctly, awk can be configured to parse almost any kind of structured data, as long as there is one record per line. However, setting FS alone is not enough if we want to parse records that span multiple lines. In those cases, we also need to modify the RS (record separator) variable, which tells awk when the current record ends and a new one begins. As an example, let's look at how to handle the address list of people involved in the federal witness protection program:

    Jimmy the Weasel
    100 Pleasant Drive
    San Francisco, CA 12345

    Big Tony
    200 Incognito Ave.
    Suburbia, WA 67890

Ideally, we would like awk to treat each three-line address as an individual record rather than as three separate records. If awk saw the first line of an address as the first field ($1), the street address as the second field ($2), and the city, state, and ZIP code as the third field ($3), the code would become very simple. Here it is:

    BEGIN { FS="\n"; RS="" }

In the code above, setting FS to "\n" tells awk that each field occupies its own line. Setting RS to "" also tells awk that each address record is separated by a blank line. Once awk knows how the input is formatted, it does all the parsing work for us, and the rest of the script is simple. Let's look at a complete script that parses the address list and prints each record on one line, with the fields separated by commas. address.awk:

    BEGIN { FS="\n"; RS="" }
    { print $1 ", " $2 ", " $3 }

If this script is saved as address.awk and the address data is stored in the file address.txt, you can execute it by typing "awk -f address.awk address.txt". The output:

    Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345
    Big Tony, 200 Incognito Ave., Suburbia, WA 67890

OFS and ORS

In address.awk's print statement, you can see that awk concatenates (joins) strings that are placed next to each other on a line. We used that feature to insert a comma and a space (", ") between the three fields on each line. This trick works, but it's a bit ugly. Rather than inserting literal ", " strings between the fields, it would be nicer to set a special awk variable, OFS, and let awk do the job for us. Consider this print statement:

    print "Hello", "there", "Jim!"

The commas on this line are not part of the actual literal strings. Instead, they tell awk that "Hello", "there", and "Jim!" are separate fields, and that the OFS variable should be printed between each string. By default, awk produces this output:

    Hello there Jim!

This shows the default value of OFS, a single space. However, OFS can easily be redefined so that awk inserts our favorite field separator. Here is a revised version of the original address.awk program that uses OFS to output those intermediate ", " strings:

    BEGIN { FS="\n"; RS=""; OFS=", " }
    { print $1, $2, $3 }

Awk also has a special variable called ORS, the "output record separator". By setting ORS, which defaults to a newline ("\n"), we control the character that is automatically printed at the end of each print statement. The default ORS value makes awk output each print statement on a new line. If we wanted the output to be double-spaced, we would set ORS to "\n\n". Or, if we wanted records to be separated by a single space (and no newline), we would set ORS to " ".
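As a sketch of OFS and ORS working together on familiar input (the " -> " separator and the double spacing are arbitrary choices for illustration), this prints each username and UID from /etc/passwd, double-spaced:

    BEGIN { FS=":"; OFS=" -> "; ORS="\n\n" }
    { print $1, $3 }

Each line of output looks like "root -> 0", followed by a blank line because of the doubled ORS.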
Converting the multi-line format

Suppose we write a script that converts the address list to a one-line-per-record, tab-delimited format for importing into a spreadsheet. After using a slightly modified version of address.awk, it becomes clear that the program only works for three-line addresses. If awk encountered the following address, the fourth line would be thrown away and never printed:

    Cousin Vinnie
    Vinnie's Auto Shop
    300 City Alley
    Sosueme, OR 76543

To handle situations like this, the code should take the number of fields in each record into account and print every field in order; right now it prints only the first three fields of each address. Here is the code we want, suitable for addresses with any number of fields:

    BEGIN { FS="\n"; RS=""; ORS="" }
    {
      x = 1
      while ( x < NF ) {
        print $x "\t"
        x++
      }
      print $NF "\n"
    }

First, the field separator FS is set to "\n" and the record separator RS to "", so that awk parses the multi-line addresses correctly, just as before. Then, the output record separator ORS is set to "", which makes the print statement output no newline at the end of each call. This means that whenever we want text to start on a new line, we need to write print "\n" explicitly. In the main code block, a variable x is created to hold the number of the field currently being processed; initially it is set to 1. We then use a while loop (an awk looping construct equivalent to the while loop in C) to repeatedly print each field followed by a tab character, for every field except the last. Finally, the last field is printed followed by a newline; because ORS is set to "", print will not supply that newline for us. The program output looks like this, which is exactly what we wanted (not beautiful, but tab-delimited for easy import into a spreadsheet):

    Jimmy the Weasel        100 Pleasant Drive      San Francisco, CA 12345
    Big Tony        200 Incognito Ave.      Suburbia, WA 67890
    Cousin Vinnie   Vinnie's Auto Shop      300 City Alley  Sosueme, OR 76543
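An equivalent variation, sketched here using awk's C-style for loop (which this article hasn't formally covered), builds each output line first and lets the default ORS newline do the rest:

    BEGIN { FS="\n"; RS="" }
    {
      line = $1
      for (x = 2; x <= NF; x++) {
        line = line "\t" $x           # append a tab and the next field
      }
      print line                      # default ORS supplies the newline
    }

Fed the same address.txt, this produces the same tab-delimited output as the while-loop version above.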


This article is from the "Sunset Yuhui" blog; please keep this source: http://whluwit.blog.51cto.com/2306565/1438156
