Using RS, ORS, FS, and OFS in awk on Linux

Source: Internet
Author: User


One: What is the difference between RS and ORS?
We often say that awk processes text by rows and columns, but how is a "row" defined? That is the job of RS, the record separator.
By default, RS is \n (a newline). The example below shows how RS works.
echo '1a2a3a4a5' | awk '{print $1}'
1a2a3a4a5
echo '1a2a3a4a5' | awk 'BEGIN{RS="a"}{print $1}'
1
2
3
4
5
As we can see, once RS is changed, the "rows" awk works with are no longer lines in the usual sense.
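As a quick check of the idea that RS alone decides what a record is, the sketch below numbers the records awk sees when RS is a semicolon. The sample input and the function name number_records are made up for illustration.

```shell
# number_records: hypothetical helper that prints each record with its
# record number (NR), using ";" instead of newline as the record separator.
number_records() {
    awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
}

# printf is used instead of echo so that no trailing newline ends up
# inside the last record.
printf 'alpha;beta;gamma' | number_records
# 1: alpha
# 2: beta
# 3: gamma
```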
Above, RS was a fixed string. RS can also be a regular expression (in gawk, any RS longer than one character is treated as one).
echo '1ab2bc3cd4de5' | awk 'BEGIN{RS="[a-z]+"}{print $1,RS,RT}'
1 [a-z]+ ab
2 [a-z]+ bc
3 [a-z]+ cd
4 [a-z]+ de
5 [a-z]+
When RS is a regular expression, the gawk variable RT becomes useful: RS always holds the regular expression we set, while RT holds the text that the expression actually matched for the current record.
If RS is set to the empty string, awk uses runs of blank lines as the record separator. How does that differ from setting RS to "\n\n+"?
1. Blank lines at the beginning and end of the file are ignored. If the file does not end with a blank line, the trailing newline of the last record is also removed.
2. The RT variable is not set in any predictable way (the author's tests found no pattern, so for now treat RT as unavailable in this mode).
3. It affects the FS variable.

To summarize, RS has three cases:
1) A non-empty string
The fixed string is used as the record separator, and RT is set to that same string.
2) A regular expression
The regular expression is used as the record separator, and RT is set to the text the expression actually matched.
3) The empty string
Runs of blank lines are used as the record separator. If FS is a single character, \n is also forced into the set of field separators.

Once RS is understood, ORS is simple: RS is the record separator awk uses when reading input, and ORS is the record terminator awk uses when writing output.
Put simply, whenever awk prints a record, it appends the value of ORS.
ORS can only be a string (not a regular expression). Its default value is \n.
seq 5 | awk '{print $0}'
1
2
3
4
5
seq 5 | awk 'BEGIN{ORS="a"}{print $0}'
1a2a3a4a5a
The familiar print $0 is equivalent to printf "%s%s", $0, ORS
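To see the reading side and the writing side together, here is a minimal sketch that reads blank-line-separated blocks (RS="") and writes each block followed by a divider line (ORS). The sample data and the function name divide_blocks are assumptions for illustration only.

```shell
# divide_blocks: hypothetical helper. RS="" makes each run of non-blank
# lines one record; ORS is appended after every record that print emits.
divide_blocks() {
    awk 'BEGIN { RS = ""; ORS = "\n----\n" } { print }'
}

printf 'Ann\nParis\n\n\nBob\nOslo\n' | divide_blocks
# Ann
# Paris
# ----
# Bob
# Oslo
# ----
```

Note that the two consecutive blank lines in the input count as a single separator, and that the divider also follows the last record, because ORS is a terminator rather than a separator.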
Two: What is the difference between FS and OFS?
Where RS defines what awk treats as a "row", FS defines what awk treats as a "column".
Setting the FS variable has the same effect as using the -F option.
echo '1,2' | awk -F, '{print $1}'
1
echo '1,2' | awk 'BEGIN{FS=","}{print $1}'
1
Like RS, FS can also be a regular expression.
echo '1ab2bc3cd4de5' | awk 'BEGIN{FS="[a-z]+"}{print $1,$2,$5}'
1 2 5
FS has one exception: a single space, FS=" ", which is also its default value. In the words of the awk documentation:
In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.
In this case, awk uses runs of spaces, tabs (\t), or newlines (\n) as the field separator.
So is there any difference between FS=" " and FS="[ \t\n]+"?
The answer is yes.
echo ' 1 2' | awk 'BEGIN{FS=" "}{print $1}'
1
echo ' 1 2' | awk 'BEGIN{FS="[ \t\n]+"}{print $1}'

With FS=" ", awk strips spaces, tabs (\t), and newlines (\n) from the beginning and end of the record before splitting. With FS="[ \t\n]+" it does not, so a leading separator produces an empty first field and the second command prints an empty line.
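The difference is easy to observe by counting fields. The sketch below is only an illustration; count_fields is a hypothetical helper, and the input has one leading and one trailing space.

```shell
# count_fields: hypothetical helper; prints NF for each record, using the
# field separator passed as the first argument.
count_fields() {
    awk -v FS="$1" '{ print NF }'
}

printf ' 1 2 \n' | count_fields ' '        # prints 2: edges are trimmed
printf ' 1 2 \n' | count_fields '[ \t]+'   # prints 4: empty first and last fields
```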
FS can also be set to the empty string:
echo '123' | awk 'BEGIN{FS=""}{print $1,$2}'
1 2
When FS is the empty string, awk treats each character of the record as a separate field.
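One use of per-character fields is walking a record backwards. The following is a minimal sketch with a hypothetical helper name; it assumes an awk that supports an empty FS (gawk and mawk do, but POSIX does not guarantee it).

```shell
# reverse_chars: hypothetical helper. With FS="" every character is a
# field, so joining the fields from NF down to 1 reverses the record.
reverse_chars() {
    awk 'BEGIN { FS = "" }
         { out = ""; for (i = NF; i >= 1; i--) out = out $i; print out }'
}

echo 'abc123' | reverse_chars
# 321cba
```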
Likewise, to split a record into fixed-width columns, gawk offers FIELDWIDTHS in place of FS.
For example, take the first 3 characters of a record as the first column, the next 2 characters as the second, and the next 4 as the third:
echo '123456789' | awk 'BEGIN{FIELDWIDTHS="3 2 4"}{print $1,$2,$3}'
123 45 6789
echo '123456789' | awk 'BEGIN{FIELDWIDTHS="3 2 3"}{print $1,$2,$3}'
123 45 678
echo '123456789' | awk 'BEGIN{FIELDWIDTHS="3 2 5"}{print $1,$2,$3}'
123 45 6789
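FIELDWIDTHS is a gawk extension. Where only a POSIX awk is available, roughly the same split can be sketched with substr(). This is a substitute technique, not what the examples above use, and split_fixed is a hypothetical name.

```shell
# split_fixed: hypothetical helper that cuts each record into columns of
# width 3, 2 and 4 with substr(), mirroring FIELDWIDTHS="3 2 4".
split_fixed() {
    awk '{ print substr($0, 1, 3), substr($0, 4, 2), substr($0, 6, 4) }'
}

echo '123456789' | split_fixed
# 123 45 6789
```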
With FIELDWIDTHS, if the widths add up to less than the record length, awk drops the extra characters; if they add up to more, the last field simply holds whatever characters remain.

To summarize, FS has four cases:
1) A non-empty string
The fixed string is used as the column separator.
2) A regular expression
The regular expression is used as the column separator.
3) A single space
Runs of spaces, tabs (\t), or newlines (\n) are used as the column separator.
4) The empty string
Each character becomes a separate column.

Next, let's return to the issue raised in the RS section:
When RS="", \n is forcibly added to the field separators.
cat urfile
1
a

2
a

3
awk -v RS="" '{print "#" $0 "#"}' urfile
#1
a#
#2
a#
#3#
awk -F"b" -v RS="" '{print $1}' urfile
1
2
3
awk -F"c" -v RS="" '{print $1}' urfile
1
2
3
awk -F"c" -v RS="\n\n+" '{print "#" $1 "#"}' urfile
#1
a#
#2
a#
#3
#
If FS is a single character, \n always acts as a field separator as well when RS=""; with RS="\n\n+" it does not.

With FS understood, OFS is easy: FS is the field separator awk uses when reading a record, and OFS is the field separator awk uses when writing output.
The familiar print $1,$2 is equivalent to print $1 OFS $2
echo '1 2' | awk -v OFS="|" '{print $1,$2}'
1|2
echo '1 2' | awk -v OFS="|" '{print $1 OFS $2}'
1|2
If a record has many columns and we only want to change the output separator, isn't writing print $1,$2,$3,... too much trouble?
There is, of course, a simpler way:
echo '1 2 3 4 5' | awk -v OFS="|" '{print $0}'
1 2 3 4 5
echo '1 2 3 4 5' | awk -v OFS="|" '{$1=$1;print $0}'
1|2|3|4|5
echo '1 2 3 4 5' | awk -v OFS="|" '{NF+=0;print}'
1|2|3|4|5

OFS takes effect only when awk rebuilds $0, which happens when a field (or NF) is assigned. So here we tell awk a small lie.
$1=$1 or NF+=0 does not change the value of any field; it is only there to make the OFS setting take effect.

Having understood RS and FS, let's look back at the opening sentence: "awk processes text by rows and columns."
That is not really accurate, because once RS is changed, a "row" in awk is no longer an ordinary line.
Likewise, once FS is changed, a "column" in awk is no longer an ordinary column.
So the accurate statement is: "awk processes text by records and fields."
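The record-and-field view can be tried out with a small conversion. The sketch below reads colon-separated fields and writes them comma-separated; the sample data and the name colon_to_csv are assumptions for illustration.

```shell
# colon_to_csv: hypothetical helper. FS splits the input on ":" and OFS
# joins the output with ",". The assignment $1 = $1 leaves the field
# unchanged but makes awk rebuild $0 using OFS.
colon_to_csv() {
    awk 'BEGIN { FS = ":"; OFS = "," } { $1 = $1; print }'
}

printf 'root:x:0:0\nann:x:1000:1000\n' | colon_to_csv
# root,x,0,0
# ann,x,1000,1000
```

Without the $1 = $1 assignment, print would emit the original colon-separated record unchanged.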
Three: The difference between 0 and "0"
Let's start with an example:
awk 'BEGIN{if(0) print "true"; else print "false"}'
false
awk 'BEGIN{if("0") print "true"; else print "false"}'
true
Why do the two zeros give different results?
To explain this, we only need to understand what awk treats as "true" and "false".
The following 3 cases are "false" and all other cases are "true"
1) Number 0
2) Empty string
3) undefined value
awk 'BEGIN{a=0;if(a) print "true"; else print "false"}'
false
awk 'BEGIN{a="";if(a) print "true"; else print "false"}'
false
awk 'BEGIN{if(a) print "true"; else print "false"}'
false
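The three "false" cases, and two string values that are "true", can be checked side by side. This is a small sketch; show_truth, t, and the variable name unset are hypothetical names chosen for the example.

```shell
# show_truth: hypothetical helper that prints how awk evaluates several
# values in a boolean context. The variable "unset" is never assigned.
show_truth() {
    awk 'function t(label, v) { print label ": " (v ? "true" : "false") }
    BEGIN {
        t("number zero", 0)
        t("string zero", "0")
        t("empty string", "")
        t("unset variable", unset)
        t("string space", " ")
    }'
}

show_truth
# number zero: false
# string zero: true
# empty string: false
# unset variable: false
# string space: true
```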

How does awk remove duplicate lines?
1. awk '!a[$0]++'

Before explaining it, we need to know one feature of awk:
awk gives an undefined variable an initial value based on the context in which it is used.
awk 'BEGIN{print a "1"}'
1
awk 'BEGIN{print a + 1}'
1

If an undefined variable is used in a string operation, it is treated as the empty string "".
If it is used in an arithmetic operation, it is treated as the number 0.

Now let's look at the code above. !a[$0]++ is equivalent to
if (!a[$0]++) print $0
For the first occurrence of a record, a[$0] is undefined; because the ++ that follows is an arithmetic operation, a[$0] is treated as the number 0.
Because ++ here is the postfix operator, the value is read first and incremented afterwards, so for the first occurrence the test is actually if (!0) print $0.
! negates: 0 is false, so !0 is true, and the print is executed.
For later duplicates of the record, a[$0]++ has already raised the value to 1, 2, 3, ...
and !1, !2, !3, ... are all false, so nothing is printed.

Finally, here is a snippet from Heige that prints the odd-numbered lines with awk:
seq 10 | awk 'i=!i'
1
3
5
7
9
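Both idioms rely on an unset variable starting out as 0 and on a pattern with no action printing the record when it is true. The sketch below wraps them in two hypothetical helpers, dedupe and odd_lines, so they can be tried on small inputs.

```shell
# dedupe: print each distinct line once, in order of first appearance.
# seen[$0]++ yields the old count (0 the first time), so !seen[$0]++ is
# true only for the first occurrence of a line.
dedupe() {
    awk '!seen[$0]++'
}

# odd_lines: i starts unset (false). The assignment i = !i is itself the
# pattern, so i flips to 1 on line 1, to 0 on line 2, and so on, and a
# line is printed whenever the assigned value is 1.
odd_lines() {
    awk 'i = !i'
}

printf 'a\nb\na\nc\nb\n' | dedupe
# a
# b
# c
printf '1\n2\n3\n4\n5\n6\n' | odd_lines
# 1
# 3
# 5
```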
