Reposted from http://roclinux.cn
This article originally appeared on the "Linux greenhouse" blog at http://roclinux.cn. The author is rocrocket.
===
[Start of body]
When reading scripts, you will sometimes see the sort command followed by things like -k1,2 or -k1.2 -k3.4, which can be baffling. Today, let's get the -k option sorted out once and for all!
1 Preparing the material
$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
The first field is the company name, the second field is the number of employees, and the third field is the average employee salary. (Apart from the company names, the numbers are all made up ^_^)
2 I want to sort this file alphabetically by company name, that is, by the first field (facebook.txt has three fields):
$ sort -t ' ' -k 1 facebook.txt
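To follow along, you can create the sample file with a here-document. This is a minimal sketch: it writes facebook.txt in the current directory with the fields separated by single spaces, which the -t ' ' examples rely on. The commands set LC_ALL=C (an addition, not part of the original commands) so that text keys are compared byte by byte; in other locales the order of text keys can differ.

```shell
#!/bin/sh
# Create the sample file used throughout this article.
# Fields are separated by exactly one space, matching -t ' '.
cat > facebook.txt <<'EOF'
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
EOF

# Show the file, then run the first example (sort by company name).
cat facebook.txt
LC_ALL=C sort -t ' ' -k 1 facebook.txt
```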
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500
See, -k 1 is all it takes. (Strictly speaking this is not quite precise, but you'll see why later.)
3 I want facebook.txt sorted by number of employees:
$ sort -n -t ' ' -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
No need to explain this one; I'm sure you can follow it.
There is a catch, though: Baidu and Sohu have the same number of employees, 100 each. What happens then? When the keys are equal, sort falls back to comparing the entire lines in ascending order, which in effect starts from the first field, so Baidu comes before Sohu.
4 I want facebook.txt sorted by number of employees and, for companies with the same number of employees, by average salary in ascending order:
$ sort -n -t ' ' -k 2 -k 3 facebook.txt
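The whole-line fallback can be observed by feeding sort two lines that tie on the key, in an order where the fallback changes the result. A minimal sketch assuming GNU sort: the -s (stable) option is an extra not used elsewhere in this article; it turns the fallback off, so tied lines keep their input order.

```shell
#!/bin/sh
# Sohu and Baidu tie on field 2 (100 employees each); Sohu comes first in the input.
tied='Sohu 100 4500
Baidu 100 5000'

# Default: the tie is broken by comparing the entire lines, so Baidu comes first.
printf '%s\n' "$tied" | LC_ALL=C sort -n -t ' ' -k 2

# With -s the fallback is disabled and the input order (Sohu first) is kept.
printf '%s\n' "$tied" | LC_ALL=C sort -s -n -t ' ' -k 2
```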
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
See, adding -k 2 -k 3 solves the problem. That's right: sort supports prioritized keys. It sorts by the 2nd field first and, where those are equal, by the 3rd field. (If you like, you can keep going like this and set as many levels of sort priority as you need.)
5 I want facebook.txt sorted by average salary in descending order and, where salaries are equal, by number of employees in ascending order: (this one is a bit harder)
$ sort -n -t ' ' -k 3r -k 2 facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
Here's the trick: look closely and you'll see a small letter r quietly added after -k 3. Think back to the previous article: can you work out what it means? Here is the answer: the r modifier does the same thing as the -r option, namely reverse the order. Because sort sorts in ascending order by default, adding r here makes the third field (average salary) sort in descending order. You can also add n here, which means the field is compared by numeric value, for example:
$ sort -t ' ' -k 3nr -k 2n facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
As you can see, we removed the global -n option and attached n to each -k option instead.
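One subtlety is worth knowing here: in POSIX and GNU sort, once a -k key carries its own modifier letters, the global ordering options no longer apply to that key. So in sort -n -t ' ' -k 3r -k 2, key 3 is compared as reversed text, not as a number, and the step 5 result is right only because every salary has four digits. The sketch below adds a hypothetical fifth line, Tiny 10 900, to show where the two commands part ways.

```shell
#!/bin/sh
# The sample data plus a hypothetical line whose salary has only three digits.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
Tiny 10 900'

# -k 3r has its own modifier, so it does not inherit the global -n:
# "900" is compared as text, and in reverse text order it lands on top.
printf '%s\n' "$data" | LC_ALL=C sort -n -t ' ' -k 3r -k 2

# -k 3nr compares salaries as numbers, so Tiny ends up last.
printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 3nr -k 2n
```

Attaching n to each key, as the step 5 variant does, avoids the problem entirely.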
6 The syntax of the -k option
To go further, you need a bit of theory: the syntax of the -k option, which is as follows:
FStart[.CStart][Modifier][,FEnd[.CEnd][Modifier]]
A comma (",") divides this syntax into two parts: the start part and the end part.
First, let me drill one idea into you: if you do not set the end part, the end is taken to be the end of the line. This concept is important, yet it is often overlooked.
The start part itself consists of three pieces. The Modifier piece holds the option letters we used earlier, such as n and r. Let's focus on FStart and CStart.
In FStart.CStart, FStart is the number of the field to use, and CStart is the character position within field FStart where the key begins (both count from 1).
.CStart can be omitted, in which case the key starts at the first character of the field. The -k 2 and -k 3 in the earlier examples both omit .CStart.
Likewise, the end part is FEnd.CEnd. If you omit .CEnd, the key ends at the last character of field FEnd. Setting CEnd to 0 (zero) also means the end of the field.
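A quick way to see the "end defaults to end of line" rule is to combine -k with -u (covered in step 10), which drops lines whose keys compare equal. The two input lines below are hypothetical and share only the company name.

```shell
#!/bin/sh
# Two hypothetical lines that share field 1 but differ afterwards.
lines='Sohu 90 9000
Sohu 100 4500'

# -k 1: the key runs from field 1 to the end of the line,
# so the keys differ and both lines are kept.
printf '%s\n' "$lines" | LC_ALL=C sort -t ' ' -k 1 -u

# -k 1,1: the key is field 1 only, so the keys are equal and -u keeps one line.
printf '%s\n' "$lines" | LC_ALL=C sort -t ' ' -k 1,1 -u
```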
7 On a whim, sort by company name starting from the second letter:
$ sort -t ' ' -k 1.2 facebook.txt
Baidu 100 5000
Sohu 100 4500
Google 110 5000
Guge 50 3000
Here we used -k 1.2, which sorts on the string that starts at the second character of the first field and runs to the end of the line. Baidu comes first because its second letter is a. The second letter of both Sohu and Google is o, but Sohu's h comes before Google's o, so they take second and third place respectively. Guge can only be fourth.
8 Another whim: sort only on the second letter of the company name and, where that is the same, by average salary in descending order:
$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000
Because we want to sort only on the second letter, we write -k 1.2,1.2, which means the key is "only" the second letter. (If you ask "can't I just use -k 1.2?", the answer is of course no: omitting the end part means the key runs from the second letter to the end of the line.) For the salary we use -k 3,3nr, the most precise way to say that the key is "only" the third field; if you left out the second 3, the key would be everything from the start of the third field to the end of the line.
9 Which modifiers can I use?
You can use b, d, f, i, n, or r.
You should already be familiar with n and r.
b ignores leading blanks in the field.
d sorts the field in dictionary order, that is, considering only blanks and alphanumeric characters.
f ignores case when sorting the field (lowercase letters are folded to uppercase).
i ignores nonprinting characters and sorts on the printable characters only. (Some ASCII characters are nonprinting, for example \a is the bell, \b is backspace, \n is newline, and \r is carriage return.)
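Two of these modifiers are easy to see in action. This is a minimal sketch with made-up input, run in the C locale so that plain byte order applies. Without -t, a field keeps the blanks in front of it, which is exactly what b strips; f folds lowercase to uppercase before comparing.

```shell
#!/bin/sh
# b: without -t, field 2 of "a  9" is "  9" (two blanks) and of "b 10" is " 10".
printf 'a  9\nb 10\n' | LC_ALL=C sort -k 2     # blanks count: "a  9" comes first
printf 'a  9\nb 10\n' | LC_ALL=C sort -k 2b    # blanks skipped: "10" < "9" as text, so "b 10" comes first

# f: in byte order, uppercase letters sort before lowercase ones.
printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort         # Banana, apple, cherry
printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort -k 1f   # apple, Banana, cherry
```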
10 Food for thought: combining -k with -u
$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
This is the original facebook.txt file.
$ sort -n -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
$ sort -n -k 2 -u facebook.txt
Guge 50 3000
Baidu 100 5000
Google 110 5000
When sorting on the employee field, adding -u deletes the Sohu line! It turns out that -u looks only at the key set with -k: lines whose keys are equal count as duplicates, and only the first of them is kept.
$ sort -k 1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500
$ sort -k 1.1,1.1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
The same goes for Guge: its first character is also G, the same key as Google's, so it does not escape either.
$ sort -n -k 2 -k 3 -u facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
Hey! With two levels of sort priority set here, -u deletes no lines at all. So -u weighs all the -k keys together: a line is deleted only if it is equal on every key, and a difference at any level keeps it. (If you don't believe it, add a line Sina 100 4500 and try.)
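Here is that Sina experiment as a small script, a sketch that feeds the data through a pipe instead of editing facebook.txt. Sohu and Sina tie on both keys, so -u keeps only one of them (GNU sort documents that it outputs the first line of each run of equal lines).

```shell
#!/bin/sh
# The sample data plus the extra line "Sina 100 4500" suggested in the text.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
Sina 100 4500'

# Sohu and Sina are equal on key 2 (100) and on key 3 (4500),
# so -u outputs only one of them.
printf '%s\n' "$data" | LC_ALL=C sort -n -k 2 -k 3 -u

# Count the surviving lines: one fewer than the five input lines.
printf '%s\n' "$data" | LC_ALL=C sort -n -k 2 -k 3 -u | wc -l
```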
11 The most bizarre sort:
$ sort -n -k 2.2,3.1 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000
This sorts on a key that runs from the second character of the second field to the first character of the third field.
Note that, unlike the earlier examples, this command has no -t option, so every field after the first keeps the space in front of it: field 2 of the Google line is " 110", which makes position 2.2 the digit 1 rather than the second digit. The keys are therefore "50 " for Guge, "100 " for Baidu and Sohu, and "110 " for Google, each ending with the space that begins field 3.
Because of -n, sort reads only the number at the start of each key (50, 100, 100, 110) and ignores everything after it. So Guge is clearly first and Google clearly last. But why does Baidu come before Sohu? (Try it yourself and think about it.)
The answer: here "the cross-field setting is an illusion". The part of the key that reaches into the third field never takes part in the numeric comparison, so Baidu and Sohu tie at 100, and sort falls back to comparing the entire lines, which puts Baidu in front of Sohu. One example confirms this: breaking the tie by company name in reverse order moves Sohu ahead of Baidu.
$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
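GNU sort (coreutils 8.6 and later) has a --debug option that marks, under each output line, the part of the line used by every key, which is a handy way to check explanations like the one above. A sketch assuming GNU sort; the exact annotation and the warnings it prints to standard error vary between versions, so they are not reproduced here.

```shell
#!/bin/sh
# Recreate the sample file (fields separated by single spaces).
printf '%s\n' 'Google 110 5000' 'Baidu 100 5000' 'Guge 50 3000' 'Sohu 100 4500' > facebook.txt

# Each data line is followed by underscore lines marking what each key
# (and the final whole-line comparison) actually covered.
LC_ALL=C sort --debug -n -k 2.2,3.1 facebook.txt
```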
12 Sometimes you see things like +1 -2 after the sort command. What are they?
Here is how the latest sort documentation explains this syntax:
On older systems, 'sort' supports an obsolete origin-zero syntax '+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use '-k' instead.
So this ancient syntax is obsolete, and you are entitled to look down on scripts that still use it!
(In case you run into such ancient scripts, here is a brief explanation: the plus sign marks the start part and the minus sign marks the end part. The most important point is that this syntax counts from 0: what we have been calling field 1 is field 0 here, and what we called the 2nd character is character 1 here. For example, +1 -2 is equivalent to -k 2,2. Got it?)