A big discussion of the sort command's -k option (Linux commands in five minutes, part 27)

This original article belongs to the "Linux greenhouse" blog, whose address is http://roclinux.cn. The author of the article is rocrocket.

To guard against malicious reposting by some websites, this notice is added at the top of every article. I hope readers will understand.

===

[Start of body]
Sometimes, when studying scripts, you will find the sort command followed by a bunch of things like -k1,2 or -k1.2 -k3.4, which can be baffling. Today we'll sort it out: the -k option!

1 Preparing the material

$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500

The first field is the company name, the second field is the number of employees, and the third field is the average employee wage. (Apart from the company names, don't believe any of it; the numbers are all made up ^_^)

2 I want this file sorted alphabetically by company, that is, by the first field (facebook.txt has three fields):

$ sort -t ' ' -k 1 facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500

See, just use -k 1. (This is not actually rigorous, but you'll find out why later.)

3 I want facebook.txt sorted by number of employees:

$ sort -n -t ' ' -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

No explanation needed; I'm sure you can follow.

However, there is a catch: Baidu and Sohu have the same number of employees, 100 each. What happens then? By default, sort breaks the tie by comparing the whole lines in ascending order, starting from the first field, so Baidu comes before Sohu.
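To see the tie-breaking rule at work, here is a small sketch. It assumes GNU sort and uses a reordered copy of the sample rows (Sohu moved ahead of Baidu, which is my own change so that input order and alphabetical order disagree). The -s option is not covered in this article; it simply switches the whole-line fallback off.

```shell
#!/bin/sh
# Same rows as facebook.txt, but with Sohu placed before Baidu (an assumption
# made only for this demonstration).
data='Google 110 5000
Sohu 100 4500
Guge 50 3000
Baidu 100 5000'

# The key is field 2 only, compared numerically. Baidu and Sohu tie at 100,
# so sort falls back to comparing the whole lines: Baidu comes first.
last_resort=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2,2n)

# -s (stable) disables that fallback: tied lines keep their input order,
# so Sohu stays ahead of Baidu.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t ' ' -k 2,2n)

printf '%s\n\n%s\n' "$last_resort" "$stable"
```

LC_ALL=C is set only to keep the byte-wise comparison predictable across machines.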

4 I want facebook.txt sorted by number of employees and, where the numbers are equal, by average wage in ascending order:

$ sort -n -t ' ' -k 2 -k 3 facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000

Look, adding -k 2 -k 3 solved the problem. Indeed, sort supports this kind of prioritized sorting by field: it sorts by the 2nd field first and, where that is equal, by the 3rd field. (If you like, you can keep adding keys this way and set as many sort priorities as you want.)
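The same two-level priority can be written a little more strictly by closing each key with ",N" so that it covers exactly one field. A minimal sketch, feeding the sample rows through a pipe instead of reading facebook.txt:

```shell
#!/bin/sh
# The rows of facebook.txt, inlined so the sketch is self-contained.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# First priority: field 2 (employee count), numeric.
# Second priority: field 3 (average wage), numeric, used only on ties.
by_count_then_wage=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2,2n -k 3,3n)

printf '%s\n' "$by_count_then_wage"
```

This prints the same order as the command above: Guge, Sohu, Baidu, Google.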

5 I want facebook.txt sorted by average wage in descending order and, where the wages are equal, by number of employees in ascending order (this one is a bit harder):

$ sort -n -t ' ' -k 3r -k 2 facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

There is a little trick here. Look closely: a lowercase r has quietly been added after -k 3. Thinking back to the previous article, can you work out why? The answer: the r modifier does the same thing as the -r option, namely reverse the order. Because sort uses ascending order by default, r is needed here to say that the third field (average employee wage) is sorted in descending order. You can also add n here, meaning the field is sorted by numeric value, for example:

$ sort -t ' ' -k 3nr -k 2n facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

Look, we dropped the leading -n option and attached n to each -k option instead.
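Attaching n to each key is more than a matter of taste. In GNU sort, as far as I can tell from its documentation, a key that carries its own modifier (such as r) no longer inherits the global -n, so -n -k 3r compares the third field as text. The sample wages all have four digits, which hides the difference; the two rows below are hypothetical and chosen to expose it.

```shell
#!/bin/sh
# Two made-up rows whose third fields differ in length.
rows='a 1 900
b 1 5000'

# The key has its own modifier (r), so the global -n does not apply to it.
# As text, "900" is greater than "5000", so reversed it comes first.
text_reverse=$(printf '%s\n' "$rows" | LC_ALL=C sort -n -t ' ' -k 3,3r)

# With n written on the key itself, the comparison is numeric: 5000 first.
numeric_reverse=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 3,3nr)

printf '%s\n\n%s\n' "$text_reverse" "$numeric_reverse"
```

So when a key needs both behaviours, writing nr on the key is the safer habit.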

6 The precise syntax of the -k option

To go any further, you need a little theory. You need to understand the syntax of the -k option, which is as follows:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ] [ Modifier ] ]

The comma (",") splits this syntax into two parts: the start part and the end part.

Let me first drill one idea into you: if you do not set the end part, the end is taken to be the end of the line. This is important, yet it is often overlooked.
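A quick way to convince yourself that an omitted end part really means "end of line" is to compare -k 2 with -k 2,2 on rows where only the third field differs. The two rows here are made up for the purpose.

```shell
#!/bin/sh
# Hypothetical rows: field 2 is identical, field 3 is in reverse order.
rows='a 1 z
b 1 y'

# -k 2 has no end part, so the key is "1 z" vs "1 y": row b sorts first.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 2)

# -k 2,2 stops at the end of field 2, so both keys are "1". The tie is
# broken by comparing the whole lines, and row a sorts first.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 2,2)

printf '%s\n\n%s\n' "$open_key" "$closed_key"
```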

The start part itself has three components. The Modifier is the kind of option letter we met earlier, such as n and r. Here we focus on the FStart and CStart of the start part.

CStart can be omitted, in which case the key starts at the beginning of the field. The -k 2 and -k 3 in the earlier examples both omit CStart.

In FStart.CStart, FStart is the field to use, and CStart is the character within that field at which the sort key begins.

Likewise, in the end part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, that is, at its last character. Setting CEnd to 0 (zero) also means the end of the field.
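To make the character positions concrete, the sketch below first uses cut to print the text that a key of 1.2,1 covers (cut is only a stand-in for illustration; it is not how sort works internally), then checks that writing CEnd as 0 gives the same result as omitting it.

```shell
#!/bin/sh
# The rows of facebook.txt, inlined so the sketch is self-contained.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# What -k 1.2,1 looks at: field 1 from its second character to its end.
keys=$(printf '%s\n' "$data" | cut -d ' ' -f 1 | cut -c 2-)

# .CEnd omitted: the key ends at the last character of field 1.
cend_omitted=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1)

# CEnd written as 0: also the last character of field 1.
cend_zero=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1.0)

printf '%s\n\n%s\n\n%s\n' "$keys" "$cend_omitted" "$cend_zero"
```

The keys come out as oogle, aidu, uge and ohu, and both sort commands order the rows Baidu, Sohu, Google, Guge.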

7 On a whim, let's sort starting from the second letter of the company name:

$ sort -t ' ' -k 1.2 facebook.txt
Baidu 100 5000
Sohu 100 4500
Google 110 5000
Guge 50 3000

Look, we used -k 1.2, which means sorting on the string that starts at the second character of the first field and, since no end part is given, runs to the end of the line. Baidu comes first because its second letter is a. The second character of both Sohu and Google is o, but the h that follows in Sohu comes before the second o in Google, so the two take second and third place respectively. Guge can only be fourth.

8 Another whim: sort only on the second letter of the company name and, where it is the same, by employee wage in descending order:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

Because we sort only on the second letter, we write -k 1.2,1.2, which means that "only" the second letter is used. (If you ask "can't I just use -k 1.2?", the answer is no: omitting the end part means the key runs from the second letter all the way to the end of the line.) For the employee wage we likewise write -k 3,3, which is the most precise form and means that "only" this field is used; if you omit the trailing ,3, the key becomes "everything from the start of the 3rd field to the end of the last field".

9 What options can I use in the modifier section?

b, d, f, i, n, or r can be used.

You must already be familiar with n and r.

b means that leading blanks in the field are ignored.

d means that the field is sorted in dictionary order (only blanks and alphanumeric characters are considered).

f means that case is ignored when sorting the field.

i means that non-printable characters are ignored and only printable characters are sorted. (Some ASCII characters are non-printable: for example, \a is an alert, \b is a backspace, \n is a newline, \r is a carriage return, and so on.)
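Here is a small sketch of the f and b modifiers in action. The input lines are invented for the demonstration and have nothing to do with facebook.txt; the comments describe what I would expect from GNU sort in the C locale.

```shell
#!/bin/sh
# f: fold case. Without it, uppercase sorts before lowercase in the C locale.
letters='b
A
a
B'
plain=$(printf '%s\n' "$letters" | LC_ALL=C sort)            # A B a b
folded=$(printf '%s\n' "$letters" | LC_ALL=C sort -k 1,1f)   # A a B b

# b: ignore leading blanks. With no -t, a field includes the blanks in
# front of it, so "   b" sorts before " a" unless b is given.
padded='x   b
y a'
with_blanks=$(printf '%s\n' "$padded" | LC_ALL=C sort -k 2)   # x line first
skip_blanks=$(printf '%s\n' "$padded" | LC_ALL=C sort -k 2b)  # y line first

printf '%s\n\n%s\n\n%s\n\n%s\n' "$plain" "$folded" "$with_blanks" "$skip_blanks"
```

In the folded case, A and a compare equal on the key, and the whole-line fallback then puts A first.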

10 Consider an example that combines -k and -u:

$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500

This is the original facebook.txt file.

$ sort -n -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

$ sort -n -k 2 -u facebook.txt
Guge 50 3000
Baidu 100 5000
Google 110 5000

When we sort on the employee-count field and then add -u, the Sohu line is deleted! It turns out that -u looks only at the key set with -k: lines whose key is the same are treated as duplicates and removed.

$ sort -k 1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500

$ sort -k 1.1,1.1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500

Same story here: Guge, whose first character is also G, did not escape.

$ sort -n -k 2 -k 3 -u facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000

Hey! With two levels of sort priority set, -u did not delete any line. It turns out that -u weighs all the -k options together: a line is deleted only if every key is the same, and as long as one level differs it is kept. (If you don't believe it, add a line "Sina 100 4500" and try.)
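The suggested experiment can be sketched like this, with the extra Sina row appended to the sample data. The keys are written as 2,2n and 3,3n, a slightly stricter spelling than the command above.

```shell
#!/bin/sh
# facebook.txt plus the extra row suggested in the text.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
Sina 100 4500'

# Sohu and Sina agree on both keys (100 and 4500), so -u keeps only one
# of them. Baidu differs on the second key and survives.
unique=$(printf '%s\n' "$data" | LC_ALL=C sort -u -t ' ' -k 2,2n -k 3,3n)
count=$(printf '%s\n' "$unique" | wc -l | tr -d ' ')

printf '%s\n%s lines left\n' "$unique" "$count"
```

Five lines go in and four come out, which matches the rule that every key must be equal before -u drops a line.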

11 The most bizarre sort:

$ sort -n -k 2.2,3.1 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

This asks sort to use, as the key, everything from the second character of the second field to the first character of the third field.

You might expect the four lines to yield the keys 0 3, 00 5, 00 4 and 10 5. That is not what happens, for two reasons.

First, no -t is given here, so each field includes the blanks in front of it. As the sort man page puts it, if neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. The second field of the first line is " 50", so its second character is the 5, and the keys actually extracted begin with 50, 100, 100 and 110.

Second, with -n, sort reads a number from the start of the key and stops at the first character that cannot belong to it. Whatever follows the blank never takes part in the comparison, so for a numeric sort the "cross-field" key is an illusion: the lines are simply compared as 50, 100, 100 and 110.

Baidu and Sohu tie at 100, so sort falls back to comparing the whole lines, and of course Baidu comes before Sohu. An example confirms this:

$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
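One detail worth checking for yourself is how character positions are counted when no -t is given. The sort man page states that if neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. The sketch below, which assumes GNU sort, runs the same key with and without -t ' ' on the sample rows.

```shell
#!/bin/sh
# The rows of facebook.txt, inlined so the sketch is self-contained.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# No -t: field 2 is " 50", " 100", ... so position 2.2 is the first digit.
# With -n the keys are read as 50, 100, 100, 110.
default_sep=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2.2,3.1)

# With -t ' ': position 2.2 really is the second digit, so the keys are
# "0 3", "00 5", "00 4", "10 5", which -n reads as 0, 0, 0, 10.
# The three-way tie is broken by comparing whole lines.
space_sep=$(printf '%s\n' "$data" | LC_ALL=C sort -n -t ' ' -k 2.2,3.1)

printf '%s\n\n%s\n' "$default_sep" "$space_sep"
```

The first command puts Guge on top, the second puts Baidu on top, so the separator setting changes which characters a position like 2.2 refers to.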

12 Sometimes you see things like +1 -2 after the sort command. What are they?

About this syntax, the latest sort documentation says:

On older systems, 'sort' supports an obsolete origin-zero syntax '+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use '-k' instead.

So this ancient notation has been retired, and you can rightfully look down on scripts that still use it!

(In case you do run into ancient scripts, here is how the notation works: the plus sign marks the start part and the minus sign marks the end part. The most important point is that this notation counts from 0: what we called the first field is field 0 here, and what we called the 2nd character is character 1. Got it?)
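If you have to modernize such a script, the translation might look like the sketch below. The old form appears only in a comment, because a current sort is likely to reject it; the equivalence +1 -2 = -k 2,2 follows from the zero-based counting described above.

```shell
#!/bin/sh
# The rows of facebook.txt, inlined so the sketch is self-contained.
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# Obsolete, zero-based form found in old scripts:
#   sort -n +1 -2 facebook.txt
# +1 starts at field number 1 counting from 0 (the 2nd field), and
# -2 stops before field number 2, i.e. at the end of the 2nd field.

# Modern, one-based equivalent:
modern=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2,2)

printf '%s\n' "$modern"
```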

Conclusion:

This is currently the only article on the Internet that covers the -k option of sort in reasonable depth. If you want to reprint it, please be sure to credit "Reprinted from Linux greenhouse, the Linux themed blog". Thank you :)

That is basically everything there is to say about the -k option of sort. If you have anything to add, leave a comment :) Discussion is welcome!
