[Repost] The Linux sort command in detail

Reposted from: http://www.cnblogs.com/51linux/archive/2012/05/23/2515299.html

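sort is a very commonly used Linux command; it handles sorting. Concentrate, and in five minutes you will have sort figured out. Start now!

1 How sort works

sort treats each line of a file as one unit and compares the lines with each other. The comparison runs from the first character onward, one character at a time by ASCII value, and the lines are finally output in ascending order. (In locales other than C, the collation order of the locale is used instead of raw ASCII values.)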

$ cat seq.txt
Banana
Apple
Pear
Orange
$ sort seq.txt
Apple
Banana
Orange
Pear
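
As a quick sanity check of the "compare by ASCII value" rule, here is a minimal sketch with made-up input. It assumes LC_ALL=C is set so that sort really compares raw byte values; in other locales uppercase and lowercase letters may be interleaved instead.

```shell
# LC_ALL=C forces plain byte (ASCII) comparison, independent of the user's locale.
ascii_sorted=$(printf 'banana\nApple\npear\nOrange\n' | LC_ALL=C sort)
printf '%s\n' "$ascii_sorted"
```

Every uppercase letter has a smaller ASCII value than any lowercase letter, so Apple and Orange come out ahead of banana and pear.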

2 The -u option of sort

Its job is simple: it removes duplicate lines from the output.

$ cat seq.txt
Banana
Apple
Pear
Orange
Pear
$ sort seq.txt
Apple
Banana
Orange
Pear
Pear
$ sort -u seq.txt
Apple
Banana
Orange
Pear

Pear was ruthlessly removed by the -u option because it was a duplicate.

3 The -r option of sort

sort uses ascending order by default. To switch to descending order, just add -r.

$ cat number.txt
1
3
5
2
4
$ sort number.txt
1
2
3
4
5
$ sort -r number.txt
5
4
3
2
1

4 The -o option of sort

Because sort writes its result to standard output by default, you need redirection to write the result to a file, for example sort filename > newfile.

However, if you want to write the sorted result back to the original file, redirection will not work: the shell truncates the file before sort gets to read it.

$ sort -r number.txt > number.txt
$ cat number.txt
$
Look, number.txt has been emptied.

This is where the -o option comes in. It solves the problem and lets you safely write the result back to the original file. This may well be the only advantage -o has over redirection.

$ cat number.txt
1
3
5
2
4
$ sort -r number.txt -o number.txt
$ cat number.txt
5
4
3
2
1

5 The -n option of sort

Have you ever seen 10 sorted as smaller than 2? I certainly have. It happens because sort compares the numbers as character strings: it compares 1 with 2 first, 1 is obviously smaller, so 10 is placed before 2. That is simply how sort always works.

To change this, use the -n option to tell sort: "sort by numeric value"!

$ cat number.txt
1
10
19
11
2
5
$ sort number.txt
1
10
11
19
2
5
$ sort -n number.txt
1
2
5
10
11
19

6 The -t and -k options of sort

Suppose there is a file with the following contents:

$ cat facebook.txt
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4

The file has three columns separated by colons: the first column is the type of fruit, the second is the quantity, and the third is the price.

Now I want to sort by quantity, that is, by the second column. How can sort do that?

Fortunately, sort provides the -t option, which takes the field separator as its argument. (Does it remind you of the -d option of cut and paste? Same idea.)

Once the separator is specified, you can use -k to specify the column.

$ sort -n -k 2 -t : facebook.txt
apple:10:2.5
orange:20:3.4
banana:30:5.5
pear:90:2.3

We used the colon as the separator and sorted numerically in ascending order on the second column. The result is exactly what we wanted.
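
The same idea carries over to the price column. The following is only a sketch: it reuses the fruit data from above, feeds it in through printf instead of a file, and assumes LC_ALL=C so that the decimal point is ".".

```shell
# -k 3,3n restricts the key to the third field and compares it numerically.
by_price=$(printf 'banana:30:5.5\napple:10:2.5\npear:90:2.3\norange:20:3.4\n' |
  LC_ALL=C sort -t : -k 3,3n)
printf '%s\n' "$by_price"
```

The cheapest fruit (pear, 2.3) should come out first and the most expensive (banana, 5.5) last.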

7 Other common sort options

-f converts lowercase letters to uppercase for the comparison, i.e. it ignores case

-c checks whether the file is already sorted; if it is not, it prints a message about the first out-of-order line and returns 1

-C checks whether the file is already sorted; if it is not, it prints nothing and only returns 1

-M sorts by month; for example, JAN is less than FEB, and so on

-b ignores all leading blanks on each line, so the comparison starts at the first visible character.
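
The options above can be tried out in a few lines. This is a minimal sketch with made-up input, assuming GNU sort (or another sort that supports -C and -M) and LC_ALL=C for predictable ordering.

```shell
# -f: fold lowercase to uppercase before comparing, so "a" and "A" tie on the key
# and the byte-wise whole-line fallback puts "A" before "a".
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)
printf '%s\n' "$folded"

# -C: silent order check; only the exit status reports the result.
if printf '1\n3\n2\n' | LC_ALL=C sort -C; then order=sorted; else order=unsorted; fi
printf '%s\n' "$order"

# -M: compare three-letter month abbreviations in calendar order.
months=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"

# -b: ignore leading blanks, so the indented lines sort by their first letter.
unindented=$(printf '  b\na\n c\n' | LC_ALL=C sort -b)
printf '%s\n' "$unindented"
```

Using -c instead of -C in the second command would additionally print a diagnostic about the first out-of-order line to standard error.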

Sometimes, when reading scripts, you will find the sort command followed by things like -k1,2 or -k1.2 -k3.4, which can look baffling. Today we will get to the bottom of it: the -k option!

1 Preparing the material

$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500

The first field is the company name, the second field is the number of employees, and the third field is the average employee salary. (Apart from the company names, don't trust any of it, it is all made up ^_^)

2 I want this file sorted alphabetically by company name, that is, by the first field: (facebook.txt has three fields)

$ sort -t ' ' -k 1 facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500

See, -k 1 is all it takes. (This is not strictly accurate, but you will see why later.)

3 I want facebook.txt sorted by number of employees.

$ sort -n -t ' ' -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

No explanation needed; I'm sure you understand.

However, there is a problem: Baidu and Sohu have the same number of employees, 100 each. What happens then? When the keys tie, sort falls back to comparing the whole line in ascending order, starting from the first field, so Baidu is placed before Sohu.

4 I want facebook.txt sorted by number of employees, and where the number of employees is the same, in ascending order of average salary:

$ sort -n -t ' ' -k 2 -k 3 facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000

Look, adding -k 2 -k 3 solves the problem. Yes, sort supports this kind of setup, which assigns priorities to the sort fields: sort by the 2nd field first, and if that is equal, sort by the 3rd field. (If you like, you can keep going and set many levels of sort priority.)

5 I want facebook.txt sorted by salary in descending order, and where salaries are equal, by number of employees in ascending order: (This one is a bit harder)

$ sort -n -t ' ' -k 3r -k 2 facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

There is a little trick here. Look closely: a small letter r has been quietly added after -k 3. Thinking back to the -r option described earlier, can you guess what it does? The answer: the r modifier does the same thing as the -r option, namely reverse the order. Because sort uses ascending order by default, r is needed here to sort the third field (average salary) in descending order. You can also add n to a key, which means the field is compared by numeric value. This is in fact the safer form, because a key that carries its own modifier no longer inherits the global -n. For example:

$ sort -t ' ' -k 3nr -k 2n facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

Look, we dropped the leading -n option and added n to each -k option instead.
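
Why bother moving n into each key? Here is a small sketch with made-up data (the two lines below are not from facebook.txt), assuming LC_ALL=C. It relies on the rule that a key with its own modifier does not inherit global ordering options such as -n.

```shell
# Key 3 carries only "r": the global -n is not inherited, so "900" and "1000"
# are compared as strings, and "900" wins the reversed comparison.
string_desc=$(printf 'a 1 900\nb 1 1000\n' | LC_ALL=C sort -n -t ' ' -k 3,3r)
printf '%s\n' "$string_desc"

# Key 3 carries "nr": numeric comparison in descending order, so 1000 comes first.
numeric_desc=$(printf 'a 1 900\nb 1 1000\n' | LC_ALL=C sort -t ' ' -k 3,3nr)
printf '%s\n' "$numeric_desc"
```

In the facebook.txt example the salaries all have four digits, which is why -k 3r happened to give the right answer there.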

6 The exact syntax of the -k option

To go any further, you need a little theory. You need to understand the syntax of the -k option, which is:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ] [ Modifier ] ]

The comma (",") divides this syntax into two parts, the start part and the end part.

Let me first plant one idea in your head: if you do not set the end part, the end is taken to be the end of the line. This is important, yet it is often overlooked.

The start part itself consists of three pieces. The Modifier piece is what we described earlier: option letters such as n and r. Here we focus on FStart and .CStart.

.CStart may be omitted, in which case the key starts at the beginning of the field. The -k 2 and -k 3 in the earlier examples omit .CStart.

In FStart.CStart, FStart is the field to use and CStart is the character within that field at which the key starts.

Similarly, in the end part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, that is, at the last character of the field. Setting CEnd to 0 (zero) also means the end of the field.
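
To see what "no end part means end of line" does in practice, here is a minimal sketch with two made-up lines, assuming LC_ALL=C. The second field is the same on both lines, so only the extent of the key decides the order.

```shell
# -k 2: the key runs from field 2 to the end of the line ("1 b" versus "1 a"),
# so the third field quietly takes part in the comparison.
to_eol=$(printf 'x 1 b\ny 1 a\n' | LC_ALL=C sort -t ' ' -k 2)
printf '%s\n' "$to_eol"

# -k 2,2: the key is field 2 only; both keys are "1", so sort falls back to
# comparing the whole lines and "x 1 b" stays first.
field_only=$(printf 'x 1 b\ny 1 a\n' | LC_ALL=C sort -t ' ' -k 2,2)
printf '%s\n' "$field_only"
```

The two commands give opposite orders, which is why writing the end part explicitly is usually the safer habit.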

7 On a whim, sort starting from the second letter of the company's English name:

$ sort -t ' ' -k 1.2 facebook.txt
Baidu 100 5000
Sohu 100 4500
Google 110 5000
Guge 50 3000

Look, we used -k 1.2, which sorts on the string that starts at the second character of the first field. (Strictly speaking the key runs to the end of the line, since no end part is given, but here the company name already decides the order.) You will find that Baidu comes first because its second letter is a. The second character of both Sohu and Google is o, but the h in Sohu comes before the second o in Google, so the two are second and third respectively. Guge can only be fourth.

8 Another whim: sort only on the second letter of the company's English name, and if that is equal, by salary in descending order:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500
Guge 50 3000

Because only the second letter is to be sorted on, we write -k 1.2,1.2, which means we sort "only" on the second letter. (If you ask "why not just -k 1.2?": that will not do, because omitting the end part means the key runs from the second letter all the way to the end of the line.) For the salary we also write -k 3,3, which is the most precise form: it means we sort "only" on this field. If you omit the second 3, the key becomes "everything from the start of the 3rd field to the end of the line".

9 Which options can be used in the modifier part?

b, d, f, i, n, or r can be used.

You are surely already familiar with n and r.

b means ignore the leading blanks of this field.

d means sort the field in dictionary order (that is, only blanks, letters and digits are considered).

f means ignore case when sorting the field.

i means ignore "non-printable characters" and compare only the printable ones. (Some ASCII characters are non-printable; for example \a is the alarm, \b is backspace, \n is newline, \r is carriage return, and so on.)
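
Modifiers can differ from key to key. The sketch below uses made-up data and assumes LC_ALL=C: the first field is compared ignoring case (f), and ties are broken by the second field compared numerically (n).

```shell
# Field 1 ignores case (f); ties are broken by field 2 compared numerically (n).
mixed=$(printf 'b 2\nA 3\na 1\nB 1\n' | LC_ALL=C sort -t ' ' -k 1,1f -k 2,2n)
printf '%s\n' "$mixed"
```

Here "a 1" lands before "A 3" because the two first fields tie once case is ignored and 1 is less than 3; the same happens with "B 1" and "b 2".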

10 An example of using -k and -u together:

$ cat facebook.txt
Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500

This is the original facebook.txt file.

$ sort -n -k 2 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

$ sort -n -k 2 -u facebook.txt
Guge 50 3000
Baidu 100 5000
Google 110 5000

When we sort on the employee-count field and then add -u, the Sohu line is deleted! It turns out that -u looks only at the fields set with -k: lines whose keys are equal are treated as duplicates, and only one of them is kept.

$ sort -k 1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Guge 50 3000
Sohu 100 4500

$ sort -k 1.1,1.1 -u facebook.txt
Baidu 100 5000
Google 110 5000
Sohu 100 4500

Likewise, Guge, whose first character is also G, did not escape.

$ sort -n -k 2 -k 3 -u facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000

Hey! With two levels of sort priority set here, -u did not delete any line. It turns out that -u weighs all the -k options together: a line is deleted only if every key is the same, and a difference at any one level is enough to keep it. (If you don't believe it, add a line Sina 100 4500 and try.)
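
Here is that experiment as a minimal sketch. It assumes the four lines of facebook.txt plus the extra line Sina 100 4500, fed in through a shell variable instead of a file, with LC_ALL=C.

```shell
companies='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
Sina 100 4500'

# Sohu and Sina agree on both keys (100 and 4500), so -u keeps only one of them.
unique=$(printf '%s\n' "$companies" | LC_ALL=C sort -n -k 2 -k 3 -u)
printf '%s\n' "$unique"
```

Five lines go in and four come out: exactly one of the two "100 4500" lines survives, while Baidu is kept because its salary differs.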

11 The strangest sort:

$ sort -n -k 2.2,3.1 facebook.txt
Guge 50 3000
Baidu 100 5000
Sohu 100 4500
Google 110 5000

This asks sort to use everything from the second character of the second field through the first character of the third field as the key.

At first glance the lines above should yield 0 3 for the first, 00 5 for the second, 00 4 for the third, and 10 5 for the fourth. But if those were really the keys, why would 00 5 come before 00 4? (You can do your own experiment and think about it.)

Two things are going on. First, no -t is given here, so each field includes the blank in front of it; character 2.2 is therefore the first digit, and the keys are actually "50 ", "100 ", "100 " and "110 ". Second, and more importantly: with -n, "the cross-field setting is an illusion". A numeric comparison reads only the leading number in the key and stops at the blank, so the first character of the third field never takes part in the comparison. When 100 and 100 turn out to be equal, sort falls back to comparing the whole line, starting with the first field. Of course Baidu comes before Sohu. One example to confirm this:

$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt
Guge 50 3000
Sohu 100 4500
Baidu 100 5000
Google 110 5000
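
To see when a cross-field key really matters, compare the same key with and without -n. This is only a sketch: unlike the commands above it assumes -t ' ' is given, so that character positions are counted from the first character after the separator, and it assumes LC_ALL=C.

```shell
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'

# String comparison: the keys really are "0 3", "00 5", "00 4" and "10 5",
# so the first character of field 3 decides between Sohu and Baidu.
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2.2,3.1)
printf '%s\n' "$as_text"

# Numeric comparison: only the leading number of each key is read (0, 0, 0, 10);
# the three ties fall back to whole-line order.
as_number=$(printf '%s\n' "$data" | LC_ALL=C sort -n -t ' ' -k 2.2,3.1)
printf '%s\n' "$as_number"
```

Without -n, Sohu (00 4) sorts before Baidu (00 5), so the third field does take part; with -n it no longer does.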

12 Sometimes you see notation like +1 -2 after the sort command. What is that?

About this syntax, the latest sort documentation says:

On older systems, 'sort' supports an obsolete origin-zero syntax '+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use '-k' instead.

So this ancient notation has been retired. You can rightfully look down on scripts that still use it!

(In case you run into such ancient scripts, here is how to read it: the plus sign marks the start part and the minus sign marks the end part. The most important point is that this notation counts from 0: what we called the first field above is field 0 here, and what we called the 2nd character is character 1 here. Got it?)
