The Linux sort command in detail: how it works, with a tutorial

Source: Internet
Author: User

sort is a very commonly used command on Linux. Concentrate for five minutes and you will have it down. Let's start!

1 How sort works

sort treats each line of the file as a unit and compares the lines with one another. The comparison runs from the first character onward, by ASCII code value (strictly speaking, by the collation order of the current locale, which is plain byte order under LC_ALL=C), and the lines are finally output in ascending order.

[rocrocket@rocrocket programming]$ cat seq.txt

Banana

Apple

Pear

Orange

[rocrocket@rocrocket programming]$ sort seq.txt

Apple

Banana

Orange

Pear
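
The "ASCII code value" ordering is easiest to see when uppercase and lowercase letters are mixed. The following is a small sketch, not part of the original article: the input is made up, and LC_ALL=C is set here (an assumption of this example) so that sort compares plain byte values.

```shell
# Hypothetical input mixing uppercase and lowercase first letters.
# LC_ALL=C makes sort compare raw byte values, so every uppercase
# letter (65-90) sorts before every lowercase letter (97-122).
c_order=$(printf 'banana\nApple\npear\nOrange\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

In other locales the order may differ, because sort then follows that locale's collation rules.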

2 The -u option

Its job is very simple: it removes duplicate lines from the output.

[rocrocket@rocrocket programming]$ cat seq.txt

Banana

Apple

Pear

Orange

Pear

[rocrocket@rocrocket programming]$ sort seq.txt

Apple

Banana

Orange

Pear

Pear

[rocrocket@rocrocket programming]$ sort -u seq.txt

Apple

Banana

Orange

Pear

The duplicate Pear was ruthlessly removed by the -u option.
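
When whole lines are compared, sort -u should give the same lines as piping plain sort through uniq. A minimal sketch with the same fruit names (the variable names are only for illustration, and LC_ALL=C is an assumption made for reproducibility):

```shell
fruits='Banana
Apple
Pear
Orange
Pear'
# One step: sort and drop duplicate lines.
with_u=$(printf '%s\n' "$fruits" | LC_ALL=C sort -u)
# Two steps: sort first, then let uniq drop adjacent duplicates.
with_uniq=$(printf '%s\n' "$fruits" | LC_ALL=C sort | uniq)
printf '%s\n' "$with_u"
```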

3 The -r option

sort sorts in ascending order by default; if you want descending order, just add -r.

[rocrocket@rocrocket programming]$ cat number.txt

1

3

5

2

4

[rocrocket@rocrocket programming]$ sort number.txt

1

2

3

4

5

[rocrocket@rocrocket programming]$ sort -r number.txt

5

4

3

2

1

4 The -o option

Because sort writes its results to standard output by default, you need redirection to write them to a file, in the form sort filename > newfile.

However, if you want to write the sorted results back to the original file, redirection will not work.

[rocrocket@rocrocket programming]$ sort -r number.txt > number.txt

[rocrocket@rocrocket programming]$ cat number.txt

[rocrocket@rocrocket programming]$

See, number.txt has been emptied: the shell truncates the file for the redirection before sort gets a chance to read it.

This is where the -o option comes in. It solves the problem and lets you safely write the results back to the original file. This may well be the only advantage -o has over redirection.

[rocrocket@rocrocket programming]$ cat number.txt

1

3

5

2

4

[rocrocket@rocrocket programming]$ sort -r number.txt -o number.txt

[rocrocket@rocrocket programming]$ cat number.txt

5

4

3

2

1
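
The difference between redirection and -o can be reproduced safely in a scratch directory. This is only a sketch: the directory comes from mktemp, and copy.txt is a hypothetical file name introduced for the demonstration.

```shell
# Work in a throwaway directory (created here only for the demo).
dir=$(mktemp -d)
printf '1\n3\n5\n2\n4\n' > "$dir/number.txt"
cp "$dir/number.txt" "$dir/copy.txt"

# Redirection: the shell truncates copy.txt before sort reads it,
# so sort sees an empty file and writes nothing back.
sort -r "$dir/copy.txt" > "$dir/copy.txt"
if [ -s "$dir/copy.txt" ]; then emptied=no; else emptied=yes; fi

# -o: sort reads all of its input first and only then writes the file.
sort -r -o "$dir/number.txt" "$dir/number.txt"
in_place=$(cat "$dir/number.txt")

echo "copy.txt emptied by redirection: $emptied"
printf '%s\n' "$in_place"
rm -r "$dir"
```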

5 The -n option

Have you ever seen 10 sorted as smaller than 2? I certainly have. This happens because sort compares the numbers as characters: it compares 1 with 2, finds that 1 is smaller, and so puts 10 before 2. That is simply how sort behaves by default.

To change this, use the -n option to tell sort to compare by numeric value.

[rocrocket@rocrocket programming]$ cat number.txt

1

10

19

11

2

5

[rocrocket@rocrocket programming]$ sort number.txt

1

10

11

19

2

5

[rocrocket@rocrocket programming]$ sort -n number.txt

1

2

5

10

11

19
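
The same contrast shows up with negative numbers and decimals, which -n also understands. A small sketch with made-up values, assuming the C locale so that the decimal point is a period:

```shell
nums=$(printf '10\n-3\n2.5\n2\n')
# Character order: '-' < '1' < '2' as bytes, and "2" is a prefix of "2.5".
as_text=$(printf '%s\n' "$nums" | LC_ALL=C sort)
# Numeric order: -n reads a leading minus sign and a decimal point.
as_number=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)
echo "as text:";   printf '%s\n' "$as_text"
echo "as number:"; printf '%s\n' "$as_number"
```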

6 The -t and -k options

Suppose the contents of a file look like this:

[rocrocket@rocrocket programming]$ cat facebook.txt

banana:30:5.5

apple:10:2.5

pear:90:2.3

orange:20:3.4

This file has three columns separated by colons: the first column is the type of fruit, the second the quantity, and the third the price.

Suppose I want to sort by quantity, that is, by the second column. How do I do that with sort?

Luckily, sort provides the -t option, which is followed by the field separator. (Does that remind you of the -d option of cut and paste? It should.)

Once the separator is specified, you can use -k to specify the column.

[rocrocket@rocrocket programming]$ sort -n -k 2 -t : facebook.txt

apple:10:2.5

orange:20:3.4

banana:30:5.5

pear:90:2.3

We used the colon as the separator and sorted the second column in ascending numeric order. The result is just what we wanted.
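
The same idea works for the price column. The sketch below is an illustration rather than part of the original walkthrough; it writes the key as 3,3n, a form explained later in this article (the key is limited to field 3 and compared numerically).

```shell
fruit='banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4'
# -t : sets the separator; -k 3,3n restricts the key to field 3
# and compares it by numeric value.
by_price=$(printf '%s\n' "$fruit" | LC_ALL=C sort -t : -k 3,3n)
printf '%s\n' "$by_price"
```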

7 Other common sort options

-f converts lowercase letters to uppercase for comparison, that is, it ignores case

-c checks whether the file is already sorted; if it is not, it prints information about the first out-of-order line and returns 1

-C checks whether the file is already sorted; if it is not, it prints nothing and only returns 1

-M sorts by month, for example JAN is less than FEB, and so on

-b ignores all leading blanks on each line and starts comparing at the first visible character.

Sometimes when reading scripts, you will see the sort command followed by things like -k1,2 or -k1.2 -k3.4, which look rather strange. Today we will get to the bottom of the -k option!

1 Preparing the material

$ cat facebook.txt
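
A few of these options can be tried directly on made-up input. This is a sketch under the assumption of the C locale (so that month abbreviations are the English ones); the variable names are illustrative only.

```shell
# -c reports the first out-of-order line on stderr and exits with 1.
check_status=0
printf '1\n3\n2\n' | LC_ALL=C sort -c 2>/dev/null || check_status=$?
# -C performs the same check silently; only the exit status tells.
sorted_status=0
printf '1\n2\n3\n' | LC_ALL=C sort -C || sorted_status=$?
# -M compares month-name abbreviations.
months=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)
echo "unsorted input: $check_status, sorted input: $sorted_status"
printf '%s\n' "$months"
```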

Google 110 5000

Baidu 100 5000

Guge 50 3000

Sohu 100 4500

The first field is the company name, the second field is the number of employees, and the third field is the average salary. (Apart from the company names, everything else is made up ^_^)

2 I want this file sorted alphabetically by company name, that is, by the first field: (this facebook.txt file has three fields)

$ sort -t ' ' -k 1 facebook.txt

Baidu 100 5000

Google 110 5000

Guge 50 3000

Sohu 100 4500

See, -k 1 is all it takes. (Strictly speaking this is not quite accurate, as you will see later.)

3 I want facebook.txt sorted by number of employees

$ sort -n -t ' ' -k 2 facebook.txt

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

No explanation needed; I'm sure you understand.

However, there is a catch: Baidu and Sohu have the same number of employees, 100 each. What happens then? By default, sort breaks the tie by comparing the whole line in ascending order, which here means the first field, so Baidu comes before Sohu.

4 I want facebook.txt sorted by number of employees, and where the number of employees is the same, by average salary in ascending order:

$ sort -n -t ' ' -k 2 -k 3 facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000

See, we solved the problem by adding -k 2 -k 3. sort supports setting sort priorities like this: it sorts by the 2nd field first and, where that is equal, by the 3rd field. (If you like, you can keep going and set as many sort priorities as you want.)

5 I want facebook.txt sorted in descending order of salary, and where salaries are the same, in ascending order of number of employees: (this one is a bit harder)

$ sort -n -t ' ' -k 3r -k 2 facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

There is a little trick here. Look closely: a lowercase r has been sneaked in after -k 3. Thinking back to the previous article, can you work out what it does? The answer: the r modifier works just like the -r option, that is, it reverses the order. Because sort sorts in ascending order by default, the r is needed here to sort the third field (average salary) in descending order. A key that carries its own modifier no longer inherits the global -n, so it is safer to add n to the key as well, which means that field is compared by numeric value, for example:

$ sort -t ' ' -k 3nr -k 2n facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

See, we removed the leading -n option and added n to each -k option instead.
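
Why the n belongs on the key itself becomes visible once the salaries have different numbers of digits. The two rows below are hypothetical; the sketch relies on the rule that a key with its own modifier does not inherit global ordering options.

```shell
pay='A 1 900
B 1 5000'
# The key "3r" carries its own modifier, so it does not inherit the
# global -n: field 3 is compared as text, and "900" > "5000" as text.
as_text=$(printf '%s\n' "$pay" | LC_ALL=C sort -n -t ' ' -k 3r)
# "3nr" asks for numeric and reversed comparison on the key itself.
as_number=$(printf '%s\n' "$pay" | LC_ALL=C sort -t ' ' -k 3nr)
echo "with -n -k 3r:"; printf '%s\n' "$as_text"
echo "with -k 3nr:";   printf '%s\n' "$as_number"
```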

6 The exact syntax of the -k option

To go any further, you need a little theory. You need to understand the syntax of the -k option, which is as follows:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ] [ Modifier ] ]

This syntax is divided by the comma (",") into two parts: the Start part and the End part.

The first idea to drill into your head is this: "if you do not set the End part, it is taken to be the end of the line." This concept is important, yet it is often overlooked.

The Start part itself consists of three parts. The Modifier part is what we mentioned earlier, such as the n and r modifiers. Let's focus on the FStart and CStart of the Start part.

CStart can be omitted, in which case the key starts at the beginning of the field. The -k 2 and -k 3 in the earlier examples are cases where CStart is omitted.

In FStart.CStart, FStart is the field to use, and CStart is the character position within the FStart field at which the sort key begins.

Similarly, in the End part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, that is, at the field's last character. Likewise, if you set CEnd to 0 (zero), the key also ends at the end of the field.
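
The rule that a missing End part means "to the end of the line" can be checked with two made-up rows whose second field is identical. This is a minimal sketch, with the C locale assumed for reproducibility.

```shell
rows='x 1 b
y 1 a'
# No End part: the key runs from field 2 to the end of the line,
# so "1 a" sorts before "1 b".
to_eol=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 2)
# End part 2: the key is field 2 only; both keys are "1", and sort
# falls back to comparing the whole lines.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 2,2)
echo "-k 2:";   printf '%s\n' "$to_eol"
echo "-k 2,2:"; printf '%s\n' "$field_only"
```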

7 On a whim, sort starting from the second letter of the company name:

$ sort -t ' ' -k 1.2 facebook.txt

Baidu 100 5000

Sohu 100 4500

Google 110 5000

Guge 50 3000

Look, we used -k 1.2, which means sorting on the string that starts at the second character of the first field; since no End part is given, the key runs on to the end of the line. You will find that Baidu comes first because its second letter is a. Sohu and Google both have o as their second character, but the h of Sohu comes before the second o of Google, so they are second and third respectively. Guge has to settle for fourth place.

8 Another whim: sort only by the second letter of the company name, and where that is the same, in descending order of salary:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

Guge 50 3000

Because only the second letter is to be sorted on, we write -k 1.2,1.2, which means "just" the second letter. (If you ask, "Why can't I use -k 1.2?", the answer is that you would be omitting the End part, which means sorting on the string from the second letter all the way to the end of the line.) For the salary we likewise use -k 3,3, which is the most precise form and means "only" this field; if you omit the trailing 3, the key becomes "everything from the start of the 3rd field to the end of the last field".
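
The two commands from steps 7 and 8 can be reproduced side by side without creating a file. In this sketch the data is fed through a shell variable (a convenience of the example) and the C locale is assumed.

```shell
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'
# Key from the 2nd character of field 1 to the end of the line.
whole=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2)
# Key is the 2nd character only; ties go to salary, numeric and reversed.
one_char=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr)
echo "-k 1.2:";               printf '%s\n' "$whole"
echo "-k 1.2,1.2 -k 3,3nr:";  printf '%s\n' "$one_char"
```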

9 What other options can be used in the Modifier part?

You can use b, d, f, i, n, or r.

You are surely already familiar with n and r.

b ignores the leading blanks of the field.

d sorts the field in dictionary order (that is, only blanks and alphanumeric characters are considered).

f ignores case when sorting the field.

i ignores nonprinting characters and sorts only on the printable ones. (Some ASCII characters are not printable, for example \a is the alert, \b is backspace, \n is newline, \r is carriage return, and so on.)

10 Thinking about examples that combine -k and -u:

$ cat facebook.txt
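
Two of these modifiers, f and b, are easy to observe on tiny made-up inputs. This is a sketch that assumes the C locale and no -t option, in which case a field includes the blanks in front of it.

```shell
# f: fold lowercase to uppercase before comparing.
plain=$(printf 'B\na\nC\n' | LC_ALL=C sort -k 1,1)
folded=$(printf 'B\na\nC\n' | LC_ALL=C sort -k 1,1f)

# b: skip the blanks in front of the field before comparing.
padded='x  2
y 1'
with_blanks=$(printf '%s\n' "$padded" | LC_ALL=C sort -k 2,2)
without_blanks=$(printf '%s\n' "$padded" | LC_ALL=C sort -k 2b,2)

echo "plain: $(echo $plain)  folded: $(echo $folded)"
echo "-k 2,2:";  printf '%s\n' "$with_blanks"
echo "-k 2b,2:"; printf '%s\n' "$without_blanks"
```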

Google 110 5000

Baidu 100 5000

Guge 50 3000

Sohu 100 4500

This is the original facebook.txt file.

$ sort -n -k 2 facebook.txt

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

$ sort -n -k 2 -u facebook.txt

Guge 50 3000

Baidu 100 5000

Google 110 5000

When sorting numerically on the number-of-employees field, adding -u deletes the Sohu line. It turns out that -u only looks at the key set with -k: it sees the same value (100) and deletes the later lines that carry it.

$ sort -k 1 -u facebook.txt

Baidu 100 5000

Google 110 5000

Guge 50 3000

Sohu 100 4500

$ sort -k 1.1,1.1 -u facebook.txt

Baidu 100 5000

Google 110 5000

Sohu 100 4500

The same goes for this example: Guge, which also starts with G, did not escape.

$ sort -n -k 2 -k 3 -u facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000

Hey! With two levels of sort priority set, -u deletes no lines. It turns out that -u weighs all the -k options together: a line is deleted only if it is the same at every level, and as long as one level differs it will not be deleted. (If you don't believe it, add a line Sina 100 4500 and try.)

11 The strangest sort:

$ sort -n -k 2.2,3.1 facebook.txt
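
The Sina experiment suggested above can be run without touching facebook.txt. In this sketch the extra line is appended to a shell variable, and the check only counts lines, because which of the two equal lines survives is left to sort.

```shell
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500
Sina 100 4500'
# Sohu and Sina agree on both keys (100 and 4500), so -u keeps one of them.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2 -k 3 -u)
total=$(printf '%s\n' "$result" | grep -c '')
same_keys=$(printf '%s\n' "$result" | grep -c ' 100 4500$')
printf '%s\n' "$result"
echo "lines kept: $total, lines with 100 4500: $same_keys"
```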

Guge 50 3000

Baidu 100 5000

Sohu 100 4500

Google 110 5000

The key is written to run from the second character of the second field to the first character of the third field.

At first glance the first output line should yield 0 3, the second 00 5, the third 00 4, and the fourth 10 5.

Yet the output is ordered as if the keys were 50, 100, 100 and 110.

So why is Guge first, and why is Baidu in front of Sohu? (You can run your own experiments and think about it.)

The answer has two parts. First, because neither -t nor -b is given, GNU sort counts the characters of a field from the blank that precedes it, so character 2 of the second field is its first digit and the keys really begin with 50, 100, 100 and 110. Second, "the cross-field setting is an illusion": with -n, sort reads only the number at the start of the key and stops at the following blank, so the third field never enters the comparison. When 100 and 100 compare equal, sort falls back to comparing the whole line, and of course Baidu comes before Sohu. One example confirms this:

$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt

Guge 50 3000

Sohu 100 4500

Baidu 100 5000

Google 110 5000
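
Which characters a key such as 2.2,3.1 really covers depends on whether a separator is set. The following sketch assumes GNU sort: without -t a field includes the blank in front of it, while with -t ' ' the field starts at its first digit, so the same key selects different characters.

```shell
data='Google 110 5000
Baidu 100 5000
Guge 50 3000
Sohu 100 4500'
# Without -t, character 2 of field 2 is the first digit,
# so -n reads 110, 100, 50 and 100.
no_sep=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2.2,3.1)
# With -t ' ', character 2 of field 2 is the second digit,
# so -n reads 10, 0, 0 and 0; the three ties fall back to whole-line order.
with_sep=$(printf '%s\n' "$data" | LC_ALL=C sort -n -t ' ' -k 2.2,3.1)
echo "without -t:";   printf '%s\n' "$no_sep"
echo "with -t ' ':";  printf '%s\n' "$with_sep"
```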

12 Sometimes you see symbols like +1 -2 after the sort command. What are they?

For this syntax, the latest sort documentation explains:

On older systems, 'sort' supports an obsolete origin-zero syntax '+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use '-k' instead.

So this ancient notation has been retired, and from now on you can justifiably look down on scripts that use it!

(In case you run into ancient scripts, here is how it works: the plus sign introduces the Start part and the minus sign the End part. The most important point is that this notation counts from 0: what we have been calling the first field is field 0 here, and what was the 2nd character is character 1 here. Got it?)
