Tip: Sort files with sort and tsort

Last Update: 2017-10-09 Source: Internet

Author: User



By using sort and tsort instead of a more complex Perl or Awk solution, you can save time and avoid headaches. Jacek Artymiak shows you how.
　　
Although you can use Perl or Awk to write advanced sorting applications, doing so is not always necessary and is often a headache. The sort command can do most of what you need, and more easily: it can sort the lines of several files, merge files, and even check whether they need sorting at all. You can specify sort keys (the parts of each line used for comparison) or leave them out, in which case sort compares whole lines.
　　
So, to sort the password file, you can use the following commands. Note that you cannot redirect the output straight back into the input file: the shell truncates that file before sort reads it, which destroys its contents. That is why the output goes to a temporary file, which is then renamed to /etc/passwd, as shown below.
　　
Listing 1. Simple sorting
$ su -
# sort /etc/passwd > /etc/passwd-new
# mv /etc/passwd-new /etc/passwd
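Because Listing 1 rewrites the real /etc/passwd, it is safer to practise the pattern on a copy first. The minimal sketch below assumes a POSIX-like shell with mktemp and uses a hypothetical sample file, passwd.sample, in a scratch directory. It repeats the write-then-rename steps, then shows sort's -o option: POSIX allows the -o file to be one of the input files, so sort can replace its input without a separate temporary name.

```shell
#!/bin/sh
# Practise Listing 1 on a scratch copy; passwd.sample is a hypothetical file.
export LC_ALL=C                      # byte-wise ordering, so results are predictable
dir=$(mktemp -d)
cat > "$dir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
EOF

# Listing 1 pattern: sort into a new file, then rename it over the original.
sort "$dir/passwd.sample" > "$dir/passwd.sample-new"
mv "$dir/passwd.sample-new" "$dir/passwd.sample"

# Alternative: -o may name an input file, so sort can rewrite it safely.
sort -o "$dir/passwd.sample" "$dir/passwd.sample"

first=$(head -n 1 "$dir/passwd.sample")
echo "$first"                        # the "bin" entry now comes first
rm -r "$dir"
```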
　　
To reverse the sort order, use the -r option. You can also use the -u option to suppress duplicate lines, so that each distinct line is printed only once.
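A small sketch of -r and -u on made-up input containing a duplicate line. It sets LC_ALL=C so that the ordering is plain byte order regardless of the machine's locale.

```shell
#!/bin/sh
# Compare reversed (-r) and de-duplicated (-u) output on hypothetical data.
export LC_ALL=C
input='pear
apple
pear
banana'

reversed=$(printf '%s\n' "$input" | sort -r)      # lines: pear, pear, banana, apple
unique=$(printf '%s\n' "$input" | sort -u)        # lines: apple, banana, pear
both=$(printf '%s\n' "$input" | sort -r -u)       # lines: pear, banana, apple
printf '%s\n---\n' "$reversed" "$unique" "$both"
```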
　　
A very practical feature of sort is its ability to sort by field keys. A field is a string of text separated from the other fields by a delimiter character. For example, the fields in /etc/passwd are separated by colons, so you can sort /etc/passwd by user ID, group ID, comment field, home directory, or shell. To do this, use the -t option followed by the separator character, then the -k option followed by the number of the field where the key starts, a comma, and the number of the field where it ends. For example, sort -t: -k 5,5 /etc/passwd sorts the password file by the comment field, which stores the user's full name (such as "John Smith"), while sort -t: -k 3,4 /etc/passwd sorts the same file by user ID and group ID together. If you omit the second number, sort assumes that the key starts at the specified field and runs to the end of each line. Try both forms and observe the difference (add the -g option if numbers come out in the wrong order).
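The sketch below runs field keys over a few hypothetical passwd-style lines instead of the real /etc/passwd. names is a hypothetical helper that keeps only the user names from sort's output. The UID comparison shows why a numeric flag matters: as text, "1001" sorts before "999". Here the flag is n, attached to the key; the article's -g is a GNU extension that gives the same result for these values.

```shell
#!/bin/sh
# Field keys on hypothetical passwd-style data (fields separated by colons).
export LC_ALL=C
data='root:x:0:0:root:/root:/bin/sh
alice:x:1001:100:Alice Jones:/home/alice:/bin/sh
bob:x:999:100:Bob Smith:/home/bob:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/false'

# Hypothetical helper: print the user names from sort's output on one line.
names() { cut -d: -f1 | paste -s -d ' ' -; }

by_comment=$(printf '%s\n' "$data" | sort -t: -k 5,5 | names)
by_uid_text=$(printf '%s\n' "$data" | sort -t: -k 3,3 | names)   # UIDs compared as text
by_uid_num=$(printf '%s\n' "$data" | sort -t: -k 3,3n | names)   # UIDs compared as numbers
echo "$by_comment"; echo "$by_uid_text"; echo "$by_uid_num"
```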
　　
Note that the transition from a non-blank character to a blank one is the default field separator. So if your fields are already separated by blanks, you can omit -t and use -k alone (note that fields are numbered from 1).
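Here is a minimal sketch of the default separator on hypothetical whitespace-separated data, with no -t at all. Under the default rule each field keeps the blanks in front of it, so field 2 is " 3", " 10", and so on; the n flag ignores those leading blanks, so the numeric comparison still works.

```shell
#!/bin/sh
# No -t: fields are runs of non-blanks, each including the blanks before it.
export LC_ALL=C
stock='banana 3
apple 10
cherry 2'

by_count=$(printf '%s\n' "$stock" | sort -k 2,2n | cut -d' ' -f1 | paste -s -d ' ' -)
echo "$by_count"
```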
　　
For finer control, you can combine field numbers with character offsets. An offset is separated from its field number by a period. For example, in -k 1.3,5.7 the sort key starts at the 3rd character of the 1st field and ends at the 7th character of the 5th field (offsets, like fields, are numbered from 1). When would you use offsets? I often use them to sort Apache logs: the field-and-offset notation lets me skip the date field.
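To make the Apache case concrete, here is a sketch under a few assumptions: the log lines are hypothetical, they follow the Common Log Format with single spaces between fields, and every timestamp has the same fixed width. With -t ' ', field 4 is "[10/Oct/2017:13:55:36", so characters 14 to 21 hold only the time of day, and the key skips the bracket and the date entirely.

```shell
#!/bin/sh
# Hypothetical access-log lines in Common Log Format (single spaces between fields).
export LC_ALL=C
log='10.0.0.1 - - [10/Oct/2017:13:55:36 +0000] "GET /a HTTP/1.1" 200 512
10.0.0.2 - - [09/Oct/2017:08:01:02 +0000] "GET /b HTTP/1.1" 404 128
10.0.0.3 - - [08/Oct/2017:21:10:00 +0000] "GET /c HTTP/1.1" 200 64'

# Field 4 looks like "[10/Oct/2017:13:55:36"; characters 14-21 are HH:MM:SS,
# so -k 4.14,4.21 orders the lines by time of day and ignores the date.
by_time=$(printf '%s\n' "$log" | sort -t ' ' -k 4.14,4.21 | cut -d' ' -f1 | paste -s -d ' ' -)
echo "$by_time"
```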
　　
Another option worth knowing is -b, which tells sort to ignore leading blanks (spaces, tabs, and so on) and to treat the first non-blank character as the start of the sort key. With this option, offsets are also counted from the first non-blank character, which is very useful when the field separator is not a blank and fields may begin with blanks.
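A sketch of that last situation, using hypothetical colon-separated data whose second field sometimes starts with spaces. It attaches b directly to the key's start position (2b), which only changes where the key begins; ids is a hypothetical helper that prints the first field of each sorted line.

```shell
#!/bin/sh
# Hypothetical colon-separated data; field 2 sometimes starts with blanks.
export LC_ALL=C
data='x: banana
y:apple
z:  cherry'

ids() { cut -d: -f1 | paste -s -d ' ' -; }
plain=$(printf '%s\n' "$data" | sort -t: -k 2,2 | ids)   # blanks sort before letters
skip=$(printf '%s\n' "$data" | sort -t: -k 2b,2 | ids)   # key starts at first non-blank
echo "$plain"; echo "$skip"
```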
　　
You can further modify the sorting algorithm with these options: -d (consider only letters, digits, and blanks), -f (fold case, so lowercase and uppercase letters compare as equal), -i (ignore non-printable characters), -M (sort by three-letter month abbreviations: JAN, FEB, MAR, and so on), and -n (sort numerically, considering only digits, a minus sign, the decimal point, and thousands separators). These options, like -b and -r, can also be attached to a key definition, in which case they apply only to that key rather than globally; otherwise they behave the same as when used outside a key definition.
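The following sketch compares plain sorting with -f, -M, and -n on tiny hypothetical inputs. It assumes GNU sort (-M is not part of POSIX) and the C locale, where uppercase letters sort before lowercase ones; join_lines is a hypothetical helper that puts the sorted lines on one line.

```shell
#!/bin/sh
# Hypothetical inputs showing -f, -M and -n; LC_ALL=C puts uppercase before lowercase.
export LC_ALL=C
join_lines() { paste -s -d ' ' -; }

plain=$(printf '%s\n' apple Banana cherry | sort | join_lines)      # Banana apple cherry
folded=$(printf '%s\n' apple Banana cherry | sort -f | join_lines)  # apple Banana cherry
months=$(printf '%s\n' MAR JAN FEB | sort -M | join_lines)          # JAN FEB MAR
numbers=$(printf '%s\n' 10 9 -3 | sort -n | join_lines)             # -3 9 10
text=$(printf '%s\n' 10 9 -3 | sort | join_lines)                   # -3 10 9
printf '%s\n' "$plain" "$folded" "$months" "$numbers" "$text"
```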
　　
As an example of options attached to keys, consider the following command:
　　
sort -t: -k 4g,4 -k 3gr,3 /etc/passwd
　　
This command sorts the passwd file by group ID and, within each group, by user ID in reverse order.
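To check that behaviour without touching the real file, here is the same command run over a few hypothetical passwd-style lines (assuming GNU sort, since g is a GNU extension). Two users share group 100, so the reversed UID key decides their order.

```shell
#!/bin/sh
# Hypothetical passwd-style lines: user:x:UID:GID:comment:home:shell
export LC_ALL=C
data='alice:x:1001:100:Alice:/home/alice:/bin/sh
bob:x:999:100:Bob:/home/bob:/bin/sh
carol:x:1002:50:Carol:/home/carol:/bin/sh
root:x:0:0:root:/root:/bin/sh'

# Key 1: GID, numeric (g). Key 2: UID, numeric and reversed (gr), used within a group.
order=$(printf '%s\n' "$data" | sort -t: -k 4g,4 -k 3gr,3 | cut -d: -f1 | paste -s -d ' ' -)
echo "$order"
```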
　　
That is not all sort can do, however. When a key cannot decide which of two lines comes first, sort can break the tie with another key. To add a tie-breaker, add another -k option followed by a field and an (optional) offset, using the same notation as for the first key. For example, sort -k 3.4,4.5 -k 7.3,9.4 /etc/passwd sorts lines by a key that starts at the 4th character of the 3rd field and ends at the 5th character of the 4th field, and breaks ties with a key that runs from the 3rd character of the 7th field to the 4th character of the 9th field.
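A minimal sketch of a tie-breaking key on hypothetical "team value" lines. With only the first key, lines from the same team tie, and POSIX sort then falls back to comparing whole lines byte by byte, which puts "dev 1000" before "dev 120". The second key replaces that fallback with a numeric comparison.

```shell
#!/bin/sh
# Hypothetical "team value" lines; the first key ties within each team.
export LC_ALL=C
rows='ops 300
dev 1000
ops 90
dev 120'

one_key=$(printf '%s\n' "$rows" | sort -k 1,1)           # ties fall back to whole-line text order
two_keys=$(printf '%s\n' "$rows" | sort -k 1,1 -k 2,2n)  # ties broken by value, numerically
printf '%s\n---\n' "$one_key" "$two_keys"
```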
　　
The last group of options deals with input, output, and temporary files. For example, sort -c < file checks whether the input is already sorted (you can combine -c with other options). If it is not sorted, sort reports an error. This is a quick way to check large files that would take a long time to sort. When you use -u together with -c, sort also checks that the input contains no two equal lines.
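Because sort -c signals its verdict through the exit status (0 when the input is in order, 1 when it is not, with a message on standard error), it fits naturally into shell tests. The sketch below wraps it in a hypothetical helper, check; any non-zero status, including a real error, is reported here simply as "unsorted".

```shell
#!/bin/sh
# Hypothetical helper: print "sorted" or "unsorted" for standard input.
export LC_ALL=C
check() {
  if sort -c "$@" 2>/dev/null; then echo sorted; else echo unsorted; fi
}

a=$(printf '%s\n' apple banana cherry | check)     # in order
b=$(printf '%s\n' banana apple | check)            # out of order
c=$(printf '%s\n' apple apple banana | check)      # duplicates allowed by plain -c
d=$(printf '%s\n' apple apple banana | check -u)   # -c -u rejects equal lines
echo "$a $b $c $d"
```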
　　
When you work with large files, the -T option is also important: it lets you put sort's temporary files in a directory other than the default /tmp (sort removes these files when it finishes).
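A sketch of -T, assuming GNU sort; $scratch is a hypothetical stand-in for a directory on a roomier disk than /tmp. A file this small will probably be sorted entirely in memory, so -T only starts to matter once sort has to spill data to disk.

```shell
#!/bin/sh
# Point sort's temporary files at a chosen directory with -T.
# $scratch is a hypothetical stand-in for a roomier disk than /tmp.
export LC_ALL=C
scratch=$(mktemp -d)
awk 'BEGIN { for (i = 50000; i >= 1; i--) print i }' > "$scratch/big.txt"

sort -n -T "$scratch" -o "$scratch/sorted.txt" "$scratch/big.txt"
smallest=$(head -n 1 "$scratch/sorted.txt")
largest=$(tail -n 1 "$scratch/sorted.txt")
echo "$smallest $largest"
rm -r "$scratch"
```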
　　
You can use sort to process several files at once. There are basically two ways to do this. First, you can concatenate them with cat, as shown below:
　　
cat file1 file2 file3 | sort > outfile
　　
Alternatively, you can use the following command:
　　
sort -m file1 file2 file3 > outfile
　　
In the second case, each input file must already be sorted before you merge them all with sort -m. That may seem like an unnecessary burden, but it actually speeds up the job and saves valuable system resources, so don't forget the -m option. You can also add -u here to suppress duplicate lines.
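The sketch below merges three small hypothetical files that are already sorted. For sorted input, a full sort of the concatenation and a merge with -m produce the same result; adding -u prints the shared line only once.

```shell
#!/bin/sh
# Merge already-sorted hypothetical files; sort -m expects each input to be sorted.
export LC_ALL=C
dir=$(mktemp -d)
printf '%s\n' apple cherry > "$dir/file1"
printf '%s\n' banana cherry > "$dir/file2"
printf '%s\n' date > "$dir/file3"

via_cat=$(cat "$dir/file1" "$dir/file2" "$dir/file3" | sort)        # full sort
via_merge=$(sort -m "$dir/file1" "$dir/file2" "$dir/file3")         # merge only
merged_unique=$(sort -m -u "$dir/file1" "$dir/file2" "$dir/file3")  # merge, drop duplicates
printf '%s\n---\n' "$via_cat" "$via_merge" "$merged_unique"
rm -r "$dir"
```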
　　
If you need a more specialized sorting method, take a look at the tsort command, which performs a topological sort on a file. Listing 2 shows the difference between a topological sort and a standard sort (you can download happybirthday.txt from the Resources section).
　　
Listing 2. Differences between topological sorting and standard sorting
$ cat happybirthday.txt
Happy Birthday to You!
Happy Birthday to You!
Happy Birthday Dear Tux!
Happy Birthday to You!

$ sort happybirthday.txt
Happy Birthday Dear Tux!
Happy Birthday to You!
Happy Birthday to You!
Happy Birthday to You!

$ tsort happybirthday.txt
Dear
Happy
to
Tux!
Birthday
You!
Of course, this is not a very useful demonstration of tsort, but it does show how the output of the two commands differs.
　　
tsort is usually used to solve a logic problem: predicting a total order from observed partial orders. Each pair of whitespace-separated items in its input means that the first item must come before the second. For example (from the tsort info page):
　　
tsort <<EOF
a b c
d
e f
b c d e
EOF
will produce the output:
a
b
c
d
e
f
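A more practical use is ordering tasks that depend on each other. In the sketch below, the build steps are hypothetical, and each input pair "A B" means that A must run before B. Any order consistent with the pairs is valid, so tsort may print test and install in either order; pos is a hypothetical helper that reports a step's line number in the output.

```shell
#!/bin/sh
# Each pair "A B" means A must come before B (hypothetical build steps).
export LC_ALL=C
order=$(printf '%s\n' 'fetch configure' 'configure build' 'build test' 'build install' | tsort)
echo "$order"

# Hypothetical helper: line number of a step in tsort's output.
pos() { printf '%s\n' "$order" | grep -n -x "$1" | cut -d: -f1; }
```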

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
