13. Sorting, unique, and duplicate text: the sort command and the uniq command

Source: Internet
Author: User

The sort command

The sort command sorts the lines of its input and writes the result to standard output. It can read input from the files you name or from stdin.

sort (options) (arguments)
    • -b: ignores leading whitespace characters at the start of each line;
    • -c: checks whether the file is already sorted;
    • -d: when sorting, considers only letters, digits and blank characters and ignores all other characters;
    • -f: when sorting, treats lowercase letters as uppercase letters;
    • -i: when sorting, ignores all characters except the ASCII characters between 040 and 176 (octal);
    • -m: merges several files that are already sorted;
    • -M: sorts by month, using the first 3 letters as the month abbreviation;
    • -n: sorts by numeric value;
    • -o: stores the sorted result in the specified file;
    • -r: sorts in reverse order;
    • -t: specifies the field separator character to use when sorting;
    • +< start field > -< end field >: sorts by the specified fields, from the start field up to the field before the end field. This is obsolete syntax; the -k option described later serves the same purpose;
    • -u or --unique: outputs only the first of each run of equal lines; combined with -c, checks for strict ordering.

File: Specifies the list of files to be sorted.
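As a rough illustration of three of the options above (-c, -o and -m), here is a minimal sketch. The temporary directory and the file names a.txt and b.txt are made up for the example.

```shell
# Work in a throwaway directory; the file names are hypothetical.
tmp=$(mktemp -d)
printf '3\n1\n2\n' > "$tmp/a.txt"
printf '5\n4\n' > "$tmp/b.txt"

# -c only checks the order: it exits non-zero if the input is not sorted.
if sort -c "$tmp/a.txt" 2>/dev/null; then
    checked="sorted"
else
    checked="not sorted"
fi

# -o writes the result to a file; naming the input file itself is allowed.
sort -n -o "$tmp/a.txt" "$tmp/a.txt"
sort -n -o "$tmp/b.txt" "$tmp/b.txt"

# -m merges inputs that are already sorted, without sorting them again.
merged=$(sort -n -m "$tmp/a.txt" "$tmp/b.txt")

echo "$checked"
echo "$merged"
rm -rf "$tmp"
```

Because -m assumes its inputs are already in order, both files are sorted in place with -o before they are merged.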


sort treats each line of a file or text as a unit. It compares lines character by character, starting from the first character and using the ASCII value of each character, and outputs the lines in ascending order.
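To see what comparing by ASCII value means in practice, the sketch below forces the C locale with LC_ALL=C. That setting is an assumption added here; under other locale settings the order may differ.

```shell
# In the C locale, characters are compared by ASCII value:
# digits < uppercase letters < lowercase letters.
sorted=$(printf 'b\nB\na\n10\n9\nA\n' | LC_ALL=C sort)
echo "$sorted"
# "10" comes before "9" because the comparison is character by character:
# '1' has a smaller ASCII value than '9'.
```

This is also why numbers need the -n option to be sorted by value.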

[[email protected] ~]# cat sort.txt
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.5

After sorting:
[[email protected] ~]# sort sort.txt
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.5

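Ignoring duplicate lines

Sort and ignore duplicate lines:
[[email protected] ~]# sort -u sort.txt
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5

or
[[email protected] ~]# uniq sort.txt
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5

Note that uniq removes only adjacent duplicate lines and does not reorder the file.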
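The two commands above give the same result only when the duplicate lines are adjacent. A small sketch with made-up input, where one duplicate is separated from the others, shows the difference:

```shell
input=$(printf 'eee\naaa\neee\neee\n')

# uniq collapses only adjacent repeats, so the first "eee" survives on its own.
only_uniq=$(printf '%s\n' "$input" | uniq)

# sort -u sorts first, so every repeated line is removed.
sort_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

echo "$only_uniq"
echo "---"
echo "$sort_u"
```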

Use of the -n, -r, -k and -t options of sort:

-n sorts by numeric value,

-r sorts in reverse order,

-k specifies the field to sort by,

-t specifies the field separator (a colon in the example that follows).

[[email protected] ~]# cat sort2.txt
AAA:BB:CC
aaa:30:1.6
ccc:50:3.3
ddd:20:4.2
bbb:10:2.5
eee:40:5.4
eee:60:5.1

# Sort the BB column numerically in ascending order:
[[email protected] ~]# sort -nk 2 -t: sort2.txt
AAA:BB:CC
bbb:10:2.5
ddd:20:4.2
aaa:30:1.6
eee:40:5.4
ccc:50:3.3
eee:60:5.1

# Sort the CC column numerically in ascending order (add -r for descending order):
[[email protected] ~]# sort -nk 3 -t: sort2.txt
AAA:BB:CC
aaa:30:1.6
bbb:10:2.5
ccc:50:3.3
ddd:20:4.2
eee:60:5.1
eee:40:5.4
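The effect of -n is easiest to see by sorting the same field with and without it. The sketch below uses made-up data and LC_ALL=C (an assumption, to keep the text comparison predictable):

```shell
data=$(printf 'x:9\ny:10\nz:100\n')

# Without -n, field 2 is compared as text, so "10" and "100" sort before "9".
as_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)

# With n, field 2 is compared by numeric value; r reverses the order.
as_number_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2nr)

echo "$as_text"
echo "---"
echo "$as_number_desc"
```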

The specific syntax format of the -k option:

FStart.CStart Modifier,FEnd.CEnd Modifier
-------Start--------,-------End--------
 FStart.CStart option,  FEnd.CEnd option

The comma divides this syntax into two parts, the Start part and the End part. The Start part is itself made up of three parts. The Modifier part is the kind of option described earlier, such as n and r. FStart is the field to use, and CStart is the character within FStart at which the sort key begins. .CStart can be omitted, in which case the key begins at the first character of the field. Similarly, in the End part you can set FEnd.CEnd. If you omit .CEnd, or set CEnd to 0 (zero), the key ends at the last character of the field.
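As a sketch of how FStart.CStart and FEnd.CEnd select a key, the example below sorts made-up colon-separated data first by the whole second field and then by only characters 2 to 3 of that field:

```shell
data=$(printf 'a:x30\nb:x10\nc:y20\n')

# FStart=2 with no CStart: the key is the whole second field (x30, x10, y20).
whole_field=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)

# FStart.CStart=2.2 and FEnd.CEnd=2.3: the key is characters 2-3 of field 2
# (30, 10, 20); the n modifier compares that key numerically.
digits_only=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2.2,2.3n)

echo "$whole_field"
echo "---"
echo "$digits_only"
```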

Example: sorting from the second letter of the company's English name

[[email protected] ~]# cat company.txt
sohu 100 3000
google 100 4000
baidu 105 3000
guge 105 2500

Each line consists of field 1, field 2 and field 3, separated by spaces.

# Sort starting from the second letter of the company's English name
[[email protected] ~]# sort -t ' ' -k 1.2 company.txt
baidu 105 3000
sohu 100 3000
google 100 4000
guge 105 2500

# Sort starting from the third letter of the company's English name
[[email protected] ~]# sort -t ' ' -k 1.3 company.txt
guge 105 2500
sohu 100 3000
baidu 105 3000
google 100 4000

-k 1.2 means that the sort key starts at the 2nd character of the 1st field, the company name.

Sort only by the 2nd letter of the company's English name and, where that letter is the same, by salary in descending order:

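[[email protected] ~]# sort -t ' ' -k 1.2,1.2 -k 3,3nr company.txt
baidu 105 3000
google 100 4000
sohu 100 3000
guge 105 2500

The n and r modifiers are attached to the second key here. Written as -nrk 3,3 they would be global options and would also apply to the first key, so the 2nd letter would be compared numerically and would have no effect on the order.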

-k 1.2,1.2 means sorting only by the 2nd letter. If written as -k 1.2, the key runs from the 2nd letter to the end of the line.

-k 3,3 means sorting only by the 3rd field. If written as -k 3, the key runs from the start of the 3rd field to the end of the line.
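A minimal sketch of a two-key sort where each key carries its own modifiers, using the company data from this section supplied inline. LC_ALL=C is an assumption added to keep the letter comparison predictable.

```shell
data=$(printf 'sohu 100 3000\ngoogle 100 4000\nbaidu 105 3000\nguge 105 2500\n')

# Key 1: only the 2nd character of field 1, compared as text.
# Key 2: only field 3, compared numerically (n) in reverse order (r).
# Modifiers attached to a -k key apply to that key alone.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr)

echo "$result"
```

Options given on their own, such as -n or -r placed before -k, are global: any key without modifiers of its own inherits them. Attaching the modifiers to the key avoids that.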

The uniq command

The uniq command reports or omits duplicate lines in a file. Because it compares only adjacent lines, it is generally used together with the sort command.

uniq (options) (arguments)
    • -c or --count: prefixes each line with the number of times it occurs;
    • -d or --repeated: displays only the lines that are repeated;
    • -f< fields > or --skip-fields=< fields >: skips the specified number of leading fields when comparing;
    • -s< characters > or --skip-chars=< characters >: skips the specified number of leading characters when comparing;
    • -u or --unique: displays only the lines that are not repeated;
    • -w< characters > or --check-chars=< characters >: compares no more than the specified number of characters in each line.

Input file: the file from which duplicate lines are to be removed. If it is not specified, the data is read from standard input;

Output file: the file to which the result is written once duplicate lines have been removed. If it is not specified, the result is written to standard output.
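The comparison options are easier to follow with a small example. The log lines below are made up, and -w is assumed to be available (it is in GNU uniq but is not part of POSIX):

```shell
log=$(printf '10:01 login alice\n10:02 login alice\n10:03 logout alice\n')

# -f 1 skips the first field (the timestamp) before comparing lines.
skip_field=$(printf '%s\n' "$log" | uniq -f 1)

# -w 2 compares only the first 2 characters of each line.
first_chars=$(printf '%s\n' "$log" | uniq -w 2)

echo "$skip_field"
echo "---"
echo "$first_chars"
```

In both cases uniq keeps the first line of each group of lines that compare as equal.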


To delete duplicate lines:

[[email protected] ~]# cat repeat.txt
aaaaaaaaaaaa
aaaaaaaaaaaa
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc
dddddddddddd
dddddddddddd

# Method 1:
[[email protected] ~]# uniq repeat.txt
aaaaaaaaaaaa
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc
dddddddddddd

# Method 2:
[[email protected] ~]# sort repeat.txt | uniq
aaaaaaaaaaaa
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc
dddddddddddd

# Method 3:
[[email protected] ~]# sort -u repeat.txt
aaaaaaaaaaaa
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc
dddddddddddd

Show only the lines that appear once:

# Method 1:
[[email protected] ~]# uniq -u repeat.txt
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc

# Method 2:
[[email protected] ~]# sort repeat.txt | uniq -u
bbbbbbbbbb
bbbbbbbbbbbbbb
cccccccc
cccccccccccccc

Count the number of times each line appears in the file:

[[email protected] ~]# sort repeat.txt | uniq -c
      2 aaaaaaaaaaaa
      1 bbbbbbbbbb
      1 bbbbbbbbbbbbbb
      1 cccccccc
      1 cccccccccccccc
      2 dddddddddddd

Find duplicate lines in the file:

[[email protected] ~]# sort repeat.txt | uniq -d
aaaaaaaaaaaa
dddddddddddd
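A common way to combine the two commands is to rank lines by how often they occur. The sketch below uses a made-up word list; the final sort step is an addition for illustration and is not part of the examples above.

```shell
words=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# sort groups identical lines together, uniq -c counts each group,
# and sort -k1,1nr orders the groups by that count, highest first.
ranked=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr)

echo "$ranked"
```

The counts printed by uniq -c are padded with leading spaces; the numeric comparison ignores that padding.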
