Basic Text Processing: the powerful sort command

Source: Internet
Author: User

sort treats each line of a file as a unit and compares the lines with one another. Lines are compared character by character, starting from the first character, by ASCII code value (strictly speaking this holds in the C locale; other locales apply their own collation rules). The result is written out in ascending order, which is sort's default.

    # cat seq.txt
    banana
    apple
    pear
    orange
    pear
    # sort seq.txt
    apple
    banana
    orange
    pear
    pear

1. sort's -u option

The -u option is an easy way to remove duplicate lines from the output.

    # sort -u seq.txt
    apple
    banana
    orange
    pear

2. sort's -r option

sort uses ascending order by default. To get descending order, add -r.

    # sort -r seq.txt
    pear
    pear
    orange
    banana
    apple

3. sort's -n option

Have you ever seen 10 sorted as if it were smaller than 2? I have. This happens because sort compares the numbers as character strings. It first compares 1 with 2; 1 is obviously smaller, so 10 is placed before 2. This is simply sort's consistent behaviour. To change it, use the -n option to tell sort to "sort by numeric value".

    # sort -r number.txt
    5
    5
    4
    3
    24
    2
    13
    10
    1
    # sort -r -n number.txt
    24
    13
    10
    5
    5
    4
    3
    2
    1

In the first listing, 24 and 2 rank above 13 and 10 because only the characters are compared; with -n the values are compared as numbers.

4. sort's -o option

sort writes its result to standard output by default, so you need redirection to save the result to a file, for example sort filename > newfile. However, redirection cannot be used to write the result back to the original file:

    # sort -r number.txt > number.txt

This empties number.txt, because the shell truncates the file before sort gets to read it. This is where the -o option comes in. It solves the problem and lets you write the result to the original file with confidence. This is perhaps the only advantage -o has over redirection.

    # sort -r -n number.txt -o number.txt
    # cat number.txt
    24
    13
    10
    5
    5
    4
    3
    2
    1

5. sort's -t and -k options

Suppose a file has three columns separated by colons: the first column is the fruit type, the second column is the quantity, and the third column is the price.
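To experiment with a file of this shape, the following sketch creates one in a scratch directory and shows why sort needs to be told about the columns. The rows are the ones from the fruit.txt listing used in this section; the workdir variable and the use of mktemp are assumptions made for illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
#!/bin/sh
# Sketch: build a colon-separated sample file and sort it without -t/-k.
workdir=$(mktemp -d)   # hypothetical scratch directory, not from the article

cat > "$workdir/fruit.txt" <<'EOF'
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4
EOF

# With -n alone, sort looks for a number at the start of each whole line.
# No line starts with a digit, so all numeric keys tie and the lines fall
# back to plain byte order: the result is ordered by fruit name, not by
# the quantity in the second column.
whole_line=$(LC_ALL=C sort -n "$workdir/fruit.txt")
printf '%s\n' "$whole_line"
```

The quantities in the output are not in numeric order, which is the gap the delimiter and column options fill.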
So how can sort order this file by the quantity of fruit, that is, by the second column? Fortunately, sort provides the -t option, which is followed by the delimiter. (Does it remind you of the -d option of cut and paste?) Once the delimiter is specified, -k selects the column to sort on.

    # cat fruit.txt
    banana:30:5.5
    apple:10:2.5
    pear:90:2.3
    orange:20:3.4

We use the colon as the delimiter and sort numerically, in ascending order, on the second column. The result is what we wanted.

    # sort -n -k 2 -t : fruit.txt
    apple:10:2.5
    orange:20:3.4
    banana:30:5.5
    pear:90:2.3

6. Other common sort options

    -f  Treats lower-case letters as upper-case for the comparison, that is, ignores case.
    -c  Checks whether the file is already sorted. If it is not, sort prints information about the first out-of-order line and returns 1.
    -C  Checks whether the file is already sorted. If it is not, sort prints nothing and only returns 1.
    -M  Sorts by month, so that JAN is smaller than FEB, and so on.
    -b  Ignores the leading blanks of each line and starts comparing at the first visible character.

7. A closer look at the -k option

Preparation: in the sample file, the first field is the company name, the second field is the number of employees, and the third field is the average employee salary.
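To follow along, a file of this shape could be created as in the sketch below, which also uses the -C check from the option list to confirm that the file is not yet in sorted order. The rows are the ones from the netcompany.txt listing used in this section; the workdir and sorted_status names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: create the space-separated sample file and test it with sort -C.
workdir=$(mktemp -d)   # hypothetical scratch directory, not from the article

cat > "$workdir/netcompany.txt" <<'EOF'
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500
EOF

# -C is the silent form of -c: it prints nothing and reports only through
# its exit status (0 = already sorted, 1 = out of order).
sorted_status=0
LC_ALL=C sort -C "$workdir/netcompany.txt" || sorted_status=$?
echo "sort -C exit status: $sorted_status"
```

Because "google" comes before "baidu" in the file, the check reports the file as unsorted.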
(Apart from the company names, all the figures are made up ^_^)

    # cat netcompany.txt
    google 110 5000
    baidu 100 5000
    guge 50 3000
    sohu 100 4500

I want this file sorted alphabetically by company name, that is, by the first field:

    # sort -t ' ' -k 1 netcompany.txt
    baidu 100 5000
    google 110 5000
    guge 50 3000
    sohu 100 4500

Sorted by number of employees:

    # sort -n -t ' ' -k 2 netcompany.txt
    guge 50 3000
    baidu 100 5000
    sohu 100 4500
    google 110 5000

Sorted by number of employees, with companies of the same size ordered by average salary in ascending order:

    # sort -n -t ' ' -k 2 -k 3 netcompany.txt
    guge 50 3000
    sohu 100 4500
    baidu 100 5000
    google 110 5000

Adding -k 2 -k 3 solves the problem. sort supports setting a priority among fields this way: it sorts by the 2nd field first and, where that is equal, by the 3rd field. (If you like, you can keep going and set many levels of priority.)

Sorted by salary in descending order, with companies on the same salary ordered by number of employees in ascending order (this one is a little harder):

    # sort -n -t ' ' -k 3r -k 2 netcompany.txt
    baidu 100 5000
    google 110 5000
    sohu 100 4500
    guge 50 3000

A small trick is used here. Look closely: a lower-case r has been appended to -k 3. Can you work out what it does from what was covered earlier? The answer: the r modifier has the same effect as the -r option, namely reverse order. Because sort is ascending by default, r is needed to say that the third field (average salary) is to be sorted in descending order. You can also append n to have a field sorted by numeric value. For example:

    # sort -t ' ' -k 3nr -k 2n netcompany.txt
    baidu 100 5000
    google 110 5000
    sohu 100 4500
    guge 50 3000

Here the leading -n option was removed and n was added to every -k option instead. This is the more explicit form, because a key that carries its own modifiers no longer inherits global options such as -n.

8. Sorting from a character position inside a field

Sort by the second letter of the company name:

    # sort -t ' ' -k 1.2 netcompany.txt
    baidu 100 5000
    sohu 100 4500
    google 110 5000
    guge 50 3000

Sort by the second letter of the company name only and, where it is the same, by salary in descending order:

    # sort -t ' ' -k 1.2,1.2 -k 3,3nr netcompany.txt
    baidu 100 5000
    google 110 5000
    sohu 100 4500
    guge 50 3000

Because only the second letter is to be compared, we write -k 1.2,1.2, which starts and ends the key at the second character of the first field. (If you ask "why not just -k 1.2?", the answer is that it will not do: with the end part omitted, the key runs from the second letter to the end of the line.) For the salary we also wrote -k 3,3. This is the most precise form and means that only this field is compared; if the second 3 is omitted, the key becomes "everything from the start of the 3rd field to the end of the last field".

9. Which options can be used as modifiers?

b, d, f, i, n and r can be used. n and r are already familiar.

    b  Ignores the leading blanks of the field.
    d  Sorts the field in dictionary order (only blanks, letters and digits are considered).
    f  Ignores case when sorting the field.
    i  Ignores non-printable characters and compares only printable ones. (Some ASCII characters cannot be printed; for example, \a is the alarm, \b is a backspace, \n is a line feed and \r is a carriage return.)
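The key syntax discussed above can be tried out with the short sketch below. It recreates the netcompany.txt rows from the article in a scratch directory and runs three of the sorts; the variable names and the use of mktemp are illustrative assumptions, LC_ALL=C is set only for a predictable byte-wise order, and the first command writes the keys as 3,3nr and 2,2n, a slightly tighter form than the article's 3nr and 2n.

```shell
#!/bin/sh
# Sketch: compare bounded and open-ended sort keys on the sample file.
workdir=$(mktemp -d)   # hypothetical scratch directory, not from the article
f=$workdir/netcompany.txt

cat > "$f" <<'EOF'
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500
EOF

# Salary (field 3) numeric descending; ties broken by head count
# (field 2) numeric ascending.
by_salary=$(LC_ALL=C sort -t ' ' -k 3,3nr -k 2,2n "$f")

# Key limited to the second character of field 1; ties broken by salary
# in descending order.
by_second_letter=$(LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr "$f")

# Same start position but no end: the key runs to the end of the line,
# so "ohu ..." is compared with "oogle ..." and sohu moves ahead of google.
open_ended=$(LC_ALL=C sort -t ' ' -k 1.2 "$f")

printf '%s\n\n%s\n\n%s\n' "$by_salary" "$by_second_letter" "$open_ended"
```

Only the second and third lines differ between the last two results, which makes the effect of omitting the end position easy to see.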
