[Shell Study Notes] The sort and uniq commands: sorting text, unique lines, and duplicate lines
Contents
- Sort command syntax
- Common options
- Common sort usage
- Uniq command
- Command Options
- Uniq usage
sort is a commonly used Linux command. It sorts the lines of its input and writes the sorted result to standard output. The input can come from one or more named files or from stdin.
Sort command syntax
sort [options] [file ...]
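As a quick illustration of the two input paths, the sketch below sorts the same three lines once from a named file and once from stdin. The file name fruits.txt and its contents are made up for the example, and LC_ALL=C is set only to make the order predictable.

```shell
#!/bin/sh
# Scratch directory; fruits.txt is a hypothetical sample file.
tmpdir=$(mktemp -d)
printf 'pear\napple\nmango\n' > "$tmpdir/fruits.txt"

# Input taken from a named file.
from_file=$(LC_ALL=C sort "$tmpdir/fruits.txt")

# The same input taken from stdin through a redirection.
from_stdin=$(LC_ALL=C sort < "$tmpdir/fruits.txt")

printf 'from file:\n%s\n' "$from_file"
printf 'from stdin:\n%s\n' "$from_stdin"

rm -rf "$tmpdir"
```

Both variables should hold the same three lines in the same order.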
Common options
- -b: ignore leading blank characters on each line;
- -c: check whether the input is already sorted, without sorting it; the exit status is 0 if it is;
- -d: dictionary order: only letters, digits, and blanks are considered when comparing; all other characters are ignored;
- -f: treat lowercase letters as uppercase when comparing;
- -i: consider only printable characters (octal 040 to 176 in ASCII) when comparing; ignore all others;
- -m: merge several already-sorted files without re-sorting them;
- -M: sort by month, using the three-letter month abbreviation;
- -n: sort by numeric value;
- -o outfile.txt: write the result to outfile.txt instead of standard output;
- -r: sort in reverse order;
- -k: specify the key (field) to sort on;
- -t separator: specify the field separator used to locate keys;
- +start -end: sort on the fields from the start column up to, but not including, the end column (obsolete syntax; use -k instead).
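A few of the options listed above can be tried together in a small sketch. The file names (one.txt, two.txt, mixed.txt, merged.txt) are invented for the example, and LC_ALL=C is assumed so that the results do not depend on the local collation rules.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'b\nd\n' > "$tmpdir/one.txt"     # already sorted
printf 'a\nc\n' > "$tmpdir/two.txt"     # already sorted
printf 'b\na\n' > "$tmpdir/mixed.txt"   # not sorted

# -c only checks the order: exit status 0 means "already sorted".
if LC_ALL=C sort -c "$tmpdir/one.txt" 2>/dev/null; then one_ok=yes; else one_ok=no; fi
if LC_ALL=C sort -c "$tmpdir/mixed.txt" 2>/dev/null; then mixed_ok=yes; else mixed_ok=no; fi

# -m merges files that are already sorted; -o writes the result to a file.
LC_ALL=C sort -m -o "$tmpdir/merged.txt" "$tmpdir/one.txt" "$tmpdir/two.txt"
merged=$(cat "$tmpdir/merged.txt")

# -f folds lowercase to uppercase for comparison, so "a" sorts before "B".
folded=$(printf 'B\na\nC\n' | LC_ALL=C sort -f)

printf 'one.txt sorted: %s, mixed.txt sorted: %s\n' "$one_ok" "$mixed_ok"
printf 'merged:\n%s\n' "$merged"
printf 'folded:\n%s\n' "$folded"

rm -rf "$tmpdir"
```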
Common sort usage
sort treats each line of the input as one unit. By default it compares lines character by character, starting from the first character, and prints them in ascending order. In the C locale the comparison uses the characters' ASCII (byte) values; in other locales it follows the locale's collation rules.
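The character-by-character rule has a consequence that is easy to miss: numbers are compared as text unless -n is given. The sketch below shows this on three made-up input lines, with LC_ALL=C assumed so that byte values decide the order.

```shell
#!/bin/sh
# Default comparison is character by character, so "10" sorts before "2".
lexical=$(printf '10\n9\n2\n' | LC_ALL=C sort)

# -n compares the leading number as a value instead.
numeric=$(printf '10\n9\n2\n' | LC_ALL=C sort -n)

# In the C locale, uppercase letters have smaller byte values than lowercase.
mixed_case=$(printf 'a\nB\n' | LC_ALL=C sort)

printf 'lexical:\n%s\n' "$lexical"
printf 'numeric:\n%s\n' "$numeric"
printf 'mixed case:\n%s\n' "$mixed_case"
```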
    $ cat sort.txt
    aaa:10:1.1
    ccc:30:3.3
    ddd:40:4.4
    bbb:20:2.2
    eee:50:5.5
    eee:50:5.5
    $ sort sort.txt
    aaa:10:1.1
    bbb:20:2.2
    ccc:30:3.3
    ddd:40:4.4
    eee:50:5.5
    eee:50:5.5
Remove duplicate lines with the -u option or with uniq. Note that uniq only collapses duplicates that are adjacent and does not reorder anything, which is why its output below is still unsorted:
    $ cat sort.txt
    aaa:10:1.1
    ccc:30:3.3
    ddd:40:4.4
    bbb:20:2.2
    eee:50:5.5
    eee:50:5.5
    $ sort -u sort.txt
    aaa:10:1.1
    bbb:20:2.2
    ccc:30:3.3
    ddd:40:4.4
    eee:50:5.5
or
    $ uniq sort.txt
    aaa:10:1.1
    ccc:30:3.3
    ddd:40:4.4
    bbb:20:2.2
    eee:50:5.5
Using the -n, -r, -k, and -t options of sort:
    $ cat sort.txt
    AAA:BB:CC
    aaa:30:1.6
    ccc:50:3.3
    ddd:20:4.2
    bbb:10:2.5
    eee:40:5.4
    eee:60:5.1
    # Sort by the BB column in ascending numeric order:
    $ sort -nk 2 -t: sort.txt
    AAA:BB:CC
    bbb:10:2.5
    ddd:20:4.2
    aaa:30:1.6
    eee:40:5.4
    ccc:50:3.3
    eee:60:5.1
    # Sort by the CC column in descending numeric order:
    $ sort -nrk 3 -t: sort.txt
    eee:40:5.4
    eee:60:5.1
    ddd:20:4.2
    ccc:50:3.3
    bbb:10:2.5
    aaa:30:1.6
    AAA:BB:CC
Here -n sorts by numeric value, -r reverses the order, -k selects the field to sort on, and -t : sets the field separator to a colon. The header line AAA:BB:CC has no number in the key, so it counts as 0 and ends up first in the ascending sort and last in the descending one.
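In the example above the header line is sorted along with the data. One possible way to keep it in place, sketched here under the assumption that the header is always the first line, is to print it with head and sort only the rest with tail. The -k 3,3nr form attaches n and r to the third field alone.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Same sample data as in the notes above.
cat > "$tmpdir/sort.txt" <<'EOF'
AAA:BB:CC
aaa:30:1.6
ccc:50:3.3
ddd:20:4.2
bbb:10:2.5
eee:40:5.4
eee:60:5.1
EOF

# Print the header untouched, then sort the remaining lines
# by the third field, numerically, in descending order.
with_header=$(
  head -n 1 "$tmpdir/sort.txt"
  tail -n +2 "$tmpdir/sort.txt" | LC_ALL=C sort -t: -k 3,3nr
)

printf '%s\n' "$with_header"
rm -rf "$tmpdir"
```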
-k option syntax format:
    FStart.CStart Modifier,FEnd.CEnd Modifier
    -------Start---------,-------End--------
The comma splits the format into a Start part and an End part. The Start part has three components. Modifier is a set of option letters, such as the n and r seen earlier, that apply to this key only. FStart is the number of the field where the key starts, and CStart is the character position within that field where the key starts. The .CStart part can be omitted, in which case the key starts at the first character of the field. The End part works the same way with FEnd.CEnd: if .CEnd is omitted, or set to 0 (zero), the key ends at the last character of field FEnd. If the whole End part is omitted, the key runs to the end of the line.
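To make the FStart.CStart,FEnd.CEnd notation concrete, here is a small sketch on invented input: each line has an ISO date in field 1 and a one-letter label in field 2. The key 1.6,1.7 selects characters 6 and 7 of field 1, which is the month.

```shell
#!/bin/sh
# Hypothetical input: date in field 1, label in field 2, separated by a space.
input=$(printf '2021-03-15 c\n2020-12-01 a\n2022-01-20 b\n')

# Key = characters 6 to 7 of field 1 (the month digits).
by_month=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k 1.6,1.7)

# Key starts at field 2; with no End part it runs to the end of the line.
by_label=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k 2)

printf 'by month:\n%s\n' "$by_month"
printf 'by label:\n%s\n' "$by_label"
```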
Sort by company name, starting from its second letter:
    $ sort -t ' ' -k 1.2 facebook.txt
    baidu 100 5000
    sohu 100 4500
    google 110 5000
    guge 50 3000
-k 1.2 sorts on the string that starts at the second character of the first field and, because no End part is given, runs to the end of the line. baidu comes first because its second letter is a. The second character of both sohu and google is o, but the h that follows in sohu sorts before the second o of google, so the two come second and third respectively. guge can only come fourth.
Sort by the second letter of the company name only; where it is the same, sort by employee salary in descending order:
    $ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
    baidu 100 5000
    google 110 5000
    sohu 100 4500
    guge 50 3000
-k 1.2,1.2 restricts the first key to the second letter only. (Plain -k 1.2 would not work, because omitting the End part makes the key run from the second letter to the end of the line.) For the salaries we use -k 3,3nr, which is the most precise form: the key is the third field alone, compared numerically and in reverse. The n and r are attached to the key so that they apply only to it; written as standalone options such as -nr, they become global and also apply to the first key, which has no modifiers of its own. If the ,3 were omitted, the key would run from the start of the third field to the end of the line.
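The two-key sort can be reproduced with the sketch below. The notes do not show the original order of facebook.txt, so the file contents here are an assumption built from the output lines. The second command is included only for comparison: it passes -n and -r as standalone options, which sort treats as global, so its result may differ from the first.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Assumed contents of facebook.txt (line order is a guess).
cat > "$tmpdir/facebook.txt" <<'EOF'
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500
EOF

# n and r attached to the second key only: the first key stays a
# plain text comparison on the second letter of the name.
per_key=$(LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr "$tmpdir/facebook.txt")

# n and r as standalone options: they are global and are inherited
# by every key that has no modifiers of its own.
global=$(LC_ALL=C sort -t ' ' -nr -k 1.2,1.2 -k 3,3 "$tmpdir/facebook.txt")

printf 'per-key modifiers:\n%s\n' "$per_key"
printf 'global modifiers:\n%s\n' "$global"

rm -rf "$tmpdir"
```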
Uniq command
The uniq command reports or removes duplicated lines in a file. Because it only compares adjacent lines, it is generally used together with the sort command.
Command Options
- -c: prefix each line with the number of times it occurs;
- -d: display only the duplicated lines, one per group;
- -f n: skip the first n fields when comparing;
- -s n: skip the first n characters when comparing;
- -w n: compare no more than the first n characters of each line;
- -u: display only the lines that occur exactly once;
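The comparison-limiting options are easier to see on small inputs. The sketch below uses made-up lines for each option; -w is assumed to be available, which is the case with GNU uniq but may not be on other implementations.

```shell
#!/bin/sh
# -f 1: skip the first field, so lines that differ only there are duplicates.
skip_field=$(printf '1 apple\n2 apple\n3 pear\n' | uniq -f 1)

# -s 2: skip the first two characters before comparing.
skip_chars=$(printf 'x-foo\ny-foo\nz-bar\n' | uniq -s 2)

# -w 3: compare at most the first three characters (GNU uniq).
first_three=$(printf 'apple1\napple2\nbanana\n' | uniq -w 3)

printf 'skip field:\n%s\n' "$skip_field"
printf 'skip chars:\n%s\n' "$skip_chars"
printf 'first three:\n%s\n' "$first_three"
```

In each case uniq keeps the first line of a group of lines it considers equal.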
Uniq usage
Delete duplicate lines:
    uniq file.txt
    sort file.txt | uniq
    sort -u file.txt
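The three commands are not always equivalent, because uniq on its own only collapses duplicates that sit next to each other. A minimal sketch on an invented three-line input shows the difference:

```shell
#!/bin/sh
# The two "a" lines are not adjacent.
input=$(printf 'a\nb\na\n')

# uniq alone leaves both "a" lines in place.
uniq_only=$(printf '%s\n' "$input" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can drop one.
sorted_uniq=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

# sort -u does both steps at once.
sort_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

printf 'uniq only:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sorted_uniq"
printf 'sort -u:\n%s\n' "$sort_u"
```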
Show only the lines that occur exactly once:
    uniq -u file.txt
    sort file.txt | uniq -u
Count the number of times each line appears in a file:
    sort file.txt | uniq -c
Find the duplicated lines in a file:
    sort file.txt | uniq -d
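These pieces are often chained into a frequency table: sort, count with uniq -c, then sort again by the count. The sketch below does this on made-up input. The awk step is not part of the technique; it is there only to strip the padding that uniq -c puts in front of the counts.

```shell
#!/bin/sh
# Hypothetical input: one word per line, with repeats.
words=$(printf 'b\na\nc\na\nb\na\n')

# Count occurrences, then order by count, highest first.
counts=$(printf '%s\n' "$words" |
  LC_ALL=C sort |
  uniq -c |
  LC_ALL=C sort -k 1,1nr |
  awk '{print $1, $2}')

# Lines that appear more than once, and lines that appear exactly once.
repeated=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -d)
single=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -u)

printf 'counts:\n%s\n' "$counts"
printf 'repeated:\n%s\n' "$repeated"
printf 'single:\n%s\n' "$single"
```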