Linux sort command details

Source: Internet
Author: User

 

Linux sort command explanation

Syntax format
sort [-A] [-b] [-c] [-d] [-f] [-i] [-m] [-n] [-r] [-u] [-o outfile]
[-t character] [-T directory] [-y [kilobytes]] [-z recordsize]
[[+[fskip][.cskip][b][d][f][i][n][r]] [-[fskip][.cskip][b][d][f][i][n][r]]] ...
[-k keydefinition] ... [file ...]

Instructions for use
The sort command sorts the lines in the files specified by the file parameter and writes the result to standard output. If the file parameter specifies multiple files,
the sort command concatenates them and sorts them as one file. A - (minus sign) in place of a file name specifies standard input.
If no file name is specified, the command sorts standard input. Use the -o flag to specify an output file.

If no flags are specified, the sort command sorts entire lines of the input file according to the collation order of the current locale.
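The input and output rules described above can be tried with a short script. This is a minimal sketch, assuming a POSIX shell with mktemp; the file names a.txt, b.txt, and out.txt are hypothetical, and LC_ALL=C is set only so that the ordering does not depend on the reader's locale.

```shell
# Work in a throwaway directory; a.txt, b.txt and out.txt are hypothetical names.
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/a.txt"
printf 'fig\nbanana\n' > "$workdir/b.txt"

# Several file operands are concatenated and sorted as one input.
merged=$(LC_ALL=C sort "$workdir/a.txt" "$workdir/b.txt")

# "-" stands for standard input, so piped text can be combined with files.
with_stdin=$(printf 'cherry\n' | LC_ALL=C sort "$workdir/a.txt" -)

# -o writes the result to a file instead of standard output.
LC_ALL=C sort -o "$workdir/out.txt" "$workdir/b.txt"
saved=$(cat "$workdir/out.txt")

printf '%s\n' "$merged"
rm -rf "$workdir"
```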


Main Parameters
-A sorts byte by byte in ASCII collation order instead of the collation order of the current locale.
-b ignores leading spaces and tabs when finding the first or last column of a field.
-c checks whether the input is already sorted according to the ordering rules specified by the flags. A nonzero value is returned if the input file is not correctly sorted.
-d sorts in dictionary order. Only letters, digits, and spaces are considered in comparisons.
-f converts all lowercase letters to uppercase before comparison.
-i ignores all nonprinting characters during comparison.
-k keydefinition specifies a sort key. The format of the keydefinition option is:
[fstart[.cstart]][modifier][,[fend[.cend]][modifier]]
The sort key includes all characters from the field specified by the fstart variable and the column specified by the cstart variable
through the field specified by the fend variable and the column specified by the cend variable. The value of the modifier variable can be b, d, f, i, n, or r. Each modifier is equivalent to the flag of the same letter.

-m merges multiple input files only. The input files are assumed to be already sorted.
-n sorts numeric fields by arithmetic value. A numeric field can contain leading blanks, an optional minus sign, decimal digits, thousands separators, and an optional radix character.
Numeric sorting of a field that contains any nonnumeric characters can give unpredictable results.
-o outfile directs output to the file specified by the outfile parameter instead of standard output. The outfile parameter value can be the same as the file parameter value.
-r reverses the specified sort order.
-t character specifies character as the single field separator character.
-u suppresses all but one line in each set of lines that compare equal according to the sort keys and options.
-T directory places all temporary files that are created in the directory specified by the directory parameter.
-y [kilobytes] starts the sort command with the number of kilobytes of main storage specified by the kilobytes parameter and adds storage as needed.
(If the value specified by the kilobytes parameter is smaller than the minimum storage size or greater than the maximum, the minimum or maximum is used instead.)
If the -y flag is omitted, the sort command starts with the default storage size.
The -y0 flag starts with minimum storage, and the -y flag (with no kilobytes value) starts with maximum storage. The amount of storage used by the sort command significantly affects performance.
Sorting a small file in a large amount of storage is wasteful.
-z recordsize prevents abnormal termination when any of the lines being sorted is longer than the default buffer size.
When the -c or -m flag is specified, the sorting phase is omitted and the system default buffer size is used. If sorted lines are longer than this size, sort terminates abnormally.
The -z option records the length of the longest line in the sorting phase so that adequate buffers can be allocated in the merge phase.
recordsize must be a value in bytes equal to or greater than the longest line to be merged.
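Two of the flags above change what the command does rather than how it compares: -c only checks, and -m only merges. The following is a minimal sketch of both, assuming a POSIX shell with mktemp; the file names odd.txt, even.txt, and unsorted.txt are hypothetical.

```shell
# Hypothetical input files in a throwaway directory.
workdir=$(mktemp -d)
printf '1\n3\n5\n' > "$workdir/odd.txt"
printf '2\n4\n6\n' > "$workdir/even.txt"
printf '2\n1\n' > "$workdir/unsorted.txt"

# -c only checks the order: exit status 0 if sorted, nonzero otherwise.
if LC_ALL=C sort -n -c "$workdir/odd.txt"; then
  sorted_ok=yes
else
  sorted_ok=no
fi
if LC_ALL=C sort -n -c "$workdir/unsorted.txt" 2>/dev/null; then
  unsorted_ok=yes
else
  unsorted_ok=no
fi

# -m merges inputs that are already sorted, without a full sorting phase.
merged=$(LC_ALL=C sort -n -m "$workdir/odd.txt" "$workdir/even.txt")

echo "odd.txt sorted: $sorted_ok, unsorted.txt sorted: $unsorted_ok"
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Because -c reports its result through the exit status, it fits naturally into an if statement or a script guard.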

Examples
  • To sort the fruits file when the LC_ALL, LC_COLLATE, or LANG environment variable is set to en_US, enter:
LANG=en_US sort fruits
This command sequence displays the contents of the fruits file in ascending lexicographic order. The characters in each column, including spaces, digits, and special characters, are compared one by one.
For example, if the fruits file contains the text:
banana
orange
Persimmon
apple
%banana
apple
ORANGE
The sort command displays:
%banana
ORANGE
Persimmon
apple
apple
banana
orange
In the ASCII collating sequence, % (percent sign) comes before the uppercase letters, and the uppercase letters come before the lowercase letters.
If your current locale uses a collating sequence other than ASCII byte order, the results may be different.
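The byte-order result described here can be reproduced with a short script. This is a minimal sketch, assuming a POSIX shell with mktemp; the sample lines are illustrative and mix case and a leading % so that the ASCII ordering is visible, and LC_ALL=C is an assumption made to force byte comparison whatever locale is installed.

```shell
# Illustrative sample data: mixed case and a leading % make byte order visible.
workdir=$(mktemp -d)
printf '%s\n' banana orange Persimmon apple %banana apple ORANGE > "$workdir/fruits"

# LC_ALL=C forces plain byte (ASCII) comparison regardless of the user's locale.
c_sorted=$(LC_ALL=C sort "$workdir/fruits")

printf '%s\n' "$c_sorted"
rm -rf "$workdir"
```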
  • To sort in dictionary order, enter:
sort -d fruits
This command sequence sorts and displays the contents of the fruits file, comparing only letters, digits, and spaces.
If the fruits file is the same as in Example 1, the sort command displays:
ORANGE
Persimmon
apple
apple
%banana
banana
orange
The -d flag ignores the % (percent sign) character because it is not a letter, digit, or space, so %banana is compared as banana.
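A quick way to see the effect of -d is to sort a few lines that differ only by a punctuation character. The sketch below assumes a POSIX shell with mktemp and uses illustrative sample lines; how two lines with equal keys are ordered relative to each other may depend on the sort implementation, so the script just prints the result.

```shell
# Illustrative sample data in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' banana orange Persimmon apple %banana apple ORANGE > "$workdir/fruits"

# -d compares only letters, digits, and blanks, so "%banana" is keyed as "banana"
# and lands next to "banana" instead of at the top of the output.
d_sorted=$(LC_ALL=C sort -d "$workdir/fruits")

printf '%s\n' "$d_sorted"
rm -rf "$workdir"
```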
  • To group lines that contain uppercase letters and special characters with the similar lowercase lines, enter:
sort -d -f fruits
The -d flag ignores special characters, and the -f flag ignores differences in case.
When the LC_ALL, LC_COLLATE, or LANG environment variable is set to C, the output for the fruits file is:
apple
apple
%banana
banana
ORANGE
orange
Persimmon
  • To sort and remove duplicate lines, enter:
sort -d -f -u fruits
The -u flag tells the sort command to remove duplicate lines so that each line in the output is unique. This command sequence displays:
apple
%banana
ORANGE
Persimmon
Besides the duplicate apple, one line from each of the banana and orange pairs is also removed.
This is because the -d flag ignores the special character % and the -f flag ignores differences in case.
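The interaction between -d, -f, and -u can be checked with a small script. This is a minimal sketch, assuming a POSIX shell with mktemp and illustrative sample lines; which line of each equal group survives -u may differ between sort implementations, so the script only counts the surviving lines instead of asserting a particular representative.

```shell
# Illustrative sample data in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' banana orange Persimmon apple %banana apple ORANGE > "$workdir/fruits"

# -d -f: ignore punctuation and case, so similar lines end up next to each other.
grouped=$(LC_ALL=C sort -d -f "$workdir/fruits")

# Adding -u keeps one line from each group of lines whose keys compare equal.
unique=$(LC_ALL=C sort -d -f -u "$workdir/fruits")
unique_count=$(printf '%s\n' "$unique" | wc -l)

printf '%s\n' "$unique"
echo "lines kept: $unique_count"
rm -rf "$workdir"
```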
  • To sort as in Example 4 but remove duplicates only when they are not capitalized or punctuated differently, enter:
sort -u +0 -d -f +0 fruits
The +0 -d -f performs the same type of sort as -d -f in Example 3, and the second +0 performs another comparison to distinguish lines that are not identical.
This prevents the -u flag from removing them.
For the fruits file shown in Example 1, the added +0 distinguishes %banana from banana and ORANGE from orange.
However, the two instances of apple are identical, so one of them is deleted:
apple
%banana
banana
ORANGE
orange
Persimmon
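The same two-key idea can be written with -k key definitions. The sketch below is an assumed equivalent, not the original command: it uses -k1df for the first +0 -d -f key and -k1 for the second +0 key, on the assumption that the older +pos form may not be accepted by a current sort. The sample lines are illustrative.

```shell
# Illustrative sample data in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' banana orange Persimmon apple %banana apple ORANGE > "$workdir/fruits"

# First key: the whole line in dictionary order with case folded (like -d -f).
# Second key: the whole line compared as is, so only exact duplicates
# are equal on both keys and get removed by -u.
distinct=$(LC_ALL=C sort -u -k1df -k1 "$workdir/fruits")

printf '%s\n' "$distinct"
rm -rf "$workdir"
```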
  • To specify the character that separates fields, enter:
sort -t: +1 vegetables
This command sequence sorts the vegetables file, comparing the text that follows the first colon on each line.
The +1 tells the sort command to ignore the first field and compare from the start of the second field to the end of the line. The -t: flag tells the sort command that colons separate fields.

If vegetables contains:

Yams: 104
Turnips: 8
Potatoes: 15
Carrots: 104
Green beans: 32
Radishes: 5
Lettuce: 15
Then, when the LC_ALL, LC_COLLATE, or LANG environment variable is set to C, the sort command displays:
Carrots: 104
Yams: 104
Lettuce: 15
Potatoes: 15
Green beans: 32
Radishes: 5
Turnips: 8
Note that the numbers are not in numeric order. This happens because a lexicographic sort compares each character from left to right.
In other words, 3 comes before 5, so 32 comes before 5.
  • To sort numerically, enter:
sort -t: +1 -n vegetables
This command sequence sorts the vegetables file numerically on the second field.
If the vegetables file is the same as in Example 6, the sort command displays:
Radishes: 5
Turnips: 8
Lettuce: 15
Potatoes: 15
Green beans: 32
Carrots: 104
Yams: 104
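The difference between comparing the second field character by character and comparing it by value can be seen side by side. This is a minimal sketch, assuming a POSIX shell with mktemp; it uses -k2 and -k2n in place of +1 and +1 -n, on the assumption that a current sort may treat a bare +1 as a file name.

```shell
# The vegetables data from the text, written to a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 'Yams: 104' 'Turnips: 8' 'Potatoes: 15' 'Carrots: 104' \
  'Green beans: 32' 'Radishes: 5' 'Lettuce: 15' > "$workdir/vegetables"

# Key = field 2 to end of line, compared character by character.
lexical=$(LC_ALL=C sort -t: -k2 "$workdir/vegetables")

# Same key with the n modifier: compared by arithmetic value.
numeric=$(LC_ALL=C sort -t: -k2n "$workdir/vegetables")

echo "character by character:"
printf '%s\n' "$lexical"
echo "by numeric value:"
printf '%s\n' "$numeric"
rm -rf "$workdir"
```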
  • To sort on more than one field, enter:
sort -t: +1 -2 -n +0 -1 -r vegetables
or
sort -t: -k2,2n -k1,1r vegetables
This command sequence performs a numeric sort on the second field (+1 -2 -n). Within that ordering, it sorts the first field in reverse alphabetical order (+0 -1 -r).
When the LC_ALL, LC_COLLATE, or LANG environment variable is set to C, the output looks like this:
Radishes: 5
Turnips: 8
Potatoes: 15
Lettuce: 15
Green beans: 32
Yams: 104
Carrots: 104

This command sorts the lines in numeric order. When two lines have the same number, they appear in reverse alphabetical order.
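The -k form of the two-key command can be run as a script. This is a minimal sketch, assuming a POSIX shell with mktemp; the data is the vegetables file from the text, and LC_ALL=C is set so that the alphabetical tie-break does not depend on the locale.

```shell
# The vegetables data from the text, written to a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 'Yams: 104' 'Turnips: 8' 'Potatoes: 15' 'Carrots: 104' \
  'Green beans: 32' 'Radishes: 5' 'Lettuce: 15' > "$workdir/vegetables"

# Primary key: field 2 only, numeric. Secondary key: field 1 only, reversed.
ordered=$(LC_ALL=C sort -t: -k2,2n -k1,1r "$workdir/vegetables")

printf '%s\n' "$ordered"
rm -rf "$workdir"
```

Ending each key at its own field (2,2 and 1,1) keeps the keys from running on to the end of the line, which is what lets the second key act purely as a tie-break.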

  • To replace the original file with the sorted text, enter:
sort -o vegetables vegetables
This command sequence stores the sorted output in the vegetables file (-o vegetables).
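Sorting a file onto itself with -o can be sketched as follows, assuming a POSIX shell with mktemp and illustrative file contents. The -o flag is used here instead of shell redirection because a redirection such as > vegetables would normally truncate the file before sort has read it.

```shell
# Illustrative file contents in a throwaway directory.
workdir=$(mktemp -d)
printf 'yams\ncarrots\nlettuce\n' > "$workdir/vegetables"

# The input file and the -o output file are the same path.
LC_ALL=C sort -o "$workdir/vegetables" "$workdir/vegetables"
in_place=$(cat "$workdir/vegetables")

printf '%s\n' "$in_place"
rm -rf "$workdir"
```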


To sort the contents of the file example using the 2nd field as the sort key:
$ sort +1 -2 example

To sort the file example in reverse order, place the result in outfile, and use the first character of the 2nd field as the sort key:
$ sort -r -o outfile +1.0 -1.1 example
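Both key selections can be expressed with -k as well. The sketch below is an assumed equivalent under stated assumptions: -k2,2 stands in for +1 -2 and -k2.1,2.1 for +1.0 -1.1, the sample file is hypothetical, and a colon separator (-t:) is used so that the field boundaries are unambiguous.

```shell
# Hypothetical colon-separated sample file in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 'x:delta' 'y:alpha' 'z:charlie' 'w:bravo' > "$workdir/example"

# Sort on the whole 2nd field.
by_field=$(LC_ALL=C sort -t: -k2,2 "$workdir/example")

# Reverse sort on the first character of the 2nd field, written with -o.
LC_ALL=C sort -r -t: -k2.1,2.1 -o "$workdir/outfile" "$workdir/example"
by_char=$(cat "$workdir/outfile")

printf '%s\n' "$by_field"
printf '%s\n' "$by_char"
rm -rf "$workdir"
```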

The sort command is often combined with other commands in a pipeline to build more complex functions. For example, to send the listing of the files in the current working directory to sort, with a sort key that starts at the 6th field and ends before the 8th field:
$ ls -l | sort +5 -7

The sort command can also operate on standard input. For example, to merge the text lines of several files and sort the merged lines, you can use the cat command to concatenate the files
and then pipe the merged text lines to the sort command, which outputs them merged and sorted. In the following example,
the text lines of veglist and fruitlist are merged, sorted, and saved to the file clist.

$ cat veglist fruitlist | sort > clist
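The pipeline can be compared with passing both files to sort directly. This is a minimal sketch, assuming a POSIX shell with mktemp; the contents of veglist and fruitlist are illustrative.

```shell
# Illustrative contents for veglist and fruitlist in a throwaway directory.
workdir=$(mktemp -d)
printf 'turnip\ncarrot\n' > "$workdir/veglist"
printf 'pear\napple\n' > "$workdir/fruitlist"

# Concatenate with cat, sort the stream, and save it, as in the text.
cat "$workdir/veglist" "$workdir/fruitlist" | LC_ALL=C sort > "$workdir/clist"
piped=$(cat "$workdir/clist")

# Passing both files to sort directly gives the same lines.
direct=$(LC_ALL=C sort "$workdir/veglist" "$workdir/fruitlist")

printf '%s\n' "$piped"
rm -rf "$workdir"
```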
