Sorting with sort
Sample file (video.txt)
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
The command format for sort is: sort -cmu -o output_file [other options] +pos1 -pos2 input_files
(The +pos1 -pos2 form is obsolete; the -k option described later replaces it.)
Sorting by the entire line
You can save the result with the -o option or with redirection.
sort video.txt > results.out
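Checking whether a file is sorted
sort -c video.txt
If the file is already sorted, nothing is printed and the exit status is 0. If it is not sorted, sort reports the first out-of-order line and the exit status is 1.
With -C the check is the same, but nothing is printed in either case; an unsorted file still gives exit status 1.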
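The exit statuses can be checked directly. The following is a minimal sketch that feeds sort small inline inputs instead of video.txt; the variable names are illustrative only.

```shell
# Compare the exit status of sort -c / -C on sorted and unsorted input.
export LC_ALL=C                       # byte-order comparison, independent of locale

sorted_status=0
printf 'a\nb\nc\n' | sort -c || sorted_status=$?               # already in order

unsorted_status=0
printf 'b\na\nc\n' | sort -c 2>/dev/null || unsorted_status=$? # diagnostic hidden

quiet_status=0
printf 'b\na\nc\n' | sort -C || quiet_status=$?                # -C prints nothing

echo "sorted=$sorted_status unsorted=$unsorted_status quiet=$quiet_status"
```

Because the status is the only output that matters here, sort -C is convenient in scripts, for example in an if condition.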
Splitting on ":" and sorting by the content from the 2nd field onward
sort -t: -k 2 video.txt
Use -t to specify the field delimiter and -k to specify the field where the sort key starts.
Note: -k 2 means the key runs from the start of field 2 to the end of the line, not field 2 alone; the full syntax is described later.
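As a sketch of what this command produces, the block below recreates the sample file in a scratch directory (the directory and variable names are assumptions for illustration) and sorts it on the key that starts at field 2.

```shell
# Sort the sample data on the key "field 2 to end of line".
export LC_ALL=C                       # byte-order comparison, independent of locale
workdir=$(mktemp -d)                  # scratch directory, removed at the end
cat > "$workdir/video.txt" <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

by_field2=$(sort -t: -k 2 "$workdir/video.txt")
printf '%s\n' "$by_field2"

# All hk lines come before the kl lines; within each group the rest of the
# line (fields 3 and 4) decides the order, compared as text.
first_line=$(printf '%s\n' "$by_field2" | head -n 1)
last_line=$(printf '%s\n' "$by_field2" | tail -n 1)
rm -rf "$workdir"
```

Note that "kl:445:..." sorts before "kl:63:..." because the key is compared as a string, character by character.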
Reverse order
sort -r video.txt
By default the order is ascending; -r sorts in descending order.
Removing duplicate lines
sort -u video.txt
Sorting by numeric value
sort -n -k 3 -t: video.txt
By default, keys are compared as strings (byte order in the C locale); use -n to compare them as numbers.
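The difference between the two comparisons is easiest to see side by side. This sketch rebuilds the sample file in a scratch directory (an assumption for illustration) and prints only the third field in each resulting order.

```shell
# String comparison versus numeric comparison on field 3.
export LC_ALL=C
workdir=$(mktemp -d)
cat > "$workdir/video.txt" <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

# Without -n, "63" sorts last because "6" is greater than "1" through "5".
string_order=$(sort -t: -k 3 "$workdir/video.txt" | cut -d: -f3 | paste -s -d ' ' -)
# With -n, 63 is the smallest value and sorts first.
numeric_order=$(sort -t: -n -k 3 "$workdir/video.txt" | cut -d: -f3 | paste -s -d ' ' -)

echo "string:  $string_order"
echo "numeric: $numeric_order"
rm -rf "$workdir"
```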
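Multi-key sorting
All keys ascending: sort -n -k 2 -k 3 facebook.txt
Mixed ascending and descending (r = reverse): sort -n -k 3r -k 2 facebook.txt
Numeric ordering set per key (n = numeric): sort -k 3nr -k 2n facebook.txt
Note: a key that carries its own modifier does not inherit the global -n, so write -k 3nr when field 3 must be compared numerically.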
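The facebook.txt file is not shown in this article, so the sketch below uses a hypothetical scores.txt (name, group, score) to illustrate per-key modifiers: score descending first, then group ascending to break ties.

```shell
# Two sort keys with their own modifiers.
export LC_ALL=C
workdir=$(mktemp -d)
# Hypothetical data: name, group, score (not from the article).
cat > "$workdir/scores.txt" <<'EOF'
alice 2 30
bob 1 30
carol 2 10
dave 1 20
EOF

# Key 1: field 3 only, numeric, reversed. Key 2: field 2 only, numeric.
ranked=$(sort -k 3,3nr -k 2,2n "$workdir/scores.txt" | cut -d ' ' -f1 | paste -s -d ' ' -)
echo "$ranked"
rm -rf "$workdir"
```

Limiting each key with an end field (3,3 and 2,2) keeps the later fields from leaking into the comparison.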
The syntax of -k
-k FStart[.CStart][Modifier][,FEnd[.CEnd][Modifier]]
Modifier is one or more option letters such as n or r.
If the end is not set, the key extends to the end of the line.
FStart is the field number, and CStart is the character position within that field where the key starts.
To sort by the second letter of the first field and, where that letter is the same, by the third field in descending numeric order:
sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
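A small worked case may make the character-position syntax clearer. The data file here (sites.txt) is hypothetical and stands in for facebook.txt, which is not shown.

```shell
# Sort on the 2nd character of field 1, then on field 3 numerically, descending.
export LC_ALL=C
workdir=$(mktemp -d)
# Hypothetical data: name, two numeric columns, separated by single spaces.
cat > "$workdir/sites.txt" <<'EOF'
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000
EOF

# Second letters are a, o, o, u; the tie between sohu and google is broken
# by field 3 in descending order (5000 before 4500).
ordered=$(sort -t ' ' -k 1.2,1.2 -k 3,3nr "$workdir/sites.txt" | cut -d ' ' -f1 | paste -s -d ' ' -)
echo "$ordered"
rm -rf "$workdir"
```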
Other Modifier options
- b ignores leading blanks in the field.
- d sorts the field in dictionary order (only blanks and alphanumeric characters are considered).
- f sorts the field ignoring case.
- i ignores non-printable characters and sorts only on the printable ones. (Some ASCII characters are non-printable, such as \a (alert), \b (backspace), \n (newline), and \r (carriage return).)
Other options:
-f ignores case by folding lowercase to uppercase for comparison
-M sorts by month abbreviation (JAN < FEB < ... < DEC)
head and tail
Use head and tail to output the first or last lines of their input.
sort -t: -r -k 4 video.txt | head -n 1
Sorting multiple files together
sort -t: -k 1 video2.txt video1.txt
(Add -m to merge files that are each already sorted.)
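As a sketch, the same pipeline can pick both the highest and the lowest value of field 4. The -n flag is added here as an assumption so the comparison stays correct even if the values had different numbers of digits; the sample file is rebuilt in a scratch directory.

```shell
# Highest and lowest value of field 4 using sort with head and tail.
export LC_ALL=C
workdir=$(mktemp -d)
cat > "$workdir/video.txt" <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

# Descending numeric order on field 4: first line is the maximum, last the minimum.
highest=$(sort -t: -n -r -k 4 "$workdir/video.txt" | head -n 1)
lowest=$(sort -t: -n -r -k 4 "$workdir/video.txt" | tail -n 1)
echo "highest: $highest"
echo "lowest:  $lowest"
rm -rf "$workdir"
```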
uniq
Filters adjacent matching lines from the input file or standard input and writes the result to the output file or standard output. In other words, it collapses consecutive identical lines.
Usage: uniq [options]... [file]
With no options, matching lines are merged into the first occurrence.
Arguments that are mandatory for long options are also mandatory for the corresponding short options.
- -c, --count // prefix each line with the number of times it occurs
- -d, --repeated // print only duplicated lines, one per group
- -D, --all-repeated // print only duplicated lines, but every copy of them
- -f, --skip-fields=N // skip the first N fields when comparing; -f 1 ignores the first field
- -i, --ignore-case // ignore case when comparing
- -s, --skip-chars=N // similar to -f, but skips characters; -s 5 ignores the first 5 characters
- -u, --unique // print only the lines that are not repeated
- -z, --zero-terminated // end lines with a 0 byte (NUL) instead of a newline
- -w, --check-chars=N // compare no more than the first N characters of each line
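Because uniq only looks at adjacent lines, its result depends on whether the input is sorted. The sketch below uses a hypothetical fruit list to show the default behavior alongside -d, -u, and the common sort | uniq -c idiom.

```shell
# uniq works on adjacent lines only; sort first to count across the whole file.
export LC_ALL=C
workdir=$(mktemp -d)
# Hypothetical input with one non-adjacent repeat of "apple".
printf 'apple\napple\nbanana\napple\ncherry\ncherry\n' > "$workdir/fruit.txt"

collapsed=$(uniq "$workdir/fruit.txt" | paste -s -d ' ' -)       # adjacent repeats merged
dup_only=$(uniq -d "$workdir/fruit.txt" | paste -s -d ' ' -)     # groups that repeat
unique_only=$(uniq -u "$workdir/fruit.txt" | paste -s -d ' ' -)  # groups of size one
# awk normalizes the padded count column that uniq -c prints.
counts=$(sort "$workdir/fruit.txt" | uniq -c | awk '{print $1 ":" $2}' | paste -s -d ' ' -)

echo "collapsed:   $collapsed"
echo "duplicates:  $dup_only"
echo "unique only: $unique_only"
echo "counts:      $counts"
rm -rf "$workdir"
```

The third "apple" survives the plain uniq call and also appears under -u, because it is not adjacent to the first two.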
join
Similar to SQL's join.
Sample files:
cn.txt:
1 yi
2 er
3 san
4 si
en.txt:
1 one
2
3 three
5 five
Basic usage
join cn.txt en.txt
Joins the two files on the first column, outputs the lines whose key appears in both, and puts the columns of the first file first.
Left join / right join
join -a 1 cn.txt en.txt # the first file is primary; fields missing from the second are left empty
join -a 2 cn.txt en.txt # the second file is primary; fields missing from the first are left empty
Ignore case: -i
Output specified columns
join -o 1.1 -o 1.2 -o 2.1 -o 2.2 -a 1 cn.txt en.txt
Filling in fields that have no match: -e
join -o 1.1 -o 1.2 -o 2.1 -o 2.2 -a 1 -e "eee" cn.txt en.txt
Specify the input and output field delimiter: -t
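The following sketch runs the inner, left, and right joins on files modeled on the samples. It assumes the second line of en.txt reads "2 two", which is a guess made for illustration, and it writes the output list as one comma-separated -o argument.

```shell
# Inner join, left join with a filler value, and right join.
export LC_ALL=C                       # join expects input sorted in the current locale
workdir=$(mktemp -d)
printf '1 yi\n2 er\n3 san\n4 si\n' > "$workdir/cn.txt"
# Assumption: line 2 is "2 two"; the article's sample shows only the key.
printf '1 one\n2 two\n3 three\n5 five\n' > "$workdir/en.txt"

inner=$(join "$workdir/cn.txt" "$workdir/en.txt")
# Keep unmatched lines of file 1 and fill the missing file-2 field with "eee".
left=$(join -a 1 -e eee -o 1.1,1.2,2.2 "$workdir/cn.txt" "$workdir/en.txt")
# Keep unmatched lines of file 2.
right=$(join -a 2 "$workdir/cn.txt" "$workdir/en.txt")

inner_last=$(printf '%s\n' "$inner" | tail -n 1)
left_last=$(printf '%s\n' "$left" | tail -n 1)
right_last=$(printf '%s\n' "$right" | tail -n 1)
echo "inner: $inner_last"
echo "left:  $left_last"
echo "right: $right_last"
rm -rf "$workdir"
```

Both inputs must already be sorted on the join field, which is why the keys are listed in ascending order.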
cut
Cuts columns or fields out of each line; the result can then be pasted into other files.
cut -b list [-n] [file] or cut -c list [file] or cut -f list [-d delim] [file]
-b: select by byte. Byte positions ignore multibyte character boundaries unless the -n flag is also given.
-c: select by character.
-d: custom field delimiter; the default is tab.
-f: select by field; used together with -d.
-n: do not split multibyte characters. Used only with the -b flag. If the last byte of a character falls within the range given by the list argument of -b, the character is written out; otherwise the character is excluded.
Three positioning methods:
First, bytes, with option -b
Second, characters, with option -c
Third, fields, with option -f
Selecting by byte
Forms of the list argument
- From the beginning to the specified position: -4
- From the specified position to the end: 4-
- Specified range: 3-5
- Several specific positions and ranges: 1,3-5,7
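Each list form can be tried on a single line. This sketch applies them to one record from the sort sample; the variable names are illustrative only.

```shell
# The list forms of cut applied to one sample line.
export LC_ALL=C
line='alien:hk:119:1982'

upto4=$(printf '%s\n' "$line" | cut -c -4)          # characters 1 through 4
from4=$(printf '%s\n' "$line" | cut -c 4-)          # character 4 to end of line
range=$(printf '%s\n' "$line" | cut -c 3-5)         # characters 3 through 5
mixed=$(printf '%s\n' "$line" | cut -b 1,3-5,7)     # bytes 1, 3 to 5, and 7
fields=$(printf '%s\n' "$line" | cut -d: -f 1,3)    # fields 1 and 3, ":" delimited

echo "$upto4 | $from4 | $range | $mixed | $fields"
```

With -f, the selected fields are printed joined by the same delimiter that was given to -d.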
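paste
paste [-d delim] [-s] file1 file2
-d specifies the delimiter placed between the content of file 1 and file 2 (the default is tab)
-s merges all lines of each file into a single line, one output line per file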
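A minimal sketch of both modes, using two hypothetical three-line files:

```shell
# paste side by side (default) and serially (-s).
export LC_ALL=C
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/file1"
printf '1\n2\n3\n' > "$workdir/file2"

# Line N of file1 joined to line N of file2 with ":".
side_by_side=$(paste -d : "$workdir/file1" "$workdir/file2")
# With -s, each file becomes one output line.
serial=$(paste -s -d : "$workdir/file1" "$workdir/file2")

printf '%s\n' "$side_by_side"
printf '%s\n' "$serial"
rm -rf "$workdir"
```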
split
Splits large files into small files.
split [-n] file [name]
-n: the number of lines in each output file, written as a number (for example -100), equivalent to -l n; the default is 1000 lines
file: the file to split
name: the prefix for the output file names; if not specified it defaults to x, so the generated files are named xaa, xab, ... up to xzz
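As a sketch, the block below splits a hypothetical 25-line file into pieces of 10 lines using the -l form and a part_ prefix, then checks that concatenating the pieces restores the original.

```shell
# Split a 25-line file into 10-line pieces and verify the round trip.
export LC_ALL=C
workdir=$(mktemp -d)
seq 1 25 > "$workdir/numbers.txt"     # hypothetical input: the numbers 1 to 25

split -l 10 "$workdir/numbers.txt" "$workdir/part_"   # creates part_aa, part_ab, part_ac

piece_count=$(ls "$workdir"/part_* | wc -l | tr -d ' ')
last_piece_lines=$(wc -l < "$workdir/part_ac" | tr -d ' ')

# The shell expands part_* in name order, so cat rebuilds the original file.
if cat "$workdir"/part_* | cmp -s - "$workdir/numbers.txt"; then
  roundtrip=ok
else
  roundtrip=differs
fi
echo "pieces=$piece_count last=$last_piece_lines roundtrip=$roundtrip"
rm -rf "$workdir"
```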
tr
tr treats its entire input as one stream of characters.
tr -c -d -s ["string1_to_translate_from"] ["string2_to_translate_to"] < input-file
-c replaces the character set in string 1 with its complement, that is, all characters not in string 1; the character set is assumed to be ASCII.
-d deletes all input characters that appear in string 1.
-s squeezes each run of a repeated character into a single occurrence of that character.
input-file is the name of the file to convert. Input can be supplied in other ways, but this form is the most common.
Character ranges
When specifying string 1 or string 2, only single characters, character ranges, or lists can be used.
[a-z] a string consisting of the characters a through z.
[A-Z] a string consisting of the characters A through Z.
[0-9] a string of digits.
\octal a three-digit octal number that corresponds to a valid ASCII character.
[O*n] the character O repeated n times; for example, [O*2] matches the string OO.
Notations for specific control characters in tr
Shorthand  Key     Meaning          Octal
\a         Ctrl-G  bell             \007
\b         Ctrl-H  backspace        \010
\f         Ctrl-L  form feed        \014
\n         Ctrl-J  newline          \012
\r         Ctrl-M  carriage return  \015
\t         Ctrl-I  tab              \011
\v         Ctrl-K  vertical tab     \013
Squeezing repeated characters
echo 'ahhjjjkkk;;;' | tr -s "[a-z]"
Converting to uppercase
echo 'ahhjjjkkk;;;' | tr "[a-z]" "[A-Z]"
Replacing characters that are not in a set
echo 'ahhjjjkkk;;;' | tr -c '[a-y]' '#'
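The results of these commands can be captured and compared, as in this sketch. It writes the ranges without brackets, and in the -c case it adds \n to the set (an adjustment made here) so that the trailing newline is not replaced as well.

```shell
# Squeeze, translate, complement, and delete with tr.
export LC_ALL=C
input='ahhjjjkkk;;;'

squeezed=$(printf '%s\n' "$input" | tr -s 'a-z')       # runs of letters collapsed
upper=$(printf '%s\n' "$input" | tr 'a-z' 'A-Z')       # lowercase to uppercase
# Everything that is not a-y or newline becomes "#".
masked=$(printf '%s\n' "$input" | tr -c 'a-y\n' '#')
deleted=$(printf '%s\n' "$input" | tr -d ';')          # all semicolons removed

echo "$squeezed $upper $masked $deleted"
```

The semicolons are untouched by tr -s 'a-z' because only characters listed in the set are squeezed.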
Shell text processing: splitting, merging, and filtering