Shell Text Processing: Splitting, Merging, and Filtering

Sort Operations

Sample file (video.txt)

Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972

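The general form of the sort command is: sort -cmu -o output_file [other options] +pos1 -pos2 input_files. The +pos1 -pos2 key notation is obsolete; modern sort selects keys with -k, described below.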

Sort whole lines

Save the result with the -o option or with shell redirection:

sort -o results.out video.txt
sort video.txt > results.out

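Check whether a file is already sorted

sort -c video.txt

If the file is already sorted, nothing is printed and the exit status is 0. If it is not, sort reports the first out-of-order line on standard error and exits with status 1. With -C instead of -c, the report is suppressed and only the exit status (1) signals that the input is unsorted.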
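The sketch below exercises -c, -C and -o. It assumes GNU coreutils sort, and the file name unsorted.txt is hypothetical. It also relies on the fact that -o may name the input file itself, because sort reads all of its input before writing. By contrast, sort f > f would truncate f before sort ever reads it.

```shell
# Sketch: checking sortedness with -c/-C and saving output with -o.
# Assumes GNU coreutils sort; unsorted.txt is a hypothetical file name.
export LC_ALL=C   # byte-order comparison, independent of the system locale

printf 'b\na\nc\n' > unsorted.txt

# -c prints a "disorder" message on stderr and exits with status 1.
if sort -c unsorted.txt; then
    echo "unsorted.txt is sorted"
else
    echo "unsorted.txt is NOT sorted (exit status 1)"
fi

# -o may safely name the input file: sort reads everything before writing.
sort -o unsorted.txt unsorted.txt

# -C does the same check silently; only the exit status matters.
if sort -C unsorted.txt; then
    echo "unsorted.txt is now sorted"
fi
cat unsorted.txt
```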

Sort by the second field, using : as the field separator

sort -t: -k 2 video.txt

-t sets the field separator and -k selects the field where the sort key starts.

Note: -k 2 makes the key start at field 2 and, because no end field is given, run to the end of the line. To sort on field 2 alone, write -k 2,2. The full -k syntax is described below.
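The next sketch shows the difference between a key that runs to the end of the line and a key confined to one field. It recreates the sample file as video.txt and sorts it both ways. It assumes GNU sort in the C locale. With -k 2,2, ties between lines of the same region are broken by comparing whole lines, which GNU sort does as a last resort.

```shell
# Sketch: -k 2 (field 2 to end of line) versus -k 2,2 (field 2 only).
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

echo "== key = field 2 to end of line (-k 2) =="
sort -t: -k 2 video.txt

echo "== key = field 2 only (-k 2,2); ties fall back to the whole line =="
sort -t: -k 2,2 video.txt
```

With -k 2 the hk lines come out ordered by the rest of the line ("hk:119:...", "hk:192:...", and so on). With -k 2,2 they are ordered by the whole line instead.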

Reverse order

sort -r video.txt

Sorting is ascending by default; -r sorts in descending order. (In GNU sort, the uppercase -R means random order, not reverse.)

Remove duplicates

sort -u video.txt

-u outputs only the first of each run of lines that compare equal.
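A small sketch of -r and -u on a hypothetical file, fruit.txt, that contains duplicates. It assumes the C locale so that the order is plain byte order.

```shell
# Sketch: descending order with -r and duplicate removal with -u.
# fruit.txt is a hypothetical example file.
export LC_ALL=C

printf 'pear\napple\npear\nfig\napple\n' > fruit.txt

echo "== sort -r =="
sort -r fruit.txt       # pear pear fig apple apple

echo "== sort -u =="
sort -u fruit.txt       # apple fig pear

echo "== sort -r -u =="
sort -r -u fruit.txt    # pear fig apple
```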

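Sort numerically

sort -n -k 3 -t: video.txt

By default, sort compares keys as strings (using the locale's collation order, or byte by byte in the C locale), so "63" sorts after "532". -n compares the leading numbers numerically instead.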
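To make the string-versus-number difference visible, this sketch sorts the sample file on field 3 both ways. It assumes GNU sort in the C locale. It uses -k 3,3 to keep the key to field 3 alone; with -n, the article's -k 3 behaves the same here because only the leading number is read.

```shell
# Sketch: string comparison versus numeric comparison on field 3.
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

echo "== string comparison: \"63\" sorts last =="
sort -t: -k 3,3 video.txt

echo "== numeric comparison (-n): 63 sorts first =="
sort -t: -k 3,3n video.txt
```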

Multi-key sorting

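Both keys numeric and ascending (-n applies to every key): sort -n -k 2 -k 3 facebook.txt

Mixed order (r = reverse): sort -n -k 3r -k 2 facebook.txt. Options attached to a key override all global ordering options for that key. So here field 3 loses the global -n and is compared as a string in reverse order; only field 2 is compared numerically.

Per-key modifiers (n = numeric): sort -k 3nr -k 2n facebook.txt compares field 3 numerically in descending order, then field 2 numerically in ascending order.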
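The following sketch applies a multi-key sort to the sample file, recreated as video.txt. It orders by region (field 2) and then by field 3 in descending numeric order. It then shows the override pitfall: the global -n is ignored by a key that carries its own r. The sketch assumes GNU sort in the C locale.

```shell
# Sketch: multi-key sorting and the "key modifiers override global options" rule.
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

echo "== field 2 ascending, then field 3 numeric descending =="
sort -t: -k 2,2 -k 3,3nr video.txt

echo "== field 3 numeric descending: top line =="
sort -t: -k 3,3nr video.txt | head -n 1       # aliens (532)

echo "== pitfall: key 3,3r has its own modifier, so global -n is not applied =="
sort -t: -n -k 3,3r video.txt | head -n 1     # The hill ("63" is the largest string)
```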

The -k syntax

-k Fstart[.Cstart][modifier][,Fend[.Cend][modifier]]

Here modifier is one or more ordering letters such as n or r.

If the end (,Fend) is omitted, the key runs to the end of the line.

Fstart is the field where the key starts and Cstart is the character within that field where it starts; both count from 1. Fend and Cend mark where the key ends.

For example, to sort on the second character of the first field and, when those are equal, on the third field in descending numeric order:

sort -t " " -k 1.2,1.2 -k 3,3nr facebook.txt
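The same kind of key can be tried on the sample file (recreated as video.txt, with : as the separator). The sketch sorts on the second character of field 1, then on field 3 in descending numeric order. It assumes GNU sort in the C locale, where a space sorts before any letter.

```shell
# Sketch: character-position keys (-k F.C,F.C) combined with a numeric key.
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

# Key 1: only the 2nd character of field 1 ("A Few..." has a space there).
# Key 2: field 3, numeric, descending, to break ties (aliens before alien).
sort -t: -k 1.2,1.2 -k 3,3nr video.txt
```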

Other modifiers

b ignores leading blanks in the field.
d sorts the field in dictionary order, considering only blanks and alphanumeric characters.
f folds lowercase to uppercase, so case is ignored for this field.
i ignores non-printing characters and sorts on the printable ones only. (Non-printing characters include the ASCII control characters, for example \a bell, \b backspace, \n newline and \r carriage return.)

Other options:

-f ignores case by converting lowercase letters to uppercase before comparing.

-M sorts by month name (JAN < FEB < ... < DEC).
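A sketch of -f and -M, assuming GNU sort in the C locale. In that locale, uppercase sorts before lowercase and month names are the English abbreviations. The month list is a made-up example.

```shell
# Sketch: case-insensitive sorting (-f) and month sorting (-M).
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

echo "== plain sort: all capitalized titles come before alien/aliens =="
sort video.txt

echo "== sort -f: case is folded, so alien/aliens move up =="
sort -f video.txt

echo "== sort -M on abbreviated month names =="
printf 'Mar\nJan\nDec\nFeb\n' | sort -M
```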

Head and tail

head and tail print the first or last lines of their input, so piping sorted output into them picks out the extreme values:

sort -t: -r -k 4 video.txt | head -n 1
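The sketch below combines sort with head and tail on the sample file to pick the largest and smallest values of field 4 and the top three by field 3. It assumes GNU sort in the C locale.

```shell
# Sketch: extreme values and top-N lists with sort | head / tail.
export LC_ALL=C

cat > video.txt <<'EOF'
Boys in company c:hk:192:2192
alien:hk:119:1982
The hill:kl:63:2972
aliens:hk:532:4892
Star wars:hk:301:4102
A Few Good men:kl:445:5851
Toy story:hk:239:3972
EOF

echo "Highest field 4:"
sort -t: -k 4,4n video.txt | tail -n 1

echo "Lowest field 4:"
sort -t: -k 4,4n video.txt | head -n 1

echo "Top 3 by field 3:"
sort -t: -k 3,3nr video.txt | head -n 3
```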

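Merging multiple files

sort -t: -k 1 video2.txt video1.txt

Given several input files, sort reads them all and sorts the combined lines. If every file is already sorted, -m merges them without sorting again.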
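A sketch of both forms, using two hypothetical files, part1.txt and part2.txt, in the C locale. Plain sort accepts unsorted inputs. -m assumes each input is already sorted, so the files are sorted in place first.

```shell
# Sketch: sorting several files together, then merging pre-sorted files with -m.
# part1.txt and part2.txt are hypothetical file names.
export LC_ALL=C

printf 'cherry\napple\n' > part1.txt
printf 'date\nbanana\n' > part2.txt

echo "== sort part1.txt part2.txt (combined, then sorted) =="
sort part1.txt part2.txt

# -m only merges; each input must already be sorted.
sort -o part1.txt part1.txt
sort -o part2.txt part2.txt

echo "== sort -m part1.txt part2.txt (merge only) =="
sort -m part1.txt part2.txt
```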

Uniq

uniq filters adjacent matching lines from the input file (or standard input) and writes the result to the output file (or standard output). Because only consecutive duplicate lines are merged, the input is normally sorted first.

Usage: uniq [OPTION]... [INPUT [OUTPUT]]
With no options, each run of matching lines is merged into its first occurrence.
Mandatory arguments to long options are mandatory for short options too.

-c, --count            prefix each line with the number of times it occurred
-d, --repeated         print only duplicated lines, one for each group
-D, --all-repeated     print every line of each duplicated group
-f, --skip-fields=N    do not compare the first N fields (-f 1 ignores the first field)
-i, --ignore-case      ignore differences in case when comparing
-s, --skip-chars=N     like -f, but skips the first N characters instead (-s 5 ignores the first 5)
-u, --unique           print only lines that are not repeated (this is not SQL DISTINCT; plain uniq on sorted input is the DISTINCT equivalent)
-z, --zero-terminated  end lines with a NUL byte, not a newline
-w, --check-chars=N    compare no more than N characters of each line
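The sketch below runs the main uniq options on a hypothetical letters.txt. The last two commands show why sort usually comes first: non-adjacent duplicates survive plain uniq. The C locale is assumed.

```shell
# Sketch: common uniq options; letters.txt is a hypothetical file.
export LC_ALL=C

printf 'a\na\nb\nc\nc\nc\n' > letters.txt

echo "== uniq ==";    uniq letters.txt       # a b c
echo "== uniq -c =="; uniq -c letters.txt    # counts 2, 1, 3
echo "== uniq -d =="; uniq -d letters.txt    # a c
echo "== uniq -u =="; uniq -u letters.txt    # b

# Only adjacent duplicates are merged ...
printf 'x\ny\nx\n' | uniq                    # x y x
# ... so sort first to remove all duplicates.
printf 'x\ny\nx\n' | sort | uniq             # x y
```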

Join

Similar to a join in SQL: lines of two files are paired on a common field. Both files must be sorted on that field.

Sample files:

cn.txt      en.txt
1 Yi        1 One
2 Er        2 Two
3 San       3 Three
4 Si        5 Five

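Direct use

join cn.txt en.txt

This joins the two files on their first field and prints only the lines whose key appears in both files. Each output line contains the join field, then the remaining fields of the first file, then those of the second.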
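A minimal sketch that recreates the two sample files and performs the inner join. It assumes the C locale. It also assumes that line 2 of en.txt reads "2 Two", since that word is missing from the extracted sample.

```shell
# Sketch: inner join of the two sample files on field 1.
export LC_ALL=C

printf '1 Yi\n2 Er\n3 San\n4 Si\n' > cn.txt
printf '1 One\n2 Two\n3 Three\n5 Five\n' > en.txt   # "2 Two" is assumed

# Only keys present in both files (1, 2, 3) are printed.
join cn.txt en.txt
```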

Left join / right join

join -a 1 cn.txt en.txt # also print unpaired lines from the first file (like a LEFT JOIN)

join -a 2 cn.txt en.txt # also print unpaired lines from the second file (like a RIGHT JOIN)
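The sketch below shows -a on the sample files, again assuming the C locale and "2 Two" in en.txt. Giving -a 1 and -a 2 together keeps unpaired lines from both sides, which is the equivalent of a full outer join.

```shell
# Sketch: outer joins with -a.
export LC_ALL=C

printf '1 Yi\n2 Er\n3 San\n4 Si\n' > cn.txt
printf '1 One\n2 Two\n3 Three\n5 Five\n' > en.txt   # "2 Two" is assumed

echo "== join -a 1 (adds 4 Si) =="
join -a 1 cn.txt en.txt

echo "== join -a 2 (adds 5 Five) =="
join -a 2 cn.txt en.txt

echo "== join -a 1 -a 2 (full outer join) =="
join -a 1 -a 2 cn.txt en.txt
```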

Ignore case: -i

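Output specified columns

join -o 1.1,1.2,2.1,2.2 -a 1 cn.txt en.txt

-o takes a list of FILE.FIELD specifications; for example, 1.2 is field 2 of the first file.

Fill in missing fields: -e

join -o 1.1,1.2,2.1,2.2 -a 1 -e "eee" cn.txt en.txt

-e replaces output fields that are missing from unpaired lines with the given string; it applies to the fields listed with -o.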
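A sketch of -o, -e and -t. It assumes the C locale and "2 Two" in en.txt. The comma-separated files cn.csv and en.csv are hypothetical and only illustrate -t, which sets both the input and the output separator.

```shell
# Sketch: choosing output fields (-o), filling gaps (-e), custom separator (-t).
export LC_ALL=C

printf '1 Yi\n2 Er\n3 San\n4 Si\n' > cn.txt
printf '1 One\n2 Two\n3 Three\n5 Five\n' > en.txt   # "2 Two" is assumed

# Unpaired "4 Si" gets "eee" in place of the missing en.txt fields.
join -o 1.1,1.2,2.1,2.2 -a 1 -e "eee" cn.txt en.txt

# -t: the separator is used for reading and for writing.
printf '1,Yi\n2,Er\n' > cn.csv
printf '1,One\n2,Two\n' > en.csv
join -t, cn.csv en.csv
```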

Specify the input and output field separator: -t

Cut

cut extracts selected columns or fields from each line, for example to paste them into other files.

cut -b list [-n] [file] or cut -c list [file] or cut -f list [-d delim] [file]

-b: Select by bytes. Byte positions ignore multibyte character boundaries unless the -n flag is also specified.
-c: Select by characters.
-d: Set the field delimiter; the default is tab.
-f: Select fields; combine with -d when the delimiter is not a tab.
-n: Do not split multibyte characters; used only with -b. If the last byte of a character falls within the range given by the -b list, the character is written out; otherwise it is excluded. (GNU cut accepts -n but ignores it.)

Three ways to select:

First, bytes, with option -b

Second, characters, with option -c

Third, fields, with option -f

Specifying positions

The same list syntax works with -b, -c and -f:

From the start to a position: -4
From a position to the end: 4-
A range: 3-5
Several positions and ranges: 1,3-5,7
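The list forms can be tried on a short string. The sketch assumes the C locale, where -c and -b select the same positions for ASCII text. It reuses one line of the sample file for -f.

```shell
# Sketch: the four list forms with -c, plus field selection with -d/-f.
export LC_ALL=C

line='abcdefgh'
echo "$line" | cut -c -4         # abcd   (start to position 4)
echo "$line" | cut -c 4-         # defgh  (position 4 to end)
echo "$line" | cut -c 3-5        # cde    (range)
echo "$line" | cut -c 1,3-5,7    # acdeg  (list of positions and ranges)

# Fields 1 and 3, with ":" as the delimiter; the output keeps ":".
echo 'alien:hk:119:1982' | cut -d: -f1,3    # alien:119
```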

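Paste

paste [-s] [-d delimiters] file1 file2

-d specifies the delimiter placed between the corresponding lines of file1 and file2 (the default is tab).

-s merges all the lines of each file into a single line, instead of merging the files side by side.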
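A sketch of side-by-side and serial pasting. The files nums.txt and words.txt are hypothetical.

```shell
# Sketch: paste side by side (default), with -d, and serially with -s.
# nums.txt and words.txt are hypothetical example files.
printf '1\n2\n3\n' > nums.txt
printf 'one\ntwo\nthree\n' > words.txt

paste nums.txt words.txt          # 1<TAB>one, 2<TAB>two, 3<TAB>three
paste -d: nums.txt words.txt      # 1:one, 2:two, 3:three
paste -s -d, nums.txt             # 1,2,3
paste -s nums.txt words.txt       # one line per file, tab-separated
```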

Split

Splits a large file into smaller files.

split [-l lines] [file [prefix]]

-l lines: Number of lines in each output file; the default is 1000. Older usage writes the count directly as an option, as in split -500 file.
file: The file to split.
prefix: The prefix of the output file names. The default is x, so the pieces are named xaa, xab, and so on (up to xzz with the default two-letter suffix).
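The sketch below splits a generated 2500-line file into 1000-line pieces with a custom prefix, then checks that concatenating the pieces in name order restores the original. The names big.txt and part_ are hypothetical, and seq is assumed to be available.

```shell
# Sketch: split by line count with a custom prefix, then reassemble.
# big.txt and the part_ prefix are hypothetical names.
seq 1 2500 > big.txt
rm -f part_*

split -l 1000 big.txt part_     # creates part_aa, part_ab, part_ac
ls part_*
wc -l part_*                    # 1000, 1000 and 500 lines

# The suffixes sort alphabetically, so the glob restores the original order.
cat part_* | cmp - big.txt && echo "pieces reassemble to the original"
```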

Tr

tr translates, squeezes, or deletes characters. It processes its standard input as a single stream and writes the result to standard output.

tr [-c] [-d] [-s] ["string1_to_translate_from"] ["string2_to_translate_to"] < input-file

-c replaces the set in string1 with its complement, that is, every character not listed in string1.
-d deletes every input character that appears in string1.
-s squeezes each run of a repeated character into a single occurrence. With one string it applies to the characters of string1; when translating, it applies to the characters of string2.
input-file is the file to convert. tr does not take file names as arguments and reads only standard input, so the file is supplied by redirection (as above) or through a pipe.

Character ranges
When specifying string1 or string2, you can use single characters, ranges, or lists:
A-Z     the uppercase letters A through Z.
a-z     the lowercase letters a through z.
0-9     the digits.
\ooo    a character given by its three-digit octal code (for example \012 for newline).
[c*n]   in string2, the character c repeated n times; so [o*2] stands for oo.
Older System V usage wraps ranges in brackets, such as "[a-z]". GNU and POSIX tr treat the brackets as ordinary characters, which is harmless as long as both strings contain them in the same positions.

Control characters in tr
Shorthand  Key     Meaning          Octal
\a         Ctrl-G  bell             \007
\b         Ctrl-H  backspace        \010
\f         Ctrl-L  form feed        \014
\n         Ctrl-J  newline          \012
\r         Ctrl-M  carriage return  \015
\t         Ctrl-I  tab              \011
\v         Ctrl-K  vertical tab     \013

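Squeeze repeated characters

echo 'ahhjjjkkk;;;' | tr -s 'a-z'

Convert to uppercase

echo 'ahhjjjkkk;;;' | tr 'a-z' 'A-Z'

Replace every character that is not in a set

echo 'ahhjjjkkk;;;' | tr -c 'a-y' '#'

The newline that echo appends is also outside a-y, so it is replaced too; add \n to string1 (tr -c 'a-y\n' '#') to keep it.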
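The sketch below runs the tr examples, plus -d and an escape sequence from the table, with their expected results as comments. It assumes GNU or POSIX tr in the C locale. [:lower:] and [:upper:] are the locale-aware alternative to a-z and A-Z.

```shell
# Sketch: squeeze, translate, complement, delete, and escape sequences in tr.
export LC_ALL=C

s='ahhjjjkkk;;;'
echo "$s" | tr -s 'a-z'                   # ahjk;;;       (runs of letters squeezed)
echo "$s" | tr 'a-z' 'A-Z'                # AHHJJJKKK;;;
echo "$s" | tr '[:lower:]' '[:upper:]'    # same result, using character classes
echo "$s" | tr -c 'a-y\n' '#'             # ahhjjjkkk###  (newline kept)
echo "$s" | tr -d ';'                     # ahhjjjkkk
printf 'a\tb\n' | tr '\t' ' '             # a b           (tab replaced by space)
```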

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
