Shell: quickly sorting and deduplicating text content

Source: Internet
Author: User

On the ChinaUnix forum, the same questions come up almost every day: how to quickly sort the contents of a text file, or how to count how many times a line occurs. Most of these problems can be solved with the simple sort and uniq commands.

First, prepare two text files.

cat file1:

Boys in Company C:HK:192:2192
Alien:HK:119:1982
The Hill:KL:63:2972
Aliens:HK:532:4892
Star Wars:HK:301:4102
A Few Good Men:KL:445:5851
Toy Story:HK:239:3972

cat file2:

Boy:took:bat:home
Boy:took:bat:home
Girl:took:bat:home
Boy:took:bat:home
Boy:took:bat:home
Dog:brought:hat:home
Dog:brought:hat:home
Dog:brought:hat:home



sort

sort treats each line of a file as a unit. It compares lines character by character, starting from the first character, by ASCII value (other locales may use their own collation order), and outputs the lines in ascending order.


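A few common options: -m, -n, -r, -t, -k, -u

-m: merge files (the inputs are assumed to be sorted already)

-n: sort by numeric value

-r: reverse the order

-t: set a custom delimiter for splitting fields

-k: select the field(s) to sort by

-u: deduplicate across the whole input

A field here means a column.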

1. sort -m file1 file2 merges the two files:

Boys in Company C:HK:192:2192
Alien:HK:119:1982
Boy:took:bat:home
Boy:took:bat:home
Girl:took:bat:home
Boy:took:bat:home
Boy:took:bat:home
Dog:brought:hat:home
Dog:brought:hat:home
Dog:brought:hat:home
The Hill:KL:63:2972
Aliens:HK:532:4892
Star Wars:HK:301:4102
A Few Good Men:KL:445:5851
Toy Story:HK:239:3972
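The merged output above is not in sorted order because -m expects inputs that are already sorted. A minimal sketch of that behaviour follows; the file names x_unsorted, x_sorted, and y are made up for illustration, and LC_ALL=C is set only so the ordering is reproducible.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'b\na\n' > "$tmpdir/x_unsorted"
printf 'a\nb\n' > "$tmpdir/x_sorted"
printf 'c\n'    > "$tmpdir/y"

# -m interleaves its inputs by comparing the current line of each file;
# it assumes every input is already sorted and does not sort them itself.
merged_unsorted=$(sort -m "$tmpdir/x_unsorted" "$tmpdir/y")
merged_sorted=$(sort -m "$tmpdir/x_sorted" "$tmpdir/y")

# Unquoted $var inside echo joins the lines with spaces for compact display.
echo "unsorted input: $(echo $merged_unsorted)"
echo "sorted input:   $(echo $merged_sorted)"
rm -rf "$tmpdir"
```

Only the second result is guaranteed to be in order; what -m does with unsorted input is not something to rely on.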

2. The -n option: without it, you will sometimes see 10 sorted before 2, because sort compares character by character starting from the first character. Use -n for numeric comparison.
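The difference is easy to reproduce with three numbers. This is a small sketch, with LC_ALL=C assumed so that the character comparison is byte-wise:

```shell
export LC_ALL=C

# Character comparison: "1" < "10" < "2"
lexical=$(printf '10\n2\n1\n' | sort)
# Numeric comparison: 1 < 2 < 10
numeric=$(printf '10\n2\n1\n' | sort -n)

printf 'without -n:\n%s\n' "$lexical"
printf 'with -n:\n%s\n' "$numeric"
```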

3. The -r option: by default sort puts small values first and large values last. When there are many lines and you only want to know the largest, sort -r | more does the job.
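As a variation on the same idea, -r can be combined with -n and head to pick out just the largest value. The use of head here is a substitution for more, chosen so the sketch runs non-interactively:

```shell
export LC_ALL=C

# Sort numerically, reverse so the largest comes first, keep only the first line.
largest=$(printf '63\n532\n119\n' | sort -n -r | head -n 1)
echo "$largest"   # prints 532
```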

4. The -t option sets the field separator and is usually combined with the -k option. For example, to sort file1 by the value of its third column:

sort -n -t: -k3 file1

The Hill:KL:63:2972
Alien:HK:119:1982
Boys in Company C:HK:192:2192
Toy Story:HK:239:3972
Star Wars:HK:301:4102
A Few Good Men:KL:445:5851
Aliens:HK:532:4892

Without the -n option:

sort -t: -k3 file1

Alien:HK:119:1982
Boys in Company C:HK:192:2192
Toy Story:HK:239:3972
Star Wars:HK:301:4102
A Few Good Men:KL:445:5851
Aliens:HK:532:4892
The Hill:KL:63:2972
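The two orderings can be compared side by side on a shortened copy of file1. This sketch uses -k3,3 rather than -k3, which restricts the key to the third field alone (plain -k3 runs from field 3 to the end of the line); that choice and the temporary directory are assumptions of the example, not part of the original commands.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
Boys in Company C:HK:192:2192
Alien:HK:119:1982
The Hill:KL:63:2972
Aliens:HK:532:4892
EOF

# Sort on field 3 only, numerically and then as text; print just that field.
by_number=$(sort -t: -k3,3n "$tmpdir/file1" | cut -d: -f3)
by_text=$(sort -t: -k3,3 "$tmpdir/file1" | cut -d: -f3)

echo "numeric order of field 3: $(echo $by_number)"   # 63 119 192 532
echo "text order of field 3:    $(echo $by_text)"     # 119 192 532 63
rm -rf "$tmpdir"
```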

A hidden pitfall: sort writes to standard output. If you want to write the result back to the source file, plain redirection ends in tragedy.

Run sort file1 > file1; cat file1 and you get an empty file, because the shell truncates file1 before sort gets to read it. This is where the -o option comes in:

sort file1 -o file1
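Both behaviours can be checked safely on a throwaway file. In this sketch the file name demo and the temporary directory are placeholders chosen for the example:

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'b\nc\na\n' > "$tmpdir/demo"

# sort reads all of its input before writing the -o file,
# so the output file may be the same as the input file.
sort -o "$tmpdir/demo" "$tmpdir/demo"
sorted=$(cat "$tmpdir/demo")
echo "$sorted"

# With redirection, the shell empties the file before sort starts reading it.
sort "$tmpdir/demo" > "$tmpdir/demo"
if [ -s "$tmpdir/demo" ]; then
  redirect_result="non-empty"
else
  redirect_result="empty"
fi
echo "after 'sort demo > demo' the file is $redirect_result"
rm -rf "$tmpdir"
```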


uniq

This command reads the input file and compares adjacent lines. By default, the second and any further copies of a repeated line are dropped, so only the first line of each run is kept. Lines are compared according to the collating sequence of the character set in use. The result is written to the output file, which must be different from the input file. If the input file is given as "-", uniq reads from standard input.

The difference from sort -u is that sort -u deduplicates across the whole input, while uniq only compares adjacent lines and drops the second and later copies within each run.
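A four-line input is enough to show the difference. The sketch below feeds the same unsorted lines to both commands (LC_ALL=C is assumed for a reproducible order):

```shell
export LC_ALL=C
input=$(printf 'Boy\nGirl\nBoy\nBoy\n')

# uniq only collapses the adjacent pair at the end; the first Boy survives too.
adjacent_only=$(printf '%s\n' "$input" | uniq)
# sort -u sorts first, so every duplicate is removed.
whole_input=$(printf '%s\n' "$input" | sort -u)

echo "uniq:    $(echo $adjacent_only)"   # Boy Girl Boy
echo "sort -u: $(echo $whole_input)"     # Boy Girl
```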

Syntax: uniq [-cdu] [-f <fields>] [-s <chars>] [-w <chars>] [--help] [--version] [input file [output file]]

Note: uniq can detect repeated lines in a text file.

Parameters:
-c or --count: prefix each line with the number of times it occurs.
-d or --repeated: show only lines that are repeated, one per group.
-f <fields> or --skip-fields=<fields>: skip the given number of leading fields when comparing.
-s <chars> or --skip-chars=<chars>: skip the given number of leading characters when comparing.
-u or --unique: show only lines that are not repeated.
-w <chars> or --check-chars=<chars>: compare at most the given number of characters per line.
--help: display help.
--version: display version information.
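A short sketch of three of these options on already-sorted toy input (the sample lines are invented for the example):

```shell
export LC_ALL=C
sorted_lines=$(printf 'Boy\nBoy\nDog\nGirl\nGirl\nGirl\n')

# -d keeps one copy of each line that occurs more than once in a row.
repeated=$(printf '%s\n' "$sorted_lines" | uniq -d)
# -u keeps only lines that occur exactly once in a row.
singles=$(printf '%s\n' "$sorted_lines" | uniq -u)
# -f 1 skips the first field, so "1 a" and "2 a" count as duplicates.
skip_first=$(printf '1 a\n2 a\n3 b\n' | uniq -f 1)

echo "repeated:   $(echo $repeated)"   # Boy Girl
echo "singles:    $(echo $singles)"    # Dog
printf 'skip field 1:\n%s\n' "$skip_first"
```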

The most commonly used option is -c, which shows how many times each line occurs.

uniq -c file2

2 Boy:took:bat:home
1 Girl:took:bat:home
2 Boy:took:bat:home
3 Dog:brought:hat:home

However, because uniq only compares adjacent lines, the input should be sorted first:

sort file2 | uniq -c

4 Boy:took:bat:home
3 Dog:brought:hat:home
1 Girl:took:bat:home

This gives the result we want.
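A common follow-up is to rank the lines by how often they occur. The sketch below adds a second sort -rn on the counts; the final awk step is only there to normalise the leading spaces that uniq -c prints, whose width differs between implementations.

```shell
export LC_ALL=C

# Same lines as file2, supplied inline so the example is self-contained.
ranked=$(printf '%s\n' \
    'Boy:took:bat:home' 'Boy:took:bat:home' 'Girl:took:bat:home' \
    'Boy:took:bat:home' 'Boy:took:bat:home' 'Dog:brought:hat:home' \
    'Dog:brought:hat:home' 'Dog:brought:hat:home' \
  | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "$ranked"
```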

Finally, combine the two files and format the result:

sort -m file1 file2 | sort | uniq -c | sed 's/ //g' | awk -F ':' '{printf("%-10s%10s%10s%10s\n", $1, $2, $3, $4)}'

1AFewGoodMen        KL       445      5851
1Alien            HK       119      1982
1Aliens           HK       532      4892
1BoysinCompanyC        HK       192      2192
4Boy            took       bat      home
3Dog         brought       hat      home
1Girl           took       bat      home
1StarWars         HK       301      4102
1TheHill          KL        63      2972
1ToyStory         HK       239      3972

The result is still rather frustrating:

1. sort -m, surprisingly, only merges; it does not sort.

2. The printf alignment looks odd.
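One possible way around both complaints is sketched below. It is not the original author's command: it passes both files to a single sort instead of using -m, turns the count from uniq -c into its own colon-separated field rather than stripping every space, and left-aligns each column with widths picked arbitrarily for this sample. The shortened file contents and the temporary directory are also assumptions of the example.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
The Hill:KL:63:2972
Alien:HK:119:1982
EOF
cat > "$tmpdir/file2" <<'EOF'
Boy:took:bat:home
Girl:took:bat:home
Boy:took:bat:home
EOF

# 1. sort both files together (no -m needed)
# 2. uniq -c counts adjacent duplicates
# 3. sed rewrites "   2 Boy:..." as "2:Boy:..." so the count is field 1
# 4. awk prints left-aligned columns wide enough for the titles
report=$(sort "$tmpdir/file1" "$tmpdir/file2" | uniq -c \
  | sed 's/^ *\([0-9][0-9]*\) /\1:/' \
  | awk -F: '{printf("%-3s%-20s%-10s%-6s%-6s\n", $1, $2, $3, $4, $5)}')

echo "$report"
rm -rf "$tmpdir"
```

Titles keep their internal spaces this way, and the first column no longer overflows into the second.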

This article is from the "Hiubuntu" blog; please keep this source link: http://qujunorz.blog.51cto.com/6378776/1564003
