Shell Learning, Chapter 5: Sorting, Merging, and Splitting Files


sort command

sort [option] [input file]

Options:

-c: check whether the file is already sorted. If it is not, report the first out-of-order record.

-k: specify the sort field (key).

-m: merge two already-sorted files; the merged output is also sorted. For example, sort -m a1 a2 inserts the records of a1 into a2 in order.

-n: sort by numeric value. It is usually placed after the field number, as in -k3n.

-o: write the output to the specified file.

-r: reverse the sort order.

-t: change the field delimiter, as in -t:.

-u: remove duplicate lines from the result.
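The options above are usually combined. The following is a minimal sketch, assuming a hypothetical colon-delimited file scores.txt (name:team:score) created in a scratch directory; the file names and data are illustrative only and do not come from the original article.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C   # byte-wise collation keeps the results predictable

# Hypothetical sample data: name:team:score
printf 'carol:ops:7\nalice:dev:12\nbob:dev:3\n' > scores.txt

# -t: sets the delimiter; -k3,3n sorts numerically on field 3 only.
by_score=$(sort -t: -k3,3n scores.txt)
printf '%s\n' "$by_score"

# The r modifier reverses that key; -o writes the result to a file.
sort -t: -k3,3nr -o scores.desc scores.txt
top=$(head -n 1 scores.desc)
echo "highest: $top"

# -c only checks the order and exits non-zero if the input is unsorted.
if sort -c -t: -k3,3n scores.txt 2>/dev/null; then
    sorted_check=sorted
else
    sorted_check=unsorted
fi
echo "scores.txt is $sorted_check"

# -m merges inputs that are already sorted; -u drops duplicate lines.
printf 'a\nc\n' > left.txt
printf 'b\nc\n' > right.txt
merged=$(sort -m -u left.txt right.txt | tr '\n' ' ')
echo "merged: $merged"
```

Attaching n and r to the key (-k3,3nr) rather than passing a separate -r avoids a common surprise: ordering options written on a key override the global ones for that key.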

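Combining sort and awk

Example: test1.txt holds three-line records separated by blank lines. To sort whole records instead of individual lines, awk first joins each record into a single line (replacing the inner newlines with @), sort orders those lines, and a second awk turns each @ back into a newline and separates the records with a blank line again.

# cat test1.txt
b liu
dfad
dfw,sfa

a clc
wers
sdfa,werw

f kkk
ckaf
fdwae,fwefs

e ccc
werw
sfdf,cdfae

# cat test1.txt | awk -v RS="" '{gsub("\n","@"); print $0}' | sort | awk 'BEGIN{ORS="\n\n"} {gsub("@","\n"); print $0}'
a clc
wers
sdfa,werw

b liu
dfad
dfw,sfa

e ccc
werw
sfdf,cdfae

f kkk
ckaf
fdwae,fwefs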

uniq command

Removes adjacent duplicate lines from text. Duplicate lines that are not adjacent are not removed (this is the difference from sort -u).

Options:

-c: prefix each line with the number of times it occurs.

-d: display only duplicated lines; each duplicated line is displayed once.

-u: display only lines that are not duplicated.
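Because uniq compares only neighbouring lines, it is normally fed sorted input. The sketch below uses made-up input (three apple lines and one pear line, not taken from the article) to show the difference.

```shell
# Hypothetical input in which "apple" repeats, but not always on adjacent lines.
input=$(printf 'apple\napple\npear\napple\n')

# uniq alone collapses only adjacent repeats, so "apple" still appears twice.
adjacent_only=$(printf '%s\n' "$input" | uniq | tr '\n' ' ')
echo "uniq only:      $adjacent_only"

# Sorting first makes every duplicate adjacent; -c then yields full counts.
counts=$(printf '%s\n' "$input" | sort | uniq -c | awk '{print $1 ":" $2}' | tr '\n' ' ')
echo "sort | uniq -c: $counts"

# -d keeps only repeated lines; -u keeps only lines that occur once.
repeated=$(printf '%s\n' "$input" | sort | uniq -d)
single=$(printf '%s\n' "$input" | sort | uniq -u)
echo "repeated: $repeated, single: $single"
```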

Example: counting word frequency

# cat test2.txt
thank you all the same, but no thank you, you are same with him.
did you right.

# cat a4.sh
#!/bin/sh
ARGC=1
E_BADARG=55
E_NOFILE=56

if [ $# -ne $ARGC ]    # wrong number of arguments
then
    echo "Arg error"
    exit $E_BADARG
fi

if [ ! -f "$1" ]    # file not found
then
    echo "File not found"
    exit $E_NOFILE
fi

# Strip periods and commas, put one word per line (\n in the replacement
# requires GNU sed), drop empty lines, count each word, most frequent first.
sed -e 's/\.//g' -e 's/,//g' -e 's/ /\n/g' "$1" | sed '/^$/d' | sort | uniq -c | sort -rn

exit 0

# ./a4.sh test2.txt
      4 you
      2 thank
      2 same
      1 with
      1 the
      1 right
      1 no
      1 him
      1 did
      1 but
      1 are
      1 all

join command

Joins the records of two files: it selects the records whose join field matches in both files and puts all fields of the two records on one line.

Note: join can only join two files that are sorted on the join field.

Options:

-a1 or -a2: in addition to the joined lines, also display the lines of the first or second file, respectively, that have no match in the other file.

-i: ignore case when comparing join fields.

-o: set the format of the output.

-t: change the field separator.

-v1 or -v2: display only the lines of the first or second file that have no match in the other file, without the joined lines.

-1 and -2: set the join field for file 1 and file 2, respectively.
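The effect of -a, -v and -o is easiest to see side by side. The following sketch assumes two small hypothetical files, left.txt and right.txt, both sorted on their first field; the names and contents are illustrative only.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C   # join expects the same collation that sorted the inputs

# Hypothetical inputs, both sorted on field 1 (the default join field).
printf 'k1:red\nk2:green\nk4:blue\n' > left.txt
printf 'k1:10\nk3:30\nk4:40\n' > right.txt

# Plain join: only keys present in both files.
inner=$(join -t: left.txt right.txt | tr '\n' ' ')
echo "inner:      $inner"

# -a1 additionally keeps the unmatched lines of the first file.
left_outer=$(join -t: -a1 left.txt right.txt | tr '\n' ' ')
echo "with -a1:   $left_outer"

# -v2 prints only the lines of the second file that found no partner.
right_only=$(join -t: -v2 left.txt right.txt)
echo "only right: $right_only"

# -o selects and orders output fields, written as FILE.FIELD.
picked=$(join -t: -o 2.2,1.2 left.txt right.txt | tr '\n' ' ')
echo "picked:     $picked"
```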

Examples:

1.

# cat test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e

# cat test4.txt
clc1:aaa:a
clc2:bbb:b
clc3:ccc:c
clc5:eee:e

# join -t: test3.txt test4.txt
clc1:111:a:aaa:a
clc2:222:b:bbb:b
clc5:555:e:eee:e

By default, only records whose join field appears in both files are joined and displayed.

2.

# join -t: -a1 test3.txt test4.txt
clc1:111:a:aaa:a
clc2:222:b:bbb:b
clc4:444:d
clc5:555:e:eee:e

# join -t: -v1 test3.txt test4.txt
clc4:444:d

clc4:444:d is unique to test3.txt.

3.

# join -t: -1 3 -2 3 test3.txt test4.txt
a:clc1:111:clc1:aaa
b:clc2:222:clc2:bbb
e:clc5:555:clc5:eee

This uses the 3rd field of the first file and the 3rd field of the second file as the join field. By default, the join field is the first field of each file. Note that the join field is printed first.

4.

# join -t: -1 3 -2 3 -o 1.1 1.3 2.2 1.2 test3.txt test4.txt
clc1:a:aaa:111
clc2:b:bbb:222
clc5:e:eee:555

-o adjusts which fields are displayed and in what order. 1.1 is the first field of file 1.

cut command

Extracts text by character or by field from standard input or text files.

Options:

-c: extract by character.

-f: extract by field.

-d: define the field separator; equivalent to -t in sort and join.
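As a quick illustration of the three options, the sketch below runs cut on a single record of the same shape as the test3.txt lines used in this chapter; the variable names are made up for the example.

```shell
# One record in the same name:number:letter shape as test3.txt.
line='clc1:111:a'

# -c picks characters by position: a list (1,4) or a range (1-3).
chars_1_and_4=$(printf '%s\n' "$line" | cut -c1,4)
chars_1_to_3=$(printf '%s\n' "$line" | cut -c1-3)

# -d sets the delimiter and -f picks fields; the delimiter is kept between them.
field_2=$(printf '%s\n' "$line" | cut -d: -f2)
fields_1_and_3=$(printf '%s\n' "$line" | cut -d: -f1,3)

echo "$chars_1_and_4 $chars_1_to_3 $field_2 $fields_1_and_3"
```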

Example:

# cat test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e

# cut -c1,4 test3.txt
c1
c2
c4
c5

# cut -d: -f2-3 test3.txt
111:a
222:b
444:d
555:e

paste command

Pastes together lines from text files or from standard input.

paste [option] file1 file2

Options:

-d: set the output field separator. The default is a tab.

-s: paste each file into a single line.

Format with -s: file 1 record 1, delimiter, file 1 record 2, ..., newline, file 2 record 1, delimiter, file 2 record 2, ...

Default format: file 1 record 1, delimiter, file 2 record 1, newline, file 1 record 2, delimiter, file 2 record 2, ...

-: read data from standard input.
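The difference between the default and the -s layout, and the use of - for standard input, can be sketched as follows. The files nums.txt and letters.txt are hypothetical and exist only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs of different lengths.
printf '1\n2\n3\n' > nums.txt
printf 'a\nb\n' > letters.txt

# Default layout: line N of every file side by side; missing lines are empty.
side_by_side=$(paste -d: nums.txt letters.txt | tr '\n' ' ')
echo "parallel: $side_by_side"

# -s layout: each file becomes one output line.
serial=$(paste -s -d, nums.txt letters.txt | tr '\n' ' ')
echo "serial:   $serial"

# Each "-" consumes one line of standard input, so two of them
# fold the input into two columns.
pairs=$(printf '1\n2\n3\n4\n' | paste -d, - - | tr '\n' ' ')
echo "pairs:    $pairs"
```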

Examples:

# cat test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e
clc6:666:f

# cat test4.txt
clc1:aaa:a
clc2:bbb:b
clc3:ccc:c
clc5:eee:e

# paste test3.txt test4.txt
clc1:111:a	clc1:aaa:a
clc2:222:b	clc2:bbb:b
clc4:444:d	clc3:ccc:c
clc5:555:e	clc5:eee:e
clc6:666:f

# paste -s test3.txt test4.txt
clc1:111:a	clc2:222:b	clc4:444:d	clc5:555:e	clc6:666:f
clc1:aaa:a	clc2:bbb:b	clc3:ccc:c	clc5:eee:e

# paste -d@ test3.txt test4.txt
clc1:111:a@clc1:aaa:a
clc2:222:b@clc2:bbb:b
clc4:444:d@clc3:ccc:c
clc5:555:e@clc5:eee:e
clc6:666:f@

# paste -s -d@ test3.txt test4.txt
clc1:111:a@clc2:222:b@clc4:444:d@clc5:555:e@clc6:666:f
clc1:aaa:a@clc2:bbb:b@clc3:ccc:c@clc5:eee:e

# ls | paste -d: - - - -    # use : as the separator; four file names are displayed on each line
1c:a:a1:a1~
a1.awk:a2.awk:a3.awk:a4.awk
a4.sh:aa:aabc:aac
a.awk:a.sh:b:b1

split command

Splits a large file into multiple small files.

split [option] large-file-to-split output-file-prefix

Options:

-N or -l N: the two forms are equivalent; N is the number of lines of the large file written to each small file.

-b: specify the number of bytes of the large file written to each small file.

-C: similar to -b, but keeps each line intact as far as possible.
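The three splitting modes can be compared on one small file. This is a sketch under stated assumptions: data.txt and the output prefixes byline., bybyte. and whole. are made-up names, and -C is assumed to be available as in GNU split.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

# Five 11-byte lines (55 bytes), the same shape as test3.txt.
printf 'clc1:111:a\nclc2:222:b\nclc4:444:d\nclc5:555:e\nclc6:666:f\n' > data.txt

# By line count: 2 + 2 + 1 lines, suffixes aa, ab, ac.
split -l 2 data.txt byline.
line_pieces=$(ls byline.* | tr '\n' ' ')
echo "by line: $line_pieces"

# By byte count: pieces of exactly 20 bytes, lines may be cut in the middle.
split -b 20 data.txt bybyte.
first_piece_size=$(wc -c < bybyte.aa | tr -d ' ')
echo "first byte piece: $first_piece_size bytes"

# -C 20 (assumed GNU split): at most 20 bytes per piece, whole lines only,
# so each 11-byte line lands in its own file.
split -C 20 data.txt whole.
whole_count=$(ls whole.* | wc -l | tr -d ' ')
echo "whole-line pieces: $whole_count"

# Concatenating the pieces in suffix order restores the original file.
if cat bybyte.* | cmp -s - data.txt; then restored=yes; else restored=no; fi
echo "restored: $restored"
```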

Examples:

1. Split a file by line

# cat test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e
clc6:666:f

# split -2 test3.txt clc.txt

# ls clc*
clc.txtaa  clc.txtab  clc.txtac

# cat clc.txtaa
clc1:111:a
clc2:222:b

# cat clc.txtab
clc4:444:d
clc5:555:e

# cat clc.txtac
clc6:666:f

2. Split a file by byte

# ll test3.txt
-rw-r--r-- 1 root root 55 Dec 15 18:20 test3.txt

# split -b 20 test3.txt clc.db

# ll clc.db*
-rw-r--r-- 1 root root 20 Dec 15 18:44 clc.dbaa
-rw-r--r-- 1 root root 20 Dec 15 18:44 clc.dbab
-rw-r--r-- 1 root root 15 Dec 15 18:44 clc.dbac

# cat clc.dbaa
clc1:111:a
clc2:222:# cat clc.dbab
b
clc4:444:d
clc5:55# cat clc.dbac
5:e
clc6:666:f

3. Split a file by byte, but keep lines intact as far as possible

# split -C 20 test3.txt clc.db

# ll clc.db*
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbaa
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbab
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbac
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbad
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbae

# cat clc.dbaa
clc1:111:a

...

tr command

Character translation; the same work can also be done with sed.

tr reads only from standard input, so either redirect a file to standard input or use a pipe.

tr [option] string1 string2 < input-file

Options:

-c: use the complement of string 1, that is, all characters not in string 1.

-d: delete from standard input all characters that appear in string 1.

-s: squeeze each run of a repeated character from string 1 in the input down to a single character.
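The following sketch applies plain translation and each option to one sample record, fed through a pipe because tr takes no file argument. The variable names are illustrative only.

```shell
# One record of the same shape as test3.txt.
record='clc1:111:a'

# Two strings: map each character of string 1 to the one at the same position in string 2.
upper=$(printf '%s\n' "$record" | tr 'a-z' 'A-Z')

# -d deletes every character listed in string 1.
no_digits=$(printf '%s\n' "$record" | tr -d '0-9')

# -s squeezes runs of a listed character down to a single occurrence.
squeezed=$(printf '%s\n' "$record" | tr -s '0-9')

# -c complements string 1, so -cd deletes everything that is NOT a digit
# (including the trailing newline).
only_digits=$(printf '%s\n' "$record" | tr -cd '0-9')

echo "$upper $no_digits $squeezed $only_digits"
```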

Examples:

# cat test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e
clc6:666:f

1. Delete the digits 0-9

# tr -d 0-9 < test3.txt
clc::a
clc::b
clc::d
clc::e
clc::f

2. Squeeze repeated digits in the standard input (test3.txt), keeping one of each run

# tr -s 0-9 < test3.txt
clc1:1:a
clc2:2:b
clc4:4:d
clc5:5:e
clc6:6:f

3. Squeeze repeated characters other than digits, keeping one of each run (in this example, the character concerned is \n)

# tr -sc 0-9 < test3.txt
clc1:111:a
clc2:222:b
clc4:444:d
clc5:555:e
clc6:666:f

tar command

Archiving, compression, and extraction

There are two kinds of packages: tar format and gzip format. The gzip format is a tar archive that has been further compressed.

tar [option] file or directory

Options:

-c: create a new archive.

-r: append new files to the archive.

-t: list the archive contents.

-u: update files in the archive; a file that is not yet in the archive is added. It can be used in place of -r.

-x: extract files.

-f: specify the archive file or device to use; required when working with an archive file.

-v: display information about the files tar processes.

-z: compress or decompress with gzip. If -z was used when creating the archive (-c), -z is also given when extracting it (-x), although recent GNU tar versions detect gzip compression automatically when reading an archive. In essence, -z is equivalent to running tar -cf and then gzip.

You cannot add files (-r or -u) directly to a gzip package. First convert it back to a tar archive (gzip -d), add the files (tar -rf), and then compress it into gzip format again (gzip).
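The decompress, append, recompress cycle described above can be sketched in a scratch directory. The file names a.txt, b.txt and all.tar.gz are placeholders chosen for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small hypothetical files to archive.
printf 'one\n' > a.txt
printf 'two\n' > b.txt

# Create a gzip-compressed archive holding only a.txt.
tar -zcf all.tar.gz a.txt

# Appending requires a plain tar archive: decompress, append, recompress.
gzip -d all.tar.gz            # leaves all.tar
tar -rf all.tar b.txt         # append b.txt
listing=$(tar -tf all.tar | tr '\n' ' ')
echo "archive now holds: $listing"
gzip all.tar                  # back to all.tar.gz

# Archiving does not delete the originals; remove them, then extract to verify.
rm -f a.txt b.txt
tar -zxf all.tar.gz
restored=$(cat a.txt b.txt | tr '\n' ' ')
echo "restored contents: $restored"
```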

Example:

# ls clc* test*.txt
clc.dbaa  clc.dbac  clc.dbae  test2.txt  test4.txt
clc.dbab  clc.dbad  test1.txt  test3.txt

# tar -zcf all.tar.gz clc*    # create an archive and compress it directly into gzip format

# tar -tf all.tar.gz    # list the contents of the compressed archive
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae

# gzip -d all.tar.gz    # decompress the gzip package into a tar archive

# ls all*
all.tar

# tar -rf all.tar test*.txt    # append files to the tar archive; note that you cannot append directly to a gzip package

# tar -tf all.tar
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae
test1.txt
test2.txt
test3.txt
test4.txt

# gzip all.tar    # compress the tar archive into gzip format

# ls all*
all.tar.gz

# ls clc* test*.txt    # the archived files are not removed
clc.dbaa  clc.dbac  clc.dbae  test2.txt  test4.txt
clc.dbab  clc.dbad  test1.txt  test3.txt

# rm -f clc* test*.txt

# tar -zxvf all.tar.gz    # extract the gzip package; for a plain tar archive the z option is not needed
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae
test1.txt
test2.txt
test3.txt
test4.txt

# ls clc* test*.txt
clc.dbaa  clc.dbac  clc.dbae  test2.txt  test4.txt
clc.dbab  clc.dbad  test1.txt  test3.txt


This article is from the "flyclc" blog, please be sure to keep this source http://flyclc.blog.51cto.com/1385758/1540164
