Chapter 5 Shell Study: Sorting, Merging and Splitting Files

Last updated: 2014-08-15 Source: Internet

Tags: shell, file sorting, merging, splitting

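sort command

sort [options] [input file]

Options:

-c Check whether the file is already sorted; if it is not, report the first out-of-order record

-k Specify the sort key (field)

-m Merge files that are already sorted; the merged output is sorted as well. For example, sort -m a1 a2 merges the records of a1 and a2 in order (neither file is modified; the result goes to standard output)

-n Sort by numeric value; usually appended to the field number, e.g. -k3n

-o Write the output to the given file

-r Reverse the sort order

-t Change the field separator, e.g. -t:

-u Remove duplicate lines from the result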
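The options above can be tried together on a small file. The following is a minimal sketch, assuming a POSIX shell with mktemp; the file contents and the names data.txt, a1 and a2 are made up for illustration, and LC_ALL=C is set only so that the order does not depend on the locale.

```shell
#!/bin/sh
# Sketch: the sort options above applied to a small, made-up colon-separated file.
set -e
export LC_ALL=C                  # byte-wise collation, so the order is reproducible
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'clc3:30:C\nclc1:100:A\nclc2:9:B\nclc1:100:A\n' > "$tmp/data.txt"

# -t: splits fields on ':'; -k2,2n sorts numerically on field 2 only
sort -t: -k2,2n "$tmp/data.txt"              # keys in order 9, 30, 100, 100

# r reverses the key; with a key, -u keeps one line per distinct key value
sort -t: -k2,2nr -u "$tmp/data.txt"          # keys in order 100, 30, 9

# -c reports the first out-of-order line and exits non-zero
if sort -c "$tmp/data.txt" 2>/dev/null; then echo "sorted"; else echo "not sorted"; fi

# -o may name one of the input files, so a file can be sorted in place
sort -o "$tmp/data.txt" "$tmp/data.txt"

# -m merges inputs that are already sorted, without sorting them again
printf 'a\nc\n' > "$tmp/a1"
printf 'b\nd\n' > "$tmp/a2"
sort -m "$tmp/a1" "$tmp/a2"                  # a, b, c, d
```

Writing the key as -k2,2n rather than -k2n limits the key to field 2 alone; without the ",2" the key runs to the end of the line.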

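Combining sort with awk

Example: sort records that span several lines (blocks separated by a blank line) as whole units

# cat test1.txt
B liu
dfad
dfw,sfa

A clc
wers
sdfa,werw

F kkk
ckaf
fdwae,fwefs

E ccc
werw
sfdf,cdfae
# cat test1.txt | awk -v RS="\n\n" '{gsub("\n","@");print $0}' | sort | awk 'BEGIN {ORS="\n\n"} {gsub("@","\n");print $0}'
A clc
wers
sdfa,werw

B liu
dfad
dfw,sfa

E ccc
werw
sfdf,cdfae

F kkk
ckaf
fdwae,fwefs

The first awk treats each blank-line-separated block as one record (RS="\n\n"; a multi-character RS is supported by gawk but not by every awk) and replaces the newlines inside it with @, so each record becomes a single line. sort then orders these lines, and the second awk turns each @ back into a newline and ends every record with a blank line (ORS="\n\n").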
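A more portable variant of the same trick is sketched below. It swaps RS="\n\n" for RS="", which is POSIX awk's paragraph mode (records separated by blank lines), so it should not depend on multi-character RS support. The function name sort_blocks and the sample data are hypothetical, and it assumes the records never contain the placeholder character @.

```shell
#!/bin/sh
# Sketch: sort blank-line-separated, multi-line records as whole units.
# Assumes the records never contain the placeholder character '@'.
export LC_ALL=C

# RS="" is awk's POSIX "paragraph mode": records are separated by blank lines,
# so no multi-character RS support is needed.
sort_blocks() {
    awk 'BEGIN { RS = "" } { gsub(/\n/, "@"); print }' "$@" |
        sort |
        awk 'BEGIN { ORS = "\n\n" } { gsub(/@/, "\n"); print }'
}

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'B liu\ndfad\ndfw,sfa\n\nA clc\nwers\nsdfa,werw\n' > "$f"
sort_blocks "$f"
```

If @ can occur in the data, any other character known to be absent (for example a control character) can serve as the placeholder.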

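uniq command

Removes adjacent duplicate lines from text; duplicate lines that are not adjacent are not removed (this is the difference from sort -u)

Options:

-c Prefix each line with the number of times it occurs in a row

-d Print only duplicated lines, one copy of each

-u Print only lines that are not repeated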
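The difference between uniq and sort -u, and the effect of -c, -d and -u, can be seen on a few made-up lines. This is a small sketch; the input words are illustrative only.

```shell
#!/bin/sh
# Sketch contrasting uniq with sort -u, using made-up input.
export LC_ALL=C
input='apple
apple
pear
apple
plum
plum'

printf '%s\n' "$input" | uniq             # apple, pear, apple, plum: only adjacent repeats collapse
printf '%s\n' "$input" | sort -u          # apple, pear, plum: sorting makes every repeat adjacent
printf '%s\n' "$input" | sort | uniq -c   # counts: 3 apple, 1 pear, 2 plum
printf '%s\n' "$input" | sort | uniq -d   # apple, plum: lines that occur more than once
printf '%s\n' "$input" | sort | uniq -u   # pear: lines that occur exactly once
```

This is why uniq is almost always preceded by sort when duplicates anywhere in the file should count, as in the word-count script that follows.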

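Example: count how many times each word occurs

# cat test2.txt
thank you all the same,but no thank you,you are same with him.
did you right.
# cat a4.sh
#!/bin/sh

argc=1
e_badarg=55
e_nofile=56

if [ $# -ne $argc ]     # wrong number of arguments
then
    echo "arg error" >&2
    exit $e_badarg
fi

if [ ! -f "$1" ]        # file not found
then
    echo "file not found" >&2
    exit $e_nofile
fi

sed -e 's/\./ /g' -e 's/,/ /g' -e 's/ /\n/g' "$1" | sed '/^$/d' | sort | uniq -c | sort -rn

exit 0
# ./a4.sh test2.txt
4 you
2 thank
2 same
1 with
1 the
1 right
1 no
1 him
1 did
1 but
1 are
1 all

The sed commands turn periods, commas and spaces into line breaks (\n in the replacement is a GNU sed extension) and delete the resulting empty lines, leaving one word per line. sort brings identical words together so that uniq -c can count them, and sort -rn lists the most frequent words first.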

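join command

Joins the records of two files: it selects the lines whose join field has the same value in both files and puts all fields of the two matching records on one line

Note: join only works on two files that are both sorted on the join field

Options:

-a1 or -a2 In addition to the joined lines, also print the lines of file 1 or file 2 whose join field has no match in the other file

-i Ignore case when comparing join fields

-o Set the output format

-t Change the field separator

-v1 or -v2 Print only the unmatched lines of file 1 or file 2, without the joined lines

-1 and -2 Set the join field of file 1 and file 2 respectively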

Example:

# cat test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
# cat test4.txt
clc1:aaa:A
clc2:bbb:B
clc3:ccc:C
clc5:eee:E
# join -t: test3.txt test4.txt
clc1:111:A:aaa:A
clc2:222:B:bbb:B
clc5:555:E:eee:E

By default, only the lines whose join field appears in both files are shown.

# join -t: -a1 test3.txt test4.txt
clc1:111:A:aaa:A
clc2:222:B:bbb:B
clc4:444:D
clc5:555:E:eee:E
# join -t: -v1 test3.txt test4.txt
clc4:444:D

clc4:444:D exists only in test3.txt.

# join -t: -1 3 -2 3 test3.txt test4.txt
A:clc1:111:clc1:aaa
B:clc2:222:clc2:bbb
E:clc5:555:clc5:eee

This joins on field 3 of the first file and field 3 of the second file (both default to field 1). Note that the join field is moved to the front of each output line.

# join -t: -1 3 -2 3 -o 1.1 1.3 2.2 1.2 test3.txt test4.txt
clc1:A:aaa:111
clc2:B:bbb:222
clc5:E:eee:555

-o rearranges the output fields: 1.1 is field 1 of file 1, so it is printed first, followed by field 3 of file 1, field 2 of file 2 and field 2 of file 1. (The list can also be written comma-separated, as -o 1.1,1.3,2.2,1.2.)
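Because join needs both inputs sorted on the join field, real data usually has to be sorted first. The sketch below assumes two hypothetical files, users.txt and roles.txt, keyed on field 1; it sorts both with the same separator, key and locale before joining, and also shows -e, which fills in fields that are missing when -o is used together with -a.

```shell
#!/bin/sh
# Sketch: sort both inputs on the join field, then join them.
# File names and data are hypothetical.
set -e
export LC_ALL=C    # sort and join must agree on collation order
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'u3:carol\nu1:alice\nu2:bob\n' > "$tmp/users.txt"    # id:name (unsorted)
printf 'u2:admin\nu3:staff\nu9:guest\n' > "$tmp/roles.txt"  # id:role (unsorted)

sort -t: -k1,1 -o "$tmp/users.txt" "$tmp/users.txt"
sort -t: -k1,1 -o "$tmp/roles.txt" "$tmp/roles.txt"

# Inner join on field 1: u2:bob:admin and u3:carol:staff
join -t: "$tmp/users.txt" "$tmp/roles.txt"

# -a1 also keeps u1, which has no role; -e NONE fills the missing 2.2 field
join -t: -a1 -e NONE -o 0,1.2,2.2 "$tmp/users.txt" "$tmp/roles.txt"
```

In the -o list, 0 stands for the join field itself, which keeps the output layout fixed even for lines that exist in only one file.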

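cut command

Extracts text from standard input or a text file by character position or by field

Options:

-c Extract by character position

-f Extract by field

-d Set the field delimiter (the equivalent of -t in sort and join)

Example:

# cat test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
# cut -c1,4 test3.txt
c1
c2
c4
c5
# cut -d: -f2-3 test3.txt
111:A
222:B
444:D
555:E

-c1,4 selects characters 1 and 4 only; a range such as characters 1 through 4 is written -c1-4.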
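The list and range forms of -c and -f can be compared on a single line. A small sketch; the sample record simply reuses the format of test3.txt.

```shell
#!/bin/sh
# Sketch of cut's selection forms on one record in the format of test3.txt.
line='clc1:111:A'

printf '%s\n' "$line" | cut -c1,4        # characters 1 and 4 only: c1
printf '%s\n' "$line" | cut -c1-4        # characters 1 through 4: clc1
printf '%s\n' "$line" | cut -d: -f1,3    # fields 1 and 3, rejoined with ':': clc1:A
printf '%s\n' "$line" | cut -d: -f2-     # field 2 to the end of the line: 111:A
```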

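paste command

Pastes lines from text files or from standard input side by side

paste [options] file1 file2

Options:

-d Set the output delimiter; the default is a tab

-s Paste each file onto a single line

Format: record 1 of file 1, delimiter, record 2 of file 1, ... newline, record 1 of file 2, delimiter, record 2 of file 2, ...

whereas the default format is: record 1 of file 1, delimiter, record 1 of file 2, newline, record 2 of file 1, delimiter, record 2 of file 2, ...

- Used as a file name, read data from standard input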

Example:

# cat test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
clc6:666:F
# cat test4.txt
clc1:aaa:A
clc2:bbb:B
clc3:ccc:C
clc5:eee:E
# paste test3.txt test4.txt
clc1:111:A clc1:aaa:A
clc2:222:B clc2:bbb:B
clc4:444:D clc3:ccc:C
clc5:555:E clc5:eee:E
clc6:666:F
# paste -s test3.txt test4.txt
clc1:111:A clc2:222:B clc4:444:D clc5:555:E clc6:666:F
clc1:aaa:A clc2:bbb:B clc3:ccc:C clc5:eee:E
# paste -d@ test3.txt test4.txt
clc1:111:A@clc1:aaa:A
clc2:222:B@clc2:bbb:B
clc4:444:D@clc3:ccc:C
clc5:555:E@clc5:eee:E
clc6:666:F@
# paste -s -d@ test3.txt test4.txt
clc1:111:A@clc2:222:B@clc4:444:D@clc5:555:E@clc6:666:F
clc1:aaa:A@clc2:bbb:B@clc3:ccc:C@clc5:eee:E
# ls | paste -d: - - - -    # four file names per line, separated by ':'
1c:a:a1:a1~
a1.awk:a2.awk:a3.awk:a4.awk
a4.sh:aa:aabc:aac
a.awk:a.sh:b:b1
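Each "-" operand reads the next line from standard input, which is what the ls example relies on. The sketch below uses made-up numeric input to show that idiom, -s with a custom delimiter, and a -d list with more than one character (the delimiters are used in turn).

```shell
#!/bin/sh
# Sketch: common paste idioms on made-up input.
nums='1
2
3
4
5
6'

# Two '-' operands: join every two input lines into one: 1,2 / 3,4 / 5,6
printf '%s\n' "$nums" | paste -d, - -

# -s puts all input lines on a single line: 1+2+3+4+5+6
printf '%s\n' "$nums" | paste -sd+ -

# A -d list is used cyclically: ':' after column 1, ',' after column 2: 1:2,3 / 4:5,6
printf '%s\n' "$nums" | paste -d':,' - - -
```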

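split command

Splits a large file into several smaller files

split [options] input_file output_prefix

The output files are named after the prefix followed by aa, ab, ac, and so on.

Options:

-N or -l N (equivalent) Put N lines of the input into each output file

-b N Put N bytes of the input into each output file

-C N Like -b, but keeps lines whole where possible (at most N bytes of complete lines per output file)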

Example:

1. Split a file by lines

# cat test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
clc6:666:F
# split -2 test3.txt clc.txt
# ls clc*
clc.txtaa clc.txtab clc.txtac
# cat clc.txtaa
clc1:111:A
clc2:222:B
# cat clc.txtab
clc4:444:D
clc5:555:E
# cat clc.txtac
clc6:666:F

2. Split a file by bytes

# ll test3.txt
-rw-r--r-- 1 root root 55 Dec 15 18:20 test3.txt
# split -b 20 test3.txt clc.db
# ll clc.db*
-rw-r--r-- 1 root root 20 Dec 15 18:44 clc.dbaa
-rw-r--r-- 1 root root 20 Dec 15 18:44 clc.dbab
-rw-r--r-- 1 root root 15 Dec 15 18:44 clc.dbac
# cat clc.dbaa
clc1:111:A
clc2:222:
# cat clc.dbab
B
clc4:444:D
clc5:55
# cat clc.dbac
5:E
clc6:666:F

-b cuts at exact byte offsets regardless of line boundaries. Each line of test3.txt is 11 bytes (10 characters plus a newline), so clc.dbaa ends partway through the second line and clc.dbab partway through the fourth. These two pieces do not end with a newline, which is why in the terminal the next prompt appears directly after "clc2:222:" and "clc5:55".

3. Split a file by bytes while keeping lines whole

# split -C 20 test3.txt clc.db
# ll clc.db*
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbaa
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbab
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbac
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbad
-rw-r--r-- 1 root root 11 Dec 15 18:46 clc.dbae
# cat clc.dbaa
clc1:111:A
...

Each line is 11 bytes and two lines would take 22 bytes, more than the 20-byte limit, so every piece holds exactly one line.
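A quick way to check that a split lost nothing is to concatenate the pieces in name order and compare the result with the original. This is a sketch run in a throwaway directory; the file names big.txt, part. and joined.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: split a file by lines, then verify that the pieces reassemble
# into the original. Names and sizes are illustrative.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

i=1
while [ "$i" -le 10 ]; do printf 'line %02d\n' "$i"; i=$((i + 1)); done > big.txt

split -l 4 big.txt part.     # part.aa (4 lines), part.ab (4 lines), part.ac (2 lines)
ls part.*
cat part.* > joined.txt      # the glob expands aa, ab, ac in order
cmp big.txt joined.txt && echo "identical"
```

Because the suffixes sort alphabetically, a plain cat with a glob is enough to rebuild the file, whichever of -l, -b or -C was used.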

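tr command

Translates (converts) characters; many of its uses can also be handled with sed

tr reads only from standard input: either redirect the file to standard input or pipe data into it

tr [options] string1 string2 < input_file

Options:

-c Use the complement of the characters in string1

-d Delete every character of the input that appears in string1

-s Squeeze each run of a repeated character from string1 into a single occurrence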

Example:

# cat test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
clc6:666:F

1. Delete the digits 0-9

# tr -d 0-9 < test3.txt
clc::A
clc::B
clc::D
clc::E
clc::F

2. Squeeze repeated digits in the standard input (test3.txt) into one

# tr -s 0-9 < test3.txt
clc1:1:A
clc2:2:B
clc4:4:D
clc5:5:E
clc6:6:F

3. Squeeze repeated characters other than digits into one. The complement of 0-9 also contains the newline, so runs of blank lines would be squeezed as well; test3.txt contains no repeated non-digit characters, so here the output is unchanged

# tr -sc 0-9 < test3.txt
clc1:111:A
clc2:222:B
clc4:444:D
clc5:555:E
clc6:666:F
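A few everyday tr uses, sketched on made-up strings. Character classes such as [:lower:] are used instead of a-z so that the mapping does not depend on how the locale orders ranges; the last line does the same job as the sed step in a4.sh that puts one word on each line.

```shell
#!/bin/sh
# Sketch of typical tr uses; the input strings are made up.
printf 'Hello World\n' | tr '[:lower:]' '[:upper:]'   # HELLO WORLD
printf 'a  b   c\n' | tr -s ' '                       # a b c: runs of spaces squeezed
printf 'tel: 555-0100\n' | tr -cd '0-9\n'             # 5550100: everything but digits and newline deleted
printf 'one two  three\n' | tr -s ' ' '\n'            # one, two, three on separate lines
```

With two strings, -s squeezes repeats of the characters in string2 after translation, which is why the double space in the last example does not produce an empty line.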

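tar command

Creates and extracts archives, optionally compressed

Two kinds of package are used here: plain tar archives and gzip-compressed ones. A .tar.gz file is simply a tar archive that has been further compressed with gzip

tar [options] files or directories

Options:

-c Create a new archive

-r Append files to an archive

-t List the contents of an archive

-u Update the archive: append files that are newer than their copy in the archive or not in it yet; can be used instead of -r

-x Extract files from an archive

-f Use the given archive file or device; in practice it is almost always needed

-v Show the files tar processes

-z Compress or decompress the archive with gzip. If an archive was created with -c and -z, extract it with -x and -z as well (recent GNU tar versions also detect the compression automatically on extraction). In effect, tar first builds a tar archive (as tar -cf does) and then compresses it with gzip

Files cannot be appended (-r or -u) directly to a gzip-compressed archive: first decompress it back to a tar archive (gzip -d), append the files (tar -rf), and then compress it again (gzip)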

Example:

# ls clc* test*.txt
clc.dbaa clc.dbac clc.dbae test2.txt test4.txt
clc.dbab clc.dbad test1.txt test3.txt
# tar -zcf all.tar.gz clc*     # create the archive, compressed straight to gzip format
# tar -tf all.tar.gz           # list the archive contents
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae
# gzip -d all.tar.gz           # decompress the gzip archive back into a tar archive
# ls all*
all.tar
# tar -rf all.tar test*.txt    # append files to the tar archive; files cannot be appended to a gzip archive directly
# tar -tf all.tar
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae
test1.txt
test2.txt
test3.txt
test4.txt
# gzip all.tar                 # compress the tar archive into gzip format again
# ls all*
all.tar.gz
# ls clc* test*.txt            # the archived files are still there
clc.dbaa clc.dbac clc.dbae test2.txt test4.txt
clc.dbab clc.dbad test1.txt test3.txt
# rm -f clc* test*.txt
# tar -zxvf all.tar.gz         # extract the gzip archive; omit z for a plain tar archive
clc.dbaa
clc.dbab
clc.dbac
clc.dbad
clc.dbae
test1.txt
test2.txt
test3.txt
test4.txt
# ls clc* test*.txt
clc.dbaa clc.dbac clc.dbae test2.txt test4.txt
clc.dbab clc.dbad test1.txt test3.txt
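The create, decompress, append, recompress and extract cycle above can be replayed end to end in a scratch directory. This is a sketch assuming tar and gzip are installed; the file names a.txt, b.txt and demo.tar.gz are hypothetical, and -C (extract into another directory) is a GNU and BSD tar option used here so the extracted copies can be compared with the originals.

```shell
#!/bin/sh
# Sketch of the tar/gzip cycle described above, in a throwaway directory.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf 'first\n' > a.txt
printf 'second\n' > b.txt

tar -czf demo.tar.gz a.txt     # create a gzip-compressed archive
gzip -d demo.tar.gz            # demo.tar.gz -> demo.tar, so files can be appended
tar -rf demo.tar b.txt         # append b.txt to the plain tar archive
gzip demo.tar                  # demo.tar -> demo.tar.gz
tar -tzf demo.tar.gz           # lists a.txt and b.txt

mkdir out
tar -xzf demo.tar.gz -C out    # extract into out/ instead of the current directory
cmp a.txt out/a.txt && cmp b.txt out/b.txt && echo "round trip ok"
```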

This article comes from the "flyclc" blog; please keep this attribution: http://flyclc.blog.51cto.com/1385758/1540164

