1. uniq: report or omit repeated lines
By default, uniq only collapses identical lines that are adjacent to each other. Only one option is in common use (-c), and uniq is usually combined with sort to count how many times each line occurs.
NAME
uniq - report or omit repeated lines
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
Common parameters
-c, --count    # count occurrences: the number of times each line occurs is printed in front of that line
Example 1.1 Counting how many times identical lines occur
1.1.1 Sample data
cat >chen.txt<<EOF
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.9
EOF
1.1.2 Solution
1.1.2.1 Approach
1.1.2.2 By default, only adjacent identical lines are collapsed
# uniq chen.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.8
10.0.0.9
1.1.2.3 Make duplicate lines adjacent
# sort chen.txt
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.8
10.0.0.9
10.0.0.9
1.1.2.4 Count the number of occurrences of each line
# sort chen.txt | uniq -c    # -c: count
      2 10.0.0.7
      3 10.0.0.8
      2 10.0.0.9
The following two commands have the same effect:
# sort chen.txt | uniq
10.0.0.7
10.0.0.8
10.0.0.9

# sort -u chen.txt    # -u: unique
10.0.0.7
10.0.0.8
10.0.0.9
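As a quick check of the equivalence above, the following sketch rebuilds the sample data in a temporary file and compares the two results. The temporary file created with mktemp is an assumption standing in for chen.txt; it is not part of the original session.

```shell
#!/bin/sh
# Sketch: confirm that `sort | uniq` and `sort -u` give the same result.
# The temporary file stands in for chen.txt from the example.
tmp=$(mktemp)
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 >"$tmp"

a=$(sort "$tmp" | uniq)   # sort first so duplicates become adjacent, then collapse them
b=$(sort -u "$tmp")       # let sort drop the duplicates itself

if [ "$a" = "$b" ]; then
    echo "same"
else
    echo "different"
fi
rm -f "$tmp"
```

sort -u is shorter when only the distinct lines are needed; the sort | uniq form is still required when the counts from uniq -c are wanted.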
Example 1.2 Extracting domain names, counting them, and sorting by count
1.2.1 Sample data
cat >chen.log<<EOF
http://www.oldboyedu.com/
http://edu.51cto.com/
http://edu.51cto.com/user/user_id-8804946.html
http://www.zhibo8.cc/
http://weibo.com/1995418821/profile?topnav=1&wvr=6
http://chenfage.blog.51cto.com/
http://edu.51cto.com/user/user_id-8804946.html
http://www.oldboyedu.com/
http://www.zhibo8.cc/
http://www.zhibo8.cc/
EOF
1.2.2 Solution 1: awk, sort, uniq
1.2.2.1 Approach 1
awk extracts the domain name
sort sorts the lines; the default order is ascending
uniq -c counts the repeated lines
awk -F/ '{print $3}' chen.log | sort | uniq -c | sort -r
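The pipeline above can be tried end to end with the sketch below. Two things in it are assumptions rather than part of the original: the helper name top_domains is hypothetical, and the final step uses sort -rn instead of sort -r, so the leading count is compared as a number rather than as text and the result does not depend on column padding or locale.

```shell
#!/bin/sh
# Sketch: count domains in a URL list read from stdin, most frequent first.
# top_domains is a hypothetical helper name, not from the original article.
top_domains() {
    # With "/" as the separator, http://host/path splits into
    # $1="http:", $2="" and $3="host".
    awk -F/ '{print $3}' | sort | uniq -c | sort -rn
}

log=$(mktemp)
cat >"$log" <<'EOF'
http://www.oldboyedu.com/
http://edu.51cto.com/
http://edu.51cto.com/user/user_id-8804946.html
http://www.zhibo8.cc/
http://weibo.com/1995418821/profile?topnav=1&wvr=6
http://chenfage.blog.51cto.com/
http://edu.51cto.com/user/user_id-8804946.html
http://www.oldboyedu.com/
http://www.zhibo8.cc/
http://www.zhibo8.cc/
EOF

top_domains <"$log"
rm -f "$log"
```

Reading from stdin keeps the helper usable at the end of any other pipeline, for example `top_domains <chen.log | head -2` for the two most frequent domains.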
1.2.2.2 Extract the domain with awk
# awk -F/ '{print $3}' chen.log | sort
chenfage.blog.51cto.com
edu.51cto.com
edu.51cto.com
edu.51cto.com
weibo.com
www.oldboyedu.com
www.oldboyedu.com
www.zhibo8.cc
www.zhibo8.cc
www.zhibo8.cc
1.2.2.3 Count duplicate lines with uniq
# awk -F/ '{print $3}' chen.log | sort | uniq -c
      1 chenfage.blog.51cto.com
      3 edu.51cto.com
      1 weibo.com
      2 www.oldboyedu.com
      3 www.zhibo8.cc
1.2.2.4 Reverse the order with sort -r
# awk -F/ '{print $3}' chen.log | sort | uniq -c | sort
      1 chenfage.blog.51cto.com
      1 weibo.com
      2 www.oldboyedu.com
      3 edu.51cto.com
      3 www.zhibo8.cc

# awk -F/ '{print $3}' chen.log | sort | uniq -c | sort -r    # -r: reverse
      3 www.zhibo8.cc
      3 edu.51cto.com
      2 www.oldboyedu.com
      1 weibo.com
      1 chenfage.blog.51cto.com

# awk -F/ '{print $3}' chen.log | sort | uniq -c | sort -r | head -2
      3 www.zhibo8.cc
      3 edu.51cto.com
1.2.3 Solution 2: cut, sort, uniq
1.2.3.1 Approach 2
cut splits each line on the delimiter given with -d and selects fields with -f
sort sorts the lines; -r reverses the order
uniq -c counts the repeated lines
cut -d/ -f3 chen.log | sort | uniq -c | sort -r
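The sketch below checks that the cut step here selects the same field as the awk step in Solution 1. It uses a three-line subset of the sample URLs in a temporary file, which is an assumption made to keep the example short.

```shell
#!/bin/sh
# Sketch: cut -d/ -f3 and awk -F/ '{print $3}' select the same field.
urls=$(mktemp)
printf '%s\n' \
    'http://www.oldboyedu.com/' \
    'http://edu.51cto.com/user/user_id-8804946.html' \
    'http://www.zhibo8.cc/' >"$urls"

by_cut=$(cut -d/ -f3 "$urls")
by_awk=$(awk -F/ '{print $3}' "$urls")

[ "$by_cut" = "$by_awk" ] && echo "cut and awk agree"
rm -f "$urls"
```

One difference worth knowing: for a line that contains no "/" at all, cut prints the whole line unless -s is given, while awk prints an empty line.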
1.2.3.2 Extract the domain with cut -d and -f
# cut -d/ -f3 chen.log    # -d, --delimiter (delimiter); -f, --fields (field)
www.oldboyedu.com
edu.51cto.com
edu.51cto.com
www.zhibo8.cc
weibo.com
chenfage.blog.51cto.com
edu.51cto.com
www.oldboyedu.com
www.zhibo8.cc
www.zhibo8.cc
1.2.3.3 Sort with sort and count duplicate lines with uniq
# cut -d/ -f3 chen.log | sort
chenfage.blog.51cto.com
edu.51cto.com
edu.51cto.com
edu.51cto.com
weibo.com
www.oldboyedu.com
www.oldboyedu.com
www.zhibo8.cc
www.zhibo8.cc
www.zhibo8.cc

# cut -d/ -f3 chen.log | sort | uniq -c
      1 chenfage.blog.51cto.com
      3 edu.51cto.com
      1 weibo.com
      2 www.oldboyedu.com
      3 www.zhibo8.cc

# cut -d/ -f3 chen.log | sort | uniq -c | sort -r
      3 www.zhibo8.cc
      3 edu.51cto.com
      2 www.oldboyedu.com
      1 weibo.com
      1 chenfage.blog.51cto.com

# cut -d/ -f3 chen.log | sort | uniq -c | sort -r | head -2
      3 www.zhibo8.cc
      3 edu.51cto.com
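2. sort: sorting
Sorts the contents of a file line by line.
NAME
sort - sort lines of text files
SYNOPSIS
sort [OPTION]... [FILE]...
Common parameters
-r, --reverse                # reverse the order; the default is ascending
-u, --unique                 # print identical lines only once
-k, --key=POS1[,POS2]        # choose which column, or which character of which column, to sort on
-t, --field-separator=SEP    # set the field delimiter; by default fields are separated by blanks
-n, --numeric-sort           # sort according to the numeric value of the string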
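To illustrate what -n and -r change, here is a small sketch on made-up numbers (the values are illustrative only). LC_ALL=C is set as an assumption so that the text-order result is reproducible across systems.

```shell
#!/bin/sh
# Sketch: text order versus numeric order.
# LC_ALL=C pins the collation so the text-order result is reproducible.
nums=$(printf '%s\n' 10 9 2 100)

text_order=$(printf '%s\n' "$nums" | LC_ALL=C sort)          # compares character by character
numeric_order=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)    # compares numeric values
reverse_numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -rn) # numeric, largest first

printf '%s\n' "$text_order" | tr '\n' ' '; echo        # 10 100 2 9
printf '%s\n' "$numeric_order" | tr '\n' ' '; echo     # 2 9 10 100
printf '%s\n' "$reverse_numeric" | tr '\n' ' '; echo   # 100 10 9 2
```

Without -n, "100" sorts before "2" because the comparison stops at the first differing character.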
Example 2.1 Sorting on a specified column in reverse order
2.1.1 Sample data
cat >chen.txt<<EOF
192.168.3.1 c
192.168.3.2 n
192.168.12.41 w
192.168.2.20 g
192.168.3.3 a
192.168.2.22 p
192.168.0.152 l
192.168.22.33 u
192.168.1.10 f
192.168.0.150 y
192.168.2.20 e
192.168.30.2 t
EOF
2.1.2 Solution
2.1.2.1 Approach
2.1.2.2 Procedure
# sort -t ' ' -k2 chen.txt    # -t sets the delimiter to a space; -k selects the column
192.168.3.3 a
192.168.3.1 c
192.168.2.20 e
192.168.1.10 f
192.168.2.20 g
192.168.0.152 l
192.168.3.2 n
192.168.2.22 p
192.168.30.2 t
192.168.22.33 u
192.168.12.41 w
192.168.0.150 y

# sort -k2 chen.txt    # fields are separated by blanks by default
192.168.3.3 a
192.168.3.1 c
192.168.2.20 e
192.168.1.10 f
192.168.2.20 g
192.168.0.152 l
192.168.3.2 n
192.168.2.22 p
192.168.30.2 t
192.168.22.33 u
192.168.12.41 w
192.168.0.150 y

# sort -rk2 chen.txt    # -r means reverse order (the default is ascending); same as: sort -t ' ' -rk2 chen.txt
192.168.0.150 y
192.168.12.41 w
192.168.22.33 u
192.168.30.2 t
192.168.2.22 p
192.168.3.2 n
192.168.0.152 l
192.168.2.20 g
192.168.1.10 f
192.168.2.20 e
192.168.3.1 c
192.168.3.3 a
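In the commands above, -k2 names only a start position, so the key runs from the second field to the end of the line. With two-column data that makes no difference, but with more columns it can. The sketch below uses made-up three-column rows (an assumption for illustration) to contrast -k2 with -k2,2.

```shell
#!/bin/sh
# Sketch: -k2 versus -k2,2 on three-column data.
# LC_ALL=C pins the collation so the comparison is byte by byte.
rows=$(printf '%s\n' 'x b 2' 'y a 9' 'z b 1')

# Key = field 2 through end of line, so the third column breaks the tie on "b".
to_eol=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2)

# Key = field 2 only; the tie on "b" falls back to comparing whole lines.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2,2)

printf '%s\n' "-k2:" "$to_eol" "-k2,2:" "$field_only"
```

With -k2 the rows "z b 1" and "x b 2" are ordered by the third column; with -k2,2 they tie on the key and sort falls back to comparing the whole line, which puts "x b 2" first.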
Example 2.2 Sorting IP addresses by group, in reverse order within each group
2.2.1 Sample data
cat >arp.txt<<EOF
192.168.3.1 00:50:56:c0:00:08
192.168.3.2 00:0c:29:fd:28:fd
192.168.12.41 00:0c:29:21:26:c7
192.168.2.20 00:50:56:27:78:ca
192.168.3.3 00:50:56:29:c4:6b
192.168.2.22 00:40:56:20:6e:ae
192.168.0.152 00:50:56:2e:4a:17
192.168.22.33 00:0c:29:61:1c:36
192.168.1.10 00:40:56:36:bc:b7
192.168.0.150 00:50:56:30:c3:8b
192.168.2.20 01:50:56:c0:00:04
192.168.30.2 00:50:56:23:68:fb
EOF
2.2.2 Solution
2.2.2.1 Approach
By default, sort compares whole lines.
-t specifies the delimiter.
-k takes positions separated by a comma: -k 1,1 means the key starts at the first field and ends at the end of the first field.
-k 1.1,3.3 means the key starts at the first character of the first field and ends at the third character of the third field; a dot separates the field number from the character position.
sort -t. -k3.1,3.2n -k4.1,4.3rn arp.txt
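A possible variation on the key specification above, offered as a sketch rather than as the article's method: since -n stops reading at the first non-numeric character, whole-field keys such as -k3,3n and -k4,4rn should give the same ordering without character offsets, and they also cope with a three-digit third octet. The helper name sort_arp and the shortened sample rows are hypothetical.

```shell
#!/bin/sh
# Sketch: third octet ascending, fourth octet descending, using whole-field keys.
# sort_arp is a hypothetical helper name; it reads lines from stdin.
sort_arp() {
    # -t.       split fields on "."
    # -k3,3n    compare only field 3, as a number
    # -k4,4rn   compare only field 4, as a number, in reverse
    sort -t. -k3,3n -k4,4rn
}

printf '%s\n' \
    '192.168.3.1 aa' \
    '192.168.0.150 bb' \
    '192.168.100.7 cc' \
    '192.168.3.3 dd' \
    '192.168.0.152 ee' | sort_arp
```

The fourth field here is everything after the last dot (for example "152 ee"), and the numeric comparison uses only the leading digits.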
2.2.2.2 Procedure
# sort -t. -k3.1,3.2n -k4.1,4.3rn arp.txt
192.168.0.152 00:50:56:2e:4a:17
192.168.0.150 00:50:56:30:c3:8b
192.168.1.10 00:40:56:36:bc:b7
192.168.2.22 00:40:56:20:6e:ae
192.168.2.20 00:50:56:27:78:ca
192.168.2.20 01:50:56:c0:00:04
192.168.3.3 00:50:56:29:c4:6b
192.168.3.2 00:0c:29:fd:28:fd
192.168.3.1 00:50:56:c0:00:08
192.168.12.41 00:0c:29:21:26:c7
192.168.22.33 00:0c:29:61:1c:36
192.168.30.2 00:50:56:23:68:fb

-t. sets the delimiter to a dot.
-k3.1,3.2n means the key starts at the first character of the third field and ends at its second character, because the third field has at most two digits here; the trailing n means the key is compared by numeric value.
-k4.1,4.3rn means the key starts at the first character of the fourth field and ends at its third character; r means reverse order.
This article is from the "Chen was 007" blog. Please keep this source attribution: http://chenfage.blog.51cto.com/8804946/1833496