I was given a batch of e-mail addresses and wanted to deduplicate them.
Opening the file for a look, there are several columns; some parts are separated by commas and some by spaces:
9589,860122@qq.com,1,1,2015-04-08 15:31:07.763000000,Xianyang-Shaanxi,qq.com,5
9590,4605708@qq.com,1,1,2015-04-08 15:31:07.763000000,Shenzhen-Guangdong,qq.com,5
9591,3307150@qq.com,1,1,2015-04-08 15:31:07.763000000,Hangzhou-Zhejiang,qq.com,5
9592,1378747@qq.com,1,1,2015-04-08 15:31:07.763000000,Dazhou-Sichuan,qq.com,5
Command 1:
cat test.txt | awk '{print $2}' | sort | uniq
Splitting on whitespace and deduplicating gives the wrong field: the only space in each record sits between the date and the time, so what comes out is the latter half of the line, 15:31:07.763000000,Xianyang-Shaanxi,qq.com,5, not the e-mail address.
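A quick check on the first sample row (assuming the rows above are saved as test.txt) confirms this:
$ head -1 test.txt | awk '{print $2}'
15:31:07.763000000,Xianyang-Shaanxi,qq.com,5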
Command 2:
cat test.txt | awk -F "," '{print $2}' | sort | uniq >> all.txt
Split on commas instead, extract the 2nd column (the e-mail address), deduplicate it, and append the result to a new file.
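With just the four sample rows as test.txt, the content appended to all.txt would be the sorted, deduplicated addresses:
1378747@qq.com
3307150@qq.com
4605708@qq.com
860122@qq.com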
Command 3:
awk '{print $1}' all.txt | grep -v "qq.com" | grep -v "163.com" | grep -v "sina.com" | uniq | sort -n
Excludes the rows containing qq.com, 163.com, or sina.com from the file.
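The three grep -v filters can also be collapsed into a single call with an extended regular expression; the behaviour is the same:
awk '{print $1}' all.txt | grep -vE "qq.com|163.com|sina.com" | uniq | sort -n
Note that uniq here only drops adjacent duplicates; as the supplementary examples below explain, sort the data first if every duplicate should go.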
Command 4:
sed -i '/000/d' all.txt
Deletes the rows containing "000" from the all.txt file.
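If you would rather not modify all.txt in place, write the filtered result to a new file instead (filtered.txt is just an example name):
sed '/000/d' all.txt > filtered.txt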
Command 5:
awk 'BEGIN{srand()}{b[rand()NR]=$0}END{for (x in b) print b[x]}' all.txt
Randomly shuffles the lines of all.txt.
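On systems with GNU coreutils installed, shuf gives the same random ordering more directly:
shuf all.txt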
Supplementary examples:
First, remove adjacent duplicate data rows:
$ cat data1.txt | uniq
Output:
Beijing
Wuhan
Beijing
Wuhan
Second, remove all duplicate data rows:
$ cat data1.txt | sort | uniq
Note:
uniq on its own only removes adjacent duplicate rows.
Sorting first makes all duplicate rows adjacent, so piping the sorted output through uniq removes every duplicate (a one-step alternative is shown after the output below).
Output:
Beijing
Wuhan
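The same deduplication can be done in one step with sort's -u option, and uniq -c additionally prefixes each line with its occurrence count:
$ sort -u data1.txt
$ sort data1.txt | uniq -c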
Attached: data1.txt
[root@syy ~]# cat data1.txt
Beijing
Beijing
Wuhan
Wuhan
Wuhan
Beijing
Beijing
Beijing
Wuhan
Wuhan
Note: this approach is also useful when filtering the IP addresses in a log file.
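For example, assuming a typical web-server access log named access.log whose first whitespace-separated field is the client IP (both the file name and the field position are assumptions), the unique IPs and their hit counts can be listed with:
awk '{print $1}' access.log | sort | uniq -c | sort -rn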