1. uniq command
uniq - report or omit repeated lines
Description: uniq checks the specified text file or standard input for repeated lines, and it only detects repeats that are adjacent. It is often used for system troubleshooting and log analysis.
Command format:
uniq [OPTION]... [File1 [File2]]
uniq removes repeated adjacent lines from the sorted text file File1 and writes the result to standard output, or to File2 if it is given. It is often used as a filter in a pipeline.
Because uniq compares only adjacent lines, sort the input with sort before running uniq whenever you want every duplicate removed. Without options, uniq collapses each run of identical adjacent lines into a single line.
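Both invocation forms (output to standard output or to File2) and the need to sort first can be tried with a small throwaway script. This is a minimal sketch: the file names in.txt, sorted.txt, out.txt and out_unsorted.txt are hypothetical, and the script works in a temporary directory created with mktemp -d.

```shell
#!/bin/sh
# Minimal sketch: uniq only removes ADJACENT repeats, so sort first.
# All file names here are hypothetical examples.
set -eu

# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Unsorted input: the two "apple" lines are not next to each other.
printf '%s\n' apple banana apple > in.txt

# Form "uniq File1 File2": read in.txt, write the result to out_unsorted.txt.
# No repeats are adjacent, so all three lines are kept.
uniq in.txt out_unsorted.txt

# Sort first so the repeats become adjacent, then let uniq collapse them.
sort in.txt > sorted.txt
uniq sorted.txt out.txt

# Form "uniq File1": the result goes to standard output instead.
uniq sorted.txt   # prints apple and banana, one per line
```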
Common parameters:
-c, --count    prefix each output line with the number of times it occurred
2. Hands-on exercises
Test data:
# cat uniq.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.9
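To follow along, the test file can be recreated with printf, which prints each argument on its own line. A minimal sketch: it writes uniq.txt into the current directory and overwrites any existing file with that name.

```shell
#!/bin/sh
# Recreate the seven-line test file uniq.txt shown above.
# Note: this overwrites any existing uniq.txt in the current directory.
set -eu

printf '%s\n' \
    10.0.0.9 \
    10.0.0.8 \
    10.0.0.7 \
    10.0.0.7 \
    10.0.0.8 \
    10.0.0.8 \
    10.0.0.9 > uniq.txt

cat uniq.txt
```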
A. Running uniq directly on the file with no options removes only duplicates that are adjacent:
# uniq uniq.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.8
10.0.0.9
B. Use sort to make repeated lines adjacent (sort -u on its own also removes all duplicates), then pipe the result to uniq for complete deduplication:
# sort uniq.txt
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.8
10.0.0.9
10.0.0.9
# sort -u uniq.txt
10.0.0.7
10.0.0.8
10.0.0.9
# sort uniq.txt | uniq
10.0.0.7
10.0.0.8
10.0.0.9
C. Combine sort with uniq -c to count how many times each line occurs:
# sort uniq.txt | uniq -c
      2 10.0.0.7
      3 10.0.0.8
      2 10.0.0.9
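Because uniq -c puts the count in the first field, its output can be filtered further with awk, for example to keep only lines that repeat often, which is handy when looking for noisy addresses during troubleshooting. A minimal sketch: the threshold of 3 is an arbitrary example value, and the script recreates uniq.txt in a temporary directory.

```shell
#!/bin/sh
# Sketch: use the counts from uniq -c to list only the addresses that
# occur at least 3 times. The threshold 3 is an arbitrary example value.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Same test data as uniq.txt above.
printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9 > uniq.txt

# uniq -c puts the count in field 1 and the (space-free) line in field 2;
# awk keeps only the lines whose count reaches the threshold.
sort uniq.txt | uniq -c | awk '$1 >= 3 {print $2}'   # prints 10.0.0.8
```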
3. Real-world case
Process the file below: extract the domain names and sort them by how often each one appears (an interview question used at Baidu and Sohu).
# cat access.log
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
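To try the question yourself, the sample log can be recreated with a here-document. A minimal sketch: it writes access.log into the current directory and overwrites any existing file with that name.

```shell
#!/bin/sh
# Recreate the sample access.log listed above, one URL per line.
# Note: this overwrites any existing access.log in the current directory.
set -eu

cat > access.log <<'EOF'
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
EOF

cat access.log
```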
Answer:
Analysis: This is one of the most common problems in operations work. The same approach extends to analyzing logs, counting TCP connections by state, counting connections per IP address, and more.
# awk -F '[/]+' '{print $2}' access.log | sort | uniq -c | sort -rn -k1
      3 www.etiantian.org
      2 post.etiantian.org
      1 mp3.etiantian.org
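The same extract | sort | uniq -c | sort -rn pattern covers the per-IP counting mentioned in the analysis. The sketch below assumes a hypothetical log, client.log, whose first whitespace-separated field is the client IP address (typical of web server access logs); the sample lines are made up purely for illustration.

```shell
#!/bin/sh
# Hedged sketch: count requests per client IP with awk + sort + uniq.
# ASSUMPTION: client.log is hypothetical; field 1 of each line is the
# client IP. The sample lines below are invented for illustration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

cat > client.log <<'EOF'
10.0.0.7 GET /index.html
10.0.0.8 GET /1.html
10.0.0.7 GET /2.html
10.0.0.9 GET /index.html
10.0.0.7 GET /3.html
10.0.0.8 GET /index.html
EOF

# 1. awk prints the first field (the IP) of every line.
# 2. sort groups identical IPs so uniq sees them as adjacent.
# 3. uniq -c collapses each group and prefixes it with its count.
# 4. sort -rn orders the result numerically, largest count first.
awk '{print $1}' client.log | sort | uniq -c | sort -rn
```

Swapping the awk field (or the -F separator, as in the domain-name answer) is usually all it takes to count a different column.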
This article is from the "architects of the Day" blog; please keep this source attribution: http://wanyuetian.blog.51cto.com/3984643/1716971
Using awk + sort + uniq for text analysis