Using awk + sort + uniq for text analysis

Source: Internet
Author: User

1. The uniq command
uniq - report or omit repeated lines
Description: uniq checks a specified ASCII file or standard input for uniqueness to determine which lines are repeated. It is often used for system troubleshooting and log analysis.
Command format:
uniq [OPTION]... [File1 [File2]]
uniq removes duplicate lines from the sorted text file File1 and writes the result to standard output or to File2. It is often used as a filter in pipelines.
uniq only compares adjacent lines, so before using it make sure the text file has been sorted with sort. Run without options, uniq collapses each run of adjacent identical lines into a single line.
Common parameters:
-c, --count    prefix each line with the number of times it occurs
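
The two calling forms described above (File1 to File2, and uniq as a pipeline filter) can be sketched as follows. This is only an illustration: the temporary directory and the file names input.txt, sorted.txt and deduped.txt are hypothetical and not part of the original article.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' b a b a a > "$workdir/input.txt"

# uniq only compares adjacent lines, so sort the input first.
sort "$workdir/input.txt" > "$workdir/sorted.txt"

# Form 1: read File1 and write the deduplicated lines to File2.
uniq "$workdir/sorted.txt" "$workdir/deduped.txt"

# Form 2: use uniq as a filter at the end of a pipeline.
filtered=$(sort "$workdir/input.txt" | uniq)

# Both forms should yield the same two lines: a, then b.
cat "$workdir/deduped.txt"
printf '%s\n' "$filtered"
```

The filter form is usually more convenient, because no intermediate sorted file has to be kept around.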
2. Hands-on practice

Test data:

    # cat uniq.txt
    10.0.0.9
    10.0.0.8
    10.0.0.7
    10.0.0.7
    10.0.0.8
    10.0.0.8
    10.0.0.9

a. Run uniq directly on the file with no options. Only identical lines that are adjacent are deduplicated:

    # uniq uniq.txt
    10.0.0.9
    10.0.0.8
    10.0.0.7
    10.0.0.8
    10.0.0.9

b. Use sort to make repeated lines adjacent (sort -u on its own also deduplicates completely), then use uniq for complete deduplication:

    # sort uniq.txt
    10.0.0.7
    10.0.0.7
    10.0.0.8
    10.0.0.8
    10.0.0.8
    10.0.0.9
    10.0.0.9
    # sort -u uniq.txt
    10.0.0.7
    10.0.0.8
    10.0.0.9
    # sort uniq.txt | uniq
    10.0.0.7
    10.0.0.8
    10.0.0.9

c. Use sort with uniq -c to deduplicate and count how many times each line occurs:

    # sort uniq.txt | uniq -c
          2 10.0.0.7
          3 10.0.0.8
          2 10.0.0.9
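
For comparison, the same per-line counts can also be produced by awk alone, using an associative array instead of sort | uniq -c. The sketch below is not from the original article; it feeds the uniq.txt sample data inline, and the variable names data and awk_counts are made up for illustration.

```shell
# Sample data matching uniq.txt from the article, supplied inline.
data=$(printf '%s\n' 10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.7 10.0.0.8 10.0.0.8 10.0.0.9)

# Count each distinct line in an awk associative array, then print "count line".
# The iteration order of "for (k in count)" is unspecified, so sort by the line text.
awk_counts=$(printf '%s\n' "$data" |
  awk '{ count[$0]++ } END { for (k in count) print count[k], k }' |
  sort -k2)

printf '%s\n' "$awk_counts"
# 2 10.0.0.7
# 3 10.0.0.8
# 2 10.0.0.9
```

The awk version does not need sorted input, but it holds every distinct line in memory, whereas sort | uniq -c works on arbitrarily large inputs by sorting on disk.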

3. Enterprise case
Process the contents of the file below: extract the domain names and sort them by how often each one occurs (a Baidu and Sohu interview question).

    # cat access.log
    http://www.etiantian.org/index.html
    http://www.etiantian.org/1.html
    http://post.etiantian.org/index.html
    http://mp3.etiantian.org/index.html
    http://www.etiantian.org/3.html
    http://post.etiantian.org/2.html

Answer:
Analysis: This kind of problem is one of the most common in operations and maintenance work. The same pattern extends to analyzing logs, counting connections per TCP state, counting connections per IP address, and more. Here awk splits each line on runs of "/" so that the second field is the domain name, sort groups identical domains together, uniq -c counts them, and sort -rn orders the counts from highest to lowest.

    # awk -F '[/]+' '{print $2}' access.log | sort | uniq -c | sort -rn -k1
          3 www.etiantian.org
          2 post.etiantian.org
          1 mp3.etiantian.org
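
The variations mentioned in the analysis (connections per TCP state, connections per IP) follow the same extract, sort, count, rank pattern. The sketch below is an assumption-laden illustration rather than the author's method: the sample lines are invented and merely imitate the column layout of `netstat -ant` (foreign address in field 5, state in field 6), and the variable names are hypothetical. On a real system the input would come from a command such as `netstat -ant`, whose exact layout can differ.

```shell
# Hypothetical sample imitating `netstat -ant` columns:
# proto recv-q send-q local-address foreign-address state
sample='tcp 0 0 10.0.0.7:80 10.0.0.9:51000 ESTABLISHED
tcp 0 0 10.0.0.7:80 10.0.0.9:51001 ESTABLISHED
tcp 0 0 10.0.0.7:80 10.0.0.8:40000 TIME_WAIT
tcp 0 0 10.0.0.7:80 10.0.0.9:51002 TIME_WAIT
tcp 0 0 10.0.0.7:80 10.0.0.8:40001 ESTABLISHED
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN'

# Connections per TCP state, most frequent first.
# The final awk only strips the padding that uniq -c adds before the count.
state_counts=$(printf '%s\n' "$sample" | awk '{ print $6 }' |
  sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

# Connections per remote IP: skip listening sockets, split field 5 on ":"
# and keep the address part.
ip_counts=$(printf '%s\n' "$sample" |
  awk '$6 != "LISTEN" { split($5, a, ":"); print a[1] }' |
  sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

printf '%s\n' "$state_counts"
# 3 ESTABLISHED
# 2 TIME_WAIT
# 1 LISTEN
printf '%s\n' "$ip_counts"
# 3 10.0.0.9
# 2 10.0.0.8
```

Only the awk extraction step changes between these cases; the sort | uniq -c | sort -rn tail stays the same.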


This article is from the "architects of the Day" blog. Please keep this source attribution: http://wanyuetian.blog.51cto.com/3984643/1716971
