Using Linux sort, uniq, awk, and head for access log statistics and ranking

Source: Internet
Author: User

During development we often need to compute statistics over access logs. An access log contains a huge number of URLs, many of them duplicates. A typical task is to find the 5 URLs that occur most often and list them in descending order of occurrence count.

Linux command: cat url.log | sort | uniq -c | sort -n -r -k 1 -t ' ' | awk -F '//' '{print $2}'

Now let's analyze what each stage of this command pipeline does.

0) Sample Access log
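
The sample log itself did not survive in this copy of the article. As a stand-in, the following sketch creates a small T1.log with one URL per line. The addresses and their frequencies are hypothetical, chosen only so that the later steps have something to work on.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)"

# Create a hypothetical T1.log: one URL per line, with duplicates.
# The addresses below are made up for illustration.
cat > T1.log <<'EOF'
http://192.168.1.100
http://192.168.1.101
http://192.168.1.100
http://192.168.1.102
http://192.168.1.101
http://192.168.1.100
EOF

# Show the number of log lines that were written.
wc -l < T1.log
```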


1) cat T1.log | sort


This sorts the contents of the data file. By default the sort command orders lines as text (by byte value in the C locale, by the locale's collation rules otherwise), which guarantees that duplicate records end up on adjacent lines.
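
A minimal sketch of this effect, using three hypothetical records instead of a real log file. LC_ALL=C is set here only to make the ordering independent of the local locale; it is an assumption of this example, not part of the original command.

```shell
# Hypothetical input: three records, one of them repeated out of order.
sorted=$(printf '%s\n' \
  'http://192.168.1.101' \
  'http://192.168.1.100' \
  'http://192.168.1.101' | LC_ALL=C sort)

# After sorting, the two copies of the .101 record sit on adjacent lines.
printf '%s\n' "$sorted"
```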


2) cat T1.log | sort | uniq -c


Here the pipe (|) feeds the output of the command on its left into the command on its right as input. uniq -c merges adjacent duplicate records and prefixes each merged record with the number of times it occurred. Because uniq only merges records that are adjacent, the input must be sorted before uniq -c is applied.
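
The need for sorting first can be seen in a small sketch with hypothetical records, where the same value appears twice but not on neighbouring lines:

```shell
# Hypothetical records: "a" appears twice, but not on adjacent lines.
records=$(printf '%s\n' a b a)

# Without sorting, uniq sees three separate runs and reports three groups.
unsorted_groups=$(printf '%s\n' "$records" | uniq -c | wc -l)

# With sorting first, the two "a" lines are adjacent and merge into one group.
sorted_groups=$(printf '%s\n' "$records" | LC_ALL=C sort | uniq -c | wc -l)

echo "groups without sort: $((unsorted_groups))"
echo "groups with sort:    $((sorted_groups))"
```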

3) cat T1.log | sort | uniq -c | sort -k 1 -n -r

After uniq -c, each line has the form "2 data": the first field is a number giving how many times the record occurred, and the second field is the record itself. We now sort these lines. sort -k 1 starts the sort key at the first field, which holds the occurrence count (strictly, -k 1 runs to the end of the line; -k 1,1 would restrict the key to the first field alone). By default sort compares keys as text, character by character, so in descending order the value 2 would come before the value 11. The -n option tells sort to compare by numeric value instead. -r (lowercase; in GNU sort, uppercase -R means random order) reverses the result so that it runs from largest to smallest.
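
The difference between text order and numeric order can be sketched with three hypothetical count lines (the leading padding that uniq -c normally emits is omitted to keep the comparison easy to read):

```shell
# Hypothetical "count record" lines, without the padding uniq -c would add.
counts=$(printf '%s\n' '2 b' '11 a' '3 c')

# Text order, descending: compared character by character, "3" > "2" > "1".
lexical=$(printf '%s\n' "$counts" | LC_ALL=C sort -k 1 -r)

# Numeric order, descending: compared by value, 11 > 3 > 2.
numeric=$(printf '%s\n' "$counts" | LC_ALL=C sort -k 1 -n -r)

printf 'text order:\n%s\nnumeric order:\n%s\n' "$lexical" "$numeric"
```

In text order the line with count 11 ends up last, while with -n it comes first.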


4) cat T1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}'


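After sort | uniq -c | sort -k 1 -n -r, each line contains the count followed by text in the form http://192.168.1.100. The result we want has the form 192.168.1.100, so the "http://" prefix must be removed. We do this with awk: awk -F '//' uses "//" as the field separator, which splits each line into two fields, the part before "//" (the count and "http:") and the part after it (192.168.1.100). {print $2} prints the second field, that is, 192.168.1.100. Note that the count is in the first field and is therefore dropped from the output; only the order of the lines still reflects it.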
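
A minimal sketch of the split, applied to a single hypothetical line shaped like the output of the previous stage:

```shell
# A hypothetical line as it looks after the counting and sorting stages.
line='      3 http://192.168.1.100'

# -F '//' makes "//" the field separator, so the line splits into
#   $1 = "      3 http:"   and   $2 = "192.168.1.100"
host=$(printf '%s\n' "$line" | awk -F '//' '{print $2}')

echo "$host"
```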


5) cat T1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -5


The head command outputs the first N lines of its input. head -5 (equivalently, head -n 5) returns the first five lines of the sorted result.
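
Putting the stages together, the following sketch runs the whole pipeline on a generated T1.log. The log contents are hypothetical: address 192.168.1.10N is written N times, so every address has a distinct count and the ranking is unambiguous. LC_ALL=C is again an assumption added for reproducible ordering.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)"

# Build a hypothetical T1.log in which 192.168.1.10N appears N times (N = 1..6).
for n in 1 2 3 4 5 6; do
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "http://192.168.1.10$n"
    i=$((i + 1))
  done
done > T1.log

# Count, rank by count (descending), strip "http://", keep the top five.
top5=$(cat T1.log | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k 1 -n -r \
  | awk -F '//' '{print $2}' | head -5)

printf '%s\n' "$top5"
```

With this input the list starts at 192.168.1.106, the most frequent address, and 192.168.1.101, the least frequent, is cut off by head.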


6) The http://192.168.1.100 log format above is just a simple example. Real log formats are more complex, for example: GET http://192.168.1.100/auth. In such cases we can use awk -F or cut (with -d and -f) to extract the URL. For details, refer to the documentation of the awk and cut commands.
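
As a rough illustration, assuming a log line that consists of a request method, a space, and the URL (a hypothetical format, real logs usually carry more fields), the URL or the host could be extracted like this:

```shell
# A hypothetical log line with a request method in front of the URL.
line='GET http://192.168.1.100/auth'

# awk splits on whitespace by default, so the URL is the second field.
url_awk=$(printf '%s\n' "$line" | awk '{print $2}')

# cut does the same with an explicit delimiter (-d) and field number (-f).
url_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Splitting on "/" instead isolates the host as the third field:
#   $1 = "GET http:", $2 = "" (between the two slashes), $3 = host, $4 = path
host=$(printf '%s\n' "$line" | awk -F '/' '{print $3}')

printf '%s\n%s\n%s\n' "$url_awk" "$url_cut" "$host"
```

Which field number is right depends entirely on the actual log layout, so the field indexes above should be checked against a few real lines before use.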


Reference: http://www.linuxidc.com/Linux/2011-01/31347.htm

