Counting and sorting access-log URLs on Linux with sort, uniq, awk, and head

Source: Internet
Author: User

During development we often need to analyze access logs, and the URLs in an access log are massive, with much duplicated content. Taking URLs as the example, suppose we want the five most frequently occurring URLs, sorted by occurrence count in descending order.

Linux command: cat url.log | sort | uniq -c | sort -n -r -k 1 | awk -F '//' '{print $2}' | head -5

Now let's analyze what each part of this command combination does.

0) Access log sample
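The original article showed the sample log only as a screenshot, so the exact entries are not recoverable. A hypothetical T1.log with the same shape (one URL per line) might look like:

```shell
# Create a hypothetical sample access log; the real contents were
# shown as an image in the original article, so these lines are assumed.
cat > T1.log <<'EOF'
http://192.168.1.100
http://192.168.1.101
http://192.168.1.100
http://192.168.1.102
http://192.168.1.100
http://192.168.1.101
EOF
```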


1) cat T1.log | sort


This sorts the contents of the data file. The sort command orders the lines in dictionary (ASCII) order, which guarantees that duplicate records end up adjacent to each other.
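A minimal demonstration of this adjacency property, using throwaway letters instead of URLs:

```shell
# sort orders the input lines, so equal lines become neighbors --
# the precondition that uniq -c relies on in the next step.
printf '%s\n' b a b a | sort
# → a
# → a
# → b
# → b
```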


2) cat T1.log | sort | uniq -c


Here the pipe (|) feeds the output of the command on the left into the command on the right as its input. uniq -c merges adjacent duplicate records and counts how many times each occurs. Because uniq -c only merges adjacent records, the input must be sorted before this command is applied.
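The same toy input shows the count-prefixed format uniq -c produces:

```shell
# uniq -c collapses adjacent duplicates and prefixes each surviving
# line with its repetition count (right-aligned with leading spaces).
printf '%s\n' a a b a | sort | uniq -c
# Output (modulo leading spaces):
#   3 a
#   1 b
```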

3) cat T1.log | sort | uniq -c | sort -k 1 -n -r

After uniq -c, each line has the form "2 data": the first field is the number of duplicates and the second field is the record itself. We now sort on that count. sort -k 1 sorts each line by its first field, which here is the duplicate count. Because sort's default ordering is lexicographic (ASCII), the value 11 would be ranked before the value 2, so the -n flag is needed to tell sort to compare numeric values. -r reverses the order, producing a descending (largest first) sort.
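The difference between string and numeric comparison can be seen directly:

```shell
# Lexicographically, "11" sorts before "2"; -n compares the first
# field as a number instead, and -r flips the result to descending.
printf '%s\n' '2 x' '11 y' '5 z' | sort -k 1 -n -r
# → 11 y
# → 5 z
# → 2 x
```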


4) cat T1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}'


After sort | uniq -c | sort -k 1 -n -r, the lines still contain URLs in a form like http://192.168.1.100, but the result we want is just 192.168.1.100, so the http:// prefix needs to be removed. awk handles this: awk -F '//' splits http://192.168.1.100 into two parts, "http:" and "192.168.1.100", and {print $2} prints the second part, i.e. 192.168.1.100.
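Isolating just the awk stage makes the field split easy to check:

```shell
# awk -F '//' uses "//" as the field separator, so the URL splits
# into $1 = "http:" and $2 = "192.168.1.100"; we print the second.
echo 'http://192.168.1.100' | awk -F '//' '{print $2}'
# → 192.168.1.100
```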


5) cat T1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -5


The head command selects the first X lines of its input. head -5 therefore yields the first five entries of the sorted result.
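Putting every stage together on a small inline log (hypothetical data, standing in for T1.log) shows the most frequent host emerging first:

```shell
# Full pipeline: sort groups duplicates, uniq -c counts them,
# sort -k 1 -n -r ranks by count descending, awk strips "http://",
# and head -5 keeps the top five entries.
printf '%s\n' \
  http://192.168.1.100 http://192.168.1.101 http://192.168.1.100 \
  http://192.168.1.102 http://192.168.1.100 \
  | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -5
# First line of output: 192.168.1.100 (it occurs three times)
```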


6) The http://192.168.1.100 log format above is just a simple example. Real log formats are more complex, for example: GET http://192.168.1.100/auth. In that case we can use awk -F or cut to extract the URL field. Refer to the awk and cut documentation for details.
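One way this extraction might look for the richer line above (the field positions are assumptions based on that single example):

```shell
# Hypothetical richer log line: method followed by a full URL.
line='GET http://192.168.1.100/auth'

# awk's default whitespace splitting: the URL is field 2.
echo "$line" | awk '{print $2}'
# → http://192.168.1.100/auth

# cut with "/" as the delimiter: the host is the 3rd "/"-field
# ("http:", "", "192.168.1.100", "auth").
echo "$line" | awk '{print $2}' | cut -d'/' -f3
# → 192.168.1.100
```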


References: http://www.linuxidc.com/Linux/2011-01/31347.htm

